CN112507061A

CN112507061A - Multi-relation medical knowledge extraction method, device, equipment and storage medium

Info

Publication number: CN112507061A
Application number: CN202011476005.3A
Authority: CN
Inventors: 付亚州
Original assignee: Kangjian Information Technology Shenzhen Co Ltd
Current assignee: Kangjian Information Technology Shenzhen Co Ltd
Priority date: 2020-12-15
Filing date: 2020-12-15
Publication date: 2021-03-16

Abstract

The invention relates to the field of artificial intelligence and discloses a method, a device, equipment and a storage medium for extracting multi-relation medical knowledge. The method comprises: inputting a plurality of historical medical sentences and corresponding annotation files into a preset pre-trained stacked model, and extracting the medical knowledge relations in the historical medical sentences through a first pre-trained model; predicting, with a second pre-trained model, the two entity features associated with each medical knowledge relation in each historical medical sentence, and combining them with the relation to obtain a triple; training the pre-trained stacked model until a multi-relation medical knowledge extraction model is obtained; and acquiring a medical sentence from which medical knowledge relations are to be extracted, inputting it into the multi-relation medical knowledge extraction model, and outputting one or more triples in the medical sentence. The invention also relates to blockchain technology, in that the medical sentences are stored in a blockchain. The invention thereby realizes information extraction of multi-relation medical knowledge.

Description

Multi-relation medical knowledge extraction method, device, equipment and storage medium

Technical Field

The invention relates to the field of artificial intelligence, and in particular to a method, a device, equipment and a storage medium for extracting multi-relation medical knowledge.

Background

Information extraction is a text-processing technology that extracts factual information such as entities, attributes, relations and events from natural-language text, and it is an important basis for artificial-intelligence applications such as information retrieval, intelligent question answering and intelligent dialogue. Extracting medical knowledge plays an important role in building medical knowledge graphs, enabling automatic medical question answering and improving physicians' efficiency. In particular, automatically extracting structured knowledge from physicians' consultation dialogues is important for building the underlying medical knowledge base. A knowledge triple, consisting of two entities and one relation, can be extracted from a diagnostic dialogue between a doctor and a user. For example, from "the symptom of acute upper respiratory infection is fever", an information extraction model can extract the two entities "acute upper respiratory infection" and "fever", linked by the relation "symptom".

Traditional information extraction models predict well only when a medical sentence contains a single entity pair with a single relation. They perform poorly when several entities correspond to one relation, and in diagnostic dialogues where the same pair of entities carries several relations they can extract only one of those relations, so not all of the knowledge is captured. Traditional knowledge extraction models therefore have great difficulty extracting multi-relation medical knowledge.

Disclosure of Invention

The main objective of the invention is to solve the technical problem that traditional knowledge extraction models have great difficulty extracting multi-relation medical knowledge.

A first aspect of the invention provides a multi-relation medical knowledge extraction method, comprising the following steps:

acquiring a plurality of historical medical sentences from doctor-patient conversations, and annotating the historical medical sentences to obtain corresponding annotation files;

inputting the historical medical sentences and the annotation files into a preset pre-trained stacked model, and performing relation classification on each historical medical sentence through a first pre-trained model in the pre-trained stacked model to obtain one or more medical knowledge relations in each historical medical sentence;

predicting, through a second pre-trained model in the pre-trained stacked model, the two entity features associated with each of the one or more medical knowledge relations in each historical medical sentence, to obtain one or more triples in each historical medical sentence;

training the pre-trained stacked model according to the triples and the annotation files until the pre-trained stacked model converges, to obtain a multi-relation medical knowledge extraction model;

and acquiring a medical sentence from which medical knowledge relations are to be extracted, inputting the medical sentence into the multi-relation medical knowledge extraction model for medical knowledge relation and entity feature extraction, and outputting one or more triples carrying the multi-relation medical knowledge in the medical sentence.

Optionally, in a first implementation of the first aspect of the invention, the pre-trained stacked model further includes an input layer, and after the historical medical sentences and the annotation files are input into the preset pre-trained stacked model, the method further includes:

performing secondary word segmentation on each historical medical sentence through the input layer to obtain secondary encoding information for each individual word in each historical medical sentence.

Optionally, in a second implementation of the first aspect of the invention, performing relation classification on each historical medical sentence through the first pre-trained model in the pre-trained stacked model to obtain the one or more medical knowledge relations in each historical medical sentence includes:

inputting the secondary encoding information into the first pre-trained model in the pre-trained stacked model, and extracting one or more relation features from the secondary encoding information through the first pre-trained model;

and matching the classification labels corresponding to the one or more relation features, and determining, based on the classification labels, the one or more medical knowledge relations in each historical medical sentence.

Optionally, in a third implementation of the first aspect of the invention, predicting, through the second pre-trained model in the pre-trained stacked model, the two entity features associated with the one or more medical knowledge relations in each historical medical sentence to obtain the one or more triples in each historical medical sentence includes:

according to the medical knowledge relations predicted for each historical medical sentence, combining the secondary encoding information of each historical medical sentence with each corresponding medical knowledge relation to obtain one or more training samples for each historical medical sentence;

sequentially inputting the training samples into the second pre-trained model in the pre-trained stacked model, and extracting a plurality of entity features from the secondary encoding information through the second pre-trained model;

and sequentially selecting, from the entity features, the two entity features associated with each medical knowledge relation, and sequentially combining each medical knowledge relation with its two associated entity features to obtain one or more triples in each historical medical sentence.

Optionally, in a fourth implementation of the first aspect of the invention, training the pre-trained stacked model according to the triples and the annotation files until the pre-trained stacked model converges, to obtain the multi-relation medical knowledge extraction model, includes:

S1, calculating a cross-entropy loss value of the pre-trained stacked model according to the triples and the annotation files, and judging whether the cross-entropy loss value is smaller than a preset loss threshold;

S2, if the cross-entropy loss value is smaller than the preset loss threshold, obtaining the multi-relation medical knowledge extraction model, and if it is larger, retraining the pre-trained stacked model;

S3, repeating steps S1-S2 until the cross-entropy loss value is smaller than the preset loss threshold or the number of training iterations exceeds a preset threshold, thereby obtaining the multi-relation medical knowledge extraction model.
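The S1-S3 loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `train_one_epoch` is a hypothetical callback that runs one round of training on the pre-trained stacked model and returns its cross-entropy loss, and `loss_threshold` and `max_epochs` are hypothetical names for the preset loss threshold and the preset training-count threshold.

```python
from typing import Callable, Tuple


def train_until_converged(
    train_one_epoch: Callable[[], float],
    loss_threshold: float,
    max_epochs: int,
) -> Tuple[int, float]:
    """Loop S1-S2 until the loss is below the threshold or the budget is spent (S3).

    `train_one_epoch` is a hypothetical callback: it trains the stacked model
    once and returns the resulting cross-entropy loss.
    Returns the number of training rounds run and the last loss value.
    """
    loss = float("inf")
    rounds = 0
    while rounds < max_epochs:
        loss = train_one_epoch()  # train, then compute the loss (S1)
        rounds += 1
        if loss < loss_threshold:  # below threshold: keep this model (S2)
            break
        # otherwise retrain on the next iteration (S2, retrain branch)
    return rounds, loss


if __name__ == "__main__":
    fake_losses = iter([0.9, 0.5, 0.2, 0.05])  # toy loss sequence
    print(train_until_converged(lambda: next(fake_losses), 0.1, 10))  # (4, 0.05)
```

Checking the iteration budget in the loop condition guarantees that training stops even if the loss never falls below the threshold.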

Optionally, in a fifth implementation of the first aspect of the invention, calculating the cross-entropy loss value of the pre-trained stacked model according to the triples and the annotation files includes:

calculating the classification accuracy of the medical knowledge relations from the relation classification results in the triples and the annotation files, and calculating the prediction accuracy of the entity features from the entity feature predictions in the triples and the annotation files;

calculating a classification loss value of the first pre-trained model from a preset first model training parameter and the classification accuracy, and calculating a prediction loss value of the second pre-trained model from the second model training parameter and the prediction accuracy;

and calculating the cross-entropy loss value of the pre-trained stacked model from the classification loss value and the prediction loss value.
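One possible reading of this loss is sketched below; it is an assumption, not the source's formula. Because the relation classifier is multi-label, its classification loss is taken here as the mean binary cross-entropy over relation labels; the entity predictor's loss is taken as the mean categorical cross-entropy over its decisions; and the hypothetical weights `w_rel` and `w_ent` play the role of the first and second model training parameters in a weighted sum.

```python
import math
from typing import Sequence

EPS = 1e-12  # clamp probabilities so log() never sees 0


def binary_cross_entropy(probs: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy over relation labels (multi-label classifier)."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, EPS), 1.0 - EPS)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def categorical_cross_entropy(
    dists: Sequence[Sequence[float]], gold: Sequence[int]
) -> float:
    """Mean cross-entropy of the gold index under each predicted distribution."""
    total = 0.0
    for dist, g in zip(dists, gold):
        total += -math.log(max(dist[g], EPS))
    return total / len(dists)


def stacked_loss(
    rel_probs: Sequence[float],
    rel_targets: Sequence[int],
    ent_dists: Sequence[Sequence[float]],
    ent_gold: Sequence[int],
    w_rel: float = 0.5,  # hypothetical "first model training parameter"
    w_ent: float = 0.5,  # hypothetical "second model training parameter"
) -> float:
    """Weighted sum of the classification loss and the prediction loss."""
    classification_loss = binary_cross_entropy(rel_probs, rel_targets)
    prediction_loss = categorical_cross_entropy(ent_dists, ent_gold)
    return w_rel * classification_loss + w_ent * prediction_loss
```

A weighted sum lets the two stages be balanced during joint training; other combinations are equally compatible with the wording of the step above.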

Optionally, in a sixth implementation of the first aspect of the invention, the multi-relation medical knowledge extraction model includes an input layer, a relation extraction model and an entity extraction model, and inputting the medical sentence into the multi-relation medical knowledge extraction model for medical knowledge relation and entity feature extraction, and outputting one or more triples carrying multi-relation medical knowledge in the medical sentence, includes:

inputting the medical sentence into the input layer for secondary word segmentation, and encoding the segmented medical sentence to obtain its secondary encoding information;

inputting the secondary encoding information into the relation extraction model, and extracting, through the relation extraction model, the plurality of medical knowledge relations in the medical sentence;

combining the secondary encoding information with each of the medical knowledge relations to obtain combined medical knowledge relations;

and inputting the combined medical knowledge relations into the entity extraction model, and sequentially extracting, through the entity extraction model, the two entity features associated with each medical knowledge relation to obtain a plurality of triples in the medical sentence.
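The inference flow above can be sketched as a small pipeline. The three callables are hypothetical stand-ins (assumptions, not APIs from the source) for the input layer, the relation extraction model and the entity extraction model; the point is the control flow, in which each predicted relation is paired with the sentence encoding and resolved into its own triple.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str


def extract_triples(
    sentence: str,
    encode: Callable[[str], List[int]],  # stand-in for the input layer
    relation_model: Callable[[List[int]], List[str]],  # relation extraction model
    entity_model: Callable[[List[int], str], Tuple[str, str]],  # entity extraction model
) -> List[Triple]:
    codes = encode(sentence)  # secondary encoding of the sentence
    relations = relation_model(codes)  # zero, one or several relations
    triples = []
    for relation in relations:
        # combine the sentence encoding with one relation at a time
        head, tail = entity_model(codes, relation)
        triples.append(Triple(head, relation, tail))
    return triples


if __name__ == "__main__":
    # Toy stubs with hard-coded answers, for illustration only.
    pairs = {
        "symptom": ("acute upper respiratory infection", "fever"),
        "treatment": ("acute upper respiratory infection", "rest"),
    }
    result = extract_triples(
        "toy sentence",
        encode=lambda s: [ord(c) for c in s],
        relation_model=lambda codes: list(pairs),
        entity_model=lambda codes, rel: pairs[rel],
    )
    for t in result:
        print(t)
```

Because the entity model is called once per relation, a sentence with two relations yields two triples even when both share the same entities.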

A second aspect of the invention provides a multi-relation medical knowledge extraction apparatus, comprising:

an annotation module, configured to acquire a plurality of historical medical sentences from doctor-patient conversations and annotate the historical medical sentences to obtain corresponding annotation files;

a classification module, configured to input the historical medical sentences and the annotation files into a preset pre-trained stacked model, and perform relation classification on each historical medical sentence through a first pre-trained model in the pre-trained stacked model to obtain one or more medical knowledge relations in each historical medical sentence;

a prediction module, configured to predict, through a second pre-trained model in the pre-trained stacked model, the two entity features associated with the one or more medical knowledge relations in each historical medical sentence, to obtain one or more triples in each historical medical sentence;

a training module, configured to train the pre-trained stacked model according to the triples and the annotation files until the pre-trained stacked model converges, to obtain a multi-relation medical knowledge extraction model;

and an extraction module, configured to acquire a medical sentence from which medical knowledge relations are to be extracted, input the medical sentence into the multi-relation medical knowledge extraction model for medical knowledge relation and entity feature extraction, and output one or more triples carrying multi-relation medical knowledge in the medical sentence.

Optionally, in a first implementation of the second aspect of the invention, the pre-trained stacked model further includes an input layer, and the multi-relation medical knowledge extraction apparatus further includes:

a word segmentation module, configured to perform secondary word segmentation on each historical medical sentence through the input layer to obtain secondary encoding information for each individual word in each historical medical sentence.

Optionally, in a second implementation of the second aspect of the invention, the classification module includes:

a first extraction unit, configured to input the secondary encoding information into the first pre-trained model in the pre-trained stacked model, and extract one or more relation features from the secondary encoding information through the first pre-trained model;

and a matching unit, configured to match the classification labels corresponding to the one or more relation features, and determine, based on the classification labels, the one or more medical knowledge relations in each historical medical sentence.

Optionally, in a third implementation of the second aspect of the invention, the prediction module includes:

a first combination unit, configured to combine, according to the medical knowledge relations predicted for each historical medical sentence, the secondary encoding information of each historical medical sentence with each corresponding medical knowledge relation, to obtain one or more training samples for each historical medical sentence;

a second extraction unit, configured to sequentially input the training samples into the second pre-trained model in the pre-trained stacked model, and extract a plurality of entity features from the secondary encoding information through the second pre-trained model;

and a screening unit, configured to sequentially select, from the entity features, the two entity features associated with each medical knowledge relation, and sequentially combine each medical knowledge relation with its two associated entity features to obtain one or more triples in each historical medical sentence.

Optionally, in a fourth implementation of the second aspect of the invention, the training module includes:

a calculation unit, configured to calculate a cross-entropy loss value of the pre-trained stacked model according to the triples and the annotation files, and judge whether the cross-entropy loss value is smaller than a preset loss threshold;

a judging unit, configured to obtain the multi-relation medical knowledge extraction model if the cross-entropy loss value is smaller than the preset loss threshold, and retrain the pre-trained stacked model if the cross-entropy loss value is larger than the preset loss threshold;

and a loop unit, configured to repeatedly invoke the calculation unit and the judging unit until the cross-entropy loss value is smaller than the preset loss threshold or the number of training iterations exceeds a preset threshold, thereby obtaining the multi-relation medical knowledge extraction model.

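Optionally, in a fifth implementation of the second aspect of the invention, the calculation unit is further configured to:

calculate the classification accuracy of the medical knowledge relations from the relation classification results in the triples and the annotation files, and calculate the prediction accuracy of the entity features from the entity feature predictions in the triples and the annotation files;

calculate a classification loss value of the first pre-trained model from a preset first model training parameter and the classification accuracy, and calculate a prediction loss value of the second pre-trained model from the second model training parameter and the prediction accuracy;

and calculate the cross-entropy loss value of the pre-trained stacked model from the classification loss value and the prediction loss value.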

Optionally, in a sixth implementation of the second aspect of the invention, the multi-relation medical knowledge extraction model includes an input layer, a relation extraction model and an entity extraction model, and the extraction module includes:

a word segmentation unit, configured to input the medical sentence into the input layer for secondary word segmentation, and encode the segmented medical sentence to obtain its secondary encoding information;

a relation extraction unit, configured to input the secondary encoding information into the relation extraction model and extract, through the relation extraction model, the plurality of medical knowledge relations in the medical sentence;

a second combination unit, configured to combine the secondary encoding information with each of the medical knowledge relations to obtain combined medical knowledge relations;

and a generating unit, configured to input the combined medical knowledge relations into the entity extraction model, and sequentially extract, through the entity extraction model, the two entity features associated with each medical knowledge relation to obtain a plurality of triples in the medical sentence.

A third aspect of the invention provides a multi-relation medical knowledge extraction device, comprising a memory and at least one processor, the memory storing instructions; the at least one processor invokes the instructions in the memory to cause the multi-relation medical knowledge extraction device to perform the multi-relation medical knowledge extraction method described above.

A fourth aspect of the invention provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the multi-relation medical knowledge extraction method described above.

In the technical solution provided by the invention, in the model training stage, a plurality of historical medical sentences from doctor-patient conversations and their corresponding annotation files are used as training samples. First, a first pre-trained model in a preset pre-trained stacked model performs relation classification on each historical medical sentence to obtain the one or more medical knowledge relations in it; a second pre-trained model in the pre-trained stacked model then predicts the two entity features associated with each of those relations to obtain one or more triples per historical medical sentence; and the pre-trained stacked model is trained until it converges, yielding a multi-relation medical knowledge extraction model. In the model application stage, a medical sentence from which medical knowledge relations are to be extracted is acquired, and the multi-relation medical knowledge extraction model extracts one or more triples from it, thereby determining the multi-relation medical knowledge in the sentence and realizing information extraction of multi-relation medical knowledge.

Drawings

FIG. 1 is a schematic diagram of a first embodiment of the multi-relation medical knowledge extraction method according to an embodiment of the invention;

FIG. 2 is a schematic diagram of a second embodiment of the multi-relation medical knowledge extraction method according to an embodiment of the invention;

FIG. 3 is a schematic diagram of a third embodiment of the multi-relation medical knowledge extraction method according to an embodiment of the invention;

FIG. 4 is a schematic diagram of an embodiment of the multi-relation medical knowledge extraction apparatus according to an embodiment of the invention;

FIG. 5 is a schematic diagram of another embodiment of the multi-relation medical knowledge extraction apparatus according to an embodiment of the invention;

FIG. 6 is a schematic diagram of an embodiment of the multi-relation medical knowledge extraction device according to an embodiment of the invention.

Detailed Description

Embodiments of the invention provide a method, a device, equipment and a storage medium for extracting multi-relation medical knowledge. A plurality of historical medical sentences and corresponding annotation files are input into a preset pre-trained stacked model, and a first pre-trained model extracts the medical knowledge relations in the historical medical sentences; a second pre-trained model predicts the two entity features associated with each medical knowledge relation in each historical medical sentence, and these are combined with the relation into a triple; the pre-trained stacked model is trained until a multi-relation medical knowledge extraction model is obtained; and a medical sentence from which medical knowledge relations are to be extracted is input into the multi-relation medical knowledge extraction model, which outputs one or more triples in the medical sentence. The method and device thus extract the multiple triples present in a medical sentence, that is, they realize information extraction of multi-relation medical knowledge.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

For ease of understanding, the specific flow of an embodiment of the invention is described below. Referring to FIG. 1, a first embodiment of the multi-relation medical knowledge extraction method in an embodiment of the invention includes:

101. Acquiring a plurality of historical medical sentences from doctor-patient conversations, and annotating the historical medical sentences to obtain corresponding annotation files;

It is to be understood that the executing subject of the invention may be a multi-relation medical knowledge extraction apparatus, or a terminal or a server, which is not limited herein. This embodiment is described with a server as the executing subject. It should be emphasized that, to further ensure the privacy and security of the medical sentences, the medical sentences may also be stored in a node of a blockchain.

In this embodiment, the historical medical sentences needed to train the pre-trained stacked model can be collected from online consultations of cloud medical institutions. The medical knowledge relations and entity features in each historical medical sentence are then annotated. Entity features include disease names, symptom descriptions, measurement data, prevention methods, treatment methods, family histories and the like; a medical knowledge relation is the relation between two entity features and includes symptom, prevention, measurement, treatment and the like. For example, if the historical medical sentence is "fever is a symptom of acute upper respiratory tract infection", then "acute upper respiratory tract infection" and "fever" are the two entity features and the medical knowledge relation is "symptom".
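An annotation file entry for a sentence like the example above could be represented as below. The field names, the character-offset convention and the type identifiers are illustrative assumptions; the source lists the entity and relation categories but does not specify a file format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Category names follow the text; the identifiers themselves are assumptions.
ENTITY_TYPES = {"disease", "symptom", "measurement", "prevention", "treatment", "family_history"}
RELATION_TYPES = {"symptom", "prevention", "measurement", "treatment"}


@dataclass
class Entity:
    text: str
    type: str
    start: int  # character offset of the entity in the sentence


@dataclass
class Relation:
    type: str
    head: int  # index into Annotation.entities
    tail: int  # index into Annotation.entities


@dataclass
class Annotation:
    sentence: str
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if any label is unknown or points at the wrong text."""
        for e in self.entities:
            if e.type not in ENTITY_TYPES:
                raise ValueError(f"unknown entity type: {e.type}")
            if self.sentence[e.start:e.start + len(e.text)] != e.text:
                raise ValueError(f"offset mismatch for: {e.text}")
        for r in self.relations:
            if r.type not in RELATION_TYPES:
                raise ValueError(f"unknown relation type: {r.type}")
            for idx in (r.head, r.tail):
                if not 0 <= idx < len(self.entities):
                    raise ValueError(f"relation points at missing entity {idx}")

    def triples(self) -> List[Tuple[str, str, str]]:
        """Gold (head, relation, tail) triples for comparison with predictions."""
        return [
            (self.entities[r.head].text, r.type, self.entities[r.tail].text)
            for r in self.relations
        ]


if __name__ == "__main__":
    s = "fever is a symptom of acute upper respiratory tract infection"
    d = "acute upper respiratory tract infection"
    ann = Annotation(
        sentence=s,
        entities=[Entity(d, "disease", s.find(d)), Entity("fever", "symptom", s.find("fever"))],
        relations=[Relation("symptom", head=0, tail=1)],
    )
    ann.validate()
    print(ann.triples())
```

Storing relations as indices into the entity list lets several relations share one entity, which is exactly the multi-relation case the method targets.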

102. Inputting the historical medical sentences and the annotation files into a preset pre-trained stacked model, and performing relation classification on each historical medical sentence through a first pre-trained model in the pre-trained stacked model to obtain one or more medical knowledge relations in each historical medical sentence;

In this embodiment, the preset pre-trained stacked model is formed by stacking one input layer and two pre-trained models. The input layer encodes each word of the historical medical sentence, and segmenting the sentence with the input layer before encoding is more accurate than directly encoding each individual word. A single historical medical sentence may contain several triples, meaning that it contains several medical knowledge relations; the first pre-trained model determines which medical knowledge relations are present in the historical medical sentence.

103. Predicting, through a second pre-trained model in the pre-trained stacked model, the two entity features associated with the one or more medical knowledge relations in each historical medical sentence, to obtain one or more triples in each historical medical sentence;

In this embodiment, the second pre-trained model may be a multi-label single-classification model, specifically a BERT (Bidirectional Encoder Representations from Transformers) model. It predicts the entity features present in the historical medical sentence and selects from them the two entity features associated with each medical knowledge relation. The relation and its two entity features are combined into a triple that carries identification information for the relation and for both entity features, and the triples determine the multi-relation medical knowledge of the historical medical sentence.
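The selection of two entity features per relation could, for instance, be driven by a relation schema that fixes which entity types each relation connects. The schema, the candidate tuple layout and the highest-score rule below are all hypothetical, chosen only to illustrate the screening step; the source does not say how the two entity features are chosen.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical schema: the (head type, tail type) each relation connects.
RELATION_SCHEMA: Dict[str, Tuple[str, str]] = {
    "symptom": ("disease", "symptom"),
    "prevention": ("disease", "prevention"),
    "measurement": ("disease", "measurement"),
    "treatment": ("disease", "treatment"),
}

# A predicted entity feature: (text, entity type, model score).
Candidate = Tuple[str, str, float]


def select_pair(relation: str, candidates: List[Candidate]) -> Optional[Tuple[str, str, str]]:
    """Pick the best-scoring head and tail of the right types for one relation."""
    head_type, tail_type = RELATION_SCHEMA[relation]
    heads = [c for c in candidates if c[1] == head_type]
    tails = [c for c in candidates if c[1] == tail_type]
    if not heads or not tails:
        return None  # the relation cannot be grounded in this sentence
    head = max(heads, key=lambda c: c[2])
    tail = max(tails, key=lambda c: c[2])
    return (head[0], relation, tail[0])
```

Returning `None` lets a caller drop a predicted relation that no pair of entity features supports.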

104. Training the pre-trained stacked model according to the triples and the annotation files until the pre-trained stacked model converges, to obtain a multi-relation medical knowledge extraction model;

In this embodiment, the triples are the predicted medical knowledge relations and entity features of the historical medical sentences, while the annotation data holds the true medical knowledge relations and entity features. When the accuracy of the predictions exceeds a preset level, the pre-trained stacked model has converged and the multi-relation medical knowledge extraction model is obtained.
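The source does not define how prediction accuracy is measured. A common choice for triple extraction, assumed here, is exact-match precision, recall and F1 over the sets of predicted and annotated triples; the resulting score could then be compared with the preset level.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def triple_prf(
    predicted: Iterable[Set[Triple]], gold: Iterable[Set[Triple]]
) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over per-sentence triple sets."""
    tp = n_pred = n_gold = 0
    for p, g in zip(predicted, gold):
        tp += len(p & g)  # a triple counts only if head, relation and tail all match
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Scoring whole triples rather than relations or entities alone means a sentence only counts as correct when both stages of the stacked model agree with the annotation.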

105. Acquiring a medical sentence from which medical knowledge relations are to be extracted, inputting the medical sentence into the multi-relation medical knowledge extraction model for medical knowledge relation and entity feature extraction, and outputting one or more triples carrying multi-relation medical knowledge in the medical sentence.

In this embodiment, the information extraction system for multi-relation medical knowledge applies deep learning tailored to features of the medical domain to extract medical knowledge features. The finally trained multi-relation medical knowledge extraction model is intended to extract all the triples in a medical sentence, addressing problems such as multi-relation and multi-entity extraction in medical sentences. The extracted multi-relation medical knowledge can be used to build basic structured medical databases, knowledge graphs and the like.

In this embodiment of the invention, in the model training stage, a plurality of historical medical sentences from doctor-patient conversations and their corresponding annotation files are used as training samples. First, a first pre-trained model in a preset pre-trained stacked model performs relation classification on each historical medical sentence to obtain the one or more medical knowledge relations in it; a second pre-trained model in the pre-trained stacked model then predicts the two entity features associated with each of those relations to obtain one or more triples per historical medical sentence; and the pre-trained stacked model is trained until it converges, yielding a multi-relation medical knowledge extraction model. In the model application stage, a medical sentence from which medical knowledge relations are to be extracted is acquired, and the multi-relation medical knowledge extraction model extracts one or more triples from it, thereby determining the multi-relation medical knowledge in the sentence and realizing information extraction of multi-relation medical knowledge.

Referring to FIG. 2, a second embodiment of the multi-relation medical knowledge extraction method in an embodiment of the invention includes:

201. Acquiring a plurality of historical medical sentences from doctor-patient conversations, and annotating the historical medical sentences to obtain corresponding annotation files;

202. Inputting the historical medical sentences and the annotation files into a preset pre-trained stacked model, and performing secondary word segmentation on each historical medical sentence through the input layer to obtain secondary encoding information for each individual word in each historical medical sentence;

In this embodiment, secondary word segmentation divides the sub-words of the historical medical sentence into several categories according to the medical domain, and each category is represented by a category code. The words within each category are then encoded with the same encoding rule, and the category code is concatenated with the word code to obtain the secondary encoding information. This both shortens the code length and reduces encoding complexity.
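The secondary encoding can be illustrated with a toy lexicon. Everything concrete below is a hypothetical assumption: the category lexicon, the category codes, the use of a word's list position as its code (the same rule in every category), and the fixed widths (2 digits for the category code, 4 for the word code) used when concatenating the two codes. Tokens are assumed to be already segmented.

```python
from typing import Dict, List

# Hypothetical domain categories and lexicon; the source names neither.
CATEGORY_CODES: Dict[str, int] = {"other": 0, "disease": 1, "symptom": 2}
CATEGORY_LEXICON: Dict[str, List[str]] = {
    "disease": ["acute upper respiratory infection", "hypertension"],
    "symptom": ["fever", "cough", "headache"],
}

# Same encoding rule in every category: a word's code is its position in the list.
WORD_CODES: Dict[str, Dict[str, int]] = {
    cat: {w: i for i, w in enumerate(words)} for cat, words in CATEGORY_LEXICON.items()
}


def secondary_encode(tokens: List[str], other_vocab: Dict[str, int]) -> List[str]:
    """Encode each segmented token as category code + word code.

    Words outside the lexicon fall into the "other" category and receive
    codes from `other_vocab`, which is extended in place as new words appear.
    """
    encoded = []
    for tok in tokens:
        for cat, codes in WORD_CODES.items():
            if tok in codes:
                cat_code, word_code = CATEGORY_CODES[cat], codes[tok]
                break
        else:  # no category matched
            cat_code = CATEGORY_CODES["other"]
            word_code = other_vocab.setdefault(tok, len(other_vocab))
        encoded.append(f"{cat_code:02d}{word_code:04d}")  # concatenate the two codes
    return encoded


if __name__ == "__main__":
    toks = ["the", "symptom", "of", "acute upper respiratory infection", "is", "fever"]
    print(secondary_encode(toks, {}))
```

Because word codes only need to be unique within a category, each category's code space stays small, which is one way to read the claim that the scheme shortens the code length.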

203. Inputting the secondary encoding information into the first pre-trained model in the pre-trained stacked model, and extracting one or more relation features from the secondary encoding information through the first pre-trained model;

204. Matching the classification labels corresponding to the one or more relation features, and determining, based on the classification labels, the one or more medical knowledge relations in each historical medical sentence;

In this embodiment, the first pre-trained model is a multi-label multi-class model, which may be a BERT model, and can extract several medical knowledge relations from one historical medical sentence. A relation feature is an encoding combination that represents a medical knowledge relation, and the classification label associated with each encoding combination indicates which medical knowledge relation it belongs to.
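Multi-label classification means each relation label is decided independently, so one sentence may yield zero, one or several relations. A minimal decoding sketch, assuming that the first model outputs one logit per relation label, that the label order is as listed, and that a 0.5 probability threshold is used (all assumptions), is:

```python
import math
from typing import List

# Relation labels named in the description; the order is an assumption.
RELATION_LABELS = ["symptom", "prevention", "measurement", "treatment"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_relations(logits: List[float], threshold: float = 0.5) -> List[str]:
    """Keep every relation label whose independent probability reaches the threshold."""
    if len(logits) != len(RELATION_LABELS):
        raise ValueError("expected one logit per relation label")
    return [label for label, z in zip(RELATION_LABELS, logits) if sigmoid(z) >= threshold]
```

A softmax over the labels would force exactly one relation per sentence; independent sigmoids are what allow several relations to be reported at once.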

205. According to the predicted medical knowledge relationship in each historical medical statement, respectively combining the secondary codes in each historical medical statement and the corresponding medical knowledge relationship to obtain one or more training samples corresponding to each historical medical statement;

206. sequentially inputting the training samples into a second pre-training model in the pre-training stacked model, and extracting a plurality of entity characteristics in the secondary coding information through the second pre-training model;

207. Sequentially screening, from the plurality of entity features, the two entity features associated with each medical knowledge relation, and combining each medical knowledge relation with its two associated entity features to obtain one or more triples in each historical medical statement;

In this embodiment, the number of training samples input into the second pre-training model is determined by the medical knowledge relationships obtained by the first pre-training model. For example, if 2 medical knowledge relationships are obtained from a historical medical sentence, each relationship is combined with the secondary coding information of that sentence, giving 2 training samples: secondary coding information + medical knowledge relationship 1, and secondary coding information + medical knowledge relationship 2. The two training samples are input into the second pre-training model, which finally outputs two different triples.
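A sketch of steps 205 to 207 under stated assumptions: samples are plain strings in the `[CLS] ... [SEP] relation` layout used later in this description, and the second model is assumed (hypothetically) to score each candidate entity span as a head and as a tail for the given relation. `build_samples`, `screen_entity_pair`, `make_triple` and the scoring tuples are illustrative, not the patent's implementation.

```python
def build_samples(encoded_sentence: str, relations: list[str]) -> list[str]:
    """One training sample per predicted relation (step 205)."""
    return [f"[CLS] {encoded_sentence} [SEP] {rel}" for rel in relations]


def screen_entity_pair(candidates: list[tuple[str, float, float]]) -> tuple[str, str]:
    """Pick the two entity features for one relation (step 207).

    Each candidate is (span, head_score, tail_score), a hypothetical output of
    the second model. The head is the best head candidate; the tail is the best
    tail candidate among the remaining ones, so both come from different candidates.
    """
    if len(candidates) < 2:
        raise ValueError("need at least two candidate entities")
    head_idx = max(range(len(candidates)), key=lambda i: candidates[i][1])
    tail_idx = max(
        (i for i in range(len(candidates)) if i != head_idx),
        key=lambda i: candidates[i][2],
    )
    return candidates[head_idx][0], candidates[tail_idx][0]


def make_triple(relation: str, pair: tuple[str, str]) -> tuple[str, str, str]:
    """Combine a relation with its two associated entities into (head, relation, tail)."""
    head, tail = pair
    return (head, relation, tail)


if __name__ == "__main__":
    samples = build_samples("0000 0110", ["relation 1", "relation 2"])
    print(len(samples))  # 2: one sample, and later one triple, per relation
    pair = screen_entity_pair(
        [("liver cancer", 0.9, 0.1), ("abdominal pain", 0.2, 0.8), ("late", 0.3, 0.4)]
    )
    print(make_triple("relation 1", pair))  # ('liver cancer', 'relation 1', 'abdominal pain')
```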

208. Training the pre-training stacking model according to the triples and the labeling files until the pre-training stacking model is converged to obtain a multi-relation medical knowledge extraction model;

in this embodiment, the loss value of the current pre-training stack model may be calculated according to the triplet and the label file, so as to measure the prediction accuracy of the current pre-training stack model, and thus determine whether to continue training the pre-training stack model. The specific iterative process comprises the following steps:

(1) calculating a cross entropy loss value of the pre-training stack model according to the triple and the label file, and judging whether the cross entropy loss value is smaller than a preset loss threshold value;

(2) if the cross entropy loss value is smaller than the preset loss threshold value, obtaining the multi-relation medical knowledge extraction model; if it is larger than the preset loss threshold value, retraining the pre-training stacking model;

(3) circularly executing the steps (1) to (2) until the cross entropy loss value is smaller than a preset loss threshold value or the training times exceed a preset training time threshold value, and obtaining a multi-relation medical knowledge extraction model;
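The iterative procedure in steps (1) to (3) can be sketched as a loop with two stopping conditions. `evaluate_loss` and `train_once` are hypothetical callbacks standing in for the real loss computation and training pass; treating a loss exactly equal to the threshold as not yet converged is an assumption, since the text only covers the smaller and larger cases.

```python
from typing import Callable


def train_stacked_model(
    evaluate_loss: Callable[[], float],
    train_once: Callable[[], None],
    loss_threshold: float,
    max_rounds: int,
) -> tuple[int, float, bool]:
    """Retrain until the loss drops below the threshold or the round budget is used up.

    Returns (training rounds performed, final loss, converged flag).
    """
    rounds = 0
    loss = evaluate_loss()                      # step (1): compute the loss
    while loss >= loss_threshold and rounds < max_rounds:
        train_once()                            # step (2): not converged, retrain
        rounds += 1
        loss = evaluate_loss()                  # step (3): loop back to step (1)
    return rounds, loss, loss < loss_threshold


if __name__ == "__main__":
    losses = iter([0.9, 0.6, 0.25])
    print(train_stacked_model(lambda: next(losses), lambda: None, 0.3, 10))  # (2, 0.25, True)
```

Keeping the round budget separate from the loss check means the loop always terminates, even if the loss never reaches the threshold.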

In this embodiment, the accuracy of the prediction results of the pre-training stacked model is evaluated through the cross entropy loss value. When the cross entropy loss value is smaller than the preset loss threshold value, the prediction accuracy of the pre-training stacked model can be considered to have reached the preset level; the model is judged to have converged and is applied directly to predicting the medical knowledge relationships and entity features in subsequent medical statements. The cross entropy loss value is calculated as follows:

(1) calculating the classification accuracy of the medical knowledge relationship according to the classification result of the medical knowledge relationship in the triple and the label file, and calculating the prediction accuracy of the entity characteristics according to the prediction result of the entity characteristics in the triple and the label file;

(2) calculating a classification loss value of the first pre-training model according to the preset first model training parameter and the classification accuracy, and calculating a prediction loss value of the second pre-training model according to the second model training parameter and the prediction accuracy;

(3) calculating the cross entropy loss value of the pre-training stack model according to the classification loss value and the prediction loss value.

In this embodiment, the preset first and second model training parameters are initially set by a developer based on experience; during training, the model parameters are adjusted according to the model's prediction results to improve its prediction accuracy.
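The description does not give the exact formula for combining the two losses. One common choice, shown here as an assumption rather than the patent's method, is binary cross entropy over the multi-label relation outputs of the first model, softmax cross entropy over candidate entity positions for the second model, and a weighted sum in which a hypothetical weight `alpha` stands in for the preset training parameters.

```python
import math


def binary_cross_entropy(probs: list[float], targets: list[int], eps: float = 1e-12) -> float:
    """Mean BCE over independent relation labels (classification loss)."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def softmax_cross_entropy(logits: list[float], target_index: int) -> float:
    """Cross entropy of one softmax distribution, e.g. over candidate entity positions."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def stacked_loss(
    rel_probs: list[float],
    rel_targets: list[int],
    entity_logits: list[list[float]],
    entity_targets: list[int],
    alpha: float = 0.5,  # hypothetical weight between the two sub-models
) -> float:
    """Weighted sum of the classification loss and the entity prediction loss."""
    cls_loss = binary_cross_entropy(rel_probs, rel_targets)
    pred_loss = sum(
        softmax_cross_entropy(z, t) for z, t in zip(entity_logits, entity_targets)
    ) / len(entity_logits)
    return alpha * cls_loss + (1.0 - alpha) * pred_loss
```

For example, `stacked_loss([0.5], [1], [[0.0, 0.0]], [0])` gives ln 2 (about 0.693), because both sub-losses are ln 2 for a maximally uncertain prediction.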

209. And acquiring a medical statement of the medical knowledge relationship to be extracted, inputting the medical statement into the multi-relationship medical knowledge extraction model for medical knowledge relationship and entity feature extraction, and outputting one or more triples with multi-relationship medical knowledge in the medical statement.

In this embodiment of the invention, the relationship classification performed by the first pre-training model and the entity extraction performed by the second pre-training model of the pre-training stack model are described in turn: the two entity features associated with each medical knowledge relationship are extracted from the historical medical sentence, and one medical knowledge relationship combined with its two entity features forms a triple that describes the multi-relationship medical knowledge in the historical medical sentence.

Referring to fig. 3, a fourth embodiment of the method for extracting multi-relationship medical knowledge according to the embodiment of the present invention includes:

301. acquiring a plurality of historical medical sentences of doctor-patient conversation, and labeling the historical medical sentences to obtain corresponding labeling files;

302. inputting the historical medical sentences and the label file into a preset pre-training stack model, and performing relation classification on each historical medical sentence through a first pre-training model in the pre-training stack model to obtain one or more medical knowledge relations in each historical medical sentence;

303. predicting two entity features associated with the one or more medical knowledge relationships in each historical medical statement through a second pre-trained model in the pre-trained stacked models to obtain one or more triples in each historical medical statement;

304. training the pre-training stacking model according to the triples and the labeling files until the pre-training stacking model is converged to obtain a multi-relation medical knowledge extraction model;

305. Inputting the medical statement into the input layer for secondary word segmentation processing, and encoding the segmented medical statement to obtain its secondary coding information;

In this embodiment, in practical artificial intelligence applications such as building a medical knowledge graph, intelligent medical question answering and intelligent medical dialogue, medical relation sentences are input into the preset stacked model, which extracts the multi-relation medical knowledge they contain. The extraction process is explained here using a single input medical statement; in practice, for example when constructing a medical knowledge graph, a plurality of medical statements can be input into the preset stacked model at the same time, and the multi-relation medical knowledge is extracted from each of them in turn to build the medical knowledge graph.

In this embodiment, the word-based segmentation tools commonly used in the field may produce incorrect word boundaries, so a character-based segmentation tool is used here, which yields more accurate segmentation features. The two-level word segmentation processing shortens the codes and reduces coding difficulty, and the same set of coding rules is applied to single characters of different categories; that is, the second-level word segmentation applies a multi-class, multi-label encoding to each single character in the medical sentence. For example, cancer is divided into liver cancer and gastric cancer, and stage is divided into early, middle and late stage, giving six combinations in total; the combination liver cancer-late stage can then be coded as [1,0,0,0,1], where the first two positions encode the cancer category and the last three encode the stage.
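The liver cancer-late stage example can be reproduced with a per-group one-hot encoding. The group names, `GROUPS` and `encode_attributes` are illustrative; only the two groups, their values and the resulting code [1,0,0,0,1] come from the text above.

```python
# Groups and values from the example above; the data structure is an illustration.
GROUPS: dict[str, list[str]] = {
    "cancer": ["liver cancer", "gastric cancer"],
    "stage": ["early", "middle", "late"],
}


def encode_attributes(values: dict[str, str], groups: dict[str, list[str]]) -> list[int]:
    """Concatenate one one-hot segment per group, in group order."""
    code: list[int] = []
    for group, options in groups.items():
        one_hot = [0] * len(options)
        one_hot[options.index(values[group])] = 1  # same rule for every group
        code.extend(one_hot)
    return code


if __name__ == "__main__":
    print(encode_attributes({"cancer": "liver cancer", "stage": "late"}, GROUPS))
    # [1, 0, 0, 0, 1]
```

The 2 + 3 positions distinguish all 2 x 3 = 6 combinations, whereas a flat one-hot code over the combinations would need six positions.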

306. Inputting the secondary coding information into the relation extraction model, and extracting a plurality of medical knowledge relations existing in the medical statement through the relation extraction model;

In this embodiment, the database stores the classification tags of the medical knowledge relationships, and the classification tags correspond one-to-one to the secondary coding information. Once the secondary coding information of the medical statement is determined, the corresponding classification tags can be determined, and the corresponding medical knowledge relationships can be retrieved using the classification tags as indexes. Here, the secondary coding information represents the input medical statement and is input into the relationship extraction model in the fixed format {[CLS] secondary coding information}; the extracted medical knowledge relationships are likewise represented in coded form.
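A minimal sketch of the fixed input format and the tag-to-relation lookup, assuming a Python dict stands in for the database table. `RELATION_BY_TAG`, its entries and both helper names are hypothetical.

```python
# Hypothetical stand-in for the database table of relation classification tags.
RELATION_BY_TAG: dict[int, str] = {
    0: "disease-symptom",
    1: "disease-drug",
    2: "disease-examination",
}


def format_relation_input(secondary_codes: list[str]) -> str:
    """Build the fixed '[CLS] secondary coding information' input string."""
    return "[CLS] " + " ".join(secondary_codes)


def lookup_relations(tags: list[int]) -> list[str]:
    """Use the predicted classification tags as indexes into the relation table."""
    return [RELATION_BY_TAG[tag] for tag in tags]


if __name__ == "__main__":
    print(format_relation_input(["0000", "0110"]))  # [CLS] 0000 0110
    print(lookup_relations([2, 0]))  # ['disease-examination', 'disease-symptom']
```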

307. Combining the secondary coding information and the medical knowledge relations to obtain a combined medical knowledge relation;

308. Inputting the combined medical knowledge relationship into the entity extraction model, and sequentially extracting the two entity features associated with each medical knowledge relationship through the entity extraction model to obtain a plurality of triples in the medical statement.

In this embodiment, each medical knowledge relationship extracted by the relationship extraction model is combined with the secondary coding information of the original medical statement into a new sample, which is input into the entity extraction model for entity feature extraction to generate a triple. That is, a medical statement yields as many samples, and therefore as many triples, as it has medical knowledge relationships. For example, if two medical knowledge relationships [relation 1, relation 2] are extracted from a medical statement, combining them with the secondary coding information forms two new samples: {[CLS] secondary coding information [SEP] relation 1, [CLS] secondary coding information [SEP] relation 2}. These are input into the entity extraction model in turn to obtain the two entity features associated with relation 1 and the two entity features associated with relation 2, finally yielding two triples.
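Steps 306 to 308 can be sketched as a small orchestration function in which the two trained models are passed in as callables. The stub models in the usage section are hypothetical placeholders; the sample strings follow the `[CLS] ... [SEP] relation` format given above.

```python
from typing import Callable

Triple = tuple[str, str, str]


def extract_triples(
    secondary_codes: str,
    relation_model: Callable[[str], list[str]],
    entity_model: Callable[[str], tuple[str, str]],
) -> list[Triple]:
    """Relations first, then one entity-extraction sample (and one triple) per relation."""
    relations = relation_model(f"[CLS] {secondary_codes}")          # step 306
    triples: list[Triple] = []
    for relation in relations:
        sample = f"[CLS] {secondary_codes} [SEP] {relation}"       # step 307
        head, tail = entity_model(sample)                          # step 308
        triples.append((head, relation, tail))
    return triples


if __name__ == "__main__":
    # Hypothetical stub models standing in for the trained extraction models.
    def stub_relations(_sample: str) -> list[str]:
        return ["relation 1", "relation 2"]

    def stub_entities(sample: str) -> tuple[str, str]:
        return ("A", "B") if sample.endswith("relation 1") else ("C", "D")

    print(extract_triples("0000 0110", stub_relations, stub_entities))
    # [('A', 'relation 1', 'B'), ('C', 'relation 2', 'D')]
```

Passing the models in as callables keeps the orchestration independent of how the relation and entity extraction models are actually implemented.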

In this embodiment of the invention, the extraction process of multi-relation medical knowledge is described in detail: a medical statement input by the user is fed directly into the trained multi-relation knowledge extraction model, and the word segmentation tool, classification model and prediction model in that model separate the statement into a plurality of multi-relation, multi-entity triples.

Having described the method for extracting multi-relationship medical knowledge in the embodiment of the present invention, the device for extracting multi-relationship medical knowledge in the embodiment of the present invention is described below with reference to fig. 4. One embodiment of the device includes:

the labeling module 401 is configured to obtain a plurality of historical medical sentences of doctor-patient conversations and label the historical medical sentences to obtain corresponding label files;

a classification module 402, configured to input the historical medical sentences and the label files into a preset pre-training stack model, and perform relationship classification on each historical medical sentence through a first pre-training model in the pre-training stack model to obtain one or more medical knowledge relationships existing in each historical medical sentence;

a predicting module 403, configured to predict, through a second pre-training model in the pre-training stack model, two entity features associated with the one or more medical knowledge relationships in each historical medical statement, to obtain one or more triples in each historical medical statement;

a training module 404, configured to train the pre-training stack model according to the triples and the label files until the pre-training stack model converges, to obtain a multi-relationship medical knowledge extraction model;

the extracting module 405 is configured to obtain a medical statement of a medical knowledge relationship to be extracted, input the medical statement into the multi-relationship medical knowledge extraction model to perform medical knowledge relationship and entity feature extraction, and output one or more triples with multi-relationship medical knowledge in the medical statement.

Referring to fig. 5, another embodiment of the apparatus for extracting multi-relationship medical knowledge according to the embodiment of the present invention includes:

Specifically, the multi-relationship medical knowledge extraction apparatus further includes:

and the word segmentation module 406 is configured to perform secondary word segmentation processing on each historical medical statement through the input layer to obtain secondary coding information of each individual word in each historical medical statement.

Specifically, the classification module 402 includes:

a first extracting unit 4021, configured to input the secondary coding information into a first pre-training model in the pre-training stacked models, and extract one or more relationship features in the secondary coding information through the first pre-training model;

a matching unit 4022, configured to match the classification label corresponding to the one or more relationship features, and determine one or more medical knowledge relationships existing in each historical medical statement based on the classification label.

Specifically, the prediction module 403 includes:

a first combination unit 4031, configured to combine, according to the predicted medical knowledge relationships in each historical medical statement, the secondary coding information of that statement with each corresponding medical knowledge relationship, so as to obtain one or more training samples for each historical medical statement;

a second extracting unit 4032, configured to sequentially input the training samples into a second pre-training model in the pre-training stack model, and extract, through the second pre-training model, a plurality of entity features in the secondary coding information;

a screening unit 4033, configured to sequentially screen two entity features associated with each medical knowledge relationship from the multiple entity features, and sequentially combine each medical knowledge relationship and the associated two entity features to obtain one or more triples in each historical medical statement.

Specifically, the training module 404 includes:

a calculating unit 4041, configured to calculate a cross entropy loss value of the pre-training stack model according to the triples and the label file, and determine whether the cross entropy loss value is smaller than a preset loss threshold;

a judging unit 4042, configured to obtain a multi-relationship medical knowledge extraction model if the cross entropy loss value is smaller than a preset loss threshold, and train the pre-training stack model again if the cross entropy loss value is larger than the preset loss threshold;

and the circulating unit 4043 is configured to repeatedly invoke the calculating unit and the judging unit, stopping when the cross entropy loss value is smaller than the preset loss threshold or the number of training iterations exceeds a preset training count threshold, so as to obtain the multi-relationship medical knowledge extraction model.

Specifically, the computing unit is further configured to:

Specifically, the extraction module 405 includes:

the word segmentation unit 4051 is configured to input the medical statement into the input layer for secondary word segmentation processing, and encode the segmented medical statement to obtain its secondary coding information;

a relation extraction unit 4052, configured to input the secondary coding information into the relation extraction model, and extract a plurality of medical knowledge relations existing in the medical statement through the relation extraction model;

a second combining unit 4053, configured to combine the secondary coded information and the medical knowledge relationships to obtain a combined medical knowledge relationship;

the generating unit 4054 is configured to input the combined medical knowledge relationship into the entity extraction model, and sequentially extract the two entity features associated with each medical knowledge relationship through the entity extraction model to obtain a plurality of triples in the medical statement.

In this embodiment of the invention, relationship classification and entity extraction are performed in turn on historical medical sentences through the first pre-training model and the second pre-training model of the pre-training stack model: the two entity features associated with each medical knowledge relationship are extracted from the historical medical sentence, and one medical knowledge relationship combined with its two entity features forms a triple that describes the multi-relationship medical knowledge in the historical medical sentence. The extraction process of multi-relation medical knowledge is then described in detail: a medical statement input by the user is fed directly into the trained multi-relation knowledge extraction model, and the word segmentation tool, classification model and prediction model in that model separate the statement into a plurality of multi-relation, multi-entity triples.

Figs. 4 and 5 describe the multi-relationship medical knowledge extraction apparatus in the embodiment of the present invention in detail from the perspective of modular functional entities; the multi-relationship medical knowledge extraction device in the embodiment of the present invention is described in detail below from the perspective of hardware processing.

Fig. 6 is a schematic structural diagram of a multi-relationship medical knowledge extraction device 600 according to an embodiment of the present invention. The device may vary considerably depending on its configuration or performance, and may include one or more processors (CPUs) 610, a memory 620, and one or more storage media 630 (e.g., one or more mass storage devices) for storing applications 633 or data 632. The memory 620 and the storage medium 630 may be transient or persistent storage. The program stored in the storage medium 630 may include one or more modules (not shown), each of which may include a series of instruction operations for the multi-relationship medical knowledge extraction device 600. Further, the processor 610 may be configured to communicate with the storage medium 630 and to execute, on the multi-relationship medical knowledge extraction device 600, the series of instruction operations in the storage medium 630.

The multi-relationship medical knowledge extraction apparatus 600 may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input-output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the configuration of the multi-relational medical knowledge extraction apparatus shown in FIG. 6 does not constitute a limitation of the multi-relational medical knowledge extraction apparatus, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.

The present invention also provides a multi-relation medical knowledge extraction device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, cause the processor to perform the steps of the multi-relation medical knowledge extraction method in the above embodiments.

The present invention also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium, and which may also be a volatile computer readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the multi-relationship medical knowledge extraction method.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, each of which contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
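As a rough illustration of how cryptographically linked blocks make tampering detectable, the following sketch chains SHA-256 hashes over batches of stored records such as medical statements. It is a simplified hash chain only: it has no distributed storage, peer-to-peer transmission or consensus mechanism, and all names in it are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


@dataclass
class Block:
    index: int
    records: list[str]  # e.g. medical statements stored in this block
    prev_hash: str
    block_hash: str


def compute_hash(index: int, records: list[str], prev_hash: str) -> str:
    payload = json.dumps(
        {"index": index, "records": records, "prev_hash": prev_hash},
        ensure_ascii=False,
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: list[Block], records: list[str]) -> Block:
    """Link a new block to the hash of the previous one."""
    index = len(chain)
    prev_hash = chain[-1].block_hash if chain else GENESIS_PREV
    block = Block(index, list(records), prev_hash, compute_hash(index, records, prev_hash))
    chain.append(block)
    return block


def build_chain(batches: list[list[str]]) -> list[Block]:
    chain: list[Block] = []
    for batch in batches:
        append_block(chain, batch)
    return chain


def verify_chain(chain: list[Block]) -> bool:
    """Any edit to a stored record changes its hash and breaks the check."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1].block_hash if i > 0 else GENESIS_PREV
        if block.prev_hash != expected_prev:
            return False
        if block.block_hash != compute_hash(block.index, block.records, block.prev_hash):
            return False
    return True
```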

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A multi-relation medical knowledge extraction method is characterized by comprising the following steps:

2. The method for extracting multi-relationship medical knowledge according to claim 1, wherein the pre-trained stacked model further comprises an input layer, and after the inputting the historical medical sentences and the annotation files into the pre-trained stacked model, the method further comprises:

3. The method for extracting multi-relationship medical knowledge according to claim 2, wherein the obtaining one or more medical knowledge relationships existing in each historical medical statement by performing relationship classification on each historical medical statement through a first pre-trained model in the pre-trained stacked models comprises:

4. The method of claim 3, wherein predicting two entity features associated with the one or more medical knowledge relationships in each of the historical medical sentences by a second pre-trained model in the pre-trained stacked models to obtain one or more triples in each of the historical medical sentences comprises:

5. The method for extracting multi-relationship medical knowledge according to claim 1, wherein the training the pre-trained stacked model according to the triples and the labeled files until the pre-trained stacked model converges to obtain a multi-relationship medical knowledge extraction model comprises:

S2, if the cross entropy loss value is smaller than a preset loss threshold value, obtaining a multi-relation medical knowledge extraction model, and if it is larger than the preset loss threshold value, retraining the pre-training stacking model;

6. The method of claim 5, wherein the calculating the cross-entropy loss value of the pre-trained stack model based on the triples and the annotation file comprises:

7. The method for extracting multi-relational medical knowledge according to any one of claims 1 to 6, wherein the multi-relational medical knowledge extraction model comprises an input layer, a relation extraction model and an entity extraction model, the inputting the medical statement into the multi-relational medical knowledge extraction model for medical knowledge relation and entity feature extraction, and the outputting one or more triples with multi-relational medical knowledge in the medical statement comprises:

8. A multi-relational medical knowledge extraction apparatus, characterized by comprising:

9. A multi-relational medical knowledge extraction apparatus, characterized in that the multi-relational medical knowledge extraction apparatus comprises: a memory and at least one processor, the memory having instructions stored therein;

the at least one processor invokes the instructions in the memory to cause the multi-relational medical knowledge extraction device to perform the multi-relational medical knowledge extraction method of any one of claims 1-7.

10. A computer-readable storage medium, having stored thereon a computer program, wherein the computer program, when being executed by a processor, implements the method for multi-relationship medical knowledge extraction as defined in any one of claims 1 to 7.