Disclosure of Invention
In order to enhance the information richness of text representation, embodiments of the present invention provide a vector representation generation method and apparatus for medical text.
In a first aspect, an embodiment of the present invention provides a vector representation generation method for a medical text, including:
constructing single-granularity graphs of the medical text and a first adjacency matrix corresponding to each single-granularity graph; the single-granularity graphs comprise a character granularity graph, a word granularity graph and a term granularity graph, each single-granularity graph comprises a plurality of nodes and a plurality of edges, each edge is connected between two nodes, and the first adjacency matrix is determined based on the weight of each edge in the single-granularity graph corresponding to the first adjacency matrix;
based on the first adjacency matrix, performing multiple iteration operations on each single granularity graph by using a preset graph neural network model to obtain a first vector representation of each node in the single granularity graph; the first vector representation of each node in each single granularity graph is obtained by aggregating the vector representation of each node in the single granularity graph and the vector representation of each-order neighbor node;
splicing the first vector representation of each node in each single granularity graph to obtain a second vector representation of each node;
determining a second adjacency matrix of the multi-granularity graph based on the inclusion relation of the nodes of different single-granularity graphs; the multi-granularity graph is obtained by reconstructing all single-granularity graphs, the multi-granularity graph comprises all nodes in each single-granularity graph, and edges included in the multi-granularity graph are obtained on the basis of the inclusion relations of the nodes of different single-granularity graphs;
based on the second adjacency matrix, performing multiple iteration operations on the multi-granularity graph by using the graph neural network model to obtain target vector representations of all nodes in the multi-granularity graph; and the target vector representation of each node in the multi-granularity graph is obtained by aggregating the second vector representation of each node and the second vector representation of each order of neighbor nodes in the multi-granularity graph.
In one possible design, the constructing the single-granularity graphs of the medical text and the first adjacency matrix corresponding to each single-granularity graph includes:
respectively carrying out character segmentation processing, word segmentation processing and medical knowledge base matching on the medical text to obtain nodes of three types: characters, words and terms;
for each type of node, determining the edges connected between two nodes of the type;
constructing a single-granularity graph corresponding to the type of node based on the nodes of the same type and the edges connected between two nodes of the type;
for each type of node, determining the weight of each edge connecting two nodes of the type;
and constructing a first adjacency matrix corresponding to the type of node based on the weights of the edges connecting two nodes of the type.
In a possible design, the performing, based on the first adjacency matrix, multiple iteration operations on each single-granularity graph by using a preset graph neural network model to obtain a first vector representation of each node in the single-granularity graph includes:
determining the current vector representation of each node based on the last vector representation of each node in each single-granularity graph, the first adjacency matrix and a weight parameter matrix; the last vector representation of each node is obtained by performing the (t-1)-th iteration operation on the first adjacency matrix by using the preset graph neural network model, the current vector representation of each node is obtained by performing the t-th iteration operation on the first adjacency matrix by using the graph neural network model, t is a positive integer greater than 1, the last vector representation of each node comprises the vector representation of the node and the vector representations of its (t-1)-order neighbor nodes, and the current vector representation of each node comprises the vector representation of the node and the vector representations of its t-order neighbor nodes;
a first vector representation for each node is determined based on the current vector representation for each node.
In one possible design, the determining a first vector representation for each node based on a current vector representation for each node includes:
determining a feature matrix to be updated of each node based on the last vector representation of each node, the current vector representation of each node, an update gate weight parameter matrix, an update gate deviation parameter matrix and an update gate activation function;
determining a feature matrix to be forgotten of each node based on the last vector representation of each node, the current vector representation of each node, a forgetting gate weight parameter matrix, a forgetting gate deviation parameter matrix and a forgetting gate activation function;
determining a forgetting characteristic matrix of each node based on the last vector representation of each node, the current vector representation of each node, a to-be-forgotten characteristic matrix of each node, a forgetting weight parameter matrix and a forgetting deviation parameter matrix;
and determining the first vector representation of each node based on the forgetting feature matrix of each node, the feature matrix to be updated of each node and the last vector representation of each node.
In a possible design, the performing, based on the second adjacency matrix, multiple iteration operations on the multi-granularity graph by using the graph neural network model to obtain a target vector representation of each node in the multi-granularity graph includes:
determining the current second vector representation of each node based on the last second vector representation of each node in the multi-granularity graph, the second adjacency matrix and a weight parameter matrix; the last second vector representation of each node is obtained by performing the (t-1)-th iteration operation on the second adjacency matrix by using the graph neural network model, the current second vector representation of each node is obtained by performing the t-th iteration operation on the second adjacency matrix by using the graph neural network model, t is a positive integer greater than 1, the last second vector representation of each node comprises the second vector representation of the node and the second vector representations of its (t-1)-order neighbor nodes, and the current second vector representation of each node comprises the second vector representation of the node and the second vector representations of its t-order neighbor nodes;
and determining the target vector representation of each node according to the current second vector representation of each node.
In one possible design, the determining the target vector representation for each node based on the current second vector representation for each node includes:
determining a feature matrix to be updated of each node based on the last second vector representation of each node, the current second vector representation of each node, the updated gate weight parameter matrix, the updated gate deviation parameter matrix and the updated gate activation function;
determining a feature matrix to be forgotten of each node based on the last second vector representation of each node, the current second vector representation of each node, a forgetting gate weight parameter matrix, a forgetting gate deviation parameter matrix and a forgetting gate activation function;
determining a forgetting feature matrix of each node based on the last second vector representation of each node, the current second vector representation of each node, a to-be-forgotten feature matrix of each node, a forgetting weight parameter matrix and a forgetting deviation parameter matrix;
and determining the target vector representation of each node based on the forgetting feature matrix of each node, the feature matrix to be updated of each node and the last second vector representation of each node.
In one possible design, the determining a second adjacency matrix of the multi-granularity graph based on the inclusion relations of the nodes of different single-granularity graphs includes:
determining a first weight of an edge connected between nodes of different single-granularity graphs based on the inclusion relation of the nodes of the different single-granularity graphs;
obtaining a second weight of each edge in the multi-granularity graph based on the first weight of each edge in the multi-granularity graph and the cosine similarity of each node to determine a second adjacency matrix; wherein an entry in the second adjacency matrix is the second weight.
In a second aspect, an embodiment of the present invention further provides a vector representation generation apparatus for medical text, including:
the construction module is used for constructing single-granularity graphs of the medical text and a first adjacency matrix corresponding to each single-granularity graph; the single-granularity graphs comprise a character granularity graph, a word granularity graph and a term granularity graph, each single-granularity graph comprises a plurality of nodes and a plurality of edges, each edge is connected between two nodes, and the first adjacency matrix is determined based on the weight of each edge in the single-granularity graph corresponding to the first adjacency matrix;
the first iteration module is used for carrying out multiple iteration operations on each single granularity graph by using a preset graph neural network model based on the first adjacency matrix to obtain a first vector representation of each node in the single granularity graph; the first vector representation of each node in each single granularity graph is obtained by aggregating the vector representation of each node in the single granularity graph and the vector representation of each-order neighbor node;
the splicing module is used for splicing the first vector representation of each node in each single granularity graph to obtain the second vector representation of each node;
the determining module is used for determining a second adjacency matrix of the multi-granularity graph based on the inclusion relation of the nodes of different single-granularity graphs; the multi-granularity graph is obtained by reconstructing all single-granularity graphs, the multi-granularity graph comprises all nodes in each single-granularity graph, and edges included in the multi-granularity graph are obtained on the basis of the inclusion relations of the nodes of different single-granularity graphs;
the second iteration module is used for carrying out multiple iteration operations on the multi-granularity graph by utilizing the graph neural network model based on the second adjacency matrix to obtain target vector representations of all nodes in the multi-granularity graph; and the target vector representation of each node in the multi-granularity graph is obtained by aggregating the second vector representation of each node and the second vector representation of each order of neighbor nodes in the multi-granularity graph.
In a third aspect, an embodiment of the present invention further provides a computing device, including a memory and a processor, where the memory stores a computer program, and the processor, when executing the computer program, implements the method described in any one of the above.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed in a computer, the computer program causes the computer to execute any one of the methods described above.
The embodiments of the invention provide a vector representation generation method and apparatus for medical text. Single-granularity graphs carrying different granularity information are constructed, and a preset graph neural network model performs multiple iteration operations on each single-granularity graph to obtain a first vector representation of each node in the single-granularity graph, completing message passing within each graph so that the semantic information among nodes of the same granularity graph becomes richer. The first vector representations of each node in the single-granularity graphs are then spliced to obtain a second vector representation of each node, a second adjacency matrix of the multi-granularity graph is determined based on the inclusion relations of the nodes of different single-granularity graphs, and multiple iteration operations are performed on the multi-granularity graph by using the graph neural network model to obtain the target vector representations of all nodes in the multi-granularity graph. This completes message passing between graphs and realizes interactive fusion of the first vector representations of nodes across granularity graphs, thereby enhancing the information richness of the text representation and improving the robustness of medical text representation.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be described below with reference to the drawings. It is obvious that the described embodiments are some, but not all, embodiments of the present invention; based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort belong to the scope of the present invention.
For natural language processing tasks in a particular domain, there is a wealth of high-quality expert knowledge, such as medical terminology in the medical domain. This professional knowledge takes forms including dictionary information, semantic relation information, descriptions in Wikipedia, and the like, and may be referred to as domain prior knowledge. By learning such prior knowledge, humans can understand the semantics of natural language in the corresponding field more deeply, and machine learning models can likewise better understand the semantics of the task.
From the perspective of semantic granularity, knowledge can be divided into fine-grained knowledge and coarse-grained knowledge. In the medical field, for example, fine-grained knowledge often exists in the form of single characters or words, such as "kidney" or "blood sugar", which imply site information and disease types. More complete terms, such as names of procedures, symptoms and signs, can be regarded as coarse-grained knowledge.
In the process of understanding the semantics of a whole text, knowledge of different granularities provides information at different levels. Characters are the smallest semantic units and thus the finest granularity of knowledge; analyzing natural text in units of characters covers the expressed semantics over a wide range. The coarser the granularity, e.g., understanding text at the sentence or chapter level, the more completely and accurately the emphasis and core message of the text is conveyed. Many studies consider knowledge at character and word granularity, and pre-trained language models such as ELMo, BERT and XLNet, trained on large-scale corpora, are typical representatives of fine-grained information learning and exhibit strong performance on multiple tasks. However, in professional domains there are many uncommon expressions and technical terms whose semantic granularity is coarser than characters and words, and this knowledge is discarded when modeling at the word level. For Chinese natural language processing tasks this problem is even more common because of Chinese word segmentation errors: irregular expressions in professional fields are more easily cut apart by word segmentation tools. As a result, the semantic representation of coarse-grained information such as professional terms is lost during feature extraction.
Moreover, knowledge of different granularities can complement each other, because coarse-grained information is composed of fine-grained information and their semantics are interrelated. Some studies have introduced coarser-grained word information into character-based models to make up for the missing information: if a model is based only on characters, word information is discarded, while the additional coarse-grained information yields a more accurate text representation. Furthermore, if knowledge of even coarser granularity can be involved, more accurate semantics can be obtained. In turn, fine-grained information also enriches coarse-grained information.
In general, prior knowledge in a professional domain is of great help in understanding natural language. Knowledge of different granularities provides information at different levels, and the granularities complement one another, so learning knowledge at various granularities well plays an important role in natural language understanding. However, although a large amount of prior knowledge has been accumulated in professional domains, it has not been fully exploited in existing models: in research using prior knowledge, coarse-grained knowledge is more easily discarded in the modeling process, and only part of the fine-grained information is effectively utilized.
In order to solve this technical problem, knowledge of different granularities can be fused, so as to enhance the information richness of the text representation and improve the robustness of medical text representation.
Specific implementations of the above concepts are described below.
Referring to fig. 1, an embodiment of the present invention provides a method for generating a vector representation of a medical text, including:
step 100: constructing single-granularity graphs of the medical text and a first adjacency matrix corresponding to each single-granularity graph; the single-granularity graphs comprise a character granularity graph, a word granularity graph and a term granularity graph, each single-granularity graph comprises a plurality of nodes and a plurality of edges, each edge is connected between two nodes, and the first adjacency matrix is determined based on the weight of each edge in the single-granularity graph corresponding to the first adjacency matrix;
step 102: based on the first adjacency matrix, performing multiple iteration operations on each single granularity graph by using a preset graph neural network model to obtain a first vector representation of each node in the single granularity graph; the first vector representation of each node in each single granularity graph is obtained by aggregating the vector representation of each node in the single granularity graph and the vector representation of each-order neighbor node;
step 104: splicing the first vector representation of each node in each single granularity graph to obtain a second vector representation of each node;
step 106: determining a second adjacency matrix of the multi-granularity graph based on the inclusion relations of the nodes of different single-granularity graphs; the multi-granularity graph is obtained by reconstructing all single-granularity graphs, the multi-granularity graph comprises all nodes in each single-granularity graph, and the edges included in the multi-granularity graph are obtained on the basis of the inclusion relations of the nodes of different single-granularity graphs;
step 108: based on the second adjacency matrix, performing multiple iteration operations on the multi-granularity graph by using the graph neural network model to obtain target vector representations of all nodes in the multi-granularity graph; the target vector representation of each node in the multi-granularity graph is obtained by aggregating the second vector representation of the node and the second vector representations of its neighbor nodes of each order in the multi-granularity graph.
In the embodiment of the invention, single-granularity graphs carrying different granularity information are constructed, and multiple iteration operations are performed on each single-granularity graph by using a preset graph neural network model to obtain the first vector representation of each node in the single-granularity graph, completing message passing within each graph so that the semantic information among nodes of the same granularity graph becomes richer. The first vector representations of each node in the single-granularity graphs are then spliced to obtain a second vector representation of each node, a second adjacency matrix of the multi-granularity graph is determined based on the inclusion relations of the nodes of different single-granularity graphs, and multiple iteration operations are performed on the multi-granularity graph by using the graph neural network model to obtain the target vector representations of all nodes in the multi-granularity graph. This completes message passing between graphs and realizes interactive fusion of the first vector representations of nodes across granularity graphs, thereby enhancing the information richness of the text representation and improving the robustness of medical text representation.
Each step in fig. 1 is explained below.
With respect to step 100, in some embodiments, step 100 may specifically include the following steps:
Step A1, performing character segmentation processing, word segmentation processing and medical knowledge base matching on the medical text respectively, to obtain nodes of three types: characters, words and terms.
In this step, character segmentation processing means processing the medical text in units of single characters; for example, character segmentation of "lumbar degenerative disease" yields the characters "lumbar", "vertebral", "back", "line", "sex", "disease" and "change". Word segmentation processing may use the Jieba word segmentation toolkit to process the medical text; for example, word segmentation of "lumbar degenerative disease" yields "lumbar", "degenerative", "venereal disease" and "lesion" (a typical segmentation error). Medical knowledge base matching may match the medical text against a preset medical knowledge base; for example, matching "lumbar degenerative disease" against the medical knowledge base yields the term "lumbar degenerative disease".
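As an illustration of step A1, the sketch below builds the three node types from a Chinese medical text. It is a minimal sketch, not the embodiment's implementation: it assumes the Jieba toolkit is available for word segmentation, and a small in-code set stands in for the preset medical knowledge base.

```python
import jieba  # Chinese word segmentation toolkit, assumed installed (pip install jieba)

# Toy stand-in for the preset medical knowledge base of terms.
MEDICAL_KB = {"腰椎退行性病变"}  # "lumbar degenerative disease"

def build_nodes(text):
    chars = list(text)                             # character segmentation: one node per character
    words = jieba.lcut(text)                       # word segmentation
    terms = [t for t in MEDICAL_KB if t in text]   # medical knowledge base matching
    return chars, words, terms

print(build_nodes("腰椎退行性病变"))
# chars: ['腰', '椎', '退', '行', '性', '病', '变'], terms: ['腰椎退行性病变']
```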
Step A2, for each type of node, determining the edges connected between two nodes of the type.
In this step, the co-occurrence relationship between different nodes may be determined by sliding windows, pointwise mutual information (PMI), or cosine similarity. In some embodiments, the co-occurrence relationship between nodes of the character type is determined with a sliding window, the co-occurrence relationship between nodes of the word type is determined with a sliding window or PMI, and the co-occurrence relationship between nodes of the term type is determined with cosine similarity.
Taking PMI as an example, if the PMI value between two nodes is 0, it can be considered that there is no co-occurrence relationship between the two nodes, i.e., there is no edge; if the PMI value between two nodes is greater than 0, an edge can be constructed between the two nodes, with the PMI value as the weight of the edge.
Taking cosine similarity as an example, if the cosine similarity between two nodes is 0, it can be considered that there is no co-occurrence relationship between the two nodes, i.e., there is no edge; if the cosine similarity between two nodes is greater than 0, an edge can be constructed between the two nodes, with the cosine similarity as the weight of the edge.
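The following sketch shows one way PMI-weighted edges could be computed for a list of tokens; the window size and the exact counting scheme are assumptions for illustration, not prescribed by the embodiment.

```python
from collections import Counter
from itertools import combinations
from math import log

def pmi_edges(tokens, window=3):
    """Build weighted edges: keep an edge when PMI > 0, using PMI as the weight."""
    spans = [tokens[i:i + window] for i in range(max(1, len(tokens) - window + 1))]
    n = len(spans)
    single = Counter(tok for s in spans for tok in set(s))            # window counts per token
    pair = Counter(frozenset(p) for s in spans for p in combinations(set(s), 2))
    edges = {}
    for p, c in pair.items():
        a, b = sorted(p)
        pmi = log((c / n) / ((single[a] / n) * (single[b] / n)))      # PMI(a, b)
        if pmi > 0:
            edges[(a, b)] = pmi
    return edges
```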
Step A3, constructing a single-granularity graph corresponding to the type of node based on the nodes of the same type and the edges connected between two nodes of the type.
Step A4, for each type of node, determining the weights of the edges between different nodes of the type.
Determining edge weights with a sliding window is well known to those skilled in the art and is not described herein. For PMI and cosine similarity, refer to step A2.
Step A5, constructing a first adjacency matrix corresponding to the type of node based on the weights of the edges between different nodes of the type.
In this step, if the current single-granularity graph includes M nodes, where M is a positive integer greater than or equal to 1, an M × M first adjacency matrix may be generated, and the value of each entry in the first adjacency matrix corresponds to the weight of the edge between the corresponding pair of nodes.
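Composing this with the PMI sketch above, a first adjacency matrix for one single-granularity graph could be assembled as follows (numpy assumed; the node ordering is arbitrary but fixed):

```python
import numpy as np

def build_first_adjacency(nodes, edges):
    """M x M first adjacency matrix whose entries are the edge weights."""
    index = {node: i for i, node in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    for (a, b), w in edges.items():
        A[index[a], index[b]] = w
        A[index[b], index[a]] = w  # co-occurrence edges are treated as undirected here
    return A

# Usage, reusing pmi_edges from the sketch above:
tokens = ["腰椎", "退行性", "病变", "腰椎", "退行性"]
A1 = build_first_adjacency(sorted(set(tokens)), pmi_edges(tokens))
```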
For step 102, the graph neural network model may be a gated graph neural network (GGNN) model or a graph convolutional network (GCN) model, which is not specifically limited herein. The number of iteration operations can be set according to actual requirements; each iteration operation aggregates the vector representations of first-order neighbor nodes, so multiple iteration operations aggregate the features of multi-order neighbor nodes. For example, the first iteration operation may aggregate the vector representation of each node itself and the vector representations of its first-order neighbor nodes; the second iteration operation may additionally aggregate the vector representations of each node's second-order neighbor nodes on the basis of the first; and so on. Thus, through multiple iteration operations, the first vector representation of each node in each single-granularity graph can be obtained.
In some embodiments, step 102 may specifically include the following steps:
Step B1, determining the current vector representation of each node based on the last vector representation of each node in each single-granularity graph, the first adjacency matrix and the weight parameter matrix.
In this step, the last vector representation of each node is obtained by performing the (t-1)-th iteration operation on the first adjacency matrix by using the preset graph neural network model, and the current vector representation of each node is obtained by performing the t-th iteration operation on the first adjacency matrix by using the graph neural network model, where t is a positive integer greater than 1; the last vector representation of each node comprises the vector representation of the node and the vector representations of its (t-1)-order neighbor nodes, and the current vector representation of each node comprises the vector representation of the node and the vector representations of its t-order neighbor nodes.
In some embodiments, the current vector representation for each node may be determined according to the following formula:
$a^{(t)} = A H^{(t-1)} W_t$
where $a^{(t)}$ represents the current vector representation, $A$ represents the first adjacency matrix, $H^{(t-1)}$ represents the last vector representation, and $W_t$ represents the weight parameter matrix.
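In matrix form this step is a single multiplication chain. A minimal numpy sketch (the right-multiplication convention and the toy shapes are illustrative assumptions):

```python
import numpy as np

def propagate(A, H_prev, W_t):
    """Compute the current vector representation a^(t) = A · H^(t-1) · W_t."""
    return A @ H_prev @ W_t

M, d = 7, 16                       # 7 nodes, 16-dimensional representations
A = np.random.rand(M, M)           # first adjacency matrix (toy values)
H_prev = np.random.rand(M, d)      # last vector representation H^(t-1)
W_t = np.random.rand(d, d)         # weight parameter matrix
a_t = propagate(A, H_prev, W_t)    # shape (M, d)
```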
Step B2 determines a first vector representation for each node based on the current vector representation for each node.
In some embodiments, step B2 may specifically include the following steps:
and step B21, determining a feature matrix to be updated of each node based on the last vector representation of each node, the current vector representation of each node, the updated gate weight parameter matrix, the updated gate deviation parameter matrix and the updated gate activation function.
In the step, in order to depict different importance influences of vector representation of neighbor nodes of different orders on vector representation of the central node, a gating mechanism, namely an update gate and a forget gate, can be added on the basis of the graph neural network model.
In some embodiments, the feature matrix to be updated of each node may be determined according to the following formula:
z(t)=σ1(Wzat+UzH(t-1)+bz)
in the formula, z(t)Representing the feature matrix to be updated, σ1Indicating an update gate activation function, WzAnd UzRepresenting an updated gate weight parameter matrix, bzIndicating an updated gate bias parameter matrix. The updated gate weight parameter matrix and the updated gate deviation parameter matrix can control whether the vector representation and the adoption degree of the neighbor nodes are adopted or not.
Step B22, determining a feature matrix to be forgotten of each node based on the last vector representation of each node, the current vector representation of each node, a forgetting gate weight parameter matrix, a forgetting gate deviation parameter matrix and a forgetting gate activation function.
In this step, the feature matrix to be forgotten of each node may be determined according to the following formula:
$r^{(t)} = \sigma_2(W_r a^{(t)} + U_r H^{(t-1)} + b_r)$
where $r^{(t)}$ represents the feature matrix to be forgotten, $\sigma_2$ represents the forgetting gate activation function, $W_r$ and $U_r$ represent the forgetting gate weight parameter matrices, and $b_r$ represents the forgetting gate deviation parameter matrix. The forgetting gate weight parameter matrices and the forgetting gate deviation parameter matrix control whether the vector representations of neighbor nodes are adopted.
Step B23, determining a forgetting feature matrix of each node based on the last vector representation of each node, the current vector representation of each node, the feature matrix to be forgotten of each node, a forgetting weight parameter matrix and a forgetting deviation parameter matrix.
In this step, the forgetting feature matrix of each node may be determined according to the following formula:
$\tilde{H}^{(t)} = \tanh\big(W_a a^{(t)} + U_a (r^{(t)} \odot H^{(t-1)}) + b_h\big)$
where $\tilde{H}^{(t)}$ represents the forgetting feature matrix, $W_a$ and $U_a$ represent the forgetting weight parameter matrices, $b_h$ represents the forgetting deviation parameter matrix, and $\odot$ denotes the Hadamard product.
Step B24, determining the first vector representation of each node based on the forgetting feature matrix of each node, the feature matrix to be updated of each node and the last vector representation of each node.
In this step, a first vector representation of each node may be determined according to the following formula:
$H^{(t)} = (1 - z^{(t)}) \odot H^{(t-1)} + z^{(t)} \odot \tilde{H}^{(t)}$
where $H^{(t)}$ represents the first vector representation.
In this embodiment, the formula for the first vector representation weights the forgetting feature matrix $\tilde{H}^{(t)}$ output by the forgetting gate with the feature matrix to be updated $z^{(t)}$, and weights the last vector representation $H^{(t-1)}$ with $(1 - z^{(t)})$. In this way, the graph neural network model has a controllable choice of whether, and to what degree, to update the current vector representation while retaining the last vector representation, and oscillation of the graph neural network model can be prevented.
After step 102, intra-graph information aggregation for each single-granularity graph is complete. In general, however, characters, words and terms of different granularities all carry semantics at different levels. Taking the Chinese medical knowledge base matching task as an example, the character-based model is the finest-granularity model and shows superior performance compared with the word-based model; however, this does not mean that coarse-grained word information has no positive impact on entity identification. Therefore, after intra-graph information aggregation is completed, information aggregation between the single-granularity graphs (i.e., inter-graph information aggregation, which comprises steps 104 to 108) is necessary. In this way, information of different granularities can promote and supplement each other, enhancing the information richness of the text representation and improving the robustness of medical text representation.
For step 104, the first vector representations of each node in the single-granularity graphs are spliced; this may be implemented as a superposition of the first vector representations of each node in the single-granularity graphs, or as averaging after superposition, which is not specifically limited herein.
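A minimal sketch of this step under both readings of the text (pure superposition, or averaging after superposition); the 8-dimensional vectors are arbitrary toy values:

```python
import numpy as np

h_char = np.random.rand(8)   # first vector representation of a node in the character graph
h_word = np.random.rand(8)   # ... in the word graph
h_term = np.random.rand(8)   # ... in the term graph

h_second = h_char + h_word + h_term      # superposition
h_second_avg = h_second / 3.0            # averaging after superposition
```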
As for step 106, the following steps may be specifically included:
step C1, determining a first weight of an edge connecting between nodes of different single granularity maps based on the inclusion relationship of the nodes of the different single granularity maps.
In this step, for example, one piece of medical text is "patient presents sore throat, stuffy nose and running nose, no fever", nodes in the character particle size diagram are "patient", "person", "present", "sore throat", "pain", "nose", "plug", "running nose", "no", "fever" and "heat", nodes in the word particle size diagram are "patient", "present", "sore throat", "stuffy nose", "running nose", "no fever", for example, the inclusion relationship of "patient" and "patient" is 1, the inclusion relationship of "person" and "patient" is 0, and so on. Where 1 and 0 are the first weights connecting the edges between two nodes. The term first weight of an edge between a node of a granularity map and a node of another granularity map is similar to that described above and is not described in detail herein.
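Under the assumption that the inclusion relation is substring containment of the finer-grained node in the coarser-grained one (the embodiment's exact criterion may differ), the first weights could be computed as:

```python
def inclusion_weight(fine_node, coarse_node):
    """First weight of a cross-granularity edge: 1 if the finer-grained
    node is contained in the coarser-grained node, otherwise 0."""
    return 1 if fine_node in coarse_node else 0

assert inclusion_weight("患", "患者") == 1   # character "patient" is in the word "patient"
assert inclusion_weight("热", "鼻塞") == 0   # "heat" is not in "stuffy nose"
```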
Step C2, obtaining a second weight of each edge in the multi-granularity graph based on the first weight of each edge in the multi-granularity graph and the cosine similarity of each node to determine a second adjacency matrix; wherein the entries in the second adjacency matrix are second weights.
In this step, assume the first weight $a_{ij}$ in the i-th row and j-th column of the multi-granularity graph is 1; the cosine similarity between the nodes in the i-th row and the j-th column is then determined by the following formula:
$g_{ij} = \mathrm{sim}\big((V_{i-1} + V_i + V_{i+1})/3,\ V_j\big)$
where $g_{ij}$ is the cosine similarity between the nodes in the i-th row and the j-th column, sim is the cosine similarity function, $V_i$ is the one-dimensional vector of first weights of the node in the i-th row, and $V_j$ is the one-dimensional vector of first weights of the node in the j-th column.
In some embodiments, the second weight is a Hadamard product of the first weight of each edge and the cosine similarity of each node.
In this embodiment, the first weight of each edge in the multi-granularity graph is updated (i.e., a Hadamard product is taken with the cosine similarity of each node), so that the resulting second weight of each edge in the multi-granularity graph better reflects the true weight distribution, which facilitates obtaining a more accurate vector representation.
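The sketch below combines steps C1 and C2: rows of the first-weight matrix serve as the one-dimensional first-weight vectors $V_i$, the three-row average follows the $g_{ij}$ formula (with simple clipping at the matrix boundary, an assumption), and the second adjacency matrix is the Hadamard product of first weights and cosine similarities.

```python
import numpy as np

def second_adjacency(A1):
    """Second adjacency matrix from the first-weight matrix A1 (M x M)."""
    M = A1.shape[0]
    G = np.zeros_like(A1, dtype=float)
    for i in range(M):
        lo, hi = max(0, i - 1), min(M - 1, i + 1)
        ctx = A1[lo:hi + 1].mean(axis=0)        # (V_{i-1} + V_i + V_{i+1}) / 3
        for j in range(M):
            denom = np.linalg.norm(ctx) * np.linalg.norm(A1[j])
            G[i, j] = ctx @ A1[j] / denom if denom > 0 else 0.0  # cosine similarity g_ij
    return A1 * G                                # Hadamard product -> second weights

A1 = np.array([[0, 1, 1],
               [1, 0, 0],
               [1, 0, 0]], dtype=float)
A2 = second_adjacency(A1)
```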
For example, "lumbar degenerative disease" is a disease name, and the word "sexual" is among and exhibits different meanings from the words "degenerative" and "venereal disease". It is clear that in the present context, the semantics of the word "sex" here is mainly determined by "degenerative" rather than "venereal disease". Through inter-graph information fusion, information transmission can be achieved, so that characters can be controlled to receive more information from 'degenerative' rather than 'venereal disease', namely the weight of the side of 'sex' and 'row' is greater than that of the side of 'sex' and 'disease'.
Specifically, step 108 may include the following steps:
Step D1, determining the current second vector representation of each node based on the last second vector representation of each node in the multi-granularity graph, the second adjacency matrix and the weight parameter matrix.
In this step, the last second vector representation of each node is obtained by performing the (t-1)-th iteration operation on the second adjacency matrix by using the graph neural network model, and the current second vector representation of each node is obtained by performing the t-th iteration operation on the second adjacency matrix by using the graph neural network model, where t is a positive integer greater than 1; the last second vector representation of each node comprises the second vector representation of the node and the second vector representations of its (t-1)-order neighbor nodes, and the current second vector representation of each node comprises the second vector representation of the node and the second vector representations of its t-order neighbor nodes.
Step D2, determining a target vector representation for each node based on the current second vector representation for each node.
In some embodiments, step D2 may specifically include the following steps:
determining a feature matrix to be updated of each node based on the last second vector representation of each node, the current second vector representation of each node, the updated gate weight parameter matrix, the updated gate deviation parameter matrix and the updated gate activation function;
determining a feature matrix to be forgotten of each node based on the last second vector representation of each node, the current second vector representation of each node, a forgetting gate weight parameter matrix, a forgetting gate deviation parameter matrix and a forgetting gate activation function;
determining a forgetting feature matrix of each node based on the last second vector representation of each node, the current second vector representation of each node, a to-be-forgotten feature matrix of each node, a forgetting weight parameter matrix and a forgetting deviation parameter matrix;
and determining the target vector representation of each node based on the forgetting feature matrix of each node, the feature matrix to be updated of each node and the last second vector representation of each node.
It is understood that the operation of step 108 is similar to that of step 102, and the related contents refer to the above description of step 102, which is not repeated herein.
To further illustrate the inventive concept of the present invention, two exemplary application scenarios of medical tasks are illustrated below.
(1) Medical text sequence tagging task
The task is to identify diagnoses, medications, symptoms, etc. in Chinese electronic medical records (EMRs), which is a key step in converting textual electronic medical records into structured knowledge. We performed experiments on a Chinese electronic medical record NER dataset provided by a cooperating hospital, which was labeled manually by clinicians and contains 2506 medical record texts and 11 medical term categories. The length of each text segment does not exceed 100 words, and the texts are divided into a training set and a test set at a ratio of 8:2.
The model achieves high performance on four metrics: accuracy, precision, recall and F1 score. The accuracy of the multi-granularity model with local PMI connections reaches 86.68%, and the accuracy of the multi-granularity model with global PMI connections reaches 87.07%.
To verify the effectiveness of knowledge at each single granularity, ablation experiments were carried out: experiments with the single granularities of characters, words and terms, and experiments with pairwise combinations of two granularities. The results show that among the single-granularity models, the word-level model performs best, with an accuracy of 85.94%. Among the results integrating knowledge of two granularities, the combination of word and term granularity is best, with an accuracy of 86.45%.
(2) Auxiliary diagnosis task.
The auxiliary diagnosis task is also called the disease prediction task. Diagnosis is the core of medical treatment: it is given by a doctor after listening to the patient describe the condition and analyzing the symptoms. This process is time-consuming and relies on the physician's command of medical knowledge, and assisted diagnosis aims to help the physician make better decisions. To this end, we collected the chief complaints and present medical histories in electronic medical records and put them together to generate samples. The label of each sample is the diagnosis in the electronic medical record, and only one diagnosis is assigned to each sample, yielding 12439 samples and 153 common disease diagnoses.
With the classical baseline model of BiLSTM + CRF, the best micro-F1 was 75.49%. Among approaches that attempt to integrate more word information into the character representation, Lattice LSTM and Soft Lexicon gave similar best performance, 79.27% and 79.22%, respectively. The F1 value of our method reached 80.26%. Moreover, both the word-merging methods and our method greatly improve recall, because they introduce additional word and term information.
To verify that each granularity has a positive impact on the final performance, we performed ablation experiments on a character-only granularity model and a character-word granularity model. The micro-F1 value of the character-level graph model was 76.66%. In contrast to Bi-LSTM, the character granularity graph adds graph-based aggregation between the character embedding layer and the Bi-LSTM layer, effectively improving the character representation. The character-word granularity graphs use the same inter-graph aggregation method to update the final character representation and increase the F1 value by 1.07%. The terminology information also proves effective, because the F1 score increases from 77.73% to 80.26% with the three-granularity graph. Thus, knowledge at each granularity carries its own semantic information, and combining them represents the knowledge better.
As shown in fig. 2 and fig. 3, an embodiment of the present invention provides a vector representation generation apparatus for medical text. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. From the hardware level, fig. 2 is a hardware architecture diagram of a computing device in which a vector representation generation apparatus for medical text provided by an embodiment of the present invention is located. In addition to the processor, the memory, the network interface, and the non-volatile memory shown in fig. 2, the computing device in which the apparatus is located may generally include other hardware, such as a forwarding chip responsible for processing packets. Taking a software implementation as an example, as shown in fig. 3, the apparatus, as a logical apparatus, is formed by the CPU of the computing device in which it is located reading the corresponding computer program from the non-volatile memory into the memory and running it.
As shown in fig. 3, the present embodiment provides a vector representation generating apparatus for medical text, including:
a construction module 300 for constructing single-granularity graphs of the medical text and a first adjacency matrix corresponding to each single-granularity graph; the single-granularity graphs comprise a character granularity graph, a word granularity graph and a term granularity graph, each single-granularity graph comprises a plurality of nodes and a plurality of edges, each edge is connected between two nodes, and the first adjacency matrix is determined based on the weight of each edge in the single-granularity graph corresponding to the first adjacency matrix;
a first iteration module 302, configured to perform, based on the first adjacency matrix, multiple iteration operations on each single-granularity graph by using a preset graph neural network model to obtain a first vector representation of each node in the single-granularity graph; the first vector representation of each node in each single-granularity graph is obtained by aggregating the vector representation of each node in the single-granularity graph and the vector representations of its neighbor nodes of each order;
a splicing module 304, configured to splice the first vector representation of each node in each single-granularity graph to obtain a second vector representation of each node;
a determining module 306, configured to determine a second adjacency matrix of the multi-granularity graph based on the inclusion relations of the nodes of different single-granularity graphs; the multi-granularity graph is obtained by reconstructing all single-granularity graphs, the multi-granularity graph comprises all nodes in each single-granularity graph, and the edges included in the multi-granularity graph are obtained on the basis of the inclusion relations of the nodes of different single-granularity graphs;
a second iteration module 308, configured to perform, based on the second adjacency matrix, multiple iteration operations on the multi-granularity graph by using the graph neural network model to obtain a target vector representation of each node in the multi-granularity graph; and the target vector representation of each node in the multi-granularity graph is obtained by aggregating the second vector representation of each node and the second vector representation of each order of neighbor nodes in the multi-granularity graph.
In an embodiment of the present invention, the building module 300 may be configured to perform step 100 in the above-described method embodiment, the first iteration module 302 may be configured to perform step 102 in the above-described method embodiment, the splicing module 304 may be configured to perform step 104 in the above-described method embodiment, the determining module 306 may be configured to perform step 106 in the above-described method embodiment, and the second iteration module 308 may be configured to perform step 108 in the above-described method embodiment.
In an embodiment of the present invention, the building module 300 is configured to perform the following operations:
respectively carrying out character segmentation processing, word segmentation processing and medical knowledge base matching on the medical text to obtain nodes of three types: characters, words and terms;
for each type of node, determining the edges connected between two nodes of the type;
constructing a single-granularity graph corresponding to the type of node based on the nodes of the same type and the edges connected between two nodes of the type;
for each type of node, determining the weight of each edge connecting two nodes of the type;
and constructing a first adjacency matrix corresponding to the type of node based on the weights of the edges connecting two nodes of the type.
In an embodiment of the present invention, the first iteration module 302 is configured to perform the following operations:
determining the current vector representation of each node based on the last vector representation of each node in each single-granularity graph, the first adjacency matrix and the weight parameter matrix; the last vector representation of each node is obtained by performing the (t-1)-th iteration operation on the first adjacency matrix by using a preset graph neural network model, the current vector representation of each node is obtained by performing the t-th iteration operation on the first adjacency matrix by using the graph neural network model, t is a positive integer greater than 1, the last vector representation of each node comprises the vector representation of the node and the vector representations of its (t-1)-order neighbor nodes, and the current vector representation of each node comprises the vector representation of the node and the vector representations of its t-order neighbor nodes;
a first vector representation for each node is determined based on the current vector representation for each node.
In an embodiment of the present invention, the first iteration module 302, when performing the determining the first vector representation of each node according to the current vector representation of each node, is configured to perform the following operations:
determining a feature matrix to be updated of each node based on the last vector representation of each node, the current vector representation of each node, an update gate weight parameter matrix, an update gate deviation parameter matrix and an update gate activation function;
determining a feature matrix to be forgotten of each node based on the last vector representation of each node, the current vector representation of each node, a forgetting gate weight parameter matrix, a forgetting gate deviation parameter matrix and a forgetting gate activation function;
determining a forgetting characteristic matrix of each node based on the last vector representation of each node, the current vector representation of each node, a to-be-forgotten characteristic matrix of each node, a forgetting weight parameter matrix and a forgetting deviation parameter matrix;
and determining the first vector representation of each node based on the forgetting feature matrix of each node, the feature matrix to be updated of each node and the last vector representation of each node.
In an embodiment of the present invention, the second iteration module 308 is configured to perform the following operations:
determining the current second vector representation of each node based on the last second vector representation of each node in the multi-granularity graph, the second adjacency matrix and the weight parameter matrix; the last second vector representation of each node is obtained by performing the (t-1)-th iteration operation on the second adjacency matrix by using the graph neural network model, the current second vector representation of each node is obtained by performing the t-th iteration operation on the second adjacency matrix by using the graph neural network model, t is a positive integer greater than 1, the last second vector representation of each node comprises the second vector representation of the node and the second vector representations of its (t-1)-order neighbor nodes, and the current second vector representation of each node comprises the second vector representation of the node and the second vector representations of its t-order neighbor nodes;
and determining the target vector representation of each node according to the current second vector representation of each node.
In an embodiment of the present invention, the second iteration module 308, when performing the determining of the target vector representation of each node according to the current second vector representation of each node, is configured to perform the following operations:
determining a feature matrix to be updated of each node based on the last second vector representation of each node, the current second vector representation of each node, the updated gate weight parameter matrix, the updated gate deviation parameter matrix and the updated gate activation function;
determining a feature matrix to be forgotten of each node based on the last second vector representation of each node, the current second vector representation of each node, a forgetting gate weight parameter matrix, a forgetting gate deviation parameter matrix and a forgetting gate activation function;
determining a forgetting feature matrix of each node based on the last second vector representation of each node, the current second vector representation of each node, a to-be-forgotten feature matrix of each node, a forgetting weight parameter matrix and a forgetting deviation parameter matrix;
and determining the target vector representation of each node based on the forgetting feature matrix of each node, the feature matrix to be updated of each node and the last second vector representation of each node.
In an embodiment of the present invention, the determining module 306 is configured to perform the following operations:
determining a first weight of an edge connected between nodes of different single-granularity graphs based on the inclusion relation of the nodes of the different single-granularity graphs;
obtaining a second weight of each edge in the multi-granularity graph based on the first weight of each edge in the multi-granularity graph and the cosine similarity of each node to determine a second adjacency matrix; wherein an entry in the second adjacency matrix is the second weight.
It is to be understood that the illustrated structure of the embodiment of the present invention does not constitute a specific limitation to a vector representation generating apparatus for medical texts. In further embodiments of the present invention, a vector representation generating apparatus for medical text may include more or fewer components than those shown, or combine certain components, or split certain components, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Because the content of information interaction, execution process, and the like among the modules in the device is based on the same concept as the method embodiment of the present invention, specific content can be referred to the description in the method embodiment of the present invention, and is not described herein again.
The embodiment of the present invention further provides a computing device, which includes a memory and a processor, where the memory stores a computer program, and when the processor executes the computer program, the method for generating a vector representation of a medical text in any embodiment of the present invention is implemented.
Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, causes the processor to execute a method for generating a vector representation of a medical text according to any of the embodiments of the present invention.
Specifically, a system or an apparatus equipped with a storage medium on which software program codes that realize the functions of any of the above-described embodiments are stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program codes stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD + RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion module connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion module to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an …" does not exclude the presence of other similar elements in a process, method, article, or apparatus that comprises the element.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.