Disclosure of Invention
In order to enhance the information richness of text characterization, the embodiment of the invention provides a method and a device for generating vector representation of medical text.
In a first aspect, an embodiment of the present invention provides a method for generating a vector representation of a medical text, including:
constructing a single granularity map of the medical text and a first adjacency matrix corresponding to each single granularity map; wherein the single granularity map comprises a character granularity map, a word granularity map and a term granularity map, each single granularity map comprises a plurality of nodes and a plurality of edges, each edge is connected between two nodes, and the first adjacency matrix is determined based on the weight of each edge in the single granularity map corresponding to the first adjacency matrix;
Performing repeated iterative operation on each single granularity graph by using a preset graph neural network model based on the first adjacency matrix to obtain a first vector representation of each node in the single granularity graph; the first vector representation of each node in each single granularity graph is obtained by aggregating the vector representation of each node in the single granularity graph and the vector representation of each level neighbor node;
splicing the first vector representation of each node in each single granularity graph to obtain the second vector representation of each node;
determining a second adjacency matrix of the multi-granularity graph based on the inclusion relation of the nodes of the different single granularity graphs; the multi-granularity graph is obtained by reconstructing all single-granularity graphs, the multi-granularity graph comprises all nodes in each single-granularity graph, and edges included in the multi-granularity graph are obtained based on inclusion relations of the nodes of different single-granularity graphs;
performing repeated iterative operation on the multi-granularity graph by using the graph neural network model based on the second adjacency matrix to obtain target vector representation of each node in the multi-granularity graph; the target vector representation of each node in the multi-granularity graph is obtained by aggregating the second vector representation of each node in the multi-granularity graph and the second vector representation of each level neighbor node.
In one possible design, the constructing the single granularity map of the medical text and the first adjacency matrix corresponding to each single granularity map includes:
performing character segmentation processing, word segmentation processing and medical knowledge base matching on the medical text respectively to obtain three types of nodes: characters, words and terms;
determining, for each type of node, an edge connected between two nodes;
constructing a single granularity graph corresponding to the node of the type based on the node of the same type and the edges connected between the two nodes of the type;
for each type of node, determining the weight of an edge connecting between two nodes in the type;
a first adjacency matrix corresponding to the type of node is constructed based on the weights of the edges in the type connecting between the two nodes.
In one possible design, the performing, based on the first adjacency matrix, multiple iterative operations on each single granularity graph by using a preset graph neural network model to obtain a first vector representation of each node in the single granularity graph includes:
determining the current vector representation of each node based on the last vector representation of each node in each single granularity graph, the first adjacency matrix and the weight parameter matrix; the last vector representation of each node is obtained after the (t-1)-th iteration operation is performed on the first adjacency matrix by using a preset graph neural network model, the current vector representation of each node is obtained after the t-th iteration operation is performed on the first adjacency matrix by using the graph neural network model, t is a positive integer greater than 1, the last vector representation of each node comprises the vector representation of the node itself and the vector representations of its neighbor nodes up to order t-1, and the current vector representation of each node comprises the vector representation of the node itself and the vector representations of its neighbor nodes up to order t;
A first vector representation of each node is determined based on the current vector representation of each node.
In one possible design, the determining the first vector representation of each node based on the current vector representation of each node includes:
determining a feature matrix to be updated of each node based on the last vector representation of each node, the current vector representation of each node, the updated gate weight parameter matrix, the updated gate bias parameter matrix and the updated gate activation function;
determining a feature matrix to be forgotten of each node based on the last vector representation of each node, the current vector representation of each node, the forgetting gate weight parameter matrix, the forgetting gate deviation parameter matrix and the forgetting gate activation function;
determining a forgetting feature matrix of each node based on the last vector representation of each node, the current vector representation of each node, the to-be-forgotten feature matrix, the forgetting weight parameter matrix and the forgetting deviation parameter matrix of each node;
and determining a first vector representation of each node based on the forgetting feature matrix of each node, the feature matrix to be updated of each node and the last vector representation of each node.
In one possible design, the performing, based on the second adjacency matrix, multiple iterative operations on the multi-granularity graph by using the graph neural network model to obtain a target vector representation of each node in the multi-granularity graph includes:
determining the current second vector representation of each node based on the last second vector representation of each node in the multi-granularity graph, the second adjacency matrix and the weight parameter matrix; the last second vector representation of each node is obtained after the (t-1)-th iteration operation is performed on the second adjacency matrix by using the graph neural network model, the current second vector representation of each node is obtained after the t-th iteration operation is performed on the second adjacency matrix by using the graph neural network model, t is a positive integer greater than 1, the last second vector representation of each node comprises the second vector representation of the node itself and the second vector representations of its neighbor nodes up to order t-1, and the current second vector representation of each node comprises the second vector representation of the node itself and the second vector representations of its neighbor nodes up to order t;
and determining the target vector representation of each node according to the current second vector representation of each node.
In one possible design, the determining the target vector representation for each node based on the current second vector representation for each node includes:
determining a feature matrix to be updated of each node based on the last second vector representation of each node, the current second vector representation of each node, the updated gate weight parameter matrix, the updated gate bias parameter matrix and the updated gate activation function;
Determining a feature matrix to be forgotten of each node based on the last second vector representation of each node, the current second vector representation of each node, the forgetting gate weight parameter matrix, the forgetting gate deviation parameter matrix and the forgetting gate activation function;
determining a forgetting feature matrix of each node based on the last second vector representation of each node, the current second vector representation of each node, the to-be-forgotten feature matrix, the forgetting weight parameter matrix and the forgetting deviation parameter matrix of each node;
and determining the target vector representation of each node based on the forgetting feature matrix of each node, the feature matrix to be updated of each node and the last second vector representation of each node.
In one possible design, the determining the second adjacency matrix of the multi-granularity graph based on the inclusion relation of the nodes of the different single granularity graphs includes:
determining a first weight of edges connected between nodes of different single granularity graphs based on inclusion relationships of the nodes of the different single granularity graphs;
obtaining a second weight of each edge in the multi-granularity graph based on the first weight of each edge in the multi-granularity graph and the cosine similarity of the nodes it connects, so as to determine a second adjacency matrix; wherein the entries in the second adjacency matrix are the second weights.
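The combination of the inclusion-based first weight and the node cosine similarity described above can be sketched as follows. This is an illustrative Python sketch only; the embodiment does not fix the exact combination rule, and multiplying the two quantities is an assumption made here for concreteness.

```python
import numpy as np

def second_weight(first_w, u, v):
    """Illustrative second weight: first (inclusion) weight scaled by the
    cosine similarity of the two node vectors u and v. The multiplication
    rule is an assumption; the embodiment only states that both quantities
    are combined."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    cos = float(np.dot(u, v) / denom) if denom else 0.0
    return first_w * cos

# Identical node vectors with an inclusion edge keep full weight.
w = second_weight(1, np.array([1.0, 0.0]), np.array([1.0, 0.0]))
```

The second adjacency matrix is then filled entry by entry with such second weights.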
In a second aspect, an embodiment of the present invention further provides a vector representation generating apparatus for medical text, including:
a construction module for constructing a single granularity map of the medical text and a first adjacency matrix corresponding to each single granularity map; wherein the single granularity map comprises a character granularity map, a word granularity map and a term granularity map, each single granularity map comprises a plurality of nodes and a plurality of edges, each edge is connected between two nodes, and the first adjacency matrix is determined based on the weight of each edge in the single granularity map corresponding to the first adjacency matrix;
the first iteration module is used for carrying out repeated iteration operation on each single granularity graph by utilizing a preset graph neural network model based on the first adjacency matrix to obtain a first vector representation of each node in the single granularity graph; the first vector representation of each node in each single granularity graph is obtained by aggregating the vector representation of each node in the single granularity graph and the vector representation of each level neighbor node;
the splicing module is used for splicing the first vector representation of each node in each single granularity graph to obtain the second vector representation of each node;
A determining module, configured to determine a second adjacency matrix of the multi-granularity graph based on inclusion relationships of nodes of different single granularity graphs; the multi-granularity graph is obtained by reconstructing all single-granularity graphs, the multi-granularity graph comprises all nodes in each single-granularity graph, and edges included in the multi-granularity graph are obtained based on inclusion relations of the nodes of different single-granularity graphs;
the second iteration module is used for carrying out repeated iteration operation on the multi-granularity graph by utilizing the graph neural network model based on the second adjacency matrix to obtain target vector representation of each node in the multi-granularity graph; the target vector representation of each node in the multi-granularity graph is obtained by aggregating the second vector representation of each node in the multi-granularity graph and the second vector representation of each level neighbor node.
In a third aspect, an embodiment of the present invention further provides a computing device, including a memory and a processor, where the memory stores a computer program, and the processor implements a method according to any of the preceding claims when executing the computer program.
In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any one of the above-mentioned aspects.
The embodiment of the invention provides a vector representation generation method and apparatus for medical text. Single granularity graphs of different granularity information are constructed, and repeated iterative operations are performed on each single granularity graph by using a preset graph neural network model to obtain a first vector representation of each node in the single granularity graph, thereby completing intra-graph message passing so that semantic information among nodes in the same granularity graph is richer. Then, the first vector representations of each node in the single granularity graphs are spliced to obtain a second vector representation of each node, a second adjacency matrix of the multi-granularity graph is determined based on the inclusion relations of the nodes of different single granularity graphs, and repeated iterative operations are performed on the multi-granularity graph by using the graph neural network model to obtain a target vector representation of each node in the multi-granularity graph, thereby completing inter-graph message passing and realizing interactive fusion of the first vector representations of nodes in different granularity graphs. This can enhance the information richness of the text representation and improve the robustness of the medical text representation.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without making any inventive effort are within the scope of protection of the present invention.
For natural language processing tasks in a particular field there is always a great deal of high-quality expert knowledge, such as medical terms in the medical field. These specialized knowledge representations include dictionary information, semantic relationship information, descriptions in Wikipedia, and the like, which we can refer to as domain prior knowledge. By learning prior knowledge, humans can understand the semantics of the corresponding domain's natural language more deeply, and so can machine learning models used for semantic understanding in such tasks.
From a semantic granularity perspective, knowledge can be divided into fine-granularity knowledge and coarse-granularity knowledge. Taking the medical field as an example, fine-grained knowledge often exists in the form of individual characters or words, such as "kidney" and "blood glucose", which imply site information and disease type. More complete terms, such as surgical names or symptoms and signs, can be regarded as coarse-grained knowledge.
Knowledge of different granularities provides information at different levels for understanding the semantics of a whole text. Characters are the smallest semantic units and also the finest granularity of knowledge. Analyzing natural text in units of characters covers the expressed semantics over a large range. Conversely, the coarser the granularity, for example when the text is understood at the sentence level or chapter level, the more completely and accurately the emphasis and core of the text are conveyed. Many studies are based on knowledge at the character and word granularities; pre-trained language models such as ELMo, BERT and XLNet are typical representatives of fine-grained information learning, which train on large-scale corpora and exhibit powerful performance on multiple tasks. However, in professional fields there are many uncommon expressions and terms of art whose semantic granularity is coarser than characters and words, and this knowledge is discarded when modeling at the word level. For Chinese natural language processing tasks, this problem is more common due to Chinese word segmentation errors: irregular expressions in professional fields are more likely to be split apart by word segmentation tools. Therefore, semantic representations of coarse-grained information such as terms of art are lost in the feature extraction process.
Moreover, knowledge of different granularities can complement each other: since coarse-granularity information is composed of fine-granularity information, their semantic relationships are interrelated. Some studies introduce coarser-grained word information into character-based models to compensate for missing information, because a model based only on characters discards word information, and the additional coarser-grained information yields a more accurate text representation. Furthermore, if still coarser-grained knowledge can be involved, more accurate semantics can be obtained. In turn, fine-grained information also contributes to the richness of coarse-grained information.
In general, a priori knowledge in the professional arts is of great help in understanding natural language. Knowledge with different granularities can provide information with different layers, and the different granularities complement each other, so that good learning of knowledge with multiple granularities plays a vital role in natural language understanding. However, although we have accumulated a great deal of prior knowledge in the field of expertise, it has not been fully exploited in existing models. In studies using a priori knowledge, coarse-grained knowledge is more easily discarded during modeling, and only a portion of fine-grained information is effectively utilized.
In order to solve the technical problem, knowledge with different granularities can be considered to be fused, so that the information richness of the text representation can be enhanced, and the robustness of the medical text representation is improved.
Specific implementations of the above concepts are described below.
Referring to fig. 1, an embodiment of the present invention provides a method for generating a vector representation of a medical text, including:
step 100: constructing a single granularity map of the medical text and a first adjacency matrix corresponding to each single granularity map; wherein the single granularity map comprises a character granularity map, a word granularity map and a term granularity map, each single granularity map comprises a plurality of nodes and a plurality of edges, each edge is connected between two nodes, and a first adjacency matrix is determined based on the weight of each edge in the single granularity map corresponding to the first adjacency matrix;
step 102: based on the first adjacency matrix, performing repeated iterative operation on each single granularity graph by using a preset graph neural network model to obtain a first vector representation of each node in the single granularity graph; the first vector representation of each node in each single granularity graph is obtained by aggregating the vector representation of each node in the single granularity graph and the vector representation of each level neighbor node;
Step 104: splicing the first vector representation of each node in each single granularity graph to obtain the second vector representation of each node;
step 106, determining a second adjacency matrix of the multi-granularity diagram based on the inclusion relation of the nodes of different single granularity diagrams; the multi-granularity graph is obtained by reconstructing all single-granularity graphs, the multi-granularity graph comprises all nodes in each single-granularity graph, and edges included in the multi-granularity graph are obtained based on inclusion relations of the nodes of different single-granularity graphs;
step 108, performing repeated iterative operation on the multi-granularity graph by using a graph neural network model based on the second adjacency matrix to obtain target vector representation of each node in the multi-granularity graph; the target vector representation of each node in the multi-granularity graph is obtained by aggregating the second vector representation of each node in the multi-granularity graph and the second vector representation of each level neighbor node.
In the embodiment of the invention, single granularity graphs of different granularity information are constructed, and repeated iterative operations are performed on each single granularity graph by using a preset graph neural network model to obtain the first vector representation of each node in the single granularity graph, thereby completing intra-graph message passing so that semantic information among nodes in the same granularity graph is richer. Then, the first vector representations of each node in the single granularity graphs are spliced to obtain the second vector representation of each node, a second adjacency matrix of the multi-granularity graph is determined based on the inclusion relations of the nodes of different single granularity graphs, and repeated iterative operations are performed on the multi-granularity graph by using the graph neural network model to obtain the target vector representation of each node in the multi-granularity graph, thereby completing inter-graph message passing and realizing interactive fusion of the first vector representations of nodes in different granularity graphs. This can enhance the information richness of the text representation and improve the robustness of the medical text representation.
Each step in fig. 1 is described separately below.
With respect to step 100, in some embodiments, step 100 may specifically include the steps of:
Step A1: performing character segmentation processing, word segmentation processing and medical knowledge base matching on the medical text respectively to obtain three types of nodes: characters, words and terms.
In this step, character segmentation means processing the medical text in units of single characters; for example, character segmentation of "lumbar degenerative disease" yields the individual characters of the Chinese original, glossed roughly as "waist", "vertebra", "degenerate", "going", "nature", "disease" and "change". Word segmentation may process the medical text with the jieba word segmentation toolkit; for example, word segmentation of "lumbar degenerative disease" yields words such as "lumbar vertebra" and "degenerative disease". Medical knowledge base matching matches the medical text against a preset medical knowledge base; for example, the knowledge base match for "lumbar degenerative disease" is the complete term "lumbar degenerative disease".
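The three node types of step A1 can be sketched as follows. This is an illustrative Python sketch only: the tiny word vocabulary and one-entry knowledge base stand in for a real segmentation toolkit and a real medical knowledge base, and the greedy longest-match loop is an assumed stand-in, not the actual segmenter of the embodiment.

```python
def extract_nodes(text, word_vocab, term_kb):
    """Derive character, word and term nodes from a medical text (sketch)."""
    # Character granularity: every distinct character is a node.
    char_nodes = [c for c in dict.fromkeys(text) if not c.isspace()]

    # Word granularity: greedy longest match against a word vocabulary,
    # standing in for a proper segmentation toolkit.
    word_nodes, i = [], 0
    while i < len(text):
        match = next((w for w in sorted(word_vocab, key=len, reverse=True)
                      if text.startswith(w, i)), text[i])
        if match.strip() and match not in word_nodes:
            word_nodes.append(match)
        i += len(match)

    # Term granularity: any knowledge-base entry occurring in the text.
    term_nodes = [t for t in term_kb if t in text]
    return char_nodes, word_nodes, term_nodes

chars, words, terms = extract_nodes(
    "lumbar degenerative disease",
    word_vocab={"lumbar", "degenerative", "disease"},
    term_kb={"lumbar degenerative disease"},
)
```

Each of the three node lists then seeds its own single granularity graph.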
Step A2, for each type of node, determining an edge connected between two nodes.
In this step, the ways of determining the co-occurrence relationship between different nodes include a sliding window, pointwise mutual information and cosine similarity. In some embodiments, the co-occurrence relationship between nodes of the character type is determined with a sliding window, the co-occurrence relationship between nodes of the word type is determined with a sliding window or pointwise mutual information, and the co-occurrence relationship between nodes of the term type is determined with cosine similarity.
Taking pointwise mutual information (PMI) as an example, if the PMI value between two nodes is 0, the two nodes can be considered to have no co-occurrence relationship, i.e., no edge; if the PMI value between two nodes is greater than 0, an edge can be constructed between the two nodes, with the PMI value as the weight of the edge.
Taking cosine similarity as an example, if the cosine similarity between two nodes is 0, it can be considered that there is no co-occurrence relationship between the two nodes, i.e., there is no edge; if the cosine similarity between two nodes is greater than 0, an edge can be constructed between the two nodes, with the cosine similarity as the weight of the edge.
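The two edge-scoring rules above can be sketched as follows. This is an illustrative Python sketch; the co-occurrence counts and vectors are made-up values, and in practice they would come from the corpus statistics and node embeddings of the embodiment.

```python
import math
import numpy as np

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information of two nodes from co-occurrence counts."""
    p_xy = count_xy / total
    p_x, p_y = count_x / total, count_y / total
    return math.log(p_xy / (p_x * p_y)) if p_xy > 0 else 0.0

def cosine_similarity(u, v):
    """Cosine similarity of two node embedding vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0

# An edge is created only when the score is positive; the score then
# becomes the edge weight, as described for the PMI and cosine cases.
w_pmi = pmi(count_xy=8, count_x=10, count_y=20, total=100)
w_cos = cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 1.0]))
```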
And A3, constructing a single granularity graph corresponding to the type of nodes based on the nodes of the same type and edges connected between the two nodes in the type.
Step A4, for each type of node, determining the weight of the edge between different nodes in the type.
In this step, determining edge weights by means of a sliding window is well known to those skilled in the art and is not described here in detail. For pointwise mutual information and cosine similarity, refer to step A2.
And step A5, constructing a first adjacency matrix corresponding to the type of nodes based on the weights of edges between different nodes in the type.
In this step, if the current single granularity graph includes M nodes, where M is a positive integer greater than or equal to 1, an M×M first adjacency matrix may be generated, where the value of each entry in the first adjacency matrix corresponds to the weight of the edge between the corresponding pair of nodes.
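Building the M×M first adjacency matrix from the weighted edges can be sketched as follows; the node count and edge weights are illustrative values, and the undirected (symmetric) convention is an assumption consistent with the co-occurrence edges described above.

```python
import numpy as np

def build_adjacency(num_nodes, weighted_edges):
    """Build the M x M first adjacency matrix from (i, j, weight) edges.

    The graph is treated as undirected, so the matrix is symmetric; every
    missing edge keeps the default entry 0.
    """
    A = np.zeros((num_nodes, num_nodes))
    for i, j, w in weighted_edges:
        A[i, j] = A[j, i] = w
    return A

# Three nodes, two weighted edges (e.g. one sliding-window edge, one PMI edge).
A = build_adjacency(3, [(0, 1, 0.5), (1, 2, 1.386)])
</```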
For step 102, the graph neural network model may be a gated graph neural network (GGNN) model or a graph convolutional network (GCN) model, which is not specifically limited herein. The number of iterative operations can be set according to actual requirements; each iterative operation aggregates the vector representations of first-order neighbor nodes, so multiple iterative operations aggregate multi-order neighbor node features. For example, the first iterative operation may aggregate the vector representation of each node itself with the vector representations of its first-order neighbor nodes; the second iterative operation may, on the basis of the first, further aggregate the vector representations of each node's second-order neighbor nodes, and so on. Thus, through multiple iterative operations, a first vector representation of the nodes in each single granularity graph may be obtained.
In some embodiments, step 102 may specifically include the steps of:
and B1, determining the current vector representation of each node based on the last vector representation of each node in each single granularity graph, the first adjacency matrix and the weight parameter matrix.
In this step, the last vector representation of each node is obtained after the (t-1)-th iteration operation is performed on the first adjacency matrix by using a preset graph neural network model, the current vector representation of each node is obtained after the t-th iteration operation is performed on the first adjacency matrix by using the graph neural network model, t is a positive integer greater than 1, the last vector representation of each node comprises the vector representation of the node itself and the vector representations of its neighbor nodes up to order t-1, and the current vector representation of each node comprises the vector representation of the node itself and the vector representations of its neighbor nodes up to order t.
In some implementations, the current vector representation for each node can be determined according to the following formula:
a^(t) = A H^(t-1) W_t

where a^(t) denotes the current vector representation, A denotes the first adjacency matrix, H^(t-1) denotes the last vector representation, and W_t denotes the weight parameter matrix.
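The propagation formula above can be sketched numerically as follows. The node count, feature dimension and random values are illustrative assumptions; only the matrix product structure comes from the formula.

```python
import numpy as np

# One message-passing step a(t) = A . H(t-1) . W_t with illustrative shapes:
# M = 4 nodes, feature dimension d = 8.
rng = np.random.default_rng(0)
M, d = 4, 8
A = rng.random((M, M))          # first adjacency matrix
H_prev = rng.random((M, d))     # last vector representation H(t-1)
W_t = rng.random((d, d))        # weight parameter matrix

a_t = A @ H_prev @ W_t          # current vector representation a(t)
```

Each row of `a_t` mixes a node's own features with those of its neighbors, weighted by the adjacency entries.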
And B2, determining a first vector representation of each node according to the current vector representation of each node.
In some embodiments, step B2 may specifically include the steps of:
and step B21, determining a feature matrix to be updated of each node based on the last vector representation of each node, the current vector representation of each node, the updated gate weight parameter matrix, the updated gate deviation parameter matrix and the updated gate activation function.
In this step, in order to describe different importance effects of vector representations of different neighboring nodes of different orders on vector representations of the central node, a gating mechanism, that is, an update gate and a forget gate, may be added on the basis of the graph neural network model.
In some embodiments, the feature matrix to be updated for each node may be determined according to the following formula:
z^(t) = σ_1(W_z a^(t) + U_z H^(t-1) + b_z)

where z^(t) denotes the feature matrix to be updated, σ_1 denotes the update gate activation function, W_z and U_z denote the update gate weight parameter matrices, and b_z denotes the update gate bias parameter matrix. The update gate weight parameter matrices and the update gate bias parameter matrix control whether, and to what degree, the vector representations of neighbor nodes are adopted.
And step B22, determining a feature matrix to be forgotten of each node based on the last vector representation of each node, the current vector representation of each node, the forgetting gate weight parameter matrix, the forgetting gate deviation parameter matrix and the forgetting gate activation function.
In this step, the feature matrix to be forgotten of each node may be determined according to the following formula:
r^(t) = σ_2(W_r a^(t) + U_r H^(t-1) + b_r)

where r^(t) denotes the feature matrix to be forgotten, σ_2 denotes the forgetting gate activation function, W_r and U_r denote the forgetting gate weight parameter matrices, and b_r denotes the forgetting gate bias parameter matrix. The forgetting gate weight parameter matrices and the forgetting gate bias parameter matrix control whether the vector representations of neighbor nodes are adopted.
And step B23, determining the forgetting feature matrix of each node based on the last vector representation of each node, the current vector representation of each node, the feature matrix to be forgotten of each node, the forgetting weight parameter matrix and the forgetting deviation parameter matrix.
In this step, the forgetting feature matrix of each node may be determined according to the following formula:
in the method, in the process of the invention,
representing forgetting feature matrix, W
a And U
a Represents a forgetting weight parameter matrix, bh represents a forgetting deviation parameter matrix, and by-indicates a Hadamard product.
And step B24, determining a first vector representation of each node based on the forgetting feature matrix of each node, the feature matrix to be updated of each node and the last vector representation of each node.
In this step, a first vector representation of each node may be determined according to the following formula:
H^(t) = z^(t) ⊙ H̃^(t) + (1 - z^(t)) ⊙ H^(t-1)

where H^(t) denotes the first vector representation. In this embodiment, the formula aggregates the forgetting feature matrix H̃^(t) obtained after the forgetting gate, weights it by the feature matrix to be updated z^(t), and weights the last semantic representation H^(t-1) by (1 - z^(t)). In this way, the graph neural network model can controllably choose whether, and to what degree, to adopt the current vector representation while retaining the last vector representation, and oscillation of the graph neural network model can be avoided.
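The full gated iteration of steps B1 and B21 to B24 can be sketched as follows. This is an illustrative Python sketch: all parameter matrices are random stand-ins for the learned update gate, forgetting gate and forgetting parameters, the shapes are assumptions, and tanh/sigmoid are assumed as the standard GRU-style activation choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_update(A, H_prev, W_t, params):
    """One GRU-style iteration over a granularity graph (steps B1, B21-B24).

    Node features are rows, so the formulas' left-multiplications appear as
    right-multiplications by the parameter matrices here.
    """
    a_t = A @ H_prev @ W_t                                                    # B1
    z_t = sigmoid(a_t @ params["Wz"] + H_prev @ params["Uz"] + params["bz"])  # B21
    r_t = sigmoid(a_t @ params["Wr"] + H_prev @ params["Ur"] + params["br"])  # B22
    H_tilde = np.tanh(a_t @ params["Wa"]
                      + (r_t * H_prev) @ params["Ua"] + params["bh"])         # B23
    return z_t * H_tilde + (1.0 - z_t) * H_prev                               # B24

rng = np.random.default_rng(1)
M, d = 4, 8
params = {k: rng.standard_normal((d, d)) * 0.1
          for k in ("Wz", "Uz", "Wr", "Ur", "Wa", "Ua")}
params.update({k: np.zeros(d) for k in ("bz", "br", "bh")})
H = gated_update(rng.random((M, M)), rng.random((M, d)),
                 rng.standard_normal((d, d)) * 0.1, params)
```

Because the output is a convex combination of the bounded forgetting feature matrix and the previous representation, repeated iterations stay numerically stable, which matches the stated goal of avoiding oscillation.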
After step 102, intra-graph information aggregation of the single granularity graphs is completed. However, characters, words, and terms of different granularities typically carry different levels of semantics. Taking the Chinese medical knowledge base matching task as an example, the character-based model is the finest-grained model, and it shows better performance than the word-based model. However, this does not mean that coarse-grained word information has no positive impact on the identification of entities. Thus, after the intra-graph information aggregation is completed, it is necessary to perform information aggregation between single granularity graphs (i.e., inter-graph information aggregation, the process of which includes steps 104 through 108). In this way, information of different granularities can mutually promote and supplement each other, so that the information richness of the text characterization is enhanced and the robustness of the medical text characterization is improved.
For step 104, the splicing of the first vector representations of the nodes in each single granularity graph may be implemented as superposition of the first vector representations of the corresponding nodes, or as averaging after superposition; this is not specifically limited here.
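The splicing variants in step 104 can be illustrated with a minimal NumPy sketch; the vectors and dimensions below are hypothetical.

```python
import numpy as np

# Hypothetical first vector representations of one node in the
# character, word, and term granularity graphs (dimension 4 each)
h_char = np.array([1.0, 0.0, 2.0, 1.0])
h_word = np.array([0.5, 1.0, 0.0, 2.0])
h_term = np.array([2.0, 1.0, 1.0, 0.0])

h_concat = np.concatenate([h_char, h_word, h_term])  # splicing: dim 12
h_sum    = h_char + h_word + h_term                  # superposition
h_mean   = h_sum / 3.0                               # averaging after superposition

print(h_concat.shape, h_sum.shape, h_mean.shape)  # (12,) (4,) (4,)
```

Concatenation preserves per-granularity information at the cost of a larger dimension; superposition and averaging keep the dimension fixed.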
For step 106, the method specifically includes the following steps:
Step C1: determining a first weight of an edge connected between nodes of different single granularity graphs based on the inclusion relationship of the nodes of the different single granularity graphs.
In this step, take for example the medical text "the patient presents with sore throat, nasal obstruction, and no fever". The nodes in the character granularity graph are the individual characters of "patient", "sore throat", "nasal obstruction" and "no fever", and the nodes in the word granularity graph are the words "patient", "sore throat", "nasal obstruction" and "no fever". If a character is contained in a word, the inclusion relationship between the character node and that word node is 1; otherwise it is 0. Here 1 and 0 are the first weights of the edges connecting the two nodes. The first weights of edges between nodes of the term granularity graph and nodes of the other granularity graphs are determined in a similar manner, and will not be described again here.
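The inclusion relationship in step C1 is mechanical to compute. The sketch below uses English placeholder strings for the character and word nodes; the data layout is an assumption.

```python
# First weights between character nodes and word nodes: 1 if the
# character occurs inside the word, else 0 (step C1). English
# placeholders stand in for the Chinese characters/words.
char_nodes = ["pa", "tient", "sore", "throat"]
word_nodes = ["patient", "sorethroat"]

first_weight = {}
for c in char_nodes:
    for w in word_nodes:
        first_weight[(c, w)] = 1 if c in w else 0

print(first_weight[("pa", "patient")])    # 1
print(first_weight[("sore", "patient")])  # 0
```

The resulting 0/1 weights become the cross-granularity entries of the multi-granularity graph before the refinement of step C2.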
Step C2: obtaining a second weight of each edge in the multi-granularity graph based on the first weight of each edge in the multi-granularity graph and the cosine similarity of its nodes, so as to determine the second adjacency matrix; wherein each entry in the second adjacency matrix is a second weight.
In this step, assuming that the first weight A_ij in the i-th row and j-th column of the multi-granularity graph is 1, the cosine similarity between the nodes corresponding to the i-th row and the j-th column is determined by the following formula:

g_ij = simi[(V_(i−1) + V_i + V_(i+1))/3, V_j]

where g_ij is the cosine similarity between the nodes corresponding to the i-th row and the j-th column, simi is the cosine similarity function, V_i is the one-dimensional vector of first weights of the node in the i-th row, and V_j is the one-dimensional vector of first weights of the node in the j-th column.
In some embodiments, the second weight is obtained by taking the Hadamard product of the first weight of each edge and the cosine similarity of its nodes.
In this embodiment, the first weights of the edges in the multi-granularity graph are updated (i.e., the Hadamard product of the first weights of the edges and the cosine similarities of the nodes is taken), so that the second weights of the edges in the resulting multi-granularity graph can reflect the real weight distribution, which is conducive to obtaining more accurate vector representations.
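The computation of step C2 can be sketched in NumPy as follows. Treating row A[i] as the first-weight vector V_i, averaging it with its two neighboring rows, and the boundary handling at the edges of the matrix are all illustrative assumptions.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def second_weights(A):
    """Refine first weights A into second weights (step C2, sketch).

    A[i] is treated as the one-dimensional first-weight vector V_i of
    node i; V_{i-1}, V_i, V_{i+1} are averaged before comparing with
    the column vector V_j, and the result is combined with A by a
    Hadamard product.
    """
    n = A.shape[0]
    G = np.zeros_like(A, dtype=float)
    for i in range(n):
        lo, hi = max(0, i - 1), min(n, i + 2)
        v_avg = A[lo:hi].mean(axis=0)  # (V_{i-1} + V_i + V_{i+1}) / 3
        for j in range(n):
            if A[i, j] == 1:           # only edges with first weight 1
                G[i, j] = cosine(v_avg, A[:, j])
    return A * G                        # Hadamard product

A = np.array([[1., 1., 0.],
              [1., 1., 1.],
              [0., 1., 1.]])
S = second_weights(A)
print(S.shape)  # (3, 3)
```

Edges absent from the first adjacency matrix stay zero, while existing edges are rescaled by how similar the two nodes' connection patterns are.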
For example, "lumbar degenerative disease" is a disease name, and the character "sexual" appears in both the words "degenerative" and "venereal disease", exhibiting a different meaning in each. Clearly, in the present context, the semantics of the character "sexual" are determined mainly by "degenerative" rather than "venereal disease". By performing inter-graph information fusion, information transfer can be achieved, so that the character "sexual" can be controlled to receive more information from "degenerative" than from "venereal disease", i.e., the weight of the edge between "sexual" and "degenerative" is greater than the weight of the edge between "sexual" and "venereal disease".
For step 108, the method specifically includes the following steps:
Step D1: determining the current second vector representation of each node based on the last second vector representation of each node in the multi-granularity graph, the second adjacency matrix and the weight parameter matrix.
In this step, the last second vector representation of each node is obtained after the (t−1)-th iteration operation is performed on the second adjacency matrix by using the graph neural network model, and the current second vector representation of each node is obtained after the t-th iteration operation is performed on the second adjacency matrix by using the graph neural network model, where t is a positive integer greater than 1. The last second vector representation of each node comprises the second vector representation of the node and the second vector representations of its (t−1)-th-level neighbor nodes, and the current second vector representation of each node comprises the second vector representation of the node and the second vector representations of its t-th-level neighbor nodes.
Step D2: determining the target vector representation of each node according to the current second vector representation of each node.
In some embodiments, step D2 may specifically include the steps of:
determining a feature matrix to be updated of each node based on the last second vector representation of each node, the current second vector representation of each node, the updated gate weight parameter matrix, the updated gate bias parameter matrix and the updated gate activation function;
determining a feature matrix to be forgotten of each node based on the last second vector representation of each node, the current second vector representation of each node, the forgetting gate weight parameter matrix, the forgetting gate deviation parameter matrix and the forgetting gate activation function;
determining a forgetting feature matrix of each node based on the last second vector representation of each node, the current second vector representation of each node, the to-be-forgotten feature matrix, the forgetting weight parameter matrix and the forgetting deviation parameter matrix of each node;
and determining the target vector representation of each node based on the forgetting feature matrix of each node, the feature matrix to be updated of each node and the last second vector representation of each node.
It will be appreciated that the operations performed in step 108 are similar to those performed in step 102, and reference is made to the description of step 102 above, which is not repeated here.
To further illustrate the inventive concept, two exemplary medical task application scenarios are described below.
(1) Medical text sequence tagging task
This task aims at identifying diagnoses, drugs, symptoms, and the like in Chinese electronic medical records (EMRs), which is a key step in converting textual electronic medical records into structured knowledge. We performed experiments on an NER dataset of Chinese electronic medical records provided by a collaborating hospital, whose labels were manually annotated by clinicians; it contains 2506 medical record texts and 11 medical term categories in total. Each text segment has a length of no more than 100 words, and the text segments are divided into training and test sets at a ratio of 8:2.
The model achieves strong results on four metrics: accuracy, precision, recall, and F1. The accuracy of the multi-granularity model with local PMI connections reaches 86.68%, and the accuracy of the multi-granularity model with global PMI connections reaches 87.07%.
To verify the validity of knowledge at each single granularity, we performed ablation experiments: single-granularity experiments for characters, words and terms, and pairwise combinations of two granularities, were conducted separately. The results show that among the single-granularity models, the word-level model performs best, with an accuracy of 85.94%. In addition, among the results of integrating two kinds of granularity knowledge, the combination of word and term granularities is optimal, with an accuracy of 86.45%.
(2) Auxiliary diagnosis task
The auxiliary diagnosis task is also known as the disease prediction task. Diagnosis is the core of medical treatment; it is given by a doctor after listening to the patient describe the condition and analyzing the symptoms. This process is time-consuming and relies on the physician's command of medical knowledge, and auxiliary diagnosis helps the physician make better decisions. To this end, we collected the chief complaint and present medical history in each electronic medical record and combined them to generate a sample. The label of a sample is the diagnosis in the electronic medical record, and only one diagnosis is assigned to each sample, yielding 12439 samples in total covering 153 common disease diagnoses.
Among classical baseline models, BiLSTM+CRF achieves the best micro-F1 of 75.49%. Further, among approaches that attempt to integrate more word information into the character representation, Lattice LSTM and Soft Lexicon give similar best performance, 79.27% and 79.22% respectively. Our method achieves an F1 value of 80.26%. Moreover, like the word-merging methods, our method greatly improves recall, as both introduce additional word and term information.
To verify that each granularity has a positive impact on the final performance, we performed ablation experiments on the character-only granularity model and the character-word granularity model. The micro-F1 value of the character-level graph model was 76.66%. Compared with Bi-LSTM, the character granularity graph adds graph-based aggregation between the character embedding layer and the Bi-LSTM layer, thereby effectively improving the character representations. The character granularity graph and word granularity graph use the same inter-graph aggregation method to update the final character representation, increasing the F1 value by 1.07%. The term information also proved to be effective, as the F1 score of the three-granularity graph model increased from 77.73% to 80.26%. Thus, the knowledge of each granularity carries its own semantic information, and combining them represents the knowledge better.
As shown in fig. 2 and fig. 3, an embodiment of the present invention provides a vector representation generating apparatus for medical text. The apparatus embodiments may be implemented by software, or by hardware or a combination of hardware and software. In terms of hardware, fig. 2 is a hardware architecture diagram of the computing device where the vector representation generating apparatus for medical text provided in an embodiment of the present invention is located. In addition to the processor, memory, network interface, and nonvolatile memory shown in fig. 2, the computing device where the apparatus is located may generally include other hardware, such as a forwarding chip responsible for processing packets. Taking a software implementation as an example, as shown in fig. 3, the apparatus in a logical sense is formed by the CPU of the computing device reading the corresponding computer program from the nonvolatile memory into the memory and running it.
As shown in fig. 3, the vector representation generating device for medical text provided in this embodiment includes:
a construction module 300 for constructing a single granularity map of the medical text and a first adjacency matrix corresponding to each single granularity map; wherein the single granularity map comprises a character granularity map, a word granularity map and a term granularity map, each single granularity map comprises a plurality of nodes and a plurality of edges, each edge is connected between two nodes, and the first adjacency matrix is determined based on the weight of each edge in the single granularity map corresponding to the first adjacency matrix;
The first iteration module 302 is configured to perform multiple iteration operations on each single granularity graph by using a preset graph neural network model based on the first adjacency matrix, so as to obtain a first vector representation of each node in the single granularity graph; the first vector representation of each node in each single granularity graph is obtained by aggregating the vector representation of each node in the single granularity graph and the vector representation of each level neighbor node;
the splicing module 304 is configured to splice the first vector representations of the nodes in each single granularity graph to obtain second vector representations of the nodes;
a determining module 306, configured to determine a second adjacency matrix of the multi-granularity graph based on the inclusion relationships of the nodes of the different single granularity graphs; the multi-granularity graph is obtained by reconstructing all single-granularity graphs, the multi-granularity graph comprises all nodes in each single-granularity graph, and edges included in the multi-granularity graph are obtained based on inclusion relations of the nodes of different single-granularity graphs;
a second iteration module 308, configured to perform multiple iteration operations on the multi-granularity graph by using the graph neural network model based on the second adjacency matrix, so as to obtain a target vector representation of each node in the multi-granularity graph; the target vector representation of each node in the multi-granularity graph is obtained by aggregating the second vector representation of each node in the multi-granularity graph and the second vector representation of each level neighbor node.
In an embodiment of the present invention, the construction module 300 may be used to perform the step 100 in the above method embodiment, the first iteration module 302 may be used to perform the step 102 in the above method embodiment, the stitching module 304 may be used to perform the step 104 in the above method embodiment, the determination module 306 may be used to perform the step 106 in the above method embodiment, and the second iteration module 308 may be used to perform the step 108 in the above method embodiment.
In one embodiment of the present invention, the construction module 300 is configured to perform the following operations:
performing character segmentation, word segmentation and medical knowledge base matching on the medical text respectively to obtain three types of nodes: characters, words and terms;
determining, for each type of node, an edge connected between two nodes;
constructing a single granularity graph corresponding to the node of the type based on the node of the same type and the edges connected between the two nodes of the type;
for each type of node, determining the weight of an edge connecting between two nodes in the type;
a first adjacency matrix corresponding to the type of node is constructed based on the weights of the edges in the type connecting between the two nodes.
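The operations of the construction module above can be sketched minimally as follows; the node labels, edge list format, and weight values are hypothetical, and how the edge weights are computed is left open here.

```python
import numpy as np

def build_adjacency(nodes, weighted_edges):
    """Build the first adjacency matrix of one single granularity graph.

    nodes          : list of node labels (characters, words, or terms)
    weighted_edges : iterable of (node_u, node_v, weight) tuples; the
                     weighting scheme is not fixed by this sketch
    """
    index = {node: i for i, node in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    for u, v, w in weighted_edges:
        A[index[u], index[v]] = w
        A[index[v], index[u]] = w  # undirected graph
    return A

# Hypothetical term granularity graph with three term nodes
nodes = ["fever", "cough", "flu"]
edges = [("fever", "flu", 0.8), ("cough", "flu", 0.6)]
A = build_adjacency(nodes, edges)
print(A[0, 2], A[2, 0])  # 0.8 0.8
```

One such graph and adjacency matrix would be built per node type (character, word, term), giving the three single granularity graphs used in step 102.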
In one embodiment of the present invention, the first iteration module 302 is configured to perform the following operations:
determining the current vector representation of each node based on the last vector representation of each node in each single granularity graph, the first adjacency matrix and the weight parameter matrix; wherein the last vector representation of each node is obtained after the (t−1)-th iteration operation is performed on the first adjacency matrix by using the preset graph neural network model, the current vector representation of each node is obtained after the t-th iteration operation is performed on the first adjacency matrix by using the graph neural network model, and t is a positive integer greater than 1; the last vector representation of each node comprises the vector representation of the node and the vector representations of its (t−1)-th-level neighbor nodes, and the current vector representation of each node comprises the vector representation of the node and the vector representations of its t-th-level neighbor nodes;
a first vector representation of each node is determined based on the current vector representation of each node.
In one embodiment of the present invention, the first iteration module 302 is configured to, when executing the determination of the first vector representation of each node based on the current vector representation of each node, perform the following operations:
Determining a feature matrix to be updated of each node based on the last vector representation of each node, the current vector representation of each node, the updated gate weight parameter matrix, the updated gate bias parameter matrix and the updated gate activation function;
determining a feature matrix to be forgotten of each node based on the last vector representation of each node, the current vector representation of each node, the forgetting gate weight parameter matrix, the forgetting gate deviation parameter matrix and the forgetting gate activation function;
determining a forgetting feature matrix of each node based on the last vector representation of each node, the current vector representation of each node, the to-be-forgotten feature matrix, the forgetting weight parameter matrix and the forgetting deviation parameter matrix of each node;
and determining a first vector representation of each node based on the forgetting feature matrix of each node, the feature matrix to be updated of each node and the last vector representation of each node.
In one embodiment of the present invention, the second iteration module 308 is configured to perform the following operations:
determining the current second vector representation of each node based on the last second vector representation of each node in the multi-granularity graph, the second adjacency matrix and the weight parameter matrix; wherein the last second vector representation of each node is obtained after the (t−1)-th iteration operation is performed on the second adjacency matrix by using the graph neural network model, the current second vector representation of each node is obtained after the t-th iteration operation is performed on the second adjacency matrix by using the graph neural network model, and t is a positive integer greater than 1; the last second vector representation of each node comprises the second vector representation of the node and the second vector representations of its (t−1)-th-level neighbor nodes, and the current second vector representation of each node comprises the second vector representation of the node and the second vector representations of its t-th-level neighbor nodes;
And determining the target vector representation of each node according to the current second vector representation of each node.
In one embodiment of the present invention, the second iteration module 308 is configured to, when executing the determination of the target vector representation for each node based on the current second vector representation for each node, perform the following operations:
determining a feature matrix to be updated of each node based on the last second vector representation of each node, the current second vector representation of each node, the updated gate weight parameter matrix, the updated gate bias parameter matrix and the updated gate activation function;
determining a feature matrix to be forgotten of each node based on the last second vector representation of each node, the current second vector representation of each node, the forgetting gate weight parameter matrix, the forgetting gate deviation parameter matrix and the forgetting gate activation function;
determining a forgetting feature matrix of each node based on the last second vector representation of each node, the current second vector representation of each node, the to-be-forgotten feature matrix, the forgetting weight parameter matrix and the forgetting deviation parameter matrix of each node;
and determining the target vector representation of each node based on the forgetting feature matrix of each node, the feature matrix to be updated of each node and the last second vector representation of each node.
In one embodiment of the present invention, the determining module 306 is configured to perform the following operations:
determining a first weight of edges connected between nodes of different single granularity graphs based on inclusion relationships of the nodes of the different single granularity graphs;
obtaining a second weight of each side in the multi-granularity graph based on the first weight of each side in the multi-granularity graph and cosine similarity of each node so as to determine a second adjacency matrix; wherein an entry in the second adjacency matrix is the second weight.
It will be appreciated that the structure illustrated in the embodiments of the present invention does not constitute a specific limitation on the vector representation generating apparatus for medical text. In other embodiments of the invention, the vector representation generating apparatus for medical text may include more or fewer components than shown, combine certain components, split certain components, or have a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The content of information interaction and execution process between the modules in the device is based on the same conception as the embodiment of the method of the present invention, and specific content can be referred to the description in the embodiment of the method of the present invention, which is not repeated here.
The embodiment of the invention also provides a computing device which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the vector representation generation method of the medical text in any embodiment of the invention when executing the computer program.
The embodiment of the invention also provides a computer readable storage medium, and the computer readable storage medium stores a computer program, which when executed by a processor, causes the processor to execute the vector representation generation method of the medical text in any embodiment of the invention.
Specifically, a system or apparatus provided with a storage medium on which a software program code realizing the functions of any of the above embodiments is stored, and a computer (or CPU or MPU) of the system or apparatus may be caused to read out and execute the program code stored in the storage medium.
In this case, the program code itself read from the storage medium may realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code form part of the present invention.
Examples of the storage medium for providing the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer by a communication network.
Further, it should be apparent that the functions of any of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform part or all of the actual operations based on the instructions of the program code.
Further, it is understood that the program code read out from the storage medium may be written into a memory provided in an expansion board inserted into a computer or into a memory provided in an expansion module connected to the computer, and then a CPU or the like mounted on the expansion board or the expansion module may be caused to perform part or all of the actual operations based on the instructions of the program code, thereby realizing the functions of any of the above embodiments.
It is noted that relational terms such as first and second, and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one …" does not exclude the presence of additional identical elements in a process, method, article or apparatus that comprises the element.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions, and the foregoing program may be stored in a computer readable storage medium, where the program, when executed, performs steps including the above method embodiments; and the aforementioned storage medium includes: various media in which program code may be stored, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.