CN115293150A - Automatic encoding method for operation record by fusing convolutional neural network and self-attention mechanism - Google Patents
- Publication number: CN115293150A (application CN202210959294.5A)
- Authority: CN (China)
- Prior art keywords: text, vector, local, feature, information
- Legal status: Pending (the legal status is an assumption and not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F40/279 — Handling natural language data; natural language analysis; recognition of textual entities
- G06F16/35 — Information retrieval of unstructured textual data; clustering; classification
- G06N3/049 — Neural networks; temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08 — Neural networks; learning methods
Abstract
The invention relates to a coding method, and in particular to a method for automatically coding surgical operation records by fusing a convolutional neural network and a self-attention mechanism. By performing deep-learning training on a large number of surgical operation record texts, operation types are coded automatically from the records, thereby assisting the work of coding personnel. The method comprises the following steps: Step 1, treat the automatic surgical-operation coding task as a text multi-label classification task. Step 2, establish a training model and realize automatic coding through the trained model.
Description
Technical Field
The invention relates to a coding method, and in particular to a method for automatically coding surgical operation records by fusing a convolutional neural network and a self-attention mechanism.
Background
The International Classification of Diseases (ICD) is the basis on which hospitals implement DRG-based payment and report medical information data such as the Hospital Quality Monitoring System (HQMS); it is also an important tool for retrieving data in medical care, teaching, and scientific research. The surgical operation records in medical record data are an important resource reflecting the treatment of inpatients. An operation record documents in detail the entire surgical procedure performed by a clinician; coding personnel must read the operation record accurately on the basis of reading the full contents of the medical record, since a single sentence in the operation record can directly affect the operation code. Coding is therefore highly professional and technical work: the coding classification principles must be mastered and applied flexibly rather than mechanically, and coders with a high degree of professional skill are in relatively short supply. Finding a method that can assist coding personnel in coding efficiently and accurately has thus become an urgent need of hospitals. In recent years, artificial intelligence and natural language processing technology have gradually matured and been applied successfully in the medical field, making it possible to automatically generate codes for surgical operation records.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method for automatically coding surgical operation records that fuses a Convolutional Neural Network (CNN) and a Self-Attention Mechanism (SAM).
In order to achieve this purpose, the invention adopts the following technical scheme, comprising the following steps:
Step 1, treat the automatic surgical-operation coding task as a text multi-label classification task.
Step 2, establish a training model and realize automatic coding through the trained model.
Further, in step 1, treating the automatic surgical-operation coding task as a text multi-label classification task comprises: taking a given set of surgical record texts and their corresponding labels {(c_1, y_1), (c_2, y_2), ..., (c_m, y_m)} as a training set;
wherein c_m = {w_1, w_2, ..., w_n} denotes the m-th text, w_n denotes the n-th word in c_m, and y_m is the label category corresponding to the m-th text, with y_m ∈ {0,1}^k, where k denotes the total number of label categories contained in all texts.
Further, in step 2, the establishing a training model and implementing automatic coding through the training model includes:
Step 2.1, perform vector representation of the text with the pre-trained language model BERT, so that each word in the text is represented as a multi-dimensional dense vector:
w_i = f(w_i)  (1)
Step 2.2, capture the bidirectional semantic dependencies of the text context through a bidirectional long short-term memory network (BiLSTM), so that each word obtains a representation carrying contextual semantic information.
Step 2.3, fuse local and global text feature extraction, modeling the text information from the local and global perspectives respectively.
Step 2.4, feature fusion and output: after the local and global text feature representations are obtained, fuse them into a semantic feature vector X containing both local and global information, predict on X through a multilayer perceptron, convert the result into the corresponding label prediction probabilities, and train with a cross-entropy loss.
Further, in step 2.2, capturing the bidirectional semantic dependencies of the text context with the bidirectional long short-term memory network BiLSTM comprises: when the text vector input at time t is w_t, the hidden-layer state h_t at time t in a one-directional LSTM sequence is calculated as follows:

f_t = σ(W_f · [h_{t-1}, w_t] + b_f)  (2)
i_t = σ(W_i · [h_{t-1}, w_t] + b_i)  (3)
c̃_t = tanh(W_c · [h_{t-1}, w_t] + b_c)  (4)
c_t = f_t * c_{t-1} + i_t * c̃_t  (5)
o_t = σ(W_o · [h_{t-1}, w_t] + b_o)  (6)
h_t = o_t * tanh(c_t)  (7)

where f_t, i_t, c̃_t, c_t, and o_t are the forget gate, input gate, temporary cell state, cell state, and output gate values at time t, respectively; W and b are the corresponding weight matrices and bias terms; σ is the Sigmoid activation function; and tanh is the hyperbolic tangent activation function.

The forward representation h→_t and the backward representation h←_t of the text vector at each moment are spliced to obtain the output representation H of the BiLSTM, so that each word obtains a representation with contextual semantic information:

H = [h→_1 ⊕ h←_1, h→_2 ⊕ h←_2, ..., h→_n ⊕ h←_n]  (8)
Further, in step 2.3, the local text feature extraction comprises extracting local text feature information using the convolutional neural network CNN.
Further, extracting local text feature information with the convolutional neural network CNN comprises: selecting 3 convolution kernels of different lengths, each with width equal to the word-vector dimension, and performing local feature extraction on the output H of the BiLSTM by sliding-window movement; the feature map is expressed as:

v_i = f(w_c · h_{i:i+m-1} + b)  (9)
v = [v_1, v_2, ..., v_n]  (10)

The feature map is compressed with a max-pooling layer to extract its feature information, and the feature vectors extracted by the different convolution kernels are then spliced into a feature sequence e_c:

u = max{v}  (11)
e_c = [u_1, u_2, ..., u_n]  (12)
Further, in step 2.3, the global text feature extraction comprises extracting global text feature information using the self-attention mechanism SAM.
Further, the output H of the BiLSTM is taken as the input of the self-attention mechanism SAM and multiplied by three parameter matrices respectively to obtain the query matrix Q, the queried (key) matrix K, and the actual feature (value) matrix V:

Q = H·W_Q,  K = H·W_K,  V = H·W_V  (13)

An inner-product operation is performed on the Q and K matrices and normalized to obtain the score corresponding to each word vector; the softmax activation function then yields the weight proportion of each word, which is multiplied by the actual feature matrix V to obtain the vector representation e_a with global information:

e_a = softmax(Q·K^T / √d_k)·V  (14)

where d_k is the vector dimension; dividing by √d_k mainly prevents the inner product from growing as the vector dimension increases, thereby stabilizing the gradients.
Further, in step 2.4, the vector X is:

X = add(e_a, e_c)  (15)

X is predicted through a multilayer perceptron:

ŷ = sigmoid(W_2 · f(W_1 · X))  (16)

where W_1 and W_2 are training parameters and f is the ReLU activation function; normalization is performed through the sigmoid function, converting the output into the corresponding label prediction probabilities.

Training with the cross-entropy loss is:

L = -Σ_{j=1}^{k} [ y_j · log(ŷ_j) + (1 - y_j) · log(1 - ŷ_j) ]  (17)
compared with the prior art, the invention has the beneficial effects.
The invention treats the automatic coding of surgical operation records as a text multi-label classification task, uses a CNN to extract local features and a SAM to extract global features, and fuses the CNN and SAM features to obtain a more comprehensive feature vector representation, thereby alleviating, to a certain extent, the problems of lost text feature information and lost global label dependencies. Experiments show that the proposed method can not only automatically generate operation codes for surgical operation records, but also obtain more accurate predictions than a model that uses only a CNN for local features.
Drawings
The invention is further described below with reference to the figures and the detailed description. The scope of the invention is not limited to the following description.
Figure 1 is a problem description schematic.
Fig. 2 is a model structure diagram.
Fig. 3 is a schematic diagram of a local information extraction layer structure.
Detailed Description
As shown in Figs. 1-3, the specific embodiment comprises the following steps.
1. Problem description: as shown in Fig. 1, automatically extracting the mentioned surgical operations from the text description of an operation record and obtaining their codes can be described as a text multi-label classification task, i.e., a given set of surgical record texts and their corresponding labels {(c_1, y_1), (c_2, y_2), ..., (c_m, y_m)} serves as a training set, where c_m = {w_1, w_2, ..., w_n} denotes the m-th text, w_n denotes the n-th word in c_m, y_m is the label category corresponding to the m-th text, and y_m ∈ {0,1}^k, where k denotes the total number of label categories contained in all texts.
A multi-label classification model is obtained by training on this training set, and new unlabeled samples can then be classified into the k semantic labels.
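The multi-label formulation above can be sketched in a few lines. This is an illustrative example only: the operation codes, texts, and helper names below are hypothetical, not taken from the patent's data.

```python
# Illustrative sketch of the training-set representation: each surgical-record
# text is paired with a k-dimensional 0/1 indicator vector y over all k codes,
# so one record may carry several labels at once.

def encode_labels(codes, all_codes):
    """Map a set of operation codes to a binary indicator vector y in {0,1}^k."""
    return [1 if c in codes else 0 for c in all_codes]

# Hypothetical label inventory (k = 4 operation codes).
ALL_CODES = ["33.24", "34.04", "34.91", "38.93"]

# Two toy surgical-record texts with their annotated codes.
training_set = [
    ("bronchoscopy with biopsy; closed chest drainage", {"33.24", "34.04"}),
    ("thoracentesis", {"34.91"}),
]

# Build (c_m, y_m) pairs as in the problem description.
pairs = [(text, encode_labels(codes, ALL_CODES)) for text, codes in training_set]
```

The first sample yields y = [1, 1, 0, 0], showing why this is multi-label rather than multi-class classification.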
2. Introduction of a model:
1) A text embedding layer.
The invention adopts the pre-trained language model BERT (Bidirectional Encoder Representations from Transformers) for vector representation of the text. In this process, BERT serves only as a vector-representation tool for the downstream task, and each word in the text is represented as a multi-dimensional dense vector:

w_i = f(w_i)  (1)
2) The BiLSTM layer.
A bidirectional long short-term memory network (Bi-directional Long Short-Term Memory, BiLSTM) can capture the bidirectional semantic dependencies of the text context. When the text vector input at time t is w_t, the hidden-layer state h_t at time t in a one-directional LSTM sequence is calculated as follows:

f_t = σ(W_f · [h_{t-1}, w_t] + b_f)  (2)
i_t = σ(W_i · [h_{t-1}, w_t] + b_i)  (3)
c̃_t = tanh(W_c · [h_{t-1}, w_t] + b_c)  (4)
c_t = f_t * c_{t-1} + i_t * c̃_t  (5)
o_t = σ(W_o · [h_{t-1}, w_t] + b_o)  (6)
h_t = o_t * tanh(c_t)  (7)

where f_t, i_t, c̃_t, c_t, and o_t are the forget gate, input gate, temporary cell state, cell state, and output gate values at time t, respectively; W and b are the corresponding weight matrices and bias terms; σ is the Sigmoid activation function; and tanh is the hyperbolic tangent activation function.

The forward representation h→_t and the backward representation h←_t of the text vector at each moment are spliced to obtain the output representation H of the BiLSTM, so that each word obtains a representation with contextual semantic information:

H = [h→_1 ⊕ h←_1, h→_2 ⊕ h←_2, ..., h→_n ⊕ h←_n]  (8)
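The gate equations (2)-(7) can be illustrated with a minimal scalar sketch. This is not the trained model: the weights below are toy scalars chosen for illustration, and a real BiLSTM would use matrices and run the step in both directions, concatenating the two hidden states per word as in eq. (8).

```python
import math

# Minimal scalar sketch of one LSTM time step (eqs. 2-7). All gates act on
# the pair (h_{t-1}, w_t); W/b hold toy per-gate (weight-on-h, weight-on-w)
# pairs and biases, not trained values.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(w_t, h_prev, c_prev, W, b):
    f_t = sigmoid(W["f"][0] * h_prev + W["f"][1] * w_t + b["f"])        # forget gate, eq. (2)
    i_t = sigmoid(W["i"][0] * h_prev + W["i"][1] * w_t + b["i"])        # input gate, eq. (3)
    c_tilde = math.tanh(W["c"][0] * h_prev + W["c"][1] * w_t + b["c"])  # temporary cell state, eq. (4)
    c_t = f_t * c_prev + i_t * c_tilde                                  # cell state, eq. (5)
    o_t = sigmoid(W["o"][0] * h_prev + W["o"][1] * w_t + b["o"])        # output gate, eq. (6)
    h_t = o_t * math.tanh(c_t)                                          # hidden state, eq. (7)
    return h_t, c_t

# Toy parameters and a toy 1-D "word vector" sequence.
W = {g: (0.5, 0.5) for g in "fico"}
b = {g: 0.0 for g in "fico"}
h, c = 0.0, 0.0
for w_t in [0.2, -0.1, 0.4]:
    h, c = lstm_step(w_t, h, c, W, b)
```

Because h_t is an output gate (in (0,1)) times a tanh (in (-1,1)), the hidden state is always bounded in (-1, 1), which is what makes the recurrence stable over long sequences.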
3) The fused local and global text feature extraction layer.
To extract text feature information more comprehensively, the model combines the respective advantages of a multi-layer CNN and the SAM, modeling the text information from the local and global perspectives.
a) A local information extraction layer.
CNNs are widely applied across the fields of natural language processing (NLP) owing to their strong adaptability, simple structure, and low computational complexity. The method uses a CNN to extract the main local feature information of the text; the model structure is shown in Fig. 3.
the invention selects 3 convolution kernels with different lengths and widths equal to the length of a word vector to extract local features of the output H of the BilSTM through the movement of a sliding window, and the feature map is shown as follows:
v i =f(w c ·h i:i+m-1 +b) (9)
v=[v 1 ,v 2 ,...,v n ] (10)
compressing the characteristic diagram by utilizing the maximum pooling layer and extracting main characteristic information of the characteristic diagram, and then splicing the characteristic vectors extracted by different convolution kernels to form a characteristic sequence e c :
u=max{v} (11)
e c =[u 1 ,u 2 ,...,u n ] (12)
b) And a global information extraction layer.
The essence of the SAM is to assign each word in the text a corresponding weight after weighing the global context information; words with higher weights play a larger role in the classification task. The model takes the output H of the BiLSTM as the input of the SAM and multiplies it by three parameter matrices respectively to obtain the query matrix Q, the queried (key) matrix K, and the actual feature (value) matrix V:

Q = H·W_Q,  K = H·W_K,  V = H·W_V  (13)

An inner-product operation is performed on the Q and K matrices and normalized to obtain the score corresponding to each word vector. The softmax activation function then yields the weight proportion of each word, which is multiplied by the actual feature matrix V to obtain the vector representation e_a with global information:

e_a = softmax(Q·K^T / √d_k)·V  (14)

where d_k is the vector dimension; dividing by √d_k mainly prevents the inner product from growing as the vector dimension increases, thereby stabilizing the gradients.
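The scaled dot-product attention of eq. (14) can be written out directly. In this minimal sketch the Q, K, V matrices are small toy values; in the model they would come from H multiplied by the three learned parameter matrices of eq. (13).

```python
import math

# Minimal sketch of scaled dot-product self-attention (eq. 14):
# weights = softmax(Q K^T / sqrt(d_k)), output e_a = weights V.

def softmax(xs):
    m = max(xs)                       # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(Q, K, V):
    d_k = len(Q[0])
    out, weights = [], []
    for q in Q:                       # one output row per query (per word)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        w = softmax(scores)           # attention weight proportion for this word
        weights.append(w)
        out.append([sum(w[j] * V[j][d] for j in range(len(V)))
                    for d in range(len(V[0]))])
    return out, weights

# Three toy "words", d_k = 2.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
e_a, attn = self_attention(Q, K, V)
```

Each row of the attention weights sums to 1, so every output vector is a convex combination of the value vectors: this is exactly the "weight proportion of each word" the text describes.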
4) Feature fusion and output.
After the local and global feature representations of the text are obtained, they are fused into a semantic feature vector X containing both local and global information:

X = add(e_a, e_c)  (15)

Finally, the model predicts on X through a multilayer perceptron:

ŷ = sigmoid(W_2 · f(W_1 · X))  (16)

where W_1 and W_2 are training parameters and f is the ReLU activation function; normalization is performed through the sigmoid function, converting the output into the corresponding label prediction probabilities, and training is performed with the cross-entropy loss:

L = -Σ_{j=1}^{k} [ y_j · log(ŷ_j) + (1 - y_j) · log(1 - ŷ_j) ]  (17)
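The output head of eqs. (15)-(17) can be sketched as follows. This is a simplification under stated assumptions: a single linear layer stands in for the multilayer perceptron, and the weights, feature vectors, and k = 2 label set are toy values for illustration only.

```python
import math

# Sketch of the output head: element-wise addition fuses the global (e_a)
# and local (e_c) feature vectors (eq. 15), a sigmoid turns per-label scores
# into probabilities (eq. 16), and multi-label binary cross-entropy is the
# training loss (eq. 17).

def fuse(e_a, e_c):
    return [a + c for a, c in zip(e_a, e_c)]        # X = add(e_a, e_c)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(X, W, b):
    """Per-label probabilities; W is a toy k x dim weight matrix."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(row, X)) + b) for row in W]

def bce_loss(y_true, y_hat):
    """Multi-label binary cross-entropy, summed over the k labels."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_hat))

X = fuse([0.2, -0.1, 0.4], [0.7, 0.3, 0.175])       # toy e_a and e_c
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]              # k = 2 hypothetical labels
probs = predict(X, W, b=0.0)
loss = bce_loss([1, 0], probs)
```

Because each label gets its own independent sigmoid (rather than one softmax over all labels), the model can assign high probability to several operation codes at once, which is what the multi-label setting requires.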
the invention regards the automatic coding task of the operation as a text multi-label classification task, and realizes automatic coding by using a BERT + BilSTM + CNN + Self Attention frame training model. Text local features and global feature information are extracted through the CNN and the SAM respectively and are fused to obtain more comprehensive feature expression vectors, and therefore a better classification effect is achieved.
It should be understood that the detailed description is only for illustrating the invention, which is not limited to the technical schemes described in the embodiments; those skilled in the art will understand that the invention may be modified or equivalently substituted to achieve the same technical effects, and such variants fall within the scope of protection of the invention as long as the use requirements are met.
Claims (9)
1. A method for automatically coding surgical operation records by fusing a convolutional neural network and a self-attention mechanism, characterized by comprising the following steps:
Step 1, treat the automatic surgical-operation coding task as a text multi-label classification task;
Step 2, establish a training model and realize automatic coding through the trained model.
2. The method of claim 1, wherein in step 1, treating the automatic surgical-operation coding task as a text multi-label classification task comprises taking a given set of surgical record texts and their corresponding labels {(c_1, y_1), (c_2, y_2), ..., (c_m, y_m)} as a training set;
wherein c_m = {w_1, w_2, ..., w_n} denotes the m-th text, w_n denotes the n-th word in c_m, and y_m is the label category corresponding to the m-th text, with y_m ∈ {0,1}^k, where k denotes the total number of label categories contained in all texts.
3. The method of claim 1, wherein in step 2, establishing the training model and realizing automatic coding through the training model comprises:
Step 2.1, perform vector representation of the text with the pre-trained language model BERT, so that each word in the text is represented as a multi-dimensional dense vector:
w_i = f(w_i)  (1)
Step 2.2, capture the bidirectional semantic dependencies of the text context through a bidirectional long short-term memory network BiLSTM, so that each word obtains a representation carrying contextual semantic information;
Step 2.3, fuse local and global text feature extraction, modeling the text information from the local and global perspectives respectively;
Step 2.4, feature fusion and output: after the local and global text feature representations are obtained, fuse them into a semantic feature vector X containing both local and global information, predict on X through a multilayer perceptron, convert the result into the corresponding label prediction probabilities, and train with a cross-entropy loss.
4. The method of claim 3, wherein in step 2.2, capturing the bidirectional semantic dependencies of the text context with the bidirectional long short-term memory network BiLSTM comprises: when the text vector input at time t is w_t, the hidden-layer state h_t at time t in a one-directional LSTM sequence is calculated as follows:
f_t = σ(W_f · [h_{t-1}, w_t] + b_f)  (2)
i_t = σ(W_i · [h_{t-1}, w_t] + b_i)  (3)
c̃_t = tanh(W_c · [h_{t-1}, w_t] + b_c)  (4)
c_t = f_t * c_{t-1} + i_t * c̃_t  (5)
o_t = σ(W_o · [h_{t-1}, w_t] + b_o)  (6)
h_t = o_t * tanh(c_t)  (7)
where f_t, i_t, c̃_t, c_t, and o_t are the forget gate, input gate, temporary cell state, cell state, and output gate values at time t, respectively; W and b are the corresponding weight matrices and bias terms; σ is the Sigmoid activation function; and tanh is the hyperbolic tangent activation function;
the forward representation h→_t and the backward representation h←_t of the text vector at each moment are spliced to obtain the output representation H of the BiLSTM, so that each word obtains a representation with contextual semantic information:
H = [h→_1 ⊕ h←_1, h→_2 ⊕ h←_2, ..., h→_n ⊕ h←_n]  (8)
5. The method of claim 4, wherein in step 2.3, the local text feature extraction comprises extracting local text feature information using the convolutional neural network CNN.
6. The method of claim 5, wherein extracting local text feature information with the convolutional neural network CNN comprises selecting 3 convolution kernels of different lengths, each with width equal to the word-vector dimension, and performing local feature extraction on the output H of the BiLSTM by sliding-window movement, the feature map being:
v_i = f(w_c · h_{i:i+m-1} + b)  (9)
v = [v_1, v_2, ..., v_n]  (10)
the feature map is compressed with a max-pooling layer to extract its feature information, and the feature vectors extracted by the different convolution kernels are spliced into a feature sequence e_c:
u = max{v}  (11)
e_c = [u_1, u_2, ..., u_n]  (12)
7. The method of claim 6, wherein in step 2.3, the global text feature extraction comprises extracting global text feature information using the self-attention mechanism SAM.
8. The method of claim 7, wherein the output H of the BiLSTM is taken as the input of the self-attention mechanism SAM and multiplied by three parameter matrices respectively to obtain the query matrix Q, the queried (key) matrix K, and the actual feature (value) matrix V:
Q = H·W_Q,  K = H·W_K,  V = H·W_V  (13)
an inner-product operation is performed on the Q and K matrices and normalized to obtain the score corresponding to each word vector; the softmax activation function then yields the weight proportion of each word, which is multiplied by the actual feature matrix V to obtain the vector representation e_a with global information:
e_a = softmax(Q·K^T / √d_k)·V  (14)
9. The method of claim 8, wherein in step 2.4, the vector X is:
X = add(e_a, e_c)  (15)
X is predicted through a multilayer perceptron:
ŷ = sigmoid(W_2 · f(W_1 · X))  (16)
where W_1 and W_2 are training parameters and f is the ReLU activation function; normalization is performed through the sigmoid function, converting the output into the corresponding label prediction probabilities;
training with the cross-entropy loss is:
L = -Σ_{j=1}^{k} [ y_j · log(ŷ_j) + (1 - y_j) · log(1 - ŷ_j) ]  (17)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210959294.5A | 2022-08-10 | 2022-08-10 | Automatic encoding method for operation record by fusing convolutional neural network and self-attention mechanism |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN115293150A | 2022-11-04 |
Family ID: 83828051

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210959294.5A | Automatic encoding method for operation record by fusing convolutional neural network and self-attention mechanism | 2022-08-10 | 2022-08-10 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN115293150A |
Cited By (4)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116703928A | 2023-08-08 | 2023-09-05 | 宁德市天铭新能源汽车配件有限公司 | Automobile part production detection method and system based on machine learning |
| CN116703928B | 2023-08-08 | 2023-10-27 | 宁德市天铭新能源汽车配件有限公司 | Automobile part production detection method and system based on machine learning |
| CN117807956A | 2023-12-29 | 2024-04-02 | 兰州理工大学 | ICD automatic coding method based on clinical text tree structure |
| CN118052627A | 2024-04-15 | 2024-05-17 | 辽宁省网联数字科技产业有限公司 | Intelligent filling method and system for bidding scheme |
Legal Events

| Date | Code | Title |
|---|---|---|
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |