CN110633754A

CN110633754A - Intelligent medical record character recognition method based on neural network

Info

Publication number: CN110633754A
Application number: CN201910888334.XA
Authority: CN
Inventors: 徐登友; 董艺航; 许慧
Original assignee: Yibao Medical Science And Technology (shanghai) Co Ltd
Current assignee: Yibao Medical Science And Technology (shanghai) Co Ltd
Priority date: 2019-09-19
Filing date: 2019-09-19
Publication date: 2019-12-31

Abstract

The invention provides an intelligent medical record character recognition method based on a neural network. It combines a newly designed convolutional neural network module with a two-layer bidirectional recurrent neural network. The new convolutional module allows the network to be made deeper, and the feature map produced by convolution is processed by the two-layer bidirectional network, so the relationships between the slices of the image are fully taken into account. Gradient vanishing does not arise as the network deepens, so features are extracted better and the recognition accuracy of the algorithm on both general text and medical record text is markedly improved. In particular, its accuracy exceeds that of existing OCR recognition of medical records, reaching 98.3 percent. Second, the algorithm's data set adds medical terms and training data for them, which improves the prediction of medical terms. Furthermore, the data set is built from real medical record data, so the influence of the background can be effectively eliminated.

Description

Intelligent medical record character recognition method based on neural network

Technical Field

The invention relates to the field of medical record management, and in particular to an intelligent medical record character recognition method based on a neural network.

Background

Medical records are files, kept according to regulations, that document patients' disease manifestations and diagnoses; they are stored by the medical record management department of the medical institution in accordance with the relevant regulations. They exist not only on paper but also as electronic documents, medical images, examination films, pathological sections and other storage forms.

Data stored on paper must be extracted by technical means and stored as electronic documents before it can be used for statistical analysis and scientific research. Extracting paper medical record data requires OCR technology: text detection is first performed on the medical record, and character recognition is then performed on each detected text line.

At present, character recognition technology has the following shortcomings when applied to medical records: 1. General character recognition covers only common characters, whereas medical records contain many specialized medical terms. 2. General character recognition methods are easily disturbed by the background of medical record text lines, and their performance is poor. 3. Except on data sets containing few characters, the accuracy of general character recognition methods generally does not reach 95 percent.

Disclosure of Invention

The present invention aims to overcome the deficiencies of the prior art by providing an intelligent medical record character recognition method based on a neural network, so as to solve the problems described in the background.

The purpose of the invention is realized by the following technical scheme:

An intelligent medical record character recognition method based on a neural network comprises the following steps:

S1, annotating medical record text line data by labeling the characters contained in each text line, to obtain medical record training data;

S2, using the annotated medical record training data, together with existing character detection data recognized by a general character recognition method, as the training data, and dividing the data into a training set, a test set and a validation set in the ratio 98:1:1;

S3, converting each 32x280x3 training image (height x width x channels) into a 32x280x1 grayscale image, inputting the grayscale image into the convolutional neural network model to extract features and generate a 4x35x192 feature map, exchanging the dimensions of the feature map to 35x4x192, merging the last two dimensions to obtain 35x768, and passing the result into a two-layer bidirectional recurrent neural network for further feature extraction, thereby obtaining 35 slices of the feature map;

S4, using a text recognition model to classify each of the 35 slices of the feature map and predict its class scores;

S5, taking, for each slice, the class with the highest predicted score, the character corresponding to that class being the predicted character, and finally merging repeated characters across the 35 slices to obtain the 10 characters of the text line.
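The tensor reshaping in step S3 can be traced with a minimal pure-Python sketch. It assumes the feature map is stored as nested lists indexed [height][width][channel] (the patent does not specify a memory layout), and the function name to_sequence is hypothetical. Only the dimension exchange and merge are shown; the convolutional module and the recurrent layers are not reproduced. The stated shapes imply that the convolutional module downsamples both height and width by a factor of 8 (32/8 = 4, 280/8 = 35).

```python
# Hypothetical sketch of the S3 reshaping: (4, 35, 192) -> (35, 4, 192) -> (35, 768).
# The feature map is assumed to be a nested list indexed [h][w][c].

def to_sequence(feature_map):
    """Turn an (H, W, C) feature map into W slices of length H*C.

    Step 1 (dimension exchange): (H, W, C) -> (W, H, C).
    Step 2 (merge): flatten the last two dimensions (H, C) into one of size H*C.
    Each of the W resulting rows is one slice fed to the recurrent network.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    return [
        [value for h in range(height) for value in feature_map[h][w]]
        for w in range(width)
    ]


# Usage with the shapes from step S3: a 4x35x192 map becomes 35 slices of 768 values.
H, W, C = 4, 35, 192
feature_map = [[[(h, w, c) for c in range(C)] for w in range(W)] for h in range(H)]
slices = to_sequence(feature_map)
print(len(slices), len(slices[0]))  # prints "35 768"
```

Within slice w, the value from row h and channel c lands at position h*192 + c, which is the same order a row-major reshape of a 35x4x192 array to 35x768 would produce.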

Further, in step S1, all medical record training images have the same height and width, and each contains 10 characters.

Further, in step S3, the 35 slices of the feature map correspond to 35 words or characters.
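Read together with step S5, the 35 slices behave like the time steps of best-path (greedy) CTC decoding: take the highest-scoring class for each slice, merge consecutive repeats, and drop the blank class. The sketch below assumes the usual CTC blank symbol at class index 0 and a hypothetical four-entry alphabet; the patent does not state its character set or blank index, and greedy_ctc_decode and one_hot are illustrative names.

```python
# Hypothetical best-path CTC decoding for step S5.
# Assumes class index 0 is the CTC blank; the alphabet below is illustrative only.

def greedy_ctc_decode(scores, alphabet, blank=0):
    """Decode per-slice class scores (one row per slice) into a string."""
    # Highest-scoring class for every slice (ties resolve to the lowest index).
    best = [max(range(len(row)), key=row.__getitem__) for row in scores]
    chars = []
    previous = None
    for index in best:
        # Keep a class only when it differs from the previous slice and is not blank.
        if index != previous and index != blank:
            chars.append(alphabet[index])
        previous = index
    return "".join(chars)


def one_hot(index, size):
    """Score row that puts all weight on one class (stand-in for model output)."""
    return [1.0 if k == index else 0.0 for k in range(size)]


alphabet = ["<blank>", "高", "血", "压"]  # hypothetical: blank + three characters
path = [1, 1, 0, 2, 0, 3, 3]
print(greedy_ctc_decode([one_hot(i, len(alphabet)) for i in path], alphabet))  # prints "高血压"
```

A blank between two identical classes keeps both characters, so a genuinely doubled character in a text line survives the merge.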

Further, in step S4, the text recognition model uses CTC (Connectionist Temporal Classification) as the loss function and uses the Adam gradient descent optimizer for back-propagation, thereby adjusting the parameters of the neural network model.
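To make the CTC loss of step S4 concrete, here is a minimal forward-algorithm sketch in plain Python. It assumes per-slice probabilities that are already softmax-normalized and a blank at class index 0, and it works in probability space for readability; practical implementations work in log space to avoid underflow. Back-propagation and the Adam update would normally be delegated to a deep learning framework (PyTorch, for example, provides torch.nn.CTCLoss and torch.optim.Adam) and are not reproduced here. The function name is hypothetical.

```python
import math


def ctc_neg_log_likelihood(probs, target, blank=0):
    """CTC loss (negative log-likelihood) for one text line.

    probs:  T rows (one per feature-map slice) of softmax class probabilities.
    target: label indices of the line's characters, without blanks.
    """
    # Extended label sequence with blanks: [blank, l1, blank, l2, ..., blank].
    ext = [blank]
    for label in target:
        ext += [label, blank]
    n = len(ext)

    # alpha[s]: total probability of all paths that reach ext[s] at the current slice.
    alpha = [0.0] * n
    alpha[0] = probs[0][ext[0]]
    if n > 1:
        alpha[1] = probs[0][ext[1]]

    for row in probs[1:]:
        new_alpha = [0.0] * n
        for s in range(n):
            total = alpha[s]  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]  # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]  # skip a blank between two different labels
            new_alpha[s] = total * row[ext[s]]
        alpha = new_alpha

    # Valid paths end on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if n > 1 else 0.0)
    return math.inf if likelihood == 0.0 else -math.log(likelihood)


# Two slices, uniform over {blank, label 1}, target "1": the paths (1,1), (blank,1)
# and (1,blank) each have probability 0.25, so the likelihood is 0.75.
print(round(ctc_neg_log_likelihood([[0.5, 0.5], [0.5, 0.5]], [1]), 4))  # prints 0.2877
```

The skip rule is what lets CTC align a 10-character line to 35 slices while still requiring a blank between repeated characters.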

The invention has the beneficial effects that:

(1) the data set of the invention adds medical terms and training data for them, improving the ability to predict medical terms;

(2) because the data set is built from real medical record data, the influence of the background can be effectively eliminated;

(3) the invention combines the newly designed convolutional neural network module with the two-layer bidirectional neural network, which markedly improves recognition accuracy on both general text and medical record text; in particular, its accuracy is higher than that of existing OCR recognition of medical records, reaching 98.3 percent.

Drawings

FIG. 1 is a schematic diagram of a convolutional neural network module of the present invention;

FIG. 2 is a schematic diagram of a text recognition model according to the present invention.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.

Example:

The invention combines the newly designed convolutional neural network module with the two-layer bidirectional neural network. The new convolutional module allows the network to be made deeper, and the feature map produced by convolution is processed by the two-layer bidirectional network, so the relationships between the slices of the image are fully taken into account. Gradient vanishing does not arise as the network deepens, so features are extracted better and the recognition accuracy of the algorithm on both general text and medical record text is markedly improved; in particular, its accuracy is higher than that of existing OCR recognition of medical records, reaching 98.3 percent. Second, the algorithm's data set adds medical terms and training data for them, improving the prediction of medical terms. Finally, the data set is built from real medical record data, so background influence can be effectively eliminated.

The above embodiments express only specific embodiments of the present invention; although they are described specifically and in detail, they are not to be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these all fall within the scope of the present invention.

Claims

1. An intelligent medical record character recognition method based on a neural network is characterized by comprising the following steps:

S3, converting each 32x280x3 training image (height x width x channels) into a 32x280x1 grayscale image, inputting the grayscale image into a convolutional neural network module to extract features and generate a 4x35x192 feature map, exchanging the dimensions of the feature map to 35x4x192, merging the last two dimensions to obtain 35x768, and passing the result into a two-layer bidirectional recurrent neural network for further feature extraction, thereby obtaining 35 slices of the feature map;

2. The method of claim 1, wherein in step S1, all medical record training images have the same height and width, and each contains 10 characters.

3. The method of claim 1, wherein in step S3, the 35 slices of the feature map correspond to 35 words or characters.

4. The method of claim 1, wherein in step S4, the text recognition model uses CTC as the loss function and uses the Adam gradient descent optimizer for back-propagation to adjust the parameters of the neural network model.