CN110633754A - Intelligent medical record character recognition method based on neural network - Google Patents
Intelligent medical record character recognition method based on neural network
- Publication number
- CN110633754A (application number CN201910888334.XA)
- Authority
- CN
- China
- Prior art keywords
- neural network
- medical record
- training data
- data
- characters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/28—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
- G06V30/287—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Character Discrimination (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention provides an intelligent medical record character recognition method based on a neural network. The invention combines a newly designed convolutional neural network module with a two-layer bidirectional recurrent neural network. The new convolutional neural network module deepens the network, and the feature map produced by convolution is processed by the two-layer bidirectional recurrent neural network, so that the relationship between the slices of the picture is fully considered. The vanishing gradient problem does not arise as the network deepens, and features are extracted more effectively, so the recognition accuracy of the algorithm on both general text and medical record text is significantly improved; in particular, it recognizes medical records more accurately than existing OCR, reaching an accuracy of 98.3 percent. In addition, the data set of the invention adds training data for medical characters and medical terms, which improves the prediction of medical vocabulary. Furthermore, the data set is built from real medical record data, which effectively eliminates the influence of the background.
Description
Technical Field
The invention relates to the field of medical record management, and in particular to an intelligent medical record character recognition method based on a neural network.
Background
A medical record is the file documenting a patient's disease manifestations and diagnosis. It is recorded as required and kept by the medical record management department of the medical institution according to the relevant regulations. Records are kept not only on paper but also as electronic documents, medical images, examination films, pathological sections and other storage forms.
Data stored on paper must be extracted by technical means and saved as electronic documents for statistical analysis and scientific research. Extracting paper medical record data requires OCR technology: character detection is first performed on the medical record, and character recognition is then performed on the detected text lines.
At present, character recognition technology has the following defects when applied to medical records:
1. General character recognition covers only common characters, whereas medical records contain many specialized medical terms.
2. General character recognition methods are easily disturbed by the background of medical record text lines and perform poorly.
3. Except when the data set contains few characters, the accuracy of general character recognition methods generally falls below 95 percent.
Disclosure of Invention
The present invention aims to overcome the deficiencies of the prior art by providing an intelligent medical record character recognition method based on a neural network, so as to solve the problems described in the Background.
The purpose of the invention is achieved by the following technical scheme:
An intelligent medical record character recognition method based on a neural network comprises the following steps:
S1, annotating medical record text line data by marking the characters contained in each text line, to obtain medical record training data;
S2, taking the annotated medical record training data and the existing character detection data recognized by the general character recognition method as training data, and dividing the existing character detection data into a training set, a test set and a validation set in the ratio 98:1:1;
S3, converting each 32x280x3 training picture into a 32x280x1 grayscale image; inputting the grayscale image into a convolutional neural network model for feature extraction to generate a 4x35x192 feature map; exchanging the dimensions of the feature map to 35x4x192; merging the last two dimensions to obtain 35x768; and passing the result into a two-layer bidirectional recurrent neural network for feature extraction, to obtain 35 slices of the feature map;
S4, classifying the 35 slices of the feature map and predicting their scores using a text recognition model;
S5, for each slice, taking the class with the highest predicted score, the character corresponding to that class being the predicted character, and finally merging identical characters across the 35 slices to obtain 10 characters.
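The shape changes in step S3 can be traced with a small sketch. It uses plain Python lists and placeholder values, since the patent does not disclose the layers of the convolutional network; only the 4x35x192, 35x4x192 and 35x768 shapes come from the text, and the variable names are illustrative.

```python
# Shape walk-through for step S3. Values are placeholders; only shapes matter.
H, W, C = 4, 35, 192  # feature map shape produced by the convolutional network

# Fill each cell with its own (h, w, c) index so positions can be traced.
feature_map = [[[(h, w, c) for c in range(C)] for w in range(W)] for h in range(H)]

# Dimension exchange: 4x35x192 -> 35x4x192 (width becomes the leading axis).
swapped = [[feature_map[h][w] for h in range(H)] for w in range(W)]

# Merge the last two dimensions: 35x4x192 -> 35x768.
slices = [[value for row in column for value in row] for column in swapped]

print(len(slices), len(slices[0]))  # 35 768
```

Each of the 35 rows is one vertical slice of the text line, read left to right, which is the sequence the recurrent network then consumes.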
Further, in step S1, every item of medical record training data has the same length and width and contains 10 characters.
Further, in step S3, the 35 slices of the feature map correspond to 35 words or characters.
Further, in step S4, the text recognition model uses CTC as the loss function and uses the Adam optimizer for backpropagation, thereby adjusting the parameters of the neural network model.
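How 35 slices can be trained against a 10-character label follows from the way the CTC loss is usually defined. The formula below is a sketch in standard CTC notation, none of which is taken from the patent: T = 35 is the number of slices, x is the input picture, y is the labelled character sequence, p_t(k | x) is the predicted score of class k at slice t, and B is the mapping that merges repeated symbols and then removes the blank symbol, an extra class that standard CTC assumes.

```latex
p(y \mid x) = \sum_{\pi \in \mathcal{B}^{-1}(y)} \prod_{t=1}^{T} p_t(\pi_t \mid x),
\qquad
\mathcal{L}_{\mathrm{CTC}} = -\ln p(y \mid x)
```

The sum runs over every length-35 labelling that collapses to y, so the training data needs no per-slice alignment, only the characters of each text line as annotated in step S1.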
The invention has the beneficial effects that:
(1) the data set of the invention adds training data for medical characters and medical terms, which improves the prediction of medical vocabulary;
(2) the data set is built from real medical record data, which effectively eliminates the influence of the background;
(3) the invention combines the newly designed convolutional neural network module with the two-layer bidirectional recurrent neural network, which significantly improves the recognition accuracy on general text and medical record text; in particular, it recognizes medical records more accurately than existing OCR, reaching an accuracy of 98.3 percent.
Drawings
FIG. 1 is a schematic diagram of a convolutional neural network module of the present invention;
FIG. 2 is a schematic diagram of a text recognition model according to the present invention.
Detailed Description
The present invention is described below by way of specific embodiments, and other advantages and effects of the present invention will be readily understood by those skilled in the art from the disclosure of this specification. The invention can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed in various ways without departing from the spirit and scope of the present invention. Note that the features in the following embodiments and examples may be combined with each other where no conflict arises.
Example:
An intelligent medical record character recognition method based on a neural network comprises the following steps:
S1, annotating medical record text line data by marking the characters contained in each text line, to obtain medical record training data;
S2, taking the annotated medical record training data and the existing character detection data recognized by the general character recognition method as training data, and dividing the existing character detection data into a training set, a test set and a validation set in the ratio 98:1:1;
S3, converting each 32x280x3 training picture into a 32x280x1 grayscale image; inputting the grayscale image into a convolutional neural network model for feature extraction to generate a 4x35x192 feature map; exchanging the dimensions of the feature map to 35x4x192; merging the last two dimensions to obtain 35x768; and passing the result into a two-layer bidirectional recurrent neural network for feature extraction, to obtain 35 slices of the feature map;
S4, classifying the 35 slices of the feature map and predicting their scores using a text recognition model;
S5, for each slice, taking the class with the highest predicted score, the character corresponding to that class being the predicted character, and finally merging identical characters across the 35 slices to obtain 10 characters.
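The 98:1:1 division in step S2 can be sketched as follows. The patent gives only the ratio; shuffling before the split, the fixed seed and the function name `split_98_1_1` are assumptions made for illustration.

```python
import random

def split_98_1_1(samples, seed=0):
    """Shuffle samples and split them into training, test and validation sets at 98:1:1.

    Shuffling and the seed are assumptions; the patent states only the ratio.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) // 100   # 1 percent
    n_val = len(shuffled) // 100    # 1 percent
    n_train = len(shuffled) - n_test - n_val  # remaining 98 percent
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]
    return train, test, val

train, test, val = split_98_1_1(range(10_000))
print(len(train), len(test), len(val))  # 9800 100 100
```

Giving the remainder to the training set keeps every sample in exactly one subset even when the total is not a multiple of 100.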
Further, in step S1, every item of medical record training data has the same length and width and contains 10 characters.
Further, in step S3, the 35 slices of the feature map correspond to 35 words or characters.
Further, in step S4, the text recognition model uses CTC as the loss function and uses the Adam optimizer for backpropagation, thereby adjusting the parameters of the neural network model.
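Steps S4 and S5 amount to what is commonly called greedy CTC decoding: pick the best class per slice, then merge repeats. A minimal sketch follows, assuming the standard CTC blank class at index 0, which the patent does not mention explicitly; `greedy_ctc_decode`, `one_hot` and the toy character set are hypothetical names, and the example uses 6 slices instead of 35 for brevity.

```python
BLANK = 0  # assumed index of the CTC blank class (not stated in the patent)

def greedy_ctc_decode(scores, charset):
    """Decode per-slice class scores into text.

    scores: one list of class scores per slice (35 slices in the patent).
    charset: charset[i] is the character for class i + 1; class 0 is the blank.
    """
    # Step S5, first half: take the class with the highest score in each slice.
    best = [max(range(len(row)), key=row.__getitem__) for row in scores]
    decoded = []
    previous = None
    for index in best:
        # Step S5, second half: merge consecutive identical classes, drop blanks.
        if index != previous and index != BLANK:
            decoded.append(charset[index - 1])
        previous = index
    return "".join(decoded)

def one_hot(index, size=3):
    """Toy score vector that puts all weight on one class."""
    return [1.0 if i == index else 0.0 for i in range(size)]

scores = [one_hot(i) for i in [1, 1, 0, 2, 2, 0]]
print(greedy_ctc_decode(scores, ["a", "b"]))  # ab
```

The blank class is what allows a genuinely doubled character to survive merging: the class sequence 1, 0, 1 decodes to "aa", whereas 1, 1 decodes to "a".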
The invention combines a newly designed convolutional neural network module with a two-layer bidirectional recurrent neural network. The new convolutional neural network module deepens the network, and the feature map produced by convolution is processed by the two-layer bidirectional recurrent neural network, so that the relationship between the slices of the picture is fully considered. The vanishing gradient problem does not arise as the network deepens, and features are extracted more effectively, so the recognition accuracy of the algorithm on both general text and medical record text is significantly improved; in particular, it recognizes medical records more accurately than existing OCR, reaching an accuracy of 98.3 percent. In addition, the data set of the invention adds medical terms and their training data, which improves the prediction of medical vocabulary. Finally, the data set is built from real medical record data, which effectively eliminates the influence of the background.
The above embodiment expresses only a specific implementation of the present invention; although it is described specifically and in detail, it shall not be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of protection of the present invention.
Claims (4)
1. An intelligent medical record character recognition method based on a neural network is characterized by comprising the following steps:
S1, annotating medical record text line data by marking the characters contained in each text line, to obtain medical record training data;
S2, taking the annotated medical record training data and the existing character detection data recognized by the general character recognition method as training data, and dividing the existing character detection data into a training set, a test set and a validation set in the ratio 98:1:1;
S3, converting each 32x280x3 training picture into a 32x280x1 grayscale image; inputting the grayscale image into a convolutional neural network module for feature extraction to generate a 4x35x192 feature map; exchanging the dimensions of the feature map to 35x4x192; merging the last two dimensions to obtain 35x768; and passing the result into a two-layer bidirectional recurrent neural network for feature extraction, to obtain 35 slices of the feature map;
S4, classifying the 35 slices of the feature map and predicting their scores using a text recognition model;
S5, for each slice, taking the class with the highest predicted score, the character corresponding to that class being the predicted character, and finally merging identical characters across the 35 slices to obtain 10 characters.
2. The method of claim 1, wherein in step S1, each medical record training data has the same length and width and contains 10 characters.
3. The method of claim 1, wherein in step S3, the 35 slices of the feature map are 35 words or characters.
4. The method of claim 1, wherein in step S4, the text recognition model uses CTC as a loss function, and uses Adam gradient descent optimizer for model back-propagation to adjust neural network model parameters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910888334.XA CN110633754A (en) | 2019-09-19 | 2019-09-19 | Intelligent medical record character recognition method based on neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910888334.XA CN110633754A (en) | 2019-09-19 | 2019-09-19 | Intelligent medical record character recognition method based on neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110633754A (en) | 2019-12-31 |
Family
ID=68971863
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910888334.XA Withdrawn CN110633754A (en) | 2019-09-19 | 2019-09-19 | Intelligent medical record character recognition method based on neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110633754A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3985679A1 (en) * | 2020-10-19 | 2022-04-20 | Deepc GmbH | Technique for providing an interactive display of a medical image |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20191231 |