CN110633754A - Intelligent medical record character recognition method based on neural network - Google Patents

Intelligent medical record character recognition method based on neural network

Info

Publication number
CN110633754A
Authority
CN
China
Prior art keywords
neural network
medical record
training data
data
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910888334.XA
Other languages
Chinese (zh)
Inventor
徐登友
董艺航
许慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yibao Medical Science And Technology (shanghai) Co Ltd
Original Assignee
Yibao Medical Science And Technology (shanghai) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yibao Medical Science And Technology (shanghai) Co Ltd filed Critical Yibao Medical Science And Technology (shanghai) Co Ltd
Priority to CN201910888334.XA priority Critical patent/CN110633754A/en
Publication of CN110633754A publication Critical patent/CN110633754A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/28 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
    • G06V30/287 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Character Discrimination (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides an intelligent medical record character recognition method based on a neural network. It combines a newly designed convolutional neural network module with a two-layer bidirectional neural network. The new convolutional neural network module allows the network to be made deeper, and the feature map produced by convolution is processed by the two-layer bidirectional neural network, so that the relationship between the slices of the picture is fully taken into account. Vanishing gradients do not arise as the network deepens, and features are extracted more effectively, so the recognition accuracy of the algorithm on both general text and medical record text is markedly improved; in particular, accuracy on medical records is higher than that of existing OCR, reaching 98.3%. In addition, the data set adds medical words and corresponding training data, which improves the prediction of medical words. Furthermore, the data set is built from real medical record data, which effectively eliminates the influence of the background.

Description

Intelligent medical record character recognition method based on neural network
Technical Field
The invention relates to the field of medical record management, and in particular to an intelligent medical record character recognition method based on a neural network.
Background
Medical records are files that document a patient's disease manifestations and diagnosis as required by regulation, and they are kept by the medical record management department of the medical institution in accordance with the relevant rules. They are stored not only on paper but also as electronic documents, medical images, examination films, pathological sections, and other forms.
For records stored on paper, the data must be extracted by technical means and saved as electronic documents so that it can be used for statistical analysis and scientific research. Extracting data from paper medical records requires OCR: character detection is first performed on the record, and character recognition is then performed on each detected text line.
At present, character recognition technology has the following shortcomings when applied to medical records: 1. General character recognition covers only common characters, whereas medical records contain many specialized medical terms. 2. General character recognition methods are easily disturbed by the background of medical record text lines and perform poorly. 3. Except where the data set contains few characters, the accuracy of general character recognition methods generally does not reach 95%.
Disclosure of Invention
The purpose of the present invention is to overcome the deficiencies of the prior art by providing an intelligent medical record character recognition method based on a neural network, so as to solve the problems described in the Background above.
The purpose of the invention is realized by the following technical scheme:
An intelligent medical record character recognition method based on a neural network comprises the following steps:
S1, annotating medical record text line data by labeling the characters contained in each text line, to obtain medical record training data;
S2, using the annotated medical record training data together with existing character detection data recognized by a general character recognition method as training data, and dividing the existing character detection data into a training set, a test set, and a validation set in the ratio 98:1:1;
S3, converting each 32x280x3 training picture to a 32x280x1 grayscale image; inputting the grayscale image into a convolutional neural network model, which extracts features from the picture and generates a 4x35x192 feature map; permuting the dimensions of the feature map to 35x4x192; merging the last two dimensions to give 35x768; and passing the result into a two-layer bidirectional recurrent neural network for feature extraction, to obtain 35 slices of the feature map;
S4, classifying the 35 slices of the feature map and predicting their scores with a text recognition model;
S5, for each slice, taking the class with the highest predicted score, the character corresponding to that class being the predicted character, and finally merging identical characters across the 35 slices to obtain 10 characters.
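The 98:1:1 division in step S2 can be illustrated with a minimal Python sketch. The function name `split_98_1_1`, the shuffling, and the fixed seed are assumptions made for illustration; the text specifies only the ratio.

```python
import random

def split_98_1_1(samples, seed=0):
    """Shuffle the samples and split them 98:1:1 into train, test, and validation sets.

    Shuffling with a fixed seed is an assumption; the source gives only the ratio.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_test = n // 100          # 1 part in 100
    n_val = n // 100           # 1 part in 100
    n_train = n - n_test - n_val   # the remainder, about 98 parts
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]
    return train, test, val

train, test, val = split_98_1_1(range(1000))
print(len(train), len(test), len(val))  # 980 10 10
```

Assigning the rounding remainder to the training set keeps the test and validation sets the same size, which is one reasonable choice among several.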
Further, in step S1, each item of medical record training data has the same length and width and contains 10 characters.
Further, in step S3, the 35 slices of the feature map correspond to 35 words or characters.
Further, in step S4, the text recognition model uses CTC as the loss function and uses the Adam gradient descent optimizer for back-propagation, thereby adjusting the parameters of the neural network model.
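The shape bookkeeping of step S3 can be traced with a small pure-Python sketch. The helper names are hypothetical, the grayscale conversion uses a plain channel average (the text does not give the weights), and the convolutional module is replaced by a zero-filled placeholder of the stated 4x35x192 shape. Note that 32/4 = 280/35 = 8 and 4 x 192 = 768, consistent with the shapes in the text.

```python
def to_gray(image):
    """H x W x 3 nested list -> H x W x 1, using a plain channel average (assumed weights)."""
    return [[[sum(px) / 3.0] for px in row] for row in image]

def permute_hw(fmap):
    """Swap the first two axes: H x W x C -> W x H x C."""
    h, w = len(fmap), len(fmap[0])
    return [[fmap[i][j] for i in range(h)] for j in range(w)]

def merge_last_two(fmap):
    """W x H x C -> W x (H*C): concatenate the H channel vectors of each slice."""
    return [[v for vec in col for v in vec] for col in fmap]

def shape(x):
    """Return the dimensions of a regular nested list as a tuple."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)

image = [[[0, 0, 0] for _ in range(280)] for _ in range(32)]     # 32 x 280 x 3
gray = to_gray(image)                                            # 32 x 280 x 1
# Placeholder for the CNN output; real values would come from the convolutional module.
features = [[[0.0] * 192 for _ in range(35)] for _ in range(4)]  # 4 x 35 x 192
swapped = permute_hw(features)                                   # 35 x 4 x 192
sequence = merge_last_two(swapped)                               # 35 x 768
print(shape(gray), shape(features), shape(swapped), shape(sequence))
# (32, 280, 1) (4, 35, 192) (35, 4, 192) (35, 768)
```

After the merge, each of the 35 rows is one vertical slice of the picture, which is the sequence that the bidirectional recurrent network consumes.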
The invention has the beneficial effects that:
(1) the data set of the invention adds medical words and corresponding training data, which improves the prediction of medical words;
(2) the data set is built from real medical record data, which effectively eliminates the influence of the background;
(3) the invention combines the newly designed convolutional neural network module with the two-layer bidirectional neural network, which markedly improves recognition accuracy on general text and medical record text; in particular, accuracy on medical records is higher than that of existing OCR, reaching 98.3%.
Drawings
FIG. 1 is a schematic diagram of a convolutional neural network module of the present invention;
FIG. 2 is a schematic diagram of a text recognition model according to the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
Example:
An intelligent medical record character recognition method based on a neural network comprises the following steps:
S1, annotating medical record text line data by labeling the characters contained in each text line, to obtain medical record training data;
S2, using the annotated medical record training data together with existing character detection data recognized by a general character recognition method as training data, and dividing the existing character detection data into a training set, a test set, and a validation set in the ratio 98:1:1;
S3, converting each 32x280x3 training picture to a 32x280x1 grayscale image; inputting the grayscale image into a convolutional neural network model, which extracts features from the picture and generates a 4x35x192 feature map; permuting the dimensions of the feature map to 35x4x192; merging the last two dimensions to give 35x768; and passing the result into a two-layer bidirectional recurrent neural network for feature extraction, to obtain 35 slices of the feature map;
S4, classifying the 35 slices of the feature map and predicting their scores with a text recognition model;
S5, for each slice, taking the class with the highest predicted score, the character corresponding to that class being the predicted character, and finally merging identical characters across the 35 slices to obtain 10 characters.
Further, in step S1, each item of medical record training data has the same length and width and contains 10 characters.
Further, in step S3, the 35 slices of the feature map correspond to 35 words or characters.
Further, in step S4, the text recognition model uses CTC as the loss function and uses the Adam gradient descent optimizer for back-propagation, thereby adjusting the parameters of the neural network model.
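The per-slice selection and merging described in step S5 resembles greedy CTC decoding, which can be sketched in Python as follows. The function name, the toy alphabet, and the use of class 0 as a CTC blank are assumptions; the text mentions CTC only as the loss function and does not describe a blank class.

```python
def greedy_decode(scores, alphabet, blank=0):
    """Pick the best class per slice, merge consecutive repeats, and drop blanks.

    scores:   one list of class scores per slice (35 slices in the method above)
    alphabet: alphabet[k] is the character for class k
    blank:    index of the CTC blank class (an assumption, not stated in the source)
    """
    # Highest-scoring class index for every slice.
    best = [max(range(len(row)), key=row.__getitem__) for row in scores]
    chars = []
    previous = None
    for k in best:
        # Keep a character only when it differs from the previous slice and is not blank.
        if k != previous and k != blank:
            chars.append(alphabet[k])
        previous = k
    return "".join(chars)

alphabet = ["-", "a", "b"]      # toy alphabet; index 0 plays the role of the blank
path = [1, 1, 0, 1, 2, 2]       # best class per slice in this toy example
scores = [[1.0 if k == c else 0.0 for k in range(3)] for c in path]
print(greedy_decode(scores, alphabet))  # aab
```

The blank between the two runs of `a` is what lets a genuinely doubled character survive the merge, which is why the sketch assumes one.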
The invention combines the newly designed convolutional neural network module with the two-layer bidirectional neural network. The new convolutional neural network module allows the network to be made deeper, and the feature map produced by convolution is processed by the two-layer bidirectional neural network, so that the relationship between the slices of the picture is fully taken into account. Vanishing gradients do not arise as the network deepens, and features are extracted more effectively, so the recognition accuracy of the algorithm on both general text and medical record text is markedly improved; in particular, accuracy on medical records is higher than that of existing OCR, reaching 98.3%. In addition, the data set adds medical words and corresponding training data, which improves the prediction of medical words. Finally, the data set is built from real medical record data, which effectively eliminates the influence of the background.
The above embodiment expresses only a specific implementation of the present invention; although its description is relatively specific and detailed, it is not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of protection of the present invention.

Claims (4)

1. An intelligent medical record character recognition method based on a neural network, characterized by comprising the following steps:
S1, annotating medical record text line data by labeling the characters contained in each text line, to obtain medical record training data;
S2, using the annotated medical record training data together with existing character detection data recognized by a general character recognition method as training data, and dividing the existing character detection data into a training set, a test set, and a validation set in the ratio 98:1:1;
S3, converting each 32x280x3 training picture to a 32x280x1 grayscale image; inputting the grayscale image into a convolutional neural network module, which extracts features from the picture and generates a 4x35x192 feature map; permuting the dimensions of the feature map to 35x4x192; merging the last two dimensions to give 35x768; and passing the result into a two-layer bidirectional recurrent neural network for feature extraction, to obtain 35 slices of the feature map;
S4, classifying the 35 slices of the feature map and predicting their scores with a text recognition model;
S5, for each slice, taking the class with the highest predicted score, the character corresponding to that class being the predicted character, and finally merging identical characters across the 35 slices to obtain 10 characters.
2. The method of claim 1, wherein in step S1, each item of medical record training data has the same length and width and contains 10 characters.
3. The method of claim 1, wherein in step S3, the 35 slices of the feature map correspond to 35 words or characters.
4. The method of claim 1, wherein in step S4, the text recognition model uses CTC as the loss function and uses the Adam gradient descent optimizer for back-propagation to adjust the parameters of the neural network model.
CN201910888334.XA 2019-09-19 2019-09-19 Intelligent medical record character recognition method based on neural network Withdrawn CN110633754A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910888334.XA CN110633754A (en) 2019-09-19 2019-09-19 Intelligent medical record character recognition method based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910888334.XA CN110633754A (en) 2019-09-19 2019-09-19 Intelligent medical record character recognition method based on neural network

Publications (1)

Publication Number Publication Date
CN110633754A (en) 2019-12-31

Family

ID=68971863

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910888334.XA Withdrawn CN110633754A (en) 2019-09-19 2019-09-19 Intelligent medical record character recognition method based on neural network

Country Status (1)

Country Link
CN (1) CN110633754A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3985679A1 (en) * 2020-10-19 2022-04-20 Deepc GmbH Technique for providing an interactive display of a medical image

Similar Documents

Publication Publication Date Title
CN108664996B (en) Ancient character recognition method and system based on deep learning
CN109189767B (en) Data processing method and device, electronic equipment and storage medium
KR101247891B1 (en) Method for creating image database for object recognition, processing device, and processing program
CN112632980B (en) Enterprise classification method and system based on big data deep learning and electronic equipment
Gallego et al. Staff-line removal with selectional auto-encoders
CN110245657B (en) Pathological image similarity detection method and detection device
WO2021051598A1 (en) Text sentiment analysis model training method, apparatus and device, and readable storage medium
WO2021042505A1 (en) Note generation method and apparatus based on character recognition technology, and computer device
CN108171243B (en) Medical image information identification method and system based on deep neural network
US20200134382A1 (en) Neural network training utilizing specialized loss functions
CN110097096B (en) Text classification method based on TF-IDF matrix and capsule network
En et al. A scalable pattern spotting system for historical documents
CN111145903A (en) Method and device for acquiring vertigo inquiry text, electronic equipment and inquiry system
CN107291949A (en) Information search method and device
CN111180025A (en) Method and device for representing medical record text vector and inquiry system
Rigaud et al. What do we expect from comic panel extraction?
CN110633754A (en) Intelligent medical record character recognition method based on neural network
CN113642562A (en) Data interpretation method, device and equipment based on image recognition and storage medium
CN112464957A (en) Method and device for acquiring structured data based on unstructured bid document content
CN110728240A (en) Method and device for automatically identifying title of electronic file
US11963771B2 (en) Automatic depression detection method based on audio-video
CN112560849B (en) Neural network algorithm-based grammar segmentation method and system
CN115114437A (en) Gastroscope text classification system based on BERT and double-branch network
US20170293863A1 (en) Data analysis system, and control method, program, and recording medium therefor
US11164035B2 (en) Neural-network-based optical character recognition using specialized confidence functions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20191231