CN109784341A

CN109784341A - A medical document recognition method based on an LSTM neural network

Info

Publication number: CN109784341A
Application number: CN201811589041.3A
Authority: CN
Inventors: 张宇; 朱清清
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2018-12-25
Filing date: 2018-12-25
Publication date: 2019-05-21

Abstract

The invention discloses a medical document recognition method based on an LSTM neural network, comprising the steps of: 1) document image preprocessing, in which the image signal is converted into a digital signal; 2) character segmentation, in which the document image is normalized; 3) character feature extraction, in which feature vectors are generated; 4) document recognition and classification. The method uses an LSTM neural network to recognize and classify images, and offers fast recognition, strong fault tolerance, a high recognition rate, and good classification results.

Description

A medical document recognition method based on an LSTM neural network

Technical field

The present invention relates to the technical field of image processing, and in particular to a medical document recognition method based on an LSTM neural network.

Background technique

In the insurance claims industry, claim documents such as medical invoices, medication lists, medical records, and examination reports are all important bases for settling claims. Because of data accumulation needs and regulatory requirements, insurance companies currently have a strong demand for collecting the information in original documents. Constrained by cost pressure, however, most insurance companies collect only invoice information through BPO (business process outsourcing), and the information on other documents usually becomes dormant data that cannot support insurance product design or automated control requirements. The traditional BPO mode depends on manual entry and requires documents to be sorted by hand; the personnel investment is large, data security management is complicated, and overall efficiency is very low.

At present, no recognition method has been developed specifically for insurance claim documents in the insurance industry. Most existing work targets financial documents: a neural network filter is used to segment the amount digits and the ID card number digits on a bill, image processing and feature extraction are carried out, and on this basis the digits are recognized with an improved BP (back-propagation) method.

Here, an LSTM neural network is used to recognize claim documents. LSTM is a type of recurrent neural network (RNN). All RNNs have the form of a chain of repeating neural network modules. LSTM differs from a plain RNN mainly in that it adds to the algorithm a "processor" that judges whether information is useful; the structure that plays this role is called a cell. Three gates are placed in a cell: the input gate, the forget gate, and the output gate. When a piece of information enters the LSTM network, it is judged according to rules to be useful or not. Only information that passes this check is retained, while information that does not is discarded through the forget gate. Recognizing claim documents with an LSTM neural network can therefore improve not only recognition accuracy but also classification accuracy.
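The gate mechanism described above can be sketched for a single time step as follows. This is a minimal illustration of a standard LSTM cell, not the specific network of the invention; the function name `lstm_step`, the layout of `params`, and the use of plain Python lists are assumptions made for the example.

```python
import math


def sigmoid(v):
    """Logistic function, used by the three gates."""
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a standard LSTM cell (illustrative sketch).

    params maps a gate name to (W, b). W has one row per hidden unit, and each
    row holds the weights for the concatenated vector [h_prev, x].
    """
    z = h_prev + x  # list concatenation of previous hidden state and input

    def affine(name):
        W, b = params[name]
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(W, b)]

    f = [sigmoid(v) for v in affine("forget")]       # forget gate: what to drop
    i = [sigmoid(v) for v in affine("input")]        # input gate: what to admit
    o = [sigmoid(v) for v in affine("output")]       # output gate: what to expose
    g = [math.tanh(v) for v in affine("candidate")]  # candidate cell content

    # New cell state: keep part of the old state, admit part of the candidate.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    # New hidden state: gated view of the cell state.
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c
```

With all weights and biases at zero, every gate evaluates to 0.5, so the cell state is simply halved at each step; trained weights are what let the forget gate discard or retain specific information.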

Summary of the invention

The object of the invention is to overcome the deficiencies of the prior art by proposing a medical document recognition method based on an LSTM neural network. The method can effectively extract the attribute features of a document, recognize its specific content, and automatically classify documents according to their attribute features. The LSTM network structure has low complexity and computes efficiently, so both efficiency and recognition accuracy are effectively improved.

Specifically, the method has two aspects: extracting mutually independent image units through layout analysis, and recognizing the document layout. This is mainly because document templates are of many kinds. For medical documents such as invoices, medication lists, medical records, diagnosis and treatment cards, and examination reports, template-based recognition cannot meet the demand. An end-to-end approach is needed to classify document types automatically, extract attribute fields automatically, and improve classification accuracy.

Secondly, because of factors such as limited printing precision, medical documents themselves often show misalignment, misplaced lines, surface stains, and the like. In addition, hospitals may overlay hospital seals, payment notices, and other information as required by their management, so the amount of noise is large. The document must therefore be preprocessed, including denoising, skew correction, and binarization, after which the LSTM neural network extracts the main feature information from the document, improving recognition accuracy.

To achieve the above object, the technical solution provided by the invention is a medical document recognition method based on an LSTM neural network, comprising the following steps:

1) document image preprocessing, converting the image signal into a digital signal;

2) character segmentation, normalizing the document image;

3) character feature extraction, generating feature vectors;

4) document recognition and classification.

In step 1), document image preprocessing converts the image signal into a digital signal, as follows:

While an image is being acquired, environmental interference introduces noise that affects the accuracy of document classification. Because document images tend to contain a lot of salt-and-pepper noise, the image is filtered with a median filter. During scanning, the image may also be tilted to some degree, which makes the subsequent classification harder, so skew correction is needed. A skew detection algorithm based on directional projection is used: the image is scanned with scan lines at different angles, and the maximum projection is computed for each scan-line direction; the maximum is then taken over the maximum projections of all directions, and the scan-line direction that gives this largest directional projection is the tilt direction of the document image. Before region localization and character recognition, binarization can be applied as needed. To better handle images with poor writing quality or complex backgrounds, an adaptive threshold method is used to binarize the image: pixels whose gray value is greater than a threshold gray value are set to the maximum gray value, and pixels whose gray value is less than the threshold are set to the minimum gray value. The adaptive threshold T(x, y) differs from pixel to pixel. It is obtained by computing a weighted average over the b×b region around the pixel (b is specified by a parameter), that is, by averaging all pixels in the region with weights. The result is a binary image, which completes the conversion of the image signal into a digital signal.
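As a rough illustration of the filtering and binarization described above, the sketch below applies a median filter and an adaptive threshold to a grayscale image held as a list of rows. It assumes uniform weights for the b×b neighbourhood average (the text does not give the weights) and clamps coordinates at the image border; the function names and the `offset` parameter are hypothetical.

```python
from statistics import median


def _window(img, y, x, r):
    """Collect the (2r+1) x (2r+1) neighbourhood of (y, x), clamping at borders."""
    h, w = len(img), len(img[0])
    return [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
            for dy in range(-r, r + 1) for dx in range(-r, r + 1)]


def median_filter(img, k=3):
    """Replace each pixel by the median of its k x k neighbourhood.

    Isolated salt-and-pepper pixels are outliers in their window, so the
    median removes them while keeping edges fairly sharp.
    """
    r = k // 2
    return [[median(_window(img, y, x, r)) for x in range(len(img[0]))]
            for y in range(len(img))]


def adaptive_threshold(img, b=3, offset=0, max_val=255):
    """Binarize with a per-pixel threshold T(x, y).

    Assumption: T(x, y) is the plain mean of the b x b neighbourhood minus
    `offset`. Pixels above T become max_val, all others become 0.
    """
    r = b // 2
    out = []
    for y in range(len(img)):
        row = []
        for x in range(len(img[0])):
            win = _window(img, y, x, r)
            t = sum(win) / len(win) - offset
            row.append(max_val if img[y][x] > t else 0)
        out.append(row)
    return out
```

Because the threshold follows the local brightness, a dark stroke on an unevenly lit or stained background is still separated from its surroundings, which a single global threshold may fail to do.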

In step 2), character segmentation and document image normalization proceed as follows:

Step 1) yields the binary image of the document, and characters are segmented from this binary image. First, a horizontal projection is taken over the multi-character target, and the target is split into separate rows according to the projection values on the Y axis. Next, a vertical projection is taken over the characters in each row, and the row is split into columns according to the projection values on the X axis. Individual characters can then be cut out according to the row and column values. Taking the vertical projection as an example, the specific practice is to sweep a vertical line across a row of characters from left to right and decide whether a position contains a character according to whether the line meets a black pixel there. Finally, each segmented character image is normalized to a single character image of 24×24 pixels.
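The projection-based segmentation described above can be sketched as follows. The example assumes the binary image is a list of rows in which 1 marks a character pixel and 0 marks background; the function names and the box format (top, bottom, left, right) are hypothetical.

```python
def runs(profile):
    """Return (start, end) pairs for each stretch of non-zero profile values."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i                      # a run of character pixels begins
        elif not v and start is not None:
            spans.append((start, i))       # the run ended just before index i
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_characters(binary):
    """Split a binary image into character boxes (top, bottom, left, right).

    Rows are found from the horizontal projection (pixel counts along the
    Y axis), then each row is cut into characters using its vertical
    projection (pixel counts along the X axis). End indices are exclusive.
    """
    boxes = []
    row_profile = [sum(row) for row in binary]
    for top, bottom in runs(row_profile):
        line = binary[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]
        for left, right in runs(col_profile):
            boxes.append((top, bottom, left, right))
    return boxes
```

A column whose projection value is zero is exactly a position where the sweeping vertical line meets no character pixel, so it marks a gap between two characters.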

In step 3), character features are extracted and feature vectors are generated, as follows:

A coarse grid feature extraction method is used. Each separate binarized character is divided horizontally and vertically into n grid cells, and for each cell the ratio of the character pixels in that cell (character pixels are set as white pixels) to the total number of character pixels is taken. Arranging all the ratio values in a single column forms an n-dimensional feature vector. In the invention, the normalized character image is divided into 20 parts vertically and 12 parts horizontally, so a column matrix of 20 × 12 = 240 values, each 1 or 0, serves as the input feature of the character. Each input sample thus has 240 features, which fixes the number of input layer nodes at 240.
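A minimal sketch of coarse grid feature extraction is given below. It computes the per-cell ratio described above for a 20 × 12 grid; how these ratios are reduced to the 1-or-0 values also mentioned in the text is not specified, so that step is left out. Because 24 is not a multiple of 20, the cell boundaries are placed by integer division, which is an assumption of this example, as is the function name.

```python
def coarse_grid_features(char_img, rows=20, cols=12):
    """Return rows * cols ratios: character pixels per cell / total character pixels.

    char_img is a list of rows with 1 for character pixels and 0 otherwise.
    Cell r spans image rows [r * h // rows, (r + 1) * h // rows), and
    likewise for columns, so cells may differ in size by one pixel.
    """
    h, w = len(char_img), len(char_img[0])
    total = sum(map(sum, char_img))
    features = []
    for r in range(rows):
        y0, y1 = r * h // rows, (r + 1) * h // rows
        for c in range(cols):
            x0, x1 = c * w // cols, (c + 1) * w // cols
            count = sum(char_img[y][x]
                        for y in range(y0, y1) for x in range(x0, x1))
            features.append(count / total if total else 0.0)
    return features
```

For a non-empty character the 240 ratios sum to 1, so the vector describes where the strokes lie rather than how many pixels the character has.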

In step 4), document recognition and classification proceed as follows:

First, an LSTM model is defined. The parameters to be passed in are: the dimension of the input data is 20, the input dimension is 240, the number of layers is 2, and the number of output nodes, i.e. the number of classes, is 10 (set according to the specific case). The number of hidden nodes is determined by the following two empirical formulas and adjusted according to the actual situation:

where m is the number of hidden nodes, n is the number of input layer nodes, l is the number of output layer nodes, and α is a constant between 1 and 10;
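The two empirical formulas themselves do not appear in this text. A widely cited rule of thumb that uses the same symbols is m = sqrt(n + l) + α; the sketch below assumes that rule purely for illustration and should not be read as the formula of the invention. The function name is hypothetical.

```python
import math


def hidden_nodes_rule_of_thumb(n, l, alpha):
    """Estimate the hidden node count as sqrt(n + l) + alpha, rounded.

    Assumption: this common heuristic stands in for the formulas that are
    missing from the text. n = input nodes, l = output nodes,
    alpha = constant between 1 and 10.
    """
    if not 1 <= alpha <= 10:
        raise ValueError("alpha is expected to lie between 1 and 10")
    return round(math.sqrt(n + l) + alpha)
```

With n = 240 and l = 10 as in the text, this heuristic gives values from 17 (α = 1) to 26 (α = 10), which would then be tuned according to the actual situation.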

The LSTM neural network is divided into an input layer, hidden layers, and an output layer. The input layer receives information and passes it to the hidden layers; the hidden layers transform the information, and the last hidden layer passes it to the output layer; the output layer outputs the processing result. The learning process of the LSTM neural network consists of forward propagation and error back-propagation. In forward propagation, data starts at the input layer, is computed layer by layer through the hidden layers, and is passed to the output layer. If the actual output of the output layer does not match the desired output, the error of the output layer is computed and propagated backward: the output error is passed back through the hidden layers to the input layer in some form and is apportioned to all neurons of each layer, giving the error of each layer's neurons, which serves as the basis for correcting the parameters of those neurons. Finally, information such as drug names and amounts in the document is recognized;

Finally, an external softmax classifier is attached: the last part of the output is passed to the classifier to compute the class probabilities, which gives the final class of the document.
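The softmax classification step can be sketched as follows. The input is assumed to be one score per class produced from the last output of the network; the function names and the class labels in the usage note are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels):
    """Return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

For example, `classify([0.1, 2.0, -1.0], ["invoice", "medication list", "medical record"])` selects "medication list", the class with the largest score and therefore the largest probability.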

Compared with the prior art, the invention has the following advantages and beneficial effects:

1. The invention uses an LSTM neural network to recognize and classify images. Recognition is fast, fault tolerance is strong, and the recognition rate is high, and the adverse effects of broken character strokes and uneven stroke thickness are effectively avoided.

2. The invention uses an LSTM neural network, which needs few iterations and achieves high training accuracy, a high recognition rate, and good classification results.

3. The invention uses a new skew correction algorithm that both reduces the number of scans and increases scanning speed.

4. The network structure of the invention is simple, and recognition and classification are carried out at the same time, which reduces the amount of computation and makes the calculation efficient, so that real-time operation is achieved.

Description of the drawings

Fig. 1 is a flow chart of insurance claim document recognition.

Fig. 2 is a structure diagram of the document recognition and classification network.

Specific embodiment

The invention is further described below with reference to a specific embodiment.

In the medical document recognition method based on an LSTM neural network provided by this embodiment, a hospital charge document is input and recognized. The complete flow of document image recognition is shown in Fig. 1. The image file is first preprocessed, and an algorithm converts the image signal into a digital signal; next, the characters in the image are segmented and normalized to a uniform size; then image features are extracted and feature vectors are generated; the LSTM neural network is then used to recognize the image content; finally, a softmax classifier classifies the document. The method specifically comprises the following steps:

1) Image preprocessing. First, median filtering is applied to the input document image to remove salt-and-pepper noise. If the image to be recognized is strongly tilted, it is first scanned with a relatively large scan-line angle step; the maximum directional projection is found and the corresponding scan-line angle is recorded. A neighborhood centered on that angle is then used as the fine detection range to find the tilt angle. Once the tilt angle of the image has been detected, the skew of the image can be corrected. Finally, the image is binarized with the adaptive threshold method, which converts the image signal into a digital signal and facilitates the subsequent feature extraction.
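The coarse-to-fine skew search described in this step can be sketched as follows. The example works on a list of (x, y) coordinates of character pixels and treats the "projection" of a scan line as the number of character pixels lying on it; the step sizes, the ±45° search limit, and the function names are assumptions made for the illustration.

```python
import math
from collections import Counter


def max_projection(points, angle_deg):
    """Largest number of character pixels met by any scan line at this angle."""
    t = math.radians(angle_deg)
    s, c = math.sin(t), math.cos(t)
    # Pixels on the same scan line share the same signed distance from the
    # line through the origin, so distances are rounded and counted per bin.
    bins = Counter(round(y * c - x * s) for x, y in points)
    return max(bins.values())


def detect_skew(points, coarse_step=5.0, fine_step=0.5, limit=45.0):
    """Find the scan-line angle with the largest maximum projection.

    A coarse pass over [-limit, limit] locates the approximate angle, then a
    fine pass searches the neighbourhood of one coarse step on either side.
    """
    def best(lo, hi, step):
        n = int(round((hi - lo) / step))
        angles = [lo + k * step for k in range(n + 1)]
        return max(angles, key=lambda a: max_projection(points, a))

    coarse = best(-limit, limit, coarse_step)
    return best(coarse - coarse_step, coarse + coarse_step, fine_step)
```

With these defaults the search evaluates 19 coarse angles and 21 fine angles instead of the 181 that a single pass at 0.5° over the same range would need, which is the saving in scans that the two-stage search aims for.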

2) Character segmentation and document image normalization. First, a horizontal projection is taken over the multi-character target, and the target is split into separate rows according to the projection values on the Y axis. Next, a vertical projection is taken over the characters in each row, and the row is split into columns according to the projection values on the X axis. Individual characters can then be cut out according to the row and column values. Taking the vertical projection as an example, the specific practice is to sweep a vertical line across a row of characters from left to right and decide whether a position contains a character according to whether the line meets a black pixel there. Finally, each segmented character image is normalized to a single character image of 24×24 pixels.
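The final normalization to 24×24 pixels can be sketched as follows. The text does not say which interpolation is used, so nearest-neighbour sampling is assumed here; the function name is hypothetical.

```python
def normalize_character(char_img, size=24):
    """Resize a character image to size x size by nearest-neighbour sampling.

    Assumption: each output pixel copies the source pixel whose index is the
    output index scaled by (source extent / size), rounded down.
    """
    h, w = len(char_img), len(char_img[0])
    return [[char_img[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]
```

Nearest-neighbour sampling keeps every pixel at 0 or 1, so the normalized image stays binary and can be passed directly to feature extraction.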

3) Character feature extraction and feature vector generation. The normalized character image is divided into 20 parts vertically and 12 parts horizontally, so a column matrix of 20 × 12 = 240 values, each 1 or 0, serves as the input feature of the character. Each input sample thus has 240 features, which fixes the number of input layer nodes at 240.

4) The structure of the document recognition and classification network is shown in Fig. 2. Recognition and classification proceed as follows:

An LSTM model is defined. The parameters to be passed in are: the dimension of the input data is 20, the input dimension is 240, the number of layers is 2, and the number of output nodes, i.e. the number of classes, is 10. The number of hidden nodes is determined by the following two empirical formulas and adjusted according to the actual situation:

where m is the number of hidden nodes, n is the number of input layer nodes, l is the number of output layer nodes, and α is a constant between 1 and 10.

The LSTM neural network is divided into an input layer, hidden layers, and an output layer. The input layer receives information and passes it to the hidden layers; the hidden layers transform the information, and the last hidden layer passes it to the output layer; the output layer outputs the processing result.

First, the 240-dimensional feature is fed to the input layer of the LSTM network and passed on to the hidden layers. The hidden layers obtain the information in the feature and exchange it, and the last hidden layer passes the information to the output layer. After the layer-by-layer computation, if the actual output of the output layer does not match the desired output, the error of the output layer is computed and propagated backward: the output error is passed back through the hidden layers to the input layer in some form and is apportioned to all neurons of each layer, giving the error of each layer's neurons, which serves as the basis for correcting the parameters of those neurons. When the error reaches its minimum, the recognition result is obtained, and information such as drug names and amounts in the document is recognized.

Finally, the fused output features of the LSTM network are fed to the softmax classifier, which computes the probability of each class according to the preset number of classes. The class with the highest probability is taken as the class of the document, which gives the final classification result.

The embodiment described above is only a preferred embodiment of the invention and does not limit its scope of implementation; any change made according to the shape and principle of the invention shall fall within the scope of protection of the invention.

Claims

1. A medical document recognition method based on an LSTM neural network, characterized by comprising the following steps:

1) document image preprocessing, converting the image signal into a digital signal;

2) character segmentation, normalizing the document image;

3) character feature extraction, generating feature vectors;

4) document recognition and classification.

2. The medical document recognition method based on an LSTM neural network according to claim 1, characterized in that in step 1), document image preprocessing converts the image signal into a digital signal, as follows:

Image preprocessing includes filtering, skew correction, and binarization. The image is first filtered with a median filter; then a skew detection algorithm based on directional projection is used, in which the image is scanned with scan lines at different angles and the maximum projection is computed for each scan-line direction; the maximum is then taken over the maximum projections of all directions, and the scan-line direction that gives the largest directional projection is the tilt direction of the document image; finally, the image is binarized with an adaptive threshold method: pixels whose gray value is greater than a threshold gray value are set to the maximum gray value, and pixels whose gray value is less than the threshold are set to the minimum gray value; the adaptive threshold differs from pixel to pixel and is obtained by computing a weighted average over the region around the pixel, that is, by averaging all pixels in the region with weights; the result is a binary image, and the image signal is thereby converted into a digital signal.

3. The medical document recognition method based on an LSTM neural network according to claim 1, characterized in that in step 2), character segmentation and document image normalization proceed as follows:

First, a horizontal projection is taken over the multi-character target, and the target is split into separate rows according to the projection values on the Y axis; next, a vertical projection is taken over the characters in each row, and the row is split into columns according to the projection values on the X axis; individual characters are cut out according to the row and column values; for the vertical projection, the specific practice is to sweep a vertical line across a row of characters from left to right and decide whether a position contains a character according to whether the line meets a black pixel there; finally, each segmented character image is normalized to a single character image of 24×24 pixels.

4. The medical document recognition method based on an LSTM neural network according to claim 1, characterized in that in step 3), character features are extracted and feature vectors are generated, as follows:

A coarse grid feature extraction method is used: each separate binarized character is divided horizontally and vertically into n grid cells, the ratio of the character pixels in each cell to the total number of character pixels is taken, and all the ratio values are arranged in a single column to form an n-dimensional feature vector; the normalized character image is divided into 20 parts vertically and 12 parts horizontally, so a column matrix of 20 × 12 = 240 values, each 1 or 0, serves as the input feature of the character; each input sample thus has 240 features, which fixes the number of input layer nodes at 240.

5. The medical document recognition method based on an LSTM neural network according to claim 1, characterized in that in step 4), document recognition and classification proceed as follows:

An LSTM model is defined; the parameters to be passed in are: the dimension of the input data is 20, the input dimension is 240, the number of layers is 2, and the number of output nodes, i.e. the number of classes, is 10; the number of hidden nodes is determined by the following two empirical formulas and adjusted according to the actual situation:

where m is the number of hidden nodes, n is the number of input layer nodes, l is the number of output layer nodes, and α is a constant between 1 and 10;

The LSTM neural network is divided into an input layer, hidden layers, and an output layer; the input layer receives information and passes it to the hidden layers; the hidden layers transform the information, and the last hidden layer passes it to the output layer; the output layer outputs the processing result; the learning process of the LSTM neural network consists of forward propagation and error back-propagation; in forward propagation, data starts at the input layer, is computed layer by layer through the hidden layers, and is passed to the output layer; if the actual output of the output layer does not match the desired output, the error of the output layer is computed and propagated backward, that is, the output error is passed back through the hidden layers to the input layer in some form and is apportioned to all neurons of each layer, giving the error of each layer's neurons, which serves as the basis for correcting the parameters of those neurons; finally, the information in the document, including drug names and amounts, is recognized;