CN107247950A

CN107247950A - An ID card image text recognition method based on machine learning

Info

Publication number: CN107247950A
Application number: CN201710416957.8A
Authority: CN
Inventors: 屈鸿; 黄鹂; 高榕; 刘永胜; 张翮; 史冬霞; 陈珊; 汪文; 汪一文
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2017-06-06
Filing date: 2017-06-06
Publication date: 2017-10-13

Abstract

The invention discloses an ID card image text recognition method based on machine learning, belonging to the technical fields of image processing, machine vision, and neural networks. It addresses the problems of prior-art OCR when automatically recognizing ID card images against complex backgrounds: long recognition time, low recognition accuracy, and poor resistance to rotation and distortion. The method comprises: acquiring a captured image, preprocessing it, and separating the ID card image from the complex background image in the preprocessed image; performing text region detection on the detected ID card image, then segmenting the detected text regions to obtain individual characters; and recognizing the segmented characters with a deep-learning-based character recognition model and outputting the recognition result. The invention is used for text recognition on ID card images.

Description

An ID card image text recognition method based on machine learning

Technical field

An ID card image text recognition method based on machine learning, used for text recognition on ID card images, belonging to the technical fields of image processing, machine vision, and neural networks.

Background art

Certificate recognition uses optical character recognition (OCR) technology to recognize the text information on a certificate. Specifically, it refers to the process of using OCR to analyze and recognize a scanned or photographed certificate image in order to obtain the text information on the certificate. Compared with traditional manual entry, automatic OCR information entry has a large advantage: it far exceeds human working efficiency in both speed and accuracy. In particular, as working time increases and fatigue sets in, a person's entry speed drops and accuracy naturally drops as well. Humans cannot outperform machines at mechanical, tedious work. To optimize the allocation of resources, it is imperative to free people from such work so that they can take on other tasks, and OCR technology was born out of this demand.

The purpose of an OCR system is to extract the text from an image file and then restore the layout. An OCR system is usually implemented in four steps: image preprocessing, text region detection, character segmentation, and character recognition:

(1) Image preprocessing

Image preprocessing mainly includes binarization, image noise reduction, skew correction, and so on. It is the first step of the recognition process and serves to improve the efficiency and accuracy of the subsequent processing units. Taking an RGB colour image as an example, each pixel contains three colour components, whereas a binary image needs only one component, so a colour image occupies three times the storage of a binary image. Such a large amount of information is both computationally heavy and of high computational complexity, so the picture needs to be binarized. In addition, because picture quality is uneven, preprocessing must first denoise the image to be recognized according to the characteristics of its noise. Furthermore, manually captured images are often skewed, so skew correction is also a very important step that makes the later text scanning easier. The image preprocessing steps need not form a fixed pipeline; different recognition requirements call for adjusting the steps according to experimental results. In general, recognizing scanned PDF or Word files needs fewer and simpler preprocessing steps, whereas images with complex environments, such as licence plates, ID cards, and street-view billboards, need more elaborate ones.
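As a minimal illustration of the binarization step described above, the following sketch reduces three colour components per pixel to one luminance value and then thresholds it. The luminance weights and the threshold of 128 are assumptions chosen for illustration and are not taken from the source.

```python
def to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of 0-255 luminance values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def binarize(gray_image, threshold=128):
    """Map each pixel to 1 (dark, likely ink) or 0 (light, likely background)."""
    return [[1 if value < threshold else 0 for value in row]
            for row in gray_image]


# A tiny 2x3 colour image: a dark vertical stroke on a light background.
rgb = [
    [(255, 255, 255), (10, 10, 10), (250, 250, 250)],
    [(240, 240, 240), (20, 20, 20), (255, 255, 255)],
]
binary = binarize(to_gray(rgb))
```

Each pixel of `binary` carries a single component instead of three, which is the storage reduction the paragraph refers to.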

(2) Text region detection

After image preprocessing, the next step is usually to detect the text regions in the image. Traditional text region detection methods include layout segmentation based on connected regions and segmentation based on texture features; in recent years, popular object detection methods include deep-neural-network-based methods such as Fast R-CNN.
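The connected-region idea mentioned above can be sketched as a breadth-first flood fill over a binary image that returns one bounding box per region. This is a generic illustration that assumes 4-connectivity and foreground pixels equal to 1; it is not the specific layout segmentation method the source has in mind.

```python
from collections import deque


def connected_regions(binary):
    """Return (top, left, bottom, right) boxes of 4-connected foreground regions."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Start a new region and grow it with a breadth-first search.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


image = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
boxes = connected_regions(image)
```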

(3) Character segmentation

Character segmentation is the first step of character recognition: a robust character segmentation algorithm can completely cut out the digits, letters, and Chinese characters on an ID card. Commonly used character segmentation algorithms currently fall into two classes. The first is fixed-spacing segmentation, which cuts the image at a fixed interval to extract candidate characters. This approach suits alphabetic text or digits, for a simple reason: printed Western-language letters and digits tend to be highly uniform. The second is non-fixed-spacing segmentation, such as the vertical projection method, which is better suited to Chinese characters, with their distinctive structure, or to segmentation that targets whole words. Because the ID card recognition engine explored in this technology is an integrated system that must recognize letters, digits, and Chinese characters alike, this technology adopts the second, non-fixed-spacing approach and makes certain improvements on that basis.
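The vertical projection method named above can be sketched as follows: count the foreground pixels in each column of a binarized text line and cut wherever the count falls to zero. The binary input format (1 for ink, 0 for background) and the function names are assumptions made for this illustration.

```python
def vertical_projection(binary):
    """Count foreground pixels (value 1) in each column."""
    return [sum(column) for column in zip(*binary)]


def split_columns(binary):
    """Return (start, end) column spans, end exclusive, separated by empty columns."""
    spans, start = [], None
    for x, count in enumerate(vertical_projection(binary)):
        if count > 0 and start is None:
            start = x                      # a new run of ink begins
        elif count == 0 and start is not None:
            spans.append((start, x))       # the run ended at an empty column
            start = None
    if start is not None:                  # a run touching the right border
        spans.append((start, len(binary[0])))
    return spans


line = [
    [0, 1, 1, 0, 0, 1, 0, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 0, 1, 0, 1],
]
spans = split_columns(line)
```

Because the cut positions follow the gaps in the projection rather than a fixed interval, spans of different widths come out naturally, which is why this class of method suits mixed Chinese, letter, and digit text.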

(4) Character recognition

Character recognition is the final step of the whole OCR pipeline and also a very important one: the recognition accuracy of this module determines whether the whole OCR system is usable. Character recognition algorithms have long been designed on the basis of mathematical theory; well-known methods include template matching (i.e., structural pattern recognition) and statistical pattern recognition. Since the rise of deep learning, its deep abstraction of features lets it extract higher-dimensional features, and recognizing characters with deep learning techniques has set off a wave of interest in the field.

The shortcoming of conventional OCR is that it recognizes well only formatted documents such as Word documents; it cannot handle certificate recognition against complex backgrounds well, which leads to long recognition time, low recognition accuracy, and poor resistance to rotation and distortion.

Summary of the invention

To address the above shortcomings, the present invention provides an ID card image text recognition method based on machine learning. It solves the problems of prior-art OCR when automatically recognizing ID card images against complex backgrounds: long recognition time, low recognition accuracy, and poor resistance to rotation and distortion.

The technical solution adopted by the present invention is as follows:

An ID card image text recognition method based on machine learning, characterized by comprising the following steps:

Step 1: preprocess the acquired captured image, and separate the ID card image in the preprocessed image from the complex background image;

Step 2: perform text region detection on the detected ID card image, then segment the detected text regions to obtain individual characters;

Step 3: recognize the segmented characters with a character recognition model based on deep learning, and output the recognition result.

Further, the specific steps of step 1 are as follows:

(11) Preprocess the captured image using Gaussian blur and grayscale conversion;

(12) On the image preprocessed in step (11), perform ID card edge detection using the Canny operator and the Sobel operator;

(13) Using binarization and an AND operation, cut out the ID card region enclosed by the edges detected in step (12) to obtain the ID card image region;

(14) Select contours in the ID card image region using an SVM classifier to obtain the correct ID card contour image;

(15) Correct the irregularly skewed image obtained in step (14) using the Hough transform and a perspective transform.
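To make the edge detection in step (12) more concrete, the sketch below applies the two standard 3x3 Sobel kernels to a small grayscale grid and sums the absolute horizontal and vertical responses. It is a simplified, pure-Python illustration: the |Gx| + |Gy| approximation, the zero border, and the toy image are assumptions, and the Canny stage (non-maximum suppression and hysteresis thresholding) is not shown.

```python
def sobel_magnitude(gray):
    """Approximate gradient magnitude |Gx| + |Gy| for interior pixels; border stays 0."""
    rows, cols = len(gray), len(gray[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Horizontal gradient: right column minus left column.
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out


# Dark background (0) with a bright card region (255) starting at column 2.
gray = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
edges = sobel_magnitude(gray)
```

The strong response in the interior row marks the vertical boundary between background and card, which is the kind of edge that step (13) then uses to cut out the card region.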

Further, the specific steps of step 2 are as follows:

(21) Build a network of high-level features obtained by cascading three autoencoders; based on this network, judge at the pixel level whether each pixel belongs to a text region, and extract the precise text region. The specific steps are:

(211) The first autoencoder randomly takes 500k blocks of size 5*5 from all given training pictures as input, denoted x⁽¹⁾; then x⁽¹⁾ ∈ R⁷⁵, where R denotes the real number space and R⁷⁵ means that x is a 75-dimensional vector. For the 500k input blocks of size 5*5, the number of hidden neurons is determined by the results of repeated experiments and is finally set to 40. The 500k input blocks and the hidden neuron number are then used to train the autoencoder; after the network converges, the result of the encoder part of the first autoencoder, f⁽¹⁾, is obtained, with f⁽¹⁾ ∈ R⁴⁰;

(212) The second autoencoder randomly takes 500k blocks of size 3*3 from the feature map matrix obtained in step (211) as input, denoted x⁽²⁾. Let x⁽²⁾ be given by the formula [formula missing in the source text], where "+" indicates that x⁽²⁾ is formed by directly concatenating nine x⁽¹⁾ and w denotes the weights, so that x⁽²⁾ ∈ R³⁶⁰ (9 × 40 = 360). The number of hidden neurons of the second autoencoder is set to 30. The 500k blocks of size 3*3 and the hidden neuron number are used to train the autoencoder, yielding the result of the encoder part of the second autoencoder, f⁽²⁾, with f⁽²⁾ ∈ R³⁰;

(213) The third autoencoder randomly takes 200k blocks of size 3*3 from the feature map matrix obtained in step (212) as input, denoted x⁽³⁾, with x⁽³⁾ ∈ R²⁷⁰ (9 × 30 = 270), where each small block in the 3*3 block overlaps the next small block by 5 pixels. The number of hidden neurons of the third autoencoder is set to 20. After the 200k blocks of size 3*3 and the hidden neuron number have been used to train the autoencoder, the result of the encoder part of the third autoencoder, f⁽³⁾, is obtained, with f⁽³⁾ ∈ R²⁰;

(214) From steps (211) to (213), three kinds of features are obtained for the centre point of a 5*5 block. Let f = f⁽¹⁾ + f⁽²⁾ + f⁽³⁾, where "+" denotes direct concatenation, forming a 90-dimensional mixed feature (40 + 30 + 20 = 90). The 90-dimensional mixed feature is fed into an SVM model for classification training, which finally yields an SVM classification model. After training, the SVM classification model scans the ID card image separated in step 1 and judges whether each pixel is part of a text region, thereby extracting the precise text region;
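The dimension bookkeeping in steps (211) to (214) can be checked with a short sketch. It uses untrained random weights as stand-ins for the trained encoders, a sigmoid activation, and the same code repeated nine times in place of nine neighbouring blocks; all of these are assumptions made only to show how the 75, 360, 270, and 90 dimensions arise. Reading 75 as 5 × 5 pixels × 3 colour channels is likewise an inference from the stated sizes.

```python
import math
import random


def encode(x, weights, bias):
    """Encoder half of an autoencoder: sigmoid(W x + b)."""
    return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
            for row, b in zip(weights, bias)]


def random_layer(n_in, n_hidden, rng):
    """Untrained stand-in weights; a real system would use trained parameters."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_hidden)]
    return weights, [0.0] * n_hidden


rng = random.Random(0)
layer1 = random_layer(5 * 5 * 3, 40, rng)   # x1 in R^75  -> f1 in R^40
layer2 = random_layer(9 * 40, 30, rng)      # x2 in R^360 -> f2 in R^30
layer3 = random_layer(9 * 30, 20, rng)      # x3 in R^270 -> f3 in R^20

x1 = [rng.random() for _ in range(75)]      # placeholder 5*5 block
f1 = encode(x1, *layer1)
x2 = [v for _ in range(9) for v in f1]      # placeholder: f1 repeated 9 times
f2 = encode(x2, *layer2)
x3 = [v for _ in range(9) for v in f2]      # placeholder: f2 repeated 9 times
f3 = encode(x3, *layer3)

feature = f1 + f2 + f3                      # list "+" is concatenation
```

The concatenated `feature` is the per-pixel vector that would then be passed to the SVM classifier to decide whether the pixel belongs to a text region.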

(22) Take the precise text region and perform character segmentation. The specific steps are as follows:

(221) Compute the average Chinese character width W₁ and the average digit width W₂ in the precise text region as the segmentation standard;

(222) Scan the first text region and record its width from its start point to its end point. If the width of the segmented text region is close to the average Chinese character width, treat the segmented region as one Chinese character; otherwise go to step (223);

(223) If the text region width is much smaller than the average digit width, the region is noise and is discarded. If the text region width is close to the average digit width, pass the region to a trained SVM digit classifier to judge whether it is a digit; if it is a digit, scan the next text region, otherwise go to step (224);

(224) Inspect the right side of the current text region and try to join the two regions, then judge again whether the joined region is a Chinese character or a digit; if it is still neither, try merging the region to the right of the previously merged region and repeat the Chinese character or digit judgment.
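Steps (221) to (224) can be sketched as a single loop over candidate regions. The source does not quantify "close to" or "much smaller", so the relative tolerance `tol`, the half-width noise cutoff, and the rule for giving up on an over-wide merge are all assumptions; `is_digit` is a hypothetical callable standing in for the trained SVM digit classifier.

```python
def cut_characters(regions, w_char, w_digit, is_digit, tol=0.2):
    """Classify or merge (start, end) regions using average character widths."""
    def close(width, target):
        return abs(width - target) <= tol * target

    result, i = [], 0
    while i < len(regions):
        start, end = regions[i]
        j = i
        while True:
            width = end - start
            if close(width, w_char):               # step (222): Chinese character
                result.append((start, end, "hanzi"))
                break
            if width < 0.5 * w_digit:              # step (223): assumed noise cutoff
                break
            if close(width, w_digit) and is_digit(start, end):
                result.append((start, end, "digit"))
                break
            # Step (224): merge with the region on the right, unless none is
            # left or the span is already too wide (assumed stopping rule).
            if j + 1 >= len(regions) or width > w_char * (1 + tol):
                break
            j += 1
            end = regions[j][1]
        i = j + 1
    return result


regions = [(0, 20), (22, 31), (33, 43), (45, 47), (50, 60)]
chars = cut_characters(regions, w_char=20, w_digit=10,
                       is_digit=lambda s, e: s >= 50)
```

In this toy run the two narrow regions (22, 31) and (33, 43) are merged into one Chinese character, as can happen with left-right structured characters, while the 2-pixel region is dropped as noise.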

Further, the specific steps of step 3 are as follows:

(31) Build a network model for character recognition, consisting of an input layer, multiple convolutional layers, multiple sampling layers, a fully connected layer, and an output layer;

(32) Train the network weight parameters of the network model using the collected training dataset;

(33) Recognize the segmented characters using the network model with the trained weight parameters, and output the result.
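The source names only the layer types of the recognition network, not their sizes. As a rough illustration of how alternating convolutional and sampling layers shrink a character image before the fully connected layer, the sketch below traces the feature map side length through a small stack. The 32x32 input, 5x5 kernels, 2x sampling, and 16 feature maps are all assumed values.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window):
    """Output side length of non-overlapping sampling (pooling)."""
    return size // window


# Assumed layer sizes; the source specifies only the layer types.
size = 32                      # assumed 32x32 input character image
size = conv_out(size, 5)       # convolution 5x5 -> 28
size = pool_out(size, 2)       # sampling 2x     -> 14
size = conv_out(size, 5)       # convolution 5x5 -> 10
size = pool_out(size, 2)       # sampling 2x     -> 5
flattened = size * size * 16   # assumed 16 feature maps into the fully connected layer
```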

In summary, by adopting the above technical solution, the beneficial effects of the present invention are:

1. The present invention automatically recognizes ID card images against complex backgrounds, with short recognition time, high recognition accuracy, and resistance to rotation and distortion.

Brief description of the drawings

Fig. 1 is a detailed flow chart of ID card image detection in the present invention;

Fig. 2 is the overall flow chart of the ID card image text recognition technique of the present invention.

Embodiment

In order to make the purpose, technical solution, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention and are not intended to limit it.

Step 1: acquire the captured image, preprocess it, and separate the ID card image in the preprocessed image from the complex background image. The specific steps are as follows:

(12) On the image preprocessed in step (11), perform ID card edge detection using the Canny operator and the Sobel operator;

(14) Select contours in the ID card image region using an SVM classifier to obtain the correct ID card contour image.

(15) Correct the irregularly skewed image using the Hough transform and a perspective transform.

Step 2: perform text region detection on the detected ID card image, then segment the detected text regions to obtain individual characters. The specific steps are as follows:

(211) The first autoencoder randomly takes 500k blocks of size 5*5 (500k cropped 5*5 image blocks) from all given training pictures (approximately 1000 training pictures) as input, denoted x⁽¹⁾; then x⁽¹⁾ ∈ R⁷⁵, where R denotes the real number space and R⁷⁵ means that x is a 75-dimensional vector. For the 500k input blocks of size 5*5, the number of hidden neurons is determined by the results of repeated experiments and is finally set to 40. The 500k input blocks and the hidden neuron number are then used to train the autoencoder; after the network converges, the result of the encoder part of the first autoencoder, f⁽¹⁾, is obtained, with f⁽¹⁾ ∈ R⁴⁰;

(212) The second autoencoder randomly takes 500k blocks of size 3*3 from the feature map matrix obtained in step (211) as input, denoted x⁽²⁾. Let x⁽²⁾ be given by the formula [formula missing in the source text], where "+" indicates that x⁽²⁾ is formed by directly concatenating nine x⁽¹⁾ and w denotes the weights, so that x⁽²⁾ ∈ R³⁶⁰. The number of hidden neurons of the second autoencoder is set to 30. The 500k blocks of size 3*3 and the hidden neuron number are used to train the autoencoder, yielding the result of the encoder part of the second autoencoder, f⁽²⁾, with f⁽²⁾ ∈ R³⁰;

Step 3: recognize the segmented characters with a character recognition model based on deep learning, and output the recognition result. The specific steps are as follows:

The foregoing is merely a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. An ID card image text recognition method based on machine learning, characterized by comprising the following steps:

Step 1: preprocess the acquired captured image, and separate the ID card image in the preprocessed image from the complex background image;

Step 2: perform text region detection on the detected ID card image, then segment the detected text regions to obtain individual characters;

Step 3: recognize the segmented characters with a character recognition model based on deep learning, and output the recognition result.

2. The ID card image text recognition method based on machine learning according to claim 1, characterized in that the specific steps of step 1 are as follows:

(12) On the image preprocessed in step (11), perform ID card edge detection using the Canny operator and the Sobel operator;

(13) Using binarization and an AND operation, cut out the ID card region enclosed by the edges detected in step (12) to obtain the ID card image region;

(14) Select contours in the ID card image region using an SVM classifier to obtain the correct ID card contour image;

(15) Correct the irregularly skewed image obtained in step (14) using the Hough transform and a perspective transform.

3. The ID card image text recognition method based on machine learning according to claim 1, characterized in that the specific steps of step 2 are as follows:

(21) Build a network of high-level features obtained by cascading three autoencoders; based on this network, judge at the pixel level whether each pixel belongs to a text region, and extract the precise text region. The specific steps are:

(211) The first autoencoder randomly takes 500k blocks of size 5*5 from all given training pictures as input, denoted x⁽¹⁾; then x⁽¹⁾ ∈ R⁷⁵, where R denotes the real number space and R⁷⁵ means that x is a 75-dimensional vector. For the 500k input blocks of size 5*5, the number of hidden neurons is determined by the results of repeated experiments and is finally set to 40. The 500k input blocks and the hidden neuron number are then used to train the autoencoder; after the network converges, the result of the encoder part of the first autoencoder, f⁽¹⁾, is obtained, with f⁽¹⁾ ∈ R⁴⁰;

(212) The second autoencoder randomly takes 500k blocks of size 3*3 from the feature map matrix obtained in step (211) as input, denoted x⁽²⁾. Let x⁽²⁾ be given by the formula [formula missing in the source text], where "+" indicates that x⁽²⁾ is formed by directly concatenating nine x⁽¹⁾ and w denotes the weights, so that x⁽²⁾ ∈ R³⁶⁰. The number of hidden neurons of the second autoencoder is set to 30. The 500k blocks of size 3*3 and the hidden neuron number are used to train the autoencoder, yielding the result of the encoder part of the second autoencoder, f⁽²⁾, with f⁽²⁾ ∈ R³⁰;

(213) The third autoencoder randomly takes 200k blocks of size 3*3 from the feature map matrix obtained in step (212) as input, denoted x⁽³⁾, with x⁽³⁾ ∈ R²⁷⁰, where each small block in the 3*3 block overlaps the next small block by 5 pixels. The number of hidden neurons of the third autoencoder is set to 20. After the 200k blocks of size 3*3 and the hidden neuron number have been used to train the autoencoder, the result of the encoder part of the third autoencoder, f⁽³⁾, is obtained, with f⁽³⁾ ∈ R²⁰;

(214) From steps (211) to (213), three kinds of features are obtained for the centre point of a 5*5 block. Let f = f⁽¹⁾ + f⁽²⁾ + f⁽³⁾, where "+" denotes direct concatenation, forming a 90-dimensional mixed feature. The 90-dimensional mixed feature is fed into an SVM model for classification training, which finally yields an SVM classification model. After training, the SVM classification model scans the ID card image separated in step 1 and judges whether each pixel is part of a text region, thereby extracting the precise text region;

(221) Compute the average Chinese character width W₁ and the average digit width W₂ in the precise text region as the segmentation standard;

(222) Scan the first text region and record its width from its start point to its end point. If the width of the segmented text region is close to the average Chinese character width, treat the segmented region as one Chinese character; otherwise go to step (223);

(223) If the text region width is much smaller than the average digit width, the region is noise and is discarded. If the text region width is close to the average digit width, pass the region to a trained SVM digit classifier to judge whether it is a digit; if it is a digit, scan the next text region, otherwise go to step (224);

(224) Inspect the right side of the current text region and try to join the two regions, then judge again whether the joined region is a Chinese character or a digit; if it is still neither, try merging the region to the right of the previously merged region and repeat the Chinese character or digit judgment.

4. The ID card image text recognition method based on machine learning according to claim 1, characterized in that the specific steps of step 3 are as follows:

(31) Build a network model for character recognition, consisting of an input layer, multiple convolutional layers, multiple sampling layers, a fully connected layer, and an output layer;