CN104346609B - Method and device for recognizing characters on printed matter - Google Patents


Info

Publication number
CN104346609B
CN104346609B (grant) · CN104346609A (publication) · CN201310331468.4A / CN201310331468A (application)
Authority
CN
China
Prior art keywords
image
gray value
character
pixel
duplicating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310331468.4A
Other languages
Chinese (zh)
Other versions
CN104346609A (en)
Inventor
侯放 (Hou Fang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201310331468.4A priority Critical patent/CN104346609B/en
Publication of CN104346609A publication Critical patent/CN104346609A/en
Application granted granted Critical
Publication of CN104346609B publication Critical patent/CN104346609B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)

Abstract

This application relates to a method and device for recognizing characters on printed matter. The method may include: photographing the printed matter to obtain an image to be recognized; duplicating the image to obtain at least two duplicate images, and applying a different image-processing operation to each duplicate image to obtain at least two layered images; merging the layered images into a processed image; extracting the image of each character from the processed image; and performing character recognition on each extracted character image. With this technical solution, when image processing is applied to printed matter such as certificates, the characters on the printed matter can be recognized more effectively and more accurately.

Description

Method and device for recognizing characters on printed matter
Technical field
This application relates to the field of image recognition technology, and in particular to a method and device for recognizing characters on printed matter.
Background technology
In conventional OCR (Optical Character Recognition), recognition accuracy is often low for printed matter with a smooth or reflective surface, such as surface-coated prints, certificate photos, and various cards, especially laminated certificates such as driver's licenses and vehicle licenses. The surface coating lowers the recognition rate, and reflections cause outright recognition errors. The essence of the problem is that the recognition process cannot filter these effects, so the source fonts presented to the OCR engine are blurred or have excessive contrast. In addition, printed matter often uses a variety of fonts, so characters may fail to match, or match incorrectly, during recognition.
At present, as OCR recognition technology develops, demand for license and certificate recognition keeps growing, yet existing OCR techniques are oriented toward recognizing and searching complete images. Among current license-recognition schemes, identity cards and passports already have relatively mature, high-accuracy recognition engines and algorithms. For certificates such as driving licenses and employee cards, however, the certificates are laminated before final issue, and similar certificates from different regions do not share a unified printing standard and font the way identity cards do. As a result, existing license recognition often suffers from blurred images caused by overexposure and from low recognition efficiency for deformed fonts; in essence, existing recognition methods do not fully address either of these needs.
Summary of the invention
The main purpose of the application is to provide a method and device for recognizing characters on printed matter, so as to solve the image-processing and character-recognition problems that exist in the prior art when recognizing characters on printed matter, wherein:
According to one aspect of the application, a method for recognizing characters on printed matter is provided, comprising: photographing the printed matter to obtain an image to be recognized; duplicating the image to obtain at least two duplicate images, and applying a different image-processing operation to each duplicate image to obtain at least two layered images; merging the layered images into a processed image; extracting the image of each character from the processed image; and performing character recognition on each extracted character image.
According to an embodiment of the application, in the method, photographing the printed matter to obtain the image to be recognized includes: applying exposure settings according to a predetermined condition when photographing.
According to an embodiment of the application, in the method, applying a different image-processing operation to each duplicate image to obtain at least two layered images includes: performing noise removal on one duplicate image to obtain a first layered image; and performing contrast enhancement on another duplicate image to obtain a second layered image.
According to an embodiment of the application, in the method, performing noise removal on one duplicate image to obtain a first layered image includes: identifying the noise points in the duplicate image; summing the gray value of each noise point with the gray values of its eight adjacent pixels and averaging the result to obtain a denoised gray value for that noise point; and replacing the gray value of each noise point in the duplicate image with its denoised gray value to obtain the first layered image.
According to an embodiment of the application, in the method, identifying the noise points in the duplicate image includes: summing the gray value of each pixel in the duplicate image with the gray values of its left and right neighboring pixels and averaging the result to obtain a calculated gray value for that pixel; judging whether the absolute difference between each pixel's gray value and its calculated gray value falls within a predetermined threshold range; and identifying pixels whose absolute difference exceeds the predetermined threshold range as noise points.
According to an embodiment of the application, in the method, performing contrast enhancement on another duplicate image to obtain a second layered image includes: dividing the duplicate image into at least two sub-regions; and adjusting the gray scale of each sub-region separately to obtain the second layered image.
According to an embodiment of the application, in the method, merging the layered images into a processed image includes: taking the median of the gray values of corresponding pixels across the layered images to obtain a median gray value for each pixel; and replacing the gray value of each pixel with its median gray value to obtain the processed image.
According to an embodiment of the application, in the method, extracting the image of each character from the processed image includes: determining the position of the text image within the processed image; and performing character segmentation on the text image to extract the image of each character in it.
According to an embodiment of the application, in the method, obtaining the position of the text image within the processed image includes: identifying the edge texture in each row of pixels by edge detection; building a histogram of each row's edge texture and determining a recognition threshold for edge primitives from an analysis of the histogram; counting the edge primitives in each row according to the recognition threshold and recording the start and end positions of each row's edge primitives; identifying the non-blank rows in the processed image; judging whether the current non-blank row meets a preset condition and, if so, proceeding to detect the next non-blank row; and when more than a predetermined number of consecutive non-blank rows meet the preset condition, determining the position of the text image from the start and end positions of each non-blank row's edge primitives.
According to an embodiment of the application, in the method, performing character recognition on each extracted character image includes: recognizing each character image with a BP neural network.
According to another aspect of the application, a device for recognizing characters on printed matter is provided, comprising: an acquisition module for photographing the printed matter to obtain an image to be recognized; a layering module for duplicating the image to obtain at least two duplicate images and applying a different image-processing operation to each duplicate image to obtain at least two layered images; a layer-merging module for merging the layered images into a processed image; an extraction module for extracting the image of each character from the processed image; and a recognition module for performing character recognition on each extracted character image.
Compared with the prior art, the technical solution of the application photographs the printed matter, applies layered image processing to the image to be recognized, and compensates effects through layer merging, which improves image quality and raises recognition accuracy.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the application and form a part of it; the schematic embodiments of the application and their descriptions explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of a method for recognizing characters on printed matter according to an embodiment of the application;
Fig. 2 is a flowchart of the noise-removal step S1 within step S102 of Fig. 1;
Fig. 3 is a flowchart of step S201 of Fig. 2;
Fig. 4 is a flowchart of the contrast-enhancement step S2 within step S102 of Fig. 1;
Fig. 5 is a flowchart of step S103 of Fig. 1;
Fig. 6 is a flowchart of step S104 of Fig. 1;
Fig. 7 is a flowchart of step S601 of Fig. 6; and
Fig. 8 is a structural diagram of a device for recognizing characters on printed matter according to an embodiment of the application.
Detailed description of the embodiments
The main idea of the application is to photograph the printed matter carrying the text, duplicate the resulting image into at least two copies, apply a different image-processing operation to each copy to obtain layered images, merge the layered images into a processed image, and then perform text extraction and text recognition on the processed image.
To make the purpose, technical solution, and advantages of the application clearer, the technical solution is described clearly and completely below in conjunction with specific embodiments of the application and the corresponding drawings. Obviously, the described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the application without creative work fall within the scope of protection of the application.
According to an embodiment of the application, a method for recognizing characters on printed matter is provided.
The application can be applied to recognizing characters printed on printed matter; for example, it can be used for certificate recognition, and in particular for recognizing laminated certificates.
Referring to Fig. 1, a flowchart of a method for recognizing characters on printed matter according to an embodiment of the application: as shown in Fig. 1, in step S101, the printed matter is photographed to obtain an image to be recognized.
Because image-capture devices vary in quality, a photograph may be affected by factors such as exposure time and exposure compensation, producing a poor image and hampering subsequent processing. Therefore, exposure settings can be applied according to a predetermined condition before shooting to obtain a better image. The predetermined condition is set by collecting statistics on how different exposure-parameter settings affect images of the same type shot in the same environment (e.g. under the same light intensity).
In step S102, the image is duplicated to obtain at least two duplicate images, and a different image-processing operation is applied to each duplicate image to obtain at least two layered images. That is, the captured image is copied into multiple copies, and each copy undergoes its own image-processing operation. This is equivalent to processing the original image in layers, yielding layered images that have undergone different processing.
The different image-processing operations can include noise removal and contrast enhancement. They can also include other operations, such as path coloring, pattern cutting, or texture-recognition preprocessing. After these operations, several layered images are obtained.
Step S102 may further include: step S1, performing noise removal on one duplicate image to obtain a first layered image; and step S2, performing contrast enhancement on another duplicate image to obtain a second layered image.
Fig. 2 is a detailed flowchart of the noise-removal step S1. As shown in Fig. 2, step S1 can include:
Step S201: identify the noise points in the duplicate image. As shown in Fig. 3, step S201 may further include sub-steps S301-S303.
In sub-step S301, the gray value of each pixel in the duplicate image is summed with the gray values of its left and right neighboring pixels, and the result is averaged to obtain a calculated gray value for that pixel.
In sub-step S302, it is judged whether the absolute difference between each pixel's gray value and its calculated gray value falls within a predetermined threshold range.
In sub-step S303, pixels whose absolute difference between gray value and calculated gray value exceeds the predetermined threshold range are identified as noise points. The predetermined threshold range can be set according to specific conditions, or according to empirical values accumulated in past noise identification and processing.
Step S202: after the noise points in the duplicate image are identified, the gray value of each noise point is summed with the gray values of its eight surrounding pixels, and the result is averaged to obtain a denoised gray value for that noise point. Because pixels are arranged evenly in the vertical and horizontal directions, each pixel has eight adjacent pixels; the gray value of each noise point and the gray values of those eight neighbors are therefore summed and averaged to obtain its denoised gray value.
Step S203: the gray value of each noise point in the duplicate image is replaced with its denoised gray value to obtain the first layered image. After the denoised gray value of each noise point is obtained, the gray value of each noise point in the duplicate image is replaced with that value, while the gray values of all other (non-noise) pixels remain unchanged, yielding the first layered image after noise removal.
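Steps S301-S303 and S202-S203 can be sketched in NumPy as below. This is a simplified illustration, not the patent's implementation: it assumes an 8-bit grayscale array, replicates border pixels so edge pixels also have three horizontal and nine windowed neighbors, and collapses the "predetermined threshold range" to a single scalar threshold whose default value (60) is purely illustrative.

```python
import numpy as np

def detect_noise(img, threshold=60):
    """Flag pixels whose gray value deviates from the mean of the pixel and
    its left/right neighbors by more than `threshold` (sub-steps S301-S303)."""
    img = img.astype(np.float64)
    padded = np.pad(img, ((0, 0), (1, 1)), mode="edge")
    # mean of each pixel and its two horizontal neighbors
    calc = (padded[:, :-2] + padded[:, 1:-1] + padded[:, 2:]) / 3.0
    return np.abs(img - calc) > threshold

def remove_noise(img, threshold=60):
    """Replace each flagged pixel with the mean of itself and its eight
    neighbors (steps S202-S203); all other pixels are left unchanged."""
    img = img.astype(np.float64)
    mask = detect_noise(img, threshold)
    padded = np.pad(img, 1, mode="edge")
    # 3x3 box mean = the pixel plus its eight neighbors, averaged
    window = sum(padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
                 for dy in range(3) for dx in range(3)) / 9.0
    out = img.copy()
    out[mask] = window[mask]
    return out.round().astype(np.uint8)
```

On a flat 100-gray image with a single 255 spike, only the spike is flagged, and it is pulled down to the 3x3 mean of its neighborhood.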
During acquisition, a digital image often exhibits low contrast in the target region because of illumination or properties of the object itself; in that case contrast enhancement can be applied to the image.
Fig. 4 is a flowchart of the contrast-enhancement step S2 applied to the duplicate image. As shown in Fig. 4, step S2 can include:
Step S401: divide the duplicate image into at least two sub-regions.
The basic idea of contrast enhancement is to divide the image into two or more segments by gray-scale interval and apply a gray-scale transform to each segment separately, thereby strengthening the contrast of the image.
First, the number of sub-regions and the boundary thresholds between them can be determined by analyzing the gray-level histogram of the duplicate image. A gray-level histogram counts how often pixels of each gray level occur in the image, so it reveals the distribution of the duplicate image's gray values. Based on that distribution, it is determined how many sub-regions to divide the image into, and the boundary thresholds between regions determine the dividing points between adjacent regions, by which the duplicate image is divided into at least two sub-regions. In dividing the sub-regions, their number can be determined from how many peaks or valleys the image's gray-level histogram has, with the valleys serving as sub-region boundary thresholds. The boundary thresholds can also be determined by training an image engine, i.e. by training on a large number of images similar to the image to be recognized; the dividing points can then be calculated from the selected boundary thresholds, or a threshold can simply be set on the histogram to determine them.
Step S402: adjust the gray scale of each sub-region separately to obtain the second layered image.
Adjusting the gray scale of each sub-region means transforming the gray value of every pixel in the sub-region according to a predefined rule, so as to emphasize the gray-scale interval containing the target of interest and suppress the intervals of no interest. A linear transform can be used, i.e. the gray values are transformed with a predetermined linear formula, yielding the second layered image.
In step S103, the obtained layered images are merged into a processed image.
Fig. 5 is a detailed flowchart of step S103. As shown in Fig. 5, step S103 can include:
Step S501: take the median of the gray values of corresponding pixels across the layered images to obtain a median gray value for each pixel.
Specifically, every layered image obtained in step S102 results from applying a different image-processing operation to an identical duplicate image, so the pixels of each layered image are still the original pixels and express the same graphical information; only their gray values may have changed through the different processing. Therefore, taking the median of the gray values of corresponding pixels across the layered images determines a suitable new gray value for each pixel.
Step S502: replace the gray value of each pixel with its median gray value to obtain the processed image.
Specifically, the median gray value obtained for each pixel can serve as the new gray value of that pixel in the original captured image or in another duplicate image; adjusting each pixel's gray value to its median completes the layer merging and yields the processed image. Alternatively, after the merge, image-quality considerations may call for re-coloring the pixels of the processed image that meet a predetermined condition, so as to mark up the image more deliberately; for example, adding 2 to the gray value of pixels that approach black (pixels whose gray value exceeds a certain value) to deepen the color of dark pixels.
The processed image can also be compared with the original image: the gray value of each pixel of the processed image is subtracted from the gray value of the corresponding pixel of the original image to obtain a gray-value difference for each pixel, and it is judged whether the absolute value of that difference exceeds a predetermined threshold; if it does, the gray value of that pixel must be adjusted further.
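The core of steps S501-S502 is a per-pixel median across the layers. A minimal NumPy sketch, assuming the layers are aligned 8-bit grayscale arrays of equal shape (the optional re-coloring and original-image comparison steps are omitted):

```python
import numpy as np

def merge_layers(layers):
    """Merge layered images into the processed image by taking the
    per-pixel median gray value (steps S501-S502)."""
    stack = np.stack([layer.astype(np.float64) for layer in layers])
    # median along the layer axis gives one gray value per pixel position
    return np.median(stack, axis=0).round().astype(np.uint8)
```

With three layers whose corresponding pixels are 10, 20, and 200, the merged pixel takes the middle value 20, so a single outlier layer cannot dominate the result.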
In step S104, the image of each character is extracted from the processed image.
Referring to Fig. 6, a detailed flowchart of step S104: to extract each character, texture analysis can first determine the position of the text image within the processed image, and character segmentation is then applied to the text image to extract each character.
As shown in Fig. 6, step S104 can include steps S601 and S602.
In step S601, the position of the text image within the processed image is obtained. Referring to Fig. 7, a detailed flowchart of step S601, this may include the following steps:
Step S701: identify the edge texture in each row of pixels by edge detection. Edge texture refers to regions of the image where the gray scale changes sharply; it can be identified by setting a predetermined threshold on the range of variation, i.e. identifying regions where the gray-scale change exceeds that predetermined range.
Step S702: build a histogram of each row's edge texture, and determine the recognition threshold for edge primitives from an analysis of the histogram. An edge primitive can be a pixel whose gray value lies within a predetermined threshold range. The recognition threshold for edge primitives can be a dynamic threshold computed with an adaptive thresholding algorithm.
Step S703: count the edge primitives in each row according to the recognition threshold, and record the start and end positions of each row's edge primitives.
Step S704: identify the non-blank rows in the processed image. Based on the gray-level histogram of the processed image, rows whose gray-value range (the difference between the maximum and minimum gray values) is below a predetermined threshold are identified as blank rows, and the remaining rows as non-blank. For example, rows whose gray range is less than 5% of the span between the maximum and minimum gray values in the histogram may be identified as blank. Identified blank rows are treated as blank background and excluded from subsequent processing, which targets only the non-blank rows. The predetermined threshold can be a variable obtained by training on a variety of sample pictures; for instance, for the license pictures used in current training, it can be set to 5% of the span between the maximum and minimum gray values in the gray-level histogram, while for other picture types it can be set according to the results of training on those types.
Step S705: judge whether the current non-blank row meets a preset condition; if so, proceed to detect the next non-blank row. The preset condition can be determined by feeding a large number of character samples into a BP neural network for training and using the trained result, for example, judging whether the number of edge primitives in each row reaches a predetermined number.
Step S706: when more than a predetermined number of consecutive non-blank rows meet the preset condition, determine the position of the text image from the start and end positions of each non-blank row's edge primitives.
Steps S701-S706 for determining the position of the text image in the processed image are not limited to the order above; they can be performed in other orders. For example, the non-blank rows in the processed image can be identified first, and the identified non-blank rows then subjected to the other identification and judgment steps.
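Steps S701-S706 can be roughed out as below. This is a deliberately simplified sketch: it assumes an 8-bit grayscale image, folds the adaptive edge-primitive threshold into a fixed horizontal-gradient threshold, and all parameter values (`range_frac`, `grad_thresh`, `min_edges`, `min_rows`) are illustrative rather than the trained values the patent describes.

```python
import numpy as np

def locate_text_rows(img, range_frac=0.05, grad_thresh=50, min_edges=2, min_rows=3):
    """Rough text-band localization (steps S701-S706): a row is 'non-blank'
    if its gray range exceeds `range_frac` of the image's full gray range,
    and it 'meets the preset condition' if it contains at least `min_edges`
    strong horizontal gradients (edge primitives). The first run of at least
    `min_rows` consecutive qualifying rows is returned as (first, last),
    or None if no such run exists."""
    img = img.astype(np.int32)
    full_range = img.max() - img.min()
    grads = np.abs(np.diff(img, axis=1)) > grad_thresh  # edge primitives per row
    good = [
        (img[r].max() - img[r].min()) > range_frac * full_range
        and grads[r].sum() >= min_edges
        for r in range(img.shape[0])
    ]
    run_start = None
    for r, ok in enumerate(good + [False]):  # sentinel closes a trailing run
        if ok and run_start is None:
            run_start = r
        elif not ok and run_start is not None:
            if r - run_start >= min_rows:
                return (run_start, r - 1)
            run_start = None
    return None
```

On a light background with a dark stripe spanning rows 3-6, those four rows each show two strong gradients and a large gray range, so the band (3, 6) is reported.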
In step S602, character segmentation is applied to the text image to extract the image of each character in it.
Segmenting the text image, i.e. row cutting followed by character cutting, can use the projection method to extract the image of each character. Row cutting separates the characters line by line, forming single-line character text images: by projecting horizontally along the row direction, the blank gaps between lines of text can be identified. Character cutting then cuts each single character image out of each single-line text image, yielding a single character image for every character.
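The projection method of step S602 can be sketched as follows, assuming dark characters on a light background already binarized by a fixed ink threshold (the threshold 128 is an illustrative assumption):

```python
import numpy as np

def split_runs(profile):
    """Return (start, end) index pairs of consecutive non-zero profile entries."""
    runs, start = [], None
    for i, v in enumerate(list(profile) + [0]):  # sentinel closes a trailing run
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append((start, i - 1))
            start = None
    return runs

def segment_characters(img, ink_thresh=128):
    """Projection-based cutting (step S602): the horizontal projection splits
    the image into text lines (row cutting), then a vertical projection within
    each line splits it into single-character images (character cutting)."""
    ink = img < ink_thresh
    chars = []
    for r0, r1 in split_runs(ink.sum(axis=1)):       # row cutting
        line = ink[r0:r1 + 1]
        for c0, c1 in split_runs(line.sum(axis=0)):  # character cutting
            chars.append(img[r0:r1 + 1, c0:c1 + 1])
    return chars
```

On a white image containing one text line with two dark glyph blocks, the sketch returns the two character crops in reading order.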
In step S105, character recognition is performed on the image of each extracted character.
The characters can be recognized with a BP neural network: the image of each character is fed into the BP neural network system for character recognition.
The BP neural network is trained in advance on character samples via their image matrices: the images of the character samples are first normalized to obtain an image matrix for each sample, and BP (error back-propagation) training is then performed on each sample's image matrix.
When each character image is to be recognized, it is fed into the trained BP neural network, which performs the character recognition.
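A minimal error back-propagation (BP) network of the kind described can be sketched in pure NumPy. The one-hidden-layer architecture, sigmoid activations, learning rate, and the toy 2x2 "character" patterns used below are illustrative assumptions, not the patent's actual configuration; inputs are flattened, normalized character-image matrices and outputs are per-class scores.

```python
import numpy as np

class TinyBPNet:
    """One-hidden-layer BP (error back-propagation) network: a stand-in
    for the character classifier described in steps S105 and later."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(self, x):
        h = self._sigmoid(x @ self.w1 + self.b1)
        return h, self._sigmoid(h @ self.w2 + self.b2)

    def train(self, X, Y, lr=1.0, epochs=3000):
        for _ in range(epochs):
            h, out = self.forward(X)
            d_out = (out - Y) * out * (1 - out)      # output-layer delta
            d_h = (d_out @ self.w2.T) * h * (1 - h)  # error propagated backward
            self.w2 -= lr * h.T @ d_out
            self.b2 -= lr * d_out.sum(axis=0)
            self.w1 -= lr * X.T @ d_h
            self.b1 -= lr * d_h.sum(axis=0)

    def predict(self, X):
        return self.forward(X)[1].argmax(axis=1)
```

In practice the input layer would match the normalized character-image matrix size and the output layer the character set; here two trivially separable 4-pixel patterns stand in for character samples.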
Present invention also provides a kind of device for identifying character on printed matter, Fig. 8 is the identification according to the embodiment of the present application The structure diagram of the device 800 of character on printed matter, as shown in the figure the device 800 can include:Acquisition module 810, at layering Manage module 820, figure layer merging module 830, extraction module 840, and identification module 850.
Acquisition module 810 can be used for shooting the printed matter to obtain the image to be identified.
Hierarchical processing module 820 can be used for replicating described image to obtain at least two width duplicating images, and right Every width duplicating image carries out different image procossings to obtain at least two width layered images respectively.
Figure layer merging module 830 can be used for obtained layered image carrying out figure layer merging, with image after being handled.
Extraction module 840 can be used for from image after the processing image for extracting each character.
The image that identification module 850 can be used for each character to extracting carries out character recognition.
According to one embodiment of the application, the acquisition module 810 may further be used to apply predetermined exposure settings when photographing.
According to one embodiment of the application, the hierarchical processing module 820 may include a denoising module and a contrast enhancement module.
The denoising module may be used to perform noise removal on one of the duplicate images to obtain a first layered image.
The contrast enhancement module may be used to perform contrast enhancement on another of the duplicate images to obtain a second layered image.
According to one embodiment of the application, the denoising module may include: a noise identification module, a denoised-gray-value acquisition module, and a noise removal module.
The noise identification module may be used to identify the noise points in the duplicate image.
The denoised-gray-value acquisition module may be used to sum the gray value of each noise point with the gray values of its eight surrounding pixels and take the average as the denoised gray value of that noise point.
The noise removal module may be used to replace the gray value of each noise point in the duplicate image with the denoised gray value of that noise point to obtain the first layered image.
According to one embodiment of the application, the noise identification module may include: a calculation submodule, a judgment submodule, and an identification submodule.
The calculation submodule may be used to sum the gray value of each pixel in the duplicate image with the gray values of its left and right neighboring pixels and take the average as the computed gray value of that pixel.
The judgment submodule may be used to determine whether the absolute difference between the gray value of each pixel and its computed gray value falls within a predetermined threshold range.
The identification submodule may be used to identify as noise points those pixels whose absolute difference between gray value and computed gray value exceeds the predetermined threshold range.
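The noise identification and removal steps these modules describe can be sketched as follows; the threshold value is an illustrative assumption, and border pixels are left untouched for simplicity:

```python
import numpy as np


def denoise(img, threshold=30):
    """Flag a pixel as noise when its gray value differs from the average
    of the pixel and its left/right neighbors by more than `threshold`,
    then replace each noise pixel with the average of its eight
    surrounding neighbors, as the passage describes."""
    img = np.asarray(img, dtype=float)
    out = img.copy()
    h, w = img.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # computed gray value: average of pixel and left/right neighbors
            computed = (img[y, x] + img[y, x - 1] + img[y, x + 1]) / 3.0
            if abs(img[y, x] - computed) > threshold:        # noise point
                patch = img[y - 1:y + 2, x - 1:x + 2]
                out[y, x] = (patch.sum() - img[y, x]) / 8.0  # 8-neighbor mean
    return out
```

Note that, as specified, the replacement averages only the eight neighbors (the center pixel's own value is excluded).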
According to one embodiment of the application, the contrast enhancement module may include an image partitioning module and a gray-level adjustment module.
The image partitioning module may be used to divide the duplicate image into at least two subregions.
The gray-level adjustment module may be used to perform gray-level adjustment on each subregion to obtain the second layered image.
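A minimal sketch of this partition-then-adjust step is given below. The patent does not fix how the subregions are chosen or how gray levels are adjusted, so splitting into equal horizontal bands and applying a linear contrast stretch per band are illustrative assumptions:

```python
import numpy as np


def enhance_contrast(img, n_regions=2):
    """Divide the image into subregions and adjust the gray levels of
    each subregion independently (here: a per-band linear stretch)."""
    img = np.asarray(img, dtype=float)
    out = np.empty_like(img)
    # Illustrative partition: equal horizontal bands of rows.
    bands = np.array_split(np.arange(img.shape[0]), n_regions)
    for rows in bands:
        sub = img[rows]
        lo, hi = sub.min(), sub.max()
        if hi > lo:
            out[rows] = (sub - lo) / (hi - lo) * 255.0  # linear stretch
        else:
            out[rows] = sub  # flat region: nothing to stretch
    return out.astype(np.uint8)
```

Adjusting each subregion separately lets a region with a narrow gray range be stretched to full contrast without being dominated by the rest of the image.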
According to one embodiment of the application, the merging module 830 may include a median module and a gray-value replacement module.
The median module may be used to take the median of the gray values of corresponding pixels in the layered images, obtaining the median gray value of each pixel.
The gray-value replacement module may be used to replace the gray value of each pixel with the median gray value of that pixel, obtaining the processed image.
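The per-pixel median merge these two modules perform can be sketched in a few lines:

```python
import numpy as np


def merge_layers(layers):
    """Merge layered images by taking, for each pixel position, the
    median of the gray values across all layers."""
    stack = np.stack([np.asarray(layer, dtype=float) for layer in layers])
    return np.median(stack, axis=0)  # per-pixel median across layers
```

For example, merging three layers whose corresponding pixel values are 1, 5, and 3 yields 3 at that pixel.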
According to one embodiment of the application, the extraction module 840 may include:
a position acquisition module, which may be used to obtain the position of the text image in the processed image; and
a character segmentation module, which may be used to perform character segmentation on the text image to extract the image of each character in the text image.
According to one embodiment of the application, the position acquisition module may further include: an edge detection module, a threshold acquisition module, a statistics recording module, a non-blank-row identification module, a condition judgment module, and a position determination module.
The edge detection module may be used to identify the edge textures in each row of pixels by edge detection, where an edge texture may be a region in which the gray value changes sharply.
The threshold acquisition module may be used to build a histogram of the edge textures of each row of pixels and to determine the recognition threshold for edge primitives from an analysis of the histogram.
The statistics recording module may be used to count the number of edge primitives in each row according to the recognition threshold of the edge primitives, and to record the start position and end position of the edge primitives in each row.
The non-blank-row identification module may be used to identify the non-blank rows in the processed image.
The condition judgment module may be used to determine whether the current non-blank row satisfies a preset condition and, if so, to proceed to detect the next non-blank row.
The position determination module may be used, when more than a predetermined number of consecutive non-blank rows satisfying the preset condition have been detected, to determine the position of the text image from the start positions and end positions of the edge primitives of the non-blank rows.
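A simplified sketch of this row-wise localization follows. The patent leaves the edge detector, the histogram-derived threshold, and the preset condition unspecified, so a plain horizontal gradient and fixed thresholds stand in for them here as illustrative assumptions:

```python
import numpy as np


def locate_text_rows(img, edge_thresh=40, min_edges=2, min_rows=3):
    """Mark edge pixels where the horizontal gray gradient is sharp,
    count edge pixels per row, treat rows with at least `min_edges`
    edges as non-blank, and accept any run of at least `min_rows`
    consecutive non-blank rows as a text block."""
    img = np.asarray(img, dtype=float)
    grad = np.abs(np.diff(img, axis=1))          # horizontal gray changes
    edges = grad > edge_thresh                   # fixed stand-in threshold
    non_blank = edges.sum(axis=1) >= min_edges   # per-row edge counts
    blocks, start = [], None
    for y, nb in enumerate(non_blank):
        if nb and start is None:
            start = y                            # run of non-blank rows begins
        elif not nb and start is not None:
            if y - start >= min_rows:
                blocks.append((start, y - 1))    # (first row, last row)
            start = None
    if start is not None and len(non_blank) - start >= min_rows:
        blocks.append((start, len(non_blank) - 1))
    return blocks
```

The recorded start/end columns of the edge pixels in each accepted row (omitted here for brevity) would then give the horizontal extent of the text image.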
According to one embodiment of the application, the recognition module 850 may further be used to perform character recognition on the image of each character using a BP neural network.
Since the functions realized by the device of this embodiment essentially correspond to the method embodiments shown in Figs. 1 to 7 above, any details not covered in the description of this embodiment can be found in the corresponding descriptions of the previous embodiments and are not repeated here.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include computer-readable media in the form of volatile memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. In the absence of further restrictions, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
It will be understood by those skilled in the art that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical memory) containing computer-usable program code.
The foregoing are merely embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall fall within the scope of its claims.

Claims (18)

  1. A method for recognizing characters on printed matter, characterized by comprising:
    photographing the printed matter to obtain an image to be recognized;
    duplicating the image to obtain at least two duplicate images, and applying a different image processing operation to each duplicate image to obtain at least two layered images;
    merging the obtained layered images to obtain a processed image;
    extracting an image of each character from the processed image; and
    performing character recognition on each extracted character image;
    wherein applying a different image processing operation to each duplicate image to obtain at least two layered images comprises:
    analyzing the gray-level histogram of one of the duplicate images to obtain the gray-value distribution of that duplicate image, dividing the duplicate image into a plurality of subregions according to the gray-value distribution, and performing gray-level adjustment on each subregion to obtain a second layered image.
  2. The method according to claim 1, characterized in that photographing the printed matter to obtain the image to be recognized comprises: applying predetermined exposure settings when photographing.
  3. The method according to claim 1, characterized in that applying a different image processing operation to each duplicate image to obtain at least two layered images further comprises:
    performing noise removal on another of the duplicate images to obtain a first layered image.
  4. The method according to claim 3, characterized in that performing noise removal on another of the duplicate images to obtain the first layered image comprises:
    identifying the noise points in the duplicate image;
    summing the gray value of each noise point with the gray values of its eight surrounding pixels and taking the average as the denoised gray value of that noise point; and
    replacing the gray value of each noise point in the duplicate image with the denoised gray value of that noise point to obtain the first layered image.
  5. The method according to claim 4, characterized in that identifying the noise points in the duplicate image comprises:
    summing the gray value of each pixel in the duplicate image with the gray values of its left and right neighboring pixels and taking the average as the computed gray value of that pixel;
    determining whether the absolute difference between the gray value of each pixel and its computed gray value falls within a predetermined threshold range; and
    identifying as noise points those pixels whose absolute difference between gray value and computed gray value exceeds the predetermined threshold range.
  6. The method according to claim 1, characterized in that merging the layered images to obtain the processed image comprises:
    taking the median of the gray values of corresponding pixels in the layered images to obtain the median gray value of each pixel; and
    replacing the gray value of each pixel with the median gray value of that pixel to obtain the processed image.
  7. The method according to claim 1, characterized in that extracting the image of each character from the processed image comprises:
    obtaining the position of the text image in the processed image; and
    performing character segmentation on the text image to extract the image of each character in the text image.
  8. The method according to claim 7, characterized in that obtaining the position of the text image in the processed image comprises:
    identifying the edge textures in each row of pixels by edge detection;
    building a histogram of the edge textures of each row of pixels, and determining a recognition threshold for edge primitives from an analysis of the histogram;
    counting the number of edge primitives in each row according to the recognition threshold of the edge primitives, and recording the start position and end position of the edge primitives in each row;
    identifying the non-blank rows in the processed image;
    determining whether the current non-blank row satisfies a preset condition and, if so, proceeding to detect the next non-blank row; and
    when more than a predetermined number of consecutive non-blank rows satisfying the preset condition have been detected, determining the position of the text image from the start positions and end positions of the edge primitives of the non-blank rows.
  9. The method according to claim 1, characterized in that performing character recognition on each extracted character image comprises: performing character recognition on the image of each character using a BP neural network.
  10. A device for recognizing characters on printed matter, characterized by comprising:
    an acquisition module, configured to photograph the printed matter to obtain an image to be recognized;
    a hierarchical processing module, configured to duplicate the image to obtain at least two duplicate images, and to apply a different image processing operation to each duplicate image to obtain at least two layered images;
    a layer merging module, configured to merge the obtained layered images to obtain a processed image;
    an extraction module, configured to extract an image of each character from the processed image; and
    a recognition module, configured to perform character recognition on each extracted character image;
    wherein applying a different image processing operation to each duplicate image to obtain at least two layered images comprises:
    analyzing the gray-level histogram of one of the duplicate images to obtain the gray-value distribution of that duplicate image, dividing the duplicate image into a plurality of subregions according to the gray-value distribution, and performing gray-level adjustment on each subregion to obtain a second layered image.
  11. The device according to claim 10, characterized in that the acquisition module is further configured to apply predetermined exposure settings when photographing.
  12. The device according to claim 10, characterized in that the hierarchical processing module comprises:
    a denoising module, configured to perform noise removal on another of the duplicate images to obtain a first layered image.
  13. The device according to claim 12, characterized in that the denoising module comprises:
    a noise identification module, configured to identify the noise points in the duplicate image;
    a denoised-gray-value acquisition module, configured to sum the gray value of each noise point with the gray values of its eight surrounding pixels and take the average as the denoised gray value of that noise point; and
    a noise removal module, configured to replace the gray value of each noise point in the duplicate image with the denoised gray value of that noise point to obtain the first layered image.
  14. The device according to claim 13, characterized in that the noise identification module comprises:
    a calculation submodule, configured to sum the gray value of each pixel in the duplicate image with the gray values of its left and right neighboring pixels and take the average as the computed gray value of that pixel;
    a judgment submodule, configured to determine whether the absolute difference between the gray value of each pixel and its computed gray value falls within a predetermined threshold range; and
    an identification submodule, configured to identify as noise points those pixels whose absolute difference between gray value and computed gray value exceeds the predetermined threshold range.
  15. The device according to claim 10, characterized in that the merging module comprises:
    a median module, configured to take the median of the gray values of corresponding pixels in the layered images to obtain the median gray value of each pixel; and
    a gray-value replacement module, configured to replace the gray value of each pixel with the median gray value of that pixel to obtain the processed image.
  16. The device according to claim 10, characterized in that the extraction module comprises:
    a position acquisition module, configured to obtain the position of the text image in the processed image; and
    a character segmentation module, configured to perform character segmentation on the text image to extract the image of each character in the text image.
  17. The device according to claim 16, characterized in that the position acquisition module comprises:
    an edge detection module, configured to identify the edge textures in each row of pixels by edge detection;
    a threshold acquisition module, configured to build a histogram of the edge textures of each row of pixels and to determine a recognition threshold for edge primitives from an analysis of the histogram;
    a statistics recording module, configured to count the number of edge primitives in each row according to the recognition threshold of the edge primitives, and to record the start position and end position of the edge primitives in each row;
    a non-blank-row identification module, configured to identify the non-blank rows in the processed image;
    a condition judgment module, configured to determine whether the current non-blank row satisfies a preset condition and, if so, to proceed to detect the next non-blank row; and
    a position determination module, configured to determine, when more than a predetermined number of consecutive non-blank rows satisfying the preset condition have been detected, the position of the text image from the start positions and end positions of the edge primitives of the non-blank rows.
  18. The device according to claim 10, characterized in that the recognition module is further configured to perform character recognition on the image of each character using a BP neural network.
CN201310331468.4A 2013-08-01 2013-08-01 The method and device of character on a kind of identification printed matter Active CN104346609B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310331468.4A CN104346609B (en) 2013-08-01 2013-08-01 The method and device of character on a kind of identification printed matter

Publications (2)

Publication Number Publication Date
CN104346609A CN104346609A (en) 2015-02-11
CN104346609B true CN104346609B (en) 2018-05-04

Family

ID=52502183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310331468.4A Active CN104346609B (en) 2013-08-01 2013-08-01 The method and device of character on a kind of identification printed matter

Country Status (1)

Country Link
CN (1) CN104346609B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978578B (en) * 2015-04-21 2018-07-27 深圳市点通数据有限公司 Mobile phone photograph text image method for evaluating quality
CN105160304A (en) * 2015-08-10 2015-12-16 中山大学 Method and device for sign text identification based on machine vision
CN105787480B (en) * 2016-02-26 2020-01-03 广东小天才科技有限公司 Method and device for shooting test questions
CN107145734B (en) * 2017-05-04 2020-08-28 深圳市联新移动医疗科技有限公司 Automatic medical data acquisition and entry method and system
CN107545460B (en) * 2017-07-25 2020-12-18 广州智选网络科技有限公司 Digital color page promotion management and analysis method, storage device and mobile terminal
CN110135288B (en) * 2019-04-28 2023-04-18 佛山科学技术学院 Method and device for quickly checking electronic certificate
CN110929738A (en) * 2019-11-19 2020-03-27 上海眼控科技股份有限公司 Certificate card edge detection method, device, equipment and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006031163A (en) * 2004-07-13 2006-02-02 Ricoh Co Ltd Character recognition result processor, character recognition result processing method, character recognition result processing program and recording medium with the same program stored
CN1756311A (en) * 2004-09-29 2006-04-05 乐金电子(惠州)有限公司 Image switching method and its apparatus
CN102289792A (en) * 2011-05-03 2011-12-21 北京云加速信息技术有限公司 Method and system for enhancing low-illumination video image
CN102663382A (en) * 2012-04-25 2012-09-12 重庆邮电大学 Video image character recognition method based on submesh characteristic adaptive weighting

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"Photoshop照片模糊变清晰大全" (Making blurry photos clear in Photoshop: a complete guide); sou6; 《http://www.3lian.com/edu/2012/04-03/24407.html?&from=androidqq》; 20120403; pp. 1-6 *
"PS几种处理模糊照片变清晰的方法" (Several methods of sharpening blurry photos in PS); 小照; 《http://www.3lian.com/edu/2012/07-21/32980.html》; 20120721; pp. 1-4 *
"利用PS把不清晰的照片改清晰" (Using PS to make unclear photos clear); 益彩足球; 《http://jingyan.baidu.com/article/f3ad7d0fdc433a09c3345b0b.html》; 20120508; pp. 1-6 *
"同源视频检索与商标货号识别" (Near-duplicate video retrieval and trademark article-number recognition); 张惠; 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Masters' Theses Full-text Database, Information Science and Technology); 20110915; pp. 31-44, 54 *
"快速提高照片清晰度" (Quickly improving photo sharpness); zhoulpwen; 《http://jingyan.baidu.com/article/fec4bce20ae348f2608d8b64.html》; 20110406; pp. 1-4 *

Also Published As

Publication number Publication date
CN104346609A (en) 2015-02-11

Similar Documents

Publication Publication Date Title
CN104346609B (en) The method and device of character on a kind of identification printed matter
CN110766736B (en) Defect detection method, defect detection device, electronic equipment and storage medium
Afzal et al. Document image binarization using lstm: A sequence learning approach
US9251614B1 (en) Background removal for document images
US20170220836A1 (en) Fingerprint classification system and method using regular expression machines
US11836969B2 (en) Preprocessing images for OCR using character pixel height estimation and cycle generative adversarial networks for better character recognition
CN108197644A (en) A kind of image-recognizing method and device
CN106651774B (en) License plate super-resolution model reconstruction method and device
CN109165538A (en) Bar code detection method and device based on deep neural network
CN111192241B (en) Quality evaluation method and device for face image and computer storage medium
Ntogas et al. A binarization algorithm for historical manuscripts
CN113191358B (en) Metal part surface text detection method and system
CN106599891A (en) Remote sensing image region-of-interest rapid extraction method based on scale phase spectrum saliency
Ding et al. Smoothing identification for digital image forensics
Siddiqui et al. Block-based feature-level multi-focus image fusion
CN112488137A (en) Sample acquisition method and device, electronic equipment and machine-readable storage medium
CN111144425A (en) Method and device for detecting screen shot picture, electronic equipment and storage medium
CN113920434A (en) Image reproduction detection method, device and medium based on target
Shobha Rani et al. Restoration of deteriorated text sections in ancient document images using atri-level semi-adaptive thresholding technique
CN117373136A (en) Face counterfeiting detection method based on frequency mask and attention consistency
Jin et al. A color image segmentation method based on improved K-means clustering algorithm
CN110276260B (en) Commodity detection method based on depth camera
CN109643451B (en) Line detection method
Wang An algorithm for ATM recognition of spliced money based on image features
Roe et al. Thresholding color images of historical documents with preservation of the visual quality of graphical elements

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20191203

Address after: P.O. Box 31119, Grand Pavilion, Hibiscus Way, 802 West Bay Road, Grand Cayman, Cayman Islands

Patentee after: Advanced New Technologies Co., Ltd.

Address before: Fourth floor, P.O. Box 847, Capital Building, Grand Cayman, Cayman Islands

Patentee before: Alibaba Group Holding Co., Ltd.

TR01 Transfer of patent right