CN106682671A - Image character recognition system - Google Patents

Info

Publication number
CN106682671A
Authority
CN
China
Prior art keywords
pictures
sub
picture
character
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611254376.0A
Other languages
Chinese (zh)
Inventor
景亮
康青杨
唐涔轩
刘世林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Business Big Data Technology Co Ltd
Original Assignee
Chengdu Business Big Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Business Big Data Technology Co Ltd
Priority to CN201611254376.0A
Publication of CN106682671A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to the field of image recognition processing, and in particular to an image character recognition system. The system comprises an image character segmentation module, a feature picture generation module, a storage module, a normalization module and an image character recognition module. The image character segmentation module segments the image to be processed into sub-pictures, each containing a single character, and stores them in the storage module. The feature picture generation module generates character feature pictures in the typeface of the characters to be recognized and stores them in the storage module. The normalization module retrieves the feature pictures and the sub-pictures to be recognized from the storage module, normalizes them according to their type, and stores the processed picture data back in the storage module. The image character recognition module retrieves the sub-pictures from the storage module and uses an exclusive-OR (XOR) operation to compute the degree of coincidence between each sub-picture and the feature pictures, thereby recognizing the character content of the sub-picture and outputting the recognition result.

Description

Image character recognition system
Technical field
The present invention relates to the field of image recognition, and more particularly to an image character recognition system.
Background art
With the development of society and the progress of science and technology, the knowledge created by mankind is growing exponentially. Before electronic books appeared, most knowledge was handed down in printed books. Over some five thousand years of Chinese history a large number of outstanding books were produced, and in the long river of history these books have all suffered damage to a greater or lesser degree, so storing them digitally is urgent. In library management, fast search of book contents helps locate a book quickly; because the number of books is very large and early printed books have no electronic manuscript from the author, converting paper books into electronic form is necessary.
Optical character recognition (OCR) is a powerful tool for converting paper books into electronic documents. It typically uses a large number of character samples to train a complex network and generate a model file, which is then used to recognize the characters in a picture.
The main function of OCR is to recognize characters in photographed or scanned pictures. In the prior art, recognizing text in an image first requires cutting the character string in the image apart into small pictures that each contain a single character, and then recognizing the segmented characters by some method. The most common segmentation method is the projection method: after the text image is binarized, the boundary between two characters is found by vertical projection, and the characters are cut apart along that boundary. However, when the characters in the image touch one another, or when the image contains Chinese characters with a left-right structure, a simple projection method can hardly achieve a satisfactory segmentation. For this reason segmentation has always been a difficult point in OCR, and the quality of the segmentation directly affects the recognition result.
In addition, scans and photographs of documents with special fonts or official seals, such as early printed books or certificates issued by government bodies, often use unusual fonts for historical reasons or for confidentiality and security. Existing OCR relies mainly on machine-learning methods whose models are computationally expensive, and because the training samples do not cover such special fonts, recognition accuracy on them is low, which seriously hinders the digitization of paper documents.
Most of the prior art recognizes characters with neural-network machine-learning algorithms. This requires a large number of samples and a great deal of training time, and the resulting model files are very large. The recognition rate also varies between fonts and is comparatively low for special fonts, so it is difficult to meet the character recognition requirements of some special scenarios.
Summary of the invention
The object of the present invention is to overcome the above deficiencies of the prior art by providing an image character recognition system that generates feature pictures in the font selected by the user and, after effectively segmenting the image text to be recognized, recognizes it automatically using these targeted character feature pictures, thereby providing a fast tool for image text recognition.
To achieve the above object, the invention provides the following technical solution: an image character recognition system that performs image text recognition through the following steps:
(1) Segment the image text to be recognized into sub-pictures that each contain a single character, and mark the digit, letter and punctuation sub-pictures and the text-character sub-pictures separately;
(2) For each digit, letter and punctuation mark, select one of the corresponding sub-pictures and move the character in it up, down, left, right, up-left, down-left, up-right and down-right by a set distance l to produce the corresponding feature pictures, and label the feature pictures produced;
Select the font corresponding to the image to be recognized and generate sample pictures; move the character in each sample picture up, down, left, right, up-left, down-left, up-right and down-right by the set distance l to produce the corresponding feature pictures, and label the feature pictures produced;
(3) Normalize the feature pictures and the pictures to be recognized:
Resize the feature pictures and the sub-pictures to be recognized to the same size, convert the grey value (0-255) of every pixel in each picture to 0 or 1 according to a set threshold, and store the converted pixel values by position in the storage module;
(4) Compare each sub-picture to be recognized with the feature pictures of the corresponding type: XOR the values at the same pixel position, count the number of 1s in the result and record it as the error count, and output the label of the feature picture with the smallest error count as the recognition result.
Specifically, in step (4), the digit, letter and punctuation sub-pictures to be recognized are compared with the digit, letter and punctuation feature pictures: the values at the same pixel position are XORed, the number of 1s is counted and recorded as the error count, and the label of the feature picture with the smallest error count is output as the recognition result;
The text-character sub-pictures to be recognized are compared with the corresponding text-character feature pictures in the same way: the values at the same pixel position are XORed, the number of 1s is counted and recorded as the error count, and the label of the feature picture with the smallest error count is output as the recognition result.
Further, n*h < l < N*h.
Further, n≤1/4.
Further, the segmentation of the text-character picture comprises the following process:
Find the initial cut positions of the text-character picture by the projection method, and cut the picture to be recognized into an initial sub-picture sequence at those positions;
Process the initial sub-pictures in the sequence according to the following rules:
A. Segment the image text to be recognized by the projection method into a sub-picture sequence, and mark the digits, letters and punctuation marks in it;
B. Check whether each unmarked sub-picture satisfies L ≤ M*h, where L is the width of the sub-picture's character projection, M is a coefficient and h is the row height;
Sub-pictures that do not satisfy the condition are cut, with the cut position determined by the following formula:
$$F(x) = g(x)\,t(x)$$
Repeat step B until all unmarked sub-pictures in the sequence satisfy L ≤ M*h;
C. Check the combined width L_merged of each pair of adjacent sub-pictures in the sequence other than digit, letter and punctuation pictures: whether L_merged ≤ M*h is satisfied;
If so, merge the adjacent sub-pictures that satisfy the condition, in sequence order;
Repeat step C until no pair of adjacent sub-pictures other than digits, letters and punctuation satisfies L_merged ≤ M*h;
D. Check the unmarked sub-pictures in the sequence: if three adjacent sub-pictures are such that the first and third sub-pictures each have width L ≤ 0.5h and the middle sub-picture has width L ≥ h, cut the middle sub-picture at the cut point determined by the formula:
$$F(x) = g(x)\,t(x)$$
According to the cut point so determined, the middle sub-picture is cut into a first middle sub-picture and a second middle sub-picture;
The first sub-picture and the first middle sub-picture are merged;
The second middle sub-picture and the third sub-picture are merged.
Further, 0.9≤M≤1.3.
Preferably, M=1.2.
Further, the system is a computer or server loaded with a program implementing the above image text recognition function.
Compared with the prior art, the beneficial effects of the invention are as follows. The invention provides an image character recognition system that constructs original feature pictures in the font selected by the user and, starting from each original feature picture, moves the character in the picture by a set distance in different directions to produce the corresponding feature templates. Templates made in this way adapt better to imperfect character segmentation and therefore offer better fault tolerance. On the basis of the feature pictures, the similarity between a sub-picture to be recognized and the feature templates is measured with an XOR operation; the computation is simple, and recognition efficiency and reliability are high.
In addition, the invention assesses the quality of the segmented sub-pictures step by step and processes them accordingly; this layer-by-layer screening and processing ensures the segmentation quality of the sub-pictures and lays the groundwork for the final recognition rate. Compared with traditional segmentation methods, the system introduces a correction value on top of the amplitude, taking the distance between the cut position and the character edge into account when determining the cut point, and is therefore more accurate. When a character with a special structure produces several small values or extreme points, the formula quickly finds the best cut point, which increases the accuracy and efficiency of segmentation and gives better results on touching characters.
Given the feature pictures and the segmented character sub-pictures, the XOR operation is used to measure the similarity between each sub-picture to be recognized and the feature templates; the computation is simple, and recognition efficiency and reliability are high.
Description of the drawings:
Fig. 1 is a structural diagram of the image character recognition system.
Fig. 2 is a schematic diagram of the steps (signal flow) by which the system performs image text recognition.
Fig. 3 is a schematic diagram of making a digit template.
Fig. 4 is a schematic diagram of making a text-character template.
Detailed description of the embodiments
The invention is described in further detail below with reference to test examples and specific embodiments. This should not be understood as limiting the scope of the above subject matter of the invention to the following embodiments; all techniques realized on the basis of the content of the invention fall within its scope.
The invention provides an image character recognition system which, as shown in Fig. 1, comprises an image character segmentation module, a feature picture generation module, a storage module, a normalization module and an image character recognition module;
The image character segmentation module segments the characters in the image to be processed into sub-pictures that each contain only a single character, and stores the resulting sub-picture sequence in the storage module;
The feature picture generation module produces character feature pictures in the font, selected by the user, of the image text to be recognized, and stores the feature pictures in the storage module;
The normalization module retrieves the feature pictures and the sub-pictures to be recognized from the storage module, normalizes them according to their type, and stores the processed picture data in the storage module;
The image character recognition module retrieves the sub-pictures from the storage module, computes the degree of matching between each sub-picture and the feature pictures with an XOR operation, thereby recognizes the character content of the sub-picture, and outputs the recognition result.
As shown in Fig. 2, text recognition by the system comprises the following steps:
(1) Segment the image text to be recognized into sub-pictures that each contain a single character, and mark the digit, letter and punctuation sub-pictures and the text-character sub-pictures separately (the marking in this step records only the type of each sub-picture; no specific recognition is performed). In practice, the image text to be recognized is segmented by the projection method into a sub-picture sequence and the digits, letters and punctuation marks in it are marked. These can be identified by features such as a narrow projection width (for example, set to < 0.4h), a small projection area (0.5h*0.8h), or a gap to the adjacent sub-picture that is noticeably larger than the gap between ordinary character pictures; using these features, the sub-pictures belonging to digits, letters and punctuation can be cut out first. Once the digit, letter and punctuation sub-pictures have been marked, the unmarked sub-pictures (text-character pictures) are segmented further into sub-pictures that each contain a single character. Segmenting the sub-pictures step by step in this way achieves a better result.
(2) For each digit, letter and punctuation mark, select one of the corresponding sub-pictures and move the character in it up, down, left, right, up-left, down-left, up-right and down-right by a set distance l to produce the corresponding feature pictures, as shown in Fig. 3, and label the feature pictures produced (labelling here means recording the character content of the feature picture; for example, the 9 feature pictures in Fig. 3 are labelled "8");
Select the font corresponding to the image to be recognized and generate sample pictures; move the character in each sample picture up, down, left, right, up-left, down-left, up-right and down-right by the set distance l to produce the corresponding feature pictures, and label the feature pictures produced (labelling here means recording the character content of the feature picture; for example, the 9 feature pictures in Fig. 4 are labelled "word"). When the character in a template is moved by the set distance, the part of the character that falls outside the sub-picture frame is removed. The pictures formed by moving the character the set distance in the above directions, together with the original picture, make up 9 reference samples of the same character under different segmentation conditions, as shown in Fig. 3. These correspond to the irregular, imperfect segmentation of character pictures that occurs in practice, so character recognition based on feature templates formed in this way has better fault tolerance.
(3) Normalize the feature pictures and the pictures to be recognized:
Resize the feature pictures and the sub-pictures to be recognized to the same size, convert the grey value (0-255) of every pixel in each picture to 0 or 1 according to a set threshold, and store the converted pixel values by position in the storage module;
(4) Compare each sub-picture to be recognized with the feature pictures of the corresponding type and XOR the values at the same pixel position (if the feature picture and the picture to be recognized have the same value at a pixel, the XOR result is 0; if the values differ, the XOR result is 1). Count the number of 1s and record it as the error count; output the label of the feature picture with the smallest error count as the recognition result.
Specifically, in step (4), the digit, letter and punctuation sub-pictures to be recognized are compared with the digit, letter and punctuation feature pictures: the values at the same pixel position are XORed, the number of 1s is counted and recorded as the error count, and the label of the feature picture with the smallest error count is output as the recognition result;
The text-character sub-pictures to be recognized are compared with the corresponding text-character feature pictures in the same way: the values at the same pixel position are XORed, the number of 1s is counted and recorded as the error count, and the label of the feature picture with the smallest error count is output as the recognition result.
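Steps (2) and (3) above can be illustrated with a minimal sketch. It assumes pictures are held as lists of rows of integers, that ink pixels map to 1 after thresholding, and that resizing uses nearest-neighbour sampling; the function names, the default threshold of 128 and the ink polarity are illustrative assumptions, not details given in the text.

```python
def shift(img, dx, dy):
    """Move a binary glyph by (dx, dy); pixels pushed past the frame are dropped."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if img[y][x] and 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = 1
    return out


def make_templates(img, l):
    """Return the original picture plus its 8 shifted copies (9 templates).

    Offsets are (dx, dy) with y growing downward: original, up, down, left,
    right, up-left, down-left, up-right, down-right.
    """
    offsets = [(0, 0), (0, -l), (0, l), (-l, 0), (l, 0),
               (-l, -l), (-l, l), (l, -l), (l, l)]
    return [shift(img, dx, dy) for dx, dy in offsets]


def binarize(gray, threshold=128):
    """Map 0-255 grey values to 0/1; dark (ink) pixels become 1 (assumed polarity)."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def resize_nearest(img, height, width):
    """Resize to (height, width) by nearest-neighbour sampling (assumed method)."""
    h, w = len(img), len(img[0])
    return [[img[y * h // height][x * w // width] for x in range(width)]
            for y in range(height)]
```

Each of the 9 templates returned for one character would carry the same label, so a sub-picture that was cut slightly off-centre can still coincide with one of the shifted copies.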
The system uses an XOR operation to measure the similarity between a sub-picture to be recognized and the feature templates; the computation is simple, and recognition efficiency and reliability are high.
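The XOR comparison of step (4) can be sketched as follows, assuming both pictures are already normalized to the same size and binarized to 0/1. The names `xor_error` and `recognize`, and the representation of templates as (label, picture) pairs, are hypothetical.

```python
def xor_error(a, b):
    """Count the positions where two equally sized 0/1 pictures differ."""
    return sum(pa ^ pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(sub_picture, templates):
    """Return (label, error count) of the template with the fewest XOR mismatches.

    `templates` is a list of (label, picture) pairs; the first template wins ties.
    """
    best_label, best_err = None, None
    for label, picture in templates:
        err = xor_error(sub_picture, picture)
        if best_err is None or err < best_err:
            best_label, best_err = label, err
    return best_label, best_err


# A 3x3 toy example with two templates and a slightly noisy input.
one = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
seven = [[1, 1, 1], [0, 0, 1], [0, 0, 1]]
noisy = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
print(recognize(noisy, [("1", one), ("7", seven)]))  # ('1', 1)
```

Because the comparison is a per-pixel XOR followed by a count, its cost grows linearly with the number of templates and pixels, and no trained model is involved.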
Further, the segmentation of the text-character picture comprises the following process:
Find the initial cut positions of the text-character picture by the projection method, and cut the picture to be recognized into an initial sub-picture sequence at those positions;
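A minimal sketch of the initial projection-based segmentation, assuming a binarized text line stored as a list of rows with ink pixels equal to 1. Cutting at every empty column and the helper names are assumptions; the 0.4h width test follows the example value quoted in step (1) for spotting digits, letters and punctuation.

```python
def column_projection(img):
    """Sum the ink pixels in each column of a binary text-line picture."""
    return [sum(col) for col in zip(*img)]


def split_by_projection(img):
    """Return (start, end) column spans, end exclusive, of runs that contain ink."""
    proj = column_projection(img)
    spans, start = [], None
    for x, value in enumerate(proj):
        if value > 0 and start is None:
            start = x                      # a run of ink begins
        elif value == 0 and start is not None:
            spans.append((start, x))       # an empty column ends the run
            start = None
    if start is not None:
        spans.append((start, len(proj)))
    return spans


def is_narrow(span, row_height, ratio=0.4):
    """Flag a span narrower than ratio * h as a digit/letter/punctuation candidate."""
    return (span[1] - span[0]) < ratio * row_height
```

Spans flagged by `is_narrow` would only be marked by type at this stage; the remaining, unmarked spans are the ones passed on to rules B to D.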
Process the initial sub-pictures in the sequence according to the following rules:
A. Segment the image text to be recognized by the projection method into a sub-picture sequence, and mark the digits, letters and punctuation marks in it;
B. Check whether each unmarked sub-picture satisfies L ≤ M*h, where L is the width of the sub-picture's character projection, M is a coefficient and h is the row height;
Sub-pictures that do not satisfy the condition are cut, with the cut position determined by the following formula:
$$F(x) = g(x)\,t(x)$$
Repeat step B until all unmarked sub-pictures in the sequence satisfy L ≤ M*h.
In the formula, F(x) is the amplitude, x is the coordinate of the projection point along the row direction, h is the row height of the current characters, g(x) is the correction value and t(x) is the projection value at x. Together g(x) and t(x) determine the amplitude at each projection point, and the point of minimum amplitude is taken as the cut point between two characters. Compared with simply taking the minimum projection value, the cut point found by this method, corrected by g(x), takes into account the distance between the cut position and the character edge and is therefore more accurate. When a character with a special structure produces several small values or extreme points, the formula quickly finds the best cut point, which increases the accuracy and efficiency of segmentation.
C. Check the combined width L_merged of each pair of adjacent sub-pictures in the sequence other than digit, letter and punctuation pictures: whether L_merged ≤ M*h is satisfied;
If so, merge the adjacent sub-pictures that satisfy the condition, in sequence order;
Repeat step C until no pair of adjacent sub-pictures other than digits, letters and punctuation satisfies L_merged ≤ M*h;
D. Check the unmarked sub-pictures in the sequence: if three adjacent sub-pictures are such that the first and third sub-pictures each have width L ≤ 0.5h and the middle sub-picture has width L ≥ h, cut the middle sub-picture at the cut point determined by the formula:
$$F(x) = g(x)\,t(x)$$
According to the cut point so determined, the middle sub-picture is cut into a first middle sub-picture and a second middle sub-picture;
The first sub-picture and the first middle sub-picture are merged;
The second middle sub-picture and the third sub-picture are merged.
In some cases two consecutive characters with a left-right structure touch in the middle. When the projection method is used, the outer radicals of the two characters are cut off, but the touching radicals between the two characters cannot be told apart and are treated as a single segmented character. The system handles this case well: it finds the best cut point in the touching middle part with the above formula and regroups the radicals of the two characters after cutting, achieving a good segmentation.
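The cut-point search with F(x) = g(x) t(x) can be sketched as below. The description does not spell out g(x); this sketch assumes the form that appears in claim 5, read as g(x) = 1 / (1 + e^(-0.01 |x - c|)), with c = h for rule B and c = 0.5h for rule D. Excluding the two edge columns and breaking ties toward the position nearest c are additional assumptions, and the function names are hypothetical.

```python
import math


def correction(x, center):
    """Assumed g(x): 0.5 at x == center, rising toward 1 as x moves away."""
    return 1.0 / (1.0 + math.exp(-0.01 * abs(x - center)))


def find_cut(projection, center):
    """Return the interior index x that minimises F(x) = g(x) * t(x).

    `projection` holds t(x) for the sub-picture; ties go to the x nearest `center`.
    """
    candidates = range(1, len(projection) - 1)
    return min(candidates,
               key=lambda x: (correction(x, center) * projection[x],
                              abs(x - center)))


# Two equally deep valleys at x = 1 and x = 5; the correction prefers the one
# closer to the expected character width.
t = [5, 1, 6, 6, 6, 1, 5]
print(find_cut(t, center=5))  # 5
```

With a plain minimum of t(x) the two valleys in this example would be indistinguishable; weighting by g(x) is what lets the distance from the character edge decide between them.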
The above rules are applied in sequence and repeated; after successive iterations, sub-pictures that each contain only a single character are obtained. A good segmentation lays the groundwork for image text recognition.
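Rules C and D can be sketched on column spans alone. The sketch assumes that only unmarked (text-character) spans are passed in, that the combined width of two neighbours is measured from the start of the first to the end of the second, and that the cut position for rule D is supplied by a caller-provided function; all names are hypothetical.

```python
def merge_narrow_neighbours(spans, h, M=1.2):
    """Rule C: merge adjacent spans while their combined width is <= M * h."""
    spans = list(spans)
    i = 0
    while i < len(spans) - 1:
        start, end = spans[i][0], spans[i + 1][1]
        if end - start <= M * h:
            spans[i:i + 2] = [(start, end)]  # merged; re-check with the next span
        else:
            i += 1
    return spans


def regroup_triples(spans, h, cut_at):
    """Rule D: narrow, wide, narrow -> cut the wide span and regroup both halves.

    `cut_at(span)` returns the absolute column at which to cut the middle span.
    """
    out = list(spans)
    i = 0
    while i + 2 < len(out):
        a, b, c = out[i], out[i + 1], out[i + 2]
        if (a[1] - a[0] <= 0.5 * h and c[1] - c[0] <= 0.5 * h
                and b[1] - b[0] >= h):
            x = cut_at(b)
            out[i:i + 3] = [(a[0], x), (x, c[1])]
            i += 2
        else:
            i += 1
    return out
```

For example, with h = 10 the spans (0, 4), (5, 17), (18, 22) match rule D; cutting the middle span at column 11 yields (0, 11) and (11, 22), one span per character.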
Further, 0.9≤M≤1.3. Setting the sub-picture width threshold within this range gives good segmentation and recognition results.
Preferably, M=1.2. Repeated experiments show that setting M to 1.2 gives good segmentation results.
Further, the system is a computer or server loaded with a program implementing the above image text recognition function.

Claims (8)

1. system for recognizing characters from image, it is characterised in that the system realizes pictograph identification comprising implemented below step:
(1) by images to be recognized character segmentation into the sub-pictures for only including single character;By numeral therein, letter and punctuate symbol Number, word subgraph is marked respectively;
(2) a sub-pictures are selected in each numeral, letter and the corresponding sub-pictures of punctuate, by the character in subgraph, difference Up and down, left and right, upper left, lower-left, upper right and bottom right movement setpoint distance l, makes corresponding feature image, and to made by Feature image carries out corresponding mark;
Correspondence font is selected according to images to be recognized, samples pictures are generated, to the character in samples pictures respectively up and down, it is left, The right side, upper left, lower-left, upper right and bottom right movement setpoint distance l, makes corresponding feature image, and feature image is entered to made by The corresponding mark of row;
(3) feature image and picture to be identified are normalized, and by the pixel respective value of each picture, step-by-step is stored in In memory module;
(4) sub-pictures to be identified are contrasted with the feature image of corresponding types, the value of same location of pixels is performed at XOR Reason, the number of times that statistics 1 occurs, is designated as the error frequency;Using the corresponding mark of the minimum feature image of the error frequency as identification knot Fruit is exported.
2. the system as claimed in claim 1, it is characterised in that n*h < l < N*h.
3. system as claimed in claim 2, it is characterised in that n≤1/4.
4. the system as claimed in claim 1, it is characterised in that the system, in normalized process include:By feature The dimension of picture of picture and sub-pictures to be identified is adjusted to formed objects;
0 or 1 is changed into respectively according to the threshold value for arranging to each grey scale pixel value in each picture, by the pixel value after conversion Opsition dependent is stored in memory module.
5. the system as described in one of Claims 1-4, it is characterised in that the cutting of alphabetic character picture includes implemented below Process:
A, by digital, the alphabetical and punctuation mark in sequence of pictures out;
B, unlabelled sub-pictures are judged:Whether L≤M*h is met, and L is the width of sub-pictures character projection, and M is to be Number, h is high for row;
For the sub-pictures of the condition that is unsatisfactory for carry out cutting, dicing position is determined according to below equation:
F (x)=g (x) t (x)
g ( x ) = 1 1 + e - 0.01 | x - h |
Step B is repeated, unlabelled sub-pictures are satisfied by condition in sequence:L≤M*h;
The overall width of adjacent two sub-pictures beyond C, letter digital in sequence and punctuate word picture judges:Whether Meet LClose≤M*h;
If it is satisfied, sequentially the adjacent sub-pictures to meeting condition are merged;
Step C is repeated until the adjacent sub-pictures overall width in addition to numeral, letter and punctuate is unsatisfactory for LClose≤M*h;
D, unlabelled sub-pictures in sequence are judged:If there are three adjacent sub-pictures in sequence, and three sons Picture meets:Width L≤the 0.5h of the first sub-pictures and the 3rd sub-pictures, and the width L >=h of middle sub-pictures, then by centre Sub-pictures are according to formula:
$F(x) = g(x)\,t(x)$
$g(x) = \dfrac{1}{1 + e^{-0.01\,|x - 0.5h|}}$
The cut point thus determined is used for the split: the middle sub-picture is cut at it into a first middle sub-picture and a second middle sub-picture;
The first sub-picture and the first middle sub-picture are merged;
The second middle sub-picture and the third sub-picture are merged.
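Steps B to D of claim 5 can be sketched as below. The claim text shown here does not define x, t(x), or how the cut point is chosen from F(x), so the sketch makes these assumptions: x is a column index inside the sub-picture, t(x) is the vertical projection (ink-pixel count of column x), the cut is placed at the x that minimizes F(x), and the combined width of two neighbours is the sum of their widths. Each sub-picture is modelled as a `(projection, marked)` tuple; all function names are hypothetical. The constant 0.01 comes from the formula and the default M=1.2 from claim 7.

```python
import math


def cut_position(projection, center, k=0.01):
    """Column index x in 1..len-1 that minimizes F(x) = g(x) * t(x)."""
    def F(x):
        g = 1.0 / (1.0 + math.exp(-k * abs(x - center)))
        return g * projection[x]
    return min(range(1, len(projection)), key=F)


def split_wide(pieces, h, M=1.2):
    """Step B: split unmarked pieces until each one satisfies L <= M*h."""
    out, stack = [], list(reversed(pieces))
    while stack:
        proj, marked = stack.pop()
        if marked or len(proj) <= M * h or len(proj) < 2:
            out.append((proj, marked))
        else:
            x = cut_position(proj, center=h)
            stack.append((proj[x:], False))  # right part, examined second
            stack.append((proj[:x], False))  # left part, examined first
    return out


def merge_narrow(pieces, h, M=1.2):
    """Step C: merge adjacent unmarked pieces whose combined width <= M*h."""
    out, i = list(pieces), 0
    while i + 1 < len(out):
        (p1, m1), (p2, m2) = out[i], out[i + 1]
        if not m1 and not m2 and len(p1) + len(p2) <= M * h:
            out[i:i + 2] = [(p1 + p2, False)]  # stay at i and retry with the next neighbour
        else:
            i += 1
    return out


def rebalance(pieces, h):
    """Step D: for a narrow/wide/narrow triple, split the middle and merge outward."""
    out, i = list(pieces), 0
    while i + 2 < len(out):
        (a, ma), (mid, mm), (c, mc) = out[i:i + 3]
        if (not (ma or mm or mc) and len(a) <= 0.5 * h and len(c) <= 0.5 * h
                and len(mid) >= h and len(mid) >= 2):
            x = cut_position(mid, center=0.5 * h)
            out[i:i + 3] = [(a + mid[:x], False), (mid[x:] + c, False)]
            i += 2
        else:
            i += 1
    return out


h = 6
wide = [([5, 6, 7, 6, 5, 0, 5, 6, 7, 6, 5, 4], False)]
print([len(p) for p, _ in split_wide(wide, h)])
```

Note that g(x) equals 0.5 at the center (x = h in step B, x = 0.5h in step D) and grows toward 1 away from it, so under the minimization assumed here a low-projection column near the center is preferred over an equally low column far from it.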
6. The system according to claim 5, characterized in that 0.9≤M≤1.3.
7. The system according to claim 6, characterized in that M=1.2.
8. The system according to claim 7, characterized in that the system is a computer or server loaded with the above image character recognition program.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611254376.0A CN106682671A (en) 2016-12-29 2016-12-29 Image character recognition system

Publications (1)

Publication Number Publication Date
CN106682671A (en) 2017-05-17

Family

ID=58872298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611254376.0A Pending CN106682671A (en) 2016-12-29 2016-12-29 Image character recognition system

Country Status (1)

Country Link
CN (1) CN106682671A (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07131622A (en) * 1993-10-29 1995-05-19 Matsushita Graphic Commun Syst Inc Facsimile equipment
JPH08129443A (en) * 1994-07-13 1996-05-21 Yashima Denki Co Ltd Holograph storing and reproducing device, holograph reproducing method, and picture reproducing method
JP2001351065A (en) * 2000-06-05 2001-12-21 Japan Science & Technology Corp Method for recognizing character, computer readable recording medium recording character recognition program and character recognition device
JP2002230482A (en) * 2000-11-28 2002-08-16 Fujitsu Ltd Character recognizing method and device
JP2004334913A (en) * 2004-08-19 2004-11-25 Matsushita Electric Ind Co Ltd Document recognition device and document recognition method
JP2006331354A (en) * 2005-05-30 2006-12-07 Sharp Corp Character recognition device, character recognition method, its program and recording medium
JP2008004116A (en) * 2007-08-02 2008-01-10 Hitachi Ltd Method and device for retrieving character in video
CN101571921A (en) * 2008-04-28 2009-11-04 富士通株式会社 Method and device for identifying key words
CN102663378A (en) * 2012-03-22 2012-09-12 杭州新锐信息技术有限公司 Method for indentifying joined-up handwritten characters
CN102915440A (en) * 2011-08-03 2013-02-06 汉王科技股份有限公司 Method and device for character segmentation
JP2015215893A (en) * 2014-05-08 2015-12-03 株式会社Nttドコモ Identification method and facility for sign character of exercise participant
CN105447522A (en) * 2015-11-25 2016-03-30 成都数联铭品科技有限公司 Complex image character identification system
CN105654072A (en) * 2016-03-24 2016-06-08 哈尔滨工业大学 Automatic character extraction and recognition system and method for low-resolution medical bill image
CN105678292A (en) * 2015-12-30 2016-06-15 成都数联铭品科技有限公司 Complex optical text sequence identification system based on convolution and recurrent neural network
CN106682698A (en) * 2016-12-29 2017-05-17 成都数联铭品科技有限公司 OCR identification method based on template matching

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682698A (en) * 2016-12-29 2017-05-17 成都数联铭品科技有限公司 OCR identification method based on template matching
CN109034149A (en) * 2017-06-08 2018-12-18 北京君正集成电路股份有限公司 A kind of character identifying method and device
CN107292280A (en) * 2017-07-04 2017-10-24 盛世贞观(北京)科技有限公司 A kind of seal automatic font identification method and identifying device
CN107545391A (en) * 2017-09-07 2018-01-05 安徽共生物流科技有限公司 A kind of logistics document intellectual analysis and automatic storage method based on image recognition
CN110942074A (en) * 2018-09-25 2020-03-31 京东数字科技控股有限公司 Character segmentation recognition method and device, electronic equipment and storage medium
CN110942074B (en) * 2018-09-25 2024-04-09 京东科技控股股份有限公司 Character segmentation recognition method and device, electronic equipment and storage medium
CN110390508A (en) * 2019-06-10 2019-10-29 平安科技(深圳)有限公司 Schedule method, apparatus and storage medium are created based on OCR
CN113627849A (en) * 2021-08-12 2021-11-09 深圳市全景世纪科技有限公司 Method and system for improving automatic goods customer information acquisition recognition rate

Similar Documents

Publication Publication Date Title
CN106682671A (en) Image character recognition system
CN106682698A (en) OCR identification method based on template matching
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN111401372B (en) Method for extracting and identifying image-text information of scanned document
CN104809481B (en) A kind of natural scene Method for text detection based on adaptive Color-based clustering
CN106611174A (en) OCR recognition method for unusual fonts
US6252988B1 (en) Method and apparatus for character recognition using stop words
CN105512611A (en) Detection and identification method for form image
CN105447522A (en) Complex image character identification system
CN106682667A (en) Image-text OCR (optical character recognition) system for uncommon fonts
Ferrer et al. Lbp based line-wise script identification
CN104008384A (en) Character identification method and character identification apparatus
CN105426856A (en) Image table character identification method
JP2006053920A (en) Character recognition program, method and device
CN105469053A (en) Bayesian optimization-based image table character segmentation method
CN107463866A (en) A kind of method of the hand-written laboratory report of identification for performance evaluation
Yin et al. Decipherment of historical manuscript images
Yadav et al. A robust approach for offline English character recognition
CN109685061A (en) The recognition methods of mathematical formulae suitable for structuring
CN106778759A (en) For the feature image automatic creation system of pictograph identification
Darma et al. Segmentation of balinese script on lontar manuscripts using projection profile
CN106682666A (en) Characteristic template manufacturing method for unusual font OCR identification
CN108062548B (en) Braille square self-adaptive positioning method and system
CN110674678A (en) Method and device for identifying sensitive mark in video
CN112580738B (en) AttentionOCR text recognition method and device based on improvement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170517