CN106503711A - A kind of character recognition method - Google Patents
- Publication number
- CN106503711A
- Authority
- CN
- China
- Prior art keywords
- image
- word
- character recognition
- recognition method
- obtains
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/146—Aligning or centring of the image pick-up or image-field
- G06V30/1475—Inclination or skew detection or correction of characters or of image to be recognised
- G06V30/1478—Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Input (AREA)
Abstract
The present invention relates to the technical field of image recognition, and more particularly to a character recognition method, comprising: acquiring an image to be recognized; preprocessing the acquired image: skew correction to straighten the image, and thresholding to obtain an image in which foreground information and background information are cleanly separated; analyzing the image: analyzing the inter-line texture features of the image to obtain the character matrix parameters of the image; segmenting the image: cutting the image based on the character matrix parameters to form several sub-images and obtain the character blocks of the image; and recognition: processing each character block individually, obtaining the image features of the character block, and recognizing those image features. The image correction comprises: first dilating the image, detecting the edges of the dilated image, applying a Hough transform to the edge points to find the angle of the longest line segment, and rotating the image by the angle of that line. The method is simple and achieves a high recognition rate.
Description
【Technical field】
The present invention relates to the technical field of image recognition, and more particularly to a character recognition method.
【Background art】
With the widespread use of image acquisition devices such as digital cameras, webcams, and ultra-high-speed scanners, the information contained in images is attracting more and more attention. Text embedded in an image is an important expression of the image's semantic content and can provide important information that people need. For example, the text in an image may describe the content of that image; if that text can be extracted and recognized automatically, a computer can understand the image content automatically. Enabling computers to recognize text in images as humans do is of great significance for the storage, classification, understanding, and retrieval of images and video. It is mainly applied in high-technology fields such as Chinese information processing, office automation, machine translation, and artificial intelligence, and has broad application prospects and commercial value. At present, text in images is generally recognized after only simple image segmentation, with no adaptive adjustment to the characteristics of the text in the image. As a result, existing image character recognition methods have low precision and cannot meet the demands of practical applications.
【Summary of the invention】
In view of the foregoing, it is necessary to provide a character recognition method that solves the technical problem that existing image character recognition methods recognize text with low precision.
The object of the present invention is achieved through the following technical solution:
A character recognition method, comprising the following steps:
Acquiring an image to be recognized;
Preprocessing the acquired image: thresholding to obtain an image in which foreground information and background information are cleanly separated; and skew correction to straighten the image, the skew correction comprising the following steps: taking the thresholded image as the text image to be corrected; extracting the straight lines in the text image to be corrected by Hough transform; filtering the straight lines according to their length and inclination angle; for the straight lines remaining after filtering, taking the median of their inclination angles as the inclination angle of the text image to be corrected; and rotating the text image to be corrected according to that inclination angle;
Analyzing the image: analyzing the inter-line texture features of the image to obtain the character matrix parameters of the image;
Segmenting the image: cutting the image based on the character matrix parameters to form several sub-images and obtain the character blocks of the image;
Recognition: processing each character block individually, obtaining the image features of the character block, and recognizing those image features.
Further, preprocessing the acquired image also includes performing image noise reduction on the image to be recognized, to improve the accuracy of the recognition processing.
Further, the image noise reduction may use methods such as the wavelet denoising method, the morphological scratch filter method, the median filter method, the adaptive Wiener filter method, and the mean filter method.
Further, the thresholding uses a fixed threshold method, an adaptive threshold method, Otsu's method, or an iterative method.
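Of the thresholding options named above, Otsu's method can be sketched as follows: it picks the gray level that maximizes the between-class variance of the two resulting pixel classes. The function names, the 256-level assumption, and the choice to map brighter pixels to 1 are illustrative assumptions, not details taken from the patent.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance.

    pixels is a flat list of integer gray levels in range(levels).
    Class 0 holds values <= t, class 1 holds values > t.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels in class 0
    sum0 = 0  # sum of gray levels in class 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        score = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if score > best_score:
            best_score, best_t = score, t
    return best_t

def binarize(img, t):
    """Map pixels brighter than t to 1 and all others to 0."""
    return [[1 if p > t else 0 for p in row] for row in img]

gray = [[20, 22, 200],
        [21, 210, 205]]
t = otsu_threshold([p for row in gray for p in row])
binary = binarize(gray, t)
```

Whether the 1-valued class is the text or the background depends on the image (dark text on a light page, or the reverse), so a real pipeline would need to decide which class to treat as foreground.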
Further, the image is represented as an image matrix in which, within the matrix coordinates of the image, glyph pixels are represented by a first pixel value and background pixels by a second pixel value; the number of pixels with the second pixel value in each row of the matrix coordinates is counted to obtain an array; several row-height parameters are derived from these statistics, and the parameters are averaged to obtain the font size parameter.
Further, the recognition performs image segmentation on the cut character sub-blocks based on a preset clustering algorithm to obtain the character information in each character block, compares that character information against a preset system character library, and determines the characters in the image according to the comparison result.
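The comparison against a preset character library could look like the sketch below. It is only an illustration: the patent does not say how the comparison is done, so a plain nearest-template match by Hamming distance is swapped in here, and the names `LIBRARY`, `hamming`, and `recognize` as well as the tiny 3x3 glyphs are hypothetical. The clustering-based segmentation mentioned above is not shown.

```python
def hamming(a, b):
    """Number of pixel positions at which two equally sized binary blocks differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def recognize(block, library):
    """Return the library character whose template differs least from block."""
    return min(library, key=lambda ch: hamming(block, library[ch]))

# Hypothetical preset character library: character -> binary template.
LIBRARY = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}

# An "L" with one stroke pixel missing still matches the "L" template best.
sample = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]
result = recognize(sample, LIBRARY)
```

Template comparison of this kind only works when all blocks share the template size, which is one reason the method normalizes character blocks before recognition.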
Further, analyzing the image also includes dilating the character blocks.
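Dilation of a binary character block can be sketched as follows. The 3x3 square structuring element and the function name `dilate` are assumptions for illustration; the patent does not state which structuring element is used.

```python
def dilate(img):
    """Binary dilation with a 3x3 square structuring element.

    img is a 2-D list of 0/1 values; an output pixel is 1 if any pixel in
    its 3x3 neighbourhood (clipped at the border) is 1.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))))
    return out

block = [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]
thick = dilate(block)
```

Dilation thickens strokes and closes small gaps, which helps keep the parts of one character connected before it is cut out.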
Further, the recognition step includes normalizing the extracted character blocks before recognizing them.
Beneficial effects of the present invention: the invention analyzes the row-height texture features of the matrix of the image to be recognized and calculates the matrix parameters of the text in the image; it then estimates the font size parameter from those matrix parameters, segments the individual character blocks, and recognizes the character sub-blocks. This improves the accuracy with which character sub-blocks are cut and thereby the precision of character recognition.
【Detailed description】
A character recognition method, characterized by comprising the following steps:
Acquiring an image to be recognized. The image to be recognized may be any image on which text recognition needs to be performed, and may come from an external device. It may be an original image or an image obtained by preprocessing an original image, and may be in a picture format such as jpg, bmp, or png.
Preprocessing the acquired image, including thresholding, noise reduction, and skew correction.
Thresholding: the thresholding uses a fixed threshold method, an adaptive threshold method, Otsu's method, or an iterative method. Thresholding facilitates further processing of the image: it yields an image in which foreground information and background information are cleanly separated, makes the image simpler, reduces the amount of data, and highlights the contours of the targets of interest.
Noise reduction: because the quality of the image to be recognized is limited by the input device, the environment, and the print quality of the document, the image must be denoised according to the characteristics of the noise before the printed characters in it are recognized, which improves the accuracy of the recognition processing. The image noise reduction may use methods such as the wavelet denoising method, the morphological scratch filter method, the median filter method, the adaptive Wiener filter method, and the mean filter method.
Skew correction: because scanning and photographing involve manual operation, the images fed into the computer are almost always somewhat skewed, so before the printed characters in the image are recognized, the orientation of the image must be detected and corrected. In this embodiment, the specific steps of skew correction are: taking the thresholded image as the text image to be corrected; extracting the straight lines in the text image to be corrected by Hough transform; filtering the straight lines according to their length and inclination angle; for the straight lines remaining after filtering, taking the median of their inclination angles as the inclination angle of the text image to be corrected; and rotating the text image to be corrected according to that inclination angle. In the Hough transform, each character point in the binary matrix is converted to a curve in the polar coordinate system, and the curves of all character points lying on one straight line in the image-space coordinate system intersect at a single point in the polar coordinate system. Points in the polar coordinate system whose value exceeds a certain threshold are then marked as straight lines at the corresponding positions in the image-space coordinate system. Finally, for each straight line in the image-space coordinate system, the corresponding line segment is obtained from the information of the character points lying on that line.
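The skew-estimation steps above can be sketched roughly as follows, using the normal form rho = x*cos(theta) + y*sin(theta) for the Hough accumulator. Several details are assumptions made for this example and are not given in the patent: the 1-degree angle step, the use of the vote count as a stand-in for line length (`min_votes`), the angle limit `max_skew`, and the function name `estimate_skew`. The final rotation of the image is left to whatever image library is in use.

```python
import math
from collections import Counter
from statistics import median

def estimate_skew(points, min_votes, max_skew=30):
    """Estimate the skew angle in degrees from foreground pixel coordinates.

    points is a list of (x, y) tuples. Every accumulator cell (theta, rho)
    with at least min_votes votes is treated as a detected line; its
    inclination is theta - 90 because theta is the direction of the normal.
    Lines steeper than max_skew are filtered out, and the median inclination
    of the remaining lines is returned (0.0 if none remain).
    """
    votes = Counter()
    for theta in range(180):
        c = math.cos(math.radians(theta))
        s = math.sin(math.radians(theta))
        for x, y in points:
            votes[(theta, round(x * c + y * s))] += 1
    angles = [theta - 90
              for (theta, _rho), n in votes.items()
              if n >= min_votes and abs(theta - 90) <= max_skew]
    return median(angles) if angles else 0.0

# Three synthetic parallel "text baselines", each inclined by about 5 degrees.
slope = math.tan(math.radians(5))
points = [(x, round(x * slope) + offset)
          for offset in (5, 15, 25)
          for x in range(60)]
skew = estimate_skew(points, min_votes=45)
```

Taking the median rather than the mean makes the estimate less sensitive to a few stray lines (for example a table border or a picture edge) that survive the filter.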
Analyzing the image: the inter-line texture features of the image are analyzed to obtain the character matrix parameters of the image. The image is represented as an image matrix in which, within the matrix coordinates of the image, glyph pixels are represented by a first pixel value and background pixels by a second pixel value; the number of pixels with the second pixel value in each row of the matrix coordinates is counted to obtain an array; several row-height parameters are derived from these statistics, and the parameters are averaged to obtain the font size parameter.
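A minimal sketch of this row statistic follows. It assumes that a row consisting only of background pixels separates two text lines, and that the height of each run of non-blank rows is one row-height parameter; these interpretations, and the function names, are assumptions for illustration.

```python
def row_background_counts(img, background=0):
    """For each row, count the pixels that hold the background value."""
    return [sum(1 for p in row if p == background) for row in img]

def text_line_runs(counts, width):
    """Return (start, end) row ranges, end exclusive, that contain glyph pixels.

    A row is blank when its background count equals the image width.
    """
    runs, start = [], None
    for y, c in enumerate(counts):
        if c < width and start is None:
            start = y
        elif c == width and start is not None:
            runs.append((start, y))
            start = None
    if start is not None:
        runs.append((start, len(counts)))
    return runs

def font_size(img, background=0):
    """Average height of the text lines, used as the font size estimate."""
    runs = text_line_runs(row_background_counts(img, background), len(img[0]))
    heights = [end - start for start, end in runs]
    return sum(heights) / len(heights) if heights else 0.0

page = [[0, 0, 0, 0],
        [1, 0, 1, 0],
        [1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [1, 0, 0, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 0]]
counts = row_background_counts(page)
size = font_size(page)
```

Here glyph pixels are 1 and background pixels are 0; the two text lines are each three rows tall, so the estimated font size is three pixels.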
Segmenting the image: the image is cut based on the character matrix parameters to form several sub-images and obtain the character blocks of the image. Before the image is cut, the arrangement direction of the text in the text region is also determined: the character blocks may be scanned pixel line by pixel line to obtain the line spacing and column spacing of the text in the character blocks, and the variance of the text-line heights and the variance of the text-column widths are calculated. The height variance of the text lines reflects the consistency of the text-line heights, and the width variance of the text columns reflects the consistency of the text-column widths. Factors such as the spacing of the text and the consistency of the text-line heights or text-column widths are then combined to judge whether the text is arranged horizontally or vertically. For example, if the line spacing is greater than the column spacing and the text-line heights are consistent, the text in the text region is judged to be arranged horizontally. If the column spacing is greater than the line spacing and the text-column widths are consistent, the text in the text region is judged to be arranged vertically. The cutting result of the character blocks is then corrected, for example by merging text rows or columns that were wrongly split, or by correcting a wrong split between the first and second letters of an English word.
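The direction test described above can be sketched with row and column projections. The tolerance `tol` used to decide that heights or widths are "consistent", the use of population variance, and the function names are assumptions for this example.

```python
from statistics import mean, pvariance

def runs(profile):
    """Return (start, end) index ranges, end exclusive, where profile > 0."""
    out, start = [], None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(profile)))
    return out

def spacing_and_variance(ranges):
    """Mean gap between consecutive runs and variance of the run sizes."""
    sizes = [end - start for start, end in ranges]
    gaps = [b[0] - a[1] for a, b in zip(ranges, ranges[1:])]
    return (mean(gaps) if gaps else 0.0,
            pvariance(sizes) if sizes else 0.0)

def text_direction(img, tol=1.0):
    """Classify a binary text region (1 = glyph) as horizontal or vertical."""
    row_profile = [sum(row) for row in img]
    col_profile = [sum(col) for col in zip(*img)]
    line_gap, height_var = spacing_and_variance(runs(row_profile))
    col_gap, width_var = spacing_and_variance(runs(col_profile))
    if line_gap > col_gap and height_var <= tol:
        return "horizontal"
    if col_gap > line_gap and width_var <= tol:
        return "vertical"
    return "undetermined"

line = [1, 1, 0, 1, 1, 0, 1, 1]
blank = [0] * 8
region = [line, line, blank, blank, blank, line, line]
direction = text_direction(region)
rotated = [list(col) for col in zip(*region)]
```

In the sample region the text lines are three rows apart while the characters are one column apart, so the region is classified as horizontal; its transpose is classified as vertical.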
Recognition: each character block is processed individually, the image features of the character block are obtained, and those image features are recognized. Before the characters are extracted from the text region using the character blocks produced by layout analysis and single-character cutting, the character blocks may also be dilated; the character blocks are then used to retain the gradients of the character edges and remove interference from local background gradients, so that each character is extracted from the text region. The extracted characters are normalized, i.e. all characters are scaled to a uniform size, and finally the features of each character are extracted and recognized.
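Scaling every character to a uniform size can be sketched with nearest-neighbour resampling. The target size and the function name `normalize` are assumptions for illustration; the patent does not name an interpolation method.

```python
def normalize(block, size=4):
    """Resize a 2-D block to size x size using nearest-neighbour sampling."""
    h, w = len(block), len(block[0])
    return [[block[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]

small = [[1, 0],
         [0, 1]]
unified = normalize(small, size=4)
```

After normalization, blocks cut from text of different font sizes have identical dimensions, so their features can be compared directly.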
The above is only a preferred embodiment of the present invention and does not limit the invention in any form. Although the invention has been disclosed above by way of a preferred embodiment, that embodiment is not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the invention, use the technical content disclosed above to make minor changes or modifications that amount to equivalent embodiments. Any simple modification, equivalent change, or adaptation made to the above embodiment in accordance with the technical essence of the invention, without departing from the content of its technical solution, still falls within the scope of the technical solution of the invention.
Claims (8)
1. A character recognition method, characterized by comprising the following steps:
Acquiring an image to be recognized;
Preprocessing the acquired image: thresholding to obtain an image in which foreground information and background information are cleanly separated; and skew correction to straighten the image, the skew correction comprising the following steps: taking the thresholded image as the text image to be corrected; extracting the straight lines in the text image to be corrected by Hough transform; filtering the straight lines according to their length and inclination angle; for the straight lines remaining after filtering, taking the median of their inclination angles as the inclination angle of the text image to be corrected; and rotating the text image to be corrected according to that inclination angle;
Analyzing the image: analyzing the inter-line texture features of the image to obtain the character matrix parameters of the image;
Segmenting the image: cutting the image based on the character matrix parameters to form several sub-images and obtain the character blocks of the image;
Recognition: processing each character block individually, obtaining the image features of the character block, and recognizing those image features.
2. The character recognition method according to claim 1, characterized in that: preprocessing the acquired image further includes performing image noise reduction on the image to be recognized, to improve the accuracy of the recognition processing.
3. The character recognition method according to claim 2, characterized in that: the image noise reduction may be performed using the wavelet denoising method, the morphological scratch filter method, the median filter method, the adaptive Wiener filter method, and the mean filter method.
4. The character recognition method according to claim 1, characterized in that: the thresholding includes a fixed threshold method, an adaptive threshold method, and Otsu's method or an iterative method.
5. The character recognition method according to claim 1, characterized in that: the image is represented as an image matrix in which, within the matrix coordinates of the image, glyph pixels are represented by a first pixel value and background pixels by a second pixel value; the number of pixels with the second pixel value in each row of the matrix coordinates is counted to obtain an array; several row-height parameters are derived from these statistics, and the parameters are averaged to obtain the font size parameter.
6. The character recognition method according to claim 1, characterized in that: the recognition performs image segmentation on the cut character sub-blocks based on a preset clustering algorithm to obtain the character information in each character block, compares that character information against a preset system character library, and determines the characters in the image according to the comparison result.
7. The character recognition method according to claim 1, characterized in that: analyzing the image further includes dilating the character blocks.
8. The character recognition method according to claim 1, characterized in that: the recognition step includes normalizing the extracted character blocks before recognizing them.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611007796.9A CN106503711A (en) | 2016-11-16 | 2016-11-16 | A kind of character recognition method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611007796.9A CN106503711A (en) | 2016-11-16 | 2016-11-16 | A kind of character recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106503711A (en) | 2017-03-15 |
Family
ID=58324587
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611007796.9A Pending CN106503711A (en) | 2016-11-16 | 2016-11-16 | A kind of character recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106503711A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101038626A (en) * | 2007-04-25 | 2007-09-19 | 上海大学 | Method and device for recognizing test paper score |
CN103324930A (en) * | 2013-06-28 | 2013-09-25 | 浙江大学苏州工业技术研究院 | License plate character segmentation method based on grey level histogram binaryzation |
CN104036241A (en) * | 2014-05-30 | 2014-09-10 | 宁波海视智能系统有限公司 | License plate recognition method |
CN104050450A (en) * | 2014-06-16 | 2014-09-17 | 西安通瑞新材料开发有限公司 | Vehicle license plate recognition method based on video |
CN104298982A (en) * | 2013-07-16 | 2015-01-21 | 深圳市腾讯计算机系统有限公司 | Text recognition method and device |
CN105631486A (en) * | 2014-10-27 | 2016-06-01 | 深圳Tcl数字技术有限公司 | Method and device for recognizing images and characters |
-
2016
- 2016-11-16 CN CN201611007796.9A patent/CN106503711A/en active Pending
Non-Patent Citations (1)
Title |
---|
宁博: "手写体汉字识别实验平台及笔划网格特征提取方法的研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108875721A (en) * | 2017-12-18 | 2018-11-23 | 南通艾思达智能科技有限公司 | A kind of more specification text cooperatives positioning and extracting method |
CN109981920A (en) * | 2017-12-28 | 2019-07-05 | 夏普株式会社 | Image processing apparatus, image processing program, image processing method and image forming apparatus |
CN109981920B (en) * | 2017-12-28 | 2021-10-01 | 夏普株式会社 | Image processing apparatus, image processing program, image processing method, and image forming apparatus |
CN110858306A (en) * | 2018-08-22 | 2020-03-03 | 西门子(中国)有限公司 | License plate character recognition apparatus, method and computer-readable storage medium |
CN111079756A (en) * | 2018-10-19 | 2020-04-28 | 杭州萤石软件有限公司 | Method and equipment for extracting and reconstructing table in document image |
CN111079756B (en) * | 2018-10-19 | 2023-09-19 | 杭州萤石软件有限公司 | Form extraction and reconstruction method and equipment in receipt image |
CN109542285A (en) * | 2018-11-16 | 2019-03-29 | 北京小米移动软件有限公司 | Image processing method and device |
CN109409377A (en) * | 2018-12-03 | 2019-03-01 | 龙马智芯(珠海横琴)科技有限公司 | The detection method and device of text in image |
CN109886978A (en) * | 2019-02-20 | 2019-06-14 | 贵州电网有限责任公司 | A kind of end-to-end warning information recognition methods based on deep learning |
CN110163203B (en) * | 2019-04-09 | 2021-08-24 | 浙江口碑网络技术有限公司 | Character recognition method, device, storage medium and computer equipment |
CN110163203A (en) * | 2019-04-09 | 2019-08-23 | 浙江口碑网络技术有限公司 | Character identifying method, device, storage medium and computer equipment |
CN110427939A (en) * | 2019-08-02 | 2019-11-08 | 泰康保险集团股份有限公司 | Method, apparatus, medium and the electronic equipment of correction inclination text image |
CN111223109A (en) * | 2020-01-03 | 2020-06-02 | 四川新网银行股份有限公司 | Complex form image analysis method |
CN111223109B (en) * | 2020-01-03 | 2023-06-06 | 四川新网银行股份有限公司 | Complex form image analysis method |
CN111401371A (en) * | 2020-06-03 | 2020-07-10 | 中邮消费金融有限公司 | Text detection and identification method and system and computer equipment |
CN111401371B (en) * | 2020-06-03 | 2020-09-08 | 中邮消费金融有限公司 | Text detection and identification method and system and computer equipment |
CN112883942A (en) * | 2021-04-28 | 2021-06-01 | 北京世纪好未来教育科技有限公司 | Evaluation method and device for handwritten character, electronic equipment and computer storage medium |
CN113569859A (en) * | 2021-07-27 | 2021-10-29 | 北京奇艺世纪科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN113569859B (en) * | 2021-07-27 | 2023-07-04 | 北京奇艺世纪科技有限公司 | Image processing method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106503711A (en) | A kind of character recognition method | |
CN107609549B (en) | Text detection method for certificate image in natural scene | |
CN108171104B (en) | Character detection method and device | |
US20190340460A1 (en) | Text line detecting method and text line detecting device | |
CN106778752A (en) | A kind of character recognition method | |
US20140193029A1 (en) | Text Detection in Images of Graphical User Interfaces | |
CN103488983A (en) | Business card OCR data correction method and system based on knowledge base | |
CN108805076A (en) | The extracting method and system of environmental impact assessment report table word | |
Bai et al. | Scene text localization using gradient local correlation | |
CN110276279B (en) | Method for detecting arbitrary-shape scene text based on image segmentation | |
CN112364862B (en) | Histogram similarity-based disturbance deformation Chinese character picture matching method | |
Lin et al. | Image segmentation using the k-means algorithm for texture features | |
CN102136074B (en) | Man-machine interface (MMI) based wood image texture analyzing and identifying method | |
Liu et al. | A novel multi-oriented chinese text extraction approach from videos | |
CN107730511B (en) | Tibetan historical literature text line segmentation method based on baseline estimation | |
CN113139535A (en) | OCR document recognition method | |
CN104598881B (en) | Feature based compresses the crooked scene character recognition method with feature selecting | |
Angadi et al. | A robust segmentation technique for line, word and character extraction from Kannada text in low resolution display board images | |
Karanje et al. | Survey on text detection, segmentation and recognition from a natural scene images | |
CN106503713A (en) | One kind is based on thick periphery feature character recognition method | |
Ahmed et al. | Enhancing the character segmentation accuracy of bangla ocr using bpnn | |
Sharma et al. | A new method for character segmentation from multi-oriented video words | |
CN102831421B (en) | A kind of document above-below direction detection method based on punctuation mark | |
CN106469267B (en) | Verification code sample collection method and system | |
CN110807348A (en) | Method for removing interference lines in document image based on greedy algorithm |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170315 |