CN104636117A - Automatic segmentation method of form image - Google Patents
- Publication number
- CN104636117A
- Authority
- CN
- China
- Prior art keywords
- information
- driving information
- data
- region
- image
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Character Input (AREA)
Abstract
The invention discloses an automatic segmentation method for form images. The method records forms and their region information, automatically analyzes the form image to detect and locate handwritten text regions, and finally locates the segmentation regions. It comprises the following steps: (a) regions of a known form that must be segmented, recognized or entered manually are calibrated in advance; through template customization, the form and its region information are stored in a form template library, which yields knowledge-driven information; (b) text regions in the scanned or photographed form image are automatically analyzed, detected and located, which yields data-driven information; (c) the knowledge-driven and data-driven information are combined, their degree of overlap is compared, and the final segmentation regions are located. The method combines an image-region localization technique that uses both knowledge-driven and data-driven information with an automated, intelligent form-data processing system based on accurate automatic segmentation of form images.
Description
Technical field
The present invention relates to the technical field of form image processing, and in particular to an automatic segmentation method for form images.
Background art
Traditionally, handwritten manuscripts are entered by hand. Because handwriting is varied and complex, this work is labor-intensive for staff while input efficiency remains very low, which causes considerable difficulty in practice. Researchers have therefore developed many application programs in the hope of fundamentally solving the problem of entering handwritten manuscripts quickly.
Chinese patent CN103020619A, "Method for automatically segmenting handwritten entries in an electronized notebook", proceeds as shown in Fig. 2: (1) photograph the paper page of the notebook to be electronized; (2) determine the four edge lines of the paper page image with a line detection method, and rectify the page area bounded by these four lines into a square region; (3) determine the type of the paper page from the page image and retrieve the pre-stored blank segmentation template for that notebook type, the blank template consisting of a number of character blocks; (4) determine the character blocks of the square region in which the user's handwriting lies, and automatically segment and extract the handwriting in each character block, block by block. That invention registers the template against the handwritten text only by simple discrimination, so it cannot achieve accurate localization, and it cannot effectively process handwritten text regions that are mixed with tables.
Summary of the invention
The object of the invention is to address the above technical problems of the prior art by providing an automatic segmentation method for form images that combines an image-region localization technique using knowledge-driven and data-driven information with an automated, intelligent form-data processing system based on accurate automatic segmentation of form images, thereby effectively improving input efficiency.
The present invention is achieved by the following technical solutions:
An automatic segmentation method for form images comprises the following steps: (1) obtain the physical form from form documents; (2) scan or photograph the physical form to obtain a form image; (3) analyze and learn from the form image data to obtain data-driven information used for segmenting handwritten text regions; (4) customize form templates, storing the forms and their region information in a form template library; (5) obtain from the form template library the knowledge-driven information used for region segmentation; (6) region analysis: combine the data-driven and knowledge-driven information, analyze and locate regions in the form image, and obtain region information such as the positions of the regions to be segmented; (7) region segmentation: use the region information to crop the form image and obtain the final output region images.
Further, the form image data are analyzed and learned from to obtain the data-driven information used for segmenting handwritten text regions, which comprises the position and type of each region. The analysis and learning steps are as follows:
(A) First, the form image is binarized. The system uses an adaptive binarization method that combines the Otsu method and the Niblack method: the output image is the logical AND of the images produced by the two methods. Let $p(x,y)$ be the value of point $(x,y)$ in the final output binary image, $p_{\mathrm{Otsu}}(x,y)$ the value obtained by the Otsu method, and $p_{\mathrm{Niblack}}(x,y)$ the value obtained by the Niblack method; then
$$p(x,y) = p_{\mathrm{Otsu}}(x,y) \land p_{\mathrm{Niblack}}(x,y)$$
where $p(x,y)=1$ denotes a black point (foreground character) and $p(x,y)=0$ denotes a white point (background);
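The combined binarization can be sketched in Python with NumPy alone. This is a minimal sketch under stated assumptions: the text gives no parameters, so the Niblack window of 15 pixels and k = -0.2 are assumed values (-0.2 is the value commonly quoted for Niblack's method), a pixel is treated as foreground when it is darker than the threshold, and the function names are hypothetical.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Global Otsu threshold t for an 8-bit image; pixels <= t form the dark class."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # probability of the dark class for each t
    mu = np.cumsum(prob * np.arange(256))    # cumulative first moment for each t
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu[-1] * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))           # maximise the between-class variance

def local_mean_std(gray: np.ndarray, window: int):
    """Mean and standard deviation over a window x window neighbourhood (integral images)."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    img = gray.astype(np.float64)
    pad = window // 2
    padded = np.pad(img, pad, mode="reflect")
    h, w = img.shape

    def box_sum(a):
        s = np.pad(a, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
        return (s[window:window + h, window:window + w] - s[:h, window:window + w]
                - s[window:window + h, :w] + s[:h, :w])

    n = window * window
    mean = box_sum(padded) / n
    var = box_sum(padded ** 2) / n - mean ** 2
    return mean, np.sqrt(np.maximum(var, 0.0))

def binarize_otsu_and_niblack(gray: np.ndarray, window: int = 15, k: float = -0.2) -> np.ndarray:
    """p = p_Otsu AND p_Niblack, with 1 = black (foreground) and 0 = white (background)."""
    p_otsu = gray <= otsu_threshold(gray)
    mean, std = local_mean_std(gray, window)
    p_niblack = gray < mean + k * std        # Niblack: T = m + k * s (assumed k, window)
    return (p_otsu & p_niblack).astype(np.uint8)
```

Because the masks are combined with AND, a pixel is kept only if both the global and the local threshold call it dark, which suppresses the speckle that Niblack alone tends to produce in flat background areas.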
(B) The individual regions of the form image are obtained by connected-component analysis, and these regions must then be classified. Handwriting is discriminated at a mixed granularity, i.e. the processing unit is a block formed by merging several connected components. Because the characteristics of handwriting are uncertain, a Fisher Linear Discriminant (FLD) classifier based on incremental learning is adopted. The projection vector of the classical FLD algorithm is
$$w = S_w^{-1}(m_1 - m_2)$$
where $S_w = C_1 + C_2$ is the within-class scatter matrix and $m_i$ is the mean vector of the samples of class $i$;
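A minimal two-class FLD sketch follows. It assumes population covariances (bias=True) for $C_1$ and $C_2$ and a decision threshold halfway between the projected class means; the text states neither choice, and the 2-D feature vectors are synthetic stand-ins for the block features, which are not specified. Function names are hypothetical.

```python
import numpy as np

def fld_direction(x1: np.ndarray, x2: np.ndarray):
    """Classical two-class FLD: w = S_w^{-1} (m1 - m2) with S_w = C1 + C2."""
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    s_w = np.cov(x1, rowvar=False, bias=True) + np.cov(x2, rowvar=False, bias=True)
    w = np.linalg.solve(s_w, m1 - m2)        # avoids forming the explicit inverse
    return w, m1, m2

def fld_classify(x: np.ndarray, w: np.ndarray, m1: np.ndarray, m2: np.ndarray) -> np.ndarray:
    """Label 1 if the projection lies on the m1 side of the midpoint, else 2 (assumed rule)."""
    threshold = w @ (m1 + m2) / 2.0
    return np.where(x @ w > threshold, 1, 2)

# Hypothetical 2-D block features for two classes, e.g. "handwritten" (1) vs "printed" (2).
rng = np.random.default_rng(0)
x1 = rng.normal([3.0, 0.0], 1.0, size=(200, 2))
x2 = rng.normal([-3.0, 0.0], 1.0, size=(200, 2))
w, m1, m2 = fld_direction(x1, x2)
accuracy = ((fld_classify(x1, w, m1, m2) == 1).mean() + (fld_classify(x2, w, m1, m2) == 2).mean()) / 2
```

Since $S_w$ is positive definite, $w^{\top}m_1 - w^{\top}m_2 = (m_1-m_2)^{\top}S_w^{-1}(m_1-m_2) > 0$, so class 1 always projects to the higher side of the midpoint.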
The Sequential Karhunen-Loeve (SKL) algorithm is used to update $C_i$ incrementally. SKL estimates $C_i$ from $D_i$, formed by the K largest eigenvalues, and $U_i$, formed by the corresponding eigenvectors:
$$C_i \approx U_i D_i U_i^{\top}$$
where $D_i$ is a K × K diagonal matrix and $U_i$ is a matrix with K columns;
Because the feature vectors used in handwriting discrimination are low-dimensional, singular value decomposition (SVD) is applied directly to update $D_i$ and $U_i$ as new samples keep arriving;
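One way to realise this SVD update is a rank-one update per sample, sketched below under stated assumptions: $C_i$ is taken as the population covariance of the samples seen so far, the scaling factors come from the standard running-covariance identity rather than from the patent, and the name skl_update is hypothetical. This sketch tracks an exact running mean inside the covariance update; it does not use the α-filtered class mean described for $m_i$. When K equals the feature dimension nothing is discarded, so the result can be checked against the batch covariance.

```python
import numpy as np

def skl_update(U, D, mean, n, x, K):
    """Fold one new sample x into C ~= U diag(D) U^T (population covariance of n samples).

    Uses the exact identity C_{n+1} = n/(n+1) C_n + n/(n+1)^2 (x - m_n)(x - m_n)^T,
    writes the right-hand side as B B^T and re-diagonalises it with an SVD of B.
    """
    d = x - mean
    B = np.hstack([U * np.sqrt(D * n / (n + 1)), (np.sqrt(n) / (n + 1)) * d[:, None]])
    P, S, _ = np.linalg.svd(B, full_matrices=False)   # B B^T = P diag(S^2) P^T
    k = min(K, S.size)                                 # keep the K largest eigenpairs
    return P[:, :k], S[:k] ** 2, mean + d / (n + 1), n + 1

# Check against the batch covariance when no truncation happens (K = feature dim).
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3)) @ np.array([[2.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 0.2]])
U, D, mean, n = np.zeros((3, 0)), np.zeros(0), X[0].copy(), 1
for x in X[1:]:
    U, D, mean, n = skl_update(U, D, mean, n, x, K=3)
C_incremental = U @ np.diag(D) @ U.T
```

With K smaller than the feature dimension, each step keeps only the K dominant eigenpairs, so the cost per sample stays that of an SVD of a d × (K + 1) matrix.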
In this incremental classifier, $m_i$ is updated by adaptive filtering:
$$m_i \leftarrow (1-\alpha)\, m_i + \alpha\, x_i$$
where $\alpha$ is a constant averaging factor, generally set to 0.05, and $x_i$ is a new sample of class $i$ in incremental learning.
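Assuming the adaptive filter takes the usual exponential-moving-average form, the class-mean update is a one-liner; the function name is hypothetical and the loop afterwards is only a usage check.

```python
import numpy as np

def update_class_mean(m_i: np.ndarray, x_i: np.ndarray, alpha: float = 0.05) -> np.ndarray:
    """Assumed adaptive-filter form: m_i <- (1 - alpha) * m_i + alpha * x_i."""
    return (1.0 - alpha) * m_i + alpha * x_i

# After N identical samples the mean has moved a fraction 1 - (1 - alpha)^N towards them.
m = np.zeros(2)
target = np.array([1.0, 2.0])
for _ in range(100):
    m = update_class_mean(m, target)
```

Unlike a plain running mean, the weight of each older sample decays by a factor of 1 - α per update, so with α = 0.05 the class mean can follow gradual changes in writing style.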
Further, in the region analysis that combines the data-driven and knowledge-driven information, if the overlap between the position of a handwritten text region given by the data-driven information and the position of the handwritten text region given by the knowledge-driven information exceeds 50%, the handwritten text region obtained from the data-driven information is used as the final segmentation region; text regions of other types are located for segmentation using the knowledge-driven information from the form template library.
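The fusion rule can be sketched as follows. The text does not define how the overlap is measured; this sketch assumes intersection over union of axis-aligned boxes, assumes each template region is matched to the detected handwritten box that overlaps it most, and uses hypothetical names (Region, iou, fuse).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    x0: int   # half-open box [x0, x1) x [y0, y1) in image pixels
    y0: int
    x1: int
    y1: int
    kind: str  # "handwritten", "printed", "picture", ...

def area(r: Region) -> int:
    return max(0, r.x1 - r.x0) * max(0, r.y1 - r.y0)

def iou(a: Region, b: Region) -> float:
    """Intersection over union: the assumed measure of overlap."""
    inter = max(0, min(a.x1, b.x1) - max(a.x0, b.x0)) * max(0, min(a.y1, b.y1) - max(a.y0, b.y0))
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def fuse(template_regions, detected_handwriting, threshold=0.5):
    """Combine knowledge-driven (template) and data-driven (detected) regions."""
    final = []
    for t in template_regions:
        if t.kind != "handwritten":
            final.append(t)                       # other types: trust the template
            continue
        best = max(detected_handwriting, key=lambda d: iou(t, d), default=None)
        if best is not None and iou(t, best) > threshold:
            final.append(best)                    # overlap > 50%: use the detected box
        else:
            final.append(t)                       # otherwise fall back to the template
    return final

template = [Region(10, 10, 110, 40, "handwritten"), Region(10, 50, 110, 70, "printed")]
detected = [Region(8, 12, 120, 42, "handwritten")]  # handwriting spilling past the box
result = fuse(template, detected)
```

Here the detected box overlaps the template box with IoU 2800 / 3560 ≈ 0.79, so the detected box, which covers the overflowing handwriting, replaces the template box.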
In summary, by adopting the above technical solution, the invention has the following beneficial effects:
(1) it combines image-region localization based on knowledge-driven information (template customization according to business needs, with a corresponding form template library) and data-driven information (derived from automatic analysis and learning of form images);
(2) it provides an automated, intelligent form-data processing (recognition or data-entry) system based on accurate automatic segmentation of form images;
(3) varied and complex handwritten manuscripts can be effectively discriminated and entered, further improving data-entry efficiency.
Brief description of the drawings
Embodiments of the present invention will be described by way of example with reference to the accompanying drawings, in which:
Fig. 1 is a flow chart of the automatic segmentation method for form images of the present invention;
Fig. 2 is a schematic flow chart of the method for automatically segmenting handwritten entries in an electronized notebook disclosed in patent CN103020619A.
Embodiment
All features disclosed in this specification, and all steps of any method or process disclosed, may be combined in any way, except for mutually exclusive features and/or steps.
Any feature disclosed in this specification (including any accompanying claims, abstract and drawings) may, unless specifically stated otherwise, be replaced by other equivalent or alternative features serving a similar purpose. That is, unless specifically stated otherwise, each feature is only one example of a series of equivalent or similar features.
As shown in Fig. 1, the present invention proposes an automatic segmentation method for form images comprising the following steps: (1) obtain the physical form from form documents; (2) scan or photograph the physical form to obtain a form image; (3) analyze and learn from the form image data to obtain data-driven information used for segmenting handwritten text regions; (4) customize form templates, storing the forms and their region information in a form template library; (5) obtain from the form template library the knowledge-driven information used for region segmentation; (6) region analysis: combine the data-driven and knowledge-driven information, analyze and locate regions in the form image, and obtain region information such as the positions of the regions to be segmented; (7) region segmentation: use the region information to crop the form image and obtain the final output region images.
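Step (7) reduces to array slicing once the region information is known. A minimal sketch, assuming regions are half-open pixel boxes (x0, y0, x1, y1); the function name and the optional margin parameter are hypothetical additions, not part of the described method.

```python
import numpy as np

def crop_regions(image: np.ndarray, boxes, margin: int = 0):
    """Step (7) sketch: cut each half-open (x0, y0, x1, y1) box out of the form image."""
    h, w = image.shape[:2]
    crops = []
    for x0, y0, x1, y1 in boxes:
        # Hypothetical safety margin, clamped to the image bounds.
        x0, y0 = max(0, x0 - margin), max(0, y0 - margin)
        x1, y1 = min(w, x1 + margin), min(h, y1 + margin)
        crops.append(image[y0:y1, x0:x1].copy())
    return crops

form_image = np.arange(100).reshape(10, 10)        # stand-in for a scanned form
crops = crop_regions(form_image, [(2, 3, 5, 6), (8, 8, 12, 12)], margin=1)
```

The second box extends past the image edge, so clamping returns a smaller 3 × 3 crop instead of raising an error.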
Further, the form image data are analyzed and learned from to obtain the data-driven information used for segmenting handwritten text regions, which comprises the position and type of each region. The analysis and learning steps are as follows:
(A) First, the form image is binarized. Binarization methods are either global or local; among global methods, the Otsu method performs well and is stable, and among local methods, the Niblack method performs well and is stable. The system uses an adaptive binarization method that combines the Otsu method and the Niblack method: the output image is the logical AND of the images produced by the two methods. Let $p(x,y)$ be the value of point $(x,y)$ in the final output binary image, $p_{\mathrm{Otsu}}(x,y)$ the value obtained by the Otsu method, and $p_{\mathrm{Niblack}}(x,y)$ the value obtained by the Niblack method; then
$$p(x,y) = p_{\mathrm{Otsu}}(x,y) \land p_{\mathrm{Niblack}}(x,y)$$
where $p(x,y)=1$ denotes a black point (foreground character) and $p(x,y)=0$ denotes a white point (background);
(B) The individual regions of the form image are obtained by connected-component analysis, and these regions must then be classified. Handwriting is discriminated at a mixed granularity, i.e. the processing unit is a block formed by merging several connected components; such a block may be a character line, several characters, one or two characters, or several character lines. Because, in addition, the characteristics of handwriting are uncertain, a Fisher Linear Discriminant (FLD) classifier based on incremental learning is adopted. The projection vector of the classical FLD algorithm is
$$w = S_w^{-1}(m_1 - m_2)$$
where $S_w = C_1 + C_2$ is the within-class scatter matrix and $m_i$ is the mean vector of the samples of class $i$;
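The connected-component analysis at the start of step (B), and the merging of components into processing blocks, might be sketched as below. The text specifies neither the connectivity nor the merging criterion, so 8-connectivity and the rule "merge boxes separated by fewer than gap empty pixels" (gap = 5) are assumptions, and the function names are hypothetical; a production system would more likely use a library labeling routine than this pure-Python breadth-first search.

```python
from collections import deque

import numpy as np

def connected_boxes(binary: np.ndarray):
    """8-connected components of foreground (1) pixels as half-open boxes (x0, y0, x1, y1)."""
    h, w = binary.shape
    seen = np.zeros(binary.shape, dtype=bool)
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if not binary[sy, sx] or seen[sy, sx]:
                continue
            seen[sy, sx] = True
            queue = deque([(sy, sx)])
            x0, y0, x1, y1 = sx, sy, sx + 1, sy + 1
            while queue:                       # breadth-first flood fill
                y, x = queue.popleft()
                x0, y0 = min(x0, x), min(y0, y)
                x1, y1 = max(x1, x + 1), max(y1, y + 1)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes

def merge_boxes(boxes, gap: int = 5):
    """Merge boxes separated by fewer than `gap` empty pixels until nothing changes."""
    boxes = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                a, b = boxes[i], boxes[j]
                if a[0] - gap < b[2] and b[0] - gap < a[2] and a[1] - gap < b[3] and b[1] - gap < a[3]:
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    changed = True
                    break
            if changed:
                break
    return sorted(boxes)

page = np.zeros((20, 40), dtype=np.uint8)
page[5:11, 2:5] = 1      # two strokes of one handwritten character
page[5:11, 7:10] = 1
page[5:11, 30:34] = 1    # an unrelated mark further right
components = connected_boxes(page)
blocks = merge_boxes(components)
```

The two nearby strokes become one block while the distant mark stays separate, which matches the idea that a processing unit may be one character or several.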
The Sequential Karhunen-Loeve (SKL) algorithm is used to update $C_i$ incrementally. SKL estimates $C_i$ from $D_i$, formed by the K largest eigenvalues, and $U_i$, formed by the corresponding eigenvectors:
$$C_i \approx U_i D_i U_i^{\top}$$
where $D_i$ is a K × K diagonal matrix and $U_i$ is a matrix with K columns;
Because the feature vectors used in handwriting discrimination are low-dimensional, singular value decomposition (SVD) is applied directly to update $D_i$ and $U_i$ as new samples keep arriving;
In this incremental classifier, $m_i$ is updated by adaptive filtering:
$$m_i \leftarrow (1-\alpha)\, m_i + \alpha\, x_i$$
where $\alpha$ is a constant averaging factor, generally set to 0.05, and $x_i$ is a new sample of class $i$ in incremental learning.
Further, in the region analysis that combines the data-driven and knowledge-driven information, if the overlap between the position of a handwritten text region given by the data-driven information and the position of the handwritten text region given by the knowledge-driven information is high (the threshold can generally be set to 50%), the handwritten text region obtained from the data-driven information is used as the final segmentation region; text regions of other types are located for segmentation using the knowledge-driven information from the form template library.
In operation, according to the needs of the business or user, the regions of a form (which may be in Chinese, Japanese or other languages) that need to be segmented, recognized or entered manually are calibrated in advance. Through template customization, the forms and their region information are stored in the form template library; the knowledge region information provided by the library mainly comprises the position of each region and its type (handwritten region, printed region, picture region, etc.).
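A form template library of this kind could be modelled as a small keyed collection of calibrated regions. The sketch below is hypothetical throughout (class names, field names, the JSON layout and the sample form are invented for illustration); it only mirrors the information the text says the library provides, namely region positions and region types.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class TemplateRegion:
    name: str
    box: tuple   # (x0, y0, x1, y1) in template pixel coordinates
    kind: str    # "handwritten", "printed", "picture", ...

@dataclass
class FormTemplate:
    form_id: str
    language: str                       # e.g. "zh", "ja"
    regions: list = field(default_factory=list)

class TemplateLibrary:
    """Hypothetical in-memory form template library keyed by form_id."""

    def __init__(self):
        self._forms = {}

    def add(self, template: FormTemplate) -> None:
        self._forms[template.form_id] = template

    def knowledge_info(self, form_id: str, kind=None):
        """Knowledge-driven information: calibrated regions, optionally filtered by type."""
        return [r for r in self._forms[form_id].regions if kind is None or r.kind == kind]

    def to_json(self) -> str:
        return json.dumps({k: asdict(v) for k, v in self._forms.items()}, ensure_ascii=False)

library = TemplateLibrary()
library.add(FormTemplate("expense_claim_v1", "zh", [
    TemplateRegion("applicant_name", (120, 80, 420, 120), "handwritten"),
    TemplateRegion("title", (0, 0, 600, 60), "printed"),
]))
handwritten = library.knowledge_info("expense_claim_v1", "handwritten")
```

Serialising to JSON is one simple way to persist the library between sessions; any store that keeps position and type per region would serve the same role.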
Automatic image-data analysis and learning mainly consists of automatically analyzing the form image and detecting and locating handwritten text regions. In general, handwritten text is the most important information in a form and must be located and segmented for subsequent recognition or manual entry. However, much handwritten text is not written entirely inside the region the form designer provided for it and often extends beyond that region, so the handwritten text regions must be located by automatic analysis and learning of the image data. In this way, data-driven information for handwritten text regions is obtained, mainly comprising the position and type (handwritten text region) of each region. Before localization, the image must be binarized, ruling lines filtered out, noise removed, and regions identified.
Region analysis compares the degree of agreement between the knowledge region information and the positions of the handwritten text regions from the data-driven information. If the agreement exceeds 50%, the handwritten text region obtained from the data-driven information is used as the final segmentation region; if it is below 50%, the segmentation regions are located primarily using the knowledge-driven information from the form template library.
The present invention is not limited to the foregoing embodiments. The invention extends to any new feature or any new combination of features disclosed in this specification, and to the steps of any new method or process disclosed or any new combination thereof.
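The text does not say how ruling lines are filtered out. One simple possibility, sketched here as an assumption rather than the patented step, is to clear horizontal and vertical runs of foreground pixels at least a minimum length long (min_len = 30 is a hypothetical value, and the function name is invented). Note that it would also erase handwriting pixels lying exactly on a rule.

```python
import numpy as np

def remove_ruling_lines(binary: np.ndarray, min_len: int = 30) -> np.ndarray:
    """Clear horizontal and vertical foreground runs at least min_len pixels long."""
    mask = np.zeros(binary.shape, dtype=bool)
    # Rows give horizontal runs; the transposed views give vertical runs
    # (mask.T is a view, so writes through it land in mask).
    for grid, hits in ((binary, mask), (binary.T, mask.T)):
        for r, row in enumerate(grid):
            start = None
            for c, v in enumerate(np.append(row, 0)):   # trailing 0 closes a final run
                if v and start is None:
                    start = c
                elif not v and start is not None:
                    if c - start >= min_len:
                        hits[r, start:c] = True
                    start = None
    out = binary.copy()
    out[mask] = 0
    return out

page = np.zeros((50, 50), dtype=np.uint8)
page[10, :] = 1          # horizontal table rule
page[:, 40] = 1          # vertical table rule
page[20:26, 5:9] = 1     # small handwritten stroke
cleaned = remove_ruling_lines(page)
```

Morphological opening with long line-shaped structuring elements is a common alternative that achieves a similar effect.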
Claims (3)
1. An automatic segmentation method for form images, characterized by comprising the following steps:
(1) obtaining a physical form from form documents;
(2) scanning or photographing the physical form to obtain a form image;
(3) analyzing and learning from the form image data to obtain data-driven information used for segmenting handwritten text regions;
(4) customizing form templates, storing the forms and their region information in a form template library;
(5) obtaining from the form template library knowledge-driven information used for region segmentation;
(6) region analysis: combining the data-driven and knowledge-driven information, analyzing and locating regions in the form image, and obtaining region information;
(7) region segmentation: using the region information to crop the form image and obtain the final output region images.
2. The automatic segmentation method for form images according to claim 1, characterized in that: the form image data are analyzed and learned from to obtain the data-driven information used for segmenting handwritten text regions, which comprises the position and type of each region; the analysis and learning are performed as follows:
(A) first, the form image is binarized; the system uses an adaptive binarization method that combines the Otsu method and the Niblack method, the output image being the logical AND of the images produced by the two methods; let $p(x,y)$ be the value of point $(x,y)$ in the final output binary image, $p_{\mathrm{Otsu}}(x,y)$ the value obtained by the Otsu method, and $p_{\mathrm{Niblack}}(x,y)$ the value obtained by the Niblack method; then
$$p(x,y) = p_{\mathrm{Otsu}}(x,y) \land p_{\mathrm{Niblack}}(x,y)$$
where $p(x,y)=1$ denotes a black point (foreground character) and $p(x,y)=0$ denotes a white point (background);
(B) in addition, the individual regions of the form image are obtained by connected-component analysis, and these regions are then classified; handwriting is discriminated at a mixed granularity, i.e. the processing unit is a block formed by merging several connected components; because the characteristics of handwriting are uncertain, a Fisher Linear Discriminant (FLD) classifier based on incremental learning is adopted, the projection vector of the classical FLD algorithm being
$$w = S_w^{-1}(m_1 - m_2)$$
where $S_w = C_1 + C_2$ is the within-class scatter matrix and $m_i$ is the mean vector of the samples of class $i$;
the Sequential Karhunen-Loeve (SKL) algorithm is used to update $C_i$ incrementally; SKL estimates $C_i$ from $D_i$, formed by the K largest eigenvalues, and $U_i$, formed by the corresponding eigenvectors:
$$C_i \approx U_i D_i U_i^{\top}$$
where $D_i$ is a K × K diagonal matrix and $U_i$ is a matrix with K columns;
because the feature vectors used in handwriting discrimination are low-dimensional, singular value decomposition (SVD) is applied directly to update $D_i$ and $U_i$ as new samples keep arriving;
in this incremental classifier, $m_i$ is updated by adaptive filtering:
$$m_i \leftarrow (1-\alpha)\, m_i + \alpha\, x_i$$
where $\alpha$ is a constant averaging factor, generally set to 0.05, and $x_i$ is a new sample of class $i$ in incremental learning.
3. The automatic segmentation method for form images according to claim 1, characterized in that: in the region analysis that combines the data-driven and knowledge-driven information, if the overlap between the position of a handwritten text region given by the data-driven information and the position of the handwritten text region given by the knowledge-driven information exceeds 50%, the handwritten text region obtained from the data-driven information is used as the final segmentation region; text regions of other types are located for segmentation using the knowledge-driven information from the form template library.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310557566.XA CN104636117A (en) | 2013-11-12 | 2013-11-12 | Automatic segmentation method of form image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104636117A true CN104636117A (en) | 2015-05-20 |
Family ID: 53214923
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310557566.XA Pending CN104636117A (en) | 2013-11-12 | 2013-11-12 | Automatic segmentation method of form image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104636117A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060007164A1 (en) * | 2002-04-22 | 2006-01-12 | Yingjian Liu | Wireless and passive tablet for inputting to computer |
CN101366020A (en) * | 2005-12-21 | 2009-02-11 | 微软公司 | Table detection in ink notes |
CN101447017A (en) * | 2008-11-27 | 2009-06-03 | 浙江工业大学 | Method and system for quickly identifying and counting votes on the basis of layout analysis |
US7920299B2 (en) * | 2005-03-14 | 2011-04-05 | Gtech Rhode Island Corporation | System and method for processing a form |
CN102750531A (en) * | 2012-06-05 | 2012-10-24 | 江苏尚博信息科技有限公司 | Method for detecting handwriting mark symbols for bill document positioning grids |
CN102903136A (en) * | 2012-09-28 | 2013-01-30 | 王平 | Method and system for electronizing handwriting |
- 2013-11-12: application CN201310557566.XA filed in CN (CN104636117A), status: Pending
Non-Patent Citations (1)
Title |
---|
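杨颖, 杨磊 (Yang Ying, Yang Lei): "自由手写体数字表格自动识别系统" (Automatic recognition system for forms with free handwritten digits), 《计算机工程与应用》 (Computer Engineering and Applications) * |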
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105184329A (en) * | 2015-08-27 | 2015-12-23 | 鲁东大学 | Cloud-platform-based off-line handwriting recognition method |
CN105373791A (en) * | 2015-11-12 | 2016-03-02 | 中国建设银行股份有限公司 | Information processing method and information processing device |
CN105373791B (en) * | 2015-11-12 | 2018-12-14 | 中国建设银行股份有限公司 | Information processing method and information processing unit |
CN105426856A (en) * | 2015-11-25 | 2016-03-23 | 成都数联铭品科技有限公司 | Image table character identification method |
CN106156761A (en) * | 2016-08-10 | 2016-11-23 | 北京交通大学 | The image form detection of facing moving terminal shooting and recognition methods |
CN106156761B (en) * | 2016-08-10 | 2020-01-10 | 北京交通大学 | Image table detection and identification method for mobile terminal shooting |
CN107688805A (en) * | 2017-07-25 | 2018-02-13 | 平安科技(深圳)有限公司 | The method, apparatus and relevant device positioned according to image file in single mode plate is recorded |
CN112308046A (en) * | 2020-12-02 | 2021-02-02 | 龙马智芯(珠海横琴)科技有限公司 | Method, device, server and readable storage medium for positioning text region of image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20150520 |
|
WD01 | Invention patent application deemed withdrawn after publication |