CN104636117A - Automatic segmentation method of form image - Google Patents

Info

Publication number
CN104636117A
Authority
CN
China
Prior art keywords
information
driving information
data
region
image
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310557566.XA
Other languages
Chinese (zh)
Inventor
殷绪成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JIANGSU ABEYOND OUTSOURCING CO Ltd
Original Assignee
JIANGSU ABEYOND OUTSOURCING CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-11-12
Filing date
2013-11-12
Publication date
2015-05-20
Application filed by JIANGSU ABEYOND OUTSOURCING CO Ltd
Priority to CN201310557566.XA
Publication of CN104636117A
Current legal status: Pending

Landscapes

  • Character Input (AREA)

Abstract

The invention discloses an automatic segmentation method for form images. The method records a form and its region information, automatically analyzes the form image, detects and locates the handwritten text regions in it, and finally locates the regions to be segmented. It comprises the following steps: (a) the regions of a known form that need to be segmented, recognized, or entered manually are calibrated in advance; through template customization, the form and its region information are stored in a form template library, yielding knowledge-driven information; (b) the scanned or photographed form image is automatically analyzed and its text regions are detected and located, yielding data-driven information; (c) the knowledge-driven and data-driven information are combined, the degree of overlap between them is compared, and the final segmentation regions are located. The method combines a knowledge-driven and data-driven technique for precisely locating image regions with an automatic, intelligent form-data processing system built on accurate automatic segmentation of form images.

Description

Automatic segmentation method for form images
Technical field
The present invention relates to the technical field of form image processing, and in particular to an automatic segmentation method for form images.
Background art
Traditionally, handwritten manuscripts are entered manually. Because handwriting is diverse and complex, this work is labor-intensive for staff and entry efficiency is very low, which causes considerable difficulty in practice. Researchers have therefore developed many application programs in the hope of fundamentally solving the problem of entering handwritten manuscripts quickly.
Chinese patent CN103020619A, "Method for automatically segmenting handwritten entries in an electronized notebook", proceeds as shown in Fig. 2: (1) photograph the paper page of the notebook to be electronized; (2) determine the four edge lines of the page image with a line-detection method, and rectify the page area bounded by these four edges into a square region; (3) determine the page type from the page image and retrieve the pre-stored blank segmentation template for that notebook type, which consists of a number of character blocks; (4) determine the character blocks in the square region that contain the user's handwriting, and automatically segment and extract the handwriting in each character block, in units of character blocks. In that invention, registration of the template against the handwritten text is only a simple test, which cannot achieve precise positioning; it also cannot effectively handle handwritten text regions that are mixed with tables.
Summary of the invention
The object of the invention is to address the above technical problems of the prior art by providing an automatic segmentation method for form images that combines a knowledge-driven and data-driven technique for locating image regions with an automatic, intelligent form-data processing system based on accurate automatic segmentation of form images, thereby effectively improving data-entry efficiency.
The present invention is achieved by the following technical solutions:
An automatic segmentation method for form images comprises the following steps: (1) obtain the physical form from the form documents; (2) scan or photograph the physical form to obtain a form image; (3) analyze and learn from the form image data to obtain data-driven information used for segmenting handwritten text regions; (4) customize the form template, storing the form and its region information in a form template library; (5) obtain from the form template library the knowledge-driven information used for region segmentation; (6) region analysis: combine the data-driven and knowledge-driven information to analyze and locate regions in the form image, obtaining region information such as the positions of the regions to be segmented; (7) region segmentation: use the region information to segment the form image and obtain the final output region images.
Further, the analysis of and learning from the form image data yields the data-driven information used for segmenting handwritten text regions, which includes the position and type of each region. The analysis and learning steps are as follows:
(A) First, binarize the form image. The system uses adaptive binarization that combines the Otsu method and the Niblack method: the output image is the logical AND of the images produced by the two methods. Let $p(x,y)$ be the value of pixel $(x,y)$ in the final binary image, $p_{\text{Otsu}}(x,y)$ the value obtained by the Otsu method, and $p_{\text{Niblack}}(x,y)$ the value obtained by the Niblack method; then
$$p(x,y) = p_{\text{Otsu}}(x,y) \land p_{\text{Niblack}}(x,y)$$
where $p(x,y)=1$ denotes a black point (foreground character) and $p(x,y)=0$ denotes a white point (background);
(B) Connected-component analysis yields the individual regions of the form image, which must then be classified. Handwriting is discriminated at a mixed level of granularity: the unit of processing is a merged block of several connected components. Because the characteristics of handwriting are uncertain, a Fisher linear discriminant (FLD) classifier based on incremental learning is used. The projection matrix (vector) of the classical FLD algorithm is
$$W = S_w^{-1}(m_1 - m_2)$$
where $S_w = C_1 + C_2$ is the within-class scatter matrix and $m_i$ is the mean vector of the samples of class $i$;
The sequential Karhunen-Loeve (SKL) algorithm is used to update $C_i$ incrementally. SKL estimates $C_i$ from $D_i$, formed by the $K$ largest eigenvalues, and $U_i$, formed by the corresponding eigenvectors:
$$C_i \approx U_i D_i U_i^{T}$$
where $D_i$ is a $K \times K$ diagonal matrix and $U_i$ is a matrix with $K$ columns;
Because the feature vectors used for handwriting discrimination have relatively low dimension, singular value decomposition (SVD) is applied directly to update $D_i$ and $U_i$ as new samples keep arriving;
In this incremental classifier, an adaptive filtering scheme is used to update $m_i$:
$$m_i^{\text{new}} = (1-\alpha)\, m_i + \alpha\, x_i$$
where $\alpha$ is a constant averaging factor, which can generally be set to 0.05, and $x_i$ is a new sample of class $i$ in incremental learning.
Further, in region analysis the data-driven and knowledge-driven information are combined: if the overlap between the position of a handwritten text region in the data-driven information and the position of the corresponding handwritten text region in the knowledge-driven information exceeds 50%, the handwritten text region obtained from the data-driven information is used as the final segmentation region; text regions of other types are located for segmentation using the knowledge-driven information from the form template library.
In summary, by adopting the above technical solution, the invention has the following beneficial effects:
(1) it combines knowledge-driven information (a form template library built by customizing templates according to business needs) and data-driven information (obtained by automatic analysis of and learning from form images) in a single technique for locating image regions;
(2) it provides an automatic, intelligent form-data processing (recognition or entry) system based on accurate automatic segmentation of form images;
(3) diverse and complex handwritten manuscripts can be effectively discriminated and entered, further improving entry efficiency.
Description of the drawings
Embodiments of the present invention are described below, by way of example, with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of the automatic segmentation method for form images according to the present invention;
Fig. 2 is a schematic flowchart of the method for automatically segmenting handwritten entries in an electronized notebook disclosed in patent CN103020619A.
Embodiment
All features disclosed in this specification, and all steps of any method or process disclosed, may be combined in any way, except for mutually exclusive features and/or steps.
Any feature disclosed in this specification (including any accompanying claims, abstract, and drawings) may, unless specifically stated otherwise, be replaced by other equivalent or alternative features serving a similar purpose. That is, unless specifically stated otherwise, each feature is only one example of a series of equivalent or similar features.
As shown in Fig. 1, the present invention proposes an automatic segmentation method for form images, comprising the following steps: (1) obtain the physical form from the form documents; (2) scan or photograph the physical form to obtain a form image; (3) analyze and learn from the form image data to obtain data-driven information used for segmenting handwritten text regions; (4) customize the form template, storing the form and its region information in a form template library; (5) obtain from the form template library the knowledge-driven information used for region segmentation; (6) region analysis: combine the data-driven and knowledge-driven information to analyze and locate regions in the form image, obtaining region information such as the positions of the regions to be segmented; (7) region segmentation: use the region information to segment the form image and obtain the final output region images.
Further, the analysis of and learning from the form image data yields the data-driven information used for segmenting handwritten text regions, which includes the position and type of each region. The analysis and learning steps are as follows:
(A) First, binarize the form image. Binarization methods fall into global and local methods; among global methods, the Otsu method performs well and stably, and among local methods, the Niblack method does. The system uses adaptive binarization that combines the Otsu method and the Niblack method: the output image is the logical AND of the images produced by the two methods. Let $p(x,y)$ be the value of pixel $(x,y)$ in the final binary image, $p_{\text{Otsu}}(x,y)$ the value obtained by the Otsu method, and $p_{\text{Niblack}}(x,y)$ the value obtained by the Niblack method; then
$$p(x,y) = p_{\text{Otsu}}(x,y) \land p_{\text{Niblack}}(x,y)$$
where $p(x,y)=1$ denotes a black point (foreground character) and $p(x,y)=0$ denotes a white point (background);
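The combined binarization of step (A) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the patent gives no Niblack parameters, so `window=15` and `k=-0.2` (common textbook defaults) are assumptions, as is the convention that ink is darker than paper. Images are plain lists of 0-255 grey levels indexed `gray[y][x]`, and all function names are hypothetical.

```python
def otsu_threshold(gray):
    """Global Otsu threshold for a 2-D list of 0-255 grey levels."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with level <= t (dark class)
    sum0 = 0  # sum of their grey levels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2  # unnormalized between-class variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def niblack_mask(gray, window=15, k=-0.2):
    """Local Niblack mask: 1 where pixel < local mean + k * local std."""
    h, w = len(gray), len(gray[0])
    r = window // 2
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [gray[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(vals) / len(vals)
            std = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
            mask[y][x] = 1 if gray[y][x] < mean + k * std else 0
    return mask


def binarize(gray, window=15, k=-0.2):
    """p(x, y) = p_Otsu(x, y) AND p_Niblack(x, y); 1 = ink, 0 = background."""
    t = otsu_threshold(gray)
    nib = niblack_mask(gray, window, k)
    return [[(1 if v <= t else 0) & nib[y][x] for x, v in enumerate(row)]
            for y, row in enumerate(gray)]
```

For example, `binarize([[220, 220, 30, 220, 220] for _ in range(5)], window=3)` marks only the dark middle column as ink. Taking the AND keeps a pixel only when both the global and the local threshold call it ink, which is intended to suppress the spurious foreground Niblack can produce in flat background areas while still adapting to uneven illumination.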
(B) Connected-component analysis yields the individual regions of the form image, which must then be classified. Handwriting is discriminated at a mixed level of granularity: the unit of processing is a merged block of several connected components, and such an image block may be a text line, several characters, one or two characters, or a group of several text lines. Because of this, and because the characteristics of handwriting are uncertain, a Fisher linear discriminant (FLD) classifier based on incremental learning is used. The projection matrix (vector) of the classical FLD algorithm is
$$W = S_w^{-1}(m_1 - m_2)$$
where $S_w = C_1 + C_2$ is the within-class scatter matrix and $m_i$ is the mean vector of the samples of class $i$;
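The classical FLD direction can be illustrated with a small pure-Python sketch: it computes $W = S_w^{-1}(m_1 - m_2)$ for two classes of feature vectors and classifies a new vector by projecting it onto $W$ and comparing with the midpoint of the projected class means. Assumptions: the patent does not state which features describe a merged block or whether $C_i$ is normalized by the sample count, so the per-class covariance normalization, the small ridge term `reg`, the midpoint threshold and the two-dimensional toy data are illustrative choices; all function names are hypothetical.

```python
def mean_vector(samples):
    n = len(samples)
    return [sum(s[d] for s in samples) / n for d in range(len(samples[0]))]


def covariance(samples, m):
    """C_i as a per-class covariance (normalized by sample count: an assumption)."""
    dim, n = len(m), len(samples)
    c = [[0.0] * dim for _ in range(dim)]
    for s in samples:
        d = [s[i] - m[i] for i in range(dim)]
        for i in range(dim):
            for j in range(dim):
                c[i][j] += d[i] * d[j] / n
    return c


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_fld(class1, class2, reg=1e-6):
    """Return (W, threshold) with W = S_w^{-1} (m_1 - m_2) and S_w = C_1 + C_2."""
    m1, m2 = mean_vector(class1), mean_vector(class2)
    c1, c2 = covariance(class1, m1), covariance(class2, m2)
    dim = len(m1)
    # A tiny ridge term keeps S_w invertible for degenerate toy data.
    sw = [[c1[i][j] + c2[i][j] + (reg if i == j else 0.0) for j in range(dim)]
          for i in range(dim)]
    w = solve(sw, [m1[i] - m2[i] for i in range(dim)])
    threshold = (dot(w, m1) + dot(w, m2)) / 2  # midpoint of projected means
    return w, threshold


def classify(x, w, threshold):
    """Class 1 projects above the threshold, class 2 below."""
    return 1 if dot(x, w) > threshold else 2
```

Class 1 always projects higher because $W^T(m_1 - m_2) = (m_1 - m_2)^T S_w^{-1}(m_1 - m_2) > 0$ whenever $S_w$ is positive definite and the means differ.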
The sequential Karhunen-Loeve (SKL) algorithm is used to update $C_i$ incrementally. SKL estimates $C_i$ from $D_i$, formed by the $K$ largest eigenvalues, and $U_i$, formed by the corresponding eigenvectors:
$$C_i \approx U_i D_i U_i^{T}$$
where $D_i$ is a $K \times K$ diagonal matrix and $U_i$ is a matrix with $K$ columns;
Because the feature vectors used for handwriting discrimination have relatively low dimension, singular value decomposition (SVD) is applied directly to update $D_i$ and $U_i$ as new samples keep arriving;
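Because the feature dimension is small, the low-rank form $C_i \approx U_i D_i U_i^T$ can simply be recomputed from the full matrix. The sketch below does this with the cyclic Jacobi eigenvalue method for a symmetric matrix (for a symmetric positive semidefinite $C_i$ the eigendecomposition coincides with the SVD), keeps the $K$ largest eigenvalues as the diagonal of $D_i$ and their eigenvectors as the columns of $U_i$, and rebuilds the approximation. It illustrates only this truncation step, not the sequential Karhunen-Loeve algorithm itself; the function names, tolerance and sweep limit are assumptions.

```python
import math


def jacobi_eigen(a, tol=1e-12, max_sweeps=100):
    """Eigenpairs of a symmetric matrix; column i of v belongs to values[i]."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def truncate(c, k):
    """Keep the K largest eigenpairs: C ~= U D U^T with U (n x K), D diagonal."""
    values, vecs = jacobi_eigen(c)
    n = len(c)
    top = sorted(range(n), key=lambda i: values[i], reverse=True)[:k]
    d = [values[i] for i in top]                       # diagonal of D
    u = [[vecs[r][i] for i in top] for r in range(n)]  # eigenvectors as columns
    approx = [[sum(u[r][j] * d[j] * u[s][j] for j in range(k)) for s in range(n)]
              for r in range(n)]
    return u, d, approx
```

For instance, truncating `[[2, 1, 0], [1, 2, 0], [0, 0, 0.1]]` to $K=2$ keeps the eigenvalues 3 and 1 and drops the small 0.1 direction.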
In this incremental classifier, an adaptive filtering scheme is used to update $m_i$:
$$m_i^{\text{new}} = (1-\alpha)\, m_i + \alpha\, x_i$$
where $\alpha$ is a constant averaging factor, which can generally be set to 0.05, and $x_i$ is a new sample of class $i$ in incremental learning.
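The adaptive mean update is an exponential moving average: each new sample pulls the class mean a fraction $\alpha$ of the way toward it, so the weight of older samples decays by a factor $(1-\alpha)$ per update. A minimal sketch using the default $\alpha = 0.05$ quoted above; the function name is hypothetical.

```python
def update_mean(mean, sample, alpha=0.05):
    """Return (1 - alpha) * mean + alpha * sample, element by element."""
    if len(mean) != len(sample):
        raise ValueError("mean and sample must have the same length")
    return [(1.0 - alpha) * m + alpha * x for m, x in zip(mean, sample)]


# Usage: stream new class-i samples into the running mean.
m = [0.0, 0.0]
for x in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
    m = update_mean(m, x)
```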
Further, in region analysis the data-driven and knowledge-driven information are combined: if the overlap between the position of a handwritten text region in the data-driven information and the position of the corresponding handwritten text region in the knowledge-driven information is high (the threshold can generally be set to 50%), the handwritten text region obtained from the data-driven information is used as the final segmentation region; text regions of other types are located for segmentation using the knowledge-driven information from the form template library.
In operation, according to the needs of the business or user, the regions of the form (which may be in Chinese, Japanese, or other languages) that need to be segmented, recognized, or entered manually are calibrated in advance. Through template customization, the form and its region information are stored in the form template library. The knowledge-based region information provided by the library mainly comprises the position of each region and its type (handwritten region, printed region, picture region, etc.).
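One possible in-memory layout for the form template library is sketched below: each template records, for every calibrated region, its position and its type, which is the knowledge-driven information described above. The class and field names, the bounding-box representation and the JSON round trip are illustrative assumptions; the patent does not specify a storage format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TemplateRegion:
    """One calibrated region of a form template (hypothetical schema)."""
    name: str
    x0: int
    y0: int
    x1: int
    y1: int
    kind: str  # "handwritten", "printed", "picture", ...


@dataclass
class FormTemplate:
    form_id: str
    regions: list = field(default_factory=list)  # list of TemplateRegion


class TemplateLibrary:
    """Form template library keyed by form_id."""

    def __init__(self):
        self._templates = {}

    def register(self, template):
        self._templates[template.form_id] = template

    def knowledge_regions(self, form_id, kind=None):
        """Calibrated regions of one form, optionally filtered by region type."""
        return [r for r in self._templates[form_id].regions
                if kind is None or r.kind == kind]

    def to_json(self):
        return json.dumps({fid: [asdict(r) for r in t.regions]
                           for fid, t in self._templates.items()})

    @classmethod
    def from_json(cls, text):
        lib = cls()
        for fid, regions in json.loads(text).items():
            lib.register(FormTemplate(fid, [TemplateRegion(**r) for r in regions]))
        return lib
```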
Automatic analysis of and learning from the image data mainly consists of automatically analyzing the form image and detecting and locating its handwritten text regions. In general, the handwritten text in a form is the most important information and must be located and segmented for subsequent recognition or manual entry. However, much handwritten text is not written entirely within the region that the form designer provided for it and often extends beyond that region, so handwritten text regions must be located by automatic analysis of and learning from the image data. In this way, data-driven information about handwritten text regions is obtained, mainly the position and type (handwritten text region) of each region. Before this, the image must be binarized, its ruling lines filtered out, and noise removed; the regions are then identified and located.
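Region identification on the binary image typically starts from connected components, which step (B) merges into blocks before classification. A minimal sketch, assuming 8-connectivity, inclusive `(x0, y0, x1, y1)` bounding boxes, and a hypothetical `gap` parameter (in pixels) that decides when nearby components belong to the same block; line filtering and denoising are not shown, and the function names are hypothetical.

```python
from collections import deque


def connected_components(binary):
    """Inclusive bounding boxes (x0, y0, x1, y1) of 8-connected ink regions."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


def merge_blocks(boxes, gap=2):
    """Repeatedly merge boxes whose horizontal and vertical gaps are both <= gap."""
    boxes = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                a, b = boxes[i], boxes[j]
                if (b[0] - a[2] <= gap and a[0] - b[2] <= gap and
                        b[1] - a[3] <= gap and a[1] - b[3] <= gap):
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    changed = True
                    break
            if changed:
                break
    return boxes
```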
Region analysis compares the knowledge-based region information with the positions of the handwritten text regions in the data-driven information. If the degree of agreement is greater than 50%, the handwritten text region obtained from the data-driven information is used as the final segmentation region; if it is less than 50%, the regions to be segmented are located mainly according to the knowledge-driven information in the form template library.
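The 50% fusion rule can be sketched as follows. The patent does not define how the degree of agreement is measured, so intersection over union is an assumption here (intersection over the template area would be an equally plausible reading), as is treating exactly 50% as disagreement. Boxes are half-open `[x0, x1) x [y0, y1)`, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Axis-aligned box [x0, x1) x [y0, y1) with a region type."""
    x0: int
    y0: int
    x1: int
    y1: int
    kind: str  # e.g. "handwritten", "printed", "picture"


def area(r):
    return max(0, r.x1 - r.x0) * max(0, r.y1 - r.y0)


def iou(a, b):
    """Intersection over union: one possible measure of 'degree of agreement'."""
    ix = max(0, min(a.x1, b.x1) - max(a.x0, b.x0))
    iy = max(0, min(a.y1, b.y1) - max(a.y0, b.y0))
    inter = ix * iy
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0


def fuse(template_regions, detected_handwriting, threshold=0.5):
    """Final segmentation regions from knowledge-driven and data-driven boxes."""
    final = []
    for t in template_regions:
        if t.kind != "handwritten":
            final.append(t)  # other region types: use the template library
            continue
        best = max(detected_handwriting, key=lambda d: iou(t, d), default=None)
        if best is not None and iou(t, best) > threshold:
            final.append(best)  # agreement above 50%: use the data-driven box
        else:
            final.append(t)     # otherwise fall back to the template box
    return final
```

Keeping the data-driven box when it agrees with the template lets handwriting that spills slightly past the designed field be segmented whole, which is the motivation given in the preceding paragraph.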
The present invention is not limited to the foregoing embodiments. It extends to any new feature or any new combination of features disclosed in this specification, and to the steps of any new method or process disclosed, or any new combination thereof.

Claims (3)

1. An automatic segmentation method for form images, characterized by comprising the following steps:
(1) obtaining the physical form from form documents;
(2) scanning or photographing the physical form to obtain a form image;
(3) analyzing and learning from the form image data to obtain data-driven information used for segmenting handwritten text regions;
(4) customizing a form template and storing the form and its region information in a form template library;
(5) obtaining, from the form template library, knowledge-driven information used for region segmentation;
(6) region analysis: combining the data-driven and knowledge-driven information to analyze and locate regions in the form image, thereby obtaining region information;
(7) region segmentation: segmenting the form image using the region information to obtain the final output region images.
2. The automatic segmentation method for form images according to claim 1, characterized in that: the form image data are analyzed and learned to obtain the data-driven information used for segmenting handwritten text regions, which includes the position and type of each region; the analysis and learning are carried out as follows:
(A) first, the form image is binarized; the system uses adaptive binarization combining the Otsu method and the Niblack method, the output image being the logical AND of the images produced by the two methods; let $p(x,y)$ be the value of pixel $(x,y)$ in the final binary image, $p_{\text{Otsu}}(x,y)$ the value obtained by the Otsu method, and $p_{\text{Niblack}}(x,y)$ the value obtained by the Niblack method; then
$$p(x,y) = p_{\text{Otsu}}(x,y) \land p_{\text{Niblack}}(x,y)$$
where $p(x,y)=1$ denotes a black point (foreground character) and $p(x,y)=0$ denotes a white point (background);
(B) in addition, the individual regions of the form image are obtained by connected-component analysis and must then be classified; handwriting is discriminated at a mixed level of granularity, i.e. the unit of processing is a merged block of several connected components; because the characteristics of handwriting are uncertain, a Fisher linear discriminant (FLD) classifier based on incremental learning is used; the projection matrix (vector) of the classical FLD algorithm is
$$W = S_w^{-1}(m_1 - m_2)$$
where $S_w = C_1 + C_2$ is the within-class scatter matrix and $m_i$ is the mean vector of the samples of class $i$;
the sequential Karhunen-Loeve (SKL) algorithm is used to update $C_i$ incrementally; SKL estimates $C_i$ from $D_i$, formed by the $K$ largest eigenvalues, and $U_i$, formed by the corresponding eigenvectors:
$$C_i \approx U_i D_i U_i^{T}$$
where $D_i$ is a $K \times K$ diagonal matrix and $U_i$ is a matrix with $K$ columns;
because the feature vectors used for handwriting discrimination have relatively low dimension, singular value decomposition (SVD) is applied directly to update $D_i$ and $U_i$ as new samples keep arriving;
in this incremental classifier, an adaptive filtering scheme is used to update $m_i$:
$$m_i^{\text{new}} = (1-\alpha)\, m_i + \alpha\, x_i$$
where $\alpha$ is a constant averaging factor, which can generally be set to 0.05, and $x_i$ is a new sample of class $i$ in incremental learning.
3. The automatic segmentation method for form images according to claim 1, characterized in that: in region analysis, the data-driven and knowledge-driven information are combined; if the overlap between the position of a handwritten text region in the data-driven information and the position of the corresponding handwritten text region in the knowledge-driven information is greater than 50%, the handwritten text region obtained from the data-driven information is used as the final segmentation region, while text regions of other types are located for segmentation using the knowledge-driven information from the form template library.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310557566.XA CN104636117A (en) 2013-11-12 2013-11-12 Automatic segmentation method of form image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310557566.XA CN104636117A (en) 2013-11-12 2013-11-12 Automatic segmentation method of form image

Publications (1)

Publication Number Publication Date
CN104636117A true CN104636117A (en) 2015-05-20

Family

ID=53214923

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310557566.XA Pending CN104636117A (en) 2013-11-12 2013-11-12 Automatic segmentation method of form image

Country Status (1)

Country Link
CN (1) CN104636117A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060007164A1 (en) * 2002-04-22 2006-01-12 Yingjian Liu Wireless and passive tablet for inputting to computer
CN101366020A (en) * 2005-12-21 2009-02-11 微软公司 Table detection in ink notes
CN101447017A (en) * 2008-11-27 2009-06-03 浙江工业大学 Method and system for quickly identifying and counting votes on the basis of layout analysis
US7920299B2 (en) * 2005-03-14 2011-04-05 Gtech Rhode Island Corporation System and method for processing a form
CN102750531A (en) * 2012-06-05 2012-10-24 江苏尚博信息科技有限公司 Method for detecting handwriting mark symbols for bill document positioning grids
CN102903136A (en) * 2012-09-28 2013-01-30 王平 Method and system for electronizing handwriting

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
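杨颖, 杨磊 (Yang Ying, Yang Lei): "自由手写体数字表格自动识别系统" (An automatic recognition system for forms of freely handwritten digits), 《计算机工程与应用》 (Computer Engineering and Applications) *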

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184329A (en) * 2015-08-27 2015-12-23 鲁东大学 Cloud-platform-based off-line handwriting recognition method
CN105373791A (en) * 2015-11-12 2016-03-02 中国建设银行股份有限公司 Information processing method and information processing device
CN105373791B (en) * 2015-11-12 2018-12-14 中国建设银行股份有限公司 Information processing method and information processing unit
CN105426856A (en) * 2015-11-25 2016-03-23 成都数联铭品科技有限公司 Image table character identification method
CN106156761A (en) * 2016-08-10 2016-11-23 北京交通大学 The image form detection of facing moving terminal shooting and recognition methods
CN106156761B (en) * 2016-08-10 2020-01-10 北京交通大学 Image table detection and identification method for mobile terminal shooting
CN107688805A (en) * 2017-07-25 2018-02-13 平安科技(深圳)有限公司 The method, apparatus and relevant device positioned according to image file in single mode plate is recorded
CN112308046A (en) * 2020-12-02 2021-02-02 龙马智芯(珠海横琴)科技有限公司 Method, device, server and readable storage medium for positioning text region of image

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150520