CN106203397B - Table identification and localization method based on image table analysis technology - Google Patents

Table identification and localization method based on image table analysis technology

Info

Publication number
CN106203397B
Authority
CN
China
Prior art keywords
enclosing frame
enclosing
threshold value
maximum
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610593119.3A
Other languages
Chinese (zh)
Other versions
CN106203397A (en
Inventor
于志文
车少帅
邵婷
邵一婷
胡笳
吴洲洋
周玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CLP Hongxin Information Technology Co., Ltd
Original Assignee
JIANGSU HONGXIN SYSTEM INTEGRATION CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JIANGSU HONGXIN SYSTEM INTEGRATION CO Ltd
Priority to CN201610593119.3A
Publication of CN106203397A
Application granted
Publication of CN106203397B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/225 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/247 Aligning, centring, orientation detection or correction of the image by affine transforms, e.g. correction due to perspective effects; Quadrilaterals, e.g. trapezoids

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a table identification and localization method based on image table analysis technology, comprising: selecting, from all enclosing frames of an image, the enclosing frame with the largest area whose length lies between the minimum and maximum length thresholds and whose width lies between the minimum and maximum width thresholds; correcting the skew of the image; extracting all outer enclosing frames whose length lies between the minimum and maximum length thresholds and whose width lies between the minimum and maximum width thresholds; extracting the enclosing frames whose ratio of frame area to the area of their own bounding rectangle lies between the minimum and maximum area-ratio thresholds; extracting the outer enclosing frames whose number of horizontal line segments lies between the minimum and maximum thresholds for horizontal line segments and whose number of vertical line segments lies between the minimum and maximum thresholds for vertical line segments; and locating the table by finding the enclosing frames inside it. The method identifies and locates tables with high accuracy.

Description

Table identification and localization method based on image table analysis technology
Technical field
The invention relates to the field of table analysis in image processing, and in particular to a table identification and localization method based on image table analysis technology.
Background
Paper documents are a common way of presenting information and offer good stability and security. As information technology develops, however, their drawbacks, namely that the information they hold is hard to manage and analyze, have become increasingly prominent. Digitizing paper-document information with image processing techniques has become an inevitable trend.
At present, the main document digitization approach in China and abroad is to scan a paper document into an image containing various kinds of information and then extract that information with digital image techniques. Extracting table data is a critical step in this process: if a table is misidentified or its interior is located inaccurately, the table structure information is lost and the OCR recognition results become wrong as well.
Conventional table identification finds the straight lines in an image, corrects the skew according to those lines, and treats a region of the corrected image as a table if its horizontal and vertical lines match table features. On the one hand, this correction is prone to inaccuracy; on the other hand, some images that merely resemble tables are falsely detected, so the false detection rate is high. Conventional table localization locates table cells from straight-line information, and broken lines make the localization inaccurate.
Summary of the invention
The technical problem to be solved by the invention is to provide, in view of the above shortcomings of the prior art, a table identification and localization method based on image table analysis technology. The method decides whether an outer enclosing frame is a table by finding suspected table regions and then looking inside them for horizontal and vertical lines whose counts meet the requirements; this eliminates non-table images and gives higher identification accuracy. For localization, the method finds each enclosing frame inside the table: each frame forms a contour, the contours are sorted by position, and the table is thereby located, which lays the foundation for later reconstruction of the table data and makes table localization accurate.
To achieve the above technical purpose, the invention adopts the following technical solution:
A table identification and localization method based on image table analysis technology, comprising the following steps:
(1) Scan table-like samples into an image and extract all enclosing frames in the image;
(2) Set the minimum length threshold, maximum length threshold, minimum width threshold and maximum width threshold of the enclosing frame, and set the maximum and minimum area-ratio thresholds;
(3) From all enclosing frames of the image, select the enclosing frame with the largest area whose length lies between the minimum and maximum length thresholds and whose width lies between the minimum and maximum width thresholds;
(4) Correct the skew of the image using the enclosing frame obtained in step (3);
(5) Extract all outer enclosing frames from the skew-corrected image, keep those whose length lies between the minimum and maximum length thresholds and whose width lies between the minimum and maximum width thresholds, and mark each kept outer enclosing frame as a suspected table region;
(6) Search for enclosing frames inside each suspected table region obtained in step (5), and extract all enclosing frames whose ratio of frame area to the area of their own bounding rectangle lies between the minimum and maximum area-ratio thresholds;
(7) Set the minimum and maximum thresholds for the number of horizontal line segments contained in a suspected table region, and the minimum and maximum thresholds for the number of vertical line segments contained in a suspected table region; use Hough-transform line detection to count the horizontal and vertical line segments contained in all enclosing frames of the suspected table region obtained in step (6); extract the suspected table regions whose horizontal line segment count lies between the minimum and maximum thresholds for horizontal line segments and whose vertical line segment count lies between the minimum and maximum thresholds for vertical line segments, and mark them as tables;
(8) Locate each table obtained in step (7) by successively finding each enclosing frame inside the table.
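The area-ratio test of step (6) can be illustrated with a minimal sketch. It is not the patent's implementation: it assumes each frame is available as a closed polygon of (x, y) vertices, computes the polygon area with the shoelace formula, and uses an axis-aligned bounding rectangle. The function names, the inclusive bounds and the geometric (not pixel-count) rectangle area are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(points: Sequence[Point]) -> float:
    """Area of a simple closed polygon (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bounding_rect_area(points: Sequence[Point]) -> float:
    """Area of the axis-aligned rectangle that bounds the polygon."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def filter_by_area_ratio(
    contours: Sequence[Sequence[Point]],
    ratio_min: float,
    ratio_max: float,
) -> List[Sequence[Point]]:
    """Keep contours whose area / bounding-rectangle area is within bounds.

    Rectangular cells give a ratio near 1; text strokes and noise blobs
    fill much less of their bounding rectangle and are dropped.
    Inclusive bounds are an assumption of this sketch.
    """
    kept = []
    for contour in contours:
        rect_area = bounding_rect_area(contour)
        if rect_area == 0:
            continue  # degenerate contour (a point or a straight line)
        ratio = polygon_area(contour) / rect_area
        if ratio_min <= ratio <= ratio_max:
            kept.append(contour)
    return kept
```

For example, with bounds 0.8 and 1.0 a rectangular cell (ratio 1.0) is kept, while a right triangle (ratio 0.5) is discarded.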
As a further improvement of the invention, selecting, from all enclosing frames of the image, the enclosing frame with the largest area whose length lies between the minimum and maximum length thresholds and whose width lies between the minimum and maximum width thresholds comprises:
Select the enclosing frame with the largest area from all enclosing frames of the image. Compare its length with the minimum and maximum length thresholds of the enclosing frame, and its width with the minimum and maximum width thresholds. If the length of the largest-area frame is less than the minimum length threshold, or its width is less than the minimum width threshold, mark the image as a non-table image and reject it; otherwise mark it as a table image to be detected;
If, in the table image to be detected, the length of the largest-area frame exceeds the maximum length threshold or its width exceeds the maximum width threshold, select the frame with the second-largest area; if that frame's length exceeds the maximum length threshold or its width exceeds the maximum width threshold, select the frame with the third-largest area; and so on, until a frame is selected whose length lies between the minimum and maximum length thresholds and whose width lies between the minimum and maximum width thresholds. The selected frame is thus the largest-area frame among all enclosing frames whose length and width fall within the thresholds.
As a further improvement of the invention, correcting the skew of the image using the enclosing frame obtained in step (3) comprises:
Detect all line segments within the enclosing frame obtained in step (3) by Hough-transform line detection, compute the angle between each line segment and the horizontal direction, and choose the smallest angle. Take the smallest angle as the rotation angle of the table image to be detected and rotate the image accordingly, which completes the skew correction of the table image to be detected.
As a further improvement of the invention, locating the table obtained in step (7) by successively finding each enclosing frame inside the table comprises:
Starting from the top-left vertex of the table obtained in step (7), successively find the enclosing frames whose height is close to that of the top-left vertex and sort them by their order of position;
After the enclosing frames of the first row are sorted, start from the highest vertex along the bottom of the first-row frames, successively find the enclosing frames whose height is close to that vertex, and sort them in turn;
After the enclosing frames of the second row are sorted, find and sort the third-row frames in the same way as the second row, and continue until the enclosing frames at the bottom of the table have been found, giving a table whose enclosing frames are all sorted;
Locate the specific table position from the sorting direction of the enclosing frames in the sorted table and the coordinates along that sorting direction, which completes the table localization.
The invention selects, from all enclosing frames of the image, the largest-area enclosing frame whose length lies between the minimum and maximum length thresholds and whose width lies between the minimum and maximum width thresholds; if no such frame can be selected, the image is identified and marked as a non-table image and rejected. From all outer enclosing frames of the image, it extracts those whose length and width lie within the same thresholds and marks them as suspected table regions; outer enclosing frames that are not extracted are identified as non-tables. Within each extracted suspected table region, it then extracts all enclosing frames whose ratio of frame area to the area of their own bounding rectangle lies between the minimum and maximum area-ratio thresholds, which removes interference from text and noise. Finally, it extracts the suspected table regions whose horizontal line segment count lies between the minimum and maximum thresholds for horizontal line segments and whose vertical line segment count lies between the minimum and maximum thresholds for vertical line segments, and marks them as tables; regions that are not extracted are identified as non-tables. By excluding non-tables step by step in this way, the method ends up extracting only the outer-enclosing-frame regions that belong to tables, so tables are identified with higher accuracy. The invention also finds each enclosing frame inside the table: each frame forms a contour, the frames (that is, the contours) are sorted by position, and the table is thereby located, which lays the foundation for later reconstruction of the table data and gives higher localization accuracy.
Brief description of the drawings
Fig. 1 is the workflow diagram of the invention.
Detailed description of the embodiments
The embodiments of the invention are further described below with reference to Fig. 1:
Referring to Fig. 1, the table identification and localization method based on image table analysis technology comprises the following steps:
(1) Scan various table-like samples into an image with a scanner or similar device, and extract all enclosing frames in the image;
(2) Set the minimum length threshold L1, maximum length threshold L2, minimum width threshold W1 and maximum width threshold W2 of the enclosing frame, and set the maximum area-ratio threshold S1 and minimum area-ratio threshold S2;
(3) From all enclosing frames of the image, select the enclosing frame with the largest area whose length lies between the minimum length threshold L1 and the maximum length threshold L2 and whose width lies between the minimum width threshold W1 and the maximum width threshold W2. If no enclosing frame meeting these requirements can be selected from the image, mark the image as a non-table image and reject it;
(4) Correct the skew of the image using the enclosing frame obtained in step (3);
(5) Extract all outer enclosing frames from the skew-corrected image with the findContours contour-finding function in OpenCV. Compare the length of each outer enclosing frame with the minimum length threshold L1 and the maximum length threshold L2, and its width with the minimum width threshold W1 and the maximum width threshold W2. Extract all outer enclosing frames whose length lies between L1 and L2 and whose width lies between W1 and W2, and mark each of them as a suspected table region; let the number of suspected table regions be N. Outer enclosing frames that do not meet the extraction conditions are marked as non-tables and rejected;
(6) Search for enclosing frames inside one of the suspected table regions. For each enclosing frame inside the region, compute the ratio of its area to the area of its own bounding rectangle and compare the ratio with the maximum area-ratio threshold S1 and the minimum area-ratio threshold S2. Extract all enclosing frames whose ratio lies between S2 and S1, which removes interference from text and noise in the image;
(7) Set the minimum threshold H1 and maximum threshold H2 for the number of horizontal line segments contained in a suspected table region, and the minimum threshold H3 and maximum threshold H4 for the number of vertical line segments contained in a suspected table region. Use Hough-transform line detection to count the horizontal and vertical line segments contained in all enclosing frames of the suspected table region obtained in step (6). Compare the horizontal line segment count with H1 and H2, and the vertical line segment count with H3 and H4. If the horizontal count lies between H1 and H2 and the vertical count lies between H3 and H4, mark the suspected table region as a table and perform step (8); otherwise mark it as a non-table, reject it, and return to step (6);
(8) Locate the table obtained in step (7) by successively finding each enclosing frame inside the table. After localization, return to step (6), until steps (6), (7) and (8) have been performed on all N suspected table regions in the image and every table in the image has been located.
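The line-count decision of step (7) can be sketched as follows. This is an illustrative reading, not the patent's code: it assumes the line segments have already been detected (for instance by a probabilistic Hough transform) and are given as (x1, y1, x2, y2) tuples. The angular tolerance `tol_deg` used to call a segment horizontal or vertical is a hypothetical parameter that the patent does not specify, and the inclusive comparison with H1 to H4 is likewise an assumption.

```python
import math
from typing import Iterable, Tuple

Segment = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def count_lines(segments: Iterable[Segment], tol_deg: float = 5.0) -> Tuple[int, int]:
    """Count near-horizontal and near-vertical segments.

    tol_deg is an assumed tolerance; segments that are neither close to
    horizontal nor close to vertical are ignored.
    """
    horizontal = 0
    vertical = 0
    for x1, y1, x2, y2 in segments:
        # Fold the direction into [0, 180) so segment orientation is ignored.
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0
        if angle <= tol_deg or angle >= 180.0 - tol_deg:
            horizontal += 1
        elif abs(angle - 90.0) <= tol_deg:
            vertical += 1
    return horizontal, vertical


def is_table(
    segments: Iterable[Segment],
    h1: int, h2: int, h3: int, h4: int,
    tol_deg: float = 5.0,
) -> bool:
    """True if horizontal count is in [h1, h2] and vertical count is in [h3, h4]."""
    horizontal, vertical = count_lines(segments, tol_deg)
    return h1 <= horizontal <= h2 and h3 <= vertical <= h4
```

A region would be passed on to step (8) only when `is_table` returns True; otherwise it is rejected and the next suspected table region is examined.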
Further, selecting, from all enclosing frames of the image, the enclosing frame with the largest area whose length lies between the minimum length threshold L1 and the maximum length threshold L2 and whose width lies between the minimum width threshold W1 and the maximum width threshold W2 comprises:
Compute the area of every enclosing frame of the image and select the frame with the largest area. Compare its length with the minimum length threshold L1 and the maximum length threshold L2, and its width with the minimum width threshold W1 and the maximum width threshold W2. If the length of the largest-area frame is less than L1, or its width is less than W1, mark the image as a non-table image and reject it; otherwise mark it as a table image to be detected;
If, in the table image to be detected, the length of the largest-area frame exceeds the maximum length threshold L2 or its width exceeds the maximum width threshold W2, select the frame with the second-largest area and compare its length with L1 and L2 and its width with W1 and W2. If the length of the second-largest frame exceeds L2 or its width exceeds W2, select the frame with the third-largest area, and continue comparing in this way until a frame is selected whose length lies between L1 and L2 and whose width lies between W1 and W2. The selected frame is thus the largest-area frame among all enclosing frames whose length lies between L1 and L2 and whose width lies between W1 and W2. If no enclosing frame meeting the requirements can be selected from the table image to be detected, mark it as a non-table image and reject it.
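The selection procedure just described can be sketched in a few lines. This is a minimal illustration under stated assumptions: frames are given as hypothetical (x, y, length, width) tuples, "length" is taken to be the horizontal extent and "width" the vertical extent, and threshold comparisons are inclusive. None of these details are fixed by the text.

```python
from typing import List, Optional, Tuple

Frame = Tuple[int, int, int, int]  # (x, y, length, width), an assumed layout


def select_reference_frame(
    frames: List[Frame], l1: int, l2: int, w1: int, w2: int
) -> Optional[Frame]:
    """Return the largest-area frame within the size thresholds, or None.

    None means the image is treated as a non-table image.
    """
    if not frames:
        return None
    # Largest area first, so the first frame that fits is the answer.
    ordered = sorted(frames, key=lambda f: f[2] * f[3], reverse=True)
    largest = ordered[0]
    # Early rejection: even the largest frame is too small to be a table.
    if largest[2] < l1 or largest[3] < w1:
        return None
    # Otherwise walk down by area until one frame satisfies both ranges.
    for frame in ordered:
        _, _, length, width = frame
        if l1 <= length <= l2 and w1 <= width <= w2:
            return frame
    return None
```

Sorting once by area and scanning is equivalent to repeatedly picking the next-largest frame, and it makes the "no frame found" case an explicit `None`.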
Further, correcting the skew of the image using the enclosing frame obtained in step (3) comprises:
Detect all line segments within the enclosing frame obtained in step (3) by Hough-transform line detection. Take the top-left vertex of the enclosing frame as the origin, the horizontal rightward direction of the frame as the positive X axis, and the vertical downward direction of the frame as the positive Y axis, and compute the angle (0 to 180 degrees) between each line segment and the positive X axis. If an angle is greater than 90 degrees, replace it with 180 degrees minus that angle. Choose the smallest resulting angle and use it as the rotation angle of the table image to be detected. If the line segment that gives this angle makes an angle of more than 90 degrees with the positive X axis, rotate counterclockwise; otherwise rotate clockwise. This completes the skew correction of the table image to be detected.
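The angle rule above can be written down as a short sketch. It assumes the segments are already available as (x1, y1, x2, y2) tuples in the frame's coordinate system (Y pointing down) and applies the stated rule literally: fold angles above 90 degrees to 180 minus the angle, take the minimum, and pick the direction from the unfolded angle. How the 0 to 180 degree angle is measured in a Y-down frame is an assumption here (`atan2` on the raw coordinate differences), and the function name is hypothetical.

```python
import math
from typing import Iterable, Optional, Tuple

Segment = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def skew_correction(segments: Iterable[Segment]) -> Optional[Tuple[float, str]]:
    """Return (rotation angle in degrees, direction), or None if no segments.

    The direction follows the rule in the text: counterclockwise when the
    chosen segment's angle with the positive X axis exceeds 90 degrees,
    clockwise otherwise.
    """
    best: Optional[Tuple[float, str]] = None
    for x1, y1, x2, y2 in segments:
        # Angle with the positive X axis, folded into [0, 180).
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0
        if angle > 90.0:
            folded, direction = 180.0 - angle, "counterclockwise"
        else:
            folded, direction = angle, "clockwise"
        if best is None or folded < best[0]:
            best = (folded, direction)
    return best
```

The returned angle and direction would then be handed to whatever image rotation routine is in use; that call is deliberately left out because the text does not name one.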
Further, locating the table obtained in step (7) by successively finding each enclosing frame inside the table comprises:
Starting from the top-left vertex of the table obtained in step (7), scan from left to right to successively find the enclosing frames whose height is close to that of the top-left vertex, and sort these frames from left to right. After the enclosing frames of the first row are sorted, start from the highest vertex along the bottom of the first-row frames, successively find the enclosing frames whose height is close to that vertex, and sort them by their order of position. After the enclosing frames of the second row are sorted, find the third-row frames in the same way as the second row and sort them by their order of position, and continue until the enclosing frames at the bottom of the table have been found, at which point every enclosing frame in the table has been sorted;
Locate the specific table position from the sorting direction of the enclosing frames in the sorted table and the coordinates along that sorting direction, which completes the table localization.
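The row-by-row ordering can be approximated with the sketch below. It is a simplification of the procedure in the text, not a faithful implementation: each cell frame is a hypothetical (x, y, w, h) tuple with Y growing downward, each row is seeded by the topmost remaining top edge (instead of the highest vertex along the bottom of the previous row), and "close in height" is modeled by an assumed pixel tolerance `tol`.

```python
from typing import List, Tuple

Cell = Tuple[int, int, int, int]  # (x, y, w, h), Y grows downward


def sort_cells_into_rows(cells: List[Cell], tol: int) -> List[List[Cell]]:
    """Group cell frames into rows (top to bottom), each sorted left to right."""
    remaining = sorted(cells, key=lambda c: (c[1], c[0]))
    rows: List[List[Cell]] = []
    while remaining:
        anchor_y = remaining[0][1]  # top edge of the topmost remaining cell
        row = [c for c in remaining if abs(c[1] - anchor_y) <= tol]
        remaining = [c for c in remaining if abs(c[1] - anchor_y) > tol]
        rows.append(sorted(row, key=lambda c: c[0]))
    return rows
```

The position of a cell in the result, `rows[r][c]`, together with its coordinates, gives the row and column index needed to place recognized content back into a table structure. Cells that span several rows are assigned to the row where they start, which is a limitation of this simplified grouping.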
The scope of protection of the invention includes but is not limited to the above embodiments and is defined by the claims. Any replacement, variation or improvement of this technology that is readily apparent to those skilled in the art falls within the scope of protection of the invention.

Claims (4)

1. A table identification and localization method based on image table analysis technology, characterized in that it comprises the following steps:
(1) Scan table-like samples into an image and extract all enclosing frames in the image;
(2) Set the minimum length threshold, maximum length threshold, minimum width threshold and maximum width threshold of the enclosing frame, and set the maximum and minimum area-ratio thresholds;
(3) From all enclosing frames of the image, select the enclosing frame with the largest area whose length lies between the minimum and maximum length thresholds and whose width lies between the minimum and maximum width thresholds;
(4) Correct the skew of the image using the enclosing frame obtained in step (3);
(5) Extract all outer enclosing frames from the skew-corrected image, keep those whose length lies between the minimum and maximum length thresholds and whose width lies between the minimum and maximum width thresholds, and mark each kept outer enclosing frame as a suspected table region;
(6) Search for enclosing frames inside each suspected table region obtained in step (5), and extract all enclosing frames whose ratio of frame area to the area of their own bounding rectangle lies between the minimum and maximum area-ratio thresholds;
(7) Set the minimum and maximum thresholds for the number of horizontal line segments contained in a suspected table region, and the minimum and maximum thresholds for the number of vertical line segments contained in a suspected table region; use Hough-transform line detection to count the horizontal and vertical line segments contained in all enclosing frames of the suspected table region obtained in step (6); extract the suspected table regions whose horizontal line segment count lies between the minimum and maximum thresholds for horizontal line segments and whose vertical line segment count lies between the minimum and maximum thresholds for vertical line segments, and mark them as tables;
(8) Locate each table obtained in step (7) by successively finding each enclosing frame inside the table.
2. The method for table discrimination and localization in images based on table analysis technology according to claim 1, characterized in that selecting, from all bounding boxes of the image, the bounding box of largest area whose length lies between the minimum and maximum length thresholds of the bounding box and whose width lies between the minimum and maximum width thresholds of the bounding box comprises:
Selecting the bounding box with the largest area from all bounding boxes of the image; comparing the length of the largest-area bounding box with the minimum and maximum length thresholds of the bounding box, and comparing its width with the minimum and maximum width thresholds of the bounding box; if the length of the largest-area bounding box is less than the minimum length threshold or its width is less than the minimum width threshold, labeling the image as a non-table image and rejecting it; otherwise, labeling the image as a table image to be detected;
If, in the table image to be detected, the length of the largest-area bounding box exceeds the maximum length threshold or its width exceeds the maximum width threshold, selecting the bounding box with the second-largest area; if the length of the second-largest bounding box exceeds the maximum length threshold or its width exceeds the maximum width threshold, selecting the bounding box with the third-largest area; and so on, until a bounding box is selected whose length lies between the minimum and maximum length thresholds and whose width lies between the minimum and maximum width thresholds.
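The selection procedure of claim 2 can be sketched as below. This is a hedged illustration under stated assumptions: the `Box` type and `select_table_box` function are hypothetical names, "length" is taken to be the horizontal extent and "width" the vertical extent, the thresholds are treated as inclusive, and returning `None` when no box qualifies is a choice made here because the claim does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (hypothetical type)."""
    x: float
    y: float
    length: float  # assumed to be the horizontal extent
    width: float   # assumed to be the vertical extent

    @property
    def area(self) -> float:
        return self.length * self.width


def select_table_box(boxes: Sequence[Box],
                     min_len: float, max_len: float,
                     min_wid: float, max_wid: float) -> Optional[Box]:
    """Return the largest-area box within the size thresholds, or None.

    None means either that the image is rejected as a non-table image
    (its largest box is too small) or that no box fits the thresholds.
    """
    if not boxes:
        return None
    ranked = sorted(boxes, key=lambda b: b.area, reverse=True)

    # Rejection test: the largest box is already too small to be a table.
    largest = ranked[0]
    if largest.length < min_len or largest.width < min_wid:
        return None

    # Walk down the ranking, skipping oversized boxes such as a page border.
    for box in ranked:
        if (min_len <= box.length <= max_len
                and min_wid <= box.width <= max_wid):
            return box
    return None
```

Sorting once by area and scanning downward is equivalent to repeatedly picking the next-largest box as the claim describes.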
3. The method for table discrimination and localization in images based on table analysis technology according to claim 2, characterized in that performing skew correction on the image by means of the bounding box obtained in step (3) comprises:
Detecting all line segments in the bounding box obtained in step (3) by Hough-transform line detection; calculating the angle between each line segment and the horizontal direction and selecting the smallest angle; taking the smallest angle as the rotation angle of the table image to be detected; and rotating the table image to be detected, thereby completing the skew correction of the table image to be detected.
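The angle selection of claim 3 can be sketched as follows. This is a minimal sketch under assumptions: segments are given as `(x1, y1, x2, y2)` tuples already produced by a line detector, "smallest angle" is read as smallest in absolute value, and the helper names `angle_to_horizontal`, `skew_angle`, and `rotate_point` are hypothetical. Rotating a full image would require an imaging library, so the rotation is shown on a single point only.

```python
import math


def angle_to_horizontal(x1, y1, x2, y2):
    """Signed angle in degrees between a segment and the horizontal axis.

    The result is folded into (-90, 90] so endpoint order does not matter.
    """
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    if angle > 90.0:
        angle -= 180.0
    elif angle <= -90.0:
        angle += 180.0
    return angle


def skew_angle(segments):
    """Return the segment angle of smallest magnitude (assumed reading)."""
    if not segments:
        raise ValueError("no line segments detected")
    return min((angle_to_horizontal(*s) for s in segments), key=abs)


def rotate_point(x, y, angle_deg, cx=0.0, cy=0.0):
    """Rotate (x, y) by angle_deg about (cx, cy) using the standard 2D rotation."""
    t = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(t) - dy * math.sin(t),
            cy + dx * math.sin(t) + dy * math.cos(t))
```

Rotating by the negative of the value returned by `skew_angle` brings the nearly horizontal table lines back to the horizontal axis.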
4. The method for table discrimination and localization in images based on table analysis technology according to claim 1, characterized in that performing table localization on the table obtained in step (7) by finding each bounding box in the table in turn comprises:
Starting from the top-left vertex of the table obtained in step (7), finding in turn the bounding boxes located at a height close to that of the top-left vertex of the table, and sorting them according to their front-to-back positions;
After the first row of bounding boxes has been sorted, starting from the highest vertex on the bottom of the first row of bounding boxes, finding in turn the bounding boxes located at a height close to that of the highest vertex and sorting them in turn;
After the second row of bounding boxes has been sorted, finding and sorting the third row of bounding boxes in turn by the same steps used for the second row, and so on until the bounding boxes at the bottom of the table have been found, thereby obtaining a table whose bounding boxes are sorted;
Locating a specific table position according to the sort direction of the bounding boxes in the sorted table and the coordinates along that sort direction, thereby completing the table localization.
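The row-by-row ordering of claim 4 can be sketched as below. This is an illustration under assumptions rather than the patented procedure: cells are `(x, y, w, h)` boxes in image coordinates with y increasing downward, "close in height" is modeled by a hypothetical tolerance `tol`, the front-to-back order is taken to be left to right, and the fallback for a row with no match is an addition made here so that the loop always terminates.

```python
from typing import List, NamedTuple


class Cell(NamedTuple):
    """Cell bounding box; (x, y) is the top-left corner (hypothetical type)."""
    x: float
    y: float
    w: float
    h: float


def sort_cells_into_rows(cells, table_top, tol) -> List[List[Cell]]:
    """Group cells into rows from top to bottom, each row sorted left to right."""
    remaining = list(cells)
    rows = []
    ref_y = table_top  # height of the table's top-left vertex
    while remaining:
        row = [c for c in remaining if abs(c.y - ref_y) <= tol]
        if not row:
            # Assumed fallback: restart from the topmost remaining cell.
            ref_y = min(c.y for c in remaining)
            continue
        row.sort(key=lambda c: c.x)
        rows.append(row)
        remaining = [c for c in remaining if c not in row]
        # Highest bottom edge of the finished row (smallest y in image coords).
        ref_y = min(c.y + c.h for c in row)
    return rows


def locate(rows, row_index, col_index) -> Cell:
    """Return the cell at the given position in the sorted table."""
    return rows[row_index][col_index]
```

Once the cells are sorted, a specific table position is addressed by its row and column index, for example `locate(rows, 1, 2)` for the third cell of the second row.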
CN201610593119.3A 2016-07-26 2016-07-26 Form based on tabular analysis technology in image differentiates and localization method Active CN106203397B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610593119.3A CN106203397B (en) 2016-07-26 2016-07-26 Form based on tabular analysis technology in image differentiates and localization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610593119.3A CN106203397B (en) 2016-07-26 2016-07-26 Form based on tabular analysis technology in image differentiates and localization method

Publications (2)

Publication Number Publication Date
CN106203397A CN106203397A (en) 2016-12-07
CN106203397B true CN106203397B (en) 2017-11-10

Family

ID=57495785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610593119.3A Active CN106203397B (en) 2016-07-26 2016-07-26 Form based on tabular analysis technology in image differentiates and localization method

Country Status (1)

Country Link
CN (1) CN106203397B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9842251B2 (en) * 2016-01-29 2017-12-12 Konica Minolta Laboratory U.S.A., Inc. Bulleted lists
CN107895173B (en) * 2017-11-06 2021-08-17 国网重庆市电力公司电力科学研究院 Method, device and equipment for labeling image target and readable storage medium
CN108776776B (en) * 2018-05-25 2021-11-02 河南思维轨道交通技术研究院有限公司 Identification method for horizontal and vertical line segment in image
CN109308465B (en) * 2018-09-14 2020-01-17 百度在线网络技术(北京)有限公司 Table line detection method, device, equipment and computer readable medium
CN109816045A (en) * 2019-02-11 2019-05-28 青岛海信智能商用系统股份有限公司 A kind of commodity recognition method and device
CN114862753A (en) * 2022-03-17 2022-08-05 北京梦诚科技有限公司 Automatic high-precision table correction method and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7739587B2 (en) * 2006-06-12 2010-06-15 Xerox Corporation Methods and apparatuses for finding rectangles and application to segmentation of grid-shaped tables
JP5871571B2 (en) * 2011-11-11 2016-03-01 株式会社Pfu Image processing apparatus, rectangle detection method, and computer program
JP5948866B2 (en) * 2011-12-27 2016-07-06 富士ゼロックス株式会社 Image processing apparatus and program
CN103258201B (en) * 2013-04-26 2016-04-06 四川大学 A kind of form lines extracting method of amalgamation of global and local message
CN104484643B (en) * 2014-10-27 2018-05-29 中国科学技术大学 The intelligent identification Method and system of a kind of handwriting table

Also Published As

Publication number Publication date
CN106203397A (en) 2016-12-07

Similar Documents

Publication Publication Date Title
CN106203397B (en) Form based on tabular analysis technology in image differentiates and localization method
CN105205488B (en) Word area detection method based on Harris angle points and stroke width
CN104091388B (en) A kind of paper money discrimination method based on magnetic image and device
CN104346858B (en) A kind of bank note face amount recognition methods based on magnetic image and device
CN106485183A (en) A kind of Quick Response Code localization method and system
CN106530310B (en) A kind of pedestrian count method and device based on the identification of human body overhead
CN104778470B (en) Text detection based on component tree and Hough forest and recognition methods
CN109325401A (en) The method and system for being labeled, identifying to title field are positioned based on edge
CN104463138B (en) The text positioning method and system of view-based access control model structure attribute
CN102254144A (en) Robust method for extracting two-dimensional code area in image
CN105095822B (en) A kind of Chinese letter co pattern image detection method and system
CN103473551A (en) Station logo recognition method and system based on SIFT operators
CN101770575A (en) Method and device for measuring image inclination angle of business card
CN104966051A (en) Method of recognizing layout of document image
CN109993739B (en) Seal authenticity identification method and device
CN104700420A (en) Ellipse detection method and system based on Hough conversion and ovum identification method
US20090169113A1 (en) Automatic and Semi-Automatic Detection of Planar Shapes from 2D Images
CN103679218B (en) A kind of handwritten form keyword detection method
CN109767436B (en) Method and device for identifying authenticity of seal
Roy et al. A novel approach to skew detection and character segmentation for handwritten Bangla words
CN106204616A (en) The recognition methods of a kind of Iran note denomination and device
CN104680142A (en) Method for comparing four-slap fingerprint based on feature point set segmentation and RST invariant features
CN104166843B (en) Document image source judgment method based on linear continuity
CN103235951A (en) Preliminary positioning method for matrix type two-dimensional bar code
CN109815954A (en) Correction for direction method, apparatus, equipment and the storage medium of VAT invoice image

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Yu Zhiwen

Inventor after: Che Shaoshuai

Inventor after: Shao Yiting

Inventor after: Hu Jia

Inventor after: Wu Zhouyang

Inventor after: Zhou Ling

Inventor before: Yu Zhiwen

Inventor before: Che Shaoshuai

Inventor before: Shao Yiting

Inventor before: Hu Jia

GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 210005 No. 268, Hanzhoung Road, Nanjing, Jiangsu

Patentee after: CLP Hongxin Information Technology Co., Ltd

Address before: 210005 No. 268, Hanzhoung Road, Nanjing, Jiangsu

Patentee before: Jiangsu Hongxin System Integration Co., Ltd.
