CN102567711A - Method and system for making and using scanning recognition template - Google Patents


Publication number
CN102567711A
Authority
CN
China
Prior art keywords
template
locating piece
image
making
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010106228013A
Other languages
Chinese (zh)
Inventor
龚健
周长岭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Founder International Co Ltd
Founder International Beijing Co Ltd
Original Assignee
Founder International Co Ltd
Founder International Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Founder International Co Ltd and Founder International Beijing Co Ltd
Priority to CN2010106228013A
Publication of CN102567711A

Landscapes

  • Character Input (AREA)

Abstract

The invention relates to a method and a system for making and using a scanning recognition template. The method comprises the following steps: making a recognition template, marking locating blocks in the template, and setting the attributes of the locating blocks; performing regional analysis on a scanned image, and finding the template whose overlap rate with the image regions reaches a threshold; matching the locating blocks in the template with regions in the scanned image, and extracting and recognizing the content information of the matched locating blocks; and classifying the recognized content information of the locating blocks. With the method and system provided by the invention, the efficiency of recognizing regular complex layouts is greatly improved, and the recognized information is verified and classified automatically.

Description

A method and system for making and using a scanning recognition template
Technical field
The present invention relates to the technical field of scan recognition, and in particular to a method and system for making and using a scanning recognition template.
Background art
With the continuous progress of society and the rapid development of digital technology, people increasingly favor electronic materials, so more and more paper materials need to be digitized, that is, scanned and recognized.
In the digital production process, OCR technology is critical: the quality of the OCR technology directly affects the quality of data recognition. The charts, formulas, and similar elements in paper materials greatly increase the difficulty of automatic recognition by computer. Some materials also contain pictures, which take a great deal of time to recognize, give poor results, and greatly reduce recognition efficiency. The workload of organizing the recognized content is also very large; the content easily becomes disordered and has to be sorted manually, which increases labor cost.
Summary of the invention
The object of the present invention is to address the shortcomings of current OCR technology by providing a method and system for making and using a scanning recognition template, so as to improve the efficiency and quality of image and text recognition.
The present invention provides a method for making and using a scanning recognition template, comprising the following steps:
(S0) make a recognition template, mark locating blocks in the template, and set the attributes of the locating blocks;
(S1) perform regional analysis on a scanned image, and find the template whose overlap rate with the image regions reaches a set threshold;
(S2) match the locating blocks in the template with the regions in the scanned image, and extract and recognize the content information of the matched locating blocks;
(S3) classify the recognized content information of the locating blocks.
Further, the method described above also comprises normalizing the scanned image, where normalization means correcting image deformation introduced by scanning.
Further, in the method described above, in step (S0) the template is a closed graphic region that includes a border, and the template contains one or more locating blocks, where a locating block is a closed rectangular frame inside the template that is used to recognize and mark the content in its matched region.
Further, in the method described above, the template and the locating blocks all have additional attributes, including a match metric attribute, which is used to measure the overlap rate between the template and the image and between a locating block and an image region, and which serves as an indicator for manual intervention.
Further, in the method described above, the additional attributes of a locating block also include:
1) recognition content type: text, graphic, or image;
2) recognition content classification label: used by the system to classify the recognized content according to this label;
3) content verification rule: a rule used to check the recognized content;
4) automatic deformation attribute: used to fine-tune the size and position of the locating block within a set threshold range when the locating block is compared against an image region for overlap.
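The attributes listed above can be pictured as a small data model. The following is a minimal sketch only: the class and field names (`LocatingBlock`, `class_label`, `max_adjust`, and so on) are hypothetical and do not come from the patent, and the default threshold of 0.85 is borrowed from the embodiment described later.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple

# (left, top, right, bottom) in template coordinates; an assumed convention.
Rect = Tuple[int, int, int, int]


class ContentType(Enum):
    """Recognition content type of a locating block."""
    TEXT = "text"
    GRAPHIC = "graphic"
    IMAGE = "image"


@dataclass
class LocatingBlock:
    rect: Rect                            # closed rectangle inside the template
    content_type: ContentType             # 1) recognition content type
    class_label: str                      # 2) classification label
    verify_pattern: Optional[str] = None  # 3) content verification rule (assumed to be a regex)
    max_adjust: int = 0                   # 4) automatic deformation limit, in pixels (assumed unit)
    match_threshold: float = 0.85         # match metric attribute
    children: List["LocatingBlock"] = field(default_factory=list)  # nested blocks


@dataclass
class Template:
    name: str
    border: Rect                          # closed outer border of the template
    match_threshold: float                # match metric attribute of the template
    blocks: List[LocatingBlock] = field(default_factory=list)


# Hypothetical template for a recipe page with a photo and an ingredient list.
recipe = Template(
    name="recipe_page",
    border=(0, 0, 600, 800),
    match_threshold=0.85,
    blocks=[
        LocatingBlock((20, 20, 580, 300), ContentType.IMAGE, "dish_photo"),
        LocatingBlock((20, 320, 580, 500), ContentType.TEXT, "ingredients", max_adjust=10),
    ],
)
```

Keeping the match threshold on both the template and each block mirrors the statement that both carry a match metric attribute.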
Further, in the method described above, in step (S2) a locating block in the template is matched with a region in the scanned image as follows: if the rectangle overlap rate of the two regions reaches the threshold set by the match metric attribute of the locating block, the region is considered to match the locating block.
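The text does not define how the rectangle overlap rate is computed. As an illustration only, the sketch below assumes intersection over union (IoU), which is one common choice; the function names are hypothetical.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)


def area(r: Rect) -> float:
    """Area of a rectangle; degenerate or inverted rectangles count as 0."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def overlap_rate(a: Rect, b: Rect) -> float:
    """Assumed definition: intersection area divided by union area."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    inter_area = area(inter)
    union_area = area(a) + area(b) - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


def is_match(block: Rect, region: Rect, threshold: float) -> bool:
    """A region matches a locating block when the overlap rate reaches the threshold."""
    return overlap_rate(block, region) >= threshold
```

Other definitions, such as intersection area divided by the locating block's own area, would fit the wording equally well and would only change `overlap_rate`.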
Further, in the method described above, in step (S2) locating blocks may be nested; when locating blocks recognize the content in their regions, recognition proceeds in an order determined by nesting depth, matching degree, and priority weight.
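The three ordering criteria can be expressed as a composite sort key. The direction of each criterion is not stated in the text, so this sketch assumes outer blocks first, then higher matching degree, then higher priority weight; `MatchedBlock` and its fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MatchedBlock:
    name: str
    depth: int           # nesting depth, 0 = outermost
    match_degree: float  # overlap rate with the matched region
    priority: int        # priority weight


def recognition_order(blocks: List[MatchedBlock]) -> List[MatchedBlock]:
    """Sort by nesting depth, then matching degree, then priority weight.

    Assumed directions: shallower first, higher match first, higher priority first.
    """
    return sorted(blocks, key=lambda b: (b.depth, -b.match_degree, -b.priority))
```

If inner blocks should be recognized first, only the sign of `b.depth` in the key needs to change.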
Further, in the method described above, in step (S2) the size and position of a locating block are fine-tuned within a set threshold range according to the image content of its matched region.
Further, in the method described above, in step (S2) a locating block applies different types of processing to the image in its region according to its recognition content type mark: for example, OCR for text, cropping (matting) for images, and possibly curve fitting for graphics.
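Processing by content type amounts to a dispatch table. In the sketch below the three handlers are placeholders, not real OCR or curve-fitting code, and all names are hypothetical; it only shows how the content type mark could select the processing path.

```python
from typing import Callable, Dict, List

Pixels = List[List[int]]  # a block's image region as rows of pixel values


def run_ocr(pixels: Pixels) -> dict:
    # Placeholder: a real system would call an OCR engine here.
    return {"kind": "text", "value": "<ocr result placeholder>"}


def crop_image(pixels: Pixels) -> dict:
    # Cropping simply keeps a copy of the pixels inside the block.
    return {"kind": "image", "value": [row[:] for row in pixels]}


def fit_curve(pixels: Pixels) -> dict:
    # Placeholder: a real system might fit curves to the drawn lines.
    return {"kind": "graphic", "value": "<curve parameters placeholder>"}


HANDLERS: Dict[str, Callable[[Pixels], dict]] = {
    "text": run_ocr,
    "image": crop_image,
    "graphic": fit_curve,
}


def process_block(content_type: str, pixels: Pixels) -> dict:
    """Route a block's pixels to the handler chosen by its content type mark."""
    try:
        handler = HANDLERS[content_type]
    except KeyError:
        raise ValueError(f"unknown content type: {content_type!r}")
    return handler(pixels)
```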
A system for making and using a scanning recognition template, comprising:
a template making device, used to make templates, mark the locating blocks in a template, and set the attributes of the locating blocks;
a template management device, used to manage all templates and to find the template whose overlap rate with the image regions reaches a set threshold;
a recognition execution device, used to match locating blocks with regions of the scanned image, and to extract and recognize the content information of the matched locating blocks;
a classification device, used to classify the recognized content information.
The beneficial effects of the present invention are as follows: the invention helps to improve recognition efficiency and performs verification and classification of the recognized information for documents published from a template. For images with distinct regional features, cutting the image into regions and separating and marking regions of different recognition difficulty not only allows recognition results to be verified against one another to improve accuracy, but also allows the recognized content to be classified and organized at the same time. The method and system of the invention solve the problem of relative positioning of captured pictures and significantly reduce the workload of manual sorting.
Description of drawings
Fig. 1 is a structural diagram of the system for making and using a scanning recognition template in an embodiment of the invention;
Fig. 2 is a flow chart of the method for making and using a scanning recognition template in an embodiment of the invention;
Fig. 3 is the original scanned image in the embodiment;
Fig. 4 is the template that best fits Fig. 3 in the embodiment;
Fig. 5 is a schematic diagram of matching locating blocks with image regions in the embodiment.
Detailed description
Specific embodiments of the invention are described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the invention provides a scanning recognition template making system, comprising:
a template making device 11, used to make templates and template locating blocks, and to set the attributes of the locating blocks;
a template management device 12, used to manage all templates and to find the template whose overlap rate with the image regions reaches a set threshold;
a recognition execution device 13, used to match locating blocks with regions of the scanned image, and to extract and recognize the content information of the matched locating blocks;
a classification device 14, used to classify the recognized content information.
The scanning recognition template method implemented by the above system is shown in Fig. 2 and comprises the following steps:
S0: make a recognition template, mark locating blocks in the template, and set the attributes of the locating blocks.
In this embodiment, the template is a closed graphic region that includes a border, and the template contains one or more locating blocks, where a locating block is a closed rectangular frame inside the template that is used to recognize and mark the content in its matched region.
The template and the locating blocks all have additional attributes, including a match metric attribute, which is used to measure the overlap rate between the template and the image and between a locating block and an image region, and which serves as an indicator for manual intervention.
The additional attributes of a locating block also include:
1) recognition content type: for example text, graphic, or image;
2) recognition content classification label: used by the system to classify the recognized content according to this label;
3) content verification rule: a rule used to check the recognized content;
4) automatic deformation attribute: used to fine-tune the size and position of the locating block within a set threshold range when the locating block is compared against an image region for overlap.
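The form of a content verification rule is not specified. One plausible reading, sketched below as an assumption, is a regular expression that the recognized text must match in full; a failed check could then flag the block for manual review. The function names and the sample pattern are hypothetical.

```python
import re
from typing import Optional


def verify_content(text: str, pattern: Optional[str]) -> bool:
    """Return True when the recognized text satisfies the block's rule.

    Assumption: the rule is a regular expression that must match the whole
    (whitespace-trimmed) text. A block without a rule always passes.
    """
    if pattern is None:
        return True
    return re.fullmatch(pattern, text.strip()) is not None


def needs_manual_review(text: str, pattern: Optional[str]) -> bool:
    """Flag recognized content that fails its verification rule."""
    return not verify_content(text, pattern)


# Hypothetical rule for a quantity field such as "250 g".
QUANTITY_RULE = r"\d+\s*(g|kg|ml)"
```

With this rule, an OCR confusion such as the letter O in place of a zero fails the check and is routed to a person.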
S1: perform regional analysis on the scanned image, and find the template whose overlap rate with the image regions reaches the set threshold.
In this embodiment, connected-component analysis is performed on the scanned image, the image is segmented into regions according to the characteristics of the connected components, the segmented image is matched against the templates in the template management device, and the region overlap rate is calculated to find the corresponding template. This connected-component analysis and matching process is known in the art.
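Connected-component analysis is described here only as known art. As a minimal illustration, the sketch below labels 4-connected foreground pixels in a binarized image with a breadth-first search and returns one bounding box per component. It assumes the page has already been binarized (1 = ink, 0 = background); a production system would more likely use an image processing library and merge nearby components into layout regions.

```python
from collections import deque
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def connected_regions(binary: List[List[int]]) -> List[Rect]:
    """Return the bounding box of each 4-connected foreground component."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Rect] = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first search over the component that contains (x, y).
            left, top, right, bottom = x, y, x, y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right + 1, bottom + 1))
    return boxes
```

Boxes are returned in scan order (top to bottom, left to right by first pixel encountered).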
This embodiment also includes normalizing the scanned image. Normalization means correcting image deformation introduced by scanning, typically page skew and slight changes in size. Normalization helps to improve the efficiency and accuracy of matching templates to scanned images. Normalization of scanned pages uses known image processing techniques.
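Of the two deformations mentioned, size variation is the simpler to illustrate. The sketch below rescales region rectangles from the scanned page's dimensions to the template's dimensions so that the two can be compared in one coordinate system. It is an assumption about how size normalization might be done and does not cover skew correction; the function name is hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)
Size = Tuple[float, float]                # (width, height)


def normalize_regions(regions: List[Rect], scan_size: Size, template_size: Size) -> List[Rect]:
    """Scale region rectangles from scan coordinates to template coordinates."""
    if scan_size[0] <= 0 or scan_size[1] <= 0:
        raise ValueError("scan size must be positive")
    sx = template_size[0] / scan_size[0]  # horizontal scale factor
    sy = template_size[1] / scan_size[1]  # vertical scale factor
    return [(l * sx, t * sy, r * sx, b * sy) for (l, t, r, b) in regions]
```

For example, a page scanned at twice the template's resolution has every coordinate halved before overlap rates are computed.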
S2: match the locating blocks in the template with the regions in the scanned image, and extract and recognize the content information of the matched locating blocks.
In this embodiment, a locating block in the template is matched with a region in the scanned image as follows: if the rectangle overlap rate of the two regions reaches the threshold set by the match metric attribute of the locating block, the region is considered to match the locating block.
Further, locating blocks may be nested; when locating blocks recognize the content in their regions, recognition proceeds in an order determined by nesting depth, matching degree, and priority weight.
Further, the size and position of a locating block are fine-tuned within a set threshold range according to the image content of its matched region.
Further, a locating block applies different types of processing to the image in its region according to its recognition content type mark: for example, OCR for text, cropping (matting) for images, and possibly curve fitting for graphics.
S3: classify the recognized content information of the locating blocks.
For example, the recognized information for some locating blocks is an image and for others is text; these different types of content information are classified accordingly.
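Step S3 can be read as grouping recognized content by each block's classification label. The sketch below shows that grouping under the assumption that each recognition result arrives as a (label, content) pair; the function name and sample labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def classify_results(results: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group recognized content by the classification label of its locating block."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for label, content in results:
        grouped[label].append(content)
    return dict(grouped)


# Hypothetical recognition results from one recipe page.
page_results = [
    ("ingredients", "flour 250 g"),
    ("notes", "serve warm"),
    ("ingredients", "sugar 50 g"),
]
```

Content within each label keeps the order in which the blocks were recognized.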
A specific embodiment of the invention is described below to illustrate the technical details of the scanning recognition template method.
Fig. 3 is the original scanned image in the embodiment. As the figure shows, the scanned original is a recipe, comprising a picture of the finished dish, the ingredients, the preparation method, and notes.
Fig. 4 is the template that best fits Fig. 3 in the embodiment. The template management device performs regional analysis on Fig. 3 and finds the template whose overlap rate with the image regions reaches the set threshold; in this embodiment, that is the template shown in Fig. 4.
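How a whole template is scored against the segmented image is not spelled out. One way to picture the lookup, offered here purely as an assumption, is to score each template by the average of its blocks' best overlap with any image region, and to keep the highest-scoring template that reaches the threshold. The overlap measure is again assumed to be intersection over union, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)


def iou(a: Rect, b: Rect) -> float:
    """Intersection over union of two rectangles (assumed overlap measure)."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def template_score(blocks: List[Rect], regions: List[Rect]) -> float:
    """Average, over the template's blocks, of the best overlap with any region."""
    if not blocks:
        return 0.0
    best = [max((iou(b, r) for r in regions), default=0.0) for b in blocks]
    return sum(best) / len(blocks)


def select_template(templates: Dict[str, List[Rect]], regions: List[Rect],
                    threshold: float) -> Optional[str]:
    """Return the best-scoring template that reaches the threshold, or None."""
    best_name, best_score = None, 0.0
    for name, blocks in templates.items():
        score = template_score(blocks, regions)
        if score >= threshold and score > best_score:
            best_name, best_score = name, score
    return best_name
```

Returning `None` when no template reaches the threshold leaves room for the manual intervention that the match metric attribute is said to support.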
As the figure shows, the template consists of two parts: a template outer frame 41 and locating blocks 42. The template outer frame 41 sets the size of the whole scanned image, and the locating blocks 42 mark how content is distributed in the scanned image.
In this embodiment, each locating block has the following attributes:
1) recognition content type: for example text, graphic, or image;
2) recognition content classification label: used by the system to classify the recognized content according to this label;
3) content verification rule: a rule used to check the recognized content;
4) automatic deformation attribute: used to fine-tune the size and position of the locating block within a set threshold range when the locating block is compared against an image region for overlap.
Fig. 5 is a schematic of matching locating blocks with image regions in the embodiment. In the recognition execution device, the locating blocks in the template are first matched with image regions by position: if the rectangle overlap rate of the two regions reaches the set threshold, the region is considered to match the locating block. This position matching technique is known in the art and is not described further here. In this embodiment the threshold is set to 85%; that is, a region is considered to match a locating block in the template when their overlap rate reaches 85% or more, as shown in Fig. 5.
After the preliminary match between a region and a locating block, the size and position of the locating block are fine-tuned within the set threshold range according to the attributes set on the locating block. For example, after locating block c is matched with the notes section of the image, locating block c automatically shrinks to the extent of the text and ignores the frame around the text.
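The fine-tuning step can be sketched as shrinking a block to the tight bounding box of the ink it contains, with each edge allowed to move by at most a fixed number of pixels. This is an illustrative assumption: the parameter `max_adjust` is a hypothetical stand-in for the "set threshold range", and the sketch assumes that any frame around the text has already been removed from the binarized input, so it does not itself detect or ignore frames.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def shrink_to_content(binary: List[List[int]], block: Rect, max_adjust: int) -> Rect:
    """Shrink a locating block toward the ink inside it.

    Each edge moves inward by at most max_adjust pixels; a block that
    contains no ink is returned unchanged.
    """
    left, top, right, bottom = block
    xs, ys = [], []
    for y in range(top, bottom):
        for x in range(left, right):
            if binary[y][x]:
                xs.append(x)
                ys.append(y)
    if not xs:
        return block
    tight = (min(xs), min(ys), max(xs) + 1, max(ys) + 1)
    # Clamp every edge so that it stays within max_adjust of its original position.
    return (
        min(tight[0], left + max_adjust),
        min(tight[1], top + max_adjust),
        max(tight[2], right - max_adjust),
        max(tight[3], bottom - max_adjust),
    )
```

Clamping the movement keeps a badly segmented region from collapsing a block far below the size the template author intended.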
Next, the content of each matched locating block is recognized and the recognized content is recorded in the locating block. The recognized content is classified at the same time: for example, the content type recognized by locating block a is image, and the content type recognized by locating block b is text. The recognized content information of the locating blocks is thus classified.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.

Claims (10)

1. A method for making and using a scanning recognition template, comprising the following steps:
(S0) make a recognition template, mark locating blocks in the template, and set the attributes of the locating blocks;
(S1) perform regional analysis on a scanned image, and find the template whose overlap rate with the image regions reaches a set threshold;
(S2) match the locating blocks in the template with the regions in the scanned image, and extract and recognize the content information of the matched locating blocks;
(S3) classify the recognized content information of the locating blocks.
2. The method for making and using a scanning recognition template according to claim 1, characterized in that the method also comprises normalizing the scanned image, where normalization means correcting image deformation introduced by scanning.
3. The method for making and using a scanning recognition template according to claim 1, characterized in that, in step (S0), the template is a closed graphic region that includes a border, and the template contains one or more locating blocks, where a locating block is a closed rectangular frame inside the template that is used to recognize and mark the content in its matched region.
4. The method for making and using a scanning recognition template according to claim 3, characterized in that the template and the locating blocks all have additional attributes, including a match metric attribute, which is used to measure the overlap rate between the template and the image and between a locating block and an image region, and which serves as an indicator for manual intervention.
5. The method for making and using a scanning recognition template according to claim 4, characterized in that the additional attributes of a locating block also include:
1) recognition content type: text, graphic, or image;
2) recognition content classification label: used by the system to classify the recognized content according to this label;
3) content verification rule: a rule used to check the recognized content;
4) automatic deformation attribute: used to fine-tune the size and position of the locating block within a set threshold range when the locating block is compared against an image region for overlap.
6. The method for making and using a scanning recognition template according to claim 4, characterized in that, in step (S2), a locating block in the template is matched with a region in the scanned image, and the region is considered to match the locating block if the rectangle overlap rate of the two regions reaches the threshold set by the match metric attribute of the locating block.
7. The method for making and using a scanning recognition template according to claim 6, characterized in that, in step (S2), locating blocks may be nested, and when locating blocks recognize the content in their regions, recognition proceeds in an order determined by nesting depth, matching degree, and priority weight.
8. The method for making and using a scanning recognition template according to claim 6, characterized in that, in step (S2), the size and position of a locating block are fine-tuned within a set threshold range according to the image content of its matched region.
9. The method for making and using a scanning recognition template according to claim 6, characterized in that, in step (S2), a locating block applies different types of processing to the image in its region according to its recognition content type mark.
10. A system for making and using a scanning recognition template, comprising:
a template making device, used to make templates, mark the locating blocks in a template, and set the attributes of the locating blocks;
a template management device, used to manage all templates and to find the template whose overlap rate with the image regions reaches a set threshold;
a recognition execution device, used to match locating blocks with regions of the scanned image, and to extract and recognize the content information of the matched locating blocks;
a classification device, used to classify the recognized content information.
CN2010106228013A 2010-12-29 2010-12-29 Method and system for making and using scanning recognition template Pending CN102567711A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010106228013A CN102567711A (en) 2010-12-29 2010-12-29 Method and system for making and using scanning recognition template

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010106228013A CN102567711A (en) 2010-12-29 2010-12-29 Method and system for making and using scanning recognition template

Publications (1)

Publication Number Publication Date
CN102567711A true CN102567711A (en) 2012-07-11

Family

ID=46413091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010106228013A Pending CN102567711A (en) 2010-12-29 2010-12-29 Method and system for making and using scanning recognition template

Country Status (1)

Country Link
CN (1) CN102567711A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809157A (en) * 2014-12-29 2016-07-27 北京鸿合智能系统股份有限公司 Answer sheet modeling method and device
CN107206587A (en) * 2014-12-05 2017-09-26 Ars责任有限公司 Equipment for being oriented to the part especially by crawls such as robot, automation equipments
CN107517272A (en) * 2017-09-14 2017-12-26 新疆圣力信息科技有限公司 A kind of device, the system and method for automatic data collection set form data
CN107590495A (en) * 2017-09-18 2018-01-16 哈尔滨成长科技有限公司 Answer sheet picture method for correcting error, device, readable storage medium storing program for executing and electronic equipment
CN108665439A (en) * 2017-08-22 2018-10-16 深圳安博电子有限公司 Method of testing substrate and terminal device
CN108875697A (en) * 2018-07-05 2018-11-23 南昌市微轲联信息技术有限公司 Collecting vehicle information method for uploading, device, storage medium and computer equipment
CN109086738A (en) * 2018-08-23 2018-12-25 深圳市深晓科技有限公司 A kind of character identifying method and device based on template matching
CN110705610A (en) * 2019-09-17 2020-01-17 孔佑强 Evaluation system and method based on handwriting detection and temporary writing capability
CN111353611A (en) * 2018-12-20 2020-06-30 核动力运行研究所 Automatic generation system and method for in-service inspection and overhaul inspection report of nuclear power station

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1619580A (en) * 2004-09-03 2005-05-25 深圳市海云天科技有限公司 Information identification method of full-filling information card
US20090087103A1 (en) * 2007-09-28 2009-04-02 Hitachi High-Technologies Corporation Inspection Apparatus and Method
CN101464951A (en) * 2007-12-21 2009-06-24 北大方正集团有限公司 Image recognition method and system
CN101882225A (en) * 2009-12-29 2010-11-10 北京中科辅龙计算机技术股份有限公司 Engineering drawing material information extraction method based on template
CN101923643A (en) * 2010-08-11 2010-12-22 中科院成都信息技术有限公司 General form recognizing method

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107206587A (en) * 2014-12-05 2017-09-26 Ars责任有限公司 Equipment for being oriented to the part especially by crawls such as robot, automation equipments
CN105809157A (en) * 2014-12-29 2016-07-27 北京鸿合智能系统股份有限公司 Answer sheet modeling method and device
CN108665439A (en) * 2017-08-22 2018-10-16 深圳安博电子有限公司 Method of testing substrate and terminal device
CN107517272A (en) * 2017-09-14 2017-12-26 新疆圣力信息科技有限公司 A kind of device, the system and method for automatic data collection set form data
CN107590495A (en) * 2017-09-18 2018-01-16 哈尔滨成长科技有限公司 Answer sheet picture method for correcting error, device, readable storage medium storing program for executing and electronic equipment
CN108875697A (en) * 2018-07-05 2018-11-23 南昌市微轲联信息技术有限公司 Collecting vehicle information method for uploading, device, storage medium and computer equipment
CN109086738A (en) * 2018-08-23 2018-12-25 深圳市深晓科技有限公司 A kind of character identifying method and device based on template matching
CN111353611A (en) * 2018-12-20 2020-06-30 核动力运行研究所 Automatic generation system and method for in-service inspection and overhaul inspection report of nuclear power station
CN111353611B (en) * 2018-12-20 2023-05-26 核动力运行研究所 Nuclear power station in-service inspection large repair inspection report automatic generation system and method
CN110705610A (en) * 2019-09-17 2020-01-17 孔佑强 Evaluation system and method based on handwriting detection and temporary writing capability

Similar Documents

Publication Publication Date Title
CN102567711A (en) Method and system for making and using scanning recognition template
US8792715B2 (en) System and method for forms classification by line-art alignment
JP5492205B2 (en) Segment print pages into articles
Ray Choudhury et al. An architecture for information extraction from figures in digital libraries
CN102081732B (en) Method and system for recognizing format template
CN104778470B (en) Text detection based on component tree and Hough forest and recognition methods
Rigaud et al. Robust frame and text extraction from comic books
CN101017533A (en) Recognition method of printed mongolian character
CN109325401A (en) The method and system for being labeled, identifying to title field are positioned based on edge
CN100562074C (en) The method that a kind of video caption extracts
CN102332096A (en) Video caption text extraction and identification method
CN1760860A (en) Device part assembly drawing image search apparatus
CN1237742A (en) Address reader, sorting machine and character string recognition method for mail and the like
EP2220590A1 (en) A method for processing optical character recognition (ocr) data, wherein the output comprises visually impaired character images
CN112419260A (en) PCB character area defect detection method
KR101937398B1 (en) System and method for extracting character in image data of old document
CN113723362A (en) Method and device for detecting table line in image
Banerjee et al. Automatic hyperlinking of engineering drawing documents
CN104680142A (en) Method for comparing four-slap fingerprint based on feature point set segmentation and RST invariant features
Sumathi et al. Techniques and challenges of automatic text extraction in complex images: a survey
CN104123527A (en) Mask-based image table document identification method
Lue et al. A novel character segmentation method for text images captured by cameras
CN100356393C (en) Character recognition method predicted base on font
Li et al. Script identification of camera-based images
CN111950556A (en) License plate printing quality detection method based on deep learning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20161130

C20 Patent right or utility model deemed to be abandoned or is abandoned