CN107133621A - Method for classifying and extracting information of formatted fax based on OCR - Google Patents

Method for classifying and extracting information of formatted fax based on OCR

Info

Publication number
CN107133621A
Authority
CN
China
Prior art keywords
field
image
ocr
region
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710334784.5A
Other languages
Chinese (zh)
Other versions
CN107133621B (en)
Inventor
于志文
车少帅
胡笳
吴洲洋
周玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JIANGSU HONGXIN SYSTEM INTEGRATION CO Ltd
Original Assignee
JIANGSU HONGXIN SYSTEM INTEGRATION CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JIANGSU HONGXIN SYSTEM INTEGRATION CO Ltd
Priority to CN201710334784.5A
Publication of CN107133621A
Application granted
Publication of CN107133621B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/242 Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/158 Segmentation of character regions using character size, text spacings or pitch estimation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Character Discrimination (AREA)
  • Character Input (AREA)

Abstract

The invention discloses an OCR-based method for classifying formatted faxes and extracting information from them, comprising: performing adaptive-threshold binarization on the fax image; correcting the image; finding the contour of the largest bounding box of the table in the corrected image, and cropping the table header region from the area of the image above that bounding box; screening the character contours in the header region and merging them; detecting the number of fields in the header region after merging, and classifying the image; extracting the successfully classified image and locating the regions to be recognized in it; recognizing the fields in the regions to be recognized in the table using OCR; and optimizing the recognized fields. The invention improves office efficiency, frees up staff productivity and converts unstructured data into structured data. It is suited to formatted faxes, that is, faxes of table images, such as standardized contracts, self-made vouchers and bills.

Description

Method for classifying and extracting information of formatted fax based on OCR
Technical field
The present invention relates to the field of image processing, and in particular to an OCR-based method for classifying formatted faxes and extracting information from them.
Background
With the development of science and technology, transnational and cross-regional business exchanges are increasingly frequent. Because fax has special legal effect compared with other file transmission methods, it is widely used in office systems. Formatted fax documents contain a large amount of useful information, but at present they must be classified manually and their important information extracted by hand, which is inefficient. An efficient and fast document classification and information extraction method is urgently needed to improve staff efficiency, reduce labor costs and free up productivity.
Chinese Patent Publication No. CN101876999 discloses a method for generating a fax index, a message analysis device and a fax retrieval system. The system performs layout analysis on a fax message, extracts characteristic information from it, creates a label for the fax message from the extracted characteristic information, and uses the label as the index of the fax message so that a user can look up the corresponding fax message by label. However, the system can only classify and index files; it is difficult for it to extract key information from them.
Chinese Patent Publication No. CN102222289 discloses an OCR-based mobile phone financial management method and system. The system analyzes and recognizes financial bills using OCR, but it is not aimed at scanned formatted faxes and cannot classify fax images or extract information from them.
Summary of the invention
The technical problem to be solved by the invention is to provide, in view of the above deficiencies of the prior art, an OCR-based method for classifying formatted faxes and extracting information from them. The method improves office efficiency, frees up staff productivity and converts unstructured data into structured data. It is suited to formatted faxes, that is, faxes of table images, such as standardized contracts, self-made vouchers and bills.
To achieve the above technical purpose, the technical solution adopted by the invention is as follows:
An OCR-based method for classifying formatted faxes and extracting information from them, comprising the following steps:
Step 1: Obtain the image file of the fax and perform adaptive-threshold binarization on the image to reduce noise interference;
Step 2: Determine the tilt angle of the image and correct the image;
Step 3: Find the contour of the largest bounding box of the table in the corrected image, and crop the table header region from the area of the image above the largest bounding box of the table;
Step 4: Screen the character contours in the header region and merge them, so that the character contours are merged into complete fields;
Step 5: Detect the number of fields in the header region after merging, and classify the image according to the number of fields in the header region and the content of the fields;
Step 6: Extract the successfully classified image and locate the regions to be recognized in the image;
Step 7: Recognize the fields in the regions to be recognized in the table according to the positions of those regions in the table and OCR technology;
Step 8: Optimize the recognized fields.
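Step 3 can be illustrated with a minimal sketch. It assumes that contour detection has already produced axis-aligned bounding boxes as (x, y, width, height) tuples with the origin at the top-left corner, and that the table frame is simply the box with the largest area; the function names `largest_box` and `header_region` are hypothetical and not taken from the patent.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), origin at top-left


def largest_box(boxes: List[Box]) -> Box:
    """Return the box with the largest area (assumed to be the table frame)."""
    return max(boxes, key=lambda b: b[2] * b[3])


def header_region(image_width: int, table_box: Box) -> Box:
    """Return the full-width strip of the image lying above the table frame."""
    _, top, _, _ = table_box
    return (0, 0, image_width, top)


# Illustrative boxes: one table frame, one inner cell, one header text block.
boxes = [(40, 300, 900, 700), (60, 320, 200, 50), (400, 50, 250, 60)]
table = largest_box(boxes)
header = header_region(1000, table)
```

Cropping the whole strip above the table keeps the sketch simple; a real implementation might limit the strip to a fixed height above the frame.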
As a further improvement of the technical solution of the invention, step 1 comprises the following steps:
(1) Obtain the image file of the fax, convert the image to the HSV color space, and remove the pixels that fall within the red interval;
(2) Determine the binarization threshold at each pixel position from the distribution of pixel values in the neighborhood block of that pixel, and perform adaptive-threshold binarization on the image to reduce noise interference.
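The two sub-steps of step 1 can be sketched in pure Python as follows. This is only an illustration under stated assumptions: the image is a list of rows of RGB tuples, the red interval (hue within 15 degrees of 0, with minimum saturation and value) is a guess, red pixels are replaced by white, and the local threshold is the neighborhood mean minus a small offset. None of these numeric choices come from the patent.

```python
import colorsys
from typing import List, Tuple

RGB = Tuple[int, int, int]


def is_red(pixel: RGB) -> bool:
    """True if the pixel falls in an assumed red interval of HSV space."""
    r, g, b = (c / 255.0 for c in pixel)
    hue, sat, val = colorsys.rgb_to_hsv(r, g, b)
    hue_deg = hue * 360.0
    return (hue_deg <= 15.0 or hue_deg >= 345.0) and sat >= 0.4 and val >= 0.3


def remove_red_to_gray(image: List[List[RGB]]) -> List[List[int]]:
    """Replace red pixels with white and convert the rest to gray levels."""
    gray = []
    for row in image:
        out = []
        for r, g, b in row:
            if is_red((r, g, b)):
                out.append(255)
            else:
                out.append(round(0.299 * r + 0.587 * g + 0.114 * b))
        gray.append(out)
    return gray


def adaptive_binarize(gray: List[List[int]], block: int = 3, offset: float = 5.0) -> List[List[int]]:
    """Threshold each pixel against the mean of its neighborhood block."""
    height, width = len(gray), len(gray[0])
    radius = block // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            values = [gray[j][i] for j in ys for i in xs]
            threshold = sum(values) / len(values) - offset
            row.append(0 if gray[y][x] < threshold else 255)
        result.append(row)
    return result


white, red, dark = (255, 255, 255), (200, 0, 0), (20, 20, 20)
image = [[white, red, white], [white, dark, white], [white, white, white]]
binary = adaptive_binarize(remove_red_to_gray(image))
```

In the small example the red pixel (a stand-in for a seal) becomes background, while the dark pixel survives as foreground after thresholding.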
As a further improvement of the technical solution of the invention, step 2 comprises finding the longest straight line in the image and rotating the image to correct it according to the angle between that line and the horizontal direction.
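The angle computation in step 2 can be sketched as below, assuming the line segments have already been detected (for example by a Hough transform, which the patent does not name) and are given as (x1, y1, x2, y2) tuples. The helper names are hypothetical, and the rotation is shown on a single point rather than on a whole image.

```python
import math
from typing import List, Tuple

Segment = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def skew_angle_degrees(segments: List[Segment]) -> float:
    """Angle between the longest segment and the horizontal, in degrees."""
    x1, y1, x2, y2 = max(
        segments, key=lambda s: math.hypot(s[2] - s[0], s[3] - s[1])
    )
    if x2 < x1:  # orient the segment left to right
        x1, y1, x2, y2 = x2, y2, x1, y1
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def rotate_point(x: float, y: float, degrees: float) -> Tuple[float, float]:
    """Rotate a point about the origin by the given angle."""
    t = math.radians(degrees)
    return (x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t))


segments = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 100.0, 5.0)]
angle = skew_angle_degrees(segments)
# Rotating by the opposite angle brings the longest line back to horizontal.
corrected_end = rotate_point(100.0, 5.0, -angle)
```

Applying the same rotation to every pixel coordinate (or passing the angle to an image library's rotate function) would deskew the full image.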
As a further improvement of the technical solution of the invention, step 4 comprises the following steps:
(1) Set the length threshold range and the width threshold range of a character contour;
(2) Retrieve the contours in the header region and select those whose length lies within the length threshold range and whose width lies within the width threshold range of a character contour; the selected contours are the character contours;
(3) Merge the character contours: extract the color of each character contour, and merge into a complete field those character contours that are close in color and whose distance from one another is less than half the width of the character contour itself.
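The screening and merging in step 4 might look like the following sketch. It assumes a single line of header text, treats "width" as the horizontal extent and "length" as the vertical extent of a contour's bounding box, and represents color as a single gray level with a hypothetical tolerance `color_tol`; these are illustrative choices rather than details given in the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CharBox:
    x: int       # left edge
    y: int       # top edge
    width: int   # horizontal extent
    length: int  # vertical extent
    color: int   # mean gray level of the contour, 0-255


def filter_chars(boxes: List[CharBox],
                 width_range: Tuple[int, int],
                 length_range: Tuple[int, int]) -> List[CharBox]:
    """Keep contours whose width and length fall inside the threshold ranges."""
    return [
        b for b in boxes
        if width_range[0] <= b.width <= width_range[1]
        and length_range[0] <= b.length <= length_range[1]
    ]


def merge_fields(chars: List[CharBox], color_tol: int = 20) -> List[List[CharBox]]:
    """Group characters into fields by horizontal gap and color similarity."""
    fields: List[List[CharBox]] = []
    for c in sorted(chars, key=lambda b: b.x):
        if fields:
            last = fields[-1][-1]
            gap = c.x - (last.x + last.width)
            # Merge when the gap is under half the previous character's width.
            if gap < last.width / 2 and abs(c.color - last.color) <= color_tol:
                fields[-1].append(c)
                continue
        fields.append([c])
    return fields


boxes = [
    CharBox(0, 10, 20, 24, 30), CharBox(25, 10, 20, 24, 35),
    CharBox(48, 10, 20, 24, 28), CharBox(120, 10, 20, 24, 30),
    CharBox(70, 12, 2, 3, 30),  # speck of noise, removed by the size filter
]
fields = merge_fields(filter_chars(boxes, (10, 40), (15, 40)))
```

The number of groups returned by `merge_fields` is the field count that step 5 uses to choose a classification route.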
As a further improvement of the technical solution of the invention, step 5 comprises the following steps:
(1) Detect the number of fields in the header region;
(2) If the number of fields is 0, do not classify the image;
(3) If the number of fields is 1, classify the image using a machine-learning SVM classifier;
(4) If the number of fields is greater than 1, recognize the characters in the header region by OCR and match them against the type names in the image recognition library to perform classification: divide the number of matched characters by the total number of characters of the correct field being matched, and compare the result with a preset threshold; if the result is greater than the preset threshold, classification succeeds; otherwise, classification fails.
As a further improvement of the technical solution of the invention, step 6 comprises the following steps:
(1) Load the template information made in advance;
(2) Extract the image successfully classified in step 5, and find the contours of all bounding boxes contained within the contour of the largest bounding box in the image;
(3) Set the length threshold range and the width threshold range of a bounding box, and select the bounding boxes whose length lies within the length threshold range and whose width lies within the width threshold range;
(4) According to the position information of the selected bounding boxes, scan and sort all bounding boxes from top to bottom and from left to right to locate the table, and find the regions to be recognized in the table according to the template information;
(5) Determine from the template information whether anything outside the table needs to be recognized. If information outside the table needs to be recognized, field contours must be extracted outside the table: the character contours outside the table are screened using the method of step 4 and merged into complete fields; the regions to be recognized outside the table are determined from the relative position of the fields recorded in the template information and the largest bounding box in the image, and the fields to be recognized outside the largest bounding box are located according to the fields recorded in the template information.
As a further improvement of the technical solution of the invention, step 7 comprises the following steps:
(1) Crop the field images according to the position information of the regions to be recognized obtained in step 6;
(2) Recognize the located fields by OCR.
As a further improvement of the technical solution of the invention, step 8 comprises the following steps:
(1) Extract the fields recognized by OCR;
(2) Optimize each field according to its type: for lowercase-type fields (amounts written in digits), remove the non-numeric part; for date fields, filter out spaces, non-numeric characters and the word "date";
(3) Dictionary optimization: build a dictionary library and match each OCR-recognized field against the fields in the dictionary library. If the matching score is greater than a preset threshold, the field in the dictionary library is replaced with the OCR-recognized field, thereby optimizing and updating the fields in the dictionary library; at the same time, manually confirmed correct fields are added to the dictionary library. The matching score equals the number of characters correctly recognized by OCR divided by the total number of characters of the dictionary entry currently being matched.
The invention can quickly classify formatted fax documents and extract information from them, with fast and accurate classification and high extraction accuracy. The prior art includes methods that retrieve and classify fax files but cannot extract field information, and methods that recognize images but cannot recognize formatted fax files. There has therefore been no effective method for extracting information from formatted fax documents; the method proposed here fills this technical gap, improves clerical efficiency, frees up productivity and saves labor costs.
Brief description of the drawings
Fig. 1 is a flow chart of the invention.
Embodiment
An embodiment of the invention is further described below with reference to Fig. 1:
Referring to Fig. 1, this embodiment is suited to any formatted fax, where a formatted fax is a fax of an image containing a table. Taking a bill fax as an example, the embodiment is as follows:
An OCR-based method for classifying formatted faxes and extracting information from them, comprising the following steps:
Step 1: Obtain the image file of the bill fax and perform adaptive-threshold binarization on the image to reduce noise interference;
Step 2: Determine the tilt angle of the image and correct the image;
Step 3: Find the contour of the largest bounding box of the table in the corrected image, and crop the bill header region from the area of the image above the largest bounding box of the table;
Step 4: Screen the character contours in the header region and merge them, so that the character contours are merged into complete fields;
Step 5: Detect the number of fields in the header region after merging, and classify the image according to the number of fields in the header region and the content of the fields;
Step 6: Extract the successfully classified image and locate the regions to be recognized in the image (both inside and outside the table);
Step 7: Recognize the fields in the regions to be recognized in the table according to the positions of those regions in the table and OCR technology;
Step 8: Optimize the recognized fields.
In this embodiment, step 1 comprises the following steps:
(1) Obtain the image file of the fax, convert the image to the HSV color space, and remove the pixels that fall within the red interval (removing red seals);
(2) Determine the binarization threshold at each pixel position from the distribution of pixel values in the neighborhood block of that pixel, and perform adaptive-threshold binarization on the image to reduce noise interference.
Preferably, step 2 is specifically: find the longest straight line in the image and rotate the image to correct it according to the angle between that line and the horizontal direction.
In this embodiment, step 4 comprises the following steps:
(1) Set the length threshold range and the width threshold range of a character contour;
(2) Retrieve the contours in the header region and select those whose length lies within the length threshold range and whose width lies within the width threshold range of a character contour; the selected contours are the character contours;
(3) Merge the character contours: extract the color of each character contour, and merge into a complete field those character contours that have the same color and whose distance from one another is less than half the width of the character contour itself.
In this embodiment, step 5 comprises the following steps:
(1) Detect the number of fields in the header region;
(2) If the number of fields is 0, do not classify the image and exit;
(3) If the number of fields is 1, classify the image using a machine-learning SVM classifier. The SVM classifier must be trained in advance on a large number of headers, and bills that the SVM classifier cannot distinguish exit directly. This embodiment uses a prior-art machine-learning SVM classifier;
(4) If the number of fields is greater than 1, recognize the characters in the header region by OCR and match them against the type names in the image recognition library to perform classification: divide the number of matched characters by the total number of characters of the correct field being matched, and compare the result with a preset threshold Thr; if the result is greater than the preset threshold, classification succeeds; otherwise, classification fails and the process exits.
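The three-way branching of step 5 can be sketched as follows. The patent does not say how matched characters are counted, so this sketch assumes a multiset overlap between the OCR text and each type name; `svm_predict` is a hypothetical stand-in for a trained SVM classifier, and the default value of `thr` is arbitrary.

```python
from collections import Counter
from typing import Callable, List, Optional


def match_ratio(recognized: str, type_name: str) -> float:
    """Matched characters divided by the length of the candidate type name."""
    if not type_name:
        return 0.0
    common = Counter(recognized) & Counter(type_name)
    return sum(common.values()) / len(type_name)


def classify(field_count: int,
             ocr_text: str,
             type_names: List[str],
             thr: float = 0.8,
             svm_predict: Optional[Callable[[], Optional[str]]] = None) -> Optional[str]:
    """Return the matched type name, or None when classification fails."""
    if field_count == 0:
        return None  # nothing in the header: do not classify
    if field_count == 1:
        # A single field is handed to the (assumed) pre-trained SVM classifier.
        return svm_predict() if svm_predict is not None else None
    best = max(type_names, key=lambda name: match_ratio(ocr_text, name))
    return best if match_ratio(ocr_text, best) > thr else None


names = ["VAT invoice", "receipt"]
label = classify(3, "VAT 1nvoice", names)  # one OCR error, 10 of 11 characters match
```

Returning None for every failure case mirrors the "exit" behavior described in this embodiment.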
Preferably, step 6 comprises the following steps:
(1) Make the template information and load the pre-made template information;
(2) Extract the successfully classified image, and find the contours of all bounding boxes contained within the contour of the largest bounding box in the image;
(3) Set the length threshold range and the width threshold range of a bounding box, and select the bounding boxes whose length lies within the length threshold range and whose width lies within the width threshold range;
(4) According to the position information of the selected bounding boxes, scan and sort all bounding boxes from top to bottom and from left to right to locate the table, and find the regions to be recognized in the table according to the template information (according to the template information, the position of each region to be recognized relative to the table is judged, to determine whether the region lies outside the table; if the region is inside the table, only the regions in the table need to be located and extracted; if the region is outside the table, the following step is performed);
(5) Determine from the template information whether anything outside the table needs to be recognized. If information outside the table needs to be recognized, field contours must be extracted outside the table: the character contours outside the table are screened using the method of step 4 and merged into complete fields; the regions to be recognized outside the table are determined from the relative position of the fields recorded in the template information and the largest bounding box in the image, and the fields to be recognized outside the largest bounding box are located according to the fields recorded in the template information.
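Sub-steps (3) and (4) of step 6 can be sketched as below. The sketch assumes cell bounding boxes given as (x, y, width, height), a row tolerance `row_tol` for grouping boxes whose top edges are nearly equal, and a template represented as a hypothetical mapping from field name to the cell's index in reading order. The patent does not specify the template format, so this layout is purely illustrative.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def filter_cells(boxes: List[Box],
                 width_range: Tuple[int, int],
                 length_range: Tuple[int, int]) -> List[Box]:
    """Keep bounding boxes whose size falls inside the threshold ranges."""
    return [
        b for b in boxes
        if width_range[0] <= b[2] <= width_range[1]
        and length_range[0] <= b[3] <= length_range[1]
    ]


def sort_reading_order(boxes: List[Box], row_tol: int = 10) -> List[Box]:
    """Sort boxes top to bottom, then left to right within each row."""
    rows: List[List[Box]] = []
    for b in sorted(boxes, key=lambda box: box[1]):
        if rows and abs(b[1] - rows[-1][0][1]) <= row_tol:
            rows[-1].append(b)
        else:
            rows.append([b])
    return [b for row in rows for b in sorted(row, key=lambda box: box[0])]


def locate_regions(cells: List[Box], template: Dict[str, int]) -> Dict[str, Box]:
    """Look up each template field's cell by its reading-order index."""
    return {name: cells[index] for name, index in template.items()}


raw = [(300, 102, 100, 40), (100, 100, 100, 40), (100, 150, 100, 40),
       (300, 149, 100, 40), (0, 0, 5, 5)]
cells = sort_reading_order(filter_cells(raw, (50, 200), (20, 60)))
regions = locate_regions(cells, {"payee": 0, "amount": 3})
```

The row tolerance absorbs the small vertical jitter that remains after deskewing, so cells on the same table row are still ordered left to right.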
In this embodiment, step 7 comprises the following steps:
(1) Crop the field images according to the position information of the regions to be recognized obtained in step 6;
(2) Recognize the located fields by OCR.
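Step 7 amounts to cropping each located region and passing it to an OCR engine. The sketch below assumes the image is a list of pixel rows and takes the OCR engine as a caller-supplied function `ocr`; that parameter is a hypothetical placeholder, since the patent does not name a specific OCR engine.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)
Image = List[List[int]]


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangle described by box out of the image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def recognize_fields(image: Image,
                     regions: Dict[str, Box],
                     ocr: Callable[[Image], str]) -> Dict[str, str]:
    """Crop every located region and run the supplied OCR function on it."""
    return {name: ocr(crop(image, box)) for name, box in regions.items()}


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
patch = crop(image, (1, 1, 2, 2))
```

Passing the OCR engine in as a function keeps the cropping logic testable without any OCR library installed.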
In this embodiment, step 8 comprises the following steps:
(1) Extract the fields recognized by OCR;
(2) Optimize each field according to its type: for lowercase-type fields (amounts written in digits), remove the non-numeric part; for date fields, filter out spaces, non-numeric characters and the word "date";
(3) Dictionary optimization: build a dictionary library and match each OCR-recognized field against the fields in the dictionary library. If the matching score is greater than a preset threshold scoreThr, the field in the dictionary library is replaced with the OCR-recognized field, thereby optimizing and updating the fields in the dictionary library; at the same time, manually confirmed correct fields are continuously added to the dictionary library. The matching score equals the number of characters correctly recognized by OCR divided by the total number of characters of the dictionary entry currently being matched.
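The field clean-up and matching score of step 8 can be sketched as follows. Several details are assumptions: the decimal point is kept in numeric fields, the literal word "date" is matched case-insensitively in English, and "correctly recognized" characters are counted position by position against the dictionary entry. The sketch only reports the best-scoring entry; what is then done with the dictionary library is left to the caller.

```python
import re
from typing import List, Optional, Tuple


def clean_numeric(text: str, keep: str = ".") -> str:
    """Drop everything except digits and the characters listed in keep."""
    return "".join(ch for ch in text if ch.isdigit() or ch in keep)


def clean_date(text: str) -> str:
    """Remove the word 'date', spaces and any other non-digit characters."""
    without_label = re.sub(r"date", "", text, flags=re.IGNORECASE)
    return re.sub(r"[^0-9]", "", without_label)


def match_score(ocr_text: str, entry: str) -> float:
    """Position-wise correct characters divided by the entry's length."""
    if not entry:
        return 0.0
    correct = sum(1 for a, b in zip(ocr_text, entry) if a == b)
    return correct / len(entry)


def best_dictionary_match(ocr_text: str,
                          dictionary: List[str],
                          score_thr: float) -> Optional[Tuple[str, float]]:
    """Return the best entry and its score if the score exceeds score_thr."""
    if not dictionary:
        return None
    entry = max(dictionary, key=lambda e: match_score(ocr_text, e))
    score = match_score(ocr_text, entry)
    return (entry, score) if score > score_thr else None


amount = clean_numeric("¥1,234.50元")
day = clean_date("Date: 2017 - 05 - 12")
hit = best_dictionary_match("Jiangsu Hongx1n", ["Jiangsu Hongxin", "Nanjing"], 0.9)
```

A position-wise comparison is the simplest reading of the score definition; an edit-distance measure would tolerate inserted or dropped characters better.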
The invention can quickly classify formatted fax documents and extract information from them, with fast and accurate classification and high extraction accuracy. The prior art includes methods that retrieve and classify fax files but cannot extract field information, and methods that recognize images but cannot recognize formatted fax files. There has therefore been no effective method for extracting information from formatted fax documents; the method proposed here fills this technical gap, improves clerical efficiency, frees up productivity and saves labor costs.
The protection scope of the invention includes but is not limited to the above embodiment. The scope of protection is defined by the claims, and any replacement, variation or improvement of this technique that would readily occur to those skilled in the art falls within the protection scope of the invention.

Claims (8)

1. An OCR-based method for classifying formatted faxes and extracting information from them, characterized in that it comprises the following steps:
Step 1: Obtain the image file of the fax and perform adaptive-threshold binarization on the image to reduce noise interference;
Step 2: Determine the tilt angle of the image and correct the image;
Step 3: Find the contour of the largest bounding box of the table in the corrected image, and crop the table header region from the area of the image above the largest bounding box of the table;
Step 4: Screen the character contours in the header region and merge them, so that the character contours are merged into complete fields;
Step 5: Detect the number of fields in the header region after merging, and classify the image according to the number of fields in the header region and the content of the fields;
Step 6: Extract the successfully classified image and locate the regions to be recognized in the image;
Step 7: Recognize the fields in the regions to be recognized in the table according to the positions of those regions in the table and OCR technology;
Step 8: Optimize the recognized fields.
2. The OCR-based method for classifying formatted faxes and extracting information from them according to claim 1, characterized in that step 1 comprises the following steps:
(1) Obtain the image file of the fax, convert the image to the HSV color space, and remove the pixels that fall within the red interval;
(2) Determine the binarization threshold at each pixel position from the distribution of pixel values in the neighborhood block of that pixel, and perform adaptive-threshold binarization on the image to reduce noise interference.
3. The OCR-based method for classifying formatted faxes and extracting information from them according to claim 2, characterized in that step 2 comprises finding the longest straight line in the image and rotating the image to correct it according to the angle between that line and the horizontal direction.
4. The OCR-based method for classifying formatted faxes and extracting information from them according to claim 3, characterized in that step 4 comprises the following steps:
(1) Set the length threshold range and the width threshold range of a character contour;
(2) Retrieve the contours in the header region and select those whose length lies within the length threshold range and whose width lies within the width threshold range of a character contour; the selected contours are the character contours;
(3) Merge the character contours: extract the color of each character contour, and merge into a complete field those character contours that are close in color and whose distance from one another is less than half the width of the character contour itself.
5. The OCR-based method for classifying formatted faxes and extracting information from them according to claim 4, characterized in that step 5 comprises the following steps:
(1) Detect the number of fields in the header region;
(2) If the number of fields is 0, do not classify the image;
(3) If the number of fields is 1, classify the image using a machine-learning SVM classifier;
(4) If the number of fields is greater than 1, recognize the characters in the header region by OCR and match them against the type names in the image recognition library to perform classification: divide the number of matched characters by the total number of characters of the correct field being matched, and compare the result with a preset threshold; if the result is greater than the preset threshold, classification succeeds; otherwise, classification fails.
6. The OCR-based method for classifying formatted faxes and extracting information from them according to claim 5, characterized in that step 6 comprises the following steps:
(1) Load the template information made in advance;
(2) Extract the image successfully classified in step 5, and find the contours of all bounding boxes contained within the contour of the largest bounding box in the image;
(3) Set the length threshold range and the width threshold range of a bounding box, and select the bounding boxes whose length lies within the length threshold range and whose width lies within the width threshold range;
(4) According to the position information of the selected bounding boxes, scan and sort all bounding boxes from top to bottom and from left to right to locate the table, and find the regions to be recognized in the table according to the template information;
(5) Determine from the template information whether anything outside the table needs to be recognized. If information outside the table needs to be recognized, field contours must be extracted outside the table: the character contours outside the table are screened using the method of step 4 and merged into complete fields; the regions to be recognized outside the table are determined from the relative position of the fields recorded in the template information and the largest bounding box in the image, and the fields to be recognized outside the largest bounding box are located according to the fields recorded in the template information.
7. The OCR-based method for classifying formatted faxes and extracting information from them according to claim 6, characterized in that step 7 comprises the following steps:
(1) Crop the field images according to the position information of the regions to be recognized obtained in step 6;
(2) Recognize the located fields by OCR.
8. The OCR-based method for classifying formatted faxes and extracting information from them according to claim 1, characterized in that step 8 comprises the following steps:
(1) Extract the fields recognized by OCR;
(2) Optimize each field according to its type: for lowercase-type fields (amounts written in digits), remove the non-numeric part; for date fields, filter out spaces, non-numeric characters and the word "date";
(3) Dictionary optimization: build a dictionary library and match each OCR-recognized field against the fields in the dictionary library. If the matching score is greater than a preset threshold, the field in the dictionary library is replaced with the OCR-recognized field, thereby optimizing and updating the fields in the dictionary library; at the same time, manually confirmed correct fields are added to the dictionary library. The matching score equals the number of characters correctly recognized by OCR divided by the total number of characters of the dictionary entry currently being matched.
CN201710334784.5A 2017-05-12 2017-05-12 Method for classifying and extracting information of formatted fax based on OCR Active CN107133621B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710334784.5A CN107133621B (en) 2017-05-12 2017-05-12 Method for classifying and extracting information of formatted fax based on OCR

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710334784.5A CN107133621B (en) 2017-05-12 2017-05-12 Method for classifying and extracting information of formatted fax based on OCR

Publications (2)

Publication Number Publication Date
CN107133621A true CN107133621A (en) 2017-09-05
CN107133621B CN107133621B (en) 2020-09-29

Family

ID=59733140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710334784.5A Active CN107133621B (en) 2017-05-12 2017-05-12 Method for classifying and extracting information of formatted fax based on OCR

Country Status (1)

Country Link
CN (1) CN107133621B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107633239A (en) * 2017-10-18 2018-01-26 江苏鸿信系统集成有限公司 Bill classification and bill field extracting method based on deep learning and OCR
CN108038504A (en) * 2017-12-11 2018-05-15 深圳房讯通信息技术有限公司 A kind of method for parsing property ownership certificate photo content
CN108509401A (en) * 2018-03-05 2018-09-07 平安普惠企业管理有限公司 Contract generation method, device, computer equipment and storage medium
CN108830133A (en) * 2018-04-17 2018-11-16 平安科技(深圳)有限公司 Recognition methods, electronic device and the readable storage medium storing program for executing of contract image picture
CN109816118A (en) * 2019-01-25 2019-05-28 上海深杳智能科技有限公司 A kind of method and terminal of the creation structured document based on deep learning model
WO2019104879A1 (en) * 2017-11-30 2019-06-06 平安科技(深圳)有限公司 Information recognition method for form-type image, electronic device and readable storage medium
CN110119648A (en) * 2018-02-05 2019-08-13 国家计算机网络与信息安全管理中心 A kind of facsimile signal classification method based on optical character identification
CN110674332A (en) * 2019-08-01 2020-01-10 南昌市微轲联信息技术有限公司 Motor vehicle digital electronic archive classification method based on OCR and text mining
CN111767769A (en) * 2019-08-14 2020-10-13 北京京东尚科信息技术有限公司 Text extraction method and device, electronic equipment and storage medium
CN112528984A (en) * 2020-12-18 2021-03-19 平安银行股份有限公司 Image information extraction method, device, electronic equipment and storage medium
CN112560859A (en) * 2020-11-20 2021-03-26 中电鸿信信息科技有限公司 Intelligent academic calendar information extraction method based on machine vision and natural language processing
CN112732955A (en) * 2021-03-31 2021-04-30 国网浙江省电力有限公司 Financial certificate storage and recording method in standard cost accounting
CN112733518A (en) * 2021-01-14 2021-04-30 卫宁健康科技集团股份有限公司 Table template generation method, device, equipment and storage medium
CN115273111A (en) * 2022-06-27 2022-11-01 北京互时科技股份有限公司 Device for identifying drawing material sheet without template

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020181777A1 (en) * 2001-05-30 2002-12-05 International Business Machines Corporation Image processing method, image processing system and program
US20030048490A1 (en) * 2001-09-07 2003-03-13 Yasushi Yanagihara Image recognizing apparatus
CN102750541A (en) * 2011-04-22 2012-10-24 北京文通科技有限公司 Document image classifying distinguishing method and device
CN103208004A (en) * 2013-03-15 2013-07-17 北京英迈杰科技有限公司 Automatic recognition and extraction method and device for bill information area
CN103258198A (en) * 2013-04-26 2013-08-21 四川大学 Extraction method for characters in form document image

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LIANG XU ET AL: "A knowledge-based table recognition method for Chinese bank statement images", 《2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP)》 *
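杜刚 (Du Gang): "银行票据识别系统的研究" [Research on a bank bill recognition system], 《中国优秀硕士学位论文全文数据库(电子期刊)信息科技辑》 [China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology] *
邹亚劼 (Zou Yajie): "基于OCR的文档图片检测与信息提取系统的研究" [Research on an OCR-based document image detection and information extraction system], 《万方学位论文库》 [Wanfang Dissertation Database] *
高鸿 (Gao Hong): "文档图像拼接技术研究" [Research on document image stitching techniques], 《万方学位论文库》 [Wanfang Dissertation Database] *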

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107633239A (en) * 2017-10-18 2018-01-26 江苏鸿信系统集成有限公司 Bill classification and bill field extracting method based on deep learning and OCR
WO2019104879A1 (en) * 2017-11-30 2019-06-06 平安科技(深圳)有限公司 Information recognition method for form-type image, electronic device and readable storage medium
CN108038504A (en) * 2017-12-11 2018-05-15 深圳房讯通信息技术有限公司 A kind of method for parsing property ownership certificate photo content
CN108038504B (en) * 2017-12-11 2019-12-27 深圳房讯通信息技术有限公司 Method for analyzing content of house property certificate photo
CN110119648A (en) * 2018-02-05 2019-08-13 国家计算机网络与信息安全管理中心 A kind of facsimile signal classification method based on optical character identification
CN108509401A (en) * 2018-03-05 2018-09-07 平安普惠企业管理有限公司 Contract generation method, device, computer equipment and storage medium
CN108830133A (en) * 2018-04-17 2018-11-16 平安科技(深圳)有限公司 Recognition methods, electronic device and the readable storage medium storing program for executing of contract image picture
CN108830133B (en) * 2018-04-17 2020-02-21 平安科技(深圳)有限公司 Contract image picture identification method, electronic device and readable storage medium
CN109816118B (en) * 2019-01-25 2022-12-06 上海深杳智能科技有限公司 Method and terminal for creating structured document based on deep learning model
CN109816118A (en) * 2019-01-25 2019-05-28 上海深杳智能科技有限公司 A kind of method and terminal of the creation structured document based on deep learning model
CN110674332A (en) * 2019-08-01 2020-01-10 南昌市微轲联信息技术有限公司 Motor vehicle digital electronic archive classification method based on OCR and text mining
CN110674332B (en) * 2019-08-01 2022-11-15 南昌市微轲联信息技术有限公司 Motor vehicle digital electronic archive classification method based on OCR and text mining
CN111767769A (en) * 2019-08-14 2020-10-13 北京京东尚科信息技术有限公司 Text extraction method and device, electronic equipment and storage medium
CN112560859A (en) * 2020-11-20 2021-03-26 中电鸿信信息科技有限公司 Intelligent academic calendar information extraction method based on machine vision and natural language processing
CN112528984A (en) * 2020-12-18 2021-03-19 平安银行股份有限公司 Image information extraction method, device, electronic equipment and storage medium
CN112733518A (en) * 2021-01-14 2021-04-30 卫宁健康科技集团股份有限公司 Table template generation method, device, equipment and storage medium
CN112732955A (en) * 2021-03-31 2021-04-30 国网浙江省电力有限公司 Financial certificate storage and recording method in standard cost accounting
CN115273111A (en) * 2022-06-27 2022-11-01 北京互时科技股份有限公司 Device for identifying drawing material sheet without template

Also Published As

Publication number Publication date
CN107133621B (en) 2020-09-29

Similar Documents

Publication Publication Date Title
CN107133621A (en) The classification of formatting fax based on OCR and information extracting method
CN107633239B (en) Bill classification and bill field extraction method based on deep learning and OCR
CN102567764B (en) A kind of bill evidence and system improving electron image recognition efficiency
CN103258198B (en) Character extracting method in a kind of form document image
CN105654072A (en) Automatic character extraction and recognition system and method for low-resolution medical bill image
JP5830338B2 (en) Form recognition method and form recognition apparatus
CN101887521B (en) Method and terminal for rectifying deviation of file
CN111476109A (en) Bill processing method, bill processing apparatus, and computer-readable storage medium
US20040008884A1 (en) System and method for scanned image bleedthrough processing
CN102855232A (en) Table analysis and edit processing method
JP2014131277A (en) Document image compression method and application of the same to document authentication
US20090180694A1 (en) Method and apparatus for determining an orientation of a document including Korean characters
CN104966051A (en) Method of recognizing layout of document image
JP2012500428A (en) Segment print pages into articles
CN112949471A (en) Domestic CPU-based electronic official document identification reproduction method and system
US11443504B2 (en) Image box filtering for optical character recognition
CN105303363A (en) Data processing method and data processing system
CN111091090A (en) Bank report OCR recognition method, device, platform and terminal
CN104376317A (en) Method for transforming paper file into electronic file
US11436733B2 (en) Image processing apparatus, image processing method and storage medium
JP5887242B2 (en) Image processing apparatus, image processing method, and program
CN101894255B (en) Wavelet transform-based container number positioning method
TWI772199B (en) Accounting management system for recognizes accounting voucher image to automatically obtain accounting related information
CN111445433B (en) Method and device for detecting blank page and fuzzy page of electronic file
JP4205554B2 (en) Form processing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 210005 No. 268, Hanzhong Road, Nanjing, Jiangsu

Applicant after: CLP Hongxin Information Technology Co., Ltd

Address before: 210005 No. 268, Hanzhong Road, Nanjing, Jiangsu

Applicant before: Jiangsu Hongxin System Integration Co., Ltd.

GR01 Patent grant