CN107133621A - Method for classifying and extracting information of formatted fax based on OCR - Google Patents
Method for classifying and extracting information of formatted fax based on OCR
- Publication number
- CN107133621A (application CN201710334784.5A)
- Authority
- CN
- China
- Prior art keywords
- field
- image
- ocr
- region
- classification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
- G06V10/242—Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/28—Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/158—Segmentation of character regions using character size, text spacings or pitch estimation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Character Discrimination (AREA)
- Character Input (AREA)
Abstract
The invention discloses an OCR-based method for classifying formatted faxes and extracting information from them, comprising: binarizing the fax image with an adaptive threshold; correcting the image; finding the contour of the largest bounding box of the table in the corrected image and cropping the table header region from the area of the image above that bounding box; screening the character contours in the header region and merging them; detecting the number of merged fields in the header region and classifying the image; taking the successfully classified image and locating the regions to be recognized in it; recognizing the fields in the regions to be recognized in the table using OCR; and optimizing the recognized fields. The invention improves office efficiency, frees up staff productivity, and converts unstructured data into structured data. It is suited to formatted faxes, that is, faxes of table images such as standardized contracts, self-made vouchers, and bills.
Description
Technical field
The present invention relates to the field of image processing, and in particular to an OCR-based method for classifying formatted faxes and extracting information from them.
Background art
With the development of science and technology, transnational and cross-regional business exchanges are becoming more and more frequent. Because fax has a special legal effect compared with other document transmission methods, it is widely used in office systems. Formatted fax documents contain a large amount of useful information, yet at present they must be classified by hand and their important information extracted manually, which is inefficient. An efficient and fast method for document classification and information extraction is urgently needed to raise staff efficiency, reduce labor costs, and free up productivity.
Chinese Patent Publication No. CN101876999 discloses a method for generating a fax index, a message analysis device, and a fax retrieval system. The system performs layout analysis on a fax message, extracts feature information from it, creates a label for the message from the extracted feature information, and uses the label as the index of the message so that users can look up the corresponding fax message by label. However, the system can only classify and index documents; it has difficulty extracting the key information in them.
Chinese Patent Publication No. CN102222289 discloses an OCR-based mobile phone financial management method and system. The system analyzes and recognizes financial bills using OCR, but it is not aimed at scanned formatted faxes and cannot classify fax images or extract information from them.
Summary of the invention
The technical problem to be solved by the invention is to provide, in view of the above deficiencies of the prior art, an OCR-based method for classifying formatted faxes and extracting information from them. The method improves office efficiency, frees up staff productivity, and converts unstructured data into structured data. The invention is suited to formatted faxes, that is, faxes of table images such as standardized contracts, self-made vouchers, and bills.
To achieve the above technical purpose, the invention adopts the following technical solution:
An OCR-based method for classifying formatted faxes and extracting information, comprising the following steps:
Step 1: obtain the image file of the fax and binarize the image with an adaptive threshold to reduce noise interference;
Step 2: determine the tilt angle of the image and correct the image;
Step 3: find the contour of the largest bounding box of the table in the corrected image, and crop the table header region from the area of the image above that bounding box;
Step 4: screen the character contours in the header region and merge them, so that the character contours are combined into complete fields;
Step 5: detect the number of merged fields in the header region, and classify the image according to the number of fields in the header region and their content;
Step 6: take the successfully classified image and locate the regions to be recognized in it;
Step 7: recognize the fields in the regions to be recognized in the table, according to the positions of those regions in the table and using OCR;
Step 8: optimize the recognized fields.
As a further improvement of the technical solution, step 1 comprises the following steps:
(1) obtain the image file of the fax, convert the image to the HSV color space, and remove the pixels that fall in the red range;
(2) determine the binarization threshold at each pixel position from the distribution of pixel values in the neighborhood block of that pixel, and binarize the image with this adaptive threshold to reduce noise interference.
As a further improvement of the technical solution, step 2 comprises finding the longest straight line in the image and rotating the image to correct it according to the angle between that line and the horizontal direction.
As a further improvement of the technical solution, step 4 comprises the following steps:
(1) set the length threshold range and the width threshold range for character contours;
(2) retrieve the contours in the header region and keep those whose length lies within the character-contour length threshold range and whose width lies within the character-contour width threshold range; the contours kept are the character contours;
(3) merge the character contours: extract the color of each character contour, and merge into a complete field those characters whose contours are similar in color and whose spacing is less than half the width of the character contour itself.
As a further improvement of the technical solution, step 5 comprises the following steps:
(1) detect the number of fields in the header region;
(2) if the number of fields is 0, the image is not classified;
(3) if the number of fields is 1, classify the image using a machine-learning SVM classifier;
(4) if the number of fields is greater than 1, recognize the text in the header region by OCR and match it against the type names in the image recognition library to perform the classification; divide the total number of matched characters by the total number of characters of the correctly matched field and compare the result with a preset threshold; if the result is greater than the preset threshold, classification succeeds; otherwise, classification fails.
As a further improvement of the technical solution, step 6 comprises the following steps:
(1) load the template information prepared in advance;
(2) take the image successfully classified in step 5 and find the contours of all bounding boxes contained within the contour of the largest bounding box in the image;
(3) set the length threshold range and the width threshold range for bounding boxes, and keep the bounding boxes whose length lies within the length threshold range and whose width lies within the width threshold range;
(4) using the position information of the bounding boxes kept, scan and sort all bounding boxes from top to bottom and from left to right, thereby locating the table, and find the regions to be recognized in the table according to the template information;
(5) determine from the template information whether anything outside the table needs to be recognized; if information outside the table needs to be recognized, extract field contours outside the table: screen the character contours outside the table using the method of step 4 and merge them into complete fields, determine the regions to be recognized outside the table from the relative position of the fields recorded in the template information and the largest bounding box in the image, and locate the positions of the fields to be recognized outside the largest bounding box according to the fields recorded in the template information.
As a further improvement of the technical solution, step 7 comprises the following steps:
(1) crop the field images according to the position information of the regions to be recognized obtained in step 6;
(2) recognize the located fields by OCR.
As a further improvement of the technical solution, step 8 comprises the following steps:
(1) extract the fields recognized by OCR;
(2) optimize according to field type: for lowercase (numeric amount) fields, remove the non-numeric parts; for date fields, remove spaces, non-numeric characters, and the word "date";
(3) dictionary optimization: build a dictionary library and match the fields recognized by OCR against the fields in the dictionary library; if the matching score is greater than a preset threshold, the field in the dictionary library is replaced with the OCR-recognized field so that the fields in the dictionary library are optimized and updated; meanwhile, manually confirmed correct fields are added to the dictionary library; the matching score equals the number of characters correctly recognized by OCR divided by the total number of characters of the currently matched entry in the dictionary library.
The invention can quickly classify formatted fax documents and extract information from them; classification is fast and accurate, and information extraction accuracy is high. The prior art includes methods that search and classify faxes but cannot extract field information, and methods that recognize images but cannot recognize formatted faxes. There has therefore been no effective method for extracting information from formatted fax documents. The method proposed here fills this technical gap, improves office efficiency, frees up productivity, and saves labor costs.
Brief description of the drawings
Fig. 1 is flow chart of the invention.
Detailed description of the embodiments
An embodiment of the invention is further described below with reference to Fig. 1:
Referring to Fig. 1, this embodiment applies to any formatted fax, where a formatted fax is a fax of an image containing a table. This embodiment takes the fax of a bill as an example, as follows:
An OCR-based method for classifying formatted faxes and extracting information, comprising the following steps:
Step 1: obtain the image file of the faxed bill and binarize the image with an adaptive threshold to reduce noise interference;
Step 2: determine the tilt angle of the image and correct the image;
Step 3: find the contour of the largest bounding box of the table in the corrected image, and crop the bill header region from the area of the image above that bounding box;
Step 4: screen the character contours in the header region and merge them, so that the character contours are combined into complete fields;
Step 5: detect the number of merged fields in the header region, and classify the image according to the number of fields in the header region and their content;
Step 6: take the successfully classified image and locate the regions to be recognized in it (both inside and outside the table);
Step 7: recognize the fields in the regions to be recognized in the table, according to the positions of those regions in the table and using OCR;
Step 8: optimize the recognized fields.
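Step 3 (finding the largest bounding box of the table and cropping the header above it) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes bounding boxes are already available as `(x, y, w, h)` tuples, that the image is a list of pixel rows, and that the header strip spans the full image width. The function names are hypothetical.

```python
def largest_box(boxes):
    """Return the (x, y, w, h) box with the largest area (assumed to be the table)."""
    return max(boxes, key=lambda b: b[2] * b[3])


def header_region(image_width, table_box):
    """Return the strip above the table as (x, y, w, h).

    Assumption: the header strip runs across the full image width and
    from the top of the image down to the top edge of the table.
    """
    _x, top, _w, _h = table_box
    return (0, 0, image_width, top)


def crop(image, region):
    """Crop a list-of-rows image to the given (x, y, w, h) region."""
    x, y, w, h = region
    return [row[x:x + w] for row in image[y:y + h]]
```

For example, with a table box whose top edge is at `y = 40` in a 220-pixel-wide image, `header_region(220, table_box)` yields `(0, 0, 220, 40)`.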
In this embodiment, step 1 comprises the following steps:
(1) obtain the image file of the fax, convert the image to the HSV color space, and remove the pixels that fall in the red range (this removes red seals);
(2) determine the binarization threshold at each pixel position from the distribution of pixel values in the neighborhood block of that pixel, and binarize the image with this adaptive threshold to reduce noise interference.
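The two sub-steps of step 1 can be sketched in plain Python as below. The hue, saturation, and value limits for "red", the 3x3 neighborhood, the mean-based threshold, and the offset are all illustrative assumptions; the text only states that red-range pixels are removed in HSV space and that the threshold depends on each pixel's neighborhood.

```python
import colorsys


def remove_red(rgb_image, sat_min=0.4, val_min=0.4):
    """Replace red-range pixels (e.g. a red seal) with white.

    rgb_image is a list of rows of (r, g, b) tuples with 0-255 channels.
    The red hue band and the saturation/value limits are assumptions.
    """
    cleaned = []
    for row in rgb_image:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            is_red = (h <= 10 / 360 or h >= 340 / 360) and s >= sat_min and v >= val_min
            new_row.append((255, 255, 255) if is_red else (r, g, b))
        cleaned.append(new_row)
    return cleaned


def to_gray(rgb_image):
    """Convert to grayscale with the common 0.299/0.587/0.114 weights."""
    return [[(r * 299 + g * 587 + b * 114) // 1000 for r, g, b in row]
            for row in rgb_image]


def adaptive_binarize(gray, block=3, offset=5):
    """Mark a pixel as foreground (1) if it is darker than its local mean minus offset."""
    height, width = len(gray), len(gray[0])
    half = block // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ys = range(max(0, y - half), min(height, y + half + 1))
            xs = range(max(0, x - half), min(width, x + half + 1))
            neighborhood = [gray[j][i] for j in ys for i in xs]
            threshold = sum(neighborhood) / len(neighborhood) - offset
            out[y][x] = 1 if gray[y][x] < threshold else 0
    return out
```

A production system would more likely use an image library for both operations; the pure-Python loops here only make the per-pixel logic explicit.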
Preferably, step 2 is specifically: find the longest straight line in the image and rotate the image to correct it according to the angle between that line and the horizontal direction.
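The angle computation in step 2 can be sketched as follows. The sketch assumes line segments have already been detected by some line detector (the text does not name one) and are given as `(x1, y1, x2, y2)` tuples; the function names are hypothetical. Rotating the actual pixel grid is left to an image library, so only the angle and a point rotation are shown.

```python
import math


def skew_angle_degrees(segments):
    """Angle in degrees between the longest segment and the horizontal axis."""
    x1, y1, x2, y2 = max(
        segments, key=lambda s: math.hypot(s[2] - s[0], s[3] - s[1])
    )
    if x2 < x1:  # orient the segment left to right before measuring
        x1, y1, x2, y2 = x2, y2, x1, y1
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def rotate_point(x, y, angle_degrees, cx=0.0, cy=0.0):
    """Rotate a point about (cx, cy) by -angle, undoing the measured skew."""
    theta = math.radians(-angle_degrees)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(theta) - dy * math.sin(theta),
            cy + dx * math.sin(theta) + dy * math.cos(theta))
```

Applying `rotate_point` with the measured angle to the endpoints of the longest line maps that line back onto the horizontal.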
In this embodiment, step 4 comprises the following steps:
(1) set the length threshold range and the width threshold range for character contours;
(2) retrieve the contours in the header region and keep those whose length lies within the character-contour length threshold range and whose width lies within the character-contour width threshold range; the contours kept are the character contours;
(3) merge the character contours: extract the color of each character contour, and merge into a complete field those characters whose contours are identical in color and whose spacing is less than half the width of the character contour itself.
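The screening and merging in step 4 can be sketched on bounding boxes as below. Several details are assumptions made for illustration: each contour is represented by an `(x, y, w, h, color)` tuple, the patent's "length" and "width" are mapped to the box's horizontal and vertical extents, colors are compared for equality, and only horizontally adjacent characters on one line are merged.

```python
def filter_character_boxes(boxes, w_range, h_range):
    """Keep boxes whose horizontal and vertical sizes fall inside the given ranges."""
    return [b for b in boxes
            if w_range[0] <= b[2] <= w_range[1] and h_range[0] <= b[3] <= h_range[1]]


def merge_into_fields(boxes):
    """Merge neighboring same-color character boxes into field boxes.

    A character joins the previous field when it overlaps it vertically and
    the horizontal gap is less than half of the character's own width.
    Simplification: each box is compared only with the most recent field.
    """
    fields = []
    for x, y, w, h, color in sorted(boxes):
        if fields:
            fx, fy, fw, fh, fcolor = fields[-1]
            gap = x - (fx + fw)
            same_line = y < fy + fh and fy < y + h
            if color == fcolor and same_line and gap < w / 2:
                right = max(fx + fw, x + w)
                top, bottom = min(fy, y), max(fy + fh, y + h)
                fields[-1] = (fx, top, right - fx, bottom - top, fcolor)
                continue
        fields.append((x, y, w, h, color))
    return fields
```

With 10-pixel-wide characters, gaps of 2 pixels are merged while a gap of 26 pixels starts a new field, which is what lets step 5 count the header fields.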
In this embodiment, step 5 comprises the following steps:
(1) detect the number of fields in the header region;
(2) if the number of fields is 0, the image is not classified and the process exits;
(3) if the number of fields is 1, classify the image using a machine-learning SVM classifier; the SVM classifier must be trained in advance on a large number of table headers, and for bills that the SVM classifier cannot distinguish the process exits directly; this embodiment uses a prior-art machine-learning SVM classifier;
(4) if the number of fields is greater than 1, recognize the text in the header region by OCR and match it against the type names in the image recognition library to perform the classification; divide the total number of matched characters by the total number of characters of the correctly matched field and compare the result with a preset threshold Thr; if the result is greater than the preset threshold, classification succeeds; otherwise, classification fails and the process exits.
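The branching in step 5 can be sketched as follows. The sketch rests on several assumptions: the OCR text of each header field is already available, the SVM classifier is passed in as an opaque callable (`svm_classify` is a hypothetical name), "matched characters" are counted with Python's `difflib.SequenceMatcher`, and the ratio is taken over the length of the candidate type name. The value of `thr` is illustrative only.

```python
from difflib import SequenceMatcher


def match_ratio(recognized, type_name):
    """Matched characters divided by the number of characters in the type name."""
    if not type_name:
        return 0.0
    blocks = SequenceMatcher(None, recognized, type_name).get_matching_blocks()
    return sum(block.size for block in blocks) / len(type_name)


def classify_header(header_image, field_texts, type_names, svm_classify, thr=0.8):
    """Return a type name, or None when the image is left unclassified."""
    if len(field_texts) == 0:
        return None  # no header field: do not classify
    if len(field_texts) == 1:
        return svm_classify(header_image)  # single field: defer to the SVM
    header_text = " ".join(field_texts)
    best = max(type_names, key=lambda name: match_ratio(header_text, name),
               default=None)
    if best is not None and match_ratio(header_text, best) > thr:
        return best
    return None  # classification failed
```

A header OCR'd as `["VAT", "INVOlCE"]` still matches the type name `"VAT INVOICE"` because ten of its eleven characters line up, which is above the illustrative threshold.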
Preferably, step 6 comprises the following steps:
(1) prepare the template information and load the template information prepared in advance;
(2) take the successfully classified image and find the contours of all bounding boxes contained within the contour of the largest bounding box in the image;
(3) set the length threshold range and the width threshold range for bounding boxes, and keep the bounding boxes whose length lies within the length threshold range and whose width lies within the width threshold range;
(4) using the position information of the bounding boxes kept, scan and sort all bounding boxes from top to bottom and from left to right, thereby locating the table, and find the regions to be recognized in the table according to the template information (from the template information, determine the position of each region to be recognized relative to the table and hence whether the region lies outside the table; if the region lies inside the table, it only needs to be located and extracted within the table; if the region lies outside the table, the following step is performed);
(5) determine from the template information whether anything outside the table needs to be recognized; if information outside the table needs to be recognized, extract field contours outside the table: screen the character contours outside the table using the method of step 4 and merge them into complete fields, determine the regions to be recognized outside the table from the relative position of the fields recorded in the template information and the largest bounding box in the image, and locate the positions of the fields to be recognized outside the largest bounding box according to the fields recorded in the template information.
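The sorting and template lookup in sub-step (4) can be sketched as below. The template format is not specified in the text, so this sketch assumes a hypothetical template that maps each field name to the index of its cell in reading order; the row tolerance used to group cells into rows is likewise an assumption.

```python
def sort_reading_order(boxes, row_tolerance=10):
    """Sort (x, y, w, h) boxes top to bottom, then left to right.

    Boxes whose top edges lie within row_tolerance pixels of the first box
    in the current row are treated as belonging to that row.
    """
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):
        if rows and abs(box[1] - rows[-1][0][1]) <= row_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [box for row in rows for box in sorted(row, key=lambda b: b[0])]


def locate_regions(sorted_boxes, template):
    """Map each template field name to its cell box (hypothetical template format)."""
    return {name: sorted_boxes[index]
            for name, index in template.items()
            if index < len(sorted_boxes)}
```

Grouping by row before sorting by `x` keeps a slightly skewed row together instead of interleaving its cells with those of the next row.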
In this embodiment, step 7 comprises the following steps:
(1) crop the field images according to the position information of the regions to be recognized obtained in step 6;
(2) recognize the located fields by OCR.
In this embodiment, step 8 comprises the following steps:
(1) extract the fields recognized by OCR;
(2) optimize according to field type: for lowercase (numeric amount) fields, remove the non-numeric parts; for date fields, remove spaces, non-numeric characters, and the word "date";
(3) dictionary optimization: build a dictionary library and match the fields recognized by OCR against the fields in the dictionary library; if the matching score is greater than a preset threshold scoreThr, the field in the dictionary library is replaced with the OCR-recognized field so that the fields in the dictionary library are optimized and updated; meanwhile, manually confirmed correct fields are continuously added to the dictionary library; the matching score equals the number of characters correctly recognized by OCR divided by the total number of characters of the currently matched entry in the dictionary library.
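The type-specific cleanup and the matching score of step 8 can be sketched as follows. The field type labels (`"amount"`, `"date"`), keeping the decimal point in amounts, counting "correctly recognized" characters with `difflib.SequenceMatcher`, and the value of `score_thr` are assumptions for illustration. The sketch only finds the best dictionary entry and its score; what is done with a match above the threshold is left to the caller.

```python
import re
from difflib import SequenceMatcher


def clean_field(text, field_type):
    """Strip characters that cannot belong to the given field type."""
    if field_type == "amount":
        # Assumption: the decimal point is kept as part of the numeric amount.
        return re.sub(r"[^0-9.]", "", text)
    if field_type == "date":
        text = re.sub(r"date", "", text, flags=re.IGNORECASE)  # drop the label word
        return re.sub(r"\D", "", text)  # drop spaces and other non-digits
    return text.strip()


def match_score(ocr_text, entry):
    """Characters of the entry found in the OCR text, divided by the entry length."""
    if not entry:
        return 0.0
    blocks = SequenceMatcher(None, ocr_text, entry).get_matching_blocks()
    return sum(block.size for block in blocks) / len(entry)


def best_dictionary_match(ocr_text, dictionary, score_thr=0.8):
    """Return (entry, score) for the best entry above score_thr, else None."""
    scored = [(match_score(ocr_text, entry), entry) for entry in dictionary]
    if not scored:
        return None
    score, entry = max(scored)
    return (entry, score) if score > score_thr else None
```

For instance, an OCR result of `"Bank of Chlna"` scores 12/13 against the entry `"Bank of China"`, comfortably above the illustrative threshold.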
The invention can quickly classify formatted fax documents and extract information from them; classification is fast and accurate, and information extraction accuracy is high. The prior art includes methods that search and classify faxes but cannot extract field information, and methods that recognize images but cannot recognize formatted faxes. There has therefore been no effective method for extracting information from formatted fax documents. The method proposed here fills this technical gap, improves office efficiency, frees up productivity, and saves labor costs.
The scope of protection of the invention includes but is not limited to the above embodiments and is defined by the claims; any replacement, variation, or improvement to this technique that is readily apparent to those skilled in the art falls within the scope of protection of the invention.
Claims (8)
1. An OCR-based method for classifying formatted faxes and extracting information, characterized in that it comprises the following steps:
Step 1: obtain the image file of the fax and binarize the image with an adaptive threshold to reduce noise interference;
Step 2: determine the tilt angle of the image and correct the image;
Step 3: find the contour of the largest bounding box of the table in the corrected image, and crop the table header region from the area of the image above that bounding box;
Step 4: screen the character contours in the header region and merge them, so that the character contours are combined into complete fields;
Step 5: detect the number of merged fields in the header region, and classify the image according to the number of fields in the header region and their content;
Step 6: take the successfully classified image and locate the regions to be recognized in it;
Step 7: recognize the fields in the regions to be recognized in the table, according to the positions of those regions in the table and using OCR;
Step 8: optimize the recognized fields.
2. The OCR-based method for classifying formatted faxes and extracting information according to claim 1, characterized in that step 1 comprises the following steps:
(1) obtain the image file of the fax, convert the image to the HSV color space, and remove the pixels that fall in the red range;
(2) determine the binarization threshold at each pixel position from the distribution of pixel values in the neighborhood block of that pixel, and binarize the image with this adaptive threshold to reduce noise interference.
3. The OCR-based method for classifying formatted faxes and extracting information according to claim 2, characterized in that step 2 comprises finding the longest straight line in the image and rotating the image to correct it according to the angle between that line and the horizontal direction.
4. The OCR-based method for classifying formatted faxes and extracting information according to claim 3, characterized in that step 4 comprises the following steps:
(1) set the length threshold range and the width threshold range for character contours;
(2) retrieve the contours in the header region and keep those whose length lies within the character-contour length threshold range and whose width lies within the character-contour width threshold range; the contours kept are the character contours;
(3) merge the character contours: extract the color of each character contour, and merge into a complete field those characters whose contours are similar in color and whose spacing is less than half the width of the character contour itself.
5. The OCR-based method for classifying formatted faxes and extracting information according to claim 4, characterized in that step 5 comprises the following steps:
(1) detect the number of fields in the header region;
(2) if the number of fields is 0, the image is not classified;
(3) if the number of fields is 1, classify the image using a machine-learning SVM classifier;
(4) if the number of fields is greater than 1, recognize the text in the header region by OCR and match it against the type names in the image recognition library to perform the classification; divide the total number of matched characters by the total number of characters of the correctly matched field and compare the result with a preset threshold; if the result is greater than the preset threshold, classification succeeds; otherwise, classification fails.
6. The OCR-based method for classifying formatted faxes and extracting information according to claim 5, characterized in that step 6 comprises the following steps:
(1) load the template information prepared in advance;
(2) take the image successfully classified in step 5 and find the contours of all bounding boxes contained within the contour of the largest bounding box in the image;
(3) set the length threshold range and the width threshold range for bounding boxes, and keep the bounding boxes whose length lies within the length threshold range and whose width lies within the width threshold range;
(4) using the position information of the bounding boxes kept, scan and sort all bounding boxes from top to bottom and from left to right, thereby locating the table, and find the regions to be recognized in the table according to the template information;
(5) determine from the template information whether anything outside the table needs to be recognized; if information outside the table needs to be recognized, extract field contours outside the table: screen the character contours outside the table using the method of step 4 and merge them into complete fields, determine the regions to be recognized outside the table from the relative position of the fields recorded in the template information and the largest bounding box in the image, and locate the positions of the fields to be recognized outside the largest bounding box according to the fields recorded in the template information.
7. The OCR-based method for classifying formatted faxes and extracting information according to claim 6, characterized in that step 7 comprises the following steps:
(1) crop the field images according to the position information of the regions to be recognized obtained in step 6;
(2) recognize the located fields by OCR.
8. The OCR-based method for classifying formatted faxes and extracting information according to claim 1, characterized in that step 8 comprises the following steps:
(1) extract the fields recognized by OCR;
(2) optimize according to field type: for lowercase (numeric amount) fields, remove the non-numeric parts; for date fields, remove spaces, non-numeric characters, and the word "date";
(3) dictionary optimization: build a dictionary library and match the fields recognized by OCR against the fields in the dictionary library; if the matching score is greater than a preset threshold, the field in the dictionary library is replaced with the OCR-recognized field so that the fields in the dictionary library are optimized and updated; meanwhile, manually confirmed correct fields are added to the dictionary library; the matching score equals the number of characters correctly recognized by OCR divided by the total number of characters of the currently matched entry in the dictionary library.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710334784.5A CN107133621B (en) | 2017-05-12 | 2017-05-12 | Method for classifying and extracting information of formatted fax based on OCR |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710334784.5A CN107133621B (en) | 2017-05-12 | 2017-05-12 | Method for classifying and extracting information of formatted fax based on OCR |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107133621A true CN107133621A (en) | 2017-09-05 |
CN107133621B CN107133621B (en) | 2020-09-29 |
Family
ID=59733140
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710334784.5A Active CN107133621B (en) | 2017-05-12 | 2017-05-12 | Method for classifying and extracting information of formatted fax based on OCR |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107133621B (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107633239A (en) * | 2017-10-18 | 2018-01-26 | 江苏鸿信系统集成有限公司 | Bill classification and bill field extracting method based on deep learning and OCR |
CN108038504A (en) * | 2017-12-11 | 2018-05-15 | 深圳房讯通信息技术有限公司 | A kind of method for parsing property ownership certificate photo content |
CN108509401A (en) * | 2018-03-05 | 2018-09-07 | 平安普惠企业管理有限公司 | Contract generation method, device, computer equipment and storage medium |
CN108830133A (en) * | 2018-04-17 | 2018-11-16 | 平安科技(深圳)有限公司 | Recognition methods, electronic device and the readable storage medium storing program for executing of contract image picture |
CN109816118A (en) * | 2019-01-25 | 2019-05-28 | 上海深杳智能科技有限公司 | A kind of method and terminal of the creation structured document based on deep learning model |
WO2019104879A1 (en) * | 2017-11-30 | 2019-06-06 | 平安科技(深圳)有限公司 | Information recognition method for form-type image, electronic device and readable storage medium |
CN110119648A (en) * | 2018-02-05 | 2019-08-13 | 国家计算机网络与信息安全管理中心 | A kind of facsimile signal classification method based on optical character identification |
CN110674332A (en) * | 2019-08-01 | 2020-01-10 | 南昌市微轲联信息技术有限公司 | Motor vehicle digital electronic archive classification method based on OCR and text mining |
CN111767769A (en) * | 2019-08-14 | 2020-10-13 | 北京京东尚科信息技术有限公司 | Text extraction method and device, electronic equipment and storage medium |
CN112528984A (en) * | 2020-12-18 | 2021-03-19 | 平安银行股份有限公司 | Image information extraction method, device, electronic equipment and storage medium |
CN112560859A (en) * | 2020-11-20 | 2021-03-26 | 中电鸿信信息科技有限公司 | Intelligent academic calendar information extraction method based on machine vision and natural language processing |
CN112732955A (en) * | 2021-03-31 | 2021-04-30 | 国网浙江省电力有限公司 | Financial certificate storage and recording method in standard cost accounting |
CN112733518A (en) * | 2021-01-14 | 2021-04-30 | 卫宁健康科技集团股份有限公司 | Table template generation method, device, equipment and storage medium |
CN115273111A (en) * | 2022-06-27 | 2022-11-01 | 北京互时科技股份有限公司 | Device for identifying drawing material sheet without template |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020181777A1 (en) * | 2001-05-30 | 2002-12-05 | International Business Machines Corporation | Image processing method, image processing system and program |
US20030048490A1 (en) * | 2001-09-07 | 2003-03-13 | Yasushi Yanagihara | Image recognizing apparatus |
CN102750541A (en) * | 2011-04-22 | 2012-10-24 | 北京文通科技有限公司 | Document image classifying distinguishing method and device |
CN103208004A (en) * | 2013-03-15 | 2013-07-17 | 北京英迈杰科技有限公司 | Automatic recognition and extraction method and device for bill information area |
CN103258198A (en) * | 2013-04-26 | 2013-08-21 | 四川大学 | Extraction method for characters in form document image |
Non-Patent Citations (4)
Title |
---|
LIANG XU ET AL: "A knowledge-based table recognition method for Chinese bank statement images", 《2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP)》 * |
杜刚: "银行票据识别系统的研究", 《中国优秀硕士学位论文全文数据库(电子期刊)信息科技辑》 * |
邹亚劼: "基于OCR的文档图片检测与信息提取系统的研究", 《万方学位论文库》 * |
高鸿: "文档图像拼接技术研究", 《万方学位论文库》 * |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107633239A (en) * | 2017-10-18 | 2018-01-26 | 江苏鸿信系统集成有限公司 | Bill classification and bill field extracting method based on deep learning and OCR |
WO2019104879A1 (en) * | 2017-11-30 | 2019-06-06 | 平安科技(深圳)有限公司 | Information recognition method for form-type image, electronic device and readable storage medium |
CN108038504A (en) * | 2017-12-11 | 2018-05-15 | 深圳房讯通信息技术有限公司 | A kind of method for parsing property ownership certificate photo content |
CN108038504B (en) * | 2017-12-11 | 2019-12-27 | 深圳房讯通信息技术有限公司 | Method for analyzing content of house property certificate photo |
CN110119648A (en) * | 2018-02-05 | 2019-08-13 | 国家计算机网络与信息安全管理中心 | A kind of facsimile signal classification method based on optical character identification |
CN108509401A (en) * | 2018-03-05 | 2018-09-07 | 平安普惠企业管理有限公司 | Contract generation method, device, computer equipment and storage medium |
CN108830133A (en) * | 2018-04-17 | 2018-11-16 | 平安科技(深圳)有限公司 | Recognition methods, electronic device and the readable storage medium storing program for executing of contract image picture |
CN108830133B (en) * | 2018-04-17 | 2020-02-21 | 平安科技(深圳)有限公司 | Contract image picture identification method, electronic device and readable storage medium |
CN109816118B (en) * | 2019-01-25 | 2022-12-06 | 上海深杳智能科技有限公司 | Method and terminal for creating structured document based on deep learning model |
CN109816118A (en) * | 2019-01-25 | 2019-05-28 | 上海深杳智能科技有限公司 | A kind of method and terminal of the creation structured document based on deep learning model |
CN110674332A (en) * | 2019-08-01 | 2020-01-10 | 南昌市微轲联信息技术有限公司 | Motor vehicle digital electronic archive classification method based on OCR and text mining |
CN110674332B (en) * | 2019-08-01 | 2022-11-15 | 南昌市微轲联信息技术有限公司 | Motor vehicle digital electronic archive classification method based on OCR and text mining |
CN111767769A (en) * | 2019-08-14 | 2020-10-13 | 北京京东尚科信息技术有限公司 | Text extraction method and device, electronic equipment and storage medium |
CN112560859A (en) * | 2020-11-20 | 2021-03-26 | 中电鸿信信息科技有限公司 | Intelligent academic calendar information extraction method based on machine vision and natural language processing |
CN112528984A (en) * | 2020-12-18 | 2021-03-19 | 平安银行股份有限公司 | Image information extraction method, device, electronic equipment and storage medium |
CN112733518A (en) * | 2021-01-14 | 2021-04-30 | 卫宁健康科技集团股份有限公司 | Table template generation method, device, equipment and storage medium |
CN112732955A (en) * | 2021-03-31 | 2021-04-30 | 国网浙江省电力有限公司 | Financial certificate storage and recording method in standard cost accounting |
CN115273111A (en) * | 2022-06-27 | 2022-11-01 | 北京互时科技股份有限公司 | Device for identifying drawing material sheet without template |
Also Published As
Publication number | Publication date |
---|---|
CN107133621B (en) | 2020-09-29 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN107133621A (en) | The classification of formatting fax based on OCR and information extracting method | |
CN107633239B (en) | Bill classification and bill field extraction method based on deep learning and OCR | |
CN102567764B (en) | A kind of bill evidence and system improving electron image recognition efficiency | |
CN103258198B (en) | Character extracting method in a kind of form document image | |
CN105654072A (en) | Automatic character extraction and recognition system and method for low-resolution medical bill image | |
JP5830338B2 (en) | Form recognition method and form recognition apparatus | |
CN101887521B (en) | Method and terminal for rectifying deviation of file | |
CN111476109A (en) | Bill processing method, bill processing apparatus, and computer-readable storage medium | |
US20040008884A1 (en) | System and method for scanned image bleedthrough processing | |
CN102855232A (en) | Table analysis and edit processing method | |
JP2014131277A (en) | Document image compression method and application of the same to document authentication | |
US20090180694A1 (en) | Method and apparatus for determining an orientation of a document including Korean characters | |
CN104966051A (en) | Method of recognizing layout of document image | |
JP2012500428A (en) | Segment print pages into articles | |
CN112949471A (en) | Domestic CPU-based electronic official document identification reproduction method and system | |
US11443504B2 (en) | Image box filtering for optical character recognition | |
CN105303363A (en) | Data processing method and data processing system | |
CN111091090A (en) | Bank report OCR recognition method, device, platform and terminal | |
CN104376317A (en) | Method for transforming paper file into electronic file | |
US11436733B2 (en) | Image processing apparatus, image processing method and storage medium | |
JP5887242B2 (en) | Image processing apparatus, image processing method, and program | |
CN101894255B (en) | Wavelet transform-based container number positioning method | |
TWI772199B (en) | Accounting management system for recognizes accounting voucher image to automatically obtain accounting related information | |
CN111445433B (en) | Method and device for detecting blank page and fuzzy page of electronic file | |
JP4205554B2 (en) | Form processing device |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 210005 No. 268, Hanzhoung Road, Nanjing, Jiangsu; Applicant after: CLP Hongxin Information Technology Co., Ltd; Address before: 210005 No. 268, Hanzhoung Road, Nanjing, Jiangsu; Applicant before: Jiangsu Hongxin System Integration Co., Ltd. |
GR01 | Patent grant | |