CN110008944A - OCR recognition method and device based on template matching, and storage medium - Google Patents

OCR recognition method and device based on template matching, and storage medium

Info

Publication number
CN110008944A
CN110008944A CN201910127136.1A CN201910127136A CN110008944A CN 110008944 A CN110008944 A CN 110008944A CN 201910127136 A CN201910127136 A CN 201910127136A CN 110008944 A CN110008944 A CN 110008944A
Authority
CN
China
Prior art keywords
identified
picture
identification
document
document picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910127136.1A
Other languages
Chinese (zh)
Other versions
CN110008944B (en)
Inventor
高梁梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910127136.1A
Publication of CN110008944A
Application granted
Publication of CN110008944B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751: Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Character Input (AREA)

Abstract

This application discloses an OCR recognition method and device based on template matching, a storage medium and a computer device, and relates to the technical field of information processing. The method includes: collecting sample document images with different specified layouts; box-selecting regions on each sample document image to obtain a recognition template corresponding to each sample document image; establishing a recognition template database that stores the recognition template corresponding to each sample document image; collecting a document image to be recognized and recognizing its frame and title to obtain its document type; and, according to the document type obtained, calling the corresponding recognition template in the recognition template database to perform OCR recognition on the document image to be recognized. By establishing a recognition template database, the application accommodates documents with a variety of layouts and improves the accuracy of OCR recognition.

Description

OCR recognition method and device based on template matching, and storage medium
Technical field
This application relates to the technical field of information processing, and in particular to an OCR recognition method and device based on template matching, a storage medium and a computer device.
Background
Optical character recognition (OCR) refers to obtaining an electronic document from a paper document with an electronic device (such as a scanner or digital camera), segmenting the character strings in the electronic document into small images that each contain a single character, and then recognizing the segmented characters by some method.
Because of factors such as the variety of character layouts in the images to be recognized, existing OCR recognition methods can accurately recognize only images with a fixed character layout, such as ID cards and bank cards, and perform poorly on images of other documents.
Summary of the invention
In view of this, this application provides an OCR recognition method and device based on template matching, a storage medium and a computer device, whose main purpose is to solve the problem that existing OCR recognition methods recognize poorly.
According to one aspect of the application, an OCR recognition method based on template matching is provided, the method comprising:
Collecting sample document images with different specified layouts;
Box-selecting regions on each sample document image to obtain a recognition template corresponding to each sample document image;
Establishing a recognition template database, in which the recognition template corresponding to each sample document image is stored;
Collecting a document image to be recognized, and recognizing the frame and title of the document image to be recognized to obtain its document type;
According to the document type obtained, calling the corresponding recognition template in the recognition template database to perform OCR recognition on the document image to be recognized.
Optionally, recognizing the frame and title of the document image to be recognized to obtain its document type comprises:
Binarizing the document image to be recognized to obtain a binarized table image;
Applying slant correction to the binarized table image with a slant-correction algorithm based on perspective transformation;
Extracting the frame of the document image to be recognized from the slant-corrected binarized table image using morphological image processing;
Performing OCR recognition on a preset region of the slant-corrected binarized table image to obtain the title of the document image to be recognized;
Obtaining the document type of the document image to be recognized according to its frame and title.
Optionally, performing OCR recognition on the document image to be recognized comprises:
Performing OCR recognition on the document image to be recognized using a convolutional recurrent neural network model.
Optionally, the convolutional recurrent neural network model comprises a convolutional neural network (CNN), a bidirectional long short-term memory (LSTM) recurrent neural network and a connectionist temporal classification (CTC) model;
Performing OCR recognition on the document image to be recognized using the convolutional recurrent neural network model comprises:
The CNN extracts features of the recognition region of the document image to be recognized and generates a feature sequence for the recognition region;
The bidirectional LSTM determines the label distribution corresponding to each feature in the feature sequence;
The CTC model determines the text of the recognition region according to the label distribution corresponding to each feature.
Optionally, collecting the document image to be recognized comprises:
Collecting the document image to be recognized with a high-definition document camera that automatically adjusts its shooting angle.
Optionally, before the corresponding recognition template in the recognition template database is called to perform OCR recognition on the document image to be recognized, the method further comprises:
Adjusting the brightness and contrast of the document image to be recognized;
Converting the document image to be recognized to grayscale;
Receiving an angle adjustment instruction from the user for the grayscale document image to be recognized, and adjusting the angle of the document image to be recognized.
Optionally, a first sample document image in the recognition template database is configured with multiple region recognition templates, each of which is used to recognize part of the sample document image.
Optionally, box-selecting regions on each sample document image comprises:
Performing automatic box selection over the whole of each sample document image to obtain the recognition regions of each sample document image;
Adjusting the automatically box-selected recognition regions: merging multiple wrongly box-selected recognition regions into one recognition region, and splitting one wrongly box-selected recognition region into multiple recognition regions.
According to another aspect of the application, an OCR recognition device based on template matching is provided, the device comprising:
A sample document image collection unit, for collecting sample document images with different specified layouts;
A recognition template acquisition unit, for box-selecting regions on each sample document image to obtain a recognition template corresponding to each sample document image;
A recognition template database establishment unit, for establishing a recognition template database in which the recognition template corresponding to each sample document image is stored;
A document type acquisition unit, for collecting a document image to be recognized and recognizing the frame and title of the document image to be recognized to obtain its document type;
An OCR recognition unit, for calling, according to the document type obtained, the corresponding recognition template in the recognition template database to perform OCR recognition on the document image to be recognized.
Optionally, the document type acquisition unit is further used for:
Binarizing the document image to be recognized to obtain a binarized table image;
Applying slant correction to the binarized table image with a slant-correction algorithm based on perspective transformation;
Extracting the frame of the document image to be recognized from the slant-corrected binarized table image using morphological image processing;
Performing OCR recognition on a preset region of the slant-corrected binarized table image to obtain the title of the document image to be recognized;
Obtaining the document type of the document image to be recognized according to its frame and title.
Optionally, the OCR recognition unit is further used for:
Performing OCR recognition on the document image to be recognized using a convolutional recurrent neural network model.
Specifically, the convolutional recurrent neural network model comprises a convolutional neural network (CNN), a bidirectional long short-term memory (LSTM) recurrent neural network and a connectionist temporal classification (CTC) model;
Performing OCR recognition on the document image to be recognized using the convolutional recurrent neural network model comprises:
The CNN extracts features of the recognition region of the document image to be recognized and generates a feature sequence for the recognition region;
The bidirectional LSTM determines the label distribution corresponding to each feature in the feature sequence;
The CTC model determines the text of the recognition region according to the label distribution corresponding to each feature.
Optionally, the device further comprises:
An image adjustment unit, for adjusting the brightness and contrast of the document image to be recognized;
A grayscale processing unit, for converting the document image to be recognized to grayscale;
An angle adjustment unit, for receiving an angle adjustment instruction from the user for the grayscale document image to be recognized and adjusting the angle of the document image to be recognized.
Optionally, a first sample document image in the recognition template database is configured with multiple region recognition templates, each of which is used to recognize part of the sample document image.
Specifically, the recognition template acquisition unit is further used for:
Performing automatic box selection over the whole of each sample document image to obtain the recognition regions of each sample document image;
Adjusting the automatically box-selected recognition regions: merging multiple wrongly box-selected recognition regions into one recognition region, and splitting one wrongly box-selected recognition region into multiple recognition regions.
According to a further aspect of the application, a storage medium is provided on which a computer program is stored; when executed by a processor, the program implements the above OCR recognition method based on template matching.
According to a further aspect of the application, a computer device is provided, comprising a storage medium, a processor and a computer program that is stored on the storage medium and can run on the processor; the processor implements the above OCR recognition method based on template matching when executing the program.
With the above technical solution, the OCR recognition method and device based on template matching, the storage medium and the computer device provided by this application establish a recognition template database, accommodate documents with a variety of layouts, and improve the accuracy of OCR recognition.
In addition, the application collects the image of the document to be recognized with a high-definition document camera, which removes the influence of shooting light and angle on OCR recognition. A specific region of a document image can also be recognized on the basis of a region recognition template, which improves recognition efficiency.
The above is only an overview of the technical solution of the application. So that the technical means of the application can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features and advantages of the application are easier to understand, specific embodiments of the application are set out below.
Brief description of the drawings
The drawings described here provide a further understanding of the application and form part of it. The illustrative embodiments of the application and their description explain the application and do not unduly limit it. In the drawings:
Fig. 1 shows a flow diagram of an OCR recognition method based on template matching provided by an embodiment of the application;
Fig. 2 shows a schematic diagram of a sample document provided by an embodiment of the application;
Fig. 3 shows a structural diagram of an OCR recognition device based on template matching provided by an embodiment of the application.
Detailed description of the embodiments
The application is described in detail below with reference to the drawings and in conjunction with embodiments. It should be noted that, provided they do not conflict, the embodiments of the application and the features in the embodiments may be combined with each other.
To address the poor recognition performance of current OCR recognition methods, this embodiment provides an OCR recognition method based on template matching that accommodates documents with a variety of layouts and improves the accuracy of OCR recognition. As shown in Fig. 1, the method comprises:
S11: collecting sample document images with different specified layouts;
In practical applications, the sample document images with different specified layouts are collected with a high-definition document camera that automatically adjusts its shooting angle.
S12: box-selecting regions on each sample document image to obtain a recognition template corresponding to each sample document image;
It should be noted that the embodiment of the application collects a sample document image, determines the range of its recognition regions, and establishes the recognition template corresponding to the document image to be recognized; the recognition template contains the coordinate position and region name of each recognition region.
It will be appreciated that in OCR recognition the quality of image segmentation directly affects the recognition rate. When OCR is performed on a wrongly segmented image, a correct result often cannot be obtained. The application therefore first establishes a template for the document image to be recognized and records in it the coordinate position and region name of each recognition region. Printed regions of the document image to be recognized need not be recognized: the region name in the template is used directly as the text of the printed region, which improves recognition efficiency.
S13: establishing a recognition template database, in which the recognition template corresponding to each sample document image is stored;
It will be appreciated that the embodiment of the application establishes a recognition template database containing the recognition templates corresponding to a variety of sample document images, so that document images to be recognized with different layouts can be recognized accurately.
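As a minimal sketch of what such a template and database might look like, the following Python models a recognition template as a list of named regions with coordinates, stored in an in-memory mapping keyed by document type. The class names, the `printed` flag, the document type string and all coordinates are illustrative assumptions; the source does not specify a storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Region:
    """One recognition region: a region name plus its box in template coordinates."""
    name: str
    x: int
    y: int
    width: int
    height: int
    printed: bool = False  # hypothetical flag: printed regions need no OCR


@dataclass
class RecognitionTemplate:
    """All recognition regions recorded for one document type."""
    doc_type: str
    regions: List[Region] = field(default_factory=list)


class TemplateDatabase:
    """In-memory stand-in for the recognition template database."""

    def __init__(self) -> None:
        self._templates: Dict[str, RecognitionTemplate] = {}

    def save(self, template: RecognitionTemplate) -> None:
        self._templates[template.doc_type] = template

    def lookup(self, doc_type: str) -> RecognitionTemplate:
        return self._templates[doc_type]


# Invented example entry; names and coordinates are placeholders.
db = TemplateDatabase()
db.save(RecognitionTemplate(
    doc_type="payment_application",
    regions=[
        Region("payment amount", x=120, y=80, width=200, height=40),
        Region("applicant", x=120, y=140, width=200, height=40),
    ],
))
```

A real system would presumably persist the templates rather than hold them in memory; the dictionary is only there to make the lookup by document type concrete.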
S14: collecting a document image to be recognized, and recognizing the frame and title of the document image to be recognized to obtain its document type;
S15: according to the document type obtained, calling the corresponding recognition template in the recognition template database to perform OCR recognition on the document image to be recognized.
It will be appreciated that the embodiment of the application determines the recognition regions and region names of the document image to be recognized from the recognition template that matches it, so the recognition regions can be located fairly accurately. OCR recognition is performed on the recognition regions (in practice, the handwritten regions), while the region names in the recognition template serve as the text of the printed regions.
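Step S15 can be sketched roughly as follows, assuming a template is a list of region records and the OCR engine is passed in as a callable. The record keys, the `printed` flag and the stub OCR function are hypothetical and only serve to show the dispatch: printed regions take the region name from the template, all other regions are cropped and sent to OCR.

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # grayscale image as rows of pixel values


def crop(image: Image, x: int, y: int, w: int, h: int) -> Image:
    """Cut the rectangle (x, y, w, h) out of a row-major image."""
    return [row[x:x + w] for row in image[y:y + h]]


def recognize(image: Image, doc_type: str,
              templates: Dict[str, List[dict]],
              ocr: Callable[[Image], str]) -> Dict[str, str]:
    """Look up the template for doc_type and produce one text per region."""
    result = {}
    for region in templates[doc_type]:
        if region["printed"]:
            # Printed region: the template's region name is used as its text.
            result[region["name"]] = region["name"]
        else:
            patch = crop(image, region["x"], region["y"], region["w"], region["h"])
            result[region["name"]] = ocr(patch)
    return result


# Invented template and a stub OCR that only reports the patch size.
templates = {"payment_application": [
    {"name": "Applicant", "printed": True, "x": 0, "y": 0, "w": 4, "h": 2},
    {"name": "applicant_value", "printed": False, "x": 4, "y": 0, "w": 6, "h": 2},
]}
blank_page = [[0] * 10 for _ in range(4)]
stub_ocr = lambda patch: "%dx%d" % (len(patch[0]), len(patch))
fields = recognize(blank_page, "payment_application", templates, stub_ocr)
```

Passing the OCR engine as a parameter keeps the template logic independent of whichever recognition model is used.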
The OCR recognition method based on template matching of the embodiment of the application establishes a recognition template database, accommodates documents with a variety of layouts, and improves the accuracy of OCR recognition.
In an optional implementation of the embodiment of the application, similar to the method in Fig. 1, recognizing the frame and title of the document image to be recognized in step S14 to obtain its document type comprises:
Binarizing the document image to be recognized to obtain a binarized table image;
Applying slant correction to the binarized table image with a slant-correction algorithm based on perspective transformation;
Extracting the frame of the document image to be recognized from the slant-corrected binarized table image using morphological image processing;
Performing OCR recognition on a preset region of the slant-corrected binarized table image to obtain the title of the document image to be recognized;
Obtaining the document type of the document image to be recognized according to its frame and title.
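The source does not spell out the slant-correction algorithm. One common way to realize a perspective transformation, shown here purely as an assumed illustration, is to estimate a 3x3 homography from the four detected page corners to the four corners of an upright rectangle and then map coordinates through it. The corner coordinates below are invented, and a real pipeline would resample every pixel rather than only map points.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Return the 9 row-major entries of H (with H[8] = 1) mapping src to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b) + [1.0]


def warp_point(h: Sequence[float], p: Point) -> Point:
    """Apply the homography to a single point."""
    x, y = p
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Invented corners of a skewed page and the upright rectangle they should map to.
skewed = [(12.0, 8.0), (208.0, 20.0), (200.0, 310.0), (4.0, 296.0)]
upright = [(0.0, 0.0), (200.0, 0.0), (200.0, 300.0), (0.0, 300.0)]
h = homography(skewed, upright)
```

Libraries such as OpenCV provide equivalent routines; the pure-Python solver is used here only to keep the sketch self-contained.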
It should be noted that the frame of the document image to be recognized refers to the horizontal and vertical lines of the table in the document to be recognized. The frame of the document image to be recognized is extracted as follows:
Selecting a horizontal structuring element and a vertical structuring element and applying an opening operation with each to the slant-corrected binarized table image, to obtain a table horizontal-line image and a table vertical-line image;
Performing an AND operation on the table horizontal-line image and the table vertical-line image to obtain a table frame image;
Thinning the table frame image to extract the table frame-line skeleton, that is, the frame of the document image to be recognized.
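The opening step can be illustrated with a small pure-Python sketch. It assumes foreground pixels are 1 and uses the fact that opening with a flat 1 x k horizontal line element keeps exactly those runs of foreground pixels that are at least k long; the vertical case is handled by transposing. The function names and the 5 x 5 test image are invented, and the sketch stops at the two line images, since how they are combined afterwards depends on the pixel convention in use.

```python
from typing import List, Tuple

Binary = List[List[int]]  # 1 = foreground (ink), 0 = background


def open_rows(img: Binary, length: int) -> Binary:
    """Opening with a 1 x length horizontal element: keep runs of 1s >= length."""
    out = []
    for row in img:
        kept = [0] * len(row)
        start = None
        for i, v in enumerate(row + [0]):  # trailing 0 closes a run at the edge
            if v and start is None:
                start = i
            elif not v and start is not None:
                if i - start >= length:
                    kept[start:i] = [1] * (i - start)
                start = None
        out.append(kept)
    return out


def transpose(img: Binary) -> Binary:
    return [list(col) for col in zip(*img)]


def extract_lines(img: Binary, length: int) -> Tuple[Binary, Binary]:
    """Return (horizontal-line image, vertical-line image)."""
    horizontal = open_rows(img, length)
    vertical = transpose(open_rows(transpose(img), length))
    return horizontal, vertical


# A tiny table outline with one stray pixel standing in for a character.
table = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
horizontal, vertical = extract_lines(table, 4)
```

In this example the top and bottom rows survive in the horizontal-line image, the left and right columns survive in the vertical-line image, and the stray centre pixel is removed from both.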
It will be appreciated that the region in the middle of the top of the document image to be recognized is its title. The title determines the major category of the document image to be recognized, for example the finance category or the article category; the frame further determines its subcategory. For example, a payment application form and a reimbursement approval form are both finance-category tables, but each has a different frame and therefore corresponds to a different document type.
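The two-level decision described above might be sketched as a pair of lookups: a title keyword selects the major category, and a frame signature selects the concrete document type within it. Everything in the tables below is hypothetical, including the keywords and the use of (horizontal line count, vertical line count) as the frame signature; the source does not say how frames are compared.

```python
from typing import Optional, Tuple

# Hypothetical lookup tables; keywords and line counts are invented.
CATEGORY_BY_TITLE_KEYWORD = {
    "payment": "finance",
    "reimbursement": "finance",
}
TYPE_BY_FRAME = {
    ("finance", (6, 4)): "payment application form",
    ("finance", (8, 5)): "reimbursement approval form",
}


def classify(title: str, frame_signature: Tuple[int, int]) -> Optional[str]:
    """Title picks the major category; the frame picks the type within it."""
    lowered = title.lower()
    category = next(
        (cat for kw, cat in CATEGORY_BY_TITLE_KEYWORD.items() if kw in lowered),
        None,
    )
    if category is None:
        return None
    return TYPE_BY_FRAME.get((category, frame_signature))


doc_type = classify("Payment Application", (6, 4))
```

Returning `None` for an unknown title or frame leaves it to the caller to decide how to handle documents that have no template.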
In another optional implementation of the embodiment of the application, performing OCR recognition on the document image to be recognized in step S15 comprises:
Performing OCR recognition on the document image to be recognized using a convolutional recurrent neural network model.
Specifically, the convolutional recurrent neural network model comprises a convolutional neural network (CNN), a bidirectional long short-term memory (LSTM) recurrent neural network and a connectionist temporal classification (CTC) model;
Performing OCR recognition on the document image to be recognized using the convolutional recurrent neural network model comprises:
The CNN extracts features of the recognition region of the document image to be recognized and generates a feature sequence for the recognition region;
The bidirectional LSTM determines the label distribution corresponding to each feature in the feature sequence;
The CTC model determines the text of the recognition region according to the label distribution corresponding to each feature.
Specifically, the label distribution corresponding to a feature is a softmax vector that gives the probability of that feature corresponding to each label. The probabilities for all features are passed to the CTC model, which outputs the most probable labels; after operations such as removing blanks, the final label sequence, that is, the text of the recognition region, is obtained.
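The decoding step described above corresponds to what is usually called greedy (best-path) CTC decoding: take the most probable label at each step, collapse consecutive repeats, then drop blanks. The sketch below assumes the blank label sits at index 0 and uses an invented three-symbol alphabet and made-up probabilities; it is not taken from the source.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode(probs: Sequence[Sequence[float]],
                      alphabet: Sequence[str]) -> str:
    """Pick the best label per step, collapse repeats, then remove blanks."""
    best: List[int] = [
        max(range(len(step)), key=step.__getitem__) for step in probs
    ]
    chars = []
    previous = None
    for label in best:
        if label != previous and label != BLANK:
            chars.append(alphabet[label])
        previous = label
    return "".join(chars)


# Five time steps over the labels [blank, "a", "b"]; values are made up.
alphabet = ["", "a", "b"]
probs = [
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.70, 0.20],  # a again: collapsed into the previous step
    [0.90, 0.05, 0.05],  # blank: separates the two a's
    [0.20, 0.70, 0.10],  # a
    [0.10, 0.20, 0.70],  # b
]
text = ctc_greedy_decode(probs, alphabet)  # "aab"
```

The blank between the second and fourth steps is what allows a genuinely doubled character to survive the collapse of repeats.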
It will be appreciated that handwriting is less regular than printed characters, so OCR recognizes handwriting poorly. The CTC model in this solution can automatically align text that is not aligned, which improves the accuracy of OCR recognition.
Preferably, collecting the document image to be recognized comprises:
Collecting the document image to be recognized with a high-definition document camera that automatically adjusts its shooting angle.
It should be noted that the high-definition document camera can adjust the ISO value or the exposure according to the intensity of the ambient light, so as to improve the quality of the document image. ISO indicates the light sensitivity of the CCD or CMOS sensor in a digital camera; the higher the ISO value, the more sensitive the sensor.
In general, the lower the ISO value, the higher the quality of the photo and the finer its detail. The higher the ISO value, the brighter the photo, but photo quality falls as the ISO value rises and noise becomes increasingly severe. A high ISO value can, however, compensate for insufficient light.
The high-definition document camera can also adjust the shooting angle according to the position of the document, so that skewed text in the document does not degrade OCR recognition.
OCR recognition places high demands on the quality of the input image, and users usually need to provide a high-quality image to obtain good recognition results. The resolution must not be too low, the colors must not be too rich, the contrast must not be too low, and the text in the image must not be skewed.
Preferably, before the corresponding recognition template in the recognition template database is called to perform OCR recognition on the document image to be recognized, the embodiment of the application also preprocesses the document image to be recognized, specifically:
Adjusting the brightness and contrast of the document image to be recognized;
Converting the document image to be recognized to grayscale;
Receiving an angle adjustment instruction from the user for the grayscale document image to be recognized, and adjusting the angle of the document image to be recognized.
In an optional implementation, the application uses the open-source Tesseract-OCR framework to adjust the brightness and contrast of the image automatically, and uses the open-source cross-platform computer vision library OpenCV to convert the image to grayscale, so that the image contains only black and white and the contrast is sharp. An interactive image interface is then displayed so that the user can correct the image manually: using the built-in functions provided by the application, the user can adjust the angle of the image until the text in it is no longer skewed.
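To make the arithmetic behind these preprocessing steps concrete, the following pure-Python sketch converts RGB pixels to grayscale with the widely used BT.601 luminance weights, applies a linear brightness and contrast adjustment, and thresholds the result to black and white. It deliberately does not call Tesseract-OCR or OpenCV; the weights, the gain and offset values and the threshold of 128 are assumptions chosen for illustration, and the manual angle adjustment is not covered.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0..255


def to_gray(rgb: List[List[Pixel]]) -> List[List[int]]:
    """Weighted luminance using the BT.601 coefficients."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]


def adjust(gray: List[List[int]], contrast: float, brightness: float) -> List[List[int]]:
    """Linear adjustment p -> contrast * p + brightness, clipped to 0..255."""
    return [[min(255, max(0, round(contrast * p + brightness))) for p in row]
            for row in gray]


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map every pixel to pure white (255) or pure black (0)."""
    return [[255 if p >= threshold else 0 for p in row] for row in gray]


# Invented 2 x 2 image: white, black, light gray paper, dark gray ink.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(200, 200, 200), (60, 60, 60)],
]
gray = to_gray(image)
enhanced = adjust(gray, contrast=1.5, brightness=-40)
black_and_white = binarize(enhanced)
```

Raising the contrast before thresholding pushes light paper towards white and dark ink towards black, which is the effect the paragraph above describes.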
In an optional implementation of the application, a first sample document image in the recognition template database is configured with multiple region recognition templates, each of which is used to recognize part of the sample document image.
It should be noted that the first sample document image is any sample document image in the recognition template database. Configuring multiple region recognition templates for a sample document image makes it possible to recognize only part of the document image to be recognized, which improves the efficiency of OCR recognition.
In practical applications, taking Fig. 2 as an example, the sample document is a bill document configured with region recognition template A and region recognition template B. The recognition regions of region recognition template A cover the payment amount, the payment department and the contract number; the recognition regions of region recognition template B cover the payment amount, the payment department, the applicant and the department head. Different region recognition templates can be set up according to actual business needs.
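One simple way to represent region recognition templates such as A and B, sketched here under assumed names, is as named subsets of the regions recorded for the full sample document. The field names follow the Fig. 2 example; the box coordinates and the helper function are invented.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # left, top, right, bottom

# All regions of the sample bill document; coordinates are placeholders.
FULL_TEMPLATE: Dict[str, Box] = {
    "payment amount": (100, 50, 300, 80),
    "payment department": (100, 90, 300, 120),
    "contract number": (100, 130, 300, 160),
    "applicant": (100, 170, 300, 200),
    "department head": (100, 210, 300, 240),
}

# Each region recognition template lists only the regions it should read.
REGION_TEMPLATES: Dict[str, List[str]] = {
    "A": ["payment amount", "payment department", "contract number"],
    "B": ["payment amount", "payment department", "applicant", "department head"],
}


def regions_for(template_id: str) -> Dict[str, Box]:
    """Return the boxes that the given region recognition template covers."""
    return {name: FULL_TEMPLATE[name] for name in REGION_TEMPLATES[template_id]}


regions_a = regions_for("A")
regions_b = regions_for("B")
```

Because each template only names the regions it needs, a business process that requires three fields does not pay for recognizing the other two.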
In another implementation of this solution, to make it more efficient to establish the template for the document image to be recognized, each sample document image can also be box-selected in the following way to obtain the recognition template corresponding to each sample document image:
Performing automatic box selection over the whole of each sample document image to obtain the recognition regions of each sample document image;
Adjusting the automatically box-selected recognition regions: merging multiple wrongly box-selected recognition regions into one recognition region, and splitting one wrongly box-selected recognition region into multiple recognition regions.
It will be appreciated that the embodiment of the application performs automatic box selection over the whole document image to be recognized, then adjusts the result and establishes the recognition template corresponding to the document image to be recognized; the recognition template contains the coordinate position and region name of each recognition region. With automatic region marking, a box can be drawn over any part of a region and will automatically adjust to the boundary of the region; a box can also be drawn in the blank space outside the four boundaries of a region and will automatically shrink to the boundary of the region.
Specifically, a many-to-one adjustment is applied to the result of the automatic box selection, merging multiple wrongly marked regions into one region; a one-to-many adjustment is applied to the result of the automatic box selection, splitting a region that was wrongly marked as one into multiple regions.
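The two adjustments can be sketched on axis-aligned boxes. The merge below takes the bounding box of the wrongly separated regions, and the split cuts one box at given x positions; the box format, the function names and the choice of vertical cuts are assumptions made for illustration rather than details from the source.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # left, top, right, bottom


def merge(boxes: Sequence[Box]) -> Box:
    """Many-to-one: replace several boxes with their common bounding box."""
    lefts, tops, rights, bottoms = zip(*boxes)
    return (min(lefts), min(tops), max(rights), max(bottoms))


def split_at(box: Box, xs: Sequence[int]) -> List[Box]:
    """One-to-many: cut a box into side-by-side boxes at the given x positions."""
    left, top, right, bottom = box
    edges = [left] + sorted(x for x in xs if left < x < right) + [right]
    return [(a, top, b, bottom) for a, b in zip(edges, edges[1:])]


# Two fragments of one field that automatic box selection marked separately.
merged = merge([(10, 10, 50, 30), (55, 12, 120, 30)])
# One box that actually spans two fields, cut at x = 60.
parts = split_at((10, 10, 120, 30), [60])
```

Cut positions that fall outside the box are ignored, so a stray click cannot produce empty or inverted regions.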
The OCR recognition method of the embodiment of the application establishes templates that match the documents to be recognized, accommodates documents with a variety of layouts, and improves the recognition rate and accuracy of OCR recognition. Using a high-definition document camera removes the influence of shooting light and angle on OCR recognition. The OCR recognition method of the embodiment of the application can also recognize a specific region of a document image on the basis of a region recognition template, which improves recognition efficiency.
Fig. 3 shows a schematic structural diagram of an OCR recognition device based on template matching provided by an embodiment of the present application. As shown in Fig. 3, the device of the embodiment includes:
A sample document image acquisition unit 31, configured to acquire sample document images of different specified typesetting layouts;
In practical applications, the sample document image acquisition unit 31 acquires the sample document images of different specified typesetting layouts through a high-definition camera that automatically adjusts its shooting angle.
A recognition template acquisition unit 32, configured to perform box selection on each sample document image to obtain the recognition template corresponding to each sample document image;
It should be noted that the embodiment of the present application acquires a sample document image, determines the range of its recognition regions, and establishes the recognition template corresponding to the document image to be recognized; the recognition template includes the coordinate position and region name of each recognition region.
It will be appreciated that, in OCR recognition, the quality of image segmentation directly affects the OCR recognition rate. When OCR is performed on a wrongly segmented image, a correct recognition result often cannot be obtained. The present application therefore first establishes a template of the document image to be recognized and records the coordinate position and region name of each recognition region in the template. A printed region of the document image to be recognized does not need to be recognized; the region name in the template is used directly as the text of the printed region, which improves recognition efficiency.
A recognition template database establishing unit 33, configured to establish a recognition template database in which the recognition template corresponding to each sample document image is stored;
It will be appreciated that the embodiment of the present application establishes a recognition template database containing the recognition templates of a variety of sample document images, so that document images to be recognized with different typesetting layouts can be recognized accurately.
A document type acquisition unit 34, configured to acquire the document image to be recognized and to recognize the frame and title of the document image to be recognized, obtaining the document type of the document image to be recognized.
An OCR recognition unit 35, configured to call, according to the document type obtained for the document image to be recognized, the corresponding recognition template in the recognition template database to perform OCR recognition on the document image to be recognized.
It will be appreciated that the embodiment of the present application determines the recognition regions and region names of the document image to be recognized according to the matching recognition template, so the recognition regions can be determined relatively accurately. OCR recognition is performed on the recognition regions (in practice, the handwritten regions), and the region names in the recognition template are used as the text of the printed regions.
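As a rough illustration of the template lookup and region-wise recognition described above, the sketch below stores templates in a dictionary keyed by document type and recognizes only the boxed regions. Everything named here is hypothetical: the `Region` and `RecognitionTemplate` classes, the `recognize` function, and the `ocr` callable that stands in for a real OCR engine. To keep the example self-contained, the "image" is a list of text rows and coordinates are character positions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Assumed box layout: (left, top, right, bottom), right/bottom exclusive.
Box = Tuple[int, int, int, int]


@dataclass(frozen=True)
class Region:
    name: str  # region name, also used as the text of the printed label
    box: Box   # coordinate position of the (handwritten) area to recognize


@dataclass(frozen=True)
class RecognitionTemplate:
    doc_type: str
    regions: Tuple[Region, ...]


def crop(image: List[str], box: Box) -> List[str]:
    """Cut the rectangular region out of the image rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def recognize(
    image: List[str],
    doc_type: str,
    database: Dict[str, RecognitionTemplate],
    ocr: Callable[[List[str]], str],
) -> Dict[str, str]:
    """Look up the template for doc_type and run OCR only on its regions.
    The printed label is never recognized; the region name is reused."""
    template = database[doc_type]  # raises KeyError if no template matches
    return {r.name: ocr(crop(image, r.box)) for r in template.regions}
```

Keying the database by document type keeps the lookup a constant-time operation once the frame and title have been classified; how that classification is stored is a design choice the text leaves open.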
The OCR recognition device based on template matching of the embodiment of the present application establishes a recognition template database, adapts to the recognition of documents with a variety of typesetting formats, and improves the accuracy of OCR recognition.
Optionally, the document type acquisition unit 34 is further configured to:
Perform binarization on the document image to be recognized to obtain a binarized form image;
Perform skew correction on the binarized form image based on a skew correction algorithm using perspective transformation;
Extract the frame of the document image to be recognized from the skew-corrected binarized form image using a morphological image processing method;
Perform OCR recognition on a preset region of the skew-corrected binarized form image to obtain the title of the document image to be recognized;
Obtain the document type of the document image to be recognized according to the frame and title of the document image to be recognized.
It should be noted that the frame of the document image to be recognized refers to the horizontal and vertical ruling lines of the table in the document to be recognized. The frame of the document image to be recognized is extracted as follows:
Select a horizontal structuring element and a vertical structuring element, and apply an opening operation with each to the skew-corrected binarized form image to obtain a table horizontal-line image and a table vertical-line image;
Perform an AND operation on the table horizontal-line image and the table vertical-line image to obtain a table frame image;
Apply thinning to the table frame image and extract the skeleton of the table frame lines, that is, the frame of the document image to be recognized.
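The opening step above can be illustrated without any imaging library. For a binary image, opening with a 1 x k horizontal line element keeps exactly the pixels that lie in a horizontal run of at least k foreground pixels, and the vertical case is the same on the transposed image. The sketch below is an assumption-laden toy, not the implementation of the application: the function names are hypothetical, images are nested lists of 0/1, and thinning is omitted. The text combines the two line images with an AND operation; in this toy a pixelwise AND leaves only the points where lines cross, so a pixelwise OR is also offered as an assumed alternative that keeps the whole grid.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def open_rows(img: Image, k: int) -> Image:
    """Opening with a 1 x k horizontal element: keep pixels that belong
    to a horizontal run of at least k foreground pixels."""
    out = [[0] * len(row) for row in img]
    for y, row in enumerate(img):
        x = 0
        while x < len(row):
            if row[x] == 1:
                start = x
                while x < len(row) and row[x] == 1:
                    x += 1
                if x - start >= k:  # run long enough to contain the element
                    for i in range(start, x):
                        out[y][i] = 1
            else:
                x += 1
    return out


def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def open_cols(img: Image, k: int) -> Image:
    """Opening with a k x 1 vertical element, via the transposed image."""
    return transpose(open_rows(transpose(img), k))


def combine(a: Image, b: Image, op: str = "and") -> Image:
    """Pixelwise AND (as in the text) or OR (assumed alternative)."""
    if op not in ("and", "or"):
        raise ValueError("op must be 'and' or 'or'")
    f = (lambda p, q: p & q) if op == "and" else (lambda p, q: p | q)
    return [[f(p, q) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]
```

The element length k should be shorter than the shortest ruling line and longer than the widest character stroke, so that text is removed while table lines survive.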
It will be appreciated that the region in the middle of the top of the document image to be recognized contains its title. The title determines the major category of the document image to be recognized, for example the finance category or the article category; the frame further determines its subcategory. For example, a payment request form and a reimbursement approval form both belong to the finance category, but each has its own distinct frame and therefore corresponds to a different document type.
Optionally, the OCR recognition unit 35 is further configured to:
Perform OCR recognition on the document image to be recognized using a convolutional recurrent neural network model.
Specifically, the convolutional recurrent neural network model includes a convolutional neural network (CNN), a bidirectional long short-term memory (LSTM) recurrent neural network, and a connectionist temporal classification (CTC) model;
Performing OCR recognition on the document image to be recognized using the convolutional recurrent neural network model includes:
The CNN extracts the features of the recognition region of the document image to be recognized and generates the feature sequence of the recognition region;
The bidirectional LSTM determines the label distribution corresponding to each feature in the feature sequence;
The CTC model determines the text of the recognition region according to the label distribution corresponding to each feature.
Specifically, the label distribution corresponding to a feature is a softmax vector that gives the probability of that feature for each label. After the probabilities of all features are passed to the CTC model, the most probable labels are output, and operations such as removing blanks yield the final label sequence, that is, the text of the recognition region.
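The decoding step described above corresponds to what is commonly called greedy (best-path) CTC decoding: take the most probable label at each step, collapse consecutive repeats, then drop the blank label. The sketch below is an illustration under assumptions; the function name, the label alphabet, and the convention that index 0 is the blank are not taken from the application.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    probs: Sequence[Sequence[float]],
    labels: Sequence[str],
    blank: int = 0,
) -> str:
    """Greedy CTC decoding.

    probs  : one softmax vector per feature (time step)
    labels : label alphabet; labels[blank] is the CTC blank (assumed index 0)
    """
    # Most probable label index at every time step.
    best = [max(range(len(p)), key=p.__getitem__) for p in probs]

    decoded: List[str] = []
    previous = None
    for idx in best:
        # Collapse repeats first, then drop blanks. A blank between two
        # identical labels keeps them as two separate characters.
        if idx != previous and idx != blank:
            decoded.append(labels[idx])
        previous = idx
    return "".join(decoded)
```

Because repeats are collapsed before blanks are removed, a character that spans several features is emitted once, which is what lets the model handle unevenly spaced handwriting without per-character alignment.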
It will be appreciated that handwriting is less regular than printed text, so OCR performs worse on handwriting; the CTC model in this solution automatically aligns unaligned text, which improves the accuracy of OCR recognition.
OCR places high demands on the quality of the input image, and users often need to provide a high-quality image to obtain good recognition quality. The resolution must not be too low, the colors must not be too rich, the contrast must not be too low, and the text on the image must not be skewed.
Optionally, the device further includes:
A picture adjustment unit, configured to adjust the brightness and contrast of the document image to be recognized;
A grayscale processing unit, configured to perform grayscale processing on the document image to be recognized;
An angle adjustment unit, configured to receive the user's angle adjustment instruction for the grayscale-processed document image to be recognized and to adjust the angle of the document image to be recognized.
The picture adjustment unit of the embodiment of the present application uses the open-source Tesseract-OCR framework to automatically adjust the brightness and contrast of the image; the grayscale processing unit uses the open-source cross-platform computer vision library OpenCV to perform grayscale processing on the image, turning its colors into black and white with sharp contrast; the angle adjustment unit displays an interactive image interface that lets the user manually correct the image, and the user can adjust the angle of the image through built-in functions provided by the application so that the text on the image is no longer skewed.
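The preprocessing steps above can be pictured with a small library-free sketch. It does not use Tesseract-OCR or OpenCV and is not the implementation of the application; it only shows the arithmetic usually behind these steps, under these assumptions: a linear brightness/contrast adjustment, the ITU-R BT.601 luma weights for grayscale conversion, and a fixed threshold of 128 for the black-and-white step. All function names are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def to_gray(image: List[List[RGB]]) -> List[List[int]]:
    """Weighted grayscale conversion using the BT.601 luma weights."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
        for row in image
    ]


def adjust(gray: List[List[int]], alpha: float, beta: float) -> List[List[int]]:
    """Linear adjustment out = clamp(alpha * pixel + beta):
    alpha scales contrast, beta shifts brightness."""
    return [
        [max(0, min(255, int(round(alpha * p + beta)))) for p in row]
        for row in gray
    ]


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map every pixel to black (0) or white (255); threshold is assumed."""
    return [[255 if p >= threshold else 0 for p in row] for row in gray]
```

A fixed threshold is the simplest choice; unevenly lit photographs would normally call for an adaptive threshold instead.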
In an optional embodiment of the present application, a first sample document image is configured with multiple region recognition templates in the recognition template database, and each region recognition template is used to recognize a partial region of the sample document image.
It should be noted that the first sample document image is any sample document image in the recognition template database. Configuring multiple region recognition templates for a sample document image makes it possible to recognize only part of the document image to be recognized, which improves the efficiency of OCR recognition.
In practical applications, taking Fig. 2 as an example, the sample document is a bill document configured with region recognition template A and region recognition template B. The recognition region range of region recognition template A includes the requested payment amount, the requesting department, and the contract number; the recognition region range of region recognition template B includes the requested payment amount, the requesting department, the applicant, and the department head. Different region recognition templates can be set according to actual business requirements.
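The example of region recognition templates A and B can be sketched as two named subsets of one full template. The coordinates below are invented placeholders (Fig. 2 is not reproduced here), and the names `FULL_TEMPLATE`, `REGION_TEMPLATES`, and `regions_for` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), placeholder values

# Full recognition template of the bill document: region name -> box.
FULL_TEMPLATE: Dict[str, Box] = {
    "requested payment amount": (120, 80, 320, 110),
    "requesting department": (120, 120, 320, 150),
    "contract number": (120, 160, 320, 190),
    "applicant": (420, 80, 620, 110),
    "department head": (420, 120, 620, 150),
}

# Each region recognition template lists only the regions it needs.
REGION_TEMPLATES: Dict[str, List[str]] = {
    "A": ["requested payment amount", "requesting department", "contract number"],
    "B": ["requested payment amount", "requesting department", "applicant",
          "department head"],
}


def regions_for(template_id: str) -> Dict[str, Box]:
    """Return the subset of regions that the chosen template recognizes."""
    return {name: FULL_TEMPLATE[name] for name in REGION_TEMPLATES[template_id]}
```

Only the boxes returned by `regions_for` would be cropped and passed to OCR, which is where the efficiency gain of partial recognition comes from.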
Specifically, the recognition template acquisition unit 32 is further configured to:
Perform overall automatic box selection on each sample document image to obtain the recognition regions of each sample document image;
Adjust the automatically selected recognition regions: merge multiple recognition regions that were wrongly selected separately into one recognition region, or split one wrongly selected recognition region into multiple recognition regions.
The OCR recognition device of the embodiment of the present application establishes a template that matches the document to be recognized, adapts to the recognition of documents with a variety of typesetting formats, and improves the recognition rate and accuracy of OCR. Using a high-definition camera also eliminates the effect of shooting light and angle on OCR recognition. The OCR recognition method of the embodiment of the present application can further recognize a specific region of a document image based on a region recognition template, which improves recognition efficiency.
It should be noted that, for other descriptions of the functional units of the OCR recognition device based on template matching provided by the embodiment of the present application, reference may be made to the corresponding descriptions of Fig. 1 and Fig. 2.
Based on the method shown in Fig. 1, the embodiment of the present application correspondingly also provides a storage medium on which a computer program is stored; when executed by a processor, the program implements the OCR recognition method based on template matching shown in Fig. 1.
Based on this understanding, the technical solution of the present application can be embodied in the form of a software product. The software product can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions that cause a computer device (such as a personal computer, a server, or a network device) to execute the methods described in the implementation scenarios of the present application.
Based on the method shown in Fig. 1 and the virtual device embodiment shown in Fig. 3, and to achieve the above purpose, the embodiment of the present application also provides a computer device, which may specifically be a personal computer, a server, a network device, or the like. The physical device includes a storage medium and a processor; the storage medium is used to store a computer program, and the processor is used to execute the computer program to implement the OCR recognition method based on template matching shown in Fig. 1.
Optionally, the computer device may also include a user interface, a network interface, a camera, a radio frequency (RF) circuit, sensors, an audio circuit, a Wi-Fi module, and so on. The user interface may include a display and an input unit such as a keyboard; optionally, the user interface may also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface and a wireless interface (such as a Bluetooth interface or a Wi-Fi interface).
Those skilled in the art will understand that the computer device structure provided in this embodiment does not limit the physical device, which may include more or fewer components, combine certain components, or arrange the components differently.
The storage medium may also include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the computer device and supports the running of the information processing program and other software and/or programs. The network communication module is used to implement communication among the components inside the storage medium and communication with other hardware and software in the physical device.
From the above description of the embodiments, those skilled in the art can clearly understand that the present application may be implemented by software plus a necessary general-purpose hardware platform, or by hardware. By applying the technical solution of the present application, …
It should be noted that the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
In the description of the present application, numerous specific details are set forth. It is understood, however, that the embodiments of the present application may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description. Similarly, it should be understood that, in order to streamline the application and aid the understanding of one or more of the inventive aspects, in the above description of exemplary embodiments the features of the application are sometimes grouped together in a single embodiment, figure, or description thereof. However, this manner of disclosure should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, the inventive aspects lie in fewer than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present application.
Those skilled in the art will understand that the accompanying drawings are only schematic diagrams of a preferred implementation scenario, and that the modules or processes in the drawings are not necessarily required to implement the present application. Those skilled in the art will also understand that the modules of a device in an implementation scenario may be distributed in the device as described for that scenario, or may be changed accordingly and located in one or more devices different from that scenario. The modules of the above implementation scenarios may be combined into one module or further split into multiple submodules.
The serial numbers used above in the present application are for description only and do not represent the relative merits of the implementation scenarios. The above discloses only several specific implementation scenarios of the present application; however, the present application is not limited to them, and any change conceivable by a person skilled in the art shall fall within the protection scope of the present application.

Claims (10)

1. An OCR recognition method based on template matching, characterized by comprising:
Acquiring sample document images of different specified typesetting layouts;
Performing box selection on each sample document image to obtain a recognition template corresponding to each sample document image;
Establishing a recognition template database in which the recognition template corresponding to each sample document image is stored;
Acquiring a document image to be recognized, and recognizing the frame and title of the document image to be recognized to obtain the document type of the document image to be recognized;
Calling, according to the recognized document type of the document image to be recognized, the corresponding recognition template in the recognition template database to perform OCR recognition on the document image to be recognized.
2. The method according to claim 1, characterized in that recognizing the frame and title of the document image to be recognized to obtain the document type of the document image to be recognized comprises:
Performing binarization on the document image to be recognized to obtain a binarized form image;
Performing skew correction on the binarized form image based on a skew correction algorithm using perspective transformation;
Extracting the frame of the document image to be recognized from the skew-corrected binarized form image using a morphological image processing method;
Performing OCR recognition on a preset region of the skew-corrected binarized form image to obtain the title of the document image to be recognized;
Obtaining the document type of the document image to be recognized according to the frame and title of the document image to be recognized.
3. The method according to claim 1, characterized in that performing OCR recognition on the document image to be recognized comprises:
Performing OCR recognition on the document image to be recognized using a convolutional recurrent neural network model.
4. The method according to claim 3, characterized in that the convolutional recurrent neural network model comprises a convolutional neural network (CNN), a bidirectional long short-term memory (LSTM) recurrent neural network, and a connectionist temporal classification (CTC) model;
Performing OCR recognition on the document image to be recognized using the convolutional recurrent neural network model comprises:
The CNN extracting the features of the recognition region of the document image to be recognized and generating the feature sequence of the recognition region;
The bidirectional LSTM determining the label distribution corresponding to each feature in the feature sequence;
The CTC model determining the text of the recognition region according to the label distribution corresponding to each feature.
5. The method according to claim 1, characterized in that, before calling the corresponding recognition template in the recognition template database to perform OCR recognition on the document image to be recognized, the method further comprises:
Adjusting the brightness and contrast of the document image to be recognized;
Performing grayscale processing on the document image to be recognized;
Receiving the user's angle adjustment instruction for the grayscale-processed document image to be recognized, and adjusting the angle of the document image to be recognized.
6. The method according to claim 1, characterized in that a first sample document image is configured with multiple region recognition templates in the recognition template database, each region recognition template being used to recognize a partial region of the sample document image.
7. The method according to claim 1, characterized in that performing box selection on each sample document image comprises:
Performing overall automatic box selection on each sample document image to obtain the recognition regions of each sample document image;
Adjusting the automatically selected recognition regions: merging multiple recognition regions that were wrongly selected separately into one recognition region, or splitting one wrongly selected recognition region into multiple recognition regions.
8. An OCR recognition device based on template matching, characterized by comprising:
A sample document image acquisition unit, configured to acquire sample document images of different specified typesetting layouts;
A recognition template acquisition unit, configured to perform box selection on each sample document image to obtain a recognition template corresponding to each sample document image;
A recognition template database establishing unit, configured to establish a recognition template database in which the recognition template corresponding to each sample document image is stored;
A document type acquisition unit, configured to acquire a document image to be recognized and to recognize the frame and title of the document image to be recognized, obtaining the document type of the document image to be recognized;
An OCR recognition unit, configured to call, according to the recognized document type of the document image to be recognized, the corresponding recognition template in the recognition template database to perform OCR recognition on the document image to be recognized.
9. A storage medium on which a computer program is stored, characterized in that, when executed by a processor, the program implements the OCR recognition method based on template matching according to any one of claims 1 to 7.
10. A computer device, comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, characterized in that the processor, when executing the program, implements the OCR recognition method based on template matching according to any one of claims 1 to 7.
CN201910127136.1A 2019-02-20 2019-02-20 OCR recognition method and device based on template matching and storage medium Active CN110008944B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910127136.1A CN110008944B (en) 2019-02-20 2019-02-20 OCR recognition method and device based on template matching and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910127136.1A CN110008944B (en) 2019-02-20 2019-02-20 OCR recognition method and device based on template matching and storage medium

Publications (2)

Publication Number Publication Date
CN110008944A true CN110008944A (en) 2019-07-12
CN110008944B CN110008944B (en) 2024-02-13

Family

ID=67165937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910127136.1A Active CN110008944B (en) 2019-02-20 2019-02-20 OCR recognition method and device based on template matching and storage medium

Country Status (1)

Country Link
CN (1) CN110008944B (en)

WO2022161293A1 (en) * 2021-01-29 2022-08-04 杭州大拿科技股份有限公司 Image processing method and apparatus, and electronic device and storage medium
CN112818961A (en) * 2021-03-26 2021-05-18 北京东方金朔信息技术有限公司 Image feature identification method and device
CN112906695B (en) * 2021-04-14 2022-03-08 数库(上海)科技有限公司 Form recognition method adapting to multi-class OCR recognition interface and related equipment
CN112906695A (en) * 2021-04-14 2021-06-04 数库(上海)科技有限公司 Form recognition method adapting to multi-class OCR recognition interface and related equipment
CN113111829A (en) * 2021-04-23 2021-07-13 杭州睿胜软件有限公司 Method and device for identifying document
CN114564912A (en) * 2021-11-30 2022-05-31 中国电子科技集团公司第十五研究所 Intelligent checking and correcting method and system for document format
CN115830620A (en) * 2023-02-14 2023-03-21 江苏联著实业股份有限公司 Archive text data processing method and system based on OCR
CN115830620B (en) * 2023-02-14 2023-05-30 江苏联著实业股份有限公司 Archive text data processing method and system based on OCR
CN116137077A (en) * 2023-04-13 2023-05-19 宁波为昕科技有限公司 Method and device for establishing electronic component library, electronic equipment and storage medium
CN116137077B (en) * 2023-04-13 2023-08-08 宁波为昕科技有限公司 Method and device for establishing electronic component library, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110008944B (en) 2024-02-13

Similar Documents

Publication Publication Date Title
CN110008944A (en) OCR recognition methods and device, storage medium based on template matching
Singh Practical machine learning and image processing: for facial recognition, object detection, and pattern recognition using Python
EP3437019B1 (en) Optical character recognition in structured documents
US9779295B2 (en) Systems and methods for note content extraction and management using segmented notes
US10455163B2 (en) Image processing apparatus that generates a combined image, control method, and storage medium
CN109559344B (en) Frame detection method, device and storage medium
CN110263616A (en) Character recognition method, device, electronic equipment and storage medium
KR20120130684A (en) Image processing apparatus, image processing method, and computer readable medium
CN108304562B (en) Question searching method and device and intelligent terminal
CN113223025A (en) Image processing method and device, and neural network training method and device
CN116092231A (en) Ticket identification method, ticket identification device, terminal equipment and storage medium
JP6778314B1 (en) Image processing system, image processing method, and image processing program
US20160343142A1 (en) Object Boundary Detection in an Image
JP5566971B2 (en) Information processing program, information processing apparatus, and character recognition method
JP5878004B2 (en) Multiple document recognition system and multiple document recognition method
CN106803269B (en) Method and device for perspective correction of document image
CN113159029A (en) Method and system for accurately capturing local information in picture
CN114730499A (en) Image recognition method and device, training method, electronic device and storage medium
JP4507673B2 (en) Image processing apparatus, image processing method, and program
JP2004199200A (en) Pattern recognition device, imaging apparatus, information processing system, pattern recognition method, recording medium and program
US11323627B2 (en) Method and electronic device for applying beauty effect setting
KR102352726B1 (en) Electronic apparatus that can convert medical expenses receipt printed on paper into an electronic document and operating method thereof
KR102300475B1 (en) Electronic device that can convert a table-inserted image into an electronic document and operating method thereof
JP6815712B1 (en) Image processing system, image processing method, image processing program, image processing server, and learning model
JP7450131B2 (en) Image processing system, image processing method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant