CN110008944A - OCR recognition methods and device, storage medium based on template matching - Google Patents
- Publication number
- CN110008944A (application number CN201910127136.1A)
- Authority
- CN
- China
- Prior art keywords
- identified
- picture
- identification
- document
- document picture
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Character Input (AREA)
Abstract
This application discloses a template-matching-based OCR recognition method and device, a storage medium, and computer equipment, and relates to the technical field of information processing. The method includes: collecting sample document pictures with different specified typesetting layouts; performing box selection on each sample document picture to obtain the recognition template corresponding to each sample document picture; establishing a recognition template database that stores the recognition template corresponding to each sample document picture; collecting a document picture to be recognized, and recognizing its frame lines and title to obtain its document type; and, according to the document type obtained, calling the corresponding recognition template in the recognition template database to perform OCR recognition on the document picture to be recognized. By establishing a recognition template database, the application adapts to documents with a variety of typesetting formats and improves the accuracy of OCR recognition.
Description
Technical field
This application relates to the technical field of information processing, and in particular to a template-matching-based OCR recognition method and device, a storage medium, and computer equipment.
Background art
Optical character recognition (OCR) refers to obtaining an electronic copy of a paper document through an electronic device (such as a scanner or digital camera), segmenting the character strings in the electronic copy into small pictures that each contain a single character, and then recognizing the segmented characters with some method.
Because of factors such as the variety of character layouts in the pictures to be recognized, existing OCR recognition methods can accurately recognize only pictures with a fixed character layout, such as identity cards and bank cards; they perform poorly on pictures of other documents.
Summary of the invention
In view of this, the application provides a template-matching-based OCR recognition method and device, a storage medium, and computer equipment, whose main purpose is to solve the problem that existing OCR recognition methods recognize poorly.
According to one aspect of the application, a template-matching-based OCR recognition method is provided, the method including:
Collecting sample document pictures with different specified typesetting layouts;
Performing box selection on each sample document picture to obtain the recognition template corresponding to each sample document picture;
Establishing a recognition template database, in which the recognition template corresponding to each sample document picture is stored;
Collecting a document picture to be recognized, and recognizing the frame lines and title of the document picture to be recognized to obtain its document type;
According to the document type obtained, calling the corresponding recognition template in the recognition template database to perform OCR recognition on the document picture to be recognized.
Optionally, recognizing the frame lines and title of the document picture to be recognized to obtain its document type includes:
Binarizing the document picture to be recognized to obtain a binarized table image;
Performing skew correction on the binarized table image with a skew correction algorithm based on perspective transformation;
Extracting the frame lines of the document picture to be recognized from the skew-corrected binarized table image using morphological image processing;
Performing OCR recognition on a preset region of the skew-corrected binarized table image to obtain the title of the document picture to be recognized;
Obtaining the document type of the document picture to be recognized from its frame lines and title.
Optionally, performing OCR recognition on the document picture to be recognized includes:
Performing OCR recognition on the document picture to be recognized using a convolutional recurrent neural network (CRNN) model.
Optionally, the convolutional recurrent neural network model includes a convolutional neural network (CNN), a bidirectional recurrent neural network (LSTM), and a connectionist temporal classification (CTC) model;
Performing OCR recognition on the document picture to be recognized using the convolutional recurrent neural network model includes:
The CNN extracts features from a recognition region of the document picture to be recognized and generates the feature sequence of that recognition region;
The bidirectional LSTM determines the label distribution list corresponding to each feature in the feature sequence;
The CTC model determines the text of the recognition region from the label distribution list corresponding to each feature.
Optionally, collecting the document picture to be recognized includes:
Collecting the document picture to be recognized with a high-definition document camera that can automatically adjust its shooting angle.
Optionally, before the corresponding recognition template in the recognition template database is called to perform OCR recognition on the document picture to be recognized, the method further includes:
Adjusting the brightness and contrast of the document picture to be recognized;
Converting the document picture to be recognized to grayscale;
Receiving the user's angle adjustment instruction for the grayscale document picture to be recognized, and adjusting the angle of the document picture to be recognized.
Optionally, in the recognition template database a first sample document picture is configured with multiple region recognition templates, each of which is used to recognize a partial region of that sample document picture.
Optionally, performing box selection on each sample document picture includes:
Performing overall automatic box selection on each sample document picture to obtain the recognition regions of each sample document picture;
Adjusting the automatically selected recognition regions: merging multiple recognition regions that were wrongly selected into one recognition region, or splitting one wrongly selected recognition region into multiple recognition regions.
According to another aspect of the application, a template-matching-based OCR recognition device is provided, the device including:
A sample document picture collection unit, configured to collect sample document pictures with different specified typesetting layouts;
A recognition template acquiring unit, configured to perform box selection on each sample document picture to obtain the recognition template corresponding to each sample document picture;
A recognition template database establishing unit, configured to establish a recognition template database, in which the recognition template corresponding to each sample document picture is stored;
A document type acquiring unit, configured to collect a document picture to be recognized and to recognize its frame lines and title to obtain its document type;
An OCR recognition unit, configured to call, according to the document type obtained, the corresponding recognition template in the recognition template database to perform OCR recognition on the document picture to be recognized.
Optionally, the document type acquiring unit is further configured to:
Binarize the document picture to be recognized to obtain a binarized table image;
Perform skew correction on the binarized table image with a skew correction algorithm based on perspective transformation;
Extract the frame lines of the document picture to be recognized from the skew-corrected binarized table image using morphological image processing;
Perform OCR recognition on a preset region of the skew-corrected binarized table image to obtain the title of the document picture to be recognized;
Obtain the document type of the document picture to be recognized from its frame lines and title.
Optionally, the OCR recognition unit is further configured to:
Perform OCR recognition on the document picture to be recognized using a convolutional recurrent neural network (CRNN) model.
Specifically, the convolutional recurrent neural network model includes a convolutional neural network (CNN), a bidirectional recurrent neural network (LSTM), and a connectionist temporal classification (CTC) model;
Performing OCR recognition on the document picture to be recognized using the convolutional recurrent neural network model includes:
The CNN extracts features from a recognition region of the document picture to be recognized and generates the feature sequence of that recognition region;
The bidirectional LSTM determines the label distribution list corresponding to each feature in the feature sequence;
The CTC model determines the text of the recognition region from the label distribution list corresponding to each feature.
Optionally, the device further includes:
A picture adjustment unit, configured to adjust the brightness and contrast of the document picture to be recognized;
A grayscale processing unit, configured to convert the document picture to be recognized to grayscale;
An angle adjustment unit, configured to receive the user's angle adjustment instruction for the grayscale document picture to be recognized and to adjust the angle of the document picture to be recognized.
Optionally, in the recognition template database a first sample document picture is configured with multiple region recognition templates, each of which is used to recognize a partial region of that sample document picture.
Specifically, the recognition template acquiring unit is further configured to:
Perform overall automatic box selection on each sample document picture to obtain the recognition regions of each sample document picture;
Adjust the automatically selected recognition regions: merge multiple recognition regions that were wrongly selected into one recognition region, or split one wrongly selected recognition region into multiple recognition regions.
According to yet another aspect of the application, a storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the above template-matching-based OCR recognition method.
According to yet another aspect of the application, computer equipment is provided, including a storage medium, a processor, and a computer program stored on the storage medium and runnable on the processor; the processor implements the above template-matching-based OCR recognition method when executing the program.
With the above technical solution, the template-matching-based OCR recognition method and device, storage medium, and computer equipment provided by the application establish a recognition template database, adapt to documents with a variety of typesetting formats, and improve the accuracy of OCR recognition.
In addition, the application uses a high-definition document camera to capture the picture of the document to be recognized, which can exclude the influence of shooting light and angle on OCR recognition. Specific regions of a document picture can also be recognized based on region recognition templates, which improves recognition efficiency.
The above is only an overview of the technical solution of the application. So that the technical means of the application can be understood more clearly and implemented according to the specification, and so that the above and other objects, features, and advantages of the application are more apparent, specific embodiments of the application are set out below.
Brief description of the drawings
The drawings described here provide a further understanding of the application and form part of it. The illustrative embodiments of the application and their descriptions explain the application and do not unduly limit it. In the drawings:
Fig. 1 shows a flow diagram of a template-matching-based OCR recognition method provided by an embodiment of the application;
Fig. 2 shows a schematic diagram of a sample document provided by an embodiment of the application;
Fig. 3 shows a structural schematic diagram of a template-matching-based OCR recognition device provided by an embodiment of the application.
Detailed description of embodiments
The application is described in detail below with reference to the drawings and in conjunction with the embodiments. Note that, where they do not conflict, the embodiments of the application and the features in them may be combined with one another.
To address the poor recognition performance of current OCR recognition methods, this embodiment provides a template-matching-based OCR recognition method that adapts to documents with a variety of typesetting formats and improves the accuracy of OCR recognition. As shown in Fig. 1, the method includes:
S11: collecting sample document pictures with different specified typesetting layouts;
In practical applications, the sample document pictures with different specified typesetting layouts are collected with a high-definition document camera that can automatically adjust its shooting angle.
S12: performing box selection on each sample document picture to obtain the recognition template corresponding to each sample document picture;
Note that the embodiment collects sample document pictures, determines the recognition region range of each sample document picture, and establishes the recognition template corresponding to the document picture to be recognized. The recognition template includes the coordinate position and region name of each recognition region.
It will be appreciated that in OCR recognition the quality of image segmentation directly affects the recognition rate. When OCR is run on a wrongly segmented image, a correct result usually cannot be obtained. The application therefore first establishes a template for the document picture to be recognized and records in it the coordinate position and region name of each recognition region. Printed regions of the document picture to be recognized need not be recognized: the region name in the template is used directly as the text of the printed region, which improves recognition efficiency.
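As an illustration of what such a template might hold, the following is a minimal Python sketch. The class names, the `handwritten` flag, and all coordinates are assumptions made for the example; the source only states that a template records the coordinate position and region name of each recognition region.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    name: str          # region name recorded in the template
    x: int             # left edge, in pixels (hypothetical values below)
    y: int             # top edge, in pixels
    width: int
    height: int
    handwritten: bool  # assumed flag: True if the region needs OCR


@dataclass
class RecognitionTemplate:
    doc_type: str
    regions: list = field(default_factory=list)


# Hypothetical template for one document layout.
template = RecognitionTemplate(
    doc_type="payment request form",
    regions=[
        Region("Requested amount", 40, 120, 200, 30, handwritten=False),
        Region("requested_amount_value", 250, 120, 300, 30, handwritten=True),
    ],
)

# Printed regions can be answered from the template alone.
printed_text = [r.name for r in template.regions if not r.handwritten]
```

Keeping coordinates and names together means a later recognition step only has to crop the stored boxes rather than segment the image again.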
S13: establishing a recognition template database, in which the recognition template corresponding to each sample document picture is stored;
It will be appreciated that the embodiment establishes a recognition template database containing the recognition templates of a variety of sample document pictures, so that document pictures to be recognized with different typesetting layouts can be recognized accurately.
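A recognition template database of this kind could be sketched as a simple mapping from document type to template. The class name, method names, and stored values below are hypothetical; the source does not describe a storage format.

```python
class RecognitionTemplateDatabase:
    """In-memory stand-in for the recognition template database (assumed design)."""

    def __init__(self):
        self._templates = {}

    def save(self, doc_type, regions):
        # regions maps a region name to an (x, y, width, height) box
        self._templates[doc_type] = dict(regions)

    def lookup(self, doc_type):
        # Returns None when no template matches the document type
        return self._templates.get(doc_type)


db = RecognitionTemplateDatabase()
db.save("payment request form", {"requested amount": (250, 120, 300, 30)})
db.save("reimbursement approval form", {"applicant": (100, 60, 200, 30)})

matched = db.lookup("payment request form")
```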
S14: collecting a document picture to be recognized, and recognizing the frame lines and title of the document picture to be recognized to obtain its document type;
S15: according to the document type obtained, calling the corresponding recognition template in the recognition template database to perform OCR recognition on the document picture to be recognized.
It will be appreciated that the embodiment determines the recognition regions and region names of the document picture to be recognized from the recognition template that matches it, so the recognition regions can be located fairly accurately. OCR recognition is performed on the recognition regions (in practice, the handwritten regions), and the region names in the recognition template are used as the text of the printed regions.
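The split between printed and handwritten regions can be sketched as follows. The function name, the tuple layout, and the stand-in `fake_ocr` callable are assumptions for illustration; a real system would pass in its own OCR model.

```python
def recognize_with_template(regions, ocr):
    """Apply a matched template to a document picture.

    regions: list of (name, box, is_printed) tuples (assumed layout)
    ocr:     callable taking a box and returning recognized text
    """
    results = {}
    for name, box, is_printed in regions:
        if is_printed:
            # Printed region: reuse the region name, no OCR call
            results[name] = name
        else:
            # Handwritten region: run OCR on the cropped box
            results[name] = ocr(box)
    return results


def fake_ocr(box):
    # Placeholder for a real OCR model
    return "text@{},{}".format(box[0], box[1])


regions = [
    ("Applicant", (10, 10, 80, 20), True),
    ("applicant_value", (100, 10, 200, 20), False),
]
result = recognize_with_template(regions, fake_ocr)
```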
The template-matching-based OCR recognition method of the embodiment establishes a recognition template database, adapts to documents with a variety of typesetting formats, and improves the accuracy of OCR recognition.
In an optional implementation of the embodiment, as in the method of Fig. 1, recognizing the frame lines and title of the document picture to be recognized in step S14 to obtain its document type includes:
Binarizing the document picture to be recognized to obtain a binarized table image;
Performing skew correction on the binarized table image with a skew correction algorithm based on perspective transformation;
Extracting the frame lines of the document picture to be recognized from the skew-corrected binarized table image using morphological image processing;
Performing OCR recognition on a preset region of the skew-corrected binarized table image to obtain the title of the document picture to be recognized;
Obtaining the document type of the document picture to be recognized from its frame lines and title.
Note that the frame lines of the document picture to be recognized are the horizontal and vertical ruling lines of the table in the document to be recognized. The frame lines are extracted as follows:
Select a horizontal structuring element and a vertical structuring element, and apply an opening operation with each to the skew-corrected binarized table image, obtaining a table horizontal-line image and a table vertical-line image;
Combine the table horizontal-line image and the table vertical-line image with an AND operation to obtain the table frame image;
Apply thinning to the table frame image and extract the skeleton of the table frame lines, that is, the frame lines of the document picture to be recognized.
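The opening step can be sketched in plain Python on a small binary grid. The function names and the sample image are assumptions, and thinning is omitted. How the two line images are combined is also an assumption here: the sketch keeps all ruling lines with a pixel-wise OR, and additionally computes the pixel-wise AND, which keeps only the points where horizontal and vertical lines cross.

```python
def open_rows(img, k):
    """Opening with a 1 x k horizontal structuring element."""
    out = [[0] * len(row) for row in img]
    for y, row in enumerate(img):
        for x in range(len(row) - k + 1):
            if all(row[x:x + k]):           # erosion: the element fits here
                for i in range(x, x + k):   # dilation: paint the element back
                    out[y][i] = 1
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def open_cols(img, k):
    """Opening with a k x 1 vertical structuring element."""
    return transpose(open_rows(transpose(img), k))


# 5 x 5 binarized image: one horizontal line, one vertical line, one noise pixel
img = [
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
]

horizontal = open_rows(img, 4)   # keeps runs of at least 4 pixels in a row
vertical = open_cols(img, 4)     # keeps runs of at least 4 pixels in a column
frame = [[h | v for h, v in zip(hr, vr)] for hr, vr in zip(horizontal, vertical)]
joints = [[h & v for h, v in zip(hr, vr)] for hr, vr in zip(horizontal, vertical)]
```

The isolated pixel in the top-right corner survives neither opening, which is why long, thin structuring elements are a common way to separate ruling lines from text and noise.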
It will be appreciated that the region at the top center of the document picture to be recognized contains its title. The title determines the major category of the document picture to be recognized, for example the finance category or the article category. The frame lines then determine the subcategory: a payment request form and a reimbursement approval form both belong to the finance category, but each has its own frame lines and therefore corresponds to a different document type.
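The two-stage decision can be sketched as a pair of lookups. Everything in the tables below is hypothetical, including the idea of summarizing the frame lines as a (horizontal line count, vertical line count) signature; the source does not say how frame lines are compared.

```python
# Hypothetical lookup tables; none of these values come from the source.
CATEGORY_KEYWORDS = {
    "finance": ("finance",),
    "article": ("article",),
}
FRAME_TO_TYPE = {
    ("finance", (6, 4)): "payment request form",
    ("finance", (8, 5)): "reimbursement approval form",
}


def category_from_title(title):
    """Stage 1: the recognized title selects the major category."""
    lowered = title.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return None


def document_type(title, frame_signature):
    """Stage 2: the frame signature selects the type within the category."""
    category = category_from_title(title)
    return FRAME_TO_TYPE.get((category, frame_signature))


doc_type = document_type("Finance Department Form", (6, 4))
```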
In another optional implementation of the embodiment, performing OCR recognition on the document picture to be recognized in step S15 includes:
Performing OCR recognition on the document picture to be recognized using a convolutional recurrent neural network (CRNN) model.
Specifically, the convolutional recurrent neural network model includes a convolutional neural network (CNN), a bidirectional recurrent neural network (LSTM), and a connectionist temporal classification (CTC) model;
Performing OCR recognition on the document picture to be recognized using the convolutional recurrent neural network model includes:
The CNN extracts features from a recognition region of the document picture to be recognized and generates the feature sequence of that recognition region;
The bidirectional LSTM determines the label distribution list corresponding to each feature in the feature sequence;
The CTC model determines the text of the recognition region from the label distribution list corresponding to each feature.
Specifically, the label distribution list corresponding to a feature is a softmax vector giving the probability that the feature corresponds to each label. After the probabilities of all features are passed to the CTC model, the most probable labels are output; operations such as removing blanks then yield the final label sequence, that is, the text of the recognition region.
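The final decoding step can be illustrated with greedy (best-path) CTC decoding: take the most probable label at each time step, collapse consecutive repeats, and drop the blank label. This is a simplified sketch of one common decoding strategy, not necessarily the one the application uses; the label set and probabilities are made up.

```python
def ctc_greedy_decode(distributions, labels, blank=0):
    """Greedy CTC decoding over a list of per-step softmax vectors."""
    # Most probable label index at each time step
    best = [max(range(len(d)), key=d.__getitem__) for d in distributions]
    decoded = []
    previous = None
    for index in best:
        # Keep a label only when it differs from the previous step
        # and is not the blank label
        if index != previous and index != blank:
            decoded.append(labels[index])
        previous = index
    return "".join(decoded)


labels = ["-", "a", "b"]          # index 0 is the CTC blank (assumed convention)
distributions = [
    [0.10, 0.80, 0.10],           # a
    [0.20, 0.70, 0.10],           # a (repeat, collapsed)
    [0.90, 0.05, 0.05],           # blank, separates the two a's
    [0.10, 0.60, 0.30],           # a
    [0.10, 0.20, 0.70],           # b
    [0.20, 0.10, 0.70],           # b (repeat, collapsed)
]
text = ctc_greedy_decode(distributions, labels)
```

The blank between the second and third steps is what lets the decoder output two separate `a` characters instead of collapsing them into one.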
It will be appreciated that handwriting is less regular than printed characters, so OCR recognition of handwriting performs poorly. The CTC model in this scheme can automatically align text that is not aligned, which improves the accuracy of OCR recognition.
Preferably, collecting the document picture to be recognized includes:
Collecting the document picture to be recognized with a high-definition document camera that can automatically adjust its shooting angle.
Note that the high-definition document camera can adjust its ISO value or exposure according to the intensity of the ambient light, to improve the quality of the document picture. ISO indicates the light sensitivity of the CCD or CMOS photosensitive element in a digital camera; the higher the ISO value, the more sensitive the photosensitive element.
In general, the lower the ISO value, the higher the quality of the photo and the finer its detail. The higher the ISO value, the brighter the photo; photo quality falls as ISO rises and noise becomes increasingly severe, but a high ISO value can compensate for insufficient light.
The high-definition document camera can also adjust its shooting angle according to the position of the document, so that skewed text in the document does not degrade OCR recognition.
OCR recognition places high demands on the quality of the input image; the user usually needs to provide a high-quality image to obtain good recognition quality. The resolution must not be too low, the colors must not be too rich, the contrast must not be too low, and the text in the image must not be skewed.
Preferably, before the corresponding recognition template in the recognition template database is called to perform OCR recognition on the document picture to be recognized, the embodiment also preprocesses the document picture to be recognized, specifically:
Adjusting the brightness and contrast of the document picture to be recognized;
Converting the document picture to be recognized to grayscale;
Receiving the user's angle adjustment instruction for the grayscale document picture to be recognized, and adjusting the angle of the document picture to be recognized.
In an optional implementation, the application uses the open-source Tesseract-OCR framework to adjust the brightness and contrast of the image automatically. It uses the open-source, cross-platform computer vision library OpenCV to convert the image to grayscale, so that the image becomes black and white with sharp contrast. It then displays an interactive image interface that lets the user correct the image manually: the user can adjust the angle of the image with the built-in functions the application provides, so that the text in the image is no longer skewed.
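The arithmetic behind these preprocessing steps can be sketched in plain Python. This is not the Tesseract-OCR or OpenCV code path named above; the function names, the gain and offset parameters, the luma weights, and the threshold are all assumptions chosen for illustration.

```python
def clamp(value):
    """Keep a pixel value inside the 8-bit range."""
    return max(0, min(255, int(round(value))))


def adjust(value, gain, offset):
    """Linear contrast (gain) and brightness (offset) adjustment."""
    return clamp(gain * value + offset)


def to_gray(pixel):
    """Weighted grayscale conversion of an (R, G, B) pixel."""
    r, g, b = pixel
    return clamp(0.299 * r + 0.587 * g + 0.114 * b)


def binarize(gray, threshold=128):
    """Map a grayscale value to black (0) or white (255)."""
    return 255 if gray >= threshold else 0


# One row of a hypothetical RGB image
row = [(250, 250, 250), (30, 30, 30), (255, 0, 0)]
gray_row = [to_gray(p) for p in row]
stretched = [adjust(g, gain=1.5, offset=-40) for g in gray_row]
binary_row = [binarize(g) for g in stretched]
```

A gain above 1 widens the gap between dark ink and light paper before thresholding, which is the usual reason to adjust contrast ahead of binarization.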
In an optional implementation of the application, a first sample document picture in the recognition template database is configured with multiple region recognition templates, each of which is used to recognize a partial region of that sample document picture.
Note that the first sample document picture is any sample document picture in the recognition template database. Configuring a sample document picture with multiple region recognition templates makes it possible to recognize only part of the document picture to be recognized, which improves the efficiency of OCR recognition.
In practical applications, taking Fig. 2 as an example, the sample document is a bill document configured with region recognition template A and region recognition template B. The recognition region range of region recognition template A includes the requested amount, the requesting department, and the contract number; the recognition region range of region recognition template B includes the requested amount, the requesting department, the applicant, and the department head. Different region recognition templates can be set according to actual business requirements.
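Region recognition templates A and B from this example could be represented as named subsets of one document's regions. The region names follow the example above; the coordinates, dictionary layout, and function name are hypothetical.

```python
# All recognition regions of the sample bill (boxes are made-up values)
ALL_REGIONS = {
    "requested amount": (250, 120, 300, 30),
    "requesting department": (250, 160, 300, 30),
    "contract number": (250, 200, 300, 30),
    "applicant": (250, 240, 300, 30),
    "department head": (250, 280, 300, 30),
}

# Each region recognition template lists the regions it covers
REGION_TEMPLATES = {
    "A": ("requested amount", "requesting department", "contract number"),
    "B": ("requested amount", "requesting department", "applicant",
          "department head"),
}


def regions_for(template_name):
    """Return only the boxes that the chosen region template covers."""
    return {name: ALL_REGIONS[name] for name in REGION_TEMPLATES[template_name]}


regions_a = regions_for("A")
regions_b = regions_for("B")
```

Recognizing with template A would then skip the applicant and department head regions entirely, which is where the efficiency gain comes from.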
In another implementation of this scheme, to make it more efficient to establish the template of the document picture to be recognized, box selection can also be performed on each sample document picture as follows to obtain the recognition template corresponding to each sample document picture:
Performing overall automatic box selection on each sample document picture to obtain the recognition regions of each sample document picture;
Adjusting the automatically selected recognition regions: merging multiple recognition regions that were wrongly selected into one recognition region, or splitting one wrongly selected recognition region into multiple recognition regions.
It will be appreciated that the embodiment performs overall automatic box selection on the document picture to be recognized and then adjusts the result to establish the corresponding recognition template, which includes the coordinate position and region name of each recognition region. With automatic region marking, a box drawn on any part of a region is automatically adjusted to the region's boundary; a box drawn in the blank space outside the region's four boundaries is automatically shrunk to the region's boundary.
Specifically: where one region was marked as several, the overall automatic box selection result is adjusted by merging the wrongly marked regions into one region; where several regions were marked as one, the result is adjusted by splitting the wrongly marked region into multiple regions.
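The merge and split adjustments can be sketched on axis-aligned boxes. The (x, y, width, height) box format, the function names, and the choice to split along a vertical cut are assumptions for illustration.

```python
def merge_boxes(boxes):
    """Merge several (x, y, width, height) boxes into their bounding box."""
    left = min(x for x, _, _, _ in boxes)
    top = min(y for _, y, _, _ in boxes)
    right = max(x + w for x, _, w, _ in boxes)
    bottom = max(y + h for _, y, _, h in boxes)
    return (left, top, right - left, bottom - top)


def split_box_at_x(box, cut_x):
    """Split one box into a left and a right box at the column cut_x."""
    x, y, w, h = box
    if not x < cut_x < x + w:
        raise ValueError("cut_x must fall strictly inside the box")
    return [(x, y, cut_x - x, h), (cut_x, y, x + w - cut_x, h)]


# One field that automatic selection wrongly marked as two regions
merged = merge_boxes([(0, 0, 10, 5), (12, 0, 8, 5)])

# Two fields that automatic selection wrongly marked as one region
parts = split_box_at_x((0, 0, 20, 5), 12)
```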
The OCR recognition method of the embodiment establishes templates that match the documents to be recognized, adapts to documents with a variety of typesetting formats, and improves the recognition rate and accuracy of OCR recognition. Using a high-definition document camera excludes the influence of shooting light and angle on OCR recognition. The OCR recognition method of the embodiment can also recognize specific regions of a document picture based on region recognition templates, which improves recognition efficiency.
Fig. 3 shows a structural schematic diagram of an OCR recognition device based on template matching provided by an embodiment of the present application.
As shown in Fig. 3, the device of the embodiment of the present application includes:
A sample document picture acquisition unit 31, configured to acquire sample document pictures of different specified typesetting formats;
In practical applications, the sample document picture acquisition unit 31 acquires the sample document pictures of different specified typesetting formats through a high-definition camera with an automatic shooting-angle adjustment function.
A recognition template acquisition unit 32, configured to perform frame selection on each sample document picture to obtain the recognition template corresponding to each sample document picture;
It should be noted that the embodiment of the present application acquires a sample document picture, determines the recognition region ranges of the sample document picture, and establishes the recognition template corresponding to the document picture to be recognized; the recognition template includes the coordinate position and region name of each recognition region.
It can be understood that, in OCR recognition, the quality of image segmentation directly affects the OCR recognition rate. When OCR recognition is performed on a wrongly segmented image, a correct recognition result often cannot be obtained. Therefore, the present application first establishes a template of the document picture to be recognized and records the coordinate position and region name of each recognition region in the template. A printed region of the document picture to be recognized does not need to be recognized: the region name in the template is used directly as the text of the printed region, which improves recognition efficiency.
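A recognition template of this kind can be pictured as a small data structure. The sketch below is only an illustration under stated assumptions: the class names, the `printed` flag and the `ocr` callback are hypothetical, and the coordinates are made up. It shows the one behaviour the text describes, namely that a printed region takes its text from the template's region name while only the other regions are sent to OCR.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels, hypothetical format


@dataclass
class Region:
    name: str      # region name recorded in the template
    box: Box       # coordinate position recorded in the template
    printed: bool  # hypothetical flag: True = printed text, False = handwriting


@dataclass
class RecognitionTemplate:
    doc_type: str
    regions: List[Region]


def read_with_template(template: RecognitionTemplate,
                       ocr: Callable[[Box], str]) -> List[Tuple[str, str]]:
    """Return (region name, text) pairs; OCR runs only on non-printed regions."""
    results = []
    for region in template.regions:
        text = region.name if region.printed else ocr(region.box)
        results.append((region.name, text))
    return results
```

Any OCR engine can be plugged in through the `ocr` callback; skipping the printed regions is what saves recognition time.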
A recognition template database establishing unit 33, configured to establish a recognition template database in which the recognition template corresponding to each sample document picture is stored;
It can be understood that the embodiment of the present application establishes a recognition template database containing the recognition templates corresponding to multiple kinds of sample document pictures, so that document pictures to be recognized with different typesetting formats can be recognized accurately.
A document type acquisition unit 34, configured to acquire a document picture to be recognized, and to recognize the frame and title of the document picture to be recognized to obtain the document type of the document picture to be recognized.
An OCR recognition unit 35, configured to call, according to the document type of the document picture to be recognized obtained by recognition, the corresponding recognition template in the recognition template database to perform OCR recognition on the document picture to be recognized.
It can be understood that the embodiment of the present application determines the recognition regions and region names of the document picture to be recognized according to the recognition template that matches it, so the recognition regions can be determined relatively accurately. OCR recognition is performed on the recognition regions (in practice, the handwritten regions), and the region names in the recognition template are used as the text of the printed regions.
The OCR recognition device based on template matching of the embodiment of the present application establishes a recognition template database, adapts to the recognition of documents in a variety of typesetting formats, and improves the accuracy of OCR recognition.
Optionally, the document type acquisition unit 34 is further configured to:
Perform binarization on the document picture to be recognized to obtain a binarized form image;
Perform skew correction on the binarized form image using a skew correction algorithm based on perspective transformation;
Extract the frame of the document picture to be recognized from the skew-corrected binarized form image using morphological image processing;
Perform OCR recognition on a preset region of the skew-corrected binarized form image to obtain the title of the document picture to be recognized;
Obtain the document type of the document picture to be recognized according to the frame and title of the document picture to be recognized.
It should be noted that the frame of the document picture to be recognized refers to the horizontal and vertical ruling lines of the table in the document to be recognized. The frame of the document picture to be recognized is extracted as follows:
A horizontal structuring element and a vertical structuring element are selected, and an opening operation is performed with each of them on the skew-corrected binarized form image to obtain a table horizontal-line image and a table vertical-line image;
An and operation is performed on the table horizontal-line image and the table vertical-line image to obtain a table frame-line image;
Thinning is performed on the table frame-line image to extract the table frame-line skeleton, that is, the frame of the document picture to be recognized.
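The opening step can be illustrated on a tiny binary image in plain Python. This is a sketch under stated assumptions rather than the application's implementation: the image is a list of rows with 1 for ink, the structuring elements are 1 x `min_len` and `min_len` x 1 rectangles, and the two line images are combined here by a pixel-wise union so that the result contains both horizontal and vertical lines (a pixel-wise logical AND would keep only their crossings, so the reading of the "and operation" is an assumption). The thinning step is omitted.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def erode(img: Image, kh: int, kw: int) -> Image:
    """Mark (y, x) when the kh x kw rectangle with top-left corner (y, x) is all ink."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h - kh + 1):
        for x in range(w - kw + 1):
            if all(img[y + dy][x + dx] for dy in range(kh) for dx in range(kw)):
                out[y][x] = 1
    return out


def dilate(img: Image, kh: int, kw: int) -> Image:
    """Paint a kh x kw rectangle whose top-left corner is each marked pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for dy in range(min(kh, h - y)):
                    for dx in range(min(kw, w - x)):
                        out[y + dy][x + dx] = 1
    return out


def opening(img: Image, kh: int, kw: int) -> Image:
    """Erosion followed by dilation: keeps only ink the rectangle fits inside."""
    return dilate(erode(img, kh, kw), kh, kw)


def extract_frame(binary: Image, min_len: int) -> Image:
    horizontal = opening(binary, 1, min_len)  # table horizontal-line image
    vertical = opening(binary, min_len, 1)    # table vertical-line image
    # Assumption: combine by union so both line directions survive.
    return [[a | b for a, b in zip(hr, vr)] for hr, vr in zip(horizontal, vertical)]


# A 5 x 5 table border with one isolated "text" pixel in the middle.
FORM = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
frame = extract_frame(FORM, 4)
```

The isolated pixel is shorter than either structuring element, so both openings discard it and only the ruling lines remain. On real images a library routine would replace these loops.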
It can be understood that the region in the middle of the top of the document picture to be recognized is the title of the document picture to be recognized. The major class of the document picture to be recognized, for example finance or article, can be determined from the title; the subclass can be further determined from the frame. For example, a payment application form and a reimbursement approval form both belong to the finance class of forms, but because each has its own distinct frame, they correspond to different document types.
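The two-level decision (title for the major class, frame for the subclass) can be sketched as a pair of lookups. Everything concrete below is hypothetical: the keyword lists, the idea of summarising a frame as a `(horizontal lines, vertical lines)` count, and the specific counts are invented for illustration, since the application does not say how frames are compared.

```python
from typing import Dict, Optional, Tuple

FrameSignature = Tuple[int, int]  # hypothetical: (horizontal lines, vertical lines)

# Hypothetical keyword table: title keywords -> major class.
CLASS_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "finance": ("payment", "reimbursement"),
    "article": ("article",),
}

# Hypothetical frame table: (major class, frame signature) -> document type.
FRAME_TO_TYPE: Dict[Tuple[str, FrameSignature], str] = {
    ("finance", (6, 4)): "payment application form",
    ("finance", (8, 5)): "reimbursement approval form",
}


def major_class(title: str) -> Optional[str]:
    """Pick the major class from keywords found in the recognized title."""
    lowered = title.lower()
    for cls, keywords in CLASS_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return cls
    return None


def document_type(title: str, frame: FrameSignature) -> Optional[str]:
    """Refine the major class into a document type using the frame."""
    cls = major_class(title)
    if cls is None:
        return None
    return FRAME_TO_TYPE.get((cls, frame))
```

The returned document type would then serve as the key for fetching the matching recognition template from the recognition template database.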
Optionally, the OCR recognition unit 35 is further configured to:
Perform OCR recognition on the document picture to be recognized using a convolutional recurrent neural network model.
Specifically, the convolutional recurrent neural network model includes a convolutional neural network (CNN), a bidirectional recurrent neural network (LSTM) and a connectionist temporal classification (CTC) model;
Performing OCR recognition on the document picture to be recognized using the convolutional recurrent neural network model includes:
The CNN extracts features of the recognition region of the document picture to be recognized and generates the feature sequence of the recognition region;
The bidirectional LSTM determines the label distribution list corresponding to each feature in the feature sequence;
The CTC model determines the text of the recognition region according to the label distribution list corresponding to each feature.
Specifically, the label distribution list corresponding to a feature is a softmax vector that gives the probability of the feature corresponding to each label. After these probabilities for all features are passed to the CTC model, the most probable label is output for each feature, and after operations such as removing blanks the final label sequence, that is, the text of the recognition region, is obtained.
It can be understood that handwritten characters are less regular than printed characters, which makes OCR recognition of handwriting less effective; the CTC model in this solution can automatically align text that is not aligned, which improves the accuracy of OCR recognition.
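The decoding step just described corresponds to what is commonly called greedy (best-path) CTC decoding. The sketch below assumes the per-feature softmax vectors are already available as plain lists and that label index 0 is the CTC blank; the function name and the toy alphabet are illustrative only.

```python
from typing import List, Sequence


def ctc_greedy_decode(distributions: Sequence[Sequence[float]],
                      labels: Sequence[str],
                      blank: int = 0) -> str:
    """Take the most probable label per feature, collapse repeats, drop blanks."""
    best_path = [max(range(len(row)), key=row.__getitem__) for row in distributions]
    decoded: List[str] = []
    previous = None
    for index in best_path:
        # A label is emitted only when it differs from the previous step
        # and is not the blank symbol.
        if index != previous and index != blank:
            decoded.append(labels[index])
        previous = index
    return "".join(decoded)


# Toy alphabet: index 0 is the blank, then "a" and "b".
LABELS = ["-", "a", "b"]
SOFTMAX_ROWS = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, collapsed)
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.60, 0.30],  # a (new, because a blank came in between)
    [0.10, 0.20, 0.70],  # b
    [0.10, 0.10, 0.80],  # b (repeat, collapsed)
    [0.80, 0.10, 0.10],  # blank
]
text = ctc_greedy_decode(SOFTMAX_ROWS, LABELS)
```

The blank between the two runs of "a" is what lets a genuinely doubled character survive the collapsing of repeats.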
OCR recognition places high demands on the quality of the input image; the user often needs to provide a higher-quality image to obtain good recognition quality. The resolution must not be too low, the colors must not be too rich, the contrast must not be too low, and the text in the image must not be skewed.
Optionally, the device further includes:
A picture adjustment unit, configured to adjust the brightness and contrast of the document picture to be recognized;
A grayscale processing unit, configured to perform grayscale processing on the document picture to be recognized;
An angle adjustment unit, configured to receive a user's angle adjustment instruction for the grayscale-processed document picture to be recognized and to adjust the angle of the document picture to be recognized.
The picture adjustment unit of the embodiment of the present application uses the open-source Tesseract-OCR framework to automatically adjust the brightness and contrast of the image. The grayscale processing unit uses the open-source cross-platform computer vision library OpenCV to perform grayscale processing on the image, so that the image colors become black and white with sharp contrast. The angle adjustment unit displays an interactive image interface that lets the user correct the image manually: the user can adjust the angle of the image through built-in functions provided by the application so that the text in the image is no longer skewed.
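For readers unfamiliar with these operations, the arithmetic behind grayscale conversion and a linear brightness and contrast adjustment can be written in a few lines. This sketch deliberately uses neither Tesseract-OCR nor OpenCV; the luminance weights (0.299, 0.587, 0.114) and the `alpha` (contrast gain) and `beta` (brightness offset) parameters are common conventions assumed here, not values given in the application.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def to_gray(pixel: Pixel) -> int:
    """Weighted luminance of an RGB pixel (assumed BT.601-style weights)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def adjust(gray: int, alpha: float = 1.0, beta: float = 0.0) -> int:
    """Linear contrast (alpha) and brightness (beta) change, clamped to 0..255."""
    return max(0, min(255, round(alpha * gray + beta)))


def preprocess(image: List[List[Pixel]],
               alpha: float = 1.0, beta: float = 0.0) -> List[List[int]]:
    """Convert an RGB image to gray, then apply the linear adjustment."""
    return [[adjust(to_gray(p), alpha, beta) for p in row] for row in image]
```

Angle correction is left out because, in the text above, it is driven by the user's manual instruction rather than computed.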
In an optional implementation of the present application, a first sample document picture is configured with multiple region recognition templates in the recognition template database, and each region recognition template is used to recognize a partial region of the sample document picture.
It should be noted that the first sample document picture is any sample document picture in the recognition template database. By configuring multiple region recognition templates for a given sample document picture, partial regions of the document picture to be recognized can be recognized, which improves the efficiency of OCR recognition.
In practical applications, taking Fig. 2 as an example, the sample document is a bill document configured with region recognition template A and region recognition template B. The recognition region range of region recognition template A includes the payment amount, the paying department and the contract number; the recognition region range of region recognition template B includes the payment amount, the paying department, the applicant and the department head. Different region recognition templates can be set according to actual business requirements.
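The relationship between one sample document and its region recognition templates A and B can be written down as a small configuration. The region names follow the example above; the coordinates and the helper `regions_for` are hypothetical and serve only to show that each region recognition template selects a subset of the document's regions.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), made-up coordinates

# All recognition regions of the sample bill document (coordinates are invented).
ALL_REGIONS: Dict[str, Box] = {
    "payment amount": (120, 80, 320, 110),
    "paying department": (120, 120, 320, 150),
    "contract number": (120, 160, 320, 190),
    "applicant": (120, 200, 320, 230),
    "department head": (120, 240, 320, 270),
}

# Each region recognition template lists the regions it should recognize.
REGION_TEMPLATES: Dict[str, List[str]] = {
    "A": ["payment amount", "paying department", "contract number"],
    "B": ["payment amount", "paying department", "applicant", "department head"],
}


def regions_for(template_name: str) -> Dict[str, Box]:
    """Return only the regions covered by the chosen region recognition template."""
    return {name: ALL_REGIONS[name] for name in REGION_TEMPLATES[template_name]}
```

Recognizing with template A would then touch three regions instead of five, which is where the efficiency gain comes from.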
Specifically, the recognition template acquisition unit 32 is further configured to:
Perform overall automatic frame selection on each sample document picture to obtain the recognition regions of each sample document picture;
Adjust the automatically selected recognition regions: merge multiple recognition regions that were wrongly selected separately into one recognition region, and split one recognition region that was wrongly selected as a whole into multiple recognition regions.
The OCR recognition device of the embodiment of the present application establishes templates that match the documents to be recognized, adapts to the recognition of documents in a variety of typesetting formats, and improves the recognition rate and accuracy of OCR recognition. Using a high-definition camera also eliminates the influence of shooting light and angle on OCR recognition. The OCR recognition method of the embodiment of the present application can also recognize a specific region of a document picture based on a region recognition template, which improves recognition efficiency.
It should be noted that, for other descriptions of the functional units involved in the OCR recognition device based on template matching provided by the embodiment of the present application, reference may be made to the corresponding descriptions of Fig. 1 and Fig. 2.
Based on the method shown in Fig. 1, the embodiment of the present application correspondingly further provides a storage medium on which a computer program is stored; when executed by a processor, the program implements the OCR recognition method based on template matching shown in Fig. 1.
Based on this understanding, the technical solution of the present application can be embodied in the form of a software product. The software product can be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the implementation scenarios of the present application.
Based on the method shown in Fig. 1 and the virtual device embodiment shown in Fig. 3, and in order to achieve the above purpose, the embodiment of the present application further provides a computer device, which may specifically be a personal computer, a server, a network device, etc. The physical device includes a storage medium and a processor; the storage medium is configured to store a computer program; the processor is configured to execute the computer program to implement the OCR recognition method based on template matching shown in Fig. 1.
Optionally, the computer device may further include a user interface, a network interface, a camera, a radio frequency (RF) circuit, sensors, an audio circuit, a Wi-Fi module, and so on. The user interface may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface may also include a USB interface, a card reader interface, etc. The network interface may optionally include a standard wired interface and a wireless interface (such as a Bluetooth interface or a Wi-Fi interface).
Those skilled in the art will understand that the computer device structure provided in this embodiment does not limit the physical device, which may include more or fewer components, combine certain components, or use a different arrangement of components.
The storage medium may further include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the computer device and supports the running of the information processing program and other software and/or programs. The network communication module is used to implement communication among the components inside the storage medium, as well as communication with other hardware and software in the physical device.
Through the above description of the implementations, those skilled in the art can clearly understand that the present application can be implemented by software plus a necessary general-purpose hardware platform, or by hardware. By applying the technical solution of the present application,.
It should be noted that the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
In the description of the present application, numerous specific details are set forth. It is understood, however, that embodiments of the present application may be practiced without these specific details. In some instances, well-known methods, structures and techniques are not shown in detail so as not to obscure the understanding of this description. Similarly, it should be understood that, in order to streamline the present application and aid understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the present application the features of the present application are sometimes grouped together in a single embodiment, figure, or description thereof. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single embodiment disclosed above. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present application.
Those skilled in the art will understand that the drawings are only schematic diagrams of a preferred implementation scenario, and that the modules or processes in the drawings are not necessarily required to implement the present application. Those skilled in the art will understand that the modules in a device in an implementation scenario may be distributed in the device of the implementation scenario as described, or may be changed accordingly and located in one or more devices different from this implementation scenario. The modules of the above implementation scenario may be merged into one module or further split into multiple submodules.
The above serial numbers of the present application are for description only and do not represent the merits of the implementation scenarios. What is disclosed above is only several specific implementation scenarios of the present application; however, the present application is not limited thereto, and any changes that those skilled in the art can think of shall fall within the protection scope of the present application.
Claims (10)
1. An OCR recognition method based on template matching, characterized by comprising:
Acquiring sample document pictures of different specified typesetting formats;
Performing frame selection on each sample document picture to obtain the recognition template corresponding to each sample document picture;
Establishing a recognition template database, in which the recognition template corresponding to each sample document picture is stored;
Acquiring a document picture to be recognized, and recognizing the frame and title of the document picture to be recognized to obtain the document type of the document picture to be recognized;
Calling, according to the document type of the document picture to be recognized obtained by recognition, the corresponding recognition template in the recognition template database to perform OCR recognition on the document picture to be recognized.
2. The method according to claim 1, characterized in that recognizing the frame and title of the document picture to be recognized to obtain the document type of the document picture to be recognized comprises:
Performing binarization on the document picture to be recognized to obtain a binarized form image;
Performing skew correction on the binarized form image using a skew correction algorithm based on perspective transformation;
Extracting the frame of the document picture to be recognized from the skew-corrected binarized form image using morphological image processing;
Performing OCR recognition on a preset region of the skew-corrected binarized form image to obtain the title of the document picture to be recognized;
Obtaining the document type of the document picture to be recognized according to the frame and title of the document picture to be recognized.
3. The method according to claim 1, characterized in that performing OCR recognition on the document picture to be recognized comprises:
Performing OCR recognition on the document picture to be recognized using a convolutional recurrent neural network model.
4. The method according to claim 3, characterized in that the convolutional recurrent neural network model includes a convolutional neural network (CNN), a bidirectional recurrent neural network (LSTM) and a connectionist temporal classification (CTC) model;
Performing OCR recognition on the document picture to be recognized using the convolutional recurrent neural network model comprises:
The CNN extracting features of the recognition region of the document picture to be recognized and generating the feature sequence of the recognition region;
The bidirectional LSTM determining the label distribution list corresponding to each feature in the feature sequence;
The CTC model determining the text of the recognition region according to the label distribution list corresponding to each feature.
5. The method according to claim 1, characterized in that, before calling the corresponding recognition template in the recognition template database to perform OCR recognition on the document picture to be recognized, the method further comprises:
Adjusting the brightness and contrast of the document picture to be recognized;
Performing grayscale processing on the document picture to be recognized;
Receiving a user's angle adjustment instruction for the grayscale-processed document picture to be recognized, and adjusting the angle of the document picture to be recognized.
6. The method according to claim 1, characterized in that a first sample document picture is configured with multiple region recognition templates in the recognition template database, and each region recognition template is used to recognize a partial region of the sample document picture.
7. The method according to claim 1, characterized in that performing frame selection on each sample document picture comprises:
Performing overall automatic frame selection on each sample document picture to obtain the recognition regions of each sample document picture;
Adjusting the automatically selected recognition regions: merging multiple recognition regions that were wrongly selected separately into one recognition region, and splitting one recognition region that was wrongly selected as a whole into multiple recognition regions.
8. An OCR recognition device based on template matching, characterized by comprising:
A sample document picture acquisition unit, configured to acquire sample document pictures of different specified typesetting formats;
A recognition template acquisition unit, configured to perform frame selection on each sample document picture to obtain the recognition template corresponding to each sample document picture;
A recognition template database establishing unit, configured to establish a recognition template database in which the recognition template corresponding to each sample document picture is stored;
A document type acquisition unit, configured to acquire a document picture to be recognized, and to recognize the frame and title of the document picture to be recognized to obtain the document type of the document picture to be recognized;
An OCR recognition unit, configured to call, according to the document type of the document picture to be recognized obtained by recognition, the corresponding recognition template in the recognition template database to perform OCR recognition on the document picture to be recognized.
9. A storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the OCR recognition method based on template matching according to any one of claims 1 to 7.
10. A computer device, comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, characterized in that the processor, when executing the program, implements the OCR recognition method based on template matching according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910127136.1A CN110008944B (en) | 2019-02-20 | 2019-02-20 | OCR recognition method and device based on template matching and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910127136.1A CN110008944B (en) | 2019-02-20 | 2019-02-20 | OCR recognition method and device based on template matching and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110008944A true CN110008944A (en) | 2019-07-12 |
CN110008944B CN110008944B (en) | 2024-02-13 |
Family
ID=67165937
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910127136.1A Active CN110008944B (en) | 2019-02-20 | 2019-02-20 | OCR recognition method and device based on template matching and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110008944B (en) |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110443317A (en) * | 2019-08-09 | 2019-11-12 | 上海尧眸电气科技有限公司 | A kind of method, apparatus and electronic equipment of paper shelves electronic data processing |
CN110866457A (en) * | 2019-10-28 | 2020-03-06 | 世纪保众(北京)网络科技有限公司 | Electronic insurance policy obtaining method and device, computer equipment and storage medium |
CN110866388A (en) * | 2019-11-19 | 2020-03-06 | 重庆华龙网海数科技有限公司 | Publishing PDF layout analysis and identification method based on mixing of multiple neural networks |
CN110909733A (en) * | 2019-10-28 | 2020-03-24 | 世纪保众(北京)网络科技有限公司 | Template positioning method and device based on OCR picture recognition and computer equipment |
CN110931097A (en) * | 2019-11-04 | 2020-03-27 | 武汉市纽艾云健康科技有限公司 | Processing and analyzing system for inspection report |
CN111046736A (en) * | 2019-11-14 | 2020-04-21 | 贝壳技术有限公司 | Method, device and storage medium for extracting text information |
CN111047261A (en) * | 2019-12-11 | 2020-04-21 | 青岛盈智科技有限公司 | Warehouse logistics order identification method and system |
CN111126380A (en) * | 2019-12-02 | 2020-05-08 | 贵州电网有限责任公司 | Method and system for identifying signature of nameplate of power equipment |
CN111178365A (en) * | 2019-12-31 | 2020-05-19 | 五八有限公司 | Picture character recognition method and device, electronic equipment and storage medium |
CN111428725A (en) * | 2020-04-13 | 2020-07-17 | 北京令才科技有限公司 | Data structuring processing method and device and electronic equipment |
CN111507324A (en) * | 2020-03-16 | 2020-08-07 | 平安科技(深圳)有限公司 | Card frame identification method, device, equipment and computer storage medium |
CN111680679A (en) * | 2020-06-03 | 2020-09-18 | 重庆数道科技有限公司 | Automatic document identification method based on OCR |
CN111695566A (en) * | 2020-06-18 | 2020-09-22 | 郑州大学 | Method and system for identifying and processing fixed format document |
CN112348022A (en) * | 2020-10-28 | 2021-02-09 | 富邦华一银行有限公司 | Free-form document identification method based on deep learning |
CN112364790A (en) * | 2020-11-16 | 2021-02-12 | 中国民航大学 | Airport work order information identification method and system based on convolutional neural network |
CN112418215A (en) * | 2020-11-17 | 2021-02-26 | 峰米(北京)科技有限公司 | Video classification identification method and device, storage medium and equipment |
CN112508011A (en) * | 2020-12-02 | 2021-03-16 | 上海逸舟信息科技有限公司 | OCR (optical character recognition) method and device based on neural network |
CN112580499A (en) * | 2020-12-17 | 2021-03-30 | 上海眼控科技股份有限公司 | Text recognition method, device, equipment and storage medium |
CN112801084A (en) * | 2021-01-29 | 2021-05-14 | 杭州大拿科技股份有限公司 | Image processing method and device, electronic equipment and storage medium |
CN112818961A (en) * | 2021-03-26 | 2021-05-18 | 北京东方金朔信息技术有限公司 | Image feature identification method and device |
CN112906695A (en) * | 2021-04-14 | 2021-06-04 | 数库(上海)科技有限公司 | Form recognition method adapting to multi-class OCR recognition interface and related equipment |
CN113111829A (en) * | 2021-04-23 | 2021-07-13 | 杭州睿胜软件有限公司 | Method and device for identifying document |
WO2021184578A1 (en) * | 2020-03-17 | 2021-09-23 | 平安科技(深圳)有限公司 | Ocr-based target field recognition method and apparatus, electronic device, and storage medium |
CN113537221A (en) * | 2020-04-15 | 2021-10-22 | 阿里巴巴集团控股有限公司 | Image recognition method, device and equipment |
WO2022001637A1 (en) * | 2020-06-29 | 2022-01-06 | 北京市商汤科技开发有限公司 | Document processing method, device, and apparatus, and computer-readable storage medium |
WO2022062798A1 (en) * | 2020-09-25 | 2022-03-31 | 北京来也网络科技有限公司 | Rpa and ai-based table information extraction method and apparatus, device and medium |
CN114564912A (en) * | 2021-11-30 | 2022-05-31 | 中国电子科技集团公司第十五研究所 | Intelligent checking and correcting method and system for document format |
CN115830620A (en) * | 2023-02-14 | 2023-03-21 | 江苏联著实业股份有限公司 | Archive text data processing method and system based on OCR |
CN116137077A (en) * | 2023-04-13 | 2023-05-19 | 宁波为昕科技有限公司 | Method and device for establishing electronic component library, electronic equipment and storage medium |
US11995905B2 (en) | 2020-02-10 | 2024-05-28 | Beijing Baidu Netcom Science Technology Co., Ltd. | Object recognition method and apparatus, and electronic device and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090263019A1 (en) * | 2008-04-16 | 2009-10-22 | Asaf Tzadok | OCR of books by word recognition |
CN103258198A (en) * | 2013-04-26 | 2013-08-21 | 四川大学 | Extraction method for characters in form document image |
CN103810485A (en) * | 2014-01-22 | 2014-05-21 | 深圳市东信时代信息技术有限公司 | Recognition device, character recognition system and method |
US20150278593A1 (en) * | 2014-03-31 | 2015-10-01 | Abbyy Development Llc | Data capture from images of documents with fixed structure |
CN108549881A (en) * | 2018-05-02 | 2018-09-18 | 杭州创匠信息科技有限公司 | The recognition methods of certificate word and device |
CN109086714A (en) * | 2018-07-31 | 2018-12-25 | 国科赛思(北京)科技有限公司 | Table recognition method, identifying system and computer installation |
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110443317A (en) * | 2019-08-09 | 2019-11-12 | 上海尧眸电气科技有限公司 | A kind of method, apparatus and electronic equipment of paper shelves electronic data processing |
CN110866457A (en) * | 2019-10-28 | 2020-03-06 | 世纪保众(北京)网络科技有限公司 | Electronic insurance policy obtaining method and device, computer equipment and storage medium |
CN110909733A (en) * | 2019-10-28 | 2020-03-24 | 世纪保众(北京)网络科技有限公司 | Template positioning method and device based on OCR picture recognition and computer equipment |
CN110931097A (en) * | 2019-11-04 | 2020-03-27 | 武汉市纽艾云健康科技有限公司 | Processing and analyzing system for inspection report |
CN111046736A (en) * | 2019-11-14 | 2020-04-21 | 贝壳技术有限公司 | Method, device and storage medium for extracting text information |
CN110866388A (en) * | 2019-11-19 | 2020-03-06 | 重庆华龙网海数科技有限公司 | Publishing PDF layout analysis and identification method based on mixing of multiple neural networks |
CN111126380A (en) * | 2019-12-02 | 2020-05-08 | 贵州电网有限责任公司 | Method and system for identifying signature of nameplate of power equipment |
CN111047261B (en) * | 2019-12-11 | 2023-06-16 | 青岛盈智科技有限公司 | Warehouse logistics order identification method and system |
CN111047261A (en) * | 2019-12-11 | 2020-04-21 | 青岛盈智科技有限公司 | Warehouse logistics order identification method and system |
CN111178365A (en) * | 2019-12-31 | 2020-05-19 | 五八有限公司 | Picture character recognition method and device, electronic equipment and storage medium |
US11995905B2 (en) | 2020-02-10 | 2024-05-28 | Beijing Baidu Netcom Science Technology Co., Ltd. | Object recognition method and apparatus, and electronic device and storage medium |
CN111507324A (en) * | 2020-03-16 | 2020-08-07 | 平安科技(深圳)有限公司 | Card frame identification method, device, equipment and computer storage medium |
CN111507324B (en) * | 2020-03-16 | 2024-05-31 | 平安科技(深圳)有限公司 | Card frame recognition method, device, equipment and computer storage medium |
WO2021184578A1 (en) * | 2020-03-17 | 2021-09-23 | 平安科技(深圳)有限公司 | Ocr-based target field recognition method and apparatus, electronic device, and storage medium |
CN111428725A (en) * | 2020-04-13 | 2020-07-17 | 北京令才科技有限公司 | Data structuring processing method and device and electronic equipment |
CN113537221A (en) * | 2020-04-15 | 2021-10-22 | 阿里巴巴集团控股有限公司 | Image recognition method, device and equipment |
CN111680679A (en) * | 2020-06-03 | 2020-09-18 | 重庆数道科技有限公司 | Automatic document identification method based on OCR |
CN111695566B (en) * | 2020-06-18 | 2023-03-14 | 郑州大学 | Method and system for identifying and processing fixed format document |
CN111695566A (en) * | 2020-06-18 | 2020-09-22 | 郑州大学 | Method and system for identifying and processing fixed format document |
WO2022001637A1 (en) * | 2020-06-29 | 2022-01-06 | 北京市商汤科技开发有限公司 | Document processing method, device, and apparatus, and computer-readable storage medium |
WO2022062798A1 (en) * | 2020-09-25 | 2022-03-31 | 北京来也网络科技有限公司 | Rpa and ai-based table information extraction method and apparatus, device and medium |
CN112348022B (en) * | 2020-10-28 | 2024-05-07 | 富邦华一银行有限公司 | Free-form document identification method based on deep learning |
CN112348022A (en) * | 2020-10-28 | 2021-02-09 | 富邦华一银行有限公司 | Free-form document identification method based on deep learning |
CN112364790B (en) * | 2020-11-16 | 2022-10-25 | 中国民航大学 | Airport work order information identification method and system based on convolutional neural network |
CN112364790A (en) * | 2020-11-16 | 2021-02-12 | 中国民航大学 | Airport work order information identification method and system based on convolutional neural network |
CN112418215A (en) * | 2020-11-17 | 2021-02-26 | 峰米(北京)科技有限公司 | Video classification identification method and device, storage medium and equipment |
CN112508011A (en) * | 2020-12-02 | 2021-03-16 | 上海逸舟信息科技有限公司 | OCR (optical character recognition) method and device based on neural network |
CN112580499A (en) * | 2020-12-17 | 2021-03-30 | 上海眼控科技股份有限公司 | Text recognition method, device, equipment and storage medium |
CN112801084A (en) * | 2021-01-29 | 2021-05-14 | 杭州大拿科技股份有限公司 | Image processing method and device, electronic equipment and storage medium |
WO2022161293A1 (en) * | 2021-01-29 | 2022-08-04 | 杭州大拿科技股份有限公司 | Image processing method and apparatus, and electronic device and storage medium |
CN112818961A (en) * | 2021-03-26 | 2021-05-18 | 北京东方金朔信息技术有限公司 | Image feature identification method and device |
CN112906695B (en) * | 2021-04-14 | 2022-03-08 | 数库(上海)科技有限公司 | Form recognition method adapting to multi-class OCR recognition interface and related equipment |
CN112906695A (en) * | 2021-04-14 | 2021-06-04 | 数库(上海)科技有限公司 | Form recognition method adapting to multi-class OCR recognition interface and related equipment |
CN113111829A (en) * | 2021-04-23 | 2021-07-13 | 杭州睿胜软件有限公司 | Method and device for identifying document |
CN114564912A (en) * | 2021-11-30 | 2022-05-31 | 中国电子科技集团公司第十五研究所 | Intelligent checking and correcting method and system for document format |
CN115830620A (en) * | 2023-02-14 | 2023-03-21 | 江苏联著实业股份有限公司 | Archive text data processing method and system based on OCR |
CN115830620B (en) * | 2023-02-14 | 2023-05-30 | 江苏联著实业股份有限公司 | Archive text data processing method and system based on OCR |
CN116137077A (en) * | 2023-04-13 | 2023-05-19 | 宁波为昕科技有限公司 | Method and device for establishing electronic component library, electronic equipment and storage medium |
CN116137077B (en) * | 2023-04-13 | 2023-08-08 | 宁波为昕科技有限公司 | Method and device for establishing electronic component library, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110008944B (en) | 2024-02-13 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110008944A (en) | OCR recognition methods and device, storage medium based on template matching | |
Singh | Practical machine learning and image processing: for facial recognition, object detection, and pattern recognition using Python | |
EP3437019B1 (en) | Optical character recognition in structured documents | |
US9779295B2 (en) | Systems and methods for note content extraction and management using segmented notes | |
US10455163B2 (en) | Image processing apparatus that generates a combined image, control method, and storage medium | |
CN109559344B (en) | Frame detection method, device and storage medium | |
CN110263616A (en) | Character recognition method, device, electronic equipment and storage medium | |
KR20120130684A (en) | Image processing apparatus, image processing method, and computer readable medium | |
CN108304562B (en) | Question searching method and device and intelligent terminal | |
CN113223025A (en) | Image processing method and device, and neural network training method and device | |
CN116092231A (en) | Ticket identification method, ticket identification device, terminal equipment and storage medium | |
JP6778314B1 (en) | Image processing system, image processing method, and image processing program | |
US20160343142A1 (en) | Object Boundary Detection in an Image | |
JP5566971B2 (en) | Information processing program, information processing apparatus, and character recognition method | |
JP5878004B2 (en) | Multiple document recognition system and multiple document recognition method | |
CN106803269B (en) | Method and device for perspective correction of document image | |
CN113159029A (en) | Method and system for accurately capturing local information in picture | |
CN114730499A (en) | Image recognition method and device, training method, electronic device and storage medium | |
JP4507673B2 (en) | Image processing apparatus, image processing method, and program | |
JP2004199200A (en) | Pattern recognition device, imaging apparatus, information processing system, pattern recognition method, recording medium and program | |
US11323627B2 (en) | Method and electronic device for applying beauty effect setting | |
KR102352726B1 (en) | Electronic apparatus that can convert medical expenses receipt printed on paper into an electronic document and operating method thereof | |
KR102300475B1 (en) | Electronic device that can convert a table-inserted image into an electronic document and operating method thereof | |
JP6815712B1 (en) | Image processing system, image processing method, image processing program, image processing server, and learning model | |
JP7450131B2 (en) | Image processing system, image processing method, and program | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |