CN107622263A - Character recognition method and device for document images
- Publication number: CN107622263A
- Application number: CN201710091081.4A
- Authority: CN (China)
- Keywords: character, recognition result, image, module, error correction
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The present invention relates to a character recognition method and device for document images. The method includes: obtaining a bank document image; recognizing the characters in the bank document image to obtain a recognition result; determining, according to the recognition result, the bank identifier corresponding to the bank document image; retrieving the error-correction library corresponding to the bank identifier; and correcting the error character strings in the recognition result according to the retrieved error-correction library to obtain a correct recognition result. In the method and device provided by the invention, the error-correction library is looked up by the bank identifier corresponding to the bank document image, and the character error formats in the retrieved library are used to correct the recognition result of the bank document image, which improves the accuracy of the recognition result.
Description
Technical field
The present invention relates to the technical field of image recognition, and more particularly to a character recognition method and device for document images.
Background
With the development of computer technology, more and more information is processed by computers. In image recognition, software is sometimes needed to recognize the characters in an image in order to obtain the character information it contains. However, a recognition result obtained by software may contain some incorrectly recognized characters.
To ensure that the recognition result is correct, the incorrectly recognized characters must be found and modified. At present, these characters are generally found and modified manually, and manually modifying the incorrectly recognized characters in a recognition result is inefficient.
Summary of the invention
In view of this, and to address the low efficiency of manually modifying incorrectly recognized characters in a recognition result, it is necessary to provide a character recognition method and device for document images.
A character recognition method for document images, the method including:
Obtaining a bank document image;
Recognizing the characters in the bank document image to obtain a recognition result;
Determining, according to the recognition result, the bank identifier corresponding to the bank document image;
Querying the error-correction library corresponding to the bank identifier;
Correcting the error character strings in the recognition result according to the character error formats in the error-correction library to obtain a correct recognition result.
In one of the embodiments, after obtaining the bank document image, the method further includes:
Extracting the seal image region in the bank document image;
Obtaining the highest similarity among the similarities between the seal image region and each pre-stored seal image;
Comparing the highest similarity with a preset similarity;
If the highest similarity is greater than the preset similarity, verification passes, and the step of recognizing the characters in the bank document image to obtain a recognition result is performed.
In one of the embodiments, after recognizing the characters in the bank document image to obtain a recognition result, the method further includes:
Extracting the fixed-format data in the recognition result;
Counting, among the extracted fixed-format data, the number of items that satisfy the corresponding format semantics;
Generating a pre-judged accuracy from the counted number and the total number of extracted fixed-format data items;
Comparing the pre-judged accuracy with a preset accuracy threshold;
If the pre-judged accuracy is greater than or equal to the preset accuracy threshold, performing the step of determining, according to the recognition result, the bank identifier corresponding to the bank document image;
If the pre-judged accuracy is less than the preset accuracy threshold, generating an error reminder message.
In one of the embodiments, correcting the error character strings in the recognition result according to the character error formats in the error-correction library to obtain a correct recognition result includes:
Querying the recognition result for error character strings that match the character error formats in the error-correction library;
Querying the error-correction library for the correct character string corresponding to the character error format;
Modifying the error character string to the correct character string to obtain a correct recognition result.
In one of the embodiments, after querying the recognition result for error character strings that match the character error formats in the error-correction library, the method further includes:
Marking the determined error character strings in the recognition result;
Sending the marked recognition result to a terminal logged in with an auditor account for display;
Receiving the confirmation instruction returned by the terminal logged in with the auditor account, and, according to the confirmation instruction, performing the step of querying the error-correction library for the correct character string corresponding to the character error format.
In the above character recognition method for document images, the characters in the obtained bank document image are recognized to obtain a recognition result, the bank identifier corresponding to the bank document image is determined from the recognition result, and the error-correction library corresponding to the bank identifier is queried. Each bank identifier corresponds to a different error-correction library, and each library stores character error formats together with the correct character string corresponding to each format. Modifying the error character strings in the recognition result according to the character error formats in the library improves the efficiency of modifying incorrectly recognized characters, and correcting the recognition result of the bank document image in this way to obtain a correct recognition result improves the accuracy of the recognition result.
A character recognition device for document images, the device including:
An image acquisition module, for obtaining a bank document image;
An image recognition module, for recognizing the characters in the bank document image to obtain a recognition result;
An identifier determination module, for determining, according to the recognition result, the bank identifier corresponding to the bank document image;
A library query module, for querying the error-correction library corresponding to the bank identifier;
A character correction module, for correcting the error character strings in the recognition result according to the character error formats in the error-correction library to obtain a correct recognition result.
In one of the embodiments, the device further includes:
A region extraction module, for extracting the seal image region in the bank document image;
A similarity acquisition module, for obtaining the highest similarity among the similarities between the seal image region and each pre-stored seal image;
A similarity comparison module, for comparing the highest similarity with a preset similarity;
The image recognition module is further configured so that, if the highest similarity is greater than the preset similarity, verification passes and the characters in the bank document image are recognized to obtain a recognition result.
In one of the embodiments, the device further includes:
A data extraction module, for extracting the fixed-format data in the recognition result;
A data pre-judgment module, for counting, among the extracted fixed-format data, the number of items that satisfy the corresponding format semantics, and for generating a pre-judged accuracy from the counted number and the total number of extracted fixed-format data items;
A threshold comparison module, for comparing the pre-judged accuracy with a preset accuracy threshold;
An information generation module, for generating an error reminder message if the pre-judged accuracy is less than the preset accuracy threshold;
The identifier determination module is further configured to determine, according to the recognition result, the bank identifier corresponding to the bank document image if the pre-judged accuracy is greater than or equal to the preset accuracy threshold.
In one of the embodiments, the character correction module includes:
An error character query module, for querying the recognition result for error character strings that match the character error formats in the error-correction library;
A correct character query module, for querying the error-correction library for the correct character string corresponding to the character error format;
An error character modification module, for modifying the error character string to the correct character string to obtain a correct recognition result.
In one of the embodiments, the character correction module further includes:
An error character marking module, for marking the determined error character strings in the recognition result;
A recognition result sending module, for sending the marked recognition result to a terminal logged in with an auditor account for display;
The correct character query module is further configured to receive the confirmation instruction returned by the terminal logged in with the auditor account and, according to the confirmation instruction, to query the error-correction library for the correct character string corresponding to the character error format.
In the above character recognition device for document images, the characters in the obtained bank document image are recognized to obtain a recognition result, the bank identifier corresponding to the bank document image is determined from the recognition result, and the error-correction library corresponding to the bank identifier is queried. Each bank identifier corresponds to a different error-correction library, and each library stores character error formats together with the correct character string corresponding to each format. Modifying the error character strings in the recognition result according to the character error formats in the library improves the efficiency of modifying incorrectly recognized characters, and correcting the recognition result of the bank document image in this way to obtain a correct recognition result improves the accuracy of the recognition result.
Brief description of the drawings
Fig. 1 is a diagram of the application environment of the character recognition method for document images in one embodiment;
Fig. 2 is a structural block diagram of the server in the character recognition system for document images in one embodiment;
Fig. 3 is a flow diagram of the character recognition method for document images in one embodiment;
Fig. 4 is a flow diagram of the step of verifying the bank document image in one embodiment;
Fig. 5 is a flow diagram of the accuracy pre-judgment step in one embodiment;
Fig. 6 is a flow diagram of the step of marking error character strings in one embodiment;
Fig. 7 is a structural block diagram of the character recognition device for document images in one embodiment;
Fig. 8 is a structural block diagram of the character recognition device for document images in another embodiment;
Fig. 9 is a structural block diagram of the character correction module in one embodiment.
Detailed description
To make the purpose, technical solution and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.
Fig. 1 is a diagram of the application environment of the character recognition method for document images in one embodiment. Referring to Fig. 1, the method is applied to a character recognition system for document images. The system includes a terminal 110 and a server 120, and terminal 110 is connected to server 120 through a network. Terminal 110 may be a fixed terminal or a mobile terminal. A fixed terminal may specifically be at least one of a printer, a scanner and a monitor; a mobile terminal may specifically be at least one of a tablet computer, a smartphone, a personal digital assistant and a digital camera.
Fig. 2 is a schematic diagram of the internal structure of server 120 in the character recognition system of Fig. 1 in one embodiment. As shown in Fig. 2, server 120 includes a processor, a non-volatile storage medium, an internal memory and a network interface connected by a system bus. The non-volatile storage medium of server 120 stores an operating system and a database, and also a character recognition device for document images, which is used to implement a character recognition method for document images. The processor provides computing and control capability and supports the operation of the whole server 120. The internal memory of server 120 provides the running environment for the character recognition device stored in the non-volatile storage medium; computer-readable instructions may be stored in the internal memory and, when executed by the processor, cause the processor to perform a character recognition method for document images. The network interface is used for network communication with terminal 110.
As shown in Fig. 3, in one embodiment, a character recognition method for document images is provided. This embodiment is illustrated with the method applied to server 120 in the character recognition system of Fig. 1. The method specifically includes the following:
S302, obtain a bank document image.
Specifically, when terminal 110 captures a bank document image, it sends the captured image to server 120, and server 120 receives the bank document image sent by terminal 110. A bank document image is an image obtained by photographing or scanning a bank document, and it contains characters. The bank document may specifically be a bank statement. "Character" is the general term for letters and symbols of all kinds, including at least one of the scripts of various countries, punctuation marks, graphic symbols and digits.
S304, recognize the characters in the bank document image to obtain a recognition result.
Specifically, after obtaining the bank document image, server 120 binarizes it, converting it into a black-and-white image. Server 120 scans the characters in the bank document image, extracts the character features of each character, queries the character feature library for the character corresponding to those features, and takes the character found as the recognition result. The character features may specifically be the character boundary, the character shape and the character skeleton. The character skeleton is a representation of the character using the minimum number of pixels.
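The binarization mentioned in S304 can be illustrated with a minimal sketch. The text does not say how the threshold is chosen, so the fixed threshold of 128 and the function name `binarize` are assumptions made only for illustration.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel (0-255) to black (0) or white (255).

    `gray` is a list of rows, each row a list of pixel intensities.
    The fixed threshold is an assumption; the source does not specify one.
    """
    return [[255 if px >= threshold else 0 for px in row] for row in gray]


if __name__ == "__main__":
    sample = [[0, 127, 128, 255]]
    print(binarize(sample))
```

A real system would more likely pick the threshold adaptively per image; the fixed value only keeps the sketch short.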
In one embodiment, S304 specifically further includes: segmenting the bank document image by character rows and character columns to obtain single-character regions; extracting the image information features in each single-character region; and recognizing the single character according to the extracted image information features to obtain the recognition result.
Specifically, server 120 segments the bank document image by rows and columns to obtain single-character regions and then extracts the image information features in each region. The image information features may specifically include at least one of the character boundary, the font, the character skeleton and the pixels of the character skeleton. The character skeleton is the frame that represents the character with the minimum number of pixels. Server 120 thins the character in the single-character region and extracts its character skeleton.
A character recognition library is provided in server 120. The library contains image information features and characters, stored in correspondence with each other. Server 120 queries the character recognition library for the character corresponding to the extracted image information features and takes the character found as the recognition result.
In one embodiment, the extracted image information feature is the character skeleton, and server 120 queries the character recognition library for the character that matches the character skeleton. Specifically, server 120 may compute the similarity between the character skeleton and each character in the character recognition library and select the character with the highest similarity to the skeleton as the character found.
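The skeleton lookup described above amounts to choosing the library entry with the highest similarity. The sketch below assumes a skeleton is represented as a set of (row, column) pixel coordinates and uses Jaccard overlap as the similarity measure; both choices, and the names `jaccard` and `best_match`, are illustrative assumptions, since the text does not name a similarity measure.

```python
def jaccard(a, b):
    """Overlap of two pixel sets: |a & b| / |a | b| (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_match(skeleton, library):
    """Return the character whose stored skeleton is most similar to `skeleton`.

    `library` maps each character to its skeleton pixel set (hypothetical layout).
    """
    return max(library, key=lambda ch: jaccard(skeleton, library[ch]))


if __name__ == "__main__":
    library = {
        "I": {(0, 0), (1, 0), (2, 0)},
        "L": {(0, 0), (1, 0), (2, 0), (2, 1)},
    }
    print(best_match({(0, 0), (1, 0), (2, 0), (2, 1)}, library))
```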
S306, determine, according to the recognition result, the bank identifier corresponding to the bank document image.
Specifically, the bank document image contains the characters of the bank identifier, so the recognition result that server 120 obtains by recognizing the bank document image also contains those characters. Server 120 extracts from the recognition result the bank identifier corresponding to the bank document image. The bank identifier is specifically any one of the bank's full name, the bank's short name and the English abbreviation of the bank's name.
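One simple way to picture S306 is a scan of the recognized text for known bank names. The alias table below is hypothetical (the source only says the identifier may be a full name, a short name or an English abbreviation), and plain substring search is an assumption rather than the method the text prescribes.

```python
# Hypothetical alias table: canonical identifier -> names that may appear
# on the document (full name, short name, English abbreviation).
KNOWN_BANKS = {
    "ABC": ["Agricultural Bank of China", "Agricultural Bank", "ABC"],
    "ICBC": ["Industrial and Commercial Bank of China", "ICBC"],
}


def find_bank_identifier(recognized_text, known_banks=KNOWN_BANKS):
    """Return the first bank identifier whose alias occurs in the text, else None."""
    for bank_id, aliases in known_banks.items():
        if any(alias in recognized_text for alias in aliases):
            return bank_id
    return None


if __name__ == "__main__":
    print(find_bank_identifier("Agricultural Bank of China statement 20161123"))
```

Substring search can misfire on short abbreviations, so a production system would probably restrict the search to the document header.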
S308, query the error-correction library corresponding to the bank identifier.
Specifically, multiple error-correction libraries are stored in server 120, each corresponding to one bank identifier; that is, the error-correction library differs from one bank identifier to another. Each library stores the correspondence between character error formats and correct character strings. For example, "super * silver" corresponds to "super Net silver", where "*" stands for zero, one or more characters. After extracting the bank identifier from the recognition result, server 120 queries the error-correction library corresponding to that bank identifier.
S310, correct the error character strings in the recognition result according to the character error formats in the error-correction library to obtain a correct recognition result.
Specifically, server 120 extracts the character error formats from the error-correction library and queries the recognition result for character strings that match them; any character string found is an error character string. If one is found, server 120 extracts from the error-correction library the correct character string corresponding to the character error format and modifies the character string found in the recognition result to the extracted correct character string.
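Steps S308 and S310 can be sketched as a per-bank lookup followed by pattern-based replacement. The sketch assumes the recognition result is a list of text fields and that a character error format must match a whole field; the bank key "BANK_A", the dictionary layout and the function names are hypothetical. The only wildcard handled is "*" (zero or more characters), as described in the text.

```python
import re

# Hypothetical layout: bank identifier -> list of
# (character error format, correct character string) pairs.
ERROR_LIBRARIES = {
    "BANK_A": [("super * silver", "super Net silver")],
}


def format_to_regex(error_format):
    """Turn a character error format into a regex; '*' matches zero or more characters."""
    parts = [".*" if ch == "*" else re.escape(ch) for ch in error_format]
    return re.compile("".join(parts))


def correct_fields(fields, bank_id, libraries=ERROR_LIBRARIES):
    """Replace every field that fully matches an error format of the given bank."""
    rules = [(format_to_regex(fmt), right) for fmt, right in libraries.get(bank_id, [])]
    corrected = []
    for field in fields:
        for regex, right in rules:
            if regex.fullmatch(field):  # whole-field match is an assumption
                field = right
                break
        corrected.append(field)
    return corrected


if __name__ == "__main__":
    print(correct_fields(["date", "super X silver"], "BANK_A"))
```

Matching whole fields keeps a greedy wildcard from swallowing neighboring text; a bank with no library simply leaves the fields unchanged.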
In this embodiment, the characters in the obtained bank document image are recognized to obtain a recognition result, the bank identifier corresponding to the bank document image is determined from the recognition result, and the error-correction library corresponding to the bank identifier is queried. Each bank identifier corresponds to a different error-correction library, and each library stores character error formats together with the correct character string corresponding to each format. Modifying the error character strings in the recognition result according to the character error formats in the library improves the efficiency of modifying incorrectly recognized characters, and correcting the recognition result of the bank document image in this way to obtain a correct recognition result improves the accuracy of the recognition result.
As shown in Fig. 4, in one embodiment, a step of verifying the bank document image is further included after S302. This step specifically includes the following:
S402, extract the seal image region in the bank document image.
S404, obtain the highest similarity among the similarities between the seal image region and each pre-stored seal image.
Specifically, server 120 locates the seal image region in the bank document image and extracts it. Seal images are pre-stored in server 120, each corresponding to a bank identifier. Server 120 computes the similarity between the seal image region and each pre-stored seal image and selects the highest of the computed similarities.
S406, compare the highest similarity with a preset similarity.
Specifically, a preset similarity is set in server 120 and is used to verify the authenticity of the bank document image: if the similarity is greater than the preset similarity, the bank document image is genuine; if the similarity is less than or equal to the preset similarity, the bank document image is fake. Server 120 compares the highest similarity between the seal image region and the pre-stored seal images with the preset similarity.
S408, if the highest similarity is greater than the preset similarity, verification passes, and the characters in the bank document image are recognized to obtain a recognition result.
Specifically, server 120 compares the highest similarity with the preset similarity. If the highest similarity is greater than the preset similarity, the bank document image is genuine, verification passes, and the characters in the bank document image are recognized. If the highest similarity is less than or equal to the preset similarity, the bank document image is fake, verification fails, the recognition process ends, and a verification-failed message may be generated.
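The verification flow S402 to S408 reduces to taking the maximum similarity over the stored seals and comparing it strictly against the preset value. In the sketch below the similarity measure (`pixel_agreement`, the fraction of equal pixels in two equally sized flattened binary images) is an assumption, as are the function names and the sample data; the text does not specify how similarity is computed.

```python
def pixel_agreement(a, b):
    """Fraction of positions where two equal-length binary images agree (assumed measure)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def verify_seal(seal_region, stored_seals, preset_similarity, similarity=pixel_agreement):
    """Return (passed, best_bank_id) for a seal region.

    `stored_seals` maps bank identifiers to pre-stored seal images.
    Verification passes only if the highest similarity is strictly greater
    than the preset similarity, as stated in S408.
    """
    if not stored_seals:
        return False, None
    scores = {bank_id: similarity(seal_region, img) for bank_id, img in stored_seals.items()}
    best_id = max(scores, key=scores.get)
    return scores[best_id] > preset_similarity, best_id


if __name__ == "__main__":
    stored = {"ABC": [1, 1, 0, 0], "ICBC": [0, 0, 1, 1]}
    print(verify_seal([1, 1, 0, 1], stored, preset_similarity=0.7))
```

A highest similarity exactly equal to the preset value fails verification, which mirrors the "less than or equal" branch in the text.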
In one embodiment, server 120 may also extract the seal image region from an image of a fixed-format certificate and judge the authenticity of the certificate by the similarity between the seal image region and the pre-stored seal images. The fixed-format certificate may specifically be at least one of a property ownership certificate, a land certificate and a driver's license.
In this embodiment, the seal image region is extracted from the bank document image, and the authenticity of the bank document image is verified by the similarity between the seal image region and the pre-stored seal images. When verification passes and the bank document image is determined to be genuine, the characters in the bank document image are recognized. When verification fails and the bank document image is determined to be fake, the recognition process ends, which saves the computing resources that would be spent recognizing a fake bank document and improves the utilization of computing resources.
As shown in Fig. 5, in one embodiment, an accuracy pre-judgment step is further included after S304. This step specifically includes the following:
S502, extract the fixed-format data in the recognition result.
S504, count, among the extracted fixed-format data, the number of items that satisfy the corresponding format semantics; generate a pre-judged accuracy from the counted number and the total number of extracted fixed-format data items.
Specifically, the recognition result contains fixed-format data. Server 120 extracts the fixed-format data and counts the total number of extracted items. It also counts the number of extracted items that satisfy the corresponding format semantics, which is the number of correct fixed-format data items. Dividing the number of correct items by the total number of items gives the pre-judged accuracy. The format semantics are the fixed format that the fixed-format data must conform to, or the data relationships between fixed-format data items.
In one embodiment, server 120 checks the extracted fixed-format data to detect whether each item conforms to the fixed format, and counts the number of items that do. For example, the date format is 8 characters, such as "20161123"; if the extracted date data contains a 9-character date, that date is wrong data. If 10 dates are extracted and 6 of them are correct, the pre-judged accuracy is 60%.
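The date example above can be reproduced in a few lines. The 8-character rule comes from the text; additionally requiring all characters to be digits is an assumption, and the function names are illustrative.

```python
def is_valid_date(value):
    """A date is taken as valid if it has exactly 8 characters, all digits.

    The length rule is from the text; the digits-only check is an assumption.
    """
    return len(value) == 8 and value.isdigit()


def prejudged_accuracy(values, is_valid):
    """Number of items satisfying the format check divided by the total number of items."""
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)


if __name__ == "__main__":
    # 10 extracted dates, 6 with 8 characters and 4 with 9 characters.
    dates = ["20161123"] * 6 + ["201611230"] * 4
    print(prejudged_accuracy(dates, is_valid_date))
```

With 6 valid dates out of 10 the function returns 0.6, matching the 60% in the example.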
In one embodiment, server 120 extracts the balance data and the surplus data and detects whether the balance data and the surplus data match.
S506, judge whether the pre-judged accuracy is less than the preset accuracy threshold; if not, perform S508; if so, perform S510.
S508, determine, according to the recognition result, the bank identifier corresponding to the bank document image.
S510, generate an error reminder message.
Specifically, server 120 compares the pre-judged accuracy with the preset accuracy threshold. When the pre-judged accuracy is greater than or equal to the preset accuracy threshold, server 120 extracts from the recognition result the bank identifier corresponding to the bank document image. When the pre-judged accuracy is less than the preset accuracy threshold, the recognition result contains too many recognition errors, so a recognition error reminder message is generated. The error reminder message indicates that the pre-judgment failed, and the process of correcting the recognition result ends.
In this embodiment, before the recognition result is corrected, its accuracy is first pre-judged, and the pre-judged accuracy is used to decide whether to correct the recognition result. When the pre-judged accuracy is less than the preset accuracy threshold, there are too many errors and the characters in the bank document image need to be recognized again. When the pre-judged accuracy is greater than or equal to the preset accuracy threshold, the existing errors are within the correctable range, and the recognition result can be corrected to obtain a correct recognition result.
As shown in Fig. 6, in one embodiment, S310 specifically further includes a step of marking error character strings. This step specifically includes the following:
S602, query the recognition result for error character strings that match the character error formats in the error-correction library.
Specifically, the error-correction library stores various error character string formats, together with the correct character string that matches each format. Server 120 queries the recognition result for character strings that conform to an error character format in the library; any character string found is an error character string. For example, the error-correction library of the Agricultural Bank stores the error character string format "surplus gold+", where "+" stands for one or more characters. Server 120 queries the recognition result for characters that match "surplus gold+"; if the character string found is "surplus gold A", then "surplus gold A" is the determined error character string.
S604, mark the determined error character strings in the recognition result.
Specifically, after determining an error character string in the recognition result, server 120 marks it so that it is displayed in a form different from correct character strings. The marking may specifically be an underline added to the error character string, or a background color, different from the character color, added to the error character string.
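A plain-text stand-in for the marking in S602 and S604 is sketched below. It assumes the recognition result is a list of text fields, that an error format must match a whole field, and that marking is done by wrapping the field in "[[" and "]]" instead of an underline or background color; these choices and the function names are illustrative only. The "+" wildcard (one or more characters) follows the text.

```python
import re


def format_to_regex(error_format):
    """Turn an error format into a regex: '+' is one or more chars, '*' is zero or more."""
    wildcards = {"+": ".+", "*": ".*"}
    return re.compile("".join(wildcards.get(ch, re.escape(ch)) for ch in error_format))


def mark_errors(fields, error_formats, open_mark="[[", close_mark="]]"):
    """Wrap every field that fully matches an error format in visible markers."""
    regexes = [format_to_regex(fmt) for fmt in error_formats]
    return [
        f"{open_mark}{field}{close_mark}" if any(r.fullmatch(field) for r in regexes) else field
        for field in fields
    ]


if __name__ == "__main__":
    print(mark_errors(["date", "surplus gold A"], ["surplus gold+"]))
```

The marked list is what would be sent to the auditor's terminal; the unmarked original can be kept alongside it so that the later replacement step works on clean text.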
S606, labeled recognition result is sent to the terminal for being logged in auditor's account and shown.
Specifically, server 120, which is inquired about, is logged in the terminal 110 of auditor's account, and server 120 is by labeled identification
As a result send to the terminal 110 for being logged in auditor's account.The terminal 110 for being logged in auditor's account receives server 120
After the recognition result of transmission, recognition result is shown.Auditor corresponding to auditor's account is by terminal 110 to mark
Error character string in recognition result is checked.
S608, receive the confirmation instruction returned by the terminal logged in to the auditor account and, according to the confirmation instruction, query the error correction character library for the correct character string corresponding to the character error form.
Specifically, after displaying the marked recognition result, the terminal 110 logged in to the auditor account can trigger a confirmation instruction and return it to server 120. The confirmation instruction triggers server 120 to correct the error character strings marked in the recognition result. After receiving the confirmation instruction returned by terminal 110, server 120 queries the error correction character library for the character error form that matches the determined error character string and extracts the correct character string corresponding to that character error form. The extracted correct character string is the correct character string corresponding to the determined error character string.
S610, replace the determined error character string with the queried correct character string to obtain the correct recognition result.
Specifically, server 120 replaces the determined error character string with, or modifies it into, the queried correct character string to obtain the correct recognition result. For example, suppose the determined error character string is "surplus gold A". The character error form found in the error correction character library that matches "surplus gold A" is "surplus gold+", and the correct character string corresponding to "surplus gold+" in the error correction character library is "balance". The string "surplus gold A" in the recognition result is therefore replaced with "balance".
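The lookup and replacement in S608 and S610 can be illustrated with a minimal Python sketch. It is not the patented implementation: modelling a character error form as a regular expression, reading the trailing "+" in "surplus gold+" as one misrecognized character, and the names `ERROR_CORRECTION_LIBRARY` and `correct_recognition_result` are all assumptions made for illustration.

```python
import re

# Hypothetical error correction character library for one bank. Each entry pairs
# a character error form (modelled as a regular expression, which is an
# assumption) with the correct character string that replaces any match.
ERROR_CORRECTION_LIBRARY = [
    # "surplus gold" followed by one arbitrary misrecognized character.
    (re.compile(r"surplus gold \S"), "balance"),
]


def correct_recognition_result(recognition_result, library):
    """Replace every error character string that matches an error form."""
    corrected = recognition_result
    for error_form, correct_string in library:
        corrected = error_form.sub(correct_string, corrected)
    return corrected


print(correct_recognition_result("surplus gold A: 1,024.00", ERROR_CORRECTION_LIBRARY))
```

Text that matches no character error form passes through unchanged, so the correction only touches strings the library explicitly lists.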
In this embodiment, the error character strings in the recognition result are marked, and the marked recognition result is sent to the terminal logged in to the auditor account for display. The auditor reviews the marked error character strings through that terminal. After the review is complete, the confirmation instruction returned by the terminal is received, and the error character strings are corrected according to the confirmation instruction. This improves the accuracy with which error character strings are determined and further improves the accuracy of error correction for the recognition result.
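The mark, review and confirm sequence of this embodiment could be sketched as follows. This is an illustrative sketch only: the `[[` and `]]` markers stand in for the underline or background color, the `auditor_confirms` callback stands in for the round trip between server 120 and terminal 110, and both function names are hypothetical.

```python
import re

# Hypothetical library entries: (character error form as a regex, correct string).
LIBRARY = [(re.compile(r"surplus gold \S"), "balance")]


def mark_error_strings(recognition_result, library, left="[[", right="]]"):
    """Wrap every matched error character string in visible markers."""
    marked = recognition_result
    for error_form, _ in library:
        marked = error_form.sub(lambda m: left + m.group(0) + right, marked)
    return marked


def review_and_correct(recognition_result, library, auditor_confirms):
    """Correct the result only after the auditor confirms the marked strings."""
    marked = mark_error_strings(recognition_result, library)
    if not auditor_confirms(marked):
        # No confirmation instruction was returned: leave the result untouched.
        return recognition_result
    corrected = recognition_result
    for error_form, correct_string in library:
        corrected = error_form.sub(correct_string, corrected)
    return corrected


print(mark_error_strings("surplus gold A: 10", LIBRARY))
print(review_and_correct("surplus gold A: 10", LIBRARY, lambda marked: True))
```

Passing the confirmation step in as a callable keeps the correction logic independent of how the terminal and server actually communicate.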
As shown in Fig. 7, in one embodiment, a character recognition device 700 for document images is provided. The device specifically includes: an image acquisition module 702, an image recognition module 704, an identifier determination module 706, a character library query module 708 and a character error correction module 710.
Image acquisition module 702, for obtaining a bank bill image.
Image recognition module 704, for recognizing the characters in the bank bill image to obtain a recognition result.
Identifier determination module 706, for determining the bank identifier corresponding to the bank bill image according to the recognition result.
Character library query module 708, for querying the error correction character library corresponding to the bank identifier.
Character error correction module 710, for correcting the error character strings in the recognition result according to the character error forms in the error correction character library to obtain the correct recognition result.
In this embodiment, the characters in the obtained bank bill image are recognized to obtain a recognition result, the bank identifier corresponding to the bank bill image is determined from the recognition result, and the error correction character library corresponding to that bank identifier is queried. Each bank identifier corresponds to a different error correction character library, which stores character error forms and the correct character string corresponding to each character error form. Correcting the error character strings in the recognition result according to the character error forms in the library improves the efficiency of fixing incorrectly recognized characters. Correcting the recognition result of the bank bill image in this way yields the correct recognition result and improves its accuracy.
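How the five modules chain together can be shown with a short Python sketch. Everything named here is hypothetical: the bank identifiers `BANK_A` and `BANK_B`, the library contents, the keyword lookup used to determine the bank identifier, and the `ocr` callable that stands in for the image recognition module.

```python
import re
from typing import Callable, Dict, List, Tuple

ErrorLibrary = List[Tuple[re.Pattern, str]]

# Hypothetical per-bank error correction character libraries.
LIBRARIES_BY_BANK: Dict[str, ErrorLibrary] = {
    "BANK_A": [(re.compile(r"surplus gold \S"), "balance")],
    "BANK_B": [(re.compile(r"acc0unt"), "account")],
}

# Assumption: the bank identifier is found by spotting a bank name in the text.
BANK_KEYWORDS = {"Bank A": "BANK_A", "Bank B": "BANK_B"}


def determine_bank_identifier(recognition_result: str) -> str:
    for keyword, bank_id in BANK_KEYWORDS.items():
        if keyword in recognition_result:
            return bank_id
    raise LookupError("no known bank name found in the recognition result")


def recognize_bank_bill(image: bytes, ocr: Callable[[bytes], str]) -> str:
    recognition_result = ocr(image)                          # module 704
    bank_id = determine_bank_identifier(recognition_result)  # module 706
    library = LIBRARIES_BY_BANK[bank_id]                     # module 708
    for error_form, correct_string in library:               # module 710
        recognition_result = error_form.sub(correct_string, recognition_result)
    return recognition_result


fake_ocr = lambda image: "Bank A statement surplus gold A 100"
print(recognize_bank_bill(b"", fake_ocr))
```

Keeping one library per bank identifier means an error form that is typical for one bank's bill layout is never applied to another bank's bills.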
As shown in Fig. 8, in one embodiment, the character recognition device 700 for document images further includes: a region extraction module 712, a similarity acquisition module 714, a similarity comparison module 716, a data extraction module 718, a data pre-judgment module 720, a threshold comparison module 722 and an information generation module 724.
Region extraction module 712, for extracting the seal image region from the bank bill image.
Similarity acquisition module 714, for obtaining the highest similarity among the similarities between the seal image region and each pre-stored seal image.
Similarity comparison module 716, for comparing the highest similarity with a preset similarity.
Image recognition module 704 is further used for, if the highest similarity is greater than the preset similarity, treating verification as passed and recognizing the characters in the bank bill image to obtain a recognition result.
Data extraction module 718, for extracting the fixed-format data from the recognition result.
Data pre-judgment module 720, for counting the number of extracted fixed-format data items that satisfy the semantics of their corresponding format, and for generating a pre-judgment accuracy rate from the counted number and the total number of extracted fixed-format data items.
Threshold comparison module 722, for comparing the pre-judgment accuracy rate with a preset accuracy threshold.
Information generation module 724, for generating error reminder information if the pre-judgment accuracy rate is less than the preset accuracy threshold.
Identifier determination module 706 is further used for, if the pre-judgment accuracy rate is greater than or equal to the preset accuracy threshold, determining the bank identifier corresponding to the bank bill image according to the recognition result.
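The seal check carried out by modules 712 to 716 and image recognition module 704 can be sketched in Python. The similarity measure is not specified in this section, so the toy `pixel_agreement` function, the flat binary pixel lists and the name `seal_is_verified` are assumptions chosen only to make the comparison concrete.

```python
def pixel_agreement(a, b):
    """Toy similarity: fraction of equal pixels in two equal-length binary images."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def seal_is_verified(seal_region, stored_seals, similarity, preset_similarity):
    """Pass verification only if the best match exceeds the preset similarity."""
    highest = max(
        (similarity(seal_region, stored) for stored in stored_seals),
        default=0.0,  # no pre-stored seals: nothing can match
    )
    return highest > preset_similarity


seal_region = [1, 0, 1, 1]
stored_seals = [[1, 0, 1, 0], [0, 0, 0, 0]]
print(seal_is_verified(seal_region, stored_seals, pixel_agreement, 0.7))
```

The comparison is strict, matching the wording that verification passes when the highest similarity is greater than the preset similarity.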
In this embodiment, the seal image region is extracted from the bank bill image, and the authenticity of the bank bill image is verified according to the similarity between the seal image region and the pre-stored seal images. When verification passes and the bank bill image is determined to be genuine, the characters in the bank bill image are recognized. When verification fails and the bank bill image is determined to be fake, the recognition process is terminated, which saves the computing resources that would be spent recognizing fake bank bills and improves the utilization of computing resources. Before the recognition result is corrected, its accuracy is judged in advance, and the resulting pre-judgment accuracy rate determines whether the recognition result is corrected. When the pre-judgment accuracy rate is less than the preset accuracy threshold, there are too many errors and the characters in the bank bill image need to be recognized again. When the pre-judgment accuracy rate is greater than or equal to the preset accuracy threshold, the existing errors are within the range that can be corrected, and the recognition result can be corrected to obtain the correct recognition result.
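The pre-judgment step can be made concrete with a small Python sketch. The choice of dates and amounts as fixed-format data, the validators, the 0.8 threshold and the function names are assumptions for illustration; only the ratio of conforming items to extracted items and the threshold comparison come from the description above.

```python
import re
from datetime import datetime


def is_valid_date(text):
    """A date field is semantically valid if it parses as a real calendar date."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def is_valid_amount(text):
    """An amount field must look like 1,024.00 (assumed format)."""
    return re.fullmatch(r"\d{1,3}(,\d{3})*\.\d{2}", text) is not None


def prejudge_accuracy(fixed_format_items):
    """Ratio of items that satisfy their format semantics to all extracted items."""
    if not fixed_format_items:
        return 0.0  # design choice of this sketch: nothing extracted counts as failure
    valid = sum(1 for text, validator in fixed_format_items if validator(text))
    return valid / len(fixed_format_items)


def next_step(accuracy, preset_threshold=0.8):
    return "determine bank identifier" if accuracy >= preset_threshold else "error reminder"


items = [
    ("2017-02-20", is_valid_date),
    ("2017-13-45", is_valid_date),    # month 13 cannot be a real date
    ("1,024.00", is_valid_amount),
    ("1O24.00", is_valid_amount),     # letter O misread for a digit
]
print(prejudge_accuracy(items), next_step(prejudge_accuracy(items)))
```

Because the check only needs format rules and no ground truth, it can run on every recognition result before any error correction library is consulted.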
As shown in Fig. 9, in one embodiment, the character error correction module 710 specifically includes: an error character query module 710a, a correct character query module 710b, an error character modification module 710c, an error character marking module 710d and a recognition result sending module 710e.
Error character query module 710a, for querying the recognition result for matching error character strings according to the character error forms in the error correction character library.
Correct character query module 710b, for querying the error correction character library for the correct character string corresponding to the character error form.
Error character modification module 710c, for replacing the error character string with the correct character string to obtain the correct recognition result.
Error character marking module 710d, for marking the determined error character strings in the recognition result.
Recognition result sending module 710e, for sending the marked recognition result to the terminal logged in to the auditor account for display.
Correct character query module 710b is further used for receiving the confirmation instruction returned by the terminal logged in to the auditor account and, according to the confirmation instruction, querying the error correction character library for the correct character string corresponding to the character error form.
In this embodiment, the error character strings in the recognition result are marked, and the marked recognition result is sent to the terminal logged in to the auditor account for display. The auditor reviews the marked error character strings through that terminal. After the review is complete, the confirmation instruction returned by the terminal is received, and the error character strings are corrected according to the confirmation instruction. This improves the accuracy with which error character strings are determined and further improves the accuracy of error correction for the recognition result.
A person of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc or a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), and so on.
The technical features of the above embodiments can be combined arbitrarily. To keep the description concise, not all possible combinations of the technical features in the above embodiments are described. However, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this specification.
The above embodiments express only several implementations of the present invention. Their description is relatively specific and detailed, but it should not therefore be construed as limiting the scope of the patent. It should be pointed out that a person of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.
Claims (10)
1. A character recognition method for document images, the method comprising:
obtaining a bank bill image;
recognizing the characters in the bank bill image to obtain a recognition result;
determining the bank identifier corresponding to the bank bill image according to the recognition result;
querying the error correction character library corresponding to the bank identifier;
correcting the error character strings in the recognition result according to the character error forms in the error correction character library to obtain a correct recognition result.
2. The method according to claim 1, characterized in that, after the obtaining of the bank bill image, the method further comprises:
extracting the seal image region from the bank bill image;
obtaining the highest similarity among the similarities between the seal image region and each pre-stored seal image;
comparing the highest similarity with a preset similarity;
if the highest similarity is greater than the preset similarity, verification passes, and the step of recognizing the characters in the bank bill image to obtain a recognition result is performed.
3. The method according to claim 1, characterized in that, after the recognizing of the characters in the bank bill image to obtain a recognition result, the method further comprises:
extracting the fixed-format data from the recognition result;
counting the number of extracted fixed-format data items that satisfy the semantics of their corresponding format;
generating a pre-judgment accuracy rate from the counted number and the total number of extracted fixed-format data items;
comparing the pre-judgment accuracy rate with a preset accuracy threshold;
if the pre-judgment accuracy rate is greater than or equal to the preset accuracy threshold, performing the step of determining the bank identifier corresponding to the bank bill image according to the recognition result;
if the pre-judgment accuracy rate is less than the preset accuracy threshold, generating error reminder information.
4. The method according to claim 1, characterized in that the correcting of the error character strings in the recognition result according to the character error forms in the error correction character library to obtain a correct recognition result comprises:
querying the recognition result for a matching error character string according to the character error form in the error correction character library;
querying the error correction character library for the correct character string corresponding to the character error form;
replacing the error character string with the correct character string to obtain the correct recognition result.
5. The method according to claim 4, characterized in that, after the querying of the recognition result for a matching error character string according to the character error form in the error correction character library, the method further comprises:
marking the determined error character string in the recognition result;
sending the marked recognition result to a terminal logged in to an auditor account for display;
receiving a confirmation instruction returned by the terminal logged in to the auditor account and, according to the confirmation instruction, performing the step of querying the error correction character library for the correct character string corresponding to the character error form.
6. A character recognition device for document images, characterized in that the device comprises:
an image acquisition module, for obtaining a bank bill image;
an image recognition module, for recognizing the characters in the bank bill image to obtain a recognition result;
an identifier determination module, for determining the bank identifier corresponding to the bank bill image according to the recognition result;
a character library query module, for querying the error correction character library corresponding to the bank identifier;
a character error correction module, for correcting the error character strings in the recognition result according to the character error forms in the error correction character library to obtain a correct recognition result.
7. The device according to claim 6, characterized in that the device further comprises:
a region extraction module, for extracting the seal image region from the bank bill image;
a similarity acquisition module, for obtaining the highest similarity among the similarities between the seal image region and each pre-stored seal image;
a similarity comparison module, for comparing the highest similarity with a preset similarity;
the image recognition module is further used for, if the highest similarity is greater than the preset similarity, treating verification as passed and recognizing the characters in the bank bill image to obtain a recognition result.
8. The device according to claim 6, characterized in that the device further comprises:
a data extraction module, for extracting the fixed-format data from the recognition result;
a data pre-judgment module, for counting the number of extracted fixed-format data items that satisfy the semantics of their corresponding format, and for generating a pre-judgment accuracy rate from the counted number and the total number of extracted fixed-format data items;
a threshold comparison module, for comparing the pre-judgment accuracy rate with a preset accuracy threshold;
an information generation module, for generating error reminder information if the pre-judgment accuracy rate is less than the preset accuracy threshold;
the identifier determination module is further used for, if the pre-judgment accuracy rate is greater than or equal to the preset accuracy threshold, determining the bank identifier corresponding to the bank bill image according to the recognition result.
9. The device according to claim 6, characterized in that the character error correction module comprises:
an error character query module, for querying the recognition result for a matching error character string according to the character error form in the error correction character library;
a correct character query module, for querying the error correction character library for the correct character string corresponding to the character error form;
an error character modification module, for replacing the error character string with the correct character string to obtain the correct recognition result.
10. The device according to claim 9, characterized in that the character error correction module further comprises:
an error character marking module, for marking the determined error character string in the recognition result;
a recognition result sending module, for sending the marked recognition result to a terminal logged in to an auditor account for display;
the correct character query module is further used for receiving a confirmation instruction returned by the terminal logged in to the auditor account and, according to the confirmation instruction, querying the error correction character library for the correct character string corresponding to the character error form.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710091081.4A CN107622263B (en) | 2017-02-20 | 2017-02-20 | The character identifying method and device of document image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107622263A true CN107622263A (en) | 2018-01-23 |
CN107622263B CN107622263B (en) | 2018-08-21 |
Family
ID=61087822
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710091081.4A Active CN107622263B (en) | 2017-02-20 | 2017-02-20 | The character identifying method and device of document image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107622263B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN107622263B (en) | 2018-08-21 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 1244923; Country of ref document: HK |
GR01 | Patent grant | |