CN107622263A - Character recognition method and device for document images - Google Patents

Character recognition method and device for document images

Info

Publication number
CN107622263A
Authority
CN
China
Prior art keywords
character
recognition result
image
module
error correction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710091081.4A
Other languages
Chinese (zh)
Other versions
CN107622263B (en)
Inventor
赵骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201710091081.4A
Publication of CN107622263A
Application granted
Publication of CN107622263B
Legal status: Active

Abstract

The present invention relates to a character recognition method and device for document images. The method includes: obtaining a bank bill image; recognizing the characters in the bank bill image to obtain a recognition result; determining, according to the recognition result, the bank identifier corresponding to the bank bill image; calling the error correction character library corresponding to the bank identifier; and correcting the erroneous character strings in the recognition result according to the called error correction character library to obtain a correct recognition result. With the character recognition method and device provided by the invention, the corresponding error correction character library is queried according to the bank identifier of the bank bill image, and the recognition result of the bank bill image is corrected according to the character error formats in the called library, which improves the accuracy of the recognition result.

Description

Character recognition method and device for document images
Technical field
The present invention relates to the technical field of image recognition, and in particular to a character recognition method and device for document images.
Background
With the development of computer technology, more and more information is processed by computers. In image recognition, software is sometimes needed to recognize the characters in an image in order to obtain the character information it contains. However, a recognition result obtained by software may contain some incorrectly recognized characters.
To ensure that the recognition result is correct, the incorrectly recognized characters in it must be found and modified. At present, such characters are usually found manually and then modified, and manually modifying incorrectly recognized characters in a recognition result is inefficient.
Summary of the invention
Based on this, and in view of the low efficiency of manually modifying incorrectly recognized characters in a recognition result, it is necessary to provide a character recognition method and device for document images.
A character recognition method for a document image, the method comprising:
obtaining a bank bill image;
recognizing the characters in the bank bill image to obtain a recognition result;
determining, according to the recognition result, the bank identifier corresponding to the bank bill image;
querying the error correction character library corresponding to the bank identifier;
correcting the erroneous character strings in the recognition result according to the character error formats in the error correction character library to obtain a correct recognition result.
In one embodiment, after the obtaining of the bank bill image, the method further comprises:
extracting the seal image region from the bank bill image;
obtaining the highest similarity among the similarities between the seal image region and each pre-stored seal image;
comparing the highest similarity with a preset similarity;
if the highest similarity is greater than the preset similarity, the verification passes, and the step of recognizing the characters in the bank bill image to obtain a recognition result is performed.
In one embodiment, after the recognizing of the characters in the bank bill image to obtain a recognition result, the method further comprises:
extracting the fixed-format data from the recognition result;
counting, among the extracted fixed-format data, the quantity of fixed-format data that conforms to the corresponding format semantics;
generating a pre-judgment accuracy rate from the counted quantity and the total quantity of extracted fixed-format data;
comparing the pre-judgment accuracy rate with a preset accuracy threshold;
if the pre-judgment accuracy rate is greater than or equal to the preset accuracy threshold, performing the step of determining, according to the recognition result, the bank identifier corresponding to the bank bill image;
if the pre-judgment accuracy rate is less than the preset accuracy threshold, generating error prompt information.
In one embodiment, the correcting of the erroneous character strings in the recognition result according to the character error formats in the error correction character library to obtain a correct recognition result comprises:
querying the recognition result for erroneous character strings that match the character error formats in the error correction character library;
querying the error correction character library for the correct character string corresponding to the character error format;
modifying the erroneous character string to the correct character string to obtain a correct recognition result.
In one embodiment, after the querying of the recognition result for erroneous character strings that match the character error formats in the error correction character library, the method further comprises:
marking the determined erroneous character strings in the recognition result;
sending the marked recognition result to a terminal logged in with an auditor account for display;
receiving a confirmation instruction returned by the terminal logged in with the auditor account, and performing, according to the confirmation instruction, the step of querying the error correction character library for the correct character string corresponding to the character error format.
In the above character recognition method for document images, the characters in the obtained bank bill image are recognized to obtain a recognition result, the bank identifier corresponding to the bank bill image is determined from the recognition result, and the error correction character library corresponding to the bank identifier is queried. Each bank identifier corresponds to a different error correction character library, which stores character error formats and the correct character string corresponding to each character error format. Modifying the erroneous character strings in the recognition result according to the character error formats in the error correction character library improves the efficiency of modifying incorrectly recognized characters, and correcting the recognition result of the bank bill image in this way yields a correct recognition result and improves its accuracy.
A character recognition device for a document image, the device comprising:
an image acquisition module, configured to obtain a bank bill image;
an image recognition module, configured to recognize the characters in the bank bill image to obtain a recognition result;
an identifier determination module, configured to determine, according to the recognition result, the bank identifier corresponding to the bank bill image;
a library query module, configured to query the error correction character library corresponding to the bank identifier;
a character correction module, configured to correct the erroneous character strings in the recognition result according to the character error formats in the error correction character library to obtain a correct recognition result.
In one embodiment, the device further comprises:
a region extraction module, configured to extract the seal image region from the bank bill image;
a similarity acquisition module, configured to obtain the highest similarity among the similarities between the seal image region and each pre-stored seal image;
a similarity comparison module, configured to compare the highest similarity with a preset similarity;
the image recognition module is further configured such that, if the highest similarity is greater than the preset similarity, the verification passes and the characters in the bank bill image are recognized to obtain a recognition result.
In one embodiment, the device further comprises:
a data extraction module, configured to extract the fixed-format data from the recognition result;
a data pre-judgment module, configured to count, among the extracted fixed-format data, the quantity of fixed-format data that conforms to the corresponding format semantics, and to generate a pre-judgment accuracy rate from the counted quantity and the total quantity of extracted fixed-format data;
a threshold comparison module, configured to compare the pre-judgment accuracy rate with a preset accuracy threshold;
an information generation module, configured to generate error prompt information if the pre-judgment accuracy rate is less than the preset accuracy threshold;
the identifier determination module is further configured to determine, according to the recognition result, the bank identifier corresponding to the bank bill image if the pre-judgment accuracy rate is greater than or equal to the preset accuracy threshold.
In one embodiment, the character correction module comprises:
an erroneous character query module, configured to query the recognition result for erroneous character strings that match the character error formats in the error correction character library;
a correct character query module, configured to query the error correction character library for the correct character string corresponding to the character error format;
an erroneous character modification module, configured to modify the erroneous character string to the correct character string to obtain a correct recognition result.
In one embodiment, the character correction module further comprises:
an erroneous character marking module, configured to mark the determined erroneous character strings in the recognition result;
a recognition result sending module, configured to send the marked recognition result to a terminal logged in with an auditor account for display;
the correct character query module is further configured to receive a confirmation instruction returned by the terminal logged in with the auditor account and, according to the confirmation instruction, to query the error correction character library for the correct character string corresponding to the character error format.
In the above character recognition device for document images, the characters in the obtained bank bill image are recognized to obtain a recognition result, the bank identifier corresponding to the bank bill image is determined from the recognition result, and the error correction character library corresponding to the bank identifier is queried. Each bank identifier corresponds to a different error correction character library, which stores character error formats and the correct character string corresponding to each character error format. Modifying the erroneous character strings in the recognition result according to the character error formats in the error correction character library improves the efficiency of modifying incorrectly recognized characters, and correcting the recognition result of the bank bill image in this way yields a correct recognition result and improves its accuracy.
Brief description of the drawings
Fig. 1 is a diagram of the application environment of the character recognition method for document images in one embodiment;
Fig. 2 is a structural block diagram of the server in the character recognition system for document images in one embodiment;
Fig. 3 is a schematic flowchart of the character recognition method for document images in one embodiment;
Fig. 4 is a schematic flowchart of the step of verifying the bank bill image in one embodiment;
Fig. 5 is a schematic flowchart of the accuracy pre-judgment step in one embodiment;
Fig. 6 is a schematic flowchart of the step of marking erroneous character strings in one embodiment;
Fig. 7 is a structural block diagram of the character recognition device for document images in one embodiment;
Fig. 8 is a structural block diagram of the character recognition device for document images in another embodiment;
Fig. 9 is a structural block diagram of the character correction module in one embodiment.
Detailed description
To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely explain the invention and do not limit it.
Fig. 1 is a diagram of the application environment of the character recognition method for document images in one embodiment. Referring to Fig. 1, the character recognition method is applied to a character recognition system for document images. The system includes terminal 110 and server 120, with terminal 110 connected to server 120 through a network. Terminal 110 may be a fixed terminal or a mobile terminal. The fixed terminal may specifically be at least one of a printer, a scanner, and a monitor, and the mobile terminal may specifically be at least one of a tablet computer, a smartphone, a personal digital assistant, and a digital camera.
Fig. 2 is a schematic diagram of the internal structure of server 120 in the character recognition system of Fig. 1 in one embodiment. As shown in Fig. 2, server 120 includes a processor, a non-volatile storage medium, an internal memory, and a network interface connected by a system bus. The non-volatile storage medium of server 120 stores an operating system and a database, and also a character recognition device for document images, which is used to implement a character recognition method for document images. The processor provides computing and control capabilities and supports the operation of the whole server 120. The internal memory of server 120 provides an environment for running the character recognition device stored in the non-volatile storage medium. It may store computer-readable instructions which, when executed by the processor, cause the processor to perform a character recognition method for document images. The network interface is used for network communication with terminal 110.
As shown in Fig. 3, in one embodiment a character recognition method for document images is provided. This embodiment is illustrated with the method applied to server 120 in the character recognition system of Fig. 1. The method specifically includes the following:
S302, obtain a bank bill image.
Specifically, when terminal 110 captures a bank bill image, it sends the captured image to server 120, and server 120 receives the bank bill image sent by terminal 110. A bank bill image is an image obtained by photographing or scanning a bank bill, and it contains characters. The bank bill may specifically be a bank statement. "Character" is the general term for all kinds of letters and symbols, and includes at least one of the scripts of various countries, punctuation marks, graphic symbols, and digits.
S304, recognize the characters in the bank bill image to obtain a recognition result.
Specifically, after obtaining the bank bill image, server 120 binarizes it, converting the bank bill image into a black-and-white image. Server 120 scans the characters in the bank bill image, extracts the character features of each character, queries the character feature library for the character corresponding to those features, and takes the queried character as the recognition result. The character features may specifically be the character boundary, the character shape, and the character frame. The character frame may specifically be a representation of the character using the fewest pixels.
In one embodiment, S304 specifically further includes: segmenting the bank bill image by character rows and character columns to obtain single-character regions; extracting the image information features in each single-character region; and recognizing the single character according to the extracted image information features to obtain a recognition result.
Specifically, after segmenting the bank bill image by rows and columns to obtain single-character regions, server 120 extracts the image information features in each single-character region. The image information features may specifically include at least one of the character boundary, the font, the character skeleton, and the pixels of the character skeleton. The character skeleton is the framework that represents a character with the smallest number of pixels. Server 120 thins the character in the single-character region and extracts the character skeleton from that region.
A character recognition library is provided in server 120. It contains image information features and characters, and each image information feature is stored in the library in correspondence with its character. Server 120 queries the character recognition library, according to the extracted image information features, for the character corresponding to those features, and takes the queried character as the recognition result.
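The row-and-column segmentation described above can be sketched as follows. This is a minimal illustration only: the source does not say how rows and columns are located, so the projection-profile approach, the `Image` representation (a binarized grid with 1 for ink and 0 for background), and the function names are all assumptions.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed input: binarized, 1 = ink, 0 = background


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return [start, end) index ranges where the projection profile is non-zero."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_characters(image: Image) -> List[Tuple[int, int, int, int]]:
    """Split the image into (top, bottom, left, right) single-character boxes."""
    boxes = []
    row_profile = [sum(row) for row in image]          # ink per pixel row
    for top, bottom in runs(row_profile):              # each text row
        band = image[top:bottom]
        col_profile = [sum(col) for col in zip(*band)]  # ink per pixel column
        for left, right in runs(col_profile):          # each character in the row
            boxes.append((top, bottom, left, right))
    return boxes
```

Blank pixel rows separate text rows and blank pixel columns separate characters, so this sketch would not split touching characters; a real system would need a more robust splitter.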
In one embodiment, the extracted image information feature is the character skeleton, and server 120 queries the character recognition library for the character that matches the character skeleton. Specifically, server 120 may compute the similarity between the character skeleton and each character in the character recognition library, and select the character with the highest similarity to the character skeleton as the queried character.
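The highest-similarity lookup can be sketched as below. The source does not name a similarity measure, so the Jaccard overlap used here is an assumption, as are the representation of a skeleton as a set of `(row, column)` pixels and the tiny `LIBRARY` of templates.

```python
from typing import Dict, FrozenSet, Tuple

Pixel = Tuple[int, int]
Skeleton = FrozenSet[Pixel]


def similarity(a: Skeleton, b: Skeleton) -> float:
    """Jaccard overlap of two skeletons (an assumed measure, not from the source)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def match_character(skeleton: Skeleton, library: Dict[str, Skeleton]) -> str:
    """Return the library character whose skeleton is most similar to the input."""
    return max(library, key=lambda ch: similarity(skeleton, library[ch]))


# Hypothetical 3x3 templates: a vertical stroke and a horizontal stroke.
LIBRARY: Dict[str, Skeleton] = {
    "1": frozenset({(0, 1), (1, 1), (2, 1)}),
    "-": frozenset({(1, 0), (1, 1), (1, 2)}),
}
```

For instance, a skeleton with pixels `(0, 1)` and `(1, 1)` overlaps the "1" template more than the "-" template, so `match_character` would select "1".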
S306, determine, according to the recognition result, the bank identifier corresponding to the bank bill image.
Specifically, the bank bill image contains the characters corresponding to the bank identifier, so the recognition result that server 120 obtains by recognizing the bank bill image also contains those characters. Server 120 extracts from the recognition result the bank identifier corresponding to the bank bill image. The bank identifier is specifically any one of the bank's full name, the bank's short name, and the English abbreviation of the bank's name.
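Extracting the bank identifier from the recognition result might look like the following sketch. The alias table, the canonical identifiers, and the longest-alias-first rule are illustrative assumptions; the source only states that the identifier is a full name, a short name, or an English abbreviation.

```python
from typing import Dict, Optional

# Hypothetical alias table: full name, short name and English abbreviation
# all map to one canonical bank identifier.
BANK_ALIASES: Dict[str, str] = {
    "Agricultural Bank of China": "ABC",
    "Agricultural Bank": "ABC",
    "ABC": "ABC",
    "China Construction Bank": "CCB",
    "CCB": "CCB",
}


def find_bank_identifier(recognition_result: str) -> Optional[str]:
    """Return the identifier for the longest alias that occurs in the text."""
    for alias in sorted(BANK_ALIASES, key=len, reverse=True):
        if alias in recognition_result:
            return BANK_ALIASES[alias]
    return None  # no known bank name was recognized
```

Trying longer aliases first keeps a short abbreviation from shadowing a full name that contains it.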
S308, query the error correction character library corresponding to the bank identifier.
Specifically, multiple error correction character libraries are stored in server 120, and each library corresponds to one bank identifier, so the libraries for different bank identifiers differ. An error correction character library stores the correspondence between character error formats and correct character strings, for example "super*silver" corresponds to "super Net silver", where "*" represents zero, one, or more characters. After extracting the bank identifier from the recognition result, server 120 queries the error correction character library corresponding to that bank identifier.
S310, correct the erroneous character strings in the recognition result according to the character error formats in the error correction character library to obtain a correct recognition result.
Specifically, server 120 extracts the character error formats from the error correction character library and queries whether the recognition result contains a character string that matches an extracted character error format. A character string found in this way is an erroneous character string. If one is found, server 120 extracts the correct character string corresponding to the character error format from the error correction character library and modifies the character string found in the recognition result to the extracted correct character string.
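Steps S308 and S310 can be sketched with regular expressions. This is an assumed implementation, not the patent's: the wildcard-to-regex mapping, the `ERROR_LIBRARIES` structure, and the bank identifier "ABC" are hypothetical, while the "super*silver" entry is the example given in the text. The "+" wildcard (one or more characters) is included because a later example in the text uses it.

```python
import re
from typing import Dict, List, Tuple

# "*" = zero or more characters, "+" = one or more; non-greedy so that a
# match stays as short as possible instead of swallowing the rest of a line.
WILDCARDS = {"*": ".*?", "+": ".+?"}


def compile_error_format(error_format: str) -> "re.Pattern[str]":
    """Translate a character error format into a compiled regular expression."""
    parts = [WILDCARDS.get(ch, re.escape(ch)) for ch in error_format]
    return re.compile("".join(parts))


# Hypothetical per-bank libraries: identifier -> [(error format, correct string)].
ERROR_LIBRARIES: Dict[str, List[Tuple[str, str]]] = {
    "ABC": [("super*silver", "super Net silver")],
}


def correct(recognition_result: str, bank_identifier: str) -> str:
    """Replace every string matching an error format with its correct string."""
    for error_format, correct_string in ERROR_LIBRARIES.get(bank_identifier, []):
        pattern = compile_error_format(error_format)
        # A function replacement avoids backslash processing in the correct string.
        recognition_result = pattern.sub(lambda _m: correct_string, recognition_result)
    return recognition_result
```

With these assumptions, `correct("pay via super Nct silver today", "ABC")` would yield `"pay via super Net silver today"`, and text for a bank without a library is returned unchanged.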
In this embodiment, the characters in the obtained bank bill image are recognized to obtain a recognition result, the bank identifier corresponding to the bank bill image is determined from the recognition result, and the error correction character library corresponding to the bank identifier is queried. Each bank identifier corresponds to a different error correction character library, which stores character error formats and the correct character string corresponding to each character error format. Modifying the erroneous character strings in the recognition result according to the character error formats in the error correction character library improves the efficiency of modifying incorrectly recognized characters, and correcting the recognition result of the bank bill image in this way yields a correct recognition result and improves its accuracy.
As shown in Fig. 4, in one embodiment the method further includes a step of verifying the bank bill image after S302. This step specifically includes the following:
S402, extract the seal image region from the bank bill image.
S404, obtain the highest similarity among the similarities between the seal image region and each pre-stored seal image.
Specifically, server 120 locates the seal image region in the bank bill image and extracts it from the image. Seal images are pre-stored in server 120, each corresponding to a bank identifier. Server 120 computes the similarity between the seal image region and each pre-stored seal image, and selects the highest similarity from the computed similarities.
S406, compare the highest similarity with a preset similarity.
Specifically, a preset similarity is set in server 120 and is used to verify the authenticity of the bank bill image: if the similarity is greater than the preset similarity, the bank bill image is genuine; if the similarity is less than or equal to the preset similarity, the bank bill image is false. Server 120 compares the highest similarity between the seal image region and the pre-stored seal images with the preset similarity.
S408, if the highest similarity is greater than the preset similarity, the verification passes, and the characters in the bank bill image are recognized to obtain a recognition result.
Specifically, server 120 compares the highest similarity with the preset similarity. If the highest similarity is greater than the preset similarity, the bank bill image is genuine, the verification passes, and the characters in the bank bill image are recognized. If the highest similarity is less than or equal to the preset similarity, the bank bill image is false, the verification fails, the recognition process is terminated, and a verification-failed message may be generated.
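The verification in S402 to S408 can be sketched as below. The source does not specify how similarity is computed, so the pixel-agreement measure, the grid representation of seal images, and the function names are assumptions. The strict "greater than" comparison follows the text.

```python
from typing import Dict, List, Optional, Tuple

Grid = List[List[int]]  # assumed: equal-sized binarized seal images


def pixel_similarity(a: Grid, b: Grid) -> float:
    """Fraction of positions where two grids agree (an assumed measure)."""
    total = sum(len(row) for row in a)
    same = sum(x == y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    return same / total if total else 0.0


def verify_seal(region: Grid, stored: Dict[str, Grid],
                preset: float) -> Tuple[bool, Optional[str]]:
    """Pass only if the highest similarity is strictly greater than the preset."""
    if not stored:
        return False, None
    best_bank = max(stored, key=lambda bank: pixel_similarity(region, stored[bank]))
    passed = pixel_similarity(region, stored[best_bank]) > preset
    return passed, (best_bank if passed else None)
```

Returning the bank whose seal matched best is an extra convenience in this sketch; the text only requires the pass or fail decision.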
In one embodiment, server 120 may also extract the seal image region from an image of a fixed-format certificate and judge the authenticity of the certificate from the similarity between the seal image region and the pre-stored seal images. The fixed-format certificate may specifically be at least one of a property ownership certificate, a land certificate, and a driver's license.
In this embodiment, the seal image region is extracted from the bank bill image, and the authenticity of the bank bill image is verified from the similarity between the seal image region and the pre-stored seal images. When the verification passes and the bank bill image is determined to be genuine, the characters in the bank bill image are recognized. When the verification fails and the bank bill image is determined to be false, the recognition process is terminated, which saves the computing resources that would be spent recognizing false bank bills and improves the utilization of computing resources.
As shown in Fig. 5, in one embodiment the method further includes an accuracy pre-judgment step after S304. This step specifically includes the following:
S502, extract the fixed-format data from the recognition result.
S504, count, among the extracted fixed-format data, the quantity of fixed-format data that conforms to the corresponding format semantics; generate a pre-judgment accuracy rate from the counted quantity and the total quantity of extracted fixed-format data.
Specifically, the recognition result contains fixed-format data. Server 120 extracts the fixed-format data, counts the total amount of extracted fixed-format data, and counts how many of the extracted items conform to the corresponding format semantics, that is, the correct amount of fixed-format data. Dividing the correct amount by the total amount gives the pre-judgment accuracy rate. Format semantics means that fixed-format data conforms to its fixed format, or to the data relationships between items of fixed-format data.
In one embodiment, server 120 checks the extracted fixed-format data to detect whether it conforms to the fixed format, and counts the correct amount of fixed-format data that does. For example, the date format is 8 characters, such as "20161123". If the extracted date data contains an item with 9 characters, that item is erroneous data. If 10 items of date data are extracted and 6 of them are correct, the pre-judgment accuracy rate is 60%.
In one embodiment, server 120 extracts balance data and surplus data and detects whether the balance data and the surplus data match.
S506, judge whether the pre-judgment accuracy rate is less than the preset accuracy threshold; if not, perform S508; if so, perform S510.
S508, determine, according to the recognition result, the bank identifier corresponding to the bank bill image.
S510, generate error prompt information.
Specifically, server 120 compares the pre-judgment accuracy rate with the preset accuracy threshold. When the pre-judgment accuracy rate is greater than or equal to the preset accuracy threshold, server 120 extracts the bank identifier corresponding to the bank bill image from the recognition result. When the pre-judgment accuracy rate is less than the preset accuracy threshold, the recognition result contains too many recognition errors, so recognition error prompt information is generated. The error prompt information indicates that the pre-judgment has failed, and the process of correcting the recognition result is terminated.
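The pre-judgment in S502 to S510 can be sketched for the date example from the text. The 8-digit check is one assumed reading of "8 characters", and the function names and any threshold value are hypothetical.

```python
import re
from typing import List

# Assumed fixed format for dates: exactly 8 digits, e.g. "20161123".
DATE_FORMAT = re.compile(r"\d{8}")


def prejudged_accuracy(dates: List[str]) -> float:
    """Correct amount of fixed-format data divided by the total amount."""
    if not dates:
        return 0.0
    valid = sum(1 for d in dates if DATE_FORMAT.fullmatch(d))
    return valid / len(dates)


def should_correct(accuracy: float, threshold: float) -> bool:
    """True: continue to bank identification (S508). False: error prompt (S510)."""
    return accuracy >= threshold
```

With 6 well-formed dates out of 10 extracted, `prejudged_accuracy` gives 0.6, matching the 60% in the text. Whether correction proceeds then depends on the preset threshold, whose value the source does not give.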
In this embodiment, before the recognition result is corrected, its accuracy is pre-judged, and the resulting pre-judgment accuracy rate determines whether the recognition result is corrected. When the pre-judgment accuracy rate is less than the preset accuracy threshold, there are too many errors and the characters in the bank bill image need to be recognized again. When the pre-judgment accuracy rate is greater than or equal to the preset accuracy threshold, the existing errors are within the range that can be corrected, and the recognition result can be corrected to obtain a correct recognition result.
As shown in Fig. 6, in one embodiment S310 specifically further includes a step of marking erroneous character strings. This step specifically includes the following:
S602, query the recognition result for erroneous character strings that match the character error formats in the error correction character library.
Specifically, the error correction character library stores various erroneous character string formats, together with the correct character string that matches each format. Server 120 queries whether the recognition result contains a character string that conforms to an erroneous character format in the error correction character library. A character string found in this way is an erroneous character string. For example, the error correction character library of the Agricultural Bank stores the erroneous character string format "surplus gold+", where "+" represents one or more characters. Server 120 queries the recognition result for characters that match "surplus gold+". If the character string found is "surplus gold A", then "surplus gold A" is the determined erroneous character string.
S604, mark the determined erroneous character strings in the recognition result.
Specifically, after server 120 determines the erroneous character strings in the recognition result, it marks them so that they are displayed differently from correct character strings. Specifically, an underline may be added to the erroneous character string, or a background color different from the character color may be added to it.
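Marking can be sketched in plain text as follows. The `[[...]]` brackets stand in for the underline or background color mentioned above. The pattern shown is the regex form of a hypothetical error format "Balanc+", used here in place of the text's "surplus gold+" example.

```python
import re
from typing import List, Tuple


def mark_errors(text: str, pattern: "re.Pattern[str]") -> Tuple[str, List[str]]:
    """Wrap each match in [[...]] and return the marked text plus the matches."""
    found = [m.group(0) for m in pattern.finditer(text)]
    marked = pattern.sub(lambda m: "[[" + m.group(0) + "]]", text)
    return marked, found


# Hypothetical error format "Balanc+" ("+" = one or more characters),
# written directly as a non-greedy regular expression.
BALANCE_ERROR = re.compile(r"Balanc.+?")
```

For example, `mark_errors("Balancc 100.00", BALANCE_ERROR)` would return the marked text `"[[Balancc]] 100.00"` together with the list `["Balancc"]`, which is what a reviewing terminal would need in order to highlight the string.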
S606, labeled recognition result is sent to the terminal for being logged in auditor's account and shown.
Specifically, server 120, which is inquired about, is logged in the terminal 110 of auditor's account, and server 120 is by labeled identification As a result send to the terminal 110 for being logged in auditor's account.The terminal 110 for being logged in auditor's account receives server 120 After the recognition result of transmission, recognition result is shown.Auditor corresponding to auditor's account is by terminal 110 to mark Error character string in recognition result is checked.
S608: receive the confirmation instruction returned by the terminal logged in to the auditor account and, according to the confirmation instruction, query the error correction character library for the correct character string corresponding to the character error format.
Specifically, after displaying the marked recognition result, the terminal 110 logged in to the auditor account can trigger a confirmation instruction and return it to the server 120. The confirmation instruction triggers the server 120 to correct the error character strings marked in the recognition result. After receiving the confirmation instruction, the server 120 queries the error correction character library for the character error format matched by the determined error character string and extracts the correct character string corresponding to that format; the extracted correct character string is the one corresponding to the determined error character string.
S610: replace the determined error character string with the queried correct character string to obtain the correct recognition result.
Specifically, the server 120 replaces the determined error character string with the queried correct character string to obtain the correct recognition result. For example, if the determined error character string is "surplus gold A", the character error format matching it in the error correction character library is "surplus gold+", and the correct character string stored for "surplus gold+" is "balance", so "surplus gold A" in the recognition result is replaced with "balance".
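Steps S608 and S610 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patent's implementation: the error correction character library is modeled as a dictionary from character error format to correct character string, the recognition result is a list of field strings, a trailing "+" means one or more characters, and the confirmation instruction is reduced to a boolean. The names `lookup_correct_string` and `correct_fields` are hypothetical.

```python
import re
from typing import Dict, List, Optional


def lookup_correct_string(error_string: str, library: Dict[str, str]) -> Optional[str]:
    """Find the correct string for the format that the error string matches."""
    for error_format, correct in library.items():
        if error_format.endswith("+"):
            # Trailing "+" is assumed to mean "one or more characters".
            pattern = re.escape(error_format[:-1]) + r".+"
        else:
            pattern = re.escape(error_format)
        if re.fullmatch(pattern, error_string):
            return correct
    return None


def correct_fields(fields: List[str], library: Dict[str, str],
                   confirmed: bool) -> List[str]:
    """Replace error strings only after the auditor has confirmed."""
    if not confirmed:
        return list(fields)  # no confirmation instruction: leave result as is
    corrected = []
    for field in fields:
        replacement = lookup_correct_string(field, library)
        corrected.append(field if replacement is None else replacement)
    return corrected


if __name__ == "__main__":
    library = {"surplus gold+": "balance"}  # example entry taken from the text
    print(correct_fields(["surplus gold A", "account 6228"], library, confirmed=True))
```

Keeping the lookup separate from the replacement mirrors the split between S608 (query) and S610 (modify) in the description.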
In the present embodiment, the error character strings in the recognition result are marked, and the marked recognition result is sent to the terminal logged in to the auditor account for display. The auditor reviews the marked error character strings through that terminal. After the review is complete, the confirmation instruction returned by the terminal is received, and the error character strings are corrected according to it. This improves the accuracy with which error character strings are determined and further improves the accuracy of error correction of the recognition result.
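The marking performed before the result is sent to the auditor's terminal might look like the following Python sketch. The description only requires that error character strings be displayed differently from correct ones (underline or background color); rendering the result as HTML, the yellow color, and the function name `mark_errors` are assumptions made for illustration.

```python
from html import escape
from typing import List


def mark_errors(fields: List[str], error_strings: List[str],
                style: str = "underline") -> str:
    """Render the recognition result with error strings visibly marked.

    Assumption: the auditor's terminal can display HTML.
    """
    wrong = set(error_strings)
    parts = []
    for field in fields:
        text = escape(field)  # avoid injecting markup from recognized text
        if field in wrong:
            if style == "underline":
                text = f"<u>{text}</u>"
            else:
                # Hypothetical choice of background color.
                text = f'<span style="background-color: yellow">{text}</span>'
        parts.append(text)
    return "<br>".join(parts)


if __name__ == "__main__":
    print(mark_errors(["surplus gold A", "account 6228"], ["surplus gold A"]))
```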
As shown in Fig. 7, in one embodiment, a character recognition device 700 for a bill image is provided. The device specifically includes: an image acquisition module 702, an image recognition module 704, an identifier determining module 706, a character library query module 708, and a character correction module 710.
Image acquisition module 702, for acquiring a bank bill image.
Image recognition module 704, for recognizing characters in the bank bill image to obtain a recognition result.
Identifier determining module 706, for determining the bank identifier corresponding to the bank bill image according to the recognition result.
Character library query module 708, for querying the error correction character library corresponding to the bank identifier.
Character correction module 710, for performing error correction on the error character strings in the recognition result according to the character error formats in the error correction character library to obtain a correct recognition result.
In the present embodiment, the characters in the acquired bank bill image are recognized to obtain a recognition result, the bank identifier corresponding to the bank bill image is determined according to the recognition result, and the error correction character library corresponding to the bank identifier is queried. Each bank identifier corresponds to a different error correction character library, which stores character error formats and the correct character strings corresponding to them. Correcting the error character strings in the recognition result according to the character error formats in the error correction character library makes it more efficient to fix incorrectly recognized characters, and the resulting correct recognition result improves the accuracy of recognition.
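The relationship between bank identifiers and error correction character libraries can be sketched in Python as below. This chunk of the description does not say how the bank identifier is derived from the recognition result, so the keyword lookup, the identifier "ABC", and the names `BANK_KEYWORDS`, `ERROR_LIBRARIES`, `determine_bank_id` and `query_library` are all hypothetical; only the "surplus gold+" to "balance" entry comes from the text.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from a bank name found in the recognized text
# to a bank identifier.
BANK_KEYWORDS: Dict[str, str] = {"Agricultural Bank": "ABC"}

# One error correction character library per bank identifier:
# character error format -> correct character string.
ERROR_LIBRARIES: Dict[str, Dict[str, str]] = {
    "ABC": {"surplus gold+": "balance"},
}


def determine_bank_id(fields: List[str]) -> Optional[str]:
    """Return the identifier of the first bank name found in the result."""
    for field in fields:
        for keyword, bank_id in BANK_KEYWORDS.items():
            if keyword in field:
                return bank_id
    return None


def query_library(bank_id: Optional[str]) -> Dict[str, str]:
    """Return the library for the bank, or an empty one if none is known."""
    if bank_id is None:
        return {}
    return ERROR_LIBRARIES.get(bank_id, {})


if __name__ == "__main__":
    result = ["Agricultural Bank statement", "surplus gold A"]
    print(query_library(determine_bank_id(result)))
```

Keeping one dictionary per bank reflects the statement that each bank identifier corresponds to a different library, so a format that is an error for one bank is not applied to another bank's bills.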
As shown in Fig. 8, in one embodiment, the character recognition device 700 for a bill image further includes: a region extraction module 712, a similarity acquisition module 714, a similarity comparison module 716, a data extraction module 718, a data pre-judgment module 720, a threshold comparison module 722, and an information generation module 724.
Region extraction module 712, for extracting the seal image region in the bank bill image.
Similarity acquisition module 714, for obtaining the highest similarity among the similarities between the seal image region and each pre-stored seal image.
Similarity comparison module 716, for comparing the highest similarity with a preset similarity.
The image recognition module 704 is further configured to, if the highest similarity is greater than the preset similarity, treat the verification as passed and recognize the characters in the bank bill image to obtain a recognition result.
Data extraction module 718, for extracting the fixed-format data in the recognition result.
Data pre-judgment module 720, for counting, among the extracted fixed-format data, the number of items that conform to the semantics of their corresponding format, and for generating a pre-judgment accuracy from the counted number and the total number of extracted fixed-format data items.
Threshold comparison module 722, for comparing the pre-judgment accuracy with a preset accuracy threshold.
Information generation module 724, for generating an error reminder message if the pre-judgment accuracy is less than the preset accuracy threshold.
The identifier determining module 706 is further configured to, if the pre-judgment accuracy is greater than or equal to the preset accuracy threshold, determine the bank identifier corresponding to the bank bill image according to the recognition result.
In the present embodiment, the seal image region is extracted from the bank bill image, and the authenticity of the bank bill image is verified according to the similarity between the seal image region and the pre-stored seal images. When verification passes and the bank bill image is determined to be genuine, the characters in the bank bill image are recognized. When verification fails and the bank bill image is determined to be forged, the recognition process ends, which saves the computing resources that would be spent recognizing a forged bank bill and improves the utilization of computing resources. Before error correction is performed on the recognition result, the accuracy of the recognition result is pre-judged, and the resulting pre-judgment accuracy determines whether the recognition result is to be corrected. When the pre-judgment accuracy is less than the preset accuracy threshold, there are too many errors and the characters in the bank bill image need to be recognized again. When the pre-judgment accuracy is greater than or equal to the preset accuracy threshold, the existing errors are within the correctable range, and error correction can be performed on the recognition result to obtain a correct recognition result.
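The two gates described in this embodiment can be sketched in Python as follows. The sketch assumes that seal similarities have already been computed as numbers, and that "conforming to the semantics of the corresponding format" can be approximated by a per-format validator. The validators, the date and amount formats, and the function names are hypothetical; the description does not specify them.

```python
import re
from typing import Callable, Dict, List, Tuple


def seal_verified(similarities: List[float], preset_similarity: float) -> bool:
    """Pass only if the highest similarity exceeds the preset similarity."""
    return bool(similarities) and max(similarities) > preset_similarity


# Hypothetical validators for two kinds of fixed-format data.
VALIDATORS: Dict[str, Callable[[str], bool]] = {
    "date": lambda v: re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) is not None,
    "amount": lambda v: re.fullmatch(r"\d+\.\d{2}", v) is not None,
}


def prejudge_accuracy(fixed_format_data: List[Tuple[str, str]]) -> float:
    """Share of extracted fixed-format items that satisfy their format."""
    if not fixed_format_data:
        return 0.0
    valid = sum(1 for kind, value in fixed_format_data if VALIDATORS[kind](value))
    return valid / len(fixed_format_data)


def should_correct(accuracy: float, preset_threshold: float) -> bool:
    """Proceed to error correction only at or above the preset threshold."""
    return accuracy >= preset_threshold


if __name__ == "__main__":
    data = [("date", "2017-02-20"), ("amount", "12.5O"),  # letter O misread
            ("amount", "100.00"), ("date", "2017-2-20")]
    accuracy = prejudge_accuracy(data)
    print(seal_verified([0.40, 0.93], 0.90), accuracy, should_correct(accuracy, 0.8))
```

When `should_correct` returns false, the flow in the description generates an error reminder and recognition is repeated instead of attempting correction.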
As shown in Fig. 9, in one embodiment, the character correction module 710 specifically includes: an error character query module 710a, a correct character query module 710b, an error character modification module 710c, an error character marking module 710d, and a recognition result sending module 710e.
Error character query module 710a, for querying the recognition result for error character strings that match the character error formats in the error correction character library.
Correct character query module 710b, for querying the error correction character library for the correct character string corresponding to the character error format.
Error character modification module 710c, for replacing the error character string with the correct character string to obtain a correct recognition result.
Error character marking module 710d, for marking the determined error character string in the recognition result.
Recognition result sending module 710e, for sending the marked recognition result to the terminal logged in to the auditor account for display.
The correct character query module 710b is further configured to receive the confirmation instruction returned by the terminal logged in to the auditor account and, according to the confirmation instruction, query the error correction character library for the correct character string corresponding to the character error format.
In the present embodiment, the error character strings in the recognition result are marked, and the marked recognition result is sent to the terminal logged in to the auditor account for display. The auditor reviews the marked error character strings through that terminal. After the review is complete, the confirmation instruction returned by the terminal is received, and the error character strings are corrected according to it. This improves the accuracy with which error character strings are determined and further improves the accuracy of error correction of the recognition result.
One of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or it may be a random access memory (RAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present invention, and although they are described in specific detail, they should not be construed as limiting the scope of the patent. It should be noted that one of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.

Claims (10)

1. A character recognition method for a bill image, the method comprising:
acquiring a bank bill image;
recognizing characters in the bank bill image to obtain a recognition result;
determining, according to the recognition result, a bank identifier corresponding to the bank bill image;
querying an error correction character library corresponding to the bank identifier;
performing error correction on an error character string in the recognition result according to a character error format in the error correction character library to obtain a correct recognition result.
2. The method according to claim 1, characterized in that, after the acquiring of the bank bill image, the method further comprises:
extracting a seal image region in the bank bill image;
obtaining the highest similarity among the similarities between the seal image region and each pre-stored seal image;
comparing the highest similarity with a preset similarity;
if the highest similarity is greater than the preset similarity, the verification passes, and the step of recognizing characters in the bank bill image to obtain a recognition result is performed.
3. The method according to claim 1, characterized in that, after the recognizing of characters in the bank bill image to obtain a recognition result, the method further comprises:
extracting fixed-format data in the recognition result;
counting, among the extracted fixed-format data, the number of items that conform to the semantics of their corresponding format;
generating a pre-judgment accuracy according to the counted number and the total number of extracted fixed-format data items;
comparing the pre-judgment accuracy with a preset accuracy threshold;
if the pre-judgment accuracy is greater than or equal to the preset accuracy threshold, performing the step of determining, according to the recognition result, the bank identifier corresponding to the bank bill image;
if the pre-judgment accuracy is less than the preset accuracy threshold, generating an error reminder message.
4. The method according to claim 1, characterized in that the performing of error correction on an error character string in the recognition result according to a character error format in the error correction character library to obtain a correct recognition result comprises:
querying the recognition result for an error character string that matches the character error format in the error correction character library;
querying the error correction character library for a correct character string corresponding to the character error format;
replacing the error character string with the correct character string to obtain the correct recognition result.
5. The method according to claim 4, characterized in that, after the querying of the recognition result for an error character string that matches the character error format in the error correction character library, the method further comprises:
marking the determined error character string in the recognition result;
sending the marked recognition result to a terminal logged in to an auditor account for display;
receiving a confirmation instruction returned by the terminal logged in to the auditor account, and performing, according to the confirmation instruction, the step of querying the error correction character library for a correct character string corresponding to the character error format.
6. A character recognition device for a bill image, characterized in that the device comprises:
an image acquisition module, for acquiring a bank bill image;
an image recognition module, for recognizing characters in the bank bill image to obtain a recognition result;
an identifier determining module, for determining, according to the recognition result, a bank identifier corresponding to the bank bill image;
a character library query module, for querying an error correction character library corresponding to the bank identifier;
a character correction module, for performing error correction on an error character string in the recognition result according to a character error format in the error correction character library to obtain a correct recognition result.
7. The device according to claim 6, characterized in that the device further comprises:
a region extraction module, for extracting a seal image region in the bank bill image;
a similarity acquisition module, for obtaining the highest similarity among the similarities between the seal image region and each pre-stored seal image;
a similarity comparison module, for comparing the highest similarity with a preset similarity;
wherein the image recognition module is further configured to, if the highest similarity is greater than the preset similarity, treat the verification as passed and recognize the characters in the bank bill image to obtain a recognition result.
8. The device according to claim 6, characterized in that the device further comprises:
a data extraction module, for extracting fixed-format data in the recognition result;
a data pre-judgment module, for counting, among the extracted fixed-format data, the number of items that conform to the semantics of their corresponding format, and for generating a pre-judgment accuracy according to the counted number and the total number of extracted fixed-format data items;
a threshold comparison module, for comparing the pre-judgment accuracy with a preset accuracy threshold;
an information generation module, for generating an error reminder message if the pre-judgment accuracy is less than the preset accuracy threshold;
wherein the identifier determining module is further configured to, if the pre-judgment accuracy is greater than or equal to the preset accuracy threshold, determine the bank identifier corresponding to the bank bill image according to the recognition result.
9. The device according to claim 6, characterized in that the character correction module comprises:
an error character query module, for querying the recognition result for an error character string that matches the character error format in the error correction character library;
a correct character query module, for querying the error correction character library for a correct character string corresponding to the character error format;
an error character modification module, for replacing the error character string with the correct character string to obtain the correct recognition result.
10. The device according to claim 9, characterized in that the character correction module further comprises:
an error character marking module, for marking the determined error character string in the recognition result;
a recognition result sending module, for sending the marked recognition result to a terminal logged in to an auditor account for display;
wherein the correct character query module is further configured to receive a confirmation instruction returned by the terminal logged in to the auditor account and, according to the confirmation instruction, query the error correction character library for the correct character string corresponding to the character error format.
CN201710091081.4A 2017-02-20 2017-02-20 The character identifying method and device of document image Active CN107622263B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710091081.4A CN107622263B (en) 2017-02-20 2017-02-20 The character identifying method and device of document image

Publications (2)

Publication Number Publication Date
CN107622263A true CN107622263A (en) 2018-01-23
CN107622263B CN107622263B (en) 2018-08-21

Family

ID=61087822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710091081.4A Active CN107622263B (en) 2017-02-20 2017-02-20 The character identifying method and device of document image

Country Status (1)

Country Link
CN (1) CN107622263B (en)

Also Published As

Publication number Publication date
CN107622263B (en) 2018-08-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 1244923; Country of ref document: HK)
GR01 Patent grant