CN113011407A - System and method for automatically identifying, sorting and delivering electric charge rechecking document - Google Patents
- Publication number
- CN113011407A (application number CN202110163916.9A)
- Authority
- CN
- China
- Prior art keywords
- image data
- data
- module
- document
- unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/28—Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/32—Normalisation of the pattern dimensions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Character Discrimination (AREA)
Abstract
The invention provides a system and a method for automatically identifying, sorting and delivering electric charge rechecking documents. The method first collects document image data and optimizes it, extracts document information data through character recognition, and then verifies the document information data. Data that fails verification is marked and corrected; data that passes verification is matched with user information, and mail is delivered according to the matching result. The system delivers documents accurately and efficiently under the condition that the internal network and the external network are physically isolated.
Description
Technical Field
The invention relates to the technical field of automatic identification and sending of electric charge rechecking documents under physical isolation of an internal network and an external network, and in particular to a system and a method for automatically identifying, sorting and delivering electric charge rechecking documents.
Background
With rapid economic development, public legal awareness has risen year by year, and the demand for timely and accurate electric charge rechecking documents has become increasingly urgent. Because the internal network and the external network are physically isolated, the traditional delivery of electric charge rechecking documents relies mainly on manual processing: a document in the internal network is manually entered into an external-network computer, the user's e-mail address is looked up according to the user information on the document, and the documents are then sent one by one through a mail website. This traditional approach is time-consuming and error-prone, limits working efficiency, and cannot ensure that the electric charge rechecking document reaches the user promptly and accurately.
Disclosure of Invention
The invention aims to overcome the defects in the existing method and provide a system and a method for automatically identifying, sorting and delivering an electric charge rechecking document.
The purpose of the invention is realized by the following technical scheme:
A system for automatically identifying, sorting and delivering an electric charge rechecking document comprises a data acquisition module, a data processing module, an identification module, a verification module, a matching and delivery module and a database. The data acquisition module, the data processing module, the verification module and the matching and delivery module are all connected with the database. The data acquisition module is used for acquiring document image data and storing it in the database; the data processing module is used for optimizing the document image data; the data processing module is also connected with the identification module, which is used for obtaining document information data from the optimized document image data; the verification module is connected with the identification module and is used for verifying the document information data; the matching and delivery module is used for matching the document information data with the user information in the database, and for constructing and sending an e-mail according to the matching result.
The electric charge rechecking document is captured as image data, for example by photographing, and is then recognized to obtain the corresponding document information, so no manual information entry is needed. Compared with manual entry, recognition of document image data is faster and more accurate. The recognized information is also verified, which reduces the information error rate. In addition, the mailbox is matched according to the verified information, and the mail is composed and sent to the matched mailbox address, so that no manual intervention is needed in the whole process; document identification is efficient and accurate, automatic delivery is supported, and labor cost is saved.
Further, the database comprises an image data unit, a document information data unit, a user information unit and a rule standard information unit. The image data unit is connected with the data acquisition module and is used for storing the document image data acquired by the data acquisition module; the document information data unit is connected with the verification module and is used for storing document information data that has passed verification by the verification module; the user information unit is connected with the matching and delivery module and is used for providing the user information that the matching and delivery module needs for information matching; the rule standard information unit is connected with the verification module and is used for providing judgment rules and standards for the verification module.
Different types of data are stored in different data units, so that the data can be clearly distinguished during image data processing and delivery. Because the user information is stored, the recognized document information data can be matched against it, which ensures that the delivery mailbox address is accurate.
Furthermore, the data processing module comprises a preprocessing unit and a tilt correction unit. The preprocessing unit is connected with the image data unit, the tilt correction unit is connected with the preprocessing unit, and the tilt correction unit is also connected with the identification module. The preprocessing unit and the tilt correction unit are both used for optimizing the document image data acquired by the data acquisition module.
Before recognition, the document image data needs to be processed to improve recognition efficiency and accuracy. The preprocessing unit and the tilt correction unit optimize the initial document image data. The preprocessing unit reduces the amount of information in the document image, filters out irrelevant information and enhances useful information, which improves the reliability of subsequent recognition. The tilt correction unit addresses tilt in the initial document image data: the image is initially obtained by photographing with a mobile terminal, photographing with a PC-side camera, scanning with a scanner, uploading from an album or downloading from the system, so the shooting angle is likely to leave the image tilted, and correcting the tilt improves the accuracy of subsequent recognition.
Furthermore, the identification module comprises a character segmentation unit and a character recognition unit. The character segmentation unit is connected with the data processing module and is used for extracting the characters in the document image data and cutting them into single-character text images. The character recognition unit is connected with the character segmentation unit and with the verification module, and is used for recognizing the text images and transmitting the recognized document information data to the verification module.
Recognition is more accurate when it operates on single characters, so the characters in the document image data are cut out first and then recognized.
Further, a method for automatically identifying, sorting and delivering an electric charge rechecking document, applicable to the system described above, comprises the following steps:
step one, the data acquisition module acquires document image data and stores it in the image data unit of the database;
step two, the data processing module retrieves the document image data from the image data unit, optimizes it, and transmits the optimized document image data to the identification module;
step three, the identification module performs character recognition on the optimized document image data and extracts document information data;
step four, the identification module sends the document information data to the verification module, and the verification module verifies it; if the document information data passes verification, it is stored in the document information data unit of the database and step five is executed; if it does not pass verification, the failed document information data and the corresponding document image data are marked, a document processing staff member corrects the marked document image data, the corrected document image data is imported into the identification module, and the method returns to step three;
step five, the matching and delivery module matches the document information data in the document information data unit with the user information data in the user information unit, and then delivers the document information data to the corresponding user by mail according to the matching result.
The recognized document information data is verified by the verification module, which improves the accuracy of the document information; manual correction is supported, so that the document information finally mailed to the user is correct. Once the user's e-mail address has been obtained through matching, the mail does not need to be edited or the document information uploaded manually: the matching and delivery module composes the mail from the obtained document information and sends it directly, which greatly improves working efficiency and reduces labor.
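The control flow of steps one to five can be sketched as follows. This is only an illustration under stated assumptions: the `Database` class, the `run_pipeline` function, the `account` key and the callables passed in are hypothetical names, not part of the described system, and the handling of a failed user match (flagging the document) is an assumption, since the text only specifies what happens when verification fails.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Hypothetical stand-in for the database and its data units."""
    images: dict = field(default_factory=dict)      # image data unit
    documents: dict = field(default_factory=dict)   # document information data unit
    users: dict = field(default_factory=dict)       # user information unit: account -> e-mail
    flagged: list = field(default_factory=list)     # marked items awaiting manual correction


def run_pipeline(db, doc_id, image, optimize, recognize, verify, send_mail):
    """Run steps one to five for a single document; return True if mail was sent."""
    db.images[doc_id] = image                  # step one: acquire and store the image
    optimized = optimize(db.images[doc_id])    # step two: optimization processing
    info = recognize(optimized)                # step three: character recognition
    if not verify(info):                       # step four: verification
        db.flagged.append(doc_id)              # mark for correction by staff
        return False
    db.documents[doc_id] = info
    address = db.users.get(info["account"])    # step five: match with user information
    if address is None:                        # assumption: unmatched documents are flagged too
        db.flagged.append(doc_id)
        return False
    send_mail(address, info)
    return True
```

A flagged document would be corrected by staff and passed through `run_pipeline` again, which corresponds to returning to step three.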
Further, the optimization processing in step two includes preprocessing of the document image data, and the specific process is as follows: the preprocessing unit first normalizes the document image data so that all images have the same size, then binarizes the normalized document image data to obtain a binarized document image with prominent outlines, and then denoises the binarized document image by a composite processing method that combines an erosion operator and a dilation operator.
Normalization unifies the size of the document images, so that image regions can be located more reliably and effective information can be identified. Binarization processes the image using only black and white; applied to the document image data, it highlights the outline of the recognition target, compresses the amount of image information and improves the efficiency of subsequent recognition. Denoising mainly addresses background noise introduced when the picture is scanned, and can eliminate strong noise while preserving character features as far as possible.
Further, the optimization processing in step two also includes tilt correction of the text information in the document image data, and the specific process is as follows: the tilt correction unit detects the region containing text information in the document image data and inputs the region into the preprocessing unit, which produces a binarized image of the text region and returns it to the tilt correction unit; the tilt correction unit extracts the coordinates of all pixels with non-zero gray values in the binarized image of the text region and calculates the tilt direction of the text; meanwhile, the minimum rectangle enclosing the text region is obtained through the minAreaRect function, which gives the center point of the text region; an affine transformation is then applied to the text region in the document image data about that center point to obtain the corrected document image data.
Tilt correction of the characters improves segmentation accuracy when the characters are subsequently segmented.
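A dependency-free sketch of the tilt estimate is given below. It is not the described implementation: instead of OpenCV's `minAreaRect`, it swaps in a principal-axis (second-moment) estimate over the non-zero pixel coordinates so that it runs with the Python standard library alone, and it uses the centroid of those pixels as the center point. The function names are hypothetical. The 2x3 matrix has the same role as the affine rotation matrix that would be applied to the image in practice.

```python
import math


def estimate_skew(binary):
    """Return (angle_degrees, (cx, cy)) for the non-zero pixels of a binary image.

    The angle is the orientation of the principal axis in image coordinates
    (x to the right, y downwards).
    """
    pts = [(x, y) for y, row in enumerate(binary) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("no text pixels found")
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    sxx = sum((x - cx) ** 2 for x, _ in pts)
    syy = sum((y - cy) ** 2 for _, y in pts)
    sxy = sum((x - cx) * (y - cy) for x, y in pts)
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)   # principal-axis orientation
    return math.degrees(angle), (cx, cy)


def deskew_matrix(angle_deg, center):
    """2x3 affine matrix that rotates by -angle_deg about center."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    cx, cy = center
    return [[c, s, cx - c * cx - s * cy],
            [-s, c, cy + s * cx - c * cy]]


def apply_affine(m, point):
    """Map one (x, y) point through a 2x3 affine matrix."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

For a stroke lying along the image diagonal, `estimate_skew` reports 45 degrees, and mapping the stroke's pixels through `deskew_matrix` places them on one horizontal line through the center point.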
Further, before the identification module performs character recognition on the optimized document image data in step three, it extracts and cuts out the characters in the optimized document image data, and the specific process is as follows: the character segmentation unit projects the optimized document image data horizontally to obtain the upper and lower limits of each text line, and cuts the image into lines at those limits; it then projects each cut line vertically to obtain the left and right boundaries of each character, cuts the line at those boundaries, and transmits all the cut characters to the character recognition unit.
Recognition is more efficient when it operates on single characters, so the characters are cut out individually, which effectively reduces the character recognition error rate.
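The projection-based cutting can be sketched in a few lines of pure Python. The sketch assumes a binarized image stored as a list of rows in which text pixels are non-zero and background pixels are zero; the function names `runs` and `segment_characters` are hypothetical.

```python
def runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero profile entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_characters(binary):
    """Return one (top, bottom, left, right) box per character."""
    boxes = []
    row_profile = [sum(row) for row in binary]            # horizontal projection
    for top, bottom in runs(row_profile):                  # upper and lower limit of each line
        line = binary[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]     # vertical projection of the line
        for left, right in runs(col_profile):              # left and right boundary of each character
            boxes.append((top, bottom, left, right))
    return boxes
```

Each box can then be used to crop a single-character image for the recognition unit. Characters that touch or overlap horizontally would fall into one box, which is a known limitation of plain projection cutting.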
Further, in step three, the characters are classified and recognized through the fast-RCNN algorithm to obtain the document information data.
Further, the specific process of verifying the document information data in step four is as follows: the verification module retrieves the regular expression from the rule standard information unit and compares the document information data with it, then retrieves the user information structure from the rule standard information unit and compares the document information data with it; the document information data passes verification when it is consistent with both the regular expression and the user information structure; otherwise, verification fails.
The recognized user information is verified structurally, which ensures that no information is missing. Whether the document information data meets the requirements is also judged through the regular expression, which further reduces the document information error rate and allows erroneous documents to be corrected in time.
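A minimal sketch of such a two-part check is shown below. The field names and patterns in `FIELD_RULES` are purely illustrative assumptions: the text does not state which fields a document carries or which regular expressions the rule standard information unit holds. The structure check is modelled here as "every required field is present".

```python
import re

# Hypothetical rule set: required fields (the structure) and a pattern for each.
FIELD_RULES = {
    "account_no": re.compile(r"\d{10}"),
    "user_name": re.compile(r".{1,40}"),
    "amount": re.compile(r"\d+\.\d{2}"),
    "billing_month": re.compile(r"\d{4}-(0[1-9]|1[0-2])"),
}


def verify(info):
    """Return (passed, problem_fields) for one recognized document."""
    # Structure check: every required field must be present.
    missing = [name for name in FIELD_RULES if name not in info]
    # Regular-expression check: every present field must match its whole pattern.
    malformed = [name for name, rule in FIELD_RULES.items()
                 if name in info and not rule.fullmatch(str(info[name]))]
    return (not missing and not malformed), missing + malformed
```

The list of problem fields could be used to mark the document for manual correction, as step four describes.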
The invention has the beneficial effects that:
Information is recognized from collected image data of the electric charge rechecking document, without manual information entry, which addresses the problem of physical isolation between the internal network and the external network, improves the efficiency of information entry and reduces the document information error rate. The original document image data is processed by preprocessing and tilt correction, which improves recognition accuracy. The recognized information is verified before delivery matching, so that the user information on the document is correct and can be matched with the user information in the database; this prevents a document from being sent to the wrong user and ensures the accuracy of the information recognized from the document image data. Mail is sent automatically, without manual information entry or manual sending, which reduces staff workload and improves document delivery efficiency.
Drawings
FIG. 1 is a schematic diagram of an embodiment of the present invention;
FIG. 2 is a schematic flow diagram of the present invention;
FIG. 3 is a first interface diagram of an implementation interface of the system for automatically identifying, sorting and delivering the electric charge review documents according to the embodiment of the invention;
FIG. 4 is a second interface diagram of the interface of the system for automatically identifying, sorting and delivering the electric charge review documents according to the embodiment of the invention;
FIG. 5 is a third interface diagram of an implementation interface of the system for automatically identifying, sorting and delivering the electric charge review documents according to the embodiment of the invention;
wherein: 1, data acquisition module; 2, data processing module; 2-1, preprocessing unit; 2-2, tilt correction unit; 3, identification module; 3-1, character segmentation unit; 3-2, character recognition unit; 4, verification module; 5, matching and delivery module; 6, database; 6-1, image data unit; 6-2, document information data unit; 6-3, user information unit; 6-4, rule standard information unit.
Detailed Description
The invention is further described below with reference to the figures and examples.
Embodiment:
A system for automatically identifying, sorting and delivering an electric charge rechecking document comprises a data acquisition module 1, a data processing module 2, an identification module 3, a verification module 4, a matching and delivery module 5 and a database 6. The data acquisition module 1, the data processing module 2, the verification module 4 and the matching and delivery module 5 are all connected with the database 6. The data acquisition module 1 is used for acquiring document image data and storing it in the database 6; the data processing module 2 is used for optimizing the document image data; the data processing module 2 is also connected with the identification module 3, which is used for obtaining document information data from the optimized document image data; the verification module 4 is connected with the identification module 3 and is used for verifying the document information data; the matching and delivery module 5 is used for matching the document information data with the user information in the database 6, and for constructing and sending an e-mail according to the matching result.
The data acquisition module 1 supports acquiring images in any of the following ways: photographing with a mobile terminal, photographing with a PC-side camera, scanning with a scanner, uploading from an album, or downloading from the system. The default acquisition mode is batch photographing with the PC-side camera; a high-definition camera gives a clear image, which facilitates subsequent image data processing and character recognition.
The database 6 comprises an image data unit 6-1, a document information data unit 6-2, a user information unit 6-3 and a rule standard information unit 6-4. The image data unit 6-1 is connected with the data acquisition module 1 and is used for storing the document image data acquired by the data acquisition module 1; the document information data unit 6-2 is connected with the verification module 4 and is used for storing document information data that has passed verification by the verification module 4; the user information unit 6-3 is connected with the matching and delivery module 5 and is used for providing the user information that the matching and delivery module 5 needs for information matching; the rule standard information unit 6-4 is connected with the verification module 4 and is used for providing judgment rules and standards for the verification module 4.
A fixed image storage path is provided for the collected document image data, and the path can be modified at any time.
The data processing module 2 comprises a preprocessing unit 2-1 and a tilt correction unit 2-2. The preprocessing unit 2-1 is connected with the image data unit 6-1, the tilt correction unit 2-2 is connected with the preprocessing unit 2-1, and the tilt correction unit 2-2 is also connected with the identification module 3. The preprocessing unit 2-1 and the tilt correction unit 2-2 are both used for optimizing the document image data acquired by the data acquisition module 1.
The identification module 3 comprises a character segmentation unit 3-1 and a character identification unit 3-2, the character segmentation unit 3-1 is connected with the data processing module 2, the character segmentation unit 3-1 is used for extracting characters in document image data and cutting the characters into single text images, the character identification unit 3-2 is connected with the character segmentation unit 3-1, the character identification unit 3-2 is further connected with the verification module 4, and the character identification unit 3-2 is used for identifying the text images and transmitting document information data obtained through identification to the verification module 4.
The document contains a large amount of information, including numbers, Chinese characters and table structures, so the characters are separated by character segmentation before the document information is recognized and entered; recognition is more accurate on single-character images.
A method for automatically identifying, sorting and delivering an electric charge rechecking document, applicable to the system described above and shown in FIG. 2, comprises the following steps:
step one, the data acquisition module 1 acquires document image data and stores it in the image data unit 6-1 of the database 6;
step two, the data processing module 2 retrieves the document image data from the image data unit 6-1, optimizes it, and transmits the optimized document image data to the identification module 3;
step three, the identification module 3 performs character recognition on the optimized document image data and extracts document information data;
step four, the identification module 3 sends the document information data to the verification module 4, and the verification module 4 verifies it; if the document information data passes verification, it is stored in the document information data unit 6-2 of the database 6 and step five is executed; if it does not pass verification, the failed document information data and the corresponding document image data are marked, a document processing staff member corrects the marked document image data, the corrected document image data is imported into the identification module 3, and the method returns to step three;
step five, the matching and delivery module 5 matches the document information data in the document information data unit 6-2 with the user information data in the user information unit 6-3, and then delivers the document information data to the corresponding user by mail according to the matching result.
Before the mail is sent, the mailbox address contained in the matched user information is passed to the smtplib module as the recipient, and the MIME module is used to construct the structure of the e-mail, such as the mail header, sender, recipient and mail body. MIMEMultipart is then used to add an attachment: the document information data is read automatically and added to the constructed mail. After the mail has been constructed automatically, the smtplib module connects to a preset mail server, and the mail is sent to the specified mailbox.
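The mail construction described above maps onto the Python standard library roughly as follows. This is a sketch under assumptions: the function names, the choice of a generic binary attachment, and the plain unauthenticated SMTP connection on port 25 are illustrative, and a real mail server would typically also require TLS and a login.

```python
import smtplib
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mail(sender, recipient, subject, body, attachment_name, attachment_bytes):
    """Build a multipart e-mail with a text body and one attachment."""
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["To"] = recipient          # address taken from the matched user information
    msg["Subject"] = subject
    msg.attach(MIMEText(body, "plain", "utf-8"))
    part = MIMEApplication(attachment_bytes)   # document information data as attachment
    part.add_header("Content-Disposition", "attachment", filename=attachment_name)
    msg.attach(part)
    return msg


def send_mail(msg, host, port=25):
    """Connect to the preset mail server and send the message (assumes no authentication)."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)
```

Keeping construction and sending in separate functions lets the message be inspected or logged before any network connection is opened.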
In step two, the optimization processing comprises preprocessing of the document image data, which proceeds as follows: the preprocessing unit 2-1 first normalizes the document image data so that all images have the same size, then binarizes the normalized document image data to obtain a binarized document image with a prominent outline, and then denoises the binarized document image by a composite processing method that combines an erosion operator and a dilation operator.
Binarization sets the gray value of every pixel in the image to either 0 or 255, so that the image is rendered in black and white only. This both highlights the outline of the recognition target and compresses the information content of the image, which improves processing efficiency. The binarization specifically uses a local adaptive threshold method, which avoids the inability of global binarization to handle image shadows and uneven illumination.
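To illustrate why a local threshold copes with uneven illumination, the following pure-Python sketch compares each pixel with the mean of its neighbourhood. The function name, the block size of 3, and the offset `c` of 12 are assumptions chosen for this toy image; the source does not state its parameters.

```python
def adaptive_threshold(gray, block=3, c=12):
    """Binarize a grayscale image given as a list of rows with values 0..255.

    A pixel becomes 255 when it is brighter than the mean of its
    block x block neighbourhood minus the constant c, otherwise 0.
    """
    h, w = len(gray), len(gray[0])
    r = block // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Neighbourhood clipped at the image border.
            window = [gray[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            local_mean = sum(window) / len(window)
            out[y][x] = 255 if gray[y][x] > local_mean - c else 0
    return out


# Background fades from 200 to 80 (a shadow); two dark strokes sit on it.
gray = [
    [200, 180, 160, 140, 120, 100, 80],
    [200, 120, 160, 140, 120, 40, 80],
    [200, 180, 160, 140, 120, 100, 80],
]
local_result = adaptive_threshold(gray)
global_result = [[255 if v > 128 else 0 for v in row] for row in gray]
```

On this image the local method marks only the two strokes as black, whereas the fixed global threshold of 128 also turns the shadowed background on the right black.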
Denoising mainly addresses the background noise introduced when the document is scanned; such noise appears most readily in regions where the document text is dense. The composite method that combines the erosion operator and the dilation operator removes strong noise while preserving character features as far as possible, and so serves the purpose of document image optimization. The erosion operator removes the boundary points of objects in the binary image and eliminates objects smaller than the structuring element. The dilation operator does the opposite: it merges the background points that touch an object into the object, so that the object boundary expands outward. The binarized document image is therefore processed with both operators, and denoising is completed simply by choosing a structuring element of reasonable size and applying either dilation followed by erosion or erosion followed by dilation.
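The erosion-then-dilation order can be sketched in pure Python on a small binary image (1 = ink, 0 = background). The 3 × 3 square structuring element and the choice to ignore neighbours outside the image are assumptions made for this illustration.

```python
def _window(img, y, x, r):
    """Pixels under a (2r+1) x (2r+1) structuring element, clipped at the border."""
    h, w = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(0, y - r), min(h, y + r + 1))
            for i in range(max(0, x - r), min(w, x + r + 1))]


def erode(img, k=3):
    """Keep a pixel only if every pixel under the structuring element is ink."""
    r = k // 2
    return [[int(all(_window(img, y, x, r))) for x in range(len(img[0]))]
            for y in range(len(img))]


def dilate(img, k=3):
    """Set a pixel if any pixel under the structuring element is ink."""
    r = k // 2
    return [[int(any(_window(img, y, x, r))) for x in range(len(img[0]))]
            for y in range(len(img))]


def denoise(img, k=3):
    """Erosion followed by dilation: removes specks smaller than the element."""
    return dilate(erode(img, k), k)


noisy = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],  # the lone 1 at column 5 is scanner noise
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
clean = denoise(noisy)
```

The isolated noise pixel disappears during erosion and is not restored by dilation, while the 3 × 3 stroke shrinks to its centre and then grows back to its original shape.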
In step two, the optimization processing further comprises tilt correction of the text information in the document image data, which proceeds as follows: the tilt correction unit 2-2 detects the region of the document image data that contains text information and passes this region to the preprocessing unit 2-1, which processes it into a binary image of the text region and returns that binary image to the tilt correction unit 2-2. The tilt correction unit 2-2 extracts the coordinates of all pixels with non-zero gray values in the text region binary image and calculates the direction in which the text is tilted. At the same time, it obtains the minimum rectangle containing the text region through the minAreaRect function and thereby the center point of the text region. It then applies an affine transformation to the text region of the document image data about that center point to obtain the corrected document image data.
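The geometry of the tilt correction can be sketched without an image library. Note the substitutions: instead of the minAreaRect function named in the text, this sketch estimates the tilt from the second moments (principal axis) of the non-zero pixel coordinates and uses their centroid as the center point. All function names are hypothetical, and the input must contain at least one non-zero pixel.

```python
import math


def text_skew(binary):
    """Return (angle in degrees, (cx, cy)) for the non-zero pixels of a binary image."""
    pts = [(x, y) for y, row in enumerate(binary) for x, v in enumerate(row) if v]
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    sxx = sum((x - cx) ** 2 for x, _ in pts)
    syy = sum((y - cy) ** 2 for _, y in pts)
    sxy = sum((x - cx) * (y - cy) for x, y in pts)
    # Orientation of the principal axis of the pixel cloud.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.degrees(angle), (cx, cy)


def rotation_matrix(angle_deg, center):
    """2x3 affine matrix that rotates by -angle_deg about center, undoing the tilt."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    cx, cy = center
    return [[c, s, cx - c * cx - s * cy],
            [-s, c, cy + s * cx - c * cy]]


def apply_affine(m, point):
    """Map one (x, y) point through the 2x3 affine matrix."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# A stroke tilted by 45 degrees in image coordinates.
binary = [[1 if x == y else 0 for x in range(6)] for y in range(6)]
angle, center = text_skew(binary)
matrix = rotation_matrix(angle, center)
corrected = [apply_affine(matrix, (i, i)) for i in range(6)]
```

After the transformation every stroke pixel lies on the same horizontal line through the center point, which is the effect the affine correction is meant to achieve.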
In step three, before the identification module 3 performs character recognition on the optimized document image data, it also extracts and cuts out the characters in that data, as follows: the character segmentation unit 3-1 projects the optimized document image data horizontally to obtain the upper and lower bounds of each text line and cuts the image into lines at those bounds. It then projects each cut line vertically to obtain the left and right bounds of each character, cuts the line into characters at those bounds, and transmits all the resulting character images to the character recognition unit 3-2.
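A minimal sketch of the projection-based cutting, assuming a binary image in which 1 marks ink and blank rows or columns separate lines and characters. The helper names `runs` and `segment_characters` are hypothetical.

```python
def runs(profile):
    """Return (start, end) pairs of consecutive non-zero entries; end is exclusive."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_characters(binary):
    """Split a binary image into text lines, then each line into character boxes."""
    boxes = []
    row_profile = [sum(row) for row in binary]           # horizontal projection
    for top, bottom in runs(row_profile):
        line = binary[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]   # vertical projection
        for left, right in runs(col_profile):
            boxes.append((top, bottom, left, right))
    return boxes


page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1],
]
boxes = segment_characters(page)
```

Each box is `(top, bottom, left, right)` with exclusive end indices, so `page[top:bottom]` sliced to columns `left:right` yields one character image. Touching characters would need additional handling that this sketch does not attempt.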
In step three, character recognition is performed by classifying the characters with the fast-RCNN algorithm to obtain the document information data.
The fast-RCNN algorithm is organized into several processing modules: an input module, a convolutional layer, a linear rectification (ReLU) layer, a pooling layer, a fully connected layer, and a classification regression module. First, features are extracted from the input document image data by the convolution kernels in the convolutional layer. The convolutional layer contains multiple convolution kernels, and each element of a kernel corresponds to a weight coefficient and a bias, similar to a neuron of a feedforward neural network. In operation, a convolution kernel sweeps over the input features at regular intervals, multiplies the input elementwise with the kernel, sums the products, and adds the bias. Here, kernels of size 3 × 3 are used, and feature extraction is performed with 6 convolution kernels that have different weight coefficients. After the features are extracted, the linear rectification layer, which is activated by the ReLU function, further finds positive-sample detection boxes through detection box regression. A positive-sample detection box contains the character information to be detected and classified, which further reduces the interference of the background on recognition. The pooling layer then combines the extracted feature map with the candidate regions framed by the positive-sample detection boxes to produce candidate feature maps, which the subsequent fully connected layer and classification regression layer classify. A pooling size of 2 × 2 is selected, the pooling result is obtained by a weighted average, and the weighting coefficients are obtained iteratively by gradient descent.
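The layer operations named above (3 × 3 convolution with a bias, ReLU activation, and 2 × 2 weighted-average pooling) can be sketched in pure Python. This illustrates the individual operations only and is not an implementation of the fast-RCNN algorithm; the kernel, bias, and pooling weights are fixed by hand here, whereas the text says such coefficients are learned.

```python
def conv2d(image, kernel, bias=0.0):
    """Slide the kernel over the image without padding: multiply, sum, add the bias."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[y + j][x + i] * kernel[j][i]
                 for j in range(kh) for i in range(kw)) + bias
             for x in range(w - kw + 1)]
            for y in range(h - kh + 1)]


def relu(fmap):
    """Linear rectification: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def weighted_pool2x2(fmap, weights):
    """Non-overlapping 2x2 pooling; each output is a weighted average of its window."""
    total = sum(sum(row) for row in weights)
    return [[sum(fmap[y + j][x + i] * weights[j][i]
                 for j in range(2) for i in range(2)) / total
             for x in range(0, len(fmap[0]) - 1, 2)]
            for y in range(0, len(fmap) - 1, 2)]


# A 6x6 image with a vertical edge and a hand-picked edge-detecting kernel.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
kernel = [[-1, 0, 1]] * 3
features = relu(conv2d(image, kernel, bias=-1.0))
pooled = weighted_pool2x2(features, [[1, 3], [1, 3]])
```

The 6 × 6 input becomes a 4 × 4 feature map after the convolution and a 2 × 2 map after pooling, which shows how pooling reduces the amount of data passed to later layers.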
The pooling layer aggregates statistics over local regions of the image features, which greatly reduces the dimensionality of the feature vector and the amount of computation required by the subsequent fully connected layer and classification regression layer. The fully connected layer and the classification regression layer classify the candidate feature maps to obtain the probability of each class, and the final, precise position of the detection box is then computed from the original position of the detected target, which yields the recognition result for the character.
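As a rough illustration of how a fully connected layer turns pooled features into class probabilities, the sketch below applies one linear layer followed by a softmax. The softmax, the weights, and the two-label alphabet are assumptions for this example; the source does not specify how the probabilities are computed.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases, labels):
    """One fully connected layer followed by softmax; returns (label, probability)."""
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]


# Flattened 2x2 pooled feature map and hand-picked weights for two classes.
features = [1.5, 0.5, 1.5, 0.5]
weights = [[1, 0, 1, 0],   # responds to the left column
           [0, 1, 0, 1]]   # responds to the right column
label, prob = classify(features, weights, [0.0, 0.0], ["0", "1"])
```

In a trained network the weights would come from training rather than being written by hand, and there would be one output per character class.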
The document information data is verified in step four as follows: the verification module 4 retrieves the regular expression from the rule standard information unit 6-4 and compares the document information data with it. At the same time, the verification module 4 retrieves the user information structure from the rule standard information unit 6-4 and compares the document information data with that structure. The document information data passes verification only when it is consistent with both the regular expression and the user information structure; otherwise, verification fails.
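The two checks can be sketched with Python's `re` module. The concrete rules are hypothetical: the source does not give the regular expressions or the user information structure, so a 10-digit user number and an amount with two decimals are assumed here purely for illustration.

```python
import re

# Hypothetical rules standing in for the contents of the rule standard information unit.
RULES = {
    "user_number": re.compile(r"\d{10}"),
    "amount": re.compile(r"\d+\.\d{2}"),
}
# Hypothetical user information structure: the fields every record must contain.
REQUIRED_FIELDS = ("user_number", "amount")


def verify(record):
    """Pass only if the record has exactly the required fields and each matches its rule."""
    if set(record) != set(REQUIRED_FIELDS):
        return False
    return all(RULES[name].fullmatch(record[name]) for name in REQUIRED_FIELDS)


ok = verify({"user_number": "0012345678", "amount": "123.45"})
bad_digit = verify({"user_number": "00123A5678", "amount": "123.45"})
missing = verify({"user_number": "0012345678"})
```

Using `fullmatch` rather than `search` ensures that a misrecognized character anywhere in a field causes the record to fail and be routed to manual correction.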
The implementation interface of the system for automatically identifying, sorting and delivering the electric charge rechecking document is divided into three parts. As shown in fig. 3, the first part covers user number identification and mail sending and contains three units: the first imports user mailbox addresses from Excel, the second identifies the user number, and the third sends the image to the user's mailbox. As shown in fig. 4, the second part is the mail sending record query, which supports querying by user number or start date and exporting the sending records. As shown in fig. 5, the third part is the operation log, which shows the status of the overall workflow so that the user can quickly see the identification result of each document and troubleshoot problems in time. Through this interface, the electric charge rechecking documents can be identified and delivered automatically and the sending status can be confirmed; the interface is easy to understand and simple to operate.
In this embodiment, the electric power customers of a certain city are taken as the experimental object, and 200 of their electric charge rechecking documents are used as test samples to evaluate the system. The 200 documents are scanned, stored in jpg format, and then processed by the automatic document identification, sorting and delivery system developed in the invention. The test criteria cover two aspects: whether the electricity account numbers can be accurately identified in batches, and whether the identified documents are delivered. The results are evaluated by the recognition rate and the delivery rate, each computed as the number of successes divided by the corresponding total and expressed as a percentage.
The test results are shown in Table 1 below.
Table 1: Test results
Test item | Successes | Total | Success rate |
---|---|---|---|
Identification | 194 | 200 | 97% |
Delivery | 194 | 194 | 100% |
The total processing time for the 200 documents is 2070 seconds, an average of 10.35 seconds per document. Manual processing of a single document takes about 36 seconds on average, so the processing time per document is reduced by about 71.25% compared with manual processing. The table above shows that the method provided herein effectively improves document processing efficiency while maintaining a high recognition accuracy.
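The reported figures can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are those given in the text and in Table 1.

```python
identified, total_documents = 194, 200
delivered = 194

recognition_rate = 100 * identified / total_documents   # percent of documents identified
delivery_rate = 100 * delivered / identified             # percent of identified documents delivered

total_seconds = 2070
manual_seconds = 36
average_seconds = total_seconds / total_documents
time_reduction = 100 * (manual_seconds - average_seconds) / manual_seconds

print(f"recognition rate: {recognition_rate:.0f}%")
print(f"delivery rate: {delivery_rate:.0f}%")
print(f"average time per document: {average_seconds:.2f} s")
print(f"time reduction vs. manual: {time_reduction:.2f}%")
```

Note that the delivery rate is measured against the 194 successfully identified documents, not against all 200 samples.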
The above-described embodiments are only preferred embodiments of the present invention, and are not intended to limit the present invention in any way, and other variations and modifications may be made without departing from the spirit of the invention as set forth in the claims.
Claims (10)
1. A system for automatically identifying, sorting and delivering electric charge rechecking documents, characterized by comprising a data acquisition module (1), a data processing module (2), an identification module (3), a verification module (4), a matching and delivery module (5) and a database (6), wherein the data acquisition module (1), the data processing module (2), the verification module (4) and the matching and delivery module (5) are all connected with the database (6); the data acquisition module (1) is used for acquiring document image data and storing the document image data into the database (6); the data processing module (2) is used for optimizing the document image data; the data processing module (2) is also connected with the identification module (3), and the identification module (3) is used for obtaining document information data from the optimized document image data; the verification module (4) is connected with the identification module (3), and the verification module (4) is used for verifying the document information data; the matching and delivery module (5) is used for matching the document information data with the user information, and is also used for constructing and sending an e-mail according to the matching result.
2. The system for automatically identifying, sorting and delivering electric charge rechecking documents according to claim 1, wherein the database (6) comprises an image data unit (6-1), a document information data unit (6-2), a user information unit (6-3) and a rule standard information unit (6-4); the image data unit (6-1) is connected with the data acquisition module (1) and is used for storing the document image data acquired by the data acquisition module (1); the document information data unit (6-2) is connected with the verification module (4) and is used for storing the document information data verified by the verification module (4); the user information unit (6-3) is connected with the matching and delivery module (5) and is used for providing the user information that the matching and delivery module (5) requires for information matching; the rule standard information unit (6-4) is connected with the verification module (4) and is used for providing judgment rules and standards for the verification module (4).
3. The system for automatically identifying, sorting and delivering electric charge rechecking documents according to claim 1, wherein the data processing module (2) comprises a preprocessing unit (2-1) and a tilt correction unit (2-2); the preprocessing unit (2-1) is connected with the image data unit (6-1), the tilt correction unit (2-2) is connected with the preprocessing unit (2-1), and the tilt correction unit (2-2) is further connected with the identification module (3); the preprocessing unit (2-1) and the tilt correction unit (2-2) are used for optimizing the document image data acquired by the data acquisition module (1).
4. The system for automatically identifying, sorting and delivering electric charge rechecking documents according to claim 1, wherein the identification module (3) comprises a character segmentation unit (3-1) and a character recognition unit (3-2); the character segmentation unit (3-1) is connected with the data processing module (2) and is used for extracting the characters in the document image data and cutting them into single character images; the character recognition unit (3-2) is connected with the character segmentation unit (3-1) and with the verification module (4), and is used for recognizing the character images and transmitting the document information data obtained through recognition to the verification module (4).
5. A method for automatically identifying, sorting and delivering electric charge rechecking documents, applicable to a system for automatically identifying, sorting and delivering electric charge rechecking documents, characterized by comprising the following steps:
step one, the data acquisition module (1) acquires document image data and stores the acquired document image data into the image data unit (6-1) of the database (6);
step two, the data processing module (2) retrieves the document image data from the image data unit (6-1), performs optimization processing on the document image data, and transmits the processed document image data to the identification module (3);
step three, the identification module (3) performs character recognition on the optimized document image data and extracts the document information data;
step four, the identification module (3) sends the document information data to the verification module (4), and the verification module (4) verifies the document information data; if the document information data passes verification, it is stored in the document information data unit (6-2) of the database (6) and step five is executed; if the document information data fails verification, the failed document information data and the corresponding document image data are marked, a document processing operator corrects the marked document image data, the corrected document image data is imported into the identification module (3), and execution returns to step three;
step five, the matching and delivery module (5) matches the document information data in the document information data unit (6-2) with the user information data in the user information unit (6-3), and then delivers the document information data to the corresponding user by e-mail according to the matching result.
6. The method for automatically identifying, sorting and delivering electric charge rechecking documents according to claim 5, wherein the optimization processing in step two comprises preprocessing of the document image data, which proceeds as follows: the preprocessing unit (2-1) first normalizes the document image data so that all images have the same size, then binarizes the normalized document image data to obtain a binarized document image with a prominent outline, and then denoises the binarized document image by a composite processing method that combines an erosion operator and a dilation operator.
7. The method for automatically identifying, sorting and delivering electric charge rechecking documents according to claim 5, wherein the optimization processing in step two further comprises tilt correction of the text information in the document image data, which proceeds as follows: the tilt correction unit (2-2) detects the region of the document image data that contains text information and passes the region to the preprocessing unit (2-1), which processes it into a text region binary image and returns the text region binary image to the tilt correction unit (2-2); the tilt correction unit (2-2) extracts the coordinates of all pixels with non-zero gray values in the text region binary image and calculates the direction of the text tilt; at the same time, the minimum rectangle containing the text region is obtained through the minAreaRect function, from which the center point of the text region is obtained; an affine transformation is then applied to the text region of the document image data about the center point of the text region to obtain the corrected document image data.
8. The method for automatically identifying, sorting and delivering electric charge rechecking documents according to claim 5, wherein in step three, before the identification module (3) performs character recognition on the optimized document image data, the identification module (3) further extracts and cuts out the characters in the optimized document image data, as follows: the character segmentation unit (3-1) projects the optimized document image data horizontally to obtain the upper and lower bounds of each line of the document image data, cuts the image at those bounds, projects each cut line vertically to obtain the left and right bounds of each character, cuts each line at those bounds, and transmits all the characters obtained through cutting to the character recognition unit (3-2).
9. The method for automatically identifying, sorting and delivering electric charge rechecking documents according to claim 5, wherein the character recognition in step three is performed by classification with a fast-RCNN algorithm to obtain the document information data.
10. The method for automatically identifying, sorting and delivering electric charge rechecking documents according to claim 5, wherein the document information data is verified in step four as follows: the verification module (4) retrieves a regular expression from the rule standard information unit (6-4) and compares the document information data with the regular expression; at the same time, the verification module (4) retrieves a user information structure from the rule standard information unit (6-4) and compares the document information data with the user information structure; the document information data passes verification when it is consistent with both the regular expression and the user information structure; otherwise, verification fails.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110163916.9A CN113011407A (en) | 2021-02-05 | 2021-02-05 | System and method for automatically identifying, sorting and delivering electric charge rechecking document |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113011407A (en) | 2021-06-22 |
Family
ID=76385526
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110163916.9A Pending CN113011407A (en) | 2021-02-05 | 2021-02-05 | System and method for automatically identifying, sorting and delivering electric charge rechecking document |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113011407A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210622 |