CN113011407A - System and method for automatically identifying, sorting and delivering electric charge rechecking document - Google Patents


Info

Publication number
CN113011407A
Authority
CN
China
Prior art keywords
image data
data
module
document
unit
Legal status
Pending
Application number
CN202110163916.9A
Other languages
Chinese (zh)
Inventor
何妍妍
鲍卫东
范光平
徐向东
龚鸿仙
Current Assignee
Jinhua Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Yiwu Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
Jinhua Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Yiwu Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Application filed by Jinhua Power Supply Co of State Grid Zhejiang Electric Power Co Ltd and Yiwu Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority to CN202110163916.9A
Publication of CN113011407A
Legal status: Pending


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V10/00 Arrangements for image or video recognition or understanding
            • G06V10/10 Image acquisition
            • G06V10/20 Image preprocessing
              • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
              • G06V10/30 Noise filtering
              • G06V10/32 Normalisation of the pattern dimensions
          • G06V20/00 Scenes; Scene-specific elements
            • G06V20/60 Type of objects
              • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
          • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
            • G06V30/10 Character recognition
              • G06V30/14 Image acquisition
                • G06V30/148 Segmentation of character regions
                  • G06V30/153 Segmentation of character regions using recognition of characters or words
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/24 Classification techniques
                • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/045 Combinations of networks
              • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Character Discrimination (AREA)

Abstract

The invention provides a system and method for automatically identifying, sorting and delivering electric charge rechecking documents. The method first collects document image data and optimizes it, extracts document information data through character recognition, and then verifies the document information data: data that fail verification are flagged and corrected, while data that pass are matched against user information and delivered by e-mail according to the matching result. The system delivers documents accurately and efficiently even when the internal and external networks are physically isolated.

Description

System and method for automatically identifying, sorting and delivering electric charge rechecking document
Technical Field
The invention relates to the automatic identification and sending of electric charge rechecking documents when the internal and external networks are physically isolated, and in particular to a system and method for automatically identifying, sorting and delivering electric charge rechecking documents.
Background
With rapid economic development and growing legal awareness among the public, the demand for timely and accurate electric charge rechecking documents has become increasingly urgent. Because the internal and external networks are physically isolated, delivery of these documents has traditionally relied on manual processing: staff re-enter each document from the internal network into an external-network computer, look up the user's e-mail address from the user information on the document, and then send the documents one by one through a webmail site. This approach is time-consuming and error-prone, limits productivity, and cannot reliably ensure that documents reach users promptly and accurately.
Disclosure of Invention
The invention aims to overcome the shortcomings of the existing method by providing a system and method for automatically identifying, sorting and delivering an electric charge rechecking document.
The object of the invention is achieved by the following technical solution:
A system for automatically identifying, sorting and delivering an electric charge rechecking document comprises a data acquisition module, a data processing module, an identification module, a verification module, a matching and delivery module, and a database. The data acquisition module, data processing module, verification module, and matching and delivery module are all connected to the database. The data acquisition module acquires document image data and stores it in the database. The data processing module optimizes the document image data and is also connected to the identification module, which extracts document information data from the optimized document image data. The verification module is connected to the identification module and verifies the document information data. The matching and delivery module matches the document information data against the user information in the database, and constructs and sends an e-mail according to the matching result.
The electric charge rechecking document is captured as image data, for example by photographing it, and the image is recognized to obtain the document information, so no manual data entry is required. Compared with manual entry, recognizing the document image data is faster and more accurate. The recognized information is also verified, which lowers the information error rate. Finally, the verified information is used to look up the user's mailbox, and the e-mail is composed and sent to the matched address. The whole process therefore runs without manual intervention: recognition is efficient and accurate, delivery is automatic, and labor costs are reduced.
Further, the database comprises an image data unit, a document information data unit, a user information unit, and a rule standard information unit. The image data unit is connected to the data acquisition module and stores the document image data it acquires; the document information data unit is connected to the verification module and stores the document information data that has passed verification; the user information unit is connected to the matching and delivery module and supplies the user information it needs for matching; the rule standard information unit is connected to the verification module and supplies the judgment rules and standards that the verification module applies.
Storing each type of data in its own unit keeps the data cleanly separated during image processing and delivery. Because user information is stored, the recognized document information data can be matched against it, which ensures that the delivery mailbox address is correct.
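As an illustration of how the four database units divide the data, the following is a minimal in-memory sketch. The class name, method names, and the use of plain dictionaries keyed by document ID or user key are hypothetical; the patent does not specify a storage engine or schema.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Hypothetical in-memory stand-in for the database and its four units."""

    image_data: dict = field(default_factory=dict)      # image data unit: doc_id -> image
    document_info: dict = field(default_factory=dict)   # document information data unit: doc_id -> verified fields
    user_info: dict = field(default_factory=dict)       # user information unit: user key -> user record
    rule_standards: dict = field(default_factory=dict)  # rule standard information unit: field -> rule

    def store_image(self, doc_id, image):
        """Called by the data acquisition module."""
        self.image_data[doc_id] = image

    def store_verified(self, doc_id, info):
        """Called by the verification module, only for documents that passed."""
        self.document_info[doc_id] = info

    def lookup_user(self, key):
        """Used by the matching and delivery module; returns None when there is no match."""
        return self.user_info.get(key)
```

Keeping verified data in a separate unit from raw images means the matching step never reads unverified recognition output.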
Further, the data processing module comprises a preprocessing unit and a tilt correction unit. The preprocessing unit is connected to the image data unit; the tilt correction unit is connected to the preprocessing unit and also to the identification module. Both units optimize the document image data acquired by the data acquisition module.
Before recognition, the document image data must be processed to improve recognition efficiency and accuracy. The preprocessing unit and the tilt correction unit optimize the raw document image data. The preprocessing unit reduces the amount of image information, filters out irrelevant information, and enhances useful information, making subsequent recognition more reliable. The tilt correction unit addresses tilt in the raw document image data: because the image may be obtained by mobile phone camera, PC camera, scanner, album upload, or system download, the shooting angle is likely to tilt the image, and correcting the tilt improves the accuracy of subsequent recognition.
Further, the identification module comprises a character segmentation unit and a character recognition unit. The character segmentation unit is connected to the data processing module; it extracts the characters in the document image data and cuts them into single-character text images. The character recognition unit is connected to the character segmentation unit and to the verification module; it recognizes the text images and passes the resulting document information data to the verification module.
Recognizing one character at a time is more accurate, so cutting the characters out of the document image data before recognizing them greatly improves recognition accuracy.
Further, a method for automatically identifying, sorting and delivering an electric charge rechecking document, applicable to the above system, comprises the following steps:
Step one: the data acquisition module acquires document image data and stores it in the image data unit of the database;
Step two: the data processing module retrieves the document image data from the image data unit, optimizes it, and passes the processed document image data to the identification module;
Step three: the identification module performs character recognition on the optimized document image data and extracts the document information data;
Step four: the identification module sends the document information data to the verification module, which verifies it. If the document information data passes verification, it is stored in the document information data unit of the database and step five is executed. If it fails verification, the failing document information data and the corresponding document image data are flagged, a document processor corrects the flagged document image data and imports the corrected image into the identification module, and the method returns to step three;
Step five: the matching and delivery module matches the document information data in the document information data unit against the user information data in the user information unit, and then delivers the document information data to the corresponding user by e-mail according to the matching result.
The verification module checks the recognized document information data, improving its accuracy, and manual correction is supported, so the document information finally e-mailed to the user is correct. Once the user's e-mail address has been obtained by matching, no one needs to compose the mail or upload the document information by hand: the matching and delivery module turns the document information directly into an e-mail and sends it, which greatly improves efficiency and reduces labor.
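The control flow of steps two to five, including the return from step four to step three after manual correction, can be sketched as follows. Each module is passed in as a plain callable so the sketch stays independent of any concrete recognizer. All function and parameter names are hypothetical, and the `max_rounds` cap is an added safeguard that the method itself does not mention (it simply loops until verification passes).

```python
def run_document(image, optimize, recognize, verify, correct, deliver,
                 verified_store, max_rounds=3):
    """Hypothetical driver for steps two to five of the method.

    optimize, recognize, verify, correct and deliver stand in for the data
    processing, identification, verification, manual correction and
    matching-and-delivery stages respectively.
    """
    image = optimize(image)                  # step two: preprocessing and tilt correction
    for _ in range(max_rounds):
        info = recognize(image)              # step three: character recognition
        if verify(info):                     # step four: verification passed
            verified_store.append(info)      # saved to the document information data unit
            return deliver(info)             # step five: match the user and send the e-mail
        # Step four failed: a document processor corrects the flagged image,
        # and recognition is run again on the corrected image (back to step three).
        image = correct(image, info)
    raise RuntimeError("document still fails verification after manual correction")
```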
Further, the optimization in step two includes document image preprocessing, performed as follows: the preprocessing unit first normalizes the document image data to a uniform size, then binarizes the normalized image to obtain a binary image with prominent contours, and finally denoises the binary image using a combination of erosion and dilation operators.
Normalization unifies the size of the document images, which makes it easier to locate image regions and identify useful information. Binarization renders the image in black and white only; applied to the document image data, it highlights the outlines of the targets to be recognized, compresses the amount of image information, and speeds up subsequent recognition. Denoising mainly addresses background noise introduced when the image is scanned, eliminating strong noise while preserving character features as far as possible.
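Size normalization can be illustrated with a minimal nearest-neighbour resize of a grayscale image stored as a list of rows. This is an assumption for illustration only: the method names neither the interpolation scheme nor the target size, and a production system would more likely call an image library such as OpenCV's `cv2.resize`.

```python
def normalize_size(image, out_h, out_w):
    """Nearest-neighbour resize of a grayscale image (list of rows) to out_h x out_w.

    Each output pixel copies the input pixel whose index is scaled
    proportionally; the target size is a hypothetical parameter.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]
```

For example, `normalize_size([[1, 2], [3, 4]], 4, 4)` enlarges each pixel into a 2 x 2 block, giving `[[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]`.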
Further, the optimization in step two also includes tilt correction of the text in the document image data, performed as follows: the tilt correction unit detects the region of the document image data that contains text and passes it to the preprocessing unit, which produces a binary image of the text region and returns it to the tilt correction unit. The tilt correction unit extracts the coordinates of all pixels with non-zero gray values in that binary image and computes the direction of the text tilt; at the same time, it obtains the minimum-area rectangle enclosing the text region with the minAreaRect function, which gives the center point of the text region. Finally, it applies an affine transformation to the text region of the document image data about that center point to obtain the corrected document image data.
Correcting the tilt of the text improves the accuracy of the subsequent character segmentation.
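The sketch below estimates the skew angle from the coordinates of the non-zero pixels and builds the affine rotation about the text center. Note that it swaps in a different technique from the one described: instead of `minAreaRect`, it takes the principal axis of the foreground pixel cloud and uses the centroid as the center, which keeps the example dependency-free. The rotation matrix follows the same formula as OpenCV's `cv2.getRotationMatrix2D`; in practice the matrix would be applied to the whole image with `cv2.warpAffine`. The function names are hypothetical.

```python
import math


def skew_angle(binary):
    """Estimate text skew in degrees from the non-zero pixels of a binary image.

    Uses the principal axis of the foreground coordinates (a stand-in for the
    minAreaRect angle) and returns (angle, centroid). Image coordinates are
    used, so y grows downward and a positive angle means text sloping down
    to the right.
    """
    pts = [(x, y) for y, row in enumerate(binary) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("no foreground pixels")
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / n
    syy = sum((y - my) ** 2 for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts) / n
    angle = 0.5 * math.degrees(math.atan2(2 * sxy, sxx - syy))
    return angle, (mx, my)


def rotation_matrix(center, angle_deg, scale=1.0):
    """2x3 affine matrix, same formula as OpenCV's getRotationMatrix2D."""
    cx, cy = center
    a = scale * math.cos(math.radians(angle_deg))
    b = scale * math.sin(math.radians(angle_deg))
    return [[a, b, (1 - a) * cx - b * cy],
            [-b, a, b * cx + (1 - a) * cy]]


def apply_affine(m, x, y):
    """Map one point (x, y) through a 2x3 affine matrix."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

For text sloping down to the right the estimated angle is positive, and rotating by that angle (counter-clockwise on screen, because the image y axis points down) brings the text line back to horizontal about its center.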
Further, in step three, before the identification module performs character recognition on the optimized document image data, it extracts and cuts out the characters, as follows: the character segmentation unit computes a horizontal projection of the optimized document image data to find the upper and lower bounds of each text line and cuts the image into lines along those bounds; it then computes a vertical projection of each line to find the left and right boundaries of each character, cuts out each character along those boundaries, and passes all the resulting character images to the character recognition unit.
Recognizing characters one at a time is more efficient, so each character is cut out separately, which effectively reduces the character recognition error rate.
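The projection-based segmentation described above can be sketched in pure Python, assuming a binary image stored as a list of rows with 1 for ink and 0 for background. The function names are hypothetical, and this minimal version ignores characters that touch or (as with many Chinese characters) consist of separate parts, which a real system would have to merge or split.

```python
def runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_characters(binary):
    """Split a binary image (1 = ink) into per-character crops by projection."""
    chars = []
    # Horizontal projection: ink count per row gives the upper/lower bounds of each line.
    for top, bottom in runs([sum(row) for row in binary]):
        line = binary[top:bottom]
        # Vertical projection of the line: ink count per column gives each character's left/right bounds.
        for left, right in runs([sum(col) for col in zip(*line)]):
            chars.append([row[left:right] for row in line])
    return chars
```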
Further, in step three, character recognition classifies the characters with a Fast R-CNN algorithm to obtain the document information data.
Further, in step four the document information data is verified as follows: the verification module retrieves the regular expressions from the rule standard information unit and checks the document information data against them, then retrieves the user information structure from the same unit and checks the document information data against that structure. The document information data passes verification only if it matches both the regular expressions and the user information structure; otherwise, verification fails.
Checking the recognized user information against the structure ensures that no information is missing, and checking the document information data against the regular expressions confirms that it meets the requirements. Together, these checks further reduce the document error rate and allow erroneous documents to be corrected promptly.
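A minimal sketch of the two checks in step four follows: a tuple of required fields stands in for the user information structure, and per-field regular expressions stand in for the rules held in the rule standard information unit. The field names and patterns are hypothetical examples, not taken from the patent.

```python
import re

# Hypothetical rule standards: field name -> regular expression the value must fully match.
RULES = {
    "account_no": r"\d{10}",
    "amount": r"\d+\.\d{2}",
}
# Hypothetical user information structure: fields every document must contain.
REQUIRED_FIELDS = ("account_no", "name", "amount")


def verify(document):
    """Return (passed, problems) for one recognized document (dict of field -> text)."""
    # Structure check: every required field must be present and non-empty.
    problems = [f"missing {f}" for f in REQUIRED_FIELDS if not document.get(f)]
    # Rule check: every present field that has a rule must match it completely.
    for field_name, pattern in RULES.items():
        value = document.get(field_name)
        if value is not None and not re.fullmatch(pattern, value):
            problems.append(f"bad {field_name}: {value!r}")
    return not problems, problems
```

Returning the list of problems, not just a boolean, gives the document processor a starting point when a flagged document is corrected by hand.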
The invention has the following beneficial effects:
Information is recognized from captured images of the electric charge rechecking document rather than entered by hand, which works around the physical isolation of the internal and external networks, speeds up data entry, and lowers the document error rate. Preprocessing and tilt correction of the original document image data improve recognition accuracy. Verifying the recognized information before matching ensures that the user information on the document is correct and can be matched with the user information in the database, so documents are not sent to the wrong user and the accuracy of the recognized information is ensured. Finally, mail is sent automatically, without manual data entry or mail composition, which reduces staff workload and speeds up document delivery.
Drawings
FIG. 1 is a schematic diagram of an embodiment of the present invention;
FIG. 2 is a schematic flow diagram of the present invention;
FIG. 3 is a first interface diagram of the system for automatically identifying, sorting and delivering electric charge rechecking documents according to an embodiment of the invention;
FIG. 4 is a second interface diagram of the system for automatically identifying, sorting and delivering electric charge rechecking documents according to an embodiment of the invention;
FIG. 5 is a third interface diagram of the system for automatically identifying, sorting and delivering electric charge rechecking documents according to an embodiment of the invention;
Wherein: 1. data acquisition module; 2. data processing module; 2-1. preprocessing unit; 2-2. tilt correction unit; 3. identification module; 3-1. character segmentation unit; 3-2. character recognition unit; 4. verification module; 5. matching and delivery module; 6. database; 6-1. image data unit; 6-2. document information data unit; 6-3. user information unit; 6-4. rule standard information unit.
Detailed Description
The invention is further described below with reference to the figures and examples.
Example:
A system for automatically identifying, sorting and delivering an electric charge rechecking document comprises a data acquisition module 1, a data processing module 2, an identification module 3, a verification module 4, a matching and delivery module 5, and a database 6. The data acquisition module 1, data processing module 2, verification module 4, and matching and delivery module 5 are all connected to the database 6. The data acquisition module 1 acquires document image data and stores it in the database 6. The data processing module 2 optimizes the document image data and is also connected to the identification module 3, which extracts document information data from the optimized document image data. The verification module 4 is connected to the identification module 3 and verifies the document information data. The matching and delivery module 5 matches the document information data against the user information in the database 6, and constructs and sends an e-mail according to the matching result.
The data acquisition module 1 supports image capture by mobile phone camera, PC camera, scanner, album upload, or system download. The default acquisition mode is batch capture with the PC camera; a high-definition camera produces clear images, which facilitates subsequent image processing and character recognition.
The database 6 comprises an image data unit 6-1, a document information data unit 6-2, a user information unit 6-3, and a rule standard information unit 6-4. The image data unit 6-1 is connected to the data acquisition module 1 and stores the document image data it acquires; the document information data unit 6-2 is connected to the verification module 4 and stores the document information data that has passed verification; the user information unit 6-3 is connected to the matching and delivery module 5 and supplies the user information it needs for matching; the rule standard information unit 6-4 is connected to the verification module 4 and supplies the judgment rules and standards that the verification module 4 applies.
The collected document image data is saved to a fixed storage path, which can be changed at any time.
The data processing module 2 comprises a preprocessing unit 2-1 and a tilt correction unit 2-2. The preprocessing unit 2-1 is connected to the image data unit 6-1; the tilt correction unit 2-2 is connected to the preprocessing unit 2-1 and also to the identification module 3. The preprocessing unit 2-1 and the tilt correction unit 2-2 optimize the document image data acquired by the data acquisition module 1.
The identification module 3 comprises a character segmentation unit 3-1 and a character recognition unit 3-2. The character segmentation unit 3-1 is connected to the data processing module 2; it extracts the characters in the document image data and cuts them into single-character text images. The character recognition unit 3-2 is connected to the character segmentation unit 3-1 and to the verification module 4; it recognizes the text images and passes the resulting document information data to the verification module 4.
A document contains a large amount of information, including digits, Chinese characters, and table structures, so the characters are segmented before the document information is recognized and entered; recognition is more accurate on single-character images.
A method for automatically identifying, sorting and delivering an electric charge rechecking document, applicable to the above system, comprises the following steps, as shown in FIG. 2:
Step one: the data acquisition module 1 acquires document image data and stores it in the image data unit 6-1 of the database 6;
Step two: the data processing module 2 retrieves the document image data from the image data unit 6-1, optimizes it, and passes the processed document image data to the identification module 3;
Step three: the identification module 3 performs character recognition on the optimized document image data and extracts the document information data;
Step four: the identification module 3 sends the document information data to the verification module 4, which verifies it. If the document information data passes verification, it is stored in the document information data unit 6-2 of the database 6 and step five is executed. If it fails verification, the failing document information data and the corresponding document image data are flagged, a document processor corrects the flagged document image data and imports the corrected image into the identification module 3, and the method returns to step three;
Step five: the matching and delivery module 5 matches the document information data in the document information data unit 6-2 against the user information data in the user information unit 6-3, and then delivers the document information data to the corresponding user by e-mail according to the matching result.
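The matching in step five can be sketched as a dictionary lookup on a shared key. The method does not say which field links a document to a user; the account number used here, and the function name, are assumptions. Unmatched documents are returned separately so they can be reviewed rather than silently dropped.

```python
def match_users(documents, users, key="account_no"):
    """Pair each verified document with its user's e-mail address.

    documents: list of dicts of recognized fields.
    users: dict mapping the (hypothetical) key value to a user record with an "email" entry.
    Returns (matched, unmatched), where matched is a list of (email, document) pairs.
    """
    matched, unmatched = [], []
    for doc in documents:
        user = users.get(doc.get(key))
        if user and user.get("email"):
            matched.append((user["email"], doc))
        else:
            unmatched.append(doc)
    return matched, unmatched
```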
Before the mail is sent, the mailbox address in the matched user information is passed to the smtplib module as the recipient, and the MIME module is used to build the structure of the e-mail, such as the header, sender, recipient, and body. The MIMEMultipart class is then used to add an attachment: the document information data is read automatically and attached to the mail. Once the mail has been constructed, the smtplib module automatically connects to a preset mail server and sends the mail to the specified mailbox.
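The e-mail construction described above can be sketched with Python's standard `email` and `smtplib` packages. The addresses, server host, port, SSL connection, and login step are assumptions about a typical mail server, not details from the patent; `build_mail` is kept separate from `send_mail` so the message can be inspected without a network connection.

```python
import smtplib
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mail(sender, recipient, subject, body, attachment_bytes, attachment_name):
    """Assemble a multipart message with a text body and one attached document file."""
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["To"] = recipient          # address taken from the matched user information
    msg["Subject"] = subject
    msg.attach(MIMEText(body, "plain", "utf-8"))
    part = MIMEApplication(attachment_bytes)   # document information data as a file
    part.add_header("Content-Disposition", "attachment", filename=attachment_name)
    msg.attach(part)
    return msg


def send_mail(msg, host, port, user, password):
    """Connect to a preset SMTP server over SSL (assumed) and send the message."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(user, password)
        server.send_message(msg)   # reads sender and recipients from the headers
```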
In step two, the optimization includes document image preprocessing, performed as follows: the preprocessing unit 2-1 first normalizes the document image data to a uniform size, then binarizes the normalized image to obtain a binary image with prominent contours, and finally denoises the binary image using a combination of erosion and dilation operators.
Binarization sets the gray value of every pixel to 0 or 255, that is, renders the image in black and white only. It highlights the outline of the recognition target, compresses the information content of the image, and improves processing efficiency. Binarization here uses a locally adaptive thresholding method, which avoids global binarization's inability to cope with shadows and uneven illumination.
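A locally adaptive threshold compares each pixel with the mean of its own neighbourhood rather than with one global value. The pure-Python sketch below mirrors the mean variant of OpenCV's `cv2.adaptiveThreshold` (a pixel becomes 255 if it exceeds the local mean minus a constant `c`). The block size and `c` are hypothetical tuning values, and border handling here simply shrinks the window at the image edges.

```python
def adaptive_threshold(gray, block=3, c=0):
    """Binarize a grayscale image (list of rows, 0-255) with a local-mean threshold.

    A pixel becomes 255 if it is brighter than (mean of its block x block
    neighbourhood - c), otherwise 0. block should be odd.
    """
    h, w = len(gray), len(gray[0])
    r = block // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Neighbourhood clipped to the image bounds.
            window = [gray[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(window) / len(window)
            row.append(255 if gray[y][x] > mean - c else 0)
        out.append(row)
    return out
```

For example, on the row `[200, 50, 200, 100, 20, 100]` with `block=3` and `c=10`, the function returns `[255, 0, 255, 255, 0, 255]`: the 100-valued background in the darker right half stays white because it is brighter than its own neighbourhood, whereas a single global threshold of 120 would turn it black.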
Denoising mainly addresses background noise introduced during scanning, which tends to appear in regions densely packed with characters. Combining the erosion and dilation operators removes strong noise while preserving character features as far as possible, which meets the goal of bill image optimization. The erosion operator removes boundary points of the binary image and eliminates objects smaller than the structuring element. The dilation operator does the opposite: it adds boundary points and merges background points that touch an object into the object, so that its boundary grows outward. The binary document image is therefore processed with both operators. With a suitably sized structuring element, denoising is achieved either by dilating and then eroding (closing) or by eroding and then dilating (opening).
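The erosion and dilation steps, and the two orders in which they are combined, can be sketched on a binary image as follows. The square k × k structuring element is an assumption, since the source only asks for "a reasonable size". Pixels outside the image are ignored, which is also why erosion does not eat into objects at the border.

```python
from typing import List

Image = List[List[int]]  # binary image: 255 = foreground (ink), 0 = background


def _morph(img: Image, k: int, is_erosion: bool) -> Image:
    """Slide a k x k square structuring element over img. Erosion keeps a pixel only
    if every in-bounds pixel under the element is foreground; dilation sets it if any is."""
    h, w, r = len(img), len(img[0]), k // 2
    combine = all if is_erosion else any
    return [
        [255 if combine(img[j][i] == 255
                        for j in range(max(0, y - r), min(h, y + r + 1))
                        for i in range(max(0, x - r), min(w, x + r + 1))) else 0
         for x in range(w)]
        for y in range(h)
    ]


def erode(img: Image, k: int = 3) -> Image:
    return _morph(img, k, is_erosion=True)


def dilate(img: Image, k: int = 3) -> Image:
    return _morph(img, k, is_erosion=False)


def opening(img: Image, k: int = 3) -> Image:
    """Erode, then dilate: removes specks smaller than the structuring element."""
    return dilate(erode(img, k), k)


def closing(img: Image, k: int = 3) -> Image:
    """Dilate, then erode: fills small gaps and holes inside strokes."""
    return erode(dilate(img, k), k)
```

Opening suits isolated noise dots around characters. Closing suits broken strokes. Which order works better depends on the dominant noise in the scans.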
In step two, the optimization processing further includes tilt correction of the text in the document image data, which proceeds as follows. The tilt correction unit 2-2 detects the region of the document image that contains text and passes it to the preprocessing unit 2-1. The preprocessing unit produces a binary image of the text region and returns it to the tilt correction unit 2-2. The tilt correction unit then extracts the coordinates of all pixels with non-zero gray values in that binary image and computes the direction of the text tilt. At the same time, it obtains the minimum-area rectangle enclosing the text region with the minAreaRect function, which gives the center point of the text region. Finally, it applies an affine transformation to the text region of the document image data about that center point to obtain the corrected document image data.
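The following sketch shows the angle-estimation and affine-transformation steps without OpenCV. In place of minAreaRect, it estimates the tilt from the principal axis (direction of largest spread) of the non-zero pixel coordinates and uses their centroid as the center point. This is a substituted technique that assumes the text region is wider than it is tall. The resulting 2 × 3 matrix has the same layout that `cv2.getRotationMatrix2D(center, angle_in_degrees, 1.0)` returns, so it could be passed to `cv2.warpAffine` to rotate the whole image.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y): x to the right, y downwards, as in image coordinates


def estimate_skew(points: Sequence[Point]) -> Tuple[float, Point]:
    """Return (tilt angle in radians, centroid) of the foreground pixel coordinates,
    taking the principal axis of the points as the text direction."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return theta, (cx, cy)


def deskew_matrix(theta: float, center: Point) -> List[List[float]]:
    """2x3 affine matrix that undoes a tilt of theta about center; same layout as
    cv2.getRotationMatrix2D(center, math.degrees(theta), 1.0)."""
    a, b = math.cos(theta), math.sin(theta)
    cx, cy = center
    return [[a, b, (1 - a) * cx - b * cy],
            [-b, a, b * cx + (1 - a) * cy]]


def apply_affine(m: List[List[float]], p: Point) -> Point:
    """Map one point through the 2x3 affine matrix m."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

Pixels lying along a baseline tilted by 10° are mapped back onto a horizontal line through the center point.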
In step three, before the recognition module 3 performs character recognition on the optimized document image data, it first extracts and segments the characters in that data, as follows. The character segmentation unit 3-1 projects the optimized document image data horizontally to obtain the upper and lower bounds of each text line, and cuts the image into lines accordingly. It then projects each line vertically to obtain the left and right bounds of each character, cuts out the individual characters, and passes all of them to the character recognition unit 3-2.
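The two-pass projection cut can be sketched on a binarized image (non-zero = ink) as follows. This is a minimal illustration of the procedure described above. It assumes characters are separated by at least one blank column, so touching characters would come out as a single piece.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # binarized image: non-zero = ink


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return the [start, end) ranges where the projection profile is non-zero."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, v in enumerate(profile + [0]):  # trailing 0 closes a run at the edge
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    return spans


def segment_characters(img: Image) -> List[Image]:
    """Cut text lines with a horizontal projection, then characters with a vertical one."""
    chars: List[Image] = []
    row_profile = [sum(1 for v in row if v) for row in img]
    for top, bottom in runs(row_profile):  # upper and lower bound of each line
        line = img[top:bottom]
        col_profile = [sum(1 for row in line if row[x]) for x in range(len(line[0]))]
        for left, right in runs(col_profile):  # left and right bound of each character
            chars.append([row[left:right] for row in line])
    return chars
```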
In step three, character recognition classifies the characters with the Fast R-CNN algorithm to obtain the document information data.
The Fast R-CNN algorithm is divided into several processing modules: an input module, a convolutional layer, a rectified linear (ReLU) layer, a pooling layer, a fully connected layer and a classification-regression module. First, features are extracted from the input document image data by the convolution kernels of the convolutional layer. The convolutional layer contains several convolution kernels. Each element of a kernel has a weight coefficient and a bias, analogous to a neuron in a feedforward neural network. In operation, a kernel slides over the input features at a regular stride, multiplies them element-wise, sums the products and adds the bias. Here, kernels of size 3 × 3 are used, and features are extracted with 6 kernels that have different weight coefficients. After feature extraction, the ReLU layer, which uses the ReLU activation function, further locates positive-sample detection boxes through detection-box regression. These are boxes that contain the characters to be detected and classified, which further reduces interference from the background. The pooling layer then combines the extracted feature map with the candidate regions framed by the positive-sample detection boxes to produce candidate feature maps, which the subsequent fully connected and classification-regression layers classify. A 2 × 2 pooling window is used. The pooling result is a weighted average whose weighting coefficients are learned iteratively by gradient descent.
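The convolution, ReLU and 2 × 2 weighted-average pooling described above can be sketched on plain nested lists. This is an illustration of those three operations only. It does not implement the detection-box regression or region-proposal parts of Fast R-CNN. Normalizing the pooling weights by their sum is an assumption, as are the specific kernel and weight values, which a trained network would learn.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(x: Matrix, kernel: Matrix, bias: float) -> Matrix:
    """'Valid' convolution as used in CNN layers (cross-correlation): slide the kernel,
    multiply element-wise, sum the products and add the bias."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [sum(kernel[i][j] * x[r + i][c + j] for i in range(kh) for j in range(kw)) + bias
         for c in range(len(x[0]) - kw + 1)]
        for r in range(len(x) - kh + 1)
    ]


def relu(x: Matrix) -> Matrix:
    """Rectified linear activation: negative responses become 0."""
    return [[max(0.0, v) for v in row] for row in x]


def weighted_pool2x2(x: Matrix, w: Matrix) -> Matrix:
    """Non-overlapping 2x2 pooling as a weighted average with weights w (assumed to
    have a positive sum); in training, w would be updated by gradient descent."""
    total = sum(sum(row) for row in w)
    return [
        [sum(w[i][j] * x[r + i][c + j] for i in range(2) for j in range(2)) / total
         for c in range(0, len(x[0]) - 1, 2)]
        for r in range(0, len(x) - 1, 2)
    ]
```

With six 3 × 3 kernels, as in the source, one layer would produce six feature maps, for example `[weighted_pool2x2(relu(conv2d(img, k, b)), w) for k, b in kernels]`, where `kernels` is a hypothetical list of (kernel, bias) pairs.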
The pooling layer aggregates statistics over local regions of the image. This greatly reduces the dimensionality of the feature vectors and the computation required by the subsequent fully connected and classification-regression layers. These two layers classify the candidate feature maps to obtain the probability of each class. They also compute the final, precise position of the detection box from the original detection-target position, which yields the recognition result for each character.
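The classification step ("the probability of a certain class") can be sketched as a fully connected layer followed by softmax. This is a minimal sketch. Softmax is an assumption, since it is the usual choice but the source does not name the function, and the box-position refinement is not shown. The weights and labels are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fully_connected(features: Sequence[float], weights: List[List[float]],
                    biases: Sequence[float]) -> List[float]:
    """One score per class: dot product of the flattened features with each weight row."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn class scores into probabilities (shifted by the max for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features: Sequence[float], weights: List[List[float]],
             biases: Sequence[float], labels: Sequence[str]) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    probs = softmax(fully_connected(features, weights, biases))
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```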
Document information data is verified in step four as follows. The verification module 4 retrieves the regular expressions from the rule standard information unit 6-4 and checks the document information data against them. It also retrieves the user information structure from the same unit and checks the document information data against that structure. The document information data passes verification only if it is consistent with both the regular expressions and the user information structure; otherwise, verification fails.
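The two-part check can be sketched with Python's `re` module. The source does not publish its rules, so the field names, the 10-digit user number pattern and the amount pattern below are hypothetical stand-ins for the contents of the rule standard information unit 6-4. The "user information structure" is modelled here as the exact set of expected fields.

```python
import re
from typing import Dict, Mapping

# Hypothetical rules standing in for the rule standard information unit (6-4).
FIELD_PATTERNS: Dict[str, str] = {
    "user_number": r"\d{10}",      # assumed: 10-digit user number
    "amount": r"\d+(\.\d{1,2})?",  # assumed: amount with up to two decimals
}
REQUIRED_FIELDS = frozenset(FIELD_PATTERNS)  # assumed user information structure


def verify(document: Mapping[str, str]) -> bool:
    """Pass only if the record has exactly the expected fields (structure check)
    and every value fully matches its regular expression (format check)."""
    if set(document) != REQUIRED_FIELDS:
        return False
    return all(re.fullmatch(FIELD_PATTERNS[k], v) is not None for k, v in document.items())
```

A typical OCR error such as reading the digit 0 as the letter O then fails the format check, and the document is routed back for manual correction as in step four.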
The implementation interface of the system for automatically identifying, sorting and delivering electric charge rechecking documents is divided into three parts. As shown in Fig. 3, the first part covers user number recognition and mail sending. It provides three functions: importing user mailbox addresses from Excel, recognizing user numbers, and sending the images to the users' mailboxes. As shown in Fig. 4, the second part is a query of mail-sending records, which can be filtered by user number or start date; the records can also be exported. As shown in Fig. 5, the third part is an operation log that shows the status of the overall workflow, so that users can quickly see the recognition result of each document and troubleshoot problems promptly. This interface supports the automatic recognition and delivery of electric charge rechecking documents and the confirmation of sending status, and it is simple to understand and operate.
In this embodiment, electricity customers in a certain city were used as the test subjects, and 200 of their electric charge rechecking documents were used as test samples. The 200 documents were scanned, saved in JPG format, and processed by the automatic document identification, sorting and delivery system of the invention. The test evaluates two aspects: whether user numbers can be identified accurately in batches, and whether the documents are then delivered. The results are measured by the recognition rate and the delivery rate, defined as follows:
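$$\text{Recognition rate} = \frac{\text{number of documents recognized successfully}}{\text{total number of documents}} \times 100\%$$
$$\text{Delivery rate} = \frac{\text{number of documents delivered successfully}}{\text{number of documents recognized successfully}} \times 100\%$$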
The test results are shown in Table 1.
Table 1: Test results
Test item        Successes   Total   Rate
Identification   194         200     97%
Delivery         194         194     100%
Processing all 200 documents took 2070 seconds in total, or 10.35 seconds per document on average. Manual processing of a single document takes about 36 seconds on average, so the processing time per document is reduced by about 71.25 percent compared with manual processing. These results indicate that the method can effectively improve bill processing efficiency while maintaining high recognition accuracy.
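The reported figures can be re-derived from the raw counts and times in the text and Table 1. The following short check uses only those numbers. The variable names are illustrative.

```python
def rate(successes: int, total: int) -> float:
    """Fraction of successes, as in the recognition-rate and delivery-rate definitions."""
    return successes / total


recognition_rate = rate(194, 200)  # documents recognized / documents scanned
delivery_rate = rate(194, 194)     # documents delivered / documents recognized
avg_seconds = 2070 / 200           # automated processing time per document
manual_seconds = 36                # reported manual processing time per document
time_saved = (manual_seconds - avg_seconds) / manual_seconds

print(f"{recognition_rate:.0%} {delivery_rate:.0%} {avg_seconds:.2f}s {time_saved:.2%}")
# prints: 97% 100% 10.35s 71.25%
```

The 71.25 percent figure is therefore the relative reduction in per-document processing time, (36 − 10.35) / 36.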
The above-described embodiments are only preferred embodiments of the present invention, and are not intended to limit the present invention in any way, and other variations and modifications may be made without departing from the spirit of the invention as set forth in the claims.

Claims (10)

1. A system for automatically identifying, sorting and delivering electric charge rechecking documents, characterized by comprising a data acquisition module (1), a data processing module (2), a recognition module (3), a verification module (4), a matching and delivery module (5) and a database (6), wherein the data acquisition module (1), the data processing module (2), the verification module (4) and the matching and delivery module (5) are all connected to the database (6); the data acquisition module (1) is used to acquire document image data and store it in the database (6); the data processing module (2) is used to optimize the document image data; the data processing module (2) is also connected to the recognition module (3), which is used to obtain document information data from the optimized document image data; the verification module (4) is connected to the recognition module (3) and is used to verify the document information data; the matching and delivery module (5) is used to match the document information data with user information, and also to construct and send e-mails according to the matching result.
2. The system for automatically identifying, sorting and delivering electric charge rechecking documents according to claim 1, wherein the database (6) comprises an image data unit (6-1), a document information data unit (6-2), a user information unit (6-3) and a rule standard information unit (6-4); the image data unit (6-1) is connected to the data acquisition module (1) and stores the document image data acquired by the data acquisition module (1); the document information data unit (6-2) is connected to the verification module (4) and stores the document information data that has passed verification by the verification module (4); the user information unit (6-3) is connected to the matching and delivery module (5) and provides the user information that the matching and delivery module (5) needs for matching; the rule standard information unit (6-4) is connected to the verification module (4) and provides judgment rules and standards to the verification module (4).
3. The system for automatically identifying, sorting and delivering electric charge rechecking documents according to claim 1, wherein the data processing module (2) comprises a preprocessing unit (2-1) and a tilt correction unit (2-2); the preprocessing unit (2-1) is connected to the image data unit (6-1); the tilt correction unit (2-2) is connected to the preprocessing unit (2-1) and also to the recognition module (3); and the preprocessing unit (2-1) and the tilt correction unit (2-2) are used to optimize the document image data acquired by the data acquisition module (1).
4. The system for automatically identifying, sorting and delivering electric charge rechecking documents according to claim 1, characterized in that the recognition module (3) comprises a character segmentation unit (3-1) and a character recognition unit (3-2); the character segmentation unit (3-1) is connected to the data processing module (2) and is used to extract the characters in the document image data and cut them into single-character images; the character recognition unit (3-2) is connected to the character segmentation unit (3-1) and to the verification module (4), and is used to recognize the character images and transmit the resulting document information data to the verification module (4).
5. A method for automatically identifying, sorting and delivering electric charge rechecking documents, applicable to a system for automatically identifying, sorting and delivering electric charge rechecking documents, characterized by comprising the following steps:
step one: the data acquisition module (1) acquires document image data and stores it in the image data unit (6-1) of the database (6);
step two: the data processing module (2) retrieves the document image data from the image data unit (6-1), optimizes it, and transmits the processed document image data to the recognition module (3);
step three: the recognition module (3) performs character recognition on the optimized document image data to extract document information data;
step four: the recognition module (3) sends the document information data to the verification module (4), which verifies it; if the document information data passes verification, it is stored in the document information data unit (6-2) of the database (6) and step five is executed; if it fails verification, the failed document information data and the corresponding document image data are marked, document processing staff correct the marked document image data, the corrected document image data is imported into the recognition module (3), and the method returns to step three;
and step five: the matching and delivery module (5) matches the document information data in the document information data unit (6-2) with the user information data in the user information unit (6-3), and then delivers the document information data to the corresponding users by e-mail according to the matching result.
6. The method for automatically identifying, sorting and delivering electric charge rechecking documents according to claim 5, wherein the optimization processing in step two includes preprocessing of the document image data, as follows: the preprocessing unit (2-1) first normalizes the document image data to a uniform size, then binarizes the normalized image to obtain a binary image in which outlines stand out, and then denoises the binary image with a combination of the erosion and dilation operators.
7. The method for automatically identifying, sorting and delivering electric charge rechecking documents according to claim 5, wherein the optimization processing in step two further includes tilt correction of the text in the document image data, as follows: the tilt correction unit (2-2) detects the region of the document image data that contains text and inputs it to the preprocessing unit (2-1), which produces a binary image of the text region and returns it to the tilt correction unit (2-2); the tilt correction unit (2-2) extracts the coordinates of all pixels with non-zero gray values in the binary image, computes the direction of the text tilt, obtains the minimum rectangle enclosing the text region with the minAreaRect function and thereby the center point of the text region, and then applies an affine transformation to the text region of the document image data about that center point to obtain corrected document image data.
8. The method for automatically identifying, sorting and delivering the electric charge rechecking document according to claim 5, wherein before the character recognition is performed on the optimized document image data through the recognition module (3) in the third step, the recognition module (3) further extracts and cuts the characters in the optimized document image data, and the specific processes of the extraction and the cutting are as follows: the character segmentation unit (3-1) horizontally projects the document image data after optimization processing, so that an upper limit and a lower limit of each line of document image data are obtained, then cutting is carried out according to the upper limit and the lower limit of each line of document image data, vertical projection is carried out on each line of cut document image data, a left boundary and a right boundary of each character in the document image data are obtained, finally cutting is carried out according to the left boundary and the right boundary of each character in the document image data, and all characters obtained through cutting are transmitted to the character recognition unit (3-2).
9. The method for automatically identifying, sorting and delivering electric charge rechecking documents according to claim 5, wherein the character recognition in step three classifies and recognizes the characters with the Fast R-CNN algorithm to obtain document information data.
10. The method for automatically identifying, sorting and delivering electric charge rechecking documents according to claim 5, wherein the document information data is verified in step four as follows: the verification module (4) retrieves the regular expressions from the rule standard information unit (6-4) and compares the document information data with them; the verification module (4) also retrieves the user information structure from the rule standard information unit (6-4) and compares the document information data with it; the document information data passes verification when it is consistent with both the regular expressions and the user information structure; otherwise, verification fails.
CN202110163916.9A 2021-02-05 2021-02-05 System and method for automatically identifying, sorting and delivering electric charge rechecking document Pending CN113011407A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110163916.9A CN113011407A (en) 2021-02-05 2021-02-05 System and method for automatically identifying, sorting and delivering electric charge rechecking document

Publications (1)

Publication Number Publication Date
CN113011407A true CN113011407A (en) 2021-06-22

Family

ID=76385526

Country Status (1)

Country Link
CN (1) CN113011407A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105654072A (en) * 2016-03-24 2016-06-08 哈尔滨工业大学 Automatic character extraction and recognition system and method for low-resolution medical bill image
CN109087194A (en) * 2018-08-13 2018-12-25 平安普惠企业管理有限公司 Invoice checking method, device, computer equipment and storage medium
CN109492641A (en) * 2018-09-18 2019-03-19 上海延华智能科技(集团)股份有限公司 Energy bills input method and system, storage medium, server based on image recognition
CN109657665A (en) * 2018-10-31 2019-04-19 广东工业大学 A kind of invoice batch automatic recognition system based on deep learning
CN109784341A (en) * 2018-12-25 2019-05-21 华南理工大学 A kind of medical document recognition methods based on LSTM neural network
CN109840519A (en) * 2019-01-25 2019-06-04 青岛盈智科技有限公司 A kind of adaptive intelligent form recognition input device and its application method
CN110705488A (en) * 2019-10-09 2020-01-17 广州医药信息科技有限公司 Image character recognition method
US20200143349A1 (en) * 2018-11-02 2020-05-07 Royal Bank Of Canada System and method for auto-populating electronic transaction process
CN111966640A (en) * 2020-09-03 2020-11-20 深圳市小满科技有限公司 Document file identification method and system
CN112256723A (en) * 2020-09-02 2021-01-22 中山大学 Intelligent management system and retrieval method for no-mail based on deep learning image-text recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210622
