CN111861731A - Post-credit check system and method based on OCR - Google Patents
- Publication number: CN111861731A
- Application number: CN202010758400.4A
- Authority
- CN
- China
- Prior art keywords: text, error, error correction, recognition, word
- Prior art date
- Legal status: Pending (assumed by Google; not a legal conclusion)
Classifications
- G06Q40/03: Credit; Loans; Processing thereof
- G06F18/253: Fusion techniques of extracted features
- G06F40/232: Orthographic correction, e.g. spell checking
- G06F40/284: Lexical analysis, e.g. tokenisation or collocates
- G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
- G06V20/62: Text, e.g. of license plates, overlay texts or captions on TV images
- G06V30/413: Classification of content, e.g. text, photographs or tables
- G06V30/10: Character recognition
Abstract
The invention relates to the field of computer technology, and in particular to an OCR-based post-loan inspection method comprising the following steps: S1, photograph the photocopies of the customer's documents and extract a text picture; S2, input the text picture into a pre-trained text correction network model; S3, correct the text picture together with the classification saliency map using a strip region transformation algorithm; S4, output a recognized text using optical character recognition; S5, determine the error-correction words for the incorrectly recognized word segments in the recognized text; S6, determine an error-correction confidence for each error-correction candidate text with a neural network model, and take the candidate text whose confidence exceeds a first threshold as the error-correction text; S7, compare the error-correction text against the stored information, judge whether the comparison is consistent, and prompt a manual recheck if it is not. The invention solves the technical problem that OCR alone cannot accurately recognize and capture content from photocopies of poor quality.
Description
Technical Field
The invention relates to the technical field of computers, in particular to an OCR-based post-loan inspection system and method.
Background
At present, when a loan application is filed, bank staff must review the authenticity of the material provided by the borrower. In most cases this review is manual: staff check the borrower's materials one by one to determine whether the borrower meets the bank's lending conditions. Manual auditing, however, is inefficient and labor-intensive, and when the workload is heavy, long stretches of repetitive work make staff prone to errors.
In view of the above, document CN109697665A discloses a loan auditing method, device, equipment, and medium based on artificial intelligence, which includes: obtaining a loan application request comprising an identity card image and the user's personal information; recognizing and verifying the identity card image with OCR to obtain the user's basic information; querying a third-party credit reporting platform based on that information to obtain the user's credit score; forming a voice question from the user's personal information, broadcasting it, and starting a camera to record a monitoring video; calling a pre-established micro-expression recognition model to analyze the video and obtain a micro-expression detection result; and producing a loan auditing result from the micro-expression detection result and the user's credit score. The auditing process requires no manual intervention, can intelligently audit the authenticity of the borrower's information, and effectively improves the efficiency of loan auditing.
Similarly, to address the problem that post-loan inspection consumes excessive manpower, optical character recognition (OCR) can be introduced to automatically recognize and capture the content of the business-related customer data stored in the post-loan system, and the content to be inspected can be automatically compared and checked against preset rules, achieving automatic post-loan inspection and greatly reducing the labor currently invested in the post-loan process. However, most of a customer's material consists of photocopies of uneven quality, and for photocopies of poor quality, OCR cannot recognize and capture the content accurately. In addition, because of lighting, font color depth, and the shooting angle of the photocopy, OCR has a certain misrecognition rate, and a recognition error does not necessarily mean the underlying data is wrong.
Disclosure of Invention
The invention provides an OCR-based post-loan inspection system and method, which solve the technical problem that OCR alone cannot accurately recognize and capture content from photocopies of poor quality.
The basic scheme provided by the invention is as follows: an OCR-based post-loan inspection method comprising the steps of:
S1, photograph the photocopies of the customer's documents and extract a text picture;
S2, input the text picture into a pre-trained text correction network model and output a character-level classification saliency map;
S3, correct the text picture together with the classification saliency map using a strip region transformation algorithm, and output a corrected picture;
S4, use optical character recognition to convert the printed characters in the corrected picture into a black-and-white dot-matrix image file, convert the characters in the image file into text format, and output a recognized text;
S5, perform association recognition between the recognized text and the business library, and determine the error-correction words for the incorrectly recognized word segments in the recognized text;
S6, replace each erroneous word segment in the recognized text with its error-correction word to obtain error-correction candidate texts, determine an error-correction confidence for each candidate text with a neural network model, and take the candidate text whose confidence exceeds a first threshold as the error-correction text;
S7, compare the error-correction text against the stored information, judge whether the comparison is consistent, and prompt a manual recheck if it is not.
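The S1 to S7 control flow can be sketched in Python as follows. This is only an illustrative sketch: every function name is a hypothetical placeholder (none appears in the patent), and each real step would be a separate trained model or module.

```python
# Hypothetical sketch of the S1-S7 pipeline; names and values are placeholders.
def run_post_loan_check(image, system_record, first_threshold=0.9):
    saliency = character_saliency_map(image)           # S2: text correction network
    rectified = strip_region_rectify(image, saliency)  # S3: strip region transformation
    text = ocr_recognize(rectified)                    # S4: OCR
    candidates = error_correction_candidates(text)     # S5/S6: candidate corrections
    # S6: first candidate whose confidence exceeds the first threshold
    corrected = next((t for t, conf in candidates if conf > first_threshold), text)
    # S7: compare against the record stored in the post-loan system
    return "pass" if corrected == system_record else "manual review"

# Toy stand-ins so the control flow can be exercised end to end.
def character_saliency_map(image): return image
def strip_region_rectify(image, saliency): return image
def ocr_recognize(image): return "Hechuan County"
def error_correction_candidates(text):
    return [("Hechuan District", 0.91), ("Hechuan County", 0.87)]
```

With these stand-ins, a record of "Hechuan District" is marked as passed, while any other record triggers a manual review.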
The working principle and advantages of the invention are as follows:
1. The photocopy of the customer's material need not contain horizontal text; it may contain irregular text such as slanted, curved, or perspective-distorted text. Applying OCR directly to irregular text gives poor results. In this scheme the text picture is corrected by the text correction network model and the strip region transformation algorithm, which improves recognition accuracy and robustness. In addition, correcting errors in the recognized text and comparing information against the corrected error-correction text further ensures that content is recognized and captured accurately.
2. A business whose information comparison is consistent can be marked as passed directly during the post-loan check, and the post-loan data check ends once the relevant information is recorded. For a business whose comparison is inconsistent, post-loan staff are prompted to recheck manually. Because of lighting, font color depth, shooting angle, and similar problems with the photocopy, OCR has a certain misrecognition rate, so an inconsistency does not necessarily mean the underlying data is wrong; post-loan staff retrieve the data and confirm the business data manually, which guarantees the accuracy of the post-loan check.
The invention corrects the text picture with the text correction network model and the strip region transformation algorithm and then corrects the recognized text, thereby solving the technical problem that OCR alone cannot accurately recognize and capture content from photocopies of poor quality.
Further, S2 specifically includes: S21, input the text picture into a fully convolutional neural network to extract features at different scales and depths; S22, fuse these features with a U-shaped network structure to obtain the character-level classification saliency map.
Advantageous effects: extracting features at different scales and depths and then fusing them yields a character-level classification saliency map that contains the dominant scale and depth features, which effectively improves recognition accuracy.
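The S21/S22 data flow can be illustrated with a NumPy-only sketch: "features" at two scales (here just average pooling) are extracted and then fused back at full resolution, the way a U-shaped network merges encoder and decoder features through skip connections. A real system would use a trained fully convolutional network; this only shows the shape of the computation.

```python
import numpy as np

def avg_pool2(x):
    """Downsample by 2 with average pooling (stand-in for the encoder path)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbour upsample by 2 (stand-in for the decoder path)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def u_fuse(img):
    shallow = img                               # fine-scale feature
    deep = avg_pool2(img)                       # coarse-scale feature
    return (shallow + upsample2(deep)) / 2.0    # skip-connection-style fusion

# Fused "saliency" keeps the input resolution, as a U-shaped network does.
feat = u_fuse(np.arange(16, dtype=float).reshape(4, 4))
```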
Further, the training process of the text correction network model comprises the following steps:
A1, initialize the parameters of the text correction network model;
A2, acquire training text pictures and ground-truth classification saliency map labels;
A3, input a training text picture into the text correction network model to obtain a predicted saliency map;
A4, calculate the network loss from the predicted saliency map and the ground-truth labels, then update the network parameters according to the calculated loss;
A5, repeat the above process until a set number of rounds is reached, finish training, and store the trained network parameters.
Advantageous effects: training the text correction network model in this way improves both the speed and the precision of text picture correction.
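The A1 to A5 loop has the standard structure of supervised training. The toy sketch below replaces the saliency-map model and loss with a one-parameter linear regression so only the loop structure itself is shown; nothing here is specific to the patent's model.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal()                      # A1: initialize parameters
x = np.linspace(-1, 1, 32)            # A2: training inputs ...
y = 3.0 * x                           # ... and ground-truth labels

losses = []
for epoch in range(200):              # A5: repeat for a set number of rounds
    pred = w * x                      # A3: forward pass -> prediction
    loss = np.mean((pred - y) ** 2)   # A4: loss between prediction and label
    grad = np.mean(2 * (pred - y) * x)
    w -= 0.1 * grad                   # A4: update parameters from the loss
    losses.append(loss)
# After the loop, w is stored as the trained ("corrected") parameter.
```

The loss decreases monotonically here and the parameter converges to the true value 3.0, mirroring how A4/A5 drive the predicted saliency map toward the ground-truth label.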
Further, before determining the error-correction words for the incorrectly recognized word segments in the recognized text, the method further includes:
determining the recognition confidence of each word segment in the recognized text, and treating any segment whose recognition confidence is below a second threshold as an erroneous segment.
Advantageous effects: by presetting the second threshold, the strictness of the criterion for judging erroneous segments can be controlled indirectly for different situations.
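The thresholding step reduces to a simple filter over (segment, confidence) pairs; the values below are illustrative, not from the patent.

```python
def find_error_segments(segments, second_threshold=0.90):
    """segments: list of (word_segment, recognition_confidence) pairs.
    Any segment below the second threshold is treated as erroneous."""
    return [w for w, conf in segments if conf < second_threshold]

# Illustrative confidences: only "w2" falls below the 90% threshold.
errors = find_error_segments([("w1", 0.95), ("w2", 0.85)])
```

Raising or lowering `second_threshold` directly tightens or loosens the criterion, which is the indirect control described above.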
Further, determining the error-correction words for the incorrectly recognized word segments in the recognized text comprises:
for each erroneous segment in the recognized text, determining its confusable words and the confusability degree of each confusable word;
selecting among the confusable words according to a preset rule based on their confusability degrees, and taking the selection result as the error-correction word for that segment.
Advantageous effects: determining the confusable words for each erroneous segment and then selecting among them according to a preset rule takes the influence of confusable words into account and improves recognition accuracy.
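A minimal sketch of this step, under stated assumptions: the confusable-word table and its degrees are a hypothetical hand-built lookup (the patent says they are collected and set manually in advance), and the "preset rule" assumed here simply picks the candidate with the highest confusability degree.

```python
# Hypothetical confusability table; entries and degrees are illustrative.
CONFUSABLE = {
    "Hechuan County": {"Hechuan District": 0.9, "Heshan District": 0.3},
}

def correction_for(segment):
    candidates = CONFUSABLE.get(segment, {})
    if not candidates:
        return segment                            # nothing known: keep as-is
    return max(candidates, key=candidates.get)    # assumed rule: highest degree

fix = correction_for("Hechuan County")
```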
Based on the OCR-based post-loan inspection method, an OCR-based post-loan inspection system is also disclosed, comprising:
an input module for photographing the photocopies of the customer's documents and extracting a text picture;
a character module for inputting the text picture into a pre-trained text correction network model and outputting a character-level classification saliency map;
a correction module for correcting the text picture together with the classification saliency map using a strip region transformation algorithm and outputting a corrected picture;
a recognition module for converting the printed characters in the corrected picture into a black-and-white dot-matrix image file using optical character recognition, converting the characters in the image file into text format, and outputting a recognized text;
an error correction module for performing association recognition between the recognized text and the business library and determining the error-correction words for the incorrectly recognized word segments in the recognized text;
a text module for replacing each erroneous word segment in the recognized text with its error-correction word to obtain error-correction candidate texts, determining an error-correction confidence for each candidate text with a neural network model, and taking the candidate text whose confidence exceeds a first threshold as the error-correction text;
a comparison module for comparing the error-correction text against the stored information, judging whether the comparison is consistent, and prompting a manual recheck if it is not.
The working principle and advantages are as follows: the text picture is corrected by the text correction network model and the strip region transformation algorithm, which improves recognition accuracy and robustness; correcting errors in the recognized text and comparing information against the corrected error-correction text ensures that content is recognized and captured accurately.
Further, the character module specifically includes an extraction unit for inputting the text picture into a fully convolutional neural network to extract features at different scales and depths, and a generation unit for fusing these features with a U-shaped network structure to obtain the character-level classification saliency map.
Advantageous effects: recognition accuracy is effectively improved.
Further, the training process of the text correction network model comprises the following steps:
A1, initialize the parameters of the text correction network model;
A2, acquire training text pictures and ground-truth classification saliency map labels;
A3, input a training text picture into the text correction network model to obtain a predicted saliency map;
A4, calculate the network loss from the predicted saliency map and the ground-truth labels, then update the network parameters according to the calculated loss;
A5, repeat the above process until a set number of rounds is reached, finish training, and store the trained network parameters.
Advantageous effects: this improves both the speed and the precision of text picture correction.
Further, the error correction module further comprises a confidence unit for determining the recognition confidence of each word segment in the recognized text before the error-correction words are determined, and treating any segment whose recognition confidence is below a second threshold as an erroneous segment.
Advantageous effects: the strictness of the criterion for judging erroneous segments can be controlled indirectly for different situations.
Further, the error correction module also comprises a confusion unit for determining the confusable words for each erroneous segment and their confusability degrees, selecting among the confusable words according to a preset rule based on those degrees, and taking the selection result as the error-correction word for that segment.
Advantageous effects: the influence of confusable words is taken into account, which improves recognition accuracy.
Drawings
FIG. 1 is a block diagram of a system architecture of an embodiment of an OCR-based post-loan inspection system of the present invention.
Detailed Description
The following embodiments describe the invention in further detail:
example 1
An embodiment of the OCR-based post-loan inspection system of the present invention, substantially as shown in FIG. 1, includes:
an input module for photographing the photocopies of the customer's documents and extracting a text picture;
a character module for inputting the text picture into a pre-trained text correction network model and outputting a character-level classification saliency map;
a correction module for correcting the text picture together with the classification saliency map using a strip region transformation algorithm and outputting a corrected picture;
a recognition module for converting the printed characters in the corrected picture into a black-and-white dot-matrix image file using optical character recognition, converting the characters in the image file into text format, and outputting a recognized text;
an error correction module for performing association recognition between the recognized text and the business library and determining the error-correction words for the incorrectly recognized word segments in the recognized text;
a text module for replacing each erroneous word segment in the recognized text with its error-correction word to obtain error-correction candidate texts, determining an error-correction confidence for each candidate text with a neural network model, and taking the candidate text whose confidence exceeds a first threshold as the error-correction text;
a comparison module for comparing the error-correction text against the stored information, judging whether the comparison is consistent, and prompting a manual recheck if it is not.
In this embodiment, the customer-related information in the photocopies required by the post-loan check process needs to be extracted, including the identity card, business license, authorization letter number, and the authorized person's identity information.
S1, photograph the photocopies of the customer's documents and extract a text picture.
First, the input module photographs the photocopies of the customer's data and extracts text pictures. In this embodiment, a high-definition camera photographs the photocopies of the various documents provided by the customer; after shooting, the text pictures are imported into a server for subsequent processing.
S2, input the text picture into a pre-trained text correction network model and output a character-level classification saliency map.
A character module is deployed on the server; it inputs the text picture into the pre-trained text correction network model and outputs a character-level classification saliency map. In this embodiment the character module is implemented in software and comprises an extraction unit and a generation unit: the extraction unit inputs the text picture into a fully convolutional neural network to extract features at different scales and depths, and the generation unit fuses these features with a U-shaped network structure to obtain the character-level classification saliency map.
In this embodiment, the training process of the text correction network model includes: initializing the model parameters; acquiring training text pictures and ground-truth classification saliency map labels; inputting a training text picture into the model to obtain a predicted saliency map; calculating the network loss from the predicted saliency map and the ground-truth labels and updating the network parameters accordingly; and repeating this process until a set number of rounds is reached, then finishing training and storing the trained parameters. This completes the training of the text correction network model; further details can be found in the prior art.
S3, correct the text picture together with the classification saliency map using a strip region transformation algorithm, and output a corrected picture.
A correction module deployed on the server corrects the text picture and the classification saliency map with the strip region transformation algorithm and outputs a corrected picture. For the detailed implementation of the strip region transformation algorithm, refer to the relevant part of patent CN111144411A.
S4, use optical character recognition to convert the printed characters in the corrected picture into a black-and-white dot-matrix image file, convert the characters in the image file into text format, and output a recognized text.
A recognition module, OCR software in this embodiment, is deployed on the server to recognize the corrected picture, extract its characters, and output a recognized text. Specifically, character shapes are determined by detecting patterns of dark and light and then translated into computer characters: the printed characters in the corrected picture are optically converted into a black-and-white dot-matrix image file, and the recognition software converts the characters in the image into text format, yielding the recognized text.
S5, perform association recognition between the recognized text and the business library, and determine the error-correction words for the incorrectly recognized word segments in the recognized text.
An error correction module deployed on the server determines the error-correction words for the erroneous word segments in the recognized text and replaces those segments with the error-correction words to obtain error-correction candidate texts. The error-correction words correct the erroneous segments through association recognition with the business library. For example, suppose a household address is recognized in a text concerning the customer Zhang San as "xx village, xx town, Hechuan County, Chongqing". Here the erroneous segment is "Hechuan County": since "Hechuan County" was renamed "Hechuan District" in 2007, the corresponding error-correction word is "Hechuan District". The recognized text can thus be corrected to "xx village, xx town, Hechuan District, Chongqing", giving an error-correction candidate text with that content.
In addition, the error correction module includes a confidence unit that, before the error-correction words are determined, computes the recognition confidence of each word segment in the recognized text and treats any segment whose confidence is below a second threshold as erroneous. Because each erroneous segment may have more than one corresponding error-correction word, there may also be more than one error-correction candidate text. For example, if the recognized text contains 2 erroneous segments, with 3 error-correction words for the first and 2 for the second, there are 3 × 2 = 6 error-correction candidate texts. Specifically, the posterior probability of each word segment in the recognized text is taken as its recognition confidence. For example, suppose there are two segments "A1" and "A2", the second threshold is 90%, and their posterior probabilities (recognition confidences) are 92% and 88% respectively; then "A2" is treated as an erroneous segment. The calculation of the posterior probability follows the prior art.
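The candidate-text combinatorics described above follow directly from the Cartesian product of the per-segment correction lists; the segment and correction names below are placeholders.

```python
from itertools import product

# 2 erroneous segments: 3 corrections for the first, 2 for the second,
# giving 3 x 2 = 6 error-correction candidate texts.
candidates_per_error = [["c1a", "c1b", "c1c"],   # corrections for error segment 1
                        ["c2a", "c2b"]]          # corrections for error segment 2
candidate_texts = list(product(*candidates_per_error))
```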
S6, replace each erroneous word segment in the recognized text with its error-correction word to obtain error-correction candidate texts, determine an error-correction confidence for each candidate text with a neural network model, and take the candidate text whose confidence exceeds a first threshold as the error-correction text.
A text module deployed on the server replaces the erroneous segments in the recognized text with their error-correction words to obtain the candidate texts, determines each candidate's error-correction confidence with a neural network model, and takes the candidate whose confidence exceeds the first threshold as the error-correction text. Since an erroneous segment may have several possible error-correction words, there may be several candidate texts, so the error-correction confidence of each candidate is computed with the neural network model (for which refer to the prior art). For example, suppose an erroneous segment has two candidate corrections, the first threshold is 90%, and the two candidate texts have error-correction confidences of 87% and 91% respectively; then the candidate with 91% confidence is taken as the error-correction text.
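The selection rule itself reduces to a threshold test over scored candidates. In the sketch below the neural network scorer is replaced by fixed numbers matching the 87%/91% example with a 90% first threshold; candidate names are placeholders.

```python
def select_correction(scored_candidates, first_threshold=0.90):
    """scored_candidates: list of (candidate_text, error_correction_confidence)."""
    for text, conf in scored_candidates:
        if conf > first_threshold:
            return text
    return None   # no candidate passes: fall back, e.g. keep the original text

chosen = select_correction([("candidate A", 0.87), ("candidate B", 0.91)])
```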
And S7, comparing information according to the error-correction text, judging whether the compared information is consistent, and prompting a manual recheck if it is inconsistent.
Through detection rules configured in the post-loan system, the comparison module extracts the information recorded in the post-loan system, namely the stored service information, personal identity information and other related information, and compares it with the information in the error-correction text. If the comparison is consistent, the check is directly marked as passed, and the post-loan data check process ends after the relevant information is recorded; if the comparison is inconsistent, post-loan personnel are prompted, on an interface of the post-loan system, to perform a manual review.
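The comparison module's pass/manual-review decision can be sketched as below. The field names and record values are hypothetical; the patent does not fix a rule format.

```python
def compare_information(system_record, ocr_record, detection_rules):
    """Compare fields stored in the post-loan system against the fields
    extracted from the error-correction text; any mismatched field
    triggers a manual review, otherwise the check passes."""
    mismatches = [field for field in detection_rules
                  if system_record.get(field) != ocr_record.get(field)]
    if not mismatches:
        return "check_passed", []
    return "manual_review", mismatches

status, fields = compare_information(
    {"name": "Zhang San", "id_no": "110101X", "amount": "50000"},
    {"name": "Zhang San", "id_no": "110101X", "amount": "58000"},
    ["name", "id_no", "amount"],
)
```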
Example 2
The difference from Embodiment 1 is that the error correction module further includes a confusion unit, which determines the confusable words corresponding to an erroneous participle and the confusion degree of each such confusable word; it then selects among the confusable words according to a preset rule based on their confusion degrees, and takes the selection result as the error-correction words for that erroneous participle. In this embodiment, sets of confusable words such as "reservation", "subscription" and "booking" are collected in advance. For an erroneous participle in the recognized text, the corresponding confusable words are determined from the pre-collected data, and the confusion degree of each confusable word is set manually in advance.
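One way to realize the confusion unit's preset rule is a cutoff on the manually assigned confusion degree. The table entries and the 0.5 cutoff below are assumptions for illustration; the patent only requires that the degrees be set in advance.

```python
# Pre-collected confusable words with manually assigned confusion degrees.
CONFUSION_TABLE = {
    "ding": [("reservation", 0.9), ("subscription", 0.8), ("booking", 0.3)],
}

def error_correction_words(error_participle, table, min_degree=0.5):
    """Preset rule: select the confusable words whose confusion degree
    reaches min_degree; the selection becomes the error-correction words."""
    return [word for word, degree in table.get(error_participle, [])
            if degree >= min_degree]

words = error_correction_words("ding", CONFUSION_TABLE)
```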
Example 3
The difference from Embodiment 2 is that the post-loan checking system further comprises a client side, which collects the user's loan-related behavior data in real time, analyzes the behavior data, and judges whether the user is a high-quality client with both repayment willingness and repayment capability. In this embodiment, the client side is a bank APP that is provided to the user free of charge for a period, and data related to the user's loan behavior is collected in real time while the user uses it.
Specifically, the short messages received by the user each day are read, and keywords in the short messages are extracted for semantic analysis, so as to judge the user's repayment willingness and repayment capability. For example, a short message from the user's bank card may contain words such as "balance" and "arrears"; semantic analysis then determines whether the card balance is less than the arrears, and the number of times per month that the balance is less than the arrears is counted. If the balance is less than the arrears 3 or more times in a month, the user's repayment willingness and repayment capability can be judged to be poor; conversely, if the user receives no short messages about insufficient balance or arrears during the usage period, the user can be judged to be a high-quality client with good repayment willingness and repayment capability.
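The SMS-based judgment above can be sketched as a simple monthly counter. The month keys, the three-tier result, and the exact thresholds are assumptions; the patent only states the "3 or more per month" and "no shortfall messages" rules.

```python
def repayment_quality(shortfall_counts_by_month):
    """shortfall_counts_by_month maps a month to the number of SMS events in
    which the card balance was below the arrears. Any month with 3 or more
    such events signals poor repayment willingness and capability; no events
    at all during the usage period signals a high-quality client."""
    if any(n >= 3 for n in shortfall_counts_by_month.values()):
        return "poor"
    if all(n == 0 for n in shortfall_counts_by_month.values()):
        return "high_quality"
    return "undetermined"

q1 = repayment_quality({"2020-05": 4, "2020-06": 1})
q2 = repayment_quality({"2020-05": 0, "2020-06": 0})
```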
In addition, the user's working situation and consumption habits are analyzed through the mobile phone's positioning data and Bluetooth clock-in records. For example, the user's type of work can be inferred from the Bluetooth clock-in location, from which the user's income situation can be judged. If the user's Bluetooth clock-in locations are all in a high-grade office building and the user clocks in about 22 times per month, the user is likely a white-collar company employee with full monthly attendance, no leave taken, and no late arrivals or early departures; it can then be judged that the user's cash flow is stable, the repayment willingness and repayment capability are good, and the user is a high-quality client. For the positioning data, the user's consumption habits are analyzed from frequently visited consumption and entertainment venues; if the user frequently visits venues with high consumption levels, the repayment willingness and repayment capability may be judged to be poor. Finally, data such as the user's short messages, mobile phone positioning data and Bluetooth clock-in locations can be combined to analyze the repayment willingness and repayment capability, so that an integrated judgment is made.
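A minimal sketch of the clock-in analysis and the integrated judgment follows. The 22-day baseline comes from the text; the tolerance of 2 days and the two-signal combination rule are assumptions for illustration.

```python
def income_stability(clock_in_days, expected_per_month=22, tolerance=2):
    """clock_in_days: Bluetooth clock-in counts per month. Roughly full
    attendance (about 22 days) every month suggests stable cash flow."""
    return all(abs(d - expected_per_month) <= tolerance for d in clock_in_days)

def combined_judgment(stable_income, sms_signal):
    """Integrate the clock-in signal and the SMS-based repayment signal
    into one judgment of client quality."""
    if stable_income and sms_signal == "high_quality":
        return "high_quality"
    return "review"

verdict = combined_judgment(income_stability([22, 21, 23]), "high_quality")
```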
The foregoing is merely an embodiment of the present invention, and common general knowledge such as well-known specific structures and characteristics is not described here in detail. A person skilled in the art, before the filing date or the priority date of this application, knows the common technical knowledge in this field and has the ability to apply routine experimentation in light of the teachings of the present invention, so as to complete and implement the invention; certain typical known structures or known methods therefore pose no obstacle to implementation. It should be noted that a person skilled in the art may make several changes and modifications without departing from the structure of the present invention, and these should also be regarded as falling within the protection scope of the present invention without affecting the effect of its implementation or the practicability of the patent. The scope of protection of the present application shall be determined by the contents of the claims, and the detailed description in the specification may be used to interpret the contents of the claims.
Claims (10)
1. An OCR-based post-credit checking method, characterized in that,
the method comprises the following steps:
s1, shooting and extracting the text picture of the photocopy of the client data;
s2, inputting the text picture into a pre-trained text correction network model, and outputting a character-level classification saliency map;
s3, correcting the text picture and the classified saliency map by using a strip region transformation algorithm, and outputting a corrected picture;
s4, converting the print characters in the corrected picture into an image file of a black-and-white dot matrix by adopting an optical character recognition technology, converting characters in the image file into a text format, and outputting a recognition text;
s5, performing correlation identification on the identification text and the service library, and determining error-correcting words corresponding to error-segmented words with identification errors in the identification text;
s6, replacing the error word with the corresponding error segmentation word in the recognition text to obtain a corresponding error correction candidate text, determining the error correction confidence corresponding to the error correction candidate text through a neural network model, and taking the error correction candidate text with the error correction confidence greater than a first threshold value as the error correction text;
and S7, comparing the information according to the error correction text, judging whether the information comparison is consistent, and prompting manual recheck if the information comparison is inconsistent.
2. An OCR-based post-credit inspection method as claimed in claim 1,
s2 specifically includes: s21, inputting the text pictures into the full convolution neural network to extract features of different scales and different depths, and S22, adopting a U-shaped network structure to perform feature fusion on the features of different scales and different depths to obtain a character-level classification saliency map.
3. An OCR-based post-credit inspection method as claimed in claim 2,
the training process of the text correction network model comprises the following steps:
a1, initializing parameters of the text correction network model;
a2, acquiring training text pictures and real classification saliency map labels;
a3, inputting a training text picture into a text correction network model to obtain a prediction saliency map;
a4, calculating network loss according to the predicted saliency map and the real classification saliency map labels, and then updating and correcting network parameters according to the calculated network loss;
and A5, continuously repeating the above processes until a certain number of turns is reached, finishing training, and storing the corrected network parameters.
4. An OCR-based post-credit inspection method as claimed in claim 3,
before determining the error-correction words corresponding to the erroneously recognized participles in the recognized text, the method further comprises the following steps:
determining the recognition confidence of each participle in the recognized text, and taking a participle whose recognition confidence is smaller than a second threshold as an erroneous participle.
5. An OCR-based post-credit inspection method as claimed in claim 4,
the determining of the error-correction words corresponding to the erroneously recognized participles in the recognized text comprises:
for an erroneously recognized participle in the recognized text, determining the confusable words corresponding to the erroneous participle, and determining the confusion degree of each such confusable word;
and selecting among the confusable words corresponding to the erroneous participle according to a preset rule based on their confusion degrees, and taking the selection result as the error-correction words corresponding to the erroneous participle.
6. An OCR-based post-credit checking system, characterized in that,
the system comprises:
the input module is used for shooting and extracting a text picture of a photocopy of the client data;
the character module is used for inputting the text pictures into a pre-trained text correction network model and outputting a character-level classification saliency map;
the correction module is used for correcting the text picture and the classified saliency map by utilizing a strip region transformation algorithm and outputting a corrected picture;
the recognition module is used for converting the print characters in the corrected picture into an image file of a black-and-white dot matrix by adopting an optical character recognition technology, converting characters in the image file into a text format and outputting a recognition text;
the error correction module is used for performing correlation recognition between the recognized text and a service library, and determining error-correction words corresponding to the erroneously recognized participles in the recognized text;
the text module is used for replacing the erroneous participles in the recognized text with the corresponding error-correction words to obtain error-correction candidate texts, determining an error-correction confidence of each error-correction candidate text through a neural network model, and taking the error-correction candidate text whose error-correction confidence is greater than a first threshold as the error-correction text;
and the comparison module is used for comparing information according to the error-correction text, judging whether the compared information is consistent, and prompting a manual recheck if it is inconsistent.
7. An OCR-based post-credit inspection system as claimed in claim 6,
the character module specifically includes: and the extraction unit is used for inputting the text picture into the full convolution neural network to extract the features with different scales and different depths, and the generation unit is used for performing feature fusion on the features with different scales and different depths by adopting a U-shaped network structure to obtain the character-level classification saliency map.
8. An OCR-based post-credit inspection system as claimed in claim 7,
the training process of the text correction network model comprises the following steps:
a1, initializing parameters of the text correction network model;
a2, acquiring training text pictures and real classification saliency map labels;
a3, inputting a training text picture into a text correction network model to obtain a prediction saliency map;
a4, calculating network loss according to the predicted saliency map and the real classification saliency map labels, and then updating and correcting network parameters according to the calculated network loss;
and A5, continuously repeating the above processes until a certain number of turns is reached, finishing training, and storing the corrected network parameters.
9. An OCR-based post-credit checking system as recited in claim 8, wherein the error correction module further includes a confidence unit for determining, before the error-correction words corresponding to the erroneously recognized participles are determined, the recognition confidence of each participle in the recognized text, and taking a participle whose recognition confidence is smaller than the second threshold as an erroneous participle.
10. An OCR-based post-credit inspection system as recited in claim 9, wherein the error correction module further includes a confusion unit for determining the confusable words corresponding to an erroneous participle, and determining the confusion degree of each such confusable word; and selecting among the confusable words corresponding to the erroneous participle according to a preset rule based on their confusion degrees, and taking the selection result as the error-correction words corresponding to the erroneous participle.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010758400.4A CN111861731A (en) | 2020-07-31 | 2020-07-31 | Post-credit check system and method based on OCR |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010758400.4A CN111861731A (en) | 2020-07-31 | 2020-07-31 | Post-credit check system and method based on OCR |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111861731A true CN111861731A (en) | 2020-10-30 |
Family
ID=72953493
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010758400.4A Pending CN111861731A (en) | 2020-07-31 | 2020-07-31 | Post-credit check system and method based on OCR |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111861731A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112396049A (en) * | 2020-11-19 | 2021-02-23 | 平安普惠企业管理有限公司 | Text error correction method and device, computer equipment and storage medium |
CN112434686A (en) * | 2020-11-16 | 2021-03-02 | 浙江大学 | End-to-end error-containing text classification recognition instrument for OCR (optical character recognition) picture |
CN112926306A (en) * | 2021-03-08 | 2021-06-08 | 北京百度网讯科技有限公司 | Text error correction method, device, equipment and storage medium |
CN112990182A (en) * | 2021-05-10 | 2021-06-18 | 北京轻松筹信息技术有限公司 | Finance information auditing method and system and electronic equipment |
CN113807953A (en) * | 2021-09-24 | 2021-12-17 | 重庆富民银行股份有限公司 | Wind control management method and system based on telephone return visit |
CN114554086A (en) * | 2022-02-10 | 2022-05-27 | 支付宝(杭州)信息技术有限公司 | Auxiliary shooting method and device and electronic equipment |
CN114926831A (en) * | 2022-05-31 | 2022-08-19 | 平安普惠企业管理有限公司 | Text-based recognition method and device, electronic equipment and readable storage medium |
CN116704523A (en) * | 2023-08-07 | 2023-09-05 | 山东成信彩印有限公司 | Text typesetting image recognition system for publishing and printing equipment |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000148844A (en) * | 1998-09-11 | 2000-05-30 | Nissan Fire & Marine Insurance Co Ltd | Insurance business processing system, device and method for preparing insurance slip, and computer-readable recording medium recorded with program for executing the method by computer |
CN107977356A (en) * | 2017-11-21 | 2018-05-01 | 新疆科大讯飞信息科技有限责任公司 | Method and device for correcting recognized text |
CN109190594A (en) * | 2018-09-21 | 2019-01-11 | 广东蔚海数问大数据科技有限公司 | Optical Character Recognition system and information extracting method |
CN109753636A (en) * | 2017-11-01 | 2019-05-14 | 阿里巴巴集团控股有限公司 | Machine processing and text error correction method and device calculate equipment and storage medium |
CN109815463A (en) * | 2018-12-13 | 2019-05-28 | 深圳壹账通智能科技有限公司 | Control method, device, computer equipment and storage medium are chosen in text editing |
CN110245331A (en) * | 2018-03-09 | 2019-09-17 | 中兴通讯股份有限公司 | A kind of sentence conversion method, device, server and computer storage medium |
CN110443236A (en) * | 2019-08-06 | 2019-11-12 | 中国工商银行股份有限公司 | Text will put information extracting method and device after loan |
CN110533521A (en) * | 2019-06-21 | 2019-12-03 | 深圳前海微众银行股份有限公司 | Method for early warning, device, equipment and readable storage medium storing program for executing after dynamic is borrowed |
CN111079768A (en) * | 2019-12-23 | 2020-04-28 | 北京爱医生智慧医疗科技有限公司 | Character and image recognition method and device based on OCR |
CN111144411A (en) * | 2019-12-27 | 2020-05-12 | 南京大学 | Method and system for correcting and identifying irregular text based on saliency map |
CN111310443A (en) * | 2020-02-12 | 2020-06-19 | 新华智云科技有限公司 | Text error correction method and system |
CN111340032A (en) * | 2020-03-16 | 2020-06-26 | 天津得迈科技有限公司 | Character recognition method based on application scene in financial field |
- 2020-07-31 CN CN202010758400.4A patent/CN111861731A/en active Pending
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000148844A (en) * | 1998-09-11 | 2000-05-30 | Nissan Fire & Marine Insurance Co Ltd | Insurance business processing system, device and method for preparing insurance slip, and computer-readable recording medium recorded with program for executing the method by computer |
CN109753636A (en) * | 2017-11-01 | 2019-05-14 | 阿里巴巴集团控股有限公司 | Machine processing and text error correction method and device calculate equipment and storage medium |
CN107977356A (en) * | 2017-11-21 | 2018-05-01 | 新疆科大讯飞信息科技有限责任公司 | Method and device for correcting recognized text |
CN110245331A (en) * | 2018-03-09 | 2019-09-17 | 中兴通讯股份有限公司 | A kind of sentence conversion method, device, server and computer storage medium |
CN109190594A (en) * | 2018-09-21 | 2019-01-11 | 广东蔚海数问大数据科技有限公司 | Optical Character Recognition system and information extracting method |
CN109815463A (en) * | 2018-12-13 | 2019-05-28 | 深圳壹账通智能科技有限公司 | Control method, device, computer equipment and storage medium are chosen in text editing |
CN110533521A (en) * | 2019-06-21 | 2019-12-03 | 深圳前海微众银行股份有限公司 | Method for early warning, device, equipment and readable storage medium storing program for executing after dynamic is borrowed |
CN110443236A (en) * | 2019-08-06 | 2019-11-12 | 中国工商银行股份有限公司 | Text will put information extracting method and device after loan |
CN111079768A (en) * | 2019-12-23 | 2020-04-28 | 北京爱医生智慧医疗科技有限公司 | Character and image recognition method and device based on OCR |
CN111144411A (en) * | 2019-12-27 | 2020-05-12 | 南京大学 | Method and system for correcting and identifying irregular text based on saliency map |
CN111310443A (en) * | 2020-02-12 | 2020-06-19 | 新华智云科技有限公司 | Text error correction method and system |
CN111340032A (en) * | 2020-03-16 | 2020-06-26 | 天津得迈科技有限公司 | Character recognition method based on application scene in financial field |
Non-Patent Citations (2)
Title |
---|
夏昌新; 莫浩泓; 王成鑫; 王瑶; 闫仕宇: "Research and Application of Image Text Recognition Technology Based on Deep Learning" *
杜一谦: "Design and Implementation of an Intelligent Decision Model System for Consumer Credit" *
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112434686A (en) * | 2020-11-16 | 2021-03-02 | 浙江大学 | End-to-end error-containing text classification recognition instrument for OCR (optical character recognition) picture |
CN112434686B (en) * | 2020-11-16 | 2023-05-23 | 浙江大学 | End-to-end misplaced text classification identifier for OCR (optical character) pictures |
CN112396049A (en) * | 2020-11-19 | 2021-02-23 | 平安普惠企业管理有限公司 | Text error correction method and device, computer equipment and storage medium |
CN112926306A (en) * | 2021-03-08 | 2021-06-08 | 北京百度网讯科技有限公司 | Text error correction method, device, equipment and storage medium |
CN112926306B (en) * | 2021-03-08 | 2024-01-23 | 北京百度网讯科技有限公司 | Text error correction method, device, equipment and storage medium |
CN112990182A (en) * | 2021-05-10 | 2021-06-18 | 北京轻松筹信息技术有限公司 | Finance information auditing method and system and electronic equipment |
CN112990182B (en) * | 2021-05-10 | 2021-09-21 | 北京轻松筹信息技术有限公司 | Finance information auditing method and system and electronic equipment |
CN113807953B (en) * | 2021-09-24 | 2023-11-03 | 重庆富民银行股份有限公司 | Wind control management method and system based on telephone return visit |
CN113807953A (en) * | 2021-09-24 | 2021-12-17 | 重庆富民银行股份有限公司 | Wind control management method and system based on telephone return visit |
CN114554086A (en) * | 2022-02-10 | 2022-05-27 | 支付宝(杭州)信息技术有限公司 | Auxiliary shooting method and device and electronic equipment |
CN114926831A (en) * | 2022-05-31 | 2022-08-19 | 平安普惠企业管理有限公司 | Text-based recognition method and device, electronic equipment and readable storage medium |
CN116704523B (en) * | 2023-08-07 | 2023-10-20 | 山东成信彩印有限公司 | Text typesetting image recognition system for publishing and printing equipment |
CN116704523A (en) * | 2023-08-07 | 2023-09-05 | 山东成信彩印有限公司 | Text typesetting image recognition system for publishing and printing equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111861731A (en) | Post-credit check system and method based on OCR | |
JP6528147B2 (en) | Accounting data entry support system, method and program | |
US8300942B2 (en) | Area extraction program, character recognition program, and character recognition device | |
US20060202012A1 (en) | Secure data processing system, such as a system for detecting fraud and expediting note processing | |
US6567765B1 (en) | Evaluation system and method for fingerprint verification | |
WO2021042505A1 (en) | Note generation method and apparatus based on character recognition technology, and computer device | |
CN109840520A (en) | A kind of invoice key message recognition methods and system | |
CN111784498A (en) | Identity authentication method and device, electronic equipment and storage medium | |
CN110503099B (en) | Information identification method based on deep learning and related equipment | |
CN113095307B (en) | Automatic identification method for financial voucher information | |
CN112396047B (en) | Training sample generation method and device, computer equipment and storage medium | |
CN115810134B (en) | Image acquisition quality inspection method, system and device for vehicle insurance anti-fraud | |
CN111415336A (en) | Image tampering identification method and device, server and storage medium | |
CN113158777A (en) | Quality scoring method, quality scoring model training method and related device | |
CN117454426A (en) | Method, device and system for desensitizing and collecting information of claim settlement data | |
CN116189063B (en) | Key frame optimization method and device for intelligent video monitoring | |
CN115688107B (en) | Fraud-related APP detection system and method | |
CN111861733A (en) | Fraud prevention and control system and method based on address fuzzy matching | |
CN111881880A (en) | Bill text recognition method based on novel network | |
CN110942073A (en) | Container trailer number identification method and device and computer equipment | |
CN115205882A (en) | Intelligent identification and processing method for expense voucher in medical industry | |
CN115116119A (en) | Face recognition system based on digital image processing technology | |
CN114625872A (en) | Risk auditing method, system and equipment based on global pointer and storage medium | |
CN113807256A (en) | Bill data processing method and device, electronic equipment and storage medium | |
CN113610098B (en) | Tax payment number identification method and device, storage medium and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20201030 |