CN111861731A - Post-credit check system and method based on OCR - Google Patents

Post-credit check system and method based on OCR

Info

Publication number
CN111861731A
CN111861731A (application CN202010758400.4A)
Authority
CN
China
Prior art keywords: text, error, error correction, recognition, word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010758400.4A
Other languages
Chinese (zh)
Inventor
何寒曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Fumin Bank Co Ltd
Original Assignee
Chongqing Fumin Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Fumin Bank Co Ltd filed Critical Chongqing Fumin Bank Co Ltd
Priority to CN202010758400.4A
Publication of CN111861731A
Legal status: Pending

Classifications

    • G06Q40/03 — Finance; Insurance: Credit; Loans; Processing thereof
    • G06F18/253 — Pattern recognition; Analysing: Fusion techniques of extracted features
    • G06F40/232 — Handling natural language data: Orthographic correction, e.g. spell checking or vowelisation
    • G06F40/284 — Handling natural language data: Lexical analysis, e.g. tokenisation or collocates
    • G06F40/289 — Handling natural language data: Phrasal analysis, e.g. finite state techniques or chunking
    • G06V20/62 — Scenes; Scene-specific elements: Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V30/413 — Analysis of document content: Classification of content, e.g. text, photographs or tables
    • G06V30/10 — Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Strategic Management (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Technology Law (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to the field of computer technology, and in particular to an OCR-based post-loan inspection method comprising the following steps: S1, photographing the photocopy of the customer material and extracting a text picture; S2, inputting the text picture into a pre-trained text correction network model; S3, rectifying the text picture and the classification saliency map with a strip-region transformation algorithm; S4, outputting recognized text with optical character recognition; S5, determining error-correction words for the misrecognized word segments in the recognized text; S6, determining an error-correction confidence for each error-correction candidate text with a neural network model and taking the candidates whose confidence exceeds a first threshold as the error-corrected text; S7, comparing information against the error-corrected text, judging whether the comparison is consistent, and prompting manual review if it is not. The invention solves the technical problem that content recognition and capture with OCR technology are inaccurate for photocopies of poor copy quality.

Description

Post-credit check system and method based on OCR
Technical Field
The invention relates to the field of computer technology, and in particular to an OCR-based post-loan inspection system and method.
Background
At present, when a loan is applied for, bank staff review the authenticity of the materials the borrower provides. In most cases banks rely on manual review: staff check the borrower's materials one by one to determine whether the borrower meets the bank's lending conditions. However, manual review is inefficient and labor costs are high; when the workload is heavy, long stretches of repetitive work easily lead to mistakes.
In view of this, document CN109697665A discloses an artificial-intelligence-based loan auditing method, device, equipment and medium, which includes: obtaining a loan application request containing an identity card image and the user's personal information; recognizing and verifying the identity card image with OCR technology to obtain the user's basic information; querying a third-party credit investigation platform based on that basic information to obtain the user's credit score; forming a voice question based on the user's personal information, broadcasting the question and starting a camera to record a monitoring video; invoking a pre-built micro-expression recognition model to analyze the monitoring video and obtain a micro-expression detection result; and obtaining a loan auditing result from the micro-expression detection result and the user's credit score. The auditing process needs no manual intervention, can intelligently check the authenticity of the borrower's information, and effectively improves the efficiency of loan auditing.
Similarly, to address the large share of post-loan work taken up by post-loan inspection, optical character recognition (OCR) can be introduced to automatically recognize and capture the content of the business-related customer materials stored in the post-loan system, and the content to be checked can be compared automatically against preset rules to achieve automatic post-loan inspection, greatly reducing the labor currently invested in the post-loan process. However, most customer materials are photocopies of uneven quality, and for photocopies of poor copy quality, OCR alone recognizes and captures content inaccurately. In addition, because of lighting, font color depth and the shooting angle of the photocopy, OCR has a certain misrecognition rate, and a misread does not necessarily mean the material itself is problematic or wrong.
Disclosure of Invention
The invention provides an OCR-based post-loan inspection system and method, which solve the technical problem that content recognition and capture with OCR technology are inaccurate for photocopies of poor copy quality.
The basic scheme provided by the invention is as follows: an OCR-based post-loan inspection method comprising the following steps:
S1, photographing the photocopy of the customer material and extracting a text picture;
S2, inputting the text picture into a pre-trained text correction network model and outputting a character-level classification saliency map;
S3, rectifying the text picture and the classification saliency map with a strip-region transformation algorithm and outputting a corrected picture;
S4, using optical character recognition to convert the printed characters in the corrected picture into a black-and-white dot-matrix image file, converting the characters in the image file into text format, and outputting recognized text;
S5, performing association recognition between the recognized text and the business library, and determining error-correction words for the misrecognized word segments in the recognized text;
S6, replacing the misrecognized word segments in the recognized text with their corresponding error-correction words to obtain error-correction candidate texts, determining an error-correction confidence for each candidate text with a neural network model, and taking the candidates whose error-correction confidence exceeds a first threshold as the error-corrected text;
S7, comparing information against the error-corrected text, judging whether the comparison is consistent, and prompting manual review if it is not.
The working principle and the advantages of the invention are as follows:
1. The photocopy of the customer material need not contain only horizontal text; it may contain irregular text such as slanted, curved, or perspective-distorted text. Applying OCR directly to irregular text gives poor results. In this scheme the text picture is rectified by the text correction network model and the strip-region transformation algorithm, which improves recognition accuracy and robustness. In addition, correcting errors in the recognized text and comparing information against the corrected text further ensures that content is recognized and captured accurately.
2. A business whose information comparison is consistent can be marked as passed directly in the post-loan check, and the post-loan data check ends after the relevant information is recorded. For a business whose information comparison is inconsistent, post-loan staff are prompted to review it manually. Because of lighting, font color depth, shooting angle and similar problems with photocopies, OCR has a certain misrecognition rate, and an inconsistency does not necessarily mean the material itself is problematic or wrong; by retrieving the material and manually confirming the state of the business data, post-loan staff ensure the accuracy of the post-loan check.
The invention rectifies the text picture with the text correction network model and the strip-region transformation algorithm and then corrects the recognized text, thereby solving the technical problem that content recognition and capture with OCR technology are inaccurate for photocopies of poor copy quality.
Further, S2 specifically includes: S21, inputting the text picture into a fully convolutional neural network to extract features of different scales and depths; S22, fusing those features with a U-shaped network structure to obtain the character-level classification saliency map.
Beneficial effects: in this way, features of different scales and depths are extracted and then fused, so the resulting character-level classification saliency map contains the main scale and depth features, which effectively improves recognition accuracy.
Further, the training process of the text correction network model comprises the following steps:
A1, initializing the parameters of the text correction network model;
A2, acquiring training text pictures and real classification saliency map labels;
A3, inputting a training text picture into the text correction network model to obtain a predicted saliency map;
A4, computing the network loss from the predicted saliency map and the real classification saliency map labels, then updating the correction network's parameters according to the computed loss;
A5, repeating the above process until a set number of rounds is reached, finishing training, and saving the correction network's parameters.
Beneficial effects: training the text correction network model in this way improves both the speed and the precision of text picture correction.
Further, before determining the error-correction words for the misrecognized word segments in the recognized text, the method further includes:
determining the recognition confidence of each word segment in the recognized text, and treating every segment whose recognition confidence is below a second threshold as a misrecognized segment.
Beneficial effects: by presetting the second threshold, how strictly a word segment is judged to be misrecognized can be controlled indirectly for different situations.
Further, determining the error-correction words for the misrecognized word segments in the recognized text includes:
for each misrecognized segment, determining its confusable words and the degree of confusability of each;
selecting among the confusable words of that segment according to a preset rule based on their degrees of confusability, and taking the selection result as the error-correction word for that segment.
Beneficial effects: in this way the confusable words of each misrecognized segment are determined and then selected by a preset rule, so the influence of confusable words is taken into account and recognition accuracy improves.
Based on the OCR-based post-loan inspection method, an OCR-based post-loan inspection system is also disclosed, including:
an input module for photographing the photocopy of the customer material and extracting a text picture;
a character module for inputting the text picture into a pre-trained text correction network model and outputting a character-level classification saliency map;
a correction module for rectifying the text picture and the classification saliency map with a strip-region transformation algorithm and outputting a corrected picture;
a recognition module for using optical character recognition to convert the printed characters in the corrected picture into a black-and-white dot-matrix image file, converting the characters in the image file into text format, and outputting recognized text;
an error-correction module for performing association recognition between the recognized text and the business library and determining error-correction words for the misrecognized word segments in the recognized text;
a text module for replacing the misrecognized word segments in the recognized text with their corresponding error-correction words to obtain error-correction candidate texts, determining an error-correction confidence for each candidate text with a neural network model, and taking the candidates whose error-correction confidence exceeds a first threshold as the error-corrected text;
a comparison module for comparing information against the error-corrected text, judging whether the comparison is consistent, and prompting manual review if it is not.
The working principle and advantages of the invention are as follows: in this scheme the text picture is rectified by the text correction network model and the strip-region transformation algorithm, which improves recognition accuracy and robustness; correcting errors in the recognized text and comparing information against the corrected text ensures that content is recognized and captured accurately.
Further, the character module specifically includes: an extraction unit for inputting the text picture into a fully convolutional neural network to extract features of different scales and depths, and a generation unit for fusing those features with a U-shaped network structure to obtain the character-level classification saliency map.
Beneficial effects: recognition accuracy is effectively improved in this way.
Further, the training process of the text correction network model comprises the following steps:
A1, initializing the parameters of the text correction network model;
A2, acquiring training text pictures and real classification saliency map labels;
A3, inputting a training text picture into the text correction network model to obtain a predicted saliency map;
A4, computing the network loss from the predicted saliency map and the real classification saliency map labels, then updating the correction network's parameters according to the computed loss;
A5, repeating the above process until a set number of rounds is reached, finishing training, and saving the correction network's parameters.
Beneficial effects: this helps improve both the speed and the precision of text picture correction.
Further, the error-correction module also includes a confidence unit which, before the error-correction words for the misrecognized word segments in the recognized text are determined, determines the recognition confidence of each word segment in the recognized text and treats every segment whose recognition confidence is below a second threshold as a misrecognized segment.
Beneficial effects: how strictly a word segment is judged to be misrecognized can be controlled indirectly for different situations.
Further, the error-correction module also includes a confusion unit for determining the confusable words of each misrecognized segment and their degrees of confusability, selecting among those confusable words according to a preset rule based on the degrees of confusability, and taking the selection result as the error-correction word for that segment.
Beneficial effects: in this way the influence of confusable words is taken into account and recognition accuracy improves.
Drawings
FIG. 1 is a block diagram of a system architecture of an embodiment of an OCR-based post-loan inspection system of the present invention.
Detailed Description
The following is described in further detail through specific embodiments:
example 1
An embodiment of the OCR-based post-loan inspection system of the present invention, substantially as shown in Fig. 1, includes:
an input module for photographing the photocopy of the customer material and extracting a text picture;
a character module for inputting the text picture into a pre-trained text correction network model and outputting a character-level classification saliency map;
a correction module for rectifying the text picture and the classification saliency map with a strip-region transformation algorithm and outputting a corrected picture;
a recognition module for using optical character recognition to convert the printed characters in the corrected picture into a black-and-white dot-matrix image file, converting the characters in the image file into text format, and outputting recognized text;
an error-correction module for performing association recognition between the recognized text and the business library and determining error-correction words for the misrecognized word segments in the recognized text;
a text module for replacing the misrecognized word segments in the recognized text with their corresponding error-correction words to obtain error-correction candidate texts, determining an error-correction confidence for each candidate text with a neural network model, and taking the candidates whose error-correction confidence exceeds a first threshold as the error-corrected text;
a comparison module for comparing information against the error-corrected text, judging whether the comparison is consistent, and prompting manual review if it is not.
In this embodiment, the customer-related information in the photocopies required for the post-loan check needs to be extracted, including the identity card, the business license, the authorization letter number, and the authorizer's identity information.
S1, photographing the photocopy of the customer material and extracting a text picture.
First, the input module photographs the photocopies of the customer materials and extracts text pictures. In this embodiment a high-definition camera photographs the photocopies of the various materials provided by the customer, and after shooting the text pictures are imported into a server for the next processing step.
S2, inputting the text picture into the pre-trained text correction network model and outputting a character-level classification saliency map.
A character module runs on the server: it feeds the text picture to the pre-trained text correction network model and outputs a character-level classification saliency map. In this embodiment the character module may be implemented in software and specifically comprises an extraction unit and a generation unit; the extraction unit inputs the text picture into a fully convolutional neural network to extract features of different scales and depths, and the generation unit then fuses those features with a U-shaped network structure to obtain the character-level classification saliency map.
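The patent does not give the network's exact layer configuration, so the following is only a minimal sketch of the idea described above: a small fully convolutional encoder extracts features at several scales and depths, and a U-shaped decoder fuses them into a single-channel character-level classification saliency map. The class name, channel widths and layer counts are illustrative assumptions, not the patented model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCharSaliencyNet(nn.Module):
    """Illustrative FCN encoder + U-shaped decoder producing a character-level saliency map."""
    def __init__(self):
        super().__init__()
        # Encoder: features at three scales/depths (channel widths are assumptions).
        self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True))
        self.enc2 = nn.Sequential(nn.MaxPool2d(2), nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True))
        self.enc3 = nn.Sequential(nn.MaxPool2d(2), nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True))
        # U-shaped decoder: upsample deeper features and fuse them with shallower ones.
        self.dec2 = nn.Conv2d(64 + 32, 32, 3, padding=1)
        self.dec1 = nn.Conv2d(32 + 16, 16, 3, padding=1)
        self.head = nn.Conv2d(16, 1, 1)  # one channel: per-pixel character saliency

    def forward(self, x):
        f1 = self.enc1(x)                     # full resolution, shallow features
        f2 = self.enc2(f1)                    # 1/2 resolution
        f3 = self.enc3(f2)                    # 1/4 resolution, deep features
        u2 = F.interpolate(f3, size=f2.shape[2:], mode="bilinear", align_corners=False)
        d2 = F.relu(self.dec2(torch.cat([u2, f2], dim=1)))   # fuse deep + mid-level features
        u1 = F.interpolate(d2, size=f1.shape[2:], mode="bilinear", align_corners=False)
        d1 = F.relu(self.dec1(torch.cat([u1, f1], dim=1)))   # fuse with shallow features
        return torch.sigmoid(self.head(d1))   # character-level classification saliency map

# Usage: a 3-channel text picture in, a 1-channel saliency map of the same size out.
saliency = TinyCharSaliencyNet()(torch.randn(1, 3, 256, 256))
```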
In this embodiment the training process of the text correction network model includes the following steps: initialize the parameters of the text correction network model; acquire training text pictures and real classification saliency map labels; input a training text picture into the text correction network model to obtain a predicted saliency map; compute the network loss from the predicted saliency map and the real classification saliency map labels and update the correction network's parameters according to the computed loss; repeat this process until a set number of rounds is reached, then finish training and save the correction network's parameters. Training of the text correction network model is completed in this way; further details can be found in the prior art.
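Steps A1 to A5 above amount to an ordinary supervised training loop. The loss function (binary cross-entropy), optimizer and number of rounds in the sketch below are assumptions, since the patent only states that a network loss is computed from the predicted and labelled saliency maps and that training stops after a fixed number of rounds.

```python
import torch
import torch.nn as nn

def train_correction_model(model, loader, rounds=50, lr=1e-3):
    """A1-A5 as a plain training loop (loss/optimizer/round count are illustrative)."""
    # A1: the model's parameters are initialized when the model object is constructed.
    criterion = nn.BCELoss()                               # A4: network loss (assumed BCE)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(rounds):                                # A5: repeat for a set number of rounds
        for picture, true_saliency in loader:              # A2: training pictures + real labels
            predicted = model(picture)                     # A3: predicted saliency map
            loss = criterion(predicted, true_saliency)     # A4: loss vs. real classification labels
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                               # A4: update the network parameters
    torch.save(model.state_dict(), "text_correction.pt")   # A5: save the trained parameters
```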
S3, rectifying the text picture and the classification saliency map with a strip-region transformation algorithm and outputting a corrected picture.
A correction module on the server rectifies the text picture and the classification saliency map with the strip-region transformation algorithm and outputs a corrected picture. The detailed implementation of the strip-region transformation algorithm can be found in the relevant part of patent CN111144411A.
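The strip-region transformation itself is defined in CN111144411A and is not reproduced in this patent. As a stand-in, the sketch below rectifies one detected text strip with an ordinary perspective warp in OpenCV, using the quadrilateral of the strip taken from the saliency map; this substitutes a simpler, well-known transform for the patented one and only illustrates where rectification sits in the pipeline.

```python
import cv2
import numpy as np

def rectify_strip(picture, quad, out_w=320, out_h=48):
    """Warp one quadrilateral text strip (from the saliency map) to an axis-aligned rectangle.

    A generic perspective correction, NOT the strip-region transformation of CN111144411A;
    it only shows the rectification step of the pipeline.
    """
    src = np.asarray(quad, dtype=np.float32)              # 4 corners of the detected strip
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    matrix = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(picture, matrix, (out_w, out_h))
```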
S4, using optical character recognition to convert the printed characters in the corrected picture into a black-and-white dot-matrix image file, converting the characters in the image file into text format, and outputting recognized text.
The server runs a recognition module, in this embodiment OCR software, which recognizes the corrected picture, extracts the characters in it, and outputs the recognized text. Specifically, character shapes are determined by detecting dark and light patterns and are then translated into computer characters by a character recognition method: the printed characters in the corrected picture are optically converted into a black-and-white dot-matrix image file, and the recognition software converts the characters in that image into text format, producing the recognized text.
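Step S4 only requires an off-the-shelf OCR engine; the patent does not name the recognition software actually used. The sketch below uses the open-source pytesseract binding purely as an example, with Otsu binarization standing in for the black-and-white dot-matrix conversion.

```python
import cv2
import pytesseract

def recognize_text(corrected_picture_path):
    """Binarize the corrected picture and run OCR to obtain the recognized text."""
    image = cv2.imread(corrected_picture_path, cv2.IMREAD_GRAYSCALE)
    # Black-and-white dot matrix: Otsu threshold to a binary image.
    _, binary = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Tesseract is only an example engine; Chinese + English language packs must be installed.
    return pytesseract.image_to_string(binary, lang="chi_sim+eng")
```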
S5, performing association recognition between the recognized text and the business library, and determining error-correction words for the misrecognized word segments in the recognized text.
The server runs an error-correction module which determines the error-correction words for the misrecognized word segments in the recognized text, so that those segments can later be replaced to obtain error-correction candidate texts. The error-correction words correct the misrecognized segments in the recognized text, and association recognition is performed against the business library. For example, the registered residential address recognized for a customer, Zhang San, reads "xx town, xx village, Hechuan County, Chongqing". In this recognized text the misrecognized segment is "Hechuan County": Hechuan County was renamed "Hechuan District" in 2007, so the corresponding error-correction word is "Hechuan District". With the error-correction word "Hechuan District" the recognized text can be corrected to "xx town, xx village, Hechuan District, Chongqing", yielding an error-correction candidate text with that content.
In addition, the error-correction module includes a confidence unit which, before the error-correction words for the misrecognized segments are determined, determines the recognition confidence of each word segment in the recognized text and treats every segment whose recognition confidence is below a second threshold as misrecognized. Because each misrecognized segment may have more than one error-correction word, more than one error-correction candidate text may be obtained: for example, if the recognized text contains 2 misrecognized segments, the first with 3 error-correction words and the second with 2, the number of error-correction candidate texts is 3 × 2 = 6. Specifically, the posterior probability of each word segment in the recognized text is obtained and used as its recognition confidence. For example, suppose there are two segments "a1" and "a2", the second threshold is 90%, and their posterior probabilities (recognition confidences) are 92% and 88% respectively; then "a2" is treated as a misrecognized segment. In this embodiment the calculation of the posterior probability can follow the prior art.
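The confidence unit and the candidate generation described above reduce to two small operations: filter out word segments whose recognition confidence (posterior probability) falls below the second threshold, and take the Cartesian product of the error-correction words of every misrecognized segment to enumerate candidate texts (3 × 2 = 6 in the example). The sketch below assumes the segments and their probabilities are already available from the recognition and segmentation stage.

```python
from itertools import product

def find_error_segments(segments, posteriors, second_threshold=0.90):
    """Word segments whose posterior probability is below the second threshold are treated as misrecognized."""
    return [seg for seg, p in zip(segments, posteriors) if p < second_threshold]

def enumerate_candidates(recognized_text, corrections):
    """corrections maps each misrecognized segment to its list of error-correction words.

    Every combination of replacements yields one error-correction candidate text,
    e.g. 3 words for the first segment x 2 for the second = 6 candidates.
    """
    wrong_segments = list(corrections)
    candidates = []
    for combo in product(*(corrections[s] for s in wrong_segments)):
        text = recognized_text
        for wrong, fixed in zip(wrong_segments, combo):
            text = text.replace(wrong, fixed)
        candidates.append(text)
    return candidates

# Figures from the description: a1 has posterior 0.92, a2 has 0.88, threshold 90%.
print(find_error_segments(["a1", "a2"], [0.92, 0.88]))   # -> ['a2']
```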
S6, replacing the misrecognized word segments in the recognized text with their corresponding error-correction words to obtain error-correction candidate texts, determining an error-correction confidence for each candidate text with a neural network model, and taking the candidates whose error-correction confidence exceeds a first threshold as the error-corrected text.
The server runs a text module which replaces the misrecognized segments in the recognized text with their corresponding error-correction words to obtain error-correction candidate texts, determines an error-correction confidence for each candidate text with a neural network model, and takes the candidates whose error-correction confidence exceeds a first threshold as the error-corrected text. When the error-correction words for a misrecognized segment are determined there may be several choices, so each misrecognized segment may have more than one error-correction word; the error-correction confidence of each candidate text is therefore computed with a neural network model, for which the prior art can be consulted. For example, the misrecognized segment is "订" (ding) and there are two error-correction candidates, two easily confused words that both translate roughly as "reservation". If the first threshold is 90% and the two candidate texts have error-correction confidences of 87% and 91% respectively, the candidate with 91% confidence is taken as the error-corrected text.
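Selecting the error-corrected text then reduces to scoring each candidate with the error-correction confidence model and keeping the candidates above the first threshold. The scoring model itself is not specified in the patent, so `confidence_model` below is an assumed callable (for instance, a language-model-style scorer) rather than a concrete architecture.

```python
def select_corrected_texts(candidates, confidence_model, first_threshold=0.90):
    """Keep the error-correction candidate texts whose confidence exceeds the first threshold."""
    scored = [(text, confidence_model(text)) for text in candidates]
    return [text for text, conf in scored if conf > first_threshold]

# With the figures from the description (threshold 90%, confidences 0.87 and 0.91),
# only the candidate scored 0.91 is kept as the error-corrected text.
```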
S7, comparing information against the error-corrected text, judging whether the comparison is consistent, and prompting manual review if it is not.
The comparison module extracts the information recorded in the post-loan system, namely the business information, personal identity information and other related information stored there, through the detection rules configured in the post-loan system, and compares it with the information in the error-corrected text. If the comparison is consistent, the check is marked as passed directly and the post-loan data check ends after the relevant information is recorded; if the comparison is inconsistent, post-loan staff are prompted on the post-loan system's interface to review the business manually.
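The comparison module described above is essentially a field-by-field check of the error-corrected text against the records already stored in the post-loan system, with anything that does not match routed to manual review. The field names and the rule format below are illustrative assumptions, not the bank's actual detection rules.

```python
def check_after_loan(corrected_fields, system_records):
    """Compare extracted fields (e.g. name, ID number, address) with the post-loan system records.

    Returns (passed, mismatches); businesses with mismatches are prompted for manual review.
    """
    mismatches = {key: (value, system_records.get(key))
                  for key, value in corrected_fields.items()
                  if system_records.get(key) != value}
    return (len(mismatches) == 0, mismatches)

passed, diff = check_after_loan(
    {"name": "Zhang San", "address": "xx town, Hechuan District, Chongqing"},
    {"name": "Zhang San", "address": "xx town, Hechuan District, Chongqing"},
)
# passed -> True: mark the check as passed and record the result;
# otherwise prompt post-loan staff to review the mismatching fields in `diff` manually.
```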
Example 2
The difference from Embodiment 1 is that the error-correction module also includes a confusion unit which determines the confusable words of each misrecognized segment and their degrees of confusability, selects among those confusable words according to a preset rule based on the degrees of confusability, and takes the selection result as the error-correction word for that segment. In this embodiment confusable words, such as several easily confused words meaning "reservation" or "subscription", are collected in advance. For a misrecognized segment in the recognized text, its confusable words are determined from the pre-collected data, and the degree of confusability of each confusable word is set manually in advance.
Example 3
The difference from Embodiment 2 is that the system also includes a client side which collects the user's loan-related behavior data in real time and analyzes it to judge whether the user is a high-quality customer with both the willingness and the ability to repay. In this embodiment the client side is a bank app that is provided to the user free of charge for a period, and data related to the user's loan behavior is collected in real time while it is in use.
Specifically, the short messages the user receives every day are read, and semantic analysis is performed by extracting keywords from them to judge the user's repayment willingness and ability. For example, the user's bank card messages may contain words such as "balance" and "arrears"; semantic analysis then determines whether the card balance is less than the amount owed, and the number of times per month that the balance falls short is counted. If the balance falls short three or more times in a month, the user's repayment willingness and ability can be judged to be poor; conversely, if the user receives no messages about insufficient balance or arrears during the period of use, the user can be judged to be a high-quality customer with good repayment willingness and ability.
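The rule sketched in this embodiment, counting the months in which low-balance or arrears alerts appear at least three times, can be written down directly. The keyword list and message format below are assumptions, and real SMS parsing would need far more robust extraction than simple keyword matching.

```python
from collections import Counter

LOW_BALANCE_WORDS = ("余额不足", "欠费", "insufficient balance", "arrears")  # assumed keywords

def months_with_low_balance(messages):
    """Count, per month, SMS messages suggesting the balance is below the amount owed."""
    counts = Counter()
    for month, text in messages:                     # messages: [("2020-06", "..."), ...]
        if any(word in text for word in LOW_BALANCE_WORDS):
            counts[month] += 1
    return counts

def is_risky(messages, per_month_limit=3):
    """Three or more low-balance alerts in any month -> repayment willingness/ability judged poor."""
    return any(n >= per_month_limit for n in months_with_low_balance(messages).values())
```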
In addition, the user's work situation and consumption habits are analyzed from the phone's positioning data and Bluetooth clock-in records. For example, the type of work can be inferred from the place where the phone clocks in via Bluetooth, which in turn indicates the user's income situation. If the user's Bluetooth clock-in locations are all in a high-grade office building and the user clocks in about 22 times a month, the user is probably a white-collar company employee who works full months without taking leave, being absent or arriving late; the user's cash flow can then be judged to be stable, the repayment willingness and ability good, and the user a high-quality customer. From the positioning data, the user's consumption habits are analyzed according to the consumption and entertainment venues the user frequents; if the user often visits places with a high level of consumption, the repayment willingness and ability can be judged to be poor. Finally, the user's SMS information, phone positioning data, Bluetooth clock-in locations and other data can be combined to analyze the repayment willingness and ability and reach an overall judgment.
The foregoing is merely an embodiment of the present invention. Common general knowledge, such as well-known specific structures and characteristics, is not described here in detail; a person skilled in the art knows the ordinary technical knowledge of this field as of the filing date or the priority date, has access to the conventional experimental means of that time, and can, in light of the teachings of this application, complete and implement the present invention, with certain typical known structures or methods posing no obstacle to that implementation. It should be noted that a person skilled in the art can make several changes and modifications without departing from the structure of the present invention; these should also be regarded as falling within the protection scope of the present invention and do not affect the effect of implementing the invention or the practicability of the patent. The scope of protection of this application is defined by the claims, and the description, including the embodiments, may be used to interpret the contents of the claims.

Claims (10)

1. An OCR-based post-loan inspection method, characterized in that
the method comprises the following steps:
S1, photographing the photocopy of the customer material and extracting a text picture;
S2, inputting the text picture into a pre-trained text correction network model and outputting a character-level classification saliency map;
S3, rectifying the text picture and the classification saliency map with a strip-region transformation algorithm and outputting a corrected picture;
S4, using optical character recognition to convert the printed characters in the corrected picture into a black-and-white dot-matrix image file, converting the characters in the image file into text format, and outputting recognized text;
S5, performing association recognition between the recognized text and the business library, and determining error-correction words for the misrecognized word segments in the recognized text;
S6, replacing the misrecognized word segments in the recognized text with their corresponding error-correction words to obtain error-correction candidate texts, determining an error-correction confidence for each candidate text with a neural network model, and taking the candidates whose error-correction confidence exceeds a first threshold as the error-corrected text;
S7, comparing information against the error-corrected text, judging whether the comparison is consistent, and prompting manual review if it is not.
2. The OCR-based post-loan inspection method as claimed in claim 1, wherein
S2 specifically includes: S21, inputting the text picture into a fully convolutional neural network to extract features of different scales and depths; S22, fusing those features with a U-shaped network structure to obtain the character-level classification saliency map.
3. The OCR-based post-loan inspection method as claimed in claim 2, wherein
the training process of the text correction network model comprises the following steps:
A1, initializing the parameters of the text correction network model;
A2, acquiring training text pictures and real classification saliency map labels;
A3, inputting a training text picture into the text correction network model to obtain a predicted saliency map;
A4, computing the network loss from the predicted saliency map and the real classification saliency map labels, then updating the correction network's parameters according to the computed loss;
A5, repeating the above process until a set number of rounds is reached, finishing training, and saving the correction network's parameters.
4. The OCR-based post-loan inspection method as claimed in claim 3, wherein
before determining the error-correction words for the misrecognized word segments in the recognized text, the method further comprises:
determining the recognition confidence of each word segment in the recognized text, and treating every segment whose recognition confidence is below a second threshold as a misrecognized segment.
5. The OCR-based post-loan inspection method as claimed in claim 4, wherein
determining the error-correction words for the misrecognized word segments in the recognized text comprises:
for each misrecognized segment, determining its confusable words and the degree of confusability of each;
selecting among the confusable words of that segment according to a preset rule based on their degrees of confusability, and taking the selection result as the error-correction word for that segment.
6. An OCR-based post-loan inspection system, characterized in that
the system comprises:
an input module for photographing the photocopy of the customer material and extracting a text picture;
a character module for inputting the text picture into a pre-trained text correction network model and outputting a character-level classification saliency map;
a correction module for rectifying the text picture and the classification saliency map with a strip-region transformation algorithm and outputting a corrected picture;
a recognition module for using optical character recognition to convert the printed characters in the corrected picture into a black-and-white dot-matrix image file, converting the characters in the image file into text format, and outputting recognized text;
an error-correction module for performing association recognition between the recognized text and the business library and determining error-correction words for the misrecognized word segments in the recognized text;
a text module for replacing the misrecognized word segments in the recognized text with their corresponding error-correction words to obtain error-correction candidate texts, determining an error-correction confidence for each candidate text with a neural network model, and taking the candidates whose error-correction confidence exceeds a first threshold as the error-corrected text;
a comparison module for comparing information against the error-corrected text, judging whether the comparison is consistent, and prompting manual review if it is not.
7. The OCR-based post-loan inspection system as claimed in claim 6, wherein
the character module specifically includes: an extraction unit for inputting the text picture into a fully convolutional neural network to extract features of different scales and depths, and a generation unit for fusing those features with a U-shaped network structure to obtain the character-level classification saliency map.
8. The OCR-based post-loan inspection system as claimed in claim 7, wherein
the training process of the text correction network model comprises the following steps:
A1, initializing the parameters of the text correction network model;
A2, acquiring training text pictures and real classification saliency map labels;
A3, inputting a training text picture into the text correction network model to obtain a predicted saliency map;
A4, computing the network loss from the predicted saliency map and the real classification saliency map labels, then updating the correction network's parameters according to the computed loss;
A5, repeating the above process until a set number of rounds is reached, finishing training, and saving the correction network's parameters.
9. The OCR-based post-loan inspection system as claimed in claim 8, wherein the error-correction module further includes a confidence unit for determining the recognition confidence of each word segment in the recognized text before the error-correction words for the misrecognized segments are determined, and for treating every segment whose recognition confidence is below a second threshold as a misrecognized segment.
10. The OCR-based post-loan inspection system as claimed in claim 9, wherein the error-correction module further includes a confusion unit for determining the confusable words of each misrecognized segment and their degrees of confusability, selecting among those confusable words according to a preset rule based on the degrees of confusability, and taking the selection result as the error-correction word for that segment.
CN202010758400.4A 2020-07-31 2020-07-31 Post-credit check system and method based on OCR Pending CN111861731A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010758400.4A CN111861731A (en) 2020-07-31 2020-07-31 Post-credit check system and method based on OCR


Publications (1)

Publication Number Publication Date
CN111861731A true CN111861731A (en) 2020-10-30

Family

ID=72953493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010758400.4A Pending CN111861731A (en) 2020-07-31 2020-07-31 Post-credit check system and method based on OCR

Country Status (1)

Country Link
CN (1) CN111861731A (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000148844A (en) * 1998-09-11 2000-05-30 Nissan Fire & Marine Insurance Co Ltd Insurance business processing system, device and method for preparing insurance slip, and computer-readable recording medium recorded with program for executing the method by computer
CN109753636A (en) * 2017-11-01 2019-05-14 阿里巴巴集团控股有限公司 Machine processing and text error correction method and device calculate equipment and storage medium
CN107977356A (en) * 2017-11-21 2018-05-01 新疆科大讯飞信息科技有限责任公司 Method and device for correcting recognized text
CN110245331A (en) * 2018-03-09 2019-09-17 中兴通讯股份有限公司 A kind of sentence conversion method, device, server and computer storage medium
CN109190594A (en) * 2018-09-21 2019-01-11 广东蔚海数问大数据科技有限公司 Optical Character Recognition system and information extracting method
CN109815463A (en) * 2018-12-13 2019-05-28 深圳壹账通智能科技有限公司 Control method, device, computer equipment and storage medium are chosen in text editing
CN110533521A (en) * 2019-06-21 2019-12-03 深圳前海微众银行股份有限公司 Method for early warning, device, equipment and readable storage medium storing program for executing after dynamic is borrowed
CN110443236A (en) * 2019-08-06 2019-11-12 中国工商银行股份有限公司 Text will put information extracting method and device after loan
CN111079768A (en) * 2019-12-23 2020-04-28 北京爱医生智慧医疗科技有限公司 Character and image recognition method and device based on OCR
CN111144411A (en) * 2019-12-27 2020-05-12 南京大学 Method and system for correcting and identifying irregular text based on saliency map
CN111310443A (en) * 2020-02-12 2020-06-19 新华智云科技有限公司 Text error correction method and system
CN111340032A (en) * 2020-03-16 2020-06-26 天津得迈科技有限公司 Character recognition method based on application scene in financial field

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
夏昌新; 莫浩泓; 王成鑫; 王瑶; 闫仕宇: "Research and Application of Image Text Recognition Technology Based on Deep Learning" *
杜一谦: "Design and Implementation of an Intelligent Decision Model System for Consumer Credit" *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434686A (en) * 2020-11-16 2021-03-02 浙江大学 End-to-end error-containing text classification recognition instrument for OCR (optical character recognition) picture
CN112434686B (en) * 2020-11-16 2023-05-23 浙江大学 End-to-end misplaced text classification identifier for OCR (optical character) pictures
CN112396049A (en) * 2020-11-19 2021-02-23 平安普惠企业管理有限公司 Text error correction method and device, computer equipment and storage medium
CN112926306A (en) * 2021-03-08 2021-06-08 北京百度网讯科技有限公司 Text error correction method, device, equipment and storage medium
CN112926306B (en) * 2021-03-08 2024-01-23 北京百度网讯科技有限公司 Text error correction method, device, equipment and storage medium
CN112990182A (en) * 2021-05-10 2021-06-18 北京轻松筹信息技术有限公司 Finance information auditing method and system and electronic equipment
CN112990182B (en) * 2021-05-10 2021-09-21 北京轻松筹信息技术有限公司 Finance information auditing method and system and electronic equipment
CN113807953B (en) * 2021-09-24 2023-11-03 重庆富民银行股份有限公司 Wind control management method and system based on telephone return visit
CN113807953A (en) * 2021-09-24 2021-12-17 重庆富民银行股份有限公司 Wind control management method and system based on telephone return visit
CN114554086A (en) * 2022-02-10 2022-05-27 支付宝(杭州)信息技术有限公司 Auxiliary shooting method and device and electronic equipment
CN114926831A (en) * 2022-05-31 2022-08-19 平安普惠企业管理有限公司 Text-based recognition method and device, electronic equipment and readable storage medium
CN116704523B (en) * 2023-08-07 2023-10-20 山东成信彩印有限公司 Text typesetting image recognition system for publishing and printing equipment
CN116704523A (en) * 2023-08-07 2023-09-05 山东成信彩印有限公司 Text typesetting image recognition system for publishing and printing equipment

Similar Documents

Publication Publication Date Title
CN111861731A (en) Post-credit check system and method based on OCR
JP6528147B2 (en) Accounting data entry support system, method and program
US8300942B2 (en) Area extraction program, character recognition program, and character recognition device
US20060202012A1 (en) Secure data processing system, such as a system for detecting fraud and expediting note processing
US6567765B1 (en) Evaluation system and method for fingerprint verification
WO2021042505A1 (en) Note generation method and apparatus based on character recognition technology, and computer device
CN109840520A (en) A kind of invoice key message recognition methods and system
CN111784498A (en) Identity authentication method and device, electronic equipment and storage medium
CN110503099B (en) Information identification method based on deep learning and related equipment
CN113095307B (en) Automatic identification method for financial voucher information
CN112396047B (en) Training sample generation method and device, computer equipment and storage medium
CN115810134B (en) Image acquisition quality inspection method, system and device for vehicle insurance anti-fraud
CN111415336A (en) Image tampering identification method and device, server and storage medium
CN113158777A (en) Quality scoring method, quality scoring model training method and related device
CN117454426A (en) Method, device and system for desensitizing and collecting information of claim settlement data
CN116189063B (en) Key frame optimization method and device for intelligent video monitoring
CN115688107B (en) Fraud-related APP detection system and method
CN111861733A (en) Fraud prevention and control system and method based on address fuzzy matching
CN111881880A (en) Bill text recognition method based on novel network
CN110942073A (en) Container trailer number identification method and device and computer equipment
CN115205882A (en) Intelligent identification and processing method for expense voucher in medical industry
CN115116119A (en) Face recognition system based on digital image processing technology
CN114625872A (en) Risk auditing method, system and equipment based on global pointer and storage medium
CN113807256A (en) Bill data processing method and device, electronic equipment and storage medium
CN113610098B (en) Tax payment number identification method and device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2020-10-30