CN115830613A - Document intelligent acquisition sorting method, calling method, storage medium and system - Google Patents

Document intelligent acquisition sorting method, calling method, storage medium and system

Info

Publication number
CN115830613A
CN115830613A CN202310023136.3A CN202310023136A CN115830613A CN 115830613 A CN115830613 A CN 115830613A CN 202310023136 A CN202310023136 A CN 202310023136A CN 115830613 A CN115830613 A CN 115830613A
Authority
CN
China
Prior art keywords
text
sorting
document
recognition result
lines
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310023136.3A
Other languages
Chinese (zh)
Inventor
王先来
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Bairui Network Technology Co ltd
Original Assignee
Guangzhou Bairui Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Bairui Network Technology Co ltd
Priority to CN202310023136.3A
Publication of CN115830613A
Current legal status: Pending

Landscapes

  • Character Input (AREA)

Abstract

The invention discloses an intelligent document collection and sorting method, a calling method, a storage medium and a system. The method comprises the following steps: A. performing text detection on a material image to obtain text line positions; B. performing character recognition on the first n text lines to obtain a text recognition result, or performing character recognition on the first n and last n text lines to obtain the text recognition result; C. matching the text recognition result against the preset keywords of each document category and, if the matching succeeds, sorting the material image to which the text recognition result belongs into the matched document category. The sorting method realizes intelligent collection and sorting of documents, reduces the amount of character recognition, shortens the recognition time and improves recognition and sorting efficiency. It lets a teller collect all materials required for handling a business at one time when the business handling process starts, avoiding the inconvenience of collecting documents piecemeal during the process.

Description

Document intelligent acquisition sorting method, calling method, storage medium and system
Technical Field
The invention relates to the technical field of OCR (optical character recognition), in particular to an intelligent document collection and sorting method, a calling method, a storage medium and a system.
Background
In some financial businesses of financial institutions such as banks and securities brokers, materials related to the person handling the business, such as an identity card and an authorization letter, need to be captured with a camera during the transaction for identity authentication or digital filing, so as to confirm the legal identity of that person and protect his or her legal rights and interests. A financial business usually has a fixed handling process, and different process nodes may authenticate or enter different materials. For example, business W has 6 process nodes: process node 1 needs to authenticate the identity card information of the business handler, process node 3 needs to authenticate the business handler's authorization letter, and process node 5 needs to enter a statement signed by the business handler. The traditional way of collecting materials is that the teller collects each material as it is needed while handling the business according to the business process.
Taking the handling process of business W as an example: process node 1 requires authentication of the business handler's identity card, so the teller uses a camera connected to the business system to capture the identity card, and the business system takes the image currently shot by the camera as the identity card image to complete the identity card information authentication of process node 1. After process node 1, the teller handles process node 2 and then process node 3. Process node 3 requires the business handler's authorization letter, so the teller uses the camera to capture the authorization letter, and the business system takes the image currently shot by the camera as the authorization letter image to complete the authorization letter authentication of process node 3. Process node 5 is handled in the same way and is not described again. In this traditional mode, the teller has to operate the camera separately at every process node that authenticates or enters a material, which is cumbersome. One could instead collect all the materials at once, but because they are all captured by camera in a single pass, the resulting images then have to be checked, classified and named by hand. Existing OCR technology could be used to recognize the characters automatically and classify the images based on the recognition result, but it performs character recognition on the whole image, so recognizing a document with a large number of characters takes a relatively long time and is inefficient.
Disclosure of Invention
The first technical problem to be solved by the invention is to provide an intelligent document collection and sorting method that supports batch collection and intelligent sorting of materials with high efficiency.
The second technical problem to be solved by the invention is to provide a method for intelligently acquiring, sorting and calling documents during business handling, which can realize batch acquisition of materials and intelligent sorting and calling during business handling and has higher efficiency.
A third technical problem to be solved by the present invention is to provide a computer-readable storage medium for storing a computer program for implementing any one of the above methods.
The fourth technical problem to be solved by the present invention is to provide a system capable of implementing the above method for intelligently collecting, sorting and calling documents during business handling.
In order to solve the first technical problem, the present invention provides an intelligent document collection and sorting method, which comprises the steps of:
A. performing text detection on the material image to obtain text line positions;
B. performing character recognition on the text lines to obtain a text recognition result;
C. matching the text recognition result against the preset keywords of each document category and, if the matching succeeds, executing sorting step C1: sorting the material image to which the text recognition result belongs into the matched document category;
wherein step B specifically comprises:
performing character recognition on the first n text lines to obtain the text recognition result, where 1 ≤ n < Sum and Sum is the total number of text lines; or
performing character recognition on the first n and last n text lines to obtain the text recognition result, where 1 ≤ n < Sum/2.
Further, if the number of text lines is lower than a preset threshold, step B is not executed and step B0 is executed instead: character recognition is performed on all text lines to obtain the text recognition result.
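The line selection of steps B and B0 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `select_lines`, the parameter names and the choice to raise `ValueError` when the stated range of n is violated are all assumptions, and the text lines are assumed to be already in reading order.

```python
from typing import List, Sequence


def select_lines(lines: Sequence[str], n: int, both_ends: bool = False,
                 threshold: int = 0) -> List[str]:
    """Choose which detected text lines are passed to character recognition.

    lines     -- detected text lines, assumed to be in reading order
    n         -- number of lines taken from the start (and from the end)
    both_ends -- False: first n lines; True: first n and last n lines
    threshold -- hypothetical name for the preset line-count threshold
    """
    total = len(lines)  # "Sum" in the text
    # Fewer lines than the threshold: recognize every line instead of step B.
    if total < threshold:
        return list(lines)
    if both_ends:
        # Range given in the text: 1 <= n < Sum/2
        if not (1 <= n and 2 * n < total):
            raise ValueError("n must satisfy 1 <= n < Sum/2")
        return list(lines[:n]) + list(lines[-n:])
    # Range given in the text: 1 <= n < Sum
    if not 1 <= n < total:
        raise ValueError("n must satisfy 1 <= n < Sum")
    return list(lines[:n])
```

With `n=10` and `threshold=11`, a document of more than 10 lines yields its first 10 lines and a shorter one yields all of its lines.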
Further, in step C, a plurality of preset keywords of each document category are used as the keyword group of the document category, and if the matching degree between the text recognition result and the keyword group of one document category is higher than a preset threshold, it is determined that the matching is successful, where the matching degree is a ratio of the number of preset keywords of the document category contained in the text recognition result to the total number of preset keywords of the document category.
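The matching degree defined above is a simple ratio, which the sketch below computes. The function names, the use of plain substring search and the example keywords are assumptions made for illustration; the text does not say how keyword containment is tested.

```python
from typing import Iterable


def matching_degree(text: str, keywords: Iterable[str]) -> float:
    """Share of a category's preset keywords that occur in the recognized text."""
    keyword_list = list(keywords)
    if not keyword_list:
        return 0.0
    # Substring search is an assumption; the source only says "contained in".
    hits = sum(1 for keyword in keyword_list if keyword in text)
    return hits / len(keyword_list)


def is_match(text: str, keywords: Iterable[str], threshold: float) -> bool:
    """Matching succeeds when the degree is higher than the preset threshold."""
    return matching_degree(text, keywords) > threshold


# Hypothetical keyword group and recognition result.
keywords = ["authorization", "principal", "agent", "hereby"]
text = "Letter of authorization: the principal hereby entrusts the following matters"
degree = matching_degree(text, keywords)  # 3 of the 4 keywords occur in the text
```

Here three of the four keywords are found, so the degree is 0.75 and the match succeeds only if the preset threshold is below that value.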
Further, in step C1, the sorted material images are stored in a list to be confirmed for the user to verify and confirm.
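One possible shape for such a list to be confirmed is sketched below. The class and field names are hypothetical, and the step of moving a confirmed image into an archive follows the embodiment's description rather than any stated data structure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SortedImage:
    file_name: str  # name given to the image in sorting step C1
    category: str   # document category the image was sorted into


@dataclass
class ConfirmationList:
    pending: List[SortedImage] = field(default_factory=list)
    archived: List[SortedImage] = field(default_factory=list)

    def add(self, image: SortedImage) -> None:
        """Store a freshly sorted image until the user has checked it."""
        self.pending.append(image)

    def confirm(self, file_name: str) -> SortedImage:
        """Move an image from the pending list to the archive after verification."""
        for index, image in enumerate(self.pending):
            if image.file_name == file_name:
                self.archived.append(self.pending.pop(index))
                return self.archived[-1]
        raise KeyError(file_name)
```

An image that the user never confirms simply stays in `pending`, so nothing is archived without a check.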
Further, in the step a, after the text line position is obtained, it is determined whether the text direction of the text line is forward, and if not, an included angle between the current text direction of the text line and the forward direction is calculated, and the text direction of the text line is rotated to the forward direction by rotating the material image.
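The included angle can be derived from a text line's baseline, as in the sketch below. It assumes that the text detector reports the start and end points of each line in reading order and that "forward" means left-to-right along the image's x axis; neither assumption comes from the source, and the function names and tolerance are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def text_angle(start: Point, end: Point) -> float:
    """Angle in degrees between a line's reading direction and the x axis.

    0 means the line already runs in the forward direction.
    """
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    return math.degrees(math.atan2(dy, dx))


def correction_angle(start: Point, end: Point, tolerance: float = 1.0) -> float:
    """Angle by which the image would be rotated to make the text forward.

    Returns 0.0 when the line is within `tolerance` degrees of forward,
    a hypothetical cut-off that avoids rotating for negligible skew.
    """
    angle = text_angle(start, end)
    if abs(angle) <= tolerance:
        return 0.0
    return -angle
```

The returned value would then be handed to whatever image rotation routine the system uses; that routine is outside this sketch.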
In order to solve the second technical problem, the invention provides a method for intelligently acquiring, sorting and calling documents during business handling, which comprises the following steps:
S1. when a business handling process is started, calling a camera to collect, in a batch, multiple material images that belong to different materials required by the business handling process, and sorting each material image with the above intelligent document collection and sorting method to obtain the document category corresponding to each material image;
S2. handling the business according to the order of the process nodes and, if the process node currently being handled needs the material of a certain document category, calling from the multiple material images the material image that was sorted into that document category.
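Steps S1 and S2 can be outlined as below. The sketch takes the classifier as a parameter so that it stays independent of any OCR library; the function names, the use of `None` for nodes that need no material and the category labels are illustrative assumptions.

```python
from typing import Callable, Dict, Iterable, List, Optional


def sort_batch(images: Iterable[str],
               classify: Callable[[str], Optional[str]]) -> Dict[str, str]:
    """S1: sort every captured image and index it by document category."""
    by_category: Dict[str, str] = {}
    for image in images:
        category = classify(image)
        if category is not None:  # unmatched images are left out
            by_category[category] = image
    return by_category


def handle_business(nodes: List[Optional[str]],
                    by_category: Dict[str, str]) -> List[Optional[str]]:
    """S2: walk the process nodes in order and call up the image each one needs.

    Each entry of `nodes` is the required document category, or None when
    the node needs no material.
    """
    called: List[Optional[str]] = []
    for required in nodes:
        called.append(None if required is None else by_category[required])
    return called


# Hypothetical stand-in for the sorting method: a fixed lookup table.
labels = {"img_1.jpg": "authorization letter",
          "img_2.jpg": "identity card",
          "img_3.jpg": "statement"}
sorted_images = sort_batch(labels, labels.get)
nodes = ["identity card", None, "authorization letter", None, "statement", None]
called = handle_business(nodes, sorted_images)
```

The capture order does not matter: the three images are shot in one batch and each node later receives the image of the category it asks for.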
Further, the business is specifically a financial business.
In order to solve the third technical problem, the present invention provides a computer-readable storage medium, on which an executable computer program is stored, wherein the computer program can implement the method for intelligently collecting and sorting documents as described above or implement the method for intelligently collecting, sorting and calling documents during business transaction as described above.
In order to solve the fourth technical problem, the present invention provides a system for intelligently collecting, sorting and calling documents during business handling, which includes a camera and business handling equipment connected to the camera. The business handling equipment includes a processor and a computer-readable storage medium, the computer-readable storage medium being the computer-readable storage medium described above, and the processor can execute the computer program in the storage medium to implement the above intelligent document collection and sorting method and/or the above method for intelligently collecting, sorting and calling documents during business handling.
The inventor found that although a document may be very long, the keywords that characterize its document category are usually contained in the title, the first few lines or the last few lines of the document. The intelligent document collection and sorting method provided by the invention therefore recognizes only the first n lines, or only the first n and last n lines, during character recognition. This still provides enough text for matching document category keywords, while reducing the amount of character recognition, shortening the recognition time and improving recognition and sorting efficiency.
Drawings
Fig. 1 is a schematic flow diagram of a first embodiment of an intelligent document collection and sorting method provided by the present invention.
Fig. 2 is a schematic flow chart of a second embodiment of the document intelligent collection and sorting method provided by the present invention.
Fig. 3 is a schematic flow chart of the method for intelligently acquiring, sorting and calling documents during business transaction according to the present invention.
Detailed Description
The invention is described in detail below with reference to specific examples.
Example one
The embodiment provides a system for intelligently acquiring, sorting and calling documents during business handling, which comprises a camera and business handling equipment connected with the camera, wherein the business handling equipment comprises a processor and a computer-readable storage medium. The storage medium stores executable computer programs, and the processor of the business handling equipment can execute the computer programs in the storage medium, so as to realize the method for intelligently acquiring, sorting and calling the documents during business handling as shown in fig. 3. The following describes the implementation of the method by taking the financial transaction process of a bank as an example.
A user handles business W at a bank counter. Business W is a financial business offered by the bank, and the banking system divides its handling process into 6 process nodes: process node 1 needs to authenticate the identity card information of the business handler, process node 3 needs to authenticate the business handler's authorization letter, and process node 5 needs to enter a statement signed by the business handler. When handling business W for the user, the bank teller obtains all materials required by the business (the identity card, the authorization letter and the statement) from the user and then starts the business W handling process in the banking system client installed on the business handling equipment (such as a computer). After the process is started, the banking system calls the camera connected to the business handling equipment to capture images of the materials required for the business, and the bank teller places the identity card, the authorization letter and the statement at the shooting position of the camera to be photographed. After the banking system has collected the material images of the three materials, it sorts each of the three material images with the intelligent document collection and sorting method shown in fig. 1.
The sorting process comprises the following steps:
First, the banking system performs text detection on the material image with a text detection algorithm, which detects the position of each text line in the material image and marks a text line recognition box for each text line in the image. Second, the banking system judges whether the text direction of the text line recognition boxes is forward, for example from the height and width of the boxes: if the height of several text line recognition boxes is greater than their width, the text in the material image is vertical and the included angle between the current text direction of the text lines and the forward direction is 90 degrees, so the material image is rotated by 90 degrees to turn the text direction of the text lines to the forward direction. Third, the banking system puts the text lines of the material image, whose text direction is now forward, in order. If there are more than 10 text lines, as in the material images of the authorization letter and the statement, a character recognition algorithm is applied to the first 10 text lines of the material image to obtain the text recognition result. If there are not more than 10 text lines, as in the material image of the identity card, the character recognition algorithm is applied to all text lines of the material image to obtain the text recognition result. This embodiment uses 10 lines as the threshold on the number of text lines; other embodiments may use other values, for example 5 or 15 lines, in which case character recognition is correspondingly performed on only the first 5 or only the first 15 lines.
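The height-versus-width check and the 90-degree rotation described in this step can be sketched as follows. The image is modelled as a nested list of pixel values to keep the example free of imaging libraries. Treating "several boxes" as a strict majority and offering both rotation directions are assumptions, since the text fixes neither.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int]  # (width, height) of one text line recognition box


def is_vertical(boxes: Sequence[Box]) -> bool:
    """True when most boxes are taller than they are wide.

    The strict-majority rule is an assumption of this sketch.
    """
    tall = sum(1 for width, height in boxes if height > width)
    return tall * 2 > len(boxes)


def rotate_90(image: List[List[int]], clockwise: bool = True) -> List[List[int]]:
    """Rotate a row-major pixel grid by 90 degrees."""
    if clockwise:
        return [list(row) for row in zip(*image[::-1])]
    return [list(row) for row in zip(*image)][::-1]


def make_forward(image: List[List[int]], boxes: Sequence[Box],
                 clockwise: bool = True) -> List[List[int]]:
    """Rotate the image only when its text lines are detected as vertical."""
    return rotate_90(image, clockwise) if is_vertical(boxes) else image
```

Which direction restores upright text depends on how the page was placed under the camera, so a real system would need a further cue to choose between the two.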
Finally, the text recognition result is matched against the preset keywords of each document category. In this embodiment, several keyword groups are configured in advance for each document category commonly used in financial business. The keyword groups of a category are combined with OR logic: the material image matches the document category as soon as any one keyword group is hit in its text recognition result. The keywords within a group are combined with AND logic: a group is hit only when the text recognition result contains all keywords of the group, that is, when the keyword hit rate within the group is 100%. Alternatively, a group may be treated as hit when the keyword hit rate within the group is more than 80%. The hit rate is the matching degree between the text recognition result and the keyword group, namely the ratio of the number of keywords of a single keyword group contained in the text recognition result to the total number of keywords in that group. After the text recognition result has been matched to a document category, the banking system sorts the material image to which the text recognition result belongs into that document category; specifically, it renames the file of the material image to the matched document category and then archives the image. Preferably, the banking system does not archive the material image immediately after sorting, but first stores it in a list to be confirmed for the bank teller to verify and confirm. After the bank teller has verified that the material image is correct, the banking system archives the named material image.
After the three material images are sorted, the banking system can handle the business according to the order of the process nodes. Process node 1 needs to authenticate the identity card information of the business handler, so when the bank teller handles process node 1 in the banking system, the banking system automatically calls, from the three sorted material images, the material image whose file name is the identity card category and authenticates the user's identity card information from it. After process node 1, the bank teller handles process node 2 and then process node 3. Process node 3 needs to authenticate the business handler's authorization letter, so the banking system automatically calls, from the three sorted material images, the material image whose file name is the authorization letter category and authenticates the user's authorization letter from it. After process node 3, process nodes 4 and 5 are handled. Process node 5 needs to enter the statement signed by the business handler, so when the banking system enters process node 5, it automatically calls, from the three sorted material images, the material image whose file name is the statement category and enters it into the process node as the statement. In this way the bank teller can collect all materials required for the business at one time when the business handling process starts, avoiding the inconvenience of collecting documents piecemeal during the process.
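The OR-across-groups, AND-within-group logic can be sketched as below. The keyword configuration is invented for illustration and is not taken from the source. Returning the first matching category and comparing the hit rate with `>=` (so that 1.0 means "all keywords") are further assumptions of this sketch.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical configuration: document category -> keyword groups.
KEYWORD_GROUPS: Dict[str, List[List[str]]] = {
    "identity card": [["Resident Identity Card"],
                      ["Citizen ID number", "Date of birth"]],
    "authorization letter": [["authorization", "principal", "agent"]],
    "statement": [["statement", "signature"]],
}


def group_hit(text: str, group: List[str], min_rate: float = 1.0) -> bool:
    """AND logic: the group is hit when enough of its keywords occur in the text."""
    if not group:
        return False
    hits = sum(1 for keyword in group if keyword in text)
    return hits / len(group) >= min_rate


def classify(text: str, config: Dict[str, List[List[str]]],
             min_rate: float = 1.0) -> Optional[str]:
    """OR logic: a category matches as soon as any one of its groups is hit."""
    for category, groups in config.items():
        if any(group_hit(text, group, min_rate) for group in groups):
            return category  # first match wins, an assumption of this sketch
    return None


def sorted_file_name(category: str, original: str) -> str:
    """Name the image after its category, keeping the original extension."""
    return category + Path(original).suffix
```

Lowering `min_rate` below 1.0 corresponds to the relaxed variant in which a group counts as hit without every keyword being present.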
Example two
This embodiment is substantially the same as the first embodiment, and only the differences between the two embodiments will be described below.
In this embodiment, the banking system does not use the intelligent document collection and sorting method shown in fig. 1, but sorts each of the three material images with the intelligent document collection and sorting method shown in fig. 2. The difference from the first embodiment lies in how many text lines are recognized during sorting. Specifically, if there are more than 20 text lines, as in the material images of the authorization letter and the statement, a character recognition algorithm is applied to the first 10 and the last 10 text lines of the material image, whose text direction is forward, to obtain the text recognition result. If there are not more than 20 text lines, as in the material image of the identity card, the character recognition algorithm is applied to all text lines of the material image to obtain the text recognition result. The sorting method of this embodiment suits documents in which keywords reflecting the document category appear at both the beginning and the end, and can therefore improve sorting accuracy for such documents.
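The recognition workload of the two embodiments can be compared with a small helper. The function name and the example line counts are hypothetical; only the thresholds of 10 and 20 lines come from the embodiments.

```python
def recognized_count(total: int, per_end: int = 10, both_ends: bool = False) -> int:
    """Number of text lines passed to character recognition.

    both_ends=False models the first embodiment (first `per_end` lines),
    both_ends=True the second (first and last `per_end` lines).
    """
    limit = 2 * per_end if both_ends else per_end
    # At or below the limit, every line is recognized.
    return total if total <= limit else limit


# Hypothetical documents: a 6-line identity card and a 45-line letter.
first_embodiment = [recognized_count(6), recognized_count(45)]
second_embodiment = [recognized_count(6, both_ends=True),
                     recognized_count(45, both_ends=True)]
```

For the 45-line example the second embodiment reads 20 lines instead of 10, which is the cost paid for also covering keywords near the end of the document.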
Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the protection scope of the present invention, although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solutions of the present invention can be modified or substituted equivalently without departing from the spirit and scope of the technical solutions of the present invention.

Claims (9)

1. An intelligent document collecting and sorting method comprises the following steps:
A. performing text detection on the material image to obtain a text line position;
B. performing character recognition on the text line to obtain a text recognition result;
C. matching the text recognition result against the preset keywords of each document category and, if the matching succeeds, executing sorting step C1: sorting the material image to which the text recognition result belongs into the matched document category;
characterized in that step B specifically comprises:
performing character recognition on the first n text lines to obtain the text recognition result, where 1 ≤ n < Sum and Sum is the total number of text lines; or
performing character recognition on the first n and last n text lines to obtain the text recognition result, where 1 ≤ n < Sum/2.
2. The method as claimed in claim 1, wherein if the number of text lines is lower than a preset threshold, step B is not executed and step B0 is executed instead: character recognition is performed on all text lines to obtain the text recognition result.
3. The method according to claim 1, wherein in step C, there are a plurality of preset keywords for each document category, and the preset keywords are used as the keyword groups for the document category, and if the matching degree between the text recognition result and the keyword group for a document category is higher than a preset threshold, the matching is determined to be successful, wherein the matching degree is the ratio of the number of preset keywords for the document category contained in the text recognition result to the total number of preset keywords for the document category.
4. The method as claimed in claim 1, wherein in step C1, the sorted material images are stored in a list to be confirmed for verification and confirmation by the user.
5. The intelligent document collecting and sorting method as claimed in claim 1, wherein in step A, after the text line positions are obtained, it is determined whether the text direction of the text lines is forward, and if not, the included angle between the current text direction of the text lines and the forward direction is calculated, and the text direction of the text lines is turned to the forward direction by rotating the material image.
6. A method for intelligently acquiring, sorting and calling documents during business handling is characterized by comprising the following steps:
S1. when a business handling process is started, calling a camera to collect, in a batch, multiple material images that belong to different materials required by the business handling process, and sorting each material image with the intelligent document collection and sorting method according to any one of claims 1 to 5 to obtain the document category corresponding to each material image;
S2. handling the business according to the order of the process nodes and, if the process node currently being handled needs the material of a certain document category, calling from the multiple material images the material image that was sorted into that document category.
7. The method according to claim 6, wherein the business is financial business.
8. A computer-readable storage medium, on which an executable computer program is stored, the computer program being adapted to implement the method for intelligently collecting and sorting documents according to any one of claims 1 to 5 or the method for intelligently collecting, sorting and retrieving documents during business operations according to claim 6 or 7.
9. A system for intelligently acquiring, sorting and calling documents during business transaction, comprising a camera and business transaction equipment connected with the camera, wherein the business transaction equipment comprises a processor and a computer readable storage medium, and the computer readable storage medium is the computer readable storage medium according to claim 8, and the processor of the business transaction equipment can execute a computer program in the storage medium so as to realize the intelligent acquisition and sorting method for documents according to any one of claims 1 to 5 and/or realize the method for intelligently acquiring, sorting and calling documents during business transaction according to claim 6 or 7.
CN202310023136.3A 2023-01-09 2023-01-09 Document intelligent acquisition sorting method, calling method, storage medium and system Pending CN115830613A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310023136.3A CN115830613A (en) 2023-01-09 2023-01-09 Document intelligent acquisition sorting method, calling method, storage medium and system

Publications (1)

Publication Number Publication Date
CN115830613A true CN115830613A (en) 2023-03-21

Family

ID=85520357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310023136.3A Pending CN115830613A (en) 2023-01-09 2023-01-09 Document intelligent acquisition sorting method, calling method, storage medium and system

Country Status (1)

Country Link
CN (1) CN115830613A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110078191A1 (en) * 2009-09-28 2011-03-31 Xerox Corporation Handwritten document categorizer and method of training
US9135517B1 (en) * 2012-11-29 2015-09-15 Amazon Technologies, Inc. Image based document identification based on obtained and stored document characteristics
CN112085022A (en) * 2020-09-09 2020-12-15 上海蜜度信息技术有限公司 Method, system and equipment for recognizing characters
CN112434640A (en) * 2020-12-04 2021-03-02 小米科技(武汉)有限公司 Method and device for determining rotation angle of document image and storage medium
US20210319247A1 (en) * 2020-04-10 2021-10-14 I.R.I.S. Text classification
CN114005131A (en) * 2021-11-02 2022-02-01 京东科技信息技术有限公司 Certificate character recognition method and device
CN115410211A (en) * 2022-08-30 2022-11-29 上海浦东发展银行股份有限公司 Image classification method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230321