CN115827869A - Document image processing method and device, electronic equipment and storage medium


Info

Publication number
CN115827869A
Authority
CN
China
Prior art keywords
document
bill
message
document image
identification
Prior art date
Legal status
Pending
Application number
CN202211663202.5A
Other languages
Chinese (zh)
Inventor
张文宇
卜丽
陆佳庆
Current Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date
Filing date
Publication date
Application filed by China Construction Bank Corp, CCB Finetech Co Ltd filed Critical China Construction Bank Corp
Priority to CN202211663202.5A
Publication of CN115827869A
Legal status: Pending

Abstract

The application discloses a document image processing method and apparatus, an electronic device, and a storage medium, belonging to the technical field of image processing. The method includes: performing document category identification on an acquired document image in a letter of credit presentation service by using at least two pre-trained classification models respectively; determining the document category of the document image based on the recognition results of the at least two classification models; if the document category is a letter of credit message, matching the text content in the document image against a plurality of regular expressions, where the plurality of regular expressions are predetermined according to key sentences of each type of letter of credit message on each message page; and then determining the processing result of the document image based on the message category and the message page number corresponding to the successfully matched regular expression. In this way, a document image belonging to a letter of credit message can be identified automatically, together with which type of letter of credit message it is and which page of that message it is, so that document images are processed more efficiently.

Description

Document image processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a document image processing method and apparatus, an electronic device, and a storage medium.
Background
In recent years, letter of credit presentation services have accounted for a growing share of banks' export document examination business, and these services involve many types of document images. At present, document images are classified by examiners relying on years of business experience; after a document image is determined to be a letter of credit message, the examiner must further judge from experience whether it is an MT700 message or an MT707 message and which page of that message it is. This entails high labor cost and low document image processing efficiency.
Disclosure of Invention
The embodiments of the application provide a document image processing method and apparatus, an electronic device, and a storage medium, so as to solve the problems of high labor cost and low processing efficiency when processing document images in the related art.
In a first aspect, an embodiment of the present application provides a method for processing a document image, including:
performing document category identification on an acquired document image in a letter of credit presentation service by using at least two pre-trained classification models respectively;
determining the document category of the document image based on the recognition results of the at least two classification models;
if the document category is a letter of credit message, matching the text content in the document image against a plurality of regular expressions, wherein the plurality of regular expressions are predetermined according to key sentences of each type of letter of credit message on each message page; and
determining a processing result of the document image based on the message category and the message page number corresponding to the successfully matched regular expression.
In some embodiments, performing document category identification on the acquired document image in the letter of credit presentation service by using at least two pre-trained classification models respectively includes:
converting the document image into hypertext markup language (HTML) text;
extracting the content of the HTML text to obtain the text content contained in the document image; and
inputting the text content into each classification model for document category identification.
In some embodiments, inputting the text content into each classification model for document category identification includes:
converting the text content into a term frequency-inverse document frequency (tf-idf) word vector; and
inputting the tf-idf word vector into each classification model for document category identification.
In some embodiments, before the text content is input into each classification model for document category identification, the method further includes:
determining that the number of characters contained in the text content is not less than a preset value.
In some embodiments, the model complexity of each classification model is lower than the specified complexity.
In some embodiments, determining the document category of the document image based on the recognition results of the at least two classification models includes:
counting the recognition results of the at least two classification models;
if there is only one recognition result with the highest count, determining that recognition result as the document category of the document image; and
if there are at least two recognition results with the highest count, taking the recognition result of the classification model with the highest predetermined accuracy as the document category of the document image.
In a second aspect, an embodiment of the present application provides an apparatus for processing a document image, including:
an identification module, configured to perform document category identification on an acquired document image in a letter of credit presentation service by using at least two pre-trained classification models respectively;
a category determining module, configured to determine the document category of the document image based on the recognition results of the at least two classification models;
a matching module, configured to match the text content in the document image against a plurality of regular expressions if the document category is a letter of credit message, wherein the plurality of regular expressions are predetermined according to key sentences of each type of letter of credit message on each message page; and
a result determining module, configured to determine a processing result of the document image based on the message category and the message page number corresponding to the successfully matched regular expression.
In some embodiments, the identification module is specifically configured to:
converting the document image into hypertext markup language (HTML) text;
extracting the content of the HTML text to obtain the text content contained in the document image; and
inputting the text content into each classification model for document category identification.
In some embodiments, the identification module is specifically configured to:
converting the text content into a term frequency-inverse document frequency (tf-idf) word vector; and
inputting the tf-idf word vector into each classification model for document category identification.
In some embodiments, the identification module is further to:
before the text content is input into each classification model for document category identification, determine that the number of characters contained in the text content is not less than a preset value.
In some embodiments, the model complexity of each classification model is lower than the specified complexity.
In some embodiments, the category determination module is specifically configured to:
counting the recognition results of the at least two classification models;
if there is only one recognition result with the highest count, determining that recognition result as the document category of the document image; and
if there are at least two recognition results with the highest count, taking the recognition result of the classification model with the highest predetermined accuracy as the document category of the document image.
In a third aspect, an embodiment of the present application provides an electronic device, including: at least one processor, and a memory communicatively coupled to the at least one processor, wherein:
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of processing a document image as described above.
In a fourth aspect, an embodiment of the present application provides a storage medium, and when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is capable of executing the above method for processing a document image.
In a fifth aspect, an embodiment of the present application provides a computer program product, which, when the computer program product is called and executed by an electronic device, causes the electronic device to execute the above method for processing a document image.
In the embodiments of the application, at least two pre-trained classification models are used to perform document category identification on an acquired document image in a letter of credit presentation service respectively; the document category of the document image is determined based on the recognition results of the at least two classification models; if the document category is a letter of credit message, the text content in the document image is matched against a plurality of regular expressions, where the plurality of regular expressions are predetermined according to key sentences of each type of letter of credit message on each message page; and the processing result of the document image is then determined based on the message category and the message page number corresponding to the successfully matched regular expression. In this way, a document image belonging to a letter of credit message can be identified automatically, together with which type of letter of credit message it is and which page of that message it is, so that document images are processed more efficiently.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic view of an application scenario of a method for processing a document image according to an embodiment of the present application;
FIG. 2 is a flowchart of a document image processing method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of another document image processing method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a document image processing apparatus according to an embodiment of the present application;
fig. 5 is a schematic hardware structure diagram of an electronic device for implementing a document image processing method according to an embodiment of the present application.
Detailed Description
In order to solve the problems of high labor cost and low processing efficiency in processing document images in the related art, embodiments of the application provide a document image processing method and device, an electronic device and a storage medium.
The preferred embodiments of the present application will be described below in conjunction with the accompanying drawings. It should be understood that the preferred embodiments described herein are only used to illustrate and explain the present application and are not intended to limit it, and that the embodiments and the features of the embodiments in the present application may be combined with each other without conflict. In addition, in the embodiments of the present application, the acquisition, storage, use, and processing of data all comply with the relevant provisions of national laws and regulations.
To facilitate understanding of the present application, the technical terms involved are explained first:
Letter of credit: a written undertaking issued by a bank to the exporter (seller), at the request of the importer (buyer), guaranteeing that the bank assumes responsibility for payment. Under a letter of credit, provided the terms of the credit are complied with, the bank authorizes the exporter to draw a draft not exceeding a prescribed amount on the bank or its designated bank, to attach the shipping documents as prescribed, and to collect payment at a designated place on schedule. In international trade activities, the buyer and the seller may not trust each other, so two banks act as guarantors for the two parties, collecting payment and handing over documents on their behalf instead of direct cash settlement. The instrument used by the banks in this activity is the letter of credit.
Documentary letter of credit: a documentary credit is a letter of credit with shipping documents attached, or one that is paid only against shipping documents. Most international trade settlement uses documentary credits.
MT700 message: MT700 is the message format for the electronic issuance of a documentary credit; that is, electronically issued letters of credit in common use today adopt this message format.
MT707 message: MT707 is the message format for amendments to a documentary credit. The contents of the MT700 message and the MT707 message are partly similar, and the MT707 message is used to describe which amendments are made to the MT700 message.
Term frequency-inverse document frequency (TF-IDF) is a statistical method for evaluating the importance of a word to a piece of text content. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to the frequency with which it appears in the corpus.
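For reference, one common formulation of the tf-idf weight of a term t in a document d drawn from a corpus of N documents (the application itself does not fix a particular variant) is:

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \times \log\frac{N}{1 + \mathrm{df}(t)}
```

where tf(t, d) is the number of times t occurs in d and df(t) is the number of documents in the corpus that contain t.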
Optical Character Recognition (OCR) refers to the process in which an electronic device (e.g., a scanner or a digital camera) examines the characters in an image, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text using character recognition methods.
Fig. 1 is a schematic diagram of a possible application scenario provided in an embodiment of the present application, where the application scenario includes: a target terminal (101a, 101b) and a server 102. The target terminals (101a, 101b) and the server 102 can exchange information through a communication network, and the communication mode adopted by the communication network can be a wireless communication mode or a wired communication mode.
For example, the target terminals (101a, 101b) may communicate with the server 102 by accessing the network via cellular mobile communication technology, including the 5th Generation (5G) mobile network technology. As another example, the target terminals (101a, 101b) may communicate with the server 102 via short-range wireless communication, including Wireless Fidelity (Wi-Fi) technology.
The number of the devices is not limited in the embodiment of the present application, and as shown in fig. 1, the target terminal (101a, 101b) and the server 102 are only used as an example for description, and the devices and their respective functions are briefly described below.
The target terminal (101a, 101b) may be a device that provides voice and/or data connectivity to the user. For example, the target terminals (101a, 101b) include but are not limited to: a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a mobile internet device (MID), a wearable device, a virtual reality (VR) device, an augmented reality (AR) device, a wireless terminal device in industrial control, a wireless terminal device in unmanned driving, a wireless terminal device in a smart grid, a wireless terminal device in transportation safety, a wireless terminal device in a smart city, a wireless terminal device in a smart home, and the like.
In addition, the target terminals (101a, 101b) may have a client related to image classification installed on them; the client may be software (e.g., an APP, a browser, or short-video software) or a web page, an applet, or the like. In an embodiment of the application, the target terminal (101a, 101b) may use the image classification client to send document images to the server 102. For example, in a letter of credit presentation service scenario, multiple document images may be sent to the server 102 for examination.
Further, the server 102 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like.
Based on the above application scenario, the document image processing method provided in the embodiments of the present application is further described below with reference to the accompanying drawings. Referring to fig. 2, an embodiment of the present application provides a document image processing method, which can be applied to the server 102 shown in fig. 1 and includes the following steps:
in step 201, document type recognition is performed on the document images in the obtained credit card delivery service respectively by using at least two classification models trained in advance.
The letter of credit presentation service may be a presentation service under a letter of credit advised by another bank, or under a letter of credit issued by the bank itself. The document images in the letter of credit presentation service may include an image of the original letter of credit, an image of the letter of credit message, an image of the "export presentation document list", an image of the "application for letter of credit presentation", and the like.
In some embodiments, the document image may be converted into hypertext markup language (HTML) text, the content of the HTML text may be extracted to obtain the text content contained in the document image, and the text content may then be input into each classification model for document category identification.
In practical applications, when the number of characters contained in the text content is less than a preset value, such as 20, the document image may be blank or may not contain actual business data. Therefore, before the text content is input into each classification model for document category identification, it can be judged whether the number of characters contained in the text content is not less than the preset value; if so, the subsequent operations are performed, and otherwise the processing of the document image can be ended.
In addition, considering that the text content in the document image carries certain business meaning, in order to perform document category identification faster and better, after it is determined that the number of characters contained in the text content is not less than the preset value, the text content can be converted into a tf-idf word vector, and the tf-idf word vector is then input into each classification model for document category identification.
In some embodiments, the model complexity of each classification model is lower than a specified complexity, i.e., each classification model is a lightweight model. For example, the classification models may be selected from a support vector machine (SVM) model, a logistic regression (LR) model, and a K-nearest neighbor (KNN) model. A lightweight model needs fewer samples for training and trains faster, which reduces the manual annotation workload for samples and speeds up model training.
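As an illustration only, a minimal sketch of this identification step is given below, assuming scikit-learn, a fitted TfidfVectorizer, and pre-trained SVM, LR, and KNN classifiers passed in by the caller; the function and parameter names are not taken from this application.

```python
from typing import List, Optional

def classify_document(text: str,
                      vectorizer,              # fitted sklearn TfidfVectorizer (assumed)
                      models: List,            # pre-trained SVM / LR / KNN classifiers (assumed)
                      min_chars: int = 20) -> Optional[List[str]]:
    """Run each lightweight classifier on the text extracted from one document image."""
    # Skip blank pages or pages without actual business data.
    if len(text) < min_chars:
        return None
    # Convert the text content into a tf-idf word vector.
    vec = vectorizer.transform([text])
    # Collect one recognition result per classification model,
    # e.g. "lc_message" or "other".
    return [model.predict(vec)[0] for model in models]
```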
In step 202, based on the recognition results of the at least two classification models, a document category of the document image is determined.
For example, the recognition results of the at least two classification models are counted; if there is only one recognition result with the highest count, that recognition result is determined as the document category of the document image; if there are at least two recognition results with the highest count, the recognition result of the classification model with the highest predetermined accuracy is taken as the document category of the document image.
In this way, the recognition results of the at least two classification models are combined to determine the final recognition result, which compensates for the shortcomings of any single binary classification model and improves the accuracy of the final recognition result.
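A sketch of this voting logic follows, assuming each classifier also has a predetermined accuracy measured in advance; the names and the exact tie-breaking detail are illustrative rather than taken from this application.

```python
from collections import Counter
from typing import List

def vote(results: List[str], accuracies: List[float]) -> str:
    """results: one recognition result per model; accuracies: the predetermined
    accuracy of each model, in the same order as results."""
    counts = Counter(results)
    top_count = counts.most_common(1)[0][1]
    tied = [r for r, c in counts.items() if c == top_count]
    if len(tied) == 1:
        # Only one result has the highest count: use it as the document category.
        return tied[0]
    # Otherwise take the result of the model with the highest predetermined accuracy.
    best_model = max(range(len(results)), key=lambda i: accuracies[i])
    return results[best_model]
```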
In step 203, if the document category is a letter of credit message, the text content in the document image is matched against a plurality of regular expressions, where the plurality of regular expressions are predetermined according to key sentences of each type of letter of credit message on each message page.
Here, each regular expression is a pattern statement constructed in advance from those key sentences.
In practical applications, letter of credit messages include MT700 messages and MT707 messages, where an MT707 message describes what amendments are made to an MT700 message. An MT700 message typically has 1-3 pages, while an MT707 message typically has only 1 or 2 pages. In order to determine which page of an MT700 message or an MT707 message the document image is, the key sentences of the MT700 message and the MT707 message on each message page may be analyzed, and a plurality of regular expressions constructed based on the analysis results.
It should be noted that more than one regular expression may be constructed for a given page of a given type of message.
In step 204, the processing result of the document image is determined based on the message category and the message page number corresponding to the successfully matched regular expression.
For example, if only one regular expression matches successfully, the message category and message page number corresponding to that regular expression can be determined as the processing result of the document image; if at least two regular expressions match, the message category and message page number corresponding to each successfully matched regular expression can be determined, and the message category and page number determined the most times are taken as the processing result of the document image.
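The matching in steps 203-204 could be sketched as follows; the patterns and their page assignments are hypothetical placeholders, since the application does not disclose the actual key sentences.

```python
import re
from collections import Counter
from typing import Optional, Tuple

# Hypothetical regular expressions, each tied to a (message category, page number)
# pair derived from a key sentence assumed to appear on that page.
PATTERNS = [
    (re.compile(r"ISSUE OF A DOCUMENTARY CREDIT"), ("MT700", 1)),
    (re.compile(r"40A:\s*FORM OF DOCUMENTARY CREDIT"), ("MT700", 1)),
    (re.compile(r"47A:\s*ADDITIONAL CONDITIONS"), ("MT700", 2)),
    (re.compile(r"AMENDMENT TO A DOCUMENTARY CREDIT"), ("MT707", 1)),
]

def classify_message_page(text: str) -> Optional[Tuple[str, int]]:
    """Return the (message category, page number) matched by the most regular expressions."""
    hits = Counter(label for pattern, label in PATTERNS if pattern.search(text))
    if not hits:
        return None
    # The category and page number determined the most times become the result.
    return hits.most_common(1)[0][0]
```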
In the embodiments of the application, a document image belonging to a letter of credit message can be identified automatically, together with which type of letter of credit message it is and which page of that message it is, so that document images are processed more efficiently.
Fig. 3 is a flowchart of processing a document image according to another embodiment of the present application, including the following steps.
In step 301, a document image in a letter of credit presentation service advised by another bank is obtained.
Generally, when documents are presented under a letter of credit advised by another bank, documents such as the original letter of credit, the letter of credit message, the export presentation document list, and the application for letter of credit presentation are scanned into document images.
In step 302, the document image is converted into HTML text using OCR technology.
In step 303, content extraction is performed on the HTML text to obtain text content included in the document image.
When the text content contains fewer than 20 characters, the document image is a blank page or has no substantive business meaning and can be discarded directly, i.e., the subsequent steps are not performed.
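A minimal sketch of steps 302-303 is shown below, using Tesseract's hOCR output and BeautifulSoup purely as illustrative stand-ins; the application does not specify which OCR engine or HTML parser is used.

```python
import pytesseract                     # assumed OCR engine, not named in the application
from PIL import Image
from bs4 import BeautifulSoup          # assumed HTML parser, not named in the application

def image_to_text(image_path: str, min_chars: int = 20):
    """OCR a document image into HTML (hOCR), then extract its plain text content."""
    # Step 302: convert the document image into HTML text via OCR.
    hocr = pytesseract.image_to_pdf_or_hocr(Image.open(image_path), extension="hocr")
    # Step 303: extract the text content contained in the HTML text.
    text = BeautifulSoup(hocr, "html.parser").get_text(separator=" ", strip=True)
    # Fewer than min_chars characters: blank page or no substantive business meaning.
    return text if len(text) >= min_chars else None
```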
In step 304, the text content is converted into a tf-idf word vector, and the tf-idf word vector is input into each of three pre-trained classification models for document category identification.
For example, the three classification models are an SVM model, an LR model, and a KNN model. To increase recognition speed, the three models may perform only binary classification of whether the document image is a letter of credit message. All three models are lightweight, i.e., their model complexity is lower than the specified complexity; compared with a neural network model, they need fewer training samples and less training time, which reduces the manual annotation workload for document image samples and also speeds up model training.
The training process of these three recognition models is briefly described below.
First, a document sample set is obtained, and each document image sample in the set carries annotation information indicating whether it is a letter of credit message. Each document image sample is then converted into a tf-idf vector; 80% of the tf-idf vectors are used as the test set and 20% as the training set, and the SVM, LR, and KNN models for classification are trained.
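A training sketch under these assumptions (scikit-learn, binary labels marking whether a sample is a letter of credit message, the split proportions as stated above, arbitrary hyperparameters) might look like this:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

def train_models(texts, labels):
    """texts: extracted text of each document image sample;
    labels: 1 if the sample is annotated as a letter of credit message, else 0."""
    vectorizer = TfidfVectorizer()
    vectors = vectorizer.fit_transform(texts)       # tf-idf vectors of all samples
    # Split as described above: 80% held out for testing, 20% used for training.
    x_train, x_test, y_train, y_test = train_test_split(
        vectors, labels, test_size=0.8, random_state=0)
    models = [SVC(kernel="linear"), LogisticRegression(max_iter=1000),
              KNeighborsClassifier(n_neighbors=5)]
    accuracies = []
    for model in models:
        model.fit(x_train, y_train)
        accuracies.append(model.score(x_test, y_test))  # predetermined accuracy
    return vectorizer, models, accuracies
```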
In step 305, based on the recognition results of the three classification models, the document category of the document image is determined.
In order to improve recognition accuracy, an ensemble learning model can be built on top of the three classification models using a voting method, and the recognition result that occurs most often is taken as the final recognition result.
In addition, when the classification results of the three classification models do not agree (i.e., more than one recognition result ties for the largest number of occurrences), the recognition result of the model with the highest posterior probability (i.e., the model with the highest predetermined accuracy) may be selected as the final recognition result.
In step 306, if the document category is a letter of credit message, regular expression matching is used to determine whether the document image is an MT700 message or an MT707 message, and which page of that message it is.
That is, MT700 messages and MT707 messages are classified and their page numbers are determined.
Given that the MT700 message and the MT707 message have a certain similarity and that many message fields represent consistent content, common key sentences that appear on each page of MT700 and MT707 messages may be summarized in advance, and regular expressions designed from them as the criterion for distinguishing.
The plurality of regular expressions can then be matched against the text content of the current document image. The more regular expressions that match, the higher the probability that the image belongs to a particular class of letter of credit message, and the class with the most matches is selected as the classification result.
In step 307, the processing result is output, indicating whether the document image is an MT700 message or an MT707 message and which page of the MT700 or MT707 message it is.
For example, the output result is: the document image is page 1 of the MT700 message, the document image is page 2 of the MT700 message, the document image is page 1 of the MT707 message, and the like.
The document image processing method provided by the embodiments of the application has the following advantages:
(1) Document image samples are converted into tf-idf word vectors, the classification models are trained on these tf-idf word vectors, and document category identification is performed following an ensemble learning idea, which improves the classification accuracy for document images and reduces the dependence on annotated samples.
(2) For the problem of high similarity between the MT700 message and the MT707 message, classification and page determination are performed by regular expression matching.
(3) Compared with the traditional offline manual presentation mode, the method achieves higher accuracy and recall and improves the processing efficiency of the message service.
When the method provided in the embodiments of the present application is implemented in software or hardware or a combination of software and hardware, a plurality of functional modules may be included in the electronic device, and each functional module may include software, hardware or a combination thereof.
Based on the same technical concept, the embodiments of the application further provide a document image processing apparatus. Since the principle by which the apparatus solves the problem is similar to that of the document image processing method, the implementation of the apparatus may refer to the implementation of the method, and repeated parts are not described again.
Fig. 4 is a schematic structural diagram of a device for processing a document image according to an embodiment of the present disclosure, and includes an identification module 401, a category determination module 402, a matching module 403, and a result determination module 404.
The identification module 401 is configured to perform document category identification on an acquired document image in a letter of credit presentation service by using at least two pre-trained classification models respectively;
the category determining module 402 is configured to determine the document category of the document image based on the recognition results of the at least two classification models;
the matching module 403 is configured to match the text content in the document image against a plurality of regular expressions if the document category is a letter of credit message, where the plurality of regular expressions are predetermined according to key sentences of each type of letter of credit message on each message page; and
the result determining module 404 is configured to determine a processing result of the document image based on the message category and the message page number corresponding to the successfully matched regular expression.
In some embodiments, the identification module 401 is specifically configured to:
converting the document image into hypertext markup language (HTML) text;
extracting the content of the HTML text to obtain the text content contained in the document image; and
inputting the text content into each classification model for document category identification.
In some embodiments, the identification module 401 is specifically configured to:
converting the text content into a term frequency-inverse document frequency (tf-idf) word vector; and
inputting the tf-idf word vector into each classification model for document category identification.
In some embodiments, the identification module 401 is further configured to:
before the text content is input into each classification model for document category identification, determine that the number of characters contained in the text content is not less than a preset value.
In some embodiments, the model complexity of each classification model is lower than the specified complexity.
In some embodiments, the category determination module 402 is specifically configured to:
counting the recognition results of the at least two classification models;
if there is only one recognition result with the highest count, determining that recognition result as the document category of the document image; and
if there are at least two recognition results with the highest count, taking the recognition result of the classification model with the highest predetermined accuracy as the document category of the document image.
The division of the modules in the embodiments of the present application is illustrative and is only a division of logical functions; there may be other division manners in actual implementation. In addition, the functional modules in the embodiments of the present application may be integrated into one processor, may exist alone physically, or two or more modules may be integrated into one module. The modules may be coupled to each other through interfaces that are typically electrical communication interfaces, although mechanical or other forms of interfaces are not excluded. Thus, modules described as separate components may or may not be physically separate, and may be located in one place or distributed in different locations on the same or different devices. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
Having described the method and apparatus for processing document images according to an exemplary embodiment of the present application, an electronic device according to another exemplary embodiment of the present application is described next.
In some possible implementations, an electronic device of the present application may include at least one processor, and at least one memory. Wherein the memory stores program code which, when executed by the processor, causes the processor to perform the methods according to the various exemplary embodiments of the present application described above in the present specification.
An electronic device 130 implemented according to this embodiment of the present application is described below with reference to fig. 5. The electronic device 130 shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 5, the electronic device 130 is represented in the form of a general electronic device. The components of the electronic device 130 may include, but are not limited to: the at least one processor 131, the at least one memory 132, and a bus 133 that connects the various system components (including the memory 132 and the processor 131).
Bus 133 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.
The memory 132 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 1321 and/or cache memory 1322, and may further include Read Only Memory (ROM) 1323.
Memory 132 may also include programs/utilities 1325 having a set (at least one) of program modules 1324, such program modules 1324 including but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 130 may also communicate with one or more external devices 134 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with the electronic device 130, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 130 to communicate with one or more other electronic devices. Such communication may occur via input/output (I/O) interfaces 135. Also, the electronic device 130 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 136. As shown, the network adapter 136 communicates with other modules for the electronic device 130 over the bus 133. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 130, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In an exemplary embodiment, a computer-readable storage medium comprising instructions, such as the memory 132 comprising instructions, executable by the processor 131 to perform the method of processing a document image is also provided. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided, which, when invoked for execution by an electronic device, causes the electronic device to perform any of the exemplary methods provided herein.
Also, a computer program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product for processing of document images in embodiments of the present application may be in the form of a CD-ROM and include program code and may be run on a computing device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, radio frequency (RF), etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In situations involving remote computing devices, the remote computing device may be connected to the user's computing device over any kind of network, such as a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., over the internet using an internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, the features and functions of two or more units described above may be embodied in one unit, according to embodiments of the application. Conversely, the features and functions of one unit described above may be further divided into embodiments by a plurality of units.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application also encompasses these modifications and variations.

Claims (15)

1. A document image processing method, characterized by comprising the following steps:
performing document category identification on an acquired document image in a letter of credit presentation service by using at least two pre-trained classification models respectively;
determining the document category of the document image based on the recognition results of the at least two classification models;
if the document category is a letter of credit message, matching the text content in the document image against a plurality of regular expressions, wherein the plurality of regular expressions are predetermined according to key sentences of each type of letter of credit message on each message page; and
determining a processing result of the document image based on the message category and the message page number corresponding to the successfully matched regular expression.
2. The method of claim 1, wherein performing document category identification on the acquired document image in the letter of credit presentation service by using at least two pre-trained classification models respectively comprises:
converting the document image into hypertext markup language (HTML) text;
extracting the content of the HTML text to obtain the text content contained in the document image; and
inputting the text content into each classification model for document category identification.
3. The method of claim 2, wherein inputting the text content into each classification model for document category identification comprises:
converting the text content into a term frequency-inverse document frequency (tf-idf) word vector; and
inputting the tf-idf word vector into each classification model for document category identification.
4. The method of claim 2 or 3, wherein before inputting the text content into each classification model for document category identification, the method further comprises:
determining that the number of characters contained in the text content is not less than a preset value.
5. The method of claim 1, wherein a model complexity of each classification model is lower than a specified complexity.
6. The method of claim 1, wherein determining the document category of the document image based on the recognition results of the at least two classification models comprises:
counting the recognition results of the at least two classification models;
if there is only one recognition result with the highest count, determining that recognition result as the document category of the document image; and
if there are at least two recognition results with the highest count, taking the recognition result of the classification model with the highest predetermined accuracy as the document category of the document image.
7. A document image processing apparatus, comprising:
an identification module, configured to perform document category identification on an acquired document image in a letter of credit presentation service by using at least two pre-trained classification models respectively;
a category determining module, configured to determine the document category of the document image based on the recognition results of the at least two classification models;
a matching module, configured to match the text content in the document image against a plurality of regular expressions if the document category is a letter of credit message, wherein the plurality of regular expressions are predetermined according to key sentences of each type of letter of credit message on each message page; and
a result determining module, configured to determine a processing result of the document image based on the message category and the message page number corresponding to the successfully matched regular expression.
8. The apparatus of claim 7, wherein the identification module is specifically configured to:
converting the document image into hypertext markup language (HTML) text;
extracting the content of the HTML text to obtain the text content contained in the document image; and
inputting the text content into each classification model for document category identification.
9. The apparatus of claim 8, wherein the identification module is specifically configured to:
converting the text content into a term frequency-inverse document frequency (tf-idf) word vector; and
inputting the tf-idf word vector into each classification model for document category identification.
10. The apparatus of claim 8 or 9, wherein the identification module is further to:
before the text content is input into each classification model for document category identification, determine that the number of characters contained in the text content is not less than a preset value.
12. The apparatus of claim 7, wherein a model complexity of each classification model is lower than a specified complexity.
12. The apparatus of claim 7, wherein the category determination module is specifically configured to:
counting the recognition results of the at least two classification models;
if there is only one recognition result with the highest count, determining that recognition result as the document category of the document image; and
if there are at least two recognition results with the highest count, taking the recognition result of the classification model with the highest predetermined accuracy as the document category of the document image.
13. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein:
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of any of claims 1-6.
15. A computer program product, characterized in that the computer program product, when invoked for execution by an electronic device, causes the electronic device to perform the method according to any of claims 1-6.
CN202211663202.5A 2022-12-23 2022-12-23 Document image processing method and device, electronic equipment and storage medium Pending CN115827869A (en)

Priority Applications (1)

Application Number: CN202211663202.5A; Priority date: 2022-12-23; Filing date: 2022-12-23; Title: Document image processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number: CN115827869A; Publication date: 2023-03-21

Family ID: 85517924

Country Status (1): CN, CN115827869A (en)

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination