CN111242034A - Document image processing method and device, processing equipment and client

Info

Publication number: CN111242034A
Application number: CN202010035313.6A
Authority: CN
Inventors: 周凡; 陈超; 连琨; 方雪琼; 朱世艾
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Ant Shengxin (Shanghai) Information Technology Co.,Ltd.
Priority date: 2020-01-14
Filing date: 2020-01-14
Publication date: 2020-06-05

Abstract

Embodiments of this specification disclose a document image processing method, apparatus, processing device, and client. In one embodiment, the user is guided to shoot a video of the document, and key areas containing text are determined from the document video. Image quality is then enhanced for the key areas, and fusing a plurality of key areas yields a document image with enhanced image quality. Because the enhanced document image has higher quality, the accuracy of character recognition on the document improves accordingly.

Description

Document image processing method and device, processing equipment and client

Technical Field

The embodiments of this specification belong to the technical field of computer image data processing, and in particular relate to a document image processing method, apparatus, processing device, and client.

Background

With the development of internet technology, the internet insurance business is growing ever faster. At present, many insurance companies provide online claim settlement services, and users can settle insurance claims quickly and intelligently by photographing and uploading pictures of the vehicle damage site, identity documents, and the like. Unlike traditional offline claim settlement, online claim settlement does not require the user to go to a specific place, which saves the user's time and improves claim processing efficiency.

In the on-line claims settlement process, various documents required for the settlement are often provided, such as invoices for medical treatment of patients in accidents, invoices for vehicle maintenance, vehicle repair lists and the like. In the existing processing process of some online claims settlement services, a user can use an intelligent terminal to shoot documents and then upload photos. After receiving the photo, the insurance company business personnel can manually identify the information in the photo, and then fill in the corresponding entry of the claim settlement business list to complete the subsequent on-line claim settlement business processing.

Disclosure of Invention

Embodiments of the present specification aim to provide a document image processing method, an apparatus, a processing device, and a client, which can effectively improve document image quality and improve accuracy of character recognition on a document.

The document image processing method, device, processing equipment and client provided by the embodiment of the specification are realized in the following modes:

a method of image processing of a document, the method comprising:

acquiring a document video;

selecting a plurality of key area images from the document video as candidate images, wherein the key areas comprise areas containing text information in the document;

determining the image quality of the key area in the candidate image according to a preset quality algorithm;

eliminating the candidate images with the image quality not meeting the requirement, and determining a residual image set;

and fusing the key area in the residual image set to obtain the enhanced document image.

An apparatus for image processing of a document, the apparatus comprising:

the video data acquisition module is used for acquiring document videos;

a key area processing module, configured to select multiple key area images from the document video as candidate images, where the key area includes an area containing text information in the document;

the quality calculation module is used for determining the image quality of the key area in the candidate image according to a preset quality algorithm;

the screening module is used for eliminating the candidate images with the image quality not meeting the requirement and determining a residual image set;

and the fusion processing module is used for fusing the key areas in the residual image set to obtain the enhanced document image.

A document image processing apparatus comprising a processor and a memory for storing processor-executable instructions that, when executed by the processor, implement:

acquiring a document video;

A client comprising a display screen, a camera, a processor, and a memory storing processor-executable instructions, wherein

the camera is used for video shooting of the document;

the display screen is used to display shooting information, and the processor executes the instructions to realize:

displaying video shooting information or video shooting guide information of a document on the display screen;

acquiring a document video;

A server comprising a processor and a memory for storing processor-executable instructions that when executed by the processor implement:

acquiring a document video;

The document image processing method, apparatus, processing device, and client provided by the embodiments of this specification guide a user to shoot a video of the document and determine key areas containing text from the document video. Image quality is then enhanced for the key areas, and fusing a plurality of key areas yields a document image with enhanced image quality. The enhanced document image has higher quality, which improves the accuracy of character recognition on the document. In the insurance claim settlement scenario, the embodiments provided by this specification can improve the accuracy of character recognition on documents shot with a mobile device, increase the degree of automation of the claim settlement process, and improve claim processing efficiency.

Drawings

To illustrate the embodiments of this specification or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some of the embodiments of this specification; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a process flow diagram of an embodiment of a method described herein;

FIG. 2 is a schematic diagram of an interaction scenario for prompting a user to take a document video in one implementation scenario of the present description;

FIG. 3 is a schematic flow chart diagram of another embodiment of the method provided herein;

FIG. 4 is a schematic process diagram of another embodiment of the method provided herein;

FIG. 5 is a schematic flow chart diagram of another embodiment of the method provided herein;

FIG. 6 is a schematic diagram illustrating an interaction scenario for prompting a user to take a document in an implementation scenario of the present description;

FIG. 7 is a block diagram of a hardware configuration of a client to which an image processing method for documents according to an embodiment of the present invention is applied;

FIG. 8 is a block diagram of an embodiment of a document image processing apparatus that may be used on a user's client;

FIG. 9 is a block diagram of another embodiment of an image processing apparatus for documents provided in the present specification.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments in the present specification, and not all of the embodiments. All other embodiments that can be obtained by a person skilled in the art on the basis of one or more embodiments of the present description without inventive step shall fall within the scope of protection of the embodiments of the present description.

At present, the insurance industry often provides online claim settlement in fields such as property insurance, accident insurance, and health insurance. A user can photograph documents such as invoices, repair lists, and medical documents and upload the photos to complete the collection of the materials required for the claim, which reduces the burden on the user and improves the claim service experience. After receiving the photos, the insurance company can identify them manually, or can use electronic equipment or a computer to perform Optical Character Recognition (OCR) at some stages, so as to extract the information on the document automatically and improve working efficiency. Traditional OCR technology works well on images produced by professional image acquisition equipment such as scanners. In the online claim settlement scenario, however, the user usually takes pictures with an ordinary mobile phone. Compared with images captured by the insurance company's professional claims staff or professional image acquisition equipment, the captured images are of poor quality, and satisfactory character recognition accuracy is hard to achieve, for reasons such as poor camera optics, insufficient phone performance, low picture resolution, insufficient illumination, severe shake during shooting, inaccurate focusing, and unskilled shooting technique.

The embodiments of this specification provide a technical solution that incorporates video fusion processing. The user is guided to shoot a video of the document; key areas are then extracted from the video and fused according to image quality to obtain an enhanced document image of higher image quality. This enhances low-quality/low-resolution document images and improves the recognition accuracy of the text information in the document. The solution can be applied to various terminal devices, such as a user's client for shooting documents or a server that processes the captured document images. The client may be a terminal device with at least a shooting function, such as a smartphone, tablet computer, smart wearable device, vehicle-mounted device, or dedicated shooting device, used by the accident party initiating the online claim or by insurance company personnel. The client may have a communication module and may connect to a remote server to exchange data with it. The server may include a claims processing system on the insurance company side, or an intermediate platform server, such as the server of a payment application. The server may be a single computer device, a server cluster composed of a plurality of servers, a server of a distributed system, or a server combined with blockchain data storage.

The following describes an embodiment of the present specification by taking a specific application scenario of processing a claim document in an online car insurance claim. Specifically, fig. 1 is a schematic flowchart of an embodiment of an image processing method for a document provided in this specification. Although the present specification provides the method steps or apparatus structures as shown in the following examples or figures, more or less steps or modules may be included in the method or apparatus structures based on conventional or non-inventive efforts. In the case of steps or structures which do not logically have the necessary cause and effect relationship, the execution order of the steps or the block structure of the apparatus is not limited to the execution order or the block structure shown in the embodiments or the drawings of the present specification. When the described method or module structure is applied to a device, a server or an end product in practice, the method or module structure according to the embodiment or the figures may be executed sequentially or in parallel (for example, in a parallel processor or multi-thread processing environment, or even in an implementation environment including distributed processing and server clustering).

Of course, the following description of the embodiments of the claim document does not limit the other extensible solutions based on the present description. For example, in other implementation scenarios, the embodiments provided in this specification can also be applied to other document image capturing interactions, or a scenario of capturing interactions of certificates, or an application scenario of document image processing of financial reimbursement, online shopping, and the like. Even more, the target object processed in some embodiments may not be limited to a claim receipt, for example, a person, an injured vehicle in a claim service, clothing, or other video capture object. Accordingly, the claims document video, the enhanced document image and the like described in the subsequent embodiment steps can be replaced by corresponding target objects, such as the claims vehicle video. In a specific embodiment, as shown in fig. 1, in an embodiment of a method for processing an image of a document provided by the present specification, the method may include:

S0: acquiring a document video.

In this embodiment, the processing device may obtain a claim document video (the document video in this application scenario) produced by shooting a document. The processing device may be a client, such as a user's mobile terminal or a vehicle-mounted device, or may be a server. For example, after a user shoots a document with a mobile terminal to generate a claim document video, the video can be uploaded to a server over a network or transferred to the server on a storage medium such as a USB flash drive, and the server thereby obtains the claim document video.

In some embodiments, the user may directly use the client to shoot the document and obtain video data. In another embodiment provided by this specification, the client can guide the user's document video shooting, so that the user can complete the claim document video more accurately, quickly, and conveniently. Therefore, in another embodiment of the method provided in this specification, acquiring the document video may include:

S02: displaying, at the client, video shooting guide information for a document required for online claim settlement;

S04: acquiring a claim document video shot according to the video shooting guide information.

In this embodiment, the user can use a client such as a mobile phone to settle a car insurance claim online. An online claim may require images of multiple types of documents or of multiple documents. In this embodiment, the user may shoot the document directly, or may first select the type of document to be shot. In other embodiments, the client may prompt or instruct the user to shoot one or more documents according to a certain rule or sequence.

In the embodiments of this specification, video shooting guide information for a document required for online claim settlement can be displayed on the client. The shooting guide information may include an interface for the user to select the type of document to be shot, or prompt information displayed in the viewfinder frame to guide the user to shoot the document video correctly. For example, the prompt may be a text reminder such as "please aim at the red seal on the invoice", or may combine one or more of text, voice, arrow images, animation, and the like, as shown in fig. 2.

The user shoots the claim document according to the video shooting guide information, and a corresponding claim document video is obtained. The claim document video can be a single video obtained by shooting one or more documents, or can be a plurality of video segments.

S2: selecting a plurality of key area images from the document video as candidate images, wherein the key areas comprise areas containing text information in the document.

In the embodiment, the area containing the text information in the document can be used as a key area. In the recorded video data of the claim document, there may be some frame images containing text information, that is, there is the key area, and some frame images may not contain text information, that is, there may not be the key area. For example, in some embodiment scenarios, for some frame images, there may be multiple text-containing regions, which in some embodiments may be divided into multiple key regions, each of which may contain one or more text regions. Correspondingly, some image frames may include multiple key regions. In other embodiments, the text area in the frame image may be divided into a key area.

In this embodiment, a plurality of key area images including the key area may be selected from the claim document video as candidate images. In some embodiments, the key area image may be a frame image containing the key area. In other embodiments of the method provided by this specification, the key area image may also be a sub-area containing text information determined from a frame image in the document video. For example, the frame image may be divided into several key areas, and in some embodiments the block of the frame image corresponding to each key area may be used as a key area image.

In this embodiment, a plurality of candidate images including a key area may be selected from a document video. The selection mode can be determined according to the specific application scenario. For example, in some embodiments, key area images of consecutive frames may be selected as candidate images. Alternatively, after a key frame image containing a key area is determined, frame images at designated positions, times, or sequence offsets may be extracted as candidate images: for example, the three frames before and the three frames after the key frame, together with the key frame itself, give 7 candidate images; or, after the key frame is determined, 9 frame images containing the key area may be extracted consecutively at intervals of 125 ms.
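The two selection rules mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the frame-index representation, and the `fps` parameter are hypothetical and do not come from the specification, and the sketch assumes every returned frame actually contains the key area.

```python
def frames_around_key(key_index, total_frames, radius=3):
    """Indices of the key frame plus up to `radius` frames on each side."""
    start = max(0, key_index - radius)
    end = min(total_frames - 1, key_index + radius)
    return list(range(start, end + 1))


def frames_at_interval(key_index, total_frames, fps, count=9, interval_ms=125):
    """Up to `count` frame indices, starting at the key frame and spaced
    `interval_ms` apart (converted to frame offsets using `fps`)."""
    indices = []
    for k in range(count):
        idx = key_index + round(k * interval_ms * fps / 1000)
        if idx >= total_frames:
            break  # ran past the end of the video
        indices.append(idx)
    return indices


# Key frame 10 in a 100-frame video: frames 7..13, i.e. 7 candidates.
print(frames_around_key(10, 100))
# 24 fps video: 125 ms corresponds to 3 frames, so 9 frames spaced 3 apart.
print(frames_at_interval(0, 1000, fps=24))
```

In a real pipeline each selected index would still be checked for the presence of the key area before the frame is accepted as a candidate.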

S4: determining the image quality of the key area in the candidate image according to a preset quality algorithm;

S6: eliminating the candidate images whose image quality does not meet the requirement, and determining a residual image set.

This embodiment can calculate the image quality of the key area to obtain image quality evaluation data. Subsequent screening is then performed according to the image quality: images whose quality does not meet the requirement are eliminated, and the higher-quality candidate images are retained. An enhanced document image of higher image quality can thus be obtained by processing higher-quality candidate images.

It should be noted that the image quality described in this specification may be evaluated on a different basis in different application scenarios or embodiments. In some embodiments it may be a quantized value that a computer calculates from parameters of the image, such as image size, number of pixels, color, signal-to-noise ratio, sharpness, and gray scale. In one embodiment provided by this specification, the image quality may include at least a resolution parameter. Resolution generally refers to the amount of information stored in the image, for example the number of pixels per inch of the image, measured in ppi (pixels per inch).

In other embodiments of the present description, specific quality algorithms may be set and determined based on different quality assessment requirements or criteria. For example, in some embodiments, image quality algorithms may be determined for photographic completeness, attention strength, blur level, etc. of an image. Or the image quality algorithm may be determined from pixel statistics of the image, the amount or content of information contained, the results of the image, etc. Or the quality algorithm of the image and the like can be determined from the signal-to-noise ratio, the mean square error, the information entropy, the structural distortion degree and the like of the image. Of course, the quality algorithm of the image may also be determined in combination with one or more of the above-mentioned parameters.

The image quality of each key area in the candidate images is calculated according to a preset quality algorithm, and the images are then screened according to the image quality. The specific image quality requirement may be set in advance through corresponding parameters. Unqualified images, such as images that are too dark or too bright, images that are too blurred, and images with missing key information, can then be eliminated according to a threshold. After the candidate images with unsatisfactory image quality are removed, the remaining candidate images are called the residual image set.
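As a rough illustration of threshold-based screening, the sketch below rejects key area images that are too dark, too bright, or too blurred. The specification does not name a particular quality algorithm, so everything here is an assumption: mean brightness and the variance of a 4-neighbour Laplacian are swapped in as stand-in quality measures, images are plain nested lists of gray values (at least 3×3), and all thresholds are hypothetical.

```python
from statistics import mean, pvariance


def mean_brightness(gray):
    """Average gray level (0-255) over all pixels."""
    return mean(v for row in gray for v in row)


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over interior pixels.
    Low values suggest a blurred image (assumed sharpness measure)."""
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def screen_candidates(candidates, min_brightness=40, max_brightness=220,
                      min_sharpness=100.0):
    """Return the names of candidates that pass all (hypothetical) thresholds.
    `candidates` is a list of (name, gray_image) pairs."""
    remaining = []
    for name, gray in candidates:
        if not (min_brightness <= mean_brightness(gray) <= max_brightness):
            continue  # too dark or too bright
        if laplacian_variance(gray) < min_sharpness:
            continue  # too blurred
        remaining.append(name)
    return remaining
```

A production system would more likely compute such measures with an image library and tune the thresholds on real document videos; the point here is only the reject-by-threshold structure that produces the residual image set.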

S8: fusing the key areas in the residual image set to obtain the enhanced document image.

Image fusion generally means applying image processing, computer techniques, and the like to image data of the same target collected through multiple source channels, so as to extract the useful information in each channel to the greatest extent and finally synthesize a high-quality image. This raises the utilization of image information, improves the accuracy and reliability of computer interpretation, and improves the spatial resolution, spectral resolution, and the like of the original image. In some embodiments, the fusion described in this specification may include a process of integrating a plurality of key areas that point to the same area (or, in a special case, a single key area) into one key area. The specific fusion mode can be set according to the scenario or the image processing requirements. For example, it may include, but is not limited to, image superposition, stitching, cropping, and using calculated gray values or averaged pixel colors as the fused pixel values. In some embodiments the fusion may cover only the key areas in the remaining images; in others it may fuse the key areas and also fuse other areas of the remaining images.

Image fusion can generally be divided into data-level fusion, feature-level fusion, and decision-level fusion. Data-level fusion is also called pixel-level fusion, and its implementations include spatial-domain algorithms and transform-domain algorithms. Spatial-domain algorithms may include logical filtering, gray-level weighted averaging, contrast modulation, and the like; transform-domain algorithms may include pyramid decomposition fusion and wavelet transform methods. Feature-level fusion preserves the information features carried by the different images, such as infrared light characterizing an object's heat and visible light characterizing its brightness. Decision-level fusion can be realized with the Bayes method, the D-S evidence method, voting, and the like. Fusion algorithms often draw on image statistics such as the mean, entropy, standard deviation, and mean gradient. The specific ways of fusing the key areas of the remaining images in the residual image set are not enumerated here.

In another embodiment of the method provided in the present specification, different weights may be given to different image qualities, and the fusion of images may be performed according to the weights. Specifically, in another embodiment of the method provided in this specification, the fusing the key areas in the remaining image set includes:

S80: determining, in the residual image set, the same-block residual images corresponding to key areas that point to the same area;

S82: assigning a weight to each same-block residual image of the same area according to the image quality of its key area;

S84: performing weighted fusion on the key areas of the residual images in the residual image set according to the weights.

A plurality of frame images can contain the same text area, and therefore correspond to a plurality of key areas pointing to that same text area. In this embodiment, the remaining images whose key areas point to the same area are referred to as same-block residual images. In some embodiments, the image quality of the key area, such as a quality score, may be calculated. The key area of the same area may have different quality scores in different same-block residual images, and the weight of each same-block residual image may be set according to its quality score. Weighted fusion is then performed according to the weights.

For example, for the same line of text "company address: No. 158 Koi-kung Road, Suzhou Science and Technology City", selection and screening yield 3 same-block residual images P1, P2, and P3, and the image quality scores of the key area in these 3 images are calculated to be 90, 91, and 80 respectively. P1, P2, and P3 may then be given weights of 0.4, 0.4, and 0.2 according to the image quality scores. Each same-block residual image taking part in the fusion thus has a corresponding weight, and the residual images in the residual image set (the same-block residual images corresponding to the key area of the same area) can be fused in a weighted manner according to those weights. The weighting can be applied within whichever fusion mode was determined earlier. For example, if feature data D1, D2, and D3 of the key area are extracted during fusion, D1, D2, and D3 may be weighted by 0.4, 0.4, and 0.2, the weights of P1, P2, and P3.
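The specification does not state how quality scores map to weights. The sketch below shows one hypothetical tiered mapping that happens to reproduce the example (scores 90, 91, 80 giving weights 0.4, 0.4, 0.2): images scoring at or above an assumed cut-off receive two weight units, the others one, and the units are normalized to sum to 1. The function name, the cut-off of 90, and the unit values are all assumptions made for illustration.

```python
def tiered_weights(scores, high_score=90, high_units=2, low_units=1):
    """Map quality scores to normalized fusion weights.

    Hypothetical rule: a score >= high_score earns `high_units`,
    anything lower earns `low_units`; the result sums to 1.
    """
    units = [high_units if s >= high_score else low_units for s in scores]
    total = sum(units)
    return [u / total for u in units]


# Quality scores of the key area in P1, P2, P3 from the example above.
print(tiered_weights([90, 91, 80]))  # [0.4, 0.4, 0.2]
```

Other mappings, such as dividing each score by the sum of all scores, would also give higher-quality images a larger share, but would yield different numbers from those in the example.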

The present specification provides an implementation of weighted fusion processing, which can perform weighted fusion on key region pixels in the remaining image according to weight based on three RGB channels. Specifically, in another embodiment of the method described in this specification, the performing weighted fusion on the key regions of the remaining images in the remaining image set according to the weights may include:

S840: performing weighted fusion separately on each of the three RGB channels of the pixels in the key area of the same-block residual images, according to the weights of the residual images.

For example, in one embodiment, the RGB values of the pixel M at the same position in the key area of the same-block residual images P1, P2, and P3 are (200, 164, 180), (208, 160, 180), and (180, 156, 192), respectively. With the residual image weights 0.4, 0.4, and 0.2, the RGB value of the pixel M after fusion is calculated as follows:

(200*0.4 + 208*0.4 + 180*0.2, 164*0.4 + 160*0.4 + 156*0.2, 180*0.4 + 180*0.4 + 192*0.2) = (199.2, 160.8, 182.4) ≈ (199, 161, 182).

The RGB values (199, 161, 182) of the pixel M are rounded values. In this way, when the key areas are fused, the images with higher image quality carry a larger weight or proportion, so that the image quality after weighted fusion is improved.
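The per-channel weighted fusion of a single pixel can be checked with a short sketch. The function name and the clamping to the 0-255 range are assumptions added for illustration; the pixel values and weights are those of the example for P1, P2, and P3.

```python
def fuse_pixel(pixels, weights):
    """Weighted average of RGB pixels, computed channel by channel.

    `pixels` is a list of (R, G, B) tuples for the same position in each
    same-block residual image; `weights` holds one weight per image.
    """
    if len(pixels) != len(weights):
        raise ValueError("one weight is required per pixel")
    fused = []
    for channel in range(3):  # R, G, B
        value = sum(p[channel] * w for p, w in zip(pixels, weights))
        fused.append(min(255, max(0, round(value))))  # round and clamp
    return tuple(fused)


pixel_m = [(200, 164, 180), (208, 160, 180), (180, 156, 192)]
print(fuse_pixel(pixel_m, [0.4, 0.4, 0.2]))  # (199, 161, 182)
```

Applying the same function to every pixel position of the aligned key areas would produce the fused key area; alignment of the key areas across frames is assumed to have been done beforehand.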

Of course, in other embodiments of this specification, image quality enhancement may be performed in other ways or at other steps of the process. For example, after the key areas are determined, a supervised generative adversarial network (GAN) may be used to enhance low-quality/low-resolution images.

After the enhanced document image is obtained, the text information on the document can be automatically extracted by utilizing Optical Character Recognition (OCR). The image after the image quality enhancement processing has better image quality, and can effectively improve the character recognition accuracy rate, thereby improving the processing efficiency of the on-line claim settlement service. Therefore, as shown in fig. 3, fig. 3 is a schematic flow chart of another embodiment of the method provided in this specification, and may further include:

S100: recognizing the text information in the enhanced document image by optical character recognition.

In other embodiments of the method described herein, the recognized text information may be further checked to determine whether it is valid or usable. If the recognized text information is determined to be invalid or unusable, the user can be promptly reminded to shoot the video again, or processing can be switched to manual handling. This avoids the user having to shoot the document again because the text information is only found to be invalid later in the processing flow, and it improves both the user experience and the efficiency of online claim processing. Specifically, in another embodiment of the method provided in this specification, the method may further include:

S120: determining the confidence of the recognized text information in a preset manner;

S122: if the confidence is greater than or equal to a preset threshold, determining the recognized text information to be the text identified from the claim document.

The confidence represents how reliably the recognized text information corresponds to the text in the enhanced document image, and can be expressed as a probability value, as one of several graded levels, as a score, and the like. The preset manner of calculating the confidence may be set according to scenario or business needs. For example, a supervised machine learning algorithm (such as a random forest, logistic regression, or a Bayesian network) may process the recognized text information and the enhanced document image and output the confidence (a probability value) of the text information. In other embodiments, the confidence of the recognized text information may be determined by word segmentation, keyword detection, syntactic analysis, and the like, or by querying or matching the text information against a designated database. If the confidence is greater than or equal to the preset threshold, the text information can be used as valid, usable text information identified from the claim document.

As mentioned above, the method may further comprise:

s124: if the confidence level is less than the preset threshold, switching to manual processing or prompting the user to shoot the claim settlement document again.

For example, if the confidence level of the recognized text information is 0.6, which is lower than the preset threshold of 0.8, the recognized text information may contain a typesetting error or a recognition error. Its reliability is low, and it cannot be used as text content for online claim settlement processing. Fig. 4 is a schematic diagram of the processing flow of another embodiment of the method provided in this specification. As shown in fig. 4, if the confidence level is lower than the preset threshold, the user can be promptly reminded to shoot again, or the case can be switched to manual processing. This prevents the user from having to shoot the document again when the text information is found to be invalid later in the processing flow, which improves the user experience and the efficiency of online claim settlement.
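The threshold decision in steps s122 and s124 can be sketched as follows. The names `Decision` and `route_by_confidence` are hypothetical, and the default threshold of 0.8 is simply the value used in the example above.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"                        # s122: use the recognized text
    RESHOOT_OR_MANUAL = "reshoot_or_manual"  # s124: re-shoot or manual processing


def route_by_confidence(confidence, threshold=0.8):
    """Choose the next step from a confidence value expressed as a probability."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be a probability in [0, 1]")
    if confidence >= threshold:
        return Decision.ACCEPT
    return Decision.RESHOOT_OR_MANUAL


print(route_by_confidence(0.6))  # below 0.8, so the user is asked to re-shoot
print(route_by_confidence(0.8))  # equal to the threshold, so the text is accepted
```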

According to the embodiments of this specification, by shooting documents online and recognizing text automatically, low-quality images collected in a self-service shooting scene can be enhanced and OCR recognition accuracy improved. In other embodiments of this specification, the more accurately recognized text information obtained from the document can further be used to check whether the document shot by the user meets requirements. This forms positive feedback that further assists the user in shooting the document, or promptly corrects a wrong document shot by the user. Specifically, as shown in fig. 5, in another embodiment of the method provided in this specification, the method may further include:

s140: judging whether the shot document meets the requirements or not according to the recognized character information;

s142: and performing corresponding claim settlement service processing according to the judgment result.

For example, in a specific application scenario, the recognized text information shows that the user has shot an invoice issued for vehicle maintenance, whereas the current vehicle insurance claim settlement service requires an invoice issued for hospitalization. It can therefore be determined from the recognized text information that the invoice the user has shot does not meet the claim settlement requirement. Prompt information can then be displayed at the client, prompting the user to shoot the admission invoice. Alternatively, when the admission invoice is shot, the recognized text information may be found to contain no "expense total" information. This may happen because the user shot the document from too short a distance, so that the "expense total" did not appear in the viewfinder frame. In this case, the document may be determined not to meet the claim settlement requirement, and the user may be prompted to shoot again. Of course, if the claim settlement requirement is met, the next processing operation can be performed according to the claim settlement service processing flow.

In another embodiment of the method provided by the present specification, the determining whether the photographed claim document meets the claim requirement according to the recognized text information includes at least one of the following:

Judging whether the claim settlement document has missing information according to the recognized text information. Examples include the missing "expense total" described above, or a missing license plate number for the vehicle in a vehicle repair document.

Judging whether any required claim settlement document has not been shot according to the recognized text information. For example, an injury claim may require several documents, such as an admission invoice and a medical expense bill, and the recognized text information shows that the medical expense bill is still missing. In this case, the user can be prompted to shoot a video of the medical expense bill.

Judging whether the type of the claim settlement document is correct according to the recognized text information. For example, if the online claim requires the user to upload the invoice copy and the user uploads the bookkeeping copy, it can be determined that the type of the uploaded document is wrong. Similarly, if the user is required to upload a special value-added tax invoice for accommodation and instead uploads a receipt for the accommodation fee, the type of the claim settlement document can be judged to be incorrect.

Judging whether the name of the claim object on the claim document corresponds to the name of the claim object in the claim request according to the recognized text information. The claim object can be determined according to the specific application scene. For example, the claim object in the claim request for a car insurance accident is an injured hospitalized person named "Zhang Jia", while the hospitalized person on the uploaded claim document is named "Zhang Shen". In this embodiment, the mismatch between the claim objects can be recognized, and the user can be required to shoot the claim document video again, or a prompt can be issued that a fraud risk may exist.

For example, as shown in the scene diagram of fig. 6, if the recognized text information shows that the captured invoice image contains no drawer information, a prompt message such as "The document is not qualified: drawer information is missing. Please shoot again." can be displayed in the shooting viewfinder window in real time.
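The four checks listed above (missing information, missing documents, wrong document type, and mismatched claim object name) can be sketched as a single validation pass. The data layout, the field name `patient_name`, and the document type names are all hypothetical; a real system would define them according to its own claim rules.

```python
def check_claim_documents(documents, required_fields, required_types, claimant_name):
    """Return a list of human-readable problems found in the recognized documents.

    documents: list of dicts with a 'type' string and a 'fields' dict that maps
    field names to the text recognized for them (assumed layout).
    """
    problems = []
    present_types = {doc["type"] for doc in documents}

    # Check 2: required documents that were never shot.
    for doc_type in required_types:
        if doc_type not in present_types:
            problems.append(f"missing document: {doc_type}")

    for doc in documents:
        doc_type = doc["type"]
        # Check 3: document type not requested for this claim.
        if doc_type not in required_types:
            problems.append(f"unexpected document type: {doc_type}")
            continue
        # Check 1: required information missing from the document.
        for field in required_fields.get(doc_type, []):
            if not doc["fields"].get(field):
                problems.append(f"{doc_type}: missing field '{field}'")
        # Check 4: name on the document differs from the claim request.
        name = doc["fields"].get("patient_name")
        if name and name != claimant_name:
            problems.append(
                f"{doc_type}: name '{name}' does not match claimant '{claimant_name}'"
            )
    return problems


required_types = ["admission_invoice", "medical_expense_bill"]
required_fields = {
    "admission_invoice": ["patient_name", "expense_total"],
    "medical_expense_bill": ["patient_name"],
}
documents = [{"type": "admission_invoice", "fields": {"patient_name": "Zhang Shen"}}]

problems = check_claim_documents(documents, required_fields, required_types, "Zhang Jia")
for problem in problems:
    print(problem)
```

Each returned message could be shown as a prompt in the viewfinder window, so that the user knows what to correct before shooting again.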

Of course, other embodiments may also include other specific scenarios for determining whether the photographed claim document meets the claim settlement requirements according to the recognized text information. By judging whether the document shot by the user meets the claim settlement requirement according to the recognized text information, the user can be promptly reminded to correct, supplement, or re-shoot the document when it does not. This avoids re-shooting and information recognition errors caused by document shooting problems later in the process, and improves claim settlement processing efficiency. Combined with the video shooting guidance information, it lets the user quickly see what is wrong with the shot document, guides the user to complete the document video shooting quickly and conveniently, and improves the user's experience of both the online claim settlement service and the terminal.

In the present specification, each embodiment of the method is described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. Reference is made to the description of the method embodiments.

According to the document image processing method provided by the embodiment of the specification, the user can be guided to carry out video shooting on the document, and the key area containing characters can be determined from the document video. And then, image quality can be enhanced aiming at the key areas, and after a plurality of key areas are fused, a document image with enhanced image quality can be obtained. The enhanced document image quality is higher, and the accuracy of character recognition in the document is improved. In the insurance claim settlement scene, the embodiment provided by the specification can improve the accuracy of character recognition in the document shot by the mobile device, improve the automatic processing process of the claim settlement process and improve the claim settlement processing efficiency.

The method embodiments provided by the embodiments of the present specification can be executed in a mobile terminal, a computer terminal, a server, or a similar computing device. Taking execution on a smartphone client as an example, fig. 7 is a hardware structure block diagram of a client to which the document image processing method according to the embodiment of the present invention is applied. As shown in fig. 7, the client 10 may include one or more (only one shown) processors 102 (a processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 104 for storing data, and a transmission module 106 for communication functions. It will be understood by those skilled in the art that the structure shown in fig. 7 is only an illustration and is not intended to limit the structure of the electronic device. For example, the client 10 may include more or fewer components than shown in fig. 7, may include other processing hardware such as a GPU (Graphics Processing Unit), or may have a configuration different from that shown in fig. 7.

The memory 104 may be used to store software programs and modules of application software, such as the program instructions/modules corresponding to the document image processing method in the embodiment of the present invention. The processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, thereby implementing the processing method described above. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the client 10 over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission module 106 is used to receive or transmit data via a network. Specific examples of such a network may include a wireless network provided by the communication provider of the client 10. In one example, the transmission module 106 includes a Network Interface Controller (NIC) that can be connected to other network devices through a base station so as to communicate with the internet. In another example, the transmission module 106 may be a Radio Frequency (RF) module, which communicates with the internet wirelessly.

Based on the document image processing method described above, this specification further provides a document image processing device. The device may include a system (including a distributed system), software (an application), a module, a component, a server, a client, and the like that uses the methods described in the embodiments of this specification, combined with the hardware necessary for implementation. Based on the same innovative concept, the processing device in one embodiment provided in this specification is as described in the following embodiment. Because the way the device solves the problem is similar to that of the method, the implementation of the specific processing device in the embodiments of this specification may refer to the implementation of the foregoing method, and repeated details are omitted. Although the devices described in the embodiments below are preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated. Specifically, fig. 8 is a schematic block diagram of an embodiment of a document image processing device that can be used on the user client side according to this specification. As shown in fig. 8, the device specifically includes:

the video data acquisition module 801 may be configured to acquire a document video;

a key area processing module 802, configured to select multiple key area images from the document video as candidate images, where the key area includes an area containing text information in the document;

a quality calculation module 803, configured to determine the image quality of the key area in the candidate image according to a preset quality algorithm;

the screening module 804 may be configured to remove the candidate images whose image quality does not meet the requirement, and determine a remaining image set;

the fusion processing module 805 may be configured to fuse the key areas in the remaining image set to obtain an enhanced document image.
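The quality calculation and screening modules (803 and 804) can be illustrated with the sketch below. The specification only refers to "a preset quality algorithm"; the horizontal-gradient sharpness score used here is an assumed stand-in, and the function names and the threshold value are hypothetical. Images are represented as plain lists of grayscale rows to keep the example self-contained.

```python
def sharpness(gray):
    """Mean absolute difference between horizontally adjacent pixels.

    An assumed, simple sharpness proxy: blurred text produces small
    differences between neighbouring pixels, sharp text produces large ones.
    """
    total = 0
    count = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            total += abs(right - left)
            count += 1
    return total / count if count else 0.0


def screen_candidates(candidates, min_quality):
    """Score every candidate key-area image and keep those meeting min_quality."""
    scored = [(image, sharpness(image)) for image in candidates]
    return [(image, quality) for image, quality in scored if quality >= min_quality]


sharp = [[0, 255, 0, 255], [255, 0, 255, 0]]            # high-contrast edges
blurry = [[120, 128, 120, 128], [128, 120, 128, 120]]   # nearly flat region

remaining = screen_candidates([sharp, blurry], min_quality=50.0)
print(len(remaining))  # only the sharp candidate survives
```

The quality values kept alongside the remaining images can later serve as the basis for the fusion weights.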

Based on the foregoing description of the embodiment, in another embodiment of the apparatus provided in this specification, the fusing the key areas in the remaining image set includes:

determining, in the remaining image set, the remaining images whose key areas point to the same area of the document;

assigning a corresponding weight to each of the remaining images for that same area according to the image quality of its key area;

and performing weighted fusion on the key areas of the remaining images in the remaining image set according to the weights.
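The grouping and weighting steps above can be sketched as follows. The specification states only that the weights depend on image quality; normalizing the quality scores so that the weights for each area sum to 1 is an assumption, as are the region and frame identifiers.

```python
from collections import defaultdict


def weights_by_region(key_areas):
    """Group key areas by the document region they show and weight them by quality.

    key_areas: iterable of (region_id, image_id, quality) tuples.
    Returns {region_id: {image_id: weight}}, weights summing to 1 per region.
    """
    groups = defaultdict(list)
    for region_id, image_id, quality in key_areas:
        groups[region_id].append((image_id, quality))

    weights = {}
    for region_id, members in groups.items():
        total = sum(quality for _, quality in members)
        if total > 0:
            weights[region_id] = {img: quality / total for img, quality in members}
        else:
            # Assumed fallback: equal weights when no quality information exists.
            weights[region_id] = {img: 1.0 / len(members) for img, _ in members}
    return weights


key_areas = [
    ("total", "frame_3", 3.0),
    ("total", "frame_7", 1.0),
    ("payer", "frame_3", 2.0),
]
weights = weights_by_region(key_areas)
print(weights["total"])  # the sharper frame_3 receives the larger weight
```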

Based on the foregoing description of the embodiment, in another embodiment of the apparatus provided in this specification, the key area image includes a sub area containing text information determined from a frame image in the document video.

Based on the foregoing description of the embodiments, in another embodiment of the apparatus provided in this specification, the performing weighted fusion on the key areas of the remaining images in the remaining image set according to the weights includes:

and performing weighted fusion separately on each of the three RGB channels of the pixels in the key areas of the remaining images for the same area, according to the weights of those remaining images.
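A minimal sketch of per-channel weighted fusion is given below. It assumes that the key areas have already been aligned to the same size and position, which the fusion step requires but this example does not perform. The function name `fuse_rgb` and the pixel layout (rows of `(r, g, b)` tuples) are illustrative choices.

```python
def fuse_rgb(key_areas, weights):
    """Fuse aligned, same-size RGB key areas pixel by pixel.

    key_areas: list of images, each a list of rows of (r, g, b) tuples.
    weights: one non-negative weight per image; normalized here to sum to 1.
    """
    if not key_areas or len(key_areas) != len(weights):
        raise ValueError("need one weight per key area")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    norm = [w / total for w in weights]

    height = len(key_areas[0])
    width = len(key_areas[0][0])
    fused = []
    for y in range(height):
        row = []
        for x in range(width):
            # Each of the three channels is fused independently.
            pixel = tuple(
                round(sum(w * area[y][x][c] for area, w in zip(key_areas, norm)))
                for c in range(3)
            )
            row.append(pixel)
        fused.append(row)
    return fused


area_a = [[(200, 100, 0)]]   # key area from the higher-quality frame
area_b = [[(100, 100, 40)]]  # same area from a lower-quality frame
fused = fuse_rgb([area_a, area_b], weights=[3.0, 1.0])
print(fused)  # [[(175, 100, 10)]]
```

With weights 3:1, the fused pixel lies three quarters of the way toward the higher-quality frame in every channel.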

Based on the foregoing description of the embodiments, this specification provides another embodiment of the apparatus, where the apparatus further includes a character recognition module, which can be used to recognize the text information in the enhanced document image by optical character recognition.

Based on the description of the foregoing embodiment, in another embodiment of the apparatus provided in this specification, the apparatus may further include a shooting guidance module, which may be configured to display video shooting guidance information of a document required for online claim settlement at a client, and acquire a video of the claim document shot according to the video shooting guidance information.

FIG. 9 is a block diagram of another embodiment of an image processing apparatus for documents provided in the present specification. Based on the foregoing description of the embodiments, in another embodiment of the apparatus provided in the present specification, the apparatus may further include:

the feedback module 806 may be configured to determine whether the shot document meets the requirement according to the recognized text information; and performing corresponding service processing according to the judgment result.

Based on the description of the foregoing embodiment, in another embodiment of the apparatus provided in this specification, the determining whether the photographed claim document meets the claim requirement according to the recognized text information includes at least one of the following:

judging whether the claim settlement document has missing information or not according to the identified text information;

judging whether a missing non-shot claim settlement document exists according to the identified text information;

judging whether the type of the claim settlement document is correct or not according to the recognized text information;

and judging whether the claim object name on the claim document corresponds to the claim object name in the claim request according to the identified text information.

The document image processing device provided by the embodiment of the specification can determine a key area containing characters from a document video by guiding a user to shoot the document video. And then, image quality can be enhanced aiming at the key areas, and after a plurality of key areas are fused, a document image with enhanced image quality can be obtained. The enhanced document image quality is higher, and the accuracy of character recognition in the document is improved. In the insurance claim settlement scene, the embodiment provided by the specification can improve the accuracy of character recognition in the document shot by the mobile device, improve the automatic processing process of the claim settlement process and improve the claim settlement processing efficiency.

It should be noted that the apparatus described above in the embodiments of the present disclosure may also include other embodiments according to the description of the related method embodiments. The specific implementation manner may refer to the description of the method embodiment, and is not described in detail herein.

The document image processing method or apparatus provided in the embodiments of the present specification may be implemented in a computer by a processor executing corresponding program instructions. For example, it may be implemented on a PC using C++ on the Windows operating system, implemented using the application design languages of Linux, Android, or iOS systems together with the necessary hardware, or implemented based on the processing logic of a quantum computer. In particular, the present description provides an embodiment of an image processing apparatus that may implement the above method. The processing apparatus may include a processor and a memory for storing processor-executable instructions, and the processor, when executing the instructions, implements:

acquiring a document video;

The processing device may be the client or the server. It should be noted that, the processing device described above in this embodiment of the present disclosure may also include other implementations according to the description of the related method embodiment. The specific implementation manner may refer to the description of the method embodiment, and is not described in detail herein.

The instructions described above may be stored in a variety of computer-readable storage media. A computer-readable storage medium may include a physical device for storing information, in which the information is typically digitized and then stored using electrical, magnetic, or optical media. The computer-readable storage medium according to this embodiment may include: devices that store information using electrical energy, such as various types of memory, e.g., RAM and ROM; devices that store information using magnetic energy, such as hard disks, floppy disks, tapes, core memories, bubble memories, and USB disks; and devices that store information optically, such as CDs or DVDs. Of course, there are other kinds of readable storage media, such as quantum memory and graphene memory. The instructions in the devices, servers, clients, or systems described below are as described above.

Based on the foregoing, embodiments of the present specification further provide a client, which may include a display screen, a camera, a processor, and a memory storing processor-executable instructions. The display screen may include a touch screen, a liquid crystal display, a projection device, and the like for displaying information content. The client type can comprise a mobile terminal, a special document collecting device, a vehicle-mounted interaction device, a personal computer and the like.

The camera is used for video shooting of the document;

the display screen is used to display shooting information, and the processor, when executing the instructions, can implement the following:

acquiring a document video;

Based on the foregoing, embodiments of the present specification further provide a server, which may include a processor and a memory for storing processor-executable instructions, where the processor executes the instructions to implement:

acquiring a document video;

It should be noted that, the client, the server, the processing device, or the like described above in this embodiment of the present disclosure may also include other embodiments according to the description of the related method embodiment, such as recognizing the text information in the enhanced document image by using an optical character recognition method. The specific implementation manner may refer to the description of the method embodiment, and is not described in detail herein.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the hardware + program class embodiment, since it is substantially similar to the method embodiment, the description is simple, and the relevant points can be referred to the partial description of the method embodiment.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Although the embodiments of the present specification refer to operations and data descriptions such as data acquisition, transmission, interaction, computation, and judgment, for example the model structure of a generative adversarial network, the OCR algorithm, the image quality calculation method, video data capture and acquisition, and pixel-level image fusion, the embodiments of the present specification are not limited to situations that necessarily conform to industry communication standards, standard machine learning models, standard image data processing protocols, communication protocols, and standard data models/templates, or to the situations described in the embodiments of the present specification. Implementations that use certain industry standards, or that are slightly modified from the described embodiments in a custom manner, may also achieve effects that are the same as, equivalent to, or similar to those of the above-described embodiments, or other foreseeable effects after modification. Embodiments using such modified or transformed data acquisition, storage, judgment, and processing may still fall within the scope of the alternative embodiments of the present description.

In the 1990s, an improvement to a technology could be clearly distinguished as either an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, as technology advances, many of today's method flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Thus, it cannot be said that an improvement to a method flow cannot be realized by hardware physical modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer "integrates" a digital system onto a single PLD by programming it, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Furthermore, instead of manually making integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development, and the source code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained by merely programming the method flow into an integrated circuit using one of the hardware description languages described above.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be implemented by logically programming the method steps such that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be considered a hardware component, and the means included therein for performing the various functions may also be considered structures within the hardware component. Or the means for performing the functions may even be regarded as both software modules for performing the method and structures within the hardware component.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a vehicle-mounted human-computer interaction device, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

Although embodiments of the present description provide method steps as described in embodiments or flowcharts, more or fewer steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an actual apparatus or end product executes, it may execute sequentially or in parallel (e.g., parallel processors or multi-threaded environments, or even distributed data processing environments) according to the method shown in the embodiment or the figures. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded.

For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, in implementing the embodiments of the present description, the functions of each module may be implemented in one or more software and/or hardware, or a module implementing the same function may be implemented by a combination of multiple sub-modules or sub-units, and the like. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.

As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.

The embodiments of this specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The described embodiments may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The embodiments in the present specification are described in a progressive manner; for parts that are the same or similar among the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, its description is relatively brief, and reference may be made to the corresponding parts of the method embodiment for the relevant points. In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the specification. In this specification, such schematic uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided that they do not contradict one another.

The above description is only an example of the embodiments of the present disclosure, and is not intended to limit the embodiments of the present disclosure. Various modifications and variations to the embodiments described herein will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the embodiments of the present specification should be included in the scope of the claims of the embodiments of the present specification.

Claims

1. A method of image processing of a document, the method comprising:

acquiring a document video;

2. The method of claim 1, the fusing key regions in the set of remaining images comprising:

3. The method of claim 1 or 2, wherein the key area image comprises a sub-area containing text information determined from a frame image in the document video.

4. The method of claim 2, wherein the weighted fusing of key regions of the remaining images of the set of remaining images by weight comprises:

5. The method of claim 1, wherein acquiring the document video comprises:

displaying, at a client, video shooting guide information for a document required for online claim settlement;

and acquiring a video of the claim settlement document shot according to the video shooting guide information.

6. The method of claim 1, further comprising:

recognizing the text information in the enhanced document image by using optical character recognition.

7. The method of claim 6, further comprising:

judging whether the shot document meets the requirements according to the recognized text information;

and performing corresponding service processing according to the judgment result.

8. An apparatus for image processing of a document, the apparatus comprising:

the video data acquisition module is used for acquiring document videos;

9. The apparatus of claim 8, the fusing key regions in the set of remaining images comprising:

10. The apparatus of claim 8 or 9, the key area image comprising a sub-area containing text information determined from a frame image in the document video.

11. The apparatus of claim 9, the weighted fusing by weight key regions of remaining images in the set of remaining images comprising:

12. The apparatus of claim 8, further comprising: a character recognition module for recognizing the text information in the enhanced document image by using optical character recognition.

13. The apparatus of claim 8, further comprising a shooting guide module for displaying, at the client, video shooting guide information for a document required for online claim settlement, and for acquiring a video of the claim settlement document shot according to the video shooting guide information.

14. The apparatus of claim 13, the apparatus further comprising:

the feedback module is used for judging whether the shot document meets the requirements or not according to the recognized text information; and carrying out corresponding claim settlement service processing according to the judgment result.

15. A document image processing apparatus comprising a processor and a memory for storing processor-executable instructions that, when executed by the processor, implement:

acquiring a document video;

16. A client comprising a display screen, a shooting device, a processor, and a memory storing processor-executable instructions,

the shooting device is used for video shooting of the document;

acquiring a document video;

17. A server comprising a processor and a memory for storing processor-executable instructions that when executed by the processor implement:

acquiring a document video;