CN116740727A - Bill image processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116740727A
CN116740727A
Authority
CN
China
Prior art keywords
bill
subgraph
placement
image
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310789879.1A
Other languages
Chinese (zh)
Inventor
苏绪平
杜伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhizhen Artificial Intelligence Technology Shanghai Co ltd
Shanghai Xiaoi Robot Technology Co Ltd
Original Assignee
Zhizhen Artificial Intelligence Technology Shanghai Co ltd
Shanghai Xiaoi Robot Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhizhen Artificial Intelligence Technology Shanghai Co ltd, Shanghai Xiaoi Robot Technology Co Ltd filed Critical Zhizhen Artificial Intelligence Technology Shanghai Co ltd
Priority to CN202310789879.1A
Publication of CN116740727A
Legal status: Pending

Classifications

    • G06V 30/148 Segmentation of character regions
    • G06V 30/15 Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/764 Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/82 Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06V 30/1801 Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
    • G06V 30/19173 Classification techniques
    • G06V 30/42 Document-oriented image-based pattern recognition based on the type of document

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Character Input (AREA)

Abstract

Embodiments of the invention disclose a bill image processing method and device, an electronic device, and a storage medium. The method comprises the following steps: acquiring a bill image to be processed; classifying each pixel point in the bill image as foreground or background, and segmenting bill subgraphs corresponding to the independent bills in the bill image according to the classification results; performing perspective transformation on each bill subgraph according to the position coordinates of its edge contour points to obtain a corresponding regular placement subgraph; and performing optical character recognition on the regular placement subgraph and generating the corresponding forward placement subgraph according to the recognition result. The technical scheme of the embodiments of the invention provides a general bill processing scheme independent of background and bill type, and effectively improves bill segmentation accuracy and bill rotation-angle recognition accuracy.

Description

Bill image processing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and apparatus for processing a bill image, an electronic device, and a storage medium.
Background
With the continuing acceleration of social informatization, more and more scenarios require extracting information from physical bills for electronic recording. Current bill information extraction mainly relies on optical character recognition (OCR) technology to extract the key information in the bill image, and then extracts the required content from that key information through positional or semantic information.
To ensure the accuracy of OCR recognition, it is generally necessary to accurately segment each single bill from the bill image and to rotate it into a horizontal placement. Accordingly, the prior art is mainly based on deep learning: a rotated object detection algorithm detects the position and category of each bill in the bill image, and after rotation yields a single bill image in a horizontal state, that image is sent to the OCR module for subsequent recognition.
In the course of realizing the invention, the inventors found that prior-art implementations achieve good segmentation and rotation only for bill images of known bill types. When a bill image of an unknown type is input, the rotated object detection algorithm cannot detect the bill's rotation angle, which reduces the accuracy of OCR recognition and, in turn, the accuracy of extracting key information based on bill positions.
Disclosure of Invention
Embodiments of the invention provide a bill image processing method and device, an electronic device, and a storage medium, which realize a general bill processing scheme independent of background and bill type, and improve bill segmentation precision and bill rotation-angle recognition accuracy.
According to an aspect of the embodiments of the present invention, there is provided a bill image processing method, including:
acquiring a bill image to be processed, wherein the bill image comprises at least one independent bill;
classifying each pixel point in the bill image as foreground or background, and segmenting bill subgraphs corresponding to the independent bills in the bill image according to the classification results;
performing perspective transformation on the bill subgraph according to the position coordinates of each edge contour point in the bill subgraph to obtain a regular placement subgraph corresponding to the bill subgraph;
and carrying out optical character recognition on the regular placement subgraph, and generating a forward placement subgraph corresponding to the regular placement subgraph according to a recognition result.
According to another aspect of the embodiments of the present invention, there is provided a bill image processing apparatus, including:
the bill image acquisition module is used for acquiring a bill image to be processed, wherein the bill image comprises at least one independent bill;
the bill subgraph segmentation module is used for classifying each pixel point in the bill image as foreground or background, and segmenting bill subgraphs corresponding to the independent bills in the bill image according to the classification results;
the perspective transformation module is used for performing perspective transformation on the bill subgraph according to the position coordinates of each edge contour point in the bill subgraph to obtain a regular placement subgraph corresponding to the bill subgraph;
and the forward placement subgraph generating module is used for performing optical character recognition on the regular placement subgraph and generating a forward placement subgraph corresponding to the regular placement subgraph according to the recognition result.
According to another aspect of the embodiment of the present invention, there is also provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, to enable the at least one processor to perform the bill image processing method according to any of the embodiments of the present invention.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium storing computer instructions for causing a processor to implement the bill image processing method according to any of the embodiments of the present invention.
According to the technical solution of the embodiments of the present invention, a bill image to be processed is acquired; each pixel point in the bill image is classified as foreground or background, and bill subgraphs corresponding to the independent bills are segmented from the bill image according to the classification results; perspective transformation is applied to each bill subgraph according to the position coordinates of its edge contour points to obtain a corresponding regular placement subgraph; and optical character recognition is performed on the regular placement subgraph to generate the corresponding forward placement subgraph according to the recognition result. These technical means can accurately and effectively segment the bill subgraph of each independent bill from a bill image with any color background, and can recognize an accurate bill rotation angle for any kind of bill and any shooting angle. This facilitates the accuracy of subsequent OCR recognition and of key information extraction, and meets image information extraction needs in a variety of scenarios.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a bill image processing method according to the first embodiment of the present invention;
FIG. 2 is a flowchart of a bill image processing method according to the second embodiment of the present invention;
FIG. 3 is an algorithm framework diagram of a foreground-background segmentation model to which the technical solution of the present invention is applicable;
FIG. 4 is a schematic diagram of a bill mask map to which the technical solution of the embodiment of the present invention is applicable;
FIG. 5 is a flowchart of a bill image processing method according to the third embodiment of the present invention;
FIG. 6 is a schematic diagram of an implementation of obtaining the edge contours of a bill subgraph, to which the technical solution of the embodiment of the present invention is applicable;
FIG. 7 is a schematic diagram of an implementation of generating a regular placement subgraph from a bill subgraph through perspective transformation, to which the technical solution of the embodiment of the present invention is applicable;
FIG. 8 is a schematic diagram of an implementation of generating the forward placement subgraph corresponding to a regular placement subgraph by optical character recognition, according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a bill image processing apparatus according to the fourth embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an electronic device implementing a bill image processing method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a bill image processing method according to the first embodiment of the present invention. This embodiment is applicable to segmenting bill subgraphs corresponding to one or more independent bills from a bill image and rotating each bill subgraph into a forward placement. As shown in fig. 1, the method includes:
S110, acquiring a bill image to be processed, wherein the bill image comprises at least one independent bill.
The bill image is an image comprising at least one independent bill. Specifically, the bill image to be processed can be obtained by real-time shooting, network download, image processing, or the like.
Specifically, the bill image may be an image shot by a user at any shooting angle, and may have any shooting background. Meanwhile, the individual bills contained in the bill image may be of any type, which is not limited in this embodiment. That is, the embodiments of the present invention are applicable to general bill processing scenarios regardless of background, shooting angle, and bill type.
S120, classifying each pixel point in the bill image as foreground or background, and segmenting bill subgraphs corresponding to the independent bills in the bill image according to the classification results.
In this embodiment, in order to effectively segment independent bills against any background, foreground/background classification is performed with each pixel point of the bill image as the unit. That is, for each pixel point, a classification result is obtained indicating whether that pixel is foreground or background. The foreground can be understood as the parts of the image belonging to the bills, and the background as the surface on which the individual bills are placed.
Specifically, each pixel point in the bill image can be classified as foreground or background by any of various foreground/background detection techniques.
After the classification of every pixel point is completed, the pixel points classified as foreground (which may be referred to as target pixel points) can be clustered according to their positional relationships, and the bill subgraphs corresponding to the independent bills can be separated from the bill image according to the clustering result.
Specifically, contour recognition can be performed on each cluster, and the bill subgraph corresponding to each recognized contour segmented from the bill image. Each bill subgraph corresponds to an independent bill.
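The clustering of foreground pixel points into per-bill regions described above can be sketched as a simple connected-component search. This is an illustrative stand-in, not the patent's specified algorithm; 4-connectivity on a binary mask is an assumption:

```python
from collections import deque

def label_components(mask):
    """Group foreground pixels (value 1) into connected components via BFS.

    `mask` is a list of lists of 0/1; returns a list of components, each a
    list of (row, col) coordinates. 4-connectivity is assumed here.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] == 1 and not seen[r][c]:
                comp, queue = [], deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(comp)
    return components
```

Each returned component corresponds to one candidate independent bill, whose contour can then be extracted for segmentation.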
S130, performing perspective transformation on the bill subgraph according to the position coordinates of each edge contour point in the bill subgraph to obtain a regular placement subgraph corresponding to the bill subgraph.
In this embodiment, since the shooting angle of the bill image is unconstrained, each bill subgraph may be oriented at any angle and may exhibit various deformations. For example, a single bill is itself a regular rectangle, but the resulting bill subgraph is an irregular quadrilateral because of the inclined shooting angle.
To ensure that each bill subgraph can be accurately placed in the forward direction, this embodiment proposes performing a perspective transformation on the bill subgraph before optical character recognition.
A perspective transformation rotates the projection surface (perspective surface) by a certain angle around its trace line (the perspective axis), under the condition that the perspective center, image point, and target point remain collinear, altering the original projecting beam while keeping the projective geometry on the projection surface unchanged. In this embodiment, the perspective transformation mainly serves one purpose: constraining the possibly deformed bill subgraph to a horizontally and vertically placed geometric shape (typically a rectangle or square). The bill subgraph is thereby regularized, yielding the corresponding regular placement subgraph.
That is, since the regular placement subgraph is a horizontally and vertically placed geometric shape, its rotation angle relative to the forward placement can only be one of 0°, 90°, 180°, or 270°. This greatly reduces the number of candidate rotation angles to discriminate among subsequently.
In this embodiment, if the bill is determined to be a regular rectangle, the perspective transformation may be implemented as follows: first, the coordinates of the four corner points of the bill subgraph are determined from the position coordinates of its edge contour points; then, a transformation matrix is determined from the relation between these four corner coordinates and the four regular corner coordinates of the regular placement subgraph; finally, the bill subgraph is regularized through the transformation matrix to obtain the regular placement subgraph.
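The corner-to-corner transformation matrix described above can be sketched by solving the standard eight-unknown homography system. This is a generic illustration of perspective transformation, not the patent's specific implementation (libraries such as OpenCV provide `getPerspectiveTransform`/`warpPerspective` for the same task):

```python
import numpy as np

def perspective_matrix(src, dst):
    """Solve for the 3x3 homography H mapping four src corners to four dst
    corners, with H[2][2] fixed to 1 (the usual normalization)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, pt):
    """Apply the homography to one point (homogeneous divide included)."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)
```

Applying the resulting matrix to every pixel of the bill subgraph (as `warpPerspective` does) yields the regular placement subgraph.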
S140, performing optical character recognition on the regular placement subgraph, and generating a forward placement subgraph corresponding to the regular placement subgraph according to the recognition result.
As described above, the rotation angle of the regular placement subgraph is one of 0°, 90°, 180°, or 270°, so its specific rotation angle needs to be further determined in order to finally adjust the regular placement subgraph into the forward placement subgraph.
In this embodiment, optical character recognition may first be performed on the regular placement subgraph to obtain a text recognition result, and the rotation angle of the regular placement subgraph relative to the horizontal direction may then be determined by checking the rotation angle of the text recognition result.
Any single text box may be identified in the regular placement subgraph, and the rotation angle of the subgraph relative to the horizontal direction determined from that text box's rotation angle. Alternatively, all text boxes may be identified, the rotation angle of each relative to the horizontal direction obtained accordingly, and the angle shared by the majority of text boxes taken as the rotation angle of the regular placement subgraph; this embodiment is not limited in this respect.
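The majority-vote determination of the subgraph's rotation angle can be sketched as follows. The snapping of each measured text-box angle to the nearest of the four candidates is an assumption for illustration; the patent only specifies that the angle shared by most text boxes is used:

```python
from collections import Counter

def estimate_rotation(text_box_angles):
    """Snap each OCR text-box angle (degrees) to the nearest of 0/90/180/270
    and take the majority vote as the rotation of the regular placement subgraph."""
    def circular_dist(t, a):
        d = abs(t % 360 - a)
        return min(d, 360 - d)
    snapped = [min((0, 90, 180, 270), key=lambda a: circular_dist(t, a))
               for t in text_box_angles]
    return Counter(snapped).most_common(1)[0][0]
```

Rotating the regular placement subgraph by the negative of this angle then yields the forward placement subgraph.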
According to the technical solution of the embodiments of the present invention, a bill image to be processed is acquired; each pixel point in the bill image is classified as foreground or background, and bill subgraphs corresponding to the independent bills are segmented from the bill image according to the classification results; perspective transformation is applied to each bill subgraph according to the position coordinates of its edge contour points to obtain a corresponding regular placement subgraph; and optical character recognition is performed on the regular placement subgraph to generate the corresponding forward placement subgraph according to the recognition result. These technical means can accurately and effectively segment the bill subgraph of each independent bill from a bill image with any color background, and can recognize an accurate bill rotation angle for any kind of bill and any shooting angle. This facilitates the accuracy of subsequent OCR recognition and of key information extraction, and meets image information extraction needs in a variety of scenarios.
Example two
Fig. 2 is a flowchart of a bill image processing method according to the second embodiment of the present invention. This embodiment refines the foregoing embodiment by detailing the operation of "classifying each pixel point in the bill image as foreground or background, and segmenting bill subgraphs corresponding to the independent bills in the bill image according to the classification results".
Accordingly, as shown in fig. 2, the method includes:
S210, acquiring a bill image to be processed, wherein the bill image comprises at least one independent bill.
S220, performing convolution and pooling on the bill image at the target resolution a plurality of times to obtain bill feature maps at a plurality of downsampling resolutions.
In this embodiment, each pixel point in the bill image is classified step by step as foreground or background using a segmentation algorithm from deep learning.
The target resolution is the image resolution of the bill image to be processed. In order to better extract image features at different scales for subsequent processing, this embodiment performs convolution and pooling on the bill image at the target resolution a plurality of times, i.e. downsamples the bill image a plurality of times, to obtain bill feature maps at a plurality of downsampling resolutions.
The downsampling resolutions differ from one another and are all smaller than the target resolution. Typically, the bill image may be fed through a plurality of sequentially connected downsampling modules to obtain the bill feature maps at the various downsampling resolutions. Each downsampling module comprises a connected convolution layer and pooling layer, and outputs a bill feature map at its set downsampling resolution.
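A single downsampling module's effect on feature-map shape can be sketched as follows. The random channel projection merely stands in for a learned convolution (a real model uses trained 3×3 kernels), so only the shapes, not the values, are meaningful:

```python
import numpy as np

def down_block(feat, out_channels, seed=0):
    """One downsampling module sketch: an untrained channel projection standing
    in for the convolution layer, followed by 2x2 max pooling, on an
    H x W x C feature map (H and W assumed even)."""
    h, w, c = feat.shape
    rng = np.random.default_rng(seed)
    # stand-in projection to change the channel count (not a learned 3x3 conv)
    conv = feat @ rng.standard_normal((c, out_channels))
    # 2x2 max pooling halves the spatial resolution
    pooled = conv.reshape(h // 2, 2, w // 2, 2, out_channels).max(axis=(1, 3))
    return pooled
```

Chaining four such blocks with doubling channel counts reproduces the 2×, 4×, 8×, 16× downsampling-resolution progression described later for the 512×512 input.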
S230, performing deconvolution and channel stitching on the bill feature maps at the several downsampling resolutions a plurality of times to obtain a multichannel feature map at the target resolution.
In this embodiment, obtaining the bill feature maps at multiple downsampling resolutions amounts to extracting high-level features from the bill image. Subsequently performing deconvolution and channel stitching on these feature maps a plurality of times amounts to upsampling each feature map again, magnifying the previously extracted high-level features and enlarging the receptive field.
In this embodiment, to avoid the information loss caused by the deconvolution process, the bill feature map at each downsampling resolution generated during downsampling is stitched, in the channel direction, with the corresponding feature map obtained during upsampling, finally yielding the multichannel feature map at the target resolution.
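The upsampling-plus-channel-stitching step can be sketched with nearest-neighbour upsampling standing in for the learned deconvolution the text describes; only the shape bookkeeping is meant to be faithful:

```python
import numpy as np

def up_block(feat, skip):
    """Sketch of one upsampling module: 2x nearest-neighbour upsampling
    (a stand-in for learned deconvolution) followed by channel-direction
    concatenation with the same-resolution downsampling feature map."""
    up = feat.repeat(2, axis=0).repeat(2, axis=1)  # double H and W
    assert up.shape[:2] == skip.shape[:2], "skip connection must match resolution"
    return np.concatenate([up, skip], axis=-1)    # stitch along the channel axis
```

The concatenation along the last (channel) axis is exactly the "stitching in the channel direction" that preserves detail otherwise lost during deconvolution.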
S240, carrying out convolution and classification processing on the multichannel feature map of the target resolution to obtain a single-channel feature map of the target resolution.
The single-channel feature map describes, for each pixel point in the bill image, the probability that it belongs to the foreground or the background.
In this embodiment, the multichannel feature map at the target resolution may be passed sequentially through a convolution layer and a classification function layer (typically sigmoid) to obtain the single-channel feature map at the target resolution.
Alternatively, the bill image with the target resolution may be input into a pre-trained foreground and background segmentation model, and the foreground and background segmentation model outputs a single-channel feature map with the target resolution.
FIG. 3 shows an algorithm framework diagram of a foreground-background segmentation model to which the technical solution of the embodiment of the present invention is applicable. As shown in fig. 3, the downward arrows represent the downsampling process, the upward arrows represent the upsampling process, and the horizontal solid arrows represent stitching in the channel direction.
Specifically, the target resolution of the original bill image is 512×512 with three RGB channels, so the bill image can be expressed as 512×512×3. The number of channels may first be expanded by a convolution layer, giving 512×512×64. Convolution and pooling are then performed sequentially by four successive downsampling modules, i.e. 4 rounds of downsampling, yielding bill feature maps at four downsampling resolutions: 2× (256×256×128), 4× (128×128×256), 8× (64×64×512) and 16× (32×32×1024). Deconvolution then produces upsampled feature maps and, to avoid the information loss caused by the deconvolution process, the feature map at each downsampling resolution from the downsampling stage is stitched in the channel direction with the upsampled feature map at the same resolution. Finally, the upsampled multichannel feature map at the target resolution (512×512×64) is passed through a convolutional neural network (CNN) layer and a classification function layer (typically using the sigmoid function as the activation function) to obtain the single-channel feature map at the target resolution (512×512×1).
Each feature point in the single-channel feature map represents the probability that the corresponding pixel point in the bill image belongs to the foreground or the background.
S250, converting the bill image into a bill mask diagram according to the classification result, wherein each target pixel point of target pixel values in the bill mask diagram is used for identifying an independent bill.
Specifically, the bill image can be converted into the bill mask map by assigning different pixel values to the pixels classified as foreground and the pixels classified as background: a target pixel value of 255 may be assigned to each pixel classified as foreground, and a pixel value of 0 to each pixel classified as background. Each target pixel point holding the target pixel value in the bill mask map is then used for identifying the image of an independent bill.
By way of example, and not limitation, a schematic diagram of a bill mask diagram to which the technical solution of the embodiment of the present invention is applicable is shown in fig. 4. As shown in fig. 4, by marking the pixels classified as foreground as white and the pixels classified as background as black, the image position of the target pixel corresponding to the independent bill can be effectively determined.
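A minimal sketch of this probability-map-to-mask conversion (the 0.5 threshold and the NumPy-based implementation are illustrative assumptions; the text itself only fixes the foreground value 255 and background value 0):

```python
import numpy as np

def to_mask(prob_map, threshold=0.5, fg=255, bg=0):
    """Binarize a single-channel foreground-probability map into a bill mask map."""
    return np.where(prob_map >= threshold, fg, bg).astype(np.uint8)

probs = np.array([[0.9, 0.2],
                  [0.6, 0.1]])
mask = to_mask(probs)  # → [[255, 0], [255, 0]]
```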
And S260, carrying out corrosion treatment on each target pixel point in the bill mask diagram, and dividing bill subgraphs corresponding to independent bills in the bill image according to the bill mask diagram after corrosion treatment.
In this embodiment, if two independent bills are spaced too closely, their images may become connected in the bill mask map, which is disadvantageous for the subsequent image segmentation and edge detection. Therefore, after obtaining the bill mask map, a morphological erosion operation may be applied to it; the bill subgraphs, each corresponding to one independent bill and clearly separated from the others, can then be segmented from the original bill image according to the eroded bill mask map.
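Erosion itself is standard morphology; the following pure-NumPy loop is only a definitional sketch (`erode` is our own name, and in practice this step would typically be done with `cv2.erode` and a structuring element):

```python
import numpy as np

def erode(mask, k=3):
    """Morphological erosion with a k×k square structuring element:
    a pixel stays foreground (255) only if its whole k×k neighbourhood is foreground."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=0)
    out = np.zeros_like(mask)
    h, w = mask.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].min()
    return out

mask = np.full((5, 5), 255, dtype=np.uint8)  # a solid foreground block
eroded = erode(mask)  # only the interior 3×3 region survives
```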
S270, performing perspective transformation on the bill subgraph according to the position coordinates of each edge contour point in the bill subgraph to obtain a regular placement subgraph corresponding to the bill subgraph.
S280, carrying out optical character recognition on the regular placement subgraph, and generating a forward placement subgraph corresponding to the regular placement subgraph according to a recognition result.
According to the technical scheme of this embodiment, the bill image with the target resolution is subjected to multiple rounds of convolution and pooling to obtain bill feature maps at a plurality of downsampling resolutions; the bill feature maps at the plurality of downsampling resolutions are subjected to multiple rounds of deconvolution and channel splicing to obtain a multi-channel feature map at the target resolution; and the multi-channel feature map at the target resolution is subjected to convolution and classification to obtain a single-channel feature map describing, for each pixel point in the bill image, the probability that the pixel belongs to the foreground or the background. The classification result of each pixel point as foreground or background can thereby be accurately determined, providing effective data preparation for accurate segmentation of independent bills and extraction of their edges. Meanwhile, introducing morphological erosion before the bill subgraph segmentation operation can further improve the accuracy of the subsequent image segmentation and edge detection.
Example III
Fig. 5 is a flowchart of a bill image processing method according to a third embodiment of the present invention, which is optimized on the basis of the foregoing embodiments. In this embodiment, the steps of "performing perspective transformation on the bill subgraph according to the position coordinates of each edge contour point in the bill subgraph to obtain a regular placement subgraph corresponding to the bill subgraph" and "performing optical character recognition on the regular placement subgraph and generating a forward placement subgraph corresponding to the regular placement subgraph according to the recognition result" are refined into a concrete implementation.
Accordingly, as shown in fig. 5, the method specifically may include:
S510, acquiring a bill image to be processed, wherein the bill image comprises at least one independent bill.
S520, classifying the foreground or the background of each pixel point in the bill image, and dividing bill subgraphs corresponding to the independent bills in the bill image according to classification results.
S530, acquiring the position coordinates of each edge contour point in the bill subgraph by adopting a set edge contour searching algorithm.
In this embodiment, the edge information of each bill subgraph segmented from the bill image may first be determined using a set edge detection operator (e.g., the Canny operator). In particular, a schematic implementation of obtaining the edge contour of a bill subgraph is shown in fig. 6.
Accordingly, after the edge information of one or more bill subgraphs is acquired, the set of position coordinates of all edge contour points included in each bill subgraph can be acquired through a method provided by the OpenCV tool for searching for image edge contours.
S540, acquiring a plurality of vertex position coordinates corresponding to the bill subgraph according to the position coordinates of each edge contour point in the bill subgraph.
In this embodiment, after the position coordinates of each edge contour point in the bill subgraph are obtained, the extreme points included in the bill subgraph may be identified according to the actual shape of the independent bill and the position coordinates of each edge contour point, and taken as the plurality of vertex position coordinates corresponding to the bill subgraph.
Specifically, if the shape of the independent bill is a rectangle, identifying a lower left corner extreme point, an upper left corner extreme point, a lower right corner extreme point and an upper right corner extreme point included in the bill subgraph according to the position coordinates of each edge contour point, and taking the position coordinates of the four extreme points as the position coordinates of four vertexes of the rectangle.
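For a rectangular bill, the four extreme points can be picked out of the contour with the standard sum/difference heuristic (a sketch only; the patent does not prescribe this particular method, and `four_corners` is our own name):

```python
import numpy as np

def four_corners(points):
    """points: (N, 2) array of (x, y) contour coordinates.
    Returns top-left, top-right, bottom-right, bottom-left corners,
    assuming an image coordinate system with y growing downward."""
    pts = np.asarray(points, dtype=float)
    s = pts.sum(axis=1)               # x + y: min at top-left, max at bottom-right
    d = np.diff(pts, axis=1).ravel()  # y - x: min at top-right, max at bottom-left
    return pts[s.argmin()], pts[d.argmin()], pts[s.argmax()], pts[d.argmax()]

pts = [(0, 0), (5, 0), (10, 0), (10, 10), (0, 10), (5, 10)]
tl, tr, br, bl = four_corners(pts)
```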
S550, determining a standard transformation size corresponding to the bill subgraph according to the plurality of vertex position coordinates, and determining a rotation matrix for perspective transformation according to the plurality of vertex position coordinates and the standard transformation size.
As previously described, if the individual notes are rectangular in shape, the standard transformation size may be the height and width of a rectangle.
Specifically, the relative distance between every two adjacent vertices can be calculated according to the position coordinates of the plurality of vertices, and the height and the width of the rectangle can be determined according to the calculated relative distances.
In a specific example, if four sequentially adjacent vertices are a, b, c, and d, respectively, the relative distance e1 between a and b, the relative distance e2 between b and c, the relative distance e3 between c and d, and the relative distance e4 between d and a may be calculated. Taking the average e5 of e1 and e3, and the average e6 of e2 and e4, the smaller value of e5 and e6 can be taken as the height of the rectangle, and the larger value of e5 and e6 can be taken as the width of the rectangle.
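The averaging rule of this example can be sketched directly (a minimal illustration; `rect_size` is our own name):

```python
from math import dist  # Euclidean distance, Python 3.8+

def rect_size(a, b, c, d):
    """Estimate (width, height) of a quadrilateral with vertices a, b, c, d
    given in adjacent order, following the averaging rule above."""
    e1, e2, e3, e4 = dist(a, b), dist(b, c), dist(c, d), dist(d, a)
    e5 = (e1 + e3) / 2  # average of one pair of opposite sides
    e6 = (e2 + e4) / 2  # average of the other pair
    return max(e5, e6), min(e5, e6)  # larger value is the width, smaller the height

w, h = rect_size((0, 0), (20, 0), (20, 10), (0, 10))  # → (20.0, 10.0)
```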
Accordingly, determining a rotation matrix for performing perspective transformation according to the plurality of vertex position coordinates and the standard transformation size may specifically include:
according to the standard transformation size, determining a regular placement position coordinate (x'i, y'i) corresponding to each vertex position coordinate (xi, yi), wherein i is an integer and i ∈ {1, 2, 3, 4};
According to each vertex position coordinate (xi, yi) and its corresponding regular placement position coordinate (x'i, y'i), and the formula
[Ui, Vi, Wi]^T = T · [xi, yi, 1]^T, with x'i = Ui / Wi and y'i = Vi / Wi,
computing the 3×3 rotation matrix T for perspective transformation.
Where (Ui, Vi, Wi) is the result obtained by converting the vertex position coordinates (xi, yi) into the mapped (homogeneous) coordinate system.
Specifically, assuming that the four vertex position coordinates corresponding to the bill subgraph are {[x1, y1], [x2, y2], [x3, y3], [x4, y4]}, and that the width w and the height h of the bill subgraph have been calculated, the regular placement position coordinates of the four vertices after perspective transformation can be determined to be {[0, 0], [w, 0], [0, h], [w, h]}. The rotation matrix required for the perspective transformation can then be solved from these two sets of coordinates.
The coordinate conversion between (Ui, Vi, Wi) and (xi, yi) can be realized through the existing conversion relation between the mapped coordinate system and the spatial coordinate system.
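The matrix itself can be recovered from the four coordinate pairs by solving an 8×8 linear system, which is what OpenCV's `cv2.getPerspectiveTransform` computes internally (a sketch under that assumption; `perspective_matrix` and the sample coordinates are our own):

```python
import numpy as np

def perspective_matrix(src, dst):
    """src, dst: four (x, y) point pairs. Returns the 3×3 matrix T with
    [U, V, W]^T = T · [x, y, 1]^T and (x', y') = (U/W, V/W)."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # x' = (a·x + b·y + c) / (g·x + h·y + 1), and similarly for y'
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp]); b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp]); b.append(yp)
    coeffs = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(coeffs, 1.0).reshape(3, 3)

src = [(1, 1), (5, 1), (1, 4), (5, 4)]  # vertex coordinates in the bill subgraph
dst = [(0, 0), (4, 0), (0, 3), (4, 3)]  # regular placement coordinates {[0,0],[w,0],[0,h],[w,h]}
T = perspective_matrix(src, dst)
```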
S560, performing perspective transformation on the bill subgraph according to the rotation matrix and the position coordinates of each edge contour point in the bill subgraph to obtain a regular placement subgraph corresponding to the bill subgraph.
In the present embodiment, after the rotation matrix is acquired, the regular placement coordinates of each pixel point in the bill subgraph can be determined. That is, the pixel value of each pixel point in the bill subgraph is mapped into the rectangular frame defined by {[0,0], [w,0], [0,h], [w,h]}, thereby generating the regular placement subgraph corresponding to the bill subgraph.
Specifically, fig. 7 shows a schematic implementation diagram of generating a regular placement subgraph from a bill subgraph through perspective transformation, applicable to the technical scheme of the embodiment of the present invention. As shown in fig. 7, after the position coordinates of the edge contour points in the bill subgraph (upper left subgraph in fig. 7) are acquired, a rotation matrix may be generated. Then, according to the rotation matrix, each pixel point in the bill subgraphs segmented from the bill image for the individual bills (upper right subgraph in fig. 7) may be mapped into a set rectangular frame, thereby obtaining the three regular placement subgraphs in the lower half of fig. 7.
S570, performing optical character recognition on the regular placement subgraph to acquire all text boxes included in the regular placement subgraph.
In this embodiment, a text detection model may be trained in advance based on a deep learning algorithm, and all text boxes included in each regular placement subgraph may be obtained by inputting each regular placement subgraph into the text detection model respectively.
S580, respectively determining the included angle value between each text box and the horizontal direction, and determining the regular placement direction matched with the regular placement subgraph according to the occurrence times of each included angle value.
In this embodiment, an angle classification model may be trained in advance, and the included angle value between each text box in the same regular placement subgraph and the horizontal direction may be determined by using the angle classification model. Optionally, the text box detection algorithm used with the angle classification model adopts the CTPN (Connectionist Text Proposal Network) algorithm, and the direction classifier adopts a lightweight MobileNetV3 network as its backbone network.
In a specific example, suppose the regular placement subgraph A includes three text boxes: text box 1, text box 2, and text box 3. If the included angle value determined for text box 1 is 90 degrees, that for text box 2 is 90 degrees, and that for text box 3 is 270 degrees, then since 90 degrees occurs more often, the regular placement direction matched with the regular placement subgraph is determined to be 90 degrees. The regular placement direction can be understood as follows: after the regular placement subgraph is rotated according to the regular placement direction, it is adjusted to a forward (upright) placement.
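The majority vote described above can be sketched in a few lines (`regular_direction` is our own name):

```python
from collections import Counter

def regular_direction(angles):
    """Pick the regular placement direction as the most frequent
    per-text-box angle estimate (a majority vote)."""
    return Counter(angles).most_common(1)[0][0]

direction = regular_direction([90, 90, 270])  # → 90
```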
Fig. 8 shows a schematic implementation diagram of generating a forward placement sub-graph corresponding to a regular placement sub-graph by using an optical character recognition technology according to the technical scheme of the embodiment of the present invention. As shown in fig. 8, by inputting different regular placement subgraphs into the text detection model and the angle classification model, respectively, the regular placement direction corresponding to each regular placement subgraph can be obtained, respectively.
In this embodiment, by acquiring all text boxes included in the regular placement subgraph, determining the included angle value between each text box and the horizontal direction, and determining the regular placement direction matched with the regular placement subgraph according to the number of occurrences of each included angle value, the multiple included angle values computed from the multiple text boxes are used for combined verification, which further improves the accuracy of determining the regular placement direction.
S590, converting the regular placement subgraph into a forward placement subgraph according to the regular placement direction.
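As a minimal sketch of this final rotation step (whether the regular placement direction denotes a clockwise or counter-clockwise rotation is a convention not fixed by the text; counter-clockwise is assumed here, and `to_forward` is our own name):

```python
import numpy as np

def to_forward(img, direction_deg):
    """Rotate the regular placement subgraph by its regular placement
    direction (a multiple of 90 degrees) to obtain the forward placement
    subgraph. Counter-clockwise rotation is assumed."""
    return np.rot90(img, k=(direction_deg // 90) % 4)
```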
According to the technical scheme of this embodiment, the bill image to be processed is acquired; each pixel point in the bill image is classified as foreground or background, and the bill subgraphs corresponding to the independent bills are segmented from the bill image according to the classification result; perspective transformation is performed on each bill subgraph according to the position coordinates of its edge contour points to obtain the corresponding regular placement subgraph; and optical character recognition is performed on the regular placement subgraph, with the corresponding forward placement subgraph generated according to the recognition result. In this way, the bill subgraph of each independent bill can be accurately and effectively segmented from a bill image with a background of any color, and an accurate bill rotation angle can be recognized for any kind of bill at any shooting angle, which benefits the accuracy of subsequent OCR recognition and of key information extraction, and satisfies the image information extraction requirements of various scenarios.
On the basis of the above embodiments, acquiring the ticket image to be processed may include:
acquiring the bill image to be processed, which is input by a target user through a human-computer interaction interface in the process of handling the target transaction;
correspondingly, after generating the forward placement subgraph corresponding to the regular placement subgraph according to the recognition result, the method further comprises the following steps:
displaying at least one forward placement subgraph to the user through the human-computer interaction interface; in response to the selection of a target forward placement subgraph by the target user, extracting, from the target forward placement subgraph, key description information matched with the current processing process of the target transaction; and continuing to execute the handling process of the target transaction according to the key description information.
In this optional implementation manner, an optional application scenario of the embodiments of the present invention is specifically defined. Typically, a specific user (the target user) may need to upload various bills while handling a transaction, such as reimbursement or coupon verification, on a self-service machine. At this time, the bill image to be processed can be uploaded through the human-computer interaction interface provided by the self-service machine. The self-service machine can then generate, through the bill image processing method according to the embodiments of the present invention, a forward placement subgraph corresponding to each individual bill in the bill image, and display these forward placement subgraphs through the human-computer interaction interface.
Then, the target user can select one or more target forward placement subgraphs from the displayed forward placement subgraphs, and the self-service machine can identify, through OCR technology, the key description information in the target forward placement subgraph matched with the current processing process of the target transaction. For example, if a coupon verification process is currently being performed, the coupon code may first be identified in the target forward placement subgraph; if the coupon code is verified to be a legal coupon, the specific value of the coupon amount may be further identified in the target forward placement subgraph so as to continue the subsequent verification process, and so on.
Through the above arrangement, in a transaction handling process involving bill image processing, a user can shoot any type of bill image to be processed with any shooting device, at any shooting angle and against any shooting background; even if a bill captured in the bill image is deformed, the deformation can be effectively corrected through the technical scheme of the embodiment of the present invention.
Meanwhile, the technical scheme of the embodiment of the present invention combines a deep learning method with statistical knowledge: the angle of each single line of characters is counted and the angle of the whole bill is then judged, so that the accuracy of bill angle judgment can be greatly improved, reaching 98.6%, which is favorable for combination with a subsequent OCR character recognition algorithm to improve the overall character recognition accuracy.
Example IV
Fig. 9 is a schematic structural diagram of a bill image processing device according to a fourth embodiment of the present invention. As shown in fig. 9, the apparatus includes: a bill image acquisition module 910, a bill subgraph segmentation module 920, a perspective transformation module 930, and a forward placement subgraph generation module 940, wherein:
a bill image obtaining module 910, configured to obtain a bill image to be processed, where the bill image includes at least one independent bill;
the bill sub-graph segmentation module 920 is configured to classify a foreground or a background of each pixel point in the bill image, and segment a bill sub-graph corresponding to an independent bill in the bill image according to a classification result;
the perspective transformation module 930 is configured to perform perspective transformation on the bill subgraph according to the position coordinates of the edge contour points in the bill subgraph, so as to obtain a regular placement subgraph corresponding to the bill subgraph;
And the forward placement sub-graph generating module 940 is configured to perform optical character recognition on the regular placement sub-graph, and generate a forward placement sub-graph corresponding to the regular placement sub-graph according to a recognition result.
According to the technical scheme of this embodiment, the bill image to be processed is acquired; each pixel point in the bill image is classified as foreground or background, and the bill subgraphs corresponding to the independent bills are segmented from the bill image according to the classification result; perspective transformation is performed on each bill subgraph according to the position coordinates of its edge contour points to obtain the corresponding regular placement subgraph; and optical character recognition is performed on the regular placement subgraph, with the corresponding forward placement subgraph generated according to the recognition result. In this way, the bill subgraph of each independent bill can be accurately and effectively segmented from a bill image with a background of any color, and an accurate bill rotation angle can be recognized for any kind of bill at any shooting angle, which benefits the accuracy of subsequent OCR recognition and of key information extraction, and satisfies the image information extraction requirements of various scenarios.
Based on the above embodiments, the bill sub-graph segmentation module 920 may be specifically configured to:
carrying out convolution and pooling treatment on the bill images with the target resolution for a plurality of times to obtain bill feature images with a plurality of downsampling resolutions;
performing multiple deconvolution and channel splicing processing on the bill feature graphs with the multiple downsampling resolutions to obtain a multi-channel feature graph with the target resolution;
carrying out convolution and classification processing on the multichannel feature map with the target resolution to obtain a single-channel feature map with the target resolution;
the single-channel feature map is used for describing probability values of each pixel point belonging to a foreground or a background in the bill image.
Based on the above embodiments, the bill sub-graph segmentation module 920 may be specifically configured to:
according to the classification result, converting the bill image into a bill mask map, wherein each target pixel point of target pixel values in the bill mask map is used for identifying independent bills;
and carrying out corrosion treatment on each target pixel point in the bill mask diagram, and dividing bill subgraphs corresponding to independent bills in the bill image according to the bill mask diagram after corrosion treatment.
Based on the above embodiments, the perspective transformation module 930 may specifically include:
the contour point coordinate acquisition unit is used for acquiring the position coordinates of each edge contour point in the bill subgraph by adopting a set edge contour searching algorithm;
the vertex coordinate acquisition unit is used for acquiring a plurality of vertex position coordinates corresponding to the bill subgraph according to the position coordinates of each edge contour point in the bill subgraph;
a rotation matrix determining unit, configured to determine a standard transformation size corresponding to the bill subgraph according to the plurality of vertex position coordinates, and determine a rotation matrix for performing perspective transformation according to the plurality of vertex position coordinates and the standard transformation size;
and the perspective transformation unit is used for carrying out perspective transformation on the bill subgraph according to the rotation matrix and the position coordinates of each edge contour point in the bill subgraph to obtain a regular placement subgraph corresponding to the bill subgraph.
On the basis of the above embodiments, the independent bill may be rectangular in shape, and the standard transformation size may be the height and width of the rectangle;
accordingly, the rotation matrix determining unit may be specifically configured to:
According to the standard transformation size, determining a regular placement position coordinate (x'i, y'i) corresponding to each vertex position coordinate (xi, yi), wherein i is an integer and i ∈ {1, 2, 3, 4};
According to each vertex position coordinate (xi, yi) and its corresponding regular placement position coordinate (x'i, y'i), and the formula
[Ui, Vi, Wi]^T = T · [xi, yi, 1]^T, with x'i = Ui / Wi and y'i = Vi / Wi,
computing the 3×3 rotation matrix T for perspective transformation.
Where (Ui, Vi, Wi) is the result obtained by converting the vertex position coordinates (xi, yi) into the mapped (homogeneous) coordinate system.
Based on the above embodiments, the forward placement sub-graph generation module 940 may be specifically configured to:
performing optical character recognition on the regular placement subgraph to obtain all text boxes included in the regular placement subgraph;
respectively determining an included angle value between each text box and the horizontal direction, and determining a regular placement direction matched with the regular placement subgraph according to the occurrence times of each included angle value;
and converting the regular placement subgraph into a forward placement subgraph according to the regular placement direction.
Based on the above embodiments, the ticket image acquisition module 910 may specifically be configured to:
acquiring the bill image to be processed, which is input by a target user through a human-computer interaction interface in the process of handling the target transaction;
Correspondingly, the system also comprises a service execution module for:
after generating a forward-direction placement sub-graph corresponding to the regular placement sub-graph according to the identification result, displaying at least one forward-direction placement sub-graph through the human-computer interaction interface;
responding to the selection of a target user on a target forward-direction placement sub-graph, and extracting key description information matched with the current processing process of the target transaction from the target forward-direction placement sub-graph;
and continuing to execute the transaction of the target transaction according to the key description information.
The bill image processing device provided by the embodiment of the invention can execute the bill image processing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example five
Fig. 10 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 10, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the bill image processing method as provided by any of the embodiments of the present invention.
Namely, acquiring a bill image to be processed, wherein the bill image comprises at least one independent bill;
classifying foreground or background of each pixel point in the bill image, and dividing bill subgraphs corresponding to independent bills in the bill image according to classification results;
according to the position coordinates of each edge contour point in the bill subgraph, perspective transformation is carried out on the bill subgraph to obtain a regular placement subgraph corresponding to the bill subgraph;
and carrying out optical character recognition on the regular placement subgraph, and generating a forward placement subgraph corresponding to the regular placement subgraph according to a recognition result.
In some embodiments, the method of processing ticket images as provided by any of the embodiments of the present invention may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the processing method of ticket images as provided by any of the embodiments of the present invention described above can be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the processing method of ticket images as provided by any embodiment of the invention in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and a server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system, thereby overcoming the drawbacks of difficult management and weak service scalability in traditional physical hosts and Virtual Private Server (VPS) services.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved; this is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for processing a bill image, comprising:
acquiring a bill image to be processed, wherein the bill image comprises at least one independent bill;
classifying each pixel point in the bill image as foreground or background, and segmenting, according to the classification result, a bill subgraph corresponding to each independent bill in the bill image;
performing perspective transformation on the bill subgraph according to the position coordinates of each edge contour point in the bill subgraph to obtain a regular placement subgraph corresponding to the bill subgraph; and
performing optical character recognition on the regular placement subgraph, and generating a forward placement subgraph corresponding to the regular placement subgraph according to the recognition result.
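The segmentation-and-split step of this method can be sketched as follows. This is a minimal illustration, not the claimed implementation: the per-pixel probability map would come from a segmentation network such as the one in claim 2 (a brightness threshold stands in for it here), and the function name `split_bill_subgraphs` and the flood-fill region labelling are the editor's assumptions.

```python
import numpy as np

def split_bill_subgraphs(prob_map, threshold=0.5):
    # Threshold the per-pixel foreground probabilities, then return one
    # bounding box (y0, x0, y1, x1) per 4-connected foreground region,
    # using an iterative flood fill (numpy only, no OpenCV dependency).
    mask = prob_map > threshold
    seen = np.zeros_like(mask, dtype=bool)
    boxes = []
    h, w = mask.shape
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                stack, ys, xs = [(sy, sx)], [], []
                seen[sy, sx] = True
                while stack:
                    y, x = stack.pop()
                    ys.append(y)
                    xs.append(x)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                # One independent bill per connected foreground region
                boxes.append((min(ys), min(xs), max(ys) + 1, max(xs) + 1))
    return boxes
```

Each returned box would then be cropped from the bill image and passed to the perspective transformation and OCR steps of the later claims.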
2. The method of claim 1, wherein classifying each pixel point in the bill image as foreground or background comprises:
performing convolution and pooling on the bill image at a target resolution multiple times to obtain bill feature maps at a plurality of downsampled resolutions;
performing deconvolution and channel concatenation on the bill feature maps at the plurality of downsampled resolutions multiple times to obtain a multi-channel feature map at the target resolution; and
performing convolution and classification on the multi-channel feature map at the target resolution to obtain a single-channel feature map at the target resolution;
wherein the single-channel feature map describes, for each pixel point in the bill image, the probability that the pixel point belongs to the foreground or the background.
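A shape-level sketch of the encoder-decoder described above, using numpy stand-ins: 2x2 max pooling for the pooling layers, nearest-neighbour upsampling in place of learned deconvolution, and a random 1x1 projection in place of the trained classification convolution. All function names are assumptions; the real network would use trained convolution kernels throughout.

```python
import numpy as np

def max_pool2(x):
    # 2x2 max pooling: (C, H, W) -> (C, H/2, W/2)
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample2(x):
    # Nearest-neighbour upsampling as a stand-in for deconvolution
    return x.repeat(2, axis=1).repeat(2, axis=2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def segment(image):
    # image: (C, H, W) bill image at the target resolution
    rng = np.random.default_rng(0)
    f1 = image           # features at the target resolution
    f2 = max_pool2(f1)   # 1/2 resolution
    f3 = max_pool2(f2)   # 1/4 resolution
    # Decoder: upsample and channel-concatenate (skip connections),
    # yielding a multi-channel feature map back at the target resolution
    u2 = np.concatenate([upsample2(f3), f2], axis=0)
    u1 = np.concatenate([upsample2(u2), f1], axis=0)
    # 1x1 "convolution" to a single channel, then a per-pixel probability
    w = rng.normal(size=(u1.shape[0], 1, 1))
    return sigmoid((u1 * w).sum(axis=0, keepdims=True))
```

The output is a single-channel map at the input resolution whose values can be read as per-pixel foreground probabilities, matching the role of the single-channel feature map in the claim.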
3. The method of claim 1, wherein segmenting, according to the classification result, the bill subgraph corresponding to each independent bill in the bill image comprises:
converting the bill image into a bill mask map according to the classification result, wherein target pixel points having a target pixel value in the bill mask map identify the independent bills; and
performing erosion on the target pixel points in the bill mask map, and segmenting, according to the eroded bill mask map, the bill subgraph corresponding to each independent bill in the bill image.
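The erosion step can be illustrated with a plain 3x3 binary erosion, which shrinks each foreground region and helps separate bills whose masks touch. The name `erode` is the editor's; production code would typically use a library routine such as OpenCV's `cv2.erode`.

```python
import numpy as np

def erode(mask):
    # 3x3 binary erosion: a pixel stays foreground only if its entire
    # 3x3 neighbourhood is foreground in the input mask.
    m = mask.astype(bool)
    padded = np.pad(m, 1, constant_values=False)
    h, w = m.shape
    out = np.ones_like(m)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out &= padded[dy:dy + h, dx:dx + w]
    return out
```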
4. The method of claim 1, wherein performing perspective transformation on the bill subgraph according to the position coordinates of each edge contour point in the bill subgraph to obtain the regular placement subgraph corresponding to the bill subgraph comprises:
acquiring the position coordinates of each edge contour point in the bill subgraph by using a set edge contour search algorithm;
acquiring a plurality of vertex position coordinates of the bill subgraph according to the position coordinates of each edge contour point in the bill subgraph;
determining a standard transformation size corresponding to the bill subgraph according to the plurality of vertex position coordinates, and determining a rotation matrix for the perspective transformation according to the plurality of vertex position coordinates and the standard transformation size; and
performing the perspective transformation on the bill subgraph according to the rotation matrix and the position coordinates of each edge contour point in the bill subgraph to obtain the regular placement subgraph corresponding to the bill subgraph.
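One common heuristic for turning edge contour points into four vertex position coordinates, and for deriving a standard transformation size from them, is sketched below. The claim does not prescribe this particular corner-ordering rule, so treat it as an assumption.

```python
import math

def order_corners(points):
    # Heuristic corner selection from contour points: top-left minimises
    # x + y, bottom-right maximises x + y, top-right maximises x - y,
    # bottom-left minimises x - y (image coordinates, y growing downward).
    tl = min(points, key=lambda p: p[0] + p[1])
    br = max(points, key=lambda p: p[0] + p[1])
    tr = max(points, key=lambda p: p[0] - p[1])
    bl = min(points, key=lambda p: p[0] - p[1])
    return tl, tr, br, bl

def standard_size(tl, tr, br, bl):
    # Standard transformation size: take the longer of each pair of
    # opposite edges as the output rectangle's width and height.
    width = max(math.dist(tl, tr), math.dist(bl, br))
    height = max(math.dist(tl, bl), math.dist(tr, br))
    return round(width), round(height)
```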
5. The method of claim 4, wherein each independent bill is rectangular in shape and the standard transformation size is the height and width of the rectangle;
and determining the rotation matrix for the perspective transformation according to the plurality of vertex position coordinates and the standard transformation size comprises:
determining, according to the standard transformation size, the regular placement position coordinates (x'_i, y'_i) corresponding to each vertex position coordinate (x_i, y_i), where i is an integer in {1, 2, 3, 4};
computing the rotation matrix M for the perspective transformation according to the vertex position coordinates (x_i, y_i), the corresponding regular placement position coordinates (x'_i, y'_i), and the formula:
(U_i, V_i, W_i)^T = M · (x_i, y_i, 1)^T, with x'_i = U_i / W_i and y'_i = V_i / W_i;
wherein (U_i, V_i, W_i) is the result of converting the vertex position coordinate (x_i, y_i) into the mapping coordinate system.
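Given four vertex coordinates and their regular placement targets, a 3x3 perspective matrix can be recovered by solving the eight linear equations implied by x'_i = U_i/W_i and y'_i = V_i/W_i, with the last matrix entry fixed to 1. This direct-linear-transform sketch is an illustration, not the patented computation; in practice a routine such as OpenCV's `cv2.getPerspectiveTransform` performs the same solve.

```python
import numpy as np

def perspective_matrix(src, dst):
    # Solve for M (3x3, M[2,2] = 1) such that (U, V, W)^T = M (x, y, 1)^T
    # and (x', y') = (U/W, V/W) for four correspondences src[i] -> dst[i].
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.append(yp)
    m = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(m, 1.0).reshape(3, 3)
```

With the matrix in hand, every pixel of the bill subgraph can be mapped into the regular placement subgraph by the same homogeneous formula.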
6. The method of claim 1, wherein performing optical character recognition on the regular placement subgraph and generating the forward placement subgraph corresponding to the regular placement subgraph according to the recognition result comprises:
performing optical character recognition on the regular placement subgraph to obtain all text boxes included in the regular placement subgraph;
determining the angle between each text box and the horizontal direction, and determining the regular placement direction matching the regular placement subgraph according to the number of occurrences of each angle value; and
converting the regular placement subgraph into the forward placement subgraph according to the regular placement direction.
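The angle-voting step can be sketched as a simple histogram over quantised text-box angles: each recognised text box votes for its angle bin, and the winning bin gives the rotation needed to bring the subgraph upright. The bin size and function name are assumptions.

```python
from collections import Counter

def dominant_rotation(box_angles, bin_size=5):
    # Quantise each text-box/horizontal angle (degrees) into bins of
    # `bin_size` degrees and vote; the most frequent bin is taken as the
    # regular placement direction of the subgraph.
    votes = Counter(bin_size * round(a / bin_size) for a in box_angles)
    return votes.most_common(1)[0][0]
```

Voting over all text boxes makes the result robust to a few misdetected boxes, since a single outlier angle cannot outvote the majority orientation.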
7. The method of claim 1, wherein acquiring the bill image to be processed comprises:
acquiring the bill image to be processed that is input by a target user through a human-computer interaction interface during the handling of a target transaction;
and wherein, after generating the forward placement subgraph corresponding to the regular placement subgraph according to the recognition result, the method further comprises:
displaying at least one forward placement subgraph to the target user through the human-computer interaction interface;
in response to the target user selecting a target forward placement subgraph, extracting, from the target forward placement subgraph, key description information matching the current processing stage of the target transaction; and
continuing to handle the target transaction according to the key description information.
8. A bill image processing apparatus, comprising:
a bill image acquisition module, configured to acquire a bill image to be processed, wherein the bill image comprises at least one independent bill;
a bill subgraph segmentation module, configured to classify each pixel point in the bill image as foreground or background and segment, according to the classification result, a bill subgraph corresponding to each independent bill in the bill image;
a perspective transformation module, configured to perform perspective transformation on the bill subgraph according to the position coordinates of each edge contour point in the bill subgraph to obtain a regular placement subgraph corresponding to the bill subgraph; and
a forward placement subgraph generation module, configured to perform optical character recognition on the regular placement subgraph and generate a forward placement subgraph corresponding to the regular placement subgraph according to the recognition result.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the bill image processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions for causing a processor to execute the bill image processing method according to any one of claims 1 to 7.
CN202310789879.1A 2023-06-29 2023-06-29 Bill image processing method and device, electronic equipment and storage medium Pending CN116740727A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310789879.1A CN116740727A (en) 2023-06-29 2023-06-29 Bill image processing method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN116740727A true CN116740727A (en) 2023-09-12

Family

ID=87916785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310789879.1A Pending CN116740727A (en) 2023-06-29 2023-06-29 Bill image processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116740727A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination