CN111507181A - Bill image correction method and device and computer equipment - Google Patents


Info

Publication number
CN111507181A
Authority
CN
China
Prior art keywords
distortion
matrix
bill picture
semantic segmentation
corrected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010164109.4A
Other languages
Chinese (zh)
Other versions
CN111507181B (en)
Inventor
周军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202010164109.4A
Publication of CN111507181A
Application granted
Publication of CN111507181B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/80 Geometric correction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Character Input (AREA)
  • Image Processing (AREA)

Abstract

The application discloses a method and a device for correcting a bill image and computer equipment, relates to the field of image processing, and can solve the problems that correction precision is not high when bill deformation is corrected, and further subsequent bill text detection and recognition are easily affected. The method comprises the following steps: acquiring a sample bill picture with the same type as a bill picture to be corrected; carrying out data processing on the sample bill picture to obtain a corresponding distortion matrix; establishing a semantic segmentation model based on a semantic segmentation algorithm, and training the semantic segmentation model by using the processed sample bill picture and the corresponding distortion matrix so as to enable the semantic segmentation model to meet a preset standard; inputting the bill picture to be corrected into a semantic segmentation model meeting the preset standard to obtain a correction matrix; and correcting the bill picture to be corrected by using the correction matrix. The application is suitable for automatic correction to the bill picture.

Description

Bill image correction method and device and computer equipment
Technical Field
The present application relates to the field of image processing, and in particular, to a method and an apparatus for correcting a document image, and a computer device.
Background
In recent years, automatic identification and micro-storage of bill images have become research hot spots. Electronic imaging systems for financial bills have appeared in the banking, fiscal, security and other industries, and these systems typically take scanned images of the bills as input. During scanning, improper placement, folded paper and other factors may leave the scanned image skewed or folded to some extent, which makes the subsequent layout analysis considerably harder. Correcting the bill image during image preprocessing is therefore a very important step.
In the traditional claims industry, information on medical bills is often extracted by manual entry. In recent years, with the rise of machine learning and deep learning, OCR (Optical Character Recognition) technology has been increasingly applied to information extraction from medical bills. The OCR of a medical bill comprises the following main steps: image preprocessing -> text detection -> text recognition -> field division -> field post-processing. The quality of the preprocessing has a great influence on the later detection and recognition. In the preprocessing stage, most existing methods perform an affine or perspective transformation based on four manually selected points on the invoice: upper left, lower left, upper right and lower right.
Affine and perspective transformations treat the distortion and folding of a picture as a deformation of the whole image. In many scenes, however, the distortion and folding are only local deformations, so correction by affine or perspective transformation is not very accurate, which in turn reduces the accuracy of bill text detection and recognition.
Disclosure of Invention
In view of this, the present application provides a method and an apparatus for correcting a document image, and a computer device, which can solve the problem that when the document deformation is corrected, the correction accuracy is not high, and further, the subsequent document text detection and recognition are easily affected.
According to one aspect of the application, a method for correcting a note image is provided, and the method comprises the following steps:
acquiring a sample bill picture with the same type as a bill picture to be corrected;
carrying out data processing on the sample bill picture to obtain a corresponding distortion matrix;
establishing a semantic segmentation model based on a semantic segmentation algorithm, and training the semantic segmentation model by using the processed sample bill picture and the corresponding distortion matrix so as to enable the semantic segmentation model to meet a preset standard;
inputting the bill picture to be corrected into a semantic segmentation model meeting the preset standard to obtain a correction matrix;
and correcting the bill picture to be corrected by using the correction matrix.
According to another aspect of the present application, there is provided an apparatus for rectifying an image of a bill, the apparatus including:
the acquiring module is used for acquiring a sample bill picture with the same bill type as the bill picture to be corrected;
the processing module is used for carrying out data processing on the sample bill picture to obtain a corresponding distortion matrix;
the training module is used for establishing a semantic segmentation model based on a semantic segmentation algorithm, and training the semantic segmentation model by utilizing the processed sample bill picture and the corresponding distortion matrix so as to enable the semantic segmentation model to meet a preset standard;
the input module is used for inputting the bill picture to be corrected into the semantic segmentation model meeting the preset standard to obtain a correction matrix;
and the correction module is used for correcting the bill picture to be corrected by using the correction matrix.
According to still another aspect of the present application, there is provided a non-transitory readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of rectification of a document image described above.
According to still another aspect of the present application, there is provided a computer device including a nonvolatile readable storage medium, a processor, and a computer program stored on the nonvolatile readable storage medium and executable on the processor, the processor implementing the method for rectifying a ticket image described above when executing the program.
Through the above technical scheme, the application provides a bill image correction method, a bill image correction device and computer equipment. Compared with the current practice of correcting bills by affine or perspective transformation, the application selects sample bill pictures of the same bill type as the bill picture to be corrected, simulates in advance the various twisting and folding effects that bill images show in real scenes, and trains a semantic segmentation model with the processed sample bill pictures and the corresponding distortion matrices. The distortion matrix of the bill picture to be corrected is then obtained from the successfully trained semantic segmentation model. Finally, the remap function of OpenCV is used to determine, from the distortion matrix output by the model, the correction coordinate of each pixel point in the bill picture to be corrected, and the picture is corrected and reconstructed by moving each pixel point to its correction coordinate position. The application can thus perform pixel-level image correction automatically according to the characteristics of the input bill, is applicable to correction scenes with various local and non-local deformations, and can therefore guarantee correction accuracy and improve correction efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application to the disclosed embodiment. In the drawings:
FIG. 1 is a schematic flow chart illustrating a method for correcting a document image according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart illustrating another method for correcting a document image according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart illustrating sample bill picture data processing according to an embodiment of the present application;
FIG. 4 is a schematic flow chart illustrating a process for correcting a bill to be corrected according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram illustrating an apparatus for rectifying an image of a document according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram illustrating another apparatus for correcting a bill image according to an embodiment of the present application.
Detailed Description
The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
This embodiment addresses the problem that, at present, correction accuracy is not high when bill deformation is corrected, which easily affects subsequent bill text detection and recognition. To this end, an embodiment of the application provides a method for correcting a bill image. As shown in FIG. 1, the method includes:
101. and acquiring a sample bill picture with the same type as the bill picture to be corrected.
In a specific application scene, to ensure the accuracy of semantic segmentation model training, the sample bill picture can be selected from bill pictures of the same bill type as the bill picture to be corrected. The sample bill picture should have no offset at any pixel point, and its page should be clear and complete.
102. And carrying out data processing on the sample bill picture to obtain a corresponding distortion matrix.
In a specific application scene, fully training the model and ensuring training accuracy requires a large number of bill pictures with different forms of offset as training samples. These pictures are generated in advance by data processing of the sample bill pictures: distortion and deformation effects are added to simulate the various deformations of bill images in real scenes, and a distortion matrix formed from the original pixel coordinates in the sample bill picture and the corresponding deformed pixel coordinates is obtained.
103. And establishing a semantic segmentation model based on a semantic segmentation algorithm, and training the semantic segmentation model by using the processed sample bill picture and the corresponding distortion matrix so as to enable the semantic segmentation model to accord with a preset standard.
The preset standard is that the error between the predicted result and the real result output by the semantic segmentation model is smaller than a preset threshold, the preset threshold can be set according to the actual application scene, and the smaller the preset threshold is, the higher the precision of the trained semantic segmentation model is.
104. And inputting the bill picture to be corrected into a semantic segmentation model meeting a preset standard to obtain a correction matrix.
The picture to be corrected is a bill picture which has certain deformation and needs to be subjected to image correction.
105. And correcting the bill picture to be corrected by using the correction matrix.
The correction matrix comprises the current position coordinates of each pixel point in the bill picture to be corrected and the corresponding correction coordinates, and the correction coordinates correspond to the real coordinate positions of the pixel points. The principle of utilizing the correction matrix to correct the bill picture to be corrected is that the correction coordinates in the correction matrix are utilized to correct the current position coordinates of each pixel point, so that each offset pixel point is restored to the real position.
By the bill image correction method in this embodiment, sample bill pictures of the same bill type as the bill picture to be corrected are selected, the various distortion and folding effects that bill images show in real scenes are simulated in advance, and a semantic segmentation model is trained with the processed sample bill pictures and the corresponding distortion matrices. The distortion matrix of the bill picture to be corrected is then obtained from the successfully trained semantic segmentation model. Finally, the remap function of OpenCV is used to determine, from the distortion matrix output by the model, the correction coordinate of each pixel point in the bill picture to be corrected, and the picture is corrected and reconstructed by moving each pixel point to its correction coordinate position. The application can thus perform pixel-level image correction automatically according to the characteristics of the input bill, is applicable to correction scenes with various local and non-local deformations, and can therefore guarantee correction accuracy and improve correction efficiency.
Further, as a refinement and extension of the foregoing embodiment, and in order to fully describe its implementation process, another method for correcting a bill image is provided. As shown in FIG. 2, the method includes:
201. and acquiring a sample bill picture with the same type as the bill picture to be corrected.
For example, the type of the ticket corresponding to the ticket image to be corrected is a medical outpatient service charging ticket, and in order to ensure the accuracy of the semantic segmentation model training, the selected sample ticket image should also be preferably a medical outpatient service charging ticket.
202. And randomly selecting a plurality of distortion starting points from the sample bill picture.
The distortion starting point is a distortion operation point when a distortion effect is added, and in a specific application scene, in order to create a distortion bill image sample, a pixel point at the distortion starting point needs to be moved to a corresponding position according to a preset distortion effect. A plurality of different distortion starting points can be simultaneously selected for the same sample bill picture.
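The random selection of several distortion starting points can be sketched as below. This is a minimal illustration only: the function name `pick_start_points`, the uniform sampling and the `seed` parameter are assumptions, since the text does not specify how the points are drawn.

```python
import random


def pick_start_points(width, height, count, seed=None):
    """Return `count` distinct (x, y) pixel coordinates chosen uniformly at random."""
    rng = random.Random(seed)
    # Sample distinct flat pixel indices, then convert each back to (x, y).
    flat_indices = rng.sample(range(width * height), count)
    return [(index % width, index // width) for index in flat_indices]


# Hypothetical picture size and number of starting points.
points = pick_start_points(width=640, height=480, count=5, seed=0)
```

Sampling flat indices without replacement guarantees that the same pixel is not picked twice for one sample picture.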
203. And carrying out distortion processing on the sample bill picture based on a distortion formula at the distortion starting point so as to obtain a first distortion matrix corresponding to each distortion starting point in different distortion states.
Wherein, the corresponding warping formula may be: $w_1 = 1 - d^{\alpha}$, where $w_1$ is the preset distortion effect, $\alpha$ is a constant with value 2, and $d$ is the Euclidean distance between the distortion starting point and its offset point after the distortion. The Euclidean distance between the offset point and the distortion starting point is calculated by the following formula:
$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
where $(x_1, y_1)$ corresponds to the offset point and $(x_2, y_2)$ corresponds to the distortion starting point.
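As a small worked example, the two quantities above can be computed as follows, reading the warping formula as w1 = 1 - d^α with α = 2. The function names are illustrative, and the use of coordinates normalised to the range [0, 1] is an assumption made here so that the weight stays between 0 and 1; the text does not state the coordinate scale.

```python
import math

ALPHA = 2  # the constant value the text gives for alpha


def euclidean_distance(offset_point, start_point):
    """Distance d between the offset point (x1, y1) and the starting point (x2, y2)."""
    (x1, y1), (x2, y2) = offset_point, start_point
    return math.hypot(x1 - x2, y1 - y2)


def distortion_weight(d, alpha=ALPHA):
    """Distortion effect w1 = 1 - d**alpha."""
    return 1 - d ** alpha


# Assumed normalised coordinates: offset point (0.3, 0.4), starting point (0, 0).
d = euclidean_distance((0.3, 0.4), (0.0, 0.0))  # approximately 0.5
w1 = distortion_weight(d)                       # approximately 0.75
```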
For this embodiment, in a specific application scenario, in order to perform warping processing on a sample ticket picture and obtain a first warping matrix, step 203 in the embodiment may specifically include: calculating a first offset point corresponding to a distortion initial point under a preset distortion effect according to a distortion formula; moving the pixel point at the distortion starting point to a first offset point; and constructing a first distortion matrix comprising the position coordinates of the distortion starting point and the position coordinates of the first offset point.
204. And performing linear interpolation filling of missing pixel points on the sample bill picture after the distortion processing to obtain a first sample bill picture which accords with a preset size.
In a specific application scene, after pixels are moved, a plurality of black pixel points appear on an image, and in order to ensure the integrity of the image, missing pixel points can be filled in a linear interpolation mode. The method specifically comprises the following steps: for a pixel point with a pixel value of 0 (0 represents black), the average value of the pixel values of 9 points around the pixel point is selected as the pixel value of the pixel point. After filling of all missing pixel points is completed, the bill picture needs to be preprocessed, the picture is processed into a specified input format size mainly through operations such as equal-scale scaling and normalization, and then a first sample bill picture is obtained.
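The filling step can be sketched as follows for a greyscale picture stored as a list of rows. This is an illustration under stated assumptions: "9 points around the pixel" is read here as the 3×3 window centred on the missing pixel (clipped at the picture border), and the function name `fill_missing_pixels` is hypothetical.

```python
def fill_missing_pixels(image):
    """Replace each 0-valued (black) pixel with the mean of its 3x3 window."""
    height, width = len(image), len(image[0])
    filled = [row[:] for row in image]  # read from the original, write to a copy
    for y in range(height):
        for x in range(width):
            if image[y][x] != 0:
                continue  # only missing pixels are filled
            # Collect the 3x3 window, clipped to the picture bounds.
            window = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            filled[y][x] = sum(window) / len(window)
    return filled


image = [
    [90, 90, 90],
    [90, 0, 90],
    [90, 90, 90],
]
filled = fill_missing_pixels(image)
```

Reading from the untouched input while writing to a copy keeps already-filled pixels from influencing their neighbours within the same pass.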
205. And randomly selecting a plurality of folding deformation starting points from the sample bill picture.
The folding deformation starting point is a folding operation point when a folding effect is added, and in a specific application scene, in order to create a folded bill image sample, a pixel point at the folding deformation starting point needs to be moved to a corresponding position according to a preset folding effect. A plurality of different folding deformation starting points can be simultaneously selected for the same sample bill picture.
206. And folding the sample bill picture based on a folding formula at the folding deformation starting point so as to obtain a second distortion matrix corresponding to each folding deformation starting point under different folding states.
Wherein, the corresponding folding formula may be: $w_2 = \alpha / (d^2 + \alpha)$, where $w_2$ is the preset folding effect, $\alpha$ is a constant with value 2, and $d$ is the Euclidean distance between the folding deformation starting point and its offset point after the folding deformation. The Euclidean distance between the offset point and the folding deformation starting point is calculated by the following formula:
$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
where $(x_1, y_1)$ corresponds to the offset point and $(x_2, y_2)$ corresponds to the folding deformation starting point.
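The folding effect can be evaluated in the same way. The sketch below reads the folding formula as w2 = α / (d² + α) with α = 2; that reading is an interpretation of the formula as printed in the source, and the function name is illustrative.

```python
ALPHA = 2  # the constant value the text gives for alpha


def fold_weight(d, alpha=ALPHA):
    """Folding effect w2 = alpha / (d**2 + alpha), under the reading stated above."""
    return alpha / (d ** 2 + alpha)


# The weight is 1 at the folding deformation starting point and decays with distance.
weights = [fold_weight(d) for d in (0.0, 1.0, 2.0)]
```

Under this reading the weight never reaches zero, so the fold influences pixels at every distance, with an effect that shrinks as the distance grows.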
For this embodiment, in a specific application scenario, in order to perform folding processing on a sample bill picture and obtain a second warping matrix, step 206 in the embodiment may specifically include: calculating a second offset point corresponding to the folding deformation starting point under the preset folding effect according to the folding formula; moving the pixel point at the folding deformation starting point to the second offset point; and constructing a second distortion matrix comprising the position coordinates of the folding deformation starting point and the position coordinates of the second offset point.
207. And performing linear interpolation filling of missing pixel points on the folded sample bill picture to obtain a second sample bill picture which accords with the preset size.
In a specific application scene, after pixels are moved, a plurality of black pixel points appear on an image, and in order to ensure the integrity of the image, missing pixel points can be filled in a linear interpolation mode. The method specifically comprises the following steps: for a pixel point with a pixel value of 0 (0 represents black), the average value of the pixel values of 9 points around the pixel point is selected as the pixel value of the pixel point. After filling of all missing pixel points is completed, the bill picture needs to be preprocessed, the picture is processed into a specified input format size mainly through operations such as equal-scale scaling and normalization, and then a second sample bill picture is obtained.
Correspondingly, in a specific scene, a schematic flow chart of data processing on the sample bill picture is shown in fig. 3, after the sample bill picture is obtained, the deformed bill image can be further obtained by twisting and folding the sample bill picture, and after linear interpolation filling of missing pixels is performed on the deformed bill picture, the deformed bill picture is processed into a specified input format size.
208. And configuring labels for the first sample bill picture and the second sample bill picture, wherein the labels correspond to the first distortion matrix and the second distortion matrix.
In a specific application scenario, a training set and a verification set can be created from the first sample bill pictures and the second sample bill pictures. The label corresponding to each bill picture has size w × h × 2, where w and h are the width and height of the picture respectively, and the 2 channels hold, for each pixel point on the distorted image, its x and y coordinates on the input image.
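A label of this shape can be sketched as a grid holding one [x, y] pair per pixel. The function `build_label` and its `displacement` callback are hypothetical names introduced for illustration, and the grid is stored row by row (height first), which is a storage assumption rather than something the text specifies.

```python
def build_label(width, height, displacement):
    """Build label[y][x] = [source_x, source_y] for each pixel of the distorted picture.

    `displacement(x, y)` returns the (dx, dy) offset that leads from the
    distorted pixel back to its coordinates on the input picture.
    """
    label = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = displacement(x, y)
            row.append([x + dx, y + dy])
        label.append(row)
    return label


# Toy example: every pixel was shifted by one column.
label = build_label(width=4, height=3, displacement=lambda x, y: (1, 0))
```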
209. And inputting the first sample bill picture and the second sample bill picture after the label configuration into a semantic segmentation model, and respectively obtaining a corresponding first prediction distortion matrix and a corresponding second prediction distortion matrix.
For this embodiment, in a specific application scenario, a first sample bill picture or a second sample bill picture in the training set may be used as the input, and the first warping matrix or the second warping matrix in the corresponding label may be used as the target output, to train the semantic segmentation model. After training for a certain period of time, a first sample bill picture or a second sample bill picture in the verification set may be used as the input to obtain the first predicted warping matrix and the second predicted warping matrix output by the semantic segmentation model. The training condition of the semantic segmentation model may then be verified by matching the first predicted warping matrix and the second predicted warping matrix against the first warping matrix and the second warping matrix in the labels of the input bill pictures.
210. A first loss function between the first predictive warping matrix and the first warping matrix and a second loss function between the second predictive warping matrix and the second warping matrix are calculated.
For this embodiment, in a specific application scenario, the predicted distortion matrix and the distortion matrix corresponding to the configuration tag may be compared through a loss function, and a loss value is obtained, so as to determine an error between the network output result and the pre-labeled data. The calculation formula is as follows:
[Equation images BDA0002406799030000081 and BDA0002406799030000082: the loss formulas are not recoverable from the text.]
wherein $y_{i,j}$ and its counterpart (given in the source as equation image BDA0002406799030000083) are, respectively, the pixel value predicted by the semantic segmentation model at the corresponding coordinates $(i, j)$ and the pixel value stored in the configured label.
211. And if the first loss function and the second loss function are smaller than the preset threshold value, judging that the semantic segmentation model meets the preset standard.
The preset threshold can be set according to the actual application scenario; the smaller the preset threshold, the higher the precision of the trained semantic segmentation model.
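The threshold check can be sketched as below. The loss used here, a mean absolute error over all coordinate values, is a stand-in chosen purely for illustration and is not the loss formula of the application; the function names are likewise hypothetical.

```python
def mean_absolute_error(predicted, target):
    """Stand-in loss: mean absolute difference over every coordinate value."""
    total, count = 0.0, 0
    for predicted_row, target_row in zip(predicted, target):
        for predicted_xy, target_xy in zip(predicted_row, target_row):
            for p, t in zip(predicted_xy, target_xy):
                total += abs(p - t)
                count += 1
    return total / count


def meets_preset_standard(first_loss, second_loss, threshold):
    """The model passes only if both losses are strictly below the threshold."""
    return first_loss < threshold and second_loss < threshold


target = [[[0.0, 0.0], [1.0, 0.0]]]
predicted = [[[0.5, 0.0], [1.0, 0.5]]]
loss = mean_absolute_error(predicted, target)
passed = meets_preset_standard(loss, 0.1, threshold=0.3)
```

A loss equal to the threshold counts as a failure, which matches the "greater than or equal to" condition of the following step.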
212. And if it is determined that the first loss function or the second loss function is greater than or equal to the preset threshold value, correcting the output result of the semantic segmentation model being trained by using the first distortion matrix and the second distortion matrix, so as to enable the semantic segmentation model to accord with the preset standard.
In a specific application scenario, if either the first loss function or the second loss function is greater than or equal to the preset threshold, or if the proportion of loss values greater than or equal to the preset threshold exceeds a preset ratio, it may be determined that the semantic segmentation model does not meet the preset standard. The output of the model being trained may then be corrected by using the first warping matrix and the second warping matrix until the preset standard is reached.
213. And adjusting the bill picture to be corrected to enable the bill picture to be corrected to accord with the preset size.
In a specific application scenario, before a bill picture to be corrected is input into a semantic segmentation model, the bill picture needs to be processed in advance, the picture is processed into a specified input format size mainly through operations such as equal scaling, normalization and the like, and therefore the bill picture and a sample bill picture are guaranteed to belong to the same format.
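The two preprocessing operations named above can be sketched as follows. The target size of 512 × 512, the 8-bit pixel range and both function names are assumptions for illustration; the text does not state the specified input format.

```python
def fit_size(width, height, target_width, target_height):
    """Largest size with the original aspect ratio that fits inside the target."""
    scale = min(target_width / width, target_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def normalize(image, max_value=255):
    """Map pixel values from [0, max_value] to [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]


# Hypothetical 1200 x 800 scan scaled to fit a 512 x 512 model input.
new_size = fit_size(1200, 800, 512, 512)
normalized = normalize([[0, 255]])
```

Using a single scale factor for both axes is what keeps the scaling equal-ratio; any remaining area of the target would have to be padded.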
214. And inputting the bill picture to be corrected into a semantic segmentation model meeting a preset standard to obtain a correction matrix.
In a specific application scenario, since the semantic segmentation model is trained, after the bill picture to be corrected is input into the semantic segmentation model, the correction matrix output by the semantic segmentation model can be obtained, and the correction matrix in the step of this embodiment is equivalent to the distortion matrix in the training step.
215. And extracting the current position coordinates and the correction position coordinates of each pixel point contained in the bill picture to be corrected from the correction matrix.
For this embodiment, in a specific application scenario, the correction matrix may include the current position coordinates and the correction position coordinates of each pixel point in the bill picture to be corrected. If the current position coordinates and the correction position coordinates of a pixel point match, it can be determined that the pixel point has not shifted and does not need to be corrected; if they do not match, it can be determined that the pixel point has shifted and needs to be corrected.
216. And determining the pixel points of which the current position coordinates are not matched with the correction position coordinates as the pixel points to be corrected.
For example, suppose the current position coordinate of pixel point a is (x1, y1) and its correction position coordinate is (x2, y2). If the coordinates (x1, y1) and (x2, y2) do not coincide, pixel point a is determined to be a pixel point to be corrected. All pixel points to be corrected in the bill picture to be corrected can be extracted by this coordinate matching method.
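The coordinate matching can be sketched as a scan over the correction matrix. The layout assumed here, `correction_matrix[y][x] = (corrected_x, corrected_y)` with the current position given by the grid index, and the optional `tolerance` parameter are illustrative assumptions.

```python
def pixels_to_correct(correction_matrix, tolerance=0.0):
    """Return the (x, y) positions whose correction coordinates differ from them."""
    mismatched = []
    for y, row in enumerate(correction_matrix):
        for x, (corrected_x, corrected_y) in enumerate(row):
            if abs(corrected_x - x) > tolerance or abs(corrected_y - y) > tolerance:
                mismatched.append((x, y))
    return mismatched


# Only the pixel at (1, 1) has a correction coordinate, (2, 1), that differs.
matrix = [
    [(0, 0), (1, 0), (2, 0)],
    [(0, 1), (2, 1), (2, 1)],
]
shifted = pixels_to_correct(matrix)
```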
217. And replacing the current position coordinate with the correction position coordinate to move the pixel point to be corrected to the correction position, thereby further realizing the correction treatment of the bill picture to be corrected.
For this embodiment, after all the pixels to be corrected are determined based on the embodiment step 216, the pixels to be corrected may be sequentially moved to the correction position, and a corrected bill picture is further obtained.
In a specific application scenario, a schematic flow diagram of the correction processing of a bill picture to be corrected is shown in FIG. 4. When the bill picture to be corrected is obtained, its size is first adjusted so that it has the same format size as the sample bill pictures. The processed picture is then input into the trained semantic segmentation model to obtain the correction matrix of size w × h × 2 output by the model. Finally, the bill picture to be corrected is reconstructed according to the correction position coordinates of each pixel point in the correction matrix, thereby achieving pixel-level correction of the bill picture.
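The reconstruction step can be illustrated with a pure-Python nearest-neighbour lookup that follows the same backward-mapping idea as OpenCV's `remap` function, where each output pixel is read from the source position given by two coordinate maps. The function `remap_nearest`, the split of the w × h × 2 matrix into `map_x` and `map_y`, and the border handling are assumptions of this sketch, not the application's implementation.

```python
def remap_nearest(source, map_x, map_y, border_value=0):
    """Build output[y][x] = source[map_y[y][x]][map_x[y][x]] with nearest-neighbour lookup."""
    height, width = len(map_x), len(map_x[0])
    source_height, source_width = len(source), len(source[0])
    output = [[border_value] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            source_x, source_y = round(map_x[y][x]), round(map_y[y][x])
            # Lookups that fall outside the source keep the border value.
            if 0 <= source_x < source_width and 0 <= source_y < source_height:
                output[y][x] = source[source_y][source_x]
    return output


# One-row toy picture whose content should move one pixel to the left.
distorted = [[0, 1, 2, 3]]
map_x = [[1, 2, 3, 4]]
map_y = [[0, 0, 0, 0]]
corrected = remap_nearest(distorted, map_x, map_y)
```

Looking up a source position for every output pixel, rather than pushing source pixels forward, leaves no unfilled holes inside the corrected picture.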
By the above bill image correction method, after standard sample bill pictures are selected, they are processed based on the distortion formula and the folding formula to obtain a first distortion matrix and a second distortion matrix, and linear interpolation is applied to the distorted and folded sample bill pictures to obtain first sample bill pictures and second sample bill pictures. A modified semantic segmentation model is then trained with the distortion matrices and the corresponding first or second sample bill pictures. Once the semantic segmentation model is judged to meet the preset standard, the preprocessed bill picture to be corrected is input into it to obtain the correction matrix, and the pixel points to be corrected are adjusted according to the correction position coordinates in the correction matrix, thereby correcting the bill picture. The application can thus perform pixel-level image correction automatically according to the characteristics of the input bill, is applicable to correction scenes with various local and non-local deformations, and can therefore guarantee correction accuracy and improve correction efficiency. Apart from inputting the bill picture to be corrected, the correction process requires no manual participation, which reduces the dependence on manual operation, improves automation efficiency and can reduce the error rate.
Further, as a concrete embodiment of the method shown in fig. 1 and fig. 2, the present application provides an apparatus for rectifying an image of a bill. As shown in fig. 5, the apparatus includes: an acquiring module 31, a processing module 32, a training module 33, an input module 34, and a correction module 35.
The acquiring module 31 is used for acquiring a sample bill picture with the same bill type as the bill picture to be corrected;
the processing module 32 is configured to perform data processing on the sample bill picture to obtain a corresponding distortion matrix;
the training module 33 is used for creating a semantic segmentation model based on a semantic segmentation algorithm, and training the semantic segmentation model by using the processed sample bill picture and the corresponding distortion matrix so as to enable the semantic segmentation model to meet a preset standard;
the input module 34 is used for inputting the bill picture to be corrected into the semantic segmentation model meeting the preset standard to obtain a correction matrix;
and the correction module 35 is used for correcting the bill picture to be corrected by using the correction matrix.
In a specific application scenario, in order to perform data processing on the sample bill picture and obtain the corresponding distortion matrices, the processing module 32 is specifically configured to: randomly select a plurality of distortion starting points from the sample bill picture; perform distortion processing on the sample bill picture at each distortion starting point based on a distortion formula, so as to obtain first distortion matrices corresponding to the distortion starting points in different distortion states; fill in missing pixel points of the distorted sample bill picture by linear interpolation to obtain a first sample bill picture that conforms to a preset size; randomly select a plurality of folding deformation starting points from the sample bill picture; fold the sample bill picture at each folding deformation starting point based on a folding formula, so as to obtain second distortion matrices corresponding to the folding deformation starting points in different folding states; and fill in missing pixel points of the folded sample bill picture by linear interpolation to obtain a second sample bill picture that conforms to the preset size.
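The linear interpolation filling of missing pixel points can be sketched as follows for a single row of gray values. This is only an illustration under stated assumptions: missing pixels are marked with None, interpolation runs along one row, and gaps at the row ends copy the nearest known value. The text does not specify any of these details, and the function name is hypothetical.

```python
def fill_missing_linear(row):
    """Fill None entries in a row of gray values by linear interpolation.

    Gaps at either end of the row are filled with the nearest known value.
    """
    known = [i for i, value in enumerate(row) if value is not None]
    if not known:
        raise ValueError("row has no known pixels to interpolate from")
    filled = list(row)
    for i, value in enumerate(row):
        if value is not None:
            continue
        # Nearest known neighbours on each side of the missing pixel.
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = row[right]
        elif right is None:
            filled[i] = row[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = (1 - t) * row[left] + t * row[right]
    return filled


filled = fill_missing_linear([0, None, 20, None, None])
```

A two-dimensional variant would apply the same idea along both axes (bilinear interpolation).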
Accordingly, the corresponding distortion formula is: $w_1 = 1 - d^{\alpha}$, where w1 is a preset distortion effect, α is a constant with value 2, and d is the Euclidean distance between the offset point of the distortion starting point after distortion and the distortion starting point. In order to perform distortion processing on the sample bill picture at the distortion starting point based on the distortion formula and obtain a first distortion matrix corresponding to each distortion starting point in different distortion states, the processing module 32 is specifically configured to: calculate, according to the distortion formula, a first offset point corresponding to the distortion starting point under the preset distortion effect; move the pixel point at the distortion starting point to the first offset point; and construct a first distortion matrix comprising the position coordinates of the distortion starting point and the position coordinates of the first offset point.
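One possible reading of this step is sketched below: given a preset distortion effect w1, the formula w1 = 1 - d^α is inverted to obtain the distance d, and the offset point is placed at that distance from the starting point. The direction of the offset (the `angle` parameter), the restriction of w1 to the range 0 to 1, and all function names are assumptions for illustration, since the text does not state them.

```python
import math

ALPHA = 2  # constant value stated in the text


def warp_weight(d, alpha=ALPHA):
    """Distortion formula: w1 = 1 - d**alpha."""
    return 1 - d ** alpha


def warp_distance(w1, alpha=ALPHA):
    """Invert the formula: the distance d that yields a preset effect w1."""
    if not 0 <= w1 <= 1:
        raise ValueError("this sketch assumes 0 <= w1 <= 1")
    return (1 - w1) ** (1 / alpha)


def first_offset_point(start, w1, angle):
    """Offset point at distance d from `start` along `angle` (radians).

    The direction is a hypothetical parameter, not taken from the text.
    """
    d = warp_distance(w1)
    x, y = start
    return (x + d * math.cos(angle), y + d * math.sin(angle))


start = (10.0, 20.0)
offset = first_offset_point(start, w1=0.75, angle=0.0)
# One entry of a first distortion matrix: start coordinates and offset coordinates.
first_distortion_entry = (start, offset)
```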
In a specific application scenario, the corresponding folding formula is: $w_2 = \alpha / (d^2 + \alpha)$, where w2 is a preset folding effect, α is a constant with value 2, and d is the Euclidean distance between the offset point of the folding deformation starting point after the folding deformation and the folding deformation starting point. In order to fold the sample bill picture at the folding deformation starting point based on the folding formula and obtain a second distortion matrix corresponding to each folding deformation starting point in different folding states, the processing module 32 is specifically configured to: calculate, according to the folding formula, a second offset point corresponding to the folding deformation starting point under the preset folding effect; move the pixel point at the folding deformation starting point to the second offset point; and construct a second distortion matrix comprising the position coordinates of the folding deformation starting point and the position coordinates of the second offset point.
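The folding step can be sketched in the same spirit. The sketch assumes that the folding formula is read as w2 = α/(d² + α) with α = 2, that 0 < w2 ≤ 1, and that every folding deformation starting point is shifted in the same direction; the direction parameter and the function names are hypothetical and not taken from the text.

```python
import math

ALPHA = 2  # constant value stated in the text


def fold_weight(d, alpha=ALPHA):
    """Assumed reading of the folding formula: w2 = alpha / (d**2 + alpha)."""
    return alpha / (d ** 2 + alpha)


def fold_distance(w2, alpha=ALPHA):
    """Distance d that yields a preset folding effect w2."""
    if not 0 < w2 <= 1:
        raise ValueError("this sketch assumes 0 < w2 <= 1")
    return math.sqrt(alpha * (1 - w2) / w2)


def second_distortion_matrix(starts, w2, angle):
    """Pair each folding deformation starting point with its offset point."""
    d = fold_distance(w2)
    dx, dy = d * math.cos(angle), d * math.sin(angle)
    return [((x, y), (x + dx, y + dy)) for x, y in starts]


pairs = second_distortion_matrix(
    [(0.0, 0.0), (5.0, 5.0)], w2=0.5, angle=math.pi / 2
)
```

Under this reading the folding weight decays gradually with distance, whereas the distortion weight 1 - d^α reaches zero at d = 1.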
Correspondingly, in order to obtain through training a semantic segmentation model that meets the preset standard, the training module 33 is specifically configured to: configure labels for the first sample bill picture and the second sample bill picture, where the labels correspond to the first distortion matrix and the second distortion matrix respectively; input the labeled first sample bill picture and second sample bill picture into the semantic segmentation model to obtain a corresponding first predicted distortion matrix and second predicted distortion matrix; calculate a first loss function between the first predicted distortion matrix and the first distortion matrix, and a second loss function between the second predicted distortion matrix and the second distortion matrix; if the first loss function and the second loss function are both smaller than a preset threshold, judge that the semantic segmentation model meets the preset standard; and if either the first loss function or the second loss function is greater than or equal to the preset threshold, continue to train the semantic segmentation model, correcting its output with the first distortion matrix and the second distortion matrix, until the semantic segmentation model meets the preset standard.
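The threshold check described above can be sketched as follows. The text does not name the loss function, so mean squared error over the coordinate pairs is used here purely as an assumption; the function names and the flat list-of-coordinates representation are likewise hypothetical.

```python
def mse(predicted, target):
    """Mean squared error between two equally sized lists of (x, y) pairs."""
    if not predicted or len(predicted) != len(target):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for (px, py), (tx, ty) in zip(predicted, target):
        total += (px - tx) ** 2 + (py - ty) ** 2
    return total / (2 * len(predicted))


def meets_preset_standard(first_pair, second_pair, threshold):
    """True only if both losses are below the preset threshold.

    Each pair is (predicted distortion matrix, label distortion matrix).
    """
    first_loss = mse(*first_pair)
    second_loss = mse(*second_pair)
    return first_loss < threshold and second_loss < threshold


label = [(0.0, 0.0), (1.0, 1.0)]
close_prediction = [(0.0, 0.1), (1.0, 1.0)]
far_prediction = [(1.0, 0.0), (1.0, 1.0)]
passed = meets_preset_standard(
    (close_prediction, label), (close_prediction, label), threshold=0.01
)
failed = meets_preset_standard(
    (close_prediction, label), (far_prediction, label), threshold=0.01
)
```

When the check fails, training would continue on the labeled sample pictures until both losses fall below the threshold.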
In a specific application scenario, in order to make the bill picture to be corrected and the sample bill pictures used for training the semantic segmentation model have the same format size, as shown in fig. 6, the apparatus further includes an adjusting module 36.
The adjusting module 36 is used for adjusting the bill picture to be corrected so that it conforms to the preset size.
Correspondingly, in order to obtain the correction matrix corresponding to the to-be-corrected bill picture, the input module 34 is specifically configured to input the preprocessed to-be-corrected bill picture into the semantic segmentation model meeting the preset standard, so as to obtain the correction matrix.
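The size adjustment performed before the picture is input into the model can be sketched as a nearest-neighbour resize. The choice of nearest-neighbour sampling and the function name are assumptions; the text only requires that the adjusted picture conform to the preset size.

```python
def resize_nearest(image, width, height):
    """Resize a nested-list image (rows of pixel values) to width x height.

    Each target pixel copies the source pixel whose index is obtained by
    scaling the target index with integer arithmetic.
    """
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


small = [[1, 2],
         [3, 4]]
resized = resize_nearest(small, width=4, height=4)
```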
In a specific application scenario, in order to correct a to-be-corrected bill picture by using a correction matrix, the correction module 35 is specifically configured to extract current position coordinates and correction position coordinates of each pixel point included in the to-be-corrected bill picture from the correction matrix; determining pixel points of which the current position coordinates are not matched with the correction position coordinates as pixel points to be corrected; and replacing the current position coordinate with the correction position coordinate to move the pixel point to be corrected to the correction position, thereby further realizing the correction treatment of the bill picture to be corrected.
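The correction step can be sketched as a forward mapping of pixels. The sketch assumes integer correction coordinates stored as correction[y][x] = (x', y'), a fixed background value for positions that receive no pixel, and that corrected positions falling outside the picture are discarded; these details and the function name are illustrative assumptions.

```python
def correct_picture(image, correction, background=0):
    """Move each pixel to its corrected position.

    image: height x width nested list of pixel values.
    correction: height x width nested list; correction[y][x] is the
        corrected (x, y) position of the pixel currently at (x, y).
    Returns the corrected picture and the number of pixels whose current
    position did not match the corrected position.
    """
    height, width = len(image), len(image[0])
    output = [[background] * width for _ in range(height)]
    to_correct = 0
    for y in range(height):
        for x in range(width):
            cx, cy = correction[y][x]
            if (cx, cy) != (x, y):
                to_correct += 1  # pixel point to be corrected
            if 0 <= cx < width and 0 <= cy < height:
                output[cy][cx] = image[y][x]
    return output, to_correct


picture = [[1, 2],
           [3, 4]]
# Hypothetical correction matrix that swaps the two columns.
swap_columns = [[(1 - x, y) for x in range(2)] for y in range(2)]
corrected, moved = correct_picture(picture, swap_columns)
```

Positions that remain at the background value after the move could be filled by interpolation.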
It should be noted that other corresponding descriptions of the functional units related to the correction device for a document image provided in this embodiment may refer to the corresponding descriptions in fig. 1 to fig. 2, and are not repeated herein.
Based on the method shown in fig. 1 and fig. 2, correspondingly, the embodiment of the present application further provides a storage medium, on which a computer program is stored, and the program, when executed by a processor, implements the method for rectifying the image of the bill shown in fig. 1 and fig. 2.
Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method of the embodiments of the present application.
Based on the method shown in fig. 1 and fig. 2 and the virtual device embodiment shown in fig. 5 and fig. 6, in order to achieve the above object, an embodiment of the present application further provides a computer device, which may specifically be a personal computer, a server, a network device, and the like, where the entity device includes a storage medium and a processor; a storage medium for storing a computer program; a processor for executing a computer program to implement the method for rectification of a document image as shown in fig. 1 and 2.
Optionally, the computer device may also include a user interface, a network interface, a camera, Radio Frequency (RF) circuitry, sensors, audio circuitry, a WI-FI module, and so forth. The user interface may include a Display screen (Display), an input unit such as a keypad (Keyboard), etc., and the optional user interface may also include a USB interface, a card reader interface, etc. The network interface may optionally include a standard wired interface, a wireless interface (e.g., a bluetooth interface, WI-FI interface), etc.
It will be understood by those skilled in the art that the computer device structure provided in the present embodiment does not constitute a limitation on the physical device, which may include more or fewer components, combine some components, or arrange the components differently.
The non-volatile readable storage medium may further include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the physical device for bill image correction and supports the running of the information processing program and other software and/or programs. The network communication module is used for communication among the components inside the non-volatile readable storage medium and for communication with other hardware and software in the physical device.
Through the above description of the embodiments, those skilled in the art can clearly understand that the present application can be implemented by software plus a necessary universal hardware platform, or by hardware. By applying the technical solution of the present application, after a standard sample bill picture is screened out, data processing is performed on it based on the distortion formula and the folding formula to obtain a first distortion matrix and a second distortion matrix, and a linear interpolation operation is performed on the distorted sample bill pictures to obtain a first sample bill picture and a second sample bill picture. A modified semantic segmentation model is then trained using each distortion matrix and the corresponding first or second sample bill picture. Once the semantic segmentation model is judged to meet the preset standard, the preprocessed bill picture to be corrected is input into the model to obtain a correction matrix, and the pixel points to be corrected in the bill picture are adjusted based on the correction position coordinates in the correction matrix, thereby correcting the bill picture. The method performs pixel-level image correction automatically according to the characteristics of the input bill and is applicable to correction scenes involving various local and non-local deformations, which helps guarantee correction accuracy and improve correction efficiency. Apart from inputting the bill picture to be corrected, the correction process requires no manual participation, which reduces the dependence on manual operation, improves automation efficiency, and can reduce the error rate.
Those skilled in the art will appreciate that the figures are merely schematic representations of one preferred implementation scenario and that the blocks or flow diagrams in the figures are not necessarily required to practice the present application. Those skilled in the art will appreciate that the modules in the devices in the implementation scenario may be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or may be located in one or more devices different from the present implementation scenario with corresponding changes. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The serial numbers used in the present application are for description only and do not represent the superiority or inferiority of the implementation scenarios. The above disclosure covers only a few specific implementation scenarios of the present application; the present application is not limited thereto, and any variation that can be conceived by those skilled in the art shall fall within the protection scope of the present application.

Claims (10)

1. A method for rectifying an image of a document, comprising:
acquiring a sample bill picture with the same type as a bill picture to be corrected;
carrying out data processing on the sample bill picture to obtain a corresponding distortion matrix;
establishing a semantic segmentation model based on a semantic segmentation algorithm, and training the semantic segmentation model by using the processed sample bill picture and the corresponding distortion matrix so as to enable the semantic segmentation model to meet a preset standard;
inputting the bill picture to be corrected into a semantic segmentation model meeting the preset standard to obtain a correction matrix;
and correcting the bill picture to be corrected by using the correction matrix.
2. The method according to claim 1, wherein the data processing of the sample ticket image to obtain the corresponding warping matrix specifically comprises:
randomly selecting a plurality of distortion starting points from the sample bill picture;
performing distortion processing on the sample bill picture based on a distortion formula at the distortion starting point so as to obtain first distortion matrixes corresponding to the distortion starting points in different distortion states;
performing linear interpolation filling of missing pixel points on the sample bill picture after the distortion processing to obtain a first sample bill picture which accords with a preset size;
randomly selecting a plurality of folding deformation starting points from the sample bill picture;
folding the sample bill picture based on a folding formula at the folding deformation starting point so as to obtain a second distortion matrix corresponding to each folding deformation starting point under different folding states;
and performing linear interpolation filling of missing pixel points on the folded sample bill picture to obtain a second sample bill picture which accords with the preset size.
3. The method of claim 2, wherein the distortion formula is: $w_1 = 1 - d^{\alpha}$, wherein w1 is a preset distortion effect, α is a constant with value 2, and d is the Euclidean distance between the offset point of the distortion starting point after distortion and the distortion starting point;
the distorting the sample note picture based on a distorting formula at the distortion starting point so as to obtain a first distortion matrix corresponding to each distortion starting point in different distortion states, specifically comprising:
calculating a first offset point corresponding to the distortion initial point under the preset distortion effect according to the distortion formula;
moving the pixel point at the distortion starting point to the first offset point;
constructing a first distortion matrix comprising the distortion starting point position coordinates and the first offset point position coordinates.
4. The method of claim 2, wherein the folding formula is: $w_2 = \alpha / (d^2 + \alpha)$, wherein w2 is a preset folding effect, α is a constant with value 2, and d is the Euclidean distance between the offset point of the folding deformation starting point after the folding deformation and the folding deformation starting point;
folding the sample bill picture based on a folding formula at the folding deformation starting point so as to obtain a second distortion matrix corresponding to each folding deformation starting point under different folding states, specifically comprising:
calculating a second offset point corresponding to the folding deformation starting point under the preset folding effect according to the folding formula;
moving the pixel point at the folding deformation starting point to the second offset point;
and constructing a second distortion matrix comprising the position coordinates of the folding deformation starting point and the position coordinates of the second offset point.
5. The method according to claim 2, wherein the creating a semantic segmentation model based on a semantic segmentation algorithm, and training the semantic segmentation model using the processed sample bill picture and the corresponding warping matrix to make the semantic segmentation model meet a preset standard specifically comprises:
configuring labels for the first sample bill picture and the second sample bill picture, wherein the labels correspond to the first distortion matrix and the second distortion matrix;
inputting the first sample bill picture and the second sample bill picture after the label configuration into the semantic segmentation model, and respectively obtaining a corresponding first prediction distortion matrix and a corresponding second prediction distortion matrix;
calculating a first loss function between the first predictive warping matrix and the first warping matrix and a second loss function between the second predictive warping matrix and the second warping matrix;
if the first loss function and the second loss function are both smaller than a preset threshold value, judging that the semantic segmentation model meets a preset standard;
and if the first loss function or the second loss function which is larger than or equal to the preset threshold value is determined to exist, correcting and training the output result of the semantic segmentation model by using the first distortion matrix and the second distortion matrix so as to enable the semantic segmentation model to meet the preset standard.
6. The method according to claim 5, wherein before the step of inputting the to-be-corrected bill picture into the semantic segmentation model conforming to the preset standard and obtaining the correction matrix, the method further comprises:
adjusting the bill picture to be corrected to enable the bill picture to be corrected to conform to a preset size;
the step of inputting the bill picture to be corrected into the semantic segmentation model meeting the preset standard to obtain a correction matrix specifically comprises the following steps:
and inputting the preprocessed bill picture to be corrected into a semantic segmentation model meeting the preset standard to obtain a correction matrix.
7. The method according to claim 6, wherein said utilizing the correction matrix to correct the ticket image to be corrected comprises:
extracting the current position coordinates and the correction position coordinates of each pixel point contained in the bill picture to be corrected from the correction matrix;
determining pixel points of which the current position coordinates are not matched with the correction position coordinates as pixel points to be corrected;
and replacing the current position coordinate with the correction position coordinate to enable the pixel point to be corrected to move to a correction position, and further realizing correction processing of the bill picture to be corrected.
8. An apparatus for rectifying an image of a bill, comprising:
the acquiring module is used for acquiring a sample bill picture with the same bill type as the bill picture to be corrected;
the processing module is used for carrying out data processing on the sample bill picture to obtain a corresponding distortion matrix;
the training module is used for establishing a semantic segmentation model based on a semantic segmentation algorithm, and training the semantic segmentation model by utilizing the processed sample bill picture and the corresponding distortion matrix so as to enable the semantic segmentation model to meet a preset standard;
the input module is used for inputting the bill picture to be corrected into the semantic segmentation model meeting the preset standard to obtain a correction matrix;
and the correction module is used for correcting the bill picture to be corrected by using the correction matrix.
9. A non-transitory readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of rectification of a document image according to any one of claims 1 to 7.
10. A computer device comprising a non-volatile readable storage medium, a processor and a computer program stored on the non-volatile readable storage medium and executable on the processor, wherein the processor when executing the program implements a method of rectification of a document image according to any one of claims 1 to 7.
CN202010164109.4A 2020-03-11 2020-03-11 Correction method and device for bill image and computer equipment Active CN111507181B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010164109.4A CN111507181B (en) 2020-03-11 2020-03-11 Correction method and device for bill image and computer equipment

Publications (2)

Publication Number Publication Date
CN111507181A true CN111507181A (en) 2020-08-07
CN111507181B CN111507181B (en) 2023-05-26

Family

ID=71871553

Country Status (1)

Country Link
CN (1) CN111507181B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant