WO2019227615A1 - Method, apparatus, computer device and storage medium for correcting an invoice image - Google Patents

Method, apparatus, computer device and storage medium for correcting an invoice image

Info

Publication number
WO2019227615A1
Authority
WO
WIPO (PCT)
Prior art keywords
picture
straight lines
straight line
invoice image
text portion
Application number
PCT/CN2018/095484
Other languages
English (en)
French (fr)
Inventor
王威
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2019227615A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40: Document-oriented image-based pattern recognition
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/28: Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06V10/40: Extraction of image or video features
    • G06V10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]

Definitions

  • the present invention relates to the field of computer technology, and in particular to a method, an apparatus, a computer device, and a storage medium for correcting an invoice image.
  • the main purpose of the present invention is to provide a method, an apparatus, a computer device, and a storage medium for correcting an invoice image, used to perform unified front-view correction on invoice images and reduce quality differences between them.
  • the method for correcting an invoice image provided by the present invention includes:
  • performing black-and-white binarization on the invoice image to be corrected to obtain a first picture;
  • detecting the text portion in the first picture, and filling the detected text portion with blanks to obtain a second picture;
  • detecting the border of the second picture;
  • performing a perspective transformation on the area of the invoice image to be corrected that lies within the border, to obtain a corrected invoice picture.
  • the apparatus for correcting an invoice image provided by the present invention includes:
  • a processing unit configured to perform black-and-white binarization on the invoice image to be corrected to obtain a first picture;
  • a first detection unit configured to detect the text portion in the first picture and fill the detected text portion with blanks to obtain a second picture;
  • a second detection unit configured to detect the border of the second picture;
  • a transformation unit configured to perform a perspective transformation on the area of the invoice image to be corrected that lies within the border, to obtain a corrected invoice picture.
  • the computer device provided by the present invention includes a memory and a processor, the memory storing computer-readable instructions, wherein the processor implements the steps of the foregoing method when executing the computer-readable instructions.
  • the present invention also provides a computer non-volatile readable storage medium having computer-readable instructions stored thereon, wherein the steps of the above method are implemented when the computer-readable instructions are executed by a processor.
  • the beneficial effects of the present invention are: black-and-white binarization is performed on the invoice image to be corrected to obtain a first picture; the text portion in the first picture is detected and filled with blanks to obtain a second picture; the border of the second picture is detected; and the area of the invoice image to be corrected that lies within the border is perspective-transformed to obtain a corrected, front-view invoice picture, thereby reducing quality differences between invoice images. Consequently, when corrected invoice images are used as training samples for invoice-related models, their similar quality can significantly speed up model convergence and improve training efficiency.
  • FIG. 1 is a schematic diagram of the steps of a method for correcting an invoice image according to an embodiment of the present invention;
  • FIG. 2 is a schematic diagram of the steps of a method for correcting an invoice image in another embodiment of the present invention;
  • FIG. 3 is a schematic structural diagram of an apparatus for correcting an invoice image according to an embodiment of the present invention;
  • FIG. 4 is a schematic structural diagram of the first detection unit of an apparatus for correcting an invoice image according to an embodiment of the present invention;
  • FIG. 5 is a schematic structural diagram of the first detection module of an apparatus for correcting an invoice image according to an embodiment of the present invention;
  • FIG. 6 is a schematic structural diagram of the second detection unit of an apparatus for correcting an invoice image according to an embodiment of the present invention;
  • FIG. 7 is a schematic structural diagram of an apparatus for correcting an invoice image in another embodiment of the present invention;
  • FIG. 8 is a schematic structural diagram of the processing unit of an apparatus for correcting an invoice image according to an embodiment of the present invention;
  • FIG. 9 is a schematic block diagram of a computer device according to an embodiment of the present invention.
  • the method for correcting an invoice image according to the present invention includes:
  • Step S1: performing black-and-white binarization on the invoice image to be corrected to obtain a first picture;
  • Step S2: detecting the text portion in the first picture, and filling the detected text portion with blanks to obtain a second picture;
  • Step S3: detecting the border of the second picture;
  • Step S4: performing a perspective transformation on the area of the invoice image to be corrected that lies within the border, to obtain a corrected invoice picture.
  • in step S1, the method for correcting an invoice image in this embodiment first obtains the invoice image to be corrected, which serves as the original image for the unified front-view correction.
  • black-and-white binarization is then performed on it to obtain the corresponding first picture.
  • specifically, the border and text portions of the invoice image to be corrected become black, while the other areas outside the border and text, such as the background or blank space, become white.
  • after this binarization, the text portion of the invoice image to be corrected can be detected more easily.
  • in step S2, for the obtained first picture, the text portion in it must be detected and then filled with blanks to obtain the second picture; blanking out the text avoids interference when detecting the border of the second picture, thereby improving the accuracy of border detection for the invoice image to be corrected.
  • in step S3, before the invoice image is corrected, its border must be detected; the border of the second picture serves as the reference border for correcting the invoice image, so that the area inside the border can be given a unified front-view correction.
  • in step S4, when correcting the invoice image, the area of the invoice image to be corrected that lies within the border is obtained from the detected border position of the second picture, and a perspective transformation is then applied to that area to obtain the corrected invoice picture.
  • specifically, the four intersection points of the border are placed in a coordinate system as (x_i, y_i), i = 0, 1, 2, 3, ordered from the top-left corner, with the top-left intersection as the origin, the positive X axis pointing horizontally to the right, and the positive Y axis pointing vertically down; the four corner coordinates of the corrected picture are (x_i', y_i'), i = 0, 1, 2, 3, related to the original corners by x_0' = y_0' = y_1' = x_2' = 0, x_1' = x_3' = max(x_1 - x_0, x_3 - x_2), and y_2' = y_3' = max(y_2 - y_0, y_3 - y_1).
  • the transformation matrix M is calculated from the coordinates of the four corner points before and after the transformation.
  • the relation used to solve for M is the standard projective one:

    [t·x_i', t·y_i', t]^T = M · [x_i, y_i, 1]^T, i = 0, 1, 2, 3,

    where t is a constant and M is a 3x3 matrix; substituting the coordinates (x_i, y_i) of every point of the invoice image to be corrected into the same relation
  • yields the perspective-transformed coordinates (x_i', y_i') of all points in the corrected invoice picture, giving the front-view-corrected invoice picture and thereby reducing quality differences between invoice images. Consequently, when corrected invoice images are used as training samples for invoice-related models, their similar quality can significantly accelerate model convergence and improve training efficiency.
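The perspective-correction step above maps the four border intersections to an axis-aligned rectangle. The following is a minimal sketch of that step using OpenCV, where cv2.getPerspectiveTransform solves the 3x3 matrix M from the four point correspondences; the helper name correct_invoice and the corner ordering (top-left, top-right, bottom-left, bottom-right) are illustrative assumptions, not names from the patent.

    # A sketch of the perspective correction, assuming `corners` comes from
    # the border-detection step and is ordered TL, TR, BL, BR.
    import cv2
    import numpy as np

    def correct_invoice(image, corners):
        (x0, y0), (x1, y1), (x2, y2), (x3, y3) = corners
        # Output size: the larger of each pair of opposite edge lengths,
        # mirroring x1' = x3' = max(x1-x0, x3-x2) and y2' = y3' = max(y2-y0, y3-y1).
        w = int(max(x1 - x0, x3 - x2))
        h = int(max(y2 - y0, y3 - y1))
        src = np.float32(corners)
        dst = np.float32([[0, 0], [w, 0], [0, h], [w, h]])
        M = cv2.getPerspectiveTransform(src, dst)  # solves M from 4 point pairs
        return cv2.warpPerspective(image, M, (w, h))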
  • the step of detecting the text portion in the first picture includes:
  • Step S21: inputting the first picture into a preset CTPN model for detection, where the CTPN model has been trained on sample data consisting of a specified number of first pictures with known text portions together with the text portions marked in those pictures, and is used to detect the text in the first picture;
  • Step S22: obtaining the detection result output by the CTPN model, the detection result being the text portion in the first picture.
  • a CTPN model is used for the detection, the CTPN model being a trained model.
  • the method for training the CTPN model is as follows: first obtain a large amount of sample data and divide it into a training set and a test set, where the sample data consists of first pictures with known text portions and the text portions marked in those pictures.
  • the sample data of the training set is input into a preset CTPN model for training, producing a result model for detecting text portions.
  • the first pictures with known text portions in the test set are then input into the result model to obtain three outputs: priorbox, pred, and score; these are compared against the text data marked in the input pictures to verify whether the model meets the requirements.
  • specifically, a loss function is used to check whether the weighted sum of the classification loss and the regression loss meets the requirements.
  • the classification loss is the cross-entropy between the predicted classification of the text type and its true classification:

    E = -Σ_{n=1..N} Σ_{k=1..K} t_nk · ln(y_nk),

    where n ranges over the text types, k over the classifications, t_nk = 1 if text type n belongs to class k and 0 otherwise, and y_nk is the predicted probability that text type n belongs to class k.
  • the regression loss is a smooth L1 loss computed from the predicted and actual text positions; specifically, the two diagonal corner coordinates of the text position (corresponding to four values) are selected for the computation. The loss takes the standard piecewise form

    smooth_L1(x) = 0.5·(σx)² if |x| < 1/σ², and |x| - 0.5/σ² otherwise,

    where x is the difference between the predicted and the actual text position and σ is an adjustable parameter.
  • during training, σ is tuned so that the classification loss function and the smooth L1 loss function contribute roughly equally to the weighted sum, so that minimizing the loss drives the training of the CTPN model.
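The smooth L1 regression loss above can be written directly in NumPy. The sketch below assumes the standard piecewise form with the adjustable parameter σ; the default value of sigma is illustrative only.

    # A sketch of the smooth L1 loss; `sigma` balances it against the
    # classification loss, as described above (default value is illustrative).
    import numpy as np

    def smooth_l1(x, sigma=3.0):
        abs_x = np.abs(x)
        quad = 0.5 * (sigma * x) ** 2      # quadratic branch near zero
        lin = abs_x - 0.5 / sigma ** 2     # linear branch for large errors
        return np.where(abs_x < 1.0 / sigma ** 2, quad, lin).sum()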
  • after training of the CTPN model is complete, inputting the first picture into the trained CTPN model makes it output a detection result, namely the text portion in the first picture, which can then be filled with blanks to make border detection easier.
  • the CTPN model includes a VGG network, an LSTM network, and a fully connected layer;
  • the step S21 of inputting the first picture into a preset CTPN model for detection includes:
  • Step S211: processing the first picture into a black-and-white picture meeting the specified pixel requirement;
  • Step S212: inputting the black-and-white picture into the VGG network and performing convolution to obtain a plurality of first picture features;
  • Step S213: performing correlation feature computation on the first picture features through the LSTM network to obtain a plurality of second picture features;
  • Step S214: combining the plurality of second picture features into a global picture feature through the fully connected layer, and outputting the detection result.
  • before being input into the preset CTPN model, the first picture must be processed into a black-and-white picture meeting the specified pixel requirement; specifically, while keeping its aspect ratio unchanged, the largest dimension of the first picture is first scaled to 256 pixels, yielding the required black-and-white picture.
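As a small illustration of that preprocessing rule, a sketch of the aspect-preserving resize follows (the helper name resize_max_dim is hypothetical):

    # Scale so the larger dimension becomes 256 px, keeping the aspect ratio.
    import cv2

    def resize_max_dim(picture, max_dim=256):
        h, w = picture.shape[:2]
        scale = max_dim / max(h, w)
        return cv2.resize(picture, (int(w * scale), int(h * scale)))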
  • the black-and-white picture meeting the specified pixel requirement is input into the CTPN model for detection; the CTPN model specifically includes a VGG network, an LSTM network, and a fully connected layer.
  • the VGG network in the CTPN model performs convolution on the black-and-white picture to obtain the first picture features; the LSTM network then performs correlation feature computation on these first picture features to obtain the second picture features.
  • it should be pointed out that adding the LSTM network to the CTPN model lets it fully exploit the sequential context of the text within the first picture features and directly predict the position, type, and confidence of the text in one pass, greatly improving both the speed and the accuracy of detecting the text portion in the first picture.
  • since the second picture features are local picture features, they must be combined into a global picture feature through the fully connected layer, and the detection result is finally obtained from this global feature.
  • the corresponding detection results are priorbox, pred, and score, where priorbox indicates the text position, pred indicates the text type, and score indicates the confidence of the text type at a specific position; from these three parameters the text portion of the first picture is obtained.
  • the step S3 of detecting the border of the second picture includes:
  • Step S31: detecting a plurality of black short straight lines in the second picture;
  • Step S32: determining the direction of each of the short straight lines, and calculating the distances between adjacent short straight lines;
  • Step S33: assigning short straight lines whose mutual distance is below a preset threshold and whose directions satisfy a preset consistency condition to the same short-straight-line group, obtaining multiple short-straight-line groups;
  • Step S34: fitting the short straight lines within each group to obtain a corresponding set of long straight lines;
  • Step S35: classifying the long straight lines by their position in the second picture to obtain multiple azimuth straight-line groups;
  • Step S36: deleting the long straight lines that do not meet the conditions in each azimuth straight-line group according to a preset rule;
  • Step S37: calculating the average slope of the remaining long straight lines in each azimuth straight-line group;
  • Step S38: taking the endpoints of the remaining long straight lines in each azimuth straight-line group, finding among them the endpoint closest to the boundary of the second picture on the side of that group as the designated point, and generating the boundary straight line of the second picture on that side from the average slope of the group's remaining long straight lines and the designated point;
  • Step S39: generating the border of the second picture from the boundary straight lines and a preset border rule.
  • in this embodiment, the plurality of black short straight lines is obtained by applying a probabilistic Hough transform to the second picture.
  • the principle is that, placing the second picture in a rectangular coordinate system, for every point (x_i, y_i), i = 1, 2, ..., n lying on the same straight line, the value x_i·cosθ_i + y_i·sinθ_i is the same for all of them, and this value is the distance from the origin to the line through all those points, where θ_i denotes the angle between (x_i, y_i) and the positive horizontal axis, i.e. the angular component of the polar representation (θ_i, ρ_i) of (x_i, y_i).
  • edge points of the second picture are then randomly sampled for detection: if a point has already been marked as lying on a previously detected short straight line, it is skipped; otherwise the collinear points along the detected line's direction are marked to determine the endpoints of the short straight line, until all edge points of the second picture have been drawn.
  • compared with the classic Hough transform, which must traverse every edge point of the second picture, the probabilistic Hough transform is faster, and it detects short straight lines that fit the edges of the figure rather than long straight lines spanning the entire image.
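OpenCV exposes the probabilistic Hough transform as cv2.HoughLinesP; a minimal sketch of the short-line detection follows, where the rho, theta, threshold, minLineLength, and maxLineGap values are illustrative choices, not parameters given in the patent.

    # A sketch of short-line detection via the probabilistic Hough transform.
    import cv2
    import numpy as np

    def detect_short_lines(binary_picture):
        # Hough expects white features on black, so invert the black-on-white picture.
        inverted = cv2.bitwise_not(binary_picture)
        lines = cv2.HoughLinesP(inverted, rho=1, theta=np.pi / 180,
                                threshold=30, minLineLength=20, maxLineGap=5)
        # Each detected segment is returned as (x1, y1, x2, y2).
        return [] if lines is None else [tuple(l[0]) for l in lines]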
  • the direction of the short straight lines is determined as follows: the absolute value of the difference between the cosine of the angle between two short straight lines and 1 is used as the measure.
  • when this absolute value is 0, the two short straight lines are parallel; when it is 1, the two short straight lines are perpendicular.
  • preferably, when the absolute value of the difference between the cosine of the angle and 1 is less than 0.1, the directions of the two short straight lines are also deemed consistent.
  • the distance between adjacent short straight lines is calculated by taking the two endpoints of each of the two lines and computing the distance from each endpoint to the other line, giving four endpoint-to-line distances, of which the maximum is selected.
  • when this maximum is below a preset threshold, specifically 15 pixels, the two short straight lines are considered very close.
  • following this method, short straight lines whose mutual distance is below the preset threshold and whose directions satisfy the preset consistency condition are assigned to the same short-straight-line group, yielding multiple short-straight-line groups.
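A sketch of the two grouping tests just described, with segments represented as (x1, y1, x2, y2) tuples; the tolerance 0.1 and the 15-pixel threshold are the values given in the text, while the function names are hypothetical.

    # Direction consistency via |cos(angle) - 1|, closeness via the maximum
    # of the four endpoint-to-line distances.
    import numpy as np

    def same_direction(l1, l2, tol=0.1):
        v1 = np.array([l1[2] - l1[0], l1[3] - l1[1]], float)
        v2 = np.array([l2[2] - l2[0], l2[3] - l2[1]], float)
        cos = abs(v1 @ v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        return abs(cos - 1.0) < tol

    def point_to_line(p, line):
        a, b = np.array(line[:2], float), np.array(line[2:], float)
        d = b - a
        # Perpendicular distance from p to the infinite line through a and b.
        return abs(d[0] * (p[1] - a[1]) - d[1] * (p[0] - a[0])) / np.linalg.norm(d)

    def are_close(l1, l2, thresh=15.0):
        dists = [point_to_line(l1[:2], l2), point_to_line(l1[2:], l2),
                 point_to_line(l2[:2], l1), point_to_line(l2[2:], l1)]
        return max(dists) < thresh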
  • the fitting uses the least-squares method. It should be noted that for short straight lines close to horizontal, least squares can be used directly; for short straight lines close to vertical, their slope is very large and a direct fit would produce a relatively large error, so in that case the x and y coordinates are swapped, least squares is applied, and the coordinates are swapped back after computing the result.
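A sketch of that fitting rule using numpy.polyfit; near_vertical would be decided from the group's direction, and the coordinate swap mirrors the procedure described above.

    # Least-squares fit of one group of short lines into a single long line,
    # swapping x and y for near-vertical groups to keep the fit stable.
    import numpy as np

    def fit_long_line(xs, ys, near_vertical):
        if near_vertical:
            a, b = np.polyfit(ys, xs, 1)   # fit x = a*y + b in swapped coordinates
            y0, y1 = min(ys), max(ys)
            return (a * y0 + b, y0, a * y1 + b, y1)   # swap back: (x1, y1, x2, y2)
        a, b = np.polyfit(xs, ys, 1)       # fit y = a*x + b
        x0, x1 = min(xs), max(xs)
        return (x0, a * x0 + b, x1, a * x1 + b)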
  • the specific grouping is: long straight lines that are horizontal and located in the upper third of the second picture go into the upper group; long straight lines that are horizontal and located in the lower third go into the lower group; long straight lines that are vertical and located in the left third go into the left group; and long straight lines that are vertical and located in the right third go into the right group, so that all the long straight lines are classified by position.
  • for the remaining long straight lines in each azimuth straight-line group, their endpoints are collected, and the endpoint closest to the boundary of the second picture on that group's side is taken as the designated point; for example, when the azimuth straight-line group is the upper group, the designated point is the endpoint closest to the upper boundary among all the endpoints.
  • from the average slope of the remaining long straight lines in each group and the designated point, the boundary straight line of the second picture on the corresponding side is generated.
  • the preset border rule takes the line segments corresponding to the closed area enclosed by the obtained boundary straight lines as the border of the second picture.
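A sketch of the boundary-line construction: each side's line is built from the group's average slope and the designated point, and intersecting the four boundary lines then yields the border corners used by the perspective transformation. The intersection step is an inference from the "closed area" rule rather than a step spelled out in the text, and near-vertical sides would be handled in swapped coordinates as described above.

    # Build a boundary line from a slope and the designated point, and
    # intersect two boundary lines; lines are stored as (a, b, c) with
    # a*x + b*y + c = 0.
    import numpy as np

    def line_through(point, slope):
        x0, y0 = point
        return (slope, -1.0, y0 - slope * x0)   # y = slope*x + (y0 - slope*x0)

    def intersect(l1, l2):
        (a1, b1, c1), (a2, b2, c2) = l1, l2
        m = np.array([[a1, b1], [a2, b2]], float)
        return tuple(np.linalg.solve(m, [-c1, -c2]))   # a corner of the border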
  • the step S36 of deleting unqualified long straight lines in each azimuth straight-line group according to a preset rule includes:
  • when a group contains exactly two long straight lines whose directions are inconsistent, deleting the shorter one;
  • when a group contains more than two long straight lines, deleting any long straight line whose direction is inconsistent with more than half of the other long straight lines in the group.
  • whether the directions of two long straight lines agree is judged by the absolute value of the difference between the cosine of their angle and 1: a value of 0 indicates the two long straight lines are parallel, and a value of 1 indicates they are perpendicular.
  • when a group contains only one long straight line, that line is taken as the boundary straight line of that azimuth straight-line group.
  • the azimuth straight-line groups include an upper group, a lower group, a left group, and a right group, and the boundary straight lines correspondingly include an upper, a lower, a left, and a right boundary straight line; the step S39 of generating the border of the second picture from the boundary straight lines and the preset border rule includes:
  • obtaining the line segments corresponding to the closed area enclosed by the upper, lower, left, and right boundary straight lines as the border of the second picture.
  • it should be noted that when one side has no azimuth straight-line group, it is checked whether the opposite side has one; if it does, the opposite side's boundary straight line is translated toward this side, stopping at the position where it is about to separate from the endpoint of one of the perpendicular sides' boundary straight lines, and the translated line is then used as this side's boundary straight line for the above operation. If the opposite side has no straight line either, the second picture's own boundaries on both sides are used directly as the border.
  • before the step of detecting the text portion in the first picture and filling it with blanks to obtain the second picture, the method includes:
  • Step S201: adjusting the contrast of the first picture.
  • adjusting the contrast of the first picture makes the distinction between its black and white parts more obvious.
  • the contrast may specifically be adjusted with the contrast-limited adaptive histogram equalization (CLAHE) algorithm, which adaptively clips the image histogram and then uses the clipped histogram to equalize the black-and-white picture; its advantage is that it makes the black areas corresponding to the text and border portions of the first picture stand out more clearly from the white areas corresponding to the blank portions and the background.
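OpenCV provides CLAHE directly; a minimal sketch of the contrast-adjustment step follows, where clipLimit and tileGridSize are illustrative defaults rather than values from the patent.

    # Contrast-limited adaptive histogram equalization on the grayscale picture.
    import cv2

    def adjust_contrast(gray_picture):
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        return clahe.apply(gray_picture)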
  • the step S1 of performing black-and-white binarization on the invoice image to be corrected to obtain the first picture includes:
  • Step S11: converting the invoice image to be corrected into a grayscale image;
  • Step S12: performing black-and-white binarization on the grayscale image to obtain the first picture.
  • since the color of each pixel in the invoice image to be corrected is determined by the three components R, G, and B, and each component can take 256 values, each pixel has a range of more than 16 million colors.
  • a grayscale image is a special color image whose three components R, G, and B are equal, so each pixel can take only 256 values; therefore, converting the invoice image to a grayscale image before black-and-white binarization reduces the subsequent computation.
  • one way to convert the invoice image to grayscale is to take the average of the R, G, and B components of each pixel and assign this average to all three components of that pixel.
  • other conversion methods can also be used; for example, in the YUV color space the Y component represents the brightness of a point, and from the relation between the RGB and YUV color spaces the correspondence Y = 0.3R + 0.59G + 0.11B can be established, so the brightness value Y can be used as the gray value of each pixel of the invoice image, which likewise reduces the subsequent computation.
  • once the grayscale image is obtained, black-and-white binarization can be applied to it. Specifically, for each pixel P in the grayscale image, a square matrix R of side 21 pixels centered on P is selected; the gray values of all pixels in R are sorted from large to small (from white to black); the smallest gray value T among the largest 20% of gray values in R is selected as the gray threshold; and if the gray value of P is below the threshold T, P is set to black, otherwise P is set to white.
  • applying this binarization to every pixel turns the text and border portions of the grayscale image black and the background and blank portions white, producing the corresponding first picture; the binarized first picture facilitates detecting the text portion and detecting the border.
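A direct, unoptimized sketch of steps S11 and S12 as described: luminance-weighted grayscale conversion followed by the local 21x21 binarization whose threshold T is the smallest value among the brightest 20% of each window. A practical implementation would vectorize the loop; the helper names are hypothetical.

    import numpy as np

    def to_gray(rgb):
        # Y = 0.3R + 0.59G + 0.11B, as in the YUV-based conversion above.
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        return (0.3 * r + 0.59 * g + 0.11 * b).astype(np.uint8)

    def binarize(gray, window=21, keep=0.2):
        h, w = gray.shape
        pad = window // 2
        padded = np.pad(gray, pad, mode="edge")
        out = np.zeros_like(gray)
        for y in range(h):
            for x in range(w):
                block = np.sort(padded[y:y + window, x:x + window], axis=None)
                t = block[int(block.size * (1 - keep))]   # smallest of top 20%
                out[y, x] = 0 if gray[y, x] < t else 255
        return out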
  • the apparatus for correcting an invoice image in this embodiment includes:
  • a processing unit 10 configured to perform black-and-white binarization on the invoice image to be corrected to obtain a first picture;
  • a first detection unit 20 configured to detect the text portion in the first picture and fill the detected text portion with blanks to obtain a second picture;
  • a second detection unit 30 configured to detect the border of the second picture;
  • a transformation unit 40 configured to perform a perspective transformation on the area of the invoice image to be corrected that lies within the border, to obtain a corrected invoice picture.
  • the apparatus for correcting an invoice image in this embodiment first obtains the invoice image to be corrected, which serves as the original image for the unified front-view correction.
  • black-and-white binarization is performed on it to obtain the corresponding first picture: the border and text portions of the invoice image become black, while other areas outside them, such as the background or blank space, become white.
  • after the processing unit 10 performs this binarization, the text portion of the invoice image to be corrected can be detected more easily.
  • for the obtained first picture, the first detection unit 20 detects the text portion and fills it with blanks to obtain the second picture; blanking out the text avoids interference when detecting the border of the second picture, thereby improving the accuracy of border detection for the invoice image.
  • before the invoice image is corrected, the second detection unit 30 detects its border; the border of the second picture serves as the reference border for correcting the invoice image, so that the area inside the border can be given a unified front-view correction.
  • when the invoice image is corrected, the area lying within the border is obtained from the detected border position of the second picture, and the transformation unit 40 applies a perspective transformation to that area to obtain the corrected invoice picture; the specifics of the perspective transformation have been described in the foregoing method embodiment and are not repeated here.
  • the first detection unit 20 includes:
  • a first detection module 21 configured to input the first picture into a preset CTPN model for detection, where the CTPN model has been trained on sample data consisting of a specified number of first pictures with known text portions together with the text portions marked in those pictures, and is used to detect the text portion in the first picture;
  • an obtaining module 22 configured to obtain the detection result output by the CTPN model, the detection result being the text portion in the first picture.
  • for the obtained first picture, the first detection module 21 detects the text portion; a CTPN model, already trained, is used for the detection.
  • the method for training the CTPN model has been described in the foregoing method embodiment and is not repeated here.
  • the CTPN model includes a VGG network, an LSTM network, and a fully connected layer, and the first detection module 21 includes:
  • a processing submodule 211 configured to process the first picture into a black-and-white picture meeting the specified pixel requirement;
  • a first calculation submodule 212 configured to input the black-and-white picture into the VGG network and perform convolution to obtain a plurality of first picture features;
  • a second calculation submodule 213 configured to perform correlation feature computation on the first picture features through the LSTM network to obtain a plurality of second picture features;
  • a combining submodule 214 configured to combine the plurality of second picture features into a global picture feature through the fully connected layer, and output the detection result.
  • before the first picture is input into the preset CTPN model, the processing submodule 211 processes it into a black-and-white picture meeting the specified pixel requirement: keeping the aspect ratio unchanged, the largest dimension of the first picture is scaled to 256 pixels.
  • the first calculation submodule 212 inputs the resulting black-and-white picture into the CTPN model for detection; the CTPN model specifically includes a VGG network, an LSTM network, and a fully connected layer.
  • the VGG network performs convolution on the black-and-white picture to obtain the first picture features; the second calculation submodule 213 then performs correlation feature computation on the first picture features through the LSTM network to obtain the second picture features. It should be pointed out that adding the LSTM network lets the CTPN model fully exploit the sequential context of the text within the first picture features and directly predict the position, type, and confidence of the text in one pass, greatly improving both the speed and the accuracy of detecting the text portion in the first picture.
  • since the second picture features are local picture features, the combining submodule 214 combines them into a global picture feature through the fully connected layer, and the detection result is finally obtained from this global feature.
  • the corresponding detection results are priorbox, pred, and score, where priorbox indicates the text position, pred indicates the text type, and score indicates the confidence of the text type at a specific position; from these three parameters the text portion of the first picture is obtained.
  • the second detection unit 30 includes:
  • a second detection module 31 configured to detect a plurality of black short straight lines in the second picture;
  • an execution module 32 configured to determine the direction of each of the short straight lines and calculate the distances between adjacent short straight lines;
  • a grouping module 33 configured to assign short straight lines whose mutual distance is below a preset threshold and whose directions satisfy a preset consistency condition to the same short-straight-line group, obtaining multiple short-straight-line groups;
  • a fitting module 34 configured to fit the short straight lines within each group to obtain a corresponding set of long straight lines;
  • a classification module 35 configured to classify the long straight lines by their position in the second picture to obtain multiple azimuth straight-line groups;
  • a deletion module 36 configured to delete the long straight lines that do not meet the conditions in each azimuth straight-line group according to a preset rule;
  • a first calculation module 37 configured to calculate the average slope of the remaining long straight lines in each azimuth straight-line group;
  • a second calculation module 38 configured to take the endpoints of the remaining long straight lines in each azimuth straight-line group, find among them the endpoint closest to the boundary of the second picture on the side of that group as the designated point, and generate the boundary straight line of the second picture on that side from the average slope of the group's remaining long straight lines and the designated point;
  • a generation module 39 configured to generate the border of the second picture from the boundary straight lines and a preset border rule.
  • in this embodiment, the second detection module 31 obtains the plurality of black short straight lines by applying a probabilistic Hough transform to the second picture: placing the second picture in a rectangular coordinate system, for every point (x_i, y_i), i = 1, 2, ..., n lying on the same straight line, the value x_i·cosθ_i + y_i·sinθ_i is the same for all of them and equals the distance from the origin to the line through all those points, where θ_i denotes the angle between (x_i, y_i) and the positive horizontal axis, i.e. the angular component of the polar representation (θ_i, ρ_i) of (x_i, y_i). Edge points in the second picture are then randomly sampled for detection: if a point has already been marked as lying on a previously detected short straight line, it is skipped; otherwise the collinear points along the detected line's direction are marked to determine the endpoints of the short straight line, until all edge points in the second picture have been drawn.
  • compared with the classic Hough transform, which must traverse every edge point of the second picture, the probabilistic Hough transform is faster, and it detects short straight lines that fit the edges of the figure rather than long straight lines spanning the entire image.
  • for the plurality of short straight lines detected by the probabilistic Hough transform, the execution module 32 determines their directions and calculates the distances between adjacent short straight lines.
  • the direction determination uses the absolute value of the difference between the cosine of the angle between two short straight lines and 1 as the measure: a value of 0 indicates the two lines are parallel, and a value of 1 indicates they are perpendicular; preferably, when this value is less than 0.1, the directions of the two short straight lines are also deemed consistent.
  • the distance between adjacent short straight lines is calculated by taking the two endpoints of each of the two lines and computing the distance from each endpoint to the other line, giving four endpoint-to-line distances, of which the maximum is selected; when this maximum is below a preset threshold, specifically 15 pixels, the two short straight lines are considered very close.
  • following this method, the grouping module 33 assigns short straight lines whose mutual distance is below the preset threshold and whose directions satisfy the consistency condition to the same short-straight-line group, yielding multiple short-straight-line groups.
  • the fitting module 34 fits the short straight lines of each group, group by group, to obtain the corresponding long straight lines; the fitting uses the least-squares method. It should be noted that for short straight lines close to horizontal, least squares can be used directly; for short straight lines close to vertical, their slope is very large and a direct fit would produce a relatively large error, so in that case the x and y coordinates are swapped, least squares is applied, and the coordinates are swapped back after computing the result.
  • for the resulting long straight lines, the classification module 35 classifies them all by position to obtain multiple azimuth straight-line groups: long straight lines that are horizontal and located in the upper third of the second picture go into the upper group; long straight lines that are horizontal and located in the lower third go into the lower group; long straight lines that are vertical and located in the left third go into the left group; and long straight lines that are vertical and located in the right third go into the right group.
  • for the long straight lines within each azimuth straight-line group, the deletion module 36 deletes those that do not meet the conditions according to the preset rule; the purpose is to remove long straight lines whose direction disagrees with the other long straight lines in their group, excluding lines not produced by the boundary of the second picture.
  • the first calculation module 37 calculates the average slope of the remaining long straight lines in each group. It should be noted that for long straight lines close to horizontal the slope can be computed directly; for long straight lines close to vertical the slope is very large, so the x and y coordinates are swapped before computing the slope and swapped back after the result is obtained.
  • the endpoints of the remaining long straight lines in each group are collected, and the endpoint closest to the boundary of the second picture on that group's side is taken as the designated point; for example, when the group is the upper group, the designated point is the endpoint closest to the upper boundary among all the endpoints. The second calculation module 38 generates the boundary straight line of the second picture on the corresponding side from the average slope of the group's remaining long straight lines and the designated point.
  • the generation module 39 generates the border of the second picture from the obtained boundary straight lines and the preset border rule, the rule being that the line segments corresponding to the closed area enclosed by the obtained boundary straight lines serve as the border of the second picture.
  • the deletion module 36 is configured to delete the shorter long straight line when an azimuth straight-line group contains exactly two long straight lines with inconsistent directions, and, when a group contains more than two long straight lines, to delete any long straight line whose direction is inconsistent with more than half of the other long straight lines in the group.
  • whether the directions of two long straight lines agree is judged by the absolute value of the difference between the cosine of their angle and 1: a value of 0 indicates the two long straight lines are parallel, and a value of 1 indicates they are perpendicular.
  • when a group contains only one long straight line, that line is taken as the boundary straight line of that azimuth straight-line group.
  • when the apparatus classifies the long straight lines by their position in the second picture, multiple azimuth straight-line groups are obtained; when these include an upper group, a lower group, a left group, and a right group, the boundary straight lines include an upper, a lower, a left, and a right boundary straight line, and the generation module 39 obtains the line segments corresponding to the closed area enclosed by the upper, lower, left, and right boundary straight lines as the border of the second picture.
  • the apparatus for correcting an invoice image in another embodiment further includes:
  • an adjusting unit 201 configured to adjust the contrast of the first picture.
  • the adjusting unit 201 adjusts the contrast of the first picture to make the distinction between its black and white parts more obvious; the contrast may specifically be adjusted with the contrast-limited adaptive histogram equalization (CLAHE) algorithm, which adaptively clips the image histogram and then uses the clipped histogram to equalize the black-and-white picture, making the black areas corresponding to the text and border portions of the first picture stand out more clearly from the white areas corresponding to the blank portions and the background.
  • the processing unit 10 includes:
  • a conversion module 11 configured to convert the invoice image to be corrected into a grayscale image;
  • a processing module 12 configured to perform black-and-white binarization on the grayscale image to obtain the first picture.
  • since the color of each pixel in the invoice image to be corrected is determined by the three components R, G, and B, and each component can take 256 values, each pixel has a range of more than 16 million colors, whereas a grayscale image is a special color image whose three components are equal, so each pixel can take only 256 values; therefore, before the black-and-white binarization, the conversion module 11 converts the invoice image to be corrected into a grayscale image, which reduces the subsequent computation.
  • one way to convert the invoice image to grayscale is to take the average of the R, G, and B components of each pixel and assign this average to all three components of that pixel; other methods can also be used, for example, in the YUV color space the Y component represents the brightness of a point, and from the relation between the RGB and YUV color spaces the correspondence Y = 0.3R + 0.59G + 0.11B can be established, so the brightness value Y can be used as the gray value of each pixel, which likewise reduces the subsequent computation.
  • after the grayscale image converted from the invoice image is obtained, the processing module 12 performs black-and-white binarization on it: for each pixel P in the grayscale image, a square matrix R of side 21 pixels centered on P is selected; the gray values of all pixels in R are sorted from large to small (from white to black); the smallest gray value T among the largest 20% of gray values in R is selected as the gray threshold; and if the gray value of P is below the threshold T, P is set to black, otherwise P is set to white.
  • applying this binarization to every pixel turns the text and border portions of the grayscale image black and the background and blank portions white, producing the corresponding first picture; the binarized first picture facilitates detecting the text portion and detecting the border.
  • an embodiment of the present invention further provides a computer device, which may be a server whose internal structure may be as shown in FIG. 9.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus; the processor provides the computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system, computer-readable instructions, and a database, and the internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium.
  • the database of the computer device is used to store data such as that preset by the method for correcting an invoice image, and the network interface of the computer device is used to communicate with external terminals through a network connection.
  • an embodiment of the present invention also provides a computer non-volatile readable storage medium on which computer-readable instructions are stored; when executed by a processor, the computer-readable instructions implement the processes of the foregoing method embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

A method, an apparatus, a computer device and a storage medium for correcting an invoice image, the method including: performing black-and-white binarization on the invoice image to be corrected to obtain a first picture (S1); detecting the text portion in the first picture, and filling the detected text portion with blanks to obtain a second picture (S2); detecting the border of the second picture (S3); and performing a perspective transformation on the area of the invoice image to be corrected that lies within the border to obtain a corrected invoice picture (S4).

Description

Method, apparatus, computer device and storage medium for correcting an invoice image
This application claims priority to Chinese patent application No. 2018105572039, filed with the Chinese Patent Office on June 1, 2018 and entitled "Method, apparatus, computer device and storage medium for correcting an invoice image", the entire content of which is incorporated herein by reference.
Technical Field
The present invention relates to the field of computer technology, and in particular to a method, an apparatus, a computer device and a storage medium for correcting an invoice image.
Background
To extract the text of specific fields from new invoice images, a large number of already-labeled invoice images must first be fed into a specific deep-learning algorithm to train a model, and the trained model is then used to extract the text of specific fields from new invoice images. Training such a model requires a large number of invoice images. Because the many invoice images used for training come from different sources, for example uploaded as photographs by different users, and a photographed invoice image is affected by factors such as the shooting angle, the lighting conditions, the background, and the position of the invoice within the image, the quality of these invoice images varies enormously; using them directly to train a model makes the model converge slowly. How to apply a unified front-view correction to these invoice images and thereby reduce their quality differences has therefore become a problem demanding a solution.
Technical Problem
The main purpose of the present invention is to provide a method, an apparatus, a computer device and a storage medium for correcting an invoice image, used to perform unified front-view correction on invoice images and reduce their quality differences.
Technical Solution
The method for correcting an invoice image provided by the present invention includes:
performing black-and-white binarization on the invoice image to be corrected to obtain a first picture;
detecting the text portion in the first picture, and filling the detected text portion with blanks to obtain a second picture;
detecting the border of the second picture;
performing a perspective transformation on the area of the invoice image to be corrected that lies within the border to obtain a corrected invoice picture.
The apparatus for correcting an invoice image provided by the present invention includes:
a processing unit configured to perform black-and-white binarization on the invoice image to be corrected to obtain a first picture;
a first detection unit configured to detect the text portion in the first picture and fill the detected text portion with blanks to obtain a second picture;
a second detection unit configured to detect the border of the second picture;
a transformation unit configured to perform a perspective transformation on the area of the invoice image to be corrected that lies within the border to obtain a corrected invoice picture.
The present invention further provides a computer device including a memory and a processor, the memory storing computer-readable instructions, wherein the processor implements the steps of the above method when executing the computer-readable instructions.
The present invention further provides a computer non-volatile readable storage medium on which computer-readable instructions are stored, wherein the steps of the above method are implemented when the computer-readable instructions are executed by a processor.
Beneficial Effects
The beneficial effects of the present invention are: black-and-white binarization is performed on the invoice image to be corrected to obtain a first picture; the text portion in the first picture is computed and filled with blanks to obtain a second picture; the border of the second picture is detected; and the area of the invoice image to be corrected that lies within the border is perspective-transformed to obtain a corrected, front-view invoice picture, thereby reducing quality differences between invoice images. Consequently, when corrected invoice images are used as training samples for invoice-related models, their similar quality can significantly speed up model convergence and improve training efficiency.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the steps of a method for correcting an invoice image according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the steps of a method for correcting an invoice image in another embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an apparatus for correcting an invoice image according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of the first detection unit of an apparatus for correcting an invoice image according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of the first detection module of an apparatus for correcting an invoice image according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of the second detection unit of an apparatus for correcting an invoice image according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an apparatus for correcting an invoice image in another embodiment of the present invention;
FIG. 8 is a schematic structural diagram of the processing unit of an apparatus for correcting an invoice image according to an embodiment of the present invention;
FIG. 9 is a schematic structural block diagram of a computer device according to an embodiment of the present invention.
Best Mode for Carrying Out the Invention
Referring to FIG. 1, the method for correcting an invoice image provided by the present invention includes:
Step S1: performing black-and-white binarization on the invoice image to be corrected to obtain a first picture;
Step S2: detecting the text portion in the first picture, and filling the detected text portion with blanks to obtain a second picture;
Step S3: detecting the border of the second picture;
Step S4: performing a perspective transformation on the area of the invoice image to be corrected that lies within the border to obtain a corrected invoice picture.
In step S1, the method for correcting an invoice image in this embodiment first obtains the invoice image to be corrected, which serves as the original image for the unified front-view correction. Black-and-white binarization is performed on it to obtain the corresponding first picture; specifically, the border and text portions of the invoice image become black, while the other areas outside them, such as the background or blank space, become white. After this binarization, the text portion of the invoice image to be corrected can be detected more easily.
In step S2, for the obtained first picture, the text portion in it must be detected and then filled with blanks to obtain the second picture; blanking out the text avoids interference when detecting the border of the second picture, thereby improving the accuracy of border detection for the invoice image to be corrected.
In step S3, before the invoice image is corrected, its border must be detected; the border of the second picture serves as the reference border for correcting the invoice image, so that the area inside the border can be given a unified front-view correction.
In step S4, when correcting the invoice image, the area of the invoice image to be corrected that lies within the border is obtained from the detected border position of the second picture, and a perspective transformation is then applied to that area to obtain the corrected invoice picture. Specifically, the four intersection points of the border of the invoice image to be corrected are placed in a coordinate system as (x_i, y_i), i = 0, 1, 2, 3, ordered from the top-left corner; in this coordinate system, the top-left intersection of the border of the second picture is taken as the origin, with the positive X axis pointing horizontally to the right and the positive Y axis pointing vertically down. The coordinates of the four intersection points on the border of the corrected invoice picture are (x_i', y_i'), i = 0, 1, 2, 3, related to the four intersection points of the invoice image before correction by x_0' = y_0' = y_1' = x_2' = 0, x_1' = x_3' = max(x_1 - x_0, x_3 - x_2), and y_2' = y_3' = max(y_2 - y_0, y_3 - y_1). The transformation matrix M is computed from the coordinates of the four points before and after the transformation, and the relation for solving M is:
    [t·x_i', t·y_i', t]^T = M · [x_i, y_i, 1]^T, i = 0, 1, 2, 3, where M is a 3x3 matrix.
In the above formula, t is a constant. With the transformation matrix M obtained from the above formula, the coordinates (x_i, y_i) of every point of the invoice image to be corrected are substituted into the formula:
    [t·x_i', t·y_i', t]^T = M · [x_i, y_i, 1]^T,
so as to obtain the perspective-transformed coordinates (x_i', y_i') of all points in the corrected invoice picture, giving the front-view-corrected invoice picture and thereby reducing quality differences between invoice images. Consequently, when corrected invoice images are used as training samples for invoice-related models, their similar quality can significantly accelerate model convergence and improve training efficiency.
In this embodiment, the step of detecting the text portion in the first picture includes:
Step S21: inputting the first picture into a preset CTPN model for detection, where the CTPN model has been trained on sample data consisting of a specified number of first pictures with known text portions together with the text portions marked in those pictures, and is used to detect the text portion in the first picture;
Step S22: obtaining the detection result output by the CTPN model, the detection result being the text portion in the first picture.
For the obtained first picture, the text portion in it must be detected; in this embodiment a CTPN model, already trained, is used for the detection. The method for training the CTPN model is as follows: first obtain a large amount of sample data and divide it into a training set and a test set, where the sample data consists of first pictures with known text portions and the text portions marked in those pictures. The sample data of the training set is input into a preset CTPN model for training, producing a result model for detecting text portions. For the trained result model, the first pictures with known text portions in the test set are input into it to obtain the three outputs priorbox, pred, and score, and these three outputs are compared with the data of the text portions marked in the input first pictures to verify whether the requirements are met; specifically, a loss function is used to check whether the weighted sum of the classification loss and the regression loss meets the requirements. The classification loss is the cross-entropy computed from the predicted classification of the text type and the true classification of the text type, with the formula
    E = -Σ_{n=1..N} Σ_{k=1..K} t_nk · ln(y_nk).
Here n = 1, ..., N ranges over all text types and k = 1, ..., K ranges over all classifications of text types. Since the predicted classification of a text type can take many forms, t_nk = 1 when text type n belongs to class k and t_nk = 0 otherwise, and y_nk denotes the predicted probability that text type n belongs to class k. The regression loss is a smooth L1 loss computed from the predicted text position and the true text position; specifically, the two diagonal corner coordinates of the text position (corresponding to four values) are selected for the computation, with the formula
    smooth_L1(x) = 0.5·(σx)² if |x| < 1/σ², and |x| - 0.5/σ² otherwise,
where x is the difference between the predicted text position and the true text position, and σ is an adjustable parameter; during training this parameter is tuned so that the weighted contributions of the classification loss function and the smooth L1 loss function are roughly equal, so that minimizing the loss drives the training of the CTPN model. After the training of the CTPN model is complete, inputting the first picture into the trained CTPN model makes it output a detection result, namely the text portion in the first picture, which can then be filled with blanks to facilitate border detection.
In the method for correcting an invoice image of this embodiment, the CTPN model includes a VGG network, an LSTM network, and a fully connected layer, and the step S21 of inputting the first picture into the preset CTPN model for detection includes:
Step S211: processing the first picture into a black-and-white picture meeting the specified pixel requirement;
Step S212: inputting the black-and-white picture into the VGG network and performing convolution to obtain a plurality of first picture features;
Step S213: performing correlation feature computation on the first picture features through the LSTM network to obtain a plurality of second picture features;
Step S214: combining the plurality of second picture features into a global picture feature through the fully connected layer, and outputting the detection result.
Before the first picture is input into the preset CTPN model, it must be processed into a black-and-white picture meeting the specified pixel requirement; specifically, while keeping the aspect ratio of the first picture unchanged, its largest dimension is first scaled to 256 pixels, yielding the required black-and-white picture.
The black-and-white picture meeting the specified pixel requirement is input into the CTPN model for detection, where the CTPN model specifically includes a VGG network, an LSTM network, and a fully connected layer. The VGG network in the CTPN model performs convolution on the black-and-white picture to obtain the first picture features; the LSTM network then performs correlation feature computation on the first picture features to obtain the second picture features. It should be pointed out that adding the LSTM network lets the CTPN model fully exploit the sequential context of the text within the first picture features and directly predict the position, type, and confidence of the text in one pass, greatly improving both the speed and the accuracy of detecting the text portion in the first picture. Since the second picture features are local picture features, they must be combined into a global picture feature through the fully connected layer, and the detection result is finally obtained from the global picture feature; the corresponding detection results are priorbox, pred, and score, where priorbox indicates the text position, pred indicates the text type, and score indicates the confidence of the text type at a specific position. From these three parameters the text portion of the first picture is obtained.
In the method for correcting an invoice image of this embodiment, the step S3 of detecting the border of the second picture includes:
Step S31: detecting a plurality of black short straight lines in the second picture;
Step S32: determining the direction of each of the short straight lines, and calculating the distances between adjacent short straight lines;
Step S33: assigning short straight lines whose mutual distance is below a preset threshold and whose directions satisfy a preset consistency condition to the same short-straight-line group, obtaining multiple short-straight-line groups;
Step S34: fitting the short straight lines within each short-straight-line group to obtain a corresponding set of long straight lines;
Step S35: classifying the long straight lines by their position in the second picture to obtain multiple azimuth straight-line groups;
Step S36: deleting the long straight lines that do not meet the conditions in each azimuth straight-line group according to a preset rule;
Step S37: calculating the average slope of the remaining long straight lines in each azimuth straight-line group;
Step S38: taking the endpoints of the remaining long straight lines in each azimuth straight-line group, finding among them the endpoint closest to the boundary of the second picture on the side of that group as the designated point, and generating the boundary straight line of the second picture on that side from the average slope of the group's remaining long straight lines and the designated point;
Step S39: generating the border of the second picture from the boundary straight lines and a preset border rule.
In this embodiment, the plurality of black short straight lines is obtained by applying a probabilistic Hough transform to the second picture. Specifically, the second picture is placed in a rectangular coordinate system, and for each point (x_i, y_i), i = 1, 2, ..., n in that system, if they lie on the same straight line, then x_i·cosθ_i + y_i·sinθ_i is the same for all points i = 1, 2, ..., n, and this value is the distance from the origin to the line through all those points, where θ_i denotes the angle between (x_i, y_i) and the positive horizontal axis, i.e. the angular component of the polar representation (θ_i, ρ_i) of (x_i, y_i). Edge points of the second picture are then randomly sampled for detection: if a point has already been marked as lying on a previously detected short straight line, it is skipped; otherwise the collinear points along the direction of the detected line are marked to determine the endpoints of the short straight line, until all edge points of the second picture have been drawn. Compared with the classic Hough transform, which must traverse every edge point of the second picture, the probabilistic Hough transform is faster, and it detects short straight lines that fit the edges of the figure rather than the long lines spanning the whole image produced by the classic Hough transform.
For the plurality of short straight lines detected by the probabilistic Hough transform, their directions must be determined and the distances between adjacent short straight lines calculated. The direction determination uses the absolute value of the difference between the cosine of the angle between two short straight lines and 1 as the measure: a value of 0 indicates the two short straight lines are parallel, and a value of 1 indicates they are perpendicular. Preferably, when the absolute value of the difference between the cosine of the angle and 1 is below 0.1, the directions of the two short straight lines are also deemed consistent. The distance between adjacent short straight lines is calculated by taking the two endpoints of each of the two lines and computing the distance from each endpoint to the other short straight line, giving four endpoint-to-line distances, of which the maximum is selected; when this maximum is below a preset threshold, specifically 15 pixels, the distance between the two short straight lines is very small. Following this method, short straight lines whose mutual distance is below the preset threshold and whose directions satisfy the preset consistency condition are assigned to the same short-straight-line group, yielding multiple short-straight-line groups.
For these groups, all the short straight lines in them must be fitted group by group to obtain the corresponding long straight lines; the fitting uses the least-squares method. It should be noted that for short straight lines close to horizontal, least squares can be used directly; for short straight lines close to vertical, their slope is very large, and using least squares directly would produce a relatively large error, so in that case the x and y coordinates are swapped, least squares is applied, and the coordinates are swapped back after the result is computed.
For the resulting long straight lines, they must all be classified by position to obtain multiple azimuth straight-line groups. The specific grouping is: long straight lines that are horizontal and located in the upper third of the second picture go into the upper group; long straight lines that are horizontal and located in the lower third go into the lower group; long straight lines that are vertical and located in the left third of the second picture go into the left group; and long straight lines that are vertical and located in the right third go into the right group, so that all the long straight lines are classified by position.
For the long straight lines within each of the resulting azimuth straight-line groups, those that do not meet the conditions must be deleted according to the preset rule; the purpose is to delete long straight lines whose direction disagrees with the other long straight lines in their group, excluding lines not produced by the boundary of the second picture.
The average slope of the remaining long straight lines in each azimuth straight-line group is calculated. It should be noted that for long straight lines close to horizontal, the slope can be computed directly; for long straight lines close to vertical, the slope is very large, so in that case the x and y coordinates are swapped before the slope is computed, and the coordinates are swapped back after the result is obtained.
For the remaining long straight lines in each azimuth straight-line group, their endpoints are taken, and the endpoint closest to the boundary of the second picture on the side of that group is found among all endpoints as the designated point; for example, when the group is the upper group, the designated point is the endpoint closest to the upper boundary among all the endpoints. From the average slope of the remaining long straight lines in each group and the designated point, the boundary straight line of the second picture on the corresponding side is generated.
From the obtained boundary straight lines and the preset border rule, the border of the second picture is generated; the preset border rule takes the line segments corresponding to the closed area enclosed by the obtained boundary straight lines as the border of the second picture.
Preferably, in the method for correcting an invoice image of this embodiment, the step S36 of deleting the unqualified long straight lines in each azimuth straight-line group according to a preset rule includes:
when an azimuth straight-line group contains two long straight lines, deleting the shorter one;
when an azimuth straight-line group contains more than two long straight lines, deleting the long straight lines whose direction is inconsistent with the other long straight lines in the group.
In each azimuth straight-line group, when a group contains two long straight lines whose directions are inconsistent, the shorter of the two is deleted. When a group contains more than two long straight lines, a long straight line is deleted if its direction is inconsistent with more than half of the long straight lines in the group. Whether the directions of two long straight lines agree is specifically judged by the absolute value of the difference between the cosine of their angle and 1: a value of 0 indicates the two long straight lines are parallel, and a value of 1 indicates they are perpendicular. When a group contains only one long straight line, that line is taken as the boundary straight line of that azimuth straight-line group.
Preferably, in the method for correcting an invoice image of this embodiment, the azimuth straight-line groups include an upper group, a lower group, a left group, and a right group, and the boundary straight lines include an upper boundary straight line, a lower boundary straight line, a left boundary straight line, and a right boundary straight line; the step S39 of generating the border of the second picture from the boundary straight lines and the preset border rule includes:
obtaining the line segments corresponding to the closed area enclosed by the upper, lower, left, and right boundary straight lines as the border of the second picture.
When the long straight lines are classified by their position in the second picture, multiple azimuth straight-line groups are obtained; when these include an upper group, a lower group, a left group, and a right group, the boundary straight lines include an upper, a lower, a left, and a right boundary straight line, and the line segments corresponding to the closed area enclosed by them are obtained as the border of the second picture. It should be noted that when one side has no azimuth straight-line group, it is checked whether the opposite side has one; if it does, the opposite side's boundary straight line is translated toward this side, stopping at the position where it is about to separate from the endpoint of one of the perpendicular sides' boundary straight lines, and the line translated from the opposite side is then used as this side's boundary straight line for the above operation; if the opposite side likewise has no straight line, the second picture's own boundaries on those two sides are used directly as the border.
In the method for correcting an invoice image of this embodiment, before the step S2 of computing the text portion in the first picture and filling the computed text portion to a blank state to obtain the second picture, the method includes:
Step S201: adjusting the contrast of the first picture.
In step S201, the contrast of the obtained first picture can be adjusted so that the distinction between the black and white portions of the first picture becomes more pronounced. The contrast of the first picture may specifically be adjusted with the contrast-limited adaptive histogram equalization algorithm (the CLAHE algorithm), which adaptively clips the histogram of the image and then uses the clipped histogram to equalize the black and white picture; its advantage is that the regions corresponding to the text portion and the border portion of the first picture are distinguished more clearly from the white regions corresponding to the blank portion and the background.
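A minimal sketch of this adjustment using OpenCV's CLAHE implementation; the clip limit and tile grid size are illustrative assumptions, since the text does not fix them.

```python
import cv2

first_picture = cv2.imread("first_picture.png", cv2.IMREAD_GRAYSCALE)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
# Text and border regions separate more sharply from the white background.
adjusted = clahe.apply(first_picture)
```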
In the method for correcting an invoice image of this embodiment, the step S1 of performing black and white binarization processing on the invoice image to be corrected to obtain the first picture includes:
Step S11: converting the invoice image to be corrected to obtain a grayscale image;
Step S12: performing black and white binarization processing on the grayscale image to obtain the first picture.
Since the color of each pixel in the invoice image to be corrected is determined by the three components R, G, and B, and each component can take 256 values, each pixel has a range of more than 16 million colors. A grayscale image is a special color image in which the three components R, G, and B are equal, so each pixel has a range of only 256 values; converting the invoice image to be corrected into a grayscale image before black and white binarization therefore reduces the subsequent amount of computation. One method of converting the invoice image to be corrected into a grayscale image is to compute the average of the three components R, G, and B of each pixel and then assign this average to the three components of that pixel. Other conversion methods are also possible; for example, in the YUV color space the physical meaning of the Y component is the luminance of a point, and the Y value reflects the brightness level, so from the transformation relationship between the RGB and YUV color spaces the correspondence between the luminance Y and the three color components R, G, and B can be established as Y = 0.3R + 0.59G + 0.11B. Using this luminance value Y as the grayscale value of each pixel of the invoice image to be corrected likewise reduces the subsequent amount of computation.
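Both conversions mentioned above might be sketched as follows:

```python
import numpy as np

def to_gray_average(rgb):
    """rgb: (H, W, 3) uint8 array in R, G, B order; per-pixel RGB average."""
    return rgb.mean(axis=2).astype(np.uint8)

def to_gray_luminance(rgb):
    """Luminance formula Y = 0.3R + 0.59G + 0.11B from the text."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (0.3 * r + 0.59 * g + 0.11 * b).astype(np.uint8)
```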
Once the grayscale image converted from the invoice image to be corrected has been obtained, black and white binarization processing can be performed on it. Specifically, for each pixel P in the grayscale image, a square matrix R of side length 21 pixels centered on point P is selected, the grayscale values of all pixels in the square matrix R are sorted from largest to smallest (from white to black), and the smallest grayscale value T among the largest 20% of the grayscale values of all pixels in the square matrix R is selected as the grayscale threshold; if the grayscale value of point P is lower than the grayscale threshold T, point P is set to black, otherwise point P is set to white. All pixels of the grayscale image are binarized by the above method, so that the text portion and the border portion of the grayscale image become black and the background and blank portions become white, yielding the corresponding first picture. The binarized first picture facilitates detecting the text portion in it as well as detecting the border.
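A reference sketch of this local thresholding rule; the smallest grayscale value among the largest 20% of neighborhood values is approximated here by the 80th percentile, and the per-pixel loop is left unoptimized for clarity.

```python
import numpy as np

def binarize(gray, window=21, top_fraction=0.2):
    """gray: (H, W) uint8 grayscale image; returns a 0/255 binary image."""
    h, w = gray.shape
    pad = window // 2
    padded = np.pad(gray, pad, mode="edge")
    out = np.empty_like(gray)
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + window, x:x + window]
            # Threshold T: smallest value within the top 20% of the patch.
            t = np.percentile(patch, 100 * (1 - top_fraction))
            out[y, x] = 0 if gray[y, x] < t else 255  # black text, white background
    return out
```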
Referring to FIG. 3, the device for correcting an invoice image in this embodiment includes:
a processing unit 10, configured to perform black and white binarization processing on the invoice image to be corrected to obtain a first picture;
a first detection unit 20, configured to detect the text portion in the first picture and fill the detected text portion as a blank image to obtain a second picture;
a second detection unit 30, configured to detect the border of the second picture;
a transformation unit 40, configured to perform a perspective transformation on the area of the invoice image to be corrected located within the border to obtain a corrected invoice picture.
The device for correcting an invoice image in this embodiment first needs to acquire the invoice image to be corrected, which serves as the original image for the unified front view correction processing. The invoice image to be corrected must undergo black and white binarization processing to obtain the corresponding first picture; specifically, the border and text portions of the invoice image to be corrected are turned black, while the other areas outside the border and text portions, such as the background or blank portions, are turned white. After the processing unit 10 has performed black and white binarization processing on the invoice image to be corrected, the text portion of the invoice image to be corrected can be computed more easily.
For the obtained first picture, the first detection unit 20 detects the text portion in the first picture and fills the detected text portion of the first picture to a blank state to obtain the second picture; filling the text portion to a blank state avoids interference when the border of the second picture is detected, thereby improving the accuracy of detecting the border of the invoice image to be corrected.
Before the invoice image to be corrected is corrected, the second detection unit 30 detects the border of the invoice image to be corrected; here, the border of the second picture is the reference border for correcting the invoice image to be corrected, so that unified front view correction processing can be performed on the area within the border.
When the invoice image to be corrected is corrected, the area of the invoice image to be corrected located within the border is obtained according to the detected border position of the second picture, and the transformation unit 40 performs a perspective transformation on this area to obtain the corrected invoice picture. The specific manner of the perspective transformation has already been described in the above method embodiment and is not repeated here.
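As an illustration of the transformation step referred to here, a sketch assuming OpenCV follows; the corner ordering and the output size are assumptions.

```python
import cv2
import numpy as np

def warp_to_front_view(image, corners, out_w=800, out_h=500):
    """corners: four (x, y) border corners ordered TL, TR, BR, BL;
    maps the detected border region to an upright rectangle."""
    src = np.array(corners, dtype=np.float32)
    dst = np.array([(0, 0), (out_w, 0), (out_w, out_h), (0, out_h)],
                   dtype=np.float32)
    matrix = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, matrix, (out_w, out_h))
```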
Referring to FIG. 4, in the device for correcting an invoice image in this embodiment, the first detection unit 20 includes:
a first detection module 21, configured to input the first picture into a preset CTPN model for detection, where the CTPN model is trained with a specified quantity of first pictures with known text portions, together with the text portions marked in those first pictures, as sample data, and is used to detect the text portion in a first picture;
an obtaining module 22, configured to obtain the detection result output by the CTPN model, the detection result being the text portion in the first picture.
For the acquired first picture, the first detection module 21 detects the text portion in the first picture; in this embodiment a CTPN model is used for the detection, where the CTPN model is a trained model. The method for training the CTPN model has already been described in the above method embodiment and is not repeated here.
Referring to FIG. 5, in the device for correcting an invoice image in this embodiment, the CTPN model includes a VGG network, an LSTM network, and a fully connected layer, and the first detection module 21 includes:
a processing submodule 211, configured to process the first picture into a black and white picture meeting a specified pixel requirement;
a first calculation submodule 212, configured to input the black and white picture into the VGG network for convolution calculation to obtain a plurality of first picture features;
a second calculation submodule 213, configured to perform correlation feature calculation on the first picture features through the LSTM network to obtain a plurality of second picture features;
a combination submodule 214, configured to combine the plurality of second picture features through the fully connected layer to form global picture features, thereby outputting the detection result.
Before the first picture is input into the preset CTPN model, the processing submodule 211 processes the first picture into a black and white picture meeting the specified pixel requirement; the specific processing is to adjust the largest dimension of the first picture to 256 pixels while keeping its aspect ratio unchanged, thereby obtaining the black and white picture meeting the specified pixel requirement.
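A one-function sketch of this aspect-preserving resize, assuming OpenCV:

```python
import cv2

def resize_max_dim(image, max_dim=256):
    """Scale so the larger dimension becomes max_dim, keeping the aspect ratio."""
    h, w = image.shape[:2]
    scale = max_dim / float(max(h, w))
    return cv2.resize(image, (int(round(w * scale)), int(round(h * scale))))
```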
The first calculation submodule 212 inputs the black and white picture meeting the specified pixel requirement into the CTPN model for detection, where the CTPN model specifically includes a VGG network, an LSTM network, and a fully connected layer. The VGG network in the CTPN model is used to perform convolution calculation on the black and white picture to obtain first picture features; after the first picture features are obtained through the convolution calculation of the VGG network, the second calculation submodule 213 performs correlation feature calculation on the first picture features through the LSTM network to obtain second picture features. It should be pointed out that adding the LSTM network to the CTPN model allows the CTPN model to make full use of the contextual correlation of the text portion within the first picture features and to directly and synchronously predict the three parameters of the text portion, namely position, type, and confidence, which greatly improves the speed and accuracy of detecting the text portion in the first picture. Since the second picture features are local picture features, the combination submodule 214 combines the second picture features through the fully connected layer to form global picture features, and the detection result is finally obtained from the global picture features. The corresponding detection result consists of three outputs, priorbox, pred, and score, where priorbox represents the text position, pred represents the text type, and score represents the confidence of the text type at a specific position; the text portion in the first picture can be obtained from these three parameters.
Referring to FIG. 6, in the device for correcting an invoice image in this embodiment, the second detection unit 30 includes:
a second detection module 31, configured to detect a plurality of black short straight lines in the second picture;
an execution module 32, configured to determine the direction of each of the plurality of short straight lines and calculate the distance between adjacent short straight lines;
a grouping module 33, configured to assign short straight lines whose distance from adjacent short straight lines is less than a preset threshold and which satisfy a preset direction-consistency condition to the same short straight line group, to obtain a plurality of short straight line groups;
a fitting module 34, configured to fit the short straight lines within each short straight line group to obtain corresponding groups of long straight lines;
a classification module 35, configured to classify the groups of long straight lines according to their positions in the second picture to obtain a plurality of orientation line groups;
a deletion module 36, configured to delete, according to a preset rule, the long straight lines in each orientation line group that do not meet the conditions;
a first calculation module 37, configured to calculate the average slope of the remaining long straight lines in each orientation line group;
a second calculation module 38, configured to select the two endpoints of each remaining long straight line in the orientation line group, find among all the endpoints the endpoint closest to the boundary of the second picture on the side of the corresponding orientation line group as a specified point, and generate a boundary straight line of the second picture on the side of the corresponding orientation line group from the average slope of the remaining long straight lines in each orientation line group and the specified point;
a generation module 39, configured to generate the border of the second picture according to the boundary straight lines and a preset border rule.
The second detection module 31 in this embodiment detects the second picture through a probabilistic Hough transform to obtain a plurality of black short straight lines. Specifically, the probabilistic Hough transform detects black short straight lines as follows: the second picture is placed in a rectangular coordinate system, and for each point (x_i, y_i), i = 1, 2, ..., n, in the rectangular coordinate system, if the points lie on the same straight line, then x_i·cos θ_i + y_i·sin θ_i is identical for all points i = 1, 2, ..., n, and this value is the distance from the origin to the straight line on which all these points lie, where θ_i denotes the angle between (x_i, y_i) and the positive horizontal axis, that is, the angular component of the polar representation (θ_i, ρ_i) of (x_i, y_i). Edge points in the second picture are then sampled at random for detection: if a sampled point has already been marked as a point on a previously detected short straight line, it is skipped; otherwise collinear points are marked along the direction of the detected straight line to determine the endpoints of the short straight line, until all edge points in the second picture have been sampled. Compared with the classical Hough transform, which must traverse all edge points of the second picture, the probabilistic Hough transform is faster, and it detects short straight lines that fit the edges of the figure closely rather than the long lines spanning the entire image produced by the classical Hough transform.
For the plurality of short straight lines detected by the probabilistic Hough transform, the execution module 32 determines the direction of each short straight line and calculates the distance between adjacent short straight lines. The direction of two short straight lines is judged by taking the absolute value of the difference between the cosine of the angle between them and 1 as the measure: when this absolute value is 0, the two short straight lines are parallel; when it is 1, they are perpendicular. Preferably, when this absolute value is below 0.1, the directions of the two short straight lines are also judged to be consistent. The distance between adjacent short straight lines is calculated by taking the two endpoints of each of the two short straight lines, computing the distance from each endpoint to the other short straight line, and selecting the maximum of the four resulting endpoint-to-line distance values; when this maximum is less than a preset threshold, specifically less than 15 pixels, the two short straight lines are considered very close. According to the above method, the grouping module 33 assigns short straight lines whose distance to adjacent short straight lines is less than the preset threshold and which satisfy the preset direction-consistency condition to the same short straight line group, obtaining a plurality of short straight line groups.
For the above groups of short straight lines, the fitting module 34 fits all the short straight lines group by group to obtain the corresponding groups of long straight lines, using the least squares method. It should be noted that for nearly horizontal short straight lines the least squares method can be applied directly, whereas for nearly vertical short straight lines the slope is very large and applying the least squares method directly produces a relatively large error; in this case the x and y coordinates must be swapped before the least squares method is applied, and after the result is computed the coordinates are swapped back.
For the resulting groups of long straight lines, the classification module 35 classifies all the long straight lines by position to obtain a plurality of orientation line groups. The specific grouping is: horizontal long straight lines located in the top third of the second picture are assigned to the top group; horizontal long straight lines located in the bottom third of the second picture are assigned to the bottom group; vertical long straight lines located in the left third of the second picture are assigned to the left group; and vertical long straight lines located in the right third of the second picture are assigned to the right group, thereby classifying all the long straight lines by position.
For the long straight lines within each of the resulting orientation line groups, the deletion module 36 deletes, according to a preset rule, the long straight lines that do not meet the conditions; the purpose is to delete long straight lines whose direction is inconsistent with the other long straight lines in the group, thereby excluding straight lines that were not produced by the boundary of the second picture.
The first calculation module 37 calculates the average slope of the remaining long straight lines in each orientation line group. It should be noted that for nearly horizontal long straight lines the slope can be calculated directly, whereas for nearly vertical long straight lines the slope is very large; in this case the x and y coordinates must be swapped before the slope is calculated, and after the result is computed the coordinates are swapped back.
For the remaining long straight lines in an orientation line group, the two endpoints of each remaining long straight line are selected, and among all the endpoints the one closest to the boundary of the second picture on the side of that orientation line group is taken as the specified point; for example, when the orientation line group is the top group, the specified point is the endpoint among all the endpoints closest to the upper boundary of the second picture. The second calculation module 38 then generates the boundary straight line of the second picture on the side of the corresponding orientation line group from the average slope of the remaining long straight lines in that group and the specified point.
The generation module 39 generates the border of the second picture according to the boundary straight lines thus obtained and the preset border rule, where the preset border rule is to take the line segments corresponding to the closed region enclosed by the obtained boundary straight lines as the border of the second picture.
Preferably, in the device for correcting an invoice image in this embodiment, the deletion module 36 is configured to delete the shorter long straight line when an orientation line group includes two long straight lines, and to delete the long straight lines whose direction is inconsistent with the other long straight lines in the group when an orientation line group includes more than two long straight lines.
In each orientation line group, when the group includes two long straight lines and their directions are inconsistent, the deletion module 36 deletes the shorter of the two. When the group includes more than two long straight lines, the deletion module 36 deletes a long straight line if its direction is inconsistent with more than half of the long straight lines in the group. Whether the directions of two long straight lines are consistent is specifically judged by taking the absolute value of the difference between the cosine of the angle between them and 1 as the measure: when this absolute value is 0, the two long straight lines are parallel; when it is 1, they are perpendicular. When a group contains only one long straight line, that long straight line is taken as the boundary straight line of the orientation line group.
Preferably, in the device for correcting an invoice image in this embodiment, when the groups of long straight lines are classified according to their positions in the second picture, a plurality of orientation line groups are obtained. When the orientation line groups include a top group, a bottom group, a left group, and a right group, the boundary straight lines include a top boundary straight line, a bottom boundary straight line, a left boundary straight line, and a right boundary straight line, and the generation module 39 obtains the line segments corresponding to the closed region enclosed by these four boundary straight lines as the border of the second picture. It should be noted that when no orientation line group exists on one side, whether an orientation line group exists on the opposite side is determined: if one exists, the boundary straight line of the opposite side is translated toward this side until it reaches the position at which it is about to separate from an endpoint of one of the boundary straight lines on a perpendicular side, and this translated boundary straight line is then used as the boundary straight line of this side in the above operation; if no straight line exists on the opposite side either, the boundaries of the second picture itself on the two sides are used directly as the border.
Referring to FIG. 7, the device for correcting an invoice image in another embodiment further includes:
an adjustment unit 201, configured to adjust the contrast of the first picture.
For the obtained first picture, the adjustment unit 201 can adjust the contrast of the first picture so that the distinction between the black and white portions of the first picture becomes more pronounced. The contrast of the first picture may specifically be adjusted with the contrast-limited adaptive histogram equalization algorithm (the CLAHE algorithm), which adaptively clips the histogram of the image and then uses the clipped histogram to equalize the black and white picture; its advantage is that the regions corresponding to the text portion and the border portion of the first picture are distinguished more clearly from the white regions corresponding to the blank portion and the background.
Referring to FIG. 8, in the device for correcting an invoice image in this embodiment, the processing unit 10 includes:
a conversion module 11, configured to convert the invoice image to be corrected to obtain a grayscale image;
a processing module 12, configured to perform black and white binarization processing on the grayscale image to obtain the first picture.
Since the color of each pixel in the invoice image to be corrected is determined by the three components R, G, and B, and each component can take 256 values, each pixel has a range of more than 16 million colors. A grayscale image is a special color image in which the three components R, G, and B are equal, so each pixel has a range of only 256 values; before black and white binarization, the conversion module 11 therefore converts the invoice image to be corrected into a grayscale image, which reduces the subsequent amount of computation. One method of converting the invoice image to be corrected into a grayscale image is to compute the average of the three components R, G, and B of each pixel and then assign this average to the three components of that pixel. Other conversion methods are also possible; for example, in the YUV color space the physical meaning of the Y component is the luminance of a point, and the Y value reflects the brightness level, so from the transformation relationship between the RGB and YUV color spaces the correspondence between the luminance Y and the three color components R, G, and B can be established as Y = 0.3R + 0.59G + 0.11B. Using this luminance value Y as the grayscale value of each pixel of the invoice image to be corrected likewise reduces the subsequent amount of computation.
Once the grayscale image converted from the invoice image to be corrected has been obtained, the processing module 12 performs black and white binarization processing on it. Specifically, for each pixel P in the grayscale image, a square matrix R of side length 21 pixels centered on point P is selected, the grayscale values of all pixels in the square matrix R are sorted from largest to smallest (from white to black), and the smallest grayscale value T among the largest 20% of the grayscale values of all pixels in the square matrix R is selected as the grayscale threshold; if the grayscale value of point P is lower than the grayscale threshold T, point P is set to black, otherwise point P is set to white. All pixels of the grayscale image are binarized by the above method, so that the text portion and the border portion of the grayscale image become black and the background and blank portions become white, yielding the corresponding first picture. The binarized first picture facilitates detecting the text portion in it as well as detecting the border.
Referring to FIG. 9, an embodiment of the present invention further provides a computer device, which may be a server whose internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus, where the processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system, computer-readable instructions, and a database, and the internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device is used to store data such as that of the preset method for correcting an invoice image, and the network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer-readable instructions are executed by the processor, the processes of the above method embodiments are implemented.
An embodiment of the present invention further provides a computer non-volatile readable storage medium on which computer-readable instructions are stored; when the computer-readable instructions are executed by a processor, the processes of the above method embodiments are implemented.
The above are only preferred embodiments of the present invention and do not thereby limit the patent scope of the present invention. Any equivalent structural or process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present invention.

Claims (20)

  1. A method for correcting an invoice image, comprising:
    performing black and white binarization processing on an invoice image to be corrected to obtain a first picture;
    detecting a text portion in the first picture, and filling the detected text portion as a blank image to obtain a second picture;
    detecting a border of the second picture;
    performing a perspective transformation on an area of the invoice image to be corrected located within the border to obtain a corrected invoice picture.
  2. The method for correcting an invoice image according to claim 1, wherein the step of detecting the text portion in the first picture comprises:
    inputting the first picture into a preset CTPN model for detection, wherein the CTPN model is trained with a specified quantity of first pictures with known text portions, together with the text portions marked in those first pictures, as sample data, and is used to detect the text portion in a first picture;
    obtaining a detection result output by the CTPN model, the detection result being the text portion in the first picture.
  3. The method for correcting an invoice image according to claim 2, wherein the CTPN model comprises a VGG network, an LSTM network, and a fully connected layer, and the step of inputting the first picture into the preset CTPN model for detection comprises:
    processing the first picture into a black and white picture meeting a specified pixel requirement;
    inputting the black and white picture into the VGG network for convolution calculation to obtain a plurality of first picture features;
    performing correlation feature calculation on the first picture features through the LSTM network to obtain a plurality of second picture features;
    combining the plurality of second picture features through the fully connected layer to form global picture features, thereby outputting a detection result.
  4. The method for correcting an invoice image according to claim 1, wherein the step of detecting the border of the second picture comprises:
    detecting a plurality of black short straight lines in the second picture;
    determining the direction of each of the plurality of short straight lines, and calculating the distance between adjacent short straight lines;
    assigning short straight lines whose distance from adjacent short straight lines is less than a preset threshold and which satisfy a preset direction-consistency condition to the same short straight line group, to obtain a plurality of short straight line groups;
    fitting the short straight lines within each short straight line group to obtain corresponding groups of long straight lines;
    classifying the groups of long straight lines according to their positions in the second picture, to obtain a plurality of orientation line groups;
    deleting, according to a preset rule, the long straight lines in each orientation line group that do not meet the conditions;
    calculating the average slope of the remaining long straight lines in each orientation line group;
    selecting the two endpoints of each remaining long straight line in the orientation line group, finding among all the endpoints the endpoint closest to the boundary of the second picture on the side of the corresponding orientation line group as a specified point, and generating a boundary straight line of the second picture on the side of the corresponding orientation line group from the average slope of the remaining long straight lines in each orientation line group and the specified point;
    generating the border of the second picture according to the boundary straight lines and a preset border rule.
  5. The method for correcting an invoice image according to claim 4, wherein the step of deleting, according to the preset rule, the long straight lines in each orientation line group that do not meet the conditions comprises:
    when an orientation line group includes two long straight lines, deleting the shorter long straight line;
    when an orientation line group includes more than two long straight lines, deleting the long straight lines whose direction is inconsistent with the other long straight lines in the group.
  6. The method for correcting an invoice image according to claim 4, wherein the orientation line groups comprise a top group, a bottom group, a left group, and a right group, and the boundary straight lines comprise a top boundary straight line, a bottom boundary straight line, a left boundary straight line, and a right boundary straight line; the step of generating the border of the second picture according to the boundary straight lines and the preset border rule comprises:
    obtaining line segments corresponding to a closed region enclosed by the top boundary straight line, the bottom boundary straight line, the left boundary straight line, and the right boundary straight line as the border of the second picture.
  7. The method for correcting an invoice image according to claim 1, wherein the step of performing black and white binarization processing on the invoice image to be corrected to obtain the first picture comprises:
    converting the invoice image to be corrected to obtain a grayscale image;
    performing black and white binarization processing on the grayscale image to obtain the first picture.
  8. A device for correcting an invoice image, comprising:
    a processing unit, configured to perform black and white binarization processing on an invoice image to be corrected to obtain a first picture;
    a first detection unit, configured to detect a text portion in the first picture and fill the detected text portion as a blank image to obtain a second picture;
    a second detection unit, configured to detect a border of the second picture;
    a transformation unit, configured to perform a perspective transformation on an area of the invoice image to be corrected located within the border to obtain a corrected invoice picture.
  9. The device for correcting an invoice image according to claim 8, wherein the first detection unit comprises:
    a first detection module, configured to input the first picture into a preset CTPN model for detection, wherein the CTPN model is trained with a specified quantity of first pictures with known text portions, together with the text portions marked in those first pictures, as sample data, and is used to detect the text portion in a first picture;
    an obtaining module, configured to obtain a detection result output by the CTPN model, the detection result being the text portion in the first picture.
  10. The device for correcting an invoice image according to claim 9, wherein the CTPN model comprises a VGG network, an LSTM network, and a fully connected layer, and the first detection module comprises:
    a processing submodule, configured to process the first picture into a black and white picture meeting a specified pixel requirement;
    a first calculation submodule, configured to input the black and white picture into the VGG network for convolution calculation to obtain a plurality of first picture features;
    a second calculation submodule, configured to perform correlation feature calculation on the first picture features through the LSTM network to obtain a plurality of second picture features;
    a combination submodule, configured to combine the plurality of second picture features through the fully connected layer to form global picture features, thereby outputting a detection result.
  11. The device for correcting an invoice image according to claim 8, wherein the second detection unit comprises:
    a second detection module, configured to detect a plurality of black short straight lines in the second picture;
    an execution module, configured to determine the direction of each of the plurality of short straight lines and calculate the distance between adjacent short straight lines;
    a grouping module, configured to assign short straight lines whose distance from adjacent short straight lines is less than a preset threshold and which satisfy a preset direction-consistency condition to the same short straight line group, to obtain a plurality of short straight line groups;
    a fitting module, configured to fit the short straight lines within each short straight line group to obtain corresponding groups of long straight lines;
    a classification module, configured to classify the groups of long straight lines according to their positions in the second picture to obtain a plurality of orientation line groups;
    a deletion module, configured to delete, according to a preset rule, the long straight lines in each orientation line group that do not meet the conditions;
    a first calculation module, configured to calculate the average slope of the remaining long straight lines in each orientation line group;
    a second calculation module, configured to select the two endpoints of each remaining long straight line in the orientation line group, find among all the endpoints the endpoint closest to the boundary of the second picture on the side of the corresponding orientation line group as a specified point, and generate a boundary straight line of the second picture on the side of the corresponding orientation line group from the average slope of the remaining long straight lines in each orientation line group and the specified point;
    a generation module, configured to generate the border of the second picture according to the boundary straight lines and a preset border rule.
  12. The device for correcting an invoice image according to claim 11, wherein the deletion module is specifically configured to delete the shorter long straight line when an orientation line group includes two long straight lines, and to delete the long straight lines whose direction is inconsistent with the other long straight lines in the group when an orientation line group includes more than two long straight lines.
  13. The device for correcting an invoice image according to claim 11, wherein the orientation line groups comprise a top group, a bottom group, a left group, and a right group, and the boundary straight lines comprise a top boundary straight line, a bottom boundary straight line, a left boundary straight line, and a right boundary straight line; the generation module is specifically configured to obtain line segments corresponding to a closed region enclosed by the top boundary straight line, the bottom boundary straight line, the left boundary straight line, and the right boundary straight line as the border of the second picture.
  14. The device for correcting an invoice image according to claim 8, wherein the step of performing black and white binarization processing on the invoice image to be corrected to obtain the first picture comprises:
    converting the invoice image to be corrected to obtain a grayscale image;
    performing black and white binarization processing on the grayscale image to obtain the first picture.
  15. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements a method for correcting an invoice image, the method comprising:
    performing black and white binarization processing on an invoice image to be corrected to obtain a first picture;
    detecting a text portion in the first picture, and filling the detected text portion as a blank image to obtain a second picture;
    detecting a border of the second picture;
    performing a perspective transformation on an area of the invoice image to be corrected located within the border to obtain a corrected invoice picture.
  16. The computer device according to claim 15, wherein the step of detecting the text portion in the first picture comprises:
    inputting the first picture into a preset CTPN model for detection, wherein the CTPN model is trained with a specified quantity of first pictures with known text portions, together with the text portions marked in those first pictures, as sample data, and is used to detect the text portion in a first picture;
    obtaining a detection result output by the CTPN model, the detection result being the text portion in the first picture.
  17. The computer device according to claim 16, wherein the CTPN model comprises a VGG network, an LSTM network, and a fully connected layer, and the step of inputting the first picture into the preset CTPN model for detection comprises:
    processing the first picture into a black and white picture meeting a specified pixel requirement;
    inputting the black and white picture into the VGG network for convolution calculation to obtain a plurality of first picture features;
    performing correlation feature calculation on the first picture features through the LSTM network to obtain a plurality of second picture features;
    combining the plurality of second picture features through the fully connected layer to form global picture features, thereby outputting a detection result.
  18. A computer non-volatile readable storage medium on which computer-readable instructions are stored, wherein the computer-readable instructions, when executed by a processor, implement a method for correcting an invoice image, the method comprising:
    performing black and white binarization processing on an invoice image to be corrected to obtain a first picture;
    detecting a text portion in the first picture, and filling the detected text portion as a blank image to obtain a second picture;
    detecting a border of the second picture;
    performing a perspective transformation on an area of the invoice image to be corrected located within the border to obtain a corrected invoice picture.
  19. The computer non-volatile readable storage medium according to claim 18, wherein the step of detecting the text portion in the first picture comprises:
    inputting the first picture into a preset CTPN model for detection, wherein the CTPN model is trained with a specified quantity of first pictures with known text portions, together with the text portions marked in those first pictures, as sample data, and is used to detect the text portion in a first picture;
    obtaining a detection result output by the CTPN model, the detection result being the text portion in the first picture.
  20. The computer non-volatile readable storage medium according to claim 19, wherein the CTPN model comprises a VGG network, an LSTM network, and a fully connected layer, and the step of inputting the first picture into the preset CTPN model for detection comprises:
    processing the first picture into a black and white picture meeting a specified pixel requirement;
    inputting the black and white picture into the VGG network for convolution calculation to obtain a plurality of first picture features;
    performing correlation feature calculation on the first picture features through the LSTM network to obtain a plurality of second picture features;
    combining the plurality of second picture features through the fully connected layer to form global picture features, thereby outputting a detection result.
PCT/CN2018/095484 2018-06-01 2018-07-12 Method and apparatus for correcting an invoice image, computer device and storage medium WO2019227615A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810557203.9A CN108960062A (zh) 2018-06-01 2018-06-01 Method and apparatus for correcting an invoice image, computer device and storage medium
CN201810557203.9 2018-06-01

Publications (1)

Publication Number Publication Date
WO2019227615A1 true WO2019227615A1 (zh) 2019-12-05

Family

Family ID: 64492481

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/095484 WO2019227615A1 (zh) 2018-06-01 2018-07-12 Method and apparatus for correcting an invoice image, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN108960062A (zh)
WO (1) WO2019227615A1 (zh)


Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815954A (zh) * 2019-01-31 2019-05-28 科大讯飞股份有限公司 Direction correction method, apparatus, device and storage medium for value-added tax invoice images
CN110415183A (zh) * 2019-06-18 2019-11-05 平安科技(深圳)有限公司 Picture correction method, apparatus, computer device and computer-readable storage medium
CN111738254A (zh) * 2019-10-12 2020-10-02 贵州电网有限责任公司 Automatic recognition method for relay protection device panels and screen content
CN110674889B (zh) * 2019-10-15 2021-03-30 贵州电网有限责任公司 Image training method for fault recognition of electricity meter terminals
CN111259177B (zh) * 2020-01-10 2023-07-18 深圳盒子信息科技有限公司 Storage method and system for black and white binary signature pictures
CN111444912A (zh) * 2020-01-14 2020-07-24 国网电子商务有限公司 Bill image text recognition method and apparatus
CN111369554A (zh) * 2020-03-18 2020-07-03 山西安数智能科技有限公司 Optimization and preprocessing method for belt damage samples in low-brightness multi-angle environments
CN111784587B (zh) * 2020-06-30 2023-08-01 杭州师范大学 Invoice photo position correction method based on a deep learning network
CN113220859B (zh) * 2021-06-01 2024-05-10 平安科技(深圳)有限公司 Image-based question answering method, apparatus, computer device and storage medium
CN117333374B (zh) * 2023-10-26 2024-09-03 深圳市海恒智能股份有限公司 Book spine image correction method based on image line segment information


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473763B (zh) * 2013-08-31 2017-06-20 哈尔滨理工大学 Road edge detection method based on a heuristic probabilistic Hough transform

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8995770B2 (en) * 2011-07-11 2015-03-31 Brigham Young University Word warping for offline handwriting recognition
CN108022243A (zh) * 2017-11-23 2018-05-11 浙江清华长三角研究院 Deep-learning-based method for detecting paper in images
CN107862303A (zh) * 2017-11-30 2018-03-30 平安科技(深圳)有限公司 Information recognition method for table-type images, electronic apparatus and readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Document Image Calibration Recovery Algorithm Based on Hough Line Detection and Two- dimensional Perspective Transformation", ELECTRONIC MEASUREMENT TECHNOLOGY, vol. 40, no. 9, 30 September 2017 (2017-09-30), pages 129, ISSN: 1002-7300 *
TIAN, WENLI: "Document Image Calibration Recovery Algorithm Based on Hough Line Detection and Two- dimensional Perspective Transformation", ELECTRONIC MEASUREMENT TECHNOLOGY, vol. 40, no. 9, 30 September 2017 (2017-09-30), pages 129, ISSN: 1002-7300 *
WANG, YAJUN: "Chinese Character Detection and Time and Space Distribution Analysis of Street View Images in Several Capital Cities in Southeast Asia", BASIC SCIENCES, CHINA MASTER'S THESES FULL-TEXT DATABASE, 15 August 2017 (2017-08-15), ISSN: 1674-0246 *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310746A (zh) * 2020-01-15 2020-06-19 支付宝实验室(新加坡)有限公司 Text line detection method, model training method, apparatus, server and medium
CN111310746B (zh) * 2020-01-15 2024-03-01 支付宝实验室(新加坡)有限公司 Text line detection method, model training method, apparatus, server and medium
CN111695559B (zh) * 2020-04-28 2023-07-18 深圳市跨越新科技有限公司 Waybill picture information masking method and system based on the YoloV3 model
CN111695559A (zh) * 2020-04-28 2020-09-22 深圳市跨越新科技有限公司 Waybill picture information masking method and system based on the YoloV3 model
CN111695558A (zh) * 2020-04-28 2020-09-22 深圳市跨越新科技有限公司 Logistics waybill picture straightening method and system based on the YoloV3 model
CN111695558B (zh) * 2020-04-28 2023-08-04 深圳市跨越新科技有限公司 Logistics waybill picture straightening method and system based on the YoloV3 model
CN111753830A (zh) * 2020-06-22 2020-10-09 作业不凡(北京)教育科技有限公司 Homework image correction method and computing device
CN111860608A (zh) * 2020-06-28 2020-10-30 浙江大华技术股份有限公司 Invoice image registration method, device and computer storage medium
CN111899270A (zh) * 2020-07-30 2020-11-06 平安科技(深圳)有限公司 Card border detection method, apparatus, device and readable storage medium
CN111899270B (zh) * 2020-07-30 2023-09-05 平安科技(深圳)有限公司 Card border detection method, apparatus, device and readable storage medium
CN111862082A (zh) * 2020-07-31 2020-10-30 成都盛锴科技有限公司 Train brake pad thickness rechecking method and system
CN112052853B (zh) * 2020-09-09 2024-02-02 国家气象信息中心 Deep-learning-based text positioning method for handwritten meteorological archive materials
CN112052853A (zh) * 2020-09-09 2020-12-08 国家气象信息中心 Deep-learning-based text positioning method for handwritten meteorological archive materials
CN112529014A (zh) * 2020-12-14 2021-03-19 中国平安人寿保险股份有限公司 Straight line detection method, information extraction method, apparatus, device and storage medium
CN112529014B (zh) * 2020-12-14 2023-09-26 中国平安人寿保险股份有限公司 Straight line detection method, information extraction method, apparatus, device and storage medium
CN112529989A (zh) * 2020-12-19 2021-03-19 杭州东信北邮信息技术有限公司 Picture reconstruction method based on bill templates
CN112633275A (zh) * 2020-12-22 2021-04-09 航天信息股份有限公司 Deep-learning-based correction method and system for images of multiple mixed bills
CN112633275B (zh) * 2020-12-22 2023-07-18 航天信息股份有限公司 Deep-learning-based correction method and system for images of multiple mixed bills
CN112800797A (zh) * 2020-12-30 2021-05-14 凌云光技术股份有限公司 Region positioning method and system for DM codes
CN112800797B (zh) * 2020-12-30 2023-12-19 凌云光技术股份有限公司 Region positioning method and system for DM codes
CN116311333A (zh) * 2023-02-21 2023-06-23 南京云阶电力科技有限公司 Preprocessing method and system for recognizing fine text at edges in electrical drawings
CN116311333B (zh) * 2023-02-21 2023-12-01 南京云阶电力科技有限公司 Preprocessing method and system for recognizing fine text at edges in electrical drawings

Also Published As

Publication number Publication date
CN108960062A (zh) 2018-12-07

Similar Documents

Publication Publication Date Title
WO2019227615A1 (zh) Method and apparatus for correcting an invoice image, computer device and storage medium
CN112348815B (zh) 图像处理方法、图像处理装置以及非瞬时性存储介质
US10803554B2 (en) Image processing method and device
US11790499B2 (en) Certificate image extraction method and terminal device
CN106682629B (zh) 一种复杂背景下身份证号识别算法
WO2020228187A1 (zh) 边缘检测方法、装置、电子设备和计算机可读存储介质
CN110400278B (zh) 一种图像颜色和几何畸变的全自动校正方法、装置及设备
CN111353961B (zh) 一种文档曲面校正方法及装置
CN111160291B (zh) 基于深度信息与cnn的人眼检测方法
WO2023024766A1 (zh) 物体尺寸识别方法、可读存储介质及物体尺寸识别系统
US9087272B2 (en) Optical match character classification
US20180253852A1 (en) Method and device for locating image edge in natural background
CN110135446B (zh) 文本检测方法及计算机存储介质
CN112990183B (zh) 离线手写汉字同名笔画提取方法、系统、装置
WO2022116104A1 (zh) 图像处理方法、装置、设备及存储介质
CN115082450A (zh) 基于深度学习网络的路面裂缝检测方法和系统
CN113538291B (zh) 卡证图像倾斜校正方法、装置、计算机设备和存储介质
JP2017500662A (ja) 投影ひずみを補正するための方法及びシステム
CN117115358B (zh) 数字人自动建模方法及装置
WO2024041318A1 (zh) 图像集的生成方法、装置、设备和计算机可读存储介质
WO2023155298A1 (zh) 数据增强处理方法、装置、计算机设备及存储介质
WO2023273158A1 (zh) 车路协同中相机作用距离确定方法、装置和路侧设备
CN111260623A (zh) 图片评价方法、装置、设备及存储介质
WO2021098861A1 (zh) 识别文本的方法、装置、识别设备和存储介质
CN111325670A (zh) 一种数据增强方法、装置及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18920289

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 25.03.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18920289

Country of ref document: EP

Kind code of ref document: A1