Detailed Description
The technical problems solved, the technical solutions adopted, and the technical effects achieved by the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely some, and not all, of the embodiments of the present application. All other equivalent or obviously modified embodiments that a person skilled in the art can obtain from the embodiments in this application without inventive effort fall within the scope of protection of the invention. The embodiments of the invention can be embodied in many different ways as defined and covered by the claims.
It should be noted that in the following description, numerous specific details are set forth in order to provide an understanding. It may be evident, however, that the subject invention may be practiced without these specific details.
It should be noted that, unless explicitly defined or conflicting, the embodiments and technical features in the present invention may be combined with each other to form a technical solution.
In order to effectively correct a tilted text image, an embodiment of the present invention provides a text image tilt correction method. As shown in fig. 1, the method may include at least:
S100: acquiring a text image to be corrected.
S110: performing grayscale processing on the text image to be corrected to obtain a grayscale image.
S120: performing binarization processing on the grayscale image to obtain a binary image.
S130: extracting straight lines in the binary image by Hough transform.
S140: filtering the straight lines according to their lengths and inclination angles.
S150: determining, from the filtered straight lines, the median of the inclination angles as the inclination angle of the text image to be corrected.
S160: rotating the text image to be corrected according to its inclination angle.
In the embodiment of the invention, the text image to be corrected is grayed and binarized, and straight lines are extracted from the binary image by the Hough transform. The extracted straight lines are then filtered according to their lengths and inclination angles, the median inclination angle of the filtered straight lines is used as the estimate of the image inclination, and the image is rotated accordingly, so that text images with various inclinations can be corrected effectively.
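As an illustrative sketch of the graying step S110, a weighted luminance conversion can be used. The specific weights (0.299/0.587/0.114, the common ITU-R BT.601 convention) are an assumption for illustration; the text does not prescribe a particular graying formula.

```python
def to_gray(rgb):
    """Convert an RGB image (nested lists of (R, G, B) tuples) into a
    grayscale matrix. The 0.299/0.587/0.114 luminance weights are an
    assumed convention, not prescribed by the text."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]
```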
Specifically, step S120 may further include:
S122: extracting an image grayscale matrix from the grayscale image.
S124: calculating an image local contrast matrix from the image grayscale matrix.
S126: performing binary division on the image local contrast matrix by the Otsu method to obtain a binary image.
The local contrast of the text image can be calculated according to the following formula:
Con(i, j) = α·C(i, j) + (1 − α)·(I_max(i, j) − I_min(i, j))
where Con(i, j) denotes the local contrast value at (i, j); i denotes the abscissa of the pixel point; j denotes the ordinate of the pixel point; I(i, j) denotes the gray value at position (i, j); I_max(i, j) and I_min(i, j) respectively denote the maximum and minimum gray values in a local neighborhood centered at (i, j); α ∈ (0, 1) is an adjustable parameter; and C(i, j) denotes the normalized local contrast, whose denominator contains an infinitesimal quantity ε that prevents the denominator from being 0.
In the local contrast calculation formula, the parameter α can be calculated according to the following formula:
where pow(x1, x2) denotes an exponential function in which x1 is the base and x2 is the exponent; Var denotes the standard deviation of the whole image; and the value of γ is 1.
Illustratively, the width of the local filter window may be chosen as 5 when computing the image local contrast matrix.
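The local contrast computation described above can be sketched as follows. The exact form of C(i, j), namely (I_max − I_min)/(I_max + I_min + ε), and the form of α, namely pow(Var/128, γ), are assumptions consistent with the definitions in the text rather than formulas quoted from it.

```python
def local_contrast(gray, win=5, gamma=1.0, eps=1e-9):
    """Compute Con(i, j) = alpha*C(i, j) + (1 - alpha)*(Imax - Imin).
    C(i, j) = (Imax - Imin) / (Imax + Imin + eps) and
    alpha = (Var / 128) ** gamma are assumed forms, not quoted ones."""
    h, w = len(gray), len(gray[0])
    n = h * w
    # Var: standard deviation of the whole image
    mean = sum(p for row in gray for p in row) / n
    var = (sum((p - mean) ** 2 for row in gray for p in row) / n) ** 0.5
    alpha = (var / 128.0) ** gamma
    r2 = win // 2
    con = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # local neighborhood centered at (i, j), clipped at the borders
            patch = [gray[a][b]
                     for a in range(max(0, i - r2), min(h, i + r2 + 1))
                     for b in range(max(0, j - r2), min(w, j + r2 + 1))]
            imax, imin = max(patch), min(patch)
            c = (imax - imin) / (imax + imin + eps)
            con[i][j] = alpha * c + (1 - alpha) * (imax - imin)
    return con
```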
Since the image local contrast matrix is obtained by filtering the image grayscale matrix, the influence of uneven illumination can be effectively eliminated, and the contrast and binary separability of the image are improved.
In an optional embodiment, when the step S126 is specifically implemented, the method may specifically include:
S1261: acquiring the maximum and minimum contrast values in the image local contrast matrix.
S1262: setting the number of histogram bins and dividing the interval between the maximum and minimum contrast values equally by that number, so that the local contrast value of each pixel falls into a corresponding bin, thereby constructing a histogram.
S1263: selecting any point in the histogram, dividing the histogram into two parts at that point, and calculating the intra-class variance and inter-class variance of the two parts.
S1264: selecting, as the optimal binary segmentation threshold point, the point in the histogram at which the inter-class variance divided by the intra-class variance is maximum.
S1265: dividing the image local contrast matrix into a first binary matrix according to the optimal binary segmentation threshold point.
S1266: performing edge detection on the grayscale image with a Canny operator to determine an edge matrix.
S1267: taking the intersection of the first binary matrix and the edge matrix to determine a third binary matrix.
S1268: determining the binary image from the third binary matrix.
In the first binary matrix, 0 represents a background point, and 1 represents a character point.
When the Canny operator is used to detect edges in the original grayscale image, its threshold parameters can be lowered so that more edges are detected than actually exist. The edge matrix is also a binary matrix, in which 0 denotes a background point and 1 denotes a character edge point.
Taking the intersection of the first binary matrix and the edge matrix means setting to 1 the points at positions where both binary matrices are 1, and setting the remaining points to 0, thereby obtaining the third binary matrix.
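The intersection described above amounts to an elementwise logical AND of the two binary matrices; a minimal sketch:

```python
def intersect(b1, b2):
    """Elementwise AND of two equally sized binary matrices:
    1 only where both matrices are 1, 0 everywhere else."""
    return [[a & b for a, b in zip(r1, r2)] for r1, r2 in zip(b1, b2)]
```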
The process of obtaining the optimal binary segmentation threshold point is described below by way of a preferred embodiment.
Set the number of histogram bins to bin = 1000, and obtain the maximum value Con_max and the minimum value Con_min in the image local contrast matrix.
Then, the interval between Con_max and Con_min is divided equally into bin sub-intervals, so that each point of the image local contrast matrix falls into the corresponding sub-interval according to its value.
All candidate thresholds can be derived according to the following formula:

thres(i) = Con_min + i·(Con_max − Con_min)/bin

where i = 1, 2, …, bin.
After a threshold thres(i) is selected, the histogram can be divided into two parts, and the intra-class variance σ_w²(i) and the inter-class variance σ_b²(i) of the two parts can be calculated respectively.
Finally, the optimal binary segmentation threshold point thres* is determined according to the following formula:

thres* = argmax_i σ_b²(i) / σ_w²(i)
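Steps S1261 to S1264 can be sketched as follows. The candidate thresholds and the variance-ratio criterion follow the description above, while the precise definitions of the intra-class and inter-class variances are standard Otsu-style assumptions:

```python
def best_split_threshold(values, bins=1000, eps=1e-12):
    """Pick the candidate threshold thres(i) = cmin + i*(cmax - cmin)/bins
    that maximizes inter-class variance / intra-class variance.
    The weighted-variance definitions below are assumed (standard Otsu)."""
    cmin, cmax = min(values), max(values)
    best, best_t = -1.0, cmin
    for i in range(1, bins):
        t = cmin + i * (cmax - cmin) / bins
        lo = [v for v in values if v <= t]
        hi = [v for v in values if v > t]
        if not lo or not hi:
            continue  # skip splits that leave one class empty
        w0, w1 = len(lo) / len(values), len(hi) / len(values)
        m0, m1 = sum(lo) / len(lo), sum(hi) / len(hi)
        v0 = sum((v - m0) ** 2 for v in lo) / len(lo)
        v1 = sum((v - m1) ** 2 for v in hi) / len(hi)
        between = w0 * w1 * (m0 - m1) ** 2       # inter-class variance
        within = w0 * v0 + w1 * v1               # intra-class variance
        ratio = between / (within + eps)
        if ratio > best:
            best, best_t = ratio, t
    return best_t
```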
The process of obtaining the third binary matrix is described below in a preferred embodiment.
Let B1 denote the first binary matrix and B2 denote the edge matrix.
B2 can be obtained as follows: edges are extracted from the grayscale matrix of the text image with the Canny operator, with the low threshold set to 50 and the high threshold set to 150; these two thresholds ensure that most true edges are extracted. The extracted B2 is still a binary matrix.
Then, B = B1 ∩ B2 is computed; that is, a point at which both B1 and B2 take the value 1 is set to 1 (a character edge point), and the remaining points are set to 0 (background points).
It will be appreciated by persons skilled in the art that the above description is made only by way of example and not as a limitation on the scope of the invention.
It should be noted that the above-mentioned method for acquiring a binary image is only an example, and any other existing or hereafter-appearing method for acquiring a binary image may be applied to the present invention, and should be included in the scope of the present invention and incorporated herein by reference.
According to the embodiment of the invention, the local contrast matrix is divided into two values by using the Otsu method, so that the accuracy of binarization can be effectively improved.
In step S130, a straight line in the binary image is extracted by hough transform.
In the binary image, a background point is represented by 0 and a character point by 1, so only the character points need to be converted to polar coordinates. A threshold is set (for example, 10); each point in the Hough space whose accumulated value is greater than the threshold is taken as a straight line in the image space and mapped back to the corresponding position in the image space, thereby completing the Hough transform.
Each character point in the binary matrix is converted into a curve in the polar coordinate system; the curves of all character points lying on one straight line in the image-space coordinate system intersect at one point in the polar coordinate system. A point in the polar coordinate system whose value exceeds a certain threshold is then marked as a straight line at the corresponding position in the image-space coordinate system. Finally, for each straight line in the image-space coordinate system, the corresponding line segment can be obtained from the information of the character points on the straight line.
Specifically, the equation of a straight line in the planar rectangular coordinate system, y = kx + b, can be expressed through the Hough transform as r = x·cos θ + y·sin θ, where r denotes the distance from the straight line to the origin and θ denotes the angle of the line's normal with the x axis. Thus, for any point with coordinates (x0, y0), all straight lines passing through that point satisfy r(θ) = x0·cos θ + y0·sin θ. When detecting straight lines by the Hough transform, the set of straight lines in (r, θ) form passing through each point in the image is determined (i.e., for each θ sampled at fixed angular intervals, the corresponding r value is computed); the coordinates of the set of lines through each point form one curve in the Hough space, and an intersection of multiple curves in the Hough space represents the straight line formed by the corresponding points. In general, the intersection at which the most curves meet represents the detected straight line.
The process of extracting the straight lines in the binary image is described in detail below by way of a preferred embodiment.
Suppose B contains an edge point b1 whose coordinates in the spatial coordinate system (the x–y coordinate system) are (x1, y1). The straight lines passing through this point are expressed by the parametric equation p = x1·cos θ + y1·sin θ.
The point b1 in the spatial coordinate system is thus mapped to the curve p = x1·cos θ + y1·sin θ in the polar coordinate system (the p–θ coordinate system).
Similarly, a point (p1, θ1) in the polar coordinate system is mapped to the straight line p1 = x·cos θ1 + y·sin θ1 in the spatial coordinate system.
All edge points b_i (i = 1, 2, …, n) in B are mapped to curves in the polar coordinate system, and the values at the corresponding positions in the polar coordinate system are accumulated; that is, the value at every point a curve passes through is accumulated, with an accumulation unit of 1.
The curves to which the edge points on one straight line are mapped intersect at one point in the polar coordinate system. Therefore, if the value at a certain point (p1, θ1) in the polar coordinate system is m, there are m edge points on the straight line p1 = x·cos θ1 + y·sin θ1 in the spatial coordinate system.
With the threshold set to 50, if m > 50, then p1 = x·cos θ1 + y·sin θ1 is taken as a straight line in the image space.
All lines in the image are counted in the above manner.
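The accumulation procedure above can be sketched as a vote over character points only. The θ sampling grid and the integer quantization of p (here called rho) are implementation choices, not values fixed by the text:

```python
import math

def hough_lines(binary, thresh=50, n_theta=180):
    """Accumulator-based Hough transform over character points only.
    Each character point votes for every (rho, theta) cell it lies on;
    cells whose vote count exceeds `thresh` are reported as lines.
    The 1-degree theta grid and integer rho bins are assumptions."""
    h, w = len(binary), len(binary[0])
    thetas = [math.pi * k / n_theta for k in range(n_theta)]
    acc = {}
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1:
                continue  # only character points are counted
            for k, th in enumerate(thetas):
                rho = int(round(x * math.cos(th) + y * math.sin(th)))
                acc[(rho, k)] = acc.get((rho, k), 0) + 1
    # cells with more than `thresh` votes correspond to image-space lines
    return [(rho, thetas[k]) for (rho, k), m in acc.items() if m > thresh]
```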
According to the embodiment of the invention, when Hough transform is performed on the binary image, only character points are counted, so that the accuracy of line detection can be effectively improved, and the time complexity of an algorithm is reduced.
On the basis of obtaining the straight line through hough transform, step S140 may specifically include:
and S142, determining a line segment by counting the character points on each straight line, wherein the coordinate of the character point with the minimum abscissa is taken as the initial coordinate of the line segment, and the coordinate of the character point with the maximum abscissa is taken as the final coordinate of the line segment.
S144: and judging whether the distance between adjacent character points on the line segment is greater than a preset threshold value.
S146: and if the distance between the adjacent character points on the line segment is greater than a preset distance threshold, dividing the line segment into two line segments from the two character points.
A character point is simply a pixel point. The preset distance threshold may be 10.
After the line segments are obtained, the line segments are also screened. In particular, the screening may be performed in the following manner.
In a first implementation, it is determined whether the length of the line segment is less than a first threshold. If so, the line segment is deleted.
In a second implementation, it is determined whether the number of character points included in the line segment is less than a second threshold. If so, the line segment is deleted.
In a third implementation, the inclination angle of each line segment is calculated and a neighborhood of that angle is determined. It is then judged whether the inclination angle of any other line segment falls within the neighborhood; if not, the line segment is deleted.
Illustratively, if the length of a line segment is less than 70, the segment is deleted. If the number of character points contained in a line segment is less than 20, the segment is deleted. If the inclination angle of a line segment is θ, the neighborhood of θ may be set to [θ − 5, θ + 5]; if no other line segment's inclination angle falls within [θ − 5, θ + 5], the segment is deleted.
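The three screening rules above can be sketched as follows; the segment representation (a dict with length, character-point count, and inclination angle in degrees) is an assumption for illustration:

```python
def filter_segments(segs, min_len=70, min_pts=20, ang_tol=5.0):
    """Screen line segments: drop segments shorter than min_len, with
    fewer than min_pts character points, or whose inclination angle has
    no other segment's angle within +/- ang_tol degrees.
    Each segment is a dict {'length', 'n_points', 'angle'} (assumed)."""
    kept = [s for s in segs
            if s['length'] >= min_len and s['n_points'] >= min_pts]
    out = []
    for s in kept:
        # keep only segments with at least one angular neighbor
        if any(o is not s and abs(o['angle'] - s['angle']) <= ang_tol
               for o in kept):
            out.append(s)
    return out
```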
After the line segments are screened, they can be sorted by inclination angle, and the median inclination angle is selected as the inclination angle of the text image to be corrected.
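Selecting the median inclination angle (step S150) can be sketched as:

```python
def median_tilt_angle(angles):
    """Return the median of the filtered segments' inclination angles,
    used as the estimate of the document skew. For an even count the
    mean of the two middle values is used (a common convention)."""
    s = sorted(angles)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2.0
```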
In an optional embodiment, the rotating the text image to be corrected according to the inclination angle of the text image to be corrected may specifically include:
and judging whether the inclination angle of the text image to be corrected is larger than a preset angle threshold, and if so, rotating the text image to be corrected. Otherwise, the text image to be corrected is not rotated.
Preferably, the preset angle threshold may be 2 degrees.
In some optional implementations of the embodiment, it is determined whether the tilt angle of the text image to be corrected is less than 2 degrees. If so, the text image to be corrected is not rotated.
In some optional implementations of the present embodiment, it is determined whether the tilt angle of the text image to be corrected is between 85 degrees and 95 degrees. If so, the text image to be corrected is rotated twice.
An inclination angle between 85 and 95 degrees indicates that the text image was shot vertically; in this case it cannot be determined whether the text should be rotated to the left or to the right, so the image may be rotated in either direction. Therefore, a vertically shot image is rotated twice, once forward and once in reverse, according to its inclination angle.
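A sketch of the rotation decision: below the preset angle threshold no rotation is performed, and in the 85-95 degree band both rotation directions are produced, reflecting the two rotations described above. Returning the candidate angles as a list is an illustrative choice, not the text's exact mechanism:

```python
def rotation_angles(tilt, min_deg=2.0):
    """Decide which correction rotation(s) to apply for a given tilt
    angle (degrees). Below min_deg: none. Between 85 and 95 degrees
    (a vertically shot page): rotate both ways. Otherwise: one rotation."""
    if abs(tilt) < min_deg:
        return []                 # nearly upright: leave the image alone
    if 85.0 <= abs(tilt) <= 95.0:
        return [tilt, -tilt]      # vertical shot: forward and reverse
    return [tilt]
```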
Although the steps in this embodiment are described in the foregoing sequence, those skilled in the art will understand that, in order to achieve the effect of this embodiment, different steps need not be executed in such a sequence, and may be executed simultaneously or in a reverse sequence, and these simple changes are within the protection scope of the present invention.
Based on the same technical concept as the method embodiment, the embodiment of the invention also provides a text image inclination correction system. As shown in fig. 2, the text image inclination correction system 20 includes at least: an acquisition unit 21, a graying unit 22, a binarization unit 23, an extraction unit 24, a filtering unit 25, a determination unit 26, and a rotation unit 27. Wherein the acquiring unit 21 is configured to acquire a text image to be corrected. The graying unit 22 is configured to perform graying processing on the text image to be corrected, resulting in a grayscale image. The binarization unit 23 is configured to perform binarization processing on the grayscale image, resulting in a binary image. The extraction unit 24 is configured to extract a straight line in the binary image by hough transform. The filtering unit 25 is configured to filter the straight line according to the length and the inclination angle of the straight line. The determination unit 26 is configured to determine, for the filtered straight line, the median of the tilt angles as the tilt angle of the text image to be corrected. The rotation unit 27 is configured to rotate the text image to be corrected in accordance with the inclination angle of the text image to be corrected.
In some optional implementations of the embodiment of the present invention, the binarization unit may specifically include: an extraction module, a first calculation module, and a first partitioning module. The extraction module is configured to extract an image grayscale matrix from the grayscale image. The first calculation module is configured to calculate an image local contrast matrix from the image grayscale matrix. The first partitioning module is configured to perform binary partitioning on the image local contrast matrix by the Otsu method to obtain a binary image.
In some optional implementations of the embodiment of the present invention, the first partitioning module specifically includes: an acquisition module, a construction module, a second calculation module, a selection module, a second partitioning module, a first determination module, a second determination module, and a third determination module. The acquisition module is configured to acquire the maximum and minimum contrast values in the image local contrast matrix. The construction module is configured to set the number of histogram bins and to divide the interval between the maximum and minimum contrast values equally by that number, so that the local contrast value of each pixel falls into the corresponding bin, thereby constructing a histogram. The second calculation module is configured to select any point in the histogram, divide the histogram into two parts at that point, and calculate the intra-class variance and inter-class variance of the two parts. The selection module is configured to select, as the optimal binary segmentation threshold point, the point in the histogram at which the inter-class variance divided by the intra-class variance is maximum. The second partitioning module is configured to partition the image local contrast matrix into a first binary matrix according to the optimal binary segmentation threshold point. The first determination module is configured to perform edge detection on the grayscale image with a Canny operator to determine an edge matrix. The second determination module is configured to determine a third binary matrix by taking the intersection of the first binary matrix and the edge matrix. The third determination module is configured to determine the binary image from the third binary matrix.
In some optional implementations of the embodiment of the present invention, the filtering unit specifically includes: a fourth determining module, a first judging module, a segmenting module, and a screening module. The screening module comprises a second judging module and a first deleting module; or the screening module comprises a third judging module and a second deleting module; or the screening module comprises a third calculating module, a fifth determining module, a fourth judging module, and a third deleting module. The fourth determining module is configured to determine a line segment by counting the character points on a straight line, where the coordinates of the character point with the smallest abscissa are taken as the start coordinates of the line segment and the coordinates of the character point with the largest abscissa are taken as the end coordinates. The first judging module is configured to judge whether the distance between adjacent character points on the line segment is greater than a preset distance threshold. The segmenting module is configured to split a line segment into two line segments at two adjacent character points if the distance between those character points is greater than the preset distance threshold. The second judging module is configured to judge whether the length of a line segment is less than a first threshold. The first deleting module is configured to delete a line segment if its length is less than the first threshold. Alternatively, the third judging module is configured to judge whether the number of character points contained in a line segment is less than a second threshold. The second deleting module is configured to delete a line segment if the number of character points it contains is less than the second threshold.
Alternatively, the third calculating module is configured to calculate the inclination angle of each line segment. The fifth determining module is configured to determine a neighborhood of the segment's inclination angle. The fourth judging module is configured to judge whether the inclination angle values of other line segments fall within the neighborhood. The third deleting module is configured to delete the line segment if no other segment's inclination angle falls within the neighborhood.
In some optional implementations of the embodiment of the present invention, the rotation unit specifically includes: a fifth judging module and a rotating module. Wherein the fifth judging module is configured to judge whether the inclination angle of the text image to be corrected is greater than an angle preset threshold. The rotation module is configured to rotate the text image to be corrected if the inclination angle of the text image to be corrected is greater than an angle preset threshold.
It should be noted that: in the text image tilt correction system provided in the above embodiment, only the division of the above functional modules is taken as an example when performing text image correction, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the system is divided into different functional modules to complete all or part of the above described functions.
The above system embodiment may be used to implement the above method embodiment, and the technical principle, the technical problems solved, and the technical effects are similar, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the above described system may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
It should be noted that the system embodiment and the method embodiment of the present invention have been described above separately, but the details described for one embodiment may also be applied to another embodiment. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention. Those skilled in the art will understand that: modules or steps in embodiments of the present invention may also be broken down or combined. For example, the modules of the above embodiments may be combined into one module, or may be further split into multiple sub-modules.
The technical solutions provided by the embodiments of the present invention are described in detail above. Although specific examples have been employed herein to illustrate the principles and practice of the invention, the foregoing descriptions of embodiments are merely provided to assist in understanding the principles of embodiments of the invention; also, it will be apparent to those skilled in the art that variations may be made in the embodiments and applications of the invention without departing from the spirit and scope of the invention.
It should be noted that the flowcharts or block diagrams referred to herein are not limited to the forms shown herein, and may be divided and/or combined.
It should be noted that: the numerals and text in the figures are only used to illustrate the invention more clearly and are not to be considered as an undue limitation of the scope of the invention.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus/device.
Depending on the context, the term "if" can be interpreted to mean "when" or "in response to determining".
The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. The terms "first", "second", and the like are used to distinguish names and do not denote any particular order.
The various steps of the present invention may be implemented in a general purpose computing device, for example, they may be centralized on a single computing device, such as: personal computers, server computers, hand-held or portable devices, tablet-type devices or multi-processor apparatus, which may be distributed over a network of computing devices, may perform the steps shown or described in a different order than those shown or described herein, or may be implemented as separate integrated circuit modules, or may be implemented as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific hardware or software or combination thereof.
The methods provided by the present invention may be implemented using programmable logic devices or as computer program software or program modules (including routines, programs, objects, components, data structures, etc.) including performing particular tasks or implementing particular abstract data types, such as a computer program product which is executed to cause a computer to perform the methods described herein. The computer program product includes a computer-readable storage medium having computer program logic or code portions embodied in the medium for performing the method. The computer-readable storage medium may be a built-in medium installed in the computer or a removable medium detachable from the computer main body (e.g., a storage device using a hot-plug technology). The built-in medium includes, but is not limited to, rewritable non-volatile memory such as: RAM, ROM, flash memory, and hard disk. The removable media include, but are not limited to: optical storage media (e.g., CD-ROMs and DVDs), magneto-optical storage media (e.g., MOs), magnetic storage media (e.g., magnetic tapes or removable disks), media with built-in rewritable non-volatile memory (e.g., memory cards), and media with built-in ROMs (e.g., ROM cartridges).
The present invention is not limited to the above-described embodiments, and any variations, modifications, or alterations that may occur to one skilled in the art without departing from the spirit of the invention fall within the scope of the invention.