CN117351495B - Text image correction method, device, chip and terminal - Google Patents
- Publication number
- CN117351495B (application number CN202311225097.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- text image
- effective
- text
- block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/146—Aligning or centring of the image pick-up or image-field
- G06V30/1463—Orientation detection or correction, e.g. rotation of multiples of 90 degrees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/146—Aligning or centring of the image pick-up or image-field
- G06V30/1475—Inclination or skew detection or correction of characters or of image to be recognised
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Input (AREA)
Abstract
The embodiment of the invention discloses a text image correction method, device, chip and terminal. A text image to be corrected is preprocessed, divided into blocks and screened to obtain effective image blocks; the effective image blocks that contain picture elements are then removed to obtain effective text image blocks, and the tilt state of the effective text image blocks is determined. When they are tilted, the effective text image blocks are merged to obtain a target block, the inclination angle of the merged target block is calculated based on the Radon transform, and the text image to be corrected is rotationally corrected according to that angle. Because whether the text image is tilted is determined from the segmented image blocks rather than by judging the whole text image directly, tilt misjudgment is avoided and the correction efficiency of the text image is improved; in addition, the screening and removal operations performed after segmentation, together with the Radon-based calculation of the inclination angle, further ensure the accuracy of tilt angle detection.
Description
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to a text image correction method, apparatus, chip and storage medium.
Background
Electronic documents occupy little storage and are easy to search, manage and process, so paperless office work and mobile electronic products are increasingly favored by office workers and the general public. A paper document can usually be converted into an electronic document automatically and quickly by a scanner or other imaging tools and output on a computer. The electronic images obtained by scanning fall into two main types: plain text images and complex text images.
The prior art corrects the skew of plain text images well, but complex text images contain not only text paragraphs but also various kinds of pictures and charts, so their skew detection still suffers from large errors, low efficiency and similar problems.
Disclosure of Invention
Based on the above, the invention provides a text image correction method, device, chip and storage medium, which can solve the technical problems in the prior art of large errors and low efficiency in detecting the tilt angle of a text image.
In a first aspect, there is provided a text image correction method, including:
preprocessing a text image to be corrected;
performing block processing on the preprocessed text image to obtain at least one image block;
screening, from the at least one image block, effective image blocks containing effective pixel points;
removing the effective image blocks that contain picture elements from the effective image blocks to obtain effective text image blocks;
determining the tilt state of the effective text image blocks;
if the effective text image blocks are in a tilted state, merging the effective text image blocks to obtain a target block;
calculating the inclination angle of the merged target block based on the Radon transform;
And carrying out rotation correction on the text image to be corrected according to the inclination angle.
Optionally, determining the tilt state of the effective text image blocks includes:
selecting a target block to collect preset characteristic points;
performing straight line fitting on the acquired characteristic points to obtain a fitted straight line equation;
determining whether the target block is tilted based on the slope of the straight line equation.
Optionally, merging the effective text image blocks to obtain a target block includes:
determining a rectangle with a preset length and width;
and filling each effective text image block into the rectangle to obtain the target block.
Optionally, removing the valid image block including the picture element from the valid image block to obtain a valid text image block, including:
performing Radon transform calculation on the effective image blocks to obtain the inclination angle of each effective image block;
Respectively calculating the mean value and the variance of the inclination angles of the effective image blocks;
and removing the effective image blocks with larger variances from the effective image blocks to obtain effective text image blocks.
Optionally, preprocessing the text image to be corrected, including:
setting a factor K, wherein K represents the proportion of a text part to the text image, and the text image comprises the text part and an image part;
dividing the text image into a first class and a second class according to a threshold t, wherein the gray-scale range of the first class is 0 to t and the gray-scale range of the second class is t+1 to L−1, the number of gray levels of the text image being denoted L;
calculating the sum of the intra-class variances of the first class and the second class, σw²(t), and the inter-class variance, σB²(t);
determining the threshold Th under the maximum between-class variance criterion of the Otsu method as the value that makes the inter-class variance σB²(t) as large as possible while making the intra-class variance σw²(t) as small as possible, wherein μ0(t) and μ1(t) respectively denote the gray-level means of the first class and the second class of the text image, and σ0²(t) and σ1²(t) respectively denote the intra-class variances of the first class and the second class;
traversing the gray levels of the text image and determining the value of t for which the inter-class variance is as large as possible and the intra-class variance is as small as possible;
and performing binarization segmentation on the text image with the threshold t as the segmentation threshold of the Otsu method.
Optionally, preprocessing the text image to be corrected, further including: graying the text image;
The block processing is performed on the preprocessed text image to obtain at least one image block, which comprises the following steps:
And dividing the text image into a preset number of image blocks on average according to the image size of the text image.
Optionally, calculating the inclination angle of the merged target block based on the Radon transform includes:
letting f(x, y) be the two-dimensional function of the target block, the Radon transform formula being:
R(ρ, θ) = ∫∫ f(x, y) δ(ρ − x·cosθ − y·sinθ) dx dy
wherein δ(·) is the Dirac delta function, ρ is the distance from the integration line to the origin, and θ is the angle between the normal of the line and the x-axis;
determining the accumulation point (ρ, θ) of the peak point in the space after the Radon transform;
and determining θ of the accumulation point (ρ, θ) as the inclination angle of the text image.
In a second aspect, the present invention provides a text image correction apparatus comprising:
the preprocessing module is used for preprocessing the text image to be corrected;
the block module is used for carrying out block processing on the preprocessed text image to obtain at least one image block;
The screening module is used for screening effective image blocks of the effective pixel points from at least one image block;
the image removing module is used for removing the effective image blocks containing the image elements from the effective image blocks to obtain effective text image blocks;
The inclination judging module is used for determining the inclination state of the effective text image;
The block merging module is used for merging the effective text images to obtain target blocks if the effective text images are in an inclined state;
The inclination angle calculation module is used for calculating the inclination angle of the merged target block based on the Radon transform;
And the rotation correction module is used for carrying out rotation correction on the text image to be corrected according to the inclination angle.
In a third aspect, a chip is provided comprising a first processor for calling and running a computer program from a first memory, such that a device on which the chip is mounted performs the steps of the text image correction method as described above.
In a fourth aspect, there is provided a terminal comprising a second memory, a second processor and a computer program stored in said second memory and executable on said second processor, the second processor implementing the steps of the text image correction method as described above when said computer program is executed.
According to the text image correction method, device, chip and storage medium, the text image to be corrected is preprocessed; the preprocessed text image is divided into blocks to obtain at least one image block; effective image blocks containing effective pixel points are screened out of the at least one image block; the effective image blocks containing picture elements are removed from the effective image blocks to obtain effective text image blocks; the tilt state of the effective text image blocks is determined; if the effective text image blocks are in a tilted state, they are merged to obtain a target block; the inclination angle of the merged target block is calculated based on the Radon transform; and the text image to be corrected is rotationally corrected according to the inclination angle. Because whether the text image is tilted is determined from the segmented image blocks rather than by judging the whole text image directly, tilt misjudgment is avoided and the correction efficiency of the text image is improved; in addition, the screening and removal operations after segmentation and the Radon-based calculation of the inclination angle further ensure the accuracy of tilt angle detection.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a basic flow diagram of a text image correction method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a text image according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an image partition according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a merged image partition according to an embodiment of the present invention;
FIG. 5 is a diagram of the Radon transform relationship provided in an embodiment of the present invention;
FIG. 6 is a basic block diagram of a text image correcting apparatus according to an embodiment of the present invention;
Fig. 7 is a basic structural block diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In order to enable those skilled in the art to better understand the present invention, the following description will make clear and complete descriptions of the technical solutions according to the embodiments of the present invention with reference to the accompanying drawings.
Some of the flows described in the specification, claims and preceding figures of the present invention include a plurality of operations occurring in a particular order; however, these operations may be performed out of the described order or in parallel. Sequence numbers such as 101 and 102 are merely used to distinguish the operations and do not by themselves represent any execution order. The flows may also include more or fewer operations, which may be performed sequentially or in parallel. It should be noted that the terms "first" and "second" herein are used to distinguish different messages, devices, modules and the like; they do not indicate a sequence, nor do they require that the "first" and "second" items be of different types.
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by a person skilled in the art without any inventive effort, are intended to be within the scope of the present invention based on the embodiments of the present invention.
The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is a theory, method, technology and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
Referring specifically to fig. 1, fig. 1 is a basic flow chart of a text image correction method according to the present embodiment.
As shown in fig. 1, a text image correction method includes:
s11, preprocessing the text image to be corrected.
A digital image (here, the text image) is an image stored in digital form and processed by a series of computer operations to achieve the intended purpose. To improve the quality of the originally acquired image and to better identify the targets in it, the original image needs to be preprocessed in advance, mainly by graying, binarization and the like.
In general, the original skewed image contains noise; processing a noisy image directly would affect the detection result, so the noise needs to be removed in this preceding stage.
The acquired image is usually in color. The color seen by the human eye is generally composed of the three primary components R (red), G (green) and B (blue). Each component ranges from 0 to 255, where 0 means the component is absent and 255 means it is present at its maximum; that is, 0 corresponds to black and 255 to white, and values in between show different shades of gray, brighter for larger values and darker for smaller ones. When the three component values are unequal the pixel appears colored; otherwise the image is a gray-scale image. Since each pixel of a color image stores every component in one byte (8 bits), which undoubtedly increases storage, and since the color information is not needed in the actual processing of the digital image, processing the color image directly would clearly reduce the efficiency of the system. The gray values therefore need to be further reduced, on the basis of the gray-scale image, to only the two values 0 and 255, so that the whole image shows only the effect of these two colors.
The preprocessing mentioned in step S11 includes graying the text image first and binarizing based on the grayed text image. The purpose of binarization is to separate text information from a complex background, and the text information can be independently presented so as to facilitate later processing. The most common method for binarizing an image is a thresholding method, where a threshold value is set for clearly separating a foreground and a background.
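For reference, the graying-plus-thresholding preprocessing described above can be sketched in a few lines with OpenCV. This shows only the standard Otsu baseline (the improved, K-weighted variant of this application is described next), and the function name binarize is illustrative rather than taken from the patent.

```python
import cv2

def binarize(image_bgr):
    """Gray the input and binarize it with the standard Otsu threshold."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # With THRESH_OTSU the supplied threshold value (0) is ignored and chosen automatically.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```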
The application provides an improved Otsu algorithm (the maximum between-class variance method). The basic idea of the Otsu algorithm is as follows: let the number of gray levels of the text image be L, the total number of pixels be N, and n_i be the number of pixels with gray value i; the probability of each gray level is then
p_i = n_i / N
The image is divided into two classes according to a threshold t: a first class C0 and a second class C1, where the gray-scale range of C0 is 0 to t and the gray-scale range of C1 is t+1 to L−1, C1 representing the background part with the larger gray values. The proportions of the first class C0 and the second class C1 are
w0(t) = Σ_{i=0}^{t} p_i,  w1(t) = Σ_{i=t+1}^{L−1} p_i = 1 − w0(t)
where w0(t) and w1(t) denote the class proportions of the first class and the second class respectively.
The gray means of the first class C0 and the second class C1 are
μ0(t) = Σ_{i=0}^{t} i·p_i / w0(t),  μ1(t) = Σ_{i=t+1}^{L−1} i·p_i / w1(t)
The total gray mean of the image is
μ = w0(t)·μ0(t) + w1(t)·μ1(t)
The intra-class variances of the first class C0 and the second class C1 are
σ0²(t) = Σ_{i=0}^{t} (i − μ0(t))²·p_i / w0(t),  σ1²(t) = Σ_{i=t+1}^{L−1} (i − μ1(t))²·p_i / w1(t)
The sum of the intra-class variances of the two classes is
σw²(t) = w0(t)·σ0²(t) + w1(t)·σ1²(t)
and the inter-class variance is
σB²(t) = w0(t)·(μ0(t) − μ)² + w1(t)·(μ1(t) − μ)² = w0(t)·w1(t)·(μ0(t) − μ1(t))²
The threshold selection formula Th of the improved Otsu algorithm chooses the threshold that makes the inter-class variance σB²(t) as large as possible while making the factor-K-weighted intra-class variance σw²(t) as small as possible. The gray levels of the text image are traversed, the value of t for which the inter-class variance is largest and the intra-class variance is smallest is determined, and the text image is binarized with this threshold t as the segmentation threshold of the Otsu method.
The improved Otsu method sets a factor K according to the proportion of the target in the image, K being the proportion of the target to the whole image, and uses it to correct the class with the larger variance. The essence of the method is to adjust the weights in the threshold calculation formula, which avoids the problem that when one class has a much larger variance the threshold is biased toward that class and becomes too high, so that part of the background is mistakenly classified as target and the segmentation result is poor. The improved algorithm considers the inter-class variance and the intra-class variance at the same time: the larger the inter-class variance, the better the target is separated from the background; the smaller the intra-class variance, the better the cohesion of the two classes after segmentation. Considering both at once, a larger inter-class variance together with a smaller intra-class variance yields a better segmentation result.
The K value is determined as follows: an initial binary image is obtained with the standard Otsu method, and the numbers of foreground and background pixels in this binary image, as well as the total number of pixels, are counted. Let the number of foreground pixels be n_f and the number of background pixels be n_b; the proportion K of the target to the image is then
K = n_f / (n_f + n_b)
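A minimal sketch of the threshold search described above is given below. Since the exact weighted formula of the improved criterion is not reproduced here, the score used in the second pass (between-class variance divided by a K-weighted within-class variance) is an assumption, and the helper name improved_otsu_threshold is likewise illustrative.

```python
import numpy as np

def improved_otsu_threshold(gray, L=256):
    """Two-pass, K-weighted Otsu-style threshold search (sketch)."""
    hist = np.bincount(gray.ravel(), minlength=L).astype(np.float64)
    p = hist / hist.sum()
    i = np.arange(L, dtype=np.float64)

    # First pass: standard Otsu, used only to estimate K (foreground proportion).
    best_t0, best_b = 0, -1.0
    for t in range(1, L - 1):
        w0, w1 = p[:t + 1].sum(), p[t + 1:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (i[:t + 1] * p[:t + 1]).sum() / w0
        mu1 = (i[t + 1:] * p[t + 1:]).sum() / w1
        sigma_b = w0 * w1 * (mu0 - mu1) ** 2
        if sigma_b > best_b:
            best_b, best_t0 = sigma_b, t
    K = p[:best_t0 + 1].sum()  # proportion of (dark) foreground pixels

    # Second pass: favour large between-class and small K-weighted within-class variance.
    best_t, best_score = best_t0, -np.inf
    for t in range(1, L - 1):
        w0, w1 = p[:t + 1].sum(), p[t + 1:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (i[:t + 1] * p[:t + 1]).sum() / w0
        mu1 = (i[t + 1:] * p[t + 1:]).sum() / w1
        var0 = ((i[:t + 1] - mu0) ** 2 * p[:t + 1]).sum() / w0
        var1 = ((i[t + 1:] - mu1) ** 2 * p[t + 1:]).sum() / w1
        sigma_b = w0 * w1 * (mu0 - mu1) ** 2
        sigma_w = K * var0 + (1 - K) * var1  # assumed form of the K weighting
        score = sigma_b / (sigma_w + 1e-9)
        if score > best_score:
            best_score, best_t = score, t
    return best_t
```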
s12, performing block processing on the preprocessed text image to obtain at least one image block;
The text image is divided evenly into a preset number of image blocks according to its size. The more blocks there are, the more independent image blocks exist and the less the pictures interfere with the result; however, for processing efficiency the number of blocks should not be too large. In some embodiments an 8×8 scheme of 64 image blocks may be used, i.e. an even blocking method that divides the image into 64 blocks according to its size. The text image and the segmented result are shown in fig. 2 and fig. 3.
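Assuming the page has already been binarized, the even 8×8 blocking of step S12 might look like the following sketch; keeping each block's origin is a convenience for the later merging step and is not mandated by the text.

```python
import numpy as np

def split_into_blocks(binary, rows=8, cols=8):
    """Split the binarized page into a rows x cols grid of sub-blocks.
    Remainder pixels at the right/bottom edges are left out, matching the
    behaviour described above."""
    h, w = binary.shape
    bh, bw = h // rows, w // cols
    blocks = []
    for r in range(rows):
        for c in range(cols):
            block = binary[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            blocks.append(((r * bh, c * bw), block))  # keep the origin for later merging
    return blocks
```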
S13, screening effective image blocks of effective pixel points from at least one image block;
However, among the 64 sub-blocks, some are purely black background after binarization and some contain only a few effective text or picture pixels. Applying the Radon transform to such sub-blocks yields no bright spot while increasing the amount of computation and hurting processing efficiency; these sub-blocks therefore do not need the Radon transform, i.e. they need no further processing.
To avoid affecting the precision of the tilt angle, the screening threshold is set low in some examples: image blocks whose proportion of effective pixels is below 2% generally have no influence on the tilt-angle precision and can safely be excluded.
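The screening of step S13 then reduces to a ratio test per block; the 2% threshold follows the example above and is exposed here as a parameter.

```python
def screen_valid_blocks(blocks, min_ratio=0.02):
    """Keep only blocks whose foreground (non-zero) pixel share reaches min_ratio."""
    valid = []
    for origin, block in blocks:
        ratio = (block > 0).mean()  # share of effective (foreground) pixels
        if ratio >= min_ratio:
            valid.append((origin, block))
    return valid
```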
S14, removing the effective image blocks containing the picture elements from the effective image blocks to obtain effective text image blocks;
After the image blocks with few effective pixels have been removed, the Radon transform is applied to the retained image blocks to obtain their inclination angles, the mean and variance of these angles are calculated, and the blocks that deviate most from the mean (i.e. that contribute the larger variance) are removed. Variance reflects how strongly a variable fluctuates: the larger the variance, the larger the fluctuation range and the larger the deviation from the mean. Non-conforming image blocks can therefore be eliminated by this variance criterion, because most of the retained blocks contain text, whose Radon-derived inclination angles fall within a certain range; blocks whose angles deviate strongly from the mean contain more picture elements and are exactly the interfering blocks that need to be removed.
To exclude as many picture-containing blocks as possible, a single variance-based removal is often not sufficient; in general the removal is performed a second time so that the required image blocks are retained more accurately. The Radon transform itself is described below.
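A sketch of step S14 under these assumptions is shown below: skimage.transform.radon supplies the Radon transform, each block's dominant angle is taken as the θ of its brightest Radon-space point, and the blocks whose angles deviate most from the mean are dropped over two passes. The fraction dropped per pass (25%) is an assumption, as the text only states that the removal is repeated.

```python
import numpy as np
from skimage.transform import radon

def block_angle(block, angles=np.arange(0.0, 180.0, 1.0)):
    """Dominant direction of a block: angle of the brightest point in Radon space."""
    sinogram = radon(block.astype(float), theta=angles, circle=False)
    _, theta_idx = np.unravel_index(np.argmax(sinogram), sinogram.shape)
    return angles[theta_idx]

def drop_picture_blocks(blocks, passes=2):
    """Discard the blocks whose angle deviates most from the mean, in two passes."""
    kept = list(blocks)  # items are (origin, block) pairs from split_into_blocks
    for _ in range(passes):
        if len(kept) < 4:
            break
        angles = np.array([block_angle(b) for _, b in kept])
        deviation = np.abs(angles - angles.mean())
        order = np.argsort(deviation)              # smallest deviation first
        keep_n = max(3, int(len(kept) * 0.75))     # assumed keep fraction per pass
        kept = [kept[i] for i in order[:keep_n]]
    return kept
```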
S15, determining the inclination state of the effective text image;
One block can be selected from the remaining effective text image blocks for acquisition of the preset feature points. Let the selected block be M×N pixels, where M is the width of the image and N is its height; after binarization the pixel values in the image are only 0 and 255.
The binary image is traversed column by column from column 0 to column M−1, and within each column the pixels are traversed from 0 to N−1 until the first pixel whose value is not 0 is found, say point A; the current column index is the X coordinate of point A, and its position within the column is the Y coordinate.
The binary image is then traversed row by row from row 0 to row N−1, and within each row the pixels are traversed from 0 to M−1 until the first non-zero pixel is found, say point B; the current row index is the Y coordinate of point B, and its position within the row is the X coordinate.
The first non-zero points between point A and point B, i.e. the feature points along the left edge, are then traversed and their coordinates recorded, giving a feature point set. The point set between A and B consists of the pixels of one line of text and forms a tilted straight line.
A straight line is then fitted to the acquired feature points to obtain a fitted straight-line equation; two points suffice to determine a line. Let the straight-line equation be y = a + bx. Taking the coordinates in the feature point set obtained above as samples, with N feature points (x1, y1), (x2, y2), (x3, y3), …, (xN, yN), an initial straight-line equation can be computed from two selected points, and whether the block is tilted can be determined from the slope b of this equation.
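A simplified sketch of this tilt test is given below; for brevity it fits a line to the first foreground pixel of every column rather than only to the A–B edge segment, and the slope tolerance is an assumed parameter, not a value from the text.

```python
import numpy as np

def estimate_block_slope(block):
    """Fit y = a + b*x to the first foreground pixel of each column and return b."""
    xs, ys = [], []
    h, w = block.shape
    for x in range(w):
        col = np.flatnonzero(block[:, x])
        if col.size:                 # first non-zero pixel in this column
            xs.append(x)
            ys.append(col[0])
    if len(xs) < 2:
        return 0.0
    # np.polyfit returns [b, a] for a degree-1 fit
    b, _ = np.polyfit(np.array(xs), np.array(ys), 1)
    return b

def is_tilted(block, slope_tol=0.01):
    """Treat the block as tilted when the fitted slope exceeds a small tolerance."""
    return abs(estimate_block_slope(block)) > slope_tol
```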
S16, if the effective text image is in an inclined state, merging the effective text image to obtain a target block;
If the average of the retained sub-blocks' inclination angles were simply taken as the inclination angle of the image, the accuracy would be poor because of the limitations of averaging; all the retained sub-blocks therefore need to be merged, and the inclination detection is then performed on the merged image. However, because the original image is not divided into exactly 64 equal blocks (the division starts from the top-left corner), strips remain along the right-most and bottom-most edges; all processing is carried out within the divided sub-blocks and these leftover edge strips are never processed, which not only affects accuracy but also degrades the result after rotation.
In this embodiment, merging the effective text image blocks comprises: determining a rectangle with a preset length and width, and filling each effective text image block into the rectangle to obtain the target block. Specifically, a matrix of width W and height H (the same size as the original text image) is created and initialized to 0, and the retained sub-blocks, after the interfering ones have been eliminated, are filled into it, so that the interference-free image obtained is equal in size to the original image. The result of sub-block merging after interference removal is shown in fig. 4.
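Merging (step S16) then amounts to pasting the retained sub-blocks back into a zero matrix of the original size, using the origins recorded when the image was split; a sketch:

```python
import numpy as np

def merge_blocks(kept_blocks, height, width):
    """Paste the retained sub-blocks back into a zero image of the original H x W size."""
    merged = np.zeros((height, width), dtype=np.uint8)
    for (y0, x0), block in kept_blocks:
        bh, bw = block.shape
        merged[y0:y0 + bh, x0:x0 + bw] = block
    return merged
```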
S17, calculating the inclination angle of the merged target block based on the Radon transform;
The basic idea of the Radon transform rests on point-to-line duality: before the transform the image lies in image space, and after the transform the data lie in the parameter space (ρ, θ). The transform can be understood as projecting the image into the parameter space, where each point corresponds to a straight line in image space, and the Radon transform is the integral along that line. In other words, the Radon transform computes the projection of the image at a given angle, i.e. integrates the image along lines in a particular direction. Let f(x, y) be a two-dimensional function in space; the essence of the Radon transform is to integrate this function along lines in a certain direction. For example, the line integral in the vertical direction gives the projection onto the X axis, the line integral in the horizontal direction gives the projection onto the Y axis, and line integrals in other directions give projections along the corresponding angle θ. Typically the Radon transform of a function is taken as an integral along the Y axis. The Radon transform relationship is shown in fig. 5, which illustrates the line integral of the image parallel to an axis. If the image is represented as a matrix, the essence of the Radon transform is to compute projections of that matrix along arbitrary directions.
Let f(x, y) be the two-dimensional function of the target block; the Radon transform formula is
R(ρ, θ) = ∫∫ f(x, y) δ(ρ − x·cosθ − y·sinθ) dx dy
where δ(·) is the Dirac delta function, ρ is the distance from the integration line to the origin, and θ is the angle between the normal of the line and the x-axis.
From the principle of the Radon transform, a straight line in the original image becomes a single point in Radon space. After the transform, a line of higher gray value in the image produces a correspondingly brighter point in the parameter space, and the longer the line segment, the brighter the point; lines of lower gray value or shorter length produce dimmer points. Detecting straight lines in the image can therefore be converted into detecting bright and dark points, which simplifies the problem. Accordingly, for detecting the inclination angle of a skewed image, the accumulation point (ρ, θ) of the peak in the transformed space is determined, and the θ of that accumulation point is taken as the inclination angle of the text image.
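Under the same assumptions as before, the peak search of step S17 can be sketched as follows; note that mapping the returned θ to the rotation that must be applied may require a 90° offset and/or a sign flip depending on the Radon convention of the library used, and the 0.5° step is an assumed resolution.

```python
import numpy as np
from skimage.transform import radon

def skew_theta(merged, angles=np.arange(0.0, 180.0, 0.5)):
    """Return the theta of the brightest accumulation point in (rho, theta) space."""
    sinogram = radon(merged.astype(float), theta=angles, circle=False)
    _, theta_idx = np.unravel_index(np.argmax(sinogram), sinogram.shape)
    return angles[theta_idx]
```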
And S18, carrying out rotation correction on the text image to be corrected according to the inclination angle.
After the inclination angle has been detected, the corresponding rotation correction is applied according to whether the angle is greater or smaller than 0: when the inclination angle is greater than 0 the image is rotated counter-clockwise, and when it is smaller than 0 the image is rotated clockwise. The rotation transformation is
x' = x·cosθ − y·sinθ
y' = x·sinθ + y·cosθ
where (x, y) is a position in the image before correction, (x', y') is the corresponding position after correction, and the matrix formed by the cosθ and sinθ coefficients is the rotation transformation matrix.
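A sketch of the rotation correction of step S18 with OpenCV is shown below; rotating about the image centre and filling the border with white are implementation choices, not requirements from the text.

```python
import cv2

def rotate_correct(image, angle_deg):
    """Rotate the page about its centre by angle_deg (positive = counter-clockwise,
    matching cv2.getRotationMatrix2D's convention) to undo the detected skew."""
    h, w = image.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle_deg, 1.0)
    return cv2.warpAffine(image, M, (w, h),
                          flags=cv2.INTER_LINEAR,
                          borderMode=cv2.BORDER_CONSTANT,
                          borderValue=(255, 255, 255))
```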
Because whether the text image is tilted is determined from the segmented image blocks rather than by judging the whole text image directly, tilt misjudgment is avoided and the correction efficiency of the text image is improved; in addition, the screening and removal operations after segmentation and the Radon-based calculation of the inclination angle further ensure the accuracy of tilt angle detection.
In order to solve the technical problems, the embodiment of the invention also provides a text image correction device. Referring specifically to fig. 6, fig. 6 is a basic block diagram of a text image correction apparatus according to the present embodiment, including:
the preprocessing module is used for preprocessing the text image to be corrected;
the block module is used for carrying out block processing on the preprocessed text image to obtain at least one image block;
The screening module is used for screening effective image blocks of the effective pixel points from at least one image block;
the image removing module is used for removing the effective image blocks containing the image elements from the effective image blocks to obtain effective text image blocks;
The inclination judging module is used for determining the inclination state of the effective text image;
The block merging module is used for merging the effective text images to obtain target blocks if the effective text images are in an inclined state;
The inclination angle calculation module is used for calculating the inclination angle of the merged target block based on the Radon transform;
And the rotation correction module is used for carrying out rotation correction on the text image to be corrected according to the inclination angle.
Because whether the text image is tilted is determined from the segmented image blocks rather than by judging the whole text image directly, tilt misjudgment is avoided and the correction efficiency of the text image is improved; in addition, the screening and removal operations after segmentation and the Radon-based calculation of the inclination angle further ensure the accuracy of tilt angle detection.
In some embodiments, the above modules can also implement corresponding functions to implement the above-mentioned steps of the text image correction method.
In order to solve the above technical problems, the embodiment of the present invention further provides a chip, which may be a general-purpose processor or a special-purpose processor. The chip includes a processor for supporting the terminal in performing the above-mentioned related steps, for example calling and running a computer program from a memory, so that a device equipped with the chip implements the text image correction method in the respective embodiments above.
Optionally, in some examples, the chip further includes a transceiver, where the transceiver is controlled by the processor, and is configured to support the terminal to perform the related steps to implement the text image correction method in the foregoing embodiments.
Optionally, the chip may further comprise a storage medium.
It should be noted that the chip may be implemented using the following circuits or devices: one or more field programmable gate arrays (field programmable gate array, FPGA), programmable logic devices (programmable logic device, PLD), controllers, state machines, gate logic, discrete hardware components, or any other suitable circuit or combination of circuits capable of performing the various functions described throughout this application.
The invention also provides a terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the text image correction method as described above when executing the computer program.
Referring specifically to fig. 7, fig. 7 is a basic structural block diagram of a terminal, which includes a processor, a non-volatile storage medium, a memory and a network interface connected through a system bus. The non-volatile storage medium of the terminal stores an operating system, a database and computer readable instructions; the database may store a sequence of control information, and the computer readable instructions, when executed by the processor, cause the processor to implement a text image correction method. The processor of the terminal provides the computing and control capabilities that support the operation of the whole terminal. The memory of the terminal may store computer readable instructions which, when executed by the processor, cause the processor to perform the text image correction method. The network interface of the terminal is used for connecting to and communicating with other devices. Those skilled in the art will appreciate that the structure shown in the drawing is only a block diagram of part of the structure relevant to the solution of the present application and does not limit the terminals to which the solution may be applied; a particular terminal may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
As will be appreciated by those skilled in the art, a "terminal" or "terminal device" as used herein includes both devices having only a wireless signal receiver without transmitting capability and devices having receiving and transmitting hardware capable of two-way communication over a two-way communication link. Such an electronic device may include: a cellular or other communication device with or without a multi-line display; a PCS (Personal Communications Service) device that may combine voice, data processing, facsimile and/or data communication capabilities; a PDA (Personal Digital Assistant) that may include a radio-frequency receiver, a pager, Internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; and a conventional laptop and/or palmtop computer or other appliance that has and/or includes a radio-frequency receiver. As used herein, a "terminal" or "terminal device" may be portable, transportable, installed in a vehicle (aeronautical, maritime and/or land-based), or adapted and/or configured to operate locally and/or in a distributed fashion at any other location on earth and/or in space. The "terminal" or "terminal device" used herein may also be a communication terminal, a network access terminal or a music/video playing terminal, for example a PDA, an MID (Mobile Internet Device) and/or a mobile phone with a music/video playing function, and may also be a smart TV, a set-top box or other such devices.
The present invention also provides a storage medium storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the text image correction method described in any of the embodiments above.
The present embodiment also provides a computer program which can be distributed on a computer readable medium and executed by a computing device to implement at least one step of the above-described text image correction method; in some cases at least one of the steps shown or described may be performed in an order different from that described in the above embodiments.
The present embodiment also provides a computer program product comprising a computer readable means on which a computer program as described above is stored. The computer readable means in this embodiment may comprise a computer readable storage medium as described above.
Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program stored in a computer readable storage medium; when executed, the program may include the flows of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (ROM), or a random access memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered to fall within the scope of this description.
The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.
Claims (8)
1. A text image correction method, characterized by comprising:
preprocessing a text image to be corrected;
performing block processing on the preprocessed text image to obtain at least one image block;
screening, from the at least one image block, effective image blocks containing effective pixel points;
removing the effective image blocks that contain picture elements from the effective image blocks to obtain effective text image blocks;
determining the tilt state of the effective text image blocks;
if the effective text image blocks are in a tilted state, merging the effective text image blocks to obtain a target block;
calculating the inclination angle of the merged target block based on the Radon transform;
performing rotation correction on the text image to be corrected according to the inclination angle;
the merging of the effective text image blocks to obtain a target block comprises:
determining a rectangle with a preset length and width;
filling each effective text image block into the rectangle to obtain the target block;
The removing the effective image block containing the picture element from the effective image block to obtain an effective text image block comprises the following steps:
performing Radon transform calculation on the effective image blocks to obtain the inclination angle of each effective image block;
Respectively calculating the mean value and the variance of the inclination angles of the effective image blocks;
and removing the effective image blocks with larger variances from the effective image blocks to obtain effective text image blocks.
2. The text image correction method of claim 1, wherein said determining the tilt state of said effective text image blocks includes:
selecting a target block to collect preset characteristic points;
performing straight line fitting on the acquired characteristic points to obtain a fitted straight line equation;
and determining whether the target block is tilted based on the slope of the straight line equation.
3. The text image correction method of claim 1, wherein said preprocessing the text image to be corrected includes:
setting a factor K, wherein K represents the proportion of a text part to the text image, and the text image comprises the text part and an image part;
dividing the text image into a first class and a second class according to a threshold t, wherein the gray-scale range of the first class is 0 to t and the gray-scale range of the second class is t+1 to L−1, the number of gray levels of the text image being denoted L;
calculating the sum of the intra-class variances of the first class and the second class, σw²(t), and the inter-class variance, σB²(t);
determining the threshold Th under the maximum between-class variance criterion of the Otsu method as the value that makes the inter-class variance σB²(t) as large as possible while making the intra-class variance σw²(t) as small as possible, wherein μ0(t) and μ1(t) respectively denote the gray-level means of the first class and the second class of the text image, σ0²(t) and σ1²(t) respectively denote the intra-class variances of the first class and the second class, and w0(t) and w1(t) respectively denote the proportions of the first class and the second class;
traversing the gray levels of the text image and determining the value of t for which the inter-class variance is as large as possible and the intra-class variance is as small as possible;
and performing binarization segmentation on the text image with the threshold t as the segmentation threshold of the Otsu method.
4. The text image correction method of claim 1, wherein said preprocessing the text image to be corrected further comprises: graying the text image;
The block processing is performed on the preprocessed text image to obtain at least one image block, which comprises the following steps:
And dividing the text image into a preset number of image blocks on average according to the image size of the text image.
5. The text image correction method of claim 1, wherein calculating the inclination angle of the merged target block based on the Radon transform includes:
letting f(x, y) be the two-dimensional function of the target block, the Radon transform formula being:
R(ρ, θ) = ∫∫ f(x, y) δ(ρ − x·cosθ − y·sinθ) dx dy
wherein δ(·) is the Dirac delta function, ρ is the distance from the integration line to the origin, and θ is the angle between the normal of the line and the x-axis;
determining the accumulation point (ρ, θ) of the peak point in the space after the Radon transform;
and determining θ of the accumulation point (ρ, θ) as the inclination angle of the text image.
6. A text image correction apparatus characterized by comprising:
the preprocessing module is used for preprocessing the text image to be corrected;
the block module is used for carrying out block processing on the preprocessed text image to obtain at least one image block;
The screening module is used for screening effective image blocks of the effective pixel points from at least one image block;
the image removing module is used for removing the effective image blocks containing the image elements from the effective image blocks to obtain effective text image blocks;
The inclination judging module is used for determining the inclination state of the effective text image;
The block merging module is used for merging the effective text images to obtain target blocks if the effective text images are in an inclined state;
the inclination angle calculation module is used for calculating the inclination angle of the merged target block based on the Radon transform;
the rotation correction module is used for carrying out rotation correction on the text image to be corrected according to the inclination angle;
the block merging module is configured to merge the effective text image blocks to obtain a target block, and includes:
determining a rectangle with a preset length and width;
filling each effective text image block into the rectangle to obtain the target block;
the image removing module is configured to remove the effective image blocks containing picture elements from the effective image blocks to obtain effective text image blocks, and includes:
performing Radon transform calculation on the effective image blocks to obtain the inclination angle of each effective image block;
Respectively calculating the mean value and the variance of the inclination angles of the effective image blocks;
and removing the effective image blocks with larger variances from the effective image blocks to obtain effective text image blocks.
7. A chip, comprising: a first processor for calling and running a computer program from a first memory, so that a device on which the chip is mounted performs the respective steps of the text image correction method as claimed in any one of claims 1 to 5.
8. A terminal, comprising: a second memory, a second processor and a computer program stored in the second memory and executable on the second processor, characterized in that the second processor implements the steps of the text image correction method according to any one of claims 1 to 5 when the computer program is executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311225097.1A CN117351495B (en) | 2023-09-21 | 2023-09-21 | Text image correction method, device, chip and terminal |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311225097.1A CN117351495B (en) | 2023-09-21 | 2023-09-21 | Text image correction method, device, chip and terminal |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117351495A (en) | 2024-01-05
CN117351495B (en) | 2024-04-26
Family
ID=89360362
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311225097.1A (CN117351495B) Active | Text image correction method, device, chip and terminal | 2023-09-21 | 2023-09-21
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117351495B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8897565B1 (en) * | 2012-06-29 | 2014-11-25 | Google Inc. | Extracting documents from a natural scene image |
WO2019056346A1 (en) * | 2017-09-25 | 2019-03-28 | 深圳传音通讯有限公司 | Method and device for correcting tilted text image using expansion method |
CN110298282A (en) * | 2019-06-21 | 2019-10-01 | 华南师范大学 | Document image processing method, storage medium and calculating equipment |
CN113421257A (en) * | 2021-07-22 | 2021-09-21 | 凌云光技术股份有限公司 | Dot matrix font text line rotation correction method and device |
Non-Patent Citations (1)
Title |
---|
Qiu Weitao, "Skew correction of text-and-picture images based on the Radon transform" (基于Radon变换的图文图像的倾斜校正), Fujian Computer, 2016-01-31 (01), pp. 122-123 *
Also Published As
Publication number | Publication date |
---|---|
CN117351495A (en) | 2024-01-05 |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | GR01 | Patent grant |
 | PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: A text image correction method, device, chip, and terminal. Granted publication date: 20240426. Pledgee: Rizhao Donggang Rural Commercial Bank Co.,Ltd. Pledgor: Shandong Ruixin Semiconductor Technology Co.,Ltd. Registration number: Y2024980028949