CN110705546B - Text image angle deviation correcting method and device and computer readable storage medium - Google Patents

Text image angle deviation correcting method and device and computer readable storage medium

Info

Publication number
CN110705546B
Authority
CN
China
Prior art keywords
text image
image
text
projection histogram
binary copy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910846892.XA
Other languages
Chinese (zh)
Other versions
CN110705546A (en
Inventor
王博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910846892.XA priority Critical patent/CN110705546B/en
Priority to PCT/CN2019/116549 priority patent/WO2021042509A1/en
Publication of CN110705546A publication Critical patent/CN110705546A/en
Application granted granted Critical
Publication of CN110705546B publication Critical patent/CN110705546B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • G06V10/243Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/28Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)
  • Facsimile Image Signal Circuits (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)

Abstract

The invention relates to artificial intelligence technology and discloses a text image angle correction method comprising: obtaining a text image and preprocessing it to obtain a binarized text image; detecting deflected text in the binarized text image through an iterative algorithm to obtain a deflected text image, and cutting the deflected text image to obtain a binary copy image; progressively rotating the binary copy image and converting each rotated copy into a frequency projection histogram, yielding a frequency projection histogram set; and calculating the standard deviation of the peak points and valley points of each histogram in the set to obtain a standard deviation set, and taking the rotation angle corresponding to the maximum standard deviation as the correction angle of the text image, thereby completing the angle correction. The invention also provides a text image angle deviation correcting device and a computer readable storage medium. The invention achieves accurate correction of the angle of a text image.

Description

Text image angle deviation correcting method and device and computer readable storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a text image angle deviation correcting method and device based on projection and a computer readable storage medium.
Background
Optical character recognition technology is very widely used today. Optical character recognition (OCR) is the process of recognizing optical characters in a picture through image processing and pattern recognition techniques and translating them into computer characters. The main pipeline preprocesses the input image, binarizes it, removes noise, segments the characters, and recognizes them. Most current OCR algorithms are based on decision trees and support vector machines (SVM), and their recognition accuracy is very sensitive to character skew. However, text images can rarely be acquired with zero skew, and accurately calculating the correction angle is difficult.
Disclosure of Invention
The invention provides a text image angle correction method, a text image angle correction device and a computer readable storage medium, which mainly aim at presenting accurate correction results to a user when the user performs text image angle correction in a knowledge base.
In order to achieve the above object, the present invention provides a text image angle correction method, including:
acquiring a text image, and performing preprocessing operation on the text image to obtain a binarized text image;
detecting a deflected text in the binarized text image through an iterative algorithm to obtain a deflected text image, and cutting the deflected text image to obtain a binary copy image;
performing progressive rotation on the binary copy image, converting the binary copy image subjected to progressive rotation into a frequency projection histogram, and obtaining a frequency projection histogram set of the binary copy image according to the progressive rotation angle of the binary copy image;
calculating the standard deviation of the peak points and valley points of each histogram in the frequency projection histogram set to obtain a standard deviation set, and taking the rotation angle corresponding to the maximum standard deviation in the standard deviation set as the correction angle of the text image, thereby completing angle correction of the text image.
Optionally, the preprocessing operation is performed on the text image to obtain a binarized text image, including:
denoising the text image by using an adaptive image denoising filter, performing contrast enhancement on the denoised text image by contrast stretching, and performing a thresholding operation on the contrast-enhanced text image according to the OTSU algorithm to obtain the binarized text image.
Optionally, said converting said progressively rotated binary copy image into a frequency projection histogram comprises:
performing Fourier transform on the binary copy image subjected to progressive rotation;
calculating the amplitude spectrum and the phase spectrum of the binary copy image after Fourier transformation;
and constructing the frequency projection histogram according to the magnitude spectrum and the phase spectrum.
Optionally, the Fourier transform is the two-dimensional discrete Fourier transform:
$$F(u,v)=\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)\,e^{-j2\pi\left(\frac{ux}{M}+\frac{vy}{N}\right)}$$
wherein u = 0, 1, 2, ..., M-1; v = 0, 1, 2, ..., N-1; x = 0, 1, 2, ..., M-1; y = 0, 1, 2, ..., N-1; M and N are the numbers of pixels along the length and width of the binary copy image, x and y are spatial coordinates, f(x, y) is the spatial-domain sample value of the binary copy image, F(u, v) is the Fourier-transform-domain sample value of the binary copy image, and u and v are transform-domain coordinates.
Optionally, the standard deviation of the peak points and valley points of each histogram in the frequency projection histogram set is calculated as:
$$\sigma=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2+\sum_{j=1}^{m}(y_j-\mu)^2}{n+m}}$$
wherein $\sigma$ represents the standard deviation of the frequency projection histogram, $x_i$ represents the i-th peak point in the frequency projection histogram, n represents the number of peak points, $y_j$ represents the j-th valley point, m represents the number of valley points, and $\mu$ is the mean of all peak points and valley points.
In addition, in order to achieve the above object, the present invention also provides a text image angle correction device, which includes a memory and a processor, wherein the memory stores a text image angle correction program that can be run on the processor, and the text image angle correction program when executed by the processor implements the following steps:
acquiring a text image, and performing preprocessing operation on the text image to obtain a binarized text image;
detecting a deflected text in the binarized text image through an iterative algorithm to obtain a deflected text image, and cutting the deflected text image to obtain a binary copy image;
performing progressive rotation on the binary copy image, converting the binary copy image subjected to progressive rotation into a frequency projection histogram, and obtaining a frequency projection histogram set of the binary copy image according to the progressive rotation angle of the binary copy image;
calculating the standard deviation of the peak points and valley points of each histogram in the frequency projection histogram set to obtain a standard deviation set, and taking the rotation angle corresponding to the maximum standard deviation in the standard deviation set as the correction angle of the text image, thereby completing angle correction of the text image.
Optionally, the preprocessing operation is performed on the text image to obtain a binarized text image, including:
denoising the text image by using an adaptive image denoising filter, performing contrast enhancement on the denoised text image by contrast stretching, and performing a thresholding operation on the contrast-enhanced text image according to the OTSU algorithm to obtain the binarized text image.
Optionally, said converting said progressively rotated binary copy image into a frequency projection histogram comprises:
performing Fourier transform on the binary copy image subjected to progressive rotation;
calculating the amplitude spectrum and the phase spectrum of the binary copy image after Fourier transformation;
and constructing the frequency projection histogram according to the magnitude spectrum and the phase spectrum.
Optionally, the Fourier transform is the two-dimensional discrete Fourier transform:
$$F(u,v)=\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)\,e^{-j2\pi\left(\frac{ux}{M}+\frac{vy}{N}\right)}$$
wherein u = 0, 1, 2, ..., M-1; v = 0, 1, 2, ..., N-1; x = 0, 1, 2, ..., M-1; y = 0, 1, 2, ..., N-1; M and N are the numbers of pixels along the length and width of the binary copy image, x and y are spatial coordinates, f(x, y) is the spatial-domain sample value of the binary copy image, F(u, v) is the Fourier-transform-domain sample value of the binary copy image, and u and v are transform-domain coordinates.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a text image angle deviation correcting program executable by one or more processors to implement the steps of the text image angle deviation correcting method as described above.
According to the text image angle correction method, the device and the computer readable storage medium, when a user performs text image angle correction, the obtained text image is preprocessed, the inclined text in the text image is analyzed and processed to obtain a frequency projection histogram set, the standard deviation of the peak points and valley points of each histogram in the set is calculated, and the rotation angle corresponding to the maximum standard deviation is used as the correction angle of the text image, so that an accurate text image angle correction result can be presented to the user.
Drawings
Fig. 1 is a flow chart of a text image angle correction method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an internal structure of a text image angle correction device according to an embodiment of the present invention;
Fig. 3 is a schematic block diagram of a text image angle correction program in the text image angle correction device according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention provides a text image angle correction method. Referring to fig. 1, a flow chart of a text image angle correction method according to an embodiment of the invention is shown. The method may be performed by an apparatus, which may be implemented in software and/or hardware.
In this embodiment, the text image angle correction method includes:
S1, acquiring a text image, and preprocessing the text image to obtain a binarized text image.
In a preferred embodiment of the present invention, the text image may be image data such as a certificate or an invoice. The preprocessing operation is as follows: the text image is denoised with an adaptive image denoising filter, the denoised text image is contrast-enhanced by contrast stretching, and the contrast-enhanced text image is thresholded according to the OTSU algorithm to obtain the binarized text image. In detail, the specific implementation steps of the preprocessing operation are as follows:
a. Noise reduction:
The invention uses the adaptive image noise reduction filter to denoise the text image, so that salt-and-pepper noise is filtered out while the details of the text image are largely preserved. Salt-and-pepper noise consists of randomly occurring white or black points in the image, and the adaptive image noise reduction filter is a signal extractor that recovers the original signal from the noise-contaminated signal.
In the preferred embodiment of the invention, the text image is denoted f(x, y). Under the effect of the degradation function H and the influence of the salt-and-pepper noise η(x, y), a degraded image g(x, y) is obtained, which gives the image degradation formula g(x, y) = η(x, y) + f(x, y). The text image is denoised with an adaptive filter, and the noise reduction formula is:
$$\hat{f}(x,y)=g(x,y)-\frac{\sigma_\eta^2}{\sigma_L^2}\left[g(x,y)-m_L\right]$$
wherein $\sigma_\eta^2$ is the noise variance of the text image, $m_L$ is the mean gray level of the pixels in a window around point (x, y), and $\sigma_L^2$ is the variance of the pixel gray levels within that window.
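The noise reduction step can be sketched as follows. This is a minimal illustration, assuming the standard adaptive local noise-reduction filter, which matches the three quantities defined above (noise variance, local mean, local variance). The function name `adaptive_denoise`, the window `radius`, and the capping of the variance ratio at 1 are assumptions, not details taken from the patent.

```python
from statistics import mean, pvariance


def adaptive_denoise(g, noise_var, radius=1):
    """Adaptive local noise-reduction filter on a 2-D list of gray values."""
    rows, cols = len(g), len(g[0])
    out = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            # Collect the window around (x, y), clipped at the image border.
            window = [
                g[i][j]
                for i in range(max(0, x - radius), min(rows, x + radius + 1))
                for j in range(max(0, y - radius), min(cols, y + radius + 1))
            ]
            local_mean = mean(window)
            local_var = pvariance(window)
            # Assumption: cap the ratio at 1 so the output never overshoots
            # the local mean when the local variance is below the noise variance.
            ratio = 1.0 if local_var <= noise_var else noise_var / local_var
            out[x][y] = g[x][y] - ratio * (g[x][y] - local_mean)
    return out


# A single bright outlier on a flat background.
flat = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
smoothed = adaptive_denoise(flat, noise_var=10_000.0)
```

When the noise variance is zero the filter returns the input unchanged, and when the noise variance dominates the local variance the pixel is replaced by the local mean, which is why detail in high-variance regions (edges of characters) tends to survive.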
b. Contrast enhancement:
Contrast refers to the difference between the maximum and minimum brightness values in an imaging system; low contrast makes image processing more difficult. The preferred embodiment of the invention adopts a contrast stretching method, which enhances the contrast of the text image by widening the dynamic range of its gray levels. Contrast stretching is also called gray stretching.
Furthermore, the invention performs gray stretching on a specific region according to the piecewise linear transformation function of the contrast stretching method, thereby further improving the contrast of the output image. Contrast stretching is essentially a gray value transformation. The invention realizes the gray value transformation by linear stretching, a pixel-level operation in which the output gray value is a linear function of the input gray value. The gray transformation formula is:
$$D_b = f(D_a) = a \cdot D_a + b$$
where a is the linear slope, b is the intercept on the Y axis, $D_a$ represents the gray value of the input image, and $D_b$ represents the gray value of the output image. When a > 1, the contrast of the output image is enhanced compared with the original image. When a < 1, the contrast of the output image is weakened compared with the original image.
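A minimal sketch of the linear gray transformation described above. The helper `stretch_params`, which derives a slope and intercept that map a chosen gray range onto 0 to 255, and the clipping to the 8-bit range are illustrative assumptions; the source only specifies the linear form itself.

```python
def linear_stretch(image, a, b):
    """Apply D_b = a * D_a + b to every pixel and clip to the 8-bit range."""
    return [
        [min(255, max(0, round(a * d_a + b))) for d_a in row]
        for row in image
    ]


def stretch_params(low, high):
    """Slope and intercept that map the gray range [low, high] onto [0, 255]."""
    a = 255.0 / (high - low)
    return a, -a * low


low_contrast = [[100, 120], [140, 150]]
a, b = stretch_params(100, 150)          # a = 5.1 > 1, so contrast increases
stretched = linear_stretch(low_contrast, a, b)
```

Here the input occupies only 50 gray levels, so the slope is greater than 1 and the output spans the full range.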
c. Image thresholding operations:
The invention binarizes the contrast-enhanced text image with the OTSU algorithm, an efficient binarization algorithm, to obtain a binarized image. Further, in the preferred embodiment of the present invention, let the gray level t be the segmentation threshold between the foreground and the background of the contrast-enhanced text image, let the proportion of foreground points in the image be $w_0$ with average gray level $u_0$, and let the proportion of background points be $w_1$ with average gray level $u_1$. The total average gray level of the contrast-enhanced text image is then:
$$u = w_0 u_0 + w_1 u_1$$
The variance between the foreground and the background of the contrast-enhanced text image is:
$$g = w_0 (u_0 - u)^2 + w_1 (u_1 - u)^2 = w_0 w_1 (u_0 - u_1)^2$$
When the variance g is at its maximum, the difference between the foreground and the background is largest and the gray level t is the optimal threshold. Gray values greater than t in the contrast-enhanced text image are set to 255 and gray values smaller than t are set to 0, which yields the binarized text image.
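The threshold search can be sketched as an exhaustive scan over all gray levels that keeps the level with the largest between-class variance. The function names and the tiny sample image are illustrative; which of the two classes is called foreground is left open here, since the variance expression is symmetric in the two classes.

```python
def otsu_threshold(pixels):
    """Return the gray level t that maximises g = w0 * w1 * (u0 - u1) ** 2."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(hist))

    best_t, best_g = 0, -1.0
    count_low, sum_low = 0, 0
    for t in range(256):
        count_low += hist[t]
        sum_low += t * hist[t]
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue  # one class is empty, the variance is undefined
        w_low, w_high = count_low / total, count_high / total
        u_low = sum_low / count_low
        u_high = (total_sum - sum_low) / count_high
        g = w_low * w_high * (u_low - u_high) ** 2
        if g > best_g:
            best_t, best_g = t, g
    return best_t


def binarize(image, t):
    """Gray values above t become 255, all others become 0."""
    return [[255 if p > t else 0 for p in row] for row in image]


image = [[20, 30, 25], [200, 210, 220], [22, 215, 28]]
t = otsu_threshold([p for row in image for p in row])
binary = binarize(image, t)
```

Because every threshold between the two clusters gives the same variance, the scan keeps the first one it meets.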
Further, the preprocessing operation of the present invention may also include reducing the dimensionality of the binarized text image by principal component analysis, so that the binarized text image can be processed more efficiently. Principal component analysis is a method that converts a set of possibly correlated variables into a set of linearly uncorrelated variables by an orthogonal transformation.
S2, detecting the deflected text in the binarized text image through an iterative algorithm to obtain a deflected text image, and extracting the deflected text image to obtain a binary copy image.
The preferred embodiment of the invention detects the deflected text in the binarized text image with the AdaBoost iterative algorithm to obtain the deflected text image. AdaBoost is a detection algorithm whose core is iteration: a weak classifier is constructed for each of several training sets, and the weak classifiers are then combined into a final strong classifier. AdaBoost works by adjusting the data distribution: the weight of each sample is set according to whether the sample was classified correctly in the current training set and according to the overall classification accuracy of the previous round. The re-weighted data set is passed to the next classifier for training, and the classifiers trained in each round are combined into the final decision classifier.
The invention divides the binarized text image into different areas to obtain training samples $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, wherein negative samples (background) are represented by $y_i = 0$ and positive samples (foreground, i.e. containing skewed text) are represented by $y_i = 1$. Preferably, the weak classifier constructed by the invention is:
$$h(x,f,p,\theta)=\begin{cases}1, & p\,f(x) < p\,\theta\\ 0, & \text{otherwise}\end{cases}$$
wherein f is a feature, θ is a threshold, p indicates the direction of the inequality sign, and x denotes a detection sub-window. From the constructed weak classifiers, the optimal weak classifier $h_t(x)$ with the minimum classification error rate $\varepsilon_t$ is selected, where $\varepsilon_t$ is calculated as:
$$\varepsilon_t=\min_{f,p,\theta}\sum_i \frac{w_i}{\sum_j w_j}\,\left|h(x_i,f,p,\theta)-y_i\right|$$
wherein w is a weight value. The final strong classifier is obtained from the selected weak classifiers, wherein:
$$\beta_t=\frac{\varepsilon_t}{1-\varepsilon_t}$$
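One round of weak classifier selection can be sketched as below. It assumes a single scalar feature value per sample and searches the polarity p and threshold θ exhaustively for the lowest weighted error; the feature values, labels and function names are hypothetical and only illustrate the selection rule and the quantity β_t.

```python
def weak_classify(f_value, p, theta):
    """h(x, f, p, theta): 1 if p * f(x) < p * theta, else 0."""
    return 1 if p * f_value < p * theta else 0


def best_stump(features, labels, weights):
    """Search polarity p and threshold theta for the lowest weighted error."""
    total = sum(weights)
    norm = [w / total for w in weights]  # w_i / sum(w)
    best = None
    for theta in sorted(set(features)):
        for p in (1, -1):
            err = sum(
                w * abs(weak_classify(f, p, theta) - y)
                for f, y, w in zip(features, labels, norm)
            )
            if best is None or err < best[0]:
                best = (err, p, theta)
    return best


# Hypothetical feature values for six sub-windows; label 1 means skewed text.
features = [0.1, 0.2, 0.35, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 1, 1, 0]
weights = [1.0] * len(features)

eps, p, theta = best_stump(features, labels, weights)
beta = eps / (1.0 - eps)
```

With uniform weights the best stump misclassifies only the last sample, so the error is 1/6 and β is 0.2. In a full AdaBoost loop the weights would be updated after each round before the next stump is selected; that loop is omitted here.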
Further, the method detects the skewed text in the binarized text image with a cascade classifier. The cascade classifier is a text detection classifier formed by connecting the trained strong classifiers in cascade, and is a degenerate decision tree. In the cascade classifier, classification by the layer-2 classifier is triggered by the positive samples obtained from layer-1 classification, classification by the layer-3 classifier is triggered by the positive samples obtained from layer-2 classification, and so on. Finally, all deflected text images in the binarized text image are detected in a general environment, and the deflected text images are cut out to obtain the binary copy image.
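The cascade behaviour described above, where each layer only sees what the previous layer accepted, can be sketched as follows. The two stage functions are hypothetical stand-ins for trained strong classifiers.

```python
def cascade_detect(stages, window):
    """Return True only if every stage accepts the window, in order."""
    for stage in stages:
        if not stage(window):
            return False  # rejected early, later stages never run
    return True


# Hypothetical stages operating on a window given as a flat list of 0/1 pixels.
stages = [
    lambda w: sum(w) > 0,               # layer 1: any foreground at all
    lambda w: sum(w) >= len(w) * 0.5,   # layer 2: enough foreground density
]

windows = [[0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0]]
detected = [w for w in windows if cascade_detect(stages, w)]
```

Most background windows are discarded by the cheap early layers, which is the usual motivation for arranging the classifiers as a degenerate decision tree.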
S3, carrying out progressive rotation on the binary copy image, converting the binary copy image subjected to progressive rotation into a frequency projection histogram, and obtaining a frequency projection histogram set of the binary copy image according to the progressive rotation angle of the binary copy image.
In the preferred embodiment of the invention, the binary copy image is rotated progressively by a preset angle step. Preferably, the binary copy image is rotated in steps of 2 degrees between -45 degrees and 45 degrees, and the numbers of pixels along the length and width of the binary copy image are calculated after each rotation.
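The progressive rotation can be sketched with a nearest-neighbour rotation about the image centre. Keeping the output the same size as the input and filling uncovered pixels with 0 are simplifying assumptions; the patent does not say how the rotation is implemented.

```python
import math


def rotate_binary(image, angle_deg):
    """Rotate a 2-D 0/1 list about its centre (nearest-neighbour, same size)."""
    rows, cols = len(image), len(image[0])
    cx, cy = (rows - 1) / 2.0, (cols - 1) / 2.0
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    out = [[0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            # Inverse mapping: find the source pixel for each output pixel.
            sx = cos_a * (x - cx) + sin_a * (y - cy) + cx
            sy = -sin_a * (x - cx) + cos_a * (y - cy) + cy
            i, j = round(sx), round(sy)
            if 0 <= i < rows and 0 <= j < cols:
                out[x][y] = image[i][j]
    return out


# Candidate angles from -45 to 45 degrees in steps of 2 degrees.
angles = list(range(-45, 46, 2))

image = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
rotated_set = {angle: rotate_binary(image, angle) for angle in angles}
```

The step of 2 degrees gives 46 candidate angles, one rotated copy per angle.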
Further, the progressively rotated binary copy image is converted into a frequency projection histogram through a Fourier transform algorithm. In detail, the Fourier transform is the two-dimensional discrete Fourier transform:
$$F(u,v)=\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)\,e^{-j2\pi\left(\frac{ux}{M}+\frac{vy}{N}\right)}$$
wherein u = 0, 1, 2, ..., M-1; v = 0, 1, 2, ..., N-1; x = 0, 1, 2, ..., M-1; y = 0, 1, 2, ..., N-1; M and N are the numbers of pixels along the length and width of the binary copy image, x and y are spatial coordinates, f(x, y) is the spatial-domain sample value of the binary copy image, F(u, v) is the Fourier-transform-domain sample value of the binary copy image, and u and v are transform-domain coordinates. When the binary copy image is a square matrix, M = N. F(u, v) is called the spectrum of the binary copy image signal f(x, y). The amplitude spectrum and the phase spectrum of the Fourier-transformed binary copy image are, respectively:
$$|F(u,v)|=\sqrt{R^2(u,v)+I^2(u,v)},\qquad \varphi(u,v)=\arctan\frac{I(u,v)}{R(u,v)}$$
wherein $F(u,v)=R(u,v)+jI(u,v)=|F(u,v)|\,e^{j\varphi(u,v)}$, $|F(u,v)|$ represents the amplitude spectrum of the binary copy image, and $\varphi(u,v)$ represents its phase spectrum.
Further, the invention constructs the frequency projection histogram according to the calculated amplitude spectrum and phase spectrum of the binary copy image, and obtains different frequency projection histograms according to different progressive rotation angles of the binary copy image, namely a frequency projection histogram set of the binary copy image.
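The transform and the two spectra can be sketched with a direct (unoptimised) two-dimensional DFT. The source does not state how the frequency projection histogram is assembled from the amplitude and phase spectra, so the last line, which sums the amplitude spectrum along each row, is only one assumed way to obtain a 1-D projection.

```python
import cmath
import math


def dft2(f):
    """Direct 2-D DFT: F(u, v) = sum of f(x, y) * exp(-j*2*pi*(u*x/M + v*y/N))."""
    M, N = len(f), len(f[0])
    return [
        [
            sum(
                f[x][y] * cmath.exp(-2j * math.pi * (u * x / M + v * y / N))
                for x in range(M)
                for y in range(N)
            )
            for v in range(N)
        ]
        for u in range(M)
    ]


def amplitude_and_phase(F):
    """Amplitude |F(u, v)| and phase phi(u, v) of a complex spectrum."""
    amplitude = [[abs(c) for c in row] for row in F]
    phase = [[cmath.phase(c) for c in row] for row in F]
    return amplitude, phase


# Horizontal text lines modelled as alternating empty and full rows.
image = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1]]
amplitude, phase = amplitude_and_phase(dft2(image))

# Assumed projection: total amplitude per frequency row.
row_energy = [sum(row) for row in amplitude]
```

For this perfectly aligned pattern all spectral energy sits in two coefficients, the mean at (0, 0) and the line frequency at (2, 0). A direct DFT costs O(M²N²), so a real implementation would use an FFT routine instead.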
S4, calculating the standard deviation of the peak points and valley points of each histogram in the frequency projection histogram set to obtain a standard deviation set, and taking the rotation angle corresponding to the maximum standard deviation in the standard deviation set as the correction angle of the text image, thereby completing angle correction of the text image.
In a preferred embodiment of the present invention, the standard deviation of the peak points and valley points of each histogram in the frequency projection histogram set is calculated as:
$$\sigma=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2+\sum_{j=1}^{m}(y_j-\mu)^2}{n+m}}$$
wherein $\sigma$ represents the standard deviation of the frequency projection histogram, $x_i$ represents the i-th peak point in the frequency projection histogram, n represents the number of peak points, $y_j$ represents the j-th valley point, m represents the number of valley points, and $\mu$ is the mean of all peak points and valley points. The standard deviation reflects the degree of dispersion between the valley points and the peak points.
Further, the invention calculates the standard deviation of every histogram in the frequency projection histogram set to obtain a standard deviation set. According to the structural characteristics of text images, the rotation at which the standard deviation is largest gives the optimal orientation of the corrected text image. The corresponding angle is taken as the correction angle of the text image, and the original image is rotated by this angle to correct it.
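The selection of the correction angle can be sketched as below. Peaks and valleys are taken to be strict local maxima and minima of the histogram, and the standard deviation is the population form over the pooled peak and valley values; both choices, and the sample histograms, are illustrative assumptions.

```python
import math


def peaks_and_valleys(hist):
    """Strict local maxima (peaks) and minima (valleys) of a 1-D histogram."""
    peaks, valleys = [], []
    for k in range(1, len(hist) - 1):
        if hist[k] > hist[k - 1] and hist[k] > hist[k + 1]:
            peaks.append(hist[k])
        elif hist[k] < hist[k - 1] and hist[k] < hist[k + 1]:
            valleys.append(hist[k])
    return peaks, valleys


def peak_valley_std(hist):
    """Standard deviation of all peak and valley values around their mean."""
    peaks, valleys = peaks_and_valleys(hist)
    points = peaks + valleys
    if not points:
        return 0.0
    mu = sum(points) / len(points)
    return math.sqrt(sum((p - mu) ** 2 for p in points) / len(points))


def correction_angle(histograms_by_angle):
    """Angle whose histogram has the largest peak/valley standard deviation."""
    return max(
        histograms_by_angle,
        key=lambda angle: peak_valley_std(histograms_by_angle[angle]),
    )


# Hypothetical histograms for three candidate rotation angles.
histograms = {
    -2: [3, 4, 3, 4, 3, 4, 3],
    0: [0, 9, 0, 9, 0, 9, 0],
    2: [2, 5, 2, 5, 2, 5, 2],
}
best = correction_angle(histograms)
```

When text lines are aligned with the projection axis, the histogram alternates between tall peaks (text rows) and deep valleys (gaps), so the spread between peaks and valleys is largest at the correct angle.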
The invention also provides a text image angle deviation correcting device. Referring to fig. 2, an internal structure diagram of a text image angle correction device according to an embodiment of the invention is shown.
In this embodiment, the text image angle correction device 1 may be a PC (Personal Computer), a terminal device such as a smart phone, a tablet computer or a portable computer, or a server. The text image angle deviation correcting device 1 comprises at least a memory 11, a processor 12, a communication bus 13, and a network interface 14.
The memory 11 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 11 may be an internal storage unit of the text image angle deviation correcting device 1, for example a hard disk of the device. In other embodiments, the memory 11 may be an external storage device of the text image angle deviation correcting device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the device. Further, the memory 11 may include both an internal storage unit and an external storage device of the text image angle deviation correcting device 1. The memory 11 may be used not only for storing application software installed in the text image angle correction device 1 and various types of data, such as the code of the text image angle correction program 01, but also for temporarily storing data that has been output or is to be output.
Processor 12 may in some embodiments be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor or other data processing chip for executing program code or processing data stored in memory 11, such as executing text image angle deviation correction program 01, etc.
The communication bus 13 is used to enable connection communication between these components.
The network interface 14 may optionally comprise a standard wired interface, a wireless interface (e.g. WI-FI interface), typically used to establish a communication connection between the apparatus 1 and other electronic devices.
Optionally, the device 1 may further comprise a user interface, which may comprise a display and an input unit such as a keyboard, and optionally a standard wired interface and a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or a display unit, and is used for displaying information processed in the text image angle correction device 1 and for displaying a visual user interface.
Fig. 2 shows only the text image angle deviation correcting device 1 with components 11-14 and the text image angle deviation correcting program 01. It will be understood by a person skilled in the art that the structure shown in Fig. 2 does not limit the text image angle deviation correcting device 1, which may comprise fewer or more components than shown, combine certain components, or arrange the components differently.
In the embodiment of the apparatus 1 shown in fig. 2, a text image angle deviation correcting program 01 is stored in the memory 11; the processor 12 implements the following steps when executing the text image angle deviation correcting program 01 stored in the memory 11:
Step one, acquiring a text image, and preprocessing the text image to obtain a binarized text image.
In a preferred embodiment of the present invention, the text image may be image data such as a certificate or an invoice. The preprocessing operation is as follows: the text image is denoised with an adaptive image denoising filter, the denoised text image is contrast-enhanced by contrast stretching, and the contrast-enhanced text image is thresholded according to the OTSU algorithm to obtain the binarized text image. In detail, the specific implementation steps of the preprocessing operation are as follows:
d. Noise reduction:
The invention uses the adaptive image noise reduction filter to denoise the text image, so that salt-and-pepper noise is filtered out while the details of the text image are largely preserved. Salt-and-pepper noise consists of randomly occurring white or black points in the image, and the adaptive image noise reduction filter is a signal extractor that recovers the original signal from the noise-contaminated signal.
In the preferred embodiment of the invention, the text image is denoted f(x, y). Under the effect of the degradation function H and the influence of the salt-and-pepper noise η(x, y), a degraded image g(x, y) is obtained, which gives the image degradation formula g(x, y) = η(x, y) + f(x, y). The text image is denoised with an adaptive filter, and the noise reduction formula is:
$$\hat{f}(x,y)=g(x,y)-\frac{\sigma_\eta^2}{\sigma_L^2}\left[g(x,y)-m_L\right]$$
wherein $\sigma_\eta^2$ is the noise variance of the text image, $m_L$ is the mean gray level of the pixels in a window around point (x, y), and $\sigma_L^2$ is the variance of the pixel gray levels within that window.
e. Contrast enhancement:
Contrast refers to the difference between the maximum and minimum brightness values in an imaging system; low contrast makes image processing more difficult. The preferred embodiment of the invention adopts a contrast stretching method, which enhances the contrast of the text image by widening the dynamic range of its gray levels. Contrast stretching is also called gray stretching.
Furthermore, the invention performs gray stretching on a specific region according to the piecewise linear transformation function of the contrast stretching method, thereby further improving the contrast of the output image. Contrast stretching is essentially a gray value transformation. The invention realizes the gray value transformation by linear stretching, a pixel-level operation in which the output gray value is a linear function of the input gray value. The gray transformation formula is:
$$D_b = f(D_a) = a \cdot D_a + b$$
where a is the linear slope, b is the intercept on the Y axis, $D_a$ represents the gray value of the input image, and $D_b$ represents the gray value of the output image. When a > 1, the contrast of the output image is enhanced compared with the original image. When a < 1, the contrast of the output image is weakened compared with the original image.
f. Image thresholding operations:
The invention binarizes the contrast-enhanced text image with the OTSU algorithm, an efficient binarization algorithm, to obtain a binarized image. In the preferred embodiment of the present invention, let the gray level t be the segmentation threshold between the foreground and the background of the contrast-enhanced text image, let the foreground points make up a proportion $w_0$ of the image with average gray level $u_0$, and let the background points make up a proportion $w_1$ of the image with average gray level $u_1$. The total average gray level of the contrast-enhanced text image is then:

$$u = w_0 u_0 + w_1 u_1$$

The variance between the foreground and the background of the contrast-enhanced text image is:

$$g = w_0 (u_0 - u)^2 + w_1 (u_1 - u)^2 = w_0 w_1 (u_0 - u_1)^2$$

When the variance g is maximal, the difference between the foreground and the background is largest, and the gray level t is the optimal threshold. Gray values greater than t in the contrast-enhanced text image are set to 255 and gray values less than t are set to 0, giving the binarized text image of the contrast-enhanced text image.
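The threshold search can be sketched as follows in plain Python. The code scans every candidate gray level t and keeps the one that maximizes the between-class variance $w_0 w_1 (u_0 - u_1)^2$. The function names are hypothetical, and two details are assumptions of this sketch: pixels are integers in 0 to 255, and pixels exactly equal to t are assigned to 0, a case the text does not specify.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) maximizing w_low * w_high * (u_low - u_high)^2."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_g = 0, -1.0
    n_low, sum_low = 0, 0
    for t in range(256):
        # Running count and gray-level sum of the class with values <= t.
        n_low += hist[t]
        sum_low += t * hist[t]
        n_high = total - n_low
        if n_low == 0 or n_high == 0:
            continue  # one class is empty, so the variance is undefined
        w_low, w_high = n_low / total, n_high / total
        u_low, u_high = sum_low / n_low, (sum_all - sum_low) / n_high
        g = w_low * w_high * (u_low - u_high) ** 2
        if g > best_g:
            best_t, best_g = t, g
    return best_t


def binarize(pixels, t):
    """Set values above t to 255 and the rest to 0 (ties at t go to 0 by assumption)."""
    return [255 if p > t else 0 for p in pixels]
```

The formula is symmetric in the two classes, so it does not matter which of them is called foreground when searching for t.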
Further, the preprocessing operation of the present invention may also include reducing the dimensionality of the binarized text image by principal component analysis, so that the binarized text image can be processed more efficiently. Principal component analysis converts a set of possibly correlated variables into a set of linearly uncorrelated variables through an orthogonal transformation.
Step two, detecting the deflected text in the binarized text image through an iterative algorithm to obtain a deflected text image, and cutting the deflected text image to obtain a binary copy image.
The preferred embodiment of the invention detects the deflected text in the binarized text image with the AdaBoost iterative algorithm to obtain the deflected text image. AdaBoost is an iterative detection algorithm: weak classifiers are trained on different training sets, and the weak classifiers are then combined into a final strong classifier. The algorithm works by adjusting the data distribution: the weight of each sample is set according to whether that sample was classified correctly in the current training round and according to the overall classification accuracy of the previous round. The re-weighted data set is passed to the next classifier for training, and the classifiers obtained in each round are combined to form the final decision classifier.
The invention divides the binarized text image into regions to obtain training samples $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where negative samples (background) are labeled $y_i = 0$ and positive samples (foreground, i.e. containing skewed text) are labeled $y_i = 1$. Preferably, the weak classifier constructed by the invention is:

$$h(x, f, p, \theta) = \begin{cases} 1, & p\,f(x) < p\,\theta \\ 0, & \text{otherwise} \end{cases}$$

where f is a feature, θ is a threshold, p indicates the direction of the inequality sign, and x is a detection sub-window. From the constructed weak classifiers, the optimal weak classifier $h_t(x)$ with the minimum classification error rate $\varepsilon_t$ is selected, where $\varepsilon_t$ is calculated as:

$$\varepsilon_t = \min_{f, p, \theta} \sum_i \frac{w_i}{\sum_k w_k}\,\bigl|h(x_i, f, p, \theta) - y_i\bigr|$$

where $w_i$ is the weight of the i-th training sample. The final strong classifier is then obtained:

$$C(x) = \begin{cases} 1, & \sum_t \alpha_t h_t(x) \ge \tfrac{1}{2} \sum_t \alpha_t \\ 0, & \text{otherwise} \end{cases}, \qquad \alpha_t = \log\frac{1}{\beta_t}, \qquad \beta_t = \frac{\varepsilon_t}{1 - \varepsilon_t}.$$
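The training loop can be sketched in plain Python under stated assumptions. Each sample is a list of numeric feature values, the weak classifier is a threshold test on one feature (output 1 when `p * x[f] < p * theta`), candidate thresholds are midpoints between observed feature values, and the strong classifier is a weighted vote with weights `log(1 / beta_t)` compared against half their sum, as in the usual Viola-Jones style formulation. All function names and the choice of candidate thresholds are hypothetical; the text does not specify them.

```python
import math


def stump(x, f, p, theta):
    """Weak classifier: 1 if p * x[f] < p * theta, else 0."""
    return 1 if p * x[f] < p * theta else 0


def train_adaboost(samples, labels, rounds):
    """Return a list of (alpha, f, p, theta) tuples forming the strong classifier.

    Assumes at least one feature takes two or more distinct values.
    """
    n = len(samples)
    w = [1.0 / n] * n
    strong = []
    for _ in range(rounds):
        total = sum(w)
        w = [wi / total for wi in w]  # normalize the sample weights
        best = None
        for f in range(len(samples[0])):
            values = sorted({s[f] for s in samples})
            for lo, hi in zip(values, values[1:]):
                theta = (lo + hi) / 2  # assumed candidate threshold
                for p in (1, -1):
                    err = sum(wi for wi, s, y in zip(w, samples, labels)
                              if stump(s, f, p, theta) != y)
                    if best is None or err < best[0]:
                        best = (err, f, p, theta)
        err, f, p, theta = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero and log(0)
        beta = err / (1 - err)
        strong.append((math.log(1 / beta), f, p, theta))
        # Correctly classified samples are down-weighted by beta.
        w = [wi * (beta if stump(s, f, p, theta) == y else 1.0)
             for wi, s, y in zip(w, samples, labels)]
    return strong


def classify(strong, x):
    """Strong classifier: weighted vote compared against half the total weight."""
    score = sum(alpha * stump(x, f, p, theta) for alpha, f, p, theta in strong)
    return 1 if score >= 0.5 * sum(m[0] for m in strong) else 0
```

In the patent's setting each sample would be a detection sub-window of the binarized image and each feature a value computed from it; here the samples are plain lists of numbers to keep the sketch short.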
Further, the method detects the skewed text in the binarized text image with a cascade classifier. The cascade classifier chains the trained strong classifiers into a text detection cascade, which is a degenerate decision tree. In the cascade, the layer-2 classifier is triggered only by samples that layer 1 classified as positive, the layer-3 classifier only by samples that layer 2 classified as positive, and so on. In this way all skewed text images in the binarized text image are detected under general conditions, and the skewed text images are cut out to obtain the binary copy image.
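The cascade logic can be sketched in a few lines of Python. Each stage is any callable that returns 1 (positive) or 0 (negative) for a window, and a window is accepted only if every stage accepts it. The name `cascade_classify` and the toy stages in the usage note are hypothetical.

```python
def cascade_classify(stages, window):
    """Run the stages in order and stop at the first rejection.

    stages : list of callables, each returning 1 (positive) or 0 (negative)
    window : the candidate region handed to every stage
    """
    for stage in stages:
        if stage(window) == 0:
            return 0  # rejected: later stages are never evaluated
    return 1  # every stage classified the window as positive
```

For example, with `stages = [lambda w: 1 if sum(w) > 2 else 0, lambda w: 1 if max(w) > 5 else 0]`, the window `[1, 1, 1, 9]` passes both stages, while `[0, 0, 1]` is rejected by the first stage without the second being run.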
Step three, carrying out progressive rotation on the binary copy image, converting the binary copy image subjected to progressive rotation into a frequency projection histogram, and obtaining a frequency projection histogram set of the binary copy image according to the progressive rotation angle of the binary copy image.
In the preferred embodiment of the invention, the binary copy image is rotated progressively by a preset angle. Preferably, the binary copy image is rotated in steps of 2 degrees between -45 degrees and 45 degrees, and after each rotation the numbers of pixels along the length and width of the binary copy image are calculated.
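A sketch of the progressive rotation in plain Python, assuming nearest-neighbour sampling about the image centre and an output of the same size as the input, with pixels that fall outside the source set to 0. These choices, the sign convention of the angle, and the function names are assumptions; the text only fixes the range of -45 to 45 degrees and the 2 degree step.

```python
import math


def rotate_binary(image, angle_deg):
    """Rotate a 2D list about its centre using nearest-neighbour sampling."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find the source pixel for each output pixel.
            dx, dy = x - cx, y - cy
            sx = round(cx + cos_t * dx + sin_t * dy)
            sy = round(cy - sin_t * dx + cos_t * dy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out


def progressive_rotations(image, start=-45, stop=45, step=2):
    """Return {angle: rotated image} for every angle from start to stop inclusive."""
    return {a: rotate_binary(image, a) for a in range(start, stop + 1, step)}
```

With the default arguments this produces 46 rotated copies, one for each angle -45, -43, ..., 45.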
Further, the progressively rotated binary copy image is converted into a frequency projection histogram by a Fourier transform algorithm. In detail, the transform is the two-dimensional discrete Fourier transform of the binary copy image, in which u = 0, 1, 2, 3, ..., M-1; v = 0, 1, 2, 3, ..., N-1; x = 0, 1, 2, 3, ..., M-1; y = 0, 1, 2, 3, ..., N-1; M and N are the numbers of pixels along the length and width of the binary copy image; x and y are spatial coordinates; f(x, y) is the spatial-domain sample value of the binary copy image; F(u, v) is the Fourier-transform-domain sample value of the binary copy image; and u and v are transform-domain coordinates. When the binary copy image is a square matrix, M = N. F(u, v) is called the spectrum of the binary copy image signal f(x, y). The amplitude spectrum and the phase spectrum of the binary copy image after the Fourier transform are calculated respectively as:

$$|F(u, v)| = \sqrt{R^2(u, v) + I^2(u, v)}, \qquad \varphi(u, v) = \arctan\frac{I(u, v)}{R(u, v)}$$

where $F(u, v) = R(u, v) + jI(u, v) = |F(u, v)|\,e^{j\varphi(u, v)}$, $|F(u, v)|$ is the amplitude spectrum of the binary copy image, and $\varphi(u, v)$ is its phase spectrum.
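The transform and the two spectra can be sketched directly in plain Python. The transform formula itself did not survive in this text, so the sketch assumes the common unnormalized forward transform, the double sum of $f(x, y)\,e^{-j 2\pi (ux/M + vy/N)}$ over x and y; the original may include a normalization factor such as 1/(MN). The function names are hypothetical, and the direct double sum is only practical for small images.

```python
import cmath
import math


def dft2(f):
    """Unnormalized 2D DFT of an M x N list of rows (normalization is an assumption)."""
    M, N = len(f), len(f[0])
    F = [[0j] * N for _ in range(M)]
    for u in range(M):
        for v in range(N):
            F[u][v] = sum(
                f[x][y] * cmath.exp(-2j * math.pi * (u * x / M + v * y / N))
                for x in range(M) for y in range(N)
            )
    return F


def amplitude_and_phase(F):
    """Return (|F(u, v)|, phi(u, v)) computed from the real and imaginary parts."""
    amplitude = [[abs(c) for c in row] for row in F]
    # atan2 is used instead of arctan(I / R) so that R = 0 is handled.
    phase = [[math.atan2(c.imag, c.real) for c in row] for row in F]
    return amplitude, phase
```

A constant image concentrates all of its energy in the zero-frequency term F(0, 0), which makes a convenient sanity check for the sketch.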
Further, the invention constructs the frequency projection histogram according to the calculated amplitude spectrum and phase spectrum of the binary copy image, and obtains different frequency projection histograms according to different progressive rotation angles of the binary copy image, namely a frequency projection histogram set of the binary copy image.
Step four, calculating the standard deviation of the peak points and valley points in each frequency projection histogram to obtain a standard deviation set, and taking the rotation angle corresponding to the maximum standard deviation in the standard deviation set as the correction angle of the text image, thereby completing the angle correction of the text image.
In a preferred embodiment of the present invention, the standard deviation of the peak points and valley points in the frequency projection histogram set is calculated in terms of the following quantities: $\sigma$ is the standard deviation of a frequency projection histogram, $x_i$ is the i-th peak point in the frequency projection histogram, n is the number of peak points, $y_j$ is the j-th valley point, m is the number of valley points, and $\mu$ is the mean of all peak points and valley points. The standard deviation reflects the degree of dispersion between the valley points and the peak points.
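One way to compute this quantity is sketched below in plain Python. The exact formula is not reproduced in this text, so the sketch assumes the population form: the square root of the sum of squared deviations of all peak and valley values from their common mean, divided by n + m. Treating strict local maxima and minima of the histogram as peak and valley points, and the function names, are also assumptions.

```python
import math


def peaks_and_valleys(hist):
    """Return (peak values, valley values) at strict local extrema of hist."""
    peaks, valleys = [], []
    for i in range(1, len(hist) - 1):
        if hist[i] > hist[i - 1] and hist[i] > hist[i + 1]:
            peaks.append(hist[i])
        elif hist[i] < hist[i - 1] and hist[i] < hist[i + 1]:
            valleys.append(hist[i])
    return peaks, valleys


def peak_valley_std(hist):
    """Standard deviation of all peak and valley values about their common mean."""
    peaks, valleys = peaks_and_valleys(hist)
    values = peaks + valleys
    if not values:
        return 0.0  # no extrema, for example a flat histogram
    mu = sum(values) / len(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
```

A histogram with sharply alternating high and low bins yields a large value, while a nearly flat histogram yields a value close to zero.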
Further, the invention calculates the standard deviation of every histogram in the frequency projection histogram set to obtain a standard deviation set. According to the structural characteristics of text images, the rotation at which the standard deviation is maximal gives the optimal orientation of the corrected text image; the corresponding rotation angle is taken as the correction angle of the text image, and the original image is rotated by this correction angle to correct it.
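The selection step amounts to an argmax over the rotation angles, sketched below in Python. The function `select_correction_angle` and its `score` parameter are hypothetical; in the usage note, `statistics.pstdev` over all histogram bins stands in for the peak and valley standard deviation purely to keep the example short.

```python
import statistics


def select_correction_angle(histograms_by_angle, score):
    """Score every histogram and return (best angle, {angle: score}).

    histograms_by_angle : dict mapping rotation angle to its histogram
    score               : callable giving the dispersion measure of a histogram
    """
    scores = {angle: score(hist) for angle, hist in histograms_by_angle.items()}
    best_angle = max(scores, key=scores.get)
    return best_angle, scores
```

For example, `select_correction_angle({-2: [5, 5, 5, 5], 0: [0, 10, 0, 10], 2: [3, 7, 3, 7]}, statistics.pstdev)` picks angle 0, whose histogram alternates most strongly between high and low values.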
Alternatively, in other embodiments, the text image angle correcting program may be further divided into one or more modules, where one or more modules are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to perform the present invention, and the modules referred to herein are a series of instruction segments of a computer program capable of performing a specific function for describing the execution of the text image angle correcting program in the text image angle correcting device.
For example, referring to fig. 3, a schematic program module of a text image angle correction program in an embodiment of the text image angle correction apparatus according to the present invention is shown, where the text image angle correction program may be divided into a text image preprocessing module 10, a text image detection module 20, an image conversion module 30, and a calculation module 40, and the text image angle correction program is exemplified as follows:
The text image preprocessing module 10 is used for: acquiring a text image, and preprocessing the text image to obtain a binarized text image.
The text image detection module 20 is configured to: detect the deflected text in the binarized text image through an iterative algorithm to obtain a deflected text image, and cut the deflected text image to obtain a binary copy image.
The image conversion module 30 is configured to: carry out progressive rotation on the binary copy image, convert the binary copy image subjected to progressive rotation into a frequency projection histogram, and obtain a frequency projection histogram set of the binary copy image according to the progressive rotation angle of the binary copy image.
The calculation module 40 is configured to: calculate the standard deviation of the peak points and valley points of the frequency projection histogram set to obtain a standard deviation set, and take the maximum standard deviation in the standard deviation set as a correction angle of the text image, thereby finishing angle correction of the text image.
The functions or operation steps implemented when the program modules such as the text image preprocessing module 10, the text image detection module 20, the image conversion module 30, and the calculation module 40 are executed are substantially the same as those of the above-described embodiments, and will not be described herein.
In addition, an embodiment of the present invention further provides a computer readable storage medium, where a text image angle deviation correcting program is stored, where the text image angle deviation correcting program can be executed by one or more processors to implement the following operations:
acquiring a text image, and performing preprocessing operation on the text image to obtain a binarized text image;
detecting a deflected text in the binarized text image through an iterative algorithm to obtain a deflected text image, and cutting the deflected text image to obtain a binary copy image;
performing progressive rotation on the binary copy image, converting the binary copy image subjected to progressive rotation into a frequency projection histogram, and obtaining a frequency projection histogram set of the binary copy image according to the progressive rotation angle of the binary copy image;
calculating standard deviation of peak top and peak valley point of the frequency projection histogram set to obtain a standard deviation set, and taking the maximum standard deviation in the standard deviation set as a correction angle of the text image, thereby finishing angle correction of the text image.
The computer-readable storage medium of the present invention is substantially the same as the above-described embodiments of the text image angle deviation correcting device and method, and will not be described in detail herein.
It should be noted that, the foregoing reference numerals of the embodiments of the present invention are merely for describing the embodiments, and do not represent the advantages and disadvantages of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, apparatus, article or method that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (7)

1. A text image angle correction method, the method comprising:
acquiring a text image, and performing preprocessing operation on the text image to obtain a binarized text image;
detecting a deflected text in the binarized text image through an iterative algorithm to obtain a deflected text image, and cutting the deflected text image to obtain a binary copy image;
performing progressive rotation on the binary copy image, converting the binary copy image subjected to progressive rotation into a frequency projection histogram, and obtaining a frequency projection histogram set of the binary copy image according to the progressive rotation angle of the binary copy image;
calculating standard deviation of peak top points and peak valley points of the frequency projection histogram set to obtain a standard deviation set, and taking the maximum standard deviation in the standard deviation set as a correction angle of the text image so as to finish angle correction of the text image;
the converting the binary copy image after progressive rotation into a frequency projection histogram includes: performing Fourier transform on the binary copy image subjected to progressive rotation; calculating the amplitude spectrum and the phase spectrum of the binary copy image after Fourier transformation; constructing the frequency projection histogram according to the magnitude spectrum and the phase spectrum;
the standard deviation of the peak points and valley points of the frequency projection histogram set is calculated in terms of the following quantities:
$\sigma$ represents the standard deviation of a frequency projection histogram, $x_i$ represents the i-th peak point in the frequency projection histogram, n represents the number of peak points in the frequency projection histogram, $y_j$ represents the j-th valley point in the frequency projection histogram, m represents the number of valley points in the frequency projection histogram, and $\mu$ is the mean of all peak points and valley points.
2. The text image angle correction method of claim 1, wherein said preprocessing the text image to obtain a binarized text image comprises:
denoising the text image by using an adaptive image denoising filter, performing contrast enhancement on the denoised text image by contrast stretching, and performing a thresholding operation on the contrast-enhanced text image according to the OTSU algorithm to obtain the binarized text image.
3. The text image angle correction method of claim 1, wherein the Fourier transform is the two-dimensional discrete Fourier transform of the binary copy image,
wherein u = 0, 1, 2, 3, ..., M-1; v = 0, 1, 2, 3, ..., N-1; x = 0, 1, 2, 3, ..., M-1; y = 0, 1, 2, 3, ..., N-1; M and N are the numbers of pixels along the length and width of the binary copy image, x and y are spatial coordinates, f(x, y) is the spatial-domain sample value of the binary copy image, F(u, v) is the Fourier-transform-domain sample value of the binary copy image, and u and v are transform-domain coordinates.
4. A text image angle correcting device, characterized in that the device comprises a memory and a processor, wherein the memory stores a text image angle deviation correcting program which can be run on the processor, and the text image angle deviation correcting program realizes the following steps when being executed by the processor:
acquiring a text image, and performing preprocessing operation on the text image to obtain a binarized text image;
detecting a deflected text in the binarized text image through an iterative algorithm to obtain a deflected text image, and cutting the deflected text image to obtain a binary copy image;
performing progressive rotation on the binary copy image, converting the binary copy image subjected to progressive rotation into a frequency projection histogram, and obtaining a frequency projection histogram set of the binary copy image according to the progressive rotation angle of the binary copy image;
calculating standard deviation of peak top points and peak valley points of the frequency projection histogram set to obtain a standard deviation set, and taking the maximum standard deviation in the standard deviation set as a correction angle of the text image so as to finish angle correction of the text image;
the converting the binary copy image after progressive rotation into a frequency projection histogram includes: performing Fourier transform on the binary copy image subjected to progressive rotation; calculating the amplitude spectrum and the phase spectrum of the binary copy image after Fourier transformation; constructing the frequency projection histogram according to the magnitude spectrum and the phase spectrum;
the standard deviation of the peak points and valley points of the frequency projection histogram set is calculated in terms of the following quantities:
$\sigma$ represents the standard deviation of a frequency projection histogram, $x_i$ represents the i-th peak point in the frequency projection histogram, n represents the number of peak points in the frequency projection histogram, $y_j$ represents the j-th valley point in the frequency projection histogram, m represents the number of valley points in the frequency projection histogram, and $\mu$ is the mean of all peak points and valley points.
5. The text image angle correcting apparatus according to claim 4, wherein the preprocessing operation for the text image to obtain a binarized text image comprises:
denoising the text image by using an adaptive image denoising filter, performing contrast enhancement on the denoised text image by contrast stretching, and performing a thresholding operation on the contrast-enhanced text image according to the OTSU algorithm to obtain the binarized text image.
6. The text image angle correcting apparatus as claimed in claim 4, wherein the Fourier transform is the two-dimensional discrete Fourier transform of the binary copy image,
wherein u = 0, 1, 2, 3, ..., M-1; v = 0, 1, 2, 3, ..., N-1; x = 0, 1, 2, 3, ..., M-1; y = 0, 1, 2, 3, ..., N-1; M and N are the numbers of pixels along the length and width of the binary copy image, x and y are spatial coordinates, f(x, y) is the spatial-domain sample value of the binary copy image, F(u, v) is the Fourier-transform-domain sample value of the binary copy image, and u and v are transform-domain coordinates.
7. A computer-readable storage medium having stored thereon a text image angle rectification program executable by one or more processors to implement the steps of the text image angle rectification method of any one of claims 1 to 3.
CN201910846892.XA 2019-09-06 2019-09-06 Text image angle deviation correcting method and device and computer readable storage medium Active CN110705546B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910846892.XA CN110705546B (en) 2019-09-06 2019-09-06 Text image angle deviation correcting method and device and computer readable storage medium
PCT/CN2019/116549 WO2021042509A1 (en) 2019-09-06 2019-11-08 Method and apparatus for rectifying deflection of angle of text image, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910846892.XA CN110705546B (en) 2019-09-06 2019-09-06 Text image angle deviation correcting method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110705546A CN110705546A (en) 2020-01-17
CN110705546B true CN110705546B (en) 2023-12-19

Family

ID=69196126

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910846892.XA Active CN110705546B (en) 2019-09-06 2019-09-06 Text image angle deviation correcting method and device and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN110705546B (en)
WO (1) WO2021042509A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011215828A (en) * 2010-03-31 2011-10-27 Canon Inc Image correction apparatus, and method of controlling the same
CN107992869A (en) * 2016-10-26 2018-05-04 深圳超多维科技有限公司 For tilting the method, apparatus and electronic equipment of word correction
WO2019056346A1 (en) * 2017-09-25 2019-03-28 深圳传音通讯有限公司 Method and device for correcting tilted text image using expansion method

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4817176A (en) * 1986-02-14 1989-03-28 William F. McWhortor Method and apparatus for pattern recognition
US8160393B2 (en) * 2008-09-18 2012-04-17 Certifi Media Inc. Method for image skew detection
KR101566196B1 (en) * 2009-03-02 2015-11-05 삼성전자주식회사 Method and apparatus for classifying an image using histogram analysis and method and apparatus for recognizing text image using thereof
CN103761700A (en) * 2013-12-23 2014-04-30 南京信息工程大学 Watermark method capable of resisting printing scanning attack and based on character refinement
US9621761B1 (en) * 2015-10-08 2017-04-11 International Business Machines Corporation Automatic correction of skewing of digital images
CN108121983A (en) * 2016-11-29 2018-06-05 蓝盾信息安全技术有限公司 A kind of text image method for correcting error based on Fourier transformation
CN107480728B (en) * 2017-08-28 2019-02-26 南京大学 A kind of discrimination method of the mimeograph documents based on Fourier's residual values
CN109409356B (en) * 2018-08-23 2021-01-08 浙江理工大学 Multi-direction Chinese print font character detection method based on SWT

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
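Research on a partial differential equation text image layout detection algorithm based on the projection histogram method; Wang Long; Journal of Jiamusi Vocational Institute (No. 02); pp. 271-273 *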

Also Published As

Publication number Publication date
CN110705546A (en) 2020-01-17
WO2021042509A1 (en) 2021-03-11

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40019637; Country of ref document: HK)
SE01 Entry into force of request for substantive examination
GR01 Patent grant