CN106909897B

CN106909897B - Text image inversion rapid detection method

Info

Publication number: CN106909897B
Application number: CN201710090240.9A
Authority: CN
Inventors: 王建; 庞彦伟
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2017-02-20
Filing date: 2017-02-20
Publication date: 2020-03-13
Anticipated expiration: 2037-02-20
Also published as: CN106909897A

Abstract

The invention relates to a rapid method for detecting inversion of a text image. The method comprises: preprocessing an input text image to obtain a binarized result B; detecting effective text lines to obtain an effective text line sequence; classifying the text lines by 1) filling the gaps between adjacent characters of each effective text line s of the sequence, 2) calculating the vertical projection of each effective text line s, denoted V(c), where c is the column index, 3) obtaining the left and right boundaries of each effective text line s, 4) obtaining the left and right boundaries of the effective text line sequence, and 5) classifying each line as a left-indented, right-indented, or non-indented text line; and finally detecting whether the text image is inverted.

Description

Text image inversion rapid detection method

Technical Field

The invention relates to text image enhancement technology, and in particular to a technique for detecting orientation inversion in scanned text images.

Background

As computer technology continues to develop, text image digitization based on OCR (optical character recognition) is widely used. In the OCR process, the orientation of the text in the image is critical to recognition performance. If skewed characters are not corrected, the recognition rate suffers seriously, especially when the text is inverted (i.e., rotated by about 180° from the normal orientation). Therefore, before OCR is performed, it is necessary to determine whether the text image is inverted and, if it is, to rotate it first so that the subsequent recognition proceeds normally.

Skewed text images can be detected and corrected with existing deskewing algorithms. However, most existing skew correction methods assume that the skew angle of the input text image lies within a limited range: they first estimate the skew angle and then correct it accordingly. When the input text image is completely inverted, these skew-angle detection methods are essentially ineffective. Parade et al. proposed a method for rapidly detecting text image inversion based on punctuation marks: text characters are detected first; then, using the structural characteristics of Chinese characters and punctuation marks, the punctuation marks in the text image are screened out and their types are determined from their pixel distributions; finally, whether the Chinese text image is inverted is judged from the usage conventions of punctuation marks. Cinnarization et al. (patent publication No. CN102831421A) proposed a method for detecting the up-down orientation of text based on punctuation marks, which determines the text orientation from the positions of punctuation marks relative to the text lines; its basic idea is similar to that of the method of Parade et al. Because punctuation-based methods depend entirely on punctuation features, they fail on text images that contain few punctuation marks, so their range of application is limited and they lack generality.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art and to provide a method for rapidly detecting orientation inversion of text images. The technical scheme is as follows:

A rapid text image inversion detection method comprises the following steps:

Step 1: preprocess the input text image to obtain a binarized result B;

Step 2: detect effective text lines to obtain an effective text line sequence;

Step 3: classify the text lines, as follows:

1) for each effective text line s of the effective text line sequence, perform a dilation with a rectangular structuring element to fill the gaps between adjacent characters of s;

2) calculate the vertical projection of each effective text line s, denoted V(c), where c is the column index;

3) find the columns c satisfying $V(c) > 0.5 \times R_{hei}(s)$; the minimum such c is denoted $c_{min}$ and called the left boundary of the effective text line s, and the maximum is denoted $c_{max}$ and called its right boundary; the length of the line is $R_{leg} = c_{max} - c_{min}$;

4) collect $c_{min}(m)$ and $c_{max}(m)$ for every effective text line m in the same effective text line sequence; the minimum of $c_{min}(m)$ is called the left boundary of the effective text line sequence, denoted $c_{lef}$, and the maximum of $c_{max}(m)$ is called its right boundary, denoted $c_{rgt}$;

5) an effective text line m is judged a left-indented text line if $0.6 < |c_{min}(m) - c_{lef}| / |c_{max}(m) - c_{min}(m)| < 0.9$, and a right-indented text line if $0.6 < |c_{rgt} - c_{max}(m)| / (c_{max}(m) - c_{min}(m)) < 0.9$; if neither condition holds, it is judged a non-indented text line;

Step 4: detect inversion of the text image, as follows:

count the left-indented and right-indented text lines in a single text image, denoted $N_{lef}$ and $N_{rgt}$, respectively, and determine whether the text image is inverted using:

Preferably, the second step is carried out as follows:

1) calculate the horizontal projection of each row of B, denoted H(r), where r is the row index;

2) calculate the maximum of H(r), denoted $H_{max}$;

3) the r-th scan line is judged a valid scan line if $H(r) > 0.5 \times H_{max}$;

4) examine the distribution of the valid scan lines: if m consecutive scan lines are all judged valid and $m > M/100$, where M is the height of the text image, these m consecutive valid scan lines form an effective text line sequence;

the row indices of the uppermost and lowermost valid scan lines in the effective text line sequence are denoted $R_{top}(s)$ and $R_{bot}(s)$, which represent its upper and lower boundaries, and the height of the sequence is defined as $R_{hei}(s) = |R_{top}(s) - R_{bot}(s)|$, where $|\cdot|$ denotes the absolute value and s is the index of the effective text line sequence.

Drawings

FIG. 1 is a flow chart of the method of the present invention

FIG. 2 is a schematic diagram of important definitions used in the present invention

FIG. 3 is a schematic diagram of the type of text line defined by the present invention

FIG. 4 is a plot of the numbers of left-indented and right-indented text lines in the text images used in the experiments of the present invention

Detailed Description

First, the input color text image undergoes preprocessing operations such as graying, bilateral filtering, contrast enhancement, and binarization to improve the visual quality of the document image. Next, the valid text lines in the text image are detected by horizontal projection analysis and classified according to their positions and lengths. Finally, whether the text image is inverted is determined from the relative numbers of left-indented and right-indented text lines. Fig. 1 shows a block diagram of the proposed method.

Some useful definitions are given first. A text image consists of several paragraphs, and within each paragraph the font, format, and other characteristics of the characters are essentially consistent. The leftmost and rightmost positions at which characters can appear in a paragraph are called the "paragraph left boundary" and the "paragraph right boundary", respectively; FIG. 2 illustrates them. Each paragraph contains one or more text lines. For any text line, the left side of its leftmost character and the right side of its rightmost character are called the "line left boundary" and the "line right boundary" of that line, respectively; FIG. 2 also illustrates these line boundaries.

If the left and right boundaries of a text line essentially coincide with those of the paragraph to which it belongs, the line is called a "complete text line". If the left boundary of a text line lies 2-4 characters to the right of the paragraph left boundary while its right boundary essentially coincides with the paragraph right boundary, the line is called a "left-indented text line". If the right boundary of a text line lies 2-4 characters to the left of the paragraph right boundary while its left boundary essentially coincides with the paragraph left boundary, the line is called a "right-indented text line". FIG. 3 illustrates the three types of text line.

According to Chinese and English writing conventions, the first line of each paragraph is usually indented 2-4 characters to the right, so every paragraph containing two or more text lines includes a left-indented text line. If the text image is upright, several left-indented text lines can therefore be detected. Conversely, if the text image is inverted, a 180° rotation turns each indented first line into a line whose right end falls short of the paragraph right boundary, so several right-indented text lines are detected instead. The invention determines whether the text image is inverted from the relative numbers of left-indented and right-indented text lines detected in it.
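The effect of a 180° rotation on indentation can be checked on a toy example. The NumPy sketch below records the first and last ink column of each row before and after rotating the page with `np.rot90(page, 2)`; the helper `line_extents` and the 3×12 toy paragraph are hypothetical illustrations, not part of the source.

```python
import numpy as np

def line_extents(img):
    """Return (first, last) ink column for every row that contains ink."""
    extents = []
    for row in img:
        cols = np.flatnonzero(row)
        if cols.size:
            extents.append((int(cols[0]), int(cols[-1])))
    return extents

# Hypothetical 3-line paragraph (1 = ink); only the first line is indented.
page = np.zeros((3, 12), dtype=np.uint8)
page[0, 3:] = 1   # first line starts 3 columns in (left-indented)
page[1, :] = 1    # full line
page[2, :] = 1    # full line

upright = line_extents(page)
inverted = line_extents(np.rot90(page, 2))   # 180-degree rotation

print(upright)    # [(3, 11), (0, 11), (0, 11)]
print(inverted)   # [(0, 11), (0, 11), (0, 8)]
```

After rotation the indented line becomes the bottom row and stops 3 columns short of the right edge, i.e. it now looks like a right-indented line.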

The proposed method consists of the following processing stages: preprocessing, text line detection, text line classification, and text orientation inversion detection.

1. Preprocessing

The purpose of preprocessing is to improve the visual quality of the document image. It mainly comprises graying, smoothing filtering, contrast enhancement, and binarization.

(1) Graying:

Determine whether the input text image is a grayscale image; if so, keep it unchanged. If it is a color image, let $C_R$, $C_G$, and $C_B$ denote its red, green, and blue channels, respectively, and compute the grayscale image I using equation (1):

$$I(x,y) = \min\{C_R(x,y),\ C_G(x,y),\ C_B(x,y)\} \tag{1}$$

where $x = 0, 1, 2, \ldots, M-1$ and $y = 0, 1, 2, \ldots, N-1$, and M and N are the height and width of the text image, i.e., the total numbers of rows and columns, respectively.
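Equation (1) maps directly onto a per-pixel minimum over the color channels. The NumPy sketch below assumes the image is stored as an H×W×3 (or H×W×4) array with channels last; the function name `to_gray` is hypothetical. Because the minimum does not depend on channel order, it works for RGB or BGR arrays alike.

```python
import numpy as np

def to_gray(img):
    """Equation (1): I(x, y) = min{C_R(x, y), C_G(x, y), C_B(x, y)}.

    A 2-D (already grayscale) array is returned unchanged. A 3-D array is
    assumed to be H x W x C with color channels last; any 4th (alpha)
    channel is ignored.
    """
    img = np.asarray(img)
    if img.ndim == 2:
        return img
    return img[..., :3].min(axis=2)

rgb = np.array([[[200, 30, 90], [10, 250, 40]]], dtype=np.uint8)
print(to_gray(rgb))   # [[30 10]]
```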

(2) Smoothing filtering

Because text images are contaminated by noise during acquisition and digitization, the grayscale image I is filtered with a bilateral filter to reduce the influence of noise. The filtered image is denoted G.
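The source names bilateral filtering but gives no parameters. The brute-force NumPy sketch below shows what the filter computes; the function name, the window `radius`, and the two Gaussian widths `sigma_s` (spatial) and `sigma_r` (intensity) are illustrative assumptions, not values from the patent.

```python
import numpy as np

def bilateral_filter(I, radius=2, sigma_s=2.0, sigma_r=25.0):
    """Naive bilateral filter of a 2-D grayscale image (edge-replicated borders)."""
    I = np.asarray(I, dtype=np.float64)
    H, W = I.shape
    padded = np.pad(I, radius, mode="edge")
    num = np.zeros_like(I)   # weighted sum of neighbor values
    den = np.zeros_like(I)   # sum of weights (>= 1 thanks to the center pixel)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbor image: shifted[y, x] == I[y + dy, x + dx] (with edge replication)
            shifted = padded[radius + dy: radius + dy + H, radius + dx: radius + dx + W]
            spatial = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            rng = np.exp(-((shifted - I) ** 2) / (2.0 * sigma_r ** 2))
            w = spatial * rng
            num += w * shifted
            den += w
    return num / den
```

Each output pixel is a weighted mean of its neighbors, where a neighbor's weight shrinks with both its distance and its gray-level difference, so flat background is smoothed while sharp character strokes survive. For full-size scans, OpenCV's `cv2.bilateralFilter(src, d, sigmaColor, sigmaSpace)` computes this kind of filter far faster.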

(3) Contrast enhancement

Because of illumination and other factors, the contrast of the text image may be low, so the filtered image G is enhanced by histogram equalization; the result is denoted E.
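A minimal NumPy sketch of global histogram equalization for 8-bit images, using the common CDF-remapping formula. The function name is hypothetical, and the source does not specify which equalization variant was used, so results may differ slightly from other implementations.

```python
import numpy as np

def equalize_hist(G, levels=256):
    """Global histogram equalization of an 8-bit grayscale image."""
    G = np.clip(np.rint(G), 0, levels - 1).astype(np.int64)
    hist = np.bincount(G.ravel(), minlength=levels)
    cdf = np.cumsum(hist)
    cdf_min = cdf[cdf > 0][0]          # CDF value at the darkest occupied level
    total = G.size
    if total == cdf_min:               # single gray level: nothing to stretch
        return G.astype(np.uint8)
    # Map each level so the CDF becomes (approximately) uniform over [0, levels-1]
    lut = np.rint((cdf - cdf_min) / (total - cdf_min) * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(np.uint8)
    return lut[G]

print(equalize_hist(np.array([[50, 50], [60, 70]])))   # [[  0   0] [128 255]]
```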

(4) Binarization processing

The global threshold of E is computed with the classical Otsu method and denoted $T_h$. E is then binarized with $T_h$ according to equation (2), and the result is denoted B:

In B, points with value 1 are text points and points with value 0 are background points.
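Equation (2) is not reproduced in this text. The NumPy sketch below computes the Otsu threshold by maximizing the between-class variance, then binarizes under the assumption of dark text on a light background, i.e. B = 1 where E ≤ T_h. That comparison direction is an assumption, chosen so that text points take the value 1 as stated above; the function names are hypothetical.

```python
import numpy as np

def otsu_threshold(E, levels=256):
    """Return the threshold T_h that maximizes the between-class variance."""
    hist = np.bincount(np.asarray(E, dtype=np.int64).ravel(), minlength=levels)
    p = hist / hist.sum()
    omega = np.cumsum(p)                       # P(class 0) for threshold t
    mu = np.cumsum(p * np.arange(levels))      # cumulative mean of class 0
    mu_t = mu[-1]                              # global mean
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_t * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))

def binarize(E, T_h):
    """B = 1 for text, 0 for background (assumes dark text: E <= T_h is text)."""
    return (np.asarray(E) <= T_h).astype(np.uint8)

E = np.array([[20, 20, 200], [20, 200, 200]])
T_h = otsu_threshold(E)
print(binarize(E, T_h))   # [[1 1 0] [1 0 0]]
```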

2. Valid text line detection

Valid text line detection is accomplished using the following algorithm:

valid text line detection algorithm:

1) Calculate the horizontal projection of each row of B, denoted H(r), where r is the row index.

2) Calculate the maximum of H(r), denoted $H_{max}$.

3) The r-th scan line is judged a valid scan line if $H(r) > 0.5 \times H_{max}$.

4) Examine the distribution of the valid scan lines: if m consecutive scan lines are all judged valid and $m > M/100$, where M is the image height, these m consecutive valid scan lines form one valid text line.

5) The row indices of the uppermost and lowermost valid scan lines in the valid text line are denoted $R_{top}(s)$ and $R_{bot}(s)$, which represent the upper and lower boundaries of the text line, and the height of the text line is defined as $R_{hei}(s) = |R_{top}(s) - R_{bot}(s)|$, where $|\cdot|$ denotes the absolute value and s is the index of the valid text line.
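The detection algorithm above can be sketched in NumPy as follows. The function name `detect_text_lines` is hypothetical; H(r) is taken as the number of text pixels in row r of B, and a run of valid scan lines is kept only when its length m exceeds M/100.

```python
import numpy as np

def detect_text_lines(B):
    """Return (R_top, R_bot) row pairs of the valid text lines in binary image B."""
    B = np.asarray(B)
    M = B.shape[0]                     # image height = total number of scan lines
    H = B.sum(axis=1)                  # horizontal projection H(r)
    H_max = H.max()
    valid = H > 0.5 * H_max            # step 3: valid scan lines
    lines, start = [], None
    # Step 4: group consecutive valid scan lines; a False sentinel closes the last run.
    for r, v in enumerate(np.append(valid, False)):
        if v and start is None:
            start = r
        elif not v and start is not None:
            m = r - start              # number of consecutive valid scan lines
            if m > M / 100:
                lines.append((start, r - 1))
            start = None
    return lines
```

Each returned pair gives $R_{top}(s)$ and $R_{bot}(s)$ for one line, so $R_{hei}(s)$ is simply their difference.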

3. Text line classification

Text line classification is accomplished using the following algorithm:

text line classification algorithm:

1) For a valid text line s, apply a dilation with a rectangular structuring element to fill the gaps between adjacent characters. The structuring element is 2 pixels high and its width is 50% of the text line height.

2) Calculate the vertical projection of the text line, denoted V(c), where c is the column index.

3) Find the columns c satisfying $V(c) > 0.5 \times R_{hei}(s)$; the minimum such c is denoted $c_{min}$ and called the left boundary of the text line, and the maximum is denoted $c_{max}$ and called its right boundary; the length of the text line is $R_{leg} = c_{max} - c_{min}$.

4) Collect $c_{min}(m)$ and $c_{max}(m)$ for every valid text line m in the same paragraph; the minimum of $c_{min}(m)$ is called the left boundary of the paragraph, denoted $c_{lef}$, and the maximum of $c_{max}(m)$ is called the right boundary of the paragraph, denoted $c_{rgt}$.

5) A valid text line m is judged a "left-indented text line" if $0.6 < |c_{min}(m) - c_{lef}| / |c_{max}(m) - c_{min}(m)| < 0.9$, and a "right-indented text line" if $0.6 < |c_{rgt} - c_{max}(m)| / (c_{max}(m) - c_{min}(m)) < 0.9$; if neither condition holds, it is judged a "non-indented text line".
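A NumPy sketch of the classification steps above; the helper names are hypothetical. Two details are not fixed by the source and are assumptions here: the origin of the 2-pixel-high structuring element (placed at its lower row, so each row is ORed with the row above) and the order of the two tests in step 5 (the left-indent test is checked first in case both ratios fall in range).

```python
import numpy as np

def dilate_rect(img, kh, kw):
    """Binary dilation of img with a kh x kw rectangle of ones."""
    img = (np.asarray(img) > 0).astype(np.uint8)
    H, W = img.shape
    top, left = kh // 2, kw // 2            # assumed origin of the structuring element
    padded = np.pad(img, ((top, kh - 1 - top), (left, kw - 1 - left)))
    out = np.zeros((H, W), dtype=np.uint8)
    for dy in range(kh):
        for dx in range(kw):
            out = np.maximum(out, padded[dy:dy + H, dx:dx + W])
    return out

def line_boundaries(line):
    """Steps 1-3 for one text line (rows R_top..R_bot of B): return (c_min, c_max)."""
    line = np.asarray(line)
    R_hei = line.shape[0] - 1                 # |R_top - R_bot|
    kw = max(1, int(round(0.5 * R_hei)))      # width = 50% of the line height
    filled = dilate_rect(line, 2, kw)         # fill gaps between characters
    V = filled.sum(axis=0)                    # vertical projection V(c)
    cols = np.flatnonzero(V > 0.5 * R_hei)
    if cols.size == 0:
        return None
    return int(cols[0]), int(cols[-1])

def classify_lines(bounds):
    """Steps 4-5: label each (c_min, c_max) as 'left', 'right' or 'none'."""
    c_lef = min(c_min for c_min, _ in bounds)
    c_rgt = max(c_max for _, c_max in bounds)
    labels = []
    for c_min, c_max in bounds:
        R_leg = c_max - c_min
        if R_leg > 0 and 0.6 < abs(c_min - c_lef) / R_leg < 0.9:
            labels.append("left")
        elif R_leg > 0 and 0.6 < abs(c_rgt - c_max) / R_leg < 0.9:
            labels.append("right")
        else:
            labels.append("none")
    return labels
```

Dilation widens every line by about kw/2 columns on each side; since all lines are widened the same way, the relative offsets used in step 5 are barely affected.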

4. Text image inversion detection

The numbers of left-indented and right-indented text lines in a single text image are counted and denoted $N_{lef}$ and $N_{rgt}$, respectively. Whether the text image is inverted is determined using formula (3):

An example follows:

MATLAB 2015a on Windows 10 Professional was used as the experimental simulation platform; the hardware platform was an Intel i5-6200U CPU with 8 GB of memory.

The test set consists of 90 text images collected by the applicant, of which 78 are inverted and 12 are upright. Of the 90 images, 56 (62%) are Chinese text images and 34 (38%) are English text images. When the test images were processed with the proposed method, 100% of the inverted images were correctly detected. Fig. 4 shows the distribution of the numbers of left-indented and right-indented text lines in the 90 document images. For upright text images, the number of left-indented text lines is clearly greater than the number of right-indented text lines; conversely, for inverted text images, the number of right-indented text lines is greater than the number of left-indented text lines. The images thus fall clearly into two classes: inverted text images (marked "x" in the figure) and upright text images (marked "o" in the figure).
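Formula (3) itself is not reproduced in this text. Judging from the separation described for Fig. 4, one plausible decision rule is to report inversion when right-indented lines outnumber left-indented ones. The sketch below encodes that assumed rule; the function name is hypothetical, and treating ties as upright is also an assumption.

```python
def is_inverted(labels):
    """Assumed decision rule: inverted when N_rgt > N_lef (ties count as upright)."""
    N_lef = labels.count("left")
    N_rgt = labels.count("right")
    return N_rgt > N_lef

print(is_inverted(["left", "none", "left", "right"]))   # False
print(is_inverted(["right", "none", "right", "left"]))  # True
```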

The test images are 1944 × 2592 pixels, i.e., about 5 million pixels, and the average processing time per image is about 2300 ms. Implementing the algorithm in a compiled language with higher execution efficiency, such as C, would make processing faster and could meet real-time requirements.

The experimental results show that the method can quickly and effectively determine whether an input scanned text image is inverted, and that it can handle text images in various languages, including Chinese and English.

The steps of the invention are summarized as follows:

Step 1: Determine the type of the input scanned text image. If it is a grayscale image, keep it unchanged; if it is a color image, convert it to a grayscale image I using equation (1).

Step 2: Filter the grayscale image I with a bilateral filter; the result is denoted G.

Step 3: Enhance the filtered image G by histogram equalization; the result is denoted E.

Step 4: Compute the global threshold of the enhanced image E with the Otsu method and binarize E using equation (2); the result is denoted B.

Step 5: Detect the valid text lines in the scanned text image with the valid text line detection algorithm.

Step 6: Classify each valid text line with the text line classification algorithm to obtain the numbers of left-indented and right-indented text lines, $N_{lef}$ and $N_{rgt}$.

Step 7: Determine whether the scanned text image is inverted using formula (3).

Claims

1. A rapid text image inversion detection method, comprising the following steps:

the first step: binarizing an input text image to obtain a binarized result B;

the second step: detecting effective text lines to obtain an effective text line sequence, comprising:

2) calculating the maximum of H(r), denoted $H_{max}$;

4) examining the distribution of the valid scan lines: if m consecutive scan lines are all judged valid and $m > M/100$, where M is the height of the text image, i.e., the total number of scan lines, these m consecutive valid scan lines form an effective text line sequence;

the row indices of the uppermost and lowermost valid scan lines in the effective text line sequence are denoted $R_{top}(s)$ and $R_{bot}(s)$, which represent its upper and lower boundaries, and the height of the sequence is defined as $R_{hei}(s) = |R_{top}(s) - R_{bot}(s)|$, where $|\cdot|$ denotes the absolute value and s is the index of the effective text line;

3) finding the columns c satisfying $V(c) > 0.5 \times R_{hei}(s)$, where $R_{hei}(s)$ is the height of the effective text line s; the minimum such c is denoted $c_{min}(m)$ and called the left boundary of the effective text line s, and the maximum is denoted $c_{max}(m)$ and called its right boundary; the length of the line is $R_{leg} = c_{max}(m) - c_{min}(m)$;

4) collecting $c_{min}(m)$ and $c_{max}(m)$ for every effective text line in the same effective text line sequence; the minimum of $c_{min}(m)$ is called the left boundary of the effective text line sequence, denoted $c_{lef}$, and the maximum of $c_{max}(m)$ is called its right boundary, denoted $c_{rgt}$;