CN106909897B - Text image inversion rapid detection method

Info

Publication number
CN106909897B
CN106909897B CN201710090240.9A CN201710090240A CN106909897B CN 106909897 B CN106909897 B CN 106909897B CN 201710090240 A CN201710090240 A CN 201710090240A CN 106909897 B CN106909897 B CN 106909897B
Authority
CN
China
Legal status
Expired - Fee Related
Application number
CN201710090240.9A
Other languages
Chinese (zh)
Other versions
CN106909897A (en)
Inventor
王建
庞彦伟
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
2017-02-20
Filing date
2017-02-20
Publication date
2020-03-13
Application filed by Tianjin University

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/242 Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to a rapid method for detecting inversion of a text image. The method comprises: preprocessing an input text image to obtain a binarized result B; detecting valid text lines to obtain a valid text line sequence; classifying the text lines by 1) filling the gaps between adjacent characters of each valid text line s in the sequence, 2) computing the vertical projection V(c) of each valid text line s, where c is the column index, 3) obtaining the left and right boundaries of the valid text line s, 4) obtaining the left and right boundaries of the valid text line sequence, and 5) labeling each line as a left indented, right indented, or non-indented text line; and finally detecting whether the text image is inverted.

Description

Text image inversion rapid detection method
Technical Field
The invention relates to a text image enhancement technology, in particular to a direction inversion detection technology for a scanned text image.
Background
With the continuing development of computer technology, text image digitization based on OCR (optical character recognition) is widely used. In OCR, the orientation of the text in the image is critical to recognition performance. If skewed characters are not corrected, the recognition rate drops severely, especially when the text is inverted (i.e., rotated by about 180° from the normal orientation). Therefore, before OCR is performed, it is necessary to determine whether the text image is inverted and, if so, to rotate it first so that subsequent recognition proceeds normally.
Skewed text images can be detected and corrected with existing skew correction algorithms. However, most existing methods assume that the skew of the input image lies within a limited range: they first estimate the skew angle and then correct it. When the input text image is completely inverted, these skew angle detection methods are essentially ineffective. One earlier work proposed a fast text image inversion detection method based on punctuation marks: text characters are detected first; punctuation marks are then screened out using the structural characteristics of Chinese characters and punctuation marks, and their types are determined from their pixel distributions; finally, whether the Chinese text image is inverted is decided from the conventions of punctuation usage. Patent publication CN102831421A proposes a punctuation-based method for detecting the up-down direction of text, which determines the text direction from the position of punctuation marks relative to the text lines; its basic idea is similar to that of the former method. Punctuation-based methods depend entirely on punctuation features and fail on text images containing few punctuation marks, so their applicability is limited and they are not general.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and to provide a rapid method for detecting direction inversion of a text image. The technical scheme is as follows:
A rapid method for detecting text image inversion comprises the following steps:
Step 1: preprocess the input text image to obtain a binarized result B.
Step 2: detect valid text lines to obtain a valid text line sequence.
Step 3: classify the text lines, as follows:
1) For each valid text line s in the valid text line sequence, apply a dilation operation with a rectangular structuring element to fill the gaps between adjacent characters of s.
2) Compute the vertical projection of each valid text line s, denoted $V(c)$, where c is the column index.
3) Find the values of c that satisfy $V(c) > 0.5 \times R_{hei}(s)$. The minimum such c, denoted $c_{min}$, is called the left boundary of the valid text line s; the maximum, denoted $c_{max}$, is called its right boundary. The length of the line is $R_{leg} = c_{max} - c_{min}$.
4) Collect $c_{min}(m)$ and $c_{max}(m)$ for every valid text line m in the same valid text line sequence. The minimum of $c_{min}(m)$ is called the left boundary of the valid text line sequence, denoted $c_{lef}$; the maximum of $c_{max}(m)$ is called its right boundary, denoted $c_{rgt}$.
5) For a valid text line m, if $0.6 < |c_{min}(m) - c_{lef}| / |c_{max}(m) - c_{min}(m)| < 0.9$, the line is classified as a left indented text line; if $0.6 < |c_{rgt} - c_{max}(m)| / (c_{max}(m) - c_{min}(m)) < 0.9$, it is classified as a right indented text line; if neither condition holds, it is classified as a non-indented text line.
Step 4: detect inversion of the text image, as follows:
Count the left indented and right indented text lines in a single text image, denoted $N_{lef}$ and $N_{rgt}$ respectively, and determine whether the text image is inverted using:
[Formula image not reproduced in this text: a decision rule comparing $N_{lef}$ and $N_{rgt}$.]
Preferably, Step 2 is carried out as follows:
1) Compute the horizontal projection of each row of B, denoted $H(r)$, where r is the row index.
2) Compute the maximum of $H(r)$, denoted $H_{max}$.
3) For the r-th scan line, if $H(r) > 0.5 \times H_{max}$, mark it as a valid scan line.
4) Examine the distribution of the valid scan lines: if a run of consecutive scan lines are all valid and the run length exceeds $M/100$, where M is the height of the text image (the total number of scan lines), the run forms a valid text line sequence.
Determine the row numbers of the uppermost and lowermost valid scan lines in the valid text line sequence, denoted $R_{top}(s)$ and $R_{bot}(s)$; these are the upper and lower boundaries of the sequence. Its height is defined as $R_{hei}(s) = |R_{top}(s) - R_{bot}(s)|$, where $|\cdot|$ denotes the absolute value and s is the index of the valid text line.
Drawings
FIG. 1 is a flow chart of the method of the present invention
FIG. 2 is a schematic diagram of important definitions used in the present invention
FIG. 3 is a schematic diagram of the type of text line defined by the present invention
FIG. 4 is a diagram of the number of left and right indented text lines of a text image for use in the experiment of the present invention
Detailed Description
First, preprocessing operations such as graying, bilateral filtering, contrast enhancement, and binarization are applied to the input color text image to improve the visual quality of the document image. Then, valid text lines are detected by horizontal projection analysis and classified according to their position and length. Finally, whether the text image is inverted is decided from the relative numbers of left indented and right indented text lines. Fig. 1 shows a block diagram of the proposed method.
Several useful definitions are given first. A text image consists of a number of paragraphs, and within each paragraph the font, format, and other character properties are largely consistent. The leftmost and rightmost positions at which a character can appear in a paragraph are called the "paragraph left boundary" and the "paragraph right boundary", respectively. Each paragraph contains one or more text lines; for any text line, the left edge of its leftmost character and the right edge of its rightmost character are called the "line left boundary" and the "line right boundary", respectively. Fig. 2 illustrates the paragraph boundaries and the line boundaries.
If the left and right boundaries of a text line essentially coincide with those of the paragraph it belongs to, the line is called a "complete text line". If the left boundary of a text line lies 2-4 characters from the left boundary of its paragraph while its right boundary essentially coincides with the paragraph right boundary, the line is called a "left indented text line". If the right boundary of a text line lies 2-4 characters from the right boundary of its paragraph while its left boundary essentially coincides with the paragraph left boundary, the line is called a "right indented text line". Fig. 3 illustrates the three types of text lines.
In Chinese and English writing, the first line of each paragraph is usually indented 2-4 characters to the right, so a paragraph containing two or more text lines has one left indented text line. If the text image is upright, a number of left indented text lines can therefore be detected. Conversely, if the text image is inverted, the 180° rotation moves each first-line indent to the right end of its line, so a number of right indented text lines are detected instead. The invention decides whether a text image is inverted from the relative numbers of left indented and right indented text lines detected in it.
The method consists of the following processing stages: preprocessing, text line detection, text line classification, and text inversion detection.
1. Preprocessing
The purpose of preprocessing is to improve the visual quality of the document image. It mainly comprises graying, smoothing filtering, contrast enhancement, and binarization.
(1) Graying
Determine whether the input text image is a grayscale image; if so, leave it unchanged. If it is a color image, let $C_R$, $C_G$, and $C_B$ denote the red, green, and blue channels, and compute the grayscale image I using equation (1):
$I(x,y) = \min\{C_R(x,y),\, C_G(x,y),\, C_B(x,y)\}$ (1)
where $x = 0, 1, 2, \ldots, M-1$ and $y = 0, 1, 2, \ldots, N-1$, and M and N are the height and width of the text image, i.e., the total numbers of rows and columns.
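As a minimal illustration of equation (1), the sketch below takes the per-pixel minimum of the three color channels. The function name `to_gray_min` and the nested-list image layout are assumptions made for this example only; they do not come from the patent.

```python
def to_gray_min(rgb):
    """Equation (1): per-pixel minimum of the R, G and B channels.

    `rgb` is a list of rows; each row is a list of (R, G, B) tuples.
    This layout is an illustrative assumption, not part of the source.
    """
    return [[min(pixel) for pixel in row] for row in rgb]


# Toy 2 x 2 color image.
image = [
    [(255, 255, 255), (200, 30, 40)],
    [(10, 20, 5), (0, 0, 0)],
]
print(to_gray_min(image))  # [[255, 30], [5, 0]]
```

Taking the minimum channel keeps colored text dark in the grayscale image, since a saturated color has at least one low channel value.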
(2) Smoothing filtering
Because the text image is contaminated by noise during acquisition and digitization, the grayscale image I is filtered with a bilateral filter to reduce the effect of noise. The filtered image is denoted G.
(3) Contrast enhancement
Illumination and other factors may leave the text image with low contrast, so the filtered image G is enhanced by histogram equalization. The result is denoted E.
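The contrast enhancement step can be sketched as follows. The patent only names histogram equalization, so the standard global, CDF-based variant for 8-bit images is assumed here; the function name `equalize_histogram` and the nested-list image layout are also illustrative assumptions.

```python
def equalize_histogram(gray, levels=256):
    """Global histogram equalization of an 8-bit image (list of rows of ints).

    Assumption: the standard CDF-based mapping is used; the source does not
    specify which variant of histogram equalization it applies.
    """
    hist = [0] * levels
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    if total == 0:
        return []

    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return [row[:] for row in gray]

    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[v] for v in row] for row in gray]


G = [[50, 50], [100, 150]]
E = equalize_histogram(G)
print(E)
```

After equalization the darkest occupied level maps to 0 and the brightest to 255, which spreads a low-contrast scan over the full gray range before thresholding.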
(4) Binarization
The global threshold of E is computed with the classical Otsu method and denoted $T_h$. E is then binarized using $T_h$; the result is denoted B and is obtained as follows:
[Equation (2): formula image not reproduced in this text; it thresholds E against $T_h$ to produce B.]
In B, points with value 1 are text points and points with value 0 are background points.
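A minimal sketch of the binarization step is given below: Otsu's method picks the threshold that maximizes the between-class variance of the gray-level histogram, and the image is then thresholded. Because the formula image for equation (2) is not available in the text, the comparison direction is an assumption: text is taken to be darker than the background, so pixels at or below the threshold become 1. The function names are likewise illustrative.

```python
def otsu_threshold(gray, levels=256):
    """Return the Otsu threshold T_h of an 8-bit image (list of rows of ints)."""
    hist = [0] * levels
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]            # pixels with value <= t
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0          # pixels with value > t
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, th):
    """Assumed form of equation (2): dark pixels (<= th) are text points (1)."""
    return [[1 if v <= th else 0 for v in row] for row in gray]


E = [[10, 12, 200], [11, 210, 205]]
th = otsu_threshold(E)
B = binarize(E, th)
print(th, B)
```

If a scan has light text on a dark background, the comparison in `binarize` would need to be flipped so that text points still map to 1.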
2. Valid text line detection
Valid text lines are detected with the following algorithm.
Valid text line detection algorithm:
1) Compute the horizontal projection of each row of B, denoted $H(r)$, where r is the row index.
2) Compute the maximum of $H(r)$, denoted $H_{max}$.
3) For the r-th scan line, if $H(r) > 0.5 \times H_{max}$, mark it as a valid scan line.
4) Examine the distribution of the valid scan lines: if a run of consecutive scan lines are all valid and the run length exceeds $M/100$, where M is the image height, the run forms one valid text line.
5) Determine the row numbers of the uppermost and lowermost valid scan lines in the valid text line, denoted $R_{top}(s)$ and $R_{bot}(s)$; these are the upper and lower boundaries of the text line. The height of the text line is defined as $R_{hei}(s) = |R_{top}(s) - R_{bot}(s)|$, where $|\cdot|$ denotes the absolute value and s is the index of the valid text line.
3. Text line classification
Text lines are classified with the following algorithm.
Text line classification algorithm:
1) For a valid text line s, apply a dilation operation with a rectangular structuring element to fill the gaps between adjacent characters of the line. The structuring element is 2 pixels high, and its width is 50% of the text line height.
2) Compute the vertical projection of the text line, denoted $V(c)$, where c is the column index.
3) Find the values of c that satisfy $V(c) > 0.5 \times R_{hei}(s)$. The minimum such c, denoted $c_{min}$, is called the left boundary of the text line; the maximum, denoted $c_{max}$, is called its right boundary. The length of the text line is $R_{leg} = c_{max} - c_{min}$.
4) Collect $c_{min}(m)$ and $c_{max}(m)$ for every valid text line m in the same paragraph. The minimum of $c_{min}(m)$ is called the left boundary of the paragraph, denoted $c_{lef}$; the maximum of $c_{max}(m)$ is called the right boundary of the paragraph, denoted $c_{rgt}$.
5) For a valid text line m, if $0.6 < |c_{min}(m) - c_{lef}| / |c_{max}(m) - c_{min}(m)| < 0.9$, the line is classified as a left indented text line; if $0.6 < |c_{rgt} - c_{max}(m)| / (c_{max}(m) - c_{min}(m)) < 0.9$, it is classified as a right indented text line; if neither condition holds, it is classified as a "non-indented text line".
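The classification steps can be sketched as three small functions. Several details are assumptions, because the text does not specify them: the anchor of the structuring element (centered horizontally, extending downward here), the way lines are grouped into paragraphs (all lines passed to `classify_lines` are treated as one paragraph), and all function names and label strings. The thresholds 0.5, 0.6 and 0.9 are taken from the text as stated.

```python
def dilate_rect(strip, width, height=2):
    """Binary dilation of a text-line strip with a width x height rectangle.

    Assumption: the rectangle is centered horizontally on each text pixel
    and extends downward; the source does not state the anchor point.
    """
    rows, cols = len(strip), len(strip[0])
    left = width // 2
    right = width - 1 - left
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if strip[r][c]:
                for rr in range(r, min(rows, r + height)):
                    for cc in range(max(0, c - left), min(cols, c + right + 1)):
                        out[rr][cc] = 1
    return out


def line_bounds(strip, r_hei):
    """Steps 2-3: return (c_min, c_max) where V(c) > 0.5 * r_hei, or None."""
    cols = len(strip[0])
    V = [sum(row[c] for row in strip) for c in range(cols)]
    hits = [c for c, v in enumerate(V) if v > 0.5 * r_hei]
    return (hits[0], hits[-1]) if hits else None


def classify_lines(bounds):
    """Steps 4-5: label each (c_min, c_max) pair of one paragraph."""
    c_lef = min(b[0] for b in bounds)
    c_rgt = max(b[1] for b in bounds)
    labels = []
    for c_min, c_max in bounds:
        length = c_max - c_min
        if length <= 0:
            labels.append("none")
        elif 0.6 < abs(c_min - c_lef) / length < 0.9:
            labels.append("left")
        elif 0.6 < abs(c_rgt - c_max) / length < 0.9:
            labels.append("right")
        else:
            labels.append("none")
    return labels


# Two "characters" separated by a two-column gap, three scan lines tall.
strip = [[0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0] for _ in range(3)]
filled = dilate_rect(strip, width=3)       # toy width, chosen to close the gap
print(line_bounds(filled, r_hei=2))        # (1, 8)
print(classify_lines([(0, 100), (42, 100), (0, 58)]))  # ['none', 'left', 'right']
```

In a full pipeline the structuring element width would be derived from the line height, for example `max(1, round(0.5 * r_hei))`, following the 50% rule in step 1.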
4. Text image inversion detection
Count the left indented and right indented text lines in a single text image, denoted $N_{lef}$ and $N_{rgt}$ respectively. Whether the text image is inverted is decided using equation (3):
[Equation (3): formula image not reproduced in this text; it is a decision rule comparing $N_{lef}$ and $N_{rgt}$.]
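The final decision can be sketched as below. The exact form of equation (3) is not recoverable from the text, so the rule used here is an assumption inferred from the surrounding description: the image is treated as inverted when right indented lines outnumber left indented lines, and ties are treated as not inverted. The function name and label strings are illustrative.

```python
def is_inverted(labels):
    """Decide inversion from per-line labels ("left", "right" or "none").

    Assumed rule (the formula image for equation (3) is unavailable):
    inverted if N_rgt > N_lef, otherwise not inverted.
    """
    n_lef = labels.count("left")
    n_rgt = labels.count("right")
    return n_rgt > n_lef


print(is_inverted(["right", "right", "left", "none"]))  # True
print(is_inverted(["left", "none", "none"]))            # False
```

An image flagged as inverted would then be rotated by 180° before being passed to OCR.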
An example is as follows:
MATLAB 2015a running on Windows 10 Professional was used as the experimental platform, on hardware with an Intel i5-6200U CPU and 8 GB of memory.
A test set of 90 text images acquired by the patent applicant was used, of which 78 are inverted and 12 are upright. Of the 90 images, 56 (62%) are Chinese text images and 34 (38%) are English text images. With the proposed method, 100% of the inverted images were detected correctly. Fig. 4 shows the distribution of the numbers of left indented and right indented text lines over the 90 document images. As the figure shows, an upright text image has significantly more left indented text lines than right indented ones; conversely, an inverted text image has more right indented text lines than left indented ones. The images separate clearly into two classes: inverted text images (marked "x" in the figure) and upright text images (marked "o" in the figure).
The test images are 1944 × 2592 pixels, i.e., about 5 million pixels each, and processing one image takes about 2300 ms on average. If the algorithm were implemented in a more efficient compiled language such as C, processing would be faster and could meet real-time processing requirements.
The experimental results show that the method can quickly and effectively determine whether an input scanned text image is inverted, and that it handles text images in multiple languages, including Chinese and English.
The steps of the invention are summarized as follows:
Step 1: Determine the type of the input scanned text image. If it is a grayscale image, leave it unchanged; if it is a color image, convert it to a grayscale image using equation (1). The grayscale image is denoted I.
Step 2: Filter the grayscale image I with a bilateral filter; the result is denoted G.
Step 3: Enhance the filtered image G by histogram equalization; the result is denoted E.
Step 4: Compute the global threshold of the enhanced image E with the Otsu method and binarize E using equation (2); the result is denoted B.
Step 5: Detect the valid text lines in the scanned text image with the valid text line detection algorithm.
Step 6: Classify each valid text line with the text line classification algorithm to obtain the numbers of left indented and right indented text lines, $N_{lef}$ and $N_{rgt}$.
Step 7: Decide whether the scanned text image is inverted using equation (3).

Claims (1)

1. A rapid method for detecting text image inversion, comprising the following steps:
Step 1: binarize an input text image to obtain a binarized result B;
Step 2: detect valid text lines to obtain a valid text line sequence, as follows:
1) compute the horizontal projection of each row of B, denoted $H(r)$, where r is the row index;
2) compute the maximum of $H(r)$, denoted $H_{max}$;
3) for the r-th scan line, if $H(r) > 0.5 \times H_{max}$, mark it as a valid scan line;
4) examine the distribution of the valid scan lines, and if a run of consecutive scan lines are all valid and the run length exceeds $M/100$, where M is the height of the text image, i.e., the total number of scan lines, the run forms a valid text line sequence;
determine the row numbers of the uppermost and lowermost valid scan lines in the valid text line sequence, denoted $R_{top}(s)$ and $R_{bot}(s)$, which are the upper and lower boundaries of the valid text line sequence, and define the height of the valid text line sequence as $R_{hei}(s) = |R_{top}(s) - R_{bot}(s)|$, where $|\cdot|$ denotes the absolute value and s is the index of the valid text line;
Step 3: classify the text lines, as follows:
1) for each valid text line s in the valid text line sequence, apply a dilation operation with a rectangular structuring element to fill the gaps between adjacent characters of the valid text line s;
2) compute the vertical projection of each valid text line s, denoted $V(c)$, where c is the column index;
3) find the values of c that satisfy $V(c) > 0.5 \times R_{hei}(s)$, where $R_{hei}(s)$ is the height of the valid text line s; the minimum such c, denoted $c_{min}(m)$, is called the left boundary of the valid text line s; the maximum, denoted $c_{max}(m)$, is called the right boundary of the valid text line s; the length of the line is $R_{leg} = c_{max}(m) - c_{min}(m)$;
4) collect $c_{min}(m)$ and $c_{max}(m)$ for every valid text line in the same valid text line sequence; the minimum of $c_{min}(m)$ is called the left boundary of the valid text line sequence, denoted $c_{lef}$; the maximum of $c_{max}(m)$ is called the right boundary of the valid text line sequence, denoted $c_{rgt}$;
5) for a valid text line m, if $0.6 < |c_{min}(m) - c_{lef}| / |c_{max}(m) - c_{min}(m)| < 0.9$, the valid text line m is classified as a left indented text line; if $0.6 < |c_{rgt} - c_{max}(m)| / (c_{max}(m) - c_{min}(m)) < 0.9$, the valid text line m is classified as a right indented text line; if neither condition holds, the text line is classified as a non-indented text line;
Step 4: detect inversion of the text image, as follows:
count the left indented and right indented text lines in a single text image, denoted $N_{lef}$ and $N_{rgt}$ respectively, and determine whether the text image is inverted using:
[Formula image not reproduced in this text: a decision rule comparing $N_{lef}$ and $N_{rgt}$.]
CN201710090240.9A 2017-02-20 2017-02-20 Text image inversion rapid detection method Expired - Fee Related CN106909897B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710090240.9A CN106909897B (en) 2017-02-20 2017-02-20 Text image inversion rapid detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710090240.9A CN106909897B (en) 2017-02-20 2017-02-20 Text image inversion rapid detection method

Publications (2)

Publication Number Publication Date
CN106909897A (en) 2017-06-30
CN106909897B (en) 2020-03-13

Family

ID=59208458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710090240.9A Expired - Fee Related CN106909897B (en) 2017-02-20 2017-02-20 Text image inversion rapid detection method

Country Status (1)

Country Link
CN (1) CN106909897B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609482B (en) * 2017-08-15 2021-02-19 Tianjin University Chinese text image inversion discrimination method based on Chinese character stroke characteristics
CN111414866A (en) * 2020-03-24 2020-07-14 Shanghai Eye Control Technology Co., Ltd. Vehicle application form detection method and device, computer equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831421A (en) * 2012-08-29 2012-12-19 East China Normal University Method for detecting document up-down direction based on punctuation marks
CN106097254A (en) * 2016-06-07 2016-11-09 Tianjin University A kind of scanning document image method for correcting error

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
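Fast detection algorithm for inverted Chinese text images (中文文本图像倒置快速检测算法); Zeng Fanfeng (曾凡锋) et al.; Computer Engineering and Design (《计算机工程与设计》); 2012-09-20; Vol. 33, No. 9; pp. 3512-3516 *
Research and application of text image orientation based on character structure features (基于文字结构特征的文本图像方向的研究与应用); Zhu Qimeng (朱其猛); China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库信息科技辑》); 2014-09-15; No. 9; pp. 1-38 *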

Also Published As

Publication number Publication date
CN106909897A (en) 2017-06-30

Similar Documents

Publication Publication Date Title
CN111325203B (en) American license plate recognition method and system based on image correction
CN106446896B (en) Character segmentation method and device and electronic equipment
EP2669847B1 (en) Document processing apparatus, document processing method and scanner
JP5616308B2 (en) Document modification detection method by character comparison using character shape feature
WO2017016448A1 (en) Qr code feature detection method and system
CN110598566A (en) Image processing method, device, terminal and computer readable storage medium
CN107944451B (en) Line segmentation method and system for ancient Tibetan book documents
CN113537227B (en) Structured text recognition method and system
CN113158977B (en) Image character editing method for improving FANnet generation network
CN108710882A (en) A kind of screen rendering text recognition method based on convolutional neural networks
Al Abodi et al. An effective approach to offline Arabic handwriting recognition
CN102737240B (en) Method of analyzing digital document images
CN106909897B (en) Text image inversion rapid detection method
CN108256518B (en) Character area detection method and device
CN114863492A (en) Method and device for repairing low-quality fingerprint image
CN113591831A (en) Font identification method and system based on deep learning and storage medium
CA2790210C (en) Resolution adjustment of an image that includes text undergoing an ocr process
CN116824608A (en) Answer sheet layout analysis method based on target detection technology
CN107609482B (en) Chinese text image inversion discrimination method based on Chinese character stroke characteristics
CN107798355B (en) Automatic analysis and judgment method based on document image format
CN108717544B (en) Newspaper sample manuscript text automatic detection method based on intelligent image analysis
CN116452809A (en) Line object extraction method based on semantic segmentation
CN105721738A (en) Color scanned document image preprocessing method
CN116612478A (en) Off-line handwritten Chinese character scoring method, device and storage medium
Kshetry Image preprocessing and modified adaptive thresholding for improving OCR

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200313