CN113159031A

CN113159031A - Handwritten text detection method and device and storage medium

Info

Publication number: CN113159031A
Application number: CN202110428121.6A
Authority: CN
Inventors: 陈鹏飞; 毛亮; 陈映庭; 杨晓帆
Original assignee: Guangzhou Huiyi Culture Technology Co ltd
Current assignee: Guangzhou Huiyi Culture Technology Co ltd
Priority date: 2021-04-21
Filing date: 2021-04-21
Publication date: 2021-07-23

Abstract

The invention discloses a handwritten text detection method, a device and a storage medium. The method comprises the following steps: inputting a text picture to be detected, and positioning the text lines of the text picture with a key point positioning algorithm to obtain text line positioning information; performing affine transformation correction on the original text lines in the text picture according to the text line positioning information to obtain corrected text lines; segmenting the single characters of each corrected text line according to horizontal projection to obtain candidate character areas; and calculating the average width of the character bounding boxes across the whole line in the candidate character areas, and merging the bounding boxes according to the average width to obtain a final character detection result. The embodiment of the invention can effectively correct text lines at different angles and in different directions, and can also accurately merge characters that were split into left and right components; by detecting handwritten text in a way that takes the structure of Chinese characters into account, it further improves the accuracy and reliability of text detection.

Description

Handwritten text detection method and device and storage medium

Technical Field

The invention relates to the technical field of computer vision, in particular to a handwritten text detection method, a handwritten text detection device and a storage medium.

Background

Text detection and recognition have a wide range of application scenarios in daily life, such as identity card recognition, ticket recognition, license plate recognition and form recognition. Compared with printed text, handwritten text varies more in shape and is correspondingly harder to detect and recognize. Most existing text detection methods target printed text, which is arranged more regularly than handwritten text lines, so single characters are easy to detect with either traditional methods or deep learning methods. However, because handwritten text lines vary in height and the characters have left-right and up-down structures, conventional text detection methods have difficulty detecting handwritten text accurately.

Disclosure of Invention

The invention provides a handwritten text detection method, which aims to solve the technical problem that the conventional text detection method is difficult to accurately detect handwritten texts.

A first embodiment of the present invention provides a handwritten text detection method, including:

inputting a text picture to be detected, and positioning a text line of the text picture to be detected by adopting a key point positioning algorithm to obtain text line positioning information;

performing affine transformation correction on the original text line in the text picture to be detected according to the text line positioning information to obtain a corrected text line;

dividing the single character of the corrected text line according to horizontal projection to obtain a candidate character area;

and calculating the average width of the surrounding frames of the whole line of characters in the candidate character area, and merging the surrounding frames of the whole line of characters according to the average width to obtain a final character detection result.

Further, the method for locating the text line of the text picture to be detected by using the key point location algorithm to obtain text line location information specifically comprises the following steps:

adding a key point output branch to the yolov3 algorithm to obtain an improved yolov3 key point positioning algorithm, and positioning the text line of the text picture to be detected according to the improved yolov3 key point positioning algorithm to obtain text line positioning information.

Further, the original text line includes an oblique text line and a text line with inconsistent height, and the affine transformation correction is performed on the original text line in the text picture to be detected according to the text line positioning information to obtain a corrected text line, specifically:

obtaining four-point positioning information of the text line according to the text line positioning information, obtaining each side length of a quadrangle formed by connecting the four-point positioning information, and determining a target correction rectangle according to each side length of the quadrangle;

calculating an affine transformation matrix from the key point coordinates of the text line to the target correction rectangle by using opencv;

and transforming the oblique text line and the text line with inconsistent height into corrected text lines according to the affine transformation matrix.
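A minimal sketch of the steps above, assuming the four key points arrive in the order top-left, top-right, bottom-right, bottom-left, and taking the target rectangle's width and height as the averages of the opposite side lengths, as the detailed description does. The function names are hypothetical. The method itself computes the matrix with opencv; to keep the sketch dependency-free, the matrix is solved here by Cramer's rule from three point pairs, which is the same input that OpenCV's `cv2.getAffineTransform` expects.

```python
import math


def target_rectangle(quad):
    """Return the corners of the target correction rectangle.

    quad: four (x, y) key points, assumed ordered
    top-left, top-right, bottom-right, bottom-left.
    """
    tl, tr, br, bl = quad
    w1 = math.hypot(tr[0] - tl[0], tr[1] - tl[1])  # upper side length W1
    w2 = math.hypot(br[0] - bl[0], br[1] - bl[1])  # lower side length W2
    h1 = math.hypot(bl[0] - tl[0], bl[1] - tl[1])  # left side length H1
    h2 = math.hypot(br[0] - tr[0], br[1] - tr[1])  # right side length H2
    w = (w1 + w2) / 2
    h = (h1 + h2) / 2
    return [(0.0, 0.0), (w, 0.0), (w, h), (0.0, h)]


def affine_from_three_points(src, dst):
    """Solve the 2x3 affine matrix mapping three src points onto three dst points."""
    (x0, y0), (x1, y1), (x2, y2) = src
    det = x0 * (y1 - y2) - y0 * (x1 - x2) + (x1 * y2 - x2 * y1)
    if abs(det) < 1e-12:
        raise ValueError("source points are collinear")
    rows = []
    for k in (0, 1):  # k = 0 solves the x' row, k = 1 the y' row
        u0, u1, u2 = dst[0][k], dst[1][k], dst[2][k]
        a = (u0 * (y1 - y2) - y0 * (u1 - u2) + (u1 * y2 - u2 * y1)) / det
        b = (x0 * (u1 - u2) - u0 * (x1 - x2) + (x1 * u2 - x2 * u1)) / det
        c = (x0 * (y1 * u2 - y2 * u1) - y0 * (x1 * u2 - x2 * u1)
             + u0 * (x1 * y2 - x2 * y1)) / det
        rows.append((a, b, c))
    return rows


def apply_affine(matrix, point):
    """Map one (x, y) point through the 2x3 affine matrix."""
    x, y = point
    (a, b, c), (d, e, f) = matrix
    return (a * x + b * y + c, d * x + e * y + f)
```

An affine matrix is fully determined by three point pairs, so the fourth key point lands exactly on its target corner only when the quadrangle is a parallelogram; for other shapes the fit is approximate.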

Further, the segmenting the single character of the corrected text line according to the horizontal projection to obtain a candidate character region specifically includes:

after binarization processing is carried out on the corrected text line, accumulating pixel values in the corrected text line in the horizontal direction to obtain a wave line;

and segmenting the wave crests in the wave line by setting a threshold value to obtain candidate character areas.

Further, the calculating an average width of the bounding boxes of the whole line of characters in the candidate character region, and merging the bounding boxes of the whole line of characters according to the average width to obtain a final character detection result, specifically:

calculating the average width of the character bounding boxes across the whole line in the candidate character area, and merging adjacent bounding boxes whose widths are smaller than the average width to obtain a final character detection result.

A second embodiment of the present invention provides a handwritten text detection apparatus, including:

the positioning module is used for inputting a text picture to be detected, and positioning a text line of the text picture to be detected by adopting a key point positioning algorithm to obtain text line positioning information;

the correction module is used for carrying out affine transformation correction on the original text line in the text picture to be detected according to the text line positioning information to obtain a corrected text line;

the segmentation module is used for segmenting the single character of the corrected text line according to the horizontal projection to obtain a candidate character area;

and the merging module is used for calculating the average width of the surrounding frames of the whole line of characters in the candidate character area, and merging the surrounding frames of the whole line of characters according to the average width to obtain a final character detection result.

Further, the correction module is specifically configured to:

Further, the segmentation module is specifically configured to: after binarization processing is carried out on the corrected text line, accumulating pixel values in the corrected text line in the horizontal direction to obtain a wave line;

Further, the merging module is specifically configured to:

A third embodiment of the present invention provides a computer-readable storage medium, which includes a stored computer program, wherein when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute a handwritten text detection method as described above.

The embodiment of the invention adopts the key point positioning algorithm to position the text line to obtain accurate text line positioning information, and corrects the text line in the text picture to be detected through affine transformation according to the text line positioning information, so that text lines at different angles and in different directions can be effectively corrected and the accuracy of text detection can be improved. The embodiment of the invention can also accurately merge characters that were split into left and right components in the preliminary detection and adjusts the position of each character's bounding box, so that handwritten text is detected in a way that takes the characteristics of Chinese characters into account, further improving the accuracy and reliability of text detection.

Drawings

FIG. 1 is a schematic flowchart of a handwritten text detection method provided in an embodiment of the present invention;

FIG. 2 is a diagram illustrating the effect of text line positioning according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating an effect of text line rectification according to an embodiment of the present invention;

FIG. 4 is a schematic diagram illustrating an effect of segmenting a whole line of text according to an embodiment of the present invention;

FIG. 5 is a diagram illustrating an effect of text detection according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a handwritten text detection apparatus according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

In the description of the present application, it is to be understood that the terms "first", "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality" means two or more unless otherwise specified.

In the description of the present application, it is to be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly. For example, a connection may be fixed, removable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intervening medium, or an internal connection between two elements. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art according to the specific case.

Referring to fig. 1-5, a first embodiment of the present invention provides a handwritten text detection method which, as shown in fig. 1, includes:

S1, inputting a text picture to be detected, and positioning a text line of the text picture to be detected by adopting a key point positioning algorithm to obtain text line positioning information;

S2, performing affine transformation correction on the original text lines in the text picture to be detected according to the text line positioning information to obtain corrected text lines;

S3, segmenting the single character of the corrected text line according to the horizontal projection to obtain a candidate character area;

and S4, calculating the average width of the surrounding frames of the whole line of characters in the candidate character area, and merging the surrounding frames of the whole line of characters according to the average width to obtain the final character detection result.

As a specific implementation manner of the embodiment of the present invention, a key point positioning algorithm is used to position a text line of a text picture to be detected, so as to obtain text line positioning information, which specifically includes:

a key point output branch is added to the yolov3 algorithm to obtain an improved yolov3 key point positioning algorithm, and the text line of the text picture to be detected is positioned according to the improved yolov3 key point positioning algorithm to obtain text line positioning information.

Referring to fig. 2, an effect diagram of text line positioning according to an embodiment of the present invention is provided. In the embodiment of the invention, the improved yolov3 key point positioning algorithm can realize the detection and key point positioning of the text line at the same time, and is favorable for improving the positioning accuracy of the text line.
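The text does not specify how the added key point branch is laid out. As a purely illustrative sketch, a standard yolov3 (YOLOv3) head predicts 4 box values, 1 objectness score and one score per class for each anchor; the code below assumes, hypothetically, that the key point branch appends an (x, y) pair for each of four text line corners to that per-anchor vector. The function names and the ordering of the fields are assumptions, not details taken from the patent.

```python
def output_channels(num_anchors, num_classes, num_keypoints=4):
    """Channels per grid cell when key point coordinates are appended (assumed layout)."""
    box_values = 4        # box centre offsets, width, height
    objectness = 1
    keypoint_values = 2 * num_keypoints  # one (x, y) pair per key point
    return num_anchors * (box_values + objectness + num_classes + keypoint_values)


def split_prediction(vector, num_classes, num_keypoints=4):
    """Split one per-anchor prediction vector into its named parts."""
    expected = 5 + num_classes + 2 * num_keypoints
    if len(vector) != expected:
        raise ValueError(f"expected {expected} values, got {len(vector)}")
    flat = vector[5 + num_classes:]
    return {
        "box": vector[0:4],
        "objectness": vector[4],
        "class_scores": vector[5:5 + num_classes],
        # Regroup the flat tail into (x, y) key point pairs.
        "keypoints": [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)],
    }
```

Under this assumed layout, a single "text line" class with three anchors per cell gives 3 × (4 + 1 + 1 + 8) = 42 channels, so one forward pass yields both the text line box and its four corner points.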

As a specific implementation manner of the embodiment of the present invention, the original text line includes an oblique text line and a text line with inconsistent height, and affine transformation correction is performed on the original text line in the text picture to be detected according to the text line positioning information to obtain a corrected text line, specifically:

calculating an affine transformation matrix from the coordinates of the key points of the text line to the target correction rectangle by using opencv;

and transforming the oblique text lines and the text lines with inconsistent height into corrected text lines according to the affine transformation matrix.

Referring to fig. 3, an effect diagram of text line rectification according to an embodiment of the invention is shown. In the embodiment of the invention, the side lengths of the quadrangle formed by connecting the four-point positioning information are the upper side length W1, the lower side length W2, the left side length H1 and the right side length H2, and the coordinates of the target correction rectangle are determined to be (0, 0), ((W1+W2)/2, 0), ((W1+W2)/2, (H1+H2)/2) and (0, (H1+H2)/2). An affine transformation matrix from the coordinates of the key points of the text line to the target correction rectangle is calculated by using opencv, and the oblique text line or the text line with inconsistent height is corrected by affine transformation into the text line corresponding to the target correction rectangle. The embodiment of the invention corrects the oblique text lines and the text lines with inconsistent height among the original text lines through affine transformation, which is a linear transformation from two-dimensional coordinates to two-dimensional coordinates that preserves the straightness of a two-dimensional figure, thereby improving the effect of text line correction. The principle of the affine transformation is as follows:
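The formula itself is not reproduced in this text. For reference only, a two-dimensional affine transformation is commonly written in the following standard form; the symbols a_11, a_12, a_21, a_22, t_x and t_y are notation assumed here rather than taken from the patent:

```latex
\begin{bmatrix} x' \\ y' \end{bmatrix}
=
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
+
\begin{bmatrix} t_x \\ t_y \end{bmatrix}
\quad\Longleftrightarrow\quad
\begin{bmatrix} x' \\ y' \end{bmatrix}
=
\begin{bmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
```

Here (x, y) is a point in the original text line and (x', y') is its position in the corrected text line. The six parameters form the 2×3 affine transformation matrix that is solved from the correspondences between the key points and the corners of the target correction rectangle.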

As a specific implementation manner of the embodiment of the present invention, a single character of the corrected text line is segmented according to the horizontal projection to obtain a candidate character region, which specifically includes:

after binarization processing is carried out on the corrected text line, pixel values in the corrected text line are accumulated in the horizontal direction to obtain a wave line;

and segmenting the wave crests in the wave line by setting a threshold value to obtain a candidate character area.

Referring to fig. 4, an effect diagram of segmenting a whole line of text according to an embodiment of the present invention is shown. In the embodiment of the invention, single characters are segmented according to the horizontal projection to obtain the candidate character areas, which constitute the preliminary character detection result. By adopting the horizontal projection method, the embodiment of the invention can segment single characters rapidly and accurately, avoids the excessive time consumption and limited performance caused by detecting a large number of characters, and helps to improve character detection efficiency.
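A minimal sketch of the projection step, assuming the binarized line is a list of rows in which 1 marks an ink pixel and 0 marks background. Because the characters of a corrected line sit side by side, the sketch reads "accumulating in the horizontal direction" as building a profile indexed by horizontal position, with one ink count per column; that reading, the function names and the default threshold are assumptions.

```python
def column_profile(binary):
    """Ink count per column of a binarized text line (the 'wave line')."""
    return [sum(column) for column in zip(*binary)]


def segment_by_projection(binary, threshold=0):
    """Return half-open (x_start, x_end) spans where the profile exceeds the threshold."""
    profile = column_profile(binary)
    regions = []
    start = None
    for x, value in enumerate(profile):
        if value > threshold and start is None:
            start = x                      # a wave crest begins
        elif value <= threshold and start is not None:
            regions.append((start, x))     # the crest ends before column x
            start = None
    if start is not None:
        regions.append((start, len(profile)))  # crest runs to the right edge
    return regions
```

Each returned span is one candidate character area. A gap inside a character, such as the space between left and right components, also falls below the threshold, which is why a later merging step is needed.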

As a specific implementation manner of the embodiment of the present invention, calculating an average width of a bounding box of an entire row of characters in a candidate character region, and merging the bounding boxes of the entire row of characters according to the average width to obtain a final character detection result, specifically:

Please refer to fig. 5, which is a schematic diagram illustrating an effect of text detection according to an embodiment of the present invention. The embodiment of the invention can accurately merge characters that were split into left and right components in the preliminary detection and adjusts the position of each character's bounding box, so that handwritten text is detected in a way that takes the characteristics of Chinese characters into account, further improving the accuracy and reliability of text detection.
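A minimal sketch of the merging step, assuming each bounding box is represented by its horizontal extent (x_start, x_end) in the corrected line, and assuming that two neighbouring boxes are merged when both are narrower than the line's average width, which is one reading of the merging rule stated in the disclosure. The average is computed once from the candidate boxes, each box takes part in at most one merge, and the position adjustment of the boxes is not modelled. The function name is hypothetical.

```python
def merge_narrow_boxes(boxes):
    """Merge adjacent boxes that are both narrower than the line's average width."""
    boxes = sorted(boxes)
    if not boxes:
        return []
    average = sum(x1 - x0 for x0, x1 in boxes) / len(boxes)
    merged = []
    i = 0
    while i < len(boxes):
        x0, x1 = boxes[i]
        if i + 1 < len(boxes):
            next_x0, next_x1 = boxes[i + 1]
            if (x1 - x0) < average and (next_x1 - next_x0) < average:
                # Treat the pair as the left and right parts of one character.
                merged.append((x0, next_x1))
                i += 2
                continue
        merged.append((x0, x1))
        i += 1
    return merged
```

For example, in a line whose boxes have widths 20, 20, 8, 9 and 20, the average is 15.4, so only the neighbouring boxes of width 8 and 9 are joined into a single character box.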

The embodiment of the invention has the following beneficial effects:

The embodiment of the invention adopts the key point positioning algorithm to position the text line to obtain accurate text line positioning information, and corrects the text line in the text picture to be detected through affine transformation according to the text line positioning information, so that text lines at different angles and in different directions can be effectively corrected and the accuracy of text detection can be improved. The embodiment of the invention segments single characters in the corrected text line by horizontal projection, and merges two adjacent character bounding boxes of smaller width according to the widths of the bounding boxes across the whole line, so that the poor detection caused by the left and right components of a Chinese character being split into two characters can be effectively avoided and the accuracy of text detection can be further improved.

Referring to fig. 6, a second embodiment of the present invention provides a handwritten text detection apparatus, including:

the positioning module 10 is configured to input a text picture to be detected, and position a text line of the text picture to be detected by using a key point positioning algorithm to obtain text line positioning information;

the correction module 20 is configured to perform affine transformation correction on an original text line in a text picture to be detected according to the text line positioning information to obtain a corrected text line;

the segmentation module 30 is configured to segment a single character of the corrected text line according to the horizontal projection to obtain a candidate character region;

and the merging module 40 is configured to calculate an average width of the bounding boxes of the entire row of characters in the candidate character region, and merge the bounding boxes of the entire row of characters according to the average width to obtain a final character detection result.
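The four modules can be read as a simple pipeline in which each module consumes the output of the previous one. The sketch below is a hypothetical arrangement for illustration only: the class name, the field names and the callable signatures are assumptions, and each module is injected as a plain function so that any concrete implementation can be plugged in.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Box = Tuple[int, int]  # horizontal extent of one character box (assumed representation)


@dataclass
class HandwrittenTextDetector:
    locate: Callable[[Any], List[Any]]        # positioning module 10
    rectify: Callable[[Any, Any], Any]        # correction module 20
    segment: Callable[[Any], List[Box]]       # segmentation module 30
    merge: Callable[[List[Box]], List[Box]]   # merging module 40

    def detect(self, picture: Any) -> List[List[Box]]:
        """Run the four modules in order and return the character boxes per text line."""
        results = []
        for line_info in self.locate(picture):
            corrected_line = self.rectify(picture, line_info)
            candidates = self.segment(corrected_line)
            results.append(self.merge(candidates))
        return results
```

Keeping the modules as injected callables means, for instance, that the positioning step could be replaced or tested with a stub without touching the other three.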

As a specific implementation manner of the embodiment of the present invention, the positioning module 10 is specifically configured to:

As a specific implementation manner of the embodiment of the present invention, the correcting module 20 is specifically configured to:

As a specific implementation manner of the embodiment of the present invention, the segmentation module 30 is specifically configured to: after binarization processing is carried out on the corrected text line, pixel values in the corrected text line are accumulated in the horizontal direction to obtain a wave line;

Referring to fig. 4, an effect diagram of segmenting a whole line of text according to an embodiment of the present invention is shown. In the embodiment of the invention, single characters are segmented according to the horizontal projection to obtain the candidate character areas, which constitute the preliminary character detection result. By adopting the horizontal projection method, the embodiment of the invention can segment single characters rapidly and accurately, avoids the excessive time consumption and limited performance caused by detecting a large number of characters, and helps to improve character detection efficiency.

As a specific implementation manner of the embodiment of the present invention, the merging module 40 is specifically configured to:

The embodiment of the invention has the following beneficial effects:

The foregoing is a preferred embodiment of the present invention, and it should be noted that it would be apparent to those skilled in the art that various modifications and enhancements can be made without departing from the principles of the invention, and such modifications and enhancements are also considered to be within the scope of the invention.

Claims

1. A method for detecting handwritten text, comprising:

2. The handwritten text detection method according to claim 1, wherein said positioning the text line of the text picture to be detected by using a key point positioning algorithm to obtain text line positioning information, specifically:

3. The handwritten text detection method according to claim 1, wherein the original text lines include oblique text lines and text lines with inconsistent height, and affine transformation correction is performed on the original text lines in the text picture to be detected according to the text line positioning information to obtain corrected text lines, specifically:

4. The method according to claim 1, wherein said segmenting the single character of the corrected text line according to horizontal projection to obtain a candidate character region specifically comprises:

5. The method according to claim 1, wherein the calculating an average width of bounding boxes of an entire row of characters in the candidate character region, and merging the bounding boxes of the entire row of characters according to the average width to obtain a final character detection result, specifically comprises:

6. A handwritten text detection device, comprising:

7. The handwritten text detection device of claim 6, wherein said correction module is specifically configured to:

8. The device for detecting handwritten text according to claim 6, wherein said segmentation module is specifically configured to: after binarization processing is carried out on the corrected text line, accumulating pixel values in the corrected text line in the horizontal direction to obtain a wave line;

9. The handwritten text detection device of claim 6, wherein said merging module is specifically configured to:

10. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform a method for handwritten text detection as claimed in any of claims 1 to 5.