CN111967474B

CN111967474B - Text line character segmentation method and device based on projection

Info

Publication number: CN111967474B
Application number: CN202010931307.9A
Authority: CN
Inventors: 王玉娇
Original assignee: Luster LightTech Co Ltd
Current assignee: Luster LightTech Co Ltd
Priority date: 2020-09-07
Filing date: 2020-09-07
Publication date: 2024-04-26
Anticipated expiration: 2040-09-07
Also published as: CN111967474A

Abstract

The application belongs to the technical field of image recognition, and in particular relates to a projection-based text line character segmentation method and device. In image recognition, the recognition rate and accuracy of existing optical character recognition technology urgently need to be improved. The application provides a projection-based text line character segmentation method and device. The method determines the actual width and actual height of each single character through horizontal projection and vertical projection, so that the upper and lower boundaries of the character are judged more accurately and robustly; correcting slanted (italic) fonts extends the range of text to which character segmentation can be applied. The character projection data of the application is a weighted sum of the grayscale image vertical projection curve, the binary image vertical projection curve and the edge intensity difference variance projection curve, which improves the accuracy and reliability of character boundary judgment, supports accurate and fast character segmentation, and is equally effective for segmenting slightly touching (adhered) characters and adhered special characters.

Description

Text line character segmentation method and device based on projection

Technical Field

The application relates to the technical field of image recognition, in particular to a text line character segmentation method and device based on projection.

Background

In the technical field of image recognition, and in optical character recognition in particular, characters differ in width and height, so they cannot be segmented into cells of equal width and height; instead, each character must be segmented accurately so that it can be recognized efficiently and accurately.

Locating the upper, lower, left and right boundaries of a single character more accurately and efficiently, while avoiding over-segmentation or under-segmentation, has long been a challenge in optical character recognition. The character segmentation techniques currently in use mainly include recognition-based segmentation, the horizontal projection method and connected-component analysis, but their recognition rate and accuracy still need to be improved.

Disclosure of Invention

The application provides a projection-based text line character segmentation method and device to address the problem that the recognition rate and accuracy of current character segmentation methods need to be improved.

The technical scheme adopted by the application is as follows:

In a first aspect of the present application, a projection-based text line character segmentation method is provided, comprising the following steps:

Acquiring a text line image to be segmented;

Determining, from the text line image to be segmented, whether its font is slanted (italic); if so, correcting the slanted font and then calculating character projection data; if not, calculating the character projection data directly;

Normalizing the character projection data to obtain normalized character segmentation data;

Segmenting the characters of the text line image to be segmented according to the normalized character segmentation data.
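The normalization and segmentation steps are not specified in detail here. As a minimal sketch, assuming min-max normalization of the character projection curve to [0, 1] and a fixed cut-off (the helper names `normalize` and `segment_columns` and the 0.5 threshold are hypothetical), each column range whose normalized value stays above the threshold can be taken as one character:

```python
def normalize(curve):
    """Min-max normalize a projection curve to [0, 1] (assumed normalization)."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0 for _ in curve]  # flat curve: nothing to separate
    return [(v - lo) / (hi - lo) for v in curve]


def segment_columns(curve, threshold=0.5):
    """Return (start, end) column ranges, end exclusive, where the
    normalized curve is strictly above the threshold."""
    norm = normalize(curve)
    segments, start = [], None
    for x, v in enumerate(norm):
        if v > threshold and start is None:
            start = x  # entering a character
        elif v <= threshold and start is not None:
            segments.append((start, x))  # leaving a character
            start = None
    if start is not None:
        segments.append((start, len(norm)))  # character touches the right edge
    return segments
```

For example, `segment_columns([0, 5, 9, 8, 0, 0, 7, 10, 1, 0])` yields `[(2, 4), (6, 8)]`: only columns 2 to 3 and 6 to 7 exceed half of the peak value.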

Optionally, after the step of acquiring the text line image to be segmented, the method includes:

Preprocessing the text line image to be segmented, where the preprocessing rotates the text line image to correct its orientation, yielding a preprocessed text image in the horizontal direction.

Optionally, the step of performing slanted font correction includes:

Shearing (rotationally deforming) the text line image to be segmented;

Performing vertical projection on the text line image to be segmented to obtain a group of vertical projection curves for each shear angle;

Calculating the horizontal gap G(θ) between characters in each group of vertical projection curves;

Calculating the cumulative pixel average M(θ) of each group of vertical projection curves;

Calculating the font tilt angle $\theta = \max\left(G(\theta) \times M(\theta)\right)$ from the horizontal gap G(θ) between characters and the cumulative pixel average M(θ) of each group of vertical projection curves, that is, the shear angle at which the product G(θ) × M(θ) is largest;

Shearing the text line image to be segmented by the angle θ to obtain a corrected text line image to be segmented.
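The correction steps above can be sketched on a binary text line image (a list of rows, 1 for text pixels). This is a simplified illustration under several assumptions that the text does not fix: the shear moves each row, relative to the bottom row, by its height above the bottom row times tan θ (so a positive θ straightens text leaning to the right); G(θ) is taken as the total width of the empty columns between character runs in the vertical projection, so it bottoms out at 0 rather than going negative; M(θ) is the mean of the top 10% of per-character peaks; and the character screening threshold is 0. All function names are hypothetical.

```python
import math


def shear(img, theta_deg):
    """Shear a binary image: the bottom row stays fixed and each row above it
    moves left, relative to the bottom row, by (rows above bottom) * tan(theta).
    The canvas is widened so that every shifted row fits."""
    h, w = len(img), len(img[0])
    t = math.tan(math.radians(theta_deg))
    offsets = [round(-(h - 1 - y) * t) for y in range(h)]
    base = min(offsets)
    out = [[0] * (w + max(offsets) - base) for _ in range(h)]
    for y, row in enumerate(img):
        shift = offsets[y] - base  # non-negative after removing the minimum offset
        for x, v in enumerate(row):
            out[y][x + shift] = v
    return out


def vertical_projection(img):
    """Column sums of the image."""
    return [sum(col) for col in zip(*img)]


def runs_above(curve, threshold):
    """Column ranges [start, end) where curve > threshold (candidate characters)."""
    runs, start = [], None
    for x, v in enumerate(curve + [threshold]):  # sentinel closes a trailing run
        if v > threshold and start is None:
            start = x
        elif v <= threshold and start is not None:
            runs.append((start, x))
            start = None
    return runs


def score(img, theta_deg, threshold=0, top_fraction=0.1):
    """G(theta) * M(theta) for the image sheared by theta."""
    proj = vertical_projection(shear(img, theta_deg))
    runs = runs_above(proj, threshold)
    # G(theta): assumed here to be the total gap between adjacent runs
    g = sum(b[0] - a[1] for a, b in zip(runs, runs[1:]))
    # M(theta): mean of the largest per-character peaks (top 10% by default)
    peaks = sorted((max(proj[s:e]) for s, e in runs), reverse=True)
    k = max(1, math.ceil(top_fraction * len(peaks)))
    m = sum(peaks[:k]) / k if peaks else 0
    return g * m


def font_tilt_angle(img, angles):
    """The angle among `angles` at which G(theta) * M(theta) is largest."""
    return max(angles, key=lambda a: score(img, a))
```

For two parallel strokes leaning right at 45°, the unsheared projection merges them into one run (G = 0), while a 45° shear stacks each stroke into a single column, maximizing both the gap and the peak height at the same time.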

Optionally, the step of correcting the slanted font includes a coarse positioning correction process and a fine positioning correction process, performed in sequence;

The coarse positioning correction process includes the following steps:

Inputting an angle search range;

Calculating the product G(θ) × M(θ) at each angle;

Selecting the angle at which G(θ) × M(θ) is largest;

Determining the coarse font tilt angle;

The fine positioning correction process includes the following steps:

Calculating the fine positioning search range;

Calculating the product G(θ) × M(θ) at each angle;

Determining the font tilt angle;

Optionally, the step of calculating character projection data includes:

Converting the text line image to be segmented to grayscale to obtain a text line grayscale image, and vertically projecting the grayscale image to obtain a grayscale image vertical projection curve;

Binarizing the text line image to be segmented to obtain a text line binary image, and vertically projecting the binary image to obtain a binary image vertical projection curve;

Applying edge intensity difference variance processing to the text line image to be segmented to obtain a text line edge intensity difference variance image, and projecting that image to obtain an edge intensity difference variance projection curve;

Computing a weighted sum of the grayscale image vertical projection curve, the binary image vertical projection curve and the edge intensity difference variance projection curve to obtain the character projection data.
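The weighted summation can be sketched as follows. The text does not define the edge intensity difference variance image or the weights, so this sketch rests on labeled assumptions: text is brighter than the background in the grayscale input (invert it otherwise), the binary image uses a fixed threshold of 128, the edge curve is a hypothetical stand-in computed as the per-column variance of absolute horizontal intensity differences, each curve is min-max normalized before weighting, and the weights (0.4, 0.3, 0.3) are purely illustrative.

```python
def column_sums(img):
    return [sum(col) for col in zip(*img)]


def minmax(curve):
    """Scale a curve to [0, 1] so that the weights are comparable (assumption)."""
    lo, hi = min(curve), max(curve)
    return [0.0] * len(curve) if hi == lo else [(v - lo) / (hi - lo) for v in curve]


def gray_projection(gray):
    return column_sums(gray)


def binary_projection(gray, thresh=128):
    return column_sums([[1 if p > thresh else 0 for p in row] for row in gray])


def edge_variance_projection(gray):
    """Hypothetical stand-in: per-column variance of |I(y, x+1) - I(y, x)|."""
    h, w = len(gray), len(gray[0])
    edges = [[abs(row[min(x + 1, w - 1)] - row[x]) for x in range(w)] for row in gray]
    out = []
    for col in zip(*edges):
        mean = sum(col) / h
        out.append(sum((v - mean) ** 2 for v in col) / h)
    return out


def character_projection(gray, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three normalized projection curves."""
    curves = [gray_projection(gray), binary_projection(gray), edge_variance_projection(gray)]
    norm = [minmax(c) for c in curves]
    return [sum(w * c[x] for w, c in zip(weights, norm)) for x in range(len(norm[0]))]
```

Because each curve is normalized and the weights sum to 1, the fused curve also lies in [0, 1]; a column where all three curves agree on a stroke reaches the maximum, while a column supported only by the edge curve gets a partial value.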

Optionally, before the step of computing the weighted sum of the grayscale image vertical projection curve, the binary image vertical projection curve and the edge intensity difference variance projection curve to obtain the character projection data, the method further includes:

Applying dilation (expansion) processing to the grayscale image vertical projection curve, the binary image vertical projection curve and the edge intensity difference variance projection curve.

Optionally, the character projection data includes the maximum value data, actual width data and actual height data of each character.
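The maximum value, actual width and actual height of each character can be read off the two projections. A minimal sketch on a binary image, assuming that a character is a run of non-empty columns in the vertical projection and that its height is the extent of non-empty rows in the horizontal projection of those columns only (the function name and output fields are hypothetical):

```python
def character_boxes(binary):
    """Per-character left, width, top, height and peak value, taken from the
    vertical projection (width, peak) and a per-character horizontal
    projection (top, height)."""
    v_proj = [sum(col) for col in zip(*binary)]
    boxes, start = [], None
    for x, val in enumerate(v_proj + [0]):  # trailing 0 closes a run at the right edge
        if val > 0 and start is None:
            start = x
        elif val == 0 and start is not None:
            # horizontal projection restricted to this character's columns
            h_proj = [sum(row[start:x]) for row in binary]
            rows = [y for y, s in enumerate(h_proj) if s > 0]
            boxes.append({"left": start, "width": x - start,
                          "top": rows[0], "height": rows[-1] - rows[0] + 1,
                          "peak": max(v_proj[start:x])})
            start = None
    return boxes
```

Projecting horizontally per character, rather than over the whole line, is what gives each character its own upper and lower boundary instead of the line's overall extent.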

In a second aspect of the present application, a projection-based text line character segmentation device is provided, comprising:

A text line image acquisition module, configured to acquire the text line image to be segmented;

A character projection data calculation module, configured to determine, from the text line image to be segmented, whether its font is slanted; if so, to correct the slanted font and then calculate the character projection data; if not, to calculate the character projection data directly;

A data normalization module, configured to normalize the character projection data to obtain normalized character segmentation data;

And a character segmentation module, configured to segment the characters of the text line image to be segmented according to the normalized character segmentation data.

Optionally, the text line image acquisition module further includes a preprocessing sub-module, configured to rotate the text line image to be segmented to correct its orientation, yielding a preprocessed text image in the horizontal direction.

Optionally, the character projection data calculation module further includes a slanted font correction sub-module and a character projection curve sub-module;

The slanted font correction sub-module is configured to perform the following steps:

Shearing (rotationally deforming) the text line image to be segmented;

Shearing the text line image to be segmented by the angle θ to obtain a corrected text line image to be segmented;

The character projection curve sub-module is configured to perform the following steps:

The technical solution of the application has the following beneficial effects:

In the projection-based text line character segmentation method of the application, the actual width and actual height of each single character are determined through horizontal and vertical projection, so the upper and lower boundaries of the character are judged more accurately and robustly, and slanted font correction extends the range of text to which character segmentation applies. The algorithm has relatively low complexity and segments characters quickly and accurately, which helps improve character recognition accuracy. The character projection data is a weighted sum of the grayscale image vertical projection curve, the binary image vertical projection curve and the edge intensity difference variance projection curve, which improves the accuracy and reliability of character boundary judgment, supports accurate and fast character segmentation, and is equally effective for segmenting slightly touching characters and adhered special characters.

Drawings

To illustrate the technical solution of the present application more clearly, the drawings used in the embodiments are briefly described below; those skilled in the art can derive other drawings from these drawings without inventive effort.

FIG. 1 is a block flow diagram of one embodiment provided by the first aspect of the present application;

FIG. 2 is a schematic diagram of a coarse positioning process and a fine positioning process according to an embodiment of the present application;

FIG. 3 is a schematic diagram of G (θ) and M (θ) in the present application.

In fig. 3, G(θ) is the horizontal gap between characters in the vertical projection curve. For ease of understanding, fig. 3 shows a text line image to be segmented whose text is not slanted; the length between the two vertical dashed lines is G(θ). When the text in the image is slanted, G(θ) becomes smaller or even negative: G(θ) is the horizontal gap between two characters in the vertical projection curve, and when the vertical projections of the two characters overlap, that gap is negative.

For ease of understanding, M(θ) is also labeled in fig. 3. M(θ) is the cumulative pixel average of the vertical projection curve and represents the height information of the text in the text line image to be segmented. When characters are slanted, the vertical accumulation of their pixels is spread over neighboring columns, so the maximum projection value changes. M(θ) is therefore an average of the peak cumulative pixel values of the vertical projection curve. Specifically, the peaks of a certain top percentage of single characters may be averaged, for example 5% to 20%, preferably 10%: among all characters, the 10% with the largest peak cumulative pixel values are selected and their peaks are averaged (the peak of any character in the remaining 90% is smaller than the peak of any character in that 10%).
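The top-percentage averaging that defines M(θ) can be sketched as below, assuming the per-character peak values have already been measured (for example, as the maximum of the vertical projection within each screened character). The 10% default follows the preferred value above; the function name is hypothetical, and at least one character is always kept so that short text lines still yield a value.

```python
import math


def cumulative_pixel_average(char_peaks, top_fraction=0.10):
    """M(theta): mean of the largest `top_fraction` of per-character peak values."""
    if not char_peaks:
        return 0.0
    ranked = sorted(char_peaks, reverse=True)
    k = max(1, math.ceil(top_fraction * len(ranked)))  # keep at least one character
    return sum(ranked[:k]) / k
```

For a line of 20 characters the default keeps the two largest peaks; for fewer than 11 characters it keeps only the single largest.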

In fig. 3, a character screening threshold is used to screen, that is, to distinguish, single characters; it is drawn in fig. 3 as a horizontal line cutting the vertical projection curve. The line cuts the vertical projection curve of each single character at two endpoints, and conversely each pair of endpoints delimits one single character, which is how the threshold screens out single characters. Once single characters have been screened by the threshold, the horizontal gaps between them, and hence the horizontal gap G(θ) between characters in the vertical projection curve, are determined.
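The threshold screening and the resulting gaps can be sketched as follows, assuming the character screening threshold is a fixed value; how G(θ) is then summarized from the list of gaps (sum, mean or minimum) is not specified in the text. A run of columns above the threshold cannot produce a negative gap, so this sketch covers only the non-overlapping case. Function names are hypothetical.

```python
def screen_characters(projection, threshold):
    """Cut the vertical projection with a horizontal line at `threshold`.
    Each pair of crossing points bounds one character: (left, right), inclusive."""
    chars, start = [], None
    for x, v in enumerate(projection):
        if v > threshold and start is None:
            start = x  # left endpoint
        elif v <= threshold and start is not None:
            chars.append((start, x - 1))  # right endpoint
            start = None
    if start is not None:
        chars.append((start, len(projection) - 1))
    return chars


def horizontal_gaps(chars):
    """Number of columns strictly between each pair of adjacent characters."""
    return [b[0] - a[1] - 1 for a, b in zip(chars, chars[1:])]
```

For example, with the projection `[0, 3, 6, 2, 0, 0, 5, 7, 1, 0, 4, 0]` and a threshold of 1, the characters are `[(1, 3), (6, 7), (10, 10)]` and the gaps are `[2, 2]`; note that column 8 (value 1) falls on the threshold and is counted as background.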

Detailed Description

Reference will now be made in detail to the embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The embodiments described below do not represent all embodiments consistent with the application; they are merely examples of systems and methods consistent with some aspects of the application as set forth in the claims.

Referring to fig. 1, a flow diagram of one embodiment of the first aspect of the present application is shown.

Acquiring a text line image to be segmented;

In this embodiment, it is first determined whether the font in the text line image to be segmented is slanted, which prevents segmentation deviations and errors caused by italics and facilitates accurate segmentation. The character projection data is then calculated, providing reliable data support for accurate segmentation.

In this embodiment, preprocessing effectively reduces the probability of recognition errors: rotation-correction preprocessing yields a preprocessed text image in the horizontal direction, which substantially improves the accuracy and efficiency of character segmentation.

Optionally, the step of performing slanted font correction includes:

Shearing (rotationally deforming) the text line image to be segmented;

In this embodiment, slanted font correction requires calculating the horizontal gap G(θ) between characters in the vertical projection curve and the cumulative pixel average M(θ) of the vertical projection curve; both are marked in fig. 3 for illustration. From the font tilt angle $\theta = \max\left(G(\theta) \times M(\theta)\right)$, the shear angle needed for slanted font correction can be calculated quickly and accurately, yielding a corrected text line image to be segmented that provides a good basis for the subsequent steps.

In the present application, "rotational deformation" and "rotation" have the same meaning. Unlike ordinary rotation about a point or a line, they denote a shear mapping (shear transformation), for which the Chinese term is rendered as "miscut transformation" or "shear". Each single character has a circumscribed parallelogram; when the character is not slanted, this circumscribed parallelogram is a rectangle. The rotational deformation keeps the lengths of the upper and lower bases of the parallelogram unchanged. For ease of understanding, the lower base can be held fixed while the upper base translates horizontally, carrying the other two sides with it. The distance between the upper and lower bases stays constant throughout the rotational deformation.

In the present embodiment, the "=" in the font tilt angle $\theta = \max\left(G(\theta) \times M(\theta)\right)$ does not mean "equal to". The expression means that, during the rotational deformation, the shear angle θ at which G(θ) × M(θ) reaches its maximum is the font tilt angle; in other words, $\theta = \arg\max_{\theta}\, G(\theta)\, M(\theta)$. This holds because only when G(θ) × M(θ) is at its maximum is the circumscribed parallelogram of each single character a rectangle, so the corresponding θ is the font tilt angle.

The coarse positioning correction process includes the following steps:

Inputting an angle search range;

Calculating the product G(θ) × M(θ) at each angle;

Determining the coarse font tilt angle;

The fine positioning correction process includes the following steps:

Calculating the fine positioning search range;

Calculating the product G(θ) × M(θ) at each angle;

Determining the font tilt angle;

Referring to fig. 2, the coarse positioning correction process effectively reduces the amount of computation and improves the efficiency of slanted font correction: it quickly narrows the font tilt angle down to a relatively small range. The fine positioning search range is then calculated within that range, the angle at which G(θ) × M(θ) is largest is selected and determined as the font tilt angle, and the slanted font correction is complete.
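The coarse-to-fine search can be sketched generically for any scoring function, such as G(θ) × M(θ). The step sizes (5° coarse, 0.5° fine), the choice of one coarse step on either side of the coarse optimum as the fine search range, and the function name are all assumptions.

```python
def coarse_to_fine(score, lo, hi, coarse_step=5.0, fine_step=0.5):
    """Two-stage grid search for the angle in [lo, hi] maximizing score(angle)."""
    def grid(a, b, step):
        n = int(round((b - a) / step))
        return [a + i * step for i in range(n + 1)]

    # Coarse stage: sample the whole input range sparsely.
    coarse = max(grid(lo, hi, coarse_step), key=score)
    # Fine stage: one coarse step either side of the coarse optimum, clipped to [lo, hi].
    f_lo, f_hi = max(lo, coarse - coarse_step), min(hi, coarse + coarse_step)
    return max(grid(f_lo, f_hi, fine_step), key=score)
```

Over [-30°, 30°] this evaluates the score at 13 + 21 = 34 angles instead of the 121 that an exhaustive 0.5° grid needs. The trade-off is that if the score has a narrow peak far from the best coarse sample, the coarse stage can settle on the wrong neighborhood, so the coarse step must be small relative to the width of the peak of G(θ) × M(θ).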

Optionally, the step of calculating character projection data includes:

In this embodiment, to further reduce the error rate of character recognition and segmentation, the character projection data is obtained as a weighted sum of three projection curves; such data reflects character features more comprehensively and accurately, which improves the accuracy of character recognition and segmentation.

The expansion processing in this embodiment is the dilation operation of mathematical morphology, a basic image processing operation. Dilation here reconnects slightly broken projection curves, avoiding over-segmentation of characters.
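Dilation of a projection curve can be sketched as a one-dimensional grayscale dilation with a flat window, where each value is replaced by the maximum in its neighborhood. The window radius is an assumed parameter and the function name is hypothetical.

```python
def dilate_1d(curve, radius=1):
    """Grayscale dilation with a flat window of width 2*radius + 1:
    each sample becomes the maximum of its neighborhood (clipped at the ends)."""
    n = len(curve)
    return [max(curve[max(0, x - radius):min(n, x + radius + 1)]) for x in range(n)]
```

With radius 1, the curve `[0, 4, 5, 0, 6, 3, 0, 0, 0, 2]` becomes `[4, 5, 5, 6, 6, 6, 3, 0, 2, 2]`: the single-column dip at index 3 (a slightly broken stroke) is filled, while the three-column gap at indices 6 to 8 shrinks to one column. Because every true zero gap narrows by 2 × radius, the radius should stay below half of the narrowest real inter-character gap.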

rotationally deforming the text line image to be segmented;

When projection is used for text line character segmentation, the resulting projection curve depends on how the data features are controlled and how key features are extracted in earlier stages. The projection curve may be generated by fusing several feature images and then projecting the fused image, or it may be any variant of the projection curves of the individual key feature images, including but not limited to the grayscale projection curve, the binary projection curve, the edge feature difference variance projection curve, and their weighted sum. Therefore, any calculation based on grayscale features, or on features that show the difference between characters and background, together with any method of generating a weighted-sum projection curve from them, falls within the technical scope of the application; where the execution strategy is the same, it also falls within the protection scope of the application.

The detailed description above gives only a few examples under the general inventive concept and does not limit the scope of the present application. Any other embodiment that a person skilled in the art derives from the solution of the application without inventive effort falls within the protection scope of the application.
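The equivalence between the two options above (fusing the feature images first, or weighting their individual projection curves) follows from the linearity of vertical projection. It holds for curves that are plain column sums, such as the grayscale and binary projections, provided the same weights are used and no per-curve nonlinear step such as normalization or dilation is applied in between. Writing $P(I)(x) = \sum_{y} I(y, x)$ for the vertical projection of an image $I$ and $w_k$ for the weights (notation introduced here for illustration):

```latex
P\Big(\sum_{k} w_k I_k\Big)(x)
  = \sum_{y} \sum_{k} w_k\, I_k(y, x)
  = \sum_{k} w_k \sum_{y} I_k(y, x)
  = \sum_{k} w_k\, P(I_k)(x)
```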

Claims

1. A projection-based text line character segmentation method, characterized by comprising the following steps:

Acquiring a text line image to be segmented;

Segmenting the characters of the text line image to be segmented according to the normalized character segmentation data;

wherein the step of performing the tilt font correction includes:

Rotationally deforming (shearing) the text line image to be segmented;

Calculating the font tilt angle $\theta = \max\left(G(\theta) \times M(\theta)\right)$ from the horizontal gap G(θ) between characters in each group of vertical projection curves and the cumulative pixel average M(θ) of each group of vertical projection curves; the font tilt angle θ = max(G(θ) × M(θ)) means that the rotational deformation angle θ at which G(θ) × M(θ) reaches its maximum during the rotational deformation is the font tilt angle;

2. The projection-based text line character segmentation method according to claim 1, characterized by comprising, after the step of acquiring a text line image to be segmented:

3. The projection-based text line character segmentation method according to claim 1, wherein the step of performing the oblique font correction includes a coarse positioning correction process and a fine positioning correction process performed sequentially;

the coarse positioning correction process comprises the following steps:

Inputting an angle search range;

Calculating the product of G (theta) and M (theta) under each angle;

Determining the coarse font tilt angle;

the accurate positioning correction process comprises the following steps:

Calculating the fine positioning search range;

Calculating the product of G (theta) and M (theta) under each angle;

determining a font tilt angle;

4. The projection-based text line character segmentation method according to any one of claims 1 to 3, wherein the step of calculating character projection data includes:

5. The projection-based text line character segmentation method according to claim 4, further comprising, before the step of computing the weighted sum of the grayscale image vertical projection curve, the binary image vertical projection curve and the edge intensity difference variance projection curve to obtain the character projection data:

6. The projection-based text line character segmentation method according to any one of claims 1 to 3 or 5, wherein the character projection data includes the maximum value data, actual width data and actual height data of the characters.

7. A projection-based text line character segmentation apparatus, the apparatus comprising:

The character segmentation module is used for carrying out character segmentation on the text line image to be segmented according to the normalized character segmentation data;

the character projection data calculation module further comprises an inclined character correction sub-module and a character projection curve sub-module;

rotationally deforming the text line image to be segmented;

8. The projection-based text line character segmentation device according to claim 7, wherein the text line image acquisition module further comprises a preprocessing sub-module configured to rotate the text line image to be segmented to correct its orientation, yielding a preprocessed text image in the horizontal direction.