CN112560847A - Image text region positioning method and device, storage medium and electronic equipment - Google Patents
- Publication number
- CN112560847A (application number CN202011561668.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V10/22 — Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
- G06V10/30 — Noise filtering
- G06V30/153 — Segmentation of character regions using recognition of characters or words
- G06V30/10 — Character recognition
(all within G — Physics; G06 — Computing, calculating or counting; G06V — Image or video recognition or understanding)
Abstract
The application provides an image text region positioning method and apparatus, a storage medium, and electronic equipment. For a text image of the plain text type, the text image is subjected to expansion processing so that adjacent characters are connected into text line connected regions, and the image text regions are determined from the circumscribed rectangles of those connected regions. For a text image of the text straight line staggered type, the image text regions are determined by detecting the straight-line frames in the text image. For a text image of the complex background layout type, single character frames are recognized in the text image, merged into text line connected regions, and the image text regions are determined from the text line connected regions together with the detected straight-line frames. In this way, the upper, lower, left and right edge positions of each text line in the text image are located accurately by recognizing the straight-line frames and/or the circumscribed rectangles of the connected regions, so that image text region positioning is universal across all three types of text image.
Description
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method and an apparatus for locating an image text region, a storage medium, and an electronic device.
Background
Image text recognition is in wide demand across many fields, with applications including identity card recognition, license plate number recognition, express waybill recognition, and bank card number recognition. Image text recognition refers to recognizing the text present in a text image; text images are generally classified into plain text images, text straight line staggered images (e.g., form images), and complex background layout images (e.g., ticket images). Image text region positioning is a prerequisite for image text recognition.
At present, the mainstream scheme for positioning an image text region is image pixel projection analysis, which works as follows: the image is binarized so that characters are black and the background is white; the pixel points are projected horizontally; the number of black pixel points on each row is counted to obtain a pixel distribution histogram; and, given a preset threshold, the start and end of each peak in the histogram are taken as the upper and lower boundaries of a text line.
This existing scheme can only locate the upper and lower edge positions of each text line in a plain text image; it has difficulty locating the left and right edge positions.
Disclosure of Invention
The application provides a method and a device for positioning an image text region, a storage medium and electronic equipment, aiming at improving the accuracy and the universality of the positioning of the image text region.
In order to achieve the above object, the present application provides the following technical solutions:
a method for locating a text region of an image, comprising:
acquiring a text image to be positioned, and determining the image type of the text image to be positioned; the image category comprises a plain text type, a text straight line staggered type or a complex background layout type;
if the image type of the text image to be positioned is a plain text type, performing image preprocessing on the text image to be positioned, performing expansion processing on the preprocessed text image to obtain a target text image, identifying each text line connected region in the target text image, determining the coordinate values of the circumscribed rectangle of each text line connected region, and determining the text line regions in the text image to be positioned based on those coordinate values; the pixel values of adjacent pixel points in each text line connected region are the same;
if the image type of the text image to be positioned is a text straight line staggered type, performing image preprocessing on the text image to be positioned, performing horizontal line detection and vertical line detection on the text image to be positioned after the image preprocessing, determining a plurality of rectangles based on each horizontal line and each vertical line obtained by the detection, and determining a text line area in the text image to be positioned according to coordinate values of the rectangles;
if the image type of the text image to be positioned is a complex background layout type, inputting the text image to be positioned into a pre-constructed single character recognition model to obtain a predicted coordinate value and a confidence coefficient for the single character frame corresponding to each single character in the text image; determining a plurality of target single character frames from the single character frames based on their confidence coefficients; merging target single character frames that are adjacent in the horizontal direction to obtain a plurality of text line connected regions; performing horizontal line detection and vertical line detection on the text image to be positioned; and determining the text line regions in the text image to be positioned according to the text line connected regions and the detected horizontal and vertical lines.
Optionally, the above method, where the image preprocessing is performed on the text image to be positioned, includes:
carrying out graying processing on the text image to be positioned to obtain a grayed image;
filtering the grayed image to obtain a filtered image;
carrying out self-adaptive binarization processing on the filtered image to obtain a binarized image;
and carrying out inversion processing on the pixel value of each pixel point in the binary image.
Optionally, in the method, the filtering the grayed image to obtain a filtered image includes:
sliding the center of a preset filtering sliding window over each pixel point in the grayed image;
and, when the center of the filtering sliding window slides to a pixel point in the grayed image, selecting the preset filtering calculation formula corresponding to the noise type of the text image to be positioned, calculating the filtered gray value within the current filtering sliding window based on the selected formula, and taking the calculated filtered gray value as the pixel value of that pixel point.
Optionally, the expanding the text image to be positioned after the image preprocessing to obtain the target text image includes:
based on a first sliding window, performing expansion processing on the preprocessed text image to be positioned; the width of the first sliding window is determined according to the spacing between adjacent characters in the text image to be positioned, and the height of the first sliding window is determined according to the line spacing of text lines in the text image to be positioned.
Optionally, the above method, where horizontal line detection is performed on the text image to be positioned after image preprocessing, includes:
based on a preset second sliding window, carrying out corrosion treatment on the text image to be positioned after image preprocessing to obtain a first corrosion image;
based on a preset third sliding window, performing expansion processing on the first corrosion image to obtain a first expansion image;
identifying respective horizontal connected regions in the first dilated image and determining a circumscribed rectangle for each of the horizontal connected regions;
and, for each horizontal connected region, calculating the coordinates of the two end points of the corresponding horizontal line from the coordinates of the circumscribed rectangle of that horizontal connected region.
Optionally, the method for detecting a vertical line of the text image to be positioned after the image preprocessing includes:
based on a preset fourth sliding window, carrying out corrosion treatment on the text image to be positioned after image preprocessing to obtain a second corrosion image;
based on a preset fifth sliding window, performing expansion processing on the second corrosion image to obtain a second expansion image;
identifying each vertical connected region in the second expansion image, and determining a circumscribed rectangle of each vertical connected region;
and, for each vertical connected region, calculating the coordinates of the two end points of the corresponding vertical line from the coordinates of the circumscribed rectangle of that vertical connected region.
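The two optional claims above apply the same erode-then-dilate pattern with differently shaped windows. The following Python sketch of horizontal-line detection is illustrative only and is not part of the patent disclosure: the window length, border handling, and per-row endpoint extraction are simplifying assumptions (the claims extract endpoints from circumscribed rectangles of connected regions), and vertical lines would be obtained by the same routine on the transposed image.

```python
def detect_horizontal_lines(binary, min_len=3):
    """Morphological horizontal-line detection on a 0/255 image given as nested lists."""
    h, w, r = len(binary), len(binary[0]), min_len // 2
    # Erosion with a 1 x (2r+1) horizontal window: a pixel survives only if every
    # pixel under the window is white, so short character strokes vanish.
    eroded = [[255 if x - r >= 0 and x + r < w and
               all(binary[y][j] == 255 for j in range(x - r, x + r + 1)) else 0
               for x in range(w)] for y in range(h)]
    # Dilation with the same window restores the original length of surviving runs.
    dilated = [[255 if any(eroded[y][j] == 255
               for j in range(max(0, x - r), min(w, x + r + 1))) else 0
               for x in range(w)] for y in range(h)]
    lines = []
    for y in range(h):
        x = 0
        while x < w:
            if dilated[y][x] == 255:
                x0 = x
                while x < w and dilated[y][x] == 255:
                    x += 1
                lines.append(((x0, y), (x - 1, y)))  # two endpoints of one line
            else:
                x += 1
    return lines
```

In practice a library routine such as OpenCV's `erode`/`dilate` with a wide structuring element would replace the hand-rolled loops; the sketch only shows the shape of the computation.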
In the foregoing method, optionally, the determining a plurality of target single word boxes from the single word boxes based on the confidence of each single word box includes:
for each single character frame, if the confidence coefficient of the single character frame is not less than a preset confidence coefficient threshold value, determining the single character frame as an initial single character frame;
forming the initial single word frames into a single word frame set;
selecting a first single character frame from the current single character frame set; the first single character frame is an initial single character frame with the highest confidence level in each initial single character frame contained in the current single character frame set;
for each remaining initial single character frame in the single character frame set, calculating its area overlap rate with the first single character frame, and deleting that initial single character frame from the set if the overlap rate is greater than a preset overlap threshold;
determining the first single character frame as a target single character frame, and judging whether the current single character frame set is an empty set;
and if the current single character frame set is not the empty set, returning to execute the step of selecting the first single character frame from the current single character frame set until the current single character frame set is the empty set.
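The selection procedure described in the steps above is the familiar non-maximum suppression. The Python sketch below is illustrative only: the use of intersection-over-union as the "area overlap rate" and the default thresholds are assumptions, since the claim fixes neither.

```python
def box_area(box):
    """Area of an axis-aligned box given as (x0, y0, x1, y1)."""
    return (box[2] - box[0]) * (box[3] - box[1])

def iou(a, b):
    """Area overlap rate of two boxes, computed here as intersection over union."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = box_area(a) + box_area(b) - inter
    return inter / union if union else 0.0

def select_target_boxes(boxes, conf_threshold=0.5, overlap_threshold=0.5):
    """Non-maximum suppression over (confidence, box) predictions, following the
    claim: keep boxes above the confidence threshold, then repeatedly take the
    most confident remaining box and delete any box overlapping it too much."""
    pool = [(c, b) for c, b in boxes if c >= conf_threshold]  # initial boxes
    targets = []
    while pool:
        pool.sort(key=lambda cb: cb[0], reverse=True)
        best_conf, best_box = pool.pop(0)                     # highest confidence
        targets.append(best_box)
        pool = [(c, b) for c, b in pool if iou(best_box, b) <= overlap_threshold]
    return targets
```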
An image text region positioning apparatus comprising:
the acquisition unit is used for acquiring a text image to be positioned and determining the image type of the text image to be positioned; the image category comprises a plain text type, a text straight line staggered type or a complex background layout type;
the first positioning unit is configured to, if the image type of the text image to be positioned is a plain text type, perform image preprocessing on the text image to be positioned, perform expansion processing on the preprocessed text image to obtain a target text image, identify each text line connected region in the target text image, determine the coordinate values of the circumscribed rectangle of each text line connected region, and determine the text line regions in the text image to be positioned based on those coordinate values; the pixel values of adjacent pixel points in each text line connected region are the same;
the second positioning unit is configured to, if the image type of the text image to be positioned is a text straight line staggered type, perform image preprocessing on the text image to be positioned, perform horizontal line detection and vertical line detection on the preprocessed text image, determine a plurality of rectangles based on the detected horizontal and vertical lines, and determine the text line regions in the text image to be positioned according to the coordinate values of the rectangles;
and the third positioning unit is configured to, if the image type of the text image to be positioned is a complex background layout type, input the text image to be positioned into a pre-constructed single character recognition model to obtain a predicted coordinate value and a confidence coefficient for the single character frame corresponding to each single character, determine a plurality of target single character frames from the single character frames based on their confidence coefficients, merge target single character frames adjacent in the horizontal direction to obtain a plurality of text line connected regions, perform horizontal line detection and vertical line detection on the text image to be positioned, and determine the text line regions according to the text line connected regions and the detected horizontal and vertical lines.
A storage medium, the storage medium comprising stored instructions, wherein when the instructions are executed, a device in which the storage medium is located is controlled to execute the above-mentioned image text region location method.
An electronic device comprising a memory, and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by one or more processors to perform the method for image text region location.
Compared with the prior art, the method has the following advantages:
the application provides a method and a device for positioning an image text region, wherein the method comprises the following steps: aiming at different types of text images, different image text region positioning strategies are adopted, and expansion processing is carried out on the text images of the pure text type, so that adjacent characters are connected into a text line connected region, the circumscribed rectangle of the text line connected region is further determined, and the region of the text in the text images is positioned; the method comprises the steps of detecting a straight-line frame in a text image to position a region with text in the text image, identifying a single-word frame corresponding to each single word in the text image based on a single-word identification model for the text image with a complex background layout type, combining the single-word frames into a text line communication region, and detecting the straight-line frame in the text image to position the region with text in the text image through the straight-line frame and the text line communication region. Therefore, according to the technical scheme provided by the application, the positions of the upper edge, the lower edge, the left edge and the right edge of each text line in the text image are accurately positioned by identifying the straight-line framework and/or the circumscribed rectangle of the communicated region in the text image, and different image text region positioning strategies are adopted for different types of text images, so that the image text region positioning of each type of text image has universality.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application, and that those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flowchart illustrating a method for locating a text region in an image according to the present disclosure;
FIG. 2 is a flowchart illustrating a method for locating a text region in an image according to the present disclosure;
FIG. 3 is a flowchart illustrating a method for locating a text region in an image according to another embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a method for locating a text region in an image according to another embodiment of the present disclosure;
FIG. 5 is a flowchart illustrating a method for locating a text region in an image according to the present disclosure;
FIG. 6 is a schematic structural diagram of an apparatus for locating an image text region according to the present application;
fig. 7 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The application is operational with numerous general purpose or special purpose computing device environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multi-processor apparatus, distributed computing environments that include any of the above devices or equipment, and the like.
The embodiment of the application provides an image text region positioning method that can be applied on multiple system platforms; its execution subject may run on a computer terminal or on the processor of various mobile devices. A flowchart of the method is shown in fig. 1, and the method specifically includes the following steps:
s101, obtaining a text image to be positioned, and determining the image type of the text image to be positioned.
A text image to be positioned is acquired, and the image type of the text image to be positioned is determined, where the image type is a plain text type, a text straight line staggered type, or a complex background layout type.
Optionally, the specific process of determining the image type of the text image to be positioned includes: receiving the image type uploaded by a user, or inputting the text image to be positioned into a pre-constructed image recognition model and obtaining the image type output by the model. Optionally, the image recognition model may be a classification model; for its construction, refer to the construction process of an existing convolutional neural network classification model.
And S102, if the image type of the text image to be positioned is a plain text type, performing image preprocessing on the text image to be positioned.
If the image type of the text image to be positioned is a plain text type, image preprocessing is performed on the text image to be positioned. Optionally, the image preprocessing includes graying, filtering, adaptive binarization and pixel inversion, so as to enhance the image quality of the text image to be positioned.
Referring to fig. 2, the process of image preprocessing for the text image to be positioned specifically includes:
s201, carrying out gray processing on the text image to be positioned to obtain a gray image.
Graying processing is performed on the text image to be positioned to obtain its grayed image. The specific graying process is as follows: each pixel point in the text image to be positioned is converted according to a preset graying formula, and in the resulting grayed image each pixel point represents a depth of gray by a value between 0 and 255.
Optionally, the preset graying formula is as follows:
GRAY = R × 0.299 + G × 0.587 + B × 0.114
wherein R, G and B represent the red, green and blue values respectively, and GRAY is the resulting gray value.
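As a quick illustration (not part of the patent disclosure), the graying formula can be applied per pixel as follows, assuming the image is represented as rows of (R, G, B) tuples:

```python
def to_gray(r, g, b):
    """Weighted grayscale value of one pixel (the GRAY formula above)."""
    return r * 0.299 + g * 0.587 + b * 0.114

def grayscale_image(rgb_image):
    """Convert an H x W image of (R, G, B) tuples into an H x W grid of
    gray values, each rounded to an integer in [0, 255]."""
    return [[round(to_gray(r, g, b)) for (r, g, b) in row] for row in rgb_image]
```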
S202, filtering the gray image to obtain a filtered image.
Filtering the grayed text image to be positioned, that is, filtering the grayed image to obtain a filtered image of the text image to be positioned, wherein the specific filtering process may include:
sliding each pixel point in the gray level image by using the center of a preset filtering sliding window;
when the center of the filtering sliding window slides to a pixel point in the gray image, based on the noise type of the text image to be positioned, a preset filtering calculation formula corresponding to the noise type is selected, based on the selected filtering calculation mode, the filtering gray value in the current filtering sliding window is calculated, and the calculated filtering gray value is used as the pixel value of the pixel point.
In the method provided by this embodiment, the center of the preset filtering sliding window is slid over each pixel point in the grayed image; that is, the filtering sliding window slides across the grayed image in a preset sliding manner so that its center passes over every pixel point. The preset sliding manner is simply a predetermined traversal order and is not limited here.
In the method provided by this embodiment, each time the window center slides to a pixel point, the preset filtering calculation formula corresponding to the noise type of the text image to be positioned is selected, the filtered gray value within the current filtering sliding window is calculated with the selected formula, and that value is taken as the pixel value of the pixel point. If the noise type of the text image to be positioned is white noise, the gray value in each filtering sliding window is calculated with a Gaussian filtering formula; if the noise type is salt-and-pepper noise, it is calculated with a median filtering formula.
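A minimal Python sketch of this noise-type-dependent filtering follows. It is illustrative only: the 3 × 3 Gaussian kernel weights, the border handling, and the noise-type labels are assumptions, since the embodiment names the formula families but not their parameters.

```python
import statistics

def median_filter_at(gray, y, x, k=3):
    """Median of the k x k window centred on (y, x): the usual choice for
    salt-and-pepper noise."""
    h, w, r = len(gray), len(gray[0]), k // 2
    window = [gray[i][j]
              for i in range(max(0, y - r), min(h, y + r + 1))
              for j in range(max(0, x - r), min(w, x + r + 1))]
    return statistics.median(window)

def gaussian_filter_at(gray, y, x):
    """3 x 3 Gaussian-weighted mean (kernel weights assumed here): the usual
    choice for white noise."""
    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    h, w = len(gray), len(gray[0])
    acc = norm = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            i, j = y + dy, x + dx
            if 0 <= i < h and 0 <= j < w:
                acc += kernel[dy + 1][dx + 1] * gray[i][j]
                norm += kernel[dy + 1][dx + 1]
    return acc / norm

def filter_image(gray, noise_type):
    """Slide the window centre over every pixel and apply the filtering formula
    matching the image's noise type, as the embodiment describes."""
    pick = median_filter_at if noise_type == "salt_and_pepper" else gaussian_filter_at
    return [[pick(gray, y, x) for x in range(len(gray[0]))] for y in range(len(gray))]
```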
And S203, performing self-adaptive binarization processing on the filtered image to obtain a binarized image.
Adaptive binarization processing is performed on each pixel point in the filtered image so that the gray value of every pixel point becomes 0 or 255, yielding the binarized image of the text image to be positioned, where 0 represents black and 255 represents white.
Optionally, the specific process of performing adaptive binarization processing on the filtered image includes:
the center of a preset binarization sliding window is slid over each pixel point in the filtered image; each time it slides to a pixel point, the current binarization threshold is computed from the pixel values of all pixel points inside the current binarization sliding window, and the pixel value of the center pixel point is compared with that threshold: if the pixel value of the center pixel point is greater than the binarization threshold, a preset first value is taken as its pixel value, otherwise a preset second value is taken as its pixel value, where the first value is 255 and the second value is 0.
In the method provided by this embodiment, adaptive binarization makes the whole image present only black and white gray values, highlighting the target contours, i.e., the contours of the text lines. Because the binarization threshold of each pixel point is not fixed but is determined by the pixel values inside the binarization sliding window, the threshold is generally higher in brighter image areas and correspondingly lower in darker ones, so the method adapts to images of different brightness, contrast and texture.
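A minimal Python sketch of the adaptive binarization step follows. It is illustrative only: using the window mean as the threshold is one simple choice and an assumption here, since the embodiment says only that the threshold is computed from the pixel values inside the window.

```python
def adaptive_binarize(gray, k=3, first=255, second=0):
    """Per-pixel adaptive binarization: threshold each pixel against the mean
    of its k x k neighbourhood (clamped at the borders)."""
    h, w, r = len(gray), len(gray[0]), k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [gray[i][j]
                      for i in range(max(0, y - r), min(h, y + r + 1))
                      for j in range(max(0, x - r), min(w, x + r + 1))]
            threshold = sum(window) / len(window)   # window-local threshold
            row.append(first if gray[y][x] > threshold else second)
        out.append(row)
    return out
```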
And S204, carrying out inversion processing on the pixel value of each pixel point in the binary image.
The pixel value of each pixel point in the binarized image is inverted: a pixel value equal to the first value is flipped to the second value and vice versa; that is, 255 becomes 0 and 0 becomes 255, so the black and white pixel points of the binarized image swap colors.
S103, performing expansion processing on the text image to be positioned after image preprocessing to obtain a target text image.
Performing expansion processing on the text image to be positioned after image preprocessing to connect adjacent characters in each line into a whole to obtain a target text image, wherein the specific expansion processing process is as follows:
based on the first sliding window, performing expansion processing on the text image to be positioned after image preprocessing; the width of the first sliding window is determined according to the distance between adjacent characters in the text image to be positioned, and the height of the first sliding window is determined according to the line distance of text lines in the image to be positioned.
In the method provided by the embodiment of the application, based on the first sliding window, in a preset sliding manner, the center of the first sliding window is made to slide over each pixel point of the text image to be positioned after the image preprocessing, and when the center of the first sliding window slides to one pixel point of the text image to be positioned after the image preprocessing, the current maximum pixel value in the coverage range of the first sliding window is used as the pixel value of the pixel point corresponding to the center of the first sliding window, so as to realize expansion processing.
In the method provided by the embodiment of the application, the width of the first sliding window is determined according to the distance between adjacent characters in the text image to be positioned, and the height of the first sliding window is determined according to the line distance of the text line in the text image to be positioned.
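The expansion (dilation) step just described, i.e., taking the maximum pixel value under the sliding window, can be sketched in Python as follows. This is an illustration only; the window dimensions would in practice be derived from the character spacing and line spacing as the embodiment states.

```python
def dilate(binary, win_w, win_h):
    """Expansion processing on a 0/255 image: each output pixel is the maximum
    value under a win_w x win_h window centred on it, so nearby white (255)
    character strokes grow together into one connected blob."""
    h, w = len(binary), len(binary[0])
    rx, ry = win_w // 2, win_h // 2
    return [[max(binary[i][j]
                 for i in range(max(0, y - ry), min(h, y + ry + 1))
                 for j in range(max(0, x - rx), min(w, x + rx + 1)))
             for x in range(w)] for y in range(h)]
```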
S104, identifying each text line connected region in the target text image, and determining the coordinate value of the circumscribed rectangle of each text line connected region.
And identifying each text line connected region in the target text image, wherein the pixel values of adjacent pixel points in each text line connected region are the same.
And determining the coordinate values of the circumscribed rectangle of each text line connected region based on each recognized text line connected region; that is, for each text line connected region, the circumscribed outline of the region is determined as the circumscribed rectangle of that text line connected region, and after the circumscribed rectangle is determined, its coordinate values are determined. It should be noted that the coordinate values of the circumscribed rectangle are the coordinate values of its four corners.
Optionally, after the circumscribed rectangle of each text connected region is determined, circumscribed rectangles whose height or width is obviously too large or too small may be further deleted; that is, circumscribed rectangles whose height is not within a preset height range and/or whose width is not within a preset width range are deleted, and circumscribed rectangles whose height is within the preset height range and whose width is within the preset width range are retained.
And S105, determining the text line area in the text image to be positioned based on the coordinate value of the circumscribed rectangle of each text line connected area.
Determining the text line region in the text image to be positioned based on the coordinate values of the circumscribed rectangles of each text line connected region, namely determining one rectangle by the coordinate values of each circumscribed rectangle, wherein the rectangle corresponds to one text line region in the text image to be positioned, and the coordinate values of all circumscribed rectangles can determine all the text line regions in the text image to be positioned.
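Steps S104–S105 can be illustrated with a small connected-component pass. The pure-Python sketch below uses 4-connectivity and a toy image, both of which are assumptions for the example (the patent leaves the labelling method open); each region is reduced to its circumscribed rectangle:

```python
from collections import deque

def connected_boxes(img):
    """Find 4-connected regions of nonzero pixels and return the
    circumscribed rectangle (x_min, y_min, x_max, y_max) of each."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] and not seen[sy][sx]:
                seen[sy][sx] = True
                q = deque([(sy, sx)])
                x0 = x1 = sx
                y0 = y1 = sy
                while q:  # breadth-first flood fill of one region
                    y, x = q.popleft()
                    x0, x1 = min(x0, x), max(x1, x)
                    y0, y1 = min(y0, y), max(y1, y)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    return boxes

# Two separated blobs -> two circumscribed rectangles (text line regions).
img = [[0, 1, 1, 0, 0],
       [0, 1, 1, 0, 1],
       [0, 0, 0, 0, 1]]
boxes = connected_boxes(img)
```

In practice `cv2.connectedComponentsWithStats` or `cv2.findContours` plus `cv2.boundingRect` performs the same job; the loop form only makes the region-to-rectangle mapping explicit.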
And S106, if the image type of the text image to be positioned is a text straight line staggered type, performing image preprocessing on the text image to be positioned.
If the image type of the text image to be positioned is a text straight line staggered type, similar to the text image of a plain text type, image preprocessing needs to be performed on the text image to be positioned, and a specific process of the image preprocessing is shown in fig. 2 for reference, which is not described herein again.
S107, horizontal line detection and vertical line detection are carried out on the text image to be positioned after image preprocessing, and a plurality of rectangles are determined based on each horizontal line and each vertical line obtained through detection.
According to the method provided by the embodiment of the application, horizontal line detection and vertical line detection are carried out on the text image to be positioned after image preprocessing, namely, a straight line framework in the text image to be positioned after image preprocessing is detected.
According to the method provided by the embodiment of the application, horizontal corrosion processing is carried out on the text image to be positioned after image preprocessing, and horizontal expansion processing is carried out after the horizontal corrosion processing, so that the horizontal line in the text image to be positioned after the image preprocessing is detected.
Referring to fig. 3, the process of performing horizontal line detection on the text image to be positioned after image preprocessing specifically includes:
s301, based on a preset second sliding window, carrying out corrosion treatment on the text image to be positioned after image preprocessing to obtain a first corrosion image.
In the method provided by the embodiment of the application, the text image to be positioned after the image preprocessing is subjected to corrosion processing based on the preset second sliding window, that is, the text image to be positioned after the image preprocessing is subjected to horizontal corrosion to obtain the first corrosion image.
It should be noted that the aspect ratio of the second sliding window is greater than the first threshold, optionally, the first threshold may be 30, that is, the aspect ratio of the second sliding window is greater than 30, optionally, the height of the second sliding window satisfies a preset first height range, and the first height range is 1 to 2 pixel points.
According to the method provided by the embodiment of the application, the second sliding window is used for corroding the text image to be positioned after the image preprocessing, so that image elements of vertical lines and other non-horizontal lines can be inhibited.
S302, based on a preset third sliding window, performing expansion processing on the first corrosion image to obtain a first expansion image.
In the method provided in the embodiment of the application, the first erosion image is expanded based on the preset third sliding window, that is, the first erosion image is horizontally expanded to obtain the first expanded image, and it should be noted that a specific process of the expansion processing refers to an existing image expansion process, which is not described herein again.
It should be noted that the aspect ratio of the third sliding window is greater than the second threshold; optionally, the second threshold may be 20, that is, the aspect ratio of the third sliding window is greater than 20. Optionally, the height of the third sliding window satisfies a preset second height range, and the second height range is 1 to 5 pixel points.
S303, identifying each horizontal connected region in the first expansion image, and determining a circumscribed rectangle of each horizontal connected region.
And identifying each horizontal connected region in the first expansion image, wherein the horizontal connected region is a horizontal expansion region, determining the circumscribed outline of the horizontal connected region, and further determining the circumscribed rectangle of each horizontal connected region.
S304, calculating the coordinates of two end points of a horizontal line corresponding to the circumscribed rectangle according to the coordinates of the circumscribed rectangle of the horizontal connected region aiming at each horizontal connected region.
And calculating the coordinates of two end points of the horizontal line corresponding to the circumscribed rectangle according to the coordinates of the circumscribed rectangle of the horizontal connected region aiming at each horizontal connected region.
The specific process of calculating the coordinates of the two end points of the horizontal line corresponding to the circumscribed rectangle according to the coordinates of the circumscribed rectangle of the horizontal connected region for each horizontal connected region comprises the following steps:
calculating the mean value between the ordinate of the upper edge and the ordinate of the lower edge of the circumscribed rectangle of the horizontal connected region, and taking the mean value as the ordinate of both end points of the horizontal line corresponding to the circumscribed rectangle; and taking the abscissa of the left edge of the circumscribed rectangle as the abscissa of the left end point of the horizontal line corresponding to the circumscribed rectangle, and the abscissa of the right edge of the circumscribed rectangle as the abscissa of the right end point. For example, if the ordinate of the upper edge of the circumscribed rectangle is y_up, the ordinate of the lower edge is y_down, the abscissa of the left edge is x_left and the abscissa of the right edge is x_right, the left end point of the horizontal line corresponding to the circumscribed rectangle is (x_left, (y_up + y_down)/2) and the right end point is (x_right, (y_up + y_down)/2).
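The endpoint formulas above reduce to a two-line helper; the function name and the sample coordinates are illustrative:

```python
def horizontal_line_endpoints(x_left, y_up, x_right, y_down):
    """Collapse the circumscribed rectangle of a horizontal connected
    region to a line: the ordinate of both endpoints is the mean of
    the upper and lower edges; the abscissas come from the left and
    right edges."""
    y = (y_up + y_down) / 2
    return (x_left, y), (x_right, y)

left, right = horizontal_line_endpoints(10, 4, 90, 6)
```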
According to the method provided by the embodiment of the application, the vertical corrosion treatment is carried out on the text image to be positioned after the image preprocessing, and the vertical expansion treatment is carried out after the vertical corrosion treatment, so that the vertical line in the text image to be positioned after the image preprocessing is detected.
Referring to fig. 4, the process of performing vertical line detection on the text image to be positioned after image preprocessing specifically includes:
s401, based on a preset fourth sliding window, carrying out corrosion treatment on the text image to be positioned after image preprocessing to obtain a second corrosion image.
In the method provided by the embodiment of the application, the text image to be positioned after the image preprocessing is subjected to corrosion processing based on the preset fourth sliding window, that is, the text image to be positioned after the image preprocessing is subjected to vertical corrosion to obtain the second corrosion image.
It should be noted that the aspect ratio of the fourth sliding window is smaller than the third threshold, optionally, the third threshold may be 1/30, that is, the aspect ratio of the fourth sliding window is smaller than 1/30, and optionally, the width of the fourth sliding window may be 1 pixel.
According to the method provided by the embodiment of the application, the fourth sliding window is used for carrying out corrosion treatment on the text image to be positioned after image preprocessing, and image elements of horizontal lines and other non-vertical lines can be restrained.
S402, based on a preset fifth sliding window, performing expansion processing on the second corrosion image to obtain a second expansion image.
In the method provided in the embodiment of the application, the second erosion image is expanded based on the preset fifth sliding window, that is, the second erosion image is vertically expanded to obtain the second expanded image. It should be noted that the specific process of the expansion processing refers to an existing image expansion process, which is not described herein again.
It should be noted that the aspect ratio of the fifth sliding window is smaller than the fourth threshold; optionally, the fourth threshold may be 1/20, that is, the aspect ratio of the fifth sliding window is smaller than 1/20. Optionally, the width of the fifth sliding window may be smaller than a fifth threshold, and the fifth threshold may be 5 pixel points, that is, the width of the fifth sliding window may be smaller than 5 pixel points.
S403, identifying each vertical connected region in the second expansion image, and determining a circumscribed rectangle of each vertical connected region.
And identifying each vertical connected region in the second expansion image, wherein the vertical connected region is a vertical expansion region, determining the circumscribed outline of the vertical connected region, and further determining the circumscribed rectangle of each vertical connected region.
S404, calculating the coordinates of two end points of a vertical line corresponding to the circumscribed rectangle according to the coordinates of the circumscribed rectangle of each vertical connected region.
And calculating the coordinates of two end points of the vertical line corresponding to the circumscribed rectangle according to the coordinates of the circumscribed rectangle of the vertical connected region aiming at each vertical connected region.
The specific process of calculating the coordinates of the two end points of the vertical line corresponding to the circumscribed rectangle according to the coordinates of the circumscribed rectangle of the vertical connected region for each vertical connected region comprises the following steps:
calculating the average value between the abscissa of the left edge and the abscissa of the right edge of the circumscribed rectangle of the vertical connected region, and taking the average value as the abscissa of both end points of the vertical line corresponding to the circumscribed rectangle; and taking the ordinate of the upper edge of the circumscribed rectangle as the ordinate of the upper end point of the vertical line corresponding to the circumscribed rectangle, and the ordinate of the lower edge as the ordinate of the lower end point. For example, if the ordinate of the upper edge of the circumscribed rectangle is y_up, the ordinate of the lower edge is y_down, the abscissa of the left edge is x_left and the abscissa of the right edge is x_right, the upper end point of the vertical line corresponding to the circumscribed rectangle is ((x_left + x_right)/2, y_up) and the lower end point is ((x_left + x_right)/2, y_down).
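Symmetrically to the horizontal case, the vertical-line endpoints reduce to the following helper; the function name and the sample coordinates are illustrative:

```python
def vertical_line_endpoints(x_left, y_up, x_right, y_down):
    """Collapse the circumscribed rectangle of a vertical connected
    region to a line: the abscissa of both endpoints is the mean of
    the left and right edges; the ordinates come from the upper and
    lower edges."""
    x = (x_left + x_right) / 2
    return (x, y_up), (x, y_down)

upper, lower = vertical_line_endpoints(4, 10, 6, 90)
```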
In the method provided by the embodiment of the application, after the horizontal line and the vertical line of the text image to be positioned after the image preprocessing are detected, a certain blank gap needs to be left for text region segmentation, so that based on each text region, the horizontal line above and/or below the text region is copied, the vertical line on the left and/or right of the text region is copied, the copied horizontal line is moved up or down, and the copied vertical line is moved left or right.
In the method provided by the embodiment of the application, each horizontal line and each vertical line form a straight line frame of the text image to be positioned, and the formed straight line frame divides the text image to be positioned into a plurality of rectangles.
And S108, determining a text line region in the text image to be positioned according to the coordinate value of the rectangle.
And determining the text line region in the text image to be positioned based on the coordinate values of all the rectangles, namely determining one rectangle by the coordinate values of each rectangle, wherein the determined rectangle corresponds to one text line region in the text image to be positioned, and the coordinate values of all the rectangles can determine all the text line regions in the text image to be positioned.
And S109, if the image type of the text image to be positioned is a complex background layout type, inputting the text image to be positioned into a pre-constructed single character recognition model to obtain a coordinate prediction value and a confidence coefficient of a single character frame corresponding to each single character in the text image to be positioned.
In the method provided in the embodiment of the present application, the single character recognition model is constructed in advance, and the construction process of the single character recognition model is referred to in the prior art and is not described herein again.
If the image type of the text image to be positioned is a complex background layout type, inputting the text image to be positioned into a pre-constructed single character recognition model to obtain a coordinate prediction value and a confidence coefficient of a single character frame corresponding to each single character in the text image to be positioned output by the single character recognition model, wherein optionally, the confidence coefficient ranges from 0 to 1, and the higher the numerical value is, the higher the confidence coefficient is.
And S110, determining a plurality of target single character frames from the single character frames based on the confidence coefficient of each single character frame.
Specifically, referring to fig. 5, the process of determining a plurality of target single word frames from each single word frame based on the confidence of each single word frame includes:
s501, aiming at each single character frame, if the confidence coefficient of the single character frame is not smaller than a preset confidence coefficient threshold value, the single character frame is determined to be an initial single character frame.
And determining the single character frame of which the confidence coefficient is not less than a preset confidence coefficient threshold value in each single character frame as an initial single character frame.
And S502, forming a single character frame set by each initial single character frame.
S503, selecting a first single character frame from the current single character frame set; the first single character frame is the initial single character frame with the maximum confidence level in each initial single character frame contained in the current single character frame set.
S504, calculating the area overlapping rate of the initial single character frame and the first single character frame aiming at each residual initial single character frame in the single character frame set.
For each initial single character frame remaining in the single character frame set, the intersection area and the union area of the initial single character frame and the first single character frame are calculated, and the area overlapping rate of the two frames is obtained by dividing the intersection area by the union area.
And S505, judging whether the area overlapping rate of the initial single character frame and the first single character frame is greater than a preset overlapping threshold value or not for each residual initial single character frame in the single character frame set.
And for each remaining initial single character frame in the single character frame set, judging whether the area overlapping rate is greater than a preset overlapping threshold value or not based on the calculated area overlapping rate of the initial single character frame and the first single character frame, if so, executing a step S506, and if not, executing a step S507.
S506, deleting the initial single-word frame from the single-word frame set.
For each initial single character frame remaining in the single character frame set, if the area overlapping rate is greater than the preset overlapping threshold, deleting the initial single character frame from the single character frame set, and executing step S507.
S507, judging whether there remains, in the single character frame set, an initial single character frame whose area overlapping rate with the first single character frame has not been calculated.
It is judged whether there remains, in the single character frame set, an initial single character frame whose area overlapping rate with the first single character frame has not been calculated; if so, the process returns to step S505, and if not, step S508 is executed.
And S508, determining the first single character frame as a target single character frame.
S509, judging whether the current single-word frame set is an empty set.
And judging whether the current single-character frame set is an empty set, if so, directly ending, if not, returning to execute the step S503.
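The loop of steps S501–S509 is, in effect, non-maximum suppression over the single character frames. The sketch below is a minimal illustration under assumed conventions (boxes given as (x0, y0, x1, y1, confidence); the thresholds and sample boxes are illustrative, not values from this application):

```python
def nms(boxes, conf_thresh=0.5, overlap_thresh=0.5):
    """Keep the highest-confidence frame, drop any remaining frame
    whose area overlapping rate (IoU) with it exceeds overlap_thresh,
    and repeat until the set is empty (steps S501-S509)."""
    def iou(a, b):
        ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
        ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        union = area(a) + area(b) - inter
        return inter / union if union else 0.0

    # S501-S502: keep only frames above the confidence threshold.
    remaining = [b for b in boxes if b[4] >= conf_thresh]
    kept = []
    while remaining:                                   # S509
        best = max(remaining, key=lambda b: b[4])      # S503
        remaining.remove(best)
        kept.append(best)                              # S508
        # S504-S507: discard frames overlapping the first frame too much.
        remaining = [b for b in remaining if iou(best, b) <= overlap_thresh]
    return kept

boxes = [(0, 0, 10, 10, 0.9), (1, 1, 11, 11, 0.8),
         (20, 0, 30, 10, 0.7), (0, 0, 5, 5, 0.3)]
kept = nms(boxes)
```

Here the 0.3-confidence frame fails the confidence threshold, and the 0.8-confidence frame overlaps the 0.9-confidence frame too heavily, so two target single character frames survive.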
Optionally, in the method provided in this embodiment of the present application, target single character frames with heights not greater than the preset threshold are deleted.
And S111, merging the target single character frames adjacent in the horizontal direction to obtain a plurality of text line connected regions.
Merging the target single character frames adjacent in the horizontal direction to obtain a plurality of text line connected regions, wherein the specific process includes the following steps:
sequencing the target single character frames according to a preset sequence according to the abscissa of the upper left corner of each target single character frame to obtain a single character frame sequence;
judging whether the distance between the abscissas of the upper left corners of two adjacent target single character frames in the single character frame sequence is greater than a preset first threshold value, and segmenting the sequence between the two target single character frames whose upper-left-corner abscissa distance is greater than the preset first threshold value, so as to obtain a plurality of single character frame sequences;
and taking the left boundary of the first target single character frame in each single character frame sequence as the left boundary of the text line connected region corresponding to the single character frame sequence, taking the right boundary of the last target single character frame in each single character frame sequence as the right boundary of the corresponding text line connected region, taking the minimum value of the upper boundaries of the frames in each single character frame sequence as the upper boundary of the corresponding text line connected region, and taking the maximum value of the lower boundaries of the frames in each single character frame sequence as the lower boundary of the corresponding text line connected region.
In the method provided in this embodiment of the present application, the target single character frames are arranged in a preset order according to the abscissa of the upper left corner of each target single character frame to obtain a single character frame sequence; optionally, the preset order may be ascending order of the abscissas. Whether the distance between the abscissas of the upper left corners of two adjacent target single character frames in the sequence is greater than the preset first threshold is determined; if so, the sequence is segmented between those two target single character frames, so that a plurality of single character frame sequences are obtained. For example, if the upper-left-corner abscissa distance of two adjacent single character frames exceeds the preset first threshold at five positions, the single character frame sequence is finally divided into 6 groups. Each group of single character frame sequence corresponds to one text line connected region: the upper boundary of the text line connected region is determined by the minimum value of the upper boundaries of the frames in the corresponding single character frame sequence, the lower boundary by the maximum value of the lower boundaries, the left boundary by the left boundary of the first target single character frame in the sequence, and the right boundary by the right boundary of the last target single character frame in the sequence.
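The grouping in S111 can be sketched as follows; the frame coordinates and the gap threshold are illustrative, and frames are assumed to be given as (x0, y0, x1, y1):

```python
def merge_into_lines(frames, gap_thresh):
    """Sort single character frames by upper-left abscissa, split
    where the gap between the upper-left abscissas of neighbours
    exceeds gap_thresh, and take the union box of each group as a
    text line connected region."""
    frames = sorted(frames, key=lambda f: f[0])
    groups, current = [], [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        if cur[0] - prev[0] > gap_thresh:
            groups.append(current)
            current = []
        current.append(cur)
    groups.append(current)
    return [(g[0][0],                 # left boundary of first frame
             min(f[1] for f in g),    # minimum upper boundary
             g[-1][2],                # right boundary of last frame
             max(f[3] for f in g))    # maximum lower boundary
            for g in groups]

lines = merge_into_lines(
    [(0, 0, 8, 10), (10, 1, 18, 11), (60, 0, 68, 10)], gap_thresh=20)
```

The first two frames fall within the gap threshold and merge into one text line connected region; the third frame is far enough away to start a second region.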
And S112, carrying out horizontal line detection and vertical line detection on the text image to be positioned.
The specific implementation process of step S112 is as described in step S107, and is not described herein again.
S113, determining a text line region in the text image to be positioned according to each text line connected region and the detected horizontal lines and vertical lines.
After the straight-line frame and each text line connected region in the text image to be positioned are determined, the text line connected regions are divided by the straight-line frame; that is, the detected horizontal lines and vertical lines form the straight-line frame of the text image to be positioned, which divides the text image into a plurality of areas, so that characters belonging to different areas within one text connected region are separated, and a plurality of final text line connected regions are obtained. Each final text line connected region determines a rectangle, the determined rectangle corresponds to one text line region in the text image to be positioned, and the coordinate values of all the rectangles determine all the text line regions in the text image to be positioned.
According to the image text region positioning method provided by the embodiment of the application, aiming at different types of text images, different image text region positioning strategies are adopted, and the text images of pure text types are expanded, so that adjacent characters are connected into a text line communicating region, the circumscribed rectangle of the text line communicating region is further determined, and the region with the text in the text images is positioned; the method comprises the steps of detecting a straight-line frame in a text image to position a region with text in the text image, identifying a single-word frame corresponding to each single word in the text image based on a single-word identification model for the text image with a complex background layout type, combining the single-word frames into a text line communication region, and detecting the straight-line frame in the text image to position the region with text in the text image through the straight-line frame and the text line communication region. By adopting the image text region positioning method provided by the embodiment of the application, the accurate positioning of the upper, lower, left and right edge positions of each text line in the text image is realized by identifying the straight-line frame and/or the circumscribed rectangle of the communicated region in the text image, and the universality of the image text region positioning of each type of text image is realized by adopting different image text region positioning strategies for different types of text images.
Corresponding to the method described in fig. 1, an embodiment of the present application further provides an apparatus for locating an image text region, which is used to implement the method in fig. 1 specifically, and a schematic structural diagram of the apparatus is shown in fig. 6, and specifically includes:
an obtaining unit 601, configured to obtain a text image to be positioned, and determine an image type of the text image to be positioned; the image category comprises a plain text type, a text straight line staggered type or a complex background layout type;
a first positioning unit 602, configured to, if the image type of the text image to be positioned is a pure text type, perform image preprocessing on the text image to be positioned, perform expansion processing on the text image to be positioned after the image preprocessing to obtain a target text image, identify each text line connected region in the target text image, determine a coordinate value of a circumscribed rectangle of each text line connected region, and determine a text line region in the text image to be positioned based on the coordinate value of the circumscribed rectangle of each text line connected region; the pixel values of adjacent pixel points in each text line communication area are the same;
the second positioning unit 603 is configured to, if the image type of the text image to be positioned is a text straight line staggered type, perform image preprocessing on the text image to be positioned, perform horizontal line detection and vertical line detection on the text image to be positioned after the image preprocessing, determine a plurality of rectangles based on each horizontal line and each vertical line obtained by the detection, and determine a text line region in the text image to be positioned according to coordinate values of the rectangles;
a third positioning unit 604, configured to, if the image type of the text image to be positioned is a complex background layout type, input the text image to be positioned into a pre-constructed single character recognition model, obtain a coordinate prediction value and a confidence level of a single character frame corresponding to each single character in the text image to be positioned, determine multiple target single character frames from each single character frame based on the confidence level of each single character frame, merge the target single character frames adjacent to each other in the horizontal direction to obtain multiple text line connected regions, perform horizontal line detection and vertical line detection on the text image to be positioned, and determine a text line region in the text image to be positioned according to each text line connected region and the detected horizontal line and vertical line.
According to the image text region positioning device provided by the embodiment of the application, aiming at different types of text images, different image text region positioning strategies are adopted, and the text images of pure text types are expanded, so that adjacent characters are connected into a text line communicating region, the circumscribed rectangle of the text line communicating region is further determined, and the region where the text exists in the text images is positioned; the method comprises the steps of detecting a straight-line frame in a text image to position a region with text in the text image, identifying a single-word frame corresponding to each single word in the text image based on a single-word identification model for the text image with a complex background layout type, combining the single-word frames into a text line communication region, and detecting the straight-line frame in the text image to position the region with text in the text image through the straight-line frame and the text line communication region. By adopting the image text region positioning device provided by the embodiment of the application, the accurate positioning of the upper, lower, left and right edge positions of each text line in the text image is realized by identifying the straight-line frame and/or the circumscribed rectangle of the communicated region in the text image, and different image text region positioning strategies are adopted for different types of text images, so that the universality of the positioning of the image text regions of the text images of various types is realized.
In an embodiment of the present application, based on the foregoing solution, the first positioning unit 602 and the second positioning unit 603 each include:
the graying subunit is used for performing graying processing on the text image to be positioned to obtain a grayed image;
the filtering subunit is used for carrying out filtering processing on the grayed image to obtain a filtered image;
a binarization subunit, configured to perform adaptive binarization processing on the filtered image to obtain a binarized image;
and the inversion sub unit is used for carrying out inversion processing on the pixel value of each pixel point in the binary image.
In an embodiment of the application, based on the foregoing solution, the filtering subunit performs filtering processing on the grayed image to obtain a filtered image, and is configured to:
sliding each pixel point in the gray level image by using the center of a preset filtering sliding window;
and when the center of the filtering sliding window slides to a pixel point in the gray image, selecting a preset filtering calculation formula corresponding to the noise type based on the noise type of the text image to be positioned, calculating a filtering gray value in the current filtering sliding window based on the selected filtering calculation mode, and taking the calculated filtering gray value as the pixel value of the pixel point.
In an embodiment of the present application, based on the foregoing scheme, the first positioning unit 602 obtains the target text image by performing dilation processing on the preprocessed text image to be positioned, and is configured to:
perform dilation processing on the preprocessed text image to be positioned based on a first sliding window, where the width of the first sliding window is determined by the spacing between adjacent characters in the text image, and its height is determined by the line spacing of the text lines in the image.
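The dilation step can be sketched as below: a pixel becomes foreground when any pixel inside the sliding window is foreground, so horizontally adjacent characters merge into one text-line connected region. Per the sizing rule above, the window is wider than it is tall; the concrete sizes used here are illustrative assumptions.

```python
def dilate(binary, win_w, win_h):
    """Dilate a 0/255 binary image with a win_w x win_h rectangular window."""
    h, w = len(binary), len(binary[0])
    rw, rh = win_w // 2, win_h // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # foreground if any pixel in the (clipped) window is foreground
            hit = any(binary[j][i]
                      for j in range(max(0, y - rh), min(h, y + rh + 1))
                      for i in range(max(0, x - rw), min(w, x + rw + 1)))
            out[y][x] = 255 if hit else 0
    return out
```

With a window wider than the inter-character gap but shorter than the line spacing, characters on the same line fuse while separate lines stay disconnected.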
In an embodiment of the present application, based on the foregoing solution, the second positioning unit 603 performs horizontal line detection on the preprocessed text image to be positioned, and is configured to:
carry out erosion processing on the preprocessed text image to be positioned based on a preset second sliding window to obtain a first eroded image;
perform dilation processing on the first eroded image based on a preset third sliding window to obtain a first dilated image;
identify each horizontal connected region in the first dilated image, and determine the circumscribed rectangle of each horizontal connected region;
and for each horizontal connected region, calculate the coordinates of the two end points of the corresponding horizontal line from the coordinates of the circumscribed rectangle of that region.
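The horizontal-line detection steps above can be sketched as follows: eroding with a wide, one-pixel-high window wipes out character strokes but preserves long table lines, and each surviving run of foreground pixels is a horizontal connected region whose end-point coordinates are read off its extent. The vertical-line pass is symmetric (a tall, narrow window); the window width here is an illustrative assumption.

```python
def erode_horizontal(binary, win_w):
    """A pixel survives only if the whole 1 x win_w window is foreground."""
    h, w = len(binary), len(binary[0])
    r = win_w // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if x - r >= 0 and x + r < w and all(binary[y][i]
                                                for i in range(x - r, x + r + 1)):
                out[y][x] = 255
    return out

def horizontal_line_endpoints(eroded):
    """Return (y, x_start, x_end) for each horizontal run of set pixels."""
    lines = []
    for y, row in enumerate(eroded):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                lines.append((y, start, x - 1))
            else:
                x += 1
    return lines
```

A follow-up dilation with the third window (as in the text) would restore the eroded ends of each line before the endpoints are read off.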
In an embodiment of the present application, based on the foregoing solution, the second positioning unit 603 performs vertical line detection on the preprocessed text image to be positioned, and is configured to:
carry out erosion processing on the preprocessed text image to be positioned based on a preset fourth sliding window to obtain a second eroded image;
perform dilation processing on the second eroded image based on a preset fifth sliding window to obtain a second dilated image;
identify each vertical connected region in the second dilated image, and determine the circumscribed rectangle of each vertical connected region;
and for each vertical connected region, calculate the coordinates of the two end points of the corresponding vertical line from the coordinates of the circumscribed rectangle of that region.
In an embodiment of the application, based on the foregoing solution, the third positioning unit 604 determines a plurality of target single-character boxes from the single-character boxes based on the confidence of each box, and is configured to:
for each single-character box, determine the box as an initial single-character box if its confidence is not less than a preset confidence threshold;
form the initial single-character boxes into a box set;
select a first single-character box from the current box set, the first box being the initial box with the highest confidence among the initial boxes contained in the current set;
for each remaining initial box in the set, calculate the area overlap ratio between that box and the first box, and delete the box from the set if the ratio is greater than a preset overlap threshold;
determine the first box as a target single-character box, and judge whether the current set is empty;
and if the current set is not empty, return to the step of selecting a first single-character box from the current set, until the set is empty.
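The target-box selection described above is essentially non-maximum suppression (NMS). The sketch below assumes boxes given as (x1, y1, x2, y2, confidence) tuples and takes "area overlap rate" to mean intersection area over the smaller box's area; both the representation and the threshold values are illustrative, as the patent does not fix them.

```python
def overlap_ratio(a, b):
    """Intersection area divided by the smaller box's area."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    smaller = min((a[2] - a[0]) * (a[3] - a[1]),
                  (b[2] - b[0]) * (b[3] - b[1]))
    return (ix * iy) / smaller if smaller else 0.0

def select_target_boxes(boxes, conf_thresh=0.5, overlap_thresh=0.7):
    # keep only boxes at or above the confidence threshold (initial boxes)
    pool = [b for b in boxes if b[4] >= conf_thresh]
    targets = []
    while pool:
        # pick the highest-confidence box, then drop boxes overlapping it
        first = max(pool, key=lambda b: b[4])
        pool.remove(first)
        pool = [b for b in pool if overlap_ratio(b, first) <= overlap_thresh]
        targets.append(first)
    return targets
```

Each loop iteration mirrors one pass of the procedure above: select the first box, suppress its heavy overlaps, promote it to a target box, and repeat until the set is empty.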
An embodiment of the present application further provides a storage medium comprising stored instructions, where the instructions, when executed, control the device on which the storage medium resides to perform the image text region positioning method described above.
An electronic device is also provided in an embodiment of the present application, and its structural schematic diagram is shown in fig. 7. The device includes a memory 701, one or more processors 703, and one or more instructions 702, where the one or more instructions 702 are stored in the memory 701 and configured to be executed by the one or more processors 703 to perform the following operations:
acquiring a text image to be positioned, and determining the image type of the text image to be positioned, where the image type is one of a plain text type, a text-and-line interleaved type, and a complex background layout type;
if the image type of the text image to be positioned is the plain text type, performing image preprocessing on the text image, performing dilation processing on the preprocessed image to obtain a target text image, identifying each text-line connected region in the target text image, determining the coordinate values of the circumscribed rectangle of each text-line connected region, and determining the text line regions in the text image based on those coordinate values, where adjacent pixel points within each text-line connected region have the same pixel value;
if the image type of the text image to be positioned is the text-and-line interleaved type, performing image preprocessing on the text image, performing horizontal line detection and vertical line detection on the preprocessed image, determining a plurality of rectangles based on the detected horizontal lines and vertical lines, and determining the text line regions in the text image according to the coordinate values of those rectangles;
if the image type of the text image to be positioned is the complex background layout type, inputting the text image into a pre-constructed single-character recognition model to obtain a predicted coordinate value and a confidence for the single-character box corresponding to each character in the image, determining a plurality of target single-character boxes from the single-character boxes based on the confidence of each box, merging horizontally adjacent target boxes to obtain a plurality of text-line connected regions, performing horizontal line detection and vertical line detection on the text image, and determining the text line regions according to the text-line connected regions and the detected horizontal lines and vertical lines.
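The merging step for the complex-background case (combining horizontally adjacent target boxes into text-line connected regions) can be sketched as a left-to-right sweep over the boxes. The vertical-overlap test and the gap threshold below are illustrative assumptions; the patent does not specify the exact adjacency criterion.

```python
def merge_boxes_into_lines(boxes, max_gap=10):
    """boxes: (x1, y1, x2, y2) tuples. Returns one bounding box per line."""
    lines = []
    for box in sorted(boxes, key=lambda b: b[0]):  # sweep left to right
        for i, line in enumerate(lines):
            v_overlap = min(line[3], box[3]) - max(line[1], box[1])
            h_gap = box[0] - line[2]
            # same text line: vertical extents overlap, horizontal gap small
            if v_overlap > 0 and h_gap <= max_gap:
                lines[i] = (line[0], min(line[1], box[1]),
                            max(line[2], box[2]), max(line[3], box[3]))
                break
        else:
            lines.append(box)
    return lines
```

Each returned rectangle corresponds to one text-line connected region, which the method then intersects with the detected horizontal and vertical lines to fix the final text line regions.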
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Finally, it should also be noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. The terms "comprises," "comprising," and any variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises it.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The foregoing detailed description is directed to a method and an apparatus for locating an image text region, a storage medium, and an electronic device provided by the present application, and a specific example is applied in the detailed description to explain the principles and embodiments of the present application, and the description of the foregoing embodiment is only used to help understand the method and the core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.
Claims (10)
1. A method for locating an area of image text, comprising:
acquiring a text image to be positioned, and determining the image type of the text image to be positioned, where the image type is one of a plain text type, a text-and-line interleaved type, and a complex background layout type;
if the image type of the text image to be positioned is the plain text type, performing image preprocessing on the text image, performing dilation processing on the preprocessed image to obtain a target text image, identifying each text-line connected region in the target text image, determining the coordinate values of the circumscribed rectangle of each text-line connected region, and determining the text line regions in the text image based on those coordinate values, where adjacent pixel points within each text-line connected region have the same pixel value;
if the image type of the text image to be positioned is the text-and-line interleaved type, performing image preprocessing on the text image, performing horizontal line detection and vertical line detection on the preprocessed image, determining a plurality of rectangles based on the detected horizontal lines and vertical lines, and determining the text line regions in the text image according to the coordinate values of those rectangles;
if the image type of the text image to be positioned is the complex background layout type, inputting the text image into a pre-constructed single-character recognition model to obtain a predicted coordinate value and a confidence for the single-character box corresponding to each character in the image, determining a plurality of target single-character boxes from the single-character boxes based on the confidence of each box, merging horizontally adjacent target boxes to obtain a plurality of text-line connected regions, performing horizontal line detection and vertical line detection on the text image, and determining the text line regions according to the text-line connected regions and the detected horizontal lines and vertical lines.
2. The method of claim 1, wherein performing the image preprocessing on the text image to be positioned comprises:
carrying out graying processing on the text image to be positioned to obtain a grayed image;
filtering the grayed image to obtain a filtered image;
carrying out adaptive binarization processing on the filtered image to obtain a binarized image;
and carrying out inversion processing on the pixel value of each pixel point in the binarized image.
3. The method of claim 2, wherein the filtering the grayed image to obtain a filtered image comprises:
sliding the center of a preset filtering sliding window over each pixel point in the grayed image;
and when the center of the filtering sliding window reaches a pixel point in the grayed image, selecting, according to the noise type of the text image to be positioned, the preset filtering formula corresponding to that noise type, calculating the filtered gray value within the current window using the selected formula, and taking the calculated value as the pixel value of that pixel point.
4. The method according to claim 3, wherein performing dilation processing on the preprocessed text image to be positioned to obtain the target text image comprises:
performing dilation processing on the preprocessed text image to be positioned based on a first sliding window, where the width of the first sliding window is determined by the spacing between adjacent characters in the text image, and its height is determined by the line spacing of the text lines in the image.
5. The method according to claim 3, wherein performing horizontal line detection on the preprocessed text image to be positioned comprises:
carrying out erosion processing on the preprocessed text image based on a preset second sliding window to obtain a first eroded image;
performing dilation processing on the first eroded image based on a preset third sliding window to obtain a first dilated image;
identifying each horizontal connected region in the first dilated image, and determining the circumscribed rectangle of each horizontal connected region;
and for each horizontal connected region, calculating the coordinates of the two end points of the corresponding horizontal line from the coordinates of the circumscribed rectangle of that region.
6. The method according to claim 3, wherein performing vertical line detection on the preprocessed text image to be positioned comprises:
carrying out erosion processing on the preprocessed text image based on a preset fourth sliding window to obtain a second eroded image;
performing dilation processing on the second eroded image based on a preset fifth sliding window to obtain a second dilated image;
identifying each vertical connected region in the second dilated image, and determining the circumscribed rectangle of each vertical connected region;
and for each vertical connected region, calculating the coordinates of the two end points of the corresponding vertical line from the coordinates of the circumscribed rectangle of that region.
7. The method of claim 1, wherein determining a plurality of target single-character boxes from the single-character boxes based on the confidence of each box comprises:
for each single-character box, determining the box as an initial single-character box if its confidence is not less than a preset confidence threshold;
forming the initial single-character boxes into a box set;
selecting a first single-character box from the current box set, the first box being the initial box with the highest confidence among the initial boxes contained in the current set;
for each remaining initial box in the set, calculating the area overlap ratio between that box and the first box, and deleting the box from the set if the ratio is greater than a preset overlap threshold;
determining the first box as a target single-character box, and judging whether the current set is empty;
and if the current set is not empty, returning to the step of selecting a first single-character box from the current set, until the set is empty.
8. An apparatus for locating a region of image text, comprising:
an acquisition unit, configured to acquire a text image to be positioned and determine the image type of the text image, where the image type is one of a plain text type, a text-and-line interleaved type, and a complex background layout type;
a first positioning unit, configured to: if the image type of the text image to be positioned is the plain text type, perform image preprocessing on the text image, perform dilation processing on the preprocessed image to obtain a target text image, identify each text-line connected region in the target text image, determine the coordinate values of the circumscribed rectangle of each text-line connected region, and determine the text line regions in the text image based on those coordinate values, where adjacent pixel points within each text-line connected region have the same pixel value;
a second positioning unit, configured to: if the image type of the text image to be positioned is the text-and-line interleaved type, perform image preprocessing on the text image, perform horizontal line detection and vertical line detection on the preprocessed image, determine a plurality of rectangles based on the detected horizontal lines and vertical lines, and determine the text line regions in the text image according to the coordinate values of those rectangles;
and a third positioning unit, configured to: if the image type of the text image to be positioned is the complex background layout type, input the text image into a pre-constructed single-character recognition model to obtain a predicted coordinate value and a confidence for the single-character box corresponding to each character in the image, determine a plurality of target single-character boxes from the single-character boxes based on the confidence of each box, merge horizontally adjacent target boxes to obtain a plurality of text-line connected regions, perform horizontal line detection and vertical line detection on the text image, and determine the text line regions according to the text-line connected regions and the detected horizontal lines and vertical lines.
9. A storage medium comprising stored instructions, wherein the instructions, when executed, control the device on which the storage medium resides to perform the image text region positioning method according to any one of claims 1 to 7.
10. An electronic device, comprising a memory, one or more processors, and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by the one or more processors to perform the image text region positioning method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011561668.5A CN112560847A (en) | 2020-12-25 | 2020-12-25 | Image text region positioning method and device, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112560847A true CN112560847A (en) | 2021-03-26 |
Family
ID=75032624
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011561668.5A Pending CN112560847A (en) | 2020-12-25 | 2020-12-25 | Image text region positioning method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112560847A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102073862A (en) * | 2011-02-18 | 2011-05-25 | 山东山大鸥玛软件有限公司 | Method for quickly calculating layout structure of document image |
CN103034848A (en) * | 2012-12-19 | 2013-04-10 | 方正国际软件有限公司 | Identification method of form type |
CN107633239A (en) * | 2017-10-18 | 2018-01-26 | 江苏鸿信系统集成有限公司 | Bill classification and bill field extracting method based on deep learning and OCR |
WO2019104879A1 (en) * | 2017-11-30 | 2019-06-06 | 平安科技(深圳)有限公司 | Information recognition method for form-type image, electronic device and readable storage medium |
CN109948135A (en) * | 2019-03-26 | 2019-06-28 | 厦门商集网络科技有限责任公司 | A kind of method and apparatus based on table features normalized image |
CN111460927A (en) * | 2020-03-17 | 2020-07-28 | 北京交通大学 | Method for extracting structured information of house property certificate image |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113486828A (en) * | 2021-07-13 | 2021-10-08 | 杭州睿胜软件有限公司 | Image processing method, device, equipment and storage medium |
WO2023284502A1 (en) * | 2021-07-13 | 2023-01-19 | 杭州睿胜软件有限公司 | Image processing method and apparatus, device, and storage medium |
CN113486828B (en) * | 2021-07-13 | 2024-04-30 | 杭州睿胜软件有限公司 | Image processing method, device, equipment and storage medium |
CN114495103A (en) * | 2022-01-28 | 2022-05-13 | 北京百度网讯科技有限公司 | Text recognition method, text recognition device, electronic equipment and medium |
CN115880704A (en) * | 2023-02-16 | 2023-03-31 | 中国人民解放军总医院第一医学中心 | Automatic case cataloging method, system, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||