CN112560847A - Image text region positioning method and device, storage medium and electronic equipment - Google Patents
- Publication number
- CN112560847A (application number CN202011561668.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V10/22 — Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
- G06V10/30 — Noise filtering
- G06V30/153 — Segmentation of character regions using recognition of characters or words
- G06V30/10 — Character recognition
(all within G — Physics; G06 — Computing, calculating or counting; G06V — Image or video recognition or understanding)
Abstract
The application provides an image text region positioning method and apparatus, a storage medium, and electronic equipment. For a text image of the plain text type, the text image is subjected to expansion processing so that adjacent characters are connected into text line connected regions, and the image text regions are determined from the circumscribed rectangles of those connected regions. For a text image of the text straight line staggered type, the image text regions are determined by detecting the straight-line frames in the text image. For a text image of the complex background layout type, single character frames are recognized in the text image, merged into text line connected regions, and the image text regions are determined from the text line connected regions together with the detected straight-line frames. In this way, the upper, lower, left and right edge positions of each text line in the text image are located accurately by recognizing the straight-line frames and/or the circumscribed rectangles of the connected regions, so that image text region positioning is universal across all three types of text image.
Description
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method and an apparatus for locating an image text region, a storage medium, and an electronic device.
Background
Image text recognition is in wide demand across many fields, with applications including identity card recognition, license plate number recognition, express waybill recognition, and bank card number recognition. Image text recognition refers to recognizing the text present in a text image; text images are generally classified into plain text images, text straight line staggered images (e.g., form images), and complex background layout images (e.g., ticket images). Image text region positioning is a prerequisite for image text recognition.
At present, the mainstream scheme for positioning an image text region is image pixel projection analysis, which works as follows: the image is binarized so that characters are black and the background is white; the pixel points are projected horizontally; the number of black pixel points on each row is counted to obtain a pixel distribution histogram; and, given a preset threshold, the start and end of each peak in the histogram are taken as the upper and lower boundaries of a text line.
This existing scheme can only locate the upper and lower edge positions of each text line in a plain text image; it has difficulty locating the left and right edge positions.
Disclosure of Invention
The application provides a method and a device for positioning an image text region, a storage medium and electronic equipment, aiming at improving the accuracy and the universality of the positioning of the image text region.
In order to achieve the above object, the present application provides the following technical solutions:
a method for locating a text region of an image, comprising:
acquiring a text image to be positioned, and determining the image type of the text image to be positioned; the image category comprises a plain text type, a text straight line staggered type or a complex background layout type;
if the image type of the text image to be positioned is a plain text type, performing image preprocessing on the text image to be positioned, performing expansion processing on the preprocessed text image to obtain a target text image, identifying each text line connected region in the target text image, determining the coordinate values of the circumscribed rectangle of each text line connected region, and determining the text line regions in the text image to be positioned based on those coordinate values; the pixel values of adjacent pixel points in each text line connected region are the same;
if the image type of the text image to be positioned is a text straight line staggered type, performing image preprocessing on the text image to be positioned, performing horizontal line detection and vertical line detection on the text image to be positioned after the image preprocessing, determining a plurality of rectangles based on each horizontal line and each vertical line obtained by the detection, and determining a text line area in the text image to be positioned according to coordinate values of the rectangles;
if the image type of the text image to be positioned is a complex background layout type, inputting the text image to be positioned into a pre-constructed single character recognition model to obtain a predicted coordinate value and a confidence coefficient for the single character frame corresponding to each single character in the text image; determining a plurality of target single character frames from the single character frames based on their confidence coefficients; merging target single character frames that are adjacent in the horizontal direction to obtain a plurality of text line connected regions; performing horizontal line detection and vertical line detection on the text image to be positioned; and determining the text line regions in the text image to be positioned according to the text line connected regions and the detected horizontal and vertical lines.
Optionally, the above method, where the image preprocessing is performed on the text image to be positioned, includes:
carrying out graying processing on the text image to be positioned to obtain a grayed image;
filtering the grayed image to obtain a filtered image;
carrying out self-adaptive binarization processing on the filtered image to obtain a binarized image;
and carrying out inversion processing on the pixel value of each pixel point in the binary image.
Optionally, in the method, the filtering the grayed image to obtain a filtered image includes:
sliding the center of a preset filtering sliding window over each pixel point in the grayed image;
and, when the center of the filtering sliding window slides to a pixel point in the grayed image, selecting the preset filtering calculation formula corresponding to the noise type of the text image to be positioned, calculating the filtered gray value within the current filtering sliding window based on the selected formula, and taking the calculated filtered gray value as the pixel value of that pixel point.
Optionally, the expanding the text image to be positioned after the image preprocessing to obtain the target text image includes:
based on a first sliding window, performing expansion processing on the preprocessed text image to be positioned; the width of the first sliding window is determined according to the spacing between adjacent characters in the text image to be positioned, and the height of the first sliding window is determined according to the line spacing of text lines in the text image to be positioned.
Optionally, the above method, where horizontal line detection is performed on the text image to be positioned after image preprocessing, includes:
based on a preset second sliding window, carrying out corrosion treatment on the text image to be positioned after image preprocessing to obtain a first corrosion image;
based on a preset third sliding window, performing expansion processing on the first corrosion image to obtain a first expansion image;
identifying respective horizontal connected regions in the first dilated image and determining a circumscribed rectangle for each of the horizontal connected regions;
and, for each horizontal connected region, calculating the coordinates of the two end points of the corresponding horizontal line from the coordinates of the circumscribed rectangle of that horizontal connected region.
Optionally, the method for detecting a vertical line of the text image to be positioned after the image preprocessing includes:
based on a preset fourth sliding window, carrying out corrosion treatment on the text image to be positioned after image preprocessing to obtain a second corrosion image;
based on a preset fifth sliding window, performing expansion processing on the second corrosion image to obtain a second expansion image;
identifying each vertical connected region in the second expansion image, and determining a circumscribed rectangle of each vertical connected region;
and, for each vertical connected region, calculating the coordinates of the two end points of the corresponding vertical line from the coordinates of the circumscribed rectangle of that vertical connected region.
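The two optional claims above apply the same erode-then-dilate pattern with differently shaped windows. The following Python sketch of horizontal-line detection is illustrative only and is not part of the patent disclosure: the window length, border handling, and per-row endpoint extraction are simplifying assumptions (the claims extract endpoints from circumscribed rectangles of connected regions), and vertical lines would be obtained by the same routine on the transposed image.

```python
def detect_horizontal_lines(binary, min_len=3):
    """Morphological horizontal-line detection on a 0/255 image given as nested lists."""
    h, w, r = len(binary), len(binary[0]), min_len // 2
    # Erosion with a 1 x (2r+1) horizontal window: a pixel survives only if every
    # pixel under the window is white, so short character strokes vanish.
    eroded = [[255 if x - r >= 0 and x + r < w and
               all(binary[y][j] == 255 for j in range(x - r, x + r + 1)) else 0
               for x in range(w)] for y in range(h)]
    # Dilation with the same window restores the original length of surviving runs.
    dilated = [[255 if any(eroded[y][j] == 255
               for j in range(max(0, x - r), min(w, x + r + 1))) else 0
               for x in range(w)] for y in range(h)]
    lines = []
    for y in range(h):
        x = 0
        while x < w:
            if dilated[y][x] == 255:
                x0 = x
                while x < w and dilated[y][x] == 255:
                    x += 1
                lines.append(((x0, y), (x - 1, y)))  # two endpoints of one line
            else:
                x += 1
    return lines
```

In practice a library routine such as OpenCV's `erode`/`dilate` with a wide structuring element would replace the hand-rolled loops; the sketch only shows the shape of the computation.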
In the foregoing method, optionally, the determining a plurality of target single word boxes from the single word boxes based on the confidence of each single word box includes:
for each single character frame, if the confidence coefficient of the single character frame is not less than a preset confidence coefficient threshold value, determining the single character frame as an initial single character frame;
forming the initial single word frames into a single word frame set;
selecting a first single character frame from the current single character frame set; the first single character frame is an initial single character frame with the highest confidence level in each initial single character frame contained in the current single character frame set;
for each remaining initial single character frame in the single character frame set, calculating its area overlap rate with the first single character frame, and deleting that initial single character frame from the set if the overlap rate is greater than a preset overlap threshold;
determining the first single character frame as a target single character frame, and judging whether the current single character frame set is an empty set;
and if the current single character frame set is not the empty set, returning to execute the step of selecting the first single character frame from the current single character frame set until the current single character frame set is the empty set.
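The selection procedure described in the steps above is the familiar non-maximum suppression. The Python sketch below is illustrative only: the use of intersection-over-union as the "area overlap rate" and the default thresholds are assumptions, since the claim fixes neither.

```python
def box_area(box):
    """Area of an axis-aligned box given as (x0, y0, x1, y1)."""
    return (box[2] - box[0]) * (box[3] - box[1])

def iou(a, b):
    """Area overlap rate of two boxes, computed here as intersection over union."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = box_area(a) + box_area(b) - inter
    return inter / union if union else 0.0

def select_target_boxes(boxes, conf_threshold=0.5, overlap_threshold=0.5):
    """Non-maximum suppression over (confidence, box) predictions, following the
    claim: keep boxes above the confidence threshold, then repeatedly take the
    most confident remaining box and delete any box overlapping it too much."""
    pool = [(c, b) for c, b in boxes if c >= conf_threshold]  # initial boxes
    targets = []
    while pool:
        pool.sort(key=lambda cb: cb[0], reverse=True)
        best_conf, best_box = pool.pop(0)                     # highest confidence
        targets.append(best_box)
        pool = [(c, b) for c, b in pool if iou(best_box, b) <= overlap_threshold]
    return targets
```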
An image text region positioning apparatus comprising:
the acquisition unit is used for acquiring a text image to be positioned and determining the image type of the text image to be positioned; the image category comprises a plain text type, a text straight line staggered type or a complex background layout type;
the first positioning unit is configured to, if the image type of the text image to be positioned is a plain text type, perform image preprocessing on the text image to be positioned, perform expansion processing on the preprocessed text image to obtain a target text image, identify each text line connected region in the target text image, determine the coordinate values of the circumscribed rectangle of each text line connected region, and determine the text line regions in the text image to be positioned based on those coordinate values; the pixel values of adjacent pixel points in each text line connected region are the same;
the second positioning unit is configured to, if the image type of the text image to be positioned is a text straight line staggered type, perform image preprocessing on the text image to be positioned, perform horizontal line detection and vertical line detection on the preprocessed text image, determine a plurality of rectangles based on the detected horizontal and vertical lines, and determine the text line regions in the text image to be positioned according to the coordinate values of the rectangles;
and the third positioning unit is configured to, if the image type of the text image to be positioned is a complex background layout type, input the text image to be positioned into a pre-constructed single character recognition model to obtain a predicted coordinate value and a confidence coefficient for the single character frame corresponding to each single character, determine a plurality of target single character frames from the single character frames based on their confidence coefficients, merge target single character frames adjacent in the horizontal direction to obtain a plurality of text line connected regions, perform horizontal line detection and vertical line detection on the text image to be positioned, and determine the text line regions according to the text line connected regions and the detected horizontal and vertical lines.
A storage medium, the storage medium comprising stored instructions, wherein when the instructions are executed, a device in which the storage medium is located is controlled to execute the above-mentioned image text region location method.
An electronic device comprising a memory, and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by one or more processors to perform the method for image text region location.
Compared with the prior art, the method has the following advantages:
the application provides a method and a device for positioning an image text region, wherein the method comprises the following steps: aiming at different types of text images, different image text region positioning strategies are adopted, and expansion processing is carried out on the text images of the pure text type, so that adjacent characters are connected into a text line connected region, the circumscribed rectangle of the text line connected region is further determined, and the region of the text in the text images is positioned; the method comprises the steps of detecting a straight-line frame in a text image to position a region with text in the text image, identifying a single-word frame corresponding to each single word in the text image based on a single-word identification model for the text image with a complex background layout type, combining the single-word frames into a text line communication region, and detecting the straight-line frame in the text image to position the region with text in the text image through the straight-line frame and the text line communication region. Therefore, according to the technical scheme provided by the application, the positions of the upper edge, the lower edge, the left edge and the right edge of each text line in the text image are accurately positioned by identifying the straight-line framework and/or the circumscribed rectangle of the communicated region in the text image, and different image text region positioning strategies are adopted for different types of text images, so that the image text region positioning of each type of text image has universality.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application, and that those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flowchart illustrating a method for locating a text region in an image according to the present disclosure;
FIG. 2 is a flowchart illustrating a method for locating a text region in an image according to the present disclosure;
FIG. 3 is a flowchart illustrating a method for locating a text region in an image according to another embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a method for locating a text region in an image according to another embodiment of the present disclosure;
FIG. 5 is a flowchart illustrating a method for locating a text region in an image according to the present disclosure;
FIG. 6 is a schematic structural diagram of an apparatus for locating an image text region according to the present application;
fig. 7 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The application is operational with numerous general purpose or special purpose computing device environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multi-processor apparatus, distributed computing environments that include any of the above devices or equipment, and the like.
The embodiment of the application provides an image text region positioning method that can be applied on multiple system platforms; its execution subject may run on a computer terminal or on the processor of various mobile devices. A flowchart of the method is shown in fig. 1, and the method specifically includes the following steps:
s101, obtaining a text image to be positioned, and determining the image type of the text image to be positioned.
A text image to be positioned is acquired, and the image type of the text image to be positioned is determined, where the image type is a plain text type, a text straight line staggered type, or a complex background layout type.
Optionally, the specific process of determining the image type of the text image to be positioned includes: receiving the image type uploaded by a user, or inputting the text image to be positioned into a pre-constructed image recognition model and obtaining the image type output by the model. Optionally, the image recognition model may be a classification model; for its construction, refer to the construction process of an existing convolutional neural network classification model.
And S102, if the image type of the text image to be positioned is a plain text type, performing image preprocessing on the text image to be positioned.
If the image type of the text image to be positioned is a plain text type, image preprocessing is performed on the text image to be positioned. Optionally, the image preprocessing includes graying, filtering, adaptive binarization and pixel inversion, so as to enhance the image quality of the text image to be positioned.
Referring to fig. 2, the process of image preprocessing for the text image to be positioned specifically includes:
s201, carrying out gray processing on the text image to be positioned to obtain a gray image.
Graying processing is performed on the text image to be positioned to obtain its grayed image. The specific graying process is as follows: each pixel point in the text image to be positioned is converted according to a preset graying formula, and in the resulting grayed image each pixel point represents a depth of gray by a value between 0 and 255.
Optionally, the preset graying formula is as follows:
GRAY = R × 0.299 + G × 0.587 + B × 0.114
wherein R, G and B represent the red, green and blue values respectively, and GRAY is the resulting gray value.
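As a quick illustration (not part of the patent disclosure), the graying formula can be applied per pixel as follows, assuming the image is represented as rows of (R, G, B) tuples:

```python
def to_gray(r, g, b):
    """Weighted grayscale value of one pixel (the GRAY formula above)."""
    return r * 0.299 + g * 0.587 + b * 0.114

def grayscale_image(rgb_image):
    """Convert an H x W image of (R, G, B) tuples into an H x W grid of
    gray values, each rounded to an integer in [0, 255]."""
    return [[round(to_gray(r, g, b)) for (r, g, b) in row] for row in rgb_image]
```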
S202, filtering the gray image to obtain a filtered image.
Filtering the grayed text image to be positioned, that is, filtering the grayed image to obtain a filtered image of the text image to be positioned, wherein the specific filtering process may include:
sliding each pixel point in the gray level image by using the center of a preset filtering sliding window;
when the center of the filtering sliding window slides to a pixel point in the gray image, based on the noise type of the text image to be positioned, a preset filtering calculation formula corresponding to the noise type is selected, based on the selected filtering calculation mode, the filtering gray value in the current filtering sliding window is calculated, and the calculated filtering gray value is used as the pixel value of the pixel point.
In the method provided by this embodiment, the center of the preset filtering sliding window is slid over each pixel point in the grayed image; that is, the filtering sliding window slides across the grayed image in a preset sliding manner so that its center passes over every pixel point. The preset sliding manner is simply a predetermined traversal order and is not limited here.
In the method provided by this embodiment, each time the window center slides to a pixel point, the preset filtering calculation formula corresponding to the noise type of the text image to be positioned is selected, the filtered gray value within the current filtering sliding window is calculated with the selected formula, and that value is taken as the pixel value of the pixel point. If the noise type of the text image to be positioned is white noise, the gray value in each filtering sliding window is calculated with a Gaussian filtering formula; if the noise type is salt-and-pepper noise, it is calculated with a median filtering formula.
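A minimal Python sketch of this noise-type-dependent filtering follows. It is illustrative only: the 3 × 3 Gaussian kernel weights, the border handling, and the noise-type labels are assumptions, since the embodiment names the formula families but not their parameters.

```python
import statistics

def median_filter_at(gray, y, x, k=3):
    """Median of the k x k window centred on (y, x): the usual choice for
    salt-and-pepper noise."""
    h, w, r = len(gray), len(gray[0]), k // 2
    window = [gray[i][j]
              for i in range(max(0, y - r), min(h, y + r + 1))
              for j in range(max(0, x - r), min(w, x + r + 1))]
    return statistics.median(window)

def gaussian_filter_at(gray, y, x):
    """3 x 3 Gaussian-weighted mean (kernel weights assumed here): the usual
    choice for white noise."""
    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    h, w = len(gray), len(gray[0])
    acc = norm = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            i, j = y + dy, x + dx
            if 0 <= i < h and 0 <= j < w:
                acc += kernel[dy + 1][dx + 1] * gray[i][j]
                norm += kernel[dy + 1][dx + 1]
    return acc / norm

def filter_image(gray, noise_type):
    """Slide the window centre over every pixel and apply the filtering formula
    matching the image's noise type, as the embodiment describes."""
    pick = median_filter_at if noise_type == "salt_and_pepper" else gaussian_filter_at
    return [[pick(gray, y, x) for x in range(len(gray[0]))] for y in range(len(gray))]
```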
And S203, performing self-adaptive binarization processing on the filtered image to obtain a binarized image.
Adaptive binarization processing is performed on each pixel point in the filtered image so that the gray value of every pixel point becomes 0 or 255, yielding the binarized image of the text image to be positioned, where 0 represents black and 255 represents white.
Optionally, the specific process of performing adaptive binarization processing on the filtered image includes:
the center of a preset binarization sliding window is slid over each pixel point in the filtered image; each time it slides to a pixel point, the current binarization threshold is computed from the pixel values of all pixel points inside the current binarization sliding window, and the pixel value of the center pixel point is compared with that threshold: if the pixel value of the center pixel point is greater than the binarization threshold, a preset first value is taken as its pixel value, otherwise a preset second value is taken as its pixel value, where the first value is 255 and the second value is 0.
In the method provided by this embodiment, adaptive binarization makes the whole image present only black and white gray values, highlighting the target contours, i.e., the contours of the text lines. Because the binarization threshold of each pixel point is not fixed but is determined by the pixel values inside the binarization sliding window, the threshold is generally higher in brighter image areas and correspondingly lower in darker ones, so the method adapts to images of different brightness, contrast and texture.
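A minimal Python sketch of the adaptive binarization step follows. It is illustrative only: using the window mean as the threshold is one simple choice and an assumption here, since the embodiment says only that the threshold is computed from the pixel values inside the window.

```python
def adaptive_binarize(gray, k=3, first=255, second=0):
    """Per-pixel adaptive binarization: threshold each pixel against the mean
    of its k x k neighbourhood (clamped at the borders)."""
    h, w, r = len(gray), len(gray[0]), k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [gray[i][j]
                      for i in range(max(0, y - r), min(h, y + r + 1))
                      for j in range(max(0, x - r), min(w, x + r + 1))]
            threshold = sum(window) / len(window)   # window-local threshold
            row.append(first if gray[y][x] > threshold else second)
        out.append(row)
    return out
```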
And S204, carrying out inversion processing on the pixel value of each pixel point in the binary image.
The pixel value of each pixel point in the binarized image is inverted: a pixel value equal to the first value is flipped to the second value and vice versa; that is, 255 becomes 0 and 0 becomes 255, so the black and white pixel points of the binarized image swap colors.
S103, performing expansion processing on the text image to be positioned after image preprocessing to obtain a target text image.
Performing expansion processing on the text image to be positioned after image preprocessing to connect adjacent characters in each line into a whole to obtain a target text image, wherein the specific expansion processing process is as follows:
based on the first sliding window, performing expansion processing on the text image to be positioned after image preprocessing; the width of the first sliding window is determined according to the distance between adjacent characters in the text image to be positioned, and the height of the first sliding window is determined according to the line distance of text lines in the image to be positioned.
In the method provided by the embodiment of the application, based on the first sliding window, in a preset sliding manner, the center of the first sliding window is made to slide over each pixel point of the text image to be positioned after the image preprocessing, and when the center of the first sliding window slides to one pixel point of the text image to be positioned after the image preprocessing, the current maximum pixel value in the coverage range of the first sliding window is used as the pixel value of the pixel point corresponding to the center of the first sliding window, so as to realize expansion processing.
In the method provided by the embodiment of the application, the width of the first sliding window is determined according to the distance between adjacent characters in the text image to be positioned, and the height of the first sliding window is determined according to the line distance of the text line in the text image to be positioned.
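The expansion (dilation) step just described, i.e., taking the maximum pixel value under the sliding window, can be sketched in Python as follows. This is an illustration only; the window dimensions would in practice be derived from the character spacing and line spacing as the embodiment states.

```python
def dilate(binary, win_w, win_h):
    """Expansion processing on a 0/255 image: each output pixel is the maximum
    value under a win_w x win_h window centred on it, so nearby white (255)
    character strokes grow together into one connected blob."""
    h, w = len(binary), len(binary[0])
    rx, ry = win_w // 2, win_h // 2
    return [[max(binary[i][j]
                 for i in range(max(0, y - ry), min(h, y + ry + 1))
                 for j in range(max(0, x - rx), min(w, x + rx + 1)))
             for x in range(w)] for y in range(h)]
```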
S104, identifying each text line connected region in the target text image, and determining the coordinate value of the circumscribed rectangle of each text line connected region.
And identifying each text line connected region in the target text image, wherein the pixel values of adjacent pixel points in each text line connected region are the same.
And determining the coordinate values of the circumscribed rectangle of each text line connected region based on each recognized text line connected region; that is, for each text line connected region, the circumscribed outline of the region is determined as the circumscribed rectangle of that text line connected region, and after the circumscribed rectangle is determined, its coordinate values are determined. It should be noted that the coordinate values of the circumscribed rectangle are the coordinate values of its four corners.
Optionally, after the circumscribed rectangle of each text connected region is determined, circumscribed rectangles whose height or width is obviously too large or too small may be further deleted; that is, circumscribed rectangles whose height is not within a preset height range and/or whose width is not within a preset width range are deleted, and circumscribed rectangles whose height is within the preset height range and whose width is within the preset width range are retained.
And S105, determining the text line area in the text image to be positioned based on the coordinate value of the circumscribed rectangle of each text line connected area.
Determining the text line region in the text image to be positioned based on the coordinate values of the circumscribed rectangles of each text line connected region, namely determining one rectangle by the coordinate values of each circumscribed rectangle, wherein the rectangle corresponds to one text line region in the text image to be positioned, and the coordinate values of all circumscribed rectangles can determine all the text line regions in the text image to be positioned.
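Steps S104–S105 can be illustrated with a small connected-component pass. The pure-Python sketch below uses 4-connectivity and a toy image, both of which are assumptions for the example (the patent leaves the labelling method open); each region is reduced to its circumscribed rectangle:

```python
from collections import deque

def connected_boxes(img):
    """Find 4-connected regions of nonzero pixels and return the
    circumscribed rectangle (x_min, y_min, x_max, y_max) of each."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] and not seen[sy][sx]:
                seen[sy][sx] = True
                q = deque([(sy, sx)])
                x0 = x1 = sx
                y0 = y1 = sy
                while q:  # breadth-first flood fill of one region
                    y, x = q.popleft()
                    x0, x1 = min(x0, x), max(x1, x)
                    y0, y1 = min(y0, y), max(y1, y)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    return boxes

# Two separated blobs -> two circumscribed rectangles (text line regions).
img = [[0, 1, 1, 0, 0],
       [0, 1, 1, 0, 1],
       [0, 0, 0, 0, 1]]
boxes = connected_boxes(img)
```

In practice `cv2.connectedComponentsWithStats` or `cv2.findContours` plus `cv2.boundingRect` performs the same job; the loop form only makes the region-to-rectangle mapping explicit.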
And S106, if the image type of the text image to be positioned is a text straight line staggered type, performing image preprocessing on the text image to be positioned.
If the image type of the text image to be positioned is a text straight line staggered type, similar to the text image of a plain text type, image preprocessing needs to be performed on the text image to be positioned, and a specific process of the image preprocessing is shown in fig. 2 for reference, which is not described herein again.
S107, horizontal line detection and vertical line detection are carried out on the text image to be positioned after image preprocessing, and a plurality of rectangles are determined based on each horizontal line and each vertical line obtained through detection.
According to the method provided by the embodiment of the application, horizontal line detection and vertical line detection are carried out on the text image to be positioned after image preprocessing, namely, a straight line framework in the text image to be positioned after image preprocessing is detected.
According to the method provided by the embodiment of the application, horizontal corrosion processing is carried out on the text image to be positioned after image preprocessing, and horizontal expansion processing is carried out after the horizontal corrosion processing, so that the horizontal line in the text image to be positioned after the image preprocessing is detected.
Referring to fig. 3, the process of performing horizontal line detection on the text image to be positioned after image preprocessing specifically includes:
s301, based on a preset second sliding window, carrying out corrosion treatment on the text image to be positioned after image preprocessing to obtain a first corrosion image.
In the method provided by the embodiment of the application, the text image to be positioned after the image preprocessing is subjected to corrosion processing based on the preset second sliding window, that is, the text image to be positioned after the image preprocessing is subjected to horizontal corrosion to obtain the first corrosion image.
It should be noted that the aspect ratio of the second sliding window is greater than the first threshold, optionally, the first threshold may be 30, that is, the aspect ratio of the second sliding window is greater than 30, optionally, the height of the second sliding window satisfies a preset first height range, and the first height range is 1 to 2 pixel points.
According to the method provided by the embodiment of the application, the second sliding window is used for corroding the text image to be positioned after the image preprocessing, so that image elements of vertical lines and other non-horizontal lines can be inhibited.
S302, based on a preset third sliding window, performing expansion processing on the first corrosion image to obtain a first expansion image.
In the method provided in the embodiment of the application, the first erosion image is expanded based on the preset third sliding window, that is, the first erosion image is horizontally expanded to obtain the first expanded image, and it should be noted that a specific process of the expansion processing refers to an existing image expansion process, which is not described herein again.
It should be noted that the aspect ratio of the third sliding window is greater than the second threshold; optionally, the second threshold may be 20, that is, the aspect ratio of the third sliding window is greater than 20. Optionally, the height of the third sliding window satisfies a preset second height range, and the second height range is 1 to 5 pixel points.
S303, identifying each horizontal connected region in the first expansion image, and determining a circumscribed rectangle of each horizontal connected region.
And identifying each horizontal connected region in the first expansion image, wherein the horizontal connected region is a horizontal expansion region, determining the circumscribed outline of the horizontal connected region, and further determining the circumscribed rectangle of each horizontal connected region.
S304, calculating the coordinates of two end points of a horizontal line corresponding to the circumscribed rectangle according to the coordinates of the circumscribed rectangle of the horizontal connected region aiming at each horizontal connected region.
And calculating the coordinates of two end points of the horizontal line corresponding to the circumscribed rectangle according to the coordinates of the circumscribed rectangle of the horizontal connected region aiming at each horizontal connected region.
The specific process of calculating the coordinates of the two end points of the horizontal line corresponding to the circumscribed rectangle according to the coordinates of the circumscribed rectangle of the horizontal connected region for each horizontal connected region comprises the following steps:
calculating the mean value between the ordinate of the upper edge and the ordinate of the lower edge of the circumscribed rectangle of the horizontal connected region, and taking the mean value as the ordinate of both end points of the horizontal line corresponding to the circumscribed rectangle; and taking the abscissa of the left edge of the circumscribed rectangle as the abscissa of the left end point of the horizontal line corresponding to the circumscribed rectangle, and the abscissa of the right edge of the circumscribed rectangle as the abscissa of the right end point. For example, if the ordinate of the upper edge of the circumscribed rectangle is y_up, the ordinate of the lower edge is y_down, the abscissa of the left edge is x_left and the abscissa of the right edge is x_right, the left end point of the horizontal line corresponding to the circumscribed rectangle is (x_left, (y_up + y_down)/2) and the right end point is (x_right, (y_up + y_down)/2).
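The endpoint formulas above reduce to a two-line helper; the function name and the sample coordinates are illustrative:

```python
def horizontal_line_endpoints(x_left, y_up, x_right, y_down):
    """Collapse the circumscribed rectangle of a horizontal connected
    region to a line: the ordinate of both endpoints is the mean of
    the upper and lower edges; the abscissas come from the left and
    right edges."""
    y = (y_up + y_down) / 2
    return (x_left, y), (x_right, y)

left, right = horizontal_line_endpoints(10, 4, 90, 6)
```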
According to the method provided by the embodiment of the application, the vertical corrosion treatment is carried out on the text image to be positioned after the image preprocessing, and the vertical expansion treatment is carried out after the vertical corrosion treatment, so that the vertical line in the text image to be positioned after the image preprocessing is detected.
Referring to fig. 4, the process of performing vertical line detection on the text image to be positioned after image preprocessing specifically includes:
s401, based on a preset fourth sliding window, carrying out corrosion treatment on the text image to be positioned after image preprocessing to obtain a second corrosion image.
In the method provided by the embodiment of the application, the text image to be positioned after the image preprocessing is subjected to corrosion processing based on the preset fourth sliding window, that is, the text image to be positioned after the image preprocessing is subjected to vertical corrosion to obtain the second corrosion image.
It should be noted that the aspect ratio of the fourth sliding window is smaller than the third threshold, optionally, the third threshold may be 1/30, that is, the aspect ratio of the fourth sliding window is smaller than 1/30, and optionally, the width of the fourth sliding window may be 1 pixel.
According to the method provided by the embodiment of the application, the fourth sliding window is used for carrying out corrosion treatment on the text image to be positioned after image preprocessing, and image elements of horizontal lines and other non-vertical lines can be restrained.
S402, based on a preset fifth sliding window, performing expansion processing on the second corrosion image to obtain a second expansion image.
In the method provided in the embodiment of the application, the second erosion image is expanded based on the preset fifth sliding window, that is, the second erosion image is vertically expanded to obtain the second expanded image. It should be noted that the specific process of the expansion processing refers to an existing image expansion process, which is not described herein again.
It should be noted that the aspect ratio of the fifth sliding window is smaller than the fourth threshold; optionally, the fourth threshold may be 1/20, that is, the aspect ratio of the fifth sliding window is smaller than 1/20. Optionally, the width of the fifth sliding window may be smaller than a fifth threshold, and the fifth threshold may be 5 pixel points, that is, the width of the fifth sliding window may be smaller than 5 pixel points.
S403, identifying each vertical connected region in the second expansion image, and determining a circumscribed rectangle of each vertical connected region.
And identifying each vertical connected region in the second expansion image, wherein the vertical connected region is a vertical expansion region, determining the circumscribed outline of the vertical connected region, and further determining the circumscribed rectangle of each vertical connected region.
S404, calculating the coordinates of two end points of a vertical line corresponding to the circumscribed rectangle according to the coordinates of the circumscribed rectangle of each vertical connected region.
And calculating the coordinates of two end points of the vertical line corresponding to the circumscribed rectangle according to the coordinates of the circumscribed rectangle of the vertical connected region aiming at each vertical connected region.
The specific process of calculating the coordinates of the two end points of the vertical line corresponding to the circumscribed rectangle according to the coordinates of the circumscribed rectangle of the vertical connected region for each vertical connected region comprises the following steps:
calculating the average value between the abscissa of the left edge and the abscissa of the right edge of the circumscribed rectangle of the vertical connected region, and taking the average value as the abscissa of both end points of the vertical line corresponding to the circumscribed rectangle; and taking the ordinate of the upper edge of the circumscribed rectangle as the ordinate of the upper end point of the vertical line corresponding to the circumscribed rectangle, and the ordinate of the lower edge as the ordinate of the lower end point. For example, if the ordinate of the upper edge of the circumscribed rectangle is y_up, the ordinate of the lower edge is y_down, the abscissa of the left edge is x_left and the abscissa of the right edge is x_right, the upper end point of the vertical line corresponding to the circumscribed rectangle is ((x_left + x_right)/2, y_up) and the lower end point is ((x_left + x_right)/2, y_down).
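Symmetrically to the horizontal case, the vertical-line endpoints reduce to the following helper; the function name and the sample coordinates are illustrative:

```python
def vertical_line_endpoints(x_left, y_up, x_right, y_down):
    """Collapse the circumscribed rectangle of a vertical connected
    region to a line: the abscissa of both endpoints is the mean of
    the left and right edges; the ordinates come from the upper and
    lower edges."""
    x = (x_left + x_right) / 2
    return (x, y_up), (x, y_down)

upper, lower = vertical_line_endpoints(4, 10, 6, 90)
```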
In the method provided by the embodiment of the application, after the horizontal line and the vertical line of the text image to be positioned after the image preprocessing are detected, a certain blank gap needs to be left for text region segmentation, so that based on each text region, the horizontal line above and/or below the text region is copied, the vertical line on the left and/or right of the text region is copied, the copied horizontal line is moved up or down, and the copied vertical line is moved left or right.
In the method provided by the embodiment of the application, each horizontal line and each vertical line form a straight line frame of the text image to be positioned, and the formed straight line frame divides the text image to be positioned into a plurality of rectangles.
And S108, determining a text line region in the text image to be positioned according to the coordinate value of the rectangle.
And determining the text line region in the text image to be positioned based on the coordinate values of all the rectangles, namely determining one rectangle by the coordinate values of each rectangle, wherein the determined rectangle corresponds to one text line region in the text image to be positioned, and the coordinate values of all the rectangles can determine all the text line regions in the text image to be positioned.
And S109, if the image type of the text image to be positioned is a complex background layout type, inputting the text image to be positioned into a pre-constructed single character recognition model to obtain a coordinate prediction value and a confidence coefficient of a single character frame corresponding to each single character in the text image to be positioned.
In the method provided in the embodiment of the present application, the single character recognition model is constructed in advance, and the construction process of the single character recognition model is referred to in the prior art and is not described herein again.
If the image type of the text image to be positioned is a complex background layout type, inputting the text image to be positioned into a pre-constructed single character recognition model to obtain a coordinate prediction value and a confidence coefficient of a single character frame corresponding to each single character in the text image to be positioned output by the single character recognition model, wherein optionally, the confidence coefficient ranges from 0 to 1, and the higher the numerical value is, the higher the confidence coefficient is.
And S110, determining a plurality of target single character frames from the single character frames based on the confidence coefficient of each single character frame.
Specifically, referring to fig. 5, the process of determining a plurality of target single word frames from each single word frame based on the confidence of each single word frame includes:
s501, aiming at each single character frame, if the confidence coefficient of the single character frame is not smaller than a preset confidence coefficient threshold value, the single character frame is determined to be an initial single character frame.
And determining the single character frame of which the confidence coefficient is not less than a preset confidence coefficient threshold value in each single character frame as an initial single character frame.
And S502, forming a single character frame set by each initial single character frame.
S503, selecting a first single character frame from the current single character frame set; the first single character frame is the initial single character frame with the maximum confidence level in each initial single character frame contained in the current single character frame set.
S504, calculating the area overlapping rate of the initial single character frame and the first single character frame aiming at each residual initial single character frame in the single character frame set.
For each initial single character frame remaining in the single character frame set, the intersection area and the union area of the initial single character frame and the first single character frame are calculated, and the area overlapping rate of the two frames is obtained by dividing the intersection area by the union area.
And S505, judging whether the area overlapping rate of the initial single character frame and the first single character frame is greater than a preset overlapping threshold value or not for each residual initial single character frame in the single character frame set.
And for each remaining initial single character frame in the single character frame set, judging whether the area overlapping rate is greater than a preset overlapping threshold value or not based on the calculated area overlapping rate of the initial single character frame and the first single character frame, if so, executing a step S506, and if not, executing a step S507.
S506, deleting the initial single-word frame from the single-word frame set.
For each initial single character frame remaining in the single character frame set, if the area overlapping rate is greater than the preset overlapping threshold, deleting the initial single character frame from the single character frame set, and executing step S507.
S507, judging whether there remains, in the single character frame set, an initial single character frame whose area overlapping rate with the first single character frame has not been calculated.
It is judged whether there remains, in the single character frame set, an initial single character frame whose area overlapping rate with the first single character frame has not been calculated; if so, the process returns to step S505, and if not, step S508 is executed.
And S508, determining the first single character frame as a target single character frame.
S509, judging whether the current single-word frame set is an empty set.
And judging whether the current single-character frame set is an empty set, if so, directly ending, if not, returning to execute the step S503.
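The loop of steps S501–S509 is, in effect, non-maximum suppression over the single character frames. The sketch below is a minimal illustration under assumed conventions (boxes given as (x0, y0, x1, y1, confidence); the thresholds and sample boxes are illustrative, not values from this application):

```python
def nms(boxes, conf_thresh=0.5, overlap_thresh=0.5):
    """Keep the highest-confidence frame, drop any remaining frame
    whose area overlapping rate (IoU) with it exceeds overlap_thresh,
    and repeat until the set is empty (steps S501-S509)."""
    def iou(a, b):
        ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
        ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        union = area(a) + area(b) - inter
        return inter / union if union else 0.0

    # S501-S502: keep only frames above the confidence threshold.
    remaining = [b for b in boxes if b[4] >= conf_thresh]
    kept = []
    while remaining:                                   # S509
        best = max(remaining, key=lambda b: b[4])      # S503
        remaining.remove(best)
        kept.append(best)                              # S508
        # S504-S507: discard frames overlapping the first frame too much.
        remaining = [b for b in remaining if iou(best, b) <= overlap_thresh]
    return kept

boxes = [(0, 0, 10, 10, 0.9), (1, 1, 11, 11, 0.8),
         (20, 0, 30, 10, 0.7), (0, 0, 5, 5, 0.3)]
kept = nms(boxes)
```

Here the 0.3-confidence frame fails the confidence threshold, and the 0.8-confidence frame overlaps the 0.9-confidence frame too heavily, so two target single character frames survive.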
Optionally, in the method provided in this embodiment of the present application, target single character frames with heights not greater than the preset threshold are deleted.
And S111, merging the target single character frames adjacent in the horizontal direction to obtain a plurality of text line connected regions.
Merging the target single character frames adjacent in the horizontal direction to obtain a plurality of text line connected regions, wherein the specific process includes the following steps:
sequencing the target single character frames according to a preset sequence according to the abscissa of the upper left corner of each target single character frame to obtain a single character frame sequence;
judging whether the distance between the abscissas of the upper left corners of two adjacent target single character frames in the single character frame sequence is greater than a preset first threshold value, and segmenting the sequence between the two target single character frames whose upper-left-corner abscissa distance is greater than the preset first threshold value, so as to obtain a plurality of single character frame sequences;
and taking the left boundary of the first target single character frame in each single character frame sequence as the left boundary of the text line connected region corresponding to the single character frame sequence, taking the right boundary of the last target single character frame in each single character frame sequence as the right boundary of the corresponding text line connected region, taking the minimum value of the upper boundaries of the frames in each single character frame sequence as the upper boundary of the corresponding text line connected region, and taking the maximum value of the lower boundaries of the frames in each single character frame sequence as the lower boundary of the corresponding text line connected region.
In the method provided in this embodiment of the present application, the target single character frames are arranged in a preset order according to the abscissa of the upper left corner of each target single character frame to obtain a single character frame sequence; optionally, the preset order may be ascending order of the abscissas. Whether the distance between the abscissas of the upper left corners of two adjacent target single character frames in the sequence is greater than the preset first threshold is determined; if so, the sequence is segmented between those two target single character frames, so that a plurality of single character frame sequences are obtained. For example, if the upper-left-corner abscissa distance of two adjacent single character frames exceeds the preset first threshold at five positions, the single character frame sequence is finally divided into 6 groups. Each group of single character frame sequence corresponds to one text line connected region: the upper boundary of the text line connected region is determined by the minimum value of the upper boundaries of the frames in the corresponding single character frame sequence, the lower boundary by the maximum value of the lower boundaries, the left boundary by the left boundary of the first target single character frame in the sequence, and the right boundary by the right boundary of the last target single character frame in the sequence.
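The grouping in S111 can be sketched as follows; the frame coordinates and the gap threshold are illustrative, and frames are assumed to be given as (x0, y0, x1, y1):

```python
def merge_into_lines(frames, gap_thresh):
    """Sort single character frames by upper-left abscissa, split
    where the gap between the upper-left abscissas of neighbours
    exceeds gap_thresh, and take the union box of each group as a
    text line connected region."""
    frames = sorted(frames, key=lambda f: f[0])
    groups, current = [], [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        if cur[0] - prev[0] > gap_thresh:
            groups.append(current)
            current = []
        current.append(cur)
    groups.append(current)
    return [(g[0][0],                 # left boundary of first frame
             min(f[1] for f in g),    # minimum upper boundary
             g[-1][2],                # right boundary of last frame
             max(f[3] for f in g))    # maximum lower boundary
            for g in groups]

lines = merge_into_lines(
    [(0, 0, 8, 10), (10, 1, 18, 11), (60, 0, 68, 10)], gap_thresh=20)
```

The first two frames fall within the gap threshold and merge into one text line connected region; the third frame is far enough away to start a second region.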
And S112, carrying out horizontal line detection and vertical line detection on the text image to be positioned.
The specific implementation process of step S112 is as described in step S107, and is not described herein again.
S113, determining a text line region in the text image to be positioned according to each text line connected region and the detected horizontal lines and vertical lines.
After the straight-line frame and each text line connected region in the text image to be positioned are determined, the text line connected regions are divided by the straight-line frame; that is, the detected horizontal lines and vertical lines form the straight-line frame of the text image to be positioned, which divides the text image into a plurality of areas, so that characters belonging to different areas within one text connected region are separated, and a plurality of final text line connected regions are obtained. Each final text line connected region determines a rectangle, the determined rectangle corresponds to one text line region in the text image to be positioned, and the coordinate values of all the rectangles determine all the text line regions in the text image to be positioned.
According to the image text region positioning method provided by the embodiment of the application, aiming at different types of text images, different image text region positioning strategies are adopted, and the text images of pure text types are expanded, so that adjacent characters are connected into a text line communicating region, the circumscribed rectangle of the text line communicating region is further determined, and the region with the text in the text images is positioned; the method comprises the steps of detecting a straight-line frame in a text image to position a region with text in the text image, identifying a single-word frame corresponding to each single word in the text image based on a single-word identification model for the text image with a complex background layout type, combining the single-word frames into a text line communication region, and detecting the straight-line frame in the text image to position the region with text in the text image through the straight-line frame and the text line communication region. By adopting the image text region positioning method provided by the embodiment of the application, the accurate positioning of the upper, lower, left and right edge positions of each text line in the text image is realized by identifying the straight-line frame and/or the circumscribed rectangle of the communicated region in the text image, and the universality of the image text region positioning of each type of text image is realized by adopting different image text region positioning strategies for different types of text images.
Corresponding to the method described in fig. 1, an embodiment of the present application further provides an apparatus for locating an image text region, which is used to implement the method in fig. 1 specifically, and a schematic structural diagram of the apparatus is shown in fig. 6, and specifically includes:
an obtaining unit 601, configured to obtain a text image to be positioned, and determine an image type of the text image to be positioned; the image category comprises a plain text type, a text straight line staggered type or a complex background layout type;
a first positioning unit 602, configured to, if the image type of the text image to be positioned is a pure text type, perform image preprocessing on the text image to be positioned, perform expansion processing on the text image to be positioned after the image preprocessing to obtain a target text image, identify each text line connected region in the target text image, determine a coordinate value of a circumscribed rectangle of each text line connected region, and determine a text line region in the text image to be positioned based on the coordinate value of the circumscribed rectangle of each text line connected region; the pixel values of adjacent pixel points in each text line communication area are the same;
the second positioning unit 603 is configured to, if the image type of the text image to be positioned is a text straight line staggered type, perform image preprocessing on the text image to be positioned, perform horizontal line detection and vertical line detection on the text image to be positioned after the image preprocessing, determine a plurality of rectangles based on each horizontal line and each vertical line obtained by the detection, and determine a text line region in the text image to be positioned according to coordinate values of the rectangles;
a third positioning unit 604, configured to, if the image type of the text image to be positioned is a complex background layout type, input the text image to be positioned into a pre-constructed single character recognition model, obtain a coordinate prediction value and a confidence level of a single character frame corresponding to each single character in the text image to be positioned, determine multiple target single character frames from each single character frame based on the confidence level of each single character frame, merge the target single character frames adjacent to each other in the horizontal direction to obtain multiple text line connected regions, perform horizontal line detection and vertical line detection on the text image to be positioned, and determine a text line region in the text image to be positioned according to each text line connected region and the detected horizontal line and vertical line.
According to the image text region positioning device provided by the embodiment of the application, aiming at different types of text images, different image text region positioning strategies are adopted, and the text images of pure text types are expanded, so that adjacent characters are connected into a text line communicating region, the circumscribed rectangle of the text line communicating region is further determined, and the region where the text exists in the text images is positioned; the method comprises the steps of detecting a straight-line frame in a text image to position a region with text in the text image, identifying a single-word frame corresponding to each single word in the text image based on a single-word identification model for the text image with a complex background layout type, combining the single-word frames into a text line communication region, and detecting the straight-line frame in the text image to position the region with text in the text image through the straight-line frame and the text line communication region. By adopting the image text region positioning device provided by the embodiment of the application, the accurate positioning of the upper, lower, left and right edge positions of each text line in the text image is realized by identifying the straight-line frame and/or the circumscribed rectangle of the communicated region in the text image, and different image text region positioning strategies are adopted for different types of text images, so that the universality of the positioning of the image text regions of the text images of various types is realized.
In an embodiment of the present application, based on the foregoing solution, the first positioning unit 602 and the second positioning unit 603 each include:
the graying subunit is used for performing graying processing on the text image to be positioned to obtain a grayed image;
the filtering subunit is used for carrying out filtering processing on the grayed image to obtain a filtered image;
a binarization subunit, configured to perform adaptive binarization processing on the filtered image to obtain a binarized image;
and the inversion sub unit is used for carrying out inversion processing on the pixel value of each pixel point in the binary image.
In an embodiment of the application, based on the foregoing solution, the filtering subunit performs filtering processing on the grayed image to obtain a filtered image, and is configured to:
sliding each pixel point in the gray level image by using the center of a preset filtering sliding window;
and when the center of the filtering sliding window slides to a pixel point in the gray image, selecting a preset filtering calculation formula corresponding to the noise type based on the noise type of the text image to be positioned, calculating a filtering gray value in the current filtering sliding window based on the selected filtering calculation mode, and taking the calculated filtering gray value as the pixel value of the pixel point.
In an embodiment of the present application, based on the foregoing scheme, the first positioning unit 602 obtains the target text image by performing dilation processing on the preprocessed text image to be positioned, and is configured to:
perform dilation processing on the preprocessed text image to be positioned based on a first sliding window, where the width of the first sliding window is determined by the spacing between adjacent characters in the text image, and its height is determined by the line spacing of the text lines in the image.
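The dilation step can be sketched as below: a pixel becomes foreground when any pixel inside the sliding window is foreground, so horizontally adjacent characters merge into one text-line connected region. Per the sizing rule above, the window is wider than it is tall; the concrete sizes used here are illustrative assumptions.

```python
def dilate(binary, win_w, win_h):
    """Dilate a 0/255 binary image with a win_w x win_h rectangular window."""
    h, w = len(binary), len(binary[0])
    rw, rh = win_w // 2, win_h // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # foreground if any pixel in the (clipped) window is foreground
            hit = any(binary[j][i]
                      for j in range(max(0, y - rh), min(h, y + rh + 1))
                      for i in range(max(0, x - rw), min(w, x + rw + 1)))
            out[y][x] = 255 if hit else 0
    return out
```

With a window wider than the inter-character gap but shorter than the line spacing, characters on the same line fuse while separate lines stay disconnected.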
In an embodiment of the present application, based on the foregoing solution, the second positioning unit 603 performs horizontal line detection on the preprocessed text image to be positioned, and is configured to:
carry out erosion processing on the preprocessed text image to be positioned based on a preset second sliding window to obtain a first eroded image;
perform dilation processing on the first eroded image based on a preset third sliding window to obtain a first dilated image;
identify each horizontal connected region in the first dilated image, and determine the circumscribed rectangle of each horizontal connected region;
and for each horizontal connected region, calculate the coordinates of the two end points of the corresponding horizontal line from the coordinates of the circumscribed rectangle of that region.
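The horizontal-line detection steps above can be sketched as follows: eroding with a wide, one-pixel-high window wipes out character strokes but preserves long table lines, and each surviving run of foreground pixels is a horizontal connected region whose end-point coordinates are read off its extent. The vertical-line pass is symmetric (a tall, narrow window); the window width here is an illustrative assumption.

```python
def erode_horizontal(binary, win_w):
    """A pixel survives only if the whole 1 x win_w window is foreground."""
    h, w = len(binary), len(binary[0])
    r = win_w // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if x - r >= 0 and x + r < w and all(binary[y][i]
                                                for i in range(x - r, x + r + 1)):
                out[y][x] = 255
    return out

def horizontal_line_endpoints(eroded):
    """Return (y, x_start, x_end) for each horizontal run of set pixels."""
    lines = []
    for y, row in enumerate(eroded):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                lines.append((y, start, x - 1))
            else:
                x += 1
    return lines
```

A follow-up dilation with the third window (as in the text) would restore the eroded ends of each line before the endpoints are read off.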
In an embodiment of the present application, based on the foregoing solution, the second positioning unit 603 performs vertical line detection on the preprocessed text image to be positioned, and is configured to:
carry out erosion processing on the preprocessed text image to be positioned based on a preset fourth sliding window to obtain a second eroded image;
perform dilation processing on the second eroded image based on a preset fifth sliding window to obtain a second dilated image;
identify each vertical connected region in the second dilated image, and determine the circumscribed rectangle of each vertical connected region;
and for each vertical connected region, calculate the coordinates of the two end points of the corresponding vertical line from the coordinates of the circumscribed rectangle of that region.
In an embodiment of the application, based on the foregoing solution, the third positioning unit 604 determines a plurality of target single-character boxes from the single-character boxes based on the confidence of each box, and is configured to:
for each single-character box, determine the box as an initial single-character box if its confidence is not less than a preset confidence threshold;
form the initial single-character boxes into a box set;
select a first single-character box from the current box set, the first box being the initial box with the highest confidence among the initial boxes contained in the current set;
for each remaining initial box in the set, calculate the area overlap ratio between that box and the first box, and delete the box from the set if the ratio is greater than a preset overlap threshold;
determine the first box as a target single-character box, and judge whether the current set is empty;
and if the current set is not empty, return to the step of selecting a first single-character box from the current set, until the set is empty.
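The target-box selection described above is essentially non-maximum suppression (NMS). The sketch below assumes boxes given as (x1, y1, x2, y2, confidence) tuples and takes "area overlap rate" to mean intersection area over the smaller box's area; both the representation and the threshold values are illustrative, as the patent does not fix them.

```python
def overlap_ratio(a, b):
    """Intersection area divided by the smaller box's area."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    smaller = min((a[2] - a[0]) * (a[3] - a[1]),
                  (b[2] - b[0]) * (b[3] - b[1]))
    return (ix * iy) / smaller if smaller else 0.0

def select_target_boxes(boxes, conf_thresh=0.5, overlap_thresh=0.7):
    # keep only boxes at or above the confidence threshold (initial boxes)
    pool = [b for b in boxes if b[4] >= conf_thresh]
    targets = []
    while pool:
        # pick the highest-confidence box, then drop boxes overlapping it
        first = max(pool, key=lambda b: b[4])
        pool.remove(first)
        pool = [b for b in pool if overlap_ratio(b, first) <= overlap_thresh]
        targets.append(first)
    return targets
```

Each loop iteration mirrors one pass of the procedure above: select the first box, suppress its heavy overlaps, promote it to a target box, and repeat until the set is empty.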
An embodiment of the present application further provides a storage medium comprising stored instructions, where the instructions, when executed, control the device on which the storage medium resides to perform the image text region positioning method described above.
An electronic device is also provided in an embodiment of the present application, and its structural schematic diagram is shown in fig. 7. The device includes a memory 701, one or more processors 703, and one or more instructions 702, where the one or more instructions 702 are stored in the memory 701 and configured to be executed by the one or more processors 703 to perform the following operations:
acquiring a text image to be positioned, and determining the image type of the text image to be positioned, where the image type is one of a plain text type, a text-and-line interleaved type, and a complex background layout type;
if the image type of the text image to be positioned is the plain text type, performing image preprocessing on the text image, performing dilation processing on the preprocessed image to obtain a target text image, identifying each text-line connected region in the target text image, determining the coordinate values of the circumscribed rectangle of each text-line connected region, and determining the text line regions in the text image based on those coordinate values, where adjacent pixel points within each text-line connected region have the same pixel value;
if the image type of the text image to be positioned is the text-and-line interleaved type, performing image preprocessing on the text image, performing horizontal line detection and vertical line detection on the preprocessed image, determining a plurality of rectangles based on the detected horizontal lines and vertical lines, and determining the text line regions in the text image according to the coordinate values of those rectangles;
if the image type of the text image to be positioned is the complex background layout type, inputting the text image into a pre-constructed single-character recognition model to obtain a predicted coordinate value and a confidence for the single-character box corresponding to each character in the image, determining a plurality of target single-character boxes from the single-character boxes based on the confidence of each box, merging horizontally adjacent target boxes to obtain a plurality of text-line connected regions, performing horizontal line detection and vertical line detection on the text image, and determining the text line regions according to the text-line connected regions and the detected horizontal lines and vertical lines.
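The merging step for the complex-background case (combining horizontally adjacent target boxes into text-line connected regions) can be sketched as a left-to-right sweep over the boxes. The vertical-overlap test and the gap threshold below are illustrative assumptions; the patent does not specify the exact adjacency criterion.

```python
def merge_boxes_into_lines(boxes, max_gap=10):
    """boxes: (x1, y1, x2, y2) tuples. Returns one bounding box per line."""
    lines = []
    for box in sorted(boxes, key=lambda b: b[0]):  # sweep left to right
        for i, line in enumerate(lines):
            v_overlap = min(line[3], box[3]) - max(line[1], box[1])
            h_gap = box[0] - line[2]
            # same text line: vertical extents overlap, horizontal gap small
            if v_overlap > 0 and h_gap <= max_gap:
                lines[i] = (line[0], min(line[1], box[1]),
                            max(line[2], box[2]), max(line[3], box[3]))
                break
        else:
            lines.append(box)
    return lines
```

Each returned rectangle corresponds to one text-line connected region, which the method then intersects with the detected horizontal and vertical lines to fix the final text line regions.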
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Finally, it should also be noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. The terms "comprises," "comprising," and any variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises it.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The foregoing detailed description is directed to a method and an apparatus for locating an image text region, a storage medium, and an electronic device provided by the present application, and a specific example is applied in the detailed description to explain the principles and embodiments of the present application, and the description of the foregoing embodiment is only used to help understand the method and the core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.
Claims (10)
1. A method for locating an area of image text, comprising:
acquiring a text image to be positioned, and determining the image type of the text image to be positioned, where the image type is one of a plain text type, a text-and-line interleaved type, and a complex background layout type;
if the image type of the text image to be positioned is the plain text type, performing image preprocessing on the text image, performing dilation processing on the preprocessed image to obtain a target text image, identifying each text-line connected region in the target text image, determining the coordinate values of the circumscribed rectangle of each text-line connected region, and determining the text line regions in the text image based on those coordinate values, where adjacent pixel points within each text-line connected region have the same pixel value;
if the image type of the text image to be positioned is the text-and-line interleaved type, performing image preprocessing on the text image, performing horizontal line detection and vertical line detection on the preprocessed image, determining a plurality of rectangles based on the detected horizontal lines and vertical lines, and determining the text line regions in the text image according to the coordinate values of those rectangles;
if the image type of the text image to be positioned is the complex background layout type, inputting the text image into a pre-constructed single-character recognition model to obtain a predicted coordinate value and a confidence for the single-character box corresponding to each character in the image, determining a plurality of target single-character boxes from the single-character boxes based on the confidence of each box, merging horizontally adjacent target boxes to obtain a plurality of text-line connected regions, performing horizontal line detection and vertical line detection on the text image, and determining the text line regions according to the text-line connected regions and the detected horizontal lines and vertical lines.
2. The method of claim 1, wherein performing the image preprocessing on the text image to be positioned comprises:
carrying out graying processing on the text image to be positioned to obtain a grayed image;
filtering the grayed image to obtain a filtered image;
carrying out adaptive binarization processing on the filtered image to obtain a binarized image;
and carrying out inversion processing on the pixel value of each pixel point in the binarized image.
3. The method of claim 2, wherein the filtering the grayed image to obtain a filtered image comprises:
sliding the center of a preset filtering sliding window over each pixel point in the grayed image;
and when the center of the filtering sliding window reaches a pixel point in the grayed image, selecting, according to the noise type of the text image to be positioned, the preset filtering formula corresponding to that noise type, calculating the filtered gray value within the current window using the selected formula, and taking the calculated value as the pixel value of that pixel point.
4. The method according to claim 3, wherein performing dilation processing on the preprocessed text image to be positioned to obtain the target text image comprises:
performing dilation processing on the preprocessed text image to be positioned based on a first sliding window, where the width of the first sliding window is determined by the spacing between adjacent characters in the text image, and its height is determined by the line spacing of the text lines in the image.
5. The method according to claim 3, wherein performing horizontal line detection on the preprocessed text image to be positioned comprises:
carrying out erosion processing on the preprocessed text image based on a preset second sliding window to obtain a first eroded image;
performing dilation processing on the first eroded image based on a preset third sliding window to obtain a first dilated image;
identifying each horizontal connected region in the first dilated image, and determining the circumscribed rectangle of each horizontal connected region;
and for each horizontal connected region, calculating the coordinates of the two end points of the corresponding horizontal line from the coordinates of the circumscribed rectangle of that region.
6. The method according to claim 3, wherein performing vertical line detection on the preprocessed text image to be positioned comprises:
carrying out erosion processing on the preprocessed text image based on a preset fourth sliding window to obtain a second eroded image;
performing dilation processing on the second eroded image based on a preset fifth sliding window to obtain a second dilated image;
identifying each vertical connected region in the second dilated image, and determining the circumscribed rectangle of each vertical connected region;
and for each vertical connected region, calculating the coordinates of the two end points of the corresponding vertical line from the coordinates of the circumscribed rectangle of that region.
7. The method of claim 1, wherein determining a plurality of target single-character boxes from the single-character boxes based on the confidence of each box comprises:
for each single-character box, determining the box as an initial single-character box if its confidence is not less than a preset confidence threshold;
forming the initial single-character boxes into a box set;
selecting a first single-character box from the current box set, the first box being the initial box with the highest confidence among the initial boxes contained in the current set;
for each remaining initial box in the set, calculating the area overlap ratio between that box and the first box, and deleting the box from the set if the ratio is greater than a preset overlap threshold;
determining the first box as a target single-character box, and judging whether the current set is empty;
and if the current set is not empty, returning to the step of selecting a first single-character box from the current set, until the set is empty.
8. An apparatus for locating a region of image text, comprising:
an acquisition unit, configured to acquire a text image to be positioned and determine the image type of the text image, where the image type is one of a plain text type, a text-and-line interleaved type, and a complex background layout type;
a first positioning unit, configured to: if the image type of the text image to be positioned is the plain text type, perform image preprocessing on the text image, perform dilation processing on the preprocessed image to obtain a target text image, identify each text-line connected region in the target text image, determine the coordinate values of the circumscribed rectangle of each text-line connected region, and determine the text line regions in the text image based on those coordinate values, where adjacent pixel points within each text-line connected region have the same pixel value;
a second positioning unit, configured to: if the image type of the text image to be positioned is the text-and-line interleaved type, perform image preprocessing on the text image, perform horizontal line detection and vertical line detection on the preprocessed image, determine a plurality of rectangles based on the detected horizontal lines and vertical lines, and determine the text line regions in the text image according to the coordinate values of those rectangles;
and a third positioning unit, configured to: if the image type of the text image to be positioned is the complex background layout type, input the text image into a pre-constructed single-character recognition model to obtain a predicted coordinate value and a confidence for the single-character box corresponding to each character in the image, determine a plurality of target single-character boxes from the single-character boxes based on the confidence of each box, merge horizontally adjacent target boxes to obtain a plurality of text-line connected regions, perform horizontal line detection and vertical line detection on the text image, and determine the text line regions according to the text-line connected regions and the detected horizontal lines and vertical lines.
9. A storage medium comprising stored instructions, wherein the instructions, when executed, control the device on which the storage medium resides to perform the image text region positioning method according to any one of claims 1 to 7.
10. An electronic device, comprising a memory, one or more processors, and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by the one or more processors to perform the image text region positioning method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011561668.5A CN112560847A (en) | 2020-12-25 | 2020-12-25 | Image text region positioning method and device, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112560847A true CN112560847A (en) | 2021-03-26 |
Family
ID=75032624
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011561668.5A Pending CN112560847A (en) | 2020-12-25 | 2020-12-25 | Image text region positioning method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112560847A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102073862A (en) * | 2011-02-18 | 2011-05-25 | 山东山大鸥玛软件有限公司 | Method for quickly calculating layout structure of document image |
CN103034848A (en) * | 2012-12-19 | 2013-04-10 | 方正国际软件有限公司 | Identification method of form type |
CN107633239A (en) * | 2017-10-18 | 2018-01-26 | 江苏鸿信系统集成有限公司 | Bill classification and bill field extracting method based on deep learning and OCR |
WO2019104879A1 (en) * | 2017-11-30 | 2019-06-06 | 平安科技(深圳)有限公司 | Information recognition method for form-type image, electronic device and readable storage medium |
CN109948135A (en) * | 2019-03-26 | 2019-06-28 | 厦门商集网络科技有限责任公司 | A kind of method and apparatus based on table features normalized image |
CN111460927A (en) * | 2020-03-17 | 2020-07-28 | 北京交通大学 | Method for extracting structured information of house property certificate image |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113486828A (en) * | 2021-07-13 | 2021-10-08 | 杭州睿胜软件有限公司 | Image processing method, device, equipment and storage medium |
WO2023284502A1 (en) * | 2021-07-13 | 2023-01-19 | 杭州睿胜软件有限公司 | Image processing method and apparatus, device, and storage medium |
CN113486828B (en) * | 2021-07-13 | 2024-04-30 | 杭州睿胜软件有限公司 | Image processing method, device, equipment and storage medium |
CN114495103A (en) * | 2022-01-28 | 2022-05-13 | 北京百度网讯科技有限公司 | Text recognition method, text recognition device, electronic equipment and medium |
CN115880704A (en) * | 2023-02-16 | 2023-03-31 | 中国人民解放军总医院第一医学中心 | Automatic case cataloging method, system, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||