CN111666933B - Text detection method and device, electronic equipment and storage medium

Text detection method and device, electronic equipment and storage medium

Info

Publication number
CN111666933B
Authority
CN
China
Prior art keywords
text line
pixel point
mask
line region
text
Prior art date
Legal status
Active
Application number
CN202010513951.4A
Other languages
Chinese (zh)
Other versions
CN111666933A (en)
Inventor
尹磊
邓小兵
张春雨
Current Assignee
Guangdong Genius Technology Co Ltd
Original Assignee
Guangdong Genius Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Genius Technology Co Ltd
Priority to CN202010513951.4A
Publication of CN111666933A
Application granted
Publication of CN111666933B
Status: Active
Anticipated expiration

Classifications

    • G06V 10/22 — Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06F 18/24 — Pattern recognition; classification techniques
    • G06N 20/00 — Machine learning
    • G06V 10/267 — Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds


Abstract

The embodiments of the invention disclose a text detection method and device, electronic equipment, and a storage medium. The method comprises the following steps: acquiring a mask map of the text line region masks of a target picture; determining the value of each pixel point in the mask map, wherein each pixel point inside the text line region mask numbered i has the value i; subtracting the value of the corresponding pixel point in the (j+1)-th row from the value of each pixel point in the j-th row of the mask map to obtain new values for the pixel points of the j-th (or (j+1)-th) row; forming first boundary information corresponding to the text line region mask numbered i from the set of pixel points whose new value equals −i, and second boundary information from the set whose new value equals i; and constructing the text line outline corresponding to the text line region mask numbered i from the first boundary information and the second boundary information. By implementing the embodiments of the invention, the outline of each text line can be determined quickly, reducing the overall time consumed by text recognition.

Description

Text detection method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of text detection, in particular to a text detection method and device, electronic equipment and a storage medium.
Background
In text recognition technology, a photographed image is strongly affected by its environment. During recognition, text lines must first be detected to obtain an optimal bounding box for each line, so that the text inside the box can be recognized.
The existing classical text detection approach is a text line detection algorithm based on PSENet, which combines the FPN and PSE techniques: each text line is detected through an FPN, and after post-processing based on PSE (the progressive scale expansion algorithm), a mask map that multi-classifies text regions and background is output, i.e., a single-channel matrix of the same size as the input image whose elements take the values 0, 1, 2, and so on, with one positive value per text line region.
After the multi-class mask map is obtained, the mask of each text line region is traversed with the findContours function in OpenCV to find the outline of each text line region. However, this contour search must be performed once per text line region mask, so for an input image with densely packed text lines the overall time spent finding contours is high — over 400 ms, or roughly 80%–90% of the whole text line detection algorithm — which in turn inflates the overall OCR latency.
Disclosure of Invention
In view of the above defects, the embodiments of the invention disclose a text detection method, a text detection device, electronic equipment, and a storage medium that can quickly determine the outline of each text line and reduce the overall time consumed by text recognition.
The first aspect of the embodiments of the present invention discloses a method for text detection, where the method includes:
acquiring a mask map of the text line region masks of a target picture, wherein the mask map and the target picture have the same size;
determining the value of each pixel point in the mask map, wherein each pixel point inside the text line region mask numbered i has the value i and all remaining pixel points of the mask map outside the text line region masks have the value 0, with 1 ≤ i ≤ M, where M is the total number of text line region masks corresponding to the target picture;
subtracting the value of the corresponding pixel point in the (j+1)-th row from the value of each pixel point in the j-th row of the mask map to obtain new values for the pixel points of the j-th (or (j+1)-th) row, with 1 ≤ j ≤ N, where N is the total number of rows of the mask map;
forming first boundary information corresponding to the text line region mask numbered i from the set of pixel points whose new value equals −i, and second boundary information corresponding to the same mask from the set of pixel points whose new value equals i;
and constructing the text line outline corresponding to the text line region mask numbered i by using the first boundary information and the second boundary information.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the obtaining a mask map of a text line region mask of a target picture includes:
acquiring a target picture;
and inputting the target picture into a pre-trained text line detection network model based on deep learning, and outputting a mask image with masks of all text line regions.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the constructing, by using the first boundary information and the second boundary information, a text line outline corresponding to the text line region mask numbered i includes:
and determining the position and height of the center line corresponding to the text line area mask with the number i according to the first boundary information and the second boundary information.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the determining, according to the first boundary information and the second boundary information, a centerline position and a height corresponding to a text line area mask with a number i includes:
determining, for the first coordinate of each pixel point whose new value equals −i, the second coordinate of the corresponding pixel point whose new value equals i, wherein the first and second coordinates share the same abscissa;
adding the ordinates of the first coordinate and the second coordinate and averaging them to obtain a midpoint position, the set of all midpoint positions forming the centerline position corresponding to the text line region mask numbered i;
subtracting the ordinates of the first and second coordinates and taking the absolute value to obtain height information, the set of all height information forming the height corresponding to the text line region mask numbered i;
and constructing a text line outline corresponding to the text line region mask with the number i based on the center line position and the height corresponding to the text line region mask with the number i.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the constructing, by using the first boundary information and the second boundary information, a text line outline corresponding to the text line region mask numbered i includes:
sequentially connecting the pixel points whose new value equals −i to form a first boundary corresponding to the text line region mask numbered i, and sequentially connecting the pixel points whose new value equals i to form a second boundary corresponding to the same mask;
determining the pixel point with the smallest abscissa among those whose new value equals −i as a first pixel point and the pixel point with the smallest abscissa among those whose new value equals i as a second pixel point; determining the pixel point with the largest abscissa among those whose new value equals −i as a third pixel point and the pixel point with the largest abscissa among those whose new value equals i as a fourth pixel point;
connecting the first pixel point with the second pixel point to serve as the left boundary corresponding to the text line region mask numbered i, and connecting the third pixel point with the fourth pixel point to serve as its right boundary;
and taking the closed box formed by the left boundary, the first boundary, the right boundary, and the second boundary as the text line outline corresponding to the text line region mask numbered i.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the method further includes:
and determining the position of a text line outline corresponding to the text line region mask with the number of i in the target picture, and synthesizing the text line outline into the target picture.
A second aspect of the embodiments of the present invention discloses a text detection apparatus, including:
an acquisition unit, configured to acquire a mask map of the text line region masks of a target picture, the mask map and the target picture having the same size;
a determining unit, configured to determine the value of each pixel point in the mask map, wherein each pixel point inside the text line region mask numbered i has the value i and all remaining pixel points outside the text line region masks have the value 0, with 1 ≤ i ≤ M, where M is the total number of text line region masks corresponding to the target picture;
a calculating unit, configured to subtract the value of the corresponding pixel point in the (j+1)-th row from the value of each pixel point in the j-th row of the mask map to obtain new values for the pixel points of the j-th (or (j+1)-th) row, with 1 ≤ j ≤ N, where N is the total number of rows of the mask map;
an information forming unit, configured to form first boundary information corresponding to the text line region mask numbered i from the set of pixel points whose new value equals −i, and second boundary information corresponding to the same mask from the set of pixel points whose new value equals i;
and an outline construction unit, configured to construct the text line outline corresponding to the text line region mask numbered i by using the first boundary information and the second boundary information.
As an optional implementation manner, in a second aspect of the embodiment of the present invention, the obtaining unit includes:
the picture acquisition subunit is used for acquiring a target picture;
and the identification subunit is used for inputting the target picture into a pre-trained text line detection network model based on deep learning and outputting a mask image with the mask of each text line region.
As an alternative implementation, in a second aspect of the embodiment of the present invention, the contour constructing unit includes:
and the central line and height acquisition subunit is used for determining the position and the height of the central line corresponding to the text line region mask with the number i according to the first boundary information and the second boundary information.
As an alternative implementation, in a second aspect of the embodiments of the present invention, the centerline and height obtaining subunit includes:
a first sub-subunit, configured to determine, for the first coordinate of each pixel point whose new value equals −i, the second coordinate of the corresponding pixel point whose new value equals i, the first and second coordinates sharing the same abscissa;
a second sub-subunit, configured to add the ordinates of the first and second coordinates and average them to obtain a midpoint position, the set of all midpoint positions forming the centerline position corresponding to the text line region mask numbered i;
a third sub-subunit, configured to subtract the ordinates of the first and second coordinates and take the absolute value to obtain height information, the set of all height information forming the height corresponding to the text line region mask numbered i;
and a fourth sub-subunit, configured to construct the text line outline corresponding to the text line region mask numbered i based on the centerline position and height corresponding to that mask.
As an alternative implementation, in a second aspect of the embodiment of the present invention, the contour constructing unit includes:
a first information construction subunit, configured to sequentially connect the pixel points whose new value equals −i to form a first boundary corresponding to the text line region mask numbered i, and sequentially connect the pixel points whose new value equals i to form a second boundary corresponding to the same mask;
a target pixel point determining subunit, configured to determine the pixel point with the smallest abscissa among those whose new value equals −i as a first pixel point and the pixel point with the smallest abscissa among those whose new value equals i as a second pixel point, and to determine the pixel point with the largest abscissa among those whose new value equals −i as a third pixel point and the pixel point with the largest abscissa among those whose new value equals i as a fourth pixel point;
a second information construction subunit, configured to connect the first pixel point with the second pixel point to serve as the left boundary corresponding to the text line region mask numbered i, and to connect the third pixel point with the fourth pixel point to serve as its right boundary;
and a third information construction subunit, configured to form the text line outline corresponding to the text line region mask numbered i from the closed box formed by the left boundary, the first boundary, the right boundary, and the second boundary.
As an optional implementation manner, in a second aspect of the embodiment of the present invention, the apparatus further includes:
and the synthesis unit is used for determining the position of the text line outline corresponding to the text line area mask with the number i in the target picture and synthesizing the text line outline into the target picture.
A third aspect of the embodiments of the present invention discloses an electronic device, comprising a memory storing executable program code and a processor coupled with the memory, wherein the processor calls the executable program code stored in the memory to execute part or all of the steps of the text detection method disclosed in the first aspect of the embodiments of the present invention.
A fourth aspect of the embodiments of the present invention discloses a computer-readable storage medium storing a computer program, where the computer program enables a computer to execute part or all of the steps of a method for text detection disclosed in the first aspect of the embodiments of the present invention.
A fifth aspect of the embodiments of the present invention discloses a computer program product, which, when running on a computer, causes the computer to execute part or all of the steps of a method for text detection disclosed in the first aspect of the embodiments of the present invention.
A sixth aspect of the embodiments of the present invention discloses an application publishing platform, where the application publishing platform is configured to publish a computer program product, where when the computer program product runs on a computer, the computer is enabled to execute part or all of the steps of a method for text detection disclosed in the first aspect of the embodiments of the present invention.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
In the embodiment of the invention, a mask map of the text line region masks of a target picture is acquired, the mask map having the same size as the target picture. The value of each pixel point in the mask map is determined: each pixel point inside the text line region mask numbered i has the value i, all remaining pixel points outside the text line region masks have the value 0, and 1 ≤ i ≤ M, where M is the total number of text line region masks corresponding to the target picture. The value of the corresponding pixel point in the (j+1)-th row is subtracted from the value of each pixel point in the j-th row of the mask map to obtain new values for the pixel points of the j-th (or (j+1)-th) row, with 1 ≤ j ≤ N, where N is the total number of rows of the mask map. The set of pixel points whose new value equals −i forms the first boundary information corresponding to the text line region mask numbered i, the set whose new value equals i forms the second boundary information, and the text line outline corresponding to that mask is constructed from the two. Thus, by implementing the embodiment of the invention, a line-by-line subtraction is performed on the mask map that multi-classifies text line regions and background; the subtraction reveals the upper and lower boundaries of each text line, from which the outline information of the corresponding text line is calculated. This greatly reduces the time consumed by the module, makes it insensitive to how densely the text line regions are packed, reduces the average time for finding contours to within 50 ms, and largely resolves the problem of excessive overall OCR latency.
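For illustration, the following is a minimal NumPy sketch of these steps (NumPy is the Python library named later in the description); the toy mask map, the region number 7, and all variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy mask map: one text line region numbered i = 7 occupying
# rows 2-3, columns 1-5; all other pixel points are background (0).
mask = np.zeros((6, 8), dtype=np.int32)
mask[2:4, 1:6] = 7

# Subtract row j+1 from row j; the last row has no successor and stays 0.
diff = np.zeros_like(mask)
diff[:-1, :] = mask[:-1, :] - mask[1:, :]

i = 7
first_boundary = np.argwhere(diff == -i)   # with top-down subtraction: upper boundary pixels
second_boundary = np.argwhere(diff == i)   # lower boundary pixels
# The text line outline for region i is then constructed from these two
# (row, col) pixel sets, e.g. by connecting them and closing the left/right ends.
```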
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of a method for text detection according to an embodiment of the present invention;
FIG. 2 is a schematic illustration of a mask map disclosed in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of the new values of the mask map of FIG. 2 after line-by-line subtraction;
FIG. 4 is a schematic illustration of a first boundary and a second boundary determined based on the new values of FIG. 3;
FIG. 5 is a flow chart illustrating another method for text detection according to an embodiment of the present invention;
FIG. 6 is a flow chart illustrating another method for text detection according to an embodiment of the present invention;
FIG. 7 is a schematic illustration of a text line profile determined based on the new values of FIG. 3;
FIG. 8 is a schematic illustration of another mask map disclosed in an embodiment of the present invention;
FIG. 9 is a schematic diagram of the new values of the mask map of FIG. 8 after line-by-line subtraction;
FIG. 10 is a schematic illustration of a text line profile determined based on the new values of FIG. 9;
FIG. 11 is a schematic structural diagram of an apparatus for text detection according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of another text detection apparatus disclosed in the embodiments of the present invention;
fig. 13 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
It should be noted that the terms "first", "second", "third", "fourth", and the like in the description and the claims of the present invention are used for distinguishing different objects, not for describing a specific order. The terms "comprises", "comprising", and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiments of the invention disclose a text detection method, a text detection device, electronic equipment, and a storage medium. A line-by-line subtraction is performed on a mask map that multi-classifies text line regions and background; the subtraction reveals the upper and lower boundaries of each text line, from which the outline information of the corresponding text line is calculated. This greatly reduces the time consumed by the module, makes it insensitive to how densely the text line regions are packed, reduces the average time for finding contours to within 50 ms, and largely resolves the problem of excessive overall OCR latency. The details are described below with reference to the drawings.
Example one
Referring to fig. 1, fig. 1 is a schematic flow chart of a text detection method according to an embodiment of the present invention. As shown in fig. 1, the text detection method includes the following steps:
110. Acquire a mask map of the text line region masks of the target picture, the mask map and the target picture having the same size.
The target picture is an image input by a user; it may be obtained by photographing a document with an image acquisition device or downloaded from the Internet, which is not limited here. One or more text lines exist in the target picture, and the text lines are not required to be horizontal.
There are many ways to acquire the text line region masks of the target picture. The embodiment of the invention adopts a text line detection network model based on deep learning, which may use any deep learning network such as YOLO, CTPN, or PSENet. Illustratively, a PSENet text line detection network model is adopted, so that the detection result is strongly robust to illumination, color, texture, blur, and similar conditions.
After the PSENet text line detection network model is created, it is trained on a sample set whose labels are the bounding boxes of the text lines. The target picture is then input into the model: each text line region is detected through the FPN, and after PSE-based post-processing (the progressive scale expansion algorithm), a mask map that multi-classifies text regions and background is output.
The mask map is a presentation of a matrix that has the same size as the target picture and only one channel. The matrix is a two-dimensional N × m matrix, where N is the number of pixel rows and m is the number of pixel columns of the target picture and the mask map; each element takes a value of 0, 1, 2, and so on, with 0 marking background pixels and each positive value marking the pixels of one text line region.
For the user, the matrix itself cannot visually display each text line region mask; the final output of the PSENet text line detection network model is therefore rendered as a mask image in which each text line region mask takes a different value, so the multi-class text line region masks are presented in different colors.
120. Determine the value of each pixel point in the mask map: each pixel point inside the text line region mask numbered i has the value i, the remaining pixel points of the mask map outside the text line region masks have the value 0, and 1 ≤ i ≤ M, where M is the total number of text line region masks corresponding to the target picture.
Based on the principle of step 110, the values of all pixel points can be obtained, and each text line region mask is numbered according to the value of its pixel points: every pixel point in the mask numbered i has the value i, every pixel point in the mask numbered k has the value k, and the remaining pixel points outside the masks have the value 0. Here 1 ≤ i ≤ M, where M is the total number of text line region masks identified by the PSENet text line detection network model for the target picture.
130. Subtract the value of the corresponding pixel point in the (j+1)-th row from the value of each pixel point in the j-th row of the mask map to obtain new values for the pixel points of the j-th (or (j+1)-th) row, with 1 ≤ j ≤ N, where N is the total number of rows of the mask map.
The new values can be computed with the efficient matrix operations of the Python scientific computing library NumPy: the N × m two-dimensional matrix corresponding to the mask map is subtracted row by row to obtain a matrix of new values. Specifically, the value of the corresponding pixel point in the (j+1)-th row is subtracted from the value of each pixel point in the j-th row of the mask map to obtain new values for the pixel points of the j-th (or (j+1)-th) row, where 1 ≤ j ≤ N and N is the total number of rows of the mask map.
Subtracting the value of the pixel point corresponding to the (j+1)-th row from the value of the pixel point in the j-th row means that, for each pixel point in the j-th row, the value of the pixel point in the same column of the (j+1)-th row is subtracted from it.
The first row of the mask map may be taken as the starting row, so that the new values are computed by subtracting from top to bottom; alternatively, the last row may be taken as the starting row and the new values computed from bottom to top. The two directions are interchangeable: with the first row as the starting row, for example, the value of the pixel point in the (j−1)-th row can be subtracted from the value of the pixel point in the j-th row, which realizes the subtraction from bottom to top. The difference is taken as the new value of either of the two rows involved, and the edge row left without a new value is set to 0. For instance, in the bottom-up case, if the difference between the last row and the second-to-last row is taken as the new value of the last row, then the first row has no new value and is set entirely to 0; if the difference is instead taken as the new value of the second-to-last row, then the last row has no new value and is set entirely to 0.
In a similar manner, the contour of each text line can also be determined by subtracting adjacent columns to obtain the new values, which yields left and right boundaries instead.
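As a sketch of these alternatives, assuming `mask` is the N × m NumPy matrix described above (the function names are illustrative):

```python
import numpy as np

def row_diff_top_down(mask: np.ndarray) -> np.ndarray:
    # new[j] = mask[j] - mask[j+1]; the last row has no successor and stays 0
    diff = np.zeros_like(mask)
    diff[:-1, :] = mask[:-1, :] - mask[1:, :]
    return diff

def row_diff_bottom_up(mask: np.ndarray) -> np.ndarray:
    # new[j] = mask[j] - mask[j-1], scanning upward; the first row stays 0
    diff = np.zeros_like(mask)
    diff[1:, :] = mask[1:, :] - mask[:-1, :]
    return diff

def col_diff(mask: np.ndarray) -> np.ndarray:
    # adjacent-column subtraction gives left/right boundaries instead
    diff = np.zeros_like(mask)
    diff[:, :-1] = mask[:, :-1] - mask[:, 1:]
    return diff
```

With the bottom-up variant, the pixel points whose new value equals i mark the top row of region i and those equal to −i mark the row just below it, matching the figures discussed next.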
140. Form the first boundary information corresponding to the text line region mask numbered i from the set of pixel points whose new value equals −i, and the second boundary information from the set of pixel points whose new value equals i.
Take bottom-up subtraction of the text line region mask numbered i as an example. The first boundary information is then the lower boundary information (with top-down subtraction it would be the upper boundary information), and the second boundary information is the upper boundary information (with top-down subtraction, the lower boundary information).
Specifically, referring to the mask map shown in fig. 2, which contains one rectangular text line region mask (i = 7), the new values of fig. 3 are obtained by subtracting line by line from bottom to top. Among these new values, the pixel points whose value is −7 constitute the lower boundary information and those whose value is 7 constitute the upper boundary information.
150. Construct the text line outline corresponding to the text line region mask numbered i by using the first boundary information and the second boundary information.
As shown in fig. 4, the pixel points of the first boundary information and of the second boundary information are each connected in sequence to obtain the lower boundary 21 and the upper boundary 22 corresponding to the text line region mask numbered 7. These two boundaries can serve as the text line outline for that mask: the upper and lower boundaries are extended (parallel to the horizontal direction of the mask map) to the first and last columns of the mask map, and the left and right edges of the mask map serve as the left and right boundaries of the text line region mask numbered 7.
The text line outline corresponding to every text line region mask is obtained in the same way. Because the pixel positions of each text line outline are determined, and the mask map and the target picture have exactly the same size, the values of all pixel points on every text line outline can be set according to their pixel positions to obtain a text line outline image corresponding to the mask map. The target picture and this outline image are then directly combined into a new drawing that contains all the text lines of the target picture, each wrapped by its corresponding text line outline.
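A minimal sketch of this synthesis step, assuming the (row, col) coordinates of one outline's pixel points have already been collected; the function name and the outline color are illustrative assumptions:

```python
import numpy as np

def synthesize_outline(target_rgb: np.ndarray, outline_coords: np.ndarray,
                       color=(255, 0, 0)) -> np.ndarray:
    """Burn one text line outline into a copy of the target picture.

    outline_coords: array of (row, col) positions of the outline's pixel
    points; the mask map and the target picture share these coordinates
    because their sizes are identical.
    """
    drawing = target_rgb.copy()
    drawing[outline_coords[:, 0], outline_coords[:, 1]] = color
    return drawing
```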
Each text line region can then be segmented out along its outline in the new drawing, and the segmented text lines are sent to a character recognition model for recognition (recognition may, of course, also be performed without segmentation); the recognition result can be used for question searching and the like.
By implementing the embodiment of the invention, a line-by-line subtraction is performed on the mask map that multi-classifies text line regions and background; the subtraction reveals the upper and lower boundaries of each text line, from which the outline information of the corresponding text line is calculated. This greatly reduces the time consumed by the module, makes it insensitive to how densely the text line regions are packed, reduces the average time for finding contours to within 50 ms, and largely resolves the problem of excessive overall OCR latency.
Example two
Referring to fig. 5, fig. 5 is a schematic flow chart of another text detection method according to the embodiment of the invention. As shown in fig. 5, the text detection method includes the steps of:
310. Acquire a mask map of the text line region masks of the target picture, the mask map and the target picture having the same size.
320. Determine the value of each pixel point in the mask map: each pixel point inside the text line region mask numbered i has the value i, the remaining pixel points outside the text line region masks have the value 0, and 1 ≤ i ≤ M, where M is the total number of text line region masks corresponding to the target picture.
330. Subtract the value of the corresponding pixel point in the (j+1)-th row from the value of each pixel point in the j-th row of the mask map to obtain new values for the pixel points of the j-th (or (j+1)-th) row, with 1 ≤ j ≤ N, where N is the total number of rows of the mask map.
340. Form the first boundary information corresponding to the text line region mask numbered i from the set of pixel points whose new value equals −i, and the second boundary information from the set of pixel points whose new value equals i.
350. Determine the centerline position and height corresponding to the text line region mask numbered i according to the first boundary information and the second boundary information.
360. Construct the text line outline corresponding to the text line region mask numbered i based on that centerline position and height.
Steps 310 to 340 may be the same as steps 110 to 140 in the first embodiment, and are not described herein again.
Steps 350 and 360 determine the outline of the text line from its centerline position and height.
In step 350, for the first coordinate of each pixel point whose new value equals −i, the second coordinate of the corresponding pixel point whose new value equals i is determined; the first and second coordinates share the same abscissa. Because of the row-by-row subtraction, each first coordinate corresponds to exactly one second coordinate through this shared abscissa, i.e., the two points lie in the same column of the matrix formed by the new values.
The ordinates of the first coordinate and the second coordinate are added and averaged to obtain a midpoint position; the set of all midpoint positions forms the centerline position corresponding to the text line region mask numbered i. Note that a pixel coordinate here is the position of the pixel point in the matrix built from the new values: for a point (x, y), the abscissa x is its column index and the ordinate y is its row index. The average of the two ordinates may therefore not be an integer; since the centerline position is also expressed in pixel positions, a non-integer result may be rounded up or down.
The ordinates of the first and second coordinates are subtracted and the absolute value is taken to obtain height information; the set of all height information forms the height corresponding to the text line region mask numbered i. The height information is thus expressed in pixel points.
In step 360, the text line outline corresponding to the text line region mask numbered i is constructed from the centerline position and height corresponding to that mask. Once the centerline position and height of a text line outline are determined, the outline itself is obtained. As before, the head and tail ends of the connected centerline can be extended to obtain the left and right ends of the outline, and the heights at those left and right ends can take the height information corresponding to the head and tail ends of the centerline, respectively.
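A minimal sketch of steps 350 and 360, assuming `diff` holds the new values from a bottom-up subtraction and that each column of region i contributes exactly one −i and one i pixel point; all names are illustrative:

```python
import numpy as np

def centerline_and_height(diff: np.ndarray, i: int):
    first = np.argwhere(diff == -i)    # (row, col) of first-boundary pixel points
    second = np.argwhere(diff == i)    # (row, col) of second-boundary pixel points
    # pair the two sets by their shared abscissa (column index)
    first = first[np.argsort(first[:, 1])]
    second = second[np.argsort(second[:, 1])]
    mid_rows = (first[:, 0] + second[:, 0]) // 2   # average of ordinates, rounded down
    heights = np.abs(first[:, 0] - second[:, 0])   # |difference of ordinates|
    centerline = np.stack([mid_rows, first[:, 1]], axis=1)  # (row, col) centerline pixels
    return centerline, heights
```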
Of course, in other embodiments, the first and second embodiments may be combined, and the text line outline may be constructed from the first boundary information, the second boundary information, and the centerline position and height together.
The text line outline corresponding to every text line region mask is obtained in the same way. Because the pixel positions of each text line outline are determined, and the mask map and the target picture have exactly the same size, the values of all pixel points on every text line outline can be set according to their pixel positions to obtain a text line outline image corresponding to the mask map. The target picture and this outline image are then directly combined into a new drawing that contains all the text lines of the target picture, each wrapped by its corresponding text line outline.
Each text line region can then be segmented out along its outline in the new drawing, and the segmented text lines are sent to a character recognition model for recognition (recognition may, of course, also be performed without segmentation); the recognition result can be used for question searching and the like.
By implementing the embodiment of the invention, a line-by-line subtraction is performed on the mask map that multi-classifies text line regions and background; the subtraction reveals the upper and lower boundaries of each text line, from which the outline information of the corresponding text line is calculated. This greatly reduces the time consumed by the module, makes it insensitive to how densely the text line regions are packed, reduces the average time for finding contours to within 50 ms, and largely resolves the problem of excessive overall OCR latency.
EXAMPLE III
Referring to fig. 6, fig. 6 is a schematic flowchart of another text detection method according to an embodiment of the disclosure. As shown in fig. 6, the text detection method includes the steps of:
410. Acquire a mask map of the text line region masks of the target picture, the mask map and the target picture having the same size.
420. Determine the value of each pixel point in the mask map: each pixel point inside the text line region mask numbered i has the value i, the remaining pixel points outside the text line region masks have the value 0, and 1 ≤ i ≤ M, where M is the total number of text line region masks corresponding to the target picture.
430. Subtract the value of the corresponding pixel point in the (j+1)-th row from the value of each pixel point in the j-th row of the mask map to obtain new values for the pixel points of the j-th (or (j+1)-th) row, with 1 ≤ j ≤ N, where N is the total number of rows of the mask map.
440. Form the first boundary information corresponding to the text line region mask numbered i from the set of pixel points whose new value equals −i, and the second boundary information from the set of pixel points whose new value equals i.
450. Determine the upper, lower, left, and right boundaries corresponding to the text line region mask numbered i according to the first boundary information and the second boundary information.
460. Construct the text line outline corresponding to the text line region mask numbered i based on those upper, lower, left, and right boundaries.
Steps 410 to 440 may be the same as steps 110 to 140 in the first embodiment, and are not described herein again.
If the target picture has a complex, multi-column layout, the outline division method above may cover text from other lines and make recognition inaccurate. In steps 450 and 460 of this embodiment of the invention, the outline of the text line is therefore determined by upper, lower, left, and right boundaries.
In step 450, the pixel points whose new value equals −i are connected in sequence to form the first boundary corresponding to the text line region mask numbered i, and the pixel points whose new value equals i are connected in sequence to form the second boundary. Still taking the mask map shown in fig. 2 as an example and the bottom-up line-by-line subtraction of fig. 3, connecting the new-value pixel points yields the second boundary (the upper boundary) as line segment AB and the first boundary (the lower boundary) as line segment EF in fig. 7.
Then the pixel point with the smallest abscissa among those whose new value equals −i is determined as the first pixel point, and the pixel point with the smallest abscissa among those whose new value equals i as the second pixel point; the pixel point with the largest abscissa among those whose new value equals −i is determined as the third pixel point, and the pixel point with the largest abscissa among those whose new value equals i as the fourth pixel point. The first pixel point is connected with the second pixel point to serve as the left boundary corresponding to the text line region mask numbered i, and the third pixel point is connected with the fourth pixel point to serve as its right boundary.
Because a line-by-line subtraction is adopted, the first and second pixel points share the same abscissa, as do the third and fourth pixel points. After the first pixel point is connected with the second and the third with the fourth, the left boundary, line segment AF, and the right boundary, line segment BE, in fig. 7 are obtained.
In step 460, the closed box formed by the left boundary, the first boundary, the right boundary, and the second boundary forms the text line outline corresponding to the text line region mask numbered 7; that is, the closed box formed by line segments AB, BE, EF, and AF in fig. 7 serves as the text line outline for that mask.
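A minimal sketch of this boundary construction, again assuming `diff` holds the new values and both boundary pixel sets are non-empty; names are illustrative:

```python
import numpy as np

def box_corners(diff: np.ndarray, i: int):
    first = np.argwhere(diff == -i)    # first-boundary pixel points (row, col)
    second = np.argwhere(diff == i)    # second-boundary pixel points (row, col)
    p1 = first[np.argmin(first[:, 1])]    # smallest abscissa among the -i pixels
    p2 = second[np.argmin(second[:, 1])]  # smallest abscissa among the i pixels
    p3 = first[np.argmax(first[:, 1])]    # largest abscissa among the -i pixels
    p4 = second[np.argmax(second[:, 1])]  # largest abscissa among the i pixels
    # Segment p1-p2 is the left boundary and p3-p4 the right boundary; together
    # with the first and second boundaries they close the text line outline.
    return p1, p2, p3, p4
```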
In fig. 7, the closed box formed by ABCD can be regarded as the theoretical text line outline for the text line region mask numbered 7. The outline obtained by the method of the invention therefore contains the theoretical outline and is larger than it by only one row of pixel points. Relative to the whole mask map this deviation is negligible, because the gap between two text lines is generally far larger than one row of pixels, while the calculation speed is increased severalfold.
In the same way, the embodiment of the invention can also be applied to determining the outline of a curved text line; figs. 8 to 10 show the line-by-line subtraction of a curved text line numbered 8 and the determination of its outline. As can be seen from fig. 10, the text line outline determined by the embodiment of the invention (the thin line) wraps the theoretical text line outline (the thick line) and deviates outward by the position of one pixel point only in some regions; this deviation is negligible, while the calculation speed is increased severalfold.
The text line outline corresponding to every text line region mask is obtained in the same way. Because the pixel positions of each text line outline are determined, and the mask map and the target picture have exactly the same size, the values of all pixel points on every text line outline can be set according to their pixel positions to obtain a text line outline image corresponding to the mask map. The target picture and this outline image are then directly combined into a new drawing that contains all the text lines of the target picture, each wrapped by its corresponding text line outline.
Each text line region can then be segmented out along its outline in the new drawing, and the segmented text lines are sent to a character recognition model for recognition (recognition may, of course, also be performed without segmentation); the recognition result can be used for question searching and the like.
By implementing the embodiment of the invention, a line-by-line subtraction is performed on the mask map that multi-classifies text line regions and background; the subtraction reveals the upper and lower boundaries of each text line, from which the outline information of the corresponding text line is calculated. This greatly reduces the time consumed by the module, makes it insensitive to how densely the text line regions are packed, reduces the average time for finding contours to within 50 ms, and largely resolves the problem of excessive overall OCR latency.
Example four
Referring to fig. 11, fig. 11 is a schematic structural diagram of a text detection apparatus according to an embodiment of the present invention. As shown in fig. 11, the text detection apparatus may include:
an obtaining unit 510, configured to obtain a mask map of a text line area mask of a target picture, where the mask map and the target picture have the same size;
a determining unit 520, configured to determine the value of each pixel point in the mask map, wherein each pixel point inside the text line region mask numbered i has the value i and the remaining pixel points of the mask map outside the text line region masks have the value 0, with 1 ≤ i ≤ M, where M is the total number of text line region masks corresponding to the target picture;
a calculating unit 530, configured to subtract the value of the corresponding pixel point in the (j+1)-th row from the value of each pixel point in the j-th row of the mask map to obtain new values for the pixel points of the j-th (or (j+1)-th) row, with 1 ≤ j ≤ N, where N is the total number of rows of the mask map;
an information forming unit 540, configured to form first boundary information corresponding to the text line region mask numbered i from the set of pixel points whose new value equals −i, and second boundary information corresponding to the same mask from the set of pixel points whose new value equals i;
and an outline construction unit 550, configured to construct the text line outline corresponding to the text line region mask numbered i by using the first boundary information and the second boundary information.
As an optional implementation manner, the obtaining unit 510 may include:
a picture acquiring subunit 511, configured to acquire a target picture;
and the identifying subunit 512 is configured to input the target picture into a pre-trained text line detection network model based on deep learning, and output a mask map with a mask of each text line region.
As an alternative embodiment, the outline construction unit 550 may include:
and a central line and height obtaining subunit 541, configured to determine, according to the first boundary information and the second boundary information, a central line position and a height corresponding to the text line area mask numbered i.
As an alternative embodiment, the centerline and height acquisition subunit 541 includes:
a first sub-subunit 5411, configured to determine, for the first coordinate of each pixel point whose new value equals −i, the second coordinate of the corresponding pixel point whose new value equals i, the first and second coordinates sharing the same abscissa;
a second sub-subunit 5412, configured to add the ordinates of the first and second coordinates and average them to obtain a midpoint position, the set of all midpoint positions forming the centerline position corresponding to the text line region mask numbered i;
a third sub-subunit 5413, configured to subtract the ordinates of the first and second coordinates and take the absolute value to obtain height information, the set of all height information forming the height corresponding to the text line region mask numbered i;
and a fourth sub-subunit 5414, configured to construct the text line outline corresponding to the text line region mask numbered i based on the centerline position and height corresponding to that mask.
As an optional implementation, the apparatus may further include:
and a synthesizing unit 560, configured to determine a position of a text line outline corresponding to the text line area mask with the number i in the target picture, and synthesize the text line outline in the target picture.
The text detection device shown in fig. 11 performs a line-by-line subtraction on the mask map that multi-classifies text line regions and background and finds the upper and lower boundaries of each text line through the subtraction, thereby calculating the outline information of the corresponding text line. This greatly reduces the time consumed by the module, makes it insensitive to how densely the text line regions are packed, reduces the average time for finding contours to within 50 ms, and largely resolves the problem of excessive overall OCR latency.
EXAMPLE five
Referring to fig. 12, fig. 12 is a schematic structural diagram of another text detection apparatus according to an embodiment of the disclosure. As shown in fig. 12, the text detection apparatus may include:
an obtaining unit 610, configured to obtain a mask map of a text line region mask of a target picture, where the mask map and the target picture have the same size;
a determining unit 620, configured to determine values of each pixel point in the mask map, where in a text line region mask numbered i, the value of each pixel point in the text line region mask is i, and values of the remaining pixel points in the mask map outside the text line region mask are 0; i is more than or equal to 1 and less than or equal to M, wherein M is the total number of the text line region masks corresponding to the target picture;
the calculating unit 630 is configured to subtract the value of the pixel point in the jth row from the value of the pixel point in the jth row in the mask map to obtain a new value of the pixel point in the jth row or the jth +1 row, where j is greater than or equal to 1 and less than or equal to N, and N is the total number of rows of the mask map;
an information forming unit 640, configured to form first boundary information corresponding to the text line region mask numbered i from the set of pixel points whose new value equals -i, and to form second boundary information corresponding to the text line region mask numbered i from the set of pixel points whose new value equals i;
and an outline constructing unit 650, configured to construct a text line outline corresponding to the text line region mask with the number i by using the first boundary information and the second boundary information.
As an optional implementation manner, the obtaining unit 610 may include:
a picture acquisition subunit 611, configured to acquire a target picture;
and the identifying subunit 612 is configured to input the target picture into a pre-trained text line detection network model based on deep learning, and output a mask map with masks of text line regions.
As an alternative embodiment, the outline constructing unit 650 may include:
a first information constructing subunit 641, configured to sequentially connect the pixel points whose new value equals -i to form a first boundary corresponding to the text line region mask numbered i, and to sequentially connect the pixel points whose new value equals i to form a second boundary corresponding to the text line region mask numbered i;
the target pixel point determining subunit 642 is configured to determine, as a first pixel point, the pixel point with the smallest abscissa among the pixel points whose new value equals -i, and to determine, as a second pixel point, the pixel point with the smallest abscissa among the pixel points whose new value equals i; and to determine, as a third pixel point, the pixel point with the largest abscissa among the pixel points whose new value equals -i, and, as a fourth pixel point, the pixel point with the largest abscissa among the pixel points whose new value equals i;
a second information constructing subunit 643, configured to connect the first pixel point and the second pixel point as the left boundary corresponding to the text line region mask numbered i, and to connect the third pixel point and the fourth pixel point as the right boundary of the text line region mask numbered i;
a third information constructing subunit 644, configured to form the text line outline corresponding to the text line region mask numbered i from the closed box formed by the left boundary, the first boundary, the right boundary, and the second boundary (see the sketch below).
As an optional implementation, the apparatus may further include:
a synthesizing unit 660, configured to determine the position of the text line outline corresponding to the text line region mask numbered i in the target picture, and synthesize the text line outline into the target picture.
The text detection apparatus shown in fig. 12 likewise performs a line-by-line subtraction on the multi-class mask map of text line regions and background, locates the upper and lower boundaries of each text line, and computes the outline information of the corresponding text line from them. This greatly reduces the time consumed by the module, makes it insensitive to the density of the text line regions, brings the average time for finding an outline to within 50 ms, and largely resolves the problem of high overall OCR latency.
EXAMPLE six
Referring to fig. 13, fig. 13 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure. As shown in fig. 13, the electronic device may include:
a memory 710 storing executable program code;
a processor 720 coupled to the memory 710;
the processor 720 calls the executable program code stored in the memory 710 to execute some or all of the steps of the text detection method according to any one of the first to third embodiments.
An embodiment of the invention discloses a computer-readable storage medium storing a computer program, wherein the computer program causes a computer to execute some or all of the steps of the text detection method according to any one of the first to third embodiments.
An embodiment of the invention also discloses a computer program product which, when run on a computer, causes the computer to execute some or all of the steps of the text detection method according to any one of the first to third embodiments.
An embodiment of the invention also discloses an application publishing platform for publishing the computer program product, wherein the computer program product, when run on a computer, causes the computer to execute some or all of the steps of the text detection method according to any one of the first to third embodiments.
In the various embodiments of the present invention, it should be understood that the sequence numbers of the processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be realized in the form of hardware or in the form of a software functional unit.
The integrated units, if implemented as software functional units and sold or used as a stand-alone product, may be stored in a computer-accessible memory. Based on this understanding, the part of the technical solution of the present invention that in essence contributes beyond the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, and in particular a processor in the computer device) to execute some or all of the steps of the method according to the embodiments of the present invention.
In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A, and that B can be determined from A. It should also be understood that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
Those of ordinary skill in the art will appreciate that some or all of the steps of the methods in the embodiments may be implemented by a program instructing the associated hardware. The program may be stored in a computer-readable storage medium, including a Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-Time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other memory capable of storing data, magnetic tape, or any other medium capable of carrying computer data.
The text detection method, apparatus, electronic device, and storage medium disclosed in the embodiments of the present invention are described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to help in understanding the method and its core idea. A person skilled in the art may, according to the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (14)

1. A method of text detection, comprising:
acquiring a mask image of a text line area mask of a target picture, wherein the mask image and the target picture have the same size;
determining the value of each pixel point in the mask image, wherein in the text line region mask with the number of i, the value of each pixel point in the text line region mask is i, and the values of the rest pixel points outside the text line region mask in the mask image are 0; i is more than or equal to 1 and less than or equal to M, wherein M is the total number of the text line region masks corresponding to the target picture;
subtracting the value of the corresponding pixel point in the (j+1)th row from the value of the pixel point in the jth row of the mask image to obtain a new value of the pixel point in the jth row or the (j+1)th row, wherein j is greater than or equal to 1 and less than or equal to N, and N is the total number of rows of the mask image;
a set of pixel points which are equal to -i in the new value forms first boundary information corresponding to the text line region mask with the number i, and a set of pixel points which are equal to i in the new value forms second boundary information corresponding to the text line region mask with the number i;
and constructing a text line outline corresponding to the text line area mask with the number i by using the first boundary information and the second boundary information.
2. The method of claim 1, wherein obtaining a mask map of a text line region mask of the target picture comprises:
acquiring a target picture;
and inputting the target picture into a pre-trained text line detection network model based on deep learning, and outputting a mask image with masks of all text line regions.
3. The method according to claim 1, wherein the constructing a text line outline corresponding to a text line region mask with a number i by using the first boundary information and the second boundary information comprises:
and determining, according to the first boundary information and the second boundary information, the centerline position and the height corresponding to the text line region mask with the number i.
4. The method according to claim 3, wherein the determining, according to the first boundary information and the second boundary information, the centerline position and the height corresponding to the text line region mask with the number i comprises:
determining, for a first coordinate of a pixel point which is equal to -i in the new value, a second coordinate of the pixel point which is equal to i in the new value, wherein the abscissa of the first coordinate is the same as the abscissa of the second coordinate;
adding the vertical coordinates of the first coordinate and the second coordinate, and then averaging to obtain a midpoint position; all the midpoint positions form a centerline position corresponding to the text line region mask with the number i;
subtracting the vertical coordinates of the first coordinate and the second coordinate, and then taking an absolute value to obtain height information; the height corresponding to the text line region mask with the number i is formed by the set of all height information;
and constructing a text line outline corresponding to the text line region mask with the number i based on the center line position and the height corresponding to the text line region mask with the number i.
5. The method according to claim 1, wherein the constructing a text line outline corresponding to a text line region mask with a number i by using the first boundary information and the second boundary information comprises:
sequentially connecting the pixel points which are equal to -i in the new value to form a first boundary corresponding to the text line region mask with the number i; sequentially connecting the pixel points which are equal to i in the new value to form a second boundary corresponding to the text line region mask with the number i;
determining the pixel point with the smallest abscissa among the pixel points whose new value is equal to -i as a first pixel point, and determining the pixel point with the smallest abscissa among the pixel points whose new value is equal to i as a second pixel point; determining the pixel point with the largest abscissa among the pixel points whose new value is equal to -i as a third pixel point, and determining the pixel point with the largest abscissa among the pixel points whose new value is equal to i as a fourth pixel point;
connecting the first pixel point with the second pixel point to be used as a left boundary corresponding to the text line region mask with the number of i; connecting the third pixel point and the fourth pixel point to be used as the right boundary of the text line region mask with the number i;
and forming a text line outline corresponding to the text line area mask with the number i by using a closed box formed by the left boundary, the first boundary, the right boundary and the second boundary.
6. The method according to any one of claims 1-5, further comprising:
and determining the position of a text line outline corresponding to the text line region mask with the number of i in the target picture, and synthesizing the text line outline into the target picture.
7. An apparatus for text detection, comprising:
the device comprises an acquisition unit, a processing unit and a display unit, wherein the acquisition unit is used for acquiring a mask image of a text line region mask of a target picture, and the mask image and the target picture have the same size;
the determining unit is used for determining the value of each pixel point in the mask image, in the text line region mask with the number of i, the value of each pixel point in the text line region mask is i, and the values of the other pixel points outside the text line region mask in the mask image are 0; i is more than or equal to 1 and less than or equal to M, wherein M is the total number of the text line region masks corresponding to the target picture;
the calculation unit is used for subtracting the value of the corresponding pixel point in the (j+1)th row from the value of the pixel point in the jth row of the mask image to obtain a new value of the pixel point in the jth row or the (j+1)th row, wherein j is greater than or equal to 1 and less than or equal to N, and N is the total number of rows of the mask image;
the information forming unit is used for forming first boundary information corresponding to the text line region mask with the number i from the set of pixel points which are equal to -i in the new value, and forming second boundary information corresponding to the text line region mask with the number i from the set of pixel points which are equal to i in the new value;
and the outline construction unit is used for constructing a text line outline corresponding to the text line area mask with the number i by using the first boundary information and the second boundary information.
8. The apparatus of claim 7, wherein the obtaining unit comprises:
a picture acquiring subunit, configured to acquire a target picture;
and the identification subunit is used for inputting the target picture into a pre-trained text line detection network model based on deep learning and outputting a mask image with masks of all text line regions.
9. The apparatus of claim 7, wherein the outline construction unit comprises:
and the centerline and height acquisition subunit is used for determining, according to the first boundary information and the second boundary information, the centerline position and the height corresponding to the text line region mask with the number i.
10. The apparatus of claim 9, wherein the centerline and height acquisition subunit comprises:
the first grandchild unit is used for determining, for a first coordinate of a pixel point which is equal to -i in the new value, a second coordinate of the pixel point which is equal to i in the new value, wherein the abscissa of the first coordinate is the same as that of the second coordinate;
the second grandchild unit is used for adding the vertical coordinates of the first coordinate and the second coordinate and then averaging to obtain the midpoint position; all the midpoint positions form a centerline position corresponding to the text line region mask with the number i;
the third grandchild unit is used for subtracting the vertical coordinates of the first coordinate and the second coordinate and then taking the absolute value to obtain height information; the set of all height information forms the height corresponding to the text line region mask with the number i;
and a fourth grandchild unit, configured to construct, based on the centerline position and the height corresponding to the text line area mask numbered i, a text line outline corresponding to the text line area mask numbered i.
11. The apparatus of claim 7, wherein the outline construction unit comprises:
the first information construction subunit is used for sequentially connecting the pixel points which are equal to -i in the new value to form a first boundary corresponding to the text line region mask with the number i, and sequentially connecting the pixel points which are equal to i in the new value to form a second boundary corresponding to the text line region mask with the number i;
the target pixel point determining subunit is used for determining the pixel point with the smallest abscissa among the pixel points whose new value is equal to -i as a first pixel point, and determining the pixel point with the smallest abscissa among the pixel points whose new value is equal to i as a second pixel point; and for determining the pixel point with the largest abscissa among the pixel points whose new value is equal to -i as a third pixel point, and the pixel point with the largest abscissa among the pixel points whose new value is equal to i as a fourth pixel point;
the second information construction subunit is used for connecting the first pixel point with the second pixel point and taking the first pixel point and the second pixel point as a left boundary corresponding to the text line region mask with the number i; connecting the third pixel point with the fourth pixel point to be used as the right boundary of the text line region mask with the number i;
and a third information construction subunit, configured to form a text line outline corresponding to the text line region mask with the number i by using a closed box formed by the left boundary, the first boundary, the right boundary, and the second boundary.
12. The apparatus of any of claims 7-11, further comprising:
and the synthesis unit is used for determining the position of the text line outline corresponding to the text line area mask with the number i in the target picture and synthesizing the text line outline into the target picture.
13. An electronic device, comprising: a memory storing executable program code; a processor coupled with the memory; the processor calls the executable program code stored in the memory for performing a method of text detection as claimed in any one of claims 1 to 6.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, wherein the computer program causes a computer to perform a method of text detection as claimed in any one of claims 1 to 6.
CN202010513951.4A 2020-06-08 2020-06-08 Text detection method and device, electronic equipment and storage medium Active CN111666933B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010513951.4A CN111666933B (en) 2020-06-08 2020-06-08 Text detection method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111666933A CN111666933A (en) 2020-09-15
CN111666933B (en) 2023-04-07

Family

ID=72385773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010513951.4A Active CN111666933B (en) 2020-06-08 2020-06-08 Text detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111666933B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257629A (en) * 2020-10-29 2021-01-22 广联达科技股份有限公司 Text information identification method and device for construction drawing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9916522B2 (en) * 2016-03-11 2018-03-13 Kabushiki Kaisha Toshiba Training constrained deconvolutional networks for road scene semantic segmentation

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1388815A2 (en) * 2002-04-25 2004-02-11 Microsoft Corporation Segmented layered image system
CN109076246A (en) * 2016-04-06 2018-12-21 英特尔公司 Use the method for video coding and system of image data correction mask
CN109522900A (en) * 2018-10-30 2019-03-26 北京陌上花科技有限公司 Natural scene character recognition method and device
CN110569708A (en) * 2019-06-28 2019-12-13 北京市商汤科技开发有限公司 Text detection method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Yixin et al., "Development and Challenges of Text Detection Algorithms," Journal of Signal Processing, vol. 33, no. 4, 2017, pp. 558-568. *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant