CN111666933B - Text detection method and device, electronic equipment and storage medium

Text detection method and device, electronic equipment and storage medium

Info

Publication number
CN111666933B
Authority
CN
China
Prior art keywords
text line
pixel point
mask
line region
text
Prior art date
Legal status
Active
Application number
CN202010513951.4A
Other languages
Chinese (zh)
Other versions
CN111666933A (en)
Inventor
尹磊
邓小兵
张春雨
Current Assignee
Guangdong Genius Technology Co Ltd
Original Assignee
Guangdong Genius Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Genius Technology Co Ltd
Priority to CN202010513951.4A
Publication of CN111666933A
Application granted
Publication of CN111666933B
Status: Active
Anticipated expiration

Classifications

    • G06V 10/22 — Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06F 18/24 — Pattern recognition; classification techniques
    • G06N 20/00 — Machine learning
    • G06V 10/267 — Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds


Abstract

The embodiments of the invention disclose a text detection method and device, electronic equipment, and a storage medium. The method comprises the following steps: acquiring a mask map of the text line region masks of a target picture; determining the value of each pixel point in the mask map, wherein each pixel point inside the text line region mask numbered i has the value i; subtracting the value of the corresponding pixel point in the (j+1)-th row from the value of each pixel point in the j-th row of the mask map to obtain new values for the pixel points of the j-th (or (j+1)-th) row; forming first boundary information corresponding to the text line region mask numbered i from the set of pixel points whose new value equals −i, and second boundary information from the set whose new value equals i; and constructing the text line outline corresponding to the text line region mask numbered i from the first boundary information and the second boundary information. By implementing the embodiments of the invention, the outline of each text line can be determined quickly, reducing the overall time consumed by text recognition.

Description

Text detection method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of text detection, in particular to a text detection method and device, electronic equipment and a storage medium.
Background
In text recognition technology, a photographed image is strongly affected by its environment. During recognition, text lines must first be detected to obtain an optimal bounding box for each line, so that the text inside the box can be recognized.
The existing classical text detection approach is a text line detection algorithm based on PSENet, which combines the FPN and PSE techniques: each text line is detected through an FPN, and after post-processing based on PSE (the progressive scale expansion algorithm), a mask map that multi-classifies text regions and background is output, i.e., a single-channel matrix of the same size as the input image whose elements take the values 0, 1, 2, and so on, with one positive value per text line region.
After the multi-class mask map is obtained, the mask of each text line region is traversed with the findContours function in OpenCV to find the outline of each text line region. However, this contour search must be performed once per text line region mask, so for an input image with densely packed text lines the overall time spent finding contours is high — over 400 ms, or roughly 80%–90% of the whole text line detection algorithm — which in turn inflates the overall OCR latency.
Disclosure of Invention
In view of the above defects, the embodiments of the invention disclose a text detection method, a text detection device, electronic equipment, and a storage medium that can quickly determine the outline of each text line and reduce the overall time consumed by text recognition.
The first aspect of the embodiments of the present invention discloses a method for text detection, where the method includes:
acquiring a mask map of the text line region masks of a target picture, wherein the mask map and the target picture have the same size;
determining the value of each pixel point in the mask map, wherein each pixel point inside the text line region mask numbered i has the value i and all remaining pixel points of the mask map outside the text line region masks have the value 0, with 1 ≤ i ≤ M, where M is the total number of text line region masks corresponding to the target picture;
subtracting the value of the corresponding pixel point in the (j+1)-th row from the value of each pixel point in the j-th row of the mask map to obtain new values for the pixel points of the j-th (or (j+1)-th) row, with 1 ≤ j ≤ N, where N is the total number of rows of the mask map;
forming first boundary information corresponding to the text line region mask numbered i from the set of pixel points whose new value equals −i, and second boundary information corresponding to the same mask from the set of pixel points whose new value equals i;
and constructing the text line outline corresponding to the text line region mask numbered i by using the first boundary information and the second boundary information.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the obtaining a mask map of a text line region mask of a target picture includes:
acquiring a target picture;
and inputting the target picture into a pre-trained text line detection network model based on deep learning, and outputting a mask image with masks of all text line regions.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the constructing, by using the first boundary information and the second boundary information, a text line outline corresponding to the text line region mask numbered i includes:
and determining the position and height of the center line corresponding to the text line area mask with the number i according to the first boundary information and the second boundary information.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the determining, according to the first boundary information and the second boundary information, a centerline position and a height corresponding to a text line area mask with a number i includes:
determining, for the first coordinate of each pixel point whose new value equals −i, the second coordinate of the corresponding pixel point whose new value equals i, wherein the first and second coordinates share the same abscissa;
adding the ordinates of the first coordinate and the second coordinate and averaging them to obtain a midpoint position, the set of all midpoint positions forming the centerline position corresponding to the text line region mask numbered i;
subtracting the ordinates of the first and second coordinates and taking the absolute value to obtain height information, the set of all height information forming the height corresponding to the text line region mask numbered i;
and constructing a text line outline corresponding to the text line region mask with the number i based on the center line position and the height corresponding to the text line region mask with the number i.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the constructing, by using the first boundary information and the second boundary information, a text line outline corresponding to the text line region mask numbered i includes:
sequentially connecting the pixel points whose new value equals −i to form a first boundary corresponding to the text line region mask numbered i, and sequentially connecting the pixel points whose new value equals i to form a second boundary corresponding to the same mask;
determining the pixel point with the smallest abscissa among those whose new value equals −i as a first pixel point and the pixel point with the smallest abscissa among those whose new value equals i as a second pixel point; determining the pixel point with the largest abscissa among those whose new value equals −i as a third pixel point and the pixel point with the largest abscissa among those whose new value equals i as a fourth pixel point;
connecting the first pixel point with the second pixel point to serve as the left boundary corresponding to the text line region mask numbered i, and connecting the third pixel point with the fourth pixel point to serve as its right boundary;
and taking the closed box formed by the left boundary, the first boundary, the right boundary, and the second boundary as the text line outline corresponding to the text line region mask numbered i.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the method further includes:
and determining the position of a text line outline corresponding to the text line region mask with the number of i in the target picture, and synthesizing the text line outline into the target picture.
A second aspect of the embodiments of the present invention discloses a text detection apparatus, including:
an acquisition unit, configured to acquire a mask map of the text line region masks of a target picture, the mask map and the target picture having the same size;
a determining unit, configured to determine the value of each pixel point in the mask map, wherein each pixel point inside the text line region mask numbered i has the value i and all remaining pixel points outside the text line region masks have the value 0, with 1 ≤ i ≤ M, where M is the total number of text line region masks corresponding to the target picture;
a calculating unit, configured to subtract the value of the corresponding pixel point in the (j+1)-th row from the value of each pixel point in the j-th row of the mask map to obtain new values for the pixel points of the j-th (or (j+1)-th) row, with 1 ≤ j ≤ N, where N is the total number of rows of the mask map;
an information forming unit, configured to form first boundary information corresponding to the text line region mask numbered i from the set of pixel points whose new value equals −i, and second boundary information corresponding to the same mask from the set of pixel points whose new value equals i;
and an outline construction unit, configured to construct the text line outline corresponding to the text line region mask numbered i by using the first boundary information and the second boundary information.
As an optional implementation manner, in a second aspect of the embodiment of the present invention, the obtaining unit includes:
the picture acquisition subunit is used for acquiring a target picture;
and the identification subunit is used for inputting the target picture into a pre-trained text line detection network model based on deep learning and outputting a mask image with the mask of each text line region.
As an alternative implementation, in a second aspect of the embodiment of the present invention, the contour constructing unit includes:
and the central line and height acquisition subunit is used for determining the position and the height of the central line corresponding to the text line region mask with the number i according to the first boundary information and the second boundary information.
As an alternative implementation, in a second aspect of the embodiments of the present invention, the centerline and height obtaining subunit includes:
a first sub-subunit, configured to determine, for the first coordinate of each pixel point whose new value equals −i, the second coordinate of the corresponding pixel point whose new value equals i, the first and second coordinates sharing the same abscissa;
a second sub-subunit, configured to add the ordinates of the first and second coordinates and average them to obtain a midpoint position, the set of all midpoint positions forming the centerline position corresponding to the text line region mask numbered i;
a third sub-subunit, configured to subtract the ordinates of the first and second coordinates and take the absolute value to obtain height information, the set of all height information forming the height corresponding to the text line region mask numbered i;
and a fourth sub-subunit, configured to construct the text line outline corresponding to the text line region mask numbered i based on the centerline position and height corresponding to that mask.
As an alternative implementation, in a second aspect of the embodiment of the present invention, the contour constructing unit includes:
a first information construction subunit, configured to sequentially connect the pixel points whose new value equals −i to form a first boundary corresponding to the text line region mask numbered i, and sequentially connect the pixel points whose new value equals i to form a second boundary corresponding to the same mask;
a target pixel point determining subunit, configured to determine the pixel point with the smallest abscissa among those whose new value equals −i as a first pixel point and the pixel point with the smallest abscissa among those whose new value equals i as a second pixel point, and to determine the pixel point with the largest abscissa among those whose new value equals −i as a third pixel point and the pixel point with the largest abscissa among those whose new value equals i as a fourth pixel point;
a second information construction subunit, configured to connect the first pixel point with the second pixel point to serve as the left boundary corresponding to the text line region mask numbered i, and to connect the third pixel point with the fourth pixel point to serve as its right boundary;
and a third information construction subunit, configured to form the text line outline corresponding to the text line region mask numbered i from the closed box formed by the left boundary, the first boundary, the right boundary, and the second boundary.
As an optional implementation manner, in a second aspect of the embodiment of the present invention, the apparatus further includes:
and the synthesis unit is used for determining the position of the text line outline corresponding to the text line area mask with the number i in the target picture and synthesizing the text line outline into the target picture.
A third aspect of the embodiments of the present invention discloses an electronic device, comprising a memory storing executable program code and a processor coupled with the memory, wherein the processor calls the executable program code stored in the memory to execute part or all of the steps of the text detection method disclosed in the first aspect of the embodiments of the present invention.
A fourth aspect of the embodiments of the present invention discloses a computer-readable storage medium storing a computer program, where the computer program enables a computer to execute part or all of the steps of a method for text detection disclosed in the first aspect of the embodiments of the present invention.
A fifth aspect of the embodiments of the present invention discloses a computer program product, which, when running on a computer, causes the computer to execute part or all of the steps of a method for text detection disclosed in the first aspect of the embodiments of the present invention.
A sixth aspect of the embodiments of the present invention discloses an application publishing platform, where the application publishing platform is configured to publish a computer program product, where when the computer program product runs on a computer, the computer is enabled to execute part or all of the steps of a method for text detection disclosed in the first aspect of the embodiments of the present invention.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
In the embodiment of the invention, a mask map of the text line region masks of a target picture is acquired, the mask map having the same size as the target picture. The value of each pixel point in the mask map is determined: each pixel point inside the text line region mask numbered i has the value i, all remaining pixel points outside the text line region masks have the value 0, and 1 ≤ i ≤ M, where M is the total number of text line region masks corresponding to the target picture. The value of the corresponding pixel point in the (j+1)-th row is subtracted from the value of each pixel point in the j-th row of the mask map to obtain new values for the pixel points of the j-th (or (j+1)-th) row, with 1 ≤ j ≤ N, where N is the total number of rows of the mask map. The set of pixel points whose new value equals −i forms the first boundary information corresponding to the text line region mask numbered i, the set whose new value equals i forms the second boundary information, and the text line outline corresponding to that mask is constructed from the two. Thus, by implementing the embodiment of the invention, a line-by-line subtraction is performed on the mask map that multi-classifies text line regions and background; the subtraction reveals the upper and lower boundaries of each text line, from which the outline information of the corresponding text line is calculated. This greatly reduces the time consumed by the module, makes it insensitive to how densely the text line regions are packed, reduces the average time for finding contours to within 50 ms, and largely resolves the problem of excessive overall OCR latency.
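For illustration, the following is a minimal NumPy sketch of these steps (NumPy is the Python library named later in the description); the toy mask map, the region number 7, and all variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy mask map: one text line region numbered i = 7 occupying
# rows 2-3, columns 1-5; all other pixel points are background (0).
mask = np.zeros((6, 8), dtype=np.int32)
mask[2:4, 1:6] = 7

# Subtract row j+1 from row j; the last row has no successor and stays 0.
diff = np.zeros_like(mask)
diff[:-1, :] = mask[:-1, :] - mask[1:, :]

i = 7
first_boundary = np.argwhere(diff == -i)   # with top-down subtraction: upper boundary pixels
second_boundary = np.argwhere(diff == i)   # lower boundary pixels
# The text line outline for region i is then constructed from these two
# (row, col) pixel sets, e.g. by connecting them and closing the left/right ends.
```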
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of a method for text detection according to an embodiment of the present invention;
FIG. 2 is a schematic illustration of a mask map disclosed in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of the new values of the mask map of FIG. 2 after line-by-line subtraction;
FIG. 4 is a schematic illustration of a first boundary and a second boundary determined based on the new values of FIG. 3;
FIG. 5 is a flow chart illustrating another method for text detection according to an embodiment of the present invention;
FIG. 6 is a flow chart illustrating another method for text detection according to an embodiment of the present invention;
FIG. 7 is a schematic illustration of a text line profile determined based on the new values of FIG. 3;
FIG. 8 is a schematic illustration of another mask map disclosed in an embodiment of the present invention;
FIG. 9 is a schematic diagram of the new values of the mask map of FIG. 8 after line-by-line subtraction;
FIG. 10 is a schematic illustration of a text line profile determined based on the new values of FIG. 9;
FIG. 11 is a schematic structural diagram of an apparatus for text detection according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of another text detection apparatus disclosed in the embodiments of the present invention;
fig. 13 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
It should be noted that the terms "first", "second", "third", "fourth", and the like in the description and the claims of the present invention are used for distinguishing different objects, not for describing a specific order. The terms "comprises", "comprising", and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiments of the invention disclose a text detection method, a text detection device, electronic equipment, and a storage medium. A line-by-line subtraction is performed on a mask map that multi-classifies text line regions and background; the subtraction reveals the upper and lower boundaries of each text line, from which the outline information of the corresponding text line is calculated. This greatly reduces the time consumed by the module, makes it insensitive to how densely the text line regions are packed, reduces the average time for finding contours to within 50 ms, and largely resolves the problem of excessive overall OCR latency. The details are described below with reference to the drawings.
Example one
Referring to fig. 1, fig. 1 is a schematic flow chart of a text detection method according to an embodiment of the present invention. As shown in fig. 1, the text detection method includes the following steps:
110. Acquire a mask map of the text line region masks of the target picture, the mask map and the target picture having the same size.
The target picture is an image input by a user; it may be obtained by photographing a document with an image acquisition device or downloaded from the Internet, which is not limited here. One or more text lines exist in the target picture, and the text lines are not required to be horizontal.
There are many ways to acquire the text line region masks of the target picture. The embodiment of the invention adopts a text line detection network model based on deep learning, which may use any deep learning network such as YOLO, CTPN, or PSENet. Illustratively, a PSENet text line detection network model is adopted, so that the detection result is strongly robust to illumination, color, texture, blur, and similar conditions.
After the PSENet text line detection network model is created, it is trained on a sample set whose labels are the bounding boxes of the text lines. The target picture is then input into the model: each text line region is detected through the FPN, and after PSE-based post-processing (the progressive scale expansion algorithm), a mask map that multi-classifies text regions and background is output.
The mask map is a presentation of a matrix that has the same size as the target picture and only one channel. The matrix is a two-dimensional N × m matrix, where N is the number of pixel rows and m is the number of pixel columns of the target picture and the mask map; each element takes a value of 0, 1, 2, and so on, with 0 marking background pixels and each positive value marking the pixels of one text line region.
For the user, the matrix itself cannot visually display each text line region mask; the final output of the PSENet text line detection network model is therefore rendered as a mask image in which each text line region mask takes a different value, so the multi-class text line region masks are presented in different colors.
120. Determine the value of each pixel point in the mask map: each pixel point inside the text line region mask numbered i has the value i, the remaining pixel points of the mask map outside the text line region masks have the value 0, and 1 ≤ i ≤ M, where M is the total number of text line region masks corresponding to the target picture.
Based on the principle of step 110, the values of all pixel points can be obtained, and each text line region mask is numbered according to the value of its pixel points: every pixel point in the mask numbered i has the value i, every pixel point in the mask numbered k has the value k, and the remaining pixel points outside the masks have the value 0. Here 1 ≤ i ≤ M, where M is the total number of text line region masks identified by the PSENet text line detection network model for the target picture.
130. Subtract the value of the corresponding pixel point in the (j+1)-th row from the value of each pixel point in the j-th row of the mask map to obtain new values for the pixel points of the j-th (or (j+1)-th) row, with 1 ≤ j ≤ N, where N is the total number of rows of the mask map.
The new values can be computed with the efficient matrix operations of the Python scientific computing library NumPy: the N × m two-dimensional matrix corresponding to the mask map is subtracted row by row to obtain a matrix of new values. Specifically, the value of the corresponding pixel point in the (j+1)-th row is subtracted from the value of each pixel point in the j-th row of the mask map to obtain new values for the pixel points of the j-th (or (j+1)-th) row, where 1 ≤ j ≤ N and N is the total number of rows of the mask map.
Subtracting the value of the pixel point corresponding to the (j+1)-th row from the value of the pixel point in the j-th row means that, for each pixel point in the j-th row, the value of the pixel point in the same column of the (j+1)-th row is subtracted from it.
The first row of the mask map may be taken as the starting row, so that the new values are computed by subtracting from top to bottom; alternatively, the last row may be taken as the starting row and the new values computed from bottom to top. The two directions are interchangeable: with the first row as the starting row, for example, the value of the pixel point in the (j−1)-th row can be subtracted from the value of the pixel point in the j-th row, which realizes the subtraction from bottom to top. The difference is taken as the new value of either of the two rows involved, and the edge row left without a new value is set to 0. For instance, in the bottom-up case, if the difference between the last row and the second-to-last row is taken as the new value of the last row, then the first row has no new value and is set entirely to 0; if the difference is instead taken as the new value of the second-to-last row, then the last row has no new value and is set entirely to 0.
In a similar manner, the contour of each text line can also be determined by subtracting adjacent columns to obtain the new values, which yields left and right boundaries instead.
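As a sketch of these alternatives, assuming `mask` is the N × m NumPy matrix described above (the function names are illustrative):

```python
import numpy as np

def row_diff_top_down(mask: np.ndarray) -> np.ndarray:
    # new[j] = mask[j] - mask[j+1]; the last row has no successor and stays 0
    diff = np.zeros_like(mask)
    diff[:-1, :] = mask[:-1, :] - mask[1:, :]
    return diff

def row_diff_bottom_up(mask: np.ndarray) -> np.ndarray:
    # new[j] = mask[j] - mask[j-1], scanning upward; the first row stays 0
    diff = np.zeros_like(mask)
    diff[1:, :] = mask[1:, :] - mask[:-1, :]
    return diff

def col_diff(mask: np.ndarray) -> np.ndarray:
    # adjacent-column subtraction gives left/right boundaries instead
    diff = np.zeros_like(mask)
    diff[:, :-1] = mask[:, :-1] - mask[:, 1:]
    return diff
```

With the bottom-up variant, the pixel points whose new value equals i mark the top row of region i and those equal to −i mark the row just below it, matching the figures discussed next.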
140. Form the first boundary information corresponding to the text line region mask numbered i from the set of pixel points whose new value equals −i, and the second boundary information from the set of pixel points whose new value equals i.
Take bottom-up subtraction of the text line region mask numbered i as an example. The first boundary information is then the lower boundary information (with top-down subtraction it would be the upper boundary information), and the second boundary information is the upper boundary information (with top-down subtraction, the lower boundary information).
Specifically, referring to the mask map shown in fig. 2, which contains one rectangular text line region mask (i = 7), the new values of fig. 3 are obtained by subtracting line by line from bottom to top. Among these new values, the pixel points whose value is −7 constitute the lower boundary information and those whose value is 7 constitute the upper boundary information.
150. Construct the text line outline corresponding to the text line region mask numbered i by using the first boundary information and the second boundary information.
As shown in fig. 4, the pixel points of the first boundary information and of the second boundary information are each connected in sequence to obtain the lower boundary 21 and the upper boundary 22 corresponding to the text line region mask numbered 7. These two boundaries can serve as the text line outline for that mask: the upper and lower boundaries are extended (parallel to the horizontal direction of the mask map) to the first and last columns of the mask map, and the left and right edges of the mask map serve as the left and right boundaries of the text line region mask numbered 7.
The text line outline corresponding to every text line region mask is obtained in the same way. Because the pixel positions of each text line outline are determined, and the mask map and the target picture have exactly the same size, the values of all pixel points on every text line outline can be set according to their pixel positions to obtain a text line outline image corresponding to the mask map. The target picture and this outline image are then directly combined into a new drawing that contains all the text lines of the target picture, each wrapped by its corresponding text line outline.
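A minimal sketch of this synthesis step, assuming the (row, col) coordinates of one outline's pixel points have already been collected; the function name and the outline color are illustrative assumptions:

```python
import numpy as np

def synthesize_outline(target_rgb: np.ndarray, outline_coords: np.ndarray,
                       color=(255, 0, 0)) -> np.ndarray:
    """Burn one text line outline into a copy of the target picture.

    outline_coords: array of (row, col) positions of the outline's pixel
    points; the mask map and the target picture share these coordinates
    because their sizes are identical.
    """
    drawing = target_rgb.copy()
    drawing[outline_coords[:, 0], outline_coords[:, 1]] = color
    return drawing
```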
Each text line region can then be segmented out along its outline in the new drawing, and the segmented text lines are sent to a character recognition model for recognition (recognition may, of course, also be performed without segmentation); the recognition result can be used for question searching and the like.
By implementing the embodiment of the invention, a line-by-line subtraction is performed on the mask map that multi-classifies text line regions and background; the subtraction reveals the upper and lower boundaries of each text line, from which the outline information of the corresponding text line is calculated. This greatly reduces the time consumed by the module, makes it insensitive to how densely the text line regions are packed, reduces the average time for finding contours to within 50 ms, and largely resolves the problem of excessive overall OCR latency.
Example two
Referring to fig. 5, fig. 5 is a schematic flow chart of another text detection method according to the embodiment of the invention. As shown in fig. 5, the text detection method includes the steps of:
310. Acquire a mask map of the text line region masks of the target picture, the mask map and the target picture having the same size.
320. Determine the value of each pixel point in the mask map: each pixel point inside the text line region mask numbered i has the value i, the remaining pixel points outside the text line region masks have the value 0, and 1 ≤ i ≤ M, where M is the total number of text line region masks corresponding to the target picture.
330. Subtract the value of the corresponding pixel point in the (j+1)-th row from the value of each pixel point in the j-th row of the mask map to obtain new values for the pixel points of the j-th (or (j+1)-th) row, with 1 ≤ j ≤ N, where N is the total number of rows of the mask map.
340. Form the first boundary information corresponding to the text line region mask numbered i from the set of pixel points whose new value equals −i, and the second boundary information from the set of pixel points whose new value equals i.
350. Determine the centerline position and height corresponding to the text line region mask numbered i according to the first boundary information and the second boundary information.
360. Construct the text line outline corresponding to the text line region mask numbered i based on that centerline position and height.
Steps 310 to 340 may be the same as steps 110 to 140 in the first embodiment, and are not described herein again.
Steps 350 and 360 determine the outline of the text line from its centerline position and height.
In step 350, for the first coordinate of each pixel point whose new value equals −i, the second coordinate of the corresponding pixel point whose new value equals i is determined; the first and second coordinates share the same abscissa. Because of the row-by-row subtraction, each first coordinate corresponds to exactly one second coordinate through this shared abscissa, i.e., the two points lie in the same column of the matrix formed by the new values.
The ordinates of the first coordinate and the second coordinate are added and averaged to obtain a midpoint position; the set of all midpoint positions forms the centerline position corresponding to the text line region mask numbered i. Note that a pixel coordinate here is the position of the pixel point in the matrix built from the new values: for a point (x, y), the abscissa x is its column index and the ordinate y is its row index. The average of the two ordinates may therefore not be an integer; since the centerline position is also expressed in pixel positions, a non-integer result may be rounded up or down.
The ordinates of the first and second coordinates are subtracted and the absolute value is taken to obtain height information; the set of all height information forms the height corresponding to the text line region mask numbered i. The height information is thus expressed in pixel points.
In step 360, the text line outline corresponding to the text line region mask numbered i is constructed from the centerline position and height corresponding to that mask. Once the centerline position and height of a text line outline are determined, the outline itself is obtained. As before, the head and tail ends of the connected centerline can be extended to obtain the left and right ends of the outline, and the heights at those left and right ends can take the height information corresponding to the head and tail ends of the centerline, respectively.
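A minimal sketch of steps 350 and 360, assuming `diff` holds the new values from a bottom-up subtraction and that each column of region i contributes exactly one −i and one i pixel point; all names are illustrative:

```python
import numpy as np

def centerline_and_height(diff: np.ndarray, i: int):
    first = np.argwhere(diff == -i)    # (row, col) of first-boundary pixel points
    second = np.argwhere(diff == i)    # (row, col) of second-boundary pixel points
    # pair the two sets by their shared abscissa (column index)
    first = first[np.argsort(first[:, 1])]
    second = second[np.argsort(second[:, 1])]
    mid_rows = (first[:, 0] + second[:, 0]) // 2   # average of ordinates, rounded down
    heights = np.abs(first[:, 0] - second[:, 0])   # |difference of ordinates|
    centerline = np.stack([mid_rows, first[:, 1]], axis=1)  # (row, col) centerline pixels
    return centerline, heights
```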
Of course, in other embodiments, the first and second embodiments may be combined, and the text line outline may be constructed from the first boundary information, the second boundary information, and the centerline position and height together.
The text line outline corresponding to every text line region mask is obtained in the same way. Because the pixel positions of each text line outline are determined, and the mask map and the target picture have exactly the same size, the values of all pixel points on every text line outline can be set according to their pixel positions to obtain a text line outline image corresponding to the mask map. The target picture and this outline image are then directly combined into a new drawing that contains all the text lines of the target picture, each wrapped by its corresponding text line outline.
Each text line region can then be segmented out along its outline in the new drawing, and the segmented text lines are sent to a character recognition model for recognition (recognition may, of course, also be performed without segmentation); the recognition result can be used for question searching and the like.
By implementing the embodiment of the invention, a line-by-line subtraction is performed on the mask map that multi-classifies text line regions and background; the subtraction reveals the upper and lower boundaries of each text line, from which the outline information of the corresponding text line is calculated. This greatly reduces the time consumed by the module, makes it insensitive to how densely the text line regions are packed, reduces the average time for finding contours to within 50 ms, and largely resolves the problem of excessive overall OCR latency.
EXAMPLE III
Referring to fig. 6, fig. 6 is a schematic flowchart of another text detection method according to an embodiment of the disclosure. As shown in fig. 6, the text detection method includes the steps of:
410. Acquire a mask map of the text line region masks of the target picture, the mask map and the target picture having the same size.
420. Determine the value of each pixel point in the mask map: each pixel point inside the text line region mask numbered i has the value i, the remaining pixel points outside the text line region masks have the value 0, and 1 ≤ i ≤ M, where M is the total number of text line region masks corresponding to the target picture.
430. Subtract the value of the corresponding pixel point in the (j+1)-th row from the value of each pixel point in the j-th row of the mask map to obtain new values for the pixel points of the j-th (or (j+1)-th) row, with 1 ≤ j ≤ N, where N is the total number of rows of the mask map.
440. Form the first boundary information corresponding to the text line region mask numbered i from the set of pixel points whose new value equals −i, and the second boundary information from the set of pixel points whose new value equals i.
450. Determine the upper, lower, left, and right boundaries corresponding to the text line region mask numbered i according to the first boundary information and the second boundary information.
460. Construct the text line outline corresponding to the text line region mask numbered i based on those upper, lower, left, and right boundaries.
Steps 410 to 440 may be the same as steps 110 to 140 in the first embodiment, and are not described herein again.
If the target picture has a complex, multi-column layout, the outline division method above may cover text from other lines and make recognition inaccurate. In steps 450 and 460 of this embodiment of the invention, the outline of the text line is therefore determined by upper, lower, left, and right boundaries.
In step 450, the pixel points whose new value equals −i are connected in sequence to form the first boundary corresponding to the text line region mask numbered i, and the pixel points whose new value equals i are connected in sequence to form the second boundary. Still taking the mask map shown in fig. 2 as an example and the bottom-up line-by-line subtraction of fig. 3, connecting the new-value pixel points yields the second boundary (the upper boundary) as line segment AB and the first boundary (the lower boundary) as line segment EF in fig. 7.
Then the pixel point with the smallest abscissa among those whose new value equals −i is determined as the first pixel point, and the pixel point with the smallest abscissa among those whose new value equals i as the second pixel point; the pixel point with the largest abscissa among those whose new value equals −i is determined as the third pixel point, and the pixel point with the largest abscissa among those whose new value equals i as the fourth pixel point. The first pixel point is connected with the second pixel point to serve as the left boundary corresponding to the text line region mask numbered i, and the third pixel point is connected with the fourth pixel point to serve as its right boundary.
Because a line-by-line subtraction is adopted, the first and second pixel points share the same abscissa, as do the third and fourth pixel points. After the first pixel point is connected with the second and the third with the fourth, the left boundary, line segment AF, and the right boundary, line segment BE, in fig. 7 are obtained.
In step 460, the closed box formed by the left boundary, the first boundary, the right boundary, and the second boundary forms the text line outline corresponding to the text line region mask numbered 7; that is, the closed box formed by line segments AB, BE, EF, and AF in fig. 7 serves as the text line outline for that mask.
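A minimal sketch of this boundary construction, again assuming `diff` holds the new values and both boundary pixel sets are non-empty; names are illustrative:

```python
import numpy as np

def box_corners(diff: np.ndarray, i: int):
    first = np.argwhere(diff == -i)    # first-boundary pixel points (row, col)
    second = np.argwhere(diff == i)    # second-boundary pixel points (row, col)
    p1 = first[np.argmin(first[:, 1])]    # smallest abscissa among the -i pixels
    p2 = second[np.argmin(second[:, 1])]  # smallest abscissa among the i pixels
    p3 = first[np.argmax(first[:, 1])]    # largest abscissa among the -i pixels
    p4 = second[np.argmax(second[:, 1])]  # largest abscissa among the i pixels
    # Segment p1-p2 is the left boundary and p3-p4 the right boundary; together
    # with the first and second boundaries they close the text line outline.
    return p1, p2, p3, p4
```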
In fig. 7, the closed box formed by ABCD can be regarded as the theoretical text line outline for the text line region mask numbered 7. The outline obtained by the method of the invention therefore contains the theoretical outline and is larger than it by only one row of pixel points. Relative to the whole mask map this deviation is negligible, because the gap between two text lines is generally far larger than one row of pixels, while the calculation speed is increased severalfold.
In the same way, the embodiment of the invention can also be applied to determining the outline of a curved text line; figs. 8 to 10 show the line-by-line subtraction of a curved text line numbered 8 and the determination of its outline. As can be seen from fig. 10, the text line outline determined by the embodiment of the invention (the thin line) wraps the theoretical text line outline (the thick line) and deviates outward by the position of one pixel point only in some regions; this deviation is negligible, while the calculation speed is increased severalfold.
The text line outline corresponding to every text line region mask is obtained in the same way. Because the pixel positions of each text line outline are determined, and the mask map and the target picture have exactly the same size, the values of all pixel points on every text line outline can be set according to their pixel positions to obtain a text line outline image corresponding to the mask map. The target picture and this outline image are then directly combined into a new drawing that contains all the text lines of the target picture, each wrapped by its corresponding text line outline.
Each text line region can then be segmented out along its outline in the new drawing, and the segmented text lines are sent to a character recognition model for recognition (recognition may, of course, also be performed without segmentation); the recognition result can be used for question searching and the like.
By implementing the embodiment of the invention, a line-by-line subtraction is performed on the mask map that multi-classifies text line regions and background; the subtraction reveals the upper and lower boundaries of each text line, from which the outline information of the corresponding text line is calculated. This greatly reduces the time consumed by the module, makes it insensitive to how densely the text line regions are packed, reduces the average time for finding contours to within 50 ms, and largely resolves the problem of excessive overall OCR latency.
Example four
Referring to fig. 11, fig. 11 is a schematic structural diagram of a text detection apparatus according to an embodiment of the present invention. As shown in fig. 11, the text detection apparatus may include:
an obtaining unit 510, configured to obtain a mask map of a text line area mask of a target picture, where the mask map and the target picture have the same size;
a determining unit 520, configured to determine the value of each pixel point in the mask map, wherein each pixel point inside the text line region mask numbered i has the value i and the remaining pixel points of the mask map outside the text line region masks have the value 0, with 1 ≤ i ≤ M, where M is the total number of text line region masks corresponding to the target picture;
a calculating unit 530, configured to subtract the value of the corresponding pixel point in the (j+1)-th row from the value of each pixel point in the j-th row of the mask map to obtain new values for the pixel points of the j-th (or (j+1)-th) row, with 1 ≤ j ≤ N, where N is the total number of rows of the mask map;
an information forming unit 540, configured to form first boundary information corresponding to the text line region mask numbered i from the set of pixel points whose new value equals −i, and second boundary information corresponding to the same mask from the set of pixel points whose new value equals i;
and an outline construction unit 550, configured to construct the text line outline corresponding to the text line region mask numbered i by using the first boundary information and the second boundary information.
As an optional implementation manner, the obtaining unit 510 may include:
a picture acquiring subunit 511, configured to acquire a target picture;
and the identifying subunit 512 is configured to input the target picture into a pre-trained text line detection network model based on deep learning, and output a mask map with a mask of each text line region.
As an alternative embodiment, the outline construction unit 550 may include:
and a central line and height obtaining subunit 541, configured to determine, according to the first boundary information and the second boundary information, a central line position and a height corresponding to the text line area mask numbered i.
As an alternative embodiment, the centerline and height acquisition subunit 541 includes:
a first sub-subunit 5411, configured to determine, for the first coordinate of each pixel point whose new value equals −i, the second coordinate of the corresponding pixel point whose new value equals i, the first and second coordinates sharing the same abscissa;
a second sub-subunit 5412, configured to add the ordinates of the first and second coordinates and average them to obtain a midpoint position, the set of all midpoint positions forming the centerline position corresponding to the text line region mask numbered i;
a third sub-subunit 5413, configured to subtract the ordinates of the first and second coordinates and take the absolute value to obtain height information, the set of all height information forming the height corresponding to the text line region mask numbered i;
and a fourth sub-subunit 5414, configured to construct the text line outline corresponding to the text line region mask numbered i based on the centerline position and height corresponding to that mask.
As an optional implementation, the apparatus may further include:
and a synthesizing unit 560, configured to determine a position of a text line outline corresponding to the text line area mask with the number i in the target picture, and synthesize the text line outline in the target picture.
The text detection device shown in fig. 11 performs a line-by-line subtraction on the mask map that multi-classifies text line regions and background and finds the upper and lower boundaries of each text line through the subtraction, thereby calculating the outline information of the corresponding text line. This greatly reduces the time consumed by the module, makes it insensitive to how densely the text line regions are packed, reduces the average time for finding contours to within 50 ms, and largely resolves the problem of excessive overall OCR latency.
EXAMPLE five
Referring to fig. 12, fig. 12 is a schematic structural diagram of another text detection apparatus according to an embodiment of the disclosure. As shown in fig. 12, the text detection apparatus may include:
an obtaining unit 610, configured to obtain a mask map of a text line region mask of a target picture, where the mask map and the target picture have the same size;
a determining unit 620, configured to determine values of each pixel point in the mask map, where in a text line region mask numbered i, the value of each pixel point in the text line region mask is i, and values of the remaining pixel points in the mask map outside the text line region mask are 0; i is more than or equal to 1 and less than or equal to M, wherein M is the total number of the text line region masks corresponding to the target picture;
the calculating unit 630 is configured to subtract the value of the pixel point in the jth row from the value of the pixel point in the jth row in the mask map to obtain a new value of the pixel point in the jth row or the jth +1 row, where j is greater than or equal to 1 and less than or equal to N, and N is the total number of rows of the mask map;
an information forming unit 640, configured to form first boundary information corresponding to the text line region mask numbered i from the set of pixel points whose new value equals -i, and to form second boundary information corresponding to the text line region mask numbered i from the set of pixel points whose new value equals i;
and an outline constructing unit 650, configured to construct a text line outline corresponding to the text line region mask with the number i by using the first boundary information and the second boundary information.
As an optional implementation manner, the obtaining unit 610 may include:
a picture acquisition subunit 611, configured to acquire a target picture;
and the identifying subunit 612 is configured to input the target picture into a pre-trained text line detection network model based on deep learning, and output a mask map with masks of text line regions.
As an alternative embodiment, the outline constructing unit 650 may include:
a first information constructing subunit 641, configured to sequentially connect the pixel points whose new value equals -i to form a first boundary corresponding to the text line region mask numbered i, and to sequentially connect the pixel points whose new value equals i to form a second boundary corresponding to the text line region mask numbered i;
the target pixel point determining subunit 642 is configured to determine, as a first pixel point, the pixel point with the smallest abscissa among the pixel points whose new value equals -i, and to determine, as a second pixel point, the pixel point with the smallest abscissa among the pixel points whose new value equals i; and to determine, as a third pixel point, the pixel point with the largest abscissa among the pixel points whose new value equals -i, and, as a fourth pixel point, the pixel point with the largest abscissa among the pixel points whose new value equals i;
a second information constructing subunit 643, configured to connect the first pixel point and the second pixel point as the left boundary corresponding to the text line region mask numbered i, and to connect the third pixel point and the fourth pixel point as the right boundary of the text line region mask numbered i;
a third information constructing subunit 644, configured to form the text line outline corresponding to the text line region mask numbered i from the closed box formed by the left boundary, the first boundary, the right boundary, and the second boundary (see the sketch below).
As an optional implementation, the apparatus may further include:
a synthesizing unit 660, configured to determine the position of the text line outline corresponding to the text line region mask numbered i in the target picture, and synthesize the text line outline into the target picture.
The text detection apparatus shown in fig. 12 likewise performs a line-by-line subtraction on the multi-class mask map of text line regions and background, locates the upper and lower boundaries of each text line, and computes the outline information of the corresponding text line from them. This greatly reduces the time consumed by the module, makes it insensitive to the density of the text line regions, brings the average time for finding an outline to within 50 ms, and largely resolves the problem of high overall OCR latency.
EXAMPLE six
Referring to fig. 13, fig. 13 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure. As shown in fig. 13, the electronic device may include:
a memory 710 storing executable program code;
a processor 720 coupled to the memory 710;
the processor 720 calls the executable program code stored in the memory 710 to execute some or all of the steps of the text detection method according to any one of the first to third embodiments.
An embodiment of the invention discloses a computer-readable storage medium storing a computer program, wherein the computer program causes a computer to execute some or all of the steps of the text detection method according to any one of the first to third embodiments.
An embodiment of the invention also discloses a computer program product which, when run on a computer, causes the computer to execute some or all of the steps of the text detection method according to any one of the first to third embodiments.
An embodiment of the invention also discloses an application publishing platform for publishing the computer program product, wherein the computer program product, when run on a computer, causes the computer to execute some or all of the steps of the text detection method according to any one of the first to third embodiments.
In the various embodiments of the present invention, it should be understood that the sequence numbers of the processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be realized in the form of hardware or in the form of a software functional unit.
The integrated units, if implemented as software functional units and sold or used as a stand-alone product, may be stored in a computer-accessible memory. Based on this understanding, the part of the technical solution of the present invention that in essence contributes beyond the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, and in particular a processor in the computer device) to execute some or all of the steps of the method according to the embodiments of the present invention.
In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A, and that B can be determined from A. It should also be understood that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
Those of ordinary skill in the art will appreciate that some or all of the steps of the methods in the embodiments may be implemented by a program instructing the associated hardware. The program may be stored in a computer-readable storage medium, including a Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-Time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other memory capable of storing data, magnetic tape, or any other medium capable of carrying computer data.
The text detection method, apparatus, electronic device, and storage medium disclosed in the embodiments of the present invention are described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to help in understanding the method and its core idea. A person skilled in the art may, according to the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (14)

1. A method of text detection, comprising:
acquiring a mask image of a text line area mask of a target picture, wherein the mask image and the target picture have the same size;
determining the value of each pixel point in the mask image, wherein in the text line region mask with the number of i, the value of each pixel point in the text line region mask is i, and the values of the rest pixel points outside the text line region mask in the mask image are 0; i is more than or equal to 1 and less than or equal to M, wherein M is the total number of the text line region masks corresponding to the target picture;
subtracting the value of the corresponding pixel point in the (j+1)th row from the value of the pixel point in the jth row of the mask image to obtain a new value of the pixel point in the jth row or the (j+1)th row, wherein j is greater than or equal to 1 and less than or equal to N, and N is the total number of rows of the mask image;
a set of pixel points which are equal to -i in the new value forms first boundary information corresponding to the text line region mask with the number i, and a set of pixel points which are equal to i in the new value forms second boundary information corresponding to the text line region mask with the number i;
and constructing a text line outline corresponding to the text line area mask with the number i by using the first boundary information and the second boundary information.
2. The method of claim 1, wherein obtaining a mask map of a text line region mask of the target picture comprises:
acquiring a target picture;
and inputting the target picture into a pre-trained text line detection network model based on deep learning, and outputting a mask image with masks of all text line regions.
3. The method according to claim 1, wherein the constructing a text line outline corresponding to a text line region mask with a number i by using the first boundary information and the second boundary information comprises:
and determining, according to the first boundary information and the second boundary information, the centerline position and the height corresponding to the text line region mask with the number i.
4. The method according to claim 3, wherein the determining, according to the first boundary information and the second boundary information, the centerline position and the height corresponding to the text line region mask with the number i comprises:
determining, for a first coordinate of a pixel point which is equal to -i in the new value, a second coordinate of the pixel point which is equal to i in the new value, wherein the abscissa of the first coordinate is the same as the abscissa of the second coordinate;
adding the vertical coordinates of the first coordinate and the second coordinate, and then averaging to obtain a midpoint position; all the midpoint positions form a centerline position corresponding to the text line region mask with the number i;
subtracting the vertical coordinates of the first coordinate and the second coordinate, and then taking an absolute value to obtain height information; the height corresponding to the text line region mask with the number i is formed by the set of all height information;
and constructing a text line outline corresponding to the text line region mask with the number i based on the center line position and the height corresponding to the text line region mask with the number i.
5. The method according to claim 1, wherein the constructing a text line outline corresponding to a text line region mask with a number i by using the first boundary information and the second boundary information comprises:
sequentially connecting the pixel points which are equal to -i in the new value to form a first boundary corresponding to the text line region mask with the number i; sequentially connecting the pixel points which are equal to i in the new value to form a second boundary corresponding to the text line region mask with the number i;
determining the pixel point with the smallest abscissa among the pixel points whose new value is equal to -i as a first pixel point, and determining the pixel point with the smallest abscissa among the pixel points whose new value is equal to i as a second pixel point; determining the pixel point with the largest abscissa among the pixel points whose new value is equal to -i as a third pixel point, and determining the pixel point with the largest abscissa among the pixel points whose new value is equal to i as a fourth pixel point;
connecting the first pixel point with the second pixel point to be used as a left boundary corresponding to the text line region mask with the number of i; connecting the third pixel point and the fourth pixel point to be used as the right boundary of the text line region mask with the number i;
and forming a text line outline corresponding to the text line area mask with the number i by using a closed box formed by the left boundary, the first boundary, the right boundary and the second boundary.
6. The method according to any one of claims 1-5, further comprising:
and determining the position of a text line outline corresponding to the text line region mask with the number of i in the target picture, and synthesizing the text line outline into the target picture.
7. An apparatus for text detection, comprising:
the device comprises an acquisition unit, a processing unit and a display unit, wherein the acquisition unit is used for acquiring a mask image of a text line region mask of a target picture, and the mask image and the target picture have the same size;
the determining unit is used for determining the value of each pixel point in the mask image, in the text line region mask with the number of i, the value of each pixel point in the text line region mask is i, and the values of the other pixel points outside the text line region mask in the mask image are 0; i is more than or equal to 1 and less than or equal to M, wherein M is the total number of the text line region masks corresponding to the target picture;
the calculation unit is used for subtracting the value of the corresponding pixel point in the (j+1)th row from the value of the pixel point in the jth row of the mask image to obtain a new value of the pixel point in the jth row or the (j+1)th row, wherein j is greater than or equal to 1 and less than or equal to N, and N is the total number of rows of the mask image;
the information forming unit is used for forming first boundary information corresponding to the text line region mask with the number i from the set of pixel points which are equal to -i in the new value, and forming second boundary information corresponding to the text line region mask with the number i from the set of pixel points which are equal to i in the new value;
and the outline construction unit is used for constructing a text line outline corresponding to the text line area mask with the number i by using the first boundary information and the second boundary information.
8. The apparatus of claim 7, wherein the obtaining unit comprises:
a picture acquiring subunit, configured to acquire a target picture;
and the identification subunit is used for inputting the target picture into a pre-trained text line detection network model based on deep learning and outputting a mask image with masks of all text line regions.
9. The apparatus of claim 7, wherein the outline construction unit comprises:
and the centerline and height acquisition subunit is used for determining, according to the first boundary information and the second boundary information, the centerline position and the height corresponding to the text line region mask with the number i.
10. The apparatus of claim 9, wherein the centerline and height acquisition subunit comprises:
the first grandchild unit is used for determining, for a first coordinate of a pixel point which is equal to -i in the new value, a second coordinate of the pixel point which is equal to i in the new value, wherein the abscissa of the first coordinate is the same as that of the second coordinate;
the second grandchild unit is used for adding the vertical coordinates of the first coordinate and the second coordinate and then averaging to obtain the midpoint position; all the midpoint positions form a centerline position corresponding to the text line region mask with the number i;
the third grandchild unit is used for subtracting the vertical coordinates of the first coordinate and the second coordinate and then taking the absolute value to obtain height information; the set of all height information forms the height corresponding to the text line region mask with the number i;
and a fourth grandchild unit, configured to construct, based on the centerline position and the height corresponding to the text line area mask numbered i, a text line outline corresponding to the text line area mask numbered i.
11. The apparatus of claim 7, wherein the outline construction unit comprises:
the first information construction subunit is used for sequentially connecting the pixel points which are equal to -i in the new value to form a first boundary corresponding to the text line region mask with the number i, and sequentially connecting the pixel points which are equal to i in the new value to form a second boundary corresponding to the text line region mask with the number i;
the target pixel point determining subunit is used for determining the pixel point with the smallest abscissa among the pixel points whose new value is equal to -i as a first pixel point, and determining the pixel point with the smallest abscissa among the pixel points whose new value is equal to i as a second pixel point; and for determining the pixel point with the largest abscissa among the pixel points whose new value is equal to -i as a third pixel point, and the pixel point with the largest abscissa among the pixel points whose new value is equal to i as a fourth pixel point;
the second information construction subunit is used for connecting the first pixel point with the second pixel point and taking the first pixel point and the second pixel point as a left boundary corresponding to the text line region mask with the number i; connecting the third pixel point with the fourth pixel point to be used as the right boundary of the text line region mask with the number i;
and a third information construction subunit, configured to form a text line outline corresponding to the text line region mask with the number i by using a closed box formed by the left boundary, the first boundary, the right boundary, and the second boundary.
12. The apparatus of any of claims 7-11, further comprising:
and the synthesis unit is used for determining the position of the text line outline corresponding to the text line area mask with the number i in the target picture and synthesizing the text line outline into the target picture.
13. An electronic device, comprising: a memory storing executable program code; a processor coupled with the memory; the processor calls the executable program code stored in the memory for performing a method of text detection as claimed in any one of claims 1 to 6.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, wherein the computer program causes a computer to perform a method of text detection as claimed in any one of claims 1 to 6.
CN202010513951.4A 2020-06-08 2020-06-08 Text detection method and device, electronic equipment and storage medium Active CN111666933B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010513951.4A CN111666933B (en) 2020-06-08 2020-06-08 Text detection method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111666933A CN111666933A (en) 2020-09-15
CN111666933B (en) 2023-04-07

Family

ID=72385773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010513951.4A Active CN111666933B (en) 2020-06-08 2020-06-08 Text detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111666933B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257629A (en) * 2020-10-29 2021-01-22 广联达科技股份有限公司 Text information identification method and device for construction drawing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9916522B2 (en) * 2016-03-11 2018-03-13 Kabushiki Kaisha Toshiba Training constrained deconvolutional networks for road scene semantic segmentation

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1388815A2 (en) * 2002-04-25 2004-02-11 Microsoft Corporation Segmented layered image system
CN109076246A (en) * 2016-04-06 2018-12-21 英特尔公司 Use the method for video coding and system of image data correction mask
CN109522900A (en) * 2018-10-30 2019-03-26 北京陌上花科技有限公司 Natural scene character recognition method and device
CN110569708A (en) * 2019-06-28 2019-12-13 北京市商汤科技开发有限公司 Text detection method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Yixin et al., "Development and Challenges of Text Detection Algorithms," Journal of Signal Processing, vol. 33, no. 4, 2017, pp. 558-568. *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant