CN110852229A - Method, device and equipment for determining position of text area in image and storage medium - Google Patents


Info

Publication number: CN110852229A
Application number: CN201911065589.2A
Authority: CN (China)
Prior art keywords: text, region, text region, determining, image
Legal status: Pending (assumed; not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 王亚领, 刘设伟, 马文伟
Assignee: Taikang Insurance Group Co Ltd; Taikang Online Property Insurance Co Ltd
Application filed by Taikang Insurance Group Co Ltd and Taikang Online Property Insurance Co Ltd; priority to CN201911065589.2A

Classifications

    • G PHYSICS → G06 COMPUTING; CALCULATING OR COUNTING → G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING → G06V30/00 Character recognition; recognising digital ink; document-oriented image-based pattern recognition → G06V30/40 Document-oriented image-based pattern recognition → G06V30/41 Analysis of document content → G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS → G06 COMPUTING; CALCULATING OR COUNTING → G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING → G06V20/00 Scenes; scene-specific elements → G06V20/60 Type of objects → G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS → G06 COMPUTING; CALCULATING OR COUNTING → G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING → G06V30/00 Character recognition; recognising digital ink; document-oriented image-based pattern recognition → G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Character Input (AREA)

Abstract

The invention discloses a method, an apparatus, a device and a storage medium for determining the positions of text regions in an image. The method for determining the position of a text region in an image comprises the following steps: acquiring an image to be recognized, the image containing text; performing text positioning on the image to obtain a plurality of text regions and the coordinate information of the four corners of each text region; and performing a position determination operation on each text region, respectively, including: determining the text regions that belong to the same line as a selected reference text region according to the region formed by the extension lines of the upper and lower edges of the reference text region; determining the column information of each text region in each line according to the abscissa of the same corner (for example, the upper-left corner) of each text region in the line; and determining the line information of each text region according to the average value of the ordinates of the same corner of the text regions in each line.

Description

Method, device and equipment for determining position of text area in image and storage medium
Technical Field
The invention relates to the field of text image recognition, in particular to a method, a device, equipment and a storage medium for determining the position of a text area in an image.
Background
Text content recognition is the key step by which OCR (Optical Character Recognition) data structuring outputs the characters in an image in a final text format, and determining the row and column positions of the text content is the basis of text content recognition. Accurately and efficiently determining these row and column positions is therefore a necessary condition for OCR technology to output accurate results. A method that accurately calculates the row and column positions of text not only helps OCR technology output the text content to be recognized more accurately, but also, when applied to parsing the various documents and cards of an insurance business scenario, greatly reduces the workload of manual entry and saves substantial manpower, material and financial resources, thereby lowering cost and optimizing resource allocation.
For text images with a fixed, uniform layout and identical typesetting, the existing method of determining the row and column positions of text content is to match a fixed template and determine each item of text content to be recognized from fixed coordinates.
However, for text images without a fixed, uniform layout, with varying typesetting formats, or with interfering characters around the image, applying OCR technology faces great difficulty. To complicate matters further, photographs of text taken in natural scenes inevitably exhibit some degree of oblique perspective. For example, when a bill is photographed, the bill may be rotated or its paper surface uneven, and it is difficult to guarantee a perfectly level and flat image even after correction. For determining the row and column positions of text content in these situations, no specific and effective solution currently exists.
The above information disclosed in this background section is only for enhancement of understanding of the background of the invention and therefore it may contain information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for determining a position of a text region in an image, an electronic device, and a computer-readable storage medium.
Additional features and advantages of the invention will be set forth in the detailed description which follows, or may be learned by practice of the invention.
According to an aspect of the present invention, there is provided a method for determining the position of a text region in an image, including: acquiring an image to be recognized, the image containing text; performing text positioning on the image to obtain a plurality of text regions and the coordinate information of the four corners of each text region; and performing a position determination operation on each text region, respectively, including: determining the text regions that belong to the same line as a selected reference text region according to the region formed by the extension lines of the upper and lower edges of the reference text region; determining the column information of each text region in each line according to the abscissa of the same corner (for example, the upper-left corner) of each text region in the line; and determining the line information of each text region according to the average value of the ordinates of the same corner of the text regions in each line.
According to an embodiment of the present invention, determining the text regions that belong to the same line as the reference text region according to the region formed by the extension lines of the upper and lower edges of the selected reference text region includes: step a) selecting, as the reference text region, the leftmost text region according to the abscissa of the upper-left corner of each text region; step b) determining the region formed by the upper and lower edges of the reference text region according to the straight-line equations determined by those edges; step c) determining each text region that overlaps this region as a text region belonging to the same line as the reference text region; step d) if at least one text region has been determined to belong to the same line as the reference text region, selecting the leftmost such region that has not yet served as a reference text region as the new reference text region, and repeating steps a) to c).
According to an embodiment of the present invention, determining the column information of each text region in each line according to the abscissa of the same corner of each text region in the line includes: sorting the text regions in each line by the abscissa of the upper-left corner of each text region; and determining the column information of each text region in each line according to the sorting result.
According to an embodiment of the present invention, determining the line information of each text region according to the average value of the ordinates of the same corner of the text regions in each line includes: determining the average value of the ordinates of the upper-left corners of the text regions in each line; and determining the line information of each text region according to the size of that average value.
According to an embodiment of the present invention, before the position determination operation is performed on each text region, the method further includes: classifying the plurality of text regions according to the coordinate information of the four corners of each text region, so that text regions of different categories do not overlap in the vertical direction; performing the position determination operation on each text region then includes performing the position determination operation on the text regions of each category, respectively.
According to an embodiment of the present invention, classifying the plurality of text regions according to the coordinate information of the four corners of each text region includes: sequentially screening pairs of text regions that satisfy the classification condition and placing them in one category; wherein the classification condition is that the larger of the two lower-left-corner ordinates is smaller than the smaller of the two upper-left-corner ordinates.
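As a sketch only (the helper name is hypothetical, not from the patent), this pairwise classification condition can be written as a vertical-overlap predicate; the corner pairs and the upward-pointing y axis are assumptions taken from the wording of the condition:

```python
def vertically_overlap(a, b):
    """Classification condition: the larger of the two lower-left-corner
    ordinates is smaller than the smaller of the two upper-left-corner
    ordinates, i.e. the vertical extents [y_lb, y_lt] intersect.
    Each argument is a (y_lb, y_lt) pair; y is assumed to increase upward."""
    (y_lb_a, y_lt_a), (y_lb_b, y_lt_b) = a, b
    return max(y_lb_a, y_lb_b) < min(y_lt_a, y_lt_b)

print(vertically_overlap((0, 10), (5, 15)))   # extents [0, 10] and [5, 15] intersect
print(vertically_overlap((0, 10), (12, 20)))  # disjoint extents
```

Regions that pass this test pairwise would be chained into one category, so that distinct categories share no vertical overlap.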
According to an embodiment of the present invention, performing text localization on the image to be recognized to obtain a plurality of text regions and coordinate information of four corners of each text region includes: and obtaining the coordinate information of the plurality of text regions and the four corner positions of each text region based on the trained deep learning text detection and positioning model.
According to another aspect of the present invention, there is provided an apparatus for determining a position of a text region in an image, comprising: the image acquisition module is used for acquiring an image to be identified, and the image to be identified comprises a text; the text positioning module is used for carrying out text positioning on the image to be recognized to obtain a plurality of text areas and coordinate information of four corners of each text area; the text line dividing module is used for determining the text regions which belong to the same line as the reference text region according to the region formed by the extension lines of the upper and lower edges of the selected reference text region; the first determining module is used for respectively determining the column information of each text area in each row according to the abscissa of the angle of the same azimuth of each text area in each row; and the second determining module is used for respectively determining the line information of each text area according to the average value of the vertical coordinates of the same azimuth angle of each text area in each line.
According to still another aspect of the present invention, there is provided an electronic apparatus including: the device comprises a memory, a processor and executable instructions stored in the memory and executable in the processor, wherein the processor executes the executable instructions to realize the method for determining the position of the text area in the image.
According to a further aspect of the present invention, there is provided a computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement a method of determining the position of a text region in an image as in any one of the above.
According to the method for determining the position of a text region in an image provided by the invention, the coordinate information of the text regions is used to adaptively divide all text regions in the image into lines and to quickly and effectively determine their row and column information. The method overcomes the difficulties posed by text images that lack a fixed, uniform layout, differ in typesetting format, contain interfering characters around the text, or exhibit oblique perspective, and it provides basic text information for the efficient, high-precision formatted output of subsequent OCR processing.
In addition, according to some embodiments, the method for determining the position of the text region in the image provided by the present invention can perform a preliminary classification operation first before performing the line segmentation operation on all the text regions, so as to reduce the amount of calculation in the line segmentation operation.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
FIG. 1 is a flow diagram illustrating a method for determining a location of a text region in an image according to an exemplary embodiment.
FIG. 2 is a flow diagram illustrating another method for determining a location of a text region in an image according to an example embodiment.
FIG. 3 is a flow chart illustrating yet another method for determining a location of a text region in an image according to an exemplary embodiment.
FIG. 4 is a flow chart illustrating yet another method for determining a location of a text region in an image according to an exemplary embodiment.
FIG. 5 is a flow chart illustrating yet another method for determining a location of a text region in an image according to an exemplary embodiment.
FIG. 6 is a block diagram illustrating an apparatus for determining a location of a text region in an image according to an example embodiment.
Fig. 7 is a schematic structural diagram of an electronic device according to an example embodiment.
FIG. 8 is a schematic diagram illustrating a computer-readable storage medium in accordance with an example embodiment.
FIG. 9 is a diagram illustrating a determination of whether two text regions belong to the same line, according to an example embodiment.
Fig. 10 is a diagram illustrating line splitting processing of a plurality of text regions in a text image according to an exemplary embodiment.
FIG. 11 is a diagram illustrating a process of categorizing a plurality of text regions, according to an example embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The drawings are merely schematic illustrations of the invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, apparatus, steps, and so forth. In other instances, well-known structures, methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
Further, in the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise. The symbol "/" generally indicates that the associated objects before and after it are in an "or" relationship. The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.
As described above, for text images without a fixed, uniform layout, with varying typesetting formats, or with interfering characters around the image, applying OCR technology faces great difficulty. To complicate matters further, photographs of text taken in natural scenes inevitably exhibit some degree of oblique perspective. For example, when a bill is photographed, the bill may be rotated or its paper surface uneven, and it is difficult to guarantee a perfectly level and flat image even after correction. At the present stage, no effective solution is available for determining the row and column positions of text content with high accuracy in these situations.
Therefore, the invention provides a method for determining the position of a text region in an image. Using the coordinate information of the text regions, the method can adaptively divide all text regions in the image into lines and quickly and effectively determine their row and column information. It overcomes the difficulties posed by text images that lack a fixed, uniform layout, differ in typesetting format, contain interfering characters around the text, or exhibit oblique perspective, and provides basic text information for the efficient, high-precision formatted output of subsequent OCR technology. Preferably, the method can perform a preliminary classification operation before performing the line splitting operation on all text regions, so as to reduce the amount of computation during line splitting.
The following describes a method for determining the position of a text region in an image according to embodiments of the present invention.
FIG. 1 is a flow diagram illustrating a method for determining a location of a text region in an image according to an exemplary embodiment. The method for determining the position of a text region in an image as shown in fig. 1 can be applied, for example, in a scene in which a text image is recognized based on OCR technology.
Referring to fig. 1, a method 10 for determining a location of a text region in an image includes:
in step S102, an image to be recognized is acquired.
Wherein the image to be recognized contains text.
In step S104, text positioning is performed on the image to be recognized, and a plurality of text regions and coordinate information of four corners of each text region are obtained.
In some embodiments, performing text localization on an image to be recognized, and obtaining a plurality of text regions and coordinate information of four corners of each text region may include: and obtaining a plurality of text regions and coordinate information of four corner positions of each text region based on the trained deep learning text detection and positioning model. It should be noted that the present invention does not limit the training method, detection/positioning algorithm, etc. adopted by the model, and those skilled in the art will understand that any deep learning model that can be used for detecting and positioning text regions in an image can be adopted in the step to identify and position each text region in an image to be identified.
By detecting text in the text image with a trained deep-learning text detection and positioning model, n text regions box_i (i = 1, 2, …, n), each an arbitrary closed quadrilateral, can be located. The horizontal and vertical coordinates of the four corners of each text region, namely the "upper left corner", "upper right corner", "lower left corner" and "lower right corner" of each quadrilateral, are then obtained as (x_lti, y_lti), (x_rti, y_rti), (x_lbi, y_lbi) and (x_rbi, y_rbi), respectively.
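As an illustration only (the class and field names are hypothetical, not from the patent), a detector's output for one text region box_i can be held in a small structure carrying the eight corner coordinates:

```python
from dataclasses import dataclass

@dataclass
class TextBox:
    """One detected quadrilateral text region box_i: corner coordinates
    (lt = upper left, rt = upper right, lb = lower left, rb = lower right)."""
    x_lt: float; y_lt: float
    x_rt: float; y_rt: float
    x_lb: float; y_lb: float
    x_rb: float; y_rb: float

# two slightly tilted regions, as a text detector might return them
boxes = [
    TextBox(10, 5, 80, 7, 11, 20, 81, 22),
    TextBox(100, 6, 150, 8, 101, 21, 151, 23),
]
```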
In step S106, the position determination operation is respectively performed on each text region, and specifically includes:
in step S1062, a text area belonging to the same line as the reference text area is determined based on an area formed by extension lines of the upper and lower sides of the selected reference text area.
In step S1064, the column information of each text region in each line is determined according to the abscissa of the same corner (for example, the upper-left corner) of each text region in the line.
In step S1066, the line information of each text region is determined according to the average value of the ordinates of the same corner of the text regions in each line.
According to the method for determining the position of a text region in an image provided by the embodiment of the invention, the coordinate information of the text regions is used to adaptively divide all text regions in the image into lines and to quickly and effectively determine their row and column information. The method overcomes the difficulties posed by text images that lack a fixed, uniform layout, differ in typesetting format, contain interfering characters around the text, or exhibit oblique perspective, and provides basic text information for the efficient, high-precision formatted output of subsequent OCR technology.
It should be clearly understood that the present disclosure describes how to make and use particular examples, but the principles of the present disclosure are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of the present disclosure.
FIG. 2 is a flow diagram illustrating another method for determining a location of a text region in an image according to an example embodiment. The difference from the method 10 shown in fig. 1 is that the method 20 shown in fig. 2 further provides a method of performing a line splitting operation on all text regions in an image, i.e., an embodiment of step S1062 in the method 10. Likewise, the method for determining the position of a text region in an image as shown in fig. 2 can also be applied, for example, in a scene in which a text image is recognized based on OCR technology.
Referring to fig. 2, step S1062 of the method 10 includes:
in step S202, the reference text region is determined as the leftmost text region based on the abscissa of the upper left corner of each text region.
Taking the n text regions box_i (i = 1, 2, …, n) above as an example, the initial reference text region can be determined according to the following formula (1):

min(x_lti, 1 ≤ i ≤ n) --- (1)

which screens out the leftmost text region box_i of the text image; the line splitting operation then starts with this box_i as the initial reference text region. Alternatively, the abscissa x_lbi of the "lower left corner" of each text region may be used to determine the initial reference text region.
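Formula (1) amounts to an argmin over the upper-left abscissas. A minimal sketch (the helper name is an assumption; boxes are abbreviated to their upper-left corner (x_lt, y_lt) for brevity):

```python
def leftmost_index(boxes):
    """Index of the box with the smallest upper-left abscissa x_lt,
    i.e. the initial reference text region selected by formula (1)."""
    return min(range(len(boxes)), key=lambda i: boxes[i][0])

# each box abbreviated to its upper-left corner (x_lt, y_lt)
boxes = [(50, 12), (10, 11), (120, 13)]
print(leftmost_index(boxes))  # index of the box at x_lt = 10
```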
In step S204, a region composed of the upper and lower edges of the reference text region is determined based on the straight line equation determined by the upper and lower edges of the reference text region.
As mentioned above, the straight line on the upper edge of box_i passes through the corners (x_lti, y_lti) and (x_rti, y_rti), giving formula (2):

y = y_lti + ((y_rti − y_lti) / (x_rti − x_lti)) · (x − x_lti) --- (2)

and the straight line on the lower edge of box_i passes through (x_lbi, y_lbi) and (x_rbi, y_rbi), giving formula (3):

y = y_lbi + ((y_rbi − y_lbi) / (x_rbi − x_lbi)) · (x − x_lbi) --- (3)
That is, the region formed by the upper and lower edges of the initial reference text region box_i is the unbounded strip sandwiched between the two straight lines defined by formulas (2) and (3). When the upper and lower edges of box_i are parallel, the strip extends infinitely in both directions; when they are not parallel, the two straight lines intersect and the strip extends infinitely in one direction only.
In step S206, a text region overlapping with the region is determined as a text region belonging to the same line as the reference text region.
As mentioned above, the following formula (4) can be used:

max(y_lbj, y_cbj) < min(y_ltj, y_ctj) --- (4)

to preliminarily screen, from all text regions, the text regions box_j that overlap the region formed by the upper and lower edges of the initial reference text region box_i. Here, y_ctj is the ordinate obtained by substituting the upper-left abscissa x_ltj of box_j into formula (2), and y_cbj is the ordinate obtained by substituting x_ltj into formula (3).

That is, every box_j satisfying formula (4) belongs to the same line as the initial reference text region box_i. By contrast, the text region box_k shown in FIG. 9 clearly does not satisfy formula (4), so box_k and box_i do not belong to the same line.
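Formulas (2) to (4) together give a same-line test. The sketch below (function names are assumptions; each box is a dict of corner points) evaluates the two edge lines of the reference region at the candidate's upper-left abscissa and applies formula (4); it assumes a coordinate system in which y increases upward, matching the inequality as written:

```python
def line_y(p, q, x):
    """Ordinate of the straight line through points p and q at abscissa x
    (the two-point form used in formulas (2) and (3))."""
    (x1, y1), (x2, y2) = p, q
    return y1 + (y2 - y1) / (x2 - x1) * (x - x1)

def same_line(ref, box):
    """Formula (4): does `box` overlap the strip bounded by the extended
    upper and lower edges of the reference region `ref`?"""
    x = box['lt'][0]                        # x_ltj of the candidate
    y_ct = line_y(ref['lt'], ref['rt'], x)  # upper-edge line, formula (2)
    y_cb = line_y(ref['lb'], ref['rb'], x)  # lower-edge line, formula (3)
    y_lt, y_lb = box['lt'][1], box['lb'][1]
    return max(y_lb, y_cb) < min(y_lt, y_ct)

ref = {'lt': (0, 10), 'rt': (100, 10), 'lb': (0, 0), 'rb': (100, 0)}
inside = {'lt': (120, 9), 'lb': (120, 1)}
above = {'lt': (120, 30), 'lb': (120, 20)}
print(same_line(ref, inside), same_line(ref, above))  # True False
```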
In step S208, if at least one text region has been determined to belong to the same line as the reference text region, the leftmost of those regions that has not yet been used as a reference text region is selected as the new reference text region, and the above steps S202 to S206 are repeated.
Screening sequentially as described above, all text regions box_j that overlap the region formed by the upper and lower edges of the initial reference text region box_i are found, and from all such box_j the leftmost one can be selected according to formula (1) as the new reference text region box_i'. Note, however, that box_i' must not have been selected as a reference text region before. Based on the new reference text region box_i', the above steps S202 to S206 are repeated, until every box_j screened as belonging to the same line as the initial reference text region box_i has been selected as a new reference text region, which indicates that the text regions of that line have been completely determined.
For the text image as a whole, repeating the above steps S202 to S208 divides all text regions into a plurality of lines, where each text region necessarily satisfies formula (4) together with at least one other text region belonging to the same line.
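Steps S202 to S208 applied over the whole image can be sketched as the following greedy grouping loop (a sketch under assumed names: the `same_line` predicate, e.g. the formula (4) strip test, is passed in, and each box here carries only an upper-left corner 'lt' for selecting the leftmost region):

```python
def split_into_lines(boxes, same_line):
    """Divide all text regions into lines: take the leftmost unassigned box
    as the initial reference (S202), absorb every box that passes the
    same-line test (S204-S206), then re-reference from each newly absorbed
    box left to right (S208) until the line is complete."""
    remaining = set(range(len(boxes)))
    lines = []
    while remaining:
        start = min(remaining, key=lambda i: boxes[i]['lt'][0])
        line, queue = {start}, [start]
        while queue:
            ref = queue.pop(0)
            for j in sorted(remaining - line, key=lambda i: boxes[i]['lt'][0]):
                if same_line(boxes[ref], boxes[j]):
                    line.add(j)
                    queue.append(j)  # becomes a reference region in turn
        lines.append(sorted(line))
        remaining -= line
    return lines

# toy predicate standing in for the formula (4) strip test
near = lambda a, b: abs(a['lt'][1] - b['lt'][1]) < 5
boxes = [{'lt': (0, 100)}, {'lt': (50, 100)}, {'lt': (120, 100)},
         {'lt': (10, 50)}, {'lt': (60, 50)}]
print(split_into_lines(boxes, near))  # [[0, 1, 2], [3, 4]]
```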
FIG. 3 is a flow chart illustrating yet another method for determining the location of a text region in an image according to an exemplary embodiment. The difference from the method 10 shown in fig. 1 and the method 20 shown in fig. 2 is that the method 30 shown in fig. 3 further provides a method for determining the column information of each text region in each line according to the abscissa of the same corner of each text region in the line, i.e., an embodiment of step S1064 in the method 10. Likewise, the method shown in fig. 3 may also be applied, for example, in a scene in which a text image is recognized based on OCR technology.
Referring to fig. 3, step S1064 of the method 10 further includes:
in step S302, the text regions in each line are sorted according to the size of the abscissa of the upper left corner of each text region in each line.
In step S304, column information of each text region in each line is determined based on the sorting result.
For each line determined according to, for example, the method 20, all text regions box_i in the line can be sorted from left to right by the abscissa x_lti of the upper-left corner, and the horizontal position index obtained by the sorting is recorded as the column information of each text region. For the text image as a whole, repeating the above steps S302 to S304 determines the column information of all text regions.
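Steps S302 to S304 for one line reduce to a sort by x_lt followed by index assignment. A minimal sketch (the helper name is an assumption; boxes are abbreviated to upper-left corners):

```python
def assign_columns(line_boxes):
    """Sort the text regions of one line left to right by the upper-left
    abscissa x_lt and record the 1-based position as column information."""
    order = sorted(range(len(line_boxes)), key=lambda i: line_boxes[i][0])
    cols = [0] * len(line_boxes)
    for col, i in enumerate(order, start=1):
        cols[i] = col
    return cols

# one line, boxes abbreviated to (x_lt, y_lt)
print(assign_columns([(120, 9), (10, 10), (60, 11)]))  # [3, 1, 2]
```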
It should be noted that the present invention is not limited to marking the column information with the abscissa of the upper-left corner of each text region; it is only required that the same corner position be used for every text region. That is, in some embodiments, the column information of each text region may be marked with the abscissa of any one of its upper-left, lower-left, upper-right or lower-right corners.
FIG. 4 is a flow chart illustrating yet another method for determining the location of a text region in an image according to an exemplary embodiment. The difference from the method 10 shown in fig. 1, the method 20 shown in fig. 2 and the method 30 shown in fig. 3 is that the method 40 shown in fig. 4 further provides a method for determining the line information of each text region according to the average value of the ordinates of the same corner of the text regions in each line, i.e., an embodiment of step S1066 in the method 10. Likewise, the method shown in fig. 4 may also be applied, for example, in a scene in which a text image is recognized based on OCR technology.
Referring to fig. 4, step S1066 of the method 10 further includes:
in step S402, the average value of the ordinate of the upper left corner of each text region in each line is determined.
In step S404, line information of each text region is determined based on the size of the average value.
For each line determined according to, for example, the method 20, the mean ordinate of the upper-left corners of all m text regions box_i in the line can be calculated as

ȳ_lt = (1/m) · Σ_{i=1}^{m} y_lti

and the divided lines can then be sorted from top to bottom by this mean ordinate. For the text image as a whole, the vertical position index obtained by the sorting is recorded as the line information of each text region in the corresponding line.
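Steps S402 to S404 can be sketched likewise (the helper name is an assumption; each line is a list of upper-left corners, and y is taken to increase downward as in typical image coordinates, so a smaller mean ordinate means a higher line; flip the sort if the axis points up):

```python
def assign_rows(lines):
    """Order the divided lines top to bottom by the mean upper-left
    ordinate of each line and record the 1-based position as row info."""
    means = [sum(y for _, y in line) / len(line) for line in lines]
    order = sorted(range(len(lines)), key=lambda i: means[i])
    rows = [0] * len(lines)
    for row, i in enumerate(order, start=1):
        rows[i] = row
    return rows

# two lines, boxes abbreviated to (x_lt, y_lt); the second line is higher up
print(assign_rows([[(10, 50), (60, 52)], [(0, 9), (50, 11)]]))  # [2, 1]
```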
It should be noted that the present invention likewise does not limit the line information to be marked by the mean ordinate of the upper left corner of each text region; it is only necessary to select the same corner for every text region. That is, in some embodiments, each text region may have its line information marked with the mean ordinate of any one of its upper left, lower left, upper right, or lower right corners.
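A minimal sketch of steps S402 to S404, under the same hypothetical box representation; image coordinates are assumed, where a smaller ordinate means nearer the top (with the opposite convention the sort direction simply reverses):

```python
# Hypothetical sketch of steps S402-S404: for each line, average the
# ordinate of the upper-left corner of its boxes, then sort the lines by
# that mean and mark the rank as the line information of every box in it.

def mark_rows(lines):
    """lines: list of lines, each a list of boxes with box["lt"] == (x, y)."""
    def mean_y(line):
        return sum(box["lt"][1] for box in line) / len(line)

    ordered = sorted(range(len(lines)), key=lambda i: mean_y(lines[i]))
    rows = {}
    for row, line_index in enumerate(ordered):
        for box in lines[line_index]:
            rows[box["id"]] = row
    return rows

lines = [
    [{"id": "total", "lt": (0, 95)}],                                # lower line
    [{"id": "title", "lt": (0, 11)}, {"id": "date", "lt": (60, 9)}]  # upper line
]
print(mark_rows(lines))  # {'title': 0, 'date': 0, 'total': 1}
```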
Fig. 10 is a diagram illustrating line-splitting processing of a plurality of text regions in a text image according to an exemplary embodiment. Without loss of generality, the text is illustrated as tilted, but the invention does not limit the pose of the recognized text regions. The method of the invention obviously applies equally to, for example, perfectly horizontal, flat document images.
Referring to fig. 10, the text region "a certain hotel menu" located at the leftmost end of the text image may be determined as the initial reference text region according to step S202 in the method 20, and then the text region "2019-04-05", which is adjacent to it and belongs to the same line, may be found according to steps S204 to S208 in the method 20. By analogy, steps S202 to S208 are repeatedly executed, resulting in four lines of text regions in total. Then, the column and row information of each text region is determined according to the method 30 and the method 40; the marking result is shown in Table 1 below:
TABLE 1
(Table 1, presented as an image in the original publication, lists the row and column information marked for each text region of fig. 10.)
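The line-grouping walkthrough above (steps S202 to S208 of method 20) can be approximated with the following sketch. It simplifies the patent's region-overlap test to a center-in-band test against the extended top and bottom edges of the reference region and, for brevity, keeps a single reference per line instead of re-selecting a new reference among the newly found regions; all names are illustrative and image coordinates (y growing downward) are assumed:

```python
# Simplified, hypothetical sketch of line grouping: starting from the
# leftmost unassigned box, extend its top and bottom edges to the right and
# collect boxes whose centers fall inside the resulting band.

def y_on_edge(p1, p2, x):
    """Ordinate of the straight line through corners p1, p2 at abscissa x."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:                      # degenerate edge: treat as horizontal
        return y1
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

def group_lines(boxes):
    """boxes: dicts with corners "lt", "rt", "lb", "rb" as (x, y) tuples."""
    remaining = sorted(boxes, key=lambda b: b["lt"][0])  # left to right
    lines = []
    while remaining:
        ref = remaining.pop(0)        # leftmost unassigned box is the reference
        line = [ref]
        rest = []
        for box in remaining:
            cx = (box["lt"][0] + box["rt"][0]) / 2
            cy = (box["lt"][1] + box["lb"][1]) / 2
            top = y_on_edge(ref["lt"], ref["rt"], cx)     # extended top edge
            bottom = y_on_edge(ref["lb"], ref["rb"], cx)  # extended bottom edge
            (line if top <= cy <= bottom else rest).append(box)
        remaining = rest
        lines.append(line)
    return lines
```

Because the band is bounded by the line equations of the reference region's edges rather than a fixed y-range, tilted text such as that in fig. 10 is followed along its tilt.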
FIG. 5 is a flow chart illustrating a method for determining the location of a text region in yet another image according to an exemplary embodiment. The difference from the above methods is that the method 50 shown in fig. 5 further provides a method of classifying all the text regions in an image before they are divided into lines, i.e., an embodiment applicable to any of the above methods. Likewise, the method for determining the position of a text region in an image as shown in fig. 5 can also be applied, for example, in a scene in which a text image is recognized based on OCR technology.
Referring to fig. 5, prior to step S106 in method 10, method 10 further includes:
In step S502, based on the coordinate information of the four corners of each text region, the plurality of text regions are classified so that text regions of different categories do not overlap in the vertical direction.
Correspondingly, step S106 in the method 10 is: the position determination operation is performed on the text areas in the respective categories, that is, the steps S1062 to S1066 are performed on the text areas in the respective categories, respectively.
In some embodiments, classifying the plurality of text regions according to the coordinate information of the four corners of each text region may include: and sequentially screening and classifying two text regions which meet the classification condition.
Wherein the classification condition is that the larger of the ordinate of the lower left corner of the two text regions is smaller than the smaller of the ordinate of the upper left corner of the two text regions.
In light of the above, the classification condition can be expressed as the following formula (5):

max(y_lbi, y_lbj) < min(y_lti, y_ltj)    (5)
and (3) according to the vertical coordinates of the upper left corner and the lower left corner of all the text regions, comparing every two text regions in the text image in a traversal mode, and sequentially screening out every two text regions which meet the formula (5).
Based on the classification result, the methods 20, 30 and 40 can then perform line division and row/column information determination on the text regions within each category in turn, without dividing or marking over the text image as a whole.
Note that "so that the text regions of different categories do not overlap in the vertical direction" merely means that each text region in any category does not overlap in the vertical direction with any text region in the other categories; it does not mean that text regions of the same category must overlap in the vertical direction. In other words, two text regions classified into different categories must not satisfy the above expression (5), while two text regions classified into the same category may or may not satisfy the above expression (5).
In this regard, reference is made to fig. 11: the text region A and the text region B in category 1 obviously satisfy the above expression (5), and the text region B and the text region C obviously also satisfy it, but the text region A and the text region C obviously do not. However, since the classification process compares every two text regions in the text image by traversal: when comparing the text region A with the other text regions, it is determined that the text region B belongs to the same category as the text region A; when comparing the text region B with the other text regions, it is determined that the text regions A and C both belong to the same category as the text region B.
Referring again to FIG. 10: in fact, all the text regions in the text image can first be divided into an upper and a lower category through step S502 (as shown by the two sides of the thick dashed line in fig. 10). Then, for example through steps S202 to S208 in the method 20, the text regions in the two categories are respectively divided into lines, so that it can be determined that the two text regions "a certain hotel menu" and "2019-04-05" in the category above the thick dotted line belong to the same line, and the nine text regions in the category below the thick dotted line belong to three lines; that is, all the text regions in the entire text image are divided into four lines. Therefore, whether all the text regions are divided into lines directly, or are classified in advance and then divided into lines within each category, the two schemes determine the positions of the text regions in the image with completely consistent results.
According to some embodiments, the method for determining the position of a text region in an image provided by the invention can perform a preliminary classification operation before performing the line division operation on all the text regions, so as to reduce the amount of computation during the line division operation.
It should be noted that, although the above methods are described by taking row and column positioning starting from the leftmost text region as an example, it should be understood by those skilled in the art that, according to the inventive concept and the content disclosed above, the above methods are equally applicable to row and column positioning starting from the rightmost text region.
Those skilled in the art will appreciate that all or part of the steps implementing the above embodiments are implemented as computer programs executed by a CPU. The computer program, when executed by the CPU, performs the functions defined by the method provided by the present invention. The program may be stored in a computer readable storage medium, which may be a read-only memory, a magnetic or optical disk, or the like.
Furthermore, it should be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the method according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details which are not disclosed in the embodiments of the apparatus of the present invention, reference is made to the embodiments of the method of the present invention.
FIG. 6 is a block diagram illustrating an apparatus for determining a location of a text region in an image according to an example embodiment. The apparatus for determining the location of a text region in an image as shown in fig. 6 may be applied, for example, in a scene in which a text image is recognized based on OCR technology.
Referring to fig. 6, the apparatus 60 for determining the position of a text region in an image includes: an image acquisition module 602, a text positioning module 604, a text line-splitting module 608, a first determination module 610, and a second determination module 612.
The image obtaining module 602 is configured to obtain an image to be identified.
Wherein the image to be recognized contains text.
The text positioning module 604 is configured to perform text positioning on the image to be recognized, and obtain a plurality of text regions and coordinate information of four corners of each text region.
In some embodiments, the text positioning module 604 may further include a detection positioning unit for obtaining coordinate information of a plurality of text regions and four corner positions of each text region based on the trained deep learning text detection and positioning model.
The text line dividing module 608 is configured to determine a text region that belongs to the same line as the reference text region according to a region formed by extension lines of the upper and lower sides of the selected reference text region.
In some embodiments, the text-line-splitting module 608 may further include: the device comprises a first determining unit, a second determining unit, a third determining unit and a repeated executing unit.
The first determining unit is used for determining the reference text area as the text area at the leftmost end according to the abscissa of the upper left corner of each text area.
The second determination unit is configured to determine a region formed by the upper and lower edges of the reference text region based on a straight line equation determined by the upper and lower edges of the reference text region.
The third determining unit is configured to determine a text region having an overlap with the region as a text region belonging to the same line as the reference text region.
The repeated execution unit is configured to, when at least one text region belonging to the same line as the reference text region has been determined, select as the new reference text region the leftmost one of the determined text regions that has not yet served as a reference text region, and to instruct the first, second and third determining units to repeat their respective functions.
The first determining module 610 is configured to determine column information of each text region in each row according to an abscissa of an angle of the same orientation of each text region in each row.
In some embodiments, the first determining module 610 may further include: a horizontal sorting unit and a fourth determining unit.
The horizontal sorting unit is used for sorting the text regions in each line according to the size of the horizontal coordinate of the upper left corner of each text region in each line.
The fourth determining unit is configured to determine column information of each text region in each row, respectively, according to the sorting result.
The second determining module 612 is configured to determine line information of each text region according to an average value of vertical coordinates of angles of the same orientation of each text region in each line.
In some embodiments, the second determining module 612 may further include: an average value calculating unit and a fifth determining unit.
The average value calculating unit is used for determining the average value of the vertical coordinates of the upper left corner of each text area in each line.
The fifth determining unit is configured to determine line information of each text region respectively according to a size of the average value.
In some embodiments, the apparatus 60 for determining the position of the text region in the image may further include a classification processing module 606, configured to perform classification processing on a plurality of text regions according to coordinate information of four corners of each text region before the text segmentation module 608 determines the text regions belonging to the same line as the reference text region according to the region formed by extension lines of the upper and lower sides of the selected reference text region, so that the text regions of different classes do not overlap in the vertical direction.
In some embodiments, the classification processing module 606 may further include a traversal filtering unit for sequentially filtering and classifying two text regions that satisfy the classification condition.
Wherein the classification condition may be, for example, that the larger of the ordinate of the lower left corner of the two text regions is smaller than the smaller of the ordinate of the upper left corner of the two text regions.
According to the apparatus for determining the position of a text region in an image provided by the embodiments of the present invention, by using the coordinate information of the text regions, all the text regions in the image can be adaptively divided into lines, and the row and column information of the text regions can be determined quickly and effectively. This overcomes difficulties such as text images having no fixed, uniform standard, differing typesetting formats, interference from surrounding extraneous characters, and oblique perspective, and provides basic text information for the subsequent efficient, high-precision formatted output of OCR technology.
In addition, according to some embodiments, the present invention provides a device for determining a position of a text region in an image, which is capable of performing a preliminary classification operation first before performing a line segmentation operation on all text regions, so as to reduce an amount of computation in the line segmentation operation.
It is noted that the block diagrams shown in the above figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Fig. 7 is a schematic structural diagram of an electronic device according to an example embodiment. It should be noted that the electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 7, the electronic device 700 is embodied in the form of a general-purpose computer device. The components of the electronic device 700 include: at least one Central Processing Unit (CPU)701, which may perform various appropriate actions and processes according to program code stored in a Read Only Memory (ROM)702 or loaded from at least one storage unit 708 into a Random Access Memory (RAM) 703.
In particular, according to an embodiment of the present invention, the program code may be executed by the central processing unit 701, such that the central processing unit 701 performs the steps according to various exemplary embodiments of the present invention described in the above-mentioned method embodiment section of the present specification. For example, the central processing unit 701 may perform the steps as shown in fig. 1 to 5.
In the RAM 703, various programs and data necessary for the operation of the electronic apparatus 700 are also stored. The CPU 701, the ROM702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input unit 706 including a keyboard, a mouse, and the like; an output unit 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage unit 708 including a hard disk and the like; and a communication unit 709 including a network interface card such as a LAN card, a modem, or the like. The communication unit 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 710 as necessary, so that a computer program read out therefrom is mounted into the storage unit 708 as necessary.
FIG. 8 is a schematic diagram illustrating a computer-readable storage medium in accordance with an example embodiment.
Referring to fig. 8, a program product 800 configured to implement the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read-only memory (CD-ROM) including program code, and may be run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable medium carries one or more programs which, when executed by a device, cause the device to carry out the functions shown in figures 1 to 5.
Exemplary embodiments of the present invention are specifically illustrated and described above. It is to be understood that the invention is not limited to the precise construction, arrangements, or instrumentalities described herein; on the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A method for determining a location of a text region in an image, comprising:
acquiring an image to be recognized, wherein the image to be recognized comprises a text;
performing text positioning on the image to be recognized to obtain a plurality of text areas and coordinate information of four corners of each text area; and
performing a position determination operation on each text region, respectively, including:
determining a text area which belongs to the same line as the reference text area according to an area formed by extension lines of the upper and lower edges of the selected reference text area;
respectively determining column information of each text region in each row according to the abscissa of the angle of the same azimuth of each text region in each row; and
and respectively determining the line information of each text area according to the average value of the vertical coordinates of the same azimuth angle of each text area in each line.
2. The method according to claim 1, wherein determining the text region belonging to the same line as the reference text region based on the region formed by the extensions of the upper and lower sides of the selected reference text region comprises:
step a), determining the reference text region as the text region at the leftmost end according to the abscissa of the upper left corner of each text region;
step b) determining a region formed by the upper edge and the lower edge of the reference text region according to a linear equation determined by the upper edge and the lower edge of the reference text region;
step c) determining a text region having an overlap with the region as a text region belonging to the same line as the reference text region;
step d) if at least one determined text region which belongs to the same line as the reference text region exists, selecting the text region at the leftmost end which is not determined as the reference text region in the at least one determined text region as a new reference text region, and repeatedly executing the steps a) to c).
3. The method of claim 1, wherein determining column information for each text region in each line based on the abscissa of the angle of the same orientation of each text region in each line comprises:
respectively sequencing the text regions in each line according to the size of the abscissa of the upper left corner of each text region in each line; and
and respectively determining column information of each text area in each row according to the sorting result.
4. The method of claim 1, wherein determining the line information for each text region separately based on an average of the ordinate of the angle of the same orientation of each text region in each line comprises:
determining an average value of vertical coordinates of the upper left corner of each text area in each line; and
and respectively determining the line information of each text area according to the size of the average value.
5. The method of any of claims 1-4, wherein prior to performing the location determination operation separately on each text region, the method further comprises: classifying the plurality of text regions according to the coordinate information of the four corners of each text region, so that the different types of text regions are not overlapped in the vertical direction; performing the position determination operation on each text region includes: the position determination operation is performed on the text regions in each category, respectively.
6. The method according to claim 5, wherein classifying the plurality of text regions according to the coordinate information of the four corners of each text region comprises: sequentially screening two text regions meeting the classification conditions and classifying the two text regions into one type; wherein the classification condition is that a larger one of the ordinate of the lower left corner of the two text regions is smaller than a smaller one of the ordinate of the upper left corner of the two text regions.
7. The method according to any one of claims 1 to 4, wherein performing text localization on the image to be recognized to obtain a plurality of text regions and coordinate information of four corners of each text region comprises: and obtaining the coordinate information of the plurality of text regions and the four corner positions of each text region based on the trained deep learning text detection and positioning model.
8. An apparatus for determining a location of a text region in an image, comprising:
the image acquisition module is used for acquiring an image to be identified, and the image to be identified comprises a text;
the text positioning module is used for carrying out text positioning on the image to be recognized to obtain a plurality of text areas and coordinate information of four corners of each text area;
the text line dividing module is used for determining the text regions which belong to the same line as the reference text region according to the region formed by the extension lines of the upper and lower edges of the selected reference text region;
the first determining module is used for respectively determining the column information of each text area in each row according to the abscissa of the angle of the same azimuth of each text area in each row; and
and the second determining module is used for respectively determining the line information of each text area according to the average value of the vertical coordinates of the same azimuth angle of each text area in each line.
9. An electronic device, comprising: memory, processor and executable instructions stored in the memory and executable in the processor, characterized in that the processor implements the method according to any of claims 1-7 when executing the executable instructions.
10. A computer-readable storage medium having stored thereon computer-executable instructions, which when executed by a processor, implement the method of any one of claims 1-7.
CN201911065589.2A 2019-11-04 2019-11-04 Method, device and equipment for determining position of text area in image and storage medium Pending CN110852229A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911065589.2A CN110852229A (en) 2019-11-04 2019-11-04 Method, device and equipment for determining position of text area in image and storage medium


Publications (1)

Publication Number Publication Date
CN110852229A true CN110852229A (en) 2020-02-28

Family

ID=69598298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911065589.2A Pending CN110852229A (en) 2019-11-04 2019-11-04 Method, device and equipment for determining position of text area in image and storage medium

Country Status (1)

Country Link
CN (1) CN110852229A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011274A (en) * 2021-02-24 2021-06-22 南京三百云信息科技有限公司 Image recognition method and device, electronic equipment and storage medium

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030206201A1 (en) * 2002-05-03 2003-11-06 Ly Eric Thichvi Method for graphical classification of unstructured data
CN1567356A (en) * 2003-06-18 2005-01-19 摩托罗拉公司 Method for identification of text line
CN102063619A (en) * 2010-11-30 2011-05-18 汉王科技股份有限公司 Character row extraction method and device
US20120102388A1 (en) * 2010-10-26 2012-04-26 Jian Fan Text segmentation of a document
CN105225218A (en) * 2014-06-24 2016-01-06 佳能株式会社 For distortion correction method and the equipment of file and picture
CN105450900A (en) * 2014-06-24 2016-03-30 佳能株式会社 Distortion correction method and equipment for document image
CN108304761A (en) * 2017-09-25 2018-07-20 腾讯科技(深圳)有限公司 Method for text detection, device, storage medium and computer equipment
CN108805131A (en) * 2018-05-22 2018-11-13 北京旷视科技有限公司 Text line detection method, apparatus and system
CN109657629A (en) * 2018-12-24 2019-04-19 科大讯飞股份有限公司 A kind of line of text extracting method and device
CN109670500A (en) * 2018-11-30 2019-04-23 平安科技(深圳)有限公司 A kind of character area acquisition methods, device, storage medium and terminal device
US10296578B1 (en) * 2018-02-20 2019-05-21 Paycor, Inc. Intelligent extraction and organization of data from unstructured documents
CN109871743A (en) * 2018-12-29 2019-06-11 口碑(上海)信息技术有限公司 The localization method and device of text data, storage medium, terminal
CN109977762A (en) * 2019-02-01 2019-07-05 汉王科技股份有限公司 A kind of text positioning method and device, text recognition method and device
CN110032938A (en) * 2019-03-12 2019-07-19 北京汉王数字科技有限公司 A kind of Tibetan language recognition method, device and electronic equipment


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011274A (en) * 2021-02-24 2021-06-22 南京三百云信息科技有限公司 Image recognition method and device, electronic equipment and storage medium
CN113011274B (en) * 2021-02-24 2024-04-09 南京三百云信息科技有限公司 Image recognition method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN107798299B (en) Bill information identification method, electronic device and readable storage medium
WO2018233055A1 (en) Method and apparatus for entering policy information, computer device and storage medium
US10878173B2 (en) Object recognition and tagging based on fusion deep learning models
US8515208B2 (en) Method for document to template alignment
US8693790B2 (en) Form template definition method and form template definition apparatus
WO2021017272A1 (en) Pathology image annotation method and device, computer apparatus, and storage medium
Li et al. Automatic comic page segmentation based on polygon detection
CN114862845B (en) Defect detection method, device and equipment for mobile phone touch screen and storage medium
CN112541922A (en) Test paper layout segmentation method based on digital image, electronic equipment and storage medium
WO2021023111A1 (en) Methods and devices for recognizing number of receipts and regions of a plurality of receipts in image
CN115424111A (en) Intelligent identification method, device, equipment and medium of antigen detection kit
CN109635729B (en) Form identification method and terminal
CN115082935A (en) Method, apparatus and storage medium for correcting document image
CN114494751A (en) License information identification method, device, equipment and medium
CN114581928A (en) Form identification method and system
CN113469302A (en) Multi-circular target identification method and system for video image
CN110852229A (en) Method, device and equipment for determining position of text area in image and storage medium
CN117593420A (en) Plane drawing labeling method, device, medium and equipment based on image processing
CN106056575B (en) A kind of image matching method based on like physical property proposed algorithm
Heitzler et al. A modular process to improve the georeferencing of the Siegfried map
CN113936187A (en) Text image synthesis method and device, storage medium and electronic equipment
CN112287763A (en) Image processing method, apparatus, device and medium
CN104112135B (en) Text image extraction element and method
WO2021098861A1 (en) Text recognition method, apparatus, recognition device, and storage medium
CN116541549B (en) Subgraph segmentation method, subgraph segmentation device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20200228