CN110852229A - Method, device and equipment for determining position of text area in image and storage medium - Google Patents


Info

Publication number: CN110852229A
Application number: CN201911065589.2A
Authority: CN (China)
Prior art keywords: text, region, text region, determining, image
Legal status: Pending (assumed; not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 王亚领, 刘设伟, 马文伟
Assignee: Taikang Insurance Group Co Ltd; Taikang Online Property Insurance Co Ltd
Application filed by Taikang Insurance Group Co Ltd and Taikang Online Property Insurance Co Ltd; priority to CN201911065589.2A

Classifications

    • G PHYSICS → G06 COMPUTING; CALCULATING OR COUNTING → G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING → G06V30/00 Character recognition; recognising digital ink; document-oriented image-based pattern recognition → G06V30/40 Document-oriented image-based pattern recognition → G06V30/41 Analysis of document content → G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS → G06 COMPUTING; CALCULATING OR COUNTING → G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING → G06V20/00 Scenes; scene-specific elements → G06V20/60 Type of objects → G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS → G06 COMPUTING; CALCULATING OR COUNTING → G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING → G06V30/00 Character recognition; recognising digital ink; document-oriented image-based pattern recognition → G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Character Input (AREA)

Abstract

The invention discloses a method, an apparatus, a device and a storage medium for determining the positions of text regions in an image. The method for determining the position of a text region in an image comprises the following steps: acquiring an image to be recognized, the image containing text; performing text positioning on the image to obtain a plurality of text regions and the coordinate information of the four corners of each text region; and performing a position determination operation on each text region, respectively, including: determining the text regions that belong to the same line as a selected reference text region according to the region formed by the extension lines of the upper and lower edges of the reference text region; determining the column information of each text region in each line according to the abscissa of the same corner (for example, the upper-left corner) of each text region in the line; and determining the line information of each text region according to the average value of the ordinates of the same corner of the text regions in each line.

Description

Method, device and equipment for determining position of text area in image and storage medium
Technical Field
The invention relates to the field of text image recognition, in particular to a method, a device, equipment and a storage medium for determining the position of a text area in an image.
Background
Text content recognition is the key step by which OCR (Optical Character Recognition) data structuring outputs the characters in an image in a final text format, and determining the row and column positions of the text content is the basis of text content recognition. Accurately and efficiently determining these row and column positions is therefore a necessary condition for OCR technology to output accurate results. A method that accurately calculates the row and column positions of text not only helps OCR technology output the text content to be recognized more accurately, but also, when applied to parsing the various documents and cards of an insurance business scenario, greatly reduces the workload of manual entry and saves substantial manpower, material and financial resources, thereby lowering cost and optimizing resource allocation.
For text images with a fixed, uniform layout and identical typesetting, the existing method of determining the row and column positions of text content is to match a fixed template and determine each item of text content to be recognized from fixed coordinates.
However, for text images without a fixed, uniform layout, with varying typesetting formats, or with interfering characters around the image, applying OCR technology faces great difficulty. To complicate matters further, photographs of text taken in natural scenes inevitably exhibit some degree of oblique perspective. For example, when a bill is photographed, the bill may be rotated or its paper surface uneven, and it is difficult to guarantee a perfectly level and flat image even after correction. For determining the row and column positions of text content in these situations, no specific and effective solution currently exists.
The above information disclosed in this background section is only for enhancement of understanding of the background of the invention and therefore it may contain information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for determining a position of a text region in an image, an electronic device, and a computer-readable storage medium.
Additional features and advantages of the invention will be set forth in the detailed description which follows, or may be learned by practice of the invention.
According to an aspect of the present invention, there is provided a method for determining the position of a text region in an image, including: acquiring an image to be recognized, the image containing text; performing text positioning on the image to obtain a plurality of text regions and the coordinate information of the four corners of each text region; and performing a position determination operation on each text region, respectively, including: determining the text regions that belong to the same line as a selected reference text region according to the region formed by the extension lines of the upper and lower edges of the reference text region; determining the column information of each text region in each line according to the abscissa of the same corner (for example, the upper-left corner) of each text region in the line; and determining the line information of each text region according to the average value of the ordinates of the same corner of the text regions in each line.
According to an embodiment of the present invention, determining the text regions that belong to the same line as the reference text region according to the region formed by the extension lines of the upper and lower edges of the selected reference text region includes: step a) selecting, as the reference text region, the leftmost text region according to the abscissa of the upper-left corner of each text region; step b) determining the region formed by the upper and lower edges of the reference text region according to the straight-line equations determined by those edges; step c) determining each text region that overlaps this region as a text region belonging to the same line as the reference text region; step d) if at least one text region has been determined to belong to the same line as the reference text region, selecting the leftmost such region that has not yet served as a reference text region as the new reference text region, and repeating steps a) to c).
According to an embodiment of the present invention, determining the column information of each text region in each line according to the abscissa of the same corner of each text region in the line includes: sorting the text regions in each line by the abscissa of the upper-left corner of each text region; and determining the column information of each text region in each line according to the sorting result.
According to an embodiment of the present invention, determining the line information of each text region according to the average value of the ordinates of the same corner of the text regions in each line includes: determining the average value of the ordinates of the upper-left corners of the text regions in each line; and determining the line information of each text region according to the size of that average value.
According to an embodiment of the present invention, before the position determination operation is performed on each text region, the method further includes: classifying the plurality of text regions according to the coordinate information of the four corners of each text region, so that text regions of different categories do not overlap in the vertical direction; performing the position determination operation on each text region then includes performing the position determination operation on the text regions of each category, respectively.
According to an embodiment of the present invention, classifying the plurality of text regions according to the coordinate information of the four corners of each text region includes: sequentially screening pairs of text regions that satisfy the classification condition and placing them in one category; wherein the classification condition is that the larger of the two lower-left-corner ordinates is smaller than the smaller of the two upper-left-corner ordinates.
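As a sketch only (the helper name is hypothetical, not from the patent), this pairwise classification condition can be written as a vertical-overlap predicate; the corner pairs and the upward-pointing y axis are assumptions taken from the wording of the condition:

```python
def vertically_overlap(a, b):
    """Classification condition: the larger of the two lower-left-corner
    ordinates is smaller than the smaller of the two upper-left-corner
    ordinates, i.e. the vertical extents [y_lb, y_lt] intersect.
    Each argument is a (y_lb, y_lt) pair; y is assumed to increase upward."""
    (y_lb_a, y_lt_a), (y_lb_b, y_lt_b) = a, b
    return max(y_lb_a, y_lb_b) < min(y_lt_a, y_lt_b)

print(vertically_overlap((0, 10), (5, 15)))   # extents [0, 10] and [5, 15] intersect
print(vertically_overlap((0, 10), (12, 20)))  # disjoint extents
```

Regions that pass this test pairwise would be chained into one category, so that distinct categories share no vertical overlap.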
According to an embodiment of the present invention, performing text localization on the image to be recognized to obtain a plurality of text regions and coordinate information of four corners of each text region includes: and obtaining the coordinate information of the plurality of text regions and the four corner positions of each text region based on the trained deep learning text detection and positioning model.
According to another aspect of the present invention, there is provided an apparatus for determining a position of a text region in an image, comprising: the image acquisition module is used for acquiring an image to be identified, and the image to be identified comprises a text; the text positioning module is used for carrying out text positioning on the image to be recognized to obtain a plurality of text areas and coordinate information of four corners of each text area; the text line dividing module is used for determining the text regions which belong to the same line as the reference text region according to the region formed by the extension lines of the upper and lower edges of the selected reference text region; the first determining module is used for respectively determining the column information of each text area in each row according to the abscissa of the angle of the same azimuth of each text area in each row; and the second determining module is used for respectively determining the line information of each text area according to the average value of the vertical coordinates of the same azimuth angle of each text area in each line.
According to still another aspect of the present invention, there is provided an electronic apparatus including: the device comprises a memory, a processor and executable instructions stored in the memory and executable in the processor, wherein the processor executes the executable instructions to realize the method for determining the position of the text area in the image.
According to a further aspect of the present invention, there is provided a computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement a method of determining the position of a text region in an image as in any one of the above.
According to the method for determining the position of a text region in an image provided by the invention, the coordinate information of the text regions is used to adaptively divide all text regions in the image into lines and to quickly and effectively determine their row and column information. The method overcomes the difficulties posed by text images that lack a fixed, uniform layout, differ in typesetting format, contain interfering characters around the text, or exhibit oblique perspective, and it provides basic text information for the efficient, high-precision formatted output of subsequent OCR processing.
In addition, according to some embodiments, the method for determining the position of the text region in the image provided by the present invention can perform a preliminary classification operation first before performing the line segmentation operation on all the text regions, so as to reduce the amount of calculation in the line segmentation operation.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
FIG. 1 is a flow diagram illustrating a method for determining a location of a text region in an image according to an exemplary embodiment.
FIG. 2 is a flow diagram illustrating another method for determining a location of a text region in an image according to an example embodiment.
FIG. 3 is a flow chart illustrating yet another method for determining a location of a text region in an image according to an exemplary embodiment.
FIG. 4 is a flow chart illustrating yet another method for determining a location of a text region in an image according to an exemplary embodiment.
FIG. 5 is a flow chart illustrating yet another method for determining a location of a text region in an image according to an exemplary embodiment.
FIG. 6 is a block diagram illustrating an apparatus for determining a location of a text region in an image according to an example embodiment.
Fig. 7 is a schematic structural diagram of an electronic device according to an example embodiment.
FIG. 8 is a schematic diagram illustrating a computer-readable storage medium in accordance with an example embodiment.
FIG. 9 is a diagram illustrating a determination of whether two text regions belong to the same line, according to an example embodiment.
Fig. 10 is a diagram illustrating line splitting processing of a plurality of text regions in a text image according to an exemplary embodiment.
FIG. 11 is a diagram illustrating a process of categorizing a plurality of text regions, according to an example embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The drawings are merely schematic illustrations of the invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, apparatus, steps, and so forth. In other instances, well-known structures, methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
Further, in the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise. The symbol "/" generally indicates that the associated objects before and after it are in an "or" relationship. The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.
As described above, for text images without a fixed, uniform layout, with varying typesetting formats, or with interfering characters around the image, applying OCR technology faces great difficulty. To complicate matters further, photographs of text taken in natural scenes inevitably exhibit some degree of oblique perspective. For example, when a bill is photographed, the bill may be rotated or its paper surface uneven, and it is difficult to guarantee a perfectly level and flat image even after correction. At the present stage, no effective solution is available for determining the row and column positions of text content with high accuracy in these situations.
Therefore, the invention provides a method for determining the position of a text region in an image. Using the coordinate information of the text regions, the method can adaptively divide all text regions in the image into lines and quickly and effectively determine their row and column information. It overcomes the difficulties posed by text images that lack a fixed, uniform layout, differ in typesetting format, contain interfering characters around the text, or exhibit oblique perspective, and provides basic text information for the efficient, high-precision formatted output of subsequent OCR technology. Preferably, the method can perform a preliminary classification operation before performing the line splitting operation on all text regions, so as to reduce the amount of computation during line splitting.
The following describes a method for determining the position of a text region in an image according to embodiments of the present invention.
FIG. 1 is a flow diagram illustrating a method for determining a location of a text region in an image according to an exemplary embodiment. The method for determining the position of a text region in an image as shown in fig. 1 can be applied, for example, in a scene in which a text image is recognized based on OCR technology.
Referring to fig. 1, a method 10 for determining a location of a text region in an image includes:
in step S102, an image to be recognized is acquired.
Wherein the image to be recognized contains text.
In step S104, text positioning is performed on the image to be recognized, and a plurality of text regions and coordinate information of four corners of each text region are obtained.
In some embodiments, performing text localization on an image to be recognized, and obtaining a plurality of text regions and coordinate information of four corners of each text region may include: and obtaining a plurality of text regions and coordinate information of four corner positions of each text region based on the trained deep learning text detection and positioning model. It should be noted that the present invention does not limit the training method, detection/positioning algorithm, etc. adopted by the model, and those skilled in the art will understand that any deep learning model that can be used for detecting and positioning text regions in an image can be adopted in the step to identify and position each text region in an image to be identified.
By detecting text in the text image with a trained deep-learning text detection and positioning model, n text regions box_i (i = 1, 2, …, n), each an arbitrary closed quadrilateral, can be located. The horizontal and vertical coordinates of the four corners of each text region, namely the "upper left corner", "upper right corner", "lower left corner" and "lower right corner" of each quadrilateral, are then obtained as (x_lti, y_lti), (x_rti, y_rti), (x_lbi, y_lbi) and (x_rbi, y_rbi), respectively.
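As an illustration only (the class and field names are hypothetical, not from the patent), a detector's output for one text region box_i can be held in a small structure carrying the eight corner coordinates:

```python
from dataclasses import dataclass

@dataclass
class TextBox:
    """One detected quadrilateral text region box_i: corner coordinates
    (lt = upper left, rt = upper right, lb = lower left, rb = lower right)."""
    x_lt: float; y_lt: float
    x_rt: float; y_rt: float
    x_lb: float; y_lb: float
    x_rb: float; y_rb: float

# two slightly tilted regions, as a text detector might return them
boxes = [
    TextBox(10, 5, 80, 7, 11, 20, 81, 22),
    TextBox(100, 6, 150, 8, 101, 21, 151, 23),
]
```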
In step S106, the position determination operation is respectively performed on each text region, and specifically includes:
in step S1062, a text area belonging to the same line as the reference text area is determined based on an area formed by extension lines of the upper and lower sides of the selected reference text area.
In step S1064, the column information of each text region in each line is determined according to the abscissa of the same corner (for example, the upper-left corner) of each text region in the line.
In step S1066, the line information of each text region is determined according to the average value of the ordinates of the same corner of the text regions in each line.
According to the method for determining the position of a text region in an image provided by the embodiment of the invention, the coordinate information of the text regions is used to adaptively divide all text regions in the image into lines and to quickly and effectively determine their row and column information. The method overcomes the difficulties posed by text images that lack a fixed, uniform layout, differ in typesetting format, contain interfering characters around the text, or exhibit oblique perspective, and provides basic text information for the efficient, high-precision formatted output of subsequent OCR technology.
It should be clearly understood that the present disclosure describes how to make and use particular examples, but the principles of the present disclosure are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of the present disclosure.
FIG. 2 is a flow diagram illustrating another method for determining a location of a text region in an image according to an example embodiment. The difference from the method 10 shown in fig. 1 is that the method 20 shown in fig. 2 further provides a method of performing a line splitting operation on all text regions in an image, i.e., an embodiment of step S1062 in the method 10. Likewise, the method for determining the position of a text region in an image as shown in fig. 2 can also be applied, for example, in a scene in which a text image is recognized based on OCR technology.
Referring to fig. 2, step S1062 of the method 10 includes:
in step S202, the reference text region is determined as the leftmost text region based on the abscissa of the upper left corner of each text region.
Taking the n text regions box_i (i = 1, 2, …, n) above as an example, the initial reference text region can be determined according to the following formula (1):

min(x_lti, 1 ≤ i ≤ n) --- (1)

which screens out the leftmost text region box_i of the text image; the line splitting operation then starts with this box_i as the initial reference text region. Alternatively, the abscissa x_lbi of the "lower left corner" of each text region may be used to determine the initial reference text region.
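Formula (1) amounts to an argmin over the upper-left abscissas. A minimal sketch (the helper name is an assumption; boxes are abbreviated to their upper-left corner (x_lt, y_lt) for brevity):

```python
def leftmost_index(boxes):
    """Index of the box with the smallest upper-left abscissa x_lt,
    i.e. the initial reference text region selected by formula (1)."""
    return min(range(len(boxes)), key=lambda i: boxes[i][0])

# each box abbreviated to its upper-left corner (x_lt, y_lt)
boxes = [(50, 12), (10, 11), (120, 13)]
print(leftmost_index(boxes))  # index of the box at x_lt = 10
```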
In step S204, a region composed of the upper and lower edges of the reference text region is determined based on the straight line equation determined by the upper and lower edges of the reference text region.
As mentioned above, the straight line on the upper edge of box_i passes through the corners (x_lti, y_lti) and (x_rti, y_rti), giving formula (2):

y = y_lti + ((y_rti − y_lti) / (x_rti − x_lti)) · (x − x_lti) --- (2)

and the straight line on the lower edge of box_i passes through (x_lbi, y_lbi) and (x_rbi, y_rbi), giving formula (3):

y = y_lbi + ((y_rbi − y_lbi) / (x_rbi − x_lbi)) · (x − x_lbi) --- (3)
That is, the region formed by the upper and lower edges of the initial reference text region box_i is the unbounded strip sandwiched between the two straight lines defined by formulas (2) and (3). When the upper and lower edges of box_i are parallel, the strip extends infinitely in both directions; when they are not parallel, the two straight lines intersect and the strip extends infinitely in one direction only.
In step S206, a text region overlapping with the region is determined as a text region belonging to the same line as the reference text region.
As mentioned above, the following formula (4) can be used:

max(y_lbj, y_cbj) < min(y_ltj, y_ctj) --- (4)

to preliminarily screen, from all text regions, the text regions box_j that overlap the region formed by the upper and lower edges of the initial reference text region box_i. Here, y_ctj is the ordinate obtained by substituting the upper-left abscissa x_ltj of box_j into formula (2), and y_cbj is the ordinate obtained by substituting x_ltj into formula (3).

That is, every box_j satisfying formula (4) belongs to the same line as the initial reference text region box_i. By contrast, the text region box_k shown in FIG. 9 clearly does not satisfy formula (4), so box_k and box_i do not belong to the same line.
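Formulas (2) to (4) together give a same-line test. The sketch below (function names are assumptions; each box is a dict of corner points) evaluates the two edge lines of the reference region at the candidate's upper-left abscissa and applies formula (4); it assumes a coordinate system in which y increases upward, matching the inequality as written:

```python
def line_y(p, q, x):
    """Ordinate of the straight line through points p and q at abscissa x
    (the two-point form used in formulas (2) and (3))."""
    (x1, y1), (x2, y2) = p, q
    return y1 + (y2 - y1) / (x2 - x1) * (x - x1)

def same_line(ref, box):
    """Formula (4): does `box` overlap the strip bounded by the extended
    upper and lower edges of the reference region `ref`?"""
    x = box['lt'][0]                        # x_ltj of the candidate
    y_ct = line_y(ref['lt'], ref['rt'], x)  # upper-edge line, formula (2)
    y_cb = line_y(ref['lb'], ref['rb'], x)  # lower-edge line, formula (3)
    y_lt, y_lb = box['lt'][1], box['lb'][1]
    return max(y_lb, y_cb) < min(y_lt, y_ct)

ref = {'lt': (0, 10), 'rt': (100, 10), 'lb': (0, 0), 'rb': (100, 0)}
inside = {'lt': (120, 9), 'lb': (120, 1)}
above = {'lt': (120, 30), 'lb': (120, 20)}
print(same_line(ref, inside), same_line(ref, above))  # True False
```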
In step S208, if at least one text region has been determined to belong to the same line as the reference text region, the leftmost of those regions that has not yet been used as a reference text region is selected as the new reference text region, and the above steps S202 to S206 are repeated.
Screening sequentially as described above, all text regions box_j that overlap the region formed by the upper and lower edges of the initial reference text region box_i are found, and from all such box_j the leftmost one can be selected according to formula (1) as the new reference text region box_i'. Note, however, that box_i' must not have been selected as a reference text region before. Based on the new reference text region box_i', the above steps S202 to S206 are repeated, until every box_j screened as belonging to the same line as the initial reference text region box_i has been selected as a new reference text region, which indicates that the text regions of that line have been completely determined.
For the text image as a whole, repeating the above steps S202 to S208 divides all text regions into a plurality of lines, where each text region necessarily satisfies formula (4) together with at least one other text region belonging to the same line.
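Steps S202 to S208 applied over the whole image can be sketched as the following greedy grouping loop (a sketch under assumed names: the `same_line` predicate, e.g. the formula (4) strip test, is passed in, and each box here carries only an upper-left corner 'lt' for selecting the leftmost region):

```python
def split_into_lines(boxes, same_line):
    """Divide all text regions into lines: take the leftmost unassigned box
    as the initial reference (S202), absorb every box that passes the
    same-line test (S204-S206), then re-reference from each newly absorbed
    box left to right (S208) until the line is complete."""
    remaining = set(range(len(boxes)))
    lines = []
    while remaining:
        start = min(remaining, key=lambda i: boxes[i]['lt'][0])
        line, queue = {start}, [start]
        while queue:
            ref = queue.pop(0)
            for j in sorted(remaining - line, key=lambda i: boxes[i]['lt'][0]):
                if same_line(boxes[ref], boxes[j]):
                    line.add(j)
                    queue.append(j)  # becomes a reference region in turn
        lines.append(sorted(line))
        remaining -= line
    return lines

# toy predicate standing in for the formula (4) strip test
near = lambda a, b: abs(a['lt'][1] - b['lt'][1]) < 5
boxes = [{'lt': (0, 100)}, {'lt': (50, 100)}, {'lt': (120, 100)},
         {'lt': (10, 50)}, {'lt': (60, 50)}]
print(split_into_lines(boxes, near))  # [[0, 1, 2], [3, 4]]
```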
FIG. 3 is a flow chart illustrating yet another method for determining the location of a text region in an image according to an exemplary embodiment. The difference from the method 10 shown in fig. 1 and the method 20 shown in fig. 2 is that the method 30 shown in fig. 3 further provides a method for determining the column information of each text region in each line according to the abscissa of the same corner of each text region in the line, i.e., an embodiment of step S1064 in the method 10. Likewise, the method shown in fig. 3 may also be applied, for example, in a scene in which a text image is recognized based on OCR technology.
Referring to fig. 3, step S1064 of the method 10 further includes:
in step S302, the text regions in each line are sorted according to the size of the abscissa of the upper left corner of each text region in each line.
In step S304, column information of each text region in each line is determined based on the sorting result.
For each line determined according to, for example, the method 20, all text regions box_i in the line can be sorted from left to right by the abscissa x_lti of the upper-left corner, and the horizontal position index obtained by the sorting is recorded as the column information of each text region. For the text image as a whole, repeating the above steps S302 to S304 determines the column information of all text regions.
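Steps S302 to S304 for one line reduce to a sort by x_lt followed by index assignment. A minimal sketch (the helper name is an assumption; boxes are abbreviated to upper-left corners):

```python
def assign_columns(line_boxes):
    """Sort the text regions of one line left to right by the upper-left
    abscissa x_lt and record the 1-based position as column information."""
    order = sorted(range(len(line_boxes)), key=lambda i: line_boxes[i][0])
    cols = [0] * len(line_boxes)
    for col, i in enumerate(order, start=1):
        cols[i] = col
    return cols

# one line, boxes abbreviated to (x_lt, y_lt)
print(assign_columns([(120, 9), (10, 10), (60, 11)]))  # [3, 1, 2]
```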
It should be noted that the present invention is not limited to marking the column information with the abscissa of the upper-left corner of each text region; it is only required that the same corner position be used for every text region. That is, in some embodiments, the column information of each text region may be marked with the abscissa of any one of its upper-left, lower-left, upper-right or lower-right corners.
FIG. 4 is a flow chart illustrating yet another method for determining the location of a text region in an image according to an exemplary embodiment. The difference from the method 10 shown in fig. 1, the method 20 shown in fig. 2 and the method 30 shown in fig. 3 is that the method 40 shown in fig. 4 further provides a method for determining the line information of each text region according to the average value of the ordinates of the same corner of the text regions in each line, i.e., an embodiment of step S1066 in the method 10. Likewise, the method shown in fig. 4 may also be applied, for example, in a scene in which a text image is recognized based on OCR technology.
Referring to fig. 4, step S1066 of the method 10 further includes:
in step S402, the average value of the ordinate of the upper left corner of each text region in each line is determined.
In step S404, line information of each text region is determined based on the size of the average value.
For each line determined according to, for example, the method 20, the mean ordinate of the upper-left corners of all m text regions box_i in the line can be calculated as

ȳ_lt = (1/m) · Σ_{i=1}^{m} y_lti

and the divided lines can then be sorted from top to bottom by this mean ordinate. For the text image as a whole, the vertical position index obtained by the sorting is recorded as the line information of each text region in the corresponding line.
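Steps S402 to S404 can be sketched likewise (the helper name is an assumption; each line is a list of upper-left corners, and y is taken to increase downward as in typical image coordinates, so a smaller mean ordinate means a higher line; flip the sort if the axis points up):

```python
def assign_rows(lines):
    """Order the divided lines top to bottom by the mean upper-left
    ordinate of each line and record the 1-based position as row info."""
    means = [sum(y for _, y in line) / len(line) for line in lines]
    order = sorted(range(len(lines)), key=lambda i: means[i])
    rows = [0] * len(lines)
    for row, i in enumerate(order, start=1):
        rows[i] = row
    return rows

# two lines, boxes abbreviated to (x_lt, y_lt); the second line is higher up
print(assign_rows([[(10, 50), (60, 52)], [(0, 9), (50, 11)]]))  # [2, 1]
```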
It should be noted that the present invention likewise does not limit the line information to be marked by the mean ordinate of the upper left corner of each text region; it is only necessary to select the same corner for every text region. That is, in some embodiments, each text region may have its line information marked with the mean ordinate of any one of its upper left, lower left, upper right, or lower right corners.
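A minimal sketch of steps S402 to S404, under the same hypothetical box representation; image coordinates are assumed, where a smaller ordinate means nearer the top (with the opposite convention the sort direction simply reverses):

```python
# Hypothetical sketch of steps S402-S404: for each line, average the
# ordinate of the upper-left corner of its boxes, then sort the lines by
# that mean and mark the rank as the line information of every box in it.

def mark_rows(lines):
    """lines: list of lines, each a list of boxes with box["lt"] == (x, y)."""
    def mean_y(line):
        return sum(box["lt"][1] for box in line) / len(line)

    ordered = sorted(range(len(lines)), key=lambda i: mean_y(lines[i]))
    rows = {}
    for row, line_index in enumerate(ordered):
        for box in lines[line_index]:
            rows[box["id"]] = row
    return rows

lines = [
    [{"id": "total", "lt": (0, 95)}],                                # lower line
    [{"id": "title", "lt": (0, 11)}, {"id": "date", "lt": (60, 9)}]  # upper line
]
print(mark_rows(lines))  # {'title': 0, 'date': 0, 'total': 1}
```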
Fig. 10 is a diagram illustrating line-splitting processing of a plurality of text regions in a text image according to an exemplary embodiment. Without loss of generality, the text is illustrated as tilted, but the invention does not limit the pose of the recognized text regions. The method of the invention obviously applies equally to, for example, perfectly horizontal, flat document images.
Referring to fig. 10, the text region "a certain hotel menu" located at the leftmost end of the text image may be determined as the initial reference text region according to step S202 in the method 20, and then the text region "2019-04-05", which is adjacent to it and belongs to the same line, may be found according to steps S204 to S208 in the method 20. By analogy, steps S202 to S208 are repeatedly executed, resulting in four lines of text regions in total. Then, the column and row information of each text region is determined according to the method 30 and the method 40; the marking result is shown in Table 1 below:
TABLE 1
(Table 1, presented as an image in the original publication, lists the row and column information marked for each text region of fig. 10.)
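The line-grouping walkthrough above (steps S202 to S208 of method 20) can be approximated with the following sketch. It simplifies the patent's region-overlap test to a center-in-band test against the extended top and bottom edges of the reference region and, for brevity, keeps a single reference per line instead of re-selecting a new reference among the newly found regions; all names are illustrative and image coordinates (y growing downward) are assumed:

```python
# Simplified, hypothetical sketch of line grouping: starting from the
# leftmost unassigned box, extend its top and bottom edges to the right and
# collect boxes whose centers fall inside the resulting band.

def y_on_edge(p1, p2, x):
    """Ordinate of the straight line through corners p1, p2 at abscissa x."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:                      # degenerate edge: treat as horizontal
        return y1
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

def group_lines(boxes):
    """boxes: dicts with corners "lt", "rt", "lb", "rb" as (x, y) tuples."""
    remaining = sorted(boxes, key=lambda b: b["lt"][0])  # left to right
    lines = []
    while remaining:
        ref = remaining.pop(0)        # leftmost unassigned box is the reference
        line = [ref]
        rest = []
        for box in remaining:
            cx = (box["lt"][0] + box["rt"][0]) / 2
            cy = (box["lt"][1] + box["lb"][1]) / 2
            top = y_on_edge(ref["lt"], ref["rt"], cx)     # extended top edge
            bottom = y_on_edge(ref["lb"], ref["rb"], cx)  # extended bottom edge
            (line if top <= cy <= bottom else rest).append(box)
        remaining = rest
        lines.append(line)
    return lines
```

Because the band is bounded by the line equations of the reference region's edges rather than a fixed y-range, tilted text such as that in fig. 10 is followed along its tilt.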
FIG. 5 is a flow chart illustrating a method for determining the location of a text region in yet another image according to an exemplary embodiment. The difference from the above methods is that the method 50 shown in fig. 5 further provides a method of classifying all the text regions in an image before they are divided into lines, i.e., an embodiment applicable to any of the above methods. Likewise, the method for determining the position of a text region in an image as shown in fig. 5 can also be applied, for example, in a scene in which a text image is recognized based on OCR technology.
Referring to fig. 5, prior to step S106 in method 10, method 10 further includes:
In step S502, based on the coordinate information of the four corners of each text region, the plurality of text regions are classified so that text regions of different categories do not overlap in the vertical direction.
Correspondingly, step S106 in the method 10 is: the position determination operation is performed on the text areas in the respective categories, that is, the steps S1062 to S1066 are performed on the text areas in the respective categories, respectively.
In some embodiments, classifying the plurality of text regions according to the coordinate information of the four corners of each text region may include: and sequentially screening and classifying two text regions which meet the classification condition.
Wherein the classification condition is that the larger of the ordinate of the lower left corner of the two text regions is smaller than the smaller of the ordinate of the upper left corner of the two text regions.
In light of the above, the classification condition can be expressed as the following formula (5):

max(y_lbi, y_lbj) < min(y_lti, y_ltj)    (5)
and (3) according to the vertical coordinates of the upper left corner and the lower left corner of all the text regions, comparing every two text regions in the text image in a traversal mode, and sequentially screening out every two text regions which meet the formula (5).
Based on the classification result, the methods 20, 30 and 40 can then perform line division and row/column information determination on the text regions within each category in turn, without dividing or marking over the text image as a whole.
Note that "so that the text regions of different categories do not overlap in the vertical direction" merely means that each text region in any category does not overlap in the vertical direction with any text region in the other categories; it does not mean that text regions of the same category must overlap in the vertical direction. In other words, two text regions classified into different categories must not satisfy the above expression (5), while two text regions classified into the same category may or may not satisfy the above expression (5).
In this regard, reference is made to fig. 11: the text region A and the text region B in category 1 obviously satisfy the above expression (5), and the text region B and the text region C obviously also satisfy it, but the text region A and the text region C obviously do not. However, since the classification process compares every two text regions in the text image by traversal: when comparing the text region A with the other text regions, it is determined that the text region B belongs to the same category as the text region A; when comparing the text region B with the other text regions, it is determined that the text regions A and C both belong to the same category as the text region B.
Referring again to FIG. 10: in fact, all the text regions in the text image can first be divided into an upper and a lower category through step S502 (as shown by the two sides of the thick dashed line in fig. 10). Then, for example through steps S202 to S208 in the method 20, the text regions in the two categories are respectively divided into lines, so that it can be determined that the two text regions "a certain hotel menu" and "2019-04-05" in the category above the thick dotted line belong to the same line, and the nine text regions in the category below the thick dotted line belong to three lines; that is, all the text regions in the entire text image are divided into four lines. Therefore, whether all the text regions are divided into lines directly, or are classified in advance and then divided into lines within each category, the two schemes determine the positions of the text regions in the image with completely consistent results.
According to some embodiments, the method for determining the position of a text region in an image provided by the invention can perform a preliminary classification operation before performing the line division operation on all the text regions, so as to reduce the amount of computation during the line division operation.
It should be noted that, although the above methods are described by taking row and column positioning starting from the leftmost text region as an example, it should be understood by those skilled in the art that, according to the inventive concept and the content disclosed above, the above methods are equally applicable to row and column positioning starting from the rightmost text region.
Those skilled in the art will appreciate that all or part of the steps implementing the above embodiments are implemented as computer programs executed by a CPU. The computer program, when executed by the CPU, performs the functions defined by the method provided by the present invention. The program may be stored in a computer readable storage medium, which may be a read-only memory, a magnetic or optical disk, or the like.
Furthermore, it should be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the method according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details which are not disclosed in the embodiments of the apparatus of the present invention, reference is made to the embodiments of the method of the present invention.
FIG. 6 is a block diagram illustrating an apparatus for determining a location of a text region in an image according to an example embodiment. The apparatus for determining the location of a text region in an image as shown in fig. 6 may be applied, for example, in a scene in which a text image is recognized based on OCR technology.
Referring to fig. 6, the apparatus 60 for determining the position of a text region in an image includes: an image acquisition module 602, a text positioning module 604, a text line-splitting module 608, a first determination module 610, and a second determination module 612.
The image obtaining module 602 is configured to obtain an image to be identified.
Wherein the image to be recognized contains text.
The text positioning module 604 is configured to perform text positioning on the image to be recognized, and obtain a plurality of text regions and coordinate information of four corners of each text region.
In some embodiments, the text positioning module 604 may further include a detection positioning unit for obtaining coordinate information of a plurality of text regions and four corner positions of each text region based on the trained deep learning text detection and positioning model.
The text line dividing module 608 is configured to determine a text region that belongs to the same line as the reference text region according to a region formed by extension lines of the upper and lower sides of the selected reference text region.
In some embodiments, the text-line-splitting module 608 may further include: the device comprises a first determining unit, a second determining unit, a third determining unit and a repeated executing unit.
The first determining unit is used for determining the reference text area as the text area at the leftmost end according to the abscissa of the upper left corner of each text area.
The second determination unit is configured to determine a region formed by the upper and lower edges of the reference text region based on a straight line equation determined by the upper and lower edges of the reference text region.
The third determining unit is configured to determine a text region having an overlap with the region as a text region belonging to the same line as the reference text region.
The repeated execution unit is configured to, when at least one text region belonging to the same line as the reference text region has been determined, select as the new reference text region the leftmost one of the determined text regions that has not yet served as a reference text region, and to instruct the first, second and third determining units to repeat their respective functions.
The first determining module 610 is configured to determine column information of each text region in each row according to an abscissa of an angle of the same orientation of each text region in each row.
In some embodiments, the first determining module 610 may further include: a horizontal sorting unit and a fourth determining unit.
The horizontal sorting unit is used for sorting the text regions in each line according to the size of the horizontal coordinate of the upper left corner of each text region in each line.
The fourth determining unit is configured to determine column information of each text region in each row, respectively, according to the sorting result.
The second determining module 612 is configured to determine line information of each text region according to an average value of vertical coordinates of angles of the same orientation of each text region in each line.
In some embodiments, the second determining module 612 may further include: an average value calculating unit and a fifth determining unit.
The average value calculating unit is used for determining the average value of the vertical coordinates of the upper left corner of each text area in each line.
The fifth determining unit is configured to determine line information of each text region respectively according to a size of the average value.
In some embodiments, the apparatus 60 for determining the position of the text region in the image may further include a classification processing module 606, configured to perform classification processing on a plurality of text regions according to coordinate information of four corners of each text region before the text segmentation module 608 determines the text regions belonging to the same line as the reference text region according to the region formed by extension lines of the upper and lower sides of the selected reference text region, so that the text regions of different classes do not overlap in the vertical direction.
In some embodiments, the classification processing module 606 may further include a traversal filtering unit for sequentially filtering and classifying two text regions that satisfy the classification condition.
Wherein the classification condition may be, for example, that the larger of the ordinate of the lower left corner of the two text regions is smaller than the smaller of the ordinate of the upper left corner of the two text regions.
According to the apparatus for determining the position of a text region in an image provided by the embodiments of the present invention, by using the coordinate information of the text regions, all the text regions in the image can be adaptively divided into lines, and the row and column information of the text regions can be determined quickly and effectively. This overcomes difficulties such as text images having no fixed, uniform standard, differing typesetting formats, interference from surrounding extraneous characters, and oblique perspective, and provides basic text information for the subsequent efficient, high-precision formatted output of OCR technology.
In addition, according to some embodiments, the present invention provides a device for determining a position of a text region in an image, which is capable of performing a preliminary classification operation first before performing a line segmentation operation on all text regions, so as to reduce an amount of computation in the line segmentation operation.
It is noted that the block diagrams shown in the above figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Fig. 7 is a schematic structural diagram of an electronic device according to an example embodiment. It should be noted that the electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 7, the electronic device 700 is embodied in the form of a general-purpose computer device. The components of the electronic device 700 include: at least one Central Processing Unit (CPU)701, which may perform various appropriate actions and processes according to program code stored in a Read Only Memory (ROM)702 or loaded from at least one storage unit 708 into a Random Access Memory (RAM) 703.
In particular, according to an embodiment of the present invention, the program code may be executed by the central processing unit 701, such that the central processing unit 701 performs the steps according to various exemplary embodiments of the present invention described in the above-mentioned method embodiment section of the present specification. For example, the central processing unit 701 may perform the steps as shown in fig. 1 to 5.
In the RAM 703, various programs and data necessary for the operation of the electronic apparatus 700 are also stored. The CPU 701, the ROM702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input unit 706 including a keyboard, a mouse, and the like; an output unit 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage unit 708 including a hard disk and the like; and a communication unit 709 including a network interface card such as a LAN card, a modem, or the like. The communication unit 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 710 as necessary, so that a computer program read out therefrom is mounted into the storage unit 708 as necessary.
FIG. 8 is a schematic diagram illustrating a computer-readable storage medium in accordance with an example embodiment.
Referring to fig. 8, a program product 800 configured to implement the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read-only memory (CD-ROM) including program code, and may be run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable medium carries one or more programs which, when executed by a device, cause the device to carry out the functions shown in figures 1 to 5.
Exemplary embodiments of the present invention are specifically illustrated and described above. It is to be understood that the invention is not limited to the precise construction, arrangements, or instrumentalities described herein; on the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A method for determining a location of a text region in an image, comprising:
acquiring an image to be recognized, wherein the image to be recognized comprises a text;
performing text positioning on the image to be recognized to obtain a plurality of text areas and coordinate information of four corners of each text area; and
performing a position determination operation on each text region, respectively, including:
determining a text area which belongs to the same line as the reference text area according to an area formed by extension lines of the upper and lower edges of the selected reference text area;
respectively determining column information of each text region in each row according to the abscissa of the angle of the same azimuth of each text region in each row; and
and respectively determining the line information of each text area according to the average value of the vertical coordinates of the same azimuth angle of each text area in each line.
2. The method according to claim 1, wherein determining the text region belonging to the same line as the reference text region based on the region formed by the extensions of the upper and lower sides of the selected reference text region comprises:
step a), determining the reference text region as the text region at the leftmost end according to the abscissa of the upper left corner of each text region;
step b) determining a region formed by the upper edge and the lower edge of the reference text region according to a linear equation determined by the upper edge and the lower edge of the reference text region;
step c) determining a text region having an overlap with the region as a text region belonging to the same line as the reference text region;
step d) if at least one determined text region which belongs to the same line as the reference text region exists, selecting the text region at the leftmost end which is not determined as the reference text region in the at least one determined text region as a new reference text region, and repeatedly executing the steps a) to c).
3. The method of claim 1, wherein determining column information for each text region in each line based on the abscissa of the angle of the same orientation of each text region in each line comprises:
respectively sequencing the text regions in each line according to the size of the abscissa of the upper left corner of each text region in each line; and
and respectively determining column information of each text area in each row according to the sorting result.
4. The method of claim 1, wherein determining the line information for each text region separately based on an average of the ordinate of the angle of the same orientation of each text region in each line comprises:
determining an average value of vertical coordinates of the upper left corner of each text area in each line; and
and respectively determining the line information of each text area according to the size of the average value.
5. The method of any of claims 1-4, wherein prior to performing the location determination operation separately on each text region, the method further comprises: classifying the plurality of text regions according to the coordinate information of the four corners of each text region, so that the different types of text regions are not overlapped in the vertical direction; performing the position determination operation on each text region includes: the position determination operation is performed on the text regions in each category, respectively.
6. The method according to claim 5, wherein classifying the plurality of text regions according to the coordinate information of the four corners of each text region comprises: sequentially screening two text regions meeting the classification conditions and classifying the two text regions into one type; wherein the classification condition is that a larger one of the ordinate of the lower left corner of the two text regions is smaller than a smaller one of the ordinate of the upper left corner of the two text regions.
7. The method according to any one of claims 1 to 4, wherein performing text localization on the image to be recognized to obtain a plurality of text regions and coordinate information of four corners of each text region comprises: and obtaining the coordinate information of the plurality of text regions and the four corner positions of each text region based on the trained deep learning text detection and positioning model.
8. An apparatus for determining a location of a text region in an image, comprising:
the image acquisition module is used for acquiring an image to be identified, and the image to be identified comprises a text;
the text positioning module is used for carrying out text positioning on the image to be recognized to obtain a plurality of text areas and coordinate information of four corners of each text area;
the text line dividing module is used for determining the text regions which belong to the same line as the reference text region according to the region formed by the extension lines of the upper and lower edges of the selected reference text region;
the first determining module is used for respectively determining the column information of each text area in each row according to the abscissa of the angle of the same azimuth of each text area in each row; and
and the second determining module is used for respectively determining the line information of each text area according to the average value of the vertical coordinates of the same azimuth angle of each text area in each line.
9. An electronic device, comprising: memory, processor and executable instructions stored in the memory and executable in the processor, characterized in that the processor implements the method according to any of claims 1-7 when executing the executable instructions.
10. A computer-readable storage medium having stored thereon computer-executable instructions, which when executed by a processor, implement the method of any one of claims 1-7.
CN201911065589.2A 2019-11-04 2019-11-04 Method, device and equipment for determining position of text area in image and storage medium Pending CN110852229A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911065589.2A CN110852229A (en) 2019-11-04 2019-11-04 Method, device and equipment for determining position of text area in image and storage medium


Publications (1)

Publication Number Publication Date
CN110852229A true CN110852229A (en) 2020-02-28

Family

ID=69598298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911065589.2A Pending CN110852229A (en) 2019-11-04 2019-11-04 Method, device and equipment for determining position of text area in image and storage medium

Country Status (1)

Country Link
CN (1) CN110852229A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011274A (en) * 2021-02-24 2021-06-22 南京三百云信息科技有限公司 Image recognition method and device, electronic equipment and storage medium

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030206201A1 (en) * 2002-05-03 2003-11-06 Ly Eric Thichvi Method for graphical classification of unstructured data
CN1567356A (en) * 2003-06-18 2005-01-19 摩托罗拉公司 Method for identification of text line
CN102063619A (en) * 2010-11-30 2011-05-18 汉王科技股份有限公司 Character row extraction method and device
US20120102388A1 (en) * 2010-10-26 2012-04-26 Jian Fan Text segmentation of a document
CN105225218A (en) * 2014-06-24 2016-01-06 佳能株式会社 For distortion correction method and the equipment of file and picture
CN105450900A (en) * 2014-06-24 2016-03-30 佳能株式会社 Distortion correction method and equipment for document image
CN108304761A (en) * 2017-09-25 2018-07-20 腾讯科技(深圳)有限公司 Method for text detection, device, storage medium and computer equipment
CN108805131A (en) * 2018-05-22 2018-11-13 北京旷视科技有限公司 Text line detection method, apparatus and system
CN109657629A (en) * 2018-12-24 2019-04-19 科大讯飞股份有限公司 A kind of line of text extracting method and device
CN109670500A (en) * 2018-11-30 2019-04-23 平安科技(深圳)有限公司 A kind of character area acquisition methods, device, storage medium and terminal device
US10296578B1 (en) * 2018-02-20 2019-05-21 Paycor, Inc. Intelligent extraction and organization of data from unstructured documents
CN109871743A (en) * 2018-12-29 2019-06-11 口碑(上海)信息技术有限公司 The localization method and device of text data, storage medium, terminal
CN109977762A (en) * 2019-02-01 2019-07-05 汉王科技股份有限公司 A kind of text positioning method and device, text recognition method and device
CN110032938A (en) * 2019-03-12 2019-07-19 北京汉王数字科技有限公司 A kind of Tibetan language recognition method, device and electronic equipment


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011274A (en) * 2021-02-24 2021-06-22 南京三百云信息科技有限公司 Image recognition method and device, electronic equipment and storage medium
CN113011274B (en) * 2021-02-24 2024-04-09 南京三百云信息科技有限公司 Image recognition method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN107798299B (en) Bill information identification method, electronic device and readable storage medium
WO2018233055A1 (en) Method and apparatus for entering policy information, computer device and storage medium
US10878173B2 (en) Object recognition and tagging based on fusion deep learning models
US8515208B2 (en) Method for document to template alignment
US8693790B2 (en) Form template definition method and form template definition apparatus
WO2021017272A1 (en) Pathology image annotation method and device, computer apparatus, and storage medium
Li et al. Automatic comic page segmentation based on polygon detection
CN114862845B (en) Defect detection method, device and equipment for mobile phone touch screen and storage medium
CN112541922A (en) Test paper layout segmentation method based on digital image, electronic equipment and storage medium
WO2021023111A1 (en) Methods and devices for recognizing number of receipts and regions of a plurality of receipts in image
CN115424111A (en) Intelligent identification method, device, equipment and medium of antigen detection kit
CN109635729B (en) Form identification method and terminal
CN115082935A (en) Method, apparatus and storage medium for correcting document image
CN114494751A (en) License information identification method, device, equipment and medium
CN114581928A (en) Form identification method and system
CN113469302A (en) Multi-circular target identification method and system for video image
CN110852229A (en) Method, device and equipment for determining position of text area in image and storage medium
CN117593420A (en) Plane drawing labeling method, device, medium and equipment based on image processing
CN106056575B (en) A kind of image matching method based on like physical property proposed algorithm
Heitzler et al. A modular process to improve the georeferencing of the Siegfried map
CN113936187A (en) Text image synthesis method and device, storage medium and electronic equipment
CN112287763A (en) Image processing method, apparatus, device and medium
CN104112135B (en) Text image extraction element and method
WO2021098861A1 (en) Text recognition method, apparatus, recognition device, and storage medium
CN116541549B (en) Subgraph segmentation method, subgraph segmentation device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20200228