CN111461132B - Method and device for assisting in labeling OCR image data - Google Patents


Info

Publication number
CN111461132B
CN111461132B (application CN202010304296.1A)
Authority
CN
China
Prior art keywords
text candidate
text
region
candidate region
regions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010304296.1A
Other languages
Chinese (zh)
Other versions
CN111461132A (en)
Inventor
蔡耀华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010304296.1A
Publication of CN111461132A
Application granted
Publication of CN111461132B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

Embodiments of the present specification provide methods and apparatus for assisting in OCR image data annotation. In the method, text region detection is performed on OCR image data to obtain a first text candidate region set, and then fourth text candidate regions, which do not overlap any other text candidate region, are extracted from the first text candidate region set based on the region heights of the text candidate regions and their overlap relations. Further, a representative text candidate region is determined from among the text candidate regions that partially overlap other text candidate regions. Then, the fourth text candidate regions and the representative text candidate regions are output as text label boxes.

Description

Method and device for assisting in labeling OCR image data
Technical Field
Embodiments of the present disclosure relate generally to the field of Optical Character Recognition (OCR) data annotation technology, and in particular, to a method and an apparatus for assisting OCR image data annotation.
Background
Labeling OCR image data relies mainly on manually drawing boxes around text regions in an OCR picture and filling in the text content, so both the labeling efficiency and the labeling precision of OCR image data are low. When the text in the OCR image data is tilted and dense, the problems of low labeling efficiency and low labeling precision become even more pronounced.
Disclosure of Invention
In view of the foregoing, the present specification provides a method and an apparatus for assisting in labeling OCR image data. With the method and the apparatus, text label boxes can be determined automatically from the OCR image data, thereby helping to improve the labeling efficiency and the labeling precision of the OCR image data.
According to an aspect of embodiments of the present specification, there is provided a method for assisting OCR image data annotation, comprising: performing text region detection on OCR image data to obtain a first text candidate region set in the OCR image data; dividing the first text candidate region set into a second text candidate region set and a third text candidate region set based on the region height of the text candidate regions, wherein the region height of the second text candidate region is not less than the average region height of the first text candidate region set, and the region height of the third text candidate region is less than the average region height of the first text candidate region set; dividing the second text candidate region set into a fourth text candidate region set and a fifth text candidate region set based on the coincidence relation of the text candidate regions, wherein the fourth text candidate region is a text candidate region which does not coincide with other text candidate regions, and the fifth text candidate region is a text candidate region which partially coincides with other text candidate regions; determining a representative text candidate region from each group of overlapped text candidate regions in the fifth text candidate region set to obtain a sixth text candidate region set; and outputting the text candidate areas in the fourth text candidate area set and the sixth text candidate area set as text labeling boxes.
Optionally, in an example of the above aspect, determining a representative text candidate region from each group of overlapping text candidate regions in the fifth text candidate region set to obtain a sixth text candidate region set may include: determining the overall region slope of the fourth text candidate region set as a standard region slope; and respectively calculating the region slope between each text candidate region in the group of the text candidate regions and the fourth text candidate region with the closest distance for each group of the coincident text candidate regions, and determining the text candidate region with the minimum difference between the calculated region slope and the standard region slope as the representative text candidate region of the group of the coincident text candidate regions.
Optionally, in an example of the above aspect, before dividing the first set of text candidate regions into a second set of text candidate regions and a third set of text candidate regions, the method may further include: performing overlap-merge processing on first text candidate regions in the first text candidate region set.
Optionally, in an example of the above aspect, before dividing the second set of text candidate regions into a fourth set of text candidate regions and a fifth set of text candidate regions, the method may further include: removing text candidate regions having a region slope greater than a predetermined threshold from the second set of text candidate regions.
Optionally, in an example of the above aspect, before dividing the second set of text candidate regions into a fourth set of text candidate regions and a fifth set of text candidate regions, the method may further include: searching out neighbor areas of each third text candidate area in the third text candidate area set from the second text candidate area set; and adding a third text candidate region overlapping with the neighbor region to the second text candidate region set.
Optionally, in an example of the above aspect, the method may further include: extracting text characteristic points of each text candidate area in the text labeling box; determining the inclination of the text labeling box based on the extracted text feature points of the text candidate areas; and performing rotation correction on the text labeling box according to the inclination of the text labeling box.
Optionally, in an example of the above aspect, the method may further include: and carrying out binarization processing on the OCR image data.
According to another aspect of embodiments herein, there is provided an apparatus for assisting in OCR image data annotation, comprising: the text region detection unit is used for detecting text regions of OCR image data to obtain a first text candidate region set in the OCR image data; the first region dividing unit divides the first text candidate region set into a second text candidate region set and a third text candidate region set based on the region height of the text candidate region, wherein the region height of the second text candidate region is not less than the average region height of the first text candidate region set, and the region height of the third text candidate region is less than the average region height of the first text candidate region set; the second region dividing unit is used for dividing the second text candidate region set into a fourth text candidate region set and a fifth text candidate region set based on the coincidence relation of the text candidate regions, wherein the fourth text candidate region is a text candidate region which does not coincide with other text candidate regions, and the fifth text candidate region is a text candidate region which partially coincides with other text candidate regions; a representative text region determining unit that determines a representative text candidate region from each group of the coincident text candidate regions in the fifth text candidate region set to obtain a sixth text candidate region set; and a text label box output unit that outputs the text candidate areas in the fourth text candidate area set and the sixth text candidate area set as a text label box.
Alternatively, in one example of the above-described aspect, the representative text region determining unit may include: a standard region slope determination module, configured to determine an overall region slope of the fourth text candidate region set, as a standard region slope; and the representative text region determining module is used for respectively calculating the region slope between each text candidate region in the group of the text candidate regions and the fourth text candidate region with the nearest distance for each group of the coincident text candidate regions, and determining the calculated text candidate region with the minimum difference between the region slope and the standard region slope as the representative text candidate region of the group of the coincident text candidate regions.
Optionally, in an example of the above aspect, the apparatus may further include: and the region merging processing unit is used for performing overlapping merging processing on a first text candidate region in the first text candidate region set before dividing the first text candidate region set into a second text candidate region set and a third text candidate region set.
Optionally, in an example of the above aspect, the apparatus may further include: a region removing unit that removes, from the second text candidate region set, a text candidate region whose region gradient is greater than a predetermined threshold value before dividing the second text candidate region set into a fourth text candidate region set and a fifth text candidate region set.
Optionally, in an example of the above aspect, the apparatus may further include: a neighbor region searching unit that searches a neighbor region of each of the third text candidate regions in the third text candidate region set from the second text candidate region set; and an area adding unit that adds a third text candidate area overlapping with the neighbor area to the second text candidate area set.
Optionally, in an example of the above aspect, the apparatus may further include: the text characteristic point extraction unit is used for extracting text characteristic points of each text candidate area in the text labeling box; an inclination determining unit, which determines the inclination of the text labeling box based on the extracted text feature points of each text candidate area; and a label box correction unit which performs rotation correction on the text label box according to the inclination of the text label box.
Optionally, in an example of the above aspect, the apparatus may further include: and a binarization processing unit that performs binarization processing on the OCR image data.
According to another aspect of the present specification, there is provided an electronic apparatus including: at least one processor, and a memory coupled with the at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform a method for assisting OCR image data annotation as described above.
According to another aspect of embodiments herein, there is provided a machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform a method for assisting in annotation of OCR image data as described above.
Drawings
A further understanding of the nature and advantages of the present disclosure may be realized by reference to the following drawings. In the drawings, similar components or features may have the same reference numerals.
Figure 1 illustrates a flow diagram of a method for assisting in OCR image data annotation in accordance with an embodiment of the present description.
Figure 2 illustrates an example schematic of OCR image data in accordance with an embodiment of the present description.
Fig. 3A-3B illustrate exemplary schematic diagrams of text region detection results after processing by the MSER algorithm according to embodiments of the present description.
Fig. 4 illustrates an example schematic diagram of a text region detection result after being processed by the NMS according to an embodiment of the present specification.
Fig. 5 shows a flowchart of one example of a text region division process based on the region height of a text candidate region according to an embodiment of the present specification.
Fig. 6 illustrates an example schematic diagram of a region slope of a text candidate region according to an embodiment of the present description.
Fig. 7 shows a flowchart of a process for determining representative text candidate regions of a group of overlapping text candidate regions according to an embodiment of the present description.
Fig. 8 illustrates an example schematic diagram of region slopes between text candidate regions according to an embodiment of the present description.
FIG. 9 illustrates a flow diagram of a process for rotation correction of a text label box according to an embodiment of the present description.
FIG. 10 illustrates a block diagram of one example of a data annotation assistance device in accordance with an embodiment of the present description.
Fig. 11 shows a block diagram of one example of a representative text region determining unit according to an embodiment of the present specification.
FIG. 12 shows a schematic diagram of an electronic device for assisting in OCR image data annotation, in accordance with embodiments of the present description.
Detailed Description
The subject matter described herein will now be discussed with reference to example embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and thereby implement the subject matter described herein, and are not intended to limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as needed. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with respect to some examples may also be combined in other examples.
As used herein, the term "include" and its variants mean open-ended terms in the sense of "including, but not limited to. The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first," "second," and the like may refer to different or the same object. Other definitions, whether explicit or implicit, may be included below. The definition of a term is consistent throughout the specification unless the context clearly dictates otherwise.
In this specification, the OCR process refers to a process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates the shapes into computer text by a character recognition method. For example, an OCR device scans text material to obtain OCR image data, and then analyzes the OCR image data to obtain the text and layout information. OCR data annotation is the activity of a data annotator processing OCR data with the aid of an annotation tool.
One implementation of OCR data annotation relies on manually drawing boxes around text areas in OCR pictures and filling in the text content. The data annotation efficiency and the data annotation precision of this scheme are low, and when the text in the OCR image data is tilted and dense, these problems become even more pronounced.
Another implementation scheme of OCR data annotation directly accesses a character detection and recognition model on a server, preprocesses the data currently to be annotated, and automatically detects the characters and their content in the picture. This scheme has to rely on a server-side character detection and recognition model and therefore lacks universality. Moreover, such a model must be trained separately for specific languages such as English or Chinese, and thus generalizes poorly. Since data annotation is precisely what produces training data for such models, while an annotation assistant is required to generalize well and run in real time, this model-based scheme is not well suited as intelligent assistance in a data annotation scenario.
In view of the above, embodiments according to the present specification provide a method and apparatus for assisting OCR image data annotation. In the OCR image data labeling method, text region detection is performed on OCR image data to obtain a first text candidate region set in the OCR image data. The first text candidate region set is divided into a second text candidate region set and a third text candidate region set based on the region heights of the text candidate regions, where the region height of a second text candidate region is not smaller than the average region height of the first text candidate region set, and the region height of a third text candidate region is smaller than the average region height of the first text candidate region set. Then, based on the coincidence relation of the text candidate regions, the second set of text candidate regions is divided into a fourth set of text candidate regions, which are text candidate regions having no coincidence with other text candidate regions, and a fifth set of text candidate regions, which are text candidate regions having partial coincidence with other text candidate regions. Then, a representative text candidate region is determined from each group of the overlapped text candidate regions in the fifth text candidate region set to obtain a sixth text candidate region set. Then, the text candidate regions in the fourth text candidate region set and the sixth text candidate region set are output as text label boxes. With this method, text label boxes can be determined automatically and efficiently from OCR image data, so that the labeling efficiency and the labeling precision of the OCR image data are improved.
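To make the flow concrete, here is a minimal Python sketch of the pipeline just described. All helper names and signatures (detect_mser_boxes, non_max_suppress, split_by_height, split_by_overlap, group_overlaps, overall_region_slope, pick_representative) are hypothetical simplifications; candidate implementations are sketched step by step in the sections below.

```python
def propose_label_boxes(image):
    # Block 110: detect the first set of text candidate regions
    boxes = detect_mser_boxes(image)
    boxes = non_max_suppress(boxes)              # remove repeated regions
    # Block 120: split by region height into second / third sets
    tall, short = split_by_height(boxes)
    # Block 130: split by overlap into fourth / fifth sets
    isolated, overlapping = split_by_overlap(tall)
    # Block 140: pick one representative per group of overlapping boxes
    k0 = overall_region_slope(isolated)          # standard region slope
    reps = [pick_representative(group, isolated, k0)
            for group in group_overlaps(overlapping)]
    # Block 150: output the fourth and sixth sets as text label boxes
    return isolated + reps
```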
A method and apparatus for assisting OCR image data annotation according to an embodiment of the present specification will be described below with reference to the accompanying drawings.
Figure 1 illustrates a flow diagram of a method for assisting in OCR image data annotation in accordance with an embodiment of the present description.
As shown in FIG. 1, at block 110, text region detection is performed on the OCR image data to obtain a first set of text candidate regions in the OCR image data. The first set of text candidate regions includes a plurality of first text candidate regions. Figure 2 illustrates an example schematic of OCR image data in accordance with an embodiment of the present description. After text region detection is performed on it, the text region detection result shown in fig. 4 is obtained.
In one example, text region detection may be performed on the OCR image data using the Maximally Stable Extremal Regions (MSER) algorithm to obtain the first set of text candidate regions in the OCR image data. Alternatively, in one example, before text region detection is performed on the OCR image data, binarization processing may also be performed on the OCR image data. Binarizing the OCR image data can reduce noise interference during text region detection.
MSER is a classical method, proposed in 2002, for detecting text regions in images. The MSER algorithm performs blob detection in images based mainly on the watershed concept. Specifically, MSER binarizes an image that has been converted to grayscale, with the binarization threshold increasing from 0 to 255. Raising the threshold is analogous to raising the water level over a piece of land: as the level rises, terrain of uneven height is progressively submerged, which is the watershed algorithm, and the differences in level correspond to the differences in gray values in the image. In an image containing text, some regions (such as the text) have a consistent color (gray value) and therefore remain uncovered over a long range of thresholds, until the threshold reaches the gray value of the text itself; such regions are called maximally stable extremal regions.
After the MSER algorithm is applied to the OCR image data shown in fig. 2, the text region detection result shown in fig. 3A may be obtained. The text boxes shown in fig. 3A have various irregular shapes, which makes it inconvenient to subsequently extract the text regions from the image by coordinates, so they can be converted into rectangular text regions using, for example, the OpenCV functions cv2.boundingRect and cv2.rectangle, thereby obtaining the text region detection result shown in fig. 3B.
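As a minimal sketch of this detection step, assuming OpenCV (cv2) is available, the following detects MSER regions on a binarized grayscale image and converts each region to an axis-aligned rectangle. The Otsu binarization and the drawing color are illustrative choices, not values from the patent.

```python
import cv2

def detect_mser_boxes(image_path):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Optional binarization to reduce noise before detection
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    mser = cv2.MSER_create()
    regions, _ = mser.detectRegions(binary)
    boxes = []
    for pts in regions:
        x, y, w, h = cv2.boundingRect(pts)  # irregular blob -> rectangle
        boxes.append((x, y, w, h))
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)
    return img, boxes
```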
After the MSER algorithm is applied to the OCR image data, the resulting first text candidate region set may contain first text candidate regions that do not overlap at all, or some first text candidate regions that overlap one another. As can be seen from fig. 3B, in the text region detection result shown there, some of the first text candidate regions overlap. In this case, the data annotation assist method according to the embodiment of the present specification may further include removing repeated text candidate regions from the first text candidate region set.
In one example, a Non-Maximum Suppression (NMS) algorithm may be used to remove the repeated text candidate regions from the first text candidate region set. The NMS algorithm often accompanies image region detection; its role is to remove repetitive regions, i.e., to remove small text candidate regions contained in larger text candidate regions, and it is widely used in fields such as face recognition and object detection.
Specifically, when NMS processing is performed on the first text candidate region set, all the first text candidate regions are traversed and sorted by score, and the first text candidate region with the highest score is selected. The remaining first text candidate regions are then traversed to find those whose overlapping area with the current highest-scoring first text candidate region is larger than a predetermined Intersection over Union (IoU) threshold, and the found first text candidate regions are deleted from the first text candidate region set. For example, in one example, the IoU threshold is set to 0.5, i.e., if the overlapping area between two first text candidate regions is greater than 50% of one of the first text candidate regions, that text candidate region is deleted. Through this processing, the remaining first text candidate regions do not include any first text candidate region whose overlap with the highest-scoring first text candidate region is larger than the IoU threshold.
Then, another first text candidate region with the highest score (excluding the highest-scoring first text candidate region already found in the previous pass) is found from the remaining first text candidate region set, and the first text candidate regions whose overlapping area with this region is larger than the threshold are found and deleted. This process is executed in a loop until the first text candidate region set contains no first text candidate region that has not been processed by the NMS algorithm. After the above NMS processing, the text region detection result shown in fig. 4 can be obtained.
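The following is a minimal sketch of this NMS step. MSER boxes carry no natural detection score, so box area is used here as a stand-in score, and overlap is measured against the smaller box's area to match the "greater than 50% of one of the regions" criterion above; both choices are assumptions of this sketch.

```python
def non_max_suppress(boxes, overlap_thresh=0.5):
    def area(b):
        x, y, w, h = b
        return w * h

    def overlap_ratio(a, b):
        # Intersection area divided by the smaller box's area
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
        ih = max(0, min(ay + ah, by + bh) - max(ay, by))
        denom = min(area(a), area(b))
        return (iw * ih) / denom if denom else 0.0

    kept = []
    # Visit boxes from highest "score" (largest area) downward
    for box in sorted(boxes, key=area, reverse=True):
        if all(overlap_ratio(box, k) <= overlap_thresh for k in kept):
            kept.append(box)
    return kept
```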
After the text region detection result is obtained as described above, returning to fig. 1, at block 120, the first set of text candidate regions is divided into a second set of text candidate regions and a third set of text candidate regions based on the region height of the text candidate regions. Here, the region height of the second text candidate region is not smaller than the average region height of the first text candidate region set, and the region height of the third text candidate region is smaller than the average region height of the first text candidate region set.
Fig. 5 shows a flowchart of one example of a text region division process based on the region height of a text candidate region according to an embodiment of the present specification.
As shown in FIG. 5, at block 510, an average height of all first text candidate regions in the first set of text candidate regions is calculated. For example, the height of the region of each first text candidate region may be determined, and then the determined height of the region of each first text candidate region may be averaged to obtain an average height.
At block 520, text candidate region partitioning is performed on the first set of text candidate regions based on the region height of each first text candidate region. Specifically, the first text candidate region not smaller than the average height is divided into the second text candidate regions, thereby obtaining the second text candidate region set 520. Further, the first text candidate region smaller than the average height is divided into third text candidate regions, thereby resulting in a third text candidate region set 530.
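A minimal sketch of this height-based partition, assuming each box is an (x, y, w, h) tuple:

```python
def split_by_height(boxes):
    # Blocks 510-530: average height, then partition around it
    avg_h = sum(h for _, _, _, h in boxes) / len(boxes)
    tall = [b for b in boxes if b[3] >= avg_h]   # second set
    short = [b for b in boxes if b[3] < avg_h]   # third set
    return tall, short
```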
Optionally, at block 540, the second text candidate region set may be verified a second time, that is, text candidate regions whose region slope is greater than a predetermined threshold are removed from the second text candidate region set. In this specification, the region slope of a text candidate region may be expressed by the inclination angle of a text feature point of the text candidate region with respect to a predetermined reference axis. For example, the text feature point may be the bottom edge of the text candidate region, in which case the region slope may be represented by the angle of the bottom edge with respect to the horizontal axis, such as the angle shown in fig. 6. Alternatively, the text feature point may be a side edge of the text candidate region, in which case the region slope may be represented by the angle of the side edge with respect to the vertical axis. Alternatively, the text feature point may be the center point of the text candidate region, in which case the region slope may be represented by the angle, with respect to the horizontal axis, of the line connecting the center point and one of the corner vertices.
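As a small illustration, the region slope of a single candidate region can be computed from its bottom edge, assuming the two bottom corner points are known (e.g., via cv2.minAreaRect and cv2.boxPoints); note that in image coordinates the y axis points downward.

```python
import math

def region_slope(bottom_left, bottom_right):
    # Angle (in degrees) of the bottom edge relative to the horizontal axis
    (x1, y1), (x2, y2) = bottom_left, bottom_right
    return math.degrees(math.atan2(y2 - y1, x2 - x1))
```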
Further optionally, at block 560, neighbor regions of respective third text candidate regions in the third text candidate region set are searched out from the second text candidate region set, and a third text candidate region having a coincidence with the neighbor regions is added to the second text candidate region set, thereby implementing a cross-comparison of the second text candidate region set and the third text candidate region set.
After the above processing, at block 130, the second set of text candidate regions is divided into a fourth set of text candidate regions and a fifth set of text candidate regions based on the coincidence relation of the text candidate regions. Here, the fourth text candidate region is a text candidate region in which there is no coincidence with other text candidate regions, and the fifth text candidate region is a text candidate region in which there is partial coincidence with other text candidate regions.
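A minimal sketch of this partition, again over (x, y, w, h) boxes; the quadratic pairwise scan is written for clarity, not efficiency.

```python
def split_by_overlap(boxes):
    def intersects(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

    isolated, overlapping = [], []   # fourth set, fifth set
    for i, box in enumerate(boxes):
        if any(i != j and intersects(box, other)
               for j, other in enumerate(boxes)):
            overlapping.append(box)
        else:
            isolated.append(box)
    return isolated, overlapping
```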
At block 140, representative text candidate regions are determined from each set of overlapping text candidate regions in the fifth set of text candidate regions to yield a sixth set of text candidate regions.
Fig. 7 shows a flowchart of a process for determining representative text candidate regions of a group of overlapping text candidate regions according to an embodiment of the present description.
As shown in FIG. 7, at block 710, the overall region slope of the fourth set of text candidate regions is determined as the standard region slope. For example, text feature points of the respective text candidate regions in the fourth text candidate region set, such as their bottom edges or center points, may be extracted. Then, based on the extracted text feature points, a trend line is fitted using, for example, linear regression (LR), and its gradient is taken as the overall region slope of the fourth text candidate region set.
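A minimal sketch of this trend-line fit, using NumPy's least-squares polynomial fit over box center points as the text feature points:

```python
import numpy as np

def overall_region_slope(boxes):
    xs = np.array([x + w / 2 for x, y, w, h in boxes])
    ys = np.array([y + h / 2 for x, y, w, h in boxes])
    slope, _intercept = np.polyfit(xs, ys, 1)  # degree-1 fit = trend line
    return slope
```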
Next, the operations of blocks 720 to 750 are performed for each group of coincident text candidate regions in the fifth text candidate region set, until the fifth text candidate region set contains no unprocessed group of coincident text candidate regions.
Specifically, at block 720, the region slopes between each text candidate region in the set of text candidate regions and the fourth text candidate region that is closest in distance are respectively calculated. As shown in fig. 8, it is assumed that the group of text candidate regions includes text candidate regions a and B, and that the fourth set of text candidate regions includes text candidate regions C and D. As can be seen from fig. 8, the text candidate region a is closest to the text candidate region C, and the text candidate region B is closest to the text candidate region D. Thus, the region slope between the text candidate region a and the text candidate region C, and the region slope between the text candidate region B and the text candidate region D can be calculated.
In the present specification, in calculating the region slope between two text candidate regions, text feature points of the two text candidate regions, for example, the bottom edges or the center points of the two text candidate regions, may be extracted. Then, based on the extracted individual text feature points, a trend line is fitted using, for example, an LR (linear regression model), thereby obtaining a gradient as a region slope of the two text candidate regions. As shown in fig. 8, the center points of the text candidate regions A, B, C and D may be extracted, and then the angle between the horizontal axis and the line connecting the center points of the text candidate regions a and C is taken as the region slope between the text candidate regions a and C. An angle between a connecting line between center points of the text candidate regions B and D and the horizontal axis is taken as a region slope between the text candidate regions B and D.
At block 730, the text candidate region with the smallest difference between the calculated region slope and the standard region slope is determined as the representative text candidate region of the set of overlapping text candidate regions. For example, assume the standard region slope is k0, and denote the calculated region slopes by k_AC (between text candidate regions A and C) and k_BD (between text candidate regions B and D). Then |k_AC - k0| and |k_BD - k0| are computed, and if |k_AC - k0| < |k_BD - k0|, the text candidate region A is determined to be the representative text candidate region of the set of overlapping text candidate regions.
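A minimal sketch of this representative-selection step (blocks 720 and 730), assuming one group of overlapping boxes, the fourth set, and the standard slope k0 computed above are given; center points are used as the text feature points.

```python
def pick_representative(group, fourth_set, k0):
    def center(b):
        x, y, w, h = b
        return (x + w / 2, y + h / 2)

    def sq_dist(a, b):
        (ax, ay), (bx, by) = center(a), center(b)
        return (ax - bx) ** 2 + (ay - by) ** 2

    def pair_slope(a, b):
        # Slope of the line joining the two center points
        (ax, ay), (bx, by) = center(a), center(b)
        return (by - ay) / (bx - ax) if bx != ax else float("inf")

    def deviation(box):
        nearest = min(fourth_set, key=lambda f: sq_dist(box, f))
        return abs(pair_slope(box, nearest) - k0)

    # Keep the box whose region slope is closest to the standard slope
    return min(group, key=deviation)
```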
At block 150, the text candidate regions in the fourth set of text candidate regions and the sixth set of text candidate regions are output as text label boxes.
A method for assisting in annotation of OCR image data according to an embodiment of the present specification is described above with reference to fig. 1 to 8. By the method, the text annotation box can be automatically and efficiently determined from the OCR image data, so that the annotation efficiency and the annotation precision of the OCR image data are improved.
Furthermore, the methods provided by embodiments of the present specification for assisting in OCR image data annotation do not rely on a deep learning model and can operate in any environment including, for example, a browser, and thus can be deeply integrated with OCR-related data annotation tools. In addition, the method does not need training data and is insensitive to the language, so that the OCR data of any language can be directly labeled, and the labeling efficiency is greatly improved.
Further, optionally, in another example, after the text label box is determined as above, the rotation correction processing may also be performed on the text label box. FIG. 9 illustrates a flow diagram of a process for rotation correction of a text label box according to an embodiment of the present description.
As shown in FIG. 9, at block 910, text feature points are extracted for each text candidate region in the text label box. For example, the text feature point may be a bottom edge of the text candidate region, a side edge of the text candidate region, or a center point of the text candidate region.
At block 920, the inclination of the text label box is determined based on the extracted text feature points of the respective text candidate regions. For example, when the text feature points are the bottom edges, the side edges, or the center points of the text candidate regions, a trend line may be fitted using linear regression (LR), thereby obtaining the gradient of the text label box (e.g., with respect to a reference axis such as the horizontal axis).
At block 930, rotation correction is performed on the text label box based on the determined inclination of the text label box. For example, in one example, if the text label box has a nonzero inclination, rotation correction is performed on it. In another example, rotation correction may be performed on the text label box when its inclination is greater than a predetermined angle.
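A minimal sketch of this correction, estimating the tilt from box center points; the 1-degree threshold is an illustrative value, not one from the patent, and the rotation sign may need flipping depending on the coordinate convention.

```python
import math

import cv2
import numpy as np

def correct_rotation(img, boxes, min_angle=1.0):
    xs = np.array([x + w / 2 for x, y, w, h in boxes])
    ys = np.array([y + h / 2 for x, y, w, h in boxes])
    slope, _ = np.polyfit(xs, ys, 1)           # trend line through centers
    angle = math.degrees(math.atan(slope))     # inclination of the label boxes
    if abs(angle) > min_angle:
        h, w = img.shape[:2]
        matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        img = cv2.warpAffine(img, matrix, (w, h))
    return img
```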
With the method shown in fig. 9, by determining the inclination of the text label box using the text feature points of each text candidate region in the text label box, rotation correction can be performed when the text in the OCR image is tilted.
Fig. 10 shows a block diagram of an example of a data annotation assistance device 1000 according to an embodiment of the present description. As shown in fig. 10, the data annotation assisting apparatus 1000 may include a text region detecting unit 1010, a first region dividing unit 1020, a second region dividing unit 1030, a representative text region determining unit 1040, and a text annotation box output unit 1050.
The text region detection unit 1010 is configured to perform text region detection on the OCR image data to obtain a first set of text candidate regions in the OCR image data. The operation of the text region detecting unit 1010 may refer to the operation of the block 110 described above with reference to fig. 1.
The first region dividing unit 1020 is configured to divide the first text candidate region set into a second text candidate region set and a third text candidate region set based on the region height of the text candidate region. Here, the region height of the second text candidate region is not smaller than the average region height of the first text candidate region set, and the region height of the third text candidate region is smaller than the average region height of the first text candidate region set. The operation of the first region division unit 1020 may refer to the operation of the block 120 described above with reference to fig. 1 and the operation described with reference to fig. 5.
The second region dividing unit 1030 is configured to divide the second text candidate region set into a fourth text candidate region set and a fifth text candidate region set based on a coincidence relation of the text candidate regions. Here, the fourth text candidate region is a text candidate region in which there is no overlap with other text candidate regions, and the fifth text candidate region is a text candidate region in which there is partial overlap with other text candidate regions. The operation of the second region dividing unit 1030 may refer to the operation of the block 130 described above with reference to fig. 1.
The representative text region determining unit 1040 is configured to determine a representative text candidate region from each group of overlapping text candidate regions in the fifth set of text candidate regions to obtain a sixth set of text candidate regions. The operation of the representative text region determining unit 1040 may refer to the operation of the block 140 described above with reference to fig. 1 and the operation described with reference to fig. 7.
The text label box output unit 1050 is configured to output the text candidate regions in the fourth text candidate region set and the sixth text candidate region set as the text label box. The operation of the text label box output unit 1050 may refer to the operation of block 150 described above with reference to fig. 1.
Fig. 11 illustrates a block diagram of one example of a representative text region determining unit 1100 according to an embodiment of the present specification. As shown in fig. 11, the representative text region determining unit 1100 may include a standard region slope determining module 1110 and a representative text region determining module 1120.
The standard region slope determination module 1110 is configured to determine an overall region slope of the fourth set of text candidate regions as the standard region slope. The operation of the standard region slope determination module 1110 may refer to the operation of block 710 described above with reference to fig. 7.
The representative text region determining module 1120 is configured to, for each group of the coincident text candidate regions, respectively calculate a region slope between each text candidate region in the group of the text candidate regions and a nearest fourth text candidate region, and determine the text candidate region having the smallest difference between the calculated region slope and the standard region slope as a representative text candidate region of the group of the coincident text candidate regions. The operations of the representative text region determination module 1120 may refer to the operations of blocks 720 through 760 described above with reference to fig. 7.
Further, optionally, in an example, the data annotation assisting apparatus 1000 may further include a region merging processing unit (not shown). The region merging processing unit is configured to perform overlap-merge processing on a first text candidate region in the first text candidate region set before dividing the first text candidate region set into a second text candidate region set and a third text candidate region set.
Further, optionally, in an example, the data annotation assisting apparatus 1000 may further include a region removal unit (not shown). The region removing unit is configured to remove, from the second set of text candidate regions, a text candidate region having a region slope larger than a predetermined threshold before dividing the second set of text candidate regions into a fourth set of text candidate regions and a fifth set of text candidate regions.
Further, optionally, in an example, the data annotation assisting apparatus 1000 may further include a neighbor area searching unit (not shown) and an area adding unit (not shown). The neighbor region searching unit is configured to search out neighbor regions of respective third text candidate regions in the third text candidate region set from the second text candidate region set. The region adding unit is configured to add a third text candidate region, which coincides with a neighboring region, to the second text candidate region set.
Further, optionally, in one example, the data annotation assisting apparatus 1000 may further include a text feature point extracting unit, an inclination determining unit, and an annotation frame correcting unit.
The text feature point extracting unit is configured to extract text feature points of the respective text candidate regions in the text labeling box. The operation of the text feature point extracting unit may refer to the operation of block 910 described above with reference to fig. 9.
The inclination determining unit is configured to determine the inclination of the text labeling box based on the extracted text feature points of the respective text candidate regions. The operation of the inclination determination unit may refer to the operation of the block 920 described above with reference to fig. 9.
And the marking frame correcting unit is configured to perform rotation correction on the text marking frame according to the determined inclination of the text marking frame. The operation of the label box correction unit may refer to the operation of block 930 described above with reference to fig. 9.
Further, optionally, in one example, the data labeling auxiliary device 1000 may further include a binarization processing unit (not shown). The binarization processing unit is configured to perform binarization processing on the OCR image data.
As described above with reference to fig. 1 to 11, the data annotation assist method and the data annotation assist device according to the embodiments of the present specification are described. The above data annotation auxiliary device can be implemented by hardware, or can be implemented by software, or a combination of hardware and software.
Figure 12 illustrates a schematic diagram of an electronic device for assisting in annotation of OCR image data in accordance with an embodiment of the present description. As shown in fig. 12, the electronic device 1200 may include at least one processor 1210, a storage (e.g., a non-volatile storage) 1220, a memory 1230, and a communication interface 1240, and the at least one processor 1210, the storage 1220, the memory 1230, and the communication interface 1240 are connected together via a bus 1260. The at least one processor 1210 executes at least one computer-readable instruction (i.e., an element described above as being implemented in software) stored or encoded in the memory.
In one embodiment, computer-executable instructions are stored in the memory that, when executed, cause the at least one processor 1210 to: performing text region detection on the OCR image data to obtain a first text candidate region set in the OCR image data; dividing the first text candidate region set into a second text candidate region set and a third text candidate region set based on the region height of the text candidate regions, wherein the region height of the second text candidate region is not less than the average region height of the first text candidate region set, and the region height of the third text candidate region is less than the average region height of the first text candidate region set; dividing the second text candidate region set into a fourth text candidate region set and a fifth text candidate region set based on the coincidence relation of the text candidate regions, wherein the fourth text candidate region is a text candidate region which is not coincided with other text candidate regions, and the fifth text candidate region is a text candidate region which is partially coincided with other text candidate regions; determining a representative text candidate region from each group of overlapped text candidate regions in the fifth text candidate region set to obtain a sixth text candidate region set; and outputting the text candidate areas in the fourth text candidate area set and the sixth text candidate area set as text labeling boxes.
It should be appreciated that the computer-executable instructions stored in the memory, when executed, cause the at least one processor 1210 to perform the various operations and functions described above in connection with fig. 1-11 in the various embodiments of the present description.
According to one embodiment, a program product, such as a machine-readable medium (e.g., a non-transitory machine-readable medium), is provided. A machine-readable medium may have instructions (i.e., elements described above as being implemented in software) that, when executed by a machine, cause the machine to perform various operations and functions described above in connection with fig. 1-11 in the various embodiments of the present specification. Specifically, a system or apparatus may be provided which is provided with a readable storage medium on which software program code implementing the functions of any of the above embodiments is stored, and causes a computer or processor of the system or apparatus to read out and execute instructions stored in the readable storage medium.
In this case, the program code itself read from the readable medium can realize the functions of any of the above-described embodiments, and thus the machine-readable code and the readable storage medium storing the machine-readable code form part of the present invention.
Examples of the readable storage medium include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROMs, CD-R, CD-RWs, DVD-ROMs, DVD-RAMs, DVD-RWs), magnetic tapes, nonvolatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer or from the cloud via a communications network.
It will be understood by those skilled in the art that various changes and modifications may be made in the above-disclosed embodiments without departing from the spirit of the invention. Accordingly, the scope of the invention should be determined from the following claims.
It should be noted that not all steps and units in the above flows and system structure diagrams are necessary, and some steps or units may be omitted according to actual needs. The execution order of the steps is not fixed, and can be determined as required. The apparatus structures described in the above embodiments may be physical structures or logical structures, that is, some units may be implemented by the same physical entity, or some units may be implemented by a plurality of physical entities, or some units may be implemented by some components in a plurality of independent devices.
In the above embodiments, the hardware units or modules may be implemented mechanically or electrically. For example, a hardware unit, module or processor may comprise permanently dedicated circuitry or logic (such as a dedicated processor, FPGA or ASIC) to perform the corresponding operations. The hardware units or processors may also include programmable logic or circuitry (e.g., a general purpose processor or other programmable processor) that may be temporarily configured by software to perform the corresponding operations. The specific implementation (mechanical, or dedicated permanent, or temporarily set) may be determined based on cost and time considerations.
The detailed description set forth above in connection with the appended drawings describes exemplary embodiments but does not represent all embodiments that may be practiced or fall within the scope of the claims. The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration," and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described technology. However, the techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. A method for assisting OCR image data annotation comprising:
performing text region detection on OCR image data to obtain a first text candidate region set in the OCR image data;
dividing the first text candidate region set into a second text candidate region set and a third text candidate region set based on the region height of the text candidate regions, wherein the region height of the second text candidate region is not less than the average region height of the first text candidate region set, and the region height of the third text candidate region is less than the average region height of the first text candidate region set;
dividing the second text candidate region set into a fourth text candidate region set and a fifth text candidate region set based on the coincidence relation of the text candidate regions, wherein the fourth text candidate region is a text candidate region which does not coincide with other text candidate regions, and the fifth text candidate region is a text candidate region which partially coincides with other text candidate regions;
determining a representative text candidate region from each group of overlapped text candidate regions in the fifth text candidate region set to obtain a sixth text candidate region set; and
outputting the text candidate regions in the fourth text candidate region set and the sixth text candidate region set as text label boxes,
determining a representative text candidate region from each group of overlapped text candidate regions in the fifth text candidate region set, and obtaining a sixth text candidate region set includes:
determining the overall region slope of the fourth text candidate region set as a standard region slope;
for each set of the coincident text candidate regions,
respectively calculating the region slope between each text candidate region in the group of text candidate regions and the fourth text candidate region with the nearest distance, wherein the region slope between the two text candidate regions is represented by the inclination angle of the trend line fitted by the text feature points extracted based on the two text candidate regions, and
and determining the text candidate area with the minimum difference between the calculated area slope and the standard area slope as a representative text candidate area of the group of overlapped text candidate areas.
2. The method of claim 1, wherein prior to dividing the first set of text candidate regions into a second set of text candidate regions and a third set of text candidate regions, the method further comprises:
and overlapping and combining the first text candidate region in the first text candidate region set.
3. The method of claim 1, wherein prior to dividing the second set of text candidate regions into a fourth set of text candidate regions and a fifth set of text candidate regions, the method further comprises:
removing text candidate regions having a region slope greater than a predetermined threshold from the second set of text candidate regions.
4. The method of claim 1, wherein prior to dividing the second set of text candidate regions into a fourth set of text candidate regions and a fifth set of text candidate regions, the method further comprises:
searching out neighbor areas of each third text candidate area in the third text candidate area set from the second text candidate area set; and
and adding a third text candidate region which is overlapped with the neighbor region to the second text candidate region set.
5. The method of claim 1, further comprising:
extracting text characteristic points of each text candidate area in the text labeling box;
determining the inclination of the text labeling box based on the extracted text feature points of the text candidate areas; and
and performing rotation correction on the text labeling box according to the inclination of the text labeling box.
6. The method of claim 1, further comprising:
and carrying out binarization processing on the OCR image data.
7. An apparatus for assisting OCR image data annotation comprising:
the text region detection unit is used for detecting text regions of OCR image data to obtain a first text candidate region set in the OCR image data;
the first region dividing unit divides the first text candidate region set into a second text candidate region set and a third text candidate region set based on the region height of the text candidate region, wherein the region height of the second text candidate region is not less than the average region height of the first text candidate region set, and the region height of the third text candidate region is less than the average region height of the first text candidate region set;
the second region dividing unit is used for dividing the second text candidate region set into a fourth text candidate region set and a fifth text candidate region set based on the coincidence relation of the text candidate regions, wherein the fourth text candidate region is a text candidate region which does not coincide with other text candidate regions, and the fifth text candidate region is a text candidate region which partially coincides with other text candidate regions;
a representative text region determining unit that determines a representative text candidate region from each group of the coincident text candidate regions in the fifth text candidate region set to obtain a sixth text candidate region set; and
a text label box output unit that outputs the text candidate areas in the fourth text candidate area set and the sixth text candidate area set as a text label box,
wherein the representative text region determining unit includes:
a standard region slope determining module configured to determine the overall region slope of the fourth text candidate region set as the standard region slope; and
a representative text region determining module configured to calculate, for each text candidate region in a group of coincident text candidate regions, the region slope between that text candidate region and the fourth text candidate region nearest to it, wherein the region slope between two text candidate regions is represented by the inclination angle of a trend line fitted to the text feature points extracted from the two text candidate regions, and to determine the text candidate region whose calculated region slope differs least from the standard region slope as the representative text candidate region of the group of coincident text candidate regions.
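For the first region dividing unit of claim 7, the height-based split is straightforward; the following sketch assumes (x1, y1, x2, y2) boxes and is illustrative only.

```python
def divide_by_height(first_set):
    # Split the first set at the average region height, per claim 7:
    # second set >= average height, third set < average height.
    heights = [y2 - y1 for (_x1, y1, _x2, y2) in first_set]
    avg_h = sum(heights) / len(heights)
    second = [b for b, h in zip(first_set, heights) if h >= avg_h]
    third = [b for b, h in zip(first_set, heights) if h < avg_h]
    return second, third

# Example: a tall title box goes to the second set, a short footnote
# box to the third set.
second, third = divide_by_height([(0, 0, 100, 30), (0, 40, 100, 48)])
```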
8. The apparatus of claim 7, further comprising:
a region merging processing unit configured to perform overlap merging processing on the first text candidate regions in the first text candidate region set before the first text candidate region set is divided into the second text candidate region set and the third text candidate region set.
9. The apparatus of claim 7, further comprising:
a region removing unit configured to remove, from the second text candidate region set, text candidate regions whose region slope is greater than a predetermined threshold before the second text candidate region set is divided into the fourth text candidate region set and the fifth text candidate region set.
10. The apparatus of claim 7, further comprising:
a neighbor region searching unit configured to search the second text candidate region set for a neighbor region of each third text candidate region in the third text candidate region set; and
a region adding unit configured to add each third text candidate region that coincides with its neighbor region to the second text candidate region set.
11. The apparatus of claim 7, further comprising:
a text feature point extraction unit configured to extract text feature points of each text candidate region in the text label box;
an inclination determining unit configured to determine the inclination of the text label box based on the extracted text feature points of each text candidate region; and
a label box correcting unit configured to perform rotation correction on the text label box according to its inclination.
12. The apparatus of claim 7, further comprising:
a binarization processing unit configured to perform binarization processing on the OCR image data.
13. An electronic device, comprising:
at least one processor, and
a memory coupled with the at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method of any of claims 1-6.
14. A machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the method of any of claims 1 to 6.
CN202010304296.1A 2020-04-17 2020-04-17 Method and device for assisting in labeling OCR image data Active CN111461132B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010304296.1A CN111461132B (en) 2020-04-17 2020-04-17 Method and device for assisting in labeling OCR image data

Publications (2)

Publication Number Publication Date
CN111461132A CN111461132A (en) 2020-07-28
CN111461132B (en) 2022-05-10

Family

ID=71682541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010304296.1A Active CN111461132B (en) 2020-04-17 2020-04-17 Method and device for assisting in labeling OCR image data

Country Status (1)

Country Link
CN (1) CN111461132B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733639B (en) * 2020-12-28 2023-01-06 贝壳技术有限公司 Text information structured extraction method and device
CN112926590B (en) * 2021-03-18 2023-12-01 上海晨兴希姆通电子科技有限公司 Segmentation recognition method and system for characters on cable

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10032072B1 (en) * 2016-06-21 2018-07-24 A9.Com, Inc. Text recognition and localization with deep learning
CN109492635A (en) * 2018-09-20 2019-03-19 第四范式(北京)技术有限公司 Obtain method, apparatus, equipment and the storage medium of labeled data
CN109635718A * 2018-12-10 2019-04-16 科大讯飞股份有限公司 A kind of text region division method, device, equipment and storage medium
CN110263779A * 2018-03-19 2019-09-20 腾讯科技(深圳)有限公司 Text region detection method and device, text detection method, computer-readable medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8594422B2 (en) * 2010-03-11 2013-11-26 Microsoft Corporation Page layout determination of an image undergoing optical character recognition

Also Published As

Publication number Publication date
CN111461132A (en) 2020-07-28

Similar Documents

Publication Publication Date Title
KR101617681B1 (en) Text detection using multi-layer connected components with histograms
Greenhalgh et al. Recognizing text-based traffic signs
US5907631A (en) Document image processing method and system having function of determining body text region reading order
Clark et al. Rectifying perspective views of text in 3D scenes using vanishing points
CN105528614B A kind of cartoon image layout recognition method and automatic recognition system
JP4189506B2 (en) Apparatus, method and recording medium for image processing
CN111461132B (en) Method and device for assisting in labeling OCR image data
JP2009252115A (en) Image extraction device and image extraction program
WO2009114967A1 (en) Motion scan-based image processing method and device
Peng et al. Text extraction from video using conditional random fields
Zhang et al. A combined algorithm for video text extraction
Phan et al. Recognition of video text through temporal integration
CN108256518B (en) Character area detection method and device
JP5322517B2 (en) Image processing apparatus and method
CN112686265A (en) Hierarchic contour extraction-based pictograph segmentation method
CN116416624A (en) Document electronization method and device based on layout correction and storage medium
Karanje et al. Survey on text detection, segmentation and recognition from a natural scene images
CN112580594A (en) Document identification method and device, computer equipment and storage medium
CN116030472A (en) Text coordinate determining method and device
Roullet et al. An automated technique to recognize and extract images from scanned archaeological documents
Valiente et al. A process for text recognition of generic identification documents over cloud computing
JP5439069B2 (en) Character recognition device and character recognition method
Nor et al. Image segmentation and text extraction: application to the extraction of textual information in scene images
Chethan et al. Graphics separation and skew correction for mobile captured documents and comparative analysis with existing methods
WO2022056875A1 (en) Method and apparatus for segmenting nameplate image, and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant