CN116824594B

CN116824594B - Text ordering method for positioning keywords in image

Info

Publication number: CN116824594B
Application number: CN202310834541.3A
Authority: CN
Inventors: 韦箫华
Original assignee: Guangdong Xike Intelligent Technology Co ltd
Current assignee: Guangdong Xike Intelligent Technology Co ltd
Priority date: 2023-07-10
Filing date: 2023-07-10
Publication date: 2024-04-26
Anticipated expiration: 2043-07-10
Also published as: CN116824594A

Abstract

The invention discloses a text ordering method for positioning keywords in images, which relates to the technical field of machine vision and comprises the following steps: a. sorting all extracted connected domains once from left to right along the X direction and storing the result in a vector R1; b. taking all connected domains out of R1 and storing them in a vector R0, leaving R1 empty; c. traversing R0 in forward order and, for each connected domain of R0, judging its relative position against the connected domains of R1 in reverse order; a connected domain that meets the conditions is taken out of R0 and put into R1, otherwise step d is executed; d. judging the relative position of the connected domain of R0 against the last connected domain of the vector DR; a connected domain that meets the conditions is taken out of R0 and put into DR, otherwise it is put into R1. The invention greatly reduces the probability of missed detection and false detection of keywords.

Description

Text ordering method for positioning keywords in image

Technical Field

The invention relates to the technical field of machine vision, in particular to a text ordering method for positioning keywords in images.

Background

Locating keywords in an image usually relies on OCR technology. As a step in OCR preprocessing, the ordering of the connected domains plays a very important role in the accuracy of keyword positioning: in this step, the connected domains need to be put into text order. Text ordering means ordering the center coordinates of the connected domains from left to right and then from top to bottom, which matches the order in which people read.

The existing method (the 'character' ordering mode of the sort_region operator in Halcon) generally splits all connected domains into rows as follows: as long as the smallest circumscribed rectangles of two connected domains meet a set overlap percentage in the Y-axis direction, the connected domains belong to the same row; the connected domains of each row are then ordered from left to right. This method can only handle images with a regular layout, but in practical applications the text layout on an image is often irregular, and the method then easily produces the errors shown in fig. 2 and 3 and in fig. 4 and 5.

In fig. 2, the smallest circumscribed rectangle of the character "D" overlaps those of the characters "D" and "T" of the word "DOT" in the Y-axis direction, but does not overlap that of the character "O" of "DOT". The OCR recognition result is therefore "CORDDT" instead of the expected "CORDDOT", and the keyword "DOT" cannot be extracted correctly from the incorrect result. In fig. 4, the smallest circumscribed rectangle of "Y" overlaps that of the character "5" in the Y-axis direction, so the final recognition result is "2Y5" instead of the expected "2Y156", and the keyword "156" cannot be extracted correctly from the incorrect result.
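The row-splitting rule of the existing method can be illustrated with a minimal Python sketch. It is not Halcon's implementation: the `Box` type, the overlap measure (overlap relative to the shorter box), the comparison against the first box of each row, and the 0.5 threshold are all assumptions chosen only to show how a fixed Y-overlap percentage behaves.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box; the image Y axis is assumed to point down."""
    label: str
    x: float      # left edge
    y_min: float  # top edge
    y_max: float  # bottom edge


def y_overlap_ratio(a: Box, b: Box) -> float:
    """Overlap of the two Y intervals, relative to the shorter box (assumed measure)."""
    overlap = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    shorter = min(a.y_max - a.y_min, b.y_max - b.y_min)
    return max(0.0, overlap) / shorter if shorter > 0 else 0.0


def baseline_order(boxes: List[Box], min_overlap: float = 0.5) -> List[str]:
    """Group boxes into rows by Y overlap with each row's first box,
    then sort every row from left to right."""
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b.y_min):
        for row in rows:
            if y_overlap_ratio(row[0], box) >= min_overlap:
                row.append(box)
                break
        else:
            # No existing row overlaps enough, so the box starts a new row.
            rows.append([box])
    return [b.label for row in rows for b in sorted(row, key=lambda b: b.x)]
```

With a regular two-row layout this sketch returns the reading order. If the middle character of a three-character row is shifted down so that its overlap falls below the threshold, for example `Box("X", 0, 0, 10)`, `Box("Y", 12, 7, 17)`, `Box("Z", 24, 0, 10)`, it returns `X`, `Z`, `Y`, which is the kind of error described for fig. 2 and fig. 4.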

To avoid these errors, the invention provides a method for ordering the connected domains of an image, which greatly improves the accuracy of positioning keywords in images.

Disclosure of Invention

Aiming at the defects existing in the prior art, the invention aims to provide a text ordering method for positioning keywords in images.

In order to achieve the above purpose, the present invention provides the following technical solutions:

A text ordering method for locating keywords in an image, comprising the steps of:

Step 1: after all connected domains in the image are extracted, sort them from left to right along the X axis by the X coordinate of the upper-left corner of each connected domain's smallest circumscribed rectangle, and store the sorted result in order in a vector R1;

Step 2: take all connected domains out of R1 and store them in a vector R0, so that R1 becomes an empty vector; take the first connected domain out of R0 and store it in DR as the reference connected domain of a new row;

Step 3: traverse R0 in order of increasing index with loop index j, where R0[j] denotes the j-th connected domain in the vector R0, and obtain the position parameters of R0[j]: min_j, max_j, cy_j, cx_j and up_j;

Step 4: when the traversal of R0 reaches the j-th connected domain, traverse R1 in order of decreasing index with loop index k, and compare the relative position of R0[j] with each connected domain R1[k] in R1 that meets condition three; if conditions one and two are also met while condition three is met, a connected domain belonging to the same row as R0[j] exists in R1, so R0[j] is stored in R1 and the method returns to step 3 to traverse the next connected domain in R0, namely R0[j+1]; if R1 has been fully traversed without conditions one, two and three being met simultaneously, step 5 is executed;

Step 5: fetch the last connected domain of the vector DR, namely the connected domain most recently stored in DR, denoted DR[end], and obtain its position parameters: min_e, max_e, cy_e and cx_e; compare the relative position of the connected domain R0[j] that was not stored in R1 in step 4 with DR[end]; if conditions four, five and six are met simultaneously, store R0[j] in DR, otherwise store R0[j] in R1;

Step 6: judge whether the traversal of R0 is finished; if not, return to step 3 and continue traversing; otherwise judge whether R1 is empty. If R1 is empty, all connected domains have been ordered and DR holds all ordered connected domains; if R1 is not empty, some connected domains have not yet been ordered, so return to step 2 and continue until R1 is judged to be empty in step 6.

Further, R0 is initialized to empty and stores the connected domains that have not yet been ordered; R1 is initialized to the connected domains sorted from left to right along the X axis and is subsequently used to temporarily store connected domains that do not yet conform to the ordering rule; DR stores the finally output connected domains.

Further, cx_k denotes the center X coordinate of the smallest circumscribed rectangle of the k-th connected domain; cy_k denotes the center Y coordinate of that rectangle; max_k denotes its maximum Y coordinate; min_k denotes its minimum Y coordinate; up_k denotes the midpoint between the center point and the upper boundary of that rectangle; H denotes the distance between up_k and min_k; W denotes the maximum allowed distance between the center-point X coordinates of the two connected domains whose relative position is being judged. W must be set manually according to the actual situation and limits the comparison range.

Further, min_j denotes the minimum Y coordinate of the smallest circumscribed rectangle of the j-th connected domain in R0, and the other parameters are defined analogously.

Further, min_e denotes the minimum Y coordinate of the smallest circumscribed rectangle of DR[end], and the other parameters are defined analogously.
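The control flow of steps 1 to 6 can be summarized in a short Python sketch. The text does not spell out conditions one to six (they are defined with reference to the figures), so the sketch takes them as two caller-supplied predicates: `matches_pending` stands for conditions one, two and three together, and `matches_row_end` stands for conditions four, five and six together. Both names, the `left_x` accessor and the function name are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def order_connected_domains(
    domains: Sequence[T],
    left_x: Callable[[T], float],
    matches_pending: Callable[[T, T], bool],
    matches_row_end: Callable[[T, T], bool],
) -> List[T]:
    """Sketch of the R0 / R1 / DR loop; the predicates are assumptions."""
    # Step 1: sort by the X coordinate of the upper-left corner.
    r1: List[T] = sorted(domains, key=left_x)
    dr: List[T] = []
    # Step 6: repeat passes until R1 is empty.
    while r1:
        # Step 2: move everything to R0, leaving R1 empty;
        # the first domain of R0 becomes the reference of a new row.
        r0, r1 = r1, []
        dr.append(r0[0])
        # Step 3: forward traversal of the remaining domains of R0.
        for cand in r0[1:]:
            # Step 4: reverse traversal of R1, looking for a deferred
            # domain that belongs to the same row as the candidate.
            if any(matches_pending(cand, other) for other in reversed(r1)):
                r1.append(cand)
            # Step 5: otherwise compare with the last domain stored in DR.
            elif matches_row_end(cand, dr[-1]):
                dr.append(cand)
            else:
                r1.append(cand)
    return dr
```

Each pass moves at least the first connected domain of R0 into DR, so the loop terminates. Because candidates are appended to R1 in the order in which R0 is traversed, R1 stays sorted by X between passes.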

Compared with the prior art, the invention has the following beneficial effects:

Taking fig. 7 as an example, the final row segmentation obtained with the method of the invention is shown in fig. 14, where connected domains of the same color belong to the same row and connected domains of different colors belong to different rows. As can be seen from fig. 15 and fig. 16, the invention avoids the erroneous row segmentation that the existing method produces for fig. 2 and fig. 4: every connected domain of the keyword "DOT" in fig. 2 belongs to the same row, as does every connected domain of the keyword "156" in fig. 4, which helps extract the required keywords correctly from the subsequent OCR recognition result.

Drawings

FIG. 1 is a flow chart of the text ordering method for locating keywords in an image according to the present invention;

FIG. 2 is sample diagram one of an image text layout in the prior art;

FIG. 3 shows the result of row segmentation of sample diagram one by the prior-art method;

FIG. 4 is sample diagram two of an image text layout in the prior art;

FIG. 5 shows the result of row segmentation of sample diagram two by the prior-art method;

FIG. 6 shows the definitions of the position parameters of the smallest circumscribed rectangle of a connected domain;

FIG. 7 shows the result of sorting the smallest circumscribed rectangles of the connected domains along the X axis;

FIG. 8 shows conditions one, two and three, which must be satisfied for a connected domain to be ordered and an unordered connected domain to belong to the same row;

FIG. 9 shows erroneous row-split case one;

FIG. 10 shows erroneous row-split case one corrected to a correct row split;

FIG. 11 shows conditions four, five and six, which must be satisfied for a connected domain to be ordered and an unordered connected domain to belong to the same row;

FIG. 12 shows erroneous row-split case two;

FIG. 13 shows erroneous row-split case two corrected to a correct row split;

FIG. 14 shows the result of ordering the connected domains of FIG. 7 according to the present invention;

FIG. 15 shows the result of ordering the connected domains of FIG. 3 according to the present invention;

FIG. 16 shows the result of ordering the connected domains of FIG. 5 according to the present invention.

Detailed Description

Three vectors, named R0, R1 and DR, are defined, with the following roles:

R0: initialized to empty; stores the connected domains that have not yet been ordered.

R1: initialized to the connected domains sorted from left to right along the X axis; subsequently used to temporarily store connected domains that do not yet conform to the ordering rule.

DR: stores the finally output connected domains.

The variables describing the position of the smallest circumscribed rectangle of a connected domain are defined as follows:

cx_k: the center X coordinate of the smallest circumscribed rectangle of the k-th connected domain.

cy_k: the center Y coordinate of the smallest circumscribed rectangle of the k-th connected domain.

max_k: the maximum Y coordinate of the smallest circumscribed rectangle of the k-th connected domain.

min_k: the minimum Y coordinate of the smallest circumscribed rectangle of the k-th connected domain.

up_k: the midpoint between the center point and the upper boundary of the smallest circumscribed rectangle of the k-th connected domain.

H: the distance between up_k and min_k.

W: the maximum allowed distance between the center-point X coordinates of the two connected domains whose relative position is being judged. This parameter must be set manually according to the actual situation and limits the comparison range.
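The position variables defined above can be derived from an axis-aligned bounding box, as the following Python sketch shows. It assumes image coordinates with the origin at the top-left and the Y axis pointing down, so that the upper boundary is the minimum Y coordinate, and it reads up_k as a Y coordinate. The `BoxParams` type and the `box_params` and `within_w` helpers are hypothetical names; in particular, using W as a simple absolute-difference check is an assumption about how the comparison range is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxParams:
    cx: float     # center X of the smallest circumscribed rectangle
    cy: float     # center Y
    min_y: float  # minimum Y (upper boundary, Y axis assumed to point down)
    max_y: float  # maximum Y (lower boundary)
    up: float     # midpoint between the center and the upper boundary
    h: float      # H: distance between up and min_y


def box_params(x: float, y: float, width: float, height: float) -> BoxParams:
    """Derive the position variables from a box given as (x, y, width, height)."""
    min_y = y
    max_y = y + height
    cx = x + width / 2
    cy = y + height / 2
    up = (cy + min_y) / 2
    return BoxParams(cx=cx, cy=cy, min_y=min_y, max_y=max_y, up=up, h=up - min_y)


def within_w(a: BoxParams, b: BoxParams, w: float) -> bool:
    """Assumed use of W: only compare domains whose center X values are at most w apart."""
    return abs(a.cx - b.cx) <= w
```

Under these assumptions H is always one quarter of the rectangle height, since up_k lies halfway between the center and the upper boundary.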

Referring to fig. 1 to 16, a text ordering method for locating keywords in an image includes the following steps:

Step 1: after all connected domains in the image are extracted, they are sorted from left to right along the X axis by the X coordinate of the upper-left corner of their smallest circumscribed rectangles, as shown in fig. 7. The sorted result is stored in order in the vector R1.

Step 2: at this point the connected domains in R1 do not yet satisfy the final ordering rule. All connected domains in R1 are taken out and stored in a vector R0, so that R1 becomes an empty vector. The first connected domain of R0 (i.e., R0[0]) is taken out and stored in DR as the reference connected domain of a new row.

Step 3: R0 is traversed in order of increasing index with loop index j, where R0[j] denotes the j-th connected domain in the vector R0. The position parameters of R0[j] are obtained: min_j, max_j, cy_j, cx_j and up_j (where min_j denotes the minimum Y coordinate of the smallest circumscribed rectangle of the j-th connected domain in R0, and the other parameters are defined analogously).

Step 4: when the traversal of R0 reaches any connected domain (assume here the j-th, i.e., R0[j]), R1 is traversed in order of decreasing index with loop index k, and the relative position of R0[j] is compared with each connected domain in R1 that satisfies condition three (assume here R1[k]), as shown in fig. 8. If conditions one and two are also satisfied while condition three is satisfied, a connected domain belonging to the same row as R0[j] exists in R1, so R0[j] is stored in R1. This step prevents the erroneous row split shown in fig. 9: connected domains c and a satisfy the same-row condition, but c and b also satisfy it, while b was judged not to belong to the same row as a and has already been stored in R1. In this case c should also be stored in R1, so that b and c can be placed in the same row at the next judgment, as shown in fig. 10. If R1 has been fully traversed without conditions one, two and three being satisfied simultaneously, step 5 is executed; otherwise the method returns to step 3 and traverses the next connected domain in R0, namely R0[j+1].
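The reverse traversal of step 4 can be sketched in Python as follows. Conditions one, two and three are passed in as predicates because their exact form is given only with reference to fig. 8; the `Dom` type and the three stand-in predicates (`near_in_x`, `not_far_below`, `not_far_above`) with their thresholds are purely illustrative assumptions, not the patent's conditions.

```python
from typing import Callable, NamedTuple, Optional, Sequence


class Dom(NamedTuple):
    """Hypothetical connected domain reduced to a label and its box center."""
    label: str
    cx: float
    cy: float


Pred = Callable[[Dom, Dom], bool]


def find_same_row_in_r1(
    cand: Dom, r1: Sequence[Dom], cond_one: Pred, cond_two: Pred, cond_three: Pred
) -> Optional[int]:
    """Traverse R1 from the largest index to the smallest and return the index k
    of the first R1[k] for which conditions three, one and two all hold."""
    for k in range(len(r1) - 1, -1, -1):
        other = r1[k]
        if not cond_three(cand, other):
            continue  # condition three gates the comparison
        if cond_one(cand, other) and cond_two(cand, other):
            return k
    return None


# Stand-in predicates (assumptions for illustration only).
W = 15.0    # assumed comparison range on center X
TOL = 6.0   # assumed tolerance on center Y


def near_in_x(a: Dom, b: Dom) -> bool:
    return abs(a.cx - b.cx) <= W


def not_far_below(a: Dom, b: Dom) -> bool:
    return a.cy - b.cy <= TOL


def not_far_above(a: Dom, b: Dom) -> bool:
    return b.cy - a.cy <= TOL
```

For a layout like fig. 9 with `b = Dom("b", 10, 20)` already in R1 and candidate `c = Dom("c", 20, 15)`, the call `find_same_row_in_r1(c, [b], not_far_below, not_far_above, near_in_x)` returns index 0, so c would be stored in R1 together with b rather than being compared with the current row in DR.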

Step 5: the last connected domain of the vector DR, i.e., the connected domain most recently stored in DR (denoted DR[end]), is fetched. The position parameters of DR[end] are obtained: min_e, max_e, cy_e and cx_e (where min_e denotes the minimum Y coordinate of the smallest circumscribed rectangle of DR[end], and the other parameters are defined analogously). The relative position of the connected domain R0[j] that was not stored in R1 in step 4 is compared with DR[end], as shown in fig. 11. If conditions four, five and six are satisfied simultaneously, R0[j] is stored in DR; otherwise R0[j] is stored in R1. This step achieves the result shown in fig. 10, placing b and c in the same row and a in another row. It also avoids the erroneous row split shown in fig. 12: connected domains a and b satisfy the same-row condition while c and a do not; step 5 judges the relative position of c and b, finds that c and b satisfy the same-row condition, and a, b and c are therefore all assigned to the same row, as shown in fig. 13.
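The effect of comparing with DR[end] rather than with the first connected domain of the row can be seen in a small Python sketch. Conditions four, five and six are passed as a list of predicates that must all hold; the single stand-in predicate `close_in_y` and its tolerance are assumptions for illustration, as are the `Dom` type and the `place_candidate` name.

```python
from typing import Callable, List, NamedTuple, Sequence


class Dom(NamedTuple):
    """Hypothetical connected domain reduced to a label and its box center."""
    label: str
    cx: float
    cy: float


def place_candidate(
    cand: Dom,
    dr: List[Dom],
    r1: List[Dom],
    conditions: Sequence[Callable[[Dom, Dom], bool]],
) -> str:
    """Step 5 sketch: compare the candidate with DR[end] and store it in DR
    if every supplied condition holds, otherwise defer it to R1."""
    last = dr[-1]  # DR[end], the most recently stored connected domain
    if all(cond(cand, last) for cond in conditions):
        dr.append(cand)
        return "DR"
    r1.append(cand)
    return "R1"


def close_in_y(a: Dom, b: Dom, tol: float = 5.0) -> bool:
    """Stand-in for conditions four to six (assumption): centers close in Y."""
    return abs(a.cy - b.cy) <= tol
```

With a gently sloping row such as `a = Dom("a", 0, 10)`, `b = Dom("b", 10, 14)`, `c = Dom("c", 20, 18)`, c fails the stand-in test against a but passes it against b. Because b is DR[end] by the time c is placed, all three end up in DR in the same row, which mirrors the correction from fig. 12 to fig. 13.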

The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples; all technical solutions falling under the concept of the present invention belong to its protection scope. It should be noted that modifications and adaptations made by those skilled in the art without departing from the principles of the present invention are also to be regarded as falling within the protection scope of the present invention.

The foregoing describes one embodiment of the present invention in detail, but the description is only a preferred embodiment of the present invention and should not be construed as limiting the scope of the invention. All equivalent changes and modifications within the scope of the present invention are intended to be covered by the present invention.

Claims

1. A text ordering method for locating keywords in an image, comprising the steps of:

Step 3: traversing R0 in order of increasing index with loop index j, wherein R0[j] denotes the j-th connected domain in the vector R0, and acquiring the position parameters of R0[j]: min_j, max_j, cy_j, cx_j and up_j, wherein min_j denotes the minimum Y coordinate of the smallest circumscribed rectangle of the j-th connected domain in R0, max_j denotes the maximum Y coordinate of that rectangle, cx_j denotes the center X coordinate of that rectangle, cy_j denotes the center Y coordinate of that rectangle, and up_j denotes the midpoint between the center point and the upper boundary of that rectangle;

Step 5: fetching the last connected domain of the vector DR, namely the connected domain DR[end] most recently stored in DR, and acquiring the position parameters of DR[end]: min_e, max_e, cy_e and cx_e, wherein min_e denotes the minimum Y coordinate of the smallest circumscribed rectangle of DR[end], max_e denotes the maximum Y coordinate of that rectangle, cx_e denotes the center X coordinate of that rectangle, and cy_e denotes the center Y coordinate of that rectangle; comparing the relative position of the connected domain R0[j] that was not stored in R1 in step 4 with DR[end]; storing R0[j] in DR if conditions four, five and six are satisfied simultaneously, and otherwise storing R0[j] in R1;

2. The text ordering method for locating keywords in an image according to claim 1, wherein R0 is initialized to empty and stores the connected domains that have not yet been ordered, R1 is initialized to the connected domains sorted from left to right along the X axis and subsequently temporarily stores connected domains that do not conform to the ordering rule, and DR stores the finally output connected domains.

3. The text ordering method for locating keywords in an image according to claim 2, wherein cx_k denotes the center X coordinate of the smallest circumscribed rectangle of the k-th connected domain, cy_k denotes the center Y coordinate of that rectangle, max_k denotes the maximum Y coordinate of that rectangle, min_k denotes the minimum Y coordinate of that rectangle, up_k denotes the midpoint between the center point and the upper boundary of that rectangle, H denotes the distance between up_k and min_k, and W denotes the maximum allowed distance between the center-point X coordinates of the two connected domains whose relative position is being judged.