CN107943780B

CN107943780B - Layout column dividing method and device

Info

Publication number: CN107943780B
Application number: CN201711365896.3A
Authority: CN
Inventors: 胡雨隆; 胡金水
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2017-12-18
Filing date: 2017-12-18
Publication date: 2021-07-06
Anticipated expiration: 2037-12-18
Also published as: CN107943780A

Abstract

The invention discloses a layout column dividing method and device, wherein the method comprises the following steps: acquiring a text image to be column-divided; performing text line segmentation on the text image to obtain each text line; and merging the text lines into columns to obtain a final column division result. With the method and the device, accurate column division results can be obtained for complex handwritten pages.

Description

Layout column dividing method and device

Technical Field

The invention relates to the field of image processing, in particular to a layout column dividing method and device.

Background

With the development of computer science and technology, automatic information processing capabilities have improved markedly. Document digitization has spread to every aspect of people's life and work, has greatly changed how people work and live, and has profoundly influenced the field of education. Layout analysis and column division are essential steps in document digitization, and their accuracy directly affects the digitization result, so they have long attracted the attention of researchers in the field.

The existing mainstream method for document layout column division works as follows: on the assumption that most columns in a printed document are regular rectangles, the gaps between the columns are found by detecting and grouping rectangular blank areas, and a column division result is finally obtained.

This layout column dividing method is effective only on documents with neat typesetting and clear edges (printed documents being the typical example). It performs poorly on handwritten documents with disordered writing, multiple columns, and irregular distribution among the columns, especially handwritten solutions to mathematics and physics problems.

Disclosure of Invention

The embodiments of the invention provide a layout column dividing method and device, so that an ideal column division effect can be obtained for complex handwritten pages.

Therefore, the invention provides the following technical scheme:

a layout column dividing method, the method comprising:

acquiring a text image to be column-divided;

segmenting the text image to obtain each text line;

and merging the text lines into columns to obtain a final column division result.

Optionally, merging the text lines into columns to obtain a final column division result includes:

taking each text line as an independent text column, sequentially merging adjacent text columns, and calculating the merged column division cost to obtain the minimum column division cost;

and determining the final column division result according to the minimum column division cost.

Optionally, taking each text line as an independent text column, sequentially merging adjacent text columns, and calculating the merged column division cost to obtain the minimum column division cost includes:

(1) initialization: taking each text line as an independent text column, storing it into a text column set as the current column division result, and calculating the current column division cost;

(2) sequentially selecting each text column in the current column division result as the current text column for prediction: taking the text column obtained by merging the current text column with its adjacent text column as a predicted text column to obtain a predicted column division result, calculating the predicted column division cost corresponding to each predicted column division result, and obtaining the minimum predicted column division cost among them;

(3) judging whether the minimum predicted column division cost is less than the current column division cost;

(4) if so, updating the current column division result to the predicted column division result corresponding to the minimum predicted column division cost, updating the current column division cost to the minimum predicted column division cost, and then executing step (2);

(5) otherwise, taking the current column division cost as the minimum column division cost;

determining the final column division result according to the minimum column division cost comprises:

taking the current column division result corresponding to the minimum column division cost as the final column division result.

Optionally, calculating the predicted column division cost includes:

calculating the intra-column cost of the predicted text column;

calculating the inter-column cost of the predicted text column;

and obtaining the predicted column division cost according to the intra-column cost and the inter-column cost.

Optionally, calculating the intra-column cost of the predicted text column includes:

calculating, based on a pre-constructed intra-column cost model, the cost value of each pair of adjacent text lines in the predicted text column belonging to the same column; the input of the intra-column cost model is any one or more of the following items: the spacing, the X-axis overlap ratio and the X-axis length ratio of two adjacent text lines; the output of the intra-column cost model is the cost value of two adjacent text lines belonging to the same column;

and averaging all the obtained cost values, and taking the average value as the intra-column cost of the predicted text column.

Optionally, calculating the inter-column cost of the predicted text column includes:

calculating, based on a pre-constructed inter-column cost model, the cost value of the predicted text column and each adjacent text column not belonging to the same column; the input of the inter-column cost model is any one or more of the following items: the minimum distance between the convex hulls circumscribing the pixels of the predicted text column and of the adjacent text column, the X-axis overlap ratio, the Y-axis overlap ratio, the X-axis length ratio and the Y-axis length ratio; the output of the inter-column cost model is the cost value of two adjacent text columns being independent columns;

and averaging all the obtained cost values, and taking the average value as the inter-column cost of the predicted text column and its adjacent text columns.

Optionally, the adjacent text columns refer to all text columns within a set range.

A layout column dividing apparatus, the apparatus comprising:

the image acquisition module is used for acquiring a text image to be column-divided;

the segmentation module is used for performing text line segmentation on the text image to obtain each text line;

and the column processing module is used for merging the text lines into columns to obtain a final column division result.

Optionally, the column processing module is specifically configured to take each text line as an independent text column, sequentially merge adjacent text columns, and calculate the column division cost after merging to obtain the minimum column division cost; and to determine the final column division result according to the minimum column division cost.

Optionally, the column processing module includes:

the initialization unit is used for storing each text line as an independent text column into the text column set as the current column division result and calculating the current column division cost;

the prediction unit is used for sequentially selecting each text column in the current column division result as the current text column, taking the text column obtained by merging the current text column with its adjacent text column as a predicted text column to obtain a predicted column division result, calculating the predicted column division cost corresponding to each predicted column division result, and obtaining the minimum predicted column division cost among them;

the judging unit is used for judging whether the minimum predicted column division cost is smaller than the current column division cost;

the updating unit is used for, after the judging unit judges that the minimum predicted column division cost is smaller than the current column division cost, updating the current column division result to the predicted column division result corresponding to the minimum predicted column division cost, updating the current column division cost to the minimum predicted column division cost, and then triggering the prediction unit to perform the next round of calculation;

and the column division result output unit is used for, after the judging unit judges that the minimum predicted column division cost is greater than or equal to the current column division cost, taking the current column division cost as the minimum column division cost and taking the current column division result corresponding to the minimum column division cost as the final column division result.

Optionally, the prediction unit comprises:

an intra-column cost calculation unit configured to calculate an intra-column cost of the predicted text column;

an inter-column cost calculation unit configured to calculate an inter-column cost of the predicted text column;

and the predicted column division cost calculation unit is used for obtaining the predicted column division cost according to the intra-column cost and the inter-column cost of the predicted text column.

Optionally, the intra-column cost calculation unit is specifically configured to calculate, based on a pre-constructed intra-column cost model, the cost value of each pair of adjacent text lines in the predicted text column belonging to the same column; the input of the intra-column cost model is any one or more of the following items: the spacing, the X-axis overlap ratio and the X-axis length ratio of two adjacent text lines; the output of the intra-column cost model is the cost value of two adjacent text lines belonging to the same column; and to average all the obtained cost values and take the average value as the intra-column cost of the predicted text column.

Optionally, the inter-column cost calculation unit is specifically configured to calculate, based on a pre-constructed inter-column cost model, the cost value of the predicted text column and each adjacent text column not belonging to the same column; the input of the inter-column cost model is any one or more of the following items: the minimum distance between the convex hulls circumscribing the pixels of the predicted text column and of the adjacent text column, the X-axis overlap ratio, the Y-axis overlap ratio, the X-axis length ratio and the Y-axis length ratio; the output of the inter-column cost model is the cost value of two text columns being independent columns; and to average all the obtained cost values and take the average value as the inter-column cost of the predicted text column and its adjacent text columns.

The layout column dividing method and device provided by the embodiments of the invention first perform text line segmentation on the text image to be column-divided and then, taking the text line as the minimum unit, merge the text lines into columns to obtain the final column division result. Further, when the text lines are merged into columns, the merging costs of the various merging schemes are calculated and the optimal merging scheme is selected as the final column division result. The scheme of the invention is suitable not only for documents with regular typesetting and clear edges, but even more so for column division of complex handwritten layouts that have many columns and no fixed rule in their distribution, such as handwritten answers on a mathematics test paper, for which it can obtain an accurate column division result.

Drawings

In order to more clearly illustrate the embodiments of the present application or technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings can be obtained by those skilled in the art according to the drawings.

FIG. 1 is a flow chart of a layout column dividing method according to an embodiment of the present invention;

FIG. 2 is a flow chart of the process of merging text lines into columns in an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a layout column dividing apparatus according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of the column processing module in an embodiment of the present invention.

Detailed Description

In order that those skilled in the art may better understand the scheme of the embodiments of the invention, the embodiments are described in further detail below with reference to the drawings and implementations.

To address the problem that existing column division methods cannot obtain an ideal column division effect on complex handwritten layouts, an embodiment of the invention provides a layout column dividing method that takes the text line as the minimum unit and merges text lines into columns to obtain the final column division result.

FIG. 1 is a flowchart of a layout column dividing method according to an embodiment of the present invention, which includes the following steps:

Step 101, acquiring a text image to be column-divided.

Specifically, the text image to be column-divided may be obtained by scanning with a scanner, or captured with a high-speed camera, a mobile device, or the like.

Step 102, performing text line segmentation on the text image to obtain each text line.

The text line segmentation can be implemented with existing techniques, such as projection statistics or connected component clustering; the embodiment of the invention places no limitation on this.
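As an illustration of the projection statistics approach mentioned above, the following is a minimal sketch and not the segmentation used by the invention. It assumes a binarized page given as rows of 0/1 values (1 marks ink); the function name `segment_lines_by_projection` and the threshold `min_ink` are hypothetical.

```python
from typing import List, Tuple


def segment_lines_by_projection(
    image: List[List[int]], min_ink: int = 1
) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges of text lines, bottom exclusive.

    A row counts as part of a text line when its number of ink pixels
    is at least `min_ink` (an assumed, tunable threshold).
    """
    profile = [sum(row) for row in image]  # horizontal projection profile
    lines: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # a text line ends
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(image)))
    return lines


page = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 1],
]
print(segment_lines_by_projection(page))
```

A full-width projection like this fuses lines that sit side by side in different columns, so on multi-column handwritten pages it would have to be applied locally or replaced by connected component clustering.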

Step 103, merging the text lines into columns to obtain a final column division result.

Specifically, each text line is taken as an independent text column, adjacent text columns are merged in sequence, and the column division cost after merging is calculated to obtain the minimum column division cost; the final column division result is then determined according to the minimum column division cost. The column division cost includes an intra-column cost and an inter-column cost.

The specific implementation process of merging text lines into columns is shown in FIG. 2, and includes the following steps:

Step 201, initialization: taking each text line as an independent text column, storing it into the text column set as the current column division result, and calculating the current column division cost.

Assume that N text columns are currently stored in the text column set A, that is, A = {A_1, A_2, …, A_N}.

The column division cost comprises an intra-column cost and an inter-column cost. The intra-column cost refers to the cost of adjacent text lines in a text column belonging to the same column; the inter-column cost refers to the cost of the current text column and an adjacent text column being independent columns.

The cost value of two adjacent text lines belonging to the same column can be obtained from a pre-constructed intra-column cost model, written here as c_in(l_k, l_{k+1}), where l_k denotes the k-th text line and l_{k+1} denotes the text line adjacent to the k-th text line. The intra-column cost model may be a regression model (e.g., SVM, DNN, etc.). The input features of the intra-column cost model are any one or more of the following items: the spacing between the two adjacent text lines (e.g., the minimum distance between the lines connecting the centers of gravity of the connected components in the two lines), the X-axis overlap ratio (i.e., the X-axis overlap length divided by the total X-axis length of the two lines), and the X-axis length ratio (i.e., the X-axis length of the shorter line divided by the X-axis length of the longer line); the output of the intra-column cost model is the cost value of the two adjacent text lines belonging to the same column.

Accordingly, the intra-column cost of text column A_i can be calculated by the function f(A_i):

$$f(A_i) = \frac{1}{K-1} \sum_{k=1}^{K-1} c_{\mathrm{in}}(l_k, l_{k+1}) \qquad (1)$$

where K is the total number of text lines in text column A_i.
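The intra-column features and the averaging step can be sketched as follows. This is a minimal illustration under stated assumptions rather than the patented model: text lines are reduced to bounding boxes, the spacing is the vertical gap between boxes (a stand-in for the distance based on connected component centers of gravity), the "total length" in the overlap ratio is read as the union extent of the two lines, a single-line column is given cost 0, and `toy_pair_cost` is a hypothetical placeholder for the trained regression model.

```python
from typing import Callable, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward
Features = Tuple[float, float, float]


def line_pair_features(a: Box, b: Box) -> Features:
    """Spacing, X-axis overlap ratio and X-axis length ratio of two lines."""
    # Assumption: vertical gap between the boxes, 0 if they overlap in Y.
    spacing = max(0.0, max(a[1], b[1]) - min(a[3], b[3]))
    overlap = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    span = max(a[2], b[2]) - min(a[0], b[0])  # assumption: union extent
    len_a, len_b = a[2] - a[0], b[2] - b[0]
    longest = max(len_a, len_b)
    x_overlap_ratio = overlap / span if span > 0 else 0.0
    x_length_ratio = min(len_a, len_b) / longest if longest > 0 else 0.0
    return spacing, x_overlap_ratio, x_length_ratio


def intra_column_cost(
    lines: Sequence[Box], pair_cost: Callable[[Features], float]
) -> float:
    """Average the pair cost over consecutive lines of one column."""
    if len(lines) < 2:
        return 0.0  # assumption: a single-line column has no intra-column cost
    costs = [
        pair_cost(line_pair_features(lines[k], lines[k + 1]))
        for k in range(len(lines) - 1)
    ]
    return sum(costs) / len(costs)


def toy_pair_cost(features: Features) -> float:
    """Hypothetical stand-in for the regression model (SVM, DNN, ...)."""
    spacing, x_overlap_ratio, _x_length_ratio = features
    return spacing / 100.0 + (1.0 - x_overlap_ratio)


column = [(0, 0, 100, 20), (0, 30, 100, 50), (0, 60, 50, 80)]
print(intra_column_cost(column, toy_pair_cost))
```

Passing the model in as a callable keeps the feature extraction independent of whichever regressor is eventually trained.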

The cost value of two adjacent text columns being independent columns, that is, not belonging to the same column, can be obtained from a pre-constructed inter-column cost model, written here as c_out(A_i, A_j), where text column A_j denotes a text column adjacent to text column A_i.

The inter-column cost model may also be a regression model (e.g., SVM, DNN, etc.). The input features of the inter-column cost model are any one or more of the following items: the minimum distance between the convex hulls circumscribing the pixels of the text column and of the adjacent text column, the X-axis overlap ratio (i.e., the X-axis overlap length divided by the total X-axis length of the two columns), the Y-axis overlap ratio (i.e., the Y-axis overlap length divided by the total Y-axis length of the two columns), the X-axis length ratio (i.e., the shorter X-axis length divided by the longer X-axis length) and the Y-axis length ratio (i.e., the shorter Y-axis length divided by the longer Y-axis length); the output of the inter-column cost model is the cost value of the two adjacent columns being independent columns.

Accordingly, the inter-column cost of text column A_i can be calculated by the function g(A_i):

$$g(A_i) = \frac{1}{M_i} \sum_{j=1}^{M_i} c_{\mathrm{out}}(A_i, A_j) \qquad (2)$$

where M_i is the total number of text columns adjacent to text column A_i, and A_j denotes a text column adjacent to text column A_i.

After the intra-column cost and the inter-column cost of each text column in the current column division result are obtained from formulas (1) and (2), the current column division cost can be obtained from these two cost values of the text columns; for example, the intra-column costs and the inter-column costs can be weighted to obtain the current column division cost. Further, to avoid a column division result that is too fine or too coarse, the current number of columns may also be taken into account in the formula for the current column division cost; for example, the following formula may be adopted for calculating the current column division cost:

where α and β are the intra-column cost weight and the inter-column cost weight, respectively, N is the current number of columns, and λ is the penalty factor for the number of columns. The values of α, β and λ can be determined according to the practical application and/or extensive experiments and experience.
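The formula referred to above is not reproduced in this text. One form that is consistent with the description, offered purely as an assumption, averages the weighted per-column costs and adds a penalty that grows with the number of columns:

```latex
C(A) = \frac{1}{N} \sum_{i=1}^{N} \bigl[ \alpha\, f(A_i) + \beta\, g(A_i) \bigr] + \lambda N
```

Here f(A_i) and g(A_i) are the intra-column and inter-column costs of text column A_i. Whether the per-column costs are summed or averaged, and the exact form and sign of the penalty term in N, are not recoverable from the description and are assumed here.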

The adjacent text columns refer to all the text columns within a set range. For example, for text column A_i, its adjacent text columns can be defined as all text columns within a radius R, taking the center of text column A_i as the origin. Of course, other set ranges, for example ranges of other shapes, are also possible, and the embodiment of the present invention is not limited in this respect.
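A radius-based neighbour search of this kind might look like the sketch below. It assumes each text column is summarized by its bounding box and that distance is measured between bounding box centers; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def center(box: Box) -> Tuple[float, float]:
    """Center point of a bounding box."""
    return (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0


def neighbours_within_radius(
    columns: Sequence[Box], i: int, radius: float
) -> List[int]:
    """Indices of all columns whose center lies within `radius` of column i."""
    cx, cy = center(columns[i])
    result = []
    for j, other in enumerate(columns):
        if j == i:
            continue  # a column is not its own neighbour
        ox, oy = center(other)
        if math.hypot(ox - cx, oy - cy) <= radius:
            result.append(j)
    return result


columns = [(0, 0, 100, 20), (0, 30, 100, 50), (300, 0, 400, 20)]
print(neighbours_within_radius(columns, 0, 50.0))
```

The radius R trades recall for speed: a larger R lets distant lines be considered for merging but increases the number of candidate merges per round.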

Step 202, one text column in the current column dividing result is sequentially selected as a current text column, the text column after the current text column and the adjacent text column are combined is used as a predicted text column, a predicted column dividing result is obtained, a predicted column dividing cost corresponding to the predicted column dividing result is calculated, and the minimum predicted column dividing cost is obtained.

That is, the text column obtained by merging the selected current text column with its adjacent text column is taken as a new text column; the current column division result changes accordingly, and the changed column division result is taken as the predicted column division result. When the selected current text column is merged with its adjacent text columns, it may be merged with some of them or with all of them.

The predicted column division cost is calculated in the same way as the current column division cost described in step 201 above. Suppose the current column division result contains 6 text columns: each text column is taken in turn as the current text column and merged with its adjacent text columns, and the predicted column division cost corresponding to that merge is calculated, giving 6 predicted column division costs, from which the minimum predicted column division cost is selected.

Step 203, judging whether the minimum predicted column division cost is less than the current column division cost; if yes, go to step 204; otherwise, go to step 205.

Step 204, updating the current column division result to the predicted column division result corresponding to the minimum predicted column division cost, updating the current column division cost to the minimum predicted column division cost, and then executing step 202.

Step 205, taking the current column division cost as the minimum column division cost, and taking the current column division result corresponding to the minimum column division cost as the final column division result.
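Steps 201 to 205 amount to a greedy search that keeps merging while the cost drops. The sketch below illustrates that loop under stated assumptions: the cost function and the adjacency test are passed in as hypothetical callables, columns are sets of text line indices, and each prediction merges exactly one pair of adjacent columns (the text also allows merging a column with several neighbours at once).

```python
from typing import Callable, FrozenSet, Optional, Tuple

Column = FrozenSet[int]        # indices of the text lines in one column
Division = Tuple[Column, ...]  # one column division result


def greedy_column_merge(
    n_lines: int,
    cost: Callable[[Division], float],
    are_adjacent: Callable[[Column, Column], bool],
) -> Tuple[Division, float]:
    """Merge adjacent columns while the column division cost decreases."""
    # Step 201: every text line starts as its own column.
    current: Division = tuple(frozenset([k]) for k in range(n_lines))
    current_cost = cost(current)
    while True:
        # Step 202: predict every pairwise merge and keep the cheapest.
        best: Optional[Division] = None
        best_cost = 0.0
        for i, a in enumerate(current):
            for j in range(i + 1, len(current)):
                b = current[j]
                if not are_adjacent(a, b):  # assumed symmetric
                    continue
                rest = tuple(c for k, c in enumerate(current) if k not in (i, j))
                predicted = rest + (a | b,)
                predicted_cost = cost(predicted)
                if best is None or predicted_cost < best_cost:
                    best, best_cost = predicted, predicted_cost
        # Steps 203 and 205: stop when no merge lowers the cost.
        if best is None or best_cost >= current_cost:
            return current, current_cost
        # Step 204: accept the best merge and run another round.
        current, current_cost = best, best_cost


# Toy data: X positions of five text lines forming two visual columns.
xs = [0, 5, 2, 200, 204]


def toy_cost(division: Division) -> float:
    """Hypothetical cost: horizontal spread per column plus 10 per column."""
    spread = sum(max(xs[k] for k in c) - min(xs[k] for k in c) for c in division)
    return spread + 10 * len(division)


division, total = greedy_column_merge(len(xs), toy_cost, lambda a, b: True)
print(sorted(sorted(c) for c in division), total)
```

Because the search accepts only the single best merge per round and stops at the first round without improvement, it finds a local minimum of the cost, which is what the step sequence above describes.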

The layout column dividing method provided by the embodiment of the invention first performs text line segmentation on the text image to be column-divided and then, taking the text line as the minimum unit, merges the text lines into columns to obtain the final column division result. Further, when the text lines are merged into columns, the merging costs of the various merging schemes are calculated and the optimal merging scheme is selected as the final column division result. The scheme of the invention is suitable not only for documents with regular typesetting and clear edges, but even more so for column division of complex handwritten layouts that have many columns and no fixed rule in their distribution, such as handwritten answers on a mathematics test paper, for which it can obtain an accurate column division result.

Correspondingly, an embodiment of the present invention further provides a layout column dividing apparatus; FIG. 3 is a schematic structural diagram of the apparatus.

In this embodiment, the apparatus includes the following modules:

the image acquisition module 301 is configured to acquire a text image to be column-divided;

a segmentation module 302, configured to perform text line segmentation on the text image to obtain each text line;

and a column processing module 303, configured to merge the text lines into columns to obtain a final column division result.

The image acquisition module 301 may specifically be a scanner, a high-speed camera, a camera, or the like. The segmentation module 302 may segment the text image to be column-divided with existing techniques, such as projection statistics or connected component clustering.

The column processing module 303 is specifically configured to take each text line as an independent text column, sequentially merge adjacent text columns, and calculate the column division cost after merging to obtain the minimum column division cost; and to determine the final column division result according to the minimum column division cost. A specific structure of the column processing module 303 is shown in FIG. 4, and includes the following units:

an initialization unit 41, configured to store each text line as an independent text column in a text column set as the current column division result, and calculate the current column division cost; for the specific way of calculating the current column division cost, refer to the description in the method embodiment of the invention, which is not repeated here;

the prediction unit 42 is configured to sequentially select each text column in the current column division result as the current text column, take the text column obtained by merging the current text column with its adjacent text column as a predicted text column to obtain a predicted column division result, calculate the predicted column division cost corresponding to each predicted column division result, and obtain the minimum predicted column division cost among them;

a judging unit 43, configured to judge whether the minimum predicted column division cost is smaller than the current column division cost;

an updating unit 44, configured to, after the judging unit 43 judges that the minimum predicted column division cost is smaller than the current column division cost, update the current column division result to the predicted column division result corresponding to the minimum predicted column division cost, update the current column division cost to the minimum predicted column division cost, and then trigger the prediction unit 42 to perform the next round of calculation;

and a column division result output unit 45, configured to, after the judging unit 43 judges that the minimum predicted column division cost is greater than or equal to the current column division cost, take the current column division cost as the minimum column division cost, and take the current column division result corresponding to the minimum column division cost as the final column division result.

By calculating the merging costs of the various merging schemes, the prediction unit 42 makes it possible to select the optimal merging scheme as the final column division result. One specific structure of the prediction unit 42 may include the following units:

the intra-column cost calculation unit is configured to calculate the intra-column cost of the predicted text column; for example, the cost value of each pair of adjacent text lines in the predicted text column belonging to the same column may be calculated based on a pre-constructed intra-column cost model; the input of the intra-column cost model is any one or more of the following items: the spacing, the X-axis overlap ratio and the X-axis length ratio of two adjacent text lines; the output of the intra-column cost model is the cost value of two adjacent text lines belonging to the same column; all the obtained cost values are averaged, and the average value is taken as the intra-column cost of the predicted text column;

an inter-column cost calculation unit, configured to calculate the inter-column cost of the predicted text column; for example, the cost value of the predicted text column and each adjacent text column not belonging to the same column may be calculated based on a pre-constructed inter-column cost model; the input of the inter-column cost model is any one or more of the following items: the minimum distance between the convex hulls circumscribing the pixels of the predicted text column and of the adjacent text column, the X-axis overlap ratio, the Y-axis overlap ratio, the X-axis length ratio and the Y-axis length ratio; the output of the inter-column cost model is the cost value of two text columns being independent columns; all the obtained cost values are averaged, and the average value is taken as the inter-column cost of the predicted text column and its adjacent text columns.

The specific calculation method of the corresponding cost by each calculation unit may refer to the description in the embodiment of the method of the present invention, and is not described herein again.

The layout column dividing device provided by the embodiment of the invention first performs text line segmentation on the text image to be column-divided and then, taking the text line as the minimum unit, merges the text lines into columns to obtain the final column division result. Further, when the text lines are merged into columns, the merging costs of the various merging schemes are calculated and the optimal merging scheme is selected as the final column division result. The scheme of the invention is suitable not only for documents with regular typesetting and clear edges, but even more so for column division of complex handwritten layouts that have many columns and no fixed rule in their distribution, such as handwritten answers on a mathematics test paper, for which it can obtain an accurate column division result.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. Furthermore, the above-described embodiments of the apparatus are merely illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

The above embodiments of the present invention have been described in detail, and the present invention has been described herein with reference to particular embodiments, but the above embodiments are merely intended to facilitate an understanding of the methods and apparatuses of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A layout column dividing method, the method comprising:

acquiring a text image to be column-divided;

segmenting the text image to obtain each text line;

merging the text lines into columns to obtain a final column division result;

merging the text lines into columns to obtain a final column division result comprises:

taking each text line as an independent text column, storing it into a text column set as the current column division result, and calculating the current column division cost; the column division cost comprises an intra-column cost and an inter-column cost; the intra-column cost refers to the cost of adjacent text lines in a text column belonging to the same column; the inter-column cost refers to the cost of the current text column and an adjacent text column being independent columns;

sequentially merging adjacent text columns, calculating the merged column division cost to obtain the minimum column division cost, and updating the column division result and the current column division cost according to the minimum column division cost and the current column division cost;

and after all adjacent text columns have been merged, obtaining the final column division result.

2. The method according to claim 1, wherein sequentially merging the adjacent text columns, calculating the merged column division cost to obtain the minimum column division cost, and updating the column division result and the current column division cost according to the minimum column division cost and the current column division cost comprises:

(1) sequentially selecting each text column in the current column division result as the current text column for prediction: taking the text column obtained by merging the current text column with its adjacent text column as a predicted text column to obtain a predicted column division result, calculating the predicted column division cost corresponding to each predicted column division result, and obtaining the minimum predicted column division cost among them;

(2) judging whether the minimum predicted column division cost is less than the current column division cost;

(3) if so, updating the current column division result to the predicted column division result corresponding to the minimum predicted column division cost, updating the current column division cost to the minimum predicted column division cost, and then executing step (1);

(4) otherwise, taking the current column division cost as the minimum column division cost;

obtaining the final column division result after all the adjacent text columns have been merged comprises:

after all adjacent text columns have been merged, taking the current column division result corresponding to the minimum column division cost as the final column division result.

3. The method of claim 2, wherein calculating the predicted column division cost comprises:

calculating the intra-column cost of the predicted text column;

calculating the inter-column cost of the predicted text column;

4. The method of claim 3, wherein calculating the intra-column cost of the predicted text column comprises:

5. The method of claim 3, wherein calculating the inter-column cost of the predicted text column comprises:

6. The method according to any one of claims 1 to 5, wherein the adjacent text columns refer to all text columns within a set range.

7. A layout column dividing apparatus, the apparatus comprising:

the column processing module is used for storing each text line as an independent text column into the text column set as the current column division result and calculating the current column division cost; sequentially merging adjacent text columns, and calculating the merged column division cost to obtain the minimum column division cost; updating the column division result and the current column division cost according to the minimum column division cost and the current column division cost; and obtaining the final column division result after all adjacent text columns have been merged; wherein the column division cost comprises an intra-column cost and an inter-column cost; the intra-column cost refers to the cost of adjacent text lines in a text column belonging to the same column; the inter-column cost refers to the cost of the current text column and an adjacent text column being independent columns.

8. The apparatus of claim 7, wherein the column processing module comprises:

9. The apparatus of claim 8, wherein the prediction unit comprises:

10. The apparatus of claim 9,

the intra-column cost calculation unit is specifically configured to calculate, based on a pre-constructed intra-column cost model, the cost value of each pair of adjacent text lines in the predicted text column belonging to the same column; the input of the intra-column cost model is any one or more of the following items: the spacing, the X-axis overlap ratio and the X-axis length ratio of two adjacent text lines; the output of the intra-column cost model is the cost value of two adjacent text lines belonging to the same column; and to average all the obtained cost values and take the average value as the intra-column cost of the predicted text column.

11. The apparatus of claim 9,

the inter-column cost calculation unit is specifically configured to calculate, based on a pre-constructed inter-column cost model, the cost value of the predicted text column and each adjacent text column not belonging to the same column; the input of the inter-column cost model is any one or more of the following items: the minimum distance between the convex hulls circumscribing the pixels of the predicted text column and of the adjacent text column, the X-axis overlap ratio, the Y-axis overlap ratio, the X-axis length ratio and the Y-axis length ratio; the output of the inter-column cost model is the cost value of two text columns being independent columns; and to average all the obtained cost values and take the average value as the inter-column cost of the predicted text column and its adjacent text columns.