Detailed Description of the Embodiments
To sort the original blocks in a text block that uses a given typesetting mode, and thereby restore the reading order of the original blocks within the text block, an embodiment of the present invention provides a method for reorganizing text block content. In this method, the original-block sorting method that corresponds to the typesetting mode of the original blocks in the text block is determined from a predefined correspondence between typesetting modes and original-block sorting methods; the original blocks in the text block are then sorted according to that sorting method, and the sorted original blocks are output for display.
Referring to Fig. 2, the method for reorganizing text block content provided by the embodiment of the present invention comprises the following steps:
Step 20: determine, from the predefined correspondence between typesetting modes and original-block sorting methods, the original-block sorting method corresponding to the typesetting mode of the original blocks in the text block;
Step 21: sort the original blocks in the text block according to the determined original-block sorting method;
Step 22: output the sorted original blocks for display.
In steps 20 and 21, when the typesetting mode of the original blocks in the text block is horizontal layout or vertical layout, the original-block sorting method specifically comprises the following steps a and b:
Step a: sort the original blocks in the text block by the sequence number of each original block;
Step b: for every two adjacent original blocks in the sorting result of step a, perform the following: calculate the overlap degree of the two adjacent original blocks, and determine a new positional relation of the two adjacent original blocks from that overlap degree; if the new positional relation differs from the positional relation of the two adjacent original blocks in the sorting result of step a, swap the positions of the two adjacent original blocks in the sorting result of step a.
After step b has been performed, the sorting result of the original blocks in the text block is obtained.
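By way of non-limiting illustration, steps a and b can be sketched in Python as follows. The function name `reorder` and its parameters `seq_of` and `first_stays_first` are hypothetical and are introduced here only for illustration; the sketch also assumes that pairs are visited once, from the start of the list, and that after a swap the next pair is taken from the updated list.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def reorder(
    blocks: List[T],
    seq_of: Callable[[T], int],
    first_stays_first: Callable[[T, T], bool],
    descending: bool = False,
) -> List[T]:
    """Sort by sequence number (step a), then adjust adjacent pairs (step b)."""
    # Step a: order the blocks by the producer-assigned sequence number.
    ordered = sorted(blocks, key=seq_of, reverse=descending)
    # Step b: single pass over adjacent pairs; swap a pair when the new
    # positional relation differs from the current order.
    for k in range(len(ordered) - 1):
        if not first_stays_first(ordered[k], ordered[k + 1]):
            ordered[k], ordered[k + 1] = ordered[k + 1], ordered[k]
    return ordered
```

Here `first_stays_first` stands in for the overlap-based decision of step b; any predicate that returns `True` when the first block of a pair should remain first can be supplied.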
In step a, the sequence number of an original block is the number that the producer of the formatted document assigned to that original block when creating the formatted document; the sequence number indicates the output order of the original blocks in the formatted document. When the original blocks are sorted by sequence number: if the producer of the formatted document assigned the sequence numbers in increasing order, the original blocks can be sorted in ascending order of sequence number; if the producer assigned the sequence numbers in decreasing order, the original blocks can be sorted in descending order of sequence number.
In step b, the new positional relation of two adjacent original blocks can be determined from their overlap degree as follows:
Branch one: determine the front-back positional relation and the up-down positional relation of the two adjacent original blocks in the text block; if the overlap degree of the two adjacent original blocks in the horizontal direction is less than a preset first threshold, determine the new positional relation from the up-down positional relation of the two adjacent original blocks, that is, the upper original block is first in the new positional relation and the lower original block is second;
Branch two: if the overlap degree of the two adjacent original blocks in the vertical direction is less than 0, determine the new positional relation from the front-back positional relation of the two adjacent original blocks, that is, the front original block is first in the new positional relation and the rear original block is second;
Branch three: if the overlap degree of the two adjacent original blocks in the horizontal direction and their overlap degree in the vertical direction are both greater than a preset second threshold, determine the new positional relation from the sequence numbers of the two adjacent original blocks, that is, the original block with the smaller sequence number is first in the new positional relation and the original block with the larger sequence number is second;
Branch four: if the overlap degree of the two adjacent original blocks in the vertical direction is less than their overlap degree in the horizontal direction, determine the new positional relation from the magnitude relation between the left boundary difference and the right boundary difference of the two adjacent original blocks;
Branch five: if the overlap degree of the two adjacent original blocks in the vertical direction is not less than their overlap degree in the horizontal direction, determine the new positional relation from the magnitude relation between the upper boundary difference and the lower boundary difference of the two adjacent original blocks.
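The five branches can be illustrated with the following Python sketch. It assumes that the branches are tested in the order listed and that the first matching branch decides the result, which is how the steps of embodiment one are arranged. The function and parameter names are hypothetical; `above_first` and `front_first` are assumed to be precomputed booleans stating whether the first block of the pair is the upper block and the front block, respectively.

```python
def first_block_stays_first(
    o_horizontal: float,   # overlap degree in the horizontal direction
    o_vertical: float,     # overlap degree in the vertical direction
    above_first: bool,     # True if the first block is the upper one
    front_first: bool,     # True if the first block is the front one
    seq_i: int,
    seq_j: int,
    first_threshold: float,
    second_threshold: float = 0.5,
) -> bool:
    """Return True if the first block of the pair is first in the new relation."""
    # Branch one: blocks do not share a row, so the up-down relation decides.
    if o_horizontal < first_threshold:
        return above_first
    # Branch two: no overlap in the vertical direction, front-back decides.
    if o_vertical < 0:
        return front_first
    # Branch three: heavy overlap both ways, fall back to sequence numbers.
    if o_horizontal > second_threshold and o_vertical > second_threshold:
        return seq_i < seq_j
    # Branch four: left/right boundary differences (front-back rule) decide.
    if o_vertical < o_horizontal:
        return front_first
    # Branch five: upper/lower boundary differences (up-down rule) decide.
    return above_first
```

The default of 0.5 for `second_threshold` is only one value from the range given for the second threshold; it is a parameter so that other values can be tried.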
In branch one above, the front-back positional relation of two adjacent original blocks can be determined as follows:
Calculate the left boundary difference and the right boundary difference of the two adjacent original blocks. Specifically, calculate the difference between the left boundary of original block Block_i and the left boundary of original block Block_j to obtain the left boundary difference of the two adjacent original blocks, and calculate the difference between the right boundary of original block Block_j and the right boundary of original block Block_i to obtain the right boundary difference of the two adjacent original blocks, where the subscripts i and j denote the sequence numbers of the two blocks;
If the left boundary difference is less than the right boundary difference and the typesetting mode of the original blocks in the text block is left-to-right horizontal layout, left-to-right vertical layout, or directionless vertical layout, the front-back positional relation of the two adjacent original blocks is: the first of the two adjacent original blocks is in front and the second is behind;
If the left boundary difference is less than the right boundary difference and the typesetting mode of the original blocks in the text block is right-to-left horizontal layout or right-to-left vertical layout, the front-back positional relation of the two adjacent original blocks is: the second of the two adjacent original blocks is in front and the first is behind;
If the left boundary difference is not less than the right boundary difference and the typesetting mode of the original blocks in the text block is left-to-right horizontal layout, left-to-right vertical layout, or directionless vertical layout, the front-back positional relation of the two adjacent original blocks in the text block is: the second of the two adjacent original blocks is in front and the first is behind;
If the left boundary difference is not less than the right boundary difference and the typesetting mode of the original blocks in the text block is right-to-left horizontal layout or right-to-left vertical layout, the front-back positional relation of the two adjacent original blocks in the text block is: the first of the two adjacent original blocks is in front and the second is behind.
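The four cases above can be condensed into the following Python sketch. It assumes that both differences are signed, with the left boundary difference taken as left(Block_i) minus left(Block_j) and the right boundary difference as right(Block_j) minus right(Block_i). The mode strings and the function name are hypothetical labels introduced for illustration.

```python
LTR_MODES = {"horizontal_ltr", "vertical_ltr", "vertical_directionless"}
RTL_MODES = {"horizontal_rtl", "vertical_rtl"}


def first_is_in_front(
    left_i: float, right_i: float,
    left_j: float, right_j: float,
    mode: str,
) -> bool:
    """Return True if Block_i (the first block) is the front block."""
    left_diff = left_i - left_j      # assumed sign convention
    right_diff = right_j - right_i   # assumed sign convention
    if mode in LTR_MODES:
        return left_diff < right_diff
    if mode in RTL_MODES:
        # Right-to-left modes reverse the outcome of the same comparison.
        return not (left_diff < right_diff)
    raise ValueError(f"unsupported typesetting mode: {mode}")
```

Under this sign convention, the comparison `left_diff < right_diff` is equivalent to asking whether the horizontal center of Block_i lies to the left of the horizontal center of Block_j.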
In branch one above, the up-down positional relation of two adjacent original blocks in the text block can be determined as follows:
Calculate the upper boundary difference and the lower boundary difference of the two adjacent original blocks. Specifically, calculate the difference between the upper boundary of original block Block_i and the upper boundary of original block Block_j to obtain the upper boundary difference of the two adjacent original blocks, and calculate the difference between the lower boundary of original block Block_j and the lower boundary of original block Block_i to obtain the lower boundary difference of the two adjacent original blocks, where the subscripts i and j denote the sequence numbers of the two blocks;
If the upper boundary difference is less than the lower boundary difference, the up-down positional relation of the two adjacent original blocks in the text block is: the first of the two adjacent original blocks is above and the second is below;
If the upper boundary difference is not less than the lower boundary difference, the up-down positional relation of the two adjacent original blocks in the text block is: the second of the two adjacent original blocks is above and the first is below.
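A corresponding Python sketch of the up-down determination is given below. As with the front-back case, it assumes signed differences (upper(Block_i) minus upper(Block_j), and lower(Block_j) minus lower(Block_i)) and a coordinate system in which boundary values grow downward, as the example values for Fig. 3E suggest. The function name is hypothetical.

```python
def first_is_above(
    top_i: float, bottom_i: float,
    top_j: float, bottom_j: float,
) -> bool:
    """Return True if Block_i (the first block) is the upper block."""
    upper_diff = top_i - top_j        # assumed sign convention
    lower_diff = bottom_j - bottom_i  # assumed sign convention
    # "Not less than" (including equality) means the second block is above.
    return upper_diff < lower_diff
```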
In branch four above, the new positional relation of two adjacent original blocks can be determined from the magnitude relation between their left boundary difference and right boundary difference as follows:
If the left boundary difference is less than the right boundary difference and the typesetting mode of the original blocks in the text block is left-to-right horizontal layout, left-to-right vertical layout, or directionless vertical layout, the front-back positional relation of the two adjacent original blocks is: the first of the two adjacent original blocks is in front and the second is behind;
If the left boundary difference is less than the right boundary difference and the typesetting mode of the original blocks in the text block is right-to-left horizontal layout or right-to-left vertical layout, the front-back positional relation of the two adjacent original blocks is: the second of the two adjacent original blocks is in front and the first is behind;
If the left boundary difference is not less than the right boundary difference and the typesetting mode of the original blocks in the text block is left-to-right horizontal layout, left-to-right vertical layout, or directionless vertical layout, the front-back positional relation of the two adjacent original blocks in the text block is: the second of the two adjacent original blocks is in front and the first is behind;
If the left boundary difference is not less than the right boundary difference and the typesetting mode of the original blocks in the text block is right-to-left horizontal layout or right-to-left vertical layout, the front-back positional relation of the two adjacent original blocks in the text block is: the first of the two adjacent original blocks is in front and the second is behind.
In branch five above, the new positional relation of two adjacent original blocks can be determined from the magnitude relation between their upper boundary difference and lower boundary difference as follows:
If the upper boundary difference is less than the lower boundary difference, the up-down positional relation of the two adjacent original blocks in the text block is: the first of the two adjacent original blocks is above and the second is below;
If the upper boundary difference is not less than the lower boundary difference, the up-down positional relation of the two adjacent original blocks in the text block is: the second of the two adjacent original blocks is above and the first is below.
In branch one above, when the typesetting mode of the original blocks in the text block is horizontal layout or directionless vertical layout and the two adjacent original blocks are in the same row of the text block, the first threshold can take a value between -0.1 and -0.07; when the typesetting mode of the original blocks in the text block is horizontal layout or directionless vertical layout and the two adjacent original blocks are in different rows of the text block, the first threshold takes a value between -0.06 and -0.03; when the typesetting mode of the original blocks in the text block is left-to-right or right-to-left vertical layout, the first threshold can be 0. In branch three above, the second threshold can take a value between 0.5 and 1.
Of course, the value ranges listed above for the first threshold and the second threshold are preferred ranges; any other value range that achieves the object of the present invention still falls within the scope of protection of the present invention.
Specifically, whether two adjacent original blocks are in the same row of the text block can be determined as follows:
First, compare the horizontal baseline values of the two adjacent original blocks, and determine the line spacing from the font size of the original block that has the larger horizontal baseline value; for example, take C times the font size of the original block with the larger horizontal baseline value as the line spacing, where C can take a value from 0.91 to 1; of course, C can also take any other value greater than 0;
Then, calculate the horizontal baseline difference of the two adjacent original blocks; if the horizontal baseline difference is greater than the line spacing, the two adjacent original blocks are in different rows of the text block; otherwise, the two adjacent original blocks are in the same row of the text block.
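The same-row test and the choice of first threshold can be sketched in Python as follows. The sketch assumes that the horizontal baseline difference is compared as an absolute value; the default C = 0.95 and the threshold values -0.08 and -0.05 are sample values taken from within the ranges stated above. All function and parameter names are hypothetical.

```python
def same_row(
    baseline_a: float, font_size_a: float,
    baseline_b: float, font_size_b: float,
    c: float = 0.95,
) -> bool:
    """Return True if the two blocks are judged to be in the same row."""
    # Line spacing is C times the font size of the block whose horizontal
    # baseline value is larger.
    font_size = font_size_a if baseline_a >= baseline_b else font_size_b
    line_spacing = c * font_size
    # Assumption: the baseline difference is taken as an absolute value.
    return abs(baseline_a - baseline_b) <= line_spacing


def first_threshold(is_same_row: bool, directional_vertical: bool = False) -> float:
    """Pick a first threshold for the horizontal-direction overlap degree."""
    if directional_vertical:
        return 0.0  # left-to-right or right-to-left vertical layout
    return -0.08 if is_same_row else -0.05
```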
In steps 20 and 21, when the typesetting mode of the original blocks in the text block is oblique layout, the original-block sorting method can specifically comprise the following steps c to f:
Step c: sort the original blocks in the text block by the sequence number of each original block;
Step d: apply a rotation transformation to the coordinate system in which the text block lies;
Step e: group the original blocks, as sorted in step c, according to the position coordinates of each original block in the rotated coordinate system, so that the original blocks in the same group lie in the same row or the same column;
Step f: sort the groups by the average sequence number of the original blocks in each group, and, within each group, sort the original blocks according to the writing direction of the text block.
After step f has been performed, the sorting result of the original blocks in the text block is obtained.
In step c, the sequence number of an original block is the number that the producer of the formatted document assigned to that original block when creating the formatted document; the sequence number indicates the output order of the original blocks in the formatted document. When the original blocks are sorted by sequence number: if the producer of the formatted document assigned the sequence numbers in increasing order, the original blocks can be sorted in ascending order of sequence number; if the producer assigned the sequence numbers in decreasing order, the original blocks can be sorted in descending order of sequence number.
In step d, the rotation transformation of the coordinate system in which the text block lies can be implemented as follows:
First, calculate the horizontal baseline difference and the vertical baseline difference of every two adjacent original blocks as sorted in step c; determine the most frequently occurring horizontal baseline difference f_x and the most frequently occurring vertical baseline difference f_y among the calculated values; and calculate the slope k of the oblique line from f_x and f_y, for example k = f_y / f_x;
Then, calculate the coordinate-system rotation angle α from the slope k, for example α = arctan(k);
Finally, rotate the coordinate system in which the text block lies by the rotation angle α, using the coordinate rotation formulas x' = x × cos α + y × sin α and y' = -x × sin α + y × cos α.
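Step d can be illustrated with the following Python sketch, which takes the lists of horizontal and vertical baseline differences as given inputs. Rounding the differences before counting them, and raising an error when f_x is 0, are assumptions added so that the sketch runs on floating-point data; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def rotation_angle(
    h_diffs: Sequence[float],
    v_diffs: Sequence[float],
    ndigits: int = 2,
) -> Tuple[float, float]:
    """Return (k, alpha) from the most frequent baseline differences."""
    # Most frequent horizontal and vertical baseline differences.
    f_x = Counter(round(d, ndigits) for d in h_diffs).most_common(1)[0][0]
    f_y = Counter(round(d, ndigits) for d in v_diffs).most_common(1)[0][0]
    if f_x == 0:
        raise ValueError("f_x is 0, so the slope k = f_y / f_x is undefined")
    k = f_y / f_x
    return k, math.atan(k)


def rotate(x: float, y: float, alpha: float) -> Tuple[float, float]:
    """Apply x' = x cos(alpha) + y sin(alpha), y' = -x sin(alpha) + y cos(alpha)."""
    return (
        x * math.cos(alpha) + y * math.sin(alpha),
        -x * math.sin(alpha) + y * math.cos(alpha),
    )
```

With k = 1, for instance, α is 45 degrees, and a point on the line y = x is mapped onto the x' axis of the rotated coordinate system.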
In step e, the grouping of the original blocks, as sorted in step c, according to their position coordinates in the rotated coordinate system can be implemented as follows:
For every two adjacent original blocks as sorted in step c, perform the following:
Calculate the horizontal baseline values, in the rotated coordinate system, of the former original block and the latter original block of the two adjacent original blocks; determine whether the difference between the calculated horizontal baseline value of the latter original block and that of the former original block is less than a preset third threshold; if so, determine that the former original block and the latter original block are in the same row, and place them in the same group; otherwise, determine that the former original block and the latter original block are in different rows, and place them in different groups.
Here, the third threshold can be 0.91 to 1 times the height of an original block; of course, the third threshold can also be any other value not less than 0.
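The grouping of step e can be sketched in Python as follows. The sketch assumes that the rotated horizontal baseline values are already available in step-c order and that the difference between them is compared as an absolute value; `group_rows` is a hypothetical name.

```python
from typing import List, Sequence


def group_rows(rotated_baselines: Sequence[float], third_threshold: float) -> List[List[int]]:
    """Group block indices so that each group holds blocks of one row."""
    if not rotated_baselines:
        return []
    groups: List[List[int]] = [[0]]
    for idx in range(1, len(rotated_baselines)):
        # Compare the latter block with the former block of each adjacent pair.
        diff = abs(rotated_baselines[idx] - rotated_baselines[idx - 1])
        if diff < third_threshold:
            groups[-1].append(idx)   # same row: stay in the current group
        else:
            groups.append([idx])     # different row: start a new group
    return groups
```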
In step f, the sorting of the original blocks within a group according to the writing direction of the text block can be implemented as follows:
When the slope k is greater than 1, determine that the typesetting mode of the original blocks in the text block is vertical-type oblique layout, and sort the original blocks within the group in ascending order of their upper boundary values;
When the slope k is not greater than 1, determine that the typesetting mode of the original blocks in the text block is horizontal-type oblique layout, and sort the original blocks within the group in ascending order of their left boundary values.
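Step f as a whole, that is, ordering the groups by average sequence number and then ordering the blocks inside each group by the slope rule above, can be sketched in Python as follows. Representing a block as a dictionary with the keys `seq`, `left`, and `top`, and the name `order_groups`, are assumptions made for illustration; every group is assumed to be non-empty.

```python
from typing import Dict, List

Block = Dict[str, float]


def order_groups(groups: List[List[Block]], k: float) -> List[List[Block]]:
    """Sort groups by mean sequence number, then sort blocks within each group."""
    # k > 1: vertical-type oblique layout, order by upper boundary value.
    # k <= 1: horizontal-type oblique layout, order by left boundary value.
    key = "top" if k > 1 else "left"
    by_mean_seq = sorted(groups, key=lambda g: sum(b["seq"] for b in g) / len(g))
    return [sorted(g, key=lambda b: b[key]) for g in by_mean_seq]
```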
In the present invention, the horizontal typesetting mode is shown in Fig. 3A; the vertical typesetting mode is shown in Fig. 3B; the horizontal-type oblique typesetting mode is shown in Fig. 3C; and the vertical-type oblique typesetting mode is shown in Fig. 3D.
In the present invention, the horizontal baseline difference of two original blocks is the difference between the horizontal baseline value of one of the two original blocks and the horizontal baseline value of the other. The vertical baseline difference of two original blocks is the difference between the vertical baseline value of one of the two original blocks and the vertical baseline value of the other. The left boundary difference of two original blocks is the difference between the left boundary value of one of the two original blocks and the left boundary value of the other. The right boundary difference of two original blocks is the difference between the right boundary value of one of the two original blocks and the right boundary value of the other. The upper boundary difference of two original blocks is the difference between the upper boundary value of one of the two original blocks and the upper boundary value of the other. The lower boundary difference of two original blocks is the difference between the lower boundary value of one of the two original blocks and the lower boundary value of the other.
For the text in the rectangular area in Fig. 3E, the vertical baseline value is 97.7, the horizontal baseline value is 522.18, the left boundary value is 97.4, the upper boundary value is 506.0, the right boundary value is 117.5, and the lower boundary value is 525.5. It should be noted that the horizontal baseline value, vertical baseline value, left boundary value, upper boundary value, right boundary value, and lower boundary value of an original block are attribute values of the original block, and can be set automatically when the producer of the formatted document inputs the text.
In the present invention, the overlap degree is the ratio of the length over which two original blocks overlap in the measured direction to the projected length of the two original blocks in that direction. The overlap degree of two original blocks in the horizontal direction is the ratio of the difference between the minimum lower boundary and the maximum upper boundary of the two original blocks to the difference between their maximum lower boundary and minimum upper boundary. The overlap degree of two original blocks in the vertical direction is the ratio of the difference between the minimum right boundary and the maximum left boundary of the two original blocks to the difference between their maximum right boundary and minimum left boundary.
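The two overlap degrees can be computed as in the following Python sketch. The horizontal-direction formula follows the definition above; the vertical-direction formula is written by analogy with it (minimum right boundary minus maximum left boundary, divided by maximum right boundary minus minimum left boundary), which is an assumption of this sketch. The `Box` type and the function names are hypothetical, and blocks are assumed to have non-zero width and height.

```python
from dataclasses import dataclass


@dataclass
class Box:
    left: float
    top: float
    right: float
    bottom: float


def overlap_horizontal_direction(a: Box, b: Box) -> float:
    """Overlap degree in the horizontal direction, from upper/lower boundaries."""
    overlap = min(a.bottom, b.bottom) - max(a.top, b.top)
    span = max(a.bottom, b.bottom) - min(a.top, b.top)
    return overlap / span


def overlap_vertical_direction(a: Box, b: Box) -> float:
    """Overlap degree in the vertical direction, from left/right boundaries."""
    overlap = min(a.right, b.right) - max(a.left, b.left)
    span = max(a.right, b.right) - min(a.left, b.left)
    return overlap / span
```

A negative value indicates a gap between the two blocks in the measured direction, which is why thresholds such as 0 and small negative values are used in the branches.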
The present invention is described below with reference to specific embodiments:
Embodiment one:
In this embodiment, the typesetting mode of the original blocks in the text block is horizontal layout or directionless vertical layout; the original blocks in the text block are sorted in left-to-right or right-to-left horizontal order, and the text block content is then reorganized. The specific sorting method is as follows:
Step 01: sort the original blocks in the text block by the sequence number of each original block;
Step 02: for each pair of adjacent original blocks Block_i and Block_{i+1} in the text block in turn, calculate the overlap degree O_y in the horizontal direction and the overlap degree O_x in the vertical direction;
Step 03: determine the front-back positional relation of the two adjacent original blocks from the magnitude relation between their left boundary difference and right boundary difference;
Here, Ret_x can be used to represent the magnitude relation between the left boundary difference and the right boundary difference of the two adjacent original blocks. For example, if the left boundary difference is less than the right boundary difference and the typesetting mode of the original blocks in the text block is left-to-right horizontal layout or directionless vertical layout, Ret_x is -1 and the front-back positional relation of the two adjacent original blocks is: the first of the two adjacent original blocks is in front and the second is behind. If the left boundary difference is less than the right boundary difference and the typesetting mode of the original blocks in the text block is right-to-left horizontal layout, Ret_x is 1 and the front-back positional relation is: the second of the two adjacent original blocks is in front and the first is behind. If the left boundary difference is not less than the right boundary difference and the typesetting mode of the original blocks in the text block is left-to-right horizontal layout or directionless vertical layout, Ret_x is 1 and the front-back positional relation is: the second of the two adjacent original blocks is in front and the first is behind. If the left boundary difference is not less than the right boundary difference and the typesetting mode of the original blocks in the text block is right-to-left horizontal layout, Ret_x is -1 and the front-back positional relation is: the first of the two adjacent original blocks is in front and the second is behind.
Step 04: determine the up-down positional relation of the two adjacent original blocks from the magnitude relation between their upper boundary difference and lower boundary difference;
Here, Ret_y can be used to represent the magnitude relation between the upper boundary difference and the lower boundary difference of the two adjacent original blocks. For example, if the upper boundary difference is less than the lower boundary difference, Ret_y is -1 and the up-down positional relation of the two adjacent original blocks in the text block is: the first of the two adjacent original blocks is above and the second is below. If the upper boundary difference is not less than the lower boundary difference, Ret_y is 1 and the up-down positional relation of the two adjacent original blocks in the text block is: the second of the two adjacent original blocks is above and the first is below.
Step 05: set the threshold for the overlap degree O_y in the horizontal direction according to whether the two adjacent original blocks are in the same row: the threshold is set to -0.08 when they are in the same row and to -0.05 when they are in different rows. Whether the two adjacent original blocks are in the same row is judged as follows:
Compare the horizontal baseline values of the two adjacent original blocks, and take 0.95 times the font size of the original block with the larger horizontal baseline value as the line spacing;
Calculate the horizontal baseline difference of the two adjacent original blocks and compare it with the line spacing; if it is greater than the line spacing, the blocks are in different rows; otherwise, they are in the same row.
Step 06: if the two adjacent original blocks do not overlap in the horizontal direction, that is, the overlap degree O_y in the horizontal direction is less than the threshold, sort them according to the up-down positional relation;
Step 07: if the two original blocks do not overlap in the vertical direction, that is, the overlap degree in the vertical direction is less than 0, sort them according to the front-back positional relation;
Step 08: if the two original blocks overlap in both the horizontal and vertical directions and both overlap degrees are greater than 0.5, sort them according to the sequence numbers of the original blocks;
Step 09: if none of the above conditions is satisfied, sort them according to the overlap degree.
Sorting according to the overlap degree is performed as follows:
If the overlap degree in the vertical direction is less than the overlap degree in the horizontal direction, sort according to the value of Ret_x: if Ret_x is less than 0, Block_i is in front; otherwise, Block_{i+1} is in front;
If the overlap degree in the vertical direction is not less than the overlap degree in the horizontal direction, sort according to the value of Ret_y: if Ret_y is less than 0, Block_i is in front; otherwise, Block_{i+1} is in front.
Fig. 4A is a schematic diagram of a text block whose typesetting mode is horizontal layout; Fig. 4B is a schematic diagram of the content reorganization result obtained by sorting the original blocks in the text block of Fig. 4A according to the above method;
Fig. 4C is a schematic diagram of a text block whose typesetting mode is directionless vertical layout; Fig. 4D is a schematic diagram of the content reorganization result obtained by sorting the original blocks in the text block of Fig. 4C according to the above method.
Embodiment two:
In this embodiment, the typesetting mode of the original blocks in the text block is vertical layout; the original blocks in the text block are sorted in left-to-right or right-to-left vertical order, and the text block content is then reorganized. The specific sorting method is as follows:
Step 11: sort the original blocks in the text block by the sequence number of each original block;
Step 12: for each pair of adjacent original blocks Block_i and Block_{i+1} in the text block in turn, calculate the overlap degree O_y in the horizontal direction and the overlap degree O_x in the vertical direction;
Step 13: determine the front-back positional relation of the two adjacent original blocks from the magnitude relation between their left boundary difference and right boundary difference;
Here, Ret_x can be used to represent the magnitude relation between the left boundary difference and the right boundary difference of the two adjacent original blocks. For example, if the left boundary difference is less than the right boundary difference and the typesetting mode of the original blocks in the text block is left-to-right vertical layout, Ret_x is -1 and the front-back positional relation of the two adjacent original blocks is: the first of the two adjacent original blocks is in front and the second is behind. If the left boundary difference is less than the right boundary difference and the typesetting mode of the original blocks in the text block is right-to-left vertical layout, Ret_x is 1 and the front-back positional relation is: the second of the two adjacent original blocks is in front and the first is behind. If the left boundary difference is not less than the right boundary difference and the typesetting mode of the original blocks in the text block is left-to-right vertical layout, Ret_x is 1 and the front-back positional relation is: the second of the two adjacent original blocks is in front and the first is behind. If the left boundary difference is not less than the right boundary difference and the typesetting mode of the original blocks in the text block is right-to-left vertical layout, Ret_x is -1 and the front-back positional relation is: the first of the two adjacent original blocks is in front and the second is behind.
Step 14: determine the up-down positional relation of the two adjacent original blocks from the magnitude relation between their upper boundary difference and lower boundary difference;
Here, Ret_y can be used to represent the magnitude relation between the upper boundary difference and the lower boundary difference of the two adjacent original blocks. For example, if the upper boundary difference is less than the lower boundary difference, Ret_y is -1 and the up-down positional relation of the two adjacent original blocks in the text block is: the first of the two adjacent original blocks is above and the second is below. If the upper boundary difference is not less than the lower boundary difference, Ret_y is 1 and the up-down positional relation of the two adjacent original blocks in the text block is: the second of the two adjacent original blocks is above and the first is below.
Step 15: If the two adjacent original blocks do not overlap in the vertical direction, that is, their degree of overlap in the vertical direction is less than 0, sort them according to the front-back position relation;
Step 16: If the two adjacent original blocks do not overlap in the horizontal direction, that is, their degree of overlap in the horizontal direction O_y is less than 0, sort them according to the upper-lower position relation;
Step 17: If the two original blocks overlap in both the horizontal and the vertical direction, and both degrees of overlap are greater than 0.5, sort them according to the sequence numbers of the original blocks;
Step 18: If none of the above conditions is satisfied, sort them according to the degree of overlap.
The method of sorting according to the degree of overlap is as follows:
If the degree of overlap in the vertical direction is less than the degree of overlap in the horizontal direction, sort according to Ret_x: if Ret_x is less than 0, Block_i is in front; otherwise Block_{i+1} is in front;
If the degree of overlap in the vertical direction is not less than the degree of overlap in the horizontal direction, sort according to Ret_y: if Ret_y is less than 0, Block_i is in front; otherwise Block_{i+1} is in front.
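Steps 15 to 18 and the overlap-based rule above can be sketched as a single decision function. This is a minimal illustration, not the patented implementation: the function name and parameter names are hypothetical, the degrees of overlap and the values Ret_x and Ret_y are assumed to have been computed beforehand, and the sequence-number branch assumes that the smaller sequence number reads first.

```python
def first_block_leads(overlap_v, overlap_h, ret_x, ret_y, seq_i, seq_next):
    """Return True if Block_i should precede Block_{i+1}.

    overlap_v / overlap_h: degree of overlap in the vertical / horizontal
    direction; ret_x / ret_y: front-back and upper-lower relation (-1 or 1).
    """
    # Step 15: no overlap in the vertical direction -> front-back relation.
    if overlap_v < 0:
        return ret_x < 0
    # Step 16: no overlap in the horizontal direction -> upper-lower relation.
    if overlap_h < 0:
        return ret_y < 0
    # Step 17: both overlaps above 0.5 -> use the sequence numbers.
    # Assumption: the block with the smaller sequence number comes first.
    if overlap_v > 0.5 and overlap_h > 0.5:
        return seq_i <= seq_next
    # Step 18: otherwise compare the two degrees of overlap.
    if overlap_v < overlap_h:
        return ret_x < 0
    return ret_y < 0
```

The checks are evaluated in the order in which the steps are listed, so Step 15 takes priority when a pair fails to overlap in both directions.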
Fig. 4E is a schematic diagram of a text block whose typesetting mode is vertical typesetting from left to right; Fig. 4F is a schematic diagram of the content reorganization result obtained by sorting the original blocks in the text block of Fig. 4E according to the above method.
Embodiment three:
In this embodiment, the typesetting mode of the original blocks in the text block is horizontal-type oblique layout or vertical-type oblique layout. The coordinate system in which the text block lies is rotated; the original blocks in the same row are found from the coordinate positions of the original blocks in the rotated coordinate system; the original blocks in each row are then sorted in the applicable left-to-right or right-to-left typesetting order, and the text block content is reorganized. The specific sorting method is as follows:
Step 21: Sort the original blocks in the text block according to the sequence number of each original block in the text block;
Step 22: Calculate the horizontal baseline difference and the vertical baseline difference of every two adjacent original blocks after sorting; determine, among the calculated values, the horizontal baseline difference that occurs most often and the vertical baseline difference that occurs most often; from the most frequent horizontal baseline difference f_x and the most frequent vertical baseline difference f_y, calculate the slope k of the oblique line, that is: k = f_y / f_x;
Step 23: Calculate the rotation angle α of the coordinate system from the slope k, that is, the tangent of α is k, and the formula is: α = arctan(k);
Note: the mathematical symbol arctan() denotes the arctangent.
Step 24: Find the original blocks in the same row;
From the horizontal baseline value BaseY_Old and the vertical baseline value BaseX_Old of an original block, calculate, using the coordinate rotation formula, the horizontal baseline value BaseY_New and the vertical baseline value BaseX_New in the rotated coordinate system. Original blocks in the same row are found by checking whether the difference between the BaseY_New of an original block and the BaseY_New of its preceding original block is less than 0.95 times the original block height: if so, the two original blocks are in the same row; otherwise they are in different rows.
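The same-row search of Step 24 can be sketched as below. The dictionary keys, function names, and the `ratio` parameter are hypothetical. Two points are assumptions because the text leaves them open: the difference is taken as an absolute value, and the height used for the comparison is that of the current original block.

```python
import math


def rotate(x, y, alpha):
    """Coordinates of (x, y) in a coordinate system rotated by alpha."""
    x_new = x * math.cos(alpha) + y * math.sin(alpha)
    y_new = -x * math.sin(alpha) + y * math.cos(alpha)
    return x_new, y_new


def group_rows(blocks, alpha, ratio=0.95):
    """Split blocks (already sorted by sequence number) into rows.

    Each block is a dict with 'base_x', 'base_y' and 'height'.
    """
    rows = []
    prev_y = None
    for block in blocks:
        _, y_new = rotate(block["base_x"], block["base_y"], alpha)
        # Same row if BaseY_New differs from the previous block's BaseY_New
        # by less than ratio * block height (assumption: absolute difference).
        if prev_y is not None and abs(y_new - prev_y) < ratio * block["height"]:
            rows[-1].append(block)
        else:
            rows.append([block])
        prev_y = y_new
    return rows
```

For blocks lying along a 45-degree line, rotating by α = π/4 maps them to nearly the same BaseY_New, so they fall into one row.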
Step 25: Sort the original blocks within each row, for the horizontal-type and vertical-type oblique layouts, according to the relation between the absolute value of k and 1;
If the absolute value of k is not greater than 1, the typesetting mode is horizontal-type oblique layout from left to right or horizontal-type oblique layout from right to left, and the original blocks in the same row are sorted in ascending order of left boundary value.
Otherwise, the typesetting mode is vertical-type oblique layout from left to right or vertical-type oblique layout from right to left, and the original blocks in the same row are sorted in ascending order of upper boundary value.
Step 26: If the number of rows in the text block is greater than 1, sort the rows, that is, determine the order of the rows. Specifically:
Compute the mean of the sequence numbers of the original blocks in each row, and sort the rows in ascending order of this mean.
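Steps 25 and 26 can be sketched together. The function name and the dictionary keys `seq`, `left`, and `top` are hypothetical; the rows are assumed to have been produced by the same-row search of Step 24.

```python
from statistics import mean


def order_rows(rows, k):
    """Return all blocks in reading order for an oblique layout.

    rows: list of rows, each a list of dicts with 'seq', 'left' and 'top'.
    k: slope of the oblique line.
    """
    # Step 25: |k| <= 1 means horizontal-type oblique layout, so sort each
    # row by left boundary; otherwise vertical-type, sort by upper boundary.
    key = "left" if abs(k) <= 1 else "top"
    sorted_rows = [sorted(row, key=lambda b: b[key]) for row in rows]
    # Step 26: order the rows by the mean sequence number of their blocks.
    sorted_rows.sort(key=lambda row: mean(b["seq"] for b in row))
    return [block for row in sorted_rows for block in row]
```

With a single row, the second sort leaves the result unchanged, which matches the condition in Step 26 that rows are sorted only when there is more than one.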
Note: the coordinate system rotation formulas are: x' = x × cos α + y × sin α, and y' = -x × sin α + y × cos α. In the present invention, x' is BaseX_New, y' is BaseY_New, x is BaseX_Old, and y is BaseY_Old.
Fig. 4G is a schematic diagram of a text block whose typesetting mode is horizontal-type oblique layout; Fig. 4H is a schematic diagram of the content reorganization result obtained by sorting the original blocks in the text block of Fig. 4G according to the above method.
Fig. 4I is a schematic diagram of a text block whose typesetting mode is vertical-type oblique layout; Fig. 4J is a schematic diagram of the content reorganization result obtained by sorting the original blocks in the text block of Fig. 4I according to the above method.
The effect of the embodiment of the present invention is as follows: with the method provided by the present invention, the reading order of the text on a page can be established automatically according to the typesetting mode of the layout file, restoring the original page content information. After the layout file has been reverse-parsed, a person only needs to confirm the reading order of the article content, which improves the efficiency of reverse parsing and indexing.
Referring to Fig. 5, an embodiment of the present invention further provides a text block content reorganization device, which comprises:
A sorting mode determining unit 40, configured to determine, according to a predefined correspondence between typesetting modes and original block sorting modes, the original block sorting mode corresponding to the typesetting mode of the original blocks in the text block;
An original block sorting unit 41, configured to sort the original blocks in the text block according to the original block sorting mode;
A content restoration unit 42, configured to output and display the sorted original blocks.
Further, the original block sorting unit 41 specifically comprises:
A first sorting unit, configured to sort the original blocks in the text block according to the sequence number of each original block in the text block when the typesetting mode of the original blocks in the text block is horizontal or vertical typesetting;
A sorting correction unit, configured to perform the following steps for every two adjacent original blocks in the sorting result: calculate the degree of overlap of the two adjacent original blocks, and determine a new position relation of the two adjacent original blocks according to their degree of overlap; if this position relation differs from the position relation of the two adjacent original blocks in the sorting result, swap the positions of the two adjacent original blocks in the sorting result.
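The cooperation of the first sorting unit and the sorting correction unit can be sketched as a sort followed by one pass over adjacent pairs. The names `correct_order` and `first_leads` are hypothetical; `first_leads(a, b)` stands for whatever rule decides the new position relation from the degree of overlap. A single left-to-right pass, in which a swapped block is compared again with its next neighbor, is an assumption, since the text does not state how many passes are made.

```python
def correct_order(blocks, first_leads):
    """Sort blocks by sequence number, then fix adjacent pairs.

    blocks: list of dicts with at least a 'seq' key.
    first_leads(a, b): True if a should stay in front of b.
    """
    # First sorting unit: order by sequence number.
    ordered = sorted(blocks, key=lambda b: b["seq"])
    # Sorting correction unit: swap a pair when the newly determined
    # position relation differs from the one in the sorting result.
    for i in range(len(ordered) - 1):
        a, b = ordered[i], ordered[i + 1]
        if not first_leads(a, b):
            ordered[i], ordered[i + 1] = b, a
    return ordered
```

For example, passing `lambda a, b: a["x"] <= b["x"]` as `first_leads` moves a block with a larger `x` toward the end of the result.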
Further, the sorting correction unit is specifically configured to determine the new position relation of two adjacent original blocks in the following manner:
Determine the front-back position relation and the upper-lower position relation of the two adjacent original blocks;
If the degree of overlap of the two adjacent original blocks in the horizontal direction is less than a preset first threshold, determine the new position relation of the two adjacent original blocks according to their upper-lower position relation;
If the degree of overlap of the two adjacent original blocks in the vertical direction is less than 0, determine the new position relation of the two adjacent original blocks according to their front-back position relation;
If the degree of overlap of the two adjacent original blocks in the horizontal direction and their degree of overlap in the vertical direction are both greater than a preset second threshold, determine the new position relation of the two adjacent original blocks according to their sequence numbers;
If the degree of overlap of the two adjacent original blocks in the vertical direction is less than their degree of overlap in the horizontal direction, determine the new position relation of the two adjacent original blocks according to the magnitude relation between their left boundary difference and their right boundary difference;
If the degree of overlap of the two adjacent original blocks in the vertical direction is not less than their degree of overlap in the horizontal direction, determine the new position relation of the two adjacent original blocks according to the magnitude relation between their upper boundary difference and their lower boundary difference.
Further, the sorting correction unit is specifically configured to determine the front-back position relation of two adjacent original blocks in the following manner:
Calculate the left boundary difference and the right boundary difference of the two adjacent original blocks;
If the left boundary difference is less than the right boundary difference and the typesetting mode of the original blocks in the text block is horizontal typesetting from left to right, vertical typesetting from left to right, or nondirectional vertical typesetting, determine that the front-back position relation of the two adjacent original blocks is: the first of the two adjacent original blocks is in front and the second is behind;
If the left boundary difference is less than the right boundary difference and the typesetting mode of the original blocks in the text block is horizontal typesetting from right to left or vertical typesetting from right to left, determine that the front-back position relation of the two adjacent original blocks is: the second of the two adjacent original blocks is in front and the first is behind;
If the left boundary difference is not less than the right boundary difference and the typesetting mode of the original blocks in the text block is horizontal typesetting from left to right, vertical typesetting from left to right, or nondirectional vertical typesetting, determine that the front-back position relation of the two adjacent original blocks in the text block is: the second of the two adjacent original blocks is in front and the first is behind;
If the left boundary difference is not less than the right boundary difference and the typesetting mode of the original blocks in the text block is horizontal typesetting from right to left or vertical typesetting from right to left, determine that the front-back position relation of the two adjacent original blocks in the text block is: the first of the two adjacent original blocks is in front and the second is behind.
Further, the sorting correction unit is specifically configured to determine the upper-lower position relation of two adjacent original blocks in the text block in the following manner:
Calculate the upper boundary difference and the lower boundary difference of the two adjacent original blocks;
If the upper boundary difference is less than the lower boundary difference, determine that the upper-lower position relation of the two adjacent original blocks in the text block is: the first of the two adjacent original blocks is above and the second is below;
If the upper boundary difference is not less than the lower boundary difference, determine that the upper-lower position relation of the two adjacent original blocks in the text block is: the second of the two adjacent original blocks is above and the first is below.
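The four front-back cases and the two upper-lower cases reduce to two small functions. This is a sketch under stated assumptions: the mode strings and function names are hypothetical, the boundary differences are passed in already computed, and the return values follow the Ret_x and Ret_y convention used earlier in the description (-1 when the first original block leads, 1 when the second leads).

```python
# Hypothetical labels for the typesetting modes named in the text.
LEFT_TO_RIGHT = {"horizontal_ltr", "vertical_ltr", "vertical_nondirectional"}
RIGHT_TO_LEFT = {"horizontal_rtl", "vertical_rtl"}


def front_back(left_diff, right_diff, mode):
    """Return -1 if the first block is in front, 1 if the second is."""
    if mode not in LEFT_TO_RIGHT | RIGHT_TO_LEFT:
        raise ValueError(f"unknown typesetting mode: {mode}")
    smaller = left_diff < right_diff
    left_to_right = mode in LEFT_TO_RIGHT
    # First block leads when (smaller, left-to-right) or
    # (not smaller, right-to-left); otherwise the second block leads.
    return -1 if smaller == left_to_right else 1


def upper_lower(top_diff, bottom_diff):
    """Return -1 if the first block is above, 1 if the second is."""
    return -1 if top_diff < bottom_diff else 1
```

Writing the front-back rule as `smaller == left_to_right` makes the symmetry explicit: reversing the typesetting direction reverses the outcome of every case.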
Further, the sorting correction unit is specifically configured to determine the new position relation of two adjacent original blocks from the magnitude relation between their left boundary difference and their right boundary difference in the following manner:
If the left boundary difference is less than the right boundary difference and the typesetting mode of the original blocks in the text block is horizontal typesetting from left to right or vertical typesetting from left to right, determine that the front-back position relation of the two adjacent original blocks is: the first of the two adjacent original blocks is in front and the second is behind;
If the left boundary difference is less than the right boundary difference and the typesetting mode of the original blocks in the text block is horizontal typesetting from right to left or vertical typesetting from right to left, determine that the front-back position relation of the two adjacent original blocks is: the second of the two adjacent original blocks is in front and the first is behind;
If the left boundary difference is not less than the right boundary difference and the typesetting mode of the original blocks in the text block is horizontal typesetting from left to right or vertical typesetting from left to right, determine that the front-back position relation of the two adjacent original blocks in the text block is: the second of the two adjacent original blocks is in front and the first is behind;
If the left boundary difference is not less than the right boundary difference and the typesetting mode of the original blocks in the text block is horizontal typesetting from right to left or vertical typesetting from right to left, determine that the front-back position relation of the two adjacent original blocks in the text block is: the first of the two adjacent original blocks is in front and the second is behind.
Further, the sorting correction unit is specifically configured to determine the new position relation of two adjacent original blocks from the magnitude relation between their upper boundary difference and their lower boundary difference in the following manner:
If the upper boundary difference is less than the lower boundary difference, determine that the upper-lower position relation of the two adjacent original blocks in the text block is: the first of the two adjacent original blocks is above and the second is below;
If the upper boundary difference is not less than the lower boundary difference, determine that the upper-lower position relation of the two adjacent original blocks in the text block is: the second of the two adjacent original blocks is above and the first is below.
When the typesetting mode of the original blocks in the text block is horizontal typesetting or nondirectional vertical typesetting and the two adjacent original blocks are in the same row of the text block, the first threshold may take a value between -0.07 and -0.1; when the typesetting mode of the original blocks in the text block is horizontal typesetting or nondirectional vertical typesetting and the two adjacent original blocks are in different rows of the text block, the first threshold may take a value between -0.03 and -0.06; when the typesetting mode of the original blocks in the text block is vertical typesetting from left to right or from right to left, the first threshold may be 0. The second threshold may take a value between 0.5 and 1.
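The threshold ranges above can be collected in a small lookup, shown here as a sketch. The function name, the constant name, and the mode strings are hypothetical; the function returns the permitted range as a (low, high) pair rather than choosing a value, since the text only states the ranges.

```python
# Range stated for the second threshold.
SECOND_THRESHOLD_RANGE = (0.5, 1.0)


def first_threshold_range(mode, same_row):
    """Return the (low, high) range stated for the first threshold.

    mode: hypothetical label of the typesetting mode.
    same_row: whether the two adjacent original blocks share a row.
    """
    if mode in ("vertical_ltr", "vertical_rtl"):
        # Directional vertical typesetting: the threshold may be 0.
        return (0.0, 0.0)
    if mode in ("horizontal", "vertical_nondirectional"):
        return (-0.1, -0.07) if same_row else (-0.06, -0.03)
    raise ValueError(f"unknown typesetting mode: {mode}")
```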
Further, the sorting correction unit is also configured to determine whether two adjacent original blocks are in the same row of the text block in the following manner:
Compare the horizontal baseline values of the two adjacent original blocks, and determine the line spacing according to the font size of the original block that has the larger horizontal baseline value;
Calculate the horizontal baseline difference of the two adjacent original blocks; if the horizontal baseline difference is greater than the line spacing, determine that the two adjacent original blocks are in different rows of the text block; otherwise, determine that the two adjacent original blocks are in the same row of the text block.
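The final comparison of the same-row test can be sketched as below. The function name is hypothetical. How the line spacing is derived from the font size is not specified in the text, so it is passed in as a parameter; taking the absolute value of the baseline difference is also an assumption.

```python
def in_same_row(base_y_a, base_y_b, line_spacing):
    """Return True if two adjacent original blocks are in the same row.

    base_y_a, base_y_b: horizontal baseline values of the two blocks.
    line_spacing: line spacing derived from the font size (given here).
    """
    # Different rows only when the baseline difference exceeds the spacing.
    return abs(base_y_a - base_y_b) <= line_spacing
```

The result can be used to select the applicable range of the first threshold, which depends on whether the two original blocks share a row.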
Further, the original block sorting unit 41 specifically comprises:
A second sorting unit, configured to sort the original blocks in the text block according to the sequence number of each original block in the text block when the typesetting mode of the original blocks in the text block is oblique layout;
A rotation transformation unit, configured to apply a rotation transformation to the coordinate system in which the text block lies;
A grouping unit, configured to group the sorted original blocks according to their position coordinates in the rotated coordinate system, such that the original blocks in the same group lie in the same row or the same column;
A third sorting unit, configured to sort the groups according to the mean original block sequence number of each group and, for each group, to sort the original blocks in the group according to the typesetting direction of the text block.
Further, the rotation transformation unit is specifically configured to:
Calculate the horizontal baseline difference and the vertical baseline difference of every two adjacent original blocks after sorting; determine, among the calculated values, the horizontal baseline difference that occurs most often and the vertical baseline difference that occurs most often; and calculate the slope of the oblique line from the most frequent horizontal baseline difference and the most frequent vertical baseline difference;
Calculate the rotation angle of the coordinate system from the slope;
Apply a rotation transformation to the coordinate system in which the text block lies according to the rotation angle.
Further, the grouping unit is specifically configured to perform the following steps for every two adjacent original blocks after sorting:
Calculate the horizontal baseline values of the preceding original block and the following original block of the two adjacent original blocks in the rotated coordinate system; determine whether the difference between the calculated horizontal baseline value of the following original block and that of the preceding original block is less than a predefined third threshold; if so, determine that the preceding original block and the following original block are in the same row, and place them in the same group; otherwise, determine that the preceding original block and the following original block are in different rows, and place them in different groups. The third threshold may be 0.91 to 1 times the original block height.
Further, the third sorting unit is specifically configured to sort the original blocks in a group according to the typesetting direction of the text block in the following manner:
When the slope is greater than 1, determine that the typesetting mode of the original blocks in the text block is vertical-type oblique layout, and sort the original blocks in the group in ascending order of their upper boundary values;
When the slope is not greater than 1, determine that the typesetting mode of the original blocks in the text block is horizontal-type oblique layout, and sort the original blocks in the group in ascending order of their left boundary values.
In summary, the beneficial effects of the present invention include the following:
In the scheme provided by the embodiment of the present invention, the original block sorting mode corresponding to the typesetting mode of the original blocks in the text block is determined according to the predefined correspondence between typesetting modes and original block sorting modes; the original blocks in the text block are then sorted according to this sorting mode, and the sorted original blocks are output and displayed. It can be seen that, with the present invention, the original blocks in a text block that uses a given typesetting mode can be sorted so as to restore the reading order of the original blocks in that text block.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, and the instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.