CN102968407B - Method and device for constructing a double-layer PDF file - Google Patents


Info

Publication number
CN102968407B
Authority
CN
China
Prior art keywords
character
character image
double
pdf file
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110256474.9A
Other languages
Chinese (zh)
Other versions
CN102968407A (en)
Inventor
王晓健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hanwang Technology Co Ltd
Original Assignee
Hanwang Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hanwang Technology Co Ltd
Priority to CN201110256474.9A
Publication of CN102968407A
Application granted
Publication of CN102968407B

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The present invention discloses a method and device for constructing a double-layer PDF file, relates to the technical field of computer information processing, and improves the display quality of double-layer PDF files. The method comprises: obtaining each character in the character image produced by scanning original text material and applying OCR; calculating, according to the reference character specified for each character in the character image, the target character size and target display position of each character in the double-layer PDF file; calculating the horizontal stretch coefficient and vertical stretch coefficient of each character from its target character size in the double-layer PDF file and its original size in the character image; and generating the double-layer PDF file according to the target character size, target display position, horizontal stretch coefficient and vertical stretch coefficient of each character. Embodiments of the invention are mainly used in the production of double-layer PDF files.

Description

Method and device for constructing a double-layer PDF file
Technical field
The present invention relates to the technical field of computer information processing, and in particular to a method and device for constructing a double-layer PDF file.
Background
A double-layer PDF (Portable Document Format) file is a PDF file with a layered structure: its content comprises both a text layer and an image layer, and positions in the text layer correspond one-to-one with positions in the image layer. A double-layer PDF file is produced by scanning paper material with a scanner to obtain a scanned image, applying stain removal, skew correction and OCR (Optical Character Recognition), and then directly generating a searchable PDF file. The file has two layers: the upper layer is the original image and the lower layer is the recognition result. It therefore fully preserves the original page layout while supporting selection, copying and searching, which makes it easy to build index databases and manage documents systematically.
However, with existing construction methods, when mechanically printed typefaces such as those in ancient books are made into a double-layer PDF file, the modern font library contains no character models corresponding to these typefaces. As a result, when the resulting file is displayed, the size of the display block for each such character cannot be determined accurately. The characters look mismatched when displayed, the consistency of the page layout is destroyed, and the display quality of the double-layer PDF file is reduced.
Summary of the invention
Embodiments of the invention provide a method and device for constructing a double-layer PDF file that improve the display quality of the double-layer PDF file.
To achieve the above object, embodiments of the invention adopt the following technical solutions:
A method for constructing a double-layer PDF file, comprising:
Obtaining each character in the character image produced by scanning original text material and applying OCR;
Calculating, according to the reference character specified for each character in the character image, the target character size and target display position of each character in the double-layer PDF file;
Calculating the horizontal stretch coefficient and vertical stretch coefficient of each character in the character image from the target character size of the character in the double-layer PDF file and the original size of the character in the character image;
Generating the double-layer PDF file according to the target character size, target display position, horizontal stretch coefficient and vertical stretch coefficient of each character in the character image.
A device for constructing a double-layer PDF file, comprising:
An acquiring unit, configured to obtain each character in the character image produced by scanning original text material and applying OCR;
A first computing unit, configured to calculate, according to the reference character specified for each character in the character image, the target character size and target display position of each character in the double-layer PDF file;
A second computing unit, configured to calculate the horizontal stretch coefficient and vertical stretch coefficient of each character in the character image from the target character size of the character in the double-layer PDF file and the original size of the character in the character image;
A generation unit, configured to generate the double-layer PDF file according to the target character size, target display position, horizontal stretch coefficient and vertical stretch coefficient of each character in the character image.
In the embodiments described by the above technical solutions, the target character size and target display position of each character in the double-layer PDF file are calculated, the horizontal and vertical stretch coefficients of each character are calculated, and the double-layer PDF file is generated from these four values. Because each character can be processed according to its horizontal and vertical stretch coefficients when the file is generated, characters can be stretched by different ratios horizontally and vertically, so the character model in the PDF approximates the appearance of the character in the paper material as closely as possible, which improves the display quality of the double-layer PDF file. This is especially significant for reproducing special characters in ancient books.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for constructing a double-layer PDF file provided by Embodiment 1 of the invention;
Fig. 2 is a schematic diagram of an implementation of the method for constructing a double-layer PDF file provided by Embodiment 1 of the invention;
Fig. 3 is a flowchart of another method for constructing a double-layer PDF file provided by Embodiment 1 of the invention;
Fig. 4 is a structural diagram of a device for constructing a double-layer PDF file provided by Embodiment 2 of the invention;
Fig. 5 is a structural diagram of another device for constructing a double-layer PDF file provided by Embodiment 2 of the invention.
Detailed description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
Embodiment 1
This embodiment of the invention provides a method for constructing a double-layer PDF file which, as shown in Fig. 1, comprises:
101. Obtain each character in the character image produced by scanning original text material and applying OCR;
Specifically, the character image is obtained by scanning the original text material, and the recognition result and image coordinates of each character in the character image are obtained by optical character recognition (OCR).
102. Calculate, according to the reference character specified for each character in the character image, the target character size and target display position of each character in the double-layer PDF file;
103. Calculate the horizontal stretch coefficient and vertical stretch coefficient of each character in the character image from the target character size of the character in the double-layer PDF file and the original size of the character in the character image;
104. Generate the double-layer PDF file according to the target character size, target display position, horizontal stretch coefficient and vertical stretch coefficient of each character in the character image.
In this embodiment, the target character size and target display position of each character in the double-layer PDF file are calculated, the horizontal and vertical stretch coefficients of each character are calculated, and the double-layer PDF file is generated from these four values. Because each character can be processed according to its horizontal and vertical stretch coefficients when the file is generated, characters can be stretched by different ratios horizontally and vertically, so the character model in the PDF approximates the appearance of the character in the paper material as closely as possible, which improves the display quality of the double-layer PDF file.
The implementation of this embodiment is described in detail below with reference to Fig. 2. From left to right, Fig. 2 shows the original character image, the reference character, the calculated unstretched character model, the stretched character model, and the match between the displayed character model and the original character image. Regions filled with vertical stripes are original image regions, regions filled with horizontal stripes are the calculated character model regions as displayed, and regions shown as a grid are where the two coincide.
Further, in step 102 above, the target character size and target display position of each character in the double-layer PDF file can be calculated as follows:
First, the target character size of each character in the double-layer PDF file is calculated according to $CalcS_i = ImgH_i \times RefS / RefH_i$, where CalcS_i is the target character size of the i-th character of the character image in the double-layer PDF file, ImgH_i is the original height of the i-th character in the character image, RefS is the size of the specified reference character, and RefH_i is the height of the character picture (glyph) in the character model of the reference character corresponding to the i-th character, at size RefS.
Fig. 2 shows ImgH, RefH, RefS and CalcS in turn, where RefH is the height of the specified reference character at size RefS.
For example, the TrueType Song typeface may be chosen as the specified reference character, and RefS may be font size No. 4. Once the size (that is, the font size) of a character is determined, the height and width of the character are determined as well.
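The size calculation in this step can be sketched in a few lines. This is a minimal illustration, assuming the target size scales linearly with glyph height, i.e. CalcS_i = ImgH_i × RefS / RefH_i; the function name and the sample numbers are hypothetical and are not taken from the patent.

```python
def target_char_size(img_h: float, ref_size: float, ref_h: float) -> float:
    """Scale the reference size so the glyph height matches the scanned height.

    img_h    -- ImgH_i, height of the scanned character in pixels
    ref_size -- RefS, size at which the reference character was measured
    ref_h    -- RefH_i, glyph height of the reference character at ref_size
    """
    if ref_h <= 0:
        raise ValueError("reference glyph height must be positive")
    return img_h * ref_size / ref_h


# Hypothetical values: the reference glyph is 40 px tall at size 48,
# and the scanned character is 60 px tall.
calc_s = target_char_size(img_h=60, ref_size=48, ref_h=40)
print(calc_s)  # 72.0
```

With these numbers the scanned character is 1.5 times as tall as the reference glyph, so the reference size is enlarged by the same factor.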
Second, the components of the character model corresponding to each character in the character image are calculated from CalcS_i, namely CalcH_i, CalcW_i, CalcX_i, CalcY_i, CellH_i and CellW_i. Here CalcS_i is the target character size of the i-th character of the character image in the double-layer PDF file; the character model is the display block that corresponds to a character of the character image in the double-layer PDF file; CalcH_i is the height of the character picture (glyph) in the character model; CalcW_i is the width of the character picture in the character model; CalcX_i is the horizontal offset of the character picture from the upper-left corner of the model; CalcY_i is the vertical offset of the character picture from the upper-left corner of the model; CellH_i is the height of the character model; and CellW_i is the width of the character model.
The components of the character model can be calculated from CalcS_i using the methods provided by the existing low-level Windows function modules (general-purpose computing modules). Specifically, the font description file of the operating system is read for the character model corresponding to each character, and the components of that character model are obtained from the font description file. A more detailed implementation can be found in the related prior art.
Note that, ignoring the error introduced by multiplication and division, CalcH_i should equal ImgH_i, and CellH_i should equal CalcS_i.
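The patent obtains the model components from the operating system's font description file. As a stand-in for that lookup, the sketch below simply scales a set of reference metrics linearly with the font size, which is an assumption (it ignores hinting and rounding). The `CharModel` class, the `scale_model` function and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CharModel:
    calc_h: float  # CalcH: height of the glyph inside the model
    calc_w: float  # CalcW: width of the glyph inside the model
    calc_x: float  # CalcX: horizontal offset of the glyph from the model's upper-left corner
    calc_y: float  # CalcY: vertical offset of the glyph from the model's upper-left corner
    cell_h: float  # CellH: height of the whole model (display block)
    cell_w: float  # CellW: width of the whole model (display block)


def scale_model(ref: CharModel, ref_size: float, calc_s: float) -> CharModel:
    """Scale metrics measured at ref_size to the target size calc_s."""
    k = calc_s / ref_size
    return CharModel(ref.calc_h * k, ref.calc_w * k, ref.calc_x * k,
                     ref.calc_y * k, ref.cell_h * k, ref.cell_w * k)


# Hypothetical reference metrics at size 48, scaled to the target size 72.
ref = CharModel(calc_h=40, calc_w=44, calc_x=2, calc_y=5, cell_h=48, cell_w=48)
model = scale_model(ref, ref_size=48, calc_s=72)
print(model.calc_h, model.cell_h)  # 60.0 72.0
```

In this example the scaled glyph height (60) equals the scanned height used to derive the target size, and the cell height equals the target size, which is the consistency check the text describes.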
Next, the target display position of each character in the character image is calculated according to ShowPt_X = ImgPt_X - CalcX_i and ShowPt_Y = ImgPt_Y - CalcY_i, where ShowPt_X and ShowPt_Y are the coordinates of the target display position of the character, and ImgPt_X and ImgPt_Y are the coordinates of the upper-left corner of the character in the character image. For example, ImgPt is the upper-left corner of the original block containing the character, shown at the far left of Fig. 2. Fig. 2 also shows the point to be matched, MatchPt. Setting MatchPt = ImgPt and subtracting the offsets CalcX_i and CalcY_i from the coordinates of ImgPt gives the coordinates of the target display position ShowPt of the character. When the double-layer PDF file is later generated, placing the character model at the target display position ShowPt makes MatchPt coincide with ImgPt, which produces the matching effect shown in Fig. 2.
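The position calculation above is a plain coordinate subtraction. A minimal sketch, with a hypothetical function name and sample values, all in image pixels:

```python
def target_display_position(img_pt, calc_x, calc_y):
    """Return ShowPt = (ImgPt_X - CalcX, ImgPt_Y - CalcY).

    img_pt -- (x, y) of the character's upper-left corner in the character image
    calc_x -- horizontal offset of the glyph inside its character model
    calc_y -- vertical offset of the glyph inside its character model
    """
    return (img_pt[0] - calc_x, img_pt[1] - calc_y)


# Hypothetical values: the scanned glyph starts at (120, 300) and sits
# 3 px right of and 7.5 px below the model's upper-left corner.
show_pt = target_display_position((120, 300), calc_x=3, calc_y=7.5)
print(show_pt)  # (117, 292.5)
```

Placing the model's upper-left corner at `show_pt` puts the glyph's own upper-left corner back at (120, 300), so it lines up with the scanned character.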
Further, in step 103 above, the horizontal stretch coefficient and vertical stretch coefficient of each character in the character image are calculated as follows:
The vertical stretch coefficient of each character in the double-layer PDF file is calculated according to $R_i = ImgH_i / CalcH_i$, where R_i is the vertical stretch coefficient of the i-th character of the character image in the double-layer PDF file, ImgH_i is the original height of the i-th character in the character image, and CalcH_i is the height of the character picture in the character model corresponding to the i-th character in the double-layer PDF file. Note that, ignoring the error introduced by multiplication and division, CalcH_i should equal ImgH_i, so R_i approaches 1. The horizontal stretch coefficient of each character in the double-layer PDF file is calculated according to $S_i = ImgW_i / CalcW_i$, where S_i is the horizontal stretch coefficient of the i-th character of the character image in the double-layer PDF file, CalcW_i is the width of the character picture in the character model corresponding to the i-th character in the double-layer PDF file, and ImgW_i is the original width of the i-th character in the character image.
Then CalcY_i is adjusted according to CalcY_i = CalcY_i × R_i, and CalcX_i is adjusted according to CalcX_i = CalcX_i × S_i. Because CalcX_i and CalcY_i change with the horizontal and vertical stretch coefficients, they must be updated whenever the coefficients change. Likewise, ShowPt_X is adjusted according to ShowPt_X = ImgPt_X - CalcX_i, and ShowPt_Y is adjusted according to ShowPt_Y = ImgPt_Y - CalcY_i.
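The stretch step can be sketched as follows. This is an illustration that assumes each coefficient is the ratio of the scanned dimension to the model's glyph dimension (R_i = ImgH_i / CalcH_i, S_i = ImgW_i / CalcW_i), so that multiplying the model by the coefficient reproduces the scanned size. Function names and values are hypothetical.

```python
def stretch_coefficients(img_w, img_h, calc_w, calc_h):
    """Return (S, R): horizontal and vertical stretch coefficients."""
    return img_w / calc_w, img_h / calc_h


def adjusted_display_position(img_pt, calc_x, calc_y, s, r):
    """Scale the glyph offsets by the stretch, then recompute ShowPt."""
    calc_x *= s  # the horizontal offset shrinks or grows with the horizontal stretch
    calc_y *= r  # the vertical offset shrinks or grows with the vertical stretch
    return (img_pt[0] - calc_x, img_pt[1] - calc_y)


# Hypothetical narrow character: the scanned glyph is 33 x 60 px,
# while the model's glyph at the target size is 66 x 60 px.
s, r = stretch_coefficients(img_w=33, img_h=60, calc_w=66, calc_h=60)
show_pt = adjusted_display_position((120, 300), calc_x=3, calc_y=7.5, s=s, r=r)
print(s, r, show_pt)  # 0.5 1.0 (118.5, 292.5)
```

Because the target size was derived from the height, the vertical coefficient comes out as 1 here, while the horizontal coefficient carries the actual correction.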
Once the target character size, target display position, horizontal stretch coefficient and vertical stretch coefficient of each character in the character image have been obtained, the double-layer PDF file can be generated from them. Specifically, these four values for each character are written into the double-layer PDF file in accordance with the relevant provisions of the PDF file format.
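The patent leaves the PDF encoding to the file format specification. One way to express a per-character size, position and non-uniform stretch in a PDF content stream is the text matrix operator `Tm`, sketched below. This is an assumption about how a generator might do it, not the patent's stated procedure. The function name, the font resource name `/F1` and the sample code `0041` are hypothetical. The caller is assumed to have already converted image pixels to PDF user-space units and flipped the y axis (PDF's default origin is the bottom-left corner), and to pass the baseline origin of the glyph.

```python
def text_ops(hex_codes, font_size, s, r, x, y, font="/F1", invisible=False):
    """Build content-stream operators that draw one stretched character.

    hex_codes -- character code(s) as a hex string in the font's encoding
    font_size -- target character size in user-space units
    s, r      -- horizontal and vertical stretch coefficients
    x, y      -- text origin in PDF user space
    """
    a = font_size * s  # horizontal scale entry of the text matrix
    d = font_size * r  # vertical scale entry of the text matrix
    lines = ["BT"]
    if invisible:
        lines.append("3 Tr")  # text rendering mode 3: neither fill nor stroke
    lines.append(f"{font} 1 Tf")  # size 1; the text matrix carries the real size
    lines.append(f"{a:.3f} 0 0 {d:.3f} {x:.3f} {y:.3f} Tm")
    lines.append(f"<{hex_codes}> Tj")
    lines.append("ET")
    return "\n".join(lines) + "\n"


ops = text_ops("0041", font_size=72, s=0.5, r=1.0, x=10, y=20)
print(ops)
```

Setting the font size to 1 in `Tf` and putting the scaled sizes into the matrix keeps the horizontal and vertical scale independent, which is what the two stretch coefficients require. The `invisible` flag is an optional variant; the patent instead describes the text as the lower layer beneath the image.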
In the vertically typeset pages of ancient books, Chinese characters such as "seven" (七) and "nine" (九) are set vertically narrow, unlike the conventional Song typeface, whose characters are as wide as they are tall. The method provided by this embodiment calculates the horizontal and vertical stretch coefficients of each character and scales the character accordingly, so that the character model in the final PDF approximates the appearance of the character in the paper material as closely as possible, which improves the display quality of the double-layer PDF file.
Further, optionally, to improve the display effect of the double-layer PDF file, this embodiment can also align the text within the row in which each character is located. As shown in Fig. 3, the method further comprises:
105. When the characters in the character image are typeset horizontally, obtain the upper boundary value and lower boundary value of the row in which each character in the character image is located.
Specifically, when the characters in the character image are typeset horizontally, the mean of the upper boundary values of all characters in the row containing the current character is taken as the upper boundary value of that row, and the mean of the lower boundary values of all characters in the row containing the current character is taken as the lower boundary value of that row.
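The averaging of row boundaries can be sketched as below. The input layout (a list of per-character `(top, bottom)` pixel values for one row) and the function name are hypothetical; the patent does not prescribe a data structure.

```python
def row_bounds(char_boxes):
    """Return (upper, lower) boundary of a row as the mean over its characters.

    char_boxes -- list of (top, bottom) pixel values, one pair per character
    """
    if not char_boxes:
        raise ValueError("row has no characters")
    upper = sum(top for top, _ in char_boxes) / len(char_boxes)
    lower = sum(bottom for _, bottom in char_boxes) / len(char_boxes)
    return upper, lower


# Hypothetical row of three characters with slightly different extents.
upper, lower = row_bounds([(100, 160), (102, 158), (98, 162)])
print(upper, lower)  # 100.0 160.0
```

Using the mean evens out per-character differences so that every character in the row is later stretched against the same pair of boundaries.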
106. Adjust the vertical stretch coefficient of each character in the character image according to the upper boundary value and lower boundary value of the row in which the character is located.
Specifically, the vertical stretch coefficient of each character is adjusted according to $R_i = (P_{ij} - Q_{ij}) / H_i$, where R_i is the adjusted vertical stretch coefficient of the i-th character in the character image, P_ij is the lower boundary value of the j-th row, in which the i-th character is located, Q_ij is the upper boundary value of the j-th row, in which the i-th character is located, and H_i is the height corresponding to the i-th character at the current font size.
The double-layer PDF file is then regenerated using the adjusted vertical stretch coefficient of each character.
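The row-based adjustment can be sketched as follows, assuming the adjusted coefficient is the row height divided by the character's height at the current font size, R_i = (P_ij - Q_ij) / H_i, with pixel coordinates that increase downward so the lower boundary value is the larger one. The function name and numbers are hypothetical.

```python
def row_aligned_vertical_stretch(lower, upper, h):
    """Return the adjusted vertical stretch coefficient for one character.

    lower -- P: lower boundary value of the character's row (larger y)
    upper -- Q: upper boundary value of the character's row (smaller y)
    h     -- H: height of the character at its current font size
    """
    if h <= 0:
        raise ValueError("character height must be positive")
    return (lower - upper) / h


# Hypothetical row spanning y = 100..160 and a character that is 72 px tall.
r_adj = row_aligned_vertical_stretch(lower=160.0, upper=100.0, h=72)
print(round(r_adj, 4))  # 0.8333
```

Every character in the row then ends up with the same displayed height, which is what aligns the row.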
Note that in this embodiment the height, width, horizontal offset and vertical offset of a character are all measured in pixels.
The method shown in Fig. 3 describes row alignment for horizontally typeset text. This embodiment can also align text that is typeset vertically, in which case alignment is performed per column: the left boundary value and right boundary value of the column in which each character is located are obtained, and the horizontal stretch coefficient of each character in the character image is adjusted according to those values. The implementation is similar to the method shown in Fig. 3 and is not repeated here.
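The patent gives no formula for the vertical-typesetting case and only says it is similar to the row case. By analogy, one plausible sketch averages the left and right boundaries of a column and divides the column width by the character's width at the current font size. Both the formula and the names below are assumptions made for illustration.

```python
def column_aligned_horizontal_stretch(char_boxes, w):
    """Return an adjusted horizontal stretch coefficient for a column.

    char_boxes -- list of (left, right) pixel values, one pair per character
    w          -- width of the character at its current font size
    """
    if not char_boxes or w <= 0:
        raise ValueError("need at least one character and a positive width")
    left = sum(l for l, _ in char_boxes) / len(char_boxes)
    right = sum(r for _, r in char_boxes) / len(char_boxes)
    return (right - left) / w


# Hypothetical column of three characters and a character that is 72 px wide.
s_adj = column_aligned_horizontal_stretch([(200, 260), (202, 262), (198, 258)], w=72)
print(round(s_adj, 4))  # 0.8333
```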
Embodiment 2
This embodiment of the invention provides a device for constructing a double-layer PDF file which, as shown in Fig. 4, comprises an acquiring unit 11, a first computing unit 12, a second computing unit 13 and a generation unit 14.
The acquiring unit 11 is configured to obtain each character in the character image produced by scanning original text material and applying optical character recognition (OCR);
The first computing unit 12 is configured to calculate, according to the reference character specified for each character in the character image, the target character size and target display position of each character in the double-layer PDF file;
The second computing unit 13 is configured to calculate the horizontal stretch coefficient and vertical stretch coefficient of each character in the character image from the target character size of the character in the double-layer PDF file and the original size of the character in the character image;
The generation unit 14 is configured to generate the double-layer PDF file according to the target character size, target display position, horizontal stretch coefficient and vertical stretch coefficient of each character in the character image.
In this embodiment, the target character size and target display position of each character in the double-layer PDF file are calculated, the horizontal and vertical stretch coefficients of each character are calculated, and the double-layer PDF file is generated from these four values. Because each character can be processed according to its horizontal and vertical stretch coefficients when the file is generated, characters can be stretched by different ratios horizontally and vertically, so the character model in the PDF approximates the appearance of the character in the paper material as closely as possible, which improves the display quality of the double-layer PDF file.
Further, the first computing unit 12 is specifically configured to calculate the target character size of each character in the double-layer PDF file according to $CalcS_i = ImgH_i \times RefS / RefH_i$, where CalcS_i is the target character size of the i-th character of the character image in the double-layer PDF file, ImgH_i is the original height of the i-th character in the character image, RefS is the size of the specified reference character, and RefH_i is the height of the character picture in the character model of the reference character corresponding to the i-th character, at size RefS.
The first computing unit 12 is further specifically configured to calculate, from CalcS_i, the components of the character model corresponding to each character in the character image, the components comprising CalcH_i, CalcW_i, CalcX_i, CalcY_i, CellH_i and CellW_i. Here CalcS_i is the target character size of the i-th character of the character image in the double-layer PDF file; the character model is the display block that corresponds to a character of the character image in the double-layer PDF file; CalcH_i is the height of the character picture in the character model; CalcW_i is the width of the character picture in the character model; CalcX_i is the horizontal offset of the character picture from the upper-left corner of the model; CalcY_i is the vertical offset of the character picture from the upper-left corner of the model; CellH_i is the height of the character model; and CellW_i is the width of the character model.
The first computing unit 12 is further specifically configured to calculate the target display position of each character in the character image according to ShowPt_X = ImgPt_X - CalcX_i and ShowPt_Y = ImgPt_Y - CalcY_i, where ShowPt_X and ShowPt_Y are the coordinates of the target display position of the character, and ImgPt_X and ImgPt_Y are the coordinates of the upper-left corner of the character in the character image.
Further, the second computing unit 13 is specifically configured to calculate the vertical stretch coefficient of each character in the double-layer PDF file according to $R_i = ImgH_i / CalcH_i$, where R_i is the vertical stretch coefficient of the i-th character of the character image in the double-layer PDF file, CalcH_i is the height of the character picture in the character model corresponding to the i-th character in the double-layer PDF file, and ImgH_i is the original height of the i-th character in the character image;
The second computing unit 13 is further specifically configured to calculate the horizontal stretch coefficient of each character in the double-layer PDF file according to $S_i = ImgW_i / CalcW_i$, where S_i is the horizontal stretch coefficient of the i-th character of the character image in the double-layer PDF file, CalcW_i is the width of the character picture in the character model corresponding to the i-th character in the double-layer PDF file, and ImgW_i is the original width of the i-th character in the character image.
As shown in Fig. 5, the device further comprises an adjustment unit 16, configured to adjust CalcY_i according to CalcY_i = CalcY_i × R_i and to adjust CalcX_i according to CalcX_i = CalcX_i × S_i, and likewise to adjust ShowPt_X according to ShowPt_X = ImgPt_X - CalcX_i and ShowPt_Y according to ShowPt_Y = ImgPt_Y - CalcY_i.
Further, optionally, to improve the display effect of the double-layer PDF file, this embodiment can also align the text within the row in which each character is located. As shown in Fig. 5, the device further comprises a boundary value acquiring unit 15.
The boundary value acquiring unit 15 is configured to obtain, when the characters in the character image are typeset horizontally, the upper boundary value and lower boundary value of the row in which each character in the character image is located.
Specifically, when the characters in the character image are typeset horizontally, the boundary value acquiring unit 15 takes the mean of the upper boundary values of all characters in the row containing the current character as the upper boundary value of that row. The boundary value acquiring unit 15 also takes the mean of the lower boundary values of all characters in the row containing the current character as the lower boundary value of that row.
The adjustment unit 16 is further configured to adjust the vertical stretch coefficient of each character in the character image according to the upper boundary value and lower boundary value of the row in which the character is located.
Specifically, the adjustment unit 16 is configured to adjust the vertical stretch coefficient of each character according to $R_i = (P_{ij} - Q_{ij}) / H_i$, where R_i is the adjusted vertical stretch coefficient of the i-th character in the character image, P_ij is the lower boundary value of the j-th row, in which the i-th character is located, Q_ij is the upper boundary value of the j-th row, in which the i-th character is located, and H_i is the height of the character model corresponding to the i-th character at the current font size.
The boundary value acquiring unit 15 is further configured to obtain, when the characters in the character image are typeset vertically, the left boundary value and right boundary value of the column in which each character in the character image is located.
The adjustment unit 16 is further configured to adjust the horizontal stretch coefficient of each character in the character image according to the left boundary value and right boundary value of the column in which the character is located.
The embodiments of the present invention are mainly used in the process of producing double-layer PDF files.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software together with the necessary general-purpose hardware, or by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, hard disk or optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods described in the embodiments of the present invention.
The above is only a specific embodiment of the present invention, and the scope of protection of the present invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims (10)

1. A method for constructing a double-layer PDF file, characterized by comprising:
Obtaining each character in a character image produced by scanning original text data and performing OCR recognition;
Calculating, according to a designated reference character for each character in the character image, the target character size and target display position of each character in the double-layer PDF file;
Calculating the transverse stretch coefficient and longitudinal stretch coefficient of each character in the character image according to the target character size of that character in the double-layer PDF file and the original size of that character in the character image;
Generating the double-layer PDF file according to the target character size, target display position, transverse stretch coefficient and longitudinal stretch coefficient of each character in the character image;
Wherein calculating the target character size and target display position of each character in the double-layer PDF file according to the designated reference character comprises:
Calculating the target character size CalcS_i of each character in the double-layer PDF file from ImgH_i, RefS and RefH_i, where CalcS_i is the target character size in the double-layer PDF file of the i-th character in the character image, ImgH_i is the original height of the i-th character in the character image, RefS is the size of the designated reference character, and RefH_i is the height of the character picture of the reference character in the character model, at size RefS, corresponding to the i-th character;
Calculating, according to CalcS_i, the typesetting data of the character model corresponding to each character in the character image, the typesetting data comprising CalcH_i, CalcW_i, CalcX_i and CalcY_i, where the character model is the display block in the double-layer PDF file corresponding to a character in the character image, CalcH_i is the height of the character picture in the character model, CalcW_i is the width of the character picture in the character model, CalcX_i is the horizontal offset of the character picture from the upper-left corner of the model, and CalcY_i is the vertical offset of the character picture from the upper-left corner of the model;
Calculating the target display position of each character in the character image according to ShowPt_X = ImgPt_X - CalcX_i and ShowPt_Y = ImgPt_Y - CalcY_i, where ShowPt_X and ShowPt_Y are the coordinates of the target display position of the character, and ImgPt_X and ImgPt_Y are the coordinates of the upper-left corner of the original block containing the character in the character image.
2. The method for constructing a double-layer PDF file according to claim 1, characterized in that calculating the transverse stretch coefficient and longitudinal stretch coefficient of each character in the character image according to the target character size of that character in the double-layer PDF file and the original size of that character in the character image comprises:
Calculating the longitudinal stretch coefficient R_i of each character in the double-layer PDF file from ImgH_i and CalcH_i, where R_i is the longitudinal stretch coefficient in the double-layer PDF file of the i-th character in the character image, ImgH_i is the original height of the i-th character in the character image, and CalcH_i is the height of the character picture in the character model corresponding to the i-th character in the double-layer PDF file;
Calculating the transverse stretch coefficient S_i of each character in the double-layer PDF file from ImgW_i and CalcW_i, where S_i is the transverse stretch coefficient in the double-layer PDF file of the i-th character in the character image, ImgW_i is the original width of the i-th character in the character image, and CalcW_i is the width of the character picture in the character model corresponding to the i-th character in the double-layer PDF file;
Obtaining the adjusted offsets according to CalcY'_i = CalcY_i × R_i and CalcX'_i = CalcX_i × S_i, and correspondingly adjusting the target display position according to ShowPt_X = ImgPt_X - CalcX'_i and ShowPt_Y = ImgPt_Y - CalcY'_i.
3. The method for constructing a double-layer PDF file according to claim 1, characterized by further comprising:
When the characters in the character image are typeset horizontally, obtaining the upper boundary value and lower boundary value of the row in which each character in the character image is located;
Adjusting the longitudinal stretch coefficient of each character in the character image according to the upper boundary value and lower boundary value of the row in which the character is located;
When the characters in the character image are typeset vertically, obtaining the left boundary value and right boundary value of each character column in the character image;
Adjusting the transverse stretch coefficient of each character in the character image according to the left boundary value and right boundary value of each character column.
4. The method for constructing a double-layer PDF file according to claim 3, characterized in that obtaining, when the characters in the character image are typeset horizontally, the upper boundary value and lower boundary value of the row in which each character is located comprises:
Taking the mean of the upper boundary values of all characters in the row containing the current character as the upper boundary value of that row;
Taking the mean of the lower boundary values of all characters in the row containing the current character as the lower boundary value of that row.
5. The method for constructing a double-layer PDF file according to claim 3, characterized in that adjusting the longitudinal stretch coefficient of each character in the character image according to the upper boundary value and lower boundary value of the row in which the character is located comprises:
Computing the adjusted longitudinal stretch coefficient of each character in the character image from P_ij, Q_ij and H_i, where R_i is the adjusted longitudinal stretch coefficient of the i-th character in the character image, P_ij is the lower boundary value of the j-th row, in which the i-th character is located, Q_ij is the upper boundary value of that row, and H_i is the height of the character model corresponding to the i-th character at the current font size.
6. A device for constructing a double-layer PDF file, characterized by comprising:
An acquiring unit, configured to obtain each character in a character image produced by scanning original text data and performing OCR recognition;
A first computing unit, configured to calculate, according to a designated reference character for each character in the character image, the target character size and target display position of each character in the double-layer PDF file, and specifically configured to calculate the target character size CalcS_i of each character from ImgH_i, RefS and RefH_i, where CalcS_i is the target character size in the double-layer PDF file of the i-th character in the character image, ImgH_i is the original height of the i-th character in the character image, RefS is the size of the designated reference character, and RefH_i is the height of the character picture of the reference character in the character model, at size RefS, corresponding to the i-th character;
The first computing unit being further specifically configured to calculate, according to CalcS_i, the typesetting data of the character model corresponding to each character in the character image, the typesetting data comprising CalcH_i, CalcW_i, CalcX_i and CalcY_i, where the character model is the display block in the double-layer PDF file corresponding to a character in the character image, CalcH_i is the height of the character picture in the character model, CalcW_i is the width of the character picture in the character model, CalcX_i is the horizontal offset of the character picture from the upper-left corner of the model, and CalcY_i is the vertical offset of the character picture from the upper-left corner of the model;
The first computing unit being further specifically configured to calculate the target display position of each character in the character image according to ShowPt_X = ImgPt_X - CalcX_i and ShowPt_Y = ImgPt_Y - CalcY_i, where ShowPt_X and ShowPt_Y are the coordinates of the target display position of the character, and ImgPt_X and ImgPt_Y are the coordinates of the upper-left corner of the original block containing the character in the character image;
A second computing unit, configured to calculate the transverse stretch coefficient and longitudinal stretch coefficient of each character in the character image according to the target character size of that character in the double-layer PDF file and the original size of that character in the character image;
A generation unit, configured to generate the double-layer PDF file according to the target character size, target display position, transverse stretch coefficient and longitudinal stretch coefficient of each character in the character image.
7. The device for constructing a double-layer PDF file according to claim 6, characterized in that the second computing unit is specifically configured to calculate the longitudinal stretch coefficient R_i of each character in the double-layer PDF file from ImgH_i and CalcH_i, where R_i is the longitudinal stretch coefficient in the double-layer PDF file of the i-th character in the character image, CalcH_i is the height of the character picture in the character model corresponding to the i-th character in the double-layer PDF file, and ImgH_i is the original height of the i-th character in the character image;
The second computing unit being further specifically configured to calculate the transverse stretch coefficient S_i of each character in the double-layer PDF file from ImgW_i and CalcW_i, where S_i is the transverse stretch coefficient in the double-layer PDF file of the i-th character in the character image, CalcW_i is the width of the character picture in the character model corresponding to the i-th character in the double-layer PDF file, and ImgW_i is the original width of the i-th character in the character image;
The device further comprising an adjustment unit, configured to obtain the adjusted offsets according to CalcY'_i = CalcY_i × R_i and CalcX'_i = CalcX_i × S_i, and further configured to adjust the target display position according to ShowPt_X = ImgPt_X - CalcX'_i and ShowPt_Y = ImgPt_Y - CalcY'_i.
8. The device for constructing a double-layer PDF file according to claim 6, characterized by further comprising:
A boundary value acquiring unit, configured to obtain, when the characters in the character image are typeset horizontally, the upper boundary value and lower boundary value of the row in which each character in the character image is located;
An adjustment unit, configured to adjust the longitudinal stretch coefficient of each character in the character image according to the upper boundary value and lower boundary value of the row in which the character is located;
The boundary value acquiring unit being further configured to obtain, when the characters in the character image are typeset vertically, the left boundary value and right boundary value of each character column in the character image;
The adjustment unit being further configured to adjust the transverse stretch coefficient of each character in the character image according to the left boundary value and right boundary value of each character column.
9. The device for constructing a double-layer PDF file according to claim 8, characterized in that the boundary value acquiring unit is specifically configured to take the mean of the upper boundary values of all characters in the row containing the current character as the upper boundary value of that row;
The boundary value acquiring unit being further specifically configured to take the mean of the lower boundary values of all characters in the row containing the current character as the lower boundary value of that row.
10. The device for constructing a double-layer PDF file according to claim 8, characterized in that the adjustment unit is specifically configured to compute the adjusted longitudinal stretch coefficient of each character in the character image from P_ij, Q_ij and H_i, where R_i is the adjusted longitudinal stretch coefficient of the i-th character in the character image, P_ij is the lower boundary value of the j-th row, in which the i-th character is located, Q_ij is the upper boundary value of that row, and H_i is the height of the character model corresponding to the i-th character at the current font size.
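The calculations recited in claims 1 and 2 can be illustrated with a short Python sketch. The claims define the variables, but the formulas for the target size and the stretch coefficients are not reproduced in the text, so the sketch assumes CalcS_i = RefS × ImgH_i / RefH_i, R_i = ImgH_i / CalcH_i and S_i = ImgW_i / CalcW_i. These forms are inferred from the variable definitions and from the stated adjustment CalcY'_i = CalcY_i × R_i. They are assumptions, as are all function names and sample values. Only the display-position formulas are taken directly from the claims.

```python
def target_character_size(img_h, ref_size, ref_h):
    """Assumed form: CalcS_i = RefS * ImgH_i / RefH_i (proportional scaling)."""
    return ref_size * img_h / ref_h


def stretch_coefficients(img_w, img_h, calc_w, calc_h):
    """Assumed forms: S_i = ImgW_i / CalcW_i and R_i = ImgH_i / CalcH_i."""
    return img_w / calc_w, img_h / calc_h


def display_position(img_pt_x, img_pt_y, calc_x, calc_y, s, r):
    """Formulas from the claims:
    CalcX'_i = CalcX_i * S_i,  CalcY'_i = CalcY_i * R_i,
    ShowPt_X = ImgPt_X - CalcX'_i,  ShowPt_Y = ImgPt_Y - CalcY'_i.
    """
    calc_x_adj = calc_x * s
    calc_y_adj = calc_y * r
    return img_pt_x - calc_x_adj, img_pt_y - calc_y_adj


# Hypothetical character: a 30 x 40 pixel block whose upper-left corner
# is at (100, 200) in the scanned image.
size = target_character_size(img_h=40, ref_size=12, ref_h=10)

# Hypothetical model metrics at that size (in practice these would come
# from the font): picture 24 wide, 40 high, offset (4, 6) inside the model.
s, r = stretch_coefficients(img_w=30, img_h=40, calc_w=24, calc_h=40)
show_x, show_y = display_position(100, 200, calc_x=4, calc_y=6, s=s, r=r)

print(size, s, r, show_x, show_y)
```

In this example the model picture is narrower than the scanned character, so only the transverse coefficient differs from 1, and the horizontal offset is scaled accordingly before the display position is computed.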
CN201110256474.9A 2011-08-31 2011-08-31 The building method of double-layer PDF file and device Active CN102968407B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110256474.9A CN102968407B (en) 2011-08-31 2011-08-31 The building method of double-layer PDF file and device

Publications (2)

Publication Number Publication Date
CN102968407A CN102968407A (en) 2013-03-13
CN102968407B true CN102968407B (en) 2015-09-09

Family

ID=47798555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110256474.9A Active CN102968407B (en) 2011-08-31 2011-08-31 The building method of double-layer PDF file and device

Country Status (1)

Country Link
CN (1) CN102968407B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant