CN104516868B - Method and system for restoring spaces in a page layout during reflow - Google Patents


Info

Publication number
CN104516868B
Authority
CN
China
Legal status
Active
Application number
CN201310462663.0A
Other languages
Chinese (zh)
Other versions
CN104516868A
Inventor
王长胜
董宁
Current Assignee
Beijing Fangzheng Apapi Technology Co Ltd
New Founder Holdings Development Co ltd
Original Assignee
Peking University Founder Group Co Ltd
Beijing Founder Apabi Technology Co Ltd
Application filed by Peking University Founder Group Co Ltd and Beijing Founder Apabi Technology Co Ltd
Priority to CN201310462663.0A
Publication of CN104516868A
Application granted
Publication of CN104516868B
Legal status: Active


Landscapes

  • Document Processing Apparatus (AREA)
  • Controls And Circuits For Display Device (AREA)

Abstract

In the disclosed method for restoring spaces when a fixed-layout page is reflowed, three consecutive characters in the page are obtained; the spacing between each pair of adjacent characters is calculated, and the difference between the two spacings is computed. If this spacing difference is less than a preset threshold, no space is inserted. Otherwise, the sum of the right value of the first character and the left value of the second character is computed as the boundary sum, and the difference between the first character spacing and the boundary sum is taken as the judgment value; if the judgment value is greater than the advance width of the space character in the font of the first character, a space is inserted after the first character, otherwise no space is inserted. By measuring the blank between consecutive characters, the method determines whether a gap arises from the uniform letter spacing produced by typesetting or from an intended space that is not encoded as a space character. If the two successive gaps differ by no more than the threshold, the spacing is uniform and no space is needed; if they differ substantially, a further comparison decides whether to insert one. In this way, spaces that the page layout requires are added during conversion to flow format, space restoration follows the actual appearance of the page, and the reflowed page is displayed more accurately.

Description

Method and system for restoring spaces in a page layout during reflow
Technical field
The present invention relates to the field of electronic document processing, and specifically to a method and system for restoring the logical spaces of a fixed-layout document as corresponding spaces in a flow (reflowable) document.
Background art
A fixed-layout document is a document that is independent of software, hardware, operating system, and presentation or printing device, such as a document in PDF, CEB, or CEBX format. A fixed-layout document may contain multiple pages, each composed of device- and resolution-independent graphic elements. Fixed-layout technology typesets digital content objects such as text, graphics, images, audio, and video according to certain layout rules and then presents them with the layout frozen. The defining feature of a fixed-layout document is that its layout is fixed and does not reflow, i.e. what you see is what you get (WYSIWYG): when the electronic document is used, its appearance does not change with the hardware environment or the operator, and its format, layout, fonts, font sizes, and so on remain completely consistent with the paper document. These properties make fixed-layout formats the preferred choice for publishing electronic documents, disseminating digital information, and archiving. More and more e-books, product manuals, company announcements, online materials, and e-mails use fixed-layout documents; internationally, Adobe's PDF format has become a de facto industry standard for digital information.
Flow typesetting applies a particular typesetting treatment to the text, numbers, tables, and images in a document; the saved content consists of the original editing elements, which can be displayed at adaptive sizes across different zoom ratios. Current e-reading software supports both fixed-layout display and flow display of documents. A flow document, such as an Office document, does not describe all the data needed to present a typeset page, as a fixed-layout document does; its associated (flow) data usually lacks layout information such as fixed positions and sizes. Each time the document is loaded, these data must be typeset again from beginning to end, in a pipeline-like manner, to compute the position information before they can be displayed.
The structural information of a document stores its logical structure (for both flow and fixed-layout documents), such as articles and paragraphs, together with display-style information. Structural information can be used to reflow page content to suit devices with different screen sizes, especially mobile devices. Common coarse-grained logical structure units include regions, layout frames, paragraphs, and tables. Fine-grained basic logical structure units include text runs, graphics, images, formulas, charts, multimedia objects, and composite objects. These semantically rich logical content objects are generally described differently in flow documents and in fixed-layout documents.
A text run (Run or Span) is the smallest logical text unit in a paragraph; continuous text is divided into runs according to whether it shares the same text attributes. Text separated within a paragraph by the anchor point of another unit object, such as a graphic or image, is placed in different runs even if it has the same text attributes. A fixed-layout document also allows displayed text to be expressed through embedded fonts and glyph substitution, not only through text strings encoded in a normal font. Flow and fixed-layout documents also differ in how they express text runs: a fixed-layout document can give the x and y coordinates of the first character of a run on the page, whereas a flow document text run has no such coordinates and is not described with precise positioning, while a fixed-layout document text run may be. How text appears on an output device is generally determined by the font information, the text run, and the related drawing-style information.
The advance width (Advance Width) is the width of a character in fixed-layout text. It may be given directly by the width table (WidthTable) of an embedded font; by default, it can be obtained from the glyph corresponding to the character in the corresponding font.
Spaces are ubiquitous in documents of all kinds. In a flow document, the effect of a space can be produced in several ways: the space bar, Tab characters, character spacing (charspace), kerning (pair), precise positioning descriptions in special formats, and so on. In a fixed-layout document, the effect of a space is mostly produced by the difference between the fixed x coordinates of graphic elements such as characters, or by extra character spacing (charspace) that stretches or tightens the distance between characters; it can of course also be expressed by a real space character. A space typed with the space bar is stored as a space character and is therefore easy to recognize during flow reflow, but a blank that is not a real space character, such as one formed by a Tab or by character spacing, is hard to recognize as a space because no space-character information exists.
When the structured flow information of a fixed-layout document, especially its English text paragraphs, is displayed on a PC or a mobile device, several problems that hinder reading often appear: spaces are missing between English words, superfluous spaces appear inside English words, and there is a visible gap between almost every pair of English letters. See the original sample page in Fig. 1 and the samples with space display errors after reflow in Fig. 2 and Fig. 3; Fig. 5 shows the space display errors after the original sample page in Fig. 4 is reflowed.
These problems arise for the following reasons: (1) the relevant English text strings in the fixed-layout document contain no space characters; instead, the logical spaces of the layout are expressed through character spacing (charspace) or through the absolute coordinates of characters in the fixed layout; (2) the structured flow information of the fixed-layout document contains no equivalent expression of spaces either, for example a Marker character of the form {Type=space, nCount=n (number of spaces)}; (3) ultimately, the structured flow information can only be drawn according to certain typesetting rules at render time, and, as Fig. 2, Fig. 3, and Fig. 5 show, the current results still have defects: the existing drawing and layout algorithm uses the charspace information when rendering with CSS, but ignores charspace and distributes character spacing evenly when rendering with the fixed-layout font information. Because spaces arise for many different reasons, it is difficult to restore the spaces produced by all of these causes with a single method.
Summary of the invention
The technical problem to be solved by the invention is therefore that, after a fixed-layout document is converted into a flow document, spaces appear in incorrect positions: superfluous spaces appear, or spaces that should appear are missing. The invention accordingly proposes a method and system for restoring page-layout spaces that effectively restore the spaces in a page and improve the accuracy of space placement.
To solve the above technical problem, the present invention provides a method and a system for restoring spaces in a page layout.
A method for restoring spaces in a page layout during reflow comprises the following steps:
obtaining a first character, a second character, and a third character that are consecutive in the page;
calculating a first character spacing between the first and second characters and a second character spacing between the second and third characters, and then calculating the spacing difference between the first character spacing and the second character spacing;
if the spacing difference is less than a preset difference threshold, inserting no space; otherwise, calculating the sum of the right value of the first character and the left value of the second character as the boundary sum, and taking the difference between the first character spacing and the boundary sum as the judgment value; if the judgment value is greater than the advance width of the space character in the font of the first character, inserting a space after the first character, and otherwise inserting no space.
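The three-character decision rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Glyph` record, the function name `should_insert_space`, and the 3 pt default threshold (the low end of the 3 to 5 point range given in this description) are hypothetical, and the left and right values are passed in as plain numbers because the text does not define how they are derived from the glyph data. The difference D is kept signed, as in D = d1 - d2, so only a first gap that is wider than the following gap leads to the space check.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical per-character record; all lengths in points."""
    x: float            # x coordinate of the character on the page
    width: float        # advance width w
    left_value: float   # "left value" L read from the glyph information
    right_value: float  # "right value" R read from the glyph information


def should_insert_space(c1: Glyph, c2: Glyph, c3: Glyph,
                        space_width: float, threshold: float = 3.0) -> bool:
    """Decide whether a space belongs after c1 (sketch of the rule above)."""
    d1 = c2.x - c1.x - c1.width   # first character spacing
    d2 = c3.x - c2.x - c2.width   # second character spacing
    if d1 - d2 < threshold:       # spacing difference D below threshold: uniform spacing
        return False
    s = d1 - (c1.right_value + c2.left_value)  # judgment value S
    return s > space_width        # compare with w0 of the space in c1's font


if __name__ == "__main__":
    o = Glyph(x=0.0, width=6.0, left_value=0.25, right_value=0.25)
    i = Glyph(x=10.0, width=3.0, left_value=0.25, right_value=0.25)
    g = Glyph(x=13.5, width=5.0, left_value=0.25, right_value=0.25)
    print(should_insert_space(o, i, g, space_width=2.5))  # prints True
```

With these hypothetical values, d1 = 4 pt and d2 = 0.5 pt, so D = 3.5 pt passes the 3 pt threshold and S = 3.5 pt exceeds the 2.5 pt space width.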
In the method, calculating the first character spacing between the first character and the second character comprises:
obtaining the coordinate x1 of the first character, the coordinate x2 of the second character, and the advance width w1 of the first character; the first character spacing is then d1 = x2 - x1 - w1.
In the method, calculating the second character spacing between the second character and the third character comprises:
obtaining the coordinate x2 of the second character, the coordinate x3 of the third character, and the advance width w2 of the second character; the second character spacing is then d2 = x3 - x2 - w2.
In the method, the spacing difference between the first character spacing and the second character spacing is D = d1 - d2.
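Applied to a whole line rather than to a single triple, the formulas d = x_next - x - w and D = d1 - d2 become a sliding window. The sketch below assumes each character is given as an (x, advance width) pair in points; the function names are hypothetical. Entry i of the second result compares gap i with gap i + 1, so under this three-character formulation the last gap of a line has no D value.

```python
def character_gaps(chars):
    """Return d_i = x_{i+1} - x_i - w_i for each pair of adjacent characters."""
    return [x_next - x - w for (x, w), (x_next, _) in zip(chars, chars[1:])]


def spacing_differences(gaps):
    """Return D_i = d_i - d_{i+1} for each pair of consecutive gaps."""
    return [d - d_next for d, d_next in zip(gaps, gaps[1:])]


if __name__ == "__main__":
    line = [(0, 6), (7, 2), (10, 5), (20, 4)]  # hypothetical (x, w) values in points
    gaps = character_gaps(line)
    print(gaps)                        # prints [1, 1, 5]
    print(spacing_differences(gaps))   # prints [0, -4]
```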
In the method, the preset difference threshold is 3 to 5 points.
In the method, obtaining the consecutive first, second, and third characters in the page includes obtaining the font resource information of the characters by a page analysis method.
In the method, when the judgment value is greater than the advance width of the space character in the font of the first character, the method further comprises calculating the ratio n = S/w0 of the judgment value S to that advance width w0, and inserting n spaces after the first character.
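The ratio n = S/w0 is generally not an integer, and the description does not say how it is rounded. The sketch below assumes rounding down, so the inserted spaces never exceed the measured blank; rounding to the nearest integer would be an equally plausible reading. `space_count` is a hypothetical name.

```python
import math


def space_count(judgment_value: float, space_width: float) -> int:
    """Number of spaces to insert after the first character: n = S / w0, rounded down."""
    if space_width <= 0:
        raise ValueError("space advance width w0 must be positive")
    if judgment_value <= space_width:  # S must exceed w0 before any space is inserted
        return 0
    return math.floor(judgment_value / space_width)  # assumption: round down
```

For example, S = 6.0 pt with w0 = 2.5 pt gives n = 2 under this assumption, where rounding to nearest would also give 2.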
In the method, for English text, at the first character of a line: if the preceding character is the English hyphen "-", no space is inserted before that first character; otherwise, one space is inserted before it.
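The line-start rule can be illustrated on plain strings. This sketch assumes each reflowed English line is available as a string and uses the hypothetical name `join_lines`: a line that ends with "-" is joined directly to the next line, and any other line break becomes one space. The hyphen itself is kept, since the description does not say it is removed.

```python
def join_lines(lines):
    """Join English lines, adding a space at each line start unless the previous line ends with '-'."""
    if not lines:
        return ""
    text = lines[0]
    for line in lines[1:]:
        if text.endswith("-"):   # previous character is the English hyphen: no space
            text += line
        else:                    # otherwise insert one space before the line's first character
            text += " " + line
    return text


print(join_lines(["restor-", "ing spaces in", "reflowed text"]))
# prints: restor-ing spaces in reflowed text
```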
In the method, each inserted space is a space in the font of the character immediately before the insertion position.
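To show what "a space in the font of the preceding character" means in a data structure, the sketch below models text as a list of (character, font) records; the `StyledChar` type and the `insert_spaces` function are hypothetical. Each inserted space copies the font of the character at the insertion point, so a space placed after a Times character stays in Times even when the next character uses another font.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StyledChar:
    text: str
    font: str


def insert_spaces(chars: List[StyledChar], index: int, n: int) -> List[StyledChar]:
    """Return a copy of chars with n spaces inserted after chars[index], in that character's font."""
    space = StyledChar(" ", chars[index].font)
    return chars[: index + 1] + [space] * n + chars[index + 1 :]
```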
In the method, the right value and the left value are obtained from the glyph information in the font library containing the character.
A system for restoring spaces in a page layout during reflow comprises:
an acquiring unit, which obtains a first character, a second character, and a third character that are consecutive in the page;
a character spacing computing unit, which calculates a first character spacing between the first and second characters and a second character spacing between the second and third characters, and then calculates the spacing difference between the first character spacing and the second character spacing;
a comparison and insertion unit, which inserts no space if the spacing difference is less than a preset difference threshold; otherwise, it calculates the sum of the right value of the first character and the left value of the second character as the boundary sum, takes the difference between the first character spacing and the boundary sum as the judgment value, and, if the judgment value is greater than the advance width of the space character in the font of the first character, inserts a space after the first character, and otherwise inserts no space.
In the system, the character spacing computing unit calculates the first character spacing between the first character and the second character by:
obtaining the coordinate x1 of the first character, the coordinate x2 of the second character, and the advance width w1 of the first character; the first character spacing is then d1 = x2 - x1 - w1.
In the system, the character spacing computing unit calculates the second character spacing between the second character and the third character by:
obtaining the coordinate x2 of the second character, the coordinate x3 of the third character, and the advance width w2 of the second character; the second character spacing is then d2 = x3 - x2 - w2.
In the system, the character spacing computing unit calculates the spacing difference between the first and second character spacings as D = d1 - d2.
In the system, the preset difference threshold is 3 to 5 points.
In the system, the acquiring unit obtains the font resource information of the characters by a page analysis method.
In the system, when the judgment value is greater than the advance width of the space character in the font of the first character, the comparison and insertion unit further calculates the ratio n = S/w0 of the judgment value S to that advance width w0, and inserts n spaces after the first character.
The system further comprises a hyphen judging unit: for English text, at the first character of a line, if the preceding character is the English hyphen "-", no space is inserted before that first character; otherwise, one space is inserted before it.
In the system, each inserted space is a space in the font of the character immediately before the insertion position.
In the system, the right value and the left value are obtained from the glyph information in the font library containing the character.
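The units listed above can be arranged as a small pipeline over one line of characters. This is a hypothetical sketch, not the patented system: the class names mirror the units, `PageChar.space_width` stands in for the advance width w0 of the space in that character's font, n = S/w0 is assumed to be rounded down, and the hyphen judging unit is omitted for brevity. Because every decision needs a third character, the gap before the last character of a line is never evaluated in this literal reading.

```python
import math
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class PageChar:
    """Hypothetical character record produced by page analysis (lengths in points)."""
    text: str
    x: float
    width: float         # advance width
    left_value: float
    right_value: float
    space_width: float   # w0: advance width of the space glyph in this character's font


class AcquiringUnit:
    def triples(self, line: List[PageChar]) -> Iterator[Tuple[int, PageChar, PageChar, PageChar]]:
        """Yield (index, first, second, third) for each run of three consecutive characters."""
        for i in range(len(line) - 2):
            yield i, line[i], line[i + 1], line[i + 2]


class SpacingUnit:
    def spacings(self, a: PageChar, b: PageChar, c: PageChar) -> Tuple[float, float]:
        """Return the first character spacing d1 and the spacing difference D = d1 - d2."""
        d1 = b.x - a.x - a.width
        d2 = c.x - b.x - b.width
        return d1, d1 - d2


class CompareInsertUnit:
    def __init__(self, threshold: float = 3.0):
        self.threshold = threshold

    def spaces_after(self, a: PageChar, b: PageChar, d1: float, diff: float) -> int:
        """Return how many spaces to insert after character a."""
        if diff < self.threshold:
            return 0
        s = d1 - (a.right_value + b.left_value)  # judgment value S
        if s <= a.space_width:
            return 0
        return math.floor(s / a.space_width)     # assumption: n = S / w0 rounded down


class SpaceRestorer:
    def __init__(self, threshold: float = 3.0):
        self.acquiring = AcquiringUnit()
        self.spacing = SpacingUnit()
        self.compare = CompareInsertUnit(threshold)

    def restore(self, line: List[PageChar]) -> str:
        counts = [0] * len(line)
        for i, a, b, c in self.acquiring.triples(line):
            d1, diff = self.spacing.spacings(a, b, c)
            counts[i] = self.compare.spaces_after(a, b, d1, diff)
        return "".join(ch.text + " " * n for ch, n in zip(line, counts))


if __name__ == "__main__":
    def make(text: str, x: float) -> PageChar:
        return PageChar(text, x, 5.0, 0.25, 0.25, 4.0)  # hypothetical metrics

    line = [make("a", 0.0), make("b", 5.5), make("c", 18.5), make("d", 24.0)]
    print(SpaceRestorer().restore(line))  # prints "ab cd"
```

Keeping each unit as a separate class mirrors the claimed structure, so a different threshold or rounding policy can be swapped in without touching the geometry code.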
Compared with the prior art, the above technical solution of the present invention has the following advantages:
(1) In the disclosed method, three consecutive characters in the page are obtained, and the spacing between each pair of adjacent characters and the difference between the two spacings are calculated. If the spacing difference is less than the preset threshold, no space is inserted. Otherwise, the sum of the right value of the first character and the left value of the second character is computed as the boundary sum, and the difference between the first character spacing and the boundary sum is taken as the judgment value; if the judgment value is greater than the advance width of the space character in the font of the first character, a space is inserted after the first character, and otherwise no space is inserted. Measuring the blank between consecutive characters reveals whether a gap comes from uniform spacing produced by typesetting or from an intended space: if the two successive gaps differ by no more than the threshold, the spacing is uniform and no space is needed; if they differ substantially, a further comparison decides whether to insert one. Spaces that the page requires are thus added during flow conversion, space restoration follows the actual appearance of the page, and the page is displayed more accurately.
(2) In the method, the character spacing is obtained by subtracting the advance width of the preceding character from the coordinate difference of two adjacent characters. Because the coordinates and advance widths are readily available from the font resource information after page analysis, they can be used directly to calculate the spacing.
(3) In the method, the preset difference threshold is 3 to 5 points. Because of differences in information such as font type, the character spacings within even uniformly typeset text may not be exactly identical; when the spacing difference stays within a certain range, the characters are therefore considered evenly distributed and no space is inserted. A threshold of 3 to 5 points matches the normal range of spacing differences that occur in typeset text.
(4) In the method, the font resource information of the characters is obtained by page analysis, and the right and left values are obtained from the glyph information in the font library containing the character. All of this information can be obtained and used directly from existing resources, which reduces computational complexity and improves the accuracy of the acquired data.
(5) In the method, each inserted space is a space in the font of the character before the insertion position. Because the solution inserts an appropriate space after a suitable character, keeping the inserted space's font consistent with that character restores the space in the most appropriate form and improves the readability of the page.
(6) In the method, for English text, at the first character of a line, no space is inserted if the preceding character is the English hyphen "-"; otherwise one space is inserted. Checking for the hyphen identifies words split across lines, and handling this special case separately gives better results.
(7) In the method, when the blank is wide, the ratio n of the judgment value S to the advance width w0 of the space character in the font of the first character gives the number of spaces to insert, so multiple spaces can be inserted as needed and the original page is restored more faithfully.
Brief description of the drawings
To make the content of the present invention easier to understand, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings, in which:
Fig. 1 is a sample of an original page;
Fig. 2 is a sample with space display errors after the page of Fig. 1 is reflowed;
Fig. 3 is another sample with space display errors after the page of Fig. 1 is reflowed;
Fig. 4 is a sample of another original page;
Fig. 5 is a sample with space display errors after the page of Fig. 4 is reflowed;
Fig. 6 is a flowchart of the method for restoring page-layout spaces during reflow in one embodiment of the present invention;
Fig. 7 is a schematic diagram of the characters processed by the method for restoring page-layout spaces during reflow in one embodiment of the present invention.
Detailed description of the embodiments
Embodiment 1:
The present invention provides a method for restoring page-layout spaces during reflow; its flowchart is shown in Fig. 6, and it comprises the following steps:
(1) Obtain the first character O, the second character I, and the third character g, which are consecutive in the page, as shown in Fig. 7.
(2) Calculate the first character spacing d1 between the first character O and the second character I and the second character spacing d2 between the second character I and the third character g, then calculate the spacing difference D = d1 - d2.
(3) If the spacing difference D is less than the preset difference threshold (5 points in this embodiment), insert no space. Otherwise, calculate the boundary sum R1 + L2 of the right value R1 of the first character O and the left value L2 of the second character I, where R1 and L2 are obtained from the glyph information in the font library containing the character. Then take the difference between the first character spacing d1 and the boundary sum as the judgment value S = d1 - (R1 + L2). If S is greater than the advance width w0 of the space character in the font of the first character, insert a space after the first character; otherwise, insert no space.
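A worked instance of steps (2) and (3) with the 5 point threshold of this embodiment may help. The coordinates, advance widths, left and right values, and space width below are hypothetical numbers chosen for illustration; they are not taken from Fig. 7.

```latex
\begin{aligned}
d_1 &= x_2 - x_1 - w_1 = 14.2 - 0 - 7.2 = 7.0\ \text{pt}\\
d_2 &= x_3 - x_2 - w_2 = 17.5 - 14.2 - 2.8 = 0.5\ \text{pt}\\
D   &= d_1 - d_2 = 6.5\ \text{pt} \ge 5\ \text{pt}\\
S   &= d_1 - (R_1 + L_2) = 7.0 - (0.4 + 0.6) = 6.0\ \text{pt} > w_0 = 2.5\ \text{pt}
\end{aligned}
```

Because D reaches the threshold and S exceeds w0, a space is inserted after O. Had the gaps been 4.0 pt and 0.5 pt instead, D = 3.5 pt would fall below the threshold and no space would be inserted.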
The method of this embodiment measures the blank between consecutive characters to determine whether a gap arises from the uniform spacing produced by typesetting or from an intended space. If the two successive gaps differ by no more than the threshold, the spacing is uniform and no space is needed; if they differ substantially, a further comparison decides whether to insert a space. Spaces that the page requires are thus added during flow conversion, space restoration follows the actual appearance of the page, and the page is displayed more accurately.
Embodiment 2:
(1) Obtain three consecutive characters in the page, for example the first character O, the second character I, and the third character g. This step can obtain all data parameters in the structural information of the page or document, and hence the font resource information of each character, including its font, coordinates, advance width, and other data. All of this information can be obtained and used directly from existing resource information, which reduces computational complexity and improves the accuracy of the acquired data.
(2) Calculate the first character spacing d1 between the first character O and the second character I and the second character spacing d2 between the second character I and the third character g, then calculate the spacing difference D = d1 - d2. In this step, obtain the coordinate x1 of the first character, the coordinate x2 of the second character, and the advance width w1 of the first character; the first character spacing is d1 = x2 - x1 - w1. Obtain the coordinate x2 of the second character, the coordinate x3 of the third character, and the advance width w2 of the second character; the second character spacing is d2 = x3 - x2 - w2. These coordinates and advance widths can be obtained from the structural information of the document through the font resource information. Because the spacing is the coordinate difference of two adjacent characters minus the advance width of the preceding character, and the coordinates and advance widths are readily available from the font resource information after page analysis, they can be used directly to calculate the spacing.
(3) If the spacing difference D is less than the preset difference threshold (3 points in this embodiment), insert no space. Otherwise, calculate the boundary sum R1 + L2 of the right value R1 of the first character O and the left value L2 of the second character I, where R1 and L2 are obtained from the glyph information in the font library containing the character. Then take the difference between d1 and the boundary sum as the judgment value S = d1 - (R1 + L2). If S is greater than the advance width w0 of the space character in the font of the first character, also calculate the ratio n = S/w0 and insert n spaces after the first character. When the blank is wide, this ratio gives the number of spaces to insert, so multiple spaces can be inserted as needed and the original page is restored more faithfully. The space inserted here is a space in the font of the first character. Because the solution inserts an appropriate space after a suitable character, keeping the inserted space's font consistent with the character before the insertion position restores the space in the most appropriate form and also improves the readability of the page.
As an alternative embodiment, the preset difference threshold is 3 to 5 points. Because of differences in information such as font type, the character spacings within even uniformly typeset text may not be exactly identical; when the spacing difference stays within a certain range, the characters are therefore considered evenly distributed and no space is inserted. A threshold of 3 to 5 points matches the normal range of spacing differences that occur in typeset text.
As a further preferred embodiment, during flow reflow of English text, at the first character of a line: if the preceding character is the English hyphen "-", no space is inserted before that first character; otherwise, one space is inserted before it. Checking for the hyphen identifies words split across lines, and handling this special case separately gives better results.
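The threshold is specified in points, while coordinates in a fixed-layout file are expressed in that format's own units. In PDF the default user-space unit is 1/72 inch, i.e. one point, so the 3 to 5 point range applies directly; a format that stores coordinates in millimetres would need the threshold converted first. The helper below is a hypothetical sketch of that conversion; the function name and the set of supported units are assumptions.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def threshold_in_units(threshold_pt: float, unit: str) -> float:
    """Convert a spacing threshold given in points into the document's coordinate unit."""
    if unit == "pt":    # e.g. PDF default user space: 1 unit = 1/72 inch = 1 pt
        return threshold_pt
    if unit == "mm":
        return threshold_pt * MM_PER_INCH / POINTS_PER_INCH
    raise ValueError(f"unsupported unit: {unit}")


if __name__ == "__main__":
    low, high = threshold_in_units(3, "mm"), threshold_in_units(5, "mm")
    print(f"{low:.3f} mm to {high:.3f} mm")  # prints 1.058 mm to 1.764 mm
```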
Embodiment 3:
This embodiment provides a system for the reflow restoration of layout spaces, comprising:
Acquiring unit: obtains a first character, a second character and a third character that are consecutive in the layout. A layout analysis method is used to obtain the font resource information of the characters. This information includes font library information, which in turn contains the glyph information. The right and left bearings are taken from the glyph information in the font that contains each character. All of this information can be obtained and called directly from existing resource information, which reduces computational complexity and also improves the accuracy of the acquired data.
Character pitch calculation unit: calculates the first character pitch between the first and second characters and the second character pitch between the second and third characters. It then calculates the character pitch difference between the first and second character pitches.
The character pitch calculation unit works as follows:

- It obtains the coordinate x1 of the first character, the coordinate x2 of the second character and the width w1 of the first character. The first character pitch is d1 = x2 - x1 - w1.
- It obtains the coordinate x2 of the second character, the coordinate x3 of the third character and the width w2 of the second character. The second character pitch is d2 = x3 - x2 - w2.
- The character pitch difference is D = d1 - d2.

Each pitch is therefore the coordinate difference between two adjacent characters minus the width of the preceding character. The coordinates and widths are readily available from the font resource information produced by layout analysis, so they can be used directly to calculate the pitches.
Comparison and insertion unit: if the character pitch difference is less than the preset difference threshold (3 pt to 5 pt), no space is inserted. Otherwise, the unit adds the right bearing of the first character and the left bearing of the second character to obtain the bearing sum. Both bearings are taken from the glyph information in the font that contains the character. The unit then takes the difference between the first character pitch and the bearing sum as the judgment value. If the judgment value is greater than the width of the space character in the font type of the first character, a space is inserted after the first character; otherwise no space is inserted. The inserted space belongs to the font type of the character before the insertion position, that is, the first character. Because the solution inserts appropriate spaces after the appropriate characters, the inserted space keeps the same font as the character before it. This restores the spacing in the most suitable form and also improves the readability of the layout.

The preset difference threshold is generally 3 pt to 5 pt. Font type and similar information vary, so even uniform blanks produced by typesetting may not yield exactly equal character pitches. A pitch difference within this range is therefore treated as an even distribution of characters, and no space is inserted. The range matches the normal spacing variation that occurs when fonts are set.
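The arithmetic performed by the character pitch calculation unit can be sketched as follows. `Glyph`, `character_pitch` and `pitch_difference` are hypothetical names. The sketch assumes each character is reduced to its layout x-coordinate and its width, both in points.

```python
from typing import NamedTuple


class Glyph(NamedTuple):
    x: float      # layout x-coordinate of the character
    width: float  # character width w from the font


def character_pitch(a: Glyph, b: Glyph) -> float:
    """Blank between adjacent characters a and b: x_b - x_a - w_a."""
    return b.x - a.x - a.width


def pitch_difference(first: Glyph, second: Glyph,
                     third: Glyph) -> tuple[float, float, float]:
    """Return (d1, d2, D) for three consecutive characters."""
    d1 = character_pitch(first, second)  # d1 = x2 - x1 - w1
    d2 = character_pitch(second, third)  # d2 = x3 - x2 - w2
    return d1, d2, d1 - d2               # D = d1 - d2
```

Consider three glyphs at x = 0, 22 and 33, each 10 pt wide. `pitch_difference` returns `(12.0, 1.0, 11.0)`, and an 11 pt difference exceeds a 3 pt to 5 pt threshold.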
In a further preferred embodiment of the system, the comparison and insertion unit handles a large judgment value as follows. If S is greater than the width w0 of the space character in the font type of the first character, the unit also calculates the ratio n = S/w0 and inserts n spaces after the first character. When the blank is wide, this ratio gives the number of spaces required, so several spaces can be inserted as needed and the original layout is restored more faithfully.
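The insertion step can be sketched by tagging each character with a font identifier, so that inserted spaces inherit the font of the preceding character. `StyledChar` and `insert_space_after` are hypothetical names. The `count` argument stands for the ratio n described above.

```python
from dataclasses import dataclass


@dataclass
class StyledChar:
    text: str
    font_id: str  # hypothetical font identifier, e.g. a Text object's fontId


def insert_space_after(chars: list[StyledChar], i: int,
                       count: int = 1) -> list[StyledChar]:
    """Return a copy of chars with `count` spaces inserted after chars[i].

    Every inserted space carries the font of chars[i], the character before
    the insertion point, so it matches that character's typeface.
    """
    spaces = [StyledChar(" ", chars[i].font_id) for _ in range(count)]
    return chars[: i + 1] + spaces + chars[i + 1:]
```

Returning a new list leaves the original character sequence untouched. This makes it easy to compare the layout before and after restoration.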
In other embodiments, the system handles English hyphens with an additional hyphen judging unit. For English text, this unit checks the first character of each line. If the preceding character is an English hyphen "-", no space is inserted before that first character; otherwise one space is inserted before it. The hyphen check identifies words split across lines, and handling this special case separately gives a better result.
The method of the present invention for the reflow restoration of layout spaces proceeds as follows:

1. Obtain three consecutive characters in the layout.
2. Calculate the pitch between each pair of adjacent characters and the difference between the two pitches.
3. If the pitch difference is less than the preset difference threshold, insert no space.
4. Otherwise, add the right bearing of the first character and the left bearing of the second character to obtain the bearing sum. Take the difference between the first character pitch and the bearing sum as the judgment value.
5. If the judgment value is greater than the width of the space character in the font type of the first character, insert a space after the first character; otherwise insert no space.

Comparing the blank before and after the middle character shows whether a gap is uniform spacing produced by typesetting or is caused by a space character. If the difference between the two pitches is within the threshold, the spacing is uniform and no space is needed. If the difference is larger, a further comparison decides whether to insert a space. In this way, the spaces the layout requires are restored when it is converted to reflowable form. The reflow spaces follow the actual appearance of the layout, so the layout is presented more accurately.
Embodiment 4:
A specific example of another method for the reflow restoration of layout spaces is given below. The main process is as follows:
First, the structured information of the fixed-layout document is processed page by page. For every reflow paragraph and reflow sentence, obtain the pageBlock information of the corresponding layout page, that is, its n pageObject entries. The pageObject entries yield text or other non-text layout primitives such as Image. For example, a Text object carries {fontId, fontSize, n × TextCode {x, y, charspace, textString}, other possible precise layout positioning information}. fontId can point to font information in the fixed-layout document, including any embedded font information. It can also point to font information in the CSS of the document's structured reflow information. In addition, obtain page by page the layout page width layoutPageWidth of the layout page corresponding to each reflow paragraph and reflow sentence.
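The page structures named in this step could be modelled roughly as below. Only the names pageBlock, pageObject, Text, Image, TextCode, fontId, fontSize, charspace, textString and layoutPageWidth come from the text. The Python classes, their field types and the `iter_text_codes` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, Union


@dataclass
class TextCode:
    x: float           # layout coordinate of the first character
    y: float
    charspace: float   # spacing between characters in the text
    text_string: str   # textString


@dataclass
class Text:
    font_id: str       # fontId: embedded font info or reflow CSS font info
    font_size: float   # fontSize
    codes: list[TextCode] = field(default_factory=list)


@dataclass
class Image:           # a non-text layout primitive
    x: float
    y: float


PageObject = Union[Text, Image]


@dataclass
class PageBlock:
    layout_page_width: float                          # layoutPageWidth
    objects: list[PageObject] = field(default_factory=list)


def iter_text_codes(block: PageBlock) -> Iterator[tuple[Text, TextCode]]:
    """Yield each TextCode with its Text object, skipping non-text primitives."""
    for obj in block.objects:
        if isinstance(obj, Text):
            for code in obj.codes:
                yield obj, code
```

Yielding the owning `Text` alongside each `TextCode` keeps fontId and fontSize at hand, which the later glyph lookups need.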
Next, build a character width list for every text character primitive on the layout page, one logical paragraph at a time. Starting from the first logical paragraph, take the 4-tuple {x, y, charspace, textString} from each layout text sentence TextCode. Here textString is the text, (x, y) is the layout coordinate of its first character, and charspace is the spacing between characters in the text.

Then take the characters of textString one by one. For each character, obtain its width and its left and right bearings (see Fig. 7) from the glyph information in the font that contains it, and add them to the character's x coordinate. For the first character of textString, x is taken directly. For any other character, x is the right bearing from the glyph information, plus the width of the previous character in the width list, plus charspace. The result is recorded in the width list. The list is built the same way across TextCode boundaries.

Whenever the width value exceeds layoutPageWidth, layoutPageWidth is first subtracted from it. In addition, the character being processed may be English. If the previous character is a hyphen such as "-", that width is not counted, and a reflow space is dynamically appended after the character. If the previous character is not a hyphen, a reflow space is simply added.
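Below is a simplified sketch of the coordinate accumulation in this step, under several assumptions:

- It processes one TextCode at a time and takes a precomputed list of character widths.
- Each later character is placed at the previous x plus the previous width plus charspace. The right-bearing term mentioned in the text is omitted.
- The layoutPageWidth rule is modelled as a wrap that subtracts the page width.
- The hyphen handling at a wrap is not modelled.

`char_positions` is a hypothetical name.

```python
def char_positions(x0: float, charspace: float, widths: list[float],
                   layout_page_width: float) -> list[tuple[float, bool]]:
    """Return (x, wrapped) for each character of one TextCode (sketch).

    The first character takes the TextCode's x directly; each later one is
    placed at previous x + previous width + charspace. When the running x
    exceeds layout_page_width, the page width is subtracted and the
    character is flagged as wrapped.
    """
    result = []
    x = x0
    for k in range(len(widths)):
        if k > 0:
            x = x + widths[k - 1] + charspace
        wrapped = False
        if x > layout_page_width:
            x -= layout_page_width
            wrapped = True
        result.append((x, wrapped))
    return result
```

For example, `char_positions(0.0, 1.0, [10.0, 10.0, 10.0], 20.0)` places the characters at 0 and 11. The third character would fall at 22, so it wraps to 2.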
Once the coordinates, widths and left and right bearings have been obtained, reflow space restoration can begin. Each character of each logical reflow paragraph on the reflow page is processed in turn to decide whether a space must be inserted after it. When all characters have been processed, the reflow restoration of layout spaces is complete.
For each character, the spaces that follow it are restored as follows:
First, calculate the first character pitch d1 between the first character O and the second character I, and the second character pitch d2 between the second character I and the third character g. Then calculate their difference D = d1 - d2:

- Obtain the coordinate x1 of the first character, the coordinate x2 of the second character and the width w1 of the first character. Then d1 = x2 - x1 - w1.
- Obtain the coordinate x2 of the second character, the coordinate x3 of the third character and the width w2 of the second character. Then d2 = x3 - x2 - w2.
- Finally, D = d1 - d2.
Next, if the character pitch difference D is less than the preset difference threshold (4 pt in this example), no space is inserted.
Otherwise, add the right bearing R1 of the first character O and the left bearing L2 of the second character I to obtain the bearing sum R1 + L2. R1 and L2 are taken from the glyph information in the font that contains each character.
Then take the difference between the first character pitch d1 and the bearing sum as the judgment value S = d1 - (R1 + L2). If S is greater than the width w0 of the space character in the font type of the first character, insert a space after the first character. The inserted space belongs to the font type of the first character.
Finally, apply the above processing with each character in turn as the first character. The result is the reflow restoration of the layout spaces.
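The per-character loop of this embodiment can be sketched end to end as below. It uses the 4 pt threshold and inserts a single space. `PositionedChar` and `restore_spaces` are hypothetical names. The last two characters of a line have no third character to compare against, so skipping them is an assumption the text does not address.

```python
from dataclasses import dataclass


@dataclass
class PositionedChar:
    ch: str
    x: float              # layout x-coordinate
    width: float          # character width w
    left_bearing: float   # L, from the glyph information
    right_bearing: float  # R, from the glyph information
    space_width: float    # w0: space glyph width in this character's font


def restore_spaces(line: list[PositionedChar], threshold: float = 4.0) -> str:
    """Rebuild a line's text, inserting spaces where the layout implies them."""
    out = []
    for i, first in enumerate(line):
        out.append(first.ch)
        if i + 2 >= len(line):
            continue  # no second and third character to compare (assumed: skip)
        second, third = line[i + 1], line[i + 2]
        d1 = second.x - first.x - first.width     # first character pitch
        d2 = third.x - second.x - second.width    # second character pitch
        if d1 - d2 < threshold:                   # D below threshold: no space
            continue
        s = d1 - (first.right_bearing + second.left_bearing)  # judgment value S
        if s > first.space_width:
            out.append(" ")
    return "".join(out)
```

Consider characters a, b, c and d at x = 0, 15, 21 and 27, with 5 pt widths, 0.5 pt bearings and a 2.5 pt space. The result is "a bcd". Spacing the same characters evenly at x = 0, 15, 30 and 45 gives "abcd", because uniform letter-spacing produces no pitch difference.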
Obviously, above-described embodiment is only intended to clearly illustrate example, and is not the restriction to embodiment.It is right For those of ordinary skill in the art, can also make on the basis of the above description it is other it is various forms of change or Change.There is no necessity and possibility to exhaust all the enbodiments.And the obvious change thus extended out or Among changing still in the protection domain of the invention.
Those skilled in the art will understand that embodiments of the invention may be provided as a method, a system or a computer program product. The invention may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the invention may take the form of a computer program product implemented on one or more computer-usable storage media containing computer-usable program code. Such media include, but are not limited to, disk storage, CD-ROM and optical storage.
The invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the invention. Each flow and/or block in the flowcharts and/or block diagrams can be implemented by computer program instructions, as can combinations of flows and/or blocks. These instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine. The instructions executed by that processor then produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular way. The instructions stored in that memory then produce an article of manufacture that includes an instruction apparatus. This apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device. A series of operational steps is then performed on the computer or other programmable device to produce a computer-implemented process. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they grasp the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.

Claims (20)

1. A method for the reflow restoration of layout spaces, characterized by comprising the following steps:
obtaining, page by page, the page block information of the layout page corresponding to every reflow paragraph and reflow sentence in the structured information of a fixed-layout document, and obtaining a first character, a second character and a third character that are consecutive in the layout;
calculating a first character pitch between the first character and the second character and a second character pitch between the second character and the third character, then calculating the character pitch difference between the first character pitch and the second character pitch;
if the character pitch difference is less than a preset difference threshold, inserting no space; otherwise, adding the right bearing of the first character and the left bearing of the second character to obtain a bearing sum, and taking the difference between the first character pitch and the bearing sum as a judgment value; if the judgment value is greater than the width of the space character of the font type of the first character, inserting a space after the first character, otherwise inserting no space.
2. The method for the reflow restoration of layout spaces according to claim 1, characterized in that calculating the first character pitch between the first character and the second character comprises:
obtaining the coordinate x1 of the first character, the coordinate x2 of the second character and the width w1 of the first character, the first character pitch being d1 = x2 - x1 - w1.
3. The method for the reflow restoration of layout spaces according to claim 1 or 2, characterized in that calculating the second character pitch between the second character and the third character comprises:
obtaining the coordinate x2 of the second character, the coordinate x3 of the third character and the width w2 of the second character, the second character pitch being d2 = x3 - x2 - w2.
4. The method for the reflow restoration of layout spaces according to claim 3, characterized in that the character pitch difference between the first character pitch and the second character pitch is calculated as D = d1 - d2.
5. The method for the reflow restoration of layout spaces according to claim 4, characterized in that the preset difference threshold is 3 pt to 5 pt.
6. The method for the reflow restoration of layout spaces according to claim 1 or 2, characterized in that obtaining the first character, the second character and the third character that are consecutive in the layout comprises obtaining the font resource information of the characters by a layout analysis method.
7. The method for the reflow restoration of layout spaces according to claim 6, characterized in that, if the judgment value is greater than the width of the space character of the font type of the first character, the method further comprises calculating the ratio n = S/w0 of the judgment value S to the width w0 of that space character, and inserting n spaces after the first character.
8. The method for the reflow restoration of layout spaces according to claim 1, characterized in that, when the text is in English, for the first character of a line, no space is inserted before that first character if the preceding character is an English hyphen "-"; otherwise one space is inserted before it.
9. The method for the reflow restoration of layout spaces according to claim 8, characterized in that the inserted space is a space of the font type of the character before the insertion position.
10. The method for the reflow restoration of layout spaces according to claim 1, characterized in that the right bearing and the left bearing are obtained from the glyph information in the font that contains the character.
11. A system for the reflow restoration of layout spaces, characterized by comprising:
an acquiring unit, which obtains, page by page, the page block information of the layout page corresponding to every reflow paragraph and reflow sentence in the structured information of a fixed-layout document, and obtains a first character, a second character and a third character that are consecutive in the layout;
a character pitch calculation unit, which calculates a first character pitch between the first character and the second character and a second character pitch between the second character and the third character, then calculates the character pitch difference between the first character pitch and the second character pitch;
a comparison and insertion unit, which inserts no space if the character pitch difference is less than a preset difference threshold; otherwise it adds the right bearing of the first character and the left bearing of the second character to obtain a bearing sum, and takes the difference between the first character pitch and the bearing sum as a judgment value; if the judgment value is greater than the width of the space character of the font type of the first character, it inserts a space after the first character, otherwise it inserts no space.
12. The system for the reflow restoration of layout spaces according to claim 11, characterized in that, in the character pitch calculation unit, calculating the first character pitch between the first character and the second character comprises:
obtaining the coordinate x1 of the first character, the coordinate x2 of the second character and the width w1 of the first character, the first character pitch being d1 = x2 - x1 - w1.
13. The system for the reflow restoration of layout spaces according to claim 11 or 12, characterized in that, in the character pitch calculation unit, calculating the second character pitch between the second character and the third character comprises:
obtaining the coordinate x2 of the second character, the coordinate x3 of the third character and the width w2 of the second character, the second character pitch being d2 = x3 - x2 - w2.
14. The system for the reflow restoration of layout spaces according to claim 13, characterized in that, in the character pitch calculation unit, the character pitch difference between the first character pitch and the second character pitch is calculated as D = d1 - d2.
15. The system for the reflow restoration of layout spaces according to claim 14, characterized in that the preset difference threshold is 3 pt to 5 pt.
16. The system for the reflow restoration of layout spaces according to claim 15, characterized in that the acquiring unit obtains the font resource information of the characters by a layout analysis method.
17. The system for the reflow restoration of layout spaces according to claim 16, characterized in that, if the judgment value is greater than the width of the space character of the font type of the first character, the comparison and insertion unit further calculates the ratio n = S/w0 of the judgment value S to the width w0 of that space character, and inserts n spaces after the first character.
18. The system for the reflow restoration of layout spaces according to claim 17, characterized by further comprising a hyphen judging unit which, when the text is in English, for the first character of a line, inserts no space before that first character if the preceding character is an English hyphen "-", and otherwise inserts one space before it.
19. The system for the reflow restoration of layout spaces according to claim 18, characterized in that the inserted space is a space of the font type of the character before the insertion position.
20. The system for the reflow restoration of layout spaces according to claim 19, characterized in that the right bearing and the left bearing are obtained from the glyph information in the font that contains the character.
CN201310462663.0A 2013-09-30 2013-09-30 The streaming restoring method and system in a kind of space of a whole page space Active CN104516868B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310462663.0A CN104516868B (en) 2013-09-30 2013-09-30 The streaming restoring method and system in a kind of space of a whole page space

Publications (2)

Publication Number Publication Date
CN104516868A CN104516868A (en) 2015-04-15
CN104516868B CN104516868B (en) 2018-03-06

Family

ID=52792194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310462663.0A Active CN104516868B (en) 2013-09-30 2013-09-30 The streaming restoring method and system in a kind of space of a whole page space

Country Status (1)

Country Link
CN (1) CN104516868B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105335346B (en) * 2015-11-09 2018-12-04 汉王科技股份有限公司 A kind of Text Extraction and device of PDF document
CN105893342A (en) * 2015-12-29 2016-08-24 乐视移动智能信息技术(北京)有限公司 Text information processing method and device
CN111695414B (en) * 2020-04-28 2024-03-01 北京奇艺世纪科技有限公司 Document processing method and device, electronic equipment and computer readable storage medium
CN112699634B (en) * 2020-12-28 2022-05-24 掌阅科技股份有限公司 Typesetting processing method of electronic book, electronic equipment and storage medium
CN113723048A (en) * 2021-09-06 2021-11-30 北京字跳网络技术有限公司 Method and device for setting rich text space, storage medium and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101295290A (en) * 2008-06-11 2008-10-29 北大方正集团有限公司 Method for multi-row words layout in row
CN101901333A (en) * 2009-05-25 2010-12-01 汉王科技股份有限公司 Method for segmenting word in text image and identification device using same

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7827484B2 (en) * 2005-09-02 2010-11-02 Xerox Corporation Text correction for PDF converters
JP5248845B2 (en) * 2006-12-13 2013-07-31 キヤノン株式会社 Document processing apparatus, document processing method, program, and storage medium
CN101876967B (en) * 2010-03-25 2012-05-02 深圳市万兴软件有限公司 Method for generating PDF text paragraphs
CN101980185B (en) * 2010-10-29 2013-03-27 方正国际软件有限公司 Method and system for removing spaces from text copied from double-layer electronic file

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220622

Address after: 3007, Hengqin international financial center building, No. 58, Huajin street, Hengqin new area, Zhuhai, Guangdong 519031

Patentee after: New founder holdings development Co.,Ltd.

Patentee after: Beijing Fangzheng apapi Technology Co., Ltd.

Address before: 100871, Beijing, Haidian District Cheng Fu Road 298, founder building, 9 floor

Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.

Patentee before: Beijing Fangzheng apapi Technology Co., Ltd.
