CN104516859B - Text correction method and system - Google Patents

Text correction method and system

Info

Publication number
CN104516859B
Authority
CN
China
Prior art keywords
word
font
wide
width
description information
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310447805.6A
Other languages
Chinese (zh)
Other versions
CN104516859A (en
Inventor
孙浩鹏
丁力
董宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Fangzheng Apapi Technology Co Ltd
New Founder Holdings Development Co ltd
Original Assignee
Peking University Founder Group Co Ltd
Beijing Founder Apabi Technology Co Ltd
Application filed by Peking University Founder Group Co Ltd and Beijing Founder Apabi Technology Co Ltd

Abstract

A text correction method and system. At least one font used by the characters in a document is obtained; for each font, fonts whose glyph description information is erroneous are identified from the character widths of characters in that font; and the characters in those fonts are corrected. Because character-width information reflects the glyph description information well, correcting it resolves the uneven line spacing, excessive character gaps, and overlapping text that may arise during typesetting. Equal-width and non-equal-width characters are corrected separately according to their characteristics, which effectively improves the readability and appearance of the content.

Description

Text correction method and system
Technical field
The present invention relates to the field of computer text processing, and in particular to a text correction method and system.
Background
A computer font, or simply a font, is a data set that contains a collection of glyphs and a character encoding set. It holds the mapping from character codes to glyphs together with the description information of each glyph, from which the glyph bitmap can be drawn. Computer text is generally presented to users in the form of documents. A reflowable (streaming) document is one that has no fixed layout and whose content can be arranged according to the actual size of the display medium. ePub and TXT are two commonly used reflowable formats. PDF and CEBX documents that carry structural information can also be treated as reflowable documents.
With the spread of smartphones and the rise of the mobile Internet, more and more people read on mobile devices, and reflowable documents are the format best suited to mobile reading. Mobile devices come in many product categories with different screen resolutions and sizes, and the "reflowability" of streaming content allows it to be presented sensibly on whatever device is actually used, which greatly improves the readability and appearance of the content.
For historical reasons, however, some reflowable documents contain carelessly produced embedded fonts. These fonts carry erroneous glyph description information: some contain wrong line-height information, others wrong character-width information, and so on. This disrupts normal typesetting, so that the laid-out content shows uneven line spacing, excessive gaps between characters, or even overlapping characters, which seriously harms the readability and appearance of the content.
Some typesetting methods for documents read on terminals have therefore appeared in the prior art. In one such scheme, when web-page layout starts on a mobile terminal, or when the font changes during layout, the method judges whether the characters to be typeset can be treated as equal-width; if so, layout is performed using character-width data stored in advance on the mobile terminal. The judgment is based on the length of a reference character string and the actual length of the characters to be typeset. This handles equal-width text such as Chinese, Japanese, and Korean, but it cannot be applied to documents that contain both equal-width text and non-equal-width text (such as English or German), because the width of non-equal-width text varies greatly with the characters it contains. Moreover, these schemes cannot effectively identify or handle documents that contain erroneous glyph description information: they can neither determine whether the glyph description information is wrong nor treat equal-width and non-equal-width text separately, so typesetting errors go undetected and the typesetting result is poor.
Summary of the invention
The technical problem to be solved by the present invention is that prior-art typesetting methods cannot promptly detect and correct errors in the glyph description information of documents that contain both equal-width and non-equal-width text. The invention therefore proposes a text correction method and system capable of correcting such errors in documents that contain equal-width and non-equal-width text.
To solve the above technical problem, the present invention provides a text correction method and system.
A text correction method, comprising:
Obtaining at least one font used by the characters in a document;
For each font, identifying fonts whose glyph description information is erroneous according to the character widths of characters in that font;
Correcting the characters in the fonts whose glyph description information is erroneous.
In the text correction method of the present invention, identifying fonts whose glyph description information is erroneous according to the character widths of characters in the font comprises:
Selecting at least two characters in the font, calculating for each character the ratio of its character width to the width of its glyph bounding rectangle, and calculating the average of these ratios;
Determining whether the glyph description information of the font is erroneous according to whether the average exceeds a threshold.
In the text correction method of the present invention, the threshold is 0.9-1.1 for equal-width characters and 1.2-1.4 for non-equal-width characters.
In the text correction method of the present invention, correcting the characters in a font whose glyph description information is erroneous comprises:
Correcting the character width of each character according to the width of its glyph bounding rectangle.
In the text correction method of the present invention, the character width is corrected using the formula W = D × a, where W is the corrected character width, D is the width of the character's glyph bounding rectangle, and a is the correction coefficient; a = 1.01-1.1 for equal-width characters and a = 1.22-1.35 for non-equal-width characters.
In the text correction method of the present invention, the correction coefficient is a = 1.05 for equal-width characters and a = 1.3 for non-equal-width characters.
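As a worked illustration of the formula W = D × a with the preferred coefficients, the following takes two assumed glyph bounding-rectangle widths (1000 and 520, in font design units). These widths are invented purely for the arithmetic and do not come from the source.

\[
W_{\text{equal}} = D \times a = 1000 \times 1.05 = 1050, \qquad
W_{\text{non-equal}} = D \times a = 520 \times 1.3 = 676
\]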
In the text correction method of the present invention, the step of identifying, for each font, fonts whose glyph description information is erroneous according to the character widths of characters in the font is performed on embedded fonts.
In the text correction method of the present invention, correcting the characters in a font whose glyph description information is erroneous comprises: for each such character, first determining whether it is an equal-width character; for equal-width characters, calculating a sample mean and correcting it; for non-equal-width characters, obtaining the glyph bounding-rectangle width of the character and correcting it.
In the text correction method of the present invention, the sample mean for equal-width characters is calculated and corrected as follows:
First, determine whether the character width for the font has already been calculated. If not, select sample equal-width characters that use the font, obtain the glyph bounding-rectangle widths of the samples, calculate their mean, and correct the mean with the correction coefficient to obtain the corrected width. If it has already been calculated, use the earlier result directly.
The text correction method of the present invention further comprises a typesetting step. Characters in fonts whose glyph description information is erroneous are typeset after correction; for characters in fonts without such errors, the character width is obtained directly and the character is typeset. During typesetting, the width of the current character is added to the width of the text already placed on the current line. If the sum exceeds the width of the typesetting region, the placed-text width is reset, the line baseline is advanced by one line height, and the character is placed at the end of the new current line; otherwise the character is placed directly at the end of the current line.
A text correction system, comprising:
Acquiring unit: obtains at least one font used by the characters in a document;
Selecting unit: for each font, identifies fonts whose glyph description information is erroneous according to the character widths of characters in that font;
Correcting unit: corrects the characters in the fonts whose glyph description information is erroneous.
In the text correction system of the present invention, the selecting unit comprises:
Mean calculation module: selects at least two characters in the font, calculates for each character the ratio of its character width to the width of its glyph bounding rectangle, and calculates the average of these ratios;
Judging module: determines whether the glyph description information of the font is erroneous according to whether the average exceeds a threshold.
In the text correction system of the present invention, the threshold is 0.9-1.1 for equal-width characters and 1.2-1.4 for non-equal-width characters.
In the text correction system of the present invention, the correcting unit comprises a correcting module, which corrects the character width of each character according to the width of its glyph bounding rectangle.
In the text correction system of the present invention, the correcting module applies the formula W = D × a, where W is the corrected character width, D is the width of the character's glyph bounding rectangle, and a is the correction coefficient; a = 1.01-1.1 for equal-width characters and a = 1.22-1.35 for non-equal-width characters.
In the text correction system of the present invention, the correction coefficient is a = 1.05 for equal-width characters and a = 1.3 for non-equal-width characters.
In the text correction system of the present invention, the selecting unit processes embedded fonts.
In the text correction system of the present invention, the correcting unit comprises:
Judging module: for each character in a font whose glyph description information is erroneous, first determines whether it is an equal-width character;
Equal-width character correcting module: calculates a sample mean for equal-width characters and corrects it;
Non-equal-width character correcting module: obtains the glyph bounding-rectangle width of each non-equal-width character and corrects it.
In the text correction system of the present invention, the equal-width character correcting module comprises:
Judging submodule: determines whether the character width for the font has already been calculated;
Calculating submodule: if it has not been calculated, selects sample equal-width characters that use the font, obtains the glyph bounding-rectangle widths of the samples, calculates their mean, and corrects the mean with the correction coefficient to obtain the corrected width;
Calling submodule: if it has already been calculated, uses the earlier result directly.
The text correction system of the present invention further comprises a typesetting unit. Characters in fonts whose glyph description information is erroneous are typeset after correction; for characters in fonts without such errors, the character width is obtained directly and the character is typeset. During typesetting, the width of the current character is added to the width of the text already placed on the current line. If the sum exceeds the width of the typesetting region, the placed-text width is reset, the line baseline is advanced by one line height, and the character is placed at the end of the new current line; otherwise the character is placed directly at the end of the current line.
Compared with the prior art, the above technical solution of the present invention has the following advantages:
(1) In the text correction method and system of the present invention, at least one font used by the characters in a document is obtained; for each font, fonts whose glyph description information is erroneous are identified from the character widths of characters in that font; and the characters in those fonts are corrected. Because character-width information reflects the glyph description information well, correcting it resolves the uneven line spacing, excessive character gaps, and overlapping text that may arise during typesetting, and effectively improves the readability and appearance of the content.
(2) In the text correction method of the present invention, fonts whose glyph description information is erroneous are identified as follows: select at least two characters in the font, calculate for each character the ratio of its character width to the width of its glyph bounding rectangle, and calculate the average of these ratios; then determine whether the glyph description information of the font is erroneous according to whether the average exceeds a threshold. The threshold is 0.9-1.1 for equal-width characters and 1.2-1.4 for non-equal-width characters. This method effectively determines whether the glyph description information is erroneous and improves the error recognition rate.
(3) In the text correction method of the present invention, the character width of each character is corrected according to the width of its glyph bounding rectangle, using the formula W = D × a, where W is the corrected character width, D is the width of the character's glyph bounding rectangle, and a is the correction coefficient. For equal-width characters a = 1.01-1.1, preferably a = 1.05; for non-equal-width characters a = 1.22-1.35, preferably a = 1.3. Correcting equal-width and non-equal-width characters with separate coefficients improves the neatness and appearance of the corrected text.
(4) The text correction method of the present invention processes embedded fonts. Non-embedded fonts can be typeset normally without processing, and embedded fonts whose glyph description information is correct need no processing either, so the method identifies only the embedded fonts that contain erroneous glyph description information and handles them specifically. This reduces the amount of processing and increases processing speed. In addition, the use of sample statistics makes the method generally applicable and simple to implement.
(5) In the text correction method of the present invention, for characters that use a font containing erroneous glyph description information, the method first determines whether the character is equal-width. For equal-width characters it calculates a sample mean and corrects it; for non-equal-width characters it obtains the glyph bounding-rectangle width of the character and corrects it. Different correction strategies are thus applied to equal-width and non-equal-width characters. Equal-width characters, such as Chinese and Korean, all share the same width, so the result can be reused after a single calculation; non-equal-width characters must be handled individually, character by character. This targeted correction improves the accuracy of the calculation.
(6) In the text correction method of the present invention, for fonts containing erroneous glyph description information, the character width is obtained by correction and added to the width of the text already typeset on the current line; the result is then compared with the width of the typesetting region. If it does not exceed that width, the character is inserted directly; otherwise it is inserted after a line break. Characters are thus inserted strictly in order, which safeguards the typesetting result.
Brief description of the drawings
To make the content of the present invention easier to understand, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings, in which:
Fig. 1 is a flow chart of the text correction method of Embodiment 1 of the present invention;
Fig. 2 is a flow chart of the text correction method of another embodiment of the present invention;
Fig. 3 is a structural diagram of the text correction system in an embodiment of the present invention;
Fig. 4 is a flow chart of the judging step in an embodiment of the present invention;
Fig. 5 is a flow chart of the error correction step in an embodiment of the present invention;
Fig. 6 is a flow chart of the typesetting step in an embodiment of the present invention;
Fig. 7 shows the result of typesetting with the text correction method described in an embodiment of the present invention.
Detailed description of the embodiments
Specific embodiments of the text correction method and system of the present invention are given below and described in detail with reference to the accompanying drawings, so that the way the invention applies technical means to solve the technical problem and achieve its technical effect can be fully understood and put into practice.
Embodiment 1:
This embodiment provides a text correction method, whose flow chart is shown in Fig. 1, comprising:
(1) Obtaining at least one font used by the characters in a document;
(2) For each font, identifying fonts whose glyph description information is erroneous according to the character widths of characters in that font. The detailed process is as follows:
Select at least two characters in the font, calculate for each character the ratio of its character width to the width of its glyph bounding rectangle, and calculate the average of these ratios; determine whether the glyph description information of the font is erroneous according to whether the average exceeds a threshold. For equal-width characters, the threshold is 0.9, 1, or 1.1; for non-equal-width characters, the threshold is 1.2, 1.3, or 1.4.
(3) Correcting the characters in the fonts whose glyph description information is erroneous. The character width of each character is corrected according to the width of its glyph bounding rectangle, using the formula W = D × a, where W is the corrected character width and D is the width of the character's glyph bounding rectangle; the correction coefficient is a = 1.05 for equal-width characters and a = 1.3 for non-equal-width characters.
In other implementations, the correction coefficient a can be chosen as needed: a generally ranges from 1.01 to 1.1 for equal-width characters and from 1.22 to 1.35 for non-equal-width characters.
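Step (2) of this embodiment can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names and the sample widths are hypothetical, and it assumes the simplest reading of the text, namely that a font is flagged when the mean ratio is greater than the chosen threshold.

```python
from statistics import mean

def mean_width_ratio(samples):
    """samples: list of (character_width, bounding_rectangle_width) pairs for one font."""
    if len(samples) < 2:
        raise ValueError("the method samples at least two characters")
    return mean(width / box for width, box in samples)

def is_suspect(samples, threshold):
    # Assumed reading: the glyph description information is treated as
    # erroneous when the mean ratio exceeds the threshold.
    return mean_width_ratio(samples) > threshold

# Invented sample data for an equal-width font, threshold 1.1 as in the embodiment.
plausible = [(1000, 950), (1000, 960)]
too_wide = [(1500, 950), (1480, 960)]
print(is_suspect(plausible, 1.1), is_suspect(too_wide, 1.1))
```

Keeping the ratio computation separate from the threshold comparison makes it easy to try the other threshold values listed in the embodiment.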
Embodiment 2:
This embodiment provides a text correction system corresponding to Embodiment 1, comprising:
(1) Acquiring unit: obtains at least one font used by the characters in a document;
(2) Selecting unit: for each font, identifies fonts whose glyph description information is erroneous according to the character widths of characters in that font. The detailed process is as follows:
Select at least two characters in the font, calculate for each character the ratio of its character width to the width of its glyph bounding rectangle, and calculate the average of these ratios; determine whether the glyph description information of the font is erroneous according to whether the average exceeds a threshold. For equal-width characters, the threshold is 0.9, 1, or 1.1; for non-equal-width characters, the threshold is 1.2, 1.3, or 1.4.
(3) Correcting unit: corrects the characters in the fonts whose glyph description information is erroneous. The character width of each character is corrected according to the width of its glyph bounding rectangle, using the formula W = D × a, where W is the corrected character width and D is the width of the character's glyph bounding rectangle; the correction coefficient is a = 1.05 for equal-width characters and a = 1.3 for non-equal-width characters.
In other implementations, the correction coefficient a in the correcting unit can be chosen as needed: a generally ranges from 1.01 to 1.1 for equal-width characters and from 1.22 to 1.35 for non-equal-width characters.
Embodiment 3:
This embodiment provides another text correction method, comprising:
(1) Obtaining at least one font used by the characters in a document;
(2) For each font, processing only embedded fonts: for each embedded font, identifying whether its glyph description information is erroneous according to the character widths of characters in that font.
Select five characters in the font as samples, calculate for each character the ratio of its character width to the width of its glyph bounding rectangle, and calculate the average of these ratios; determine whether the glyph description information of the font is erroneous according to whether the average exceeds a threshold. For equal-width characters, the threshold is 1.1; for non-equal-width characters, the threshold is 1.3.
(3) Correcting the characters in the fonts whose glyph description information is erroneous. For each character in such a font, first determine whether it is an equal-width character.
For equal-width characters, a sample mean is calculated and corrected as follows:
First, determine whether the character width for the font has already been calculated. If not, select sample equal-width characters that use the font, obtain the glyph bounding-rectangle widths of the samples, calculate their mean, and correct the mean with the correction coefficient to obtain the corrected width, using the formula W = D × a, where W is the corrected character width, D is the sample mean of the glyph bounding-rectangle widths, and the correction coefficient is a = 1.07. If it has already been calculated, use the earlier result directly.
For non-equal-width characters, the glyph bounding-rectangle width of the character is obtained and corrected with the correction coefficient: W = D × a, where W is the corrected character width, D is the width of the character's glyph bounding rectangle, and a = 1.25.
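The two correction paths of step (3) differ mainly in how often the calculation runs: once per font for equal-width characters, once per character otherwise. The sketch below illustrates that caching behaviour under stated assumptions: the class name, method names, and the `bbox_widths` mapping are hypothetical, and only the coefficients 1.07 and 1.25 come from this embodiment.

```python
EQUAL_WIDTH_A = 1.07       # coefficient for equal-width characters (Embodiment 3)
NON_EQUAL_WIDTH_A = 1.25   # coefficient for non-equal-width characters (Embodiment 3)

class WidthCorrector:
    def __init__(self, bbox_widths):
        # Hypothetical input: {font_name: {char: glyph bounding-rectangle width}}
        self.bbox_widths = bbox_widths
        self._cache = {}   # font_name -> corrected equal-width character width

    def equal_width(self, font, sample_chars):
        # Calculated once per font from the sample mean, then reused.
        if font not in self._cache:
            boxes = [self.bbox_widths[font][c] for c in sample_chars]
            self._cache[font] = sum(boxes) / len(boxes) * EQUAL_WIDTH_A
        return self._cache[font]

    def non_equal_width(self, font, char):
        # Calculated separately for every character.
        return self.bbox_widths[font][char] * NON_EQUAL_WIDTH_A

corrector = WidthCorrector({"F": {"中": 900, "文": 1100, "a": 400}})
print(corrector.equal_width("F", "中文"), corrector.non_equal_width("F", "a"))
```

A plain dictionary keyed by font name is enough here because the cached value depends only on the font, not on the individual character.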
In other embodiments, the correction process above is followed by a typesetting step, as shown in Fig. 2. Characters in fonts whose glyph description information is erroneous are typeset after correction; for characters in fonts without such errors, the character width is obtained directly and the character is typeset. During typesetting, the width of the current character is added to the width of the text already placed on the current line. If the sum exceeds the width of the typesetting region, the placed-text width is reset, the line baseline is advanced by one line height, and the character is placed at the end of the new current line; otherwise the character is placed directly at the end of the current line.
Embodiment 4:
This embodiment provides a text correction system, comprising:
(1) Parsing unit (also called the acquiring unit): obtains at least one font used by the characters in a document;
(2) Judging unit (also called the selecting unit): for each font, identifies fonts whose glyph description information is erroneous according to the character widths of characters in that font. It comprises:
Mean calculation module: selects at least two characters in the font, calculates for each character the ratio of its character width to the width of its glyph bounding rectangle, and calculates the average of these ratios.
Judging module: determines whether the glyph description information of the font is erroneous according to whether the average exceeds a threshold.
(3) Error correction unit (also called the correcting unit): corrects the characters in the fonts whose glyph description information is erroneous. It comprises:
3.1 Judging module: for each character in a font whose glyph description information is erroneous, first determines whether it is an equal-width character.
3.2 Equal-width character correcting module: calculates a sample mean for equal-width characters and corrects it. It comprises:
Judging submodule: determines whether the character width for the font has already been calculated;
Calculating submodule: if it has not been calculated, selects sample equal-width characters that use the font, obtains the glyph bounding-rectangle widths of the samples, calculates their mean, and corrects the mean with the correction coefficient to obtain the corrected width;
Calling submodule: if it has already been calculated, uses the earlier result directly.
3.3 Non-equal-width character correcting module: obtains the glyph bounding-rectangle width of each non-equal-width character and corrects it.
(4) Typesetting unit: typesets characters in fonts whose glyph description information is erroneous after they have been corrected; for characters in fonts without such errors, obtains the character width directly and typesets the character. During typesetting, the width of the current character is added to the width of the text already placed on the current line. If the sum exceeds the width of the typesetting region, the placed-text width is reset, the line baseline is advanced by one line height, and the character is placed at the end of the new current line; otherwise the character is placed directly at the end of the current line.
Embodiment 5:
A specific embodiment of the present invention is given below: a text correction method comprising the following steps:
(1) Parsing step: receive the text information of a document, for example a piece of reflowable text content (an ePub reflowable document in this embodiment), and parse out the font resources and the text content it contains; see step 205 in Fig. 2.
A font resource is font data: a data set containing a collection of glyphs and a character encoding set, which describes the mapping from character codes to glyphs and the description information of each glyph. Put simply, a font describes what each character looks like when drawn, together with information such as its width and height. A font can exist as a separate file or be embedded in a document as part of that document.
The text content consists of the order of the characters, the font name or number that the system assigns to identify each font, the font size used when a character is drawn, and the Unicode code value of each character. The text content can also be regarded as a set of character descriptions, where each description mainly contains the character's code value (which determines exactly which character it is) and the font it uses. The parsing step resolves the received data into font resources and text content. In this embodiment, different font data are processed separately: because different fonts have different character widths and different correction parameters, handling each font's data separately improves the judgment for each font and the correction result, and therefore gives a better typesetting result.
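One possible in-memory shape for the output of the parsing step is sketched below. All class, field, and function names are hypothetical; the source only says that each character description carries a code value, a font identifier, and a font size, and that fonts are processed separately.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CharRecord:
    code_point: int    # Unicode code value identifying the character
    font_name: str     # name or number the system assigned to the font
    font_size: float   # size at which the character is drawn

@dataclass
class ParsedDocument:
    fonts: dict   # font name -> font data (treated as opaque here)
    text: list    # CharRecord items in reading order

def group_by_font(doc):
    """Group character records by font so each font can be judged separately."""
    groups = {}
    for record in doc.text:
        groups.setdefault(record.font_name, []).append(record)
    return groups

doc = ParsedDocument(
    fonts={"A": b"", "B": b""},
    text=[CharRecord(0x4E2D, "A", 12.0), CharRecord(0x61, "B", 12.0),
          CharRecord(0x6587, "A", 12.0)],
)
print({name: len(records) for name, records in group_by_font(doc).items()})
```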
(2) Judging step: find the embedded fonts that contain erroneous glyph description information. In this step, each font resource is judged in turn to identify fonts containing erroneous glyph description information; see step 210 in Fig. 2. First, determine whether the font under analysis is an embedded font; if it is a non-embedded font, it is regarded as a correct font. Embedded fonts are then processed as follows: select a certain number of sample characters from the parsed text content, all of which belong to the font being judged and none of which are punctuation marks; obtain, at the same font size, the character width and the glyph bounding-rectangle width of each sample character; calculate the ratio of character width to glyph bounding-rectangle width for each sample; and compute the average of these ratios. Then judge whether the average falls outside the threshold range; if it does (too large or too small), mark the font as an embedded font containing erroneous glyph description information.
In this step, the average ratio of character width to glyph bounding-rectangle width over the samples is compared with the threshold range to identify embedded fonts containing erroneous glyph description information. Non-embedded fonts can be typeset normally without processing, and embedded fonts whose glyph description information is correct need no processing either, so the method identifies only the embedded fonts that contain erroneous glyph description information and handles them specifically, which reduces the amount of processing and increases processing speed. In addition, the use of sample statistics makes the method generally applicable and simple to implement.
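The sample selection and the range check of the judging step might look like the sketch below. It rests on several assumptions: the function names are hypothetical, punctuation is detected through Python's `unicodedata` general categories, whitespace is also skipped (the source mentions only punctuation), and the band 0.9 to 1.1 in the usage line is just one way of reading the threshold values given earlier.

```python
import unicodedata

def pick_samples(chars, count=5):
    """Return up to `count` distinct non-punctuation characters, in order of appearance."""
    samples = []
    for ch in chars:
        # General categories starting with "P" are punctuation; skipping
        # whitespace as well is an added assumption.
        if unicodedata.category(ch).startswith("P") or ch.isspace():
            continue
        if ch not in samples:
            samples.append(ch)
        if len(samples) == count:
            break
    return samples

def outside_range(mean_ratio, low, high):
    # "Too large or too small": the mean falls outside [low, high].
    return mean_ratio < low or mean_ratio > high

print(pick_samples("a, b. c!", 3))
print(outside_range(1.5, 0.9, 1.1), outside_range(1.0, 0.9, 1.1))
```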
(3) Error correction step: for characters that use a font containing erroneous glyph description information, first determine whether each character is equal-width and then, according to the result, correct the characters that use the fonts flagged in the judging step; see step 215 in Fig. 2. For a character that uses a font containing erroneous glyph description information, its Unicode code value determines whether it is equal-width text (such as Chinese, Japanese, or Korean) or other, non-equal-width text. For equal-width characters, the character width needs to be calculated only once per font: take the average glyph bounding-rectangle width of a certain number of sample characters, then correct the average with a correction coefficient (an empirically determined value) to obtain the character width of all equal-width characters in the font. Because the width needs to be calculated only once per font, later checks first determine whether the character width for the font has already been calculated. If not, select sample equal-width characters that use the font, obtain the glyph bounding-rectangle widths of the samples, calculate their mean, and correct it with the correction coefficient. If it has already been calculated, use the previously calculated character width of the font's equal-width characters directly. For non-equal-width characters, the character width must be calculated separately for each character: first obtain the glyph bounding-rectangle width of the character, then correct it with a correction coefficient (an empirical value). The corrected width multiplied by the font size gives the actual character width, which is the correct character width.
In this error correction step, for characters that use a font containing erroneous glyph description information, the method first determines whether each character is equal-width. For equal-width characters it calculates a sample mean and corrects it; for non-equal-width characters it obtains the glyph bounding-rectangle width of the character and corrects it. Different correction strategies are thus applied to equal-width and non-equal-width characters. Equal-width characters, such as Chinese and Korean, all share the same width, so the result can be reused after a single calculation; non-equal-width characters must be handled individually, character by character. This targeted correction improves the accuracy of the calculation. Because the correction is carried out on relative values, the corrected value must be multiplied by the font size to obtain the actual character width.
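The equal-width decision and the final scaling can be sketched as follows. The source says only that the decision is made from the Unicode code value and names Chinese, Japanese, and Korean as examples; the specific code-point ranges, the function names, and the interpretation of the relative width as a fraction of the font size are assumptions made for illustration.

```python
# Assumed code-point ranges treated as equal-width text.
EQUAL_WIDTH_RANGES = (
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7A3),  # Hangul Syllables
)

def is_equal_width(ch):
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in EQUAL_WIDTH_RANGES)

def actual_width(relative_width, font_size):
    # The corrected width is relative, so it is scaled by the font size.
    return relative_width * font_size

print(is_equal_width("中"), is_equal_width("a"))
print(actual_width(1.05, 16))
```

A production classifier would need a fuller table than these three ranges; they are kept short here so the idea stays visible.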
(4) Typesetting step: All characters are typeset. Typesetting places the characters of the text stream one by one into a layout region sized to the display medium; that is, the flowing text is arranged and broken into lines, with characters placed one after another according to the corrected character width information, which determines the exact coordinates of each character on the display medium; see step 220 in Fig. 2. First, each character in the text stream is processed in turn. For an embedded font containing wrong font description information, the correct character width value is obtained from the error correction step; for an embedded font without wrong font description information and for non-embedded fonts, the character width value is read directly. The current character width is then added to the width of the text already placed on the current line, and the sum is compared with the layout region width. If the sum is greater than the layout region width, the placed-text width of the current line is reset, the line baseline is advanced by one line height (the maximum character height among the characters placed on the previous line), and the character is placed at the end of the new current line. If the sum is not greater than the layout region width, the character is placed directly at the end of the current line. See Fig. 5 for the detailed flowchart. Inserting characters strictly in order, directly when they fit and after a line break otherwise, ensures the typesetting result.
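The line-breaking rule in this step can be sketched as a short Python function. It is an illustration under stated assumptions: the function name, the `(char, width, height)` input tuples, and the returned `(char, x, baseline)` tuples are hypothetical, and the widths are assumed to have been corrected already.

```python
def layout(chars, region_width):
    """Place characters left to right, breaking lines when they overflow.

    chars: iterable of (char, width, height) with corrected widths.
    Returns a list of (char, x, baseline) positions.
    """
    placed = []
    line_width = 0.0       # width of the text already placed on this line
    baseline = 0.0         # vertical offset of the current line
    line_max_height = 0.0  # tallest character placed on the current line

    for ch, width, height in chars:
        if line_width + width > region_width:
            # Overflow: advance the baseline by the previous line's
            # maximum character height and start a new line.
            baseline += line_max_height
            line_width = 0.0
            line_max_height = 0.0
        placed.append((ch, line_width, baseline))
        line_width += width
        line_max_height = max(line_max_height, height)
    return placed
```

For example, with a region width of 340, thirty-four characters of width 10 fill the first line exactly, and a following character of width 2.5 is moved to the start of the next line.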
The character correction method of this embodiment comprises a parsing step, a judgment step, an error correction step, and a typesetting step. The font resources and text content are parsed first; the embedded fonts containing wrong font description information are then identified according to the judgment criterion; each such embedded font is checked for equal width and corrected accordingly. Correcting equal-width and non-equal-width characters separately resolves problems that may otherwise appear during typesetting, such as uneven line spacing, excessive character gaps, or overlapping text, and effectively improves the readability and appearance of the content.
Embodiment 6:
This embodiment focuses on the part of the typesetting process in the present invention that relies on font description information to calculate typesetting positions; the finer typesetting rules involved in a concrete implementation are not covered here. The embodiment is explained with a specific application example.
An ePub document is provided in this embodiment; its appearance after typesetting is shown in Fig. 7. It contains three fonts: font 1 for the title, Chinese font 2 for the body text, and English font 3 for the body text. Font 1 is a correct font, while font 2 and font 3 contain wrong character width information.
This embodiment uses the text typesetting system with error correction (see Fig. 3) and its corresponding character correction method (see the flowchart in Fig. 2). First, the three font resources and the text stream used by the document are parsed (step 205). The three parsed fonts are then judged one by one (step 210), and font 2 and font 3 are found to be wrong fonts. Next, the character width of the equal-width characters in font 2 (Chinese characters and Chinese punctuation) is calculated (step 215). Finally, the typesetting unit typesets the text stream (step 220).
The detailed process is as follows:
(1) The document to be analyzed is received, as shown in Fig. 7, and its font resources and text content are parsed. It contains three fonts: font 1 for the title, Chinese font 2 for the body text, and English font 3 for the body text. Font 1 is a correct font, while font 2 and font 3 contain wrong character width information.
(2) For the judgment process, refer to Fig. 4. The three fonts are evaluated in turn (step 305), and each is checked to see whether it is an embedded font (step 310). Because font 1 is a non-embedded font, it is marked as a correct font (step 335).
Font 2 and font 3 are both embedded fonts. The sample count is set to 5, and five character samples are chosen from the text content of each font (step 315). The samples chosen for font 2 are the Chinese characters meaning "ice", "and", "fire", "it", and "song"; the samples chosen for font 3 are "e", "p", "i", "c", and "f". The character width value and the glyph bounding-box width value of each sample at font size 1 are obtained; the values are listed in Table 1 and Table 2. The mean ratio of character width to glyph bounding-box width is calculated as 0.57 for font 2 and 0.66 for font 3 (step 320). Font 2 is a Chinese font, for which the empirically preset threshold range is 0.9-1.1; font 3 is an English font, for which the empirically preset threshold range is 1.2-1.4. The mean ratio of each font is compared with its threshold range (step 325). Both fonts fall outside their ranges, so they are marked as fonts containing wrong font description information (step 330).
Table 1. Character width values of the font 2 samples
                                  Ice    And    Fire   It     Song
Character width value             0.5    0.5    0.5    0.5    0.5
Glyph bounding-box width value    0.95   0.95   0.95   0.95   0.95
Table 2. Character width values of the font 3 samples
                                  e      p      i      c      f
Character width value             0.25   0.25   0.1    0.25   0.2
Glyph bounding-box width value    0.38   0.38   0.15   0.38   0.31
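The judgment on these samples can be reproduced with a few lines of Python. This is a sketch only: the function and constant names are hypothetical, and the sample pairs are copied from Table 1 and Table 2 as printed. With those printed values the mean ratio for font 2 comes out at roughly 0.53 rather than the 0.57 quoted in the text, and at roughly 0.66 for font 3; in both cases the result lies outside the corresponding threshold range, so the conclusion is the same.

```python
from statistics import mean

# (character width, glyph bounding-box width) pairs at font size 1,
# copied from Table 1 and Table 2.
FONT2_SAMPLES = [(0.5, 0.95)] * 5
FONT3_SAMPLES = [(0.25, 0.38), (0.25, 0.38), (0.1, 0.15),
                 (0.25, 0.38), (0.2, 0.31)]

# Empirically preset threshold ranges given in the text.
WIDE_THRESHOLD = (0.9, 1.1)      # equal-width (e.g. Chinese) fonts
NON_WIDE_THRESHOLD = (1.2, 1.4)  # non-equal-width (e.g. English) fonts


def mean_ratio(samples):
    """Average of character width / bounding-box width over the samples."""
    return mean(width / bbox for width, bbox in samples)


def has_wrong_widths(samples, threshold):
    """True if the mean ratio lies outside the threshold range."""
    low, high = threshold
    return not (low <= mean_ratio(samples) <= high)


if __name__ == "__main__":
    print(has_wrong_widths(FONT2_SAMPLES, WIDE_THRESHOLD))
    print(has_wrong_widths(FONT3_SAMPLES, NON_WIDE_THRESHOLD))
```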
(3) For the error correction process, refer to Fig. 5. Characters that use font 1 need no correction. For characters that use font 2, take "ice" as an example. Its Unicode value is first found to lie in the equal-width range (step 405); then it is checked whether the equal-width character width of font 2 has already been calculated (step 410). Because this is the first character to be corrected, the width has not yet been calculated, so the calculation is performed. Equal-width samples of font 2 are chosen, again the five characters "ice", "and", "fire", "it", and "song" (step 420); their glyph bounding-box widths are obtained and averaged to 0.95, and the average is multiplied by the correction factor 1.05 (an empirical value), giving a corrected equal-width character width of about 1 for font 2 (step 425). Multiplying by the font size 10 of the character "ice" gives its actual character width, 10 (step 430). When the width of any other equal-width character in font 2 is corrected later, step 430 is performed directly. For characters that use font 3, take "e" as an example. Its Unicode value is first found to lie outside the equal-width range (step 405); the glyph bounding-box width of the character, 0.38, is obtained and multiplied by the correction factor 1.3 (an empirical value) to give 0.49 (step 415). Multiplying by the font size 10 of the character "e" gives its actual character width, 4.9 (step 430).
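Written out with the formula W = D × a from the claims, the two corrections in this step are as follows. The subscripts and the symbol s for the font size are notation introduced here for illustration; the rounding to 1 and 0.49 follows the values quoted in the text.

```latex
\begin{aligned}
W_{\text{font 2}} &= \bar{D} \times a = 0.95 \times 1.05 = 0.9975 \approx 1,
  & W_{\text{font 2}} \times s &= 1 \times 10 = 10 \\
W_{\text{e}} &= D \times a = 0.38 \times 1.3 = 0.494 \approx 0.49,
  & W_{\text{e}} \times s &= 0.49 \times 10 = 4.9
\end{aligned}
```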
(4) For the typesetting process, refer to Fig. 6. The first four characters of the document are the title text; the typesetting rules for titles are not described here. The typesetting of the body text is as follows. The body characters are processed in turn (step 505). Taking "ice" as an example, it is found to use a wrong font (step 510), so its correct character width value, 10, is obtained from the error correction unit (step 515). The width of the current character is added to the placed-text width of the current line, which increases from 30 to 40 (step 525). This is less than the layout region width of 340 (step 530), so "ice" is placed at the end of the current line (step 540). These steps are repeated until the character "j", whose character width is 2.5 (step 515). Adding it to the placed-text width raises the width from 340 to 342.5 (step 525), which exceeds the layout region width (step 530). The placed-text width of the current line is therefore reset to 0, the line baseline is advanced by one line height, 15 (step 535), and "j" is placed on the new line. The process repeats until all characters have been placed. The result after typesetting is shown in Fig. 7.
Obviously, the above embodiments are merely examples given for clarity of description and do not limit the implementations. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively. Obvious changes or variations derived from the above remain within the protection scope of the invention.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they grasp the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.

Claims (13)

  1. A character correction method, characterized by comprising:
    obtaining at least one font of the characters in a document;
    for each font, selecting, according to the character widths of the characters in the font, the fonts whose font description information is wrong, including: choosing at least two characters in the font, calculating for each character the ratio of its character width value to the glyph bounding-box width of the character, and calculating the average of the ratios; and determining, according to whether the average exceeds a threshold, whether the font description information of the font is wrong;
    correcting the characters in the fonts whose font description information is wrong.
  2. The character correction method according to claim 1, characterized in that, for equal-width ("wide") characters, the threshold is 0.9-1.1, and for non-equal-width characters, the threshold is 1.2-1.4.
  3. The character correction method according to claim 1, characterized in that correcting the characters in the fonts whose font description information is wrong includes:
    correcting the character width of a character according to the glyph bounding-box width of the character.
  4. The character correction method according to claim 3, characterized in that the character width of the character is corrected using the formula W = D × a, where W is the corrected character width value, D is the glyph bounding-box width value of the character, and a is the correction factor; for equal-width characters a = 1.01-1.1, and for non-equal-width characters a = 1.22-1.35.
  5. The character correction method according to claim 4, characterized in that, for equal-width characters, the correction factor a = 1.05, and for non-equal-width characters, the correction factor a = 1.3.
  6. The character correction method according to any one of claims 1-5, characterized in that, in the process of selecting, for each font and according to the character widths of the characters in the font, the fonts whose font description information is wrong, the processing is performed on embedded fonts.
  7. The character correction method according to claim 6, characterized in that correcting the characters in the fonts whose font description information is wrong includes: for a character in a font whose font description information is wrong, first determining whether it is an equal-width character; for equal-width characters, calculating a sample average and correcting it; and for non-equal-width characters, obtaining the glyph bounding-box width value of the character and correcting it.
  8. The character correction method according to claim 7, characterized in that the process of calculating a sample average for equal-width characters and correcting it is as follows:
    first determining whether the character width has already been calculated; if not, choosing equal-width character samples that use the font, obtaining the glyph bounding-box width values of the samples, calculating their average, and correcting the average with the correction factor to obtain the corrected width value; if so, directly using the previous calculation result.
  9. The character correction method according to claim 8, characterized by further comprising a typesetting step, in which typesetting is performed after the characters in the fonts whose font description information is wrong have been corrected, and characters in fonts without wrong font description information are typeset after their character width values are read directly; the typesetting process adds the current character width to the placed-text width of the current line; if the result is greater than the layout region width, the placed-text width of the current line is reset, the line baseline is advanced by one line height, and the character is placed at the end of the current line; if the result is not greater than the layout region width, the character is placed directly at the end of the current line.
  10. A character correction system, characterized by comprising:
    an acquiring unit, which obtains at least one font of the characters in a document;
    a selecting unit, which, for each font, selects, according to the character widths of the characters in the font, the fonts whose font description information is wrong; the selecting unit includes: a mean calculation module, which chooses at least two characters in the font, calculates for each character the ratio of its character width value to the glyph bounding-box width of the character, and calculates the average of the ratios; and a judgment module, which determines, according to whether the average exceeds a threshold, whether the font description information of the font is wrong;
    a correction unit, which corrects the characters in the fonts whose font description information is wrong.
  11. The character correction system according to claim 10, characterized in that, for equal-width characters, the threshold is 0.9-1.1, and for non-equal-width characters, the threshold is 1.2-1.4.
  12. The character correction system according to claim 10 or 11, characterized in that the correction unit includes a correction module, which corrects the character width of a character according to the glyph bounding-box width of the character.
  13. The character correction system according to claim 12, characterized in that the correction module performs the correction using the formula W = D × a, where W is the corrected character width value, D is the glyph bounding-box width value of the character, and a is the correction factor; for equal-width characters a = 1.01-1.1, and for non-equal-width characters a = 1.22-1.35.
CN201310447805.6A 2013-09-27 2013-09-27 A kind of word modification method and system Active CN104516859B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310447805.6A CN104516859B (en) 2013-09-27 2013-09-27 A kind of word modification method and system

Publications (2)

Publication Number Publication Date
CN104516859A CN104516859A (en) 2015-04-15
CN104516859B true CN104516859B (en) 2018-02-13

Family

ID=52792187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310447805.6A Active CN104516859B (en) 2013-09-27 2013-09-27 A kind of word modification method and system

Country Status (1)

Country Link
CN (1) CN104516859B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6288725B1 (en) * 1997-02-24 2001-09-11 Zining Fu Representation and restoration method of font information
CN101097600A (en) * 2006-06-29 2008-01-02 北大方正集团有限公司 Character recognizing method and system
CN101158940A (en) * 2007-11-21 2008-04-09 金蝶软件(中国)有限公司 Method and device for dwindling character stuffing in target region
CN101655835A (en) * 2009-08-26 2010-02-24 北大方正集团有限公司 Method for text message processing, text message output and character retrieval in electronic document and device thereof
CN102982328A (en) * 2011-08-03 2013-03-20 夏普株式会社 Character recognition apparatus and character recognition method

Also Published As

Publication number Publication date
CN104516859A (en) 2015-04-15

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220624

Address after: 3007, Hengqin international financial center building, No. 58, Huajin street, Hengqin new area, Zhuhai, Guangdong 519031

Patentee after: New founder holdings development Co.,Ltd.

Patentee after: Beijing Fangzheng apapi Technology Co., Ltd.

Address before: 100871, Beijing, Haidian District Cheng Fu Road 298, founder building, 9 floor

Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.

Patentee before: Beijing Fangzheng apapi Technology Co., Ltd.