CN102541963A - Method and device for inquiring character identification

Info

Publication number: CN102541963A
Application number: CN2010106242701A
Authority: CN
Inventors: 段垚; 王长桥
Original assignee: BEIDA FANGZHENG TECHN INST Co Ltd BEIJING; LEADE TECHNOLOGY DEVELOPMENT Co Ltd; Peking University Founder Group Co Ltd
Current assignee: BEIDA FANGZHENG TECHN INST Co Ltd BEIJING; LEADE TECHNOLOGY DEVELOPMENT Co Ltd; Peking University Founder Group Co Ltd
Priority date: 2010-12-31
Filing date: 2010-12-31
Publication date: 2012-07-04
Anticipated expiration: 2030-12-31
Also published as: CN102541963B

Abstract

An embodiment of the invention discloses a method and a device for querying font identifiers, relating to the field of computer information processing, which solve the prior-art problem that a sufficient number of equivalent or similar font unique identifiers cannot be obtained. The font identifier querying method includes: receiving an input font unique identifier; querying, in an equivalence relation module of a font recognition database, font unique identifiers equivalent to the input font unique identifier; and/or querying, in a similarity relation module of the font recognition database, font unique identifiers similar to the input font unique identifier; and returning the font unique identifiers found. With this method and device, a sufficient number of equivalent or similar font unique identifiers can be obtained.

Description

Font ID querying method and device

Technical field

The present invention relates to the field of computer information processing, and in particular to a font ID querying method and device.

Background technology

A font is a set of glyphs sharing a common style, and a glyph is the visual representation of a character. A font instance is a data entity of a font; it comprises glyph data for many characters plus some metadata, and is usually packaged in a font file such as a .ttf file. The metadata includes the name, developer and version number of the font instance. Fonts and font instances stand in a one-to-many relation: the same font may appear as several different font instances. For example, the same font can be made into font instances using different font technologies (such as TrueType, Type 1 or OpenType). Alternatively, the same font can be cut down so that different font instances contain different amounts of glyph data; this is called font subsetting. For example, the full version of a Chinese font may contain 10000 Chinese characters, while a font instance of the same font made for a mobile phone omits the rarely used characters and keeps only 4000, that is, only 4000 pieces of glyph data. Several copies of a font instance can also be made and placed in different computer systems or storage locations; their content is identical, but the copies are regarded as different font instances. Different font instances of the same font are regarded as equivalent.

In many electronic documents, such as MS Word documents, HTML documents and Adobe PDF documents, different text must be displayed in different fonts. The usual way to specify a font is to give its name in the document, such as "Times New Roman" or "Song typeface". Once the name of the font to be used has been specified for the text to be displayed, the document processing device obtains from its local font library the font instance of the font corresponding to that name, looks up the glyph data of the text to be displayed in that font instance, and finally displays the glyph data found.

However, when the same document is processed on several different document processing devices, specifying fonts by name has many shortcomings. First, the mapping from font names to fonts is not unique: many fonts that are actually different share the same name, so text assigned the same font may be displayed differently on different document processing devices. Second, the specified font may not exist on the document processing device, so the text cannot be displayed in that font and an insufficiently similar substitute font may be used instead. Either problem can prevent the document from being displayed or processed with the appearance its author intended.

One existing solution to the above problems is embedded font technology. With embedded fonts, the document file explicitly specifies where the font instance of each font used by the document can be obtained; such a font is called an embedded font. Embedded font technology has two implementations. In the first, the document file records the offset address of the font instance within the document file, and the font instance is embedded at the corresponding position inside the document file. When the document text is displayed, the font instance is located from this offset address, the glyph data of the text to be displayed is looked up in the font instance, and the glyph data found is displayed. PDF documents, among others, use this approach.

In the second, the document file records the URL (Uniform Resource Locator) of the font instance, and the font instance is stored on the server that the URL points to. When the document text is displayed, the font instance is located from this URL, the glyph data of the text to be displayed is looked up in the font instance, and the glyph data found is displayed. HTML documents with Cascading Style Sheets (CSS), among others, use this approach.

In both approaches, the embedded font instance may be only a subset of the complete font instance of the font (the font subsetting described above), because in many cases a document uses only a small part of the glyph data of a font.

The font unique identifier of a font is an identifier that uniquely determines a font worldwide; it can be generated according to a predetermined method.

A good font unique identifier technique should have the following characteristics:

(1) Given several font unique identifiers generated from different font instances of the same font, it can be determined accurately and easily that they are equivalent.

(2) Whether different fonts or font instances are similar can be determined accurately and easily from their font unique identifiers.

In the course of realizing the present invention, the inventors found the following technical problem in the prior art:

Existing font unique identifier techniques do not yet have the above characteristics. In the prior art, a font unique identifier can only be generated by some particular method, and a sufficient number of equivalent or similar font unique identifiers cannot be obtained.

Summary of the invention

Embodiments of the invention provide a font ID querying method and device, to solve the prior-art problem that a sufficient number of equivalent or similar font unique identifiers cannot be obtained.

A font ID querying method, comprising:

receiving an input font unique identifier;

querying, from the equivalence relation module of a font recognition database unit, a font unique identifier that is equivalent to the input font unique identifier; and/or querying, from the similarity relation module of the font recognition database unit, a font unique identifier that is similar to the input font unique identifier;

returning the font unique identifier found.

A font ID querying device, comprising:

a receiving unit, configured to receive an input font unique identifier;

an equivalence relation query unit, configured to query, from the equivalence relation module, a font unique identifier that is equivalent to the input font unique identifier;

a similarity relation query unit, configured to query, from the similarity relation module, a font unique identifier that is similar to the input font unique identifier;

a returning unit, configured to return the font unique identifier found by the equivalence relation query unit and/or the similarity relation query unit.

In this scheme, font unique identifiers equivalent to the input font unique identifier can be queried from the equivalence relation module of the font recognition database unit, and font unique identifiers similar to the input font unique identifier can be queried from its similarity relation module. This solves the problem that a sufficient number of equivalent or similar font unique identifiers cannot be obtained.

Description of drawings

Fig. 1 is a schematic flowchart of a method provided by an embodiment of the invention;

Fig. 2 is a schematic structural diagram of a device provided by an embodiment of the invention;

Fig. 3 is a schematic flowchart of another method provided by an embodiment of the invention;

Fig. 4 is a schematic structural diagram of another device provided by an embodiment of the invention.

Embodiment

A font unique identifier is an identifier that uniquely determines a font worldwide. A font can, however, have several font unique identifiers, and these are regarded as equivalent font unique identifiers. A font unique identifier can take the form of a character string, a number, or some other more complex form.

Referring to Fig. 1, the document processing method provided by an embodiment of the invention comprises the following steps:

Step 10: determine the font unique identifier of the embedded font in the document to be processed;

Step 11: search the local font library for a font instance that matches the determined font unique identifier;

Step 12: process the document using the font instance found. Specifically: first, the font instance storage address of the embedded font recorded in the document is redirected to the font instance found, for example by updating it to the storage address of the found font instance in the local font library; then, the redirected font instance is used to process the data in the document that uses the embedded font. For example, the redirected font instance is used to display or print that data.

The local font library is the set of font instances installed on the device that processes the document, together with the data and programs needed to manage them. On a computer in particular, the font instances in the local font library can be managed by the operating system (such as Windows or Linux) or by application software (such as Microsoft Office or Adobe Reader), and are stored as files on the storage devices of the computer.
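Steps 11 and 12 can be pictured with a minimal Python sketch. It is illustrative only: the dictionary-based document, the `font_address` field, the file paths and the rule "match if any identifier is shared" are assumptions made for this example, not structures defined by the description (which gives a weighted matching rule later).

```python
from typing import Optional


def find_local_instance(font_ids: set[str],
                        local_library: dict[str, set[str]]) -> Optional[str]:
    """Step 11 (simplified): return the path of the first local font
    instance that shares at least one identifier with font_ids."""
    for path, instance_ids in local_library.items():
        if instance_ids & font_ids:
            return path
    return None


def redirect_embedded_font(document: dict, font_ids: set[str],
                           local_library: dict[str, set[str]]) -> dict:
    """Step 12 (simplified): point the recorded font instance address at
    the local instance when one is found; otherwise leave it unchanged."""
    path = find_local_instance(font_ids, local_library)
    if path is None:
        return document
    return {**document, "font_address": path}


# Hypothetical data: two installed font instances and one document.
library = {"/fonts/a.ttf": {"id-1"}, "/fonts/b.ttf": {"id-2", "id-3"}}
doc = {"font_address": "embedded:0x1f00"}
redirected = redirect_embedded_font(doc, {"id-3"}, library)
```

The original document dictionary is left untouched, so a caller can fall back to the embedded font instance if the redirect turns out to be unsuitable.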

In step 10, the font unique identifier of the embedded font in the document to be processed can be determined by any of the following four methods:

First method: obtain a font unique identifier from the document to be processed or from a file associated with it, and take the obtained identifier as the font unique identifier of the embedded font. This method requires that the font unique identifier of the embedded font be stored in advance in the document or in a file associated with it;

Second method: obtain the font instance of the embedded font according to the font instance storage address recorded in the document to be processed; generate a font unique identifier from the obtained font instance, and take the generated identifier as the font unique identifier of the embedded font;

Third method: obtain the font instance of the embedded font according to the font instance storage address recorded in the document to be processed; generate a font unique identifier from the obtained font instance; query the font unique identifiers that are equivalent and/or similar to the generated identifier; take the generated identifier and the identifiers found as the font unique identifiers of the embedded font;

Fourth method: obtain a font unique identifier from the document to be processed or from a file associated with it; query the font unique identifiers that are equivalent and/or similar to the obtained identifier; take the obtained identifier and the identifiers found as the font unique identifiers of the embedded font.

In the third and fourth methods above, the query for font unique identifiers equivalent and/or similar to the generated (or obtained) font unique identifier can be implemented as follows:

First, the generated font unique identifier is input to the font recognition database unit;

Then, after receiving the input font unique identifier, the font recognition database unit performs both of the following steps, or one of them:

Step 1: query, from the equivalence relation module of the font recognition database unit, the font unique identifiers equivalent to the generated font unique identifier, and return the identifiers found;

Step 2: query, from the similarity relation module of the font recognition database unit, the font unique identifiers similar to the generated font unique identifier, and return the identifiers found.
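The two query steps can be sketched in Python as follows. This is a toy in-memory stand-in, not the database described by the invention: the class name `FontIdDatabase`, its methods and the identifier strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FontIdDatabase:
    """Toy in-memory stand-in for the font recognition database unit."""
    # Each table maps an identifier to the identifiers related to it.
    equivalent: dict[str, set[str]] = field(default_factory=dict)
    similar: dict[str, set[str]] = field(default_factory=dict)

    def add_equivalent(self, a: str, b: str) -> None:
        self.equivalent.setdefault(a, set()).add(b)
        self.equivalent.setdefault(b, set()).add(a)

    def add_similar(self, a: str, b: str) -> None:
        self.similar.setdefault(a, set()).add(b)
        self.similar.setdefault(b, set()).add(a)

    def query(self, font_id: str, want_equivalent: bool = True,
              want_similar: bool = True) -> set[str]:
        """Run step 1 and/or step 2 and return the identifiers found."""
        found: set[str] = set()
        if want_equivalent:   # step 1: equivalence relation module
            found |= self.equivalent.get(font_id, set())
        if want_similar:      # step 2: similarity relation module
            found |= self.similar.get(font_id, set())
        return found


db = FontIdDatabase()
db.add_equivalent("id-A", "id-B")
db.add_similar("id-A", "id-C")
both = db.query("id-A")
only_equivalent = db.query("id-A", want_similar=False)
```

An unknown identifier simply yields an empty set, which a caller can treat as "no equivalent or similar identifier known".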

The equivalence relation module can be based on an equivalent font unique identifier relation table built in advance, from which the font unique identifiers equivalent to the generated font unique identifier can be queried. The table can be built by either of the following two methods or by a combination of them:

First method: generate several font unique identifiers of a font from one or more font instances of that font, and save the generated identifiers in the equivalent font unique identifier relation table as equivalent font unique identifiers;

Second method: receive several equivalent font unique identifiers input by a user, and save them in the equivalent font unique identifier relation table as equivalent font unique identifiers.
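One way to hold such a table in memory is sketched below in Python. The helper names and identifier strings are hypothetical, and merging overlapping groups (treating equivalence as transitive) is an assumption of this sketch rather than something the description states.

```python
def add_equivalent_group(table: dict[str, set[str]], ids: list[str]) -> None:
    """Record that all identifiers in ids are equivalent to one another.

    Groups that share an identifier are merged, so every member of a
    group maps to the same set (which also contains the member itself).
    """
    group = set(ids)
    for font_id in ids:
        group |= table.get(font_id, set())
    for font_id in group:
        table[font_id] = group


def equivalents(table: dict[str, set[str]], font_id: str) -> set[str]:
    """Return the identifiers equivalent to font_id, excluding itself."""
    return table.get(font_id, set()) - {font_id}


table: dict[str, set[str]] = {}
# First method: identifiers generated from instances of one font.
add_equivalent_group(table, ["md5:aaa", "guid:111"])
# Second method: identifiers supplied by a user.
add_equivalent_group(table, ["guid:111", "name:x"])
```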

The similarity relation module can be based on a similar font unique identifier relation table built in advance, from which the font unique identifiers similar to the generated font unique identifier can be queried. The table can be built by any one of the following three methods or by any combination of them:

First method: use a font pattern recognition program to judge whether two fonts among several fonts are similar; if so, save the font unique identifiers of the two fonts in the similar font unique identifier relation table as similar font unique identifiers. Here, the font unique identifiers can be generated from font instances, written entirely by hand, or generated at random.

Second method: judge whether two font unique identifiers differ only in the minor version number field; if so, save the two identifiers in the similar font unique identifier relation table as similar font unique identifiers. Here, the font unique identifiers can be written by hand;

For example, the maker of a font can provide a font unique identifier associated with the font, made up of parts such as the maker's domain name, the font name, the major version number and the minor version number. The font unique identifier "fonts.founder.com/lan_ting_hei/2.1", for instance, denotes version 2.1 of the font named "Lanting Hei" (pinyin lan_ting_hei) from Founder (domain name fonts.founder.com), where 2 is the major version number and 1 is the minor version number. When the maker releases a new version of a font, a larger modification should change the major version number and a smaller modification should change the minor version number; fonts that differ only in minor version number are regarded as similar. A user of the font can therefore judge simply from the minor version number whether two font unique identifiers are similar. Although many existing fonts carry version numbers, those numbers have no agreed relation to the degree of modification of the font, so they cannot be used to judge similarity. This method can therefore be used only if font makers or some other organization uniformly assign to fonts unique identifiers that contain version numbers, with the version numbers reflecting the degree of similarity between fonts.

Third method: receive several similar font unique identifiers input by a user, and save them in the similar font unique identifier relation table.

The similarity relation module can also find the font unique identifiers similar to the generated font unique identifier directly from version numbers, as follows: traverse all font unique identifiers stored in the font recognition database unit and compare each with the generated font unique identifier; judge whether the two differ only in the minor version number field; if so, return the identifier being traversed.
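The traversal can be sketched in Python, assuming identifiers follow the "domain/font_name/major.minor" layout of the example "fonts.founder.com/lan_ting_hei/2.1". The function names and the extra stored identifiers (including the `fonts.example.org` entry) are hypothetical.

```python
def split_versioned_id(font_id: str) -> tuple[str, str, str]:
    """Split 'domain/font_name/major.minor' into (prefix, major, minor)."""
    prefix, _, version = font_id.rpartition("/")
    major, _, minor = version.partition(".")
    return prefix, major, minor


def differs_only_in_minor(a: str, b: str) -> bool:
    """True when maker, font name and major version agree but the
    minor version numbers differ."""
    prefix_a, major_a, minor_a = split_versioned_id(a)
    prefix_b, major_b, minor_b = split_versioned_id(b)
    return prefix_a == prefix_b and major_a == major_b and minor_a != minor_b


def similar_by_version(query_id: str, stored_ids: list[str]) -> list[str]:
    """Traverse the stored identifiers and keep those similar to query_id."""
    return [s for s in stored_ids if differs_only_in_minor(query_id, s)]


stored = [
    "fonts.founder.com/lan_ting_hei/2.0",
    "fonts.founder.com/lan_ting_hei/3.1",
    "fonts.founder.com/lan_ting_hei/2.1",
    "fonts.example.org/other_font/2.0",
]
matches = similar_by_version("fonts.founder.com/lan_ting_hei/2.1", stored)
```

Only version 2.0 of the same font is returned: version 3.1 differs in the major number, and the identical identifier 2.1 is not reported as similar to itself.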

Preferably, a filter parameter can be input to the font recognition database unit together with the font unique identifier. The font recognition database unit determines from the filter parameter whether a font unique identifier it has found satisfies the filter condition; if it does, the identifier is returned; otherwise it is not returned.

To control how similar the font unique identifiers returned by the font recognition database unit are to the input font unique identifier, the filter parameter can include a similarity parameter value. The font recognition database unit then determines whether a font unique identifier it has found satisfies the filter condition as follows: it reads, from the similar font unique identifier relation table, the similarity parameter value between the identifier found and the generated font unique identifier; it judges whether the value read and the input similarity parameter value satisfy a set relation; and it determines from the result whether the identifier found satisfies the filter condition. For example, it judges whether the value read is greater than the input similarity parameter value; if so, the identifier found satisfies the filter condition, and otherwise it does not. As another example, it judges whether the value read lies within the range formed by two input similarity parameter values; if so, the identifier found satisfies the filter condition, and otherwise it does not. For this purpose a similarity field must be added to the similar font unique identifier relation table. The degree of similarity can be expressed as a digit from 0 to 9, where 0 means not very similar and 9 means very similar. Font unique identifiers that differ only in minor version number can be given a fixed value (such as 7), or the value can be determined by another method.
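Both example filter conditions can be expressed in a few lines of Python. In this sketch the candidates are given as a mapping from identifier to the stored 0-9 similarity value; the function name, the parameter names and the choice of inclusive range bounds are assumptions.

```python
from typing import Optional


def filter_by_similarity(candidates: dict[str, int],
                         threshold: Optional[int] = None,
                         bounds: Optional[tuple[int, int]] = None) -> list[str]:
    """Keep the identifiers whose stored similarity value passes the filter.

    threshold: keep values strictly greater than this input value.
    bounds:    keep values within the (low, high) range, bounds included.
    """
    kept = []
    for font_id, score in candidates.items():
        if threshold is not None and not score > threshold:
            continue
        if bounds is not None and not bounds[0] <= score <= bounds[1]:
            continue
        kept.append(font_id)
    return kept


# Hypothetical similarity values read from the relation table.
found = {"id-B": 7, "id-C": 3, "id-D": 9}
above_six = filter_by_similarity(found, threshold=6)
three_to_seven = filter_by_similarity(found, bounds=(3, 7))
```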

The filter parameter can also include identifier type information. The font recognition database unit then determines whether a font unique identifier it has found satisfies the filter condition as follows: it judges whether the type of the generated font unique identifier and the identifier type in the filter parameter satisfy a set relation, and determines from the result whether the identifier found satisfies the filter condition. For example, it judges whether the type of the generated font unique identifier is the identifier type given in the filter parameter; if so, the identifier found satisfies the filter condition, and otherwise it does not. Here, the identifier type refers to the mechanism or method by which an identifier is generated. For example, two font unique identifiers generated from the metadata of font instances are of the same type, and two font unique identifiers generated from digest values of the glyph data in font instances are of the same type.

Because a querier can usually handle only a limited number of font unique identifier types, this method ensures that the font unique identifiers returned by the font recognition database unit are of a type the querier can handle. In addition, by supplying an identifier type that differs from the type of the input font unique identifier, the querier can obtain font unique identifiers of a different type from the input one; that is, the input font unique identifier can be converted into another type.
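A type filter can be sketched in Python as below. The sketch assumes the check is applied to each returned identifier, so that the querier receives only types it can handle; the `(type, value)` tuples and the type names `glyph-md5`, `guid` and `metadata` are hypothetical.

```python
def filter_by_type(candidates: list[tuple[str, str]],
                   accepted_types: set[str]) -> list[tuple[str, str]]:
    """Keep only identifiers whose type the querier can handle.

    Each candidate is a (type, value) pair; the type names the mechanism
    that generated the identifier.
    """
    return [c for c in candidates if c[0] in accepted_types]


# Hypothetical identifiers found for one input identifier.
found_ids = [
    ("glyph-md5", "9a0364b9e99bb480dd25e1f0284c8555"),
    ("guid", "123e4567-e89b-12d3-a456-426614174000"),
    ("metadata", "fonts.founder.com/lan_ting_hei/2.1"),
]
# A querier that only understands GUID identifiers: in effect, the input
# identifier is converted into its GUID form.
guid_only = filter_by_type(found_ids, {"guid"})
```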

The font recognition database unit can also maintain an "identifier to local font" relation table, which records the relation between each font unique identifier and the local font instance that corresponds to it.

The main function of the font recognition database unit in the invention is therefore: given a font unique identifier, to query the other font unique identifiers that are equivalent or similar to it (in the invention, two font unique identifiers being similar simply means that the fonts they represent are similar). A font unique identifier uniquely determines a font worldwide, but each font can have several font unique identifiers, and these may not be directly comparable with one another. For example, with the method of generating font unique identifiers provided by the invention, choosing different groups of characters as keys yields different font unique identifiers; if the key ranges of two identifiers do not overlap, it cannot be simply determined whether the two are equivalent. In addition, different computer systems, font publishers, document display and processing programs, and script management programs may choose entirely different font unique identifier schemes (for example, identifiers written by hand, identifiers based on a GUID (Globally Unique Identifier), or identifiers based on a digest of the whole font file), so they cannot recognize the font unique identifiers generated by one another. Finally, a font unique identifier by itself generally cannot reflect how similar two fonts are and can only be used to judge equivalence; yet it is sometimes desirable to replace a font that cannot be obtained with another, similar font, which requires finding the other font unique identifiers similar to a given one. The font recognition database unit solves these problems well.

In the invention, a font unique identifier can be generated from a font instance as follows: choose one or more set characters; obtain from the font instance the glyph data corresponding to the set characters; use a digest algorithm to compute digest values based on the glyph data obtained; and generate the font unique identifier from the computed digest values. "Based on" is stressed here for two reasons. First, when the digest is computed, other data such as font metadata may be added to the glyph data. Second, the digest need not be computed separately for each piece of glyph data; the pieces may first be concatenated and a single digest computed over the result.

The font unique identifier is generated from the computed digest values as follows: generate a mapping table containing one or more correspondences (entries), each entry having a key and a value, where the key is a subset of the set characters or an identifier of that subset, and the value is the digest value of the glyph data corresponding to the characters in that subset; then take the mapping table as part of the font unique identifier of the embedded font. The font unique identifier can of course contain other information as well, such as the font name, version number and manufacturer name.
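The mapping table can be sketched in Python as follows. The sketch uses MD5 because the examples later in the description use MD5 digest values; representing a font instance as a dictionary from character to raw glyph bytes is a simplification, and the glyph bytes shown are made up.

```python
import hashlib


def glyph_digest(glyphs: list[bytes]) -> str:
    """MD5 digest, as a hexadecimal string, of the concatenated glyph data."""
    return hashlib.md5(b"".join(glyphs)).hexdigest()


def build_char_glyph_map(font_instance: dict[str, bytes],
                         key_groups: list[str]) -> dict[str, str]:
    """Build one entry per key.

    key:   a string holding one or more of the set characters
    value: digest of the glyph data of those characters, in key order
    """
    table = {}
    for chars in key_groups:
        table[chars] = glyph_digest([font_instance[c] for c in chars])
    return table


# Hypothetical glyph data; a real program would read it from a font file.
toy_font = {"一": b"\x01\x02", "是": b"\x03"}
mapping = build_char_glyph_map(toy_font, ["一", "是", "一是"])
```

Because the digest depends only on the glyph data of the chosen characters, a subset font instance that still contains those characters produces the same entries as the full font instance.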

Usually, the set characters chosen are the most frequently used characters in the character set of the font, such as the 4 most frequently used Chinese characters. For the font file simsun.ttf (the "New Song" typeface), an example font unique identifier is as follows:

This example is in XML (Extensible Markup Language) format, but other equivalent forms are also possible. The font-id element represents the whole font unique identifier. It has several attributes, font-name (font name), version (version) and foundry (maker), but these are for reference only and are generally not used as a basis for comparing font unique identifiers. The char-glyph-map element is the mapping table described above; it has 4 entries (item elements), each with a key and a value. In this example, the keys of the 4 entries (chars attribute) are the 4 most frequently used Chinese characters (among them the characters for "one" and "be"), and the values are the MD5 digest values of the glyph data of these four characters in the font simsun.ttf (glyph-digest attribute, written as hexadecimal strings). In this example every key in the mapping table is a single character. In other embodiments, a key can be a group of characters, for example:

In this example there is one entry (item); the key (chars) is the string formed by the four characters, and the value (glyph-digest) is the MD5 digest value of the glyph data corresponding to these characters. The digest value can be computed in several ways. For example, the pieces of glyph data can first be concatenated in binary and the MD5 digest of the result computed; or the MD5 digest of each piece of glyph data can be computed first, the digest values concatenated in binary, and the digest of the concatenated string computed last. Because a digest value is generally smaller than a piece of glyph data, the digest value of every piece of glyph data in a font instance can be computed in advance and stored; when the digest value of a subset is needed, the second algorithm is used, which is then faster than the first. Apart from this, the two algorithms do not differ in any essential way.
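The two ways of computing a group digest can be compared in a short Python sketch. The function names and glyph bytes are hypothetical; `digest_from_cache` shows why the second algorithm is convenient when per-glyph digests have been stored in advance.

```python
import hashlib


def digest_concat_then_hash(glyphs: list[bytes]) -> str:
    """First algorithm: concatenate the glyph data, then take one MD5."""
    return hashlib.md5(b"".join(glyphs)).hexdigest()


def digest_hash_then_concat(glyphs: list[bytes]) -> str:
    """Second algorithm: MD5 each glyph, concatenate the raw digests,
    then take the MD5 of the concatenation."""
    per_glyph = [hashlib.md5(g).digest() for g in glyphs]
    return hashlib.md5(b"".join(per_glyph)).hexdigest()


def digest_from_cache(cache: dict[str, bytes], chars: str) -> str:
    """Second algorithm again, using per-glyph digests computed earlier,
    so the glyph data itself does not have to be read a second time."""
    return hashlib.md5(b"".join(cache[c] for c in chars)).hexdigest()


# Hypothetical glyph data and a precomputed per-glyph digest cache.
toy_glyphs = {"一": b"\x01\x02", "是": b"\x03\x04\x05"}
cache = {c: hashlib.md5(g).digest() for c, g in toy_glyphs.items()}
group = digest_from_cache(cache, "一是")
```

The two algorithms give different digest values for the same glyph data, so both sides of a comparison have to agree on which one is used.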

In another example, a code name for a group of characters can be used as the key, for example:

There is one entry (item) in this example. The key is the charset-name attribute, which holds the code name of a character set; "zh-top-4" means the 4 most frequently used Chinese characters in descending order of frequency, which are exactly the four characters used above. The value (glyph-digest) is the MD5 digest value of the glyph data corresponding to these characters. The interpretation of a character set code name, including the ordering of the characters it stands for, must of course be exactly the same on different computer systems; only then is interoperability guaranteed.

Because digest algorithms are sensitive to the order of their input data, the order of the characters also matters when a group of characters or its code name is used as a key; it should generally be the same as the order in which the glyph data was input when the digest value was computed.

Entries keyed by a single character, by a group of characters and by a code name can appear together in one font unique identifier, and their ranges may overlap, but two entries with the same key must not appear.
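The XML layout described above can be produced with Python's standard `xml.etree.ElementTree` module. The element and attribute names (font-id, font-name, version, foundry, char-glyph-map, item, chars, charset-name, glyph-digest) come from the description; the attribute values and digest strings below are placeholders, and rejecting duplicate keys with `ValueError` is a choice made for this sketch.

```python
import xml.etree.ElementTree as ET


def font_id_to_xml(font_name: str, version: str, foundry: str,
                   entries: list[tuple[str, str, str]]) -> str:
    """Serialize a font unique identifier.

    entries holds (key_attribute, key, digest) triples, where
    key_attribute is 'chars' or 'charset-name'.
    """
    root = ET.Element("font-id", {"font-name": font_name,
                                  "version": version,
                                  "foundry": foundry})
    table = ET.SubElement(root, "char-glyph-map")
    seen = set()
    for key_attr, key, digest in entries:
        if (key_attr, key) in seen:
            # Two entries with the same key must not appear.
            raise ValueError(f"duplicate key: {key!r}")
        seen.add((key_attr, key))
        ET.SubElement(table, "item", {key_attr: key, "glyph-digest": digest})
    return ET.tostring(root, encoding="unicode")


# Placeholder values only; real digests would come from glyph data.
xml_text = font_id_to_xml(
    "simsun", "1.0", "hypothetical-foundry",
    [("chars", "一", "00" * 16), ("charset-name", "zh-top-4", "ff" * 16)],
)
parsed = ET.fromstring(xml_text)
```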

Accordingly, in step 11 the search of the local font library for a font instance matching the determined font unique identifier is implemented as follows:

For each font instance in the local font library, obtain the font unique identifier of that font instance; determine, between the obtained identifier and the determined identifier, the correspondences (entries) that have equal keys and equal values and the correspondences (entries) that have equal keys but unequal values; judge from these correspondences whether the obtained identifier matches the determined identifier; if so, determine this font instance to be a font instance matching the determined font unique identifier. "Equal key, equal value" means that two entries have equal keys and equal values; "equal key, unequal value" means that two entries have equal keys but different values. In the invention, every font instance in the local font library needs one or more font unique identifiers.

One concrete way of judging from the determined correspondences (entries) whether the obtained font unique identifier matches the determined font unique identifier is as follows. When the sum of the weights of the entries with equal keys and equal values exceeds a preset first threshold, the font instance is determined to match the determined font unique identifier. When the sum of the weights of the entries with equal keys but unequal values exceeds a preset second threshold, the font instance is determined not to match. When both conditions hold, the second takes precedence. In all other cases the result is "cannot be judged". The weight of an entry can be determined from the weights of the characters its key represents. For example, each character can be given a weight of 1 and the weight of an entry defined as the sum of its character weights, so that an entry whose key consists of 4 characters has a weight of 4; the weight of each character can also be made to depend on its frequency of use. The first and second thresholds can be set by the software developer or by the end user. In a concrete implementation, the first threshold is usually greater than 0 and the second threshold can usually equal 0, so that any occurrence of an entry with equal keys but unequal values leads to a "no match" result.
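The weighted rule can be sketched in Python as follows, treating each identifier as a mapping from key to digest value. The names `Match` and `compare_ids` are hypothetical, and the default weight (the number of characters in the key) only makes sense when keys are character strings rather than character set code names.

```python
from enum import Enum
from typing import Callable


class Match(Enum):
    MATCH = "match"
    NO_MATCH = "no match"
    UNDECIDED = "cannot be judged"


def compare_ids(a: dict[str, str], b: dict[str, str],
                first_threshold: float, second_threshold: float,
                weight: Callable[[str], float] = len) -> Match:
    """Compare two identifiers given as key -> digest mapping tables."""
    equal_weight = 0.0    # entries with equal keys and equal values
    unequal_weight = 0.0  # entries with equal keys but unequal values
    for key in a.keys() & b.keys():
        if a[key] == b[key]:
            equal_weight += weight(key)
        else:
            unequal_weight += weight(key)
    # The second condition takes precedence when both conditions hold.
    if unequal_weight > second_threshold:
        return Match.NO_MATCH
    if equal_weight > first_threshold:
        return Match.MATCH
    return Match.UNDECIDED


# Hypothetical identifiers with placeholder digest values.
doc_id = {"一": "d1", "是": "d2", "一是": "d3"}
same_font = {"一": "d1", "是": "d2"}
other_font = {"一": "d1", "是": "XX"}
no_overlap = {"在": "d9"}
```

With `first_threshold=1` and `second_threshold=0`, `same_font` matches, `other_font` is rejected because of a single differing entry, and `no_overlap` cannot be judged because the two tables share no key.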

Preferably, before judging according to the determined entries whether the obtained font unique identifier matches the determined font unique identifier, other information in the two font unique identifiers, apart from the key-value entries, may also be compared; whether the obtained font unique identifier matches the determined font unique identifier is then judged according to the comparison result and the determined entries. For example, compare whether the "manufacturer" field in the obtained font unique identifier is identical to the "manufacturer" field in the determined font unique identifier: if they differ, and the number of equal-key non-equivalent entries exceeds the predefined second threshold, the obtained font unique identifier is determined not to match the determined font unique identifier; if they are identical, and the number of equal-key equivalent entries exceeds the predefined first threshold, the obtained font unique identifier is determined to match. Of course, other decision rules may also be used.

How to judge whether two font unique identifiers match is explained below:

Step S01: for the mapping tables in the two font unique identifiers, compare one by one the values of the entries that have the same key; if sufficiently many values are equal and none are unequal, give an affirmative result; if sufficiently many values are unequal and none are equal, give a negative result; in other cases, give the result "cannot be judged";

Step S02: compare the other information that needs comparing in the two font unique identifiers, and give an affirmative result, a negative result, or the result "cannot be judged";

Step S03: combine the results of the two steps above and give the conclusion as to whether the two font unique identifiers match.

That two font unique identifiers match also means that the font instances they represent belong to the same font or are sufficiently similar, so that the two font instances can be used interchangeably.

The method of generating a font unique identifier was discussed above and is not repeated here. In step S01, a key can be a single character, a group of characters, the code name of a group of characters, and so on. Whether two keys are identical is generally determined by comparing them as character strings; when one key is a group of characters and the other is the code name of a group of characters, the character group that the code name actually represents should take part in the comparison. Values can be compared as binary data, but if the digest values have been text-encoded (for example with Base64 or hexadecimal string encoding), they can also be compared as character strings.

In step S02, the other information in the font unique identifiers that needs comparing is compared. Such information may be the font name, version, manufacturer, and so on. For example, it may be stipulated that two font unique identifiers can match only if their manufacturer (foundry) fields are identical. In general, however, no other information needs to be compared, and the result of this step can then be regarded as "affirmative".

Step S03 combines the results of step S01 and step S02 and gives the conclusion as to whether the two font unique identifiers match. Usually, if both step S01 and step S02 give an affirmative result, the two font unique identifiers are considered to match; if at least one step gives a negative result, they are considered not to match; otherwise, they are considered either not to match or impossible to judge (how this case is handled is decided by the program designer or the user).
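Steps S02 and S03 can be sketched as a three-valued comparison. The sketch below is illustrative only: the names `Verdict`, `compare_other_info`, and `combine` are hypothetical, the other information is assumed to be held in a `dict` of text fields, and treating a field missing from either identifier as "cannot be judged" is an assumption that the text does not prescribe.

```python
from enum import Enum
from typing import Dict, Iterable


class Verdict(Enum):
    AFFIRMATIVE = "affirmative"
    NEGATIVE = "negative"
    UNDETERMINED = "undetermined"


def compare_other_info(a: Dict[str, str], b: Dict[str, str],
                       fields: Iterable[str] = ()) -> Verdict:
    # Step S02: with no fields to compare, the result counts as affirmative.
    undetermined = False
    for field in fields:
        if field not in a or field not in b:
            undetermined = True  # assumption: a missing field cannot be judged
        elif a[field] != b[field]:
            return Verdict.NEGATIVE
    return Verdict.UNDETERMINED if undetermined else Verdict.AFFIRMATIVE


def combine(s01: Verdict, s02: Verdict) -> Verdict:
    # Step S03: any negative result wins; two affirmatives give a match.
    if Verdict.NEGATIVE in (s01, s02):
        return Verdict.NEGATIVE
    if s01 is Verdict.AFFIRMATIVE and s02 is Verdict.AFFIRMATIVE:
        return Verdict.AFFIRMATIVE
    return Verdict.UNDETERMINED
```

A caller that must make a yes/no decision can map `Verdict.UNDETERMINED` to "no match", which is one of the two handling options the text leaves to the program designer or user.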

For example, the following two font unique identifiers have three entries that are "equal-key equivalent", representing three characters. If it is stipulated that identifiers match when the digest values of 3 or more characters are equal, and no other information needs to be compared, then these two font unique identifiers match.

The following font unique identifier, however, does not match the two above, because the entry for one character is "equal-key non-equivalent":

Preferably, after a font instance matching the determined font unique identifier has been found in the local font library in step 11, it may further be judged whether the found font instance satisfies the processing requirements of the document to be processed; only when the found font instance is judged to satisfy those requirements is the document processed with this font instance in step 12.

Whether the found font instance satisfies the processing requirements of the document to be processed may be judged as follows:

Judge whether the characters contained in the found font instance cover all or most of the characters that use the embedded font in the document to be processed; if so, the found font instance is determined to satisfy the processing requirements of the document; otherwise, it is determined not to satisfy them.
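The coverage check above can be sketched as a simple set comparison. The function name `satisfies_document` and the `min_coverage` parameter are hypothetical; a value of 1.0 corresponds to "all" characters, and any lower value is one possible reading of "most".

```python
from typing import Set


def satisfies_document(font_chars: Set[str], doc_text: str,
                       min_coverage: float = 1.0) -> bool:
    # Distinct characters in the document that use the embedded font.
    needed = set(doc_text)
    if not needed:
        return True  # nothing to render, so any font instance suffices
    covered = len(needed & font_chars) / len(needed)
    return covered >= min_coverage
```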

Preferably, when no font instance matching the determined font unique identifier is found in the local font library in step 11, the font instance of the embedded font may be obtained according to the font instance storage address of the embedded font recorded in the document to be processed; the obtained font instance is saved in the local font library; and the document is processed with the font instance saved in the local font library.

When obtaining the font instance of the embedded font, it is possible to obtain only the minimal data set of the font instance that satisfies the display and processing requirements of the document, rather than the whole font instance. The font instance of an embedded font may exceed the needs of the document that uses it; for example, the font instance may contain glyphs for 10000 characters, while the document uses only 1000 distinct characters of this font. If the embedded font instance has to be downloaded from the Internet, downloading it in full wastes traffic and time. Therefore, only the glyph data of those 1000 characters need be downloaded.

When saving the obtained font instance in the local font library, if the library does not yet contain an equivalent or similar font instance, the font instance is added to the library directly and its font unique identifier is registered; otherwise, the data of the font instance can be merged into the equivalent or similar font instance in the library, preferably with duplicate data eliminated to save space. Depending on the concrete data structure of the local font library and the format of the font instance, the "add" and "merge" operations may take various forms, for example creating or modifying a font file, or updating font registration information.
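The partial download and the "merge" operation can be sketched as follows. This is a simplified sketch that assumes a font instance is held as a `dict` from character to glyph data; real font files are structured differently, and the names `chars_to_fetch` and `merge_glyphs` are hypothetical.

```python
from typing import Dict, Set


def chars_to_fetch(doc_text: str, local_glyphs: Dict[str, bytes]) -> Set[str]:
    # Only characters the document uses and the library lacks are downloaded.
    return set(doc_text) - set(local_glyphs)


def merge_glyphs(local_glyphs: Dict[str, bytes],
                 incoming: Dict[str, bytes]) -> int:
    # Merge new glyph data into an equivalent local instance; glyphs that
    # are already present are skipped so no duplicate data is stored.
    added = 0
    for char, data in incoming.items():
        if char not in local_glyphs:
            local_glyphs[char] = data
            added += 1
    return added
```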

Here, processing the document with the font instance saved in the local font library may proceed as follows: the font instance storage address of the embedded font recorded in the document is redirected to the saved font instance, for example by updating it to the storage address of the saved font instance in the font library; the data in the document that uses the embedded font is then processed with the redirected font instance.

In addition, when the local font library needs to reduce the storage space it occupies, some font instances, or part of the data in some font instances, can be removed according to some algorithm. The storage capacity of a computer, and of an embedded device in particular, is limited, so the local font library should not be allowed to grow without bound. The algorithm for removing font data can be based on frequency of use, on how recently the data was used, on the number of accesses, and so on.
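One of the removal policies mentioned above, discarding the least recently used font instance first, can be sketched with an ordered dictionary. The class name `LocalFontLibrary`, its methods, and the byte-based capacity limit are assumptions made for illustration; the text does not fix a particular policy or data structure.

```python
from collections import OrderedDict
from typing import Optional


class LocalFontLibrary:
    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        # Maps font unique identifier to font data, oldest use first.
        self._instances: "OrderedDict[str, bytes]" = OrderedDict()

    def size(self) -> int:
        return sum(len(data) for data in self._instances.values())

    def get(self, font_id: str) -> Optional[bytes]:
        data = self._instances.get(font_id)
        if data is not None:
            self._instances.move_to_end(font_id)  # mark as recently used
        return data

    def add(self, font_id: str, data: bytes) -> None:
        self._instances[font_id] = data
        self._instances.move_to_end(font_id)
        # Evict least recently used instances until the library fits,
        # but always keep the instance that was just added.
        while self.size() > self.capacity_bytes and len(self._instances) > 1:
            self._instances.popitem(last=False)
```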

In the present invention there are many other ways of generating a font unique identifier; for example, it can be generated from the metadata of the font instance (such as name, developer, and version number) together with the glyph width table, the kerning table, and the like. A font unique identifier can also be designed manually, independently of the concrete font data. The digest algorithm in the present invention can be the MD5 algorithm, the SHA-1 algorithm, a CRC algorithm, or the like.

In the present invention, processing the data in a document with a font instance from the local font library has the following benefits:

First, when the font instance of the embedded font resides on another server and has not yet been downloaded, it no longer needs to be downloaded, which saves network traffic and download time. Second, when several documents are open at the same time and they use similar embedded fonts, they share one font instance in the local font library instead of using several embedded font instances, which can save a considerable amount of memory and font loading time. It can be seen that as the program processes more documents, the font instances in the local font library gradually accumulate, so the program becomes increasingly likely to gain efficiency.

Referring to Fig. 2, an embodiment of the invention also provides a document processing device, which comprises:

Determining unit 20, configured to determine the font unique identifier of the embedded font in the document to be processed;

Search unit 21, configured to search the local font library for a font instance matching the determined font unique identifier;

Processing unit 22, configured to process the document to be processed with the found font instance.

The determining unit 20 comprises one or any combination of a first unit, a second unit, a third unit, and a fourth unit, wherein:

The first unit is configured to obtain a font unique identifier from the document to be processed or from a file associated with the document, and to take the obtained font unique identifier as the font unique identifier of the embedded font;

The second unit is configured to obtain the font instance of the embedded font according to the font instance storage address of the embedded font recorded in the document to be processed; to generate a font unique identifier from the obtained font instance; and to take the generated font unique identifier as the font unique identifier of the embedded font; or,

The third unit is configured to obtain the font instance of the embedded font according to the font instance storage address of the embedded font recorded in the document to be processed; to generate a font unique identifier from the obtained font instance; to query font unique identifiers that are equivalent and/or similar to the generated font unique identifier; and to take the generated font unique identifier and the queried font unique identifiers as the font unique identifiers of the embedded font;

The fourth unit is configured to obtain a font unique identifier from the document to be processed or from a file associated with the document; to query font unique identifiers that are equivalent and/or similar to the obtained font unique identifier; and to take the obtained font unique identifier and the queried font unique identifiers as the font unique identifiers of the embedded font.

The third unit is configured to:

input the generated font unique identifier to the font identification database unit;

The device further comprises:

Font identification database unit 23, configured to receive the generated font unique identifier and to perform both of the following steps or one of them:

Step 2: query, from the similarity relation module of the font identification database unit, font unique identifiers similar to the generated font unique identifier, and return the queried font unique identifiers.

The font identification database unit 23 is configured to:

query font unique identifiers equivalent to the generated font unique identifier from the equivalent font unique identifier relation table of the equivalence relation module, and build the equivalent font unique identifier relation table as follows:

generate a plurality of font unique identifiers of a font from one or more font instances of that font, and save the generated font unique identifiers in the equivalent font unique identifier relation table as equivalent font unique identifiers; or,

receive a plurality of equivalent font unique identifiers input by a user, and save these equivalent font unique identifiers in the equivalent font unique identifier relation table.

The font identification database unit 23 is configured to:

query font unique identifiers similar to the generated font unique identifier from the similar font unique identifier relation table of the similarity relation module, and build the similar font unique identifier relation table as follows:

use a glyph pattern recognition program to judge whether two fonts among a plurality of fonts are similar fonts, and if so, save the font unique identifiers of the two fonts in the similar font unique identifier relation table as similar font unique identifiers; or,

judge whether two font unique identifiers differ only in the minor version field, and if so, save the two font unique identifiers in the similar font unique identifier relation table as similar font unique identifiers; or,

receive a plurality of similar font unique identifiers input by a user, and save these similar font unique identifiers in the similar font unique identifier relation table.

The third unit is further configured to:

input filter parameters to the font identification database unit at the same time as inputting the generated font unique identifier;

The font identification database unit 23 is further configured to:

determine, according to the filter parameters, whether a queried font unique identifier satisfies the filter condition, and return the queried font unique identifier when it does.

The font identification database unit 23 is configured to:

when the filter parameters include a similarity parameter value, read from the similar font unique identifier relation table the similarity parameter value between the queried font unique identifier and the generated font unique identifier;

judge whether the read similarity parameter value and the input similarity parameter value satisfy a set relation;

determine, according to the judgment result, whether the queried font unique identifier satisfies the filter condition.

The font identification database unit 23 is configured to:

when the filter parameters include identifier type information, judge whether the type of the generated font unique identifier and the identifier type in the filter parameters satisfy a set relation;

The second unit or the third unit is configured to:

generate the font unique identifier from the font instance as follows:

select one or more set characters, obtain from the font instance the glyph data corresponding to the set characters, and compute a digest value from each piece of obtained glyph data with a digest algorithm; generate the font unique identifier from the computed digest values.

The second unit or the third unit is configured to:

generate the font unique identifier from the computed digest values as follows:

generate a mapping table containing one or more entries, each entry having a key and a value, where the key is a subset of the set characters or an identifier of that subset, and the value is the digest value of the glyph data corresponding to the characters in that subset; take the mapping table as the font unique identifier of the embedded font.

The search unit 21 is configured to:

for each font instance in the local font library, obtain the font unique identifier of that font instance; determine which entries of the obtained font unique identifier and the determined font unique identifier are equal-key equivalent and which are equal-key non-equivalent; judge, according to the determined entries, whether the obtained font unique identifier matches the determined font unique identifier; if so, take this font instance as a font instance matching the determined font unique identifier.

The search unit 21 is further configured to:

before judging according to the determined entries whether the obtained font unique identifier matches the determined font unique identifier, compare the other information, apart from the entries, in the obtained font unique identifier and the determined font unique identifier; and judge, according to the comparison result and the determined entries, whether the obtained font unique identifier matches the determined font unique identifier.

The search unit 21 is further configured to:

after searching the local font library for a font instance matching the determined font unique identifier, and before the document to be processed is processed with the found font instance, judge whether the found font instance satisfies the processing requirements of the document to be processed;

The processing unit 22 is configured to:

process the document to be processed with this font instance when the found font instance is judged to satisfy the processing requirements of the document.

The search unit 21 is configured to:

judge whether the found font instance satisfies the processing requirements of the document to be processed as follows:

judge whether the characters contained in the found font instance fully or partially cover the characters that use the embedded font in the document to be processed; if so, the found font instance is determined to satisfy the processing requirements of the document; otherwise, it is determined not to satisfy them.

The search unit 21 is further configured to:

when no font instance matching the determined font unique identifier is found in the local font library, obtain the font instance of the embedded font according to the font instance storage address of the embedded font recorded in the document to be processed;

save the obtained font instance in the local font library;

The processing unit 22 is configured to:

process the document to be processed with the font instance saved in the local font library.

The processing unit 22 is configured to:

redirect the font instance storage address of the embedded font recorded in the document to be processed to the found font instance;

process the data in the document that uses the embedded font with the redirected font instance.

The processing unit 22 is configured to:

redirect the font instance storage address of the embedded font recorded in the document to be processed to the saved font instance;

Referring to Fig. 3, an embodiment of the invention also provides a font ID query method, which comprises the following steps:

Step 30: receive an input font unique identifier;

Step 31: query, from the equivalence relation module of the font identification database unit, font unique identifiers that are equivalent to the input font unique identifier; and/or query, from the similarity relation module of the font identification database unit, font unique identifiers that are similar to the input font unique identifier;

Step 32: return the queried font unique identifiers.
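Steps 30 to 32 can be sketched with two in-memory relation tables. The class `FontIdDatabase` and its methods are hypothetical, identifiers are represented as plain strings, and treating equivalence as transitive (so that overlapping groups are merged) is an assumption of this sketch rather than something the text states.

```python
from typing import Dict, Iterable, Set


class FontIdDatabase:
    def __init__(self) -> None:
        self._equivalent: Dict[str, Set[str]] = {}  # equivalence relation table
        self._similar: Dict[str, Set[str]] = {}     # similarity relation table

    def add_equivalent(self, ids: Iterable[str]) -> None:
        # Merge the new group with any existing groups its members belong to.
        group = set(ids)
        for font_id in list(group):
            group |= self._equivalent.get(font_id, set())
        for font_id in group:
            self._equivalent[font_id] = group

    def add_similar(self, a: str, b: str) -> None:
        self._similar.setdefault(a, set()).add(b)
        self._similar.setdefault(b, set()).add(a)

    def query(self, font_id: str, equivalent: bool = True,
              similar: bool = True) -> Set[str]:
        # Step 31: look up either or both tables; step 32: return the result.
        found: Set[str] = set()
        if equivalent:
            found |= self._equivalent.get(font_id, set())
        if similar:
            found |= self._similar.get(font_id, set())
        found.discard(font_id)
        return found
```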

Specifically, font unique identifiers equivalent to the input font unique identifier can be queried from the equivalent font unique identifier relation table of the equivalence relation module; and the equivalent font unique identifier relation table can be built with one or any combination of the following two methods:

Second, receive a plurality of equivalent font unique identifiers input by a user, and save these equivalent font unique identifiers in the equivalent font unique identifier relation table.

Specifically, font unique identifiers similar to the generated font unique identifier can be queried from the similar font unique identifier relation table of the similarity relation module; and the similar font unique identifier relation table can be built with one or any combination of the following three methods:

First, use a glyph pattern recognition program to judge whether two fonts among a plurality of fonts are similar fonts, and if so, save the font unique identifiers of the two fonts in the similar font unique identifier relation table as similar font unique identifiers;

Second, judge whether two font unique identifiers differ only in the minor version field, and if so, save the two font unique identifiers in the similar font unique identifier relation table as similar font unique identifiers;

Preferably, input filter parameters may also be received at the same time as the input font unique identifier is received in step 30; whether a queried font unique identifier satisfies the filter condition can then be determined according to the filter parameters, and the queried font unique identifier is returned when it satisfies the filter condition and is not returned otherwise.

The filter parameters can include a similarity parameter value, in which case the concrete method of determining whether a queried font unique identifier satisfies the filter condition is: read from the similar font unique identifier relation table the similarity parameter value between the queried font unique identifier and the input font unique identifier; judge whether the read similarity parameter value and the input similarity parameter value satisfy a set relation; determine, according to the judgment result, whether the queried font unique identifier satisfies the filter condition. For example, judge whether the read similarity parameter value is greater than the input similarity parameter value; if so, the queried font unique identifier satisfies the filter condition; otherwise, it does not. As another example, judge whether the read similarity parameter value lies within the range formed by two input similarity parameter values; if so, the queried font unique identifier satisfies the filter condition; otherwise, it does not.

The filter parameters can also include identifier type information, in which case the concrete method of determining whether a queried font unique identifier satisfies the filter condition is: judge whether the identifier type of the generated font unique identifier and the identifier type in the filter parameters satisfy a set relation; determine, according to the judgment result, whether the queried font unique identifier satisfies the filter condition. For example, judge whether the type of the generated font unique identifier is the input identifier type; if so, the queried font unique identifier satisfies the filter condition; otherwise, it does not. Here, the identifier type refers to the mechanism or method by which an identifier is generated; for example, two font unique identifiers generated from the metadata of font instances are identifiers of the same type, and two font unique identifiers generated from the digest values of glyph data in font instances are identifiers of the same type.
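The two kinds of filter parameter can be sketched as a predicate applied to each query result. All names here (`Candidate`, `FilterParams`, `satisfies`, `apply_filter`) are hypothetical. The sketch assumes that a lower bound is exclusive, as in the "greater than" example above, that an upper bound is inclusive, and that the identifier type check is applied to each queried identifier; the text leaves the exact "set relation" open.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    font_id: str
    similarity: float  # value read from the similarity relation table
    id_type: str       # how the identifier was generated, e.g. "digest"


@dataclass
class FilterParams:
    min_similarity: Optional[float] = None
    max_similarity: Optional[float] = None
    id_type: Optional[str] = None


def satisfies(c: Candidate, p: FilterParams) -> bool:
    if p.min_similarity is not None and c.similarity <= p.min_similarity:
        return False
    if p.max_similarity is not None and c.similarity > p.max_similarity:
        return False
    if p.id_type is not None and c.id_type != p.id_type:
        return False
    return True


def apply_filter(cands: List[Candidate], p: FilterParams) -> List[str]:
    # Only identifiers that satisfy the filter condition are returned.
    return [c.font_id for c in cands if satisfies(c, p)]
```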

In the present invention, the method of generating a font unique identifier from a font instance can be: select one or more set characters, obtain from the font instance the glyph data corresponding to the set characters, and compute a digest value from each piece of obtained glyph data with a digest algorithm; generate the font unique identifier from the computed digest values. Here, the method of generating the font unique identifier from the computed digest values is: generate a mapping table containing one or more key-value entries, where the key is a subset of the set characters or an identifier of that subset, and the value is the digest value of the glyph data corresponding to the characters in that subset; take the mapping table as part of the font unique identifier of the embedded font. Of course, the font unique identifier may also contain other information, such as the font name, version number, and manufacturer name.
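The digest-based generation method can be sketched as follows, using SHA-1 from Python's standard `hashlib` module. The function name `generate_font_id`, the representation of a font instance as a `dict` from character to raw glyph bytes, the use of single-character keys, and the layout of the returned identifier are assumptions made for illustration.

```python
import hashlib
from typing import Dict, Optional


def generate_font_id(glyphs: Dict[str, bytes], set_chars: str,
                     metadata: Optional[Dict[str, str]] = None) -> dict:
    table: Dict[str, str] = {}
    for char in set_chars:
        data = glyphs.get(char)
        if data is None:
            continue  # a subsetted instance may lack some set characters
        # Key: the character; value: hex-encoded digest of its glyph data.
        table[char] = hashlib.sha1(data).hexdigest()
    # The mapping table is one part of the identifier; metadata is optional.
    return {"table": table, "metadata": dict(metadata or {})}
```

Because each digest depends only on the glyph data of one character, a full font instance and a subsetted instance of the same font produce equal values for the keys they share, which is what lets the matching procedure recognize them as the same font.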

Referring to Fig. 4, an embodiment of the invention also provides a font ID query device, which comprises:

Receiving unit 40, configured to receive an input font unique identifier;

Equivalence relation query unit 41, configured to query, from the equivalence relation module, font unique identifiers that are equivalent to the input font unique identifier;

Similarity relation query unit 42, configured to query, from the similarity relation module, font unique identifiers that are similar to the input font unique identifier;

Return unit 43, configured to return the font unique identifiers queried by the equivalence relation query unit and/or the similarity relation query unit.

The device further comprises:

Equivalence relation generation unit 44, configured to generate a plurality of font unique identifiers of a font from one or more font instances of that font, and to save the generated font unique identifiers in the equivalent font unique identifier relation table of the equivalence relation module as equivalent font unique identifiers; or,

to receive a plurality of equivalent font unique identifiers input by a user, and to save these equivalent font unique identifiers in the equivalent font unique identifier relation table;

Accordingly, the equivalence relation query unit 41 is configured to query, from the equivalent font unique identifier relation table, font unique identifiers equivalent to the input font unique identifier.

The device further comprises:

Similarity relation generation unit 45, configured to use a glyph pattern recognition program to judge whether two fonts among a plurality of fonts are similar fonts, and if so, to save the font unique identifiers of the two fonts in the similar font unique identifier relation table of the similarity relation module as similar font unique identifiers; or,

to receive a plurality of similar font unique identifiers input by a user, and to save these similar font unique identifiers in the similar font unique identifier relation table;

Accordingly, the similarity relation query unit 42 is configured to query, from the similar font unique identifier relation table, font unique identifiers similar to the input font unique identifier.

The device further comprises:

Filter unit 46, configured to determine, according to input filter parameters, whether a font unique identifier queried by the equivalence relation query unit or the similarity relation query unit satisfies the filter condition, and, when it does, to instruct the return unit to return the queried font unique identifier.

The filter unit 46 is configured to:

when the filter parameters include a similarity parameter value, read from the similar font unique identifier relation table the similarity parameter value between the queried font unique identifier and the input font unique identifier;

The filter unit 46 is configured to:

The equivalence relation generation unit 44 or the similarity relation generation unit 45 is configured to:

generate a mapping table containing one or more key-value entries, where the key is a subset of the set characters or an identifier of that subset, and the value is the digest value of the glyph data corresponding to the characters in that subset; take the mapping table as the font unique identifier of the embedded font.

In summary, the beneficial effects of the present invention include the following:

In the scheme provided by the embodiments of the invention, the font unique identifier of the embedded font in the document to be processed is determined first; a font instance matching the determined font unique identifier is then searched for in the local font library; finally, the document is processed with the found font instance. Thus, with the present invention, a document that uses an embedded font is processed with a font instance from the local font library, without obtaining the font instance from the document itself, or downloading it from another server, according to the font instance storage address of the embedded font recorded in the document. This saves the memory space and disk storage space the document would otherwise require, or the network traffic needed to download the font instance.

In the scheme provided by the embodiments of the invention, an input font unique identifier is received first; then, font unique identifiers equivalent to the generated font unique identifier are queried from the equivalent font unique identifier relation table built in advance, and/or font unique identifiers similar to the generated font unique identifier are queried from the similar font unique identifier relation table built in advance; finally, the queried font unique identifiers are returned. In this scheme, font unique identifiers equivalent to the input font unique identifier can be queried from the equivalent font unique identifier relation table built in advance, and font unique identifiers similar to the input font unique identifier can be queried from the similar font unique identifier relation table built in advance, which solves the problem that sufficient equivalent or similar font unique identifiers cannot be obtained.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.

Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims

1. A font identifier query method, characterized in that the method comprises:

Receiving an input font unique identifier;

Returning the queried font unique identifier.

2. the method for claim 1 is characterized in that, inquiry font unique identification from the font unique identification relation table of equal value of relation of equivalence module, and the method for setting up said font unique identification relation table of equal value comprises:

3. the method for claim 1 is characterized in that, inquiry font unique identification from the similar font unique identification relation table of similarity relation module, and the method for setting up said similar font unique identification relation table comprises:

Receive a plurality of similar font unique identification of user's input, these a plurality of similar font unique identifications are kept in the similar font unique identification relation table.

4. the method for claim 1 is characterized in that, at the font uniquely identified that receives input simultaneously, this method also comprises:

Receive the filtration parameter of input;

The said font unique identification that inquires that returns comprises:

5. The method of claim 4, characterized in that, when said filter parameter comprises a similarity parameter value, said determining, according to said filter parameter, whether the queried font unique identifier satisfies a filter condition comprises:

Reading, from the similarity relation module, the similarity parameter value between the queried font unique identifier and said input font unique identifier; judging whether the read similarity parameter value and the input similarity parameter value satisfy a set relation; and determining, according to the judgment result, whether the queried font unique identifier satisfies the filter condition.

6. The method of claim 4, characterized in that, when said filter parameter comprises identifier type information, said determining, according to said filter parameter, whether the queried font unique identifier satisfies a filter condition comprises:

Judging whether the identifier type of said generated font unique identifier and the identifier type in the filter parameter satisfy a set relation; and determining, according to the judgment result, whether the queried font unique identifier satisfies the filter condition.

7. The method of claim 2, characterized in that the method of generating a font unique identifier according to a font instance comprises:

8. The method of claim 7, characterized in that said generating the font unique identifier according to the calculated digest value comprises:

Generating a mapping table comprising one or more entries, each entry having a key and a value, where the key is a subset of said set characters or an identifier of that subset, and the value is the digest value of the font data corresponding to the characters in that subset; and using the mapping table as the font unique identifier of said embedded font.

9. A font identifier database device, characterized in that the device comprises:

A receiving unit, configured to receive an input font unique identifier;

10. The device of claim 9, characterized in that the device further comprises:

An equivalence relation generation unit, configured to generate a plurality of font unique identifiers of a font according to one or more font instances of the font, and to store the generated font unique identifiers, as equivalent font unique identifiers, in an equivalent font unique identifier relation table of an equivalence relation module; or to receive a plurality of equivalent font unique identifiers input by a user and store the plurality of equivalent font unique identifiers in the equivalent font unique identifier relation table;

Said equivalence relation query unit is configured to: query, from said equivalent font unique identifier relation table, font unique identifiers equivalent to the input font unique identifier.

11. The device of claim 9, characterized in that the device further comprises:

A similarity relation generation unit, configured to use a glyph pattern recognition program to judge whether two fonts among a plurality of fonts are similar fonts and, when the judgment is affirmative, to store the font unique identifiers of said two fonts, as similar font unique identifiers, in a similar font unique identifier relation table of a similarity relation module; or to judge whether two font unique identifiers differ only in a minor version field and, when the judgment is affirmative, to store said two font unique identifiers, as similar font unique identifiers, in the similar font unique identifier relation table; or to receive a plurality of similar font unique identifiers input by a user and store the plurality of similar font unique identifiers in the similar font unique identifier relation table;

Said similarity relation query unit is configured to: query, from said similar font unique identifier relation table, font unique identifiers similar to the input font unique identifier.