WO2013079038A1 - Font determining method and device - Google Patents

Font determining method and device

Info

Publication number
WO2013079038A1
Authority
WO
WIPO (PCT)
Prior art keywords
glyph
font
determining
character
embedded
Prior art date
Application number
PCT/CN2012/085773
Other languages
English (en)
French (fr)
Inventor
仇睿恒
Original Assignee
北大方正集团有限公司
北京方正阿帕比技术有限公司
方正信息产业控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北大方正集团有限公司, 北京方正阿帕比技术有限公司, 方正信息产业控股有限公司
Priority to JP2014511731A priority Critical patent/JP5829330B2/ja
Priority to KR1020137030703A priority patent/KR20140031269A/ko
Priority to US13/985,851 priority patent/US20130322759A1/en
Priority to EP12852905.4A priority patent/EP2787448A4/en
Publication of WO2013079038A1 publication Critical patent/WO2013079038A1/zh

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/103: Formatting, i.e. changing of presentation of documents
    • G06F40/109: Font handling; Temporal or kinetic typography

Definitions

  • The present invention relates to the field of text data processing, and in particular to a font determining method and device.
  • Font embedding is a widely used technique. Specifically, some glyphs are extracted from the glyph set of an original font and combined into a new glyph set. This process is called font embedding, and the font corresponding to the new glyph set is an embedded font.
  • For example, some glyphs are extracted from the glyph set of Song (宋体) and combined into a new glyph set, which completes the font embedding process.
  • The font corresponding to the new glyph set is an embedded font, say embedded font A; the original font corresponding to embedded font A is then Song.
  • The glyph set of an embedded font can be regarded as a subset of the glyph set of its original font.
  • The glyph set of an embedded font contains only the glyphs needed to display the characters in the document, so that the glyph set is as small as possible.
  • The glyph set may also contain a mapping from the character code or index number of each character in the document to the corresponding glyph.
  • The glyph corresponding to a character's character code or index number is obtained from the mapping, and the character is then displayed using that glyph.
  • Font embedding ensures consistent document display in different environments. However, because the glyph set of the embedded font contains only some of the glyphs of the original font, the user cannot edit the document freely. For example, when the user needs to add the character "和" to the document and the glyph set of the embedded font does not contain a glyph for "和", the character cannot be displayed and the edit fails. As another example, displaying a document requires the glyph set of the embedded font.
  • The present invention provides a font determining method and device for solving the problem that the original font corresponding to an embedded font used in a document cannot be determined.
  • A font determining method, comprising:
  • determining the original font corresponding to the embedded font according to the fonts corresponding to the glyphs.
  • A font determining device, comprising:
  • an embedded font determining unit, configured to determine the embedded font used by the document;
  • a glyph selection unit, configured to select at least one glyph from the glyph set of the embedded font;
  • a glyph font determining unit, configured to determine the font corresponding to each selected glyph;
  • an original font determining unit, configured to determine the original font corresponding to the embedded font according to the fonts corresponding to the glyphs.
  • At least one glyph is selected from the glyph set of the embedded font, the font corresponding to each selected glyph is then determined, and the original font corresponding to the embedded font is determined according to the fonts corresponding to the glyphs.
  • FIG. 1 is a schematic flowchart of the method provided by an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of the device provided by an embodiment of the present invention.
  • An embodiment provides a font determining method. First, at least one glyph is selected from the glyph set of an embedded font, or at least one glyph corresponding to a character that uses the embedded font is selected from the document. The font corresponding to each selected glyph is then determined, and the original font corresponding to the embedded font is determined according to the fonts corresponding to the glyphs.
  • A font determining method includes the following steps:
  • Step 10: Determine the embedded font used by the document.
  • The embedded font used by each character in the document is recorded in the description information of the document.
  • Step 11: Select at least one glyph from the glyph set of the determined embedded font.
  • Step 12: Determine the font corresponding to each selected glyph.
  • Step 13: Determine the original font corresponding to the embedded font according to the fonts corresponding to the glyphs.
  • In step 11, at least one glyph can be selected from the glyph set of the embedded font in either of the following ways:
  • First, when the document contains a mapping between character codes and glyphs, the glyphs corresponding to a number of preset common characters are determined from the mapping, and those glyphs are selected from the glyph set of the embedded font.
  • Second, the number of occurrences of each glyph that uses the embedded font in the document is counted, and at least one of the most frequently occurring glyphs is selected.
  • The second way applies when the document does not contain a mapping between character codes and glyphs, and it also applies when the document does contain one.
  • In step 12, the font corresponding to each selected glyph can be determined in either of the following ways:
  • First, the character code corresponding to the glyph is determined and the glyph feature value of the glyph is calculated; the font corresponding to that character code and glyph feature value is looked up in the pre-generated glyph feature table, and the font found is taken as the font corresponding to the glyph.
  • Second, the glyph feature value of the glyph is calculated, the font corresponding to that glyph feature value is looked up in the glyph feature table, and the font found is taken as the font corresponding to the glyph.
  • The glyph feature table is generated as follows: select a number of preset common characters; for the glyph set of each locally saved font, extract the glyphs of the selected common characters and calculate the glyph feature value of each extracted glyph; then save a mapping for each extracted glyph in the glyph feature table. Each mapping contains the font corresponding to the glyph, the character code corresponding to the glyph, and the glyph feature value of the glyph.
  • The character code corresponding to a glyph can be determined in either of two ways. First, when the glyph set of the embedded font contains a mapping between character codes and glyphs, the character code corresponding to the glyph is determined from the mapping.
  • Second, optical character recognition (OCR) is used to recognize the character code of the glyph.
  • OCR optical character recognition
  • In step 13, the original font corresponding to the embedded font can be determined from the fonts corresponding to the glyphs in either of the following two ways:
  • First, if all the glyphs correspond to the same font, that font is determined to be the original font corresponding to the embedded font.
  • Second, among the glyphs selected in step 11, the glyphs corresponding to the same font are identified and checked against a set condition; if the condition is satisfied, that font is determined to be the original font corresponding to the embedded font. Examples follow:
  • Example 1: If the number of glyphs corresponding to the same font exceeds a preset threshold, that font is determined to be the original font corresponding to the embedded font. The threshold is an integer greater than zero.
  • Example 2: If the ratio of the number of glyphs corresponding to the same font to the total number of glyphs selected in step 11 exceeds a set threshold, that font is determined to be the original font corresponding to the embedded font. The threshold is greater than zero and less than one.
  • Example 3: If the sum of the weights of the glyphs corresponding to the same font exceeds a preset threshold, that font is determined to be the original font corresponding to the embedded font. The threshold is a value greater than zero.
  • For example, suppose 60 glyphs correspond to the same font, 10 of them with weight 2 and 50 with weight 1. The sum of the weights of the 60 glyphs is 70, which exceeds a threshold of 50.
  • The present invention is not limited to the above three implementations; any method that can determine the original font corresponding to the embedded font according to the glyphs corresponding to the same font falls within the scope of protection of the present invention.
  • The glyph corresponding to the character to be displayed is looked up in the locally saved glyph set of the original font, and the character is displayed using the glyph found.
  • Information used by applications such as character editing may be stored in the document, including information about the original font corresponding to the embedded font, the recognized character codes, and the like.
  • The method may be executed by any device capable of processing a document, such as a client or a server.
  • The server may carry the information about the determined original font corresponding to the embedded font in the document sent to the client.
  • When the client displays the document, it looks up the glyph corresponding to each character to be displayed in the locally saved glyph set of the original font and displays the character using the glyph found.
  • Step 1: Check whether the glyph set of the embedded font contains a mapping from character codes to glyphs. If so, go to step 2; otherwise, go to step 5.
  • Step 2: Select the glyph of at least one common character from the glyph set of the embedded font, calculate the glyph feature value of each selected glyph, and determine the character code corresponding to each glyph from the mapping.
  • Step 3: For each selected glyph, look up in the glyph feature table the font corresponding to the glyph's character code and glyph feature value, and take the font found as the font of the glyph.
  • Step 4: Determine the original font corresponding to the embedded font according to the font of each selected glyph; the process ends.
  • If every selected glyph belongs to the same font A, the original font of the embedded font can be determined to be A.
  • Step 5: Count the number of occurrences of each glyph that uses the embedded font in the document, and select at least one of the most frequently occurring glyphs; then go to step 6a or step 6b.
  • Step 6a: For each selected glyph, render the glyph and use OCR to recognize its character code. If recognition succeeds, calculate the glyph feature value of the glyph, look up in the glyph feature table the font corresponding to the glyph's character code and glyph feature value, take the font found as the font of the glyph, and go to step 7. If recognition fails, go to step 6b.
  • Step 6b: For each selected glyph, calculate the glyph feature value of the glyph, look up the font corresponding to that glyph feature value in the glyph feature table, and take the font found as the font of the glyph.
  • Step 7: Determine the original font corresponding to the embedded font according to the font of each selected glyph; the process ends.
  • If the number of glyphs corresponding to the same font exceeds a preset threshold, that font is judged to be the original font corresponding to the embedded font. For example, if 20 common glyphs are selected and at least 18 of them correspond to the same font A, the original font corresponding to the embedded font can be judged to be A.
  • The glyph feature table stores a number of (character code, original font, glyph feature value) mappings. Because the number of locally saved fonts is limited (a few hundred common fonts) and the number of selected glyphs is generally small, the cost of constructing a glyph feature table for common characters is acceptable, and the cost of matching and searching in it is also small.
  • There may be more than one glyph feature table.
  • A separate glyph feature table can be generated for each character type, where character types include digits, letters, punctuation marks, Chinese characters, other special characters, and the like.
  • The rule for selecting glyphs can also differ from table to table. For example, because there are few kinds of punctuation marks, the mappings for the glyphs of all punctuation marks can be added to the corresponding table, whereas for Chinese characters the mappings for the glyphs of the 200 most common characters can be added.
  • The font search can be performed in the glyph feature table for the relevant character type, or in all tables.
  • OCR has a misrecognition rate, and the characters selected as common may in fact not be common characters, so the corresponding font may not be found from the glyph feature value.
  • For an embedded font that contains a mapping between character codes and glyphs, the mapping can also be ignored; that is, step 1 may proceed to step 5 even when the mapping exists.
  • Without the help of character codes, efficiency and accuracy may suffer in some cases.
  • The original font corresponding to an embedded font can thus be found, which enables free text editing or omitting the transmission of embedded font data, and the approach can also be applied to other applications that rely on the original font.
  • Embodiment 1:
  • Embedded font A is derived from New Song (新宋体), and its glyph set contains a mapping between character codes and glyphs.
  • The MD5 value of the glyph data is used as the glyph feature value of the glyph.
  • Two hundred common Chinese characters are selected (such as "的", "一", "是", "了"), and their glyphs are taken from the glyph sets of ten common Chinese fonts, such as New Song (新宋体), Hei (黑体), Kai (楷体), STFangsong (华文仿宋), and YouYuan (幼圆).
  • The glyphs of the 200 characters are extracted and the glyph feature value of each glyph is calculated, which yields a glyph feature table of common Chinese characters, shown in Table 1 below:
  • Step 1: Select the glyphs corresponding to the four characters "的", "一", "是", and "了" from the glyph set of embedded font A, because these four characters are common and are contained in the glyph set of embedded font A.
  • Step 2: Calculate the glyph feature value of each selected glyph.
  • The glyph feature value of "是" is 65c8c486368da89dedd430b09127f883.
  • Looking up the glyph feature table for the character code "是" and the feature value 65c8c486368da89dedd430b09127f883 shows that the font is New Song.
  • Step 3: Because the font corresponding to every selected glyph is New Song, the original font corresponding to embedded font A is determined to be New Song.
  • The glyph feature table in the above embodiment need not be stored as a table; it may be stored as a tree or another data structure, as long as it can be searched according to the given conditions.
  • Embodiment 2:
  • Embedded font A is derived from New Song (新宋体), and its glyph set does not contain a mapping between character codes and glyphs.
  • The MD5 value of the glyph data is used as the feature value of the glyph.
  • Two hundred common Chinese characters are selected (such as "的", "一", "是", "了", not including "银"), and their glyphs are taken from the glyph sets of ten common Chinese fonts, such as New Song, Hei, Kai, STFangsong, and YouYuan.
  • The glyphs of the 200 characters are extracted and the glyph feature value of each glyph is calculated, which yields a glyph feature table of common Chinese characters, as shown in Table 1.
  • Step 1: Count the occurrences in the document of the common glyphs that use embedded font A, and select the 5 most frequent, for example "的", "是", "了", "银", and "一".
  • Step 2: When the glyph of "的" is processed, OCR is first used to recognize it, which yields the character code of "的"; the glyph feature table is then searched using that character code and the glyph feature value 53d1169058611886e5cf2b2b4dd0627f, which determines that the glyph of "的" corresponds to New Song.
  • Step 3: After the 5 glyphs have been processed, 4 glyphs are found to correspond to New Song and the font of 1 glyph cannot be determined. Because the distribution of common glyphs in a document may differ from the distribution of common characters, a final determination is still made.
  • The original font of embedded font A is New Song.
  • The information about the original font corresponding to the embedded font determined by the present invention may be written back into the description information of the document for use by subsequent applications. For example, when a character needs to be displayed and the glyph set of the embedded font does not contain its glyph, the glyph can be looked up in the glyph set of the original font corresponding to the embedded font, and the text is then displayed using that glyph.
  • The character codes determined by the present invention can also be written back into the document's configuration file for use by applications such as text editing. For example, when a character needs to be edited, the corresponding glyph can be found directly from the saved character code of that character, and the text is edited using the glyph. The character code does not have to be determined on the fly, which improves display speed.
  • the glyph feature value can be calculated by using a Message Digest Algorithm (MD5).
  • MD5 Message Digest Algorithm
  • SHA-1 Secure Hash Algorithm
  • The glyph feature value can also be calculated with techniques such as contour feature extraction from graphics processing.
  • An embodiment of the present invention provides a font determining device, which includes:
  • an embedded font determining unit 30, configured to determine the embedded font used by the document; a glyph selection unit 31, configured to select at least one glyph from the glyph set of the embedded font, or to select the glyph corresponding to at least one character in the document that uses the embedded font;
  • a glyph font determining unit 32, configured to determine the font corresponding to each selected glyph;
  • an original font determining unit 33, configured to determine the original font corresponding to the embedded font according to the fonts corresponding to the glyphs.
  • The glyph selection unit 31 is configured to:
  • when the document contains a mapping between character codes and glyphs, determine from the mapping the glyphs corresponding to a number of preset common characters, and select those glyphs from the glyph set of the embedded font; or
  • count the number of occurrences of each glyph that uses the embedded font in the document, and select at least one of the most frequently occurring glyphs.
  • The glyph font determining unit 32 is configured to:
  • determine the character code corresponding to the glyph, calculate the glyph feature value of the glyph, look up the font corresponding to that character code and glyph feature value in the pre-generated glyph feature table, and take the font found as the font corresponding to the glyph.
  • The glyph feature table contains mappings among character codes, fonts, and glyph feature values.
  • The glyph font determining unit 32 is configured to:
  • determine the character code corresponding to the glyph from the mapping; or
  • use OCR to recognize the character code of the glyph.
  • The original font determining unit 33 is configured to:
  • determine the font shared by the glyphs to be the original font corresponding to the embedded font.
  • The device further includes:
  • a display unit 34, configured to, after the original font corresponding to the embedded font has been determined, look up the glyph corresponding to the character to be displayed in the locally saved glyph set of the original font, and display the character using the glyph found.
  • The beneficial effects of the present invention include:
  • At least one glyph is selected from the glyph set of the embedded font, the font corresponding to each selected glyph is then determined, and the original font corresponding to the embedded font is determined according to the fonts corresponding to the glyphs. This solution therefore solves the problem of determining the original font corresponding to an embedded font used in a document.
  • The glyph corresponding to the character to be displayed is looked up in the locally saved glyph set of the original font, and the character is displayed using the glyph found.
  • If the glyph set of the embedded font used by the document does not contain the glyph of a character to be added, the glyph can be looked up in the locally saved glyph set of the original font corresponding to the embedded font, and the character is displayed using that glyph, which avoids the edit failure.
  • When a client needs to display a document stored on a server, the client can obtain locally the glyph set of the original font corresponding to the embedded font used by the document, without downloading the glyph set of the embedded font, which improves document display speed in a network environment.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Document Processing Apparatus (AREA)
  • Controls And Circuits For Display Device (AREA)
  • Character Discrimination (AREA)

Abstract

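Disclosed are a font determining method and device. The method determines the font corresponding to each selected glyph and, according to the fonts corresponding to the glyphs, determines the original font corresponding to an embedded font. The solution solves the problem that the original font corresponding to an embedded font used in a document cannot be determined.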

Description

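Font Determining Method and Device

Technical Field
The present invention relates to the field of text data processing, and in particular to a font determining method and device.

Background
To ensure that documents display consistently across platforms, font embedding is a widely used technique. Specifically, some glyphs are extracted from the glyph set of an original font and combined into a new glyph set. This process is called font embedding, and the font corresponding to the new glyph set is an embedded font. For example, some glyphs are extracted from the glyph set of Song (宋体) and combined into a new glyph set, which completes the font embedding process. The font corresponding to the new glyph set is an embedded font, say embedded font A, and the original font corresponding to embedded font A is Song. The glyph set of an embedded font can be regarded as a subset of the glyph set of its original font.
Generally, the glyph set of an embedded font contains only the glyphs needed to display the characters in the document, so that the glyph set is as small as possible. The glyph set may also contain a mapping from the character code or index number of each character in the document to the corresponding glyph. When a character in the document is displayed, the glyph corresponding to its character code or index number is obtained through this mapping, and the character is displayed using that glyph.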
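The relationship between an original font and an embedded font can be pictured with a minimal Python sketch. The dictionary model, the function names `embed_font` and `glyph_for`, and the placeholder outline bytes are all assumptions made for illustration; real font files are considerably more involved.

```python
from typing import Dict, Optional

# Hypothetical model: a glyph set maps a character to raw glyph outline bytes.
GlyphSet = Dict[str, bytes]


def embed_font(original: GlyphSet, document_text: str) -> GlyphSet:
    """Keep only the glyphs needed to display the characters in the document."""
    return {ch: original[ch] for ch in set(document_text) if ch in original}


def glyph_for(glyph_set: GlyphSet, ch: str) -> Optional[bytes]:
    """Look up the glyph for a character through the code-to-glyph mapping."""
    return glyph_set.get(ch)


# Placeholder data standing in for the Song glyph set.
song: GlyphSet = {"的": b"outline-de", "一": b"outline-yi", "和": b"outline-he"}
embedded_a = embed_font(song, "的一")
```

Here `embedded_a` is a subset of `song`, and `glyph_for(embedded_a, "和")` returns `None` because that glyph was never embedded.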
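In the course of making the present invention, the inventor found the following technical problems in the prior art:
Although font embedding ensures consistent document display in different environments, problems arise because the original font corresponding to an embedded font cannot be determined.
For example, because the glyph set of the embedded font contains only some of the glyphs of the original font, the user cannot edit the document freely. If the user needs to add the character "和" to the document and the glyph set of the embedded font does not contain a glyph for "和", the character cannot be displayed and the edit fails.
As another example, the glyph set of the embedded font is needed to display the document. When a client needs to display a document stored on a server, the client must download all configuration files of the document, including the glyph set of the embedded font it uses. Because the glyph sets of embedded fonts are generally large, documents display slowly in a network environment.

Summary of the Invention
The present invention provides a font determining method and device to solve the problem that the original font corresponding to an embedded font used in a document cannot be determined.
A font determining method, comprising:
determining the embedded font used by a document;
selecting at least one glyph from the glyph set of the embedded font;
determining the font corresponding to each selected glyph; and
determining the original font corresponding to the embedded font according to the fonts corresponding to the glyphs.
A font determining device, comprising:
an embedded font determining unit, configured to determine the embedded font used by a document;
a glyph selection unit, configured to select at least one glyph from the glyph set of the embedded font;
a glyph font determining unit, configured to determine the font corresponding to each selected glyph; and
an original font determining unit, configured to determine the original font corresponding to the embedded font according to the fonts corresponding to the glyphs.
In this solution, at least one glyph is first selected from the glyph set of the embedded font, the font corresponding to each selected glyph is then determined, and the original font corresponding to the embedded font is determined according to the fonts corresponding to the glyphs.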
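Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the method provided by an embodiment of the present invention;
FIG. 2 is a schematic flowchart of an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of the device provided by an embodiment of the present invention.

Detailed Description
An embodiment of the present invention provides a font determining method. In this method, at least one glyph is first selected from the glyph set of an embedded font, or at least one glyph corresponding to a character that uses the embedded font is selected from the document. The font corresponding to each selected glyph is then determined, and the original font corresponding to the embedded font is determined according to the fonts corresponding to the glyphs.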
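Referring to FIG. 1, the font determining method provided by an embodiment of the present invention includes the following steps:
Step 10: Determine the embedded font used by the document.
Here, the description information of the document records the embedded font used by each character in the document, so the embedded font used by the document can be determined from that description information.
Step 11: Select at least one glyph from the glyph set of the determined embedded font.
Step 12: Determine the font corresponding to each selected glyph.
Step 13: Determine the original font corresponding to the embedded font according to the fonts corresponding to the glyphs.
In step 11, at least one glyph can be selected from the glyph set of the embedded font in either of the following two ways:
First, when the document contains a mapping between character codes and glyphs, the glyphs corresponding to a number of preset common characters are determined from the mapping, and those glyphs are selected from the glyph set of the embedded font.
Second, the number of occurrences of each glyph that uses the embedded font in the document is counted, and at least one of the most frequently occurring glyphs is selected. This approach applies when the document does not contain a mapping between character codes and glyphs, and it also applies when the document does contain one.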
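The two selection strategies of step 11 can be sketched as follows. This is only an illustration under assumed data shapes: `code_to_glyph` stands in for the character-code-to-glyph mapping, and `glyph_ids_in_document` is a hypothetical list of glyph identifiers in the order they are used in the document.

```python
from collections import Counter
from typing import Dict, List, Sequence


def select_by_common_chars(code_to_glyph: Dict[str, bytes],
                           common_chars: Sequence[str]) -> List[bytes]:
    """First way: pick the glyphs of preset common characters via the mapping."""
    return [code_to_glyph[ch] for ch in common_chars if ch in code_to_glyph]


def select_most_frequent(glyph_ids_in_document: Sequence[int], n: int) -> List[int]:
    """Second way: pick the n glyphs that occur most often in the document."""
    return [gid for gid, _ in Counter(glyph_ids_in_document).most_common(n)]
```

The second function needs no character codes at all, which is why it also works when the mapping is absent.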
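In step 12, the font corresponding to each selected glyph can be determined in either of the following two ways:
First, for each selected glyph, determine the character code corresponding to the glyph and calculate its glyph feature value; look up, in a pre-generated glyph feature table, the font corresponding to that character code and glyph feature value; and take the font found as the font corresponding to the glyph.
Second, for each selected glyph, calculate its glyph feature value; look up the font corresponding to that glyph feature value in the glyph feature table; and take the font found as the font corresponding to the glyph.
The glyph feature table contains mappings among character codes, fonts, and glyph feature values. It is generated as follows: select a number of preset common characters; for the glyph set of each of the fonts saved locally, extract the glyphs of the selected common characters from the glyph set and calculate the glyph feature value of each extracted glyph; and save a mapping for each extracted glyph in the glyph feature table. Each mapping contains the font corresponding to the glyph, the character code corresponding to the glyph, and the glyph feature value of the glyph.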
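A minimal sketch of building and querying such a glyph feature table is given below. It assumes the MD5 digest of the glyph data as the feature value (the choice used in the embodiments) and models each local font as a dictionary from character to glyph bytes; the names `build_feature_table` and `find_font` are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Sequence, Tuple

# One table row: (character code, font name, glyph feature value).
Row = Tuple[str, str, str]


def build_feature_table(local_fonts: Dict[str, Dict[str, bytes]],
                        common_chars: Sequence[str]) -> List[Row]:
    """Extract the glyphs of the common characters from every local font."""
    table: List[Row] = []
    for font_name, glyph_set in local_fonts.items():
        for ch in common_chars:
            if ch in glyph_set:
                feature = hashlib.md5(glyph_set[ch]).hexdigest()
                table.append((ch, font_name, feature))
    return table


def find_font(table: Sequence[Row], feature: str,
              char: Optional[str] = None) -> Optional[str]:
    """Look up by character code and feature value, or by feature value alone."""
    for ch, font_name, value in table:
        if value == feature and (char is None or ch == char):
            return font_name
    return None
```

Passing `char` corresponds to the first way of step 12, and omitting it corresponds to the second.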
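The character code corresponding to a glyph, as mentioned above, can be determined in either of the following two ways:
First, when the glyph set of the embedded font contains a mapping between character codes and glyphs, the character code corresponding to the glyph is determined from the mapping.
Second, optical character recognition (OCR) is used to recognize the character code of the glyph.
In step 13, the original font corresponding to the embedded font can be determined from the fonts corresponding to the glyphs in either of the following two ways:
First, if all the glyphs correspond to the same font, that font is determined to be the original font corresponding to the embedded font.
Second, among the glyphs selected in step 11, identify the glyphs that correspond to the same font and check whether they satisfy a set condition; if they do, that font is determined to be the original font corresponding to the embedded font. Examples follow:
Example 1: If the number of glyphs corresponding to the same font exceeds a preset threshold, that font is determined to be the original font corresponding to the embedded font. The threshold is an integer greater than 0.
Example 2: If the ratio of the number of glyphs corresponding to the same font to the total number of glyphs selected in step 11 exceeds a set threshold, that font is determined to be the original font corresponding to the embedded font. The threshold is greater than 0 and less than 1.
Example 3: If the sum of the weights of the glyphs corresponding to the same font exceeds a preset threshold, that font is determined to be the original font corresponding to the embedded font. The threshold is a value greater than 0. For example, suppose 60 glyphs correspond to the same font, 10 of them with weight 2 and 50 with weight 1. The sum of the weights is 10 × 2 + 50 × 1 = 70; if the threshold is 50, the sum exceeds it, so that font is determined to be the original font corresponding to the embedded font.
Of course, the present invention is not limited to these three implementations; any method that can determine the original font corresponding to the embedded font according to the glyphs corresponding to the same font falls within the scope of protection of the present invention.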
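The three example conditions can be written as small decision functions. The sketch below is an assumption-laden illustration: each selected glyph contributes one entry to `fonts` (its determined font, or `None` when no font was found), and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Sequence


def pick_by_count(fonts: Sequence[Optional[str]], threshold: int) -> Optional[str]:
    """Example 1: the glyph count for one font must exceed the threshold."""
    counts = Counter(f for f in fonts if f is not None)
    if not counts:
        return None
    font, n = counts.most_common(1)[0]
    return font if n > threshold else None


def pick_by_ratio(fonts: Sequence[Optional[str]], threshold: float) -> Optional[str]:
    """Example 2: the share of all selected glyphs must exceed the threshold."""
    counts = Counter(f for f in fonts if f is not None)
    if not counts:
        return None
    font, n = counts.most_common(1)[0]
    return font if n / len(fonts) > threshold else None


def pick_by_weight(fonts: Sequence[Optional[str]], weights: Sequence[float],
                   threshold: float) -> Optional[str]:
    """Example 3: the summed glyph weights for one font must exceed the threshold."""
    totals: Dict[str, float] = defaultdict(float)
    for font, weight in zip(fonts, weights):
        if font is not None:
            totals[font] += weight
    if not totals:
        return None
    best = max(totals, key=totals.get)
    return best if totals[best] > threshold else None


# The worked example from the text: 10 glyphs of weight 2 and 50 of weight 1.
example_fonts = ["Song"] * 60
example_weights = [2] * 10 + [1] * 50
example_result = pick_by_weight(example_fonts, example_weights, threshold=50)
```

With these inputs the weight total is 70, which exceeds 50, so `example_result` is `"Song"`.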
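Preferably, after the original font corresponding to the embedded font has been determined, when a character needs to be displayed, the glyph corresponding to the character is looked up in the locally saved glyph set of the original font, and the character is displayed using the glyph found.
Preferably, information used by applications such as character editing can also be saved in the document, including information about the original font corresponding to the embedded font, the recognized character codes, and so on.
Note that the method may be executed by any device capable of processing documents, such as a client or a server. When a server executes it, the server can carry the information about the determined original font in the document sent to the client. When the client displays the document, it looks up the glyph corresponding to each character to be displayed in the locally saved glyph set of the original font and displays the character using the glyph found.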
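The display-time lookup might be sketched as follows. The `document_info` dictionary and its `"original_font"` key are hypothetical stand-ins for the information carried in the document, and trying the embedded glyph set first is an assumption borrowed from the editing scenario; the text itself only requires the lookup in the locally saved original font.

```python
from typing import Dict, Optional

GlyphSet = Dict[str, bytes]


def glyph_for_display(ch: str,
                      embedded: GlyphSet,
                      document_info: Dict[str, str],
                      local_fonts: Dict[str, GlyphSet]) -> Optional[bytes]:
    """Return a glyph for ch, falling back to the local copy of the original font."""
    if ch in embedded:
        return embedded[ch]
    # "original_font" is an assumed field name for the recorded original font.
    original = local_fonts.get(document_info.get("original_font", ""), {})
    return original.get(ch)
```

A client that already has the original font installed could pass an empty `embedded` set and skip downloading the embedded glyph data altogether.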
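The present invention is described in detail below.
Each embedded font used by the document is processed as follows:
Step 1: Check whether the glyph set of the embedded font contains a mapping from character codes to glyphs. If so, go to step 2; otherwise, go to step 5.
Step 2: Select the glyph of at least one common character from the glyph set of the embedded font, calculate the glyph feature value of each selected glyph, and determine the character code corresponding to each glyph from the mapping from character codes to glyphs.
Step 3: For each selected glyph, look up in the glyph feature table the font corresponding to the glyph's character code and glyph feature value, and take the font found as the font of the glyph.
Step 4: Determine the original font corresponding to the embedded font according to the font of each selected glyph; the process ends.
Specifically, if every selected glyph belongs to the same font A, the original font of the embedded font can be determined to be A.
Step 5: Count the number of occurrences of each glyph that uses the embedded font in the document, and select at least one of the most frequently occurring glyphs; then go to step 6a or step 6b.
Step 6a: For each selected glyph, render the glyph and use OCR to recognize its character code. If recognition succeeds, calculate the glyph feature value of the glyph, look up in the glyph feature table the font corresponding to the glyph's character code and glyph feature value, take the font found as the font of the glyph, and go to step 7. If recognition fails, go to step 6b.
Step 6b: For each selected glyph, calculate the glyph feature value of the glyph, look up the font corresponding to that glyph feature value in the glyph feature table, and take the font found as the font of the glyph.
Step 7: Determine the original font corresponding to the embedded font according to the font of each selected glyph; the process ends.
Specifically, if the number of glyphs corresponding to the same font exceeds a preset threshold, the original font corresponding to the embedded font can be judged to be that font. For example, if 20 common glyphs are selected and at least 18 of them correspond to the same font A, the original font corresponding to the embedded font can be judged to be A.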
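Steps 1 to 7 can be tied together in one hedged sketch. Everything about the data model is assumed for illustration: `glyphs` maps a glyph id to glyph bytes, `cmap` is the optional character-to-glyph-id mapping, `usage` holds per-glyph occurrence counts, `ocr` is a caller-supplied function standing in for a real OCR engine, and MD5 is used as the feature value. Falling back to a feature-only lookup when the OCR-assisted lookup finds nothing follows the behavior shown in Embodiment 2.

```python
import hashlib
from collections import Counter
from typing import Callable, Dict, Optional, Sequence, Tuple

Row = Tuple[str, str, str]  # (character code, font name, glyph feature value)


def lookup(table: Sequence[Row], feature: str, char: Optional[str] = None) -> Optional[str]:
    for ch, font, value in table:
        if value == feature and (char is None or ch == char):
            return font
    return None


def determine_original_font(glyphs: Dict[int, bytes],
                            cmap: Optional[Dict[str, int]],
                            usage: Dict[int, int],
                            table: Sequence[Row],
                            common_chars: Sequence[str],
                            ocr: Callable[[bytes], Optional[str]],
                            threshold: int,
                            top_n: int = 5) -> Optional[str]:
    votes = []
    if cmap:  # Step 1: a mapping exists, so run steps 2 to 4.
        for ch in common_chars:
            if ch in cmap:
                feature = hashlib.md5(glyphs[cmap[ch]]).hexdigest()
                votes.append(lookup(table, feature, ch))
        # Step 4: every selected glyph must belong to the same font.
        return votes[0] if votes and len(set(votes)) == 1 else None
    # Step 5: no mapping, so use the most frequently occurring glyphs.
    for gid, _ in Counter(usage).most_common(top_n):
        feature = hashlib.md5(glyphs[gid]).hexdigest()
        ch = ocr(glyphs[gid])  # Step 6a: try OCR first.
        font = lookup(table, feature, ch) if ch is not None else None
        if font is None:
            font = lookup(table, feature)  # Step 6b: feature value only.
        votes.append(font)
    # Step 7: the most common font must exceed the preset threshold.
    counts = Counter(v for v in votes if v is not None)
    if not counts:
        return None
    font, n = counts.most_common(1)[0]
    return font if n > threshold else None
```

The two branches deliberately use different decision rules, unanimity in step 4 and a count threshold in step 7, mirroring the two "Specifically" notes in the text.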
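The glyph feature table stores a number of (character code, original font, glyph feature value) mappings. Because the number of locally saved fonts is limited (a few hundred common fonts) and the number of selected glyphs is generally small, the cost of constructing a glyph feature table for common characters is acceptable, and the cost of matching and searching in it is also small.
In practice, there can be more than one glyph feature table. For example, a separate table can be generated for each character type, where character types include digits, letters, punctuation marks, Chinese characters, and other special characters. The rule for selecting glyphs can also differ from table to table. For instance, because there are few kinds of punctuation marks, the mappings for the glyphs of all punctuation marks can be added to the corresponding table, whereas for Chinese characters the mappings for the glyphs of the 200 most common characters can be added. At lookup time, the font search can be performed in the table for the relevant character type, or in all tables.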
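One way to organize per-type tables is sketched below. The type names, the use of the CJK Unified Ideographs block (U+4E00 to U+9FFF) to recognize Chinese characters, and the table layout (type name to a feature-value-to-font dictionary) are all assumptions for illustration.

```python
import unicodedata
from typing import Dict, Optional

# Hypothetical layout: character type -> {glyph feature value -> font name}.
Tables = Dict[str, Dict[str, str]]


def char_type(ch: str) -> str:
    """Classify a character into one of the types named in the text."""
    if ord(ch) in range(0x4E00, 0xA000):  # CJK Unified Ideographs block
        return "hanzi"
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "letter"
    if unicodedata.category(ch).startswith("P"):
        return "punctuation"
    return "other"


def find_font(tables: Tables, feature: str, ch: Optional[str] = None) -> Optional[str]:
    """Search the table for the character's type, or all tables if ch is unknown."""
    if ch is not None:
        candidates = [tables.get(char_type(ch), {})]
    else:
        candidates = list(tables.values())
    for table in candidates:
        if feature in table:
            return table[feature]
    return None
```

Searching only the table that matches the character type keeps each lookup small, while the all-tables search covers glyphs whose character code is not known.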
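Because OCR has a misrecognition rate, and because the characters selected as common may in fact not be common characters, it is possible that no corresponding font is found from the glyph feature value. Therefore, when performing step ...
Of course, for an embedded font that does contain a mapping between character codes and glyphs, the mapping can also be ignored; that is, step 1 may proceed to step 5 even when a mapping from character codes to glyphs exists. Without the help of character codes, however, efficiency and accuracy may suffer in some cases.
With this embodiment, the original font corresponding to an embedded font can be found, which enables free text editing or omitting the transmission of embedded font data, and the approach is also applicable to other applications that depend on the original font.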
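Embodiment 1:
Embedded font A is derived from New Song (新宋体, simsun.ttf), and its glyph set contains a mapping between character codes and glyphs. The MD5 value of the glyph data is used as the glyph feature value of the glyph. Two hundred common Chinese characters are selected (such as "的", "一", "是", "了"), the glyphs of these 200 characters are extracted from the glyph sets of ten common Chinese fonts, such as New Song (新宋体), Hei (黑体), Kai (楷体), STFangsong (华文仿宋), and YouYuan (幼圆), and the glyph feature value of each glyph is calculated. This yields a glyph feature table of common Chinese characters, illustrated in Table 1:

Table 1
Character code   Font       Glyph feature value
的               New Song   53d1169058611886e5cf2b2b4dd0627f
一               New Song   c8f77ee32399b7bbe05560f9da7aa5a3
是               New Song   65c8c486368da89dedd430b09127f883
[Image imgf000010_0001]

Step 1: From the glyph set of embedded font A, select the glyphs corresponding to the four characters "的", "一", "是", and "了", because these four characters are very common and are contained in the glyph set of embedded font A. Alternatively, select common characters that are contained both in the glyph set of embedded font A and in the glyph feature table.
Step 2: Calculate the glyph feature value of each selected glyph. For example, the glyph feature value of "是" is 65c8c486368da89dedd430b09127f883. Looking up the glyph feature table for the character code "是" and the feature value 65c8c486368da89dedd430b09127f883 shows that the font is New Song.
The fonts corresponding to the other three glyphs can likewise be confirmed to be New Song.
Step 3: Because the font corresponding to every selected glyph is New Song, the original font corresponding to embedded font A is determined to be New Song.
The glyph feature table in the above embodiment need not actually be stored as a table; it can also be stored as a tree or another data structure, as long as it can be searched according to the given conditions.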
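As one possible non-tabular layout, the rows can be indexed with hash maps. The sketch below uses the rows for "的" and "是" from Table 1; the first digest is written with "11" on the assumption that the non-hexadecimal "ll" in the scanned text is a misread, and the index names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Rows for two characters of Table 1: (character code, font, glyph feature value).
TABLE_1 = [
    ("的", "New Song", "53d1169058611886e5cf2b2b4dd0627f"),  # "ll" read as "11" (assumption)
    ("是", "New Song", "65c8c486368da89dedd430b09127f883"),
]

# Index for lookups by character code plus feature value.
by_char_and_feature: Dict[Tuple[str, str], str] = {
    (ch, feature): font for ch, font, feature in TABLE_1
}

# Index for lookups by feature value alone; a set allows for several fonts
# sharing the same glyph data.
by_feature: Dict[str, Set[str]] = {}
for ch, font, feature in TABLE_1:
    by_feature.setdefault(feature, set()).add(font)

font_of_shi = by_char_and_feature.get(("是", "65c8c486368da89dedd430b09127f883"))
```

Both indexes answer a query in constant time on average, whereas a plain list of rows would have to be scanned.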
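Embodiment 2:
Embedded font A is derived from New Song (新宋体, simsun.ttf), and its glyph set does not contain a mapping between character codes and glyphs. The MD5 value of the glyph data is used as the feature value of the glyph. Two hundred common Chinese characters are selected (such as "的", "一", "是", "了", not including "银"), the glyphs of these 200 characters are extracted from the glyph sets of ten common Chinese fonts, such as New Song, Hei, Kai, STFangsong, and YouYuan, and the glyph feature value of each glyph is calculated. This yields a glyph feature table of common Chinese characters, as shown in Table 1.
Step 1: Count the occurrences in the document of the common glyphs that use embedded font A, and select the 5 most frequent, for example "的", "是", "了", "银", and "一".
Step 2: When the glyph of "的" is processed, OCR is first used to recognize it, which yields the character code of "的". The glyph feature table is then searched using the character code of "的" and the glyph feature value 53d1169058611886e5cf2b2b4dd0627f, which determines that the glyph of "的" corresponds to New Song.
When the glyph of "是" is processed, OCR misrecognizes it as "足", so no corresponding font is found in the glyph feature table. The table is then searched directly using the glyph feature value of "是", 65c8c486368da89dedd430b09127f883, which determines that the glyph of "是" corresponds to New Song.
"了" and "一" are handled in the same way and are not described again; their glyphs are confirmed to correspond to New Song.
When the glyph of "银" is processed, its font cannot be found using either OCR or the glyph feature value.
Step 3: After the 5 glyphs have been processed, 4 glyphs are found to correspond to New Song and the font of 1 glyph cannot be determined. Considering that the distribution of common glyphs in a document may differ somewhat from the distribution of common characters, the original font of embedded font A is finally judged to be New Song.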
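The information about the original font determined in the present invention for an embedded font can be written back into the document's description information for use by later applications. For example, when a character needs to be displayed and the glyph set of the embedded font does not contain that character's glyph, the glyph can be looked up in the glyph set of the original font corresponding to the embedded font and then used to display the text.
Likewise, the character codes determined in the present invention can be written back into the document's configuration file for use by applications such as text editing. For example, when a character needs to be edited, the corresponding glyph can be found directly from the saved character code and the text edited accordingly, without having to determine the character code on the fly, which improves display speed.
In the present invention, the glyph feature value can be computed with the Message Digest Algorithm (MD5). In practice, other digest methods such as the Secure Hash Algorithm (SHA-1) can also be used, as can techniques such as contour feature extraction from graphics processing.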
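As a small illustration of swapping the digest algorithm, the sketch below uses Python's standard `hashlib` module. The function name `glyph_feature` and its `algorithm` parameter are assumptions for the example; contour feature extraction is not covered.

```python
import hashlib


def glyph_feature(glyph_data: bytes, algorithm: str = "md5") -> str:
    """Compute a glyph feature value as a hex digest of the glyph data.

    algorithm: any name accepted by hashlib.new, such as "md5" or "sha1".
    """
    return hashlib.new(algorithm, glyph_data).hexdigest()
```

Whichever algorithm is chosen, the glyph feature table and the glyphs being identified must use the same one, since digests from different algorithms cannot be compared.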
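Referring to FIG. 3, an embodiment of the present invention provides a font determining device, which includes:
an embedded font determining unit 30, configured to determine the embedded font used by a document;
a glyph selecting unit 31, configured to select at least one glyph from the glyph set of the embedded font, or to select the glyph corresponding to at least one character in the document that uses the embedded font;
a glyph font determining unit 32, configured to determine the font corresponding to each selected glyph;
an original font determining unit 33, configured to determine, according to the fonts corresponding to the glyphs, the original font corresponding to the embedded font.
Further, the glyph selecting unit 31 is configured to:
when the document contains a mapping between character codes and glyphs, determine from that mapping the glyphs corresponding to a plurality of preset common characters, and select the determined glyphs from the glyph set of the embedded font; or,
count the number of times each glyph of the embedded font occurs in the document, and select at least one glyph that occurs most often.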
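The frequency-based selection performed by the glyph selecting unit can be sketched with a counter. The function name `most_frequent_glyphs` and the idea that the document's text is available as a sequence of glyph identifiers are assumptions for illustration.

```python
from collections import Counter


def most_frequent_glyphs(glyph_ids, n=5):
    """Return the n glyph identifiers that occur most often, most frequent first.

    glyph_ids: the glyph identifiers used by the embedded font, in document order.
    """
    return [gid for gid, _ in Counter(glyph_ids).most_common(n)]
```

This route needs no character codes, which is why it suits embedded fonts whose glyph sets carry no mapping from character codes to glyphs.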
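Further, the glyph font determining unit 32 is configured to:
for each selected glyph, determine the character code corresponding to the glyph, compute the glyph feature value of the glyph, look up in a pre-generated glyph feature table the font corresponding to that character code and that glyph feature value, and take the font found as the font corresponding to the glyph; or,
for each selected glyph, compute the glyph feature value of the glyph, look up in the glyph feature table the font corresponding to that glyph feature value, and take the font found as the font corresponding to the glyph; the glyph feature table contains the mapping among character codes, fonts, and glyph feature values.
Further, the glyph font determining unit 32 is configured to:
when the glyph set contains a mapping between character codes and glyphs, determine the character code corresponding to the glyph from that mapping; or,
identify the character code of the glyph using OCR.
Further, the original font determining unit 33 is configured to:
if the fonts corresponding to the glyphs are all the same font, determine that font to be the original font corresponding to the embedded font; or,
determine the glyphs that correspond to the same font, determine whether those glyphs satisfy a set condition, and, if so, determine that font to be the original font corresponding to the embedded font.
Further, the device also includes:
a display unit 34, configured to, after the original font corresponding to the embedded font has been determined and when a character needs to be displayed, look up the glyph corresponding to the character to be displayed in the locally stored glyph set of the original font, and display the character using the glyph found.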
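In summary, the beneficial effects of the present invention include the following.
In the solution provided by the embodiments of the present invention, at least one glyph is first selected from the glyph set of the embedded font, the font corresponding to each selected glyph is then determined, and the original font corresponding to the embedded font is determined from the fonts corresponding to the glyphs. The solution therefore solves the problem of determining which font an embedded font used in a document corresponds to.
After the original font corresponding to the embedded font has been determined, when a character needs to be displayed, the glyph for that character is looked up in the locally stored glyph set of the original font and used to display the character. This resolves the problems caused by not being able to determine the original font of an embedded font used in a document. For example, when a user needs to add a character to a document and the glyph set of the document's embedded font does not contain that character's glyph, the glyph can be looked up in the locally stored glyph set of the original font and used to display the text, which avoids an editing failure. As another example, when a client needs to display a document stored on a server, the client can obtain locally the glyph set of the original font corresponding to the document's embedded font, without downloading the glyph set of the embedded font itself, which speeds up document display in a network environment.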
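The display fallback described above can be sketched as a two-level lookup. The function name `glyph_for_display` and the representation of each glyph set as a dictionary from character to glyph data are assumptions for illustration.

```python
def glyph_for_display(ch, embedded_glyphs, original_glyphs):
    """Return the glyph for a character, preferring the embedded font.

    embedded_glyphs: dict {character: glyph data} for the embedded subset.
    original_glyphs: dict {character: glyph data} for the locally stored original font.
    Returns None when neither glyph set contains the character.
    """
    glyph = embedded_glyphs.get(ch)
    if glyph is None:
        glyph = original_glyphs.get(ch)
    return glyph
```

For the client and server case in the text, the same idea applies with an empty embedded glyph set: every glyph then comes from the locally stored original font.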
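The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that through the computer …
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.
Obviously, those skilled in the art can make various modifications and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.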

Claims

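1. A font determining method, characterized in that the method comprises:
determining the embedded font used by a document;
selecting at least one glyph from the glyph set of the embedded font;
determining the font corresponding to each selected glyph;
determining, according to the fonts corresponding to the glyphs, the original font corresponding to the embedded font.
2. The method according to claim 1, characterized in that selecting at least one glyph from the glyph set of the embedded font specifically comprises:
when the document contains a mapping between character codes and glyphs, determining from that mapping the glyphs corresponding to a plurality of preset common characters, and selecting the determined glyphs from the glyph set of the embedded font; or,
counting the number of times each glyph of the embedded font occurs in the document, and selecting at least one glyph that occurs most often.
3. The method according to claim 1, characterized in that determining the font corresponding to each selected glyph specifically comprises:
for each selected glyph, determining the character code corresponding to the glyph, computing the glyph feature value of the glyph, looking up in a pre-generated glyph feature table the font corresponding to that character code and that glyph feature value, and taking the font found as the font corresponding to the glyph; or,
for each selected glyph, computing the glyph feature value of the glyph, looking up in the glyph feature table the font corresponding to that glyph feature value, and taking the font found as the font corresponding to the glyph;
wherein the glyph feature table contains the mapping among character codes, fonts, and glyph feature values.
4. The method according to claim 3, characterized in that determining the character code corresponding to the glyph specifically comprises:
when the glyph set contains a mapping between character codes and glyphs, determining the character code corresponding to the glyph from that mapping; or,
identifying the character code of the glyph using optical character recognition (OCR).
5. The method according to any one of claims 1 to 4, characterized in that determining, according to the fonts corresponding to the glyphs, the original font corresponding to the embedded font specifically comprises:
if the fonts corresponding to the glyphs are all the same font, determining that font to be the original font corresponding to the embedded font; or,
determining the glyphs that correspond to the same font, determining whether those glyphs satisfy a set condition, and, if so, determining that font to be the original font corresponding to the embedded font.
6. The method according to any one of claims 1 to 4, characterized in that, after the original font corresponding to the embedded font is determined, the method further comprises:
when a character needs to be displayed, looking up the glyph corresponding to the character to be displayed in the locally stored glyph set of the original font, and displaying the character using the glyph found.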
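7. A font determining device, characterized in that the device comprises:
an embedded font determining unit, configured to determine the embedded font used by a document;
a glyph selecting unit, configured to select at least one glyph from the glyph set of the embedded font, or to select the glyph corresponding to at least one character in the document that uses the embedded font;
a glyph font determining unit, configured to determine the font corresponding to each selected glyph;
an original font determining unit, configured to determine, according to the fonts corresponding to the glyphs, the original font corresponding to the embedded font.
8. The device according to claim 7, characterized in that the glyph selecting unit is configured to:
when the document contains a mapping between character codes and glyphs, determine from that mapping the glyphs corresponding to a plurality of preset common characters, and select the determined glyphs from the glyph set of the embedded font; or,
count the number of times each glyph of the embedded font occurs in the document, and select at least one glyph that occurs most often.
9. The device according to claim 7, characterized in that the glyph font determining unit is configured to:
for each selected glyph, determine the character code corresponding to the glyph, compute the glyph feature value of the glyph, look up in a pre-generated glyph feature table the font corresponding to that character code and that glyph feature value, and take the font found as the font corresponding to the glyph; or,
for each selected glyph, compute the glyph feature value of the glyph, look up in the glyph feature table the font corresponding to that glyph feature value, and take the font found as the font corresponding to the glyph;
wherein the glyph feature table contains the mapping among character codes, fonts, and glyph feature values.
10. The device according to claim 9, characterized in that the glyph font determining unit is configured to:
when the glyph set contains a mapping between character codes and glyphs, determine the character code corresponding to the glyph from that mapping; or,
identify the character code of the glyph using OCR.
11. The device according to any one of claims 7 to 10, characterized in that the original font determining unit is configured to:
if the fonts corresponding to the glyphs are all the same font, determine that font to be the original font corresponding to the embedded font; or,
determine the glyphs that correspond to the same font, determine whether those glyphs satisfy a set condition, and, if so, determine that font to be the original font corresponding to the embedded font.
12. The device according to any one of claims 7 to 10, characterized in that the device further comprises:
a display unit, configured to, after the original font corresponding to the embedded font has been determined and when a character needs to be displayed, look up the glyph corresponding to the character to be displayed in the locally stored glyph set of the original font, and display the character using the glyph found.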
PCT/CN2012/085773 2011-12-01 2012-12-03 Font determining method and device WO2013079038A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2014511731A JP5829330B2 (ja) 2011-12-01 2012-12-03 フォントを識別するための方法および装置
KR1020137030703A KR20140031269A (ko) 2011-12-01 2012-12-03 글꼴을 판별하는 방법 및 장치
US13/985,851 US20130322759A1 (en) 2011-12-01 2012-12-03 Method and device for identifying font
EP12852905.4A EP2787448A4 (en) 2011-12-01 2012-12-03 METHOD AND DEVICE FOR DETERMINING POLICE

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110393936.1A CN103136166B (zh) 2011-12-01 2011-12-01 Font determining method and device
CN201110393936.1 2011-12-01

Publications (1)

Publication Number Publication Date
WO2013079038A1 true WO2013079038A1 (zh) 2013-06-06

Family

ID=48496008

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/085773 WO2013079038A1 (zh) 2011-12-01 2012-12-03 Font determining method and device

Country Status (5)

Country Link
EP (1) EP2787448A4 (zh)
JP (1) JP5829330B2 (zh)
KR (1) KR20140031269A (zh)
CN (1) CN103136166B (zh)
WO (1) WO2013079038A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105488471B (zh) * 2015-11-30 2019-03-29 北大方正集团有限公司 一种字形识别方法及装置
CN105975448A (zh) * 2016-05-04 2016-09-28 北京华熙动博网络科技有限公司 一种字体加载方法及装置
CN107943760B (zh) * 2017-11-22 2021-09-21 万兴科技股份有限公司 Pdf文档编辑的字体优化方法、装置、终端设备和存储介质
CN109656821B (zh) * 2018-12-11 2022-06-07 万兴科技股份有限公司 测试方法及装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003223161A (ja) * 2002-01-30 2003-08-08 Canon Inc 情報処理システム、方法及び装置、プログラム並びに記憶媒体
CN101008940A (zh) * 2006-01-27 2007-08-01 北京书生国际信息技术有限公司 自动处理字体缺失的方法与装置
US20110188761A1 (en) * 2010-02-02 2011-08-04 Boutros Philip Character identification through glyph data matching
CN102567431A (zh) * 2010-12-31 2012-07-11 北大方正集团有限公司 文档处理方法及装置

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100661173B1 (ko) * 2005-10-18 2006-12-26 삼성전자주식회사 다이렉트 프린팅 기능을 갖는 프린터 및 그 인쇄방법
US8194280B2 (en) * 2007-01-31 2012-06-05 Konica Minolta Laboratory U.S.A., Inc. Direct printing of a desired or multiple appearances of object in a document file
CN101782896B (zh) * 2009-01-21 2011-11-30 汉王科技股份有限公司 结合ocr技术的pdf文字提取方法
CN102063415B (zh) * 2009-11-16 2012-07-25 北大方正集团有限公司 向pdf文件内嵌单字节字体的方法及其系统
US20110276872A1 (en) * 2010-05-06 2011-11-10 Xerox Corporation Dynamic font replacement

Also Published As

Publication number Publication date
CN103136166A (zh) 2013-06-05
KR20140031269A (ko) 2014-03-12
CN103136166B (zh) 2015-06-17
EP2787448A4 (en) 2016-03-16
EP2787448A1 (en) 2014-10-08
JP2014522519A (ja) 2014-09-04
JP5829330B2 (ja) 2015-12-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12852905

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 13985851

Country of ref document: US

ENP Entry into the national phase

Ref document number: 20137030703

Country of ref document: KR

Kind code of ref document: A

REEP Request for entry into the european phase

Ref document number: 2012852905

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2012852905

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2014511731

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE