WO2015010386A1 - Document format conversion apparatus and document format conversion method - Google Patents
- Publication number
- WO2015010386A1 (PCT/CN2013/086494)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- path
- character
- font file
- unicode
- path group
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/93—Document management systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/106—Display of layout of documents; Previewing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/109—Font handling; Temporal or kinetic typography
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/123—Storage facilities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
Definitions
- The present invention relates to the field of electronic document format conversion technology, and in particular to a document format conversion apparatus and a document format conversion method.
- In view of the above problems, the present invention proposes a new document format conversion technology. It solves the data redundancy problem in layout documents, so that the converted document is smaller. It also solves the problem of display errors during reflow (streaming rearrangement), so that a better display effect is easily achieved on various terminals.
- The present invention provides a document format conversion apparatus, including: a document parsing unit, configured to parse a layout document to obtain the path primitives constituting the layout document; a path grouping unit, configured to group the path primitives to generate path groups; a font file generating unit, configured to obtain the path groups that describe characters and to generate a font file corresponding to each such path group, wherein if two or more path groups describe the same character, only one font file is generated and that font file is associated with all of those path groups; and a document generating unit, configured to generate the converted document using all the generated font files.
- Grouping the path primitives in the layout document yields a path group corresponding to each character. Some path groups may not describe characters.
- Only one font file is generated for path groups that describe the same character. For a document containing many identical characters, this greatly reduces the size of the document itself, which helps users store more documents on a mobile device.
- The present invention also provides a document format conversion method, comprising: parsing a layout document to obtain the path primitives constituting the layout document; grouping the path primitives to generate path groups; obtaining the path groups that describe characters and generating a font file corresponding to each such path group, wherein if two or more path groups describe the same character, only one font file is generated and that font file is associated with all of those path groups; and generating the converted document using all the generated font files.
- Grouping the path primitives in the layout document yields a path group corresponding to each character. Some path groups may not describe characters. Characters are stored as font files, and only one font file is generated for path groups that describe the same character. A document containing many identical characters can therefore be greatly reduced in size, which helps users store more documents on a mobile device.
- FIG. 1A is a block diagram of a document format conversion apparatus according to an embodiment of the present invention;
- FIG. 1B is a block diagram of a document format conversion apparatus according to another embodiment of the present invention;
- FIG. 1C is a diagram of the connection relationships between the units of the document format conversion apparatus shown in FIG. 1B;
- FIG. 2 is a flow chart of a document format conversion method according to an embodiment of the present invention;
- FIG. 3 is a detailed flow chart of converting a layout document according to an embodiment of the present invention;
- FIG. 4 is a flow chart of grouping path primitives according to an embodiment of the present invention;
- FIG. 5 is a flow chart of determining whether a path group describes a character according to an embodiment of the present invention;
- FIG. 6 is a flow chart of determining whether path groups describe the same character, and of the corresponding processing, according to an embodiment of the present invention;
- FIG. 7 is a flow chart of generating a font file according to an embodiment of the present invention.
- Figure 1A shows a block diagram of a document format conversion apparatus in accordance with one embodiment of the present invention.
- The document format conversion apparatus includes: a document parsing unit 102, configured to parse a layout document to obtain the path primitives constituting the layout document; a path grouping unit 104, configured to group the path primitives to generate path groups;
- a font file generating unit 106, which obtains the path groups describing characters and generates a font file corresponding to each such path group, wherein if two or more path groups describe the same character, only one font file is generated and that font file is associated with those two or more path groups;
- and a document generating unit 108, which generates the converted document using all the generated font files.
- FIG. 1B shows a block diagram of a document format conversion apparatus according to an embodiment of the present invention.
- The document format conversion apparatus 100 includes the following units:
  - a document parsing unit 102, configured to parse a layout document to obtain the path primitives constituting the layout document;
  - a path grouping unit 104, configured to group the path primitives to generate the corresponding path groups. These include the path groups describing characters that are obtained by the font file generating unit 106, and other path groups that do not describe characters;
  - a font file generating unit 106, which obtains the path groups describing characters and generates the corresponding font files. If two or more path groups describe the same character, only one font file is generated, and that font file is associated with those two or more path groups;
  - a document generating unit 108, which generates the converted document using all the generated font files.
- In this embodiment, grouping the path primitives in the layout document yields a path group corresponding to each character. Some path groups may not describe characters. Characters are represented by generated font files, and only one font file is generated for path groups that describe the same character. For a document containing many identical characters, this greatly reduces the size of the document itself, which helps users store more documents on a mobile device.
- In a layout document, each character must be described by its own paths. Even identical, repeated characters are each described by separate paths, so the large number of paths causes heavy redundancy and inflates the document size. Generating font files allows identical characters to be described by a single shared font file. This greatly reduces the original path redundancy, which reduces the document size and solves the problem of document data redundancy.
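The redundancy argument can be illustrated with a minimal Python sketch. It assumes that each recognized character arrives as a `(character, outline, position)` tuple. `CharRef` and `build_shared_glyphs` are hypothetical names that do not come from the patent. The outline is a simplified point tuple rather than real glyph data. The sketch stores each distinct character's outline once and turns every occurrence into a lightweight reference.

```python
from dataclasses import dataclass

# Hypothetical outline type: a tuple of (x, y) points for one character
Outline = tuple[tuple[float, float], ...]


@dataclass(frozen=True)
class CharRef:
    """A placed character: which glyph to draw and where (replaces raw path data)."""
    codepoint: int
    x: float
    y: float


def build_shared_glyphs(occurrences):
    """Store one outline per distinct character and reference it from every occurrence.

    occurrences: iterable of (char, outline, (x, y)) tuples in document order.
    Returns (glyphs, refs): glyphs maps code point -> outline (stored once),
    refs holds one CharRef per occurrence.
    """
    glyphs: dict[int, Outline] = {}
    refs: list[CharRef] = []
    for char, outline, (x, y) in occurrences:
        cp = ord(char)
        # Only the first occurrence contributes outline data; later ones reuse it
        glyphs.setdefault(cp, outline)
        refs.append(CharRef(cp, x, y))
    return glyphs, refs


if __name__ == "__main__":
    box = ((0, 0), (1, 0), (1, 1), (0, 1))
    bar = ((0, 0), (1, 0))
    page = [("口", box, (0, 0)), ("一", bar, (12, 0)), ("口", box, (24, 0))]
    glyphs, refs = build_shared_glyphs(page)
    print(len(glyphs), "outlines stored for", len(refs), "characters")
```

This simplification keys only on the code point. The flow of FIG. 6 additionally compares origin-normalized coordinates before it reuses a font file.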
- The path grouping unit 104 includes: a circumscribed rectangle obtaining subunit 1042, configured to obtain the minimum circumscribed rectangle of each path primitive; and a grouping processing subunit 1044, configured to detect the positional relationships among the minimum circumscribed rectangles of all path primitives. If the minimum circumscribed rectangles of two path primitives intersect, or the distance between them is less than a preset character spacing, the two path primitives are assigned to the same path group.
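Below is a minimal sketch of this grouping rule. It assumes that each path primitive is given as a list of (x, y) points and that rectangles are axis-aligned. All names are hypothetical. The text does not specify whether grouping is transitive. This sketch assumes it is, and uses a union-find structure so that when A is close to B and B is close to C, all three end up in one group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned minimum circumscribed (bounding) rectangle."""
    x0: float
    y0: float
    x1: float
    y1: float


def bounding_rect(points):
    """Minimum axis-aligned rectangle enclosing all (x, y) points of a path."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return Rect(min(xs), min(ys), max(xs), max(ys))


def rect_gap(a, b):
    """Euclidean gap between two rectangles; 0.0 when they touch or intersect."""
    dx = max(b.x0 - a.x1, a.x0 - b.x1, 0.0)
    dy = max(b.y0 - a.y1, a.y0 - b.y1, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def group_paths(paths, char_spacing):
    """Group paths whose rectangles intersect or are closer than char_spacing.

    paths: list of point lists. Returns groups as lists of path indices.
    """
    rects = [bounding_rect(p) for p in paths]
    parent = list(range(len(paths)))  # union-find forest (assumed for transitivity)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            gap = rect_gap(rects[i], rects[j])
            if gap == 0.0 or gap < char_spacing:  # intersect, or closer than spacing
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(paths)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

The pairwise comparison costs O(n²) per page. A spatial index could reduce that cost, but it is omitted here to keep the rule easy to read.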
- the path primitive is used.
- the other path primitives are assigned to the same path group.
- The minimum circumscribed rectangle of each path primitive is taken as its corresponding region. Whether two primitives belong to the same path group is decided by checking whether their rectangles intersect and how far apart they are.
- Each character corresponds to one path group, although some path groups may not describe characters. The grouping process above therefore segments the layout document into individual characters.
- The apparatus further includes a description determining unit 110, configured to recognize each path group using optical character recognition (OCR). If the character corresponding to a path group can be recognized, that path group is determined to describe the character and is processed by the font file generating unit.
- The apparatus further includes: a Unicode identification unit 112, configured to identify the Unicode code point corresponding to each path group that describes a character; and a character description unit 114, which represents the described character using the identified code point and the corresponding font file.
- Because the font file contains the path data describing the character, the document content can be properly reflowed on different devices, giving a better display effect.
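The following hedged sketch uses greedy, character-level line breaking to illustrate why font-backed text helps reflow. Once a character is a (font, code point) reference rather than a fixed-position path, a viewer can re-break lines for any screen width. The `advances` mapping of per-character widths is a hypothetical input. In a real TrueType font, advance widths come from the font's `hmtx` table.

```python
def reflow(text, advances, line_width, default_advance=1.0):
    """Greedily break a character sequence into lines no wider than line_width.

    text: the characters recovered from (font, code point) references.
    advances: hypothetical per-character advance widths keyed by character;
    characters not listed fall back to default_advance.
    A single character wider than line_width still gets its own line.
    """
    lines, current, width = [], "", 0.0
    for ch in text:
        w = advances.get(ch, default_advance)
        # Break before this character if it would overflow a non-empty line
        if current and width + w > line_width:
            lines.append(current)
            current, width = "", 0.0
        current += ch
        width += w
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    body = "文档格式转换装置"
    for width in (8, 4, 3):  # the same text reflowed for three screen widths
        print(width, reflow(body, {}, width))
```

Character-level breaking suits CJK text. Latin text would normally break at word boundaries instead.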
- The font file generating unit 106 generates the font file using the Unicode code point identified by the Unicode identification unit 112 and the corresponding path group.
- Generating the font file from the code point and the path group ensures that the corresponding character is described accurately.
- The font file generating unit 106 includes:
  - a first table generating subunit 1062, configured to generate a first table from the Unicode code points; the first table stores the mapping from code points to glyph indices;
  - a second table generating subunit 1064, configured to generate a second table from the path primitives contained in the path group; the second table stores glyph indices and the glyph data corresponding to each index;
  - a table processing subunit 1066, configured to generate the font file using the first table and the second table.
- The first table is a cmap table generated from the Unicode code points.
- The second table is a glyf table generated from the path group.
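The two-table structure can be sketched as an in-memory model. The sketch assumes that glyph data is a list of contours of (x, y) points. `SimpleFont`, `build_cmap`, `build_glyf`, `build_font`, and `glyph_for` are hypothetical names. This is not a binary TrueType writer. It only mirrors two relationships: cmap maps a code point to a glyph index, and glyf maps a glyph index to an outline. Glyph index 0 is reserved for `.notdef`, following the usual TrueType convention.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleFont:
    """In-memory stand-in for a TrueType-style font (not a binary .ttf writer).

    cmap: Unicode code point -> glyph index (the "first table").
    glyf: glyph index -> glyph data, here a list of contours (the "second table").
    """
    name: str
    cmap: dict = field(default_factory=dict)
    glyf: list = field(default_factory=list)


def build_cmap(codepoints):
    """First table: map each (distinct) code point to a glyph index, keeping 0 for .notdef."""
    return {cp: i + 1 for i, cp in enumerate(codepoints)}


def build_glyf(path_groups):
    """Second table: index 0 is an empty .notdef glyph, then one entry per path group."""
    return [[]] + [list(group) for group in path_groups]


def build_font(name, codepoints, path_groups):
    """Table processing step: combine the two tables into one font object."""
    if len(codepoints) != len(path_groups):
        raise ValueError("each code point needs exactly one path group")
    return SimpleFont(name, build_cmap(codepoints), build_glyf(path_groups))


def glyph_for(font, codepoint):
    """Resolve a code point through cmap, then glyf; unknown code points get .notdef."""
    return font.glyf[font.cmap.get(codepoint, 0)]
```

When a font file covers a single character, as the per-character font files described here suggest, each table holds just one real entry.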
- The apparatus further includes: a recording state determining unit 116, configured to determine whether the Unicode code point identified by the Unicode identification unit 112 has already been recorded; and a data acquiring unit 118. When the code point has been recorded, the data acquiring unit 118 determines that another path group describing the same character exists, and acquires the recorded code point and the corresponding previously generated font file for the character description unit 114 to use in representing the described character. When the code point has not been recorded, the font file generating unit 106 generates the font file, which the character description unit 114 then uses to represent the described character.
- The apparatus further includes a file saving unit 120 and a coordinate determining unit 122.
  - The file saving unit 120 saves all font files in one place, so that the character description unit 114 can represent each character by the name of its font file and the corresponding Unicode code point.
  - When the code point acquired for a specified path group has already been recorded, the coordinate determining unit 122 further acquires the coordinates of that path group and checks whether they are the same as the coordinates of the recorded path group.
  - If the coordinates are the same, the two are treated as the same path group and no further processing is performed.
  - If the coordinates differ, a new name is generated. The character description unit 114 represents the character using the recorded code point and the new name, and the font file generating unit 106 generates a font file with that new name.
- FIG. 1C shows the connection relationships between the units of the document format conversion apparatus shown in FIG. 1B.
- The connection relationships between the units of the document format conversion apparatus 100 are as follows:
- The document parsing unit 102 is connected to the path grouping unit 104. Specifically, the path grouping unit 104 groups the path primitives parsed by the document parsing unit 102 to generate the corresponding path groups.
- The path grouping unit 104 includes a circumscribed rectangle obtaining subunit 1042 and a grouping processing subunit 1044.
- The circumscribed rectangle obtaining subunit 1042 is connected to the document parsing unit 102 and obtains the minimum circumscribed rectangle of each path primitive parsed by the document parsing unit 102.
- The grouping processing subunit 1044 is connected to the circumscribed rectangle obtaining subunit 1042. It detects the positional relationships among the minimum circumscribed rectangles of all path primitives and groups the path primitives according to those relationships.
- The description determining unit 110 is connected to the font file generating unit 106. It recognizes each path group to determine whether it describes a character, so that the font file generating unit 106 generates font files from the path groups that describe characters.
- The Unicode identification unit 112 is connected to the description determining unit 110. It identifies the corresponding code point for each path group that the description determining unit 110 has judged to describe a character.
- The Unicode identification unit 112 is also connected to the font file generating unit 106. This allows the font file generating unit 106 to generate the font file from the code point identified by the Unicode identification unit 112 and the corresponding path group generated by the path grouping unit 104.
- The font file generating unit 106 includes a first table generating subunit 1062, a second table generating subunit 1064, and a table processing subunit 1066.
- The first table generating subunit 1062 is connected to the Unicode identification unit 112 and generates the first table from the code points identified by the Unicode identification unit 112.
- The second table generating subunit 1064 is connected to the path grouping unit 104 and generates the second table from the path primitives contained in each path group.
- The table processing subunit 1066 is connected to both the first table generating subunit 1062 and the second table generating subunit 1064, and generates the font file using the first table and the second table.
- The character description unit 114 is connected to both the Unicode identification unit 112 and the font file generating unit 106. It represents the described character using the code point identified by the Unicode identification unit 112 and the corresponding font file generated by the font file generating unit 106.
- The recording state determining unit 116 is connected to both the Unicode identification unit 112 and the data acquiring unit 118. Specifically, the recording state determining unit 116 determines whether the code point identified by the Unicode identification unit 112 has already been recorded. If it has, the data acquiring unit 118 determines that another path group describing the same character exists. It then acquires the recorded code point and the corresponding previously generated font file, which the character description unit 114 uses to represent the described character. The data acquiring unit 118 is also connected to the character description unit 114.
- The file saving unit 120 is connected to both the font file generating unit 106 and the character description unit 114. Specifically, the file saving unit 120 saves the font files generated by the font file generating unit 106 in one place. The character description unit 114 then represents each character using the name of its font file and the corresponding code point.
- The coordinate determining unit 122 is connected to both the recording state determining unit 116 and the character description unit 114. The recording state determining unit 116 may find that the code point acquired for a specified path group has already been recorded. In that case, the coordinate determining unit 122 further acquires the coordinates of that path group and checks whether they are the same as the coordinates of the recorded path group. If they differ, a new name is generated. The character description unit 114 represents the character using the recorded code point and the new name, and the font file generating unit 106 generates a font file with that new name.
- FIG. 2 shows a flow chart of a document format conversion method in accordance with an embodiment of the present invention.
- A document format conversion method includes the following steps:
  - Step 202: parse a layout document to obtain the path primitives constituting the layout document.
  - Step 204: group the path primitives to generate the path groups. These include the path groups describing characters that are obtained in step 206, and other path groups that do not describe characters.
  - Step 206: obtain the path groups describing characters and generate the corresponding font files. If two or more path groups describe the same character, only one font file is generated, and it is associated with those two or more path groups.
  - Step 208: generate the converted document using all the generated font files.
- Step 204 includes: acquiring the minimum circumscribed rectangle of each path primitive; and detecting the relationships among the minimum circumscribed rectangles of all path primitives. If the minimum circumscribed rectangles of two path primitives intersect, or the distance between them is less than the preset character spacing, the two path primitives are assigned to the same path group.
- The minimum circumscribed rectangle of each path primitive is taken as its corresponding region. Whether primitives belong in the same path group is determined by calculating whether their rectangular regions intersect and how far apart they are.
- Each character corresponds to one path group, although some path groups may not describe characters. The grouping process above therefore divides the layout document into individual characters.
- The step of obtaining the path groups describing characters includes recognizing each path group using optical character recognition (OCR). If the character corresponding to a path group can be recognized, that path group is determined to describe the character.
- The method further includes identifying the Unicode code point of each path group that describes a character, and representing the described character using that code point and the corresponding font file.
- Because the font file contains the path data describing the character, the document content can be properly reflowed on different devices, giving a better display effect.
- The step of generating the font file includes generating it from the identified code point and the corresponding path group.
- Generating the font file from the code point and the path group ensures that the corresponding character is described accurately.
- The step of generating the font file from the code point and the corresponding path group includes:
  - generating a first table from the code point; the first table stores the mapping from code points to glyph indices;
  - generating a second table from the path primitives contained in the path group; the second table stores glyph indices and the glyph data (that is, path data) corresponding to each index;
  - generating the font file using the first table and the second table.
- The first table is a cmap table generated from the Unicode code points.
- The second table is a glyf table generated from the path group.
- After the code point is identified, the method further determines whether it has already been recorded.
  - If it has, another path group describing the same character exists, and the recorded code point and the corresponding previously generated font file are used to represent the described character.
  - If it has not, the font file is generated to represent the described character.
- In other words, once the code point is identified, the method checks whether the current character has already been processed, that is, whether the same character already exists. If it does, the previously generated font file and related data are reused directly, which avoids data redundancy. If it does not, they are generated. This comparison ensures that each character corresponds to only one font file, avoiding data redundancy and reducing the document size.
- The method further includes saving all font files in one place and representing each character by the name of its font file and the corresponding Unicode code point.
  - If the code point acquired for a specified path group has already been recorded, the coordinates of that path group are further acquired and compared with the coordinates of the recorded path group.
  - If the coordinates are the same, the two are the same path group and no further processing is performed.
  - If the coordinates differ, a new name is generated. The character is represented using the recorded code point and the new name, and a font file with that new name is generated.
- FIG. 3 shows a detailed flow chart of converting a layout document according to an embodiment of the present invention.
- A specific process for converting a layout document includes: Step 302: parse the layout document data.
- The original layout document may be parsed using a parsing engine.
- Step 304: acquire the primitives constituting the layout document from the parsing result.
- Step 306: determine whether the primitive is a path. Parsing the layout document data yields the primitive ID, primitive type, primitive data, and so on, so the primitive type shows whether a primitive is a path. If it is, go to step 308; otherwise, go to step 310.
- Step 308: group the paths into path groups. Each path group describes one complete element, for example a character.
- Step 310: perform the processing appropriate to the primitive type.
- Step 312: determine whether the path group describes a character. If yes, go to step 314; otherwise, go to step 316.
- Step 314: generate a font file.
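The FIG. 3 control flow can be sketched as a small driver. The sketch assumes that the parser yields primitives as dicts with a `"type"` key. The hooks `group_paths`, `recognize`, and `handle_other` are hypothetical placeholders for steps 308, 312, and 310. Font generation in step 314 is left to the caller.

```python
from typing import Callable, Iterable, Optional


def convert_primitives(
    primitives: Iterable[dict],
    group_paths: Callable[[list], list],
    recognize: Callable[[list], Optional[str]],
    handle_other: Callable[[dict], None],
):
    """Skeleton of the FIG. 3 flow; the three callables are hypothetical hooks.

    Returns (char_groups, other_groups): char_groups holds (character, group)
    pairs ready for font generation (step 314); other_groups holds path groups
    that do not describe characters (step 316).
    """
    paths = []
    for prim in primitives:             # step 304: primitives from the parse result
        if prim["type"] == "path":      # step 306: decide by primitive type
            paths.append(prim)
        else:
            handle_other(prim)          # step 310: type-specific processing
    char_groups, other_groups = [], []
    for group in group_paths(paths):    # step 308: form path groups
        ch = recognize(group)           # step 312: does the group describe a character?
        if ch is not None:
            char_groups.append((ch, group))
        else:
            other_groups.append(group)
    return char_groups, other_groups
```

Passing the stages in as callables keeps the control flow separate from any particular parser or OCR engine.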
- FIG. 4 illustrates a flow chart for grouping path primitives in accordance with an embodiment of the present invention.
- The flow of grouping path primitives includes:
- Step 402: acquire path primitive data, that is, a primitive of the path type.
- Step 404: calculate the minimum circumscribed rectangle of the path primitive and use it as the primitive's region.
- Step 406: determine whether the path primitive currently being processed is a start path, that is, the first path of a path group. If yes, go to step 408; otherwise, go to step 410. After the previous path group is complete, the next path primitive to be processed becomes the start path.
- Step 408: save the coordinates of the minimum circumscribed rectangle and return to step 402.
- Step 410: calculate the distance between the minimum circumscribed rectangle of the start path and that of the current path to determine their relationship.
- Here, the minimum circumscribed rectangle of the start path is the coordinate data saved in step 408.
- Step 412: based on the result of step 410, determine whether the two rectangles intersect or, if they do not, whether the spacing between them is smaller than the character spacing (or another preset distance). If they intersect or the spacing is smaller, go to step 416; otherwise, go to step 414.
- Step 414: use the current path primitive as the start path of the next path group, and proceed to the step …
- Step 416: assign these paths to the same path group.
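Below is a minimal sketch of the FIG. 4 loop. It assumes that rectangles arrive as `(x0, y0, x1, y1)` tuples in drawing order, and that "distance" means the Euclidean gap between rectangles, which is zero when they intersect. As in steps 408 to 412, each path is compared only with the saved rectangle of the current start path. `group_sequential` is a hypothetical name.

```python
def group_sequential(rects, char_spacing):
    """FIG. 4-style grouping of rectangles processed in drawing order.

    rects: list of (x0, y0, x1, y1) minimum circumscribed rectangles.
    Returns groups as lists of indices into rects.
    """
    def gap(a, b):
        # Euclidean gap between rectangles; 0.0 when they touch or intersect
        dx = max(b[0] - a[2], a[0] - b[2], 0.0)
        dy = max(b[1] - a[3], a[1] - b[3], 0.0)
        return (dx * dx + dy * dy) ** 0.5

    groups, current, start = [], [], None
    for i, rect in enumerate(rects):       # steps 402/404: next path and its rectangle
        if start is None:                  # step 406: first path of a new group
            start, current = rect, [i]     # step 408: save the start rectangle
            continue
        d = gap(start, rect)               # step 410: distance to the start path
        if d == 0.0 or d < char_spacing:   # step 412: intersecting or close enough
            current.append(i)              # step 416: same path group
        else:
            groups.append(current)         # close the finished group
            start, current = rect, [i]     # step 414: new start path
    if current:
        groups.append(current)
    return groups
```

Unlike a pairwise comparison of all rectangles, this single pass depends on drawing order. A stroke that is far from the start path starts a new group, even if it touches a later stroke of the same character.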
- FIG. 5 illustrates a flow chart for determining whether a path group is used to describe a character, in accordance with an embodiment of the present invention.
- The process of determining whether a path group describes a character includes:
- Step 502: obtain a path group.
- Step 504: calculate the minimum circumscribed rectangle of the path group and use it as the group's region.
- Step 506: recognize the path group using OCR to obtain the corresponding character.
- Step 508: determine whether a corresponding Unicode code point can be identified for the recognized character. If yes, go to step 512; otherwise, go to step 510.
- Step 512: process the path group as a character.
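The FIG. 5 check can be sketched with OCR hidden behind a hypothetical `recognize` callable, since the patent names no OCR engine. This sketch makes two assumptions: a result counts as a character only when exactly one character is recognized, and the code point is derived with `ord`. A real system would also rasterize the path group before recognition.

```python
from typing import Callable, Optional

Rect = tuple[float, float, float, float]  # (x0, y0, x1, y1)


def group_rect(rects):
    """Step 504: minimum circumscribed rectangle of a whole path group."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def describe_character(rects, recognize: Callable[[Rect], Optional[str]]):
    """Steps 504 to 512: return the group's Unicode code point, or None.

    recognize is a hypothetical OCR hook that receives the group's region and
    returns the recognized text, or None when nothing is recognized.
    """
    region = group_rect(rects)
    text = recognize(region)               # step 506: OCR on the group's region
    # step 508: assume a code point exists only when exactly one character is read
    if text is not None and len(text) == 1:
        return ord(text)                   # step 512: process the group as a character
    return None                            # not treated as a character
```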
- Step 312 further includes determining whether two or more path groups describe the same character.
- FIG. 6 illustrates a flowchart for determining whether path groups describe the same character, and the corresponding processing method, according to an embodiment of the present invention.
- The process of determining whether path groups describe the same character, and the corresponding processing method, includes:
- Step 602 Obtain a character path group.
- Step 604 Identify the Unicode code of the character described by the character path group and search for it in the processed character list, which stores the Unicode codes of the characters described by previously processed character path groups.
- Step 606 Determine, from the search result, whether the recognized Unicode code already exists in the processed character list. If yes, go to step 612; otherwise, go to step 608.
- Step 608 The character described by the current path group appears for the first time in the layout document; add the recognized Unicode code to the processed character list.
- Step 610 Generate a corresponding font file according to the Unicode code and the path, and return to step 602 to continue processing other path groups.
- Step 612 Obtain the coordinates of the current path group and of the path group found in the list, and transform both sets of coordinates. Specifically, each set may be translated to the coordinate origin before the two are compared.
- Step 614 Determine whether the two sets of coordinates are the same. If yes, go to step 616; otherwise, go to step 610.
- Step 616 The two path groups describe the same character. Represent the character with the stored Unicode code and font name (the internal file name of the font file) instead of the original path data, and return to step 602 to process the remaining path groups.
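The deduplication of steps 602 to 616 can be sketched as follows, as a minimal sketch under stated assumptions. Each path group is simplified to a list of `(x, y)` points; step 612's translation is taken to mean shifting the points so their minimum x and y become 0; coordinates are rounded to absorb floating-point noise (an assumption, not in the source); and actual font file generation is abstracted to a hypothetical font name. The class and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Shape = Tuple[Point, ...]


def normalize(points: Sequence[Point], ndigits: int = 3) -> Shape:
    """Step 612: translate a path's points so its minimum x and y are 0."""
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    # Rounding tolerates tiny floating-point differences (assumption).
    return tuple((round(x - min_x, ndigits), round(y - min_y, ndigits)) for x, y in points)


class CharacterDeduplicator:
    """Sketch of steps 602-616: one font file per distinct (code point, outline)."""

    def __init__(self) -> None:
        # Processed character list: code point -> [(normalized outline, font name)].
        self.processed: Dict[int, List[Tuple[Shape, str]]] = {}
        self.fonts_created = 0

    def process(self, code_point: int, points: Sequence[Point]) -> Tuple[str, bool]:
        """Return (font name, whether a new font file was generated) for one group."""
        shape = normalize(points)
        # Steps 604/606: look up the code point; steps 612/614: compare outlines.
        for known_shape, font_name in self.processed.get(code_point, []):
            if known_shape == shape:
                return font_name, False  # step 616: reuse code point and font name
        # Steps 608/610: first occurrence, or a different outline: new font file.
        font_name = f"font_{self.fonts_created}"  # hypothetical naming scheme
        self.fonts_created += 1
        self.processed.setdefault(code_point, []).append((shape, font_name))
        return font_name, True
```

Keeping every distinct outline per code point is one reading of what follows when step 614 falls through to step 610; the text does not say which stored path group later groups are compared against.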
- FIG. 7 illustrates a flowchart for generating a font file in accordance with an embodiment of the present invention.
- The process of generating a font file includes:
- Step 702 Pass in a Unicode code and a path description (i.e., the path group corresponding to the Unicode code).
- Step 704 Generate a cmap table from the Unicode code.
- Step 706 Store the path description in the glyf table. The other tables that an OpenType font file requires must also be generated.
- Step 708 Generate the corresponding OpenType font file from the generated cmap table, glyf table and other required tables, and save the font file.
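Step 704 only states that a cmap table is generated from the Unicode code. As a hedged illustration of what such a table contains, the Python sketch below packs a cmap format 4 subtable, the BMP subtable layout defined by the OpenType specification, mapping each code point to a glyph ID. It assumes BMP code points only and glyph IDs starting at 1 (glyph 0 is reserved for `.notdef`); `build_cmap_format4` and `lookup_format4` are hypothetical names. A complete font would still need the enclosing cmap header with an encoding record (platform 3, encoding 1), the glyf and loca tables from step 706, and other required tables such as head, hhea, hmtx, maxp, name, OS/2 and post; in practice a library such as fontTools can assemble these.

```python
import struct
from typing import Dict, Sequence


def build_cmap_format4(mapping: Dict[int, int]) -> bytes:
    """Build a cmap format 4 subtable mapping BMP code points to glyph IDs.

    Uses one segment per code point plus the mandatory final 0xFFFF segment,
    so every idRangeOffset is 0 and no glyphIdArray is needed.
    """
    codes = sorted(mapping)
    if any(not 0 <= c < 0xFFFF for c in codes):
        raise ValueError("format 4 only covers code points 0x0000-0xFFFE")
    starts = codes + [0xFFFF]
    ends = codes + [0xFFFF]
    # idDelta is added to the code point modulo 65536 to give the glyph ID.
    deltas = [(mapping[c] - c) & 0xFFFF for c in codes] + [1]  # 0xFFFF -> glyph 0
    range_offsets = [0] * len(starts)
    seg_count = len(starts)
    entry_selector = seg_count.bit_length() - 1  # floor(log2(segCount))
    search_range = 2 * (1 << entry_selector)
    range_shift = 2 * seg_count - search_range
    length = 16 + 8 * seg_count  # 14-byte header + reservedPad + four arrays

    def u16_array(values: Sequence[int]) -> bytes:
        return struct.pack(f">{len(values)}H", *values)

    header = struct.pack(">7H", 4, length, 0, 2 * seg_count,
                         search_range, entry_selector, range_shift)
    return (header + u16_array(ends) + struct.pack(">H", 0)  # reservedPad
            + u16_array(starts) + u16_array(deltas) + u16_array(range_offsets))


def lookup_format4(table: bytes, code_point: int) -> int:
    """Resolve a code point against a subtable built above (0 means .notdef)."""
    seg_x2 = struct.unpack_from(">H", table, 6)[0]
    n = seg_x2 // 2
    ends = struct.unpack_from(f">{n}H", table, 14)
    starts = struct.unpack_from(f">{n}H", table, 16 + seg_x2)
    deltas = struct.unpack_from(f">{n}H", table, 16 + 2 * seg_x2)
    offsets = struct.unpack_from(f">{n}H", table, 16 + 3 * seg_x2)
    for start, end, delta, offset in zip(starts, ends, deltas, offsets):
        if start <= code_point <= end:
            if offset != 0:
                raise NotImplementedError("glyphIdArray lookups are not needed here")
            return (code_point + delta) & 0xFFFF
    return 0
```

One segment per code point keeps the encoder trivially correct at the cost of size; merging runs of consecutive code points with consecutive glyph IDs into a single segment would shrink the table.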
- The present disclosure also provides one or more computer-readable media having computer-executable instructions that, when executed by a computer, perform a method comprising: parsing a layout document to obtain the path primitives that make up the layout document; grouping the path primitives to generate path groups; obtaining the path groups that describe characters, and generating font files corresponding to those path groups, wherein, if two or more path groups describe the same character, only one font file is generated and that font file is associated with all of those path groups; and generating the converted document using all the generated font files.
- The present disclosure also provides a computer comprising one or more computer-readable media with computer-executable instructions that, when executed by the computer, perform the method of claim 9.
- A computer or computing device such as those described herein has hardware, including one or more processors or processing units, system memory, and some form of computer-readable media.
- Computer-readable media include computer storage media and communication media.
- Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
- Communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal, such as a carrier wave or other transmission mechanism, and include any information delivery media. Combinations of any of the above are also included within the scope of computer-readable media.
- Embodiments of the invention may be described in the general context of computer-executable instructions, such as program modules, being executed by one or more computers or other devices.
- Computer executable instructions can be organized as software into one or more computer executable components or modules.
- Program modules include, but are not limited to, routines, programs, objects, components, and data structures that perform particular tasks or implement particular abstract data types.
- Aspects of the invention may be implemented using any number and organization of such components or modules. For example, aspects of the invention are not limited to the specific computer-executable instructions or the specific components or modules illustrated in the figures and described herein.
- Other embodiments of the invention may include different computer-executable instructions or components having more or less functionality than those illustrated and described herein.
- Aspects of the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
- In a distributed computing environment, program modules can be located in both local and remote computer storage media, including memory storage devices.
- The present invention provides a document format conversion apparatus and a document format conversion method that solve the problem of data redundancy in layout documents, so that the converted document is smaller, and that also solve the problem of display errors in reflowed (streaming rearrangement) layouts, achieving a better display effect on various terminals.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- General Business, Economics & Management (AREA)
- Software Systems (AREA)
- Business, Economics & Management (AREA)
- Multimedia (AREA)
- Document Processing Apparatus (AREA)
- Controls And Circuits For Display Device (AREA)
- Character Discrimination (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/399,337 US9529781B2 (en) | 2013-07-22 | 2013-11-04 | Apparatus and method for document format conversion |
EP13890226.7A EP3026571A4 (en) | 2013-07-22 | 2013-11-04 | Document format conversion device and document format conversion method |
JP2016528295A JP2016532190A (ja) | 2013-07-22 | 2013-11-04 | Document format conversion apparatus and method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310309016.6 | 2013-07-22 | ||
CN201310309016.6A CN104331391B (zh) | 2013-07-22 | 2013-07-22 | Document format conversion apparatus and document format conversion method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015010386A1 (zh) | 2015-01-29 |
Family
ID=52392652
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2013/086494 WO2015010386A1 (zh) | Document format conversion apparatus and document format conversion method | 2013-07-22 | 2013-11-04 |
Country Status (5)
Country | Link |
---|---|
US (1) | US9529781B2 (zh) |
EP (1) | EP3026571A4 (zh) |
JP (1) | JP2016532190A (zh) |
CN (1) | CN104331391B (zh) |
WO (1) | WO2015010386A1 (zh) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9684986B1 (en) * | 2015-02-25 | 2017-06-20 | Amazon Technologies, Inc. | Constructing fonts from scanned images for rendering text |
CN105404683A (zh) * | 2015-11-30 | 2016-03-16 | 北大方正集团有限公司 | Layout document processing method and apparatus |
CN109614594B (zh) * | 2018-11-27 | 2023-05-30 | 浙江万朋数智科技股份有限公司 | Method for parsing a question document into question bank data |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7165216B2 (en) * | 2004-01-14 | 2007-01-16 | Xerox Corporation | Systems and methods for converting legacy and proprietary documents into extended mark-up language format |
US7315868B1 (en) * | 2001-12-21 | 2008-01-01 | Unisys Corporation | XML element to source mapping tree |
CN102866986A (zh) * | 2012-08-30 | 2013-01-09 | 中国矿业大学 | Document format conversion system |
CN103186513A (zh) * | 2011-12-31 | 2013-07-03 | 北大方正集团有限公司 | Method and apparatus for document format conversion |
Family Cites Families (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US2005101A (en) * | 1934-09-26 | 1935-06-18 | Herberts Machinery Co Ltd | Lathe |
JPS60233781A (ja) * | 1984-05-07 | 1985-11-20 | Oki Electric Ind Co Ltd | Character classification method |
US5412771A (en) * | 1992-02-07 | 1995-05-02 | Signature Software, Inc. | Generation of interdependent font characters based on ligature and glyph categorizations |
CA2125608A1 (en) * | 1993-06-30 | 1994-12-31 | George M. Moore | Method and system for providing substitute computer fonts |
JP3344062B2 (ja) * | 1994-03-18 | 2002-11-11 | 富士通株式会社 | Katakana handwritten character segmentation circuit |
US6741743B2 (en) * | 1998-07-31 | 2004-05-25 | Prc. Inc. | Imaged document optical correlation and conversion system |
US6678410B1 (en) * | 1999-02-17 | 2004-01-13 | Adobe Systems Incorporated | Generating a glyph |
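JP2000330546A (ja) * | 1999-05-25 | 2000-11-30 | Hitachi Ltd | Font creation device and storage medium for font creation |
JP2001043212A (ja) * | 1999-07-23 | 2001-02-16 | Internatl Business Mach Corp <Ibm> | Method for normalizing character information in an electronic document |
JP2001282776A (ja) * | 2000-03-30 | 2001-10-12 | Canon Inc | Document processing apparatus, document processing method, and storage medium |
JP3958003B2 (ja) * | 2000-09-29 | 2007-08-15 | 独立行政法人科学技術振興機構 | Character recognition method, character recognition program, computer-readable recording medium storing the character recognition program, and character recognition device |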
US20040205568A1 (en) * | 2002-03-01 | 2004-10-14 | Breuel Thomas M. | Method and system for document image layout deconstruction and redisplay system |
US7310769B1 (en) * | 2003-03-12 | 2007-12-18 | Adobe Systems Incorporated | Text encoding using dummy font |
US20050105799A1 (en) * | 2003-11-17 | 2005-05-19 | Media Lab Europe | Dynamic typography system |
JP4393161B2 (ja) * | 2003-11-20 | 2010-01-06 | キヤノン株式会社 | Image processing apparatus and image processing method |
ZA200409347B (en) * | 2003-12-01 | 2005-07-27 | Inventio Ag | Lift system |
JP2007128370A (ja) * | 2005-11-04 | 2007-05-24 | Nec Corp | Document management server, document management system, document management method, and document management program |
US8438472B2 (en) * | 2009-01-02 | 2013-05-07 | Apple Inc. | Efficient data structures for parsing and analyzing a document |
US8266179B2 (en) * | 2009-09-30 | 2012-09-11 | Hewlett-Packard Development Company, L.P. | Method and system for processing text |
CN102591849B (zh) * | 2011-01-07 | 2014-07-30 | 北大方正集团有限公司 | Method and apparatus for document format conversion |
US8768061B2 (en) * | 2012-05-02 | 2014-07-01 | Xerox Corporation | Post optical character recognition determination of font size |
2013
- 2013-07-22 CN CN201310309016.6A patent/CN104331391B/zh not_active Expired - Fee Related
- 2013-11-04 JP JP2016528295A patent/JP2016532190A/ja active Pending
- 2013-11-04 US US14/399,337 patent/US9529781B2/en not_active Expired - Fee Related
- 2013-11-04 WO PCT/CN2013/086494 patent/WO2015010386A1/zh active Application Filing
- 2013-11-04 EP EP13890226.7A patent/EP3026571A4/en not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
See also references of EP3026571A4 * |
Also Published As
Publication number | Publication date |
---|---|
EP3026571A1 (en) | 2016-06-01 |
JP2016532190A (ja) | 2016-10-13 |
US9529781B2 (en) | 2016-12-27 |
US20150339271A1 (en) | 2015-11-26 |
CN104331391B (zh) | 2018-02-02 |
EP3026571A4 (en) | 2017-04-12 |
CN104331391A (zh) | 2015-02-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
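WO2019019361A1 (zh) | | Database data processing method and apparatus, computer device, and storage medium |
WO2023131218A1 (zh) | | Storage of graph data |
CN110888842A (zh) | | File storage method, file query method, apparatus, and device |
CN107566090B (zh) | | Method and apparatus for processing fixed-length/variable-length text messages |
WO2018188373A1 (zh) | | Page sharing method and apparatus, server, and storage medium |
CN109062906B (zh) | | Method and apparatus for translating programming language resources |
US20160224554A1 (en) | | Search methods, servers, and systems |
WO2022099868A1 (zh) | | Method and apparatus for analyzing smart pen writing behavior features, and electronic device |
CN110502645B (zh) | | Information query method and apparatus |
WO2015010386A1 (zh) | | Document format conversion apparatus and document format conversion method |
CN109947431A (zh) | | Code generation method, apparatus, device, and storage medium |
CN113467777A (zh) | | Path identification method, apparatus, and system |
WO2018028127A1 (zh) | | Method and apparatus for parsing stored files |
CN109429260B (zh) | | Method and apparatus for verifying northbound data |
WO2024041301A1 (zh) | | Method and apparatus for generating a unified abstract syntax tree and for program analysis |
WO2024113874A1 (zh) | | Encoding method and decoding method for circular two-dimensional codes |
CN113391972A (zh) | | Interface testing method and apparatus |
CN104753891B (zh) | | XML message parsing method and apparatus |
CN109446052B (zh) | | Application program verification method and device |
CN115729887A (zh) | | File parsing method and apparatus, and computer-readable medium |
WO2021135103A1 (zh) | | Semantic analysis method and apparatus, computer device, and storage medium |
CN115390936A (zh) | | Unified verification method, apparatus, device, and storage medium |
WO2022104998A1 (zh) | | Handwriting content evaluation method and apparatus, and electronic device |
US10038604B2 (en) | | Processing method and apparatus for signaling tracing |
CN112445811A (zh) | | SQL-configuration-based data service method, apparatus, storage medium, and component |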
Legal Events
Code | Title | Description |
---|---|---|
WWE | WIPO information: entry into national phase | Ref document number: 14399337; Country of ref document: US |
121 | EP: The EPO has been informed by WIPO that EP was designated in this application | Ref document number: 13890226; Country of ref document: EP; Kind code of ref document: A1 |
WWE | WIPO information: entry into national phase | Ref document number: 2013890226; Country of ref document: EP |
ENP | Entry into the national phase | Ref document number: 2016528295; Country of ref document: JP; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |