CN103488619A - Method and device for processing document file - Google Patents
- Publication number
- CN103488619A
- Authority
- CN
- China
- Prior art keywords
- document files
- word
- merging
- document
- merged
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Document Processing Apparatus (AREA)
- Processing Or Creating Images (AREA)
Abstract
The invention discloses a method and a device for processing a document file. The method for processing the document file comprises the following steps of extracting document file elements from the document file; combining the document file elements according to the types of the document file elements and the position information of the document file elements in the document file so as to generate the combined document file. By the adoption of the method and the device, the document file elements are combined according to the types of the document file elements which are extracted from the document file and the position information of the document file elements in the document file so as to generate the combined document file. The document file self-adaptive to a screen of user equipment can be generated without manual editing of each document file.
Description
Technical field
The present invention relates to document file processing technology, and in particular to a method and device for processing document files.
Background
Reading document files on mobile devices is now very common. To present a document file of a given format, document editing software or document reading software that supports that format must be installed on the computing device. That is, each document file element is rendered according to its presentation attributes in the document file, and the document file is thereby presented. The presentation attributes include but are not limited to the coordinate information of the document file element in the document file and its style information, where the style information includes the font, size and color of the text. If editing or reading software for the various document file formats is not installed, mobile devices of the prior art can adapt to the screen only document files that have a simple structure and a specific, manually edited format. Document files in the most widely used formats today, such as the Microsoft Office series, PDF and the OpenOffice series, cannot be batch-processed into document files that adapt to the screen of the user equipment. This fails to meet the current demand for large-scale reading of electronic document files, causes considerable inconvenience to users, and degrades the reading experience.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and device for processing document files, so as to solve the prior-art problem that document files in most formats cannot be batch-processed into document files that adapt to the screen of the user equipment.
According to one aspect of the present invention, a method for processing document files is provided, comprising:
Extracting document file elements from a document file;
Merging the document file elements according to the types of the document file elements and the position information of the document file elements in the document file, so as to generate a merged document file.
According to another aspect of the present invention, a document file processing device for processing document files is also provided, comprising:
An element extraction device, configured to extract document file elements from a document file;
A merging device, configured to merge the document file elements according to the types of the document file elements and the position information of the document file elements in the document file, so as to generate a merged document file.
The present invention merges the document file elements according to the types of the elements extracted from the document file and their position information in the document file, and generates a merged document file. A document file that adapts to the screen of the user equipment can thus be generated without manually editing each document file.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
Fig. 1 is a flowchart of one embodiment of the method for document file processing of the present invention;
Figs. 2a-2c are schematic diagrams of two figures with adjacent coordinates in an embodiment of the present invention;
Fig. 3 is a schematic diagram of one embodiment of the device for document file processing of the present invention.
In the drawings, the same or similar reference numerals denote the same or similar components.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings.
In this document:
"Document file" refers to a file generated by editing with document editing software and containing document file elements such as text and figures. Specifically, document files include but are not limited to files in formats such as Word, Excel, PDF, OpenOffice, RTF, XML, TXT and EPUB;
"Document file element" refers to an element in a document file, including but not limited to text and figures.
Fig. 1 shows a schematic flowchart of a method for document file processing according to one embodiment of the present invention.
As shown in Fig. 1, in step S101, document file elements are extracted from a document file. The document file elements include but are not limited to at least one of the following: text, figures.
In step S102, the document file elements are merged according to the types of the document file elements and their position information in the document file, so as to generate a merged document file.
Specifically, a plurality of document file elements may be merged according to their types and position information in at least one of the following ways:
i) When the document file elements are text, the text is merged into one word sequence according to the arrangement mode and coordinate information of the text.
It will be appreciated that common text arrangements include horizontal and vertical layout. When the text is arranged horizontally, multiple segments of horizontal text that lie on the same line between adjacent line breaks but are separated by other document file elements such as figures may be merged, according to the coordinate information of the text, into one word sequence arranged continuously in the horizontal direction. When the text is arranged vertically, multiple segments of vertical text that lie in the same column between adjacent line breaks but are separated by other document file elements such as figures may be merged, according to the coordinate information of the text, into one word sequence arranged continuously in the vertical direction.
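The horizontal case of mode i) can be sketched as follows. This is a minimal illustration and not the patented implementation: the `TextRun` type, the `line_tolerance` parameter and the assumption that all input runs lie between the same pair of adjacent line breaks are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextRun:
    """Hypothetical text fragment with the coordinates of its top-left corner."""
    text: str
    x: float
    y: float


def merge_horizontal_runs(runs: List[TextRun], line_tolerance: float = 1.0) -> List[str]:
    """Merge fragments that share a line into one continuous word sequence per line.

    Assumption: two runs are on the same line when their y coordinates differ
    by at most `line_tolerance` from the first run found on that line.
    """
    lines: List[List[TextRun]] = []
    for run in sorted(runs, key=lambda r: r.y):
        if lines and abs(run.y - lines[-1][0].y) <= line_tolerance:
            lines[-1].append(run)   # same line as the previous group
        else:
            lines.append([run])     # start a new line
    # Within each line, restore left-to-right order and join the fragments.
    return ["".join(r.text for r in sorted(line, key=lambda r: r.x)) for line in lines]
```

The vertical case would be symmetric under the same assumptions: group by x and order by y within each column.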
ii) For two adjacent figures, the two figures are merged into one figure according to their coordinate information.
Specifically, whether two figures are coordinate-adjacent may be determined as follows:
First, the minimum bounding rectangle of each of the two figures is obtained from the coordinate information of the figures; the minimum bounding rectangle is determined by the minimum and maximum abscissa and the minimum and maximum ordinate of the corresponding figure;
Next, it is determined whether the gap between the minimum bounding rectangles of the two figures is smaller than a predetermined value. If it is, the two figures are coordinate-adjacent, as shown in Figs. 2a-2c. The predetermined value may be set according to the actual document file and experience; preferably, it may be set to 2 pixels.
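The adjacency test for mode ii) can be sketched as follows. The text does not define how the gap between two rectangles is measured, so the Euclidean distance between the closest edges used here is an assumption, as are all function names; only the 2-pixel threshold comes from the description.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

ADJACENCY_THRESHOLD = 2.0  # pixels, the preferred value named in the description


def bounding_rect(points: Iterable[Point]) -> Rect:
    """Minimum bounding rectangle from the extreme abscissas and ordinates."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def rect_gap(a: Rect, b: Rect) -> float:
    """Distance between two rectangles; 0 when they touch or overlap (assumed metric)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)


def are_adjacent(a: Rect, b: Rect, threshold: float = ADJACENCY_THRESHOLD) -> bool:
    """Two figures are coordinate-adjacent when the gap is below the threshold."""
    return rect_gap(a, b) < threshold


def merge_rects(a: Rect, b: Rect) -> Rect:
    """Bounding rectangle of the single figure obtained by merging two figures."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
```

A caller would test `are_adjacent` on each pair of figure rectangles and replace adjacent pairs with `merge_rects` of the two.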
iii) When a plurality of document file elements of different types are contained in the same region, the elements are merged into one figure according to their coordinate information, where the different types of document file elements include but are not limited to text and figures.
Subsequently, the document file elements obtained by merging, for example word sequences and figures, are arranged according to their corresponding coordinate information to generate the merged document file.
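Mode iii) can be sketched as follows. The description does not say how "the same region" is determined, so this example assumes the caller supplies the region as a rectangle; the `Element` type and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass
class Element:
    """Hypothetical document file element: its type and bounding rectangle."""
    kind: str  # "text" or "figure"
    rect: Rect


def contains(region: Rect, rect: Rect) -> bool:
    """True when `rect` lies entirely inside `region`."""
    return (region[0] <= rect[0] and region[1] <= rect[1]
            and rect[2] <= region[2] and rect[3] <= region[3])


def merge_region(elements: List[Element], region: Rect) -> List[Element]:
    """Replace mixed-type elements inside `region` with a single figure element."""
    inside = [e for e in elements if contains(region, e.rect)]
    outside = [e for e in elements if not contains(region, e.rect)]
    if len({e.kind for e in inside}) < 2:
        return list(elements)  # nothing to merge: fewer than two types in the region
    union = (min(e.rect[0] for e in inside), min(e.rect[1] for e in inside),
             max(e.rect[2] for e in inside), max(e.rect[3] for e in inside))
    return outside + [Element("figure", union)]
```

In a real pipeline the merged area would also be rasterized into one image; the sketch tracks only the type and coordinates.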
The merged document file contains the following information:
- The type of each merged document file element, including text and figure;
- The content of each merged document file element, including any of the following:
■ The content of the word sequence obtained by merging;
■ The graphic file of the figure obtained by merging, or the storage address or link of the graphic file;
- The position information of the document file element, which includes the coordinate information of the merged document file element in the document file;
- The style information of the document file element, which includes the font size and color of the text.
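One possible in-memory shape for the information listed above is sketched below. The field names and the JSON serialization are assumptions chosen for illustration; the description specifies only which kinds of information are present, not a concrete format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class MergedElement:
    """Hypothetical record for one element of the merged document file."""
    type: str                           # "text" or "figure"
    content: str                        # word sequence, or path/link of the graphic file
    x: float                            # coordinates in the merged document file
    y: float
    font_size: Optional[float] = None   # style information, text only
    color: Optional[str] = None


def to_json(elements: List[MergedElement]) -> str:
    """Serialize the merged document file as a JSON array (assumed format)."""
    return json.dumps([asdict(e) for e in elements], ensure_ascii=False)
```

A document with one word sequence and one figure could then be built as `to_json([MergedElement("text", "Hello world", 0.0, 10.0, 12.0, "#000000"), MergedElement("figure", "figures/merged_1.png", 0.0, 40.0)])`, where the file path is a made-up example.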
This embodiment merges the document file elements according to the types of the elements extracted from the document file and their position information in the document file, and generates a merged document file, so that a document file adapted to the screen of the user equipment can be generated without manually editing each document file. That is, when the merged document file is rendered on different devices and re-typeset to fit different screen sizes, the elements of the original document file have already been merged into continuously arranged word sequences and into figures composed of multiple document file elements, which reduces the format changes caused by re-typesetting.
In addition to steps S101 and S102 of the foregoing embodiment, the method for document file processing of the present invention may, in another embodiment, further include steps S103 and S104.
In step S103, the figure obtained by merging is stored as a graphic file, for example a graphic file in a format such as GIF, JPG or PNG.
In step S104, an index to the graphic file is created and placed in the merged document file.
Specifically, the index to the graphic file may be created in the merged document file as follows:
The index of the figure is created from the coordinate information, size, layer and so on of the figure obtained by merging. For example, in the content part of the document file element in the merged document file, the figure is identified by its starting coordinates, width and height in the graphic file, for example as {ix, iy, iw, ih}, where ix is the starting abscissa of the figure in the graphic file, iy is the starting ordinate of the figure in the graphic file, iw is the width of the figure, and ih is the height of the figure.
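The {ix, iy, iw, ih} index can be illustrated with a small sketch. It assumes, purely for the example, that several merged figures are stacked vertically in one shared graphic file; the packing strategy and the function name are hypothetical, and only the four index fields come from the description.

```python
from typing import Dict, List, Tuple


def build_indices(sizes: List[Tuple[int, int]]) -> List[Dict[str, int]]:
    """Compute an {ix, iy, iw, ih} index for each figure of size (width, height).

    Assumption: figures are placed one below another, left-aligned, in a
    single graphic file, so ix is always 0 and iy accumulates the heights.
    """
    indices = []
    next_y = 0
    for width, height in sizes:
        indices.append({"ix": 0, "iy": next_y, "iw": width, "ih": height})
        next_y += height
    return indices
```

A reader could then crop the region starting at (ix, iy) with size iw by ih from the graphic file to obtain each figure.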
In addition to steps S101-S104 of the foregoing embodiment, the method for document file processing of the present invention may, in another embodiment, further include step S105.
In step S105, when the user reads the document file with the browser of the user equipment, a reader built into the browser parses the merged document file and renders the document file elements it contains according to the information in the merged document file.
With the method provided by the embodiment of the present invention, the user does not need to install readers for the various document file formats on the user equipment, nor any dedicated application, and can conveniently read document files of various formats using the browser of the user equipment. As described above, each document file element in the merged document file is obtained by merging multiple original document file elements, for example a continuously arranged word sequence or a figure merged from multiple document file elements, which reduces the format changes caused by re-typesetting the document file to fit screens of different sizes.
Fig. 3 shows a device for document file processing of the present invention. As shown in Fig. 3, the document file processing device includes an element extraction device 21 and a merging device 22.
The element extraction device 21 is configured to extract document file elements from a document file. The document file elements include but are not limited to at least one of the following: text, figures.
The merging device 22 is configured to merge the document file elements according to the types of the document file elements and their position information in the document file, so as to generate a merged document file.
Specifically, the merging device 22 may include a first merging module 221 and an arrangement module 222.
The first merging module 221 is configured to merge a plurality of document file elements according to their types and position information in at least one of the following ways:
i) When the document file elements are text, the text is merged into one word sequence according to the arrangement mode and coordinate information of the text.
It will be appreciated that common text arrangements include horizontal and vertical layout. The first merging module 221 may include:
A first merging submodule 2211, configured to, when the text is arranged horizontally, merge multiple segments of horizontal text that lie on the same line between adjacent line breaks but are separated by other document file elements such as figures, according to the coordinate information of the text, into one word sequence arranged continuously in the horizontal direction; and
A second merging submodule 2212, configured to, when the text is arranged vertically, merge multiple segments of vertical text that lie in the same column between adjacent line breaks but are separated by other document file elements such as figures, according to the coordinate information of the text, into one word sequence arranged continuously in the vertical direction.
ii) For two adjacent figures, the two figures are merged into one figure according to their coordinate information.
Specifically, whether two figures are coordinate-adjacent may be determined as follows:
First, the minimum bounding rectangle of each of the two figures is obtained from the coordinate information of the figures; the minimum bounding rectangle is determined by the minimum and maximum abscissa and the minimum and maximum ordinate of the corresponding figure;
Next, it is determined whether the gap between the minimum bounding rectangles of the two figures is smaller than a predetermined value. If it is, the two figures are coordinate-adjacent, as shown in Figs. 2a-2c. The predetermined value may be set according to the actual document file and experience; preferably, it may be set to 2 pixels.
iii) When a plurality of document file elements of different types are contained in the same region, the elements are merged into one figure according to their coordinate information, where the different types of document file elements include but are not limited to text and figures.
The arrangement module 222 is configured to arrange the document file elements obtained by merging, for example word sequences and figures, according to their corresponding coordinate information, so as to generate the merged document file. Specifically, the minimum bounding rectangle of each merged document file element, for example each word sequence or figure, may be obtained; the minimum bounding rectangle is determined by the minimum and maximum abscissa and the minimum and maximum ordinate of the corresponding word sequence or figure. The union of these minimum bounding rectangles is then taken based on their coordinates to obtain the merged document file.
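The arrangement step can be sketched as below. The ordering rule (top to bottom, then left to right) is an assumption, since the description says only that elements are arranged by their coordinate information; the function names are likewise hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def arrange(rects: List[Rect]) -> List[Rect]:
    """Order bounding rectangles top to bottom, then left to right (assumed rule)."""
    return sorted(rects, key=lambda r: (r[1], r[0]))


def page_extent(rects: List[Rect]) -> Rect:
    """Union of the bounding rectangles: the area covered by the merged document.

    Raises ValueError when `rects` is empty.
    """
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))
```

The extent gives the overall area of the merged content, and the ordered list gives the sequence in which elements would be written out.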
The merged document file contains the following information:
- The type of each merged document file element, including text and figure;
- The content of each merged document file element, including any of the following:
■ The content of the word sequence obtained by merging;
■ The graphic file of the figure obtained by merging, or the storage address or link of the graphic file;
- The position information of the document file element, which includes the coordinate information of the merged document file element in the document file;
- The style information of the document file element, which includes the font size and color of the text.
This embodiment merges the document file elements according to the types of the elements extracted from the document file and their position information in the document file, and generates a merged document file, so that a document file adapted to the screen of the user equipment can be generated without manually editing each document file. That is, when the merged document file is rendered on different devices and re-typeset to fit different screen sizes, the elements of the original document file have already been merged into continuously arranged word sequences and into figures composed of multiple document file elements, which reduces the format changes caused by re-typesetting.
The device may further include a storage device, configured to store the figure obtained by merging as a graphic file, for example a graphic file in a format such as GIF, JPG or PNG;
and an index creation device, configured to create an index to the graphic file and place it in the merged document file.
Specifically, the index to the graphic file may be created in the merged document file as follows:
The index of the figure is created from the coordinate information, size, layer and so on of the figure obtained by merging. For example, in the content part of the document file element in the merged document file, the figure is identified by its starting coordinates, width and height in the graphic file, for example as {ix, iy, iw, ih}, where ix is the starting abscissa of the figure in the graphic file, iy is the starting ordinate of the figure in the graphic file, iw is the width of the figure, and ih is the height of the figure.
With the device provided by the embodiment of the present invention, the user does not need to install readers for the various document file formats on the user equipment, nor any dedicated application, and can conveniently read document files of various formats using the browser of the user equipment. As described above, each document file element in the merged document file is obtained by merging multiple original document file elements, for example a continuously arranged word sequence or a figure merged from multiple document file elements, which reduces the format changes caused by re-typesetting the document file to fit screens of different sizes.
It should be noted that the present invention may be implemented in software and/or in a combination of software and hardware. For example, each device of the present invention may be implemented with an application-specific integrated circuit (ASIC) or any other similar hardware device. In one embodiment, the software program of the present invention may be executed by a processor to implement the steps or functions described above. Similarly, the software program of the present invention (including related data structures) may be stored in a computer-readable recording medium, for example RAM, a magnetic or optical drive, a floppy disk or a similar device. In addition, some steps or functions of the present invention may be implemented in hardware, for example as a circuit that cooperates with a processor to perform each step or function.
To those skilled in the art, it is evident that the present invention is not limited to the details of the above exemplary embodiments and can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced by the present invention. Any reference numeral in a claim should not be construed as limiting the claim concerned. Moreover, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or devices recited in a system claim may also be implemented by one unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.
Claims (10)
1. A method for processing document files, characterized by comprising:
- extracting document file elements from a document file;
- merging the document file elements according to the types of the document file elements and the position information of the document file elements in the document file, so as to generate a merged document file.
2. The method according to claim 1, wherein the step of merging the document file elements comprises the following steps:
- merging the plurality of document file elements according to the types and position information of the document file elements, based on at least one of the following:
- when the document file elements are text, merging the text into one word sequence according to the arrangement mode and coordinate information of the text;
- for two adjacent figures, merging the two figures into one figure according to the coordinate information of the two figures;
- when a plurality of document file elements of different types are contained in the same region, merging the plurality of document file elements of different types into one figure according to their coordinate information;
- arranging the document file elements obtained by merging a plurality of document file elements according to their corresponding coordinate information, so as to generate the merged document file.
3. The method according to claim 2, characterized in that the step of, when the document file elements are text, merging the text into one word sequence according to the arrangement mode and coordinate information of the text comprises:
- when the text is arranged horizontally, merging, according to the coordinate information of the text, multiple segments of horizontal text that lie on the same line between adjacent line breaks but are separated by other document file elements into one word sequence arranged continuously in the horizontal direction;
- when the text is arranged vertically, merging, according to the coordinate information of the text, multiple segments of vertical text that lie in the same column between adjacent line breaks but are separated by other document file elements into one word sequence arranged continuously in the vertical direction.
4. The method according to claim 2 or 3, characterized by further comprising:
- storing the figure obtained by merging as a graphic file;
- creating an index to the graphic file and placing it in the merged document file.
5. The method according to claim 2 or 3, characterized by further comprising:
- parsing the merged document file with a reader built into a browser, and rendering the document file elements contained therein according to the information in the merged document file.
6. A document file processing device for processing document files, characterized by comprising:
an element extraction device, configured to extract document file elements from a document file;
a merging device, configured to merge the document file elements according to the types of the document file elements and the position information of the document file elements in the document file, so as to generate a merged document file.
7. The device according to claim 6, characterized in that the merging device comprises:
a first merging module, configured to merge the plurality of document file elements according to the types and position information of the document file elements, based on at least one of the following:
- when the document file elements are text, merging the text into one word sequence according to the arrangement mode and coordinate information of the text;
- for two adjacent figures, merging the two figures into one figure according to the coordinate information of the two figures;
- when a plurality of document file elements of different types are contained in the same region, merging the plurality of document file elements of different types into one figure according to their coordinate information;
an arrangement module, configured to arrange the document file elements obtained by merging a plurality of document file elements according to their corresponding coordinate information, so as to generate the merged document file.
8. The device according to claim 7, characterized in that the first merging module comprises:
a first merging submodule, configured to, when the text is arranged horizontally, merge, according to the coordinate information of the text, multiple segments of horizontal text that lie on the same line between adjacent line breaks but are separated by other document file elements into one word sequence arranged continuously in the horizontal direction;
a second merging submodule, configured to, when the text is arranged vertically, merge, according to the coordinate information of the text, multiple segments of vertical text that lie in the same column between adjacent line breaks but are separated by other document file elements into one word sequence arranged continuously in the vertical direction.
9. The device according to claim 7 or 8, characterized by further comprising:
a storage device, configured to store the figure obtained by merging as a graphic file;
an index creation device, configured to create an index to the graphic file and place it in the merged document file.
10. The device according to claim 7 or 8, characterized by further comprising:
a reader, configured to be built into a browser, to parse the merged document file, and to render the document file elements contained therein according to the information in the merged document file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310282405.4A CN103488619B (en) | 2013-07-05 | 2013-07-05 | Method and device for processing document file |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310282405.4A CN103488619B (en) | 2013-07-05 | 2013-07-05 | Method and device for processing document file |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103488619A true CN103488619A (en) | 2014-01-01 |
CN103488619B CN103488619B (en) | 2017-05-24 |
Family
ID=49828862
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310282405.4A Active CN103488619B (en) | 2013-07-05 | 2013-07-05 | Method and device for processing document file |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103488619B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105335339A (en) * | 2015-10-19 | 2016-02-17 | 江苏沃叶软件有限公司 | Pdf document conversion method |
CN106020677A (en) * | 2016-04-27 | 2016-10-12 | 努比亚技术有限公司 | Information processing method and mobile terminal |
CN106033412A (en) * | 2015-03-20 | 2016-10-19 | 广州金山移动科技有限公司 | Text conversion method and device |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6175844B1 (en) * | 1997-05-29 | 2001-01-16 | Adobe Systems Incorporated | Ordering groups of text in an image |
CN1604074A (en) * | 2004-11-22 | 2005-04-06 | 北京北大方正技术研究院有限公司 | Method for determining words reading sequence for columned serial words pages with mutually exclusive pattern and characters |
CN1604075A (en) * | 2004-11-22 | 2005-04-06 | 北京北大方正技术研究院有限公司 | Method for conducting words reading sequence recovery for newspaper pages |
CN101382932A (en) * | 2008-10-24 | 2009-03-11 | 北大方正集团有限公司 | Typesetting method and device for right angle folding manuscript block |
CN101515272A (en) * | 2008-02-18 | 2009-08-26 | 株式会社理光 | Method and device for extracting webpage content |
CN101866418A (en) * | 2009-04-17 | 2010-10-20 | 株式会社理光 | Method and equipment for determining file reading sequences |
CN101916293A (en) * | 2010-08-27 | 2010-12-15 | 中国电信股份有限公司 | Method and device for introducing media information into file |
CN101944104A (en) * | 2010-08-19 | 2011-01-12 | 百度在线网络技术(北京)有限公司 | Evaluation method and equipment for importance of webpage sub-blocks |
CN102262618A (en) * | 2010-05-28 | 2011-11-30 | 北京大学 | Method and device for identifying page information |
CN102479173A (en) * | 2010-11-25 | 2012-05-30 | 北京大学 | Method and device for identifying reading sequence of layout |
CN102567300A (en) * | 2011-12-29 | 2012-07-11 | 方正国际软件有限公司 | Picture document processing method and device |
CN102890826A (en) * | 2011-08-12 | 2013-01-23 | 北京多看科技有限公司 | Method for resetting scan edition document |
Non-Patent Citations (3)
Title |
---|
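Jia Juan et al.: "Determining the Reading Order of Text in Layouts with Mutually Exclusive Graphics and Text", Journal of Chinese Information Processing * |
Jia Juan et al.: "Reading Order of Non-Manhattan Layouts Based on Graph-Theoretic Maximum Matching", Computer Engineering * |
Jia Juan et al.: "Hierarchical Layout Model and Reading Order Determination for Multi-Article Non-Manhattan Layouts", Journal of Computer-Aided Design & Computer Graphics * |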
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106033412A (en) * | 2015-03-20 | 2016-10-19 | 广州金山移动科技有限公司 | Text conversion method and device |
CN106033412B (en) * | 2015-03-20 | 2019-07-26 | 广州金山移动科技有限公司 | A kind of text conversion method and device |
CN105335339A (en) * | 2015-10-19 | 2016-02-17 | 江苏沃叶软件有限公司 | Pdf document conversion method |
CN106020677A (en) * | 2016-04-27 | 2016-10-12 | 努比亚技术有限公司 | Information processing method and mobile terminal |
Also Published As
Publication number | Publication date |
---|---|
CN103488619B (en) | 2017-05-24 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109933756B (en) | Image file transferring method, device and equipment based on OCR (optical character recognition), and readable storage medium | |
CN111401371B (en) | Text detection and identification method and system and computer equipment | |
US8209600B1 (en) | Method and apparatus for generating layout-preserved text | |
JP5659563B2 (en) | Identification method, identification device, and computer program | |
CN101876967B (en) | Method for generating PDF text paragraphs | |
JP5439454B2 (en) | Electronic comic editing apparatus, method and program | |
JP5439456B2 (en) | Electronic comic editing apparatus, method and program | |
WO2013058397A1 (en) | Digital comic editing device and method therefor | |
US8522138B2 (en) | Content analysis apparatus and method | |
CN104077270A (en) | Electronic book production apparatus, electronic book system and electronic book production method | |
CN101908218A (en) | Editing equipment and method for arranging | |
CN116402020A (en) | Signature imaging processing method, system and storage medium based on OFD document | |
CN114663897A (en) | Table extraction method and table extraction system | |
CN103488619A (en) | Method and device for processing document file | |
JP5950700B2 (en) | Image processing apparatus, image processing method, and program | |
JP5769131B2 (en) | Image processing apparatus and program | |
CN104536947A (en) | Layout document processing method and device | |
CN111444452A (en) | Conversion method, device and storage medium of webpage | |
CN111476090A (en) | Watermark identification method and device | |
CN115114481A (en) | Document format conversion method, device, storage medium and equipment | |
JP6030915B2 (en) | Image rearrangement method, image rearrangement system, and image rearrangement program | |
CN117291152A (en) | Table extraction method and apparatus | |
CN111104871B (en) | Form region identification model generation method and device and form positioning method and device | |
JP2021140831A (en) | Document image processing system, document image processing method, and document image processing program | |
CN113378526A (en) | PDF paragraph processing method, device, storage medium and equipment |
Legal Events
Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||