CN103488619A - Method and device for processing document file - Google Patents

Info

Publication number
CN103488619A
Authority
CN
China
Prior art keywords
document files
word
merging
document
merged
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310282405.4A
Other languages
Chinese (zh)
Other versions
CN103488619B (en)
Inventor
徐广金
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201310282405.4A
Publication of CN103488619A
Application granted
Publication of CN103488619B
Active
Anticipated expiration

Landscapes

  • Document Processing Apparatus (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method and a device for processing a document file. The method comprises: extracting document file elements from the document file; and merging the document file elements according to their types and their position information in the document file, so as to generate a merged document file. Because the elements are merged according to their types and positions, a document file that adapts to the screen of a user device can be generated without manually editing each document file.

Description

Method and device for processing document files
Technical field
The present invention relates to document file processing technology, and in particular to a method and device for processing document files.
Background
Reading document files on mobile devices is now very common. To present a document file in a given format, document editing software or document reader software supporting that format must be installed on the computer; the software renders each document file element according to the presentation attributes of that element in the document file, thereby presenting the document file. The presentation attributes include, but are not limited to, the coordinate information of the document file element in the document file and its style information, where the style information comprises the font, font size, color and the like. Without such software installed for each document file format, mobile devices in the prior art can perform screen-adaptive processing only on document files that have a simple structure and a specific, manually edited format. Document files in the most widely used formats, such as the Microsoft Office series, PDF and the OpenOffice series, cannot be batch-processed into document files that adapt to the screen of a user device. This fails to meet the current demand for large-scale reading of electronic document files, inconveniences users and degrades the reading experience.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and device for processing document files, so as to solve the prior-art problem that document files in most formats cannot be batch-processed into document files that adapt to the screen of a user device.
According to one aspect of the present invention, a method for processing document files is provided, comprising:
extracting document file elements from a document file;
merging the document file elements according to the types of the document file elements and the position information of the document file elements in the document file, to generate a merged document file.
According to another aspect of the present invention, a document file processing device is also provided, comprising:
an element extraction device, for extracting document file elements from a document file;
a merging device, for merging the document file elements according to the types of the document file elements and the position information of the document file elements in the document file, to generate a merged document file.
The present invention merges document file elements according to the types of the elements extracted from a document file and their position information in that document file, and generates a merged document file. A document file that adapts to the screen of a user device can thus be generated without manually editing each document file.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
Fig. 1 is a flowchart of an embodiment of the method for processing document files according to the present invention;
Figs. 2a-2c are schematic diagrams of two figures with adjacent coordinates in an embodiment of the present invention;
Fig. 3 is a schematic diagram of an embodiment of the device for processing document files according to the present invention.
In the drawings, the same or similar reference numerals denote the same or similar components.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
Herein:
"document file" refers to a file generated by editing with document editing software and containing document file elements such as text and figures; specifically, document files include, but are not limited to, files in formats such as Word, Excel, PDF, OpenOffice, RTF, XML, TXT and EPUB;
"document file element" refers to an element in a document file, including but not limited to text and figures.
Fig. 1 shows a schematic flowchart of a method for processing document files according to one embodiment of the present invention.
As shown in Fig. 1, in step S101, document file elements are extracted from a document file; the document file elements include, but are not limited to, at least one of the following: text, figures.
In step S102, the document file elements are merged according to the types of the document file elements and the position information of the document file elements in the document file, to generate a merged document file.
Specifically, a plurality of document file elements can be merged according to their types and position information based on at least one of the following modes:
i) When the document file element is text, the text is merged into one word sequence according to the arrangement mode and coordinate information of the text.
It will be appreciated that text is commonly arranged either horizontally or vertically. When the text is arranged horizontally, multiple segments of horizontal text that lie on the same line between adjacent line breaks but are separated by other document file elements, such as figures, can be merged, according to the coordinate information of the text, into one continuous horizontally arranged word sequence. When the text is arranged vertically, multiple segments of vertical text that lie in the same column between adjacent line breaks but are separated by other document file elements, such as figures, can be merged, according to the coordinate information of the text, into one continuous vertically arranged word sequence.
ii) For two adjacent figures, the two figures are merged into one figure according to their coordinate information.
Specifically, whether two figures are coordinate-adjacent can be determined as follows:
First, the minimum bounding rectangle of each of the two figures is obtained from its coordinate information; the minimum bounding rectangle is determined by the minimum and maximum abscissa and the minimum and maximum ordinate of the corresponding figure.
Then, it is determined whether the gap between the minimum bounding rectangles of the two figures is less than a predetermined value; if so, the two figures are coordinate-adjacent, as shown in Figs. 2a-2c. The predetermined value can be set according to the actual document file and experience; preferably, it is set to 2 pixels.
iii) When a plurality of document file elements of different types are contained in the same region, they are merged into one figure according to their coordinate information, where the different types include, but are not limited to, text and figures.
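Mode i) above can be illustrated with a minimal Python sketch for horizontally arranged text. The `TextRun` type, the `merge_horizontal_runs` function and the `line_tolerance` parameter are hypothetical names introduced here for illustration; the source does not specify how "the same line" is detected, so grouping by a small vertical tolerance is an assumption. Vertical text could be handled the same way with the roles of the two coordinates swapped.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    text: str   # characters in this segment
    x: float    # left edge of the segment
    y: float    # vertical position of the line the segment sits on


def merge_horizontal_runs(runs, line_tolerance=1.0):
    """Merge segments lying on the same horizontal line into one sequence each.

    Assumption: two segments share a line when their y values differ by at
    most line_tolerance from the first segment found on that line.
    """
    lines = []  # each entry is [reference_y, [segments on that line]]
    for run in sorted(runs, key=lambda r: (r.y, r.x)):
        if lines and abs(run.y - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append(run)
        else:
            lines.append([run.y, [run]])

    merged = []
    for line_y, members in lines:
        members.sort(key=lambda r: r.x)  # reading order: left to right
        text = "".join(m.text for m in members)
        merged.append(TextRun(text, members[0].x, line_y))
    return merged


runs = [
    TextRun("Hello ", 0, 10),
    TextRun("world", 120, 10.3),   # same line, e.g. split by a figure
    TextRun("Next line", 0, 30),
]
merged = merge_horizontal_runs(runs)
```

Here the first two segments end up in one sequence because their vertical positions differ by less than the tolerance, while the third starts a new sequence.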
Subsequently, the document file elements obtained by merging a plurality of document file elements, for example word sequences and figures, are arranged according to their corresponding coordinate information to generate the merged document file.
The merged document file comprises the following information:
- the type of each merged document file element, namely text or figure;
- the content of each merged document file element, being either of the following:
■ the content of the word sequence obtained by merging;
■ the graphic file of the figure obtained by merging, or the storage address or link of the graphic file;
- the position information of the document file element, comprising the coordinate information of the merged document file element in the document file;
- the style information of the document file element, comprising the font size and color of the text.
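The four kinds of information listed above could be held in a record like the following. This is only a sketch: the source does not define a serialization, so the `MergedElement` class, its field names, the JSON encoding and the example path `figures/page1.png` are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MergedElement:
    type: str                     # "text" or "figure"
    content: str                  # word sequence, or path/link to a graphic file
    position: dict                # coordinates in the document, e.g. x, y, w, h
    style: Optional[dict] = None  # font size and color; only meaningful for text


def to_json(elements):
    """Serialize merged elements (hypothetical JSON layout)."""
    return json.dumps([asdict(e) for e in elements], ensure_ascii=False)


doc = [
    MergedElement("text", "Hello world", {"x": 0, "y": 10, "w": 200, "h": 12},
                  {"font_size": 12, "color": "#000000"}),
    MergedElement("figure", "figures/page1.png",
                  {"x": 0, "y": 40, "w": 200, "h": 80}),
]
serialized = to_json(doc)
```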
This embodiment merges document file elements according to the types of the elements extracted from a document file and their position information in that document file, and generates a merged document file, so that a document file adapting to the screen of a user device can be generated without manually editing each document file. That is, when the merged document file is rendered and presented on different devices and re-typeset to fit different screen sizes, changes to the document's format caused by re-typesetting are reduced, because the elements of the original document file have already been merged into continuous word sequences and into figures each merged from multiple document file elements.
In addition to steps S101 and S102 of the preceding embodiment, the method for processing document files according to the present invention may, in another embodiment, further comprise steps S103 and S104.
In step S103, the figure obtained by merging is stored as a graphic file, for example in a format such as GIF, JPG or PNG.
In step S104, an index of the graphic file is established and placed in the merged document file.
Specifically, the index of the graphic file can be established in the merged document file as follows:
The index of a figure is established according to the coordinate information, the size, the layer and the like of the figure obtained by merging. For example, in the content part of a document file element in the merged document file, an identifier indicates the starting coordinates, width and height of the figure within the graphic file, for example {ix, iy, iw, ih}, where ix is the starting abscissa of the figure in the graphic file, iy is the starting ordinate of the figure in the graphic file, iw is the width of the figure, and ih is the height of the figure.
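As an illustration of the {ix, iy, iw, ih} index, the sketch below computes index entries for several merged figures stored in one graphic file. The source does not say how figures are laid out inside the graphic file; stacking them vertically from the top-left corner is an assumption, and `build_figure_index` is a hypothetical helper name.

```python
def build_figure_index(sizes):
    """Return one {ix, iy, iw, ih} entry per figure.

    sizes: list of (width, height) pairs, one per merged figure.
    Assumption: figures are stacked top to bottom in a single graphic file,
    each starting at abscissa 0.
    """
    index = []
    next_y = 0
    for width, height in sizes:
        index.append({"ix": 0, "iy": next_y, "iw": width, "ih": height})
        next_y += height  # the next figure starts below this one
    return index


index = build_figure_index([(100, 50), (80, 30)])
```

A renderer could then use each entry to crop the corresponding figure out of the shared graphic file.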
In addition to steps S101-S104 of the preceding embodiment, the method for processing document files according to the present invention may, in another embodiment, further comprise step S105.
In step S105, when the user reads the document file with the browser of the user device, a reader built into the browser parses the merged document file and renders and presents the document file elements it contains according to the information in the merged document file.
With the method provided by this embodiment of the present invention, the user does not need to install readers for the various document file formats on the user device, nor any special application; the browser of the user device suffices to read document files in various formats. As described above, because each element in the merged document file is obtained by merging multiple original document file elements, for example a continuous word sequence or a figure merged from multiple document file elements, changes to the document's format caused by re-typesetting for screens of different sizes can be reduced.
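Why merged elements help re-typesetting can be seen in a small sketch: a continuous word sequence can be re-wrapped to any screen width, and a merged figure can be scaled as a single unit. The source does not describe the reader's layout algorithm, so the functions `reflow_text` and `fit_figure`, the fixed character width and the use of Python's standard `textwrap` module are all assumptions for illustration.

```python
import textwrap


def reflow_text(sequence, screen_width_px, char_width_px):
    """Wrap one continuous word sequence to the screen width.

    Assumption: every character occupies char_width_px pixels.
    """
    chars_per_line = max(1, screen_width_px // char_width_px)
    return textwrap.wrap(sequence, width=chars_per_line)


def fit_figure(width, height, screen_width_px):
    """Scale a merged figure down, keeping its aspect ratio, if it is too wide."""
    if width <= screen_width_px:
        return width, height
    scale = screen_width_px / width
    return screen_width_px, round(height * scale)


lines = reflow_text("the quick brown fox jumps", 80, 8)
size = fit_figure(640, 480, 320)
```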
Fig. 3 shows a device for processing document files according to the present invention. As shown in Fig. 3, the document file processing device comprises an element extraction device 21 and a merging device 22.
The element extraction device 21 is configured to extract document file elements from a document file; the document file elements include, but are not limited to, at least one of the following: text, figures.
The merging device 22 is configured to merge the document file elements according to the types of the document file elements and the position information of the document file elements in the document file, to generate a merged document file.
Specifically, the merging device 22 may comprise a first merging module 221 and an arrangement module 222.
The first merging module 221 is configured to merge a plurality of document file elements according to the types and position information of the document file elements, based on at least one of the following modes:
i) When the document file element is text, the text is merged into one word sequence according to the arrangement mode and coordinate information of the text.
It will be appreciated that text is commonly arranged either horizontally or vertically; the first merging module 221 may comprise:
a first merging submodule 2211, configured to, when the text is arranged horizontally, merge, according to the coordinate information of the text, multiple segments of horizontal text that lie on the same line between adjacent line breaks but are separated by other document file elements, such as figures, into one continuous horizontally arranged word sequence; and
a second merging submodule 2212, configured to, when the text is arranged vertically, merge, according to the coordinate information of the text, multiple segments of vertical text that lie in the same column between adjacent line breaks but are separated by other document file elements, such as figures, into one continuous vertically arranged word sequence.
ii) For two adjacent figures, the two figures are merged into one figure according to their coordinate information.
Specifically, whether two figures are coordinate-adjacent can be determined as follows:
First, the minimum bounding rectangle of each of the two figures is obtained from its coordinate information; the minimum bounding rectangle is determined by the minimum and maximum abscissa and the minimum and maximum ordinate of the corresponding figure.
Then, it is determined whether the gap between the minimum bounding rectangles of the two figures is less than a predetermined value; if so, the two figures are coordinate-adjacent, as shown in Figs. 2a-2c. The predetermined value can be set according to the actual document file and experience; preferably, it is set to 2 pixels.
iii) When a plurality of document file elements of different types are contained in the same region, they are merged into one figure according to their coordinate information, where the different types include, but are not limited to, text and figures.
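The adjacency test of mode ii) and the merging of modes ii) and iii) can be sketched as follows. The `Rect` type and the helper names are hypothetical. The source does not define how the "gap" between two rectangles is measured; this sketch assumes the larger of the horizontal and vertical separations, so overlapping rectangles have a gap of 0. The default threshold of 2 pixels is the preferred value named in the text.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def bounding_rect(points):
    """Minimum bounding rectangle from the min/max abscissa and ordinate."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return Rect(min(xs), min(ys), max(xs), max(ys))


def gap(a, b):
    """Separation between two rectangles (assumed metric, 0 if they overlap)."""
    dx = max(a.x_min - b.x_max, b.x_min - a.x_max, 0)
    dy = max(a.y_min - b.y_max, b.y_min - a.y_max, 0)
    return max(dx, dy)


def are_adjacent(a, b, threshold=2):
    return gap(a, b) < threshold


def union(a, b):
    """Bounding rectangle of the single figure produced by merging a and b."""
    return Rect(min(a.x_min, b.x_min), min(a.y_min, b.y_min),
                max(a.x_max, b.x_max), max(a.y_max, b.y_max))


left = bounding_rect([(0, 0), (10, 10)])
near = Rect(11, 0, 20, 10)   # 1 pixel to the right of `left`
far = Rect(15, 0, 20, 10)    # 5 pixels to the right of `left`
merged_rect = union(left, near) if are_adjacent(left, near) else None
```

The same `union` step applies to mode iii), where text and figures sharing a region are combined into one figure.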
The arrangement module 222 is configured to arrange the document file elements obtained by merging a plurality of document file elements, for example word sequences and figures, according to their corresponding coordinate information, to generate the merged document file. Specifically, for the document file elements obtained by merging, for example word sequences and figures, the minimum bounding rectangle of each merged element is obtained; the minimum bounding rectangle is determined by the minimum and maximum abscissa and the minimum and maximum ordinate of the corresponding word sequence or figure. The union of these minimum bounding rectangles is then taken based on their coordinates to obtain the merged document file.
The merged document file comprises the following information:
- the type of each merged document file element, namely text or figure;
- the content of each merged document file element, being either of the following:
■ the content of the word sequence obtained by merging;
■ the graphic file of the figure obtained by merging, or the storage address or link of the graphic file;
- the position information of the document file element, comprising the coordinate information of the merged document file element in the document file;
- the style information of the document file element, comprising the font size and color of the text.
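The work of the arrangement module 222 might look like the sketch below. It reads "taking the union of the bounding rectangles" as computing the overall extent of the merged page, and orders elements top to bottom and then left to right; both readings are assumptions, as are the `arrange` function name and the dictionary layout.

```python
def arrange(elements):
    """Order merged elements by coordinates and compute their overall extent.

    elements: list of dicts with a "rect" entry (x_min, y_min, x_max, y_max).
    Assumption: reading order is top to bottom, then left to right.
    """
    if not elements:
        return {"extent": None, "elements": []}
    ordered = sorted(elements, key=lambda e: (e["rect"][1], e["rect"][0]))
    extent = (
        min(e["rect"][0] for e in ordered),
        min(e["rect"][1] for e in ordered),
        max(e["rect"][2] for e in ordered),
        max(e["rect"][3] for e in ordered),
    )
    return {"extent": extent, "elements": ordered}


page = arrange([
    {"type": "figure", "rect": (0, 40, 200, 120)},
    {"type": "text", "rect": (0, 10, 180, 22)},
])
```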
This embodiment merges document file elements according to the types of the elements extracted from a document file and their position information in that document file, and generates a merged document file, so that a document file adapting to the screen of a user device can be generated without manually editing each document file. That is, when the merged document file is rendered and presented on different devices and re-typeset to fit different screen sizes, changes to the document's format caused by re-typesetting are reduced, because the elements of the original document file have already been merged into continuous word sequences and into figures each merged from multiple document file elements.
In addition to the element extraction device 21 and the merging device 22 of the preceding embodiment, the device for processing document files according to the present invention may, in another embodiment, further comprise a storage device and an index establishing device.
The storage device is configured to store the figure obtained by merging as a graphic file, for example in a format such as GIF, JPG or PNG;
the index establishing device is configured to establish an index of the graphic file and place it in the merged document file.
Specifically, the index of the graphic file can be established in the merged document file as follows:
The index of a figure is established according to the coordinate information, the size, the layer and the like of the figure obtained by merging. For example, in the content part of a document file element in the merged document file, an identifier indicates the starting coordinates, width and height of the figure within the graphic file, for example {ix, iy, iw, ih}, where ix is the starting abscissa of the figure in the graphic file, iy is the starting ordinate of the figure in the graphic file, iw is the width of the figure, and ih is the height of the figure.
In addition to the element extraction device 21, the merging device 22, the storage device and the index establishing device of the preceding embodiment, the device for processing document files according to the present invention may, in another embodiment, further comprise a reader that is built into a browser and configured to parse the merged document file and to render and present the document file elements it contains according to the information in the merged document file.
With the device provided by this embodiment of the present invention, the user does not need to install readers for the various document file formats on the user device, nor any special application; the browser of the user device suffices to read document files in various formats. As described above, because each element in the merged document file is obtained by merging multiple original document file elements, for example a continuous word sequence or a figure merged from multiple document file elements, changes to the document's format caused by re-typesetting for screens of different sizes can be reduced.
It should be noted that the present invention can be implemented in software and/or in a combination of software and hardware; for example, each device of the present invention can be implemented with an application-specific integrated circuit (ASIC) or any other similar hardware device. In one embodiment, the software program of the present invention can be executed by a processor to implement the steps or functions described above. Similarly, the software program of the present invention (including related data structures) can be stored in a computer-readable recording medium, for example RAM, a magnetic or optical drive, a floppy disk or a similar device. In addition, some steps or functions of the present invention can be implemented in hardware, for example as a circuit that cooperates with a processor to perform each step or function.
It will be apparent to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the invention can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as exemplary and not restrictive; the scope of the invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be embraced by the invention. Any reference numeral in a claim should not be construed as limiting the claim concerned. In addition, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or devices recited in a system claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (10)

1. A method for processing document files, characterized by comprising:
- extracting document file elements from a document file;
- merging the document file elements according to the types of the document file elements and the position information of the document file elements in the document file, to generate a merged document file.
2. The method according to claim 1, wherein the step of merging the document file elements comprises the following steps:
- merging a plurality of document file elements according to the types and position information of the document file elements, based on at least one of the following:
- when the document file element is text, merging the text into one word sequence according to the arrangement mode and coordinate information of the text;
- for two adjacent figures, merging the two figures into one figure according to the coordinate information of the two figures;
- when a plurality of document file elements of different types are contained in the same region, merging the plurality of document file elements of different types into one figure according to their coordinate information;
- arranging the document file elements obtained by merging a plurality of document file elements according to their corresponding coordinate information, to generate the merged document file.
3. The method according to claim 2, characterized in that the step of, when the document file element is text, merging the text into one word sequence according to the arrangement mode and coordinate information of the text comprises:
- when the text is arranged horizontally, merging, according to the coordinate information of the text, multiple segments of horizontal text that lie on the same line between adjacent line breaks but are separated by other document file elements into one continuous horizontally arranged word sequence;
- when the text is arranged vertically, merging, according to the coordinate information of the text, multiple segments of vertical text that lie in the same column between adjacent line breaks but are separated by other document file elements into one continuous vertically arranged word sequence.
4. The method according to claim 2 or 3, characterized by further comprising:
- storing the figure obtained by merging as a graphic file;
- establishing an index of the graphic file and placing it in the merged document file.
5. The method according to claim 2 or 3, characterized by further comprising:
- parsing, by a reader built into a browser, the merged document file, and rendering and presenting the document file elements contained therein according to the information in the merged document file.
6. A document file processing device for processing document files, characterized by comprising:
an element extraction device, for extracting document file elements from a document file;
a merging device, for merging the document file elements according to the types of the document file elements and the position information of the document file elements in the document file, to generate a merged document file.
7. The device according to claim 6, characterized in that the merging device comprises:
a first merging module, for merging a plurality of document file elements according to the types and position information of the document file elements, based on at least one of the following modes:
- when the document file element is text, merging the text into one word sequence according to the arrangement mode and coordinate information of the text;
- for two adjacent figures, merging the two figures into one figure according to the coordinate information of the two figures;
- when a plurality of document file elements of different types are contained in the same region, merging the plurality of document file elements of different types into one figure according to their coordinate information;
an arrangement module, for arranging the document file elements obtained by merging a plurality of document file elements according to their corresponding coordinate information, to generate the merged document file.
8. The device according to claim 7, characterized in that the first merging module comprises:
a first merging submodule, for, when the text is arranged horizontally, merging, according to the coordinate information of the text, multiple segments of horizontal text that lie on the same line between adjacent line breaks but are separated by other document file elements into one continuous horizontally arranged word sequence;
a second merging submodule, for, when the text is arranged vertically, merging, according to the coordinate information of the text, multiple segments of vertical text that lie in the same column between adjacent line breaks but are separated by other document file elements into one continuous vertically arranged word sequence.
9. The device according to claim 7 or 8, characterized by further comprising:
a storage device, for storing the figure obtained by merging as a graphic file;
an index establishing device, for establishing an index of the graphic file and placing it in the merged document file.
10. The device according to claim 7 or 8, characterized by further comprising:
a reader, built into a browser, for parsing the merged document file and rendering and presenting the document file elements contained therein according to the information in the merged document file.
CN201310282405.4A 2013-07-05 2013-07-05 Method and device for processing document file Active CN103488619B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201310282405.4A CN103488619B (en) | 2013-07-05 | 2013-07-05 | Method and device for processing document file

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201310282405.4A CN103488619B (en) | 2013-07-05 | 2013-07-05 | Method and device for processing document file

Publications (2)

Publication Number | Publication Date
CN103488619A (en) | 2014-01-01
CN103488619B (en) | 2017-05-24

Family

ID=49828862

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310282405.4A Active CN103488619B (en) 2013-07-05 2013-07-05 Method and device for processing document file

Country Status (1)

Country Link
CN (1) CN103488619B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6175844B1 (en) * 1997-05-29 2001-01-16 Adobe Systems Incorporated Ordering groups of text in an image
CN1604074A (en) * 2004-11-22 2005-04-06 北京北大方正技术研究院有限公司 Method for determining words reading sequence for columned serial words pages with mutually exclusive pattern and characters
CN1604075A (en) * 2004-11-22 2005-04-06 北京北大方正技术研究院有限公司 Method for conducting words reading sequence recovery for newspaper pages
CN101515272A (en) * 2008-02-18 2009-08-26 株式会社理光 Method and device for extracting webpage content
CN101382932A (en) * 2008-10-24 2009-03-11 北大方正集团有限公司 Typesetting method and device for right angle folding manuscript block
CN101866418A (en) * 2009-04-17 2010-10-20 株式会社理光 Method and equipment for determining file reading sequences
CN102262618A (en) * 2010-05-28 2011-11-30 北京大学 Method and device for identifying page information
CN101944104A (en) * 2010-08-19 2011-01-12 百度在线网络技术(北京)有限公司 Evaluation method and equipment for importance of webpage sub-blocks
CN101916293A (en) * 2010-08-27 2010-12-15 中国电信股份有限公司 Method and device for introducing media information into file
CN102479173A (en) * 2010-11-25 2012-05-30 北京大学 Method and device for identifying reading sequence of layout
CN102890826A (en) * 2011-08-12 2013-01-23 北京多看科技有限公司 Method for resetting scan edition document
CN102567300A (en) * 2011-12-29 2012-07-11 方正国际软件有限公司 Picture document processing method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
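Jia Juan et al.: "Determining the Reading Order of Text in Layouts with Mutually Exclusive Graphics and Text" (《图文互斥版面中文字阅读顺序的确定》), Journal of Chinese Information Processing (《中文信息学报》) *
Jia Juan et al.: "Reading Order of Non-Manhattan Layouts Based on Maximum Matching in Graph Theory" (《基于图论最大匹配的非Manhattan版面阅读顺序》), Computer Engineering (《计算机工程》) *
Jia Juan et al.: "Hierarchical Layout Model and Reading Order Determination for Multi-Article Non-Manhattan Layouts" (《多篇章非Manhattan版面层次布局模型和阅读顺序确定》), Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》) *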

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106033412A (en) * 2015-03-20 2016-10-19 广州金山移动科技有限公司 Text conversion method and device
CN106033412B (en) * 2015-03-20 2019-07-26 广州金山移动科技有限公司 A kind of text conversion method and device
CN105335339A (en) * 2015-10-19 2016-02-17 江苏沃叶软件有限公司 Pdf document conversion method
CN106020677A (en) * 2016-04-27 2016-10-12 努比亚技术有限公司 Information processing method and mobile terminal

Also Published As

Publication number Publication date
CN103488619B (en) 2017-05-24

Similar Documents

Publication Publication Date Title
CN109933756B (en) Image file transferring method, device and equipment based on OCR (optical character recognition), and readable storage medium
CN111401371B (en) Text detection and identification method and system and computer equipment
US8209600B1 (en) Method and apparatus for generating layout-preserved text
JP5659563B2 (en) Identification method, identification device, and computer program
CN101876967B (en) Method for generating PDF text paragraphs
JP5439454B2 (en) Electronic comic editing apparatus, method and program
JP5439456B2 (en) Electronic comic editing apparatus, method and program
WO2013058397A1 (en) Digital comic editing device and method therefor
US8522138B2 (en) Content analysis apparatus and method
CN104077270A (en) Electronic book production apparatus, electronic book system and electronic book production method
CN101908218A (en) Editing equipment and method for arranging
CN116402020A (en) Signature imaging processing method, system and storage medium based on OFD document
CN114663897A (en) Table extraction method and table extraction system
CN103488619A (en) Method and device for processing document file
JP5950700B2 (en) Image processing apparatus, image processing method, and program
JP5769131B2 (en) Image processing apparatus and program
CN104536947A (en) Layout document processing method and device
CN111444452A (en) Conversion method, device and storage medium of webpage
CN111476090A (en) Watermark identification method and device
CN115114481A (en) Document format conversion method, device, storage medium and equipment
JP6030915B2 (en) Image rearrangement method, image rearrangement system, and image rearrangement program
CN117291152A (en) Table extraction method and apparatus
CN111104871B (en) Form region identification model generation method and device and form positioning method and device
JP2021140831A (en) Document image processing system, document image processing method, and document image processing program
CN113378526A (en) PDF paragraph processing method, device, storage medium and equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant