CN104536947A - Layout document processing method and device - Google Patents

Publication number
CN104536947A
Authority
CN
China
Legal status
Pending
Application number
CN201410753650.3A
Inventor
薛璐影
刘水
Current Assignee
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201410753650.3A
Publication of CN104536947A


Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a layout document processing method and device. The metadata of the text content in a layout document is acquired and merged to obtain presentation row data for at least one row. The presentation row data of each row includes the row identification information of that row and the metadata of the text content in that row, so that the layout document can be presented in a reflowable (streaming) presentation mode according to the presentation row data of each row. No manual intervention is required, the operation is simple, and accuracy is high, which improves the efficiency and reliability of layout document processing.

Description

Layout document processing method and device
[Technical field]
The present invention relates to document processing technology, and in particular to a method and device for processing layout documents.
[Background]
A layout document is a document that conforms to a layout document format specification. It uses an absolute description method: the display position, style, and size of every page element are explicitly recorded in a self-defined coordinate system. A layout document is therefore an electronic file format with a fixed page appearance, and its presentation is independent of the terminal: the page looks the same whether it is read, printed, or output on any terminal. More and more e-books, product manuals, company announcements, online materials, and e-mails use layout documents; the Portable Document Format (PDF) document, for example, is a typical layout document.
As the variety of terminals grows, users need to view layout documents conveniently on many different terminals. Because terminal display sizes differ greatly, especially the screens of the many mobile phone models, a layout document with a fixed layout may not display properly on some terminals. In the prior art, the content of a layout document is manually re-laid out for the display size of each terminal, i.e., the display position, style, and size of every page element are recalculated, producing layout documents suited to different terminals.
However, manual re-layout is complex and error-prone, which reduces the efficiency and reliability of layout document processing.
[Summary of the invention]
Aspects of the present invention provide a method and device for processing layout documents, so as to improve the efficiency and reliability of layout document processing.
One aspect of the present invention provides a method for processing a layout document, comprising:
Acquiring the metadata of the text content included in the layout document;
Merging the metadata of the text content to obtain presentation row data of at least one row, where the presentation row data of each row includes the row identification information of that row and the metadata of the text content included in that row;
Presenting the layout document in a reflowable presentation mode according to the presentation row data of each row.
In accordance with the above aspect and any possible implementation, an implementation is further provided in which merging the metadata of the text content to obtain presentation row data of at least one row comprises:
Merging the metadata of the text content to obtain initial row data of at least one row, where the initial row data of each row includes the row identification information of that row and the metadata of the text content included in that row;
Adjusting the order of the text-content metadata included in the initial row data of each row to obtain the presentation row data of each row.
In accordance with the above aspect and any possible implementation, an implementation is further provided in which presenting the layout document in a reflowable presentation mode according to the presentation row data of each row comprises:
Merging the presentation row data of the at least one row to obtain presentation module data of at least one module, where the presentation module data of each module includes the module identification information of that module and the presentation row data of the rows included in that module;
Presenting the layout document in a reflowable presentation mode according to the presentation module data of each module.
In accordance with the above aspect and any possible implementation, an implementation is further provided in which merging the presentation row data of the at least one row to obtain presentation module data of at least one module comprises:
Merging the presentation row data of the at least one row to obtain initial module data of at least one module, where the initial module data of each module includes the module identification information of that module and the presentation row data of the rows included in that module;
Adjusting the order of the presentation row data of the rows included in each module to obtain the presentation module data of each module.
In accordance with the above aspect and any possible implementation, an implementation is further provided in which presenting the layout document in a reflowable presentation mode according to the presentation module data of each module comprises:
Acquiring the metadata of the image content included in the layout document;
Inserting the metadata of the image content into the presentation module data of the corresponding module, according to the presentation module data of each module and the metadata of the image content, to obtain combined module data of that corresponding module;
Presenting the layout document in a reflowable presentation mode according to the combined module data of the corresponding module and the presentation module data of the other modules among the at least one module.
In accordance with the above aspect and any possible implementation, an implementation is further provided in which, before the layout document is presented in a reflowable presentation mode according to the presentation module data of each module, the method further comprises:
Adjusting the presentation module data of the modules that describe peripheral attributes of the layout document.
Another aspect of the present invention provides a device for processing a layout document, comprising:
An acquiring unit, configured to acquire the metadata of the text content included in the layout document;
A merging unit, configured to merge the metadata of the text content to obtain presentation row data of at least one row, where the presentation row data of each row includes the row identification information of that row and the metadata of the text content included in that row;
A presenting unit, configured to present the layout document in a reflowable presentation mode according to the presentation row data of each row.
In accordance with the above aspect and any possible implementation, an implementation is further provided in which the merging unit is specifically configured to:
Merge the metadata of the text content to obtain initial row data of at least one row, where the initial row data of each row includes the row identification information of that row and the metadata of the text content included in that row; and
Adjust the order of the text-content metadata included in the initial row data of each row to obtain the presentation row data of each row.
In accordance with the above aspect and any possible implementation, an implementation is further provided in which the presenting unit is specifically configured to:
Merge the presentation row data of the at least one row to obtain presentation module data of at least one module, where the presentation module data of each module includes the module identification information of that module and the presentation row data of the rows included in that module; and
Present the layout document in a reflowable presentation mode according to the presentation module data of each module.
In accordance with the above aspect and any possible implementation, an implementation is further provided in which the presenting unit is specifically configured to:
Merge the presentation row data of the at least one row to obtain initial module data of at least one module, where the initial module data of each module includes the module identification information of that module and the presentation row data of the rows included in that module; and
Adjust the order of the presentation row data of the rows included in each module to obtain the presentation module data of each module.
In accordance with the above aspect and any possible implementation, an implementation is further provided in which the presenting unit is specifically configured to:
Acquire the metadata of the image content included in the layout document;
Insert the metadata of the image content into the presentation module data of the corresponding module, according to the presentation module data of each module and the metadata of the image content, to obtain combined module data of that corresponding module; and
Present the layout document in a reflowable presentation mode according to the combined module data of the corresponding module and the presentation module data of the other modules among the at least one module.
In accordance with the above aspect and any possible implementation, an implementation is further provided in which the presenting unit is further configured to:
Adjust the presentation module data of the modules that describe peripheral attributes of the layout document.
As can be seen from the above technical solutions, the embodiments of the present invention acquire the metadata of the text content included in a layout document and merge it to obtain presentation row data of at least one row, where the presentation row data of each row includes the row identification information of that row and the metadata of the text content included in that row. The layout document can then be presented in a reflowable presentation mode according to the presentation row data of each row. No manual intervention is required, the operation is simple, and accuracy is high, which improves the efficiency and reliability of layout document processing.
In addition, with the technical solution provided by the present invention, presenting the layout document in a reflowable mode means its page appearance is no longer fixed and uneditable; instead, it can be adapted flexibly to the display size of each terminal, which improves the flexibility of layout document processing.
In addition, the presentation module data of the modules that describe the peripheral attributes of the layout document is adjusted (for example, modified or deleted), and the layout document is then presented in a reflowable mode according to the adjusted presentation module data of each module. This avoids errors in the content units that describe those peripheral attributes, which could otherwise appear when the document is reflowed on terminals whose display sizes differ greatly, and thus improves the efficiency and reliability of layout document processing.
[Brief description of the drawings]
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the embodiments or the prior-art description are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a layout document processing method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a layout document processing device according to another embodiment of the present invention.
[Detailed description of embodiments]
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
It should be noted that the terminals involved in the embodiments of the present invention may include, but are not limited to, mobile phones, personal digital assistants (PDA), wireless handheld devices, tablet computers, personal computers (PC), MP3 players, MP4 players, and wearable devices (for example, smart glasses, smart watches, and smart bands).
In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean A alone, both A and B, or B alone. The character "/" herein generally indicates an "or" relationship between the objects before and after it.
Fig. 1 is a schematic flowchart of a layout document processing method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps.
101: Acquire the metadata of the text content included in the layout document.
102: Merge the metadata of the text content to obtain presentation row data of at least one row, where the presentation row data of each row includes the row identification information of that row and the metadata of the text content included in that row.
103: Present the layout document in a reflowable presentation mode according to the presentation row data of each row.
The obtained presentation row data of each row can then also be used to edit the presented layout document.
It should be noted that steps 101 to 103 may be executed by an application located on the local terminal; by a functional unit, such as a plug-in or Software Development Kit (SDK), within an application on the local terminal; by a processing engine on a server on the network side; or by a distributed system on the network side. This embodiment imposes no particular limitation on this.
It can be understood that the application may be a native program (nativeApp) installed on the terminal or a web program (webApp) running in a browser on the terminal; this embodiment imposes no particular limitation on this.
In this way, the metadata of the text content included in the layout document is acquired and merged to obtain presentation row data of at least one row, each containing the row identification information of that row and the metadata of the text content in that row, so that the layout document can be presented in a reflowable mode according to each row's presentation row data. No manual intervention is required, the operation is simple, and accuracy is high, which improves the efficiency and reliability of layout document processing.
Optionally, in a possible implementation of this embodiment, the text-content metadata acquired in 101 may include, but is not limited to, at least one of the metadata of the text content on the table of contents pages of the layout document and the metadata of the text content on its body text pages; this embodiment imposes no particular limitation on this.
A layout document is a document that conforms to a layout document format specification. It uses an absolute description method that explicitly records the display position, size, and style of each page element in a self-defined coordinate system. The metadata of a layout document may include, but is not limited to, the position, size, color, style, and similar data of each page element (such as text, images, or hyperlinks).
Typically, the text content of a layout document can be divided into several content units. A content unit may be a single character or several characters; this embodiment imposes no particular limitation on this. Each content unit is represented by one group of metadata.
In this embodiment, the metadata of a layout document may include at least one of text-content metadata and image-content metadata, and it may be stored in any of several formats, for example JavaScript Object Notation (JSON).
JSON is a lightweight data representation format. It records data as key-value pairs, which is more intuitive and concise than Extensible Markup Language (XML).
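As an illustration only, the text-content metadata of a page might be stored as a JSON array with one object per content unit. The field names below (text, x, y, width, height, font, size, color, style) are hypothetical assumptions, not taken from the patent. The sketch parses the array with Python's standard json module and adds a row flag _line initialized to 0.

```python
import json

# Hypothetical metadata for two content units on one page; field names are
# illustrative assumptions, not a format defined by the patent.
RAW = """
[
  {"text": "Layout", "x": 72.0, "y": 100.0, "width": 40.0, "height": 12.0,
   "font": "SimSun", "size": 12.0, "color": "#000000", "style": "regular"},
  {"text": "document", "x": 115.0, "y": 100.5, "width": 55.0, "height": 12.0,
   "font": "SimSun", "size": 12.0, "color": "#000000", "style": "regular"}
]
"""


def load_units(raw: str) -> list[dict]:
    """Parse content-unit metadata and add a row flag initialised to 0."""
    units = json.loads(raw)
    for unit in units:
        unit.setdefault("_line", 0)  # row the unit belongs to (0 = not yet assigned)
    return units


units = load_units(RAW)
print(len(units), units[0]["text"], units[0]["_line"])  # 2 Layout 0
```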
A table of contents page is a page that contains structured directory information: for example, a page containing a keyword with directory characteristics, such as "目录" (table of contents) or "Contents", or a page containing character sequences with directory characteristics, such as "……XX" or "------XX", where XX is a page number written in Arabic numerals, English numeral characters, or the like.
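The keyword and leader-character criteria above can be sketched as a simple heuristic, assuming each page is available as a list of text lines. The keyword list, the regular expression (leader dots, ellipses, or dashes followed by an Arabic or Roman page number), and the min_hits parameter are hypothetical choices, not values from the source.

```python
import re

# Title keywords suggesting a table of contents page (illustrative list).
TOC_KEYWORDS = ("目录", "contents", "table of contents")
# A run of leader dots, ellipsis characters, or dashes followed by a page
# number in Arabic digits or Roman numerals at the end of the line.
LEADER_PATTERN = re.compile(
    r"(?:\.{3,}|…{2,}|-{3,})\s*(?:\d+|[ivxlcdm]+)\s*$", re.IGNORECASE
)


def is_toc_page(lines: list[str], min_hits: int = 2) -> bool:
    """Heuristically decide whether a page is a table of contents page."""
    # A line consisting only of a directory keyword is treated as a TOC title.
    if any(line.strip().lower() in TOC_KEYWORDS for line in lines):
        return True
    # Otherwise require several lines that end in leader characters plus a page number.
    hits = sum(1 for line in lines if LEADER_PATTERN.search(line))
    return hits >= min_hits
```

Requiring at least two leader lines, rather than one, is meant to keep a stray dotted line on a body text page from triggering the rule.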
A body text page is a page that contains body text.
Optionally, in a possible implementation of this embodiment, in 102 the metadata of the text content can be merged to obtain initial row data of at least one row, where the initial row data of each row includes the row identification information of that row and the metadata of the text content included in that row; the order of the text-content metadata included in the initial row data of each row is then adjusted to obtain the presentation row data of each row.
Specifically, a new flag field, for example a _line field with an initial value of 0, can be added to the text-content metadata to indicate the row to which each content unit of the text content belongs.
In one specific implementation, the text-content metadata can be merged according to the position data and the feature data of the content units included in the text content, to obtain initial row data of at least one row.
The content units are determined from the metadata of the layout document; a content unit may be a single character or several characters, and this embodiment imposes no particular limitation on this. When the layout document is typeset, each pre-divided content unit is treated as a whole, and its corresponding feature data can be set. The feature data may include, but is not limited to, at least one of position, font, size, color, style, and typesetting format; this embodiment imposes no particular limitation on this.
The font of the characters in a content unit refers to their appearance, i.e., the typeface in which the characters are drawn, for example Song, Kai (regular script), or Li (clerical script).
The size of the characters in a content unit refers to how large the characters are, for example Chinese size No. 4 (14 pt), small No. 4 (12 pt), or No. 5 (10.5 pt).
The color of the characters in a content unit refers to the color in which they are drawn, for example red or blue.
The style of the characters in a content unit refers to their emphasis style, for example bold or italic.
The typesetting format of the characters in a content unit refers to how the characters are arranged, for example centered, at most S characters per row (S being an integer greater than or equal to 1), or no punctuation mark at the end of each row.
For example, suppose the position data of a content unit is denoted (X, Y), the coordinates of the upper-left corner of the content unit on the page, where X is the horizontal coordinate and Y is the vertical coordinate. Content units whose Y coordinates differ by no more than a preset Y-coordinate threshold can then be merged into one row.
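The Y-threshold rule can be sketched as follows, assuming each content unit is a dict with hypothetical keys x, y, and text. The sketch sorts units top to bottom, compares each unit's Y with the first unit of the current row (one possible reading of "the difference of Y coordinates"), and writes the resulting row number into the _line flag. The default threshold value is an assumption.

```python
def group_rows_by_y(units: list[dict], y_threshold: float = 2.0) -> list[list[dict]]:
    """Merge content units whose Y coordinates differ by at most y_threshold."""
    rows: list[list[dict]] = []
    for unit in sorted(units, key=lambda u: (u["y"], u["x"])):
        # Compare against the first unit of the row currently being built.
        if rows and abs(unit["y"] - rows[-1][0]["y"]) <= y_threshold:
            rows[-1].append(unit)
        else:
            rows.append([unit])
    for line_id, row in enumerate(rows, start=1):
        for unit in row:
            unit["_line"] = line_id  # row identification stored in the flag field
    return rows


units = [
    {"text": "B", "x": 20, "y": 100.5},
    {"text": "A", "x": 10, "y": 100},
    {"text": "C", "x": 10, "y": 120},
]
rows = group_rows_by_y(units)
```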
In another specific implementation, because typesetting formats in layout documents vary widely (for example, multi-column layouts), content units that lie on the same line by position do not necessarily belong to the same row in terms of content. Therefore, other feature data of the content units can further be used to correct the merged rows; such feature data may include, but is not limited to, at least one of the font, size, color, style, and typesetting format of the characters in the content units, and this embodiment imposes no particular limitation on this.
In one optional correction method, the X coordinates of the content units are used to correct the merged rows. Specifically, the difference between the X coordinates of every two adjacent content units within each row is calculated. If a difference is less than or equal to a preset X-coordinate threshold, for example 1.2 times the average font size, the current merge result is kept; if it is greater than the threshold, the merge of those two adjacent content units is undone.
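A minimal sketch of the X-coordinate correction, assuming each content unit carries hypothetical x and size fields. Following the example above, the threshold is 1.2 times the row's average font size; wherever two adjacent units are farther apart than that, the row is split, which in a multi-column page would separate the columns.

```python
def split_row_by_x_gap(row: list[dict], factor: float = 1.2) -> list[list[dict]]:
    """Undo merges between adjacent units whose X difference exceeds the threshold."""
    if not row:
        return []
    ordered = sorted(row, key=lambda u: u["x"])
    # Threshold = factor * average font size of the units in this row.
    threshold = factor * sum(u["size"] for u in ordered) / len(ordered)
    pieces = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur["x"] - prev["x"] <= threshold:
            pieces[-1].append(cur)  # close enough: keep the merge
        else:
            pieces.append([cur])  # gap too large: probably another column
    return pieces


row = [{"x": x, "size": 10} for x in (20, 0, 110, 10, 100)]
pieces = split_row_by_x_gap(row)
```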
In another optional correction method, the font or style of the characters of the content units within each row is used to correct the merged rows. Specifically, the font or style of the characters of each content unit within a row is obtained. If the fonts or styles within the row are consistent, the current merge result is kept; if the font or style of one or several content units in a row is inconsistent with that of the other content units in the row, the merge result for that content unit or those content units is undone.
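One way to decide which units are "inconsistent" is to treat the most common (font, style) pair in the row as the row's own and remove units that differ from it. This majority rule and the field names font and style are assumptions for illustration; the source does not say how the reference font or style is chosen.

```python
from collections import Counter


def filter_row_by_font_style(row: list[dict]) -> tuple[list[dict], list[dict]]:
    """Keep units matching the row's dominant (font, style); return the rest separately."""
    if not row:
        return [], []
    counts = Counter((u["font"], u["style"]) for u in row)
    dominant, _ = counts.most_common(1)[0]  # assumed reference: the majority pair
    kept = [u for u in row if (u["font"], u["style"]) == dominant]
    removed = [u for u in row if (u["font"], u["style"]) != dominant]
    return kept, removed


row = [
    {"text": "a", "font": "SimSun", "style": "regular"},
    {"text": "b", "font": "SimSun", "style": "regular"},
    {"text": "c", "font": "KaiTi", "style": "bold"},
]
kept, removed = filter_row_by_font_style(row)
```

The removed units can then be regrouped into rows of their own.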
After the order of the text-content metadata in the initial row data of each row has been adjusted to obtain the presentation row data of each row, the start position of the row, the end position of the row, the row width, the line spacing between the row and its adjacent rows, and similar values can further be calculated from the metadata of the content units within the row and added to the presentation row data of that row, to serve as the merging basis for the subsequent merging step.
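A sketch of these per-row values, assuming each content unit has hypothetical x, y, width, and height fields (Y increasing downward) and that rows are already ordered top to bottom. The definitions used here are assumptions: start is the smallest X, end is the largest right edge, and spacing is the vertical gap from the row's bottom edge to the next row's top edge.

```python
def row_geometry(rows: list[list[dict]]) -> list[dict]:
    """Compute start, end, width, and spacing to the next row for each row."""
    summaries = []
    for row in rows:
        start = min(u["x"] for u in row)
        end = max(u["x"] + u["width"] for u in row)
        top = min(u["y"] for u in row)
        bottom = max(u["y"] + u["height"] for u in row)
        summaries.append({"start": start, "end": end, "width": end - start,
                          "top": top, "bottom": bottom, "spacing": None})
    for cur, nxt in zip(summaries, summaries[1:]):
        cur["spacing"] = nxt["top"] - cur["bottom"]  # gap to the following row
    return summaries


rows = [
    [{"x": 10, "y": 100, "width": 20, "height": 12},
     {"x": 35, "y": 100, "width": 15, "height": 12}],
    [{"x": 10, "y": 118, "width": 30, "height": 12}],
]
geo = row_geometry(rows)
```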
In another specific implementation, because the metadata of the text content in a layout document is stored in no particular order, the content units within each merged row may not follow the order of the actual content. Therefore, the metadata of each content unit, such as its actual content and its X coordinate, must further be used to adjust the order of the text-content metadata in the initial row data of each row, to obtain the presentation row data of each row.
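A minimal sketch of this reordering, assuming a left-to-right script so that sorting by the hypothetical x field restores reading order. The resulting row record (line_id, units, text) is an illustrative layout, not one defined by the source. Concatenating without separators suits Chinese text; English words would need spaces inserted between units.

```python
def build_row_data(line_id: int, units: list[dict]) -> dict:
    """Order a row's units left to right and package them with the row id."""
    ordered = sorted(units, key=lambda u: u["x"])
    return {
        "line_id": line_id,  # row identification information
        "units": ordered,    # content-unit metadata in reading order
        "text": "".join(u["text"] for u in ordered),
    }


row = build_row_data(3, [{"text": "文档", "x": 30}, {"text": "版式", "x": 10}])
```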
Optionally, in a possible implementation of this embodiment, in 103 the presentation row data of the at least one row can be merged to obtain presentation module data of at least one module, where the presentation module data of each module includes the module identification information of that module and the presentation row data of the rows included in that module; the layout document is then presented in a reflowable mode according to the presentation module data of each module.
Specifically, a new flag field, for example a _module field with an initial value of 0, can be added to the text-content metadata to indicate the module to which each content unit of the text content belongs.
In one specific implementation, the presentation row data of each row, for example the actual content of the row's content units, the line spacing, and the row width, can be used to merge the presentation row data of the at least one row into presentation module data of at least one module.
For example, the actual content of the content units in two adjacent rows can be obtained and compared to determine whether it is consistent. If it is, the two adjacent rows describe the same topic and can be merged into one module, and the presentation module data of that module is then obtained from the presentation row data of the rows it includes. If it is not, the two rows describe different topics, so they are not merged into one module; the comparison then continues with the next pair of adjacent rows, and so on, until the content of all rows has been processed.
In a concrete application, before determining whether the contents are consistent, each content can first be segmented into words. Word segmentation is a mature technique in this field. English text is already divided into words separated by spaces, so it is easy to segment. Chinese text is written as a continuous run of characters, so an existing method can be used, such as string-matching-based segmentation, understanding-based segmentation, or statistics-based segmentation; a commonly used example is forward maximum matching, a string-matching method. For details, refer to the relevant prior art; they are not repeated here.
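Forward maximum matching, the string-matching method named above, can be sketched as follows. The toy dictionary and the max_len default are assumptions for illustration; a real segmenter would use a large lexicon. At each position the longest dictionary word is taken, and a single character is accepted when nothing longer matches.

```python
def forward_max_match(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Segment text with forward maximum matching against a word dictionary."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


vocab = {"版式", "文档", "处理", "方法"}
words = forward_max_match("版式文档的处理方法", vocab)
```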
After segmentation, the resulting words are filtered to improve the efficiency of subsequent processing and to reduce noise. The filtering includes, but is not limited to, removing the words contained in a preset stop-word list. The stop-word list consists of function words, auxiliary words, pronouns, articles, adverbs, modal particles, and similar words identified in advance through word-frequency statistics; such words usually have little discriminative power on their own. The list can be built by collecting words whose frequency of occurrence in existing resources meets a preset high-frequency condition. For example, a common Chinese auxiliary word occurs very frequently but usually has very low discriminative power, so it is collected into the stop-word list.
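A sketch of building the stop-word list from a "high-frequency condition" and then filtering with it. The condition used here, a word's share of all word occurrences being at least min_ratio, is an assumed example; in practice the collected candidates would normally also be restricted to function words as described above.

```python
from collections import Counter


def build_stop_words(corpus: list[list[str]], min_ratio: float = 0.05) -> set[str]:
    """Collect words whose relative frequency in a segmented corpus reaches min_ratio."""
    counts = Counter(word for doc in corpus for word in doc)
    total = sum(counts.values())
    return {w for w, c in counts.items() if c / total >= min_ratio}


def remove_stop_words(words: list[str], stop_words: set[str]) -> list[str]:
    """Drop words contained in the stop-word list."""
    return [w for w in words if w not in stop_words]


corpus = [["版式", "的", "文档"], ["处理", "的", "方法"], ["的", "结果"]]
stop = build_stop_words(corpus, min_ratio=0.3)
filtered = remove_stop_words(["版式", "的", "文档"], stop)
```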
Specifically, once the segmentation results have been obtained, whether the two contents are consistent can be determined in various ways. For example, an existing text-similarity algorithm can be used to compute the similarity between the contents, and consistency is decided from that similarity; suitable methods include longest common substring, longest common subsequence, minimum edit distance, Hamming distance, and cosine similarity. For details, refer to the relevant prior art; they are not repeated here. This embodiment imposes no particular limitation on other specific operations.
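Of the listed methods, cosine similarity over bag-of-words counts can be sketched as follows. The decision threshold of 0.5 in same_topic is a hypothetical value; the source does not specify one.

```python
import math
from collections import Counter


def cosine_similarity(words_a: list[str], words_b: list[str]) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def same_topic(words_a: list[str], words_b: list[str], threshold: float = 0.5) -> bool:
    """Treat two rows as describing the same topic when similarity reaches the threshold."""
    return cosine_similarity(words_a, words_b) >= threshold
```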
Or, as another example, adjacent rows whose difference in line width is less than or equal to a preset line-width threshold may be merged into one module.
Or, as yet another example, adjacent rows whose difference in line spacing is less than or equal to a preset line-spacing threshold may be merged into one module.
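A sketch of merging adjacent rows into modules with the two threshold criteria above. The text presents them as alternatives; this example applies both together, and either can be switched off by passing `float("inf")`. The `Row` fields, the threshold defaults, and the choice to compare each gap with the previous gap inside the same module are all assumptions made for illustration. Rows are assumed to arrive in reading order.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Row:
    row_id: int
    top: float     # page coordinates, y grows downward
    bottom: float
    width: float


def group_rows(rows, width_threshold=20.0, spacing_threshold=4.0):
    """Merge adjacent rows into modules (hypothetical helper).

    A row joins the current module when its width differs from the previous row's
    by at most width_threshold and its line spacing differs from the previous
    spacing in the module by at most spacing_threshold.
    """
    modules: list[list[Row]] = []
    prev_gap = None  # spacing between the last two rows of the current module
    for i, row in enumerate(rows):
        if i == 0:
            modules.append([row])
            continue
        prev = rows[i - 1]
        gap = row.top - prev.bottom
        width_ok = abs(row.width - prev.width) <= width_threshold
        # The first gap inside a module only establishes the spacing baseline.
        spacing_ok = prev_gap is None or abs(gap - prev_gap) <= spacing_threshold
        if width_ok and spacing_ok:
            modules[-1].append(row)
            prev_gap = gap
        else:
            modules.append([row])
            prev_gap = None
    return modules
```

For example, three evenly spaced full-width rows followed by a row after a large gap, then a short row, yield three modules: the paragraph, the row after the gap, and the short row.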
In one specific implementation, the presentation row data of the at least one row may be merged to obtain initial module data of at least one module, where the initial module data of each module comprise the module identification information of that module and the presentation row data of the rows included in that module. The order of the presentation row data of the rows included in each module is then adjusted to obtain the presentation module data of each module.
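The two-step construction of module data (initial module data first, then reordering the rows inside each module) might look like the sketch below. The class names are hypothetical, sequential integers stand in for "module identification information", and sorting rows top-to-bottom then left-to-right is an assumed reordering criterion; the text does not say how the order is adjusted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RowData:
    row_id: int
    top: float
    left: float
    text: str


@dataclass
class ModuleData:
    module_id: int
    rows: list = field(default_factory=list)


def build_module_data(groups):
    """Turn grouped rows (initial module data) into presentation module data.

    groups: list of lists of RowData, one inner list per module, possibly out of order.
    Each module gets an id, and its rows are re-sorted into an assumed reading order.
    """
    modules = []
    for module_id, rows in enumerate(groups):
        ordered = sorted(rows, key=lambda r: (r.top, r.left))
        modules.append(ModuleData(module_id=module_id, rows=ordered))
    return modules
```

Reordering matters because text in a fixed-layout file such as a PDF is not necessarily stored in reading order.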
In another specific implementation, the metadata of the image content included in the layout document may be obtained. Then, according to the presentation module data of each module and the metadata of the image content, the metadata of the image content is inserted into the presentation module data of the corresponding module to obtain the combined module data of that module. The layout document is then presented in the streaming presentation mode according to the combined module data of the corresponding module and the presentation module data of the other modules, among the at least one module, other than the corresponding module.
For example, the text element corresponding to each image element may be determined according to the position data of the image elements included in the metadata of the image content and the position data of the text elements included in the metadata of the text content. The metadata of each image element is then inserted into the presentation module data of the module to which its corresponding text element belongs.
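A sketch of pairing images with text elements by position and inserting them into module data. The matching rule used here (an image follows the text element whose bottom edge is closest above the image's top edge, falling back to the topmost element) is an assumption; the text only says that position data are used. All class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TextElement:
    element_id: int  # unique across the document
    module_id: int   # module this text element belongs to
    top: float       # page coordinates, y grows downward
    bottom: float


@dataclass
class ImageElement:
    image_id: str
    top: float
    bottom: float


def match_images(images, texts):
    """Pair each image with the text element whose bottom edge is closest above it.

    If no text element lies above an image, the topmost text element is used.
    """
    anchors = {}
    if not texts:
        return anchors
    for img in images:
        above = [t for t in texts if t.bottom <= img.top]
        if above:
            anchors[img.image_id] = max(above, key=lambda t: t.bottom)
        else:
            anchors[img.image_id] = min(texts, key=lambda t: t.top)
    return anchors


def insert_images(module_items, images, texts):
    """Build combined module data: each image id follows its anchor text element.

    module_items maps module_id -> ordered list of text element ids.
    Modules without images come back unchanged.
    """
    anchors = match_images(images, texts)
    attached: dict[int, list[str]] = {}
    for img in sorted(images, key=lambda i: i.top):  # keep images top-to-bottom
        anchor = anchors.get(img.image_id)
        if anchor is not None:
            attached.setdefault(anchor.element_id, []).append(img.image_id)
    combined = {}
    for module_id, items in module_items.items():
        out = []
        for element_id in items:
            out.append(element_id)
            out.extend(attached.get(element_id, []))
        combined[module_id] = out
    return combined
```

Only the modules that receive an image change. This mirrors the split between "combined module data" for those modules and unchanged presentation module data for the rest.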
Because terminal display devices differ greatly in size, especially the screens of the many mobile phone models, presenting a layout document in the streaming presentation mode may cause errors in the content elements that describe peripheral attributes of the layout document, such as headers, footers, side notes and annotations. Optionally, in another specific implementation, before the layout document is presented in the streaming presentation mode according to the presentation module data of each module, the presentation module data of the modules describing the peripheral attributes of the layout document may further be adjusted, for example modified or deleted. The layout document can then be presented in the streaming presentation mode according to the adjusted presentation module data of each module. This avoids errors in the content elements describing peripheral attributes that would otherwise arise, because of the large differences in terminal display sizes, when the layout document is presented in the streaming presentation mode, thereby improving the efficiency and reliability of layout document processing.
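One way to implement the "delete" variant of this adjustment is a simple positional heuristic: treat modules lying entirely within a top or bottom margin band of the page as headers or footers and drop them before reflow. The `Module` class, the function name, and the 6% margin band are illustrative assumptions; the text does not specify how peripheral modules are identified.

```python
from dataclasses import dataclass


@dataclass
class Module:
    module_id: int
    top: float     # page coordinates, y grows downward
    bottom: float
    text: str


def drop_peripheral_modules(modules, page_height, margin_ratio=0.06):
    """Remove modules lying entirely inside the top or bottom margin band.

    Such modules are assumed to be headers or footers; everything else is kept in order.
    """
    top_band = page_height * margin_ratio
    bottom_band = page_height * (1 - margin_ratio)
    kept = []
    for m in modules:
        is_header = m.bottom <= top_band
        is_footer = m.top >= bottom_band
        if not (is_header or is_footer):
            kept.append(m)
    return kept
```

A position test alone can misfire, for instance on a short body paragraph near the page edge. The cross-page content comparison discussed earlier (word segmentation plus a similarity check) could confirm that a candidate repeats on every page before it is dropped.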
In this embodiment, the metadata of the text content included in a layout document are obtained and merged to obtain presentation row data of at least one row, where the presentation row data of each row comprise the row identification information of that row and the metadata of the text content included in that row. The layout document can then be presented in the streaming presentation mode according to the presentation row data of each row. No manual intervention is required, the operation is simple, and the accuracy is high, which improves the efficiency and reliability of layout document processing.
In addition, with the technical solution provided by the present invention, the layout document is presented in the streaming presentation mode, so its page layout is no longer fixed and non-editable; instead it can be adapted flexibly to the size of the terminal's display device, which improves the flexibility of layout document processing.
In addition, by adjusting the presentation module data of the modules describing the peripheral attributes of the layout document (for example, modifying or deleting them), the layout document can be presented in the streaming presentation mode according to the adjusted presentation module data of each module. This avoids errors in the content elements describing peripheral attributes that would otherwise arise, because terminal display sizes differ greatly, when the layout document is presented in the streaming presentation mode, thereby improving the efficiency and reliability of layout document processing.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. However, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
Fig. 2 is a schematic structural diagram of a layout document processing apparatus provided by another embodiment of the present invention. As shown in Fig. 2, the layout document processing apparatus of this embodiment may include an acquiring unit 21, a merging unit 22 and a presentation unit 23. The acquiring unit 21 is configured to obtain metadata of the text content included in a layout document. The merging unit 22 is configured to merge the metadata of the text content to obtain presentation row data of at least one row, where the presentation row data of each row comprise the row identification information of that row and the metadata of the text content included in that row. The presentation unit 23 is configured to present the layout document in the streaming presentation mode according to the presentation row data of each row.
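The three units of Fig. 2 can be wired together as a small pipeline, sketched below. This is an illustration, not the patent's design. The word-level `TextItem` metadata, the baseline tolerance used to form rows, the left-to-right reordering, and reflowing with `textwrap.wrap` to a character width are all assumptions. A real acquiring unit would pull text positions out of the layout file with a PDF parsing library rather than receive them as dictionaries.

```python
from __future__ import annotations

import textwrap
from dataclasses import dataclass


@dataclass
class TextItem:
    """Hypothetical text metadata: one word and its position on the page."""
    text: str
    x: float  # left edge
    y: float  # baseline, y grows downward


@dataclass
class RowData:
    """Presentation row data: row identification plus the row's text metadata."""
    row_id: int
    items: list[TextItem]

    @property
    def text(self) -> str:
        return " ".join(item.text for item in self.items)


class AcquiringUnit:
    def acquire(self, page: list[dict]) -> list[TextItem]:
        # Stand-in for parsing the layout file: metadata arrive as plain dicts.
        return [TextItem(d["text"], d["x"], d["y"]) for d in page]


class MergingUnit:
    def __init__(self, y_tolerance: float = 2.0):
        self.y_tolerance = y_tolerance

    def merge(self, items: list[TextItem]) -> list[RowData]:
        # Initial row data: items whose baselines lie within y_tolerance share a row.
        rows: list[list[TextItem]] = []
        for item in sorted(items, key=lambda it: it.y):
            if rows and abs(rows[-1][0].y - item.y) <= self.y_tolerance:
                rows[-1].append(item)
            else:
                rows.append([item])
        # Presentation row data: reorder each row's items left to right.
        return [RowData(row_id, sorted(r, key=lambda it: it.x)) for row_id, r in enumerate(rows)]


class PresentationUnit:
    def present(self, rows: list[RowData], width: int) -> list[str]:
        # Streaming presentation: drop the fixed line breaks and reflow to the screen width.
        return textwrap.wrap(" ".join(r.text for r in rows), width)


if __name__ == "__main__":
    page = [
        {"text": "documents", "x": 50, "y": 100},
        {"text": "Layout", "x": 0, "y": 100.5},
        {"text": "reflow.", "x": 30, "y": 120.8},
        {"text": "can", "x": 0, "y": 120},
    ]
    rows = MergingUnit().merge(AcquiringUnit().acquire(page))
    print([r.text for r in rows])                # ['Layout documents', 'can reflow.']
    print(PresentationUnit().present(rows, 12))  # ['Layout', 'documents', 'can reflow.']
```

Changing the `width` argument is all it takes to re-render the same row data for a narrower or wider screen. That is the practical difference between streaming presentation and the fixed layout of the original file.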
In this way, the presentation row data of each row, built from the metadata obtained by the acquiring unit 21, can be used to edit the layout document presented by the presentation unit 23.
It should be noted that the layout document processing apparatus provided by this embodiment may be an application located on a local terminal; or a functional unit, such as a plug-in or a Software Development Kit (SDK), set in an application located on a local terminal; or a processing engine located in a server on the network side; or a distributed system located on the network side. This embodiment places no particular limitation on this.
It can be understood that the application may be a native program (nativeApp) installed on the terminal, or a web program (webApp) running in a browser on the terminal. This embodiment places no particular limitation on this.
Optionally, in one possible implementation of this embodiment, the merging unit 22 may be specifically configured to merge the metadata of the text content to obtain initial row data of at least one row, where the initial row data of each row comprise the row identification information of that row and the metadata of the text content included in that row; and to adjust the order of the metadata of the text content included in the initial row data of each row to obtain the presentation row data of each row.
Optionally, in one possible implementation of this embodiment, the presentation unit 23 may be specifically configured to merge the presentation row data of the at least one row to obtain presentation module data of at least one module, where the presentation module data of each module comprise the module identification information of that module and the presentation row data of the rows included in that module; and to present the layout document in the streaming presentation mode according to the presentation module data of each module.
In one specific implementation, the presentation unit 23 may be specifically configured to merge the presentation row data of the at least one row to obtain initial module data of at least one module, where the initial module data of each module comprise the module identification information of that module and the presentation row data of the rows included in that module; and to adjust the order of the presentation row data of the rows included in each module to obtain the presentation module data of each module.
In another specific implementation, the presentation unit 23 may be specifically configured to obtain the metadata of the image content included in the layout document; to insert, according to the presentation module data of each module and the metadata of the image content, the metadata of the image content into the presentation module data of the corresponding module to obtain the combined module data of that module; and to present the layout document in the streaming presentation mode according to the combined module data of the corresponding module and the presentation module data of the other modules, among the at least one module, other than the corresponding module.
In yet another specific implementation, the presentation unit 23 may further be configured to adjust the presentation module data of the modules describing the peripheral attributes of the layout document, so that the presentation unit 23 can present the layout document in the streaming presentation mode according to the adjusted presentation module data of each module.
It should be noted that the method in the embodiment corresponding to Fig. 1 can be implemented by the layout document processing apparatus provided in this embodiment. For a detailed description, see the related content of the embodiment corresponding to Fig. 1; it is not repeated here.
In this embodiment, the acquiring unit obtains the metadata of the text content included in a layout document, and the merging unit merges the metadata of the text content to obtain presentation row data of at least one row, where the presentation row data of each row comprise the row identification information of that row and the metadata of the text content included in that row. The presentation unit can then present the layout document in the streaming presentation mode according to the presentation row data of each row. No manual intervention is required, the operation is simple, and the accuracy is high, which improves the efficiency and reliability of layout document processing.
In addition, with the technical solution provided by the present invention, the layout document is presented in the streaming presentation mode, so its page layout is no longer fixed and non-editable; instead it can be adapted flexibly to the size of the terminal's display device, which improves the flexibility of layout document processing.
In addition, by adjusting the presentation module data of the modules describing the peripheral attributes of the layout document (for example, modifying or deleting them), the layout document can be presented in the streaming presentation mode according to the adjusted presentation module data of each module. This avoids errors in the content elements describing peripheral attributes that would otherwise arise, because terminal display sizes differ greatly, when the layout document is presented in the streaming presentation mode, thereby improving the efficiency and reliability of layout document processing.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above can be found in the corresponding processes of the foregoing method embodiments; they are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are merely intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1. A layout document processing method, characterized by comprising:
Obtaining metadata of text content included in a layout document;
Merging the metadata of the text content to obtain presentation row data of at least one row, wherein the presentation row data of each row of the at least one row comprise row identification information of that row and the metadata of the text content included in that row;
Presenting the layout document in a streaming presentation mode according to the presentation row data of each row.
2. The method according to claim 1, characterized in that merging the metadata of the text content to obtain the presentation row data of at least one row comprises:
Merging the metadata of the text content to obtain initial row data of at least one row, wherein the initial row data of each row of the at least one row comprise row identification information of that row and the metadata of the text content included in that row;
Adjusting the order of the metadata of the text content included in the initial row data of each row to obtain the presentation row data of each row.
3. The method according to claim 1 or 2, characterized in that presenting the layout document in the streaming presentation mode according to the presentation row data of each row comprises:
Merging the presentation row data of the at least one row to obtain presentation module data of at least one module, wherein the presentation module data of each module of the at least one module comprise module identification information of that module and the presentation row data of the rows included in that module;
Presenting the layout document in the streaming presentation mode according to the presentation module data of each module.
4. The method according to claim 3, characterized in that merging the presentation row data of the at least one row to obtain the presentation module data of at least one module comprises:
Merging the presentation row data of the at least one row to obtain initial module data of at least one module, wherein the initial module data of each module of the at least one module comprise module identification information of that module and the presentation row data of the rows included in that module;
Adjusting the order of the presentation row data of the rows included in each module to obtain the presentation module data of each module.
5. The method according to claim 3, characterized in that presenting the layout document in the streaming presentation mode according to the presentation module data of each module comprises:
Obtaining metadata of image content included in the layout document;
Inserting, according to the presentation module data of each module and the metadata of the image content, the metadata of the image content into the presentation module data of a corresponding module to obtain combined module data of the corresponding module;
Presenting the layout document in the streaming presentation mode according to the combined module data of the corresponding module and the presentation module data of the modules, among the at least one module, other than the corresponding module.
6. The method according to claim 3, characterized by further comprising, before presenting the layout document in the streaming presentation mode according to the presentation module data of each module:
Adjusting the presentation module data of the modules that describe peripheral attributes of the layout document.
7. A layout document processing apparatus, characterized by comprising:
An acquiring unit, configured to obtain metadata of text content included in a layout document;
A merging unit, configured to merge the metadata of the text content to obtain presentation row data of at least one row, wherein the presentation row data of each row of the at least one row comprise row identification information of that row and the metadata of the text content included in that row;
A presentation unit, configured to present the layout document in a streaming presentation mode according to the presentation row data of each row.
8. The apparatus according to claim 7, characterized in that the merging unit is specifically configured to:
Merge the metadata of the text content to obtain initial row data of at least one row, wherein the initial row data of each row of the at least one row comprise row identification information of that row and the metadata of the text content included in that row; and
Adjust the order of the metadata of the text content included in the initial row data of each row to obtain the presentation row data of each row.
9. The apparatus according to claim 7 or 8, characterized in that the presentation unit is specifically configured to:
Merge the presentation row data of the at least one row to obtain presentation module data of at least one module, wherein the presentation module data of each module of the at least one module comprise module identification information of that module and the presentation row data of the rows included in that module; and
Present the layout document in the streaming presentation mode according to the presentation module data of each module.
10. The apparatus according to claim 9, characterized in that the presentation unit is specifically configured to:
Merge the presentation row data of the at least one row to obtain initial module data of at least one module, wherein the initial module data of each module of the at least one module comprise module identification information of that module and the presentation row data of the rows included in that module; and
Adjust the order of the presentation row data of the rows included in each module to obtain the presentation module data of each module.
11. The apparatus according to claim 9, characterized in that the presentation unit is specifically configured to:
Obtain metadata of image content included in the layout document;
Insert, according to the presentation module data of each module and the metadata of the image content, the metadata of the image content into the presentation module data of a corresponding module to obtain combined module data of the corresponding module; and
Present the layout document in the streaming presentation mode according to the combined module data of the corresponding module and the presentation module data of the modules, among the at least one module, other than the corresponding module.
12. The apparatus according to claim 9, characterized in that the presentation unit is further configured to:
Adjust the presentation module data of the modules that describe peripheral attributes of the layout document.
CN201410753650.3A 2014-12-10 2014-12-10 Layout document processing method and device Pending CN104536947A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410753650.3A CN104536947A (en) 2014-12-10 2014-12-10 Layout document processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410753650.3A CN104536947A (en) 2014-12-10 2014-12-10 Layout document processing method and device

Publications (1)

Publication Number Publication Date
CN104536947A true CN104536947A (en) 2015-04-22

Family

ID=52852475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410753650.3A Pending CN104536947A (en) 2014-12-10 2014-12-10 Layout document processing method and device

Country Status (1)

Country Link
CN (1) CN104536947A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010014900A1 (en) * 2000-02-16 2001-08-16 Sun Microsystems, Inc. Method and system for separating content and layout of formatted objects
US7127673B2 (en) * 1999-12-21 2006-10-24 Fujitsu Limited Electronic document display system
CN101206639A (en) * 2007-12-20 2008-06-25 北大方正集团有限公司 Method for indexing complex impression based on PDF
CN101308488A (en) * 2008-06-05 2008-11-19 北大方正集团有限公司 Document stream type information processing method based on format document and device therefor
CN101887413A (en) * 2009-05-14 2010-11-17 北大方正集团有限公司 Structure processing method and system of plate type table
CN101923723A (en) * 2009-06-16 2010-12-22 汉王科技股份有限公司 Method for realizing display of electronic document

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108932221A (en) * 2017-05-25 2018-12-04 北大方正集团有限公司 File composition method and device based on blob
CN109597913A (en) * 2018-11-05 2019-04-09 东软集团股份有限公司 The method for being aligned document picture, device, storage medium and electronic equipment
CN109815453A (en) * 2018-12-25 2019-05-28 东软集团股份有限公司 Document method of partition, device, storage medium and electronic equipment
CN111695414A (en) * 2020-04-28 2020-09-22 北京奇艺世纪科技有限公司 Document processing method and device, electronic equipment and computer readable storage medium
CN111695414B (en) * 2020-04-28 2024-03-01 北京奇艺世纪科技有限公司 Document processing method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150422