CN104536947A - Layout document processing method and device - Google Patents
Layout document processing method and device
- Publication number
- CN104536947A (application CN201410753650.3A)
- Authority
- CN
- China
- Prior art keywords
- module
- row
- data
- exhibition
- metadata
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Transfer Between Computers (AREA)
Abstract
The invention provides a layout document processing method and device. Metadata of the text content in a layout document is acquired and merged to obtain presentation row data of at least one row, where the presentation row data of each row includes the row identification information of that row and the metadata of the text content in that row, so that the layout document can be presented in a stream presentation mode according to the presentation row data of each row. No manual intervention is needed, operation is simple, and accuracy is high, so the efficiency and reliability of layout document processing are improved.
Description
[Technical Field]
The present invention relates to document processing technology, and in particular to a method and device for processing a layout document.
[Background]
A layout document is a document that conforms to a layout document format specification. It uses an absolute description scheme that explicitly records, in a self-defined coordinate system, the position, style, and size of every page element. A layout document format is an electronic file format whose page presentation is fixed: presentation is independent of the terminal, so the page looks the same whether the document is read or printed on any terminal. More and more e-books, product manuals, company announcements, online materials, and e-mails use layout documents. The Portable Document Format (PDF) is a typical example.
As the variety of terminals grows, users need to present layout documents conveniently on many kinds of terminal. Because display sizes differ greatly, especially among mobile phone models, a layout document with a fixed layout may not display properly on some terminals. In the prior art, the content of the layout document can be manually re-laid out for each display size, that is, the position, style, and size of every page element are recalculated to form a layout document suited to each terminal.
However, manual layout is complicated and error-prone, which lowers the efficiency and reliability of layout document processing.
[Summary of the Invention]
Aspects of the present invention provide a method and device for processing a layout document, so as to improve the efficiency and reliability of layout document processing.
One aspect of the present invention provides a method for processing a layout document, comprising:
Acquiring metadata of the text content included in the layout document;
Merging the metadata of the text content to obtain presentation row data of at least one row, wherein the presentation row data of each row comprises the row identification information of that row and the metadata of the text content included in that row; and
Presenting the layout document in a stream presentation mode according to the presentation row data of each row.
In the above aspect and any possible implementation, an implementation is further provided in which merging the metadata of the text content to obtain the presentation row data of at least one row comprises:
Merging the metadata of the text content to obtain initial row data of at least one row, wherein the initial row data of each row comprises the row identification information of that row and the metadata of the text content included in that row; and
Adjusting the order of the metadata of the text content included in the initial row data of each row to obtain the presentation row data of each row.
In the above aspect and any possible implementation, an implementation is further provided in which presenting the layout document in a stream presentation mode according to the presentation row data of each row comprises:
Merging the presentation row data of the at least one row to obtain presentation module data of at least one module, wherein the presentation module data of each module comprises the module identification information of that module and the presentation row data of the rows included in that module; and
Presenting the layout document in a stream presentation mode according to the presentation module data of each module.
In the above aspect and any possible implementation, an implementation is further provided in which merging the presentation row data of the at least one row to obtain the presentation module data of at least one module comprises:
Merging the presentation row data of the at least one row to obtain initial module data of at least one module, wherein the initial module data of each module comprises the module identification information of that module and the presentation row data of the rows included in that module; and
Adjusting the order of the presentation row data of the rows included in each module to obtain the presentation module data of each module.
In the above aspect and any possible implementation, an implementation is further provided in which presenting the layout document in a stream presentation mode according to the presentation module data of each module comprises:
Acquiring metadata of the image content included in the layout document;
Inserting, according to the presentation module data of each module and the metadata of the image content, the metadata of the image content into the presentation module data of the corresponding module to obtain combined module data of the corresponding module; and
Presenting the layout document in a stream presentation mode according to the combined module data of the corresponding module and the presentation module data of the modules other than the corresponding module.
In the above aspect and any possible implementation, an implementation is further provided in which, before presenting the layout document in a stream presentation mode according to the presentation module data of each module, the method further comprises:
Adjusting the presentation module data of any module that describes a peripheral attribute of the layout document.
Another aspect of the present invention provides a device for processing a layout document, comprising:
An acquiring unit, configured to acquire metadata of the text content included in the layout document;
A merging unit, configured to merge the metadata of the text content to obtain presentation row data of at least one row, wherein the presentation row data of each row comprises the row identification information of that row and the metadata of the text content included in that row; and
A presenting unit, configured to present the layout document in a stream presentation mode according to the presentation row data of each row.
In the above aspect and any possible implementation, an implementation is further provided in which the merging unit is specifically configured to:
Merge the metadata of the text content to obtain initial row data of at least one row, wherein the initial row data of each row comprises the row identification information of that row and the metadata of the text content included in that row; and
Adjust the order of the metadata of the text content included in the initial row data of each row to obtain the presentation row data of each row.
In the above aspect and any possible implementation, an implementation is further provided in which the presenting unit is specifically configured to:
Merge the presentation row data of the at least one row to obtain presentation module data of at least one module, wherein the presentation module data of each module comprises the module identification information of that module and the presentation row data of the rows included in that module; and
Present the layout document in a stream presentation mode according to the presentation module data of each module.
In the above aspect and any possible implementation, an implementation is further provided in which the presenting unit is specifically configured to:
Merge the presentation row data of the at least one row to obtain initial module data of at least one module, wherein the initial module data of each module comprises the module identification information of that module and the presentation row data of the rows included in that module; and
Adjust the order of the presentation row data of the rows included in each module to obtain the presentation module data of each module.
In the above aspect and any possible implementation, an implementation is further provided in which the presenting unit is specifically configured to:
Acquire metadata of the image content included in the layout document;
Insert, according to the presentation module data of each module and the metadata of the image content, the metadata of the image content into the presentation module data of the corresponding module to obtain combined module data of the corresponding module; and
Present the layout document in a stream presentation mode according to the combined module data of the corresponding module and the presentation module data of the modules other than the corresponding module.
In the above aspect and any possible implementation, an implementation is further provided in which the presenting unit is further configured to:
Adjust the presentation module data of any module that describes a peripheral attribute of the layout document.
As the above technical solutions show, the embodiments of the present invention acquire the metadata of the text content included in a layout document and merge that metadata to obtain presentation row data of at least one row, where the presentation row data of each row comprises the row identification information of that row and the metadata of the text content included in that row. The layout document can then be presented in a stream presentation mode according to the presentation row data of each row. No manual work is needed, operation is simple, and accuracy is high, which improves the efficiency and reliability of layout document processing.
In addition, with the technical solutions provided by the present invention, presenting the layout document in a stream presentation mode means that its page presentation is no longer fixed and uneditable but can be edited flexibly according to the display size of the terminal, which improves the flexibility of layout document processing.
In addition, the presentation module data of any module that describes a peripheral attribute of the layout document is adjusted, for example modified or deleted, so that the layout document can be presented in a stream presentation mode according to the adjusted presentation module data of each module. This avoids the errors that can otherwise appear in content units describing peripheral attributes when a layout document is presented in a stream presentation mode on terminals whose display sizes differ greatly, and so improves the efficiency and reliability of layout document processing.
[Brief Description of the Drawings]
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for processing a layout document according to one embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a device for processing a layout document according to another embodiment of the present invention.
[Detailed Description]
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
It should be noted that the terminals involved in the embodiments of the present invention may include, but are not limited to, mobile phones, personal digital assistants (PDAs), wireless handheld devices, tablet computers, personal computers (PCs), MP3 players, MP4 players, and wearable devices (for example, smart glasses, smart watches, and smart bracelets).
In addition, the term "and/or" herein merely describes an association between objects and indicates that three relationships are possible. For example, "A and/or B" can mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" herein generally indicates an "or" relationship between the objects before and after it.
Fig. 1 is a schematic flowchart of a method for processing a layout document according to one embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps.
101. Acquire metadata of the text content included in the layout document.
102. Merge the metadata of the text content to obtain presentation row data of at least one row, wherein the presentation row data of each row comprises the row identification information of that row and the metadata of the text content included in that row.
103. Present the layout document in a stream presentation mode according to the presentation row data of each row.
In this way, the presentation row data obtained for each row can be used to edit the presented layout document.
It should be noted that steps 101 to 103 may be executed by an application on the local terminal, by a functional unit such as a plug-in or software development kit (SDK) within an application on the local terminal, by a processing engine on a network-side server, or by a network-side distributed system. This embodiment places no particular limitation on this.
It can be understood that the application may be a native program (native app) installed on the terminal or a web page program (web app) running in a browser on the terminal. This embodiment places no particular limitation on this.
In this way, the metadata of the text content included in the layout document is acquired and merged to obtain presentation row data of at least one row, where the presentation row data of each row comprises the row identification information of that row and the metadata of the text content included in that row. The layout document can then be presented in a stream presentation mode according to the presentation row data of each row. No manual work is needed, operation is simple, and accuracy is high, which improves the efficiency and reliability of layout document processing.
Optionally, in one possible implementation of this embodiment, the metadata of the text content acquired in 101 may include, but is not limited to, at least one of the metadata of the text content in the catalog pages of the layout document and the metadata of the text content in the text pages of the layout document. This embodiment places no particular limitation on this.
A layout document is a document that conforms to a layout document format specification. It uses an absolute description scheme that explicitly records, in a self-defined coordinate system, the position, size, and style of every page element. The metadata of a layout document may include, but is not limited to, data such as the position, size, color, and style of each page element (for example, text, images, or hyperlinks).
Typically, the text content of a layout document can be divided into several content units, each of which may be a single character or several characters. This embodiment places no particular limitation on this. Each content unit is represented by one set of metadata.
In this embodiment, the metadata of the layout document may include at least one of the metadata of the text content and the metadata of the image content, and it may be stored in any of several formats, for example JavaScript Object Notation (JSON).
JSON is a lightweight data representation. It records data as key-value pairs, which is intuitive and more concise than Extensible Markup Language (XML).
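As an illustration only, the metadata of a single content unit stored as JSON might look like the following. The patent does not define a schema, so every key name and value here (text, x, y, font, size, color, style) is an assumption chosen to mirror the attributes named in the text.

```python
import json

# Hypothetical JSON metadata for one content unit.
# All key names are assumptions; the source does not specify a schema.
unit_json = """
{
  "text": "Layout",
  "x": 72.0,
  "y": 96.5,
  "font": "SimSun",
  "size": 10.5,
  "color": "#000000",
  "style": "bold"
}
"""

# Parse the key-value pairs into a dictionary for later processing.
unit = json.loads(unit_json)
print(unit["text"], unit["x"], unit["y"])
```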
A catalog page is a page that contains structured table-of-contents information. Examples are a page containing a keyword characteristic of a table of contents, such as "catalogue" or "Contents", or a page containing characters characteristic of a table of contents, such as " ... ... ... XX ", "------------------XX " or "------------------XX ", where XX is a page number written, for example, in Arabic numerals or English numeric characters.
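One way to recognize such a page is a simple heuristic over its text lines. The sketch below is an assumption rather than the patent's method: the keyword list, the regular expression, and the function name looks_like_catalog_page are all hypothetical.

```python
import re

# Hypothetical keyword list; a real system would likely use a larger one.
KEYWORDS = ("contents", "catalogue")

# A leader of at least three dots or hyphens followed by a page number
# at the end of the line, e.g. "Chapter 1 ........ 12".
LEADER = re.compile(r"(?:\.\s*){3,}\d+\s*$|-{3,}\s*\d+\s*$")

def looks_like_catalog_page(lines):
    """Return True if any line is a catalog keyword or ends in leader + number."""
    for line in lines:
        if line.strip().lower() in KEYWORDS or LEADER.search(line):
            return True
    return False
```

This only checks the Arabic-numeral case; page numbers written as words would need an additional pattern.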
A text page is a page that contains body text.
Optionally, in one possible implementation of this embodiment, in 102 the metadata of the text content may be merged to obtain initial row data of at least one row, where the initial row data of each row comprises the row identification information of that row and the metadata of the text content included in that row. The order of the metadata of the text content included in the initial row data of each row is then adjusted to obtain the presentation row data of each row.
Specifically, a new flag may be added to the metadata of the text content to indicate the row to which each content unit of the text content belongs, for example a flag named _line with an initial value of 0.
In one specific implementation, the metadata of the text content may be merged according to the position data of the content units included in the text content, to obtain the initial row data of at least one row.
The content units are determined from the metadata of the layout document, and each may be a single character or several characters. This embodiment places no particular limitation on this. When a layout document is laid out, each pre-divided content unit is treated as a whole and is assigned its own feature data. The feature data may include, but is not limited to, at least one of position, font, size, color, style, and typesetting format. This embodiment places no particular limitation on this.
The font of the characters in a content unit is their typeface, that is, the outward appearance of the characters, for example Song typeface, regular script, or clerical script.
The size of the characters in a content unit is their point size, for example Size Four (14 pt), Small Four (12 pt), or Size Five (10.5 pt).
The color of the characters in a content unit is the color in which they are drawn, for example red or blue.
The style of the characters in a content unit is their emphasis, for example bold or italic.
The typesetting format of the characters in a content unit is how the characters are arranged, for example centered, at most S characters per line (S being an integer greater than or equal to 1), or no punctuation mark at the end of a line.
For example, suppose the position data of a content unit is denoted (X, Y), the coordinates of the content unit relative to the upper-left corner of the page, where X is the horizontal coordinate and Y is the vertical coordinate. Content units whose Y coordinates differ by no more than a preset Y-coordinate threshold can then be merged into one row.
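The Y-coordinate merge can be sketched as follows. This is a minimal illustration under assumptions: content units are dictionaries with hypothetical keys text, x, and y, the threshold value 2.0 is arbitrary, and each row is compared against its first unit.

```python
def merge_into_rows(units, y_threshold=2.0):
    """Group content units into rows when their Y coordinates are close.

    Row ids start at 1 so that 0 can keep meaning "not yet assigned"
    (an assumption based on the flag's stated initial value).
    """
    rows = []
    for unit in sorted(units, key=lambda u: u["y"]):
        # Compare against the first unit of the current row.
        if rows and abs(unit["y"] - rows[-1]["units"][0]["y"]) <= y_threshold:
            rows[-1]["units"].append(unit)
        else:
            rows.append({"row_id": len(rows) + 1, "units": [unit]})
    return rows

units = [
    {"text": "B", "x": 20.0, "y": 10.4},
    {"text": "C", "x": 10.0, "y": 30.0},
    {"text": "A", "x": 10.0, "y": 10.0},
]
rows = merge_into_rows(units)
print([[u["text"] for u in r["units"]] for r in rows])
```

Comparing against the first unit of a row, rather than the most recent one, keeps a slowly drifting sequence of Y values from chaining into one very tall row.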
In another specific implementation, layouts in layout documents vary widely (for example, multiple columns), so content units that fall on one line by position do not necessarily belong to the same line by content. Other feature data of the content units can therefore be used to correct the merged rows, including but not limited to at least one of the font, size, color, style, and typesetting format of the characters in the content units. This embodiment places no particular limitation on this.
In one optional correction method, the X coordinates of the content units are used to correct the merged rows. Specifically, the difference between the X coordinates of each pair of adjacent content units in a row is calculated. If the difference is less than or equal to a preset X-coordinate threshold, for example 1.2 times the average font size, the current merge result is kept. If it is greater than the threshold, the merge of these two adjacent content units is undone.
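A sketch of this X-coordinate correction is shown below. It assumes each content unit is a dictionary with hypothetical keys x and size, and it interprets "undoing the merge" as splitting the row at the oversized gap. The function names are illustrative only.

```python
def x_threshold(units, factor=1.2):
    """Threshold as a multiple of the average font size (1.2 per the example)."""
    return factor * sum(u["size"] for u in units) / len(units)

def split_row_on_gaps(units, threshold):
    """Split a merged row wherever adjacent units are too far apart in X."""
    if not units:
        return []
    ordered = sorted(units, key=lambda u: u["x"])
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur["x"] - prev["x"] <= threshold:
            groups[-1].append(cur)   # gap is small: keep the merge
        else:
            groups.append([cur])     # gap is large: undo the merge here
    return groups

row = [{"text": t, "x": x, "size": 10.0}
       for t, x in [("a", 10), ("b", 20), ("c", 30), ("d", 200), ("e", 210)]]
groups = split_row_on_gaps(row, x_threshold(row))
print([[u["text"] for u in g] for g in groups])
```

With single-character units the gap between neighbors is roughly one font size, so a threshold slightly above that separates ordinary text from a column break.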
In another optional correction method, the font or style of the characters of the content units in each row is used to correct the merged rows. Specifically, the font or style of the characters of the content units in each row is obtained. If the font or style is the same for all content units in the row, the current merge result is kept. If the font or style of one or more content units in a row differs from that of the other content units in the row, the merge of that content unit or those content units is undone.
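The font and style check might be sketched like this. Treating the most common (font, style) pair in the row as the reference is an assumption; the text only says that units inconsistent with the others are removed from the merge. The keys font and style are hypothetical.

```python
from collections import Counter

def split_by_style(units):
    """Separate units whose (font, style) differs from the row's dominant one.

    Returns (kept, removed). Using the most common pair as the reference
    is an assumption, not something the source specifies.
    """
    if not units:
        return [], []

    def style_key(u):
        return (u["font"], u["style"])

    dominant = Counter(style_key(u) for u in units).most_common(1)[0][0]
    kept = [u for u in units if style_key(u) == dominant]
    removed = [u for u in units if style_key(u) != dominant]
    return kept, removed
```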
After the order of the metadata of the text content included in the initial row data of each row has been adjusted to obtain the presentation row data of each row, the start position, line-break end position, and line width of the row, and the line spacing between the row and its adjacent rows, can further be calculated from the metadata of the content units in the row and added to the presentation row data of the row, to serve as the basis for subsequent merging.
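These per-row values can be derived from the content units roughly as follows. The sketch assumes each unit carries hypothetical keys x, y, and width, that rows are already sorted top to bottom, and that line spacing means the Y distance to the previous row.

```python
def add_row_geometry(rows):
    """Attach start, end, width, and spacing values to each row in place.

    Assumes rows are ordered top to bottom; all key names are hypothetical.
    """
    for row in rows:
        row["start_x"] = min(u["x"] for u in row["units"])
        row["end_x"] = max(u["x"] + u["width"] for u in row["units"])
        row["width"] = row["end_x"] - row["start_x"]
        row["y"] = min(u["y"] for u in row["units"])
    # Spacing is measured against the previous row, so the first row has none.
    for prev, cur in zip(rows, rows[1:]):
        cur["spacing_above"] = cur["y"] - prev["y"]
    return rows
```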
In another specific implementation, because the metadata of the text content in a layout document is stored in no particular order, the content units in a row obtained by merging may not follow the order of the actual content. The metadata of each content unit, for example its actual content and its X coordinate, is therefore used to adjust the order of the metadata of the text content included in the initial row data of each row, to obtain the presentation row data of each row.
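For left-to-right text, the simplest version of this reordering sorts the units of a row by X coordinate. The sketch below makes that assumption and uses hypothetical keys units, x, and text; it does not use the content-based cues that the text also mentions.

```python
def order_row(row):
    """Return a copy of the row with its units sorted left to right."""
    return {**row, "units": sorted(row["units"], key=lambda u: u["x"])}

def row_text(row):
    """Concatenate the text of the units in their current order."""
    return "".join(u["text"] for u in row["units"])

initial_row = {"row_id": 1, "units": [{"text": "out", "x": 40.0},
                                      {"text": "Lay", "x": 10.0}]}
print(row_text(order_row(initial_row)))
```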
Optionally, in one possible implementation of this embodiment, in 103 the presentation row data of the at least one row may be merged to obtain presentation module data of at least one module, where the presentation module data of each module comprises the module identification information of that module and the presentation row data of the rows included in that module. The layout document can then be presented in a stream presentation mode according to the presentation module data of each module.
Specifically, a new flag may be added to the metadata of the text content to indicate the module to which each content unit of the text content belongs, for example a flag named _module with an initial value of 0.
In one specific implementation, the presentation row data of each row, for example the actual content of the content units in the row, the line spacing, and the line width, can be used to merge the presentation row data of the at least one row and obtain the presentation module data of at least one module.
For example, the actual content of the content units in adjacent rows can be obtained and compared. If the content of the two rows is consistent, the two adjacent rows concern the same topic and can be merged into one module, and the current presentation module data of that module is obtained from the presentation row data of the rows it includes. If the content is inconsistent, the two adjacent rows do not concern the same topic, so they are not merged into one module. Processing continues in the same way with the content units of the other adjacent rows until all rows have been processed.
In practice, before determining whether the content is consistent, word segmentation may first be performed on the respective content to obtain segmentation results. Word segmentation is a mature technique in the art. English is written in words separated by spaces, so it is easy to segment. Chinese is written in characters and can be segmented with existing methods such as segmentation based on string matching, segmentation based on understanding, or segmentation based on statistics. A commonly used example is the forward maximum matching algorithm, which is a string-matching method. For details, see the relevant prior art; they are not repeated here.
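For readers unfamiliar with forward maximum matching, a minimal sketch follows. The dictionary contents, the max_len default, and the function name are illustrative assumptions; production segmenters use far larger dictionaries and handle unknown words more carefully.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment text greedily, always taking the longest dictionary word
    that starts at the current position; fall back to a single character."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

# Hypothetical toy dictionary.
dictionary = {"版式", "文档", "版式文档", "处理"}
print(forward_max_match("版式文档处理", dictionary))
```

Because the longest match wins, "版式文档" is taken as one word here even though "版式" and "文档" are also in the dictionary.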
After word segmentation, to improve the efficiency of subsequent processing and reduce noise, the resulting words can be filtered, including but not limited to filtering out words contained in a preset stop word list. The stop word list consists of function words, auxiliary words, pronouns, articles, adverbs, modal particles, and the like identified in advance from word frequency statistics; such words usually carry little meaning on their own. The list can be compiled by collecting words whose frequency in existing resources meets a preset high-frequency condition. For example, the auxiliary word " " occurs very frequently but usually carries very little meaning, so it is included in the stop word list.
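The stop word step can be sketched as two small functions. Selecting stop words purely by their share of all tokens is a simplification of the "preset high-frequency condition" in the text, and the min_share value and function names are assumptions.

```python
from collections import Counter

def build_stop_words(corpus_tokens, min_share=0.5):
    """Collect words whose share of all tokens meets a frequency condition.

    The share-based condition and its default are assumptions.
    """
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {word for word, count in counts.items() if count / total >= min_share}

def remove_stop_words(words, stop_words):
    """Drop every word that appears in the stop word list."""
    return [word for word in words if word not in stop_words]
```

In practice the threshold would be tuned on a large corpus, and the resulting list is often reviewed by hand so that frequent but meaningful words are not discarded.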
Specifically, after the segmentation results are obtained, there are several ways to determine whether the respective content is consistent. For example, a text similarity algorithm from the prior art can be used to calculate the similarity between the pieces of content and thereby decide whether they are consistent. Examples are the longest common substring method, the longest common subsequence method, the minimum edit distance method, the Hamming distance method, and the cosine similarity method. For details, see the relevant prior art; they are not repeated here. This embodiment places no particular limitation on other specific operations.
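Of the listed measures, cosine similarity over word counts is perhaps the easiest to illustrate. The sketch below is one common formulation and not necessarily the one the applicant used; the function name and the idea of comparing against a threshold are assumptions.

```python
import math
from collections import Counter

def cosine_similarity(words_a, words_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two adjacent rows could then be treated as consistent when the similarity of their segmented, filtered words exceeds a chosen threshold.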
As another example, adjacent rows whose line widths differ by no more than a preset line-width threshold can be merged into one module.
As a further example, adjacent rows whose line spacings differ by no more than a preset line-spacing threshold can be merged into one module.
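Both threshold rules have the same shape, so one sketch can cover them. It assumes each row is a dictionary that already carries the compared value under a hypothetical key such as width or spacing, and that rows are in reading order.

```python
def merge_rows_into_modules(rows, key, threshold):
    """Merge adjacent rows into a module while the chosen value stays close.

    key names the compared field (e.g. "width" or "spacing"); both the
    field names and the 1-based module ids are assumptions.
    """
    modules = []
    for row in rows:
        if modules and abs(row[key] - modules[-1]["rows"][-1][key]) <= threshold:
            modules[-1]["rows"].append(row)
        else:
            modules.append({"module_id": len(modules) + 1, "rows": [row]})
    return modules

rows = [{"row_id": 1, "spacing": 14}, {"row_id": 2, "spacing": 14},
        {"row_id": 3, "spacing": 30}, {"row_id": 4, "spacing": 30}]
modules = merge_rows_into_modules(rows, key="spacing", threshold=2)
print([[r["row_id"] for r in m["rows"]] for m in modules])
```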
In one specific implementation, the presentation row data of the at least one row may be merged to obtain initial module data of at least one module, where the initial module data of each module comprises the module identification information of that module and the presentation row data of the rows included in that module. The order of the presentation row data of the rows included in each module is then adjusted to obtain the presentation module data of each module.
In another specific implementation, the metadata of the picture content included in the format document may be obtained. Then, according to the presentation module data of each module and the metadata of the picture content, the metadata of the picture content are inserted into the presentation module data of the corresponding module to obtain combined module data of that module. The format document can then be presented in stream presentation mode according to the combined module data of the corresponding module and the presentation module data of the other modules of the at least one module.
For example, the content element corresponding to each picture element may be determined according to the position data of the picture elements included in the metadata of the picture content and the position data of the content elements included in the metadata of the text content. The metadata of each picture element are then inserted into the presentation module data of the module to which its corresponding content element belongs.
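The picture-placement step can be sketched as below. The matching rule used here, "the text row with the smallest vertical distance to the picture", is an assumption chosen for illustration, since the source only says that position data are compared; the field names (`y`, `rows`, `items`) are likewise hypothetical.

```python
def attach_pictures(modules, pictures):
    """Insert each picture into the module that owns its nearest text row
    (nearest by vertical position, an assumed matching rule)."""
    assigned = {m["module_id"]: [] for m in modules}
    for pic in pictures:
        _, module_id = min(
            (abs(row["y"] - pic["y"]), m["module_id"])
            for m in modules
            for row in m["rows"]
        )
        assigned[module_id].append(pic)

    combined = []
    for m in modules:
        items = [("row", r) for r in m["rows"]]
        items += [("picture", p) for p in assigned[m["module_id"]]]
        items.sort(key=lambda item: item[1]["y"])  # keep reading order by position
        combined.append({"module_id": m["module_id"], "items": items})
    return combined


modules = [
    {"module_id": 0, "rows": [{"row_id": 0, "y": 10}, {"row_id": 1, "y": 30}]},
    {"module_id": 1, "rows": [{"row_id": 2, "y": 100}]},
]
pictures = [{"picture_id": "p1", "y": 40}]
combined = attach_pictures(modules, pictures)
```

Here the picture at `y=40` is closest to row 1, so it lands at the end of module 0, and module 1 is left unchanged.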
Because the display devices of terminals differ greatly in size, especially the screens of the various mobile phone models, presenting a format document in stream presentation mode may cause errors in the content elements that describe the peripheral attributes of the format document, such as headers, footers, side notes and annotations. Optionally, in another specific implementation, before the format document is presented in stream presentation mode according to the presentation module data of each module, the presentation module data of the modules that describe the peripheral attributes of the format document may further be adjusted, for example modified or deleted. The format document can then be presented in stream presentation mode according to the adjusted presentation module data of each module. This avoids the errors that would otherwise occur in those content elements because of the large differences in display size, and thereby improves the efficiency and reliability of format document processing.
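The adjustment of peripheral modules might look like the sketch below. It assumes each module already carries a hypothetical `role` label identifying headers, footers, side notes and annotations; how such modules are recognised is not described here, and the `collapsed` flag is only one possible form of "modification".

```python
PERIPHERAL_ROLES = {"header", "footer", "sidenote", "annotation"}


def adjust_peripheral_modules(modules, action="delete"):
    """Delete peripheral modules, or keep them but mark them as modified."""
    adjusted = []
    for module in modules:
        if module.get("role") in PERIPHERAL_ROLES:
            if action == "delete":
                continue
            # "modify": keep a copy flagged for de-emphasised display.
            module = dict(module, collapsed=True)
        adjusted.append(module)
    return adjusted


modules = [
    {"module_id": 0, "role": "header"},
    {"module_id": 1, "role": "body"},
    {"module_id": 2, "role": "footer"},
]
deleted = adjust_peripheral_modules(modules, action="delete")
modified = adjust_peripheral_modules(modules, action="modify")
```

The input list is not mutated in either mode, so the original presentation module data remain available if the adjustment needs to be undone.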
In this embodiment, the metadata of the text content included in a format document are obtained and then merged to obtain presentation row data of at least one row, where the presentation row data of each row comprise the row identification information of that row and the metadata of the text content included in that row. The format document can therefore be presented in stream presentation mode according to the presentation row data of each row. No manual intervention is required, operation is simple and accuracy is high, which improves the efficiency and reliability of format document processing.
In addition, with the technical solution provided by the present invention, presenting the format document in stream presentation mode means that the layout of the format document is no longer fixed and uneditable; instead, it can vary flexibly with the size of the terminal's display device and can be edited, which improves the flexibility of format document processing.
In addition, the presentation module data of the modules that describe the peripheral attributes of the format document are adjusted, for example modified or deleted, so that the format document can be presented in stream presentation mode according to the adjusted presentation module data of each module. This avoids the errors that would otherwise occur in the content elements describing those peripheral attributes because of the large differences in the size of terminal display devices, and thereby improves the efficiency and reliability of format document processing.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, see the related descriptions of the other embodiments.
Fig. 2 is a schematic structural diagram of a format document processing device provided by another embodiment of the present invention. As shown in Fig. 2, the format document processing device of this embodiment may comprise an acquiring unit 21, a merging unit 22 and a presentation unit 23. The acquiring unit 21 is configured to obtain the metadata of the text content included in a format document. The merging unit 22 is configured to merge the metadata of the text content to obtain presentation row data of at least one row, where the presentation row data of each row comprise the row identification information of that row and the metadata of the text content included in that row. The presentation unit 23 is configured to present the format document in stream presentation mode according to the presentation row data of each row.
In this way, the presentation row data of each row obtained by the merging unit 22 can be used to edit the format document presented by the presentation unit 23.
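The three-unit structure of Fig. 2 can be pictured as a small pipeline. The sketch below is an illustration under assumptions, not the patented implementation: the class name `FormatDocumentProcessor`, the dictionary-based document and the stub unit functions are all hypothetical, and the stub merging rule (group elements with identical `y`) is deliberately simplistic.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Element = Dict[str, Any]
Row = Dict[str, Any]


@dataclass
class FormatDocumentProcessor:
    acquire: Callable[[Dict[str, Any]], List[Element]]  # role of acquiring unit 21
    merge: Callable[[List[Element]], List[Row]]         # role of merging unit 22
    present: Callable[[List[Row]], str]                 # role of presentation unit 23

    def process(self, document: Dict[str, Any]) -> str:
        return self.present(self.merge(self.acquire(document)))


def acquire(document):
    return document["text_elements"]


def merge(elements):
    by_y = {}
    for element in elements:
        by_y.setdefault(element["y"], []).append(element)
    return [{"row_id": i, "elements": by_y[y]} for i, y in enumerate(sorted(by_y))]


def present(rows):
    # Stream presentation reduced to plain text: one output line per row.
    return "\n".join(" ".join(e["text"] for e in row["elements"]) for row in rows)


doc = {"text_elements": [
    {"text": "Hello", "y": 10},
    {"text": "world", "y": 10},
    {"text": "Next", "y": 20},
]}
output = FormatDocumentProcessor(acquire, merge, present).process(doc)
```

Passing the units in as callables mirrors the later remark that the division into units is only a logical one: any unit can be swapped for a different implementation without touching the others.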
It should be noted that the format document processing device provided by this embodiment may be an application located on the local terminal, or a functional unit such as a plug-in or software development kit (SDK) within an application on the local terminal, or a processing engine in a network-side server, or a network-side distributed system; this embodiment places no particular limitation on this.
It can be understood that the application may be a native program (nativeApp) installed on the terminal, or a web program (webApp) of a browser on the terminal; this embodiment places no particular limitation on this.
Optionally, in one possible implementation of this embodiment, the merging unit 22 may specifically be configured to merge the metadata of the text content to obtain initial row data of at least one row, where the initial row data of each row comprise the row identification information of that row and the metadata of the text content included in that row, and to adjust the order of the metadata of the text content included in the initial row data of each row to obtain the presentation row data of each row.
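The two-stage behaviour of the merging unit (build initial rows, then reorder the metadata inside each row) can be sketched as follows. The grouping rule (elements whose `y` lies within a tolerance of the row's first element) and the ordering rule (ascending `x`) are assumptions made for illustration; the field names and the `y_tolerance` parameter are hypothetical.

```python
def build_presentation_rows(elements, y_tolerance=2.0):
    """Stage 1: group elements into initial rows by vertical position.
    Stage 2: order each row's elements left to right."""
    initial_rows = []
    for element in sorted(elements, key=lambda e: e["y"]):
        if initial_rows and abs(element["y"] - initial_rows[-1]["y"]) <= y_tolerance:
            initial_rows[-1]["elements"].append(element)
        else:
            initial_rows.append(
                {"row_id": len(initial_rows), "y": element["y"], "elements": [element]}
            )
    return [
        {"row_id": row["row_id"],
         "elements": sorted(row["elements"], key=lambda e: e["x"])}
        for row in initial_rows
    ]


elements = [
    {"text": "world", "x": 50, "y": 10.5},
    {"text": "Hello", "x": 0, "y": 10.0},
    {"text": "Bye", "x": 0, "y": 30.0},
]
presentation_rows = build_presentation_rows(elements)
row_texts = [[e["text"] for e in row["elements"]] for row in presentation_rows]
```

In the sample, "world" is stored before "Hello" in the input but sits on the same row further to the right, so the reordering stage restores the reading order.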
Optionally, in one possible implementation of this embodiment, the presentation unit 23 may specifically be configured to merge the presentation row data of the at least one row to obtain presentation module data of at least one module, where the presentation module data of each module comprise the module identification information of that module and the presentation row data of the rows included in that module, and to present the format document in stream presentation mode according to the presentation module data of each module.
In one specific implementation, the presentation unit 23 may specifically be configured to merge the presentation row data of the at least one row to obtain initial module data of at least one module, where the initial module data of each module comprise the module identification information of that module and the presentation row data of the rows included in that module, and to adjust the order of the presentation row data of the rows included in each module to obtain the presentation module data of each module.
In another specific implementation, the presentation unit 23 may specifically be configured to obtain the metadata of the picture content included in the format document; to insert, according to the presentation module data of each module and the metadata of the picture content, the metadata of the picture content into the presentation module data of the corresponding module to obtain combined module data of that module; and to present the format document in stream presentation mode according to the combined module data of the corresponding module and the presentation module data of the other modules of the at least one module.
In another specific implementation, the presentation unit 23 may further be configured to adjust the presentation module data of the modules that describe the peripheral attributes of the format document, so that the presentation unit 23 can present the format document in stream presentation mode according to the adjusted presentation module data of each module.
It should be noted that the method in the embodiment corresponding to Fig. 1 can be implemented by the format document processing device provided by this embodiment. For details, see the related content of the embodiment corresponding to Fig. 1, which is not repeated here.
In this embodiment, the acquiring unit obtains the metadata of the text content included in a format document, and the merging unit then merges the metadata of the text content to obtain presentation row data of at least one row, where the presentation row data of each row comprise the row identification information of that row and the metadata of the text content included in that row. The presentation unit can therefore present the format document in stream presentation mode according to the presentation row data of each row. No manual intervention is required, operation is simple and accuracy is high, which improves the efficiency and reliability of format document processing.
In addition, with the technical solution provided by the present invention, presenting the format document in stream presentation mode means that the layout of the format document is no longer fixed and uneditable; instead, it can vary flexibly with the size of the terminal's display device and can be edited, which improves the flexibility of format document processing.
In addition, the presentation module data of the modules that describe the peripheral attributes of the format document are adjusted, for example modified or deleted, so that the format document can be presented in stream presentation mode according to the adjusted presentation module data of each module. This avoids the errors that would otherwise occur in the content elements describing those peripheral attributes because of the large differences in the size of terminal display devices, and thereby improves the efficiency and reliability of format document processing.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the couplings, direct couplings or communication connections shown or discussed may be made through certain interfaces, and the indirect couplings or communication connections between devices or units may be electrical, mechanical or of other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of software functional units may be stored in a computer-readable storage medium. The software functional units are stored in a storage medium and include a number of instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a portable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (12)
1. A method for processing a format document, characterized by comprising:
obtaining metadata of text content included in a format document;
merging the metadata of the text content to obtain presentation row data of at least one row, wherein the presentation row data of each of the at least one row comprise row identification information of that row and the metadata of the text content included in that row; and
presenting the format document in stream presentation mode according to the presentation row data of each row.
2. The method according to claim 1, characterized in that merging the metadata of the text content to obtain the presentation row data of at least one row comprises:
merging the metadata of the text content to obtain initial row data of at least one row, wherein the initial row data of each of the at least one row comprise the row identification information of that row and the metadata of the text content included in that row; and
adjusting the order of the metadata of the text content included in the initial row data of each row to obtain the presentation row data of each row.
3. The method according to claim 1 or 2, characterized in that presenting the format document in stream presentation mode according to the presentation row data of each row comprises:
merging the presentation row data of the at least one row to obtain presentation module data of at least one module, wherein the presentation module data of each of the at least one module comprise module identification information of that module and the presentation row data of the rows included in that module; and
presenting the format document in stream presentation mode according to the presentation module data of each module.
4. The method according to claim 3, characterized in that merging the presentation row data of the at least one row to obtain the presentation module data of at least one module comprises:
merging the presentation row data of the at least one row to obtain initial module data of at least one module, wherein the initial module data of each of the at least one module comprise the module identification information of that module and the presentation row data of the rows included in that module; and
adjusting the order of the presentation row data of the rows included in each module to obtain the presentation module data of each module.
5. The method according to claim 3, characterized in that presenting the format document in stream presentation mode according to the presentation module data of each module comprises:
obtaining metadata of picture content included in the format document;
inserting, according to the presentation module data of each module and the metadata of the picture content, the metadata of the picture content into the presentation module data of a corresponding module to obtain combined module data of the corresponding module; and
presenting the format document in stream presentation mode according to the combined module data of the corresponding module and the presentation module data of the modules of the at least one module other than the corresponding module.
6. The method according to claim 3, characterized in that, before presenting the format document in stream presentation mode according to the presentation module data of each module, the method further comprises:
adjusting the presentation module data of the modules that describe peripheral attributes of the format document.
7. A device for processing a format document, characterized by comprising:
an acquiring unit, configured to obtain metadata of text content included in a format document;
a merging unit, configured to merge the metadata of the text content to obtain presentation row data of at least one row, wherein the presentation row data of each of the at least one row comprise row identification information of that row and the metadata of the text content included in that row; and
a presentation unit, configured to present the format document in stream presentation mode according to the presentation row data of each row.
8. The device according to claim 7, characterized in that the merging unit is specifically configured to:
merge the metadata of the text content to obtain initial row data of at least one row, wherein the initial row data of each of the at least one row comprise the row identification information of that row and the metadata of the text content included in that row; and
adjust the order of the metadata of the text content included in the initial row data of each row to obtain the presentation row data of each row.
9. The device according to claim 7 or 8, characterized in that the presentation unit is specifically configured to:
merge the presentation row data of the at least one row to obtain presentation module data of at least one module, wherein the presentation module data of each of the at least one module comprise module identification information of that module and the presentation row data of the rows included in that module; and
present the format document in stream presentation mode according to the presentation module data of each module.
10. The device according to claim 9, characterized in that the presentation unit is specifically configured to:
merge the presentation row data of the at least one row to obtain initial module data of at least one module, wherein the initial module data of each of the at least one module comprise the module identification information of that module and the presentation row data of the rows included in that module; and
adjust the order of the presentation row data of the rows included in each module to obtain the presentation module data of each module.
11. The device according to claim 9, characterized in that the presentation unit is specifically configured to:
obtain metadata of picture content included in the format document;
insert, according to the presentation module data of each module and the metadata of the picture content, the metadata of the picture content into the presentation module data of a corresponding module to obtain combined module data of the corresponding module; and
present the format document in stream presentation mode according to the combined module data of the corresponding module and the presentation module data of the modules of the at least one module other than the corresponding module.
12. The device according to claim 9, characterized in that the presentation unit is further configured to:
adjust the presentation module data of the modules that describe peripheral attributes of the format document.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410753650.3A CN104536947A (en) | 2014-12-10 | 2014-12-10 | Layout document processing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104536947A (en) | 2015-04-22 |
Family
ID=52852475
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410753650.3A Pending CN104536947A (en) | 2014-12-10 | 2014-12-10 | Layout document processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104536947A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20010014900A1 (en) * | 2000-02-16 | 2001-08-16 | Sun Microsystems, Inc. | Method and system for separating content and layout of formatted objects |
US7127673B2 (en) * | 1999-12-21 | 2006-10-24 | Fujitsu Limited | Electronic document display system |
CN101206639A (en) * | 2007-12-20 | 2008-06-25 | 北大方正集团有限公司 | Method for indexing complex impression based on PDF |
CN101308488A (en) * | 2008-06-05 | 2008-11-19 | 北大方正集团有限公司 | Document stream type information processing method based on format document and device therefor |
CN101887413A (en) * | 2009-05-14 | 2010-11-17 | 北大方正集团有限公司 | Structure processing method and system of plate type table |
CN101923723A (en) * | 2009-06-16 | 2010-12-22 | 汉王科技股份有限公司 | Method for realizing display of electronic document |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108932221A (en) * | 2017-05-25 | 2018-12-04 | 北大方正集团有限公司 | File composition method and device based on blob |
CN109597913A (en) * | 2018-11-05 | 2019-04-09 | 东软集团股份有限公司 | The method for being aligned document picture, device, storage medium and electronic equipment |
CN109815453A (en) * | 2018-12-25 | 2019-05-28 | 东软集团股份有限公司 | Document method of partition, device, storage medium and electronic equipment |
CN111695414A (en) * | 2020-04-28 | 2020-09-22 | 北京奇艺世纪科技有限公司 | Document processing method and device, electronic equipment and computer readable storage medium |
CN111695414B (en) * | 2020-04-28 | 2024-03-01 | 北京奇艺世纪科技有限公司 | Document processing method and device, electronic equipment and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20150422 |