CN102841893A - Method and device for processing fragmentation data in document - Google Patents
- Publication number
- CN102841893A
- Authority
- CN
- China
- Prior art keywords
- document
- segment data
- attribute
- publication
- storage identifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method and a device for processing fragmentation data in a document. The method comprises the following steps: extracting the fragmentation data in the document, and recording, in association with one another, the attributes of the fragmentation data, the attributes of the document, and the attributes of the publication to which the document belongs. The device comprises an extraction module and a recording module: the extraction module is used for extracting the fragmentation data in the document, and the recording module is used for recording those attributes in association. Because the attributes of the extracted fragmentation data, of the document, and of the publication to which the document belongs are recorded in association, a basis for fast lookup is provided when the fragmentation data is searched later.
Description
Technical field
The present invention relates to the field of computer data processing, and in particular to a method and a device for processing segment data in a document.
Background art
In the publishing field today, paper publications are mainly produced through the workflow of "topic planning, soliciting manuscripts, reviewing manuscripts, typesetting, and printing". Books are usually divided into chapters, a collection of papers usually gathers many papers into one publication, and a periodical is made up of many independent articles. The various kinds of content in a manuscript, such as pictures, text, video clips, and audio clips, are collectively referred to as "segment data".
A publication is usually assembled from many pieces of segment data. The user has to extract and organize the segment data scattered across many publications and then assemble the organized data into a publication.
The inventors found that segment data is scattered across individual electronic documents and that, because no data relationships are recorded for the segment data, a particular piece of segment data is hard to query. Looking up segment data in a publication is cumbersome for the user: to find a single article, or even a single passage of text, in a publication, the user has to browse the entire electronic document of that publication, so search efficiency is low.
Summary of the invention
The present invention aims to provide a method and a device for processing segment data in a document, so as to solve the above problem that no data relationships are established for segment data.
An embodiment of the present invention provides a method for processing segment data in a document, comprising: extracting the segment data in the document; and recording, in association with one another, the attributes of the segment data, the attributes of the document, and the attributes of the publication to which the document belongs.
An embodiment of the present invention provides a device for processing segment data in a document, comprising: an extraction module, used to extract the segment data in the document; and a recording module, used to record, in association with one another, the attributes of the segment data, the attributes of the document, and the attributes of the publication to which the document belongs.
Embodiments of the invention record the attributes of the extracted segment data, the attributes of the document to which it belongs, and the attributes of the publication to which the document belongs in association with one another, which provides a basis for fast lookup when the segment data is searched later.
Brief description of the drawings
The accompanying drawings described here are provided for a further understanding of the present invention and form part of this application. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 shows the flowchart of Embodiment 1;
Fig. 2 shows the flowchart of Embodiment 2;
Fig. 3 shows a screenshot of selecting a document in an embodiment;
Fig. 4 shows the structural block diagram of Embodiment 3.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and in combination with embodiments. Referring to Fig. 1, which is the flowchart of Embodiment 1 of the invention, the method comprises:
Step S11: extract the segment data in the document.
In this embodiment, a publication is made up of several documents. For example, a photography publication contains several chapters, the content of each chapter is stored in one document, and each document contains segment data such as annotations and pictures.
To extract the segment data in a document, the files that store the segment data within the document can be obtained first. For example, a Word document is made up of several sub-documents, including a document for paragraph formatting, a document for display styles, a document that stores the content, and so on. Converting the Word document yields these documents in XML format; by traversing the nodes of the content document, the content of each node, that is, the segment data, can be extracted.
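The node traversal described above can be illustrated with a minimal Python sketch. The XML snippet and its element names (`chapter`, `note`, `picture`) are hypothetical stand-ins for the content document obtained from a converted Word file, and `extract_segments` is an illustrative helper, not a function defined in this publication.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the XML content document obtained by converting a
# Word file; the element names (chapter, note, picture) are illustrative only.
CONTENT_XML = """
<chapter title="Exposure">
  <note id="n1">Use a tripod for long exposures.</note>
  <picture id="p1" src="images/night.jpg">Night street scene</picture>
  <note id="n2">Lower ISO reduces noise.</note>
</chapter>
"""


def extract_segments(xml_text):
    """Traverse every node and return one dict per node that holds content."""
    root = ET.fromstring(xml_text)
    segments = []
    for node in root.iter():
        text = (node.text or "").strip()
        if not text:
            continue  # skip structural nodes that carry no content of their own
        segments.append(
            {"kind": node.tag, "attributes": dict(node.attrib), "content": text}
        )
    return segments


segments = extract_segments(CONTENT_XML)
for seg in segments:
    print(seg["kind"], seg["content"])
```

Each returned dictionary keeps the node's own attributes next to its content, so the attributes are already at hand when the record is written in the next step.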
Step S12: record, in association with one another, the attributes of the segment data, the attributes of the document, and the attributes of the publication to which the document belongs.
For the extracted segment data, the attributes related to it can be recorded together in association to make later lookup easier. In this embodiment, the attributes of the segment data, the attributes of the document, and the attributes of the publication to which the document belongs are stored in a single record, which makes the segment data easy to find later.
Recording these attributes in association makes it easy to query the segment data later. When a keyword entered by the user is received, the keyword is searched for in the attribute data, and the segment data associated with the matching attributes is found quickly and displayed to the user.
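As a rough illustration of storing the three groups of attributes in one record and searching them by keyword, the following sketch uses an in-memory SQLite table. The table name `segment_record`, its columns, and the sample rows are assumptions made for this example; the publication does not specify a schema.

```python
import sqlite3

# Assumed schema: one row holds the segment's attributes together with the
# attributes of its document and of its publication.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE segment_record (
           segment_kind TEXT, segment_content TEXT,
           document_title TEXT, publication_title TEXT)"""
)
rows = [
    ("note", "Use a tripod for long exposures.", "Chapter 3: Exposure", "Photography Basics"),
    ("picture", "Night street scene", "Chapter 3: Exposure", "Photography Basics"),
    ("note", "Rule of thirds", "Chapter 5: Composition", "Photography Basics"),
]
conn.executemany("INSERT INTO segment_record VALUES (?, ?, ?, ?)", rows)


def search(keyword):
    """Return every record in which any stored attribute contains the keyword."""
    pattern = f"%{keyword}%"
    cur = conn.execute(
        """SELECT segment_kind, segment_content, document_title, publication_title
           FROM segment_record
           WHERE segment_content LIKE ? OR document_title LIKE ?
              OR publication_title LIKE ?
           ORDER BY rowid""",
        (pattern, pattern, pattern),
    )
    return cur.fetchall()


for hit in search("Exposure"):
    print(hit)
```

Because the document and publication attributes sit in the same row as the segment, a single query is enough; no scan of the full electronic document is needed.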
In the method of the embodiment, an acquisition template for the segment data to be extracted can also be defined in advance; the template defines the various kinds of segment data. Storage identifiers are created separately for the segment data, the document, and the publication and stored in their respective databases, which makes associated lookup easier. This is explained below through an embodiment; referring to the flowchart of Embodiment 2 shown in Fig. 2, the method comprises the following steps:
Step S21: collect the segment data in the document according to a predefined template.
This embodiment uses a Word document as an example. The segment data is stored in one of the XML-format documents that make up the Word document. An XML-format acquisition template has to be defined in advance; through the acquisition template, the XML-format document that stores the segment data is read and the segment data is extracted.
Part of the acquisition template is as follows:
In the acquisition template, tableMap defines the relationship between the metadata (that is, the attributes) of the segment data in the document and the storage fields of the relational database. The relational database contains several tables, each corresponding to one type of segment data. Each record in a table corresponds to one piece of segment data. Each table has several columns, and each column corresponds to one metadata item of the segment data. The table node defines the name of the table in which the segment data is stored, and the meta node specifies how the metadata of the segment data maps to the database storage fields. The meta node has the following three attributes:
name: the node name in the document, used to locate the node in the document.
valType: the node processing type. This attribute determines how the specified node is processed; each type corresponds to one processing method, for example obtaining the character data of the node, normalizing the node's character data, converting the format of a picture, or converting the format of an audio file. The metadata (that is, the attributes) of the segment data is extracted at the same time. After the node content is processed, the attributes of the segment data are saved in the field given by colName.
colName: the name of a field in the database named "chapter storehouse"; it stores the result of processing the node.
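To make the roles of `name`, `valType`, and `colName` concrete, here is a hedged Python sketch. The publication's own listing is not reproduced in this text, so the XML layout below is an assumption: only the names `tableMap`, `table`, `meta`, `name`, `valType`, and `colName` come from the description, while the `valType` values, the column names, and the `collect` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical definition: only the names tableMap, table, meta, name, valType
# and colName come from the text; the layout and all values are assumptions.
TEMPLATE_XML = """
<tableMap>
  <table name="chapter_storehouse">
    <meta name="title" valType="text" colName="seg_title"/>
    <meta name="body" valType="normalizedText" colName="seg_body"/>
  </table>
</tableMap>
"""

CONTENT_XML = """
<segment>
  <title> Book I </title>
  <body>To learn
     and to practise</body>
</segment>
"""

# One handler per valType; each handler is one way of processing a node.
HANDLERS = {
    "text": lambda node: (node.text or "").strip(),
    "normalizedText": lambda node: " ".join((node.text or "").split()),
}


def collect(template_xml, content_xml):
    """Apply each meta rule: locate the node by name, process it by valType,
    and store the result under colName."""
    table = ET.fromstring(template_xml).find("table")
    content = ET.fromstring(content_xml)
    record = {}
    for meta in table.findall("meta"):
        node = content.find(f".//{meta.get('name')}")
        if node is None:
            continue  # the content document has no node with this name
        record[meta.get("colName")] = HANDLERS[meta.get("valType")](node)
    return table.get("name"), record


table_name, record = collect(TEMPLATE_XML, CONTENT_XML)
print(table_name, record)
```

Keeping the handlers in a dictionary keyed by `valType` mirrors the statement that each type corresponds to one processing method; further types, such as picture or audio conversion, would be added as further entries.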
During collection, it is enough to select the corresponding publication or document from the books. In the title area shown in Fig. 3, the selected book is The Analects of Confucius, and the selected document is the Analects corpus file (a Word-format file).
Step S22: record the attributes of the segment data together with the attributes of the document and of the publication to which it belongs, in association, in the same database record.
The attributes of the document and the attributes of the publication to which it belongs are recorded in association in advance and stored in the database named "Library". After the segment data is extracted, the document attributes and publication attributes stored in Library are merged with the attributes of the segment data into one record. Part of the relevant code is as follows:
Here the meta node has two attributes:
parentColName: the column name of the publication's metadata (that is, attribute); the metadata to be synchronized is found in the database table through this parentColName attribute.
colName: the metadata column name, which specifies the field in which the synchronized metadata is stored, that is, a field name in the chapter storehouse.
The attributes of the document and of the publication to which the document belongs are already stored in Library. These attributes are stored directly under the colName fields of the chapter storehouse, so that the attributes of the segment data, of the document, and of the publication are stored in association.
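The synchronization from Library into the chapter storehouse can be sketched as a simple column mapping. The rule list below is hypothetical: `parentColName` and `colName` are named in the description, but the concrete column names and the `merge_parent_metadata` helper are invented for illustration.

```python
# Hypothetical sync rules: parentColName and colName are named in the text,
# the column names themselves are invented for illustration.
SYNC_RULES = [
    {"parentColName": "book_title", "colName": "pub_title"},
    {"parentColName": "doc_title", "colName": "doc_title"},
]


def merge_parent_metadata(segment_record, library_record, rules=SYNC_RULES):
    """Copy each parentColName value from the Library record into the
    colName field of the segment's record, leaving the inputs untouched."""
    merged = dict(segment_record)
    for rule in rules:
        parent_col = rule["parentColName"]
        if parent_col in library_record:
            merged[rule["colName"]] = library_record[parent_col]
    return merged


library_record = {
    "book_title": "The Analects of Confucius",
    "doc_title": "Analects corpus file",
}
segment_record = {"seg_title": "Book I", "seg_body": "To learn and to practise"}

merged = merge_parent_metadata(segment_record, library_record)
print(merged)
```

After the merge, one record carries the segment's own attributes alongside the document and publication attributes, which is the associated storage the step describes.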
Preferably, the method also comprises: creating storage identifiers for the segment data, the document, and the publication respectively and, when the attributes are recorded in association, also recording in association the storage identifier of the segment data, the storage identifier of the document, and the storage identifier of the publication.
The storage identifiers can be recorded in the form of an association table; see Table 1:
Table 1: Association table storing the relationships between resources
In Table 1, the relationships among segment data, publications, and documents are stored in the association table. For example, the root resource ID is the storage identifier of the publication, the source resource ID is the storage identifier of the document, and the target resource ID is the storage identifier of the segment data. The association order and the association number are recorded as well. Through the relationships among these identifiers in Table 1, the storage location of each piece of segment data, its associated document, and its associated publication can be found.
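The columns described for Table 1 can be sketched as a small SQLite table. The column names are English renderings chosen for this example, and the identifiers such as `pub-1` and `seg-3` are made-up sample values; the actual layout of Table 1 is not reproduced in this text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE resource_relation (
           relation_no INTEGER PRIMARY KEY,  -- association number
           root_resource_id TEXT,            -- storage identifier of the publication
           source_resource_id TEXT,          -- storage identifier of the document
           target_resource_id TEXT,          -- storage identifier of the segment data
           relation_order INTEGER)           -- association order"""
)
conn.executemany(
    """INSERT INTO resource_relation
           (root_resource_id, source_resource_id, target_resource_id, relation_order)
       VALUES (?, ?, ?, ?)""",
    [
        ("pub-1", "doc-1", "seg-1", 1),
        ("pub-1", "doc-1", "seg-2", 2),
        ("pub-1", "doc-2", "seg-3", 1),
    ],
)


def owners_of(segment_id):
    """Return (publication id, document id) for a segment, or None."""
    return conn.execute(
        """SELECT root_resource_id, source_resource_id
           FROM resource_relation WHERE target_resource_id = ?""",
        (segment_id,),
    ).fetchone()


def segments_of(document_id):
    """Return the segment ids of a document in their recorded order."""
    rows = conn.execute(
        """SELECT target_resource_id FROM resource_relation
           WHERE source_resource_id = ? ORDER BY relation_order""",
        (document_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(owners_of("seg-3"))
print(segments_of("doc-1"))
```

The same table answers both directions of the lookup: from a segment up to its document and publication, and from a document down to its segments.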
Step S23: search the attribute data for the received keyword.
Step S24: return the segment data associated with the matching attribute data, and at the same time display an access link to the document or publication associated with the segment data.
The access link contains the storage identifier of the publication or document.
Step S25: receive the selected access link and display the corresponding document or publication.
Because the access link carries the storage identifier, the corresponding document or publication is retrieved and displayed according to that identifier.
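One way to carry a storage identifier inside an access link is sketched below. The link format (`/view?kind=...&sid=...`), the in-memory `STORE`, and both helper functions are assumptions for illustration; the publication states only that the link contains the storage identifier.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical store mapping storage identifiers to displayable resources.
STORE = {"doc-1": "Chapter 3: Exposure", "pub-1": "Photography Basics"}


def build_access_link(kind, storage_id):
    """Embed the storage identifier of a document or publication in a link."""
    return "/view?" + urlencode({"kind": kind, "sid": storage_id})


def resolve_access_link(link):
    """Read the storage identifier back out of the link and fetch the resource."""
    query = parse_qs(urlparse(link).query)
    return STORE[query["sid"][0]]


link = build_access_link("document", "doc-1")
print(link)
print(resolve_access_link(link))
```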
Through the steps of Embodiment 2, the relevant segment data can be found quickly from the received keyword and, through the associated storage identifiers, the segment data in the related document or publication can then be located.
Preferably, the segment data in the document can also be replaced in advance with a placeholder that stores the identifier of the segment data. For example, the placeholder begins with "PAMCMS://", followed by four comma-separated identifiers, id, type, lib, and res, as shown in Table 2; that is, PAMCMS://id,type,lib,res
Table 2: Meaning of the four identifiers
Title | Meaning |
---|---|
id | Unique identifier of the referenced segment data |
type | Type of the referenced resource; can be extended as needed |
lib | Location identifier of the referenced segment data |
res | Reserved identifier |
In step S25 above, the document or publication is accessed according to the access link, and the segment data is retrieved through the storage identifier in the placeholder and displayed.
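Parsing the PAMCMS://id,type,lib,res placeholder and substituting the stored segment data at display time might look like the following sketch. The regular expression, the `SEGMENT_STORE` lookup keyed by (lib, id), and the sample values are assumptions; the publication specifies only the prefix and the four comma-separated identifiers.

```python
import re
from typing import NamedTuple


class Placeholder(NamedTuple):
    id: str    # unique identifier of the referenced segment data
    type: str  # type of the referenced resource
    lib: str   # location identifier of the referenced segment data
    res: str   # reserved identifier


# Four comma-separated fields after the prefix; fields hold no commas or spaces.
PLACEHOLDER_RE = re.compile(r"PAMCMS://([^,\s]*),([^,\s]*),([^,\s]*),([^,\s]*)")

# Hypothetical store keyed by (lib, id).
SEGMENT_STORE = {("lib1", "42"): "Night street scene"}


def expand_placeholders(text, store=SEGMENT_STORE):
    """Replace each placeholder with its segment; keep unknown ones unchanged."""

    def replace(match):
        ph = Placeholder(*match.groups())
        return store.get((ph.lib, ph.id), match.group(0))

    return PLACEHOLDER_RE.sub(replace, text)


doc = "Figure: PAMCMS://42,picture,lib1,0 (see caption)"
print(expand_placeholders(doc))
```

Leaving an unresolved placeholder in place, rather than deleting it, is a design choice of this sketch so that a missing segment stays visible in the displayed document.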
Two embodiments of the invention have been described in detail above. The method of the invention can be integrated into an electronic circuit in the form of modules. A preferred Embodiment 3 is given below and described with reference to the structural diagram in Fig. 4. Referring to the structural block diagram of the device in Fig. 4, the device comprises:
Extraction module 41, used to extract the segment data in the document;
Recording module 42, used to record, in association with one another and based on the segment data extracted by extraction module 41, the attributes of the segment data, the attributes of the document, and the attributes of the publication to which the document belongs.
Preferably, the device also comprises:
Search module 43, used to search the attributes recorded by recording module 42 for the received keyword;
Feedback module 44, used to return the segment data that recording module 42 associates with the attributes found by search module 43.
Preferably, the device also comprises:
Identification module 45, used to store the segment data, the document, and the publication separately and to generate a storage identifier for each;
Identifier recording module 46, used to record in association, when recording module 42 records the attributes in association, the storage identifiers of the segment data, the document, and the publication generated by identification module 45.
Preferably, the device also comprises:
Link feedback module 47, used, when feedback module 44 returns segment data, to return an access link to the document or publication associated with that segment data by means of the storage identifiers recorded by identifier recording module 46, the storage identifier being added to the link.
Obviously, those skilled in the art should understand that the modules or steps of the invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by several computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The invention is therefore not restricted to any specific combination of hardware and software.
The above are only preferred embodiments of the invention and are not intended to limit it; for those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (11)
1. A method for processing segment data in a document, characterized in that it comprises:
extracting the segment data in the document;
recording, in association with one another, the attributes of the segment data, the attributes of the document, and the attributes of the publication to which the document belongs.
2. The method according to claim 1, characterized in that it further comprises:
receiving a keyword;
searching the attributes for the received keyword;
returning the segment data associated with the attributes found.
3. The method according to claim 1, characterized in that the extracting comprises:
converting the document into a document in XML format;
traversing the content of each node in the XML-format document;
extracting the content as the segment data.
4. The method according to claim 3, characterized in that the recording in association comprises:
during the traversal, extracting the attributes of each piece of segment data from the XML-format document;
storing the attributes of each piece of segment data in a record of a database created in advance;
determining the publication to which the document belongs;
storing, in each record of the database, the attributes of the respective piece of segment data, the attributes of the document, and the attributes of the publication to which the document belongs.
5. The method according to claim 2, characterized in that it further comprises:
storing the segment data, the document, and the publication separately, and generating a storage identifier for each;
when the attributes are recorded in association, also recording in association the storage identifier of the segment data, the storage identifier of the document, and the storage identifier of the publication.
6. The method according to claim 5, characterized in that, after returning the segment data associated with the attributes found, it further comprises:
returning an access link to the document or the publication associated with the segment data;
the access link containing the storage identifier of the document or the publication.
7. The method according to claim 6, characterized in that it further comprises:
replacing the segment data in the document in advance with a placeholder containing the storage identifier of the segment data;
accessing the document according to the access link;
while the document is displayed, obtaining the segment data according to the storage identifier and substituting it for the placeholder.
8. A device for processing segment data in a document, characterized in that it comprises:
an extraction module, used to extract the segment data in the document;
a recording module, used to record, in association with one another, the attributes of the segment data, the attributes of the document, and the attributes of the publication to which the document belongs.
9. The device according to claim 8, characterized in that it further comprises:
a search module, used to search the attributes for the received keyword;
a feedback module, used to return the segment data associated with the attributes found.
10. The device according to claim 9, characterized in that it further comprises:
an identification module, used to store the segment data, the document, and the publication separately and to generate a storage identifier for each;
an identifier recording module, used to record in association, when the attributes are recorded in association, the storage identifier of the segment data, the storage identifier of the document, and the storage identifier of the publication.
11. The device according to claim 10, characterized in that it further comprises:
a link feedback module, used to return an access link to the document or the publication associated with the segment data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110168129XA CN102841893A (en) | 2011-06-21 | 2011-06-21 | Method and device for processing fragmentation data in document |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110168129XA CN102841893A (en) | 2011-06-21 | 2011-06-21 | Method and device for processing fragmentation data in document |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102841893A true CN102841893A (en) | 2012-12-26 |
Family
ID=47369266
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110168129XA Pending CN102841893A (en) | 2011-06-21 | 2011-06-21 | Method and device for processing fragmentation data in document |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102841893A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106934336A (en) * | 2015-12-31 | 2017-07-07 | 珠海金山办公软件有限公司 | A kind of method and device of lantern slide identification |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090210264A1 (en) * | 2006-08-09 | 2009-08-20 | Anderson Denise M | Conversation Mode Booking System |
CN101894115A (en) * | 2009-05-18 | 2010-11-24 | 北京大学 | Image data processing method of electronic document and device thereof |
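Non-Patent Citations (4)
Title |
---|
Zhang Suzhi et al.: "Research on XML Databases and Their Applications", Computer Engineering and Applications * |
Chen Lingling et al.: "Chinese Text Data Objects in Digital Libraries", Journal of Yanshan University * |
Chen Lingling et al.: "An Implementation Method for Converting Chinese Text Data Objects in Digital Libraries into XML-Format Documents", Journal of Yanshan University * |
Chen Lingling et al.: "An Implementation Method for Converting Chinese Text Data Objects in Digital Libraries into XML-Format Documents", Journal of Yanshan University, no. 02, 15 May 2002 (2002-05-15), pages 184 - 186 * |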
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106934336A (en) * | 2015-12-31 | 2017-07-07 | 珠海金山办公软件有限公司 | A kind of method and device of lantern slide identification |
US10698943B2 (en) | 2015-12-31 | 2020-06-30 | Beijing Kingsoft Office Software, Inc. | Method and apparatus for recognizing slide |
CN106934336B (en) * | 2015-12-31 | 2020-07-03 | 珠海金山办公软件有限公司 | Method and device for identifying slide |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102521416B (en) | Data correlation query method and data correlation query device | |
US8943054B2 (en) | Social media content management system and method | |
US20140358911A1 (en) | Search and discovery system | |
AU2016345990A1 (en) | A system and method for processing big data using electronic document and electronic file-based system that operates on RDBMS | |
JP5147947B2 (en) | Method and system for generating search collection by query | |
JP2010067175A (en) | Hybrid content recommendation server, recommendation system, and recommendation method | |
CN102314497B (en) | Method and equipment for identifying body contents of markup language files | |
US8880463B2 (en) | Standardized framework for reporting archived legacy system data | |
CA2619230A1 (en) | Annotating documents in a collaborative application with data in disparate information systems | |
CN102184211A (en) | File system, and method and device for retrieving, writing, modifying or deleting file | |
CN101477527B (en) | Multimedia resource retrieval method and apparatus | |
CN103827852B (en) | Assemble WEB page on search engine results page | |
CN103778202A (en) | Enterprise electronic document managing server side and system | |
CN103020322A (en) | Query method | |
US20140372412A1 (en) | Dynamic filtering search results using augmented indexes | |
US20110219017A1 (en) | System and methods for citation database construction and for allowing quick understanding of scientific papers | |
KR20150018880A (en) | Information aggregation, classification and display method and system | |
CN102819601A (en) | Information retrieval method and information retrieval equipment | |
US20150066996A1 (en) | Method and system for automatically collecting publication digital resource | |
CN102841886A (en) | Method and device for splitting document | |
CN110471925A (en) | Realize the method and system that index data is synchronous in search system | |
WO2014144033A1 (en) | Multiple schema repository and modular data procedures | |
WO2016206395A1 (en) | Weekly report information processing method and device | |
CN102841893A (en) | Method and device for processing fragmentation data in document | |
Desyaputri et al. | News recommendation in Indonesian language based on user click behavior |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20121226 |