WO2011020360A1 - Method for storing a document - Google Patents

Method for storing a document

Info

Publication number
WO2011020360A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
document
data
nodes
free
Prior art date
Application number
PCT/CN2010/073412
Other languages
English (en)
Chinese (zh)
Inventor
王东临
郭旭
刘宁胜
Original Assignee
北京书生国际信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京书生国际信息技术有限公司

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/81Indexing, e.g. XML tags; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/14Tree-structured documents
    • G06F40/143Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]

Definitions

  • The present invention relates to computer storage technology, and in particular to a method for storing a document.
  • Extensible Markup Language (XML) is a document description language; documents described in this language are called XML documents.
  • XML documents have many advantages.
  • XML is a meta-markup language. Developers can define their own tags according to their needs.
  • XML documents are well-defined and structured, and are resistant to damage.
  • The information represented by XML is platform-independent, where a platform may be understood as a different application or a different operating system.
  • XML allows specifying not only the vocabulary of a document but also the relationships between its elements.
  • An XML document consists of a document type definition (DTD) or schema (Schema) and the XML text. The DTD/Schema is a set of grammar rules for tags, indicating how the XML text is organized. This composition separates content from form in an XML document, which is what yields many of the advantages above.
  • However, after a document has been stored as an XML document, accessing an object requires first parsing the entire document, converting it into a tree structure, and only then searching for the object to be accessed. When a user accesses only part of the content of a document stored this way, the system must therefore first spend resources parsing the whole document and then select the content of interest for display. This prolongs processing time, reduces access performance, and wastes system resources.
  • The present invention provides a method for storing a document that can improve access performance.
  • To this end, the present invention adopts the following scheme:
  • A method of storing a document, comprising:
  • mapping the document content to the nodes of the corresponding tree structure and their data, where the document is made up of nodes and the nodes are organized in a tree structure; and
  • storing the node data corresponding to the document, and recording the storage location of the node data in the node index table.
  • With this method, a document is stored in a storage medium as a tree structure, using a storage format with a node index.
  • When part of the document content is accessed, the node corresponding to that content can be accessed directly, without processing the entire document in advance. This saves system resources, improves access performance, and enables fast lookup of nodes.
  • The document storage method of the invention supports the storage of complex tree-structured data, requires little storage space, is extensible, supports incremental modification, and makes it easy to improve storage security.
  • FIG. 1 is a general flow chart of a document storage method of the present invention.
  • FIG. 2 is a specific flowchart of a document storage method according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a block number array and the data blocks corresponding to its entries.
  • FIG. 4 is a schematic diagram of a free block table.
  • FIG. 5 is a flow chart of shrinking the document storage space.
  • the basic idea of the present invention is to pre-configure a document type having a tree structure defined by at least one node type, and define a document storage format including a node index table and a node data area.
  • the document content is first mapped to the corresponding nodes and their data in the corresponding tree structure according to the document type of the document.
  • a certain storage space is allocated to the document in the storage medium, and the storage area is further allocated to the node index table and the node data area in the storage space.
  • each node data corresponding to the document is stored in the node data area, and the correspondence between each node and the storage location of the node data is recorded in the node index table.
  • the files stored according to the above storage manner may be collectively referred to as a SurXml document. (XML document defined by Sursen).
  • FIG. 1 is a general flow chart of a document storage method of the present invention. As shown in Figure 1, the method includes:
  • Step 101 Preconfigure a document type having a tree structure defined by at least one node type; and define a document storage format including a node index table and a node data area.
  • the core of the present invention is to store the document directly in the storage medium in a tree structure.
  • the document type configured in this step represents the specific organization of the tree structure.
  • Step 102 Determine, according to the document type configured in step 101, the document type of the document, and map the content of the document to each node in the corresponding tree structure and the data according to the document type.
  • Step 103 Allocate a storage space for the document according to the length of each node in the corresponding tree structure and the length of the data, and allocate a storage area for the node index table and the node data area in the allocated storage space.
  • Step 104 Store the data of the node in the node data area, and record the correspondence between each node and the storage location of the node data in the node index table. In short, the storage location of the node data is recorded in the node index table.
  • In the flow above, step 102 and step 103 are performed sequentially.
  • In practice, step 103 may be performed before step 102, or the two steps may be performed simultaneously.
  • Step 103 is also optional: after the content of the document has been mapped to the nodes of the tree structure and their data, the node data can be stored directly and its storage location recorded.
  • The node data usually includes the content of the node.
  • It may further include relationship information with other nodes. Note that in an embodiment of the present invention, both the content of each node and the relationship information between nodes are stored in the node data area, but only the storage location of the node content is recorded in the node index table; that is, the relationship information between nodes need not be recorded in the node index table.
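  • The general flow of steps 102 to 104 can be illustrated with a minimal Python sketch. It assumes an in-memory byte buffer as the node data area and a dictionary as the node index table; the names `store_nodes` and `read_node` and the sample node payloads are hypothetical and do not come from the patent.

```python
import io

def store_nodes(nodes):
    """Write each node's data into a data area and record where it landed.

    nodes: dict mapping node ID -> bytes (the node data).
    Returns (data_area, index), where index maps node ID -> (offset, length).
    """
    data_area = io.BytesIO()
    index = {}
    for node_id, payload in nodes.items():
        index[node_id] = (data_area.tell(), len(payload))  # storage location
        data_area.write(payload)
    return data_area.getvalue(), index

def read_node(data_area, index, node_id):
    """Fetch one node directly through the index, without scanning the rest."""
    offset, length = index[node_id]
    return data_area[offset:offset + length]

# Hypothetical three-node document: a root, a page, and a text node.
nodes = {1: b"root", 2: b"page-1", 3: b"text:hello"}
data_area, index = store_nodes(nodes)
print(read_node(data_area, index, 3))
```

  • Reading node 3 touches only the index entry and the ten bytes it points at, which is the access pattern the method is aiming for.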
  • FIG. 2 is a specific flowchart of a document storage method according to an embodiment of the present invention. As shown in Figure 2, the method includes:
  • Step 201 Preconfigure a document type defined by at least one node type and having a tree structure, and define a document storage format including a file header, a node index table, and a node data area.
  • The configured document type includes the following information: (a) which types of nodes (node types) are included in the document; (b) the names of the various node types included in the document, and the name/type of each attribute of each node; (c) the possible parent-child relationships between the various nodes.
  • The document type can thus describe a specific tree structure.
  • For example, in a simple document type there are several page child nodes under the root node, and several text and image child nodes under each page node.
  • each node type defines a node type tag that uniquely identifies a node type.
  • a document storage format with a node index is also defined, which is specifically a file header, a node index table, and a node data area.
  • The file header provides entry information for accessing the file. This should not be understood to mean that the entry information must be stored at the head of the file; it may be agreed to store the entry information in any part of the file, such as the end of the file.
  • the node index table is used to record the specific storage location of each node data to facilitate retrieval of each node data; the node data area is used to store each node data.
  • Step 202 Referring to the document type configured in step 201, determine a document type of the document to be stored, and store the document type information.
  • the document type information is stored. This involves both the representation of the document type and the storage of the document type information.
  • Document types can be represented in two formats: type tables, such as custom type tables, and XML Schema/DTD/Relax NG, where Relax NG is a document type definition proposed by OASIS.
  • a type table is a data structure. Any data structure that can describe information about a document type can be considered as a type table. An example implementation is given here, but the actual possible implementation is not limited to this example:
  • the definition of the attribute including the name and type.
  • the definition of the node including the node name, attribute list, and list of child node names;
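  • As one possible illustration of such a type table, the sketch below encodes the simple root/page/text/image document type mentioned earlier. The class names `AttributeDef` and `NodeDef`, the attribute names, and the helper `may_contain` are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AttributeDef:
    """Definition of an attribute: its name and its type."""
    name: str
    type_name: str

@dataclass
class NodeDef:
    """Definition of a node: name, attribute list, allowed child node names."""
    name: str
    attributes: List[AttributeDef] = field(default_factory=list)
    child_names: List[str] = field(default_factory=list)

# Hypothetical type table: pages under the root, text and images under pages.
TYPE_TABLE = {
    "root": NodeDef("root", [], ["page"]),
    "page": NodeDef("page", [AttributeDef("number", "int")], ["text", "image"]),
    "text": NodeDef("text", [AttributeDef("content", "str")]),
    "image": NodeDef("image", [AttributeDef("source", "str")]),
}

def may_contain(parent: str, child: str) -> bool:
    """Check a possible parent-child relationship against the type table."""
    return child in TYPE_TABLE[parent].child_names

print(may_contain("root", "page"), may_contain("text", "page"))
```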
  • Although XML documents are linear in physical storage, they are logically a tree structure composed of nodes. Therefore, formats used to describe XML document types, such as DTD/Schema/Relax NG, can also be used to describe the document type of the present invention.
  • the document type can be represented by either of the above two methods. The following describes how the document type information is stored.
  • Document type information is stored as a linear character sequence.
  • For a document type represented by a type table, a set of serialization functions may be specified according to the custom type table used, converting the data of the type table into a linear character sequence. For a document type represented in the DTD/Schema/Relax NG format, the DTD/Schema/Relax NG file is already a linear character sequence, so the file itself can be stored directly.
  • the document type information to be stored may be pre-processed, such as encryption, compression, and transformation, and then the processing result is stored as document type information.
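  • A minimal sketch of serialization followed by pre-processing, assuming JSON as the linear character sequence and zlib compression as the pre-processing step. Both choices, and the dictionary layout of the type table, are illustrative assumptions; the patent does not prescribe a particular serialization or compression scheme.

```python
import json
import zlib

def serialize_type_table(table):
    """Convert a type table into a linear character sequence, then compress it."""
    text = json.dumps(table, sort_keys=True)    # linear character sequence
    return zlib.compress(text.encode("utf-8"))  # pre-processing: compression

def deserialize_type_table(blob):
    """Reverse the pre-processing and rebuild the type table."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

# Hypothetical type table for the root/page/text/image document type.
type_table = {
    "root": {"attributes": {}, "children": ["page"]},
    "page": {"attributes": {"number": "int"}, "children": ["text", "image"]},
    "text": {"attributes": {"content": "str"}, "children": []},
    "image": {"attributes": {"source": "str"}, "children": []},
}
blob = serialize_type_table(type_table)
restored = deserialize_type_table(blob)
```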
  • When storing document type information, it can be stored remotely, stored locally, or stored in program logic. The following describes the implementations for each of these three storage locations.
  • Document type information can be stored locally; so-called local storage refers to storing the document type information in the same file in which the document is stored.
  • A custom storage area for the document type information can be defined, and the area can be specified by methods including but not limited to the following: an area of a specified length starting from a specified position in the file; or a specific node or attribute added to the tree structure of the document type as storage for the document type information.
  • the program accessing the document uses the document type information inside the document by default.
  • Document type information can be stored remotely; so-called remote storage refers to storing the document type information in another file system external to the stored document.
  • When document type information is stored remotely, the available methods include but are not limited to: a remote or distributed file system, such as Network File System (NFS), WIN2000 Distributed File System (DFS), or Andrew File System (AFS); a local file system; a Web (WEB) server; or a File Transfer Protocol (FTP) server.
  • the URL or path information of the remote document type information is also stored in the SurXml document, and the method of selecting the storage location is the same as the method of selecting the storage location of the document type information by the local storage method.
  • the program that accesses the SurXml document finds the document type information based on the URL or path information saved in the document.
  • The document type information may also be stored in the program logic that accesses the SurXml document. Specifically, it can be hard-coded through a set of application program interface (API) functions, in which case the application calls the API functions to create the document type information in memory before accessing the contents of the SurXml document. Alternatively, the document type information can be stored directly in the source code or binary image of the application that accesses the SurXml document, so that the program can copy it straight into memory for use. This non-explicit storage method can support only a limited number of document types, and the program needs to assign an ID to each document type. The ID of the document type used must then be stored in the SurXml document.
  • When storing the document type information, any combination of the above three storage methods may also be adopted. For example, part of the document type information may be stored remotely, part locally, and part in the program logic.
  • Step 203 Map the content of the document to each node in the corresponding tree structure and its data according to the document type.
  • The data of a node is defined to include the content information and the location information of the node.
  • The content information of a node describes the document content corresponding to the node, including the node type tag, the node length, and the names/tags and values of the node attributes. The location information of a node describes the position of the node in the tree structure corresponding to the entire document, and may also be referred to as related node index information.
  • The related node index information includes the parent node ID of the node, the IDs of its left and right sibling nodes, the ID of its leftmost child node, the number of its child nodes, and the IDs of all its child nodes.
  • The IDs of the left and right sibling nodes and the IDs of all child nodes are optional. The position of the node in the tree structure can be expressed unambiguously without them; they are included to speed up retrieval.
  • The document content is mapped to different nodes and their data according to the document type.
  • For example, for a PDF document, each page is mapped to a page node, and the text information and the image information in the page are mapped to two child nodes of the page node, with node IDs A and B respectively.
  • The content information of the page node includes: a node type tag identifying it as a page node; the node length; and the names/tags and values of the node attributes, such as the header, footer, and page number.
  • The related node index information of the page node includes: the parent node is the root node of the PDF document; the left and right sibling nodes are other page nodes; the leftmost child node ID is A; the number of child nodes is 2; and the IDs of all child nodes are A and B.
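  • The related node index information of the page example can be modelled with a small Python sketch. The `NodeLinks` class, the `add_child` helper, and the node ID strings are hypothetical names chosen for illustration; the sketch keeps the optional sibling and child lists so that both retrieval paths are visible.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class NodeLinks:
    """Related node index information for one node."""
    parent: Optional[str] = None
    left_sibling: Optional[str] = None    # optional, speeds up retrieval
    right_sibling: Optional[str] = None   # optional, speeds up retrieval
    leftmost_child: Optional[str] = None
    children: List[str] = field(default_factory=list)  # optional full list

def add_child(links: Dict[str, NodeLinks], parent_id: str, child_id: str) -> None:
    """Append child_id as the rightmost child of parent_id and fix up links."""
    parent = links[parent_id]
    child = links.setdefault(child_id, NodeLinks())
    child.parent = parent_id
    if parent.children:
        last = parent.children[-1]
        links[last].right_sibling = child_id
        child.left_sibling = last
    else:
        parent.leftmost_child = child_id
    parent.children.append(child_id)

# A page node under the root, with a text child A and an image child B.
links = {"page1": NodeLinks(parent="root")}
add_child(links, "page1", "A")
add_child(links, "page1", "B")
print(links["page1"])
```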
  • Step 204 Allocate a storage space for the document according to the length of each node and its data, and allocate respective storage areas for the file header, the node index table, and the node data area in the storage space.
  • The file header is either fixed in length or short, so it needs no complicated storage allocation mechanism and can be allocated a fixed storage area directly. The lengths of the node index table and the node data area, by contrast, grow with the number of nodes. Therefore, in the present embodiment, a shrinkable storage allocation and recovery mechanism is used to allocate and organize the storage areas of the node index table and the node data area. This mechanism is also used for the storage allocation of large objects.
  • the shrinkable storage allocation and recovery mechanism employed in an embodiment of the present invention is similar to the inode/freelist mechanism in the UNIX file system. Specifically, the node index table and the node data area are respectively treated as a UNIX file, and each corresponds to an inode.
  • The entire storage space is divided into three parts: the super block, the inode table, and the data blocks, as shown in Table 1.
  • The super block and the inode table in Table 1 are used for the allocation and organization of storage space, while the actual data resides in the data blocks.
  • Table 2 shows the internal structure of the super block.
  • The free block table is used for the allocation and reclamation of storage space. Since the inode table is a fixed-size area, the number of inodes it can record is also fixed. Free entries in the inode table can be managed with a free inode table.
  • Inode0 is an extension of the super block in the original unix file system according to an embodiment of the present invention, and is used for expanding and shrinking the storage space occupied by the inode table.
  • a special inode, inode0 may be added to the superblock, where the block number in which the inode table is located is recorded, in which case the inode table is located in the data block.
  • the inode 0 is recorded in the super block as an inode number, and the inode 0 itself is stored in the inode table.
  • Each file corresponds to an inode.
  • The inode records the block numbers of the data blocks contained in its file, that is, the specific storage location of the file data corresponding to the inode.
  • An array of block numbers is stored in each inode.
  • The first several entries of the block number array record the block numbers of the blocks in which the file data of the inode is stored, and the last three entries record indirect blocks.
  • An indirect block is a data block in which block numbers of other data blocks are recorded.
  • FIG. 3 is a schematic diagram of a block number array and the data blocks corresponding to its entries, in which 301 is a data block, 302 is an indirect block, 303 is a double indirect block, and 304 is a triple indirect block.
  • The block number entries for the indirect, double indirect, and triple indirect blocks allow a large amount of data to be addressed from one inode, which is particularly useful for storing the node data area.
  • The number of indirect block entries can be set arbitrarily: the array may or may not go on to include quadruple indirect blocks, quintuple indirect blocks, and so on, depending on the size of the file corresponding to the inode.
  • the node index table and the node data area are both regarded as a UNIX file, and an inode is set for the node index table, marked as inode1, and an inode is set for the node data area, and is marked as inode2; Then inode1 records the specific storage location of the node index table, and inode2 records the specific storage location of the node data area.
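  • To make the block number array concrete, the sketch below resolves a logical block index within a file to the level of indirection needed and the entry indices along the path. The constants `N_DIRECT` and `PTRS` (number of direct entries and block numbers per indirect block) are assumed values, not figures from the patent.

```python
N_DIRECT = 10   # assumed number of direct entries in the block number array
PTRS = 256      # assumed number of block numbers held by one indirect block

def locate(logical_block):
    """Return (level, indices) for a logical block index of a file.

    level 0 means a direct entry; levels 1 to 3 mean single, double, and
    triple indirect. indices lists the entry to follow at each hop.
    """
    if logical_block < 0:
        raise ValueError("logical block index must be non-negative")
    if logical_block < N_DIRECT:
        return 0, [logical_block]
    remaining = logical_block - N_DIRECT
    for level in (1, 2, 3):
        capacity = PTRS ** level
        if remaining < capacity:
            indices = []
            for _ in range(level):
                indices.append(remaining % PTRS)
                remaining //= PTRS
            return level, indices[::-1]   # outermost indirect block first
        remaining -= capacity
    raise ValueError("beyond the range of the triple indirect block")

print(locate(9), locate(10), locate(10 + 256 + 261))
```

  • With these assumed sizes, one inode addresses 10 + 256 + 256² + 256³ blocks, which is why the same mechanism can serve both the small node index table and a large node data area.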
  • Step 205 Store the data of the node in the node data area, and record the correspondence between each node and the storage location of the node data in the node index table.
  • There are three ways to store node data in the node data area: the tlv mode, the SlottedPage mode, and the inode mode.
  • tlv stands for node type tag (tag) + node length (length) + node value (value). In this storage mode, all nodes are arranged in a certain order. Inside each node, the type tag comes first, followed by the node length, and finally the attribute values of the node and the IDs of other related nodes.
  • the nodes sequentially stored in the tlv manner are linearly arranged from the head. After all node data has been stored, there may be a certain amount of unused free area in the node data area.
  • The offset at which the free area starts in the node data area is recorded to facilitate management of the free area. It is also possible to reserve an area in the node data area in advance for recording this offset.
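  • A minimal sketch of the tlv mode, assuming a 1-byte type tag and a 4-byte big-endian length; these field widths and the tag values `PAGE` and `TEXT` are illustrative assumptions. Nodes are laid out linearly from the head, and the offset where the free area begins is simply the end of the last node.

```python
import struct

HEADER = struct.Struct(">BI")   # assumed layout: 1-byte tag, 4-byte length

def encode(tag, value):
    """Encode one node as tag + length + value."""
    return HEADER.pack(tag, len(value)) + value

def decode_all(buf):
    """Walk the linearly arranged nodes; return (offset, tag, value) tuples."""
    offset, out = 0, []
    while offset < len(buf):
        tag, length = HEADER.unpack_from(buf, offset)
        start = offset + HEADER.size
        out.append((offset, tag, buf[start:start + length]))
        offset = start + length
    return out

PAGE, TEXT = 1, 2               # hypothetical node type tags
area = encode(PAGE, b"p1") + encode(TEXT, b"hello")
free_area_offset = len(area)    # where the unused free area would begin
print(decode_all(area), free_area_offset)
```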
  • In the SlottedPage mode, the node data area is divided into fixed-size pages. Each node is located on a specific page, and multiple nodes can be stored in one page.
  • Within a page, an array of node offsets and the node data are recorded at the two ends of the page, and the offset array and the node data grow toward each other.
  • Node data can be moved freely within the free area of the page. This makes modifying node data more flexible, especially when the length of the node data changes, and paged storage suits environments that use a cache. The disadvantage is that node length is constrained by the page size, so this mode is suitable only for smaller nodes.
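  • The slotted page idea can be sketched as follows. This is a simplified model under stated assumptions: the offset array is kept as a Python list rather than serialized into the page, each slot is assumed to cost `SLOT_SIZE` bytes, and `update` repacks the whole page. The class and method names are hypothetical.

```python
SLOT_SIZE = 4   # assumed on-page size of one offset array element

class SlottedPage:
    """Fixed-size page: the offset array grows from the front, data from the back."""

    def __init__(self, size=64):
        self.buf = bytearray(size)
        self.slots = []          # offset array: (offset, length) per node
        self.data_start = size   # node data grows downward from the page end

    def free_space(self):
        return self.data_start - len(self.slots) * SLOT_SIZE

    def insert(self, payload):
        """Store payload and return its slot index (stable for the node's life)."""
        if len(payload) + SLOT_SIZE > self.free_space():
            raise ValueError("node does not fit in this page")
        self.data_start -= len(payload)
        self.buf[self.data_start:self.data_start + len(payload)] = payload
        self.slots.append((self.data_start, len(payload)))
        return len(self.slots) - 1

    def read(self, slot):
        offset, length = self.slots[slot]
        return bytes(self.buf[offset:offset + length])

    def update(self, slot, payload):
        """Change a node's data; data may move, slot indices do not."""
        payloads = [self.read(i) for i in range(len(self.slots))]
        payloads[slot] = payload
        if sum(map(len, payloads)) + len(self.slots) * SLOT_SIZE > len(self.buf):
            raise ValueError("updated node does not fit in this page")
        self.data_start = len(self.buf)
        for i, data in enumerate(payloads):
            self.data_start -= len(data)
            self.buf[self.data_start:self.data_start + len(data)] = data
            self.slots[i] = (self.data_start, len(data))

page = SlottedPage(64)
a = page.insert(b"alpha")
b = page.insert(b"beta")
page.update(a, b"alphabet")      # node grows and is relocated inside the page
print(page.read(a), page.read(b), page.free_space())
```

  • Because callers hold only the slot index, a node index table that stores (page, slot) stays valid even when node data is moved within the page.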
  • Nodes or attributes with large lengths are not suitable for storage in the form of tlv or slotted pages. If you use the tlv method to store the node data, it will occupy a large amount of memory when loading the node data. If you use the SlottedPage method, you cannot create a larger node object due to the limitation of the page size.
  • the node with a relatively large length can be stored by using the aforementioned inode/freelist mechanism. Specifically, an inode is set in the inode table for the large node to be stored, the storage location of the node data is recorded by the block number array, and the inode number corresponding to the node is recorded in the node data area.
  • The storage location of the node data is expressed differently depending on the storage mode.
  • In the tlv mode, the starting offset of the node data in the node data area may be used to indicate the storage location of the node data.
  • In the SlottedPage mode, the storage location may be represented by the address of the page where the node is located and the index of the node's element in the offset array of that page.
  • When node data moves within a page, the index of its offset array element does not change, so the storage location recorded in the node index table does not change either.
  • The node data can be located precisely by combining the storage location recorded in the node index table with the value of the offset array element in the page.
  • In the inode mode, the inode number corresponding to the node may be recorded in the node index table as the storage location of the node data.
  • The node index may then be recorded; that is, the ID of each node is mapped to the storage location of its data in the node data area.
  • the implementation of the node index can be as follows:
  • the node ID is used as a key value to establish a hash table, and the hash table stores the storage location of the node data in the node data area.
  • a linear table can also be used to store the node ID and the corresponding storage location.
  • the node data needs to be stored in the node data area, and then the node index table is filled according to the storage location of the node data in the node data area.
  • Step 206: Store the identification information of the document and the entry information for accessing it in the file header.
  • The main purpose of the file header is to describe the file, providing metadata and the entry information needed to access the contents of the file.
  • The entry information required in the file header of a SurXml document includes: the storage locations and lengths of the node index table and the node data area in the file; and the IDs of one or more root nodes (the logical structure of a SurXml document is a tree, so its nodes may form a single tree or a forest).
  • the storage location of the node index table and the node data area in the file is the corresponding inode number.
  • the description information that SurXml's file header needs to provide can be quite arbitrary, but must include one or more unique identifiers so that applications that need access to the file content can identify the document as SurXml.
  • the starting offset of the file header data can be fixed or recorded to a fixed offset position in the file
  • the length of the header data can be fixed or recorded to a fixed offset position in the file
  • When the starting offset and length of the file header data are recorded at a fixed offset position, the indirection can be deepened further: the offset o2 is recorded at a fixed offset position o1, o3 is recorded at offset o2, and so on, until the offset and length of the file header data are recorded at offset on.
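  • The chained-offset lookup can be sketched as below. The 4-byte little-endian offset encoding, the agreed number of hops, and the `SURXML` marker bytes are all assumptions made for the example; the patent does not fix any of them.

```python
import struct

U32 = struct.Struct("<I")     # assumed encoding of one stored offset
PAIR = struct.Struct("<II")   # assumed encoding of (header offset, header length)

def locate_header(buf, o1, hops):
    """Follow `hops` stored offsets starting at fixed position o1, then read
    the (offset, length) pair found there and return the file header bytes."""
    pos = o1
    for _ in range(hops):
        (pos,) = U32.unpack_from(buf, pos)
    offset, length = PAIR.unpack_from(buf, pos)
    return bytes(buf[offset:offset + length])

# Build a small hypothetical file image: o1 = 0 -> o2 = 8 -> o3 = 16.
buf = bytearray(64)
U32.pack_into(buf, 0, 8)          # at o1: the offset o2
U32.pack_into(buf, 8, 16)         # at o2: the offset o3
PAIR.pack_into(buf, 16, 32, 6)    # at o3: header offset 32, length 6
buf[32:38] = b"SURXML"            # hypothetical identifier in the header
print(locate_header(buf, 0, 2))
```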
  • step 206 is performed after step 205. In fact, step 206 may also be performed concurrently with step 205 or before step 205.
  • When an application accesses the document, it first determines the SurXml document type, then determines the node ID corresponding to the content to be accessed, looks up the entry for that node ID in the node index table to determine the storage location of the node data, and then accesses the node data. The entire document therefore does not need to be parsed when it is accessed; node data can be accessed directly, which makes access fast, simplifies processing, and greatly improves access performance.
  • A SurXml document stored in the above manner may not fully occupy the storage space allocated to it; that is, the storage space allocated for the document may contain free areas.
  • the storage method of the present invention can effectively manage the free area in the document storage space and the free area in the document internal node data area.
  • the management of free space in the storage space allocated for the document is done through the free block table.
  • The free block table is stored in parts. Each part consists of one or more consecutive blocks in which the block numbers of unallocated free blocks in the storage space are recorded.
  • The first entry in each part of the free block table records the starting block number of the next part, and the first entry of the last part is 0, indicating that it is the last part of the free block table.
  • Figure 4 shows a schematic diagram of a free block table.
  • 401 is the free block table shown in Table 1, which is the first part of the entire free block table. Its first entry records the starting block number a of the next part of the free block table, from which the next part 402 can be found, and so on up to 405. The value of the first entry in 405 is 0, indicating that it is the last part of the free block table.
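  • A minimal sketch of walking the chained free block table, assuming each part is modelled as a list whose first entry is the starting block number of the next part (0 for the last part) and whose remaining entries are free block numbers. The function name and the sample block numbers are hypothetical.

```python
def iter_free_blocks(super_part, parts):
    """Yield every free block number by following the chain of parts.

    super_part: the first part of the free block table (held in the super block).
    parts: dict mapping a starting block number -> the part stored there.
    """
    entries = super_part
    while True:
        yield from entries[1:]       # free block numbers recorded in this part
        next_block = entries[0]      # first entry links to the next part
        if next_block == 0:          # 0 marks the last part
            return
        entries = parts[next_block]

# Hypothetical table: first part -> part at block 40 -> part at block 52 (last).
super_part = [40, 7, 9]
parts = {40: [52, 11, 12], 52: [0, 30]}
print(list(iter_free_blocks(super_part, parts)))
```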
  • the storage space occupied by the SurXml document can be expanded and contracted.
  • the expansion of the storage space is relatively simple. It is only necessary to incorporate the expanded free space into the management of the free block table. In fact, the new free block number is appended to the end of the free block table.
  • Figure 5 is a flow chart for shrinking the storage space. As shown in Figure 5, the process includes:
  • Step 501: Determine the target shrinkage of the storage space and the number of free blocks in the original storage space.
  • Step 502: Determine whether the storage space can be shrunk. If so, perform step 503 and the subsequent steps; otherwise, end the flow.
  • In this step, shrinkage is judged as follows: if the target shrinkage of the storage space is greater than the total length of the free space, the shrinkage cannot be completed; otherwise, it can.
  • Step 503: Determine whether the SurXml document occupies non-free data blocks in the contraction area at the tail of the storage space. If so, perform step 504 and the subsequent steps; otherwise, perform step 505 and the subsequent steps.
  • Step 504: Move the contents of the non-free data blocks in the contraction area into free data blocks of the non-contraction area occupied by the SurXml document, and update the corresponding inodes and the free block table.
  • Step 505: Determine whether any data block storing the free block table is located in the contraction area at the tail of the storage space occupied by the SurXml document. If so, perform step 506 and the subsequent steps; otherwise, perform step 507.
  • Step 506: Transfer the block number entries that describe free blocks of the non-contraction area from the part of the free block table located in the contraction area to the free block table in the non-contraction area.
  • Step 507: Calculate the new tail of the free block table, and truncate the storage space occupied by the SurXml document to the specified length.
  • all of the free blocks may be shrunk.
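  • The shrink flow can be approximated with the following sketch. It is a simplified model under stated assumptions: blocks are a Python list in which `None` marks a free block, so the free block table is implicit and steps 505 and 506 have no counterpart here; inodes are modelled as lists of block numbers. The function name and sample data are hypothetical.

```python
def shrink(blocks, inodes, target):
    """Release `target` blocks from the tail of the storage space if possible.

    blocks: list of block contents, None marks a free block.
    inodes: dict mapping a file name -> list of block numbers it occupies.
    Returns True after truncating, or False if the shrinkage cannot be done.
    """
    free = [i for i, b in enumerate(blocks) if b is None]
    if target > len(free):                    # steps 501-502: feasibility check
        return False
    keep = len(blocks) - target               # first block of the contraction area
    holes = [i for i in free if i < keep]     # free blocks outside that area
    for src in range(keep, len(blocks)):      # steps 503-504: relocate used blocks
        if blocks[src] is None:
            continue
        dst = holes.pop(0)
        blocks[dst], blocks[src] = blocks[src], None
        for numbers in inodes.values():       # update the corresponding inodes
            for k, n in enumerate(numbers):
                if n == src:
                    numbers[k] = dst
    del blocks[keep:]                         # step 507: truncate the tail
    return True

blocks = [b"a", None, b"b", None, b"c"]
inodes = {"index": [0, 2], "data": [4]}
first = shrink(blocks, inodes, 2)
second = shrink(blocks, inodes, 1)            # no free blocks remain
print(first, second, blocks, inodes)
```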
  • the above is the management of the storage space occupied by the SurXml document.
  • The free space inside the storage area occupied by the node data area is organized and managed according to the storage mode of the node data.
  • In the tlv mode, the offset at which the free area starts in the node data area is recorded to facilitate management of the free area.
  • In the SlottedPage mode, the free space inside each page of the node data area is managed by the page itself. When free space must be found for a new node, the pages can be searched one by one; to make the search more efficient, a free page index keyed on the size of the free space in each page of the node data area can also be established. Index methods that can be used include: directly recording the page number and the size of its free space; sorting page numbers with the free space size as the key; a B-tree/B+ tree with the free space size as the key; and so on.
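  • One of the listed index methods, page numbers sorted with the free space size as the key, can be sketched with a sorted list and binary search. The class name `FreePageIndex` and its methods are hypothetical; a B-tree or B+ tree would play the same role for an on-disk index.

```python
import bisect

class FreePageIndex:
    """Pages of the node data area, kept sorted by the size of their free space."""

    def __init__(self):
        self._entries = []   # sorted list of (free_bytes, page_no)

    def add(self, page_no, free_bytes):
        bisect.insort(self._entries, (free_bytes, page_no))

    def take(self, needed):
        """Remove and return (page_no, free_bytes) for the page with the
        smallest free space that is still >= needed, or None if none fits."""
        # Page numbers are non-negative, so (needed, -1) sorts before any
        # real entry with exactly `needed` free bytes.
        i = bisect.bisect_left(self._entries, (needed, -1))
        if i == len(self._entries):
            return None
        free_bytes, page_no = self._entries.pop(i)
        return page_no, free_bytes

index = FreePageIndex()
index.add(0, 10)
index.add(1, 50)
index.add(2, 30)
best = index.take(20)        # smallest page that can hold a 20-byte node
none_fit = index.take(60)
print(best, none_fit)
```

  • After placing the node, the caller would add the page back with its reduced free space so that the index stays current.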
  • the allocation, organization, and collection of the free page index storage in the node data area may use the inode/freelist mechanism to process the free page index of the node data area as a file corresponding to an inode.
  • the free area may be marked as a free data block of the SurXml document according to the policy, that is, the free area is deleted from the node data area.
  • When the SurXml document requires a space shrink operation, the shrinking is performed as described above.
  • the management method of the free area makes the collection and release of the document space more convenient, and effectively improves the storage efficiency of the document.
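The free page index keyed by free space size can be illustrated with a sorted list, one of the index methods named above. The sketch below is an assumption-laden example rather than the patent's data structure: the class name `FreePageIndex` and its methods are hypothetical, and a B-tree or B+ tree could replace the sorted list without changing the interface.

```python
import bisect


class FreePageIndex:
    """Pages of the node data area, sorted by the size of their free space."""

    def __init__(self):
        self._entries = []  # sorted list of (free_bytes, page_no)
        self._free = {}     # page_no -> free_bytes

    def set_free(self, page_no, free_bytes):
        """Record (or update) the free space of one page."""
        old = self._free.get(page_no)
        if old is not None:
            i = bisect.bisect_left(self._entries, (old, page_no))
            del self._entries[i]
        self._free[page_no] = free_bytes
        bisect.insort(self._entries, (free_bytes, page_no))

    def find_page(self, needed):
        """Return the page with the smallest free space >= needed, or None."""
        i = bisect.bisect_left(self._entries, (needed, -1))
        return self._entries[i][1] if i < len(self._entries) else None

    def allocate(self, needed):
        """Reserve `needed` bytes in the best-fitting page, or return None."""
        page_no = self.find_page(needed)
        if page_no is not None:
            self.set_free(page_no, self._free[page_no] - needed)
        return page_no
```

Looking up a page this way takes a binary search instead of a page-by-page scan, which is the efficiency gain the index is meant to provide.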
  • the operations involved in the node include: creating a node, deleting a node, adding a child node, and modifying the attributes of the node.
  • storing the node data involves the operation of creating a node, such as adding a new table to an established document.
  • the operation specifically includes the following steps:
  • b. Find a free area in the node data area and allocate space for the node data. Specifically, the amount of space allocated may be determined from the node attribute data.
  • The organization and management method for free space in the node data area described above is used here.
  • Record the input node data in the storage location of the node data area allocated for the node. Specifically, the input node type flag and the node attribute data are recorded in that storage location.
  • the index entry corresponding to the node ID is deleted from the node index table.
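The node creation and deletion operations can be sketched as below. This is a simplified example under stated assumptions: the class `NodeStore`, the one-byte type flag, and the first-fit allocator are hypothetical choices, and the steps that assign a node ID and add the index entry are not listed in the text above and are filled in here only for illustration.

```python
class NodeStore:
    """Toy node index table plus node data area held in memory."""

    def __init__(self):
        self.data = bytearray()  # node data area
        self.index = {}          # node index table: node ID -> (offset, length)
        self.free = []           # free areas in the data area: (offset, length)
        self._next_id = 1

    def _allocate(self, size):
        # First fit over the free areas; otherwise grow the data area.
        for i, (off, length) in enumerate(self.free):
            if length >= size:
                if length == size:
                    del self.free[i]
                else:
                    self.free[i] = (off + size, length - size)
                return off
        off = len(self.data)
        self.data.extend(bytes(size))
        return off

    def create_node(self, type_flag, attr_data):
        """Store a type flag (0..255) and attribute bytes; return the node ID."""
        record = bytes([type_flag]) + attr_data
        node_id = self._next_id
        self._next_id += 1
        off = self._allocate(len(record))
        self.data[off:off + len(record)] = record
        self.index[node_id] = (off, len(record))
        return node_id

    def read_node(self, node_id):
        """Return (type_flag, attr_data) by looking up the index entry."""
        off, length = self.index[node_id]
        return self.data[off], bytes(self.data[off + 1:off + length])

    def delete_node(self, node_id):
        """Delete the index entry and hand the space back as a free area."""
        off, length = self.index.pop(node_id)
        self.free.append((off, length))
```

Deleting a node only touches its index entry and the free-area list, so the space can be reused by a later `create_node` call.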
  • The operation of adding a child node is involved, for example, when a newly added table establishes a connection with the page on which it is located.
  • the operation specifically includes updating the relevant node index information of the relevant node. Specifically, the following steps are included:
  • child node position refers to the position of the child node in all child nodes of the parent node.
  • Updating the node index information of the relevant nodes may also be done by traversing the linked list formed by the left and right sibling node pointers of the child node.
  • The operation of deleting a child node is the counterpart of adding a child node, that is, it cancels the connection relationship between the nodes, for example, removing a table from the current page.
  • the operation specifically includes updating the relevant node index information of the relevant node. Specifically, the following steps are included:
  • The parent node ID value of the child node may be set to null, or it may be set to the ID of the original grandparent node.
  • Updating the node index information of the relevant nodes may also be done by traversing the linked list formed by the left and right sibling node pointers of the child node.
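Adding and deleting a child node amounts to updating a few index entries. The sketch below assumes, hypothetically, that each index entry holds the IDs of the parent, the first child, and the left and right siblings; the names `IndexEntry`, `add_child`, and `remove_child` do not come from the source, and the parent ID is set to null on removal (the first of the two options mentioned above).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexEntry:
    parent: Optional[int] = None
    first_child: Optional[int] = None
    left: Optional[int] = None   # left sibling ID
    right: Optional[int] = None  # right sibling ID


def children(index, parent_id):
    """Walk the sibling chain of a parent and return the child IDs in order."""
    out, cur = [], index[parent_id].first_child
    while cur is not None:
        out.append(cur)
        cur = index[cur].right
    return out


def add_child(index, parent_id, child_id, position):
    """Link child_id under parent_id at the given child node position."""
    sibs = children(index, parent_id)
    position = max(0, min(position, len(sibs)))
    left = sibs[position - 1] if position > 0 else None
    right = sibs[position] if position < len(sibs) else None
    entry = index[child_id]
    entry.parent, entry.left, entry.right = parent_id, left, right
    if left is None:
        index[parent_id].first_child = child_id
    else:
        index[left].right = child_id
    if right is not None:
        index[right].left = child_id


def remove_child(index, child_id):
    """Unlink child_id from its parent; its parent ID becomes null."""
    entry = index[child_id]
    if entry.parent is None:
        return  # not attached to any parent
    if entry.left is None:
        index[entry.parent].first_child = entry.right
    else:
        index[entry.left].right = entry.right
    if entry.right is not None:
        index[entry.right].left = entry.left
    entry.parent = entry.left = entry.right = None
```

Only index entries change in both operations; the node data area is not touched.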
  • the SurXml document may involve the operation of modifying the node attributes, for example, modifying some of the tables.
  • the operation specifically includes the following steps:
  • a. Determine the node data; specifically, determine the node ID, the node attribute name, and the node attribute value.
  • A SurXml document stored by the above method may sometimes require operations such as encryption and lossless compression. These can all be achieved by applying a corresponding reversible transformation to a subtree of the document.
  • In the embodiment of the present invention, the reversible transformation operation on a specified SurXml document specifically includes the following steps:
  • a. Determine the ID of the subtree root node and the transformation function that correspond to the content to be transformed; for example, determine the subtree root node ID and an encryption function. Because encryption may cover the entire document or only a part of it, this step must locate the encrypted portion within the whole document; specifically, the subtree root node corresponding to the encrypted portion is determined.
  • the traversal can use depth-first traversal or breadth-first traversal.
  • the ID and data of each node traversed are recorded in a linear buffer, and the data in the buffer is processed by the determined encryption function.
  • the operation in this step actually completes the process of encrypting the specified document content.
  • The root node of the subtree is marked as the root of the transformation tree; the other transformed nodes in the subtree are marked as internal nodes of the transformation tree, and the ID of the root node is recorded in the index entries of those other transformed nodes. To preserve the connection relationships between nodes, all transformed node IDs remain unchanged.
  • The data in the buffer processed by the transform function (for example, the encryption function) is saved as the data of the subtree root node; at the same time, the data of the other nodes in the subtree is emptied.
  • the transformed document may also be inversely transformed to obtain an initial document.
  • the specific operations include the following steps:
  • For example, the decryption function is used to process the data of the subtree root node. Because the encrypted data of all nodes in the subtree is stored as the subtree root node data, decrypting the subtree root node data decrypts the data of every node in the subtree.
  • The data of the subtree root node after the inverse transform is restored, in sequence, to the data of each node according to the traversal order used when the transform was performed.
  • For example, the data of each node is restored in sequence according to the traversal order used during encryption.
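The transform and its inverse can be sketched as a pair of functions. This example makes several assumptions that are not in the source: nodes are plain dictionaries, the linear buffer stores each node as a little-endian (ID, length) pair followed by its data, and `zlib` lossless compression stands in for the transformation function (an encryption function would be plugged in the same way). The function names are hypothetical.

```python
import struct
import zlib


def preorder(nodes, root_id):
    """Depth-first (pre-order) traversal of the subtree rooted at root_id."""
    stack, order = [root_id], []
    while stack:
        nid = stack.pop()
        order.append(nid)
        stack.extend(reversed(nodes[nid]["children"]))
    return order


def transform_subtree(nodes, root_id, fn=zlib.compress):
    """Pack the subtree's data into the root node via a reversible function."""
    order = preorder(nodes, root_id)
    buf = bytearray()
    for nid in order:
        data = nodes[nid]["data"]
        buf += struct.pack("<II", nid, len(data)) + data  # ID, length, data
    for nid in order[1:]:
        nodes[nid]["data"] = b""                     # empty the other nodes
        nodes[nid]["mark"] = ("internal", root_id)   # remember the root ID
    nodes[root_id]["data"] = fn(bytes(buf))
    nodes[root_id]["mark"] = ("root", root_id)


def inverse_transform_subtree(nodes, root_id, inverse_fn=zlib.decompress):
    """Undo transform_subtree, restoring node data in the recorded order."""
    buf = inverse_fn(nodes[root_id]["data"])
    pos = 0
    while pos < len(buf):
        nid, length = struct.unpack_from("<II", buf, pos)
        pos += 8
        nodes[nid]["data"] = buf[pos:pos + length]
        nodes[nid].pop("mark", None)
        pos += length
```

Node IDs and child lists are never modified, so the tree remains navigable while the subtree is in its transformed state.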
  • An XML document is a common document format, and a SurXml document formed in accordance with the storage method of the present invention can also be converted to and from an XML format document.
  • the specific process of converting an XML document into a SurXml document includes:
  • the specific process of converting a SurXml document to an XML document includes:
  • the process in this step is recursive, and the child nodes are converted to child elements in the XML document.
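The recursive conversion of nodes to XML elements can be illustrated with Python's standard `xml.etree.ElementTree` module. The in-memory node layout (a dictionary with `type`, `attrs`, and `children` keys) and the function names are assumptions made for this sketch, not the SurXml format itself.

```python
import xml.etree.ElementTree as ET


def node_to_element(nodes, node_id):
    """Convert one node, and recursively its child nodes, to an XML element."""
    node = nodes[node_id]
    attrs = {name: str(value) for name, value in node["attrs"].items()}
    elem = ET.Element(node["type"], attrs)
    for child_id in node["children"]:
        # Each child node becomes a child element, in child order.
        elem.append(node_to_element(nodes, child_id))
    return elem


def to_xml(nodes, root_id):
    """Serialize the tree rooted at root_id to an XML string."""
    return ET.tostring(node_to_element(nodes, root_id), encoding="unicode")
```

The opposite direction would walk the parsed XML elements in the same recursive fashion and create one node per element.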
  • document types representing different tree structures are pre-configured, and the document storage format is defined to include a node index table and a node data area.
  • With a document storage format that has a node index, nodes can be looked up quickly.
  • the document content is first mapped to the corresponding nodes and their data in the corresponding tree structure according to the document type of the document.
  • the document is made up of nodes, and the nodes are organized by a tree structure.
  • a certain storage space is allocated to the document in the storage medium, and the storage area is further allocated to the node index table and the node data area in the storage space.
  • each node data corresponding to the document is stored in the node data area, and the correspondence relationship between each node and the storage location of the node data is recorded in the node index table, and the entry information of the access document is stored in the file header.
  • a document can be stored in a storage medium in a tree structure, and a storage format with a node index is used.
  • the node corresponding to the content can be directly accessed without the need to process the entire document in advance.
  • the system resources are saved and the access performance is improved; in addition, the method of the present invention still retains the configuration of the document type and inherits the advantages of the XML document.
  • The document storage method of the invention supports the storage of complex tree-structured data, requires little storage space, is readily extensible, allows incremental modification, and makes it easy to improve storage security.
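The storage format with a file header, a node index table, and a node data area can be sketched as a small binary layout. Everything concrete here is an assumption for illustration: the magic value `SURX`, the field widths, and the function names `pack_document` and `read_node` are hypothetical and are not the actual SurXml format.

```python
import struct

HEADER = struct.Struct("<4sII")  # magic, node count, offset of the data area
ENTRY = struct.Struct("<III")    # node ID, offset in the data area, length


def pack_document(node_data):
    """Build header + node index table + node data area from {ID: bytes}."""
    index, data = bytearray(), bytearray()
    for node_id, payload in sorted(node_data.items()):
        index += ENTRY.pack(node_id, len(data), len(payload))
        data += payload
    data_start = HEADER.size + len(index)
    header = HEADER.pack(b"SURX", len(node_data), data_start)
    return header + bytes(index) + bytes(data)


def read_node(blob, node_id):
    """Fetch one node's data through the index, without parsing other nodes."""
    magic, count, data_start = HEADER.unpack_from(blob, 0)
    if magic != b"SURX":
        raise ValueError("unexpected file header")
    lo, hi = 0, count
    while lo < hi:  # binary search over index entries sorted by node ID
        mid = (lo + hi) // 2
        nid, off, length = ENTRY.unpack_from(blob, HEADER.size + mid * ENTRY.size)
        if nid == node_id:
            return blob[data_start + off:data_start + off + length]
        if nid < node_id:
            lo = mid + 1
        else:
            hi = mid
    raise KeyError(node_id)
```

Reading a node touches only the header, a few index entries, and that node's bytes, which is the access pattern the node index is meant to enable.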
  • When the present invention allocates storage areas to the node index table and the node data area, the node index table and the node data area are treated as two separate files, and a mechanism similar to the UNIX inode/freelist is used for the allocation, organization, and reclamation of these two storage areas. This makes the two storage areas easy to shrink and expand, and simplifies operations when nodes are added and node data grows.

Abstract

The invention relates to a document storage method. The method comprises: mapping the content of a document to a tree structure according to designated document types, the type information of each node being defined in the document type; storing the content of each node to construct the tree structure and the information on the relationships between nodes; recording the storage location of the node data in the node index table, the node data comprising the node content; and storing the node index table. The storage method improves the access performance of the document.
PCT/CN2010/073412 2009-08-19 2010-06-01 Document storage method WO2011020360A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2009100905332 2009-08-19
CN200910090533 2009-08-19

Publications (1)

Publication Number Publication Date
WO2011020360A1 true WO2011020360A1 (fr) 2011-02-24

Family

ID=43606611

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/073412 WO2011020360A1 (fr) 2009-08-19 2010-06-01 Document storage method

Country Status (1)

Country Link
WO (1) WO2011020360A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060064432A1 (en) * 2004-09-22 2006-03-23 Pettovello Primo M Mtree an Xpath multi-axis structure threaded index
CN101124574A (zh) * 2004-04-30 2008-02-13 微软公司 Property tree for metadata navigation and assignment

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018075041A1 (fr) * 2016-10-20 2018-04-26 Hitachi, Ltd. Data storage system and process for providing distributed storage in a scalable cluster system and computer program for such data storage system
US10956393B2 (en) 2016-10-20 2021-03-23 Hitachi, Ltd. Data storage system and process for providing distributed storage in a scalable cluster system and computer program for such data storage system
CN110149803A (zh) * 2018-08-27 2019-08-20 深圳市锐明技术股份有限公司 Data storage method, system and terminal device
CN111209444A (zh) * 2020-01-06 2020-05-29 电子科技大学 Storage method for multi-version graph topology data based on time series
CN111209444B (zh) * 2020-01-06 2023-03-31 电子科技大学 Storage method for multi-version graph topology data based on time series
CN111813813A (zh) * 2020-07-08 2020-10-23 杭州海康威视系统技术有限公司 Data management method, apparatus, device and storage medium
CN111813813B (zh) * 2020-07-08 2024-02-20 杭州海康威视系统技术有限公司 Data management method, apparatus, device and storage medium
CN113741796A (zh) * 2020-07-16 2021-12-03 北京沃东天骏信息技术有限公司 Data persistence method and apparatus for a terminal application
CN113741796B (zh) * 2020-07-16 2024-04-16 北京沃东天骏信息技术有限公司 Data persistence method and apparatus for a terminal application

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 10809493; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 10809493; Country of ref document: EP; Kind code of ref document: A1