CN114942963A - Data storage method, device, equipment and storage medium - Google Patents

Publication number
CN114942963A
Authority
CN
China
Prior art keywords
sub
metadata
storage
record
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210435936.1A
Other languages
Chinese (zh)
Inventor
郭亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278 Data partitioning, e.g. horizontal or vertical partitioning
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G06F16/9027 Trees

Abstract

The application provides a data storage method, apparatus, device and storage medium. The method comprises: acquiring a storage area of metadata, wherein the storage area comprises a plurality of sub-storage areas, different sub-storage areas are isolated from each other, and different sub-storage areas are used for storing metadata of different versions; acquiring a plurality of commit records that correspond to the plurality of sub-storage areas and have a first association relationship with them, wherein a commit record records the commit information produced after the metadata stored in a sub-storage area is modified; constructing a second association relationship between the commit record and a data storage structure; and storing the metadata according to the first association relationship between the sub-storage area and the commit record and the second association relationship between the commit record and the data storage structure. The method ensures data consistency and processing efficiency when storing metadata, and is simple to implement.

Description

Data storage method, device, equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data storage method, apparatus, device, and storage medium.
Background
At present, the distributed version control system Git is used as a version management tool and can manage versions of text files. However, Git cannot provide real-time storage and query services for structured metadata.
In the prior art, there is a solution in which Git provides only the version management service and a relational database provides real-time storage and query services for the metadata. With this solution, however, data consistency and processing efficiency are difficult to guarantee when metadata is processed, processing becomes more complex, and user experience suffers.
Disclosure of Invention
Embodiments of the present invention provide a data storage method, apparatus, device, and storage medium that ensure data consistency and processing efficiency when storing metadata and are simple to implement.
In a first aspect, an embodiment of the present invention provides a data storage method, where the method includes:
acquiring a storage area of metadata, wherein the storage area comprises a plurality of sub-storage areas, different sub-storage areas are isolated from each other, and different sub-storage areas are used for storing metadata of different versions;
acquiring a plurality of commit records that correspond to the plurality of sub-storage areas and have a first association relationship with them, wherein the commit records are used for recording commit information after the metadata stored in the sub-storage areas is modified;
constructing a second association relationship between the commit record and a data storage structure;
and storing the metadata according to the first association relationship between the sub-storage area and the commit record and the second association relationship between the commit record and the data storage structure.
In a second aspect, an embodiment of the present invention provides a data storage apparatus, including:
a first obtaining module, configured to obtain a storage area of metadata, where the storage area comprises a plurality of sub-storage areas, different sub-storage areas are isolated from each other, and different sub-storage areas are used for storing metadata of different versions;
a second obtaining module, configured to obtain a plurality of commit records that correspond to the plurality of sub-storage areas and have a first association relationship with them, where the commit records are used to record commit information after the metadata stored in the sub-storage areas is modified;
a relationship building module, configured to construct a second association relationship between the commit record and the data storage structure;
and a data storage module, configured to store the metadata according to the first association relationship between the sub-storage area and the commit record and the second association relationship between the commit record and the data storage structure.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a memory, a processor, a communication interface; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to implement at least the data storage method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to implement at least the data storage method according to the first aspect.
In the embodiment of the invention, when metadata is stored, a storage area of the metadata is first obtained so that the metadata can be managed conveniently. The storage area comprises a plurality of sub-storage areas that are isolated from each other and store metadata of different versions. A plurality of commit records that correspond to the sub-storage areas and have a first association relationship with them are then obtained; a commit record records the commit information produced after the metadata stored in a sub-storage area is modified. A second association relationship is constructed between the commit record and the data storage structure, and the metadata is stored according to the first association relationship between the sub-storage area and the commit record and the second association relationship between the commit record and the data storage structure.
This scheme needs no additional version management tool such as Git. Metadata of different versions is simply stored in different, mutually isolated sub-storage areas, and a second association relationship is established between the data storage structure and the commit record that has the first association relationship with the sub-storage area. The metadata can then be stored according to these two association relationships, which ensures data consistency and processing efficiency when storing metadata and keeps the implementation simple.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a data storage method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a hash tree structure according to an embodiment of the present invention;
Fig. 3 is a schematic application diagram of a data storage method according to an embodiment of the present invention;
Fig. 4 is a flowchart of an alternative data storage method provided by an embodiment of the present invention;
Fig. 5 is a diagram illustrating an alternative merging of two sub-storage areas according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a data storage device according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The features of the embodiments and examples described below may be combined with each other without conflict between the embodiments. In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
The terms or concepts involved in the embodiments of the present invention are explained first:
Hash tree (Merkle tree): a multi-way tree structure in which each node holds a hash value calculated by mixing the hash values of all of its child nodes. The hash value of the root node therefore depends on every node in the tree.
Graph database (GDB): in computer science, a database that uses graph structures for semantic queries and uses nodes, edges, and attributes to represent and store data.
Metadata: also called intermediary data or relay data; data that describes data (data about data). It mainly describes data properties and supports functions such as indicating storage location, recording history, resource search, and file records.
The data storage method provided by the embodiment of the invention can be executed by an electronic device, and in practical application, the electronic device can be a server or a user terminal such as a PC, and the server can be a physical server or a virtual server (virtual machine) in a cloud.
Fig. 1 is a flowchart of a data storage method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
101. acquiring a storage area of metadata, wherein the storage area comprises a plurality of sub-storage areas, different sub-storage areas are isolated from each other, and different sub-storage areas are used for storing metadata of different versions;
102. acquiring a plurality of commit records that correspond to the plurality of sub-storage areas and have a first association relationship with them, wherein the commit records are used for recording commit information after the metadata stored in the sub-storage areas is modified;
103. constructing a second association relationship between the commit record and a data storage structure;
104. storing the metadata according to the first association relationship between the sub-storage area and the commit record and the second association relationship between the commit record and the data storage structure.
In the embodiment of the present invention, metadata may be understood as data that describes data (data about data). It mainly describes data properties, for example to support functions such as indicating storage location, recording history, resource search, and file records.
Optionally, a storage area may be understood as a data storage warehouse (Repository) for storing metadata. For example, the metadata may be divided into different data types according to how it is used and then stored in different storage areas according to its data type, so that each kind of metadata is managed separately.
In addition, different storage areas are isolated from each other. For example, the sub-storage areas in different storage areas are isolated from each other, and the metadata stored in different storage areas is also isolated from each other.
As described above, a storage area may include a plurality of sub-storage areas; different sub-storage areas are isolated from each other and are used for storing different versions of metadata.
Optionally, a sub-storage area may be understood as a storage branch (Branch) in the data storage warehouse. A plurality of sub-storage areas may be created in a data storage warehouse. Each data storage warehouse has at least one default main sub-storage area that serves as the baseline sub-storage area. Every other new sub-storage area is cloned or copied from the main sub-storage area, and until it is modified, the metadata stored in the new sub-storage area is fully consistent with that of the main sub-storage area at the time of cloning.
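The relationship between a storage area (Repository) and its sub-storage areas (Branch) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class names, the `create_branch` helper, and the use of `master` as the name of the default main branch are hypothetical, and the short strings standing in for commit hash keys are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Branch:
    """A sub-storage area: a named pointer to its latest commit record."""
    name: str
    last_commit_key: Optional[str] = None  # hash key of the latest commit, if any


@dataclass
class Repository:
    """A storage area holding mutually isolated branches (hypothetical model)."""
    name: str
    branches: Dict[str, Branch] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every repository starts with one default main branch as the baseline.
        self.branches.setdefault("master", Branch("master"))

    def create_branch(self, new_name: str, source: str = "master") -> Branch:
        """Clone `source`: the new branch starts at the same commit."""
        if new_name in self.branches:
            raise ValueError(f"branch {new_name!r} already exists")
        clone = Branch(new_name, self.branches[source].last_commit_key)
        self.branches[new_name] = clone
        return clone


repo = Repository("configData")
repo.branches["master"].last_commit_key = "y1az"   # placeholder hash key
feature = repo.create_branch("feature_xxx")
# Moving the clone forward leaves the baseline untouched (isolation).
feature.last_commit_key = "u7ec"                   # placeholder hash key
```

Because a branch is only a pointer to a commit record, cloning copies no metadata; the two branches diverge only once one of them receives a new commit.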
Optionally, in this embodiment of the present invention, when a user wants to modify a sub-storage area, the stored content of the sub-storage area is returned in response to the user's operation of reading (checking out) the sub-storage area. For example, the stored content includes the hash key of the latest commit record (commit) of the sub-storage area and the currently stored data storage structure used for data indexing, such as a hash tree structure.
It will be appreciated that, after the content of the sub-storage area is read, the metadata in the sub-storage area is modified, and a new commit record is created for the sub-storage area at each commit. The sub-storage area and the new commit record thus have an association relationship, namely the first association relationship.
The new commit record records the commit information produced after the metadata stored in the sub-storage area is modified. For example, it includes the hash value of the previous commit record of the sub-storage area and the hash value of the root node of the data storage structure used for data indexing that corresponds to the current commit.
It should be noted that, in the embodiment of the present invention, if the commit results from a merge operation, the hash values of multiple parent commit records are also recorded in the new commit record.
As an optional embodiment, after the commit record corresponding to the commit is generated, an association relationship between this commit record and the previous commit record may further be established, where the previous commit record is the commit record that had the first association relationship with the target sub-storage area before the target metadata was modified.
Optionally, each new commit record records the hash value of the previous commit record, which establishes the association relationship between the two. In addition, each new commit record is associated with the root node of a data storage structure. Because the hash value of a commit record is calculated by mixing all of the attributes it contains, the hash value is unique; the hash value of the root node of the corresponding data storage structure used for data indexing is likewise unique and unchangeable. A second association relationship between the commit record and the data storage structure can therefore be established from the correspondence between the commit record and that root node, and this second association relationship is also unique and unchangeable.
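A commit record whose hash is derived from all of its attributes might look like the sketch below. The text does not name a hash function or a field layout, so SHA-256, the length-prefixed encoding, and the `message` field are assumptions made for illustration; the short `root_hash` strings are placeholders rather than real digests.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Commit:
    """A commit record: parent commit hashes plus the hash of its tree root."""
    parents: Tuple[str, ...]   # one parent normally, several for a merge commit
    root_hash: str             # hash key of the root node of the hash tree
    message: str = ""          # hypothetical extra attribute

    @property
    def hash_key(self) -> str:
        # Mix every attribute into one digest; SHA-256 is an assumed choice.
        h = hashlib.sha256()
        for part in (*self.parents, self.root_hash, self.message):
            data = part.encode()
            h.update(len(data).to_bytes(4, "big"))  # length prefix avoids ambiguity
            h.update(data)
        return h.hexdigest()


first = Commit(parents=(), root_hash="icq5", message="initial")
second = Commit(parents=(first.hash_key,), root_hash="8yu1", message="edit")
```

Since `second` embeds the hash of `first`, the chain of commit records can be followed backwards from any commit, and changing any attribute of an earlier record would change every hash after it.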
Optionally, the data storage structure may be used for data indexing; for example, it may be a hash tree structure. Every hash tree structure has a unique root node, and the child nodes of the root node are of two types: tree nodes (TreeNode) and data nodes (DataNode). A tree node represents a branch structure and may contain multiple child tree nodes and multiple data nodes. A data node contains no child nodes, but contains a value (value) for storing metadata.
To store data efficiently, the metadata is encapsulated in a data structure based on the hash tree. A TreeNode corresponds to a node in the hash tree, and each TreeNode contains a hashKey. A TreeNode only organizes the parent-child relationships of the tree structure and does not itself store data. Any TreeNode can contain any number of DataNodes, which have no child nodes but store metadata in their value. The hashKey of a DataNode is calculated by mixing the name and the value of the DataNode; the hashKey of a TreeNode is calculated by mixing all of its child nodes (both TreeNodes and DataNodes) and the name of the TreeNode.
In the embodiment of the present invention, as described above, each tree node TreeNode contains a name and a hash value. As shown in fig. 2, which is a schematic diagram of a hash tree structure, for example, the ROOT node TreeNode0 contains a name (ROOT) and a hash value (hash …).
Optionally, the target tree node is any one of a plurality of tree nodes, and the hash value of the target tree node is calculated from the hash value of a child node included in the target tree node and the name of the target tree node, and therefore is unique. The child nodes comprise at least one data node, and the hash value of each data node is obtained by calculating the name and the key value of the data node.
As also shown in Fig. 2, taking the root node TreeNode0 as the target tree node, the child nodes under TreeNode0 are TreeNode1, TreeNode2, and Data Node0. The child node TreeNode1 contains the data nodes Data Node1 and Data Node2, and the child node TreeNode2 contains the data nodes Data Node3 and Data Node4. For example, the hash value of Data Node3 is calculated from its name Data121 and its value value121; the other data nodes are illustrated in Fig. 2 and are not described in detail.
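The hashing rules for TreeNode and DataNode can be sketched in a few lines. This is an illustrative model only: SHA-256, the `_digest` helper, and sorting child hashes so that child order does not matter are assumptions, and every node name and value other than Data121 and value121 is hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple, Union


def _digest(*parts: str) -> str:
    """Hash a sequence of strings with length prefixes (assumed encoding)."""
    h = hashlib.sha256()
    for p in parts:
        data = p.encode()
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.hexdigest()


@dataclass(frozen=True)
class DataNode:
    name: str
    value: str

    @property
    def hash_key(self) -> str:
        # A data node's hash mixes its name and its value.
        return _digest("data", self.name, self.value)


@dataclass(frozen=True)
class TreeNode:
    name: str
    children: Tuple[Union["TreeNode", DataNode], ...] = ()

    @property
    def hash_key(self) -> str:
        # A tree node's hash mixes its name and the hashes of all its children.
        child_hashes = sorted(c.hash_key for c in self.children)
        return _digest("tree", self.name, *child_hashes)


def build(value3: str) -> TreeNode:
    """Build a tree shaped like the one described for Fig. 2."""
    return TreeNode("ROOT", (
        TreeNode("tree1", (DataNode("data1", "v1"), DataNode("data2", "v2"))),
        TreeNode("tree2", (DataNode("Data121", value3), DataNode("data4", "v4"))),
        DataNode("data0", "v0"),
    ))


before = build("value121")
after = build("value121-edited")
```

Editing one data node changes the hash of its parent and of the root, while the untouched sibling subtree keeps its hash; this is the property the later comparison and sharing steps rely on.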
Through the above processing steps, the metadata is abstracted into a hash tree structure according to the first association relationship between the sub-storage area and the commit record and the second association relationship between the commit record and the data storage structure. Storage, query, and version management of the metadata are thereby integrated, which avoids data consistency problems and greatly improves the storage efficiency of the metadata. The metadata is then stored in a graph database, which greatly reduces network and IO overhead and allows data to be stored and queried within seconds.
For example, the data storage method provided by the embodiment of the present invention may be implemented in the data storage system shown in Fig. 3. In the structural diagram of the data storage system, the storage area (name: configData, id: 679) includes a sub-storage area (name: feature_xxx, lastCommitKey: y1az), and the commit record having the first association relationship with the sub-storage area is the current latest commit record Commit3 (hashKey: y1az …, parent: ere2 …). The commit record preceding Commit3 is Commit2 (hashKey: ere2 …, parent: as1f …), and the commit record preceding Commit2 is Commit1 (hashKey: as1f …, parent: 0).
The commit records Commit1, Commit2 and Commit3 are each associated with a root node TreeNode: Commit1 is associated with TreeNode1 (hashKey: icq5 …), Commit2 with TreeNode2 (hashKey: 8yu1 …), and Commit3 with TreeNode3 (hashKey: u91x …); that is, the second association relationship is established. Finally, the metadata is stored in a graph database according to the first association relationship between the sub-storage area and the commit record and the second association relationship between the commit record and the data storage structure.
The embodiment of the invention combines the storage of metadata with its version management. Metadata in storage can be read and modified efficiently, metadata versions corresponding to different sub-storage areas can be established, and versioning capabilities such as merging and conflict resolution are provided. When the metadata is used later, it can be modified in the different sub-storage areas that have been established and finally merged into a unified baseline version, with conflicts between different versions handled along the way.
As an optional embodiment, the first association relationship between the commit record and the target sub-storage area may be constructed as follows: in response to a read operation on target metadata, determine the target sub-storage area that stores the target metadata; read the target metadata from the target sub-storage area so that it can be modified; after the modified target metadata is committed, generate a commit record corresponding to the commit; and construct the first association relationship between the commit record and the target sub-storage area.
It can be understood that, in the embodiment of the present invention, after the target sub-storage area storing the target metadata is determined and the target metadata is read from it, the target metadata is modified and then committed. A commit record is generated for each commit, so the first association relationship between the commit record and the target sub-storage area can be constructed.
The commit record records the commit information produced after the target metadata stored in the target sub-storage area is modified, for example the hash value of the previous commit record of the target sub-storage area and the hash value of the root node of the data storage structure used for data indexing that corresponds to the current commit.
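The read, modify, and commit cycle described above can be sketched with in-memory dictionaries. Everything here is a simplification for illustration: a flat dictionary stands in for the hash tree, SHA-256 over JSON is an assumed hashing scheme, and the function names and sample metadata are hypothetical.

```python
import hashlib
import json
from typing import Dict, Optional

commits: Dict[str, dict] = {}                            # commit hash -> commit record
branches: Dict[str, Optional[str]] = {"master": None}    # branch -> latest commit hash
snapshots: Dict[str, dict] = {}                          # root hash -> metadata snapshot


def _hash(obj) -> str:
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def read(branch: str) -> dict:
    """Return a working copy of the metadata at the branch's latest commit."""
    head = branches[branch]
    if head is None:
        return {}
    return dict(snapshots[commits[head]["root"]])


def commit(branch: str, metadata: dict) -> str:
    """Store the modified metadata and record a new commit on the branch."""
    root = _hash(metadata)
    snapshots[root] = dict(metadata)
    record = {"parent": branches[branch], "root": root}  # previous commit + root hash
    key = _hash(record)
    commits[key] = record
    branches[branch] = key   # first association: branch -> its latest commit record
    return key


c1 = commit("master", {"owner": "alice"})
working = read("master")
working["owner"] = "bob"
c2 = commit("master", working)
```

After the second commit the branch points at `c2`, `c2` points back at `c1`, and the snapshot referenced by `c1` is still available unchanged.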
After the data storage structure of the metadata is determined, the metadata is finally mapped into a database for persistent storage, and efficient read and commit services are provided. If a traditional relational database were used for storage, the tree-shaped data structure would have to be mapped onto the flat table structure of the relational database, after which both reads and writes require many loop or recursive operations; when the tree is deep, the processing efficiency becomes unacceptable to users.
In computer science, a Graph Database (GDB) is a database that uses graph structures to perform semantic queries and uses nodes, edges, and attributes to represent and store data. The key concept of this graph database is a graph, which directly associates data items in storage with data nodes and sets of edges between nodes that represent relationships. These relationships allow the data in the storage area to be linked together directly and, in many cases, retrieved through one operation.
Because graph databases prioritize the relationships between data, and those relationships are stored persistently in the database itself, querying relationships in a graph database is fast. Graph databases can also display relationships visually, which makes them very useful for highly interconnected data.
To this end, in an optional embodiment, storing the metadata according to the first association relationship between the sub-storage area and the commit record and the second association relationship between the commit record and the data storage structure may be implemented as follows: determine the edges of the graph database according to the first association relationship and the second association relationship; take the storage area, the sub-storage area, the commit record, and the data storage structure as vertices of the graph database; and store the metadata in the graph database according to the edges and vertices.
It can be understood that, in the embodiment of the present invention, the first association relationship between the sub-storage area and the commit record and the second association relationship between the commit record and the data storage structure are both unidirectional, so they can be abstracted as a graph data structure. To modify and query the metadata efficiently, the embodiment of the present invention uses a graph database for the final data storage, with the storage area, the sub-storage area, the commit record, and the data storage structure as vertices, so that the graph database directly expresses the complete data storage structure.
Specifically, each node of the data storage structure can be a vertex of the graph database. With a graph database, metadata can be stored and queried completely in one operation without recursion, which eliminates a large number of network and IO operations and greatly improves data storage efficiency.
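The mapping of storage areas, sub-storage areas, commit records, and tree nodes onto vertices and directed edges can be sketched with a small in-memory graph. This does not use any real graph database API; the vertex identifiers, edge labels, and helper functions are all hypothetical, and a production system would issue the equivalent traversal as a single query to the graph database.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

vertices: Dict[str, dict] = {}
edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)  # src -> [(label, dst)]


def add_vertex(vid: str, kind: str, **props) -> None:
    vertices[vid] = {"kind": kind, **props}


def add_edge(src: str, label: str, dst: str) -> None:
    edges[src].append((label, dst))   # edges are directed: src -> dst


add_vertex("repo:configData", "repository")
add_vertex("branch:feature_xxx", "branch")
add_vertex("commit:3", "commit")
add_vertex("commit:2", "commit")
add_vertex("tree:root3", "tree_node")
add_vertex("data:d0", "data_node", value="v0")

add_edge("repo:configData", "has_branch", "branch:feature_xxx")
add_edge("branch:feature_xxx", "last_commit", "commit:3")   # first association
add_edge("commit:3", "parent", "commit:2")
add_edge("commit:3", "root", "tree:root3")                  # second association
add_edge("tree:root3", "child", "data:d0")


def reachable(start: str, follow: set) -> List[str]:
    """Breadth-first walk along edges whose label is in `follow`."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for label, dst in edges.get(v, []):
            if label in follow and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return order


path = reachable("branch:feature_xxx", {"last_commit", "root", "child"})
```

Here `path` runs from the branch through its latest commit and tree root down to the data node, without visiting the parent commit, because the `parent` label is not in the set of edges being followed.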
For example, when a tree node and all of its child nodes are unchanged, different data storage structures can share that node in storage, which reduces the space occupied. When any tree node in the data storage structure changes, all of its ancestor nodes, including the root node, change as well, so each modification and commit maps to the root node of one data storage structure. When two data storage structures are compared, if a tree node has the same hash value in both, that tree node and all the child nodes it contains are identical; likewise, if the hash values of the root nodes of the two data storage structures are the same, the two data storage structures are identical.
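Subtree sharing and hash-based comparison can be illustrated with nested dictionaries, where a dictionary stands for a tree node and a string for a data node's value. This is a sketch under assumptions: SHA-256 and the exact way names and child hashes are mixed are illustrative choices, and the node names other than Data121 are hypothetical.

```python
import hashlib
from typing import List


def node_hash(name: str, node) -> str:
    """Hash a node from its name and, recursively, from its children or value."""
    h = hashlib.sha256(name.encode())
    if isinstance(node, dict):                 # tree node: mix in child hashes
        for child in sorted(node):
            h.update(node_hash(child, node[child]).encode())
    else:                                      # data node: mix in the value
        h.update(b"\x00" + node.encode())
    return h.hexdigest()


def changed_paths(name: str, old, new, prefix: str = "") -> List[str]:
    """List paths that differ, skipping any subtree whose hash is unchanged."""
    path = f"{prefix}/{name}"
    if node_hash(name, old) == node_hash(name, new):
        return []                              # identical subtree: no descent needed
    if not (isinstance(old, dict) and isinstance(new, dict)):
        return [path]
    out = []
    for child in sorted(set(old) | set(new)):
        if child not in old or child not in new:
            out.append(f"{path}/{child}")      # node added or removed
        else:
            out.extend(changed_paths(child, old[child], new[child], path))
    return out


shared = {"data1": "v1", "data2": "v2"}
v1 = {"tree1": shared, "tree2": {"Data121": "value121"}}
v2 = {"tree1": shared, "tree2": {"Data121": "value121-edited"}}  # tree1 reused, not copied
diff = changed_paths("ROOT", v1, v2)
```

Both versions reference the same `tree1` object, and the comparison reports only the path of the edited data node because the unchanged subtree is pruned as soon as its hashes match.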
Optionally, if every read and commit of a metadata version happens sequentially, no conflict occurs. However, if several copies of the same metadata version are read and modified differently, and the modified copies are committed one after another, the later commit may conflict once the earlier commit has completed. Such a conflict can be understood as a merge conflict.
As an optional embodiment, as shown in Fig. 4, the data storage method in the embodiment of the present invention may further include the following steps:
401. in response to a merge operation on at least two sub-storage areas, determining at least two commit records having the first association relationship with the at least two sub-storage areas;
402. determining, according to the at least two commit records, whether modification operations were performed on the same target tree node in the data storage structure in the same time period with inconsistent modification results;
403. if yes, outputting conflict prompt information indicating that a merge conflict has occurred between the sub-storage areas, together with the specific conflicting content;
404. if not, merging the at least two sub-storage areas to obtain a new sub-storage area.
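One way to make the check in step 402 concrete is a three-way comparison against a common base version, sketched below. The sketch makes several assumptions: each hash tree is flattened into a mapping from node path to node hash, the "same time period" condition is reduced to "both sides changed the node relative to the base", and all function names and sample hashes are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Snapshot = Dict[str, str]   # node path -> node hash (flattened hash tree)


def diff(base: Snapshot, side: Snapshot) -> Dict[str, Optional[str]]:
    """Paths whose hash differs from the base; None marks a deletion."""
    keys = set(base) | set(side)
    return {k: side.get(k) for k in keys if base.get(k) != side.get(k)}


def merge(base: Snapshot, a: Snapshot, b: Snapshot) -> Tuple[Snapshot, List[str]]:
    diff_a, diff_b = diff(base, a), diff(base, b)
    # A conflict is the same node changed on both sides with different results.
    conflicts = sorted(k for k in diff_a.keys() & diff_b.keys()
                       if diff_a[k] != diff_b[k])
    if conflicts:
        return {}, conflicts          # step 403: report, leave resolution to the user
    merged = dict(base)
    for change in (diff_a, diff_b):
        for path, value in change.items():
            if value is None:
                merged.pop(path, None)
            else:
                merged[path] = value
    return merged, []                 # step 404: merged result


base = {"/ROOT/a": "h1", "/ROOT/b": "h2"}
side_a = {"/ROOT/a": "h1x", "/ROOT/b": "h2"}
side_b_clean = {"/ROOT/a": "h1", "/ROOT/b": "h2y"}
side_b_conflict = {"/ROOT/a": "h1z", "/ROOT/b": "h2"}

clean, no_conflicts = merge(base, side_a, side_b_clean)
_, conflicts = merge(base, side_a, side_b_conflict)
```

When the two sides touch different nodes the changes are combined; when both change `/ROOT/a` to different hashes, the path is returned as a conflict and nothing is merged automatically.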
As described above, if multiple copies of metadata of one version are read and modified differently, and the modified metadata is submitted successively, the essence of the metadata is to be understood as a merge operation of two sub-storage regions corresponding to the modified metadata, and after the previous submission is completed, a conflict may occur in the subsequent submission, which may be understood as a merge conflict.
As shown in FIG. 5, for the same Branch (name: master, lastCommitKey (lastCommitHashKey): y1az …), two copies of the data are read at the same time and edited, producing Commits with CommitHashKey yuxz … and 7yz3 …. Branch A, corresponding to CommitHashKey: yuxz …, is submitted successfully first, so the lastCommitHashKey of Branch (name: master) is updated to u7ec …. When Branch B, corresponding to hashKey 7yz3 …, is submitted, the lastCommitHashKey 7yz3 … corresponding to that Commit is found to differ from the lastCommitHashKey u7ec … of the current Branch. The nearest common ancestor (LCA: lowest common ancestor) Commit (HashKey: y1az …, parent: ere2) of the two is then found, and the hash values of the target tree nodes associated with the two submission records are each compared with the hash values of the target tree nodes associated with the common ancestor record; that is, a three-way comparison is performed on the respectively associated treeNodes to obtain diffA and diffB. If diffA and diffB show that the same modification operation was performed on the same target tree node in the data storage structure within the same time period and the modification results are inconsistent, a conflict is considered to have occurred and must be handled manually, and conflict prompt information is output to indicate that a merge conflict has occurred among the plurality of sub-storage areas and what the specific conflicting content is; if not, the at least two sub-storage areas are merged to obtain a new sub-storage area.
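The search for the nearest common ancestor and the computation of diffA and diffB can be sketched as below. The sketch assumes that every commit has at most one parent and that the tree nodes of each commit are available as a mapping from path to hash value; the `Commit` class, the function names and the keys `c0`, `cA`, `cB`, `cC` are hypothetical and do not correspond to the hash keys shown in FIG. 5.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Commit:
    key: str
    parent: Optional[str]        # key of the previous commit, None for the first
    node_hashes: Dict[str, str]  # tree-node path -> hash value (assumed layout)


def common_ancestor(commits: Dict[str, Commit], a: str, b: str) -> Optional[str]:
    """Walk a's parent chain, then return the first commit on b's chain found in it."""
    seen = set()
    cur: Optional[str] = a
    while cur is not None:
        seen.add(cur)
        cur = commits[cur].parent
    cur = b
    while cur is not None:
        if cur in seen:
            return cur
        cur = commits[cur].parent
    return None


def diff(base: Dict[str, str], changed: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Paths whose hash differs from the base; None marks a removed node."""
    paths = base.keys() | changed.keys()
    return {p: changed.get(p) for p in paths if base.get(p) != changed.get(p)}


def conflicting_paths(commits: Dict[str, Commit], a: str, b: str) -> List[str]:
    base_key = common_ancestor(commits, a, b)
    base = commits[base_key].node_hashes if base_key is not None else {}
    diff_a = diff(base, commits[a].node_hashes)
    diff_b = diff(base, commits[b].node_hashes)
    # Conflict: both sides changed the same node and ended up with different hashes.
    return sorted(p for p in diff_a.keys() & diff_b.keys() if diff_a[p] != diff_b[p])


commits = {
    "c0": Commit("c0", None, {"table/owner": "h1", "table/cols": "h2"}),
    "cA": Commit("cA", "c0", {"table/owner": "h3", "table/cols": "h2"}),
    "cB": Commit("cB", "c0", {"table/owner": "h4", "table/cols": "h2"}),
    "cC": Commit("cC", "c0", {"table/owner": "h1", "table/cols": "h5"}),
}
print(conflicting_paths(commits, "cA", "cB"))  # both edited table/owner differently
print(conflicting_paths(commits, "cA", "cC"))  # edits touch different nodes
```

Comparing hash values rather than node contents means that an unchanged subtree is skipped with a single comparison at its root.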
Optionally, after the conflict prompt information is output to the user, the user decides on the merged content, attaches the merged data, and submits again, which resolves the merge conflict.
According to the embodiment of the invention, different sub-storage areas are established to update the metadata in a versioned manner; each submission stores only the modified data content rather than the full data, the history of operations is retained in full, and conflicts among multiple sub-storage areas can be resolved.
As can be seen from the above description, the embodiments of the present invention integrate version management with fast storage and query of metadata, which avoids the consistency and efficiency problems that arise when version management is separated from the actual storage; meanwhile, structured storage makes it easier for a user to resolve conflicts that may occur when metadata is modified in multiple sub-storage areas. In addition, the data structure is abstracted as TreeData and a graph database is used as the actual storage; compared with a traditional relational database, the complete data structure can be read and stored in a single pass without repeated recursive operations, which greatly reduces network and I/O overhead and greatly improves data processing efficiency.
The data storage device of one or more embodiments of the present invention will be described in detail below. Those skilled in the art will appreciate that these devices can each be constructed from commercially available hardware components configured to perform the steps taught in this disclosure.
Fig. 6 is a schematic structural diagram of a data storage device according to an embodiment of the present invention. As shown in fig. 6, the data storage device includes: a first obtaining module 11, a second obtaining module 12, a relationship construction module 13 and a data storage module 14.
The first obtaining module 11 is configured to obtain a storage area of metadata, where the storage area includes multiple sub-storage areas, different sub-storage areas are isolated from each other, and different sub-storage areas are used to store metadata of different versions;
a second obtaining module 12, configured to obtain multiple submission records having a first association relationship and corresponding to the multiple sub-storage areas, where the submission records are used to record submission information after the metadata stored in the sub-storage areas are modified;
a relationship construction module 13, configured to construct a second association relationship between the submission record and the data storage structure;
a data storage module 14, configured to store the metadata according to a first association relationship between the sub storage area and the commit record and a second association relationship between the commit record and the data storage structure.
Optionally, the first obtaining module 11 is specifically configured to: acquiring the data type of the metadata; determining the storage area corresponding to the data type.
Optionally, the apparatus further comprises: a first determining module, configured to determine, in response to a read operation on target metadata, a target sub-storage area for storing the target metadata; a modification module, configured to read the target metadata from the target sub-storage area to modify the target metadata; a generation module, configured to generate the submission record corresponding to the submission after submitting the modified target metadata; a relationship construction module, configured to construct the first association relationship between the submission record and the target sub-storage area.
Optionally, the relationship construction module is further configured to construct a submission-record association relationship between the submission record and the previous submission record, where the previous submission record is the submission record that had the first association relationship with the target sub-storage area before the target metadata was modified.
Optionally, the data storage structure comprises: a plurality of tree nodes, where each tree node comprises a name and a hash value; the hash value of a target tree node is calculated from the hash values of the child nodes contained under the target tree node and the name of the target tree node; the child nodes comprise at least one data node; the hash value of each data node is calculated from the name of the data node and a key value, the key value being used to store metadata; and the target tree node is any one of the plurality of tree nodes.
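The way a new submission record is linked both to its sub-storage area and to the previous submission record can be sketched as follows. `CommitRecord`, `SubStorageArea`, `Store` and the way the record key is derived are assumptions made for this example only.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CommitRecord:
    key: str
    parent: Optional[str]  # association to the previous submission record
    root_hash: str         # second association: root of the data storage structure
    message: str


@dataclass
class SubStorageArea:
    name: str
    last_commit_key: Optional[str] = None  # first association


class Store:
    def __init__(self) -> None:
        self.commits: Dict[str, CommitRecord] = {}
        self.areas: Dict[str, SubStorageArea] = {}

    def commit(self, area_name: str, root_hash: str, message: str) -> str:
        """Create a submission record and make it the area's latest record."""
        area = self.areas.setdefault(area_name, SubStorageArea(area_name))
        parent = area.last_commit_key
        raw = f"{parent}|{root_hash}|{message}".encode("utf-8")
        key = hashlib.sha256(raw).hexdigest()[:12]  # assumed key scheme
        self.commits[key] = CommitRecord(key, parent, root_hash, message)
        area.last_commit_key = key
        return key

    def history(self, area_name: str) -> List[str]:
        """Follow parent links from the latest record back to the first one."""
        out: List[str] = []
        cur = self.areas[area_name].last_commit_key
        while cur is not None:
            out.append(cur)
            cur = self.commits[cur].parent
        return out


store = Store()
k1 = store.commit("master", "root-hash-1", "create table metadata")
k2 = store.commit("master", "root-hash-2", "change owner")
print(store.history("master") == [k2, k1])
```

Keeping only a parent link on each record is enough to reconstruct the full operation history of a sub-storage area.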
Optionally, the data storage module 14 is specifically configured to: determine edges in a graph database according to the first association relationship between the sub-storage area and the submission record and the second association relationship between the submission record and the data storage structure; take the storage area, the sub-storage area, the submission record and the data storage structure respectively as vertices in the graph database; and store the metadata using the graph database according to the edges and the vertices.
Optionally, the apparatus further comprises: a second determining module, configured to determine, in response to a merge operation on at least two of the sub-storage areas, at least two of the submission records that have the first association relationship with the at least two sub-storage areas, and to determine, according to the at least two submission records, whether the same modification operation was performed on the same target tree node in the data storage structure within the same time period with inconsistent modification results; a prompt module, configured to output, if so, conflict prompt information indicating that a merge conflict has occurred among the plurality of sub-storage areas and what the specific conflicting content is; and a merging module, configured to merge, if not, the at least two sub-storage areas to obtain a new sub-storage area.
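The vertex and edge layout can be illustrated with a small in-memory graph. This is not the API of any real graph database: `TinyGraph`, the vertex identifiers and the edge labels (`has_branch`, `last_commit`, `tree`) are hypothetical, and the edge from the storage area to the sub-storage area is an assumption added so that the traversal can start at the storage area.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class TinyGraph:
    """A minimal labelled property graph kept in memory (illustration only)."""

    def __init__(self) -> None:
        self.vertices: Dict[str, dict] = {}
        self.out_edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add_vertex(self, vid: str, kind: str, **props: str) -> None:
        self.vertices[vid] = {"kind": kind, **props}

    def add_edge(self, src: str, label: str, dst: str) -> None:
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        self.out_edges[src].append((label, dst))

    def follow(self, start: str, *labels: str) -> List[str]:
        """Follow one edge per label, in order, and return the reached vertices."""
        frontier = [start]
        for label in labels:
            frontier = [dst for v in frontier
                        for (lbl, dst) in self.out_edges[v] if lbl == label]
        return frontier


g = TinyGraph()
g.add_vertex("area:table", "storage_area")
g.add_vertex("branch:master", "sub_storage_area")
g.add_vertex("commit:c1", "commit_record")
g.add_vertex("tree:root1", "tree_node", name="table_a")
g.add_edge("area:table", "has_branch", "branch:master")  # assumed edge
g.add_edge("branch:master", "last_commit", "commit:c1")  # first association
g.add_edge("commit:c1", "tree", "tree:root1")            # second association

print(g.follow("area:table", "has_branch", "last_commit", "tree"))
```

A single traversal from the storage area reaches the root of the data storage structure, which is the kind of one-pass read that a graph model makes natural.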
Optionally, the second determining module is specifically configured to: determining a common ancestor record of at least two of the commit records;
and comparing the hash values of the target tree nodes associated with at least two submitted records with the hash values of the target tree nodes associated with the common ancestor record respectively to determine whether the same modification operation is performed on the same target tree node in the data storage structure within the same time period and the modification results are inconsistent.
In one possible design, the structure of the data storage device shown in fig. 6 may be implemented as an electronic device. As shown in fig. 7, the electronic device may include: a processor 21, a memory 22, and a communication interface 23. The memory 22 stores executable code which, when executed by the processor 21, causes the processor 21 to implement at least the data storage method provided in the foregoing embodiments.
In addition, an embodiment of the present invention provides a non-transitory machine-readable storage medium having stored thereon executable code, which, when executed by a processor of an electronic device, causes the processor to implement at least the data storage method as provided in the foregoing embodiments.
The above described embodiments of the apparatus are merely illustrative, wherein the network elements illustrated as separate components may or may not be physically separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by means of a necessary general-purpose hardware platform, or by a combination of hardware and software. With this understanding, the above technical solutions, in essence or in the part that contributes to the prior art, may be embodied in the form of a computer program product, which may be implemented on one or more computer-usable storage media containing computer-usable program code, including but not limited to disk storage, CD-ROM, optical storage, and the like.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (11)

1. A method of storing data, comprising:
acquiring a storage area of metadata, wherein the storage area comprises a plurality of sub-storage areas, different sub-storage areas are isolated from each other, and different sub-storage areas are used for storing metadata of different versions;
acquiring a plurality of submission records that correspond to the plurality of sub-storage areas and have a first association relationship, wherein the submission records are used for recording submission information after the metadata stored in the sub-storage areas is modified;
constructing a second association relationship between the submission record and a data storage structure;
and storing the metadata according to the first association relationship between the sub-storage area and the submission record and the second association relationship between the submission record and the data storage structure.
2. The method of claim 1, wherein obtaining the storage area of the metadata comprises:
acquiring the data type of the metadata;
determining the storage area corresponding to the data type.
3. The method of claim 1, further comprising:
in response to a read operation on target metadata, determining a target sub-storage area for storing the target metadata;
reading the target metadata from the target sub-storage area to modify the target metadata;
after submitting the modified target metadata, generating the submission record corresponding to the submission;
and constructing the first association relation between the submission record and the target sub-storage area.
4. The method of claim 3, wherein after generating the commit record corresponding to the commit, the method further comprises:
and constructing a submission-record association relationship between the submission record and the previous submission record, wherein the previous submission record is the submission record that had the first association relationship with the target sub-storage area before the target metadata was modified.
5. The method of claim 1, wherein the data storage structure comprises:
a plurality of tree nodes, wherein each tree node comprises a name and a hash value; the hash value of a target tree node is calculated from the hash values of the child nodes contained under the target tree node and the name of the target tree node; the child nodes comprise at least one data node; the hash value of each data node is calculated from the name of the data node and a key value, the key value being used to store metadata; and the target tree node is any one of the plurality of tree nodes.
6. The method of claim 1, wherein storing the metadata according to the first association relationship between the sub-storage area and the submission record and the second association relationship between the submission record and the data storage structure comprises:
determining edges in a graph database according to the first association relationship between the sub-storage area and the submission record and the second association relationship between the submission record and the data storage structure;
respectively using the storage area, the sub-storage area, the submission record and the data storage structure as vertices in the graph database;
storing the metadata according to the edges and the vertices using the graph database.
7. The method of claim 1, further comprising:
in response to the merging operation of at least two of the sub-storage areas, determining at least two of the commit records having the first association relationship with at least two of the sub-storage areas;
determining whether the same modification operation is executed on the same target tree node in the data storage structure in the same time period and modification results are inconsistent according to at least two submission records;
if so, outputting conflict prompt information to indicate that a merge conflict has occurred among the plurality of sub-storage areas and what the specific conflicting content is;
and if not, merging the at least two sub-storage areas to obtain a new sub-storage area.
8. The method of claim 7, wherein determining whether the same modification operation has been performed on the same target tree node in the data storage structure in the same time period and the modification results are inconsistent according to at least two of the commit records comprises:
determining a common ancestor record of at least two of the commit records;
and comparing the hash values of the target tree nodes associated with at least two submitted records with the hash values of the target tree nodes associated with the common ancestor record respectively to determine whether the same modification operation is performed on the same target tree node in the data storage structure within the same time period and the modification results are inconsistent.
9. A data storage device, comprising:
a first obtaining module, configured to obtain a storage area of metadata, wherein the storage area comprises a plurality of sub-storage areas, different sub-storage areas are isolated from each other, and different sub-storage areas are used for storing metadata of different versions;
a second obtaining module, configured to obtain a plurality of submission records that correspond to the plurality of sub-storage areas and have a first association relationship, wherein the submission records are used to record submission information after the metadata stored in the sub-storage areas is modified;
a relationship construction module, configured to construct a second association relationship between the submission record and a data storage structure;
and a data storage module, configured to store the metadata according to the first association relationship between the sub-storage area and the submission record and the second association relationship between the submission record and the data storage structure.
10. An electronic device, comprising: a memory, a processor, a communication interface; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to perform the data storage method of any one of claims 1 to 8.
11. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the data storage method of any one of claims 1 to 8.
CN202210435936.1A 2022-04-24 2022-04-24 Data storage method, device, equipment and storage medium Pending CN114942963A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210435936.1A CN114942963A (en) 2022-04-24 2022-04-24 Data storage method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210435936.1A CN114942963A (en) 2022-04-24 2022-04-24 Data storage method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114942963A true CN114942963A (en) 2022-08-26

Family

ID=82907318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210435936.1A Pending CN114942963A (en) 2022-04-24 2022-04-24 Data storage method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114942963A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination