CN117435776B - Metadata storage and query method, device, computer equipment and storage medium - Google Patents

Publication number
CN117435776B
Authority
CN
China
Prior art keywords
metadata
information
target
key
target key
Prior art date
Legal status
Active
Application number
CN202311755927.1A
Other languages
Chinese (zh)
Other versions
CN117435776A (en)
Inventor
王淏舟
杨峻峰
赵园
韩冰
秦轶群
郭罡
冯雷
Current Assignee
Hangzhou Tuoshupai Technology Development Co ltd
Original Assignee
Hangzhou Tuoshupai Technology Development Co ltd
Application filed by Hangzhou Tuoshupai Technology Development Co ltd
Priority to CN202311755927.1A
Publication of CN117435776A
Application granted
Publication of CN117435776B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a metadata storage and query method, a device, a computer device and a storage medium. The metadata storage method comprises: constructing a Huffman coding tree corresponding to each metadata block based on the identification information in each metadata block; generating a target key corresponding to each coding path in the Huffman coding tree, and determining a target disk page corresponding to the target key; and taking the original data information corresponding to each coding path as a correlation value, and storing the target key and the corresponding correlation value in association in the target disk page. The application addresses the high data storage cost caused by inefficient use of storage space and thereby reduces the data storage cost.

Description

Metadata storage and query method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of information processing technologies, and in particular, to a metadata storage and query method, a metadata storage and query device, a computer device, and a storage medium.
Background
Metadata provides attribute and characteristic information about data, including its structure, format and content, and supports operations such as data query and data quality management. Metadata is also key data in a database: if it is damaged, the database goes out of service and cannot be restored. Metadata therefore needs to be stored securely.
Current metadata storage methods determine the type of the metadata to be stored, process it with a storage format chosen according to that type to obtain new metadata, and store the new metadata in a preset metadata storage system. However, such methods cannot use storage space efficiently, which results in high data storage cost.
No effective solution has yet been proposed for this problem in the related art, namely that storage space cannot be used efficiently and the data storage cost is therefore high.
Disclosure of Invention
In this embodiment, a metadata storage and query method, apparatus, computer device, and storage medium are provided to solve the problem that in the related art, the storage space cannot be efficiently utilized, resulting in higher data storage cost.
In a first aspect, in this embodiment, there is provided a metadata storage method, including:
constructing a Huffman coding tree corresponding to each metadata block based on the identification information in each metadata block;
generating a target key corresponding to each coding path in the Huffman coding tree, and determining a target disk page corresponding to the target key;
and taking the original data information corresponding to each coding path as a correlation value, and storing the target key and the corresponding correlation value in association in the target disk page.
In some embodiments, constructing the Huffman coding tree corresponding to each metadata block based on the identification information in each metadata block includes:
acquiring the identification information in each metadata block, the identification information comprising a metadata identifier, a database identifier, a view identifier and a domain identifier;
encoding the identification information in each metadata block to obtain the corresponding Huffman coding tree;
wherein each level of the Huffman coding tree corresponds to a different category of identification information.
In some embodiments, generating the target key corresponding to each coding path in the Huffman coding tree includes:
acquiring each coding path in the Huffman coding tree;
acquiring data characteristic information corresponding to the coding path, the data characteristic information comprising version information and a data area feature code;
and determining the code of each node in the coding path, and generating the corresponding target key based on the code of each node and the data characteristic information.
In some embodiments, after storing the target key and the corresponding correlation value in association in the target disk page, the method further includes:
when it is detected that the target keys of the metadata blocks in the target disk page contain an identical code segment, generating a tag value corresponding to the identical code segment, the tag value being associated with a reference address of the identical code segment;
updating the identical code segment in the metadata blocks to the tag value.
In some embodiments, after storing the target key and the corresponding correlation value in association in the target disk page, the method further includes:
generating a query key corresponding to a new metadata block when the new metadata block is received;
acquiring code tree information corresponding to the query key, and encoding the query key based on the code tree information to obtain a target key;
taking the original data information in the new metadata block as a correlation value;
and determining a data node corresponding to the target key, and storing the target key and the corresponding correlation value in association in the data node.
In a second aspect, in this embodiment, there is provided a metadata query method, including:
generating a corresponding query key according to received user demand data, and acquiring code tree information corresponding to the query key;
encoding the query key based on the code tree information to obtain a target key;
determining a target disk page corresponding to the target key, and acquiring a correlation value corresponding to the target key from the target disk page, the correlation value storing the original data information of the metadata block.
In some embodiments, acquiring the code tree information corresponding to the query key includes:
searching a code tree buffer for the code tree information corresponding to the query key;
and extracting the code tree information from a metadata index node when the code tree information is not found in the code tree buffer.
In a third aspect, in this embodiment, there is provided a metadata storage apparatus, including a construction module, a generation module and a storage module;
the construction module is used for constructing a Huffman coding tree corresponding to each metadata block based on the identification information in each metadata block;
the generation module is used for generating a target key corresponding to each coding path in the Huffman coding tree and determining a target disk page corresponding to the target key;
and the storage module is used for taking the original data information corresponding to each coding path as a correlation value, and storing the target key and the corresponding correlation value in association in the target disk page.
In a fourth aspect, in this embodiment, there is provided a computer device including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the metadata storage method of the first aspect when executing the computer program.
In a fifth aspect, in this embodiment, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the metadata storage method of the first aspect described above.
Compared with the related art, the metadata storage and query method, device, computer equipment and storage medium provided in this embodiment construct a Huffman coding tree corresponding to each metadata block based on the identification information in each metadata block; generate a target key corresponding to each coding path in the Huffman coding tree, and determine a target disk page corresponding to the target key; and take the original data information corresponding to each coding path as a correlation value, storing the target key and the corresponding correlation value in association in the target disk page. This addresses the problem that storage space cannot be used efficiently and the data storage cost is therefore high, and reduces the data storage cost.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a block diagram of a hardware structure of a terminal device of a metadata storage method according to an embodiment of the present application;
FIG. 2 is a flow chart of a metadata storage method according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a Huffman coding tree according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a child node interconnect structure according to an embodiment of the present application;
FIG. 5 is a flowchart of a metadata query method according to an embodiment of the present application;
FIG. 6 is a flow chart of a metadata storage method provided by a preferred embodiment of the present application;
FIG. 7 is a block diagram of a metadata query system according to an embodiment of the present application;
FIG. 8 is a block diagram of a metadata storage device according to one embodiment of the present application;
FIG. 9 is a block diagram of a metadata query device according to an embodiment of the present application.
In the figures: 102, processor; 104, memory; 106, transmission device; 108, input-output device; 10, construction module; 20, generation module; 30, storage module; 40, acquisition module; 50, coding module; 60, query module; 100, control module; 200, input module; 300, encoder; 400, code tree buffer; 500, metadata index node; 600, processor; 700, metadata storage node.
Detailed Description
The present application will be described and illustrated with reference to the accompanying drawings and examples for a clearer understanding of the objects, technical solutions and advantages of the present application.
Unless defined otherwise, technical or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," "these" and similar terms in this application do not limit quantity and may be singular or plural. The terms "comprising," "including," "having," and any variations thereof, as used herein, are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the listed steps or modules (units), but may include other steps or modules (units) not listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this disclosure are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein means two or more. "And/or" describes an association between associated objects and covers three relationships; for example, "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone. Typically, the character "/" indicates an "or" relationship between the associated objects. The terms "first," "second," "third," and the like, as referred to in this disclosure, merely distinguish similar objects and do not represent a particular ordering of the objects.
The method embodiments provided in the present embodiment may be executed in a terminal, a computer, or a similar computing device. Taking execution on a terminal as an example, FIG. 1 is a block diagram of the hardware structure of the terminal of the metadata storage method of the present embodiment. As shown in FIG. 1, the terminal may include one or more (only one is shown in FIG. 1) processors 102 and a memory 104 for storing data, wherein the processors 102 may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA). The terminal may also include a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the structure shown in FIG. 1 is merely illustrative and is not intended to limit the structure of the terminal. For example, the terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store a computer program, for example, a software program of application software and a module, such as a computer program corresponding to a metadata storage method in the present embodiment, and the processor 102 executes the computer program stored in the memory 104 to perform various functional applications and data processing, that is, to implement the above-described method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. The network includes a wireless network provided by a communication provider of the terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as a NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
In this embodiment, a metadata storage method is provided, fig. 2 is a flowchart of the metadata storage method of this embodiment, and as shown in fig. 2, the flowchart includes the following steps:
Step S210, constructing a Huffman coding tree corresponding to each metadata block based on the identification information in each metadata block.
It should be appreciated that in a key-value (KV) distributed storage system, each key corresponds to a unique value. When metadata is stored in KV form, each metadata block is converted into corresponding KV data: the correlation value (Value) stores the original data information of the metadata block, and the target key (Key) is a metadata feature value used to look up and query the corresponding data.
Specifically, the identification information in each metadata block is acquired. It comprises four data areas, namely the metadata identifier, the database identifier, the view identifier and the domain identifier. The identification information in each metadata block is then encoded to obtain the corresponding Huffman coding tree.
The Huffman coding tree is a coding tree structure for data compression. Each level of the Huffman coding tree corresponds to a different category of identification information; that is, each level is only responsible for encoding identification information of the same category.
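To make the compression idea concrete, the following is a minimal sketch of classical Huffman code construction applied to the identifiers of a single tree level. The text above does not specify how code words are assigned, so the frequency-based binary codes, the function name `huffman_codes` and the sample identifiers `db1` to `db3` are assumptions for illustration only.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(symbols):
    """Return a prefix-free binary code for each distinct symbol (classical Huffman)."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs a non-empty code word.
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(n, next(tie), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


# Hypothetical database identifiers observed on one level of the tree.
db_ids = ["db1"] * 5 + ["db2"] * 2 + ["db3"]
codes = huffman_codes(db_ids)
```

In this sketch the most frequent identifier receives the shortest code word, which is the property that lets a Huffman-coded key occupy less space than the raw identifiers.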
Step S220, generating a target key corresponding to each coding path in the Huffman coding tree, and determining a target disk page corresponding to the target key.
Specifically, each level of the huffman coding tree is traversed to obtain each coding path in the huffman coding tree. For each coding path, data characteristic information corresponding to the coding path is acquired, wherein the data characteristic information comprises version information and data area characteristic codes.
Further, the code of each node in the code path is extracted, the corresponding target key is generated based on the code of each node, the associated version information and the data area feature code, and the target disk page corresponding to the target key is determined.
It should be noted that, in this embodiment, the encoding operation is only performed on the identification information in the metadata block, and the version information and the data area feature code of the metadata block remain unchanged in the original form.
Step S230, the original data information corresponding to each coding path is used as a correlation value, and the target key and the corresponding correlation value are associated and stored in the target disk page.
Specifically, each encoding path corresponds to a different metadata block, so after generating a target key corresponding to the encoding path, the original data information in the associated metadata block is used as a correlation value, and the target key and the corresponding correlation value are stored in association with a target disk page, so that the metadata block is stored in order. In addition, when the metadata is queried, the query can be completed only by acquiring the corresponding disk page position, so that scanning of all disk pages is avoided, and the subsequent data query efficiency is improved.
It should be noted that the target disk page corresponds to the data node, and all metadata are segmented according to the target key and stored in different data nodes, so as to support load balancing of the database system. In addition, all leaf nodes in the Huffman coding tree store the corresponding data node position information, thus further improving the metadata inquiry performance.
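The text does not state how a target key is mapped to a disk page or data node. As a sketch under that caveat, the example below assumes a stable hash of the target key selects the data node; `pick_data_node`, `PageStore` and the hashing scheme are hypothetical and stand in for whatever placement rule an implementation actually uses (for instance the location information held in the leaf nodes).

```python
import hashlib


def pick_data_node(target_key, node_count):
    """Map a target key to a data node index with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(target_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


class PageStore:
    """Toy in-memory stand-in for disk pages spread over data nodes."""

    def __init__(self, node_count):
        self.pages = [dict() for _ in range(node_count)]

    def put(self, target_key, value):
        """Store the correlation value under its target key; return the node used."""
        node = pick_data_node(target_key, len(self.pages))
        self.pages[node][target_key] = value
        return node

    def get(self, target_key):
        """Read only the page the key maps to, without scanning other pages."""
        node = pick_data_node(target_key, len(self.pages))
        return self.pages[node].get(target_key)


store = PageStore(node_count=4)
node = store.put("A-A-AA-AAA-010", b"raw metadata block")
```

Because the same key always maps to the same node, a lookup touches a single page, which mirrors the point above that a query only needs the corresponding disk page position.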
Compared with the prior art, this embodiment performs Huffman coding on each metadata block to be stored to obtain a corresponding Huffman coding tree, and generates a target key corresponding to each coding path in the Huffman coding tree. On this basis, the original data information corresponding to each coding path is used as a correlation value, the target disk page corresponding to the target key is determined, and the target key and the corresponding correlation value are stored in association in the target disk page. Converting the data information that forms the target key into Huffman codes reduces the storage space required by each target key, which addresses the problem that storage space cannot be used efficiently and reduces the data storage cost.
In some embodiments, constructing the Huffman coding tree corresponding to each metadata block based on the identification information in each metadata block comprises the following steps:
Step S211, acquiring the identification information in each metadata block; the identification information comprises a metadata identifier, a database identifier, a view identifier and a domain identifier;
Step S212, encoding the identification information in each metadata block to obtain the corresponding Huffman coding tree; each level of the Huffman coding tree corresponds to a different category of identification information.
Specifically, each metadata block to be stored is composed of several items of identification information, version information and the corresponding original data information. The identification information comprises the metadata identifier, the database identifier, the view identifier and the domain identifier, which respectively identify the metadata block itself, the database within the database system in which the metadata block is located, the view associated with the metadata block, and the domain to which the metadata block belongs.
Further, each metadata block to be stored is scanned, and the identification information contained in each metadata block is encoded to generate the corresponding Huffman coding tree. Taking FIG. 3 as an example of the encoding process of a metadata block: first, the metadata identifier of the metadata block to be stored is acquired and encoded as A, which serves as the root node of the coding tree; next, the database identifier of the metadata block is acquired and encoded as a child node under node A; the view identifier and the domain identifier of the metadata block are then acquired in turn, and each is encoded as a child node under the corresponding node of the level above.
According to this embodiment, the identification information in each metadata block is acquired, comprising the metadata identifier, database identifier, view identifier and domain identifier, and is encoded to obtain the corresponding Huffman coding tree, in which each level corresponds to a different category of identification information. The Huffman coding tree is thereby optimized, its number of levels is effectively reduced, and the encoding and decoding of metadata blocks is improved.
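The level-per-category structure can be sketched as a four-level tree in which each node carries a short code. The actual code assignment (such as A, AA, AAA in FIG. 3) is not specified in the text, so this sketch assumes a node's code is simply its ordinal among its siblings; `Node`, `CodingTree`, the field names in `LEVELS` and the second sample block are hypothetical.

```python
from dataclasses import dataclass, field

# One tree level per category of identification information.
LEVELS = ("metadata_id", "database_id", "view_id", "domain_id")


@dataclass
class Node:
    code: str                                     # short code stored at this node
    children: dict = field(default_factory=dict)  # identifier -> Node


class CodingTree:
    """Four-level tree: level i only encodes identifiers of category LEVELS[i]."""

    def __init__(self):
        self.roots = {}  # metadata identifier -> root Node

    def insert(self, block):
        """Add a block's identifiers and return the codes along its coding path."""
        siblings, path = self.roots, []
        for level in LEVELS:
            ident = block[level]
            if ident not in siblings:
                # Assumption: a new node's code is its ordinal among its siblings.
                siblings[ident] = Node(code=format(len(siblings), "x"))
            node = siblings[ident]
            path.append(node.code)
            siblings = node.children
        return path


tree = CodingTree()
block_a = {
    "metadata_id": "314",
    "database_id": "00100000560000",
    "view_id": "000000000787899",
    "domain_id": "00000005768000",
}
block_b = dict(block_a, domain_id="00000005768001")  # hypothetical sibling domain
path_a = tree.insert(block_a)
path_b = tree.insert(block_b)
```

Blocks that share a metadata identifier, database and view reuse the same upper nodes, so only the last code in the path differs between `path_a` and `path_b`.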
In some embodiments, generating a target key corresponding to each encoding path in the huffman coding tree includes the steps of:
Step S221, each coding path in the Huffman coding tree is obtained;
Step S222, obtaining data characteristic information corresponding to the coding path; the data characteristic information comprises version information and data area characteristic codes;
Step S223, determining the codes of each node in the coding path, and generating a corresponding target key based on the codes of each node and the data characteristic information.
Specifically, each level of the Huffman coding tree is traversed to obtain each coding path in the Huffman coding tree. As shown in FIG. 3, "A-B-AB-AAA", "A-B-AB-AAB", and "A-B-AB-AAC" are three different coding paths, and the codes in each coding path correspond to the different items of identification information in the same metadata block.
Further, data characteristic information corresponding to the encoding path is acquired, the data characteristic information including version information of the metadata block and the data area characteristic encoding. The version information comprises information such as a version number or a timestamp of the metadata block; and the data area feature code is a code for verifying the integrity and consistency of the metadata block.
On this basis, when generating the target key of each metadata block, the first data area of the current metadata block, namely the metadata identifier, is acquired, the root node corresponding to the metadata identifier is looked up in the Huffman coding tree, and the corresponding code is extracted from the root node. Next, the second data area of the current metadata block, namely the database identifier, is acquired, the child nodes under the root node are traversed to determine the child node corresponding to the database identifier, and the corresponding code is extracted from that child node. The same operation is then performed in turn on the third data area, namely the view identifier, and the fourth data area, namely the domain identifier, to obtain the codes corresponding to the view identifier and the domain identifier.
After the codes of each data area in the metadata block are acquired, the acquired codes and the data characteristic information are combined to regenerate the target key corresponding to the metadata block. For example, if the version information and data area feature code of the metadata block are 010OPENPIECLOUDDB …, the identifiers of the first to fourth data areas are 314, 00100000560000, 000000000787899 and 00000005768000 in sequence, and the codes corresponding to these identifiers are A, A, AA and AAA respectively, then the regenerated target key is "A-A-AA-AAA-010OPENPIECLOUDDB …".
It should be appreciated that each database system contains multiple databases, each database contains multiple views, each view contains multiple domains, and a large number of metadata blocks are stored in each domain. In the existing key-value storage mode, the target key of a metadata block must contain enough data to identify the block, so the target key is large. This embodiment uses the optimized Huffman coding tree to encode "314-00100000560000-000000000787899-00000005768000-010OPENPIECLOUDDB …" into "A-A-AA-AAA-010OPENPIECLOUDDB …" for storage, which reduces the storage space required by each target key while ensuring secure storage of the metadata.
According to the embodiment, each coding path in the Huffman coding tree is obtained, and the data characteristic information corresponding to the coding path is obtained, wherein the data characteristic information comprises version information and data area characteristic codes, so that the codes of each node in the coding path are determined, and corresponding target keys are generated based on the codes of each node and the data characteristic information, so that the storage space required by each target key is reduced by converting the data information for generating the target key into Huffman codes with smaller storage space, the efficient utilization of the storage space is realized, and the data storage cost is reduced.
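The key assembly described above can be sketched as a simple join of the per-level codes with the unencoded data characteristic information. The helper name `make_target_key` and the `-` separator handling are assumptions; the identifiers and codes are the ones quoted in the example above, and the feature string is cut at the point where the source elides it.

```python
def make_target_key(parts, feature_info, sep="-"):
    """Join per-level fields with the unencoded version/feature information."""
    return sep.join([*parts, feature_info])


# Values taken from the worked example in the text.
ids = ["314", "00100000560000", "000000000787899", "00000005768000"]
codes = ["A", "A", "AA", "AAA"]
feature = "010OPENPIECLOUDDB"  # the source elides the remainder with "…"

raw_key = make_target_key(ids, feature)        # key built from raw identifiers
target_key = make_target_key(codes, feature)   # key built from node codes
saved_chars = len(raw_key) - len(target_key)   # characters saved per key
```

Only the four identifier fields shrink; the version information and data area feature code are carried over unchanged, as noted earlier in the description.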
In some of these embodiments, after storing the target key and the corresponding correlation value in association in the target disk page, the method further comprises the steps of:
when it is detected that the target keys of the metadata blocks in the target disk page contain an identical code segment, generating a tag value corresponding to the identical code segment, the tag value being associated with a reference address of the identical code segment;
updating the identical code segment in the metadata blocks to the tag value.
Specifically, for each target disk page storing metadata blocks, it is determined whether the target keys of the metadata blocks in the page contain an identical code segment. If so, a tag value corresponding to the identical code segment is generated.
Illustratively, when the target keys of the metadata blocks stored in the target disk page include "314-A-AA-aAA- …", "314-A-AA-aAA- …", "314-A-AA-aAB- …" and "314-A-AB-aAA- …", the identical code segment is "314-A-AA". A tag value Ref corresponding to this code segment is generated and the identical code segment in the metadata blocks is updated to Ref, so that the updated target disk page includes "314-A-AA-aAA- …", "Ref-aAA- …", "Ref-aAB- …" and "314-A-AB-aAA- …".
When the metadata blocks are updated, the first metadata block in the target disk page that contains the identical code segment keeps its original form; its code segment serves as the data to be referenced, and the tag value is associated with the reference address of that code segment.
According to this embodiment, when it is detected that the target keys of the metadata blocks in the target disk page contain an identical code segment, a tag value corresponding to the identical code segment is generated and associated with the reference address of that code segment, and the identical code segment in the metadata blocks is updated to the tag value. The stored data are thereby continuously deduplicated and compressed, and the space occupied by metadata storage is further reduced.
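The in-page deduplication can be sketched as follows. The sketch assumes the shared code segment is always the first three `-`-separated fields, that the reference address is the index of the first key holding the full segment, and that tag values are named `Ref0`, `Ref1`, and so on; the function names and the `v1`/`v2` suffixes standing in for the elided key tails are hypothetical.

```python
def dedup_page(keys, prefix_fields=3, sep="-"):
    """Replace repeated leading code segments with tag values.

    Returns (rewritten_keys, refs); refs maps each tag value to the index of
    the key that still holds the full segment (its "reference address").
    """
    first_seen = {}  # segment -> index of the first key containing it
    tags = {}        # segment -> tag value
    refs = {}        # tag value -> index of the referenced key
    out = []
    for i, key in enumerate(keys):
        fields = key.split(sep)
        segment = sep.join(fields[:prefix_fields])
        rest = sep.join(fields[prefix_fields:])
        if segment not in first_seen:
            first_seen[segment] = i
            out.append(key)  # the first occurrence keeps its original form
            continue
        if segment not in tags:
            tags[segment] = f"Ref{len(tags)}"
            refs[tags[segment]] = first_seen[segment]
        out.append(sep.join([tags[segment], rest]))
    return out, refs


def expand_key(key, page_keys, refs, prefix_fields=3, sep="-"):
    """Restore a tagged key by following its reference address."""
    tag, _, rest = key.partition(sep)
    if tag not in refs:
        return key
    segment = sep.join(page_keys[refs[tag]].split(sep)[:prefix_fields])
    return sep.join([segment, rest])


page = ["314-A-AA-aAA-v1", "314-A-AA-aAA-v2", "314-A-AA-aAB-v1", "314-A-AB-aAA-v1"]
packed, refs = dedup_page(page)
restored = [expand_key(k, packed, refs) for k in packed]
```

A segment that occurs only once, such as `314-A-AB` here, is left untouched, matching the example in the text where the last key keeps its full form.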
In some of these embodiments, after the target key and the corresponding correlation value are stored in association in the target disk page, the method further includes the following steps:
Step S241, when a new metadata block is received, generating a query key corresponding to the new metadata block;
Step S242, obtaining the code tree information corresponding to the query key, and encoding the query key based on the code tree information to obtain a target key;
Step S243, using the original data information in the new metadata block as the correlation value;
Step S244, determining a target disk page corresponding to the target key, and storing the target key and the corresponding correlation value in association in the target disk page.
Specifically, when a new metadata block exists and needs to be inserted into a data node, a query key corresponding to the new metadata block is generated, the query key is transmitted to an encoder, and the encoder queries corresponding coding tree information in a coding tree buffer according to the received query key. If the cache hits, the coding tree buffer sends coding tree information required by the encoder to the encoder; if the cache is not hit, the code tree buffer acquires the needed code tree information from the metadata index node and returns to the encoder.
After the encoder obtains the code tree information corresponding to the query key, the query key is encoded based on the code tree information, and a target key is generated and sent to the processor. The code tree information is used for providing information such as a code format and the like so as to generate a target key which is suitable for a local storage mode.
Further, the original data information in the new metadata block is used as a correlation value, the processor obtains the position of the target disk page corresponding to the target key from the code tree buffer, and the target key and the corresponding correlation value are stored in the target disk page in a correlated manner, so that the insertion storage of the metadata block is completed. In addition, the operation principle of deleting a metadata block is the same as that of inserting a metadata block.
It is to be appreciated that if the code tree information required for the new metadata block is not in the local code tree, code tree expansion is performed. For example, if the code tree information required by the new metadata block calls for a code corresponding to a new database identifier, a child node is added at the database identifier level of the existing Huffman coding tree, thereby obtaining the latest version of the code tree. The processor then sends the latest version of the code tree to the metadata index node, the metadata index node updates the code tree version, and the processor refreshes the code tree buffer, thereby realizing a rolling update.
Through the embodiment, when a new metadata block is received, a query key corresponding to the new metadata block is generated; acquiring code tree information corresponding to the query key, and performing coding processing on the query key based on the code tree information to obtain a target key; taking original data information in the new metadata block as a correlation value; and further, determining a target disk page corresponding to the target key, and storing the target key and the corresponding related value in the target disk page in an associated manner, so that the insertion of the metadata block is realized.
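As a non-limiting illustration, the insertion flow with code tree expansion may be sketched as follows. For brevity the sketch assigns each new child node a code equal to its position among its siblings rather than a frequency-based Huffman code, and it chooses the target disk page with a CRC32 hash rather than looking the page up in the code tree buffer; both are simplifying assumptions. The classes `CodeTree` and `DataNode` are hypothetical names.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    code: str
    children: Dict[str, "Node"] = field(default_factory=dict)


@dataclass
class CodeTree:
    root: Node = field(default_factory=lambda: Node(code=""))
    version: int = 1

    def encode(self, identifiers: List[str]) -> str:
        """Walk one level per identifier, adding a child node where one is missing."""
        node, codes, expanded = self.root, [], False
        for ident in identifiers:
            child = node.children.get(ident)
            if child is None:
                # Code tree expansion: add a child node at this level.
                child = Node(code=format(len(node.children), "x"))
                node.children[ident] = child
                expanded = True
            codes.append(child.code)
            node = child
        if expanded:
            # A new tree version would be sent to the metadata index node here.
            self.version += 1
        return "-".join(codes)


class DataNode:
    def __init__(self, page_count: int = 4) -> None:
        self.tree = CodeTree()
        self.pages: List[Dict[str, bytes]] = [{} for _ in range(page_count)]

    def insert(self, identifiers: List[str], raw: bytes) -> Tuple[str, int]:
        """Encode the query key, pick a page, and store key and value together."""
        target_key = self.tree.encode(identifiers)
        page = zlib.crc32(target_key.encode()) % len(self.pages)
        self.pages[page][target_key] = raw
        return target_key, page


node = DataNode()
key1, page1 = node.insert(["db1", "view1", "domain1"], b"raw-1")
key2, page2 = node.insert(["db1", "view1", "domain2"], b"raw-2")
version_after_expansion = node.tree.version
node.insert(["db1", "view1", "domain1"], b"raw-1b")  # no expansion needed
```

Only insertions that introduce an unseen identifier bump the tree version in this sketch, so repeated inserts under known identifiers do not trigger a refresh of the code tree buffer.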
In some of these embodiments, in the Huffman coding tree, the child nodes under each node are associated with one another through inline data pointers.
Specifically, in the Huffman coding tree, the levels corresponding to the database identifier, the view identifier, and the domain identifier each include a plurality of child nodes. For child nodes under the same node, inline data pointers are set to associate the child nodes with one another.
Taking fig. 4 as an example, node AA, node AB and node AC are all child nodes under node A, and node AA, node AB and node AC are associated with one another by inline data pointers. On this basis, when each level of the Huffman coding tree is traversed, a query can move among the child nodes under the same node without going back to the parent node.
According to the embodiment, in the Huffman coding tree, all child nodes under each node are related through the inline data pointer, so that the searching efficiency of the same layer of data is improved, and the traversing cost of the Huffman coding tree is reduced.
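As a non-limiting illustration, the association of child nodes may be sketched with a circular sibling link, so that a search starting at any child can reach every other child under the same node without returning to the parent. The circular layout and the names `TreeNode`, `link_siblings` and `find_among_siblings` are assumptions made for this sketch; the embodiment does not specify the pointer layout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # identity comparison only; the links form a cycle
class TreeNode:
    name: str
    next_sibling: Optional["TreeNode"] = None


def link_siblings(names: List[str]) -> List[TreeNode]:
    """Create child nodes and chain them in a ring via next_sibling."""
    nodes = [TreeNode(name) for name in names]
    for i, node in enumerate(nodes):
        node.next_sibling = nodes[(i + 1) % len(nodes)]
    return nodes


def find_among_siblings(start: TreeNode, name: str) -> Optional[TreeNode]:
    """Search the sibling ring from `start` without visiting the parent node."""
    node = start
    while True:
        if node.name == name:
            return node
        node = node.next_sibling
        if node is None or node is start:
            return None  # went all the way around without a match


aa, ab, ac = link_siblings(["AA", "AB", "AC"])
found = find_among_siblings(ac, "AA")    # reached from AC via the ring
missing = find_among_siblings(ab, "AD")  # no such sibling
```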
This embodiment also provides a metadata query method. Fig. 5 is a flowchart of the metadata query method of this embodiment. As shown in fig. 5, the flow includes the following steps:
Step S510, generating a corresponding query key according to the received user demand data, and acquiring code tree information corresponding to the query key;
Step S520, encoding the query key based on the code tree information to obtain a target key;
Step S530, determining a target disk page corresponding to the target key, and acquiring a correlation value corresponding to the target key from the target disk page; the correlation value stores the original data information in the metadata block.
Specifically, when user demand data is received, a corresponding query key is generated according to the user demand data, the query key is sent to an encoder, and coding tree information required by the query key is queried through the encoder.
After the code tree information is obtained, the query key is coded based on the code tree information, and the target key is obtained. The code tree information is used for providing information such as a code format and the like so as to generate a target key which is suitable for a local storage mode.
It should be noted that, in the metadata storage process, in order to support load balancing of the database system, all metadata is subjected to data slicing according to the target key, and is stored in different data nodes, and all data nodes are placed in the metadata storage node. Based on the above, the target data node corresponding to the target key and the position of the corresponding target disk page are obtained from the code tree buffer, and the correlation value corresponding to the target key is extracted from the target disk page in the metadata storage node, and the correlation value stores the original data information in the metadata block.
Moreover, if a tag value is present in the target key during the metadata query, the reference address associated with the tag value is acquired, and the coding segment that was compressed for storage is restored according to the reference address. Accurate metadata queries are thereby ensured while the storage space remains reduced.
According to the embodiment, a corresponding query key is generated according to the received user demand data, and code tree information corresponding to the query key is obtained; encoding the query key based on the encoding tree information to obtain a target key; further, a target disk page corresponding to the target key is determined, a correlation value corresponding to the target key is obtained from the target disk page, wherein the correlation value stores original data information in a metadata block, so that when metadata is queried, query can be completed only by obtaining the corresponding disk page position, scanning of all disk pages is avoided, and metadata query efficiency is improved.
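As a non-limiting illustration, the query path may be sketched as follows: the query key is encoded into a target key, the data node and disk page for that key are looked up, and only that one page is read. The sketch assumes a flat identifier-to-code table in place of the code tree information, a CRC32-based placement of keys across data nodes and pages, and an in-memory `locations` table standing in for the position information held by the code tree buffer. All names are hypothetical.

```python
import zlib
from typing import Dict, List, Optional, Tuple

Page = Dict[str, bytes]


def encode_query_key(identifiers: List[str], code_table: Dict[str, str]) -> str:
    """Translate each identifier into its code and join the codes into a target key."""
    return "-".join(code_table[ident] for ident in identifiers)


class MetadataStore:
    def __init__(self, node_count: int, pages_per_node: int) -> None:
        self.nodes: List[List[Page]] = [
            [{} for _ in range(pages_per_node)] for _ in range(node_count)
        ]
        # target key -> (data node index, disk page index)
        self.locations: Dict[str, Tuple[int, int]] = {}

    def put(self, target_key: str, raw: bytes) -> None:
        """Shard by target key, then store key and value in one disk page."""
        digest = zlib.crc32(target_key.encode())
        node = digest % len(self.nodes)
        page = (digest // len(self.nodes)) % len(self.nodes[node])
        self.nodes[node][page][target_key] = raw
        self.locations[target_key] = (node, page)

    def get(self, target_key: str) -> Optional[bytes]:
        """Read only the page recorded for the key; no other page is scanned."""
        location = self.locations.get(target_key)
        if location is None:
            return None
        node, page = location
        return self.nodes[node][page].get(target_key)


table = {"db1": "A", "view1": "AA", "domain1": "aAA"}
store = MetadataStore(node_count=3, pages_per_node=4)
target_key = encode_query_key(["db1", "view1", "domain1"], table)
store.put(target_key, b"raw metadata")
value = store.get(target_key)
```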
In some embodiments, obtaining the code tree information corresponding to the query key includes the following steps:
Step S511, retrieving the code tree information corresponding to the query key in the code tree buffer;
Step S512, when the code tree information is not retrieved, extracting the code tree information from the metadata index node.
Specifically, when user demand data is received, a corresponding query key is generated according to the user demand data, the query key is sent to the encoder, and whether the code tree buffer contains code tree information required by the query key is queried through the encoder.
If the cache is hit, that is, the currently required code tree information is retrieved from the code tree buffer, the code tree buffer sends the required code tree information to the encoder; if the cache is not hit, that is, the currently required code tree information is not retrieved from the code tree buffer, the code tree buffer obtains the required code tree information from the metadata index node and returns it to the encoder.
According to the embodiment, the code tree information corresponding to the query key is searched in the code tree buffer, and when the code tree information is not searched, the code tree information is extracted from the metadata index node, so that the target key suitable for the local metadata storage mode can be obtained through encoding.
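As a non-limiting illustration, steps S511 and S512 may be sketched as a read-through cache. The sketch assumes that code tree information is addressed by a tree identifier and is represented as a plain dictionary; the classes `MetadataIndexNode` and `CodeTreeBuffer` and the `fetches` counter are introduced only for this example.

```python
from typing import Dict


class MetadataIndexNode:
    """Stand-in for the authoritative store of code tree information."""

    def __init__(self, trees: Dict[str, dict]) -> None:
        self._trees = dict(trees)
        self.fetches = 0  # counts how often the index node is consulted

    def fetch(self, tree_id: str) -> dict:
        self.fetches += 1
        return self._trees[tree_id]


class CodeTreeBuffer:
    """Local cache that falls back to the metadata index node on a miss."""

    def __init__(self, index_node: MetadataIndexNode) -> None:
        self._index_node = index_node
        self._cache: Dict[str, dict] = {}

    def get(self, tree_id: str) -> dict:
        info = self._cache.get(tree_id)
        if info is None:
            # Cache miss: pull the information from the index node and keep it.
            info = self._index_node.fetch(tree_id)
            self._cache[tree_id] = info
        return info


index_node = MetadataIndexNode({"tree-1": {"format": "level-codes", "version": 1}})
buffer = CodeTreeBuffer(index_node)
first = buffer.get("tree-1")   # miss, served by the index node
second = buffer.get("tree-1")  # hit, served locally
```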
The present embodiment is described and illustrated below by way of preferred embodiments.
Fig. 6 is a flowchart of the metadata storage method of the present preferred embodiment, and as shown in fig. 6, the metadata storage method includes the steps of:
Step S610, obtaining the identity information in each metadata block; the identity information comprises metadata identification, database identification, view identification and domain identification;
Step S620, carrying out coding processing on the identity information in each metadata block to obtain a corresponding Huffman coding tree; wherein, each level of the Huffman coding tree corresponds to the identity information of different categories respectively;
Step S630, obtaining each coding path in the Huffman coding tree;
Step S640, obtaining data characteristic information corresponding to the coding path; the data characteristic information comprises version information and data area characteristic codes;
Step S650, determining the code of each node in the coding path, and generating a corresponding target key based on the code of each node and the data characteristic information;
Step S660, using the original data information corresponding to each coding path as a correlation value, and storing the target key and the corresponding correlation value in association.
Through this embodiment, the identity information in each metadata block is acquired, the identity information comprising a metadata identification, a database identification, a view identification and a domain identification; the identity information in each metadata block is encoded to obtain a corresponding Huffman coding tree; and each level of the Huffman coding tree corresponds to a different category of identity information, so that the number of layers of the Huffman coding tree is effectively reduced and the coding and decoding capabilities for metadata blocks are improved.
Then, each coding path in the Huffman coding tree is obtained, and data characteristic information corresponding to the coding path is obtained, the data characteristic information comprising version information and data area characteristic codes; the code of each node in the coding path is determined, and a corresponding target key is generated based on the code of each node and the data characteristic information; and the original data information corresponding to each coding path is used as a correlation value, and the target key and the corresponding correlation value are stored in association. By converting the data information that forms the target key into Huffman codes, the storage space required by each target key is reduced, which solves the problem that storage space cannot be used efficiently and data storage cost is high, and reduces the data storage cost.
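As a non-limiting illustration, steps S620 to S650 may be sketched as follows. The sketch assumes that each level of the tree assigns its identifiers a standard frequency-based Huffman code, that the frequencies are known in advance, and that the target key is the "-"-joined node codes followed by the version information and the data area characteristic code. The frequencies, the field order and the names `huffman_codes` and `build_target_key` are assumptions for this example and are not taken from the embodiment.

```python
import heapq
from itertools import count
from typing import Dict

LEVELS = ("metadata", "database", "view", "domain")


def huffman_codes(freqs: Dict[str, int]) -> Dict[str, str]:
    """Build a binary Huffman code for the symbols of one level."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    tie = count()  # tie-breaker so the heap never compares dictionaries
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def build_target_key(
    identity: Dict[str, str],
    codes_by_level: Dict[str, Dict[str, str]],
    version: str,
    area_code: str,
) -> str:
    """Concatenate the node codes along the path with the data characteristic info."""
    path = [codes_by_level[level][identity[level]] for level in LEVELS]
    return "-".join(path + [version, area_code])


# Hypothetical usage frequencies for the identifiers at each level.
frequencies = {
    "metadata": {"m314": 1},
    "database": {"db1": 5, "db2": 2, "db3": 1},
    "view": {"v1": 3, "v2": 1},
    "domain": {"d1": 4, "d2": 4, "d3": 1},
}
codes_by_level = {level: huffman_codes(f) for level, f in frequencies.items()}
identity = {"metadata": "m314", "database": "db1", "view": "v2", "domain": "d3"}
target_key = build_target_key(identity, codes_by_level, version="v2", area_code="a7")
```

Frequently used identifiers receive shorter codes in this sketch, which is the property that lets a target key built from such codes occupy less space than the raw identifiers.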
It should be noted that the steps illustrated in the above flows or in the flowcharts of the figures may be performed in a computer system, for example one executing a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
This embodiment also provides a metadata query system. Fig. 7 is a block diagram of the metadata query system of this embodiment. As shown in fig. 7, the system includes: control module 100, input module 200, encoder 300, code tree buffer 400, metadata index node 500, processor 600, and metadata storage node 700;
When the input module 200 receives the user demand data, the control module 100 controls the input module 200 to generate a corresponding query key according to the user demand data and send the query key to the encoder 300, and the encoder 300 queries whether the code tree buffer 400 contains the code tree information required by the query key. If the cache is hit, the code tree buffer 400 sends the code tree information required by the encoder 300 to the encoder 300; if the cache is not hit, the code tree buffer 400 obtains the required code tree information from the metadata index node 500 and returns it to the encoder 300.
After acquiring the code tree information corresponding to the query key, the encoder 300 encodes the query key based on the code tree information, generates a target key, and sends the target key to the processor 600. The code tree information is used for providing information such as a code format and the like so as to generate a target key which is suitable for a local storage mode.
Further, the processor 600 obtains the location of the target disk page corresponding to the target key from the code tree buffer 400, and extracts, in the metadata storage node 700, a correlation value corresponding to the target key from the target disk page, where the correlation value stores the original data information in the metadata block. It should be noted that the metadata storage node 700 is used to store all data nodes.
According to the embodiment, a corresponding query key is generated according to the received user demand data, and code tree information corresponding to the query key is obtained; encoding the query key based on the encoding tree information to obtain a target key; and determining a target disk page corresponding to the target key, and acquiring a correlation value corresponding to the target key from the target disk page, wherein the correlation value stores the original data information in the metadata block, so that the metadata query efficiency is improved.
This embodiment further provides a metadata storage device, which is used to implement the foregoing embodiments and preferred embodiments; what has already been described will not be repeated. The terms "module," "unit," "sub-unit," and the like as used below may refer to a combination of software and/or hardware that performs a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
Fig. 8 is a block diagram of the structure of the metadata storage device of this embodiment. As shown in fig. 8, the device comprises: a construction module 10, a generation module 20 and a storage module 30;
The construction module 10 is configured to construct a Huffman coding tree corresponding to each metadata block based on the identity information in each metadata block;
The generation module 20 is configured to generate a target key corresponding to each coding path in the Huffman coding tree, and determine a target disk page corresponding to the target key;
The storage module 30 is configured to use the original data information corresponding to each coding path as a correlation value, and store the target key and the corresponding correlation value in association in the target disk page.
With the device provided by this embodiment, a Huffman coding tree corresponding to each metadata block is constructed based on the identity information in each metadata block; a target key corresponding to each coding path in the Huffman coding tree is generated, and a target disk page corresponding to the target key is determined; furthermore, the original data information corresponding to each coding path is used as a correlation value, and the target key and the corresponding correlation value are stored in association in the target disk page. This solves the problem that storage space cannot be used efficiently and data storage cost is high, and reduces the data storage cost.
In some embodiments, on the basis of fig. 8, the device further includes an encoding module, configured to obtain the identity information in each metadata block, the identity information comprising a metadata identification, a database identification, a view identification and a domain identification; and to encode the identity information in each metadata block to obtain a corresponding Huffman coding tree; each level of the Huffman coding tree corresponds to a different category of identity information.
In some embodiments, on the basis of fig. 8, the device further includes a combining module, configured to obtain each coding path in the Huffman coding tree; acquire data characteristic information corresponding to the coding path, the data characteristic information comprising version information and data area characteristic codes; and determine the code of each node in the coding path, and generate a corresponding target key based on the code of each node and the data characteristic information.
In some embodiments, on the basis of fig. 8, the device further includes an updating module, configured to generate, when it is detected that the target keys of the metadata blocks in the target disk page contain the same coding segment, a tag value corresponding to the same coding segment, the tag value being associated with a reference address of the same coding segment; and to update the same coding segment in the metadata block to the tag value.
In some embodiments, on the basis of fig. 8, the apparatus further includes a query module, configured to generate a query key corresponding to the new metadata block when the new metadata block is received; acquiring code tree information corresponding to the query key, and performing coding processing on the query key based on the code tree information to obtain a target key; taking original data information in the new metadata block as a correlation value; and determining a data node corresponding to the target key, and storing the target key and the corresponding related value in the data node in an associated way.
This embodiment also provides a metadata query device. Fig. 9 is a block diagram of the metadata query device of this embodiment. As shown in fig. 9, the device includes: an acquisition module 40, an encoding module 50 and a query module 60;
The acquisition module 40 is configured to generate a corresponding query key according to the received user demand data, and obtain code tree information corresponding to the query key;
The encoding module 50 is configured to encode the query key based on the code tree information to obtain a target key;
The query module 60 is configured to determine a target disk page corresponding to the target key, and obtain a correlation value corresponding to the target key from the target disk page; the correlation value stores the original data information in the metadata block.
By the device provided by the embodiment, corresponding query keys are generated according to the received user demand data, and code tree information corresponding to the query keys is obtained; encoding the query key based on the encoding tree information to obtain a target key; further, a target disk page corresponding to the target key is determined, and a correlation value corresponding to the target key is obtained from the target disk page, wherein the correlation value stores original data information in the metadata block, so that metadata query efficiency is improved.
In some embodiments, on the basis of fig. 9, the device further includes a retrieving module, configured to retrieve, in the code tree buffer, the code tree information corresponding to the query key, and, when the code tree information is not retrieved, to extract the code tree information from the metadata index node.
The above-described respective modules may be functional modules or program modules, and may be implemented by software or hardware. For modules implemented in hardware, the various modules described above may be located in the same processor; or the above modules may be located in different processors in any combination.
There is also provided in this embodiment a computer device comprising a memory in which a computer program is stored and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the computer device may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
It should be noted that, specific examples in this embodiment may refer to examples described in the foregoing embodiments and alternative implementations, and are not described in detail in this embodiment.
In addition, in combination with the metadata storage method provided in the above embodiments, this embodiment may also provide a storage medium for implementing the method. The storage medium has a computer program stored thereon; the computer program, when executed by a processor, implements any of the metadata storage methods of the above embodiments.
It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to be limiting. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure in accordance with the embodiments provided herein.
It is to be understood that the drawings are merely illustrative of some embodiments of the present application and that it is possible for those skilled in the art to adapt the present application to other similar situations without the need for inventive work. In addition, it should be appreciated that while the development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and thus should not be construed as a departure from the disclosure.
The term "embodiment" in this disclosure means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive. It will be clear or implicitly understood by those of ordinary skill in the art that the embodiments described in the present application can be combined with other embodiments without conflict.
The above embodiments represent only several implementations of the present application. Although they are described specifically and in detail, they are not to be construed as limiting the scope of the patent. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within the scope of protection of the application. Accordingly, the scope of protection of the application shall be determined by the appended claims.

Claims (9)

1. A method of metadata storage, the method comprising:
Constructing a Huffman coding tree corresponding to each metadata block based on the identity information in each metadata block; the identity information comprises metadata identification, database identification, view identification and domain identification;
Generating a target key corresponding to each coding path in the Huffman coding tree, and determining a target disk page corresponding to the target key; the generating the target key corresponding to each coding path in the huffman coding tree includes: acquiring each coding path in the Huffman coding tree; acquiring data characteristic information corresponding to the coding path; the data characteristic information comprises version information and data area characteristic codes; determining the code of each node in the code path, and generating the corresponding target key based on the code of each node and the data characteristic information;
and taking the original data information corresponding to each coding path as a correlation value, and storing the target key and the corresponding correlation value in association to the target disk page.
2. The metadata storage method according to claim 1, wherein constructing the Huffman coding tree corresponding to each of the metadata blocks based on the identity information in each of the metadata blocks comprises:
acquiring the identity information in each metadata block; the identity information comprises metadata identification, database identification, view identification and domain identification;
coding the identity information in each metadata block to obtain the corresponding Huffman coding tree;
each level of the Huffman coding tree corresponds to a different category of the identity information.
3. The metadata storage method according to claim 1, further comprising, after said storing said target key and said corresponding correlation value in association with said target disk page:
When it is detected that the target keys of the metadata blocks in the target disk page contain the same coding segment, generating a tag value corresponding to the same coding segment; the tag value is associated with a reference address of the same coding segment;
Updating the same encoded segment in the metadata block to the tag value.
4. The metadata storage method according to claim 1, further comprising, after said storing the target key in association with the corresponding correlation value to the target disk page:
Generating a query key corresponding to a new metadata block when the new metadata block is received;
Acquiring code tree information corresponding to the query key, and performing coding processing on the query key based on the code tree information to obtain a target key;
taking the original data information in the new metadata block as a correlation value;
and determining the target disk page corresponding to the target key, and storing the target key and the corresponding related value in association with the target disk page.
5. A method of metadata querying, the method comprising:
Generating a corresponding query key according to the received user demand data, and acquiring code tree information corresponding to the query key; the coding tree information is used for providing a coding format;
encoding the query key based on the encoding tree information to obtain a target key;
Determining a target disk page corresponding to the target key, and acquiring a correlation value corresponding to the target key from the target disk page; the correlation value stores the original data information in the metadata block.
6. The metadata query method as claimed in claim 5, wherein said obtaining the code tree information corresponding to the query key comprises:
in the code tree buffer, code tree information corresponding to the query key is searched;
and extracting the code tree information from the metadata index node when the code tree information is not retrieved.
7. A metadata storage apparatus, the apparatus comprising: the device comprises a construction module, a generation module and a storage module;
the construction module is used for constructing a Huffman coding tree corresponding to each metadata block based on the identity information in each metadata block; the identity information comprises metadata identification, database identification, view identification and domain identification;
The generating module is used for generating a target key corresponding to each coding path in the Huffman coding tree and determining a target disk page corresponding to the target key;
The generating module is further configured to obtain each encoding path in the huffman encoding tree; acquiring data characteristic information corresponding to the coding path; the data characteristic information comprises version information and data area characteristic codes; determining the code of each node in the code path, and generating the corresponding target key based on the code of each node and the data characteristic information;
and the storage module is used for taking the original data information corresponding to each coding path as a correlation value, and storing the target key and the corresponding correlation value in association to the target disk page.
8. A computer device comprising a memory and a processor, wherein the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the steps of the metadata storage method of any of claims 1 to 4.
9. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor implements the steps of the metadata storage method of any of claims 1 to 4.
CN202311755927.1A 2023-12-20 2023-12-20 Metadata storage and query method, device, computer equipment and storage medium Active CN117435776B (en)


Publications (2)

Publication Number Publication Date
CN117435776A CN117435776A (en) 2024-01-23
CN117435776B true CN117435776B (en) 2024-04-30

Family

ID=89552048


Country Status (1)

Country Link
CN (1) CN117435776B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117687970A (en) * 2024-02-02 2024-03-12 济南浪潮数据技术有限公司 Metadata retrieval method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975498A (en) * 2016-04-27 2016-09-28 华为技术有限公司 Data query method, device and system
CN112835896A (en) * 2021-01-27 2021-05-25 浙江中智达科技有限公司 Real-time database data hotspot balancing method, device, equipment and medium
CN112948717A (en) * 2021-05-13 2021-06-11 北京电信易通信息技术股份有限公司 Massive space POI searching method and system based on multi-factor constraint
US11086524B1 (en) * 2018-06-27 2021-08-10 Datadirect Networks, Inc. System and method for non-volatile memory based optimized, versioned, log-structured metadata storage with efficient data retrieval
CN114900193A (en) * 2022-04-08 2022-08-12 博流智能科技(南京)有限公司 Adaptive Huffman coding system and method
CN116560581A (en) * 2023-05-19 2023-08-08 济南浪潮数据技术有限公司 Virtual machine disk file migration method, system, storage medium and equipment

Non-Patent Citations (2)

Title
A Self-index GML Storage Approach based on Element coding; Weili Wang et al.; 2011 19th International Conference on Geoinformatics; 2011-08-11; pp. 1-6 *
Research on XML-Based Keyword Query Algorithms; Tian Bing; Wanfang Dissertations; 2013-10-30; pp. 1-64 *

Also Published As

Publication number Publication date
CN117435776A (en) 2024-01-23

Similar Documents

Publication Publication Date Title
CN109165215B (en) Method and device for constructing space-time index in cloud environment and electronic equipment
CN117435776B (en) Metadata storage and query method, device, computer equipment and storage medium
US10620830B2 (en) Reconciling volumelets in volume cohorts
CN111339382B (en) Character string data retrieval method, device, computer equipment and storage medium
CN101324896B (en) Method for storing and searching vector data and management system thereof
CN110427368A (en) Data processing method, device, electronic equipment and storage medium
CN109104405B (en) Binary protocol encoding and decoding method and device
CN108733317B (en) Data storage method and device
CN100383794C (en) Searching method, holding method and searching system for dictionary-like data
CN106681995B (en) Data caching method, data query method and device
CN107590157B (en) Data storage method, data query method and related equipment
CN111680489B (en) Target text matching method and device, storage medium and electronic equipment
CN104539750A (en) IP locating method and device
CN108647266A (en) Method for fast distributed storage and exchange of heterogeneous data
CN111611250A (en) Data storage device, data query method, data query device, server and storage medium
CN109325089A (en) Non-pointing object querying method, device, terminal device and storage medium
CN114610708A (en) Vector data processing method and device, electronic equipment and storage medium
CN115563409A (en) Address administrative division identification method, device, equipment and medium
CN110413807B (en) Image query method and system based on content semantic metadata
CN110825706A (en) Data compression method and related equipment
CN104301182A (en) Method and device for inquiring slow website access abnormal information
CN104915394A (en) Yellow page information updating method and device
CN110972258A (en) Method and device for establishing position fingerprint database
CN107070987B (en) Data acquisition method and system for distributed object storage system
CN116126928A (en) Information searching system based on variable fingerprint cuckoo filter

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant