WO2024104073A1 - Metadata access method and device, and storage medium - Google Patents

Metadata access method and device, and storage medium Download PDF

Info

Publication number
WO2024104073A1
WO2024104073A1 (PCT/CN2023/126791)
Authority
WO
WIPO (PCT)
Prior art keywords
metadata
memory
data
target metadata
target
Prior art date
Application number
PCT/CN2023/126791
Other languages
French (fr)
Chinese (zh)
Inventor
王淏舟
杨峻峰
冯雷
Original Assignee
杭州拓数派科技发展有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州拓数派科技发展有限公司 filed Critical 杭州拓数派科技发展有限公司
Publication of WO2024104073A1 publication Critical patent/WO2024104073A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/544Buffers; Shared memory; Pipes
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of data processing, and in particular to a metadata access method, device and storage medium.
  • In the traditional metadata access method, the user's access instruction is sent to a distribution node; the distribution node connects to a master node and forwards the instruction to it.
  • After receiving the instruction, the master node pulls metadata from the metadata service to parse the instruction.
  • The master node then starts a computing node and sends the instruction to it.
  • After the computing node starts, it receives the instruction, pulls metadata from the metadata service, and processes the instruction.
  • The computing node returns the processing result, then exits and is destroyed to release computing resources.
  • In this traditional approach, all nodes (distribution nodes, master nodes, and slave nodes) need to access the metadata service.
  • The resulting data volume is large: it occupies substantial network bandwidth, increases network overhead and cost, greatly increases the load on the metadata service, degrades database performance, and requires more resources to be allocated to the metadata service.
  • The high load on the metadata service limits the maximum number of nodes, which in turn limits the performance of the entire database cluster.
  • According to various embodiments of the present application, a metadata access method, device, and storage medium are provided.
  • In a first aspect, this embodiment provides a metadata access method, which includes:
  • the target metadata is extracted from the metadata service by a data extractor and cached to a cloud disk, and the loader loads the cached data of the target metadata from the cloud disk to the first memory of the loader;
  • the target metadata is metadata in the metadata service whose data update frequency is lower than a first preset value and whose data access frequency is higher than a second preset value;
  • the cache data of the target metadata in the first memory is loaded into a second memory of a slave node corresponding to the master node; the slave node reads the second memory to access the target metadata.
  • In some embodiments, before the cache data of the target metadata is loaded into the first memory, the method includes: acquiring the target metadata from the metadata service; and generating cache data of the target metadata according to the acquired target metadata.
  • In some embodiments, generating the cache data of the target metadata according to the acquired target metadata includes: classifying the target metadata according to its data type to obtain type information; performing feature extraction on the classified target metadata; encoding the extracted features and the classified target metadata to obtain encoded data; and generating the cache data according to the encoded data and the type information.
  • In some embodiments, generating the cache data of the target metadata according to the acquired target metadata includes: acquiring version information of the target metadata, and generating the cache data according to the version information.
  • In some embodiments, the method further includes: in response to a data update instruction sent by the metadata service, acquiring the execution state of the slave node's task, where the data update instruction is used to instruct updating of the cache data of the target metadata stored in the first memory; and, according to the execution state, selecting from the first memory the cache data of the target metadata corresponding to the execution state and loading it into the second memory of the slave node.
  • In some embodiments, selecting from the first memory, according to the execution state, the cache data of the target metadata corresponding to the execution state and loading it into the second memory of the slave node includes: when the execution state indicates that the slave node is executing a task, selecting from the first memory the cache data of the target metadata from before the update and loading it into the second memory of the slave node; and when the execution state indicates that the slave node is idle, selecting from the first memory the updated cache data of the target metadata and loading it into the second memory of the slave node.
  • In some embodiments, the method further includes: in response to an access request from a slave node, connecting the address of the first memory to the slave node to be accessed.
  • the slave node is a stateless computing node.
  • In a second aspect, this embodiment provides a metadata access device, which includes:
  • a first loading module is used to extract target metadata from a metadata service through a data extractor and cache the target metadata to a cloud disk, and the loader loads the cached data of the target metadata from the cloud disk to a first memory of the loader;
  • the target metadata is metadata in the metadata service whose data update frequency is lower than a first preset value and whose data access frequency is higher than a second preset value;
  • an access module, used to load, in response to a metadata access request from the master node, the cached data of the target metadata in the first memory into the second memory of the slave node corresponding to the master node; the slave node reads the second memory to access the target metadata.
  • In a third aspect, this embodiment provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the metadata access method described in the first aspect are implemented.
  • FIG. 1 is a hardware structure block diagram of a terminal for executing a metadata access method according to one or more embodiments of the present application.
  • FIG. 2 is a flowchart of a metadata access method according to one or more embodiments of the present application.
  • FIG. 3 is a flowchart of a metadata cache generation method in one or more embodiments of the present application.
  • FIG. 4 is a flowchart of a metadata cache access method in one or more embodiments of the present application.
  • FIG. 5 is a flowchart of a computing node updating method in one or more embodiments of the present application.
  • FIG. 6 is a flowchart of a metadata cache dynamic update method in one or more embodiments of the present application.
  • FIG. 7 is a structural block diagram of a metadata access device according to one or more embodiments of the present application.
  • The terms "connection", "connected", and "coupled" in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
  • the “multiple” involved in this application refers to two or more.
  • "And/or" describes an association between related objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B exist at the same time, or that B exists alone. Generally, the character "/" indicates an "or" relationship between the associated objects.
  • the terms “first”, “second”, “third”, etc. involved in this application are only used to distinguish similar objects and do not represent a specific ordering of objects.
  • Computing nodes based on distributed clusters can be dynamically generated and destroyed on demand. Distributed computing node resources do not need to be generated in advance.
  • Stateless computing nodes do not store any cluster information or data, and their creation and destruction will not have any impact on the distributed cluster. All slave nodes involved in this application are stateless computing nodes, and their states are all stored in the metadata service.
  • Metadata is data used to describe and execute user data, queries, and operations in a database, and it is key database data: once damaged, the database will stop serving and cannot be recovered.
  • The distributed database involved in this application is a distributed database with storage-compute separation in a master-segment node architecture.
  • The master node is responsible for receiving and parsing user instructions (queries).
  • the slave node is a stateless computing node in the eMPP (elastic Massive Parallel Processing) architecture, which is responsible for processing user instructions, reading and processing data, and returning the results to the master node.
  • In this application, metadata is stored on and accessed from the same node; that is, metadata is stored in a unified manner as key data to ensure its security. All nodes (including all master and slave nodes) need to access metadata.
  • the metadata service system refers to a database-like service system that can provide unified metadata services for distributed databases, including storage, query, modification, and insertion.
  • When users add, delete, query, or modify stored data, the database master node, after receiving the user instruction, parses and translates it into machine language and passes it to the computing nodes for processing. Metadata is required throughout the entire process (parsing, translation, and processing).
  • FIG. 1 is a hardware structure block diagram of a terminal that executes a metadata access method of an embodiment of the present application.
  • the terminal may include one or more (only one is shown in FIG. 1 ) processors 102 and a memory 104 for storing data, wherein the processor 102 may include but is not limited to a processing device such as a microprocessor MCU or a programmable logic device FPGA.
  • the terminal may also include a transmission device 106 and an input/output device 108 for communication functions.
  • the structure shown in FIG. 1 is for illustration only and does not limit the structure of the terminal.
  • the terminal may also include more or fewer components than those shown in FIG. 1 , or may have a different configuration than that shown in FIG. 1 .
  • the memory 104 can be used to store computer programs, for example, software programs and modules of application software, such as a computer program corresponding to a metadata access method in this embodiment.
  • the processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, that is, to implement the above method.
  • the memory 104 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 104 may further include a memory remotely arranged relative to the processor 102, and these remote memories may be connected to the terminal via a network. Examples of the above network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • the transmission device 106 is used to receive or send data via a network.
  • the above-mentioned network includes a wireless network provided by the communication provider of the terminal.
  • the transmission device 106 includes a network adapter (Network Interface Controller, referred to as NIC), which can be connected to other network devices through a base station so as to communicate with the Internet.
  • the transmission device 106 can be a radio frequency (Radio Frequency, referred to as RF) module, which is used to communicate with the Internet wirelessly.
  • FIG. 2 is a flowchart of a metadata access method in an embodiment of the present application. As shown in FIG. 2 , the process includes the following steps:
  • Step S210 extract the target metadata from the metadata service through the data extractor and cache it to the cloud disk, and the loader loads the cached data of the target metadata from the cloud disk into the first memory of the loader;
  • the target metadata is the metadata in the metadata service whose data update frequency is lower than the first preset value and whose data access frequency is higher than the second preset value.
  • this method can be applied to the eMPP architecture, and further to the distributed storage-computing separation database based on eMPP.
  • the loader loads the cached data of the target metadata into the first memory, where the target metadata is metadata in the metadata service with a data update frequency lower than a first preset value and a data access frequency higher than a second preset value.
  • the first memory here is the local memory of the loader, and the loader can load the cached data of the target metadata from the cloud disk to the local memory of the loader.
  • the metadata can be divided into hot data and cold data, where cold data can be defined as metadata with a low update frequency and a high access frequency, and hot data can be defined as metadata with a high update frequency and a low access frequency.
  • cold data it will be frequently accessed by computing nodes, but its data update frequency is low.
  • The cold data can be extracted from the metadata service as the target metadata to generate a cache data image; the computing nodes then obtain the corresponding target metadata from the cache data image without accessing the metadata service, which greatly saves network bandwidth and reduces the complexity of the metadata service.
  • the loader here may be a loader provided by the eMPP architecture.
  • Step S220 in response to the metadata access request of the master node, the cache data of the target metadata in the first memory is loaded into the second memory of the slave node corresponding to the master node; the slave node reads the second memory to access the target metadata.
  • the loader responds to the metadata access request of the master node, and loads the cached data of the target metadata in the first memory into the second memory of the slave node corresponding to the master node; the slave node reads the second memory to access the target metadata, and the second memory here can be the local memory of the slave node.
  • The slave node here is a stateless computing node, and a slave node's access to the cached data of the target metadata can only be initiated by the master node notifying the loader.
  • the loader responds to the metadata access request of the master node, and the metadata access request carries the information of the slave node that needs to access the target metadata, and loads the cached data of the target metadata in the local memory of the loader into the local memory of the corresponding slave node, and the slave node reads the target metadata from its local memory to access the target metadata.
  • In this embodiment, after the target metadata is cached to the cloud disk, when a slave node needs to access it, the loader first loads the target metadata into the local memory of the loader and then, when the slave node needs to access the corresponding target metadata, loads it into the local memory of the slave node, thereby reducing the access pressure on the metadata service and solving the prior-art problem that all nodes need to access the metadata service, resulting in a high metadata service load.
  • the process before loading the cache data of the target metadata into the first memory, the process includes: acquiring the target metadata in the metadata service, and generating the cache data of the target metadata according to the acquired target metadata.
  • the target metadata is extracted from the metadata service, and cache data of the target metadata is generated according to the acquired target metadata.
  • generating cache data of the target metadata based on the acquired target metadata includes: classifying the target metadata according to the data type of the target metadata to obtain type information; performing feature extraction on the classified target metadata; encoding the extracted features and the classified target metadata to obtain encoded data; and generating cache data based on the encoded data and the type information.
  • feature extraction is performed on the classified target metadata to extract feature values, and the extracted feature values and the classified target metadata are encoded together to obtain encoded data.
  • The cache data is generated according to the encoded data and the type information, and includes the encoded data and the type information of the target metadata.
  • a corresponding cache data image is generated and saved in the cloud disk.
  • the encoding here can be binary encoding, so that the encoded data obtained after encoding conforms to the data structure in the local memory of the loader and the local memory of the slave node.
  • generating cache data of the target metadata according to the acquired target metadata includes: acquiring version information of the target metadata, and generating cache data according to the version information.
  • the version information of the target metadata is obtained, the version information is generated for the cached data, the cached data with the version information is used to generate a corresponding cached data image, and the image is saved in the cloud disk.
  • the metadata access method also includes a cache dynamic update process, which includes: in response to a data update instruction sent by the metadata service, obtaining the execution status of the task of the slave node; the data update instruction is used to indicate the update of the cache data of the target metadata stored in the first memory; according to the execution status, selecting the cache data of the target metadata corresponding to the execution status from the first memory and loading it into the second memory of the slave node.
  • the loader responds to the data update instruction sent by the metadata service, obtains the execution status of the task of the slave node, and when the execution status is that the slave node is executing the task, selects the cache data of the target metadata before the update from the first memory and loads it into the second memory of the slave node; when the execution status is that the slave node is idle, selects the cache data of the target metadata after the update from the first memory and loads it into the second memory of the slave node.
  • After all the slave nodes have selected the updated cache data of the target metadata from the first memory and loaded it into their second memories, that is, after all the slave nodes have completed the cache connection update, the slave nodes delete the old memory cache and the system deletes the image file of the old cache data.
  • A metadata cache generation method is provided; as shown in FIG. 3, it includes the following steps:
  • Step S310 The data extractor extracts the required metadata from the metadata service.
  • the required metadata here is the target metadata, that is, cold data.
  • the data extractor here can be a module that implements the data extraction function in the eMPP architecture.
  • Step S320 the cache data generator classifies the metadata according to the attributes of the data inside the metadata.
  • the cache data generator here can be a module that implements the cache data generation function in the eMPP architecture.
  • Step S330 The cache data generator pre-calculates metadata.
  • the pre-calculation here includes scanning the extracted classified metadata and calculating the feature value according to the feature class defined by the system.
  • the feature value here can be calculated by a hash algorithm, and the feature value here can be used to characterize the type of metadata.
  • Step S340 the cache data generator encodes the extracted metadata.
  • the cache data generator performs binary encoding on the classified metadata and calculated feature values to ensure that they conform to the in-memory data structure.
  • the data extractor passes the extracted data to the cache data generator for encoding to improve loading and query speeds.
  • Step S350 perform version verification on the metadata to generate version information.
  • Step S360 Pack the processed metadata to generate cache data.
  • step S370 the cache data generator adds version information to the packaged cache data, generates a corresponding cache data image, and stores it in the cloud disk.
  • the cache data generator adds version information to the packaged cache data, generates a cache data image from the cache data with added version information, and saves the generated cache data image in the cloud disk.
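  • By way of illustration only, the version stamping and packing in steps S350 to S370 above could look like the following Python sketch; the header layout, the checksum, and the image path are assumptions made for this sketch and are not details taken from the application.

```python
import hashlib
import struct


def pack_cache_image(encoded_data: bytes, version: int, image_path: str) -> None:
    """Prepend version information and a checksum to the encoded cache data
    and write the result out as a cache data image (e.g., on a cloud disk)."""
    checksum = hashlib.sha256(encoded_data).digest()
    # Hypothetical header: magic number, version number, SHA-256 checksum.
    header = struct.pack(">I Q 32s", 0xCAFE0001, version, checksum)
    with open(image_path, "wb") as image:
        image.write(header + encoded_data)


def verify_cache_image(image_path: str, expected_version: int) -> bytes:
    """Loader-side check before loading: verify the version first, then
    verify the data against its checksum (cf. step S420)."""
    with open(image_path, "rb") as image:
        magic, version, checksum = struct.unpack(">I Q 32s", image.read(44))
        data = image.read()
    if magic != 0xCAFE0001 or version != expected_version:
        raise ValueError("cache image version mismatch")
    if hashlib.sha256(data).digest() != checksum:
        raise ValueError("cache image corrupted")
    return data
```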
  • the master node or metadata service triggers metadata access, and the loader reads the cache data image of the metadata from the cloud disk and puts it into the loader's local hard disk.
  • the computing node reads data from the loader's local hard disk to the computing node's local hard disk.
  • the loader's local hard disk and the computing node's local hard disk are in the same server.
  • the computing node here is a stateless computing node.
  • data extraction extracts the required metadata from the metadata service and caches it to the cloud disk.
  • the loader first loads the target metadata into the local memory of the loader.
  • the target metadata is then loaded into the local memory of the computing node, thereby reducing the access pressure of the metadata service and solving the problem that all nodes in the prior art need to access the metadata service, resulting in high metadata service load;
  • the target metadata is classified, pre-calculated and encoded, which speeds up the subsequent query speed for cached data.
  • the metadata is classified into cold and hot data, making offline caching possible.
  • the offline cache is made into a data mirror and can be directly mounted through the operating system without the need for special hardware devices.
  • the metadata is pre-calculated to improve the query speed of cached data.
  • the query keywords of the metadata are hashed with corresponding multi-keywords, and the metadata is stored using a dedicated data structure.
  • the dedicated data structure is designed for cached data in the memory, classified and stored, and the generated data structure after encoding.
  • the metadata is specially encoded to improve the security and loading speed of the metadata.
  • the full binary encoding can be directly loaded into the memory as a whole block. Data verification is added to the dedicated data structure to ensure the correctness of the data.
  • the metadata cache generation method provided by this implementation reduces the load of the metadata service, reduces the network transmission bandwidth required for metadata, reduces the latency of metadata queries, improves the overall performance of the database cluster, and increases the number of physical nodes that can be supported by the database cluster.
  • A metadata cache access method is provided; as shown in FIG. 4, the method includes the following steps:
  • Step S410 mounting the cached data image to the local environment.
  • the system mounts the cache data image to the local environment, where the system can be an operating system of the eMPP architecture.
  • the offline cache is made into a data image and can be directly mounted through the operating system without the need for special hardware devices.
  • Step S420 The loader reads the cached data image of the metadata.
  • The loader loads the cached data image from the cloud disk into the loader's memory; that is, the cached data image in the cloud disk is loaded onto the local hard disk of the loader.
  • The loader first verifies the metadata version; after the version check passes, the loader verifies the metadata itself.
  • the loader reads the binary file in the cloud disk and saves it in the loader's memory.
  • the loader reads the metadata through the I/O link of the cloud disk, which does not occupy the network bandwidth and reduces the data access pressure of the system.
  • Step S430 the loader connects the cache data in its memory to the local memory of the computing node.
  • Step S440 The computing node obtains the corresponding metadata from its local memory.
  • When a computing node needs metadata, if the metadata has already been loaded into the computing node's local memory, the computing node can directly read the corresponding memory to obtain the metadata.
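  • To make the mounting in step S410 above concrete, an offline cache image can be attached through the operating system with an ordinary loop mount, as in the hedged Python sketch below; the image path, mount point, and use of the Linux mount command are illustrative assumptions (the application only states that no special hardware device is required).

```python
import subprocess


def mount_cache_image(image_path: str, mount_point: str) -> None:
    """Mount an offline cache data image through the operating system.

    A plain read-only loop mount is used, so no special hardware device
    is needed and readers cannot modify the image itself.
    """
    subprocess.run(["mount", "-o", "loop,ro", image_path, mount_point], check=True)


def unmount_cache_image(mount_point: str) -> None:
    subprocess.run(["umount", mount_point], check=True)


# Example (paths are hypothetical):
# mount_cache_image("/cloud-disk/metadata_cache.img", "/mnt/metadata_cache")
```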
  • Step S510: When the loader receives an instruction to add a new computing node, the loader locates the memory address at which the loader caches the data.
  • Step S520 the loader notifies the computing node and connects the memory address of the loader that caches the data to the local memory of the new computing node.
  • After this, the new computing node can work normally.
  • When a computing node is destroyed, recycled, or exits abnormally, its cached-memory connection is automatically disconnected by the operating system without additional processing.
  • the creation, destruction and recycling of computing nodes do not require special processing of the cache and do not occupy any computing resources; the exit of the computing node in an uncontrollable state does not affect the cache itself and does not require special processing of the cache.
  • A metadata cache dynamic update method is provided; as shown in FIG. 6, it includes the following steps:
  • Step S610 When the metadata is updated, the metadata service notifies the cache data generator, and the cache data generator generates a cache data image of the new metadata and notifies the loader.
  • the metadata service detects that there is a new version of metadata, the metadata service notifies the cache data generator, the cache data generator generates a cache data image of the new metadata, and notifies the loader.
  • Step S620 The loader re-reads the cache data image of the new metadata and generates a new metadata memory cache.
  • the loader re-reads the cache data image of the new metadata from the cloud disk and generates a loader memory cache of the new metadata.
  • Step S630 The loader checks all computing nodes and updates them according to the status of the computing nodes.
  • If a computing node still has tasks being executed, the loader waits; when the computing node completes its current task, the loader notifies it to disconnect from the current metadata memory cache and reconnect to the new memory cache. If the computing node currently has no tasks or is about to execute a new task, the loader notifies it to disconnect from the current metadata memory cache and reconnect to the new metadata memory cache. This enables dynamic updates of the metadata cache.
  • Step S640 After all computing nodes complete the cache connection update, the computing nodes delete their old memory caches and corresponding image files, and the loaders delete their old memory caches and corresponding image files.
  • the loader determines whether to connect to the loader memory cache of the new metadata according to the status of the computing node, thereby realizing dynamic update of the metadata cache, and the execution of the tasks of the computing node is not affected during the update process.
  • the cache data image is dynamically updated with the metadata version.
  • the cluster does not need to be shut down during the dynamic update, and the currently executing tasks are not affected.
  • the cache data image is dynamically rolled over and switched, and the memory and disk space occupied by the old cache data image will be recovered in time.
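  • The dynamic update flow described in steps S610 to S640 above could be sketched in Python roughly as follows; the node-side methods (has_running_task, connect_cache, etc.) and the cleanup call are hypothetical, and in a real cluster the loader would coordinate with computing nodes over RPC rather than direct method calls.

```python
import os


class Loader:
    def __init__(self, nodes):
        self.nodes = nodes            # computing nodes served by this loader
        self.current_image = None     # path of the cache data image in use
        self.current_cache = None     # in-memory cache built from that image

    def on_new_metadata_version(self, new_image_path):
        """S620: re-read the new cache data image and build a new memory cache."""
        new_cache = self._load_image(new_image_path)

        # S630: switch each computing node according to its execution state.
        for node in self.nodes:
            while node.has_running_task():
                node.wait_for_current_task()      # keep the old cache until the node is idle
            node.disconnect_cache(self.current_cache)
            node.connect_cache(new_cache)

        # S640: every node has switched, so the old cache and image can be reclaimed.
        old_image, self.current_image = self.current_image, new_image_path
        self.current_cache = new_cache
        if old_image is not None:
            os.remove(old_image)                  # free the disk space of the old image

    def _load_image(self, path):
        with open(path, "rb") as f:
            return f.read()
```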
  • a metadata access device is also provided, which is used to implement the above embodiments and implementation methods, and the descriptions that have been made will not be repeated.
  • The terms "module", "unit", "sub-unit", etc. used below may refer to a combination of software and/or hardware that implements a predetermined function.
  • Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and conceivable.
  • FIG. 7 is a structural block diagram of a metadata access device according to an embodiment of the present application. As shown in FIG. 7 , the device includes:
  • the first loading module 710 is used to extract the target metadata from the metadata service through the data extractor and cache it to the cloud disk, and the loader loads the cached data of the target metadata from the cloud disk into the first memory of the loader;
  • the target metadata is metadata in the metadata service whose data update frequency is lower than the first preset value and whose data access frequency is higher than the second preset value;
  • the access module 720 is used to load the cache data of the target metadata in the first memory into the second memory of the slave node corresponding to the master node in response to the metadata access request of the master node; the slave node reads the second memory to access the target metadata.
  • modules can be functional modules or program modules, and can be implemented by software or hardware.
  • the above modules can be located in the same processor; or the above modules can be located in different processors in any combination.
  • This embodiment also provides an electronic device, including a memory and a processor, wherein the memory stores a computer program, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
  • the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
  • The processor may be configured to perform the following steps through a computer program: extracting the target metadata from the metadata service through the data extractor and caching it to the cloud disk, with the loader loading the cached data of the target metadata from the cloud disk into the first memory of the loader, where the target metadata is metadata in the metadata service whose data update frequency is lower than a first preset value and whose data access frequency is higher than a second preset value; and, in response to a metadata access request from the master node, loading the cached data of the target metadata in the first memory into the second memory of the slave node corresponding to the master node, with the slave node reading the second memory to access the target metadata.
  • a storage medium may be provided in this embodiment to implement the method.
  • the storage medium stores a computer program; when the computer program is executed by a processor, the steps of any metadata access method in the above embodiment are implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A metadata access method and device, and a storage medium. The method comprises: a data extractor extracting target metadata from a metadata service and caching same to a cloud disk, and a loader loading cached data of the target metadata into a first memory of the loader from the cloud disk, wherein the target metadata is metadata having the data updating frequency lower than a first preset value and the data access frequency higher than a second preset value in the metadata service; in response to a metadata access request of a master node, loading the cached data of the target metadata in the first memory into a second memory of a slave node corresponding to the master node; and the slave node reading the second memory to access the target metadata.

Description

A metadata access method, device and storage medium
Related Applications
This application claims priority to the Chinese patent application filed on November 14, 2022, with application number 202211418015.0 and invention name "A metadata access method, device and storage medium", the entire contents of which are incorporated into this application by reference.
Technical Field
The present application relates to the field of data processing, and in particular to a metadata access method, device, and storage medium.
Background Art
The traditional metadata access method works as follows: the user's access instruction is sent to a distribution node; the distribution node connects to a master node and forwards the instruction to it; after receiving the instruction, the master node pulls metadata from the metadata service to parse the instruction, starts a computing node, and sends the instruction to it; after the computing node starts, it receives the instruction, pulls metadata from the metadata service, and processes the instruction; the computing node then returns the processing result, exits, and is destroyed to release computing resources. In this traditional metadata access method, all nodes (distribution nodes, master nodes, and slave nodes) need to access the metadata service. The data volume is large, which occupies substantial network bandwidth, increases network overhead and cost, greatly increases the load on the metadata service, leads to poor database performance, and requires more resources to be allocated to the metadata service. The high load on the metadata service limits the maximum number of nodes, which in turn limits the performance of the entire database cluster.
No effective solution has yet been proposed for the problem in traditional technology that all nodes need to access the metadata service, resulting in a high metadata service load.
Summary of the Invention
According to various embodiments of the present application, a metadata access method, device, and storage medium are provided.
In a first aspect, this embodiment provides a metadata access method, the method comprising:
extracting target metadata from a metadata service through a data extractor and caching it to a cloud disk, and a loader loading the cached data of the target metadata from the cloud disk into a first memory of the loader, wherein the target metadata is metadata in the metadata service whose data update frequency is lower than a first preset value and whose data access frequency is higher than a second preset value; and
in response to a metadata access request from a master node, loading the cached data of the target metadata in the first memory into a second memory of a slave node corresponding to the master node, where the slave node reads the second memory to access the target metadata.
In some of these embodiments, before the cache data of the target metadata is loaded into the first memory, the method includes:
acquiring the target metadata from the metadata service; and
generating the cache data of the target metadata according to the acquired target metadata.
In some of these embodiments, generating the cache data of the target metadata according to the acquired target metadata includes:
classifying the target metadata according to the data type of the target metadata to obtain type information;
performing feature extraction on the target metadata;
encoding the extracted features and the classified target metadata to obtain encoded data; and
generating the cache data according to the encoded data and the type information.
In some of these embodiments, generating the cache data of the target metadata according to the acquired target metadata includes:
acquiring version information of the target metadata, and generating the cache data according to the version information.
In some of these embodiments, the method further includes:
in response to a data update instruction sent by the metadata service, acquiring the execution state of the task of the slave node, where the data update instruction is used to instruct updating of the cache data of the target metadata stored in the first memory; and
according to the execution state, selecting from the first memory the cache data of the target metadata corresponding to the execution state and loading it into the second memory of the slave node.
In some of these embodiments, selecting from the first memory, according to the execution state, the cache data of the target metadata corresponding to the execution state and loading it into the second memory of the slave node includes:
when the execution state indicates that the slave node is executing a task, selecting from the first memory the cache data of the target metadata from before the update and loading it into the second memory of the slave node; and
when the execution state indicates that the slave node is idle, selecting from the first memory the updated cache data of the target metadata and loading it into the second memory of the slave node.
In some of these embodiments, the method further includes:
in response to an access request from a slave node, connecting the address of the first memory to the slave node to be accessed.
In some of these embodiments, the slave node is a stateless computing node.
In a second aspect, this embodiment provides a metadata access device, the device comprising:
a first loading module, configured to extract target metadata from a metadata service through a data extractor and cache it to a cloud disk, with a loader loading the cached data of the target metadata from the cloud disk into a first memory of the loader, where the target metadata is metadata in the metadata service whose data update frequency is lower than a first preset value and whose data access frequency is higher than a second preset value; and
an access module, configured to load, in response to a metadata access request from a master node, the cached data of the target metadata in the first memory into a second memory of a slave node corresponding to the master node, where the slave node reads the second memory to access the target metadata.
In a third aspect, this embodiment provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the metadata access method described in the first aspect are implemented.
Details of one or more embodiments of the present application are set forth in the following drawings and description to make other features, objects, and advantages of the present application more readily apparent.
Brief Description of the Drawings
The drawings described herein are provided to give a further understanding of the present application and constitute a part of it; the illustrative embodiments of the present application and their descriptions are used to explain the present application and do not unduly limit it.
FIG. 1 is a hardware structure block diagram of a terminal executing a metadata access method according to one or more embodiments of the present application.
FIG. 2 is a flowchart of a metadata access method according to one or more embodiments of the present application.
FIG. 3 is a flowchart of a metadata cache generation method according to one or more embodiments of the present application.
FIG. 4 is a flowchart of a metadata cache access method according to one or more embodiments of the present application.
FIG. 5 is a flowchart of a computing node updating method according to one or more embodiments of the present application.
FIG. 6 is a flowchart of a metadata cache dynamic update method according to one or more embodiments of the present application.
FIG. 7 is a structural block diagram of a metadata access device according to one or more embodiments of the present application.
Detailed Description of Embodiments
To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described and illustrated below in conjunction with the accompanying drawings and embodiments.
Unless otherwise defined, the technical or scientific terms used in this application have the ordinary meaning understood by a person of ordinary skill in the technical field to which this application belongs. Words such as "a", "an", "the", and "these" do not indicate a limit on quantity and may be singular or plural. The terms "include", "comprise", "have", and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device comprising a series of steps or modules (units) is not limited to the listed steps or modules (units), but may include unlisted steps or modules (units), or other steps or modules (units) inherent to such processes, methods, products, or devices. Words such as "connect", "connected", and "coupled" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Multiple" in this application means two or more. "And/or" describes an association between related objects and indicates that three relationships may exist: for example, "A and/or B" may mean that A exists alone, that A and B exist at the same time, or that B exists alone. Generally, the character "/" indicates an "or" relationship between the associated objects. The terms "first", "second", "third", etc. are used only to distinguish similar objects and do not represent a specific ordering of objects.
For ease of understanding, explanations of concepts related to the present application are given below by way of example for reference; it should be understood that these explanations also form part of the embodiments of the present application, as follows:
1. Elastic distributed computing
Computing nodes based on a distributed cluster can be dynamically created and destroyed on demand; distributed computing node resources do not need to be created in advance.
2. Stateless computing nodes
Stateless computing nodes do not store any cluster information or data, and their creation and destruction have no impact on the distributed cluster. All slave nodes involved in this application are stateless computing nodes, and their states are all stored in the metadata service.
3. Metadata
Metadata is data used to describe and execute user data, queries, and operations in a database; the metadata involved in this application is stored independently. Metadata is key database data: once it is damaged, the database stops serving and cannot be recovered.
4. Distributed database
The distributed database involved in this application is a distributed database with storage-compute separation in a master-segment node architecture. The master node is responsible for receiving and parsing user instructions (queries); the slave nodes are stateless computing nodes in the eMPP (elastic Massive Parallel Processing) architecture, responsible for processing user instructions, reading and processing data, and returning the results to the master node. The typical order of magnitude is one master node and thousands of slave nodes.
5. Metadata storage/access
The metadata storage and access involved in this application are performed on the same node; that is, metadata is stored in a unified manner as key data to ensure its security. All nodes (including all master and slave nodes) need to access metadata. The metadata service system refers to a database-like service system that can provide unified metadata services for a distributed database, including storage, query, modification, and insertion.
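For illustration only, the unified metadata service described above could be modelled as a small client-side interface such as the following Python sketch; the class and method names (MetadataService, store, query, modify, insert) are hypothetical and are not taken from the application.

```python
from typing import Any, Dict, List, Protocol


class MetadataService(Protocol):
    """Hypothetical client-side view of a unified metadata service.

    It exposes the four operations mentioned above: storage, query,
    modification, and insertion of metadata records.
    """

    def store(self, key: str, record: Dict[str, Any]) -> None: ...

    def query(self, key: str) -> Dict[str, Any]: ...

    def modify(self, key: str, changes: Dict[str, Any]) -> None: ...

    def insert(self, records: List[Dict[str, Any]]) -> None: ...
```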
6. User instruction (query)
Users add, delete, query, and modify stored data. After receiving a user instruction, the database master node parses and translates it into machine language and hands it to the computing nodes for processing. Metadata is required throughout the entire process (parsing, translation, processing).
The method embodiments provided herein can be executed on a terminal, a computer, or a similar computing device. Taking execution on a terminal as an example, FIG. 1 is a hardware structure block diagram of a terminal executing a metadata access method according to an embodiment of the present application.
As shown in FIG. 1, the terminal may include one or more processors 102 (only one is shown in FIG. 1) and a memory 104 for storing data, where the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA). The terminal may also include a transmission device 106 and an input/output device 108 for communication. A person of ordinary skill in the art will understand that the structure shown in FIG. 1 is illustrative only and does not limit the structure of the terminal; for example, the terminal may include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
The memory 104 may be used to store computer programs, for example software programs and modules of application software, such as a computer program corresponding to a metadata access method in this embodiment; the processor 102 executes various functional applications and data processing, that is, implements the above method, by running the computer program stored in the memory 104. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102; such remote memory may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof. The transmission device 106 is used to receive or send data via a network; the network may include a wireless network provided by the terminal's communication provider. In one example, the transmission device 106 includes a network interface controller (NIC), which can connect to other network devices through a base station so as to communicate with the Internet. In another example, the transmission device 106 may be a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.
This embodiment provides a metadata access method. FIG. 2 is a flowchart of a metadata access method according to an embodiment of the present application; as shown in FIG. 2, the process includes the following steps:
Step S210: Extract the target metadata from the metadata service through the data extractor and cache it to the cloud disk; the loader loads the cached data of the target metadata from the cloud disk into the first memory of the loader. The target metadata is metadata in the metadata service whose data update frequency is lower than a first preset value and whose data access frequency is higher than a second preset value.
Specifically, this method can be applied to the eMPP architecture, and further to an eMPP-based distributed database with storage-compute separation. The loader loads the cached data of the target metadata into the first memory, where the target metadata is metadata in the metadata service whose data update frequency is lower than the first preset value and whose data access frequency is higher than the second preset value; the first memory here is the local memory of the loader, and the loader can load the cached data of the target metadata from the cloud disk into its local memory. More specifically, according to how active the metadata is, it can be divided into hot data and cold data, where cold data can be defined as metadata with a low update frequency and a high access frequency, and hot data as metadata with a high update frequency and a low access frequency. Cold data is frequently accessed by computing nodes but updated infrequently; it can therefore be extracted from the metadata service as the target metadata to generate a cache data image, and the computing nodes obtain the corresponding target metadata from the cache data image without accessing the metadata service, which greatly saves network bandwidth and reduces the complexity of the metadata service.
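As a minimal illustration of the cold-data selection rule just described, the Python sketch below selects target metadata by comparing per-entry update and access frequencies against the two preset thresholds; the statistics source and field names are assumptions made for this sketch, not part of the application.

```python
from dataclasses import dataclass


@dataclass
class MetadataStats:
    key: str
    update_freq: float  # updates per unit time, as tracked by the metadata service
    access_freq: float  # accesses per unit time


def select_target_metadata(stats, first_preset, second_preset):
    """Return the keys of 'cold' metadata: rarely updated but frequently accessed.

    An entry qualifies as target metadata when its update frequency is lower
    than the first preset value and its access frequency is higher than the
    second preset value.
    """
    return [
        s.key
        for s in stats
        if s.update_freq < first_preset and s.access_freq > second_preset
    ]


# Example: with thresholds of 1 update/hour and 100 accesses/hour, only
# "tables" and "columns" would be cached, while "locks" stays in the service.
stats = [
    MetadataStats("tables", update_freq=0.1, access_freq=500),
    MetadataStats("columns", update_freq=0.2, access_freq=800),
    MetadataStats("locks", update_freq=50, access_freq=50),
]
print(select_target_metadata(stats, first_preset=1, second_preset=100))
```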
示例性地,这里的加载器可以为eMPP架构自带的加载器。Exemplarily, the loader here may be a loader provided by the eMPP architecture.
Step S220: in response to a metadata access request from a master node, the cached data of the target metadata in the first memory is loaded into a second memory of a slave node corresponding to the master node; the slave node reads the second memory to access the target metadata.
Specifically, in response to the metadata access request from the master node, the loader loads the cached data of the target metadata in the first memory into the second memory of the slave node corresponding to the master node, and the slave node reads the second memory to access the target metadata; here the second memory may be the slave node's local memory. The slave node is a stateless computing node, and it can access the cached data of the target metadata only after the master node notifies the loader. The loader responds to the master node's metadata access request, which carries information about the slave node that needs to access the target metadata, loads the cached data of the target metadata from the loader's local memory into the local memory of the corresponding slave node, and the slave node reads the target metadata from its own local memory to access it.
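The request flow of step S220 can be sketched as follows; the class and method names (Loader, handle_access_request, the per-slave registry) are illustrative assumptions, not an API defined by the application.

    class Loader:
        def __init__(self):
            self.first_memory: dict[str, bytes] = {}                # loader-local cache of target metadata
            self.slave_memories: dict[str, dict[str, bytes]] = {}   # per-slave "second memory"

        def handle_access_request(self, request: dict) -> None:
            """The master node's request names the slave node and the metadata keys it needs."""
            slave_id = request["slave_id"]
            second_memory = self.slave_memories.setdefault(slave_id, {})
            for key in request["keys"]:
                if key in self.first_memory:
                    # Load the cached target metadata from the first memory
                    # into the requesting slave's second memory.
                    second_memory[key] = self.first_memory[key]

    def read_from_second_memory(loader: Loader, slave_id: str, key: str) -> bytes | None:
        # The slave node accesses the target metadata by reading its own memory only.
        return loader.slave_memories.get(slave_id, {}).get(key)

The point of the sketch is that the slave node never contacts the metadata service directly; it only reads what the loader has placed in its memory.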
In this embodiment, after the target metadata among the metadata has been cached to the cloud disk, when a slave node needs to access the target metadata, the loader first loads the target metadata into the loader's local memory, and when the slave node needs the corresponding target metadata, it is then loaded into the slave node's local memory. This reduces the access pressure on the metadata service and solves the problem in the prior art that every node has to access the metadata service, resulting in a high metadata service load.
In some of these embodiments, before loading the cached data of the target metadata into the first memory, the method includes: acquiring the target metadata from the metadata service, and generating the cached data of the target metadata according to the acquired target metadata.
Specifically, the target metadata is extracted from the metadata service, and the cached data of the target metadata is generated according to the acquired target metadata.
In some of these embodiments, generating the cached data of the target metadata according to the acquired target metadata includes: classifying the target metadata according to its data type to obtain type information; performing feature extraction on the classified target metadata; encoding the extracted features and the classified target metadata to obtain encoded data; and generating the cached data according to the encoded data and the type information.
Specifically, feature extraction is performed on the classified target metadata to obtain feature values, the extracted feature values and the classified target metadata are encoded together to obtain encoded data, and the cached data is generated according to the encoded data and the type information; the cached data includes the encoded data and the type information of the target metadata, and a corresponding cached data image is generated and stored on the cloud disk. Exemplarily, the encoding here may be binary encoding, so that the resulting encoded data matches the data structures used in the loader's local memory and the slave nodes' local memory. Classifying the target metadata and extracting feature values speeds up queries against the target metadata on the cloud disk.
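A minimal sketch of this classify / extract-features / encode pipeline is given below; the field names, the SHA-256 feature hash and the use of pickle as the serialization step are assumptions made for the example (the application's own dedicated binary format is not specified here).

    import hashlib
    import pickle

    def classify(entry: dict) -> str:
        # Type information derived from the metadata's own data type;
        # the category labels used here are illustrative.
        return entry.get("data_type", "unknown")

    def extract_feature(entry: dict) -> str:
        # Feature value computed with a hash over the entry's fields,
        # used later to speed up lookups in the cached image.
        digest = hashlib.sha256(repr(sorted(entry.items())).encode())
        return digest.hexdigest()

    def generate_cache_data(entries: list[dict]) -> bytes:
        records = []
        for entry in entries:
            type_info = classify(entry)
            feature = extract_feature(entry)
            records.append({"type": type_info, "feature": feature, "metadata": entry})
        # Binary encoding so the image can be loaded into memory as one block;
        # pickle stands in for the application's dedicated data structure.
        return pickle.dumps(records)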
In some of these embodiments, generating the cached data of the target metadata according to the acquired target metadata includes: acquiring version information of the target metadata, and generating the cached data according to the version information.
Specifically, the version information of the target metadata is acquired, version information is generated for the cached data, a corresponding cached data image is generated from the cached data carrying the version information, and the image is stored on the cloud disk.
In some of these embodiments, the metadata access method further includes a dynamic cache update process, which includes: in response to a data update instruction sent by the metadata service, acquiring the execution state of the slave node's task, where the data update instruction indicates that the cached data of the target metadata stored in the first memory is to be updated; and, according to the execution state, selecting from the first memory the cached data of the target metadata corresponding to that execution state and loading it into the slave node's second memory.
Specifically, in response to the data update instruction sent by the metadata service, the loader acquires the execution state of the slave node's task. When the execution state indicates that the slave node is executing a task, the pre-update cached data of the target metadata is selected from the first memory and loaded into the slave node's second memory; when the execution state indicates that the slave node is idle, the updated cached data of the target metadata is selected from the first memory and loaded into the slave node's second memory. After all slave nodes have loaded the updated cached data of the target metadata from the first memory into their second memory, that is, after all slave nodes have completed the cache connection update, the slave nodes delete the old in-memory cache and the system deletes the image file of the old cached data.
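The version-selection rule can be written down directly; the enum names and the two-version layout of the first memory below are assumptions made for the sketch, not structures defined by the application.

    from enum import Enum

    class ExecutionState(Enum):
        RUNNING = "running"   # slave node is still executing a task
        IDLE = "idle"         # slave node has no task in flight

    def select_cache_for_slave(first_memory: dict, state: ExecutionState) -> dict:
        """Pick which cached copy of the target metadata a slave node should see
        while an update is in progress."""
        if state is ExecutionState.RUNNING:
            # Keep serving the pre-update cache so the running task is not disturbed.
            return first_memory["previous_version"]
        # Idle nodes switch to the updated cache immediately.
        return first_memory["current_version"]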
The embodiments of the present application are described and illustrated below through one or more embodiments.
In one or more of these embodiments, a metadata cache generation method is provided. As shown in FIG. 3, the method includes the following steps:
Step S310: the data extractor extracts the required metadata from the metadata service.
Specifically, the required metadata here is the target metadata, that is, the cold data. The data extractor here may be the module that implements the data extraction function in the eMPP architecture.
Step S320: the cache data generator classifies the metadata according to the attributes of the data inside the metadata.
Specifically, the metadata is classified according to the attributes of the data inside each individual piece of metadata. The cache data generator here may be the module that implements the cache data generation function in the eMPP architecture.
Step S330: the cache data generator performs pre-computation on the metadata.
Specifically, the pre-computation here includes scanning the extracted, classified metadata and computing feature values according to feature classes defined by the system; the feature values may be computed with a hash algorithm and can be used to characterize the type of the metadata.
Step S340: the cache data generator encodes the extracted metadata.
Specifically, the cache data generator performs binary encoding on the classified metadata and the computed feature values to ensure that they match the in-memory data structures. The data extractor hands the extracted data to the cache data generator for encoding, which improves loading speed and query speed.
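One way such a fixed binary layout could look is sketched below; the record format (class-id and length header, a 16-byte hash as the precomputed feature value) is an assumption chosen for the example, not the format actually used by the application.

    import hashlib
    import struct

    def encode_record(class_id: int, metadata_blob: bytes) -> bytes:
        """Pack one classified metadata record together with its precomputed
        feature value into a fixed binary layout that can be loaded into memory
        as a whole block."""
        feature = hashlib.md5(metadata_blob).digest()               # 16-byte feature value
        header = struct.pack("<HI", class_id, len(metadata_blob))   # class id + payload length
        return header + feature + metadata_blob

    def decode_record(buffer: bytes, offset: int = 0):
        class_id, length = struct.unpack_from("<HI", buffer, offset)
        offset += struct.calcsize("<HI")
        feature = buffer[offset:offset + 16]
        offset += 16
        payload = buffer[offset:offset + length]
        return class_id, feature, payload, offset + length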
Step S350: version verification is performed on the metadata, and version information is generated.
Step S360: the processed metadata is packaged to generate the cached data.
Step S370: the cache data generator adds version information to the packaged cached data, generates a corresponding cached data image, and stores it on the cloud disk.
Specifically, the cache data generator adds version information to the packaged cached data, generates a cached data image from the cached data carrying the version information, and stores the generated cached data image on the cloud disk. When the master node or the metadata service later triggers a metadata access, the loader reads the cached data image of the metadata from the cloud disk onto the loader's local disk, and the computing node reads the data from the loader's local disk onto the computing node's local disk; the loader's local disk and the computing node's local disk are located in the same server. The computing node here is a stateless computing node.
In this embodiment, the data extractor extracts the required metadata from the metadata service and caches it to the cloud disk; when a computing node needs to access the target metadata, the loader first loads it into the loader's local memory, and when the computing node needs the corresponding target metadata, it is then loaded into the computing node's local memory. This reduces the access pressure on the metadata service and solves the problem in the prior art that every node has to access the metadata service, resulting in a high metadata service load; classifying, pre-computing and encoding the target metadata also speeds up subsequent queries against the cached data. Classifying metadata into cold and hot data makes offline caching possible: the offline cache is built as a data image that can be mounted directly by the operating system without any special hardware. Pre-computation over the metadata improves the speed of queries against the cached data. The metadata's query keywords are given corresponding multi-keyword hash encodings, and the metadata is stored in a dedicated data structure designed for in-memory cached data, that is, the data structure produced after classification, storage and encoding. The special encoding of the metadata improves both its security and its loading speed: the fully binary encoding can be loaded into memory as a whole block, and data verification is added to the dedicated data structure to guarantee data correctness. In short, the metadata cache generation method provided by this embodiment reduces the load on the metadata service, reduces the network transmission bandwidth required for metadata, reduces metadata query latency, improves the overall performance of the database cluster, and increases the number of physical nodes the database cluster can support.
In one or more of these embodiments, a metadata cache access method is provided. As shown in FIG. 4, the method includes the following steps:
Step S410: the cached data image is mounted into the local environment.
Specifically, the system mounts the cached data image into the local environment; the system here may be the operating system of the eMPP architecture. The offline cache is built as a data image and can be mounted directly by the operating system, without any special hardware.
Step S420: the loader reads the cached data image of the metadata.
Specifically, the loader loads the cached data image from the cloud disk into the loader's memory, that is, the loader reads the cached data image from the cloud disk onto the loader's local disk. The loader first performs version verification on the metadata, and once the version is correct, it verifies the metadata itself. The loader reads the binary file from the cloud disk and keeps it in the loader's memory. The loader reads the metadata over the cloud disk's I/O link, which consumes no network bandwidth and reduces the data access pressure on the system.
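A sketch of how the loader might read and verify an image as described in step S420 follows; the file layout (a small header carrying the version and a SHA-256 checksum ahead of the binary payload) is an assumption introduced for this example.

    import hashlib
    import struct

    EXPECTED_VERSION = 3  # illustrative: the version the loader has been told to expect

    def load_cache_image(path: str) -> bytes:
        with open(path, "rb") as f:
            raw = f.read()
        # Assumed header: 4-byte version, 32-byte SHA-256 checksum, then the payload.
        version, = struct.unpack_from("<I", raw, 0)
        if version != EXPECTED_VERSION:
            raise ValueError(f"cache image version {version} does not match {EXPECTED_VERSION}")
        checksum = raw[4:36]
        payload = raw[36:]
        if hashlib.sha256(payload).digest() != checksum:
            raise ValueError("cache image failed data verification")
        # The verified binary payload is kept in the loader's memory as one block.
        return payload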
Step S430: the loader connects the cached data in its memory to the computing node's local memory.
Step S440: the computing node obtains the corresponding metadata from its local memory.
Specifically, when a computing node needs metadata and that metadata has already been loaded into the computing node's local memory, the computing node can obtain the corresponding metadata simply by reading that memory.
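Steps S430 and S440 amount to sharing a single in-memory copy between the loader and the computing nodes. The sketch below uses Python's multiprocessing.shared_memory as a stand-in for however the memory connection is actually established; the segment name and the shared-segment approach are assumptions made for illustration.

    from multiprocessing import shared_memory

    def publish_cache(payload: bytes, name: str = "metadata_cache_v1") -> shared_memory.SharedMemory:
        """Loader side: place the verified cache payload into a named shared segment."""
        segment = shared_memory.SharedMemory(name=name, create=True, size=len(payload))
        segment.buf[:len(payload)] = payload
        return segment

    def read_cache(name: str = "metadata_cache_v1") -> bytes:
        """Computing-node side: attach to the loader's segment and read the metadata locally."""
        segment = shared_memory.SharedMemory(name=name)
        try:
            # A real deployment would record the payload length, since the
            # segment may be padded; the raw bytes suffice for this sketch.
            return bytes(segment.buf)
        finally:
            segment.close()  # detach; the loader keeps ownership of the segment

This mirrors the point made below: each physical environment holds one copy, and every computing node merely attaches to it.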
In this embodiment, when a computing node needs to access target metadata, the loader first loads the target metadata into the loader's local memory, and when the computing node needs the corresponding target metadata, it is then loaded into the computing node's local memory, which reduces the access pressure on the metadata service and solves the problem in the prior art that every node has to access the metadata service, resulting in a high metadata service load. The cache is controlled independently by the loader, and each physical environment needs only one in-memory copy, which saves a great deal of memory (each physical environment may have more than 100 computing nodes).
In one or more of these embodiments, a computing node update method is provided. As shown in FIG. 5, the method includes the following steps:
Step S510: when the loader receives an instruction indicating that a new computing node has joined, the loader connects to the memory address of the loader where the cached data resides.
Step S520: the loader notifies the computing node and connects the memory address of the loader holding the cached data to the new computing node's local memory.
Specifically, once the memory address of the loader holding the cached data has been connected to the new computing node's local memory, the new computing node can work normally. When a computing node is destroyed, recycled or exits abnormally, its memory connection to the cache is disconnected automatically by the operating system, without any extra handling.
In this embodiment, creating, destroying and recycling computing nodes requires no special handling of the cache and consumes no computing resources; a computing node exiting in an uncontrolled state does not affect the cache itself and requires no special handling of the cache.
In one or more of these embodiments, a metadata cache dynamic update method is provided. As shown in FIG. 6, the method includes the following steps:
Step S610: when the metadata is updated, the metadata service notifies the cache data generator, and the cache data generator generates a cached data image of the new metadata and notifies the loader.
Specifically, when the metadata service detects a new version of the metadata, it notifies the cache data generator, which generates a cached data image of the new metadata and notifies the loader.
Step S620: the loader re-reads the cached data image of the new metadata and generates a new in-memory metadata cache.
Specifically, the loader re-reads the cached data image of the new metadata from the cloud disk and generates a new loader in-memory cache of the metadata.
Step S630: the loader checks all computing nodes and updates them according to each node's state.
Specifically, if a computing node still has a task in progress, the loader waits; when the computing node finishes its current task, the loader notifies it to disconnect from the current metadata memory cache and reconnect to the new memory cache. If a computing node currently has no task or is about to execute a new task, the loader notifies it to disconnect from the current metadata memory cache and connects it to the new metadata memory cache. In this way, the metadata cache is updated dynamically.
Step S640: after all computing nodes have completed the cache connection update, the computing nodes delete their old memory caches and the corresponding image files, and the loader deletes its old memory cache and the corresponding image file.
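The rolling switch over all computing nodes can be sketched as follows; the node interface used here (is_busy, wait_until_idle, disconnect_cache, connect_cache, delete_old_cache) is assumed for the example and does not correspond to a concrete API in the application.

    def rolling_cache_update(loader, nodes, new_cache_name: str) -> None:
        """Switch every computing node to the new in-memory cache without
        interrupting running tasks, then reclaim the old cache and its image file."""
        for node in nodes:
            if node.is_busy():
                # Let the running task finish against the old cache first.
                node.wait_until_idle()
            # Detach from the old cache and attach to the new one.
            node.disconnect_cache()
            node.connect_cache(new_cache_name)

        # Only after all nodes have switched are the old copies reclaimed.
        for node in nodes:
            node.delete_old_cache()
        loader.delete_old_cache_and_image()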
In this embodiment, when the metadata is updated, the loader decides, based on the state of each computing node, whether to connect that node to the loader's in-memory cache of the new metadata, so that the metadata cache is updated dynamically without affecting the execution of the computing nodes' tasks during the update. The cached data image is updated dynamically as the metadata version changes; the cluster does not need to be shut down during a dynamic update, the currently running tasks are not affected, the cached data images are switched in a rolling manner, and the memory and disk space occupied by old cached data images are reclaimed in time.
This embodiment also provides a metadata access device, which is used to implement the above embodiments and implementations; what has already been described is not repeated. The terms "module", "unit", "sub-unit" and the like used below may refer to a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
FIG. 7 is a structural block diagram of a metadata access device according to an embodiment of the present application. As shown in FIG. 7, the device includes:
a first loading module 710, configured to extract the target metadata from the metadata service through the data extractor and cache it to the cloud disk, with the loader loading the cached data of the target metadata from the cloud disk into the loader's first memory, where the target metadata is metadata in the metadata service whose data update frequency is lower than a first preset value and whose data access frequency is higher than a second preset value; and
an access module 720, configured to, in response to a metadata access request from the master node, load the cached data of the target metadata in the first memory into the second memory of the slave node corresponding to the master node, with the slave node reading the second memory to access the target metadata.
It should be noted that each of the above modules may be a functional module or a program module, and may be implemented in software or in hardware. For modules implemented in hardware, the above modules may be located in the same processor, or may be distributed across different processors in any combination.
This embodiment also provides an electronic device, including a memory and a processor, where the memory stores a computer program and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to perform the following steps through the computer program:
S1, loading the cached data of the target metadata into the first memory, where the target metadata is metadata in the metadata service whose data update frequency is lower than a first preset value and whose data access frequency is higher than a second preset value;
S2, in response to a metadata access request from the master node, loading the cached data of the target metadata in the first memory into the second memory of the slave node corresponding to the master node, with the slave node reading the second memory to access the target metadata.
It should be noted that for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.
In addition, in combination with the metadata access method provided in the above embodiments, a storage medium may also be provided in this embodiment. A computer program is stored on the storage medium; when the computer program is executed by a processor, the steps of any of the metadata access methods in the above embodiments are implemented.
It should be understood that the specific embodiments described here are only intended to explain this application, not to limit it. All other embodiments obtained by a person of ordinary skill in the art, based on the embodiments provided in this application and without creative effort, fall within the scope of protection of this application.
Obviously, the drawings are only some examples or embodiments of this application, and a person of ordinary skill in the art can also apply this application to other similar situations based on these drawings without creative effort. In addition, it can be appreciated that, although the work done in such development may be complex and lengthy, certain changes in design, manufacturing or production made by a person of ordinary skill in the art on the basis of the technical content disclosed in this application are merely conventional technical means and should not be regarded as indicating that the content disclosed in this application is insufficient.
The term "embodiment" in this application means that a specific feature, structure or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor does it mean that an embodiment is mutually exclusive with, independent of, or an alternative to other embodiments. A person of ordinary skill in the art understands, explicitly or implicitly, that the embodiments described in this application may be combined with other embodiments where there is no conflict.
The above embodiments only express several implementations of this application, and their descriptions are relatively specific and detailed, but they should not therefore be understood as limiting the scope of patent protection. It should be pointed out that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of this application, and these all fall within the scope of protection of this application. Therefore, the scope of protection of this application shall be subject to the appended claims.

Claims (10)

  1. A metadata access method, characterized in that the method comprises:
    extracting target metadata from a metadata service through a data extractor and caching it to a cloud disk, and loading, by a loader, cached data of the target metadata from the cloud disk into a first memory of the loader; wherein the target metadata is metadata in the metadata service whose data update frequency is lower than a first preset value and whose data access frequency is higher than a second preset value; and
    in response to a metadata access request from a master node, loading the cached data of the target metadata in the first memory into a second memory of a slave node corresponding to the master node; wherein the slave node reads the second memory to access the target metadata.
  2. The metadata access method according to claim 1, wherein, before loading the cached data of the target metadata into the first memory, the method comprises:
    acquiring the target metadata from the metadata service; and
    generating the cached data of the target metadata according to the acquired target metadata.
  3. The metadata access method according to claim 2, wherein generating the cached data of the target metadata according to the acquired target metadata comprises:
    classifying the target metadata according to a data type of the target metadata to obtain type information;
    performing feature extraction on the target metadata;
    encoding the extracted features and the classified target metadata to obtain encoded data; and
    generating the cached data according to the encoded data and the type information.
  4. The metadata access method according to claim 2, wherein generating the cached data of the target metadata according to the acquired target metadata comprises:
    acquiring version information of the target metadata, and generating the cached data according to the version information.
  5. The metadata access method according to claim 1, wherein the method further comprises:
    in response to a data update instruction sent by the metadata service, acquiring an execution state of a task of the slave node; wherein the data update instruction is used to indicate that the cached data of the target metadata stored in the first memory is to be updated; and
    according to the execution state, selecting from the first memory the cached data of the target metadata corresponding to the execution state and loading it into the second memory of the slave node.
  6. The metadata access method according to claim 5, wherein selecting, according to the execution state, the cached data of the target metadata corresponding to the execution state from the first memory and loading it into the second memory of the slave node comprises:
    when the execution state indicates that the slave node is executing a task, selecting from the first memory the cached data of the target metadata before the update and loading it into the second memory of the slave node; and
    when the execution state indicates that the slave node is idle, selecting from the first memory the updated cached data of the target metadata and loading it into the second memory of the slave node.
  7. The metadata access method according to claim 1, wherein the method further comprises:
    in response to an access request from a slave node, connecting an address of the first memory to the slave node to be connected.
  8. The metadata access method according to any one of claims 1 to 7, wherein the slave node is a stateless computing node.
  9. A metadata access device, characterized in that the device comprises:
    a first loading module, configured to extract target metadata from a metadata service through a data extractor and cache it to a cloud disk, with a loader loading cached data of the target metadata from the cloud disk into a first memory of the loader; wherein the target metadata is metadata in the metadata service whose data update frequency is lower than a first preset value and whose data access frequency is higher than a second preset value; and
    an access module, configured to, in response to a metadata access request from a master node, load the cached data of the target metadata in the first memory into a second memory of a slave node corresponding to the master node; wherein the slave node reads the second memory to access the target metadata.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the steps of the metadata access method according to any one of claims 1 to 8 are implemented.
PCT/CN2023/126791 2022-11-14 2023-10-26 Metadata access method and device, and storage medium WO2024104073A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211418015.0 2022-11-14
CN202211418015.0A CN115470008B (en) 2022-11-14 2022-11-14 Metadata access method and device and storage medium

Publications (1)

Publication Number Publication Date
WO2024104073A1 true WO2024104073A1 (en) 2024-05-23

Family

ID=84338079

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/126791 WO2024104073A1 (en) 2022-11-14 2023-10-26 Metadata access method and device, and storage medium

Country Status (2)

Country Link
CN (1) CN115470008B (en)
WO (1) WO2024104073A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115470008B (en) * 2022-11-14 2023-03-10 杭州拓数派科技发展有限公司 Metadata access method and device and storage medium
CN115878405A (en) * 2023-03-08 2023-03-31 杭州拓数派科技发展有限公司 PostgreSQL database memory detection method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077224A (en) * 2014-07-04 2014-10-01 用友软件股份有限公司 Software function analyzing system and method
US20170330239A1 (en) * 2016-05-13 2017-11-16 Yahoo Holdings, Inc. Methods and systems for near real-time lookalike audience expansion in ads targeting
US20200218634A1 (en) * 2019-01-08 2020-07-09 FinancialForce.com, Inc. Software development framework for a cloud computing platform
CN112955869A (en) * 2018-11-08 2021-06-11 英特尔公司 Function As A Service (FAAS) system enhancements
CN114827145A (en) * 2022-04-24 2022-07-29 阿里巴巴(中国)有限公司 Server cluster system, and metadata access method and device
CN115470008A (en) * 2022-11-14 2022-12-13 杭州拓数派科技发展有限公司 Metadata access method and device and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718484A (en) * 2014-12-04 2016-06-29 中兴通讯股份有限公司 File writing method, file reading method, file deletion method, file query method and client
CN105988721A (en) * 2015-02-10 2016-10-05 中兴通讯股份有限公司 Data caching method and apparatus for network disk client
CN105279240B (en) * 2015-09-28 2018-07-13 暨南大学 The metadata forecasting method and system of client origin information association perception
CN109471843B (en) * 2018-12-24 2021-08-10 郑州云海信息技术有限公司 Metadata caching method, system and related device
CN111427966B (en) * 2020-06-10 2020-09-22 腾讯科技(深圳)有限公司 Database transaction processing method and device and server
CN114625762A (en) * 2020-11-27 2022-06-14 华为技术有限公司 Metadata acquisition method, network equipment and system
US11782637B2 (en) * 2021-01-05 2023-10-10 Red Hat, Inc. Prefetching metadata in a storage system
CN113220693B (en) * 2021-06-02 2023-10-20 北京火山引擎科技有限公司 Computing storage separation system, data access method thereof, medium and electronic equipment

Also Published As

Publication number Publication date
CN115470008A (en) 2022-12-13
CN115470008B (en) 2023-03-10
