WO2024051027A1 - Data configuration method and system for big data - Google Patents

Data configuration method and system for big data

Info

Publication number
WO2024051027A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
node
primary
shard
name node
Prior art date
Application number
PCT/CN2022/139829
Other languages
English (en)
French (fr)
Inventor
吕灏
韩国权
李庆
胥月
黄海峰
蔡惠民
Original Assignee
中电科大数据研究院有限公司
太极计算机股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中电科大数据研究院有限公司, 太极计算机股份有限公司 filed Critical 中电科大数据研究院有限公司
Publication of WO2024051027A1 publication Critical patent/WO2024051027A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/22 Indexing; Data structures therefor; Storage structures
    • G06F 16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14 Error detection or correction of the data by redundancy in operation
    • G06F 11/1402 Saving, restoring, recovering or retrying
    • G06F 11/1415 Saving, restoring, recovering or retrying at system level
    • G06F 11/1435 Saving, restoring, recovering or retrying at system level using file system or storage system metadata
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14 Error detection or correction of the data by redundancy in operation
    • G06F 11/1402 Saving, restoring, recovering or retrying
    • G06F 11/1446 Point-in-time backing up or restoration of persistent data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 Querying
    • G06F 16/245 Query processing
    • G06F 16/2457 Query processing with adaptation to user needs
    • G06F 16/24573 Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Definitions

  • the invention relates to the field of information processing, in particular to a data configuration method and system for big data.
  • the present invention proposes a data configuration method and system for big data.
  • the method includes: configuring a name node, a data node and a client, wherein the name node is configured as a central management server, descriptive metadata is stored in memory in the form of a list, the client's requests for file access are responded to, and an internal metadata service is provided;
  • the data node is used to store the data required by the user and stores the data in blocks, with a fixed size set for each block and backup copies stored; it receives the control information forwarded by the name node, creates, deletes and copies data blocks under the unified scheduling of the name node, and reports to the name node periodically;
  • the user performs data access through the name node; the primary shard and the secondary shard of the data are set in the data node; in order to maintain data consistency between the primary shard and the secondary shard, the primary and secondary shards complete data synchronization through the exchange of confirmation messages.
  • the primary and secondary shards complete the data synchronization of the primary and secondary shards through the exchange of confirmation messages.
  • the specific steps are: processing the relational database operations associated with them on the two shards at the same time.
  • when the primary shard needs to commit, it simultaneously issues a commit request to the secondary shard. If the secondary shard has completed the task, it directly returns an ACK message to the primary shard. If the secondary shard has not completed the task, it returns a NACK message to the primary shard.
  • a trigger is used to indicate whether waiting is needed, and this is recorded in the log.
  • the method of storing data in blocks, setting a fixed size for each block and performing backup storage includes: storing three copies by default, one on the local machine, one on a machine in the same rack, and one on a machine in another rack.
  • a namenode has at least one backup namenode.
  • the backup name node performs regular name node backup and ensures normal operation of the cluster through automatic switching.
  • when the user creates a file, the user first caches the file data in a local temporary file. When the data accumulated in this temporary file reaches the threshold, the user initiates a connection with the name node.
  • each client sets valid-flag status information for the metadata in the storage node.
  • when a system failure occurs, the valid-flag status information prior to the operation is updated and stored in the log.
  • triggering an indication of whether waiting is needed and making a log record include: setting a rollback value in the data node for the consistency of the primary and replica copies, the rollback value being used to indicate the tolerance for inconsistency between the replica and the primary; that is, when either the primary or the secondary shard is unable to work, data inconsistency between the primary and secondary shards is allowed, and the transactions of the cluster as a whole are allowed to be forcibly committed.
  • when the name node detects that the number of replicas in the system is lower than the system's preset replica-count threshold, the name node detects the data blocks contained in the lost replicas and, when the system is idle, copies the missing replicas to reach the preset replica threshold; the operations that need to be recovered are detected from the event log, and a process is then invoked to complete them.
  • the data storage nodes are configured to distribute replicas of virtual shards, a single physical data node is configured to deploy multiple logical shards, and the replica shards of each logical shard are deployed on different physical machines.
  • the system includes a name node, a data node and a user terminal, wherein the name node is configured as a central management server.
  • when the data node is set as a storage node, data synchronization between the primary and secondary shards stored on it is completed through the exchange of confirmation messages.
  • log event information is set at the same time, meeting the user's information transmission and processing needs when the data of the primary and secondary shards are inconsistent.
  • by catching up through the log event information, data processing efficiency is improved.
  • Figure 1 is a schematic diagram of the process flow of the method of the present invention.
  • words such as “first” and “second” are used to describe the same or similar items with basically the same function or effect.
  • words such as “first” and “second” do not limit the number and execution order.
  • the first information and the second information are used to distinguish different information rather than to describe a specific order of the information.
  • the present invention proposes a data configuration method for big data.
  • the method includes: configuring a name node, a data node and a client in a corresponding system, where the name node is configured as a central management server, descriptive metadata is stored in memory in the form of a list, client requests for file access are responded to, and an internal metadata service is provided;
  • the data node is used to store the data required by the user and stores the data in blocks, with a fixed size set for each block and backup copies stored; it receives the control information forwarded by the name node, creates, deletes and copies data blocks under the unified scheduling of the name node, and reports to the name node periodically;
  • the user performs data access through the name node; the primary shard and the secondary shard of the data are set in the data node; in order to maintain data consistency between the primary shard and the secondary shard, the primary and secondary shards complete data synchronization through the exchange of confirmation messages.
  • the primary and secondary shards complete the data synchronization of the primary and secondary shards through the exchange of confirmation messages.
  • the specific steps are: processing the relational database operations associated with them on the two shards at the same time.
  • when the primary shard needs to commit, it simultaneously issues a commit request to the secondary shard. If the secondary shard has completed the task, it directly returns an ACK message to the primary shard. If the secondary shard has not completed the task, it returns a NACK message to the primary shard.
  • a trigger indicates whether waiting is needed, and a log record is made.
  • the system sets up a master-slave cluster structure based on the access structure of big data.
  • the cluster consists of a name node, a backup name node, multiple data nodes and multiple user terminals.
  • the name node is a key component. As the central management server in the file system, the name node mainly provides the internal metadata service; it is responsible for managing the namespace of the file system and responding to user access to files. It saves the descriptive metadata of the system as a list stored in memory so that users can access it quickly. If the name node fails, the entire file system becomes unusable, because it stores information about all data blocks and files cannot be reconstructed without it. A backup name node backs up the name node regularly and ensures normal operation of the cluster through automatic switching.
  • the name node holds the basic information of files, the mapping relationship between files and data blocks, and the storage location of each data block.
  • the data node is responsible for storing user data. It divides the local disk into multiple blocks or slices to store data; each block has a fixed default size, and three copies are stored by default, one on the local machine, one on another machine in the same rack and one in another rack, and the metadata of blocks and slices is kept in memory. Data blocks are created, deleted and copied under the unified scheduling of the name node, with periodic reports made to the name node.
  • the client is the user interface, responsible for interacting with the cluster and performing operations such as reading and writing files.
  • when the user wants to create a file, the user first caches the file data in a local temporary file. When the data accumulated in this temporary file reaches the threshold size, the user contacts the name node. When the user wants to read a file, the client queries the storage location of the required file, and the name node returns the address of the data node where the data is stored and the addresses of the other copies. The client can then transfer data directly with the data node, and finally the connection is terminated.
  • each client node in the system sets valid-flag status (Valid_flag) information for the metadata in the data node.
  • when a system failure occurs, the Valid_flag status prior to the operation is updated, and the Valid_flag setting is committed in the form of a write-ahead log.
  • this commit is transactional; for data that a transaction references from its own output, a multi-version management mechanism for Valid_flag is used so that the transaction always references the latest data within its own update.
  • the primary shard and the secondary shard of data are set in the storage node; to establish data consistency between the primary shard and the secondary shard, the consistency mechanism of the primary and secondary shards is implemented in a staged manner.
  • an optional way is to process the relational database operations associated with them on both shards at the same time.
  • the waiting period is set by the name node.
  • when the primary shard receives an ACK message from the secondary shard, it instructs the secondary shard to commit the updated data together with it.
  • the data on the primary shard is updated first, and then the primary shard forwards the data to the secondary shard.
  • the updated data is transferred to the secondary shard incrementally; when the primary shard has completed the data update and needs to commit, it may optionally follow the Valid_flag setting method described above and at the same time send a commit request to the secondary shard. If the secondary shard has finished processing this task, it directly returns an ACK message to the primary shard. If the secondary shard has not completed this task, it returns a NACK message to the primary shard to indicate that it still needs to wait, or to trigger an indication of whether waiting is needed, and a log record is made.
  • a rollback value is set in the data node for the consistency of the primary and replica copies.
  • the rollback value is used to indicate the tolerance for inconsistency between the replica and the primary; that is, when either the primary or the secondary shard is unable to work, the system allows data inconsistency between the primary and secondary shards and allows the transactions of the cluster as a whole to be forcibly committed.
  • the event log is retained on the default or reserved shard of the storage data node, and the event of the above operation is recorded.
  • the event operation records the operation lost on the failed shard, which may be a data update operation.
  • when the shard on the failed node returns to normal, the operations that need to be recovered are detected from the event log, and a process is then invoked to catch the data on the failed node's shard up to the data on the healthy shard, thereby completing data recovery and safety after the failure.
  • the identification information of the retained event log is sent to the user terminal synchronously, to indicate which data on the storage node has not yet been successfully synchronized.
  • when the name node detects that the number of replicas in the system is lower than the system's preset replica-count threshold, the name node quickly detects the data blocks contained in the lost replicas and copies the missing replicas when the system is idle, so as to reach the preset replica-count threshold.
  • the data storage nodes in the system are set up for replica distribution of virtual shards.
  • one physical node can host multiple logical shards, and the replica shards of each logical shard can be deployed on different physical machines.
  • when a physical storage node goes down, the replicas of the logical shards on that node are dispersed across multiple storage nodes, ensuring that the load of the downed server can not only be failed over but also be load-balanced after the failover.
  • for example, each piece of data is assigned a storage node and a storage directory; the data is stored on the data node in the form of a file and is accessed by the storage unit deployed on that node.
  • if the storage directory of the designated node does not have enough space, the master node queries whether other storage units have storage space; if so, it allocates storage space to the data from other data nodes. If an entire storage node needs to be migrated, the name node or data node acting as the master node needs to suspend storage requests for the data units, query whether other nodes have suitable space to take over the data nodes configured on this server, and, if a suitable storage location exists, synchronize the data files to the other servers and send a file storage location change request to the name node. The name node records the change and synchronizes it to the virtual logical shard management unit corresponding to the storage space mapping.
  • writing data to the allocated location is set up as a concurrent operation.
  • when data is being written to the corresponding shard, if the file is locked, the data in another Hash segment can be written to other shard files instead, thus reducing the degradation in data writing performance caused by locking.
  • each data has three copies, one copy is the primary data, and the other two copies are backups of the primary data.
  • when data is changed, the primary copy information is modified, and the data of the other two copies is then synchronized over the network to the backup nodes.
  • the primary information and copy information of all tag data are compressed and sent to the cluster through broadcast messages; when reading, the data is likewise decompressed from the broadcast messages.
  • the namespace data can be accessed by all users in the entire system.
  • the name node receives the heartbeat signal and block report from each data node at a fixed time interval. If the name node finds an anomaly when verifying the metadata against the block report, or a data node does not send its heartbeat signal on time, the corresponding data node is marked as down. From then on, the system no longer sends any I/O requests to the anomalous data node.
  • the client wants to access the file system, it must interact with the name node and data node. First, the client will find the name node. When receiving the request, the name node will respond to the client. The client will obtain the file metadata. At the same time, the name node will map the data block to the data node through the metadata.
  • when the client has a file writing request, it is sent to the name node immediately.
  • the name node then exchanges information with the data nodes. Specifically, the client sends the file size and configuration information to the name node, and the name node returns the address information of the data nodes it manages to the user based on the received information; the user can then split the file to be written into many small data blocks and write them to the corresponding data nodes in sequence.
  • the client first makes a request to the name node to read the file.
  • the name node will quickly return the address information of the data node that stores the file requested by the client to the client.
  • the client can then successfully read the file through the address information of the data node.
  • the program can be stored in a computer-readable storage medium.
  • the process may include the processes of the embodiments of each of the above methods.
  • the storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD), etc.; the storage medium may also include a combination of the above types of memory.
  • a component may be, but is not limited to: a process running on a processor, a processor, an object, an executable file, a thread of execution, a program, and/or a computer.
  • an application running on a computing device and the computing device may be components.
  • One or more components can exist within a process and/or thread of execution, and a component can be localized on one computer and/or distributed between two or more computers. Additionally, these components can execute from various computer-readable media having various data structures thereon.
  • these components may communicate by way of local and/or remote processes, such as according to signals having one or more data packets (e.g., data from one component interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet with other systems).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of information processing, and specifically discloses a data configuration method and system for big data. The method includes configuring a name node, data nodes and clients, wherein the name node is configured as a central management server; when a data node is set as a storage node, data synchronization between the primary and secondary shards stored on it is completed through the exchange of confirmation messages; log event information is also maintained, meeting the needs of transmitting and processing information when the primary- and secondary-shard data of a user are inconsistent; and, by catching up through the log event information, data processing efficiency is improved.

Description

Data configuration method and system for big data
Technical Field
The present invention relates to the field of information processing, and in particular to a data configuration method and system for big data.
Background Art
With the improvement of cloud computing capabilities, the processing of massive data in all walks of life has gradually become a focus of attention and research, and applying data mining methods to various fields has become a trend.
While valuable information is mined from large amounts of data for use in management, decision-making and regulation, how to guarantee data safety and consistency in a mass-storage system, so that data can be updated promptly and safely when users process and access it, has become an urgent problem to be solved.
Technical Problem
To solve one of the above problems, the present invention proposes a data configuration method and system for big data.
Technical Solution
The method includes: configuring a name node, data nodes and clients, wherein the name node is configured as a central management server, descriptive metadata is stored in memory in the form of a list, requests from clients for access to files are responded to, and an internal metadata service is provided;
the data nodes are used to store the data needed by clients; data is stored in blocks, a fixed size is set for each block, and backup copies are stored; control information forwarded by the name node is received, data blocks are created, deleted and copied under the unified scheduling of the name node, and reports are made to the name node periodically;
clients access data through the name node; a primary shard and a secondary shard of the data are set on the data nodes; to maintain data consistency between the primary shard and the secondary shard, the primary and secondary shards complete data synchronization through the exchange of confirmation messages.
Further, the primary and secondary shards completing data synchronization through the exchange of confirmation messages specifically comprises: processing the relational database operations associated with them on the two shards at the same time; when the primary shard needs to commit, a commit request is sent to the secondary shard at the same time; if the secondary shard has finished processing the task, it directly returns an ACK message to the primary shard; if the secondary shard has not yet finished the task, it returns a NACK message to the primary shard, which is used to trigger an indication of whether waiting is needed, and a log record is made.
Further, storing the data in blocks, setting a fixed size for each block and performing backup storage includes: storing three copies by default, namely one copy on the local machine, one copy on a machine in the same rack, and one copy in another rack.
Further, one name node has at least one backup name node.
Further, the backup name node backs up the name node at regular intervals, and normal operation of the cluster is guaranteed through automatic switchover.
Further, when a client creates a file, the client first caches the file data in a local temporary file; only when the data accumulated in this temporary file reaches a threshold does the client initiate a connection to the name node.
Further, each client sets valid-flag status information for the metadata in the storage nodes; when a system failure occurs, the valid-flag status information prior to the operation is updated and stored in a log.
Further, triggering an indication of whether waiting is needed and making a log record includes: setting a rollback value in the data node for the consistency of the primary and replica copies, the rollback value being used to indicate the tolerance for inconsistency between the replica and the primary, that is, at a moment when either the primary or the secondary shard is unable to work, data inconsistency between the primary and secondary shards is allowed, and the transactions of the cluster as a whole are allowed to be forcibly committed.
Further, when the name node detects that the number of replicas in the system is lower than the replica-count threshold preset by the system, the name node detects the data blocks contained in the lost replicas and, when the system is idle, copies the missing replicas to reach the preset replica threshold; the operations that need to be recovered are detected from the event log, and a process is then invoked to complete them.
Further, the data storage nodes are set up for replica distribution of virtual shards: a single physical data node is configured to host multiple logical shards, and the replica shards of each logical shard are deployed on different physical machines.
In the solution disclosed in this application, the system includes a name node, data nodes and clients, wherein the name node is configured as a central management server; when a data node is set as a storage node, data synchronization between the primary and secondary shards stored on it is completed through the exchange of confirmation messages; log event information is also maintained, meeting the needs of transmitting and processing information when the primary- and secondary-shard data of a user are inconsistent; and, by catching up through the log event information, data processing efficiency is improved.
Brief Description of the Drawings
The features and advantages of the present invention will be understood more clearly with reference to the accompanying drawings, which are schematic and should not be understood as limiting the present invention in any way.
Figure 1 is a schematic diagram of the flow of the method of the present invention.
Best Mode for Carrying Out the Invention
These and other features and characteristics of the present invention, the operating methods, the functions of related elements of the structure, the combination of parts and the economics of manufacture may be better understood with reference to the following description and the accompanying drawings, which form part of the specification. It should be clearly understood, however, that the drawings are for illustration and description only and are not intended to limit the scope of protection of the present invention. It will be appreciated that the drawings are not drawn to scale. Various structural diagrams are used in the present invention to illustrate variations of embodiments according to the present invention.
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.
It should be noted that "/" herein means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone.
It should be noted that, in order to describe the technical solutions of the embodiments of this application clearly, words such as "first" and "second" are used in the embodiments of this application to distinguish identical or similar items with essentially the same function or effect. A person skilled in the art will understand that words such as "first" and "second" do not limit the number or the execution order. For example, first information and second information are used to distinguish different pieces of information rather than to describe a specific order of the information.
It should be noted that, in the embodiments of the present invention, words such as "exemplary" or "for example" are used to indicate an example, illustration or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present invention should not be interpreted as more preferred or advantageous than other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present the relevant concept in a concrete manner.
Embodiment 1
As shown in Figure 1, the present invention proposes a data configuration method for big data. The method includes: in a corresponding system, configuring a name node, data nodes and clients, wherein the name node is configured as a central management server, descriptive metadata is stored in memory in the form of a list, requests from clients for access to files are responded to, and an internal metadata service is provided;
the data nodes are used to store the data needed by clients; data is stored in blocks, a fixed size is set for each block, and backup copies are stored; control information forwarded by the name node is received, data blocks are created, deleted and copied under the unified scheduling of the name node, and reports are made to the name node periodically;
clients access data through the name node; a primary shard and a secondary shard of the data are set on the data nodes; to maintain data consistency between the primary shard and the secondary shard, the primary and secondary shards complete data synchronization through the exchange of confirmation messages.
The primary and secondary shards completing data synchronization through the exchange of confirmation messages specifically comprises: processing the relational database operations associated with them on the two shards at the same time; when the primary shard needs to commit, a commit request is sent to the secondary shard at the same time; if the secondary shard has finished processing the task, it directly returns an ACK message to the primary shard; if the secondary shard has not yet finished the task, it returns a NACK message to the primary shard, which is used to trigger an indication of whether waiting is needed, and a log record is made.
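To make the confirmation-message exchange above concrete, the following is a minimal sketch in Python; the class names (PrimaryShard, SecondaryShard), the in-memory operation lists and the single-process call structure are illustrative assumptions, not the disclosed implementation.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("shard-sync")

ACK, NACK = "ACK", "NACK"

class SecondaryShard:
    """Processes the same operations as the primary and answers commit requests."""
    def __init__(self):
        self.pending_ops = []          # operations applied but not yet committed
        self.committed = []

    def apply(self, op):
        self.pending_ops.append(op)

    def on_commit_request(self, op_id):
        # ACK only if the operation with this id has been applied and awaits commit.
        done = any(op["id"] == op_id for op in self.pending_ops)
        return ACK if done else NACK

    def commit(self, op_id):
        self.committed.extend(op for op in self.pending_ops if op["id"] <= op_id)
        self.pending_ops = [op for op in self.pending_ops if op["id"] > op_id]

class PrimaryShard:
    def __init__(self, secondary):
        self.secondary = secondary
        self.committed = []

    def execute(self, op):
        # Process the associated operation on both shards at the same time.
        self.secondary.apply(op)
        reply = self.secondary.on_commit_request(op["id"])
        if reply == ACK:
            self.committed.append(op)
            self.secondary.commit(op["id"])
            return True
        # NACK: record the event in the log so a trigger can decide whether to wait.
        log.warning("secondary not ready for op %s, logged for catch-up", op["id"])
        return False

secondary = SecondaryShard()
primary = PrimaryShard(secondary)
primary.execute({"id": 1, "sql": "UPDATE t SET v = 1 WHERE k = 'a'"})
```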
Exemplarily, the system sets up a master-slave cluster structure based on the access structure of big data; the cluster consists of one name node, one backup name node, multiple data nodes and multiple user terminals.
The name node is the key component. As the central management server of the file system, the name node mainly provides the internal metadata service; it is responsible for managing the namespace of the file system and responding to client access to files. It saves the descriptive metadata of the system as a list stored in memory so that users can access it quickly. If the name node fails, the entire file system becomes unusable, because it stores the information of all data blocks and files cannot be reconstructed without it. The backup name node backs up the name node at regular intervals and guarantees normal operation of the cluster through automatic switchover. The name node contains the basic information of files, the mapping relationship between files and data blocks, and the storage locations of the data blocks.
The data nodes are responsible for storing user data. A data node divides the local disk into multiple blocks or slices to store data; each block has a fixed default size, and three copies are stored by default, namely one copy on the local machine, one copy on another machine in the same rack and one copy in another rack. The metadata of blocks and slices is kept in memory. Data blocks are created, deleted and copied under the unified scheduling of the name node, and reports are made to the name node periodically. The client is the user interface, responsible for interacting with the cluster and performing operations such as reading and writing files.
When a client wants to create a file, the client first caches the file data in a local temporary file; when the data accumulated in this temporary file reaches the threshold size, the client contacts the name node. When a client wants to read a file, the client queries the storage location of the required file, the name node returns the address of the data node storing the data and the addresses of the other replicas, the client can transfer data directly with the data node, and the connection is finally terminated.
In the system, each client node sets valid-flag status (Valid_flag) information for the metadata in the data nodes; when a system failure occurs, the Valid_flag status prior to the operation is updated, a write-ahead log is used, and the committing of the Valid_flag setting is made transactional. For data that a transaction references from its own output, a multi-version management mechanism for Valid_flag is adopted so that the transaction always references the latest data within its own update.
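The Valid_flag mechanism can be sketched as below, assuming a per-key version list and a plain append-only file standing in for the write-ahead log; the names ValidFlagStore and update are hypothetical.

```python
import json, os, tempfile

class ValidFlagStore:
    """Keeps multiple versions per key; the valid flag is logged ahead of the update."""
    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.versions = {}                       # key -> list of (version, value, valid_flag)

    def _wal(self, record):
        with open(self.wal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")   # write-ahead: log before mutating state

    def update(self, key, value):
        history = self.versions.setdefault(key, [])
        prev_version = history[-1][0] if history else 0
        # Record the pre-operation flag state so it can be restored after a failure.
        self._wal({"key": key, "prev_version": prev_version, "valid_flag": True})
        history.append((prev_version + 1, value, True))

    def latest(self, key):
        # A transaction referencing data it produced itself reads its newest valid version.
        for version, value, valid in reversed(self.versions.get(key, [])):
            if valid:
                return version, value
        return None

wal = os.path.join(tempfile.gettempdir(), "valid_flag.wal")
store = ValidFlagStore(wal)
store.update("user:1", {"name": "a"})
store.update("user:1", {"name": "b"})
print(store.latest("user:1"))                    # (2, {'name': 'b'})
```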
The primary and secondary shards of the data are set in the storage nodes; to establish data consistency between the primary shard and the secondary shard, the consistency mechanism of the primary and secondary shards is implemented in a staged manner. One optional way is to process the relational database operations associated with them on the two shards at the same time; when the primary shard needs to commit, the Valid_flag setting described above may optionally be followed, and a commit request is sent to the secondary shard at the same time; if the secondary shard has already finished processing the task, it directly returns an ACK message to the primary shard; if the secondary shard has not yet finished the task, it returns a NACK message to the primary shard to indicate that waiting is still needed, or to trigger an indication of whether waiting is needed, and a log record is made.
Optionally, the waiting period is set by the name node; when the primary shard receives an ACK message from the secondary shard, it instructs the secondary shard to commit the updated data together with it.
Optionally, the data on the primary shard is updated first, and the primary shard then forwards the data to the secondary shard; the updated data is transferred to the secondary shard incrementally. When the primary shard has completed the data update and needs to commit, the Valid_flag setting described above may optionally be followed, and a commit request is sent to the secondary shard at the same time; if the secondary shard has already finished processing the task, it directly returns an ACK message to the primary shard; if the secondary shard has not yet finished the task, it returns a NACK message to the primary shard to indicate that waiting is still needed, or to trigger an indication of whether waiting is needed, and a log record is made. Exemplarily, a rollback value is set in the data node for the consistency of the primary and replica copies; the rollback value is used to indicate the tolerance for inconsistency between the replica and the primary, that is, at a moment when either the primary or the secondary shard is unable to work, the system allows data inconsistency between the primary and secondary shards and allows the transactions of the cluster as a whole to be forcibly committed.
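One possible reading of the rollback value is a bound on how far the secondary may lag before the cluster forces a commit; the sketch below follows that reading under stated assumptions and is not the patent's literal implementation.

```python
class ConsistencyPolicy:
    """Tolerates a bounded inconsistency between primary and secondary shards."""
    def __init__(self, rollback_tolerance):
        self.rollback_tolerance = rollback_tolerance   # allowed primary/secondary lag
        self.unacked = 0

    def on_nack(self):
        self.unacked += 1

    def on_ack(self):
        self.unacked = 0

    def may_force_commit(self, secondary_alive):
        # If one of the shards cannot work, inconsistency is allowed and the
        # cluster-wide transaction may be forcibly committed.
        if not secondary_alive:
            return True
        return self.unacked <= self.rollback_tolerance

policy = ConsistencyPolicy(rollback_tolerance=3)
policy.on_nack()
print(policy.may_force_commit(secondary_alive=True))    # True: still within tolerance
print(policy.may_force_commit(secondary_alive=False))   # True: secondary down, force commit
```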
At this point, an event log is kept on a default or reserved shard of the storage data node, and the event of the above operation is recorded; the event operation records the operation lost on the failed shard, which may be a data update operation.
Afterwards, when the shard on the failed node returns to normal, the operations that need to be recovered are detected from the event log, and a process is then invoked to catch the data on the failed node's shard up to the data on the healthy shard, thereby completing data recovery and safety after the failure. When a client accesses the data, identification information of the retained event log is sent to the client synchronously, to indicate which data on the storage node has not yet been successfully synchronized.
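A sketch of the event-log catch-up, with the reserved shard modelled as a plain in-memory list of missed operations; the EventLog class and its replay_into method are illustrative names only.

```python
class EventLog:
    """Kept on a default/reserved shard; records operations the failed shard missed."""
    def __init__(self):
        self.entries = []

    def record(self, op):
        self.entries.append(op)

    def replay_into(self, shard_state):
        # Detect the operations that need to be recovered and apply them in order,
        # bringing the recovered shard level with the healthy one.
        for op in self.entries:
            shard_state[op["key"]] = op["value"]
        recovered = len(self.entries)
        self.entries.clear()
        return recovered

event_log = EventLog()
event_log.record({"key": "k1", "value": "v1"})      # missed while the shard was down
event_log.record({"key": "k2", "value": "v2"})

recovered_shard = {}                                 # the shard coming back online
print(event_log.replay_into(recovered_shard), recovered_shard)
```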
Exemplarily, when the name node detects that the number of replicas in the system is lower than the replica-count threshold preset by the system, the name node quickly detects the data blocks contained in the lost replicas and copies the missing replicas when the system is idle, so as to reach the preset replica-count threshold. Optionally, the operations that need to be recovered can likewise be detected from the event log, and a process is then invoked to catch the failed node's shard up to the data on the healthy shard, assisting in completing the update of the replica data.
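The replica-count check might be sketched as follows, assuming the name node keeps a block-to-location map and an is_idle() probe; both structures are assumptions for illustration.

```python
def find_under_replicated(block_locations, replica_threshold):
    """Return blocks whose live replica count has dropped below the preset threshold."""
    return {blk: locs for blk, locs in block_locations.items()
            if len(locs) < replica_threshold}

def re_replicate(block_locations, replica_threshold, candidate_nodes, is_idle):
    if not is_idle():
        return []                         # only copy missing replicas when the system is idle
    actions = []
    for blk, locs in find_under_replicated(block_locations, replica_threshold).items():
        targets = [n for n in candidate_nodes if n not in locs]
        need = replica_threshold - len(locs)
        for node in targets[:need]:
            locs.append(node)
            actions.append((blk, node))
    return actions

blocks = {"blk_001": ["dn1"], "blk_002": ["dn1", "dn2", "dn3"]}
print(re_replicate(blocks, 3, ["dn1", "dn2", "dn3", "dn4"], is_idle=lambda: True))
# [('blk_001', 'dn2'), ('blk_001', 'dn3')]
```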
Optionally, the data storage nodes in the system are set up for replica distribution of virtual shards. Exemplarily, one physical node may host multiple logical shards, and the replica shards of each logical shard may be deployed on different physical machines. When a physical storage node goes down, the replicas of the logical shards on that node are dispersed across multiple storage nodes, which not only enables failover of the downed server's load but also load-balances the failed server's load after the failover. Exemplarily, each piece of data is assigned a storage node and a storage directory; the data is stored on the data node in the form of files and is accessed by the storage unit deployed on that node. If the storage directory of the designated node does not have enough space, the master node queries whether other storage units have storage space and, if so, allocates storage space to the data from other data nodes. If an entire storage node needs to be migrated, the name node or data node acting as the master node needs to suspend storage requests for the data units, query whether other nodes have suitable space to take over the data nodes configured on this server, and, if a suitable storage location exists, synchronize the data files to the other servers and send a file storage location change request to the name node; the name node records the change and synchronizes it to the virtual logical shard management unit corresponding to the storage space mapping.
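The placement constraint that replicas of one logical shard never share a physical machine can be sketched as below; the round-robin assignment is an assumed strategy, not one stated in the description.

```python
from itertools import cycle

def place_logical_shards(physical_nodes, logical_shards, replicas_per_shard):
    """Spread each logical shard's replicas over distinct physical nodes."""
    if replicas_per_shard > len(physical_nodes):
        raise ValueError("not enough physical nodes to keep replicas apart")
    placement = {}
    rotation = cycle(range(len(physical_nodes)))
    for shard in logical_shards:
        start = next(rotation)
        # Pick consecutive distinct machines so no two replicas share a node.
        placement[shard] = [physical_nodes[(start + i) % len(physical_nodes)]
                            for i in range(replicas_per_shard)]
    return placement

print(place_logical_shards(["pm1", "pm2", "pm3", "pm4"],
                           ["shard-a", "shard-b", "shard-c"], replicas_per_shard=3))
```

Because the replicas of any one logical shard land on different machines, the loss of a single physical node leaves at least one live replica of every shard it hosted, which is what allows the failover and rebalancing described above.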
Optionally, writing data to the allocated location is set up as a concurrent operation; when data is being written to the corresponding shard, if the file is found to be locked, the data of another Hash segment can be written to other shard files instead, which reduces the degradation of write performance caused by locking.
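The fallback to another hash segment when a shard file is locked might look like the following; the lock set and the deterministic next-segment probe are illustrative assumptions.

```python
import hashlib

def hash_segment(key, n_shards):
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n_shards

def write(key, value, shard_files, locked):
    """Write to the key's shard file; if it is locked, spill into another hash segment."""
    base = hash_segment(key, len(shard_files))
    for offset in range(len(shard_files)):
        idx = (base + offset) % len(shard_files)
        if idx not in locked:
            shard_files[idx].append((key, value))
            return idx
    raise RuntimeError("all shard files are locked")

files = [[], [], [], []]
busy = {hash_segment("order-42", 4)}          # pretend the natural target is locked
print(write("order-42", b"payload", files, locked=busy))
```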
Optionally, each piece of data has three replicas, one of which is the primary data and the other two of which are backups of the primary data. When the data is changed, the primary replica information is modified, and the data of the other two replicas is then synchronized to the backup nodes over the network. If the data server holding the primary replica goes down or suffers a network fault, a replica server is selected according to load to update the replica contents and synchronize the replicas, and that replica server becomes the new primary storage node; after the original server restarts, it obtains the change information of the tag data from that server and then synchronizes the data. The primary information and replica information of all tag data are compressed and sent to the cluster via broadcast messages, and on reading, the data is likewise decompressed from the broadcast messages; the namespace data can be accessed by all clients in the entire system.
Optionally, the name node receives heartbeat signals and block reports from the data nodes at fixed intervals. If the name node finds an anomaly when verifying the metadata against a block report, or a data node fails to send its heartbeat on time, the corresponding data nodes are marked as down. From that point, the system no longer sends any I/O requests to the anomalous data nodes.
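A sketch of the heartbeat and block-report check; the heartbeat interval and missed-beat limit below are assumed values for illustration, not figures taken from the description.

```python
import time

HEARTBEAT_INTERVAL = 3.0          # seconds between expected heartbeats (assumed value)
MISSED_LIMIT = 3                  # heartbeats that may be missed before marking down

class NameNodeMonitor:
    def __init__(self):
        self.last_seen = {}       # data node id -> timestamp of last heartbeat
        self.down = set()

    def on_heartbeat(self, node_id, block_report_ok=True):
        if not block_report_ok:
            self.down.add(node_id)            # metadata check against block report failed
            return
        self.last_seen[node_id] = time.monotonic()
        self.down.discard(node_id)

    def sweep(self):
        now = time.monotonic()
        for node_id, seen in self.last_seen.items():
            if now - seen > HEARTBEAT_INTERVAL * MISSED_LIMIT:
                self.down.add(node_id)        # no further I/O requests go to this node
        return self.down

monitor = NameNodeMonitor()
monitor.on_heartbeat("dn1")
monitor.on_heartbeat("dn2", block_report_ok=False)
print(monitor.sweep())            # {'dn2'}
```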
For a client to access the file system, it must interact with the name node and the data nodes. First, the client locates the name node; on receiving the request, the name node responds to the client and the client obtains the file metadata; at the same time, the name node maps the data blocks to data nodes through the metadata.
First, when a client has a file write request, the request is passed to the name node immediately; after receiving the specific request, the name node exchanges information with the data nodes. Specifically, the client sends the file size and configuration information to the name node, and the name node, based on the information received, returns the address information of the data nodes it manages to the client; the client can then, according to the data-node address information returned by the name node, split the file to be written into many small data blocks and write them to the corresponding data nodes in sequence.
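The write path can be sketched as below, assuming an illustrative block size and a name node stub that hands back one data-node address per block; neither value nor interface is taken from the description.

```python
BLOCK_SIZE = 4 * 1024 * 1024              # illustrative block size, not the patent's value

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def write_file(data: bytes, name_node, data_nodes):
    """Client sends size/config to the name node, then writes blocks to data nodes in order."""
    blocks = split_into_blocks(data)
    targets = name_node(file_size=len(data), n_blocks=len(blocks))   # returns node addresses
    for block, node_addr in zip(blocks, targets):
        data_nodes[node_addr].append(block)                          # sequential block writes

storage = {"dn1:9000": [], "dn2:9000": []}
fake_name_node = lambda file_size, n_blocks: ["dn1:9000", "dn2:9000"][:n_blocks]
write_file(b"x" * (5 * 1024 * 1024), fake_name_node, storage)
print([len(storage[n]) for n in storage])     # [1, 1]: two blocks written in sequence
```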
Optionally, the client first sends a request to the name node to read a file; after receiving the client's request, the name node quickly returns to the client the address information of the data nodes storing the requested file, and the client can then read the file successfully through the data-node address information.
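A matching, self-contained read-path sketch, again with an assumed name-node lookup stub standing in for the real metadata query.

```python
def read_file(file_name, name_node_lookup, data_nodes):
    """Client asks the name node where the file's blocks live, then reads them in order."""
    locations = name_node_lookup(file_name)            # [(data node address, block index), ...]
    return b"".join(data_nodes[addr][idx] for addr, idx in locations)

nodes = {"dn1:9000": [b"hello "], "dn2:9000": [b"world"]}
lookup = lambda name: [("dn1:9000", 0), ("dn2:9000", 0)]
print(read_file("demo.bin", lookup, nodes))            # b'hello world'
```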
A person skilled in the art will understand that all or part of the flows in the methods of the above embodiments can be accomplished by a computer program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of each of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD), etc.; the storage medium may also include a combination of the above kinds of memory.
As used in this application, the terms "component", "module", "system" and the like are intended to refer to a computer-related entity, which may be hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program and/or a computer. By way of illustration, both an application running on a computing device and the computing device itself can be components. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer-readable media having various data structures stored thereon. These components may communicate by way of local and/or remote processes, such as according to signals having one or more data packets (e.g., data from one component interacting with another component in a local system, in a distributed system and/or across a network such as the Internet with other systems).
It should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, a person of ordinary skill in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all such modifications and substitutions should be covered by the scope of the claims of the present invention.

Claims (10)

  1. A data configuration method for big data, characterized by: configuring a name node, data nodes and clients,
    wherein the name node is configured as a central management server, descriptive metadata is stored in memory in the form of a list, requests from clients for access to files are responded to, and an internal metadata service is provided;
    the data nodes are used to store the data needed by clients; data is stored in blocks, a fixed size is set for each block, and backup copies are stored; control information forwarded by the name node is received, data blocks are created, deleted and copied under the unified scheduling of the name node, and reports are made to the name node periodically;
    clients access data through the name node; a primary shard and a secondary shard of the data are set on the data nodes; to maintain data consistency between the primary shard and the secondary shard, the primary and secondary shards complete data synchronization through the exchange of confirmation messages.
  2. The method of claim 1, characterized in that the primary and secondary shards completing data synchronization through the exchange of confirmation messages specifically comprises: processing the relational database operations associated with them on the two shards at the same time; when the primary shard needs to commit, a commit request is sent to the secondary shard at the same time; if the secondary shard has finished processing the task, it directly returns an ACK message to the primary shard; if the secondary shard has not yet finished the task, it returns a NACK message to the primary shard, which is used to trigger an indication of whether waiting is needed, and a log record is made.
  3. The method of claim 2, characterized in that storing the data in blocks, setting a fixed size for each block and performing backup storage comprises: storing three copies by default, namely one copy on the local machine, one copy on a machine in the same rack, and one copy in another rack.
  4. The method of claim 3, characterized in that one name node is configured to have at least one backup name node.
  5. The method of claim 4, characterized in that the backup name node backs up the name node at regular intervals and guarantees normal operation through automatic switchover.
  6. The method of claim 5, characterized in that when a client creates a file, the client first caches the file data in a local temporary file; only when the data accumulated in this temporary file reaches a threshold does the client initiate a connection to the name node.
  7. The method of claim 6, characterized in that each client sets valid-flag status information for the metadata in the data nodes; when a system failure occurs, the valid-flag status information prior to the operation is updated and stored in a log.
  8. The method of claim 7, characterized in that triggering an indication of whether waiting is needed and making a log record comprises: setting a rollback value in the data node for the consistency of the primary and secondary shards, the rollback value being used to indicate the tolerance for inconsistency between the replica and the primary; when either the primary or the secondary shard is unable to work, data inconsistency between the primary and secondary shards is allowed, and the transactions of the cluster as a whole are allowed to be forcibly committed.
  9. The method of claim 8, characterized in that when the name node detects that the number of replicas in the system is lower than a preset replica-count threshold, the name node detects the data blocks contained in the lost replicas and, when the system is idle, copies the missing replicas to reach the preset replica-count threshold; the operations that need to be recovered are detected from the event log, and a process is then invoked to complete them.
  10. A data configuration system for big data, the system comprising a name node, data nodes and clients, and being configured to implement the method of any one of claims 1-9.
PCT/CN2022/139829 2022-09-07 2022-12-18 一种大数据的数据配置方法和系统 WO2024051027A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211091952.XA CN115168367B (zh) 2022-09-07 2022-09-07 一种大数据的数据配置方法和系统
CN202211091952.X 2022-09-07

Publications (1)

Publication Number Publication Date
WO2024051027A1 true WO2024051027A1 (zh) 2024-03-14

Family

ID=83481954

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/139829 WO2024051027A1 (zh) 2022-09-07 2022-12-18 一种大数据的数据配置方法和系统

Country Status (2)

Country Link
CN (1) CN115168367B (zh)
WO (1) WO2024051027A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115168367B (zh) * 2022-09-07 2022-11-25 太极计算机股份有限公司 一种大数据的数据配置方法和系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160105370A1 (en) * 2014-10-10 2016-04-14 Pegasystems Inc. Event processing with enhanced throughput
CN105930498A (zh) * 2016-05-06 2016-09-07 中国银联股份有限公司 一种分布式数据库的管理方法及系统
CN106682227A (zh) * 2017-01-06 2017-05-17 郑州云海信息技术有限公司 基于分布式文件系统的日志数据存储系统及读写方法
CN112711596A (zh) * 2019-10-24 2021-04-27 阿里巴巴集团控股有限公司 多副本数据库系统、数据处理方法、电子设备以及计算机可读存储介质
CN115168367A (zh) * 2022-09-07 2022-10-11 太极计算机股份有限公司 一种大数据的数据配置方法和系统

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10795881B2 (en) * 2015-12-18 2020-10-06 Sap Se Table replication in a database environment
CN108572976A (zh) * 2017-03-10 2018-09-25 华为软件技术有限公司 一种分布式数据库中数据恢复方法、相关设备和系统
CN109976941B (zh) * 2017-12-28 2022-12-13 华为技术有限公司 一种数据恢复方法和装置
CN110413687B (zh) * 2019-05-09 2024-01-05 国网冀北电力有限公司 基于节点互证校验的分布式事务故障处理方法及相关设备
CN114637475B (zh) * 2022-04-13 2024-06-25 苏州浪潮智能科技有限公司 一种分布式存储系统控制方法、装置及可读存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160105370A1 (en) * 2014-10-10 2016-04-14 Pegasystems Inc. Event processing with enhanced throughput
CN105930498A (zh) * 2016-05-06 2016-09-07 中国银联股份有限公司 一种分布式数据库的管理方法及系统
CN106682227A (zh) * 2017-01-06 2017-05-17 郑州云海信息技术有限公司 基于分布式文件系统的日志数据存储系统及读写方法
CN112711596A (zh) * 2019-10-24 2021-04-27 阿里巴巴集团控股有限公司 多副本数据库系统、数据处理方法、电子设备以及计算机可读存储介质
CN115168367A (zh) * 2022-09-07 2022-10-11 太极计算机股份有限公司 一种大数据的数据配置方法和系统

Also Published As

Publication number Publication date
CN115168367B (zh) 2022-11-25
CN115168367A (zh) 2022-10-11

Similar Documents

Publication Publication Date Title
US11755415B2 (en) Variable data replication for storage implementing data backup
US8793531B2 (en) Recovery and replication of a flash memory-based object store
KR101914019B1 (ko) 분산 데이터베이스 시스템들을 위한 고속 장애 복구
US7487311B2 (en) System and method for asynchronous backup of virtual disks in a distributed storage array
US7299378B2 (en) Geographically distributed clusters
KR101771246B1 (ko) 분산 데이터 시스템들을 위한 전 시스템에 미치는 체크포인트 회피
US9424140B1 (en) Providing data volume recovery access in a distributed data store to multiple recovery agents
US8868487B2 (en) Event processing in a flash memory-based object store
US8666939B2 (en) Approaches for the replication of write sets
WO2023046042A1 (zh) 一种数据备份方法和数据库集群
JP2019036353A (ja) 索引更新パイプライン
AU2005207572B2 (en) Cluster database with remote data mirroring
JP2016524750A5 (zh)
US10452680B1 (en) Catch-up replication with log peer
US10803012B1 (en) Variable data replication for storage systems implementing quorum-based durability schemes
WO2024051027A1 (zh) 一种大数据的数据配置方法和系统
US11947493B2 (en) Techniques for archived log deletion

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22957986

Country of ref document: EP

Kind code of ref document: A1