WO2022121387A1 - Data storage method, device, server and medium - Google Patents


Info

Publication number
WO2022121387A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
storage
server
stored
information
Application number
PCT/CN2021/116105
Other languages
English (en)
French (fr)
Inventor
武金剑
谢永恒
万月亮
Original Assignee
北京锐安科技有限公司
Priority date
2020-12-11
Application filed by 北京锐安科技有限公司
Publication of WO2022121387A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Definitions

  • the present application relates to data processing technology, for example, to a data storage method, device, server and medium.
  • the present application provides a data storage method, device, server and medium, so as to separate storage resource scheduling from data storage resources and avoid wasting hardware resources.
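  • a data storage method comprising:
  • acquiring data information of the data to be stored through a metadata gateway;
  • sending, according to the data information, the data to be stored to the storage server corresponding to the data information through a scheduling server for storage.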
  • a data storage device comprising:
  • a data information acquisition module configured to acquire the data information of the data to be stored through the metadata gateway;
  • the data saving module is configured to send the data to be stored to the storage server corresponding to the data information through the scheduling server according to the data information for saving.
  • a server is also provided, wherein the server includes:
  • one or more processors;
  • a storage apparatus configured to store one or more programs;
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the data storage method provided by any embodiment of the present application.
  • a computer-readable storage medium is also provided, storing a computer program, wherein, when the computer program is executed by a processor, the data storage method provided by any embodiment of the present application is implemented.
  • FIG. 1 is a flowchart of a data storage method in Embodiment 1 of the present application;
  • FIG. 2 is a schematic diagram of the processing flow of stored data based on an erasure code algorithm in Embodiment 1 of the present application;
  • FIG. 3 is a schematic diagram of storage processing according to a data name in Embodiment 1 of the present application;
  • FIG. 4 is a flowchart of a data storage method in Embodiment 2 of the present application;
  • FIG. 5 is a structural diagram of a data storage device in Embodiment 3 of the present application;
  • FIG. 6 is a schematic structural diagram of a server in Embodiment 4 of the present application.
  • FIG. 1 is a flowchart of a data storage method according to Embodiment 1 of the present application. This embodiment is applicable to the case of storing data. The method can be executed by a data storage device, and includes the following steps.
  • data is computed in real time and offline. During or after computation, the data is stored in different database components according to its data type. Saving the data requires allocating storage resources through storage resource scheduling; conventionally, storage resource scheduling is bound to the storage resources and both functions are implemented on unified servers. For example, storage resource scheduling is performed by Yet Another Resource Negotiator (YARN) in the Hadoop ecosystem, which also provides the Hadoop Distributed File System (HDFS).
  • YARN is a general resource management system that can provide unified resource management and scheduling for upper-layer applications. Its introduction brings benefits to the cluster in terms of utilization, unified resource management, and data sharing.
  • Storage resource scheduling is bound to storage resources, and storage resource scheduling capabilities often do not match storage resources. When one of them is insufficient and needs to be expanded, it can only be expanded at the same time, resulting in a waste of hardware resources. Therefore, it is necessary to separate storage resource scheduling from storage resources.
  • Storage resource scheduling is completed by a separate resource scheduling server, and storage functions are completed by a separate storage server, thereby realizing the separation of storage resource scheduling and storage resources and avoiding the waste of hardware resources.
  • the first step is to enable the HBase metadata gateway switch: log in to the storage management node and run a command through the command-execution interface to open the coexistence metadata gateway.
  • the second step is to configure a routing policy for forwarding the data to be stored to the remote server for storage.
  • the third step is to modify the custom parameters of HDFS so that data to be stored that matches the custom parameters is sent to the remote server for storage. After modifying the custom parameters, restart the servers so that all servers obtain the configured routing policy and the modified custom parameters.
  • the fourth step is to configure a routing policy for forwarding the data to be stored to the local server.
  • the storage capacity of the storage server needs to meet the storage capacity requirements of business processing; this can be checked by calculating the effective storage space of the storage server and comparing it with that of the original storage server. Related solutions often suffer from mismatched storage resource scheduling capability and excessive redundancy of storage resources.
  • the number of central processing unit (CPU) cores and the storage capacity can be reduced moderately according to business analysis or actual measurement, thereby reducing hardware costs.
  • the stored data is metadata, which is the data that describes the data attributes and information about the environment.
  • the metadata gateway obtains data information of the data to be stored, such as its storage location, data name, data size and data version. After storage resource scheduling is separated from the storage resources, the data information of the data to be stored is obtained through the metadata gateway so that the data can be saved to the corresponding storage location.
  • the metadata gateway provides a unified file system access entry for upper-layer big data computing applications and identifies the storage server that the data needs to access.
  • the data to be stored is sent to the storage server corresponding to the data information through the scheduling server for storage.
  • the data information of the data to be stored is obtained through the metadata gateway, and the scheduling server allocates the storage server according to the data information of the data to be stored to save the data to be stored.
  • the data information includes data attribute information, and the storage server includes a big data storage server and a small data storage server; sending, according to the data information, the data to be stored to the storage server corresponding to the data information through the scheduling server for storage includes: sending, according to the data attribute information, the data to be stored to the big data storage server or the small data storage server through the scheduling server for storage.
  • the information about the data to be stored obtained through the metadata gateway includes the data size. For a large volume of data, the scheduling server schedules the big data storage server to store it; for a small volume of data, the scheduling server schedules the small data storage server to store it.
  • the big data storage server can use the HDFS storage architecture, which lets users develop distributed programs without knowing the underlying details of distribution and make full use of the cluster for high-speed computing and storage.
  • HDFS provides storage for massive data.
  • the HDFS storage architecture has the characteristics of high fault tolerance and improves the security of data storage.
  • the small data storage server can use distributed storage or relational storage and other storage methods.
  • the scheduling server includes: a first scheduling server and a second scheduling server; wherein the first scheduling server corresponds to the big data storage server; the second scheduling server corresponds to the small data storage server.
  • the scheduling server is divided into a first scheduling server corresponding to the big data storage server and a second scheduling server corresponding to the small data storage server.
  • the size information of the data to be stored is obtained through the metadata gateway, and when the data to be stored is big data, the storage resources are scheduled through the first scheduling server, and the data to be stored is stored in the big data storage server.
  • when the data to be stored is small data, storage resource scheduling is performed by the second scheduling server, and the data to be stored is stored in the small data storage server.
  • the data information includes: data name information; the big data storage server and the small data storage server include: a local storage server and a remote storage server; Sending the data to the storage server corresponding to the data information for storage includes: sending the data to be stored to the local storage server or the remote storage server through the scheduling server according to the data name information for storage.
  • the data information obtained through the metadata gateway also includes data name information, the storage address includes the local server and the remote server, and the scheduling server sends the data to be stored to the local server or the remote server for storage according to the data name information.
  • tables whose names start with test_, followed by four digits and ending in 01, are written to remote server storage, and tables with other names are written to local HDFS storage. As shown in FIG. 3, test_202001 is written to the remote storage server, and test_202000 is written to the local HDFS.
  • the files to be stored will be stored locally first, and the data stored remotely is read-only data.
  • during the business go-live adjustment phase, the gateway is configured to forward requests transparently to the remote end for processing; this is generally used only temporarily while old and new systems coexist during migration.
  • after all upper-layer applications have been migrated, the routing policy is changed to the mode actually required.
  • when scheduling storage resources, the scheduling server takes the capacities of multiple storage servers into account and schedules according to either the absolute remaining capacity of each storage server or the ratio of its remaining capacity to its total capacity, so that storage resources remain balanced across the storage servers.
  • the data information of the data to be stored is obtained through the metadata gateway, and the data is sent, according to the data information, to the corresponding storage server through the scheduling server for storage. This solves the problem that storage resource scheduling capability and storage resources are often mismatched, so that when one of them is insufficient and needs to be expanded, both must be expanded together, wasting hardware resources. Storage resource scheduling is separated from the storage resources, and hardware resources are not wasted.
  • FIG. 4 is a flowchart of a data storage method provided in Embodiment 2 of the present application.
  • This embodiment is described on the basis of the previous embodiment, and the data information further includes: data type;
  • sending, through the scheduling server, the data to be stored to the storage server corresponding to the data information includes: sending, through the scheduling server, the data to be stored to the database in the storage server that corresponds to the data type for storage.
  • the data to be stored is saved to the corresponding database according to the data type, which facilitates the management of the stored data.
  • S220 Send the data to be stored to a database corresponding to the data type in the storage server through the scheduling server for storage.
  • the data information of the data to be stored obtained through the metadata gateway further includes the data type.
  • the database components include but are not limited to HBase, Druid, Greenplum, JanusGraph, and Solr, so as to satisfy the storage of data of different data types.
  • sending, through the scheduling server, the data to be stored to the database in the storage server that corresponds to the data type includes: adjusting the ratio in which the storage server divides the data to be stored into data blocks and check blocks;
  • dividing, by the storage server according to the ratio, the data to be stored into a first preset number of data blocks and a second preset number of check blocks;
  • saving, according to the data type, the data blocks and the check blocks in the database corresponding to the data type.
  • replication is a data reliability protection technique in distributed storage systems. By storing multiple identical copies of the same data on different nodes, it keeps external storage requests uninterrupted in the event of a single point of failure, such as a node or hard disk failure, by reading a redundant copy.
  • erasure coding is another data protection mechanism: data is split into fragments, redundant data blocks are computed by extension and encoding, and the blocks are stored in different locations, such as disks, storage nodes or other geographic locations. Compared with replication, erasure coding achieves higher storage utilization at lower cost. When data is stored, it is divided into data blocks and check blocks; when a data block is lost, the data can be recovered from the remaining data blocks and the check blocks, preventing data loss.
  • the ratio of dividing the original data to be stored into data blocks and check blocks is adjusted, the data blocks are adjusted to a first preset number, and the check blocks are adjusted to a second preset number.
  • the data to be stored is divided according to the newly adjusted ratio of the data block and the check block, so as to improve the utilization rate of the disk.
  • adjusting the ratio of dividing the data to be stored into data blocks and check blocks by the storage server includes: increasing the number of data blocks that the storage server divides the data to be stored into, so as to increase the ratio of the data blocks.
  • with an original ratio of data blocks to check blocks of 4:2, the disk utilization is 66.66%;
  • with a ratio of 5:1, the disk utilization is 83%;
  • after the ratio of data blocks to check blocks is adjusted to 22:2, the disk utilization is 91.67%.
  • the number of parity blocks can be set to 1 or 2.
  • the number of parity blocks must be less than the number of data blocks.
  • the number of parity blocks can be determined according to the number of storage servers.
  • when two check blocks are configured, one of them is a backup check block: when one check block is damaged or lost, the other is enabled.
  • the data information of the data to be stored, which further includes a data type, is obtained through the metadata gateway, and the scheduling server sends the data to be stored to the database in the storage server that corresponds to the data type. This solves the problem that storage resource scheduling capability and storage resources often do not match, so that when one of them is insufficient and needs to be expanded, both must be expanded together, wasting hardware resources. Storage resource scheduling is separated from the storage resources, avoiding the waste of hardware resources.
  • FIG. 5 is a structural diagram of a data storage device according to Embodiment 3 of the present application.
  • the data storage device includes a data information acquisition module 310 and a data saving module 320.
  • the data information acquisition module 310 is configured to obtain the data information of the data to be stored through the metadata gateway; the data saving module 320 is configured to send, according to the data information, the data to be stored to the storage server corresponding to the data information through the scheduling server for saving.
  • the data information includes: data attribute information;
  • the storage server includes: a big data storage server and a small data storage server.
  • the data saving module 320 is configured to send, according to the data attribute information, the data to be stored to the big data storage server or the small data storage server through the scheduling server for saving.
  • the scheduling server includes: a first scheduling server and a second scheduling server; wherein the first scheduling server corresponds to the big data storage server; the second scheduling server corresponds to the small data storage server.
  • the data information includes: data name information;
  • the big data storage server and the small data storage server include: a local storage server and a remote storage server;
  • the data saving module 320 is further configured to send the data to be stored to the local storage server or the remote storage server through the scheduling server according to the data name information for saving.
  • the data information further includes: data type;
  • the data saving module 320 is further configured to send the to-be-stored data to a database corresponding to the data type in the storage server through the scheduling server for saving.
  • the data saving module 320 includes:
  • a proportion adjustment unit configured to adjust the proportion in which the storage server divides the data to be stored into data blocks and check blocks;
  • a storage data dividing unit configured to divide the to-be-stored data into a first preset number of data blocks and a second preset number of check blocks by the storage server according to the ratio;
  • a data saving unit configured to save the data block and the check block in a database corresponding to the data type according to the data type.
  • the proportion adjustment unit includes:
  • the subunit for increasing the number of data blocks is configured to increase the number of data blocks that the storage server divides the data to be stored into, so as to increase the proportion of the data blocks.
  • the data information of the data to be stored is obtained through the metadata gateway, and the data is sent, according to the data information, to the corresponding storage server through the scheduling server for storage. This solves the problem that storage resource scheduling capability and storage resources are often mismatched, so that when one of them is insufficient and needs to be expanded, both must be expanded together, wasting hardware resources. Storage resource scheduling is separated from the storage resources, and hardware resources are not wasted.
  • the data storage device provided by the embodiment of the present application can execute the data storage method provided by any embodiment of the present application, and has functional modules and effects corresponding to the execution method.
  • FIG. 6 is a schematic structural diagram of a server according to Embodiment 4 of the present application.
  • the server includes a processor 410, a memory 420, an input device 430 and an output device 440. The server may have one or more processors 410; one processor 410 is taken as an example in FIG. 6. The processor 410, memory 420, input device 430 and output device 440 in the server may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 6.
  • the memory 420 can be configured to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the data storage method in the embodiments of the present application (for example, the data information acquisition module 310 and the data saving module 320 in the data storage device).
  • the processor 410 executes various functional applications and data processing of the server by running the software programs, instructions and modules stored in the memory 420, that is, to implement the above-mentioned data storage method.
  • the memory 420 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the terminal, and the like. Additionally, memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some instances, memory 420 may include memory located remotely from processor 410, which may be connected to a server through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • the input device 430 may be configured to receive input numerical or character information, and to generate key signal input related to user settings and function control of the server.
  • the output device 440 may include a display device such as a display screen.
  • Embodiment 5 of the present application further provides a storage medium containing computer-executable instructions, where the computer-executable instructions are used to execute a data storage method when executed by a computer processor, and the method includes:
  • acquiring data information of the data to be stored through a metadata gateway;
  • sending, according to the data information, the data to be stored to the storage server corresponding to the data information through the scheduling server for storage.
  • in the storage medium containing computer-executable instructions provided by an embodiment of the present application, the computer-executable instructions are not limited to the method operations described above and can also perform related operations in the data storage method provided by any embodiment of the present application.
  • the present application can be implemented by means of software and general hardware, and can also be implemented by hardware.
  • the present application can be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk or optical disc, and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.
  • in the above embodiment of the data storage device, the units and modules included are divided only according to functional logic and are not limited to this division, as long as the corresponding functions can be implemented;
  • the names are only for the convenience of distinguishing from each other, and are not used to limit the protection scope of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

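A data storage method, device, server and medium. The data storage method comprises: acquiring data information of data to be stored through a metadata gateway (S110); and sending, according to the data information, the data to be stored to a storage server corresponding to the data information through a scheduling server for storage (S120).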

Description

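Data storage method, device, server and medium
This application claims priority to Chinese Patent Application No. 202011461108.2, filed with the Chinese Patent Office on December 11, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to data processing technology, for example, to a data storage method, device, server and medium.
Background
With the development of science and technology, big data, as a product of the technological era, has been widely applied.
When big data is stored, storage resource scheduling and data storage resources are not separated. As a result, when either the storage resource scheduling capability or the data storage resources required by big data business processing become insufficient and must be expanded, both have to be expanded together, which wastes hardware resources.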
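Summary
The present application provides a data storage method, device, server and medium, so as to separate storage resource scheduling from data storage resources and avoid wasting hardware resources.
A data storage method is provided, comprising:
acquiring data information of data to be stored through a metadata gateway;
sending, according to the data information, the data to be stored to a storage server corresponding to the data information through a scheduling server for storage.
A data storage device is further provided, comprising:
a data information acquisition module, configured to acquire data information of data to be stored through a metadata gateway;
a data saving module, configured to send, according to the data information, the data to be stored to a storage server corresponding to the data information through a scheduling server for storage.
A server is further provided, wherein the server comprises:
one or more processors;
a storage apparatus, configured to store one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the data storage method provided by any embodiment of the present application.
A computer-readable storage medium is further provided, storing a computer program, wherein when the computer program is executed by a processor, the data storage method provided by any embodiment of the present application is implemented.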
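Brief Description of the Drawings
FIG. 1 is a flowchart of a data storage method in Embodiment 1 of the present application;
FIG. 2 is a schematic diagram of the processing flow of stored data based on an erasure code algorithm in Embodiment 1 of the present application;
FIG. 3 is a schematic diagram of storage processing according to a data name in Embodiment 1 of the present application;
FIG. 4 is a flowchart of a data storage method in Embodiment 2 of the present application;
FIG. 5 is a structural diagram of a data storage device in Embodiment 3 of the present application;
FIG. 6 is a schematic structural diagram of a server in Embodiment 4 of the present application.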
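Detailed Description
The present application is described below with reference to the accompanying drawings and embodiments. The embodiments described herein merely explain the present application and do not limit it. For ease of description, the drawings show only the parts related to the present application, not the entire structure.
Embodiment 1
FIG. 1 is a flowchart of a data storage method provided in Embodiment 1 of the present application. This embodiment is applicable to storing data. The method may be executed by a data storage device and includes the following steps.
S110: Acquire data information of the data to be stored through a metadata gateway.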
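Data is computed in real time and offline according to the needs of different computing scenarios. During or after computation, the data is stored in different database components according to its data type. To save the data, storage resources must be allocated through storage resource scheduling. Conventionally, storage resource scheduling is bound to the storage resources and both functions are implemented on unified servers; for example, storage resource scheduling is performed by Yet Another Resource Negotiator (YARN) in the Hadoop ecosystem, which also provides the Hadoop Distributed File System (HDFS). YARN is a general resource management system that provides unified resource management and scheduling for upper-layer applications, and its introduction benefits a cluster in utilization, unified resource management and data sharing. However, because storage resource scheduling is then bound to the storage resources, the scheduling capability and the storage resources often do not match: when either one is insufficient and needs to be expanded, both must be expanded at the same time, which wastes hardware resources. Storage resource scheduling therefore needs to be separated from the storage resources: scheduling is performed by a separate resource scheduling server and storage by a separate storage server, so that the two are decoupled and hardware resources are not wasted.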
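Optionally, separating storage resource scheduling from the storage resources requires parameter configuration. Step 1: enable the HBase metadata gateway switch, log in to the storage management node, and run a command through the command-execution interface to open the coexistence metadata gateway. Step 2: configure a routing policy that forwards data to be stored to a remote server for storage. Step 3: modify the custom parameters of HDFS so that data to be stored that matches the custom parameters is sent to the remote server for storage; after modifying the custom parameters, restart the servers so that all servers obtain the configured routing policy and the modified custom parameters. Step 4: configure a routing policy that forwards data to be stored to the local server.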
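Optionally, the number of central processing unit (CPU) cores of the scheduling server is calculated, then compared and matched against the core count of the servers in the related solution. The storage capability of the storage servers must meet the storage capacity requirements of business processing, which can be checked by calculating the effective storage space of the storage servers and comparing it with the effective storage space of the original storage servers. Related solutions often suffer from mismatched storage resource scheduling capability and excessive redundancy of storage resources; the CPU core count and the storage capacity can be reduced moderately according to business analysis or actual measurement, thereby reducing hardware costs.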
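The stored data is metadata, that is, data that describes data attributes and environment information. Data information of the data to be stored, such as its storage location, data name, data size and data version, is acquired through the metadata gateway. After storage resource scheduling is separated from the storage resources, the data information of the data to be stored is acquired through the metadata gateway so that the data can be saved to the corresponding storage location. The metadata gateway provides a unified file system access entry for upper-layer big data computing applications and identifies the storage server that the data needs to access.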
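S120: Send, according to the data information, the data to be stored to the storage server corresponding to the data information through the scheduling server for storage.
After the data information of the data to be stored is acquired through the metadata gateway, the scheduling server allocates a storage server according to that data information to save the data to be stored.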
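Steps S110 and S120 can be illustrated with a minimal in-memory sketch. It assumes the metadata gateway derives the data information (name, size, version) directly from the write request, and that the scheduling server is a pure placement service that holds no data of its own, which is the separation this embodiment aims for. All class and function names (`MetadataGateway`, `SchedulingServer`, `StorageServer`, `store`) are hypothetical and not taken from the patent or any product.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class DataInfo:
    """Data information produced by the (hypothetical) metadata gateway in S110."""
    name: str
    size: int            # bytes
    version: int = 1
    location: str = ""   # filled in once a storage server has been chosen


class MetadataGateway:
    """Unified access entry: turns a write request into data information."""

    def get_data_info(self, name: str, payload: bytes, version: int = 1) -> DataInfo:
        return DataInfo(name=name, size=len(payload), version=version)


class StorageServer:
    """Holds data only; it makes no scheduling decisions."""

    def __init__(self, server_id: str) -> None:
        self.server_id = server_id
        self.objects: Dict[str, bytes] = {}

    def save(self, info: DataInfo, payload: bytes) -> str:
        self.objects[info.name] = payload
        return self.server_id


class SchedulingServer:
    """Chooses a storage server from data information only; stores nothing itself."""

    def __init__(self, servers: Dict[str, StorageServer],
                 policy: Callable[[DataInfo], str]) -> None:
        self.servers = servers
        self.policy = policy  # maps DataInfo -> storage server id

    def dispatch(self, info: DataInfo, payload: bytes) -> str:
        return self.servers[self.policy(info)].save(info, payload)


def store(gateway: MetadataGateway, scheduler: SchedulingServer,
          name: str, payload: bytes) -> DataInfo:
    info = gateway.get_data_info(name, payload)        # S110
    info.location = scheduler.dispatch(info, payload)  # S120
    return info
```

For example, with `servers = {"a": StorageServer("a"), "b": StorageServer("b")}` and a policy such as `lambda info: "a" if info.size < 1024 else "b"`, calling `store(MetadataGateway(), SchedulingServer(servers, policy), "t1", b"abc")` returns data information whose `location` is `"a"`. Because scheduling and storage live in separate objects, either pool could be grown without touching the other.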
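Optionally, the data information includes data attribute information, and the storage server includes a big data storage server and a small data storage server; sending, according to the data information, the data to be stored to the storage server corresponding to the data information through the scheduling server for storage includes: sending, according to the data attribute information, the data to be stored to the big data storage server or the small data storage server through the scheduling server for storage. The information about the data to be stored obtained through the metadata gateway includes the data size: for a large volume of data, the scheduling server schedules the big data storage server to store it; for a small volume of data, the scheduling server schedules the small data storage server to store it. Exemplarily, as shown in FIG. 2, the big data storage server may use the HDFS storage architecture, which lets users develop distributed programs without knowing the underlying details of distribution and make full use of the cluster for high-speed computing and storage. HDFS provides storage for massive data, and its high fault tolerance improves the safety of data storage. The small data storage server may use distributed storage, relational storage, or other storage methods.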
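Optionally, the scheduling server includes a first scheduling server and a second scheduling server, wherein the first scheduling server corresponds to the big data storage server and the second scheduling server corresponds to the small data storage server. The size of the data to be stored is obtained through the metadata gateway. When the data to be stored is big data, storage resources are scheduled by the first scheduling server and the data is stored in the big data storage server; when it is small data, storage resources are scheduled by the second scheduling server and the data is stored in the small data storage server.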
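Size-based routing through two dedicated scheduling servers might look like the sketch below. The patent does not say where "big" data ends and "small" data begins, so the 128 MiB `BIG_DATA_THRESHOLD` is purely an assumed value, and `TierScheduler` and `route_by_size` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed cut-off between "big" and "small" data; the patent gives no number.
BIG_DATA_THRESHOLD = 128 * 1024 * 1024  # 128 MiB


@dataclass
class TierScheduler:
    """A scheduling server dedicated to one storage tier (hypothetical model)."""
    name: str
    tier: str
    placed: List[Tuple[str, int]] = field(default_factory=list)

    def schedule(self, data_name: str, size: int) -> str:
        # Record the placement decision and report which tier received the data.
        self.placed.append((data_name, size))
        return self.tier


FIRST_SCHEDULER = TierScheduler("first", "big data storage server")     # e.g. HDFS
SECOND_SCHEDULER = TierScheduler("second", "small data storage server")  # e.g. relational store


def route_by_size(data_name: str, size: int,
                  threshold: int = BIG_DATA_THRESHOLD) -> str:
    """Big data goes through the first scheduler, small data through the second."""
    scheduler = FIRST_SCHEDULER if size >= threshold else SECOND_SCHEDULER
    return scheduler.schedule(data_name, size)
```

Keeping one scheduler per tier means each can be sized to the request rate of its own tier, which is the point of pairing the first and second scheduling servers with the big and small data storage servers.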
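Optionally, the data information includes data name information, and the big data storage server and the small data storage server include a local storage server and a remote storage server; sending, according to the data information, the data to be stored to the storage server corresponding to the data information through the scheduling server for storage includes: sending, according to the data name information, the data to be stored to the local storage server or the remote storage server through the scheduling server for storage. The data information obtained through the metadata gateway further includes data name information, the storage destinations include a local server and a remote server, and the scheduling server sends the data to be stored to the local server or the remote server according to the data name information. Exemplarily, tables whose names start with test_, followed by four digits and ending in 01, are written to remote server storage, and tables with other names are written to local HDFS storage. As shown in FIG. 3, test_202001 is written to the remote storage server and test_202000 is written to the local HDFS.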
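The name rule in this example can be expressed as a regular expression. This is a sketch assuming the rule is exactly "`test_`, four digits, then `01`" with nothing before or after; in the described system the rule is part of the gateway's routing policy, whose syntax the patent does not show, and `route_by_name` is a hypothetical helper.

```python
import re

# Assumed reading of the example rule: "test_" + four digits + "01".
REMOTE_TABLE_PATTERN = re.compile(r"test_\d{4}01")


def route_by_name(table_name: str) -> str:
    """Return the storage target for a table according to its name."""
    if REMOTE_TABLE_PATTERN.fullmatch(table_name):
        return "remote storage server"
    return "local HDFS"


for table in ("test_202001", "test_202000", "orders"):
    print(table, "->", route_by_name(table))
# test_202001 -> remote storage server
# test_202000 -> local HDFS
# orders -> local HDFS
```

`fullmatch` is used rather than `match` so that names such as `test_2020011` or `xtest_202001` fall through to local storage instead of being sent to the remote server by a prefix hit.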
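Files to be stored are stored locally first, and remotely stored data is read-only. During the business go-live adjustment phase, the gateway is configured to forward requests transparently to the remote end for processing; this is generally used only temporarily while old and new systems coexist during migration. After all upper-layer applications have been migrated, the routing policy is changed to the mode actually required. When scheduling storage resources, the scheduling server takes the capacities of multiple storage servers into account and schedules according to either the absolute remaining capacity of each storage server or the ratio of its remaining capacity to its total capacity, so that storage resources stay balanced across the storage servers.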
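Both balancing criteria mentioned above fit into one selection function. The sketch assumes the scheduler knows each server's total and used capacity and simply picks the server with the most free space among those that can hold the data; nothing else (load, locality) is modeled, and `StorageNode` and `pick_server` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StorageNode:
    """Capacity view of one storage server (units are arbitrary, e.g. GiB)."""
    name: str
    total: int
    used: int

    @property
    def remaining(self) -> int:
        return self.total - self.used

    @property
    def remaining_ratio(self) -> float:
        return self.remaining / self.total


def pick_server(nodes: Sequence[StorageNode], size: int,
                by_ratio: bool = True) -> StorageNode:
    """Pick the node with the most free space, by ratio or by absolute value."""
    candidates = [n for n in nodes if n.remaining >= size]
    if not candidates:
        raise RuntimeError("no storage server has enough remaining capacity")
    if by_ratio:
        return max(candidates, key=lambda n: n.remaining_ratio)
    return max(candidates, key=lambda n: n.remaining)
```

The two criteria can disagree: a 100-unit server with 50 free (50%) wins by ratio over a 1000-unit server with 200 free (20%), while the larger server wins by absolute remaining capacity. Balancing by ratio keeps servers of different sizes filling at the same pace.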
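In the technical solution of this embodiment, the data information of the data to be stored is acquired through the metadata gateway, and the data to be stored is sent, according to the data information, to the corresponding storage server through the scheduling server for storage. This solves the problem that storage resource scheduling capability and storage resources often do not match, so that when one of them is insufficient and needs to be expanded both must be expanded together, wasting hardware resources; storage resource scheduling is separated from the storage resources, and hardware resources are not wasted.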
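Embodiment 2
FIG. 4 is a flowchart of a data storage method provided in Embodiment 2 of the present application. This embodiment builds on the previous one. The data information further includes a data type; sending, according to the data information, the data to be stored to the storage server corresponding to the data information through the scheduling server for storage includes: sending, through the scheduling server, the data to be stored to the database in the storage server that corresponds to the data type for storage. Saving the data to be stored in the database corresponding to its data type makes the stored data easier to manage.
As shown in FIG. 4, the method includes the following steps.
S210: Acquire data information of the data to be stored through the metadata gateway, the data information including a data type.
S220: Send, through the scheduling server, the data to be stored to the database in the storage server that corresponds to the data type for storage.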
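The data information of the data to be stored obtained through the metadata gateway further includes the data type, and the data to be stored is saved in the corresponding database in the storage server according to that type. Optionally, the database components include, but are not limited to, HBase, Druid, Greenplum, JanusGraph and Solr, so that data of different types can be stored.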
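Dispatch by data type reduces to a lookup table from type to database component. The type labels and their pairing with HBase, Druid, Greenplum, JanusGraph and Solr below are assumptions based on what each system is commonly used for (wide-column storage, real-time event analytics, MPP SQL, graphs, full-text search); the patent lists the components but not the mapping, and `database_for` is a hypothetical helper.

```python
from typing import Dict

# Hypothetical type labels mapped to the database components listed in the text.
DATABASE_BY_TYPE: Dict[str, str] = {
    "wide_column": "HBase",
    "event_analytics": "Druid",
    "relational": "Greenplum",
    "graph": "JanusGraph",
    "full_text": "Solr",
}


def database_for(data_type: str) -> str:
    """Return the database component that stores data of the given type."""
    try:
        return DATABASE_BY_TYPE[data_type]
    except KeyError:
        raise ValueError(f"no database component configured for type {data_type!r}") from None
```

Failing loudly on an unknown type, rather than falling back to a default database, keeps misclassified data from silently landing in a component that cannot query it.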
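Optionally, sending, through the scheduling server, the data to be stored to the database in the storage server that corresponds to the data type includes: adjusting the ratio in which the storage server divides the data to be stored into data blocks and check blocks; dividing, by the storage server according to the ratio, the data to be stored into a first preset number of data blocks and a second preset number of check blocks; and saving, according to the data type, the data blocks and the check blocks in the database corresponding to the data type.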
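Replication is a data reliability protection technique in distributed storage systems: multiple identical copies of the same data are stored on different nodes, so that when a single point fails, such as a node or hard disk, external storage requests can continue uninterrupted by reading a redundant copy. Erasure coding is another data protection mechanism: data is split into fragments, redundant data blocks are computed by extension and encoding, and the blocks are stored in different locations, such as disks, storage nodes or other geographic locations. Compared with replication, erasure coding achieves higher storage utilization at lower cost. When data is stored, it is divided into data blocks and check blocks; when a data block is lost, the data can be recovered from the remaining data blocks and the check blocks, preventing data loss.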
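The recovery property described above can be demonstrated with the simplest erasure code: a single XOR check block computed over k data blocks (the RAID-5 idea). This is a swapped-in illustration, not the patent's algorithm: one XOR check block can rebuild only one lost block, whereas codes such as Reed-Solomon tolerate as many lost blocks as there are check blocks. `encode` and `recover` are hypothetical helper names.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal data blocks (zero-padded) plus one XOR check block."""
    if k < 1:
        raise ValueError("k must be at least 1")
    block_len = max(1, -(-len(data) // k))  # ceiling division
    padded = data.ljust(block_len * k, b"\0")
    blocks = [padded[i * block_len:(i + 1) * block_len] for i in range(k)]
    return blocks + [reduce(xor_bytes, blocks)]


def recover(blocks: List[Optional[bytes]]) -> List[Optional[bytes]]:
    """Rebuild at most one missing block (marked None) from all surviving blocks."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("a single XOR check block can repair only one lost block")
    repaired = list(blocks)
    if missing:
        survivors = [b for b in blocks if b is not None]
        # XOR of all k+1 blocks is zero, so the missing one is the XOR of the rest.
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired
```

Usage: `stripe = encode(b"metadata gateway demo", 4)` yields four data blocks and one check block; setting any one entry to `None` and calling `recover(stripe)` restores it, whether the lost block was a data block or the check block.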
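Optionally, the original ratio in which the data to be stored is divided into data blocks and check blocks is adjusted, setting the data blocks to a first preset number and the check blocks to a second preset number, and the data to be stored is then divided according to the newly adjusted ratio to improve disk utilization. Optionally, adjusting the ratio in which the storage server divides the data to be stored into data blocks and check blocks includes: increasing the number of data blocks into which the storage server divides the data to be stored, so as to increase the proportion of data blocks. Exemplarily, with an original ratio of data blocks to check blocks of 4:2, disk utilization is 66.66%; with a ratio of 5:1, it is 83%; after the ratio is adjusted to 22:2, disk utilization is 91.67%. Increasing the proportion of data blocks minimizes the equipment scale and disk space required in large data centers. One or two check blocks may be configured; the number of check blocks must be smaller than the number of data blocks and may be determined according to the number of storage servers. When two check blocks are configured, one of them is a backup check block that is enabled when the other check block is damaged or lost.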
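The utilization figures above follow from utilization = data blocks / (data blocks + check blocks); the text quotes only the results, so the formula is inferred from them. Exactly, 4:2 gives 2/3, 5:1 gives 5/6 and 22:2 gives 11/12, which match the quoted 66.66%, 83% and 91.67% after truncation or rounding. `disk_utilization` is a hypothetical helper whose guard encodes the stated rule that check blocks must be fewer than data blocks.

```python
from fractions import Fraction


def disk_utilization(data_blocks: int, check_blocks: int) -> Fraction:
    """Fraction of raw disk space that holds user data in a k:m layout."""
    if not 0 < check_blocks < data_blocks:
        raise ValueError("need 0 < check blocks < data blocks")
    return Fraction(data_blocks, data_blocks + check_blocks)


for k, m in [(4, 2), (5, 1), (22, 2)]:
    print(f"{k}:{m} -> {float(disk_utilization(k, m)):.2%}")
# 4:2 -> 66.67%
# 5:1 -> 83.33%
# 22:2 -> 91.67%
```

Using `Fraction` keeps the ratios exact, so the rounding shows up only at print time.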
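In the technical solution of this embodiment, the data information of the data to be stored, which further includes a data type, is acquired through the metadata gateway, and the scheduling server sends the data to be stored to the database in the storage server that corresponds to the data type. This solves the problem that storage resource scheduling capability and storage resources often do not match, so that when one of them is insufficient and needs to be expanded both must be expanded together, wasting hardware resources; storage resource scheduling is separated from the storage resources, and hardware resources are not wasted.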
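Embodiment 3
FIG. 5 is a structural diagram of a data storage device provided in Embodiment 3 of the present application. The data storage device includes a data information acquisition module 310 and a data saving module 320.
The data information acquisition module 310 is configured to acquire data information of data to be stored through a metadata gateway; the data saving module 320 is configured to send, according to the data information, the data to be stored to the storage server corresponding to the data information through a scheduling server for storage.
Optionally, the data information includes data attribute information, and the storage server includes a big data storage server and a small data storage server.
In the technical solution of the above embodiment, the data saving module 320 is configured to send, according to the data attribute information, the data to be stored to the big data storage server or the small data storage server through the scheduling server for storage.
Optionally, the scheduling server includes a first scheduling server and a second scheduling server, wherein the first scheduling server corresponds to the big data storage server and the second scheduling server corresponds to the small data storage server.
Optionally, the data information includes data name information, and the big data storage server and the small data storage server include a local storage server and a remote storage server;
In the technical solution of the above embodiment, the data saving module 320 is further configured to send, according to the data name information, the data to be stored to the local storage server or the remote storage server through the scheduling server for storage.
Optionally, the data information further includes a data type;
In the technical solution of the above embodiment, the data saving module 320 is further configured to send, through the scheduling server, the data to be stored to the database in the storage server that corresponds to the data type for storage.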
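In the technical solution of the above embodiment, the data saving module 320 includes:
a proportion adjustment unit, configured to adjust the proportion in which the storage server divides the data to be stored into data blocks and check blocks;
a stored data division unit, configured to divide, by the storage server according to the proportion, the data to be stored into a first preset number of data blocks and a second preset number of check blocks;
a data saving unit, configured to save, according to the data type, the data blocks and the check blocks in the database corresponding to the data type.
In the technical solution of the above embodiment, the proportion adjustment unit includes:
a data block number increasing subunit, configured to increase the number of data blocks into which the storage server divides the data to be stored, so as to increase the proportion of data blocks.
In the technical solution of this embodiment, the data information of the data to be stored is acquired through the metadata gateway, and the data to be stored is sent, according to the data information, to the corresponding storage server through the scheduling server for storage. This solves the problem that storage resource scheduling capability and storage resources often do not match, so that when one of them is insufficient and needs to be expanded both must be expanded together, wasting hardware resources; storage resource scheduling is separated from the storage resources, and hardware resources are not wasted.
The data storage device provided by the embodiments of the present application can execute the data storage method provided by any embodiment of the present application, and has the functional modules and effects corresponding to the executed method.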
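Embodiment 4
FIG. 6 is a schematic structural diagram of a server provided in Embodiment 4 of the present application. As shown in FIG. 6, the server includes a processor 410, a memory 420, an input device 430 and an output device 440. The server may have one or more processors 410; one processor 410 is taken as an example in FIG. 6. The processor 410, memory 420, input device 430 and output device 440 in the server may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 6.
As a computer-readable storage medium, the memory 420 may be configured to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the data storage method in the embodiments of the present application (for example, the data information acquisition module 310 and the data saving module 320 in the data storage device). By running the software programs, instructions and modules stored in the memory 420, the processor 410 executes the various functional applications and data processing of the server, that is, implements the data storage method described above.
The memory 420 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created according to use of the terminal, and the like. In addition, the memory 420 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 420 may include memory located remotely relative to the processor 410, which may be connected to the server through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The input device 430 may be configured to receive input numeric or character information and to generate key signal input related to user settings and function control of the server. The output device 440 may include a display device such as a display screen.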
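Embodiment 5
Embodiment 5 of the present application further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform a data storage method, the method including:
obtaining the data information of the data to be stored through a metadata gateway;
sending, according to the data information and through a scheduling server, the data to be stored to the storage server corresponding to the data information for saving.
In the storage medium containing computer-executable instructions provided in the embodiments of the present application, the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the data storage method provided in any embodiment of the present application.
The present application may be implemented by software plus general-purpose hardware, or by hardware. The present application may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a computer floppy disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a flash memory (FLASH), a hard disk or an optical disc, and which includes several instructions for causing a computer device (a personal computer, a server, a network device or the like) to execute the methods described in the embodiments of the present application.
In the above embodiment of the data storage apparatus, the included units and modules are divided only according to functional logic, and the division is not limited to the above as long as the corresponding functions can be implemented. In addition, the names of the functional units are only for ease of distinguishing them from each other and are not intended to limit the protection scope of the present application.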

Claims (10)

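  1. A data storage method, comprising:
    obtaining data information of data to be stored through a metadata gateway;
    sending, according to the data information and through a scheduling server, the data to be stored to a storage server corresponding to the data information for saving.
  2. The method according to claim 1, wherein the data information comprises data attribute information, and the storage server comprises a large-data storage server and a small-data storage server;
    the sending, according to the data information and through a scheduling server, the data to be stored to a storage server corresponding to the data information for saving comprises:
    sending, according to the data attribute information and through the scheduling server, the data to be stored to the large-data storage server or the small-data storage server for saving.
  3. The method according to claim 2, wherein the scheduling server comprises a first scheduling server and a second scheduling server;
    wherein the first scheduling server corresponds to the large-data storage server, and the second scheduling server corresponds to the small-data storage server.
  4. The method according to claim 2, wherein the data information comprises data name information, and the large-data storage server and the small-data storage server each comprise a local storage server and a remote storage server;
    the sending, according to the data information and through a scheduling server, the data to be stored to a storage server corresponding to the data information for saving comprises:
    sending, according to the data name information and through the scheduling server, the data to be stored to the local storage server or the remote storage server for saving.
  5. The method according to claim 1, wherein the data information comprises a data type;
    the sending, according to the data information and through a scheduling server, the data to be stored to a storage server corresponding to the data information for saving comprises:
    sending, through the scheduling server, the data to be stored to a database, in the storage server, corresponding to the data type for saving.
  6. The method according to claim 5, wherein the sending, through the scheduling server, the data to be stored to a database, in the storage server, corresponding to the data type for saving comprises:
    adjusting a ratio at which the storage server divides the data to be stored into data blocks and parity blocks;
    dividing, through the storage server and according to the ratio, the data to be stored into a first preset number of data blocks and a second preset number of parity blocks;
    saving, according to the data type, the data blocks and the parity blocks into the database corresponding to the data type.
  7. The method according to claim 6, wherein the adjusting a ratio at which the storage server divides the data to be stored into data blocks and parity blocks comprises:
    increasing the number of the data blocks into which the storage server divides the data to be stored, so as to increase the proportion of the data blocks.
  8. A data storage apparatus, comprising:
    a data information acquisition module, configured to obtain data information of data to be stored through a metadata gateway;
    a data saving module, configured to send, according to the data information and through a scheduling server, the data to be stored to a storage server corresponding to the data information for saving.
  9. A server, comprising:
    one or more processors;
    a storage apparatus, configured to store one or more programs;
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the data storage method according to any one of claims 1-7.
  10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the data storage method according to any one of claims 1-7.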
PCT/CN2021/116105 2020-12-11 2021-09-02 Data storage method, apparatus, server and medium WO2022121387A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011461108.2A CN112527760A (zh) 2020-12-11 2020-12-11 Data storage method, apparatus, server and medium
CN202011461108.2 2020-12-11

Publications (1)

Publication Number Publication Date
WO2022121387A1 true WO2022121387A1 (zh) 2022-06-16

Family

ID=74999229

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/116105 WO2022121387A1 (zh) 2020-12-11 2021-09-02 Data storage method, apparatus, server and medium

Country Status (2)

Country Link
CN (1) CN112527760A (zh)
WO (1) WO2022121387A1 (zh)


Also Published As

Publication number Publication date
CN112527760A (zh) 2021-03-19


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21902095

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21902095

Country of ref document: EP

Kind code of ref document: A1
