WO2015172478A1 - Method and apparatus for managing heterogeneous copies in a distributed storage system - Google Patents

Method and apparatus for managing heterogeneous copies in a distributed storage system

Info

Publication number
WO2015172478A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
copy
heterogeneous
read
request parameter
Prior art date
Application number
PCT/CN2014/086658
Other languages
English (en)
French (fr)
Inventor
程宁
韩盛中
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2015172478A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • The present invention relates to the field of data storage technologies, and in particular to a method and apparatus for managing heterogeneous copies in a distributed storage system.
  • Big data is used in many fields, such as telecommunications, the internet, finance, medicine, the military, and science. Compared with traditional data warehouse applications, big data analysis involves large data volumes and complex query analysis. A distributed file system naturally has to manage massive amounts of data. Take telecommunications traffic analysis data as an example: new data arrives every day, and each region has different data. Over a long period, for example ten years, this becomes massive data. If an operator wants to derive patterns from these data and set a more reasonable billing scheme, then, with the data stored in a distributed file system, improving read efficiency is very important.
  • In the related art, these huge tables are stored as a number of large files. When first stored, they are written in append mode, and when they need to be read, they are read out sequentially. If the data is written row by row, sequential reading returns it in row format.
  • However, if the business wants to analyze the data of each column, this is troublesome: the data has to be read out and reorganized, and the larger the amount of data, the larger the workload.
  • In view of the above technical problem, the present invention provides a method and apparatus for managing heterogeneous copies in a distributed storage system that overcome, or at least partially solve, the above technical problem. A file is stored as a plurality of different heterogeneous copies in a plurality of storage servers, so that when a heterogeneous copy of the file is read, the corresponding heterogeneous copy can be read as needed, which can effectively improve the efficiency with which the user processes data.
  • According to one aspect of the present invention, a method for managing heterogeneous copies in a distributed storage system is provided, including: obtaining a write request parameter for storing a copy of a file; obtaining location information of each storage server from a metadata server according to the write request parameter; converting the copy of the file, according to the write request parameter and a pre-specified format, into heterogeneous copies of the file in a plurality of different formats; and, according to the location information of the storage servers obtained from the metadata server, storing the converted heterogeneous copies of the file in the plurality of different formats on the designated storage servers respectively.
  • Optionally, before the step of converting the copy of the file, according to the write request parameter and a pre-specified format, into heterogeneous copies of the file in a plurality of different formats, the method further includes: determining, according to the write request parameter, whether to enable heterogeneous copies of the file;
  • if heterogeneous copies of the file are enabled, proceeding to the step of converting the copy of the file, according to the write request parameter and a pre-specified format, into heterogeneous copies of the file in a plurality of different formats.
  • Optionally, the step of converting the copy of the file into heterogeneous copies of the file in a plurality of different formats according to a pre-specified format is: converting the copy of the file, according to the write request parameter, into heterogeneous copies of the file in a plurality of different formats in the row mode storage, column mode storage, or block mode storage format, and caching the converted heterogeneous copies of the file in the plurality of different formats.
  • Optionally, the method further includes: obtaining a read request parameter for reading a copy of the file; obtaining, from the metadata server and according to the read request parameter, location information of the storage server to be read; determining, according to the read request parameter, whether heterogeneous copies of the file are enabled; and, if heterogeneous copies of the file are enabled, reading the heterogeneous copy of the file from the designated storage server according to the location information of the storage server to be read obtained from the metadata server.
  • Optionally, the write request parameter includes: a file handle, a file offset, a file length, and a format of the copy of the file, where the format of the copy of the file includes: storage in row mode, storage in column mode, or storage in block mode.
  • The read request parameter includes: a file handle, a file offset, a file length, and a read mode of the copy of the file, where the read mode includes reading the copy in row mode, reading the copy in column mode, or reading the copy in block mode.
  • An apparatus for managing heterogeneous copies in a distributed storage system is also provided, comprising: a write request obtaining module configured to obtain a write request parameter for storing a copy of a file; a first location obtaining module configured to obtain location information of each storage server from a metadata server according to the write request parameter; a format conversion module configured to convert the copy of the file, according to the write request parameter and a pre-specified format, into heterogeneous copies of the file in a plurality of different formats; and a copy storage module configured to store the converted heterogeneous copies of the file in the plurality of different formats on the designated storage servers respectively, according to the location information of the storage servers obtained from the metadata server.
  • Optionally, the apparatus further includes: a first determining module configured to determine, according to the write request parameter, whether to enable heterogeneous copies of the file and, if heterogeneous copies of the file are enabled, to trigger the format conversion module.
  • Optionally, the format conversion module is configured to convert the copy of the file, according to the write request parameter, into heterogeneous copies of the file in a plurality of different formats in the row mode storage, column mode storage, or block mode storage format.
  • Optionally, the apparatus further includes: a read request obtaining module configured to obtain a read request parameter for reading a copy of the file; a second location obtaining module configured to obtain, from the metadata server and according to the read request parameter, location information of the storage server to be read; a second determining module configured to determine, according to the read request parameter, whether heterogeneous copies of the file are enabled; and a copy reading module configured to, if heterogeneous copies of the file are enabled, read the heterogeneous copy of the file from the designated storage server according to the location information of the storage server to be read obtained from the metadata server.
  • Optionally, in the apparatus, the write request parameter includes: a file handle, a file offset, a file length, and a format of the copy of the file, where the format of the copy of the file includes: storage in row mode, storage in column mode, or storage in block mode.
  • The read request parameter includes: a file handle, a file offset, a file length, and a read mode of the copy of the file, where the read mode includes reading the copy in row mode, reading the copy in column mode, or reading the copy in block mode.
  • The client program can convert a copy of the file, according to a pre-specified format, into heterogeneous copies of the file in a plurality of different formats, and store the converted heterogeneous copies on the designated storage servers.
  • The heterogeneous copies can be converted into one another, so the multiple heterogeneous copies provide redundancy and improve the reliability of the file system.
  • The user can choose which heterogeneous copy of the file to read as needed (for example, according to the needs of data analysis): when the user needs to analyze the file by row, the copy stored in row mode can be read.
  • FIG. 1 is a first flowchart of storing a heterogeneous copy of a file in a method of managing heterogeneous copies in a distributed storage system in an embodiment of the present invention
  • FIG. 2 is a schematic diagram showing a replica redundancy architecture in a distributed file system in an embodiment of the present invention
  • Figure 3 is a diagram showing a heterogeneous copy of a file stored in a row mode in an embodiment of the present invention
  • FIG. 4 is a diagram showing a heterogeneous copy of a file stored in a column mode in an embodiment of the present invention
  • Figure 5 is a diagram showing a heterogeneous copy of a file stored in a block mode in an embodiment of the present invention
  • FIG. 6 is a second flowchart of storing a heterogeneous copy of a file in a method of managing heterogeneous copies in a distributed storage system in an embodiment of the present invention
  • FIG. 7 is a flow chart showing a process of reading a heterogeneous copy of a file in a method of managing a heterogeneous copy in a distributed storage system in an embodiment of the present invention
  • FIG. 8 is a flow chart showing the repair of a heterogeneous copy in a method for managing a heterogeneous copy in a distributed storage system in an embodiment of the present invention
  • Figure 9 is a block diagram showing an apparatus for managing heterogeneous copies in a distributed storage system in an embodiment of the present invention.
  • In the method for managing heterogeneous copies in a distributed storage system, a write request parameter for storing a copy of a file is first obtained; location information of each storage server is then obtained from the metadata server according to the write request parameter; next, the copy of the file is converted, according to the write request parameter and a pre-specified format, into heterogeneous copies of the file in a plurality of different formats; finally, according to the location information of the storage servers obtained from the metadata server, the converted heterogeneous copies of the file in the different formats are stored on the designated storage servers.
  • FIG. 1 is a flowchart of storing a heterogeneous copy of a file in a method for managing a heterogeneous copy in a distributed storage system according to an embodiment of the present invention, where an execution subject of each step in the method may be a client program.
  • the method includes:
  • Step S101 Acquire a write request parameter for storing a copy of the file.
  • A copy of the file has the same content as the file. Replication is a data management mechanism in which a data item is copied and stored on multiple nodes (storage servers) of the distributed system in order to improve system reliability and access efficiency.
  • In step S101, the client program calls a write data interface of the file system to obtain the write request parameter for storing a copy of the file. The write request parameter includes: a file handle, a file offset, a file length, the format of the copy of the file, and so on, where the format of the copy of the file includes: storage in row mode (see FIG. 3), storage in column mode (see FIG. 4), or storage in block mode (see FIG. 5).
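As an illustration only, the write request parameter of step S101 could be modeled as a small record. The names `CopyFormat`, `WriteRequest`, and `heterogeneous_enabled` are hypothetical and do not appear in the patent text, and treating an empty format list as "heterogeneous copies disabled" is an assumption made for this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class CopyFormat(Enum):
    """The three storage formats named in the text (hypothetical enum)."""
    ROW = "row"
    COLUMN = "column"
    BLOCK = "block"


@dataclass(frozen=True)
class WriteRequest:
    """Hypothetical container for the write request parameter of step S101."""
    file_handle: int
    offset: int
    length: int
    # Formats of the copies to be stored. Assumption: an empty tuple means
    # that heterogeneous copies are not enabled for this write.
    formats: Tuple[CopyFormat, ...] = ()

    def heterogeneous_enabled(self) -> bool:
        """Return True when at least one target format was requested."""
        return len(self.formats) > 0


request = WriteRequest(file_handle=3, offset=0, length=64,
                       formats=(CopyFormat.ROW, CopyFormat.COLUMN))
```

A read request could be modeled the same way, with a single read mode in place of the format list.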
  • Step S103 Obtain location information of each storage server in the distributed storage system from the metadata server according to the write request parameter.
  • Copies of the file in different formats may be stored on different storage servers, that is, each storage server stores a copy of the file in one format. Because the write request parameter carries the format of the copy of the file, the client program can query, through the metadata server and according to the write request parameter, the location information of the plurality of storage servers corresponding to the formats of the copy of the file, and use the location information obtained by the query as the storage locations of the copies of the file.
  • Referring to FIG. 2, a schematic diagram of a replica redundancy architecture in a distributed file system according to an embodiment of the present invention, the distributed file system includes: a client program 201, a metadata server 203, and a plurality of storage servers 205. The location information of each storage server 205 in the distributed file system can be recorded in the metadata server 203, and each storage server 205 is configured to store a heterogeneous copy of the file or a copy of the file.
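The location lookup of step S103 can be pictured with a minimal in-memory stand-in for the metadata server. This is a sketch under stated assumptions: the class `MetadataServer`, its methods `register` and `locate`, and the address strings are all hypothetical; the patent only states that the metadata server records storage server locations and that the client queries them by copy format.

```python
class MetadataServer:
    """Hypothetical in-memory stand-in for metadata server 203."""

    def __init__(self):
        # Maps a copy format name to the addresses of the storage servers
        # that hold (or will hold) copies in that format.
        self._servers_by_format = {}

    def register(self, address, copy_format):
        """Record that the storage server at `address` stores `copy_format`."""
        self._servers_by_format.setdefault(copy_format, []).append(address)

    def locate(self, copy_formats):
        """Return {format: [addresses]} for every requested format."""
        return {fmt: list(self._servers_by_format.get(fmt, []))
                for fmt in copy_formats}


metadata = MetadataServer()
metadata.register("10.0.0.1:9000", "row")      # addresses are made up
metadata.register("10.0.0.2:9000", "column")
metadata.register("10.0.0.3:9000", "block")

locations = metadata.locate(["row", "column"])
```

Returning copies of the internal lists keeps callers from mutating the server's records by accident.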
  • Step S105 Convert a copy of the file according to the write request parameter into a heterogeneous copy of the file in a plurality of different formats according to a pre-specified format.
  • A heterogeneous copy of a file is a copy obtained by converting the data format of a copy of the file. The content of a heterogeneous copy is the same as that of the copy of the file and of the file itself. Each heterogeneous copy can suit a different application scenario (for example, storing a copy of the file in row mode, in column mode, or in block mode), and the heterogeneous copies also provide redundancy for one another.
  • In step S105, the copy of the file is converted, according to the write request parameter, into heterogeneous copies of the file in a plurality of different formats in the row mode storage, column mode storage, or block mode storage format, and the heterogeneous copies of the file in the different formats are cached.
  • Referring to FIG. 3, a schematic diagram of a copy of a file stored in row mode in an embodiment of the present invention, the content of copy 2 of the file is stored in a table that includes a plurality of rows 21 and a plurality of columns 23.
  • Rows 21 comprise row a, row b, row c, and row d.
  • Columns 23 comprise column 1, column 2, column 3, and column 4. Row a records: 50, 28, 352, and 120; row b records: 21, 99, 66, and 112; row c records: 32, 52, 123, and 13; row d records: 65, 23, 87, and 344.
  • step S105 The copy 2 of the file shown in FIG.
  • Referring to FIG. 4, a schematic diagram of a copy of a file stored in column mode in an embodiment of the present invention, the content of copy 2 of the file is stored in a table that includes a plurality of rows 21 and a plurality of columns 23. Rows 21 include row a, row b, row c, and row d, and columns 23 include column 1, column 2, column 3, and column 4. Column 1 records: 50, 21, 32, and 65; column 2 records: 28, 99, 52, and 23; column 3 records: 352, 66, 123, and 87; column 4 records: 120, 112, 13, and 344.
  • In step S105, copy 2 of the file is converted, according to the column mode storage format, into heterogeneous copy 4 of the file as shown in FIG. 4, and the converted heterogeneous copy 4 is cached. The contents recorded in heterogeneous copy 4 of the file are: 50, 21, 32, 65, 28, 99, 52, 23, 352, 66, 123, 87, 120, 112, 13, and 344.
  • Of course, it can be understood that the specific content recorded in copy 2 of the file is not limited in the embodiment of the present invention.
  • Referring to FIG. 5, a schematic diagram of a copy of a file stored in block mode in an embodiment of the present invention, the content of copy 2 of the file is stored in a table that includes a plurality of rows 21 and a plurality of columns 23. Rows 21 include row a, row b, row c, and row d, and columns 23 include column 1, column 2, column 3, and column 4. The rows and columns are divided into a plurality of blocks: block 25 records 50, 28, 21, and 99; block 27 records 352, 120, 66, and 112; block 29 records 123, 13, 87, and 344; and block 31 records 32, 52, 65, and 23.
  • In step S105, copy 2 of the file is converted, according to the block mode storage format, into heterogeneous copy 5 of the file as shown in FIG. 5, and the converted heterogeneous copy 5 is cached.
  • The contents recorded in heterogeneous copy 5 of the file are: 50, 28, 21, 99, 352, 120, 66, 112, 32, 52, 65, 23, 123, 13, 87, and 344.
  • Of course, it can be understood that the specific content recorded in copy 2 of the file is not limited in the embodiment of the present invention. FIG. 3 to FIG. 5 list only three format conversions, and the manner of format conversion is not limited in the embodiment of the present invention. In a specific implementation, the user can adjust the format conversion method according to the specific situation.
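The three conversions of FIG. 3 to FIG. 5 can be sketched on the example table from the text. The function names and the 2x2 block size are assumptions made for illustration; the patent does not prescribe an implementation or a block size. The block traversal order (left to right, then top to bottom, each block flattened row by row) is chosen because it reproduces the sequence listed for heterogeneous copy 5.

```python
def to_row_mode(table):
    """Flatten the table row by row."""
    return [value for row in table for value in row]


def to_column_mode(table):
    """Flatten the table column by column (transpose, then flatten)."""
    return [value for column in zip(*table) for value in column]


def to_block_mode(table, block_rows, block_cols):
    """Flatten the table block by block.

    Blocks are visited left to right, then top to bottom, and each block
    is itself flattened row by row.
    """
    flat = []
    for top in range(0, len(table), block_rows):
        for left in range(0, len(table[0]), block_cols):
            for row in table[top:top + block_rows]:
                flat.extend(row[left:left + block_cols])
    return flat


# Rows a to d of copy 2 of the file, as listed in the text.
TABLE = [
    [50, 28, 352, 120],
    [21, 99, 66, 112],
    [32, 52, 123, 13],
    [65, 23, 87, 344],
]

row_copy = to_row_mode(TABLE)
column_copy = to_column_mode(TABLE)
block_copy = to_block_mode(TABLE, block_rows=2, block_cols=2)
```

With these definitions, `column_copy` and `block_copy` match the sequences listed for heterogeneous copy 4 and heterogeneous copy 5 respectively.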
  • It should be noted that step S103 may be performed before step S105, step S105 may be performed before step S103, or steps S103 and S105 may be performed simultaneously. The order of step S103 and step S105 is not limited in the embodiment of the present invention.
  • Step S107 Store, according to the location information of the storage server obtained from the metadata server, the heterogeneous copies of the converted plurality of files of different formats on the designated storage server.
  • the client program 201 can store the heterogeneous copies of the converted plurality of files of different formats on the designated storage server 205 according to the location information of the storage server acquired from the metadata server 203, that is, each The storage server 205 stores heterogeneous copies of files in different formats.
  • The client program can convert a copy of the file, according to a pre-specified format, into heterogeneous copies of the file in a plurality of different formats, and store the converted heterogeneous copies on the designated storage servers. Each file is thus stored as multiple heterogeneous copies whose content is the same, and the multiple heterogeneous copies provide redundancy that improves the reliability of the file system.
  • The user can choose which copy of the file to read as needed. For example, when the user needs to analyze the file by row, the copy stored in row mode can be read; when the user needs to analyze the file by column, the copy stored in column mode can be read; when the user needs to analyze the file by block (for some multidimensional table data), the heterogeneous copy stored in block mode can be read. Because the user can choose which heterogeneous copy of the file to read as needed, the efficiency of processing the data can be effectively improved.
  • In step S109, the client program determines whether to enable heterogeneous copies of the file; if heterogeneous copies of the file are enabled, the process proceeds to step S105; if heterogeneous copies of the file are not enabled, the process proceeds to step S111.
  • In step S111, the client program stores the copy of the file on the designated storage servers according to the location information of each storage server obtained from the metadata server.
  • The client program may determine whether to enable heterogeneous copies of the file according to a parameter input by the user. It can be understood that the specific conditions for determining whether to enable heterogeneous copies of the file are not limited in the embodiment of the present invention.
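The branch between steps S105/S107 and step S111 can be sketched as follows. Everything here is a simplified assumption: storage servers are plain dictionaries, `locations` stands for the result of the metadata lookup, only row and column formats are shown, and the label "original" for an unconverted copy is invented for this example.

```python
def flatten_rows(table):
    return [value for row in table for value in row]


def flatten_columns(table):
    return [value for column in zip(*table) for value in column]


# Hypothetical registry of format converters (block mode omitted for brevity).
CONVERTERS = {"row": flatten_rows, "column": flatten_columns}


def write_copies(table, enable_heterogeneous, locations, servers):
    """Store copies of `table` on the servers named in `locations`.

    locations: {format name: server name}, as obtained in step S103.
    servers:   {server name: dict acting as that server's disk}.
    """
    if enable_heterogeneous:
        # S109 -> S105 and S107: convert once per format, store each result.
        for copy_format, server in locations.items():
            servers[server]["format"] = copy_format
            servers[server]["data"] = CONVERTERS[copy_format](table)
    else:
        # S109 -> S111: store the same unconverted copy on every server.
        plain = flatten_rows(table)
        for server in locations.values():
            servers[server]["format"] = "original"
            servers[server]["data"] = list(plain)


servers = {"A": {}, "B": {}}
locations = {"row": "A", "column": "B"}
write_copies([[1, 2], [3, 4]], True, locations, servers)
```

After the call, server A holds `[1, 2, 3, 4]` in row format and server B holds `[1, 3, 2, 4]` in column format.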
  • Referring to FIG. 7, a flowchart of reading a heterogeneous copy of a file in a method for managing heterogeneous copies in a distributed storage system according to an embodiment of the present invention, after the file has been stored on the storage servers of the distributed storage system, the method also includes:
  • Step S113 Acquire a read request parameter for reading a copy of the file.
  • The read request parameter includes: a file handle, a file offset, a file length, and a read mode of the copy of the file, where the read mode includes reading the copy in row mode, reading the copy in column mode, or reading the copy in block mode.
  • For example, the read request parameter may specify a row mode read, a column mode read, or a block mode read. Because the read request parameter includes a read mode, the user can choose which copy of the file to read as needed.
  • Step S115 Obtain location information of the storage server to be read from the metadata server according to the read request parameter.
  • The metadata server records the correspondence between each storage server and the format of the copy of the file stored on it. For example, the copy of the file stored on storage server A is a heterogeneous copy stored in row mode; the copy stored on storage server B is a heterogeneous copy stored in column mode; and the copy stored on storage server C is a heterogeneous copy stored in block mode.
  • Step S117: Determine whether heterogeneous copies of the file are enabled. If so, proceed to step S119; if not, proceed to step S121.
  • In step S119, the client program reads the heterogeneous copy of the file from the designated storage server according to the location information of the storage server to be read obtained from the metadata server.
  • For example, when the client program needs to read the heterogeneous copy stored in row mode, it can query the metadata server's correspondence between storage servers and the formats of the copies they store, obtain the location information of storage server A, and then, in step S119, read the heterogeneous copy stored in row mode from storage server A.
  • In step S121, the client program reads a copy of the file from any one of the designated storage servers according to the location information of the storage servers to be read obtained from the metadata server. Because the copies of the file stored on these storage servers have not been format-converted, they all have the same format, so a copy of the file can be read from any one of them.
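The read path of steps S117 to S121 can be sketched in the same simplified setting. The function `read_copy` and the dictionary shapes are hypothetical; `correspondence` plays the role of the server-to-format mapping that the text says the metadata server records.

```python
def read_copy(read_mode, enable_heterogeneous, correspondence, servers):
    """Return the stored data that matches the requested read mode.

    correspondence: {server name: format of the copy it stores}
    servers:        {server name: stored data}
    """
    if enable_heterogeneous:
        # S117 -> S119: pick the server whose copy matches the read mode.
        for server, copy_format in correspondence.items():
            if copy_format == read_mode:
                return servers[server]
        raise LookupError("no copy stored in mode %r" % read_mode)
    # S117 -> S121: all copies share one format, so any server will do.
    any_server = next(iter(correspondence))
    return servers[any_server]


correspondence = {"A": "row", "B": "column", "C": "block"}
servers = {
    "A": [50, 28, 352, 120, 21, 99, 66, 112, 32, 52, 123, 13, 65, 23, 87, 344],
    "B": [50, 21, 32, 65, 28, 99, 52, 23, 352, 66, 123, 87, 120, 112, 13, 344],
    "C": [50, 28, 21, 99, 352, 120, 66, 112, 32, 52, 65, 23, 123, 13, 87, 344],
}

column_data = read_copy("column", True, correspondence, servers)
```

A column-oriented analysis can then consume `column_data` directly, without reading the row-mode copy and reorganizing it.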
  • As shown in FIG. 8, the flow of repairing a heterogeneous copy in the method for managing heterogeneous copies in a distributed storage system includes the following steps:
  • Step S801 the server program initiates a copy repair process.
  • The first trigger condition: when the client program requests to read a copy (for example, in row mode) and finds that the specified copy has been lost (the disk may be damaged, or the corresponding storage server may be shut down), the server-side program initiates a copy repair process.
  • The second trigger condition: the file system periodically checks the status of the disks and finds that a disk is damaged or has been removed. Once the aging time is reached, the corresponding copy repair process is also initiated.
  • Step S803 the server program determines whether the heterogeneous copy is enabled, and if yes, proceeds to step S805; if not, proceeds to step S809.
  • step S805 the server program reads the corresponding heterogeneous copy from the other storage servers, and then proceeds to step S807.
  • Step S807 The server program converts the heterogeneous copy of the file read from the other storage server into a heterogeneous copy corresponding to the storage server, and stores the new heterogeneous copy to the disk in the specified storage server.
  • The data recovery algorithm for heterogeneous copies in step S807 is described as follows:
  • Row-to-column and column-to-row conversion only requires knowing the number of rows (or the number of columns) and transposing the two-dimensional matrix.
  • Each data item has a specific position in the table, which can be represented by its row number and column number. Therefore, when a row-to-column or column-to-row conversion is required, the two-dimensional matrix is transposed according to the number of rows or columns of the data.
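The transposition used for repair can be sketched on flattened copies. The function names are hypothetical; the only inputs are the flat data and one dimension, with the other dimension derived from the length, which mirrors the statement that knowing the number of rows (or columns) is enough.

```python
def column_to_row(flat, n_rows):
    """Rebuild a row-mode copy from a column-mode copy.

    In the column-mode copy, the element at (row r, column c) sits at
    index c * n_rows + r.
    """
    n_cols = len(flat) // n_rows
    return [flat[c * n_rows + r] for r in range(n_rows) for c in range(n_cols)]


def row_to_column(flat, n_cols):
    """Rebuild a column-mode copy from a row-mode copy.

    In the row-mode copy, the element at (row r, column c) sits at
    index r * n_cols + c.
    """
    n_rows = len(flat) // n_cols
    return [flat[r * n_cols + c] for c in range(n_cols) for r in range(n_rows)]


# Heterogeneous copy 4 (column mode) from the text, used to repair a lost
# row-mode copy.
column_copy = [50, 21, 32, 65, 28, 99, 52, 23, 352, 66, 123, 87, 120, 112, 13, 344]
repaired_row_copy = column_to_row(column_copy, n_rows=4)
```

Applying `row_to_column` to the repaired copy returns the original column-mode data, so the two conversions are inverses of each other.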
  • In step S809, if heterogeneous copies are not enabled, a copy on another storage server can be copied directly to the storage server being repaired.
  • an apparatus for managing heterogeneous copies in a distributed storage system includes:
  • the write request obtaining module 901 is configured to obtain a write request parameter for storing a copy of the file, where the write request parameter includes: a file handle, a file offset, and a file length;
  • the first location obtaining module 903 is configured to acquire location information of each storage server in the distributed storage system from the metadata server according to the write request parameter.
  • FIG. 2 it is a schematic diagram of a replica redundancy architecture in a distributed file system according to an embodiment of the present invention, wherein the location information of a plurality of storage servers may be stored in the metadata server 203.
  • the format conversion module 905 is configured to convert a copy of the file according to the write request parameter into a heterogeneous copy of the file in a plurality of different formats according to a pre-specified format.
  • A heterogeneous copy of a file is a copy obtained by converting the data format of a copy of the file. The content recorded in a heterogeneous copy is the same as that of the copy of the file and of the file itself. Each heterogeneous copy suits a different application scenario (for example, storing a copy of the file in row mode, in column mode, or in block mode), and the heterogeneous copies are also mutually redundant. For details, refer to FIG. 3 to FIG. 5.
  • the copy storage module 907 is configured to store, according to the location information of the storage server from the metadata server, the heterogeneous copies of the converted plurality of files in different formats on the designated storage server.
  • the client program 201 can store the heterogeneous copies of the converted plurality of files of different formats on the designated storage server 205 according to the location information of the storage server acquired from the metadata server 203.
  • The client program can convert a copy of the file, according to a pre-specified format, into heterogeneous copies of the file in a plurality of different formats, and store the converted heterogeneous copies on the designated storage servers. Because multiple heterogeneous copies of a file are stored and can be converted into one another, they provide redundancy that improves the reliability of the file system.
  • The user can choose which copy of the file to read as needed. For example, when the user needs to analyze the file by row, the heterogeneous copy stored in row mode can be read; when the user needs to analyze the file by column, the heterogeneous copy stored in column mode can be read; when the user needs to analyze the file by block (for some multidimensional table data), the heterogeneous copy stored in block mode can be read. Because the user can choose which heterogeneous copy of the file to read as needed, the efficiency of processing the data can be effectively improved.
  • the apparatus 900 further includes:
  • The first determining module 909 is configured to determine, according to the write request parameter, whether to enable heterogeneous copies of the file. If heterogeneous copies of the file are enabled, it triggers the format conversion module 905; if not, it triggers the copy storage module 907 to store the copy of the file on the designated storage servers according to the location information of each storage server obtained from the metadata server.
  • The format conversion module 905 is configured to convert the copy of the file, according to the write request parameter, into heterogeneous copies of the file in a plurality of different formats in the row mode storage, column mode storage, or block mode storage format.
  • the apparatus 900 further includes:
  • the read request obtaining module 911 is configured to acquire a read request parameter for reading a copy of the file
  • the second location obtaining module 913 is configured to acquire, according to the read request parameter, location information of the storage server to be read from the metadata server;
  • the second determining module 915 is configured to determine, according to the read request parameter, whether to enable the heterogeneous copy of the file;
  • The copy reading module 917 is configured to, if heterogeneous copies of the file are enabled, read the heterogeneous copy of the file from the designated storage server according to the location information of the storage server to be read obtained from the metadata server; and, if heterogeneous copies of the file are not enabled, read a copy of the file from any one of the designated storage servers according to that location information.
  • The write request parameter includes: a file handle, a file offset, a file length, and a format of the copy of the file, where the format of the copy of the file includes: storage in row mode, storage in column mode, or storage in block mode;
  • The read request parameter includes: a file handle, a file offset, a file length, and a read mode of the copy of the file, where the read mode includes reading the copy in row mode, reading the copy in column mode, or reading the copy in block mode.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

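A method and apparatus for managing heterogeneous replicas in a distributed storage system. The method includes: acquiring a write request parameter for storing a copy of a file; acquiring, according to the write request parameter, location information of each storage server in the distributed storage system from a metadata server; converting the copy of the file, according to the write request parameter and a pre-specified format, into heterogeneous copies of the file in a plurality of different formats; and storing the converted heterogeneous copies on designated storage servers, respectively, according to the location information obtained from the metadata server. With this technical solution, one file can be stored as a plurality of different heterogeneous copies on a plurality of storage servers, and the heterogeneous copy that matches the need can be read when the file is read, which can effectively improve the efficiency with which users process data.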

Description

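Method and apparatus for managing heterogeneous replicas in a distributed storage system
Technical Field
The present invention relates to the field of data storage technologies, and in particular to a method and apparatus for managing heterogeneous replicas in a distributed storage system.
Background
Big data is used in fields such as telecommunications, the internet, finance, medicine, the military, and science. Compared with traditional data warehouse applications, big data analysis involves large data volumes and complex query analysis. A distributed file system naturally has to manage massive amounts of data. Take telecom traffic analysis data as an example: new data arrives every day, each region produces different data, and over a long period, say ten years, the data becomes massive. If an operator wants to derive patterns from this data and set a more reasonable billing scheme, then improving the efficiency of reading this huge volume of data from the distributed file system is very important.
In the related art, these huge tables are stored as large files. They are initially written in append mode and, when needed, read back sequentially. If the data was written row by row, a sequential read returns it in row format. If a service wants to analyze the data of each column, however, this is troublesome: the data must be read out and reassembled, and the larger the data volume, the greater the workload.
Summary of the Invention
In view of the above technical problem, the present invention provides a method and apparatus for managing heterogeneous replicas in a distributed storage system that overcome, or at least partially solve, that problem. A file is stored as a plurality of different heterogeneous copies on a plurality of storage servers, and when the file is read, the heterogeneous copy that matches the need can be read, which can effectively improve the efficiency with which users process data.
According to one aspect of the present invention, a method for managing heterogeneous replicas in a distributed storage system is provided, including: acquiring a write request parameter for storing a copy of a file; acquiring location information of each storage server from a metadata server according to the write request parameter; converting the copy of the file, according to the write request parameter and a pre-specified format, into heterogeneous copies of the file in a plurality of different formats; and storing the converted heterogeneous copies on designated storage servers, respectively, according to the location information of the storage servers obtained from the metadata server.
Optionally, before the step of converting the copy of the file, according to the write request parameter and a pre-specified format, into heterogeneous copies of the file in a plurality of different formats, the method further includes:
determining, according to the write request parameter, whether to enable the heterogeneous copy of the file;
if the heterogeneous copy of the file is enabled, proceeding to the step of converting the copy of the file, according to the write request parameter and a pre-specified format, into heterogeneous copies of the file in a plurality of different formats.
Optionally, the step of converting the copy of the file according to a pre-specified format into heterogeneous copies of the file in a plurality of different formats is: converting the copy of the file, according to the write request parameter, into heterogeneous copies in a plurality of different formats (row mode storage, column mode storage, or block mode storage), and caching the converted heterogeneous copies.
Optionally, the method further includes: acquiring a read request parameter for reading a copy of the file; acquiring, according to the read request parameter, location information of the storage server to be read from the metadata server; determining, according to the read request parameter, whether to enable the heterogeneous copy of the file; and, if the heterogeneous copy of the file is enabled, reading the heterogeneous copy of the file from the specified storage server according to the location information, obtained from the metadata server, of the storage server to be read.
Optionally, the write request parameter includes: a file handle, a file offset, a file length, and a format of the copy of the file, where the format of the copy of the file includes: storing in row mode, storing in column mode, or storing in block mode; the read request parameter includes: a file handle, a file offset, a file length, and a read mode of the copy of the file, the read mode including reading the copy in row mode, reading the copy in column mode, or reading the copy in block mode.
According to another aspect of the present invention, an apparatus for managing heterogeneous replicas in a distributed storage system is also provided, including: a write request obtaining module, configured to acquire a write request parameter for storing a copy of a file; a first location obtaining module, configured to acquire location information of each storage server from a metadata server according to the write request parameter; a format conversion module, configured to convert the copy of the file, according to the write request parameter and a pre-specified format, into heterogeneous copies of the file in a plurality of different formats; and a copy storage module, configured to store the converted heterogeneous copies on designated storage servers, respectively, according to the location information of the storage servers obtained from the metadata server.
Optionally, the apparatus further includes: a first determining module, configured to determine, according to the write request parameter, whether to enable the heterogeneous copy of the file, and, if the heterogeneous copy of the file is enabled, to trigger the format conversion module.
Optionally, the format conversion module is configured to convert the copy of the file, according to the write request parameter, into heterogeneous copies in a plurality of different formats (row mode storage, column mode storage, or block mode storage), and to cache the converted heterogeneous copies.
Optionally, the apparatus further includes: a read request obtaining module, configured to acquire a read request parameter for reading a copy of the file; a second location obtaining module, configured to acquire, according to the read request parameter, location information of the storage server to be read from the metadata server; a second determining module, configured to determine, according to the read request parameter, whether to enable the heterogeneous copy of the file; and a copy reading module, configured to, if the heterogeneous copy of the file is enabled, read the heterogeneous copy of the file from the specified storage server according to the location information, obtained from the metadata server, of the storage server to be read.
Optionally, the write request parameter includes: a file handle, a file offset, a file length, and a format of the copy of the file, where the format of the copy of the file includes: storing in row mode, storing in column mode, or storing in block mode; the read request parameter includes: a file handle, a file offset, a file length, and a read mode of the copy of the file, the read mode including reading the copy in row mode, reading the copy in column mode, or reading the copy in block mode.
The beneficial effects of the present invention are as follows. In the embodiments of the present invention, the client program can convert the copy of a file, according to pre-specified formats, into heterogeneous copies in a plurality of different formats and store them on designated storage servers. Because one file is stored as multiple heterogeneous copies that can be converted into one another, the heterogeneous copies provide redundancy and improve the reliability of the file system. Moreover, because the heterogeneous copies stored on the storage servers differ in format, the user can choose which heterogeneous copy to read as needed (for example, according to the needs of data analysis): to analyze the file by row, the user can read the heterogeneous copy stored in row mode; to analyze it by column, the heterogeneous copy stored in column mode; and to analyze it by block (for some multidimensional table data), the heterogeneous copy stored in block mode. Because the user can choose the heterogeneous copy to read as needed, the efficiency with which users process data can be effectively improved, which makes the solution particularly suitable for operations on massive regular data and for the management of large databases.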
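Brief Description of the Drawings
FIG. 1 is the first flowchart of storing heterogeneous copies of a file in the method for managing heterogeneous replicas in a distributed storage system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the copy redundancy architecture in a distributed file system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a heterogeneous copy of a file stored in row mode according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a heterogeneous copy of a file stored in column mode according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a heterogeneous copy of a file stored in block mode according to an embodiment of the present invention;
FIG. 6 is the second flowchart of storing heterogeneous copies of a file in the method for managing heterogeneous replicas in a distributed storage system according to an embodiment of the present invention;
FIG. 7 is a flowchart of reading a heterogeneous copy of a file in the method for managing heterogeneous replicas in a distributed storage system according to an embodiment of the present invention;
FIG. 8 is a flowchart of repairing a heterogeneous copy in the method for managing heterogeneous replicas in a distributed storage system according to an embodiment of the present invention; and
FIG. 9 is a block diagram of the apparatus for managing heterogeneous replicas in a distributed storage system according to an embodiment of the present invention.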
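Detailed Description
According to one aspect of the present invention, a method for managing heterogeneous replicas in a distributed storage system is disclosed. First, a write request parameter for storing a copy of a file is acquired; next, location information of each storage server is acquired from a metadata server according to the write request parameter; then the copy of the file is converted, according to the write request parameter and a pre-specified format, into heterogeneous copies of the file in a plurality of different formats; finally, the converted heterogeneous copies are stored on designated storage servers, respectively, according to the location information of the storage servers obtained from the metadata server.
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
FIG. 1 shows the first flowchart of storing heterogeneous copies of a file in the method for managing heterogeneous replicas in a distributed storage system according to an embodiment of the present invention. Each step of the method may be executed by a client program. The method includes:
Step S101: acquire a write request parameter for storing a copy of a file.
In the embodiments of the present invention, a copy of a file has the same content as the file. Replication is a data management mechanism in which a data item is copied several times and the copies are placed on multiple nodes (storage servers) of the distributed system to improve the system's reliability and access efficiency.
Specifically, in step S101 the client program calls the write-data interface of the file system to acquire the write request parameter for storing the copy of the file. The write request parameter includes a file handle, a file offset, a file length, the format of the copy of the file, and so on, where the format of the copy of the file includes: storing in row mode (see FIG. 3), storing in column mode (see FIG. 4), or storing in block mode (see FIG. 5).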
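As an illustration only, the write request parameter of step S101 could be modeled as a small record. The names `WriteRequest` and `CopyFormat`, and the field types, are assumptions made for this sketch; the source only lists the four fields.

```python
from dataclasses import dataclass
from enum import Enum


class CopyFormat(Enum):
    """Storage formats named in the description (hypothetical enum)."""
    ROW = "row"
    COLUMN = "column"
    BLOCK = "block"


@dataclass(frozen=True)
class WriteRequest:
    """Hypothetical container for the write request parameter of step S101."""
    file_handle: int          # handle returned when the file was opened
    offset: int               # file offset at which to write
    length: int               # file length, in bytes
    copy_format: CopyFormat   # format of the copy of the file


request = WriteRequest(file_handle=3, offset=0, length=64,
                       copy_format=CopyFormat.COLUMN)
```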
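Step S103: acquire, according to the write request parameter, location information of each storage server in the distributed storage system from the metadata server.
Typically, to ensure the reliability of the file system, copies of the file in different formats can be stored on different storage servers, that is, each storage server stores a copy in one format. For example, when the write request parameter also includes the format of the copy of the file, the client program can query the metadata server, according to the write request parameter, for the location information of the storage servers corresponding to the formats of the copy, and use the returned location information as the storage locations of the copies.
FIG. 2 is a schematic diagram of the copy redundancy architecture in a distributed file system according to an embodiment of the present invention. The distributed file system includes a client program 201, a metadata server 203, and a plurality of storage servers 205 connected to the client program 201. The metadata server 203 can record the location information of each storage server 205 in the distributed file system, and the storage servers 205 are configured to store heterogeneous copies of files or copies of files.
Step S105: convert the copy of the file, according to the write request parameter and a pre-specified format, into heterogeneous copies of the file in a plurality of different formats.
In the embodiments of the present invention, a heterogeneous copy of a file is a copy obtained by converting the data format of a copy of the file. Its content is the same as that of the copy of the file and of the file itself. Each heterogeneous copy can suit a different application scenario (for example, storing the copy in row mode, in column mode, or in block mode), and the heterogeneous copies can also serve as redundancy for one another.
Optionally, in one embodiment of the present invention, in step S105 the copy of the file is converted, according to the write request parameter, into heterogeneous copies in a plurality of different formats (row mode storage, column mode storage, or block mode storage), and the converted heterogeneous copies are cached.
FIG. 3 is a schematic diagram of a copy of a file stored in row mode according to an embodiment of the present invention. The content of copy 2 of the file is stored as a table with a plurality of rows 21 and a plurality of columns 23, where the rows 21 are rows a, b, c and d and the columns 23 are columns 1, 2, 3 and 4. Row a records 50, 28, 352 and 120; row b records 21, 99, 66 and 112; row c records 32, 52, 123 and 13; and row d records 65, 23, 87 and 344. Optionally, in step S105 copy 2 of the file shown in FIG. 3 is converted, in the row mode storage format, into heterogeneous copy 3 of the file shown in FIG. 3, and the converted heterogeneous copy 3 is cached. Heterogeneous copy 3 records: 50, 28, 352, 120, 21, 99, 66, 112, 32, 52, 123, 13, 65, 23, 87 and 344. It will be understood that the embodiments of the present invention do not limit the specific content recorded in copy 2 of the file.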
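The row-mode conversion described for FIG. 3 amounts to concatenating the rows of the table. The following is a minimal sketch using the 4 x 4 example values from the description; the function name `to_row_mode` is an assumption, not something the source defines.

```python
# Example table from FIG. 3: rows a-d, columns 1-4.
TABLE = [
    [50, 28, 352, 120],   # row a
    [21, 99, 66, 112],    # row b
    [32, 52, 123, 13],    # row c
    [65, 23, 87, 344],    # row d
]


def to_row_mode(table):
    """Flatten a table row by row (row-mode layout)."""
    return [value for row in table for value in row]


row_copy = to_row_mode(TABLE)
```

For the example table, `row_copy` has the same order as heterogeneous copy 3 in the description.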
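FIG. 4 is a schematic diagram of a copy of a file stored in column mode according to an embodiment of the present invention. The content of copy 2 of the file is stored as a table with a plurality of rows 21 and a plurality of columns 23, where the rows 21 are rows a, b, c and d and the columns 23 are columns 1, 2, 3 and 4. Column 1 records 50, 21, 32 and 65; column 2 records 28, 99, 52 and 23; column 3 records 352, 66, 123 and 87; and column 4 records 120, 112, 13 and 344. Optionally, in step S105 copy 2 of the file shown in FIG. 4 is converted, in the column mode storage format, into heterogeneous copy 4 of the file shown in FIG. 4, and the converted heterogeneous copy 4 is cached. Heterogeneous copy 4 records: 50, 21, 32, 65, 28, 99, 52, 23, 352, 66, 123, 87, 120, 112, 13 and 344. It will be understood that the embodiments of the present invention do not limit the specific content recorded in copy 2 of the file.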
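The column-mode conversion described for FIG. 4 can be sketched the same way: the table is walked column by column instead of row by row. The function name `to_column_mode` is an assumption made for illustration.

```python
# Example table from FIG. 4: rows a-d, columns 1-4.
TABLE = [
    [50, 28, 352, 120],
    [21, 99, 66, 112],
    [32, 52, 123, 13],
    [65, 23, 87, 344],
]


def to_column_mode(table):
    """Flatten a table column by column (column-mode layout)."""
    # zip(*table) yields the columns of a rectangular table.
    return [value for column in zip(*table) for value in column]


column_copy = to_column_mode(TABLE)
```

For the example table, `column_copy` has the same order as heterogeneous copy 4 in the description.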
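FIG. 5 is a schematic diagram of a copy of a file stored in block mode according to an embodiment of the present invention. The content of copy 2 of the file is stored as a table with a plurality of rows 21 and a plurality of columns 23, where the rows 21 are rows a, b, c and d and the columns 23 are columns 1, 2, 3 and 4. In the figure, the rows and columns are partitioned into a plurality of blocks: block 25 records 50, 28, 21 and 99; block 27 records 352, 120, 66 and 112; block 29 records 123, 13, 87 and 344; and block 31 records 32, 52, 65 and 23. Optionally, in step S105 copy 2 of the file shown in FIG. 5 is converted, in the block mode storage format, into heterogeneous copy 5 of the file shown in FIG. 5, and the converted heterogeneous copy 5 is cached. Heterogeneous copy 5 records: 50, 28, 21, 99, 352, 120, 66, 112, 32, 52, 65, 23, 123, 13, 87 and 344. It will be understood that the embodiments of the present invention do not limit the specific content recorded in copy 2 of the file. It will also be understood that FIG. 3 to FIG. 5 list only three format conversions and that the embodiments of the present invention do not limit the manner of format conversion; in a specific implementation, the user can adjust it as the situation requires.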
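The block-mode layout of FIG. 5 can be reproduced by cutting the table into 2 x 2 blocks and emitting the blocks left to right, top to bottom, with each block written row by row. The block size and the traversal order are inferred from the example values in the description; the function name `to_block_mode` is an assumption.

```python
# Example table from FIG. 5: rows a-d, columns 1-4.
TABLE = [
    [50, 28, 352, 120],
    [21, 99, 66, 112],
    [32, 52, 123, 13],
    [65, 23, 87, 344],
]


def to_block_mode(table, block_rows, block_cols):
    """Flatten a table block by block; each block is written row by row."""
    n_rows, n_cols = len(table), len(table[0])
    flat = []
    for r0 in range(0, n_rows, block_rows):        # top edge of each block
        for c0 in range(0, n_cols, block_cols):    # left edge of each block
            for r in range(r0, min(r0 + block_rows, n_rows)):
                flat.extend(table[r][c0:c0 + block_cols])
    return flat


block_copy = to_block_mode(TABLE, block_rows=2, block_cols=2)
```

For the example table, `block_copy` has the same order as heterogeneous copy 5 in the description.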
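Optionally, in the embodiments of the present invention, after step S101 is executed, the method may proceed to step S103, or to step S105, or to both at the same time; that is, the embodiments of the present invention do not limit the order of steps S103 and S105.
Step S107: store the converted heterogeneous copies of the file on designated storage servers, respectively, according to the location information of the storage servers obtained from the metadata server.
Referring again to FIG. 2, the client program 201 can store the converted heterogeneous copies of the file on the designated storage servers 205, respectively, according to the location information obtained from the metadata server 203; that is, each storage server 205 stores a heterogeneous copy of the file in a different format.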
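Steps S103 to S107 can be sketched end to end with in-memory stand-ins. Everything here is hypothetical scaffolding for illustration: the `MetadataServer` class, its `locate` method, the server names, and the use of plain dictionaries as storage servers are assumptions, not interfaces given in the source. Only the row and column formats are shown to keep the sketch short.

```python
def to_row_mode(table):
    return [v for row in table for v in row]


def to_column_mode(table):
    return [v for column in zip(*table) for v in column]


CONVERTERS = {"row": to_row_mode, "column": to_column_mode}


class MetadataServer:
    """Hypothetical stand-in: maps a copy format to a storage server name."""

    def __init__(self, format_to_server):
        self._format_to_server = dict(format_to_server)

    def locate(self, copy_format):
        return self._format_to_server[copy_format]


def write_file(name, table, formats, metadata, servers):
    """Store one heterogeneous copy per format on its designated server."""
    # S103: ask the metadata server where each format should be stored.
    locations = {fmt: metadata.locate(fmt) for fmt in formats}
    # S105: convert the copy into each format and cache the results.
    cache = {fmt: CONVERTERS[fmt](table) for fmt in formats}
    # S107: store each cached heterogeneous copy on its designated server.
    for fmt, data in cache.items():
        servers[locations[fmt]][name] = data
    return locations


metadata = MetadataServer({"row": "server_a", "column": "server_b"})
servers = {"server_a": {}, "server_b": {}}
placed = write_file("traffic", [[1, 2], [3, 4]], ["row", "column"], metadata, servers)
```

Because S103 and S105 do not depend on each other, the two dictionary comprehensions could run in either order, which matches the note in the description that their order is not limited.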
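In the embodiments of the present invention, the client program can convert the copy of a file, according to pre-specified formats, into heterogeneous copies in a plurality of different formats and store them on designated storage servers. Because one file is stored as multiple heterogeneous copies that all have the same content, the heterogeneous copies provide redundancy and improve the reliability of the file system. Moreover, the user can choose which copy of the file to read as needed: to analyze the file by row, the user can read the heterogeneous copy stored in row mode; to analyze it by column, the heterogeneous copy stored in column mode; and to analyze it by block (for some multidimensional table data), the heterogeneous copy stored in block mode. Because the user can choose the heterogeneous copy to read as needed, the efficiency with which users process data can be effectively improved.
FIG. 6 shows the second flowchart of storing copies of a file in the method for managing heterogeneous replicas in a distributed storage system according to an embodiment of the present invention. It differs from the first flowchart in FIG. 1 in that, before step S105, the method further includes step S109: the client program determines whether the heterogeneous copy of the file is enabled. If it is enabled, the method proceeds to step S105. If it is not enabled, the method proceeds to step S111, in which the client program stores the copies of the file on the designated storage servers, respectively, according to the location information of each storage server obtained from the metadata server. Optionally, in the embodiments of the present invention, the client program can decide whether to enable the heterogeneous copy of the file according to a parameter entered by the user; it will be understood that the embodiments of the present invention do not limit the specific condition for this decision.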
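The branch at step S109 can be sketched as a placement decision. The function name `plan_copies`, the label `"original"` for an unconverted copy, and the policy of cycling through the formats when there are more servers than formats are all assumptions made for this sketch; the source leaves the decision condition and the placement policy open.

```python
def plan_copies(enable_heterogeneous, server_names,
                formats=("row", "column", "block")):
    """Return a {server: format} plan for the copies of one file."""
    if not enable_heterogeneous:
        # S111: every server stores the same, unconverted copy.
        return {server: "original" for server in server_names}
    # S105/S107: each server stores the copy in a different format
    # (formats are reused cyclically if servers outnumber formats).
    return {server: formats[i % len(formats)]
            for i, server in enumerate(server_names)}


heterogeneous_plan = plan_copies(True, ["s1", "s2", "s3"])
plain_plan = plan_copies(False, ["s1", "s2"])
```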
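FIG. 7 is a flowchart of reading a heterogeneous copy of a file in the method for managing heterogeneous replicas in a distributed storage system according to an embodiment of the present invention. When copies or heterogeneous copies of a file are stored on the storage servers of the distributed storage system, the method further includes:
Step S113: acquire a read request parameter for reading a copy of the file.
Optionally, in the embodiments of the present invention, the read request parameter includes a file handle, a file offset, a file length, and a read mode of the copy of the file, where the read mode includes reading the copy in row mode, in column mode, or in block mode. For example, when the user needs to analyze the copy of the file by row, the read request parameter specifies row-mode reading; when the user needs to analyze it by column, the read request parameter specifies column-mode reading; and when the user needs to analyze it by block, the read request parameter specifies block-mode reading. Because the read request parameter includes the read mode, the user can choose which copy of the file to read as needed.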
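The read request parameter of step S113 mirrors the write request, with a read mode in place of the storage format. As with the other sketches, the names `ReadRequest` and `ReadMode` and the field types are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ReadMode(Enum):
    """Read modes named in the description (hypothetical enum)."""
    ROW = "row"
    COLUMN = "column"
    BLOCK = "block"


@dataclass(frozen=True)
class ReadRequest:
    """Hypothetical container for the read request parameter of step S113."""
    file_handle: int
    offset: int
    length: int
    read_mode: ReadMode


# A user who wants to analyze the data column by column asks for column mode.
request = ReadRequest(file_handle=3, offset=0, length=64,
                      read_mode=ReadMode("column"))
```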
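Step S115: acquire, according to the read request parameter, location information of the storage server to be read from the metadata server.
Optionally, the metadata server records the correspondence between each storage server and the format of the copy of the file stored on it. For example, the copy stored on storage server A is a heterogeneous copy stored in row mode; the copy stored on storage server B is a heterogeneous copy stored in column mode; and the copy stored on storage server C is a heterogeneous copy stored in block mode.
Step S117: determine whether the heterogeneous copy of the file is enabled.
If the heterogeneous copy of the file is enabled, the method proceeds to step S119, in which the client program reads the heterogeneous copy of the file from the specified storage server according to the location information, obtained from the metadata server, of the storage server to be read. For example, to read the heterogeneous copy stored in row mode, the client program can use the correspondence recorded in the metadata server between storage servers and copy formats to look up the location information of storage server A, and then, in step S119, read the row-mode heterogeneous copy from storage server A.
If the heterogeneous copy of the file is not enabled, the method proceeds to step S121, in which the client program reads the copy of the file from any one of the specified storage servers according to the location information, obtained from the metadata server, of the storage servers to be read. Because the copies stored on these storage servers have not been converted, they all have the same format, so the copy of the file can be read from any one of them.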
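The read path of steps S115 to S121 can be sketched with the same kind of in-memory stand-ins. The dictionaries standing in for the metadata server's format-to-server mapping and for the storage servers, and the function name `read_copy`, are assumptions; choosing a server at random is just one way to realize "any one" server.

```python
import random


def read_copy(file_name, read_mode, enable_heterogeneous,
              format_to_server, servers):
    """Return the stored data for file_name according to the read mode."""
    if enable_heterogeneous:
        # S119: read from the server that holds the requested format.
        return servers[format_to_server[read_mode]][file_name]
    # S121: all copies are identical, so any server will do.
    return servers[random.choice(sorted(servers))][file_name]


format_to_server = {"row": "A", "column": "B"}
heterogeneous = {"A": {"t": [1, 2, 3, 4]}, "B": {"t": [1, 3, 2, 4]}}
identical = {"A": {"t": [1, 2, 3, 4]}, "B": {"t": [1, 2, 3, 4]}}

column_data = read_copy("t", "column", True, format_to_server, heterogeneous)
plain_data = read_copy("t", "row", False, format_to_server, identical)
```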
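In the embodiments of the present invention, to improve the reliability of heterogeneous copy storage, a heterogeneous copy stored on a storage server can be repaired when a problem occurs with it. FIG. 8 is a flowchart of repairing a heterogeneous copy in the method for managing heterogeneous replicas in a distributed storage system according to an embodiment of the present invention. It includes the following steps:
Step S801: the server-side program initiates copy repair.
Optionally, in the embodiments of the present invention, there are two trigger conditions for repairing a heterogeneous copy:
Trigger condition 1: when the client program requests to read a copy (for example, in row mode), the specified copy is found to be lost (the disk may have failed or the corresponding storage server may have been shut down), and the server-side program then initiates copy repair.
Trigger condition 2: the file system periodically checks the disk status internally; when a disk is found to be damaged or removed, the corresponding copy repair is also initiated once the aging time is reached.
Step S803: the server-side program determines whether heterogeneous copies are enabled; if so, the method proceeds to step S805; if not, it proceeds to step S809.
If heterogeneous copies are enabled, the method proceeds to step S805, in which the server-side program reads the corresponding heterogeneous copy from another storage server, and then proceeds to step S807.
Step S807: the server-side program converts the heterogeneous copy of the file read from the other storage server into the heterogeneous copy corresponding to the local storage server, and stores the new heterogeneous copy on a disk of the designated storage server.
Optionally, the data recovery algorithm for heterogeneous copies in step S807 is described as follows:
(1) Row -> column and column -> row conversion only requires knowing the number of rows (or columns) and transposing the two-dimensional matrix. As shown in FIG. 3 to FIG. 5, every data item has a specific position in the table, which can be expressed, for example, by its row number and column number. Therefore, when a row -> column or column -> row conversion is needed, the two-dimensional matrix is transposed according to the number of rows or columns of the data.
(2) Row -> block and block -> row conversion. For a row -> block conversion, the corresponding row data is taken out according to the numbers of rows and columns and the number of blocks, and assembled into block data. As shown in FIG. 3 to FIG. 5, every data item has a specific position in the table, which can be expressed, for example, by its row number, column number and block number. Therefore, for a row -> block conversion, the corresponding row data can be taken out according to the row, column and block numbers and assembled into block data.
(3) Column -> block and block -> column conversion is similar to (2) and is not described again here.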
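The conversions in items (1) and (2) can be sketched as index arithmetic on the flat copies. This is one possible reading under stated assumptions: copies are flat lists, the table shape is known, and blocks are laid out left to right, top to bottom with each block stored row by row (as in the FIG. 5 example). The function names are hypothetical.

```python
def row_to_column(flat, n_rows, n_cols):
    """Transpose a row-mode copy into a column-mode copy."""
    return [flat[r * n_cols + c] for c in range(n_cols) for r in range(n_rows)]


def column_to_row(flat, n_rows, n_cols):
    """Transpose a column-mode copy back into a row-mode copy."""
    return [flat[c * n_rows + r] for r in range(n_rows) for c in range(n_cols)]


def _block_positions(n_rows, n_cols, block_rows, block_cols):
    """Yield (row, col) table positions in block-mode order."""
    for r0 in range(0, n_rows, block_rows):
        for c0 in range(0, n_cols, block_cols):
            for r in range(r0, min(r0 + block_rows, n_rows)):
                for c in range(c0, min(c0 + block_cols, n_cols)):
                    yield r, c


def row_to_block(flat, n_rows, n_cols, block_rows, block_cols):
    """Pick the row data belonging to each block and assemble block data."""
    return [flat[r * n_cols + c]
            for r, c in _block_positions(n_rows, n_cols, block_rows, block_cols)]


def block_to_row(blocks, n_rows, n_cols, block_rows, block_cols):
    """Scatter block data back to its row-mode positions."""
    flat = [None] * (n_rows * n_cols)
    positions = _block_positions(n_rows, n_cols, block_rows, block_cols)
    for value, (r, c) in zip(blocks, positions):
        flat[r * n_cols + c] = value
    return flat


ROW_COPY = [50, 28, 352, 120, 21, 99, 66, 112, 32, 52, 123, 13, 65, 23, 87, 344]
column_copy = row_to_column(ROW_COPY, 4, 4)
block_copy = row_to_block(ROW_COPY, 4, 4, 2, 2)
```

Column -> block and block -> column conversions can be composed from these, for example by going through the row-mode layout.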
Step S809: if heterogeneous copies are not enabled, the copy on another storage server can be copied directly to the local storage server.
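The repair flow of steps S803 to S809 can be sketched as follows, restricted to row and column formats for brevity. The data model (a dictionary mapping each server name to a `(format, data)` pair, with `None` for a lost copy) and the function names are assumptions made for illustration only.

```python
def transpose_flat(flat, n_outer, n_inner):
    """Reorder a flat n_outer x n_inner layout into n_inner x n_outer order."""
    return [flat[o * n_inner + i] for i in range(n_inner) for o in range(n_outer)]


def repair(copies, lost_server, n_rows, n_cols):
    """Rebuild the copy on lost_server from a surviving copy elsewhere."""
    lost_format, _ = copies[lost_server]
    # S805: read a surviving copy from another storage server.
    source_format, source = next(
        (fmt, data) for name, (fmt, data) in copies.items()
        if name != lost_server and data is not None
    )
    if source_format == lost_format:
        # S809: same format, so a direct copy is enough.
        repaired = list(source)
    elif source_format == "row":
        # S807: row -> column conversion by transposition.
        repaired = transpose_flat(source, n_rows, n_cols)
    else:
        # S807: column -> row conversion by transposition.
        repaired = transpose_flat(source, n_cols, n_rows)
    copies[lost_server] = (lost_format, repaired)
    return repaired


copies = {"A": ("row", [1, 2, 3, 4, 5, 6]), "B": ("column", None)}
rebuilt = repair(copies, "B", n_rows=2, n_cols=3)
```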
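According to another aspect of the present invention, an apparatus for managing heterogeneous replicas in a distributed storage system is also disclosed. As shown in FIG. 9, the apparatus 900 for managing heterogeneous replicas in a distributed storage system includes:
a write request obtaining module 901, configured to acquire a write request parameter for storing a copy of a file, where the write request parameter includes a file handle, a file offset and a file length;
a first location obtaining module 903, configured to acquire, according to the write request parameter, location information of each storage server in the distributed storage system from the metadata server. FIG. 2 is a schematic diagram of the copy redundancy architecture in a distributed file system according to an embodiment of the present invention, in which the metadata server 203 can store the location information of a plurality of storage servers;
a format conversion module 905, configured to convert the copy of the file, according to the write request parameter and a pre-specified format, into heterogeneous copies of the file in a plurality of different formats. In the embodiments of the present invention, a heterogeneous copy of a file is a copy obtained by converting the data format of a copy of the file. Its content is the same as that of the copy of the file and of the file itself. Each heterogeneous copy suits a different application scenario (for example, storing the copy in row mode, in column mode, or in block mode), and the heterogeneous copies can also serve as redundancy for one another. See FIG. 3 to FIG. 5 for details;
a copy storage module 907, configured to store the converted heterogeneous copies of the file on designated storage servers, respectively, according to the location information of the storage servers obtained from the metadata server. Referring again to FIG. 2, the client program 201 can store the converted heterogeneous copies of the file on the designated storage servers 205, respectively, according to the location information obtained from the metadata server 203.
In the embodiments of the present invention, the client program can convert the copy of a file, according to pre-specified formats, into heterogeneous copies in a plurality of different formats and store them on designated storage servers. Because one file is stored as multiple heterogeneous copies that can be converted into one another, the heterogeneous copies provide redundancy and improve the reliability of the file system. Moreover, the user can choose which copy of the file to read as needed: to analyze the file by row, the user can read the heterogeneous copy stored in row mode; to analyze it by column, the heterogeneous copy stored in column mode; and to analyze it by block (for some multidimensional table data), the heterogeneous copy stored in block mode. Because the user can choose the heterogeneous copy to read as needed, the efficiency with which users process data can be effectively improved.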
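Optionally, in another embodiment of the present invention, the apparatus 900 further includes:
a first determining module 909, configured to determine, according to the write request parameter, whether to enable the heterogeneous copy of the file; if the heterogeneous copy of the file is enabled, to trigger the format conversion module 905; and if the heterogeneous copy of the file is not enabled, to trigger the copy storage module 907 to store the copies of the file on the designated storage servers, respectively, according to the location information of each storage server obtained from the metadata server.
Optionally, in another embodiment of the present invention, the format conversion module 905 is configured to convert the copy of the file, according to the write request parameter, into heterogeneous copies in a plurality of different formats (row mode storage, column mode storage, or block mode storage), and to cache the converted heterogeneous copies.
Optionally, in another embodiment of the present invention, the apparatus 900 further includes:
a read request obtaining module 911, configured to acquire a read request parameter for reading a copy of the file;
a second location obtaining module 913, configured to acquire, according to the read request parameter, location information of the storage server to be read from the metadata server;
a second determining module 915, configured to determine, according to the read request parameter, whether to enable the heterogeneous copy of the file;
a copy reading module 917, configured to, if the heterogeneous copy of the file is enabled, read the heterogeneous copy of the file from the specified storage server according to the location information, obtained from the metadata server, of the storage server to be read; and, if the heterogeneous copy of the file is not enabled, read a copy of the file from any one of the specified plurality of storage servers according to that location information.
Optionally, in another embodiment of the present invention, the write request parameter includes: a file handle, a file offset, a file length, and a format of the copy of the file, where the format of the copy of the file includes: storing in row mode, storing in column mode, or storing in block mode; the read request parameter includes: a file handle, a file offset, a file length, and a read mode of the copy of the file, the read mode including reading the copy in row mode, reading the copy in column mode, or reading the copy in block mode.
The above are preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements also fall within the scope of protection of the present invention.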

Claims (10)

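  1. A method for managing heterogeneous replicas in a distributed storage system, comprising:
    acquiring a write request parameter for storing a copy of a file;
    acquiring, according to the write request parameter, location information of each storage server in the distributed storage system from a metadata server;
    converting the copy of the file, according to the write request parameter and a pre-specified format, into heterogeneous copies of the file in a plurality of different formats;
    storing the converted heterogeneous copies of the file on designated storage servers, respectively, according to the location information of the storage servers obtained from the metadata server.
  2. The method according to claim 1, wherein, before the step of converting the copy of the file, according to the write request parameter and a pre-specified format, into heterogeneous copies of the file in a plurality of different formats, the method further comprises:
    determining, according to the write request parameter, whether to enable the heterogeneous copy of the file;
    if the heterogeneous copy of the file is enabled, proceeding to the step of converting the copy of the file, according to the write request parameter and a pre-specified format, into heterogeneous copies of the file in a plurality of different formats.
  3. The method according to claim 1 or 2, wherein the step of converting the copy of the file, according to the write request parameter and a pre-specified format, into heterogeneous copies of the file in a plurality of different formats is:
    converting the copy of the file, according to the write request parameter, into heterogeneous copies of the file in a plurality of different formats (row mode storage, column mode storage, or block mode storage), and caching the converted heterogeneous copies of the file.
  4. The method according to claim 1, wherein the method further comprises:
    acquiring a read request parameter for reading a copy of the file;
    acquiring, according to the read request parameter, location information of the storage server to be read from the metadata server;
    determining, according to the read request parameter, whether to enable the heterogeneous copy of the file;
    if the heterogeneous copy of the file is enabled, reading the heterogeneous copy of the file from the specified storage server according to the location information, obtained from the metadata server, of the storage server to be read.
  5. The method according to claim 4, wherein the write request parameter comprises: a file handle, a file offset, a file length, and a format of the copy of the file, where the format of the copy of the file comprises: storing in row mode, storing in column mode, or storing in block mode;
    the read request parameter comprises: a file handle, a file offset, a file length, and a read mode of the copy of the file, the read mode comprising reading the copy in row mode, reading the copy in column mode, or reading the copy in block mode.
  6. An apparatus for managing heterogeneous replicas in a distributed storage system, comprising:
    a write request obtaining module, configured to acquire a write request parameter for storing a copy of a file;
    a first location obtaining module, configured to acquire, according to the write request parameter, location information of each storage server in the distributed storage system from a metadata server;
    a format conversion module, configured to convert the copy of the file, according to the write request parameter and a pre-specified format, into heterogeneous copies of the file in a plurality of different formats;
    a copy storage module, configured to store the converted heterogeneous copies of the file on designated storage servers, respectively, according to the location information of the storage servers obtained from the metadata server.
  7. The apparatus according to claim 6, wherein the apparatus further comprises:
    a first determining module, configured to determine, according to the write request parameter, whether to enable the heterogeneous copy of the file, and, if the heterogeneous copy of the file is enabled, to trigger the format conversion module.
  8. The apparatus according to claim 6 or 7, wherein the format conversion module is configured to convert the copy of the file, according to the write request parameter, into heterogeneous copies of the file in a plurality of different formats (row mode storage, column mode storage, or block mode storage), and to cache the converted heterogeneous copies of the file.
  9. The apparatus according to claim 6, wherein the apparatus further comprises:
    a read request obtaining module, configured to acquire a read request parameter for reading a copy of the file;
    a second location obtaining module, configured to acquire, according to the read request parameter, location information of the storage server to be read from the metadata server;
    a second determining module, configured to determine, according to the read request parameter, whether to enable the heterogeneous copy of the file;
    a copy reading module, configured to, if the heterogeneous copy of the file is enabled, read the heterogeneous copy of the file from the specified storage server according to the location information, obtained from the metadata server, of the storage server to be read.
  10. The apparatus according to claim 9, wherein the write request parameter comprises: a file handle, a file offset, a file length, and a format of the copy of the file, where the format of the copy of the file comprises: storing in row mode, storing in column mode, or storing in block mode;
    the read request parameter comprises: a file handle, a file offset, a file length, and a read mode of the copy of the file, the read mode comprising reading the copy in row mode, reading the copy in column mode, or reading the copy in block mode.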
PCT/CN2014/086658 2014-05-15 2014-09-16 Method and apparatus for managing heterogeneous replicas in a distributed storage system WO2015172478A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410206795.1 2014-05-15
CN201410206795.1A CN105095294B (zh) 2014-05-15 2014-05-15 Method and apparatus for managing heterogeneous replicas in a distributed storage system

Publications (1)

Publication Number Publication Date
WO2015172478A1 true WO2015172478A1 (zh) 2015-11-19

Family

ID=54479238

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/086658 WO2015172478A1 (zh) 2014-05-15 2014-09-16 Method and apparatus for managing heterogeneous replicas in a distributed storage system

Country Status (2)

Country Link
CN (1) CN105095294B (zh)
WO (1) WO2015172478A1 (zh)

Also Published As

Publication number Publication date
CN105095294B (zh) 2019-08-09
CN105095294A (zh) 2015-11-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14892213

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14892213

Country of ref document: EP

Kind code of ref document: A1