WO2018108158A1 - Majority-based data storage method, apparatus, storage medium, and device - Google Patents

Majority-based data storage method, apparatus, storage medium, and device

Publication number
WO2018108158A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
data
storage unit
preset value
received
Prior art date
Application number
PCT/CN2017/116513
Other languages
English (en)
French (fr)
Inventor
吴义谱
张炎泼
Original Assignee
贵州白山云科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 贵州白山云科技有限公司
Publication of WO2018108158A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]

Definitions

  • Embodiments of the present invention relate to, but are not limited to, the field of data storage technologies, and in particular, to a majority-based data storage method, apparatus, storage medium, and device.
  • In the temporary storage and asynchronous distribution method, the device that receives and processes the client request receives the entire content of the uploaded file and stores it temporarily on its own disk.
  • A scheduling process then asynchronously distributes the temporary file from disk to multiple storage devices, and the data on each storage device serves as one copy of the file. After all copies have been written successfully, the temporary file on the front-end device is deleted.
  • If a distribution write fails, a loop retry mechanism is used.
  • This method is characterized by N+1 access to recently uploaded files, where N is the number of file copies. This mechanism avoids the problem of a file being inaccessible before its distribution has completed.
  • In the one-master, multiple-slave write method, a file is stored in a shard. The system pre-allocates a certain number of shards, each shard has multiple copies stored on different storage machines, and the copies use an algorithm (such as the Paxos algorithm) to elect a master.
  • When the original master goes down or loses contact, the election algorithm selects a new master. The master is responsible for write operations on the shard.
  • This method is characterized in that, for a file write request, the file is written to a shard chosen according to a certain policy (such as a hash policy).
  • A write operation must first write the master shard, which then writes the slave shards. When at least N/2 slave shards have been written successfully, the master shard returns a success response, where N is the number of slave shards.
  • 1. The file is temporarily stored on a device's disk, which creates a single point of failure: if the device goes down or the disk holding the file fails before the file has been distributed to the storage devices, the file is permanently lost. 2. Receiving the file, writing it to disk, and then distributing it asynchronously increases disk IO on the temporary storage machine; under high load, this seriously degrades the performance of that device.
  • The front-end server receives a file upload request and, according to the request, selects one file group from a first preset number of file groups;
  • The first preset value, the second preset value, and the third preset value are all positive integers greater than 1, and the third preset value is less than or equal to the second preset value.
  • Each file group stores the address of each of its storage units.
  • the method further includes:
  • Uploading the data in the received file, according to the selected file group, to the second preset number of storage units corresponding to the selected file group includes:
  • uploading the data in the received file over the HTTP connections to each of the second preset number of storage units corresponding to the selected file group.
  • Determining whether the number of connected storage units is greater than the third preset value; if yes, disconnecting the HTTP connection to the selected storage unit with the slowest transmission speed, and removing that storage unit from the selected file group.
  • Uploading the data in the received file in parallel over the HTTP connections to the second preset number of storage units corresponding to the selected file group includes:
  • uploading the data in the received file in parallel, over the HTTP connections and as a byte stream, to the second preset number of storage units corresponding to the selected file group.
  • determining that the data is successfully written includes:
  • the computer readable storage medium provided by the embodiment of the present invention has a computer program stored thereon, and when the program is executed by the processor, the steps of the foregoing method are implemented.
  • a computer device provided by an embodiment of the present invention includes a memory, a processor, and a computer program stored on the memory and operable on the processor, and the processor implements the steps of the method when the program is executed.
  • Another majority-based data storage method comprises: a storage unit receiving the data in a file uploaded by the front-end server; and, after determining that the data in the file has been completely received, sending the file physical information of the file to the front-end server.
  • Sending the file physical information of the file to the front-end server includes:
  • sending file physical information that includes the SHA1 to the front-end server.
  • An apparatus provided by the embodiment of the present invention is applied to a front-end server, and includes:
  • the receiving unit is configured to receive a file upload request, and select a file group from the first preset number of file groups according to the file upload request;
  • The uploading unit is configured to receive the data in the file and, according to the selected file group, upload the received data to the second preset number of storage units corresponding to the selected file group;
  • The processing unit is configured to: when it determines that the data upload for the file is complete and file physical information returned by at least the third preset number of storage units has been received, determine that the data is successfully written, then update the metadata information in the file to the database and return that the file upload request succeeded.
  • the first preset value, the second preset value, and the third preset value are all positive integers greater than 1, and the third preset value is less than or equal to the second preset value.
  • the above device also has the following features:
  • Each file group stores an address corresponding to each storage unit
  • The uploading unit is further configured to, before uploading the data in the received file to the second preset number of storage units corresponding to the selected file group, simultaneously establish an HTTP connection with each storage unit in the selected file group according to the address of each storage unit stored in the selected file group.
  • The uploading unit is specifically configured to receive the data in the file and, each time the amount of received data reaches a preset threshold, upload the received data in parallel over the HTTP connections to the second preset number of storage units corresponding to the selected file group.
  • the above device also has the following features:
  • The front-end server further includes a detecting unit.
  • The detecting unit is configured to: detect the transmission speed of each storage unit and select the storage unit with the slowest transmission speed; determine whether the number of connected storage units is greater than the third preset value; if yes, disconnect the HTTP connection to the selected slowest storage unit and remove it from the selected file group; and, after the data upload for the file is complete, detect whether file physical information returned by at least the third preset number of storage units has been received and, if yes, determine that the data is successfully written.
  • the above device also has the following features:
  • The processing unit is specifically configured to receive the file physical information, including the SHA1, returned by each storage unit, and to determine that the data is successfully written when the number of file physical information items with the same SHA1 is greater than or equal to the third preset value.
  • An embodiment of the present invention further provides an apparatus, applied to a storage unit, comprising: a receiving unit configured to receive the data in a file uploaded by the front-end server; and a sending unit configured to send the file physical information to the front-end server after determining that the data in the file has been completely received.
  • the above device also has the following features:
  • the sending unit is specifically configured to: after determining that the data in the file is received, send the file physical information including the SHA1 to the front-end server.
  • the beneficial effects of the present invention are as follows:
  • Embodiments of the present invention provide a majority-based data storage method, apparatus, storage medium, and device. In the method, the front-end server may, according to a received file upload request, select one file group from a first preset number of file groups and, according to the selected file group, upload the data in the received file to the second preset number of storage units corresponding to the selected file group. When it determines that the data upload for the file is complete and file physical information returned by at least the third preset number of storage units has been received, the data is successfully written; the metadata information in the file is then updated to the database, and the file upload request is returned as successful.
  • Because files are written with the majority-success method, there is no single point of failure, and the failure of any minority of copy devices does not cause data loss.
  • In addition, the server uses a streaming transparent proxy mechanism to transfer the file content to each file copy, which reduces unnecessary disk IO. The copies are peers, and no master-slave relationship between devices is maintained, which improves the data write success rate.
  • FIG. 1 is a schematic flowchart diagram of a majority-based data storage method according to Embodiment 1 of the present invention
  • FIG. 2 is a topological structural diagram of a system according to Embodiment 1 of the present invention.
  • FIG. 3 is a schematic flowchart of a majority-based data storage method according to Embodiment 2 of the present invention.
  • FIG. 5 is a schematic structural diagram of an apparatus applied to a storage unit according to Embodiment 4 of the present invention.
  • FIG. 6 is a schematic structural diagram of a majority-based data storage system according to Embodiment 5 of the present invention.
  • Embodiment 1:
  • Embodiment 1 of the present invention provides a majority-based data storage method. FIG. 1 is a schematic flowchart of the method, which may include the following steps:
  • Step 101: The front-end server receives a file upload request and, according to the request, selects one file group from a first preset number of file groups.
  • Step 102: The front-end server receives the data in the file and, according to the selected file group, uploads the received data to the second preset number of storage units corresponding to the selected file group.
  • Step 103: When it determines that the data upload for the file is complete and file physical information returned by at least the third preset number of storage units has been received, the data is successfully written; the metadata information in the file is updated to the database, and the file upload request is returned as successful.
  • the first preset value, the second preset value, and the third preset value are all positive integers greater than 1, and the third preset value is less than or equal to the second preset value.
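  • Steps 101 to 103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `StorageUnit`, `upload`, the in-memory `files` dictionary, and the dictionary standing in for the database are all hypothetical names, and the HTTP transfer is replaced by a direct method call.

```python
import random
from dataclasses import dataclass, field


@dataclass
class StorageUnit:
    """Hypothetical in-memory stand-in for one storage unit (one file copy)."""
    name: str
    healthy: bool = True
    files: dict = field(default_factory=dict)

    def write(self, key, data):
        """Store the data and return file physical info, or None on failure."""
        if not self.healthy:
            return None
        self.files[key] = data
        return {"unit": self.name, "size": len(data)}


def upload(key, data, file_groups, quorum, database, rng=random):
    """Pick a file group, write every copy, and commit metadata on quorum."""
    # Step 101: select one file group out of the available file groups.
    group_name = rng.choice(sorted(file_groups))
    units = file_groups[group_name]
    # Step 102: upload the data to every storage unit of that group.
    replies = [unit.write(key, data) for unit in units]
    acks = [reply for reply in replies if reply is not None]
    # Step 103: succeed only if at least `quorum` units returned file info.
    if len(acks) < quorum:
        return False
    database[key] = {"group": group_name, "size": len(data)}
    return True


# Example: one group of three units with one failed unit, quorum of two.
groups = {
    "group-x": [
        StorageUnit("x1"),
        StorageUnit("x2", healthy=False),
        StorageUnit("x3"),
    ],
}
db = {}
ok = upload("a.txt", b"hello", groups, quorum=2, database=db)
```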
  • the number of storage units corresponding to different file groups in step 101 may be the same or different.
  • For example, suppose the first preset value is 2, the second preset value is 3, and the third preset value is 2. As shown in FIG. 2 (the topology diagram of this embodiment), one copy of the data is stored on each of three storage units (Store). When two Stores return the file physical information (that is, a majority of the copies report a successful write), the front-end server (Front) can determine that the data is successfully written and update the metadata information in the file to the database (DB).
  • The front-end server then returns that the file upload request succeeded, so that the user receives the feedback.
  • In FIG. 2, group-x is file group x and group-y is file group y; these are not described in detail in this embodiment.
  • The server uses a streaming transparent proxy mechanism to transfer the file content to each file copy, which reduces unnecessary disk IO.
  • The copies are peers, and no master-slave relationship between devices is maintained, which improves the data write success rate.
  • For example, if the preset threshold for the data capacity is 1 MB and the second preset value is 3, then each time the front-end server receives 1 MB of data, it uploads that 1 MB in parallel over the HTTP connections to the three storage units corresponding to the selected file group.
  • The front-end server may be an nginx service that receives user requests directly; it reads 1 MB of buffered data at a time in a loop and also handles database access. This is not described in detail in this embodiment.
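  • The chunked, parallel forwarding described above can be sketched as follows. This is an illustration under assumptions: `stream_to_copies` is a hypothetical helper, and in-memory `io.BytesIO` buffers stand in for the HTTP connections to the storage units. The point it shows is that the front end never writes the file to its own disk; each 1 MB chunk is passed straight on to every copy.

```python
import io
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1024 * 1024  # the 1 MB threshold used in the example above


def stream_to_copies(source, sinks, chunk_size=CHUNK_SIZE):
    """Forward `source` to every sink chunk by chunk; return bytes forwarded.

    Each sink is any object with a write(bytes) method. Here the sinks are
    hypothetical in-memory buffers rather than real HTTP connections.
    """
    total = 0
    with ThreadPoolExecutor(max_workers=max(1, len(sinks))) as pool:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            # Send the same chunk to all copies in parallel, then wait for
            # all of them so that chunks reach each copy in order.
            futures = [pool.submit(sink.write, chunk) for sink in sinks]
            for future in futures:
                future.result()
            total += len(chunk)
    return total


payload = bytes(range(256)) * 10000  # 2,560,000 bytes of sample data
copies = [io.BytesIO() for _ in range(3)]
sent = stream_to_copies(io.BytesIO(payload), copies)
```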
  • Uploading the data in the received file in parallel over the HTTP connections to the second preset number of storage units corresponding to the selected file group is specifically performed as follows:
  • the data in the received file is uploaded in parallel, over the HTTP connections and as a byte stream, to the second preset number of storage units corresponding to the selected file group.
  • the server uses a streaming transparent proxy mechanism to transfer file content to each file copy, reducing unnecessary disk IO.
  • The method may further include the following steps A1-A5:
  • Step A1: Detect the transmission speed of each storage unit and select the storage unit with the slowest transmission speed.
  • Step A2: Determine whether the number of connected storage units is greater than the third preset value.
  • Step A3: If yes, disconnect the HTTP connection to the selected slowest storage unit and remove it from the selected file group.
  • Step A4: After the data upload for the file is complete, detect whether file physical information returned by at least the third preset number of storage units has been received.
  • Step A5: If yes, determine that the data is successfully written.
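  • Steps A1 to A5 can be sketched as follows. The function names, the `speeds` dictionary (storage unit name mapped to a measured transfer speed in bytes per second), and the sample values are assumptions made for illustration; how speed is actually measured is not specified here.

```python
def cull_slowest(speeds, quorum):
    """Drop the slowest connected unit if more than `quorum` units remain.

    Mutates `speeds` and returns the name of the removed unit, or None
    when no unit can be spared.
    """
    if not speeds:
        return None
    # Step A1: select the storage unit with the slowest transmission speed.
    slowest = min(speeds, key=speeds.get)
    # Step A2: a unit may be dropped only while more than `quorum` remain.
    if len(speeds) <= quorum:
        return None
    # Step A3: disconnect it and remove it from the selected file group.
    del speeds[slowest]
    return slowest


def write_succeeded(replies, quorum):
    """Steps A4-A5: success once at least `quorum` units returned file info."""
    return len(replies) >= quorum


speeds = {"store-1": 9.5e6, "store-2": 1.2e5, "store-3": 8.8e6}
removed = cull_slowest(speeds, quorum=2)       # drops "store-2"
second = cull_slowest(speeds, quorum=2)        # only two left, nothing dropped
```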
  • Step B2: When the number of file physical information items with the same SHA1 is greater than or equal to the third preset value, determine that the data is successfully written.
  • The file physical information may also include information such as the MD5 and the data size (SIZE).
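  • The SHA1 comparison can be sketched as follows. `majority_sha1` and the shape of each reply (a dictionary with a `sha1` key) are assumptions made for illustration. Counting identical digests rather than raw replies means a copy that was corrupted in transit does not count toward the majority.

```python
import hashlib
from collections import Counter


def majority_sha1(replies, quorum):
    """Return the SHA1 reported by at least `quorum` units, or None."""
    counts = Counter(reply["sha1"] for reply in replies)
    if not counts:
        return None
    digest, votes = counts.most_common(1)[0]
    return digest if votes >= quorum else None


good = hashlib.sha1(b"file body").hexdigest()
bad = hashlib.sha1(b"file bodx").hexdigest()  # simulated corrupted copy
replies = [
    {"unit": "store-1", "sha1": good},
    {"unit": "store-2", "sha1": bad},
    {"unit": "store-3", "sha1": good},
]
agreed = majority_sha1(replies, quorum=2)
```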
  • Embodiment 1 of the present invention provides a majority-based data storage method in which the front-end server may, according to a received file upload request, select one file group from a first preset number of file groups and, according to the selected file group, upload the data in the received file to the second preset number of storage units corresponding to the selected file group. When it determines that the data upload for the file is complete and file physical information returned by at least the third preset number of storage units has been received, the data is successfully written; the metadata information in the file is updated to the database, and the file upload request is returned as successful.
  • Because this technical solution writes files with the majority-success method, there is no single point of failure, and the failure of any minority of copy devices does not cause data loss.
  • In addition, the server uses a streaming transparent proxy mechanism to transfer the file content to each file copy, which reduces unnecessary disk IO. The copies are peers, and no master-slave relationship between devices is maintained, which improves the data write success rate.
  • the second embodiment of the present invention is based on the same inventive concept as the first embodiment of the present invention.
  • Embodiment 2 of the present invention provides a majority-based data storage method on the storage unit side. The method, whose schematic flowchart is shown in the accompanying figure, may include the following steps:
  • Step 301: The storage unit receives the data in a file uploaded by the front-end server.
  • Step 302: After determining that the data in the file has been completely received, the storage unit sends the file physical information of the file to the front-end server.
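  • Steps 301 and 302 can be sketched from the storage unit's side as follows. `receive_file`, the 64 KB read size, and the returned dictionary layout are assumptions made for illustration; the SHA1 is computed incrementally with Python's standard `hashlib` so the digest is ready as soon as the last chunk arrives.

```python
import hashlib
import io


def receive_file(stream, chunk_size=64 * 1024):
    """Consume an uploaded byte stream; return (data, file physical info)."""
    sha1 = hashlib.sha1()
    buffer = io.BytesIO()  # stands in for the storage unit's local storage
    # Step 301: receive the data chunk by chunk until the stream ends.
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        sha1.update(chunk)
        buffer.write(chunk)
    data = buffer.getvalue()
    # Step 302: build the file physical info to send back to the front end.
    info = {"sha1": sha1.hexdigest(), "size": len(data)}
    return data, info


body = b"example file content" * 5000
stored, info = receive_file(io.BytesIO(body))
```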
  • The server uses a streaming transparent proxy mechanism to transfer the file content to each file copy, which reduces unnecessary disk IO.
  • The copies are peers, and no master-slave relationship between devices is maintained, which improves the data write success rate.
  • For example, the front-end server receives a file upload write request and selects one file group, which stores the storage addresses of three storage units; the front-end server simultaneously establishes HTTP connections to the three storage units.
  • The front-end server reads 1 MB of buffered data at a time in a loop, creates one thread per storage unit, and transfers the buffered data to the three storage units concurrently while monitoring the transmission speed of each storage unit. Each time a buffer has been sent, it checks whether a majority of the storage units can still succeed; provided that a majority succeeds, it actively culls the slow storage unit, that is, it disconnects that unit's HTTP connection for the data transfer.
  • After the front-end server updates the metadata information, it returns that the file upload request succeeded. Note that once the request has returned successfully, a majority of the copies hold correctly written data, and a file access can read the data from any correctly written copy.
  • The uploading unit 42 is configured to receive the data in the file and, according to the selected file group, upload the received data to the second preset number of storage units corresponding to the selected file group;
  • The processing unit 43 is configured to: when it determines that the data upload for the file is complete and file physical information returned by at least the third preset number of storage units has been received, determine that the data is successfully written, then update the metadata information in the file to the database and return that the file upload request succeeded;
  • the first preset value, the second preset value, and the third preset value are all positive integers greater than 1, and the third preset value is less than or equal to the second preset value.
  • the address corresponding to each storage unit is stored in each file group.
  • The uploading unit 42 is further configured to, before uploading the data in the received file to the second preset number of storage units corresponding to the selected file group, establish an HTTP connection with each storage unit in the selected file group according to the address of each storage unit stored in the selected file group.
  • The uploading unit 42 may be specifically configured to receive the data in the file and, each time the amount of received data reaches a preset threshold, upload the received data in parallel over the HTTP connections to the second preset number of storage units corresponding to the selected file group.
  • The front-end server further includes a detecting unit 44. The detecting unit 44 is configured to: detect the transmission speed of each storage unit and select the storage unit with the slowest transmission speed; determine whether the number of connected storage units is greater than the third preset value; if yes, disconnect the HTTP connection to the selected slowest storage unit and remove it from the selected file group; and, after the data upload for the file is complete, detect whether file physical information returned by at least the third preset number of storage units has been received and, if yes, determine that the data is successfully written.
  • The uploading unit 42 is specifically configured to upload the data in the received file in parallel, over the HTTP connections and as a byte stream, to the second preset number of storage units corresponding to the selected file group.
  • The processing unit 43 may be specifically configured to receive the file physical information, including the SHA1, returned by each storage unit, and to determine that the data is successfully written when the number of received file physical information items with the same SHA1 is greater than or equal to the third preset value.
  • Embodiment 3 of the present invention provides a front-end server. The front-end server may, according to a received file upload request, select one file group from a first preset number of file groups and, according to the selected file group, upload the data in the received file to the second preset number of storage units corresponding to the selected file group. When it determines that the data upload for the file is complete and file physical information returned by at least the third preset number of storage units has been received, the data is successfully written; the metadata information in the file is then updated to the database, and the file upload request is returned as successful.
  • Because files are written with the majority-success method, there is no single point of failure, and the failure of any minority of copy devices does not cause data loss.
  • In addition, the server uses a streaming transparent proxy mechanism to transfer the file content to each file copy, which reduces unnecessary disk IO. The copies are peers, and no master-slave relationship between devices is maintained, which improves the data write success rate.
  • the receiving unit 51 is configured to receive data in a file uploaded by the front end server;
  • the transmitting unit 52 is arranged to transmit the file physical information to the front end server after determining that the data reception in the file is completed.
  • the sending unit 52 may be specifically configured to send the file physical information including the SHA1 to the front-end server after determining that the data in the file is received.
  • Embodiment 4 of the present invention provides a storage unit.
  • Files are written without a single point of failure, and the failure of any minority of copy devices does not cause data loss.
  • In addition, the server uses a streaming transparent proxy mechanism to transfer the file content to each file copy, which reduces unnecessary disk IO. The copies are peers, and no master-slave relationship between devices is maintained, which improves the data write success rate.
  • Embodiment 5 is based on the same inventive concept as Embodiments 1 and 2 of the present invention.
  • Embodiment 5 of the present invention provides a data storage system based on a majority.
  • the system may mainly include:
  • The front-end server 61 is configured to: receive a file upload request and, according to the request, select one file group from a first preset number of file groups, where each file group corresponds to a second preset number of storage units; receive the data in the file and, according to the selected file group, upload the received data to the second preset number of storage units corresponding to the selected file group; and, when it determines that the data upload for the file is complete and file physical information returned by at least the third preset number of storage units has been received, determine that the data is successfully written and update the metadata information in the file to the database.
  • the storage unit 62 is configured to receive data in a file uploaded by the front-end server, and send the file physical information to the front-end server after determining that the data in the file is received.
  • Embodiment 5 of the present invention provides a majority-based data storage system in which the front-end server may, according to a received file upload request, select one file group from a first preset number of file groups and, according to the selected file group, upload the data in the received file to the second preset number of storage units corresponding to the selected file group. When it determines that the data upload for the file is complete and file physical information returned by at least the third preset number of storage units has been received, the data is successfully written; the metadata information in the file is updated to the database, and the file upload request is returned as successful.
  • Because files are written with the majority-success method, there is no single point of failure, and the failure of any minority of copy devices does not cause data loss.
  • In addition, the server uses a streaming transparent proxy mechanism to transfer the file content to each file copy, which reduces unnecessary disk IO. The copies are peers, and no master-slave relationship between devices is maintained, which improves the data write success rate.
  • The server uses a streaming proxy to transfer the file content to each copy, which can reduce disk IO usage.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

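Disclosed herein are a majority-based data storage method, apparatus, storage medium and device. In the method, a front-end server may select one file group from a first preset number of file groups according to a received file upload request, and upload the data of the received file to each of a second preset number of storage units corresponding to the selected file group. When it determines that the upload of the data in the file is complete and it has received file physical information returned by a number of storage units greater than or equal to a third preset value, the data is written successfully; the file's metadata information is then updated to a database, and at that point a success response to the file upload request is returned. By writing files with the majority-success method, the solution has no single point of failure, and the failure of any minority of replica devices does not cause data loss. In addition, the server uses a streaming proxy to transfer the file content to each replica, which can reduce disk IO usage.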

Description

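A majority-based data storage method, apparatus, storage medium and device
This application claims priority to Chinese patent application No. 201611168618.4, entitled "A majority-based data storage method, apparatus and system" and filed with the Chinese Patent Office on December 16, 2016, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present invention relate to, but are not limited to, the technical field of data storage, and in particular to a majority-based data storage method, apparatus, storage medium and device.
Background
In large-scale distributed storage systems, device crashes, network anomalies and disk failures are commonplace. To cope with them, a distributed storage system usually stores data redundantly in multiple copies, each of which is called a replica. When one device fails, the data can then be accessed from other devices; likewise, when the data of one replica is lost, it can be recovered from the other replicas, which ensures data reliability.
Because multiple replicas must be written whenever data is written, a rule is needed for deciding how the replicas must be written before a request counts as successful. Existing methods for writing multiple replicas include:
1. Staging and asynchronous distribution. On the device that receives and processes the client request, the entire content of the uploaded file is received and temporarily stored on that device's disk, the corresponding file metadata information is updated, and a success response is returned. Meanwhile, a scheduler process asynchronously distributes the temporary file on disk to multiple storage devices, the data on each storage device serving as one replica of the file. After all replicas have been written successfully, the temporary file on the front-end device is deleted; if a distribution write fails, it is retried in a loop. The characteristic of this approach is that recently uploaded files are accessed in an N+1 manner, where N is the number of file replicas; this avoids the problem of a file being inaccessible before distribution has finished.
2. One master and multiple slaves. In a distributed system cluster, a file is stored in a shard, and the system pre-allocates a certain number of shards. Each shard has multiple replicas, which are stored on different storage machines. The replicas use some algorithm (such as the Paxos algorithm) to elect a master; when the original master crashes or loses contact, the election algorithm selects a new one. The master is responsible for write operations on the shard. The characteristic of this approach is that a write request for a file is routed to a shard according to some policy (such as a hash policy); the write must go to the master shard first, which then writes to the slave shards, and once N/2 or more slave shards have been written successfully the master shard returns a success response, where N is the number of slave shards.
The existing data write methods above have the following problems:
1. Staging a file on the device's disk creates a single point of failure: if the device crashes, or the disk holding the file fails, before the file has been distributed to the storage devices, the file is lost permanently.
2. Receiving the file to disk and then distributing it asynchronously increases disk IO on the staging machine, which severely affects that device's performance under high load.
3. The one-master-multiple-slaves method requires maintaining the master-slave relationship between shards; when the master fails, every write operation mapped to that shard fails until a new master has been elected.
There is therefore an urgent need for a data storage method that prevents file loss, improves the data write success rate, has no single point of failure, and reduces disk IO usage.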
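Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
Embodiments of the present invention provide a majority-based data storage method, apparatus, storage medium and device, so as to prevent file loss, improve the data write success rate, avoid a single point of failure, and reduce disk IO usage.
The majority-based data storage method provided by an embodiment of the present invention includes:
a front-end server receiving a file upload request, and selecting one file group from a first preset number of file groups according to the file upload request;
receiving the data in the file and, according to the selected file group, uploading the received data to each of a second preset number of storage units corresponding to the selected file group;
when it is determined that the upload of the data in the file is complete and file physical information returned by a number of storage units greater than or equal to a third preset value has been received, determining that the data is written successfully, and updating the file's metadata information to a database;
wherein the first preset value, the second preset value and the third preset value are all positive integers greater than 1, and the third preset value is less than or equal to the second preset value.
The above method also has the following features:
each file group stores the address of each of its storage units;
before the received data is uploaded to each of the second preset number of storage units corresponding to the selected file group, the method further includes:
according to the addresses of the storage units stored in the selected file group, simultaneously establishing an HTTP connection with every storage unit in the selected file group.
The above method also has the following features:
the receiving of the data in the file and the uploading, according to the selected file group, of the received data to each of the second preset number of storage units corresponding to the selected file group includes:
receiving the data in the file and, each time the amount of received data reaches a preset threshold, uploading the received data in parallel over the HTTP connections, according to the selected file group, to each of the second preset number of storage units corresponding to the selected file group.
The above method also has the following features:
the method further includes: measuring the transfer speed of each storage unit, and selecting the storage unit with the slowest transfer speed;
determining whether the number of connected storage units is greater than the third preset value; if so, disconnecting the HTTP connection to the selected slowest storage unit and removing that storage unit from the selected file group; after the data in the file has been uploaded, checking whether file physical information returned by a number of storage units greater than or equal to the third preset value has been received; and if so, determining that the data is written successfully.
The above method also has the following features:
the uploading of the received data in parallel over the HTTP connections to each of the second preset number of storage units corresponding to the selected file group includes:
uploading the received data over the HTTP connections, in parallel and as a byte stream, to each of the second preset number of storage units corresponding to the selected file group.
The above method also has the following features:
the determining that the data is written successfully when file physical information returned by a number of storage units greater than or equal to the third preset value has been received includes:
receiving the file physical information, containing a SHA1, returned by each storage unit; and, when the number of received pieces of file physical information having the same SHA1 is greater than or equal to the third preset value, determining that the data is written successfully.
The computer-readable storage medium provided by an embodiment of the present invention stores a computer program which, when executed by a processor, implements the steps of the above method.
The computer device provided by an embodiment of the present invention includes a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method when executing the program.
Another majority-based data storage method provided by an embodiment of the present invention includes: a storage unit receiving the data of a file uploaded by a front-end server; and, after determining that the data in the file has been fully received, sending the file physical information of the file to the front-end server.
The above method also has the following features:
the sending of the file physical information of the file to the front-end server after determining that the data in the file has been fully received includes:
after determining that the data in the file has been fully received, sending file physical information containing a SHA1 to the front-end server.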
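An apparatus provided by an embodiment of the present invention, applied to a front-end server, includes:
a receiving unit configured to receive a file upload request and, according to the file upload request, select one file group from a first preset number of file groups;
an uploading unit configured to receive the data in the file and, according to the selected file group, upload the received data to each of a second preset number of storage units corresponding to the selected file group;
a processing unit configured to, when it is determined that the upload of the data in the file is complete and file physical information returned by a number of storage units greater than or equal to a third preset value has been received, determine that the data is written successfully, then update the file's metadata information to a database, and at that point return a success response to the file upload request;
wherein the first preset value, the second preset value and the third preset value are all positive integers greater than 1, and the third preset value is less than or equal to the second preset value.
The above apparatus also has the following features:
each file group stores the address of each of its storage units; accordingly,
the uploading unit is further configured to, before uploading the received data to each of the second preset number of storage units corresponding to the selected file group, simultaneously establish an HTTP connection with every storage unit in the selected file group according to the addresses of the storage units stored in the selected file group.
The uploading unit is specifically configured to receive the data in the file and, each time the amount of received data reaches a preset threshold, upload the received data in parallel over the HTTP connections, according to the selected file group, to each of the second preset number of storage units corresponding to the selected file group.
The above apparatus also has the following features:
the front-end server further includes a detection unit:
the detection unit is configured to measure the transfer speed of each storage unit and select the storage unit with the slowest transfer speed; determine whether the number of connected storage units is greater than the third preset value; if so, disconnect the HTTP connection to the selected slowest storage unit and remove that storage unit from the selected file group; after the data in the file has been uploaded, check whether file physical information returned by a number of storage units greater than or equal to the third preset value has been received; and if so, determine that the data is written successfully.
The above apparatus also has the following features:
the processing unit is specifically configured to receive the file physical information, containing a SHA1, returned by each storage unit, and, when the number of received pieces of file physical information having the same SHA1 is greater than or equal to the third preset value, determine that the data is written successfully.
An embodiment of the present invention further provides an apparatus, applied to a storage unit, including: a receiving unit configured to receive the data of a file uploaded by a front-end server; and a sending unit configured to send file physical information to the front-end server after determining that the data in the file has been fully received.
The above apparatus also has the following features:
the sending unit is specifically configured to send file physical information containing a SHA1 to the front-end server after determining that the data in the file has been fully received.
The beneficial effects of the present invention are as follows. Embodiments of the present invention provide a majority-based data storage method, apparatus, storage medium and device. In the method, the front-end server may select one file group from a first preset number of file groups according to a received file upload request and, according to the selected file group, upload the data of the received file to each of the second preset number of storage units corresponding to the selected file group. When it determines that the upload of the data in the file is complete and it has received file physical information returned by a number of storage units greater than or equal to the third preset value, the data is written successfully; the file's metadata information is then updated to the database, and at that point a success response to the file upload request is returned. By writing files with the majority-success method, the solution has no single point of failure, and the failure of any minority of replica devices does not cause data loss. In addition, the server uses a streaming transparent proxy mechanism to transfer the file content to each file replica, which reduces unnecessary disk IO; and because all replicas are peers, no master-slave relationship between devices has to be maintained, which improves the data write success rate.
Other aspects will become apparent upon reading and understanding the drawings and the detailed description.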
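Brief Description of the Drawings
The drawings described here are provided for a further understanding of the embodiments of the present invention and form a part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the embodiments of the present invention and do not unduly limit them. In the drawings:
Figure 1 is a flowchart of the majority-based data storage method according to Embodiment 1 of the present invention;
Figure 2 is the system topology diagram for Embodiment 1 of the present invention;
Figure 3 is a flowchart of the majority-based data storage method in Embodiment 2 of the present invention;
Figure 4 is a structural diagram of the apparatus applied to a front-end server in Embodiment 3 of the present invention;
Figure 5 is a structural diagram of the apparatus applied to a storage unit in Embodiment 4 of the present invention;
Figure 6 is a structural diagram of the majority-based data storage system in Embodiment 5 of the present invention.
Detailed Description
The embodiments of the present invention are further described below with reference to the drawings and specific implementations.
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in those embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.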
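Embodiment 1:
Embodiment 1 of the present invention provides a majority-based data storage method. Figure 1 is a flowchart of the method, which may include the following steps:
Step 101: the front-end server receives a file upload request and, according to the request, selects one file group from a first preset number of file groups.
Step 102: receive the data in the file and, according to the selected file group, upload the received data to each of the second preset number of storage units corresponding to the selected file group.
Step 103: when it is determined that the upload of the data in the file is complete and file physical information returned by a number of storage units greater than or equal to the third preset value has been received, the data is written successfully; the file's metadata information is then updated to the database, and at that point a success response to the file upload request is returned.
The first preset value, the second preset value and the third preset value are all positive integers greater than 1, and the third preset value is less than or equal to the second preset value.
In step 101, the number of storage units corresponding to different file groups may be the same or different.
For example, suppose the first preset value is 2, the second preset value is 3 and the third preset value is 2. As shown in Figure 2 (the topology diagram of this embodiment), replicas of the data are stored in 3 storage units (Store). Once 2 storage units have returned file physical information (that is, a majority of replicas have reported a successful write), the front-end server (Front) can determine that the data is written successfully and update the file's metadata information to the database (DB); at that point a success response to the file upload request is returned so that the user receives feedback. In the figure, Group-x is file group x and Group-y is file group y; this is not described further here.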
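The success rule in step 103 can be illustrated with a minimal Python sketch. The function name `write_succeeded` and its parameter names are hypothetical and chosen only for this illustration; the sketch checks the stated constraint (the third preset value is greater than 1 and no larger than the second preset value) and then compares the number of storage units that have returned file physical information against the third preset value.

```python
def write_succeeded(acks_received: int, second_preset: int, third_preset: int) -> bool:
    """Return True once enough storage units have returned file physical information.

    second_preset: number of storage units in the selected file group.
    third_preset:  number of returns required for the write to count as successful.
    (Both names are illustrative; they mirror the second and third preset values.)
    """
    if not (1 < third_preset <= second_preset):
        raise ValueError("expected 1 < third_preset <= second_preset")
    if not (0 <= acks_received <= second_preset):
        raise ValueError("acks_received must be between 0 and second_preset")
    return acks_received >= third_preset


if __name__ == "__main__":
    # The example from the text: 3 storage units, 2 returns required.
    for acks in range(4):
        print(acks, write_succeeded(acks, second_preset=3, third_preset=2))
```

With 3 storage units and a third preset value of 2, the write is reported as successful as soon as any 2 units have answered, so one failed or slow unit does not block the upload.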
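That is, this embodiment of the present invention writes files with the majority-success method, so there is no single point of failure, and the failure of any minority of replica devices does not cause data loss. In addition, the server uses a streaming transparent proxy mechanism to transfer the file content to each file replica, which reduces unnecessary disk IO; and because all replicas are peers, no master-slave relationship between devices has to be maintained, which improves the data write success rate.
Each file group stores the address of each of its storage units. Accordingly, in step 102, before the received data is uploaded to each of the second preset number of storage units corresponding to the selected file group, the method may further include the following step:
According to the addresses of the storage units stored in the selected file group, simultaneously establish an HTTP connection with every storage unit in the selected file group. Preferably, as one practicable scheme, receiving the data in the file in step 102 and uploading the received data, according to the selected file group, to each of the second preset number of storage units corresponding to the selected file group may specifically include: receiving the data in the file and, each time the amount of received data reaches a preset threshold, uploading the received data in parallel over the HTTP connections, according to the selected file group, to each of the second preset number of storage units corresponding to the selected file group.
For example, suppose the preset threshold for the data volume is 1 MB and the second preset value is 3. Each time the front-end server has received 1 MB of data, it can, according to the selected file group, upload that 1 MB in parallel over the HTTP connections to the 3 storage units corresponding to the selected file group. It should be noted that the front-end server may be an nginx service that receives user requests directly; it can read the 1 MB buffer in a loop and also handle database access, which is not described further here.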
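The chunked, parallel forwarding described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `InMemoryStore` is a hypothetical stand-in for a storage unit (a real front end would write each chunk to an open HTTP connection instead), and `stream_to_stores` is an invented helper name. What the sketch shows is that the front end never stages the whole file; it holds at most one 1 MB buffer and fans it out to every storage unit before reading the next one.

```python
import io
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1024 * 1024  # 1 MB threshold, as in the example in the text


class InMemoryStore:
    """Hypothetical stand-in for a storage unit reached over an HTTP connection."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.buffer = bytearray()

    def send(self, chunk: bytes) -> None:
        # A real implementation would write the chunk to the open connection.
        self.buffer.extend(chunk)


def stream_to_stores(source, stores, chunk_size: int = CHUNK_SIZE) -> int:
    """Read `source` chunk by chunk and forward each chunk to all stores in parallel.

    Returns the total number of bytes forwarded to each store.
    """
    total = 0
    with ThreadPoolExecutor(max_workers=len(stores)) as pool:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            # Fan the same chunk out to every store; list() waits for all sends
            # to finish before the next chunk is read.
            list(pool.map(lambda store: store.send(chunk), stores))
            total += len(chunk)
    return total


if __name__ == "__main__":
    stores = [InMemoryStore(f"store-{i}") for i in range(3)]
    payload = b"x" * (2 * CHUNK_SIZE + 123)
    sent = stream_to_stores(io.BytesIO(payload), stores)
    print(sent, [len(s.buffer) for s in stores])
```

Waiting for every send before reading the next buffer keeps the sketch simple; it also means the slowest storage unit sets the pace, which is the situation the slow-unit removal steps later in the description are meant to address.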
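Preferably, as one practicable scheme, uploading the received data in parallel over the HTTP connections to each of the second preset number of storage units corresponding to the selected file group may specifically be performed as: uploading the received data over the HTTP connections, in parallel and as a byte stream, to each of the second preset number of storage units corresponding to the selected file group. The server uses a streaming transparent proxy mechanism to transfer the file content to each file replica, which reduces unnecessary disk IO.
To improve file write efficiency, the method may further include the following steps A1 to A5:
Step A1: measure the transfer speed of each storage unit and select the storage unit with the slowest transfer speed.
Step A2: determine whether the number of connected storage units is greater than the third preset value.
Step A3: if so, disconnect the HTTP connection to the selected slowest storage unit and remove that storage unit from the selected file group.
Step A4: after the data in the file has been uploaded, check whether file physical information returned by a number of storage units greater than or equal to the third preset value has been received.
Step A5: if so, determine that the data is written successfully.
This keeps the communication links used for data transfer efficient, and thereby improves data write efficiency.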
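Steps A1 to A3 can be sketched as a small Python helper. The name `drop_slowest` and the representation of measured speeds as a dictionary are assumptions made for this illustration; the text does not say how speed is measured or what counts as too slow, so the sketch follows the steps literally and removes the slowest unit whenever more units are connected than the third preset value requires.

```python
from typing import Dict, Optional


def drop_slowest(speeds: Dict[str, float], third_preset: int) -> Optional[str]:
    """Remove and return the slowest connected storage unit, if one can be spared.

    speeds maps a storage unit name to its measured transfer speed (for example
    bytes per second) and contains only the units that are still connected.
    The mapping is modified in place; None is returned when no unit is removed.
    """
    # Step A2: removal is only allowed while more than third_preset units remain.
    if len(speeds) <= third_preset:
        return None
    # Step A1: pick the unit with the lowest measured speed.
    slowest = min(speeds, key=speeds.get)
    # Step A3: drop it from the group (a real front end would also close
    # the HTTP connection to that unit here).
    del speeds[slowest]
    return slowest


if __name__ == "__main__":
    measured = {"store-a": 10.0, "store-b": 2.0, "store-c": 8.0}
    print(drop_slowest(measured, third_preset=2), sorted(measured))
    print(drop_slowest(measured, third_preset=2), sorted(measured))
```

Because the helper refuses to go below the third preset value, a successful write remains possible after a removal, provided the remaining units all complete.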
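For step 103, determining that the data is written successfully when file physical information returned by a number of storage units greater than or equal to the third preset value has been received may specifically include the following steps B1 and B2:
Step B1: receive the file physical information, containing a SHA1, returned by each storage unit.
Step B2: when the number of received pieces of file physical information having the same SHA1 is greater than or equal to the third preset value, determine that the data is written successfully. The file physical information may also include information such as an MD5 and the data size (SIZE).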
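Steps B1 and B2 amount to counting how many storage units report the same SHA1. The following Python sketch shows one way to do that; the function name `agreed_sha1` and the dictionary layout of a report (`"sha1"`, `"size"`) are assumptions for the illustration, not a format given in the text.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def agreed_sha1(reports: Iterable[Mapping[str, object]], third_preset: int) -> Optional[str]:
    """Return the SHA1 reported by at least `third_preset` storage units, else None.

    Each report is the file physical information returned by one storage unit
    and is assumed to carry its digest under the key "sha1".
    """
    counts = Counter(str(report["sha1"]) for report in reports)
    if not counts:
        return None
    sha1, votes = counts.most_common(1)[0]
    return sha1 if votes >= third_preset else None


if __name__ == "__main__":
    reports = [
        {"sha1": "aaa", "size": 11},
        {"sha1": "aaa", "size": 11},
        {"sha1": "bbb", "size": 10},  # a replica that received different bytes
    ]
    print(agreed_sha1(reports, third_preset=2))
```

Comparing digests rather than merely counting responses means a unit that stored truncated or corrupted data does not count toward the required number.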
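Embodiment 1 of the present invention provides a majority-based data storage method. In the method, the front-end server may select one file group from a first preset number of file groups according to a received file upload request and, according to the selected file group, upload the data of the received file to each of the second preset number of storage units corresponding to the selected file group. When it determines that the upload of the data in the file is complete and it has received file physical information returned by a number of storage units greater than or equal to the third preset value, the data is written successfully; the file's metadata information is then updated to the database, and at that point a success response to the file upload request is returned. By writing files with the majority-success method, the technical solution has no single point of failure, and the failure of any minority of replica devices does not cause data loss. In addition, the server uses a streaming transparent proxy mechanism to transfer the file content to each file replica, which reduces unnecessary disk IO; and because all replicas are peers, no master-slave relationship between devices has to be maintained, which improves the data write success rate.
Embodiment 2:
Based on the same inventive concept as Embodiment 1, Embodiment 2 of the present invention provides a majority-based data storage method with the storage unit as the executing entity. Its flowchart is shown in Figure 3, and the method may include the following steps:
Step 301: the storage unit receives the data of a file uploaded by the front-end server.
Step 302: after determining that the data in the file has been fully received, send the file physical information of the file to the front-end server.
For step 302, sending the file physical information of the file to the front-end server after determining that the data in the file has been fully received may specifically be performed as: after determining that the data in the file has been fully received, send file physical information containing a SHA1 to the front-end server. The file physical information may also include information such as an MD5 and the data size (SIZE).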
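On the storage unit side, steps 301 and 302 can be sketched in Python with the standard `hashlib` module. The function name `receive_file` and the dictionary returned as file physical information are hypothetical; persisting the bytes to disk is omitted so that the example stays self-contained. The sketch computes the SHA1, MD5 and size incrementally while the chunks arrive, so nothing has to be re-read once the upload ends.

```python
import hashlib
from typing import Dict, Iterable, Union


def receive_file(chunks: Iterable[bytes]) -> Dict[str, Union[str, int]]:
    """Consume the uploaded chunks and build the file physical information.

    A real storage unit would also write each chunk to disk; that part is left
    out here. The keys of the returned dictionary are illustrative.
    """
    sha1 = hashlib.sha1()
    md5 = hashlib.md5()
    size = 0
    for chunk in chunks:
        sha1.update(chunk)
        md5.update(chunk)
        size += len(chunk)
    return {"sha1": sha1.hexdigest(), "md5": md5.hexdigest(), "size": size}


if __name__ == "__main__":
    info = receive_file([b"hello ", b"world"])
    print(info)
```

Two storage units that receive the same byte stream return the same SHA1 regardless of how the stream was split into chunks, which is what lets the front-end server compare the returned digests.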
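By writing files with the majority-success method, the technical solution has no single point of failure, and the failure of any minority of replica devices does not cause data loss. In addition, the server uses a streaming transparent proxy mechanism to transfer the file content to each file replica, which reduces unnecessary disk IO; and because all replicas are peers, no master-slave relationship between devices has to be maintained, which improves the data write success rate.
The following takes the interaction between the front-end server and the storage units as an example to describe the solution in detail. For specifics, see the relevant descriptions in Embodiments 1 and 2 above; repeated details are omitted, and the overall flow is briefly described below.
1. The front-end server receives a file upload (write) request and selects one file group. The file group holds the storage addresses of 3 storage units, and the front-end server simultaneously establishes HTTP connections to the 3 storage units.
2. The front-end server reads the 1 MB buffer in a loop and creates one thread per storage unit to send the buffered data to the 3 storage units concurrently. Meanwhile, it measures the transfer speed of each storage unit; each time a buffer has been sent, it checks whether a majority of storage units can still succeed and, provided that majority success can still be guaranteed, actively removes a slow storage unit, that is, disconnects the HTTP connection used for that data transfer.
3. After all of the file content has been sent, the front-end server receives the file physical information, containing a SHA1, returned by the 3 storage units. Since the SHA1 of the same physical file is necessarily identical, the data is written successfully when a majority of 2 (3/2 + 1 = 2, using integer division) or more storage units return the same SHA1.
4. After updating the metadata information, the front-end server returns a success response to the file upload request. It should be noted that, once the success response has been returned, the data has been correctly written to a majority of replicas, and the file can be read from any correctly written replica.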
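The majority size used in item 3 of the flow above (3/2 + 1 = 2) generalizes to any replica count when the division is integer division. The Python sketch below makes that arithmetic explicit; the function names `majority` and `tolerated_failures` are hypothetical.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority (integer division)."""
    if replicas < 1:
        raise ValueError("replicas must be a positive integer")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Number of replicas that may fail while a majority can still be reached."""
    return replicas - majority(replicas)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, majority(n), tolerated_failures(n))
```

For the 3 storage units in the walkthrough this gives a majority of 2 and one tolerated failure, which matches the statement that the failure of any minority of replica devices does not cause data loss.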
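The majority-based data storage method provided by Embodiment 2 of the present invention writes files with the majority-success method, so there is no single point of failure, and the failure of any minority of replica devices does not cause data loss. In addition, the server uses a streaming transparent proxy mechanism to transfer the file content to each file replica, which reduces unnecessary disk IO; and because all replicas are peers, no master-slave relationship between devices has to be maintained, which improves the data write success rate.
Embodiment 3:
Based on the same inventive concept as Embodiment 1, Embodiment 3 of the present invention provides an apparatus applied to a front-end server. For its specific implementation, see the relevant description in method Embodiment 1 above; repeated details are omitted. Its structure is shown in Figure 4, and the apparatus includes:
a receiving unit 41 configured to receive a file upload request and, according to the file upload request, select one file group from a first preset number of file groups, where each file group corresponds to a second preset number of storage units;
an uploading unit 42 configured to receive the data in the file and, according to the selected file group, upload the received data to each of the second preset number of storage units corresponding to the selected file group;
a processing unit 43 configured so that, when it is determined that the upload of the data in the file is complete and file physical information returned by a number of storage units greater than or equal to the third preset value has been received, the data is written successfully, the file's metadata information is then updated to the database, and at that point a success response to the file upload request is returned;
wherein the first preset value, the second preset value and the third preset value are all positive integers greater than 1, and the third preset value is less than or equal to the second preset value.
Each file group stores the address of each of its storage units. Accordingly, the uploading unit 42 may further be configured to, before uploading the received data to each of the second preset number of storage units corresponding to the selected file group, simultaneously establish an HTTP connection with every storage unit in the selected file group according to the addresses of the storage units stored in the selected file group.
As one practicable scheme, the uploading unit 42 may specifically be configured to receive the data in the file and, each time the amount of received data reaches a preset threshold, upload the received data in parallel over the HTTP connections, according to the selected file group, to each of the second preset number of storage units corresponding to the selected file group.
As one practicable scheme, the front-end server further includes a detection unit 44. The detection unit 44 is configured to measure the transfer speed of each storage unit and select the storage unit with the slowest transfer speed; determine whether the number of connected storage units is greater than the third preset value; if so, disconnect the HTTP connection to the selected slowest storage unit and remove that storage unit from the selected file group; after the data in the file has been uploaded, check whether file physical information returned by a number of storage units greater than or equal to the third preset value has been received; and if so, determine that the data is written successfully.
Preferably, the uploading unit 42 may specifically be configured to upload the received data over the HTTP connections, in parallel and as a byte stream, to each of the second preset number of storage units corresponding to the selected file group.
Further, the processing unit 43 may specifically be configured to receive the file physical information, containing a SHA1, returned by each storage unit, and, when the number of received pieces of file physical information having the same SHA1 is greater than or equal to the third preset value, determine that the data is written successfully.
Embodiment 3 of the present invention provides a front-end server. The front-end server may select one file group from a first preset number of file groups according to a received file upload request and, according to the selected file group, upload the data of the received file to each of the second preset number of storage units corresponding to the selected file group. When it determines that the upload of the data in the file is complete and it has received file physical information returned by a number of storage units greater than or equal to the third preset value, the data is written successfully; the file's metadata information is then updated to the database, and at that point a success response to the file upload request is returned. By writing files with the majority-success method, the solution has no single point of failure, and the failure of any minority of replica devices does not cause data loss. In addition, the server uses a streaming transparent proxy mechanism to transfer the file content to each file replica, which reduces unnecessary disk IO; and because all replicas are peers, no master-slave relationship between devices has to be maintained, which improves the data write success rate.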
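Embodiment 4:
Based on the same inventive concept as Embodiment 2, Embodiment 4 of the present invention provides an apparatus applied to a storage unit. For its specific implementation, see the relevant description in method Embodiment 2 above; repeated details are omitted. Its structure is shown in Figure 5, and the apparatus includes:
a receiving unit 51 configured to receive the data of a file uploaded by the front-end server;
a sending unit 52 configured to send file physical information to the front-end server after determining that the data in the file has been fully received.
Further, the sending unit 52 may specifically be configured to send file physical information containing a SHA1 to the front-end server after determining that the data in the file has been fully received.
Embodiment 4 of the present invention provides a storage unit. Files are written with the majority-success method, so there is no single point of failure, and the failure of any minority of replica devices does not cause data loss. In addition, the server uses a streaming transparent proxy mechanism to transfer the file content to each file replica, which reduces unnecessary disk IO; and because all replicas are peers, no master-slave relationship between devices has to be maintained, which improves the data write success rate.
Embodiment 5:
Based on the same inventive concept as Embodiments 1 and 2, Embodiment 5 of the present invention provides a majority-based data storage system. For the specific implementation of the system, see the relevant descriptions in method Embodiments 1 and 2 above; repeated details are omitted. Its structure is shown in Figure 6, and the system may mainly include:
a front-end server 61 configured to: receive a file upload request and, according to the file upload request, select one file group from a first preset number of file groups, where each file group corresponds to a second preset number of storage units; receive the data in the file and, according to the selected file group, upload the received data to each of the second preset number of storage units corresponding to the selected file group; and, when it is determined that the upload of the data in the file is complete and file physical information returned by a number of storage units greater than or equal to the third preset value has been received, treat the data as written successfully, then update the file's metadata information to the database, and at that point return a success response to the file upload request; wherein the first preset value, the second preset value and the third preset value are all positive integers greater than 1, and the third preset value is less than or equal to the second preset value;
a storage unit 62 configured to receive the data of a file uploaded by the front-end server and, after determining that the data in the file has been fully received, send file physical information to the front-end server.
Embodiment 5 of the present invention provides a majority-based data storage system. The front-end server may select one file group from a first preset number of file groups according to a received file upload request and, according to the selected file group, upload the data of the received file to each of the second preset number of storage units corresponding to the selected file group. When it determines that the upload of the data in the file is complete and it has received file physical information returned by a number of storage units greater than or equal to the third preset value, the data is written successfully; the file's metadata information is then updated to the database, and at that point a success response to the file upload request is returned. By writing files with the majority-success method, the solution has no single point of failure, and the failure of any minority of replica devices does not cause data loss. In addition, the server uses a streaming transparent proxy mechanism to transfer the file content to each file replica, which reduces unnecessary disk IO; and because all replicas are peers, no master-slave relationship between devices has to be maintained, which improves the data write success rate.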
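Those of ordinary skill in the art should understand that modifications or equivalent substitutions may be made to the technical solution of the present invention without departing from its spirit and scope, and all such modifications and substitutions shall fall within the scope of the claims.
Those of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and apparatuses, may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
Industrial Applicability
By writing files with the majority-success method, the solution has no single point of failure, and the failure of any minority of replica devices does not cause data loss. In addition, the server uses a streaming proxy to transfer the file content to each replica, which can reduce disk IO usage.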

Claims (16)

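  1. A majority-based data storage method, comprising:
    a front-end server receiving a file upload request, and selecting one file group from a first preset number of file groups according to the file upload request;
    receiving the data in the file and, according to the selected file group, uploading the received data to each of a second preset number of storage units corresponding to the selected file group;
    when it is determined that the upload of the data in the file is complete and file physical information returned by a number of storage units greater than or equal to a third preset value has been received, determining that the data is written successfully, and updating the file's metadata information to a database;
    wherein the first preset value, the second preset value and the third preset value are all positive integers greater than 1, and the third preset value is less than or equal to the second preset value.
  2. The method of claim 1, wherein each file group stores the address of each of its storage units;
    before the received data is uploaded to each of the second preset number of storage units corresponding to the selected file group, the method further comprises:
    according to the addresses of the storage units stored in the selected file group, simultaneously establishing an HTTP connection with every storage unit in the selected file group.
  3. The method of claim 2, wherein the receiving of the data in the file and the uploading, according to the selected file group, of the received data to each of the second preset number of storage units corresponding to the selected file group comprises:
    receiving the data in the file and, each time the amount of received data reaches a preset threshold, uploading the received data in parallel over the HTTP connections, according to the selected file group, to each of the second preset number of storage units corresponding to the selected file group.
  4. The method of claim 3, wherein the method further comprises: measuring the transfer speed of each storage unit, and selecting the storage unit with the slowest transfer speed;
    determining whether the number of connected storage units is greater than the third preset value; if so, disconnecting the HTTP connection to the selected slowest storage unit and removing that storage unit from the selected file group; after the data in the file has been uploaded, checking whether file physical information returned by a number of storage units greater than or equal to the third preset value has been received; and if so, determining that the data is written successfully.
  5. The method of claim 3, wherein the uploading of the received data in parallel over the HTTP connections to each of the second preset number of storage units corresponding to the selected file group comprises:
    uploading the received data over the HTTP connections, in parallel and as a byte stream, to each of the second preset number of storage units corresponding to the selected file group.
  6. The method of claim 1, wherein the determining that the data is written successfully when file physical information returned by a number of storage units greater than or equal to the third preset value has been received comprises:
    receiving the file physical information, containing a SHA1, returned by each storage unit; and, when the number of received pieces of file physical information having the same SHA1 is greater than or equal to the third preset value, determining that the data is written successfully.
  7. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of any one of claims 1 to 6.
  8. A computer device, characterized in that it comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method of any one of claims 1 to 6 when executing the program.
  9. A majority-based data storage method, comprising: a storage unit receiving the data of a file uploaded by a front-end server; and, after determining that the data in the file has been fully received, sending the file physical information of the file to the front-end server.
  10. The method of claim 9, wherein the sending of the file physical information of the file to the front-end server after determining that the data in the file has been fully received comprises:
    after determining that the data in the file has been fully received, sending file physical information containing a SHA1 to the front-end server.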
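  11. An apparatus, applied to a front-end server, comprising:
    a receiving unit configured to receive a file upload request and, according to the file upload request, select one file group from a first preset number of file groups;
    an uploading unit configured to receive the data in the file and, according to the selected file group, upload the received data to each of a second preset number of storage units corresponding to the selected file group;
    a processing unit configured to, when it is determined that the upload of the data in the file is complete and file physical information returned by a number of storage units greater than or equal to a third preset value has been received, determine that the data is written successfully, then update the file's metadata information to a database, and at that point return a success response to the file upload request;
    wherein the first preset value, the second preset value and the third preset value are all positive integers greater than 1, and the third preset value is less than or equal to the second preset value.
  12. The apparatus of claim 11, wherein each file group stores the address of each of its storage units;
    the uploading unit is further configured to, before uploading the received data to each of the second preset number of storage units corresponding to the selected file group, simultaneously establish an HTTP connection with every storage unit in the selected file group according to the addresses of the storage units stored in the selected file group; and is further configured to, each time the amount of received data reaches a preset threshold, upload the received data in parallel over the HTTP connections, according to the selected file group, to each of the second preset number of storage units corresponding to the selected file group.
  13. The apparatus of claim 11, wherein the front-end server further comprises a detection unit:
    the detection unit is configured to measure the transfer speed of each storage unit and select the storage unit with the slowest transfer speed; determine whether the number of connected storage units is greater than the third preset value; if so, disconnect the HTTP connection to the selected slowest storage unit and remove that storage unit from the selected file group; after the data in the file has been uploaded, check whether file physical information returned by a number of storage units greater than or equal to the third preset value has been received; and if so, determine that the data is written successfully.
  14. The apparatus of claim 11, wherein the processing unit is specifically configured to receive the file physical information, containing a SHA1, returned by each storage unit, and, when the number of received pieces of file physical information having the same SHA1 is greater than or equal to the third preset value, determine that the data is written successfully.
  15. An apparatus, applied to a storage unit, comprising: a receiving unit configured to receive the data of a file uploaded by a front-end server; and a sending unit configured to send file physical information to the front-end server after determining that the data in the file has been fully received.
  16. The apparatus of claim 15, wherein the sending unit is specifically configured to send file physical information containing a SHA1 to the front-end server after determining that the data in the file has been fully received.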
PCT/CN2017/116513 2016-12-16 2017-12-15 A majority-based data storage method, apparatus, storage medium and device WO2018108158A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611168618.4 2016-12-16
CN201611168618.4A CN108206839B (zh) 2016-12-16 2016-12-16 A majority-based data storage method, apparatus and system

Publications (1)

Publication Number Publication Date
WO2018108158A1 true WO2018108158A1 (zh) 2018-06-21

Family

ID=62557999

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/116513 WO2018108158A1 (zh) 2016-12-16 2017-12-15 A majority-based data storage method, apparatus, storage medium and device

Country Status (2)

Country Link
CN (1) CN108206839B (zh)
WO (1) WO2018108158A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
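CN111538716A (zh) * 2020-03-30 2020-08-14 中国平安人寿保险股份有限公司 Method and related apparatus for maintaining a database across systems
CN112511633A (zh) * 2020-12-03 2021-03-16 苏州浪潮智能科技有限公司 Method, system, device and medium for block transmission of massive small files
CN114936010A (zh) * 2022-07-20 2022-08-23 阿里巴巴(中国)有限公司 Data processing method, apparatus, device and medium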

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992209B (zh) * 2019-03-29 2023-02-03 新华三技术有限公司成都分公司 Data processing method, apparatus and distributed storage system
CN112383628B (zh) * 2020-11-16 2021-06-18 北京中电兴发科技有限公司 Storage gateway resource allocation method based on streaming storage

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
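CN101741911A (zh) * 2009-12-18 2010-06-16 中兴通讯股份有限公司 Write operation method, system and node based on multi-replica coordination
US20110202737A1 (en) * 2010-02-18 2011-08-18 Fujitsu Limited Storage apparatus and storage system
CN103186554A (zh) * 2011-12-28 2013-07-03 阿里巴巴集团控股有限公司 Distributed data mirroring method and storage data node
CN103207867A (zh) * 2012-01-16 2013-07-17 联想(北京)有限公司 Method for processing data blocks, method for initiating a recovery operation, and node
CN105577776A (zh) * 2015-12-17 2016-05-11 上海爱数信息技术股份有限公司 Distributed storage system and method based on a data arbiter replica
CN105760556A (zh) * 2016-04-19 2016-07-13 江苏物联网研究发展中心 Low-latency, high-throughput multi-replica file read/write optimization method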

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
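CN111538716A (zh) * 2020-03-30 2020-08-14 中国平安人寿保险股份有限公司 Method and related apparatus for maintaining a database across systems
CN112511633A (zh) * 2020-12-03 2021-03-16 苏州浪潮智能科技有限公司 Method, system, device and medium for block transmission of massive small files
CN112511633B (zh) * 2020-12-03 2022-11-29 苏州浪潮智能科技有限公司 Method, system, device and medium for block transmission of massive small files
CN114936010A (zh) * 2022-07-20 2022-08-23 阿里巴巴(中国)有限公司 Data processing method, apparatus, device and medium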

Also Published As

Publication number Publication date
CN108206839B (zh) 2020-02-07
CN108206839A (zh) 2018-06-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17881427

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17881427

Country of ref document: EP

Kind code of ref document: A1