WO2020140394A1 - File storage method and terminal device - Google Patents

File storage method and terminal device

Info

Publication number
WO2020140394A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
storage node
storage
node
fragmented
Prior art date
Application number
PCT/CN2019/091527
Other languages
English (en)
French (fr)
Inventor
雷琼
郑映锋
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020140394A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/176 Support for shared access to files; File sharing support

Definitions

  • This application belongs to the field of computer application technology, and in particular, relates to a file storage method, terminal device, and computer-readable storage medium.
  • With the rapid rise of cloud computing and big data technologies, cloud storage has emerged as a new storage model. It plays an important role in large-scale data storage fields such as scientific computing and commercial computing, and has attracted wide attention from industry and academia.
  • The cloud file system is an important part of the cloud storage system. It provides underlying storage support for the cloud storage system and is responsible for storing data effectively and reliably to ensure data availability, which in turn ensures the reliability and stability of the storage system.
  • In view of this, the embodiments of the present application provide a file storage method, a terminal device, and a computer-readable storage medium, to solve the problem that the existing technology cannot meet the data storage and sharing requirements of large files and that the cost of storing and processing the data is high.
  • A first aspect of the embodiments of the present application provides a file storage method, including:
  • A second aspect of the embodiments of the present application provides a terminal device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; the processor implements the following steps when executing the computer-readable instructions:
  • A third aspect of the embodiments of the present application provides a terminal device, including:
  • an obtaining unit, configured to obtain the file to be stored;
  • a slicing unit, configured to perform erasure-code-based redundant slicing on the file to be stored to obtain at least one piece of fragmented data;
  • a storage unit, configured to obtain the operation data of each storage node in the network, determine the target storage node for storing the fragmented data according to the operation data, and send the fragmented data to the target storage node.
  • A fourth aspect of the embodiments of the present application provides a computer-readable storage medium that stores computer-readable instructions; the computer-readable instructions include program instructions that, when executed by a processor, cause the processor to perform the method of the first aspect described above.
  • The embodiments of the present application obtain the file to be stored, perform erasure-code-based redundant fragmentation on it to obtain at least one piece of fragmented data, obtain the operation data of each storage node in the network, determine a target storage node for storing the fragmented data according to the operation data, and send the fragmented data to the target storage node. Performing erasure-code-based redundant fragmentation on the file to be stored and sending the resulting fragmented data to the corresponding storage nodes reduces the space occupied by large-file storage, thereby saving expense and storage cost.
  • FIG. 1 is a flowchart of a file storage method provided in Embodiment 1 of the present application.
  • FIG. 2 is a flowchart of a file storage method provided in Embodiment 2 of the present application.
  • FIG. 3 is a schematic diagram of a terminal device provided in Embodiment 3 of this application.
  • FIG. 4 is a schematic diagram of a terminal device provided in Embodiment 4 of the present application.
  • Referring to FIG. 1, FIG. 1 is a flowchart of the file storage method provided in Embodiment 1 of the present application.
  • In this embodiment, the execution subject of the file storage method is a terminal.
  • Terminals include, but are not limited to, mobile terminals such as smartphones, tablet computers, and wearable devices, and may also be desktop computers and the like.
  • The file storage method shown in the figure may include the following steps:
  • S101 Obtain the file to be stored.
  • Cloud storage is an Internet technology that emerged after cloud computing. It uses virtualization technology to connect a large number of real physical storage devices, which appear to the user as if hidden inside a cloud; these devices work together to provide storage services to users.
  • Because a cloud computing system must process massive amounts of data, it has ample storage capacity and has in effect become a cloud storage system. Through the cloud storage system, users can access data in the cloud at any time and from any place, which is convenient and fast.
  • Multi-cloud storage uses multiple cloud services at the same time, and user data is redundantly stored on multiple clouds according to a certain distribution strategy. Because different cloud storage service providers have their own characteristics and advantages, multi-cloud storage can use different cloud service infrastructures to meet diverse user needs. Multi-cloud storage not only avoids vendor lock-in, but also reduces service interruption or data loss caused by cloud storage component failures (which may come from hardware, software, or equipment). Therefore, compared with traditional single-cloud storage, multi-cloud storage can improve data availability and fault tolerance.
  • The file to be stored can be sent to the storage node by the data owner, or can be obtained from the data owner.
  • The file to be stored in this embodiment is characterized by its large size. If the whole file were stored on a single storage node, it would occupy a large share of that node's storage space. Therefore, this solution fragments the file to be stored to obtain at least one piece of fragmented data, and stores the pieces separately to reduce the storage pressure on each storage node.
  • S102 Perform redundant fragmentation based on erasure coding on the file to be stored to obtain at least one piece of fragmented data.
  • Erasure-code redundancy has the advantages of strong fault tolerance and high space utilization.
  • m check blocks can be added to n original file blocks; to restore the original file, any n of the m+n blocks are sufficient.
  • Erasure coding comprises two corresponding processes, encoding and decoding. The encoding process expands the original n blocks to n+m blocks, and all blocks are placed on different storage nodes. If the number of blocks lost on the nodes is less than m, the original data can still be recovered from the remaining blocks.
  • A quadruple (n, k, m, r) is used to represent the erasure code, where k is the number of data blocks into which the original data object is divided before encoding, m is the number of coded blocks generated by encoding, n is the total number of data blocks after encoding, and r is an integer greater than or equal to k.
  • Let the n data blocks be d_1, d_2, ..., d_n; m parity blocks c_1, c_2, ..., c_m are generated from these n data blocks.
  • Any n blocks taken from the n original data blocks and the m check blocks are enough to decode the original data; that is, the simultaneous loss of at most m data blocks or check blocks is tolerated.
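  • The "any n of the n+m blocks" property described above can be illustrated with a minimal sketch. It assumes a Reed-Solomon-style construction by polynomial interpolation over the prime field GF(257); the source does not state which erasure code is used, and the names `encode`, `decode`, and `_interpolate` are hypothetical. The three-argument `pow` with a negative exponent requires Python 3.8 or later.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial (mod P) through the (xi, yi) points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data: bytes, n: int, m: int) -> dict:
    """Split data into n data blocks (indexes 1..n) and add m check blocks (n+1..n+m)."""
    if n < 1 or m < 0 or n + m >= P:
        raise ValueError("need n >= 1, m >= 0 and n + m < 257")
    size = -(-len(data) // n)                 # ceiling division
    padded = data.ljust(size * n, b"\0")      # zero-pad so all blocks have equal size
    blocks = {i + 1: list(padded[i * size:(i + 1) * size]) for i in range(n)}
    for c in range(n + 1, n + m + 1):
        # Check symbols may equal 256, so blocks are lists of ints rather than bytes.
        blocks[c] = [
            _interpolate([(i, blocks[i][pos]) for i in range(1, n + 1)], c)
            for pos in range(size)
        ]
    return blocks


def decode(available: dict, n: int, length: int) -> bytes:
    """Rebuild the original bytes from any n surviving blocks."""
    if len(available) < n:
        raise ValueError("fewer than n blocks survive; the data is unrecoverable")
    chosen = sorted(available.items())[:n]
    size = len(chosen[0][1])
    out = bytearray()
    for i in range(1, n + 1):                 # data blocks live at x = 1..n
        for pos in range(size):
            out.append(_interpolate([(x, blk[pos]) for x, blk in chosen], i))
    return bytes(out[:length])
```

  • With n = 4 and m = 2, for example, any two of the six blocks can be lost and `decode` still returns the original bytes. A production system would more likely use an optimized GF(2^8) Reed-Solomon library; this sketch only demonstrates the recovery property.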
  • S103 Obtain the operation data of each storage node in the network, determine a target storage node for storing the sharded data according to the operation data, and send the sharded data to the target storage node.
  • Specifically, the operation data of each storage node in the network may be obtained, the target storage node for storing the fragmented data determined according to that operation data, and the fragmented data then stored on the target storage nodes one by one.
  • The state of each storage node in the network may differ: some storage nodes may already store more data or be processing more data, and sending a large amount of fragmented data to such a node may reduce its data processing efficiency and also the efficiency of later reading the fragmented data. Therefore, the operation data of each storage node is obtained and used to measure how much data each storage node can currently store.
  • Each piece of fragmented data can be stored by a corresponding storage node; one storage node may store one piece of fragmented data, or at least two pieces.
  • Each storage node periodically reports to the management node the size of the fragmented data that it can receive at present or within a period of time. The management node then distributes the fragmented data according to the size each node can store and the size of each piece of fragmented data, so that the storage capacity of each node is fully used without placing an excessive data load on it.
  • In the above solution, the file to be stored is obtained and subjected to erasure-code-based redundant fragmentation to obtain at least one piece of fragmented data; the operation data of each storage node in the network is obtained; a target storage node for storing the fragmented data is determined according to the operation data; and the fragmented data is sent to the target storage node. Performing erasure-code-based redundant fragmentation on the file to be stored and sending the resulting fragmented data to the corresponding storage nodes reduces the space occupied by large-file storage, thereby saving expense and storage cost.
  • FIG. 2 is a flowchart of a file storage method provided in Embodiment 2 of the present application.
  • the execution subject of the file storage method is a terminal.
  • Terminals include but are not limited to mobile terminals such as smart phones, tablet computers, and wearable devices, and may also be desktop computers.
  • the file storage method shown in the figure may include the following steps:
  • S201 is implemented in exactly the same way as S101 in the embodiment corresponding to FIG. 1; for details, refer to the description of S101 in that embodiment.
  • S202 Perform redundant fragmentation based on erasure coding on the file to be stored to obtain at least one fragmented data.
  • With erasure coding, the data is divided into multiple data blocks, expanded and encoded, and then stored in different locations.
  • The parameters of the erasure coding technique can be expressed in the form (k, n, b, r).
  • k is the number of data blocks into which a file is split;
  • b is the size of each data block after splitting;
  • n is the total number of data blocks after the file has been processed with erasure coding;
  • r is an integer greater than or equal to k.
  • The file is first divided into multiple data blocks. Encoding these data blocks according to specific rules produces one or more check blocks, and the check blocks and data blocks are then all distributed to different nodes in the cloud.
  • The system can start the decoding process with any k available blocks out of the n blocks.
  • The operation data of a storage node includes, but is not limited to, the CPU utilization of the storage node at each moment, its current memory usage, its current disk read/write speed, and its current network bandwidth; no limitation is imposed here.
  • S204 Calculate the data storage coefficient of each storage node according to the operation data.
  • The data storage coefficient of each storage node is calculated by a formula over i ∈ {1, 2, 3, 4}, where A denotes the node identifier of the storage node, each index i corresponds to one of the CPU, memory, disk, and network operation data and carries its own weight, Curr_i(A) denotes the current CPU utilization, current memory usage, current disk read/write speed, and current network bandwidth of the storage node, and Orig_i(A) denotes the maximum CPU processing rate, memory capacity, maximum disk read/write speed, and maximum network bandwidth of the storage node.
  • Curr_i(A) represents the current operation data of the storage node, that is, the operation data collected from the node, which may include but is not limited to the current CPU utilization, current memory usage, current disk read/write speed, and current network bandwidth;
  • Orig_i(A) represents the maximum operation data of the storage node, that is, its hardware parameters or maximum operating parameters, including but not limited to the maximum CPU processing rate, memory capacity, maximum disk read/write speed, and maximum network bandwidth;
  • The weights of the CPU, memory, disk, and network operation data can be set by the administrator or according to how each storage node is used; no limitation is imposed here.
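  • A minimal sketch of one way such a coefficient could be computed from Curr_i(A), Orig_i(A), and the four weights. The formula itself is not reproduced in this text, so the weighted sum of Curr_i(A)/Orig_i(A) ratios used here is an assumption, as are the names `NodeStats` and `storage_coefficient` and the requirement that the weights sum to 1.

```python
from dataclasses import dataclass

# The four resources indexed by i = 1..4 in the text.
RESOURCES = ("cpu", "memory", "disk", "network")


@dataclass(frozen=True)
class NodeStats:
    node_id: str    # A, the node identifier
    current: dict   # Curr_i(A): current usage per resource
    maximum: dict   # Orig_i(A): hardware maximum per resource


def storage_coefficient(stats: NodeStats, weights: dict) -> float:
    """ASSUMED form: sum over i of weight_i * Curr_i(A) / Orig_i(A).

    Under this assumption a lower value means a less loaded node.
    """
    if abs(sum(weights[r] for r in RESOURCES) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    return sum(
        weights[r] * stats.current[r] / stats.maximum[r] for r in RESOURCES
    )
```

  • For a node running at half of its maximum on every resource, equal weights of 0.25 give a coefficient of 0.5 under this assumed form.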
  • S205 Determine the target storage node corresponding to each piece of fragmented data according to the data storage coefficient of each storage node, together with the node identifier of the target storage node corresponding to the data identifier of each piece of fragmented data.
  • The target storage node corresponding to each piece of fragmented data is determined according to the data storage coefficient of each storage node.
  • The storage nodes are sorted in ascending order of data storage coefficient, and the pieces of fragmented data are assigned to them according to the order of their data amounts.
  • Both the fragmented data and the storage nodes carry identifiers, namely data identifiers and node identifiers. In this solution, the storage node corresponding to each piece of fragmented data is determined through the node identifier of the storage node and the data identifier of the fragmented data, that is, the node identifier of the target storage node is associated with the data identifier of each piece of fragmented data.
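  • The matching step above can be sketched as follows. The text only says that nodes are sorted in ascending order of coefficient and shards are ordered by data amount; pairing the largest shard with the lowest-coefficient node, and wrapping around when there are more shards than nodes, are assumptions, and `match_shards_to_nodes` is a hypothetical name.

```python
def match_shards_to_nodes(shard_sizes: dict, coefficients: dict) -> dict:
    """Map each data identifier to the node identifier of its target node.

    shard_sizes:  data identifier -> shard size in bytes
    coefficients: node identifier -> data storage coefficient
    """
    if not coefficients:
        raise ValueError("at least one storage node is required")
    # Nodes in ascending order of data storage coefficient.
    nodes = sorted(coefficients, key=coefficients.get)
    # Shards from largest to smallest (assumed ordering direction).
    shards = sorted(shard_sizes, key=shard_sizes.get, reverse=True)
    # Pair shard k with node k, wrapping around if shards outnumber nodes.
    return {shard: nodes[k % len(nodes)] for k, shard in enumerate(shards)}
```

  • The returned mapping is the association between data identifiers and node identifiers that the later sending step relies on.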
  • steps S2051 to S2052 may also be included:
  • S2051 Acquire a storage address sent by the target storage node; the storage address is used to indicate a location where the fragmented data is stored in the target storage node.
  • After the fragmented data is stored locally on the storage node, the data owner cannot by itself determine where the fragmented data is stored.
  • Therefore, the storage node sends the storage address of the fragmented data to the terminal of the data owner, so that the data owner can view the fragmented data at any time, or check how the data is stored according to the storage address, to prevent the storage node from deleting or tampering with the fragmented data.
  • S2052 Generate a data index according to the data identifier of the fragmented data, the node identifier of the storage node, and the storage address of the fragmented data in the target storage node; the data index is used to extract the fragmented data from the target storage node.
  • After receiving the storage address sent by the target storage node, the terminal of the data owner generates a data index according to the data identifier of the fragmented data, the node identifier of the storage node, and the storage address of the fragmented data in the target storage node, so that the fragmented data can be viewed in and extracted from the target storage node.
  • The data owner can also check how the data is stored according to the data index, to prevent the storage node from deleting or tampering with the fragmented data.
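  • A minimal sketch of such a data index, assuming an in-memory mapping kept on the data owner's terminal. The class and field names (`DataIndex`, `IndexEntry`, `record`, `locate`) are hypothetical; the source only names the three pieces of information an index entry combines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    data_id: str          # data identifier of the fragmented data
    node_id: str          # node identifier of the target storage node
    storage_address: str  # location of the shard inside that node


class DataIndex:
    """Maps a data identifier to where the shard is stored."""

    def __init__(self) -> None:
        self._entries = {}

    def record(self, data_id: str, node_id: str, storage_address: str) -> IndexEntry:
        """Store the address reported back by the target storage node."""
        entry = IndexEntry(data_id, node_id, storage_address)
        self._entries[data_id] = entry
        return entry

    def locate(self, data_id: str) -> IndexEntry:
        """Return the entry needed to view or extract the shard."""
        try:
            return self._entries[data_id]
        except KeyError:
            raise KeyError(f"no index entry for data identifier {data_id!r}") from None
```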
  • S206 Send the fragmented data to the target storage node identified by the node identifier corresponding to the data identifier of the fragmented data.
  • Each piece of fragmented data corresponds to a data identifier;
  • each storage node also has a corresponding node identifier.
  • The data identifier of each piece of fragmented data is put into one-to-one correspondence, or otherwise associated, with the node identifier of its target storage node, and the association is sent to the management node or to the terminal of the data owner.
  • The management node or the terminal device of the data owner sends each piece of fragmented data to the corresponding target storage node according to the associated data identifier and node identifier.
  • steps S207 to S209 may also be included:
  • S207 Encrypt the fragmented data according to a preset encryption method to obtain encrypted fragmented data.
  • the fragmented data may also be encrypted by a preset encryption method to obtain encrypted fragmented data.
  • Each data owner can have a private key; the data owner's private key is used to encrypt the fragmented data, and the encrypted fragmented data is stored on the storage node.
  • Alternatively, a data digest can be generated from the data content of each piece of fragmented data and used as the key to encrypt that fragmented data.
  • The key derived from the data digest is held only by the management node or the data owner.
  • The management node manages all storage nodes; it does not need to store data itself, and it has the highest authority.
  • A storage node may delete or modify the stored data beyond its authority, which affects the security and privacy of the fragmented data and may even affect how the management node or the data owner processes the fragmented data and the results of that processing. Therefore, a digest of each piece of fragmented data is stored on the management node or on the terminal of the data owner, so that verification of the integrity of the fragmented data held by a storage node can be initiated at any time, ensuring the integrity and security of the stored data.
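  • The integrity check described above can be sketched as follows, assuming SHA-256 as the digest algorithm (the source does not name one). `shard_digest` and `verify_shard` are hypothetical helper names; encryption itself is left out because the preset encryption method is not specified.

```python
import hashlib
import hmac


def shard_digest(shard: bytes) -> str:
    """Digest kept by the management node or the data owner's terminal."""
    return hashlib.sha256(shard).hexdigest()


def verify_shard(returned: bytes, expected_digest: str) -> bool:
    """Check that a shard read back from a storage node is unmodified.

    compare_digest avoids leaking, through timing, where the digests differ.
    """
    return hmac.compare_digest(shard_digest(returned), expected_digest)
```

  • The owner computes `shard_digest` before upload and later calls `verify_shard` on whatever the storage node returns; a deleted or altered shard fails the check.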
  • S208 Obtain the operation data of each storage node in the network, and determine a target storage node for storing the encrypted fragmented data according to the operation data.
  • The state of each storage node in the network may differ: some storage nodes may already store more data or be processing more data, and sending a large amount of fragmented data to such a node may reduce its data processing efficiency and also the efficiency of later reading the fragmented data. Therefore, the operation data of each storage node is obtained and used to measure how much data each storage node can currently store.
  • The storage nodes can be taken in order of storable data amount, from high to low, and each given encrypted fragmented data of a corresponding size.
  • The pieces of encrypted fragmented data may be the same size or different sizes.
  • When the pieces of encrypted fragmented data are all the same size, a node with a larger storable data amount can store more of them, for example two or more;
  • when the sizes differ, the pieces of encrypted fragmented data are sorted from largest to smallest and the storage nodes are sorted by storable data amount from largest to smallest, and each piece is assigned to the corresponding storage node. This ensures that storage nodes with a larger storable amount hold the larger pieces of encrypted fragmented data, which balances the load across the nodes in the network and lets each storage node operate in its most efficient state.
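  • A minimal sketch of the capacity-ordered placement described above, assuming a greedy rule: shards are taken from largest to smallest and each goes to the node with the most remaining storable amount, so a node with ample room can receive two or more shards. The function name `assign_by_capacity` and the greedy rule itself are assumptions, not the method stated in the source.

```python
def assign_by_capacity(shard_sizes: dict, capacities: dict) -> dict:
    """Map each encrypted data identifier to a node identifier.

    shard_sizes: encrypted data identifier -> size in bytes
    capacities:  node identifier -> currently storable amount in bytes
    """
    if not capacities:
        raise ValueError("at least one storage node is required")
    remaining = dict(capacities)
    placement = {}
    # Largest shards first, each to the node with the most room left.
    for shard in sorted(shard_sizes, key=shard_sizes.get, reverse=True):
        node = max(remaining, key=remaining.get)
        if remaining[node] < shard_sizes[shard]:
            raise ValueError(f"no node can hold shard {shard!r}")
        placement[shard] = node
        remaining[node] -= shard_sizes[shard]
    return placement
```

  • With shards of 40, 30, and 20 units and nodes able to store 100 and 50 units, the first two shards land on the larger node and the third on the smaller one.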
  • S209 Send the encrypted fragment data to the target storage node.
  • Each piece of fragmented data corresponds to a data identifier.
  • The encrypted fragmented data obtained from it likewise has an encrypted data identifier, which is used to distinguish the different pieces of encrypted fragmented data.
  • At the same time, in this embodiment, each storage node also has a corresponding node identifier.
  • The encrypted data identifier of each piece of encrypted fragmented data is put into one-to-one correspondence, or otherwise associated, with the node identifier of its target storage node, and the association is sent to the management node or to the terminal of the data owner.
  • The management node or the terminal device of the data owner sends each piece of encrypted fragmented data to the corresponding target storage node according to the associated encrypted data identifier and node identifier.
  • After step S206, the method may further include:
  • The data distribution instruction includes a device identifier of the terminal device, and instructs the storage node to send the data corresponding to the data index to that terminal device.
  • The data owner's terminal may receive data request information sent by a data requester's terminal device at any time. After receiving the data request information, the owner's terminal parses it to obtain the data identifier of the data requested by the data requester's terminal device, and then determines the data index corresponding to that data identifier, that is, determines the storage node that stores the data corresponding to the data identifier and the storage location of the data in that node.
  • The data distribution instruction includes the device identifier of the data requester's terminal device. The device identifier may be information such as the Internet Protocol address of the device and is used to uniquely determine the terminal device that receives the data corresponding to the data identifier; no limitation is imposed here.
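  • The request handling described above can be sketched as a lookup followed by building the instruction. The message layout is an assumption: the dictionary keys `data_id`, `device_id`, `node_id`, `storage_address`, and `recipient_device_id`, and the function name `build_distribution_instruction`, are hypothetical and do not come from the source.

```python
def build_distribution_instruction(request: dict, index: dict) -> dict:
    """Turn a requester's data request into a data distribution instruction.

    request: {"data_id": ..., "device_id": ...} sent by the requester's terminal
    index:   data identifier -> (node identifier, storage address)
    """
    data_id = request["data_id"]
    if data_id not in index:
        raise KeyError(f"no data index entry for {data_id!r}")
    node_id, storage_address = index[data_id]
    return {
        "node_id": node_id,                          # which storage node must act
        "storage_address": storage_address,          # where the shard sits in that node
        "recipient_device_id": request["device_id"], # e.g. the requester's IP address
    }
```

  • The owner's terminal would send the returned instruction to the named storage node, which then delivers the data to the device named in `recipient_device_id`.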
  • In the above solution, the file to be stored is obtained and subjected to erasure-code-based redundant fragmentation to obtain at least one piece of fragmented data; the operation data of each storage node in the network is obtained; a target storage node for storing the fragmented data is determined according to the operation data; and the fragmented data is sent to the target storage node.
  • Performing erasure-code-based redundant fragmentation on the file to be stored, encrypting each piece of fragmented data to obtain the corresponding encrypted data, and sending the encrypted data to the storage node corresponding to the fragment identifier for storage reduces the space occupied by large-file storage, thereby saving expense and storage cost.
  • FIG. 3 is a schematic diagram of a terminal device provided in Embodiment 3 of the present application.
  • Each unit included in the terminal device is used to execute each step in the embodiments corresponding to FIG. 1 to FIG. 2.
  • the terminal device 300 of this embodiment includes:
  • the obtaining unit 301 is used to obtain a file to be stored
  • the fragmentation unit 302 is configured to perform redundant fragmentation based on erasure coding on the file to be stored to obtain at least one fragmentation data;
  • the storage unit 303 is configured to obtain the operation data of each storage node in the network, determine a target storage node for storing the fragmented data according to the operation data, and send the fragmented data to the target storage node.
  • the storage unit 303 may include:
  • An operation data obtaining unit configured to obtain the operation data of each storage node in the network
  • a calculation unit configured to calculate a data storage coefficient of each storage node according to the operation data
  • a matching unit configured to determine the target storage node corresponding to each piece of fragmented data according to the data storage coefficient of each storage node, together with the node identifier of the target storage node corresponding to the data identifier of each piece of fragmented data;
  • the sending unit is configured to send the fragmented data to the target storage node of the node identifier corresponding to the data identifier of the fragmented data.
  • the terminal device may further include:
  • An address obtaining unit configured to obtain a storage address sent by the target storage node; the storage address is used to indicate a location where the fragmented data is stored in the target storage node;
  • An index generating unit configured to generate a data index based on the data identifier of the fragmented data, the node identifier of the storage node, and the storage address of the fragmented data in the target storage node; the data index is used to extract the fragmented data from the target storage node.
  • the operation data includes operation data of the CPU, memory, disk, and network of the target storage node;
  • the calculating the data storage coefficient of each storage node according to the operation data includes:
  • The data storage coefficient of each of the storage nodes is calculated by a formula over i ∈ {1, 2, 3, 4}, where A denotes the node identifier of the storage node, each index i corresponds to one of the CPU, memory, disk, and network operation data and carries its own weight, Curr_i(A) denotes the current CPU utilization, current memory usage, current disk read/write speed, and current network bandwidth of the storage node, and Orig_i(A) denotes the maximum CPU processing rate, memory capacity, maximum disk read/write speed, and maximum network bandwidth of the storage node.
  • the terminal device may further include:
  • An encryption unit configured to encrypt the fragmented data according to a preset encryption method to obtain encrypted fragmented data
  • An encrypted storage unit configured to obtain the operating data of each storage node in the network, and determine a target storage node for storing the encrypted fragmented data according to the operating data;
  • An encryption sending unit configured to send the encrypted fragmented data to the target storage node.
  • In the above solution, the file to be stored is obtained and subjected to erasure-code-based redundant fragmentation to obtain at least one piece of fragmented data; the operation data of each storage node in the network is obtained; a target storage node for storing the fragmented data is determined according to the operation data; and the fragmented data is sent to the target storage node.
  • Performing erasure-code-based redundant fragmentation on the file to be stored, encrypting each piece of fragmented data to obtain the corresponding encrypted data, and sending the encrypted data to the storage node corresponding to the fragment identifier for storage reduces the space occupied by large-file storage, thereby saving expense and storage cost.
  • the terminal device 4 of this embodiment includes: a processor 40, a memory 41, and computer-readable instructions 42 stored in the memory 41 and executable on the processor 40.
  • When the processor 40 executes the computer-readable instructions 42, the steps in the above embodiments of the file storage method are implemented, for example, steps S101 to S103 shown in FIG. 1.
  • Alternatively, when the processor 40 executes the computer-readable instructions 42, the functions of each module/unit in the foregoing device embodiments are realized, for example, the functions of the units 301 to 303 shown in FIG. 3.
  • The computer-readable instructions 42 may be divided into one or more modules/units, which are stored in the memory 41 and executed by the processor 40 to carry out the present application.
  • The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 42 in the terminal device 4.
  • the terminal device 4 may be a computing device such as a desktop computer, a notebook, a palmtop computer and a cloud server.
  • the terminal device may include, but is not limited to, the processor 40 and the memory 41.
  • FIG. 4 is only an example of the terminal device 4 and does not constitute a limitation on the terminal device 4, and may include more or less components than the illustration, or a combination of certain components or different components.
  • the terminal device may further include an input and output device, a network access device, a bus, and the like.
  • The processor 40 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 41 may be an internal storage unit of the terminal device 4, such as a hard disk or a memory of the terminal device 4.
  • The memory 41 may also be an external storage device of the terminal device 4, for example, a plug-in hard disk equipped on the terminal device 4, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, or the like.
  • the memory 41 may include both an internal storage unit of the terminal device 4 and an external storage device.
  • the memory 41 is used to store the computer-readable instructions and other programs and data required by the terminal device.
  • the memory 41 can also be used to temporarily store data that has been or will be output.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • Based on this understanding, the present application can implement all or part of the processes in the methods of the above embodiments by instructing the relevant hardware through computer-readable instructions, which can be stored in a computer-readable storage medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

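A file storage method, terminal device, and computer-readable storage medium in the field of computer application technology. The method includes: obtaining a file to be stored (S101); performing erasure-code-based redundant fragmentation on the file to be stored to obtain at least one data fragment (S102); and obtaining operating data of each storage node in the network, determining, according to the operating data, a target storage node for storing the data fragment, and sending the data fragment to the target storage node (S103). Fragmenting the file with erasure-code-based redundancy and sending the resulting fragments to the corresponding storage nodes reduces the space occupied by large-file storage, thereby saving expense and storage cost.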

Description

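File storage method and terminal device
This application claims priority to Chinese patent application No. 201910008544.5, entitled "File storage method and terminal device" and filed on January 4, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
This application belongs to the field of computer application technology, and in particular relates to a file storage method, a terminal device, and a computer-readable storage medium.
Background
With the rapid rise of cloud computing and big data technology, cloud storage has emerged as a new storage model. It plays an important role in large-scale data storage fields such as scientific computing and commercial computing, and has attracted wide attention from industry and academia. The cloud file system is an important part of a cloud storage system: it provides the underlying storage support and is responsible for storing data effectively and reliably to guarantee data availability, which in turn guarantees the reliability and stability of the storage system.
However, as storage demand keeps growing, the files users need to store occupy more and more storage space. For example, electronic films and image files in the medical industry are very large. Existing cloud storage approaches cannot meet the storage and sharing needs of such large files, and the cost of storing and processing the data is high.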
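Technical Problem
In view of this, embodiments of this application provide a file storage method, a terminal device, and a computer-readable storage medium, to solve the problem that the prior art cannot meet the data storage and sharing needs of large files and that storing and processing the data is costly.
Technical Solution
A first aspect of the embodiments of this application provides a file storage method, including:
obtaining a file to be stored;
performing erasure-code-based redundant fragmentation on the file to be stored to obtain at least one data fragment;
obtaining operating data of each storage node in the network, determining, according to the operating data, a target storage node for storing the data fragment, and sending the data fragment to the target storage node.
A second aspect of the embodiments of this application provides a terminal device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
obtaining a file to be stored;
performing erasure-code-based redundant fragmentation on the file to be stored to obtain at least one data fragment;
obtaining operating data of each storage node in the network, determining, according to the operating data, a target storage node for storing the data fragment, and sending the data fragment to the target storage node.
A third aspect of the embodiments of this application provides a terminal device, including:
an obtaining unit, configured to obtain a file to be stored;
a fragmentation unit, configured to perform erasure-code-based redundant fragmentation on the file to be stored to obtain at least one data fragment;
a storage unit, configured to obtain operating data of each storage node in the network, determine, according to the operating data, a target storage node for storing the data fragment, and send the data fragment to the target storage node.
A fourth aspect of the embodiments of this application provides a computer-readable storage medium storing computer-readable instructions, the computer-readable instructions including program instructions which, when executed by a processor, cause the processor to perform the method of the first aspect.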
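Beneficial Effects
Compared with the prior art, the embodiments of this application have the following beneficial effects:
The embodiments of this application obtain a file to be stored, perform erasure-code-based redundant fragmentation on it to obtain at least one data fragment, obtain operating data of each storage node in the network, determine, according to the operating data, a target storage node for storing the data fragment, and send the data fragment to the target storage node. Fragmenting the file with erasure-code-based redundancy and sending the fragments to the corresponding storage nodes reduces the space occupied by large-file storage, thereby saving expense and storage cost.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of this application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of the file storage method provided in Embodiment 1 of this application;
FIG. 2 is a flowchart of the file storage method provided in Embodiment 2 of this application;
FIG. 3 is a schematic diagram of the terminal device provided in Embodiment 3 of this application;
FIG. 4 is a schematic diagram of the terminal device provided in Embodiment 4 of this application.
Embodiments of the Application
In the following description, specific details such as particular system structures and techniques are set forth for illustration rather than limitation, to provide a thorough understanding of the embodiments of this application. However, it should be clear to those skilled in the art that this application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of this application.
To illustrate the technical solutions described in this application, specific embodiments are described below.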
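Embodiment 1
Referring to FIG. 1, FIG. 1 is a flowchart of the file storage method provided in Embodiment 1 of this application. The method in this embodiment is executed by a terminal. The terminal includes, but is not limited to, mobile terminals such as smartphones, tablet computers, and wearable devices, and may also be a desktop computer or the like. The file storage method shown in the figure may include the following steps:
S101: Obtain a file to be stored.
With the rapid rise of grid computing, the Internet of Things, and cloud computing, data is growing and accumulating at an unprecedented rate. Storing and processing data at this scale poses new challenges for storage systems. Building on the development of distributed storage systems, cloud storage, with its low cost, high efficiency, good scalability, and high reliability, has quickly become a focus of research on massive data storage.
Cloud storage is an Internet technology that emerged after cloud computing. Through virtualization it connects large numbers of physical storage devices, which cooperate, as if hidden inside a cloud, to provide storage services to users. Once a cloud computing system has acquired substantial storage capacity in order to process massive data, it has in effect become a cloud storage system. With a cloud storage system, users can access data in the cloud at any time and from any place, conveniently and quickly.
Traditional single-cloud storage relies on just one cloud storage provider. To avoid vendor lock-in and improve availability, the concept of multi-cloud storage was introduced and has attracted growing attention. Multi-cloud storage uses several cloud services at once: user data is stored redundantly across multiple clouds according to a distribution policy. Because different providers have their own characteristics and strengths, multi-cloud storage can draw on different cloud infrastructures to meet diverse user needs. It not only avoids vendor lock-in but also reduces service interruption or data loss caused by failures of cloud storage components (hardware, software, or equipment). Compared with traditional single-cloud storage, multi-cloud storage therefore improves data availability and fault tolerance.
In this solution, the data owner may send the file to be stored to the storage nodes, or the file to be stored may be obtained from the data owner. The file to be stored in this embodiment is characterized by a large file size: storing it on a single storage node would occupy a large share of that node's storage space. This solution therefore fragments the file into at least one data fragment so that the fragments can be stored separately, reducing the storage pressure on each storage node.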
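S102: Perform erasure-code-based redundant fragmentation on the file to be stored to obtain at least one data fragment.
Erasure-code redundancy offers strong fault tolerance and high space utilization. With an erasure code, m parity blocks can be added to n original file blocks; to restore the original file, any n of the m+n blocks suffice. Erasure coding comprises two similar processes, encoding and decoding. Encoding expands the original n blocks to n+m, and all blocks are stored on different storage nodes; if the number of lost blocks does not exceed m, the data can still be recovered from the remaining blocks. (In this paragraph n counts the original blocks; the four-tuple notation that follows uses k for the original blocks and n for the total.)
In this embodiment, an erasure code is represented by a four-tuple (n, k, m, r), where k is the number of data blocks into which the original data object is split before encoding, m is the number of coded blocks generated by encoding, n is the total number of blocks after encoding, and r is an integer greater than or equal to k. The erasure code splits a data object into k blocks, written as the set F = (F_1, F_2, …, F_k). After encoding, m coded blocks are generated; together with the original k data blocks this yields n blocks in total, written as the set E = (E_1, E_2, …, E_n). The blocks in F and E have the same size. When reading data, the system only needs any k of the n blocks to reconstruct the original data; when the number of failed blocks does not exceed m, the failed nodes can be repaired from the remaining valid blocks.
Specifically, the encoding process in this solution is:
1) split the data of the file to be stored into k fragments;
2) apply redundant encoding to the k fragments to generate n (n > k) redundant fragments, and store them on different server nodes;
3) when a user accesses or repairs the data, select t (k ≤ t < n) valid fragments from the n fragments, download a fraction Q of the stored content of each selected fragment for decoding, and recover the original file data.
Optionally, given n data blocks d_1, d_2, …, d_n, m parity blocks c_1, c_2, …, c_m are generated from them. For any n and m, any n blocks taken from the n original data blocks and m parity blocks can decode the original data; that is, up to m data blocks or parity blocks may be lost at the same time.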
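As a worked illustration of the space-utilization claim (the numbers are chosen for illustration and do not come from the application), assume k original blocks and m coded blocks of equal size. The stored volume relative to the original is then n/k:

```latex
n = k + m, \qquad \text{storage overhead} = \frac{n}{k}, \qquad
k = 4,\ m = 2 \;\Rightarrow\; n = 6,\ \frac{n}{k} = 1.5
```

By comparison, keeping three full replicas also survives the loss of any two copies, but stores 3 times the original volume.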
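The encode and decode steps above can be sketched with the simplest possible erasure code: k data blocks plus a single XOR parity block (m = 1). This is only an illustration under that assumption. The application does not name a specific code, and a practical system would more likely use a code such as Reed-Solomon that supports m > 1. The function names `encode` and `decode` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size blocks (zero-padded) and append one XOR parity block."""
    size = -(-len(data) // k)                      # ceiling division
    padded = data.ljust(size * k, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, blocks)             # assumed m = 1 parity block
    return blocks + [parity]


def decode(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one fragment is missing (None)."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("a single parity block tolerates only one lost fragment")
    fragments = list(fragments)
    if missing:
        present = [f for f in fragments if f is not None]
        fragments[missing[0]] = reduce(xor_bytes, present)  # XOR of the rest restores it
    return b"".join(fragments[:-1])[:length]       # drop parity and padding
```

With k = 4 this produces n = 5 fragments, any 4 of which are enough to recover the file, matching the "any k of n" property described above for the special case m = 1.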
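S103: Obtain operating data of each storage node in the network, determine, according to the operating data, a target storage node for storing the data fragment, and send the data fragment to the target storage node.
After at least one data fragment is obtained, the fragments are stored on storage nodes in the network. Specifically, the operating data of each storage node in the network may be obtained, the target storage node for each fragment determined from the operating data, and the fragments then stored one by one on their target storage nodes.
Specifically, the running state of each storage node in the network may differ. Some storage nodes may already hold a large amount of data or be processing a large amount of data; sending many more fragments to such a node may reduce its data processing efficiency and also slow down later reads of the fragments. Therefore, the operating data of each storage node is obtained and used to gauge each node's current data storage amount.
Note that in this solution there may be one data fragment or at least two. The more fragments there are, the smaller each fragment is, and the more storage nodes are likely to be able to hold it. However, the number of storage nodes in the network is limited. To ensure that every fragment has a storage node even when there are many fragments, in this embodiment one storage node may store one fragment or at least two fragments.
In addition, each storage node may be configured to report periodically to a management node the size of fragment data it can accept, either currently or within a given period. The management node then distributes the fragments according to the storable data size reported by each storage node and the size of each fragment. This matches the storage capacity of every storage node and avoids overloading any of them.
In the above solution, a file to be stored is obtained, erasure-code-based redundant fragmentation is performed on it to obtain at least one data fragment, operating data of each storage node in the network is obtained, a target storage node for storing the data fragment is determined according to the operating data, and the data fragment is sent to the target storage node. Fragmenting the file with erasure-code-based redundancy and sending the fragments to the corresponding storage nodes reduces the space occupied by large-file storage, thereby saving expense and storage cost.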
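The report-and-distribute variant can be sketched as follows. This is a minimal sketch under assumptions: the application does not specify the distribution rule, so a greedy "largest fragment first, node with the most remaining reported capacity" policy is assumed here, and the name `distribute` and the dictionary layout are hypothetical.

```python
def distribute(fragment_sizes: dict, reported_free: dict) -> dict:
    """Map each fragment id to a node id, never exceeding a node's reported capacity."""
    remaining = dict(reported_free)          # node id -> bytes the node said it can accept
    plan = {}
    # Assumed policy: place the largest fragments first.
    for frag_id, size in sorted(fragment_sizes.items(), key=lambda kv: kv[1], reverse=True):
        node = max(remaining, key=remaining.get)   # node with the most capacity left
        if remaining[node] < size:
            raise RuntimeError(f"no node can accept fragment {frag_id}")
        plan[frag_id] = node
        remaining[node] -= size
    return plan
```

A node with a large reported capacity can receive several fragments under this policy, which is consistent with the statement that one storage node may store one or at least two fragments.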
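Embodiment 2
Referring to FIG. 2, FIG. 2 is a flowchart of the file storage method provided in Embodiment 2 of this application. The method in this embodiment is executed by a terminal. The terminal includes, but is not limited to, mobile terminals such as smartphones, tablet computers, and wearable devices, and may also be a desktop computer or the like. The file storage method shown in the figure may include the following steps:
S201: Obtain a file to be stored.
S201 in this embodiment is implemented in exactly the same way as S101 in the embodiment corresponding to FIG. 1; for details, refer to the description of S101 there, which is not repeated here.
S202: Perform erasure-code-based redundant fragmentation on the file to be stored to obtain at least one data fragment.
An erasure code splits data into multiple blocks, expands and encodes them, and stores them in different locations. The parameters of erasure coding can be written as (k, n, b, r), where k is the number of blocks into which a file is initially split, b is the size of each block after splitting, n is the total number of blocks after the file has been processed with the erasure code, and r is an integer greater than or equal to k. The file is first split into multiple data blocks, and an encoding operation under specific rules is applied to those blocks to produce one or more parity blocks; the parity blocks and data blocks are then all distributed to different nodes in the cloud. When a user accesses or restores the file, the system can start decoding once it has obtained any k available blocks out of the n.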
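S203: Obtain the operating data of each storage node in the network.
The running state of each storage node in the network may differ. Some storage nodes may already hold a large amount of data or be processing a large amount of data; sending many more fragments to such a node may reduce its data processing efficiency and also slow down later reads of the fragments. Therefore, the operating data of each storage node is obtained and used to gauge each node's current data storage amount. Specifically, in this embodiment the operating data of a storage node includes, but is not limited to, its CPU utilization at each moment, current memory usage, current disk read/write speed, and current network bandwidth, which are not limited here.
S204: Calculate the data storage coefficient of each storage node according to the operating data.
After the operating data of each storage node is obtained, the current running state of each node can be gauged from it. To gauge that state more precisely, it is quantified: a data storage coefficient is calculated for each storage node from its operating data.
According to the formula
Figure PCTCN2019091527-appb-000001
with i ∈ {1, 2, 3, 4}, the data storage coefficient of each storage node is calculated; where A denotes the node identifier of the storage node, ω_i denotes the weights of the CPU, memory, disk, and network operating data, Curr_i(A) denotes the storage node's current CPU utilization, current memory usage, current disk read/write speed, and current network bandwidth, and Orig_i(A) denotes the storage node's maximum CPU processing rate, memory capacity, maximum disk read/write speed, and maximum network bandwidth.
Note that in the formula above, Curr_i(A) denotes the current operating data of the storage node, that is, the collected operating data, which may include but is not limited to the current CPU utilization, current memory usage, current disk read/write speed, and current network bandwidth; Orig_i(A) denotes the maximum operating data of the storage node, that is, its hardware parameters or maximum operating parameters, including but not limited to the maximum CPU processing rate, memory capacity, maximum disk read/write speed, and maximum network bandwidth; ω_i denotes the weights of the CPU, memory, disk, and network operating data, which may be set by the administrator or according to the usage of each storage node and are not limited here.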
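The formula itself survives here only as a figure reference, so its exact form is not recoverable from this text. The sketch below therefore ASSUMES one plausible form: a weighted sum of the unused fraction of each of the four resources, chosen so that a larger coefficient means more spare capacity. The class `NodeMetrics`, the function `storage_coefficient`, and the equal default weights are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class NodeMetrics:
    node_id: str                               # A in the text
    curr: Tuple[float, float, float, float]    # Curr_i(A): CPU, memory, disk, network in use
    orig: Tuple[float, float, float, float]    # Orig_i(A): the corresponding maxima


def storage_coefficient(m: NodeMetrics,
                        weights: Tuple[float, float, float, float] = (0.25, 0.25, 0.25, 0.25)
                        ) -> float:
    """ASSUMED form: sum over i of w_i * (1 - Curr_i(A) / Orig_i(A))."""
    return sum(w * (1.0 - c / o) for w, c, o in zip(weights, m.curr, m.orig))
```

Under this assumption an idle node scores 1.0 and a fully loaded node scores 0.0; a different form (for example the plain weighted load ratio) would reverse that ordering, so the direction should be checked against the original figure before relying on it.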
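S205: Determine, according to the data storage coefficient of each storage node, the target storage node corresponding to each data fragment and the node identifier of the target storage node corresponding to the data identifier of each data fragment.
Once the data storage coefficient of each storage node has been calculated, a larger coefficient indicates that the node has stronger data storage capability and can store more data; conversely, a smaller coefficient indicates that the node can only store fragments with a small amount of data. On this basis, this embodiment determines the target storage node for each data fragment according to the data storage coefficient of each storage node.
Specifically, the storage nodes are sorted by data storage coefficient in ascending order, the data fragments are sorted by data volume in descending order, and fragments and nodes are then put in one-to-one correspondence. In this embodiment, data fragments and storage nodes each have their own identifier (a data identifier and a node identifier, respectively). The storage node for each fragment is determined through these identifiers; that is, the data identifier of each fragment is mapped to the node identifier of its target storage node.
Further, after step S205, steps S2051 to S2052 may also be included:
S2051: Obtain the storage address sent by the target storage node; the storage address indicates the location where the data fragment is stored in the target storage node.
After a storage node receives a data fragment to be stored, it stores the fragment locally. In practice, each storage node may have several storage areas for data, so after sending a fragment to a storage node the data owner cannot determine where the fragment is stored. Therefore, in this solution, after the storage node finishes storing a fragment it sends the fragment's storage address to the data owner's terminal, so that the data owner can inspect the fragment at any time, or check the storage state of the data using that address, preventing the storage node from deleting or tampering with the fragment on its own.
S2052: Generate a data index according to the data identifier of the data fragment, the node identifier of the storage node, and the storage address of the data fragment in the target storage node; the data index is used to retrieve the data fragment from the target storage node.
After receiving the storage address sent by the target storage node, the data owner's terminal generates a data index from the data identifier of the fragment, the node identifier of the storage node, and the storage address of the fragment in the target storage node, so that the fragment can be inspected and retrieved from the target storage node. The data owner can also use the data index to check the storage state of the data, preventing the storage node from deleting or tampering with the fragment on its own.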
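The one-to-one pairing can be sketched as below. The text states the sort orders literally (nodes ascending by coefficient, fragments descending by size), so that is the default here. Because the preceding paragraph also says a larger coefficient means more storage capability, the node order is exposed as a parameter instead of being fixed. The function name and the `node_descending` flag are hypothetical.

```python
def match_fragments_to_nodes(fragment_sizes: dict, coefficients: dict,
                             node_descending: bool = False) -> dict:
    """Pair data identifiers with node identifiers by rank.

    fragment_sizes: data identifier -> fragment size
    coefficients:   node identifier -> data storage coefficient
    """
    if len(fragment_sizes) > len(coefficients):
        raise ValueError("one-to-one pairing needs at least as many nodes as fragments")
    # Fragments from largest to smallest, as stated in the text.
    fragments = sorted(fragment_sizes, key=fragment_sizes.get, reverse=True)
    # Nodes ascending by default (literal reading); descending if the flag is set.
    nodes = sorted(coefficients, key=coefficients.get, reverse=node_descending)
    return dict(zip(fragments, nodes))
```

If the coefficient is read as spare capacity, calling the function with `node_descending=True` sends the largest fragment to the most capable node; if it is read as current load, the default ascending order does the same.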
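A data index of this kind can be sketched as a small lookup table keyed by data identifier. The application does not describe the index layout, so the `IndexEntry` fields and the `DataIndex` class below are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    data_id: str   # data identifier of the fragment
    node_id: str   # node identifier of the target storage node
    address: str   # storage address reported back by that node


class DataIndex:
    """Kept on the data owner's terminal; maps a data identifier to where it is stored."""

    def __init__(self) -> None:
        self._entries = {}

    def record(self, data_id: str, node_id: str, address: str) -> IndexEntry:
        entry = IndexEntry(data_id, node_id, address)
        self._entries[data_id] = entry
        return entry

    def locate(self, data_id: str) -> IndexEntry:
        return self._entries[data_id]   # raises KeyError for unknown identifiers
```

Keeping the node identifier and the address together lets the owner both fetch a fragment and later ask the same node to prove the fragment is still at that address.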
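S206: Send the data fragment to the target storage node having the node identifier corresponding to the data identifier of the data fragment.
After the target storage node for each fragment is determined, the fragment is sent to it. In this embodiment each fragment has a data identifier and each storage node has a node identifier. Specifically, after the target storage node of each fragment is determined, the fragment's data identifier is mapped one-to-one to, or associated with, the node identifier of the target storage node, and the association is sent to the management node or the data owner's terminal device, which then sends each fragment to its target storage node according to the associated data identifier and node identifier.
Further, after step S202, steps S207 to S209 may also be included:
S207: Encrypt the data fragment according to a preset encryption method to obtain an encrypted data fragment.
After at least one data fragment is obtained, the fragment may also be encrypted with a preset encryption method to protect it, yielding an encrypted data fragment.
Specifically, each data owner may have a private key; the fragment is encrypted with the data owner's private key, and the encrypted fragment is stored on the storage node.
Optionally, a data digest may be randomly generated from the data content of each fragment and used as the key to encrypt that fragment. The key derived from the digest is held only by the management node or the data owner. The management node is the node that manages all storage nodes; it does not need to store data and has the highest authority.
Further, a storage node might delete or modify stored data beyond its authority, which would compromise the security and privacy of the fragment and could even affect how the management node or data owner processes the fragment and the results obtained. Therefore, the digest of each fragment may be stored on the management node or the data owner's terminal, so that integrity verification of the fragments held by a storage node can be initiated at any time, ensuring the integrity and security of the stored data.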
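The digest-based integrity check can be sketched with the Python standard library. The application does not name a digest algorithm, so SHA-256 is an assumption here, and the verification shown (re-hashing the returned fragment) is the simplest possible scheme, not necessarily the one intended. The function names are hypothetical.

```python
import hashlib
import hmac


def fragment_digest(fragment: bytes) -> str:
    """Digest kept by the management node or the data owner (SHA-256 assumed)."""
    return hashlib.sha256(fragment).hexdigest()


def verify_fragment(returned: bytes, expected_digest: str) -> bool:
    """True if the fragment returned by a storage node still matches the stored digest."""
    # compare_digest performs a constant-time comparison of the two hex strings
    return hmac.compare_digest(fragment_digest(returned), expected_digest)
```

Any deletion or modification by the storage node changes the digest of what it returns, so the check fails without the verifier having to keep a copy of the fragment itself.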
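S208: Obtain operating data of each storage node in the network, and determine, according to the operating data, the target storage node for storing the encrypted data fragment.
The running state of each storage node in the network may differ. Some storage nodes may already hold a large amount of data or be processing a large amount of data; sending many more fragments to such a node may reduce its data processing efficiency and also slow down later reads of the fragments. Therefore, the operating data of each storage node is obtained and used to gauge each node's current storable amount of data.
After the current storable amount of each storage node is determined, encrypted fragments of corresponding sizes may be stored in descending order of storable amount. The encrypted fragments may be of the same size or of different sizes. When they are the same size, nodes with a larger storable amount may store more encrypted fragments, for example two or more. When they differ in size, the encrypted fragments are sorted in descending order of size and the storage nodes in descending order of storable amount, and each encrypted fragment is assigned to the corresponding storage node. This ensures that nodes with a larger storable amount store the larger encrypted fragments, balancing load across the nodes in the network so that every storage node runs as efficiently as possible while storing data normally.
S209: Send the encrypted data fragment to the target storage node.
After the target storage node for each encrypted fragment is determined, the encrypted fragment is sent to it. In this embodiment each fragment has a data identifier, and after encryption the resulting encrypted fragment likewise carries an encrypted-data identifier that distinguishes it from other encrypted fragments. Each storage node also has a node identifier. Specifically, after the target storage node of each encrypted fragment is determined, the encrypted-data identifier of the fragment is mapped one-to-one to, or associated with, the node identifier of the target storage node, and the association is sent to the management node or the data owner's terminal device, which then sends each encrypted fragment to its target storage node according to the associated encrypted-data identifier and node identifier.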
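Further, after step S206, the method may also include:
obtaining data request information sent by a terminal device;
parsing the data request information to obtain the data identifier of the data requested by the terminal device;
determining, according to the data identifier, the data index corresponding to the data identifier;
sending, according to the data index, a data distribution instruction to the storage node that stores the data corresponding to the data index; the data distribution instruction includes the device identifier of the terminal device and is used by the storage node to send the data corresponding to the data index to the terminal device.
Specifically, after the data owner's terminal has sent the fragments to the storage nodes and generated a data index from the locations where the fragments are stored, it may at any time receive data request information from a data requester's terminal device. On receiving the request, the data owner's terminal parses it to obtain the data identifier of the requested data, and then uses the data identifier to determine the corresponding data index, that is, the storage node holding the data and the storage location within it. Using the data index, it sends a data distribution instruction to the storage node that stores the data, instructing that node to distribute the data to the data requester's terminal. The data distribution instruction includes the device identifier of the data requester's terminal device; the device identifier may be information such as the device's Internet Protocol (IP) address, uniquely identifies the terminal device that is to receive the data, and is not limited here.
In the above solution, a file to be stored is obtained, erasure-code-based redundant fragmentation is performed on it to obtain at least one data fragment, operating data of each storage node in the network is obtained, a target storage node for storing the data fragment is determined according to the operating data, and the data fragment is sent to the target storage node. The file is fragmented with erasure-code-based redundancy, each fragment is encrypted to obtain the corresponding encrypted data, and the encrypted data is sent to the storage node corresponding to the fragment identifier for storage, which reduces the space occupied by large-file storage, thereby saving expense and storage cost.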
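The request-handling flow can be sketched as a single function on the data owner's terminal. The message layout is not specified in the application, so the field names (`data_id`, `device_id`, `address`) and the representation of the index as a dictionary are hypothetical.

```python
def build_distribution_instruction(request: dict, index: dict) -> tuple:
    """Resolve a data request into (node identifier, data distribution instruction).

    request: {"data_id": ..., "device_id": ...} as sent by the requester's terminal
    index:   data identifier -> (node identifier, storage address)
    """
    data_id = request["data_id"]                 # parse the request
    if data_id not in index:
        raise KeyError(f"no index entry for {data_id}")
    node_id, address = index[data_id]            # look up the data index
    instruction = {
        "data_id": data_id,
        "address": address,
        "device_id": request["device_id"],       # e.g. the requester's IP address
    }
    return node_id, instruction
```

The storage node named by `node_id` would then read the fragment at `address` and send it to the device named in the instruction; the owner's terminal never relays the data itself.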
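Embodiment 3
Referring to FIG. 3, FIG. 3 is a schematic diagram of a terminal device provided in Embodiment 3 of this application. The units of the terminal device are used to perform the steps in the embodiments corresponding to FIG. 1 and FIG. 2; for details, refer to the descriptions of those embodiments. For ease of description, only the parts relevant to this embodiment are shown. The terminal device 300 of this embodiment includes:
an obtaining unit 301, configured to obtain a file to be stored;
a fragmentation unit 302, configured to perform erasure-code-based redundant fragmentation on the file to be stored to obtain at least one data fragment;
a storage unit 303, configured to obtain operating data of each storage node in the network, determine, according to the operating data, a target storage node for storing the data fragment, and send the data fragment to the target storage node.
Further, the storage unit 303 may include:
an operating data obtaining unit, configured to obtain the operating data of each storage node in the network;
a calculation unit, configured to calculate the data storage coefficient of each storage node according to the operating data;
a matching unit, configured to determine, according to the data storage coefficient of each storage node, the target storage node corresponding to each data fragment and the node identifier of the target storage node corresponding to the data identifier of each data fragment;
a sending unit, configured to send the data fragment to the target storage node having the node identifier corresponding to the data identifier of the data fragment.
Further, the terminal device may also include:
an address obtaining unit, configured to obtain the storage address sent by the target storage node; the storage address indicates the location where the data fragment is stored in the target storage node;
an index generation unit, configured to generate a data index according to the data identifier of the data fragment, the node identifier of the storage node, and the storage address of the data fragment in the target storage node; the data index is used to retrieve the data fragment from the target storage node.
The operating data includes operating data of the central processing unit (CPU), memory, disk, and network of the target storage node;
the calculating the data storage coefficient of each storage node according to the operating data includes:
according to the formula
Figure PCTCN2019091527-appb-000002
with i ∈ {1, 2, 3, 4}, calculating the data storage coefficient of each storage node; where A denotes the node identifier of the storage node, ω_i denotes the weights of the CPU, memory, disk, and network operating data, Curr_i(A) denotes the storage node's current CPU utilization, current memory usage, current disk read/write speed, and current network bandwidth, and Orig_i(A) denotes the storage node's maximum CPU processing rate, memory capacity, maximum disk read/write speed, and maximum network bandwidth.
Further, the terminal device may also include:
an encryption unit, configured to encrypt the data fragment according to a preset encryption method to obtain an encrypted data fragment;
an encrypted storage unit, configured to obtain operating data of each storage node in the network and determine, according to the operating data, the target storage node for storing the encrypted data fragment;
an encrypted sending unit, configured to send the encrypted data fragment to the target storage node.
In the above solution, a file to be stored is obtained, erasure-code-based redundant fragmentation is performed on it to obtain at least one data fragment, operating data of each storage node in the network is obtained, a target storage node for storing the data fragment is determined according to the operating data, and the data fragment is sent to the target storage node. The file is fragmented with erasure-code-based redundancy, each fragment is encrypted to obtain the corresponding encrypted data, and the encrypted data is sent to the storage node corresponding to the fragment identifier for storage, which reduces the space occupied by large-file storage, thereby saving expense and storage cost.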
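Embodiment 4
FIG. 4 is a schematic diagram of the terminal device provided in Embodiment 4 of this application. As shown in FIG. 4, the terminal device 4 of this embodiment includes a processor 40, a memory 41, and computer-readable instructions 42 stored in the memory 41 and executable on the processor 40. When executing the computer-readable instructions 42, the processor 40 implements the steps in the file storage method embodiments above, for example steps S101 to S103 shown in FIG. 1. Alternatively, when executing the computer-readable instructions 42, the processor 40 implements the functions of the modules/units in the device embodiments above, for example the functions of units 301 to 303 shown in FIG. 3.
For example, the computer-readable instructions 42 may be divided into one or more modules/units, which are stored in the memory 41 and executed by the processor 40 to carry out this application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions; the instruction segments describe the execution of the computer-readable instructions 42 in the terminal device 4.
The terminal device 4 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 40 and the memory 41. Those skilled in the art will understand that FIG. 4 is merely an example of the terminal device 4 and does not limit it; the terminal device may include more or fewer components than shown, combine certain components, or use different components. For example, it may also include input/output devices, network access devices, a bus, and so on.
The processor 40 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 41 may be an internal storage unit of the terminal device 4, such as a hard disk or memory of the terminal device 4. The memory 41 may also be an external storage device of the terminal device 4, such as a plug-in hard disk equipped on the terminal device 4, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card, FC), or the like. Further, the memory 41 may include both an internal storage unit of the terminal device 4 and an external storage device. The memory 41 is used to store the computer-readable instructions and other programs and data required by the terminal device. The memory 41 may also be used to temporarily store data that has been output or is about to be output.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is only an example. In practice, the functions above may be assigned to different functional units and modules as needed; that is, the internal structure of the device may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit; the integrated unit may be implemented in hardware or as a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing them from one another and do not limit the scope of protection of this application. For the specific working process of the units and modules in the system above, refer to the corresponding process in the method embodiments, which is not repeated here.
In the embodiments above, each embodiment has its own emphasis; for parts not detailed or recorded in one embodiment, refer to the relevant descriptions of the other embodiments.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may also be completed by computer-readable instructions instructing the relevant hardware, and the computer-readable instructions may be stored in a computer-readable storage medium.
The embodiments above are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features replaced with equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the scope of protection of this application.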

Claims (20)

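  1. A file storage method, characterized by comprising:
    obtaining a file to be stored;
    performing erasure-code-based redundant fragmentation on the file to be stored to obtain at least one data fragment;
    obtaining operating data of each storage node in a network, determining, according to the operating data, a target storage node for storing the data fragment, and sending the data fragment to the target storage node.
  2. The file storage method according to claim 1, characterized in that the obtaining operating data of each storage node in the network, determining, according to the operating data, a target storage node for storing the data fragment, and sending the data fragment to the target storage node comprises:
    obtaining the operating data of each storage node in the network;
    calculating a data storage coefficient of each storage node according to the operating data;
    determining, according to the data storage coefficient of each storage node, the target storage node corresponding to each data fragment and the node identifier of the target storage node corresponding to the data identifier of each data fragment;
    sending the data fragment to the target storage node having the node identifier corresponding to the data identifier of the data fragment.
  3. The file storage method according to claim 2, characterized in that, after the sending the data fragment to the target storage node having the node identifier corresponding to the data identifier of the data fragment, the method further comprises:
    obtaining a storage address sent by the target storage node, the storage address indicating the location where the data fragment is stored in the target storage node;
    generating a data index according to the data identifier of the data fragment, the node identifier of the storage node, and the storage address of the data fragment in the target storage node, the data index being used to retrieve the data fragment from the target storage node.
  4. The file storage method according to claim 2, characterized in that the operating data comprises operating data of the central processing unit (CPU), memory, disk, and network of the target storage node;
    the calculating a data storage coefficient of each storage node according to the operating data comprises:
    according to the formula
    Figure PCTCN2019091527-appb-100001
    with i ∈ {1, 2, 3, 4}, calculating the data storage coefficient of each storage node; where A denotes the node identifier of the storage node, ω_i denotes the weights of the CPU, memory, disk, and network operating data, Curr_i(A) denotes the storage node's current CPU utilization, current memory usage, current disk read/write speed, and current network bandwidth, and Orig_i(A) denotes the storage node's maximum CPU processing rate, memory capacity, maximum disk read/write speed, and maximum network bandwidth.
  5. The file storage method according to any one of claims 1-4, characterized in that, after the performing erasure-code-based redundant fragmentation on the file to be stored to obtain at least one data fragment, the method comprises:
    encrypting the data fragment according to a preset encryption method to obtain an encrypted data fragment;
    obtaining operating data of each storage node in the network, and determining, according to the operating data, the target storage node for storing the encrypted data fragment;
    sending the encrypted data fragment to the target storage node.
  6. The file storage method according to claim 3, characterized in that, after the sending the data fragment to the target storage node having the node identifier corresponding to the data identifier of the data fragment, the method further comprises:
    obtaining data request information sent by a terminal device;
    parsing the data request information to obtain the data identifier of the data requested by the terminal device;
    determining, according to the data identifier, the data index corresponding to the data identifier;
    sending, according to the data index, a data distribution instruction to the storage node that stores the data corresponding to the data index, the data distribution instruction comprising the device identifier of the terminal device and being used by the storage node to send the data corresponding to the data index to the terminal device.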
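  7. A terminal device, characterized by comprising a memory and a processor, the memory storing computer-readable instructions executable on the processor, characterized in that the processor implements the following steps when executing the computer-readable instructions:
    obtaining a file to be stored;
    performing erasure-code-based redundant fragmentation on the file to be stored to obtain at least one data fragment;
    obtaining operating data of each storage node in a network, determining, according to the operating data, a target storage node for storing the data fragment, and sending the data fragment to the target storage node.
  8. The terminal device according to claim 7, characterized in that the obtaining operating data of each storage node in the network, determining, according to the operating data, a target storage node for storing the data fragment, and sending the data fragment to the target storage node comprises:
    obtaining the operating data of each storage node in the network;
    calculating a data storage coefficient of each storage node according to the operating data;
    determining, according to the data storage coefficient of each storage node, the target storage node corresponding to each data fragment and the node identifier of the target storage node corresponding to the data identifier of each data fragment;
    sending the data fragment to the target storage node having the node identifier corresponding to the data identifier of the data fragment.
  9. The terminal device according to claim 8, characterized in that, after the sending the data fragment to the target storage node having the node identifier corresponding to the data identifier of the data fragment, the steps further comprise:
    obtaining a storage address sent by the target storage node, the storage address indicating the location where the data fragment is stored in the target storage node;
    generating a data index according to the data identifier of the data fragment, the node identifier of the storage node, and the storage address of the data fragment in the target storage node, the data index being used to retrieve the data fragment from the target storage node.
  10. The terminal device according to claim 8, characterized in that the operating data comprises operating data of the central processing unit (CPU), memory, disk, and network of the target storage node; the calculating a data storage coefficient of each storage node according to the operating data comprises:
    according to the formula
    Figure PCTCN2019091527-appb-100002
    with i ∈ {1, 2, 3, 4}, calculating the data storage coefficient of each storage node; where A denotes the node identifier of the storage node, ω_i denotes the weights of the CPU, memory, disk, and network operating data, Curr_i(A) denotes the storage node's current CPU utilization, current memory usage, current disk read/write speed, and current network bandwidth, and Orig_i(A) denotes the storage node's maximum CPU processing rate, memory capacity, maximum disk read/write speed, and maximum network bandwidth.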
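  11. A terminal device, characterized by comprising:
    an obtaining unit, configured to obtain a file to be stored;
    a fragmentation unit, configured to perform erasure-code-based redundant fragmentation on the file to be stored to obtain at least one data fragment;
    a storage unit, configured to obtain operating data of each storage node in a network, determine, according to the operating data, a target storage node for storing the data fragment, and send the data fragment to the target storage node.
  12. The terminal device according to claim 11, characterized in that the storage unit comprises:
    an operating data obtaining unit, configured to obtain the operating data of each storage node in the network;
    a calculation unit, configured to calculate a data storage coefficient of each storage node according to the operating data;
    a matching unit, configured to determine, according to the data storage coefficient of each storage node, the target storage node corresponding to each data fragment and the node identifier of the target storage node corresponding to the data identifier of each data fragment;
    a sending unit, configured to send the data fragment to the target storage node having the node identifier corresponding to the data identifier of the data fragment.
  13. The terminal device according to claim 12, characterized in that the terminal device further comprises:
    an address obtaining unit, configured to obtain a storage address sent by the target storage node, the storage address indicating the location where the data fragment is stored in the target storage node;
    an index generation unit, configured to generate a data index according to the data identifier of the data fragment, the node identifier of the storage node, and the storage address of the data fragment in the target storage node, the data index being used to retrieve the data fragment from the target storage node.
    The operating data comprises operating data of the central processing unit (CPU), memory, disk, and network of the target storage node;
  14. The terminal device according to claim 12, characterized in that the calculation unit comprises:
    according to the formula
    Figure PCTCN2019091527-appb-100003
    with i ∈ {1, 2, 3, 4}, calculating the data storage coefficient of each storage node; where A denotes the node identifier of the storage node, ω_i denotes the weights of the CPU, memory, disk, and network operating data, Curr_i(A) denotes the storage node's current CPU utilization, current memory usage, current disk read/write speed, and current network bandwidth, and Orig_i(A) denotes the storage node's maximum CPU processing rate, memory capacity, maximum disk read/write speed, and maximum network bandwidth.
  15. The terminal device according to claims 12-14, characterized in that the terminal device further comprises:
    an encryption unit, configured to encrypt the data fragment according to a preset encryption method to obtain an encrypted data fragment;
    an encrypted storage unit, configured to obtain operating data of each storage node in the network and determine, according to the operating data, the target storage node for storing the encrypted data fragment;
    an encrypted sending unit, configured to send the encrypted data fragment to the target storage node.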
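  16. A computer-readable storage medium storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by a processor, implement:
    obtaining a file to be stored;
    performing erasure-code-based redundant fragmentation on the file to be stored to obtain at least one data fragment;
    obtaining operating data of each storage node in a network, determining, according to the operating data, a target storage node for storing the data fragment, and sending the data fragment to the target storage node.
  17. The file storage method according to claim 16, characterized in that the obtaining operating data of each storage node in the network, determining, according to the operating data, a target storage node for storing the data fragment, and sending the data fragment to the target storage node comprises:
    obtaining the operating data of each storage node in the network;
    calculating a data storage coefficient of each storage node according to the operating data;
    determining, according to the data storage coefficient of each storage node, the target storage node corresponding to each data fragment and the node identifier of the target storage node corresponding to the data identifier of each data fragment;
    sending the data fragment to the target storage node having the node identifier corresponding to the data identifier of the data fragment.
  18. The file storage method according to claim 17, characterized in that, after the sending the data fragment to the target storage node having the node identifier corresponding to the data identifier of the data fragment, the method further comprises:
    obtaining a storage address sent by the target storage node, the storage address indicating the location where the data fragment is stored in the target storage node;
    generating a data index according to the data identifier of the data fragment, the node identifier of the storage node, and the storage address of the data fragment in the target storage node, the data index being used to retrieve the data fragment from the target storage node.
  19. The file storage method according to claim 17, characterized in that the operating data comprises operating data of the central processing unit (CPU), memory, disk, and network of the target storage node;
    the calculating a data storage coefficient of each storage node according to the operating data comprises:
    according to the formula
    Figure PCTCN2019091527-appb-100004
    with i ∈ {1, 2, 3, 4}, calculating the data storage coefficient of each storage node; where A denotes the node identifier of the storage node, ω_i denotes the weights of the CPU, memory, disk, and network operating data, Curr_i(A) denotes the storage node's current CPU utilization, current memory usage, current disk read/write speed, and current network bandwidth, and Orig_i(A) denotes the storage node's maximum CPU processing rate, memory capacity, maximum disk read/write speed, and maximum network bandwidth.
  20. The file storage method according to any one of claims 16-19, characterized in that, after the performing erasure-code-based redundant fragmentation on the file to be stored to obtain at least one data fragment, the method comprises:
    encrypting the data fragment according to a preset encryption method to obtain an encrypted data fragment;
    obtaining operating data of each storage node in the network, and determining, according to the operating data, the target storage node for storing the encrypted data fragment; sending the encrypted data fragment to the target storage node.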
PCT/CN2019/091527 2019-01-04 2019-06-17 文件存储方法及终端设备 WO2020140394A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910008544.5A CN109857710B (zh) 2019-01-04 2019-01-04 文件存储方法及终端设备
CN201910008544.5 2019-01-04

Publications (1)

Publication Number Publication Date
WO2020140394A1 true WO2020140394A1 (zh) 2020-07-09

Family

ID=66893943

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/091527 WO2020140394A1 (zh) 2019-01-04 2019-06-17 文件存储方法及终端设备

Country Status (2)

Country Link
CN (1) CN109857710B (zh)
WO (1) WO2020140394A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113407492A (zh) * 2021-06-18 2021-09-17 中国人民银行清算总中心 文件分片存储、分片文件重组方法及装置、文件保护系统
WO2022199155A1 (zh) * 2021-03-24 2022-09-29 华为技术有限公司 一种数据传输的系统、方法以及网络设备
CN117648057A (zh) * 2024-01-29 2024-03-05 瑞达可信安全技术(广州)有限公司 一种基于分布式存储的数据安全管理方法及系统

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109857710B (zh) * 2019-01-04 2023-10-27 平安科技(深圳)有限公司 文件存储方法及终端设备
CN110381061A (zh) * 2019-07-19 2019-10-25 广东省新一代通信与网络创新研究院 文件的多云存储方法、下载方法、装置及存储介质
CN111061357B (zh) * 2019-12-13 2021-09-03 北京奇艺世纪科技有限公司 节能方法、装置、电子设备及存储介质
CN110781510B (zh) * 2020-01-02 2020-04-21 广州欧赛斯信息科技有限公司 应用于学分银行系统的数据分片加密方法、装置及服务器
CN111291414A (zh) * 2020-03-11 2020-06-16 深圳市网心科技有限公司 数据存储方法及装置、计算机装置及存储介质
CN113726832B (zh) * 2020-05-26 2024-03-05 杭州海康存储科技有限公司 分布式存储系统的数据存储方法、装置、系统及设备
CN111835848B (zh) * 2020-07-10 2022-08-23 北京字节跳动网络技术有限公司 数据分片方法、装置、电子设备及计算机可读介质
CN112328550A (zh) * 2020-11-03 2021-02-05 深圳壹账通智能科技有限公司 一种分布式文件系统架构下的文件管理方法及装置
CN113485637A (zh) * 2021-05-11 2021-10-08 广州炒米信息科技有限公司 数据存储方法、装置及计算机设备
CN117950600B (zh) * 2024-03-27 2024-06-04 广东力创信息技术有限公司 一种数据存储方法以及相关装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105100146A (zh) * 2014-05-07 2015-11-25 腾讯科技(深圳)有限公司 数据存储方法、装置及系统
US9201733B2 (en) * 2013-03-13 2015-12-01 Futurewei Technologies, Inc. Systems and methods for data repair
CN106686095A (zh) * 2016-12-30 2017-05-17 郑州云海信息技术有限公司 一种基于纠删码技术的数据存储方法及装置
CN106909470A (zh) * 2017-01-20 2017-06-30 深圳市中博科创信息技术有限公司 基于纠删码的分布式文件系统存储方法及装置
CN109857710A (zh) * 2019-01-04 2019-06-07 平安科技(深圳)有限公司 文件存储方法及终端设备

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6853506B2 (ja) * 2017-03-31 2021-03-31 日本電気株式会社 ストレージシステム、データソート方法及びプログラム

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022199155A1 (zh) * 2021-03-24 2022-09-29 华为技术有限公司 一种数据传输的系统、方法以及网络设备
CN113407492A (zh) * 2021-06-18 2021-09-17 中国人民银行清算总中心 文件分片存储、分片文件重组方法及装置、文件保护系统
CN113407492B (zh) * 2021-06-18 2024-03-26 中国人民银行清算总中心 文件分片存储、分片文件重组方法及装置、文件保护系统
CN117648057A (zh) * 2024-01-29 2024-03-05 瑞达可信安全技术(广州)有限公司 一种基于分布式存储的数据安全管理方法及系统

Also Published As

Publication number Publication date
CN109857710A (zh) 2019-06-07
CN109857710B (zh) 2023-10-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19907414; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19907414; Country of ref document: EP; Kind code of ref document: A1)