WO2020151323A1 - Data storage method, device, and medium based on data fragmentation - Google Patents

Data storage method, device, and medium based on data fragmentation

Info

Publication number
WO2020151323A1
WO2020151323A1 PCT/CN2019/117869 CN2019117869W WO2020151323A1 WO 2020151323 A1 WO2020151323 A1 WO 2020151323A1 CN 2019117869 W CN2019117869 W CN 2019117869W WO 2020151323 A1 WO2020151323 A1 WO 2020151323A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
storage
target data
fragments
fragmentation
Prior art date
Application number
PCT/CN2019/117869
Other languages
English (en)
French (fr)
Inventor
梁劲峰
郑映锋
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020151323A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers

Definitions

  • This application relates to the field of data storage technology, and in particular to a data storage method, device, and medium based on data fragmentation.
  • the embodiments of the present application provide a data storage method, device and medium based on data fragmentation, which help reduce the cost of data storage and maintenance.
  • an embodiment of the present application provides a data storage method based on data sharding, which is applied to a pre-deployed distributed storage system, the distributed storage system includes at least two storage devices, and the method includes:
  • the characteristic information including any one or more of the following information: the data label of the target data, the importance level of the target data, the storage cost of the target data, and the size of the target data;
  • the target data is fragmented using erasure coding technology to obtain at least two data fragments corresponding to the target data, and the at least two data fragments include n original data fragments corresponding to the target data.
  • an embodiment of the present application provides a data processing device, which includes a unit for executing the method of the first aspect.
  • embodiments of the present application provide another data processing device, including a processor and a memory that are connected to each other, wherein the memory is used to store a computer program that supports the data processing device in executing the above method, the computer program includes program instructions, and the processor is configured to invoke the program instructions to execute the method of the first aspect described above.
  • the data processing device may also include a user interface and/or a communication interface.
  • an embodiment of the present application provides a computer non-volatile readable storage medium that stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to execute the method of the first aspect.
  • the implementation of the embodiments of the present application does not require multiple disaster recovery, which avoids data storage redundancy, helps reduce the cost of data storage and maintenance, and improves the security of data storage.
  • FIG. 1 is a schematic flowchart of a data storage method based on data sharding according to an embodiment of the present application
  • FIG. 2 is a schematic flowchart of another data storage method based on data fragmentation provided by an embodiment of the present application
  • FIG. 3 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of another data processing device provided by an embodiment of the present application.
  • the technical solution of this application can be applied to a data processing device.
  • the data processing device can be a server, a storage device, a terminal, or another processing device that is used to process data, including performing fragmentation processing and determining a storage strategy for data such as data fragments.
  • the storage strategy may indicate the storage location of data such as each data fragment in a pre-deployed distributed storage system.
  • the distributed storage system includes at least two storage devices.
  • the storage strategy may specifically indicate the storage location of each data fragment in the at least two storage devices.
  • the storage device involved in this application can be a server, memory or other storage device (or device), and the terminal can be a mobile phone, computer, tablet, personal computer, smart watch, etc., which is not limited by this application.
  • the distributed storage system may be any distributed system such as a P2P distributed storage system, a system composed of a P2P distributed storage system and a central storage system, or another storage system, which is not limited in this application.
  • the data processing device may be a device in the distributed storage system, such as a storage device (storage center) in the central storage system; or it may be a storage device in a P2P distributed storage system; or it may be an independent device (different from the storage devices used to store data in the system), and so on, not listed here.
  • the P2P distributed storage system is an open network that allows different users to provide storage on this network, thereby reducing costs.
  • data can be fragmented by using erasure coding technology to obtain multiple data fragments (fragmented data) corresponding to the data, including original data fragments and redundant data fragments, etc.
  • the storage strategy for the multiple data shards can be determined according to the characteristic information of the data, so that the multiple data shards are stored in the storage device of the distributed storage system according to the storage strategy, without the need for multiple disaster recovery. That is, there is no need to perform a complete backup of the same data in multiple locations, which helps to reduce the cost of data storage and maintenance and avoid data storage redundancy.
  • the following description takes, as an example, a distributed storage system composed of a P2P distributed storage system and a central storage system.
  • the erasure coding (Erasure Coding) technology mainly uses the erasure coding algorithm to encode the original data to obtain redundancy, and store the data and the redundancy together to achieve the purpose of fault tolerance.
  • the basic idea is to obtain m redundant elements (that is, m redundant data fragments) through certain calculations on n original data elements (that is, n original data fragments).
  • the original data fragment may also be called a data block or other names, and the redundant data fragment may also be called a check block or other names, which is not limited in this application.
  • the process of obtaining the m redundant data fragments can be called encoding, and the process of recovering erroneous or lost data blocks can be called decoding.
  • the data storage method based on data fragmentation enhances the fault tolerance performance of the system and reduces the system storage overhead.
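  • as a minimal sketch of the encode and decode steps described above, the following uses a single XOR parity fragment, which is the simplest erasure code (m = 1). This choice is an assumption for illustration only; the application does not name a specific erasure coding algorithm, and the function names `encode` and `decode` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def split_into_fragments(data: bytes, n: int) -> List[bytes]:
    """Split data into n equal-length original fragments (zero-padded)."""
    size = -(-len(data) // n)  # ceiling division
    padded = data.ljust(size * n, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(n)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, n: int) -> List[bytes]:
    """Encoding: return n original fragments followed by one parity fragment."""
    originals = split_into_fragments(data, n)
    parity = reduce(xor_bytes, originals)
    return originals + [parity]


def decode(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Decoding: rebuild the data when at most one fragment (None) is lost."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("a single parity fragment can repair only one loss")
    repaired = list(fragments)
    if missing:
        present = [f for f in fragments if f is not None]
        # XOR of all surviving fragments equals the lost fragment.
        repaired[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity fragment and the zero padding.
    return b"".join(repaired[:-1])[:length]
```

  • with n = 4, losing any one of the five stored fragments still allows the target data to be rebuilt; tolerating more losses would require a code with m > 1, such as Reed-Solomon.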
  • FIG. 1 is a schematic flowchart of a data storage method based on data slicing according to an embodiment of the present application. Specifically, the method of this embodiment can be applied to the aforementioned data processing device. As shown in Figure 1, the data storage method based on data fragmentation may include the following steps:
  • the target data is the data to be stored.
  • the target data may be carried in a data storage request or other request sent from a terminal or other device, and the data processing device may obtain the target data by receiving the request carrying the target data; or
  • the target data may be determined when a storage instruction for certain data is detected, in which case the data indicated by the storage instruction is the target data; or the target data may be data in a specific database, for example the data processing device regards the data in a specific database or queue to be stored as the target data, and so on, not listed here.
  • the data processing device can obtain characteristic information of the target data, and the characteristic information can be used to characterize the characteristics of the target data.
  • the characteristic information of the target data may include any one or more of the following information: the data label of the target data, the importance level (priority) of the target data, the storage cost of the target data, the size (data volume) of the target data, and so on.
  • for example, the feature information can be carried in the above request; as another example, feature information such as the data label and importance level can be determined based on the source of the target data, with the correspondence between data sources and feature information set in advance; as a further example, feature information such as the size of the target data can be detected in real time.
  • the method for acquiring the feature information is not limited in this application.
  • the at least two data fragments may include n original data fragments and m redundant data fragments corresponding to the target data, and both n and m are integers greater than 0.
  • the n original data fragments together make up the target data. That is, the data processing device may obtain multiple data fragments by fragmenting the target data, so as to store the target data based on the multiple data fragments.
  • the execution order of step 101 and step 102 is not limited in this application: step 102 may be executed first and then step 101, or step 101 and step 102 may be executed simultaneously.
  • the storage policy may indicate the storage location of each of the at least two data fragments in the at least two storage devices, such as which of the at least two storage devices each data fragment is stored in; it may further indicate which memory within that storage device holds the fragment, if multiple memories are deployed in the storage device.
  • the storage devices in which the data fragments are stored may be the same or different, which is not limited in this application.
  • multiple storage policies may be preset, and multiple sets of data characteristic information may be stored in association with the multiple storage policies, so that the storage strategy is determined based on the characteristic information of the data; alternatively, storage information of the storage devices may be stored in association with the multiple storage policies, so that the storage strategy, and thus the storage location of each data fragment in each storage device, is determined according to the storage information; alternatively, multiple sets of data feature information, storage information of the storage devices, and the multiple storage strategies can be stored in association, so that the storage strategy is determined according to both the characteristic information of the data and the storage information of the storage devices, the storage location of each data fragment in each storage device is determined, and the corresponding data fragment is then stored in the corresponding storage location.
  • the data feature information (or storage information) and the storage strategy may have a one-to-one correspondence or a one-to-many correspondence.
  • storage strategy 1: store the n original data fragments in the storage center of a central storage system, and store the m redundant data fragments in a P2P distributed storage network;
  • storage strategy 2: store all data fragments in the storage devices of the P2P distributed storage network;
  • storage strategy 3: determine the storage location of each data fragment according to the load of each storage device in the P2P distributed storage network (for example, a lightly loaded storage device can store more data fragments than a heavily loaded one; as another example, the storage devices whose load is less than a threshold are filtered out from the storage devices to store the data fragments); and so on, not enumerated one by one here.
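  • the three example strategies above can be sketched as a placement function. This is only an illustrative sketch: the device names, the load scale (0.0 to 1.0), the default threshold, and the round-robin assignment are all assumptions, not details given in the application.

```python
from typing import Dict


def place_fragments(strategy: int, n: int, m: int,
                    p2p_load: Dict[str, float],
                    load_threshold: float = 0.8) -> Dict[int, str]:
    """Map fragment index to a device name.

    Indexes 0..n-1 are original fragments, n..n+m-1 are redundant fragments.
    p2p_load maps each (hypothetical) P2P device name to its current load.
    """
    if not p2p_load:
        raise ValueError("at least one P2P storage device is required")
    peers = sorted(p2p_load)
    if strategy == 1:
        # Originals in the storage center, redundant fragments on P2P devices.
        placement = {i: "storage-center" for i in range(n)}
        for j in range(m):
            placement[n + j] = peers[j % len(peers)]
        return placement
    if strategy == 2:
        # Every fragment goes to the P2P network, spread round-robin.
        return {i: peers[i % len(peers)] for i in range(n + m)}
    if strategy == 3:
        # Only devices whose load is below the threshold receive fragments.
        light = [p for p in peers if p2p_load[p] < load_threshold]
        if not light:
            raise ValueError("no P2P device is below the load threshold")
        return {i: light[i % len(light)] for i in range(n + m)}
    raise ValueError(f"unknown storage strategy: {strategy}")
```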
  • the data processing device can determine whether the data feature information matches the storage information of each storage device, and determine the matching storage devices from among the storage devices to store each data fragment; that is, it generates a storage strategy in real time. The storage strategy can indicate information, such as the identification, of the storage device of each data fragment. For example, according to the size of the target data, the storage devices whose remaining storage space is greater than the size of the target data are determined, and each data fragment is stored in the determined storage devices.
  • when the data processing device determines the storage strategy for the at least two data fragments, it can determine, according to the preset correspondence between data characteristic information and storage strategies, the storage strategy corresponding to the characteristic information of the target data, and use the determined storage strategy as the storage strategy for the at least two data fragments.
  • the storage locations in the at least two storage devices indicated by the storage strategies corresponding to different data feature information are different (partially or completely).
  • the data processing device can determine the data storage strategy according to different requirements for data storage reliability and/or read performance. For example, it can obtain the reliability and/or readability requirement information of the target data to be stored and determine the label (or importance level) of the target data according to those requirements. If the reliability requirement is high (the reliability parameter is greater than the preset threshold, or the reliability information indicates a high requirement, or the reliability information includes target keywords), the label of the target data is determined to be a high-reliability label (or the importance level is high); if the reliability requirement is low (the reliability parameter is not greater than the preset threshold, or the reliability information indicates a low requirement, or the reliability information does not include the target keywords), the label of the target data is determined to be a low-reliability label (or the importance level is low).
  • the storage strategy corresponding to the tag of the target data can be determined according to the pre-stored correspondence between each data tag (or importance level) and the storage strategy, so as to store the data fragments according to the determined storage strategy. For example, assuming that the reliability and readability requirements of the target data are high, the corresponding tag is tag 1, and the storage strategy corresponding to tag 1 is the aforementioned strategy 1, then the n original data fragments of the target data can be stored in the storage center, and the m redundant data fragments can be stored in the P2P distributed storage network.
  • as another example, if the corresponding label is label 2 and the storage strategy corresponding to label 2 is the above-mentioned strategy 2, all data fragments of the target data can be stored in the storage devices of the P2P distributed storage network.
  • the reliability and/or accessibility requirements may correspond to the storage cost of the data to be stored.
  • the higher the storage cost of the data the higher the reliability and/or accessibility requirements of the data.
  • the label (or importance level) of the target data can be determined according to the cost interval in which the storage cost of the target data falls, and then the storage strategy corresponding to that label (or importance level) can be determined according to the pre-stored correspondence between each data label (or importance level) and the storage strategy.
  • alternatively, a correspondence between storage cost intervals and storage strategies may be set in advance, and the data processing device may determine the storage strategy directly from the cost interval in which the storage cost of the target data falls.
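  • the lookups described above (reliability requirement to label to strategy, and cost interval directly to strategy) can be sketched as follows. All concrete values are hypothetical: the threshold 0.9, the label names, and the single cost boundary of 100.0 are assumptions chosen only to make the sketch runnable.

```python
from bisect import bisect_right

RELIABILITY_THRESHOLD = 0.9  # hypothetical preset threshold

# Pre-stored correspondence between data labels and storage strategies.
LABEL_TO_STRATEGY = {"high-reliability": 1, "low-reliability": 2}

# Hypothetical cost intervals: [0, 100) and [100, infinity).
COST_BOUNDS = [100.0]
COST_INTERVAL_TO_STRATEGY = [2, 1]  # higher cost implies higher reliability


def label_for(reliability: float) -> str:
    """High label only when the parameter is greater than the threshold."""
    if reliability > RELIABILITY_THRESHOLD:
        return "high-reliability"
    return "low-reliability"


def strategy_for_reliability(reliability: float) -> int:
    """Two-step lookup: reliability parameter, then label, then strategy."""
    return LABEL_TO_STRATEGY[label_for(reliability)]


def strategy_for_cost(cost: float) -> int:
    """Direct lookup from the cost interval to the storage strategy."""
    return COST_INTERVAL_TO_STRATEGY[bisect_right(COST_BOUNDS, cost)]
```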
  • the data processing device may also obtain storage information of each of the at least two storage devices. Further, when determining the storage strategy for the at least two data fragments, the data processing device may determine the storage strategy according to the characteristic information of the target data and the storage information of each of the at least two storage devices.
  • for example, the data processing device can determine the storage strategy for the at least two data fragments according to the pre-stored correspondence among data feature information, storage information of the storage devices, and storage strategies; alternatively, it can first determine an initial storage strategy for the at least two data fragments according to the characteristic information of the target data, and then determine the final storage strategy according to the storage information of the storage devices, so as to store the at least two data fragments based on the final storage strategy.
  • the method of determining the initial storage strategy is similar to the above, and will not be repeated here.
  • the storage information may include any one or more of the following information: remaining storage space, used storage space (load), deployment location, security level, and so on.
  • for example, the data processing device determines, according to the characteristic information of the target data, the following initial storage strategy for the at least two data fragments: store the n original data fragments in the storage center of the central storage system, and store the m redundant data fragments in the P2P distributed storage network. Further, the data processing device can determine the storage locations of the m redundant data fragments according to the storage information of the storage devices in the P2P distributed storage network, to obtain the final storage strategy. For example, the storage devices whose remaining storage space is greater than a preset space threshold are filtered out from the storage devices in the P2P distributed storage network to store the m redundant data fragments. As another example, the storage devices whose used storage space is less than a threshold are filtered out from the storage devices in the P2P distributed storage network to store the m redundant data fragments.
  • as a further example, the storage devices in the P2P distributed storage network are sorted by the distance between their deployment location and the storage center, from near to far, and the top L storage devices are selected to store the m redundant data fragments, and so on. Here, L is an integer greater than zero.
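  • a minimal sketch of refining the initial strategy into a final one with storage information is given below. The `Peer` record, its field names and units, and the round-robin assignment are assumptions made for illustration; the application only states which kinds of storage information may be used.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Peer:
    """Hypothetical storage information of one P2P storage device."""
    name: str
    free_bytes: int      # remaining storage space
    used_bytes: int      # used storage space (load)
    distance_km: float   # distance from deployment location to storage center


def with_free_space(peers: List[Peer], min_free_bytes: int) -> List[Peer]:
    """Keep devices whose remaining space exceeds the preset threshold."""
    return [p for p in peers if p.free_bytes > min_free_bytes]


def with_low_usage(peers: List[Peer], max_used_bytes: int) -> List[Peer]:
    """Keep devices whose used space is below the threshold."""
    return [p for p in peers if p.used_bytes < max_used_bytes]


def nearest(peers: List[Peer], top_l: int) -> List[Peer]:
    """Sort by distance to the storage center, near to far, keep the top L."""
    return sorted(peers, key=lambda p: p.distance_km)[:top_l]


def assign_redundant(m: int, candidates: List[Peer]) -> Dict[int, str]:
    """Spread the m redundant fragments over the candidates, round-robin."""
    if not candidates:
        raise ValueError("no candidate storage device satisfies the filter")
    return {j: candidates[j % len(candidates)].name for j in range(m)}
```

  • the filters can be chained, for example `assign_redundant(m, nearest(with_free_space(peers, threshold), L))`, and a single device may end up holding several fragments.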
  • One storage device can store one data slice corresponding to the target data, and can also store multiple data slices corresponding to the target data.
  • for example, the storage information includes a security level, and the security level is high, medium, or low. If the data processing device determines that the target data label is a high-reliability label according to the storage reliability requirements of the target data, or determines this by other means, it can select, based on storage information such as the security level of each storage device in the system, the storage devices with a high security level, and store each data fragment in those storage devices, thereby improving the security and reliability of data storage.
  • the data processing device can also record the storage location of each data fragment, for example by binding the information of each data fragment to the identifier of the storage device where the data fragment is located (or to the identifier of the memory within that device), such as by generating a fragment storage node list that records the identifier of each data fragment and the identifier of its storage device, so that erroneous or lost data can be recovered in time if data errors or losses occur later.
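  • the fragment storage node list can be pictured as a mapping from fragment identifier to storage device identifier. The class below is only an illustrative sketch; its name, methods, and identifier formats are hypothetical.

```python
from typing import Dict, List


class FragmentNodeList:
    """Records which storage device holds each data fragment."""

    def __init__(self) -> None:
        self._device_of: Dict[str, str] = {}

    def record(self, fragment_id: str, device_id: str) -> None:
        """Bind a fragment identifier to its storage device identifier."""
        self._device_of[fragment_id] = device_id

    def device_of(self, fragment_id: str) -> str:
        """Look up where a fragment is stored, e.g. before reading it."""
        return self._device_of[fragment_id]

    def fragments_on(self, device_id: str) -> List[str]:
        """List the fragments affected if the given device fails."""
        return sorted(f for f, d in self._device_of.items() if d == device_id)
```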
  • the data processing device may also encrypt the fragmented storage node list to further improve the security of data storage and prevent the storage location of the data fragments from being stolen by illegal elements.
  • the data processing device may also encrypt only specific data, such as data with high reliability and/or read-performance requirements (for example, data with specific tags, data with high importance levels, or data whose storage cost is higher than a preset cost value). In that case, each data fragment corresponding to the target data is encrypted only when it is determined that the tag of the target data is a specific tag (or the importance level of the target data is high, or the storage cost of the target data is higher than the preset cost value), which reduces system overhead.
  • the encryption method of each data fragment corresponding to the target data can be the same, to save the overhead of storing the encryption methods; or the encryption methods of the data fragments can differ, for example the encryption method of the n original data fragments differs from that of the m redundant data fragments, to further improve storage security.
  • the data processing device can fragment the target data to be stored by using erasure coding technology to obtain at least two data fragments corresponding to the target data, obtain characteristic information of the target data, and determine a storage strategy for the obtained at least two data fragments according to that characteristic information, so as to store the at least two data fragments in at least two storage devices included in the distributed storage system according to the storage strategy. This avoids the data storage redundancy caused by fully backing up the same data in multiple locations, which helps reduce data storage and maintenance costs.
  • FIG. 2 is a schematic flowchart of another data storage method based on data slicing according to an embodiment of the present application. Specifically, as shown in FIG. 2, the data storage method based on data fragmentation may include the following steps:
  • the characteristic information of the target data may include any one or more of the following information: the data label of the target data, the importance level of the target data, the storage cost of the target data, the size of the target data, etc., which will not be detailed here.
  • the data processing device may also determine a fragmentation ratio for fragmentation processing of the target data, and the fragmentation ratio is used to indicate the ratio between original data fragments and redundant data fragments, that is, the ratio of n to m mentioned above.
  • the fragmentation ratios used for fragmenting different data can be the same or different.
  • the fragmentation ratio may also be the ratio between the redundant data fragments and the original data fragments (that is, the ratio of m to n mentioned above), or the ratio between the original data fragments and the total data fragments (that is, the ratio of n to (n+m)), or the ratio between the redundant data fragments and the total data fragments (that is, the ratio of m to (n+m)), etc., which are not listed one by one here.
  • the fragmentation ratio may specifically indicate the ratio value, or may also indicate the specific value of the data fragmentation, such as the aforementioned values of n and m, thereby helping to achieve rapid fragmentation and improving the efficiency of data fragmentation processing.
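  • the four equivalent ways of expressing the fragmentation ratio, and the recovery of n and m from an n/m ratio, can be sketched with exact fractions as below. The function names are hypothetical, and recovering n and m assumes the total number of fragments is also known, which the application does not state.

```python
from fractions import Fraction
from typing import Dict, Tuple


def ratio_forms(n: int, m: int) -> Dict[str, Fraction]:
    """Express the same fragmentation setting in the four listed forms."""
    total = n + m
    return {
        "n/m": Fraction(n, m),
        "m/n": Fraction(m, n),
        "n/(n+m)": Fraction(n, total),
        "m/(n+m)": Fraction(m, total),
    }


def counts_from_ratio(n_over_m: Fraction, total: int) -> Tuple[int, int]:
    """Recover (n, m) from the ratio n/m and the total fragment count."""
    n = n_over_m * total / (n_over_m + 1)
    if n.denominator != 1 or not 0 < n < total:
        raise ValueError("ratio and total do not yield positive integers")
    return int(n), total - int(n)
```

  • indicating n and m directly, as the preceding paragraph notes, avoids this conversion step altogether.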
  • the fragmentation ratio of the target data may be determined according to the scale of the distributed storage system, according to the scale of the P2P distributed storage system, according to the characteristic information of the target data, or according to both the system scale and the characteristic information of the target data, etc., which is not limited in this application.
  • multiple fragment ratios and multiple sets of storage system scale information can be preset, and the corresponding relationship between each storage system scale information and fragment ratio can be set and obtained.
  • the data processing device can obtain the scale information of the distributed storage system, determine, according to the preset correspondence between storage system scale information and fragmentation ratios, the fragmentation ratio corresponding to the scale information of the distributed storage system, and use that fragmentation ratio as the fragmentation ratio for fragmentation processing of the target data.
  • the scale information may include the number corresponding to the at least two storage devices and/or the number of storage devices in the P2P distributed storage network, and so on.
  • multiple fragmentation ratios and multiple data importance levels (or data labels, data storage costs, or data sizes) can be preset, together with the correspondence between each data importance level (or data label, data storage cost, or data size) and a fragmentation ratio.
  • the characteristic information of the target data may include the importance level of the target data (or data label or data storage cost or data size).
  • before the target data is fragmented using erasure coding technology, the data processing device can also determine, according to the preset correspondence between the data importance level (or data label, data storage cost, or data size) and the fragmentation ratio, the fragmentation ratio corresponding to the importance level (or data label, data storage cost, or data size) of the target data, and use that fragmentation ratio as the fragmentation ratio for fragmentation processing of the target data.
  • the data processing device may also combine any two or more of storage system scale information, data importance level, data label, data storage cost, and data size to determine the fragmentation ratio of the target data; the correspondence between these parameters and the fragmentation ratio can be preset, which will not be repeated here. The fragmentation ratio corresponding to the target data can therefore be determined quickly according to the foregoing correspondence, and the target data can be fragmented according to the determined ratio, which helps improve the efficiency of data fragmentation processing.
  • the data processing device may also determine a fragmentation ratio for fragmentation processing of the target data, so as to fragment the target data according to the fragmentation ratio. The larger the number m of redundant fragments, the higher the efficiency of data recovery when data is damaged or lost, but also the larger the storage space occupied; therefore, a trade-off between the two must be made according to the actual situation. For example, taking a fragmentation ratio of n/m as described above (or directly determining the values of n and m), the fragmentation ratio may be determined according to the scale of the system: the larger the system scale, the larger the value of m and the smaller the fragmentation ratio.
  • as another example, the fragmentation ratio can be determined according to the priority of the target data: the higher the priority of the target data, the larger m and the smaller the fragmentation ratio. Thus, the flexibility and reliability of data fragmentation processing can be improved.
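  • the storage side of this trade-off can be made concrete with a small calculation. The sketch below compares the storage overhead factor (n+m)/n of fragmented storage with that of keeping several full copies. The loss-tolerance function assumes a maximum-distance-separable code such as Reed-Solomon, which can rebuild the data from any n of the n+m fragments; the application does not specify the code, so this is an assumption.

```python
def storage_overhead(n: int, m: int) -> float:
    """Stored bytes per byte of target data with n original + m redundant."""
    return (n + m) / n


def replication_overhead(copies: int) -> float:
    """Stored bytes per byte of target data with full copies."""
    return float(copies)


def max_lost_fragments(m: int) -> int:
    """Fragments that may be lost, assuming an MDS code (an assumption)."""
    return m
```

  • for instance, n = 4 and m = 2 stores 1.5 bytes per byte of target data, while three full copies store 3.0 bytes per byte; both survive the loss of two fragments or copies under the stated assumption.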
  • after the data processing device determines the fragmentation ratio of the target data, it can fragment the target data according to the fragmentation ratio to obtain at least two data fragments, which improves the flexibility and reliability of data fragmentation processing and also helps improve its efficiency.
  • the at least two data fragments may include n original data fragments and m redundant data fragments corresponding to the target data, and both n and m are integers greater than 0.
  • the storage policy indicates the storage location of each of the at least two data fragments in the at least two storage devices.
  • for steps 201 and 203-204, please refer to the relevant description of steps 101-103 in the embodiment shown in FIG. 1, which will not be repeated here.
  • the normal state may refer to a state in which the data fragment is neither erroneous nor lost, and/or a state in which the data fragment can be read;
  • the abnormal state may refer to a state in which the data fragment is erroneous or lost, and/or a state in which the data fragment cannot be read.
  • the period can be preset, and the periods for detecting the storage state of data slices can be set to be the same or different.
  • multiple periods can be preset and stored in association with multiple storage scenarios, with data feature information, or with storage system scale information; that is, the correspondence between the period and the storage scenario, between the period and the data feature information, or between the period and the storage system scale information is preset, and so on.
  • the data processing device can determine the current storage scenario (for example, according to the current mode of the device, or according to a received scenario confirmation instruction), or determine the characteristic information of the target data, or determine the scale information of the distributed storage system, then determine the detection period according to the corresponding correspondence, and detect the storage status of each data fragment according to the determined period. This improves the reliability of status detection.
  • the data processing device may also add detections of the storage state within a period based on a preset trigger condition, which can be set in advance.
  • for example, if the preset trigger conditions include detecting a data read failure and receiving a detection instruction, the data processing device can trigger detection of the storage state of each of the at least two data fragments when it detects a failure to read data from a storage device or receives a detection instruction input by the user, which helps further improve the timeliness and reliability of state detection.
  • a data fragment that has been detected to be in an abnormal state may be skipped in subsequent detection operations, to save detection overhead.
  • when the data processing device detects the storage state of each data fragment according to a preset period, it may detect the storage state of each of the at least two data fragments according to a preset first period; when it detects that the number of data fragments in an abnormal state exceeds (reaches) a preset second number threshold, it detects the storage state of the data fragments in the normal state among the at least two data fragments according to a preset second period.
  • the time interval corresponding to the second period is less than the time interval corresponding to the first period
  • the second number threshold is less than the first number threshold.
  • the present application can dynamically adjust the detection period, shortening it as the number of abnormal data fragments approaches the first number threshold that triggers data reconstruction, which further improves the timeliness of detecting data fragments in an abnormal state and thereby helps improve the efficiency of data reconstruction.
  • the data processing device may also detect the storage state (liveness) of the data fragments according to a preset period, and reconstruct the erroneous data fragments when the number of data fragments in the normal state falls below a preset third number threshold.
  • the above-mentioned first number threshold and second number threshold can be set to be less than or equal to m
  • the third number threshold can be set to be greater than or equal to n to improve the reliability of data reconstruction.
  • the data processing device can determine the storage locations of the data fragments, for example from the above-mentioned binding relationship or the fragment storage node list, so as to obtain the data fragments in the normal state (live fragments) from those storage locations and reconstruct the data.
  • the data can be recovered from the network only when the storage locations of the data fragments are known, which improves the confidentiality of the data and further improves the security of data storage.
  • the reconstructed data slice may be re-stored in the corresponding position (the same position as before reconstruction).
  • the data processing device can determine a new storage location for the reconstructed data fragment, such as the storage device with the lowest current load, the storage device with the largest remaining storage space, or the storage device with the highest security level, and so on, which are not listed one by one here.
  • the data processing device may re-determine the storage location of each data fragment of the target data, for example by determining a new storage strategy and storing each data fragment at the storage location indicated by the new storage strategy.
  • the new storage location can be re-recorded, such as updating the binding relationship or updating the shard storage node list, etc., to further improve data storage security.
  • the present application can also be combined with blockchain technology to provide convenient and trusted payment for the system.
  • for example, a terminal that needs to store data, such as the terminal corresponding to the target data, can send a transaction request carrying information such as the target data and its storage cost to a blockchain node, and the transaction request is recorded on the blockchain.
  • the node sends the transaction request to the data processing device or to the transaction system where the data processing device is located, which processes the transaction request to obtain a transaction result and records the transaction result on the blockchain. This reduces transaction costs and risks and improves transaction efficiency and security.
  • on the basis of a central storage system, this solution introduces a P2P distributed storage system as an auxiliary; that is, it combines the P2P distributed storage system with a traditional centralized storage system and uses erasure coding technology to fragment data and store it in a distributed manner, so that a secure and reliable storage solution is provided at a relatively low price.
  • compared with multi-site disaster recovery, the erasure-code-based data storage method has the advantages of low redundancy and high disk utilization.
  • FIG. 3 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • the data processing device in the embodiment of the present application includes a unit for executing the above-mentioned data storage method based on data fragmentation.
  • the data processing device of this embodiment may be set in a pre-deployed distributed storage system.
  • the distributed storage system may include at least two storage devices.
  • the data processing device 300 of this embodiment may include an acquiring unit 301 and a processing unit 302, where:
  • the acquiring unit 301 is configured to acquire feature information of target data to be stored, where the feature information includes any one or more of the following: the data label of the target data, the importance level of the target data, the storage cost of the target data, and the size of the target data;
  • the processing unit 302 is configured to perform fragmentation processing on the target data using erasure coding technology to obtain at least two data fragments corresponding to the target data, where the at least two data fragments include n original data fragments and m redundant data fragments corresponding to the target data, and n and m are both integers greater than 0;
  • the processing unit 302 is further configured to determine a storage strategy for the at least two data fragments according to the feature information of the target data, and store the at least two data fragments according to the storage strategy, where the storage strategy indicates the storage location of each of the at least two data fragments in the at least two storage devices.
  • the acquiring unit 301 is further configured to obtain scale information of the distributed storage system before the fragmentation processing of the target data using the erasure coding technology, where the scale information includes the number of the at least two storage devices;
  • the processing unit 302 is further configured to determine the fragmentation ratio corresponding to the scale information of the distributed storage system according to a preset correspondence between storage system scale information and fragmentation ratios, where the fragmentation ratio indicates the ratio between original data fragments and redundant data fragments;
  • the processing unit 302 may be specifically configured to use erasure coding technology and perform fragmentation processing on the target data according to the fragmentation ratio to obtain at least two data fragments corresponding to the target data.
  • the characteristic information of the target data includes the importance level of the target data
  • the processing unit 302 is further configured to determine, before the fragmentation processing of the target data using the erasure coding technology, the fragmentation ratio corresponding to the importance level of the target data according to a preset correspondence between data importance levels and fragmentation ratios
  • the fragmentation ratio indicates the ratio between original data fragments and redundant data fragments
  • the processing unit 302 may be specifically configured to use erasure coding technology and perform fragmentation processing on the target data according to the fragmentation ratio to obtain at least two data fragments corresponding to the target data.
  • the processing unit 302 may be specifically configured to determine the storage strategy corresponding to the feature information of the target data according to a preset correspondence between data feature information and storage strategies, and use the determined storage strategy as the storage strategy for the at least two data fragments;
  • the storage location of each data segment indicated by the storage strategy corresponding to different data feature information in the at least two storage devices is different.
  • the acquiring unit 301 may also be configured to obtain storage information of each of the at least two storage devices, where the storage information includes any one or more of the following: remaining storage space, used storage space, deployment location, and security level;
  • the processing unit 302 may be specifically configured to determine a storage strategy for the at least two data fragments according to the characteristic information of the target data and the storage information of each of the at least two storage devices.
  • the acquiring unit 301 may be further configured to, after the at least two data fragments are stored according to the storage strategy, detect the storage state of each of the at least two data fragments according to a preset period, where the storage state includes a normal state and an abnormal state;
  • the processing unit 302 may be further configured to, when it is detected that the number of data fragments in an abnormal state exceeds a preset first number threshold, reconstruct the data fragments in the abnormal state from the data fragments in the normal state among the at least two data fragments, and store the reconstructed data fragments.
  • the acquiring unit 301 may be specifically configured to detect the storage state of each of the at least two data fragments according to a preset first period, and, when it is detected that the number of data fragments in an abnormal state exceeds a preset second number threshold, detect the storage state of the data fragments in the normal state among the at least two data fragments according to a preset second period;
  • the time interval corresponding to the second period is less than the time interval corresponding to the first period, and the second number threshold is less than the first number threshold.
  • the data processing device can implement part or all of the steps in the data storage method based on data slicing in the embodiment shown in FIG. 1 to FIG. 2 through the foregoing unit.
  • the embodiments of the present application are device embodiments corresponding to the method embodiments, and the description of the method embodiments is also applicable to the embodiments of the present application.
  • FIG. 4 is a schematic structural diagram of another data processing device provided by an embodiment of the present application.
  • the data processing device is used to execute the above-mentioned method.
  • the data processing device 400 in this embodiment may include: one or more processors 401 and a memory 402.
  • the data processing device may further include one or more user interfaces 403 and/or one or more communication interfaces 404.
  • the above-mentioned processor 401, user interface 403, communication interface 404, and memory 402 may be connected through a bus 405, or may be connected in other ways, as illustrated in FIG. 4 by way of a bus.
  • the memory 402 is used to store a computer program, and the computer program includes program instructions, and the processor 401 is used to execute the program instructions stored in the memory 402.
  • the processor 401 may be configured to call the program instructions to perform the following steps: acquire feature information of the target data to be stored, where the feature information includes any one or more of the following: the data label of the target data, the importance level of the target data, the storage cost of the target data, and the size of the target data; perform fragmentation processing on the target data using erasure coding technology to obtain at least two data fragments corresponding to the target data, where the at least two data fragments include n original data fragments and m redundant data fragments corresponding to the target data, and both n and m are integers greater than 0; determine a storage strategy for the at least two data fragments according to the feature information of the target data, and store the at least two data fragments according to the storage strategy, where the storage strategy indicates the storage location of each of the at least two data fragments in at least two storage devices.
  • the at least two storage devices are storage devices in a pre-deployed distributed storage system.
  • the processor 401 may also call program instructions to execute the following steps: obtain scale information of the distributed storage system, where the scale information includes the number of the at least two storage devices; determine the fragmentation ratio corresponding to the scale information of the distributed storage system according to a preset correspondence between storage system scale information and fragmentation ratios, where the fragmentation ratio indicates the ratio between original data fragments and redundant data fragments;
  • when the processor 401 executes the fragmentation processing of the target data using the erasure coding technology to obtain at least two data fragments corresponding to the target data, it may specifically execute the following step: use erasure coding technology and fragment the target data according to the fragmentation ratio to obtain at least two data fragments corresponding to the target data.
  • the feature information of the target data includes the importance level of the target data; before executing the fragmentation processing of the target data using the erasure coding technology, the processor 401 may also call program instructions to perform the following step: determine the fragmentation ratio corresponding to the importance level of the target data according to a preset correspondence between data importance levels and fragmentation ratios, where the fragmentation ratio indicates the ratio between original data fragments and redundant data fragments;
  • in that case, the processor 401 may specifically execute the following step: use erasure coding technology and fragment the target data according to the fragmentation ratio to obtain at least two data fragments corresponding to the target data.
  • when the processor 401 executes the determination of the storage strategy for the at least two data fragments according to the feature information of the target data, it may specifically execute the following steps: determine the storage strategy corresponding to the feature information of the target data according to a preset correspondence between data feature information and storage strategies, and use the determined storage strategy as the storage strategy for the at least two data fragments;
  • the storage location of each data segment indicated by the storage strategy corresponding to different data feature information in the at least two storage devices is different.
  • the processor 401 may also call program instructions to perform the following step: obtain storage information of each of the at least two storage devices, where the storage information includes any one or more of the following: remaining storage space, used storage space, deployment location, and security level;
  • when the processor 401 executes the determination of the storage strategy for the at least two data fragments according to the feature information of the target data, it may specifically execute the following step: determine the storage strategy for the at least two data fragments according to the feature information of the target data and the storage information of each of the at least two storage devices.
  • the processor 401 may also call program instructions to execute the following steps: detect the storage state of each of the at least two data fragments according to a preset period, where the storage state includes a normal state and an abnormal state; when it is detected that the number of data fragments in an abnormal state exceeds a preset first number threshold, reconstruct the data fragments in the abnormal state from the data fragments in the normal state among the at least two data fragments, and store the reconstructed data fragments.
  • when the processor 401 executes the detection of the storage state of each of the at least two data fragments according to a preset period, it may specifically execute the following steps: detect the storage state of each of the at least two data fragments according to a preset first period; when it is detected that the number of data fragments in an abnormal state exceeds a preset second number threshold, detect the storage state of the data fragments in the normal state among the at least two data fragments according to a preset second period; where the time interval corresponding to the second period is less than the time interval corresponding to the first period, and the second number threshold is less than the first number threshold.
  • the processor 401 may be a central processing unit (Central Processing Unit, CPU), and the processor may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the user interface 403 may include an input device and an output device.
  • the input device may include a touch panel, a microphone, etc.
  • the output device may include a display (LCD, etc.), a speaker, and the like.
  • the communication interface 404 may include a receiver and a transmitter for communicating with other devices.
  • the memory 402 may include a read-only memory and a random access memory, and provides instructions and data to the processor 401.
  • a part of the memory 402 may also include a non-volatile random access memory.
  • the memory 402 may also store the aforementioned correspondence between data features and strategies, and so on.
  • the processor 401 and the other components described in the embodiment of the present application can execute the implementations described in the method embodiments shown in FIG. 1 to FIG. 2, and can also execute the implementations of the units described in FIG. 3 of the embodiment of the present application, which will not be repeated here.
  • the embodiments of the present application also provide a computer-readable storage medium that stores a computer program; when the computer program is executed by a processor, it can implement part or all of the steps of the data storage method based on data fragmentation described in the embodiments corresponding to FIG. 1 to FIG. 2, and can also implement the functions of the data processing device in the embodiment shown in FIG. 3 or FIG. 4 of the present application, which will not be repeated here.
  • the embodiments of the present application also provide a computer program product containing instructions, which when run on a computer, cause the computer to execute part or all of the steps in the above method.
  • the embodiment of the present application also provides a data storage system.
  • the data storage system may include the above-mentioned data processing device and a storage device in a distributed storage system.
  • the data processing device may be used to perform some or all of the steps in the above method, which will not be repeated here.
  • the computer-readable storage medium may be the internal storage unit of the data processing device described in any of the foregoing embodiments, such as the hard disk or memory of the data processing device.
  • the computer-readable storage medium may also be an external storage device of the data processing device, such as a plug-in hard disk equipped on the data processing device, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), etc.
  • the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships are possible; for example, A and/or B can mean: A exists alone, both A and B exist, or B exists alone.
  • the character "/" in this text generally indicates that the associated objects before and after are in an "or” relationship.
  • the size of the sequence numbers of the foregoing processes does not mean the order of execution. The execution sequence of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.

Abstract

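A data storage method, device, and medium based on data fragmentation, applied to the field of data storage technology. The method includes: acquiring feature information of target data to be stored (101); performing fragmentation processing on the target data using erasure coding technology to obtain at least two data fragments corresponding to the target data (102); and determining a storage strategy for the at least two data fragments according to the feature information of the target data, and storing the at least two data fragments according to the storage strategy (103). This method helps reduce the cost of data storage and maintenance.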

Description

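A data storage method, device, and medium based on data fragmentation
This application claims priority to Chinese Patent Application No. 201910070379.6, filed with the Chinese Patent Office on January 23, 2019 and entitled "A data storage method, device, and medium based on data fragmentation", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of data storage technology, and in particular to a data storage method, device, and medium based on data fragmentation.
Background
With the development of digital technology, the amount of data that needs to be stored has increased sharply. To ensure data security, traditional storage solutions generally adopt multi-site disaster recovery, making a complete backup of the same data at multiple sites. This leads to redundant data storage and increases the cost of data storage and maintenance.
Summary
The embodiments of the present application provide a data storage method, device, and medium based on data fragmentation, which help reduce the cost of data storage and maintenance.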
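In a first aspect, an embodiment of the present application provides a data storage method based on data fragmentation, applied to a pre-deployed distributed storage system that includes at least two storage devices, the method including:
acquiring feature information of target data to be stored, the feature information including any one or more of the following: a data label of the target data, an importance level of the target data, a storage cost of the target data, and a size of the target data;
performing fragmentation processing on the target data using erasure coding technology to obtain at least two data fragments corresponding to the target data, the at least two data fragments including n original data fragments and m redundant data fragments corresponding to the target data, where n and m are both integers greater than 0;
determining a storage strategy for the at least two data fragments according to the feature information of the target data, and storing the at least two data fragments according to the storage strategy, the storage strategy indicating the storage location of each of the at least two data fragments in the at least two storage devices.
In a second aspect, an embodiment of the present application provides a data processing device that includes units for executing the method of the first aspect.
In a third aspect, an embodiment of the present application provides another data processing device, including a processor and a memory that are connected to each other, where the memory is used to store a computer program that supports the data processing device in executing the above method, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method of the first aspect. Optionally, the data processing device may further include a user interface and/or a communication interface.
In a fourth aspect, an embodiment of the present application provides a non-volatile computer-readable storage medium storing a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to execute the method of the first aspect.
Implementing the embodiments of the present application requires no multi-site disaster recovery, which avoids redundant data storage, helps reduce the cost of data storage and maintenance, and improves the security of data storage.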
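Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are introduced below.
FIG. 1 is a schematic flowchart of a data storage method based on data fragmentation provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of another data storage method based on data fragmentation provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a data processing device provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of another data processing device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the drawings in the embodiments of the present application.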
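The technical solution of the present application can be applied to a data processing device, which may be a server, a storage device, a terminal, or another processing device, and which is used to process data, including performing fragmentation processing and determining a storage strategy for data such as data fragments. The storage strategy may indicate the storage location of data, such as each data fragment, in a pre-deployed distributed storage system. The distributed storage system includes at least two storage devices, and the storage strategy may specifically indicate the storage location of each data fragment in the at least two storage devices. A storage device in the present application may be a server, a memory, or another storage device (or apparatus), and a terminal may be a mobile phone, a computer, a tablet, a personal computer, a smart watch, and so on, which is not limited in the present application.
Optionally, the distributed storage system may be any distributed system such as a P2P distributed storage system, a system composed of a P2P distributed storage system and a centralized storage system, or another storage system, which is not limited in the present application. The data processing device may be a device in the distributed storage system, such as a storage device (storage center) in the centralized storage system; or a storage device in the P2P distributed storage system; or an independent device (distinct from the storage devices that store data in the system); and so on, which are not listed one by one here. The P2P distributed storage system is an open network that allows different users to provide storage on the network, thereby reducing cost.
The embodiments of the present application can perform fragmentation processing on data using erasure coding technology to obtain multiple data fragments (fragmented data) corresponding to the data, including original data fragments and redundant data fragments, and can determine a storage strategy for the multiple data fragments according to the feature information of the data, so that the data fragments are stored in the storage devices of the distributed storage system according to the storage strategy. No multi-site disaster recovery is needed, that is, the same data does not need to be completely backed up at multiple sites, which helps reduce the cost of data storage and maintenance and avoids redundant data storage. The following takes as an example a distributed storage system composed of a P2P distributed storage system and a centralized storage system, and describes the embodiments in detail.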
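Erasure coding technology uses an erasure coding algorithm to encode the original data to obtain redundancy, and stores the data together with the redundancy to achieve fault tolerance. The basic idea is to compute m redundant elements (that is, m redundant data fragments) from n original data elements (that is, n original data fragments). For these n+m data fragments, when any m fragments (or fewer than m, including original data and/or redundant data) are erroneous or lost, the n original data fragments can be recovered through the corresponding reconstruction algorithm, that is, the original data can be recovered. The original data fragments may also be called data blocks or other names, and the redundant data fragments may also be called parity blocks or other names, which is not limited in the present application. The process of obtaining the m redundant data fragments may be called encoding, and the process of recovering erroneous or lost data blocks may be called decoding. Storing data based on data fragmentation enhances the fault tolerance of the system and reduces its storage overhead.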
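The encoding and decoding idea can be illustrated with the simplest possible erasure code, a single XOR parity fragment (m = 1). This is only a minimal sketch under that assumption; the application does not name a specific coding algorithm, and practical systems typically use codes such as Reed-Solomon that tolerate more than one lost fragment. The function names `encode` and `reconstruct` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, n: int) -> List[bytes]:
    """Split data into n equal-size original fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // n)                  # ceiling division
    padded = data.ljust(frag_len * n, b"\0")       # zero-pad so all fragments are equal length
    originals = [padded[i * frag_len:(i + 1) * frag_len] for i in range(n)]
    parity = reduce(xor_bytes, originals)          # the single redundant fragment (m = 1)
    return originals + [parity]


def reconstruct(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (marked None) by XOR-ing the survivors."""
    missing = [i for i, frag in enumerate(fragments) if frag is None]
    if len(missing) > 1:
        raise ValueError("a single parity fragment tolerates only one lost fragment")
    restored = list(fragments)
    if missing:
        survivors = [frag for frag in fragments if frag is not None]
        restored[missing[0]] = reduce(xor_bytes, survivors)
    return restored
```

With n = 4, losing any one of the five fragments still allows the original bytes to be recovered by joining the first four restored fragments and trimming the padding.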
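Please refer to FIG. 1, which is a schematic flowchart of a data storage method based on data fragmentation provided by an embodiment of the present application. Specifically, the method of this embodiment can be applied to the above data processing device. As shown in FIG. 1, the data storage method based on data fragmentation may include the following steps:
101. Acquire feature information of target data to be stored.
The target data is the data to be stored. Optionally, the target data may be carried in a data storage request or another request sent by a terminal or another device, and the data processing device may obtain the target data by receiving the request that carries it; or the target data may be determined upon detecting a storage instruction for certain data, in which case the data indicated by the storage instruction is the target data; or the target data may be data in a specific database, for example, the data processing device may take data in a specific database or in a to-be-stored queue as the target data; and so on, which are not listed one by one here.
Further, after the target data is determined, the data processing device can acquire the feature information of the target data, which can be used to characterize the target data. Optionally, the feature information of the target data may include any one or more of the following: the data label of the target data, the importance level (priority) of the target data, the storage cost of the target data, the size (data volume) of the target data, and so on. For example, the feature information may be carried in the above request; as another example, feature information such as the data label or importance level may be determined based on the source of the target data, for which a correspondence between data sources and feature information may be preset; as yet another example, feature information such as the size of the target data may be detected in real time; and so on. The present application does not limit how the feature information is acquired.
102. Perform fragmentation processing on the target data using erasure coding technology to obtain at least two data fragments corresponding to the target data.
The at least two data fragments may include n original data fragments and m redundant data fragments corresponding to the target data, where n and m are both integers greater than 0. The n original data fragments together constitute the target data. In other words, the data processing device can fragment the target data into multiple data fragments and store the target data based on these data fragments.
Optionally, the execution order of step 101 and step 102 is not limited. For example, step 102 may be executed before step 101, or step 101 and step 102 may be executed simultaneously, which is not limited in the present application.
103. Determine a storage strategy for the at least two data fragments according to the feature information of the target data, and store the at least two data fragments according to the storage strategy.
The storage strategy may indicate the storage location of each of the at least two data fragments in the at least two storage devices, such as which of the at least two storage devices each data fragment is stored in, or may further indicate which memory (or storage apparatus, if multiple memories/storage apparatuses are deployed in the storage device) of which storage device it is stored in. The data fragments may be stored in the same storage device or in different storage devices, which is not limited in the present application.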
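Optionally, in some embodiments, multiple storage strategies (rules) may be preset, and multiple sets of data feature information may be stored in association with the multiple storage strategies, so that a storage strategy is determined according to the feature information of the data in order to determine the storage locations of the n original data fragments and m redundant data fragments in the storage devices; or the storage information of the storage devices may be stored in association with the multiple storage strategies, so that a storage strategy is determined according to the storage information of the storage devices in order to determine the storage location of each data fragment; or multiple sets of data feature information, the storage information of the storage devices, and the multiple storage strategies may be stored in association, so that a storage strategy is determined according to both the feature information of the data and the storage information of the storage devices, and each data fragment is then stored at its corresponding storage location. The data feature information (or storage information) and the storage strategies may be in a one-to-one or a one-to-many correspondence. For example, storage strategy 1: store the n original data fragments in the storage center of the centralized storage system and the m redundant data fragments in the P2P distributed storage network; storage strategy 2: store all data fragments in storage devices of the P2P distributed storage network; storage strategy 3: determine the storage location of each data fragment according to the load of each storage device in the P2P distributed storage network (for example, a lightly loaded storage device may store more data fragments than a heavily loaded one, or the storage devices whose load is below a threshold may be selected to store the data fragments); and so on, which are not listed one by one here. Alternatively, in some embodiments, the data processing device may determine whether the data feature information matches the storage information of the storage devices and select the matching storage devices to store the data fragments, that is, generate the storage strategy in real time. The storage strategy can indicate information, such as an identifier, of the storage device for each data fragment. For example, according to the size of the target data, the storage devices whose remaining storage space is larger than the size of the target data are determined, and the data fragments are stored in the determined storage devices.
In a possible implementation, when determining the storage strategy for the at least two data fragments, the data processing device may determine the storage strategy corresponding to the feature information of the target data according to a preset correspondence between data feature information and storage strategies, and use the determined storage strategy as the storage strategy for the at least two data fragments. The storage strategies corresponding to different data feature information indicate storage locations in the at least two storage devices that differ (partially or completely).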
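For example, the data processing device may determine the storage strategy of the data according to different requirements on storage reliability and/or read performance. Specifically, it may obtain the reliability and/or read performance requirement information of the data to be stored, that is, the target data, and determine the label (or importance level) of the target data according to that requirement. When the reliability requirement is high (the reliability parameter is greater than a preset threshold, the reliability requirement is stated as high, or the reliability information includes a target keyword), the label of the target data is determined to be a high-reliability label (or the importance level is high); when the reliability requirement is low (the reliability parameter is not greater than the preset threshold, the reliability requirement is stated as low, or the reliability information does not include the target keyword), the label of the target data is determined to be a low-reliability label (or the importance level is low). The storage strategy corresponding to the label of the target data can then be determined according to a pre-stored correspondence between data labels (or importance levels) and storage strategies, so that the data fragments are stored according to the determined storage strategy. For instance, suppose the reliability and read performance requirements for the target data are high, the corresponding label is label 1, and the storage strategy corresponding to label 1 is strategy 1 above; then the n original data fragments of the target data can be stored in the storage center, and the m redundant data fragments can be stored in the P2P distributed storage network. Suppose the reliability and read performance requirements for the target data are low, the corresponding label is label 2, and the storage strategy corresponding to label 2 is strategy 2 above; then all data fragments of the target data can be stored in the P2P distributed storage network. Data fragments can thus be stored flexibly according to different requirements on storage reliability and read performance.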
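The label-to-strategy lookup described above could be sketched as follows. This is an illustrative sketch only: the label strings, the tier names `storage_center` and `p2p_network`, the numeric reliability score, and the default threshold of 0.8 are all assumptions introduced here, not values given in the application.

```python
from typing import Dict, List, Tuple

# Hypothetical strategy table: label -> (tier for original fragments, tier for redundant fragments).
STRATEGY_BY_LABEL: Dict[str, Tuple[str, str]] = {
    "high_reliability": ("storage_center", "p2p_network"),  # storage strategy 1 in the text
    "low_reliability": ("p2p_network", "p2p_network"),      # storage strategy 2 in the text
}


def label_for(reliability: float, threshold: float = 0.8) -> str:
    """Map a reliability parameter to a data label by comparing it with a preset threshold."""
    return "high_reliability" if reliability > threshold else "low_reliability"


def plan_storage(label: str, n: int, m: int) -> List[str]:
    """Return the target tier for each of the n original and m redundant fragments, in order."""
    original_tier, redundant_tier = STRATEGY_BY_LABEL[label]
    return [original_tier] * n + [redundant_tier] * m
```

For example, `plan_storage(label_for(0.95), 3, 2)` places the three original fragments in the storage center and the two redundant fragments in the P2P network, whereas a score of 0.5 sends all five fragments to the P2P network.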
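As another example, the reliability and/or read performance requirement may correspond to the storage cost of the data to be stored: the higher the storage cost, the higher the reliability and/or read performance requirement. The label (or importance level) of the target data can therefore be determined according to the cost range in which its storage cost falls, and the storage strategy corresponding to that label (or importance level) can then be determined according to the pre-stored correspondence between data labels (or importance levels) and storage strategies. Alternatively, the data processing device may set a correspondence between storage cost ranges and storage strategies, and determine the storage strategy directly from the cost range in which the storage cost of the target data falls.
In a possible implementation, the data processing device may also obtain the storage information of each of the at least two storage devices. Further, when determining the storage strategy for the at least two data fragments, the data processing device may determine it according to the feature information of the target data and the storage information of each of the at least two storage devices. For example, the data processing device may determine the storage strategy according to a pre-stored correspondence among data feature information, storage information of storage devices, and storage strategies; as another example, the data processing device may determine an initial storage strategy for the at least two data fragments according to the feature information of the target data, and then determine a final storage strategy according to the storage information of the storage devices, so that the at least two data fragments are stored based on the final storage strategy. The initial storage strategy is determined in a manner similar to that described above, which is not repeated here. Optionally, the storage information may include any one or more of the following: remaining storage space, used storage space (load), deployment location, security level, and so on.
For example, the data processing device determines, according to the feature information of the target data, that the initial storage strategy for the at least two data fragments is: store the n original data fragments in the storage center of the centralized storage system, and store the m redundant data fragments in the P2P distributed storage network. Further, the data processing device may determine the storage locations of the m redundant data fragments according to the storage information of the storage devices in the P2P distributed storage network, so as to determine the final storage strategy. For example, it may select, from the storage devices in the P2P distributed storage network, the storage devices whose remaining storage space is greater than a preset space threshold to store the m redundant data fragments; or select the storage devices whose used storage space is less than a threshold; or sort the storage devices by the distance between their deployment location and the storage center from near to far and select the first L storage devices; and so on, where L is an integer greater than 0. One storage device may store one data fragment or multiple data fragments corresponding to the target data.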
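A minimal sketch of refining the initial strategy into a final placement is shown below. It combines two of the criteria listed above (a remaining-space threshold, then the nearest L devices), which is an assumption made for illustration since the text presents them as alternatives. The `Device` fields and the round-robin assignment are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Device:
    device_id: str
    free_bytes: int      # remaining storage space
    distance_km: float   # distance from the storage center


def select_devices(devices: List[Device], min_free_bytes: int, top_l: int) -> List[Device]:
    """Keep devices with enough remaining space, then take the L nearest to the storage center."""
    eligible = [d for d in devices if d.free_bytes > min_free_bytes]
    return sorted(eligible, key=lambda d: d.distance_km)[:top_l]


def assign_fragments(fragment_ids: List[str], devices: List[Device]) -> Dict[str, str]:
    """Assign fragments to the selected devices round-robin; one device may hold several fragments."""
    if not devices:
        raise ValueError("no eligible storage device")
    return {fid: devices[i % len(devices)].device_id for i, fid in enumerate(fragment_ids)}
```

Because the assignment wraps around, m redundant fragments can be placed even when fewer than m devices pass the filter, matching the statement that one storage device may store multiple fragments of the same target data.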
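As another example, suppose the storage information includes a security level, and the security levels are high, medium, and low. If the data processing device determines, according to the storage reliability requirement of the target data or in another way, that the label of the target data is a high-reliability label, it may determine, according to the storage information such as the security level of each storage device in the system, the storage devices whose security level is high, and store the data fragments in those storage devices, thereby improving the security and reliability of data storage.
After storing the data fragments obtained by the fragmentation processing, the data processing device may also record the storage location of each data fragment, for example by binding the information of each data fragment to the identifier of the storage device where the data fragment is located (and possibly also to the identifier of the memory/storage apparatus), such as by generating a fragment storage node list that records the identifier of each data fragment and the identifier of its storage device, so that erroneous or lost data can be recovered promptly later. Optionally, the data processing device may also encrypt the fragment storage node list to further improve the security of data storage and prevent the storage locations of the data fragments from being stolen by unauthorized parties. Further optionally, the data processing device may encrypt only specific data, such as data with high reliability and/or read performance requirements (for example, data with a specific label, data whose importance level is high, or data whose storage cost is higher than a preset cost value); that is, when it determines that the label of the target data is a specific label (or that the importance level of the target data is high, or that the storage cost of the target data is higher than the preset cost value), it encrypts the data fragments corresponding to the target data, which reduces system overhead. The data fragments corresponding to the target data may be encrypted in the same way, to save the storage overhead of recording the encryption method; or they may be encrypted in different ways, for example the n original data fragments may be encrypted differently from the m redundant data fragments, to further improve storage security.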
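The fragment storage node list could, for instance, be kept as a small serialized record that maps fragment identifiers to device identifiers. The sketch below assumes a JSON layout with hypothetical field names (`data_id`, `fragment_id`, `device_id`); the application does not specify a format. Encryption of the list is deliberately left out, since the application does not name a cipher.

```python
import json
from typing import Dict


def build_node_list(data_id: str, placement: Dict[str, str]) -> str:
    """Serialize the binding between each fragment identifier and its storage device identifier."""
    entries = [{"fragment_id": f, "device_id": d} for f, d in sorted(placement.items())]
    return json.dumps({"data_id": data_id, "fragments": entries})


def locate_fragments(node_list_json: str) -> Dict[str, str]:
    """Read the node list back into a fragment -> device mapping, e.g. before reconstruction."""
    record = json.loads(node_list_json)
    return {e["fragment_id"]: e["device_id"] for e in record["fragments"]}
```

In a deployment that follows the optional encryption step, the serialized string would be encrypted before being persisted and decrypted before `locate_fragments` is called.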
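In this embodiment, the data processing device can perform fragmentation processing on the target data to be stored using erasure coding technology to obtain at least two data fragments corresponding to the target data, acquire the feature information of the target data, and determine a storage strategy for the at least two data fragments according to that feature information, so that the at least two data fragments are stored, according to the storage strategy, in the at least two storage devices included in the distributed storage system. This avoids the redundant data storage caused by completely backing up the same data at multiple sites, and helps reduce data storage and maintenance costs.
Please refer to FIG. 2, which is a schematic flowchart of another data storage method based on data fragmentation provided by an embodiment of the present application. Specifically, as shown in FIG. 2, the data storage method based on data fragmentation may include the following steps:
201. Acquire feature information of target data to be stored.
The feature information of the target data may include any one or more of the following: the data label of the target data, the importance level of the target data, the storage cost of the target data, the size of the target data, and so on, which is not repeated here.
202. Determine a fragmentation ratio for the target data.
Optionally, before performing fragmentation processing on the target data, the data processing device may also determine the fragmentation ratio for the fragmentation processing, where the fragmentation ratio indicates the ratio between original data fragments and redundant data fragments (that is, the ratio of n to m above). The fragmentation ratio may be the same or different for different data. In other embodiments, the fragmentation ratio may instead be the ratio between redundant data fragments and original data fragments (the ratio of m to n), the ratio between original data fragments and the total data fragments corresponding to the original data (the ratio of n to (n+m)), or the ratio between redundant data fragments and the total data fragments (the ratio of m to (n+m)), and so on, which are not listed one by one here. The fragmentation ratio may specifically indicate a ratio value, or may indicate the specific numbers of data fragments, such as the values of n and m above, which helps achieve fast fragmentation and improves the efficiency of fragmentation processing.
Further optionally, the fragmentation ratio for the target data may be determined according to the scale of the distributed storage system, according to the scale of the P2P distributed storage system, according to the feature information of the target data, or according to both the system scale and the feature information of the target data, and so on, which is not limited in the present application.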
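For example, in a possible implementation, multiple fragmentation ratios and multiple sets of storage system scale information may be preset, together with a correspondence between storage system scale information and fragmentation ratios. Before performing fragmentation processing on the target data using erasure coding technology, the data processing device may obtain the scale information of the distributed storage system, determine the fragmentation ratio corresponding to that scale information according to the preset correspondence, and use it as the fragmentation ratio for fragmenting the target data. The scale information may include the number of the at least two storage devices and/or the number of storage devices in the P2P distributed storage network, and so on.
As another example, in a possible implementation, multiple fragmentation ratios and multiple data importance levels (or data labels, data storage costs, or data sizes) may be preset, together with a correspondence between data importance levels (or data labels, data storage costs, or data sizes) and fragmentation ratios. Further, the feature information of the target data may include the importance level (or data label, data storage cost, or data size) of the target data. Before performing fragmentation processing on the target data using erasure coding technology, the data processing device may determine, according to the preset correspondence, the fragmentation ratio corresponding to the importance level (or data label, data storage cost, or data size) of the target data, and use it as the fragmentation ratio for fragmenting the target data.
As yet another example, in a possible implementation, the data processing device may determine the fragmentation ratio for the target data by combining any two or more of the storage system scale information, data importance level, data label, data storage cost, and data size; specifically, a correspondence between these parameters and fragmentation ratios may be preset, which is not repeated here. The fragmentation ratio corresponding to the target data can thus be determined quickly from the correspondence, and the target data can be fragmented according to the determined ratio, which helps improve the efficiency of fragmentation processing.
In other words, before performing fragmentation processing on the target data, the data processing device may determine the fragmentation ratio so that the target data is fragmented accordingly. The larger m is, the more efficient data recovery is when data is damaged or lost, but the more storage space is occupied, so a trade-off between the two must be made according to the actual situation. For example, taking the fragmentation ratio to be n/m as above (or directly determining the values of n and m), the fragmentation ratio may be determined according to the scale of the system: the larger the system, the larger m can be and the smaller the fragmentation ratio can be; for instance, the more storage devices there are in the system, the larger m can be and the smaller the fragmentation ratio. Alternatively, the fragmentation ratio may be determined from the priority of the target data: the higher the priority, the larger m can be and the smaller the fragmentation ratio can be. This improves the flexibility and reliability of fragmentation processing.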
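The scale-based lookup and the storage trade-off can be sketched as follows. The table values are invented for illustration (the application gives no concrete numbers), and `fragment_counts` and `storage_overhead` are hypothetical helper names. The table keeps n fixed and lets m grow with the number of storage devices, so the ratio n/m shrinks as the system grows, as described above.

```python
from typing import List, Tuple

# Hypothetical preset correspondence: (minimum number of storage devices, n, m),
# ordered from the largest scale to the smallest.
SCALE_TABLE: List[Tuple[int, int, int]] = [
    (50, 4, 4),
    (10, 4, 2),
    (0, 4, 1),
]


def fragment_counts(device_count: int) -> Tuple[int, int]:
    """Return (n, m) for the first table row whose minimum scale the system reaches."""
    for min_devices, n, m in SCALE_TABLE:
        if device_count >= min_devices:
            return n, m
    raise ValueError("device_count must be non-negative")


def storage_overhead(n: int, m: int) -> float:
    """Stored bytes per byte of original data when n original and m redundant fragments are kept."""
    return (n + m) / n
```

With (n, m) = (4, 2) the stored volume is 1.5 times the original data, whereas keeping three complete copies of the same data would take 3.0 times; raising m buys more fault tolerance at the price of a higher overhead.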
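203. Perform fragmentation processing on the target data using erasure coding technology and according to the fragmentation ratio, to obtain at least two data fragments corresponding to the target data.
After determining the fragmentation ratio for the target data, the data processing device can fragment the target data according to that ratio to obtain at least two data fragments, which improves the flexibility and reliability of fragmentation processing and helps improve its efficiency.
The at least two data fragments may include n original data fragments and m redundant data fragments corresponding to the target data, where n and m are both integers greater than 0.
204. Determine a storage strategy for the at least two data fragments according to the feature information of the target data, and store the at least two data fragments according to the storage strategy.
The storage strategy indicates the storage location of each of the at least two data fragments in the at least two storage devices.
Optionally, for steps 201 and 203-204, refer to the relevant description of steps 101-103 in the embodiment shown in FIG. 1 above, which is not repeated here.
205. Detect the storage state of each of the at least two data fragments according to a preset period, where the storage state includes a normal state and an abnormal state.
The normal state may refer to a state in which the data fragment is neither erroneous nor lost, and/or a state in which the data fragment can be read; the abnormal state may refer to a state in which the data fragment is erroneous or lost, and/or a state in which the data fragment cannot be read.
In some embodiments, the period may be preset, and the periods for detecting the storage state of data fragments may all be set to the same value or to different values. For example, multiple periods may be preset and stored in association with multiple storage scenarios, with data feature information, or with storage system scale information; that is, a correspondence between periods and storage scenarios, between periods and data feature information, or between periods and storage system scale information is preset. The data processing device can then determine the current storage scenario (for example, according to the current mode of the device or according to a received scenario confirmation instruction), the feature information of the target data, or the scale information of the distributed storage system, determine the detection period from the matching correspondence, and detect the storage state of each data fragment according to the determined period. This improves the reliability of state detection.
Optionally, the data processing device may also add detections of the storage state within a period based on a preset trigger condition, which can be set in advance. For example, if the preset trigger conditions include detecting a data read failure and receiving a detection instruction, the data processing device may trigger detection of the storage state of each of the at least two data fragments when it detects a failure to read data from a storage device or when it receives a detection instruction input by a user. This helps further improve the timeliness and reliability of state detection.
Further optionally, if a data fragment is detected to be in an abnormal state, subsequent detection operations on the data fragments corresponding to the target data may skip that data fragment, to save detection overhead.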
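206. When it is detected that the number of data fragments in an abnormal state exceeds a preset first number threshold, reconstruct the data fragments in the abnormal state from the data fragments in the normal state among the at least two data fragments, and store the reconstructed data fragments.
Optionally, when detecting the storage state of each data fragment according to a preset period, the data processing device may detect the storage state of each of the at least two data fragments according to a preset first period; when it detects that the number of data fragments in an abnormal state exceeds (reaches) a preset second number threshold, it detects the storage state of the data fragments in the normal state among the at least two data fragments according to a preset second period. The time interval of the second period is shorter than that of the first period, and the second number threshold is less than the first number threshold. In other words, the present application can dynamically adjust the detection period, shortening it as the number of abnormal data fragments approaches the first number threshold that triggers data reconstruction, which further improves the timeliness of detecting abnormal data fragments and thus helps improve the efficiency of data reconstruction.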
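The two-threshold logic can be sketched as two small decision functions. This sketch reads "exceeds (reaches)" as "greater than or equal to", which is an assumption; the function and parameter names are hypothetical, and the periods are expressed in seconds purely for illustration.

```python
def detection_period(abnormal_count: int, second_threshold: int,
                     first_period_s: float, second_period_s: float) -> float:
    """Return the interval to wait before the next storage-state check.

    The shorter second period applies once the number of abnormal fragments
    reaches the second number threshold.
    """
    if second_period_s >= first_period_s:
        raise ValueError("the second period must be shorter than the first period")
    return second_period_s if abnormal_count >= second_threshold else first_period_s


def needs_reconstruction(abnormal_count: int, first_threshold: int) -> bool:
    """Reconstruction is triggered once the abnormal count reaches the first number threshold."""
    return abnormal_count >= first_threshold
```

With a second threshold of 2, a first threshold of 3, and periods of 3600 s and 600 s, checks run hourly while at most one fragment is abnormal, switch to every ten minutes at two abnormal fragments, and reconstruction starts at three.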
在其他实施例中,数据处理设备还可按照预设的周期检测数据分片的存储状态(存活状态),当处于正常状态的数据分片的数目低于预设的第三数目阈值时,重构出错的数据分片。其中,上述第一数目阈值和第二数据阈值可设置为小于或等于m,该第三数目阈值可设置为大于或等于n,以提升数据重构的可靠性。
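The two-period scheme and the threshold bounds described above can be sketched as a small decision function. This is a Python sketch under stated assumptions: the function name `next_action` and its parameters are hypothetical, and the comparison uses "reaches" (>=) rather than "exceeds", so that with a first threshold of at most m, reconstruction still starts while no more than m fragments are abnormal.

```python
def next_action(abnormal, m, first_threshold, second_threshold,
                first_period, second_period):
    """Return (detection_period, should_reconstruct) for one target-data item."""
    if not 0 < second_threshold < first_threshold <= m:
        raise ValueError("expected 0 < second_threshold < first_threshold <= m")
    if not 0 < second_period < first_period:
        raise ValueError("the second period must be shorter than the first")
    # Reconstruct once the abnormal count reaches the first threshold.
    should_reconstruct = abnormal >= first_threshold
    # Switch to the shorter period once the second threshold is reached.
    period = second_period if abnormal >= second_threshold else first_period
    return period, should_reconstruct
```

For instance, with m = 3, thresholds of 2 and 1, and periods of 3600 and 600 seconds (all illustrative values), a single abnormal fragment shortens the period to 600 seconds, and a second one triggers reconstruction.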
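During data reconstruction, the data processing device may determine the storage location of each data fragment, for example from the binding relationship or the fragment storage node list described above, obtain the fragments in the normal state (alive) from those locations, and reconstruct the data. Because data can be recovered from the network only when the storage locations of the fragments are known, this improves data confidentiality and further improves the security of data storage.
Optionally, after a corrupted data fragment is reconstructed, the reconstructed fragment may be stored again at the corresponding location (the same location as before reconstruction). Alternatively, the data processing device may determine a new storage location for the reconstructed fragment, for example the storage device with the lowest current load, the storage device with the most remaining storage space, or the storage device with the highest security level; the options are not listed exhaustively here. Alternatively, the data processing device may redetermine the storage locations of all data fragments of the target data, for example by determining a new storage strategy and storing each fragment at the location that the new strategy indicates. After the fragments are stored again, the new storage locations are recorded, for example by updating the binding relationship or the fragment storage node list, to further improve the security of data storage.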
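The three relocation options named above (lowest load, most remaining space, highest security level) and the follow-up update of the fragment storage node list can be sketched in Python as follows. The dictionary keys, policy names, and function names are hypothetical choices made for this illustration.

```python
def pick_device(devices, policy):
    """Pick a device id by one of the three policies named in the text."""
    score = {
        "least_load": lambda d: -d["load"],
        "most_free_space": lambda d: d["free_space"],
        "highest_security": lambda d: d["security_level"],
    }[policy]
    return max(devices, key=score)["id"]


def relocate(node_list, fragment_id, devices, policy):
    """Return an updated copy of the fragment storage node list."""
    updated = dict(node_list)
    updated[fragment_id] = pick_device(devices, policy)
    return updated
```

Returning a copy rather than mutating the list in place is a design choice of the sketch; it lets the caller replace the recorded locations in one step after the reconstructed fragment has actually been written.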
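Further optionally, in some embodiments the present application may be combined with blockchain technology to provide convenient and trusted payment for the system. For example, a terminal that needs to store data such as the target data may send a transaction request carrying the target data, its storage fee, and other information to a blockchain node. The transaction request is recorded on the blockchain and forwarded by the blockchain node to the data processing device, or to the transaction system in which the data processing device resides, for processing. The resulting transaction result is also recorded on the blockchain. This reduces transaction cost and risk and improves transaction efficiency and security.
On top of a central storage system, this solution introduces a P2P distributed storage system as an auxiliary. That is, it combines a P2P distributed storage system with a traditional centralized storage system and uses erasure coding to fragment the data and store it in a distributed manner, providing a secure and reliable storage solution at relatively low cost. Compared with multi-site disaster-recovery storage, this erasure-code-based storage has advantages such as low redundancy and high disk utilization.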
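The redundancy claim can be checked with simple arithmetic. The Python sketch below rests on two assumptions that the application does not state: multi-site disaster recovery is modelled as keeping full copies of the data, and the erasure code is one in which any n of the n + m fragments suffice for recovery (as with Reed-Solomon coding). The function names are hypothetical.

```python
def erasure_overhead(n, m):
    """Stored bytes per byte of user data for n original + m redundant fragments."""
    return (n + m) / n


def replica_overhead(copies):
    """Stored bytes per byte of user data when keeping full copies."""
    return float(copies)


# Both layouts below survive the loss of any two storage units.
print(erasure_overhead(4, 2))  # 1.5
print(replica_overhead(3))     # 3.0
```

Under these assumptions, four original plus two redundant fragments cost 1.5 times the data size, against 3 times for three full copies with the same loss tolerance.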
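The method embodiments above all illustrate the data storage method based on data fragmentation of the present application. Each embodiment is described with its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of the other embodiments.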
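Refer to FIG. 3, which is a schematic structural diagram of a data processing device provided by an embodiment of the present application. The data processing device of this embodiment includes units for performing the above data storage method based on data fragmentation. Specifically, the data processing device of this embodiment may be deployed in a pre-deployed distributed storage system that may include at least two storage devices, and the data processing device 300 of this embodiment may include an acquisition unit 301 and a processing unit 302, where:
the acquisition unit 301 is configured to acquire feature information of target data to be stored, the feature information including any one or more of the following: a data label of the target data, an importance level of the target data, a storage fee of the target data, and a size of the target data;
the processing unit 302 is configured to fragment the target data using erasure coding to obtain at least two data fragments corresponding to the target data, the at least two data fragments including n original data fragments and m redundant data fragments corresponding to the target data, where n and m are both integers greater than 0;
the processing unit 302 is further configured to determine a storage strategy for the at least two data fragments according to the feature information of the target data, and to store the at least two data fragments according to the storage strategy, the storage strategy indicating the storage location of each of the at least two data fragments among the at least two storage devices.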
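Optionally, the acquisition unit 301 is further configured to acquire, before the target data is fragmented using erasure coding, scale information of the distributed storage system, the scale information including the number of the at least two storage devices;
the processing unit 302 is further configured to determine, according to a preset correspondence between storage system scale information and fragmentation ratios, the fragmentation ratio corresponding to the scale information of the distributed storage system, the fragmentation ratio indicating the ratio between original data fragments and redundant data fragments;
the processing unit 302 may be specifically configured to fragment the target data using erasure coding according to the fragmentation ratio, to obtain the at least two data fragments corresponding to the target data.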
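Optionally, the feature information of the target data includes the importance level of the target data;
the processing unit 302 is further configured to determine, before the target data is fragmented using erasure coding and according to a preset correspondence between data importance levels and fragmentation ratios, the fragmentation ratio corresponding to the importance level of the target data, the fragmentation ratio indicating the ratio between original data fragments and redundant data fragments;
the processing unit 302 may be specifically configured to fragment the target data using erasure coding according to the fragmentation ratio, to obtain the at least two data fragments corresponding to the target data.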
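Optionally, the processing unit 302 may be specifically configured to determine, according to a preset correspondence between data feature information and storage strategies, the storage strategy corresponding to the feature information of the target data, and to use the determined storage strategy as the storage strategy for the at least two data fragments;
where the storage strategies corresponding to different data feature information indicate different storage locations of the data fragments among the at least two storage devices.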
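Optionally, the acquisition unit 301 may be further configured to acquire storage information of each of the at least two storage devices, the storage information including any one or more of the following: remaining storage space, used storage space, deployment location, and security level;
the processing unit 302 may be specifically configured to determine the storage strategy for the at least two data fragments according to the feature information of the target data and the storage information of each of the at least two storage devices.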
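Optionally, the acquisition unit 301 may be further configured to detect, after the at least two data fragments are stored according to the storage strategy and at a preset period, the storage state of each of the at least two data fragments, the storage state being either a normal state or an abnormal state;
the processing unit 302 may be further configured to, when the number of data fragments detected to be in the abnormal state exceeds a preset first number threshold, reconstruct the data fragments in the abnormal state from the data fragments in the normal state among the at least two data fragments, and store the reconstructed data fragments.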
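Further optionally, the acquisition unit 301 may be specifically configured to detect the storage state of each of the at least two data fragments at a preset first period and, when the number of data fragments detected to be in the abnormal state exceeds a preset second number threshold, to detect the storage state of the data fragments in the normal state among the at least two data fragments at a preset second period;
where the time interval of the second period is shorter than that of the first period, and the second number threshold is less than the first number threshold.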
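Specifically, the data processing device can implement, through the above units, some or all of the steps of the data storage method based on data fragmentation in the embodiments shown in FIG. 1 and FIG. 2. It should be understood that this embodiment is an apparatus embodiment corresponding to the method embodiments, and the description of the method embodiments also applies to it.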
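Refer to FIG. 4, which is a schematic structural diagram of another data processing device provided by an embodiment of the present application. The data processing device is configured to perform the method described above. As shown in FIG. 4, the data processing device 400 in this embodiment may include one or more processors 401 and a memory 402. Optionally, the data processing device may further include one or more user interfaces 403 and/or one or more communication interfaces 404. The processor 401, user interface 403, communication interface 404, and memory 402 may be connected through a bus 405 or in other ways; FIG. 4 uses a bus connection as an example. The memory 402 is configured to store a computer program that includes program instructions, and the processor 401 is configured to execute the program instructions stored in the memory 402.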
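The processor 401 may be configured to call the program instructions to perform the following steps: acquiring feature information of target data to be stored, the feature information including any one or more of the following: a data label of the target data, an importance level of the target data, a storage fee of the target data, and a size of the target data; fragmenting the target data using erasure coding to obtain at least two data fragments corresponding to the target data, the at least two data fragments including n original data fragments and m redundant data fragments corresponding to the target data, where n and m are both integers greater than 0; and determining a storage strategy for the at least two data fragments according to the feature information of the target data and storing the at least two data fragments according to the storage strategy, the storage strategy indicating the storage location of each of the at least two data fragments among at least two storage devices. Optionally, the at least two storage devices are storage devices in a pre-deployed distributed storage system.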
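Optionally, before fragmenting the target data using erasure coding, the processor 401 may further call the program instructions to perform the following steps: acquiring scale information of the distributed storage system, the scale information including the number of the at least two storage devices; and determining, according to a preset correspondence between storage system scale information and fragmentation ratios, the fragmentation ratio corresponding to the scale information of the distributed storage system, the fragmentation ratio indicating the ratio between original data fragments and redundant data fragments;
Optionally, when fragmenting the target data using erasure coding to obtain the at least two data fragments corresponding to the target data, the processor 401 may specifically perform the following step: fragmenting the target data using erasure coding according to the fragmentation ratio, to obtain the at least two data fragments corresponding to the target data.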
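Optionally, the feature information of the target data includes the importance level of the target data. Before fragmenting the target data using erasure coding, the processor 401 may further call the program instructions to perform the following step: determining, according to a preset correspondence between data importance levels and fragmentation ratios, the fragmentation ratio corresponding to the importance level of the target data, the fragmentation ratio indicating the ratio between original data fragments and redundant data fragments;
When fragmenting the target data using erasure coding to obtain the at least two data fragments corresponding to the target data, the processor 401 may specifically perform the following step: fragmenting the target data using erasure coding according to the fragmentation ratio, to obtain the at least two data fragments corresponding to the target data.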
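Optionally, when determining the storage strategy for the at least two data fragments according to the feature information of the target data, the processor 401 may specifically perform the following step: determining, according to a preset correspondence between data feature information and storage strategies, the storage strategy corresponding to the feature information of the target data, and using the determined storage strategy as the storage strategy for the at least two data fragments;
where the storage strategies corresponding to different data feature information indicate different storage locations of the data fragments among the at least two storage devices.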
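Optionally, the processor 401 may further call the program instructions to perform the following step: acquiring storage information of each of the at least two storage devices, the storage information including any one or more of the following: remaining storage space, used storage space, deployment location, and security level;
When determining the storage strategy for the at least two data fragments according to the feature information of the target data, the processor 401 may specifically perform the following step: determining the storage strategy for the at least two data fragments according to the feature information of the target data and the storage information of each of the at least two storage devices.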
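Optionally, after storing the at least two data fragments according to the storage strategy, the processor 401 may further call the program instructions to perform the following steps: detecting, at a preset period, the storage state of each of the at least two data fragments, the storage state being either a normal state or an abnormal state; and, when the number of data fragments detected to be in the abnormal state exceeds a preset first number threshold, reconstructing the data fragments in the abnormal state from the data fragments in the normal state among the at least two data fragments, and storing the reconstructed data fragments.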
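Optionally, when detecting the storage state of each of the at least two data fragments at a preset period, the processor 401 may specifically perform the following steps: detecting the storage state of each of the at least two data fragments at a preset first period; and, when the number of data fragments detected to be in the abnormal state exceeds a preset second number threshold, detecting the storage state of the data fragments in the normal state among the at least two data fragments at a preset second period; where the time interval of the second period is shorter than that of the first period, and the second number threshold is less than the first number threshold.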
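The processor 401 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.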
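The user interface 403 may include input devices and output devices. Input devices may include a touchpad, a microphone, and the like; output devices may include a display (such as an LCD), a speaker, and the like.
The communication interface 404 may include a receiver and a transmitter for communicating with other devices.
The memory 402 may include read-only memory and random access memory, and provides instructions and data to the processor 401. Part of the memory 402 may also include non-volatile random access memory. For example, the memory 402 may also store the above correspondence between data features and strategies.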
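In a specific implementation, the processor 401 and other components described in this embodiment may carry out the implementations described in the method embodiments shown in FIG. 1 and FIG. 2, and may also carry out the implementations of the units described in FIG. 3; details are not repeated here.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program can implement some or all of the steps of the data storage method based on data fragmentation described in the embodiments corresponding to FIG. 1 and FIG. 2, and can also implement the functions of the data processing device of the embodiment shown in FIG. 3 or FIG. 4; details are not repeated here.
An embodiment of the present application further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform some or all of the steps of the above method.
An embodiment of the present application further provides a data storage system, which may include the above data processing device and the storage devices of the distributed storage system. The data processing device may be configured to perform some or all of the steps of the above method; details are not repeated here.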
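The computer-readable storage medium may be an internal storage unit of the data processing device described in any of the foregoing embodiments, such as a hard disk or memory of the data processing device. The computer-readable storage medium may also be an external storage device of the data processing device, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) fitted to the data processing device.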
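In this application, the term "and/or" merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the character "/" in this document generally indicates an "or" relationship between the objects before and after it. The sequence numbers of the above processes do not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic, and the numbering should not constitute any limitation on the implementation of the embodiments of this application.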
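The above are only some implementations of the present application, and the scope of protection of the present application is not limited to them. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present application, and such modifications or replacements shall fall within the scope of protection of the present application.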

Claims (20)

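  1. A data storage method based on data fragmentation, characterized in that the method is applied to a pre-deployed distributed storage system comprising at least two storage devices, the method comprising:
    acquiring feature information of target data to be stored, the feature information comprising any one or more of the following: a data label of the target data, an importance level of the target data, a storage fee of the target data, and a size of the target data;
    fragmenting the target data using erasure coding to obtain at least two data fragments corresponding to the target data, the at least two data fragments comprising n original data fragments and m redundant data fragments corresponding to the target data, where n and m are both integers greater than 0;
    determining a storage strategy for the at least two data fragments according to the feature information of the target data, and storing the at least two data fragments according to the storage strategy, the storage strategy indicating the storage location of each of the at least two data fragments among the at least two storage devices.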
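  2. The method according to claim 1, characterized in that, before the fragmenting of the target data using erasure coding, the method further comprises:
    acquiring scale information of the distributed storage system, the scale information comprising the number of the at least two storage devices;
    determining, according to a preset correspondence between storage system scale information and fragmentation ratios, the fragmentation ratio corresponding to the scale information of the distributed storage system, the fragmentation ratio indicating the ratio between original data fragments and redundant data fragments;
    wherein the fragmenting of the target data using erasure coding to obtain at least two data fragments corresponding to the target data comprises:
    fragmenting the target data using erasure coding according to the fragmentation ratio, to obtain the at least two data fragments corresponding to the target data.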
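  3. The method according to claim 1, characterized in that the feature information of the target data comprises the importance level of the target data, and, before the fragmenting of the target data using erasure coding, the method further comprises:
    determining, according to a preset correspondence between data importance levels and fragmentation ratios, the fragmentation ratio corresponding to the importance level of the target data, the fragmentation ratio indicating the ratio between original data fragments and redundant data fragments;
    wherein the fragmenting of the target data using erasure coding to obtain at least two data fragments corresponding to the target data comprises:
    fragmenting the target data using erasure coding according to the fragmentation ratio, to obtain the at least two data fragments corresponding to the target data.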
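  4. The method according to any one of claims 1-3, characterized in that the determining of a storage strategy for the at least two data fragments according to the feature information of the target data comprises:
    determining, according to a preset correspondence between data feature information and storage strategies, the storage strategy corresponding to the feature information of the target data, and using the determined storage strategy as the storage strategy for the at least two data fragments;
    wherein the storage strategies corresponding to different data feature information indicate different storage locations of the data fragments among the at least two storage devices.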
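  5. The method according to any one of claims 1-3, characterized in that the method further comprises:
    acquiring storage information of each of the at least two storage devices, the storage information comprising any one or more of the following: remaining storage space, used storage space, deployment location, and security level;
    wherein the determining of a storage strategy for the at least two data fragments according to the feature information of the target data comprises:
    determining the storage strategy for the at least two data fragments according to the feature information of the target data and the storage information of each of the at least two storage devices.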
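  6. The method according to claim 1, characterized in that, after the storing of the at least two data fragments according to the storage strategy, the method further comprises:
    detecting, at a preset first period, the storage state of each of the at least two data fragments, the storage state being either a normal state or an abnormal state;
    when the number of data fragments detected to be in the abnormal state exceeds a preset second number threshold, detecting, at a preset second period, the storage state of the data fragments in the normal state among the at least two data fragments;
    when the number of data fragments detected to be in the abnormal state exceeds a preset first number threshold, reconstructing the data fragments in the abnormal state from the data fragments in the normal state among the at least two data fragments, and storing the reconstructed data fragments;
    wherein the time interval of the second period is shorter than that of the first period, and the second number threshold is less than the first number threshold.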
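  7. The method according to claim 1, characterized in that, after the storing of the at least two data fragments according to the storage strategy, the method further comprises:
    recording the storage location of each data fragment, and generating a fragment storage node list that records the identifier of each data fragment and the identifier of its storage device;
    encrypting the fragment storage node list;
    when it is determined that the label of the target data is a specific label, or that the importance level of the target data is high, or that the storage fee of the target data is higher than a preset fee value, encrypting each data fragment corresponding to the target data, wherein the n original data fragments and the m redundant data fragments are encrypted in different ways.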
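  8. A data processing device, characterized in that it is deployed in a pre-deployed distributed storage system comprising at least two storage devices, and comprises an acquisition unit and a processing unit;
    the acquisition unit is configured to acquire feature information of target data to be stored, the feature information comprising any one or more of the following: a data label of the target data, an importance level of the target data, a storage fee of the target data, and a size of the target data;
    the processing unit is configured to fragment the target data using erasure coding to obtain at least two data fragments corresponding to the target data, the at least two data fragments comprising n original data fragments and m redundant data fragments corresponding to the target data, where n and m are both integers greater than 0;
    the processing unit is further configured to determine a storage strategy for the at least two data fragments according to the feature information of the target data, and to store the at least two data fragments according to the storage strategy, the storage strategy indicating the storage location of each of the at least two data fragments among the at least two storage devices.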
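  9. The device according to claim 8, characterized in that:
    the acquisition unit is further configured to acquire, before the fragmenting of the target data using erasure coding, scale information of the distributed storage system, the scale information comprising the number of the at least two storage devices;
    the processing unit is further configured to determine, according to a preset correspondence between storage system scale information and fragmentation ratios, the fragmentation ratio corresponding to the scale information of the distributed storage system, the fragmentation ratio indicating the ratio between original data fragments and redundant data fragments;
    the processing unit is specifically configured to fragment the target data using erasure coding according to the fragmentation ratio, to obtain the at least two data fragments corresponding to the target data.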
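  10. The device according to claim 8, characterized in that the feature information of the target data comprises the importance level of the target data;
    the processing unit is further configured to determine, before the fragmenting of the target data using erasure coding and according to a preset correspondence between data importance levels and fragmentation ratios, the fragmentation ratio corresponding to the importance level of the target data, the fragmentation ratio indicating the ratio between original data fragments and redundant data fragments;
    the processing unit is specifically configured to fragment the target data using erasure coding according to the fragmentation ratio, to obtain the at least two data fragments corresponding to the target data.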
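  11. The device according to any one of claims 8-10, characterized in that:
    the processing unit is specifically configured to determine, according to a preset correspondence between data feature information and storage strategies, the storage strategy corresponding to the feature information of the target data, and to use the determined storage strategy as the storage strategy for the at least two data fragments;
    wherein the storage strategies corresponding to different data feature information indicate different storage locations of the data fragments among the at least two storage devices.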
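  12. The device according to any one of claims 8-10, characterized in that:
    the acquisition unit is further configured to acquire storage information of each of the at least two storage devices, the storage information comprising any one or more of the following: remaining storage space, used storage space, deployment location, and security level;
    the processing unit is specifically configured to determine the storage strategy for the at least two data fragments according to the feature information of the target data and the storage information of each of the at least two storage devices.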
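  13. The device according to claim 8, characterized in that:
    the acquisition unit is further configured to detect, after the storing of the at least two data fragments according to the storage strategy and at a preset first period, the storage state of each of the at least two data fragments, the storage state being either a normal state or an abnormal state;
    the acquisition unit is further configured to, when the number of data fragments detected to be in the abnormal state exceeds a preset second number threshold, detect, at a preset second period, the storage state of the data fragments in the normal state among the at least two data fragments;
    the processing unit is further configured to, when the number of data fragments detected to be in the abnormal state exceeds a preset first number threshold, reconstruct the data fragments in the abnormal state from the data fragments in the normal state among the at least two data fragments, and store the reconstructed data fragments;
    wherein the time interval of the second period is shorter than that of the first period, and the second number threshold is less than the first number threshold.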
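  14. The device according to claim 8, characterized in that:
    the processing unit is further configured to, after the storing of the at least two data fragments according to the storage strategy, record the storage location of each data fragment and generate a fragment storage node list that records the identifier of each data fragment and the identifier of its storage device; encrypt the fragment storage node list; and, when it is determined that the label of the target data is a specific label, or that the importance level of the target data is high, or that the storage fee of the target data is higher than a preset fee value, encrypt each data fragment corresponding to the target data, wherein the n original data fragments and the m redundant data fragments are encrypted in different ways.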
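  15. A data processing device, characterized in that it comprises a processor and a memory connected to each other, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to call the program instructions to perform the following steps:
    acquiring feature information of target data to be stored, the feature information comprising any one or more of the following: a data label of the target data, an importance level of the target data, a storage fee of the target data, and a size of the target data;
    fragmenting the target data using erasure coding to obtain at least two data fragments corresponding to the target data, the at least two data fragments comprising n original data fragments and m redundant data fragments corresponding to the target data, where n and m are both integers greater than 0;
    determining a storage strategy for the at least two data fragments according to the feature information of the target data, and storing the at least two data fragments according to the storage strategy, the storage strategy indicating the storage location of each of the at least two data fragments among at least two storage devices comprised in a distributed storage system.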
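  16. The device according to claim 15, characterized in that, before performing the fragmenting of the target data using erasure coding, the processor further performs the following steps:
    acquiring scale information of the distributed storage system, the scale information comprising the number of the at least two storage devices;
    determining, according to a preset correspondence between storage system scale information and fragmentation ratios, the fragmentation ratio corresponding to the scale information of the distributed storage system, the fragmentation ratio indicating the ratio between original data fragments and redundant data fragments;
    wherein, when performing the fragmenting of the target data using erasure coding to obtain at least two data fragments corresponding to the target data, the processor specifically performs the following step:
    fragmenting the target data using erasure coding according to the fragmentation ratio, to obtain the at least two data fragments corresponding to the target data.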
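  17. The device according to claim 15, characterized in that the feature information of the target data comprises the importance level of the target data, and, before performing the fragmenting of the target data using erasure coding, the processor further performs the following step:
    determining, according to a preset correspondence between data importance levels and fragmentation ratios, the fragmentation ratio corresponding to the importance level of the target data, the fragmentation ratio indicating the ratio between original data fragments and redundant data fragments;
    wherein, when performing the fragmenting of the target data using erasure coding to obtain at least two data fragments corresponding to the target data, the processor specifically performs the following step:
    fragmenting the target data using erasure coding according to the fragmentation ratio, to obtain the at least two data fragments corresponding to the target data.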
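  18. The device according to claim 15, characterized in that, after performing the storing of the at least two data fragments according to the storage strategy, the processor further performs the following steps:
    detecting, at a preset first period, the storage state of each of the at least two data fragments, the storage state being either a normal state or an abnormal state;
    when the number of data fragments detected to be in the abnormal state exceeds a preset second number threshold, detecting, at a preset second period, the storage state of the data fragments in the normal state among the at least two data fragments;
    when the number of data fragments detected to be in the abnormal state exceeds a preset first number threshold, reconstructing the data fragments in the abnormal state from the data fragments in the normal state among the at least two data fragments, and storing the reconstructed data fragments;
    wherein the time interval of the second period is shorter than that of the first period, and the second number threshold is less than the first number threshold.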
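  19. The device according to claim 15, characterized in that, after performing the storing of the at least two data fragments according to the storage strategy, the processor further performs the following steps:
    recording the storage location of each data fragment, and generating a fragment storage node list that records the identifier of each data fragment and the identifier of its storage device;
    encrypting the fragment storage node list;
    when it is determined that the label of the target data is a specific label, or that the importance level of the target data is high, or that the storage fee of the target data is higher than a preset fee value, encrypting each data fragment corresponding to the target data, wherein the n original data fragments and the m redundant data fragments are encrypted in different ways.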
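  20. A non-volatile computer-readable storage medium, characterized in that the non-volatile computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method according to any one of claims 1-7.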
PCT/CN2019/117869 2019-01-23 2019-11-13 Data storage method, device, and medium based on data fragmentation WO2020151323A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910070379.6A CN109885256B (zh) 2019-01-23 2019-01-23 Data storage method, device, and medium based on data fragmentation
CN201910070379.6 2019-01-23

Publications (1)

Publication Number Publication Date
WO2020151323A1 true WO2020151323A1 (zh) 2020-07-30

Family

ID=66926867

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117869 WO2020151323A1 (zh) 2019-01-23 2019-11-13 Data storage method, device, and medium based on data fragmentation

Country Status (2)

Country Link
CN (1) CN109885256B (zh)
WO (1) WO2020151323A1 (zh)

Also Published As

Publication number Publication date
CN109885256A (zh) 2019-06-14
CN109885256B (zh) 2022-07-08


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19912028

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19912028

Country of ref document: EP

Kind code of ref document: A1