WO2021174828A1 - Data processing method, device, computer system, and readable storage medium - Google Patents

Data processing method, device, computer system, and readable storage medium

Info

Publication number
WO2021174828A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
processed
replica server
original data
compression
Prior art date
Application number
PCT/CN2020/118457
Other languages
English (en)
French (fr)
Inventor
齐泽青
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202010743261.8A (CN111880740B)
Application filed by 平安科技(深圳)有限公司
Publication of WO2021174828A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/065 Replication mechanisms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Definitions

  • This application relates to the field of big data technology, and in particular to a data processing method, device, computer system and readable storage medium.
  • distributed storage systems use multiple distributed storage servers to share the storage load, which overcomes the low security of traditional centralized storage systems. To ensure the reliability of data, however, typical distributed storage keeps multiple copies of each piece of data at the storage layer and places them on different hosts. Because the same data is stored several times, the space it occupies, and therefore its cost, is several times that of the original data.
  • compression generally runs as an independent process that compresses data according to certain rules. This is an additional drain on system performance that reduces the overall performance of the system, so a data compression scheme with less impact on system performance is needed.
  • the purpose of this application is to provide a data processing method, device, computer system, and readable storage medium, which are used to solve the problem of relatively low overall system performance caused by data compression in the prior art.
  • the distributed storage system includes a plurality of replica servers, including a master replica server and at least one slave replica server; the method is applied to any slave replica server and includes:
  • the priority list includes a read and write operation state and a compression operation state, and the read and write operation state is set to a higher priority than the compression operation state;
  • the target data is written and the to-be-processed data is deleted.
  • this application also provides a data processing method, which is applied to a distributed storage system.
  • the distributed storage system includes a plurality of replica servers, including a master replica server and at least one slave replica server; the method is applied to the master replica server, which stores the original data received by the slave replica server, and includes the following:
  • this application also provides a data processing device, including a master replica server and at least one slave replica server;
  • the slave replica server includes the following:
  • the first receiving module is configured to receive the original data sent by the client and write the original data into the first storage unit to obtain the data to be processed;
  • the execution module is used to verify the to-be-processed data, synchronously execute the compression operation on the to-be-processed data that has passed the verification, and obtain the target data;
  • the first processing module is configured to write the target data and delete the to-be-processed data
  • the primary replica server stores the original data received from the replica server, including the following:
  • the second receiving module is configured to receive the original data sent by the client, write the original data into the second storage unit, and send the original data to the slave replica server;
  • the second processing module is configured to receive a read request sent by the client, and send the original data to the client.
  • the present application also provides a computer system, which includes multiple computer devices, each of which includes a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein, when the processors of the multiple computer devices execute the computer program, they jointly implement the data processing method, which includes:
  • the distributed storage system includes a plurality of replica servers, including a master replica server and at least one slave replica server; applied to any slave replica server, the method includes: receiving the original data sent by the master replica server and writing the original data into the first storage unit as the data to be processed; providing a preset priority list that includes a read and write operation state and a compression operation state, with the read and write operation state set to a higher priority than the compression operation state; monitoring the current system status in real time, verifying the data to be processed according to the priority list, and synchronously performing the compression operation on the data to be processed that has passed the verification to obtain the target data; and writing the target data and deleting the data to be processed.
  • the data processing method is further applied in a distributed storage system that includes a plurality of replica servers, including a master replica server and at least one slave replica server; applied to the master replica server,
  • which stores the original data received by the slave replica server, the method includes: receiving the original data sent by the client, writing the original data into the second storage unit, and sending the original data to the slave replica server; and receiving the read request sent by the client and sending the original data to the client.
  • the present application also provides a computer-readable storage medium, which includes multiple storage media, each of which stores a computer program; when the computer programs stored in the multiple storage media are executed by a processor, they jointly implement the above-mentioned data processing method, which includes:
  • the distributed storage system includes a plurality of replica servers, including a master replica server and at least one slave replica server; applied to any slave replica server, the method includes: receiving the original data sent by the master replica server and writing the original data into the first storage unit as the data to be processed; providing a preset priority list that includes a read and write operation state and a compression operation state, with the read and write operation state set to a higher priority than the compression operation state; monitoring the current system status in real time, verifying the data to be processed according to the priority list, and synchronously performing the compression operation on the data to be processed that has passed the verification to obtain the target data; and writing the target data and deleting the data to be processed.
  • the data processing method is further applied in a distributed storage system that includes a plurality of replica servers, including a master replica server and at least one slave replica server; applied to the master replica server,
  • which stores the original data received by the slave replica server, the method includes: receiving the original data sent by the client, writing the original data into the second storage unit, and sending the original data to the slave replica server; and receiving the read request sent by the client and sending the original data to the client.
  • the data processing method, device, computer system, and readable storage medium provided in this application receive the original data sent by the client and write it in full to the master replica server and the slave replica server. On the slave replica server, the written data is then verified while the compression operation runs in parallel, the compressed data is written, and the original data is deleted. Priority control allocates system performance sensibly: normal read and write operations take priority, and compression and the post-compression write use only idle system resources, which solves the problem of low overall system performance caused by data compression in the prior art.
  • Figure 1 is a framework diagram of Embodiment 1 of the data processing method of this application.
  • FIG. 3 is a specific flow chart of Embodiment 1 of the data processing method of this application.
  • Embodiment 3 is a block diagram of Embodiment 3 of the data processing device of this application.
  • Fig. 6 is a block diagram of the execution module of the third embodiment of the data processing device of this application.
  • FIG. 7 is a schematic diagram of the hardware structure of the computer equipment in the fourth embodiment of the computer system of this application.
  • the data processing method, device, computer system, and readable storage medium provided in this application are suitable for the distributed storage field of cloud storage, involve the blockchain field, and are applied to the application service layer of the blockchain.
  • the distributed storage system includes a plurality of replica servers, including a master replica server and at least one slave replica server; the data processing method is provided based on a first receiving module, an execution module, and a first processing module in the slave replica server, and a second receiving module and a second processing module in the master replica server.
  • A is the client, and B, C, and D are replica servers
  • B is the master replica server
  • C and D are slave replica servers.
  • In this application, replica servers B, C, and D receive the original data sent by client A, and the original data is written in full to master replica server B and slave replica servers C and D. On slave replica servers C and D, the written data is verified while the compression operation runs in parallel; the compressed data is then written and the original data is deleted. This solves the problem of low overall system performance caused by data compression in the prior art.
  • the verification process and the compressed-write process are executed in parallel, and client A preferentially reads the original data from master replica server B, so a large proportion of the data can be compressed while the original read and write performance is preserved. Because only minimal changes to the existing system are needed, the compression function can be added to a mature system with low risk and minimal impact on the system.
  • the distributed storage system includes a plurality of replica servers, including a master replica server and at least one slave replica server; applied to any slave replica server, the method, referring to Figure 2, includes:
  • S100 Receive the original data sent by the master replica server or the client and write the original data into the first storage unit as data to be processed;
  • the above-mentioned original data is the underlying data of the system, such as metadata and raw data. It differs from common image data, text data, and the like, and generally takes the form of data blocks: one or several groups of records arranged in sequence, which are the unit of data transferred between main storage and an input device, output device, or external storage.
  • when data is written to the replicas, including the master replica server and the slave replica servers, uncompressed data is written to ensure the best write performance.
  • whether to write to all replica servers or only some of them is chosen according to whether the distributed system is a strongly consistent system.
  • if, as in a relational database, updated data is visible to all subsequent accesses, the distributed system is strongly consistent. If some or all subsequent accesses cannot see the update, it is weakly consistent. The replica servers that need to be written can therefore be determined from the nature of the system before writing.
  • the original data written to a replica server can come from the master replica server or from the client.
  • S200 Monitor the current system status in real time, verify the data to be processed according to a preset priority list, and perform a compression operation on the data to be processed that has passed the verification synchronously to obtain target data;
  • verification is mainly used to find disk errors, data write errors, and other data inconsistencies, ensuring the accuracy of the written data. Verification and compression are performed in parallel: for example, slave replica a verifies data while, at the same time, compressing and writing the data on slave replica a that has already passed verification.
  • the verification process is parallel to the compression writing process to achieve the purpose of data compression.
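The parallel verification and compressed-write arrangement described above can be sketched as a two-stage pipeline. This is only an illustrative sketch, not the method claimed in the application: the function name `run_pipeline`, the use of CRC-32 as the check, and the use of `zlib` as the compressor are all assumptions made for the example.

```python
import queue
import threading
import zlib


def run_pipeline(blocks):
    """Verify and compress in parallel (hypothetical sketch).

    blocks: iterable of (data, expected_crc) pairs.
    Returns the compressed form of every block that passed verification.
    """
    verified = queue.Queue()
    results = []
    done = object()  # sentinel marking the end of the verified stream

    def verifier():
        # Stage 1: check each block and hand good ones to the compressor.
        for data, expected_crc in blocks:
            if zlib.crc32(data) == expected_crc:
                verified.put(data)
        verified.put(done)

    def compressor():
        # Stage 2: compress blocks as soon as they have been verified,
        # while the verifier keeps working on later blocks.
        while True:
            item = verified.get()
            if item is done:
                break
            results.append(zlib.compress(item))

    threads = [threading.Thread(target=verifier),
               threading.Thread(target=compressor)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Blocks that fail the check are simply skipped here; the repair path for failed blocks is a separate step.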
  • the master replica server reads and writes data normally, so the original read and write performance can still be guaranteed after the data is compressed in a larger proportion.
  • the priority list includes a read and write operation state and a compression operation state, and the read and write operation state has a higher priority than the compression operation state;
  • the read and write operations in the above priority list are the system's uncompressed read and write operations.
  • the purpose of presetting the priority list is to allocate system performance sensibly.
  • when system I/O cannot support read and write operations and compression operations at the same time, read and write operations are performed first, while compression and the post-compression write use only idle system resources.
  • other operations that occupy system performance in the actual environment can also be added to the priority list.
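A priority list of this kind could be represented as an ordered enumeration. The sketch below is an assumption for illustration only: the names `OpState`, `PRIORITY_LIST`, and `next_operation`, and the convention that a lower value means a higher priority, do not come from the application.

```python
from enum import IntEnum


class OpState(IntEnum):
    # Assumption of this sketch: a lower value means a higher priority.
    READ_WRITE = 0
    COMPRESSION = 1


# The preset priority list, highest priority first. Other operations that
# consume system performance could be added as further members.
PRIORITY_LIST = sorted(OpState)


def next_operation(pending):
    """Return the highest-priority pending operation state, or None."""
    return min(pending) if pending else None
```

With both states pending, `next_operation` selects `OpState.READ_WRITE`, so compression only runs when no read or write is waiting.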
  • S200-1 Monitor in real time whether the current system is performing read and write operations
  • real-time monitoring promptly determines which operations the distributed system is currently performing, so that the system executes normal read and write operations first and ensures their integrity and accuracy.
  • the above read and write operations include the initial write of data obtained from the client to the master replica or a slave replica, and the client reading data from the master replica. Both cases demand high system performance. To keep the compressed write that runs alongside verification from consuming too much system performance and disturbing normal reads and writes, the system monitors for normal read and write operations while the compressed write is in progress.
  • the purpose of the above steps is that, when system resources are busy, the compressed-write thread can even be blocked, minimizing the impact of compressed writes on uncompressed read and write operations.
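One way to block the compressed-write thread while normal reads and writes are in flight is a small gate built on a condition variable. This is a minimal sketch under assumptions: the class `IoGate` and its methods are hypothetical names, and the application does not specify how the monitoring is implemented.

```python
import threading


class IoGate:
    """Tracks in-flight normal reads/writes (hypothetical sketch).

    The compression thread calls wait_until_idle() before each compressed
    write and is blocked for as long as any normal read or write is active.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._active_rw = 0

    def begin_rw(self):
        # Called when a normal read or write starts.
        with self._cond:
            self._active_rw += 1

    def end_rw(self):
        # Called when a normal read or write finishes.
        with self._cond:
            self._active_rw -= 1
            if self._active_rw == 0:
                self._cond.notify_all()

    def wait_until_idle(self, timeout=None):
        # Returns True once no read/write is active, False on timeout.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._active_rw == 0, timeout)
```

The compression loop would call `gate.wait_until_idle()` before each block, so a burst of client traffic pauses compression rather than competing with it.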
  • An identification bit used to determine the compression operation process is added to the to-be-processed data.
  • the identification bit used to determine the progress of the compression operation is implemented by adding identification information to data whose compression has been completed. A later compression pass can then continue from where the previous one stopped, and data that is already compressed does not need to be verified and compressed again, which improves efficiency.
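The identification bit can be pictured as a per-block flag that a resumed compression pass consults. The `Block` type and the helper below are hypothetical and only illustrate the idea of skipping data that is already marked as compressed.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    payload: bytes
    compressed: bool = False  # the "identification bit" (assumed layout)


def blocks_needing_compression(blocks):
    """Return only the blocks a resumed compression pass still has to handle."""
    return [b for b in blocks if not b.compressed]
```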
  • the verification of the to-be-processed data described in the foregoing steps, referring to FIG. 3, includes:
  • verification by check code includes, for example, parity check and CRC (cyclic redundancy check).
  • parity check is the general term for odd parity codes and even parity codes. They are formed by adding one check bit to the code being checked. With odd parity, the number of 1s in the code after the check bit is added is odd; with even parity, the number of 1s in the code after the check bit is added is even. Hamming code also uses parity to check data.
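As a small worked example of the two check codes mentioned above, the sketch below computes a parity bit over a byte string and compares a CRC-32 value using Python's standard `zlib.crc32`. The function names are illustrative, and CRC-32 is just one CRC variant chosen for the example; the application does not name a specific polynomial.

```python
import zlib


def parity_bit(data: bytes, odd: bool = False) -> int:
    """Return the check bit to append to `data`.

    Even parity: the bit makes the total number of 1s even.
    Odd parity: the bit makes the total number of 1s odd.
    """
    ones = sum(bin(byte).count("1") for byte in data)
    even_bit = ones % 2
    return even_bit ^ 1 if odd else even_bit


def crc_matches(data: bytes, expected_crc: int) -> bool:
    """Check `data` against a previously stored CRC-32 value."""
    return zlib.crc32(data) == expected_crc
```

For the byte `0x07` (three 1 bits), the even-parity check bit is 1 and the odd-parity check bit is 0.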
  • because the original data is underlying data, it can also be verified by compilation: if the data being verified was corrupted during storage or data exchange, it cannot be compiled completely. This method can also verify the data to be processed quickly.
  • the data to be processed is verified, the data that has passed the verification is compressed synchronously, and the data that fails the verification needs to be processed again, so when the verification fails, the following is also included:
  • the method of reading the master copy is adopted.
  • the primary server In order to ensure the read performance, the primary server generally does not perform any processing on the original data written, and the accuracy and completeness of the stored data can be determined to the greatest extent.
  • S222 Adjust the to-be-processed data according to the original data obtained from other replica servers, and obtain the adjusted to-be-processed data;
  • the verification failure may be caused by a deviation during data exchange or a problem during storage, so the original data can be consulted and the erroneous data corrected.
  • the original data can be obtained from the master replica server, or from multiple other replica servers, including data that those replica servers have already verified.
  • the original data here is the original data stored in other replica servers or to-be-processed data consistent with the original data.
  • S223 Perform verification again on the adjusted to-be-processed data until the verification is passed.
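The failure path (fetch the original data, adjust the to-be-processed data, verify again) could look like the loop below. This is a sketch under assumptions: `fetch_original` is a hypothetical callback standing in for a read from the master or another replica server, CRC-32 stands in for whatever check is used, and the `max_attempts` bound is added here so the loop cannot run forever; the text itself only says to repeat until verification passes.

```python
import zlib


def verify(data: bytes, expected_crc: int) -> bool:
    return zlib.crc32(data) == expected_crc


def verify_with_repair(pending: bytes, expected_crc: int,
                       fetch_original, max_attempts: int = 3) -> bytes:
    """Verify `pending`; on failure, replace it with a fresh copy and retry.

    fetch_original: hypothetical callable returning the original data
    from the master replica server or another replica server.
    """
    for attempt in range(max_attempts + 1):
        if verify(pending, expected_crc):
            return pending
        if attempt < max_attempts:
            # Adjust the to-be-processed data using the fetched original.
            pending = fetch_original()
    raise ValueError("data still fails verification after repair attempts")
```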
  • performing the compression operation on the to-be-processed data that has passed the verification in the above step S200, and obtaining the target data includes the following:
  • S231 Process the to-be-processed data based on a preset compression algorithm to obtain compressed data
  • specific compression algorithms include, but are not limited to, the Huffman algorithm and the LZW (Lempel-Ziv-Welch) compression algorithm.
  • other compression algorithms in the prior art can also be used, and the corresponding algorithm can be preset according to the specific implementation scenario.
  • the label corresponding to the compression algorithm is used to identify the algorithm, and the compressed data is marked to facilitate subsequent decompression and reading according to the label matching a suitable algorithm.
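Tagging compressed data with a label that identifies its algorithm can be sketched as a one-byte prefix. Note the substitution: Python's standard library has no Huffman-only or LZW codec, so this example uses `zlib` and `bz2` as stand-ins. The label values `b"Z"` and `b"B"` and both function names are assumptions for illustration.

```python
import bz2
import zlib

# Hypothetical label -> (compress, decompress) table. zlib and bz2 are
# stand-ins for the Huffman and LZW algorithms named in the text.
CODECS = {
    b"Z": (zlib.compress, zlib.decompress),
    b"B": (bz2.compress, bz2.decompress),
}


def compress_with_label(data: bytes, label: bytes = b"Z") -> bytes:
    """Compress `data` and prepend the one-byte algorithm label."""
    compress, _ = CODECS[label]
    return label + compress(data)


def decompress_by_label(target: bytes) -> bytes:
    """Read the label, pick the matching algorithm, and decompress."""
    label, body = target[:1], target[1:]
    _, decompress = CODECS[label]
    return decompress(body)
```

A reader of the target data then needs no out-of-band knowledge of which algorithm was preset on the replica that wrote it.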
  • the user equipment can download the target data from any replica server in the distributed system, and the original data can be obtained after decompression, so as to verify whether the above data has been tampered with.
  • each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block, ensuring data security.
  • S300 Write the target data and delete the to-be-processed data.
  • the pre-stored data to be processed needs to be deleted after the compression operation is completed. According to the above steps, if there is no error in the data exchange or data storage process, the data to be processed is consistent with the original data. To achieve the release of storage space to reduce costs, it is necessary to delete the original data and use the corresponding compressed data for storage.
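Step S300 can be illustrated with an in-memory dictionary standing in for the first storage unit. The key layout and the function name are hypothetical. Writing the target before deleting the to-be-processed data is a design choice of this sketch, so that an interruption between the two steps never leaves the block without any stored copy.

```python
def commit_target(store: dict, key: str, target: bytes) -> None:
    """S300 sketch: persist the compressed target, then drop the raw copy."""
    store[("target", key)] = target      # write the target data first
    store.pop(("pending", key), None)    # then delete the to-be-processed data
```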
  • the verification process and the compressed-write process are executed in parallel (step S200), which solves the problem of low overall system performance caused by data compression in the prior art. Compression is performed only on the slave replica servers and data is preferentially read from the master replica server, so the original read and write performance is preserved even after a large proportion of the data is compressed. Combined with priority control, the impact of the compression operation on system I/O is reduced to a minimum.
  • the distributed storage system includes a plurality of replica servers, including a master replica server and at least one slave replica server; the method is applied to the master replica server, which stores the original data received by the slave replica server, and includes the following:
  • S410 Receive original data sent by the client, write the original data into the second storage unit, and send the original data to the slave replica server;
  • S420 Receive a read request sent by the client, and send the original data to the client.
  • the client can also obtain the original data by decompressing the data in a slave replica server, but because that data is compressed, every read requires a decompression step.
  • reading from the master replica server returns the complete original data directly, which improves read and write efficiency. Compressed data can also be written to the master replica. In that case the system's compression ratio improves further, and storage use and cost fall further, at the expense of some read performance.
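The master-side steps S410 and S420 can be sketched as a small class. Everything here is a hypothetical illustration: the class name, the dictionary used as the second storage unit, and the representation of each slave as a callable that accepts `(key, data)`.

```python
class MasterReplica:
    """Master replica sketch: stores data uncompressed and serves reads."""

    def __init__(self, slaves):
        self.storage = {}     # stands in for the second storage unit
        self.slaves = slaves  # hypothetical callables: send(key, data)

    def handle_write(self, key, data):
        # S410: write the original data locally, then forward it
        # unchanged to every slave replica server.
        self.storage[key] = data
        for send in self.slaves:
            send(key, data)

    def handle_read(self, key):
        # S420: return the original data directly, with no decompression.
        return self.storage[key]
```

Because the master never compresses in this sketch, `handle_read` has no decompression cost, which matches the read preference described above.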
  • a data processing apparatus 5 of this embodiment includes a master replica server 51 and at least one slave replica server 52;
  • the slave replica server 52 includes the following:
  • the first receiving module 521 is configured to receive the original data sent by the client;
  • the first storage unit 522 is configured to write and store the original data as data to be processed
  • the above-mentioned original data is the underlying data of the system, such as metadata, raw data, etc., which are different from common image data, text data, etc., and are generally data blocks.
  • the priority list storage module is used to provide a preset priority list, the priority list includes a read and write operation state and a compression operation state, and the priority of the read and write operation state is set to be higher than the compression operation state;
  • the execution module 523 is configured to monitor the current system status in real time, verify the to-be-processed data according to the priority list, and synchronously execute a compression operation on the to-be-processed data that has passed the verification to obtain target data;
  • the execution module further includes the following:
  • the detection unit 5231 is used to monitor in real time whether the current system is in a state of performing read and write operations
  • the control unit 5232 is configured to sequentially execute read and write operations and compression operations according to the priority list
  • the above-mentioned control unit allocates system performance sensibly according to the preset priority list.
  • the read and write operations are executed first, and the compression operation is executed afterwards.
  • the verification unit 5233 is configured to control the verification of the to-be-processed data
  • the verification process performed by the verification unit includes, but is not limited to, verification code verification, compilation verification, and mutual verification between multiple replica servers.
  • the adjustment unit 5234 is configured to, when the verification fails, obtain original data from the master replica server; adjust the to-be-processed data according to the original data obtained from other replica servers to obtain the adjusted to-be-processed data; and verify the adjusted to-be-processed data again until it passes verification.
  • the compression unit 5235 is configured to control the execution of the compression operation on the to-be-processed data that has passed the verification.
  • the specific compression algorithm executed by the compression unit includes, but is not limited to, the Huffman algorithm and the LZW (Lempel-Ziv-Welch) compression algorithm.
  • the verification and compression operations are performed synchronously, and the verification process is parallel to the compression writing process to achieve the purpose of data compression and at the same time solve the problem of relatively low overall system performance caused by data compression in the prior art.
  • the first processing module 524 is configured to write the target data and delete the to-be-processed data.
  • the master replica server 51 stores the original data received from the replica server, including the following:
  • the second receiving module 511 is configured to receive the original data sent by the client, and send the original data to the slave replica server;
  • the second storage unit 512 is configured to write and store the original data
  • the second processing module 513 is configured to receive a read request sent by the client, and send the original data to the client.
  • This technical solution is based on the distributed storage field of cloud storage.
  • the data written to the slave replica server is verified while the compression operation runs in parallel, and the compressed data is written and the original data is deleted, which solves the problem of low overall system performance caused by data compression in the prior art.
  • data is preferentially read from the master replica, which is never compressed, to reduce the impact on normal read and write performance for the original data.
  • This technical solution also uses the detection unit to monitor in real time whether the current system is performing read and write operations, processes normal read and write operations and compression operations according to priority, and allocates system performance sensibly.
  • when system I/O cannot support read and write operations and compression operations at the same time, read and write operations are performed first and compression is performed when the system is idle, which further reduces the extent to which data compression lowers overall system performance and affects other processes.
  • the present application also provides a computer system that includes multiple computer devices 6.
  • the components of the data processing device 5 in the second embodiment can be distributed across different computer devices, and the computer devices can be any devices capable of executing programs.
  • the computer equipment in this embodiment at least includes but is not limited to: a memory 61 and a processor 62 that can be communicatively connected to each other through a system bus, as shown in FIG. 7.
  • FIG. 7 only shows a computer device with components, but it should be understood that it is not required to implement all the illustrated components, and more or fewer components may be implemented instead.
  • the memory 61 (ie, readable storage medium) includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), Read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, etc.
  • the memory 61 may be an internal storage unit of a computer device, such as a hard disk or a memory of the computer device.
  • the memory 61 may also be an external storage device of the computer device, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), and a secure digital (Secure Digital, SD) equipped on the computer device. Card, Flash Card, etc.
  • the memory 51 may also include both the internal storage unit of the computer device and its external storage device.
  • the memory 61 is generally used to store an operating system and various application software installed in a computer device, such as the program code of the data processing apparatus in the first embodiment, and so on.
  • the memory 61 may also be used to temporarily store various types of data that have been output or will be output.
  • the processor 62 may be a central processing unit (Central Processing Unit) in some embodiments. Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip.
  • the processor 62 is generally used to control the overall operation of the computer equipment.
  • the processor 62 is used to run program codes or process data stored in the memory 61, for example, to run a data processing device, so as to implement the data processing method of the first embodiment.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile, and includes multiple storage media, such as flash memory, hard disk, and multimedia.
  • the computer-readable storage medium of this embodiment is used to store a data processing device, and when executed by the processor 62, the data processing method of the first embodiment is implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

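A data processing method, apparatus, computer system, and readable storage medium, relating to the field of big data technology and used for distributed data, applied in a distributed storage system. The distributed storage system comprises a plurality of replica servers, the replica servers comprising a master replica server and at least one slave replica server. For any slave replica server, the method comprises: receiving original data sent by the master replica server and writing the original data into a first storage unit as data to be processed; verifying the data to be processed according to a priority list, and synchronously performing a compression operation on the data to be processed that passes verification, to obtain target data; and writing the target data and deleting the data to be processed. Because the verification process runs in parallel with the compression-and-write process, and priority control restricts compression writes to idle system resources, the method solves the problem of low overall system performance caused by data compression.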

Description

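Data processing method, apparatus, computer system, and readable storage medium
This application claims priority to Chinese patent application No. 202010743261.8, entitled "Data processing method, apparatus, computer system, and readable storage medium", filed with the China Patent Office on July 29, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of big data technology, and in particular to a data processing method, apparatus, computer system, and readable storage medium.
Background
With the development of large-scale data storage applications, distributed storage systems use multiple dispersed storage servers to share the storage load, which overcomes the comparatively low security of traditional centralized storage systems. To ensure data reliability, however, typical distributed storage currently keeps multiple replicas of each piece of data at the bottom layer and stores them on different hosts. Because the same data is stored several times, it occupies several times the space of the original data, and the cost is high.
The inventor found that, to reduce cost, the data needs to be compressed to some extent. In a multi-replica distributed storage system, however, compression generally runs as an independent process that compresses data according to certain rules. This is an additional drain on system performance and lowers the overall performance of the system. A processing scheme is therefore needed that compresses data with little impact on system performance.
Technical Problem
The purpose of this application is to provide a data processing method, apparatus, computer system, and readable storage medium that solve the prior-art problem of low overall system performance caused by data compression.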
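Technical Solution
To achieve the above purpose, this application provides a data processing method applied in a distributed storage system, the distributed storage system comprising a plurality of replica servers, wherein the replica servers comprise a master replica server and at least one slave replica server. The method is applied to any slave replica server and comprises:
receiving original data sent by the master replica server and writing the original data into a first storage unit as data to be processed;
providing a preset priority list, the priority list including a read/write operation state and a compression operation state, the read/write operation state being set to a higher priority than the compression operation state;
monitoring the current system state in real time, verifying the data to be processed according to the priority list, and synchronously performing a compression operation on the data to be processed that passes verification, to obtain target data;
writing the target data and deleting the data to be processed.
To achieve the above purpose, this application further provides a data processing method applied in a distributed storage system, the distributed storage system comprising a plurality of replica servers, wherein the replica servers comprise a master replica server and at least one slave replica server. The method is applied to the master replica server, which stores the original data received by the slave replica server, and comprises:
receiving original data sent by a client, writing the original data into a second storage unit, and sending the original data to the slave replica server;
receiving a read request sent by the client, and sending the original data to the client.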
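To achieve the above purpose, this application further provides a data processing apparatus comprising a master replica server and at least one slave replica server;
the slave replica server comprises:
a first receiving module, configured to receive original data sent by a client and write the original data into a first storage unit, to obtain data to be processed;
an execution module, configured to verify the data to be processed and synchronously perform a compression operation on the data to be processed that passes verification, to obtain target data;
a first processing module, configured to write the target data and delete the data to be processed;
the master replica server stores the original data received by the slave replica server and comprises:
a second receiving module, configured to receive original data sent by the client, write the original data into a second storage unit, and send the original data to the slave replica server;
a second processing module, configured to receive a read request sent by the client and send the original data to the client.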
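To achieve the above purpose, this application further provides a computer system comprising a plurality of computer devices, each computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processors of the plurality of computer devices jointly implement the data processing method when executing the computer program, the method comprising:
The method is applied in a distributed storage system comprising a plurality of replica servers, wherein the replica servers comprise a master replica server and at least one slave replica server. Applied to any slave replica server, it comprises: receiving original data sent by the master replica server and writing the original data into a first storage unit as data to be processed; providing a preset priority list that includes a read/write operation state and a compression operation state, the read/write operation state being set to a higher priority than the compression operation state; monitoring the current system state in real time, verifying the data to be processed according to the priority list, and synchronously performing a compression operation on the data to be processed that passes verification, to obtain target data; and writing the target data and deleting the data to be processed.
The data processing method is further applied in a distributed storage system comprising a plurality of replica servers, wherein the replica servers comprise a master replica server and at least one slave replica server. Applied to the master replica server, which stores the original data received by the slave replica server, it comprises: receiving original data sent by a client, writing the original data into a second storage unit, and sending the original data to the slave replica server; and receiving a read request sent by the client and sending the original data to the client.
To achieve the above purpose, this application further provides a computer-readable storage medium comprising a plurality of storage media, each storing a computer program, wherein the computer programs stored on the plurality of storage media jointly implement the above data processing method when executed by a processor, the method comprising:
The method is applied in a distributed storage system comprising a plurality of replica servers, wherein the replica servers comprise a master replica server and at least one slave replica server. Applied to any slave replica server, it comprises: receiving original data sent by the master replica server and writing the original data into a first storage unit as data to be processed; providing a preset priority list that includes a read/write operation state and a compression operation state, the read/write operation state being set to a higher priority than the compression operation state; monitoring the current system state in real time, verifying the data to be processed according to the priority list, and synchronously performing a compression operation on the data to be processed that passes verification, to obtain target data; and writing the target data and deleting the data to be processed.
The data processing method is further applied in a distributed storage system comprising a plurality of replica servers, wherein the replica servers comprise a master replica server and at least one slave replica server. Applied to the master replica server, which stores the original data received by the slave replica server, it comprises: receiving original data sent by a client, writing the original data into a second storage unit, and sending the original data to the slave replica server; and receiving a read request sent by the client and sending the original data to the client.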
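Beneficial Effects
In the data processing method, apparatus, computer system, and readable storage medium provided by this application, original data sent by a client is received and written in full to the master replica server and to the slave replica servers. Each slave replica server then verifies the written data and synchronously performs the compression operation, writes the compressed data, and deletes the original data. The verification process runs in parallel with the compression-and-write process, and priority control allocates system performance sensibly: normal read and write operations are performed first, while compression and the subsequent write use only idle system resources. This solves the prior-art problem of low overall system performance caused by data compression.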
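Brief Description of the Drawings
FIG. 1 is a framework diagram of Embodiment 1 of the data processing method of this application;
FIG. 2 is a flowchart of the processing on the slave replica server in Embodiment 1 of the data processing method of this application;
FIG. 3 is a detailed flowchart of Embodiment 1 of the data processing method of this application;
FIG. 4 is a flowchart of Embodiment 2 of the data processing method of this application;
FIG. 5 is a module diagram of Embodiment 3 of the data processing apparatus of this application;
FIG. 6 is a module diagram of the execution module in Embodiment 3 of the data processing apparatus of this application;
FIG. 7 is a schematic diagram of the hardware structure of a computer device in Embodiment 4 of the computer system of this application.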
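Embodiments of the Invention
To make the purpose, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain this application and do not limit it. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.
The data processing method, apparatus, computer system, and readable storage medium provided by this application are suited to the distributed storage field of cloud storage, relate to the blockchain field, and are applied at the application service layer of a blockchain. The distributed storage system comprises a plurality of replica servers, the replica servers comprising a master replica server and at least one slave replica server. The application provides a data processing method based on a first receiving module, an execution module, and a first processing module in the slave replica server, and on a second receiving module and a second processing module in the master replica server. In the framework of FIG. 1, for example, A is the client and B, C, and D are replica servers: B is the master replica server, and C and D are slave replica servers. Data is preferably read from master replica server B. In this application, replica servers B, C, and D receive the original data sent by client A, and the original data is written in full to master replica server B and to slave replica servers C and D. Slave replica servers C and D verify the written data and synchronously perform the compression operation, write the compressed data, and delete the original data, which solves the prior-art problem of low overall system performance caused by data compression. Because the verification process and the compression-and-write process run synchronously, and client A preferably reads the original data from master replica server B, the data can be compressed by a large ratio while the original read/write performance is preserved. The compression function can be added to a mature system with minimal changes to the existing system, which introduces little risk and minimizes the impact of compression on the system.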
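Embodiment 1
Referring to FIG. 1, a data processing method of this embodiment is applied in a distributed storage system, the distributed storage system comprising a plurality of replica servers, wherein the replica servers comprise a master replica server and at least one slave replica server. The method is applied to any slave replica server and, referring to FIG. 2, comprises:
S100: receiving original data sent by the master replica server or a client, and writing the original data into a first storage unit as data to be processed;
In this embodiment, it should be noted that the original data is low-level system data, such as metadata and raw data, as distinct from common image data, text data, and the like. It generally takes the form of data blocks: one or more groups of records arranged consecutively in order, which are the unit of data transferred between main memory and input devices, output devices, or external storage. In this solution, every replica (on the master replica server and on the slave replica servers) receives uncompressed data on the first write, to ensure the best write performance.
Before the data is written in step S100, all replica servers or only some of them are selected for writing, depending on whether the distributed system is a strongly consistent system. A relational database (that is, in a distributed system) requires that updated data be visible to all subsequent accesses; this is a strongly consistent system. If some or all subsequent accesses cannot see the update, the system is weakly consistent. The replica servers to be written can therefore be determined from this property of the system before the write. The original data written may come from the master replica server or from the client.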
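The replica selection described for step S100 can be sketched as follows. This is only an illustration under stated assumptions: the `Replica` class, the `select_write_targets` function, and the `weak_slave_count` parameter are hypothetical names that do not appear in the application, and the application does not say how many slave replicas a weakly consistent system writes on the first pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str
    is_master: bool


def select_write_targets(replicas, strongly_consistent, weak_slave_count=1):
    """Choose the replica servers that receive the first, uncompressed write.

    weak_slave_count is a hypothetical tuning knob: how many slave replicas
    are written immediately when the system is only weakly consistent.
    """
    masters = [r for r in replicas if r.is_master]
    slaves = [r for r in replicas if not r.is_master]
    if strongly_consistent:
        # Every later access must see the update, so write all replicas.
        return masters + slaves
    # Weak consistency: write the master and only a subset of the slaves.
    return masters + slaves[:weak_slave_count]


# Layout from FIG. 1: B is the master replica, C and D are slave replicas.
cluster = [Replica("B", True), Replica("C", False), Replica("D", False)]
```

With the FIG. 1 layout, a strongly consistent system would write B, C, and D, while the weakly consistent branch in this sketch writes only B and C.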
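S200: monitoring the current system state in real time, verifying the data to be processed according to the preset priority list, and synchronously performing a compression operation on the data to be processed that passes verification, to obtain target data;
In this solution, verification is mainly used to detect data inconsistencies such as disk errors and data write errors, and so ensures the accuracy of the written data. Verification and compression are performed synchronously. For example, while slave replica a verifies its data, the results that pass verification are simultaneously compressed and written to slave replica a. The verification process thus runs in parallel with the compression-and-write process, which achieves data compression. Because the slave replica servers perform the compression writes while the master replica server reads and writes data normally, the data can be compressed by a large ratio while the original read/write performance is preserved.
Specifically, referring to FIG. 3, the following steps are performed before the compression operation on the data to be processed that passes verification:
providing a preset priority list, the priority list including a read/write operation state and a compression operation state, the read/write operation state having a higher priority than the compression operation state;
It should be noted that the read/write operations in the priority list are the non-compression read/write operations performed by the system. The priority list is preset in order to allocate system performance sensibly: when system IO cannot support read/write operations and compression operations at the same time, read/write operations are performed first, while compression and the subsequent write use only idle system resources. As an example, other operations that consume system performance in the actual environment can also be added to the priority list.
S200-1: monitoring in real time whether the system is currently performing read/write operations;
Specifically, real-time monitoring identifies promptly which operation the distributed system is currently performing, so that the system can give priority to normal read/write operations and ensure that they are complete and accurate.
The read/write operations include obtaining data from the client and writing it for the first time to the master replica or a slave replica, and the client reading data from the master replica. Both cases place high demands on system performance. To keep the compression write, which runs synchronously with verification, from consuming so much system performance that it interferes with normal reads and writes, the system checks for normal read/write operations while the compression write is in progress.
S200-2: if so, stopping the compression operation and performing the read/write operation and the compression operation in turn according to the priority list;
This step ensures that, when system resources are busy, the compression-write thread can even be blocked, which minimizes the impact of compression writes on non-compression read/write operations.
Specifically, before the compression operation is stopped, the method further comprises:
adding to the data to be processed a flag bit used to determine the progress of the compression operation.
In this embodiment, the flag bit is implemented by adding identification information at the data whose compression has already been completed. When the compression operation later resumes, data that has already been compressed does not need to be verified and compressed again, which improves efficiency.
S200-3: if not, continuing the verification and compression operations.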
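Steps S200-1 to S200-3, together with the progress flag, can be sketched as below. This is a minimal single-threaded illustration, not the application's implementation: `PRIORITY`, `next_operation`, `compress_when_idle`, and the `io_busy` callback are hypothetical names, the per-block `done` list stands in for the flag bit, and `zlib` is used only as a convenient stand-in compressor.

```python
import zlib

# Preset priority list: a lower value means a higher priority.
PRIORITY = {"read_write": 0, "compress": 1}


def next_operation(pending):
    """Return the highest-priority operation state that has pending work."""
    return min(pending, key=PRIORITY.__getitem__) if pending else None


def compress_when_idle(blocks, done, io_busy):
    """Compress blocks that are not yet flagged, pausing when I/O is busy.

    blocks:  list of bytes objects (the data to be processed)
    done:    list of booleans, one flag per block (the "flag bit")
    io_busy: callable returning True while read/write operations are running
    Returns the compressed payloads produced in this run, keyed by block index.
    """
    produced = {}
    for index, block in enumerate(blocks):
        if done[index]:
            continue  # compressed before an earlier interruption, skip it
        if io_busy():
            break     # read/write has priority, so compression stops here
        produced[index] = zlib.compress(block)
        done[index] = True
    return produced
```

Calling `compress_when_idle` again after an interruption resumes at the first block whose flag is still unset, which mirrors the point made above that already compressed data is not verified and compressed a second time.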
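Specifically, verifying the data to be processed in the above step comprises, referring to FIG. 3:
S211: determining whether the data to be processed has a preset check code;
In this embodiment, three check codes are in common use: the parity check code, the Hamming check code, and the cyclic redundancy check (CRC) code. Parity check code is the collective term for odd and even check codes. Both are formed by appending one check bit to the code being checked: with odd parity, the number of 1s in the code including the check bit is odd; with even parity, it is even. The Hamming code also uses parity to check data. It is a multiple-parity error-detection system that inserts k check bits among the data bits to increase the code distance, which enables both error detection and error correction. The CRC code uses a generator polynomial to produce r check bits for k data bits; the code length is n = k + r, so it is also called an (n, k) code. CRC codes are widely used in data communication and in magnetic storage systems. The check code is preset in the original data, and verifying the data to be processed with it is convenient.
S212: if so, verifying the data to be processed based on the check code;
S213: if not, verifying the data to be verified by encoding and reading it out.
Besides check-code verification, because the original data is low-level data, it can also be verified by compiling and reading it out: if the data to be verified was corrupted during storage or data exchange, it cannot be compiled completely. This approach also verifies the data to be processed quickly.
In this solution, in addition to the check-code and encode-and-read-out verification described above, the master replica server and the slave replica servers can also verify one another's data.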
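The check-code branch (S212) and the read-out fallback (S213) can be illustrated with the sketch below. It rests on assumptions: the application does not fix a CRC width, so CRC-32 from Python's `zlib` module is used as one concrete instance, and the `decode` callback is a hypothetical stand-in for the encode-and-read-out check. The names `even_parity_bit` and `verify_block` are likewise illustrative.

```python
import zlib


def even_parity_bit(data: bytes) -> int:
    """Return the check bit that makes the total number of 1 bits even."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


def verify_block(block: bytes, check_code=None, decode=None) -> bool:
    """Verify a block with its preset check code, else by reading it out.

    check_code: CRC-32 value stored with the block, or None if none is preset
    decode:     hypothetical callable that raises ValueError on corrupt input
    """
    if check_code is not None:
        # S212: a preset check code exists, so compare against it.
        return zlib.crc32(block) == check_code
    if decode is None:
        raise ValueError("no check code and no decoder supplied")
    try:
        # S213: no check code, so try to read the block out completely.
        decode(block)
        return True
    except ValueError:
        return False
```

For the byte `0x07`, which has three 1 bits, `even_parity_bit` returns 1, so the code word including the check bit has an even number of 1s, as described for even parity above.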
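More specifically, when the data to be processed is verified, data that passes verification is compressed synchronously, and data that fails verification must be processed again. After a verification failure, the method therefore further comprises:
S221: obtaining the original data from the master replica server;
Data is read from the master replica. To preserve read performance, the master server generally does not process the written original data in any way, which gives the greatest assurance that the stored data is accurate and complete.
S222: adjusting the data to be processed according to the original data obtained from the other replica servers, to obtain adjusted data to be processed;
A verification failure may result from a deviation introduced during data exchange or from a problem during storage, so the erroneous data can be checked against the original data and corrected. In a specific implementation, the original data can be obtained from the master replica server, or data that has passed verification can be obtained from other slave replica servers.
It should be noted that the original data here is the original data stored on other replica servers, or data to be processed that is consistent with the original data.
S223: verifying the adjusted data to be processed again, until it passes verification.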
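The repair loop of steps S221 to S223 can be sketched as follows. The sketch simplifies the "adjustment" of S222 to replacing the whole block with a copy fetched from another replica, which is an assumption rather than something the application specifies. The function name `repair_block`, the `sources` list of fetch callables, and the use of CRC-32 as the check are all hypothetical.

```python
import zlib


def repair_block(local: bytes, expected_crc: int, sources) -> bytes:
    """Return a copy of the block that passes verification.

    local:        the data to be processed as stored on this slave replica
    expected_crc: CRC-32 check code the block must match
    sources:      fetch callables, master replica first, then other slaves
    """
    if zlib.crc32(local) == expected_crc:
        return local  # nothing to repair
    for fetch in sources:
        candidate = fetch()                      # S221/S222: obtain a copy
        if zlib.crc32(candidate) == expected_crc:
            return candidate                     # S223: verification passed
    raise ValueError("no replica supplied data that passes verification")
```

Ordering `sources` with the master replica first reflects the preference stated above for reading from the master; falling back to other slave replicas is the alternative mentioned for step S222.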
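Specifically, referring to FIG. 3, performing the compression operation on the data to be processed that passes verification in step S200 to obtain the target data comprises:
S231: processing the data to be processed with a preset compression algorithm to obtain compressed data;
In this embodiment, the compression algorithms include but are not limited to the Huffman algorithm and the LZW (Lempel-Ziv-Welch) compression algorithm. Other prior-art compression algorithms can also be used, and the algorithm can be preset to suit the specific implementation scenario.
S232: obtaining the tag corresponding to the compression algorithm and marking the compressed data with the tag at a preset position, to obtain compressed data carrying the algorithm tag as the target data.
In this step, the tag identifies the compression algorithm. Marking the compressed data makes it possible to select the matching algorithm from the tag later, when the data is decompressed and read.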
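Steps S231 and S232 can be sketched as below. Several points are assumptions made for illustration: the "preset position" is taken to be the first byte of the target data, the tag values 1 and 2 are arbitrary, and the codecs are Python's `zlib` (DEFLATE) and `lzma` modules rather than the Huffman and LZW algorithms named in the text, because the standard library does not ship those two directly.

```python
import lzma
import zlib

# Hypothetical tag table: tag byte -> (compress, decompress).
CODECS = {
    1: (zlib.compress, zlib.decompress),
    2: (lzma.compress, lzma.decompress),
}


def make_target(pending: bytes, tag: int) -> bytes:
    """S231 + S232: compress the block and prepend the algorithm tag."""
    compress, _ = CODECS[tag]
    return bytes([tag]) + compress(pending)


def read_target(target: bytes) -> bytes:
    """Select the decompressor from the tag byte and restore the block."""
    _, decompress = CODECS[target[0]]
    return decompress(target[1:])
```

Because the tag travels with the data, a reader does not need to know in advance which algorithm a given slave replica used; it only needs the same tag table.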
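With the original data stored in the distributed storage system of this solution, a user device can download the target data from any slave replica server in the distributed system and decompress it to obtain the original data, in order to check whether the data has been tampered with. Each data block contains the information of a batch of network transactions, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block, ensuring data security.
S300: writing the target data and deleting the data to be processed.
In this embodiment, the previously stored data to be processed must be deleted once the compression operation is complete. As the steps above show, if no error occurs during data exchange or data storage, the data to be processed is identical to the original data. To free storage space and reduce cost, the original data is therefore deleted and the corresponding compressed data is kept in its place.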
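Step S300 can be sketched for a file-backed storage unit as follows. The application does not describe the on-disk layout, so everything here is an assumption: the `.z` suffix, the temporary-file-then-rename sequence, and the names `replace_pending_with_target` and `demo` are illustrative only. The ordering is the point of the sketch: the data to be processed is deleted only after the target data has been written.

```python
import os
import tempfile
import zlib
from pathlib import Path


def replace_pending_with_target(pending_path: Path, target: bytes) -> Path:
    """Write the target data durably, then delete the data to be processed."""
    target_path = pending_path.with_name(pending_path.name + ".z")
    tmp_path = target_path.with_name(target_path.name + ".tmp")
    with open(tmp_path, "wb") as handle:
        handle.write(target)
        handle.flush()
        os.fsync(handle.fileno())     # make sure the bytes reach the disk
    os.replace(tmp_path, target_path)  # atomic rename to the final name
    pending_path.unlink()              # only now remove the uncompressed copy
    return target_path


def demo():
    """Run the swap in a temporary directory and report the outcome."""
    with tempfile.TemporaryDirectory() as directory:
        pending = Path(directory) / "block-0001"
        pending.write_bytes(b"x" * 1024)
        target = replace_pending_with_target(
            pending, zlib.compress(pending.read_bytes())
        )
        restored = zlib.decompress(target.read_bytes())
        return pending.exists(), target.exists(), restored
```

If the process stops before the rename, the uncompressed block is still present, so the slave replica can simply repeat the compression later.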
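In this solution, the verification process and the compression-and-write process run synchronously (step S200), which solves the prior-art problem of low overall system performance caused by data compression. Compression is performed only on the slave replica servers, and data is preferably read from the master replica server, so the data can be compressed by a large ratio while the original read/write performance is preserved. Priority control further minimizes the impact of compression on system IO.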
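Embodiment 2:
Referring to FIG. 4, a data processing method of this embodiment is applied in a distributed storage system, the distributed storage system comprising a plurality of replica servers, wherein the replica servers comprise a master replica server and at least one slave replica server. The method is applied to the master replica server, which stores the original data received by the slave replica server, and comprises:
S410: receiving original data sent by a client, writing the original data into a second storage unit, and sending the original data to the slave replica server;
S420: receiving a read request sent by the client, and sending the original data to the client.
Steps S410 and S420 implement the preference in this solution for reading data from the master replica server. In practice, the client can also fetch data from a slave replica server and decompress it to obtain the original data. Because that data is compressed, each such read must complete a decompression step, whereas the complete original data can be obtained directly from the master replica server, which improves read/write efficiency. Alternatively, compressed data can also be written to the master replica. In that case the system's compression ratio increases further at the cost of some read performance, which further reduces the space the system occupies and lowers cost.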
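The read preference described above can be sketched with in-memory dictionaries standing in for the storage units. The names `write_block` and `read_block`, the dictionary layout, and the use of `zlib` are assumptions for illustration; in the application, compression on the slave replicas happens later and under priority control, not inside the write call as in this simplified sketch.

```python
import zlib


def write_block(key, data, master, slaves):
    """S410 sketch: store the original data on the master and forward it."""
    master[key] = data
    for slave in slaves:
        # Simplification: the slave's stored form is shown already compressed.
        slave[key] = zlib.compress(data)


def read_block(key, master, slaves):
    """S420 sketch: prefer the uncompressed master copy, else decompress."""
    if key in master:
        return master[key]              # no decompression on the fast path
    for slave in slaves:
        if key in slave:
            return zlib.decompress(slave[key])
    raise KeyError(key)
```

The fast path returns the master copy untouched, which is why reads keep their original performance; the fallback shows the extra decompression step that a read from a slave replica would incur.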
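Embodiment 3:
Referring to FIG. 5, a data processing apparatus 5 of this embodiment comprises a master replica server 51 and at least one slave replica server 52;
the slave replica server 52 comprises:
a first receiving module 521, configured to receive original data sent by a client;
a first storage unit 522, configured to write and store the original data as data to be processed;
It should be noted that the original data is low-level system data, such as metadata and raw data, as distinct from common image data, text data, and the like, and generally takes the form of data blocks.
a priority list storage module, configured to provide a preset priority list, the priority list including a read/write operation state and a compression operation state, the read/write operation state being set to a higher priority than the compression operation state;
an execution module 523, configured to monitor the current system state in real time, verify the data to be processed according to the priority list, and synchronously perform a compression operation on the data to be processed that passes verification, to obtain target data;
Specifically, referring to FIG. 6, the execution module further comprises:
a detection unit 5231, configured to monitor in real time whether the system is currently performing read/write operations;
a control unit 5232, configured to perform read/write operations and compression operations in turn according to the priority list;
The control unit allocates system performance sensibly according to the preset priority list: when system IO cannot support read/write operations and compression operations at the same time, read/write operations are performed first and compression operations afterwards.
a verification unit 5233, configured to control verification of the data to be processed;
The verification performed by the verification unit includes but is not limited to check-code verification, compile verification, and mutual verification among multiple replica servers.
an adjustment unit 5234, configured to, after verification fails, obtain the original data from the master replica server; adjust the data to be processed according to the original data obtained from the other replica servers, to obtain adjusted data to be processed; and verify the adjusted data to be processed again, until it passes verification.
a compression unit 5235, configured to control the compression operation on the data to be processed that passes verification.
The compression algorithms used by the compression unit include but are not limited to the Huffman algorithm and the LZW (Lempel-Ziv-Welch) compression algorithm.
Verification and compression are performed synchronously, and the verification process runs in parallel with the compression-and-write process, which achieves data compression and also solves the prior-art problem of low overall system performance caused by data compression.
a first processing module 524, configured to write the target data and delete the data to be processed.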
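The master replica server 51 stores the original data received by the slave replica server and comprises:
a second receiving module 511, configured to receive original data sent by the client and send the original data to the slave replica server;
In this solution, every replica (on the master replica server and on the slave replica servers) receives uncompressed data on the first write.
a second storage unit 512, configured to write and store the original data;
a second processing module 513, configured to receive a read request sent by the client and send the original data to the client.
This technical solution is based on the distributed storage field of cloud storage. Original data sent by a client is received and written in full to the master replica server and to the slave replica servers. Each slave replica server verifies the written data and synchronously performs the compression operation, writes the compressed data, and deletes the original data, which solves the prior-art problem of low overall system performance caused by data compression. Data is preferably read from the master replica, on which no compression is performed, which reduces the impact on the normal read/write performance of the original data.
This technical solution also uses the detection unit to monitor in real time whether the system is currently performing read/write operations, handles normal read/write operations and compression operations according to priority, and allocates system performance sensibly. When system IO cannot support read/write operations and compression operations at the same time, read/write operations are performed first and compression is performed when the system is idle, which further reduces the problem of data compression lowering overall system performance and affecting other processes.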
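Embodiment 4:
To achieve the above purpose, this application further provides a computer system comprising a plurality of computer devices 6. The components of the data processing apparatus 5 of Embodiment 2 can be distributed across different computer devices, and a computer device can be a smartphone, tablet computer, notebook computer, desktop computer, rack server, blade server, tower server, or cabinet server (including a standalone server or a server cluster composed of multiple servers) that executes programs. The computer device of this embodiment includes at least, but is not limited to, a memory 61 and a processor 62 that can be communicatively connected to each other through a system bus, as shown in FIG. 7. It should be noted that FIG. 7 shows only a computer device with components; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
In this embodiment, the memory 61 (i.e., the readable storage medium) includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and the like. In some embodiments, the memory 61 may be an internal storage unit of the computer device, such as its hard disk or main memory. In other embodiments, the memory 61 may be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device. The memory 61 may of course also include both the internal storage unit of the computer device and its external storage device. In this embodiment, the memory 61 is generally used to store the operating system and the various application software installed on the computer device, such as the program code of the data processing apparatus of Embodiment 1. The memory 61 may also be used to temporarily store various types of data that have been output or are about to be output.
The processor 62 may, in some embodiments, be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 62 is generally used to control the overall operation of the computer device. In this embodiment, the processor 62 is used to run the program code or process the data stored in the memory 61, for example to run the data processing apparatus, so as to implement the data processing method of Embodiment 1.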
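Embodiment 5:
To achieve the above purpose, this application further provides a computer-readable storage medium, which may be non-volatile or volatile and comprises a plurality of storage media, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, server, app store, and the like, on which a computer program is stored that implements the corresponding functions when executed by the processor 62. The computer-readable storage medium of this embodiment stores the data processing apparatus, which implements the data processing method of Embodiment 1 when executed by the processor 62.
The embodiment numbers above are for description only and do not indicate the relative merits of the embodiments.
From the description of the embodiments above, a person skilled in the art can clearly understand that the methods of the embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation.
The above are only preferred embodiments of this application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.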

Claims (20)

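  1. A data processing method, applied in a distributed storage system, the distributed storage system comprising a plurality of replica servers, wherein the replica servers comprise a master replica server and at least one slave replica server, the method comprising, for any slave replica server:
    receiving original data sent by the master replica server and writing the original data into a first storage unit as data to be processed;
    providing a preset priority list, the priority list including a read/write operation state and a compression operation state, the read/write operation state being set to a higher priority than the compression operation state;
    monitoring the current system state in real time, verifying the data to be processed according to the priority list, and synchronously performing a compression operation on the data to be processed that passes verification, to obtain target data;
    writing the target data and deleting the data to be processed.
  2. The data processing method according to claim 1, further comprising, before the compression operation:
    adding to the data to be processed a flag bit used to determine the progress of the compression operation.
  3. The data processing method according to claim 1, wherein verifying the data to be processed comprises:
    determining whether the data to be processed has a preset check code;
    if so, verifying the data to be processed based on the check code;
    if not, verifying the data to be verified by encoding and reading it out.
  4. The data processing method according to claim 3, wherein verifying the data to be processed further comprises:
    after verification fails, obtaining the original data from the master replica server;
    adjusting the data to be processed according to the original data obtained from the other replica servers, to obtain adjusted data to be processed;
    verifying the adjusted data to be processed again, until it passes verification.
  5. The data processing method according to claim 1, wherein performing the compression operation on the data to be processed that passes verification to obtain the target data comprises:
    processing the data to be processed with a preset compression algorithm to obtain compressed data;
    obtaining the tag corresponding to the compression algorithm and marking the compressed data with the tag at a preset position, to obtain compressed data carrying the algorithm tag as the target data.
  6. A data processing method, used in a distributed storage system, the distributed storage system comprising a plurality of replica servers, wherein the replica servers comprise a master replica server and at least one slave replica server, the method being applied to the master replica server, the master replica server storing the original data received by the slave replica server, and the method comprising:
    receiving original data sent by a client, writing the original data into a second storage unit, and sending the original data to the slave replica server;
    receiving a read request sent by the client, and sending the original data to the client.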
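  7. A data processing apparatus, comprising a master replica server and at least one slave replica server;
    the slave replica server comprises:
    a first receiving module, configured to receive original data sent by a client and write the original data, to obtain data to be processed;
    a priority list storage module, configured to provide a preset priority list, the priority list including a read/write operation state and a compression operation state, the read/write operation state being set to a higher priority than the compression operation state;
    an execution module, configured to monitor the current system state in real time, verify the data to be processed according to the priority list, and synchronously perform a compression operation on the data to be processed that passes verification, to obtain target data;
    a first processing module, configured to write the target data and delete the data to be processed;
    the master replica server stores the original data received by the slave replica server and comprises:
    a second receiving module, configured to receive original data sent by the client, write the original data, and send the original data to the slave replica server;
    a second processing module, configured to receive a read request sent by the client and send the original data to the client.
  8. The data processing apparatus according to claim 7, wherein the execution module further comprises:
    a detection unit, configured to monitor in real time whether the system is currently performing read/write operations;
    a control unit, configured to perform read/write operations and compression operations in turn according to the priority list;
    a verification unit, configured to control verification of the data to be processed and to determine whether the data to be processed has a preset check code; if so, to verify the data to be processed based on the check code; if not, to verify the data to be verified by encoding and reading it out;
    an adjustment unit, configured to, after verification fails, obtain the original data from the master replica server; adjust the data to be processed according to the original data obtained from the other replica servers, to obtain adjusted data to be processed; and verify the adjusted data to be processed again, until it passes verification;
    a compression unit, configured to control the compression operation on the data to be processed that passes verification, the compression unit being configured to process the data to be processed with a preset compression algorithm to obtain compressed data; and to obtain the tag corresponding to the compression algorithm and mark the compressed data with the tag at a preset position, to obtain compressed data carrying the algorithm tag as the target data.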
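  9. A computer system, comprising a plurality of computer devices, each computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processors of the plurality of computer devices jointly implement the steps of the data processing method when executing the computer program, the steps comprising:
    the method is applied in a distributed storage system, the distributed storage system comprising a plurality of replica servers, wherein the replica servers comprise a master replica server and at least one slave replica server, and, applied to any slave replica server, comprises:
    receiving original data sent by the master replica server and writing the original data into a first storage unit as data to be processed;
    providing a preset priority list, the priority list including a read/write operation state and a compression operation state, the read/write operation state being set to a higher priority than the compression operation state;
    monitoring the current system state in real time, verifying the data to be processed according to the priority list, and synchronously performing a compression operation on the data to be processed that passes verification, to obtain target data;
    writing the target data and deleting the data to be processed.
  10. The computer system according to claim 9, further comprising, before the compression operation:
    adding to the data to be processed a flag bit used to determine the progress of the compression operation.
  11. The computer system according to claim 9, wherein verifying the data to be processed comprises:
    determining whether the data to be processed has a preset check code;
    if so, verifying the data to be processed based on the check code;
    if not, verifying the data to be verified by encoding and reading it out.
  12. The computer system according to claim 11, wherein verifying the data to be processed further comprises:
    after verification fails, obtaining the original data from the master replica server;
    adjusting the data to be processed according to the original data obtained from the other replica servers, to obtain adjusted data to be processed;
    verifying the adjusted data to be processed again, until it passes verification.
  13. The computer system according to claim 9, wherein performing the compression operation on the data to be processed that passes verification to obtain the target data comprises:
    processing the data to be processed with a preset compression algorithm to obtain compressed data;
    obtaining the tag corresponding to the compression algorithm and marking the compressed data with the tag at a preset position, to obtain compressed data carrying the algorithm tag as the target data.
  14. A computer system, comprising a plurality of computer devices, each computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processors of the plurality of computer devices jointly implement the steps of the data processing method when executing the computer program, the method being applied in a distributed storage system, the distributed storage system comprising a plurality of replica servers, wherein the replica servers comprise a master replica server and at least one slave replica server, the method being applied to the master replica server, the master replica server storing the original data received by the slave replica server, and the steps comprising:
    receiving original data sent by a client, writing the original data into a second storage unit, and sending the original data to the slave replica server;
    receiving a read request sent by the client, and sending the original data to the client.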
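  15. A computer-readable storage medium, comprising a plurality of storage media, each storing a computer program, wherein the computer programs stored on the plurality of storage media jointly implement the steps of the data processing method when executed by a processor, the steps comprising:
    the method is applied in a distributed storage system, the distributed storage system comprising a plurality of replica servers, wherein the replica servers comprise a master replica server and at least one slave replica server, and, applied to any slave replica server, comprises: receiving original data sent by the master replica server and writing the original data into a first storage unit as data to be processed; providing a preset priority list, the priority list including a read/write operation state and a compression operation state, the read/write operation state being set to a higher priority than the compression operation state; monitoring the current system state in real time, verifying the data to be processed according to the priority list, and synchronously performing a compression operation on the data to be processed that passes verification, to obtain target data; and writing the target data and deleting the data to be processed.
  16. The readable storage medium according to claim 15, further comprising, before the compression operation:
    adding to the data to be processed a flag bit used to determine the progress of the compression operation.
  17. The computer system according to claim 15, wherein verifying the data to be processed comprises:
    determining whether the data to be processed has a preset check code;
    if so, verifying the data to be processed based on the check code;
    if not, verifying the data to be verified by encoding and reading it out.
  18. The readable storage medium according to claim 17, wherein verifying the data to be processed further comprises:
    after verification fails, obtaining the original data from the master replica server;
    adjusting the data to be processed according to the original data obtained from the other replica servers, to obtain adjusted data to be processed;
    verifying the adjusted data to be processed again, until it passes verification.
  19. The readable storage medium according to claim 15, wherein performing the compression operation on the data to be processed that passes verification to obtain the target data comprises:
    processing the data to be processed with a preset compression algorithm to obtain compressed data;
    obtaining the tag corresponding to the compression algorithm and marking the compressed data with the tag at a preset position, to obtain compressed data carrying the algorithm tag as the target data.
  20. A computer-readable storage medium, comprising a plurality of storage media, each storing a computer program, wherein the computer programs stored on the plurality of storage media jointly implement the steps of the data processing method when executed by a processor, the method being applied in a distributed storage system, the distributed storage system comprising a plurality of replica servers, wherein the replica servers comprise a master replica server and at least one slave replica server, the method being applied to the master replica server, the master replica server storing the original data received by the slave replica server, and the steps comprising:
    receiving original data sent by a client, writing the original data into a second storage unit, and sending the original data to the slave replica server;
    receiving a read request sent by the client, and sending the original data to the client.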
PCT/CN2020/118457 2020-07-29 2020-09-28 Data processing method, apparatus, computer system, and readable storage medium WO2021174828A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010743261.8A CN111880740B (zh) 2020-07-29 Data processing method, apparatus, computer system, and readable storage medium
CN202010743261.8 2020-07-29

Publications (1)

Publication Number Publication Date
WO2021174828A1 true WO2021174828A1 (zh) 2021-09-10

Family

ID=73200519

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/118457 WO2021174828A1 (zh) 2020-07-29 2020-09-28 Data processing method, apparatus, computer system, and readable storage medium

Country Status (1)

Country Link
WO (1) WO2021174828A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114398006A (zh) * 2021-12-24 2022-04-26 中国电信股份有限公司 一种分布式存储模式控制方法、装置、设备以及存储介质
CN114999559A (zh) * 2022-08-03 2022-09-02 合肥康芯威存储技术有限公司 一种存储芯片的测试方法、系统及存储介质
CN116048429A (zh) * 2023-04-03 2023-05-02 创云融达信息技术(天津)股份有限公司 一种多副本读写方法及装置
CN116455753A (zh) * 2023-06-14 2023-07-18 新华三技术有限公司 一种数据平滑方法及装置
CN116527539A (zh) * 2023-05-15 2023-08-01 合芯科技(苏州)有限公司 数据一致性校验方法、装置及计算机设备
CN116579551A (zh) * 2023-04-28 2023-08-11 广东技术师范大学 一种基于智能制造的智能管理系统和方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102543108A (zh) * 2011-08-16 2012-07-04 北京友友天宇系统技术有限公司 基于分布式存储的视频冗余策略优化方法
US20150286833A1 (en) * 2014-04-02 2015-10-08 Cleversafe, Inc. Controlling access in a dispersed storage network
CN110881062A (zh) * 2019-10-18 2020-03-13 平安科技(深圳)有限公司 基于大数据的文件传输方法、装置、设备和存储介质
CN111104069A (zh) * 2019-12-20 2020-05-05 北京金山云网络技术有限公司 分布式存储系统的多区域数据处理方法、装置及电子设备

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114398006A (zh) * 2021-12-24 2022-04-26 中国电信股份有限公司 一种分布式存储模式控制方法、装置、设备以及存储介质
CN114999559A (zh) * 2022-08-03 2022-09-02 合肥康芯威存储技术有限公司 一种存储芯片的测试方法、系统及存储介质
CN116048429A (zh) * 2023-04-03 2023-05-02 创云融达信息技术(天津)股份有限公司 一种多副本读写方法及装置
CN116579551A (zh) * 2023-04-28 2023-08-11 广东技术师范大学 一种基于智能制造的智能管理系统和方法
CN116579551B (zh) * 2023-04-28 2023-12-08 广东技术师范大学 一种基于智能制造的智能管理系统和方法
CN116527539A (zh) * 2023-05-15 2023-08-01 合芯科技(苏州)有限公司 数据一致性校验方法、装置及计算机设备
CN116527539B (zh) * 2023-05-15 2023-11-28 合芯科技(苏州)有限公司 数据一致性校验方法、装置及计算机设备
CN116455753A (zh) * 2023-06-14 2023-07-18 新华三技术有限公司 一种数据平滑方法及装置
CN116455753B (zh) * 2023-06-14 2023-08-18 新华三技术有限公司 一种数据平滑方法及装置

Also Published As

Publication number Publication date
CN111880740A (zh) 2020-11-03

Similar Documents

Publication Publication Date Title
WO2021174828A1 (zh) Data processing method, apparatus, computer system, and readable storage medium
US10963341B2 (en) Isolating the introduction of software defects in a dispersed storage network
US11327840B1 (en) Multi-stage data recovery in a distributed storage network
EP3014451B1 (en) Locally generated simple erasure codes
US9766810B2 (en) Resolving write conflicts in a dispersed storage network
US9483539B2 (en) Updating local data utilizing a distributed storage network
US9092140B2 (en) Dispersed storage write process
CN109964215B (zh) 具有环形缓冲区镜像的远程直接存储器访问数据通信中的流控制
JP2007058286A (ja) 記憶装置のフォーマットを不要としたストレージシステム及び記憶制御方法
US10416898B2 (en) Accessing data in a dispersed storage network during write operations
US20190034276A1 (en) Resolving write conflicts in a dispersed storage network
US10120574B2 (en) Reversible data modifications within DS units
US10007575B2 (en) Alternative multiple memory format storage in a storage network
US10536525B2 (en) Consistency level driven data storage in a dispersed storage network
US10146645B2 (en) Multiple memory format storage in a storage network
CN111880740B (zh) Data processing method, apparatus, computer system, and readable storage medium
US20170346898A1 (en) Enhancing performance of data storage in a dispersed storage network
US10594793B2 (en) Read-prepare requests to multiple memories
US20190171375A1 (en) Adjusting optimistic writes in a dispersed storage network
US10956266B2 (en) Processing data access transactions in a dispersed storage network using source revision indicators
US10831397B2 (en) Stateful relocator for a distributed storage network
US10133634B2 (en) Method for performing in-place disk format changes in a distributed storage network
CN117331497A (zh) 一种磁盘阵列算法任务处理方法、装置、设备及介质
US20190294494A1 (en) Virtualization of storage units in a dispersed storage network
US20190197032A1 (en) Preventing unnecessary modifications, work, and conflicts within a dispersed storage network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20923584

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20923584

Country of ref document: EP

Kind code of ref document: A1