WO2020253083A1 - Method, device, equipment, and storage medium for verifying synchronization data of primary and backup storage volumes - Google Patents

Method, device, equipment, and storage medium for verifying synchronization data of primary and backup storage volumes

Info

Publication number
WO2020253083A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
storage volume
backup
check value
primary
Prior art date
Application number
PCT/CN2019/119090
Inventor
陈泽冰
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2020253083A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems

Definitions

  • This application relates to the technical field of big data processing, and in particular to a method, device, equipment, and computer-readable storage medium for verifying synchronization data of primary and secondary storage volumes.
  • The primary storage volume holds the most data in a storage system and is updated most frequently; it is also the main storage unit.
  • The backup storage volume serves as the backup of the primary storage volume.
  • The backup storage volume can be used to restore data.
  • The main purpose of this application is to provide a method, device, equipment, and computer-readable storage medium for verifying synchronization data of primary and backup storage volumes, intended to solve the problem that existing verification, which computes and compares the full data set in real time, is time-consuming.
  • This application provides a method for verifying synchronization data of primary and backup storage volumes.
  • The method includes:
  • obtaining the data in the primary storage volume, calculating a first check value of the data according to a digest check algorithm, and storing the first check value in the data, where the data includes all data in the primary storage volume or the data to be backed up, and the first check value is used to check the integrity of the data after it is backed up to the backup storage volume;
  • The step of calculating the first check value of the data according to the digest check algorithm includes:
  • slicing the data according to its storage timestamp to obtain several data blocks;
  • calculating a sub-check value for each data block based on the digest check algorithm, and calculating the first check value from the sub-check values of the data blocks.
  • Calculating the sub-check value of each data block based on the digest check algorithm, and calculating the first check value from the sub-check values of the data blocks, includes:
  • concatenating the value composed of the four 32-bit groups to obtain the first check value.
  • After calculating the first check value of the data according to the digest check algorithm and storing the first check value in the data, the method further includes:
  • transmitting the data blocks in sequence to the backup storage volume for storage according to a preset data transmission protocol, where the data transmission protocol is used to control data transmission between the primary storage volume and the backup storage volume.
  • The step of obtaining the backup data in the backup storage volume and calculating the second check value of the backup data according to the digest check algorithm includes:
  • The method further includes:
  • The step of calculating the sub-check value of the data block based on the digest check algorithm includes:
  • The method further includes:
  • according to the current timestamp, reading the data corresponding to the current timestamp from the primary storage volume and the backup storage volume respectively, and comparing the two to obtain a comparison result;
  • according to the comparison result, determining whether a data update operation is required and determining the data update method, where the data update method is a full update or a partial update.
  • The present application also provides a device for verifying synchronization data of primary and backup storage volumes, the device including:
  • a first calculation module, used to obtain the data in the primary storage volume, calculate the first check value of the data according to the digest check algorithm, and store the first check value in the data, where the data includes all data in the primary storage volume or the data to be backed up, and the first check value is used to verify the integrity of the data after it is backed up to the backup storage volume;
  • a second calculation module, used to obtain the backup data in the backup storage volume and calculate the second check value of the backup data according to the digest check algorithm;
  • a verification module, configured to determine whether the backup data in the backup storage volume is abnormal based on the first check value and the second check value, where the abnormality is that the data in the backup storage volume is inconsistent with the data in the primary storage volume;
  • a backup module, configured to, when the verification module determines that the backup data in the backup storage volume is abnormal, start the data backup program to send the data in the primary storage volume to the backup storage volume and delete the original backup data in the backup storage volume.
  • The first calculation module includes a data cutting unit and a calculation unit;
  • the data cutting unit is configured to slice the data according to its storage timestamp, using a data slicing algorithm, to obtain several data blocks;
  • the calculation unit is configured to calculate the sub-check value of each data block based on the digest check algorithm, and calculate the first check value from the sub-check values of the data blocks.
  • The data cutting unit is configured to group the data block in 512-bit packets to obtain a value composed of four 32-bit groups;
  • the calculation unit is used to concatenate the value composed of the four 32-bit groups to obtain the first check value.
  • The device for verifying synchronization data of the primary and backup storage volumes further includes a detection module and a sending module;
  • the detection module is configured to detect whether the primary storage volume receives a data update request from the backup storage volume;
  • the sending module is configured to transmit the data blocks to the backup storage volume in sequence according to a preset data transmission protocol when the detection module detects that the data update request has been received, where the data transmission protocol is used to control data transmission between the primary storage volume and the backup storage volume.
  • The second calculation module is configured to receive the data block sent by the primary storage volume, and calculate the digest check value of the data block according to the digest check algorithm to obtain the second check value.
  • The device further includes an acquiring module, configured to acquire the storage timestamp of the data block stored in the backup storage volume and determine, according to the storage timestamp, whether the currently transmitted data block is the earliest data block;
  • the calculation unit is configured to, when the currently transmitted data block is not the earliest data block, calculate the sub-check value of the currently transmitted data block according to the digest check algorithm; obtain the sub-check value of the data block preceding the currently transmitted data block; add the sub-check value of the preceding data block to the sub-check value of the currently transmitted data block to obtain the actual check value of the currently transmitted data block; and send the actual check value to the backup storage volume as the second check value.
  • The device further includes a judging module, configured to judge whether the verification timer on the backup storage volume has reached its timing count; if it has, obtain the current timestamp of the primary storage volume and the backup storage volume; according to the current timestamp, read the data corresponding to the current timestamp from the primary storage volume and the backup storage volume respectively and compare the two to obtain a comparison result; and, according to the comparison result, determine whether a data update operation is required and determine the data update method, where the data update method is a full update or a partial update.
  • The present application also provides equipment for verifying synchronization data of the primary and backup storage volumes.
  • The equipment includes a memory, a processor, and a primary and backup storage volume synchronization data verification program that is stored in the memory and can run on the processor; when executed by the processor, the program implements the steps of the method for verifying synchronization data of primary and backup storage volumes described in any of the preceding items.
  • The present application also provides a computer-readable storage medium in which computer instructions are stored.
  • When the computer instructions run on a computer, the computer executes the steps of the above method for verifying synchronization data of primary and backup storage volumes.
  • This application uses a digest check algorithm to calculate check values for the data on the primary and backup storage volumes respectively, and compares the check values to determine the integrity of the data in the two volumes. If the comparison shows that the backup is incomplete, the backup storage volume is controlled to read the data in the primary storage volume and use it to replace and update its own data. Compared with the existing verification process, verification based on the digest check algorithm greatly shortens verification time, thereby improving the efficiency of comparison during data updates.
  • In addition, the data can be sliced before calculation, and the start of the verification process can be controlled by a timer, thereby realizing automatic data verification, saving human resources, avoiding comparison errors, and improving the accuracy of verification.
  • FIG. 1 is a schematic flowchart of a first embodiment of a method for verifying synchronization data of primary and backup storage volumes provided by this application;
  • FIG. 2 is a schematic flowchart of a second embodiment of a method for verifying synchronization data of primary and backup storage volumes provided by this application;
  • FIG. 3 is a schematic diagram of calculating a check value by slice grouping, provided by an embodiment of the application;
  • FIG. 4 is a schematic diagram of functional modules of an embodiment of an apparatus for verifying synchronization data of primary and backup storage volumes provided by this application;
  • FIG. 5 is a schematic diagram of the structure of a server involved in a solution according to an embodiment of the application.
  • The synchronization data verification method for primary and backup storage volumes provided by this application mainly uses the MD5 digest check algorithm to calculate MD5 values over the data in the storage volumes as an aid to verification, so that data synchronization between the primary storage volume and the backup storage volume can be verified quickly.
  • The primary storage volume data is obtained, and its MD5 value is calculated and saved in the text.
  • The backup storage volume data is obtained, and its MD5 value is calculated and saved in the text.
  • The primary storage volume calculates its MD5 value at a fixed time interval and compares it with the MD5 value from the previous interval. A change in its own MD5 value indicates that the primary storage volume data has been updated. That change triggers both the primary and backup storage volumes to calculate and compare MD5 values, to check whether the data of the backup storage volume has been updated synchronously.
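  • The periodic self-check described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `ChangeDetector`, the helper `md5_hex`, and the idea of passing the volume content as a `bytes` object are assumptions made for the example.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hex string."""
    return hashlib.md5(data).hexdigest()


class ChangeDetector:
    """Hypothetical helper: remembers the previous digest of the primary volume."""

    def __init__(self) -> None:
        self.last_digest = None  # no digest recorded before the first poll

    def poll(self, volume_bytes: bytes) -> bool:
        """Return True when the digest differs from the one seen at the previous poll."""
        digest = md5_hex(volume_bytes)
        changed = self.last_digest is not None and digest != self.last_digest
        self.last_digest = digest
        return changed
```

  • A scheduler would call `poll` once per interval; a `True` result is the point at which both volumes would recompute and compare their MD5 values.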
  • The method may be physically implemented on a personal computer (PC), a smartphone, a server, or another operating terminal with remote access functions. Based on such a hardware structure, various embodiments of the synchronization data verification method of the present application are proposed.
  • FIG. 1 is a flowchart of a method for verifying synchronization data of primary and backup storage volumes according to an embodiment of the application.
  • the method for verifying synchronization data of the primary and backup storage volumes specifically includes the following steps:
  • Step S10: obtain the data in the primary storage volume, calculate a first check value of the data according to a digest check algorithm, and store the first check value in the data;
  • In this embodiment, the data is one of the following: all the data in the primary storage volume, or the data to be backed up.
  • The data to be backed up is a part of the data in the primary storage volume, preferably the data within a certain period of time, with the data of every period stored in the same data format. This can be set according to the specific backup situation.
  • The first check value is used to verify the integrity of the data to be backed up; that is, after the data to be backed up has been copied from the primary storage volume to the backup storage volume, the integrity of the resulting backup data is verified against the first check value.
  • The first check value is calculated with the MD5 digest algorithm.
  • Applying the MD5 algorithm to the data obtained from the primary storage volume yields an MD5 value, which is the first check value.
  • The MD5 value can be used to check the data in the backup storage volume and determine whether it is consistent.
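  • As a rough sketch of step S10, the check value can be computed by streaming the volume through `hashlib.md5` so that large volumes need not fit in memory. The function names `stream_md5` and `backup_matches`, the 1 MiB chunk size, and the use of file-like objects to stand in for storage volumes are assumptions for illustration only.

```python
import hashlib
import io


def stream_md5(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a file-like object chunk by chunk and return the hex digest."""
    digest = hashlib.md5()
    # iter() with a sentinel stops when read() returns an empty bytes object.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def backup_matches(first_check_value: str, backup_stream) -> bool:
    """Compare the stored first check value with a digest of the backup data."""
    return stream_md5(backup_stream) == first_check_value


# Usage: in-memory streams stand in for the two volumes.
first = stream_md5(io.BytesIO(b"volume contents"))
same = backup_matches(first, io.BytesIO(b"volume contents"))
```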
  • Step S20: obtain the backup data in the backup storage volume, and calculate a second check value of the backup data according to the digest check algorithm;
  • The backup data stored in the backup storage volume is essentially the data backed up from the primary storage volume.
  • The primary and backup storage volumes are a pair of storage devices that verify each other.
  • The backup storage volume and the primary storage volume are connected to each other through a data interface, and the backup storage volume is controlled by a timer to periodically read data from the primary storage volume for storage.
  • When storing the data it reads, the backup storage volume need not clear its existing data; it can accumulate data and clean up selectively only when its storage space runs low. Alternatively, the backup storage volume can clear and replace its data every time it reads data from the primary storage volume.
  • The second check value is calculated in the same way as in step S10, although the result may differ.
  • In practice, the second check value is calculated in one of two cases: from the original backup data in the backup storage volume, or from the backup data after the backup data of the backup storage volume has changed;
  • The second check value calculated from the original backup data is the check value computed with the digest check algorithm each time a backup is triggered and the check passes; this check value is stored in the backup data;
  • The second check value calculated from the changed backup data is computed with the digest check algorithm each time the backup storage volume receives data to be backed up from the primary storage volume, before verification is performed. This second check value does not need to be stored. It is calculated in real time and is mainly used to determine whether the data was altered or tampered with while being backed up to the backup storage volume.
  • Step S30: determine whether the backup data in the backup storage volume is abnormal based on the first check value and the second check value;
  • In this step, the first check value and the second check value are first extracted from the primary and backup storage volumes and then compared. The check values are generally extracted by means of a stored identifier.
  • Because the first check value is stored in the data, it either carries a unique identifier in the data or is stored at a specific location in the data, so it can be located and extracted by that identifier or location. The same is true for the second check value.
  • In the first case, the second check value is obtained with the same extraction method as the first check value; in the second case, it must be calculated in real time with the digest check algorithm.
  • The abnormality is that the data in the backup storage volume is inconsistent with the data in the primary storage volume.
  • The inconsistency may be caused by an incomplete backup or by the data changing during the backup process.
  • Because the check values are calculated with the same method, identical data yields identical check values. If the check values match, the backup data in the backup storage volume is the same as the data in the primary storage volume. If they do not match, the cause may need to be determined from the point in time at which the second check value was calculated.
  • There are two such points in time: when the data on the backup storage volume reaches a scheduled update time, and when the backup storage volume has just finished receiving data.
  • If the second check value calculated at the scheduled update time is inconsistent with the first check value, the primary storage volume is considered to contain new data, and the backup storage volume must run the backup data update program. If the second check value calculated just after data transmission and backup is inconsistent with the first check value, an error is considered to have occurred while the backup storage volume was receiving data from the primary storage volume, and the backup procedure must be run again.
  • When the first check value is compared with the second check value in this step, there are two possible situations. In the first, the second check value is calculated from the original backup data in the backup storage volume; the check value obtained in this case is recorded as check value A. It is only necessary to compare whether the first check value is the same as check value A. If not, step S40 is executed.
  • In the second, the second check value comprises check values calculated in both cases, namely check value A and check value B, where check value B is calculated from the backup data after the backup data of the backup storage volume has changed. The first check value is first compared with check value A. If they differ, the first check value is then compared with check value B. If these also differ, the primary storage volume is asked to resend the data to be backed up and step S40 is executed; if they are the same, the backup step ends and the data changes of the primary and backup storage volumes continue to be monitored.
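  • The two comparison situations can be summarized in a small decision function. This is only a sketch of the logic described above; the enum `Action`, its member names, and the function `decide` are hypothetical names introduced for the example.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    IN_SYNC = "in_sync"          # first check value equals check value A
    BACKUP_DONE = "backup_done"  # first check value equals check value B
    RUN_S40 = "run_s40"          # mismatch: resend the data and run step S40


def decide(first: str, check_a: str, check_b: Optional[str] = None) -> Action:
    """Compare the first check value with A, then with B when B is available."""
    if first == check_a:
        return Action.IN_SYNC
    if check_b is not None and first == check_b:
        return Action.BACKUP_DONE
    return Action.RUN_S40
```

  • With only check value A available, a mismatch leads straight to step S40; check value B is consulted only when the backup data has changed since A was stored.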
  • Step S40: if it is determined that the backup data in the backup storage volume is abnormal, start the data backup program to send the data in the primary storage volume to the backup storage volume, and delete the original backup data in the backup storage volume.
  • In this embodiment, when the primary storage volume sends the data to be backed up to the backup storage volume, the data can be split into fragments and the resulting data blocks transmitted in order of their storage timestamps. A sub-check value is calculated for each data block as it is transmitted, and the backup storage volume can likewise check the received data blocks one by one by calculating their check values.
  • The method above uses check values calculated with the MD5 algorithm to compare the integrity of the data in the primary and backup storage volumes, which greatly shortens data comparison and also allows partial data to be compared.
  • The backup storage volume can thus stay synchronized and identical with the primary storage volume in real time, avoiding loss of primary storage volume data.
  • In this embodiment, to further improve calculation efficiency in steps S10 and S20, the data can be sliced before the check value is calculated, and representative data blocks, such as those holding a large amount of data, can be selected from the sliced blocks for calculation and comparison.
  • In this embodiment, the step of calculating the first check value of the data according to the digest check algorithm includes:
  • slicing the data according to its storage timestamp to obtain several data blocks, where slicing divides the data to be backed up into data blocks that cover equal time intervals;
  • calculating a sub-check value for each data block based on the digest check algorithm, and calculating the first check value from the sub-check values of the data blocks.
  • In practice, the sub-check value can be calculated in two ways. One is to calculate the sub-check value of each data block separately. The other is to calculate the sub-check value of each data block on the basis of the sub-check value of the previous data block, so that the sub-check value of a later data block verifies both the previous data block and its own data.
  • For example, the check value for the second block is the check value of the first block of data plus the check value of the second block of data.
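  • The slicing step and the first way of calculating sub-check values (one independent digest per block) might look like the sketch below. The record layout `(timestamp, payload)`, the function names, and the integer-second timestamps are assumptions; the source does not specify how records or timestamps are represented.

```python
import hashlib
from collections import defaultdict


def slice_by_time(records, interval: int):
    """Group (timestamp, payload) records into blocks covering equal time windows.

    Blocks are returned in chronological order; empty windows produce no block.
    """
    windows = defaultdict(list)
    for timestamp, payload in sorted(records, key=lambda r: r[0]):
        windows[timestamp // interval].append(payload)
    return [b"".join(windows[key]) for key in sorted(windows)]


def sub_check_values(blocks):
    """Compute one independent MD5 hex digest per data block."""
    return [hashlib.md5(block).hexdigest() for block in blocks]


# Usage: three records, 10-second windows, giving two blocks.
blocks = slice_by_time([(12, b"c"), (0, b"a"), (5, b"b")], interval=10)
digests = sub_check_values(blocks)
```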
  • With the MD5 algorithm, the calculation can also be performed by grouping.
  • The specific process is as follows:
  • Calculating the sub-check value of each data block based on the digest check algorithm, and calculating the first check value from the sub-check values of the data blocks, includes:
  • concatenating the value composed of the four 32-bit groups to obtain the first check value.
  • MD5 processes the input in 512-bit blocks, each divided into sixteen 32-bit sub-blocks. After a series of processing steps, the output of the algorithm consists of four 32-bit words. Concatenating these four 32-bit words produces a 128-bit hash value, which is the first check value and is stored in the data of the primary storage volume.
  • The primary storage volume can send this value to the backup storage volume together with the data to be backed up, and the backup storage volume can verify the integrity of the backup data directly against the received check value, without reading from the primary storage volume again.
  • In the MD5 algorithm, the data is first padded so that its bit length becomes N*512+448, where N is a non-negative integer and may be zero.
  • The padding method is as follows:
  • A 64-bit binary representation of the data length before padding (in bits) is then appended to the result. If the binary representation of the pre-padding length exceeds 64 bits, only the lower 64 bits are used.
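  • The padding rule can be illustrated with a short function. The sketch follows the public MD5 specification (RFC 1321), which pads with a single 1 bit followed by 0 bits and stores the length in little-endian order; those two details come from the specification rather than from the text above, and the name `md5_pad` is chosen for the example.

```python
def md5_pad(message: bytes) -> bytes:
    """Pad a message to a multiple of 64 bytes (512 bits) as MD5 does."""
    # Length in bits before padding, keeping only the low 64 bits.
    bit_length = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    # A 1 bit followed by seven 0 bits is the byte 0x80.
    padded = message + b"\x80"
    # Add zero bytes until the length is 56 mod 64 bytes (448 mod 512 bits).
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # Append the 64-bit length, low-order byte first.
    return padded + bit_length.to_bytes(8, "little")
```

  • A 3-byte message pads to one 64-byte block, while a 56-byte message needs a second block because the length field no longer fits in the first.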
  • The initial 128-bit value is the initial chaining variable.
  • (The variables A, B, C, and D in the program are 0x67452301, 0xEFCDAB89, 0x98BADCFE, and 0x10325476 respectively.)
  • For the first block, the four chaining variables above are copied into four other variables: A to a, B to b, C to c, and D to d.
  • The main loop has four rounds (MD4 has only three), and the rounds are very similar. The first round performs 16 operations. Each operation applies a nonlinear function to three of a, b, c, and d, then adds the result to the fourth variable, a sub-block of the text, and a constant. The result is then rotated left by a varying number of bits and added to one of a, b, c, or d. Finally, the result replaces one of a, b, c, or d.
  • An MD5 operation consists of 64 such iterations, divided into 4 groups of 16.
  • Mi represents a 32-bit block of input data.
  • Ki represents a 32-bit constant that differs for each operation; the specific flow is shown in FIG. 3.
  • F is a bitwise conditional function: if X, then Y, otherwise Z.
  • Function H is a bitwise parity operator.
  • The final output is the concatenation of a, b, c, and d, and the result of the concatenation is the first check value.
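  • For readers who want to see the rounds end to end, the following is a compact pure-Python sketch of MD5. The shift amounts, the sine-derived constants, the message-word order, and the second and fourth round functions are taken from the public MD5 specification (RFC 1321), not from the text above, which names only F and H. The function name `md5_hex_pure` is hypothetical; in practice `hashlib.md5` should be used.

```python
import math
import struct

# Per-operation left-rotation amounts: four rounds of 16 operations each.
SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# The 32-bit constants Ki: floor(2**32 * abs(sin(i + 1))).
K = [int(abs(math.sin(i + 1)) * 2**32) for i in range(64)]
MASK = 0xFFFFFFFF


def left_rotate(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    x &= MASK
    return ((x << n) | (x >> (32 - n))) & MASK


def md5_hex_pure(message: bytes) -> str:
    """Return the MD5 hex digest of message (illustrative, not optimized)."""
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    bit_length = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    message += b"\x80"
    message += b"\x00" * ((56 - len(message) % 64) % 64)
    message += struct.pack("<Q", bit_length)
    for offset in range(0, len(message), 64):
        m = struct.unpack("<16I", message[offset:offset + 64])  # sixteen 32-bit words Mi
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            if i < 16:    # round 1, function F: if b then c else d
                f, g = (b & c) | (~b & d), i
            elif i < 32:  # round 2, function G
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:  # round 3, function H: bitwise parity
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:         # round 4, function I
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + K[i] + m[g]) & MASK
            a, d, c = d, c, b
            b = (b + left_rotate(f, SHIFTS[i])) & MASK
        a0, b0 = (a0 + a) & MASK, (b0 + b) & MASK
        c0, d0 = (c0 + c) & MASK, (d0 + d) & MASK
    # Concatenate the four 32-bit words, low-order byte first, into 128 bits.
    return struct.pack("<4I", a0, b0, c0, d0).hex()
```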
  • The second check value is calculated after the data update is performed.
  • The data blocks are transmitted in sequence to the backup storage volume for storage according to a preset data transmission protocol, where the data transmission protocol is used to control data transmission between the primary storage volume and the backup storage volume.
  • Before transmission, the primary storage volume and the backup storage volume need to agree on the verification method through a handshake protocol; the check value is calculated only after the verification method is determined, which further ensures that the calculation results correspond.
  • The step of obtaining the backup data in the backup storage volume and calculating the second check value of the backup data according to the digest check algorithm includes:
  • In this embodiment, the check value may be calculated once for each data block received and then compared, ensuring that each block received is consistent with the data of the primary storage volume and further improving the efficiency of data backup. Unlike the prior art, there is no need to wait for the whole backup to finish and then back up again; data blocks that fail verification can be requested and backed up again individually.
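  • A minimal sketch of per-block verification on the backup side: each received block is hashed and compared with the digest sent by the primary, and only the mismatched block indices are returned for re-request. The function name `blocks_to_resend` and the list-based interface are assumptions made for illustration.

```python
import hashlib


def blocks_to_resend(received_blocks, sent_digests):
    """Return the indices of received blocks whose MD5 differs from the sender's digest."""
    return [
        index
        for index, (block, expected) in enumerate(zip(received_blocks, sent_digests))
        if hashlib.md5(block).hexdigest() != expected
    ]


# Usage: the second block was corrupted in transit.
original = [b"block-0", b"block-1", b"block-2"]
digests = [hashlib.md5(block).hexdigest() for block in original]
to_resend = blocks_to_resend([b"block-0", b"blocK-1", b"block-2"], digests)
```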
  • If the data backup of the backup storage volume is verified at the scheduled update time, then after the data blocks are transmitted in sequence to the backup storage volume for storage according to the preset data transmission protocol, the method further includes:
  • Calculating the sub-check value of the data block based on the digest check algorithm includes:
  • The MD5 values of the data in the primary and backup storage volumes are calculated with the md5 function of the Python library hashlib, which can be applied directly to the data passed in.
  • For example, the primary storage volume holds 5 MB of data.
  • When it is synchronized to the backup volume, it is divided into five 1 MB pieces.
  • For each piece, two MD5 values are calculated: the MD5 of the current 1 MB of data, and the MD5 of the data spliced together so far. If the second piece is being synchronized, the latter is the MD5 of the first piece plus the second piece.
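  • The two digests per piece can be produced in one pass with `hashlib`, because calling `hexdigest()` on a hash object does not prevent further `update()` calls. The function name `piece_and_running_digests` is hypothetical, and the example uses tiny pieces instead of 1 MB pieces to stay readable.

```python
import hashlib


def piece_and_running_digests(pieces):
    """For each piece, return (MD5 of that piece alone, MD5 of all pieces so far)."""
    running = hashlib.md5()
    results = []
    for piece in pieces:
        running.update(piece)  # extends the spliced data by this piece
        results.append((hashlib.md5(piece).hexdigest(), running.hexdigest()))
    return results


# Usage: the running digest after the last piece equals the digest of the whole data.
pairs = piece_and_running_digests([b"a", b"bc"])
```

  • The running digest lets the backup side confirm, after any piece, that everything received so far matches the primary, while the per-piece digest pinpoints which piece is wrong.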
  • The method further includes:
  • according to the current timestamp, reading the data corresponding to the current timestamp from the primary storage volume and the backup storage volume respectively, and comparing the two to obtain a comparison result;
  • The data update method is either a full update or a partial update.
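  • The source does not state how the choice between a full and a partial update is made. The sketch below shows one possible rule, purely as an assumption: a partial update when only some blocks differ, and a full update when every block differs or the block counts do not match. The function name `choose_update` is hypothetical.

```python
import hashlib


def choose_update(primary_blocks, backup_blocks):
    """Return (mode, indices), where mode is 'none', 'partial', or 'full'.

    Assumed rule for illustration; the patent text does not define it.
    """
    if len(primary_blocks) != len(backup_blocks):
        return "full", list(range(len(primary_blocks)))
    stale = [
        index
        for index, (p, b) in enumerate(zip(primary_blocks, backup_blocks))
        if hashlib.md5(p).digest() != hashlib.md5(b).digest()
    ]
    if not stale:
        return "none", []
    if len(stale) == len(primary_blocks):
        return "full", stale
    return "partial", stale
```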
  • Comparing MD5 values reduces comparison time and also reveals data inconsistencies earlier. If, instead of calculating MD5 values for the data of the primary and backup storage volumes, the data of the primary volume and the backup volume were compared one-to-one, the comparison time would increase greatly. A full comparison also cannot identify which piece of data is abnormal, and it cannot start until all the data has been transmitted.
  • An abnormality is identified by inconsistent MD5 values, which show that the data synchronized between the primary volume and the backup volume differs and therefore that synchronization is abnormal. The affected backup data can then be backed up again. This improves the efficiency of data comparison and reveals promptly whether synchronization is abnormal and at what point in time the abnormal data appeared. Eliminating abnormalities should improve the quality of the product, but this method should still be used to compare data and check whether abnormalities occur during synchronization.
  • FIG. 2 is a detailed flowchart of a method for verifying synchronization data of primary and backup storage volumes according to an embodiment of this application.
  • the method specifically includes the following steps:
  • Step S210: collect all data on the primary storage volume, and perform slice grouping processing on all the data;
  • In this step, slice grouping processing means first dividing the collected data at fixed time intervals, then selecting from the divided data several data blocks that occupy a large amount of storage and sorting them in chronological order, and then, using the MD5 algorithm, grouping the data blocks in 512-bit packets to obtain a value composed of four 32-bit groups.
  • Step S220: based on the data after slice grouping processing, calculate the first check value with the MD5 algorithm;
  • The check value is calculated from the value composed of the four 32-bit groups obtained by grouping, and can be obtained by concatenating them directly.
  • Step S230: detect whether a data update request is received on the backup storage volume;
  • The request can be detected by checking the working status of the backup timer set on the backup storage volume.
  • Data backup on the backup storage volume is controlled by a timer. When the timer is detected to have fired, step S240 is executed; otherwise, step S250 is executed.
  • Step S240: start the data backup program, obtain the data in the primary storage volume, and save it in the backup storage volume;
  • Step S250: obtain the backup data in the backup storage volume, and calculate the second check value of the backup data according to the MD5 algorithm;
  • There are two cases for calculating the second check value.
  • In one case, during the backup process, the data blocks sent by the primary storage volume to the backup storage volume are sampled, and the digest check value of the sampled data blocks is calculated to obtain the second check value. In the other case, the backup data is obtained directly from the backup storage volume, slice grouping processing is applied to it, and the check value calculated from the sliced data is the second check value.
  • Step S260: compare whether the second check value is the same as the first check value.
  • Step S270: if the second check value is not the same as the first check value, perform a data backup process, or restore the data on the primary storage volume.
  • In this embodiment, calculating the second check value from the transmitted data blocks further includes determining, according to the storage timestamp of the data block, whether the currently transmitted data block is the earliest data block;
  • if it is not, calculating the sub-check values of the data blocks based on the check algorithm includes:
  • obtaining the sub-check value of the data block preceding the currently transmitted data block, and adding it to the sub-check value calculated from the currently transmitted data block to obtain the actual check value of the currently transmitted data block, the actual check value being used as the second check value.
  • In this application, the MD5 values of the data on the primary and backup storage volumes are calculated with the md5 function of the Python library module hashlib. This function can compute the digest directly from the data passed to it.
  • When the data to be processed is too large, it is computed in slices. For example, suppose the primary storage volume holds 5 MB of data.
  • When it is synchronized to the backup volume, it is divided into five 1 MB files.
  • During synchronization, two MD5 values are calculated: one is the MD5 of the current 1 MB slice, and the other is the MD5 of the data spliced together so far. If the second slice is being synchronized, this second MD5 is the MD5 of the first slice followed by the second slice.
  • This calculation method can determine the specific location of a data abnormality. An abnormality is identified when the MD5 values differ, which indicates that the data synchronized between the primary volume and the backup volume is inconsistent and therefore that data synchronization is abnormal.
  • Based on the abnormality, the corresponding backup data can be backed up again, which improves the efficiency of data comparison and reveals promptly whether synchronization is abnormal and at what point in time the abnormal data appeared. Eliminating such abnormalities is a matter of improving product quality, but this method is still used to compare the data and check whether an abnormality occurs during synchronization.
  • To solve the above problems, this application also provides a device for verifying synchronization data of primary and backup storage volumes.
  • The device can be used to implement the method for verifying synchronization data of primary and backup storage volumes provided in the embodiments of this application.
  • Physically, the device takes the form of a local PC or a server, and the specific hardware implementation of the server is shown in FIG. 4.
  • Referring to FIG. 4, the server includes: a processor 301 such as a CPU, a communication bus 302, a user interface 303, a network interface 304, and a memory 305.
  • The communication bus 302 is used to implement connection and communication between these components.
  • The user interface 303 may include a display and an input unit such as a keyboard, and the network interface 304 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface).
  • The memory 305 may be a high-speed RAM memory or a non-volatile memory, such as a magnetic disk memory.
  • Optionally, the memory 305 may also be a storage device independent of the aforementioned processor 301.
  • Those skilled in the art will understand that the hardware structure shown in FIG. 4 does not limit the apparatus for verifying synchronization data of primary and backup storage volumes, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
  • As shown in FIG. 4, the memory 305, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a program for verifying synchronization data of primary and backup storage volumes.
  • The operating system is a program that manages the hardware and software resources of the device and supports the running of the synchronization data verification program and of other software and/or programs.
  • In the hardware structure shown in FIG. 4, the network interface 304 is mainly used to access the network; the user interface 303 is mainly used to communicate with the external Internet or with a remote server that provides enterprise data, and to retrieve the data stored in all databases on the remote server;
  • that data is then analyzed and processed to obtain the corresponding data abnormality structure, and the processor 301 can be used to call the synchronization data verification program of the primary and backup storage volumes stored in the memory 305 and to perform the operations of the following embodiments of the method for verifying synchronization data of primary and backup storage volumes.
  • In the embodiments of this application, the implementation of FIG. 4 may also be a PC terminal, such as a server, with a touch operation platform.
  • The processor of the PC terminal reads, from a buffer or storage unit, program code that implements the method for verifying synchronization data of primary and backup storage volumes,
  • and thereby performs data verification when the primary and backup storage volumes synchronize data.
  • FIG. 5 is a schematic diagram of the functional modules of the apparatus for verifying synchronization data of primary and backup storage volumes provided by an embodiment of this application.
  • In this embodiment, the apparatus includes:
  • a first calculation module 41, configured to obtain data in the primary storage volume, calculate a first check value of the data according to a digest check algorithm, and store the first check value in the data, where
  • the data is one of the following: all data in the primary storage volume, or data to be backed up, and the first check value is used to verify the integrity of the data after it is backed up to the backup storage volume;
  • a second calculation module 42, configured to obtain the backup data in the backup storage volume and calculate the second check value of the backup data according to the digest check algorithm;
  • a check module 43, configured to determine, based on the first check value and the second check value, whether the backup data in the backup storage volume is abnormal, the abnormality being that the data in the backup storage volume is inconsistent with the data in the primary storage volume;
  • a backup module 44, configured to, when the check module determines that the backup data in the backup storage volume is abnormal, start the data backup program to send the data in the primary storage volume to the backup storage volume and delete the original backup data in the backup storage volume.
  • Because this embodiment rests on the same description as the method embodiments above, the content of the apparatus embodiment is not repeated here.
  • In this embodiment, check values are calculated by applying a digest check algorithm separately to the data on the primary and backup storage volumes. When a check value is calculated, the data is first sliced and grouped, and check values are then calculated for the resulting data blocks.
  • The check values are then compared to determine whether the synchronized data is abnormal, and the comparison result determines whether a new backup is needed. Calculating the check values with the MD5 algorithm in this way makes it possible to compare
  • the integrity of the data in the primary and backup storage volumes in far less time, and it also allows part of the data to be compared and updated, so that the backup storage volume stays synchronized and identical with the primary storage volume in real time and loss of primary storage volume data is avoided.
  • This application also provides a computer-readable storage medium.
  • The computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.
  • The computer-readable storage medium stores computer instructions, and when the computer instructions are run on a computer, the computer executes the following steps:
  • obtaining the data in the primary storage volume, calculating the first check value of the data according to the digest check algorithm, and storing the first check value in the data, where the data includes all data in the primary storage volume or data to be backed up, and the first check value is used to verify the integrity of the data after it is backed up to the backup storage volume;
  • From the description of the above implementations, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform. They can also be implemented in hardware, but in many cases the former is the better implementation.
  • Based on this understanding, the technical solution of this application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium (such as ROM/RAM) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the method described in each embodiment of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

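A method, verification apparatus, device, and computer-readable storage medium for verifying synchronization data of primary and backup storage volumes. Check values are calculated by applying a digest check algorithm separately to the data on the primary and backup storage volumes. When a check value is calculated, the data is first sliced and grouped, and check values are then calculated for the resulting data blocks. The check values are compared to determine whether the synchronized data is abnormal, and the comparison result determines whether a new backup is needed. Calculating the check values with the MD5 algorithm in this way makes it possible to compare the integrity of the data in the primary and backup storage volumes in far less time, and it also allows part of the data to be compared and updated, so that the backup storage volume stays synchronized and identical with the primary storage volume in real time and loss of primary storage volume data is avoided.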

Description

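Method, apparatus, device, and storage medium for verifying synchronization data of primary and backup storage volumes
This application claims priority to Chinese patent application No. 201910526266.2, filed with the China Patent Office on June 18, 2019 and entitled "Method, apparatus, device, and storage medium for verifying synchronization data of primary and backup storage volumes", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of big data processing, and in particular to a method, apparatus, device, and computer-readable storage medium for verifying synchronization data of primary and backup storage volumes.
Background
Storage systems are now used very widely, especially by enterprises and individual users for big data storage or for building server systems. Different user groups, however, place different requirements on the security and data protection of a storage system. In particular, more and more systems are built from primary and backup storage volumes. The primary storage volume holds the most data and is updated most often, and it is the main storage unit; the backup storage volume is used to back up the primary storage volume, so that data can be restored from it after the storage system fails.
It is therefore very important that the data backed up in the backup storage volume is complete. In the prior art, each update, synchronization, and integrity check of the data in the backup storage volume usually relies on a full comparison. This requires transmitting all data of the primary storage volume to the backup storage volume item by item and then comparing the records one by one. The inventor realized that this way of comparing and verifying is too time-consuming, occupies considerable processing resources, and has a long work cycle and low efficiency. Traversing and reading large amounts of data is also error-prone, which can make the reading of the source data invalid and unconvincing, and the procedure is not sufficiently automated.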
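Summary
The main purpose of this application is to provide a method, apparatus, device, and computer-readable storage medium for verifying synchronization data of primary and backup storage volumes, in order to solve the technical problem that the existing approach of verifying data by full real-time comparison takes too long and is inefficient.
To achieve the above purpose, this application provides a method for verifying synchronization data of primary and backup storage volumes, the method including:
obtaining data in a primary storage volume, calculating a first check value of the data according to a digest check algorithm, and storing the first check value in the data, where the data includes all data in the primary storage volume or data to be backed up, and the first check value is used to verify the integrity of the data after it is backed up to a backup storage volume;
obtaining backup data in the backup storage volume, and calculating a second check value of the backup data according to the digest check algorithm;
determining, based on the first check value and the second check value, whether the backup data in the backup storage volume is abnormal, where the abnormality is that the data in the backup storage volume is inconsistent with the data in the primary storage volume;
if it is determined that the backup data in the backup storage volume is abnormal, starting a backup program for the data to send the data in the primary storage volume to the backup storage volume, and deleting the original backup data in the backup storage volume.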
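Optionally, the step of calculating the first check value of the data according to the digest check algorithm includes:
slicing the data according to a data slicing algorithm and the storage timestamps of the data to obtain several data blocks;
calculating a sub-check value for each data block based on the digest check algorithm, and calculating the first check value from the sub-check values of the data blocks.
Optionally, calculating a sub-check value for each data block based on the digest check algorithm and calculating the first check value from the sub-check values of the data blocks includes:
processing the data blocks in 512-bit groups to obtain a value composed of four 32-bit groups;
concatenating the four 32-bit groups to obtain the first check value.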
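Optionally, after the step of obtaining the data in the primary storage volume, calculating the first check value of the data according to the digest check algorithm, and storing the first check value in the data, the method further includes:
detecting whether the primary storage volume has received a data update request from the backup storage volume;
if the data update request has been received, transmitting the data blocks in sequence to the backup storage volume for storage according to a preset data transmission protocol, where the data transmission protocol is used to control data transmission between the primary storage volume and the backup storage volume.
Optionally, the step of obtaining the backup data in the backup storage volume and calculating the second check value of the backup data according to the digest check algorithm includes:
receiving the data blocks sent by the primary storage volume, and calculating a digest check value for the data blocks according to the digest check algorithm to obtain the second check value.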
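Optionally, after the step of transmitting the data blocks in sequence to the backup storage volume for storage according to the preset data transmission protocol, the method further includes:
obtaining the storage timestamp at which the backup storage volume stored the data block;
determining, according to the storage timestamp, whether the currently transmitted data block is the earliest data block;
where, if the currently transmitted data block is not the earliest data block, the step of calculating a sub-check value for each data block based on the digest check algorithm includes:
calculating the sub-check value of the currently transmitted data block according to the digest check algorithm;
obtaining the sub-check value of the data block preceding the currently transmitted data block;
adding the sub-check value of the preceding data block to the sub-check value of the currently transmitted data block to obtain the actual check value of the currently transmitted data block, and sending the actual check value to the backup storage volume as the second check value.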
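Optionally, after the step of obtaining the backup data in the backup storage volume and calculating the second check value of the backup data according to the digest check algorithm, the method further includes:
determining whether the timing count of the check timer on the backup storage volume has been reached;
if the timing count of the check timer has been reached, obtaining the current timestamps of the primary storage volume and the backup storage volume;
reading, according to the current timestamps, the data corresponding to the current timestamps from the primary storage volume and the backup storage volume respectively, and comparing the two to obtain a comparison result;
determining, according to the comparison result, whether a data update operation is needed and how the data is to be updated, where the data is updated either in full or in part.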
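In addition, to achieve the above purpose, this application also provides an apparatus for verifying synchronization data of primary and backup storage volumes, the apparatus including:
a first calculation module, configured to obtain data in a primary storage volume, calculate a first check value of the data according to a digest check algorithm, and store the first check value in the data, where the data includes all data in the primary storage volume or data to be backed up, and the first check value is used to verify the integrity of the data after it is backed up to a backup storage volume;
a second calculation module, configured to obtain backup data in the backup storage volume and calculate a second check value of the backup data according to the digest check algorithm;
a check module, configured to determine, based on the first check value and the second check value, whether the backup data in the backup storage volume is abnormal, where the abnormality is that the data in the backup storage volume is inconsistent with the data in the primary storage volume;
a backup module, configured to, when the check module determines that the backup data in the backup storage volume is abnormal, start a backup program for the data to send the data in the primary storage volume to the backup storage volume, and delete the original backup data in the backup storage volume.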
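Optionally, the first calculation module includes a data cutting unit and a calculation unit;
the data cutting unit is configured to slice the data according to a data slicing algorithm and the storage timestamps of the data to obtain several data blocks;
the calculation unit is configured to calculate a sub-check value for each data block based on the digest check algorithm, and to calculate the first check value from the sub-check values of the data blocks.
Optionally, the data cutting unit is configured to process the data blocks in 512-bit groups to obtain a value composed of four 32-bit groups;
the calculation unit is configured to concatenate the four 32-bit groups to obtain the first check value.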
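Optionally, the apparatus for verifying synchronization data of primary and backup storage volumes further includes a detection module and a sending module;
the detection module is configured to detect whether the primary storage volume has received a data update request from the backup storage volume;
the sending module is configured to, when the detection module detects that the data update request has been received, transmit the data blocks in sequence to the backup storage volume for storage according to a preset data transmission protocol, where the data transmission protocol is used to control data transmission between the primary storage volume and the backup storage volume.
Optionally, the second calculation module is configured to receive the data blocks sent by the primary storage volume and to calculate a digest check value for the data blocks according to the digest check algorithm to obtain the second check value.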
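Optionally, the apparatus for verifying synchronization data of primary and backup storage volumes further includes an obtaining module, configured to obtain the storage timestamp at which the backup storage volume stored the data block, and to determine, according to the storage timestamp, whether the currently transmitted data block is the earliest data block;
the calculation unit is configured to, when the currently transmitted data block is not the earliest data block, calculate the sub-check value of the currently transmitted data block according to the digest check algorithm; obtain the sub-check value of the data block preceding the currently transmitted data block; add the sub-check value of the preceding data block to the sub-check value of the currently transmitted data block to obtain the actual check value of the currently transmitted data block; and send the actual check value to the backup storage volume as the second check value.
Optionally, the apparatus for verifying synchronization data of primary and backup storage volumes further includes a judgment module, configured to determine whether the timing count of the check timer on the backup storage volume has been reached; if it has, obtain the current timestamps of the primary storage volume and the backup storage volume; read, according to the current timestamps, the data corresponding to the current timestamps from the primary storage volume and the backup storage volume respectively, and compare the two to obtain a comparison result; and determine, according to the comparison result, whether a data update operation is needed and how the data is to be updated, where the data is updated either in full or in part.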
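In addition, to achieve the above purpose, this application also provides a device for verifying synchronization data of primary and backup storage volumes, the device including a memory, a processor, and a program for verifying synchronization data of primary and backup storage volumes that is stored in the memory and executable on the processor; when executed by the processor, the program implements the steps of the method for verifying synchronization data of primary and backup storage volumes described in any of the above.
In addition, to achieve the above purpose, this application also provides a computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to execute the steps of the above method for verifying synchronization data of primary and backup storage volumes.
To address the problem that verifying data synchronization between primary and backup storage volumes currently takes too long and is inefficient, this application applies a digest check algorithm separately to the data on the primary storage volume and on the backup storage volume, and uses the resulting check values to help compare the integrity of the data in the two volumes. If the comparison shows that the backup is incomplete, the backup storage volume is directed to read the data in the primary storage volume and use it to replace and update its own data. Compared with the existing verification process, verification based on a digest check algorithm takes far less time, which improves the efficiency of comparison during data updates.
In addition, the data can be sliced before the calculation, and a timer can be set to control when verification starts. This automates data verification, saves human effort, avoids the errors of manual comparison, and improves the accuracy of verification.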
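Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a first embodiment of the method for verifying synchronization data of primary and backup storage volumes provided by this application;
FIG. 2 is a schematic flowchart of a second embodiment of the method for verifying synchronization data of primary and backup storage volumes provided by this application;
FIG. 3 is a schematic diagram of the principle of calculating check values by slicing and grouping, provided by an embodiment of this application;
FIG. 4 is a schematic diagram of the functional modules of an embodiment of the apparatus for verifying synchronization data of primary and backup storage volumes provided by this application;
FIG. 5 is a schematic structural diagram of the server involved in the solutions of the embodiments of this application.
Detailed Description
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.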
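The method for verifying synchronization data of primary and backup storage volumes provided in this application relies mainly on the MD5 digest check algorithm: MD5 values are calculated over the storage volumes and used to assist verification, so that synchronization between the primary and backup storage volumes can be checked quickly. First, the data of the primary storage volume is obtained, and its MD5 value is calculated and saved as text. Then, after the data has been transferred from the primary storage volume to the backup storage volume, the data of the backup storage volume is obtained, and its MD5 value is calculated and saved as text. Finally, the MD5 values of the primary and backup storage volumes are compared, which quickly shows whether the data of the two volumes is consistent and whether the transfer went wrong. No manual intervention is needed at any point; the scripts are triggered automatically throughout. Several volumes can be verified at the same time, and the verification processes are independent and do not affect one another. Verification is also active: the primary storage volume calculates its MD5 value at a fixed interval and compares it with the value from the previous period. When it detects that its own MD5 value has changed, the data on the primary storage volume has been updated. A change in the primary storage volume's MD5 value triggers both volumes to calculate and compare MD5 values to check whether the backup storage volume has been updated in step. In this embodiment, the method may be physically implemented on an operating terminal with remote access capability, such as a personal computer (PC), a smartphone, or a server. The embodiments of the synchronization data verification method of this application are proposed on the basis of such a hardware structure.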
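Referring to FIG. 1, FIG. 1 is a flowchart of the method for verifying synchronization data of primary and backup storage volumes provided by an embodiment of this application. In this embodiment, the method specifically includes the following steps:
Step S10: obtain the data in the primary storage volume, calculate the first check value of the data according to the digest check algorithm, and store the first check value in the data;
In this step, the data is one of the following: all data in the primary storage volume, or the data to be backed up. The data to be backed up is a part of the data in the primary storage volume, preferably the data of a certain period of time; the data stored in every period uses the same data format. This can be set according to the specific backup situation.
The first check value is used to verify the integrity of the data to be backed up. That is, when the data to be backed up is copied from the primary storage volume to the backup storage volume, the first check value is used to verify the integrity of that data (now the backup data) after it has reached the backup storage volume.
In this application, the first check value is calculated with the MD5 digest algorithm. Applying the MD5 algorithm to the data obtained from the primary storage volume yields an MD5 value, and this MD5 value is the first check value. It can be used to verify the data in the backup storage volume and check whether the two are consistent.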
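As a minimal sketch of this step, the snippet below streams the bytes of a volume through Python's `hashlib.md5` and returns the hexadecimal digest that plays the role of the first check value. The function name `volume_md5`, the 1 MiB read size, and the in-memory `io.BytesIO` stand-in for a storage volume are illustrative assumptions and do not come from the application.

```python
import hashlib
import io


def volume_md5(stream, read_size=1024 * 1024):
    """Return the MD5 hex digest of everything readable from `stream`."""
    digest = hashlib.md5()
    while True:
        piece = stream.read(read_size)
        if not piece:          # an empty read means end of data
            break
        digest.update(piece)   # feed the data incrementally
    return digest.hexdigest()


# Hypothetical stand-in for the primary storage volume.
primary = io.BytesIO(b"abc")
first_check_value = volume_md5(primary)
```

Reading in fixed-size pieces keeps memory use flat regardless of the volume size, and the result is the same as hashing all of the data in one call.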
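Step S20: obtain the backup data in the backup storage volume, and calculate the second check value of the backup data according to the digest check algorithm;
In this embodiment, the backup data stored in the backup storage volume is essentially the data copied over from the primary storage volume. In practice, the primary and backup storage volumes are a pair of storage devices that verify each other. The backup storage volume is connected to the primary storage volume through a data interface, and a timer makes the backup storage volume read data from the primary storage volume at regular intervals and store it. When storing the data it has read, the backup storage volume may keep the existing data and accumulate, clearing selectively only when its storage space runs short; alternatively, it may clear and replace its data every time it reads from the primary storage volume.
In this step, the second check value is calculated in the same way as in step S10; only the result may differ.
In this embodiment, there are two cases for calculating the second check value: in one it is calculated from the original backup data in the backup storage volume, and in the other it is calculated from the backup data after the backup data in the backup storage volume has changed.
The second check value calculated from the original backup data is the check value that is calculated automatically with the digest check algorithm each time a backup is triggered and passes verification; this check value is stored in the backup data.
The second check value calculated from the changed backup data is calculated with the digest check algorithm each time the backup storage volume has received the data to be backed up from the primary storage volume and before verification is performed. This second check value does not need to be stored; it is calculated in real time and is mainly used to determine whether the data to be backed up was altered or tampered with while it was being copied to the backup storage volume.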
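Step S30: determine, based on the first check value and the second check value, whether the backup data in the backup storage volume is abnormal;
In this step, the first check value and the second check value must first be extracted from the primary and backup storage volumes and then compared. The check values are generally extracted by their storage identifier. Typically, when the first check value is stored in the data, it either carries a unique identifier or is stored at a specific position in the data, so it can be identified and extracted by that identifier or position. The second check value is extracted in the same way.
If both cases of the second check value apply, the first is obtained in the same way as the first check value is extracted, and the second is calculated in real time with the digest check algorithm.
In this step, the abnormality is that the data in the backup storage volume is inconsistent with the data in the primary storage volume. The inconsistency may arise because the backup was not made in time, so that the data is incomplete, or because the data changed during the backup process.
In practice, the check values are all calculated in the same way, and the contents of the primary and backup storage volumes should be identical. The check values are therefore calculated with the same method over what should be the same data. If they agree, the backup data in the backup storage volume is the same as the data in the primary storage volume; if they differ, the cause may have to be determined from the point in time at which the second check value was calculated.
In this embodiment, the second check value is generally calculated at one of two points in time: when the data on the backup storage device has reached a point at which it is due to be updated, or when the backup storage volume has just finished a backup transfer. If the second check value calculated when an update is due differs from the first check value, new data is considered to exist in the primary storage volume, and the backup storage volume executes the backup data update program. If the second check value calculated just after a backup transfer differs from the first check value, an error is considered to have occurred while the backup storage volume was receiving data from the primary storage volume, and the re-backup program must be executed.
When the first and second check values are compared in this step, two cases may arise. In the first, the second check value is calculated from the original backup data in the backup storage volume; the check value obtained in this case is denoted check value A. Only the first check value and check value A need to be compared, and if they differ, step S40 is executed.
In the second case, the second check value comprises the check values calculated in both ways, namely check value A and check value B, where check value B is calculated from the backup data after the backup data in the backup storage volume has changed. The first check value is compared with check value A first. If they differ, the first check value is then compared with check value B. If these also differ, the primary storage volume is asked to send the data to be backed up again and step S40 is executed; if they are the same, the backup step ends and monitoring of data changes on the primary and backup storage volumes continues.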
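Step S40: if it is determined that the backup data in the backup storage volume is abnormal, start the backup program for the data to send the data in the primary storage volume to the backup storage volume, and delete the original backup data in the backup storage volume.
In this step, when the primary storage volume sends the data to be backed up to the backup storage volume, it can slice the data and transmit the resulting data blocks in the order of their storage timestamps. As the data blocks are transmitted, a sub-check value is also calculated for each block, and the backup storage volume can likewise verify the received data blocks one by one.
In this embodiment, calculating the check values with the MD5 algorithm in this way makes it possible to compare the integrity of the data in the primary and backup storage volumes in far less time, and it also allows part of the data to be compared and updated, so that the backup storage volume stays synchronized and identical with the primary storage volume in real time and loss of primary storage volume data is avoided.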
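Steps S10 to S40 can be sketched end to end as follows. This is only an in-memory illustration under stated assumptions: the volumes are modelled as a `bytes` object and a `bytearray`, and the helper names `md5_hex` and `verify_and_resync` are hypothetical rather than taken from the application.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """MD5 hex digest used as the check value."""
    return hashlib.md5(data).hexdigest()


def verify_and_resync(primary: bytes, backup: bytearray) -> bool:
    """Compare check values; on mismatch replace the backup with the primary data.

    Returns True when a re-backup was needed, False when the volumes agreed.
    """
    first_check_value = md5_hex(primary)          # step S10
    second_check_value = md5_hex(bytes(backup))   # step S20
    if first_check_value == second_check_value:   # step S30
        return False
    backup[:] = primary                           # step S40: drop old backup, copy primary
    return True


primary_volume = b"hello world"
backup_volume = bytearray(b"hello wor1d")         # one corrupted byte
needed_resync = verify_and_resync(primary_volume, backup_volume)
```

Only two fixed-length digests are compared, so the cost of the comparison itself does not grow with the size of the volumes; the work lies in computing each digest once per side.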
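In this embodiment, to further improve calculation efficiency, the data in steps S10 and S20 may be sliced before the check value is calculated. The resulting data blocks may also be filtered so that representative blocks, for example blocks with a large amount of data, are selected for comparison.
The step of calculating the first check value of the data according to the digest check algorithm includes:
slicing the data according to a data slicing algorithm and the storage timestamps of the data to obtain several data blocks, where slicing divides the data to be backed up into individual data blocks that each cover an equal time interval;
calculating a sub-check value for each data block based on the digest check algorithm, and calculating the first check value from the sub-check values of the data blocks.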
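The slicing step can be sketched as below, assuming the data arrives as `(timestamp, payload)` records and that each data block gathers the records falling into one fixed-length time window. The record layout, the function names, and in particular the way the sub-check values are combined (hashing their concatenation) are assumptions made for illustration; the application does not fix these details.

```python
import hashlib


def slice_by_time(records, interval):
    """Group (timestamp, payload) records into blocks of equal time span."""
    if not records:
        return []
    ordered = sorted(records, key=lambda record: record[0])
    start = ordered[0][0]
    blocks = {}
    for timestamp, payload in ordered:
        index = int((timestamp - start) // interval)   # which time window
        blocks.setdefault(index, bytearray()).extend(payload)
    return [bytes(blocks[index]) for index in sorted(blocks)]


def sub_check_values(blocks):
    """One MD5 digest per data block."""
    return [hashlib.md5(block).hexdigest() for block in blocks]


def first_check_value(blocks):
    """Assumed combination rule: MD5 over the concatenated sub-check values."""
    joined = "".join(sub_check_values(blocks)).encode("ascii")
    return hashlib.md5(joined).hexdigest()


records = [(0, b"a"), (5, b"b"), (12, b"c"), (31, b"d")]
blocks = slice_by_time(records, interval=10)
```

With a 10-unit interval the four records fall into windows 0, 0, 1, and 3, so three non-empty blocks are produced and empty windows are skipped.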
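In practice, suppose the data to be backed up is divided into four data blocks. The sub-check values can be calculated in two ways: each sub-check value is calculated independently, or the sub-check value of each data block is calculated on top of the sub-check value of the preceding block. In the second way, the sub-check value of a later block verifies both the preceding blocks and the block itself.
Optionally, when the check value of the second data block is calculated, it is the check value of the first data block plus the check value of the second data block.
In this embodiment, the check value is calculated with the MD5 algorithm, and the calculation can be carried out by grouping, as follows:
calculating a sub-check value for each data block based on the digest check algorithm and calculating the first check value from the sub-check values of the data blocks includes:
processing the data blocks in 512-bit groups to obtain a value composed of four 32-bit groups;
concatenating the four 32-bit groups to obtain the first check value.
In practice, MD5 processes the input in 512-bit groups, and each group is divided into sixteen 32-bit sub-groups. After a series of operations, the output of the algorithm consists of four 32-bit groups, which are concatenated into a 128-bit hash value. This 128-bit hash value is the first check value, and it is stored in the data of the primary storage volume. During an actual backup, the primary storage volume can send this value to the backup storage volume together with the data to be backed up, and the backup storage volume can verify the integrity of the backup data directly against the received check value without reading from the primary storage volume again. This suits the case in which the primary and backup storage volumes are not on the same device, and it improves the efficiency of data comparison. If the comparison fails, the backup storage volume simply issues a re-backup request. Sending the check value together with the data is generally suitable when the primary storage volume itself triggers the backup.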
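In practice, the first check value is calculated with the MD5 algorithm as follows:
1. First, pad the data obtained from the primary storage volume so that its bit length modulo 512 equals 448. Padding is always performed, even if the bit length modulo 512 is already 448. The bit length of the data is thus extended to N*512+448, where N is a non-negative integer and may be zero. The padding is done as follows:
Append a single 1 bit to the data, followed by as many 0 bits as are needed to satisfy the condition above.
Then append the length of the data before padding (in bits) as a 64-bit binary number. If the length before padding needs more than 64 bits, only the low-order 64 bits are used.
After these two steps, the bit length of the data is N*512+448+64=(N+1)*512, which is exactly a multiple of 512. This satisfies the length requirement of the later processing.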
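The padding rule can be written out as a short function. The sketch below works on whole bytes, so the single 1 bit followed by 0 bits becomes the byte `0x80` followed by zero bytes. The name `md5_pad` is hypothetical, and the little-endian encoding of the 64-bit length follows RFC 1321 rather than anything stated in the text above.

```python
import struct


def md5_pad(message: bytes) -> bytes:
    """Pad `message` to a multiple of 64 bytes (512 bits) as MD5 requires."""
    bit_length = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF   # keep the low 64 bits
    padded = message + b"\x80"                              # a 1 bit, then seven 0 bits
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)      # fill to 448 mod 512 bits
    return padded + struct.pack("<Q", bit_length)           # 64-bit length, little-endian


short = md5_pad(b"abc")
exact = md5_pad(b"a" * 56)   # already 448 bits: a full extra block is still added
```

The second call shows the "padding is always performed" rule: a 56-byte message is already 448 bits long, yet it grows to 128 bytes instead of 64.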
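2. Initialize the variables (these values normally do not change)
The initial 128-bit value consists of the initial chaining variables, which are used in the first round of operations. Written in big-endian notation, they are:
A=0x01234567,
B=0x89ABCDEF,
C=0xFEDCBA98,
D=0x76543210.
(Each value above is written with the high-order byte at the low memory address and the low-order byte at the high memory address, that is, in big-endian notation. Because MD5 reads 32-bit words in little-endian order, the values of the variables A, B, C, and D in a program are 0x67452301, 0xEFCDAB89, 0x98BADCFE, and 0x10325476 respectively.)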
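3. Process the data groups
The algorithm for each group is as follows:
(1) For the first group, copy the four chaining variables into four other variables: A to a, B to b, C to c, and D to d.
(2) From the second group onward, the chaining variables are the results of processing the previous group, that is, A=a, B=b, C=c, D=d.
The main loop has four rounds (MD4 has only three), and the rounds are very similar. Each round performs 16 operations. Each operation applies a nonlinear function to three of a, b, c, and d, then adds the fourth variable, one sub-group of the text, and a constant to the result. The result is then rotated left by a varying number of bits and added to one of a, b, c, or d. Finally, the result replaces one of a, b, c, or d.
One MD5 computation consists of 64 such operations, divided into 4 groups of 16.
F: a nonlinear function; one function is applied per operation
Mi: a 32-bit word of input data
Ki: a 32-bit constant that differs for each operation. The detailed flow is shown in FIG. 3.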
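The four nonlinear functions used in the operations (one per round) are as follows.
F(X,Y,Z)=(X&Y)|((~X)&Z)
G(X,Y,Z)=(X&Z)|(Y&(~Z))
H(X,Y,Z)=X^Y^Z
I(X,Y,Z)=Y^(X|(~Z))
(& is AND, | is OR, ~ is NOT, ^ is XOR)
A note on these four functions: if the corresponding bits of X, Y, and Z are independent and unbiased, then each bit of the result is also independent and unbiased.
F is a bitwise conditional: if X then Y, else Z. H is the bitwise parity operator.
Let Mj denote the j-th sub-group of the message (j from 0 to 15), and let the constant ti be the integer part of 4294967296*abs(sin(i)), where i ranges from 1 to 64 and is measured in radians. (4294967296=2^(32))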
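Now define:
FF(a,b,c,d,Mj,s,ti) as the operation a=b+((a+F(b,c,d)+Mj+ti)<<<s)
GG(a,b,c,d,Mj,s,ti) as the operation a=b+((a+G(b,c,d)+Mj+ti)<<<s)
HH(a,b,c,d,Mj,s,ti) as the operation a=b+((a+H(b,c,d)+Mj+ti)<<<s)
II(a,b,c,d,Mj,s,ti) as the operation a=b+((a+I(b,c,d)+Mj+ti)<<<s)
Note: "<<<" denotes a circular left shift (rotation), not a plain left shift.
After all of these operations are complete, A, B, C, and D are added to a, b, c, and d respectively,
that is, a=a+A, b=b+B, c=c+C, d=d+D.
The algorithm then continues with the next group of data.
4. Output
The final output is the concatenation of a, b, c, and d, and this concatenation is the first check value.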
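To make the per-group processing concrete, the following sketch implements the 64 operations for a single 512-bit group and checks the result against `hashlib`. It is an illustration, not the applicant's code: the shift amounts in `SHIFTS` and the sub-group index formulas come from RFC 1321 (the text above does not list them), and the names `md5_compress`, `rotate_left`, and `INITIAL_STATE` are hypothetical.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
# Per-operation rotation amounts from RFC 1321 (not given in the text above).
SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# ti = integer part of 2^32 * abs(sin(i)), for i = 1..64 (radians).
T = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]
INITIAL_STATE = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)


def rotate_left(x, s):
    """Circular left shift of a 32-bit value."""
    x &= MASK
    return ((x << s) | (x >> (32 - s))) & MASK


def md5_compress(state, block):
    """Process one 64-byte group and return the updated chaining variables."""
    m = struct.unpack("<16I", block)          # sixteen little-endian 32-bit sub-groups
    a, b, c, d = state
    for i in range(64):
        if i < 16:                            # round 1 uses F
            f, j = (b & c) | (~b & d), i
        elif i < 32:                          # round 2 uses G
            f, j = (b & d) | (c & ~d), (5 * i + 1) % 16
        elif i < 48:                          # round 3 uses H
            f, j = b ^ c ^ d, (3 * i + 5) % 16
        else:                                 # round 4 uses I
            f, j = c ^ (b | ~d), (7 * i) % 16
        new_b = (b + rotate_left(a + f + m[j] + T[i], SHIFTS[i])) & MASK
        a, b, c, d = d, new_b, b, c           # rotate the roles of a, b, c, d
    # Add the results back onto the incoming chaining variables.
    return tuple((x + y) & MASK for x, y in zip(state, (a, b, c, d)))


# b"abc" padded by hand to one 512-bit group: 0x80, zeros, 64-bit length (24 bits).
block = b"abc" + b"\x80" + b"\x00" * 52 + struct.pack("<Q", 24)
digest = struct.pack("<4I", *md5_compress(INITIAL_STATE, block)).hex()
```

The final line is the "output" step: the four 32-bit words are concatenated in little-endian byte order to give the 128-bit check value. In practice `hashlib.md5` should be used; the hand-written version only illustrates the structure described above.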
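In this embodiment, if the primary storage device receives a data update request while the first check value is being calculated, the data update is performed first and the second check value is calculated afterwards. Specifically:
detect whether the primary storage volume has received a data update request from the backup storage volume;
if the data update request has been received, transmit the data blocks in sequence to the backup storage volume for storage according to a preset data transmission protocol, where the data transmission protocol is used to control data transmission between the primary storage volume and the backup storage volume.
Further, after the data to be backed up sent by the primary storage volume has been stored, the check value is calculated from the received data with the MD5 algorithm. To use the MD5 algorithm, the primary and backup storage volumes must agree on the check method in advance through a handshake protocol before the check value can be calculated, which further ensures that the calculated results correspond.
In this case, the step of obtaining the backup data in the backup storage volume and calculating the second check value of the backup data according to the digest check algorithm includes:
receiving the data blocks sent by the primary storage volume, and calculating a digest check value for the data blocks according to the digest check algorithm to obtain the second check value.
Further, if the primary storage volume transmits the data block by block, the second check value in step S20 can be calculated and compared each time a data block is received. This ensures that each received block matches the data of the primary storage volume and further improves backup efficiency. Compared with the prior art, there is no need to wait for the whole backup to finish and then back everything up again; only the data blocks that failed verification need to be requested again. When a block fails verification, its information can be recorded first. After the current backup has finished, the recorded block information is combined into a re-update request and sent to the primary storage volume. The primary storage volume extracts the corresponding data blocks according to the request and sends them to the backup storage volume again, and the backup storage volume writes the received blocks into the corresponding block storage positions.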
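The record-then-re-request behaviour described above can be sketched as follows. Blocks are modelled as list entries and the "re-send" is a plain list assignment; the function names `find_bad_blocks` and `repair` are hypothetical, and a real system would move the data over the agreed transmission protocol instead.

```python
import hashlib


def block_md5(block: bytes) -> str:
    """Check value of a single data block."""
    return hashlib.md5(block).hexdigest()


def find_bad_blocks(expected_digests, received_blocks):
    """Return the indices of received blocks whose digest does not match."""
    return [
        index
        for index, (digest, block) in enumerate(zip(expected_digests, received_blocks))
        if block_md5(block) != digest
    ]


def repair(primary_blocks, received_blocks):
    """Re-send only the blocks that failed verification; return their indices."""
    expected = [block_md5(block) for block in primary_blocks]
    bad = find_bad_blocks(expected, received_blocks)   # recorded during the backup
    for index in bad:                                  # re-requested afterwards
        received_blocks[index] = primary_blocks[index]
    return bad


primary_blocks = [b"one", b"two", b"three"]
received_blocks = [b"one", b"tw0", b"three"]           # block 1 was corrupted in transit
resent = repair(primary_blocks, received_blocks)
```

Because verification is per block, a single damaged block costs one block of retransmission rather than a full re-backup.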
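In this embodiment, if the data backup of the backup storage volume is verified at the point in time when an update is due, then after the data blocks are transmitted in sequence to the backup storage volume for storage according to the preset data transmission protocol, the method further includes:
obtaining the storage timestamp at which the backup storage volume stored the data block;
determining, according to the storage timestamp, whether the currently transmitted data block is the earliest data block;
if the currently transmitted data block is not the earliest data block, calculating a sub-check value for each data block based on the digest check algorithm includes:
calculating the sub-check value of the currently transmitted data block according to the digest check algorithm;
obtaining the sub-check value of the data block preceding the currently transmitted data block;
adding the sub-check value of the preceding data block to the sub-check value of the currently transmitted data block to obtain the actual check value of the currently transmitted data block, and sending the actual check value to the backup storage volume as the second check value.
In this application, the MD5 values of the data on the primary and backup storage volumes are calculated with the md5 function of the Python library module hashlib. This function can compute the digest directly from the data passed to it.
When the data to be processed is too large, it is computed in slices. For example, if the primary storage volume holds 5 MB of data, it is divided into five 1 MB files when it is synchronized to the backup volume. During synchronization, two MD5 values are calculated: one is the MD5 of the current 1 MB slice, and the other is the MD5 of the data spliced together so far. If the second slice is being synchronized, this second MD5 is the MD5 of the first slice followed by the second slice.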
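The two MD5 values per slice can be produced with `hashlib` in a single pass, as in the sketch below. It uses small 1 KiB slices instead of 1 MB files to keep the example light, and the names `chunked_digests` and `first_mismatch` are illustrative assumptions. The running digest relies on the documented behaviour that a `hashlib` object can keep receiving `update()` calls after `hexdigest()` has been read.

```python
import hashlib


def chunked_digests(data: bytes, chunk_size: int):
    """Return (slice MD5, MD5 of all data so far) for every slice of `data`."""
    running = hashlib.md5()
    results = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        running.update(chunk)                       # spliced data so far
        results.append((hashlib.md5(chunk).hexdigest(), running.hexdigest()))
    return results


def first_mismatch(primary_digests, backup_digests):
    """Index of the first slice whose digests differ, or None if all agree."""
    for index, (primary, backup) in enumerate(zip(primary_digests, backup_digests)):
        if primary != backup:
            return index
    return None


data = bytes(range(256)) * 20                       # 5120 bytes: five 1 KiB slices
corrupted = bytearray(data)
corrupted[2500] ^= 0xFF                             # damage a byte inside slice 2
primary_digests = chunked_digests(data, 1024)
backup_digests = chunked_digests(bytes(corrupted), 1024)
bad_slice = first_mismatch(primary_digests, backup_digests)
```

The per-slice digest pinpoints which slice is damaged, while the running digest of the last slice equals the MD5 of the whole data set and so doubles as the overall check value.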
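In this embodiment, after the backup data in the backup storage volume is obtained and the second check value of the backup data is calculated according to the digest check algorithm, the method further includes:
determining whether the timing count of the check timer on the backup storage volume has been reached;
if the timing count of the check timer has been reached, obtaining the current timestamps of the primary storage volume and the backup storage volume;
reading, according to the current timestamps, the data corresponding to the current timestamps from the primary storage volume and the backup storage volume respectively, and comparing the two to obtain a comparison result;
determining, according to the comparison result, whether a data update operation is needed and how the data is to be updated, where the data is updated either in full or in part.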
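One possible way to turn the comparison result into an update decision is sketched below. The text does not say how full and partial updates are chosen, so the rule used here is an assumption: a full update when the block counts differ or every block has changed, a partial update otherwise. The timer itself is not modelled, and the function name `choose_update_mode` is hypothetical.

```python
import hashlib


def choose_update_mode(primary_blocks, backup_blocks):
    """Return (mode, indices) where mode is "none", "partial", or "full"."""
    if len(primary_blocks) != len(backup_blocks):
        # Assumed rule: a different block count forces a full update.
        return "full", list(range(len(primary_blocks)))
    changed = [
        index
        for index, (primary, backup) in enumerate(zip(primary_blocks, backup_blocks))
        if hashlib.md5(primary).digest() != hashlib.md5(backup).digest()
    ]
    if not changed:
        return "none", []
    if len(changed) == len(primary_blocks):
        return "full", changed
    return "partial", changed


primary_blocks = [b"alpha", b"beta", b"gamma"]
mode, indices = choose_update_mode(primary_blocks, [b"alpha", b"BETA", b"gamma"])
```

In a deployment, a function like this would be called from whatever scheduler plays the role of the check timer.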
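With the method provided by the embodiments of this application, comparing the MD5 values reduces the comparison time and also reveals data inconsistencies earlier. For example, if no MD5 values were calculated for the primary and backup storage volumes and the data of the two volumes were instead compared item by item, the comparison time would increase greatly. Such a full comparison also cannot tell which data slice is abnormal, and it cannot start until all of the data has been transferred.
An abnormality is identified when the MD5 values differ, which indicates that the data synchronized between the primary volume and the backup volume is inconsistent, that is, that data synchronization is abnormal. Based on the abnormality, the corresponding backup data is backed up again, which improves the efficiency of data comparison and reveals promptly whether synchronization is abnormal and at what point in time the abnormal data appeared. Eliminating such abnormalities is a matter of improving product quality, but this method is still used to compare the data and check whether an abnormality occurs during synchronization.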
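FIG. 2 is a detailed flowchart of the method for verifying synchronization data of primary and backup storage volumes according to an embodiment of this application. The method specifically includes the following steps:
Step S210: collect all data on the primary storage volume, and perform slice-and-group processing on all the data;
In this step, slice-and-group processing means that the collected data is first divided at fixed time intervals, several data blocks that occupy a large amount of memory are then selected from the divided data and sorted in chronological order, and the data blocks are then processed in 512-bit groups by the MD5 algorithm to obtain a value composed of four 32-bit groups.
Step S220: based on the data after slice-and-group processing, use the MD5 algorithm to calculate the first check value;
In this step, the check value is calculated from the value composed of four 32-bit groups obtained after grouping; optionally, it can be obtained by concatenating the four groups directly.
Step S230: detect whether a data update request is received on the backup storage volume;
In this embodiment, the request can be detected by checking the working status of the backup timer set on the backup storage volume. In practice, data backup on the backup storage volume is controlled by a timer. When the timer is detected to have been triggered, step S240 is executed; otherwise, step S250 is executed.
Step S240: start the data backup program, obtain the data in the primary storage volume, and save it in the backup storage volume;
Step S250: obtain the backup data in the backup storage volume, and calculate the second check value of the backup data according to the MD5 algorithm;
In this embodiment, the second check value is calculated in one of two ways. In the first, data blocks sent by the primary storage volume to the backup storage volume are sampled during the backup process, and the digest check value of those data blocks is calculated to obtain the second check value. In the second, the backup data is obtained directly from the backup storage volume, slice-and-group processing is performed on it, and the check value is calculated from the sliced and grouped data to obtain the second check value.
Step S260: compare whether the second check value is the same as the first check value.
Step S270: if the second check value is not the same as the first check value, back up the data again, or perform recovery processing on the data on the primary storage volume.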
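In this embodiment, calculating the second check value from the transmitted data blocks further includes determining, according to the storage timestamp of the data block, whether the currently transmitted data block is the earliest data block;
if it is not, calculating the sub-check values of the data blocks based on the check algorithm includes:
obtaining the sub-check value of the data block preceding the currently transmitted data block;
adding the sub-check value of the preceding data block to the sub-check value calculated from the currently transmitted data block to obtain the actual check value of the currently transmitted data block, and using the actual check value as the second check value.
In this application, the MD5 values of the data on the primary and backup storage volumes are calculated with the md5 function of the Python library module hashlib. This function can compute the digest directly from the data passed to it.
When the data to be processed is too large, it is computed in slices. For example, if the primary storage volume holds 5 MB of data, it is divided into five 1 MB files when it is synchronized to the backup volume. During synchronization, two MD5 values are calculated: one is the MD5 of the current 1 MB slice, and the other is the MD5 of the data spliced together so far. If the second slice is being synchronized, this second MD5 is the MD5 of the first slice followed by the second slice. This calculation method can determine the specific location of a data abnormality. An abnormality is identified when the MD5 values differ, which indicates that the data synchronized between the primary volume and the backup volume is inconsistent and therefore that data synchronization is abnormal. Based on the abnormality, the corresponding backup data is backed up again, which improves the efficiency of data comparison and reveals promptly whether synchronization is abnormal and at what point in time the abnormal data appeared. Eliminating such abnormalities is a matter of improving product quality, but this method is still used to compare the data and check whether an abnormality occurs during synchronization.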
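To solve the above problems, this application also provides a device for verifying synchronization data of primary and backup storage volumes. The device can be used to implement the method for verifying synchronization data of primary and backup storage volumes provided in the embodiments of this application. Physically, it takes the form of a local PC or a server, and the specific hardware implementation of the server is shown in FIG. 4.
Referring to FIG. 4, the server includes: a processor 301 such as a CPU, a communication bus 302, a user interface 303, a network interface 304, and a memory 305. The communication bus 302 is used to implement connection and communication between these components. The user interface 303 may include a display and an input unit such as a keyboard, and the network interface 304 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 305 may be a high-speed RAM memory or a non-volatile memory, such as a magnetic disk memory. Optionally, the memory 305 may also be a storage device independent of the aforementioned processor 301.
Those skilled in the art will understand that the hardware structure shown in FIG. 4 does not limit the apparatus for verifying synchronization data of primary and backup storage volumes, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
As shown in FIG. 4, the memory 305, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a program for verifying synchronization data of primary and backup storage volumes. The operating system is a program that manages the hardware and software resources of the device and supports the running of the synchronization data verification program and of other software and/or programs.
In the hardware structure of the PC operating platform shown in FIG. 4, the network interface 304 is mainly used to access the network. The user interface 303 is mainly used to communicate with the external Internet or with a remote server that provides enterprise data, to retrieve the data stored in all databases on the remote server, and then to analyze and process that data to obtain the corresponding data abnormality structure. The processor 301 can be used to call the program for verifying synchronization data of primary and backup storage volumes stored in the memory 305 and to perform the operations of the following embodiments of the method.
In the embodiments of this application, the implementation of FIG. 4 may also be a PC terminal, such as a server, with a touch operation platform. The processor of the PC terminal reads, from a buffer or storage unit, program code that implements the method for verifying synchronization data of primary and backup storage volumes, and thereby performs data verification when the primary and backup storage volumes synchronize data.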
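To solve the above problems, an embodiment of this application also provides an apparatus for verifying synchronization data of primary and backup storage volumes. Referring to FIG. 5, FIG. 5 is a schematic diagram of the functional modules of the apparatus provided by an embodiment of this application. In this embodiment, the apparatus includes:
a first calculation module 41, configured to obtain data in the primary storage volume, calculate a first check value of the data according to a digest check algorithm, and store the first check value in the data, where the data is one of the following: all data in the primary storage volume, or data to be backed up, and the first check value is used to verify the integrity of the data after it is backed up to the backup storage volume;
a second calculation module 42, configured to obtain the backup data in the backup storage volume and calculate the second check value of the backup data according to the digest check algorithm;
a check module 43, configured to determine, based on the first check value and the second check value, whether the backup data in the backup storage volume is abnormal, the abnormality being that the data in the backup storage volume is inconsistent with the data in the primary storage volume;
a backup module 44, configured to, when the check module determines that the backup data in the backup storage volume is abnormal, start the backup program for the data to send the data in the primary storage volume to the backup storage volume, and delete the original backup data in the backup storage volume.
Because this embodiment rests on the same description as the method embodiments above, the content of the apparatus embodiment is not repeated here.
In this embodiment, check values are calculated by applying a digest check algorithm separately to the data on the primary and backup storage volumes. When a check value is calculated, the data is first sliced and grouped, and check values are then calculated for the resulting data blocks. The check values are compared to determine whether the synchronized data is abnormal, and the comparison result determines whether a new backup is needed. Calculating the check values with the MD5 algorithm in this way makes it possible to compare the integrity of the data in the primary and backup storage volumes in far less time, and it also allows part of the data to be compared and updated, so that the backup storage volume stays synchronized and identical with the primary storage volume in real time and loss of primary storage volume data is avoided.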
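This application also provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium stores computer instructions which, when run on a computer, cause the computer to execute the following steps:
obtaining data in a primary storage volume, calculating a first check value of the data according to a digest check algorithm, and storing the first check value in the data, where the data includes all data in the primary storage volume or data to be backed up, and the first check value is used to verify the integrity of the data after it is backed up to a backup storage volume;
obtaining backup data in the backup storage volume, and calculating a second check value of the backup data according to the digest check algorithm;
determining, based on the first check value and the second check value, whether the backup data in the backup storage volume is abnormal, where the abnormality is that the data in the backup storage volume is inconsistent with the data in the primary storage volume;
if it is determined that the backup data in the backup storage volume is abnormal, starting a backup program for the data to send the data in the primary storage volume to the backup storage volume, and deleting the original backup data in the backup storage volume.
From the description of the above implementations, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform. They can also be implemented in hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the method described in each embodiment of this application.
The embodiments of this application have been described above with reference to the drawings, but this application is not limited to the specific implementations above, which are illustrative rather than restrictive. In light of this application, those of ordinary skill in the art can devise many other forms without departing from the purpose of this application and the scope protected by the claims. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of this application, or any direct or indirect application in other related technical fields, falls within the protection of this application.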

Claims (20)

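  1. A method for verifying synchronization data of primary and backup storage volumes, the method including the following steps:
    obtaining data in a primary storage volume, calculating a first check value of the data according to a digest check algorithm, and storing the first check value in the data, where the data includes all data in the primary storage volume or data to be backed up, and the first check value is used to verify the integrity of the data after it is backed up to a backup storage volume;
    obtaining backup data in the backup storage volume, and calculating a second check value of the backup data according to the digest check algorithm;
    determining, based on the first check value and the second check value, whether the backup data in the backup storage volume is abnormal, where the abnormality is that the data in the backup storage volume is inconsistent with the data in the primary storage volume;
    if it is determined that the backup data in the backup storage volume is abnormal, starting a backup program for the data to send the data in the primary storage volume to the backup storage volume, and deleting the original backup data in the backup storage volume.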
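  2. The method for verifying synchronization data of primary and backup storage volumes according to claim 1, where the step of calculating the first check value of the data according to the digest check algorithm includes:
    slicing the data according to a data slicing algorithm and the storage timestamps of the data to obtain several data blocks;
    calculating a sub-check value for each data block based on the digest check algorithm, and calculating the first check value from the sub-check values of the data blocks.
  3. The method for verifying synchronization data of primary and backup storage volumes according to claim 2, where calculating a sub-check value for each data block based on the digest check algorithm and calculating the first check value from the sub-check values of the data blocks includes:
    processing the data blocks in 512-bit groups to obtain a value composed of four 32-bit groups;
    concatenating the four 32-bit groups to obtain the first check value.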
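  4. The method for verifying synchronization data of primary and backup storage volumes according to claim 3, further including, after the step of obtaining the data in the primary storage volume, calculating the first check value of the data according to the digest check algorithm, and storing the first check value in the data:
    detecting whether the primary storage volume has received a data update request from the backup storage volume;
    if the data update request has been received, transmitting the data blocks in sequence to the backup storage volume for storage according to a preset data transmission protocol, where the data transmission protocol is used to control data transmission between the primary storage volume and the backup storage volume.
  5. The method for verifying synchronization data of primary and backup storage volumes according to claim 4, where the step of obtaining the backup data in the backup storage volume and calculating the second check value of the backup data according to the digest check algorithm includes:
    receiving the data blocks sent by the primary storage volume, and calculating a digest check value for the data blocks according to the digest check algorithm to obtain the second check value.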
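  6. The method for verifying synchronization data of primary and backup storage volumes according to claim 5, further including, after the step of transmitting the data blocks in sequence to the backup storage volume for storage according to the preset data transmission protocol:
    obtaining the storage timestamp at which the backup storage volume stored the data block;
    determining, according to the storage timestamp, whether the currently transmitted data block is the earliest data block;
    where, if the currently transmitted data block is not the earliest data block, the step of calculating a sub-check value for each data block based on the digest check algorithm includes:
    calculating the sub-check value of the currently transmitted data block according to the digest check algorithm;
    obtaining the sub-check value of the data block preceding the currently transmitted data block;
    adding the sub-check value of the preceding data block to the sub-check value of the currently transmitted data block to obtain the actual check value of the currently transmitted data block, and sending the actual check value to the backup storage volume as the second check value.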
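  7. The method for verifying synchronization data of primary and backup storage volumes according to claim 1, further including, after the step of obtaining the backup data in the backup storage volume and calculating the second check value of the backup data according to the digest check algorithm:
    determining whether the timing count of the check timer on the backup storage volume has been reached;
    if the timing count of the check timer has been reached, obtaining the current timestamps of the primary storage volume and the backup storage volume;
    reading, according to the current timestamps, the data corresponding to the current timestamps from the primary storage volume and the backup storage volume respectively, and comparing the two to obtain a comparison result;
    determining, according to the comparison result, whether a data update operation is needed and how the data is to be updated, where the data is updated either in full or in part.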
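  8. An apparatus for verifying synchronization data of primary and backup storage volumes, the apparatus including:
    a first calculation module, configured to obtain data in a primary storage volume, calculate a first check value of the data according to a digest check algorithm for the data, and store the first check value in the data, where the data is one of the following: all data in the primary storage volume, or data to be backed up, and the first check value is used to verify the integrity of the data after it is backed up to a backup storage volume;
    a second calculation module, configured to obtain backup data in the backup storage volume and calculate a second check value of the backup data according to the digest check algorithm;
    a check module, configured to determine, based on the first check value and the second check value, whether the backup data in the backup storage volume is abnormal, the abnormality being that the data in the backup storage volume is inconsistent with the data in the primary storage volume;
    a backup module, configured to, when the check module determines that the backup data in the backup storage volume is abnormal, start a backup program for the data to send the data in the primary storage volume to the backup storage volume, and replace the backup data.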
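  9. The apparatus for verifying synchronization data of primary and backup storage volumes according to claim 8, where the first calculation module includes a data cutting unit and a calculation unit;
    the data cutting unit is configured to slice the data according to a data slicing algorithm and the storage timestamps of the data to obtain several data blocks;
    the calculation unit is configured to calculate a sub-check value for each data block based on the digest check algorithm, and to calculate the first check value from the sub-check values of the data blocks.
  10. The apparatus for verifying synchronization data of primary and backup storage volumes according to claim 9, where
    the data cutting unit is configured to process the data blocks in 512-bit groups to obtain a value composed of four 32-bit groups;
    the calculation unit is configured to concatenate the four 32-bit groups to obtain the first check value.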
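  11. The apparatus for verifying synchronization data of primary and backup storage volumes according to claim 10, further including a detection module and a sending module;
    the detection module is configured to detect whether the primary storage volume has received a data update request from the backup storage volume;
    the sending module is configured to, when the detection module detects that the data update request has been received, transmit the data blocks in sequence to the backup storage volume for storage according to a preset data transmission protocol, where the data transmission protocol is used to control data transmission between the primary storage volume and the backup storage volume.
  12. The apparatus for verifying synchronization data of primary and backup storage volumes according to claim 11, where the second calculation module is configured to receive the data blocks sent by the primary storage volume and to calculate a digest check value for the data blocks according to the digest check algorithm to obtain the second check value.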
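  13. The apparatus for verifying synchronization data of primary and backup storage volumes according to claim 12, further including:
    an obtaining module, configured to obtain the storage timestamp at which the backup storage volume stored the data block, and to determine, according to the storage timestamp, whether the currently transmitted data block is the earliest data block;
    the calculation unit is configured to, when the currently transmitted data block is not the earliest data block, calculate the sub-check value of the currently transmitted data block according to the digest check algorithm; obtain the sub-check value of the data block preceding the currently transmitted data block; add the sub-check value of the preceding data block to the sub-check value of the currently transmitted data block to obtain the actual check value of the currently transmitted data block; and send the actual check value to the backup storage volume as the second check value.
  14. The apparatus for verifying synchronization data of primary and backup storage volumes according to claim 8, further including:
    a judgment module, configured to determine whether the timing count of the check timer on the backup storage volume has been reached; if it has, obtain the current timestamps of the primary storage volume and the backup storage volume; read, according to the current timestamps, the data corresponding to the current timestamps from the primary storage volume and the backup storage volume respectively, and compare the two to obtain a comparison result; and determine, according to the comparison result, whether a data update operation is needed and how the data is to be updated, where the data is updated either in full or in part.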
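  15. A device for verifying synchronization data of primary and backup storage volumes, the device including a memory, a processor, and a program for verifying synchronization data of primary and backup storage volumes that is stored in the memory and executable on the processor, where the program, when executed by the processor, implements the following steps:
    obtaining data in a primary storage volume, calculating a first check value of the data according to a digest check algorithm, and storing the first check value in the data, where the data includes all data in the primary storage volume or data to be backed up, and the first check value is used to verify the integrity of the data after it is backed up to a backup storage volume;
    obtaining backup data in the backup storage volume, and calculating a second check value of the backup data according to the digest check algorithm;
    determining, based on the first check value and the second check value, whether the backup data in the backup storage volume is abnormal, where the abnormality is that the data in the backup storage volume is inconsistent with the data in the primary storage volume;
    if it is determined that the backup data in the backup storage volume is abnormal, starting a backup program for the data to send the data in the primary storage volume to the backup storage volume, and deleting the original backup data in the backup storage volume.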
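  16. The device for verifying synchronization data of primary and backup storage volumes according to claim 15, where, when the program is executed by the processor to calculate the first check value of the data according to the digest check algorithm, the following steps are included:
    slicing the data according to a data slicing algorithm and the storage timestamps of the data to obtain several data blocks;
    calculating a sub-check value for each data block based on the digest check algorithm, and calculating the first check value from the sub-check values of the data blocks.
  17. The device for verifying synchronization data of primary and backup storage volumes according to claim 16, where, when the program is executed by the processor to calculate a sub-check value for each data block based on the digest check algorithm and to calculate the first check value from the sub-check values of the data blocks, the following steps are included:
    processing the data blocks in 512-bit groups to obtain a value composed of four 32-bit groups;
    concatenating the four 32-bit groups to obtain the first check value.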
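  18. The device for verifying synchronization data of primary and backup storage volumes according to claim 17, where, after the program is executed by the processor to obtain the data in the primary storage volume, calculate the first check value of the data according to the digest check algorithm, and store the first check value in the data, the following steps are further included:
    detecting whether the primary storage volume has received a data update request from the backup storage volume;
    if the data update request has been received, transmitting the data blocks in sequence to the backup storage volume for storage according to a preset data transmission protocol, where the data transmission protocol is used to control data transmission between the primary storage volume and the backup storage volume.
  19. The device for verifying synchronization data of primary and backup storage volumes according to claim 18, where, when the program is executed by the processor to obtain the backup data in the backup storage volume and calculate the second check value of the backup data according to the digest check algorithm, the following steps are included:
    receiving the data blocks sent by the primary storage volume, and calculating a digest check value for the data blocks according to the digest check algorithm to obtain the second check value.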
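  20. A computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to execute the following steps:
    obtaining data in a primary storage volume, calculating a first check value of the data according to a digest check algorithm, and storing the first check value in the data, where the data includes all data in the primary storage volume or data to be backed up, and the first check value is used to verify the integrity of the data after it is backed up to a backup storage volume;
    obtaining backup data in the backup storage volume, and calculating a second check value of the backup data according to the digest check algorithm;
    determining, based on the first check value and the second check value, whether the backup data in the backup storage volume is abnormal, where the abnormality is that the data in the backup storage volume is inconsistent with the data in the primary storage volume;
    if it is determined that the backup data in the backup storage volume is abnormal, starting a backup program for the data to send the data in the primary storage volume to the backup storage volume, and deleting the original backup data in the backup storage volume.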
PCT/CN2019/119090 2019-06-18 2019-11-18 主备存储卷同步数据校验方法、装置、设备及存储介质 WO2020253083A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910526266.2 2019-06-18
CN201910526266.2A CN110413441A (zh) 2019-06-18 2019-06-18 主备存储卷同步数据校验方法、装置、设备及存储介质

Publications (1)

Publication Number Publication Date
WO2020253083A1 true WO2020253083A1 (zh) 2020-12-24

Family

ID=68359242

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/119090 WO2020253083A1 (zh) 2019-06-18 2019-11-18 主备存储卷同步数据校验方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN110413441A (zh)
WO (1) WO2020253083A1 (zh)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
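CN110413441A (zh) * 2019-06-18 2019-11-05 平安科技(深圳)有限公司 Method, apparatus, device, and storage medium for verifying synchronized data of primary and backup storage volumes
CN110958171A (zh) * 2019-11-29 2020-04-03 盛科网络(苏州)有限公司 Data synchronization method and system based on primary and backup devices
CN111427718B (zh) * 2019-12-10 2024-01-23 杭州海康威视数字技术股份有限公司 File backup method, recovery method, and apparatus
CN113051544A (zh) * 2019-12-26 2021-06-29 瑞昱半导体股份有限公司 External device and verification update method thereof
CN111294391A (zh) * 2020-01-17 2020-06-16 深信服科技股份有限公司 Configuration synchronization method, apparatus, device, and readable storage medium
CN111290998A (zh) * 2020-02-12 2020-06-16 平安科技(深圳)有限公司 Method, apparatus, device, and storage medium for proofreading migrated data
CN111400116A (zh) * 2020-03-10 2020-07-10 珠海全志科技股份有限公司 Chip test verification method, computer apparatus, and computer-readable storage medium
CN111586141B (zh) * 2020-04-30 2023-04-07 中国工商银行股份有限公司 Job processing method, apparatus, system, and electronic device
CN111581028A (zh) * 2020-05-12 2020-08-25 上海英方软件股份有限公司 Data-block-based method and system for fast data backup and consistency verification
CN112052141B (zh) * 2020-09-02 2022-04-01 平安科技(深圳)有限公司 Data shard verification method, apparatus, computer device, and readable storage medium
CN112214352B (zh) * 2020-10-16 2023-02-17 天津七所高科技有限公司 Ethernet/IP-based method and apparatus for automatic data backup of welding machine equipment
CN112817792A (zh) * 2021-01-22 2021-05-18 浪潮电子信息产业股份有限公司 Data backup method, apparatus, system, and storage medium for an IaaS system
CN114422531B (zh) * 2022-03-11 2022-07-05 深圳市金政软件技术有限公司 Data synchronization method, system, device, and storage medium
CN114676145B (zh) * 2022-03-22 2023-05-30 阿里云计算有限公司 Data processing method and data verification system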

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
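CN101452410A (zh) * 2007-12-06 2009-06-10 中兴通讯股份有限公司 Data backup system for an embedded database, and data backup and recovery method
JP2009230523A (ja) * 2008-03-24 2009-10-08 Nippon Hoso Kyokai <Nhk> File synchronization device, file synchronization method, and file synchronization program
JP2010211295A (ja) * 2009-03-06 2010-09-24 Mitsubishi Electric Corp Data update device, data update method for a data update device, and data update program
CN103164523A (zh) * 2013-03-19 2013-06-19 华为技术有限公司 Data consistency check method, apparatus, and system
CN107643882A (zh) * 2017-09-29 2018-01-30 昂纳信息技术(深圳)有限公司 Storage and recovery method, system, and storage device for data reliability
CN110413441A (zh) * 2019-06-18 2019-11-05 平安科技(深圳)有限公司 Method, apparatus, device, and storage medium for verifying synchronized data of primary and backup storage volumes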

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6928607B2 (en) * 2000-10-19 2005-08-09 Oracle International Corporation Data integrity verification mechanism
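CN105335443A (zh) * 2014-08-13 2016-02-17 阿里巴巴集团控股有限公司 Method and device for anomaly detection in data synchronization
CN107204852A (zh) * 2017-06-23 2017-09-26 郑州云海信息技术有限公司 Optimization algorithm based on a data consistency check algorithm
CN108762686B (zh) * 2018-06-04 2021-01-01 平安科技(深圳)有限公司 Flow control method, apparatus, electronic device, and storage medium for data consistency verification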

Also Published As

Publication number Publication date
CN110413441A (zh) 2019-11-05

Similar Documents

Publication Publication Date Title
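WO2020253083A1 (zh) Method, apparatus, device, and storage medium for verifying synchronized data of primary and backup storage volumes
TWI751402B (zh) Data synchronization method, distributed system, computer-readable storage medium, computer device, and distributed device
US10176213B2 (en) Method and device for verifying consistency of data of master device and slave device
CN107977473B (zh) Logback-based method and system for retrieving distributed system logs
WO2017215646A1 (zh) Data transmission method and apparatus
CN109918261B (zh) Fault monitoring method, apparatus, device, and computer-readable storage medium
CN112231271A (zh) Data migration integrity verification method, apparatus, device, and computer-readable medium
Xiao et al. Towards web-based delta synchronization for cloud storage services
CN112968907B (zh) Data transmission method, data storage method, data query method, medium, and device
CN108243146B (zh) Information submission method
CN110908910B (zh) Blockchain-based test monitoring method, apparatus, and readable storage medium
CN112822260A (zh) File transmission method and apparatus, electronic device, and storage medium
JP2022553130A (ja) Method, system, electronic device, and storage medium for storing and collecting temperature data
CN110889143A (zh) File verification method and apparatus
US10176068B2 (en) Methods, systems, and computer readable media for token based message capture
CN110830500A (zh) Network attack tracing method, apparatus, electronic device, and readable storage medium
WO2016086638A1 (zh) Method, apparatus, and computer storage medium for implementing link detection
US10949645B2 (en) Method, apparatus, and storage medium for data verification
CN111866106A (zh) Consensus method, apparatus, electronic device, and readable storage medium
CN112667586B (zh) Stream-processing-based data synchronization method, system, device, and medium
CN115883533A (zh) File synchronization method, apparatus, computer device, and storage medium
CN113468574B (zh) Method and apparatus for putting blockchain data on-chain
CN113094437B (zh) Rsync-based blockchain state data synchronization method and system
CN107710165B (zh) Method and apparatus for a storage node to synchronize service requests
CN114172894A (zh) Data transmission method, apparatus, server, and computer device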

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19933901

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19933901

Country of ref document: EP

Kind code of ref document: A1