CN110874284A - Data processing method and device - Google Patents


Publication number
CN110874284A
Authority
CN
China
Prior art keywords
data
blocks
stripe
input data
storing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811021341.1A
Other languages
Chinese (zh)
Other versions
CN110874284B (en)
Inventor
庄灿伟
董元元
赵亚飞
魏舒展
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201811021341.1A
Publication of CN110874284A
Application granted
Publication of CN110874284B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08: Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10: Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1004: Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's to protect a block of data words, e.g. CRC or checksum

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Error Detection And Correction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data processing method and device. The method comprises the following steps: acquiring the data length of input data; determining the number of blocks required by a stripe according to the data length; and performing erasure code (EC) encoding on the input data according to the number of blocks and storing the result as a stripe. The invention solves the technical problem that stripes configured as in the prior art reduce the execution efficiency of the system when data is written and read.

Description

Data processing method and device
Technical Field
The invention relates to the technical field of internet, in particular to a data processing method and device.
Background
Two common methods for reliably storing data in a storage system are multi-copy redundancy and Erasure Code (EC). Unlike multi-copy redundancy, EC encodes m original data blocks to generate k check blocks, which together form a stripe group; the system can then tolerate damage to at most any k original data blocks or check blocks without losing data. Compared with multi-copy redundancy, erasure coding can reduce data storage redundancy by more than 50% without affecting data reliability, thereby greatly reducing the storage cost.
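The redundancy comparison above can be made concrete with a small calculation. The sketch below is illustrative only: the 3-copy scheme and the "6 data blocks + 3 check blocks" EC layout are assumed example parameters, not values taken from this application, and the function names are hypothetical.

```python
def replication_overhead(copies: int) -> float:
    """Extra storage per unit of user data when `copies` full copies are kept."""
    return float(copies - 1)


def ec_overhead(m: int, k: int) -> float:
    """Extra storage per unit of user data for m data blocks plus k check blocks."""
    return k / m


# Assumed example parameters (not from the source text).
three_copy = replication_overhead(3)     # 2 extra units per unit of data
ec_6_3 = ec_overhead(6, 3)               # 0.5 extra units per unit of data
reduction = 1 - ec_6_3 / three_copy      # fraction of redundancy saved

print(f"3-copy overhead: {three_copy:.2f}, EC(6+3) overhead: {ec_6_3:.2f}, "
      f"redundancy reduced by {reduction:.0%}")
```

Under these assumed parameters the redundancy falls by well over half, which is the kind of saving the paragraph above refers to.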
In online applications, the offset and length of a write operation are often not aligned with the stripe group; handling misaligned writes is the largest difference between online EC and offline EC. When a misaligned write is processed, invalid data is added to pad the stripe to completion, and EC encoding and writing are then carried out. This additional padding introduces extra computational and storage overhead. Larger stripes mean more padding and hinder concurrent reading and writing of large data, but stripes cannot be made too small either, because more small-data reads would then be spread across multiple stripes.
The conventional method for configuring the stripe size is mainly the fixed stripe scheme, i.e. the size of the stripe is fixed during data writing.
However, this method has the following problem in configuring the stripe size: with a fixed stripe scheme the number of blocks is difficult to choose, and the scheme incurs large additional computational and storage overhead.
No effective solution has yet been proposed for the above problem, namely that stripes configured as in the prior art reduce system execution efficiency during data writing and reading.
Disclosure of Invention
Embodiments of the present invention provide a data processing method and apparatus, so as to at least solve the technical problem that stripes configured as in the prior art reduce the execution efficiency of a system during data writing and reading.
According to an aspect of an embodiment of the present invention, there is provided a data processing method, including: acquiring the data length of input data; determining the number of blocks required by a stripe according to the data length; and performing erasure code (EC) encoding on the input data according to the number of blocks and storing the result as a stripe.
Optionally, determining the number of blocks required by the stripe according to the data length includes: in the case that the preset erasure code encoding parameters include the number of data blocks in each row and the number of check blocks in each row, calculating the number of rows required for storing the input data according to the number of data blocks in each row; and calculating the number of blocks required by the stripe storing the input data according to the number of rows, the number of data blocks in each row, and the number of check blocks in each row.
Optionally, performing EC encoding on the input data according to the number of blocks and storing the result as a stripe includes: comparing the determined number of blocks with the maximum number of blocks and the minimum number of blocks, respectively; in the case that the number of blocks is greater than the maximum number of blocks, performing EC encoding on the input data according to the maximum number of blocks and storing the result as a stripe; and in the case that the number of blocks is less than the minimum number of blocks, performing EC encoding on the input data according to the minimum number of blocks and storing the result as a stripe.
Optionally, the method further includes: in the case that the number of blocks lies within the interval between the minimum number of blocks and the maximum number of blocks, performing EC encoding on the input data according to the number of blocks and storing the result as a stripe.
Optionally, the method further includes: storing the stripe size of the stripe and the start position of the padding data.
Optionally, the method further includes: reading data from the stripe according to the stripe size of the stripe and the start position of the padding data.
Optionally, the method further includes: when the input data is a data stream with a large data volume, if the data length is greater than the maximum number of blocks, performing EC encoding on the input data according to the maximum number of blocks and storing the result as a stripe, the data stream being stored in multi-copy form.
Optionally, the method further includes: when the input data is a data stream with a small data volume, if the data length is less than the minimum number of blocks, performing EC encoding on the input data according to the minimum number of blocks and storing the result as a stripe, the data stream being stored in single-copy form.
According to an aspect of an embodiment of the present invention, there is provided a data processing apparatus, including: an acquisition module, configured to acquire the data length of input data; an adjusting module, configured to determine the number of blocks required by a stripe according to the data length; and a data generation module, configured to perform EC encoding on the input data according to the number of blocks and store the result as a stripe.
According to an aspect of the embodiments of the present invention, there is provided a storage medium including a stored program, wherein, when the program runs, a device on which the storage medium is located is controlled to perform the above data processing method.
In the embodiment of the invention, stripes are generated dynamically according to the length of the data written by the user: the data length of the input data is acquired; the number of blocks required by the stripe is determined according to the data length; and the input data is EC encoded according to the number of blocks and stored as a stripe. This improves the execution efficiency of the system, reduces the storage and computational overhead caused by extra padding, and thereby solves the technical problem that stripes configured as in the prior art reduce the execution efficiency of the system during data writing and reading.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a block diagram of the hardware architecture of a computer terminal for a data processing method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a data processing method according to a first embodiment of the present invention;
FIG. 3 is a flowchart of dynamic stripe generation in the data processing method according to the first embodiment of the present invention;
FIG. 4 is a diagram comparing fixed stripe data padding and dynamic stripe data padding in the data processing method according to the first embodiment of the present invention;
FIG. 5 is a schematic diagram comparing a single stripe and dynamic stripes for large data reads and writes in the data processing method according to the first embodiment of the present invention;
FIG. 6 is a schematic diagram comparing scattered stripes and a dynamic stripe for small data reads and writes in the data processing method according to the first embodiment of the present invention;
FIG. 7a is a schematic diagram of the dynamic stripe write flow in the data processing method according to the first embodiment of the present invention;
FIG. 7b is a schematic diagram of the dynamic stripe read flow in the data processing method according to the first embodiment of the present invention;
FIG. 8 is a schematic diagram of a data processing apparatus according to a second embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The technical terms related to the present application are:
Erasure coding (EC, Erasure Code): a method of ensuring data reliability in a storage system. Unlike multi-copy redundancy, EC encodes m original data blocks to generate k check blocks, which together form a stripe group; the system can then tolerate damage to at most any k original data blocks or check blocks without losing data.
Online EC: unlike offline EC, online EC performs EC encoding directly in the write path, according to what the user actually writes.
Stripe: EC encodes m original data blocks to generate k check blocks, which together form a stripe;
Stripe size: the total size of all data blocks that one stripe writes on each disk, i.e. the amount of data that the stripe's data blocks on a given disk can hold. For example, if all data blocks of stripe a on disk a can hold 512 KB in total, then 512 KB is the stripe size of stripe a on disk a;
Stripe granularity: the amount of data that each data block can store. For example, if each data block holds 2 KB, then 2 KB is the stripe granularity of stripe a on disk a;
Number of blocks of a stripe: the number of blocks (including both original data blocks and check blocks) stored by the stripe.
For simplicity of description, "data block" hereinafter refers to an original data block.
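To illustrate the stripe definition above, the following minimal sketch builds a "3 data blocks + 1 check block" stripe and recovers one lost block. It assumes a plain XOR parity code, which is the simplest erasure code for k = 1; the application does not state which code is used, and the helper name `xor_blocks` is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# m = 3 original data blocks of equal size (sample contents are arbitrary).
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]

# k = 1 check block: the XOR of all data blocks.
check_block = xor_blocks(data_blocks)
stripe = data_blocks + [check_block]

# Simulate losing any single block and rebuild it from the survivors.
lost_index = 1
survivors = [blk for i, blk in enumerate(stripe) if i != lost_index]
recovered = xor_blocks(survivors)

print(recovered == stripe[lost_index])
```

Tolerating k > 1 lost blocks needs a stronger code than XOR parity; the sketch only shows the m + k stripe structure and the recovery idea.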
Example 1
According to an embodiment of the present invention, a method embodiment of data processing is provided. It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in a different order.
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a computer terminal as an example, FIG. 1 is a block diagram of the hardware structure of a computer terminal for a data processing method according to an embodiment of the present invention. As shown in FIG. 1, the computer terminal 10 may include one or more (only one shown) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microcontroller unit (MCU) or a programmable logic device such as an FPGA), a memory 104 for storing data, and a transmission device 106 for communication functions. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only an illustration and does not limit the structure of the electronic device. For example, the computer terminal 10 may include more or fewer components than shown in FIG. 1, or have a different configuration from that shown in FIG. 1.
The memory 104 may be used to store software programs and modules of application software, such as the program instructions/modules corresponding to the data processing method in the embodiment of the present invention. The processor 102 executes various functional applications and data processing, i.e. implements the above data processing method, by running the software programs and modules stored in the memory 104. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of such a network may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a network interface controller (NIC) that can be connected to other network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
Under the above operating environment, the present application provides a data processing method as shown in FIG. 2. FIG. 2 is a flowchart of the data processing method according to the first embodiment of the present invention. Referring to FIG. 2, the method includes:
Step S202, acquiring the data length of input data;
Step S204, determining the number of blocks required by a stripe according to the data length;
Step S206, performing erasure code (EC) encoding on the input data according to the number of blocks and storing the result as a stripe.
Specifically, in combination with steps S202 to S206, the data processing method provided by the present application can be applied to erasure coding in a storage system. In online applications, the offset and length of a write operation are often not aligned with the stripe group, and handling such misaligned writes requires additional data padding. To avoid this additional resource overhead, and to avoid cases in which the resources allocated by the system cannot meet the data processing requirement, the method generates a stripe that matches the length of the input data. Even for big data storage, a stripe can be generated according to the data length of the big data. This avoids the waste of storage resources caused by a fixed stripe size in the prior art, meets the storage requirement of big data within the storage size of the original system, and further saves the computing and storage resources of the system.
In the embodiment of the invention, stripes are thus generated dynamically according to the length of the data written by the user, through the three steps above. This improves the execution efficiency of the system and reduces the storage and computational overhead caused by extra padding, solving the technical problem that stripes configured as in the prior art reduce the execution efficiency of the system during data writing and reading.
Optionally, determining the number of blocks required by the stripe according to the data length in step S204 includes:
Step S2041, in the case that the preset erasure code encoding parameters include the number of data blocks in each row and the number of check blocks in each row, calculating the number of rows required for storing the input data according to the number of data blocks in each row;
Step S2042, calculating the number of blocks required by the stripe storing the input data according to the number of rows, the number of data blocks in each row, and the number of check blocks in each row.
Specifically, the number of blocks required for a stripe is determined from the data length as:
$$S = \left\lceil \frac{L}{m} \right\rceil \times (m + k)$$
where S denotes the number of blocks, L denotes the data length, m denotes the number of original data blocks, k denotes the number of check blocks, and $\lceil \cdot \rceil$ denotes rounding up.
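A minimal sketch of this calculation follows. It assumes that the data length L is expressed in data blocks, as in the Fig. 4 example later in the description; the function name `stripe_block_count` is hypothetical.

```python
def stripe_block_count(data_len: int, m: int, k: int) -> int:
    """Blocks needed by a stripe: rows of m data blocks, each row adding k check blocks.

    data_len is assumed to be measured in data blocks.
    """
    if data_len <= 0 or m <= 0 or k < 0:
        raise ValueError("data_len and m must be positive, k non-negative")
    rows = -(-data_len // m)      # integer ceiling of data_len / m (step S2041)
    return rows * (m + k)         # rows times blocks per row (step S2042)


# 15 data blocks with a "3 data blocks + 1 check block" row layout.
print(stripe_block_count(15, 3, 1))
```

Integer ceiling division is used instead of floating-point `math.ceil` so that very large lengths are not affected by rounding.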
Optionally, performing EC encoding on the input data according to the number of blocks and storing the result as a stripe in step S206 includes:
Step S2061, comparing the determined number of blocks with the maximum number of blocks and the minimum number of blocks, respectively;
Step S2062, in the case that the number of blocks is greater than the maximum number of blocks, performing EC encoding on the input data according to the maximum number of blocks and storing the result as a stripe;
Step S2063, in the case that the number of blocks is less than the minimum number of blocks, performing EC encoding on the input data according to the minimum number of blocks and storing the result as a stripe;
Step S2064, in the case that the number of blocks lies within the interval between the minimum number of blocks and the maximum number of blocks, performing EC encoding on the input data according to the number of blocks and storing the result as a stripe.
FIG. 3 is a flowchart of dynamic stripe generation in the data processing method according to the first embodiment of the present invention. As shown in FIG. 3, the process of dynamically generating stripes according to the input data length includes the following steps:
Step 1: when input data is received, calculate the number of blocks S required by the stripe according to the data length;
Step 2: judge whether the number of blocks is greater than the configured maximum number of blocks; if yes, execute Step 3; if no, execute Step 4;
Step 3: generate a stripe with the maximum number of blocks, set S = S - max, and continue with the next stripe by repeating Step 1 and Step 2 until the calculated number of blocks no longer exceeds the maximum number of blocks, where S is the number of blocks and max is the maximum number of blocks;
Step 4: judge whether the number of blocks is less than the configured minimum number of blocks; if yes, execute Step 5; if no, execute Step 6;
Step 5: generate a stripe with the minimum number of blocks;
Step 6: generate a stripe with the calculated number of blocks.
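The six steps above can be sketched as a short planning function that returns the block count of each generated stripe. This is an illustrative reading of the Fig. 3 flow rather than the applicant's implementation: the function name `plan_stripes` is hypothetical, the data length is assumed to be measured in data blocks, and the maximum and minimum block numbers are assumed to be whole multiples of one row (m + k).

```python
def plan_stripes(data_len: int, m: int, k: int,
                 max_blocks: int, min_blocks: int) -> list:
    """Return the number of blocks of each stripe generated for one write."""
    if data_len <= 0:
        raise ValueError("data_len must be positive")
    # Step 1: blocks required for the whole write.
    s = -(-data_len // m) * (m + k)
    stripes = []
    # Steps 2 and 3: emit maximum-size stripes while S exceeds the maximum.
    while s > max_blocks:
        stripes.append(max_blocks)
        s -= max_blocks
    # Steps 4 to 6: clamp the remainder to the minimum, or use it as is.
    if s < min_blocks:
        stripes.append(min_blocks)
    else:
        stripes.append(s)
    return stripes


# 15 data blocks, "3 + 1" rows, maximum 4 x 4 blocks, minimum 4 x 1 blocks.
print(plan_stripes(15, 3, 1, max_blocks=16, min_blocks=4))
```

Subtracting `max_blocks` from S directly mirrors the "S = S - max" update in Step 3; recomputing S from the remaining data length would give the same result under the stated multiple-of-a-row assumption.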
In addition, compared with the fixed stripe data padding scheme of the prior art, the data processing method provided by the present application saves storage resources by generating stripes according to the data length. FIG. 4 compares fixed stripe data padding with dynamic stripe data padding in the data processing method according to the first embodiment of the present invention. In this example, EC encoding uses a structure of "3 data blocks + 1 check block" per row, the maximum number of blocks is set to 4 × 4, the minimum number of blocks is set to 4 × 1, and different stripes are drawn with different line styles. To write 15 data blocks, the fixed stripe on the left (with the number of blocks fixed at 4 × 4) must be padded with 9 invalid data blocks, whereas the dynamic stripe on the right needs no invalid data blocks. In the figure, D represents a data block, P represents a check block, and X represents a padding block (i.e. an invalid data block); the left side shows the fixed stripe and the right side shows the dynamic stripe.
Based on the above example of the dynamic stripe data padding scheme, the data processing method provided by the embodiment of the present application generates stripes dynamically according to the length of the data written by the user. In FIG. 4 the input data length is 15 data blocks, and the number of blocks follows from
$$S = \left\lceil \frac{L}{m} \right\rceil \times (m + k)$$
where S represents the number of blocks required by the stripe, L represents the data length, m represents the number of original data blocks, and k represents the number of check blocks.
Here L = 15, and since EC encoding uses a structure of "3 data blocks + 1 check block", m = 3 and k = 1.
Substituting these values into the formula gives:
$$S = \left\lceil \frac{15}{3} \right\rceil \times (3 + 1) = 5 \times 4 = 20$$
As shown in FIG. 4, the number of blocks in the dynamic stripe data padding scheme is therefore 20: on top of the 4 × 4 stripe that holds 12 data blocks, one 4 × 1 row is added, i.e. 3 data blocks and 1 check block. The dynamic stripe does not need to be padded with invalid data blocks; the textured portion of the figure marks the added blocks.
Following the flow shown in FIG. 3, since S is greater than the maximum number of blocks max (20 > 16), a stripe of 16 blocks is first generated according to max, as shown in FIG. 4. The remaining S - max = 20 - 16 = 4 blocks then form a second stripe, i.e. one 4 × 1 row of "3 data blocks + 1 check block". With a stripe granularity of 1 KB, writing the 15 data blocks therefore occupies 20 × 1 KB = 20 KB in total.
By comparison, the fixed stripe data padding scheme shown on the left of FIG. 4 occupies more blocks. Each fixed stripe has a 4 × 4 structure with "3 data blocks + 1 check block" per row, so one stripe holds only 12 data blocks. To store 15 data blocks, the space in the first stripe is insufficient and a second 4 × 4 stripe must be added. Only 3 data blocks are stored in the new stripe, and its remaining 9 data block positions are padded with invalid data blocks, which wastes storage. The dynamic stripe data padding scheme provided by the embodiment of the present application thus effectively reduces the storage and computational overhead caused by extra padding and saves storage resources compared with the existing fixed stripe scheme.
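The padding comparison can be checked numerically. The sketch below counts padded data-block positions under both schemes for the Fig. 4 parameters; the function names are hypothetical, lengths are assumed to be in data blocks, and all stripe sizes are assumed to be whole multiples of one row (m + k).

```python
def fixed_padding(data_blocks: int, m: int, k: int, fixed_blocks: int) -> int:
    """Padding blocks needed when every stripe has exactly fixed_blocks blocks."""
    data_per_stripe = fixed_blocks // (m + k) * m       # data positions per stripe
    stripes = -(-data_blocks // data_per_stripe)        # stripes needed (ceiling)
    return stripes * data_per_stripe - data_blocks


def dynamic_padding(data_blocks: int, m: int, k: int,
                    max_blocks: int, min_blocks: int) -> int:
    """Padding blocks needed when stripes are sized from the data length."""
    row = m + k
    s = -(-data_blocks // m) * row                      # total blocks required
    allocated = 0
    while s > max_blocks:                               # full maximum-size stripes
        allocated += max_blocks
        s -= max_blocks
    allocated += max(s, min_blocks)                     # last stripe, clamped to minimum
    return allocated // row * m - data_blocks


# Fig. 4 parameters: 15 data blocks, "3 + 1" rows, maximum 4 x 4, minimum 4 x 1.
print(fixed_padding(15, 3, 1, fixed_blocks=16))
print(dynamic_padding(15, 3, 1, max_blocks=16, min_blocks=4))
```

With these parameters the fixed scheme pads 9 data-block positions and the dynamic scheme pads none, matching the figures quoted in the text.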
The dynamic stripe data padding scheme provided by the embodiment of the present application is described below for large data block reads and writes and for small data block reads and writes, respectively.
Scheme one: reading and writing large data blocks.
Optionally, the data processing method provided in the embodiment of the present application further includes:
Step S210, when the input data is a data stream with a large data volume, if the data length is greater than the maximum number of blocks, performing EC encoding on the input data according to the maximum number of blocks and storing the result as a stripe, the data stream being stored in multi-copy form.
Specifically, FIG. 5 is a schematic diagram comparing a single stripe and dynamic stripes for large data reads and writes in the data processing method according to the first embodiment of the present invention. When a user inputs a large data block, the dynamic stripe technique automatically divides it into multiple stripes by enforcing a maximum stripe size, so that the concurrency of multiple copies can be exploited and the data can be read quickly from several copies at once. For example, when a user reads data D0 to D7, the single stripe on the left can be read from only one copy, whereas on the right, where a maximum size for a single stripe is set, two copies can be read simultaneously, which greatly improves the reading speed. In the figure, D represents a data block, P represents a check block, and X represents a padding block; the left side shows the single stripe and the right side shows the dynamic stripes.
It should be noted that a large data block in the present application refers to a data stream with a large data volume. Because a maximum number of blocks and a minimum number of blocks are set for the dynamic stripe, if the length of the input data is greater than the maximum number of blocks when erasure codes are encoded from the data blocks and check blocks, a stripe is generated according to the maximum number of blocks and the data stream is stored in multi-copy form. This makes full use of the storage space when storing, increases the reading speed when reading, and realizes distributed storage of data in a distributed storage system.
In addition, the data D0 to D7 is used here only as an example for describing the data processing method provided by the present application, and is not a specific limitation.
Scheme II: reading and writing the small data blocks:
optionally, the data processing method provided in the embodiment of the present application further includes:
step S211, when the input data is a data stream with a small data size, if the data length is smaller than the minimum block number, performing erasure correction code EC coding on the input data according to the minimum block number and storing the erasure correction code EC coding as a stripe, and storing the data stream in a single copy form.
Specifically, as shown in fig. 6, fig. 6 is a schematic diagram illustrating a comparison between small data read-write scattered stripes and dynamic stripes in the data processing method according to the first embodiment of the present invention; when a user inputs a small data block, the dynamic stripe technology enables the small data block not to be dispersed to a plurality of copies by setting a minimum value to automatically fill partial data, so that the problem of small data IO diffusion is avoided, and storage resources are saved. For example, when a user reads small data blocks D0 to D1, the left stripe needs to read two copies, and the right stripe can read from only one copy by setting a single stripe minimum (in this example, setting the minimum block number to be 4 × 2), that is, when the user inputs a small data block, the dynamic stripe technology automatically fills partial data by setting the minimum value so that the small data block is not scattered to multiple copies, thereby avoiding the small data IO scattering problem, and by comparison, the dynamic stripe reading speed is higher than that of scattered stripes. Wherein D represents a data block, P represents a check block, X represents a padding block, the left side is a single stripe, and the right side is a dynamic stripe.
It should be noted that a small data block in the present application refers to a data stream with a small data volume. During erasure coding over the data blocks and check blocks, because a maximum block number and a minimum block number are set for the dynamic stripe, if the input data needs fewer blocks than the minimum block number, the stripe is generated according to the minimum block number and the data stream is stored in single-copy form, which saves space on storage and increases speed on reading.
In addition, the present application uses the data D0 to D1 only as an example for description; any data with which the data processing method provided by the present application can be implemented may be used, and no specific limitation is imposed.
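As a rough illustration of the comparison in fig. 6, the following sketch counts how many copies a read of D0 to D1 touches under two placements. The placement model is an assumption made for illustration only and is not taken from the application: each column stands for one copy, the scattered stripe puts consecutive blocks on consecutive columns, and the dynamic stripe (4 columns × 2 rows, which is only one possible reading of the 4 × 2 minimum) fills a column before moving to the next. The function name `copies_touched` and its parameters are hypothetical.

```python
def copies_touched(block_indices, columns, rows, column_major):
    """Return the set of columns (copies) that hold the given data blocks.

    Hypothetical placement model, not the application's actual layout:
    - column_major=False: block i is placed on column i % columns
    - column_major=True:  a column is filled with `rows` blocks before
      the next column is used
    """
    touched = set()
    for i in block_indices:
        if column_major:
            touched.add((i // rows) % columns)
        else:
            touched.add(i % columns)
    return touched


# Reading D0 and D1 from a scattered stripe (one row, blocks spread out).
scattered = copies_touched([0, 1], columns=4, rows=1, column_major=False)

# Reading D0 and D1 from a dynamic stripe padded to 4 columns x 2 rows.
dynamic = copies_touched([0, 1], columns=4, rows=2, column_major=True)

print(len(scattered), len(dynamic))
```

Under these assumptions the scattered placement touches two copies and the padded dynamic stripe touches one, which matches the qualitative claim in the text.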
Optionally, the data processing method provided in the embodiment of the present application further includes:
in step S208, the stripe size of the stripe and the start position of the padding data are stored.
Specifically, fig. 7a is a schematic diagram of the dynamic stripe write flow in the data processing method according to the first embodiment of the present invention. When data is written to a dynamic stripe, first, the stripe size and the padding data are calculated from the actual length of the received data; second, the data is written according to the stripe size; finally, the stripe size and the start position of the padding data are saved.
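The three write steps can be sketched as follows. This is a minimal sketch under stated assumptions, not the application's implementation: the block size `BLOCK_SIZE`, the in-memory `store` dictionary, the `StripeMeta` record, and the function name `write_dynamic_stripe` are all hypothetical, and check-block (parity) generation is omitted so that only the size, padding, and metadata handling is shown.

```python
import math
from dataclasses import dataclass

BLOCK_SIZE = 4096  # hypothetical block size in bytes


@dataclass
class StripeMeta:
    stripe_size: int    # bytes occupied by data plus padding
    padding_start: int  # byte offset at which padding begins


def write_dynamic_stripe(data: bytes, store: dict, key: str) -> StripeMeta:
    # Step 1: compute the stripe size and the padding from the actual length.
    blocks = max(1, math.ceil(len(data) / BLOCK_SIZE))
    stripe_size = blocks * BLOCK_SIZE
    padding = bytes(stripe_size - len(data))  # zero-filled padding

    # Step 2: write the data according to the stripe size.
    store[key] = data + padding

    # Step 3: save the stripe size and the start position of the padding.
    meta = StripeMeta(stripe_size=stripe_size, padding_start=len(data))
    store[key + ".meta"] = meta
    return meta


store = {}
meta = write_dynamic_stripe(b"x" * 5000, store, "stripe-0")
print(meta)
```

Here 5000 bytes need two 4096-byte blocks, so the stripe occupies 8192 bytes and padding starts at offset 5000.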
Optionally, the data processing method provided in the embodiment of the present application further includes:
in step S209, data is read from the stripe according to the stripe size of the stripe and the start position of the padding data.
Specifically, fig. 7b is a schematic diagram of the dynamic stripe read flow in the data processing method according to the first embodiment of the present invention. When data is read from a dynamic stripe, first, the stripe size and the start position of the padding data are loaded; second, the valid data length is calculated from the start position of the padding data; finally, the data is read according to the stripe size and the valid data length.
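The read steps can be sketched in the same spirit. The sketch assumes that valid data starts at offset 0 of the stripe, so that the start position of the padding equals the valid data length; that assumption, the `StripeMeta` record, and the function name `read_dynamic_stripe` are hypothetical and not taken from the application.

```python
from dataclasses import dataclass


@dataclass
class StripeMeta:
    stripe_size: int    # bytes occupied by data plus padding
    padding_start: int  # byte offset at which padding begins


def read_dynamic_stripe(stripe: bytes, meta: StripeMeta) -> bytes:
    # Step 1: the stripe size and padding start are loaded (passed in as meta).
    # Step 2: derive the valid data length from the padding start position.
    valid_length = min(meta.padding_start, meta.stripe_size)
    # Step 3: read within the stripe size, keeping only the valid data.
    return stripe[:meta.stripe_size][:valid_length]


stored = b"hello" + bytes(3)  # 5 valid bytes followed by 3 padding bytes
print(read_dynamic_stripe(stored, StripeMeta(stripe_size=8, padding_start=5)))
```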
The data processing method provided by the present application dynamically calculates the number of blocks required by a stripe according to the size of the data written by the service, which reduces the storage overhead and computation overhead caused by extra padding. The user may also set the granularity at which the number of blocks required by a stripe is calculated, so that the number of blocks is better suited to the storage unit or the computation unit. In addition, when the user's input data is large, setting a maximum value for a single stripe divides the data into multiple stripes, so that the advantage of multiple copies is fully used for concurrent reads and writes. When the user inputs small data, setting a minimum value for the stripe concentrates the user data on a specific copy, which avoids scattered reads and writes of small data.
The stripe granularity used when calculating the number of blocks required by a stripe can be set according to the computation module and the storage module of the system, so that the number of blocks required by the stripe is better suited to those modules.
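The effect of the single-stripe maximum and minimum can be sketched as a small planning function. This is an illustrative sketch only: the function name `plan_stripes` and the example limits (8 and 12 blocks) are hypothetical, and the choice to pad a short remainder up to the minimum is an assumption rather than something the application states.

```python
def plan_stripes(blocks, min_blocks, max_blocks):
    """Return the block count of each stripe used to hold `blocks` blocks.

    Large inputs are cut into stripes of at most `max_blocks`; whatever
    remains is raised to `min_blocks` if it is smaller (the gap is padding).
    """
    plan = []
    remaining = blocks
    while remaining > max_blocks:
        plan.append(max_blocks)
        remaining -= max_blocks
    plan.append(max(remaining, min_blocks))
    return plan


print(plan_stripes(2, 8, 12))   # small input: one stripe at the minimum
print(plan_stripes(10, 8, 12))  # within the interval: used as is
print(plan_stripes(25, 8, 12))  # large input: split into several stripes
```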
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method of data processing according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
According to an embodiment of the present invention, there is also provided an apparatus for implementing the method for data processing in the first embodiment, and fig. 8 is a schematic diagram of an apparatus for data processing according to a second embodiment of the present invention, as shown in fig. 8, the apparatus includes: an acquisition module 802, an adjustment module 804, and a data generation module 806.
The acquisition module 802 is configured to obtain the data length of input data; the adjusting module 804 is configured to determine the number of blocks required for a stripe according to the data length; and the data generating module 806 is configured to perform erasure code (EC) encoding on the input data according to the number of blocks and store the result as a stripe.
In the embodiment of the present invention, a stripe is generated dynamically according to the length of the data written by the user: the acquisition module obtains the data length of the input data, the adjusting module determines the number of blocks required by the stripe according to the data length, and the data generation module performs erasure code (EC) encoding on the input data according to the number of blocks and stores the result as a stripe. This improves the execution efficiency of the system, reduces the storage overhead and computation overhead caused by extra padding, and thereby solves the technical problem that stripes configured as in the prior art reduce the execution efficiency of the system when data is written and read.
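The division of work among the three modules can be sketched as three small classes. This is a hedged sketch, not the apparatus itself: the class and method names are hypothetical, the block size is a toy value, and a single XOR parity block is swapped in as the simplest possible erasure code because the application does not fix a particular EC algorithm here.

```python
import math
from functools import reduce
from typing import List

BLOCK_SIZE = 4  # hypothetical, deliberately tiny block size in bytes


class AcquisitionModule:
    def data_length(self, data: bytes) -> int:
        return len(data)


class AdjustingModule:
    def block_count(self, length: int) -> int:
        # Data blocks needed for this length, plus one XOR parity block.
        return max(1, math.ceil(length / BLOCK_SIZE)) + 1


class DataGenerationModule:
    def encode(self, data: bytes, blocks: int) -> List[bytes]:
        data_blocks = blocks - 1
        padded = data.ljust(data_blocks * BLOCK_SIZE, b"\0")
        chunks = [padded[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
                  for i in range(data_blocks)]
        # XOR the chunks byte by byte to form the single parity block.
        parity = bytes(reduce(lambda a, b: a ^ b, column)
                       for column in zip(*chunks))
        return chunks + [parity]


data = b"abcdefg"
length = AcquisitionModule().data_length(data)
blocks = AdjustingModule().block_count(length)
stripe = DataGenerationModule().encode(data, blocks)
print(blocks, stripe)
```

With XOR parity, any one lost block of the stripe can be rebuilt by XOR-ing the remaining blocks; a production system would typically use a code that tolerates more than one loss.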
Example 3
According to an aspect of an embodiment of the present invention, there is provided a storage apparatus including a storage medium, wherein the storage medium includes a stored program, and when the program runs, the device on which the storage medium is located is controlled to execute the data processing method of embodiment 1 above.
Example 4
The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store a program code executed by the data processing method provided in the first embodiment.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring the data length of input data; determining the number of blocks required by the stripe according to the data length; performing erasure code (EC) encoding on the input data according to the number of blocks and storing the result as a stripe.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: under the condition that the preset erasure code coding parameters comprise the number of data blocks in each row and the number of check blocks in each row, calculating the number of rows required to store the data length according to the number of data blocks in each row; and calculating the number of blocks required by the stripe for storing the data length according to the number of rows, the number of data blocks in each row, and the number of check blocks in each row.
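The row and block calculation described above can be sketched as follows. The function name `blocks_for_stripe`, the `block_size` parameter, and the example configuration of 4 data blocks and 2 check blocks per row are assumptions introduced for illustration; the application itself only states that the row count follows from the data blocks per row and that the block count follows from the rows and the per-row data and check blocks.

```python
import math


def blocks_for_stripe(data_length, block_size, data_per_row, check_per_row):
    """Return (rows, total_blocks) needed to store `data_length` bytes."""
    data_blocks = math.ceil(data_length / block_size)
    # Rows needed so that all data blocks fit, at least one row.
    rows = max(1, math.ceil(data_blocks / data_per_row))
    # Every row carries its data blocks plus its check blocks.
    total_blocks = rows * (data_per_row + check_per_row)
    return rows, total_blocks


# Eight 4096-byte data blocks with 4 data + 2 check blocks per row.
print(blocks_for_stripe(8 * 4096, 4096, 4, 2))
# A single byte still needs one full row.
print(blocks_for_stripe(1, 4096, 4, 2))
```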
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: performing erasure code (EC) encoding on the input data according to the number of blocks and storing the result as a stripe includes: comparing the determined number of blocks with the maximum number of blocks and the minimum number of blocks, respectively; under the condition that the number of blocks is larger than the maximum number of blocks, performing erasure code (EC) encoding on the input data according to the maximum number of blocks and storing the result as a stripe; and under the condition that the number of blocks is less than the minimum number of blocks, performing erasure code (EC) encoding on the input data according to the minimum number of blocks and storing the result as a stripe.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: under the condition that the number of blocks is within the interval between the maximum number of blocks and the minimum number of blocks, performing erasure code (EC) encoding on the input data according to the number of blocks and storing the result as a stripe.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: the stripe size of the stripe and the start position of the padding data are stored.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: data is read from the stripe according to the stripe size of the stripe and the start position of the padding data.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: when the input data is a data stream with a large data volume, if the data length is larger than the maximum block number, erasure code (EC) encoding is performed on the input data according to the maximum block number and the result is stored as a stripe, and the data stream is stored in multi-copy form.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: when the input data is a data stream with a small data volume, if the data length is smaller than the minimum block number, erasure code (EC) encoding is performed on the input data according to the minimum block number and the result is stored as a stripe, and the data stream is stored in single-copy form.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A method of data processing, comprising:
acquiring the data length of input data;
determining the number of blocks required by the stripe according to the data length;
and performing erasure code (EC) encoding on the input data according to the number of blocks and storing the result as a stripe.
2. The method of data processing according to claim 1, wherein said determining the number of blocks required for a stripe according to the data length comprises:
under the condition that the preset erasure code coding parameters comprise the number of data blocks in each row and the number of check blocks in each row, calculating the number of rows required for storing the input data according to the number of the data blocks in each row;
and calculating the number of blocks required by the stripe for storing the input data according to the number of the rows, the number of the data blocks of each row and the number of the check blocks of each row.
3. The method of data processing according to claim 1 or 2, wherein said EC-encoding and storing said input data as stripes according to said number of blocks comprises:
comparing the determined number of blocks with a maximum number of blocks and a minimum number of blocks, respectively;
performing Erasure Code (EC) encoding on the input data according to the maximum block number and storing the encoded input data as the stripe when the block number is greater than the maximum block number;
and under the condition that the number of blocks is less than the minimum number of blocks, performing erasure code (EC) encoding on the input data according to the minimum number of blocks and storing the encoded input data as the stripe.
4. The method of data processing according to claim 3, wherein the method further comprises:
under the condition that the number of blocks is within the interval between the maximum number of blocks and the minimum number of blocks, performing erasure code (EC) encoding on the input data according to the number of blocks and storing the encoded input data as the stripe.
5. The method of data processing according to claim 1, wherein the method further comprises:
the stripe size of the stripe and the start position of the padding data are stored.
6. The method of data processing according to claim 1, wherein the method further comprises:
data is read from a stripe according to its stripe size and the start position of padding data.
7. The method of data processing according to claim 3, wherein the method further comprises:
when the input data is a data stream with a large data volume, if the data length is larger than the maximum block number, performing erasure code (EC) encoding on the input data according to the maximum block number and storing the encoded input data as the stripe, and storing the data stream in multi-copy form.
8. The method of data processing according to claim 3, wherein the method further comprises:
when the input data is a data stream with a small data volume, if the data length is smaller than the minimum block number, performing erasure code (EC) encoding on the input data according to the minimum block number and storing the encoded input data as the stripe, and storing the data stream in single-copy form.
9. An apparatus for data processing, comprising:
the acquisition module is used for acquiring the data length of the input data;
the adjusting module is used for determining the number of blocks required by the stripe according to the data length;
and the data generation module is used for performing erasure code (EC) encoding on the input data according to the number of blocks and storing the result as a stripe.
10. A storage apparatus, the storage medium comprising a stored program, wherein the program, when executed, controls a device in which the storage medium is located to perform the method of data processing according to any one of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811021341.1A CN110874284B (en) 2018-09-03 2018-09-03 Data processing method and device

Publications (2)

Publication Number Publication Date
CN110874284A true CN110874284A (en) 2020-03-10
CN110874284B CN110874284B (en) 2024-03-22

Family

ID=69716816

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811021341.1A Active CN110874284B (en) 2018-09-03 2018-09-03 Data processing method and device

Country Status (1)

Country Link
CN (1) CN110874284B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090164836A1 (en) * 2007-12-21 2009-06-25 Spansion Llc Error correction in flash memory array
CN104503706A (en) * 2014-12-23 2015-04-08 中国科学院计算技术研究所 Data storing method and data reading method based on disk array
CN105677508A (en) * 2015-12-16 2016-06-15 浪潮(北京)电子信息产业有限公司 Method and system for modifying erasure code data in cloud storage
WO2018001110A1 (en) * 2016-06-29 2018-01-04 中兴通讯股份有限公司 Method and device for reconstructing stored data based on erasure coding, and storage node

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MINSOO RHU: "Memory-less bit-plane coder architecture for JPEG2000 with concurrent column-stripe coding" *
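姜国松: "Research on matrix reconstruction methods in RAID controllers" (RAID控制器中矩阵重构方法研究) *
姜国松; 丁红; 狄平; 谢长生: "Research on a high-performance array architecture" (一种高性能阵列架构研究) *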

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414271A (en) * 2020-03-17 2020-07-14 上海爱数信息技术股份有限公司 Storage method based on self-adaptive storage redundancy strategy
CN111414271B (en) * 2020-03-17 2023-10-13 上海爱数信息技术股份有限公司 Storage method based on self-adaptive storage redundancy strategy
CN113311993A (en) * 2021-03-26 2021-08-27 阿里巴巴新加坡控股有限公司 Data storage method and data reading method
CN113311993B (en) * 2021-03-26 2024-04-26 阿里巴巴创新公司 Data storage method and data reading method
WO2023020136A1 (en) * 2021-08-20 2023-02-23 华为技术有限公司 Data storage method and apparatus in storage system

Also Published As

Publication number Publication date
CN110874284B (en) 2024-03-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant