WO2021073241A1 - Data reading method, apparatus, and device based on disk storage - Google Patents
Data reading method, apparatus, and device based on disk storage
- Publication number
- WO2021073241A1 (PCT/CN2020/109273)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- block
- data
- interval
- block height
- data record
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0604—Improving or facilitating administration, e.g. storage management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/0674—Disk device
- G06F3/0676—Magnetic disk device
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/0679—Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
Definitions
- The embodiments of this specification relate to the field of information technology, and in particular to a method, apparatus, and device for reading data based on disk storage.
- On a database server that provides external services through a centralized blockchain ledger, the ledger itself is persistently stored on disk, so user reads must also go to the disk. Because of the characteristics of a blockchain ledger, a user's data may be randomly distributed across the sectors of the disk, and the usual reading method is inefficient.
- the purpose of the embodiments of the present application is to provide a more efficient data reading solution based on disk storage.
- A data reading method based on disk storage includes: receiving a data read instruction sent by a client, where the read instruction includes a business attribute; obtaining, from a pre-stored index table, the set of location information of the data records corresponding to the business attribute, where the location information includes the block height of the data block in which a data record is located and the offset of the data record within that data block; arranging the block heights in order to generate a block height sequence, and determining M mutually exclusive block height intervals with continuous block heights from the block height sequence; for any block height interval, reading the data blocks corresponding to the block height interval from the disk; and, according to the location information set, querying the data records from the data blocks corresponding to the block height interval and returning them to the client.
- Correspondingly, an embodiment of this specification also provides a data reading device based on disk storage, including: a receiving module, which receives a data read instruction sent by a client, where the read instruction includes a business attribute; a location information acquisition module, which obtains the location information set corresponding to the business attribute from the pre-stored index table, where the location information includes the block height of the data block in which a data record is located and the offset of the data record within that data block; a block height interval generation module, which arranges the block heights in order to generate a block height sequence and determines M mutually exclusive block height intervals with continuous block heights from the block height sequence; a data block reading module, which, for any block height interval, reads the data blocks corresponding to the block height interval from the disk; and a data record reading module, which, according to the location information set, queries the data records from the data blocks corresponding to the block height interval and returns them to the client.
- In the solution provided by the embodiments of this specification, the block heights corresponding to the business attribute are first sorted to obtain the block height sequence, and several block height intervals are then derived from the sequence. The full data blocks at these block heights are read from the disk sequentially, and the data records are then read from the data blocks according to the location information. This reduces the number of disk reads and improves the input/output (IO) efficiency of the disk, thereby improving the reading efficiency of a blockchain ledger stored on disk.
- any one of the embodiments of the present specification does not need to achieve all the above-mentioned effects.
- FIG. 1 is a schematic diagram of the system architecture involved in an embodiment of this specification
- FIG. 2 is a schematic diagram of a process for generating a blockchain ledger provided by an embodiment of this specification
- FIG. 3 is a schematic flowchart of a method for creating an index of data records provided by an embodiment of this specification
- FIG. 4 is a schematic flowchart of a data reading solution based on disk storage provided by an embodiment of this specification
- FIG. 5 is a schematic structural diagram of a data reading device based on disk storage provided by an embodiment of this specification
- FIG. 6 is a schematic structural diagram of a device for configuring the method of the embodiments of this specification
- FIG. 1 is a schematic diagram of a system architecture involved in an embodiment of the specification.
- an enterprise organization can face multiple users, and each user can query the database service provider through its corresponding enterprise organization.
- For example, the organization connected to the database server may be a financial product company, and the data records may be individual users' financial management records at that company; or the connected organization may be a government department, and the data records are the expense details of the public projects managed by that department; or the database server is connected to a hospital, and the data records are patients' medical records; or the database server is connected to a third-party payment agency, and the data records may be individual users' payment records made through the agency; and so on.
- FIG. 2 is a schematic diagram of the process of generating a block chain ledger provided by the embodiment of this specification, including S201 to S203.
- S201 Receive a data record to be stored, and determine a hash value of each data record, where the data record contains business attributes.
- the data records to be stored here can be various consumption records of individual users of the client, or can be business results, intermediate states, and operation records generated when the application server executes business logic based on user instructions.
- Specific business scenarios can include consumption records, audit logs, supply chains, government supervision records, medical records, and so on.
- The business attribute is generally unique within the connected organization.
- Depending on the business scenario, the business attribute can be a user name, user ID number, driver's license number, mobile phone number, unique project number, type of data record (such as a financial package number), and so on.
- For example, when the data record is a user's consumption record, the business attribute is the user ID (a mobile phone number, ID number, user name, etc.) or the hash value obtained by applying a hash algorithm to the user ID; or, for a government agency whose data records are the expense flows of multiple public projects, the business attribute can be the unique number of each project.
- Business attributes can be stored in a specified location in the data record, such as the head or tail of the data record.
- S203 When a preset block forming condition is reached, determine each data record to be written in the data block, and generate an Nth data block including the hash value of the data block and the data record.
- The preset block forming conditions include: the number of data records to be stored reaches a number threshold, for example, every time one thousand data records are received, a new data block is generated and the one thousand data records are written into the block; or the time elapsed since the last block was formed reaches a time threshold, for example, every 5 minutes a new data block is generated and the data records received within those 5 minutes are written into the block.
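The two block forming conditions above can be sketched as a single predicate. This is a minimal illustration only: the function name `should_form_block` and its parameters are hypothetical, and the defaults simply reuse the example thresholds from the text (one thousand records, 5 minutes).

```python
import time

def should_form_block(pending_count, last_block_time, now=None,
                      max_records=1000, max_interval_s=300):
    """Return True when either preset block forming condition is met.

    pending_count   -- number of data records waiting to be stored
    last_block_time -- time (seconds) at which the last block was formed
    """
    now = time.time() if now is None else now
    reached_count = pending_count >= max_records
    reached_time = (now - last_block_time) >= max_interval_s
    return reached_count or reached_time
```

Passing `now` explicitly keeps the check deterministic; a real server would presumably use its own clock and thresholds.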
- N refers to the serial number of the data block.
- the data block is in the form of a block chain, which is arranged sequentially based on the order of the block time, and has strong timing characteristics.
- the block height of the data block increases monotonically based on the sequence of the block time.
- the block height can be a sequence number.
- For example, the block height of the Nth data block is N; the block height can also be generated in other ways, for example, by converting the block formation time through symmetric encryption into large integer data (for example, a 12-bit or 15-bit integer) and using it as the block height.
- the data block at this time is the initial data block.
- The hash value of the current data block (the Nth data block) can be generated based on the hash value of the previous data block (that is, the (N-1)th data block). For example, a feasible way is to determine the hash value of each data record to be written into the Nth block, generate a Merkle tree according to the order of the records in the block, splice the root hash value of the Merkle tree with the hash value of the previous data block, and apply the hash algorithm again to generate the hash value of the current block.
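As a rough sketch of the block hash computation described above: the code below hashes each record, builds a Merkle root in record order, splices the root with the previous block's hash, and hashes again. SHA-256, the duplication of the last node on odd-sized levels, and the function names are assumptions made for illustration; the source does not specify them.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(record_hashes):
    """Fold record hashes pairwise, in block order, into a single root hash."""
    if not record_hashes:
        return sha256(b"")
    level = list(record_hashes)
    while len(level) > 1:
        if len(level) % 2:            # assumption: duplicate the last node
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

def block_hash(prev_hash: bytes, records) -> bytes:
    """Hash of the Nth block from its records and the (N-1)th block's hash."""
    root = merkle_root([sha256(r) for r in records])
    return sha256(root + prev_hash)   # splice root with previous hash, hash again
```

Because the record order feeds into the Merkle root, reordering the records or changing the previous hash yields a different block hash.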
- the hash value of the corresponding data record and the hash value of the data block can be obtained and saved, and integrity verification can be initiated based on the hash value.
- the specific verification method is to recalculate the hash value of the data record itself and the hash value of the data block in the database, and compare with the locally stored hash value.
- Each data block is identified by a hash value, and the hash value of a data block is determined by the content and order of the data records in the data block and by the hash value of the previous data block.
- The user can initiate verification based on the hash value of a data block at any time. Any modification of the content of the data block (including modification of the content or order of the data records in it) makes the hash value calculated during verification inconsistent with the hash value recorded when the data block was generated, which causes the verification to fail and thus achieves immutability under centralization.
- A segment of data blocks can be designated for continuous integrity verification, or continuous integrity verification can start from the initial data block.
- The verification method is to obtain the hash value of the previous data block and, using the same algorithm as when the hash value of the data block was generated, recalculate the hash value of the data block from its own data records and the hash value of the previous data block, for verification.
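The continuous verification described above can be sketched as a walk over the chain that recomputes each block hash from the block's own records and the previous block's hash. For brevity, the hash here is a simplified stand-in (previous hash followed by per-record hashes) rather than a Merkle-based one. The block layout (`records`, `hash`) and the all-zero starting hash are hypothetical.

```python
import hashlib

GENESIS_PREV = b"\x00" * 32  # hypothetical "previous hash" for the initial block

def block_hash(prev_hash, records):
    """Simplified stand-in: hash the previous hash plus each record hash, in order."""
    h = hashlib.sha256(prev_hash)
    for record in records:
        h.update(hashlib.sha256(record).digest())
    return h.digest()

def build_chain(blocks_records):
    chain, prev = [], GENESIS_PREV
    for records in blocks_records:
        cur = block_hash(prev, records)
        chain.append({"records": list(records), "hash": cur})
        prev = cur
    return chain

def verify_chain(chain):
    """Return the index of the first block that fails verification, or None."""
    prev = GENESIS_PREV
    for i, block in enumerate(chain):
        if block_hash(prev, block["records"]) != block["hash"]:
            return i
        prev = block["hash"]
    return None

chain = build_chain([[b"r1", b"r2"], [b"r3"], [b"r4", b"r5"]])
```

Tampering with any record changes the recomputed hash of its block, so `verify_chain` reports that block.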
- FIG. 3 is a schematic flow chart of a method for creating an index of data records provided by an embodiment of this specification. The flow specifically includes the following steps S301 to S305.
- S301 In the block chain ledger, for any data record, obtain the business attributes contained in the data record.
- the specific location and acquisition method of the business attributes can be negotiated in advance by the database server and the docking organization.
- For example, the business attribute can be obtained from a specified offset in the data record, or its start and end positions can be marked by specific characters; or the docking organization can splice a header containing the business attribute directly at the beginning of each data record when uploading, so that the database server can obtain the business attribute of each data record directly from the header.
- S303 Determine location information of the data record in the ledger, where the location information includes the block height of the data block where the data record is located, and the offset in the data block where the data record is located.
- A blockchain ledger is composed of multiple data blocks, and a data block usually contains multiple data records. Therefore, in the embodiments of this specification, the location information specifically refers to which data block in the ledger a data record is stored in, and where in that data block it is located.
- the hash value of the data block is a hash value obtained by hash calculation based on the previous block's hash value and its own data record, which can be used to uniquely and unambiguously identify a data block.
- For example, the block height of the first data block is 0, and the block height increases by 1 for each additional data block; or the block formation time of a data block can be converted into a monotonically increasing sequence of large integers (usually 12 to 15 bits) and used as the block height. Therefore, a data block usually has a definite block height.
- Once a data block is generated, the order of its data records is fixed, so the serial number of a data record within the data block is also definite.
- The serial number can therefore also be used to specify the location of a data record in the data block in which it is located. That is, the serial number can also be used to indicate the offset of the data record in the data block.
- the address offset of each data record in the data block can also be used to identify the data records in the data block respectively.
- the address offset of each data record is not the same.
- The specific format of the data block can be customized (for example, the metadata and remark information contained in the block header, the format adopted for the block height, etc.); in different formats the content of the location information will also differ, which does not constitute a limitation on this solution.
- S305 Establish a correspondence between the business attribute and the location information, and write it to an index with the business attribute as the primary key.
- the index is an inverted index.
- the primary key is the business attribute contained in the data record.
- the specific writing method is: when the primary key in the index does not include the designated identification field, an index record with the designated identification field as the primary key is created in the index table.
- the location information is written into the index record where the designated identification field is located. It should be noted that the writing here is not an overwriting writing, but the location information is added to the value of the index record, and it is stored in the index record alongside other location information.
- Table 1 is an exemplary index table provided in the embodiment of this specification.
- the Key is the specific value of the business attribute, and each array in the Value part is a piece of position information.
- The first part of each array is the block height, and the second part is the serial number of the data record in the data block; a data record can be uniquely determined by its block height and serial number. It is easy to understand that, in the index table, one key can correspond to multiple pieces of location information.
- In this way, the business attribute of each data record and its storage location in the ledger are determined, the correspondence between the two is established, and an index with the business attribute as the primary key is created. Without knowing the user's business details, statistics on data records can be computed from the index based on business attributes, and subsequent query and verification can be performed.
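A minimal sketch of such an index, assuming each record is a dict whose hypothetical `attr` field holds the business attribute and that the offset is the record's serial number within its block. Location information is appended, not overwritten, so one key accumulates several `(block height, serial number)` pairs.

```python
from collections import defaultdict

def build_index(ledger):
    """Build {business attribute: [(block_height, serial_number), ...]}.

    ledger -- iterable of (block_height, [record, ...]); the 'attr' key of
              each record is a hypothetical field holding the business attribute.
    """
    index = defaultdict(list)
    # Visit blocks in ascending block height so that each value list follows
    # the order of the data records in the ledger.
    for height, records in sorted(ledger, key=lambda block: block[0]):
        for serial_number, record in enumerate(records):
            index[record["attr"]].append((height, serial_number))
    return dict(index)

ledger = [
    (3, [{"attr": "0X123456", "body": "c"}]),
    (2, [{"attr": "0X123456", "body": "a"},
         {"attr": "0XABCDEF", "body": "b"}]),
]
index = build_index(ledger)
```

Here `index["0X123456"]` holds two locations, one in block 2 and one in block 3, in ledger order.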
- Within an index record, the location information can also be arranged in the order of the data records in the ledger, which facilitates the user's query and verification.
- The order of data records in the ledger can be reflected by the timestamp at which a data record was written into the ledger (that is, the block timestamp of its data block), and the order of data records within the same data block can be reflected by their order in that data block.
- the status query and statistics of the business attributes can be performed based on the index table. For example, receiving a query request containing the specific value of a business attribute (generally, the query request can be sent in the form of an instruction).
- A disk refers to a storage device that uses magnetic recording technology to store data, including floppy disks and hard disks.
- The process of reading and writing data is generally to first issue an instruction that tells the disk the starting sector position, then gives the number of consecutive sectors to read or write from that starting sector (possibly just one), and also states whether the action is a read or a write.
- After the disk receives this instruction, it reads or writes data as the instruction requires.
- continuous/random IO will appear.
- continuous and random refers to whether the initial sector address given by this IO and the end sector address of the last IO are continuous or not much apart. If so, this IO should be regarded as a continuous IO, otherwise it is regarded as a random IO.
- The time used by one IO = seek time + data transfer time. Since the seek time is several orders of magnitude larger than the transfer time, the key factor that affects IOPS is the seek time. With continuous IO, because the starting sector of this IO is very close to the ending sector of the last one, the magnetic head hardly needs to change tracks, or the track change time is extremely short; if the distance is too large, the magnetic head needs a long track change time, and with many random IOs the head keeps changing tracks and efficiency drops sharply.
- The data blocks in the ledger are generally stored on the disk in order of their serial numbers. Assuming that a data block in the ledger occupies roughly the same size as a disk sector (or occupies multiple sectors), reading one data block at a time is equivalent to reading one sector from the disk. As mentioned above, because users often store data irregularly, a user may store a lot of data within a short period and so write several adjacent data blocks, or may store some data records at intervals, so that the user's data is relatively scattered within the ledger.
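To make the continuous/random distinction concrete, the sketch below counts how many IOs in a request stream would need a seek. An IO counts as continuous when its starting sector follows the previous IO's ending sector within a tolerance. This is a counting model under assumed names (`count_seeks`, `max_gap`), not a measurement of real disk timing.

```python
def count_seeks(requests, max_gap=0):
    """Count IOs that are not continuous with the previous IO.

    requests -- list of (start_sector, sector_count) in issue order
    max_gap  -- how many sectors may lie between the previous end and the
                next start for the IO to still count as continuous
    """
    seeks = 0
    prev_end = None
    for start, count in requests:
        gap = None if prev_end is None else start - (prev_end + 1)
        if gap is None or not (0 <= gap <= max_gap):
            seeks += 1                 # head must move to a new position
        prev_end = start + count - 1
    return seeks

sequential = [(10, 4), (14, 4), (18, 4)]   # each IO starts where the last ended
scattered = [(2, 1), (300, 1), (18, 1)]    # every IO lands somewhere else
```

In this model the sequential stream needs a single seek while the scattered stream needs one per IO, which is the effect the block height intervals aim to exploit.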
- FIG. 4 is a schematic flowchart of a data reading solution based on disk storage provided by an embodiment of this specification. The process specifically includes the following steps S401 to S409.
- S401 Receive a data read instruction sent by a client, where the read instruction includes a business attribute.
- Data reading can come from the docking organization, or it can be from the service user of the docking organization. Therefore, the database can perform matching from the index table according to the specific value of the business attribute. For example, after Table 1 is created, the user enters a query command, Retrieve (0X123456, &v, FULL).
- S403 Obtain the set of location information corresponding to the business attribute from a pre-stored index table, where the location information includes the block height of the data block in which a data record is located and the offset of the data record within that data block.
- the database server can obtain the location information (2,08), (2,10), (300,89), (300,999) of the corresponding data record of the user "0X123456" from the index table.
- S405 Arrange the block heights in order to generate a block height sequence, and determine M mutually exclusive block height intervals with continuous block heights from the block height sequence.
- the block height sequence refers to the sequence in which the block heights are arranged in ascending order. For example, for location information (2,08), (2,10), (300,89), (300,999), the obtained block heights are 2 and 300, and the block height sequence "2,300" is obtained by sorting.
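The step from location information to block height sequence can be sketched in a few lines, using the example values above. The function name is hypothetical; duplicated block heights are collapsed because each block only needs to be read once.

```python
def block_height_sequence(locations):
    """Distinct block heights of the location information, in ascending order."""
    return sorted({height for height, _offset in locations})

locations = [(300, 89), (2, 8), (2, 10), (300, 999)]
sequence = block_height_sequence(locations)   # [2, 300]
```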
- In practice, the block height sequence obtained often looks like this: "1, 2, 4, 5, 6, 9, 11, 13, 18, 23, 25, 27, 50, 51, 53, 55, 99, 130, 131, 155".
- the number of blocks may be tens of thousands or more. In this case, if you read each block in turn, it is obviously random IO, which is too inefficient.
- each block height interval does not contain the same block height, that is, each block height interval does not overlap.
- the principle of determining the block height interval is: the invalid block height in the block height interval (that is, the block height that is not in the block height sequence) should not be too much. Otherwise, too many invalid data blocks are read, which also affects the reading efficiency of data records.
- the embodiment of this specification provides an exemplary block height interval determination method, which is specifically as follows:
- The next traversal starts from block height 18. Because the gap in the pair (18, 23) exceeds the preset value 3, "18" is not included in any block height interval; the second block height interval is determined as [23, 27], the third block height interval is [50, 55], the fourth block height interval is [130, 131], and so on, until the last block height in the block height sequence.
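One way to sketch this interval determination: walk the sorted block heights and keep extending the current run while the gap to the next height does not exceed the preset value. Treating a gap equal to the preset value as still inside the interval is an assumption, chosen because it makes the sketch consistent with the example above, where [23, 27] is the second interval. The function and parameter names are hypothetical.

```python
def split_intervals(heights, max_gap=3):
    """Group sorted block heights into intervals and leftover single heights.

    Adjacent heights stay in the same interval while their gap is <= max_gap.
    A height with no close neighbour is returned separately as a single.
    """
    seq = sorted(set(heights))
    intervals, singles = [], []
    i = 0
    while i < len(seq):
        j = i
        while j + 1 < len(seq) and seq[j + 1] - seq[j] <= max_gap:
            j += 1
        if j > i:
            intervals.append((seq[i], seq[j]))   # closed interval [S, E]
        else:
            singles.append(seq[i])
        i = j + 1
    return intervals, singles

heights = [1, 2, 4, 5, 6, 9, 11, 13, 18, 23, 25, 27,
           50, 51, 53, 55, 99, 130, 131, 155]
intervals, singles = split_intervals(heights, max_gap=3)
# intervals -> [(1, 13), (23, 27), (50, 55), (130, 131)]
# singles   -> [18, 99, 155]
```

The resulting intervals never overlap, and the heights 18, 99, and 155 remain outside any interval.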
- Because the data blocks themselves are stored on the disk in block height order, the data blocks of an interval can be read from the disk in a batch, in that order.
- With continuous reading, some invalid data blocks will be read (that is, data blocks that contain none of the user's data records), but based on the aforementioned principle the number of invalid data blocks in a block height interval is small.
- The number of track changes of the disk head during IO is reduced, thereby improving the efficiency of reading data blocks.
- The data blocks that are read are placed in the cache or memory of the database server, where read and write efficiency is greatly improved compared with the disk, so there is basically no impact on read and write speed.
- The block heights in the block height sequence may not all fall within a block height interval, such as the aforementioned block heights "18" and "99". Such scattered block heights cannot be discarded; they can still be read with individual random reads.
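The read step can be sketched as one seek plus one sequential read per interval, followed by individual reads for the scattered heights. The sketch assumes fixed-size blocks stored in block height order, so that block `h` starts at byte `h * BLOCK_SIZE`. That layout, the block size, and the in-memory file standing in for the disk are all hypothetical.

```python
import io

BLOCK_SIZE = 16  # hypothetical fixed block size in bytes

def read_interval(f, start, end, block_size=BLOCK_SIZE):
    """Read blocks start..end (inclusive) with one seek and one sequential read."""
    f.seek(start * block_size)
    data = f.read((end - start + 1) * block_size)
    return {start + i: data[i * block_size:(i + 1) * block_size]
            for i in range(len(data) // block_size)}

def read_blocks(f, intervals, singles, block_size=BLOCK_SIZE):
    """Batch-read every interval, then read each scattered height on its own."""
    blocks = {}
    for start, end in intervals:
        blocks.update(read_interval(f, start, end, block_size))
    for height in singles:
        blocks.update(read_interval(f, height, height, block_size))
    return blocks

# Stand-in for the ledger on disk: 30 blocks laid out in block height order.
ledger_file = io.BytesIO(
    b"".join(f"block-{h:04d}".ljust(BLOCK_SIZE).encode() for h in range(30))
)
blocks = read_blocks(ledger_file, intervals=[(1, 13), (23, 27)], singles=[18])
```

The interval reads also bring in invalid blocks such as heights 3 and 7; they are held in memory and ignored when the records are looked up.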
- S409 Query the data records from the data blocks corresponding to the block height interval according to the location information set, and return them to the client. Specifically, in memory, the data blocks obtained by the aforementioned reading are queried one by one according to the block heights and offsets obtained in step S403.
- In the solution provided by the embodiments of this specification, the block heights corresponding to the business attribute are first sorted to obtain the block height sequence, and several block height intervals are then derived from the sequence. The full data blocks at these block heights are read from the disk sequentially, and the data records are then read from the data blocks according to the location information. This reduces the number of track changes during disk reads and improves the IO efficiency of the disk, thereby improving the reading efficiency of a blockchain ledger stored on disk.
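Once the data blocks are in memory, the lookup is a direct index by block height and offset. In this sketch the offset is taken to be the record's serial number within its block, and the in-memory block layout (a dict of lists) is hypothetical.

```python
def fetch_records(blocks, locations):
    """Return the data records named by (block_height, offset) pairs.

    blocks    -- {block_height: [record, ...]} already read into memory
    locations -- location information set obtained from the index table
    """
    return [blocks[height][offset] for height, offset in locations]

blocks = {2: ["r0", "r1", "r2"], 300: ["x0", "x1"]}
records = fetch_records(blocks, [(2, 1), (300, 0)])
```

Blocks that were read only because they fall inside an interval are never touched by this lookup.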
- As a variant, a further filtering condition can be set: the length of a block height interval must not be less than a preset value. For example, requiring the length of a block height interval to be at least 4 filters out block height intervals that contain few valid block heights. This improves the grouping efficiency of the block height sequence and thereby increases the reading speed.
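The filtering variant can be sketched by grouping as before and then keeping only runs that contain at least a preset number of valid block heights. Interpreting "length" as the number K of block heights from the sequence, and sending the heights of rejected runs to individual reads, are assumptions of this sketch; the names are hypothetical.

```python
def split_with_min_count(heights, max_gap=3, min_count=4):
    """Group block heights into intervals, keeping only runs with >= min_count heights."""
    seq = sorted(set(heights))
    intervals, singles, run = [], [], []

    def flush():
        # Keep the run as an interval only if it holds enough valid heights;
        # otherwise its heights fall back to individual reads.
        if len(run) >= min_count:
            intervals.append((run[0], run[-1]))
        else:
            singles.extend(run)

    for h in seq:
        if run and h - run[-1] > max_gap:
            flush()
            run.clear()
        run.append(h)
    flush()
    return intervals, singles

heights = [1, 2, 4, 5, 6, 9, 11, 13, 18, 23, 25, 27,
           50, 51, 53, 55, 99, 130, 131, 155]
intervals, singles = split_with_min_count(heights, max_gap=3, min_count=4)
```

With these example values, only [1, 13] and [50, 55] survive the filter; the heights 23, 25, 27, 130, and 131 are read individually along with 18, 99, and 155.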
- FIG. 5 is a schematic structural diagram of a data reading device based on disk storage provided by an embodiment of this specification, including the following modules.
- the receiving module 501 receives a data read instruction sent by the client, where the read instruction includes a business attribute.
- The location information obtaining module 503 obtains the location information set corresponding to the business attribute from the pre-stored index table, where the location information includes the block height of the data block in which a data record is located and the offset of the data record within that data block.
- the block height interval generating module 505 sequentially arranges the block heights to generate a block height sequence, and determines M mutually exclusive block height intervals with continuous block height from the block height sequence.
- The data block reading module 507, for any block height interval, reads the data blocks corresponding to the block height interval from the disk.
- the data record reading module 509 searches for the data record obtained from the data block corresponding to the block height interval according to the position information set, and returns it to the client.
- The block height interval generation module 505 traverses the block height sequence; starting from the block heights that have not yet been assigned to a block height interval, it determines in turn the gap between each two adjacent block heights, and takes the earlier block height of the first pair whose gap is less than the preset value as the interval start point S_M; then, starting from block height S_M, it determines in turn the gap between each two adjacent block heights, and takes the earlier block height of the first pair whose gap is greater than the preset value as the interval end point E_M, generating the Mth block height interval [S_M, E_M].
- The block height interval generation module 505 determines the number K of block heights of the block height sequence that fall within the block height interval, and generates the Mth block height interval [S_M, E_M] when K is not lower than the preset value.
- the device further includes an index generation module 511.
- The index generation module 511, in the blockchain ledger, for any data record, obtains the business attribute contained in the data record; determines the location information of the data record in the ledger, where the location information includes the block height of the data block in which the data record is located and the offset of the data record within that data block; and establishes the correspondence between the business attribute and the location information and writes it to an index with the business attribute as the primary key.
- the index generating module 511 determines the time stamp of the data record; in the same index record, according to the sequence of the time stamp, the position information of the data record is written into the value of the index record in order.
- the device further includes a data block generation module 513, which receives the data records to be stored and determines the hash value of each data record, wherein the data record contains business attributes.
- when N>1, the hash value of the N-th data block is determined from the data records to be written into the data block and the hash value of the (N-1)-th data block, and the N-th data block containing that hash value and the data records is generated, where the block height of the data blocks increases monotonically in order of block-forming time.
- the preset block-forming condition includes: the number of data records to be stored reaches the quantity threshold; or the time elapsed since the last block-forming moment reaches the time threshold.
- the embodiments of this specification also provide a computer device, which includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the data reading method shown in FIG. 4 when executing the program.
- the device may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050.
- the processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040 realize the communication connection between each other in the device through the bus 1050.
- the processor 1010 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and executes the related programs to realize the technical solutions provided in the embodiments of this specification.
- the memory 1020 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static storage device, dynamic storage device, etc.
- the memory 1020 may store an operating system and other application programs. When the technical solutions provided in the embodiments of this specification are implemented by software or firmware, related program codes are stored in the memory 1020 and called and executed by the processor 1010.
- the input/output interface 1030 is used to connect an input/output module to realize information input and output.
- the input/output module can be built into the device as a component (not shown in the figure), or it can be connected to the device externally to provide the corresponding functions.
- the input device may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc.
- an output device may include a display, a speaker, a vibrator, an indicator light, and the like.
- the communication interface 1040 is used to connect a communication module (not shown in the figure) to realize the communication interaction between the device and other devices.
- the communication module can realize communication through wired means (such as USB, network cable, etc.), or through wireless means (such as mobile network, WIFI, Bluetooth, etc.).
- the bus 1050 includes a path to transmit information between various components of the device (for example, the processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040).
- although the above device shows only the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040, and the bus 1050, in a specific implementation the device may also include other components necessary for normal operation.
- the above-mentioned device may also include only the components necessary to implement the solutions of the embodiments of the present specification, and not necessarily include all the components shown in the figures.
- the embodiment of the present specification also provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the data reading method shown in FIG. 4 is implemented.
- Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology.
- the information can be computer-readable instructions, data structures, program modules, or other data.
- Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
- a typical implementation device is a computer.
- the specific form of the computer can be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email sending and receiving device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
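The interval generation attributed above to the block height interval generation module 505 can be illustrated with a minimal Python sketch. The function name `block_height_intervals` and the parameters `max_gap` and `min_count` are hypothetical stand-ins for the two "preset values" in the text. The sketch opens an interval when a gap is strictly less than `max_gap` and closes it at the first gap strictly greater than `max_gap`, which is one reading of the wording; the text does not say how a gap exactly equal to the preset value should be treated, nor how block heights left outside every interval are read.

```python
from typing import Iterable, List, Tuple


def block_height_intervals(
    heights: Iterable[int], max_gap: int, min_count: int = 2
) -> List[Tuple[int, int]]:
    """Group block heights into disjoint intervals [S_M, E_M] (illustrative only)."""
    seq = sorted(set(heights))  # the block height sequence, duplicates dropped
    intervals: List[Tuple[int, int]] = []
    i = 0
    while i < len(seq) - 1:
        if seq[i + 1] - seq[i] < max_gap:
            # seq[i] is the earlier height of a close pair: interval start S_M.
            start = end = i
            # Extend until the next gap is greater than max_gap: interval end E_M.
            while end + 1 < len(seq) and seq[end + 1] - seq[end] <= max_gap:
                end += 1
            # Keep the interval only if it covers at least K = min_count heights.
            if end - start + 1 >= min_count:
                intervals.append((seq[start], seq[end]))
            i = end + 1
        else:
            i += 1  # isolated height, not placed in any interval by this sketch
    return intervals


if __name__ == "__main__":
    # prints [(2, 5), (100, 102)]
    print(block_height_intervals([5, 2, 3, 100, 102, 101, 500], max_gap=3))
```

Each resulting interval can then be fetched from disk in one sequential pass, at the cost of also reading blocks inside the interval that were not requested (block 4 in the example).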
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Key | Value |
---|---|
0X123456 | (2,08),(2,10),(300,89),(300,999) |
344X0001 | (5,01),(8,22) |
…… | …… |
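As a rough illustration of how an index table like the one above could be built and consulted, the following Python sketch maps each business attribute (the key) to a list of (block height, offset) pairs written in timestamp order. The function names, the tuple layout of the input records, and the timestamps are assumptions made for this example only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Location = Tuple[int, int]  # (block height, offset within the data block)


def build_index(
    records: Iterable[Tuple[str, float, int, int]]
) -> Dict[str, List[Location]]:
    """records: (business attribute, timestamp, block height, offset)."""
    index = defaultdict(list)
    # Write locations into each index record in timestamp order.
    for attr, _timestamp, height, offset in sorted(records, key=lambda r: r[1]):
        index[attr].append((height, offset))
    return dict(index)


def block_height_sequence(locations: Iterable[Location]) -> List[int]:
    """Ordered, de-duplicated block heights for one business attribute."""
    return sorted({height for height, _offset in locations})


if __name__ == "__main__":
    records = [  # timestamps are made up for the example
        ("0X123456", 40.0, 300, 999),
        ("0X123456", 10.0, 2, 8),
        ("344X0001", 15.0, 5, 1),
        ("0X123456", 30.0, 300, 89),
        ("0X123456", 20.0, 2, 10),
        ("344X0001", 25.0, 8, 22),
    ]
    index = build_index(records)
    print(index["0X123456"])                         # [(2, 8), (2, 10), (300, 89), (300, 999)]
    print(block_height_sequence(index["0X123456"]))  # [2, 300]
```

Several records can share one block height, so the block height sequence is typically shorter than the location list.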
Claims (15)
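- A disk storage-based data reading method, comprising: receiving a data read instruction sent by a client, wherein the read instruction contains a business attribute; obtaining, from a pre-stored index table, the set of location information corresponding to the business attribute, wherein the location information includes the block height of the data block in which a data record is located and the offset of the data record within that data block; arranging the block heights in order to generate a block height sequence, and determining from the block height sequence M mutually exclusive block height intervals of consecutive block heights; for any block height interval, reading from the disk the data blocks corresponding to the block height interval; and querying, according to the set of location information, the data records from the data blocks corresponding to the block height interval, and returning them to the client.
- The method according to claim 1, wherein determining M mutually exclusive intervals of consecutive block heights from the block height sequence comprises: traversing the block height sequence; starting from the first position not yet assigned to a block height interval, determining the gap between each two adjacent block heights in turn, and taking the earlier block height, at the point where the gap is less than a preset value, as the interval start point S_M; and, starting from block height S_M, determining the gap between each two adjacent block heights in turn, and taking the earlier block height, at the point where the gap is greater than the preset value, as the interval end point E_M, to generate the M-th block height interval [S_M, E_M].
- The method according to claim 2, wherein generating the M-th block height interval [S_M, E_M] further comprises: determining the number K of block heights of the block height sequence that fall within the block height interval, and generating the M-th block height interval [S_M, E_M] when K is not lower than a preset value.
- The method according to claim 1, wherein the preset index table is generated in advance as follows: in a blockchain-type ledger, for any data record, obtaining the business attribute contained in the data record; determining the location information of the data record in the ledger, the location information including the block height of the data block in which the data record is located and the offset within that data block; and establishing the correspondence between the business attribute and the location information, and writing an index with the business attribute as the primary key.
- The method according to claim 4, wherein writing the index with the business attribute as the primary key comprises: determining the timestamps of the data records; and, within the same index record, writing the location information of the data records into the value of the index record in timestamp order.
- The method according to claim 4, wherein the data blocks in the blockchain-type ledger are generated in advance as follows: receiving data records to be stored and determining the hash value of each data record, wherein the data records contain business attributes; when a preset block-forming condition is reached, determining the data records to be written into a data block and generating the N-th data block containing the hash value of the data block and the data records, specifically: when N=1, the hash value and block height of the initial data block are given in a preset manner; when N>1, the hash value of the N-th data block is determined from the data records to be written into the data block and the hash value of the (N-1)-th data block, and the N-th data block containing the hash value of the N-th data block and the data records is generated, wherein the block height of the data blocks increases monotonically in order of block-forming time.
- The method according to claim 6, wherein the preset block-forming condition comprises: the number of data records to be stored reaching a quantity threshold; or the time elapsed since the last block-forming moment reaching a time threshold.
- A disk storage-based data reading apparatus, comprising: a receiving module that receives a data read instruction sent by a client, wherein the read instruction contains a business attribute; a location information obtaining module that obtains, from a pre-stored index table, the set of location information corresponding to the business attribute, wherein the location information includes the block height of the data block in which a data record is located and the offset of the data record within that data block; a block height interval generation module that arranges the block heights in order to generate a block height sequence and determines from the block height sequence M mutually exclusive block height intervals of consecutive block heights; a data block reading module that, for any block height interval, reads from the disk the data blocks corresponding to the block height interval; and a data record reading module that queries, according to the set of location information, the data records from the data blocks corresponding to the block height interval, and returns them to the client.
- The apparatus according to claim 8, wherein the block height interval generation module traverses the block height sequence; starting from the first position not yet assigned to a block height interval, determines the gap between each two adjacent block heights in turn, and takes the earlier block height, at the point where the gap is less than a preset value, as the interval start point S_M; and, starting from block height S_M, determines the gap between each two adjacent block heights in turn, and takes the earlier block height, at the point where the gap is greater than the preset value, as the interval end point E_M, to generate the M-th block height interval [S_M, E_M].
- The apparatus according to claim 9, wherein the block height interval generation module determines the number K of block heights of the block height sequence that fall within the block height interval, and generates the M-th block height interval [S_M, E_M] when K is not lower than a preset value.
- The apparatus according to claim 8, further comprising an index generation module that, in a blockchain-type ledger and for any data record, obtains the business attribute contained in the data record; determines the location information of the data record in the ledger, the location information including the block height of the data block in which the data record is located and the offset within that data block; and establishes the correspondence between the business attribute and the location information, and writes an index with the business attribute as the primary key.
- The apparatus according to claim 11, wherein the index generation module determines the timestamps of the data records and, within the same index record, writes the location information of the data records into the value of the index record in timestamp order.
- The apparatus according to claim 11, further comprising a data block generation module that receives data records to be stored and determines the hash value of each data record, wherein the data records contain business attributes; and, when a preset block-forming condition is reached, determines the data records to be written into a data block and generates the N-th data block containing the hash value of the data block and the data records, specifically: when N=1, the hash value and block height of the initial data block are given in a preset manner; when N>1, the hash value of the N-th data block is determined from the data records to be written into the data block and the hash value of the (N-1)-th data block, and the N-th data block containing the hash value of the N-th data block and the data records is generated, wherein the block height of the data blocks increases monotonically in order of block-forming time.
- The apparatus according to claim 13, wherein the preset block-forming condition comprises: the number of data records to be stored reaching a quantity threshold; or the time elapsed since the last block-forming moment reaching a time threshold.
- A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1 to 7.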
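The block generation recited in the claims above (a preset initial block, each later block hash derived from its data records and the previous block hash, and a block formed once a count or time threshold is reached) can be sketched in Python as follows. The claims do not fix a hash function or an exact way of combining the inputs; SHA-256 over the concatenated hex digests, the dictionary layout, and all names below are assumptions made for illustration.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# N = 1: the hash value and block height of the initial data block are preset.
# The concrete values here are placeholders, not taken from the source.
GENESIS: Dict = {"height": 1, "hash": "0" * 64, "records": []}


def should_form_block(pending: int, seconds_since_last: float,
                      count_threshold: int, time_threshold: float) -> bool:
    """Block-forming condition: enough pending records, or enough time elapsed."""
    return pending >= count_threshold or seconds_since_last >= time_threshold


def next_block(prev: Dict, records: List[bytes]) -> Dict:
    """N > 1: derive the block hash from the record hashes and the previous hash."""
    record_hashes = [sha256_hex(r) for r in records]
    block_hash = sha256_hex((prev["hash"] + "".join(record_hashes)).encode("ascii"))
    # Block height grows by one per block, so it is monotonic in forming order.
    return {"height": prev["height"] + 1, "hash": block_hash, "records": list(records)}


if __name__ == "__main__":
    pending = [b"record-a", b"record-b", b"record-c"]
    if should_form_block(len(pending), 0.2, count_threshold=3, time_threshold=5.0):
        block = next_block(GENESIS, pending)
        print(block["height"], block["hash"][:16])
```

Because each block hash depends on the previous one, changing any stored record changes the hash of its block and of every later block, which is what makes the ledger tamper-evident.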
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/723,117 US20220236910A1 (en) | 2019-10-18 | 2022-04-18 | Disk storage-based data reading methods and apparatuses, and devices |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910992775.4A CN110879687B (zh) | 2019-10-18 | 2019-10-18 | Disk storage-based data reading method, apparatus, and device |
CN201910992775.4 | 2019-10-18 |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/723,117 Continuation US20220236910A1 (en) | 2019-10-18 | 2022-04-18 | Disk storage-based data reading methods and apparatuses, and devices |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021073241A1 (zh) | 2021-04-22 |
Family
ID=69728022
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/109273 WO2021073241A1 (zh) | 2019-10-18 | 2020-08-14 | 一种基于磁盘存储的数据读取方法、装置及设备 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20220236910A1 (zh) |
CN (1) | CN110879687B (zh) |
WO (1) | WO2021073241A1 (zh) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110879687B (zh) * | 2019-10-18 | 2021-03-16 | 蚂蚁区块链科技(上海)有限公司 | Disk storage-based data reading method, apparatus, and device |
CN113296683B (zh) * | 2020-04-07 | 2022-04-29 | 阿里巴巴集团控股有限公司 | Data storage method, apparatus, server, and storage medium |
CN112783927B (zh) * | 2021-01-27 | 2023-03-17 | 浪潮云信息技术股份公司 | Database query method and system |
CN115840541B (zh) * | 2023-02-23 | 2023-06-13 | 成都体育学院 | Sports data storage method, system, and medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180139278A1 (en) * | 2016-11-14 | 2018-05-17 | International Business Machines Corporation | Decentralized immutable storage blockchain configuration |
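CN109739843A (zh) * | 2018-12-26 | 2019-05-10 | 篱笆墙网络科技有限公司 | Blockchain data reading and writing method, system, device, and storage medium |
CN110162526A (zh) * | 2019-04-18 | 2019-08-23 | 阿里巴巴集团控股有限公司 | Method, apparatus, and device for querying data records in a blockchain-type ledger |
CN110175188A (zh) * | 2019-05-31 | 2019-08-27 | 杭州复杂美科技有限公司 | Blockchain state data caching and query method, device, and storage medium |
CN110879687A (zh) * | 2019-10-18 | 2020-03-13 | 支付宝(杭州)信息技术有限公司 | Disk storage-based data reading method, apparatus, and device |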
Family Cites Families (44)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4490782A (en) * | 1981-06-05 | 1984-12-25 | International Business Machines Corporation | I/O Storage controller cache system with prefetch determined by requested record's position within data block |
US4533995A (en) * | 1981-08-03 | 1985-08-06 | International Business Machines Corporation | Method and system for handling sequential data in a hierarchical store |
US4636946A (en) * | 1982-02-24 | 1987-01-13 | International Business Machines Corporation | Method and apparatus for grouping asynchronous recording operations |
US4583166A (en) * | 1982-10-08 | 1986-04-15 | International Business Machines Corporation | Roll mode for cached data storage |
US4625081A (en) * | 1982-11-30 | 1986-11-25 | Lotito Lawrence A | Automated telephone voice service system |
WO1997029426A1 (fr) * | 1996-02-09 | 1997-08-14 | Sony Corporation | Information processor, file name changing method, and recording medium on which a file name changing program is recorded |
MY138481A (en) * | 2001-05-17 | 2009-06-30 | Sony Corp | Data distribution system, terminal apparatus, distribution center apparatus, high-efficiency encoding method, high-efficiency encoding apparatus, encoded data decoding method, encoded data decoding apparatus, data transmission method, data transmission apparatus, sub information attaching method, sub information attaching apparatus, and recording medium |
US7058783B2 (en) * | 2002-09-18 | 2006-06-06 | Oracle International Corporation | Method and mechanism for on-line data compression and in-place updates |
US9396103B2 (en) * | 2007-06-08 | 2016-07-19 | Sandisk Technologies Llc | Method and system for storage address re-mapping for a memory device |
US20090271562A1 (en) * | 2008-04-25 | 2009-10-29 | Sinclair Alan W | Method and system for storage address re-mapping for a multi-bank memory device |
US8527482B2 (en) * | 2008-06-06 | 2013-09-03 | Chrysalis Storage, Llc | Method for reducing redundancy between two or more datasets |
US8914567B2 (en) * | 2008-09-15 | 2014-12-16 | Vmware, Inc. | Storage management system for virtual machines |
US9460178B2 (en) * | 2013-01-25 | 2016-10-04 | Dell Products L.P. | Synchronized storage system operation |
CN103500224B (zh) * | 2013-10-18 | 2016-03-16 | 税友软件集团股份有限公司 | Data writing method and apparatus, and data reading method and apparatus |
US10248681B2 (en) * | 2014-07-08 | 2019-04-02 | Sap Se | Faster access for compressed time series data: the block index |
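CN104778015B (zh) * | 2015-02-04 | 2018-02-16 | 深圳神州数码云科数据技术有限公司 | Disk array performance optimization method and system |
CN110050474A (zh) * | 2016-12-30 | 2019-07-23 | 英特尔公司 | Type naming and blockchain for sub-objects of a composite object in an Internet of Things network |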
US11941279B2 (en) * | 2017-03-10 | 2024-03-26 | Pure Storage, Inc. | Data path virtualization |
US20220334725A1 (en) * | 2017-03-10 | 2022-10-20 | Pure Storage, Inc. | Edge Management Service |
US11675520B2 (en) * | 2017-03-10 | 2023-06-13 | Pure Storage, Inc. | Application replication among storage systems synchronously replicating a dataset |
US11089105B1 (en) * | 2017-12-14 | 2021-08-10 | Pure Storage, Inc. | Synchronously replicating datasets in cloud-based storage systems |
US10891384B2 (en) * | 2017-10-19 | 2021-01-12 | Koninklijke Kpn N.V. | Blockchain transaction device and method |
US11528611B2 (en) * | 2018-03-14 | 2022-12-13 | Rose Margaret Smith | Method and system for IoT code and configuration using smart contracts |
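CN109003078B (zh) * | 2018-06-27 | 2021-08-24 | 创新先进技术有限公司 | Blockchain-based smart contract invocation method and apparatus, and electronic device |
CN108898390B (zh) * | 2018-06-27 | 2021-01-12 | 创新先进技术有限公司 | Blockchain-based smart contract invocation method and apparatus, and electronic device |
CN109345386B (zh) * | 2018-08-31 | 2020-04-14 | 阿里巴巴集团控股有限公司 | Blockchain-based transaction consensus processing method and apparatus, and electronic device |
CN109379397B (zh) * | 2018-08-31 | 2019-12-06 | 阿里巴巴集团控股有限公司 | Blockchain-based transaction consensus processing method and apparatus, and electronic device |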
US10454498B1 (en) * | 2018-10-18 | 2019-10-22 | Pure Storage, Inc. | Fully pipelined hardware engine design for fast and efficient inline lossless data compression |
JP6838260B2 (ja) * | 2018-11-14 | 2021-03-03 | カウリー株式会社 | Blockchain control method |
CN109714412B (zh) * | 2018-12-25 | 2021-08-10 | 深圳前海微众银行股份有限公司 | Block synchronization method, apparatus, device, and computer-readable storage medium |
US11018848B2 (en) * | 2019-01-02 | 2021-05-25 | Bank Of America Corporation | Blockchain management platform for performing asset adjustment, cross sectional editing, and bonding |
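CN110264187B (zh) * | 2019-01-23 | 2021-06-04 | 腾讯科技(深圳)有限公司 | Data processing method, apparatus, computer device, and storage medium |
CN110008203B (zh) * | 2019-01-31 | 2023-06-06 | 创新先进技术有限公司 | Data clearing method, apparatus, and device |
CN110190963B (zh) * | 2019-04-04 | 2020-09-01 | 阿里巴巴集团控股有限公司 | Method, apparatus, and device for monitoring time service certificate generation requests |
CN110162523B (zh) * | 2019-04-04 | 2020-09-01 | 阿里巴巴集团控股有限公司 | Data storage method, system, apparatus, and device |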
US10990705B2 (en) * | 2019-04-18 | 2021-04-27 | Advanced New Technologies Co., Ltd. | Index creation for data records |
CN110162662B (zh) * | 2019-04-18 | 2023-02-28 | 创新先进技术有限公司 | Method, apparatus, and device for verifying data records in a blockchain-type ledger |
US11327676B1 (en) * | 2019-07-18 | 2022-05-10 | Pure Storage, Inc. | Predictive data streaming in a virtual storage system |
US10783277B2 (en) * | 2019-05-31 | 2020-09-22 | Alibaba Group Holding Limited | Blockchain-type data storage |
US11115189B2 (en) * | 2019-06-03 | 2021-09-07 | Advanced New Technologies Co., Ltd. | Verifying a blockchain-type ledger |
US10963453B2 (en) * | 2019-06-03 | 2021-03-30 | Advanced New Technologies Co., Ltd. | Service identifier-based data indexing |
US10791122B2 (en) * | 2019-07-04 | 2020-09-29 | Alibaba Group Holding Limited | Blockchain user account data |
US10795874B2 (en) * | 2019-07-29 | 2020-10-06 | Alibaba Group Holding Limited | Creating index in blockchain-type ledger |
US11550762B2 (en) * | 2021-02-24 | 2023-01-10 | Sap Se | Implementation of data access metrics for automated physical database design |
- 2019-10-18: CN application CN201910992775.4A, patent CN110879687B (zh), status Active
- 2020-08-14: WO application PCT/CN2020/109273, publication WO2021073241A1 (zh), status Application Filing
- 2022-04-18: US application US17/723,117, publication US20220236910A1 (en), status Pending
Also Published As
Publication number | Publication date |
---|---|
CN110879687A (zh) | 2020-03-13 |
US20220236910A1 (en) | 2022-07-28 |
CN110879687B (zh) | 2021-03-16 |
Similar Documents
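Publication | Title | Publication Date |
---|---|---|
WO2021073242A1 (zh) | Index creation and data query method, apparatus, and device | |
WO2021073241A1 (zh) | Disk storage-based data reading method, apparatus, and device | |
WO2020211569A1 (zh) | Index creation method for data records | |
CN110188096B (zh) | Index creation method, apparatus, and device for data records | |
CN110162526B (zh) | Method, apparatus, and device for querying data records in a blockchain-type ledger | |
CN110162662B (zh) | Method, apparatus, and device for verifying data records in a blockchain-type ledger | |
WO2021017422A1 (zh) | Index creation method, apparatus, and device in a blockchain-type ledger | |
WO2020244237A1 (zh) | Verification method, apparatus, and device in a blockchain-type ledger | |
WO2020253231A1 (zh) | Receipt-based data storage method, apparatus, and device | |
WO2021093461A1 (zh) | Aggregation computation method, apparatus, and device in a blockchain-type ledger | |
WO2021057164A1 (zh) | Query method, apparatus, and device in a blockchain-type ledger | |
WO2020233146A1 (zh) | Storage method, system, apparatus, and device for data operation records | |
WO2021073240A1 (zh) | Data storage method, apparatus, and device in a blockchain-type ledger | |
US10795874B2 (en) | Creating index in blockchain-type ledger | |
WO2020244238A1 (zh) | Data storage method, apparatus, and device for a multi-layer blockchain-type ledger | |
US10990705B2 (en) | Index creation for data records | |
US20220058184A1 (en) | Service identifier-based data indexing | |
WO2021057127A1 (zh) | Data storage method, apparatus, and device based on multiple business attributes | |
US10999062B2 (en) | Blockchain-type data storage | |
WO2021093462A1 (zh) | Operation record storage method, apparatus, and device in a database | |
CN111444194B (zh) | Method, apparatus, and device for clearing indexes in a blockchain-type ledger | |
US11115189B2 (en) | Verifying a blockchain-type ledger | |
CN111444195B (zh) | Method, apparatus, and device for clearing indexes in a blockchain-type ledger | |
CN110874486B (zh) | Data reading method, apparatus, and device in a blockchain-type ledger | |
CN110717196A (zh) | Storage method, apparatus, and device for securities transaction data | |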
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20875732; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: PCT application non-entry in European phase | Ref document number: 20875732; Country of ref document: EP; Kind code of ref document: A1 |
32PN | Ep: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.01.2023) |
122 | Ep: PCT application non-entry in European phase | Ref document number: 20875732; Country of ref document: EP; Kind code of ref document: A1 |