WO2018133762A1 - File merging method and apparatus - Google Patents

File merging method and apparatus

Info

Publication number
WO2018133762A1
WO2018133762A1 PCT/CN2018/072641 CN2018072641W WO2018133762A1 WO 2018133762 A1 WO2018133762 A1 WO 2018133762A1 CN 2018072641 W CN2018072641 W CN 2018072641W WO 2018133762 A1 WO2018133762 A1 WO 2018133762A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
new
header
tree
block
Prior art date
Application number
PCT/CN2018/072641
Other languages
French (fr)
Chinese (zh)
Inventor
郑主能
Original Assignee
广州市动景计算机科技有限公司
Priority date
Filing date
Publication date
Priority claimed from CN201710040977.XA external-priority patent/CN108319625B/en
Priority claimed from CN201710031732.0A external-priority patent/CN108319602B/en
Application filed by 广州市动景计算机科技有限公司 filed Critical 广州市动景计算机科技有限公司
Publication of WO2018133762A1 publication Critical patent/WO2018133762A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present invention relates to the field of data storage technologies, and in particular, to a method and apparatus for merging files stored in an external memory.
  • the underlying data structure is either a B-tree or its variant B+ tree, or an LSM tree.
  • the former has better read-friendliness, while the latter has better write-friendliness.
  • As with the proverbial fish and bear's paw, it seems that the two cannot be had together; nevertheless, the demanding Internet industry is eager for a data storage solution that serves both reads and writes well.
  • Although the data structure used in LevelDB appears to combine the LSM tree and the B-tree, the combination is not thorough enough.
  • the files stored in the disk in LevelDB are divided into multiple levels, and there are many files (SSTable files) in different levels.
  • The SSTable files need to be merged. Because the keys and their corresponding values are stored together in an SSTable file, all key-value pairs must be fetched one by one to build a new file whenever LevelDB files are merged.
  • The merge process is therefore complicated and reduces read and write performance.
  • LevelDB is a high-performance KV storage engine developed by Google, inspired by Google's BigTable.
  • LevelDB performs very well with small data volumes, but with large data volumes (hundreds of gigabytes) and high-frequency writes it has exposed shortcomings in many respects, including reading, writing, merging, data cleaning, and restart recovery.
  • An object of the embodiments of the present invention is to provide a file merging method and apparatus for data merging.
  • a file merging method is provided.
  • The file is stored in an external memory and includes a file header, a data block, and an index block.
  • The file header is used to record metadata information of the file, and the data block is used to store values.
  • The index block is used to store the keys corresponding to the values in the form of a B+ tree, wherein every key, together with the logical address of its corresponding value in the data block, is recorded in a leaf node of the B+ tree. The method includes: appending an additional data block after the first file, into which the values in the data block of the second file are written; after the additional data block, appending a new index block that is generated based on the index block of the first file and the index block of the second file, wherein all valid keys in the two index blocks, together with the logical addresses of their corresponding values in the data block of the first file and in the additional data block, are recorded in the leaf nodes of a new B+ tree; and
  • a new file header is additionally written after the new index block to record the metadata information of the merged new file.
  • Because the keys and values of the file described in the embodiment of the present invention are stored separately, with the keys stored in the form of a B+ tree, one file can be kept unchanged when two files are merged and the values of the other file can be appended directly, which improves write performance. The merged index block is a new B+ tree, so the values in the merged file can be read conveniently through the new index block and the read performance of the merged file is not affected.
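  • The merge described above can be illustrated with a minimal in-memory sketch. It is not the implementation of the embodiment: the `KVFile` class, its field names, and the use of a sorted Python list in place of an on-disk B+ tree are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class KVFile:
    """Hypothetical in-memory stand-in for a file that stores keys and values separately."""
    data: bytearray = field(default_factory=bytearray)  # data block(s): values only
    index: list = field(default_factory=list)           # sorted (key, offset, length); stands in for B+ tree leaves
    header: dict = field(default_factory=dict)          # metadata of the file

    @classmethod
    def build(cls, items: dict) -> "KVFile":
        f = cls()
        for key in sorted(items):
            value = items[key]
            f.index.append((key, len(f.data), len(value)))
            f.data += value
        f.header = _make_header(f.index)
        return f

    def get(self, key):
        for k, off, length in self.index:  # linear scan, for brevity only
            if k == key:
                return bytes(self.data[off:off + length])
        return None


def _make_header(index: list) -> dict:
    key_range = (index[0][0], index[-1][0]) if index else None
    return {"key_count": len(index), "key_range": key_range}


def merge(first: KVFile, second: KVFile) -> KVFile:
    """Append the second file's values to the first, then write a new index and header."""
    base = len(first.data)        # where the additional data block starts
    first.data += second.data     # step 1: append values; existing bytes are untouched
    merged = {k: (off, ln) for k, off, ln in first.index}
    for k, off, ln in second.index:  # the second file is assumed newer, so its keys win
        merged[k] = (base + off, ln)
    first.index = [(k, off, ln) for k, (off, ln) in sorted(merged.items())]  # step 2: new index
    first.header = _make_header(first.index)                                  # step 3: new header
    return first
```

  • In this sketch the bytes of the first file's data block are never rewritten; only the index and header are regenerated, which is the property the text credits for the improved write performance.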
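  • the metadata information may include one or more of the following: the number of keys in the index block; the range of keys in the index block; the height of the B+ tree; the logical address of the first leaf node in the B+ tree; and the number of internal nodes in the B+ tree.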
  • all nodes constituting the B+ tree are physically stored contiguously.
  • Storing the B+ tree physically contiguously takes advantage of the local preloading feature of the disk, so that, while the index block is being reconstructed, the index block of a file to be merged can be obtained by simply traversing successive disk blocks.
  • the file merging method may further include: updating a file header of the first file according to the new file header to replace the metadata information in the file header of the first file with the metadata information in the new file header.
  • the present invention can avoid the damage caused to the file by the abnormal situation during the merge process by setting the double file header.
  • The file includes a front file header at the head of the file and a back file header at the end of the file.
  • The contents of the front file header and the back file header are the same. The front file header of the first file is updated according to the new file header and serves as the front file header of the new file, while the new file header serves as the back file header of the new file.
  • In this way, both the front and back file headers of the new file are updated normally, and either can be used to view the metadata information of the new file.
  • The file merging method may further include: in the case where the step of writing the metadata information of the new file into the new file header fails, restoring the new file to the first file as it was before the merge according to the file header of the first file; and/or, in the case where the step of updating the file header of the first file fails, re-updating the file header of the first file according to the new file header.
  • Thus, the file being merged can be restored to the first file as it was before the merge according to the file header of the first file.
  • If an error occurs while the file header of the first file is being updated, the update can be repeated according to the new file header.
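  • The two recovery cases can be sketched as follows. The text does not say how a failed header write is detected or how the pre-merge length is known, so the CRC32 check, the fixed `HEADER_SIZE`, the JSON encoding, and the `file_size` metadata field are all hypothetical choices made for this illustration.

```python
import json
import zlib

HEADER_SIZE = 64  # hypothetical fixed header size in bytes


def pack_header(meta: dict) -> bytes:
    """Serialize metadata with a CRC32 so a torn write can be detected (assumed scheme)."""
    body = json.dumps(meta, sort_keys=True).encode()
    raw = zlib.crc32(body).to_bytes(4, "big") + body
    assert len(raw) <= HEADER_SIZE
    return raw.ljust(HEADER_SIZE, b"\0")


def unpack_header(raw: bytes):
    """Return the metadata dict, or None if the header is missing or corrupt."""
    if len(raw) != HEADER_SIZE:
        return None
    crc, body = raw[:4], raw[4:].rstrip(b"\0")
    if not body or zlib.crc32(body).to_bytes(4, "big") != crc:
        return None
    return json.loads(body)


def recover(image: bytearray) -> bytearray:
    """Repair a file image laid out as: front header | blocks | back header."""
    front = unpack_header(bytes(image[:HEADER_SIZE]))
    back = unpack_header(bytes(image[-HEADER_SIZE:]))
    if back is None and front is not None:
        # The new header was never written correctly: roll back to the pre-merge file.
        return image[:front["file_size"]]
    if back is not None and front != back:
        # The new header is good but the front header is stale or torn: redo the update.
        image[:HEADER_SIZE] = pack_header(back)
    return image
```

  • Because the front header is only overwritten after the new header at the tail is complete, at least one of the two headers is intact at any point in this sketch, which is what makes both recovery paths possible.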
  • The file merging method may further include performing the following steps to read the target value corresponding to a request key from a target file: obtaining the file header and the index block of the target file; determining, according to the file header, whether the request key is within the range of keys indicated by the file header; in the case where the request key is within that range, searching the index block for the leaf node corresponding to the request key based on the B+ tree structure of the index block; and reading the target value from the data block of the target file at the logical address stored in the found leaf node.
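  • The read steps above can be sketched as a single function. This is an illustration under assumptions: the header is modeled as a dictionary with a hypothetical `key_range` entry, and a binary search over a sorted key list stands in for descending the B+ tree.

```python
import bisect


def read_value(header: dict, index_keys: list, index_addrs: list, data: bytes, request_key):
    """Return the value for request_key, or None if the file does not contain it."""
    lo, hi = header["key_range"]
    if not (lo <= request_key <= hi):  # range check against the file header
        return None
    i = bisect.bisect_left(index_keys, request_key)  # stands in for a B+ tree descent
    if i == len(index_keys) or index_keys[i] != request_key:
        return None
    offset, length = index_addrs[i]  # logical address stored alongside the key
    return data[offset:offset + length]
```

  • The range check lets a reader reject a file after reading only its header, before any part of the index block is searched.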
  • A file merging apparatus is further provided. The file is stored in an external memory and includes a file header, a data block, and an index block, where the file header is used to record metadata information of the file and the data block is used to store values.
  • The index block is used to store the keys corresponding to the values in the form of a B+ tree, wherein every key, together with the logical address of its corresponding value in the data block, is recorded in a leaf node of the B+ tree.
  • The apparatus includes: a first writing unit, configured to append an additional data block after the first file, into which the values in the data block of the second file are written; a B-tree generating unit, configured to generate a new B+ tree based on the index block of the first file and the index block of the second file, wherein all valid keys in the two index blocks, together with the logical addresses of their corresponding values in the data block of the first file and in the additional data block, are recorded in the leaf nodes of the new B+ tree;
  • a second writing unit, configured to append a new index block after the additional data block, into which the new B+ tree is written; and a third writing unit, configured to append a new file header after the new index block to record the metadata information of the merged new file.
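  • the metadata information may include one or more of the following: the number of keys in the index block; the range of keys in the index block; the height of the B+ tree; the logical address of the first leaf node in the B+ tree; and the number of internal nodes in the B+ tree.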
  • The file merging device may further include an updating unit, configured to update the file header of the first file according to the new file header, so as to replace the metadata information in the file header of the first file with the metadata information in the new file header.
  • The file includes a front file header at the head of the file and a back file header at the end of the file.
  • The contents of the front file header and the back file header are the same, and the updating unit updates the front file header of the first file according to the new file header.
  • The file merging device may further include: a first restoring unit, configured to restore the new file to the first file as it was before the merge according to the file header of the first file in the case where the step of writing the metadata information of the new file into the new file header fails; and/or a second restoring unit, configured to re-update the file header of the first file according to the new file header in the case where the step of updating the file header of the first file fails.
  • The file merging device may further include a reading unit, configured to read the target value corresponding to a request key from a target file. The reading unit may include: an acquiring module, configured to acquire the file header and the index block of the target file; a determining module, configured to determine, according to the file header, whether the request key is within the range of keys indicated by the file header; a finding module, configured to search the index block for the leaf node corresponding to the request key, based on the B+ tree structure of the index block, in the case where the request key is within that range; and a reading module, configured to read the target value from the data block of the target file at the logical address stored in the found leaf node.
  • In the file merging method and apparatus of the embodiments of the present invention, the keys and values of a file are stored separately, with the keys stored in the form of a B+ tree. When two files are merged, one file can therefore be kept unchanged while the values of the other file are moved directly to the end of it, which improves write performance, and the index block that stores the keys in the form of a B+ tree is reconstructed.
  • The values in the merged file can be read conveniently through the new index block, so the read performance of the merged file is not affected.
  • Another object of embodiments of the present invention is to provide a new database management method and database system.
  • A database management method is provided for storing a plurality of pieces of data, wherein each piece of data includes a corresponding key and value. The method comprises: writing the plurality of pieces of data into a log file in an external memory; writing the data in the log file into a memory table in an internal memory, wherein the data written into the memory table is stored in order of key size; when the size of the memory table exceeds a predetermined threshold, converting the memory table into a read-only memory table and writing subsequent data in the log file into a new memory table; writing the data in the read-only memory table into the external memory to obtain a first-level storage file; and merging two or more first-level storage files to obtain a second-level storage file.
  • The files finally stored in the external memory therefore have only two levels, so redundancy is low and lookups are convenient.
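  • The write path above can be sketched with plain Python containers. Everything concrete here is an assumption for illustration: the `TwoLevelStore` class, measuring the memory table size by entry count, and merging a fixed number (`merge_fanin`) of first-level files at a time.

```python
class TwoLevelStore:
    """Toy model of the log -> memory table -> first level -> second level flow."""

    def __init__(self, threshold=2, merge_fanin=2):
        self.log = []        # stands in for the log file in external memory
        self.memtable = {}   # active memory table
        self.readonly = []   # read-only memory tables awaiting flush
        self.level1 = []     # first-level storage files (key-sorted lists of pairs)
        self.level2 = []     # second-level storage files
        self.threshold = threshold
        self.merge_fanin = merge_fanin

    def put(self, key, value):
        self.log.append((key, value))            # 1. write to the log first
        self.memtable[key] = value               # 2. then into the memory table
        if len(self.memtable) > self.threshold:  # 3. size exceeds the threshold
            self.readonly.append(self.memtable)  #    freeze the table
            self.memtable = {}                   #    later writes go to a new table

    def flush(self):
        while self.readonly:
            table = self.readonly.pop(0)
            self.level1.append(sorted(table.items()))  # 4. one first-level file per table
        n = self.merge_fanin
        while len(self.level1) >= n:
            batch, self.level1 = self.level1[:n], self.level1[n:]
            merged = {}
            for f in batch:  # older files first, so newer values overwrite older ones
                merged.update(f)
            self.level2.append(sorted(merged.items()))  # 5. merge into a second-level file
```

  • A real memory table keeps keys ordered as they are inserted; the sketch sorts only at flush time to stay short.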
  • The database management method may further include: specifying the primary file name of a first-level storage file using a first naming rule; and specifying the primary file name of a second-level storage file using a second naming rule, the first naming rule being different from the second naming rule, so that whether a storage file is a first-level or a second-level storage file can be determined from its primary file name.
  • The memory table may be composed of a hash table that includes one or more hash buckets, where each hash bucket corresponds to one skip list (jump table) and each piece of data in the memory table constitutes an element of a skip list.
  • The elements in each skip list are arranged in order of key size.
  • Placing a hash table in front of the skip lists reduces lock granularity: for concurrent read and write operations on different keys, fast lookup and insertion can be performed in the skip lists of the respective hash buckets. In addition, capacity can be expanded by enlarging the hash table without increasing the size of each skip list, which reduces the probability that a skip list degrades toward linear search as the amount of data grows, thereby improving overall search efficiency.
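  • The lock-granularity argument can be illustrated with a small sketch. It is an assumption-laden simplification: the `BucketedMemTable` class is hypothetical, and a sorted Python list searched with `bisect` stands in for each per-bucket skip list, so only the partitioning and per-bucket locking are shown, not a real skip list.

```python
import bisect
import threading


class BucketedMemTable:
    """Hash table of ordered buckets; each bucket has its own lock."""

    def __init__(self, num_buckets=4):
        self.buckets = [{"lock": threading.Lock(), "keys": [], "values": []}
                        for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        b = self._bucket(key)
        with b["lock"]:  # only this bucket is locked, not the whole table
            i = bisect.bisect_left(b["keys"], key)
            if i < len(b["keys"]) and b["keys"][i] == key:
                b["values"][i] = value      # overwrite an existing key
            else:
                b["keys"].insert(i, key)    # keep the bucket ordered by key
                b["values"].insert(i, value)

    def get(self, key):
        b = self._bucket(key)
        with b["lock"]:
            i = bisect.bisect_left(b["keys"], key)
            if i < len(b["keys"]) and b["keys"][i] == key:
                return b["values"][i]
        return None
```

  • Two threads that touch keys in different buckets never contend for the same lock in this sketch, which is the effect the text attributes to the hash table.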
  • The database management method may further include: maintaining a read-only memory table queue in the internal memory; and, in the case where the data in the read-only memory table has not all been written to the external memory when the size of the new memory table exceeds the predetermined threshold, converting the new memory table into another read-only memory table and putting it into the read-only memory table queue.
  • the data structure of the storage file may include: a file header for recording metadata information of the storage file, a data block for storing the value, and an index block for storing the key corresponding to the value in the form of a B+ tree.
  • the logical addresses of all the keys and their corresponding values in the data block are respectively recorded in the leaf nodes in the B+ tree, and all the nodes constituting the B+ tree are physically stored continuously.
  • Storing the B+ tree physically contiguously takes advantage of the local preloading feature of the disk, so that, while the index block is being reconstructed, the index block of a file to be merged can be obtained by simply traversing successive disk blocks.
  • The step of merging two first-level storage files may include: appending an additional data block after the first storage file, into which the values in the data block of the second storage file are written; after the additional data block, appending a new index block that is generated based on the index block of the first storage file and the index block of the second storage file, wherein all keys in the two index blocks, together with the logical addresses of their corresponding values in the data block of the first storage file and in the additional data block, are recorded in the leaf nodes of a new B+ tree; and appending a new file header after the new index block to record the metadata information of the merged new file.
  • Thus, when two files are merged, one file can be kept unchanged and the values of the other file can be appended directly, which improves write performance.
  • the merged index block is a new B+ tree, and the value in the merged file can be conveniently read according to the new index block, and the read performance of the merged file is not affected.
  • the metadata information may include one or more of the following: the number of keys in the index block; the range of keys in the index block; the height of the B+ tree; the logical address of the first leaf node in the B+ tree; and the B+ tree The number of internal nodes.
  • the database management method may further include: updating a file header of the first storage file according to the new file header to replace the metadata information in the file header of the first storage file with the metadata information in the new file header.
  • the embodiment of the present invention can avoid the damage caused to the file by the abnormal situation during the merge process by setting the double file header.
  • The file may include a front file header located at the head of the file and a back file header located at the tail of the file, the contents of the front file header and the back file header being the same. The front file header of the first storage file is updated according to the new file header and serves as the front file header of the new file, while the new file header serves as the back file header of the new file.
  • the front and back headers of the new file can be updated normally, and can be used to view the metadata information in the new file.
  • The database management method may further include: in the case where the step of writing the metadata information of the new file into the new file header fails, restoring the new file to the first storage file as it was before the merge according to the file header of the first storage file; and/or, in the case where the step of updating the file header of the first storage file fails, re-updating the file header of the first storage file according to the new file header.
  • Thus, the file being merged can be restored to the first storage file as it was before the merge according to the file header of the first storage file, and, when an error occurs while the file header of the first storage file is being updated, the update can be repeated according to the new file header.
  • The database management method may further include, in response to a request to find the target value corresponding to a request key: searching the memory table for the key corresponding to the request key and, if it is found, reading the target value; if the request key is not found in the memory table, searching the read-only memory table for the key corresponding to the request key and, if it is found, reading the target value; if the request key is not found in the read-only memory table, searching the first-level storage files on the disk in chronological order for the key corresponding to the request key and, if it is found, reading the target value; and, if the request key is not found in any first-level storage file, using binary search to determine whether a second-level storage file on the disk contains the key corresponding to the request key and, if it is found, reading the target value.
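  • The lookup order can be sketched as follows. Several details are assumptions rather than statements from the text: tables and files are modeled as dictionaries, read-only tables and first-level files are searched newest first, and second-level files are assumed to cover disjoint key ranges sorted by lower bound so that a binary search can select the single candidate file.

```python
import bisect


def lookup(key, memtable, readonly_tables, level1_files, level2_files):
    """Search the newest data first; return None if the key is absent everywhere."""
    if key in memtable:
        return memtable[key]
    for table in reversed(readonly_tables):  # newest read-only table first (assumed order)
        if key in table:
            return table[key]
    for f in reversed(level1_files):         # first-level files in time order, newest first
        if key in f["data"]:
            return f["data"][key]
    # Second-level files: binary search on the lower bounds of their key ranges.
    lows = [f["range"][0] for f in level2_files]
    i = bisect.bisect_right(lows, key) - 1
    if i >= 0 and key <= level2_files[i]["range"][1]:
        return level2_files[i]["data"].get(key)
    return None
```

  • Searching from the memory table down to the second level means the first hit is always the most recent version of a key in this model.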
  • the database management method may further include: acquiring a file header and an index block of the target storage file in response to the request for reading the target value corresponding to the request key from the target storage file; determining, according to the file header, whether the request key is Within the range of the key indicated by the file header; in the case of determining that the request key is within the range of the key indicated by the file header, searching for the leaf node corresponding to the request key in the index block based on the B+ tree structure of the index block; In the case of the search, the target value is read in the logical address in the data block in the target storage file according to the value corresponding to the key stored by the found leaf node.
  • The database management method may further include, in response to a restart request to restore the internal memory: constructing a second-level storage file list according to the order of the key ranges contained in the second-level storage files; constructing a first-level storage file list according to the order of the file serial numbers of the first-level storage files; determining, according to the first-level storage file list and the second-level storage file list, the progress with which the data in the log file has been written to first-level storage files; and, according to that progress, writing the data in the log file that has not been written to a first-level storage file into the memory table in the internal memory.
  • A database system is provided, comprising an internal memory and an external memory, wherein the internal memory is used to write a plurality of pieces of data to a log file in the external memory, and the external memory writes the data in the log file into a memory table in the internal memory, wherein the data written into the memory table is stored in order of key size.
  • When the size of the memory table exceeds a predetermined threshold, the internal memory converts the memory table into a read-only memory table.
  • The external memory writes subsequent data in the log file into a new memory table, the internal memory writes the data in the read-only memory table into the external memory to obtain first-level storage files, and the external memory merges two or more first-level storage files to obtain a second-level storage file.
  • The external memory specifies the primary file name of a first-level storage file using a first naming rule and specifies the primary file name of a second-level storage file using a second naming rule, where the first naming rule is different from the second naming rule.
  • The memory table is composed of a hash table that includes one or more hash buckets, where each hash bucket corresponds to one skip list (jump table), each piece of data in the memory table constitutes an element of a skip list, and the elements in each skip list are arranged in order of key size.
  • A read-only memory table queue is maintained in the internal memory. In the case where the data in the read-only memory table has not all been written to the external memory when the size of the new memory table exceeds the predetermined threshold, the external memory converts the new memory table into another read-only memory table and puts it into the read-only memory table queue.
  • the data structure of the storage file may include: a file header for recording metadata information of the storage file, a data block for storing the value, and an index block for storing the key corresponding to the value in the form of a B+ tree.
  • the logical addresses of all the keys and their corresponding values in the data block are respectively recorded in the leaf nodes in the B+ tree, and all the nodes constituting the B+ tree are physically stored continuously.
  • The external memory merges two first-level storage files by performing the following operations: appending an additional data block after the first storage file, into which the values in the data block of the second storage file are written; after the additional data block, appending a new index block that is generated based on the index block of the first storage file and the index block of the second storage file, wherein all valid keys in the two index blocks, together with the logical addresses of their corresponding values in the data block of the first storage file and in the additional data block, are recorded in the leaf nodes of a new B+ tree; and appending a new file header after the new index block to record the metadata information of the merged new file.
  • the file finally stored in the external memory has only two hierarchical structures, and the file redundancy is low, which is convenient to find.
  • FIGS. 1 and 3 are diagrams showing the data structure of a file involved in the file combining scheme of the present invention.
  • FIG. 2 is a diagram showing the B+ tree structure of an index block of the present invention.
  • FIG. 4 is a schematic flow chart showing a file merging method according to an embodiment of the present invention.
  • FIGS. 5 and 6 are schematic diagrams showing file merge states according to the present invention.
  • FIG. 7 is a schematic flow chart showing a method of reading data in a target file.
  • FIG. 8 is a functional block diagram showing a file merging apparatus according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram showing the functional modules that the reading unit may have.
  • FIG. 10 is a schematic diagram showing a hardware configuration of an electronic device in which an embodiment of the present invention can be performed.
  • FIG. 11 is a block diagram showing the structure of a database system according to an embodiment of the present invention.
  • FIG. 12 is a flow chart showing the data storage between the internal memory 110 and the external memory 120.
  • FIG. 13 is a static diagram showing the process of storing data.
  • FIG. 14 is a flow chart showing a complete lookup.
  • FIG. 15 is a flow chart showing the lookup inside a file.
  • FIG. 16 is a schematic flow chart showing restart recovery according to an embodiment of the present invention.
  • any specific values are to be construed as illustrative only and not as a limitation. Thus, other examples of the exemplary embodiments may have different values.
  • FIG. 10 is a block diagram showing a hardware configuration of an electronic device 1000 in which an embodiment of the present invention can be implemented.
  • The electronic device 1000 can be a portable computer, a desktop computer, a mobile phone, a tablet, or the like. As shown in FIG. 10, the electronic device 1000 may include a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, a display device 1500, an input device 1600, a speaker 1700, a microphone 1800, and the like.
  • the processor 1100 may be a central processing unit CPU, a microprocessor MCU, or the like.
  • the memory 1200 includes, for example, a ROM (Read Only Memory), a RAM (Random Access Memory), a nonvolatile memory such as a hard disk, and the like.
  • the interface device 1300 includes, for example, a USB interface, a headphone jack, and the like.
  • the communication device 1400 can, for example, perform wired or wireless communication, and specifically can include Wi-Fi communication, Bluetooth communication, 2G/3G/4G/5G communication, and the like.
  • the display device 1500 is, for example, a liquid crystal display, a touch display, or the like.
  • Input device 1600 can include, for example, a touch screen, a keyboard, a somatosensory input, and the like. The user can input/output voice information through the speaker 1700 and the microphone 1800.
  • The memory 1200 of the electronic device 1000 is configured to store instructions for controlling the processor 1100 to perform any of the file merging methods or database management methods provided by the embodiments of the present invention. It will be understood by those skilled in the art that although a plurality of devices are illustrated for the electronic device 1000 in FIG. 10, the present invention may relate to only some of them; for example, the electronic device 1000 may relate only to the processor 1100 and the memory 1200. A technician can design instructions in accordance with the disclosed aspects of the present invention. How instructions control a processor to operate is well known in the art and is not described in detail herein.
  • This embodiment mainly proposes a scheme of merging files stored in an external memory such as a hard disk, a floppy disk, an optical disk, or a USB disk.
  • In the file merging method and apparatus of the present embodiment, the keys and values of a file are stored separately, with the keys stored in the form of a B+ tree. When two files are merged, one file can therefore be kept unchanged while the values of the other file are moved directly to the end of it, which improves write performance, and the index block that stores the keys in the form of a B+ tree is reconstructed.
  • The values in the merged file can be read conveniently through the new index block, so the read performance of the merged file is not affected.
  • FIG. 1 is a schematic diagram showing the data structure of a file in the file combining scheme of the present embodiment.
  • the file described in this embodiment can be physically divided into a file header, a data block, and an index block by blocks, and each block can be composed of a plurality of pages.
  • the page referred to in this embodiment is the minimum unit of one I/O, which is generally an integer multiple of the system page, and the size of the pages of different types of blocks may be different.
  • the data block is used to store the value.
  • the index block is used to store the key corresponding to the value in the form of a B+ tree.
  • the B+ tree is composed of a leaf node, an internal node, and a root node.
  • The form of the B+ tree is well known to those skilled in the art and is not described again here.
  • each leaf node in the B+ tree corresponds to a key, and the logical addresses of all the keys and their corresponding values in the data block are respectively recorded in the leaf nodes in the B+ tree. That is, only the key is stored in the leaf node of the B+ tree, and no value is stored. Instead, the offset of the page in the data block where the value is located and the offset of the value in the page can be stored.
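  • A leaf entry of this kind can be sketched with Python's `struct` module. The byte layout below is hypothetical (the text does not specify field widths), and the "offset of the page" is assumed here to be a page number that is multiplied by an assumed page size.

```python
import struct

# Assumed layout: 2-byte key length, 4-byte page offset, 2-byte in-page offset, then the key bytes.
LEAF_FMT = "<HIH"


def pack_leaf_entry(key: bytes, page_offset: int, in_page_offset: int) -> bytes:
    """Encode one leaf entry: the key plus the logical address of its value (no value)."""
    return struct.pack(LEAF_FMT, len(key), page_offset, in_page_offset) + key


def unpack_leaf_entry(buf: bytes, pos: int = 0):
    """Decode the entry at pos; also return the position of the next entry."""
    key_len, page_offset, in_page_offset = struct.unpack_from(LEAF_FMT, buf, pos)
    start = pos + struct.calcsize(LEAF_FMT)
    key = buf[start:start + key_len]
    return key, page_offset, in_page_offset, start + key_len


def value_position(page_offset: int, in_page_offset: int, page_size: int = 4096) -> int:
    """Byte position of the value inside the data block (page_size is an assumption)."""
    return page_offset * page_size + in_page_offset
```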
  • All the nodes constituting the B+ tree (root node, internal nodes, and leaf nodes) are stored physically contiguously, so that the local preloading feature of the disk can be used to acquire all nodes of the B+ tree quickly, which improves the efficiency of building a new B+ tree during a merge (the merge process is explained in more detail below).
  • the header is used to record the metadata information of the file.
  • the metadata information may include the number of keys in the index block, the range of keys in the index block, the height of the B+ tree, the logical address of the first leaf node in the B+ tree, and the number of internal nodes in the B+ tree.
  • the file header of the file may include a front file header and a back file header, and the metadata information of the file recorded by the front file header and the subsequent file header may be the same.
  • The file described in this embodiment may further include a filter, which may be used to determine whether an accessed key is in the file.
  • The filter may be, for example, a Bloom filter, which screens out accesses to keys that do not exist in the file.
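  • For readers unfamiliar with Bloom filters, a minimal sketch follows. The bit-array size, the number of hash functions, and the use of salted SHA-256 digests are arbitrary illustrative choices; the text does not describe the filter's parameters.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means the key is definitely absent; True means it may be present.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))
```

  • When `might_contain` returns False, the index block of that file does not need to be searched at all.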
  • FIG. 4 is a schematic flowchart showing a file merging method according to an embodiment of the present invention, including steps S210 to S230.
  • the method can combine two or more files.
  • the first file and the second file are combined as an example for description.
  • In step S210, an additional data block is appended after the first file, into which the values in the data block of the second file are written.
  • the freshness of the second file may be greater than the first file, that is, the second file may be stored later in the external memory, and the first file may be previously stored in the external memory.
  • The values in the data block of the second file may be appended after the first file.
  • The block into which these values are appended after the first file is referred to as the additional data block. That is, the values in the data block of the second file are rewritten into the additional data block after the first file, so that the end of the first file and the start address of the additional data block are contiguous.
  • new index information can be created, that is, in step S220, the new index block is additionally written after the additional data block.
  • the new index block is generated based on the index block of the first file and the index block of the second file.
  • The freshness of the second file may be greater than that of the first file, so a key-value pair in the second file may be a modification, deletion, or replacement of a key-value pair in the first file. Therefore, for a key that exists in the index blocks of both the first file and the second file, the key in the second file, which has the higher freshness, can be selected as the valid key and the key in the first file discarded, thereby constructing the new index block.
  • the keys in the generated new index block are all valid keys, and the corresponding values are all valid values.
  • the key in the new index block is also stored in the form of a B+ tree which is regenerated according to the index block of the first file and the index block of the second file, and thus may be referred to as a new B+ tree.
  • All the valid keys in the index block of the first file and the index block of the second file, together with the logical addresses of their corresponding values in the data block of the first file and in the additional data block, are recorded in the leaf nodes of the new B+ tree.
  • All the nodes of the B+ trees in the index block of the first file and the index block of the second file are stored physically contiguously, so that in the process of constructing the new B+ tree, the local preloading feature of the disk can be used: the index block of the first file and the index block of the second file can be obtained by simply traversing successive disk blocks, thereby improving the construction efficiency of the new B+ tree.
  • the index block in the first file is invalidated and replaced by the new index block.
  • the invalidity mentioned here means that in the subsequent search process, the new index block is used for searching, and the old index block is no longer used. That is, after generating a new index block, the old index block may not be deleted.
  • In step S230, a new file header is appended after the new index block to record the metadata information of the merged new file.
  • The metadata information of the new file may include the number of keys in the new index block, the range of keys in the new index block, the height of the new B+ tree, the logical address of the first leaf node in the new B+ tree, the number of internal nodes in the new B+ tree, and so on. After the new file header is generated, the second file can be deleted to free up storage space.
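  • Steps S210 to S230 can be sketched as follows. This is a simplified in-memory model under stated assumptions: a sorted Python dict stands in for the B+ tree, the index and header are kept as Python objects instead of being serialized after the additional data block, deletions are not modeled, and all names (`StoredFile`, `build_file`, `merge`, `get`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StoredFile:
    """Toy model of a file: raw value bytes plus the currently valid index and header."""
    body: bytearray = field(default_factory=bytearray)
    index: dict = field(default_factory=dict)   # key -> (offset, length) into body
    header: dict = field(default_factory=dict)  # metadata describing the valid index


def make_header(index: dict) -> dict:
    """Record a subset of the metadata named in the text: key count and key range."""
    keys = sorted(index)
    return {"key_count": len(keys),
            "key_range": (keys[0], keys[-1]) if keys else None}


def build_file(pairs: dict) -> StoredFile:
    """Create a file whose values are laid out in key order."""
    f = StoredFile()
    for key, value in sorted(pairs.items()):
        f.index[key] = (len(f.body), len(value))
        f.body += value
    f.header = make_header(f.index)
    return f


def merge(first: StoredFile, second: StoredFile) -> StoredFile:
    """Merge the fresher `second` file into `first` without rewriting first's data."""
    # S210: append the values of the second file after the first file.
    appended = {}
    for key, (off, length) in second.index.items():
        appended[key] = (len(first.body), length)
        first.body += second.body[off:off + length]
    # S220: build the new index; on duplicate keys the fresher second file wins.
    new_index = dict(first.index)
    new_index.update(appended)
    first.index = dict(sorted(new_index.items()))
    # S230: write a new header describing the merged file.
    first.header = make_header(first.index)
    return first


def get(f: StoredFile, key: bytes) -> bytes:
    off, length = f.index[key]
    return bytes(f.body[off:off + length])


F = build_file({b"a": b"1", b"b": b"2"})
G = build_file({b"b": b"20", b"c": b"3"})
merged = merge(F, G)
```

  • Note that the stale value for `b"b"` stays in the body; only the index stops pointing at it, which mirrors the statement that the first file is left unchanged.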
  • FIG. 5 is a schematic diagram showing a merge process of merging G files into F files according to an embodiment of the present invention.
  • the F file is unchanged, and only the value in the G file needs to be additionally written into the F file, and a new index block and a new file header are generated.
  • The merge process is simpler, and the value corresponding to a key in the file can be conveniently found according to the merged B+ tree, which improves read performance.
  • FIG. 6 is a schematic diagram showing another example of a merge process of merging G files into F files according to an embodiment of the present invention.
  • both the F file and the G file in FIG. 6 include a front file header located at the head of the file and a rear file header located at the end of the file.
  • The contents of the front file header and the rear file header are the same.
  • The front file header of the F file can be updated according to the new file header so that it serves as the front file header of the new file, and the new file header is used as the rear file header of the new file.
  • By maintaining two file headers, the present invention can solve the problem that a file cannot be recovered after an abnormal situation.
  • Under normal circumstances, both file headers of the new file are updated successfully and are the same.
  • FIG. 7 is a schematic flowchart showing a method of reading a target value corresponding to a request key from a file.
  • In step S310, the file header and the index block of the target file are acquired.
  • In step S320, it is determined according to the file header whether the request key is within the range of keys indicated by the file header. If not, the value corresponding to the request key does not exist in the target file, and the reading ends.
  • If the request key is within the range, step S330 is performed to find the leaf node corresponding to the request key in the index block based on the B+ tree structure of the index block.
  • If the leaf node corresponding to the request key is not found in the index block, the value corresponding to the request key does not exist in the target file, and the reading ends.
  • If the leaf node is found, step S340 may be performed to read the target value from the data block of the target file according to the logical address, stored in the found leaf node, of the value corresponding to the key.
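  • The read path of steps S310 to S340 can be sketched as follows. As a loudly stated assumption, a binary search over the sorted leaf keys (Python's `bisect`) stands in for descending the B+ tree, and the function and parameter names are hypothetical.

```python
import bisect


def read_from_file(header, leaf_keys, leaf_addrs, data_block, request_key):
    """Return the value for request_key, or None if the file does not hold it.

    header      : dict with "key_range" = (min_key, max_key)
    leaf_keys   : sorted list of keys (stand-in for the B+ tree leaves)
    leaf_addrs  : list of (offset, length) logical addresses, parallel to leaf_keys
    data_block  : bytes holding the values
    """
    # S320: reject keys outside the range recorded in the file header.
    low, high = header["key_range"]
    if not (low <= request_key <= high):
        return None
    # S330: locate the leaf entry for the key (binary search replaces tree descent).
    i = bisect.bisect_left(leaf_keys, request_key)
    if i == len(leaf_keys) or leaf_keys[i] != request_key:
        return None
    # S340: follow the logical address into the data block.
    offset, length = leaf_addrs[i]
    return data_block[offset:offset + length]


header = {"key_range": (b"a", b"c")}
leaf_keys = [b"a", b"c"]
leaf_addrs = [(0, 5), (5, 6)]
data_block = b"applebanana"
hit = read_from_file(header, leaf_keys, leaf_addrs, data_block, b"c")
miss_in_range = read_from_file(header, leaf_keys, leaf_addrs, data_block, b"b")
miss_out_of_range = read_from_file(header, leaf_keys, leaf_addrs, data_block, b"z")
```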
  • FIG. 8 is a functional block diagram showing a file merging apparatus according to an embodiment of the present invention.
  • the functional modules of the file combining apparatus 500 may be implemented by hardware, software, or a combination of hardware and software that implements the principles of the present invention. It will be understood by those skilled in the art that the functional modules described in FIG. 7 can be combined or divided into sub-modules to implement the principles of the above described invention. Accordingly, the description herein may support any possible combination, or division, or further limitation of the functional modules described herein.
  • The file merging device 500 shown in FIG. 8 can be used to implement the file merging method shown in FIG. 3 to FIG. 6.
  • The following is only a brief description of the functional modules that the file merging device 500 can have and the operations that can be performed by these functional modules.
  • For details, refer to the description above with reference to FIG. 3 to FIG. 6; they are not repeated here.
  • the file combining apparatus 500 includes a first writing unit 510, a B-tree generating unit 520, a second writing unit 530, and a third writing unit 540.
  • The first writing unit 510 is configured to append an additional data block after the first file, into which the values in the data block of the second file are written.
  • The B-tree generating unit 520 is configured to generate a new B+ tree based on the index block of the first file and the index block of the second file, wherein all the keys in the index block of the first file and the index block of the second file, together with the logical addresses of the value of each key in the data block of the first file and in the additional data block, are recorded in the leaf nodes of the new B+ tree.
  • The second writing unit 530 is configured to append a new index block after the additional data block, into which the new B+ tree is written.
  • the third writing unit 540 is configured to additionally write a new file header after the new index block to record the metadata information of the merged new file.
  • the file combining apparatus 500 may also optionally include an updating unit 550.
  • the update unit 550 can update the file header of the first file according to the new file header to replace the metadata information in the file header of the first file with the metadata information in the new file header.
  • The file may include a front file header located at the head of the file and a rear file header located at the end of the file, and the contents of the front file header and the rear file header are the same.
  • The update unit 550 can update the front file header of the first file according to the new file header so that it serves as the front file header of the new file, and use the new file header as the rear file header of the new file.
  • the file combining apparatus 500 may further include a first restoring unit 560 and a second restoring unit 570.
  • the first restoration unit 560 may restore the new file to the first file before the merge according to the file header of the first file in the case where the step of writing the metadata information of the new file in the new file header is erroneous.
  • the second restoration unit 570 may re-update the file header of the first file according to the new file header in the case where the step of updating the file header of the first file is erroneous.
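  • The two recovery cases handled by the restoring units can be sketched as follows. The text does not say how a failed header write is detected; this sketch assumes, purely for illustration, that each header carries a CRC32 checksum over a JSON payload. The encoding and the names `encode_header`, `decode_header`, and `recover` are hypothetical.

```python
import json
import zlib


def encode_header(meta: dict) -> bytes:
    """Serialize metadata with a leading CRC32 so a torn write can be detected (assumed scheme)."""
    payload = json.dumps(meta, sort_keys=True).encode()
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def decode_header(raw: bytes):
    """Return the metadata, or None if the header is truncated or corrupt."""
    if len(raw) < 4:
        return None
    crc, payload = raw[:4], raw[4:]
    if zlib.crc32(payload).to_bytes(4, "big") != crc:
        return None
    return json.loads(payload)


def recover(front_raw: bytes, new_raw: bytes):
    """Decide which header is authoritative after an interrupted merge."""
    front = decode_header(front_raw)
    new = decode_header(new_raw)
    if new is None:
        # Writing the new (rear) header failed: fall back to the first file
        # as described by its own front header, i.e. the pre-merge state.
        return front, "restored_first_file"
    if front != new:
        # The new header is intact but the front header was not (fully) updated:
        # redo the update of the front header from the new header.
        return new, "front_header_reupdated"
    return new, "consistent"


old_header = encode_header({"key_count": 2})
new_header = encode_header({"key_count": 3})
case_new_write_failed = recover(old_header, new_header[:3])
case_front_update_failed = recover(old_header, new_header)
case_ok = recover(new_header, new_header)
```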
  • the file combining apparatus 500 may further include a reading unit 580.
  • the reading unit 580 can read the target value corresponding to the request key from the target file.
  • FIG. 8 is a functional block diagram showing functional modules that a reading unit can have.
  • the reading unit 580 may include an obtaining module 581, a determining module 583, a searching module 585, and a reading module 587.
  • The obtaining module 581 can acquire the file header and the index block of the target file, and the determining module 583 can determine, according to the file header, whether the request key is within the range of keys indicated by the file header. In the case where it is determined that the request key is within the range, the lookup module 585 can look up the leaf node corresponding to the request key in the index block based on the B+ tree structure of the index block. The reading module 587 can read the target value from the data block of the target file according to the logical address, stored in the found leaf node, of the value corresponding to the key.
  • This embodiment also provides an electronic device including a memory and a processor.
  • The memory is configured to store executable instructions.
  • The processor is configured to run the electronic device, under the control of the executable instructions, to perform any one of the file merging methods provided in the embodiment.
  • the electronic device can be an electronic device 1000 as shown in FIG.
  • the keys and values of the file can be stored separately, and the keys are stored in the form of a B+ tree. Therefore, when the two files are merged, one file can be kept unchanged, and the value of the other file can be directly added and written, thereby improving the writing performance.
  • the merged index block is a new B+ tree, and the value in the merged file can be conveniently read according to the new index block, and the read performance of the merged file is not affected.
  • LevelDB exposes many shortcomings in many aspects such as reading, writing, merging, data cleaning, restarting recovery, etc.
  • this embodiment proposes a new database management method and database system.
  • FIG. 11 is a block diagram showing the structure of a database system according to an embodiment of the present invention.
  • the database system 100 of the present invention mainly includes an internal memory 110 and an external memory 120.
  • the internal memory 110 and the external memory 120 can cooperate to complete data storage.
  • FIG. 12 is a flow chart showing the cooperation between the internal memory 110 and the external memory 120 to implement data storage.
  • a plurality of pieces of data to be stored may be written by the internal memory 110 to a log file in the external memory 120.
  • Each piece of data includes corresponding keys and values, and the log files can be sequentially written in the order in which the data arrives.
  • step S120 may be performed to write the data in the log file to the memory table in the internal memory 110 by the external memory 120.
  • the data written in the memory table can be stored in order according to the size of the key.
  • The data stored in the memory table may adopt a jump table structure (that is, a skip list), so that the data stored in the memory table is arranged in order according to the size of the key.
  • The memory table may be composed of a hash table, where the hash table may include one or more hash buckets, each hash bucket corresponds to one jump table, and each piece of data in the memory table constitutes one element of a jump table, with the elements in the jump table ordered according to the size of the key.
  • A hash table is placed in front of the jump tables. On the one hand, this reduces the lock granularity: for concurrent read and write operations, if the keys are not the same, fast lookup and insertion can be performed in the jump table corresponding to the respective hash bucket.
  • On the other hand, each jump table is kept from growing too large, which reduces the probability that a jump table degenerates into a linear search as the amount of data grows, and improves the overall search efficiency.
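  • The hash-bucket arrangement above can be sketched as follows. To keep the example short, a sorted list maintained with `bisect.insort` is swapped in for each jump table (skip list); it preserves the ordering property but not the skip list's insertion cost. The bucket count, the use of CRC32 as the bucket hash, and all class and method names are assumptions.

```python
import bisect
import heapq
import threading
import zlib


class Bucket:
    """One hash bucket: an ordered key list (stand-in for a skip list) with its own lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.keys = []     # kept sorted by key
        self.values = {}   # key -> latest value

    def put(self, key: bytes, value):
        with self.lock:
            if key not in self.values:
                bisect.insort(self.keys, key)
            self.values[key] = value

    def get(self, key: bytes):
        with self.lock:
            return self.values.get(key)


class MemTable:
    """Hash table in front of per-bucket ordered structures, so writers to
    different buckets do not contend on the same lock."""

    def __init__(self, bucket_count: int = 4):
        self.buckets = [Bucket() for _ in range(bucket_count)]

    def _bucket(self, key: bytes) -> Bucket:
        return self.buckets[zlib.crc32(key) % len(self.buckets)]

    def put(self, key: bytes, value):
        self._bucket(key).put(key, value)

    def get(self, key: bytes):
        return self._bucket(key).get(key)

    def sorted_keys(self):
        """Merge the per-bucket orderings into one global key order."""
        return list(heapq.merge(*(b.keys for b in self.buckets)))


table = MemTable()
table.put(b"b", 2)
table.put(b"a", 1)
table.put(b"b", 3)  # overwrite keeps a single entry for the key
```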
  • The internal memory 110 can convert the memory table into a read-only memory table (step S130), after which data in the log file that has not yet been written into the internal memory 110 can be written to a new memory table.
  • read-only memory tables can only be read and cannot be written.
  • The log files in the external memory 120 and the memory tables in the internal memory 110 may have a one-to-one correspondence; that is, a piece of key-value data may first be written to a log file and then written from that log file into the corresponding memory table.
  • the newly arrived data can be written into a new log file, and the data in the new log file can be written into the new memory table.
  • step S140 may be performed, and the data in the read-only memory table may be written into the external memory 120 by the internal memory 110 to obtain the first-level storage file.
  • the external memory 120 may perform step S150 to merge two or more first-level storage files stored therein to obtain a second-level storage file.
  • The external memory 120 may specify a primary file name of the first-level storage file according to a first naming rule, and may specify a primary file name of the second-level storage file according to a second naming rule, where the first naming rule and the second naming rule can be set differently, so that whether a storage file is a first-level storage file or a second-level storage file can be distinguished based on the primary file name. For example, "_0" can be added after the main file name of the first-level storage file, and "_1" can be added after the main file name of the second-level storage file. That is, the first-level storage file and the second-level storage file can be named xxx_0.hdb and xxx_1.hdb, respectively.
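  • The naming convention in the example above can be checked with a few lines of code. This is a sketch that assumes exactly the "_0" / "_1" suffixes from the example; the function name `storage_level` is hypothetical.

```python
from pathlib import PurePath


def storage_level(filename: str) -> int:
    """Return 1 for a first-level storage file and 2 for a second-level one,
    based only on the suffix of the primary file name."""
    stem = PurePath(filename).stem          # "xxx_0.hdb" -> "xxx_0"
    _, sep, suffix = stem.rpartition("_")
    if not sep or suffix not in ("0", "1"):
        raise ValueError(f"file name does not follow the naming rule: {filename}")
    return 1 if suffix == "0" else 2


level_a = storage_level("xxx_0.hdb")
level_b = storage_level("xxx_1.hdb")
```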
  • FIG. 13 is a static diagram showing the process of storing data.
  • the read-only memory table queue can be maintained in the internal memory.
  • When the data in a read-only memory table has not all been written to the external memory and the size of the new memory table exceeds a predetermined threshold, the new memory table is converted into another read-only memory table and put into the read-only memory table queue. By maintaining this queue, the system can avoid the write blocking that would otherwise occur under high-frequency writes, when data cannot be merged in time and the memory table becomes full.
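  • The queue behaviour can be sketched as follows. The sketch assumes that table size is counted in entries and that a plain dict models a memory table; `MemTableManager` and its methods are hypothetical names.

```python
from collections import deque


class MemTableManager:
    """Keeps one writable memory table plus a FIFO queue of read-only tables."""

    def __init__(self, threshold: int):
        self.threshold = threshold   # assumed to be a number of entries
        self.active = {}             # the writable memory table
        self.read_only = deque()     # read-only tables waiting to be flushed

    def put(self, key, value):
        self.active[key] = value
        if len(self.active) > self.threshold:
            # Freeze the full table and keep accepting writes in a fresh one,
            # even if earlier read-only tables have not been flushed yet.
            self.read_only.append(self.active)
            self.active = {}

    def flush_oldest(self):
        """Hand the oldest read-only table to the flusher as key-ordered pairs."""
        if not self.read_only:
            return None
        return sorted(self.read_only.popleft().items())


manager = MemTableManager(threshold=2)
for i, key in enumerate([b"a", b"b", b"c", b"d", b"e", b"f"], start=1):
    manager.put(key, i)
queued_before_flush = len(manager.read_only)
first_flushed = manager.flush_oldest()
```

  • Writes never wait for a flush in this model; the cost is that memory use grows with the length of the queue.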
  • the structure of the database system of the present embodiment and the data storage flow of the database system for persistently storing data in the external memory have been described so far with reference to FIGS. 11 to 13.
  • the following describes the process of merging the storage files stored in the external storage, the data search process, and the data recovery process when the database system is restarted under special circumstances.
  • A schematic diagram of the data structure of a storage file stored in the external memory has been shown in Fig. 1.
  • The files described in the present invention can be physically divided by blocks into file headers, data blocks, and index blocks, and each block can be composed of a plurality of pages.
  • The page mentioned herein is the minimum unit of I/O, and its size is generally an integer multiple of the system page size.
  • the size of the pages of different types of blocks can be different.
  • the data block is used to store the value.
  • the index block is used to store the key corresponding to the value in the form of a B+ tree.
  • the form of the B+ tree is well known to those skilled in the art and will not be described here. It should be noted that each leaf node in the B+ tree corresponds to a key, and the logical addresses of all the keys and their corresponding values in the data block are respectively recorded in the leaf nodes in the B+ tree. That is, only the key is stored in the leaf node of the B+ tree, and no value is stored. Instead, the offset of the page in the data block where the value is located and the offset of the value in the page can be stored.
  • All the nodes constituting the B+ tree (root node, internal nodes, and leaf nodes) are stored physically contiguously, so that the local preloading feature of the disk can be used to quickly acquire all nodes of the B+ tree, which improves the efficiency of building a new B+ tree during the merge process (the merge process is explained in more detail below).
  • the header is used to record the metadata information of the file.
  • the metadata information may include the number of keys in the index block, the range of keys in the index block, the height of the B+ tree, the logical address of the first leaf node in the B+ tree, and the number of internal nodes in the B+ tree.
  • The file header of the storage file may include a front file header and a rear file header, and the metadata information of the file recorded by the front file header and the rear file header may be the same.
  • The storage file may further include a filter, which may be used to determine whether an accessed key is in the file. For example, the filter may be a Bloom filter; for an access to a key that does not exist, the Bloom filter can quickly determine that the key does not exist, without querying the B+ tree. Because a Bloom filter is in essence a hash-based structure, it can judge the existence of a key with O(1) complexity, whereas the search time complexity of the B+ tree is O(log n); setting a Bloom filter can therefore improve search efficiency and thus read performance.
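  • A minimal Bloom filter sketch is given below to illustrate the check. The bit-array size, the number of hash functions, and the use of salted SHA-256 digests are assumptions chosen for the example, not parameters from this embodiment. A Bloom filter can report false positives but never false negatives, so a negative answer safely skips the B+ tree lookup.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array probed by several salted hashes of the key."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        """False means the key is definitely absent; True means it may be present."""
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


empty_filter = BloomFilter()
key_filter = BloomFilter()
for k in (b"apple", b"banana"):
    key_filter.add(k)
```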
  • FIG. 4 is a schematic flow chart showing a method of merging storage files according to an embodiment of the present invention.
  • The method may merge two or more storage files: two or more first-level storage files may be merged into one second-level storage file, or two or more second-level storage files may be merged to generate a new second-level storage file.
  • the merging process of the storage file of this embodiment is described here by taking the first storage file and the second storage file as an example.
  • In step S210, an additional data block is appended after the first storage file, and the values in the data block of the second storage file are written into it.
  • The freshness of the second storage file may be greater than that of the first storage file; that is, the second storage file may have been stored in the external storage later, and the first storage file earlier.
  • The values in the data block of the second storage file may be appended after the first storage file, and the block into which these values are appended is referred to as the additional data block. That is, the values in the data block of the second storage file are rewritten into the additional data block after the first storage file, so that the end of the first storage file and the start address of the additional data block are contiguous.
  • new index information can be created, that is, in step S220, the new index block is additionally written after the additional data block.
  • the new index block is generated based on the index block of the first storage file and the index block of the second storage file.
  • The freshness of the second storage file may be greater than that of the first storage file, so a key-value pair in the second storage file may be a modification, deletion, or replacement of a key-value pair in the first storage file. Therefore, for a key that exists in the index blocks of both the first storage file and the second storage file, the key in the second storage file, which has the higher freshness, can be selected as the valid key and the key in the first storage file discarded, thereby constructing the new index block.
  • the keys in the generated new index block are all valid keys, and the corresponding values are all valid values.
  • the key in the new index block is also stored in the form of a B+ tree, which is regenerated according to the index block of the first storage file and the index block of the second storage file, and thus may be referred to as a new B+ tree.
  • All the valid keys in the index block of the first storage file and the index block of the second storage file, together with the logical addresses of their corresponding values in the data block of the first storage file and in the additional data block, are recorded in the leaf nodes of the new B+ tree.
  • All the nodes of the B+ trees in the index block of the first storage file and the index block of the second storage file are stored physically contiguously, so that in the process of constructing the new B+ tree, the local preloading feature of the disk can be used: the index block of the first storage file and the index block of the second storage file can be obtained by simply traversing successive disk blocks, thereby improving the construction efficiency of the new B+ tree.
  • the index block in the first storage file is invalidated and replaced by the new index block.
  • the invalidity mentioned here means that in the subsequent search process, the new index block is used for searching, and the old index block is no longer used. That is, after generating a new index block, the old index block may not be deleted.
  • In step S230, a new file header is appended after the new index block to record the metadata information of the merged new file.
  • The metadata information of the new file may include the number of keys in the new index block, the range of keys in the new index block, the height of the new B+ tree, the logical address of the first leaf node in the new B+ tree, the number of internal nodes in the new B+ tree, and so on. After the new file header is generated, the second storage file can be deleted to free up storage space.
  • FIG. 5 is a schematic diagram showing a merge process of merging G files into F files.
  • the F file is unchanged, and only the value in the G file needs to be additionally written into the F file, and a new index block and a new file header are generated.
  • The merge process is simpler, and the value corresponding to a key in the file can be conveniently found according to the merged B+ tree, which improves read performance.
  • FIG. 6 is another schematic diagram showing a merge process of merging G files into F files.
  • both the F file and the G file in FIG. 6 include a front file header located at the head of the file and a rear file header located at the end of the file.
  • The contents of the front file header and the rear file header are the same.
  • The front file header of the F file can be updated according to the new file header so that it serves as the front file header of the new file, and the new file header is used as the rear file header of the new file.
  • Under normal circumstances, both file headers of the new file are updated successfully and are the same.
  • When data is persistently stored in a storage file in the external storage, it is first written to the memory table, then passes into a read-only memory table, is then written to a first-level storage file in the external memory, and the first-level storage file is finally merged into a second-level storage file. Therefore, the freshness of the data decreases in the order of the memory table, the read-only memory table, the first-level storage file, and the second-level storage file.
  • step S410 may first be performed to find in the memory table whether there is a key corresponding to the request key. For example, when the data in the memory table is stored in the form of a hash table and a jump table, it may first be located according to the request key to the specific hash bucket in the memory table, and then searched in the corresponding jump table.
  • If the key is found in the memory table, the value can be read directly. If it is not found in the memory table, the search can continue in the read-only memory table in the internal memory for a key corresponding to the request key (step S420). When the internal memory maintains a read-only memory table queue with multiple read-only memory tables, the read-only memory tables in the queue can be searched one by one in chronological order.
  • If it is not found in the read-only memory table, the search can continue in the first-level storage files in the external memory. Here, the first-level storage files in the external storage can be searched in chronological order for a key corresponding to the request key (step S430).
  • If it is found, the value corresponding to the request key can be read from the first-level storage file. If it is not found, the search can continue in the second-level storage files in the external storage, where binary search can be used to find whether a second-level storage file contains the value corresponding to the request key (step S440).
  • If it is found, the value corresponding to the request key can be read from the second-level storage file. If it is not found, the request key and its corresponding value are not stored in the database system.
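  • The four-stage search of steps S410 to S440 can be sketched as follows. Several assumptions are made for illustration: dicts model the memory tables and the contents of each storage file, lists are ordered from oldest to newest and searched newest first (consistent with the freshness order described above), and the second-level storage files are assumed to have non-overlapping key ranges sorted by minimum key so that binary search applies. The function name `lookup` and the dict fields are hypothetical.

```python
import bisect


def lookup(request_key, mem_table, read_only_tables, level1_files, level2_files):
    """Search from the freshest tier to the stalest; return None if the key is absent."""
    # S410: the writable memory table holds the freshest data.
    if request_key in mem_table:
        return mem_table[request_key]
    # S420: read-only memory tables, newest first.
    for table in reversed(read_only_tables):
        if request_key in table:
            return table[request_key]
    # S430: first-level storage files, newest first; key ranges may overlap.
    for f in reversed(level1_files):
        if f["min_key"] <= request_key <= f["max_key"] and request_key in f["data"]:
            return f["data"][request_key]
    # S440: second-level storage files; binary search on the sorted key ranges.
    starts = [f["min_key"] for f in level2_files]
    i = bisect.bisect_right(starts, request_key) - 1
    if i >= 0 and request_key <= level2_files[i]["max_key"]:
        return level2_files[i]["data"].get(request_key)
    return None


mem = {b"a": b"mem"}
ro = [{b"b": b"old"}, {b"b": b"new"}]
l1 = [{"min_key": b"c", "max_key": b"c", "data": {b"c": b"l1"}}]
l2 = [
    {"min_key": b"d", "max_key": b"f", "data": {b"e": b"l2-e"}},
    {"min_key": b"g", "max_key": b"k", "data": {b"h": b"l2-h"}},
]
results = {k: lookup(k, mem, ro, l1, l2) for k in (b"a", b"b", b"c", b"e", b"h", b"z")}
```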
  • Figure 15 is a flow chart showing the lookup inside a file.
  • The file header and the index block of the target storage file may first be acquired (step S510), and step S520 is then performed to determine, according to the file header, whether the request key is within the range of keys indicated by the file header. If not, the value corresponding to the request key does not exist in the target storage file, and the reading ends.
  • If the request key is within the range, step S530 may be performed to find the leaf node corresponding to the request key in the index block based on the B+ tree structure of the index block.
  • If the leaf node corresponding to the request key is not found in the index block, the value corresponding to the request key does not exist in the target storage file, and the reading ends.
  • If the leaf node is found, step S540 may be performed to read the target value from the data block of the target storage file according to the logical address, stored in the found leaf node, of the value corresponding to the key.
  • Fig. 16 is a schematic flow chart showing recovery upon restart according to the present embodiment.
  • the sequence between step S610 and step S620 is not required, and may be performed at the same time or at different times.
  • a second level storage file list is constructed. Specifically, the index block of the second-level storage file, the filter block (in some cases), the file header, and the like may be pre-loaded by means of memory mapping, and then the second-level storage file list is constructed according to the range order of the keys.
  • a first level storage file list is constructed. Specifically, the index block of the first-level storage file, the filter block (in some cases), the file header, and the like may be pre-loaded by means of memory mapping, and then the first-level storage file list is constructed according to the range order of the keys.
  • After the first-level storage file list and the second-level storage file list are constructed, the progress with which the log files have been written into the first-level storage files can be determined (step S630), so that the memory table and the read-only memory table in the internal memory can be constructed according to this write progress (step S640).
  • Specifically, a plurality of log files are written in the external memory, each corresponding to a memory table (or a read-only memory table), so that it can be determined, according to the constructed first-level storage file list and second-level storage file list, which of the log files contain data that has not been written to a storage file.
  • Each log file whose data has not been written to a storage file can then be converted into a read-only memory table, except that for the last generated log file, the data in the log file can be written to the memory table.
  • the recovery of the memory table and the read-only memory table in the internal memory can be completed.
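  • Steps S630 and S640 can be sketched as follows. The text does not specify how write progress is derived from the storage file lists, so this sketch simply assumes that the set of log-file identifiers already persisted to storage files is known; `recover_tables` and its parameters are hypothetical.

```python
def recover_tables(log_files, flushed_log_ids):
    """Rebuild the in-memory tables after a restart.

    log_files       : list of (log_id, records) ordered from oldest to newest,
                      where records is a list of (key, value) in arrival order
    flushed_log_ids : set of log ids whose data already reached a storage file
                      (assumed to be derived from the storage file lists)
    """
    mem_table, read_only_tables = {}, []
    if not log_files:
        return mem_table, read_only_tables
    newest_id = log_files[-1][0]
    for log_id, records in log_files:
        if log_id in flushed_log_ids:
            continue  # already persisted; nothing to rebuild
        if log_id == newest_id:
            # The last generated log file backs the writable memory table.
            mem_table = dict(records)
        else:
            # Older unflushed log files become read-only memory tables.
            read_only_tables.append(dict(records))
    return mem_table, read_only_tables


logs = [
    (1, [(b"a", b"1")]),
    (2, [(b"b", b"2")]),
    (3, [(b"c", b"3"), (b"c", b"4")]),  # later record for the same key wins
]
mem_table, read_only_tables = recover_tables(logs, flushed_log_ids={1})
```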
  • This embodiment also provides an electronic device including a memory and a processor.
  • The memory is configured to store executable instructions.
  • The processor is configured to run the electronic device, under the control of the executable instructions, to perform any one of the file merging methods provided in the embodiment.
  • the electronic device can be an electronic device 1000 as shown in FIG.
  • the file finally stored in the external memory has only two hierarchical structures, and the file redundancy is low, which is convenient to find.
  • In the present embodiment, a database management method, a database system, and an electronic device as described below are also provided.
  • Aspect 2 The database management method of aspect 1, further comprising:
  • Specifying a primary file name of the first-level storage file by using a first naming rule, and specifying a primary file name of the second-level storage file by using a second naming rule, the first naming rule being different from the second naming rule, so that whether a storage file is a first-level storage file or a second-level storage file can be distinguished based on the primary file name.
  • the memory table is composed of a hash table
  • the hash table includes one or more hash buckets, and each hash bucket corresponds to one jump table.
  • Each piece of data in the memory table constitutes an element of a jump table, wherein the elements in the jump table are ordered according to the size of the key.
  • Aspect 4 The database management method of aspect 1, further comprising:
  • In a case where the data in the read-only memory table has not all been written to the external memory and the size of the new memory table exceeds a predetermined threshold, the new memory table is converted into another read-only memory table and put into the read-only memory table queue.
  • An index block configured to store, in a B+ tree, a key corresponding to the value, wherein a logical address of all the keys and their corresponding values in the data block are respectively recorded in a leaf node in the B+ tree, And all nodes constituting the B+ tree are physically stored continuously.
  • Aspect 6 The database management method of aspect 5, wherein the step of merging the two first level storage files comprises:
  • the new index block being generated based on an index block of the first storage file and an index block of the second storage file, wherein every valid key in the two index blocks, together with the logical address of its corresponding value in the data block of the first storage file or in the additional data block, is recorded in a leaf node of a new B+ tree.
  • a new file header is additionally written after the new index block to record metadata information of the merged new file.
  • Aspect 7 The database management method of aspect 6, wherein the metadata information comprises one or more of the following:
  • Aspect 8 The database management method of aspect 6, further comprising:
  • the file includes a front file header located at the head of the file and a rear file header located at the tail of the file, and the contents of the front file header and the rear file header are the same.
  • Aspect 10 The database management method of aspect 8 or 9, further comprising:
  • the new file is restored to the first storage file before the merge according to the file header of the first storage file;
  • the file header of the first storage file is re-updated according to the new file header.
  • Aspect 11 The database management method of any of aspects 1-9, further comprising:
  • Aspect 12 The database management method of aspect 11, further comprising:
  • the target value is read from the data block of the target storage file at the logical address that the found leaf node records for the value corresponding to the key.
  • Aspect 13 The database management method of any of aspects 1-9, further comprising:
  • a memory table and a read-only memory table in the internal memory are constructed.
  • a database system comprising: an internal memory and an external memory, wherein
  • the internal memory is used to write a plurality of pieces of data into a log file in an external memory.
  • the external memory writes data in the log file to a memory table in an internal memory, wherein data written in the memory table is stored in an orderly manner according to a size of a key.
  • the internal memory converts the memory table into a read-only memory table, and the external memory writes subsequent data in the log file to a new memory table.
  • the internal memory writes data in the read-only memory table into the external memory to obtain a first-level storage file.
  • the external memory merges two or more first level storage files to obtain a second level storage file.
  • Aspect 15 The database system of aspect 14, wherein
  • the external storage specifies a primary file name of the first-level storage file by a first naming rule, and specifies a primary file name of the second-level storage file by a second naming rule, the first naming rule and the The second naming rule is different to distinguish whether the storage file is a first-level storage file or a second-level storage file based on the primary file name.
  • the memory table is composed of a hash table
  • the hash table includes one or more hash buckets
  • each hash bucket corresponds to a skip list.
  • Each piece of data in the memory table constitutes an element of a skip list, wherein the elements in each skip list are ordered by key.
  • Aspect 17 The database system of aspect 14, wherein
  • the database system of aspect 14, wherein the data structure of the storage file comprises:
  • An index block configured to store, in the form of a B+ tree, the keys corresponding to the values, wherein every key, together with the logical address of its corresponding value in the data block, is recorded in a leaf node of the B+ tree, and all nodes constituting the B+ tree are stored physically contiguously.
  • the new index block being generated based on an index block of the first storage file and an index block of the second storage file, wherein every key in the two index blocks, together with the logical address of its corresponding value in the data block of the first storage file or in the additional data block, is recorded in a leaf node of a new B+ tree.
  • a new file header is additionally written after the new index block to record metadata information of the merged new file.
  • An electronic device comprising:
  • a memory for storing executable instructions
  • the invention can be a system, method and/or computer program product.
  • the computer program product can comprise a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement various aspects of the present invention.
  • the computer readable storage medium can be a tangible device that can hold and store the instructions used by the instruction execution device.
  • the computer readable storage medium can be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • A non-exhaustive list of computer readable storage media includes: portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read only memory (CD-ROM), digital versatile disk (DVD), memory stick, floppy disk, mechanical encoding devices such as punch cards or raised structures in a groove with instructions stored thereon, and any suitable combination of the above.
  • a computer readable storage medium as used herein is not to be interpreted as a transient signal itself, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse through a fiber optic cable), or an electrical signal transmitted through a wire.
  • the computer readable program instructions described herein can be downloaded from a computer readable storage medium to various computing/processing devices or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in each computing/processing device.
  • Computer program instructions for performing the operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine related instructions, microcode, firmware instructions, state setting data, or source code or object code written in one or more programming languages.
  • the computer readable program instructions can execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or wide area network (WAN), or can be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can be customized by utilizing state information of the computer readable program instructions.
  • the electronic circuit can execute the computer readable program instructions to implement various aspects of the present invention.
  • the computer readable program instructions can be provided to a processor of a general purpose computer, a special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more of the blocks of the flowcharts and/or block diagrams.
  • the computer readable program instructions can also be stored in a computer readable storage medium and cause a computer, a programmable data processing device, and/or other devices to operate in a particular manner, such that the computer readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing various aspects of the functions/acts specified in one or more of the blocks of the flowcharts and/or block diagrams.
  • the computer readable program instructions can also be loaded onto a computer, other programmable data processing device, or other device to perform a series of operational steps on a computer, other programmable data processing device or other device to produce a computer-implemented process.
  • instructions executed on a computer, other programmable data processing apparatus, or other device implement the functions/acts recited in one or more of the flowcharts and/or block diagrams.
  • each block in the flowchart or block diagram can represent a module, a program segment, or a portion of instructions that includes one or more executable instructions for implementing the specified logical functions.
  • The functions noted in the blocks can also occur in a different order than that illustrated in the drawings. For example, two consecutive blocks may be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are equivalent.

Abstract

Disclosed are a file merging method and apparatus. The method comprises: appending an additional data block after a first file and writing into it the values in a data block of a second file; appending a new index block after the additional data block, the new index block being generated based on an index block of the first file and an index block of the second file, wherein every key in the two index blocks, together with the logical address of its corresponding value in the data block of the first file or in the additional data block, is recorded in a leaf node of a new B+ tree; and appending a new file header after the new index block to record metadata information of the merged new file. Accordingly, when two files are merged, the values of one file need only be appended directly to the other file, which improves write performance; and the merged index block is a new B+ tree, which makes it convenient to look up and read values in the merged large file, thereby improving read performance.

Description

File merging method and apparatus
Technical field
The present invention relates to the field of data storage technologies, and in particular to a method and apparatus for merging files stored in an external memory.
Background
Across the storage engines of today's databases, the underlying data structure is either a B-tree (or its variant, the B+ tree) or an LSM tree. The former is more read-friendly, while the latter is more write-friendly. Although the two may seem impossible to have at once, like the proverbial fish and bear's paw, the demanding Internet world still calls for a data storage scheme that is friendly to both reads and writes. The data structure used in LevelDB does appear to combine an LSM tree with a B-tree, but the combination is not thorough. First, strictly speaking, it is not a B-tree but merely a simple multiway tree. Second, its keys and values are stored together, which hinders optimization of the index, and that optimization is especially important when merging data.
Specifically, the files that LevelDB stores on disk are divided into multiple levels, and each level holds many files (SSTable files). To reduce redundancy and improve read performance, the SSTable files need to be merged. Because the keys and their corresponding values are stored together in an SSTable file, merging LevelDB files requires taking out all key-value pairs and processing them one by one to build a new file. The merge process is therefore fairly complex, and it lowers write performance even as it improves read performance.
In recent years, with the rise of NoSQL, a variety of key-value (KV) storage engines have emerged. Some target caching and others target persistence; in the persistence field, LevelDB is a representative example. LevelDB is a high-performance KV storage engine developed by Google and inspired by Google's BigTable.
Although LevelDB performs very well with small data volumes, with large data volumes (hundreds of gigabytes) and high-frequency writes it exposes shortcomings in reading, writing, merging, data cleanup, restart recovery, and other respects.
Summary of the invention
One object of the embodiments of the present invention is to provide a file merging method and apparatus that are friendly to both data reads and data writes.
According to one aspect of the embodiments of the present invention, a file merging method is provided. A file is stored in an external memory and includes a file header, a data block, and an index block. The file header records metadata information of the file, the data block stores values, and the index block stores the keys corresponding to the values in the form of a B+ tree, where every key, together with the logical address of its corresponding value in the data block, is recorded in a leaf node of the B+ tree. The method includes: appending an additional data block after the first file and writing into it the values in the data block of the second file; appending a new index block after the additional data block, where the new index block is generated from the index block of the first file and the index block of the second file, and every valid key in the two index blocks, together with the logical address of its corresponding value in the data block of the first file or in the additional data block, is recorded in a leaf node of a new B+ tree; and appending a new file header after the new index block to record the metadata information of the merged new file.
The keys and values of the files described in the embodiments of the present invention are stored separately, and the keys are stored in the form of a B+ tree. When two files are merged, one file can therefore be left in place while the values of the other file are simply appended to it, which improves write performance. Because the merged index block is a new B+ tree, the values in the merged file can be read conveniently through the new index block, so the read performance of the merged file is not degraded.
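The three append steps can be illustrated with a minimal byte-level sketch. It is not the patented implementation: the index is a JSON dictionary standing in for the B+ tree, the header is a hypothetical 16-byte trailer that only records where the current index lives, and the rule that the second file wins on duplicate keys is an assumption, since the text only speaks of "valid" keys.

```python
import json
import struct

# Hypothetical trailer standing in for the file header: (index_offset, index_length).
TRAILER = struct.Struct("<QQ")


def append_index_and_header(buf, index):
    """Append an index block, then a header that points at it."""
    payload = json.dumps(index, sort_keys=True).encode()
    index_offset = len(buf)
    buf += payload
    buf += TRAILER.pack(index_offset, len(payload))


def load_index(buf):
    """Read the header at the tail of the file, then the index block it points at."""
    index_offset, index_length = TRAILER.unpack_from(buf, len(buf) - TRAILER.size)
    return json.loads(bytes(buf[index_offset:index_offset + index_length]).decode())


def build_file(pairs):
    """Write values first, then the index (key -> [offset, length]) and the header."""
    buf = bytearray()
    index = {}
    for key in sorted(pairs):
        index[key] = [len(buf), len(pairs[key])]
        buf += pairs[key]
    append_index_and_header(buf, index)
    return buf


def merge_into(first, second):
    """Merge `second` into `first` using appends only; existing bytes are untouched."""
    index = load_index(first)
    second_index = load_index(second)
    for key in sorted(second_index):
        offset, length = second_index[key]
        new_offset = len(first)
        first += second[offset:offset + length]   # step 1: additional data block
        index[key] = [new_offset, length]         # assumption: second file is newer
    append_index_and_header(first, index)         # steps 2 and 3: new index, new header
    return first


def read_value(buf, key):
    offset, length = load_index(buf)[key]
    return bytes(buf[offset:offset + length])
```

Because the first file's bytes are never rewritten, the addresses recorded for its values remain valid after the merge; only the entries that came from the second file receive new addresses.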
Optionally, the metadata information may include one or more of the following:
the number of keys in the index block;
the range of keys in the index block;
the height of the B+ tree;
the logical address of the first leaf node in the B+ tree;
the number of internal nodes in the B+ tree.
Thus, when the target value corresponding to a request key is to be read, the metadata information in the file header can first be used to determine whether the request key falls within the key range of the file, and the index block of the file is searched only if it does, which avoids unnecessary lookups.
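As a rough illustration of how such a header could be used to skip files, the sketch below models the listed metadata as a small record and filters candidate files by key range. The field names, the use of `bytes` keys, and the helper `files_worth_searching` are all hypothetical; the text does not specify a concrete layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHeader:
    """Hypothetical in-memory form of the metadata listed above."""
    key_count: int            # number of keys in the index block
    min_key: bytes            # lower bound of the key range
    max_key: bytes            # upper bound of the key range
    tree_height: int          # height of the B+ tree
    first_leaf_address: int   # logical address of the first leaf node
    internal_node_count: int  # number of internal nodes in the B+ tree

    def may_contain(self, key: bytes) -> bool:
        """True if the key lies inside the recorded range (it may still be absent)."""
        return self.min_key <= key <= self.max_key


def files_worth_searching(headers, key):
    """Return the names of the files whose index block needs to be searched."""
    return [name for name, header in headers.items() if header.may_contain(key)]
```

The range test can only rule files out; a key inside the range still has to be looked up in the index block.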
Optionally, all nodes constituting the B+ tree are stored physically contiguously.
Thus, by storing the B+ tree physically contiguously, the locality-based preloading (read-ahead) of the disk can be exploited, so that while the index block is being rebuilt, the index blocks of the files to be merged can be obtained simply by traversing consecutive disk blocks.
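One way to picture a contiguously stored B+ tree is a flat array of nodes built bottom-up from sorted entries, with all leaves at the front. The sketch below is an assumption-laden illustration (the dictionary node format, the `fanout` parameter, and the function names are invented here), but it shows why a sequential scan of the leaf region is enough to recover every key and address in order when an index has to be rebuilt.

```python
from bisect import bisect_right


def bulk_load(entries, fanout=4):
    """Build a B+ tree bottom-up from a non-empty list of (key, address) sorted by key.

    Returns (nodes, root_index, height). Nodes live in one flat list, leaves first,
    which stands in for physically contiguous storage.
    """
    nodes = []
    level = []  # (smallest key below the node, node index)
    for i in range(0, len(entries), fanout):
        chunk = entries[i:i + fanout]
        nodes.append({"leaf": True,
                      "keys": [k for k, _ in chunk],
                      "addrs": [a for _, a in chunk]})
        level.append((chunk[0][0], len(nodes) - 1))
    height = 1
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            nodes.append({"leaf": False,
                          "keys": [k for k, _ in group[1:]],      # separator keys
                          "children": [idx for _, idx in group]})
            parents.append((group[0][0], len(nodes) - 1))
        level = parents
        height += 1
    return nodes, level[0][1], height


def lookup(nodes, root, key):
    """Descend from the root to a leaf and return the address stored for key."""
    node = nodes[root]
    while not node["leaf"]:
        node = nodes[node["children"][bisect_right(node["keys"], key)]]
    if key in node["keys"]:
        return node["addrs"][node["keys"].index(key)]
    return None


def scan_leaves(nodes):
    """Yield every (key, address) in key order by walking the leaf region sequentially."""
    for node in nodes:
        if not node["leaf"]:
            break
        yield from zip(node["keys"], node["addrs"])
```

Merging two such indexes can then be done by scanning both leaf regions and bulk-loading the combined, sorted entries into a new tree.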
Optionally, the file merging method may further include: updating the file header of the first file according to the new file header, so that the metadata information in the file header of the first file is replaced with the metadata information in the new file header.
Because an append write is a destructive write, the present invention uses a dual file header to prevent an abnormal situation during merging from damaging the file.
Optionally, the file includes a front file header at the head of the file and a rear file header at the tail of the file, with identical content. The front file header of the first file is updated according to the new file header and serves as the front file header of the new file, while the new file header serves as the rear file header of the new file.
Thus, when the merge completes normally, both the front and rear file headers of the new file are updated correctly, and either can be used to read the metadata information of the new file.
Optionally, the file merging method may further include: if the step of writing the metadata information of the new file into the new file header fails, restoring the new file to the first file as it was before the merge according to the file header of the first file; and/or, if the step of updating the file header of the first file fails, updating the file header of the first file again according to the new file header.
Thus, if an error occurs while the new file header is being written, the file header of the first file has not yet been updated, so the file being merged can be restored to the pre-merge first file according to that header; if an error occurs while the file header of the first file is being updated, the update can be redone from the new file header.
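The two recovery cases can be sketched as a roll-back or roll-forward decision made when a file is opened. The text does not say how a failed header write is detected, so the sketch assumes a hypothetical 20-byte header that stores the key count, the total file length, and a CRC32 checksum; the payload between the two headers stands in for the data and index blocks.

```python
import struct
import zlib

# Hypothetical header layout: key_count, file_length, crc32 of the first two fields.
HEADER = struct.Struct("<QQI")


def pack_header(key_count, file_length):
    body = struct.pack("<QQ", key_count, file_length)
    return body + struct.pack("<I", zlib.crc32(body))


def unpack_header(raw):
    """Return (key_count, file_length), or None if the bytes are torn or corrupt."""
    if len(raw) != HEADER.size:
        return None
    key_count, file_length, crc = HEADER.unpack(raw)
    if zlib.crc32(raw[:16]) != crc:
        return None
    return key_count, file_length


def new_file(key_count, payload):
    """Lay out a file as front header + payload + identical rear header."""
    total = 2 * HEADER.size + len(payload)
    header = pack_header(key_count, total)
    return bytearray(header + payload + header)


def recover(buf):
    """Repair a file after an interrupted merge."""
    front = unpack_header(bytes(buf[:HEADER.size]))
    rear = unpack_header(bytes(buf[-HEADER.size:]))
    if rear is not None and rear[1] == len(buf):
        # The new rear header was written completely: redo the front header update.
        buf[:HEADER.size] = buf[-HEADER.size:]
    else:
        # The new rear header is missing or torn, so the front header is still the
        # pre-merge one: cut the file back to the length it recorded.
        del buf[front[1]:]
    return buf
```

In this sketch the rear header acts as the commit record: once it is intact the merge is rolled forward, and otherwise everything appended after the recorded pre-merge length is discarded.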
Optionally, the file merging method may further include performing the following steps to read, from a target file, the target value corresponding to a request key: obtaining the file header and the index block of the target file; determining, according to the file header, whether the request key is within the key range indicated by the file header; if the request key is within the range, searching the index block for the leaf node corresponding to the request key based on the B+ tree structure of the index block; and reading the target value from the data block of the target file at the logical address that the found leaf node records for the value corresponding to the key.
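The four read steps map onto a short lookup routine. In this sketch the target file is a plain dictionary with hypothetical fields (`header`, `keys`, `addresses`, `data`), and a binary search over the sorted key list stands in for the descent through the B+ tree.

```python
from bisect import bisect_left


def read_target_value(target_file, request_key):
    """Return the value for request_key, or None if the file does not hold it."""
    # Step 1: obtain the file header and the index block.
    header = target_file["header"]
    keys = target_file["keys"]            # sorted keys (stand-in for the B+ tree)
    addresses = target_file["addresses"]  # (offset, length) per key, same order

    # Step 2: skip the file if the key is outside the recorded key range.
    if not (header["min_key"] <= request_key <= header["max_key"]):
        return None

    # Step 3: find the entry for the key (stand-in for locating the leaf node).
    pos = bisect_left(keys, request_key)
    if pos == len(keys) or keys[pos] != request_key:
        return None

    # Step 4: read the value from the data block at the recorded logical address.
    offset, length = addresses[pos]
    return target_file["data"][offset:offset + length]
```

The range check in step 2 is what lets a reader discard most files after looking only at their headers.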
According to another aspect of the embodiments of the present invention, a file merging apparatus is further provided. A file is stored in an external memory and includes a file header, a data block, and an index block. The file header records metadata information of the file, the data block stores values, and the index block stores the keys corresponding to the values in the form of a B+ tree, where every key, together with the logical address of its corresponding value in the data block, is recorded in a leaf node of the B+ tree. The apparatus includes: a first writing unit, configured to write an additional data block after the first file and write into it the values in the data block of the second file; a B-tree generating unit, configured to generate a new B+ tree from the index block of the first file and the index block of the second file, where every valid key in the two index blocks, together with the logical address of its corresponding value in the data block of the first file or in the additional data block, is recorded in a leaf node of the new B+ tree; a second writing unit, configured to append a new index block after the additional data block and write the new B+ tree into it; and a third writing unit, configured to append a new file header after the new index block to record the metadata information of the merged new file.
Optionally, the metadata information may include one or more of the following:
the number of keys in the index block;
the range of keys in the index block;
the height of the B+ tree;
the logical address of the first leaf node in the B+ tree;
the number of internal nodes in the B+ tree.
Optionally, the file merging apparatus may further include an updating unit, configured to update the file header of the first file according to the new file header, so that the metadata information in the file header of the first file is replaced with the metadata information in the new file header.
Optionally, the file includes a front file header at the head of the file and a rear file header at the tail of the file, with identical content. The updating unit updates the front file header of the first file according to the new file header to serve as the front file header of the new file, while the new file header serves as the rear file header of the new file.
Optionally, the file merging apparatus may further include: a first restoring unit, configured to restore the new file to the first file as it was before the merge, according to the file header of the first file, if the step of writing the metadata information of the new file into the new file header fails; and/or a second restoring unit, configured to update the file header of the first file again according to the new file header if the step of updating the file header of the first file fails.
Optionally, the file merging apparatus may further include a reading unit, configured to read, from a target file, the target value corresponding to a request key. The reading unit may include: an obtaining module, which obtains the file header and the index block of the target file; a determining module, which determines, according to the file header, whether the request key is within the key range indicated by the file header; a searching module, which, if the request key is within the range, searches the index block for the leaf node corresponding to the request key based on the B+ tree structure of the index block; and a value reading module, which reads the target value from the data block of the target file at the logical address that the found leaf node records for the value corresponding to the key.
In the file merging method and apparatus of the embodiments of the present invention, the keys and values of a file are stored separately, and the keys are stored in the form of a B+ tree. When two files are merged, one file can therefore be left in place while the values of the other file are appended directly to it, which improves write performance. The index block, which stores the keys as a B+ tree, is rebuilt, so the values in the merged file can be read conveniently through the new index block and the read performance of the merged file is not degraded.
Another object of the embodiments of the present invention is to provide a new database management method and database system.
According to one aspect of the embodiments of the present invention, a database management method is provided for storing multiple pieces of data, each of which includes a corresponding key and value. The method includes: writing the pieces of data to a log file in an external memory; writing the data in the log file to a memory table in an internal memory, where the data written to the memory table is stored in key order; when the size of the memory table exceeds a predetermined threshold, converting the memory table into a read-only memory table and writing subsequent data in the log file to a new memory table; writing the data in the read-only memory table to the external memory to obtain a first-level storage file; and merging two or more first-level storage files to obtain a second-level storage file.
Thus, the files finally stored in the external memory have only two levels, so redundancy is low and lookups are convenient.
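The write path described above can be condensed into a toy pipeline. Everything here is a simplification under stated assumptions: the log and the storage files are in-memory lists rather than files in external memory, "size" is counted in entries, a plain dictionary sorted at flush time stands in for the ordered memory table, and the class name `TinyStore` is invented for the example.

```python
class TinyStore:
    """Toy two-level store: log -> memory table -> level-1 files -> level-2 file."""

    def __init__(self, threshold=2):
        self.threshold = threshold  # predetermined size threshold (entry count here)
        self.log = []               # stand-in for the log file in external memory
        self.memtable = {}          # active memory table
        self.level1 = []            # first-level storage files (sorted pair lists)
        self.level2 = []            # second-level storage files

    def put(self, key, value):
        self.log.append((key, value))   # 1. record the write in the log
        self.memtable[key] = value      # 2. apply it to the memory table
        if len(self.memtable) > self.threshold:
            # 3. freeze the full table and start a new one for later writes
            read_only, self.memtable = self.memtable, {}
            # 4. write the read-only table out as a first-level storage file
            self.level1.append(sorted(read_only.items()))

    def compact(self):
        """5. Merge all first-level files into one second-level file."""
        if len(self.level1) < 2:
            return
        merged = {}
        for storage_file in self.level1:   # oldest first, so newer values win
            merged.update(storage_file)
        self.level2.append(sorted(merged.items()))
        self.level1.clear()
```

With only two levels, a lookup has to consult the memory tables, the first-level files, and the second-level files, rather than the many levels used by LevelDB.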
Optionally, the database management method may further include: specifying the primary file name of a first-level storage file by a first naming rule; and specifying the primary file name of a second-level storage file by a second naming rule, where the first naming rule differs from the second naming rule so that a storage file can be identified as a first-level or second-level storage file based on its primary file name.
Thus, whether a storage file is a first-level or second-level storage file can be determined from its primary file name.
Optionally, the memory table may consist of a hash table that includes one or more hash buckets, each hash bucket corresponding to one skip list. Each piece of data in the memory table is an element of a skip list, and the elements of each skip list are ordered by key.
Placing a hash table in front of the skip lists reduces lock granularity: for concurrent read and write operations, if the keys differ, fast lookups and insertions can proceed in the skip lists corresponding to their respective hash buckets. In addition, the memory table can grow without each skip list growing correspondingly, which lowers the probability that a skip list degrades toward linear search as the data volume increases, and thus improves overall lookup efficiency.
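A single-threaded sketch of this structure is given below: a fixed number of hash buckets, each backed by a small skip list. The bucket count, the use of `zlib.crc32` as the hash function, the maximum level, and the class names are assumptions made for illustration; per-bucket locking, which is where the reduced lock granularity would come from, is left out.

```python
import heapq
import random
import zlib


class _Node:
    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        self.forward = [None] * level   # next node at each level


class SkipList:
    MAX_LEVEL = 8

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._head = _Node(None, None, self.MAX_LEVEL)

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def _predecessors(self, key):
        """For every level, find the last node whose key is smaller than key."""
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in reversed(range(self.MAX_LEVEL)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def put(self, key, value):
        update = self._predecessors(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            nxt.value = value               # overwrite an existing key
            return
        new = _Node(key, value, self._random_level())
        for i in range(len(new.forward)):   # splice the node in, level by level
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def get(self, key):
        node = self._predecessors(key)[0].forward[0]
        return node.value if node is not None and node.key == key else None

    def items(self):
        node = self._head.forward[0]
        while node is not None:
            yield node.key, node.value
            node = node.forward[0]


class MemTable:
    """Hash table whose buckets are skip lists ordered by key."""

    def __init__(self, bucket_count=4):
        self._buckets = [SkipList(seed=i) for i in range(bucket_count)]

    def _bucket(self, key):
        return self._buckets[zlib.crc32(key) % len(self._buckets)]

    def put(self, key, value):
        self._bucket(key).put(key, value)

    def get(self, key):
        return self._bucket(key).get(key)

    def sorted_items(self):
        """Merge the per-bucket ordered runs into one globally ordered run."""
        return list(heapq.merge(*(b.items() for b in self._buckets)))
```

Each bucket is ordered on its own, so producing a fully ordered view for a flush takes a k-way merge across the buckets, which `sorted_items` does with `heapq.merge`.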
Optionally, the database management method may further include: maintaining a queue of read-only memory tables in the internal memory; when the data in a read-only memory table has not yet been completely written to the external memory and the size of the new memory table exceeds the predetermined threshold, converting the new memory table into another read-only memory table and placing it in the read-only memory table queue.
Thus, maintaining a queue of read-only memory tables avoids blocking during high-frequency writes, when the memory table fills up again before the earlier data has been written out and merged.
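The queue behaviour can be sketched with a double-ended queue: the writer freezes a full table and continues immediately, while a separate flush step drains the oldest frozen table first. The class name, the entry-count threshold, and the in-memory `flushed` list (standing in for first-level storage files) are assumptions for the example.

```python
from collections import deque


class QueuedMemTables:
    def __init__(self, threshold=2):
        self.threshold = threshold
        self.active = {}          # memory table currently accepting writes
        self.read_only = deque()  # frozen tables waiting to be flushed, oldest first
        self.flushed = []         # stand-in for first-level storage files

    def put(self, key, value):
        self.active[key] = value
        if len(self.active) > self.threshold:
            # Freeze and enqueue without waiting for earlier tables to be flushed.
            self.read_only.append(self.active)
            self.active = {}

    def flush_one(self):
        """Write the oldest read-only table out; return False if none is pending."""
        if not self.read_only:
            return False
        table = self.read_only.popleft()
        self.flushed.append(sorted(table.items()))
        return True

    def get(self, key):
        """Look in the active table first, then in frozen tables from newest to oldest."""
        if key in self.active:
            return self.active[key]
        for table in reversed(self.read_only):
            if key in table:
                return table[key]
        return None
```

A real system would bound the queue length or apply back-pressure; the sketch leaves the queue unbounded to keep the non-blocking write path visible.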
Optionally, the data structure of a storage file may include: a file header, which records metadata information of the storage file; a data block, which stores values; and an index block, which stores the keys corresponding to the values in the form of a B+ tree, where every key, together with the logical address of its corresponding value in the data block, is recorded in a leaf node of the B+ tree, and all nodes constituting the B+ tree are stored physically contiguously.
Thus, by storing the B+ tree physically contiguously, the locality-based preloading (read-ahead) of the disk can be exploited, so that while the index block is being rebuilt, the index blocks of the files to be merged can be obtained simply by traversing consecutive disk blocks.
Optionally, the step of merging two first-level storage files may include: appending an additional data block after the first storage file and writing into it the values from the data block of the second storage file; appending a new index block after the additional data block, the new index block being generated from the index block of the first storage file and the index block of the second storage file, wherein all keys in the two index blocks, together with the logical addresses of their corresponding values in the data block of the first storage file and in the additional data block, are recorded in leaf nodes of a new B+ tree; and appending a new file header after the new index block to record the metadata information of the merged new file.
Thus, when two files are merged, one file can stay in place while the values of the other are simply appended to it, which improves write performance. Because the merged index block is a new B+ tree, the values in the merged file can be read conveniently through the new index block, so the read performance of the merged file is not affected.
Optionally, the metadata information may include one or more of the following: the number of keys in the index block; the range of keys in the index block; the height of the B+ tree; the logical address of the first leaf node in the B+ tree; and the number of internal nodes in the B+ tree.
Thus, when the target value for a request key is read, the metadata information in the file header can first be used to determine whether the request key falls within the key range of the file; the index block of the file is searched only if it does, which avoids unnecessary lookups.
Optionally, the database management method may further include: updating the file header of the first storage file according to the new file header, so that the metadata information in the file header of the first storage file is replaced with the metadata information in the new file header.
Because an append write is a destructive write, embodiments of the present invention use two file headers to protect the file against damage caused by an abnormal event during merging.
Optionally, a file may include a front file header at the head of the file and a rear file header at the tail of the file, the two having identical content; the front file header of the first storage file is updated according to the new file header and serves as the front file header of the new file, while the new file header serves as the rear file header of the new file.
Thus, when a merge completes normally, both the front and rear file headers of the new file have been updated, and either can be used to read the metadata information of the new file.
Optionally, the database management method may further include: if an error occurs in the step of writing the metadata information of the new file into the new file header, restoring the new file to the pre-merge first storage file according to the file header of the first storage file; and/or, if an error occurs in the step of updating the file header of the first storage file, updating the file header of the first storage file again according to the new file header.
Thus, if an error occurs while the new file header is being written, the file header of the first storage file has not yet been updated, so the file being merged can be restored to the pre-merge first storage file according to that header; if an error occurs while the file header of the first storage file is being updated, the update can be redone from the new file header.
Optionally, the database management method may further include: in response to a request to look up the target value corresponding to a request key, searching the memory table for a key corresponding to the request key, and reading the target value if it is found; if the request key is not found in the memory table, searching the read-only memory table for a key corresponding to the request key, and reading the target value if it is found; if the request key is not found in the read-only memory table, searching the first-level storage files on the disk one by one in chronological order for a key corresponding to the request key, and reading the target value if it is found; and, if the request key is not found in any first-level storage file, using binary search to determine whether a second-level storage file on the disk has a key corresponding to the request key, and reading the target value if it is found.
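The lookup order can be sketched as follows. Several details are assumptions made for illustration: tables and first-level files are modelled as plain dictionaries, the newest one is consulted first, and each second-level file is a `(min_key, max_key, data)` tuple in a list sorted by non-overlapping key ranges, which is what makes binary search over the list possible.

```python
import bisect


def lookup(key, memtable, read_only_tables, level1_files, level2_files):
    # 1. The writable memory table.
    if key in memtable:
        return memtable[key]
    # 2. Read-only memory tables, newest first (assumed order).
    for table in reversed(read_only_tables):
        if key in table:
            return table[key]
    # 3. First-level files, one by one, newest first (assumed order).
    for stored in reversed(level1_files):
        if key in stored:
            return stored[key]
    # 4. Second-level files: binary search over the sorted key ranges.
    lows = [low for low, _high, _data in level2_files]
    i = bisect.bisect_right(lows, key) - 1
    if i >= 0:
        _low, high, data = level2_files[i]
        if key <= high:
            return data.get(key)
    return None
```

Only step 3 is linear in the number of files; keeping the first level small by merging it into the second level is what keeps that scan cheap.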
Optionally, the database management method may further include: in response to a request to read, from a target storage file, the target value corresponding to a request key, obtaining the file header and the index block of the target storage file; determining from the file header whether the request key is within the key range indicated by the file header; if the request key is within that range, searching the index block, based on its B+ tree structure, for the leaf node corresponding to the request key; and, if the leaf node is found, reading the target value according to the logical address, recorded in that leaf node, of the corresponding value within the data block of the target storage file.
Optionally, the database management method may further include: in response to a request to restore the internal memory on restart, building a second-level storage file list ordered by the key ranges contained in the second-level storage files; building a first-level storage file list ordered by the file sequence numbers of the first-level storage files; determining, from the first-level storage file list and the second-level storage file list, how far the data in the log file has been written to first-level storage files; and, according to that write progress, writing the data in the log file that has not yet been written to a first-level storage file into the memory table in the internal memory.
According to another aspect of the embodiments of the present invention, a database system is further provided, including an internal memory and an external memory, wherein the internal memory is configured to write multiple pieces of data to a log file in the external memory; the external memory writes the data in the log file to a memory table in the internal memory, the data written to the memory table being stored in order of key; when the size of the memory table exceeds a predetermined threshold, the internal memory converts the memory table into a read-only memory table; the external memory writes subsequent data in the log file to a new memory table; the internal memory writes the data in the read-only memory table to the external memory to obtain a first-level storage file; and the external memory merges two or more first-level storage files to obtain a second-level storage file.
Optionally, the external memory assigns the main file name of a first-level storage file according to a first naming rule and the main file name of a second-level storage file according to a second naming rule, the first naming rule being different from the second naming rule, so that the main file name indicates whether a storage file is a first-level or a second-level storage file.
Optionally, the memory table consists of a hash table that includes one or more hash buckets, each hash bucket corresponding to one skip list; each piece of data in the memory table forms one element of a skip list, and the elements of a skip list are ordered by key.
Optionally, a read-only memory table queue is maintained in the internal memory; when the data in the read-only memory table has not yet been fully written to the external memory and the size of the new memory table exceeds the predetermined threshold, the external memory converts the new memory table into another read-only memory table and places it in the read-only memory table queue.
Optionally, the data structure of a storage file may include: a file header for recording metadata information of the storage file; a data block for storing values; and an index block for storing, in the form of a B+ tree, the keys corresponding to the values, wherein every key, together with the logical address of its corresponding value in the data block, is recorded in a leaf node of the B+ tree, and all nodes constituting the B+ tree are stored physically contiguously.
Optionally, the external memory merges two first-level storage files by performing the following operations: appending an additional data block after the first storage file and writing into it the values from the data block of the second storage file; appending a new index block after the additional data block, the new index block being generated from the index block of the first storage file and the index block of the second storage file, wherein all valid keys in the two index blocks, together with the logical addresses of their corresponding values in the data block of the first storage file and in the additional data block, are recorded in leaf nodes of a new B+ tree; and appending a new file header after the new index block to record the metadata information of the merged new file.
With the database management method and database system of the embodiments of the present invention, the files ultimately stored in the external memory have only a two-level hierarchy, so file redundancy is low and lookup is convenient.
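Other features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments of the invention with reference to the accompanying drawings.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the invention.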
FIG. 1 and FIG. 3 are schematic diagrams of the data structure of the files involved in the file merging scheme of the present invention.
FIG. 2 is a schematic diagram of the B+ tree structure of an index block of the present invention.
FIG. 4 is a schematic flowchart of a file merging method according to an embodiment of the present invention.
FIG. 5 and FIG. 6 are schematic diagrams of file merge states according to the present invention.
FIG. 7 is a schematic flowchart of a method of reading data from a target file.
FIG. 8 is a functional block diagram of a file merging apparatus according to an embodiment of the present invention.
FIG. 9 is a schematic structural diagram of functional modules that the reading unit may further have.
FIG. 10 is a schematic diagram of a hardware configuration of an electronic device on which embodiments of the present invention can be executed.
FIG. 11 is a schematic structural diagram of a database system according to an embodiment of the present invention.
FIG. 12 is a flowchart of data storage between the internal memory 110 and the external memory 120.
FIG. 13 is a static schematic diagram of the data storage process.
FIG. 14 is a schematic flowchart of one complete lookup.
FIG. 15 is a schematic flowchart of a lookup inside a file.
FIG. 16 is a schematic flowchart of a restart recovery process according to an embodiment of the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the invention.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the invention, its application, or its use.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but, where appropriate, should be considered part of the specification.
In all examples shown and discussed herein, any specific value should be interpreted as merely illustrative and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item has been defined in one drawing, it need not be discussed further in subsequent drawings.
<Hardware Configuration>
FIG. 10 is a block diagram showing a hardware configuration of an electronic device 1000 in which embodiments of the present invention can be implemented.
The electronic device 1000 may be a portable computer, a desktop computer, a mobile phone, a tablet computer, or the like. As shown in FIG. 10, the electronic device 1000 may include a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, a display device 1500, an input device 1600, a speaker 1700, a microphone 1800, and so on. The processor 1100 may be a central processing unit (CPU), a microprocessor (MCU), or the like. The memory 1200 includes, for example, ROM (read-only memory), RAM (random-access memory), and non-volatile memory such as a hard disk. The interface device 1300 includes, for example, a USB interface and a headphone jack. The communication device 1400 is capable of, for example, wired or wireless communication, which may specifically include Wi-Fi, Bluetooth, and 2G/3G/4G/5G communication. The display device 1500 is, for example, a liquid crystal display or a touch display. The input device 1600 may include, for example, a touch screen, a keyboard, and motion-sensing input. The user can output and input voice information through the speaker 1700 and the microphone 1800, respectively.
The electronic device shown in FIG. 10 is merely illustrative and is in no way intended to limit the invention, its application, or its use. In embodiments of the present invention, the memory 1200 of the electronic device 1000 stores instructions that control the processor 1100 to perform any of the file merging methods or database management methods provided by the embodiments. Those skilled in the art will understand that, although FIG. 10 shows multiple devices for the electronic device 1000, the present invention may involve only some of them; for example, the electronic device 1000 may involve only the processor 1100 and the memory 1200. A skilled person can design the instructions according to the scheme disclosed herein. How instructions control a processor is well known in the art and is not described in detail here.
<First Embodiment>
This embodiment proposes a scheme for merging files stored in an external memory such as a hard disk, a floppy disk, an optical disc, or a USB flash drive. In the file merging method and apparatus of this embodiment, the keys and values of a file are stored separately, and the keys are stored in the form of a B+ tree. When two files are merged, one file can therefore stay in place while the values of the other are appended directly to it, which improves write performance. An index block that stores the keys as a B+ tree is then rebuilt; the values in the merged file can be read conveniently through the new index block, so the read performance of the merged file is not affected. Before the file merging scheme is described in detail, the data structure of the files it operates on is explained first.
FIG. 1 is a schematic diagram of the data structure of a file in the file merging scheme of this embodiment. As shown in FIG. 1, a file in this embodiment can be physically divided, block by block, into a file header, a data block, and an index block, and each block may consist of multiple pages. A page here is the smallest unit of a single I/O operation and is generally an integer multiple of the system page size; pages in different types of block may differ in size.
The data block stores the values. The index block stores, in the form of a B+ tree, the keys corresponding to the values. As shown in FIG. 2, the B+ tree consists of leaf nodes, internal nodes, and a root node; the form of a B+ tree is well known to those skilled in the art and is not repeated here. Note that each leaf node in the B+ tree corresponds to one key, and every key, together with the logical address of its corresponding value in the data block, is recorded in a leaf node of the B+ tree. In other words, a leaf node stores only the key and not the value; in place of the value it can store the offset of the page that holds the value within the data block and the offset of the value within that page.
Optionally, all nodes constituting the B+ tree (root node, internal nodes, and leaf nodes) are stored physically contiguously. The locality-based preloading of the disk can then be used to fetch all nodes of the B+ tree quickly, which makes building the new B+ tree during a merge more efficient (the merge process is described in detail below).
The file header records the metadata information of the file. The metadata information may include the number of keys in the index block, the range of keys in the index block, the height of the B+ tree, the logical address of the first leaf node in the B+ tree, the number of internal nodes in the B+ tree, and so on.
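The header fields and leaf contents described above could be modelled as follows. This is a sketch under stated assumptions: the class and field names, the fixed `PAGE_SIZE`, and the interpretation of the page offset as a page index within the data block are illustrative and not specified in the text.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # illustrative page size, not from the specification


@dataclass(frozen=True)
class LeafEntry:
    key: bytes
    page_offset: int     # index of the data-block page that holds the value
    offset_in_page: int  # where the value starts inside that page


@dataclass(frozen=True)
class FileHeader:
    key_count: int
    min_key: bytes
    max_key: bytes
    tree_height: int
    first_leaf_address: int
    internal_node_count: int

    def may_contain(self, key: bytes) -> bool:
        """Cheap range check that can rule a file out before any index search."""
        return self.min_key <= key <= self.max_key


def value_address(entry: LeafEntry, data_block_start: int,
                  page_size: int = PAGE_SIZE) -> int:
    """Translate a leaf entry into a byte position in the file."""
    return data_block_start + entry.page_offset * page_size + entry.offset_in_page
```

Keeping values out of the leaves is what lets a merge rewrite only the index while the existing values stay where they are.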
The data structure of a file in the file merging scheme of this embodiment has now been outlined with reference to FIG. 1. The structure shown in FIG. 1 is only an example and may take many variant forms. For example, as shown in FIG. 3, the file header may comprise a front file header and a rear file header that record the same metadata information. As another example, a file in this embodiment may also include a filter used to determine whether an accessed key is in the file. The filter may be a Bloom filter: when a key that does not exist is accessed, the Bloom filter can quickly establish that the key is absent, so the B+ tree need not be queried. A Bloom filter is essentially a hash-based structure that tests for the presence of a key in O(1) time, whereas a B+ tree lookup takes O(log n) time, so adding a Bloom filter improves lookup efficiency and therefore read performance.
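A minimal Bloom filter sketch is given here for illustration. The bit-array size, the number of hash functions, and the use of salted SHA-256 digests are assumptions; the specification does not say how the filter is built. A negative answer is definite, while a positive answer only means the key may be present and the B+ tree must still be consulted.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        # False: the key was never added. True: the key may have been added.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))
```

The check costs a fixed number of hash computations regardless of how many keys the file holds, which is the O(1) behaviour the text relies on.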
The file merging scheme of this embodiment is described in detail below with reference to FIG. 4 to FIG. 9. FIG. 4 is a schematic flowchart of a file merging method according to an embodiment of the present invention, including steps S210 to S230. The method can merge two or more files; for ease of description, merging a first file and a second file is used as the example here.
Referring to FIG. 4, in step S210, an additional data block is appended after the first file, and the values from the data block of the second file are written into it.
The second file may be fresher than the first file; that is, the second file may have been stored in the external memory later, and the first file earlier.
Because the values and keys of a file in this embodiment are stored separately, merging the first file and the second file can be done by appending the values from the data block of the second file after the first file; the block appended after the first file to hold these values is referred to here as the additional data block. In other words, the values from the data block of the second file are rewritten into the additional data block after the first file, so that the end of file F (the first file) and the additional data block are physically contiguous.
Once the values from the data block of the second file have been appended after the first file, new index information can be built. This is step S220: a new index block is appended after the additional data block.
The new index block is generated from the index block of the first file and the index block of the second file. As noted above, the second file may be fresher than the first file, so a key-value pair in the second file may be a modification, deletion, or replacement of a key-value pair in the first file. For a key that appears in the index blocks of both files, the key in the fresher second file is therefore taken as the valid key and the key in the first file is discarded, and the new index block is built on that basis.
That is, every key in the generated new index block is a valid key, and its corresponding value is a valid value. The keys in the new index block are also stored in the form of a B+ tree; because this tree is regenerated from the index block of the first file and the index block of the second file, it can be called the new B+ tree. All valid keys in the index blocks of the first file and the second file, together with the logical addresses of their corresponding values in the data block of the first file and in the additional data block, are recorded in leaf nodes of the new B+ tree.
As described above, all nodes of the B+ trees in the index blocks of the first file and the second file are stored physically contiguously. While the new B+ tree is being rebuilt, the locality-based preloading of the disk can therefore be used to obtain both index blocks by simply traversing consecutive disk blocks, which makes constructing the new B+ tree more efficient.
After the new B+ tree has been constructed to generate the new index block, the index block in the first file is invalidated and replaced by the new index block. Invalidated here means that subsequent lookups use the new index block and no longer use the old one; the old index block need not be deleted after the new index block is generated.
In step S230, a new file header is appended after the new index block to record the metadata information of the merged new file.
The metadata information of the new file may include the number of keys in the new index block, the range of keys in the new index block, the height of the new B+ tree, the logical address of the first leaf node in the new B+ tree, the number of internal nodes in the new B+ tree, and so on. After the new file header has been generated, the second file can be deleted to free storage space.
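Steps S210 to S230 can be sketched on an in-memory model. The model is an assumption made for illustration: a file is a dictionary with a `data` list of values, an `index` list of `(key, address)` pairs sorted by key (standing in for the leaf level of the B+ tree), and a `header` dictionary. Deletion markers are not modelled.

```python
def merge(first, second):
    """Merge `second` (the fresher file) into `first`, append-only."""
    data = first["data"]          # existing values stay where they are
    base = len(data)
    data.extend(second["data"])   # S210: append the second file's values

    # S220: rebuild the index; when a key is in both files the fresher one wins.
    merged = {key: addr for key, addr in first["index"]}
    for key, addr in second["index"]:
        merged[key] = base + addr  # address relocated into the appended region
    index = sorted(merged.items())

    # S230: write a new header describing the merged file.
    header = {
        "key_count": len(index),
        "min_key": index[0][0] if index else None,
        "max_key": index[-1][0] if index else None,
    }
    return {"data": data, "index": index, "header": header}
```

In this sketch a value superseded by the second file stays in `data` but is no longer referenced by the new index, mirroring how the old index block is left in place rather than deleted.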
图5是示出了根据本发明实施例的将G文件合并到F文件的合并过程的示意图。FIG. 5 is a schematic diagram showing a merge process of merging G files into F files according to an embodiment of the present invention.
根据图5以及上文结合图3的描述可知,在合并过程中,F文件不变,仅需把G文件中的值追加写入F文件,并生成新的索引块和新文件头即可。与现有LevelDB中合并时需要一一取出键值对重新构造相比,合并过程较为简单,并且根据合并后的B+树可以方便地查找文件中的键所对应的值,读性能也得到了提升。According to FIG. 5 and the description above with reference to FIG. 3, in the merge process, the F file is unchanged, and only the value in the G file needs to be additionally written into the F file, and a new index block and a new file header are generated. Compared with the existing LevelDB, it is necessary to take out the key value and reconstruct the comparison. The merge process is simpler, and according to the merged B+ tree, the value corresponding to the key in the file can be conveniently found, and the read performance is improved. .
图6是示出了根据本发明实施例的将G文件合并到F文件的合并过程的另一例子的示意图。FIG. 6 is a schematic diagram showing another example of a merge process of merging G files into F files according to an embodiment of the present invention.
与图5不同的是,图6中的F文件和G文件都包括位于文件头部的前文件头和位于文件尾部的后文件头。其中,前文件头和后文件头的内容相 同。Different from FIG. 5, both the F file and the G file in FIG. 6 include a front file header located at the head of the file and a rear file header located at the end of the file. The contents of the front file header and the last file header are the same.
与上文述及的合并过程不同的是,在追加写入新文件头后,还可以根据新文件头更新F文件的前文件头,作为新文件的前文件头,而以新文件头作为新文件的后文件头。Different from the above-mentioned merge process, after the new file header is additionally written, the previous file header of the F file can be updated according to the new file header as the front file header of the new file, and the new file header is used as the new file header. The file header of the file.
由此在文件合并过程中,可以维护两个文件头。这是因为,合并过程中的追加写入是一种“破坏性写入”,即在将G文件合并到F文件时,会破坏F文件。其中,这里述及的破坏性写入是指将G文件合并到F文件,合并后的新文件的新文件头记录的是合并后的新文件的元数据信息,合并前的F文件的文件头被无效,因此如果没有采用防护措施,一旦合并过程失败,F文件将无法被修复。因此本发明采用维护双文件头的方式,可以解决因异常情况导致文件被破坏而无法恢复的问题。This allows you to maintain two file headers during the file merge process. This is because the append write during the merge process is a kind of "destructive write", that is, when the G file is merged into the F file, the F file is destroyed. Among them, the destructive writing mentioned here refers to the merge of the G file into the F file, and the new file header of the merged new file records the metadata information of the merged new file, and the file header of the F file before the merge. It is invalid, so if no protection measures are taken, the F file will not be repaired once the merge process fails. Therefore, the present invention adopts a method of maintaining a double file header, and can solve the problem that the file cannot be recovered due to an abnormal situation.
Specifically, when the merge completes normally, both the front and rear file headers of the new file are updated and are identical. If recovery is later needed after an abnormal event, either file header may be taken as authoritative.
If an exception occurs before the new file header at the tail has been completely written, the file header at the head has not yet been updated and is still intact, although it is the old one. Using this file header, the residual data left by the unfinished merge can be truncated, yielding a complete file of the old version.
If an exception occurs while the front file header is being updated, the new file header is already complete, so recovery only needs to take the new file header as authoritative. That is, the front file header can be updated again from the new file header, which ensures that both file headers are intact and consistent in the initial state.
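The two recovery cases above can be illustrated with a minimal sketch. It is not the patent's on-disk format: the 12-byte header layout (a total file length plus a CRC32 of that length), the helper names `pack_header`, `read_header`, and `recover`, and the use of an in-memory `bytearray` as the file image are all assumptions made for illustration.

```python
import struct
import zlib

# Hypothetical header layout: 8-byte total file length + 4-byte CRC32 of that length.
HEADER = struct.Struct("<QI")


def pack_header(length: int) -> bytes:
    body = struct.pack("<Q", length)
    return body + struct.pack("<I", zlib.crc32(body))


def read_header(raw: bytes):
    """Return the recorded file length, or None if the header is torn or corrupt."""
    if len(raw) != HEADER.size:
        return None
    length, crc = HEADER.unpack(raw)
    if zlib.crc32(raw[:8]) != crc:
        return None
    return length


def recover(image: bytearray) -> bytearray:
    """Bring a file image back to a state in which both headers agree."""
    front = read_header(bytes(image[:HEADER.size]))
    rear = read_header(bytes(image[-HEADER.size:]))
    if rear is not None and rear == len(image):
        # The appended rear header is complete: it is authoritative,
        # so the front header is simply written again from it.
        image[:HEADER.size] = pack_header(rear)
        return image
    if front is not None and front <= len(image):
        # The merge stopped before the rear header was complete:
        # cut off the residue and fall back to the old version of the file.
        del image[front:]
        return image
    raise ValueError("neither header is usable")
```

In this sketch the old rear header simply stays inside the merged file, so truncating to the length recorded in the old front header leaves a file whose tail is again a valid header.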
FIG. 7 is a schematic flowchart showing a method of reading the target value corresponding to a request key from a file.
Referring to FIG. 7, in step S310, the file header and the index block of the target file are obtained.
In step S320, it is determined from the file header whether the request key falls within the key range indicated by the file header. If it does not, the target file contains no value corresponding to the request key, and the read ends.
If the request key is within the range, step S330 is performed: based on the B+ tree structure of the index block, the leaf node corresponding to the request key is searched for in the index block. If no such leaf node is found, the target file contains no value corresponding to the request key, and the read ends. If the leaf node is found, step S340 may be performed: the target value is read according to the logical address, stored in the found leaf node, of the key's value in the data block of the target file.
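Steps S320 to S340 can be sketched as follows. This is a simplified model rather than the patent's implementation: the `StoredFile` class and its fields are hypothetical, a sorted key list searched with `bisect` stands in for the descent through the B+ tree, and a logical address is assumed to be an (offset, length) pair into the data block.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class StoredFile:
    min_key: bytes      # from the file header: smallest key in the index block
    max_key: bytes      # from the file header: largest key in the index block
    leaf_keys: list     # sorted keys, standing in for the B+ tree leaf level
    leaf_addrs: list    # (offset, length) of each key's value in the data block
    data: bytes         # data block, holding values only


def read_value(f: StoredFile, key: bytes):
    # S320: reject keys outside the range recorded in the file header.
    if not (f.min_key <= key <= f.max_key):
        return None
    # S330: locate the leaf entry for the key (binary search replaces tree descent).
    i = bisect_left(f.leaf_keys, key)
    if i == len(f.leaf_keys) or f.leaf_keys[i] != key:
        return None
    # S340: follow the stored logical address into the data block.
    offset, length = f.leaf_addrs[i]
    return f.data[offset:offset + length]
```

The range check costs only two comparisons, so a request for a key outside the file's range never touches the index block at all.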
FIG. 8 is a functional block diagram of a file merging apparatus according to an embodiment of the present invention. The functional modules of the file merging apparatus 500 may be implemented by hardware, software, or a combination of hardware and software that carries out the principles of the present invention. Those skilled in the art will understand that the functional modules described in FIG. 8 may be combined or divided into sub-modules to implement the principles of the invention described above. Accordingly, the description herein supports any possible combination, division, or further definition of the functional modules described herein.
The file merging apparatus 500 shown in FIG. 8 may be used to implement the file merging method shown in FIGS. 3 to 6. Only the functional modules that the file merging apparatus 500 may have, and the operations each module may perform, are briefly described below. For the details involved, see the description above in conjunction with FIGS. 3 to 6; they are not repeated here.
As shown in FIG. 8, the file merging apparatus 500 includes a first writing unit 510, a B-tree generating unit 520, a second writing unit 530, and a third writing unit 540.
The first writing unit 510 is configured to append an additional data block after the first file and to write into it the values in the data block of the second file.
The B-tree generating unit 520 is configured to generate a new B+ tree based on the index block of the first file and the index block of the second file. All keys in the two index blocks, together with the logical address of each key's value in the data block of the first file or in the additional data block, are recorded in the leaf nodes of the new B+ tree.
The second writing unit 530 is configured to append a new index block after the additional data block and to write the new B+ tree into it.
The third writing unit 540 is configured to append a new file header after the new index block to record the metadata of the merged new file.
As shown in FIG. 8, the file merging apparatus 500 may optionally further include an updating unit 550. The updating unit 550 can update the file header of the first file according to the new file header, replacing the metadata in the file header of the first file with the metadata in the new file header.
Specifically, a file may include a front file header at the head of the file and a rear file header at the tail of the file, the two having the same content. The updating unit 550 may update the front file header of the first file according to the new file header so that it serves as the front file header of the new file, while the new file header serves as the rear file header of the new file.
As shown in FIG. 8, the file merging apparatus 500 may optionally further include a first restoring unit 560 and a second restoring unit 570.
If the step of writing the metadata of the new file into the new file header fails, the first restoring unit 560 may restore the new file to the pre-merge first file according to the file header of the first file.
If the step of updating the file header of the first file fails, the second restoring unit 570 may update the file header of the first file again according to the new file header.
As shown in FIG. 8, the file merging apparatus 500 may further include a reading unit 580, which can read from a target file the target value corresponding to a request key. FIG. 9 is a functional block diagram showing the functional modules that the reading unit may have.
As shown in FIG. 9, the reading unit 580 may include an obtaining module 581, a determining module 583, a lookup module 585, and a value reading module 587.
The obtaining module 581 may obtain the file header and the index block of the target file, and the determining module 583 may determine from the file header whether the request key falls within the key range indicated by the file header. If the request key is within the range, the lookup module 585 may search the index block, based on its B+ tree structure, for the leaf node corresponding to the request key. The value reading module 587 may read the target value according to the logical address, stored in the found leaf node, of the key's value in the data block of the target file.
This embodiment further provides an electronic device including a memory and a processor. The memory stores executable instructions, and the processor, under the control of those instructions, operates the electronic device to perform any of the file merging methods provided in this embodiment. In one example, the electronic device may be the electronic device 1000 shown in FIG. 10.
The file merging method and apparatus according to this embodiment have been described in detail above with reference to the accompanying drawings. According to this embodiment, the keys and values of a file can be stored separately, with the keys stored in the form of a B+ tree. When two files are merged, one file can therefore be left in place while the values of the other file are simply appended to it, which improves write performance. In addition, the merged index block is a new B+ tree through which the values in the merged file can be conveniently read, so the read performance of the merged file is not degraded.
<Second embodiment>
In the prior art, LevelDB has shown many shortcomings in reading, writing, merging, data cleanup, restart recovery, and other aspects. To address this, the present embodiment proposes a new database management method and database system.
FIG. 11 is a schematic structural diagram of a database system according to an embodiment of the present invention. As shown in FIG. 11, the database system 100 of the present invention mainly includes an internal memory 110 and an external memory 120, which cooperate with each other to complete data storage.
FIG. 12 is a flowchart showing how the internal memory 110 and the external memory 120 cooperate to implement data storage.
Referring to FIG. 12, first in step S110, multiple pieces of data to be stored may be written by the internal memory 110 to a log file in the external memory 120. Each piece of data includes a corresponding key and value, and the data may be written to the log file in order of arrival.
Step S120 may then be performed, in which the external memory 120 writes the data in the log file to a memory table in the internal memory 110. Data written to the memory table may be stored in key order. For example, the memory table may use a skip list structure so that the stored data is ordered by key. As a further example, the memory table may consist of a hash table that includes one or more hash buckets, each hash bucket corresponding to one skip list. Each piece of data in the memory table forms one element of a skip list, and the elements of each skip list are ordered by key.
A hash table is thus placed in front of the skip lists. On the one hand, this reduces lock granularity: concurrent reads and writes on different keys can each perform fast lookups and insertions in the skip list of their own hash bucket. On the other hand, the memory table can grow without enlarging any individual skip list, which lowers the probability that a skip list degenerates into a linear search as the data volume grows and improves overall lookup efficiency.
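The bucketed memory table described above can be sketched as follows. This is a simplified illustration, not the patent's code: the class names `Bucket` and `MemTable` are hypothetical, a sorted list searched with `bisect` stands in for each skip list, and CRC32 is an arbitrary choice of bucket hash.

```python
import threading
import zlib
from bisect import bisect_left


class Bucket:
    """One hash bucket: its own lock plus a key-ordered structure (skip-list stand-in)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.keys = []   # kept sorted by key
        self.vals = []   # vals[i] is the latest value for keys[i]


class MemTable:
    def __init__(self, num_buckets: int = 8):
        self.buckets = [Bucket() for _ in range(num_buckets)]

    def _bucket(self, key: bytes) -> Bucket:
        return self.buckets[zlib.crc32(key) % len(self.buckets)]

    def put(self, key: bytes, value: bytes) -> None:
        b = self._bucket(key)
        with b.lock:  # only this bucket is locked, not the whole table
            i = bisect_left(b.keys, key)
            if i < len(b.keys) and b.keys[i] == key:
                b.vals[i] = value
            else:
                b.keys.insert(i, key)
                b.vals.insert(i, value)

    def get(self, key: bytes):
        b = self._bucket(key)
        with b.lock:
            i = bisect_left(b.keys, key)
            if i < len(b.keys) and b.keys[i] == key:
                return b.vals[i]
            return None

    def sorted_items(self):
        """Collect all buckets into one key-ordered list, e.g. before flushing."""
        items = []
        for b in self.buckets:
            with b.lock:
                items.extend(zip(b.keys, b.vals))
        return sorted(items)
```

Because each bucket holds only a fraction of the keys, the ordered structure inside it stays small even as the whole table grows, which is the effect the paragraph above attributes to placing a hash table in front of the skip lists.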
As data written to the memory table accumulates and the size of the memory table exceeds a predetermined threshold, the internal memory 110 may convert the memory table into a read-only memory table (step S130). Data in the log file that has not yet been written to the internal memory 110 may then be written to a new memory table. As the name implies, a read-only memory table can only be read, not written.
It should be noted that the log files in the external memory 120 and the memory tables in the internal memory 110 may be in one-to-one correspondence. That is, a piece of key-value data may first be written to a log file and then written from the log file to a memory table. When the size of the memory table exceeds the predetermined threshold and the table must be converted into a read-only memory table, newly arriving data may be written to a new log file, and the data in the new log file may be written to a new memory table.
After the memory table has been converted into a read-only memory table, step S140 may be performed, in which the internal memory 110 writes the data in the read-only memory table to the external memory 120 to obtain a first-level storage file. The external memory 120 may perform step S150, merging two or more first-level storage files stored in it to obtain a second-level storage file.
In addition, the external memory 120 may assign the main file name of a first-level storage file according to a first naming rule and that of a second-level storage file according to a second naming rule. The two rules may be set differently so that the main file name distinguishes a first-level storage file from a second-level storage file. For example, "_0" may be appended to the main file name of a first-level storage file and "_1" to that of a second-level storage file, giving names such as xxx_0.hdb and xxx_1.hdb respectively.
So far, the storage flow in which the external memory and the internal memory of the database system cooperate to persist data to the external memory has been briefly described with reference to FIG. 12. FIG. 13 is a static schematic diagram of the data storage process.
As shown in FIG. 13, a read-only memory table queue may be maintained in the internal memory. When the data in a read-only memory table has not yet been completely written to the external memory and the size of the new memory table exceeds the predetermined threshold, the new memory table is converted into another read-only memory table and placed in the queue. Maintaining this queue avoids blocking under high-frequency writes, where the memory table fills up before earlier data has been merged.
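The queue of read-only memory tables can be sketched as below. This is an illustrative model only: the `WriteBuffer` class, its entry-count threshold, and the use of plain dictionaries as memory tables are assumptions, not details taken from the patent.

```python
from collections import deque


class WriteBuffer:
    def __init__(self, threshold: int):
        self.threshold = threshold   # hypothetical size limit, counted in entries
        self.active = {}             # the writable memory table
        self.read_only = deque()     # frozen tables waiting to be written out

    def put(self, key, value):
        self.active[key] = value
        if len(self.active) > self.threshold:
            # Freeze the full table and start a new one, so that writers
            # never have to wait for an earlier table to reach external memory.
            self.read_only.append(self.active)
            self.active = {}

    def flush_one(self):
        """Pop the oldest read-only table and return its entries in key order."""
        if not self.read_only:
            return None
        table = self.read_only.popleft()
        return sorted(table.items())
```

A background task could call `flush_one` repeatedly to turn each frozen table into a first-level storage file while `put` continues to accept new writes.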
So far, the structure of the database system of this embodiment and the flow by which the database system persists data to the external memory have been described with reference to FIGS. 11 to 13. The following describes, in turn, the merging of storage files persisted in the external memory, the data lookup process, and the data recovery process when the database system is restarted under special circumstances.
I. Storage file merging process
Before the file merging process of the present invention is described in detail, the data structure of the storage files persisted in the external memory is first described.
FIG. 1 shows a schematic diagram of the data structure of a storage file stored in the external memory. As shown in FIG. 1, a file described in the present invention may be physically divided by block into a file header, a data block, and an index block, and each block may consist of multiple pages. A page, as used herein, is the smallest unit of a single I/O operation and is generally an integer multiple of the system page size. Pages of different block types may differ in size.
The data block stores values. The index block stores the keys corresponding to the values in the form of a B+ tree. The B+ tree form is well known to those skilled in the art and is not described here. It should be noted that each leaf node in the B+ tree corresponds to one key, and all keys, together with the logical addresses of their values in the data block, are recorded in the leaf nodes of the B+ tree. That is, the leaf nodes of the B+ tree store only keys, not values. In place of a value, a leaf node may store the offset of the page containing the value within the data block and the offset of the value within that page.
Optionally, all nodes of the B+ tree (root node, internal nodes, and leaf nodes) are stored physically contiguously. The locality-based prefetching of the disk can then be exploited to fetch all nodes of the B+ tree quickly, which improves the efficiency of building the new B+ tree during merging (the merge process is described in detail below).
The file header records the metadata of the file. The metadata may include the number of keys in the index block, the range of keys in the index block, the height of the B+ tree, the logical address of the first leaf node in the B+ tree, the number of internal nodes in the B+ tree, and so on.
So far, the data structure of a storage file stored in the external memory has been briefly described with reference to FIG. 1. The data structure shown in FIG. 1 is only an example and may take various modified forms. For example, as shown in FIG. 3, the file header of a storage file may include a front file header and a rear file header, which may record the same metadata. As another example, a storage file may further include a filter used to determine whether an accessed key is in the file. The filter may be, for example, a Bloom filter: when a nonexistent key is accessed, the Bloom filter can quickly determine that the key does not exist, with no need to query the B+ tree. Because a Bloom filter is in essence a hash-based structure, it can test for the presence of a key in O(1) time, whereas a B+ tree lookup takes O(log n) time. Providing a Bloom filter therefore improves lookup efficiency, that is, read performance.
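A minimal Bloom filter sketch shows why the membership test costs a constant number of hash evaluations. The bit-array size, the number of hash functions, and the use of salted SHA-256 digests are illustrative assumptions. The text does not say how the filter in the storage file is built.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True only means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))
```

Note that a Bloom filter can report a key as possibly present when it is not, so a positive answer must still be confirmed in the B+ tree. Only a negative answer allows the tree lookup to be skipped.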
The merging of storage files is described in detail below with reference to FIGS. 4 to 6. FIG. 4 is a schematic flowchart of a storage file merging method according to an embodiment of the present invention. The method can merge two or more storage files: two or more first-level storage files may be merged into one second-level storage file, or two or more second-level storage files may be merged into a new second-level storage file. For ease of description, the merging of a first storage file and a second storage file is taken as an example.
Referring to FIG. 4, in step S210, an additional data block is appended after the first storage file, and the values in the data block of the second storage file are written into it.
Here the second storage file may be fresher than the first storage file. That is, the second storage file may have been stored in the external memory later, and the first storage file earlier.
Because the values and keys in a storage file are stored separately, when the first storage file and the second storage file are merged, the values in the data block of the second storage file can be appended after the first storage file. The block appended after the first storage file to hold these values is called the additional data block. In other words, the values in the data block of the second storage file are rewritten into the additional data block following the first storage file, so that the end of the first storage file (file F) and the additional data block are physically contiguous in address.
After the values in the data block of the second storage file have been appended after the first storage file, new index information can be built. That is, in step S220, a new index block is appended after the additional data block.
The new index block is generated from the index block of the first storage file and the index block of the second storage file. As described above, the second storage file may be fresher than the first storage file, so a key-value pair in the second storage file may modify, delete, or replace a key-value pair in the first storage file. Therefore, for a key that appears in the index blocks of both files, the key in the fresher second storage file is taken as the valid key and the key in the first storage file is discarded when the new index block is built.
In other words, all keys in the generated new index block are valid keys, and their corresponding values are valid values. The keys in the new index block are likewise stored as a B+ tree, which is regenerated from the index blocks of the two storage files and may therefore be called the new B+ tree. All valid keys in the two index blocks, together with the logical addresses of their values in the data block of the first storage file or in the additional data block, are recorded in the leaf nodes of the new B+ tree.
As described above, all nodes of the B+ trees in the index blocks of the first and second storage files are stored physically contiguously. When the new B+ tree is rebuilt, the locality-based prefetching of the disk can therefore be exploited: the index blocks of both storage files can be obtained simply by traversing consecutive disk blocks, which improves the efficiency of constructing the new B+ tree.
After the new B+ tree has been constructed to generate the new index block, the index block in the first storage file is invalidated and replaced by the new index block. Invalidated here means that subsequent lookups use the new index block rather than the old one. That is, the old index block need not be deleted after the new index block is generated.
In step S230, a new file header is appended after the new index block to record the metadata of the merged new file.
The metadata of the new file may include the number of keys in the new index block, the range of keys in the new index block, the height of the new B+ tree, the logical address of the first leaf node in the new B+ tree, the number of internal nodes in the new B+ tree, and so on. After the new file header has been generated, the second storage file may be deleted to free storage space.
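Steps S210 to S230 can be sketched with an in-memory model. This is a simplification under stated assumptions: the `StorageFile` class and `merge_into` function are hypothetical names, a dictionary from key to (offset, length) stands in for the B+ tree leaf entries, the header carries only three metadata fields, and deletion markers are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class StorageFile:
    data: bytearray = field(default_factory=bytearray)  # values only
    index: dict = field(default_factory=dict)           # key -> (offset, length); B+ tree stand-in
    header: dict = field(default_factory=dict)          # metadata of the file

    def put(self, key: bytes, value: bytes) -> None:
        self.index[key] = (len(self.data), len(value))
        self.data += value
        self._write_header()

    def get(self, key: bytes):
        if key not in self.index:
            return None
        off, n = self.index[key]
        return bytes(self.data[off:off + n])

    def _write_header(self) -> None:
        keys = sorted(self.index)
        self.header = {"key_count": len(keys), "min_key": keys[0], "max_key": keys[-1]}


def merge_into(f: StorageFile, g: StorageFile) -> None:
    """Merge the fresher file g into f without rewriting f's existing values."""
    base = len(f.data)
    f.data += g.data                        # S210: append g's values after f
    new_index = dict(f.index)               # S220: build the new index ...
    for key, (off, n) in g.index.items():
        new_index[key] = (base + off, n)    # ... in which g wins on duplicate keys
    f.index = new_index
    f._write_header()                       # S230: record the new metadata
```

The existing bytes of `f.data` are never touched. Only addresses are recomputed, by shifting each of g's offsets by the length f had before the append.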
FIG. 5 is a schematic diagram showing the process of merging a G file into an F file.
According to FIG. 5 and the description above in conjunction with FIG. 4, the F file remains unchanged during the merge. It is only necessary to append the values in the G file to the F file and to generate a new index block and a new file header. Compared with merging in the existing LevelDB, which requires taking out the key-value pairs one by one and reconstructing them, this merge process is simpler. Moreover, the value corresponding to a key in the file can be conveniently looked up through the merged B+ tree, so read performance is also improved.
FIG. 6 is another schematic diagram showing the process of merging a G file into an F file.
Unlike FIG. 5, both the F file and the G file in FIG. 6 include a front file header located at the head of the file and a rear file header located at the tail of the file. The front file header and the rear file header have the same content.
Unlike the merge process described above, after the new file header has been appended, the front file header of the F file may additionally be updated according to the new file header so that it serves as the front file header of the new file, while the new file header serves as the rear file header of the new file.
Two file headers can thus be maintained during file merging. The reason is that the append write performed during merging is also a "destructive write": merging the G file into the F file damages the F file. Here, destructive write means that when the G file is merged into the F file, the new file header of the merged file records the metadata of the merged new file, and the file header of the pre-merge F file is invalidated. Without protective measures, the F file therefore cannot be repaired once the merge fails. Maintaining dual file headers solves the problem of a file being damaged by an abnormal event and becoming unrecoverable.
Specifically, when the merge completes normally, both the front and rear file headers of the new file are updated and are identical. If recovery is later needed after an abnormal event, either file header may be taken as authoritative.
If an exception occurs before the new file header at the tail has been completely written, the file header at the head has not yet been updated and is still intact, although it is the old one. Using this file header, the residual data left by the unfinished merge can be truncated, yielding a complete file of the old version.
If an exception occurs while the front file header is being updated, the new file header is already complete, so recovery only needs to take the new file header as authoritative. That is, the front file header can be updated again from the new file header, which ensures that both file headers are intact and consistent in the initial state.
II. Data lookup process
As the above description of the data storage process shows, when data is persisted to storage files in the external memory, it is first written to the memory table, then held in a read-only memory table, and then written to a first-level storage file in the external memory, and first-level storage files are merged into second-level storage files. Data freshness therefore decreases in the order: memory table, read-only memory table, first-level storage file, second-level storage file.
Accordingly, when data is read, the memory table is read first. If the data is not found there, the read-only memory tables are read. If it is not found there either, the first-level storage files in the external memory are searched, and if it is not found in the first-level storage files, the second-level storage files are searched.
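The read order above amounts to probing the layers from freshest to oldest and returning the first hit. The following sketch models every layer as a plain dictionary, which is an assumption made for brevity. The function name `lookup` and the newest-first ordering of the list arguments are likewise illustrative.

```python
def lookup(key, mem_table, read_only_tables, level1_files, level2_files):
    """Search the layers from freshest to oldest and return the first hit."""
    layers = [mem_table]
    layers += read_only_tables   # assumed to be ordered newest first
    layers += level1_files       # assumed to be ordered newest first
    layers += level2_files
    for layer in layers:
        if key in layer:
            return layer[key]
    return None
```

Because a fresher layer is always consulted before an older one, an updated value shadows any stale copy of the same key that still exists in a lower layer.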
FIG. 14 is a schematic flowchart of a complete lookup. Referring to FIG. 14, step S410 may be performed first to look up, in the memory table, whether there is a key corresponding to the request key. For example, when the data in the memory table is stored as a hash table combined with skip lists, the request key is first used to locate the specific hash bucket in the memory table, and the corresponding skip list is then searched.
If the key is found in the memory table, the value is read directly. If it is not found in the memory table, the lookup continues in the read-only memory table in the internal memory to determine whether it has a key corresponding to the request key (step S420). When the internal memory maintains a read-only memory table queue containing multiple read-only memory tables, the read-only memory tables in the queue can be searched one by one in chronological order.
If the key is not found in the read-only memory tables, the first-level storage files in the external memory are searched. Here, the first-level storage files in the external memory can be searched one by one in chronological order for a key corresponding to the request key (step S430).
If a first-level storage file is found to have a key corresponding to the request key, the value corresponding to the request key is read from that first-level storage file. If no such file is found, the second-level storage files in the external memory are searched. Here, binary search can be used to determine whether a second-level storage file has a key corresponding to the request key (step S440).
If a second-level storage file is found to have a key corresponding to the request key, the value corresponding to the request key is read from that second-level storage file. If none is found, the database system does not store the request key or a corresponding value.
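Purely as an illustrative sketch, steps S410 to S440 might be arranged as below. The class `StoreFile`, the function `lookup`, and the use of plain dictionaries for the memory tables are hypothetical stand-ins. The sketch further assumes that the read-only memory tables and first-level files are passed newest first, and that the second-level files are sorted by key range with no overlap between ranges, which is what makes the binary search in the last step possible.

```python
from bisect import bisect_right


class StoreFile:
    """Hypothetical stand-in for a storage file: a key range plus its entries."""

    def __init__(self, items):
        self.items = dict(items)
        self.min_key = min(self.items)
        self.max_key = max(self.items)

    def get(self, key):
        return self.items.get(key)


def lookup(key, memtable, immutables, level1, level2):
    # S410: the active memory table holds the freshest data.
    if key in memtable:
        return memtable[key]
    # S420: read-only memory tables, assumed to be ordered newest first.
    for table in immutables:
        if key in table:
            return table[key]
    # S430: first-level files, assumed newest first; their ranges may overlap,
    # so each file is checked in turn.
    for f in level1:
        value = f.get(key)
        if value is not None:
            return value
    # S440: binary search over the sorted, non-overlapping second-level files
    # for the one file whose key range could contain the key.
    starts = [f.min_key for f in level2]
    i = bisect_right(starts, key) - 1
    if i >= 0 and key <= level2[i].max_key:
        return level2[i].get(key)
    return None  # the key is not stored anywhere in the database
```

Because each tier is consulted only after every fresher tier has missed, the first hit is the most recent value for the key. Deletion markers are not modeled in this sketch.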
FIG. 15 is a schematic flowchart of a lookup inside a file. Referring to FIG. 15, the file header and the index block of the target storage file are first obtained (step S510). Step S520 is then performed to determine, from the file header, whether the request key falls within the key range indicated by the file header. If it does not, the target storage file contains no value corresponding to the request key, and the read ends.
If the request key is determined to be within the range, step S530 may be performed to search the index block, based on its B+ tree structure, for the leaf node corresponding to the request key. If no leaf node corresponding to the request key is found in the index block, the target storage file contains no value corresponding to the request key, and the read ends. If one is found, step S540 may be performed: the target value is read using the logical address, recorded in the found leaf node, of the value corresponding to the key within the data block of the target storage file.
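The in-file lookup of steps S520 to S540 can be sketched as follows, for illustration only. The `Node` and `Header` classes, the representation of the index block as a list of nodes addressed by position, and the encoding of a logical address as an (offset, length) pair are assumptions of this sketch rather than the format defined by the embodiment.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class Node:
    keys: list
    children: list = None  # internal node: positions of child nodes in the index block
    addrs: list = None     # leaf node: (offset, length) of each value in the data block


@dataclass
class Header:
    min_key: str
    max_key: str
    root: int              # position of the B+ tree root in the index block


def find_in_file(key, header, nodes, data):
    # S520: reject keys outside the range recorded in the file header.
    if not (header.min_key <= key <= header.max_key):
        return None
    # S530: descend from the root to the leaf that could hold the key.
    # Child i + 1 of an internal node holds the keys >= separator keys[i].
    node = nodes[header.root]
    while node.children is not None:
        node = nodes[node.children[bisect_right(node.keys, key)]]
    i = bisect_left(node.keys, key)
    if i == len(node.keys) or node.keys[i] != key:
        return None
    # S540: follow the logical address stored in the leaf into the data block.
    offset, length = node.addrs[i]
    return data[offset:offset + length]
```

The range check in the header lets a file be skipped without touching its index block, which is why the header is consulted first.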
III. Restart recovery process
In LevelDB, restarting is a painful affair, because the data in the internal memory has to be recovered from the two manifest files MANIFEST and Current. As the amount of data grows, these two files can become very large, especially the Current file, for which sizes of a gigabyte or more are common. As a result, a restart sometimes takes tens of minutes. Worse, if a manifest file is lost, the entire database becomes unusable. In the database system of the present invention, by contrast, the descriptive information of each file is fully contained in the file's own blocks, such as its index block and file header, and these blocks are usually small. On restart, the metadata of the whole file can be fully recovered simply by reading the corresponding information from these blocks. Even if a particular file is damaged, the database as a whole does not become unusable, and even a database on the order of hundreds of gigabytes can complete recovery and restart within seconds.
FIG. 16 is a schematic flowchart of the restart recovery process according to this embodiment. No particular order is required between step S610 and step S620; they may be performed simultaneously or at different times.
Referring to FIG. 16, in step S610 the second-level storage file list is built. Specifically, the index block, the filter block (where present), the file header, and so on of each second-level storage file may be preloaded by memory mapping, and the second-level storage file list is then built in order of key range.
In step S620, the first-level storage file list is built. Specifically, the index block, the filter block (where present), the file header, and so on of each first-level storage file may be preloaded by memory mapping, and the first-level storage file list is then built in order of key range.
Once the first-level and second-level storage file lists have been built, the write progress, that is, how far the data in the log files has been written into first-level storage files, can be determined (step S630), so that the memory table and read-only memory tables in the internal memory can be built according to that write progress (step S640).
As described above, multiple log files may be written to the external memory, each in one-to-one correspondence with a memory table (or read-only memory table). The first-level and second-level storage file lists can therefore be used to determine which of the log files contain data that has not been written to a storage file. Log files whose data has not been written to a storage file can then be converted into read-only memory tables, except that the data in the most recently generated log file can be written into the memory table. This completes the recovery of the memory table and read-only memory tables in the internal memory.
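As a rough sketch of steps S630 and S640 only, the reconstruction of the in-memory tables could look like the following. The function `recover_tables`, the use of integer sequence numbers to identify log files, and the assumption that the storage file lists yield the set of log sequence numbers already persisted are all hypothetical; the description does not state how a log file is matched to a storage file.

```python
def recover_tables(logs, flushed):
    """Rebuild the memory table and read-only memory tables after a restart.

    logs:    {sequence number: [(key, value), ...]} in write order (assumed layout)
    flushed: set of sequence numbers whose data is already in storage files
    """
    pending = sorted(seq for seq in logs if seq not in flushed)
    newest = max(logs) if logs else None
    memtable, immutables = {}, []
    for seq in pending:
        table = {}
        for key, value in logs[seq]:
            table[key] = value  # replay in write order, so later writes win
        table = dict(sorted(table.items()))  # keep entries ordered by key
        if seq == newest:
            memtable = table          # most recent log becomes the memory table
        else:
            immutables.append(table)  # older pending logs become read-only tables
    return memtable, immutables
```

In this sketch the read-only tables come back oldest first; a caller that searches newest first would iterate over them in reverse.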
This embodiment further provides an electronic device including a memory and a processor. The memory is configured to store executable instructions, and the processor is configured to operate the electronic device, under control of the executable instructions, to perform any of the file merging methods provided in this embodiment. In one example, the electronic device may be the electronic device 1000 shown in FIG. 10.
The database management method and database system according to the present invention have been described in detail above with reference to the accompanying drawings. According to this embodiment, the files ultimately stored in the external memory have only a two-level hierarchy, so file redundancy is low and lookups are convenient.
This embodiment further provides the following database management method, database system, and electronic device.
Aspect 1. A database management method for storing a plurality of pieces of data, wherein each piece of data includes a corresponding key and value, the method comprising:
writing the plurality of pieces of data to a log file in an external memory;
writing the data in the log file to a memory table in an internal memory, wherein the data written to the memory table is stored in order of key;
when the size of the memory table exceeds a predetermined threshold, converting the memory table into a read-only memory table and writing subsequent data in the log file to a new memory table;
writing the data in the read-only memory table to the external memory to obtain a first-level storage file; and
merging two or more first-level storage files to obtain a second-level storage file.
Aspect 2. The database management method of aspect 1, further comprising:
assigning a main file name to the first-level storage file according to a first naming rule; and
assigning a main file name to the second-level storage file according to a second naming rule, the first naming rule being different from the second naming rule, so that whether a storage file is a first-level storage file or a second-level storage file can be distinguished by its main file name.
Aspect 3. The database management method of aspect 1, wherein the memory table consists of a hash table, the hash table includes one or more hash buckets, each hash bucket corresponds to one skip list, and each piece of data in the memory table constitutes one element of the skip list, the elements of the skip list being ordered by key.
Aspect 4. The database management method of aspect 1, further comprising:
maintaining a read-only memory table queue in the internal memory, and, when the data in the read-only memory table has not all been written to the external memory and the size of the new memory table exceeds the predetermined threshold, converting the new memory table into another read-only memory table and placing it in the read-only memory table queue.
Aspect 5. The database management method of aspect 1, wherein the data structure of the storage file includes:
a file header for recording metadata information of the storage file;
a data block for storing values; and
an index block for storing, in the form of a B+ tree, the keys corresponding to the values, wherein every key, together with the logical address of its corresponding value in the data block, is recorded in a leaf node of the B+ tree, and all nodes constituting the B+ tree are stored physically contiguously.
Aspect 6. The database management method of aspect 5, wherein the step of merging two first-level storage files comprises:
appending an additional data block after the first storage file, the values in the data block of the second storage file being written into the additional data block;
appending a new index block after the additional data block, the new index block being generated from the index block of the first storage file and the index block of the second storage file, wherein every valid key in the index block of the first storage file and in the index block of the second storage file, together with the logical address of its corresponding value in the data block of the first storage file or in the additional data block, is recorded in a leaf node of a new B+ tree; and
appending a new file header after the new index block to record metadata information of the merged new file.
Aspect 7. The database management method of aspect 6, wherein the metadata information includes one or more of the following:
the number of keys in the index block;
the range of keys in the index block;
the height of the B+ tree;
the logical address of the first leaf node in the B+ tree;
the number of internal nodes in the B+ tree.
Aspect 8. The database management method of aspect 6, further comprising:
updating the file header of the first storage file according to the new file header, so as to replace the metadata information in the file header of the first storage file with the metadata information in the new file header.
Aspect 9. The database management method of aspect 8, wherein
the file includes a front file header located at the head of the file and a rear file header located at the tail of the file, the front file header and the rear file header having the same content, and
the front file header of the first storage file is updated according to the new file header to serve as the front file header of the new file, and the new file header serves as the rear file header of the new file.
Aspect 10. The database management method of aspect 8 or 9, further comprising:
if the step of writing the metadata information of the new file into the new file header fails, restoring the new file to the first storage file as it was before the merge according to the file header of the first storage file; and/or
if the step of updating the file header of the first storage file fails, updating the file header of the first storage file again according to the new file header.
Aspect 11. The database management method of any one of aspects 1-9, further comprising:
in response to a request to look up the target value corresponding to a request key, searching the memory table for a key corresponding to the request key and, if one is found, reading the target value;
if the request key is not found in the memory table, searching the read-only memory table for a key corresponding to the request key and, if one is found, reading the target value;
if the request key is not found in the read-only memory table, searching the first-level storage files in the external memory one by one in chronological order for a key corresponding to the request key and, if one is found, reading the target value; and
if the request key is not found in any of the first-level storage files, using binary search to determine whether a second-level storage file on the disk has a key corresponding to the request key and, if one is found, reading the target value.
Aspect 12. The database management method of aspect 11, further comprising:
in response to a request to read, from a target storage file, the target value corresponding to a request key, obtaining the file header and the index block of the target storage file;
determining, according to the file header, whether the request key is within the key range indicated by the file header;
if the request key is determined to be within the key range indicated by the file header, searching the index block, based on its B+ tree structure, for the leaf node corresponding to the request key; and
if the leaf node is found, reading the target value according to the logical address, stored in the found leaf node, of the value corresponding to the key in the data block of the target storage file.
Aspect 13. The database management method of any one of aspects 1-9, further comprising:
in response to a request to restart and recover the internal memory, building a second-level storage file list in order of the key ranges contained in the second-level storage files;
building a first-level storage file list in order of the file sequence numbers of the first-level storage files;
determining, according to the first-level storage file list and the second-level storage file list, the write progress with which the data in the log file has been written to first-level storage files; and
building the memory table and the read-only memory table in the internal memory according to the write progress.
Aspect 14. A database system comprising an internal memory and an external memory, wherein
the internal memory is configured to write a plurality of pieces of data to a log file in the external memory,
the external memory writes the data in the log file to a memory table in the internal memory, wherein the data written to the memory table is stored in order of key,
when the size of the memory table exceeds a predetermined threshold, the internal memory converts the memory table into a read-only memory table, and the external memory writes subsequent data in the log file to a new memory table,
the internal memory writes the data in the read-only memory table to the external memory to obtain a first-level storage file, and
the external memory merges two or more first-level storage files to obtain a second-level storage file.
Aspect 15. The database system of aspect 14, wherein
the external memory assigns a main file name to the first-level storage file according to a first naming rule and assigns a main file name to the second-level storage file according to a second naming rule, the first naming rule being different from the second naming rule, so that whether a storage file is a first-level storage file or a second-level storage file can be distinguished by its main file name.
Aspect 16. The database system of aspect 14, wherein the memory table consists of a hash table, the hash table includes one or more hash buckets, each hash bucket corresponds to one skip list, and each piece of data in the memory table constitutes one element of the skip list, the elements of the skip list being ordered by key.
Aspect 17. The database system of aspect 14, wherein
a read-only memory table queue is maintained in the internal memory, and, when the data in the read-only memory table has not all been written to the external memory and the size of the new memory table exceeds the predetermined threshold, the external memory converts the new memory table into another read-only memory table and places it in the read-only memory table queue.
Aspect 18. The database system of aspect 14, wherein the data structure of the storage file includes:
a file header for recording metadata information of the storage file;
a data block for storing values; and
an index block for storing, in the form of a B+ tree, the keys corresponding to the values, wherein every key, together with the logical address of its corresponding value in the data block, is recorded in a leaf node of the B+ tree, and all nodes constituting the B+ tree are stored physically contiguously.
Aspect 19. The database system of aspect 18, wherein the external memory merges two first-level storage files by performing the following operations:
appending an additional data block after the first storage file, the values in the data block of the second storage file being written into the additional data block;
appending a new index block after the additional data block, the new index block being generated from the index block of the first storage file and the index block of the second storage file, wherein every key in the index block of the first storage file and in the index block of the second storage file, together with the logical address of its corresponding value in the data block of the first storage file or in the additional data block, is recorded in a leaf node of a new B+ tree; and
appending a new file header after the new index block to record metadata information of the merged new file.
Aspect 20. An electronic device comprising:
a memory for storing executable instructions; and
a processor configured to operate the electronic device, under control of the executable instructions, to perform the database management method of any one of aspects 1-13.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement aspects of the present invention.
A computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (for example, light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device via a network, for example the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
Computer program instructions for carrying out the operations of the present invention may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In scenarios involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized by utilizing state information of the computer-readable program instructions, and the electronic circuitry can execute the computer-readable program instructions to implement aspects of the present invention.
这里参照根据本发明实施例的方法、装置(系统)和计算机程序产品的流程图和/或框图描述了本发明的各个方面。应当理解,流程图和/或框图的每个方框以及流程图和/或框图中各方框的组合,都可以由计算机可读程序指令实现。Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus, and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams can be implemented by computer readable program instructions.
这些计算机可读程序指令可以提供给通用计算机、专用计算机或其它可编程数据处理装置的处理器,从而生产出一种机器,使得这些指令在通过计算机或其它可编程数据处理装置的处理器执行时,产生了实现流程图和/或框图中的一个或多个方框中规定的功能/动作的装置。也可以把这些计算机可读程序指令存储在计算机可读存储介质中,这些指令使得计算机、可编程数据处理装置和/或其他设备以特定方式工作,从而,存储有指令的计算机可读介质则包括一个制造品,其包括实现流程图和/或框图中的一个或多个方框中规定的功能/动作的各个方面的指令。The computer readable program instructions can be provided to a general purpose computer, a special purpose computer, or a processor of other programmable data processing apparatus to produce a machine such that when executed by a processor of a computer or other programmable data processing apparatus Means for implementing the functions/acts specified in one or more of the blocks of the flowcharts and/or block diagrams. The computer readable program instructions can also be stored in a computer readable storage medium that causes the computer, programmable data processing device, and/or other device to operate in a particular manner, such that the computer readable medium storing the instructions includes An article of manufacture that includes instructions for implementing various aspects of the functions/acts recited in one or more of the flowcharts.
也可以把计算机可读程序指令加载到计算机、其它可编程数据处理装置、或其它设备上,使得在计算机、其它可编程数据处理装置或其它设备上执行一系列操作步骤,以产生计算机实现的过程,从而使得在计算机、其它可编程数据处理装置、或其它设备上执行的指令实现流程图和/或框图中的一个或多个方框中规定的功能/动作。The computer readable program instructions can also be loaded onto a computer, other programmable data processing device, or other device to perform a series of operational steps on a computer, other programmable data processing device or other device to produce a computer-implemented process. Thus, instructions executed on a computer, other programmable data processing apparatus, or other device implement the functions/acts recited in one or more of the flowcharts and/or block diagrams.
附图中的流程图和框图显示了根据本发明的多个实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段或指令的一部分,所述模块、程序段或指令的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个连续的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或动作的专用的基于硬件的系统来 实现,或者可以用专用硬件与计算机指令的组合来实现。对于本领域技术人员来说公知的是,通过硬件方式实现、通过软件方式实现以及通过软件和硬件结合的方式实现都是等价的。The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the invention. In this regard, each block in the flowchart or block diagram can represent a module, a program segment, or a portion of an instruction that includes one or more components for implementing the specified logical functions. Executable instructions. In some alternative implementations, the functions noted in the blocks may also occur in a different order than those illustrated in the drawings. For example, two consecutive blocks may be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending upon the functionality involved. It is also noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented in a dedicated hardware-based system that performs the specified function or function. Or it can be implemented by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are equivalent.
Embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the present invention is defined by the appended claims.

Claims (14)

  1. A file merging method, wherein a file is stored in an external memory and comprises a file header, a data block, and an index block, the file header being used to record metadata information of the file, the data block being used to store values, and the index block being used to store, in the form of a B+ tree, the keys corresponding to the values, wherein all keys and the logical addresses, in the data block, of their corresponding values are recorded in leaf nodes of the B+ tree, the method comprising:
    appending an additional data block after a first file, the values in the data block of a second file being written into the additional data block;
    appending a new index block after the additional data block, the new index block being generated based on the index block of the first file and the index block of the second file, wherein all valid keys in the index block of the first file and the index block of the second file, together with the logical addresses of their corresponding values in the data block of the first file and in the additional data block, are recorded in leaf nodes of a new B+ tree;
    appending a new file header after the new index block to record metadata information of the merged new file.
  2. The file merging method according to claim 1, wherein the metadata information comprises one or more of the following:
    the number of keys in the index block;
    the range of keys in the index block;
    the height of the B+ tree;
    the logical address of the first leaf node in the B+ tree;
    the number of internal nodes in the B+ tree.
  3. The file merging method according to claim 1 or 2, wherein all nodes constituting the B+ tree are stored physically contiguously.
  4. The file merging method according to any one of claims 1 to 3, further comprising:
    updating the file header of the first file according to the new file header, so as to replace the metadata information in the file header of the first file with the metadata information in the new file header.
  5. The file merging method according to any one of claims 1 to 4, wherein
    the file comprises a front file header located at the head of the file and a rear file header located at the tail of the file, the contents of the front file header and the rear file header being the same, and
    the front file header of the first file is updated according to the new file header to serve as the front file header of the new file, and the new file header serves as the rear file header of the new file.
  6. The file merging method according to any one of claims 1 to 5, further comprising:
    in the case where an error occurs in the step of writing the metadata information of the new file into the new file header, restoring the new file to the first file as it was before the merge according to the file header of the first file; and/or
    in the case where an error occurs in the step of updating the file header of the first file, updating the file header of the first file again according to the new file header.
  7. The file merging method according to any one of claims 1 to 7, further comprising performing the following steps to read, from a target file, a target value corresponding to a requested key:
    obtaining the file header and the index block of the target file;
    determining, according to the file header, whether the requested key is within the range of keys indicated by the file header;
    in the case where the requested key is determined to be within the range, searching the index block, based on the B+ tree structure of the index block, for the leaf node corresponding to the requested key;
    reading the target value according to the logical address, in the data block of the target file, of the value corresponding to the key stored in the found leaf node.
  8. A file merging apparatus, wherein a file is stored in an external memory and comprises a file header, a data block, and an index block, the file header being used to record metadata information of the file, the data block being used to store values, and the index block being used to store, in the form of a B+ tree, the keys corresponding to the values, wherein all keys and the logical addresses, in the data block, of their corresponding values are recorded in leaf nodes of the B+ tree, the apparatus comprising:
    a first writing unit, configured to write an additional data block after a first file, the values in the data block of a second file being written into the additional data block;
    a B-tree generating unit, configured to generate a new B+ tree based on the index block of the first file and the index block of the second file, wherein all valid keys in the index block of the first file and the index block of the second file, together with the logical addresses of their corresponding values in the data block of the first file and in the additional data block, are recorded in leaf nodes of the new B+ tree;
    a second writing unit, configured to append a new index block after the additional data block, the new B+ tree being written into the new index block;
    a third writing unit, configured to append a new file header after the new index block to record metadata information of the merged new file.
  9. The file merging apparatus according to claim 8, wherein the metadata information comprises one or more of the following:
    the number of keys in the index block;
    the range of keys in the index block;
    the height of the B+ tree;
    the logical address of the first leaf node in the B+ tree;
    the number of internal nodes in the B+ tree.
  10. The file merging apparatus according to claim 8 or 9, further comprising:
    an updating unit, configured to update the file header of the first file according to the new file header, so as to replace the metadata information in the file header of the first file with the metadata information in the new file header.
  11. The file merging apparatus according to any one of claims 8 to 10, wherein
    the file comprises a front file header located at the head of the file and a rear file header located at the tail of the file, the contents of the front file header and the rear file header being the same, and
    the updating unit updates the front file header of the first file according to the new file header to serve as the front file header of the new file, and the new file header serves as the rear file header of the new file.
  12. The file merging apparatus according to any one of claims 8 to 11, further comprising:
    a first restoring unit, configured to, in the case where an error occurs in the step of writing the metadata information of the new file into the new file header, restore the new file to the first file as it was before the merge according to the file header of the first file; and/or
    a second restoring unit, configured to, in the case where an error occurs in the step of updating the file header of the first file, update the file header of the first file again according to the new file header.
  13. The file merging apparatus according to any one of claims 8 to 12, further comprising a reading unit configured to read, from a target file, a target value corresponding to a requested key, wherein the reading unit comprises:
    an obtaining module, which obtains the file header and the index block of the target file;
    a determining module, which determines, according to the file header, whether the requested key is within the range of keys indicated by the file header;
    a searching module, which, in the case where the requested key is determined to be within the range, searches the index block, based on the B+ tree structure of the index block, for the leaf node corresponding to the requested key;
    a value reading module, which reads the target value according to the logical address, in the data block of the target file, of the value corresponding to the key stored in the found leaf node.
  14. An electronic device, comprising:
    a memory, configured to store executable instructions;
    a processor, configured to, under the control of the executable instructions, operate the electronic device to perform the file merging method according to any one of claims 1 to 7.
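
The append-only merge of claims 1, 4 and 5 and the read path of claim 7 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: a flat sorted list of `[key, offset, length]` entries stands in for the B+ tree leaf level, the blocks are JSON-encoded, the file lives in an in-memory `bytearray`, and `HEADER_SIZE` and every function name are hypothetical. It further assumes that files are non-empty, that the second file's value is the valid one when both files hold the same key, and it does not model the error recovery of claim 6.

```python
import json
from bisect import bisect_left
from typing import Dict, Optional

HEADER_SIZE = 256  # hypothetical fixed header size, in bytes


def pack_header(index: list, index_off: int, index_len: int) -> bytes:
    """Serialize metadata: key count, key range, and where the index lives."""
    meta = {"n": len(index), "min_key": index[0][0], "max_key": index[-1][0],
            "index_off": index_off, "index_len": index_len}
    raw = json.dumps(meta).encode()
    if len(raw) > HEADER_SIZE:
        raise ValueError("metadata does not fit in the fixed-size header")
    return raw.ljust(HEADER_SIZE)  # pad with spaces to the fixed size


def read_header(buf: bytearray, pos: int) -> dict:
    return json.loads(bytes(buf[pos:pos + HEADER_SIZE]).decode())


def read_index(buf: bytearray, meta: dict) -> list:
    start = meta["index_off"]
    return json.loads(bytes(buf[start:start + meta["index_len"]]).decode())


def append_index_and_headers(buf: bytearray, index: list) -> None:
    """Append the index block and the rear header, then refresh the front header."""
    raw = json.dumps(index).encode()
    header = pack_header(index, len(buf), len(raw))
    buf += raw                    # new index block
    buf += header                 # new (rear) file header
    buf[0:HEADER_SIZE] = header   # front header is updated last


def build_file(pairs: Dict[str, bytes]) -> bytearray:
    buf = bytearray(HEADER_SIZE)  # space reserved for the front header
    index = []
    for key in sorted(pairs):
        index.append([key, len(buf), len(pairs[key])])
        buf += pairs[key]         # data block
    append_index_and_headers(buf, index)
    return buf


def merge(first: bytearray, second: bytearray) -> None:
    """Merge `second` into `first` by appending only; old blocks stay in place."""
    merged = {k: (off, ln)
              for k, off, ln in read_index(first, read_header(first, 0))}
    for k, off, ln in read_index(second, read_header(second, 0)):
        merged[k] = (len(first), ln)    # assumption: the second file's value wins
        first += second[off:off + ln]   # additional data block
    index = [[k, *merged[k]] for k in sorted(merged)]
    append_index_and_headers(first, index)


def lookup(buf: bytearray, key: str) -> Optional[bytes]:
    meta = read_header(buf, 0)
    if not meta["min_key"] <= key <= meta["max_key"]:
        return None               # key range check from the header
    index = read_index(buf, meta)
    i = bisect_left([entry[0] for entry in index], key)
    if i < len(index) and index[i][0] == key:
        _, off, ln = index[i]
        return bytes(buf[off:off + ln])
    return None
```

For example, after `a = build_file({"a": b"1", "c": b"3"})`, `b = build_file({"b": b"2", "c": b"33"})` and `merge(a, b)`, the call `lookup(a, "c")` reads the value appended from the second file. The old index block and old rear header of the first file remain in the middle of the merged file as unreferenced bytes, which is the cost of never rewriting existing data.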
PCT/CN2018/072641 2017-01-17 2018-01-15 File merging method and apparatus WO2018133762A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201710040977.XA CN108319625B (en) 2017-01-17 2017-01-17 File mergences method and apparatus
CN201710031732.0 2017-01-17
CN201710031732.0A CN108319602B (en) 2017-01-17 2017-01-17 Database management method and database system
CN201710040977.X 2017-01-17

Publications (1)

Publication Number Publication Date
WO2018133762A1 true WO2018133762A1 (en) 2018-07-26

Family

ID=62907812

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/072641 WO2018133762A1 (en) 2017-01-17 2018-01-15 File merging method and apparatus

Country Status (1)

Country Link
WO (1) WO2018133762A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800208A (en) * 2019-01-18 2019-05-24 湖南友道信息技术有限公司 Network traceability system and its data processing method, computer storage medium
CN111538702A (en) * 2020-04-20 2020-08-14 北京京安佳新技术有限公司 Hadoop-based massive small file processing method and device
CN111984600A (en) * 2020-08-27 2020-11-24 苏州浪潮智能科技有限公司 File aggregation method, device, equipment and readable storage medium
CN112307016A (en) * 2019-07-29 2021-02-02 华为技术有限公司 Data unit merging method and device
EP3910489A1 (en) * 2020-05-15 2021-11-17 Vail Systems, Inc. A data management system using attributed data slices

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030120602A1 (en) * 2000-01-12 2003-06-26 June-Kee Jung Method of combinning multimedia files
CN103678491A (en) * 2013-11-14 2014-03-26 东南大学 Method based on Hadoop small file optimization and reverse index establishment
CN104133867A (en) * 2014-07-18 2014-11-05 中国科学院计算技术研究所 DOT in-fragment secondary index method and DOT in-fragment secondary index system
CN105117415A (en) * 2015-07-30 2015-12-02 西安交通大学 Optimized SSD data updating method
CN105868286A (en) * 2016-03-23 2016-08-17 中国科学院计算技术研究所 Parallel adding method and system for merging small files on basis of distributed file system
CN106326292A (en) * 2015-06-29 2017-01-11 杭州海康威视数字技术股份有限公司 Data structure and file aggregation and reading methods and apparatuses

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800208A (en) * 2019-01-18 2019-05-24 湖南友道信息技术有限公司 Network traceability system and its data processing method, computer storage medium
CN112307016A (en) * 2019-07-29 2021-02-02 华为技术有限公司 Data unit merging method and device
CN112307016B (en) * 2019-07-29 2022-08-26 华为技术有限公司 Data unit merging method and device
CN111538702A (en) * 2020-04-20 2020-08-14 北京京安佳新技术有限公司 Hadoop-based massive small file processing method and device
EP3910489A1 (en) * 2020-05-15 2021-11-17 Vail Systems, Inc. A data management system using attributed data slices
CN111984600A (en) * 2020-08-27 2020-11-24 苏州浪潮智能科技有限公司 File aggregation method, device, equipment and readable storage medium
CN111984600B (en) * 2020-08-27 2022-07-29 苏州浪潮智能科技有限公司 File aggregation method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18741671

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC , EPO FORM 1205A DATED 24.10.19.

122 Ep: pct application non-entry in european phase

Ref document number: 18741671

Country of ref document: EP

Kind code of ref document: A1