CN113535714A - Data storage method, data reading method and computer equipment - Google Patents

Publication number
CN113535714A
Authority
CN
China
Prior art keywords
data
stored
memory
log
tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110680078.2A
Other languages
Chinese (zh)
Inventor
熊志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Hanyun Technology Co ltd
Original Assignee
Shenzhen Hanyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Hanyun Technology Co ltd
Priority to CN202110680078.2A
Publication of CN113535714A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application belongs to the technical field of computers and provides a data storage method, a data reading method and computer equipment. The data storage method comprises the following steps: acquiring data to be stored, wherein the table corresponding to the data to be stored is a temporary table; storing the data to be stored in a log-structured merge tree, or storing the data to be stored in a memory; and recording an index of the stored data in the log-structured merge tree. The method can improve the response speed of the computer equipment.

Description

Data storage method, data reading method and computer equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data storage method, a data reading method, a data storage device, a computer apparatus, and a computer-readable storage medium.
Background
Currently, the Log-Structured Merge Tree (LSMTree) is one of the mainstream storage structures used in developing storage engines.
The LSMTree mainly consists of a Write-Ahead Log (WAL), a memory table (Memtable), a read-only memory table (Immutable Memtable), and a Sorted String Table (SSTable). Wherein:
WAL: write-ahead logging is a common means of guaranteeing persistence in databases. In the LSMTree structure, data is first written into the Memtable and sorted in memory; after a certain amount of data has accumulated, it is written sequentially to the hard disk to form an SSTable file. If the system crashes before the Memtable has been converted into an SSTable, the data in the Memtable is lost. To guarantee persistence, the LSMTree therefore first appends the data sequentially to the log and only then writes it into the Memtable. Thus, even if the system crashes, the data in the Memtable can be recovered from the log.
Memtable: the Memtable is located in the memory.
Immutable Memtable: its structure is identical to that of the Memtable and it also resides in the memory, except that it is read-only. When the data written into the Memtable reaches a threshold, the Memtable is converted into an Immutable Memtable, which is handed to a background thread that asynchronously writes it to a file to generate an SSTable.
SSTable: an SSTable is an ordered record of in-memory writes, generated from an Immutable Memtable.
Although the LSMTree greatly improves write performance by converting random input/output (I/O) into sequential I/O, it still occupies some I/O resources, and this affects the performance of the computer device. The approach can therefore be further optimized for the actual application scenario, improving performance by reducing I/O.
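The conventional write path described above can be illustrated with a minimal Python sketch. It is not taken from the application: the class name `TinyLSM`, the tab-separated file format, the entry-count threshold and the synchronous flush (in place of a background thread) are all assumptions made for illustration.

```python
import os


class TinyLSM:
    """Toy LSM write path: append to the WAL, insert into the Memtable, flush to an SSTable.

    Hypothetical sketch: keys and values are strings without tabs or newlines,
    and the flush runs synchronously instead of on a background thread.
    """

    def __init__(self, directory, memtable_limit=3):
        self.directory = directory
        self.memtable_limit = memtable_limit  # assumed threshold, counted in entries
        self.memtable = {}
        self.sstables = []  # paths of flushed, sorted files
        self.wal_path = os.path.join(directory, "wal.log")

    def put(self, key, value):
        # 1. Sequential append to the write-ahead log (this costs disk I/O).
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(f"{key}\t{value}\n")
        # 2. Insert into the in-memory table.
        self.memtable[key] = value
        # 3. When the threshold is reached, freeze the Memtable and flush it.
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        immutable = self.memtable  # treated as read-only from here on
        self.memtable = {}
        path = os.path.join(self.directory, f"sstable_{len(self.sstables)}.txt")
        with open(path, "w", encoding="utf-8") as out:
            for key in sorted(immutable):
                out.write(f"{key}\t{immutable[key]}\n")
        self.sstables.append(path)
        # The flushed entries now live in the SSTable, so the log is truncated.
        open(self.wal_path, "w", encoding="utf-8").close()

    def recover(self):
        """Rebuild the Memtable content from the log, as after a crash."""
        recovered = {}
        if os.path.exists(self.wal_path):
            with open(self.wal_path, encoding="utf-8") as wal:
                for line in wal:
                    key, value = line.rstrip("\n").split("\t", 1)
                    recovered[key] = value
        return recovered
```

In this sketch every `put` performs a log append before touching the Memtable, which is the I/O cost the application seeks to avoid for temporary tables.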
Disclosure of Invention
The embodiment of the application provides a data storage method that can solve the problem that the response speed of computer equipment is too slow.
In a first aspect, an embodiment of the present application provides a data storage method, including:
acquiring data to be stored, wherein a table corresponding to the data to be stored is a temporary table;
storing the data to be stored in a log-structured merge tree, or storing the data to be stored in a memory;
recording an index of the stored data in the log-structured merge tree.
In a second aspect, an embodiment of the present application provides a method for reading data, including:
receiving a read instruction for data, wherein the read instruction carries a type identifier of the table corresponding to the data to be read and an index value of the data to be read;
identifying whether the table is a temporary table according to the type identifier of the table;
and if the table is a temporary table, reading the corresponding data from the log-structured merge tree, or reading the corresponding data from a memory and a hard disk, according to the index value of the data to be read and the log-structured merge tree, wherein the log-structured merge tree records the index of the data stored in the log-structured merge tree or the index of the data stored in the memory.
In a third aspect, an embodiment of the present application provides a data storage device, including:
the data acquisition module is used for acquiring data to be stored, wherein a table corresponding to the data to be stored is a temporary table;
the data storage module is used for storing the data to be stored in the log-structured merge tree or storing the data to be stored in the memory;
and the correspondence recording module is used for recording the index of the stored data in the log-structured merge tree.
In a fourth aspect, the present application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the method according to the first aspect or the second aspect when executing the computer program.
In a fifth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the method according to the first aspect or the second aspect.
In a sixth aspect, embodiments of the present application provide a computer program product, which, when run on a computer device, causes the computer device to perform the method of the first aspect or the second aspect.
Compared with the prior art, the embodiment of the application has the advantages that:
in the embodiment of the present application, if data needs to be stored in a temporary table, the data is stored in the log-structured merge tree or in the memory, and the index of the data stored in the log-structured merge tree, or the index of the data stored in the memory, is then recorded in the log-structured merge tree. When data is stored on the basis of the log-structured merge tree in this way, the write-ahead log operation is not performed on the data to be stored; the data storage operation is performed directly. The I/O resources that the write-ahead log operation would occupy are therefore saved, and the write performance of the computer device is improved. In addition, since the table targeted by the embodiment of the application is a temporary table, whose data does not need guaranteed persistence, skipping the write-ahead log has no adverse effect.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the embodiments or the description of the prior art will be briefly described below.
Fig. 1 is a flowchart of a data storage method according to an embodiment of the present application;
Fig. 2 is a flowchart of another data storage method according to an embodiment of the present application;
Fig. 3 is a schematic diagram illustrating an application of a data storage method according to an embodiment of the present application;
Fig. 4 is a flowchart of a data reading method according to an embodiment of the present application;
Fig. 5 is a block diagram of a data storage device according to another embodiment of the present application;
Fig. 6 is a block diagram of a data reading apparatus according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise.
The first embodiment is as follows:
In the prior art, when the data of a table needs to be stored, it can be stored by means of a log-structured merge tree. However, the same storage procedure is then used for the data of every type of table: the data is first written sequentially into the log and then written into the Memtable; when the data written into the Memtable reaches a threshold, the Memtable is converted into an Immutable Memtable, which a background thread asynchronously writes to a file to generate an SSTable. Because this storage procedure occupies I/O resources, and I/O is the main performance bottleneck of a database, the existing log-structured merge tree still reduces database performance when storing data.
During database operation, many operations require a temporary table, such as an intermediate table used for association or sorting. A temporary table is useful only for the current operation and can be released once that operation completes. To save I/O resources and improve performance, the embodiment of the application therefore improves the existing log-structured merge tree and stores data on the basis of the improved tree. Specifically, it is first determined whether the table corresponding to the data to be stored is a temporary table; if so, the memory table is generated directly from the data to be stored, that is, the step of sequentially writing the data into the log (i.e., the WAL) is not performed. Because that step occupies I/O resources, omitting it further reduces the impact of data storage on the performance of the computer device.
The following describes a data storage method provided in an embodiment of the present application with reference to the drawings.
Fig. 1 shows a flowchart of a data storage method provided in an embodiment of the present application, where the data storage method is applied in a computer device, and is detailed as follows:
Step S11, acquiring the data to be stored, wherein the table corresponding to the data to be stored is a temporary table.
In this embodiment, the data to be stored is the data to be filled into the temporary table, and it depends on the scenario in which the temporary table needs to be generated.
In some embodiments, before step S11, the method further includes: generating an empty temporary table, i.e., a temporary table in which no data is stored. Specifically, a temporary table is generated when a specified operation is performed. The specified operations include, but are not limited to, an association operation and a sorting operation.
For example, in a distributed database, an association is to be performed on two tables whose distribution rules differ. The tables then need to be redistributed (redistribution is the operation of unifying the distribution rules of the two tables into one rule). The tables generated by redistribution are temporary tables, and the data to be stored is the data involved in the redistribution.
For another example, if a table needs to be sorted, a temporary table with an index on the sorting field may be created; the data to be stored is then the data corresponding to the sorting field. The data to be stored is written into the temporary table, and reading the temporary table in order yields the sorted data.
Step S12, store the data to be stored in the log-structured merge tree, or store the data to be stored in the memory.
The memory table refers to "Memtable" in the log-structured merge tree, and the Memtable is located in the memory.
In this embodiment, the data to be stored may be stored in the log-structured merge tree, for example in the memory table of the log-structured merge tree, to obtain a memory table holding the data. Alternatively, the data to be stored may be stored only in the memory, and not in the memory table of the log-structured merge tree.
In either case, no write-ahead log is written for the data to be stored; the data is stored directly (for example, in the memory table or in the memory). This is because the table targeted by the embodiment of the present application is a temporary table, whose data does not need guaranteed persistence, so skipping the write-ahead log has no adverse effect. Skipping it also reduces the occupation of the I/O resources of the computer device, which helps to improve the processing speed of the computer device.
In step S13, an index of the stored data is recorded in the log-structured merge tree.
Wherein the index of the stored data corresponds to the stored data one to one, and the stored data includes at least one of: data already stored in the memory table and data already stored in the memory.
In this embodiment, to make stored data quick to look up, an index corresponding to the stored data is recorded in the memory table. The index value may be "1", "2", "3", and so on, or another kind of identifier; the specific form depends on the table structure definition and the implementation, provided that each index value corresponds to exactly one piece of stored data.
In the embodiment of the present application, if data needs to be stored in a temporary table, the data is stored in the memory table of the log-structured merge tree or in the memory, and an index of the data stored in the memory table or in the memory is then recorded in the memory table. When data is stored on the basis of the log-structured merge tree in this way, the write-ahead log operation is not performed on the data to be stored; the data storage operation is performed directly. The I/O resources that the write-ahead log operation would occupy are therefore saved, and the processing speed of the computer device is improved. In addition, since the table targeted by the embodiment of the application is a temporary table, whose data does not need guaranteed persistence, skipping the write-ahead log has no adverse effect.
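Steps S11 to S13 can be sketched as follows, under the assumption that the caller already knows whether the target table is temporary. The class `MemtableStore`, the `is_temporary` flag and the in-memory list standing in for the on-disk log are hypothetical and serve only to show where the write-ahead log is skipped.

```python
class MemtableStore:
    """Hypothetical store in which temporary-table writes bypass the write-ahead log."""

    def __init__(self):
        self.memtable = {}    # index value -> stored row
        self.wal = []         # stand-in for the on-disk log (assumption)
        self.wal_appends = 0  # counts the log writes that would cost I/O

    def put(self, index_value, row, is_temporary):
        if not is_temporary:
            # Ordinary tables still need persistence, so the log is written first.
            self.wal.append((index_value, row))
            self.wal_appends += 1
        # Steps S12 and S13: store the data and record its index in the memory table.
        self.memtable[index_value] = row


store = MemtableStore()
store.put(1, "row for an ordinary table", is_temporary=False)
store.put(2, "row for a temporary table", is_temporary=True)
print(store.wal_appends)  # 1
```

Only the write to the ordinary table reaches the log; both rows remain readable through the memory table.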
Fig. 2 shows a flowchart of another data storage method provided in an embodiment of the present application. In this embodiment, step S21 is the same as step S11, and is not repeated here.
Step S21, acquiring the data to be stored, wherein the table corresponding to the data to be stored is a temporary table.
In step S22, the data to be stored is stored in the memory.
The memory mentioned above refers to a specific data structure held in the memory of the computer device.
In some embodiments, the data to be stored is stored in the memory as a list in which each record (i.e., each piece of recorded data) corresponds to a sequence number, and the order of the records in the list can be determined by comparing their sequence numbers. The sequence number serves as the position information extracted in the subsequent step S23 and can be used as the unique identifier of the data. Because each sequence number corresponds to exactly one piece of stored data, the corresponding data can be found accurately through the sequence number.
In step S23, the location information of the data stored in the memory is extracted.
In this embodiment, once the data to be stored has been stored in the memory, its location in the memory can be determined, and the location information is then extracted.
In step S24, a correspondence between the index value of the data stored in the memory and the location information of the data stored in the memory is recorded in the memory table of the log-structured merge tree.
Specifically, the index is a correspondence between an index value and position information.
In this embodiment, a key-value structure is used to store the index value and the position information of the data: the "key" holds the index value of the data, and the "value" holds the position information of the data stored in the memory.
Because the index value and the position information of the data are stored in a key-value structure, the stored data can be looked up accurately according to the index value and the position information.
In the embodiment of the application, whether the memory table of the log-structured merge tree is converted into a read-only memory table depends only on the size of the memory table, and the data itself occupies more space than the correspondence recorded for it in the memory table. Therefore, the data to be stored is not stored in the memory table of the log-structured merge tree; only the correspondence between the index value of the data stored in the memory and the position information of that data is recorded in the memory table. This reduces the amount of data stored in the log-structured merge tree and thereby reduces write amplification. In addition, because the memory table records this correspondence, the position information of data stored in the memory can be determined from the index value of the data, and the data can then be located accurately in the memory according to the position information.
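Steps S22 to S24 can be sketched as follows. The class `TempTableStore` and its method names are hypothetical; a plain Python list stands in for the in-memory data structure, and the list position serves as the sequence number.

```python
class TempTableStore:
    """Rows live in an in-memory list; the memory table only maps index value -> position."""

    def __init__(self):
        self.rows = []      # step S22: the data itself, kept in memory
        self.memtable = {}  # step S24: key = index value, value = sequence number

    def put(self, index_value, row):
        self.rows.append(row)
        sequence_number = len(self.rows) - 1  # step S23: position of the stored row
        self.memtable[index_value] = sequence_number
        return sequence_number

    def get(self, index_value):
        # Resolve the index value to a position, then fetch the row from memory.
        return self.rows[self.memtable[index_value]]


store = TempTableStore()
store.put("k7", {"name": "alice", "score": 90})
store.put("k3", {"name": "bob", "score": 75})
print(store.get("k3"))  # {'name': 'bob', 'score': 75}
```

The memory table holds only small integers, however large the rows are, which is the property the embodiment relies on to keep the log-structured merge tree small.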
In some embodiments, the method for storing data further comprises:
setting a first memory table threshold, wherein the first memory table threshold indicates the maximum amount of data that the memory table can currently hold.
Correspondingly, step S22 includes:
A1, determining the sum of the size of the data stored in the memory and the size of the memory table to obtain a sum value.
It should be noted that the data stored in the memory here refers to the data to be stored in the memory.
A2, if the sum value is not larger than the preset temporary table size, storing the data to be stored in the memory, wherein the preset temporary table size is smaller than the first memory table threshold.
The preset temporary table size is related to, and generally positively correlated with, the size of the memory of the computer device itself.
In this embodiment, it is considered that the amount of data that an ordinary memory table can hold is small, and that once the data stored in the memory table reaches the maximum amount the memory table can hold, the memory table is converted into a read-only memory table, from which a sorted string table (SSTable) is generated and written to the hard disk (i.e., the content is no longer kept in the memory). Because the preset temporary table size is smaller than the first memory table threshold, this conversion is not triggered while data is being stored in the memory. In this way, as long as the sum of the space occupied by the data stored in the memory and by the correspondences stored in the memory table (i.e., the sum value) is not larger than the preset temporary table size, data continues to be stored in the memory, and the correspondence between the index value and the position information of the newly stored data continues to be recorded in the memory table of the log-structured merge tree.
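The check in A1 and A2 can be written as a small function. The concrete sizes below are assumptions chosen only to satisfy the stated constraint that the preset temporary table size is smaller than the first memory table threshold; how the two byte counts are measured is left to the caller.

```python
# Assumed example sizes; the application only requires PRESET < FIRST threshold.
FIRST_MEMTABLE_THRESHOLD = 1 * 1024**3  # 1 GB
PRESET_TEMP_TABLE_SIZE = 256 * 1024**2  # 256 MB


def keep_in_memory(data_bytes, memtable_bytes, preset_size=PRESET_TEMP_TABLE_SIZE):
    """Return True if the data should be stored in memory.

    A1: add the size of the data held in memory to the size of the memory table.
    A2: store in memory while that sum is not larger than the preset size.
    """
    sum_value = data_bytes + memtable_bytes
    return sum_value <= preset_size


print(keep_in_memory(200 * 1024**2, 56 * 1024**2))  # True: the sum equals the preset size
print(keep_in_memory(200 * 1024**2, 57 * 1024**2))  # False: the sum exceeds it
```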
In some embodiments, the method for storing data further comprises:
B1, if the sum value is larger than the preset temporary table size, reducing the first memory table threshold to a second memory table threshold, storing the not-yet-stored data on the hard disk, and extracting the position information of the data stored on the hard disk, wherein the second memory table threshold is smaller than the preset temporary table size, and the not-yet-stored data is the data that still needs to be stored.
Specifically, when data to be stored has not yet been stored in the memory and the sum value is larger than the preset temporary table size, that data is stored on the hard disk.
In this embodiment, the not-yet-stored data may be stored on the hard disk as a heap file (HeapFile). In that case, the position information of the data stored on the hard disk is represented by an offset.
The first memory table threshold is usually set to several hundred megabytes (MB) to several gigabytes (GB), and the second memory table threshold is usually set to several MB, for example 4 MB.
B2, creating a new memory table in the log-structured merge tree, and recording, in the newly created memory table, the correspondence between the index value of the data stored on the hard disk and the position information of the data stored on the hard disk.
In this embodiment, since the correspondence between the index value and the position information of the data stored on the hard disk is recorded, the position information of data stored on the hard disk can be determined from its index value.
In B1 and B2 above, when the sum of the space occupied by the data stored in the memory and by the correspondences stored in the memory table is larger than the preset temporary table size, the not-yet-stored data is stored on the hard disk, which avoids occupying too much of the memory of the computer device. Conversely, while that sum is not larger than the preset temporary table size, data is stored in the memory as far as possible, and storing data in the memory does not occupy the I/O resources of the computer device. Selecting the storage path by means of the preset temporary table size therefore makes effective use of the memory and improves the response speed of the computer device. In addition, a sorted string table (SSTable) must subsequently be generated from the newly created memory table, and a larger memory table occupies more memory. Therefore, when the size of the first memory table (i.e., the memory table recording the correspondence between the index value of the data stored in the memory and the position information of that data) reaches the first memory table threshold, the first memory table threshold is changed to the smaller second memory table threshold, which effectively reduces memory overhead and makes conversion into a sorted string table more efficient.
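The hard-disk path of B1 and B2 can be sketched with a minimal heap file that returns the byte offset of each appended record. The class `HeapFile`, the 4-byte length prefix and the file name are assumptions; the application states only that data is stored as a heap file and located by offset.

```python
import os
import tempfile


class HeapFile:
    """Append-only file of length-prefixed records, addressed by byte offset."""

    def __init__(self, path):
        self.path = path
        open(path, "wb").close()  # start with an empty file

    def append(self, payload: bytes) -> int:
        offset = os.path.getsize(self.path)  # the record starts at the current end
        with open(self.path, "ab") as f:
            f.write(len(payload).to_bytes(4, "big"))  # assumed 4-byte length prefix
            f.write(payload)
        return offset

    def read(self, offset: int) -> bytes:
        with open(self.path, "rb") as f:
            f.seek(offset)
            length = int.from_bytes(f.read(4), "big")
            return f.read(length)


# B1: spill the not-yet-stored rows to disk; B2: record index value -> offset
# in a newly created memory table (a plain dict in this sketch).
heap = HeapFile(os.path.join(tempfile.mkdtemp(), "temp_table.heap"))
new_memtable = {}
for index_value, row in [(1, b"alice"), (2, b"bob")]:
    new_memtable[index_value] = heap.append(row)

print(new_memtable)                # {1: 0, 2: 9}
print(heap.read(new_memtable[2]))  # b'bob'
```

The second offset is 9 because the first record occupies a 4-byte prefix plus 5 bytes of payload.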
In some embodiments, the method for storing data provided in the embodiments of the present application further includes:
C1, if the sum value is larger than the preset temporary table size, converting the memory table that records the correspondence between the index value of the data stored in the memory and the position information of that data into a read-only memory table, without converting it into a sorted string table (SSTable).
C2, if the size of the newly created memory table is larger than the second memory table threshold, converting the newly created memory table into a read-only memory table, and generating a sorted string table from the read-only memory table obtained by the conversion.
Specifically, when the size of the newly created memory table (Memtable) is larger than the second memory table threshold, the newly created memory table is converted into a read-only memory table (Immutable Memtable), and a corresponding sorted string table is then generated from the read-only memory table; the sorted string table is the ordered record of in-memory writes generated from the Immutable Memtable. Of course, if data still remains to be stored, a new empty memory table is generated again and the process returns to step B2 and the subsequent steps, until all the data to be stored has been stored on the hard disk.
In this embodiment, the data originally stored in the memory remains in the memory for subsequent reading, and the memory table recording the correspondence between the index value and the position information of that data is converted into a read-only memory table so that the information recorded in it cannot be changed. That is, once the preset temporary table size has been exceeded, only newly written data is written to the hard disk, while old data (data already stored in the memory) stays in the memory. Because that data remains in the memory, the read performance of the computer device is improved when the data is subsequently read from the memory; and because the data in the memory is not written to the hard disk along with the new data, no I/O resources of the computer device are occupied for it. In addition, since the maximum amount of data that the memory table may hold has been adjusted to the second memory table threshold, the conversion is performed as soon as the size of the newly created memory table is found to be larger than the second memory table threshold. Because the second memory table threshold is small, memory occupation is reduced, and the memory table can be converted quickly into a read-only memory table and then into a sorted string table.
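C1 and C2 can be sketched together as follows. The class `SpillingIndex` is hypothetical, the second memory table threshold is counted in entries rather than bytes for brevity, and a sorted in-memory list stands in for the SSTable file; `types.MappingProxyType` from the Python standard library provides the read-only view.

```python
from types import MappingProxyType


class SpillingIndex:
    """Index handling once the preset temporary table size has been exceeded."""

    def __init__(self, in_memory_index, second_threshold):
        # C1: the first memory table becomes read-only and is NOT turned into an SSTable.
        self.frozen = MappingProxyType(dict(in_memory_index))
        self.second_threshold = second_threshold  # assumed unit: number of entries
        self.memtable = {}   # newly created memory table: index value -> disk offset
        self.sstables = []   # each entry is a sorted list of (index value, offset)

    def record(self, index_value, offset):
        self.memtable[index_value] = offset
        # C2: once the new memory table exceeds the small threshold, freeze it,
        # emit a sorted table, and start again with an empty memory table.
        if len(self.memtable) > self.second_threshold:
            immutable = MappingProxyType(self.memtable)
            self.sstables.append(sorted(immutable.items()))
            self.memtable = {}


index = SpillingIndex({1: 0, 2: 1}, second_threshold=2)
for index_value, offset in [(10, 0), (5, 40), (7, 80)]:
    index.record(index_value, offset)
print(index.sstables)  # [[(5, 40), (7, 80), (10, 0)]]
```

Attempting to assign into `index.frozen` raises `TypeError`, which mirrors the requirement that the information recorded in the first memory table can no longer be changed.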
In some embodiments, if multiple ordered character string tables are generated, after step C2, the method includes: and judging whether the data stored among the ordered character string tables have overlapped data, if so, generating a new ordered character string table according to the ordered character string table with the overlapped data, and generating no overlapped data among the new ordered character string tables. For example, assume that the upper limit of Memtable is 5 rows of records. Then after the Memtable writes 1,3,2,5,7, the Memtable will convert to an Immutable Memtable which in turn generates SSTable1, the SSTable1 being [1,2,3,5,7 ]. Continuing with the new Memtable writes 9,8,19,6,4, the generated SSTable2 is saved as [4,6,8,9,19 ]. Since there is overlapping data "4, 5,6, 7" for these two SSTable data intervals, it is assumed that now to query "4" it is necessary to query both SSTable1 and SSTable2 at the same time to determine if record 4 exists. If record 9 is queried, SSTable2 need only be queried. To improve query efficiency, SSTable files are typically periodically merged by an asynchronous compression (compact) thread to generate new SSTable files that are globally ordered. For example, SSTable1 and SSTable2 are merged here, two new SSTable files [1,2,3,4,5] and [6,7,8,9,19] are regenerated, and the old SSTable file is deleted.
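The worked example above can be reproduced with a short Python sketch. The helper names `tables_to_search` and `compact` are hypothetical, each SSTable is modeled as a sorted list of keys, and the merge runs synchronously here, whereas the description refers to an asynchronous compact thread.

```python
import heapq


def tables_to_search(sstables, key):
    """Indices of the SSTables whose [min, max] key interval contains key."""
    return [i for i, t in enumerate(sstables) if t and t[0] <= key <= t[-1]]


def compact(sstables, max_rows):
    """Merge sorted SSTables into non-overlapping tables of at most max_rows keys."""
    merged = []
    for key in heapq.merge(*sstables):       # k-way merge of sorted inputs
        if not merged or merged[-1] != key:  # drop duplicate keys
            merged.append(key)
    return [merged[i:i + max_rows] for i in range(0, len(merged), max_rows)]


sstable1 = [1, 2, 3, 5, 7]
sstable2 = [4, 6, 8, 9, 19]
before = [sstable1, sstable2]
after = compact(before, max_rows=5)
```

Before the merge, a query for key 4 has to consult both tables; after it, each key falls into exactly one table's interval.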
In some embodiments, to facilitate the distinction, the location information of the data stored in the memory and the location information of the data stored in the hard disk are marked with different identifiers.
For example, the location information of the data stored in the memory is marked with a sequence number, and the location information of the data stored in the hard disk is marked with an offset. Of course, if both are marked with the same flag, a flag bit may be added for distinguishing, for example, when the flag bit is 0, the flag bit indicates that the corresponding flag is the location information of the data stored in the memory, and when the flag bit is 1, the flag bit indicates that the corresponding flag is the location information of the data stored in the hard disk.
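As a minimal sketch of the flag-bit variant, the following Python code packs the flag into the lowest bit of the stored value. The bit layout and the names `Medium`, `Location`, `encode` and `decode` are assumptions made for illustration; the description only specifies that 0 denotes the memory and 1 denotes the hard disk.

```python
from dataclasses import dataclass
from enum import IntEnum


class Medium(IntEnum):
    MEMORY = 0   # flag bit 0: position is a sequence number in the memory
    DISK = 1     # flag bit 1: position is an offset on the hard disk


@dataclass(frozen=True)
class Location:
    medium: Medium
    position: int


def encode(loc):
    """Pack the position and the flag bit into one integer (assumed layout)."""
    return (loc.position << 1) | int(loc.medium)


def decode(value):
    """Recover the medium and the position from a packed integer."""
    return Location(Medium(value & 1), value >> 1)
```

A single integer value per index entry is then enough for the reader to decide whether to go to the memory or to the hard disk.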
In order to more clearly describe the data storage method provided by the embodiment of the present application, the following description is made with reference to fig. 3.
1. First, the data is recorded in the memory, and the sequence number of the record is obtained.
2. The index value of the data stored in the memory is extracted from the record and, with the index value as the key and the sequence number as the value, the pair is inserted into a memory table (Memtable) of the log-structured merge tree, where the threshold of the Memtable is the first memory table threshold.
3. If the sum of the size of the data stored in the memory and the size of the memory table is larger than the preset temporary table size, the log-structured merge tree adjusts the threshold of the Memtable to the second memory table threshold. The first memory table threshold is larger than the preset temporary table size; the original Memtable is changed into a read-only memory table (Immutable Memtable) and is not converted into an ordered character string table (SSTable).
4. The data is recorded to a heap file (HeapFile), and the offset of the record is obtained.
5. The index value of the data stored in the hard disk is extracted from the record and, with the index value as the key and the offset as the value, the pair is inserted into a newly created memory table (Memtable) of the log-structured merge tree. If the size of the newly created Memtable is larger than the second memory table threshold, the newly created Memtable is converted into a read-only memory table, and asynchronous compaction is performed to obtain a merged ordered character string table.
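The five steps above can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the implementation of the present application: the class name `TempTableStore`, the fixed `ENTRY_SIZE` per Memtable entry and the use of a `bytearray` in place of a real heap file are all hypothetical, and asynchronous compaction is omitted.

```python
ENTRY_SIZE = 16  # assumed size in bytes of one Memtable entry


class TempTableStore:
    def __init__(self, temp_table_size, first_threshold, second_threshold):
        # Constraint from the description: second < temporary table size < first.
        if not second_threshold < temp_table_size < first_threshold:
            raise ValueError("thresholds violate the required ordering")
        self.temp_table_size = temp_table_size
        self.threshold = first_threshold
        self.second_threshold = second_threshold
        self.memory = []               # records in memory; list index = sequence number
        self.memory_bytes = 0
        self.heap_file = bytearray()   # stand-in for the HeapFile on the hard disk
        self.memtable = {}             # active Memtable: key -> (medium, position)
        self.immutable = []            # read-only Memtables that stay in memory
        self.sstables = []             # sorted (key, location) lists
        self.spilled = False

    def put(self, key, record):
        if not self.spilled:
            total = self.memory_bytes + len(self.memtable) * ENTRY_SIZE
            if total <= self.temp_table_size:
                # Steps 1-2: keep the record in memory, index it by sequence number.
                self.memory.append(record)
                self.memory_bytes += len(record)
                self.memtable[key] = ("memory", len(self.memory) - 1)
                return
            # Step 3: lower the threshold, freeze the old Memtable, open a new one.
            self.threshold = self.second_threshold
            self.immutable.append(self.memtable)
            self.memtable = {}
            self.spilled = True
        # Step 4: append the record to the heap file and note its offset.
        offset = len(self.heap_file)
        self.heap_file += record
        # Step 5: index by offset; flush to a sorted table past the threshold.
        self.memtable[key] = ("disk", offset)
        if len(self.memtable) * ENTRY_SIZE > self.threshold:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}
```

In this sketch the frozen Memtable that indexes in-memory records is never flushed, which mirrors step 3: only the index entries for data written to the heap file are turned into ordered tables.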
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Example two:
This embodiment describes a method for reading the data stored according to the first embodiment.
Fig. 4 shows a flowchart of a data reading method provided in an embodiment of the present application, which is detailed as follows:
Step S41, a reading instruction for data is received, where the reading instruction carries a type identifier of the table corresponding to the data to be read and an index value of the data to be read.
In particular, different types of tables have different type identifiers.
Step S42, whether the table corresponding to the data to be read is a temporary table is identified according to the type identifier of the table.
In this embodiment, the data reading method depends on how the data was stored. Because the embodiments of the present application store the data of temporary tables, it is necessary to identify whether the data to be read belongs to a temporary table, so that the corresponding reading method can be selected according to the identification result.
Step S43, if the table corresponding to the data to be read is a temporary table, reading the corresponding data from the log structure merge tree or reading the corresponding data from the memory according to the index value of the data to be read and the log structure merge tree, where the log structure merge tree records the index of the data stored in the log structure merge tree or the index of the data stored in the memory.
In this embodiment, since the log structure merge tree includes the index of the data, the index value of the data to be read can be compared with the index value of each data included in the log structure merge tree, so as to identify the corresponding data, and then the corresponding data is selected to be read from the log structure merge tree, or the corresponding data is selected to be read from the memory. For example, if the data is stored in the memory at that time, the data is read from the memory, otherwise, the data is directly read from the log-structured merge tree.
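Steps S41 to S43 can be summarized in a small Python sketch. All names are hypothetical: `TEMP_TABLE` is an assumed type identifier, the log-structured merge tree is modeled as a dictionary from index value to a tagged entry, and `read_other_table` stands for whatever read path applies to non-temporary tables.

```python
from dataclasses import dataclass

TEMP_TABLE = "temp"  # hypothetical type identifier for temporary tables


@dataclass
class ReadInstruction:
    table_type: str    # type identifier of the table (S41)
    index_value: int   # index value of the data to be read (S41)


def read(instruction, lsm_tree, memory, read_other_table):
    # S42: decide whether the target table is a temporary table.
    if instruction.table_type != TEMP_TABLE:
        return read_other_table(instruction)
    # S43: look the index value up in the log-structured merge tree.
    entry = lsm_tree.get(instruction.index_value)
    if entry is None:
        return None
    kind, payload = entry
    if kind == "memory":
        return memory[payload]   # payload is a position in the memory
    return payload               # data held in the tree itself
```

The tag on each entry is what lets a single index serve both data kept in the memory and data kept in the tree.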
In the embodiment of the application, when the data storage is performed based on the log structure merged tree, the operation of the pre-written log is not executed on the data to be stored, but the operation of the data storage is directly executed, so that the I/O resources occupied by the execution of the operation of the pre-written log can be reduced, and the writing performance of the computer equipment is improved. That is, the whole data storage and reading process can reduce the occupation of I/O resources, thereby improving the writing performance of the computer equipment.
In some embodiments, if the data to be stored is stored in the memory, and the correspondence between the index value of the data stored in the memory and the location information of that data is recorded in the memory table of the log-structured merge tree, then in step S43 reading the corresponding data from the memory according to the index value of the data to be read and the log-structured merge tree includes:
matching, according to the index value of the data to be read, the location information of the data stored in the memory from the memory table in the log-structured merge tree, and then reading the corresponding data from the memory according to the location information.
In some embodiments, if data is stored in both the memory and the hard disk, the correspondence between the index value of the data stored in the memory and the location information of that data is recorded in a memory table of the log-structured merge tree, and the correspondence between the index value of the data stored in the hard disk and the location information of that data is recorded in a newly created memory table, the data reading method provided in this embodiment of the present application further includes:
if the index in a memory table of the log-structured merge tree contains an index value that is the same as the index value of the data to be read, and the location information corresponding to that index value points to the memory, reading the corresponding data from the memory according to the location information; and if the location information corresponding to that index value points to the hard disk, reading the corresponding data from the hard disk according to the location information.
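A minimal Python sketch of this mixed read follows. The function names `lookup` and `read_record`, the fixed record size and the `bytes` object standing in for the heap file on the hard disk are assumptions made for illustration only.

```python
def lookup(key, memtables):
    """Search the memory tables, newest first, for the location of key."""
    for table in memtables:
        if key in table:
            return table[key]
    return None


def read_record(key, memtables, memory, heap_file, record_size):
    """Resolve key to a location, then read from the memory or the heap file."""
    location = lookup(key, memtables)
    if location is None:
        return None
    medium, position = location
    if medium == "memory":
        return memory[position]                         # position = sequence number
    return heap_file[position:position + record_size]   # position = byte offset


memory = [b"aaaa", b"bbbb"]
heap_file = b"ccccdddd"
old_table = {1: ("memory", 0), 2: ("memory", 1)}   # read-only memory table
new_table = {3: ("disk", 0), 4: ("disk", 4)}       # newly created memory table
tables = [new_table, old_table]
```

Here key 2 resolves to the second in-memory record and key 4 resolves to the four bytes at offset 4 of the heap file.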
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Example three:
Fig. 5 shows a block diagram of a data storage device according to an embodiment of the present application; for convenience of explanation, only the portions relevant to the embodiment of the present application are shown.
Referring to fig. 5, the data storage device 5 includes: a to-be-stored data acquisition module 51, a data storage module 52 and a corresponding relation recording module 53. Wherein:
The to-be-stored data acquisition module 51 is configured to acquire the data to be stored, where the table corresponding to the data to be stored is a temporary table.
In some embodiments, the data storage device 5 further includes:
a temporary table generating module, configured to generate an empty temporary table.
The data storage module 52 is configured to store the data to be stored in the log-structured merge tree, or store the data to be stored in the memory.
The corresponding relation recording module 53 is configured to record an index of the stored data in the log-structured merge tree.
In the embodiment of the application, if data needs to be stored in the temporary table, the data is stored in the log-structured merge tree, or in the memory and the hard disk, and an index of the stored data is then recorded in the log-structured merge tree. When data is stored based on the log-structured merge tree, no write-ahead log operation is performed on the data to be stored; the storage operation is executed directly, so the I/O resources that the write-ahead log would occupy are saved and the writing performance of the computer device is improved. In addition, since the table targeted by the embodiment of the application is a temporary table, and the data in a temporary table does not need to be persistent, skipping the write-ahead log has no adverse effect.
In some embodiments, if the data to be stored is stored in the memory, the data storage device 5 further includes:
a location information extraction module, configured to extract the location information of the data stored in the memory.
Correspondingly, the corresponding relation recording module 53 is specifically configured to:
record, in a memory table of the log-structured merge tree, the correspondence between the index value of the data stored in the memory and the location information of the data stored in the memory.
In some embodiments, the data storage device 5 further includes:
a first memory table threshold setting module, configured to set a first memory table threshold, where the first memory table threshold indicates the maximum amount of data that the memory table can currently store.
When storing the data to be stored in the memory, the data storage module 52 includes:
a sum value determining unit, configured to determine the sum of the size of the data stored in the memory and the size of the memory table to obtain a sum value; and
a memory storage unit, configured to store the data to be stored in the memory if the sum value is not larger than the preset temporary table size, where the preset temporary table size is smaller than the first memory table threshold.
In some embodiments, the data storage device 5 further includes:
a hard disk storage module, configured to, if the sum value is larger than the preset temporary table size, reduce the first memory table threshold to a second memory table threshold, store the unstored data in the hard disk, and extract the location information of the data stored in the hard disk, where the second memory table threshold is smaller than the preset temporary table size, and the unstored data refers to the data that needs to be stored but has not yet been stored; and
a memory table creating module, configured to create a memory table in the log-structured merge tree, and record, in the created memory table, the correspondence between the index value of the data stored in the hard disk and the location information of the data stored in the hard disk.
In some embodiments, the data storage device 5 further includes:
a read-only memory table conversion module, configured to, if the sum value is larger than the preset temporary table size, convert the memory table that records the correspondence between the index value of the data stored in the memory and the location information of the data stored in the memory into a read-only memory table, without converting it into an ordered character string table; and
an ordered character string table generating module, configured to, if the size of the newly created memory table is larger than the second memory table threshold, convert the newly created memory table into a read-only memory table, and generate an ordered character string table from the read-only memory table obtained by the conversion.
In some embodiments, the location information of the data stored in the memory and the location information of the data stored in the hard disk are marked with different identifiers.
In some embodiments, the data storage device 5 further includes:
an overlapped data processing module, configured to, if multiple ordered character string tables are generated, judge whether the data stored in the ordered character string tables overlaps and, if so, generate new ordered character string tables from the ordered character string tables that have overlapping data, such that no data overlaps among the new ordered character string tables.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
Example four:
Fig. 6 shows a block diagram of a data reading device according to an embodiment of the present application; for convenience of description, only the portions related to the embodiment of the present application are shown.
The data reading device 6 includes: a reading instruction receiving module 61, a table type judging module 62 and a data reading module 63. Wherein:
The reading instruction receiving module 61 is configured to receive a reading instruction for data, where the reading instruction carries a type identifier of the table corresponding to the data to be read and an index value of the data to be read.
The table type judging module 62 is configured to identify, according to the type identifier of the table, whether the table corresponding to the data to be read is a temporary table.
The data reading module 63 is configured to, if the table corresponding to the data to be read is a temporary table, read the corresponding data from the log-structured merge tree or from the memory according to the index value of the data to be read and the log-structured merge tree, where the log-structured merge tree records an index of the data stored in the log-structured merge tree or an index of the data stored in the memory.
In the embodiment of the application, when the data storage is performed based on the log structure merged tree, the operation of the pre-written log is not executed on the data to be stored, but the operation of the data storage is directly executed, so that the I/O resources occupied by the execution of the operation of the pre-written log can be reduced, and the writing performance of the computer equipment is improved. That is, the whole data storage and reading process can reduce the occupation of I/O resources, thereby improving the writing performance of the computer equipment.
In some embodiments, when reading the corresponding data from the memory according to the index value of the data to be read and the log-structured merge tree, the data reading module 63 is specifically configured to:
match, according to the index value of the data to be read, the location information of the data stored in the memory from the memory table in the log-structured merge tree, and then read the corresponding data from the memory according to the location information.
In some embodiments, the data reading device 6 further includes:
a hybrid reading module, configured to read the corresponding data from the memory according to the corresponding location information if the index in a memory table of the log-structured merge tree contains an index value that is the same as the index value of the data to be read and the location information corresponding to that index value points to the memory; and to read the corresponding data from the hard disk according to the corresponding location information if the location information corresponding to that index value points to the hard disk.
Example five:
Fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 7, the computer device 7 of this embodiment includes: at least one processor 70 (only one processor is shown in fig. 7), a memory 71, and a computer program 72 stored in the memory 71 and executable on the at least one processor 70, where the steps of any of the method embodiments described above are implemented when the computer program 72 is executed by the processor 70.
The computer device 7 may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The computer device may include, but is not limited to, the processor 70 and the memory 71. Those skilled in the art will appreciate that fig. 7 is merely an example of the computer device 7 and does not constitute a limitation of the computer device 7, which may include more or fewer components than those shown, combine some of the components, or include different components, such as input/output devices and network access devices.
The processor 70 may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 71 may in some embodiments be an internal storage unit of the computer device 7, such as a hard disk or a memory of the computer device 7. The memory 71 may also be an external storage device of the computer device 7 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 7. Further, the memory 71 may also include both an internal storage unit and an external storage device of the computer device 7. The memory 71 is used for storing an operating system, an application program, a BootLoader (BootLoader), data, and other programs, such as program codes of the computer program. The memory 71 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
An embodiment of the present application further provides a network device, where the network device includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor implementing the steps of any of the various method embodiments described above when executing the computer program.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application provide a computer program product which, when run on a mobile terminal, enables the mobile terminal to implement the steps in the above method embodiments.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which implements the steps of the method embodiments described above when executed by a processor. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include at least: any entity or device capable of carrying computer program code to a photographing apparatus/computer device, a recording medium, computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals or telecommunications signals.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the above-described apparatus/network device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method for storing data, comprising:
acquiring data to be stored, wherein a table corresponding to the data to be stored is a temporary table;
storing the data to be stored in a log structure merging tree, or storing the data to be stored in a memory;
recording an index of stored data in the log structured merge tree.
2. The data storage method according to claim 1, wherein if the data to be stored is stored in the memory, the data storage method further comprises:
extracting the position information of the data stored in the memory;
the recording an index of stored data in the log-structured merge tree includes:
recording a corresponding relation between the index value of the data stored in the memory and the position information of the data stored in the memory in a memory table of the log-structured merge tree.
3. The method of storing data according to claim 2, further comprising:
setting a first memory table threshold value of a log structure merging tree, wherein the first memory table threshold value is used for indicating the maximum value of the data quantity which can be currently stored in a memory table;
the storing the data to be stored in the memory includes:
determining the sum of the size of the data stored in the memory and the size of the memory table to obtain a sum value;
and if the sum is not larger than the size of a preset temporary table, storing the data to be stored in a memory, wherein the size of the preset temporary table is smaller than the threshold value of the first memory table.
4. The method of claim 3, further comprising:
if the sum is larger than the size of a preset temporary table, reducing the threshold value of the first memory table to a threshold value of a second memory table, storing data which are not stored in a hard disk, and extracting position information of the data stored in the hard disk, wherein the threshold value of the second memory table is smaller than the size of the preset temporary table, and the data which are not stored refer to the data which are not stored yet and need to be stored;
and creating a memory table in the log structure merging tree, and recording the corresponding relation between the index value of the data stored in the hard disk and the position information of the data stored in the hard disk in the created memory table.
5. The method of storing data according to claim 4, further comprising:
if the sum is larger than the size of a preset temporary table, converting a memory table recording the corresponding relation between the index value of the data stored in the memory and the position information of the data stored in the memory into a read-only memory table, and not converting the memory table into an ordered character string table;
and if the size of the newly-built memory table is larger than the second memory table threshold value, converting the newly-built memory table into a read-only memory table, and generating an ordered character string table according to the read-only memory table obtained through conversion.
6. The data storage method according to claim 4, wherein the location information of the data stored in the memory and the location information of the data stored in the hard disk are marked with different identifications.
7. A method for reading data, comprising:
receiving a reading instruction of data, wherein the reading instruction carries a type identifier of a table corresponding to the data to be read and an index value of the data to be read;
identifying whether the table corresponding to the data to be read is a temporary table or not according to the type identifier of the table;
and if the table corresponding to the data to be read is a temporary table, reading the corresponding data from the log structure merged tree or reading the corresponding data from the memory according to the index value of the data to be read and the log structure merged tree, wherein the log structure merged tree records the index of the data stored in the log structure merged tree or the index of the data stored in the memory.
8. An apparatus for storing data, comprising:
the data acquisition module is used for acquiring data to be stored, and a table corresponding to the data to be stored is a temporary table;
the data storage module is used for storing the data to be stored in the log structure merged tree or storing the data to be stored in the memory;
and the corresponding relation recording module is used for recording the index of the stored data in the log structure merging tree.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202110680078.2A 2021-06-18 2021-06-18 Data storage method, data reading method and computer equipment Pending CN113535714A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110680078.2A CN113535714A (en) 2021-06-18 2021-06-18 Data storage method, data reading method and computer equipment

Publications (1)

Publication Number Publication Date
CN113535714A (en) 2021-10-22

Family

ID=78125133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110680078.2A Pending CN113535714A (en) 2021-06-18 2021-06-18 Data storage method, data reading method and computer equipment

Country Status (1)

Country Link
CN (1) CN113535714A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160179865A1 (en) * 2014-12-17 2016-06-23 Yahoo! Inc. Method and system for concurrency control in log-structured merge data stores
CN106886375A (en) * 2017-03-27 2017-06-23 百度在线网络技术(北京)有限公司 Method and apparatus for data storage
CN108052643A (en) * 2017-12-22 2018-05-18 北京奇虎科技有限公司 Data storage method, device and storage engine based on LSM Tree structure
CN108319602A (en) * 2017-01-17 2018-07-24 广州市动景计算机科技有限公司 Database management method and database system
CN110515957A (en) * 2019-09-02 2019-11-29 深圳市网心科技有限公司 Method, system, device and readable storage medium for blockchain data storage
CN110532228A (en) * 2019-09-02 2019-12-03 深圳市网心科技有限公司 Method, system, device and readable storage medium for blockchain data reading
CN111966652A (en) * 2019-05-20 2020-11-20 阿里巴巴集团控股有限公司 Method, device, equipment, system and storage medium for sharing storage synchronous data

Similar Documents

Publication Publication Date Title
CN110019218B (en) Data storage and query method and equipment
CN109471851B (en) Data processing method, device, server and storage medium
CN109213432B (en) Storage device for writing data using log structured merge tree and method thereof
CN107665219B (en) Log management method and device
CN112597153A (en) Data storage method and device based on block chain and storage medium
CN115878027A (en) Storage object processing method and device, terminal and storage medium
CN112379835B (en) OOB area data extraction method, terminal device and storage medium
CN116048396B (en) Data storage device and storage control method based on log structured merging tree
CN113535714A (en) Data storage method, data reading method and computer equipment
CN115576947A (en) Data management method and device, combined library, electronic equipment and storage medium
CN113625967B (en) Data storage method, data query method and server
WO2022001626A1 (en) Time series data injection method, time series data query method and database system
WO2020238750A1 (en) Data processing method and apparatus, electronic device, and computer storage medium
CN114356912A (en) Method for writing data into database and computer equipment
CN112380174B (en) XFS file system analysis method containing deleted files, terminal device and storage medium
CN112015672A (en) Data processing method, device, equipment and storage medium in storage system
CN115883508B (en) Number processing method and device, electronic equipment and storage medium
CN112527745B (en) Embedded file system multi-partition analysis method, terminal device and storage medium
US20240220470A1 (en) Data storage device and storage control method based on log-structured merge tree
CN114185890B (en) Database retrieval method and device, storage medium and electronic equipment
CN112860712B (en) Block chain-based transaction database construction method, system and electronic equipment
CN112948376B (en) IP geographical position information query method, terminal equipment and storage medium
CN105956099A (en) HBase high table-based primary key design method
CN109542900B (en) Data processing method and device
CN113918592A (en) Data storage method and data query method of data table and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination