CN113296683B - Data storage method, device, server and storage medium - Google Patents


Info

Publication number
CN113296683B
Authority
CN
China
Prior art keywords
data
information
database
identification information
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010266500.5A
Other languages
Chinese (zh)
Other versions
CN113296683A (en)
Inventor
阮羽彬
吴迪
陈世平
梁宇坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202010266500.5A
Publication of CN113296683A
Application granted
Publication of CN113296683B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a data storage method, a data storage apparatus, a server, and a storage medium. The method comprises: writing first data into a memory; if the first data written into the memory reaches a preset data volume, generating first record information, wherein the first record information comprises first identification information corresponding to the first data of the preset data volume in the memory and second identification information corresponding to second data already stored in a database; copying the first data of the preset data volume to the database and generating second record information, wherein the second record information comprises the second identification information and third identification information, the third identification information being the identification information corresponding to the first data of the preset data volume in the database; and processing data read-write transactions according to the first record information and the second record information. With this scheme, the data storage process does not affect the normal operation of data read-write transactions.

Description

Data storage method, device, server and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data storage method, an apparatus, a server, and a storage medium.
Background
Data may be written to memory before it is written to disk. When the data in the memory accumulates to a certain amount, the data in the memory is triggered to be transferred to the disk, and this operation may be referred to as data transfer (Delta Merge).
During Delta Merge, the transferred data is temporarily in a storage space that is neither in memory nor on disk. At this time, if a data read-write transaction is received, a situation that the transferred data cannot be queried occurs, so that the Delta Merge process affects normal operation of the data read-write transaction.
Disclosure of Invention
Embodiments of the present invention provide a data storage method, an apparatus, a server, and a storage medium, so as to ensure that the process of storing data in a database does not affect the normal operation of data read-write transactions.
In a first aspect, an embodiment of the present invention provides a data storage method, where the method includes:
writing first data into a memory, wherein the first data is data to be stored in a database;
if the written first data in the memory reaches a preset data volume, generating first record information, wherein the first record information comprises first identification information and second identification information, the first identification information is identification information corresponding to the first data of the preset data volume in the memory, and the second identification information is identification information corresponding to second data stored in the database;
copying the first data of the preset data volume to the database;
generating second record information, wherein the second record information comprises the second identification information and third identification information corresponding to the first data of the preset data volume, and the third identification information is identification information corresponding to the first data of the preset data volume in the database;
and processing data read-write transactions according to the first record information and the second record information.
In a second aspect, an embodiment of the present invention provides a data storage apparatus, including:
a writing module, configured to write first data into a memory, where the first data is data to be stored in a database;
a generating module, configured to generate first record information when the first data written in the memory reaches a preset data volume, where the first record information includes first identification information and second identification information, the first identification information is identification information corresponding to the first data of the preset data volume in the memory, and the second identification information is identification information corresponding to second data stored in the database;
a copying module, configured to copy the first data of the preset data volume to the database;
the generating module being further configured to generate second record information, where the second record information includes the second identification information and third identification information corresponding to the first data of the preset data volume, and the third identification information is identification information corresponding to the first data of the preset data volume in the database;
and a processing module, configured to process data read-write transactions according to the first record information and the second record information.
In a third aspect, an embodiment of the present invention provides a server, including: a memory, a processor; wherein the memory has stored thereon executable code, which when executed by the processor, causes the processor to perform the data storage method of the first aspect of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention provides a non-transitory machine-readable storage medium, on which executable code is stored, and when the executable code is executed by a processor of a server, the processor is caused to execute the data storage method according to the first aspect of the embodiment of the present invention.
With the method provided by the embodiments of the present invention, when the data in the memory reaches the preset data volume, that volume of data is copied, rather than moved, into the database. One copy of the data therefore remains in the memory while only the other copy is transferred, so the data being transferred can still be queried in the memory, which avoids the problem that transferred data cannot be queried during the transfer. In addition, the first record information and the second record information record the identification information (reflecting the storage location) of the data held in the memory and in the database during the transfer and after the transfer completes, respectively. When a data read-write transaction is triggered, it can therefore be executed using the record information that corresponds to its trigger time, so that the transaction can be committed.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a data storage method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of generating record information according to an embodiment of the present invention;
FIG. 3 is a flowchart of a data replication method according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a storage format conversion according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a data storage method according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a data storage device according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and "a plurality" typically includes at least two.
The word "if", as used herein, may be interpreted as "when" or "at the time of" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
One concept involved in this document is explained first:
A data read-write transaction, which may also be referred to simply as a transaction, is the smallest logical unit of work that accesses a database in order to implement a particular service function, and it consists of a sequence of operations. The database transitions from one state to another only if every operation in the sequence completes successfully. If any operation in the sequence fails, the operations that have already completed must be rolled back. That is, the operations within one transaction are either all executed correctly or not executed at all.
The data storage method provided by the embodiment of the present invention may be executed by a server that serves as the hardware carrier of a database. Specifically, an application program may be deployed on the server, and a process may be started to execute the data storage method.
The data storage method provided by the embodiment of the invention can be applied to a scene of storing data in the database. When data needs to be stored in the database, the data is not directly stored in the database, but is firstly written into the memory, and when the data written in the memory reaches a certain amount, a data transfer operation for transferring the certain amount of data from the memory to the database is triggered. In the related art, if a data transfer operation is triggered and a data read-write transaction is received in the data transfer process, the transferred data cannot be read, so that data loss is caused, and the data read-write transaction cannot be executed successfully. The data storage method provided by the embodiment of the invention can avoid the problem, so that the data storage process does not interfere with the execution process of the data read-write transaction.
The implementation of the data storage method provided herein is described below in conjunction with some of the following embodiments.
Fig. 1 is a flowchart of a data storage method according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
101. Write first data into the memory, wherein the first data is data to be stored in the database.
102. If the first data written in the memory reaches the preset data volume, generate first record information, wherein the first record information comprises first identification information and second identification information, the first identification information is identification information corresponding to the first data of the preset data volume in the memory, and the second identification information is identification information corresponding to the second data stored in the database.
103. Copy the first data of the preset data volume to the database.
104. Generate second record information, wherein the second record information comprises the second identification information and third identification information corresponding to the first data of the preset data volume, and the third identification information is identification information corresponding to the first data of the preset data volume in the database.
105. Process data read-write transactions according to the first record information and the second record information.
The core idea of the data storage scheme provided by this embodiment is summarized first: in the process of storing the first data written in the memory into the database, when the first data in the memory reaches the preset data volume, the first data of the preset data volume in the memory is copied. For convenience of description, Memtable hereinafter denotes the first data of the preset data volume in the memory, and Memtable' denotes the copy obtained by copying that data. After Memtable' is obtained, it can be transferred from the memory to the database. Meanwhile, the first record information and the second record information respectively record the identification information (reflecting the storage location) of the data stored in the memory and the database during the data transfer and after the data transfer is finished, so that when a data read-write transaction is triggered, data can be read and written using the record information that corresponds to the trigger time of the transaction.
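Steps 101 to 104 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `Store`, the use of a row count as the preset data volume, and the segment labels `M…` and `R…` are all hypothetical, and step 105 (transaction processing) is left out.

```python
import itertools
import time
from dataclasses import dataclass, field


@dataclass
class RecordInfo:
    """Identification info a transaction should consult, plus its generation time."""
    segment_ids: tuple
    created_at: float = field(default_factory=time.monotonic)


class Store:
    """Hypothetical store; `threshold` rows stand in for the preset data volume."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.active = []      # rows still being written (step 101)
        self.memtables = {}   # id -> full Memtable kept in memory
        self.database = {}    # id -> rows copied into the database
        self.records = []     # record information, oldest first
        self._mem_ids = itertools.count()

    def write(self, row):
        self.active.append(row)
        if len(self.active) >= self.threshold:
            self._flush()

    def _flush(self):
        mem_id = f"M{next(self._mem_ids)}"
        self.memtables[mem_id] = self.active
        self.active = []
        db_ids = tuple(self.database)
        # Step 102: first record = Memtable id + ids already in the database.
        self.records.append(RecordInfo((mem_id, *db_ids)))
        # Step 103: copy; the Memtable itself stays in memory.
        db_id = f"R{len(self.database)}"
        self.database[db_id] = list(self.memtables[mem_id])
        # Step 104: second record = previous database ids + the new database id.
        self.records.append(RecordInfo((*db_ids, db_id)))
```

In this sketch the copy is synchronous, so the two pieces of record information are generated back to back; in a real system the copy would take time, which is exactly the window the first record information is meant to cover.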
In practical applications, the preset data amount may be set according to actual requirements, and may be, for example, 500 MB. The preset data volume is reasonably set, so that the data storage efficiency can be improved, and long-time interference on data reading and writing transactions can be avoided.
The identification information in this embodiment may be, but is not limited to, a storage location or a location index, and the corresponding data may be located through the identification information. Taking the above Memtable as an example, the corresponding first identification information may be the storage location of Memtable in the memory.
In embodiments of the invention, only one Memtable may be copied into the database at a time. If more first data continues to be written into the memory, and the later-written first data has not yet reached the preset data volume, only the Memtable written first is copied to the database at this time. When the next Memtable copying opportunity arrives, the method provided by the embodiment of the invention is triggered again.
When Memtable is copied to the database, Memtable' is obtained by copying Memtable. Memtable remains in the memory, and Memtable' is transferred. While Memtable' is in transit, it can be read neither in the memory nor in the database. However, because Memtable is retained in the memory, a data read-write transaction can be executed by reading Memtable in the memory, so the normal execution of the transaction is not affected.
In order to save the storage space of the database, optionally, the above process of copying Memtable to the database may also be implemented as: compressing Memtable; the compressed Memtable is copied to the database.
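A minimal sketch of the optional compression step. The patent does not name a serialization format or a compression algorithm; JSON and zlib from the Python standard library are used here purely as assumptions, and the function names are hypothetical.

```python
import json
import zlib


def compress_memtable(rows):
    """Serialize a Memtable (a list of rows) and compress it before the copy."""
    return zlib.compress(json.dumps(rows).encode("utf-8"))


def decompress_memtable(blob):
    """Inverse of compress_memtable, for reading the database copy back."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Because only the copy is compressed, the Memtable in memory stays in its original form and can still serve transactions while the copy is in progress.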
Through the data copy operation, the same data, namely the copied Memtable, exists in the memory and the database. In order to avoid a transaction execution error caused by repeatedly reading the same data when performing a data read-write transaction, the data may be distinguished by the first recording information and the second recording information. The first record information or the second record information includes identification information of operation data required when performing a data read-write transaction.
The role of the first record information and the second record information will be described below in conjunction with a specific process of performing a data read/write transaction.
Processing a data read-write transaction according to the first record information and the second record information can be implemented as follows: the data read-write transaction is received, and the reference record information matching it is determined according to its receiving time and the generation times of the first record information and the second record information. The transaction is then executed based on the reference record information; that is, the data that must be accessed to execute the transaction is determined from the identification information contained in the reference record information, and the transaction is executed on that data. The reference record information is either the first record information or the second record information.
Before performing a data read/write transaction based on the reference record information, it is necessary to select, from the first record information and the second record information, reference record information that matches the data read/write transaction for the data read/write transaction. The process of selecting the reference record information may be implemented as: if the receiving time of the data read-write transaction is before the generation time of the second recording information, determining the reference recording information as the first recording information; and if the receiving time of the data read-write transaction is after the generation time of the second recording information, determining the reference recording information as the second recording information.
In practical application, when a data read-write transaction is received at a certain time, its receiving time can be compared with the generation time of the second record information. If the receiving time is before the generation time of the second record information, the first record information is selected as the reference record information; if the receiving time is after that generation time, the second record information is selected. Because this comparison requires the generation time of the second record information, the generation time can be stored together with the second record information when it is generated and looked up later.
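The selection rule can be sketched as a small function. The names `RecordInfo` and `select_reference` are hypothetical. The text only covers receiving times strictly before or after the generation time; treating an exactly equal time as "after" is an assumption made here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordInfo:
    ids: tuple          # identification information of the data to consult
    created_at: float   # generation time, stored when the record is generated


def select_reference(first: RecordInfo,
                     second: Optional[RecordInfo],
                     received_at: float) -> RecordInfo:
    """Pick the record information for a transaction received at `received_at`."""
    # Before the second record exists, or before its generation time: use the first.
    if second is None or received_at < second.created_at:
        return first
    # Assumption: a receiving time equal to the generation time counts as "after".
    return second
```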
After the reference record information matched with the data read-write transaction is determined, the storage positions of the data needing to be operated when the data read-write transaction is executed can be determined, then the data can be inquired on the storage positions, the inquired data is processed, and the like, so that the data read-write transaction is completed.
For ease of understanding, the process of performing a data read and write transaction is illustrated below in conjunction with the specific example and FIG. 2.
In FIG. 2, at time 1, there is one Memtable in the memory, labeled M0, and there are three pieces of data in the database that were previously transferred from the memory, labeled R0, R1, and R2. Before M0 is transferred to the database, record information A may be generated: M0, R0, R1, and R2. Next, M0 is copied; assume the copy is labeled R3. R3 is then transferred to the database, so that R3 is added to the database. At time 2, when the copy operation is completed, record information B is generated: R0, R1, R2, and R3. When a data read-write transaction X is received, its receiving time is compared with time 2. If the receiving time is before time 2, transaction X is executed using record information A, that is, the data to be operated on is queried in M0, R0, R1, and R2. If the receiving time is after time 2, transaction X is executed using record information B, that is, the data to be operated on is queried in R0, R1, R2, and R3. Whichever record information is used, no data is queried twice and no data is missing, because every queried location holds fully stored data.
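The FIG. 2 example can be replayed in a few lines to show why neither record information yields duplicate or missing data. The row contents are made up for illustration; only the labels M0 and R0 to R3 come from the text.

```python
# State at time 1: one Memtable in memory, three segments already in the database.
memory = {"M0": ["row-30", "row-31"]}
database = {
    "R0": ["row-00", "row-01"],
    "R1": ["row-10", "row-11"],
    "R2": ["row-20", "row-21"],
}

record_a = ("M0", "R0", "R1", "R2")   # generated before the copy starts

database["R3"] = list(memory["M0"])   # copy M0; M0 itself stays in memory
record_b = ("R0", "R1", "R2", "R3")   # generated at time 2, after the copy


def visible_rows(record):
    """Collect the rows reachable through one piece of record information."""
    rows = []
    for segment_id in record:
        source = memory if segment_id in memory else database
        rows.extend(source[segment_id])
    return rows


# Reading everything in both stores, ignoring record information, sees M0 twice.
naive_rows = [row for store in (memory, database) for seg in store.values() for row in seg]
```

Both `visible_rows(record_a)` and `visible_rows(record_b)` cover the same eight rows exactly once, whereas `naive_rows` contains the two rows of M0 a second time through R3.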
In order to reduce the occupation of the storage space, optionally, if the data read/write transactions executed based on the first record information (i.e. the data read/write transactions triggered before the second record information is generated) are already committed, the first record information and the Memtable in the memory corresponding to the first record information may be deleted.
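One way to decide when the first record information and its Memtable can be deleted is to count the transactions still using them. The sketch below is an assumption about how such bookkeeping might look; the class `RecordTracker` and its method names are hypothetical and not taken from the patent.

```python
class RecordTracker:
    """Hypothetical bookkeeping for one first-record / Memtable pair."""

    def __init__(self, memtables, mem_id):
        self.memtables = memtables   # shared dict: Memtable id -> rows in memory
        self.mem_id = mem_id
        self.in_flight = 0           # transactions still using the first record
        self.superseded = False      # True once the second record exists

    def begin(self):
        """A transaction starts executing against the first record information."""
        self.in_flight += 1

    def commit(self):
        """A transaction that used the first record information has committed."""
        self.in_flight -= 1
        self._maybe_release()

    def mark_superseded(self):
        """The second record information has been generated."""
        self.superseded = True
        self._maybe_release()

    def _maybe_release(self):
        # Delete only when no new transaction can pick the first record
        # and every transaction that did pick it has committed.
        if self.superseded and self.in_flight == 0:
            self.memtables.pop(self.mem_id, None)
```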
In some cases, multiple pieces of record information may exist simultaneously. For example, if Memtables are copied to the database more slowly than new Memtables are generated, multiple Memtables accumulate in the memory and queue to be copied to the database in sequence, which causes multiple pieces of record information to exist at the same time. The effective period of any piece of record information can then be taken as the period from its generation to the generation of the next piece of record information: data read-write transactions received during that period use it.
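When several pieces of record information coexist, finding the one whose effective period contains a transaction's receiving time is a sorted lookup. A minimal sketch using the standard `bisect` module follows; the class `RecordTimeline` is hypothetical, and it assumes record information is added in increasing order of generation time.

```python
import bisect


class RecordTimeline:
    """Hypothetical index of record information ordered by generation time."""

    def __init__(self):
        self._times = []
        self._records = []

    def add(self, created_at, record):
        # Assumption: called in increasing order of generation time.
        self._times.append(created_at)
        self._records.append(record)

    def lookup(self, received_at):
        """Return the latest record generated at or before `received_at`."""
        index = bisect.bisect_right(self._times, received_at) - 1
        if index < 0:
            raise LookupError("no record information existed at that time")
        return self._records[index]
```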
By the method provided by the embodiment of the invention, during the process of storing the data in the memory into the database, Memtable is kept in the memory, and only Memtable' is transferred, so that Memtable can be inquired in the memory, and the problem of data loss during the data transfer process is avoided. In addition, in the method provided by the embodiment of the present invention, when a certain data read-write transaction is received, data read-write can be performed by using the corresponding record information based on the receiving time of the data read-write transaction, so as to complete transaction commit.
One scheme for copying Memtable to a database is illustrated below in conjunction with the embodiment shown in figure 3. As shown in fig. 3, the replication scheme may include the steps of:
301. Perform storage format conversion on the first data of the preset data volume.
302. Copy the data that has undergone storage format conversion into the database.
In practical application, if the storage format of the data in the memory is not consistent with the storage format of the data in the database, Memtable' may first undergo storage format conversion so that it conforms to the storage format of the database, and the converted Memtable' is then copied into the database.
In practice, the database may be a row-oriented database, a columnar database, or the like. The columnar database may be, for example, a Histore database. If the database is a columnar database whose storage format differs from that of the data in the memory, Memtable' can undergo storage format conversion, and the converted Memtable' is then stored in the columnar database.
The storage format of the data in the columnar database and the storage format of the data in the memory, and how to perform the storage format conversion process will be described below.
In practical applications, data is stored in the memory by row and in the columnar database by column. Assume that Memtable' includes multiple rows of data and that each row includes attribute values for multiple attributes. The rows are stored in the memory row by row, that is, the attribute values of one row are arranged contiguously as a whole.
For ease of understanding, take entering students' mathematics examination results into the database as an example. Assume a class has 10 students and a student score table is created for their mathematics examination; the table contains 10 rows of data, one row per student. For each student, the name, student number, class, and math score are recorded in the table. Accordingly, the name, student number, class, and math score are the attributes described in this embodiment, and the specific contents corresponding to them are the attribute values. For example, the attribute value for the name may be student A, the attribute value for the student number may be 20200114, the attribute value for the class may be grade 2, class 3, and the math score may be 90, so the row of data is (student A, 20200114, grade 2 class 3, 90).
For the multi-line data included in Memtable', when the storage format is converted, a plurality of attribute values corresponding to the same attribute in the multi-line data are combined to obtain a plurality of data blocks, and the plurality of data blocks are stored in the columnar database.
To more intuitively understand the process of storage format conversion, an example is illustrated in connection with FIG. 4.
In the memory, data is stored in the storage format [name, student number, class, math score], that is, one piece of data includes the attribute values of multiple attributes. The memory shown in FIG. 4 contains two rows of data. The first row is [student A, 20200114, grade 2 class 3, 90], and the second row is [student B, 20200115, grade 2 class 3, 95]. When the storage format is converted, student A and student B, which correspond to the name, are combined to obtain data block a; 20200114 and 20200115, which correspond to the student number, are combined to obtain data block b; the two "grade 2 class 3" values, which correspond to the class, are combined to obtain data block c; and 90 and 95, which correspond to the math score, are combined to obtain data block d.
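The row-to-column conversion of FIG. 4 amounts to a transpose. A minimal sketch with the two example rows follows; the attribute keys and the function name `to_data_blocks` are hypothetical, and real data blocks would also carry encoding and metadata that are omitted here.

```python
ATTRIBUTES = ("name", "student_number", "class", "math_score")

rows = [
    ("student A", 20200114, "grade 2 class 3", 90),
    ("student B", 20200115, "grade 2 class 3", 95),
]


def to_data_blocks(rows):
    """Group the values of each attribute across all rows into one data block."""
    return {name: list(column) for name, column in zip(ATTRIBUTES, zip(*rows))}


blocks = to_data_blocks(rows)
block_a = blocks["name"]             # data block a
block_b = blocks["student_number"]   # data block b
block_c = blocks["class"]            # data block c
block_d = blocks["math_score"]       # data block d
```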
After obtaining the plurality of data blocks, the storage locations of the plurality of data blocks in the database corresponding to each other may be determined, and the plurality of data blocks may be stored in the respective storage locations corresponding to each other.
In practical applications, each storage location may store one data block, and the storage locations corresponding to multiple data blocks are located in the same row. A plurality of data blocks located in the same row may be referred to as a row group (Rowgroup).
Although a row group includes multiple data blocks, each corresponding to a different attribute, the data blocks within one row group together correspond to N complete rows of data. Meanwhile, arranging the data blocks of a row group in the same row ensures that, across row groups, the data blocks in the same column correspond to the same attribute.
The process of copying a Memtable to the database is illustrated below in conjunction with FIG. 5. Assume there is currently a data table to which rows 0-1017 have been written. Rows 0-999 have already been stored in the database, where they correspond to the data blocks shown as row group 0, row group 1, row group 2, and row group 3, each row group containing four data blocks. Rows 1000-1017 are still held in memory. Assume that every 6 rows of data in memory constitute one Memtable, so that there are 3 Memtables in memory in total, the first of which holds the data starting at row 1000. These 3 Memtables need to be copied into the database in sequence: first, second, then third. Taking the copying of the first Memtable as an example, storage format conversion is first performed on it; assume this yields 4 data blocks: data block a, data block b, data block c, and data block d. These four data blocks may then be stored in the database as row group 4, while ensuring that data blocks corresponding to the same attribute are arranged in the same column.
Based on this, in the columnar database the data blocks arranged in the same column have the same attribute; for example, the data blocks in the 4th column all hold students' math scores. Therefore, in certain data read-write scenarios, the data to be operated on when a data read-write transaction is executed can be located conveniently.
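The bookkeeping in the FIG. 5 example can be sketched as follows. The sketch assumes that each Memtable holds 6 consecutive rows and that Memtables are numbered as row groups in copy order; the text only states that the first Memtable becomes row group 4, so the numbering of the later row groups is an assumption, and the function names are hypothetical.

```python
ROWS_PER_MEMTABLE = 6  # value taken from the example; a real threshold would be configurable


def split_into_memtables(first_row: int, last_row: int,
                         size: int = ROWS_PER_MEMTABLE) -> list[tuple[int, int]]:
    """Return the inclusive (start, end) row range held by each Memtable."""
    return [
        (start, min(start + size - 1, last_row))
        for start in range(first_row, last_row + 1, size)
    ]


def assign_row_groups(existing_row_groups: int,
                      memtable_ranges: list[tuple[int, int]]) -> dict[int, tuple[int, int]]:
    """Copy Memtables in order, giving each the next free row group number."""
    return {
        existing_row_groups + index: row_range
        for index, row_range in enumerate(memtable_ranges)
    }


memtable_ranges = split_into_memtables(1000, 1017)
assignment = assign_row_groups(4, memtable_ranges)  # row groups 0-3 already exist
print(assignment)
```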
To further improve query efficiency, data statistics may be computed for each data block to obtain the data statistical information corresponding to that block. When querying the data to be operated on for a data read-write transaction, the data blocks that cannot be relevant are filtered out based on the data statistical information and are not scanned; only the remaining data blocks are scanned. Optionally, after the plurality of data blocks are generated, the data statistical information corresponding to each data block may be acquired and stored in the storage location corresponding to that data block.
The data statistical information may include statistics of the data block such as the maximum value, minimum value, average value, and mean square error. For any data block i among the plurality of data blocks, once data block i is obtained, its maximum value, minimum value, average value, mean square error, and the like can be calculated, and the resulting data statistical information is stored in the storage location corresponding to data block i.
Based on this, the process of querying data can be implemented as follows: in response to a data read-write transaction corresponding to the plurality of data blocks, the data blocks that do not satisfy the data read-write transaction are filtered out according to the data statistical information.
During a query, the data statistical information of each data block can be read first, without reading the data in the block. If the data statistical information of a data block shows that the block cannot satisfy the current data read-write transaction, the block is filtered out, that is, the data in that block is not read.
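A minimal sketch of computing per-block statistics is given below. It assumes numeric data blocks and interprets "mean square error" as the population variance, which is an assumption; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean, pvariance


@dataclass(frozen=True)
class BlockStats:
    maximum: float
    minimum: float
    mean: float
    variance: float  # assumed interpretation of "mean square error"


def compute_stats(block: list[float]) -> BlockStats:
    """Compute the statistics stored alongside a numeric data block."""
    return BlockStats(
        maximum=max(block),
        minimum=min(block),
        mean=fmean(block),
        variance=pvariance(block),
    )


stats = compute_stats([90, 95])  # data block d from the FIG. 4 example
print(stats)
```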
Take querying students' math scores as an example. Assume the database currently holds 4 columns of data blocks, each column containing 5 data blocks, where the attribute corresponding to one of the columns is the math score, and the current data read-write transaction is to count the students whose math score is 100. The column of data blocks whose attribute is the math score is located first; assume it is the 4th column. Next, the data statistical information corresponding to each data block in the 4th column is obtained to determine the maximum value in each block, for example 95, 88, 100, 99, and 100. From these values it follows that the maxima of the first, second, and fourth data blocks fall short of the full score, so these three data blocks cannot contain a full-score result and can be filtered out directly, leaving the third and fifth data blocks. Finally, the math scores in the third and fifth data blocks are read and the number of full scores is counted.
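The filtering in this example can be sketched as follows. Only the per-block maxima (95, 88, 100, 99, 100) come from the text; the individual scores inside each block are invented for illustration, and the names `ScoreBlock` and `count_full_scores` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreBlock:
    scores: list[int]
    maximum: int  # statistic stored with the block


def count_full_scores(blocks: list[ScoreBlock], full_score: int = 100) -> tuple[int, int]:
    """Return (number of full scores, number of blocks actually scanned)."""
    found = 0
    scanned = 0
    for block in blocks:
        if block.maximum < full_score:
            continue  # filtered by statistics: the block's data is never read
        scanned += 1
        found += sum(1 for score in block.scores if score == full_score)
    return found, scanned


# Hypothetical block contents whose maxima match the example.
raw_blocks = [[90, 95], [70, 88], [100, 85], [99, 60], [100, 100]]
blocks = [ScoreBlock(scores, max(scores)) for scores in raw_blocks]
found, scanned = count_full_scores(blocks)
print(found, scanned)
```

Only the third and fifth blocks are scanned; the other three are skipped on the basis of their stored maximum alone.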
As the above example shows, when the data to be operated on by a data read-write transaction corresponds to a single attribute, the transaction can be executed by querying only the data blocks in one column, and query efficiency is high. However, if the data to be operated on corresponds to at least two attributes, storing the data by row may be advantageous. Taking student examination scores again as an example, assume the current data read-write transaction is to compile a list of students who failed the examination. The data to be operated on then includes not only the examination score but also the name of the corresponding student, that is, it corresponds to two attributes. If the data is stored by row, the name and examination score of each student can be read together, and if the score is a failing one, the name of the corresponding student can be output directly.
The above describes row-wise storage and the columnar database; in practical applications there is also the row-column hybrid database, in which data may be stored either by row or by column. If the data is to be stored by row, its storage format need not be converted when it is copied from memory to the row-column hybrid database. If the data is to be stored by column, its storage format needs to be converted during the copy.
Which storage format the data takes in the row-column hybrid database, that is, whether storage format conversion is required before the data is stored, may be determined according to the query feature information of the service corresponding to the first data. Optionally, before a Memtable is transferred to the database, the query feature information of the service corresponding to the Memtable may be determined, and whether to convert the storage format of the Memtable is then decided according to the query feature information.
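The row-wise case can be sketched as follows: because each row carries both the name and the score, one pass over the rows answers the query. The pass mark of 60 and the rows for student C and student D are hypothetical values added for illustration.

```python
PASS_MARK = 60  # hypothetical threshold; the text does not specify one

# Rows in the [name, student number, class, math score] format.
rows = [
    ("student A", 20200114, "grade 2 class 3", 90),
    ("student C", 20200116, "grade 2 class 3", 55),  # hypothetical row
    ("student D", 20200117, "grade 2 class 3", 42),  # hypothetical row
]


def failing_students(rows: list[tuple], pass_mark: int = PASS_MARK) -> list[str]:
    """Read name and score from the same row and keep the names below the pass mark."""
    return [name for name, _number, _class, score in rows if score < pass_mark]


failed = failing_students(rows)
print(failed)
```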
In practical applications, the data in a Memtable often comes from the same service, and different services often have different query feature information. For the data reading transactions of a given service, the characteristics of those transactions can be collected as they are executed, for example whether the data operated on by most of the data reading transactions corresponds to a single attribute or to several different attributes. Consistent with the examples above, when the data operated on by most of the data reading transactions corresponds to a single attribute, the data of the service can be considered suitable for storage by column; when it corresponds to several different attributes, the data of the service can be considered suitable for storage by row.
It will be appreciated that a correspondence between services and query feature information may be established in advance. Before a Memtable is transferred to the database, the service to which the Memtable belongs is determined, and the query feature information corresponding to that service is then looked up in the correspondence. Once the query feature information is determined, it is used to decide whether to perform storage format conversion on the Memtable. If conversion is required, it is performed before the transfer; if not, the Memtable is transferred directly to the database.
With the method provided by this embodiment of the invention, data stored in memory in row format can be converted as required into data blocks stored in column format, and the converted data blocks are stored in the columnar database, which can improve the efficiency of data query.
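One way to sketch this decision is shown below. It follows the earlier math score discussion, in which queries touching a single attribute benefit from column storage and queries touching several attributes benefit from row storage. The service names, the counters, and the simple majority rule are all assumptions made for illustration; the text does not prescribe how the query feature information is represented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryFeatures:
    single_attribute_reads: int  # reads whose data corresponds to one attribute
    multi_attribute_reads: int   # reads whose data corresponds to several attributes


# Hypothetical correspondence between services and query feature information.
SERVICE_FEATURES = {
    "exam_statistics": QueryFeatures(single_attribute_reads=90, multi_attribute_reads=10),
    "report_cards": QueryFeatures(single_attribute_reads=20, multi_attribute_reads=80),
}


def needs_column_conversion(service: str) -> bool:
    """Decide whether a Memtable of this service should be converted to column format."""
    features = SERVICE_FEATURES[service]
    return features.single_attribute_reads > features.multi_attribute_reads


convert_statistics = needs_column_conversion("exam_statistics")
convert_reports = needs_column_conversion("report_cards")
print(convert_statistics, convert_reports)
```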
The data storage device of one or more embodiments of the present invention will be described in detail below. Those skilled in the art will appreciate that these data storage devices may each be constructed using commercially available hardware components configured through the steps taught in this disclosure.
FIG. 6 is a schematic structural diagram of a data storage device according to an embodiment of the present invention. As shown in FIG. 6, the data storage device includes a writing module 601, a generating module 602, a copying module 603, and a processing module 604.
The writing module 601 is configured to write first data into the memory, where the first data is data to be stored in the database.
The generating module 602 is configured to generate first record information when the first data written in the memory reaches a preset data amount, where the first record information includes first identification information and second identification information, the first identification information is identification information corresponding to the first data of the preset data amount in the memory, and the second identification information is identification information corresponding to second data stored in the database.
The copying module 603 is configured to copy the first data of the preset data amount to the database.
The generating module 602 is further configured to generate second record information, where the second record information includes the second identification information and third identification information corresponding to the first data of the preset data amount, and the third identification information is identification information corresponding to the first data of the preset data amount in the database.
The processing module 604 is configured to execute a data read-write transaction according to the first record information and the second record information.
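How the modules cooperate can be sketched as follows. This is an illustrative toy only, assuming a logical counter as the generation time, string identifiers such as `memtable-0` and `rowgroup-0`, and a preset data amount measured in rows; none of these details are specified in the text, and the class names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class RecordInfo:
    generated_at: int              # logical generation time (assumption)
    memory_ids: tuple[str, ...]    # identification of data still held in memory
    database_ids: tuple[str, ...]  # identification of data stored in the database


class Store:
    def __init__(self, preset_rows: int) -> None:
        self.preset_rows = preset_rows
        self.memory: list[tuple] = []
        self.frozen: dict[str, list[tuple]] = {}    # full Memtables awaiting cleanup
        self.database: dict[str, list[tuple]] = {}
        self.records: list[RecordInfo] = []
        self._clock = count(1)

    def write(self, row: tuple) -> None:
        """Writing module: put the row into memory, flushing at the preset amount."""
        self.memory.append(row)
        if len(self.memory) >= self.preset_rows:
            self._flush()

    def _flush(self) -> None:
        memtable_id = f"memtable-{len(self.frozen)}"
        self.frozen[memtable_id] = self.memory
        self.memory = []
        stored_ids = tuple(self.database)  # second identification information
        # First record information: data in memory plus data already stored.
        self.records.append(RecordInfo(next(self._clock), (memtable_id,), stored_ids))
        # Copying module: copy the Memtable into the database.
        group_id = f"rowgroup-{len(self.database)}"  # third identification information
        self.database[group_id] = list(self.frozen[memtable_id])
        # Second record information: everything is now addressed in the database.
        self.records.append(RecordInfo(next(self._clock), (), stored_ids + (group_id,)))


store = Store(preset_rows=2)
store.write(("student A", 90))
store.write(("student B", 95))
first_record, second_record = store.records
print(first_record, second_record)
```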
Optionally, the copy module 603 is specifically configured to: performing storage format conversion on first data with a preset data volume; and copying the data subjected to storage format conversion into a database.
Optionally, the first data of the preset data amount includes multiple lines of data, each line of data includes attribute values corresponding to multiple attributes respectively, and the copy module 603 is further configured to: combining attribute values corresponding to the same attribute in multiple lines of data to obtain multiple data blocks; determining storage positions corresponding to the data blocks in the database respectively, wherein the storage positions corresponding to the data blocks are positioned in the same row; and storing the plurality of data blocks in the corresponding storage positions respectively.
Optionally, the copy module 603 is further configured to: compressing first data of a preset data volume; and copying the compressed data to a database.
Optionally, the apparatus further comprises: the statistical module is used for acquiring data statistical information corresponding to the data blocks; and correspondingly storing the data statistical information corresponding to the data blocks in the storage positions corresponding to the data blocks.
Optionally, the apparatus further comprises: a filtering module, configured to, in response to a data read-write transaction corresponding to the plurality of data blocks, filter out the data blocks that do not satisfy the data read-write transaction according to the data statistical information.
Optionally, the apparatus further comprises: the determining module is used for determining query characteristic information of a service corresponding to first data with preset data volume; and determining whether to perform storage format conversion on the first data with the preset data volume according to the query characteristic information.
Optionally, the processing module 604 is specifically configured to: receiving data read-write transactions; determining reference record information matched with the data read-write transaction according to the receiving time of the data read-write transaction and the generation time corresponding to the first record information and the second record information respectively, wherein the reference record information is the first record information or the second record information; and determining the data to be accessed for executing the data read-write transaction based on the identification information contained in the reference record information.
Optionally, the processing module 604 is specifically configured to: if the receiving time of the data read-write transaction is before the generation time of the second record information, determine the reference record information to be the first record information; and if the receiving time of the data read-write transaction is after the generation time of the second record information, determine the reference record information to be the second record information.
Optionally, the reference record information is the first record information, and the processing module 604 is further configured to: when the data read-write transaction is committed, delete the first record information and the first data of the preset data amount in the memory.
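The selection of the reference record information can be sketched as follows. The text does not say what happens when the receiving time equals the generation time of the second record information; the sketch treats that case as "after", which is an assumption. It also handles a single transaction only, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordInfo:
    label: str
    generated_at: int  # generation time of the record information


def pick_reference(received_at: int, first: RecordInfo, second: RecordInfo) -> RecordInfo:
    """Match a transaction to the record information by its receiving time."""
    if received_at < second.generated_at:
        return first   # received before the second record was generated
    return second      # received at or after that time (equality is an assumption)


def on_commit(reference: RecordInfo, first: RecordInfo, memory_state: dict) -> None:
    """When a transaction that used the first record commits, release the in-memory copy."""
    if reference is first:
        memory_state.pop("first_record", None)
        memory_state.pop("memtable", None)


first = RecordInfo("first", generated_at=10)
second = RecordInfo("second", generated_at=20)
memory_state = {"first_record": first, "memtable": [("student A", 90)]}

early = pick_reference(15, first, second)
late = pick_reference(25, first, second)
on_commit(early, first, memory_state)
print(early.label, late.label, memory_state)
```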
The apparatus shown in fig. 6 may perform the data storage method provided in the embodiments shown in fig. 1 to fig. 5, and the detailed implementation process and technical effect are described in the foregoing embodiments and are not described herein again.
In one possible design, the structure of the data storage apparatus shown in fig. 6 may be implemented as a server, as shown in fig. 7, and the server may include: a processor 701, a memory 702. Wherein the memory 702 stores executable code thereon, and when the executable code is executed by the processor 701, the processor 701 is enabled to at least implement the data storage method provided in the embodiments of fig. 1 to 5.
Optionally, the server may further include a communication interface 703 for communicating with other devices.
In addition, an embodiment of the present invention provides a non-transitory machine-readable storage medium, on which executable code is stored, and when the executable code is executed by a processor of a server, the processor is enabled to implement at least the data storage method provided in the foregoing embodiments shown in fig. 1 to 5.
The above-described apparatus embodiments are merely illustrative, in that elements described as separate components may or may not be physically separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented with the addition of a necessary general-purpose hardware platform, and of course may also be implemented by a combination of hardware and software. With this understanding, the above technical solutions, in essence or in the part contributing to the prior art, may be embodied in the form of a computer program product, which may be implemented on one or more computer-usable storage media containing computer-usable program code, including but not limited to disk storage, CD-ROM, optical storage, and the like.
The data storage method provided by the embodiments of the present invention may be executed by a program or piece of software, which may be provided by a network side. The electronic device mentioned in the foregoing embodiments may download the program or software to a local non-volatile storage medium; when the data storage method needs to be executed, the CPU reads the program or software into memory and executes it to implement the data storage method provided in the foregoing embodiments. For the execution process, refer to FIG. 1 to FIG. 5.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (13)

1. A method of data storage, the method comprising:
writing first data into a memory, wherein the first data is data to be stored in a database;
if the written first data in the memory reaches a preset data volume, generating first record information, wherein the first record information comprises first identification information and second identification information, the first identification information is identification information corresponding to the first data of the preset data volume in the memory, and the second identification information is identification information corresponding to second data stored in the database;
copying the first data of the preset data volume to the database;
generating second record information, wherein the second record information comprises the second identification information and third identification information corresponding to the first data of the preset data volume, and the third identification information is identification information corresponding to the first data of the preset data volume in the database;
receiving a data read-write transaction, and determining data to be accessed for executing the data read-write transaction according to the receiving time of the data read-write transaction and the generation time corresponding to the first record information and the second record information respectively;
and processing the data read-write transaction according to the data to be accessed.
2. The method of claim 1, wherein the copying the first data of the preset data volume to the database comprises:
performing storage format conversion on the first data with the preset data volume;
and copying the data subjected to storage format conversion into the database.
3. The method of claim 2, wherein the first data of the preset data volume comprises a plurality of rows of data, each row of data comprising attribute values respectively corresponding to a plurality of attributes, and the performing storage format conversion on the first data of the preset data volume comprises:
combining attribute values corresponding to the same attribute in the multiple lines of data to obtain multiple data blocks;
the copying the data subjected to storage format conversion into the database comprises:
determining storage positions corresponding to the data blocks in the database respectively, wherein the storage positions corresponding to the data blocks are located in the same row;
and respectively storing the plurality of data blocks to the corresponding storage positions.
4. The method of claim 3, further comprising:
acquiring data statistical information corresponding to the plurality of data blocks;
and correspondingly storing the data statistical information corresponding to the data blocks to the storage positions corresponding to the data blocks.
5. The method of claim 4, further comprising:
and in response to a data read-write transaction corresponding to the plurality of data blocks, filtering out the data blocks that do not satisfy the data read-write transaction according to the data statistical information.
6. The method of claim 2, further comprising:
determining query characteristic information of a service corresponding to the first data with the preset data volume;
and determining whether to perform storage format conversion on the first data with the preset data volume according to the query characteristic information.
7. The method of claim 1, wherein the copying the first data of the preset data volume to the database comprises:
compressing the first data of the preset data volume;
and copying the compressed data to the database.
8. The method according to any one of claims 1 to 7, wherein the determining, according to the receiving time of the data read/write transaction and the generation time corresponding to each of the first record information and the second record information, data to be accessed for executing the data read/write transaction includes:
determining reference record information matched with the data read-write transaction according to the receiving time of the data read-write transaction and the generation time corresponding to the first record information and the second record information respectively, wherein the reference record information is the first record information or the second record information;
and determining the data to be accessed for executing the data read-write transaction based on the identification information contained in the reference record information.
9. The method of claim 8, wherein determining reference record information that matches the data read and write transaction comprises:
if the receiving time of the data read-write transaction is before the generation time of the second record information, determining the reference record information as the first record information;
and if the receiving time of the data read-write transaction is after the generation time of the second record information, determining the reference record information as the second record information.
10. The method of claim 9, the reference record information being the first record information, the method further comprising:
and if the data read-write transaction is committed, deleting the first record information and the first data of the preset data volume in the memory.
11. A data storage device, the device comprising:
the write-in module is used for writing first data into the memory, wherein the first data is data to be stored in the database;
a generating module, configured to generate first record information when first data written in the memory reaches a preset data amount, where the first record information includes first identification information and second identification information, the first identification information is identification information corresponding to the first data of the preset data amount in the memory, and the second identification information is identification information corresponding to second data stored in the database;
the copying module is used for copying the first data of the preset data volume to the database;
the generating module is configured to generate second record information, where the second record information includes the second identification information and third identification information corresponding to the first data of the preset data volume, and the third identification information is identification information corresponding to the first data of the preset data volume in the database;
the processing module is used for receiving a data read-write transaction and determining data to be accessed for executing the data read-write transaction according to the receiving time of the data read-write transaction and the generation time corresponding to the first record information and the second record information respectively; and processing the data read-write transaction according to the data to be accessed.
12. A server, comprising: a memory, a processor; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to perform the data storage method of any one of claims 1-10.
13. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of a server, causes the processor to perform the data storage method of any one of claims 1-10.
CN202010266500.5A 2020-04-07 2020-04-07 Data storage method, device, server and storage medium Active CN113296683B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010266500.5A CN113296683B (en) 2020-04-07 2020-04-07 Data storage method, device, server and storage medium

Publications (2)

Publication Number Publication Date
CN113296683A CN113296683A (en) 2021-08-24
CN113296683B true CN113296683B (en) 2022-04-29

Family

ID=77317963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010266500.5A Active CN113296683B (en) 2020-04-07 2020-04-07 Data storage method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN113296683B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781214A (en) * 2019-09-26 2020-02-11 金蝶软件(中国)有限公司 Database reading and writing method and device, computer equipment and storage medium
CN110879687A (en) * 2019-10-18 2020-03-13 支付宝(杭州)信息技术有限公司 Data reading method, device and equipment based on disk storage

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609479B (en) * 2012-01-20 2015-11-25 北京思特奇信息技术股份有限公司 A kind of memory database node clone method
WO2016101165A1 (en) * 2014-12-24 2016-06-30 华为技术有限公司 Transaction processing method, device and computer system
CN104899117B (en) * 2015-06-17 2019-04-16 江苏师范大学 Memory database parallel logging method towards Nonvolatile memory
CN105117308B (en) * 2015-09-29 2020-06-23 联想(北京)有限公司 Data processing method, device and system
US10311029B2 (en) * 2017-01-04 2019-06-04 Sap Se Shared database dictionaries
CN110196847A (en) * 2018-08-16 2019-09-03 腾讯科技(深圳)有限公司 Data processing method and device, storage medium and electronic device
CN110096521A (en) * 2019-04-29 2019-08-06 顶象科技有限公司 Log information processing method and device

Also Published As

Publication number Publication date
CN113296683A (en) 2021-08-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant