CN113296683B - Data storage method, device, server and storage medium

Info

Publication number: CN113296683B
Application number: CN202010266500.5A
Authority: CN
Inventors: 阮羽彬; 吴迪; 陈世平; 梁宇坤
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Cloud Computing Ltd
Priority date: 2020-04-07
Filing date: 2020-04-07
Publication date: 2022-04-29
Anticipated expiration: 2040-04-07
Also published as: CN113296683A

Abstract

Embodiments of the present invention provide a data storage method, a data storage device, a server, and a storage medium. The method comprises: writing first data into memory; if the first data written in the memory reaches a preset data volume, generating first record information, wherein the first record information comprises first identification information, which corresponds in the memory to the preset data volume of first data, and second identification information, which corresponds in the database to second data already stored in the database; copying the preset data volume of first data to the database; generating second record information, wherein the second record information comprises the second identification information and third identification information, the third identification information being the identification information that corresponds in the database to the preset data volume of first data; and processing data read/write transactions according to the first record information and the second record information. With this scheme, the data storage process does not affect the normal operation of data read/write transactions.

Description

Data storage method, device, server and storage medium

Technical Field

The present invention relates to the technical field of data processing, and in particular to a data storage method, device, server, and storage medium.

Background

Before data is written to disk, it may first be written to memory. When the data in memory accumulates to a certain amount, a transfer of that data from memory to disk is triggered. This operation may be called a data transfer (Delta Merge).

During a Delta Merge, the data being transferred temporarily resides in a storage space that belongs neither to memory nor to disk. If a data read/write transaction is received at this time, the data being transferred cannot be queried, so the Delta Merge process affects the normal operation of data read/write transactions.

Summary of the Invention

Embodiments of the present invention provide a data storage method, apparatus, device, and storage medium, so as to ensure that the process of storing data in a database does not affect the normal operation of data read/write transactions.

In a first aspect, an embodiment of the present invention provides a data storage method, the method comprising:

writing first data into memory, the first data being data to be stored in a database;

if the first data written in the memory reaches a preset data volume, generating first record information, the first record information including first identification information and second identification information, the first identification information being the identification information that corresponds, in the memory, to the preset data volume of first data, and the second identification information being the identification information that corresponds, in the database, to second data already stored in the database;

copying the preset data volume of first data to the database;

generating second record information, the second record information including the second identification information and third identification information corresponding to the preset data volume of first data, the third identification information being the identification information that corresponds, in the database, to the preset data volume of first data; and

processing data read/write transactions according to the first record information and the second record information.

In a second aspect, an embodiment of the present invention provides a data storage device, the device comprising:

a writing module, configured to write first data into memory, the first data being data to be stored in a database;

a generating module, configured to generate first record information when the first data written in the memory reaches a preset data volume, the first record information including first identification information and second identification information, the first identification information being the identification information that corresponds, in the memory, to the preset data volume of first data, and the second identification information being the identification information that corresponds, in the database, to second data already stored in the database;

a copying module, configured to copy the preset data volume of first data to the database;

the generating module being further configured to generate second record information, the second record information including the second identification information and third identification information corresponding to the preset data volume of first data, the third identification information being the identification information that corresponds, in the database, to the preset data volume of first data; and

a processing module, configured to process data read/write transactions according to the first record information and the second record information.

In a third aspect, an embodiment of the present invention provides a server, including a memory and a processor, wherein executable code is stored on the memory, and when the executable code is executed by the processor, it causes the processor to execute the data storage method described in the first aspect of the embodiments of the present invention.

In a fourth aspect, an embodiment of the present invention provides a non-transitory machine-readable storage medium on which executable code is stored, wherein when the executable code is executed by a processor of a server, it causes the processor to execute the data storage method described in the first aspect of the embodiments of the present invention.

With the method provided by the embodiments of the present invention, in the process of storing data from memory into the database, when the data in memory reaches the preset data volume, that volume of data is copied to the database, with the preset data volume as the unit. One copy of the data remains in memory and only the other copy is transferred, so the data currently being transferred can still be queried in memory. This avoids the problem of the transferred data being unqueryable during the transfer. The method can also use the first record information and the second record information to record the identification information (which reflects the storage location of the data) of the data stored in memory and in the database during the data transfer and after it completes. Thus, when a data read/write transaction is triggered, it can be executed using the record information that corresponds to its trigger time, so that the transaction can be committed.

Brief Description of the Drawings

In order to describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a data storage method provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of generating record information provided by an embodiment of the present invention;

FIG. 3 is a flowchart of a data copying method provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of a storage format conversion provided by an embodiment of the present invention;

FIG. 5 is a schematic diagram of a data storage method provided by an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a data storage device provided by an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a server provided by an embodiment of the present invention.

Detailed Description

In order to make the purposes, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

The terms used in the embodiments of the present invention are for the purpose of describing particular embodiments only and are not intended to limit the present invention. The singular forms "a", "said", and "the" used in the embodiments of the present invention and the appended claims are intended to include the plural forms as well, unless the context clearly indicates otherwise. "A plurality of" generally means at least two.

Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".

In addition, the order of the steps in the following method embodiments is only an example and is not strictly limited.

A concept used in this document is explained first:

A data read/write transaction, also referred to simply as a transaction, is the smallest logical unit of work that accesses a database in order to implement a specific service function, and it consists of a sequence of operations. The database transitions from one state to another only when all operations in the sequence complete successfully. If any operation in the sequence fails, the operations that have already completed must be rolled back. In other words, all operations in the same transaction are either all executed correctly or not executed at all.
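The all-or-nothing behaviour described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `run_transaction` and the idea of applying operations to a private copy of a dictionary are assumptions chosen only to make the rollback rule concrete.

```python
import copy


def run_transaction(state, operations):
    """Apply every operation or none (hypothetical helper).

    Returns (resulting_state, committed). The input state is never modified.
    """
    working = copy.deepcopy(state)      # operate on a private copy
    try:
        for operation in operations:
            operation(working)          # any operation may raise
    except Exception:
        return state, False             # roll back: discard the working copy
    return working, True                # commit: expose the new state
```

If one operation raises, the caller gets the original state back, which mirrors the requirement that earlier operations in the sequence be rolled back.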

The transaction processing method provided by the embodiments of the present invention may be executed by a server that serves as the hardware carrier of the database. Specifically, an application program may be deployed on the server and a process may be started to execute the transaction processing method.

The data storage method provided by the embodiments of the present invention can be applied to scenarios in which data is stored into a database. When data needs to be stored into the database, it is not stored there directly but is first written into memory. Whenever the data written in memory reaches a certain amount, a data transfer operation is triggered that moves that amount of data from memory to the database. In the related art, if a data transfer operation has been triggered and a data read/write transaction is received during the transfer, the data being transferred cannot be read. The resulting data is incomplete and the data read/write transaction cannot be executed successfully. In short, the data storage process interferes with the execution of data read/write transactions. The data storage method provided by the embodiments of the present invention avoids this problem, so that the data storage process does not interfere with the execution of data read/write transactions.

The execution of the data storage method provided herein is described below with reference to the following embodiments.

FIG. 1 is a flowchart of a data storage method provided by an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:

101. Write first data into memory, where the first data is data to be stored in a database.

102. If the first data written in the memory reaches a preset data volume, generate first record information. The first record information includes first identification information and second identification information. The first identification information is the identification information that corresponds, in the memory, to the preset data volume of first data, and the second identification information is the identification information that corresponds, in the database, to second data already stored in the database.

103. Copy the preset data volume of first data to the database.

104. Generate second record information. The second record information includes the second identification information and third identification information corresponding to the preset data volume of first data. The third identification information is the identification information that corresponds, in the database, to the preset data volume of first data.

105. Process data read/write transactions according to the first record information and the second record information.

The core idea of the data storage scheme provided by this embodiment is summarized first. In the process of storing the first data written in memory into the database, when the first data in memory reaches the preset data volume, that preset data volume of first data is copied. For convenience, Memtable is used below to denote the preset data volume of first data in memory, and Memtable' denotes the data obtained by copying it. Once Memtable' has been obtained, Memtable' can be transferred from memory to the database. Meanwhile, the first record information and the second record information respectively record the identification information (which reflects the storage location of the data) of the data stored in memory and in the database during the data transfer and after it completes. Thus, when a data read/write transaction is triggered, data can be read and written using the record information that corresponds to the trigger time of that transaction.

In practice, the preset data volume can be set according to actual requirements, for example 500 MB. Setting the preset data volume reasonably improves the efficiency of data storage and avoids prolonged interference with data read/write transactions.

The identification information in this embodiment may be, but is not limited to, a storage location or a location index. The corresponding data can be located through the identification information. Taking the Memtable as an example, its first identification information may be the storage location of the Memtable in memory.

In the embodiments of the present invention, only one Memtable may be copied to the database at a time. If more first data continues to be written into memory after the copy of a Memtable to the database has been triggered, and the first data written later has not yet reached the preset data volume, the Memtable written first is copied to the database. When the next Memtable copy is due, the method provided by the embodiments of the present invention is triggered again.

In the process of copying the Memtable to the database, the Memtable is copied to obtain Memtable'. The Memtable remains in memory, and Memtable' is the copy that is transferred. While Memtable' is in transit, it cannot be read from either memory or the database. However, because the Memtable is still kept in memory, data read/write transactions can be executed by reading the Memtable in memory, so their normal execution is not affected.
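Steps 101 to 104 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class names `Store` and `RecordInfo`, the identifiers `M0`/`R0`, and counting the preset data volume in rows rather than bytes are all hypothetical.

```python
import copy
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RecordInfo:
    """Identifiers a transaction should read from (hypothetical structure)."""
    memory_ids: tuple
    database_ids: tuple


@dataclass
class Store:
    preset_rows: int                               # preset data volume, in rows (assumption)
    active: list = field(default_factory=list)     # first data still being written
    memory: dict = field(default_factory=dict)     # Memtable id -> rows
    database: dict = field(default_factory=dict)   # database block id -> rows
    records: list = field(default_factory=list)    # record information, oldest first

    def write(self, row):
        """Step 101: write first data into memory; flush at the preset volume."""
        self.active.append(row)
        if len(self.active) >= self.preset_rows:
            self.flush()

    def flush(self):
        mem_id = f"M{len(self.memory)}"
        self.memory[mem_id] = self.active
        self.active = []
        existing = tuple(self.database)            # ids of second data already stored
        # Step 102: first record = Memtable in memory + data already in the database.
        self.records.append(RecordInfo((mem_id,), existing))
        # Step 103: copy, do not move; the original Memtable stays readable in memory.
        db_id = f"R{len(self.database)}"
        self.database[db_id] = copy.deepcopy(self.memory[mem_id])
        # Step 104: second record = previous database data + the newly copied data.
        self.records.append(RecordInfo((), existing + (db_id,)))
```

Because the copy in step 103 is a separate object, a reader that follows the first record can keep using the in-memory Memtable for as long as the copy takes.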

To save storage space in the database, the process of copying the Memtable to the database may optionally be implemented as: compressing the Memtable, and then copying the compressed Memtable to the database.
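The optional compression step might look like the sketch below. The patent does not name a serialization format or compression algorithm; JSON and zlib from the Python standard library are used here purely as stand-ins, and the function names are hypothetical.

```python
import json
import zlib


def compress_memtable(rows):
    """Serialize the Memtable rows and compress them before the copy (assumed format)."""
    raw = json.dumps(rows).encode("utf-8")
    return zlib.compress(raw)


def decompress_memtable(blob):
    """Inverse of compress_memtable, used when the stored data is read back."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Repetitive data, such as rows that share the same class name, tends to compress well, which is where the storage saving comes from.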

After the copy operation, the same data, namely the copied Memtable, exists in both memory and the database. To avoid transaction execution errors caused by reading the same data twice when a data read/write transaction is executed, the first record information and the second record information can be used to distinguish these data. The first record information or the second record information contains the identification information of the data that needs to be operated on when a data read/write transaction is executed.

The roles of the first record information and the second record information are described below in connection with the specific process of executing a data read/write transaction.

The processing of data read/write transactions according to the first record information and the second record information may be implemented as follows: receive a data read/write transaction, and determine the reference record information that matches it according to the reception time of the transaction and the respective generation times of the first record information and the second record information. The transaction is then executed based on the reference record information. That is, the data that must be accessed to execute the transaction is determined from the identification information contained in the reference record information, and the transaction is executed on that data. The reference record information is either the first record information or the second record information.

Before a data read/write transaction is executed based on the reference record information, the reference record information that matches the transaction must be selected from the first record information and the second record information. The selection may be implemented as follows: if the reception time of the data read/write transaction is before the generation time of the second record information, the reference record information is the first record information; if the reception time is after the generation time of the second record information, the reference record information is the second record information.

In practice, suppose a data read/write transaction is received at some moment. Its reception time can then be compared with the generation time of the second record information. If the reception time is before the generation time of the second record information, the first record information is selected as the reference record information. If the reception time is after the generation time of the second record information, the second record information is selected. It follows that, since the generation time of the second record information must be available, the second record information and its generation time can be stored together when the second record information is generated, and the generation time can later be determined from what was stored.

Once the reference record information that matches the data read/write transaction has been determined, the storage locations of the data that the transaction needs to operate on are known. The data can then be queried at those locations and processed to complete the data read/write transaction.
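The selection rule can be written as a small Python sketch. The names `RecordInfo`, `generated_at`, and `select_reference_record` are hypothetical, timestamps are plain numbers, and the handling of a transaction received at exactly the generation time of the second record (not specified in the text) is an assumption: here it is given the second record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordInfo:
    generated_at: float   # stored together with the record when it is generated
    ids: tuple            # identification information of the data to access


def select_reference_record(received_at: float,
                            first: RecordInfo,
                            second: Optional[RecordInfo]) -> RecordInfo:
    """Pick the record information that matches a transaction's reception time."""
    # No second record yet, or the transaction arrived before it was generated:
    # the copy may still be in progress, so read through the first record.
    if second is None or received_at < second.generated_at:
        return first
    # Otherwise the copy has completed and the second record is authoritative.
    return second
```

The transaction then reads only the locations listed in `ids` of the selected record.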

For ease of understanding, the process of executing a data read/write transaction is illustrated below with a specific example and with reference to FIG. 2.

In FIG. 2, at time 1, there is one Memtable in memory, labeled M0, and the database holds three pieces of data that have already been transferred from memory, labeled R0, R1, and R2. Before M0 is copied to the database, record information A can be generated: M0, R0, R1, and R2. M0 is then copied, and the resulting copy is labeled R3. R3 is subsequently transferred to the database, so R3 is added to the database. At time 2, when the copy operation completes, record information B can be generated: R0, R1, R2, and R3. If a data read/write transaction X is received at some moment, it is determined whether the reception time of X is after time 2. If the reception time of X is before time 2, X is executed using record information A, that is, the required data is queried in M0, R0, R1, and R2. If the reception time of X is after time 2, X is executed using record information B, that is, the required data is queried in R0, R1, R2, and R3. Whether record information A or record information B is used to execute X, no data is queried twice and the queried data covers all stored data, so no data is missing.
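The FIG. 2 scenario can be replayed with a short Python sketch. The labels M0 and R0 to R3 come from the example; the row contents, the dictionaries, and the `read` helper are invented for illustration only.

```python
# State at time 1: one Memtable in memory, three blocks already in the database.
memory = {"M0": ["row4", "row5"]}
database = {"R0": ["row0"], "R1": ["row1"], "R2": ["row2", "row3"]}

# Record information A is generated before M0 is copied.
record_a = ("M0", "R0", "R1", "R2")

# M0 is copied; the copy lands in the database as R3 while M0 stays in memory.
database["R3"] = list(memory["M0"])

# Record information B is generated at time 2, once the copy has completed.
record_b = ("R0", "R1", "R2", "R3")


def read(record):
    """Collect the rows named by a record; ids starting with 'M' live in memory."""
    rows = []
    for ident in record:
        source = memory if ident.startswith("M") else database
        rows.extend(source[ident])
    return rows


# What a reader would see if it ignored record information and scanned everything.
naive_rows = [row for block in (*memory.values(), *database.values()) for row in block]
```

Reading through either record yields the same six rows exactly once, whereas the naive scan returns the rows of M0 twice, once from memory and once from R3.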

To reduce storage usage, optionally, once all data read/write transactions executed based on the first record information (that is, the transactions triggered before the second record information was generated) have been committed, the first record information and the Memtable in memory that corresponds to it can be deleted.
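One way to decide when the first record information and its Memtable can be deleted is to count the uncommitted transactions that still use each record. The sketch below assumes this reference-counting approach; the patent states only the condition (all such transactions committed), and the class and method names are hypothetical.

```python
from collections import Counter


class RecordReclaimer:
    """Tracks which record information can be deleted (illustrative sketch)."""

    def __init__(self):
        self.open_txns = Counter()   # record id -> uncommitted transactions using it
        self.superseded = set()      # records for which a newer record now exists
        self.deleted = []            # records (and their Memtables) already reclaimed

    def begin(self, record_id):
        self.open_txns[record_id] += 1

    def commit(self, record_id):
        self.open_txns[record_id] -= 1
        self._maybe_delete(record_id)

    def supersede(self, record_id):
        """Called when the next record information has been generated."""
        self.superseded.add(record_id)
        self._maybe_delete(record_id)

    def _maybe_delete(self, record_id):
        idle = self.open_txns[record_id] == 0
        if record_id in self.superseded and idle and record_id not in self.deleted:
            self.deleted.append(record_id)   # safe to drop the record and its Memtable
```

A record is reclaimed only after it has been superseded, because until then newly received transactions may still be assigned to it.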

In some cases, multiple pieces of record information may exist at the same time. For example, if a Memtable is copied to the database more slowly than new Memtables are produced, Memtables accumulate in memory and are queued to be copied to the database one after another, which can cause multiple pieces of record information to coexist. In that case, the validity period of any piece of record information can be taken to run from its generation until the next piece of record information is generated, and the data read/write transactions received during that period use that record information.
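When several pieces of record information coexist, the matching one can be found by a binary search over their generation times. This is a sketch under assumptions: the records are kept sorted by generation time, a transaction received exactly at a generation time uses the newly generated record, and the function name `record_for` is hypothetical.

```python
import bisect


def record_for(received_at, generation_times, records):
    """Return the record valid at received_at.

    generation_times must be sorted ascending and aligned with records.
    A record is valid from its own generation time until the next one is generated.
    """
    index = bisect.bisect_right(generation_times, received_at) - 1
    if index < 0:
        raise LookupError("transaction received before any record was generated")
    return records[index]
```

For example, with generation times 1.0, 2.0, and 5.0, a transaction received at 4.9 still uses the record generated at 2.0.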

With the method provided by the embodiments of the present invention, in the process of storing data from memory into the database, the Memtable is kept in memory and only Memtable' is transferred. The Memtable can therefore still be queried in memory, which avoids the problem of missing data during the transfer. In addition, when a data read/write transaction is received, data can be read and written using the record information that corresponds to the reception time of the transaction, so that the transaction can be committed.

A scheme for copying the Memtable to the database is described below by way of example with reference to the embodiment shown in FIG. 3. As shown in FIG. 3, the copying scheme may include the following steps:

301. Perform storage format conversion on the preset data volume of first data.

302. Copy the data that has undergone storage format conversion to the database.

In practice, if the storage format of data in memory differs from the storage format of data in the database, Memtable' can first undergo storage format conversion so that it conforms to the storage format used in the database, and the converted Memtable' is then copied to the database.

In practice, the database may be a row-oriented database, a column-oriented database, and so on. A column-oriented database may be, for example, a Histore database. If the database is column-oriented and the storage format of data in it differs from the storage format of data in memory, Memtable' can undergo storage format conversion, and the converted Memtable' is then stored in the column-oriented database.

下面将介绍数据在列式数据库中的存储格式和数据在内存中的存储格式，以及具体如何得进行存储格式转换的过程。The following will introduce the storage format of the data in the columnar database and the storage format of the data in the memory, and how to convert the storage format.

In practical applications, data is stored by row in memory and by column in a columnar database. Suppose Memtable' contains multiple rows of data, and each row contains attribute values that respectively correspond to multiple attributes. In memory, this data is stored one row at a time; that is, the attribute values of the same row are packed together and stored as a unit.

For ease of understanding, consider entering students' mathematics exam scores into a database. Suppose a class has 10 students and a transcript is created for their mathematics exam. The transcript contains 10 rows of data, one row per student. For each student, the transcript records the name, student ID, class, and mathematics score. Accordingly, name, student ID, class, and mathematics score serve as the attributes described in this embodiment, and the specific content of each serves as the attribute value. For example, if the attribute value for name is Student A, for student ID is 20200114, for class is Grade 2 Class 3, and for mathematics score is 90, then this row of data is (Student A, 20200114, Grade 2 Class 3, 90).

For the multiple rows of data contained in Memtable', storage format conversion combines the attribute values that correspond to the same attribute across the rows, yielding multiple data blocks. These data blocks are then stored in the columnar database.

To make the storage format conversion easier to follow, an example is described with reference to FIG. 4.

In memory, data is stored in the format [name, student ID, class, mathematics score]; that is, one record contains the attribute values of multiple attributes. The memory shown in FIG. 4 holds two rows of data. The first row is [Student A, 20200114, Grade 2 Class 3, 90] and the second row is [Student B, 20200115, Grade 2 Class 3, 95]. During storage format conversion, Student A and Student B, which correspond to name, are combined to obtain data block a; 20200114 and 20200115, which correspond to student ID, are combined to obtain data block b; Grade 2 Class 3 and Grade 2 Class 3, which correspond to class, are combined to obtain data block c; and 90 and 95, which correspond to mathematics score, are combined to obtain data block d.
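The conversion in the FIG. 4 example can be sketched in a few lines of Python. This is only an illustration of the row-to-column regrouping described above, not the implementation of the embodiment; the English attribute names and the function name `rows_to_column_blocks` are assumptions introduced here.

```python
from typing import Dict, List, Sequence, Tuple

# Attribute names from the transcript example; the identifiers are illustrative.
ATTRIBUTES = ("name", "student_id", "class", "math_score")

Row = Tuple[str, int, str, int]


def rows_to_column_blocks(rows: Sequence[Row]) -> Dict[str, List[object]]:
    """Group the values of each attribute across all rows into one block."""
    # zip(*rows) transposes the row layout into one tuple per attribute.
    columns = zip(*rows)
    return {attr: list(values) for attr, values in zip(ATTRIBUTES, columns)}


# The two in-memory rows from the FIG. 4 example.
memtable_rows = [
    ("Student A", 20200114, "Grade 2 Class 3", 90),
    ("Student B", 20200115, "Grade 2 Class 3", 95),
]

blocks = rows_to_column_blocks(memtable_rows)
# blocks["name"] plays the role of data block a, blocks["student_id"] of
# data block b, blocks["class"] of data block c, blocks["math_score"] of d.
```

Each resulting list holds all values of one attribute, so a query that touches only one attribute needs to read only one block.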

After the multiple data blocks are obtained, the storage location of each data block in the database can be determined, and each data block is stored at its corresponding storage location.

In practical applications, each storage location stores one data block, and the storage locations corresponding to the multiple data blocks are located in the same row. The data blocks located in the same row can be called a row group (Rowgroup).

Although a row group contains multiple data blocks, each corresponding to a different attribute, the data blocks in a row group together correspond to N complete rows of data. In addition, because the data blocks are arranged in the same row, the data blocks in the same column of different row groups are guaranteed to correspond to the same attribute.

The process of copying the Memtable to the database is described below by way of example with reference to FIG. 5. Suppose there is a data table into which 1018 rows of data (rows 0 to 1017) have been written. Rows 0-999 have already been stored in the database, where they correspond to the data blocks shown as row group 0, row group 1, row group 2, and row group 3, each row group containing four data blocks. Rows 1000-1017 are stored in memory. Suppose that in memory every 6 rows of data form one Memtable, so there are 3 Memtables in memory: the first corresponds to rows 1000-1005, the second to rows 1006-1011, and the third to rows 1012-1017. These 3 Memtables need to be copied to the database in order: first, second, third. Taking the first Memtable as an example, storage format conversion is performed on it first, which is assumed to yield 4 data blocks: data block a, data block b, data block c, and data block d. Data blocks a, b, c, and d can then be stored in the database as row group 4, with data blocks corresponding to the same attribute arranged in the same column.
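The FIG. 5 walk-through can be sketched as follows. The dictionary standing in for the database, the helper names, and the placeholder attribute values are all assumptions made for illustration. The text only states that the first Memtable becomes row group 4; numbering the second and third as row groups 5 and 6 is likewise an assumption.

```python
from typing import Dict, List, Sequence, Tuple

ROWS_PER_MEMTABLE = 6  # from the example: every 6 rows form one Memtable

Row = Tuple[str, int, str, int]
RowGroup = List[List[object]]  # one data block (list of values) per attribute


def split_into_memtables(rows: Sequence[Row], size: int) -> List[List[Row]]:
    """Cut the in-memory rows into consecutive Memtables of `size` rows."""
    return [list(rows[i:i + size]) for i in range(0, len(rows), size)]


def append_row_group(database: Dict[int, RowGroup], memtable: Sequence[Row]) -> int:
    """Convert one Memtable to column blocks and store it as the next row group."""
    row_group = [list(column) for column in zip(*memtable)]
    next_id = max(database) + 1 if database else 0
    database[next_id] = row_group
    return next_id


# Row groups 0-3 already hold rows 0-999; their contents are elided here.
database: Dict[int, RowGroup] = {0: [], 1: [], 2: [], 3: []}

# Rows 1000-1017 are still in memory (placeholder attribute values).
in_memory = [(f"student-{i}", i, "class", 0) for i in range(1000, 1018)]

memtables = split_into_memtables(in_memory, ROWS_PER_MEMTABLE)
assigned = []
for memtable in memtables:  # copied in order: first, second, third
    assigned.append(append_row_group(database, memtable))
```

Because every row group stores its blocks in the same attribute order, block position 3 of any row group in this sketch always holds mathematics scores.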

On this basis, in a columnar database the data blocks arranged in the same column share the same attribute. For example, the data in the data blocks of the fourth column are all students' mathematics scores. In certain read/write scenarios, this makes it easier to look up the data that a data read/write transaction needs to operate on.

To further improve query efficiency, statistics can be computed for each data block to obtain the data statistics information corresponding to that block. When looking up the data that a data read/write transaction needs to operate on, the data statistics information can be used to filter out data blocks in the database that are not needed, so that they are not scanned and only the remaining data blocks are scanned. Optionally, after the multiple data blocks are generated, the data statistics information corresponding to each data block can be obtained and stored at the storage location corresponding to that data block.

The data statistics information may include statistics such as the maximum value, minimum value, average value, and standard deviation of the data block. For any data block i among the multiple data blocks, once data block i is obtained, its maximum value, minimum value, average value, standard deviation, and other statistics can be computed, and the data statistics information corresponding to data block i is stored at the storage location corresponding to data block i.
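A minimal sketch of computing per-block statistics is shown below, using Python's standard `statistics` module. The `BlockStats` type and `compute_block_stats` function are hypothetical names, and the spread statistic is computed here as the population standard deviation, which is one reading of the term used in the text.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BlockStats:
    maximum: float
    minimum: float
    mean: float
    std_dev: float  # population standard deviation (an assumed interpretation)


def compute_block_stats(values: Sequence[float]) -> BlockStats:
    """Compute the statistics stored alongside one numeric data block."""
    return BlockStats(
        maximum=max(values),
        minimum=min(values),
        mean=statistics.fmean(values),
        std_dev=statistics.pstdev(values),
    )


# Data block d from the FIG. 4 example: the mathematics scores 90 and 95.
stats_d = compute_block_stats([90, 95])
```

Only numeric blocks are handled in this sketch; how non-numeric blocks such as names would be summarized is not specified in the text.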

On this basis, the query process can be implemented as follows: in response to a data read/write transaction corresponding to the multiple data blocks, data blocks that cannot satisfy the data read/write transaction are filtered out according to the data statistics information.

During a query, the data statistics information of each data block can be read first, without reading the data inside the block. If the data statistics information of a data block shows that the block cannot satisfy the current data read/write transaction, that block can be filtered out; that is, the data inside it is not read.

Take querying students' mathematics scores as an example. Suppose the database currently has 4 columns of data blocks, each column containing 5 data blocks, and one of the columns corresponds to the attribute mathematics score. The current data read/write transaction counts the students who achieved full marks (100 points) in the mathematics exam. First, the column of data blocks whose attribute is mathematics score is located; suppose it is the fourth column. Next, the data statistics information of each data block in the fourth column is obtained to determine the maximum value in each block, for example 95, 88, 100, 99, and 100. From this it follows that the maximum values of the first, second, and fourth data blocks are all below full marks, so these three blocks cannot contain a full-mark score and can be filtered out directly, leaving the third and fifth data blocks. Finally, the mathematics scores in the third and fifth data blocks are read and the number of full-mark scores is counted.
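The full-marks example can be sketched as below. The block maxima 95, 88, 100, 99, and 100 come from the text; the individual scores inside each block and the function name `count_full_scores` are invented for illustration.

```python
from typing import List, Sequence, Tuple

FULL_SCORE = 100


def count_full_scores(
    blocks: Sequence[List[int]], block_maxima: Sequence[int]
) -> Tuple[int, List[int]]:
    """Count full-mark scores, scanning only blocks whose maximum allows one.

    Returns the count and the indices of the blocks that were actually read.
    """
    scanned: List[int] = []
    count = 0
    for index, (block, maximum) in enumerate(zip(blocks, block_maxima)):
        if maximum < FULL_SCORE:
            continue  # the statistic rules this block out; skip reading it
        scanned.append(index)
        count += sum(1 for score in block if score == FULL_SCORE)
    return count, scanned


# Five blocks of the mathematics-score column (illustrative contents).
score_blocks = [[95, 70], [88, 60], [100, 100, 85], [99, 42], [100, 77]]
# In the scheme described above these maxima would be read from the stored
# statistics rather than recomputed from the block contents.
maxima = [95, 88, 100, 99, 100]

full_count, scanned_blocks = count_full_scores(score_blocks, maxima)
```

With zero-based indices, only blocks 2 and 4 (the third and fifth blocks) are read, matching the walk-through above.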

As the example above shows, when the data that a data read/write transaction operates on corresponds to a single attribute, the transaction can be executed by querying the data blocks of one column directly, and the query is efficient. However, if the data that the transaction operates on corresponds to at least two attributes, storing the data by row may be more advantageous. Continuing with exam scores, suppose the current data read/write transaction produces the list of students who failed the exam. The data needed now includes not only the exam score but also the corresponding student's name, that is, two attributes. If the data is stored by row, the student's name and exam score can be read together, and if the score is a failing one, the corresponding name is output directly.
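For contrast, the failing-students query over row-stored data might look like the sketch below. The pass mark of 60 and the second student record are assumptions added for illustration; the text does not state either.

```python
from typing import List, Sequence, Tuple

PASS_MARK = 60  # assumed threshold; not given in the text

Row = Tuple[str, int, str, int]  # (name, student ID, class, score)


def failing_students(rows: Sequence[Row], pass_mark: int = PASS_MARK) -> List[str]:
    """Read name and score from the same row and keep names below the pass mark."""
    return [name for name, _student_id, _class, score in rows if score < pass_mark]


transcript_rows = [
    ("Student A", 20200114, "Grade 2 Class 3", 90),
    ("Student C", 20200116, "Grade 2 Class 3", 52),  # illustrative record
]

failed = failing_students(transcript_rows)
```

Both attributes arrive in a single row read, so no step is needed to match a score in one block with a name in another.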

Row-based databases and columnar databases were mentioned above. In practical applications there is another kind of database, the hybrid row-column database, in which data can be stored either by row or by column. If row storage is chosen, the storage format of the data need not be converted when the data is copied from memory to the hybrid row-column database. If column storage is chosen, the storage format of the data must be converted when the data is copied from memory to the hybrid row-column database.

Which storage format to use in the hybrid row-column database, or in other words whether storage format conversion is needed before the data is stored, can be determined according to the query feature information of the service to which the first data corresponds. Optionally, before the Memtable is transferred to the database, the query feature information of the service corresponding to the Memtable can be determined, and whether to perform storage format conversion on the Memtable is then determined according to the query feature information.

In practical applications, the data in one Memtable usually comes from the same service, and different services usually have different query feature information. For the data read transactions of a given service, the characteristics of executing those transactions can be collected, for example whether the data operated on by most read transactions corresponds to the same attribute or to different attributes. Consistent with the examples above, when the data operated on by most read transactions corresponds to the same attribute, the data of the service can be considered suitable for storage by column; when the data operated on by most read transactions corresponds to different attributes, the data of the service can be considered suitable for storage by row.

It can be understood that a correspondence between services and query feature information can be established. Before the Memtable is transferred to the database, the service to which the Memtable belongs is determined, and the query feature information corresponding to that service is then looked up in the correspondence. Once the query feature information is determined, it is used to decide whether to perform storage format conversion on the Memtable. If conversion is needed, storage format conversion is performed on the Memtable; if not, the Memtable is transferred to the database directly.
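The lookup described above can be sketched as a small table from service to query feature information. In this sketch the query feature information is reduced to the layout that suits the service, the service names are hypothetical, and treating an unknown service as row-stored is an assumption.

```python
from enum import Enum
from typing import Dict


class Layout(Enum):
    ROW = "row"
    COLUMN = "column"


# Hypothetical correspondence between services and their query feature
# information, reduced here to the layout that suits each service's reads.
SERVICE_LAYOUT: Dict[str, Layout] = {
    "score-statistics": Layout.COLUMN,
    "student-lookup": Layout.ROW,
}


def needs_format_conversion(service: str) -> bool:
    """Decide whether a Memtable of this service is converted before storage."""
    # Assumption: an unknown service keeps the in-memory row layout.
    layout = SERVICE_LAYOUT.get(service, Layout.ROW)
    # In-memory data is row-stored, so only column storage needs conversion.
    return layout is Layout.COLUMN
```

For example, `needs_format_conversion("score-statistics")` returns `True`, so that service's Memtable would be regrouped into column blocks before being written.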

With the method provided by this embodiment of the present invention, data stored by row in memory can be converted as needed into data blocks stored by column, and the converted data blocks are stored in the columnar database, which improves the efficiency of data queries.

The data storage device of one or more embodiments of the present invention is described in detail below. Those skilled in the art will understand that these data storage devices can be built from commercially available hardware components configured according to the steps taught in this solution.

FIG. 6 is a schematic structural diagram of a data storage device provided by an embodiment of the present invention. As shown in FIG. 6, the device includes a writing module 601, a generating module 602, and a copying module 603.

The writing module 601 is configured to write first data into memory, the first data being data to be stored in a database.

The generating module 602 is configured to generate first record information when the first data written in memory reaches a preset data amount. The first record information includes first identification information and second identification information. The first identification information is the identification information corresponding, in memory, to the first data of the preset data amount, and the second identification information is the identification information corresponding, in the database, to second data already stored in the database.

The copying module 603 is configured to copy the first data of the preset data amount to the database.

The generating module 602 is further configured to generate second record information. The second record information includes the second identification information and third identification information corresponding to the first data of the preset data amount. The third identification information is the identification information corresponding, in the database, to the first data of the preset data amount.

The processing module 604 is configured to process data read/write transactions according to the first record information and the second record information.

Optionally, the copying module 603 is specifically configured to: perform storage format conversion on the first data of the preset data amount; and copy the data that has undergone storage format conversion to the database.

Optionally, the first data of the preset data amount includes multiple rows of data, each row including attribute values respectively corresponding to multiple attributes, and the copying module 603 is further configured to: combine the attribute values corresponding to the same attribute in the multiple rows of data to obtain multiple data blocks; determine the respective storage locations of the multiple data blocks in the database, the storage locations corresponding to the multiple data blocks being located in the same row; and store the multiple data blocks at their respective storage locations.

Optionally, the copying module 603 is further configured to: compress the first data of the preset data amount; and copy the compressed data to the database.

Optionally, the device further includes a statistics module, configured to: obtain the data statistics information corresponding to each of the multiple data blocks; and store the data statistics information of each data block at the storage location corresponding to that data block.

Optionally, the device further includes a filtering module, configured to, in response to a data read/write transaction corresponding to the multiple data blocks, filter out data blocks that do not satisfy the data read/write transaction according to the data statistics information.

Optionally, the device further includes a determining module, configured to: determine the query feature information of the service corresponding to the first data of the preset data amount; and determine, according to the query feature information, whether to perform storage format conversion on the first data of the preset data amount.

Optionally, the processing module 604 is specifically configured to: receive a data read/write transaction; determine, according to the reception time of the data read/write transaction and the respective generation times of the first record information and the second record information, reference record information that matches the data read/write transaction, the reference record information being the first record information or the second record information; and determine, based on the identification information contained in the reference record information, the data to be accessed to execute the data read/write transaction.

Optionally, the processing module 604 is specifically configured to: if the reception time of the data read/write transaction is before the generation time of the second record information, determine that the reference record information is the first record information; and if the reception time of the data read/write transaction is after the generation time of the second record information, determine that the reference record information is the second record information.

Optionally, the reference record information is the first record information, and the processing module 604 is further configured to delete the first record information and the first data of the preset data amount in memory when the data read/write transaction is committed.
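The time-based choice of reference record information can be sketched as follows. The `RecordInfo` type, the string identifiers, and the numeric timestamps are hypothetical. The text covers only reception times strictly before or after the generation time of the second record information; treating an equal time as "after" is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RecordInfo:
    identifiers: Tuple[str, ...]  # identification information held by the record
    generated_at: float           # generation time of the record


def select_reference_record(
    received_at: float, first: RecordInfo, second: RecordInfo
) -> RecordInfo:
    """Pick the record information that matches a transaction's reception time."""
    if received_at < second.generated_at:
        # Received before the copy was recorded: read the Memtable in memory
        # together with the data already in the database.
        return first
    # Received afterwards (or at the same instant, by assumption): read the
    # database only, including the newly copied data.
    return second


first_record = RecordInfo(("memtable:rows-1000-1005", "rowgroups:0-3"), generated_at=1.0)
second_record = RecordInfo(("rowgroups:0-3", "rowgroup:4"), generated_at=10.0)

early = select_reference_record(5.0, first_record, second_record)
late = select_reference_record(12.0, first_record, second_record)
```

Here `early` is the first record information and `late` is the second, so a transaction that began before the copy finished keeps a consistent view of the in-memory data.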

The device shown in FIG. 6 can execute the data storage method provided in the embodiments shown in FIG. 1 to FIG. 5. For the detailed execution process and technical effects, refer to the descriptions in the foregoing embodiments, which are not repeated here.

In one possible design, the structure of the data storage device shown in FIG. 6 can be implemented as a server. As shown in FIG. 7, the server may include a processor 701 and a memory 702. The memory 702 stores executable code which, when executed by the processor 701, enables the processor 701 to implement at least the data storage method provided in the embodiments shown in FIG. 1 to FIG. 5.

Optionally, the server may further include a communication interface 703 for communicating with other devices.

In addition, an embodiment of the present invention provides a non-transitory machine-readable storage medium storing executable code which, when executed by a processor of a server, enables the processor to implement at least the data storage method provided in the embodiments shown in FIG. 1 to FIG. 5.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, and can of course also be implemented by a combination of hardware and software. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the prior art, can be embodied in the form of a computer product. The present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.

The data storage method provided by the embodiments of the present invention may be executed by a program or software, which may be provided by the network side. The electronic device mentioned in the foregoing embodiments can download the program or software to a local non-volatile storage medium. When the device needs to execute the data storage method, the CPU reads the program or software into memory and then executes it to implement the data storage method provided in the foregoing embodiments. For the execution process, refer to FIG. 1 to FIG. 5.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A data storage method, the method comprising:

writing first data into a memory, the first data being data to be stored in a database;

if the first data written in the memory reaches a preset data amount, generating first record information, wherein the first record information includes first identification information and second identification information, the first identification information being identification information corresponding, in the memory, to the first data of the preset data amount, and the second identification information being identification information corresponding, in the database, to second data already stored in the database;

copying the first data of the preset data amount to the database;

generating second record information, wherein the second record information includes the second identification information and third identification information corresponding to the first data of the preset data amount, the third identification information being identification information corresponding, in the database, to the first data of the preset data amount;

receiving a data read/write transaction, and determining the data to be accessed to execute the data read/write transaction according to the reception time of the data read/write transaction and the respective generation times of the first record information and the second record information; and

processing the data read/write transaction according to the data to be accessed.

2. The method according to claim 1, wherein the copying of the first data of the preset data amount to the database comprises:

performing storage format conversion on the first data of the preset data amount;

copying the data that has undergone storage format conversion to the database.

3. The method according to claim 2, wherein the first data of the preset data amount comprises multiple rows of data, each row of data including attribute values respectively corresponding to a plurality of attributes, and the performing storage format conversion on the first data of the preset data amount comprises:

combining attribute values corresponding to the same attribute in the multiple rows of data to obtain multiple data blocks;

and the copying the data that has undergone storage format conversion to the database comprises:

determining the respective storage locations of the multiple data blocks in the database, the storage locations corresponding to the multiple data blocks being located in the same row; and

storing the multiple data blocks at their respective storage locations.

4. The method of claim 3, further comprising:

obtaining the data statistics information corresponding to each of the multiple data blocks; and

storing the data statistics information corresponding to each of the multiple data blocks at the storage location corresponding to that data block.

5. The method of claim 4, further comprising:

in response to a data read/write transaction corresponding to the multiple data blocks, filtering out, according to the data statistics information, data blocks that do not satisfy the data read/write transaction.

6. The method of claim 2, further comprising:

determining the query feature information of the service corresponding to the first data of the preset data amount; and

determining, according to the query feature information, whether to perform storage format conversion on the first data of the preset data amount.

7. The method according to claim 1, wherein the copying of the first data of the preset data amount to the database comprises:

compressing the first data of the preset data amount;

copying the compressed data to the database.

8. The method according to any one of claims 1 to 7, wherein the determining the data to be accessed to execute the data read/write transaction according to the reception time of the data read/write transaction and the respective generation times of the first record information and the second record information comprises:

determining, according to the reception time of the data read/write transaction and the respective generation times of the first record information and the second record information, reference record information that matches the data read/write transaction, the reference record information being the first record information or the second record information; and

determining, based on the identification information contained in the reference record information, the data to be accessed to execute the data read/write transaction.

9. The method according to claim 8, wherein the determining of the reference record information matching the data read/write transaction comprises:

if the reception time of the data read/write transaction is before the generation time of the second record information, determining that the reference record information is the first record information;

if the reception time of the data read/write transaction is after the generation time of the second record information, determining that the reference record information is the second record information.
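The selection rule of claims 8 and 9 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `RecordInfo` type, the `pick_reference` function, the timestamps and the identifier strings are all hypothetical. The claims do not cover a transaction received at exactly the generation time of the second record information; the sketch assumes such a transaction uses the second record information.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RecordInfo:
    name: str
    generated_at: float              # generation time of this record information
    identifiers: Tuple[str, ...]     # identification information it contains


def pick_reference(received_at: float,
                   first: RecordInfo,
                   second: Optional[RecordInfo]) -> RecordInfo:
    """Choose the record information that matches a transaction.

    A transaction received before the second record information was generated
    (or while it does not exist yet) uses the first record information.
    Otherwise it uses the second. The tie case is an assumption of this sketch.
    """
    if second is None or received_at < second.generated_at:
        return first
    return second


# First record: first data still in memory, second data already in the database.
first = RecordInfo("first", 10.0, ("mem:first-data", "db:second-data"))
# Second record: both the second data and the copied first data are in the database.
second = RecordInfo("second", 20.0, ("db:second-data", "db:first-data"))

early = pick_reference(15.0, first, second)   # received during the copy
late = pick_reference(25.0, first, second)    # received after the copy

print(early.name, early.identifiers)
print(late.name, late.identifiers)
```

The identifiers in the chosen record information tell the transaction where to find its data, so a transaction that started during the copy keeps reading the in-memory batch while later transactions read the copy in the database.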

10. The method according to claim 9, wherein the reference record information is the first record information, the method further comprising:

if the data read/write transaction is committed, deleting the first record information and the first data of the preset data amount in the memory.

11. A data storage device comprising:

a writing module, configured to write first data into the memory, where the first data is the data to be stored in the database;

a generating module, configured to generate first record information when the first data written into the memory reaches a preset data amount, the first record information including first identification information and second identification information, wherein the first identification information is the identification information corresponding to the first data of the preset data amount in the memory, and the second identification information is the identification information, in the database, corresponding to the second data stored in the database;

a copying module, configured to copy the first data of the preset data amount to the database;

the generating module being further configured to generate second record information, the second record information including the second identification information and third identification information corresponding to the first data of the preset data amount, wherein the third identification information is the identification information corresponding to the first data of the preset data amount in the database;

a processing module, configured to receive a data read/write transaction; determine, according to the reception time of the data read/write transaction and the respective generation times of the first record information and the second record information, the data to be accessed to execute the data read/write transaction; and process the data read/write transaction according to the data to be accessed.

12. A server, comprising: a memory and a processor; wherein executable code is stored on the memory, and when the executable code is executed by the processor, the processor is caused to execute the data storage method according to any one of claims 1-10.

13. A non-transitory machine-readable storage medium having executable code stored thereon, which, when executed by a processor of a server, causes the processor to execute the data storage method according to any one of claims 1-10.