CN117873405A - Data storage method, device, computer equipment and storage medium - Google Patents

Info

Publication number
CN117873405A
Authority
CN
China
Prior art keywords
data
log
data page
partition
page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410271793.4A
Other languages
Chinese (zh)
Inventor
葛凯凯
邬沛君
李志阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202410271793.4A
Publication of CN117873405A
Legal status: Pending

Abstract

The present application relates to a data storage method, apparatus, computer device, storage medium and computer program product, which can be applied to various scenarios such as cloud technology, artificial intelligence, intelligent traffic and assisted driving. The method comprises the following steps: acquiring log data that records update changes generated by data modification in a database; determining a data page partition according to the data page identifier carried by the log data, the data page partition being a partition in the data page virtual memory that stores the data page identified by the data page identifier; performing log playback, based on the log data, on the data page identified by the data page identifier stored in the data page partition, to obtain the corresponding updated data page; and storing the updated data page and the log data into at least one physical memory corresponding to the data page partition. The method can improve the processing efficiency of data storage.

Description

Data storage method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technology, and in particular, to a data storage method, apparatus, computer device, storage medium, and computer program product.
Background
With the development of computer technology, database technology, which studies how to organize and store data and how to acquire and process it efficiently, has developed rapidly and been widely applied. A database is a warehouse that organizes, stores and manages data according to a data structure. To meet the demands of compute-storage separation and scalability, distributed storage is often adopted in place of storage on the local disk of a single machine, so that data is stored in a distributed manner.
However, when data in a database is stored in a distributed manner, a large amount of data is transmitted through a network, which occupies a large amount of storage bandwidth and affects the processing efficiency of data storage.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data storage method, apparatus, computer device, computer-readable storage medium, and computer program product that can improve the efficiency of data storage processing.
In a first aspect, the present application provides a data storage method. The method comprises the following steps:
acquiring log data for recording data update changes in a database;
determining a data page partition according to the data page identifier carried by the log data, the data page partition being a partition in the data page virtual memory that stores the data page identified by the data page identifier;
performing log playback, based on the log data, on the data page identified by the data page identifier stored in the data page partition, to obtain the corresponding updated data page;
and storing the updated data page and the log data into at least one physical memory corresponding to the data page partition.
In a second aspect, the present application also provides a data storage device. The device comprises:
the log data acquisition module is used for acquiring log data for recording data update changes in the database;
the data page partition determining module is used for determining a data page partition according to the data page identifier carried by the log data, the data page partition being a partition in the data page virtual memory that stores the data page identified by the data page identifier;
the log playback module is used for performing log playback, based on the log data, on the data page identified by the data page identifier stored in the data page partition, to obtain the corresponding updated data page;
and the data drop processing module is used for storing the updated data page and the log data into at least one physical memory corresponding to the data page partition.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps of the above data storage method when the processor executes the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the above data storage method.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of the above data storage method.
According to the above data storage method, apparatus, computer device, storage medium and computer program product, for the obtained log data that records data update changes in the database, a data page partition in the data page virtual memory is determined according to the data page identifier carried by the log data; log playback is performed, based on the log data, on the data page identified by the data page identifier stored in the data page partition; and the updated data page obtained by log playback, together with the log data, is stored in at least one physical memory corresponding to the data page partition, thereby storing the data of the database. Because the updated data page is obtained by replaying the log data against the data page stored in the data page partition, the data of the database can be stored by transmitting only the log data. This reduces the amount of data transmitted, lowers the storage bandwidth occupancy, and improves the processing efficiency of data storage.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings that are required to be used in the embodiments or the related technical descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to the drawings without inventive effort for a person having ordinary skill in the art.
FIG. 1 is a diagram of an application environment for a data storage method in one embodiment;
FIG. 2 is a flow diagram of a method of data storage in one embodiment;
FIG. 3 is a schematic block diagram of a data storage method in one embodiment;
FIG. 4 is a schematic block diagram of obtaining updated pages of data in one embodiment;
FIG. 5 is a flow diagram illustrating determination of a data page partition in one embodiment;
FIG. 6 is a schematic diagram of a prior art storage separation data organization;
FIG. 7 is a schematic diagram of a database data organization distribution in one embodiment;
FIG. 8 is a schematic diagram of a multi-piece device joint partition organization in one embodiment;
FIG. 9 is a diagram of a data page disk and log disk joint characterization in one embodiment;
FIG. 10 is a diagram of disk partition partitioning in one embodiment;
FIG. 11 is a diagram of data page numbers associated with partitions in one embodiment;
FIG. 12 is a diagram of a log file rotation process in one embodiment;
FIG. 13 is a schematic diagram of log playback processing for log streams in one embodiment;
FIG. 14 is a diagram of the physical distribution of partition data in one embodiment;
FIG. 15 is a schematic diagram of a log space snapshot-based rotation in one embodiment;
FIG. 16 is a diagram of journal disk partition redirection in one embodiment;
FIG. 17 is a block diagram of a data storage device in one embodiment;
FIG. 18 is an internal structural view of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
Cloud technology refers to a hosting technology that unifies hardware, software, network and other resources in a wide area network or a local area network to realize the computation, storage, processing and sharing of data. Cloud technology is also a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like that are applied under the cloud computing business model; these can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support. The background services of technical network systems, such as video websites, picture websites and portal websites, require a large amount of computing and storage resources. With the rapid development and application of the internet industry, each item may in the future have its own identification mark, which needs to be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industry data require strong system support, which can only be achieved through cloud computing.
Cloud storage is a new concept that extends and develops from the concept of cloud computing. A distributed cloud storage system (hereinafter referred to as a storage system) is a storage system that, through functions such as cluster application, grid technology and a distributed storage file system, uses application software or application interfaces to make a large number of storage devices of various types in a network (storage devices are also referred to as storage nodes) work cooperatively, so as to provide data storage and service access functions externally. At present, the storage method of the storage system is as follows: when logical volumes are created, each logical volume is allocated a physical storage space, which may be composed of the disks of one storage device or of several storage devices. A client stores data on a logical volume, that is, the data is stored on a file system. The file system divides the data into a plurality of parts, each of which is an object; an object contains not only the data but also additional information such as a data identification (ID). The file system writes each object into the physical storage space of the logical volume and records the storage location information of each object, so that when the client requests access to the data, the file system can let the client access the data according to the storage location information of each object.
The process by which the storage system allocates physical storage space for a logical volume is specifically as follows: the physical storage space is divided into stripes in advance according to an estimate of the capacity of the objects to be stored on the logical volume (the estimate often leaves a large margin relative to the capacity of the objects actually stored) and the group of the redundant array of independent disks (RAID, Redundant Array of Independent Disks); a logical volume can be understood as a stripe, and physical storage space is thereby allocated for the logical volume.
The Database (Database), which can be considered as an electronic filing cabinet, is a place for storing electronic files, and users can perform operations such as adding, inquiring, updating, deleting and the like on the data in the files. A "database" is a collection of data stored together in a manner that can be shared with multiple users, with as little redundancy as possible, independent of the application.
A database management system (DBMS) is a computer software system designed for managing databases, and generally provides basic functions such as storage, retrieval, security and backup. Database management systems may be classified by the database model they support, e.g., relational or XML (Extensible Markup Language); by the type of computer supported, e.g., server cluster or mobile phone; by the query language used, e.g., SQL (Structured Query Language) or XQuery; by performance emphasis, e.g., maximum scale or maximum speed; or by other criteria. Regardless of the classification used, some DBMSs span categories, for example by supporting multiple query languages simultaneously.
The data storage method provided by the embodiment of the application can be applied to an application environment shown in fig. 1. The application environment includes various computer devices, and specifically, the first terminal 102 communicates with the first server 104 through a network, and the first data storage system may store data that needs to be processed by the first server 104. The first data storage system may be provided separately, may be integrated on the first server 104, or may be placed on a cloud or other server. The second terminal 106 communicates with the first data storage system and the second server 108, respectively, via a network, and the second data storage system may store data that the second server 108 needs to process. The second data storage system may be provided separately, may be integrated with the second server 108, or may be located on the cloud or other server.
Various applications may be installed on the first terminal 102 to generate various application data during the process of running the various applications, such as various data including video data, text data, or image data, which may be generated by the various applications of the first terminal 102; the first server 104 may be a background server corresponding to various applications of the first terminal 102 for providing various application corresponding functions. The first data storage system may include a database for storing various application data. The database in the first data storage system may send log data to be separately stored to the second server 108 through the second terminal 106, where the log data is used to record a data update change of the database in the first data storage system. For the obtained log data, the second server 108 determines a data page partition in the data page virtual memory according to the data page identifier carried by the log data, performs log playback on the data page identified by the data page identifier stored in the data page partition based on the log data, and stores the updated data page obtained by the log playback and the log data in at least one physical memory corresponding to the data page partition, for example, in the physical memory of the second data storage system, thereby realizing separate storage of the data in the first data storage system.
The first terminal 102 and the second terminal 106 may be, but not limited to, various desktop computers, notebook computers, smart phones, tablet computers, aircrafts, internet of things devices and portable wearable devices, and the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, etc. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The first server 104 and the second server 108 may be independent physical servers, may be a server cluster or a distributed system formed by a plurality of physical servers, and may also be cloud servers for providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, basic cloud computing services such as big data and artificial intelligence platforms, and the like. The terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited herein. The embodiments of the present application may be applied to various scenarios including, but not limited to, cloud technology, artificial intelligence, intelligent transportation, assisted driving, and the like.
In an exemplary embodiment, as shown in fig. 2, a data storage method is provided. The method is performed by a computer device; specifically, it may be performed by a terminal or a server alone, or by the terminal and the server together. In this embodiment, the method is described as applied to the computer device in fig. 1 and includes the following steps 202 to 208:
Step 202, obtaining log data for recording data update changes in a database.
A database is a large collection of data that is stored in a computer over the long term, organized, sharable and uniformly managed; data can be stored in and deleted from it. Log data records update changes of the data in the database. Specifically, it may record data update information for the data content in the database, which may include, but is not limited to, modification information such as adding, modifying and deleting data. Log data may be generated whenever data in the database is updated, so as to record the update change.
Specifically, the computer device may obtain log data generated by the database; the log data accurately records the update changes of each piece of data in the database. For example, the computer device may obtain log data of a MySQL database, which the MySQL database may transmit to the computer device through a data transmission interface.
Step 204, determining a data page partition according to the data page identifier carried by the log data; the data page partition is a partition in the data page virtual memory for storing data pages identified by the data page identification.
The data page identifier identifies a data page (Page). A data page is the basic unit of data storage in a database and has a default size of 16 KB (kilobytes). In computer applications, the basic unit of interaction between disk and memory is a data page: at least 16 KB of content is read from disk into memory at a time, and at least 16 KB of content in memory is flushed to disk at a time. In a database, whether one row or multiple rows are read, the data pages in which those rows are located are loaded. That is, the basic unit in which the database manages storage space is a data page, the minimum unit of a database I/O (input/output) operation is a page, and one page can store multiple row records. Different data pages can be distinguished by different data page identifiers, for example by different data page numbers or data page names.
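The page granularity described above can be illustrated with a short sketch. It only assumes the 16 KB default page size stated in the text; the function names `page_number` and `pages_touched` are hypothetical and introduced purely for illustration.

```python
PAGE_SIZE = 16 * 1024  # 16 KB default data page size, as stated in the text


def page_number(byte_offset: int) -> int:
    """Return the number of the page that contains the given byte offset."""
    return byte_offset // PAGE_SIZE


def pages_touched(byte_offset: int, length: int) -> range:
    """Return the page numbers that must be loaded to read `length` bytes.

    Even a read of a few bytes loads every whole page it overlaps,
    because the page is the minimum unit of I/O.
    """
    if length <= 0:
        return range(0)
    first = page_number(byte_offset)
    last = page_number(byte_offset + length - 1)
    return range(first, last + 1)
```

Under these assumptions, a 200-byte read that starts at offset 16300 straddles a page boundary and therefore loads two full pages (32 KB) even though only 200 bytes were requested.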
The data page partition is a partition for storing the data page identified by the data page identification, and the data page partition is a partition divided in the data page virtual memory. The data page virtual memory is a logical virtual disk allocated by the computer device according to the need, and can be used for storing data. By partitioning the data page virtual memory, individual partitions may be obtained, each of which may be used to store a different data page.
Optionally, the computer device may determine the data page identifier carried by the log data. Different log data may record data update changes of different data pages, and each data page is identified by its data page identifier. The data page identifier carried by the log data therefore indicates the data page to which the recorded data update change applies. In some embodiments, the computer device may parse the log data, for example by parsing its individual fields, to determine the carried data page identifier from the data page identifier field. The computer device determines a data page partition from the data page virtual memory according to the data page identifier. Specifically, the computer device may query the data page virtual memory according to the data page identifier to find the partition that stores the data page identified by the data page identifier, and determine that partition as the data page partition.
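The lookup in step 204 can be sketched as follows. This is a minimal illustration, not the patented implementation: the classes `DataPagePartition` and `DataPageVirtualMemory`, the dictionary-shaped log record and its `page_id` field are all assumptions made for the example, and the linear search stands in for whatever query mechanism an actual system uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataPagePartition:
    """One partition of the data page virtual memory (hypothetical model)."""
    partition_id: int
    pages: Dict[int, bytes] = field(default_factory=dict)  # page id -> page content


@dataclass
class DataPageVirtualMemory:
    """Logical virtual disk divided into partitions (hypothetical model)."""
    partitions: List[DataPagePartition]

    def find_partition(self, page_id: int) -> DataPagePartition:
        """Return the partition that stores the page with the given identifier."""
        for partition in self.partitions:
            if page_id in partition.pages:
                return partition
        raise KeyError(f"no partition stores data page {page_id}")


def page_id_of(log_record: dict) -> int:
    """Read the data page identifier field carried by a log record."""
    return log_record["page_id"]


virtual_memory = DataPageVirtualMemory([
    DataPagePartition(0, {1: b"page-1", 2: b"page-2"}),
    DataPagePartition(1, {7: b"page-7"}),
])
record = {"page_id": 7, "op": "delete", "key": "m"}  # hypothetical log record
partition = virtual_memory.find_partition(page_id_of(record))
```

Here the log record carries page identifier 7, so the partition with `partition_id` 1 is selected as the data page partition.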
Step 206, performing log playback, based on the log data, on the data page identified by the data page identifier stored in the data page partition, to obtain the corresponding updated data page.
Log playback refers to the process of updating the corresponding data page according to the modification information recorded in the log data. The updated data page is the result of applying the modification information recorded in the log data to the data page. For example, suppose that for data page A, log data 1 includes modification information for deleting data m in data page A. When log playback is performed on data page A according to log data 1, the computer device deletes data m from data page A according to that modification information and obtains an updated data page A', in which, compared with data page A, data m has been deleted.
For example, the computer device may perform log playback on the data page stored in the data page partition based on the log data. Specifically, the computer device may update the data page stored in the data page partition according to the modification information in the log data, so that the updated data page is the result of applying that modification information to the data page identified by the data page identifier. In a specific implementation, the computer device may update the data page stored in the data page partition directly according to the log data to obtain the updated data page. In some embodiments, the computer device may read the stored data page from the data page partition, update the read data page according to the log data, and store the updated data page into the data page partition to overwrite the original data page.
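The data page A example can be written out as a small sketch. The page is modeled here as a mapping from row key to row value, and the operation names `insert`, `update` and `delete` are assumptions chosen for illustration; the text does not specify the actual page layout or log format.

```python
from typing import Dict, Optional

Page = Dict[str, str]  # hypothetical page model: row key -> row value


def replay(page: Page, op: str, key: str, value: Optional[str] = None) -> Page:
    """Apply one logged modification to a page and return the updated page."""
    updated = dict(page)  # work on a copy so the input page is left untouched
    if op in ("insert", "update"):
        if value is None:
            raise ValueError(f"operation {op!r} requires a value")
        updated[key] = value
    elif op == "delete":
        updated.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")
    return updated


# Data page A contains data m; log data 1 records the deletion of m.
page_a = {"k": "v", "m": "x"}
page_a_updated = replay(page_a, "delete", "m")  # corresponds to page A'
```

Only the small modification record needs to reach the storage side; the updated page A' is reconstructed locally from page A.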
Step 208, storing the updated data page and log data in at least one physical memory corresponding to the data page partition.
The physical memory refers to a truly existing memory, and is not a logically allocated virtual disk, and may specifically include a truly existing physical disk, such as a floppy disk, a hard disk, and various disks. The physical memory may comprise at least one, each physical memory being separately arranged with respect to the database, i.e. relatively independent between data processing in the database and data storage in the physical memory.
Optionally, the computer device may store the log data and the updated data page in at least one physical memory corresponding to the data page partition. In a specific implementation, the physical memories may be arranged in correspondence with the data page partitions, that is, different physical memories may be used for storing the data pages and log data of different data page partitions. For example, the computer device may determine the at least one physical memory corresponding to the data page partition according to the partition identifier of the data page partition and a preset mapping relationship, and store the obtained log data and the updated data page in the determined physical memory.
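A minimal sketch of step 208 follows, assuming the preset mapping relationship is a simple table from partition identifier to physical memory names. The table contents, the disk names and the in-memory stand-ins for disks are all hypothetical, and writing to every mapped memory is one possible policy rather than something the text prescribes.

```python
from typing import Dict, List

# Hypothetical preset mapping: partition identifier -> physical memory names.
PARTITION_TO_PHYSICAL: Dict[int, List[str]] = {
    0: ["disk-a", "disk-b"],
    1: ["disk-c"],
}

# In-memory stand-ins for physical disks: name -> stored pages and logs.
physical_memories: Dict[str, Dict[str, list]] = {
    name: {"pages": [], "logs": []} for name in ("disk-a", "disk-b", "disk-c")
}


def persist(partition_id: int, updated_page: bytes, log_record: bytes) -> List[str]:
    """Store the updated page and its log data on each mapped physical memory."""
    targets = PARTITION_TO_PHYSICAL[partition_id]
    for name in targets:
        physical_memories[name]["pages"].append(updated_page)
        physical_memories[name]["logs"].append(log_record)
    return targets


written = persist(0, b"page-1-v2", b"log-1")
```

In this sketch the page and the log record for partition 0 land on both `disk-a` and `disk-b`, while `disk-c`, which is mapped to a different partition, is left untouched.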
In a specific application, as shown in fig. 3, the computer device obtains, from the database, log data for recording changes in data updates in the database, where the log data carries a data page identifier. The computer device determines a data page partition from a data page virtual memory (characterized as virtual memory in dashed lines in the figure) based on the data page identification, the data page partition for storing the data page identified by the data page identification. And the computer equipment performs log playback on the data pages based on the log data to obtain the corresponding updated data pages. The computer device stores the resulting updated data page and log data into a physical memory (represented as real memory in solid lines in the figure) corresponding to the data page partition, which may include at least one. In the storage processing of the data in the database, the computer equipment can acquire the log data for recording the update change of the data, log playback is performed on the data pages stored in the data page partition in the data page virtual memory based on the log data, updated data pages are obtained, the updated data pages and the log data are stored in the physical memory, and the storage of the data in the database can be realized. In the data storage process, updated data pages are not required to be transmitted from a database, but log playback is performed based on log data, so that the transmitted data volume is reduced, the occupancy rate of the storage bandwidth can be reduced, and the processing efficiency of data storage is improved.
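The flow of steps 202 to 208 can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the implementation described in the application: the dictionary-based virtual memory, the `partition_to_disks` table, the log record fields and the function name `store` are all hypothetical.

```python
from typing import Dict, List, Tuple

Page = Dict[str, str]  # hypothetical page model: row key -> row value

# Data page virtual memory: partition id -> {page id -> page}.
virtual_memory: Dict[int, Dict[int, Page]] = {
    0: {1: {"a": "1"}},
    1: {7: {"k": "v", "m": "x"}},
}
# Hypothetical preset mapping from partition id to physical memory names.
partition_to_disks: Dict[int, List[str]] = {0: ["disk-a"], 1: ["disk-b", "disk-c"]}
# Stand-ins for physical memories: name -> list of (page id, page, log record).
disks: Dict[str, List[Tuple[int, Page, dict]]] = {
    "disk-a": [], "disk-b": [], "disk-c": [],
}


def store(log_record: dict) -> Page:
    """Process one log record obtained in step 202 through steps 204 to 208."""
    page_id = log_record["page_id"]
    # Step 204: find the partition that stores the identified page.
    partition_id = next(
        pid for pid, pages in virtual_memory.items() if page_id in pages
    )
    # Step 206: log playback on a copy of the stored page, then overwrite it.
    page = dict(virtual_memory[partition_id][page_id])
    if log_record["op"] == "delete":
        page.pop(log_record["key"], None)
    else:  # treat any other operation as an insert or update
        page[log_record["key"]] = log_record["value"]
    virtual_memory[partition_id][page_id] = page
    # Step 208: store the updated page and the log data on the mapped disks.
    for name in partition_to_disks[partition_id]:
        disks[name].append((page_id, page, log_record))
    return page


updated = store({"page_id": 7, "op": "delete", "key": "m"})
```

Note that the only input crossing the network in this sketch is the log record; the full page never leaves the storage side, which is the source of the bandwidth saving described above.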
In the above data storage method, for the obtained log data that records data update changes in the database, the computer device determines a data page partition in the data page virtual memory according to the data page identifier carried by the log data, performs log playback, based on the log data, on the data page identified by the data page identifier stored in the data page partition, and stores the updated data page obtained by log playback, together with the log data, into at least one physical memory corresponding to the data page partition, thereby storing the data of the database. Because the updated data page is obtained by replaying the log data against the data page stored in the data page partition, the data of the database can be stored by transmitting only the log data. This reduces the amount of data transmitted, lowers the storage bandwidth occupancy, and improves the processing efficiency of data storage.
In an exemplary embodiment, performing log playback, based on the log data, on the data page identified by the data page identifier stored in the data page partition to obtain the corresponding updated data page includes: determining, based on the log data, data update information for the data page identified by the data page identifier; and updating the data page according to the data update information to obtain the updated data page.
The data update information may include various modification information for the data page, such as adding, deleting, modifying, etc., for the data page. In particular, the computer device may determine data update information based on the log data, which may include various modification information for the data page identified by the data page identification. In some embodiments, the computer device may parse the log data, such as may parse various fields of the log data, to extract data update information from the log data for the data page identified by the data page identification. The computer device may update the data page according to the data update information, and may specifically modify the data in the data page according to the modification information in the data update information, so as to obtain an updated data page, where the updated data page is a data page that needs to be stored.
In this embodiment, the computer device determines the data update information based on the log data, and updates the data page identified by the data page identifier according to the data update information, so as to implement log playback based on the log data, obtain an updated data page to be stored, avoid transmitting the data page through the network, greatly reduce the data volume transmitted by the network, and improve the processing efficiency of data storage.
In an exemplary embodiment, performing data update on a data page according to data update information to obtain an updated data page includes: reading a data page from a data page partition; updating the data in the read data page according to the data updating information to obtain an updated data page; the updated data page is written to the data page partition to replace the data page.
The data pages are stored in the data page partition of the data page virtual memory, and the computer equipment can read the data pages needing log playback from the data page partition. Specifically, the computer device may directly read the data page from the data page partition, update the data in the read data page according to the data update information, and specifically, modify the data in the data page according to the modification information recorded in the data update information, so as to obtain an updated data page. The computer device may rewrite the updated data page into the data page partition and replace the original data page to store the updated data page.
In one specific application, as shown in fig. 4, the computer device determines data update information according to the obtained log data, and updates the data page read from the data page partition according to the data update information, thereby obtaining an updated data page. Wherein the data pages are stored in data page partitions of a data page virtual memory (denoted virtual memory in the figure by dashed lines). The computer device rewrites the updated data page into the data page partition to overwrite the original data page, thereby storing the updated data page in the data page partition of the data page virtual memory.
In this embodiment, the computer device reads the data page from the data page partition, updates the read data page according to the data update information, and replaces the data page in the data page partition with the obtained updated data page, so as to update the data page version stored in the data page partition of the data page virtual memory, so that the validity of the data page data in the data page partition can be ensured, and log playback can be performed based on the data page data stored in the data page partition to directly obtain the data page to be stored, thereby being beneficial to improving the processing efficiency of data storage.
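As a non-limiting illustration, the read, update, and write-back steps described above may be sketched as follows. The record layout (`offset`, `payload`) and the names `LogRecord`, `DataPagePartition`, and `replay` are hypothetical and are not defined by this application; a real redo log record has a richer format.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LogRecord:
    page_id: int    # data page identification carried by the log data
    offset: int     # hypothetical: byte offset of the change inside the page
    payload: bytes  # hypothetical: new bytes to place at that offset


@dataclass
class DataPagePartition:
    # In-memory stand-in for one partition of the data page virtual memory.
    pages: Dict[int, bytes] = field(default_factory=dict)


def replay(partition: DataPagePartition, record: LogRecord) -> bytes:
    """Read the page, apply the logged change, and write the page back."""
    # Step 1: read the data page from the data page partition.
    page = bytearray(partition.pages[record.page_id])
    # Step 2: update the data in the read page according to the update info.
    end = record.offset + len(record.payload)
    if record.offset < 0 or end > len(page):
        raise ValueError("log record does not fit inside the data page")
    page[record.offset:end] = record.payload
    # Step 3: write the updated page to the partition, replacing the old one.
    updated = bytes(page)
    partition.pages[record.page_id] = updated
    return updated
```

In this sketch the page itself never leaves the storage side; only the small log record would need to be transmitted over the network.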
In an exemplary embodiment, as shown in fig. 5, the process of determining the data page partition, that is, determining the data page partition according to the data page identifier carried by the log data, includes:
Step 502, determining a data page identifier carried by log data.
Specifically, the computer device may parse the log data to determine a data page identifier carried by the log data, e.g., may parse the data page identifier field of the log data to obtain the data page identifier. The data page identifier is used for identifying each data page, the specific form of the data page identifier can be set according to actual needs, and the data page identifier can comprise identification information such as a data page name, a data page number and the like.
Step 504, based on the data page identification and the partition division unit, determining a partition identification corresponding to the data page identified by the data page identification.
The partition division unit refers to the unit by which the storage space is partitioned; based on the partition division unit, the corresponding storage space can be divided into a plurality of partitions. For example, the partition division unit may be N data pages, that is, the storage space is divided in units of N data pages, and the storage space corresponding to every N data pages may be divided into one partition. The partition identification is used for distinguishing each partition in the storage space, and may specifically include various identification information such as the number of the partition, the name of the partition, and the like.
For example, the computer device may determine a partition division unit and calculate a partition identification based on the data page identification and the partition division unit, the partition identification corresponding to the data page identified by the data page identification. In a specific application, the partition identification may be applied to different storage spaces, such as virtual memory and physical memory, that is, each partition is identified by the same partition identification in both the virtual memory and the physical memory. In some embodiments, the data page identifier may include a data page number M, and the partition division unit may be N data pages; the computer device may then calculate M / N + 1, that is, the data page number divided by the partition division unit, plus 1, to obtain the partition identifier corresponding to the data page identified by the data page identifier. According to the partition identification, the partition corresponding to the data page identified by the data page identification may be determined from various storage media; for example, a data page partition in the data page virtual memory may be determined, or a log partition in the log virtual memory may be determined, where the log partition is a partition in the log virtual memory for storing log data.
Step 506, determining, from the data page virtual memory based on the partition identification, a data page partition for storing the data page identified by the data page identification.
Wherein partition identification may be used to identify individual partitions in the data page virtual memory. Alternatively, the computer device may traverse each partition in the data page virtual memory by partition identification to determine a data page partition from the data page virtual memory, the data page partition for storing the data page identified by the data page identification.
In this embodiment, the computer device determines the partition identifier according to the data page identifier and the partition dividing unit carried by the log data, and determines the corresponding data page partition from the data page virtual memory according to the partition identifier, so that accurate mapping can be performed based on each partition in the data page virtual memory based on the data page identifier, and accuracy of determining the data page partition can be ensured, thereby ensuring accuracy of data storage.
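As a non-limiting illustration, steps 502 to 506 may be sketched as follows. The sketch assumes that the division M / N is an integer (floor) division and that data page numbers start at 0; neither assumption is stated in this application. The function names are hypothetical, and a dictionary lookup stands in for traversing the partitions.

```python
from typing import Dict, Tuple


def partition_id(page_number: int, pages_per_partition: int) -> int:
    """Partition identification = M / N + 1, with M / N taken as floor division."""
    if page_number < 0 or pages_per_partition <= 0:
        raise ValueError("page number must be >= 0 and partition unit must be > 0")
    return page_number // pages_per_partition + 1


def find_data_page_partition(
    data_page_memory: Dict[int, dict], page_number: int, pages_per_partition: int
) -> Tuple[int, dict]:
    """Locate the data page partition that stores the given data page."""
    pid = partition_id(page_number, pages_per_partition)
    # The virtual memory is modelled as a mapping: partition id -> partition.
    return pid, data_page_memory[pid]
```

With a partition division unit of 4 data pages, pages 0 to 3 map to partition 1 and pages 4 to 7 map to partition 2 under these assumptions.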
In one exemplary embodiment, the data storage method further comprises: storing the log data into a log partition of a log virtual memory; the log virtual memory and the data page virtual memory jointly characterize the same data storage space.
The log virtual memory is a logic virtual disk distributed by the computer equipment according to the requirement and can be used for storing data. By partitioning the log virtual memory, individual log partitions may be obtained, each of which may be used to store different log data. The journal virtual memory and the data page virtual memory jointly represent the same data storage space, namely the journal virtual memory and the data page virtual memory can be combined to carry out complete storage on data in a database. Specifically, the log virtual memory is used for storing log data, the data page virtual memory is used for storing data pages, and the log virtual memory and the data page virtual memory jointly characterize a data storage space which can be used for storing data in a database. That is, the data storage space for storing the data in the database can be logically separated into a log virtual memory and a data page virtual memory according to different storage data types, so that the log data and the data page can be respectively stored through different virtual memories.
Specifically, after obtaining the log data, the computer device may store the log data, for example, may store the log data by a pre-allocated log virtual memory. When the method is applied, the computer equipment can store the log data when receiving the log data, namely, the computer equipment can store the log data before carrying out log playback on the data page identified by the data page identification stored in the data page partition based on the log data to obtain the data page corresponding to the updated data page. In particular implementations, a computer device may determine a log virtual memory associated with a data page virtual memory and determine a log partition from the log virtual memory into which the computer device may store log data, thereby enabling storage for the log data. In some embodiments, the association between the log virtual memory and the data page virtual memory may be set in advance so that the associated other may be determined from a query of either of the log virtual memory and the data page virtual memory.
Further, based on the log data, performing log playback on the data page identified by the data page identifier stored in the data page partition, to obtain a data page corresponding to the updated data page, including: and reading the log data from the log partition, and performing log playback on the data page identified by the data page identification stored in the data page partition based on the read log data to obtain the data page corresponding to the updated data page.
Optionally, during log playback, the computer device may read corresponding log data from the log partition. Specifically, the computer device may perform data reading in the log partition, so as to obtain a log, and perform log playback on the data page identified by the data page identifier stored in the data page partition based on the log data obtained by reading, so as to obtain a corresponding updated data page. In some embodiments, the computer device may monitor whether a log playback trigger condition is satisfied, e.g., whether the number of log data reaches a trigger threshold, whether a log playback processing period is reached, etc., and in the event that the log playback trigger condition is satisfied, the computer device may trigger reading the log data from the log partition for log playback processing.
Further, storing the updated data page and log data in at least one physical memory corresponding to the data page partition, comprising: reading the updated data page from the data page partition; and storing the read log data and the updated data page into at least one physical memory corresponding to the data page partition.
After log playback based on the log data, the updated data page may be stored in the data page partition, and the computer device may read the updated data page from the data page partition. The computer device may then store the read log data and the updated data page in physical memory, specifically in at least one physical memory corresponding to the data page partition. In a specific application, the computer device may determine at least one corresponding physical memory according to the data page partition, and store the read log data and the updated data page in each physical memory respectively. When there are at least two physical memories, the read log data and the updated data page may be stored to every physical memory simultaneously.
In this embodiment, the computer device reads the log data from the log partition for log playback by storing the log data in the log partition of the log virtual memory, reads the updated data page from the data page partition, and drops the read log data and the updated data page into the physical memory, that is, stores them in the physical memory, so that accuracy of the log data and the updated data page can be ensured, and reliability of data storage is ensured.
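As a non-limiting illustration, the log playback trigger condition mentioned above (the number of log data reaching a trigger threshold, or a log playback processing period being reached) may be sketched as follows. The function name, parameter names, and default values are hypothetical.

```python
def should_replay(
    pending_logs: int,
    seconds_since_last_replay: float,
    count_threshold: int = 1000,
    period_seconds: float = 5.0,
) -> bool:
    """Return True when log playback should be triggered for a log partition."""
    if pending_logs <= 0:
        # Nothing has been stored in the log partition since the last playback.
        return False
    reached_count = pending_logs >= count_threshold
    reached_period = seconds_since_last_replay >= period_seconds
    return reached_count or reached_period
```

Either condition alone is enough to trigger playback in this sketch; an implementation could equally require only one of the two kinds of condition.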
In one exemplary embodiment, storing log data into a log partition of a log virtual memory includes: determining a log virtual memory associated with the data page virtual memory; determining a log partition from the log virtual memory according to the partition identification of the data page partition; the partition identification is determined based on the data page identification and the partition division unit; log data is stored into a log partition.
The log virtual memory and the data page virtual memory are associated; specifically, the log virtual memory and the data page virtual memory jointly represent the same data storage space, so that the data of the same database can be cooperatively stored. The partition identifier is used to identify the partition of the storage space, and may be specifically determined according to the data page identifier and the partition division unit. The log partition for storing the current log data may be determined from the log virtual memory based on the partition identification.
Optionally, the computer device may determine the log virtual memory associated with the data page virtual memory; specifically, the computer device may query a preset virtual memory map, based on which the log virtual memory associated with the data page virtual memory can be found. The computer device may determine a log partition for storing log data from the log virtual memory; specifically, the computer device may determine the partition identification of the data page partition, the partition identification being determined based on the data page identification and the partition division unit. The computer device traverses each partition in the log virtual memory according to the partition identification to determine the log partition identified by the partition identification for storing the current log data. The computer device may store the log data in the determined log partition.
In a specific implementation, the associated data page virtual memory and log virtual memory may have the same storage space size and be partitioned by the same partition division unit, resulting in the same number of partitions. An association relationship can also be set between the respective partitions of the data page virtual memory and the log virtual memory; for example, the partitions of the data page virtual memory and the log virtual memory that have the same partition identification can be associated, so that the associated partitions jointly characterize one partition of the data storage space. For example, when the storage space sizes of the data page virtual memory and the log virtual memory are both 32T, partitioning both in a partition division unit of 5 GB yields the same number of data page partitions and log partitions. The individual partitions may be distinguished by partition identification, such as by partition number. An association may be established between partitions in the data page virtual memory and the log virtual memory having the same partition number, to jointly characterize the same data storage space partition. For example, for a data page partition K and a log partition K, the data page partition K is used to store the data pages with partition number K, and the log partition K is used to store the log data with partition number K, so that the data page partition K and the log partition K can cooperatively store the complete database data with partition number K.
In this embodiment, the computer device determines a log partition from the log virtual memory according to the partition identifier of the data page partition, and stores the log data into the log partition, so that the log virtual memory stores the log data, so that the data security of the log data can be ensured, and the reliability of the data storage can be ensured.
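As a non-limiting illustration, looking up the associated log virtual memory and storing log data into the log partition that shares the partition identification of the data page partition may be sketched as follows. The class `VirtualMemory`, the function `store_log`, and the shape of the virtual memory map are hypothetical.

```python
from collections import defaultdict
from typing import Dict


class VirtualMemory:
    """In-memory stand-in for a logical virtual disk split into partitions."""

    def __init__(self, name: str) -> None:
        self.name = name
        # partition identification -> list of items stored in that partition
        self.partitions = defaultdict(list)


def store_log(
    memory_map: Dict[str, VirtualMemory],
    data_page_memory: VirtualMemory,
    partition_id: int,
    log_data: bytes,
) -> VirtualMemory:
    """Store log data in the log partition with the same partition id."""
    # Query the preset map for the log virtual memory associated with
    # the data page virtual memory.
    log_memory = memory_map[data_page_memory.name]
    # The log partition is identified by the same partition identification
    # as the data page partition.
    log_memory.partitions[partition_id].append(log_data)
    return log_memory
```

Because both virtual memories use the same partition identification, data page partition K and log partition K can be located with a single computed identifier.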
In one exemplary embodiment, the data storage method further comprises: when the storage space of the log partition is saturated, creating a snapshot which is the same as the logical address space of the log partition, and storing log data into the snapshot; and deleting the snapshot when the snapshot meets the release judging condition.
The logical address space refers to the storage space of the log partition and the snapshot in the log virtual memory. The log partition and the snapshot have the same logical address space, i.e., the snapshot does not extend the log space in the log virtual memory. A snapshot may be a copy or replica of data; for a file system, a file system snapshot is an instant copy of the file system that contains all the information of the file system at the time of snapshot generation, and is itself a complete usable copy. By creating the snapshot, the storage space available to the log partition can be expanded according to actual needs, so that normal acquisition of log data is ensured. The release determination condition is used to determine whether the created snapshot needs to be deleted to release the corresponding storage space. For example, the release determination condition may include reaching a snapshot release period, completion of data storage processing of the log data in the snapshot, and the like.
For example, the computer device may store each piece of obtained log data in the log partition. When the storage space of the log partition is saturated, indicating that the log partition cannot store further log data, the computer device may create a snapshot that has the same logical address space as the log partition. The computer device may store the acquired log data in the snapshot, and the snapshot may be redirected to physical memory, i.e., the data in the snapshot may actually be stored in physical memory. When multiple snapshots are created, the logical address space of each snapshot is the same as the logical address space of the log partition, and the snapshots remain isolated from one another in the log virtual memory. Further, the computer device may monitor whether a release determination condition for the snapshot is satisfied, and when the release determination condition is satisfied, the computer device may delete the created snapshot, thereby releasing the corresponding storage space; e.g., when the snapshot is redirected to the physical memory, the computer device may delete the snapshot from the physical memory, thereby releasing the storage space of the physical memory. For example, when the log data stored in the snapshot completes data storage, that is, the log data stored in the snapshot has already been stored in the corresponding physical memory, the computer device may consider that the release determination condition is satisfied, and delete the snapshot, thereby releasing the storage space corresponding to the snapshot.
In this embodiment, the computer device performs log data storage by creating a snapshot that is the same as the logical address space of the log partition, and performs active release on the snapshot, so that efficient acquisition and storage of log data can be ensured, thereby ensuring data security of the log data, and being beneficial to improving reliability of data storage.
In one exemplary embodiment, creating the same snapshot as the logical address space of the log partition includes: a snapshot is created based on the log partition and redirected into at least one physical memory according to a logical address space of the log partition.
For example, the computer device may create a snapshot based on the log partition, the storage space size of the snapshot is the same as the storage space size of the log partition, and the logical address space of the snapshot is the same as the logical address space of the log partition, i.e., both point to the same log partition in the log virtual memory. The computer device redirects the snapshot according to the logical address space of the log partition, and the snapshot is redirected to at least one physical memory, so that a storage space is provided for the snapshot through the at least one physical memory, and normal acquisition and storage of log data are ensured.
Further, when the snapshot satisfies the release determination condition, deleting the snapshot includes: when the log data stored in the snapshot completes corresponding data storage processing, deleting the snapshot, and releasing the storage space of the snapshot from at least one physical memory.
Optionally, the computer device may monitor the processing progress of the log data stored in the snapshot, and when it is determined that the log data stored in the snapshot completes the corresponding data storage processing, that is, when the log data stored in the snapshot are all stored in the corresponding physical memory, the computer device may consider that the release determination condition is met, so as to delete the snapshot, and the computer device may also release the storage space occupied by the snapshot in the physical memory, for example, may release the redirection relationship from the snapshot to the physical memory.
In this embodiment, the computer device redirects the snapshot to the physical memory by creating the snapshot that is the same as the logical address space of the log partition, so as to store log data through the physical memory, and dynamically release the snapshot, so that efficient acquisition and storage of the log data can be ensured, thereby ensuring data security of the log data, and being beneficial to improving reliability of data storage.
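As a non-limiting illustration, creating a snapshot when the log partition is saturated and releasing it once its log data has been stored may be sketched as follows. Capacity is counted in records rather than bytes for brevity, redirection to physical memory is not modelled, and the class and method names are hypothetical.

```python
from typing import List, Set


class LogPartition:
    """Log partition that spills into snapshots of equal capacity when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.records: List[str] = []
        # Each snapshot covers the same logical address space (same capacity).
        self.snapshots: List[List[str]] = []

    def append(self, record: str) -> str:
        """Store a record; return where it landed ('partition' or 'snapshot')."""
        if len(self.records) < self.capacity:
            self.records.append(record)
            return "partition"
        # Partition saturated: create a snapshot if none has free space.
        if not self.snapshots or len(self.snapshots[-1]) >= self.capacity:
            self.snapshots.append([])
        self.snapshots[-1].append(record)
        return "snapshot"

    def release_persisted(self, persisted: Set[str]) -> int:
        """Delete snapshots whose records are all persisted; return the count."""
        kept = [s for s in self.snapshots if not all(r in persisted for r in s)]
        released = len(self.snapshots) - len(kept)
        self.snapshots = kept
        return released
```

Here the release determination condition is the completion of data storage for every record in a snapshot; a period-based condition could be substituted.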
In one exemplary embodiment, the log data is the first log data for the data page identified by the data page identification; the data storage method further comprises the following steps: generating the data page identified by the data page identification based on the log data; and writing the generated data page into the data page partition as the updated data page.
Wherein the log data belongs to the first log data, i.e. there is no updated data page before the log data. Alternatively, the computer device may generate the corresponding data page directly based on the log data, and take the generated data page as the updated data page. The computer device may also write the updated data page directly into the data page partition of the data page virtual memory.
In this embodiment, for the first log data, the computer device directly generates an updated data page according to the first log data and writes the updated data page into the data page partition for storage, so as to ensure data security of the data page, which is beneficial to improving reliability of data storage.
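As a non-limiting illustration, handling the first log data for a data page may be sketched as follows. The sketch assumes that a newly generated page starts as zero bytes and that log data is a mapping from byte offset to new byte value; both are hypothetical simplifications, as is the function name.

```python
from typing import Dict


def replay_or_create(
    pages: Dict[int, bytes], page_id: int, changes: Dict[int, int], page_size: int = 16
) -> bytes:
    """Apply log data to a page, generating the page if this is its first log."""
    if page_id in pages:
        page = bytearray(pages[page_id])
    else:
        # First log data for this page: generate the page directly.
        page = bytearray(page_size)
    for offset, value in changes.items():
        page[offset] = value
    # Write the generated or updated page into the data page partition.
    pages[page_id] = bytes(page)
    return pages[page_id]
```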
In one exemplary embodiment, storing the updated data page and log data in at least one physical memory corresponding to the data page partition includes: determining at least two physical memories based on the distributed settings; for each of the at least two physical memories, determining a storage subspace corresponding to the data page partition from the targeted physical memory; and respectively storing the updated data page and the log data into a storage subspace of the physical memory.
Wherein the physical memories include at least two, and each physical memory is configured based on a distributed independent arrangement, such that each physical memory can store data as a block device. The updated data page and log data are stored for each physical memory. The storage subspace corresponds to the data page partition and is used for storing data related to the data page partition, and the data page and log data can be specifically included.
For example, the computer device may determine at least two physical memories based on a distributed setting in advance, and the computer device may traverse each physical memory for data storage. For each physical memory, the computer device determines a storage subspace in the physical memory, the storage subspace corresponding to the data page partition. The physical memory may be divided into a plurality of storage subspaces, each of which may store data associated with a respective data page partition. The corresponding relation between the data page partition and the storage subspace can be preset according to actual requirements. The computer device stores the updated data page and log data, respectively, into a storage subspace of the physical memory to which it is directed. In a specific implementation, the storage subspace may further divide a data page space and a log space, so as to store updated data pages through the data page space, and store log data through the log space. After traversing each physical memory, the computer device causes each physical memory to store updated data pages and log data simultaneously.
In this embodiment, the computer device stores the updated data page and the log data in the storage subspace of each physical memory, and may perform data storage through a plurality of distributed physical memories, so as to ensure reliability of data storage.
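As a non-limiting illustration, storing the updated data page and the log data into the storage subspace of each of at least two physical memories may be sketched as follows. The class `PhysicalMemory`, the function `persist`, and the dictionary layout of a storage subspace are hypothetical.

```python
from typing import Dict, List


class PhysicalMemory:
    """Stand-in for one independently configured physical memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        # subspace id -> {"pages": {page_id: bytes}, "logs": [bytes, ...]}
        self.subspaces: Dict[int, dict] = {}


def persist(
    memories: List[PhysicalMemory],
    subspace_id: int,
    page_id: int,
    updated_page: bytes,
    log_records: List[bytes],
) -> None:
    """Store the updated page and its log data in every physical memory."""
    if len(memories) < 2:
        raise ValueError("at least two physical memories are expected")
    for memory in memories:
        subspace = memory.subspaces.setdefault(subspace_id, {"pages": {}, "logs": []})
        # The subspace is further divided into a data page space and a log space.
        subspace["pages"][page_id] = updated_page
        subspace["logs"].extend(log_records)
```

After the loop completes, every physical memory holds both the updated data page and the log data for the partition.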
In one exemplary embodiment, determining a storage subspace corresponding to a data page partition from a targeted physical memory includes: mapping is carried out on the partition identifiers of the data page partitions, and a partition identifier mapping result is obtained; determining subspace identification according to the partition identification mapping result; a subspace identification identified storage subspace is determined from the targeted physical memory.
The partition identifier is used for identifying the partition of the storage space, and can be specifically determined according to the data page identifier and the partition dividing unit. The log partition for storing the current log data may be determined from the log virtual memory based on the partition identification, and the data page partition for storing the updated data page may be determined from the data page virtual memory. The mapping result of the partition identifier is a mapping result obtained by mapping the partition identifier, and specifically may include a mapping result obtained by mapping based on a linear mapping algorithm or a nonlinear mapping algorithm, for example, the mapping result of the partition identifier may include a mapping result obtained by mapping the partition identifier based on a Hash (Hash) algorithm. The subspace identification is used for identifying storage subspaces in the physical memory, different storage subspaces can correspond to different subspace identifications, and the subspace identifications can be in the specific forms of subspace numbers, subspace names and the like.
In particular, the computer device may determine the partition identification of the data page partition, which may be calculated based on the data page identification and the partition division unit. The computer device may map the partition identifier according to a preset mapping algorithm, for example, may map the partition identifier according to a consistent hash algorithm, to obtain the partition identifier mapping result. The computer device may determine the subspace identification based on the partition identification mapping result. In a specific application, a correspondence between partition identifier mapping results and subspace identifiers may be preset, and the computer device may determine the subspace identifier corresponding to the partition identifier mapping result by querying the preset correspondence. The computer device may traverse the respective storage subspaces in the physical memory to determine, from the physical memory, the storage subspace identified by the subspace identification, which is the storage space for storing the updated data page and the log data.
In this embodiment, the computer device maps the partition identifier of the data page partition, determines the subspace identifier according to the partition identifier mapping result, and determines the storage subspace identified by the subspace identifier from the physical memory, so that the data storage can be accurately located to the storage subspace for data storage by mapping the partition identifier, and the orderly management of the data storage can be ensured, and the reliability of the data storage is ensured.
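As a non-limiting illustration, mapping a partition identification to a subspace identification may be sketched as follows. For brevity the sketch uses a plain hash followed by a modulo, which is not the consistent hash algorithm mentioned above; the function name and the choice of SHA-256 are assumptions made only for this example.

```python
import hashlib


def subspace_id(partition_id: int, subspace_count: int) -> int:
    """Map a partition identification to a subspace identification."""
    if subspace_count <= 0:
        raise ValueError("subspace_count must be positive")
    # Partition identifier mapping result: a stable hash of the identifier.
    digest = hashlib.sha256(str(partition_id).encode("utf-8")).digest()
    mapping_result = int.from_bytes(digest[:8], "big")
    # Subspace identification derived from the mapping result.
    return mapping_result % subspace_count
```

A stable hash is used instead of Python's built-in `hash` so that the same partition always maps to the same subspace across processes. With a plain modulo, changing `subspace_count` remaps most partitions, which is the situation a consistent hash algorithm is intended to limit.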
The application also provides an application scene, and the application scene applies the data storage method. Specifically, the application of the data storage method in the application scene is as follows:
When the relational database MySQL stores data on a local disk, it has data files and redo log files, and data is recorded in the form of files. The redo log is used to record the modifications made to database data pages by database transaction operations. In order to meet the demands of compute-storage separation and scalability, distributed storage is often used to replace standalone local disk storage. In the scenario of using distributed block storage, the block storage often needs to be formatted into a file system so that the data organization of MySQL can be written directly without change, still divided into data files and redo log files; the file system formatted on the block storage is simply used as a local file system with larger capacity. This increases the cost of database IO (input/output) falling to disk, because the data of MySQL must pass through the file system on the block storage before reaching the block storage. In addition, the bandwidth of the block storage system is shared by data and logs; a 16KB data page (Page) is much larger than a redo log of a few tens of bytes, so the traffic of transferring one data page is equivalent to the traffic of transferring hundreds of redo logs. If the redo log is slow to fall to disk, that is, slow to be stored on the physical disk, the MySQL memory refresh (flush) speed is affected, which may reduce the performance of the database.
Specifically, as shown in fig. 6, in the locally stored MySQL database, for the MySQL database run by the Host, both its data pages (page) and logs (log) are stored to the local disk through the File System (FS). After compute-storage separation is applied to the locally stored MySQL database, in the resulting MySQL data organization using distributed block storage, the distributed block storage system can virtualize a plurality of logical block devices. For the MySQL database run by the Host, both its data pages and logs may be sent to block device 1 via the local file system. Block device 1 is mounted to a Host, which may be a physical machine or a virtual machine, and is formatted with a file system FS (this file system is generally customized, because data consistency must be ensured when a plurality of Hosts mount the device and write simultaneously). The MySQL database on the Host writes pages and logs to the FS file system and then to block device 1, and the data is finally transmitted to the block storage system over the network. The block storage system may include a plurality of block devices, and block device 1 may store the data pages and log data from the database.
However, the distributed block storage system is a black box for MySQL. As with a local disk, a file system layer needs to be added on the block device to ensure that the logic by which the original MySQL data (including page data and log data) falls to disk is unchanged, and the file system layer needs to be custom developed so that the block device can be mounted by multiple Hosts. As a result, the block storage system is unaware of MySQL, and some of MySQL's computing tasks cannot be transferred to the block storage system. In addition, after the distributed block storage system is adopted, the MySQL node and the storage system transmit data over a network, but the data organization is unchanged, so that a large number of 16KB data pages occupy storage bandwidth and the overall performance of MySQL is affected.
Based on the above, the data storage method provided by this embodiment is implemented on a database data organization based on distributed block storage. Following the idea that the log is the database, only the redo log is transmitted to the block storage system; the data is organized on the raw disks of a plurality of block devices, and the logs are stored in rotation by using snapshots of the block devices, thereby realizing a raw-disk data organization and distribution based on the distributed block storage system. With the data storage method provided by this embodiment, the storage is aware of the MySQL database: data is organized directly on the raw disks of the block devices without constructing a file system layer, the MySQL data space is organized across the raw disks of a plurality of block devices, snapshots of the block devices are used to rotate log storage, and the data is reorganized and redistributed through this data organization.
Specifically, as shown in fig. 7, in the database data organization and distribution based on distributed block storage, for the MySQL database run by the Host, MySQL only transmits log data to the block storage system through the block storage client. The log data is stored on a log disk (a virtual block device). The disk is partitioned in units of 2GB, and each partition is called a small table. Each redo log belongs to exactly one Page and falls into the small table corresponding to the Page number (data page number) to which it belongs. Log playback is then performed on the redo logs in the small table of the log disk. Log playback applies the page modifications recorded in the redo log to the page; playback of the first redo log generates the original page. The newly generated page is then written into the corresponding small table space of the page disk (data page disk). In this way, the way the database data falls to disk is reorganized. Here, a disk is a logical virtual disk allocated by the distributed block storage system.
Further, the data storage method provided by this embodiment is implemented on the database data organization of distributed block storage so as to improve MySQL performance on distributed block storage, and specifically comprises processing such as joint partition organization of a tablespace across multiple block devices, linear association of page data and log data, and log space rotation.
Regarding the joint partition organization of a tablespace across multiple block devices: the page data of a MySQL database is typically composed of multiple tablespaces, such as a data tablespace, a temporary tablespace, and an UNDO tablespace, each of which is stored as a file when local disk storage is used. In addition, redo log data is written in rotation across several log files. As shown in fig. 8, in the data organization and distribution of local storage, the data pages in local storage comprise files such as a data tablespace file, a temporary tablespace file, and an undo tablespace file, and log data may be written in rotation through log file 1, log file 2, and log file 3, which serve as the log space into which log data is written. When log file 1 is full, writing continues in log file 2; when log file 2 is full, writing continues in log file 3; and when log file 3 is full, log file 1 is overwritten, thereby realizing log file rotation.
In the data storage method provided by this embodiment, the tablespaces, page data, and log data are reorganized. Each tablespace is logically represented by two disks, which store the page data and the log data respectively. The disks are partitioned, each partition is 2GB in size, and there is a logical association between corresponding partitions of the two disks. As shown in fig. 9, the storage space of the data tablespace can be jointly realized by multiple block devices, with a logical address space of 128T. The logical address space is realized by two disks, a data page disk (page disk) and a log disk (log disk), each covering the storage space 0 to 128T. Taking the data tablespace as an example, its logical address space is 128T; the page disk and the log disk each use a 128T block device corresponding to the data tablespace, and both disks are mapped to the same 128T logical address space. That is, two block devices jointly represent one tablespace, so that page data and log data are logically separated. Further, each disk is partitioned and managed in units of 2GB, and the partitions of the data page disk and the log disk are linearly associated with each other. As shown in fig. 10, the data page disk and the log disk are each divided into N partitions of 2GB.
Further, regarding the linear association of page data and log data: the page disk and the log disk are mapped into the same 128T address space, and the small table partitions of the two disks are in one-to-one correspondence. As for the distribution of page data on the page disk, page data is composed of pages of 16KB each, each page has a globally unique number (the page number), and pages occupy the page disk in page-number order starting from address 0. As shown in fig. 11, the linear address space of a table is 128T in size and accommodates the data pages in sequence, with page numbers starting from 0. For the data organization of the data page disk, one 2GB partition holds 131072 data pages (2GB / 16KB): the first partition holds page numbers 0 to 131071, the second partition holds page numbers 131072 to 262143, and so on. The relation between a data page and its partition is therefore given by formula (1), so that the partition to which a page belongs can be determined from its page number.
Partition number = floor(page number / 131072) + 1 (1)
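As a worked illustration of formula (1), the sketch below computes the partition number with integer division, which is what the partition ranges given in the text imply. The function names are hypothetical, and `offset_in_partition` assumes that pages are laid out back to back inside a partition, which the description suggests but does not state as a formula.

```python
PAGE_SIZE = 16 * 1024              # 16KB per data page
PARTITION_SIZE = 2 * 1024 ** 3     # 2GB per partition
PAGES_PER_PARTITION = PARTITION_SIZE // PAGE_SIZE  # 131072 pages


def partition_number(page_no: int) -> int:
    """Formula (1) with integer division; partitions are numbered from 1."""
    if page_no < 0:
        raise ValueError("page number must be non-negative")
    return page_no // PAGES_PER_PARTITION + 1


def offset_in_partition(page_no: int) -> int:
    """Byte offset of the page inside its partition (assumed contiguous layout)."""
    if page_no < 0:
        raise ValueError("page number must be non-negative")
    return (page_no % PAGES_PER_PARTITION) * PAGE_SIZE
```

For example, `partition_number(131071)` falls in the first partition and `partition_number(131072)` in the second, matching the ranges stated for the data page disk.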
Further, when local disk storage is used, redo log data is numbered sequentially and written into local files. Specifically, as shown in fig. 12, for 12 pieces of log data (log) received in sequence, log data 1 to 4 may be stored in log file 1, log data 5 to 8 in log file 2, and log data 9 to 12 in log file 3, with writing rotating among log file 1, log file 2, and log file 3, so that a continuous log data stream can be processed.
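The round-robin placement in the fig. 12 example can be sketched as follows. The capacity of four log entries per file is taken from that example only; real log files are bounded by size rather than entry count, so `logs_per_file` and the function name are assumptions made for illustration.

```python
def assign_log_files(log_ids, num_files=3, logs_per_file=4):
    """Return {log_id: file_number}, wrapping back to file 1 after the last file."""
    placement = {}
    for index, log_id in enumerate(log_ids):
        # Fill one file, move to the next, and wrap around after the last one.
        file_no = (index // logs_per_file) % num_files + 1
        placement[log_id] = file_no
    return placement


# Usage: 13 sequential logs; the 13th wraps around and reuses log file 1.
placement = assign_log_files(range(1, 14))
```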
Log data is ultimately applied to page data to generate new page data, so each piece of log data contains the number of the page to be modified. The partition number of the page disk on which that page resides is found from the page the log acts on, and the log data is then stored in the partition with the same number on the log disk. Specifically, as shown in fig. 13, the log stream consists of 4 pieces of log data numbered 1, 2, 3, and 4 in sequence, which operate on data pages with page numbers 131072, 1, 500, and 262145 respectively. The page of the first log data is in partition 2, the pages of the second and third log data are in partition 1, and the page of the fourth log data is in partition 3. Once the corresponding data page partition is found, the log data is stored in the corresponding log partition. Following the idea that "the log is the database", only log data is sent to the distributed block storage; the distributed storage performs log playback on the log data to generate page data, and the generated page data is stored at the position of the corresponding page on the page disk.
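The fig. 13 example can be reproduced with a short routing sketch. The representation of a log entry as a `(log_no, page_no)` pair and the function name `route_logs` are assumptions made for illustration; the partition computation uses integer division, consistent with the partition numbers given in the example.

```python
from collections import defaultdict

PAGES_PER_PARTITION = 131072  # 2GB partition / 16KB page


def route_logs(log_stream):
    """Group (log_no, page_no) pairs by the log-disk partition they belong to."""
    partitions = defaultdict(list)
    for log_no, page_no in log_stream:
        # The log goes to the log-disk partition with the same number as the
        # page-disk partition that holds the page it modifies.
        partition = page_no // PAGES_PER_PARTITION + 1
        partitions[partition].append(log_no)
    return dict(partitions)


# Usage: the four logs of the example, with the pages they operate on.
routed = route_logs([(1, 131072), (2, 1), (3, 500), (4, 262145)])
```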
Further, page data and log data are linearly and logically associated and occupy the same partition on their respective disks. One partition corresponds to one small table of the distributed storage system, and with a three-replica redundancy policy a small table is distributed over three disks on 3 machines. This ensures that page data and log data reside on the same disks, so that no additional network latency is incurred by cross-disk storage. Specifically, as shown in fig. 14, in the physical distribution of partition data, partition 1 of the data page disk and of the log disk corresponds to one small table. The small table is the basic unit of data management in the distributed block storage system: the space of each physical disk in the storage cluster (disk 1, disk 2, and so on in fig. 14) is divided into small tables at 2GB granularity, the three disks with corresponding disk IDs on 3 Host machines form a replica group (the dashed box in fig. 14), and the capacity of the distributed block storage is allocated and used in units of small tables. The correspondence is established during partition initialization when a database tablespace is created: an idle small table is acquired from the distributed block storage system and bound to the partition. The small table obtains the three-replica group of physical disks on which it resides by consistent hashing (Hash), and the page data and log data are then written to those three physical disks. This realizes the organization and distribution of page data and log data. However, log data is a log stream that must be written continuously as logs are generated. Local storage reuses space by rotating log files; here, snapshots can be used for rotating storage instead.
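How a small table might be mapped to its replica group by consistent hashing can be sketched as below. The description names consistent hashing but gives no details, so everything here is an assumption for illustration: the ring with virtual nodes, the SHA-256 based hash, the class name `ReplicaGroupRing`, and the identifiers such as `"small-table-1"` are hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash of a string key."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ReplicaGroupRing:
    """Consistent-hash ring whose nodes are replica groups (one disk ID on each host)."""

    def __init__(self, disk_ids, hosts, vnodes=64):
        self.hosts = list(hosts)
        # Each replica group is placed on the ring at several virtual positions.
        self._ring = sorted(
            (_hash(f"{disk_id}#{v}"), disk_id)
            for disk_id in disk_ids
            for v in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def locate(self, small_table_id: str):
        """Return the (host, disk_id) replicas that store this small table."""
        idx = bisect.bisect_right(self._keys, _hash(small_table_id)) % len(self._ring)
        disk_id = self._ring[idx][1]
        return [(host, disk_id) for host in self.hosts]


# Usage: three hosts with three disks each; one small table maps to one disk ID.
ring = ReplicaGroupRing(["disk1", "disk2", "disk3"], ["host1", "host2", "host3"])
replicas = ring.locate("small-table-1")
```

Because the page partition and the log partition with the same number are bound to the same small table, both would resolve to the same three physical disks under this scheme.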
Specifically, the distributed storage may include 3 hosts, each with 3 disks. The small table corresponding to the current partition 1 is mapped to disk 2, that is, the data pages and log data in partition 1 may be stored on disk 2 of each host. Further, regarding the storage space on a physical disk, a 2GB small table space requests physical space at 1M granularity: when the service starts, the 1M spaces on the physical disk are scanned and placed in an idle list; when data of a 2GB small table space needs to be stored on the physical disk, space is allocated from the idle list, and the data is then written into the corresponding 1M space on the physical disk.
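The idle-list allocation at 1M granularity can be sketched as follows. This sketch assumes that every extent is free at startup and that an extent is allocated lazily on the first write that touches it; the class name `PhysicalDisk`, the `mapping` table, and the `write` method are hypothetical and only illustrate the bookkeeping.

```python
EXTENT_SIZE = 1024 * 1024          # 1M allocation granularity
SMALL_TABLE_SIZE = 2 * 1024 ** 3   # 2GB small table


class PhysicalDisk:
    def __init__(self, capacity_bytes: int):
        # At service start the 1M extents are scanned and put on the idle list
        # (assumed here to be all extents of the disk).
        self.free_extents = list(range(capacity_bytes // EXTENT_SIZE))
        self.mapping = {}  # (small_table_id, logical_extent) -> physical extent

    def write(self, small_table_id: int, logical_offset: int) -> int:
        """Return the physical byte offset for a write, allocating on first use."""
        if not 0 <= logical_offset < SMALL_TABLE_SIZE:
            raise ValueError("offset outside the 2GB small table space")
        key = (small_table_id, logical_offset // EXTENT_SIZE)
        if key not in self.mapping:
            if not self.free_extents:
                raise RuntimeError("no free 1M extent left on this disk")
            self.mapping[key] = self.free_extents.pop(0)
        return self.mapping[key] * EXTENT_SIZE + logical_offset % EXTENT_SIZE


# Usage: a tiny 8M disk; two logical extents of small table 7 are touched.
disk = PhysicalDisk(8 * EXTENT_SIZE)
first = disk.write(7, 5 * EXTENT_SIZE + 10)
second = disk.write(7, 100)
again = disk.write(7, 5 * EXTENT_SIZE + 20)
```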
Regarding log space rotation: the partition small table corresponding to a disk is only 2GB, so for a continuous log stream, snapshots can be used to expand and shrink the log space. When the 2GB space of a log partition is about to be filled, a block device snapshot is created, subsequent logs are written into the new snapshot, and the offset (Offset) within the partition is reset to the beginning. Specifically, as shown in fig. 15, in log space rotation the log number refers to the log's number within the partition; consecutive numbers do not imply global continuity, and log1 denotes only the first log within this partition, not the first log of the entire database log stream. Three snapshots are shown: the first stores log1 through log n-1, the second stores log n through log m-1, and the third stores log m and log m+1. When the log partition first stores logs, they are written to snapshot 1 with the logical offset starting from 0, and logs are then appended continuously. When the 2GB logical space has been written, snapshot 2 is created and the logical offset is reset to 0, and logs are then appended to snapshot 2. When the logical offset reaches 2GB again, snapshot 3 is created and subsequent logs are written to snapshot 3. Snapshot 1, snapshot 2, and snapshot 3 all have the same logical address space, namely the partition's 0 to 2GB space; creating snapshots in effect creates 3 namespaces (similar to isolation in containers). The log space can be extended without limit by continuing to create snapshots, and shrunk by deleting snapshots that have already been played back (the redo log is mainly used to generate page data by playback, and can be deleted once playback is complete).
In addition, as shown in fig. 16, regarding the physical distribution of snapshot data on the disk holding the small table that corresponds to the log partition: the snapshot data of the partition is distributed to different physical disk locations through redirection, that is, the blocks (Block) are physically dispersed, while logically they all correspond to the partition's 0 to 2GB space.
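The rotation of the log space through snapshots can be sketched as follows. This is an in-memory illustration only: the class name `LogPartition`, the rule that a new snapshot starts when the next record no longer fits, and the `release` method are assumptions, and no real block device snapshot API is used.

```python
PARTITION_SIZE = 2 * 1024 ** 3  # 2GB logical space per log partition


class LogPartition:
    def __init__(self, capacity: int = PARTITION_SIZE):
        self.capacity = capacity
        self.snapshots = {1: []}   # snapshot id -> list of (offset, record)
        self.current = 1
        self.offset = 0            # logical offset inside the current snapshot

    def append(self, record: bytes):
        """Append a record; start a new snapshot when the partition is full."""
        if len(record) > self.capacity:
            raise ValueError("record larger than the partition")
        if self.offset + len(record) > self.capacity:
            self.current += 1
            self.snapshots[self.current] = []
            self.offset = 0        # same 0..capacity address space, new snapshot
        self.snapshots[self.current].append((self.offset, record))
        location = (self.current, self.offset)
        self.offset += len(record)
        return location

    def release(self, snapshot_id: int):
        """Delete a snapshot whose logs have all been played back."""
        if snapshot_id == self.current:
            raise ValueError("cannot release the snapshot being written")
        del self.snapshots[snapshot_id]


# Usage with a tiny 8-byte partition so that rotation is visible.
part = LogPartition(capacity=8)
locs = [part.append(b"abcd") for _ in range(5)]
part.release(1)  # snapshot 1 has been played back and can be dropped
```

Each returned location is a (snapshot, offset) pair, which shows that the offsets repeat across snapshots while the snapshot number keeps them apart.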
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the steps are not strictly limited to that order of execution and may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed one after another but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiments of the present application also provide a data storage device for implementing the above data storage method. The solution provided by the device is implemented in a manner similar to that described for the method above, so for the specific limitations of the one or more data storage device embodiments provided below, reference may be made to the limitations of the data storage method above, which are not repeated here.
In one exemplary embodiment, as shown in FIG. 17, there is provided a data storage device 1700 comprising: a log data acquisition module 1702, a data page partition determination module 1704, a log playback module 1706, and a data drop processing module 1708, wherein:
the log data acquisition module 1702 is configured to acquire log data for recording data update changes in a database;
the data page partition determining module 1704 is configured to determine a data page partition according to a data page identifier carried by the log data; the data page partition is a partition in the data page virtual memory for storing the data page identified by the data page identifier;
the log playback module 1706 is configured to perform log playback on the data page identified by the data page identifier stored in the data page partition based on the log data, to obtain an updated data page corresponding to the data page;
and the data drop processing module 1708 is configured to store the updated data page and the log data in at least one physical memory corresponding to the data page partition.
In one embodiment, log playback module 1706 is further configured to determine data update information for the identified data page based on the log data; and carrying out data updating on the data page according to the data updating information to obtain an updated data page.
In one embodiment, log playback module 1706 is also used to read data pages from the data page partition; updating the data in the read data page according to the data updating information to obtain an updated data page; the updated data page is written to the data page partition to replace the data page.
In one embodiment, the data page partition determination module 1704 is further configured to determine a data page identifier carried by the log data; determining a partition identification corresponding to the data page identified by the data page identification based on the data page identification and the partition division unit; a data page partition for storing the data page identified by the data page identification is determined from the data page virtual memory based on the partition identification.
In one embodiment, the system further comprises a log data storage module for storing log data into a log partition of the log virtual memory; the log virtual memory and the data page virtual memory jointly represent the same data storage space; the log playback module 1706 is further configured to read log data from the log partition, and perform log playback on a data page identified by a data page identifier stored in the data page partition based on the log data obtained by reading, so as to obtain a data page after the data page is updated correspondingly; the data drop processing module 1708 is further configured to read an updated data page from the data page partition; and storing the read log data and the updated data page into at least one physical memory corresponding to the data page partition.
In one embodiment, the log data storage module is further for determining a log virtual memory associated with the data page virtual memory; determining a log partition from the log virtual memory according to the partition identification of the data page partition; the partition identification is determined based on the data page identification and the partition division unit; log data is stored into a log partition.
In one embodiment, the system further comprises a snapshot dynamic processing module, which is used for creating a snapshot which is the same as the logical address space of the log partition when the storage space of the log partition is saturated, and storing the log data into the snapshot; and deleting the snapshot when the snapshot meets the release judging condition.
In one embodiment, the snapshot dynamic processing module is further configured to create a snapshot based on the log partition and redirect the snapshot into the at least one physical memory according to a logical address space of the log partition; when the log data stored in the snapshot completes corresponding data storage processing, deleting the snapshot, and releasing the storage space of the snapshot from at least one physical memory.
In one embodiment, the log data is the first log data for the data page identified by the data page identifier; the device further comprises a first data processing module, which is used for generating the data page identified by the data page identifier based on the log data, and writing the generated data page into the data page partition as an updated data page.
In one embodiment, the data drop processing module 1708 is further configured to determine at least two physical memories based on a distributed setting; for each of the at least two physical memories, determining a storage subspace corresponding to the data page partition from the targeted physical memory; and respectively storing the updated data page and the log data into a storage subspace of the physical memory.
In one embodiment, the data drop processing module 1708 is further configured to map a partition identifier of a data page partition to obtain a partition identifier mapping result; determining subspace identification according to the partition identification mapping result; a subspace identification identified storage subspace is determined from the targeted physical memory.
The modules in the data storage device described above may be implemented in whole or in part in software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.
In one exemplary embodiment, a computer device is provided, which may be a server or a terminal, and an internal structure diagram thereof may be as shown in fig. 18. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing data processed by the data storage method. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a data storage method.
It will be appreciated by those skilled in the art that the structure shown in fig. 18 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the computer device to which the present application is applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use, and processing of the related data are required to meet the related regulations.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of this description.
The above examples represent only a few embodiments of the present application. They are described specifically and in detail, but are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (15)

1. A method of data storage, the method comprising:
acquiring log data for recording data update changes in a database;
determining a data page partition according to the data page identifier carried by the log data; the data page partition is a partition for storing the data page identified by the data page identification in the data page virtual memory;
based on the log data, performing log playback on the data page identified by the data page identification stored in the data page partition to obtain a data page corresponding to the updated data page;
and storing the updated data page and the log data into at least one physical memory corresponding to the data page partition.
2. The method of claim 1, wherein the performing log playback on the data page identified by the data page identifier stored in the data page partition based on the log data to obtain the data page corresponding to the updated data page comprises:
determining data update information for the data page identified by the data page identification based on the log data;
and carrying out data updating on the data page according to the data updating information to obtain an updated data page.
3. The method according to claim 2, wherein the data updating the data page according to the data updating information to obtain an updated data page comprises:
reading the data page from the data page partition;
updating the data in the read data page according to the data updating information to obtain an updated data page;
writing the updated data page into the data page partition to replace the data page.
4. The method of claim 1, wherein the determining a data page partition from the data page identification carried by the log data comprises:
determining a data page identifier carried by the log data;
determining a partition identification corresponding to the data page identified by the data page identification based on the data page identification and the partition division unit;
and determining a data page partition for storing the data page identified by the data page identification from the data page virtual memory according to the partition identification.
5. The method according to claim 1, wherein the method further comprises:
storing the log data into a log partition of a log virtual memory; the log virtual memory and the data page virtual memory jointly represent the same data storage space;
the log playback is performed on the data page identified by the data page identifier stored in the data page partition based on the log data, and the data page updated corresponding to the data page is obtained, including:
reading the log data from the log partition, and performing log playback on the data page identified by the data page identification stored in the data page partition based on the read log data to obtain a data page corresponding to the updated data page;
the storing the updated data page and the log data in at least one physical memory corresponding to the data page partition includes:
reading the updated data page from the data page partition;
and storing the read log data and the updated data page into at least one physical memory corresponding to the data page partition.
6. The method of claim 5, wherein storing the log data into a log partition of a log virtual memory comprises:
determining a log virtual memory associated with the data page virtual memory;
determining a log partition from the log virtual memory according to the partition identification of the data page partition; the partition identification is determined based on the data page identification and partition division unit;
and storing the log data into the log partition.
7. The method of claim 5, wherein the method further comprises:
when the storage space of the log partition is saturated, creating a snapshot which is the same as the logical address space of the log partition, and storing the log data into the snapshot;
and deleting the snapshot when the snapshot meets the release judging condition.
8. The method of claim 7, wherein the creating the same snapshot as the logical address space of the log partition comprises:
creating a snapshot based on the log partition and redirecting the snapshot into the at least one physical memory according to a logical address space of the log partition;
the deleting the snapshot when the snapshot meets the release judging condition comprises:
and deleting the snapshot when the log data stored in the snapshot completes corresponding data storage processing, and releasing the storage space of the snapshot from the at least one physical memory.
9. The method of claim 1, wherein the log data is the first log data for the data page identified by the data page identification; the method further comprises:
generating a data page identified by the data page identification based on the log data;
and writing the generated data page into the data page partition as an updated data page.
10. The method according to any one of claims 1 to 9, wherein storing the updated data page and the log data in at least one physical memory corresponding to the data page partition comprises:
determining at least two physical memories based on the distributed settings;
for each of the at least two physical memories, determining a storage subspace corresponding to the data page partition from the targeted physical memory;
and respectively storing the updated data page and the log data into the storage subspace of the targeted physical memory.
11. The method of claim 10, wherein the determining the storage subspace corresponding to the data page partition from the targeted physical memory comprises:
mapping the partition identification of the data page partition to obtain a partition identification mapping result;
determining subspace identification according to the partition identification mapping result;
the subspace identification identified storage subspace is determined from the targeted physical memory.
12. A data storage device, the device comprising:
the log data acquisition module is used for acquiring log data for recording data update changes in the database;
the data page partition determining module is used for determining a data page partition according to the data page identifier carried by the log data; the data page partition is a partition for storing the data page identified by the data page identification in the data page virtual memory;
the log playback module is used for carrying out log playback on the data page identified by the data page identification stored in the data page partition based on the log data to obtain a data page after the data page is correspondingly updated;
and the data drop processing module is used for storing the updated data page and the log data into at least one physical memory corresponding to the data page partition.
13. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 11 when the computer program is executed.
14. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 11.
15. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 11.
CN202410271793.4A 2024-03-11 2024-03-11 Data storage method, device, computer equipment and storage medium Pending CN117873405A (en)


Publications (1)

Publication Number Publication Date
CN117873405A true CN117873405A (en) 2024-04-12

Family

ID=90590374


Country Status (1)

Country Link
CN (1) CN117873405A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination