CN118051642A - Data storage method and system of database system - Google Patents

Data storage method and system of database system

Publication number
CN118051642A
Authority
CN
China
Prior art keywords
data
physical
tablespace
file
physical file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211397508.0A
Other languages
Chinese (zh)
Inventor
许友松
苏斌
张志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Cloud Computing Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Cloud Computing Technologies Co Ltd
Priority to CN202211397508.0A
Publication of CN118051642A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9017 Indexing; Data structures therefor; Storage structures using directory or table look-up
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a data storage method and system for a database system, relating to the technical field of databases. In the method, the database system can be configured with at least two data directories, and data in the same tablespace of the database system is stored in different data directories, so that data in the same table can be stored under different storage paths.

Description

Data storage method and system of database system
Technical Field
The embodiment of the application relates to the technical field of databases, in particular to a data storage method and system of a database system.
Background
Databases store data. Current storage schemes for database systems support storing the data of multiple data tables under different storage paths; for example, tables created inside the database system may be stored under designated storage paths, and the database system may also use external tables.
However, the data storage schemes of current database systems cannot store the data of a single table under different storage paths.
Disclosure of Invention
To solve the above technical problem, the present application provides a data storage method and system for a database system. In the method, data of the same table of the database system can be stored in different data directories, so that the data of one table is stored under different storage paths.
In one possible embodiment, the present application provides a data storage method for a database system. The data directories of the database system comprise a first data directory and at least one second data directory. The method comprises: writing first data to a second physical file in the second data directory based on a first write request; wherein the first write request includes the first data, which is requested to be written to a first tablespace; the second physical file is associated with the first tablespace; the first tablespace is also associated with a first physical file within the first data directory; and the first physical file includes part of the data of the first tablespace.
The data directory is a file system directory of the database system.
For example, the first data directory is a default data directory of the database system, and the second data directory is an extended data directory of the database system, without limitation. In addition, the number of second data directories may be one or more.
The first tablespace may be a tablespace that was created before the database system was configured with the second data directory; such a tablespace may be regarded as an "old table", whereas a tablespace created after the second data directory is configured may be regarded as a "new table".
Where the first tablespace is an "old table", the first data directory includes the first physical file associated with the first tablespace, and the first physical file includes part of the data of the first tablespace, which may comprise at least one data page.
After the database system is configured with the second data directory, a request to write data to the "old table" is referred to here as the first write request. It is a data write request of the database and includes the first data that is requested to be written to the first tablespace. The method may write the first data to a second physical file, associated with the first tablespace, within the second data directory.
Thus, when storing data of the database, the method of the embodiments of the present application can store data of the same tablespace in different data directories, so that data of the same table is stored under different storage paths.
In one possible implementation manner, the writing the first data to the second physical file in the second data directory based on the first writing request includes: creating a second physical file associated with the first tablespace within the second data directory based on a first write request if it is determined that the second physical file associated with the first tablespace is not included within the second data directory; and writing the first data into the second physical file.
When the second data directory does not include a physical file associated with the first tablespace, the method can create an empty second physical file in the second data directory based on the first write request and associate it with the first tablespace. The first tablespace is then associated with both the first physical file in the first data directory and the second physical file in the second data directory, so that data of the first tablespace can be written to physical files in different data directories, distributing the data of one tablespace across different data directories.
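The create-if-missing behaviour described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `write_to_second_directory`, the one-file-per-tablespace naming scheme `tablespace_<id>.dat`, and the append-only write are all assumptions made for the example.

```python
import os


def write_to_second_directory(second_dir: str, tablespace_id: int, data: bytes) -> str:
    """Append `data` to the tablespace's physical file in the second data directory.

    The file name pattern is hypothetical; the source does not specify how a
    physical file is associated with a tablespace.
    """
    path = os.path.join(second_dir, f"tablespace_{tablespace_id}.dat")
    if not os.path.exists(path):
        # No second physical file is associated with the tablespace yet:
        # create an empty one before writing.
        with open(path, "xb"):
            pass
    with open(path, "ab") as f:
        f.write(data)
    return path
```

Calling the function twice for the same tablespace creates the file on the first call and only appends on the second.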
In one possible implementation, the first physical file includes data of at least one data page of the first tablespace, and the second physical file includes data of at least one data page of the first tablespace.
When data of the same tablespace is stored in different data directories, the storage granularity is the data page. That is, the data of at least one data page of the first tablespace can be stored in the first physical file in the first data directory, and the data of at least one other data page of the first tablespace can be stored in the second physical file in the second data directory, which achieves cross-directory storage of one tablespace at data-page granularity.
In one possible implementation, after the writing of the first data to the second physical file in the second data directory, the method further includes: mapping a first access address to the first table space to a second access address in a third physical file based on a first access request to the first table space, a file size of the first physical file, a file size of the second physical file, and a creation sequence between the first physical file and the second physical file, wherein the third physical file is the first physical file or the second physical file, and the first access request includes the first access address; accessing the data of the first tablespace according to the second access address.
Wherein the first access request may include, but is not limited to: delete data request, query data request, modify data request.
Because the data of the first tablespace is stored in at least two data directories (here, the first data directory and the second data directory), a service request that accesses the first tablespace must be resolved to the physical file, in the correct data directory, that holds the accessed data, and to the corresponding access address within that file. Therefore, when the first access request to the first tablespace is processed, the first access address carried by the request may be mapped to a second access address in the physical file to be accessed (for example, the first physical file or the second physical file), according to the file sizes of the first physical file and the second physical file associated with the first tablespace and the creation order of the two files.
The data of the first tablespace is stored in the first physical file and the second physical file. The first physical file is created before the second physical file, and once the second physical file has been created, data written to the first tablespace can be stored in the second physical file. One data page of the tablespace may correspond to one disk block of the disk that backs the physical file, and the data of the tablespace is written sequentially in the order in which the data pages are created. Therefore, from the sizes of the two physical files and their creation order, the method can determine how many data pages of the first tablespace are stored in the first physical file and how many are stored in the second physical file. Based on the data page that the first access request needs to access (corresponding to the first access address), the method then determines the physical file to be accessed and the second access address (for example, a logical address) within that file, and accesses the data of the first tablespace at the corresponding physical address on the disk that backs the physical file according to the second access address.
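The mapping from a tablespace-level page number to a file and an offset can be sketched as below. This is an illustrative reading of the paragraph above under stated assumptions: a fixed page size (`PAGE_SIZE`, here 16 KiB, is not given in the source), every file holding a whole number of pages, and the first access address expressed as a zero-based page number. The name `map_page` and the path `/mnt/ext` are hypothetical.

```python
from typing import List, Tuple

PAGE_SIZE = 16 * 1024  # assumed page size in bytes; not specified in the source


def map_page(page_no: int, files: List[Tuple[str, int]]) -> Tuple[str, int]:
    """Map a tablespace page number to (physical file, byte offset in that file).

    `files` lists (path, size_in_bytes) in creation order, oldest first.
    Pages are assumed to fill the files sequentially in that order.
    """
    first_page = 0  # tablespace page number of the first page in the current file
    for path, size in files:
        pages_in_file = size // PAGE_SIZE
        if page_no < first_page + pages_in_file:
            return path, (page_no - first_page) * PAGE_SIZE
        first_page += pages_in_file
    raise IndexError(f"page {page_no} is beyond the end of the tablespace")


# Example: three pages in the default directory, two in the extended one.
files = [("/mnt/data/t1", 3 * PAGE_SIZE), ("/mnt/ext/t1", 2 * PAGE_SIZE)]
location = map_page(3, files)  # fourth page overall, first page of the second file
```

Because only the file sizes and the creation order are needed, the lookup requires no per-page index.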
For example, the first access request may be a request triggered after the first data is written to the second physical file and without restarting the database system, and then the file sizes of the first physical file and the second physical file and the information of the creation sequence between the two files may be obtained immediately after the first access request is received.
The method of the present application may also obtain and record, after each restart of the database, the file sizes and creation order of the at least two physical files that a tablespace is associated with in different data directories, so that when an access request to such a tablespace is received, the recorded information can be used to accurately locate the address of the data to be accessed.
In this way, when accessing a tablespace associated with at least two physical files in at least two data directories, the embodiments of the present application can accurately locate the address of the accessed data, realizing address mapping for tablespaces whose data pages are stored in a distributed manner in the database system.
In one possible embodiment, the method further comprises: determining at least two fourth physical files associated with the same second table space in respective physical files in the first data directory and the at least one second data directory when the database system is restarted; recording first information, wherein the first information comprises: the file size of each of the at least two fourth physical files associated with the second tablespace, and the creation order between different ones of the at least two fourth physical files.
Wherein the second tablespace may comprise the first tablespace described above. The second tablespace refers to a tablespace in which at least two physical files are associated within different data directories.
For example, when the second tablespace includes the first tablespace, the at least two fourth physical files may include the first physical file and the second physical file.
This embodiment ensures that, after each restart, the database can still access the data of tablespaces associated with multiple physical files.
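A restart-time scan of this kind might look like the sketch below. It rests on two assumptions that the source does not state: that physical files belonging to the same tablespace share a file name across data directories, and that the order in which the data directories are listed reflects the creation order of the files in them. The function name `record_first_information` is hypothetical.

```python
import os
from collections import defaultdict
from typing import Dict, List, Tuple


def record_first_information(data_dirs: List[str]) -> Dict[str, List[Tuple[str, int]]]:
    """Return {tablespace file name: [(path, size_in_bytes), ...]}.

    `data_dirs` is listed in creation order (default directory first), so each
    list of files is also in creation order. Only tablespaces with files in at
    least two directories are kept.
    """
    found: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for directory in data_dirs:
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            if os.path.isfile(path):
                found[name].append((path, os.path.getsize(path)))
    return {name: entries for name, entries in found.items() if len(entries) >= 2}
```

The returned lists have the shape a page-to-file mapping routine would need: sizes in creation order for each multi-file tablespace.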
In a possible embodiment, after the recording of the first information, the method further includes: based on a second access request to the second tablespace, accessing data of the second tablespace in at least one fourth physical file associated with the second tablespace in accordance with the first information recorded for the second tablespace.
The specific implementation principle of this embodiment is similar to the implementation principle and effect of the embodiment for processing the first access request of the first table space, and is not repeated here.
In one possible embodiment, the method further comprises: creating a third tablespace in the database system based on a second write request, wherein the second write request includes second data requesting writing to the third tablespace; creating a fifth physical file associated with the third tablespace within the second data directory; and writing the second data into the fifth physical file.
The third tablespace may be referred to here as a "new tablespace": a tablespace that a service request creates, and requests data to be written to, after the second data directory has been configured on the database system.
For example, the third tablespace may be different from both the first tablespace and the second tablespace.
Optionally, in this embodiment, the third tablespace may be associated only with the fifth physical file in the second data directory and not with any physical file in the first data directory, so that the data of the third tablespace is stored only in the second data directory.
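The "new tablespace" case can be illustrated with a small in-memory catalog that maps each tablespace to its physical files. This is only a sketch: the `catalog` dictionary, the function name `create_new_tablespace`, and the `.dat` suffix are assumptions introduced for the example and do not come from the source.

```python
import os
from typing import Dict, List


def create_new_tablespace(catalog: Dict[str, List[str]], name: str,
                          second_dir: str, data: bytes) -> str:
    """Create a tablespace whose only physical file lives in the second data directory."""
    if name in catalog:
        raise ValueError(f"tablespace {name!r} already exists")
    path = os.path.join(second_dir, name + ".dat")  # hypothetical naming scheme
    with open(path, "xb") as f:  # "x": fail if the file already exists
        f.write(data)
    # The new tablespace is associated with this one file only; nothing is
    # created in the first (default) data directory.
    catalog[name] = [path]
    return path
```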
In one possible embodiment, the method further comprises: determining at least two sixth physical files associated with the same fourth tablespace in the respective physical files in the first data directory and in the at least one second data directory; based on the creation sequence of different sixth physical files in the at least two sixth physical files, connecting the data in the at least two sixth physical files according to the sequence from early to late of the creation time so as to acquire the table data of the fourth table space; and backing up the table data of the fourth table space.
Illustratively, the fourth tablespace herein is used to refer to a tablespace in which at least two physical files are associated within different data directories. For example, the fourth tablespace may include, or be the same as, the second tablespace described above, without limitation. Similarly, the sixth physical file may be identical to the fourth physical file.
When backing up a tablespace whose data is stored in different data directories, this embodiment can concatenate the data of the at least two sixth physical files associated with the tablespace, in the creation order of those files (here, from earliest to latest), to obtain the table data of the tablespace, and then back up the table data.
For example, the data directories of the database system include data directory 1 and data directory 2, where data directory 1 includes physical file a associated with table 1 and data directory 2 includes physical file b associated with table 1. Data directory 2 is created after data directory 1, so physical file b is created after physical file a. When backing up the data of table 1, the method may read all table data from physical file a and all table data from physical file b, and join the start of the data read from physical file b to the end of the data read from physical file a, thereby obtaining the complete data of table 1, which is then backed up.
Thus, for a tablespace whose table data is stored in different data directories, the present application can also back up the data of the tablespace completely and accurately.
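The concatenation step of the backup can be sketched as follows, assuming the caller already knows the physical files of the tablespace in creation order. The function name `backup_tablespace` and the single-file backup target are assumptions for illustration; the source does not describe the backup format.

```python
import shutil
from typing import List


def backup_tablespace(files_in_creation_order: List[str], backup_path: str) -> None:
    """Concatenate the physical files, earliest created first, into one backup file."""
    with open(backup_path, "wb") as out:
        for path in files_in_creation_order:
            with open(path, "rb") as src:
                # Stream each file so large tablespaces need not fit in memory.
                shutil.copyfileobj(src, out)
```

With physical file a listed before physical file b, the backup contains the bytes of a immediately followed by the bytes of b, matching the example above.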
In one possible implementation, the first physical file is created before the second physical file.
In one possible embodiment, the plurality of data directories of the database system are located in different file systems, respectively.
Wherein the plurality of data directories herein includes a first data directory and at least one second data directory.
In this way, the same table data of the database can be stored in different file systems so as to meet different storage requirements of the data.
In one possible implementation manner, disk types of disks corresponding to a plurality of file systems corresponding to the database system are different.
The plurality of file systems comprise the file system corresponding to the first data directory and the file system corresponding to the at least one second data directory.
Thus, the database system may use different types of disks to implement data storage in the same tablespace, so as to utilize characteristics (such as high speed, high capacity, etc.) of the different types of disks to implement data storage.
In one possible embodiment, the present application provides a data storage system for a database system. The data directories of the database system comprise a first data directory and at least one second data directory, and the data storage system comprises: a first writing module, configured to write first data to a second physical file in the second data directory based on a first write request; wherein the first write request includes the first data, which is requested to be written to a first tablespace; the second physical file is associated with the first tablespace; the first tablespace is also associated with a first physical file within the first data directory; and the first physical file includes part of the data of the first tablespace.
In a possible implementation manner, the first writing module is specifically configured to: creating a second physical file associated with the first tablespace within the second data directory based on a first write request if it is determined that the second physical file associated with the first tablespace is not included within the second data directory; and writing the first data into the second physical file.
In one possible implementation, the first physical file includes data of at least one data page of the first tablespace, and the second physical file includes data of at least one data page of the first tablespace.
In one possible implementation, the data storage system further comprises: the mapping module is used for mapping a first access address to the first table space to a second access address in a third physical file based on a first access request to the first table space, according to the file size of the first physical file, the file size of the second physical file and the creation sequence between the first physical file and the second physical file, wherein the third physical file is the first physical file or the second physical file, and the first access request comprises the first access address; and the first access module is used for accessing the data of the first table space according to the second access address.
In one possible implementation, the data storage system further comprises: a first determining module, configured to determine, when the database system is restarted, at least two fourth physical files associated with the same second tablespace from among the physical files in the first data directory and the at least one second data directory; the recording module is used for recording first information, wherein the first information comprises: the file size of each of the at least two fourth physical files associated with the second tablespace, and the creation order between different ones of the at least two fourth physical files.
In one possible implementation, the data storage system further comprises: and the second access module is used for accessing the data of the second table space in at least one fourth physical file associated with the second table space according to the first information recorded in the second table space based on a second access request of the second table space.
In one possible implementation, the data storage system further comprises: a first creation module configured to create a third tablespace in the database system based on a second write request, where the second write request includes second data requesting writing to the third tablespace; a second creating module, configured to create a fifth physical file associated with the third tablespace in the second data directory; and the second writing module is used for writing the second data into the fifth physical file.
In one possible implementation, the data storage system further comprises: a second determining module, configured to determine at least two sixth physical files associated with the same fourth tablespace in the physical files in the first data directory and in the at least one second data directory, respectively; the reading module is used for reading the data in each sixth physical file in the at least two sixth physical files; the splicing module is used for connecting the data respectively read from the at least two sixth physical files according to the sequence from the early to the late of the creation time based on the creation sequence between different sixth physical files in the at least two sixth physical files so as to acquire the table data of the fourth table space; and the backup module is used for backing up the table data of the fourth table space.
In one possible implementation, the first physical file is created before the second physical file.
In one possible embodiment, the plurality of data directories of the database system are located in different file systems, respectively.
In one possible implementation manner, disk types of disks corresponding to a plurality of file systems corresponding to the database system are different.
The effects of the data storage system of the database system of each of the above embodiments are similar to those of the data storage method of the database system of each of the above embodiments, and will not be described here again.
In one possible embodiment, the present application provides a data storage device for a database system. The data storage device of the database system includes one or more interface circuits and one or more processors; the interface circuit is configured to receive a signal from the memory and to send the signal to the processor, the signal comprising computer instructions stored in the memory; the processor, when executing the computer instructions, may implement the method of any of the embodiments described above.
The effects of the data storage device of the database system of the present embodiment are similar to those of the data storage method of the database system of each of the above embodiments, and will not be described here again.
In one possible implementation, the present application provides a computer-readable storage medium. The computer readable storage medium stores a computer program which, when run on a computer or processor, causes the computer or processor to perform the method of any of the above embodiments.
The effects of the computer-readable storage medium of the present embodiment are similar to those of the data storage method of the database system of each of the above embodiments, and will not be described here again.
In one possible implementation, the present application provides a computer program product. The computer program product comprises a software program which, when executed by a computer or processor, causes the method of any of the above embodiments to be performed.
The effects of the computer program product of the present embodiment are similar to those of the data storage method of the database system of each of the above embodiments, and will not be described here again.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of a data storage architecture of an exemplary database management system;
FIG. 2a is a diagram of a data storage architecture of an exemplary database management system;
FIG. 2b is a diagram of a data storage architecture of the database management system shown by way of example;
FIG. 2c is a schematic diagram of an exemplary backup structure of backup data of a database;
FIG. 3 is a schematic structural diagram of an apparatus according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone.
The terms first and second and the like in the description and in the claims of embodiments of the application, are used for distinguishing between different objects and not necessarily for describing a particular sequential order of objects. For example, the first target object and the second target object, etc., are used to distinguish between different target objects, and are not used to describe a particular order of target objects.
In embodiments of the application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present application, unless otherwise indicated, the meaning of "a plurality" means two or more. For example, the plurality of processing units refers to two or more processing units; the plurality of systems means two or more systems.
The following terms are defined first:
Ordinary table: a table that has been created in the default storage directory of the database system.
External table (External Tables): some databases support processing ordinary data files that are external to the database as if they were tables; such a file is then referred to as an external table.
Tablespace: a logical partition unit of a relational database. A tablespace can belong to only one database. All database objects are stored in a designated tablespace; because tablespaces mainly store tables, they are called tablespaces. Each tablespace has a unique identifier (TablespaceID), and one table corresponds to one tablespace.
Data directory: a directory used to store the data of the database.
The data storage process of the database system in the related art is briefly described as follows:
Related art 1:
MariaDB supports moving locally stored read-only tables to the Simple Storage Service (S3), where S3 is an object storage service whose interface is a widely used standard for object storage. In this way, different data tables can be stored locally and in S3.
MariaDB stores tables externally for the purpose of archiving data or sharing old data, so it only supports storing read-only tables externally. Moreover, the minimum unit of external storage in MariaDB is a table; part of the data of a table cannot be stored externally. In addition, after MariaDB moves a table to S3, the operations supported by the table in S3 are not exactly the same as those supported by an ordinary table in MariaDB; for example, a table in S3 does not support partition adjustment.
Related art 2:
The Oracle database can use external tables to let users work with ordinary files outside the database system as if they were ordinary tables of the database. The ordinary file can be stored at any location as long as the database can read the data in it. Only the metadata of the ordinary file is recorded in the database; the data in the file is not imported into the database.
However, the Oracle database can only read an external table and cannot write to it. In addition, an external table supports only some functions of the database (for example, it does not support encrypted columns), whereas an ordinary table supports a richer set of functions. Furthermore, when the Oracle database uses external ordinary files, it can only operate on the data in the external storage path with the table as the minimum unit.
Therefore, when the storage schemes of database systems in the related art store the data of different data tables under different storage paths, the main problems are that the database operations supported by tables under different storage paths are inconsistent, and that the minimum unit of data stored under different storage paths is a table.
In view of these technical problems, the present application provides a data storage method and apparatus for a database system, in which different data of one table can be stored in directories of different file systems, so that the database can use different disks to store data, and the database operations supported by tables stored on different disks can be the same as those supported by an ordinary table.
FIG. 1 is an exemplary diagram of a data storage architecture of a database management system before the improvement of the present application.
As shown in FIG. 1, host 100 may include a database management system (Database Management System, DBMS) 101, a file system 104a, and a disk 103a (also referred to as disk 1).
The file system 104a includes a data directory 102a, wherein the data directory 102a is the default data directory of the DBMS101 shown in FIG. 1: /mnt/data. The data directory 102a is used to store data of a database.
The DBMS101 stores data of one table in a table space, and one table corresponds to one table space.
As shown in fig. 1, the DBMS101 includes 5 tablespaces, respectively, from tablespace 1 to tablespace 5, each tablespace corresponds to a physical file in the data directory 102a in the file system 104a, and the data of one table stored in each tablespace is stored in the corresponding physical file in the data directory 102a in the file system 104 a.
For example, data of one table stored in each of the table spaces 1 to 5 is stored in the physical files 1a to 5a in the data directory 102a, respectively.
While the file system 104a is a management system of logical addresses of the disk 1, the data in the physical files 1a to 5a are actually stored in the disk 1.
However, the DBMS101 supports only one data directory, and one table space in the DBMS101 can store its table data in only one physical file, not in a plurality of physical files.
To this end, the present application improves upon the host 100 and the DBMS101 shown in FIG. 1. FIG. 2a is an exemplary diagram of a data storage architecture of the improved database management system of the present application.
As shown in FIG. 2a, the host 100 may include a DBMS101, a disk 103a (e.g., disk 1 local to the host 100), and a file system 104a of the disk 103a.
In addition, the host 100 may further be loaded with a disk 103b, and a file system 104b of the disk 103b may be created in the host 100.
Wherein the file system 104a may include the data directory 102a of the DBMS101 and the file system 104b may include the data directory 102b of the DBMS 101.
In contrast to FIG. 1, in an embodiment of the present application, the database has not only a default data directory 102a (e.g., /mnt/data shown in FIG. 2a) but also an extended data directory 102b (e.g., /mnt/data_2 shown in FIG. 2a). Of course, in other embodiments, the database may have more than 2 data directories (e.g., multiple extended data directories), which is not limited in the present application; the implementation principle is similar and will not be described herein.
Furthermore, in the embodiment of FIG. 2a, the different data directories (data directory 102a and data directory 102b) of the database correspond to different disks. In other embodiments, different data directories of the database of the present application may also correspond to the same disk; for example, both the default data directory 102a and the extended data directory 102b of the database are created within the file system of the same disk. The application does not limit whether the disks corresponding to the data directories of the database are the same.
In addition, the disk corresponding to the data directory of the database can be a local disk or an external disk (for example, a cloud disk), and the application does not limit the type of the disk corresponding to the data directory of the database. For example, in fig. 2a, disk 103a is a local disk of the host, and disk 103b is a cloud disk.
With continued reference to FIG. 2a, creating a table in the database creates a corresponding table space in the DBMS101. As shown in FIG. 1, 5 tables have been created in the database before the data directory 102b is created, and their table spaces are table space 1 through table space 5, respectively. After the data directory 102b is also set as a data directory of the database, the DBMS101 creates table space 6, corresponding to a newly created table, based on a service request.
Illustratively, tablespaces 1 through 5 are tablespaces created before the method of the present application creates the data directory 102b for the DBMS101 (e.g., when the DBMS101 has only one data directory 102a).
Illustratively, after the method of the present application creates the data directory 102b for the DBMS101, when a table is again built in the database (e.g., table 6 is built), the table space 6 corresponding to table 6 is created within the DBMS101.
It should be appreciated that the present application limits neither the number of tablespaces within the DBMS101 nor the number of tablespaces created after the creation of the data directory 102b for the database. The implementation principles of the tablespaces and associated schemes created after the creation of the data directory 102b are similar to those illustrated for tablespace 6 and are not repeated herein.
Referring to fig. 2a and 2b, an access procedure to table data in a table space 201 of a database will be described by taking the table space 201 (also referred to as a table space 1) as an example. The process of accessing the data from the tablespace 2 to the tablespace 5 in fig. 2a is similar to the example of the tablespace 1 in fig. 2a, and will not be described in detail.
As described above, the table spaces 1 to 5 are the table spaces created before the method of the present application creates the data directory 102b for the DBMS101 (for example, when the DBMS101 has only one data directory 102 a). As shown in fig. 1 and 2a, each table space in the DBMS101 corresponds to a respective physical file in the data directory 102a, and for the table spaces 1 to 5, the physical files corresponding in sequence are respectively the physical files 1a to 5a, and the table data of the table spaces 1 to 5 are respectively stored in the physical files 1a to 5 a.
Taking RDS MySQL as an example of the database, in contrast to fig. 1 and referring to fig. 2a, the management software can mount the disk 103b to the host 100 (or virtual machine) where the RDS MySQL instance is located, create a file system 104b for the disk 103b, and create a data directory 102b (e.g., /mnt/data_2) in the file system 104b, where the data directory 102b is initially an empty directory that does not include any data. As the service writes data to the database, physical files corresponding to the tablespaces being written are gradually added to the data directory 102b, and the data of those tablespaces can be written into these physical files.
The database management system 101 or database administrator (Database Administrator, DBA) of the present application then sets an extended data directory 102b for the RDS MySQL database via a set command.
For example, in the scenario of fig. 1, where the DBMS has stored 5 tables (corresponding to table space 1 through table space 5) of data on disk 103a, and it is determined that the storage space of disk 103a corresponding to data directory 102a is insufficient, or that the performance of disk 103a is poor, then DBMS101 or DBA may set data directory 102b as shown in fig. 2a as the second-level data directory (i.e., the extended data directory) of the RDS MySQL database to implement the designation of a new data directory to the database.
For example, the database management system 101 or the DBA may write the address of the data directory 102b (here, /mnt/data_2) as a parameter into the metadata file of the RDS MySQL database (a configuration file of the database) through a set command; the address may also be written into another file of the database, which is not limited herein.
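The registration step described above can be sketched as follows. This is only an illustration under stated assumptions: the application does not specify the format of the configuration file or the syntax of the set command, so the JSON layout, the `data_dirs` key, and the function name `add_data_directory` are all hypothetical.

```python
import json
import os
import tempfile


def add_data_directory(config_path, new_dir):
    """Append new_dir to a hypothetical 'data_dirs' list in a JSON config file."""
    if os.path.exists(config_path):
        with open(config_path, "r", encoding="utf-8") as f:
            config = json.load(f)
    else:
        config = {}
    dirs = config.setdefault("data_dirs", [])
    if new_dir not in dirs:  # keep registration order and avoid duplicates
        dirs.append(new_dir)
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return dirs


# Usage with a temporary file standing in for the database configuration file.
with tempfile.TemporaryDirectory() as tmp:
    cfg = os.path.join(tmp, "db_config.json")
    add_data_directory(cfg, "/mnt/data")                 # default data directory
    registered = add_data_directory(cfg, "/mnt/data_2")  # extended data directory
```

Keeping the directories in registration order is a design choice of this sketch; it lets a later scan treat the default data directory as the first one.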
In one possible implementation, after setting the data directory 102b as the data directory of the RDS MySQL database, the data directory of the RDS MySQL database includes not only the data directory 102a but also the data directory 102b. After setting the data directory 102b as the data directory of the RDS MySQL database, newly generated database data in the RDS MySQL database, or newly created tables, may be stored in the data directory 102b to store database data in the disk 103 b.
As shown in fig. 2b, the tablespace 201 logically consists of a plurality of data pages. Before the physical file 1b is created within the data directory 102b, the tablespace 201 comprises 100 data pages, namely data pages 4001 through 4100. A data page in a tablespace is the smallest unit of data storage in the database; the size of a data page corresponds to a particular number of storage bytes (e.g., disk blocks) in the physical storage space, and different data pages in the same tablespace physically correspond to disk blocks of equal size. For example, before the data directory 102b shown in FIG. 2a is set as a data directory of the RDS MySQL database, the data of the 100 data pages within the tablespace 201 is all stored in the physical file 202a within the default data directory 102a, and the physical file 202a has been associated with tablespace 1. As shown in fig. 2b, the data of the 100 data pages of the tablespace 201 in the physical file 202a is stored in 100 disk blocks of the disk 103a, here the disk blocks 3001a to 3100a, respectively. For example, the data of data page 4001 is stored in disk block 3001a through physical file 202a.
It should be appreciated that the different disk blocks within disk 103a shown in FIG. 2b may have contiguous or noncontiguous physical addresses, which is not limited herein.
For example, after the data directory 102b shown in fig. 2a is set as a data directory of the RDS MySQL database, data continues to be written into the table space 1 corresponding to table 1 according to the service requirement. When the data page 4100 is full, the DBMS101 may create the data page 4101 in the table space and write the data to be inserted into the table space 1 into the data page 4101. The specific procedure is described below.
In fig. 1, table space 1 corresponds to only one physical file 1a in data directory 102a, but in fig. 2a modified by the present application, the database is additionally provided with data directory 102b.
In one possible implementation, when the DBMS101 receives a service request to write data to the tablespace 1, the DBMS101 may create an empty physical file 1b within the data directory 102b. The DBMS101 may then associate the newly created physical file 1b with the table space 1 and save the association, so that the table space 1 is associated with not only the physical file 1a in the data directory 102a but also the physical file 1b in the data directory 102b. The same table space is thereby associated with physical files in different data directories, which facilitates storing and accessing the table data of the table space in physical files in different data directories.
For example, both the physical file 1a and the physical file 1b may be associated with the tablespace 1 by giving the newly created physical file 1b the same file name as the physical file 1a already associated with the tablespace 1. For example, the names of physical file 1a and physical file 1b are both the name of table 1 (e.g., XX table) corresponding to table space 1.
For another example, the present application does not limit the file name of the newly created physical file 1b, but after the new physical file 1b is created, the DBMS101 may associate the file name of the physical file 1b with the name of the tablespace 1, and store the associated information as metadata of the physical file 1b into the database. Whereas the pre-created physical file 1a has been associated with the tablespace 1 prior to creation of the data directory 102b, the effect of associating both physical file 1a, physical file 1b with tablespace 1 is also achieved.
In addition, when the association between the physical file and the table space is realized, the association is not limited to the use of the file name or the table name, and other information capable of identifying the physical file or the table space may be used to realize the establishment of the association relationship.
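The two association approaches described above (a shared file name, or a separately stored mapping) can be illustrated with a small in-memory model. This is a sketch only; the `Tablespace` class, the function names, and the file name `file_0007` are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class Tablespace:
    name: str
    files: list = field(default_factory=list)  # associated file paths, in creation order


def associate_by_name(tablespace, data_dir):
    """Name-based association: the file in data_dir carries the tablespace's name."""
    path = f"{data_dir}/{tablespace.name}"
    if path not in tablespace.files:
        tablespace.files.append(path)
    return path


def associate_by_metadata(tablespace, path, metadata):
    """Metadata-based association: the file name is arbitrary; the mapping is stored separately."""
    metadata[path] = tablespace.name
    if path not in tablespace.files:
        tablespace.files.append(path)


# Same-name files in the default and the extended data directory.
ts1 = Tablespace("tablespace_1")
associate_by_name(ts1, "/mnt/data")
associate_by_name(ts1, "/mnt/data_2")

# Arbitrary file name, association recorded in a metadata mapping.
meta = {}
ts2 = Tablespace("tablespace_2")
associate_by_metadata(ts2, "/mnt/data_2/file_0007", meta)
```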
Thus, after setting the data directory 102b as the data directory of the DBMS101, the DBMS may create, in the data directory 102b, physical files corresponding to the respective table spaces, namely, physical files 1b to 5b corresponding to the table spaces 1 to 5, respectively, as shown in fig. 2a, after receiving a data write request to any one of the table spaces 1 to 5, and the DBMS101 may associate the newly created physical files in the data directory 102b with the respective table spaces, such that the same table space may be associated with the respective physical files in the data directory 102a and the respective physical files in the data directory 102 b. Before the data corresponding to the data writing request is written into the corresponding table space, the physical files 1b to 5b are empty files, and the file size is continuously increased along with the writing of the data.
Continuing with the table space 1 as an example, as shown in fig. 2b, after the DBMS101 associates the physical file 1a in the data directory 102a and the newly created physical file 1b in the data directory 102b with the table space 1 and stores the association, the DBMS101 may, in response to a service request for writing data into the table space 1, write the data to be inserted into the data page 4101. Because the data page 4101 is only a logical storage address, the DBMS101 writes the data of the data page 4101 into the physical file 1b, and that data is actually stored in the disk block 3001b of the disk 103b. Thus, to store the data of one table in different data directories, the application treats the table space as a set of data pages, where one or more data pages may correspond to one physical file in one data directory. The physical file corresponding to the table space is thereby split into a plurality of physical files, which may reside in different data directories. Compared with the related art, which can place data in different storage locations only in units of a table, the method can store different data pages of the same table in different storage locations in units of one or more data pages, and can specify the storage location of data at data-page granularity.
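The write path described above can be sketched with ordinary files standing in for the physical files. The sketch assumes a fixed page size of 10k (taken here as 10 × 1024 bytes, following the 10k page example used elsewhere in this description), assumes that both physical files carry the tablespace name, and uses temporary directories in place of /mnt/data and /mnt/data_2; the function name `append_page` is hypothetical.

```python
import os
import tempfile

PAGE_SIZE = 10 * 1024  # assumed page size in bytes


def append_page(tablespace_name, active_dir, payload):
    """Append one fixed-size page to the tablespace's physical file in active_dir."""
    if len(payload) > PAGE_SIZE:
        raise ValueError("payload larger than one page")
    path = os.path.join(active_dir, tablespace_name)
    page = payload.ljust(PAGE_SIZE, b"\x00")  # pad the payload to a full page
    with open(path, "ab") as f:               # the file is created on the first write
        f.write(page)
    return path


with tempfile.TemporaryDirectory() as root:
    default_dir = os.path.join(root, "data")     # stands in for data directory 102a
    extended_dir = os.path.join(root, "data_2")  # stands in for data directory 102b
    os.makedirs(default_dir)
    os.makedirs(extended_dir)

    # The first 100 pages were written while only the default directory existed.
    for _ in range(100):
        file_a = append_page("tablespace_1", default_dir, b"old row")

    # The 101st page is written to a new file in the extended directory.
    file_b = append_page("tablespace_1", extended_dir, b"new row")

    size_a = os.path.getsize(file_a)
    size_b = os.path.getsize(file_b)
```

From the point of view of the table, nothing changes: the tablespace simply has 101 pages, of which the last one happens to live in a different directory.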
It should be understood that fig. 2b only takes as an example the case where the data to be inserted into the tablespace 1 is written into the data page 4101. When a data insertion operation is performed on the tablespace 1, the number of data pages corresponding to the physical file 1b may be one or more; for example, if the amount of inserted data is large, the data written into the physical file 1b at one time may correspond to a plurality of data pages.
When data is written into the physical file 1b, the data may also be written into an existing data page of the table space 1 that is not yet full. For example, the data page 4002 has a size of 10k but only 8k of data has been written into it; in this case, 2k of the data to be inserted into the table space 1 this time may be written into the data page 4002, so that 8k of the data of the data page 4002 is stored in the physical file 1a and the remaining 2k is stored in the physical file 1b.
For example, fig. 2b also shows disk blocks 3002 to 3100 of disk 103b, after the data directory 102b is set as the data directory of the database, when the DBMS101 inserts data into the table space 2 to table space 5 shown in fig. 2a for the first time according to the service request, the physical files 2b to 5b associated with the corresponding table space may be created in the data directory 102b, and the data to be inserted into the table space 2 to 5 may be written into the physical files 2b to 5b according to the service request, and the data to be inserted into the table space 2 to 5 may be actually stored in the disk blocks of the disk 103b, for example, the disk blocks 3002 to 3100, and the number of the disk blocks to which the data is written may be determined according to the service requirement without limitation. And here, the data to be inserted into the table spaces 2 to 5 is not limited as to which disk block is written. After the data directory 102b is set as the data directory of the database, the principle of the implementation procedure of inserting data into any one of the table spaces 2 to 5 according to the service request by the DBMS101 is similar to the principle of the implementation procedure of inserting data into the table space 1 described above with respect to fig. 2b, and will not be repeated here.
Thus, as shown in FIG. 2a, the table data for table space 1 within DBMS101 may be stored within physical file 1a within data directory 102a and within physical file 1b within data directory 102 b. Similarly, table data for table space 2 within DBMS101 may be stored within physical file 2a within data directory 102a and within physical file 2b within data directory 102 b; the table data of the table space 3 within the DBMS101 may be stored within the physical file 3a within the data directory 102a and within the physical file 3b within the data directory 102 b; the table data of the table space 4 within the DBMS101 may be stored within the physical files 4a within the data directory 102a and within the physical files 4b within the data directory 102 b; the table data for the table space 5 within the DBMS101 may be stored within the physical files 5a within the data directory 102a and within the physical files 5b within the data directory 102 b. In this way, one tablespace may correspond to at least two physical files, and the at least two physical files may be associated with the same respective tablespace, such that the at least two physical files may constitute one logical tablespace. In this way, data of different data pages of the same tablespace may be stored in different file systems to enable storing data of the same tablespace in different types of disks.
It should be appreciated that which tablespaces within the DBMS101 have physical files within the data directory 102b depends on how the tablespaces of the database are accessed. If, after the data directory 102b is set for the database, the service does not request to write data to a tablespace already created within the DBMS101, no physical file associated with that tablespace is created in the data directory 102b; e.g., the data directory 102b will not include a physical file with the same name as the physical file of that tablespace within the data directory 102a.
The present application is different from the related art 1 in that any type of table of the present application (not limited to a read only table) can be stored in an extended data directory other than the default data directory 102 a.
In the present application, the number of extended data directories newly provided for the database is not limited to one as exemplified in the above embodiment and may be 2 or more. When the number of extended data directories is 2 or more, the different physical files associated with the same table space in the different data directories of the database (including the default data directory and the extended data directories) also have a creation order. Therefore, when the data of the table space is accessed, the table data in the corresponding physical files can be accessed according to the creation order of the different physical files of the table space. The specific principle is similar to the case of one extended data directory and is not repeated here.
It should be understood that adding a data directory to a database is not limited to the case where the disk space corresponding to the existing data directory of the database is full or the disk performs poorly. Moreover, when the number of extended data directories is 2 or more (for example, the extended data directories include the data directory 102b shown in fig. 2a and a new data directory 102c, not shown), the present application does not require that data of a table space first be written to the disk 103b corresponding to the data directory 102b before a physical file associated with the same table space is created in the data directory 102c and data to be written to the table space is written into that physical file. That is, the present application does not limit the creation order or the data writing order of the different physical files corresponding to the same tablespace in different data directories.
In some embodiments, the application scenario of the present application may be a database system in a cloud environment, and the host 100 shown in fig. 2a may also be a server in a computing cluster, and in some embodiments, the host 100 shown in fig. 2a may also be a virtual machine in the computing cluster, which is not limited in this aspect of the present application.
For example, in a cloud environment, a host (or virtual machine) of a database can often conveniently use various types of storage disks, such as a local disk, a Serial Advanced Technology Attachment (SATA) cloud disk, a solid state disk (Solid State Disk, SSD) cloud disk, and so on. Different types of storage disks tend to differ in price, speed, storage capacity, etc. In a cloud environment, the capacity of a local disk is often limited, whereas the upper capacity limit of a cloud disk is often large. To enable a database system to exploit both the high speed of a local disk and the large capacity of a cloud disk, the application can, in some scenarios, mount a cloud disk on the database host and use a data directory on the cloud disk as an extended data directory of the database. Thus, in this application scenario, the present application may store part of the data of a tablespace in the default data directory 102a, for high-speed access using the disk 103a (e.g., the local disk 1), and store another part of the data of the tablespace in the extended data directory 102b, for mass storage using the disk 103b (e.g., the cloud disk), which enables both high-speed access and mass storage of database data.
In one possible implementation, after the data directory 102b is set for the database, the physical files 1b to 5b corresponding to the table spaces 1 to 5 are created as shown in fig. 2a, and the physical files 1b to 5b are associated with the table spaces 1 to 5, respectively, the DBMS101 may again receive a data access request (which may include any database operation such as insertion, deletion, modification, or query of data) for any one of the table spaces 1 to 5 (for example, the table space 1). In this case, based on the data access request, the file size of the physical file 1a, the file size of the physical file 1b, and the creation order of the physical file 1a and the physical file 1b, the DBMS101 may determine which of the physical file 1a and/or the physical file 1b associated with the table space 1 needs to be accessed this time, and map the data page to be accessed in the table space 1 to the corresponding disk block, so as to perform the requested data access.
Because the DBMS101 responds to the data access requests of the service in terms of database objects (e.g., table spaces), and the present application changes only the storage location of the data pages of a table space so that the table space may correspond to 2 or more physical files, the table space seen by the service is unchanged and the service cannot perceive the change of physical files. The user triggering the service does not need to create or modify a table of the database in order to use the extended data directory 102b. Moreover, a table whose data pages are located in a plurality of data directories (for example, a table corresponding to any one of table spaces 1 to 5) can support all database functions of an ordinary table.
In one possible implementation, with continued reference to FIG. 2a, after setting up the data directory 102b for the database, the DBMS101 receives a data access request, e.g., the data access request includes creating a table (e.g., table 6), and writing data to the table, the DBMS101 may create a table space 6 in response to the data access request, where the table space 6 does not yet include any data pages, and the data pages within the table space 6 gradually increase as data is written to the table space 6. And the DBMS101 may create a physical file 6b associated with the tablespace 6 within the data directory 102b and save the association.
Since tablespace 6 is the tablespace of a table newly created after the creation of data directory 102b, the table of tablespace 6 may be understood as a new table, and thus tablespace 6 does not currently have a corresponding physical file in data directory 102a. In contrast, tablespaces 1 through 5 were created before the data directory 102b was set for the database and all belong to old tables, so the data directory 102a already contained the physical files of these 5 tablespaces before the data directory 102b was set for the database.
Continuing with the data access request of the DBMS101 for the above-mentioned table space 6, after the DBMS101 creates the physical file 6b associated with the table space 6 in the data directory 102b and saves the association relationship, the DBMS101 may write the data to be inserted into the table space 6 into the physical file 6b according to the data access request, so that the data of the table space 6 is written into the disk 103b. The data storage principle and process are similar to the process shown in fig. 2b, with data written in terms of data pages and disk blocks, and are not repeated herein. Thus, the table data of table space 6 may be stored on disk 103b, while the table data of table space 1 through table space 5 may be stored on disk 103a and optionally also on disk 103b, thereby achieving the effect of storing table data of different table spaces in different storage locations.
In a possible implementation manner, as database data stored in the disk 103a shown in fig. 2a is deleted, the free storage space in the disk 103a may become greater than a preset threshold; the preset threshold may be flexibly configured according to requirements and is not described herein. In this case, upon receiving a data insertion request for a new table (e.g., the table of table space 6), the DBMS101 may create an empty physical file 6a within the data directory 102a and associate the physical file 6a with the table space 6. For example, the DBMS101 may save the association of the physical file 6a with the table space 6, so that the table space 6 is associated with not only the physical file 6b within the data directory 102b but also the physical file 6a within the data directory 102a; the same table space is thus associated with physical files within different data directories. Then, according to the data insertion request, the DBMS101 may write data into at least one data page of the table space 6 and write the data of the at least one data page into the physical file 6a, so that the database data inserted this time is stored into corresponding disk blocks in the disk 103a. In this way, different data pages of the same table space can also be stored in different storage locations for a new table.
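The threshold check described above could be sketched as follows. The application states only that a physical file may be created in the data directory 102a once the free space of the disk 103a exceeds a preset threshold; falling back to the extended directory otherwise, and the function names used here, are assumptions of this sketch. `shutil.disk_usage` is the standard-library call for querying free space; a stub is injected in the usage example so that the result does not depend on real disks.

```python
import shutil


def free_bytes(path):
    """Free space, in bytes, of the file system that contains path."""
    return shutil.disk_usage(path).free


def choose_data_directory(default_dir, extended_dir, threshold, free_fn=free_bytes):
    """Prefer the default directory when its free space exceeds the threshold."""
    if free_fn(default_dir) > threshold:
        return default_dir
    return extended_dir


# Stubbed free-space figures (hypothetical): 50 GiB free locally, 500 GiB on the cloud disk.
GIB = 2 ** 30
fake_free = {"/mnt/data": 50 * GIB, "/mnt/data_2": 500 * GIB}

chosen_when_roomy = choose_data_directory("/mnt/data", "/mnt/data_2", 20 * GIB, fake_free.get)
chosen_when_tight = choose_data_directory("/mnt/data", "/mnt/data_2", 80 * GIB, fake_free.get)
```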
Thus, in embodiments of the present application, data directory 102a and data directory 102b may each store different physical files of the same tablespace of DBMS101 and/or physical files of different tablespaces.
The above-described embodiments mainly describe a process of writing data to an old table after the data directory 102b is set as a data directory of a database, and creating a new table (e.g., the table space 6) and writing data to the new table, which involves the steps of creating a physical file corresponding to a table space of a corresponding accessed table in the data directory 102b, and saving an association relationship of the physical file with the table space. For example, the association of the physical file with the tablespace may be stored in a configuration file (e.g., metadata file) of the database.
To ensure that the database can still efficiently access a tablespace associated with a plurality of physical files after each restart, in one possible implementation, each time the database is restarted, the DBMS101 may read the addresses of the data directories of the database from the configuration file of the database, here the address of the data directory 102a (e.g., /mnt/data shown in FIG. 2a) and the address of the data directory 102b (e.g., /mnt/data_2 shown in FIG. 2a). The DBMS101 then scans information (e.g., name, size, etc.) of the physical files in the data directory 102a according to the address of the data directory 102a, and scans information (e.g., name, size, etc.) of the physical files in the data directory 102b according to the address of the data directory 102b. When the same table space has associated physical files in both the data directory 102a and the data directory 102b, the DBMS101 may find 2 physical files of the same name in the two data directories. For example, the DBMS101 finds the physical file 202a and the physical file 202b, both named "table space 1", and can thus determine that the two physical files are associated with the table space 1, that is, that the data of the table corresponding to the table space 1 is stored in the physical file 202a and the physical file 202b.
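The restart scan described above can be sketched as grouping same-named files across the configured data directories. This sketch covers only the name-based association; the function name `scan_data_directories` and the use of temporary directories are illustrative assumptions, and the order of the directory list is taken here as a stand-in for the creation order of the files.

```python
import os
import tempfile
from collections import defaultdict


def scan_data_directories(data_dirs):
    """Group files by name across directories; each value is a list of (path, size)."""
    by_tablespace = defaultdict(list)
    for d in data_dirs:  # directories are visited in the order they were registered
        with os.scandir(d) as entries:
            for entry in entries:
                if entry.is_file():
                    by_tablespace[entry.name].append((entry.path, entry.stat().st_size))
    return dict(by_tablespace)


with tempfile.TemporaryDirectory() as root:
    dir_a = os.path.join(root, "data")    # stands in for data directory 102a
    dir_b = os.path.join(root, "data_2")  # stands in for data directory 102b
    os.makedirs(dir_a)
    os.makedirs(dir_b)
    layout = [(dir_a, "tablespace_1", 1000), (dir_b, "tablespace_1", 10), (dir_a, "tablespace_2", 50)]
    for d, name, size in layout:
        with open(os.path.join(d, name), "wb") as f:
            f.write(b"\x00" * size)

    found = scan_data_directories([dir_a, dir_b])
    sizes_ts1 = [size for _, size in found["tablespace_1"]]
    multi_file = sorted(name for name, files in found.items() if len(files) > 1)
```

Here `multi_file` lists the tablespaces whose data is spread over more than one data directory.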
Or in one possible implementation, the DBMS101 associates the physical file 202b with the tablespace 1 in a manner that does not create the physical file 202b with the same name as the physical file 202a, but rather associates the name of the physical file 202b with the name of the tablespace 1 and saves the association in metadata of the physical file 202 b. Then in this embodiment, after the database is restarted, the DBMS101 may determine which tablespace each physical file is specifically associated with by reading the metadata information of each physical file in the data directory 102 b; and the DBMS101 may also determine which tablespace each physical file in the data directory 102a is associated with by reading the names of the physical files in the data directory 102a in the above embodiment (of course, other implementations are also possible, and are not limited herein), so that the DBMS101 may determine which physical files in the data directory 102a and the data directory 102b are associated with the same tablespace and which tablespace when the database is restarted.
After the database is restarted, the DBMS101 may determine the physical files associated with the same tablespace within the data directory 102a, 102b, and further, the DBMS101 may record the size of at least two physical files associated with the same tablespace, as well as information of the order of creation of the at least two physical files.
Taking tablespace 1 as an example (other tablespaces are handled in the same way and are not described in detail here), each time after the database is restarted, the DBMS101 may record the file size of the physical file 202a and the file size of the physical file 202b associated with the tablespace 1, and record the information that the physical file 202a was created earlier than the physical file 202b. Thus, after the database is restarted, if the service needs to access the tablespace 1, the DBMS101 may use the recorded file sizes and the recorded creation order of the files to map the data page of the tablespace 1 to be accessed to a disk block of the corresponding physical file. This maps the logical address to a physical address and enables accurate access to the data of a tablespace whose data is stored in different data directories.
Taking fig. 2b as an example, suppose that one data page has a size of 10k, that the physical file 202a stores 100 data pages and has a file size of 1000k, and that the physical file 202b stores one data page and has a file size of 10k. The physical file 202a is created earlier than the physical file 202b. It should be appreciated that the DBMS101 writes data to the tablespace 1 in sequential order, e.g., the data page 4002 is created after the data page 4001 is full, the data page 4003 is created after the data page 4002 is full, and so on, such that the data page 4004 to the data page 4101 are filled in sequence, which is not repeated here.
Since the size of each data page is known, the sizes of the physical files 202a, 202b are known, and the order of creation of the physical files 202a and 202b is known, the DBMS101 can write data to the physical files in this order of creation, and thus the DBMS101 can determine that the physical file 202a stores the data of the first 100 data pages of Table space 1, and that the physical file 202b stores the data of the 101 st data page of Table space 1. The file system 104a where the physical file 202a is located is a file system of the disk 103a, and a mapping relationship between addresses may be provided between the file system 104a and the disk 103a (which may be implemented by any one of the prior art without limitation herein), so that 100 data pages stored in the physical file 202a may be mapped to corresponding 100 disk blocks of the disk 103 a; similarly, the mapping relationship between addresses may be provided between the file system 104b and the disk 103b (which may be implemented by any of the prior art techniques, without limitation), so that 1 data page stored in the physical file 202b may be mapped to a corresponding 1 disk block of the disk 103 b. Thus, 100 disk blocks stored by physical file 202a are automatically mapped to pages 1 through 100 of tablespace 1, and one disk block stored by physical file 202b is automatically mapped to page 101 of tablespace 1.
Thus, each time after a database restart, the DBMS101 may map the logical data page in the tablespace 1 accessed by the current data access request to the disk block of the disk according to the data access request of the tablespace 1, the recorded file sizes of the physical file 202a and the physical file 202b associated with the tablespace 1, and the creation order between the two files, so as to realize the mapping of the logical address to the physical address, so as to realize the access (which may include the inquiry, deletion and modification) to the data in the tablespace associated by at least two physical files.
For example, when the current data access request requests to access the 2 nd data page of the table space 1, the data access is performed at the corresponding physical address mapped in the disk 103a corresponding to the physical file 202 a; for another example, when the current data access request accesses the 101 st data page of the table space 1, the data access is performed at the corresponding physical address mapped in the disk 103b corresponding to the physical file 202 b.
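The mapping from a logical data page to a physical file can be illustrated with a short sketch. It assumes a fixed page size of 10 KiB (the "10k" of the example), 1-based page numbers, and that each physical file holds a whole number of pages; the function name and the file labels are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

PAGE_SIZE = 10 * 1024  # assumed page size in bytes


def locate_page(page_no, files, page_size=PAGE_SIZE):
    """Map a 1-based logical page number to (file name, byte offset).

    `files` is a list of (name, size in bytes) ordered by creation time,
    earliest first, as recorded after the restart.
    """
    if page_no < 1:
        raise ValueError("page numbers start at 1")
    pages_per_file = [size // page_size for _, size in files]
    # cumulative[i] is the number of pages stored in files[0..i].
    cumulative = list(accumulate(pages_per_file))
    index = bisect_right(cumulative, page_no - 1)
    if index == len(files):
        raise IndexError("page lies beyond the last physical file")
    pages_before = cumulative[index] - pages_per_file[index]
    offset = (page_no - 1 - pages_before) * page_size
    return files[index][0], offset
```

With files = [("202a", 100 * PAGE_SIZE), ("202b", 1 * PAGE_SIZE)], page 2 resolves to the file labelled "202a" and page 101 resolves to offset 0 of the file labelled "202b", matching the two cases described above. The translation of a file offset to a disk block is left to the file system.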
If the data access request is a request to insert data, the data may by default be inserted into the physical file 202b without consulting the recorded file sizes and file creation order, or may be inserted into the physical file 202a when the disk 103a has free space, which is not limited herein.
In the embodiment of the application, each time the database is restarted, the database can record the file sizes of at least two physical files associated with the same tablespace and the information on the creation order of those files. Thus, after the database is restarted, when the service requests to delete, modify, or query data in a tablespace of the database, the database can, according to the recorded information, map the logical address of the accessed data page in the accessed tablespace, which corresponds to at least two physical files, to a physical address of the physical file 202a or the physical file 202b, so as to realize accurate data access to a tablespace associated with at least two physical files located in different data directories.
The above process uses the tablespace 1 as an example to illustrate data access after the database is restarted. When the accessed table is a new table, for example the table corresponding to the tablespace 6 shown in fig. 2a, and the data directory 102a also includes the physical file 6a, the physical file 6b is created earlier than the physical file 6a; the rest of the process is the same as the data access process described for the tablespace 1 and is not repeated herein.
In various embodiments of the present application, the table spaces corresponding to the plurality of physical files in the same data directory are different, in other words, the data of one table space may be stored in the plurality of data directories, but when the data of one table space is stored in any one data directory, the data of the table space may be stored in only one physical file of the corresponding data directory.
In one possible implementation, the backup tool may be improved when the table data of the database is backed up, and the backup tool of the present application may obtain address information of a data directory of the database, such as an address of the data directory 102a and an address of the data directory 102b shown in fig. 2a, by reading a configuration file of the database; then, the backup tool may scan information (name, size, etc.) of the physical files from the two data directories using addresses of the two data directories, so as to determine which tablespaces are respectively associated with the physical files in the two data directories, the physical files associated with the tablespaces, and the creation order of the physical files; then, the backup tool may read data from the physical files associated with the same table space in the two data directories, for example, table space 1, where the backup tool may read data from the physical file 202a of the data directory 102a and data from the physical file 202b of the data directory 102 b; and the backup tool concatenates the data read from the physical file 202b to the tail of the data read from the physical file 202a in the order of creation of the physical file (where the physical file 202a was created earlier than the physical file 202 b) to obtain backup data for the tablespace 1.
With continued reference to FIG. 2b, for example, the data in physical file 202a is data stored in disk block 3001a, disk blocks 3002a, …, disk block 3100a in order of storage from first to last. Then, as shown in fig. 2c, the backup data of the table space 1 is sequentially data stored in the disk block 3001a, the disk blocks 3002a, …, the disk block 3100a and the disk block 3001b according to the sequence of the data, where the data stored in the disk block 3001b is connected to the data tail of the disk block 3100 a.
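The splicing performed by the backup tool can be sketched as follows. This is only an illustration under the assumption that the caller already knows the creation order of the physical files; the function name and the paths are hypothetical.

```python
import shutil


def backup_tablespace(paths_in_creation_order, backup_path):
    """Write the backup of one tablespace by concatenating its physical files.

    The files are appended earliest-created first, so the data of a later
    file is connected to the tail of the data of the earlier file.
    """
    with open(backup_path, "wb") as out:
        for path in paths_in_creation_order:
            with open(path, "rb") as src:
                # Stream the file contents so large files are not held in memory.
                shutil.copyfileobj(src, out)
    return backup_path
```

For the tablespace 1, the list would contain the path of the physical file 202a followed by the path of the physical file 202b, so the resulting backup holds the data of the first 100 data pages followed by the data of the 101st data page.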
By way of example, the backup tool may be an improvement over existing database backup tools, such as adding a tablespace file aggregation component to effect a backup of data for a tablespace associated with a plurality of physical files. The backup tool may use any existing data backup method of the database, and is not limited herein. For example, the backup tool may be software.
Thus, the database management system of the application can support full-scale data backup for the tables stored in a plurality of data catalogs, and the database objects operated by the data backup operation are still table spaces, do not relate to the storage positions of specific data pages, and can be transparent to the full-scale backup operation.
In the embodiment of the application, by adding the physical files for storing the data of the table space, different data in one table corresponding to the table space can be respectively stored in different physical files, so that only the storage position of the bottom table data of the table space is changed, and a plurality of physical files positioned at different storage positions are associated with the same table space, so that the plurality of physical files (for example, the physical file 1a and the physical file 1 b) form a logical table space object of the table space 1, thereby realizing the splitting of the data storage position of the same table space. Because the database is based on the operation of database objects (such as table spaces instead of physical files), various functions of the database cannot detect the change of the storage position of the table data in the table space, and the service cannot sense the change of the storage position of the data in the table, so that when the service accesses the database data, the method can realize hot switching access of the data in the table space corresponding to different physical files, and the service cannot sense the hot switching. In this way, the database operation functions supported by the tables (e.g., tables corresponding to table spaces 1 through 6 in fig. 2 a) split by the storage locations can be kept consistent with the database operation functions supported by the normal tables (e.g., tables corresponding to table spaces 1 through 6 in fig. 1) in which no storage locations split is performed in the database.
Fig. 2a is a schematic diagram illustrating an exemplary system framework. It should be understood that the architecture shown in fig. 2a is only an example, and that the system of the present application may have more or fewer components than shown in the figures, may combine two or more components, or may have different configurations of components. The various components shown in fig. 2a may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
In one possible implementation manner, the application further provides a data storage system of a database system, wherein a data directory of the database system comprises: a first data directory and at least one second data directory, the data storage system comprising: the first writing module is used for writing first data into a second physical file in the second data directory based on a first writing request; wherein the first write request includes first data requesting writing to a first tablespace; the second physical file is associated with the first tablespace; the first tablespace is also associated with a first physical file within the first data directory; the first physical file includes partial data of the first tablespace.
In a possible implementation manner, the first writing module is specifically configured to: creating a second physical file associated with the first tablespace within the second data directory based on a first write request if it is determined that the second physical file associated with the first tablespace is not included within the second data directory; and writing the first data into the second physical file.
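The behaviour of the first writing module in this implementation can be illustrated by a minimal sketch. It assumes that the second physical file carries the name of the tablespace and that new data is appended to the end of the file; the function name is hypothetical.

```python
import os


def write_to_second_directory(second_dir, tablespace_name, data):
    """Append `data` to the tablespace's physical file in `second_dir`.

    Returns True if the physical file did not exist and was created by
    this call, and False if an existing file was extended.
    """
    path = os.path.join(second_dir, tablespace_name)
    created = not os.path.exists(path)
    # Mode "ab" creates the file when it is missing and appends otherwise.
    with open(path, "ab") as f:
        f.write(data)
    return created
```

The first write request for a tablespace therefore creates the second physical file, and later write requests extend it, while the first physical file in the first data directory is left untouched.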
In one possible implementation, the first physical file includes data of at least one data page of the first tablespace, and the second physical file includes data of at least one data page of the first tablespace.
In one possible implementation, the data storage system further comprises: the mapping module is used for mapping a first access address to the first table space to a second access address in a third physical file based on a first access request to the first table space, according to the file size of the first physical file, the file size of the second physical file and the creation sequence between the first physical file and the second physical file, wherein the third physical file is the first physical file or the second physical file, and the first access request comprises the first access address; and the first access module is used for accessing the data of the first table space according to the second access address.
In one possible implementation, the data storage system further comprises: a first determining module, configured to determine, when the database system is restarted, at least two fourth physical files associated with the same second tablespace from among the physical files in the first data directory and the at least one second data directory; the recording module is used for recording first information, wherein the first information comprises: the file size of each of the at least two fourth physical files associated with the second tablespace, and the creation order between different ones of the at least two fourth physical files.
In one possible implementation, the data storage system further comprises: and the second access module is used for accessing the data of the second table space in at least one fourth physical file associated with the second table space according to the first information recorded in the second table space based on a second access request of the second table space.
In one possible implementation, the data storage system further comprises: a first creation module configured to create a third tablespace in the database system based on a second write request, where the second write request includes second data requesting writing to the third tablespace; a second creating module, configured to create a fifth physical file associated with the third tablespace in the second data directory; and the second writing module is used for writing the second data into the fifth physical file.
In one possible implementation, the data storage system further comprises: a second determining module, configured to determine at least two sixth physical files associated with the same fourth tablespace in the physical files in the first data directory and in the at least one second data directory, respectively; the reading module is used for reading the data in each sixth physical file in the at least two sixth physical files; the splicing module is used for connecting the data respectively read from the at least two sixth physical files according to the sequence from the early to the late of the creation time based on the creation sequence between different sixth physical files in the at least two sixth physical files so as to acquire the table data of the fourth table space; and the backup module is used for backing up the table data of the fourth table space.
In one possible implementation, the first physical file is created before the second physical file.
In one possible embodiment, the plurality of data directories of the database system are located in different file systems, respectively.
In one possible implementation manner, disk types of disks corresponding to a plurality of file systems corresponding to the database system are different.
An apparatus provided by an embodiment of the present application is described below. As shown in fig. 3:
Fig. 3 is a schematic structural diagram of a data storage device of a database system according to an embodiment of the present application. As shown in fig. 3, the apparatus 500 may include: processor 501, transceiver 505, and optionally memory 502.
The transceiver 505 may be referred to as a transceiver unit, a transceiver circuit, etc. for implementing a transceiver function. The transceiver 505 may include a receiver, which may be referred to as a receiver or a receiving circuit, etc., for implementing a receiving function, and a transmitter; the transmitter may be referred to as a transmitter or a transmitting circuit, etc., for implementing a transmitting function.
The memory 502 may store a computer program or software code or instructions 504, which computer program or software code or instructions 504 may also be referred to as firmware. The processor 501 may implement the data storage method of the database system provided in the embodiments of the present application by running a computer program or software code or instructions 503 therein or by calling a computer program or software code or instructions 504 stored in the memory 502. The processor 501 may be a central processing unit (central processing unit, CPU), and the memory 502 may be, for example, a read-only memory (ROM), or a random access memory (random access memory, RAM).
The processor 501 and transceiver 505 described in the present application may be implemented on an integrated circuit (integrated circuit, IC), an analog IC, a radio frequency integrated circuit (RFIC), a mixed signal IC, an application specific integrated circuit (application specific integrated circuit, ASIC), a printed circuit board (printed circuit board, PCB), an electronic device, or the like.
The apparatus 500 may further include an antenna 506, and the modules included in the apparatus 500 are merely illustrative, and the present application is not limited thereto.
The structure of the data storage device of the database system is not limited to that shown in fig. 3. The data storage device of the database system may be a stand-alone device or may be part of a larger device. For example, the data storage device of the database system may be implemented in one of the following forms:
(1) A stand-alone integrated circuit IC, or chip, or a system-on-a-chip or subsystem; (2) A set of one or more ICs, optionally including storage means for storing data, instructions; (3) modules that may be embedded within other devices; (4) an in-vehicle apparatus, etc.; (5) others, and so forth.
For the case where the data storage device of the database system is implemented as a chip or a chip system, reference is made to the schematic structure of the chip shown in fig. 4. The chip shown in fig. 4 includes a processor 601 and an interface 602. Wherein the number of processors 601 may be one or more, and the number of interfaces 602 may be a plurality. Alternatively, the chip or system of chips may include a memory 603.
For all relevant content of each step of the above method embodiments, reference may be made to the functional description of the corresponding functional module; details are not repeated herein.
Based on the same technical idea, the embodiments of the present application also provide a computer-readable storage medium storing a computer program, the computer program containing at least one piece of code executable by a computer to control the computer to implement the above-mentioned method embodiments.
Based on the same technical idea, the embodiments of the present application also provide a computer program for implementing the above-mentioned method embodiments when the computer program is executed.
The program may be stored in whole or in part on a storage medium that is packaged with the processor, or in part or in whole on a memory that is not packaged with the processor.
Based on the same technical conception, the embodiment of the application also provides a chip comprising a processor. The processor may implement the method embodiments described above.
The steps of a method or algorithm described in connection with the present disclosure may be embodied in hardware, or may be embodied in software instructions executed by a processor. The software instructions may be comprised of corresponding software modules that may be stored in random access memory (Random Access Memory, RAM), flash memory, read-only memory (Read Only Memory, ROM), erasable programmable read-only memory (Erasable Programmable ROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the embodiments of the present application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.

Claims (15)

1. A method for storing data in a database system, wherein a data directory in the database system comprises: a first data directory and at least one second data directory, the method comprising:
Writing first data to a second physical file in the second data directory based on a first write request;
Wherein the first write request includes first data requesting writing to a first tablespace;
the second physical file is associated with the first tablespace;
the first tablespace is also associated with a first physical file within the first data directory;
The first physical file includes partial data of the first tablespace.
2. The method of claim 1, wherein writing first data to a second physical file within the second data directory based on the first write request comprises:
Creating a second physical file associated with the first tablespace within the second data directory based on a first write request if it is determined that the second physical file associated with the first tablespace is not included within the second data directory;
and writing the first data into the second physical file.
3. The method according to claim 1 or 2, wherein the first physical file comprises data of at least one data page of the first tablespace and the second physical file comprises data of at least one data page of the first tablespace.
4. A method according to any one of claims 1 to 3, wherein after the writing of the first data to the second physical file within the second data directory, the method further comprises:
Mapping a first access address to the first table space to a second access address in a third physical file based on a first access request to the first table space, a file size of the first physical file, a file size of the second physical file, and a creation sequence between the first physical file and the second physical file, wherein the third physical file is the first physical file or the second physical file, and the first access request includes the first access address;
accessing the data of the first tablespace according to the second access address.
5. The method according to any one of claims 1 to 4, further comprising:
determining at least two fourth physical files associated with the same second table space in respective physical files in the first data directory and the at least one second data directory when the database system is restarted;
Recording first information, wherein the first information comprises: the file size of each of the at least two fourth physical files associated with the second tablespace, and the creation order between different ones of the at least two fourth physical files.
6. The method of claim 5, wherein after the recording of the first information, the method further comprises:
Based on a second access request to the second tablespace, accessing data of the second tablespace in at least one fourth physical file associated with the second tablespace in accordance with the first information recorded for the second tablespace.
7. The method according to any one of claims 1 to 6, further comprising:
Creating a third tablespace in the database system based on a second write request, wherein the second write request includes second data requesting writing to the third tablespace;
Creating a fifth physical file associated with the third tablespace within the second data directory;
and writing the second data into the fifth physical file.
8. The method according to any one of claims 1 to 7, further comprising:
determining at least two sixth physical files associated with the same fourth tablespace in the respective physical files in the first data directory and in the at least one second data directory;
Reading data in each sixth physical file in the at least two sixth physical files;
Based on the creation sequence of different sixth physical files in the at least two sixth physical files, connecting the data respectively read from the at least two sixth physical files according to the sequence from the early to the late of the creation time so as to acquire the table data of the fourth table space;
And backing up the table data of the fourth table space.
9. The method of any one of claims 1 to 8, wherein the first physical file is created before the second physical file.
10. The method according to any of claims 1 to 9, wherein the plurality of data directories of the database system are located in different file systems, respectively.
11. The method of claim 9, wherein the plurality of disks corresponding to the file system for the database system are different in disk type.
12. A data storage system of a database system, wherein a data directory of the database system comprises: a first data directory and at least one second data directory, the data storage system comprising:
the first writing module is used for writing first data into a second physical file in the second data directory based on a first writing request;
Wherein the first write request includes first data requesting writing to a first tablespace;
the second physical file is associated with the first tablespace;
the first tablespace is also associated with a first physical file within the first data directory;
The first physical file includes partial data of the first tablespace.
13. A computer readable storage medium comprising a computer program which, when run on a computer or processor, causes the computer or processor to perform the method of any one of claims 1 to 11.
14. A data storage device of a database system, comprising one or more interface circuits and one or more processors; the interface circuit is configured to receive a signal from the memory and to send the signal to the processor, the signal comprising computer instructions stored in the memory; the processor, when executing the computer instructions, is adapted to perform the method of any one of claims 1 to 11.
15. A computer program product, characterized in that the computer program product comprises a software program which, when executed by a computer or processor, causes the steps of the method of any one of claims 1 to 11 to be performed.
CN202211397508.0A 2022-11-09 2022-11-09 Data storage method and system of database system Pending CN118051642A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211397508.0A CN118051642A (en) 2022-11-09 2022-11-09 Data storage method and system of database system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211397508.0A CN118051642A (en) 2022-11-09 2022-11-09 Data storage method and system of database system

Publications (1)

Publication Number Publication Date
CN118051642A true CN118051642A (en) 2024-05-17

Family

ID=91047042

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211397508.0A Pending CN118051642A (en) 2022-11-09 2022-11-09 Data storage method and system of database system

Country Status (1)

Country Link
CN (1) CN118051642A (en)

Legal Events

Date Code Title Description
PB01 Publication