WO2016117007A1

WO2016117007A1 - Database system and database management method

Info

Publication number: WO2016117007A1
Application number: PCT/JP2015/051249
Authority: WO
Inventors: 上村　哲也
Original assignee: 株式会社日立製作所 (Hitachi, Ltd.)
Priority date: 2015-01-19
Filing date: 2015-01-19
Publication date: 2016-07-28

Abstract

Provided is a database system comprising a first individual execution unit that executes transactions on a database of a first version (the first database; e.g., a production environment), and a version management unit that, each time a version (e.g., an additional development environment) is added, generates an additional database, which is a snapshot of the first database at the time the version is added, and an additional individual execution unit that executes transactions on the additional database. Each time a row is updated in any of the databases corresponding to the individual execution units, which include at least the first individual execution unit, the version management unit adds to history information an entry containing the ID of the updated row, the post-update data stored in that row, and the ID of the version corresponding to the individual execution unit that performed the update.

Description

Database system and database management method

The present invention generally relates to database management.

A conceivable method is to acquire a snapshot of the database in a database development environment and execute transactions on that snapshot. Techniques relating to database snapshots and transactions are known, for example, from Patent Documents 1 and 2.

Patent Documents

Patent Document 1: JP-A-5-6297
Patent Document 2: JP-T 2007-53156

A database snapshot has only one generation and is read-only. When a transaction is committed, the snapshot is switched to a new generation. A database snapshot therefore cannot simply be used as a development environment.

To build a database development environment equivalent to the production environment, servers and storage equivalent to those of the production environment must be introduced, and a copy of the database is also required. As a result, building the environment takes a long time and is costly.

Conversely, if the development environment is made weaker than the production environment to avoid this problem (for example, by introducing servers and storage that are less reliable than those of the production environment), testing in the development environment becomes insufficient and the reliability of the production environment decreases (for example, failures that could not be found in the development environment may occur in the production environment).

The database system includes a first individual execution unit that executes transactions on the database of a first version (the first database; for example, a production environment), and a version management unit that, each time a version (for example, a development environment) is added, generates an additional database that is a snapshot of the first database at the time the version is added, together with an additional individual execution unit that executes transactions on the additional database. Each time a row of any of the databases corresponding to the individual execution units (which include at least the first individual execution unit) is updated, the version management unit adds to the history information an entry containing the ID of the updated row, the post-update data stored in that row, and the ID of the version corresponding to the individual execution unit that performed the update.

A development environment equivalent to the production environment can be built quickly and at low cost.

FIG. 1 is a block diagram of the computer system according to the embodiment.
FIG. 2 is a block diagram of the database engine.
FIG. 3 shows the overall update history information.
FIG. 4A shows individual update history information #0.
FIG. 4B shows individual update history information #1.
FIG. 4C shows individual update history information #2.
FIG. 5 is a flowchart of the snapshot creation process.
FIG. 6 is a flowchart of the data writing process.
FIG. 7 is a flowchart of the data reading process.
FIG. 8 is a flowchart of the garbage collection process.
FIG. 9 is a flowchart of the anonymization process.
FIG. 10 is a flowchart of the resource control process.

Hereinafter, an embodiment will be described.

In the following description, the “storage unit” may be one or more storage devices including a memory. For example, of a main storage device (typically a volatile memory) and an auxiliary storage device (typically a nonvolatile storage device), the storage unit may be at least the main storage device.

Also, in the following description, “PDEV” indicates a physical storage device, and may typically be a nonvolatile storage device (for example, an auxiliary storage device). The PDEV may be, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive).

In the following description, processing may be described with a functional unit (e.g., the version management unit or the concurrent execution control unit) as the subject. Because a functional unit performs predetermined processing by being executed by a processor (e.g., a CPU (Central Processing Unit)) while using a storage unit (e.g., memory) and/or an interface device (e.g., a communication port) as appropriate, the subject of the processing may instead be the processor. Processing described with a functional unit as the subject may be processing performed by a processor, or by an apparatus or system having the processor. The processor may include a hardware circuit that performs part or all of the processing. At least some of the functional units may be realized by hardware circuits. A program may be installed in a computer from a program source. The program source may be, for example, a program distribution server or a computer-readable storage medium. When the program source is a program distribution server, the server may include a processor (e.g., a CPU) and a storage unit, and the storage unit may store a distribution program and a program to be distributed; the processor of the program distribution server then executes the distribution program to distribute the program to other computers. In the following description, two or more functional units may be realized as one functional unit, and one functional unit may be realized as two or more functional units.

FIG. 1 is a configuration diagram of a computer system according to the embodiment.

The computer system 1 includes one or more client computers 100, a database server 200 (an example of a database system), and an external storage apparatus 300. The client computers 100 and the database server 200 are connected via a communication network 10 (for example, a LAN (Local Area Network)).

The client computer 100 executes processing using database data. For example, the client computer 100 issues a database query to the database server 200 in the process.

The external storage apparatus 300 has one or more PDEVs and can store all or part of the database. The external storage apparatus 300 may be omitted; that is, the entire database may be stored in the memory 220 described later, and the database server 200 may be an in-memory database.

The database server 200 includes a CPU 210, a memory 220, a NIC (Network Interface Card) 230, and an HBA (Host Bus Adapter) 240. The NIC 230 is an interface device that communicates with the client computer 100 via the communication network 10. The HBA 240 is an interface device that communicates with the external storage apparatus 300.

The memory 220 stores a program for realizing a functional unit (a program for executing processing) and management information referred to by at least one functional unit. The memory 220 may store at least a part of the database. In the present embodiment, the memory 220 stores a database engine program and management information. The CPU 210 functions as a database engine 400 as shown in FIG. 2 by executing the database engine program. The management information includes, for example, overall update history information and individual update history information, which will be described later.

FIG. 2 is a configuration diagram of the database engine 400.

The database engine 400 is an example of a database management system (DBMS) and includes a query processing unit 410, a concurrent execution control unit 420, a data conversion unit 430, a resource management unit 440, a data comparison unit 450, a query saving unit 460, a fast-forward unit 470, a speculative execution unit 480, and a version management unit 490.

The query processing unit 410 generates a query execution plan for executing a query received from the client computer 100. The query is described in, for example, SQL (Structured Query Language). The query may also come from a source other than the client computer 100, for example a program (such as an application program) that runs on top of the database engine 400 inside or outside the database server 200. In addition, the query processing unit 410 can determine which of a plurality of versions (production environment, development environment, etc.) a received query is directed to, and associates the query with that version. The query is then executed by the concurrent execution control unit 420 corresponding to the associated version.

The concurrent execution control unit 420 is an example of an individual execution unit and executes queries on the database based on the query execution plan. It controls the concurrent execution of multiple transactions on one version of the database, for example by MVCC (MultiVersion Concurrency Control). A concurrent execution control unit 420 is generated for each version (environment) of the database.

The data conversion unit 430 executes processing (hereinafter, anonymization processing) that converts data containing confidential attribute values (for example, name, telephone number, address, credit card number) in the database so that those attribute values cannot be identified. For example, the data conversion unit 430 manages, for each database version, whether to execute anonymization processing and, if so, which anonymization method to use. The data conversion unit 430 then converts (anonymizes) the data containing confidential attribute values in the database of the target version according to the anonymization method set for that version.

The resource management unit 440 manages resource constraints (for example, CPU usage rate and memory bandwidth) for each database version. A resource constraint represents the amount of the resources (CPU, memory, etc.) of the database server 200 that may be used. The resource management unit 440 controls the amount of resources allocated to each database version (that is, to the concurrent execution control unit 420 corresponding to the version) according to the resource constraint of that version.
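The text does not say how a constraint is enforced. A minimal sketch, assuming each constraint is expressed as a maximum fraction of the server's CPU cores per BR_ID and that the resource management unit simply caps each version's request at that share (`allocate_cpu`, `cpu_share`, and `total_cores` are hypothetical names):

```python
def allocate_cpu(requests: dict[int, float],
                 cpu_share: dict[int, float],
                 total_cores: float) -> dict[int, float]:
    """Cap each version's requested CPU cores at its constrained share.

    requests:    BR_ID -> cores requested by that version's execution unit
    cpu_share:   BR_ID -> maximum fraction (0.0-1.0) of the server's cores
                 (an assumed form of the per-version resource constraint)
    total_cores: cores available on the database server
    """
    grants = {}
    for br_id, requested in requests.items():
        # Versions without a constraint may use the whole server (assumption).
        cap = cpu_share.get(br_id, 1.0) * total_cores
        grants[br_id] = min(requested, cap)
    return grants


# Production (version #0) may use 75% of 16 cores, development version #1 only 25%.
print(allocate_cpu({0: 10.0, 1: 6.0}, {0: 0.75, 1: 0.25}, 16.0))  # {0: 10.0, 1: 4.0}
```

A real implementation might also limit memory bandwidth or rely on operating-system mechanisms; the sketch only illustrates per-version capping.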

The data comparison unit 450 compares the processing results of a plurality of programs, each using the database of a different version, and outputs the comparison result (for example, to the client computer 100).
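The text does not define a result format. A minimal sketch, assuming each result is a mapping from row ID to value, of the kind of difference report the data comparison unit 450 could output (`compare_results` is a hypothetical name). The sample inputs are the contents that tables #0 and #2 reach at TIME “6” in the FIG. 3 example:

```python
def compare_results(result_a: dict, result_b: dict) -> dict:
    """Compare two results keyed by row ID.

    Returns the rows only in result_a, the rows only in result_b, and the
    rows present in both whose values differ, as (value_a, value_b) pairs.
    """
    only_a = {k: v for k, v in result_a.items() if k not in result_b}
    only_b = {k: v for k, v in result_b.items() if k not in result_a}
    changed = {k: (result_a[k], result_b[k])
               for k in sorted(result_a.keys() & result_b.keys())
               if result_a[k] != result_b[k]}
    return {"only_a": only_a, "only_b": only_b, "changed": changed}


table0 = {0: "A", 1: "B", 2: "F", 3: "E"}  # production environment (version #0)
table2 = {0: "A", 1: "B", 2: "C", 3: "F"}  # second development environment (version #2)
print(compare_results(table0, table2))
# {'only_a': {}, 'only_b': {}, 'changed': {2: ('F', 'C'), 3: ('E', 'F')}}
```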

The query saving unit 460 stores queries so that they can be identified by database version.

For any version of the database, the fast-forward unit 470 executes one or more of the queries stored in the query saving unit 460 that correspond to that version. This can be expected to reproduce the load (for example, the CPU load) caused by executing those queries on the database of that version.
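A minimal sketch of how the query saving unit 460 and the fast-forward unit 470 could fit together, assuming queries are kept as SQL strings keyed by BR_ID and that `execute` is a hypothetical callback that runs one query on the target version's concurrent execution control unit:

```python
from collections import defaultdict
from typing import Callable


class QueryLog:
    """Stores queries per database version (BR_ID) in arrival order."""

    def __init__(self) -> None:
        self._by_version = defaultdict(list)  # BR_ID -> list of SQL strings

    def save(self, br_id: int, sql: str) -> None:
        self._by_version[br_id].append(sql)

    def fast_forward(self, br_id: int, execute: Callable[[str], object]) -> int:
        """Replay every query saved for br_id through `execute`, oldest first.

        Returns the number of replayed queries.
        """
        queries = self._by_version.get(br_id, [])
        for sql in queries:
            execute(sql)
        return len(queries)


log = QueryLog()
log.save(0, "SELECT * FROM t WHERE id = 1")
log.save(1, "SELECT 1")
log.save(0, "UPDATE t SET v = 'x' WHERE id = 2")
replayed = []
print(log.fast_forward(0, replayed.append))  # 2
```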

The speculative execution unit 480 causes the simultaneous execution control unit 420 to execute a query using a plurality of query execution plans generated by the query processing unit 410.

The version management unit 490 manages a plurality of versions of the database. The version management unit 490 includes a data access unit 491, an area collection unit 492, a data saving unit 493, a starting point setting unit 494, a version creation unit 495, and a branch identification unit 496. Each process of the function units 491 to 496 will be described later.

FIG. 3 shows the overall update history information.

The overall update history information 221 is managed by the version management unit 490 and represents the update history of the database tables for all versions. Specifically, it has one entry per update transaction, and each entry stores a row ID 311, data (attribute value) 312, a TR_ID 313, a BR_ID 314, and a TIME 315.

The row ID 311 is the ID of the added (updated) row of a database table. The data 312 is the post-update data. The TR_ID 313 is the ID of the update transaction. The BR_ID 314 is the ID of the version of the database containing the row added by the update transaction. The TIME 315 indicates the order (or time) of row updates; any value from which the update (addition) order can be determined may be used.
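The entry layout can be modeled as a small record. The sketch below assumes integer IDs and a logical TIME counter, and also lists the seven entries of FIG. 3 as they can be reconstructed from the TIME “0” to “6” walkthrough given below (the TR_IDs #0 to #2 of the first three rows are inferred; `HistoryEntry` and `OVERALL_HISTORY` are illustrative names):

```python
from typing import NamedTuple


class HistoryEntry(NamedTuple):
    """One entry of the overall update history information 221."""
    row_id: int  # row ID 311: the added (updated) row
    data: str    # data 312: post-update attribute value
    tr_id: int   # TR_ID 313: the update transaction
    br_id: int   # BR_ID 314: version whose table received the row
    time: int    # TIME 315: update order (a logical counter here)


# FIG. 3 at TIME "6", as reconstructed from the walkthrough.
OVERALL_HISTORY = [
    HistoryEntry(row_id=0, data="A", tr_id=0, br_id=0, time=0),
    HistoryEntry(row_id=1, data="B", tr_id=1, br_id=0, time=1),
    HistoryEntry(row_id=2, data="C", tr_id=2, br_id=0, time=2),
    HistoryEntry(row_id=3, data="D", tr_id=3, br_id=1, time=3),
    HistoryEntry(row_id=3, data="E", tr_id=4, br_id=0, time=4),
    HistoryEntry(row_id=3, data="F", tr_id=5, br_id=2, time=5),
    HistoryEntry(row_id=2, data="F", tr_id=6, br_id=0, time=6),
]
```

Because updates only ever append entries, the same row ID (here row #3) can appear several times, once per version that wrote it.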

Hereinafter, BR_ID “n” (n is an integer of 0 or more) may be expressed as “version #n”. A database of version #n or a table thereof may be referred to as “database #n” or “table #n”. In addition, a row having a row ID “m” (m is an integer of 0 or more) may be referred to as “row #m”. A transaction with TR_ID “p” (p is an integer of 0 or more) may be referred to as “transaction #p”.

FIGS. 4A to 4C show the individual update history information 600 at TIME “6”, corresponding to the overall update history information 221 of FIG. 3. The individual update history information 600 represents the update history of the database (table) updated by a concurrent execution control unit 420 and is managed by that unit. It has an entry for each transaction, and each entry holds a row ID 601, data (attribute value) 602, and a TR_ID 603. It does not include a BR_ID, because the concurrent execution control unit 420 is not aware of BR_IDs. Hereinafter, the concurrent execution control unit 420 corresponding to version #m may be referred to as “concurrent execution control unit #m”, and its individual update history information 600 as “individual update history information #m”. Thus, FIG. 4A shows individual update history information #0, FIG. 4B shows individual update history information #1, and FIG. 4C shows individual update history information #2. Concurrent execution control unit #m executes transactions on database #m (table #m).
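Read together with FIG. 3, each individual update history #m holds the same entries as the overall history restricted to BR_ID m, with the BR_ID column dropped. A minimal sketch of that relationship, assuming the entry layout and example history reconstructed from the walkthrough below (`individual_history` is a hypothetical name):

```python
from typing import NamedTuple


class HistoryEntry(NamedTuple):
    row_id: int
    data: str
    tr_id: int
    br_id: int
    time: int


def individual_history(overall, br_id):
    """Individual update history #br_id: the version's own entries in TIME
    order, without the BR_ID column (row ID 601, data 602, TR_ID 603)."""
    return [(e.row_id, e.data, e.tr_id)
            for e in sorted(overall, key=lambda e: e.time)
            if e.br_id == br_id]


OVERALL = [HistoryEntry(*t) for t in [
    (0, "A", 0, 0, 0), (1, "B", 1, 0, 1), (2, "C", 2, 0, 2), (3, "D", 3, 1, 3),
    (3, "E", 4, 0, 4), (3, "F", 5, 2, 5), (2, "F", 6, 0, 6),
]]

print(individual_history(OVERALL, 1))  # [(3, 'D', 3)]
print(individual_history(OVERALL, 2))  # [(3, 'F', 5)]
```

Version #1 and version #2 each record only their own write; the rows they share with version #0 are not duplicated into their individual histories.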

According to the history information of FIG. 3 and FIGS. 4A to 4C, the following can be said, for example.

At TIME “0” to “2”, rows #0 to #2 (data “A” to “C”) are added to table #0 by transactions #0 to #2 of concurrent execution control unit #0, respectively. These additions are recorded in individual update history information #0 by concurrent execution control unit #0 and in the overall update history information 221 by the version management unit 490. At this point, no version other than version #0 exists.

Version #1 is added after TIME “2”, and concurrent execution control unit #1 is added (generated) accordingly. The starting point of an added version is the database of a predetermined version (here, version #0, for example the production environment) at the time the version is added. Therefore, until TIME “3”, table #1 as seen from concurrent execution control unit #1 is identical to table #0 as seen from concurrent execution control unit #0: both hold data “A” to “C” in rows #0 to #2, respectively. In general, when version #k (k is an integer of 1 or more) is added based on version #0, concurrent execution control unit #k is added, and table #k becomes visible to it. Table #k at the time the version is added is not a copy of table #0 at that time but a snapshot of it, so table #0 (the database) does not need to be copied.

At TIME “3”, row #3 (data “D”) is added to table #1 by transaction #3 of concurrent execution control unit #1. The addition is recorded in individual update history information #1 by concurrent execution control unit #1 and in the overall update history information 221 by the version management unit 490. In this way, table #1 can be updated, for example by adding data. Such updates are not applied to the database itself but are recorded in history information (log information) such as individual update history information #1 and the overall update history information 221, which makes it possible to effectively update the snapshot of table #0. Updating both individual update history information #1 and the overall update history information 221 is only one example. Alternatively, the content of each piece of individual update history information may be held only in the overall update history information 221, with no separate individual update history information, and each concurrent execution control unit 420 may use the BR_ID to determine from the overall update history information 221 how the table appears in its own environment. That is, the history information may consist of both the individual and the overall update history information, or of only one of them (for example, the overall update history information).

At TIME “4”, row #3 (data “E”) is added to table #0 by transaction #4 of concurrent execution control unit #0. The addition is recorded in individual update history information #0 by concurrent execution control unit #0 and in the overall update history information 221 by the version management unit 490. Data “E” in row #3 is therefore visible in version #0 (for example, the production environment) but not in any other version (here, version #1, for example the first development environment). That is, the addition is not recorded in any individual update history information other than #0, so data “E” does not exist in row #3 of table #1.

Version #2 is added after TIME “4”. Its starting point is the database of version #0 (for example, the production environment) at the time the version is added. Therefore, as of TIME “4”, table #2 as seen from concurrent execution control unit #2 is identical to table #0 as seen from concurrent execution control unit #0: both hold data “A” to “C” and “E” in rows #0 to #3, respectively.

At TIME “5”, row #3 (data “F”) is added to table #2 by transaction #5 of concurrent execution control unit #2; that is, the data in row #3 of table #2 is updated from “E” to “F”. The addition is recorded in individual update history information #2 by concurrent execution control unit #2 and in the overall update history information 221 by the version management unit 490. Data “F” in row #3 is visible in version #2 (for example, the second development environment) but not in the other versions (version #0, for example the production environment, and version #1, for example the first development environment). That is, the addition is not recorded in any individual update history information other than #2, so data “F” does not exist in row #3 of table #0 or in row #3 of table #1.

At TIME “6”, row #2 (data “F”) is added to table #0 by transaction #6 of concurrent execution control unit #0; that is, the data in row #2 of table #0 is updated from “C” to “F”. The addition is recorded in individual update history information #0 by concurrent execution control unit #0 and in the overall update history information 221 by the version management unit 490. Data “F” in row #2 is visible in version #0 but not in the other versions (#1 and #2). That is, the addition is not recorded in any individual update history information other than #0, so data “F” does not exist in row #2 of table #1 or in row #2 of table #2.
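The walkthrough can be checked mechanically. In the hypothetical replay below (an illustration, not the patent's read path), each version is described by its reference BR_ID and the TIME of its starting point. A version sees its own entries plus whatever its reference version could see up to that TIME, and the last write to each row wins. Replaying the reconstructed FIG. 3 history reproduces tables #0 to #2 at TIME “6”:

```python
from typing import NamedTuple


class HistoryEntry(NamedTuple):
    row_id: int
    data: str
    tr_id: int
    br_id: int
    time: int


# BR_ID -> (reference BR_ID, TIME of the starting point); version #0 has none.
VERSIONS = {0: (None, None), 1: (0, 2), 2: (0, 4)}

HISTORY = [HistoryEntry(*t) for t in [
    (0, "A", 0, 0, 0), (1, "B", 1, 0, 1), (2, "C", 2, 0, 2), (3, "D", 3, 1, 3),
    (3, "E", 4, 0, 4), (3, "F", 5, 2, 5), (2, "F", 6, 0, 6),
]]


def visible_entries(history, versions, br_id, upto=None):
    """Own entries (up to `upto`) plus the reference version's entries up to
    the starting-point TIME, applied recursively along the reference chain."""
    reference, start = versions[br_id]
    own = [e for e in history
           if e.br_id == br_id and (upto is None or e.time <= upto)]
    if reference is None:
        return own
    bound = start if upto is None else min(upto, start)
    return visible_entries(history, versions, reference, bound) + own


def materialize(history, versions, br_id):
    """Replay the visible entries in TIME order; the last write to a row wins."""
    table = {}
    for e in sorted(visible_entries(history, versions, br_id), key=lambda e: e.time):
        table[e.row_id] = e.data
    return table


print(materialize(HISTORY, VERSIONS, 0))  # {0: 'A', 1: 'B', 2: 'F', 3: 'E'}
print(materialize(HISTORY, VERSIONS, 1))  # {0: 'A', 1: 'B', 2: 'C', 3: 'D'}
print(materialize(HISTORY, VERSIONS, 2))  # {0: 'A', 1: 'B', 2: 'C', 3: 'F'}
```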

As described above, the version management unit 490 manages the update history of the database (table) for all versions, while concurrent execution control unit #m manages the update history only for the database (table) of version #m. Adding versions increases the number of development environments, and the database (table) can be updated in each development environment. Adding version #k (k is an integer of 1 or more) generates concurrent execution control unit #k and thereby creates table #k, which only concurrent execution control unit #k can see, without copying the reference table #0. Furthermore, concurrent execution control unit #k (for example, the k-th development environment) runs on the same database server 200 that runs concurrent execution control unit #0 (for example, the production environment). Consequently, a development environment (test environment) equivalent to the production environment can be built without introducing servers and storage equivalent to those of the production environment, that is, quickly and at low cost.

Also, the concurrent execution control unit 420 and the version management unit 490 are included in the database engine 400. This eliminates the need for a so-called proxy server or storage, and makes it easy to handle in-memory databases.

In addition to being added, version #k can also be deleted (for example, when a development environment is no longer needed). In this case, concurrent execution control unit #k corresponding to the deleted version #k is deleted as well. Then, if the database (table) contains data referenced only by concurrent execution control unit #k (referred to in this paragraph as data “K”), the area storing data “K” is collected by the garbage collection process (the area may be managed as a free area).

The concept of this embodiment is not merely a substitute for, or adaptation of, general version control techniques. Version control techniques cannot simply be applied to database management, because a database normally manages data in units of rows, whereas version control copies files and manages versions in units of files. Databases and files differ: a database holds structured data, a file holds unstructured data, and one file can contain multiple types of information. Consequently, adapting ordinary version control to database management would require copying the database to manage its versions, whereas in this embodiment, as described above, the database (table) does not need to be copied. Although this embodiment uses a row-oriented database, a column-oriented database (which manages data in units of columns) may be used instead.

Likewise, the concept of this embodiment is not merely a substitute for, or adaptation of, virtual machine technology. In virtual machine technology, data referenced by one virtual machine is generally not referenced by another; because data is independent for each virtual machine, data that a virtual machine no longer needs can be deleted immediately. In this embodiment, by contrast, data in the database may be shared by a plurality of concurrent execution control units 420, and data visible to at least one concurrent execution control unit 420 cannot be deleted.

FIG. 5 is a flowchart of the snapshot creation process.

The version creation unit 495 receives, for example from the user via the client computer 100, a specification of the point in time (TIME) from which a new version of the database is to be added (S501). The version creation unit 495 refers to the overall update history information 221 and acquires the transaction ID (TR_ID) corresponding to the received TIME (S502). The state of the database at the end of the transaction with that TR_ID becomes the starting point of the new version. Although the TR_ID is derived from the TIME in this process, the TR_ID may instead be received directly.

Next, the starting point setting unit 494 identifies the BR_ID of the version to which the transaction with that TR_ID belongs and uses it as the reference destination ID, which indicates the version serving as the reference destination (starting point) (S503). The starting point setting unit 494 then generates a new BR_ID for the new version and generates a concurrent execution control unit 420 in which the reference destination ID is associated with the new BR_ID (S504).
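Steps S501 to S504 can be sketched as follows, assuming that versions are kept in a dictionary mapping each BR_ID to (reference destination ID, starting-point TIME) and that a new BR_ID is simply the largest existing one plus one (`create_version` and this registry layout are hypothetical). No table data is copied; only the association is recorded:

```python
from typing import NamedTuple


class HistoryEntry(NamedTuple):
    row_id: int
    data: str
    tr_id: int
    br_id: int
    time: int


def create_version(history, versions, start_time):
    """Register a new version whose starting point is the state at start_time."""
    # S501-S502: find the entry written at the requested TIME; its TR_ID
    # marks the end of the transaction that defines the starting point.
    starting = next((e for e in history if e.time == start_time), None)
    if starting is None:
        raise ValueError(f"no transaction at TIME {start_time}")
    # S503: the BR_ID of that transaction becomes the reference destination ID.
    reference_br_id = starting.br_id
    # S504: allocate a new BR_ID and associate it with the reference destination.
    new_br_id = max(versions) + 1
    versions[new_br_id] = (reference_br_id, start_time)
    return new_br_id


HISTORY = [HistoryEntry(*t) for t in [
    (0, "A", 0, 0, 0), (1, "B", 1, 0, 1), (2, "C", 2, 0, 2), (3, "D", 3, 1, 3),
    (3, "E", 4, 0, 4),
]]
versions = {0: (None, None)}
create_version(HISTORY, versions, 2)  # version #1, from version #0 at TIME "2"
create_version(HISTORY, versions, 4)  # version #2, from version #0 at TIME "4"
create_version(HISTORY, versions, 3)  # hypothetical version #3, from version #1
print(versions)  # {0: (None, None), 1: (0, 2), 2: (0, 4), 3: (1, 3)}
```

The last call illustrates that, under this sketch, a version can also branch from a development version, because the reference destination is taken from whichever version wrote the transaction at the chosen TIME.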

This snapshot creation process adds a new version of the database whose starting point is the state of the database at a given point in time. Because the state of the starting database does not need to be copied, the new version can be created quickly.

FIG. 6 is a flowchart of the data writing process.

The data access unit 491 acquires the row ID of the write target row and the write target data from the simultaneous execution control unit 420 (S601). Next, the data access unit 491 acquires the TR_ID assigned to the transaction to be written from the simultaneous execution control unit 420 (S602).

Next, the data access unit 491 acquires the BR_ID indicating the version of the database managed by the simultaneous execution control unit 420 from the simultaneous execution control unit 420 (S603). Next, the data access unit 491 acquires the current TIME (for example, the current time) (S604). Next, the data access unit 491 stores an entry including the acquired row ID, TR_ID, BR_ID, TIME, and data in the overall update history information 221 (S605).
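The write path in S601 to S605 amounts to appending one entry per write. A minimal sketch, assuming a logical counter stands in for TIME (the text only requires a value that gives the update order) and that the concurrent execution control unit passes row ID, data, TR_ID, and BR_ID as arguments (`DataAccessUnit.write` is a hypothetical interface):

```python
import itertools
from typing import NamedTuple


class HistoryEntry(NamedTuple):
    row_id: int
    data: str
    tr_id: int
    br_id: int
    time: int


class DataAccessUnit:
    """Write path: every write becomes an appended history entry."""

    def __init__(self) -> None:
        self.overall_history = []
        self._clock = itertools.count()  # logical TIME: only the order matters

    def write(self, row_id: int, data: str, tr_id: int, br_id: int) -> HistoryEntry:
        # S601-S603: row ID, data, TR_ID and BR_ID come from the concurrent
        # execution control unit that issued the write.
        time = next(self._clock)                                    # S604
        entry = HistoryEntry(row_id, data, tr_id, br_id, time)
        self.overall_history.append(entry)  # S605: append, never overwrite
        return entry


dau = DataAccessUnit()
for row_id, data, tr_id, br_id in [(0, "A", 0, 0), (1, "B", 1, 0), (2, "C", 2, 0), (3, "D", 3, 1)]:
    dau.write(row_id, data, tr_id, br_id)
print(dau.overall_history[-1])  # HistoryEntry(row_id=3, data='D', tr_id=3, br_id=1, time=3)
```

Because existing entries are never modified, a write in one version cannot change what another version reads; visibility is decided entirely at read time.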

FIG. 7 is a flowchart of the data reading process.

The data access unit 491 acquires the row ID of the row to be read from the simultaneous execution control unit 420 (S701). Next, the data access unit 491 acquires BR_ID indicating the version of the database managed by the simultaneous execution control unit 420 (S702), and passes the acquired row ID and BR_ID to the branch identification unit 496.

Next, the branch identification unit 496 determines whether data corresponding to the acquired row ID and BR_ID exists in the overall update history information 221 (S703).

If the determination result in S703 is affirmative (S703: YES), the data access unit 491 reads the data from the overall update history information 221 and passes it to the simultaneous execution control unit 420 (S704). The data is output from the simultaneous execution control unit 420.

If the determination in S703 is negative (S703: NO), the branch identification unit 496 determines whether the concurrent execution control unit 420 (individual update history information 600) holds a reference destination ID for the version database indicated by the BR_ID (S705).

If the determination in S705 is affirmative (S705: YES), the branch identification unit 496 sets the reference destination ID as the BR_ID (S706) and returns to S703. The processing from S703 onward is then repeated to search for the data in the version database indicated by the new BR_ID (the reference destination ID). In other words, the data is searched for by tracing the reference-destination version databases one after another.

If the determination result in S705 is negative (S705: NO), it means that there is no row to be read, so the data access unit 491 notifies the concurrent execution control unit 420 of a row ID error (S707).
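A minimal sketch of the reading flow, assuming the version registry from the snapshot creation step (BR_ID mapped to reference destination ID and starting-point TIME). As an added assumption not spelled out in the flowchart, each step back along the chain only considers entries written up to the starting-point TIME; without that bound, version #1 would see data “F” written to row #2 of table #0 at TIME “6”, contradicting the example of FIG. 3 (`read_row` and `RowIdError` are hypothetical names):

```python
from typing import NamedTuple, Optional


class HistoryEntry(NamedTuple):
    row_id: int
    data: str
    tr_id: int
    br_id: int
    time: int


class RowIdError(KeyError):
    """No version along the reference chain has the row (S707)."""


def read_row(history, versions, row_id, br_id):
    upto: Optional[int] = None  # TIME bound once we step into a reference version
    while True:
        # S703: does this BR_ID have an entry for the row (within the bound)?
        hits = [e for e in history
                if e.row_id == row_id and e.br_id == br_id
                and (upto is None or e.time <= upto)]
        if hits:
            return max(hits, key=lambda e: e.time).data  # S704: newest visible entry
        reference, start = versions[br_id]
        if reference is None:          # S705: NO
            raise RowIdError(row_id)   # S707
        # S706: continue in the reference destination, bounded by the starting point.
        upto = start if upto is None else min(upto, start)
        br_id = reference


VERSIONS = {0: (None, None), 1: (0, 2), 2: (0, 4)}
HISTORY = [HistoryEntry(*t) for t in [
    (0, "A", 0, 0, 0), (1, "B", 1, 0, 1), (2, "C", 2, 0, 2), (3, "D", 3, 1, 3),
    (3, "E", 4, 0, 4), (3, "F", 5, 2, 5), (2, "F", 6, 0, 6),
]]
print(read_row(HISTORY, VERSIONS, 2, 0))  # F
print(read_row(HISTORY, VERSIONS, 2, 1))  # C
```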

FIG. 8 is a flowchart of the garbage collection process.

The garbage collection process identifies data that is not referenced by any concurrent execution control unit 420 and collects the area in which that data is stored. The collected area is managed as a free area in which new data can be stored. The area to be collected may be at least one of the area storing the entries of the overall update history information 221 that correspond to rows/data not referenced by any concurrent execution control unit 420, and the database area storing those rows/data. The garbage collection process may be executed at any time, and may be executed repeatedly.

The area collection unit 492 determines whether the overall update history information 221 contains a row (entry) that has not yet been processed (S801).

If the determination result in S801 is affirmative (S801: YES), the area collection unit 492 acquires the row ID, TR_ID, and BR_ID of an unprocessed row as a processing target (S802).

Next, the area collection unit 492 determines whether there is a row that has the same row ID as the processing target, has a BR_ID that is either the same as the processing target's BR_ID or the reference ID associated with the processing target's BR_ID, and has a TIME earlier than the processing target's TIME (S803). If the determination is affirmative (S803: YES), the area collection unit 492 sets a collectable mark on that earlier row (S804) and returns to S801. The collectable mark may be set, for example, in a dedicated field provided in each row of the overall update history information 221.

If the determination result in S803 is negative (S803: NO), the area collection unit 492 advances the process to S801.

If the determination in S801 is negative (S801: NO), the area collection unit 492 determines whether the overall update history information 221 contains a row for which S806 and S807 have not yet been performed (S805). If so (S805: YES), the area collection unit 492 determines whether a collectable mark is set on that unprocessed row (S806).

If the determination result in S806 is affirmative (S806: YES), the area collection unit 492 collects the area in which the data of that row is stored (S807) and returns the process to S805. Through this process, duplicate rows within the same version, and rows of the reference version that are duplicated by this version, are deleted from the memory 220 (their areas are collected and become free areas), so the memory 220 can be used effectively. The data saving unit 493 may temporarily save the deleted rows in a predetermined storage area.

If the determination result in S806 is negative (S806: NO), the area collection unit 492 advances the process to S805.

If the determination result in S805 is negative (S805: NO), collectability has been determined for all rows, and the area collection unit 492 ends the garbage collection process.
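A minimal Python sketch of the garbage collection idea follows. It is not a literal transcription of S801-S807: it assumes a hypothetical history layout (`Entry` with `row_id`, `br_id`, `time`, `data`) and a hypothetical `Version` record holding the reference source BR_ID and the TIME at which the snapshot was taken, and it implements the principle stated at the start of this process, namely that an entry is collectable when no version can still see it. The `collectable` flag stands in for the collectable mark, and the mark and sweep passes loosely mirror S801-S804 and S805-S807.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """One entry of the overall update history information (hypothetical layout)."""
    row_id: int
    br_id: int
    time: int
    data: str
    collectable: bool = False  # stands in for the collectable mark


@dataclass
class Version:
    """Hypothetical version record."""
    parent: Optional[int]  # reference source BR_ID (None for version #0)
    branch_time: int       # TIME at which the snapshot of the parent was taken


def visible_entry(history, versions, br_id, row_id, cutoff=math.inf):
    """Return the entry that version br_id sees for row_id, or None."""
    own = [e for e in history
           if e.br_id == br_id and e.row_id == row_id and e.time <= cutoff]
    if own:
        return max(own, key=lambda e: e.time)
    parent = versions[br_id].parent
    if parent is None:
        return None
    # Row not updated in this version: read the parent as of the snapshot TIME.
    return visible_entry(history, versions, parent, row_id,
                         min(cutoff, versions[br_id].branch_time))


def collect_garbage(history, versions):
    """Mark entries that no version can see, then sweep them out."""
    row_ids = {e.row_id for e in history}
    referenced = set()
    for br_id in versions:
        for row_id in row_ids:
            entry = visible_entry(history, versions, br_id, row_id)
            if entry is not None:
                referenced.add(id(entry))
    for entry in history:  # mark pass (cf. S801-S804)
        entry.collectable = id(entry) not in referenced
    return [e for e in history if not e.collectable]  # sweep pass (cf. S805-S807)
```

Resolving visibility per version, rather than only comparing TIME values, keeps an older entry alive while a snapshot taken by another version still reads it. For example, if version #1 was branched from version #0 at TIME 3, version #0's entry at TIME 1 survives even after version #0 has written newer entries for the same row.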

Area collection by the area collection unit 492 is not limited to the above processing. For example, when use of a certain version of the database ends, the rows corresponding to the BR_ID of that version may be deleted from the overall update history information 221. However, if a database of another version has been created from the state of that version's database at a given point in time, the rows corresponding to that state at that point in time are not deleted.
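This alternative policy can be sketched the same way. The Python below is illustrative only, and `Entry`, `retire_version`, and the `parents` and `branch_times` maps are hypothetical names: when a version is retired, its entries are removed except the newest entry per row at or before each direct child version's snapshot TIME, since a child that never overwrote that row still reads it.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """History entry (hypothetical layout)."""
    row_id: int
    br_id: int
    time: int
    data: str


def retire_version(history, parents, branch_times, retired):
    """Remove the entries of a retired version except those pinned by snapshots.

    parents:      BR_ID -> reference source BR_ID (None for version #0)
    branch_times: BR_ID -> TIME at which that version's snapshot was taken
    """
    pinned = set()
    for child, parent in parents.items():
        if parent != retired:
            continue  # only direct children read the retired version's snapshot
        cutoff = branch_times[child]
        latest = {}  # row ID -> index of the newest entry at or before the cutoff
        for i, e in enumerate(history):
            if e.br_id == retired and e.time <= cutoff:
                if e.row_id not in latest or e.time > history[latest[e.row_id]].time:
                    latest[e.row_id] = i
        pinned.update(latest.values())
    # Entries of other versions are untouched; pinned entries of the retired one stay.
    return [e for i, e in enumerate(history) if e.br_id != retired or i in pinned]
```

The child keeps its reference source BR_ID, so it can still resolve the pinned rows through the retired version.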

FIG. 9 is a flowchart of the anonymization process.

Anonymization processing is processing that makes it impossible to specify an attribute value of data by converting data including confidential attribute values (for example, name, telephone number, address, credit card number). The anonymization process is performed on the data read out in S704 of FIG. 7, for example.

The data conversion unit 430 acquires the row ID, column ID (attribute ID), BR_ID, and data of the row acquired in S704 (S901). Next, the data conversion unit 430 determines whether the data in this row has been updated since this version of the database was created (S902). This can be determined by checking whether the BR_ID of the row corresponding to the row ID is the BR_ID associated with the concurrent execution control unit 420 that requested the data: if it is, the row has been updated in this version.

If the determination result in S902 is affirmative (S902: YES), the data has already been anonymized, so the data conversion unit 430 returns the acquired data to the concurrent execution control unit 420 (S903).

If the determination result in S902 is negative (S902: NO), the data conversion unit 430 determines whether the anonymization method corresponding to BR_ID has been set in the data conversion unit 430 (S904). If the determination result in S904 is negative (S904: NO), there is no need to anonymize this data, and the data conversion unit 430 advances the process to S903.

On the other hand, if the determination result in S904 is affirmative (S904: YES), the data must be anonymized, so the data conversion unit 430 acquires the anonymization method for the column ID of each column in the data (S905). The data may include columns (attribute values) such as a name, credit card number, telephone number, and address, and the appropriate anonymization method depends on the attribute. Any known method may be used as the anonymization method for each attribute.

Next, the data conversion unit 430 anonymizes each column (attribute value) of the data using the acquired anonymization method (S906), and returns the anonymized data to the concurrent execution control unit 420 (S907).
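The read-time anonymization flow (S901-S907) might look like the following Python sketch. The masking and hashing functions, the `ANONYMIZATION_METHODS` table keyed by BR_ID and column ID, and `read_for_version` are all assumptions chosen for illustration; as noted above, any known anonymization method could be substituted. Version #0 has no entry in the table, so production reads pass through unchanged.

```python
import hashlib
from typing import Callable


def mask_all_but_last4(value: str) -> str:
    """Hypothetical method: keep only the last four characters."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


def pseudonymize(value: str) -> str:
    """Hypothetical method: unsalted truncated hash (pseudonymization, kept short)."""
    return "anon-" + hashlib.sha256(value.encode()).hexdigest()[:8]


# Hypothetical settings: BR_ID -> {column ID -> method}. Version #0 has none.
ANONYMIZATION_METHODS: dict[int, dict[str, Callable[[str], str]]] = {
    1: {"name": pseudonymize, "card_number": mask_all_but_last4},
}


def read_for_version(row: dict, row_br_id: int, requesting_br_id: int) -> dict:
    """Return row data for the requesting version, anonymizing it when required."""
    # S902: an entry written by the requesting version was derived from
    # already-anonymized data, so it is returned unchanged (S903).
    if row_br_id == requesting_br_id:
        return row
    # S904: no method registered for this BR_ID, so no anonymization is needed.
    methods = ANONYMIZATION_METHODS.get(requesting_br_id)
    if not methods:
        return row
    # S905-S906: convert each column that has a method for its column ID (S907 returns it).
    return {col: methods[col](val) if col in methods else val
            for col, val in row.items()}
```

Because conversion happens on the read path, the stored rows of version #0 are never modified; each development version sees its own converted view.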

With anonymization processing, anonymized data can be used as the data of a given version of the database while preventing leakage of confidential attribute values. In addition, because anonymization is performed each time data is read, the data does not need to be anonymized and stored in advance, and anonymized data can be provided promptly when needed. The data conversion unit 430 may also anonymize only data read from databases of versions other than version #0; for example, confidential attribute values may be converted only in the development environment, with no anonymization performed in the production environment.

FIG. 10 is a flowchart of the resource control process.

Resource control processing controls the amount of resources available to each version (environment). For example, after the database engine 400 receives a query, the amount of resources used to process the version of the database associated with the query is controlled at a predetermined point (e.g., before the data read process is performed).

The resource management unit 440 acquires, from the concurrent execution control unit 420, the BR_ID indicating the version of the database to be processed (S1001). Next, the resource management unit 440 determines whether a resource constraint condition is set for the database of the acquired BR_ID version (S1002). If the determination result in S1002 is negative (S1002: NO), the resource management unit 440 advances the process to S1006.

If the determination result in S1002 is affirmative (S1002: YES), the resource management unit 440 acquires the resource constraint condition corresponding to the BR_ID (S1003) and acquires the resource usage status of the processing of the BR_ID version database (S1004).

Next, the resource management unit 440 determines whether the acquired resource usage status satisfies the resource constraint condition (S1005). If the determination result in S1005 is affirmative (S1005: YES), the resource management unit 440 advances the process to S1006.

In S1006, the resource management unit 440 causes a process (for example, a data read process) to be performed on the acquired BR_ID version database.

On the other hand, if the determination result in S1005 is negative (S1005: NO), the resource management unit 440 makes the processing (for example, data read processing) on the acquired BR_ID version database wait (S1007), and then returns the process to S1004.
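The S1001-S1007 loop can be sketched in Python as follows. The constraint fields (CPU share and memory), the `get_usage` and `process` callbacks, and the polling interval are hypothetical; the description does not say which resources are constrained or how waiting is implemented. The `sleep` function is a parameter so that the waiting strategy can be swapped out.

```python
import time
from dataclasses import dataclass


@dataclass
class ResourceConstraint:
    """Hypothetical resource constraint condition for one version."""
    max_cpu_share: float  # fraction of CPU the version may use
    max_memory_mb: int


def run_with_resource_control(br_id, constraints, get_usage, process,
                              poll_interval=0.05, sleep=time.sleep):
    """Run `process` for version br_id only once its usage satisfies its constraint."""
    constraint = constraints.get(br_id)  # S1002 / S1003
    if constraint is not None:
        while True:
            cpu_share, memory_mb = get_usage(br_id)  # S1004
            if (cpu_share <= constraint.max_cpu_share
                    and memory_mb <= constraint.max_memory_mb):  # S1005
                break
            sleep(poll_interval)  # S1007: wait, then re-check usage
    return process(br_id)  # S1006
```

A version with no registered constraint (typically the production version #0) is processed immediately, which is how the development versions are throttled without slowing production.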

With the resource control process, processing for a given version of the database is controlled according to its resource constraint condition. This can, for example, reduce the adverse effect that processing on the development (test) environment database has on processing of the production environment database.

Although one embodiment has been described above, it is merely an illustration for explaining the present invention, and the scope of the present invention is not limited to this embodiment. The present invention can be implemented in various other forms.

For example, in the embodiment described above, the version creation unit 495 may automatically add a new version of the database in a consistent state at a predetermined timing (for example, monthly, weekly, or daily), without the new version being explicitly specified. The added database is kept without being updated, so it can be used to revert another version of the database that has since been altered to a past, consistent state.
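As a rough illustration of scheduled checkpoints, the Python sketch below keeps a hypothetical `VersionRegistry` mapping each BR_ID to its reference source and snapshot TIME. Adding a version only records metadata, because reads fall back to the reference source as of the snapshot TIME. The sketch assumes that each tick happens at a consistent point; a real engine would have to wait until no transaction is in flight.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionRegistry:
    """Hypothetical registry: BR_ID -> (reference source BR_ID, snapshot TIME)."""
    versions: dict = field(default_factory=lambda: {0: (None, 0)})
    last_checkpoint: int = 0

    def add_version(self, parent: int, now: int) -> int:
        """Add a version as a snapshot of `parent` at `now`; no data is copied."""
        br_id = max(self.versions) + 1
        self.versions[br_id] = (parent, now)
        return br_id

    def tick(self, now: int, interval: int) -> Optional[int]:
        """Add a checkpoint version of version #0 once `interval` has elapsed."""
        # Assumption: `now` is a consistent point (no transaction in flight).
        if now - self.last_checkpoint >= interval:
            self.last_checkpoint = now
            return self.add_version(0, now)
        return None
```

Reverting then amounts to branching a fresh version from the untouched checkpoint, e.g. `registry.add_version(checkpoint_id, now)`, and abandoning the altered version.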

Also, for example, after a new version is added, the query saving unit 460 may save the queries issued to the database of the reference version, and the fast-forward unit 470 may later execute the saved queries on the added new-version database, on which no queries have been executed. This can be expected to reproduce the CPU load and similar characteristics of the database processing.
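A minimal Python sketch of the query saving and fast-forward pair follows, assuming queries are opaque strings tagged with their issue TIME and that `execute` is a hypothetical callback that runs one query on the new-version database. Queries issued to the reference version from the branch point onward are recorded and later replayed in issue order.

```python
from dataclasses import dataclass, field


@dataclass
class QuerySaver:
    """Hypothetical query saving unit for the reference version."""
    since: int  # TIME of the branch point
    saved: list = field(default_factory=list)

    def record(self, issued_at: int, query: str) -> None:
        """Keep only queries issued at or after the branch point."""
        if issued_at >= self.since:
            self.saved.append((issued_at, query))


def fast_forward(saved, execute):
    """Replay saved queries, in issue order, on the untouched new-version database."""
    results = []
    for _, query in sorted(saved, key=lambda item: item[0]):
        results.append(execute(query))
    return results
```

Replaying in issue order matters because later queries may read rows written by earlier ones.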

Also, for example, after a new version is added, a first query issued by a first program may be executed using the database of a first version (e.g., reference version #0), and a second query issued by a second program (e.g., an upgraded version of the first program) may be executed using the database of a second version (e.g., the new version). Each of the first and second programs is an example of a query source. The first version database and the second version database may have the same contents. The data comparison unit 450 may compare the state (result) of the first version database used by the first program with the state (result) of the second version database used by the second program, and output the comparison result (for example, information on whether both programs produced the same state). This allows the operation of the program to be verified.
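The comparison step might be sketched as follows, assuming each version's state has been materialized as a hypothetical mapping from row ID to row data. The report lists rows present in only one version and rows whose data differ; `same` is true only when both programs left identical states.

```python
def compare_versions(state_a: dict, state_b: dict) -> dict:
    """Compare two version states (row ID -> row data) and report the differences."""
    only_a = sorted(state_a.keys() - state_b.keys())
    only_b = sorted(state_b.keys() - state_a.keys())
    # Rows present in both versions but holding different data.
    changed = sorted(k for k in state_a.keys() & state_b.keys()
                     if state_a[k] != state_b[k])
    return {
        "same": not (only_a or only_b or changed),
        "only_in_first": only_a,
        "only_in_second": only_b,
        "changed": changed,
    }
```

A report with `same` set to false points directly at the rows where the upgraded program diverged from the original one.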

Also, for example, the query processing unit 410 may generate a plurality of query execution plans for the same query. The speculative execution unit 480 may cause two or more concurrent execution control units 420, corresponding respectively to two or more versions, to execute two or more of these query execution plans, and may then use the database of the version whose query execution plan returned a result first as the database for subsequent processing. This can be expected to speed up query execution.
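Speculative execution maps naturally onto a thread pool that waits for the first completed task. The Python sketch below uses the standard `concurrent.futures` module; the `(BR_ID, plan)` pairs, where each plan is a zero-argument callable that runs one execution plan on its own version's database, are an assumed interface. `shutdown(wait=False, cancel_futures=True)` (Python 3.9+) lets the caller return without waiting for slower plans, although plans that have already started keep running in the background until they finish.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def speculative_execute(plans):
    """Run (BR_ID, plan) pairs concurrently; return (BR_ID, result) of the first to finish."""
    pool = ThreadPoolExecutor(max_workers=len(plans))
    try:
        futures = {pool.submit(plan): br_id for br_id, plan in plans}
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        winner = next(iter(done))
        return futures[winner], winner.result()  # re-raises if that plan failed
    finally:
        # Do not block on slower plans; plans not yet started are cancelled.
        pool.shutdown(wait=False, cancel_futures=True)
```

The returned BR_ID identifies the version whose database would be used for subsequent processing.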

Also, for example, after a new version is added, processing by a new program (e.g., a program with an added function) may be executed using the new version database. The data comparison unit 450 may merge the execution result into the reference version database, for example in units of tables, to the extent that consistency is not impaired. This can be expected to reflect the results of the new program's development environment appropriately in the production environment.
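The table-level merge could be sketched in Python as below. The consistency rule used here, that a table changed in the new version is merged only if the reference version has not updated it since the branch TIME, is an assumption; the description leaves the exact consistency criterion open. Tables that fail the check are returned as conflicts rather than merged.

```python
def merge_tables(reference, changed_in_new, ref_last_update, branch_time):
    """Merge tables changed in the new version into the reference version.

    reference:       table name -> rows of the reference version (updated in place)
    changed_in_new:  table name -> rows, for tables the new program updated
    ref_last_update: table name -> TIME of the last update in the reference version
    """
    merged, conflicts = [], []
    for table, rows in changed_in_new.items():
        # Hypothetical rule: safe only if the reference side is unchanged since branching.
        if ref_last_update.get(table, branch_time) <= branch_time:
            reference[table] = rows
            merged.append(table)
        else:
            conflicts.append(table)  # both sides changed: leave for review
    return merged, conflicts
```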

1: Computer system
100: Client computer
200: Database server
400: Database engine

Claims

A database system that executes transactions according to queries,
A first individual execution unit that executes a transaction for a first database that is a first version database;
A version management unit that, each time a version is added, generates an additional database that is a snapshot of the first database at the time the version is added, and an additional individual execution unit that executes transactions on the additional database;
wherein, each time a row of any database among all the databases corresponding to all the individual execution units, including at least the first individual execution unit, is updated, the version management unit adds to history information an entry that includes the ID of the updated row, the updated data stored in that row, and the ID of the version corresponding to the individual execution unit that performed the update;
Database system.
A data conversion unit that executes a conversion process that converts attribute values of a predetermined type of data in the database of at least one additional version so that they cannot be identified;
The database system according to claim 1.
The conversion process is performed on a predetermined type of attribute value in the read data when the data is read from at least one additional version database according to the query.
The database system according to claim 2.
A resource control unit that controls the amount of resources allocated to each of one or more individual execution units, including at least the first individual execution unit, based on a constraint condition for the resources allocated to each of those individual execution units;
The database system according to claim 1.
The version management unit refers to the history information, identifies unnecessary data that is not referenced by the individual execution unit of any version, and releases the area of the identified unnecessary data.
The database system according to claim 1.
The version management unit adds a new version at a predetermined time point set in advance.
The database system according to claim 1.
A query processing unit that generates a plurality of query execution plans based on the query;
A speculative execution unit that causes the plurality of generated query execution plans to be executed by a plurality of individual execution units, respectively, and uses the database that produced the fastest result for execution of subsequent queries;
The database system according to claim 1.
A data comparison unit that, after a plurality of queries issued from a plurality of query sources are executed using a plurality of databases respectively corresponding to a plurality of versions, compares the resulting states of the plurality of databases and outputs the comparison result;
The database system according to claim 1.
The version management unit merges into the first database those differences from the first database, in a database updated by the additional individual execution unit corresponding to any additional version, that do not impair consistency with the first database;
The database system according to claim 1.
A query saving unit that accumulates queries issued to the first version database after a predetermined point in time;
A fast-forward unit that executes the queries accumulated by the query saving unit on an additional version database that has not been updated since the predetermined point in time;
The database system according to claim 1.
Executing a first individual execution unit for executing a transaction for a first database which is a first version database;
Generating, each time a version is added, an additional database that is a snapshot of the first database at the time the version is added and an additional individual execution unit that executes transactions on the additional database; and
Adding to history information, each time a row of any database among all the databases corresponding to all the individual execution units, including at least the first individual execution unit, is updated, an entry that includes the ID of the updated row, the updated data stored in that row, and the ID of the version corresponding to the individual execution unit that performed the update;
Database management method.
A storage unit for storing history information;
A processor connected to the storage unit and functioning as a database engine;
The processor performs:
Executing a first individual execution unit for executing a transaction for a first database which is a first version database;
Generating, each time a version is added, an additional database that is a snapshot of the first database at the time the version is added and an additional individual execution unit that executes transactions on the additional database; and
Adding to history information, each time a row of any database among all the databases corresponding to all the individual execution units, including at least the first individual execution unit, is updated, an entry that includes the ID of the updated row, the updated data stored in that row, and the ID of the version corresponding to the individual execution unit that performed the update;
Computer.
The storage unit is a memory;
The memory stores the first database;
All the individual execution units are included in the database engine,
The computer according to claim 12.