WO2016117007A1 - Database system and database management method - Google Patents

Database system and database management method

Info

Publication number
WO2016117007A1
Authority
WO
WIPO (PCT)
Prior art keywords
database
version
unit
data
updated
Prior art date
Application number
PCT/JP2015/051249
Other languages
English (en)
Japanese (ja)
Inventor
上村 哲也
Original Assignee
株式会社日立製作所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立製作所
Priority to PCT/JP2015/051249
Publication of WO2016117007A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/28 Error detection; Error correction; Monitoring by checking the correct order of processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures

Definitions

  • the present invention generally relates to database management.
  • a method of acquiring a database snapshot in the database development environment and executing a transaction for the snapshot is conceivable.
  • Regarding database snapshots and transactions, the techniques of Patent Documents 1 and 2, for example, are known.
  • the database snapshot has only one generation and is read-only. When the transaction is confirmed, the snapshot is switched to a new generation (return). For this reason, it is not suitable to simply use a database snapshot in a development environment.
  • If the development environment is made weaker than the production environment (for example, if servers and storage less reliable than those of the production environment are introduced), testing in the development environment will be insufficient, and the reliability of the production environment will be reduced (for example, failures that cannot be found in the development environment may occur in the production environment).
  • The database system has a first individual execution unit that executes transactions for the database (first database) of the first version (for example, the production environment).
  • It also has a version management unit that generates an additional database, which is a snapshot of the first database at the time, and an additional individual execution unit that executes transactions for the additional database.
  • Each time a row of any database among all databases corresponding to all the individual execution units, including at least the first individual execution unit, is updated, the version management unit adds an entry to the history information. The entry is information including the ID of the updated row, the updated data stored in that row, and the version ID corresponding to the individual execution unit that performed the update.
  • the “storage unit” may be one or more storage devices including a memory.
  • Of a main storage device (typically a volatile memory) and an auxiliary storage device (typically a nonvolatile storage device), the storage unit may be at least the main storage device.
  • PDEV indicates a physical storage device, and may typically be a nonvolatile storage device (for example, an auxiliary storage device).
  • the PDEV may be, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive).
  • Processing may be described with a functional unit (e.g., the version management unit or the simultaneous execution control unit) as the subject. A functional unit performs its predetermined processing when a processor (e.g., a CPU (Central Processing Unit)) executes a program, using the storage unit (for example, memory) and/or the interface device (for example, communication port) as appropriate, so the subject of the processing may be the processor.
  • the processing described with the functional unit as the subject may be processing performed by a processor or an apparatus or system having the processor.
  • the processor may include a hardware circuit that performs a part or all of the processing. At least a part of the plurality of functional units may be realized by a hardware circuit.
  • the program may be installed in a computer-like device from a program source.
  • The program source may be, for example, a program distribution server or a computer-readable storage medium.
  • the program distribution server may include a processor (for example, a CPU) and a storage unit, and the storage unit may further store a distribution program and a program to be distributed. Then, the processor of the program distribution server executes the distribution program, so that the processor of the program distribution server may distribute the distribution target program to other computers.
  • two or more functional units may be realized as one functional unit, or one functional unit may be realized as two or more functional units.
  • FIG. 1 is a configuration diagram of a computer system according to the embodiment.
  • the computer system 1 includes a plurality (or one) of client computers 100, a database server (an example of a database system) 200, and an external storage device 300.
  • the client computer 100 and the database server 200 are connected via a communication network (for example, a LAN (Local Area Network)) 10.
  • the client computer 100 executes processing using database data. For example, the client computer 100 issues a database query to the database server 200 in the process.
  • the external storage apparatus 300 has one or more PDEVs and can store all or part of the database.
  • the external storage apparatus 300 may not be provided. That is, the entire database may be stored in the memory 220 described later, and the database server 200 may be an in-memory database.
  • the database server 200 includes a CPU 210, a memory 220, a NIC (Network Interface Card) 230, and an HBA (Host Bus Adapter) 240.
  • the NIC 230 is an interface device that communicates with the client computer 100 via the communication network 10.
  • the HBA 240 is an interface device that communicates with the external storage apparatus 300.
  • the memory 220 stores a program for realizing a functional unit (a program for executing processing) and management information referred to by at least one functional unit.
  • the memory 220 may store at least a part of the database.
  • the memory 220 stores a database engine program and management information.
  • the CPU 210 functions as a database engine 400 as shown in FIG. 2 by executing the database engine program.
  • the management information includes, for example, overall update history information and individual update history information, which will be described later.
  • FIG. 2 is a configuration diagram of the database engine 400.
  • The database engine 400 is an example of a database management system (DBMS), and includes a query processing unit 410, a simultaneous execution control unit 420, a data conversion unit 430, a resource management unit 440, a data comparison unit 450, a query saving unit 460, a fast-forward unit 470, a speculative execution unit 480, and a version management unit 490.
  • the query processing unit 410 generates a query execution plan for executing the query received from the client computer 100.
  • the query is described in, for example, SQL (Structured Query Language).
  • the source of the query may be a source other than the client computer 100, for example, a program (for example, an application program) that is an upper program of the database engine 400 and is executed inside or outside the database server 200.
  • The query processing unit 410 can determine which of a plurality of versions (production environment, development environment, etc.) the received query is directed to, and associate the query with that version.
  • the query is executed by the concurrent execution control unit 420 corresponding to the version associated with the query.
  • the simultaneous execution control unit 420 is an example of an individual execution unit, and executes a query to the database based on a query execution plan.
  • the simultaneous execution control unit 420 controls the simultaneous execution of a plurality of transactions for one version of the database by, for example, MVCC (MultiVersion Concurrency Control).
  • the concurrent execution control unit 420 is generated for each version (environment) of the database.
  • The data conversion unit 430 executes processing (hereinafter, anonymization processing) that converts data including confidential attribute values (for example, name, telephone number, address, credit card number) in the database so that the attribute values of the data cannot be identified. For example, the data conversion unit 430 manages, for each database version, whether or not to execute the anonymization process and, if so, which anonymization method to use. The data conversion unit 430 converts (anonymizes) the data including the confidential attribute value in the target version database according to the anonymization method corresponding to the version.
  • the resource management unit 440 manages resource constraints (for example, CPU usage rate, memory bandwidth, etc.) for each database version.
  • the resource constraint represents a usable resource amount (amount of resource portion) among resources (CPU, memory, etc.) of the database server 200.
  • the resource management unit 440 controls the amount of resources allocated to the database version (the concurrent execution control unit 420 corresponding to the version) according to the resource constraint corresponding to the database version.
  • the data comparison unit 450 compares a plurality of processing results respectively corresponding to a plurality of programs using a plurality of version databases.
  • the data comparison unit 450 outputs the comparison result (for example, outputs it to the client computer 100).
  • the query saving unit 460 accumulates the query so as to be identifiable for each database version.
  • the fast-forward unit 470 executes one or more queries corresponding to the version among the queries stored in the query saving unit 460 for any version of the database. Thereby, it can be expected to reproduce the CPU load or the like according to the execution of one or more queries corresponding to the database of that version.
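The per-version accumulation and replay described for the query saving unit 460 and the fast-forward unit 470 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `QuerySavingUnit`, the function `fast_forward`, and the `execute` callback are hypothetical, and the queries are sample strings.

```python
from collections import defaultdict


class QuerySavingUnit:
    """Accumulates queries so they can be identified per database version."""

    def __init__(self):
        self._by_version = defaultdict(list)  # BR_ID -> queries in arrival order

    def save(self, br_id, query):
        self._by_version[br_id].append(query)

    def queries_for(self, br_id):
        return list(self._by_version.get(br_id, []))


def fast_forward(saved, br_id, execute):
    """Re-run every saved query of one version through the given executor."""
    return [execute(query) for query in saved.queries_for(br_id)]


saved = QuerySavingUnit()
saved.save(0, "SELECT * FROM orders")
saved.save(1, "UPDATE orders SET status = 'test'")
saved.save(0, "SELECT COUNT(*) FROM orders")

# A stand-in executor (assumption) that only reports what it would run.
replayed = fast_forward(saved, 0, lambda query: f"ran: {query}")
print(replayed)
```

Replaying only the queries saved for one version is what lets the load of that version be reproduced without touching the others.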
  • the speculative execution unit 480 causes the simultaneous execution control unit 420 to execute a query using a plurality of query execution plans generated by the query processing unit 410.
  • the version management unit 490 manages a plurality of versions of the database.
  • the version management unit 490 includes a data access unit 491, an area collection unit 492, a data saving unit 493, a starting point setting unit 494, a version creation unit 495, and a branch identification unit 496. Each process of the function units 491 to 496 will be described later.
  • FIG. 3 shows the overall update history information.
  • The overall update history information 221 is information managed by the version management unit 490, and represents the update history of the database table for all versions. Specifically, the overall update history information 221 has an entry for each update transaction. Each entry stores a row ID 311, data (attribute value) 312, TR_ID 313, BR_ID 314, and TIME 315.
  • the row ID 311 is an ID of an added row (database table row).
  • Data 312 is updated data.
  • TR_ID 313 is an ID of the update transaction.
  • BR_ID 314 is an ID of the version of the database including the row added by the update transaction.
  • the TIME 315 stores information indicating the row update order (or update time).
  • the value of TIME 315 may be a value that can specify the update order (addition order).
  • BR_ID “n” (n is an integer of 0 or more) may be expressed as “version #n”.
  • a database of version #n or a table thereof may be referred to as “database #n” or “table #n”.
  • a row having a row ID “m” (m is an integer of 0 or more) may be referred to as “row #m”.
  • a transaction with TR_ID “p” (p is an integer of 0 or more) may be referred to as “transaction #p”.
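As a reading aid, one entry of the overall update history information 221 can be modelled as a small record. This is only a sketch: the class name `HistoryEntry`, the field names, and the use of integers for TIME are assumptions; only the five fields themselves come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoryEntry:
    row_id: int  # row ID 311: ID of the added row
    data: str    # data 312: updated data (attribute value)
    tr_id: int   # TR_ID 313: ID of the update transaction
    br_id: int   # BR_ID 314: ID of the database version
    time: int    # TIME 315: any value that fixes the update order


def label(entry):
    """Render an entry with the '#n' notation used in the description."""
    return f"row #{entry.row_id} / transaction #{entry.tr_id} / version #{entry.br_id}"


entry = HistoryEntry(row_id=3, data="D", tr_id=3, br_id=1, time=3)
print(label(entry))
```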
  • The individual update history information 600 represents the update history of the database (table) updated by the simultaneous execution control unit 420 and is managed by the simultaneous execution control unit 420. It therefore has an entry for each transaction, and each entry has a row ID 601, data (attribute value) 602, and TR_ID 603. BR_ID is not included in the individual update history information 600 (the simultaneous execution control unit 420 is not aware of the BR_ID).
  • The individual update history information 600 corresponding to the version #m corresponding to the simultaneous execution control unit 420 may be referred to as “individual update history information #m”. Therefore, FIG. 4A shows individual update history information #0, FIG. 4B shows individual update history information #1, and FIG. 4C shows individual update history information #2.
  • the simultaneous execution control unit 420 corresponding to version #m may be referred to as “simultaneous execution control unit #m”.
  • the concurrent execution control unit #m executes a transaction for the database #m (table #m).
  • Rows #0 to #2 are added to table #0 by transactions #0 to #2 of the simultaneous execution control unit #0, respectively. As a result, the addition of rows #0 to #2 (data “A” to “C”) is recorded in the individual update history information #0 by the simultaneous execution control unit #0 and in the overall update history information 221 by the version management unit 490. At this point, no version other than version #0 exists.
  • Version #1 is added after TIME “2”, and concurrent execution control unit #1 is added (generated) accordingly.
  • The reference for the additional version is the database of a predetermined version (here, version #0 (for example, production environment)) at the time of version addition. Therefore, up to TIME “2”, the table #1 seen from the concurrent execution control unit #1 and the table #0 seen from the concurrent execution control unit #0 are the same. That is, both table #1 and table #0 hold data “A” to “C” stored in rows #0 to #2, respectively.
  • When version #k (k is an integer equal to or greater than 1) is added based on version #0, concurrent execution control unit #k is added, and table #k becomes visible to the added concurrent execution control unit #k.
  • The table #k at the time of version addition is not a copy of the table #0 at the time of version addition, but a snapshot of the table #0 at the time of version addition. Therefore, it is not necessary to copy the table #0 (data
  • Row #3 (data “D”) is added to table #1 by transaction #3 of concurrent execution control unit #1.
  • The addition of row #3 (data “D”) is recorded in the individual update history information #1 by the concurrent execution control unit #1 and is recorded in the overall update history information 221 by the version management unit 490.
  • In this way, table #1 can be updated, for example by adding data. Updates such as data addition are not actually reflected in the database, but are recorded in history information (log information) such as individual update history information #1 and overall update history information 221. Thereby, a substantial update to the snapshot of table #0 is possible. Note that updating both the individual update history information #1 and the overall update history information 221 is an example.
  • Because each individual update history information is included in the overall update history information 221, the individual update history information may be omitted, and the simultaneous execution control unit 420 may use the BR_ID to determine, from the overall update history information 221, how the table appears in each environment. That is, an example of the history information may be both the individual update history information and the overall update history information, or may be one of the two (for example, the overall update history information).
  • Row #3 (data “E”) is added to table #0 by transaction #4 of concurrent execution control unit #0.
  • The addition of row #3 (data “E”) is recorded in the individual update history information #0 by the concurrent execution control unit #0, and is recorded in the overall update history information 221 by the version management unit 490. Therefore, the data “E” in row #3 is referred to in version #0 (for example, the production environment), but not in any other version (such as version #1 (for example, the first development environment)). That is, the addition of row #3 (data “E”) is not recorded in the individual update history information other than the individual update history information #0. Therefore, data “E” does not exist in row #3 of table #1.
  • Version #2 is added after TIME “4”.
  • The reference for the additional version is the database of version #0 (for example, production environment) at the time of version addition. Therefore, up to TIME “4”, the table #2 seen from the concurrent execution control unit #2 and the table #0 seen from the concurrent execution control unit #0 are the same. That is, both the table #2 and the table #0 hold the data “A” to “C” and “E” stored in the rows #0 to #3, respectively.
  • Row #3 (data “F”) is added to table #2 by transaction #5 of concurrent execution control unit #2. That is, the data in row #3 is updated from “E” to “F”.
  • The addition of row #3 (data “F”) is recorded in the individual update history information #2 by the concurrent execution control unit #2, and is recorded in the overall update history information 221 by the version management unit 490.
  • Data “F” in row #3 is referred to in version #2 (for example, the second development environment), but not in any other version (for example, version #0 (for example, the production environment) and version #1 (for example, the first development environment)). That is, the addition of row #3 (data “F”) is not recorded in the individual update history information other than the individual update history information #2. Therefore, data “F” does not exist in row #3 of table #0 and row #3 of table #1.
  • Row #2 (data “F”) is added to table #0 by transaction #6 of concurrent execution control unit #0. That is, the data in row #2 is updated from “C” to “F”.
  • The addition of row #2 (data “F”) is recorded in the individual update history information #0 by the concurrent execution control unit #0, and is recorded in the overall update history information 221 by the version management unit 490.
  • The data “F” in row #2 is referred to in version #0, but is not referred to in versions other than version #0 (version #1 and version #2). That is, the addition of row #2 (data “F”) is not recorded in the individual update history information other than the individual update history information #0. Therefore, data “F” does not exist in row #2 of table #1 and row #2 of table #2.
  • The version management unit 490 manages the update history of the database (table) for all versions, and the concurrent execution control unit #m manages the update history only for the database (table) of version #m.
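The walkthrough above can be replayed in a few lines. The sketch below rests on stated assumptions: TIME values are taken to equal the transaction numbers, the `BRANCHES` table (version, reference version, last inherited TIME) is a hypothetical structure, and the rule that a version inherits reference entries only up to its branch TIME is inferred from the example rather than quoted from the description.

```python
# Overall update history: (row_id, data, tr_id, br_id, time), ordered by TIME.
OVERALL = [
    (0, "A", 0, 0, 0),
    (1, "B", 1, 0, 1),
    (2, "C", 2, 0, 2),
    (3, "D", 3, 1, 3),
    (3, "E", 4, 0, 4),
    (3, "F", 5, 2, 5),
    (2, "F", 6, 0, 6),
]

# Hypothetical branch table: version -> (reference version, last TIME inherited).
BRANCHES = {0: None, 1: (0, 2), 2: (0, 4)}


def individual_history(br_id):
    """Entries of one version, with BR_ID dropped as in the individual history."""
    return [(row, data, tr) for row, data, tr, br, _time in OVERALL if br == br_id]


def table_view(br_id):
    """Rows as seen by the execution control unit of version br_id."""
    chain, limit = [], float("inf")
    while True:
        chain.append((br_id, limit))
        if BRANCHES[br_id] is None:
            break
        br_id, branch_time = BRANCHES[br_id]
        limit = min(limit, branch_time)
    view = {}
    for version, cutoff in reversed(chain):  # reference version first
        for row, data, _tr, br, time in OVERALL:
            if br == version and time <= cutoff:
                view[row] = data  # later entries overwrite earlier ones
    return view


for version in BRANCHES:
    print(version, individual_history(version), table_view(version))
```

No table is copied at any point: each view is derived from the shared history, which is the property the description relies on.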
  • By adding versions, the number of development environments can be increased, and the database (table) can be updated in each development environment.
  • When version #k (k is an integer equal to or greater than 1) is added, a concurrent execution control unit #k is generated, and thus a table #k that can be seen only by the concurrent execution control unit #k is created.
  • Table #0, which serves as the reference, does not need to be copied.
  • The concurrent execution control unit #k (for example, the kth development environment) is executed by the database server 200 that executes the concurrent execution control unit #0 (for example, the production environment). For this reason, it is possible to construct a development environment (test environment) equivalent to the production environment without introducing a server and storage equivalent to those of the production environment. Therefore, a development environment equivalent to the production environment can be constructed quickly and at low cost.
  • the concurrent execution control unit 420 and the version management unit 490 are included in the database engine 400. This eliminates the need for a so-called proxy server or storage, and makes it easy to handle in-memory databases.
  • In addition to adding version #k, it is also possible to delete version #k (for example, to delete an unnecessary development environment).
  • the simultaneous execution control unit #k corresponding to the version #k to be deleted is also deleted.
  • the area for storing the data K is recovered by the garbage collection process. (The area may be managed as an empty area).
  • The concept of this embodiment is not a mere substitution or reuse of general version management techniques, which cannot simply be applied to database management. A database normally manages data in units of rows, whereas a version management technique copies files and manages versions in units of files. Databases and files differ: a database is structured data, a file is unstructured data, and one file can include a plurality of types of information. If an ordinary version management technique were applied to database management, the database would have to be copied for each version. As described above, this embodiment does not need to copy the database (table). In this embodiment, a row-oriented database is adopted, but a column-oriented database (data management in units of columns) may be adopted instead.
  • the concept of this embodiment is not a replacement or diversion of so-called virtual machine technology.
  • data referred to by a virtual machine is not referred to by another virtual machine. Data is independent for each virtual machine, and therefore data unnecessary for the virtual machine can be deleted immediately.
  • data in the database may be shared by a plurality of simultaneous execution control units 420, and data recognized by at least one simultaneous execution control unit 420 cannot be deleted. Therefore, the concept of this embodiment is not a replacement or diversion of so-called virtual machine technology.
  • FIG. 5 is a flowchart of the snapshot creation process.
  • the version creation unit 495 receives a setting of a time point (TIME) when a new version is added to the database, for example, from the user via the client computer 100 (S501).
  • the version creation unit 495 refers to the overall update history information 221 and acquires the transaction ID (TR_ID) corresponding to the received time point (TIME) (S502). Note that the state of the database at the end of the transaction of TR_ID is the starting point of the new version. In the above process, TR_ID is acquired from TIME, but TR_ID may be directly received.
  • the starting point setting unit 494 identifies the BR_ID indicating the version to which the transaction of TR_ID belongs, and generates the BR_ID as a reference destination ID indicating the version serving as the reference destination (starting point) (S503).
  • the starting point setting unit 494 generates a new BR_ID for the new version, and generates a concurrent execution control unit 420 that associates the reference destination ID with the new BR_ID (S504).
  • Through the above process, a new version of the database, starting from the state of the database at a given point in time, can be added.
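Steps S501 to S504 can be sketched as below. The class name `VersionCreationSketch`, the dictionary layout of the entries, and the `references` map (standing in for the generated concurrent execution control unit that ties the reference destination ID to the new BR_ID) are all assumptions; picking the latest transaction at or before the requested TIME is likewise one possible reading of S502.

```python
class VersionCreationSketch:
    def __init__(self, history):
        self.history = history        # dicts with row_id, data, tr_id, br_id, time
        self.references = {0: None}   # BR_ID -> (reference destination ID, starting TR_ID)

    def create_version(self, time):
        # S502: find the transaction that corresponds to the requested TIME.
        candidates = [e for e in self.history if e["time"] <= time]
        if not candidates:
            raise ValueError("no transaction at or before the requested TIME")
        start = max(candidates, key=lambda e: e["time"])
        # S503: the version that transaction belongs to becomes the reference.
        reference_id = start["br_id"]
        # S504: issue a new BR_ID and associate it with the reference.
        new_br_id = max(self.references) + 1
        self.references[new_br_id] = (reference_id, start["tr_id"])
        return new_br_id


history = [
    {"row_id": 0, "data": "A", "tr_id": 0, "br_id": 0, "time": 0},
    {"row_id": 1, "data": "B", "tr_id": 1, "br_id": 0, "time": 1},
    {"row_id": 2, "data": "C", "tr_id": 2, "br_id": 0, "time": 2},
]
manager = VersionCreationSketch(history)
v1 = manager.create_version(time=2)   # S501: TIME supplied by the user
history.append({"row_id": 3, "data": "D", "tr_id": 3, "br_id": v1, "time": 3})
history.append({"row_id": 3, "data": "E", "tr_id": 4, "br_id": 0, "time": 4})
v2 = manager.create_version(time=4)
print(v1, v2, manager.references)
```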
  • FIG. 6 is a flowchart of the data writing process.
  • the data access unit 491 acquires the row ID of the write target row and the write target data from the simultaneous execution control unit 420 (S601). Next, the data access unit 491 acquires the TR_ID assigned to the transaction to be written from the simultaneous execution control unit 420 (S602).
  • the data access unit 491 acquires the BR_ID indicating the version of the database managed by the simultaneous execution control unit 420 from the simultaneous execution control unit 420 (S603).
  • the data access unit 491 acquires the current TIME (for example, the current time) (S604).
  • the data access unit 491 stores an entry including the acquired row ID, TR_ID, BR_ID, TIME, and data in the overall update history information 221 (S605).
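The write path S601 to S605 amounts to appending one entry per update. In this sketch the names `ExecutionControlStub` and `DataAccessSketch` are hypothetical, and a logical counter stands in for TIME, which the description allows as long as the update order can be determined.

```python
import itertools


class ExecutionControlStub:
    """Hypothetical stand-in for a concurrent execution control unit."""

    def __init__(self, br_id):
        self.br_id = br_id


class DataAccessSketch:
    def __init__(self):
        self.overall_history = []
        self._clock = itertools.count()  # logical TIME; a wall clock would also do

    def write(self, unit, row_id, data, tr_id):
        # S601-S602: row ID, data and TR_ID come from the execution control unit.
        br_id = unit.br_id                   # S603: version of the writing unit
        time = next(self._clock)             # S604: current TIME
        entry = {"row_id": row_id, "data": data, "tr_id": tr_id,
                 "br_id": br_id, "time": time}
        self.overall_history.append(entry)   # S605: store the entry
        return entry


access = DataAccessSketch()
production = ExecutionControlStub(br_id=0)
development = ExecutionControlStub(br_id=1)
access.write(production, row_id=0, data="A", tr_id=0)
access.write(development, row_id=3, data="D", tr_id=1)
print(access.overall_history)
```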
  • FIG. 7 is a flowchart of the data reading process.
  • the data access unit 491 acquires the row ID of the row to be read from the simultaneous execution control unit 420 (S701). Next, the data access unit 491 acquires BR_ID indicating the version of the database managed by the simultaneous execution control unit 420 (S702), and passes the acquired row ID and BR_ID to the branch identification unit 496.
  • the branch identifying unit 496 determines whether data corresponding to the acquired row ID and BR_ID exists in the overall update history information 221 (S703).
  • If the determination result in S703 is affirmative (S703: YES), the data access unit 491 reads the data from the overall update history information 221 and passes it to the simultaneous execution control unit 420 (S704).
  • The data is output from the simultaneous execution control unit 420.
  • If the determination result in S703 is negative (S703: NO), the branch identification unit 496 determines whether a reference destination ID corresponding to the database of the version indicated by BR_ID exists in the simultaneous execution control unit 420 (individual update history information 600) (S705).
  • If the determination result in S705 is affirmative (S705: YES), the branch identification unit 496 sets this reference destination ID as BR_ID (S706) and returns the process to S703. As a result, the processing from S703 onward is executed again, and data corresponding to the database of the version indicated by the new BR_ID (the reference destination ID) is searched for.
  • In this way, data is searched for by sequentially tracing the databases of the reference destination versions.
  • If the determination result in S705 is negative (S705: NO), it means that there is no row to be read, so the data access unit 491 notifies the concurrent execution control unit 420 of a row ID error (S707).
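The read flow S701 to S707 can be sketched as a loop over the reference chain. Assumptions: entries are tuples `(row_id, data, tr_id, br_id, time)`, `REFERENCES` maps a version to `(reference destination ID, branch TIME)`, and entries of a reference version are ignored when they are newer than the branch TIME. That cutoff is not spelled out in the flowchart text; it is added here so that the result matches the snapshot behaviour of the worked example. `RowIdError` is a hypothetical name for the row ID error of S707.

```python
class RowIdError(LookupError):
    """Raised when no version in the reference chain holds the row (S707)."""


def read_row(history, references, row_id, br_id):
    cutoff = float("inf")
    while True:
        # S703: is there data for this row ID and BR_ID?
        hits = [e for e in history
                if e[0] == row_id and e[3] == br_id and e[4] <= cutoff]
        if hits:
            return max(hits, key=lambda e: e[4])[1]  # S704: latest data wins
        reference = references.get(br_id)            # S705: reference destination?
        if reference is None:
            raise RowIdError(row_id)                 # S707: row ID error
        br_id, branch_time = reference               # S706: follow the reference
        cutoff = min(cutoff, branch_time)


HISTORY = [
    (0, "A", 0, 0, 0), (1, "B", 1, 0, 1), (2, "C", 2, 0, 2),
    (3, "D", 3, 1, 3), (3, "E", 4, 0, 4), (3, "F", 5, 2, 5),
    (2, "F", 6, 0, 6),
]
REFERENCES = {0: None, 1: (0, 2), 2: (0, 4)}

print(read_row(HISTORY, REFERENCES, row_id=2, br_id=2))  # falls back to version #0
print(read_row(HISTORY, REFERENCES, row_id=3, br_id=1))  # found in version #1 itself
```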
  • FIG. 8 is a flowchart of the garbage collection process.
  • the garbage collection process is a process of identifying data that is not referred to by any concurrent execution control unit 420 and collecting an area in which the data is stored.
  • the collected area is managed as an empty area where data can be newly stored.
  • The area to be collected may be at least one of the area storing entries (entries of the overall update history information 221) corresponding to rows/data not referenced by any concurrent execution control unit 420, and the database area corresponding to rows/data not referenced by any concurrent execution control unit 420.
  • the garbage collection process is executed at an arbitrary time point and may be executed repeatedly.
  • The area collection unit 492 determines whether or not there is an unprocessed row (processing target row) in the overall update history information 221 (S801).
  • If the determination result in S801 is affirmative (S801: YES), the area collection unit 492 acquires the row ID, TR_ID, and BR_ID of an unprocessed row as a processing target (S802).
  • The area collection unit 492 determines whether or not there is a row that has the same row ID as the processing target row ID, that includes either the same BR_ID as the processing target BR_ID or the BR_ID matching the reference destination ID associated with the processing target BR_ID, and whose TIME is before the TIME of the processing target row (S803). If the determination result in S803 is affirmative (S803: YES), the area collection unit 492 sets a collectable mark for that row (S804) and returns the process to S801.
  • The collectable mark may be set in a dedicated field provided for it in the row of the overall update history information 221, or may be stored as part of another item of the row.
  • If the determination result in S801 is negative (S801: NO), the area collection unit 492 determines whether or not there are rows (processing target rows) in the overall update history information 221 for which S806 and S807 have not been processed (S805). If the determination result in S805 is affirmative (S805: YES), the area collection unit 492 determines whether or not a collectable mark has been set in the unprocessed row (S806).
  • If the determination result in S806 is affirmative (S806: YES), the area collection unit 492 collects the area where the data of the corresponding row is stored (S807) and returns the process to S805.
  • As a result, a duplicate row in the same version, or a row of the reference version that duplicates a row of this version, is deleted from the memory 220 (the area is collected and a free area is created). For this reason, the memory 220 can be used effectively.
  • The data saving unit 493 may temporarily save the deleted row in a predetermined storage area.
  • If the determination result in S805 is negative (S805: NO), it means that the collection of the area has been decided for all rows, and the area collection unit 492 ends the garbage collection process.
  • the area collection by the area collection unit 492 is not limited to the above processing.
  • The row corresponding to the BR_ID of the version database to be deleted may be deleted from the overall update history information 221. However, if another version of the database has been created based on the state of a certain version of the database at a predetermined time, the row corresponding to that state is not deleted.
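The sketch below illustrates the stated goal of the garbage collection process (collecting entries that no concurrent execution control unit refers to) with a mark pass followed by a collect pass. It is a simplified stand-in and does not reproduce the S803 test verbatim: an entry is marked collectable when it is outside the current view of every version, where a view inherits reference entries only up to the branch TIME. The tuple layout, the `references` map, and the function names are assumptions, and the history is assumed to be ordered by TIME. It also assumes no new version will later be branched from an earlier point in time.

```python
def visible_entries(history, references, br_id):
    """Indexes of the history entries that version br_id currently reads."""
    chain, limit = [], float("inf")
    while True:
        chain.append((br_id, limit))
        if references[br_id] is None:
            break
        br_id, branch_time = references[br_id]
        limit = min(limit, branch_time)
    latest = {}  # row_id -> index of the entry this version sees
    for version, cutoff in reversed(chain):
        for index, (row_id, _data, _tr, br, time) in enumerate(history):
            if br == version and time <= cutoff:
                latest[row_id] = index
    return set(latest.values())


def collect_garbage(history, references):
    """Return (surviving entries, collected entries)."""
    keep = set()
    for br_id in references:
        keep |= visible_entries(history, references, br_id)
    # Mark pass: everything outside every view is collectable.
    collectable = [i for i in range(len(history)) if i not in keep]
    # Collect pass: drop the marked entries and report them.
    survivors = [e for i, e in enumerate(history) if i in keep]
    return survivors, [history[i] for i in collectable]


HISTORY = [
    (0, "A", 0, 0, 0), (1, "B", 1, 0, 1), (2, "C", 2, 0, 2),
    (3, "D", 3, 1, 3), (3, "E", 4, 0, 4), (3, "F", 5, 2, 5),
    (2, "F", 6, 0, 6),
    (3, "G", 7, 0, 7),  # extra sample update so that one entry becomes garbage
]
REFERENCES = {0: None, 1: (0, 2), 2: (0, 4)}

survivors, collected = collect_garbage(HISTORY, REFERENCES)
print(collected)
```

Data “C” in row #2 survives even though version #0 has overwritten it, because versions #1 and #2 still read it through their snapshots.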
  • FIG. 9 is a flowchart of the anonymization process.
  • Anonymization processing is processing that makes it impossible to specify an attribute value of data by converting data including confidential attribute values (for example, name, telephone number, address, credit card number).
  • the anonymization process is performed on the data read out in S704 of FIG. 7, for example.
  • First, the data conversion unit 430 acquires the row ID, column ID (attribute ID), BR_ID, and data for the row acquired in S704 (S901). Next, the data conversion unit 430 determines whether the data in this row has been updated since this version of the database was created (S902). Whether the data has been updated can be determined by checking whether the BR_ID of the row corresponding to the row ID is the BR_ID associated with the simultaneous execution control unit 420 that requested this data. Specifically, when the BR_ID of the row is the BR_ID associated with that simultaneous execution control unit 420, the data can be determined to have been updated.
  • If the determination result in S902 is affirmative (S902: YES), the data has already been anonymized, so the data conversion unit 430 returns the acquired data to the simultaneous execution control unit 420 (S903).
  • If the determination result in S902 is negative (S902: NO), the data conversion unit 430 determines whether an anonymization method corresponding to the BR_ID has been set in the data conversion unit 430 (S904). If the determination result in S904 is negative (S904: NO), this data does not need to be anonymized, and the data conversion unit 430 advances the process to S903.
  • If the determination result in S904 is affirmative (S904: YES), the data conversion unit 430 acquires the anonymization method for the column ID of each column in the data (S905).
  • The data may include, for example, columns (attribute values) such as a name, a credit card number, a telephone number, and an address, and the appropriate anonymization method depends on the attribute.
  • Any well-known method can be used as the anonymization method for an attribute.
  • The data conversion unit 430 anonymizes each column (attribute value) of the data by the acquired anonymization method (S906), and returns the anonymized data to the simultaneous execution control unit 420 (S907).
  • Through this processing, anonymized data can be used as the data of a certain version of the database, and leakage of confidential attribute values can be avoided. Further, since the anonymization process is performed each time data is read out, the data does not need to be anonymized and stored in advance, and anonymized data can be provided quickly when it is needed. Further, the data conversion unit 430 can perform the anonymization process only on data read from a database of a version other than version #0. For example, attribute values that should be kept secret are converted only in the development environment, and the anonymization process need not be performed in the production environment.
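  • The read-time flow of S901 to S907 can be sketched as follows. This is a minimal illustration under stated assumptions: the `Row` type, the `ANONYMIZERS` table, and the two masking functions are hypothetical, since the text only states that any well-known method may be used per attribute.

```python
from dataclasses import dataclass
from typing import Callable, Dict


def mask_name(value: str) -> str:
    """Hypothetical method: keep the first character, mask the rest."""
    return value[:1] + "*" * (len(value) - 1) if value else value


def mask_digits(value: str) -> str:
    """Hypothetical method: keep only the last four characters."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


@dataclass
class Row:
    row_id: int
    br_id: int            # version (BR_ID) that last wrote this row
    data: Dict[str, str]  # column ID -> attribute value


# Assumed configuration: BR_ID -> {column ID -> anonymization method}.
ANONYMIZERS: Dict[int, Dict[str, Callable[[str], str]]] = {
    1: {"name": mask_name, "card": mask_digits},
}


def read_with_anonymization(row: Row, requester_br_id: int) -> Dict[str, str]:
    # S902: a row written by the requesting version is returned as is (S903).
    if row.br_id == requester_br_id:
        return dict(row.data)
    # S904: no method configured for this version, so nothing to convert.
    methods = ANONYMIZERS.get(requester_br_id)
    if methods is None:
        return dict(row.data)
    # S905 to S907: convert each column that has a method; pass others through.
    return {col: methods[col](val) if col in methods else val
            for col, val in row.data.items()}
```

  • Because the conversion happens on read, the stored reference-version row is never modified in this sketch.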
  • FIG. 10 is a flowchart of the resource control process.
  • Resource control processing controls the amount of resources available to each version (environment). For example, after the database engine 400 receives a query, the amount of resources used to process the database of the version associated with that query is controlled at a predetermined point in time (for example, before the data read process is performed).
  • The resource management unit 440 acquires, from the simultaneous execution control unit 420, the BR_ID indicating the version of the database to be processed (S1001). Next, the resource management unit 440 determines whether a resource constraint condition is set for the database of the acquired BR_ID version (S1002). If the determination result in S1002 is negative (S1002: NO), the resource management unit 440 advances the process to S1006.
  • If the determination result in S1002 is affirmative (S1002: YES), the resource management unit 440 acquires the resource constraint condition corresponding to the BR_ID (S1003), and acquires the resource usage status of the database processing for the BR_ID version (S1004).
  • The resource management unit 440 determines whether the acquired resource usage status satisfies the resource constraint condition (S1005). If the determination result in S1005 is affirmative (S1005: YES), the resource management unit 440 advances the process to S1006.
  • In S1006, the resource management unit 440 causes the processing (for example, the data read process) to be performed on the database of the acquired BR_ID version.
  • If the determination result in S1005 is negative (S1005: NO), the resource management unit 440 makes the processing (for example, the data read process) on the database of the acquired BR_ID version wait (S1007), and then advances the process to S1004.
  • Through this processing, the processing on a database of a given version is controlled according to its resource constraint condition. Therefore, for example, the adverse effect that processing on a development environment (test environment) database has on processing for the production environment database can be reduced.
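  • The loop of S1001 to S1007 can be sketched as follows. This is only an illustration: the text does not say which resource is constrained, so the sketch assumes a per-version upper bound on some usage count, and `limits`, `current_usage`, and `wait` are hypothetical parameters.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def run_with_resource_control(
    br_id: int,
    operation: Callable[[], T],
    limits: Dict[int, int],
    current_usage: Callable[[int], int],
    wait: Callable[[float], None] = time.sleep,
    poll_interval: float = 0.01,
) -> T:
    """Run `operation` for version `br_id` once its constraint is satisfied."""
    limit = limits.get(br_id)                 # S1002, S1003
    if limit is not None:
        # S1004, S1005: re-read the usage status until it is below the limit.
        while current_usage(br_id) >= limit:
            wait(poll_interval)               # S1007: make the processing wait
    return operation()                        # S1006: perform the processing
```

  • A version with no entry in `limits` (for example, the production version) runs immediately, which mirrors the S1002: NO branch.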
  • The version creation unit 495 may automatically add a new-version database in a consistent state at a predetermined timing (for example, monthly, weekly, or daily), even without an explicit designation.
  • The added new-version database is kept without being updated. By using that database, a database of another version that has been altered can be expected to be reverted to a past, consistent state.
  • The query saving unit 460 may save the queries issued to the reference-version database, and later the fast-forward unit 470 may execute the queries stored in the query saving unit 460 on a new-version database (a database on which those queries have not been executed). Thereby, the CPU load and the like of the database processing can be expected to be reproduced.
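  • The save-and-replay idea can be sketched as follows, using Python's built-in `sqlite3` module purely as a stand-in for the database engine 400. The `QuerySaver` class and the `snapshot_of` and `fast_forward` functions are hypothetical names, and copying the whole database is an assumption made for brevity rather than how the versions are described as being created.

```python
import sqlite3
from typing import List, Sequence, Tuple


class QuerySaver:
    """Records every statement issued to the reference-version database."""

    def __init__(self) -> None:
        self.saved: List[Tuple[str, Tuple]] = []

    def execute(self, conn: sqlite3.Connection, sql: str,
                params: Sequence = ()) -> sqlite3.Cursor:
        self.saved.append((sql, tuple(params)))
        return conn.execute(sql, params)


def snapshot_of(reference: sqlite3.Connection) -> sqlite3.Connection:
    """Create a new-version database as a copy of the reference version."""
    reference.commit()
    copy = sqlite3.connect(":memory:")
    reference.backup(copy)
    return copy


def fast_forward(saved: List[Tuple[str, Tuple]],
                 target: sqlite3.Connection) -> int:
    """Replay the saved statements, in order, on a database that has not run them."""
    for sql, params in saved:
        target.execute(sql, params)
    target.commit()
    return len(saved)
```

  • Replaying the statements in their original order is what lets the new version reproduce both the resulting state and, approximately, the processing load.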
  • The processing of a first query issued by a first program may be executed using a first-version database (for example, the reference version #0), and the processing of a second query issued by a second program may be executed using a second-version database (for example, a new version).
  • Each of the first and second programs may be an example of a query source.
  • The first-version database and the second-version database may have the same contents.
  • The data comparison unit 450 may compare the state (result) of the first-version database used by the first program with the state (result) of the second-version database used by the second program, and output the comparison result (for example, information on whether both programs produced the same state (result)). As a result, the operation of the program can be verified.
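  • The comparison can be sketched as follows. This is a minimal illustration in which each version is modeled as a plain dictionary; the `Database` alias and the `verify_program` function are hypothetical names, not part of the described system.

```python
from copy import deepcopy
from typing import Any, Callable, Dict

Database = Dict[int, Dict[str, Any]]  # row ID -> row data (assumed layout)


def verify_program(first_program: Callable[[Database], None],
                   second_program: Callable[[Database], None],
                   base: Database) -> Dict[str, Any]:
    """Run two programs on two versions with identical contents and compare."""
    first_version = deepcopy(base)    # e.g. the reference version
    second_version = deepcopy(base)   # e.g. a new version
    first_program(first_version)
    second_program(second_version)
    row_ids = first_version.keys() | second_version.keys()
    differing = sorted(rid for rid in row_ids
                       if first_version.get(rid) != second_version.get(rid))
    return {"same": not differing, "differing_rows": differing}
```

  • Reporting the differing row IDs, rather than only a yes/no answer, is a design choice of this sketch to make a mismatch easier to investigate.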
  • The query processing unit 410 may generate a plurality of query execution plans for the same query.
  • The speculative execution unit 480 may cause two or more simultaneous execution control units 420, corresponding respectively to two or more versions, to execute two or more of the query execution plans.
  • The speculative execution unit 480 may use the database of the version on which the query execution plan that produced a result earliest was executed as the database for subsequent processing. Thereby, the query can be expected to be executed quickly.
  • Processing by a new program may be executed using the new-version database.
  • The data comparison unit 450 may merge the execution result into the reference-version database, for example in units of tables, to the extent that consistency is not compromised. As a result, the execution result obtained in the development environment of the new program can be expected to be appropriately reflected in the production environment.
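  • Speculative execution can be sketched as follows with Python's standard `concurrent.futures` module. The `speculative_execute` function and the representation of a query execution plan as a zero-argument callable keyed by BR_ID are assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Tuple


def speculative_execute(plans: Dict[int, Callable[[], Any]]) -> Tuple[int, Any]:
    """Run one plan per version concurrently; return (BR_ID, result) of the first to finish.

    Note: leaving the `with` block waits for the slower plans to finish;
    a real engine would presumably cancel or discard them instead.
    """
    with ThreadPoolExecutor(max_workers=len(plans)) as pool:
        futures = {pool.submit(plan): br_id for br_id, plan in plans.items()}
        first_done = next(as_completed(futures))
        return futures[first_done], first_done.result()
```

  • The returned BR_ID identifies the version whose database would be used for subsequent processing.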
  • Computer system; 100: Client computer; 200: Database server; 400: Database engine

Abstract

The invention relates to a database system comprising a first discrete execution unit that executes a transaction on a database (first database) of a first version (for example, a runtime environment), and a version management unit that, each time a version is added (for example, an additional development environment), generates an additional database that is a snapshot of the first database at the time the version is added, together with another discrete execution unit that executes transactions on the additional database. Each time a row is updated in any of the databases corresponding to all of the discrete execution units, which include at least the first discrete execution unit, the version management unit adds to history information an entry representing information that includes the ID of the updated row, the post-update data stored in the updated row, and the ID of the version corresponding to the discrete execution unit that executed the update.
PCT/JP2015/051249 2015-01-19 2015-01-19 Database system and database management method WO2016117007A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2015/051249 WO2016117007A1 (fr) 2015-01-19 2015-01-19 Database system and database management method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2015/051249 WO2016117007A1 (fr) 2015-01-19 2015-01-19 Database system and database management method

Publications (1)

Publication Number Publication Date
WO2016117007A1 true WO2016117007A1 (fr) 2016-07-28

Family

ID=56416573

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2015/051249 WO2016117007A1 (fr) Database system and database management method 2015-01-19 2015-01-19

Country Status (1)

Country Link
WO (1) WO2016117007A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018018121A (ja) * 2016-07-25 2018-02-01 富士通株式会社 Database control program, database control method, and database control device
CN108647357A (zh) * 2018-05-17 2018-10-12 阿里巴巴集团控股有限公司 Data query method and device
JP2020502626A (ja) * 2016-11-08 2020-01-23 セールスフォース ドット コム インコーポレイティッド Formation and manipulation of test data in a database system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04112344A (ja) * 1990-09-03 1992-04-14 Fujitsu Ltd Pseudo-update method for a database
JP2003076593A (ja) * 2002-06-10 2003-03-14 Hitachi Ltd Database management method and system
JP2007011497A (ja) * 2005-06-28 2007-01-18 Hitachi Ltd Performance test method and test server
WO2008114452A1 (fr) * 2007-03-20 2008-09-25 Fujitsu Limited Simulator, simulation system, and computer program
JP2012079078A (ja) * 2010-10-01 2012-04-19 Nippon Telegr & Teleph Corp <Ntt> Distributed database management device and distributed database management program

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018018121A (ja) * 2016-07-25 2018-02-01 富士通株式会社 Database control program, database control method, and database control device
JP2020502626A (ja) * 2016-11-08 2020-01-23 セールスフォース ドット コム インコーポレイティッド Formation and manipulation of test data in a database system
JP7090606B2 (ja) 2016-11-08 2022-06-24 セールスフォース ドット コム インコーポレイティッド Formation and manipulation of test data in a database system
CN108647357A (zh) * 2018-05-17 2018-10-12 阿里巴巴集团控股有限公司 Data query method and device
CN108647357B (zh) * 2018-05-17 2023-01-31 创新先进技术有限公司 Data query method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15878703

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15878703

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP