WO2010084754A1 - データベースシステム、データベース管理方法、データベース構造および記憶媒体 - Google Patents
データベースシステム、データベース管理方法、データベース構造および記憶媒体 Download PDFInfo
- Publication number
- WO2010084754A1 WO2010084754A1 PCT/JP2010/000326 JP2010000326W WO2010084754A1 WO 2010084754 A1 WO2010084754 A1 WO 2010084754A1 JP 2010000326 W JP2010000326 W JP 2010000326W WO 2010084754 A1 WO2010084754 A1 WO 2010084754A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- identifier
- database
- tuple
- area
- Prior art date
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2255—Hash tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2379—Updates performed during online database operations; commit processing
Definitions
- the present invention relates to a database structure and a technique for executing data processing on the database.
- RDBMS Relational DataBase Management System
- RDB Relational DataBase Management System
- the RDBMS manages data in units of rows using a unique key for each row, it can execute large-scale processing in units of rows at high speed and efficiently, but large-scale processing in units of columns. There is a problem that it is difficult to execute efficiently. For example, when RDBMS executes data processing in units of columns, it is necessary to read data of a plurality of rows corresponding to the columns requested by the query, which causes a problem of reducing the processing speed. Furthermore, since the RDBMS can write data in a continuous storage area in the memory in units of rows, the data can be accessed at a high speed in units of rows. However, when the RDBMS executes a transaction related to search processing, comparison operation, or aggregation operation in units of columns, a phenomenon of frequently accessing data stored in a plurality of non-contiguous memory areas occurs. Processing speed may decrease.
- DWH Data WareHouse
- the DWH is a system that is constructed independently of the core business system and does not update data (addition of new data, change of existing data, or deletion of existing data) in principle. Therefore, DWH does not have a database structure that can perform data update efficiently.
- Patent Document 2 Japanese Patent Laid-Open No. 2000-339390
- Patent Document 3 International Publication No. 00/10103 pamphlet
- the database systems of Patent Document 2 and Patent Document 3 are database structures obtained by converting logical tabular data into, for example, a plurality of information blocks corresponding to respective items of gender, age, height, and weight. Is used.
- Each information block includes a value management table (value list) and a pointer array to the value management table.
- the pointer array to the value management table means that an item value number (that is, a pointer to the value management table) of a certain column of tabular data is in a predetermined order (record number order) of the tabular data.
- a stored array is
- the item value numbers in the value management table are arranged in a predetermined order in order to speed up data update (for example, record update, insertion or deletion). Is required. For this reason, when a new item value number is inserted into the value management table at the time of data update, it is necessary to rearrange other existing item value numbers.
- the pointer array to the value management table must be updated so that it matches the item value number after the rearrangement. Therefore, with the database structures of Patent Document 2 and Patent Document 3, data update cannot be executed efficiently and at high speed. In particular, when data is updated frequently, there is a problem that the processing load becomes extremely large and the processing speed is significantly reduced.
- an object of the present invention is to provide a database system, a database management method, a database structure, and a storage that enable efficient and high-speed execution of data update to a database and can also improve the processing speed of search and aggregation. To provide a medium.
- a data storage unit having a database including a plurality of entity data, and a data processing unit that receives a query and executes data processing based on the received query with respect to the database.
- a database system is provided.
- the database has a fixed-length data identifier that uniquely represents the entity data itself in an area defined by at least one tuple defined in the row direction and at least one attribute field defined in the column direction.
- the identifier table has a metadata area in which a tuple identifier uniquely representing the tuple in which each data identifier is stored is stored.
- a database structure including a plurality of entity data.
- a fixed-length data identifier that uniquely represents the entity data itself is stored in an area defined by at least one tuple defined in the row direction and at least one attribute field defined in the column direction.
- An identifier table; and a conversion table representing a correspondence relationship between the plurality of data identifiers and position data indicating positions of the storage areas of the plurality of entity data, and the conversion table is the identifier table in the identifier table It has a metadata area for storing a tuple identifier that uniquely represents the tuple in which each data identifier is stored.
- the database is a fixed-length data identifier that uniquely represents the entity data itself in an area defined by at least one tuple defined in the row direction and at least one attribute field defined in the column direction.
- the identifier table has a metadata area in which a tuple identifier uniquely representing the tuple in which each data identifier is stored is stored.
- a process for receiving a query for a database that is a computer-readable recording medium and includes a plurality of entity data, and a data process that is executed on the database based on the received query And a storage medium storing a computer program that causes the computer to execute a database management process.
- the database includes an identifier in which a fixed-length data identifier uniquely storing the entity data itself is stored in an area defined by at least one tuple defined in a row direction and at least one attribute field defined in a column direction.
- a table, and a conversion table representing a correspondence relationship between position data indicating the position of the storage area of each of the plurality of entity data and the plurality of data identifiers.
- the conversion table has a metadata area in which a tuple identifier uniquely representing the tuple in which each data identifier is stored in the identifier table is stored.
- updates to the database can be executed efficiently and at high speed, and processing such as search and tabulation can also be executed at high speed.
- FIG. 1 It is a functional block diagram which shows schematic structure of the database system of one Embodiment which concerns on this invention. It is a flowchart which shows roughly the process sequence of the transaction processing part of a database system. It is the schematic which shows an example of the logical structure of the database which concerns on the 1st Embodiment of this invention. It is a figure which shows an example of a real table typically.
- (A) to (C) are diagrams showing a logical data structure of a database.
- (A) And (B) is a figure which shows roughly the 1st modification of the conversion table of the database which concerns on 1st Embodiment.
- (A) And (B) is a figure which shows roughly the 2nd modification of the conversion table of the database which concerns on 1st Embodiment.
- (A) And (B) is a figure which shows roughly the 3rd modification of the conversion table of the database which concerns on 1st Embodiment.
- (A) And (B) is a figure which shows roughly the 4th modification of the conversion table of the database which concerns on 1st Embodiment.
- It is a figure which shows the reference table and intermediate identifier table which comprise the database of 2nd Embodiment.
- FIG. 1 is a functional block diagram showing a schematic configuration of a database system 10 according to an embodiment of the present invention.
- the database system 10 includes a transaction processing unit 20, a checkpoint processing unit 30, a defragmenting processing unit 31, a transaction server 32, and a storage device 40.
- the storage device 40 stores a database 41 and a log file 42.
- the transaction processing unit 20 includes a query reception unit 21, an analysis unit 22, a transaction execution unit 23, and a response processing unit 24.
- the database system 10 and a plurality of client terminals 501 and 502 are connected to the network NW.
- the network NW for example, a generally used small-scale network (for example, wired or wireless LAN) can be mentioned, but it is not particularly limited.
- the network NW may be a large-scale network such as the Internet.
- Each of the client terminals 501 and 502 transmits a query described in a query language (database language) such as SQL (Structured Query Language) or XQuery (XML Query Language) with respect to the database 41 to the database system 10. It has the function to do.
- a query language database language
- SQL Structured Query Language
- XQuery XML Query Language
- the hardware configuration of the database system 10 may be a general-purpose configuration.
- a processor such as a CPU (Central Processing Unit), a main memory, a cache memory, a signal transmission bus, a timer circuit, an input device (for example, a keyboard) Or a pointing device) and an output device (for example, a display or a printer), but is not particularly limited.
- a processor such as a CPU (Central Processing Unit), a main memory, a cache memory, a signal transmission bus, a timer circuit, an input device (for example, a keyboard) Or a pointing device) and an output device (for example, a display or a printer), but is not particularly limited.
- a processor such as a CPU (Central Processing Unit), a main memory, a cache memory, a signal transmission bus, a timer circuit, an input device (for example, a keyboard) Or a pointing device) and an output device (for example, a display or a printer), but is not particularly limited.
- a processor such
- All or part of the configuration of the database system 10 may be realized by hardware, or may be realized by a computer program (or program code) that causes a processor to execute processing.
- the processor reads the computer program from a recording medium such as a nonvolatile memory and executes it.
- the components 21 to 24, 30, 31, 32, and 40 of the database system 10 may be incorporated into a single device, or may be incorporated in a plurality of devices that operate in cooperation with each other. Good.
- FIG. 2 is a flowchart schematically showing a processing procedure of the transaction processing unit 20 of the database system 10.
- the query reception unit 21 receives a query that has arrived from the client terminals 501 and 502 (step S ⁇ b> 11), and gives the received query to the analysis unit 22.
- the analysis unit 22 executes query analysis (syntax analysis, optimization processing, etc.), and gives the analysis result to the transaction execution unit 23 (step S12).
- the transaction execution unit 23 executes a transaction based on the analysis result for the database 41 (step S13).
- the transaction means one unit of work including processing such as search and update of the database 41, and is called atomicity (ATOMICITY), consistency (CONSISTENCY), isolation (ISOLATION), and durability (DURABILITY). This process satisfies the ACID characteristics.
- the transaction ends normally (YES in step S14), the transaction is committed (step S15).
- the transaction execution unit 23 records transaction log information (history information) in the storage device 40 as a log file 42. In parallel with this, the transaction execution unit 23 records transaction log metadata (information such as the start or end of each transaction) in the transaction server 32.
- the checkpoint processing unit 30 periodically sets checkpoints based on the metadata recorded in the transaction server 32 and the log file 42.
- the transaction execution unit 23 executes roll forward (step S16). That is, the checkpoint processing unit 30 refers to the log file 42 and confirms the log information of the period Terr from the time of the checkpoint set immediately before to the time of failure, and is not committed during this period Terr. Log information related to the transaction is deleted from the log file 42.
- the transaction execution unit 23 reflects the execution result of the transaction in the database 41 based on the log file 42. Thereafter, the transaction execution unit 23 returns the database 41 to the state before starting the processing of the uncommitted transaction, that is, rolls back (step S17).
- the response processing unit 24 receives the transaction execution result from the transaction execution unit 23, and transmits the execution result to the client terminals 501 and 502 (step S18).
- the database 41 includes an entity data group including variable-length data, and an identifier table in which a fixed-length data identifier that uniquely represents the entity data itself is stored.
- a storage area different from the storage area assigned to the identifier table is assigned to the entity data group, and the identifier table and the entity data group are completely separated from each other.
- the transaction execution unit 23 searches for a fixed-length data identifier in the identifier table without searching for an actual data group in response to a query request.
- the transaction execution unit 23 executes a transaction such as an update for the database 41 using the search result.
- the defrag processing unit 31 has a function of eliminating fragmentation of the data group. That is, when a plurality of data identifiers are distributed and stored in discontinuous storage areas in the storage device 40, the defragmentation processing unit 31 reads these data identifiers from the storage device 40, and continuously reads the identifiers for the identifier table. It has a function of writing to the storage area.
- the defragmentation unit 31 continues the data identifiers VR11, VR12, VR13,. Can be written to the storage area.
- FIG. 3 is a schematic diagram illustrating an example of the logical structure of the database 41 according to the first embodiment of the present invention. As shown in FIG. 3, this database structure includes an entity data group stored in the storage area DA0 in the storage device 40 and a reference table stored in a storage area different from the storage area DA0 in the storage device 40. (Identifier table) RT0.
- the reference table RT0 has five tuples defined in the row direction and five attribute fields TID, Val1, Val2, Val3, and Val4 defined in the column direction.
- the number of tuples in the reference table RT0 is five.
- the number is not limited to this, and the number of tuples can be set to several tens to several millions, for example.
- the number of attribute fields is not limited to five.
- Unique tuple identifiers (TID) R1, R2, R3, R4, and R5 are assigned to the five tuples of the reference table RT0, respectively.
- Data identifiers VR11, VR12,..., VR43 each having a fixed length in an area defined by these tuples and attribute fields Val1, Val2, Val3, Val4 (area where the tuples and attribute fields Val1, Val2, Val3, Val4 intersect).
- the attribute field Val1 includes data identifiers VR11, VR12, VR13, VR14, and VR15 in areas corresponding to the tuple identifiers R1, R2, R3, R4, and R5, respectively.
- the attribute field Val2 includes the tuple identifiers R1, R2, and R3.
- R4, R5 include data identifiers VR21, VR22, VR23, VR23, VR24, respectively
- attribute field Val3 includes data identifiers VR31, VR3, R4, R5, R4, R5, respectively.
- VR32, VR33, VR34, and VR35 and attribute field Val4 includes data identifiers VR41, VR41, VR41, VR42, and VR43 in areas corresponding to tuple identifiers R1, R2, R3, R4, and R5, respectively.
- the values of the data identifiers VR11 to VR43 can be calculated using a hash function.
- the hash function is a logical operator that outputs a fixed-length bit string in response to a variable-length bit string input.
- the output value (hash value) of the hash function with respect to the input of the actual data can be the values of the data identifiers VR11 to VR43, respectively.
- the anti-collision property of the hash function (the property that it is difficult to generate the same hash value from two different input values) is sufficient depending on the scale of the database 41. High is desirable.
- the transaction execution unit 23 converts the search character string into a hash value, searches the reference table RT0 for a data identifier having a value that matches the hash value, and stores the entity data corresponding to the found data identifier from the storage area DA0. You can find out. At this time, the transaction execution unit 23 searches the reference table RT0 including only the fixed-length data group not including the variable-length data, so that the character string can be searched at high speed.
- the data identifiers VR11 to VR43 each have a value that uniquely represents the actual data in the storage area DA0. Therefore, the transaction execution unit 23 can search for the data identifiers VR11 to VR43 and access variable-length entity data corresponding to the data identifiers VR11 to VR43 based on the search result.
- “substantially unique” means that the data processing uniqueness with respect to the database 41 is satisfied.
- the database structure shown in FIG. 3 can be generated based on a set (actual table) ST of entity data as exemplified in FIG.
- the real table ST in FIG. 4 is a 5-by-4 table having four attribute fields of “store name”, “region”, “sales”, and “year / month”. By hashing actual data such as “A store”, “B store”, “Kyushu”, etc.
- the database of this embodiment can be generated based on an existing real table. However, the present invention is not limited to this, and the database structure of FIG. 3 of this embodiment can be directly created from input data without generating a real table. It may be generated.
- FIG. 5 (A) to 5 (C) are diagrams showing the logical data structure constituting the database 41.
- FIG. 5A The data structure shown in FIG. 5A has a header area at the beginning and an allocation management table at the end. In addition, an area for storing the entity data group is provided between the header area and the allocation management table.
- FIG. 5B is a schematic diagram illustrating an example of a conversion table included in the header area.
- This conversion table is a table that defines the correspondence between the data identifiers VR11 to VR43 and the storage areas of the entity data D11 to D43 corresponding to the data identifiers VR11 to VR43, respectively.
- an area Fa in which position data A11 to A43 are stored.
- an area Fai for storing position data AI11 to AI43 is provided.
- the position data AI11 to AI43 is data indicating the position of the storage area of the tuple identifier that uniquely represents the tuple in which the data identifiers VR11 to VR43 are stored in the identifier table RT0.
- FIG. 5C shows a metadata area Fidx in which tuple identifiers R2, R100,..., 0 corresponding to the data identifier VR11 are stored.
- the metadata area Fidx is assigned to a storage area different from the storage area DA0.
- the position data AI11 to AI43 is an address that specifies the absolute position of the storage area of the tuple identifier, or an offset that specifies the relative position (effective address) of the storage area with reference to a predetermined address. Good.
- pointers indicating addresses assigned to the storage areas can be used as the position data AI11 to AI43.
- the tuple identifier stored in the metadata area Fidx is used as index information for the identifier table RT0.
- the transaction execution unit 23 can access not only the entity data corresponding to the data identifier by referring to the header area of FIG. 5A but also the tuple identifier associated with the entity data by referring to the metadata area Fidx. All of can be acquired.
- By using the conversion table of FIG. 5B it is possible to collectively specify the tuple identifiers R1 to R3 from the data identifier VR41.
- the update process of the database 41 is highly efficient In addition, it is possible to improve the processing speed of search and tabulation, and improve portability.
- the update process can be performed efficiently.
- the value of the variable-length entity data D41 in the storage area DA0 is updated to another value, all the values in the identifier table RT0 in FIG. 3 and the conversion table in FIG.
- the value of the data identifier VR41 may be updated to a new value.
- a new tuple is added to the identifier table RT0 and only existing data identifiers (for example, data identifiers VR11 to VR14) are stored in the new tuple, the conversion table of FIG. There is no need to update.
- the database 41 is updated to the minimum necessary with the update, addition, or deletion of the entity data. Therefore, even when the database 41 is frequently updated, the update can be executed efficiently and at high speed. Is possible.
- the database system 10 of the present embodiment uses the data structure shown in FIGS. 5A to 5C to store a set of tuple identifiers associated with the data identifier (stored in the metadata area Fidx).
- a set of tuple identifiers stored in the identifier table RT0 can be accessed at a high speed. Therefore, it is possible to improve processing speed such as search, aggregation, and tuple update.
- the storage area DA0 of the entity data D11 to D43 and the storage area of the data identifiers VR11 to VR43 are completely separated from each other, it is easy to distribute the reference table RT0 and the entity data group.
- the reference table RT0 and the entity data group can be distributed and arranged in two computer systems connected via a computer network such as a LAN.
- the storage area DA0 for the entity data D11 to D43 and the storage area for the data identifiers VR11 to VR43 are completely separated from each other, the dependency of the database structure on the hardware configuration is low, and the database structure is transferred to another system. Can be easily implanted.
- FIG. 6A is a schematic diagram of the storage area DA0
- FIG. 6B schematically shows a first modification of the conversion table in the storage area DA0 in the database 41 according to the first embodiment.
- FIG. 6A is a schematic diagram of the storage area DA0
- FIG. 6B schematically shows a first modification of the conversion table in the storage area DA0 in the database 41 according to the first embodiment.
- an area Fid in which data identifiers VR11 to VR43 are stored and an area Fa in which position data A11 to A43 are stored are provided. It has been.
- the position data A11 to A43 are data indicating storage areas of the entity data D11 to D43 corresponding to the data identifiers VR11 to VR43, respectively.
- a metadata area Fidx in which a set of tuple identifiers corresponding to the data identifiers VR11 to VR43 is stored, and each area Fidx And a flag area Fn in which the number of data (flag information) stored is stored.
- each metadata area Fidx has a fixed length, and each metadata area Fidx can store a maximum of 256 tuple identifiers.
- the number of tuple identifiers associated with a certain data identifier exceeds the maximum number (256), in other words, the total data amount of the tuple identifier associated with the data identifier is the fixed length of the metadata area Fidx.
- the metadata area Fidx stores position data indicating the position of a storage area (overflow area) in which 500 or more tuple identifiers are stored. For example, in the conversion table of FIG.
- the metadata area Fidx corresponding to the data identifier VR14 stores position data AI14 indicating the position of the overflow area in which the tuple identifier is stored.
- position data AI14 an address that specifies the absolute position of the storage area of the tuple identifier, an offset that specifies the relative position (effective address) of the storage area, or a pointer that points to an address assigned to the storage area Can be used.
- the total length of the area Fid, the area Fa, the flag area Fn, and the metadata area Fidx can be set regardless of the number of tuple identifiers associated with each data identifier. Since it can be a fixed length, a predetermined number or less of tuple identifiers can be referred to at high speed. As a result, the access speed to the tuple set storing the common data identifier in the identifier table RT0 in FIG. 3 can be improved.
- the number of tuple identifiers associated with a certain data identifier exceeds the maximum number (256)
- all the tuple identifiers are stored in the overflow area, but this is not limitative. It is not a thing. In such a case, a part of the tuple identifier may be stored in the overflow area, and the remaining tuple identifier may be stored in the metadata area Fidx.
- the flag region Fn has the maximum number of tuple identifiers instead of the multi-bit value representing the number.
- a 1-bit value indicating that it has been exceeded may be stored as flag information.
- FIG. 7A is a schematic diagram of the storage area DA0
- FIG. 7B schematically shows a second modification of the conversion table in the storage area DA0 in the database 41 according to the first embodiment.
- FIG. 7A is a schematic diagram of the storage area DA0
- FIG. 7B schematically shows a second modification of the conversion table in the storage area DA0 in the database 41 according to the first embodiment.
- An area Fa for storing position data A11 to A43 indicating the position of the storage area of D43 is provided.
- the 7B further includes a fixed-length metadata area Fidx for storing a set of tuple identifiers corresponding to the data identifiers VR11 to VR43, and a flag area Fp.
- the flag area Fp stores a value (flag information) indicating whether or not the number of data stored in each metadata area Fidx exceeds the fixed length size of the metadata area Fidx.
- a maximum of 256 tuple identifiers can be stored in each metadata area Fidx.
- a NULL value is stored in the flag area Fp.
- m tuple identifiers less than the maximum number (256) are stored in the metadata area Fidx
- the m tuple identifiers and (256-m) special identifiers are stored in the metadata area Fidx.
- a value (eg, a zero value) is stored.
- the flag area Fp stores position data indicating the position of the storage area (overflow area) in which the tuple identifier is stored.
- the position data AI14 indicating the position of the overflow region in which the tuple identifier is stored is stored in the flag region Fp corresponding to the data identifier VR14.
- 256 zero values are stored in the metadata area Fidx corresponding to the data identifier VR14.
- the position data AI14 an address that specifies the absolute position of the storage area of the tuple identifier, an offset that specifies the relative position (effective address) of the storage area, or a pointer that points to an address assigned to the storage area Can be used.
- the total length of the area Fid, the area Fa, the flag area Fp, and the metadata area Fidx can be set regardless of the number of tuple identifiers associated with each data identifier. Since it can be a fixed length, a predetermined number or less of tuple identifiers can be referred to at high speed. As a result, the access speed to the tuple set storing the common data identifier in the identifier table RT0 in FIG. 3 can be improved.
- FIG. 8A is a schematic diagram of the storage area DA0
- FIG. 8B schematically shows a third modification of the conversion table in the storage area DA0 in the database 41 according to the first embodiment.
- FIG. 8A is a schematic diagram of the storage area DA0
- FIG. 8B schematically shows a third modification of the conversion table in the storage area DA0 in the database 41 according to the first embodiment.
- An area Fa for storing position data A11 to A43 indicating the position of the storage area of D43 is provided.
- a fixed-length metadata area Fidx in which a set of tuple identifiers corresponding to the data identifiers VR11 to VR43 is stored is provided.
- a maximum of 256 tuple identifiers can be stored in each metadata area Fidx.
- the number of tuple identifiers associated with the data identifier exceeds the maximum number (256), in other words, the total data amount of the tuple identifier associated with the data identifier is equal to the fixed length size of the metadata area Fidx.
- a special value for example, a value of “ ⁇ 1”
- the value “ ⁇ 1” and the position data AI14 indicating the position of the overflow area are stored in the metadata area Fidx corresponding to the data identifier VR14.
- the position data AI14 an address that specifies the absolute position of the storage area of the tuple identifier, an offset that specifies the relative position (effective address) of the storage area, or a pointer that points to an address assigned to the storage area can be used.
- the entire length of the area Fid, the area Fa, and the metadata area Fidx is fixed regardless of the number of tuple identifiers associated with each data identifier. Therefore, less than a predetermined number of tuple identifiers can be referred to at high speed. As a result, the access speed to the tuple set storing the common data identifier in the identifier table RT0 in FIG. 3 can be improved.
- the data structure of the third modified example does not have the flag area Fp, so that a simple structure can be realized.
- the tuple identifier may be stored in an overflow area provided in advance so as to correspond to the data identifier.
- FIG. 9A and FIG. 9B are diagrams schematically showing a logical data structure of a fourth modified example of the first embodiment.
- the storage area DA0 is not provided with the conversion table as in the first to third modifications.
- data blocks DB11 to DB43 corresponding to the data identifiers VR11 to VR43 of the reference table RT0 are stored in the storage area DA0.
- Each data block includes a data identifier and a corresponding entity data value, and a set of tuple identifiers as index information.
- FIG. 9B is a diagram schematically showing the data structure of the data block DB 22.
- the data block DB 22 is associated with the area Fid in which the data identifier VR22 is stored, the variable length area Fd in which the actual data D22 is stored, and the data identifier VR22.
- a variable-length area (metadata area) Fxid in which the tuple identifier group I22 is stored.
- the area Fds is an area in which the length DL22 of the actual data D22 is stored
- the area Fixs is an area in which the length IL22 of the metadata area Fxid is stored.
- the transaction execution unit 23 refers to the tuple identifier associated with the entity data (data identifier) by referring to the inside of the metadata area Fxid at the end of each of the data blocks DB11 to DB43. be able to. Therefore, the database system 10 of the present embodiment can collectively acquire a set of tuple identifiers associated with the data identifier, so that the tuple set storing the common data identifier in the identifier table RT0 can be accessed at high speed. can do. Therefore, it is possible to improve processing speed such as search, aggregation, and tuple update.
- FIG. 10 is a schematic diagram showing a database 41 according to the second embodiment of the present invention.
- this database structure includes an entity data group stored in the storage area DA3 of the storage device 40, a reference table RT1 stored in a storage area different from the storage area DA3, and the first to first items. 3 intermediate identifier tables IT41, IT42, IT43.
- FIG. 11A shows a schematic configuration of the reference table RT1.
- the reference table RT1 has a plurality of tuples defined in the row direction, and four attribute fields TID, Col1Ref, Col2Ref, and Col3Ref defined in the column direction.
- the number of tuples in the reference table RT1 can be set to several tens to several millions.
- the number of attribute fields is not limited to four.
- Unique tuple identifiers R1, R2, R3, R4,... are assigned to the tuples of the reference table RT1, respectively.
- the values of the reference identifiers CRV11 to CRV31 can be calculated using the same hash function as that of the data identifier of the first embodiment. That is, the output values of the hash functions for the inputs of the data identifiers VR11 to VR31 may be the values of the reference identifiers CRV11 to CRV31, respectively.
- FIGS. 11B to 11D are diagrams schematically showing the structures of the first to third intermediate identifier tables IT41, IT42 and IT43.
- the first intermediate identifier table IT41 has a plurality of tuples defined in the row direction and four attribute fields Col1, Val, num, and Index defined in the column direction.
- the attribute field Col1 includes fixed-length reference identifiers CRV11, CRV12,...,
- the attribute field Val includes fixed-length data identifiers VR11, VR12,.
- the attribute field Index includes a tuple identifier as index information in an area corresponding to each tuple, and the attribute field num includes the number of tuple identifiers included in the attribute field Index.
- the second intermediate identifier table IT42 has a plurality of tuples defined in the row direction and four attribute fields Col2, Val, num, and Index defined in the column direction.
- the attribute field Col2 includes fixed-length reference identifiers CRV21, CRV22,..., And the attribute field Val includes fixed-length data identifiers VR21, VR22,.
- the attribute field Index includes a tuple identifier as index information in an area corresponding to each tuple, and the attribute field num includes the number of tuple identifiers included in the attribute field Index.
- the third intermediate identifier table IT43 has a plurality of tuples defined in the row direction and four attribute fields Col3, Val, num, and Index defined in the column direction.
- the attribute field Col3 includes fixed-length reference identifiers CRV31, CRV32,..., And the attribute field Val includes fixed-length data identifiers VR31, VR32,.
- the attribute field Index includes a tuple identifier as index information in an area corresponding to each tuple, and the attribute field num includes the number of tuple identifiers included in the attribute field Index.
- Each of the first to third intermediate identifier tables IT41, IT42, IT43 has a data structure from which redundancy is eliminated because reference identifiers having mutually duplicated values are excluded. Also, in the first to third intermediate identifier tables IT41 to IT43, attribute fields num and num, which correspond to the conversion table area Fn and the metadata area Fidx in FIG. 6B of the second embodiment, respectively. Index.
- the transaction execution unit 23 searches the reference identifiers CRV11 to CRV33 and the data identifiers VR11 to VR33, and can access variable-length entity data using the search results. Since the storage area DA3 has a conversion table similar to the conversion table of the first embodiment or its modification, the transaction execution unit 23 can access the entity data based on the search result. .
- a set of tuple identifiers associated with each of the reference identifiers CRV11 to CRV33 is integrated. Therefore, it is possible to access the tuple set storing common reference identifiers in the reference table RT1 at high speed. Therefore, it is possible to improve processing speed such as search, aggregation, and tuple update.
- FIG. 12A to FIG. 12D are diagrams schematically showing logical data structures of modifications of the second embodiment.
- FIG. 12A is a diagram showing a schematic configuration of the same reference table RT1 as in FIG.
- FIGS. 12B to 12D are diagrams schematically showing the data structures of the intermediate identifier tables IT41, IT42, IT43 of this modification.
- FIGS. 13A to 13D are diagrams schematically showing a logical data structure of another modification of the second embodiment.
- FIG. 13A is a diagram showing a schematic configuration of the same reference table RT1 as in FIG.
- FIGS. 13B to 13D are diagrams schematically showing the data structures of the intermediate identifier tables IT41, IT42, IT43 of the other modified examples.
- the first to third intermediate identifier tables IT41 to IT43 of this modification each corresponds to the metadata area Fidx of the conversion table (FIG. 8B) according to the third modification of the first embodiment. It has an attribute field Index.
- a set of tuple identifiers associated with each of the reference identifiers CRV11 to CRV33 can be acquired in a lump, so that the tuple set storing common reference identifiers in the reference table RT1 can be operated at high speed. Can be accessed. Therefore, it is possible to improve processing speed such as search, aggregation, and tuple update.
- the embodiments of the present invention have been described with reference to the drawings. However, these are exemplifications of the present invention, and various configurations other than the above can be adopted.
- the embodiment described above executes a process suitable for executing a transaction on the database 41, but is not limited to this.
- the transaction is processing that satisfies the ACID characteristics, but the database structure according to the present invention can be applied to data processing that does not satisfy any of these ACID characteristics.
- the query receiving unit 21 receives a query described in a query language
- the analyzing unit 22 analyzes the query
- the query may not be described in a query language (database language), but may simply include a value for calling an API (Application Programming Interface) function for the database.
- API Application Programming Interface
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
図1は、本発明に係る一実施形態のデータベースシステム10の概略構成を示す機能ブロック図である。このデータベースシステム10は、トランザクション処理部20、チェックポイント処理部30、デフラグ処理部31、トランザクションサーバ32および記憶装置40を有する。記憶装置40には、データベース41とログファイル42とが格納されている。トランザクション処理部20は、クエリ受信部21、解析部22、トランザクション実行部23および応答処理部24を含む。
図3は、本発明の第1の実施形態に係るデータベース41の論理構造の一例を示す概略図である。図3に示されるように、このデータベース構造は、記憶装置40内の記憶領域DA0に格納されている実体データ群と、記憶装置40において記憶領域DA0とは異なる記憶領域に格納されている参照テーブル(識別子テーブル)RT0とを有する。
図6(A)は、記憶領域DA0の模式図であり、図6(B)は、第1の実施形態に係るデータベース41のうち記憶領域DA0内の変換テーブルの第1変形例を概略的に示す図である。
図7(A)は、記憶領域DA0の模式図であり、図7(B)は、第1の実施形態に係るデータベース41のうち記憶領域DA0内の変換テーブルの第2変形例を概略的に示す図である。
図8(A)は、記憶領域DA0の模式図であり、図8(B)は、第1の実施形態に係るデータベース41のうち記憶領域DA0内の変換テーブルの第3変形例を概略的に示す図である。
図9(A)および図9(B)は、第1の実施形態の第4変形例の論理データ構造を概略的に示す図である。第4変形例では、記憶領域DA0には、上記第1~第3変形例のような変換テーブルは設けられていない。図9(A)に示されるように、本変形例では、記憶領域DA0には、参照テーブルRT0のデータ識別子VR11~VR43にそれぞれ対応するデータブロックDB11~DB43が格納されている。各データブロックは、データ識別子とこれに対応する実体データ値とを含むとともに、インデックス情報としてのタプル識別子の集合を含む。
図10は、本発明の第2の実施形態に係るデータベース41を示す概略図である。図10に示されるように、このデータベース構造は、記憶装置40の記憶領域DA3に格納された実体データ群と、記憶領域DA3とは別の記憶領域に格納された参照テーブルRT1および第1~第3の中間識別子テーブルIT41,IT42,IT43とを有する。
図12(A)~図12(D)は、第2の実施形態の変形例の論理データ構造を概略的に示す図である。図12(A)は、図11(A)と同じ参照テーブルRT1の概略構成を示す図である。また、図12(B)~図12(D)は、この変形例の中間識別子テーブルIT41,IT42,IT43のデータ構造を概略的に示す図である。本変形例の第1~第3の中間識別子テーブルIT41~IT43においては、それぞれ、上記第1の実施形態の第2変形例に係る変換テーブル(図7(B))の領域Fpとメタデータ領域Fidxとに相当する属性フィールドOver,Indexを有している。
Claims (29)
- 複数の実体データを含むデータベースを有するデータ記憶部と、
クエリを受信し、当該受信されたクエリに基づいたデータ処理を前記データベースに対して実行するデータ処理部と、
を備え、
前記データベースは、
行方向に定義された少なくとも1つのタプルと列方向に定義された少なくとも1つの属性フィールドとで定まる領域に前記実体データそのものを一意に表す固定長のデータ識別子が格納されている識別子テーブルと、
前記複数の実体データそれぞれの記憶領域の位置を示す位置データと前記複数のデータ識別子との間の対応関係を表す変換テーブルと、
を含み、
前記変換テーブルは、前記識別子テーブルにおいて前記各データ識別子が格納されている当該タプルを一意に表すタプル識別子が格納されるメタデータ領域を有する、データベースシステム。 - 請求項1記載のデータベースシステムであって、前記識別子テーブルに割り当てられた記憶領域と前記実体データに割り当てられた記憶領域とが互いに異なる、データベースシステム。
- 請求項1または2記載のデータベースシステムであって、前記データ識別子の値は、前記実体データの入力に対して固定長のビット列を出力するハッシュ関数の出力値である、データベースシステム。
- 請求項1から3のうちのいずれか1項に記載のデータベースシステムであって、前記メタデータ領域のサイズは固定長である、データベースシステム。
- 請求項4に記載のデータベースシステムであって、前記変換テーブルは、前記各データ識別子に対して、前記タプル識別子のデータ量が前記メタデータ領域の固定長サイズを超えたか否かを示すフラグ情報が格納されるフラグ領域を有する、データベースシステム。
- 請求項4または5に記載のデータベースシステムであって、
前記データベースは、前記タプル識別子のデータ量が前記固定長サイズを超えるときに当該タプル識別子の全部または一部が格納されるオーバフロー領域を更に有し、
前記データ処理部は、前記タプル識別子のデータ量が前記固定長サイズを超えるときに、当該タプル識別子の全部または一部を前記オーバフロー領域に格納する、データベースシステム。 - 請求項6に記載のデータベースシステムであって、前記データ処理部は、前記オーバフロー領域に前記タプル識別子を格納するとき、当該タプル識別子の記憶領域の位置を示す位置データを前記メタデータ領域に格納する、データベースシステム。
- 請求項7に記載のデータベースシステムであって、前記タプル識別子の記憶領域の位置を示す位置データは、前記タプル識別子の記憶領域の絶対的位置を指定するアドレスである、データベースシステム。
- 請求項7に記載のデータベースシステムであって、前記タプル識別子の記録領域の位置を示す位置データは、前記タプル識別子の記憶領域の相対的位置を指定するオフセットである、データベースシステム。
- 請求項1から9のうちのいずれか1項に記載のデータベースシステムであって、前記複数の実体データは可変長データを含む、データベースシステム。
- 請求項1から10のうちのいずれか1項に記載のデータベースシステムであって、
前記識別子テーブルは複数存在し、
前記データベースは、前記複数の識別子テーブル内のデータ識別子をそれぞれ一意に表す参照識別子の集合を有する参照テーブルを更に含み、
前記データ処理部は、前記参照テーブルおよび前記識別子テーブルを用いて前記データ処理を実行する、データベースシステム。 - 請求項1から11のうちのいずれか1項に記載のデータベースシステムであって、
前記クエリは、問い合わせ言語で記述されており、
前記データ処理部は、前記クエリを解析し、その解析結果に基づいたトランザクションを前記データ処理として前記データベースに対して実行する、データベースシステム。 - 複数の実体データを含むデータベース構造であって、
行方向に定義された少なくとも1つのタプルと列方向に定義された少なくとも1つの属性フィールドとで定まる領域に前記実体データそのものを一意に表す固定長のデータ識別子が格納されている識別子テーブルと、
前記複数の実体データそれぞれの記憶領域の位置を示す位置データと前記複数のデータ識別子との間の対応関係を表す変換テーブルと、
を含み、
前記変換テーブルは、前記識別子テーブルにおいて前記各データ識別子が格納されている当該タプルを一意に表すタプル識別子が格納されるメタデータ領域を有する、データベース構造。 - 請求項13記載のデータベース構造であって、前記識別子テーブルに割り当てられた記憶領域と前記実体データに割り当てられた記憶領域とが互いに異なる、データベース構造。
- 請求項13または14に記載のデータベース構造であって、前記メタデータ領域のサイズは固定長である、データベース構造。
- 請求項15に記載のデータベース構造であって、前記変換テーブルは、前記各データ識別子に対して、前記タプル識別子のデータ量が前記メタデータ領域の固定長サイズを超えたか否かを示すフラグ情報が格納されるフラグ領域を有する、データベース構造。
- 請求項15または16に記載のデータベース構造であって、前記タプル識別子のデータ量が前記固定長サイズを超えるときに当該タプル識別子の全部または一部が格納されるオーバフロー領域を更に有するデータベース構造。
- (a)複数の実体データを含むデータベースについてクエリを受信するステップと、
(b)前記データベースに対して、当該受信されたクエリに基づいたデータ処理を実行するステップと、
を備え、
前記データベースは、
行方向に定義された少なくとも1つのタプルと列方向に定義された少なくとも1つの属性フィールドとで定まる領域に前記実体データそのものを一意に表す固定長のデータ識別子が格納されている識別子テーブルと、
前記複数の実体データそれぞれの記憶領域の位置を示す位置データと前記複数のデータ識別子との間の対応関係を表す変換テーブルと、
を含み、
前記変換テーブルは、前記識別子テーブルにおいて前記各データ識別子が格納されている当該タプルを一意に表すタプル識別子が格納されるメタデータ領域を有する、データベース管理方法。 - 請求項18記載のデータベース管理方法であって、前記識別子テーブルに割り当てられた記憶領域と前記実体データに割り当てられた記憶領域とが互いに異なる、データベース管理方法。
- 請求項18または19に記載のデータベース管理方法であって、前記メタデータ領域のサイズは固定長である、データベース管理方法。
- 請求項20に記載のデータベース管理方法であって、前記変換テーブルは、前記各データ識別子に対して、前記タプル識別子のデータ量が前記メタデータ領域の固定長サイズを超えたか否かを示すフラグ情報が格納されるフラグ領域を有する、データベース管理方法。
- 請求項20または21に記載のデータベース管理方法であって、
前記データベースは、前記タプル識別子のデータ量が前記固定長サイズを超えるときに当該タプル識別子の全部または一部が格納されるオーバフロー領域を有し、
前記ステップ(b)は、前記タプル識別子のデータ量が前記固定長サイズを超えるときに、当該タプル識別子の全部または一部を前記オーバフロー領域に格納するステップを含む、データベース管理方法。 - 請求項22に記載のデータベース管理方法であって、前記ステップ(b)は、前記オーバフロー領域に前記タプル識別子が格納されるとき、当該タプル識別子の記憶領域の位置を示す位置データを前記メタデータ領域に格納するステップを含む、データベース管理方法。
- コンピュータによって読み取り可能な記録媒体であって、
複数の実体データを含むデータベースについてクエリを受信する処理と、
前記データベースに対して当該受信されたクエリに基づいて実行されるデータ処理と、
を含むデータベース管理処理を前記コンピュータに実行させるコンピュータプログラムを格納し、
前記データベースは、
行方向に定義された少なくとも1つのタプルと列方向に定義された少なくとも1つの属性フィールドとで定まる領域に前記実体データそのものを一意に表す固定長のデータ識別子が格納されている識別子テーブルと、
前記複数の実体データそれぞれの記憶領域の位置を示す位置データと前記複数のデータ識別子との間の対応関係を表す変換テーブルと、
を含み、
前記変換テーブルは、前記識別子テーブルにおいて前記各データ識別子が格納されている当該タプルを一意に表すタプル識別子が格納されるメタデータ領域を有することを特徴とする記憶媒体。 - 請求項24記載の記憶媒体であって、前記識別子テーブルに割り当てられた記憶領域と前記実体データに割り当てられた記憶領域とが互いに異なることを特徴とする記憶媒体。
- 請求項24または25に記載の記憶媒体であって、前記メタデータ領域のサイズは固定長であることを特徴とする記憶媒体。
- 請求項26に記載の記憶媒体であって、前記変換テーブルは、前記各データ識別子に対して、前記タプル識別子のデータ量が前記メタデータ領域の固定長サイズを超えたか否かを示すフラグ情報が格納されるフラグ領域を有することを特徴とする記憶媒体。
- 請求項26または27に記載の記憶媒体であって、
前記データベースは、前記タプル識別子のデータ量が前記固定長サイズを超えるときに当該タプル識別子の全部または一部が格納されるオーバフロー領域を有し、
前記データ処理は、前記タプル識別子のデータ量が前記固定長サイズを超えるときに、当該タプル識別子の全部または一部を前記オーバフロー領域に格納する処理を含むことを特徴とする記憶媒体。 - 請求項28に記載の記憶媒体であって、前記データ処理は、前記オーバフロー領域に前記タプル識別子が格納されるとき、当該タプル識別子の記憶領域の位置を示す位置データを前記メタデータ領域に格納する処理を含むことを特徴とする記憶媒体。
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/144,828 US20110289112A1 (en) | 2009-01-26 | 2010-01-21 | Database system, database management method, database structure, and storage medium |
JP2010547440A JPWO2010084754A1 (ja) | 2009-01-26 | 2010-01-21 | データベースシステム、データベース管理方法、及びデータベース構造 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2009-014142 | 2009-01-26 | ||
JP2009014142 | 2009-01-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2010084754A1 true WO2010084754A1 (ja) | 2010-07-29 |
Family
ID=42355810
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2010/000326 WO2010084754A1 (ja) | 2009-01-26 | 2010-01-21 | データベースシステム、データベース管理方法、データベース構造および記憶媒体 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20110289112A1 (ja) |
JP (1) | JPWO2010084754A1 (ja) |
WO (1) | WO2010084754A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013025793A (ja) * | 2011-07-14 | 2013-02-04 | Lsi Corp | フラッシュメディアコントローラの内部のメタデータハンドリング |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5392254B2 (ja) * | 2008-05-30 | 2014-01-22 | 日本電気株式会社 | データベースシステム、データベース管理方法、データベース構造およびコンピュータプログラム |
US20120011143A1 (en) * | 2010-07-09 | 2012-01-12 | David Nash | Publish and Subscribe Data Delivery System and Method Using Keys |
US9304898B2 (en) * | 2011-08-30 | 2016-04-05 | Empire Technology Development Llc | Hardware-based array compression |
EP2843567B1 (en) * | 2013-08-30 | 2017-05-10 | Pilab S.A. | Computer-implemented method for improving query execution in relational databases normalized at level 4 and above |
US9465840B2 (en) * | 2014-03-14 | 2016-10-11 | International Business Machines Corporation | Dynamically indentifying and preventing skewed partitions in a shared-nothing database |
US10140348B2 (en) * | 2015-04-25 | 2018-11-27 | Phillip Hiroshige | System for automatically tracking data through a plurality of data sources and plurality of conversions |
WO2021210056A1 (ja) | 2020-04-14 | 2021-10-21 | 日本電信電話株式会社 | トラフィックモニタリング装置、方法及びプログラム |
CN117076474B (zh) * | 2023-10-16 | 2024-03-12 | 之江实验室 | 离线多模态文献数据的更新方法、装置、设备和介质 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004326772A (ja) * | 2003-04-22 | 2004-11-18 | Hewlett-Packard Development Co Lp | ハッシュテーブルを使用するメモリ管理システム |
JP2008533570A (ja) * | 2005-03-11 | 2008-08-21 | ロックソフト リミテッド | 低冗長記憶システムで索引を行う方法 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5727197A (en) * | 1995-11-01 | 1998-03-10 | Filetek, Inc. | Method and apparatus for segmenting a database |
US8122040B2 (en) * | 2007-08-29 | 2012-02-21 | Richard Banister | Method of integrating remote databases by automated client scoping of update requests over a communications network |
-
2010
- 2010-01-21 JP JP2010547440A patent/JPWO2010084754A1/ja active Pending
- 2010-01-21 WO PCT/JP2010/000326 patent/WO2010084754A1/ja active Application Filing
- 2010-01-21 US US13/144,828 patent/US20110289112A1/en not_active Abandoned
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004326772A (ja) * | 2003-04-22 | 2004-11-18 | Hewlett-Packard Development Co Lp | ハッシュテーブルを使用するメモリ管理システム |
JP2008533570A (ja) * | 2005-03-11 | 2008-08-21 | ロックソフト リミテッド | 低冗長記憶システムで索引を行う方法 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013025793A (ja) * | 2011-07-14 | 2013-02-04 | Lsi Corp | フラッシュメディアコントローラの内部のメタデータハンドリング |
Also Published As
Publication number | Publication date |
---|---|
JPWO2010084754A1 (ja) | 2012-07-19 |
US20110289112A1 (en) | 2011-11-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2010084754A1 (ja) | データベースシステム、データベース管理方法、データベース構造および記憶媒体 | |
JP5922716B2 (ja) | 個別にアクセス可能なデータユニットの記憶の取り扱い方法 | |
US8620880B2 (en) | Database system, method of managing database, and computer-readable storage medium | |
EP2729884B1 (en) | Managing storage of data for range-based searching | |
US9916313B2 (en) | Mapping of extensible datasets to relational database schemas | |
US20160253382A1 (en) | System and method for improving a query response rate by managing a column-based store in a row-based database | |
US9104711B2 (en) | Database system, method of managing database, and computer-readable storage medium | |
JP7274293B2 (ja) | 情報処理装置、情報処理方法及びプログラム | |
AU2015258326B2 (en) | Managing storage of individually accessible data units | |
Dean | xqerl_db: Database Layer in xqerl | |
Pollack et al. | Index Storage Fundamentals |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 10733362 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 13144828 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2010547440 Country of ref document: JP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 10733362 Country of ref document: EP Kind code of ref document: A1 |