WO2022091204A1

WO2022091204A1 - Data analysis processing device, data analysis processing method, and program

Info

Publication number: WO2022091204A1
Application number: PCT/JP2020/040213
Authority: WO
Inventors: 哲八木
Original assignee: 日本電信電話株式会社
Priority date: 2020-10-27
Filing date: 2020-10-27
Publication date: 2022-05-05
Also published as: JP7464142B2; JPWO2022091204A1

Abstract

A data analysis processing device according to one aspect of the present invention comprises: a multidimensional database; an OLAP operation execution unit; and a multidimensional database management unit. The multidimensional database stores data embodying a real-world event in a multidimensional cube constructed for each subject in association with the identifier of the event. The OLAP operation execution unit executes an Online Analytical Processing (OLAP) operation on a multidimensional cube in response to a request from a client. The multidimensional database management unit manages, in the multidimensional cube, time-dimensional data, space-dimensional data, multiple types of unique dimensional data, and data representing multiple types of characteristics. When each of the data constituting the multidimensional cube is multidimensional data, the multidimensional database management unit classifies the multidimensional data in a multidimensional value range common among the multidimensional cubes.

Description

Data analysis processing device, data analysis processing method, and program

One aspect of the present invention relates to a data analysis processing apparatus, a data analysis processing method, and a program.

Real-world events change temporally, spatially, or both. In other words, an event comes into existence, ceases to exist, or undergoes a state transition. The data that embodies an event can be mapped to what data analysis technology calls a multidimensional cube. A data analysis processing device analyzes the data by executing online analytical processing (OLAP) operations on the multidimensional cube, using, for example, the method disclosed in Non-Patent Document 1.

When the data analysis processing device executes an OLAP operation on a certain multidimensional cube, it uses the arguments specified by the client as the arguments of the OLAP operation. The data analysis processing device can also use a relational database to execute OLAP operations. Consequently, when an OLAP operation on a certain multidimensional cube is to take the data constituting another multidimensional cube as a new argument, that is, when the data constituting the certain multidimensional cube is searched / operated using the data constituting the other multidimensional cube as a key, the speed-up means of the relational database can be used. For example, a speed-up means such as that disclosed in Non-Patent Document 2 can be used.

Of the data of each dimension / the data representing each characteristic that constitutes the multidimensional cube, up to two items are classified by value ranges based on one of a list of one-dimensional value ranges, a list of names, or a hash function common among the multidimensional cubes, and each item is stored and managed in the storage area corresponding to the single value range to which it belongs.
With the value ranges used for this classification serving as an index, a single search / operation is limited to the storage areas corresponding to the same value range of both multidimensional cubes, and when a plurality of searches / operations are executed at the same time, conflicts over the storage areas to be searched / operated are also avoided.

In the conventional data analysis processing device, even though the speed-up means of the relational database can be used, they can be used only within a limited scope. That is, the method applicable when each of the data of each dimension / the data representing each characteristic constituting the multidimensional cube is one-dimensional data cannot be applied when that data is multidimensional data. Furthermore, when data classified by value range belongs to a plurality of ranges, conflicts over the storage areas to be searched / operated cannot be avoided, so no speed-up is obtained.
Specifically, when the conventional data analysis processing device executes an OLAP operation on a certain multidimensional cube and is to take the data constituting another multidimensional cube as a new argument of the OLAP operation, that is, when it searches / operates the data constituting the certain multidimensional cube using the data constituting the other multidimensional cube as a key, the speed-up means of the relational database can be used. However, the scope in which a speed-up could be obtained was limited.
For example, in the conventional data analysis processing device, when each of the data of each dimension / the data representing each characteristic constituting the multidimensional cube is one-dimensional data, up to two items of that data can be classified by value ranges based on one of a list of one-dimensional value ranges, a list of names, or a hash function common among the multidimensional cubes. When data so classified belongs to a single value range, it is stored and managed in the storage area corresponding to that one value range. A single search / operation can then be sped up by limiting its scope to the storage areas corresponding to the same value range of both multidimensional cubes, and a plurality of simultaneous searches / operations can be sped up further by avoiding conflicts over the storage areas to be searched / operated.
However, when each of the data of each dimension / the data representing each characteristic constituting the multidimensional cube is multidimensional data, the data could not be classified by multidimensional value ranges common among the multidimensional cubes, and when data classified by value range belongs to a plurality of value ranges, it could not be stored and managed in duplicate in the storage area corresponding to each of those value ranges. Therefore, in either of these cases, it was not possible to speed up a single search / operation by limiting its scope, or to speed up a plurality of simultaneous searches / operations by avoiding conflicts over the storage areas to be searched / operated.

The present invention has been made by paying attention to the above circumstances, and is intended to provide a technique capable of executing OLAP operations on a multidimensional cube at high speed.

The data analysis processing apparatus according to one aspect of the present invention includes a multidimensional database, an OLAP operation execution unit, and a multidimensional database management unit. The multidimensional database stores data embodying a real-world event in a multidimensional cube constructed for each subject in association with the identifier of the event. The OLAP operation execution unit executes an OLAP (Online Analytical Processing) operation on a multidimensional cube in response to a request from a client.
Further, when executing an OLAP operation on a certain multidimensional cube, the OLAP operation execution unit uses, as the arguments of the OLAP operation, at least one of the arguments specified by the client and the data constituting another multidimensional cube.
The multidimensional database management unit manages time-dimensional data, spatial-dimensional data, multiple types of unique-dimensional data, and data representing multiple types of characteristics in a multidimensional cube. If each of the data constituting the multidimensional cube is multidimensional data, the multidimensional database management unit classifies the multidimensional data in a multidimensional value range common among the multidimensional cubes.
More specifically, when each of the data of each dimension / the data representing each characteristic constituting the multidimensional cube is multidimensional data, the multidimensional database management unit classifies it by multidimensional value ranges common among the multidimensional cubes. When data so classified belongs to a single range, the multidimensional database management unit stores and manages the data in the storage area corresponding to that range. When it belongs to a plurality of ranges, the multidimensional database management unit stores and manages the data entity or a reference to the data in the storage area corresponding to each range.
In addition, when searching / operating the data constituting a multidimensional cube using the data constituting another multidimensional cube as a key, the multidimensional database management unit uses the ranges used for classification as an index. When executing a single search / operation, it limits the range to be searched / operated to the storage areas corresponding to the same range of both multidimensional cubes and the storage areas corresponding to the ranges in the vicinity of that range. When a plurality of searches / operations are executed in parallel, it also avoids conflicts over the storage areas to be searched / operated.

According to one aspect of the present invention, it is possible to provide a technique capable of executing OLAP operations on a multidimensional cube at high speed.

FIG. 1 is a functional block diagram showing an example of a data analysis processing apparatus according to the present invention.
FIG. 2 is a diagram for explaining a data storage state in the multidimensional database 16.
FIG. 3 is a diagram showing an example of a range of a wide range including the widest data or the main data.
FIG. 4 is a diagram showing an example of a storage area corresponding to a hierarchy of a range in which a higher range includes a lower adjacent range.
FIG. 5 is a sequence diagram for explaining an example of the operation of the data analysis processing device 10.
FIG. 6 is a flowchart showing an example of the processing procedure of the multidimensional database management unit 15.
FIG. 7 is a diagram for explaining an example of processing for limiting the search / operation range in the storage area by the multidimensional database management unit 15.
FIG. 8 is a diagram for explaining another example of the process of limiting the search / operation range in the storage area by the multidimensional database management unit 15.
FIG. 9 is a diagram for explaining an example of an operation of avoiding a conflict in a storage area searched / operated by the multidimensional database management unit 15.
FIG. 10 is a diagram for explaining another example of the operation of avoiding the conflict of the storage area searched / operated by the multidimensional database management unit 15.
FIG. 11 is a diagram for explaining an example of a process in which the multidimensional database management unit 15 selects a hierarchy of a range.
FIG. 12 is a schematic diagram for explaining an example of an operation of suppressing redundant processing when a range corresponding to a plurality of storage areas is selected.
FIG. 13 is a diagram showing an example of tabular data representing the situation shown in FIG.
FIG. 14 is a block diagram showing an example of the hardware configuration of the data analysis processing apparatus according to the present invention.

Hereinafter, embodiments relating to the present invention will be described with reference to the drawings.

(Configuration)
FIG. 1 is a functional block diagram showing an example of a data analysis processing apparatus according to the present invention. The data analysis processing device 10 includes an OLAP operation execution unit 11, a multidimensional database management unit 15, and a multidimensional database 16.

The multidimensional database 16 stores data embodying an event in the real world in a multidimensional cube in association with an event identifier that identifies the event serving as the information source of the data. Multidimensional cubes are constructed by subject. The accumulated data includes time-dimensional data, spatial-dimensional data, a plurality of types of unique-dimensional data, and data representing a plurality of types of characteristics. The unique-dimensional data comes in multiple, subject-dependent types. The data representing characteristics is identified by the time-dimensional, spatial-dimensional, and unique-dimensional data, and it likewise comes in multiple, subject-dependent types.

When each of the data of each dimension / the data representing each characteristic constituting the multidimensional cube is multidimensional data, the multidimensional database 16 classifies the multidimensional data by multidimensional value ranges common among the multidimensional cubes. When data so classified belongs to a single range, the multidimensional database 16 stores the data in the storage area corresponding to that range. When it belongs to a plurality of ranges, the multidimensional database 16 stores the data entity or a reference to it in duplicate in the storage area corresponding to each range.

FIG. 2 is a diagram for explaining the data accumulation state in the multidimensional database 16. In FIG. 2, when data a to c, which are two-dimensional data representing features and the like, are classified into value ranges 1 to 4, which are two-dimensional value ranges representing areas and the like, data a to c are classified into the range 1, data b into the range 2, and data c into the range 3. That is, the data a belongs to the range 1, the data b belongs to the ranges 1 and 2, and the data c belongs to the ranges 1 and 3.

For data belonging to multiple ranges, for example, the main body of the data entity is stored in the storage area corresponding to the range with which the data overlaps most widely, and a duplicate of the entity or a reference to the main body of the entity is stored in the storage areas corresponding to the other ranges. The reference is, for example, the address at which the data is stored in the storage.
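The storage rule described above can be sketched as follows. This is only an illustrative sketch, not the implementation in the publication: it assumes square two-dimensional ranges of edge `CELL`, axis-aligned boxes as data, and a dictionary per storage area. The names `CELL`, `overlap_by_range`, and `store` are hypothetical.

```python
from collections import defaultdict

CELL = 10.0  # assumed edge length of one square two-dimensional value range


def overlap_by_range(box):
    """Map each range (grid cell) that a box overlaps to the overlapping area."""
    x0, y0, x1, y1 = box
    areas = {}
    for cx in range(int(x0 // CELL), int(x1 // CELL) + 1):
        for cy in range(int(y0 // CELL), int(y1 // CELL) + 1):
            w = min(x1, (cx + 1) * CELL) - max(x0, cx * CELL)
            h = min(y1, (cy + 1) * CELL) - max(y0, cy * CELL)
            if w > 0 and h > 0:
                areas[(cx, cy)] = w * h
    return areas


def store(storage, name, box):
    """Put the body in the widest-overlap range and references in the others."""
    areas = overlap_by_range(box)
    home = max(areas, key=areas.get)  # range with the widest overlap
    for cell in areas:
        if cell == home:
            storage[cell][name] = ("body", box)
        else:
            storage[cell][name] = ("ref", home)  # reference to the body's area
    return home


# Hypothetical data laid out like FIG. 2: a in one range, b and c in two each.
storage = defaultdict(dict)
store(storage, "a", (1, 1, 4, 4))
store(storage, "b", (6, 2, 12, 5))
store(storage, "c", (2, 6, 5, 13))
```

Here the range `(0, 0)` plays the role of range 1 in FIG. 2 and holds the bodies of a, b, and c, while the neighboring ranges hold only references.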

The main body of an entity stored in a storage area can be distinguished from a duplicate of an entity or a reference to the main body of an entity by, for example, partitioning the storage area, marking the stored data, or creating an index. A duplicate of an entity and a reference to the main body of an entity stored in a storage area can be converted into each other, from duplicate to reference or from reference to duplicate, either arbitrarily or according to some criterion.

When a duplicate of a data entity is accessed, the storage area holding the duplicate is accessed. Therefore, even if the duplicate and the main body of the data entity are accessed at the same time, the accessed storage areas do not conflict.

When a reference to the main body of a data entity is accessed, the storage area holding the main body must be accessed by way of the storage area holding the reference. Therefore, if the reference and the main body of the data entity are accessed at the same time, the accessed storage areas may conflict.

Here, the range is set to, for example, a size that can contain the widest data or a size that can contain the main data. This limits the number of ranges to which any data belongs to at most the number of adjacent ranges.
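Why sizing the range this way bounds the number of ranges per datum can be checked with a small one-dimensional sketch. It assumes ranges are equal-width intervals starting at zero; `choose_range_size` and `ranges_touched` are hypothetical helper names, and taking the maximum extent is just one possible reading of "a size that can contain the widest data".

```python
import math


def choose_range_size(extents):
    """Assumed criterion: the range edge equals the extent of the widest datum."""
    return max(extents)


def ranges_touched(start, length, size):
    """Indices of the one-dimensional ranges touched by [start, start + length]."""
    first = math.floor(start / size)
    last = math.floor((start + length) / size)
    return list(range(first, last + 1))


size = choose_range_size([3, 7, 10])
# A datum no wider than one range touches its own range and at most one neighbor.
worst = max(len(ranges_touched(s, 10, size)) for s in range(0, 50))
```

Because no datum is wider than one range, `worst` never exceeds two along an axis, that is, the datum's own range plus one adjacent range.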

In this way, the multidimensional database 16 classifies multidimensional data by multidimensional ranges. When data so classified belongs to a single range, the multidimensional database 16 stores the data in the storage area corresponding to that range. When it belongs to a plurality of ranges, the multidimensional database 16 stores the data entity or a reference to it in duplicate in the storage area corresponding to each range.
In FIG. 2, * represents the entity (main body) of the data, and ** represents a duplicate of the entity or a reference to the main body of the entity.

FIG. 3 is a diagram showing an example of ranges sized to contain the widest data or the main data. When the ranges of the multidimensional database 16 are changed, for example when new data is accumulated, all data, including the data already accumulated, is re-accumulated according to the new ranges. Further, for the multidimensional database 16, a hierarchy of ranges in which each upper range contains the adjacent lower ranges is constructed, for example, and the level of the hierarchy to be used is selected according to the situation. When a level whose ranges each correspond to a plurality of storage areas is selected, the data stored in duplicate in those storage areas is not used.

FIG. 4 is a diagram showing an example of a storage area corresponding to the hierarchy of the range in which the upper range includes the lower adjacent range.

The OLAP operation execution unit 11 executes an OLAP operation on multidimensional data according to the OLAP operation received from the client 20 and the arguments. That is, the OLAP operation execution unit 11 instructs the multidimensional database management unit 15 to perform an OLAP operation on the multidimensional data. Further, when the OLAP operation execution unit 11 receives the result of the instructed operation from the multidimensional database management unit 15, the OLAP operation execution unit 11 transmits the operation result to the client 20.

In response to an instruction from the OLAP operation execution unit 11, the multidimensional database management unit 15 refers, as index information, to the information on the value ranges used for classifying the data of each dimension / the data representing each characteristic constituting the multidimensional cube, and identifies the storage areas to be searched / operated based on the referenced index information. The multidimensional database management unit 15 then searches / operates the data constituting the multidimensional cube in parallel, with the range corresponding to each storage area as the processing unit. When the search / operation of all the target storage areas is complete, the multidimensional database management unit 15 aggregates the search / operation results and returns the operation result to the OLAP operation execution unit 11. The multidimensional database management unit 15 also manages the multidimensional database 16 so that data is accumulated and used in it as described above.

(Operation)
Next, the processing operation of the data analysis processing apparatus configured as described above will be described.
FIG. 5 is a sequence diagram for explaining an example of the operation of the data analysis processing device 10. In FIG. 5, when the OLAP operation execution unit 11 receives an OLAP operation and an argument from the client 20, it instructs the multidimensional database management unit 15 to operate the multidimensional data accordingly.

In response to the operation instruction for the multidimensional data, the multidimensional database management unit 15 refers, as index information, to the information on the value ranges used for classifying the data of each dimension / the data representing each characteristic constituting the multidimensional cube, and identifies the storage areas to be searched / operated based on the referenced index information. The multidimensional database management unit 15 searches / operates the data constituting the multidimensional cube in parallel, with the range corresponding to each storage area as the processing unit (“PARALLELL” surrounded by the broken line in FIG. 5).

The multidimensional database management unit 15 repeats this until the search / operation of all the target storage areas is complete (“LOOP” surrounded by the broken line in FIG. 5). When it is complete, the multidimensional database management unit 15 aggregates the search / operation results and returns the operation result to the OLAP operation execution unit 11.

The OLAP operation execution unit 11 repeats the instruction to the multidimensional database management unit 15 according to the received OLAP operation and the contents of the argument ("LOOP" surrounded by the broken line in FIG. 5). When the OLAP operation execution unit 11 acquires the final operation result corresponding to the OLAP operation and the contents of the argument, the OLAP operation execution unit 11 returns the operation result of the OLAP operation to the client 20.

Next, the details of the operation of the multidimensional database management unit 15 will be described.
FIG. 6 is a flowchart showing an example of the processing procedure of the multidimensional database management unit 15. In FIG. 6, the multidimensional database management unit 15 waits to receive an operation instruction for multidimensional data from the OLAP operation execution unit 11 (step S11). Upon receiving the operation instruction, the multidimensional database management unit 15 refers, as index information, to the information on the ranges used for classifying the data of each dimension / the data representing each characteristic constituting the multidimensional cube (step S12).

Next, the multidimensional database management unit 15 identifies the storage areas to be searched / operated based on the referenced index information (step S13), and searches / operates the data constituting the multidimensional cube in parallel, with the value range corresponding to each storage area as the processing unit (steps S141 to S14N). This processing is repeated until it is determined in step S15 that the search / operation of all the target storage areas has been completed.

At this time, when executing a single search / operation, the multidimensional database management unit 15 limits the search / operation range to the storage areas corresponding to the same range of both multidimensional cubes and the storage areas corresponding to the ranges in the vicinity of that range. When a plurality of searches / operations are executed in parallel, the multidimensional database management unit 15 also avoids conflicts over the storage areas to be searched / operated. The multidimensional database management unit 15 then aggregates the search / operation results (step S16).
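The flow of steps S12 to S16 can be sketched roughly as below. This is an illustrative sketch under simplifying assumptions: the index is a plain dictionary from range to storage area, each storage area is a list of records, and parallelism uses Python's standard `ThreadPoolExecutor`. The names `search_area` and `handle_instruction` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def search_area(records, predicate):
    """Search one storage area (one processing unit)."""
    return [r for r in records if predicate(r)]


def handle_instruction(index, storage, wanted_ranges, predicate):
    # S12-S13: consult the range index to identify the storage areas to visit.
    areas = [index[r] for r in wanted_ranges if r in index]
    # S141-S14N: one task per storage area, executed in parallel.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda a: search_area(storage[a], predicate), areas))
    # S15: list(pool.map(...)) returns only after every area has been processed.
    # S16: aggregate the partial results into one operation result.
    return [r for part in partials for r in part]


# Hypothetical index and storage contents.
index = {"r1": "area1", "r2": "area2", "r3": "area3"}
storage = {"area1": [1, 5, 9], "area2": [2, 6], "area3": [100]}
result = handle_instruction(index, storage, ["r1", "r2"], lambda v: v > 4)
```

Only `area1` and `area2` are visited here; `area3` is never read because its range is not among the ranges selected through the index.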

In this way, when an OLAP operation on a certain multidimensional cube is executed in response to an operation instruction for multidimensional data and the data constituting another multidimensional cube is used as an argument of the OLAP operation, the multidimensional database management unit 15 searches / operates the data constituting the certain multidimensional cube using the data constituting the other multidimensional cube as a key.
That is, using as an index the ranges used for classifying the data of each dimension / the data representing each characteristic constituting the multidimensional cube, the multidimensional database management unit 15, when executing a single search / operation, limits the search / operation range to the storage areas corresponding to the same range of both multidimensional cubes and the storage areas corresponding to the ranges in the vicinity of that range. When a plurality of searches / operations are executed in parallel, it also avoids conflicts over the storage areas to be searched / operated.

FIG. 7 is a diagram for explaining an example of processing in which the multidimensional database management unit 15 limits the search / operation range to particular storage areas. As shown in FIG. 7, suppose that the multidimensional database management unit 15 searches / operates the data constituting the multidimensional cube 1 using the data constituting the multidimensional cube 0 as a key. The data that is included in, or overlaps, the data classified into the value ranges 01, 02, and 04 and stored and managed in the corresponding storage areas 01, 02, and 04 is classified into the value ranges 11, 12, and 14, respectively, and is stored and managed in the corresponding storage areas 11, 12, and 14. The range to be searched / operated can therefore be limited to the pair of areas 01 and 11, the pair of areas 02 and 12, and the pair of areas 04 and 14, which are the storage areas corresponding to the same value ranges of both multidimensional cubes.
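The effect of pairing only the storage areas of the same value range, as in FIG. 7, can be illustrated with the following sketch. It assumes each cube is a dictionary keyed by a range number shared by both cubes (so that areas 01 and 11 both appear under key 1); `join_same_range` and the sample records are hypothetical.

```python
def join_same_range(cube0, cube1, matches):
    """Use cube0 records as keys against cube1, visiting only same-range pairs."""
    hits, comparisons = [], 0
    for rng, keys in cube0.items():
        for key in keys:
            # Only the cube1 storage area of the same range is examined.
            for cand in cube1.get(rng, []):
                comparisons += 1
                if matches(key, cand):
                    hits.append((key, cand))
    return hits, comparisons


# Hypothetical contents: ranges 1, 2, 4 of cube 0 against ranges 1, 2, 4, 5 of cube 1.
cube0 = {1: [(2, 3)], 2: [(12, 4)], 4: [(3, 14)]}
cube1 = {1: [(2, 3), (8, 8)], 2: [(12, 4)], 4: [(5, 15)], 5: [(15, 15)]}
hits, comparisons = join_same_range(cube0, cube1, lambda k, c: k == c)
```

With these sample records the restricted search performs 4 comparisons, whereas comparing every key with every candidate would take 3 × 5 = 15.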

FIG. 8 is a diagram for explaining another example of processing in which the multidimensional database management unit 15 limits the search / operation range to particular storage areas. As shown in FIG. 8, suppose that the multidimensional database management unit 15 searches / operates the data constituting the multidimensional cube 1 using the data constituting the multidimensional cube 0 as a key. The data in the vicinity, indicated by the dotted circle drawn around the center of gravity of the data classified into the range 01 and stored and managed in the storage area corresponding to the range 01, is classified into the range 11 and into the ranges 12, 14, and 15 that lie within the radius of the dotted circle from the range 11, and is stored and managed in the corresponding storage areas 11, 12, 14, and 15. The range to be searched / operated can therefore be limited to the pairing of the area 01 with the areas 11, 12, 14, and 15, which are the storage area corresponding to the same range of both multidimensional cubes and the storage areas corresponding to the ranges in its vicinity. The same applies to data classified into other ranges and stored and managed in the corresponding storage areas.
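One way to find the ranges "in the vicinity" for a given radius is sketched below. The layout is an assumption made for illustration only: a 3 × 3 grid of square ranges numbered 1 to 9 row by row, with a range counted as nearby when any part of it lies within the radius of the query point. The function name `ranges_near` is hypothetical.

```python
import math


def ranges_near(point, radius, size=10.0, cols=3, rows=3):
    """Numbers of the grid ranges that intersect a circle around the point."""
    px, py = point
    near = []
    for row in range(rows):
        for col in range(cols):
            # Closest point of this range's rectangle to the query point.
            nx = min(max(px, col * size), (col + 1) * size)
            ny = min(max(py, row * size), (row + 1) * size)
            if math.hypot(px - nx, py - ny) <= radius:
                near.append(row * cols + col + 1)
    return near


# A center of gravity near the corner of range 1 reaches ranges 2, 4, and 5 too.
nearby = ranges_near((8.0, 8.0), 3.0)
```

For this query point the nearby ranges are 1, 2, 4, and 5, which mirrors the pairing of area 01 with areas 11, 12, 14, and 15 in FIG. 8.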

In this way, when the multidimensional database management unit 15 identifies the storage areas to be searched / operated based on the referenced index information, it limits the range to be searched / operated to the storage areas corresponding to the same range of both multidimensional cubes and the storage areas corresponding to the ranges in the vicinity of that range.

FIG. 9 is a diagram for explaining an example of the operation by which the multidimensional database management unit 15 avoids conflicts over the storage areas to be searched / operated. This will be described in association with the schematic diagram of FIG. 7. As shown in FIG. 9, when the data constituting the multidimensional cube 1 is searched / operated using the data constituting the multidimensional cube 0 as a key, conflicts over the storage areas to be searched / operated can be avoided by searching / operating the data constituting the multidimensional cubes in parallel, taking as units the pair of areas 01 and 11, the pair of areas 02 and 12, and the pair of areas 04 and 14, which are the storage areas corresponding to the same value ranges of both multidimensional cubes. This is because the data included in, or overlapping, the data classified into the ranges 01, 02, and 04 and stored and managed in the corresponding storage areas 01, 02, and 04 is classified into the ranges 11, 12, and 14, respectively, and is stored and managed in the corresponding storage areas 11, 12, and 14.

FIG. 10 is a diagram for explaining another example of the operation by which the multidimensional database management unit 15 avoids conflicts over the storage areas to be searched / operated. This will be described in association with the schematic diagram of FIG. In FIG. 10, the data constituting the multidimensional cube 1 is searched / operated using the data constituting the multidimensional cube 0 as a key. As in FIG., the data in the vicinity, indicated by the dotted circle drawn around the center of gravity of the data classified into the value range 01 and stored and managed in the storage area corresponding to the value range 01, is classified into the value range 11 and into the value ranges 12, 14, and 15 that lie within the radius of the dotted circle from the value range 11, and is stored and managed in the storage areas 11, 12, 14, and 15. Likewise, the data in the vicinity of the data classified into the value range 04 and stored and managed in the storage area corresponding to the value range 04 is classified into the value range 14 and into the value ranges 11, 12, 15, 17, and 18 that lie within the radius of the dotted circle from the value range 14, and is stored and managed in the corresponding storage areas 11, 12, 14, 15, 17, and 18.

These data are handled in units of the pairing of the area 01 with the areas 15, 14, 12, and 11 and the pairing of the area 04 with the areas 18, 17, 15, and 14, each consisting of the storage area corresponding to the same value range of both multidimensional cubes and the storage areas corresponding to the value ranges in its vicinity. When the data constituting the multidimensional cubes is searched / operated in parallel in these units, conflicts over the storage areas to be searched / operated can be avoided by matching the search / operation order, for example the order of the areas 15, 14, 12, 11 for the data in the area 01 and the order of the areas 18, 17, 15, 14, 12, 11 for the data in the area 04. The same applies to data classified into other ranges and stored and managed in the corresponding storage areas.
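The idea of matching the search / operation order across parallel workers can be sketched as below. This is an assumption-laden illustration rather than the method of the publication: each storage area is guarded by a `threading.Lock`, every worker visits its areas in the same global (descending) order, and only one area is held at a time. The names `area_locks`, `visit_order`, and `worker` are hypothetical.

```python
import threading

# One lock per cube-1 storage area (hypothetical in-memory stand-in).
area_locks = {a: threading.Lock() for a in (11, 12, 14, 15, 17, 18)}
visits = []  # (key area, visited area) in the order the visits happened
visits_guard = threading.Lock()


def visit_order(areas):
    """All workers walk the areas in one agreed order (descending here)."""
    return sorted(areas, reverse=True)


def worker(key_area, areas):
    for area in visit_order(areas):
        with area_locks[area]:  # only one worker touches an area at a time
            with visits_guard:
                visits.append((key_area, area))


# Area 01 searches 15, 14, 12, 11; area 04 searches 18, 17, 15, 14, 12, 11.
jobs = {1: [11, 12, 14, 15], 4: [11, 12, 14, 15, 17, 18]}
threads = [threading.Thread(target=worker, args=item) for item in jobs.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because both workers move through the shared areas 15, 14, 12, 11 in the same direction, neither ever waits on an area while holding another one that its peer needs.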

The data constituting the multidimensional cubes is thus searched / operated in parallel in units of the pairing of the area 01 with the areas 15, 14, 12, and 11 and the pairing of the area 04 with the areas 18, 17, 15, and 14, each consisting of the storage area corresponding to the same range of both multidimensional cubes and the storage areas corresponding to the ranges in its vicinity. The same applies to data classified into other ranges and stored and managed in the corresponding storage areas.

As shown in FIGS. 9 and 10, when a duplicate of the data entity is stored in a storage area, the duplicate and the main body of the data entity are in different storage areas, so conflicts over the storage areas to be searched / operated can be avoided completely.

On the other hand, when a reference to the main body of the data entity is stored in a storage area, the destination of the reference and the main body of that data entity are in the same storage area. Therefore, when the main body of any of the data stored in that storage area is searched / operated, a conflict over the storage area to be searched / operated cannot be avoided. Conversely, when a reference to the main body of any of the data stored in the storage area is searched / operated, a conflict over the storage area to be searched / operated can be avoided. Further, storing a reference to the main body of the entity instead of a duplicate of the entity reduces the amount of storage required.

In this way, when searching / operating the data constituting the multidimensional cube in parallel, with the range corresponding to each storage area as the processing unit, based on the referenced index information, the multidimensional database management unit 15 further avoids conflicts over the storage area to be searched / operated.

In the description of FIGS. 7 to 10, storage areas to which no data belongs are excluded from processing in the first place. When data belongs to a plurality of ranges, its entity or reference is stored and managed in duplicate in the storage area corresponding to each range, so the same data may be searched / operated in a plurality of sets of storage areas. If the same result is obtained as a consequence, the duplicated results are aggregated.
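
The aggregation of duplicated results can be illustrated as follows. This is a minimal sketch that assumes each result is an (event identifier, value) pair and that results carrying the same identifier are duplicates; the function name `aggregate_results` and the sample values are hypothetical.

```python
def aggregate_results(results_per_unit):
    """Merge the results of several processing units, keeping each event once."""
    seen = set()
    merged = []
    for unit_results in results_per_unit:
        for event_id, value in unit_results:
            if event_id in seen:
                continue  # the same data was already found via another storage area
            seen.add(event_id)
            merged.append((event_id, value))
    return merged

# Event "b" is found by two units because it is stored in more than one storage area.
merged = aggregate_results([[("b", 3)], [("b", 3), ("c", 5)]])
```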

FIG. 11 is a diagram for explaining an example of a process in which the multidimensional database management unit 15 selects a level of the range hierarchy. In FIG. 11, consider the case where the multidimensional database management unit 15 identifies the storage areas to be searched / operated based on the referenced index information and searches / operates the data constituting the multidimensional cube in parallel, with the storage area corresponding to a range as the unit. In this case, for the ranges used to classify the data of each dimension constituting the multidimensional cube / the data representing each characteristic, the multidimensional database management unit 15 constructs a range hierarchy in which an upper range includes the adjacent lower ranges, and selects, according to the situation, the level of the range hierarchy to be used as the processing unit of the search / operation.

For example, when the selection is made according to the values of the stored data, the level of the range that can contain the widest data, or of the range that can contain most of the data, is selected, so that the number of ranges to which a piece of data belongs is limited to at most the number of adjacent ranges.

The range that can contain the widest data and the range that can contain most of the data are obtained by identifying, each time data is accumulated, the level of the range that can contain that data, and then calculating the level of the largest range and the level of the most frequent range. For example, since data a and b cannot be contained in a level 2 range but can be contained in a level 1 range, the level 1 range is selected.
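
The bookkeeping described above could look like the following minimal sketch. It assumes two-dimensional data given as integer bounding boxes (x_min, y_min, x_max, y_max) with exclusive upper bounds, a square domain of width 80, and levels 0, 1, and 2 that split the domain into 1, 4, and 64 ranges. The coordinates and the names `finest_containing_level` and `LevelStatistics` are hypothetical.

```python
from collections import Counter

DOMAIN = 80                           # assumed width of the square domain
CELLS_PER_SIDE = {0: 1, 1: 2, 2: 8}   # assumed: 1, 4 and 64 ranges per level

def finest_containing_level(box):
    """Return the deepest level at which the box fits inside a single range."""
    x_min, y_min, x_max, y_max = box
    for level in sorted(CELLS_PER_SIDE, reverse=True):
        cell = DOMAIN // CELLS_PER_SIDE[level]
        same_column = x_min // cell == (x_max - 1) // cell
        same_row = y_min // cell == (y_max - 1) // cell
        if same_column and same_row:
            return level
    return 0  # level 0 covers the whole domain

class LevelStatistics:
    """Updated each time data is accumulated."""

    def __init__(self):
        self.counts = Counter()

    def record(self, box):
        self.counts[finest_containing_level(box)] += 1

    def level_for_widest_data(self):
        return min(self.counts)  # a smaller level number means a larger range

    def level_for_most_data(self):
        return self.counts.most_common(1)[0][0]

stats = LevelStatistics()
stats.record((5, 5, 15, 15))     # hypothetical data a: spans several level 2 ranges
stats.record((15, 5, 25, 15))    # hypothetical data b: spans several level 2 ranges
stats.record((62, 62, 68, 68))   # hypothetical data c: fits in one level 2 range
```

With these sample boxes, both the widest data and most of the data call for level 1, which corresponds to the situation described for data a and b.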

Also, for example, when the selection is made according to the executable degree of parallelism, the level is selected based on the number of available CPU cores and the status of other processing so that the processing capacity is maximized. For example, if the level 2 range is selected, the 64 storage areas correspond to 64 ranges, and 64 is the upper limit of the executable degree of parallelism. If the level 1 range is selected, the 64 storage areas are aggregated into four, corresponding to four ranges, and 4 is the upper limit of the executable degree of parallelism. If the level 0 range is selected, the 64 storage areas are aggregated into one, corresponding to one range, and 1 is the upper limit of the executable degree of parallelism.

The executable degree of parallelism is larger than the number of CPU cores when I/O waits are taken into consideration, and smaller than the number of CPU cores when the execution of other processes is taken into consideration. Therefore, it is calculated based on information set in advance and information acquired from the OS (Operating System). For example, if the number of CPU cores is 4, the level 1 range, whose number of ranges is closest to the number of CPU cores, is selected.
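
The selection by executable degree of parallelism can be sketched as follows. The range counts per level (1, 4, 64) come from the description. The formula in `executable_parallelism`, its parameters `io_wait_factor` and `cores_busy_elsewhere` (standing in for the "information set in advance"), and the rule of preferring the upper level on a tie are assumptions made for illustration.

```python
import os

RANGES_PER_LEVEL = {0: 1, 1: 4, 2: 64}

def executable_parallelism(cpu_cores, io_wait_factor=1.0, cores_busy_elsewhere=0):
    """Estimate how many searches / operations can usefully run at once.

    Assumed model: cores used by other processing are subtracted, and the rest
    is scaled up to account for time spent waiting on I/O.
    """
    usable = max(1, cpu_cores - cores_busy_elsewhere)
    return max(1, round(usable * io_wait_factor))

def select_level(parallelism):
    """Pick the level whose number of ranges is closest to the parallelism."""
    return min(
        RANGES_PER_LEVEL,
        key=lambda level: (abs(RANGES_PER_LEVEL[level] - parallelism), level),
    )

# The core count is obtained from the OS; fall back to 1 if it is unknown.
level = select_level(executable_parallelism(os.cpu_count() or 1))
```

With 4 CPU cores and the default parameters, `select_level` returns level 1, matching the example in the text.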

FIGS. 12 and 13 are diagrams for explaining an example of processing in which the multidimensional database management unit 15 suppresses redundant processing. Consider the case where, as when the level 1 range is selected in FIG. 11, the multidimensional database management unit 15 selects a level of the range hierarchy that corresponds to a plurality of storage areas as the processing unit of the search / operation. In this case, redundant processing can be suppressed by not using the data that is stored and managed in duplicate in the plurality of storage areas. When data belongs to a plurality of ranges, its entity or reference is stored and managed in duplicate in the storage area corresponding to each range, so the same data may be searched / operated in a plurality of sets of storage areas. If the same result is obtained as a consequence, the duplicated results must be aggregated. The multidimensional database management unit 15 suppresses this redundant processing.

FIG. 12 shows, as in FIG. 11, the case where the level 1 range is selected as the level of the range hierarchy used as the processing unit of the search / operation. With respect to the level 2 ranges included in the level 1 range, data a is classified into range 2 of level 2 and stored and managed in the corresponding storage area 2, and data b is classified into ranges 2, 3, 6, and 7 of level 2 and stored and managed in the corresponding storage areas 2, 3, 6, and 7. FIG. 12 also shows that ranges 1 to 16 of level 2 are included in range 3 of level 1, and that ranges 1 to 4 of level 1 are included in range 1 of level 0.

FIG. 13 is an example of tabular data representing the situation shown in FIG. 12. As in FIG. 11, when the level 1 range is selected as the level of the range hierarchy used as the processing unit of the search / operation, the multidimensional database management unit 15 reads out and processes data in order from each storage area corresponding to a level 2 range included in the level 1 range. For example, when data a is read from the storage area corresponding to range 2 of level 2, searching the tabular data of FIG. 13 identifies that data a is stored only in the storage area corresponding to range 2 of level 2. Therefore, in order to suppress redundant processing, the multidimensional database management unit 15 searches / operates the storage area corresponding to range 2 of level 2 of the paired multidimensional cube.

Further, for example, when data b is read from the storage area corresponding to range 2 of level 2, searching the tabular data of FIG. 13 identifies that data b is also accumulated in the storage areas corresponding to ranges 3, 6, and 7 of level 2. Therefore, the multidimensional database management unit 15 searches / operates the storage areas corresponding to ranges 2, 3, 6, and 7 of level 2 of the paired multidimensional cube. Further, in order to suppress redundant processing, the multidimensional database management unit 15 marks data b as processed in the tabular data of FIG. 13 and does not read data b from the storage areas corresponding to ranges 3, 6, and 7 of level 2. In addition, when a level of the range hierarchy corresponding to a plurality of storage areas is selected at some point, and the main body of an entity, a duplicate of the entity, or a reference to the main body of the entity has been accumulated in the storage areas corresponding to that level, the duplicate of the entity and the reference to the main body of the entity may be deleted and the deletion reflected in the tabular data of FIG. 13. After the deletion, the storage areas and the tabular data of FIG. 13 may also be returned to their state before the deletion.
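
The processing of data a and b described above can be sketched as follows. This is an illustration under assumptions rather than the embodiment's implementation: the tabular data of FIG. 13 is modelled as a dictionary from a data identifier to the level 2 ranges holding it, the "processed" mark is modelled as a set, and the names `process_level1_unit` and `handle` are hypothetical.

```python
def process_level1_unit(level2_ranges, storage, placement, handle):
    """Read each level 2 storage area in order and handle every datum only once.

    storage:   level 2 range number -> data identifiers stored in that area
    placement: data identifier -> all level 2 ranges holding it (stand-in for FIG. 13)
    handle:    callback that searches / operates the paired cube in the given ranges
    """
    processed = set()  # stand-in for the "processed" mark in the tabular data
    for range_no in level2_ranges:
        for datum in storage.get(range_no, []):
            if datum in processed:
                continue  # already handled when read from an earlier storage area
            handle(datum, placement[datum])
            processed.add(datum)
    return processed

# Situation described for FIG. 12: a is in range 2; b is in ranges 2, 3, 6 and 7.
storage = {2: ["a", "b"], 3: ["b"], 6: ["b"], 7: ["b"]}
placement = {"a": [2], "b": [2, 3, 6, 7]}
calls = []
process_level1_unit(range(1, 17), storage, placement,
                    lambda datum, ranges: calls.append((datum, tuple(ranges))))
```

Here `calls` records one search / operation for data a against range 2 and one for data b against ranges 2, 3, 6, and 7; the copies of b in storage areas 3, 6, and 7 are skipped.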

FIG. 14 is a block diagram showing an example of the hardware configuration of the data analysis processing apparatus according to the present invention. In FIG. 14, the data analysis processing device 10 includes a processor 12, a storage 200 for storing a multidimensional database 16, an interface unit 13, and a memory 14. That is, the data analysis processing device 10 is a computer, and is realized as, for example, a personal computer, a server computer, or the like.

The interface unit 13 is connected to the network 100 and receives access from the client 20 connected to the network 100.

The storage 200 is a non-volatile storage medium (block device) such as an HDD (Hard Disk Drive) or SSD (Solid State Drive). The storage 200 stores a multidimensional database 16 in a predetermined storage area in addition to basic programs such as an OS (Operating System) and a device driver, and a program for realizing the functions of the data analysis processing device 10.

The memory 14 in FIG. 14 is, for example, a RAM (Random Access Memory), and stores a program 14a loaded from the storage 200 and various data 14b.

Further, the processor 12 in FIG. 14 is an arithmetic unit such as a Central Processing Unit (CPU) or a Micro Processing Unit (MPU), and its function is realized by a program loaded in the memory 14.

The processor 12 includes an OLAP operation execution unit 11 and a multidimensional database management unit 15 as processing functions related to the embodiment. The OLAP operation execution unit 11, the multidimensional database management unit 15, and the time-series alignment unit 17 are processing functions realized by the processor 12 executing the instructions included in the program 14a. That is, the data analysis processing device 10 of the present invention can also be realized by a computer and a program. The program can be recorded and distributed on a recording medium such as an optical medium, or provided through a network.

The OLAP operation execution unit 11 and the multidimensional database management unit 15 may also be realized in various other forms, including integrated circuits such as an ASIC (Application Specific Integrated Circuit) and an FPGA (Field-Programmable Gate Array), in place of or in addition to the processor 12.

The processor 12 can receive the OLAP operation and the argument from the client 20 via the interface unit 13, and can send the operation result to the client 20.

(Effect)
As described above, in the embodiment, when each of the data constituting the multidimensional cube is multidimensional data, the multidimensional database management unit 15 classifies the multidimensional data by multidimensional ranges common among the multidimensional cubes. Further, when the classified data belongs to a single range, the multidimensional database management unit 15 accumulates the data in the storage area corresponding to that range, and when the classified data belongs to a plurality of ranges, it accumulates the entity or a reference in duplicate in the storage area corresponding to each of those ranges.
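
The classification and accumulation rule can be illustrated with a minimal sketch. It assumes two-dimensional data given as integer bounding boxes (x_min, y_min, x_max, y_max) with exclusive upper bounds, and a 4 x 4 grid of ranges numbered row by row from 1. The coordinates and the names `ranges_for_box` and `accumulate` are hypothetical; only the resulting range numbers for a and b mirror the situation described for FIG. 12.

```python
from collections import defaultdict

def ranges_for_box(box, cell_size, cells_per_side):
    """Return the numbers of all ranges (1-based, row-major) that the box overlaps."""
    x_min, y_min, x_max, y_max = box
    col_lo, col_hi = x_min // cell_size, (x_max - 1) // cell_size
    row_lo, row_hi = y_min // cell_size, (y_max - 1) // cell_size
    return [row * cells_per_side + col + 1
            for row in range(row_lo, row_hi + 1)
            for col in range(col_lo, col_hi + 1)]

def accumulate(storage, event_id, box, cell_size, cells_per_side):
    """Store a reference (here, the event identifier) in every matching storage area."""
    for range_no in ranges_for_box(box, cell_size, cells_per_side):
        storage[range_no].append(event_id)

storage = defaultdict(list)
accumulate(storage, "a", (12, 2, 18, 8), 10, 4)    # belongs to a single range
accumulate(storage, "b", (15, 5, 25, 15), 10, 4)   # belongs to four ranges
```

Because every multidimensional cube uses the same grid, data classified into a given range number in one cube can be compared directly with the data in the same and nearby range numbers of another cube.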

Also, the range information used to classify the data to be operated on that constitutes the multidimensional cube is used as index information. As a result, when a single search / operation is executed, the scope of the search / operation is limited to the storage area corresponding to the same range of both multidimensional cubes and the storage areas corresponding to ranges near that range. Further, when a plurality of searches / operations are executed at the same time, conflicts over the storage area to be searched / operated are also avoided.
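
Limiting the scope of a single search / operation can be sketched as follows, assuming a square grid of ranges numbered row by row from 1 and taking "near" to mean the eight surrounding ranges. Both the numbering and the neighbourhood definition are assumptions, and `search_scope` is a hypothetical name.

```python
def search_scope(range_no, cells_per_side):
    """Return the same range plus its adjacent ranges, clipped at the grid border."""
    row, col = divmod(range_no - 1, cells_per_side)
    scope = []
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            if 0 <= r < cells_per_side and 0 <= c < cells_per_side:
                scope.append(r * cells_per_side + c + 1)
    return scope

# For key data classified into range 6 of a 4 x 4 grid, only these storage
# areas of the paired multidimensional cube need to be examined.
scope = search_scope(6, 4)
```

In this 4 x 4 example, at most 9 of the 16 storage areas are examined for one key, and fewer at the border of the grid.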

By doing so, even when each of the data of each dimension constituting the multidimensional cube / the data representing each characteristic is multidimensional data, or when the data classified by range belongs to a plurality of ranges, the scope of the search / operation can be limited when a single search / operation is executed, and conflicts over the storage area to be searched / operated can also be avoided when a plurality of searches / operations are executed at the same time.

Therefore, according to the embodiment, processing can be speeded up even when each of the data of each dimension constituting the multidimensional cube / the data representing each characteristic is multidimensional data, or when the data classified by range belongs to a plurality of ranges.

Further, when executing an OLAP operation on a certain multidimensional cube, the multidimensional database management unit 15 uses data constituting another multidimensional cube as an argument of the OLAP operation. At this time, when the data constituting the certain multidimensional cube is searched / operated using the data constituting the other multidimensional cube as a key, the multidimensional database management unit 15 constructs, for the ranges used to classify the data of each dimension constituting the multidimensional cube / the data representing each characteristic, a range hierarchy in which an upper range includes the adjacent lower ranges. Further, the multidimensional database management unit 15 selects the level of the range hierarchy to be used as the processing unit of the search / operation according to the situation, such as the values of the accumulated data and the executable degree of parallelism. Further, when it selects a level of the range hierarchy corresponding to a plurality of storage areas, the multidimensional database management unit 15 does not use the data stored and managed in duplicate in the plurality of storage areas.

When a level of the range hierarchy corresponding to a plurality of storage areas is selected and data belongs to a plurality of ranges, the entity or reference is stored and managed in duplicate in the storage area corresponding to each range, so the same data may be searched / operated in a plurality of sets of storage areas. If the same result is obtained, the duplicated results would need to be aggregated; in this way, however, such redundant processing can be suppressed within the processing unit of the search / operation.

Therefore, even when the range hierarchy corresponding to a plurality of storage areas is selected, redundant processing can be suppressed and the speed can be increased within the search / operation processing unit.

Therefore, according to the embodiment, it is possible to speed up the process of searching / operating the data constituting another multidimensional cube by using the data constituting the multidimensional cube as a key. That is, according to the embodiment, it becomes possible to provide a data analysis processing device, a data analysis processing method, and a program capable of executing OLAP operations on a multidimensional cube at high speed. More specifically, according to the embodiment, when data constituting another multidimensional cube is used as an argument of an OLAP operation and the data constituting one multidimensional cube is searched / operated using the data constituting the other multidimensional cube as a key, the scope of the search / operation is limited when a single search / operation is executed, and conflicts over the storage area to be searched / operated are avoided when a plurality of searches / operations are executed at the same time, even if each of the data of each dimension constituting the multidimensional cube / the data representing each characteristic is multidimensional data, or the data classified by range belongs to a plurality of ranges. This makes it possible to provide a technique capable of executing OLAP operations on a multidimensional cube at high speed.

In short, the present invention is not limited to the above-described embodiment as it is, and at the implementation stage, the components can be modified and embodied within a range that does not deviate from the gist thereof. In addition, various inventions can be formed by an appropriate combination of the plurality of components disclosed in the above-described embodiment. For example, some components may be removed from all the components shown in the embodiments. In addition, components from different embodiments may be combined as appropriate.

10 ... Data analysis processing device
11 ... OLAP operation execution unit
12 ... Processor
13 ... Interface unit
14 ... Memory
14a ... Program
14b ... Data
15 ... Multidimensional database management unit
16 ... Multidimensional database
17 ... Time-series alignment unit
20 ... Client
100 ... Network
200 ... Storage

Claims

1. A data analysis processing device comprising:
a multidimensional database that stores data embodying a real-world event in a multidimensional cube constructed for each subject, in association with an identifier of the event;
an OLAP operation execution unit that executes an OLAP (Online Analytical Processing) operation on the multidimensional cube in response to a request from a client; and
a multidimensional database management unit that manages, in the multidimensional cube, time-dimensional data, spatial-dimensional data, a plurality of types of unique-dimensional data, and data representing a plurality of types of characteristics,
wherein, when each of the data constituting the multidimensional cube is multidimensional data, the multidimensional database management unit classifies the multidimensional data by a multidimensional range common among the multidimensional cubes.

2. The data analysis processing device according to claim 1, wherein, when the classified data belongs to a single range, the multidimensional database management unit accumulates the data in a storage area corresponding to that range.

3. The data analysis processing device according to claim 1, wherein, when the classified data belongs to a plurality of ranges, the multidimensional database management unit accumulates the entity of the data or a reference to the data in duplicate in the storage area corresponding to each of the ranges.

4. The data analysis processing device according to claim 1, wherein the OLAP operation execution unit uses, as an argument of the OLAP operation, at least one of an argument specified by the client and other data constituting the multidimensional cube.

5. The data analysis processing device according to claim 1, wherein, when searching / operating the data constituting the multidimensional cube using data constituting another multidimensional cube as a key, the multidimensional database management unit uses the range used for the classification as an index, limits, when a single search / operation is executed, the scope of the search / operation to the storage area corresponding to the same range of both multidimensional cubes and the storage areas corresponding to ranges near the same range of both multidimensional cubes, and further avoids, when a plurality of searches / operations are executed in parallel, conflicts over the storage area to be searched / operated.

6. The data analysis processing device according to claim 5, wherein the multidimensional database management unit constructs a range hierarchy in which an upper range includes the adjacent lower ranges, selects, according to the situation, the level of the range hierarchy to be used as the processing unit of the search / operation, and, when a level of the range hierarchy corresponding to a plurality of storage areas is selected, does not use the data that is accumulated and managed in duplicate in the plurality of storage areas.

7. A data analysis processing method comprising:
a process in which a processor of a computer stores, in a multidimensional database, data embodying a real-world event in a multidimensional cube constructed for each subject, in association with an identifier of the event;
a process in which the processor executes an OLAP (Online Analytical Processing) operation on the multidimensional cube in response to a request from a client;
a process in which the processor manages, in the multidimensional cube, time-dimensional data, spatial-dimensional data, a plurality of types of unique-dimensional data, and data representing a plurality of types of characteristics; and
a process in which, when each of the data constituting the multidimensional cube is multidimensional data, the processor classifies the multidimensional data by a multidimensional range common among the multidimensional cubes.

8. A program that causes a processor of a computer to function as the data analysis processing device according to any one of claims 1 to 6.