Disclosure of Invention
The embodiments of the present application provide a query method and a query device for multidimensional analysis of mass data, which solve two problems in the prior art: when online analytical processing (OLAP) stores and queries multidimensional data in a relational database, query efficiency is low; and when OLAP stores and queries multidimensional data in a multidimensional data organization, all dimension combinations must be exhausted (pre-computed) in advance, which requires a huge amount of computation.
An embodiment of the present application provides a query method for multidimensional analysis of mass data, comprising the following steps:
Receiving a query request that is sent by a user and carries dimension information to be queried, wherein the dimension information comprises a dimension name and a dimension value;
querying, according to the dimension information, data corresponding to the dimension information in a pre-established subcube table;
when the data corresponding to the dimension information is found, returning the data to the user; and when the data corresponding to the dimension information is not found, querying the data corresponding to the dimension information in a pre-established cube table, returning the data to the user, and collecting the dimension names contained in the dimension information as a dimension combination, wherein the subcube table is composed of some of the columns of the cube table.
Preferably, query information that carries a dimension name to be queried is obtained from a user in advance, data corresponding to the dimension name is queried according to the dimension name, and a subcube table is established according to the dimension name and the data corresponding to the dimension name.
Preferably, when the data in the pre-established cube table is updated, the method further comprises: acquiring the updated data in the pre-established cube table, performing dimensionality reduction on the acquired data, and updating the pre-established subcube table with the dimension-reduced data.
Preferably, for each piece of acquired data, a subcube table containing at least one dimension name corresponding to the data is determined; for each determined subcube table, the dimensions of the data are reduced to match the dimensions of that subcube table, and the subcube table is searched for data with the same dimension values as the data; when such data is found, the two are merged, and when it is not found, the data is added directly to the subcube table.
Preferably, the method further comprises: within a specific time period, grouping identical collected dimension combinations containing the dimension names into one group and counting how many times the dimension combination of each group was collected; when the number of times a dimension combination was collected exceeds a preset threshold, creating a new subcube table, querying the pre-established cube table for the data corresponding to the dimension names contained in the dimension combination, merging that data, and adding the merged data to the newly created subcube table.
An embodiment of the present application provides a query device for multidimensional analysis of mass data, including:
a receiving module, configured to receive a query request sent by a user and carrying dimension information to be queried, where the dimension information includes: dimension name and dimension value;
a query module, configured to query, according to the dimension information, data corresponding to the dimension information in a pre-established subcube table;
a data return module, configured to return the data to the user when the data corresponding to the dimension information is found, and, when it is not found, to query the data corresponding to the dimension information in a pre-established cube table, return the data to the user, and collect the dimension names contained in the dimension information as a dimension combination, wherein the subcube table is composed of some of the columns of the cube table.
Preferably, the device further comprises: a pre-establishing module, configured to obtain in advance, from a user, query information that carries a dimension name to be queried, query data corresponding to the dimension name according to the dimension name, and establish a subcube table according to the dimension name and the data corresponding to the dimension name.
Preferably, the device further comprises: a first updating module, configured to, when the data in the pre-established cube table is updated, acquire the updated data in the pre-established cube table, perform dimensionality reduction on the acquired data, and update the pre-established subcube table with the dimension-reduced data.
Preferably, the first updating module is specifically configured to: for each piece of acquired data, determine a subcube table containing at least one dimension name corresponding to the data; for each determined subcube table, reduce the dimensions of the data to match the dimensions of that subcube table and search the subcube table for data with the same dimension values as the data; when such data is found, merge the two, and when it is not found, add the data directly to the subcube table.
Preferably, the device further comprises: a second updating module, configured to, within a specific time period, group identical collected dimension combinations containing the dimension names into one group, count how many times the dimension combination of each group was collected, and, when that number exceeds a preset threshold, create a new subcube table, query the pre-established cube table for the data corresponding to the dimension names contained in the dimension combination, merge that data, and add the merged data to the newly created subcube table.
The embodiments of the present application provide a query method and a query device for multidimensional analysis of mass data. The method first receives a query request that is sent by a user and carries dimension information to be queried, the dimension information comprising a dimension name and a dimension value, and then queries the data corresponding to the dimension information in a pre-established subcube table. When the data is found, it is returned to the user; when it is not found, the data is queried in the pre-established cube table and returned to the user, and the dimension names contained in the dimension information are collected as a dimension combination. The subcube table is composed of some of the columns of the cube table; because it keeps fewer columns, rows that share the same values in those columns are merged, so the subcube table has fewer rows than the cube table. Since a user's query is first run against the smaller, pre-established subcube table, query efficiency can be effectively improved.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 shows the query process for multidimensional analysis of mass data provided in an embodiment of the present application, which specifically comprises the following steps:
S101: Receive a query request that is sent by a user and carries dimension information to be queried.
In practical applications, an enterprise user can use online analytical processing (OLAP) to store collected multidimensional data in a data warehouse in a fixed storage format, then use OLAP to perform complex queries over large data volumes quickly and flexibly according to the user's dimension requirements, and receive the query results in an intuitive, easy-to-understand form.
Before querying the required dimensions, the enterprise user needs to store the collected multidimensional data in the data warehouse.
Furthermore, the present application aims to extract some columns of the cube table according to the actual requirements of the user and to build a separate table from them. Because the separate table has fewer columns than the original cube table, rows that share the same values in the retained columns can be merged, so it also has fewer rows, and a table with fewer rows can be queried faster. Therefore, in the present application, some columns can be extracted from the pre-stored cube table according to actual requirements, and a table can be created from the extracted columns.
It should be noted that, to distinguish it clearly from the original cube table, a table created from columns extracted according to actual requirements is referred to in this application as a subcube table; that is, a subcube table is composed of some of the columns of the cube table. For the same cube table, the actual requirements of users usually involve several different dimension combinations, and a separate subcube table is created from the data in the cube table for each of these combinations. Thus, one cube table can give rise to several subcube tables with different dimension combinations.
Further, the present application provides a method for pre-establishing a subcube table, which specifically comprises the following steps:
Query information that carries a dimension name to be queried is obtained from the user in advance, data corresponding to the dimension name is queried according to the dimension name, and a subcube table is established according to the dimension name and the data corresponding to the dimension name.
Here, the data corresponding to the dimension name comprises the dimension values and the fact values corresponding to those dimension values. The dimension name to be queried may be a single dimension name or a combination of several dimension names, determined according to the actual requirements of the user.
In addition, when the subcube table is built according to the dimension name and its corresponding data, a subcube table containing a fact-value column and one column for each dimension name is created first. Fact values with the same dimensions (i.e., the same dimension names and dimension values) are then merged, and the merged fact values and their corresponding dimension values are filled into the subcube table. For example, if fact value 1 corresponds to the dimensions province = Beijing and quarter = first quarter, and fact value 2 corresponds to the same dimensions, the two fact values can be merged into 3 (1 + 2), and the fact value 3 is filled into the fact-value column of the row for Beijing and the first quarter. If, however, the two fact values do not correspond to the same dimension names and dimension values, they cannot be merged.
For simplicity and clarity of explanation, assume that the cube table in the data warehouse is as shown in Table 4:
Dimension A | Dimension B | Dimension C | Fact value
A1 | B1 | C1 | 1
A1 | B2 | C2 | 1
A2 | B1 | C1 | 1
A2 | B2 | C2 | 1
TABLE 4
Suppose that user A needs a subcube table established according to actual requirements. The data warehouse therefore obtains query information from user A carrying the dimension name to be queried, dimension A, and queries the data corresponding to dimension A, as shown in Table 5:
Dimension A | Fact value
A1 | 1
A1 | 1
A2 | 1
A2 | 1
TABLE 5
A subcube table containing a fact-value column and a column for dimension A (the dimension name) is created; fact values with the same dimensions (i.e., the same dimension name and dimension value) are merged, and the merged fact values and the corresponding dimension values are filled into the subcube table, as shown in Table 6:
Dimension A | Fact value
A1 | 2
A2 | 2
TABLE 6
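The pre-establishment steps above amount to keeping only the requested dimension columns of the cube table and summing the fact values of rows whose remaining dimension values are equal. The following Python sketch reproduces Table 6 from Table 4, assuming the cube table fits in memory as a list of (dimension-value mapping, fact value) pairs; `CUBE_TABLE` and `build_subcube` are hypothetical names, not part of the application.

```python
from collections import defaultdict

# Table 4: each row maps dimension names to dimension values, plus a fact value.
CUBE_TABLE = [
    ({"A": "A1", "B": "B1", "C": "C1"}, 1),
    ({"A": "A1", "B": "B2", "C": "C2"}, 1),
    ({"A": "A2", "B": "B1", "C": "C1"}, 1),
    ({"A": "A2", "B": "B2", "C": "C2"}, 1),
]


def build_subcube(cube_rows, dimension_names):
    """Hypothetical helper: keep only the given dimension columns and sum the
    fact values of rows whose kept dimension values are identical."""
    merged = defaultdict(int)
    for dimension_values, fact_value in cube_rows:
        key = tuple(dimension_values[name] for name in dimension_names)
        merged[key] += fact_value
    return dict(merged)


table_6 = build_subcube(CUBE_TABLE, ("A",))
print(table_6)  # {('A1',): 2, ('A2',): 2}

# The same cube table can yield further subcube tables for other combinations.
print(build_subcube(CUBE_TABLE, ("B",)))  # {('B1',): 2, ('B2',): 2}
```

A projection onto fewer columns never has more rows than the cube table, and it has fewer whenever rows share values in the kept columns. For the combination of dimensions A and C in Table 4, for instance, all four rows stay distinct.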
Further, after the subcube table has been created, the user can send, through a terminal, a query request carrying the dimension information to be queried to the data warehouse in order to query the data corresponding to the required dimensions.
It should be noted that a query must locate the data by both column and row; therefore, the dimension information to be queried includes a dimension name (identifying the column) and a dimension value (identifying the row).
In addition, the dimension information to be queried may contain a single dimension name, such as dimension A, or a combination of several dimension names, such as dimension A and dimension B; the specific number of dimension names is determined according to the actual requirements of the user.
Continuing the above example, user A needs to query the data corresponding to dimension A = A1, and therefore sends, through a terminal, a query request carrying the dimension information dimension A = A1 to the data warehouse.
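One way to represent the dimension information of a query request is a mapping from dimension name to dimension value. In the hypothetical Python sketch below, `QueryRequest` and `dimension_combination` are illustrative names; sorting the dimension names is an assumption made so that "dimension A and dimension B" and "dimension B and dimension A" produce the same combination key.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class QueryRequest:
    """Hypothetical query request carrying the dimension information to be queried."""
    user: str
    # Dimension name (which column) -> dimension value (which row).
    dimensions: Dict[str, str] = field(default_factory=dict)

    def dimension_combination(self) -> Tuple[str, ...]:
        """Sorted dimension names, used as the key of the dimension combination."""
        return tuple(sorted(self.dimensions))


single = QueryRequest(user="A", dimensions={"A": "A1"})
multi = QueryRequest(user="A", dimensions={"B": "B1", "A": "A2"})
print(single.dimension_combination())  # ('A',)
print(multi.dimension_combination())   # ('A', 'B')
```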
S102: Query, according to the dimension information, the data corresponding to the dimension information in a pre-established subcube table.
After receiving the query request that is sent by the user and carries the dimension information to be queried, the data warehouse queries the pre-established subcube table directly according to the dimension name and dimension value contained in the dimension information, to find the data corresponding to the dimension information.
It should be noted that the data corresponding to the dimension information may include the fact value.
Continuing the above example, after receiving the query request sent by user A, the data warehouse queries the pre-established subcube table (Table 6) directly according to dimension A = A1 contained in the dimension information, and finds that the data corresponding to the dimension information is the fact value 2.
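Step S102 can be sketched as a dictionary lookup, assuming each pre-established subcube table is stored in memory under the sorted tuple of its dimension names. `SUBCUBES` and `query_subcube` are hypothetical names, and the sketch assumes a subcube table answers a query only when its dimension names exactly match those in the request; the application does not specify how partial matches are handled.

```python
from typing import Dict, Optional, Tuple

# Hypothetical store: sorted dimension names -> subcube table, where each
# subcube table maps a tuple of dimension values to a merged fact value.
SUBCUBES: Dict[Tuple[str, ...], Dict[Tuple[str, ...], int]] = {
    ("A",): {("A1",): 2, ("A2",): 2},  # Table 6
}


def query_subcube(dimensions: Dict[str, str]) -> Optional[int]:
    """Return the fact value stored for the requested dimension values, or None
    when no subcube table has exactly these dimension names or the value is absent."""
    names = tuple(sorted(dimensions))
    subcube = SUBCUBES.get(names)
    if subcube is None:
        return None
    return subcube.get(tuple(dimensions[name] for name in names))


print(query_subcube({"A": "A1"}))  # 2
print(query_subcube({"B": "B1"}))  # None
```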
S103: when the data corresponding to the dimension information is inquired, returning the data to the user; and when the data corresponding to the dimension information is not inquired, inquiring the data corresponding to the dimension information in a pre-established cube table, returning the data to the user, and collecting the dimension name contained in the dimension information as a whole.
And when the data corresponding to the dimension information is inquired in a pre-established subtube table, returning the data to the user.
Continuing to use the above example, the data corresponding to the dimension information is queried as follows: the fact value of 2 is returned to the user A.
However, in the process of pre-establishing a subube, the name of the dimension to be queried is determined according to the actual requirement of the user, and the user is determined according to experience or historical data only when determining the actual requirement, so that due to the limitation of experience or historical data, in practical application, the dimension required by the user in the query may not be in the pre-established subube table, and therefore, when the data corresponding to the dimension information is not queried, the data corresponding to the dimension information can only be queried in the pre-established cube table, and the data is returned to the user.
Further, in practical application, although the data warehouse does not query the data corresponding to the dimension information in the pre-established subcube table according to the dimension information to be queried sent by the user, it can also be described that a subsequent user may have a tendency to query the dimension information to be queried, and when the current user queries the dimension information, although only a certain dimension value under the dimension name is queried, it is also described that the user may have a tendency to query other dimension values under the dimension name later.
For example, assuming that the user a in the above example queries data corresponding to the dimension B — B1 according to the actual demand, the data warehouse does not query data corresponding to the dimension information in the pre-established subube 6 table, and queries data corresponding to the dimension information in the pre-established cube table 4, that is, the fact value is: and 2, returning the data to the user A, and collecting the dimension B as a dimension combination.
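The hit-or-fallback behaviour of step S103, together with the collection of dimension combinations, might look like the following Python sketch. It assumes in-memory tables and a plain list as the collection log; `query`, `query_cube` and `collected_combinations` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Table 4 (the cube table): dimension-value mapping plus fact value per row.
CUBE_TABLE = [
    ({"A": "A1", "B": "B1", "C": "C1"}, 1),
    ({"A": "A1", "B": "B2", "C": "C2"}, 1),
    ({"A": "A2", "B": "B1", "C": "C1"}, 1),
    ({"A": "A2", "B": "B2", "C": "C2"}, 1),
]
# Pre-established subcube tables keyed by dimension combination (Table 6 only).
SUBCUBES: Dict[Tuple[str, ...], Dict[Tuple[str, ...], int]] = {
    ("A",): {("A1",): 2, ("A2",): 2},
}
# Dimension combinations collected on subcube misses (hypothetical log).
collected_combinations: List[Tuple[str, ...]] = []


def query_cube(dimensions: Dict[str, str]) -> int:
    """Sum the fact values of all cube rows matching every requested dimension value."""
    return sum(
        fact for row, fact in CUBE_TABLE
        if all(row.get(name) == value for name, value in dimensions.items())
    )


def query(dimensions: Dict[str, str]) -> int:
    names = tuple(sorted(dimensions))
    key = tuple(dimensions[name] for name in names)
    subcube = SUBCUBES.get(names)
    if subcube is not None and key in subcube:
        return subcube[key]  # hit: answer from the smaller subcube table
    # Miss: answer from the full cube table and collect the dimension combination.
    collected_combinations.append(names)
    return query_cube(dimensions)


print(query({"A": "A1"}))  # 2, served by Table 6
print(query({"B": "B1"}))  # 2, served by Table 4
print(collected_combinations)  # [('B',)]
```

The second call mirrors the example above: dimension B has no subcube table yet, so the fact value comes from Table 4 and the combination is recorded for the periodic judgment.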
Further, judging after every single collection whether a subcube table should be established for the dimension names in the collected dimension combination would waste considerable computing resources.
The specific process of determining whether to establish a subcube table according to the dimension names contained in the collected dimension combinations is as follows:
Within a specific time period, identical collected dimension combinations containing the dimension names are grouped into one group, and the number of times the dimension combination of each group was collected is counted. When the number of times a dimension combination was collected exceeds a preset threshold, a new subcube table is created, the data corresponding to the dimension names contained in the dimension combination is queried in the pre-established cube table, and that data is merged and added to the newly created subcube table. Here, the specific time period is the preset period over which collected combinations are accumulated before this judgment is made.
For example, assuming that the specific time period is one day, the dimension combinations collected during that day are shown in Table 7:
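The periodic judgment can be sketched with Python's `collections.Counter`. The sketch reads "exceeds" as strictly greater than, consistent with the example in which a count of 4 exceeds a threshold of 3; the log below is hypothetical data, not the contents of Table 7, and `combinations_to_materialize` is an illustrative name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def combinations_to_materialize(collected: Iterable[Tuple[str, ...]],
                                threshold: int) -> List[Tuple[str, ...]]:
    """Group identical dimension combinations collected in one period, count each
    group, and return the combinations whose count exceeds the threshold."""
    # Sorting makes ("B", "A") and ("A", "B") fall into the same group.
    counts = Counter(tuple(sorted(combo)) for combo in collected)
    return [combo for combo, times in counts.items() if times > threshold]


# Hypothetical one-day log of combinations collected on subcube misses.
period_log = [("B",), ("B",), ("A", "B"), ("C",), ("B",), ("B", "C"), ("B",)]
print(combinations_to_materialize(period_log, threshold=3))  # [('B',)]
```

Each returned combination would then be materialized from the cube table by projecting onto its dimensions and merging equal dimension values, and the log would be cleared for the next period.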
Dimension B
Dimension B
Dimension A and dimension B
Dimension C
Dimension B
Dimension B and dimension C
TABLE 7
Identical dimension combinations are grouped into one group, and the number of times the dimension combination of each group was collected is counted, as shown in Table 8:
Dimension combination | Number of times
Dimension B | 4
Dimension A and dimension B | 1
Dimension C | 1
Dimension B and dimension C | 1
TABLE 8
Assuming that the preset threshold is 3, the data warehouse determines that the number of times the dimension combination containing dimension B was collected (4) exceeds the preset threshold of 3. It therefore creates a new subcube table, queries the pre-established cube table (Table 4) for the data corresponding to dimension B, merges the data with the same value of dimension B, and adds the merged data to the newly created subcube table, as shown in Table 9:
Dimension B | Fact value
B1 | 2
B2 | 2
TABLE 9
With the above method, the subcube table is composed of some of the columns of the cube table, so rows sharing the same values in those columns are merged and the subcube table has fewer rows than the cube table. Because a user's query is first run against the pre-established subcube table, query efficiency can be effectively improved.
In practical applications, the data stored in advance in the cube table of the data warehouse may be updated. Because the subcube tables are established from the data in the cube table, an update to the cube table also changes the data the subcube tables should contain; therefore, in the present application, when the data in the cube table is updated, the data in the subcube tables also needs to be updated.
The present application provides a specific manner of updating the data in a pre-established subcube table, as follows: acquire the updated data in the pre-established cube table, perform dimensionality reduction on the acquired data, and update the pre-established subcube table with the dimension-reduced data.
Specifically, during dimensionality reduction, for each piece of acquired data, the subcube tables containing at least one dimension name corresponding to the data are determined. For each determined subcube table, the dimensions of the data are reduced to match the dimensions of that subcube table, and the subcube table is searched for data with the same dimension values as the data. When such data is found, the two are merged; when it is not found, the data is added directly to the subcube table.
For example, assume that the tables stored in the data warehouse are Table 4, Table 6 and Table 9, and that user A adds a row of data to Table 4, as shown in Table 10:
Dimension A | Dimension B | Dimension C | Fact value
A1 | B1 | C1 | 1
A1 | B2 | C2 | 1
A2 | B1 | C1 | 1
A2 | B2 | C2 | 1
A1 | B3 | C1 | 1
TABLE 10
The data warehouse acquires the updated data (the new row A1, B3, C1 with fact value 1) from Table 10 and determines the subcube tables containing at least one dimension name corresponding to the data, namely subcube Table 6 and subcube Table 9.
For subcube Table 6, the dimensions of the data are reduced to match those of Table 6, so the reduced data contains only dimension A (A1, fact value 1). Data with the same dimension value, A1, is found in Table 6 and merged with it, as shown in Table 11:
Dimension A | Fact value
A1 | 3
A2 | 2
TABLE 11
For subcube Table 9, the dimensions of the data are reduced to match those of Table 9, so the reduced data contains only dimension B (B3, fact value 1). No data with the same dimension value is found in Table 9, so the data is added directly to Table 9, as shown in Table 12:
Dimension B | Fact value
B1 | 2
B2 | 2
B3 | 1
TABLE 12
Based on the same idea, the embodiment of the present application further provides a query device for multidimensional analysis of mass data.
As shown in Fig. 2, a query device for multidimensional analysis of mass data provided in an embodiment of the present application comprises:
A receiving module 201, configured to receive a query request sent by a user and carrying dimension information to be queried, where the dimension information includes: dimension name and dimension value;
a query module 202, configured to query, according to the dimension information, data corresponding to the dimension information in a pre-established subcube table;
a data returning module 203, configured to return the data to the user when the data corresponding to the dimension information is found, and, when it is not found, to query the data corresponding to the dimension information in a pre-established cube table, return the data to the user, and collect the dimension names contained in the dimension information as a dimension combination, wherein the subcube table is composed of some of the columns of the cube table.
The device further comprises:
a pre-establishing module 204, configured to obtain in advance, from a user, query information that carries a dimension name to be queried, query data corresponding to the dimension name according to the dimension name, and establish a subcube table according to the dimension name and the data corresponding to the dimension name.
The device further comprises:
The first updating module 205 is configured to, when data in the pre-established cube table is updated, obtain the updated data in the pre-established cube table, perform dimension reduction on the obtained data, and update the dimension-reduced data to the pre-established subcube table.
The first updating module 205 is specifically configured to: for each piece of acquired data, determine the subcube tables containing at least one dimension name corresponding to the data; for each determined subcube table, reduce the dimensions of the data to match the dimensions of that subcube table and search the subcube table for data with the same dimension values as the data; when such data is found, merge the two, and when it is not found, add the data directly to the subcube table.
The device further comprises:
a second updating module 206, configured to, within a specific time period, group identical collected dimension combinations containing the dimension names into one group, count how many times the dimension combination of each group was collected, and, when that number exceeds a preset threshold, create a new subcube table, query the pre-established cube table for the data corresponding to the dimension names contained in the dimension combination, merge that data, and add the merged data to the newly created subcube table.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-permanent memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.