CN106484875B - MOLAP-based data processing method and device - Google Patents

Info

Publication number
CN106484875B
Authority
CN
China
Prior art keywords
data
calculation
dimension
open source
source database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610893549.7A
Other languages
Chinese (zh)
Other versions
CN106484875A (en
Inventor
李寅威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Original Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd filed Critical Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority to CN201610893549.7A priority Critical patent/CN106484875B/en
Publication of CN106484875A publication Critical patent/CN106484875A/en
Application granted granted Critical
Publication of CN106484875B publication Critical patent/CN106484875B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/283Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

The invention discloses a data processing method and device based on MOLAP. The data processing method comprises the following steps: creating a data cube according to the fact table and the dimension table; performing data pre-calculation on all possible combinations of dimensions based on the data recorded in the data cube; and storing the pre-calculation result into an open source database so as to determine a query result according to the pre-calculation result during query. By adopting the method, the existing data query scheme can be optimized, so that non-technical personnel can also realize query based on mass data.

Description

MOLAP-based data processing method and device
Technical Field
The embodiment of the invention relates to the technical field of data processing, in particular to a data processing method and device based on MOLAP.
Background
An Online Analytical Processing (OLAP) system is the most important application of a data warehouse system. It is designed to support complex analysis operations and emphasizes decision support for decision-makers and senior managers. OLAP can perform complex query processing over large data volumes according to the requirements of analysts and present the query results to decision-makers in an intuitive form, so that the decision-makers can accurately grasp the operating conditions of the enterprise (company), understand the needs of its customers, and make correct plans.
According to the data storage format used, OLAP systems can be classified into three types: Relational OLAP (ROLAP), Multidimensional OLAP (MOLAP), and Hybrid OLAP (HOLAP). MOLAP physically stores the multidimensional data used in OLAP analysis in the form of a multidimensional array, forming a cube structure.
The traditional MOLAP engine is limited by software and hardware resources: it can only process data at the gigabyte level or below 10 terabytes, and it places high demands on server configuration when calculating the data of a multidimensional cube. Meanwhile, when mass data is queried in real time based on MOLAP, a structured query language scheme based on a distributed system infrastructure (SQL on Hadoop) is often adopted. On the one hand, the latency of such a scheme can be several seconds, tens of seconds, or even tens of minutes. On the other hand, some column-oriented databases usually support fast queries only by row key, and column-level queries can be used only in specific query scenarios. In addition, querying requires writing SQL statements, so non-technical personnel cannot perform queries.
Disclosure of Invention
In view of this, embodiments of the present invention provide a MOLAP-based data processing method and device, so as to optimize the existing data query scheme and enable non-technical personnel to query mass data as well.
In a first aspect, an embodiment of the present invention provides a data processing method based on a MOLAP, including:
creating a data cube according to the fact table and the dimension table;
performing data pre-calculation on all possible combinations of dimensions based on the data recorded in the data cube;
and storing the pre-calculation result into an open source database so as to determine a query result according to the pre-calculation result during query.
In a second aspect, an embodiment of the present invention further provides a data processing apparatus based on MOLAP, including:
the cube creating module is used for creating a data cube according to the fact table and the dimension table;
the pre-calculation module is used for performing data pre-calculation on all possible combinations of the dimensions based on the data recorded in the data cube;
and the storage module is used for storing the pre-calculation result into the open source database so as to determine the query result according to the pre-calculation result during query.
According to the MOLAP-based data processing method and device provided by the embodiments of the invention, the data cube is created according to the fact table and the dimension table, data pre-calculation is carried out on all possible dimension combinations according to the data recorded in the data cube, and the pre-calculation result is stored in the open source database. When a user performs a data query, the user only needs to drag the desired dimensions and metrics in the page of the client, and the server can determine the query result according to the corresponding pre-calculation result, so the user does not need to write SQL statements. Meanwhile, the characteristics of the big data components and of MOLAP are fully utilized, the data query process is simplified, and the query response speed is improved.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 is a flowchart of a data processing method based on MOLAP according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a data processing method based on MOLAP according to a second embodiment of the present invention;
FIG. 3 is a diagram of the hierarchy relationships of all possible combinations of dimensions of a data cube during data pre-calculation;
FIG. 4 is a flowchart of a method for creating an open source database table according to the second embodiment of the present invention;
FIG. 5 is a flowchart of a data query method according to the second embodiment of the present invention;
FIG. 6 is a schematic diagram of a user interface for client query;
FIG. 7 is a schematic structural diagram of a data processing apparatus based on MOLAP according to a third embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some but not all of the relevant aspects of the present invention are shown in the drawings.
Example one
FIG. 1 is a flowchart of a data processing method based on MOLAP according to a first embodiment of the present invention. The method provided by this embodiment can be executed by a data processing device based on MOLAP, and the device can be implemented by software and/or hardware and is integrated in a server. Referring to FIG. 1, the method provided in this embodiment may include:
S110, creating a data cube according to the fact table and the dimension table.
The fact table and the dimension tables are stored in a data warehouse of the big data platform. A dimension table stores the values of all attributes in the dimension and the ID of each record. For example, the sales region dimension includes the region ID, the province of the region, the city of the region, the county of the region, and so on. Taking the river district of Guangzhou as a value under the sales region dimension, the record of the river district in the dimension table is: 01 (i.e., the value ID), Guangdong Province, Guangzhou City, river district. The fact table stores fact data, including the ID of the value of each dimension, the sales amount, and the like. For example, the fields of the sales fact table include, for each sales record: the sales region ID (the ID corresponding to the sales region dimension table), the product ID (the ID corresponding to the product dimension table), the sales time ID (the ID corresponding to the time dimension table), the sales quantity, the sales amount, and the like. Based on the IDs attached to a given sales quantity, it is possible to determine in which region, for which product, and at what time those sales occurred.
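As a minimal sketch of how the IDs in a fact record resolve against the dimension tables, the following Python fragment models each dimension table as a dictionary keyed by value ID. All table contents, field names, and the helper `describe_sale` are hypothetical and chosen only for illustration; they are not taken from the embodiment.

```python
# Hypothetical dimension tables: value ID -> attributes of that value.
region_dim = {"01": {"province": "Guangdong", "city": "Guangzhou"}}
product_dim = {"P1": {"name": "sports shoes"}}
time_dim = {"T1": {"quarter": "Q1"}}

# Hypothetical fact table: each record stores dimension value IDs plus measures.
sales_fact = [
    {"region_id": "01", "product_id": "P1", "time_id": "T1",
     "quantity": 3, "amount": 450.0},
]

def describe_sale(record):
    """Resolve the IDs in one fact record against the dimension tables."""
    return {
        "city": region_dim[record["region_id"]]["city"],
        "product": product_dim[record["product_id"]]["name"],
        "quarter": time_dim[record["time_id"]]["quarter"],
        "quantity": record["quantity"],
        "amount": record["amount"],
    }

resolved = describe_sale(sales_fact[0])
```

The fact record itself carries only IDs and measures; the descriptive attributes are looked up in the dimension tables when needed.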
Optionally, the fact table and the dimension table are created in advance, wherein specific creating rules, such as which dimensions are selected to create the dimension table and which dimension tables are created, can be set according to actual situations. In general, the number of dimension tables is not limited, and the number of fact tables is one.
Typically, after creating the fact table and the dimension table, data is obtained from an external database accessible by the current server and is all imported into the corresponding fact table and dimension table.
In particular, a data cube may also be referred to as a multidimensional cube, and it contains at least one configured dimension. For example, a data cube may include: a region dimension (with the values Beijing, Tianjin, Shanghai and Guangdong), a time dimension (with the values first quarter, second quarter, third quarter and fourth quarter), a commodity dimension (with the values sportswear, kettle, sports shoes, sunshade umbrella and sunshade cap), and a user age dimension (with the values 0-18 years old, 19-40 years old, 41-60 years old and over 60 years old). The fact data corresponding to each dimension (including sales amount and sales quantity) is stored in the associated fact table. That is, the data cube comprises four dimension tables and one fact table, which are associated through the IDs of the dimension values recorded in both the dimension tables and the fact table. The fact data can be determined from the data cube. For example, the data cube can be used to determine the sales amount of sportswear in the Beijing region in the first quarter, the sales amount of kettles in the Tianjin region, or the sales amount of sports shoes bought by consumers aged 19-40 in the Shanghai region in the fourth quarter.
Optionally, at least one data cube may be created according to the fact table and the associated dimension table, wherein a specific creation rule may be set according to an actual situation.
S120, performing data pre-calculation on all possible combinations of the dimensions based on the data recorded in the data cube.
The data cube can comprise different dimensions, and the dimensions can be combined arbitrarily. The data cube exemplified in step S110 includes four dimensions, so there are 16 possible combinations of the dimensions in total. The number may be calculated as $C_4^0 + C_4^1 + C_4^2 + C_4^3 + C_4^4 = 16$. Taking $C_4^3$ as an example, the corresponding dimension combinations are (region dimension, time dimension, commodity dimension), (region dimension, commodity dimension, user age dimension), (region dimension, time dimension, user age dimension), and (time dimension, commodity dimension, user age dimension). Of course, when the data cube contains only one dimension, the only possible combinations are the combination containing that dimension and the empty set.
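The count of 16 can be checked by enumerating the subsets directly. The sketch below uses Python's standard `itertools.combinations`; the dimension names and the helper `all_dimension_combinations` are illustrative assumptions rather than identifiers from the embodiment.

```python
from itertools import combinations

# Illustrative names for the four dimensions of the example data cube.
dimensions = ["region", "time", "commodity", "user_age"]

def all_dimension_combinations(dims):
    """Return every subset of dims, from the empty set to the full set."""
    result = []
    for size in range(len(dims) + 1):
        result.extend(combinations(dims, size))
    return result

combos = all_dimension_combinations(dimensions)
three_dim = [c for c in combos if len(c) == 3]
```

Here `combos` has 16 entries, including the empty tuple for the empty set, and `three_dim` holds the four three-dimension combinations listed above.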
Further, after all possible combinations of the data cube are determined, data pre-calculation is performed on the fact data corresponding to each combination to obtain the pre-calculation result of each combination. Optionally, data pre-calculation is performed according to the determined metric. In this embodiment, the metric is a requirement set on how the data is pre-calculated or how a result is calculated, and preferably includes an aggregation type (the aggregation type includes an aggregation maximum value and/or an aggregation minimum value). Assuming that the data cube includes a region dimension and a time dimension and that the aggregation type is the aggregation maximum, the data pre-calculation determines the aggregated maximum of the fact data corresponding to (region dimension, time dimension), (region dimension), (time dimension), and the empty set $\varnothing$. Taking (region dimension) as an example, the data pre-calculation obtains the aggregated maximum of the fact data in the fact table corresponding to each of Beijing, Tianjin, Shanghai and Guangdong in the region dimension.
Therefore, data pre-calculation can be carried out on all possible dimension combinations corresponding to the data cube.
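The following is a minimal in-memory sketch of this pre-calculation for a cube with a region dimension and a time dimension, using the aggregation maximum. The fact rows and the function `precompute_max` are hypothetical; a real deployment would read the rows from the data warehouse rather than from a list.

```python
from itertools import combinations

# Hypothetical fact rows: (region, quarter, sales_amount).
rows = [
    ("Beijing", "Q1", 120),
    ("Beijing", "Q2", 300),
    ("Shanghai", "Q1", 250),
    ("Shanghai", "Q2", 90),
]
dims = ("region", "time")

def precompute_max(rows, dims):
    """Aggregate the maximum measure for every subset of dims."""
    results = {}
    for size in range(len(dims) + 1):
        for combo in combinations(range(len(dims)), size):
            names = tuple(dims[i] for i in combo)
            table = results.setdefault(names, {})
            for row in rows:
                key = tuple(row[i] for i in combo)   # values of the kept dimensions
                measure = row[-1]
                table[key] = max(table.get(key, measure), measure)
    return results

cube = precompute_max(rows, dims)
```

With these rows, `cube[("region",)]` maps Beijing to 300 and Shanghai to 250, and the entry for the empty combination `()` holds the overall maximum.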
S130, storing the pre-calculation result into an open source database so as to determine a query result according to the pre-calculation result during query.
In this embodiment, the open source database is preferably HBase. HBase is a distributed storage system, and a large-scale unstructured storage cluster can be built on ordinary servers by using HBase.
Further, when querying, the user only needs to select the required dimension combination and metric at the client; the server then looks up the corresponding pre-calculation result in HBase and feeds it back to the client.
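The query path can be sketched as a plain lookup. In the fragment below a Python dictionary stands in for the open source database; the storage layout, the stored values, and the function `query` are assumptions made for illustration and do not reflect an actual database client API.

```python
# Hypothetical store of pre-calculation results, standing in for the open
# source database: (dimension combination, metric) -> {dimension values: result}.
precomputed = {
    (("region",), "max"): {("Beijing",): 300, ("Shanghai",): 250},
    (("region", "time"), "max"): {("Beijing", "Q1"): 120, ("Beijing", "Q2"): 300},
}

def query(dimensions, metric, values=None):
    """Return stored results for the selected dimensions and metric.

    No aggregation happens here; the answer is read from the stored results.
    """
    table = precomputed[(tuple(dimensions), metric)]
    if values is None:
        return dict(table)
    return table[tuple(values)]

by_region = query(["region"], "max")
beijing_q2 = query(["region", "time"], "max", ["Beijing", "Q2"])
```

Because the selection made at the client maps directly to a stored entry, no SQL statement has to be written or executed at query time.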
According to the technical scheme provided by this embodiment, the data cube is created from the fact table and the associated dimension tables, data pre-calculation is performed on all possible combinations of dimensions in the data cube, and the pre-calculation results are stored in the open source database, so that calculation and storage of big data based on MOLAP can be achieved. Meanwhile, when querying, a user only needs to drag the required dimensions and metrics in a page of the client without writing an SQL statement, and the server can determine the query result according to the corresponding pre-calculation result, so that the data query process is simplified and the query response speed is improved.
Example two
FIG. 2 is a flowchart of a data processing method based on MOLAP according to a second embodiment of the present invention. The data processing method provided in this embodiment is optimized based on the data processing method provided in the above embodiment. Referring to FIG. 2, the data processing method provided in this embodiment specifically includes:
S210, creating a corresponding fact table and a corresponding dimension table according to the table entry requirements of the fact table and the dimension table in the preset data analysis model.
Specifically, the preset data analysis model includes an entry requirement for a fact table and a dimension table, where the entry requirement may include a dimension value and a dimension hierarchy of the dimension table, and a fact data attribute (such as sales amount) of the fact table. In addition, the data analysis model may also include a metric (such as an aggregate maximum) at the time of data pre-calculation. The specific content in the data analysis model can be set according to actual conditions. In general, the data analysis model can be regarded as a plan of the data processing method in the present embodiment, that is, the data processing method in the present embodiment is executed according to the data analysis model.
Illustratively, a fact table and a dimension table associated with the fact table are created according to the entry requirements.
S220, importing the data in the external database into the fact table and the dimension table according to the entry requirements of the fact table and the dimension table.
Specifically, the external database is accessed according to the entry requirements, and the corresponding data is retrieved from the external database and imported into the fact table and the dimension table. The external database may include the database of each related business management system, an external database of the enterprise, and the like.
Further, when importing data into the fact table and the dimension table, the data warehouse may determine whether a user sets a specific data format, convert the original format of the data into the specific data format and import the data into the fact table and the dimension table if the specific data format is set, and import the data into the fact table and the dimension table according to the original format of the data if the specific data format is not set.
S230, creating a data cube by using the fact table and the dimension table according to the metadata in the data analysis model.
Illustratively, metadata is included in the data analysis model, wherein the metadata is used to indicate attribute parameters and creation rules of the data cube. Optionally, the metadata may include at least one of: the creation time of the data cube, the creation position of the data cube, the name of the data cube, the fact table of the data cube, the dimensions of the data cube and their order, the metrics of the data cube, the aggregation type, configuration information corresponding to the programming model, configuration information corresponding to the open source database, the pre-calculation time, storage information of the pre-calculation results, and the like. The configuration information corresponding to the programming model informs the programming model, when it is started, of the amount of memory occupied by the data cube. The configuration information corresponding to the open source database records the correspondence between the pre-calculation result of the data cube and the open source database table. The storage information of the pre-calculation result includes the order of the dimension codes that make up the row key under which the pre-calculation result is stored in the open source database (for example, if dimension A and dimension B exist and dimension A has three values 1, 2 and 3, the storage information records the order in which the values 1, 2 and 3 are coded and the order in which dimension A and dimension B are coded when the corresponding pre-calculation result is stored in the open source database). Optionally, the configuration information corresponding to the open source database, the pre-calculation time, and the storage information of the pre-calculation result may be updated after the pre-calculation result is stored.
The basic information of the data cube at the time of creation, such as its creation time, creation position, name, fact table, dimensions and their order, and metrics, can be determined according to the metadata. Optionally, the basic information in the metadata is stored in HBase. That is, HBase may be queried whenever basic information of the data cube is needed.
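To illustrate why the order of the dimension codes matters for the row key, the sketch below concatenates fixed-width codes in a configured order. The fixed-width big-endian layout, the width of two bytes, and the function `make_row_key` are assumptions made for this illustration; the embodiment does not specify the byte layout.

```python
def make_row_key(codes, dimension_order, width=2):
    """Concatenate fixed-width dimension codes in the configured order."""
    return b"".join(codes[name].to_bytes(width, "big") for name in dimension_order)

# Hypothetical encoded values of dimensions A and B for one pre-calculation result.
codes = {"A": 1, "B": 7}
key_ab = make_row_key(codes, ["A", "B"])
key_ba = make_row_key(codes, ["B", "A"])
```

The same pair of codes yields different keys under different orders, which is why the order has to be recorded in the metadata and applied consistently when writing and when querying.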
S240, starting a pre-calculation programming model task according to metadata in the data analysis model, and reading data of all dimension tables and fact tables corresponding to the data cube.
Specifically, a task corresponding to the pre-computed programming model is created and started based on the metadata. In the embodiment, the precomputation programming model is MapReduce, which can realize parallel operation of large-scale data sets.
Optionally, the data of all the dimension tables and the fact tables corresponding to the data cubes in the data platform are read through the precomputation programming model.
S250, arranging and combining the dimensions of all the dimension tables to obtain all possible combinations, including the empty set.
S260, carrying out an aggregation operation on the combination containing all dimensions according to the set aggregation rule to obtain an aggregation value.
For example, if the data cube includes 4 dimensions, represented by A, B, C and D, respectively, the combination including all the dimensions is (A, B, C, D).
Optionally, the aggregation rule may include an aggregation maximum and/or an aggregation minimum. Taking the aggregation maximum as an example, the aggregation operation uses the aggregation function to return the maximum value MAX(M) of the fact data corresponding to (A, B, C, D), and MAX(M) is taken as the aggregation value.
Optionally, dictionary coding may be performed on the values corresponding to the dimensions, and a coded aggregate value including a combination of all the dimensions is calculated.
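Dictionary coding can be sketched as mapping each distinct dimension value to a small integer. The scheme below, which assigns codes in sorted order, and the helper names `build_dictionary` and `encode_row` are illustrative assumptions; the embodiment does not prescribe a particular coding scheme.

```python
def build_dictionary(values):
    """Assign a small integer code to each distinct value, in sorted order."""
    return {value: code for code, value in enumerate(sorted(set(values)))}

def encode_row(row, dictionaries):
    """Replace each dimension value in row with its integer code."""
    return tuple(d[v] for d, v in zip(dictionaries, row))

# Hypothetical dimension values for two dimensions.
regions = ["Beijing", "Tianjin", "Shanghai", "Guangdong"]
quarters = ["Q1", "Q2", "Q3", "Q4"]

region_codes = build_dictionary(regions)
quarter_codes = build_dictionary(quarters)
encoded = encode_row(("Shanghai", "Q3"), [region_codes, quarter_codes])
```

Working with compact integer codes instead of the original strings keeps the keys short during aggregation and storage.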
S270, taking the combination containing all dimensions as the key input of the pre-calculation programming model, and taking the aggregation value as the value input of the pre-calculation programming model.
Specifically, the pre-calculation programming model calculates level by level. In the calculation, the combination containing all dimensions is used as the key input of the pre-calculation programming model, and the aggregation value is used as the value input of the pre-calculation programming model.
S280, obtaining a new dimension combination and an aggregation value corresponding to the new dimension combination by utilizing the pre-calculation programming model.
Because a level-by-level calculation mode is adopted, each new dimension combination obtained through calculation is an upper-level combination of the combination containing all dimensions (that is, it contains one dimension fewer), and the aggregation value is the aggregation value of the new dimension combination.
For example, FIG. 3 is a diagram of the hierarchy relationships of all possible combinations of dimensions of a data cube during data pre-calculation. Referring to FIG. 3, it may be determined that the current data cube contains four dimensions, A, B, C and D. The bottom level is the dimension combination whose aggregation value is calculated first, that is, the combination including all dimensions; the calculation then proceeds upward level by level according to the hierarchical relationship in FIG. 3 until the empty set is calculated.
S290, judging whether the data pre-calculation of all the possible combinations is finished. If so, perform S2120; otherwise, perform S2100.
And determining whether to complete the data pre-calculation of all possible dimension combinations by judging whether to complete the data pre-calculation of the empty set.
S2100, taking the new dimension combination as the key input of the pre-calculation programming model, and taking the aggregation value corresponding to the new dimension combination as the value input of the pre-calculation programming model.
For example, if the data pre-calculation for all possible combinations is not completed, the new dimension combination obtained by the pre-calculation programming model in the previous calculation may be used as the key input, and the corresponding aggregation value may be used as the value input, so as to continue the data pre-calculation.
S2110, obtaining a new dimension combination and an aggregation value corresponding to the new dimension combination by utilizing the pre-calculation programming model. Return to S290.
Further, after the pre-calculation result is obtained, the pre-calculation time may be saved in the metadata.
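The loop formed by S260 to S2110 can be sketched in plain Python as a level-by-level roll-up. This is not MapReduce code: the functions `roll_up` and `precompute` are hypothetical, and the aggregates given for the full combination (A, B, C, D) are invented values, chosen only so that the rolled-up results agree with the values shown in Tables 2 to 4 of this description.

```python
from itertools import combinations

DIMS = ("A", "B", "C", "D")

# Hypothetical aggregates for the full combination (A, B, C, D):
# dimension values -> (MAX(M), MIN(N)).
base = {
    (1, 2, 3, 4): (300, 80),
    (1, 2, 3, 6): (120, 50),
    (1, 2, 4, 3): (500, 60),
    (1, 2, 4, 8): (200, 20),
}

def roll_up(child_dims, child_table, parent_dims):
    """Aggregate a child combination into a parent with one dimension fewer."""
    keep = [child_dims.index(d) for d in parent_dims]
    parent = {}
    for key, (mx, mn) in child_table.items():   # key input, value input
        pkey = tuple(key[i] for i in keep)
        old = parent.get(pkey)
        parent[pkey] = (mx, mn) if old is None else (max(old[0], mx), min(old[1], mn))
    return parent

def precompute(dims, base_table):
    """Compute every combination level by level, ending at the empty set."""
    cube = {dims: base_table}
    level = [dims]
    while level and level[0]:             # stop once the empty set is computed
        next_level = []
        for parent in combinations(dims, len(level[0]) - 1):
            # Any combination on the current level that contains the parent works.
            child = next(c for c in level if set(parent) <= set(c))
            cube[parent] = roll_up(child, cube[child], parent)
            next_level.append(parent)
        level = next_level
    return cube

cube = precompute(DIMS, base)
```

Rolling up works here because a maximum of maxima and a minimum of minima equal the maximum and minimum over the underlying rows, so each level can be derived from the level below it without rereading the fact data.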
The following is an exemplary description of the data pre-calculation process for a data cube.
The data cube is set to include 4 dimensions, A, B, C and D, where the value of dimension A is 1, the value of dimension B is 2, the values of dimension C are (3, 4), and the values of dimension D are (4, 6, 3, 8). That is, the data cube contains 4 dimension tables and 1 fact table in total.
Further, based on the metadata in the data analysis model, a pre-calculation programming model task is started, and the 16 possible dimension combinations of the data cube are determined. The set aggregation rules are an aggregation maximum and an aggregation minimum. After dictionary coding is carried out on the values corresponding to each dimension, data pre-calculation is carried out on the dimension combination (A, B, C, D) to obtain the aggregation maximum MAX(M) and the aggregation minimum MIN(N), which completes the first-stage calculation.
TABLE 1
The aggregation results for the dimension combination (A, B, C, D) can be determined from Table 1. Here (A, B, C, D) is used as the key input of MapReduce, and MAX(M) and MIN(N) are used as the value input of MapReduce, so that the new dimension combinations (A, B, C), (A, B, D), (A, C, D) and (B, C, D) and the aggregation values corresponding to them can be obtained, which completes the second-stage calculation.
By way of example only, Table 2 lists the data pre-calculation results of the dimension combination (A, B, C).
TABLE 2
A    B    C    MAX(M)    MIN(N)
1    2    3    300       50
1    2    4    500       20
Further, the new dimension combinations obtained by MapReduce in the second-stage calculation, that is, (A, B, C), (A, B, D), (A, C, D) and (B, C, D), are used as the key input, and the MAX(M) and MIN(N) corresponding to each dimension combination are used as the value input, so that new dimension combinations and the aggregation value corresponding to each new dimension combination can be obtained, which completes the third-stage calculation. The new dimension combinations obtained are (A, B), (A, C), (A, D), (B, C), (B, D) and (C, D).
By way of example only, Table 3 lists the data pre-calculation results of the dimension combination (A, B).
TABLE 3
A    B    MAX(M)    MIN(N)
1    2    500       20
Further, the new dimension combinations (A, B), (A, C), (A, D), (B, C), (B, D) and (C, D) obtained by MapReduce in the third-stage calculation are used as the key input, and the MAX(M) and MIN(N) corresponding to each dimension combination are used as the value input, so that new dimension combinations and the aggregation values corresponding to them can be obtained, which completes the fourth-stage calculation. The new dimension combinations obtained in this stage are (A), (B), (C) and (D).
By way of example only, Table 4 lists the data pre-calculation results of the dimension combination (A).
TABLE 4
A    MAX(M)    MIN(N)
1    500       20
Further, the new dimension combinations (A), (B), (C) and (D) obtained by MapReduce in the fourth-stage calculation are used as the key input, and the MAX(M) and MIN(N) corresponding to each dimension combination are used as the value input, so that a new dimension combination and its corresponding aggregation value can be obtained, which completes the last stage of the calculation. The new dimension combination obtained in this stage is the empty set $\varnothing$.
Table 5 lists the data pre-calculation result of the dimension combination $\varnothing$. After the result of the data pre-calculation of the empty set is obtained, the pre-calculation of the data cube is confirmed to be completed.
TABLE 5
S2120, creating an open source database table for storing pre-calculation results.
When the pre-calculation result is stored in the open source database, the pre-calculation result is stored in a logical open source database table. Therefore, an open source database table for storing the pre-calculation result is created in advance. And the pre-calculation results of different data cubes correspond to different open source database tables.
Wherein, referring to fig. 4, the step may include:
S2121, determining the capacity of a storage area set for the open source database table in the open source database.
The storage area is a logical storage area. Data in the open source database is queried by querying the open source database table. As the amount of data stored in the open source database table grows, querying the data becomes slower. In that case, the open source database table may be partitioned, i.e., divided into a plurality of logical storage areas. After the open source database table is partitioned, it still appears as one complete table; only when data is queried is the query directed to the relevant storage area within the table.
Typically, the open source database presets the capacity of each storage area of each open source database table during partitioning. Usually, this capacity is set to remain as constant as possible. For example, the capacity of the storage area is set to 256M, that is, each storage area can store up to 256M of data.
S2122, determining the quantity value of the storage areas required when the pre-calculation result is stored in the open source database table according to the size of the pre-calculation result.
For example, if the size of the pre-calculation result is 3G and the storage area capacity is 256M, then dividing 3G by 256M yields 12 (3 × 1024 ÷ 256 = 12), i.e., it is determined that the number of required storage areas is 12.
Optionally, when the number of storage areas required by the pre-calculation result is calculated and the result is not an integer, the result is rounded up. For example, if the size of the pre-calculation result is 3.1G and the capacity of each storage area is 256M, dividing 3.1G by 256M gives 12.4, and rounding 12.4 up gives 13. That is, the number of storage areas corresponding to the pre-calculation result is 13.
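The calculation of the number of storage areas amounts to a ceiling division. A minimal sketch follows, assuming sizes are expressed in megabytes with 1G = 1024M as in the examples above; the function name `storage_area_count` is hypothetical.

```python
import math

def storage_area_count(result_size_mb, area_capacity_mb=256):
    """Number of storage areas needed, rounding any remainder up."""
    return math.ceil(result_size_mb / area_capacity_mb)

exact = storage_area_count(3 * 1024)        # 3G of pre-calculation results
rounded = storage_area_count(3.1 * 1024)    # 3.1G of pre-calculation results
```

For the two examples in the text, `exact` is 12 and `rounded` is 13.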
S2123, sending the quantity value to the open source database, so that the open source database creates an open source database table for storing the pre-calculation result according to the quantity value.
Further, after the quantity value of the storage areas required by the pre-calculation result is determined, the quantity value is sent to the open source database; that is, the open source database is notified of the size of the open source database table required to store the current pre-calculation result.
Specifically, after the open source database receives the quantity value, a corresponding open source database table is created, and the open source database table includes a storage area of the quantity value.
For example, if it is determined that 12 storage areas are needed for the current pre-calculation result, the open source database creates an open source database table, wherein the open source database table actually includes 12 storage areas.
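One way to picture how a table can be created with a given number of storage areas is to compute the boundary keys between them up front. The sketch below is an assumption for illustration only: it supposes that row keys begin with a two-character hexadecimal prefix spread evenly over 00 to ff, which the original text does not state. The helper name `split_keys` is hypothetical.

```python
from typing import List


def split_keys(num_areas: int, key_space: int = 256) -> List[str]:
    """Return num_areas - 1 evenly spaced boundary keys over a one-byte hex prefix.

    Assumption: row keys start with a two-character hex prefix in 00..ff.
    """
    if not 1 <= num_areas <= key_space:
        raise ValueError("num_areas must be between 1 and key_space")
    return [format(i * key_space // num_areas, "02x") for i in range(1, num_areas)]


# Four storage areas need three boundaries; twelve need eleven.
print(split_keys(4))
print(len(split_keys(12)))
```

A table with N storage areas needs N − 1 boundaries, so the 12 storage areas in the example above correspond to 11 boundary keys.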
And S2130, starting a storage programming model task, and taking a pre-calculation result as the input of the storage programming model task.
Specifically, the storage programming model task is started according to the metadata. In the present embodiment, the storage programming model is MapReduce. Wherein, the precomputation result can be used as the input of MapReduce.
S2140, generating a corresponding binary format file by using the storage programming model.
Further, the result output by the storage programming model is a file in the default binary format of the open source database, i.e., a file in HFile format. The advantage is that the performance impact of inserting pre-calculation results into the open source database one by one is avoided.
S2150, importing the binary format file into the open source database table by using the BulkLoad of the open source database, so that the pre-calculation result is stored in the open source database.
Specifically, when the binary format file is imported into the open source database table, the binary format file can be simultaneously imported into the storage area corresponding to the open source database table by using the BulkLoad of the open source database.
For example, suppose the binary format file contains a 1G pre-calculation result and the corresponding open source database table includes 4 storage areas. With BulkLoad, importing the first 256M of the pre-calculation result into the 1st storage area, the next 256M into the 2nd storage area, the next 256M into the 3rd storage area, and the last 256M into the 4th storage area can all be performed simultaneously.
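The simultaneous import described above can be imitated in memory. The sketch below is not the BulkLoad mechanism itself; it only illustrates the idea of cutting sorted records into contiguous slices and writing each slice to its own storage area concurrently. The names `chunk`, `load_area`, and `bulk_import`, and the use of a dictionary as the table, are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(records, num_areas):
    """Split already-sorted records into at most num_areas contiguous slices."""
    if not records:
        return []
    size = -(-len(records) // num_areas)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def load_area(area_id, rows, table):
    """Write one slice into its storage area and report how many rows it holds."""
    table[area_id] = list(rows)
    return area_id, len(rows)


def bulk_import(records, num_areas):
    """Sort the records, slice them, and load all slices concurrently."""
    table = {}
    slices = chunk(sorted(records), num_areas)
    if not slices:
        return table, []
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        results = list(pool.map(lambda p: load_area(p[0], p[1], table),
                                enumerate(slices)))
    return table, results


records = [("k%02d" % i, i) for i in range(8)][::-1]
table, results = bulk_import(records, 4)
print(results)
```

Each worker writes to a different storage area, so no two workers contend for the same slot, which mirrors the load-balancing benefit the embodiment describes.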
In this embodiment, partitioned storage of the open source database table has the following advantages: on the one hand, when the pre-calculation result is imported, a plurality of nodes can write simultaneously, and load balancing speeds up data writing; on the other hand, when data is queried, the query load can be distributed across the target nodes (i.e., the storage areas), which effectively avoids data skew and increases query speed.
S2160, storing the corresponding relation between the pre-calculation result and the open source database table into the metadata of the data analysis model.
Specifically, when the pre-calculation result is saved, the corresponding relationship between the pre-calculation result and the open source database table is saved in the configuration information of the metadata corresponding to the open source database. When data is inquired, the open source database table corresponding to the pre-calculation result can be determined according to the corresponding relation.
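The correspondence saved in S2160 can be thought of as a small lookup table kept inside the metadata. The sketch below assumes the metadata is a plain dictionary with a `storage` entry; this layout and the names `save_mapping` and `table_for` are hypothetical and are not specified in the original text.

```python
def save_mapping(metadata, result_id, table_name):
    """Record which open source database table holds a given pre-calculation result."""
    metadata.setdefault("storage", {})[result_id] = table_name


def table_for(metadata, result_id):
    """Return the table holding the result, or raise LookupError if none is recorded."""
    try:
        return metadata["storage"][result_id]
    except KeyError:
        raise LookupError("no table recorded for %r" % (result_id,)) from None


metadata = {"cube_name": "demo_cube"}  # hypothetical cube name
save_mapping(metadata, "demo_cube_result", "DEMO_CUBE_TABLE")
print(table_for(metadata, "demo_cube_result"))
```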
Optionally, the method may further include: managing the data cube and the metadata in the data analysis model at set time intervals.
All data cubes in the platform and their corresponding metadata can be managed at set time intervals. Management of a data cube may include modification, query, calculation, deletion, and so on.
For example, at each set interval, newly recorded data in the external database is imported into the fact table and the dimension table corresponding to the data cube, and the data cube, the corresponding pre-calculation result, and the storage location of the pre-calculation result are modified in light of the newly imported data. The modified information about the data cube is then stored in the corresponding metadata, which completes the modification and calculation of the data cube.
Optionally, the interval may be set according to the actual situation. For example, during the off-season, management of the data cube and metadata may be performed every two weeks; in the busy season, it is performed every other week.
The above steps are the query preparation process; the specific query process follows. Referring to FIG. 5, the data query method provided in this embodiment specifically includes:
S510, when a query request sent by the client is obtained, parsing the query request and converting it into an open source database query statement.
FIG. 6 is a schematic diagram of the user interface for client queries. The user may select the dimensions and metrics to be queried through the user interface. As shown in FIG. 6, the user drags dimension A code 1, dimension C1 level 1, dimension D1 level 1, and metric MAX1 (aggregated maximum value) to the column main key 61 as columns of the query; of course, the user may drag any dimensions and metrics of the user asset analysis model 1 according to actual needs. Specifically, the client generates a query request according to the user's selection and sends it to the server. Optionally, the user may also enter filter terms through the filter column 62 of FIG. 6 to filter the query request.
Further, when the server obtains the query request, it converts the query request into a query statement that the open source database can recognize.
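The conversion in S510 can be sketched as mapping the user's selection onto the dimension combination whose pre-calculation result should be read. The request layout, the returned statement structure, and the name `to_query_statement` are all assumptions for illustration; the original text does not specify the format of the query request or of the open source database query statement.

```python
def to_query_statement(request, cube_dimensions):
    """Turn a client selection into a lookup description for a pre-calculated combination.

    cube_dimensions gives the dimension order recorded for the data cube; the
    selected dimensions are normalized to that order so that the same selection
    always maps to the same dimension combination.
    """
    selected = set(request["dimensions"])
    unknown = selected - set(cube_dimensions)
    if unknown:
        raise ValueError("unknown dimensions: %s" % sorted(unknown))
    combination = tuple(d for d in cube_dimensions if d in selected)
    return {
        "combination": combination,
        "metrics": list(request["metrics"]),
        "filters": dict(request.get("filters", {})),
    }


request = {"dimensions": ["D1", "A"], "metrics": ["MAX1"]}
print(to_query_statement(request, ["A", "C1", "D1"]))
```

Normalizing the dimension order means the drag order in the user interface does not affect which pre-calculation result is selected.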
S520, sending the open source database query statement to the open source database, so as to call a pre-calculation result according to the open source database query statement and form a query result.
Specifically, when the open source database receives the open source database query statement, the open source database table corresponding to the open source database query statement is determined according to the corresponding relationship between the pre-calculation result and the open source database table, the pre-calculation result corresponding to the open source database query statement is queried in the open source database table, and a query result is formed.
Further, when the open-source database table is queried, only the header data and the tail data stored in each storage area may be queried, and if it is determined that the result corresponding to the open-source database query statement is within the data range stored in a certain storage area, data query is performed in the storage area to determine the pre-computed result corresponding to the open-source database query statement.
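The head-and-tail check described above can be sketched as follows, assuming each storage area holds rows sorted by key and the storage areas themselves are ordered by key range. The in-memory list-of-lists layout and the names `find_area` and `lookup` are illustrative assumptions.

```python
from bisect import bisect_right


def find_area(areas, key):
    """Return the index of the storage area whose [first key, last key] range contains key.

    Only the first and last row of each non-empty, sorted storage area are inspected.
    """
    first_keys = [rows[0][0] for rows in areas]
    idx = bisect_right(first_keys, key) - 1
    if idx < 0 or key > areas[idx][-1][0]:
        return None
    return idx


def lookup(areas, key):
    """Scan only the storage area selected by find_area for the requested key."""
    idx = find_area(areas, key)
    if idx is None:
        return None
    for row_key, value in areas[idx]:
        if row_key == key:
            return value
    return None


areas = [
    [("a", 1), ("c", 2)],
    [("f", 3), ("k", 4)],
    [("m", 5), ("z", 6)],
]
print(lookup(areas, "k"))
```

Because only one storage area is scanned per key, the cost of a lookup depends on the size of a storage area rather than on the size of the whole table.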
And S530, packaging the query result and returning the query result to the client so that the client responds to the query result.
Specifically, when receiving the query result, the client parses it and displays it in the display area 63 of the graphical user interface of FIG. 6; the default display form is a table. Optionally, the user may also choose the display type of the query result through the display type module 64 in FIG. 6 and enter the number of entries to display through the number-of-entries module 65.
According to the technical solution provided by this embodiment, massive data can be queried by combining MOLAP with big data technology, which improves query speed. Moreover, no SQL statements need to be written when querying, so non-technical users can also perform queries, which improves the user experience.
EXAMPLE III
FIG. 7 is a schematic structural diagram of a MOLAP-based data processing apparatus according to a third embodiment of the present invention. Referring to FIG. 7, the data processing apparatus provided in this embodiment specifically includes: a cube creation module 701, a pre-calculation module 702, and a saving module 703.
The cube creating module 701 is configured to create a data cube according to the fact table and the dimension table; a pre-calculation module 702, configured to perform data pre-calculation on all possible combinations of dimensions based on data recorded in the data cube; the saving module 703 is configured to save the pre-calculation result to the open source database, so as to determine the query result according to the pre-calculation result during querying.
According to the technical solution provided by this embodiment, the data cube is created from the fact table and the associated dimension table, data pre-calculation is performed on all possible combinations of dimensions in the data cube, and the pre-calculation results are stored in the open source database, so that MOLAP-based calculation and storage of big data can be achieved. Meanwhile, when querying, the user does not need to write SQL statements and only needs to drag dimensions and metrics in the client page; the server can then determine the query result from the corresponding pre-calculation result, which simplifies the data query process and improves query response speed.
On the basis of the above embodiment, the cube creation module 701 includes: the data table creating unit is used for creating a corresponding fact table and a corresponding dimension table according to the table item requirements of the fact table and the dimension table in a preset data analysis model; the data import unit is used for importing the data in the external database into the fact table and the dimension table according to the table item requirements of the fact table and the dimension table; and the cube creating unit is used for creating the data cube by using the fact table and the dimension table according to metadata in the data analysis model, wherein the metadata is used for indicating attribute parameters and creating rules of the data cube.
On the basis of the above embodiment, the pre-calculation module 702 includes: the pre-calculation task starting unit is used for starting a pre-calculation programming model task according to metadata in the data analysis model and reading data of all dimension tables and fact tables corresponding to the data cube; the combination unit is used for arranging and combining the dimensions of all the dimension tables to obtain all possible combinations including the empty sets; the aggregation unit is used for carrying out aggregation operation on the combination containing all dimensions according to a set aggregation rule to obtain an aggregation value; an input value determining unit for inputting a combination including all dimensions as a key value of the pre-calculation programming model and inputting an aggregation value as a key word of the pre-calculation programming model; the result generation unit is used for obtaining a new dimension combination and an aggregation value corresponding to the new dimension combination by utilizing the pre-calculation programming model; and the circulating unit is used for sequentially inputting the new dimension combination as a key value of the pre-calculation programming model, inputting the aggregation value corresponding to the new dimension combination as a key word of the pre-calculation programming model, and obtaining the new dimension combination and the aggregation value corresponding to the new dimension combination by using the pre-calculation programming model until all possible combinations and the aggregation values of all the possible combinations are obtained.
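The step-by-step calculation performed by these units can be illustrated with a small in-memory sketch. It is not a MapReduce job; it only shows the layered idea of aggregating the combination containing all dimensions first and then deriving each smaller combination from an already computed larger one, down to the empty set. SUM is used as the aggregation rule, which is an assumption; the approach as written only suits aggregations that can be recomputed from partial results (such as SUM, MAX, MIN, COUNT). The names `aggregate` and `build_cube` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def aggregate(rows, keep_idx):
    """Group (dimension_values, measure) rows by the kept positions and sum the measure."""
    out = defaultdict(int)
    for dims, measure in rows:
        out[tuple(dims[i] for i in keep_idx)] += measure
    return dict(out)


def build_cube(facts, dim_names):
    """Compute aggregates for every dimension combination, layer by layer."""
    n = len(dim_names)
    full = tuple(range(n))
    cube = {full: aggregate(facts, full)}  # combination containing all dimensions
    for size in range(n - 1, -1, -1):      # then n-1 dimensions, ..., then the empty set
        for combo in combinations(range(n), size):
            # Reuse an already computed combination that has one more dimension.
            parent = next(p for p in cube
                          if len(p) == size + 1 and set(combo) <= set(p))
            keep = tuple(parent.index(i) for i in combo)
            cube[combo] = aggregate(cube[parent].items(), keep)
    return {tuple(dim_names[i] for i in k): v for k, v in cube.items()}


facts = [(("2016", "CN"), 10), (("2016", "US"), 5), (("2017", "CN"), 7)]
cube = build_cube(facts, ["year", "country"])
print(cube[("year",)])
print(cube[()])
```

With n dimensions this produces 2^n combinations, which is why deriving each layer from the previous one, rather than rescanning the fact data every time, matters as the number of dimensions grows.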
On the basis of the above embodiment, the saving module 703 includes: the table building unit is used for creating an open source database table for storing pre-calculation results; the storage task establishing unit is used for starting the storage programming model task and taking the pre-calculation result as the input of the storage programming model task; the file generating unit is used for generating a corresponding binary format file by using the storage programming model; the result import unit is used for importing the binary format file into the open source database table by using the BulkLoad of the open source database so as to realize the storage of the pre-calculation result in the open source database; and the relational storage unit is used for storing the corresponding relation between the pre-calculation result and the open source database table into the metadata of the data analysis model.
On the basis of the above embodiment, the table building unit includes: the capacity determining subunit is used for determining the capacity of a storage area set for the open source database table in the open source database; the quantity value determining subunit is used for determining the quantity value of the storage area required when the pre-calculation result is stored in the open source database table according to the size of the pre-calculation result; and the creating subunit is used for sending the quantity value to the open source database so that the open source database creates an open source database table for storing the pre-calculation result according to the quantity value.
On the basis of the above embodiment, the apparatus further includes: a management module, configured to manage the data cube and the metadata in the data analysis model at set time intervals.
On the basis of the above embodiment, the apparatus further includes: a statement acquisition module, configured to, when a query request sent by the client is obtained, parse the query request and convert it into an open source database query statement; a result query module, configured to send the open source database query statement to the open source database, so that a pre-calculation result is called according to the open source database query statement and a query result is formed; and a result returning module, configured to package the query result and return it to the client, so that the client responds to the query result.
On the basis of the above embodiment, the metadata includes at least one of: the creation time of the data cube, the creation location of the data cube, the name of the data cube, the facts of the data cube, the dimensions and sequence of the data cube, the metrics of the data cube, the aggregation type, configuration information corresponding to the programming model, configuration information corresponding to the open source database, the pre-calculation time, and storage information of the pre-calculation results.
The data processing device based on the MOLAP provided by the embodiment of the invention is suitable for the data processing method based on the MOLAP provided by any embodiment, and has corresponding functions and beneficial effects.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (9)

1. A data processing method based on MOLAP is characterized by comprising the following steps:
creating a data cube according to the fact table and the dimension table;
performing data pre-calculation on all possible combinations of dimensions based on the data recorded in the data cube;
storing the pre-calculation result into an open source database so as to determine a query result according to the pre-calculation result during query;
performing data pre-computation on all possible combinations of dimensions based on the data recorded in the data cube, including:
starting a precomputation programming model task according to metadata in a data analysis model, and reading data of all dimension tables and fact tables corresponding to the data cube;
arranging and combining the dimensions of all the dimension tables to obtain all possible combinations including the empty sets;
according to a set aggregation rule, performing aggregation operation on the combination containing all dimensions to obtain an aggregation value;
inputting a combination containing all dimensions as a key value of a pre-calculation programming model, and inputting the aggregation value as a key word of the pre-calculation programming model, wherein the pre-calculation programming model adopts a step-by-step calculation mode during calculation;
obtaining a new dimension combination and an aggregation value corresponding to the new dimension combination by using the pre-calculation programming model;
and sequentially inputting the new dimension combination as a key value of the pre-calculation programming model, inputting an aggregation value corresponding to the new dimension combination as a key word of the pre-calculation programming model, and obtaining the new dimension combination and the aggregation value corresponding to the new dimension combination by using the pre-calculation programming model until all possible combinations and aggregation values of all possible combinations are obtained.
2. The data processing method of claim 1, wherein creating the data cube from the fact table and the dimension table comprises:
creating a corresponding fact table and a corresponding dimension table according to the table item requirements of the fact table and the dimension table in a preset data analysis model;
according to the table item requirements of the fact table and the dimension table, importing data in an external database into the fact table and the dimension table;
and creating a data cube by using the fact table and the dimension table according to metadata in the data analysis model, wherein the metadata is used for indicating attribute parameters and creation rules of the data cube.
3. The data processing method of claim 1, wherein saving the pre-calculation result to an open source database comprises:
creating an open source database table for storing pre-calculation results;
starting a storage programming model task, and taking a pre-calculation result as the input of the storage programming model task;
generating a corresponding binary format file by using the storage programming model;
importing a binary format file into the open source database table by using the BulkLoad of the open source database so as to realize the storage of the pre-calculation result in the open source database;
and storing the corresponding relation between the pre-calculation result and the open source database table into the metadata of the data analysis model.
4. The data processing method of claim 3, wherein creating an open source database table for storing pre-computed results comprises:
determining the capacity of a storage area set for an open source database table in an open source database;
determining the quantity value of a storage area required when the pre-calculation result is stored in an open source database table according to the size of the pre-calculation result;
and sending the quantity value to the open source database so that the open source database creates an open source database table for storing pre-calculation results according to the quantity value.
5. The data processing method of claim 1, further comprising:
and managing the data cube and the metadata in the data analysis model at set time intervals.
6. The data processing method of claim 1, wherein after saving the pre-computed result in the open source database, further comprising:
when a query request sent by a client is obtained, parsing the query request and converting the query request into an open source database query statement;
sending the open source database query statement to the open source database to call a pre-calculation result according to the open source database query statement and form a query result;
and packaging and returning the query result to the client so that the client responds to the query result.
7. A data processing method according to any of claims 2 to 5, wherein the metadata comprises at least one of:
the creation time of the data cube, the creation location of the data cube, the name of the data cube, the facts of the data cube, the dimensions and sequence of the data cube, the metrics of the data cube, the aggregation type, configuration information corresponding to a programming model, configuration information corresponding to the open source database, the pre-calculation time, and storage information of the pre-calculation results.
8. A data processing apparatus based on MOLAP, comprising:
the cube creating module is used for creating a data cube according to the fact table and the dimension table;
the pre-calculation module is used for performing data pre-calculation on all possible combinations of the dimensions based on the data recorded in the data cube;
the storage module is used for storing the pre-calculation result into the open source database so as to determine a query result according to the pre-calculation result during query;
the pre-calculation module comprises: the pre-calculation task starting unit is used for starting a pre-calculation programming model task according to metadata in the data analysis model and reading data of all dimension tables and fact tables corresponding to the data cube; the combination unit is used for arranging and combining the dimensions of all the dimension tables to obtain all possible combinations including the empty sets; the aggregation unit is used for carrying out aggregation operation on the combination containing all dimensions according to a set aggregation rule to obtain an aggregation value; the input value determining unit is used for inputting a combination containing all dimensions as a key value of a pre-calculation programming model and inputting an aggregation value as a key word of the pre-calculation programming model, and the pre-calculation programming model adopts a step-by-step calculation mode during calculation; the result generation unit is used for obtaining a new dimension combination and an aggregation value corresponding to the new dimension combination by utilizing the pre-calculation programming model; and the circulating unit is used for sequentially inputting the new dimension combination as a key value of the pre-calculation programming model, inputting the aggregation value corresponding to the new dimension combination as a key word of the pre-calculation programming model, and obtaining the new dimension combination and the aggregation value corresponding to the new dimension combination by using the pre-calculation programming model until all possible combinations and the aggregation values of all the possible combinations are obtained.
9. The data processing apparatus of claim 8, wherein the cube creation module comprises:
the data table creating unit is used for creating a corresponding fact table and a corresponding dimension table according to the table item requirements of the fact table and the dimension table in a preset data analysis model;
the data import unit is used for importing the data in the external database into the fact table and the dimension table according to the table item requirements of the fact table and the dimension table;
and the cube creating unit is used for creating a data cube by using the fact table and the dimension table according to metadata in the data analysis model, wherein the metadata is used for indicating attribute parameters and creating rules of the data cube.
CN201610893549.7A 2016-10-13 2016-10-13 MOLAP-based data processing method and device Active CN106484875B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610893549.7A CN106484875B (en) 2016-10-13 2016-10-13 MOLAP-based data processing method and device

Publications (2)

Publication Number Publication Date
CN106484875A CN106484875A (en) 2017-03-08
CN106484875B true CN106484875B (en) 2019-12-31

Family

ID=58270539

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610893549.7A Active CN106484875B (en) 2016-10-13 2016-10-13 MOLAP-based data processing method and device

Country Status (1)

Country Link
CN (1) CN106484875B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103164222A (en) * 2013-02-25 2013-06-19 用友软件股份有限公司 Multidimensional modeling system and multidimensional modeling method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
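Research and Implementation of Data Aggregation Algorithms Based on MapReduce (基于MapReduce的数据聚集运算算法研究与实现); Gao Wei (高伟); China Master's Theses Full-text Database, Information Science and Technology; 20130315 (No. 3); pp. I138-842 *
Research and Application of Online Analytical Processing (联机分析处理的研究与应用); Guo Wenjun (郭文君); China Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology; 20021215 (No. 2); pp. I138-398 *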

Also Published As

Publication number Publication date
CN106484875A (en) 2017-03-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant