CN112162971B - Method, device and system for generating multidimensional data cube - Google Patents

Publication number
CN112162971B
Authority
CN
China
Prior art keywords
dimension
cube
information
model
multidimensional data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011016340.5A
Other languages
Chinese (zh)
Other versions
CN112162971A (en)
Inventor
翟小青
王永进
汤国强
华含青
孙迁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suning Cloud Computing Co Ltd
Original Assignee
Suning Cloud Computing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suning Cloud Computing Co Ltd
Priority to CN202011016340.5A
Publication of CN112162971A
Priority to CA3141598A (CA3141598A1)
Application granted
Publication of CN112162971B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, a device and a system for generating a multidimensional data cube. The method comprises: collecting query data in an analysis engine, the query data comprising the model information, dimension information, metric information and time granularity involved in queries; performing statistical analysis on the query data to determine construction information of a multidimensional data cube, the construction information comprising a recommended model, a recommended dimension combination, a recommended metric and a recommended time granularity; and constructing the multidimensional data cube according to the construction information. The disclosed technical scheme generates the Cube automatically and addresses the dimension loss, mismatched time granularity and low Cube hit rate that can result when users create Cubes manually.

Description

Method, device and system for generating multidimensional data cube
Technical Field
The invention relates to the technical field of big data processing, in particular to a method, a device and a system for generating a multidimensional data cube.
Background
A multidimensional data cube (Cube) is a data set built on facts and dimensions to let users query and analyze data from multiple angles and at multiple levels; it generally covers a single business topic. In the prior art, Cubes are mainly used for data processing in online analytical processing (OLAP) engines, where indexes and results are stored through pre-computation to achieve high query performance. Because the OLAP engine has no Cube management capability, in actual use the dimension combinations in a Cube must be entered manually or derived from reports. Such Cubes are configured by technicians according to the requirements of individual business scenarios, and applying them directly in the OLAP engine may cause the following problems:
1. the Cubes created autonomously by users are of low quality: dimensions are missing, the time granularity does not match, and the Cube hit rate is low;
2. the Cubes created autonomously by users are not general enough, so some Cubes are rarely used;
3. because there is no complete Cube construction flow, the Cubes created autonomously by users cannot be scheduled or complemented (backfilled).
Disclosure of Invention
In order to solve the problems in the prior art, embodiments of the present invention provide a method, an apparatus, and a system for generating a multidimensional data cube. The technical scheme is as follows:
In a first aspect, a method for generating a multidimensional data cube is provided, the method comprising:
collecting query data in an analysis engine, the query data comprising the model information, dimension information, metric information and time granularity involved in queries;
performing statistical analysis on the query data to determine construction information of a multidimensional data cube, the construction information comprising a recommended model, a recommended dimension combination, a recommended metric and a recommended time granularity;
and constructing the multidimensional data cube according to the construction information.
Further, the performing statistical analysis on the query data to determine construction information of the multidimensional data cube includes:
according to the model information involved in the query, counting the model calling amount in a preset period, and determining the model meeting the condition of the model calling amount as the recommended model;
according to the dimension information related to the query, counting the dimension combination calling amount and the dimension combination response time in the recommendation model in a preset period, and determining the dimension combination meeting the condition of the dimension calling amount and the condition of the response time as the recommendation dimension combination;
determining the time granularity corresponding to the recommended dimension combination as the recommended time granularity;
and determining the measurement field in the model as a recommended measurement.
Further, the constructing a multi-dimensional data cube according to the construction information includes:
and calling the analysis engine according to the construction information, and determining the storage information and the creation interface of the multidimensional data cube in the analysis engine so as to complete the creation of the multidimensional data cube.
Further, determining the recommended dimension combination further comprises dimension combination optimization:
expanding the recommended dimension combinations by combining dimensions, according to the number of recommended dimension combinations; and/or
supplementing the recommended dimension combination according to the dimension table.
Further, determining the construction information further includes:
calculating a profit value of the construction information, wherein the profit value is the sum, over each pre-optimization dimension combination that the recommended dimension combination can cover, of the product of its call volume and its average response time.
Further, determining the construction information further includes: calculating the similarity of the recommended dimension combinations between pieces of construction information according to the profit value.
Further, after constructing the multidimensional data cube, the method further comprises:
and writing the intermediate table obtained according to the fact table and the dimension table into the multidimensional data cube to complete complement and scheduling of the multidimensional data cube.
Further, after constructing the multidimensional data cube, the method further comprises:
and calculating the hit rate of the multidimensional data cube in a preset time, wherein the hit rate is the ratio of the calling amount of the multidimensional data cube to the model calling amount.
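The hit rate defined above is a plain ratio, which the following minimal sketch illustrates. The function name `cube_hit_rate` and the handling of a window with no model calls are assumptions made for illustration and are not stated in the text.

```python
def cube_hit_rate(cube_calls: int, model_calls: int) -> float:
    """Ratio of the Cube's call volume to the call volume of its model."""
    if model_calls == 0:
        # Assumption: with no model queries in the window, report a hit rate of 0.
        return 0.0
    return cube_calls / model_calls


# Hypothetical counts gathered over one preset time window.
print(cube_hit_rate(cube_calls=320, model_calls=400))
```

A low value over the preset time would suggest that the Cube's dimension combinations rarely match the queries actually issued against the model.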
In a second aspect, there is provided a multidimensional data cube generation apparatus, the apparatus comprising:
a data acquisition module, used for collecting query data in the analysis engine, the query data comprising the model information, dimension information, metric information and time granularity involved in queries;
a construction information determining module, used for performing statistical analysis on the query data and determining construction information of the multidimensional data cube, the construction information comprising a recommended model, a recommended dimension combination, a recommended metric combination and a recommended time granularity;
and a construction module, used for constructing the multidimensional data cube according to the construction information.
Further, the build information determination module includes:
a recommendation model determining module, used for counting the model call volume within the preset time according to the model information involved in the query, and determining the model meeting the model call volume condition as the recommended model;
a recommended dimension combination determining module, used for counting, according to the dimension information involved in the query, the call volume and the response time of each dimension combination in the recommended model within the preset time, and determining the dimension combination meeting the dimension call volume condition and the response time condition as the recommended dimension combination;
a recommended time granularity determining module, used for determining the time granularity corresponding to the recommended dimension combination as the recommended time granularity;
and a recommended metric determining module, used for determining the metric fields in the model as the recommended metrics.
Further, the construction module is specifically configured to invoke the analysis engine according to the construction information, and determine storage information and a creation interface of the multidimensional data cube in the analysis engine, so as to complete creation of the multidimensional data cube.
Further, the recommended dimension combination determination module further includes:
a dimension combination expanding module, used for expanding the recommended dimension combinations by combining dimensions, according to the number of recommended dimension combinations;
and a dimension combination supplement module, used for supplementing the recommended dimension combination according to the dimension table.
Further, the building information determination module further includes:
a profit value calculation module, used for calculating the profit value of the construction information, wherein the profit value is the sum, over each pre-optimization dimension combination that the recommended dimension combination can cover, of the product of its call volume and its average response time.
Further, the building information determination module further includes:
and the similarity calculation module is used for calculating the similarity of the recommended dimension combination among the construction information according to the profit value.
Further, the apparatus disclosed in the present invention further comprises:
a complement module, used for writing the intermediate table obtained from the fact table and the dimension table into the multidimensional data cube to complete the complement of the multidimensional data cube;
and a scheduling module, used for writing the intermediate table obtained from the fact table and the dimension table into the multidimensional data cube to complete the scheduling of the multidimensional data cube.
Further, the apparatus disclosed in the present invention further comprises:
and the hit rate calculation module is used for calculating the hit rate of the multidimensional data cube in the preset time, wherein the hit rate is the ratio of the calling amount of the multidimensional data cube to the model calling amount.
In a third aspect, there is provided a computer system comprising:
one or more processors; and
memory associated with the one or more processors for storing program instructions which, when read and executed by the one or more processors, perform the method of any of the first aspects described above.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
1. the technical scheme disclosed by the invention realizes the automatic generation of the Cube, and solves the problems of dimension loss, non-correspondence of time granularity and low Cube hit rate which are possibly caused by the manual and autonomous creation of the Cube;
2. according to the technical scheme disclosed by the invention, the Cube is automatically generated, so that the Cube comprises various dimensional combinations, the universality of the Cube is enhanced, and the calling rate of the Cube is improved;
3. according to the technical scheme disclosed by the invention, through an automatic Cube construction process, the Cube is automatically complemented and scheduled.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart of a multidimensional data cube generation method provided by an embodiment of the present invention;
FIG. 2 is a block diagram of a multidimensional data cube generating apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a computer system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The OLAP engine mentioned in the Background is based on fast analysis of shared multidimensional information. Using multidimensional database technology, it lets users observe data from different angles, supports complex analysis operations, focuses on decision support for managers, meets analysts' need to run complex queries over big data quickly and flexibly, and presents query results in an intuitive, understandable form to assist decision-making. Commonly used OLAP engine technologies include Druid and PostgreSQL. Druid is an engine for processing time-series data in real time; its indexes are sliced by time, and queries are routed along the timeline. PostgreSQL is a full-featured, free-software object-relational database management system that, by means of an MPP architecture, can quickly execute complex SQL analysis on large datasets. However, neither the Druid engine nor the PostgreSQL engine has Cube management capability. To implement query analysis, users must construct dimension combinations themselves according to the query business scenario, so the Cubes they create suffer from low hit rates and insufficient generality, and cannot be scheduled or complemented.
In order to solve the technical problem about Cube in the OLAP engine, the embodiment of the present invention provides a method, an apparatus, and a system for generating a multidimensional data Cube, and the specific technical solution is as follows:
As shown in FIG. 1, a method for generating a multidimensional data cube includes:
S1, collecting query data in an analysis engine, the query data comprising: the model information, dimension information, metric information and time granularity involved in the query.
The analysis engine mainly refers to an OLAP engine, but may also be another data analysis engine.
Query data falls into two categories: query call data and query fusing (circuit-breaking) data. Query call data is the data related to queries that the analysis engine was able to execute, that is, queries for which no blocking information was returned. It comprises model information, dimension information, metric information and time granularity, and further comprises: information on the original Cube the query was routed to (the original Cube here mainly refers to a Cube constructed by the user), the response time, a success/failure flag, a cache-hit flag, and the like. Query fusing data is the data related to queries for which blocking information was returned because the data volume involved in the query analysis request was too large for the analysis engine to execute. It comprises model information, dimension information, metric information and time granularity, and further comprises fusing exception information.
The model information may include: model name, model number and model call volume. The dimension information mainly refers to dimension combinations and their call volumes; a dimension combination may include analysis dimensions, filtering dimensions, sorting dimensions, filter conditions on the indicators themselves, and the like. The metric information mainly refers to metric fields and their call volumes, and may include the metric function. Time granularity is the time range of the queried data, for example day, month, quarter or year; day granularity denotes data within a one-day range, and so on.
It should be noted that a dimension is non-numerical data in a data table that can serve as an angle from which to observe the data. For example, sales data can be analyzed by category to obtain the daily sales of each category, or by commodity to obtain the daily sales of each commodity; here category and commodity are dimensions.
Metrics are numerical data in data tables, such as sales amount and sales quantity. A metric function is a function that computes a metric, such as max, sum or min.
The model is composed of dimensions and metrics, and is formed from a fact table and one or more sets of dimension tables arranged in a certain schema. The fact table stores the metric values together with the foreign keys of the dimension tables; the data used in the analysis engine ultimately comes from the fact table. A dimension table is a table that describes a dimension, and one dimension can correspond to one or more dimension tables. The schema can be a star schema, a snowflake schema, a fact constellation schema, etc., wherein a dimension corresponding to one dimension table gives a star schema and a dimension corresponding to a plurality of dimension tables gives a snowflake schema. The model is the field source of the Cube: the fields in the finally constructed Cube are a subset of the model fields.
S2, performing statistical analysis on the query data and determining construction information of the multidimensional data cube, the construction information comprising: a recommended model, a recommended dimension combination, a recommended metric and a recommended time granularity.
The recommended model is obtained from the statistics of the model information collected in step S1. The recommended dimension combination is obtained from the statistics of the dimension information collected in step S1. The recommended metric is obtained from the model.
It should be noted that in step S2 the query call data and the query fusing data must be statistically analyzed separately. The main reason is that the call volume of the query fusing data is far lower than that of the query call data, so if the two were analyzed together the query fusing data would be filtered out. Processing them separately has a further advantage: large-volume query analysis that the OLAP engine could not perform directly against the model can be served by a Cube. Because the fields in a Cube are a subset of the model fields, the data volume involved is smaller and the query is more targeted, so the OLAP engine avoids the fusing caused by large data volumes.
In one embodiment, step S2 comprises:
S21, according to the model information involved in the query, counting the model call volume within a preset time, and determining the models that meet the model call volume condition as recommended models.
S22, according to the dimension information involved in the query, counting the call volume and the response time of each dimension combination in the recommended model within the preset time, and determining the dimension combinations that meet the dimension call volume condition and the response time condition as recommended dimension combinations.
S23, determining the time granularity corresponding to the recommended dimension combination as the recommended time granularity.
S24, determining the metric fields in the model as recommended metrics.
Step S21 mainly searches for models with relatively large call volumes; besides the models that meet the call volume condition, necessary models may also be added as recommended models. Specifically, the models are sorted by their call volume within the preset time, and the model call volume condition is a position in that ranking. Step S22 screens, from the models with large call volumes, the dimension combinations with large call volumes as recommended dimension combinations. Specifically, the dimension combinations are sorted by their call volume within the preset time and, separately, by their response time within the preset time; a recommended dimension combination must meet both the call volume condition and the response time condition. Steps S21 to S23 yield the recommended model, the recommended dimension combination and the recommended time granularity, which together give the angle from which the data is observed. Finally, step S24 imports the metric fields from the model as recommended metrics, so that the construction information comprises: recommended model, recommended dimension combination, recommended time granularity and recommended metrics. In the embodiment of the present invention, the metric fields are pulled directly from the model in step S24 in order to ensure that the business data contained in the Cube is comprehensive.
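Steps S21 to S24 can be sketched roughly as follows. This is only an illustration under stated assumptions: the log records, the `MODEL_METRICS` table, the function name `recommend` and the concrete conditions (top-N models, a minimum call count, a minimum average response time) are hypothetical, since the text only says that the conditions are derived from rankings.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical query log: (model, dimension combination, time granularity, response time in ms).
QUERY_LOG = [
    ("sales", ("large district", "store"), "day", 900),
    ("sales", ("large district", "store"), "day", 1100),
    ("sales", ("brand",), "month", 40),
    ("stock", ("store",), "day", 300),
    ("sales", ("large district", "store"), "day", 1000),
]
# Hypothetical metric fields per model, used for S24.
MODEL_METRICS = {"sales": ["sales amount", "sales quantity"], "stock": ["stock quantity"]}


def recommend(log, top_models=1, min_calls=2, min_avg_ms=500):
    # S21: rank models by call volume and keep the top ones (assumed condition).
    model_calls = Counter(record[0] for record in log)
    models = [name for name, _ in model_calls.most_common(top_models)]

    # S22: collect response times per dimension combination of the recommended models.
    times_by_combo = defaultdict(list)
    for model, dims, granularity, ms in log:
        if model in models:
            times_by_combo[(model, frozenset(dims), granularity)].append(ms)

    construction_info = []
    for (model, dims, granularity), times in times_by_combo.items():
        # Assumed conditions: frequently called and slow combinations are recommended.
        if len(times) >= min_calls and mean(times) >= min_avg_ms:
            construction_info.append({
                "model": model,
                "dimensions": sorted(dims),
                "time_granularity": granularity,   # S23: granularity of the combination
                "metrics": MODEL_METRICS[model],   # S24: metric fields pulled from the model
                "calls": len(times),
                "avg_ms": mean(times),
            })
    return construction_info


for info in recommend(QUERY_LOG):
    print(info)
```

In keeping with the separate treatment described for step S2, such a routine would be run once on the query call data and once on the query fusing data, rather than on a merged log.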
Step S2 yields Cube construction information for the query call data and Cube construction information for the query fusing data, so the subsequently constructed multidimensional data cubes likewise comprise Cubes for the query call data and Cubes for the query fusing data.
In an embodiment, step S22 further includes optimizing the recommended dimension combinations, specifically:
S221, expanding the recommended dimension combinations by combining dimensions, according to the number of recommended dimension combinations.
S222, supplementing the recommended dimension combination according to the dimension table.
Step S221 mainly serves to balance computing resources. Specifically, if the number of recommended dimension combinations is small, the arbitrary-combination method is used; if the number is large, the pairwise-combination method is used.
For the arbitrary-combination method, for example, suppose one recommended dimension combination is obtained in total: [large district, city company, store, category, brand]. The arbitrary-combination results then include:
large district, city company
large district, store
large district, category
large district, brand
city company, store
large district, city company, store
large district, city company, category
large district, city company, brand
large district, city company, store, category
large district, city company, store, brand
large district, city company, store, category, brand
For the pairwise-combination method, for example, suppose three recommended dimension combinations are obtained in total: the first is [large district, city company, store], the second is [large district, city company, category], and the third is [category, brand]. The pairwise-combination results are:
large district, city company, store, category
large district, city company, store, category, brand
large district, city company, category, brand
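The pairwise-combination example above can be reproduced with a short sketch that merges every pair of recommended dimension combinations. The function name `pairwise_expand` is hypothetical, and treating "pairwise combination" as the order-preserving union of two combinations is an interpretation inferred from the example results.

```python
from itertools import combinations


def pairwise_expand(recommended):
    """Merge every pair of recommended dimension combinations into their union."""
    expanded = []
    for first, second in combinations(recommended, 2):
        # dict.fromkeys removes duplicates while keeping first-seen order.
        merged = list(dict.fromkeys(first + second))
        if merged not in expanded:
            expanded.append(merged)
    return expanded


recommended = [
    ["large district", "city company", "store"],
    ["large district", "city company", "category"],
    ["category", "brand"],
]
for combo in pairwise_expand(recommended):
    print(", ".join(combo))
```

With n recommended combinations this produces at most n(n-1)/2 merged combinations, which is why it suits the case where the number of recommended combinations is large.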
Step S222 mainly serves to further supplement the dimension combinations, and includes two supplement methods: the dimension hierarchy supplement method and the derived dimension method.
For the dimension hierarchy supplement method, for example:
if the recommended dimension combination contains the brand dimension, its upper-level dimension in the model (category) is automatically added to the recommended dimension combination. This supplement does not increase the number of records and serves more scenarios.
For the derived dimension method, for example:
if the recommended dimension combination contains the store dimension, the store opening time and store closing time from the store dimension table in the model are automatically added to the recommended dimension combination. This supplement does not increase the number of records either.
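Both supplement methods of step S222 amount to lookups in the model's dimension metadata, as the following sketch suggests. The tables `PARENT_DIMENSION` and `DERIVED_DIMENSIONS` and the function `supplement` are hypothetical stand-ins for that metadata; only the brand/category and store opening/closing time relationships come from the examples in the text.

```python
# Hypothetical dimension metadata taken from the model's dimension tables.
PARENT_DIMENSION = {"brand": "category"}  # dimension -> upper-level dimension
DERIVED_DIMENSIONS = {"store": ["store opening time", "store closing time"]}


def supplement(combination):
    """Add upper-level and derived dimensions to a recommended dimension combination."""
    result = list(combination)
    for dim in combination:
        # Dimension hierarchy supplement: add the upper-level dimension if it is missing.
        parent = PARENT_DIMENSION.get(dim)
        if parent is not None and parent not in result:
            result.append(parent)
        # Derived dimension supplement: add attributes found on the dimension table.
        for derived in DERIVED_DIMENSIONS.get(dim, []):
            if derived not in result:
                result.append(derived)
    return result


print(supplement(["store", "brand"]))
```

Because each added field is determined by a dimension already present in the combination, the supplement leaves the number of records unchanged, as noted in the examples.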
In one embodiment, the optimization of the recommended dimension combinations further comprises excluding existing dimension combinations. The OLAP engine may already hold Cubes constructed by users, so recommended dimension combinations that duplicate the dimension combinations of existing Cubes need to be deleted. Specifically:
S223, comparing the recommended dimension combinations with the existing dimension combinations, and deleting any recommended dimension combination that is identical to an existing one.
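Step S223 can be pictured as a set-based filter. In the sketch below, the function name `exclude_existing` is hypothetical, and the assumption that two combinations are "identical" when they contain the same dimensions regardless of order is an interpretation, not something the text states.

```python
def exclude_existing(recommended, existing):
    """Drop recommended combinations that duplicate a combination of an existing Cube."""
    # Assumption: two combinations are identical if they hold the same dimensions in any order.
    existing_sets = {frozenset(combo) for combo in existing}
    return [combo for combo in recommended if frozenset(combo) not in existing_sets]


recommended = [["large district", "store"], ["category", "brand"]]
existing = [["store", "large district"]]
print(exclude_existing(recommended, existing))
```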
In one embodiment, step S2 further includes:
S25, calculating a profit value of the construction information, wherein the profit value is the sum, over each pre-optimization dimension combination that the recommended dimension combination can cover, of the product of its call volume and its average response time.
The specific formula for the profit value is:
profit value = SUM(average response time of covered dimension combination × call volume)
For example:
the recommended time granularity is day granularity, and the recommended dimension combination is [large district, city company, store];
the pre-optimization time granularity is day granularity, pre-optimization dimension combination 1 is [large district], call count = 100, average response time = 200 ms;
the pre-optimization time granularity is day granularity, pre-optimization dimension combination 2 is [city company], call count = 150, average response time = 250 ms;
the profit value of the recommended dimension combination = call count of pre-optimization dimension combination 1 × its average response time + call count of pre-optimization dimension combination 2 × its average response time = 100 × 200 + 150 × 250 = 57500.
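The worked example above can be checked with a few lines of code. The sketch assumes that a pre-optimization combination is "covered" when it has the same time granularity as the recommended combination and its dimensions are a subset of the recommended ones; that reading is inferred from the example, and the names `profit_value`, `recommended` and `history` are hypothetical.

```python
def profit_value(recommended, history):
    """Sum of call volume x average response time over the covered combinations."""
    recommended_dims = frozenset(recommended["dimensions"])
    return sum(
        item["calls"] * item["avg_ms"]
        for item in history
        # Assumed coverage rule: same time granularity and a subset of the dimensions.
        if item["time_granularity"] == recommended["time_granularity"]
        and frozenset(item["dimensions"]) <= recommended_dims
    )


recommended = {
    "time_granularity": "day",
    "dimensions": ["large district", "city company", "store"],
}
history = [
    {"time_granularity": "day", "dimensions": ["large district"], "calls": 100, "avg_ms": 200},
    {"time_granularity": "day", "dimensions": ["city company"], "calls": 150, "avg_ms": 250},
]
print(profit_value(recommended, history))  # 100 * 200 + 150 * 250 = 57500
```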
In one embodiment, step S2 further includes:
S26, estimating the data volume of the Cube to be constructed from the construction information, wherein the data volume is estimated according to the number of dimension combinations.
In one embodiment, step S2 further includes:
S27, calculating the similarity of the recommended dimension combinations between pieces of construction information according to the profit value.
Step S27 specifically includes:
sorting the construction information by profit value in descending order, and taking the construction information whose rank meets the profit value condition;
and calculating the similarity of the recommended dimension combinations between those pieces of construction information.
The similarity may be calculated with the Jaccard coefficient, whose principle is as follows:
given two sets A and B, the Jaccard coefficient is defined as the ratio of the size of the intersection of A and B to the size of the union of A and B:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
where J(A, B) is defined as 1 when A and B are both empty sets.
For example:
the recommended dimension combination in the construction information of Cube1 is {large district, city company, store state}, and the recommended dimension combination in the construction information of Cube2 is {large district, city company, store, class}; the intersection has 2 elements and the union has 5, so J(Cube1, Cube2) = 2/5.
If two pieces of construction information have different recommended time granularities, their similarity is 0.
It should be noted that, to avoid filtering out the Cube construction information derived from query fusing data, similarity is calculated separately within the Cube construction information derived from query call data and within the Cube construction information derived from query fusing data.
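The similarity step can be sketched in Python as follows. The function names, the dictionary layout, and the use of a top-N cut as the "profit value condition" are assumptions made for illustration; the profit value given to Cube2 is a made-up figure, while the dimension sets are those of the example above.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient of two collections; defined as 1 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cube_similarity(info_a, info_b):
    """Similarity of two pieces of construction information.

    Different recommended time granularities give a similarity of 0.
    """
    if info_a["time_granularity"] != info_b["time_granularity"]:
        return 0.0
    return jaccard(info_a["dimensions"], info_b["dimensions"])


def pairwise_similarity(infos, top_n):
    """Rank by profit value (descending), keep the top_n, compare every pair."""
    ranked = sorted(infos, key=lambda i: i["profit"], reverse=True)[:top_n]
    return {
        (a["name"], b["name"]): cube_similarity(a, b)
        for a, b in combinations(ranked, 2)
    }


cube1 = {
    "name": "Cube1",
    "time_granularity": "day",
    "profit": 57500,
    "dimensions": {"large district", "city company", "store state"},
}
cube2 = {
    "name": "Cube2",
    "time_granularity": "day",
    "profit": 30000,  # illustrative value only
    "dimensions": {"large district", "city company", "store", "class"},
}
similarities = pairwise_similarity([cube1, cube2], top_n=2)
print(similarities)  # {('Cube1', 'Cube2'): 0.4}
```

To follow the note about query call data and query fusing data, `pairwise_similarity` would be invoked once per group rather than on the merged list.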
After steps S21 to S27, the construction information of each Cube consists of seven parts: the recommended model, the recommended time granularity, the recommended dimension combination, the recommended metrics, the data volume, the profit value, and the similarity to other Cube construction information. The data volume, profit value, and similarity all measure the Cube to be constructed, so that technicians can judge the value of the Cube from them and intervene manually where needed.
S3: constructing a multidimensional data cube according to the construction information.
Constructing the multidimensional data cube mainly consists of determining its storage information and creation interface in the analysis engine.
Therefore, in an embodiment, step S3 specifically includes:
calling the analysis engine according to the construction information, and determining the storage information and the creation interface of the multidimensional data cube in the analysis engine, so as to complete the creation of the multidimensional data cube.
For example: a Cube automatically inherits the relevant information of the model to which it belongs, such as the storage medium and the cluster. That is, if the model is planned to be stored in DRUID, the Cubes under that model are automatically stored in DRUID as well. If the model to which the Cube belongs is stored in DRUID, the Cube is created by constructing the JSON for creating a datasource and calling the REST interface of DRUID. If the model to which the Cube belongs is stored in PostgreSQL (PG), the PG table is created through JDBC. In addition, to support resource management and control, manual intervention can be enabled for Cube creation under specified models; for such a model, Cube construction continues only after manual review.
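The dispatch on the inherited storage medium can be pictured with the following Python sketch. It only plans the creation and performs no network or database call. The function name `plan_cube_creation`, the dictionary keys, the JSON layout, and the column types are all hypothetical; in particular, the JSON is a placeholder and is not the actual DRUID datasource specification.

```python
import json


def plan_cube_creation(model, cube):
    """Return (channel, payload) describing how the Cube would be created."""
    # Models flagged for manual intervention wait for review before construction.
    if model.get("requires_review") and not cube.get("approved"):
        return ("pending_review", None)

    storage = model["storage"]  # the Cube inherits the storage medium of its model
    if storage == "DRUID":
        # Placeholder JSON only; a real request must follow the DRUID specification.
        spec = {
            "cube": cube["name"],
            "cluster": model.get("cluster"),
            "dimensions": cube["dimensions"],
            "metrics": cube["metrics"],
        }
        return ("rest", json.dumps(spec, sort_keys=True))
    if storage == "PG":
        # Column types are assumptions: text dimensions, numeric metrics.
        columns = ['"stat_time" TIMESTAMP']
        columns += [f'"{d}" TEXT' for d in cube["dimensions"]]
        columns += [f'"{m}" NUMERIC' for m in cube["metrics"]]
        name = cube["name"]
        cols = ", ".join(columns)
        return ("jdbc", f'CREATE TABLE "{name}" ({cols})')
    raise ValueError(f"unsupported storage medium: {storage}")


cube = {"name": "cube_sales_day", "dimensions": ["region", "city"], "metrics": ["amount"]}
channel, payload = plan_cube_creation({"storage": "PG"}, cube)
print(channel, payload)
```

Returning a plan instead of executing it keeps the review gate, the REST path, and the JDBC path testable without a running DRUID or PostgreSQL instance.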
In one embodiment, the method for generating a multidimensional data cube disclosed by the present invention further comprises:
S4: writing the intermediate table obtained from the fact table and the dimension table into the multidimensional data cube, so as to complete the complement and scheduling of the multidimensional data cube.
Complement (backfilling) mainly writes the historical data in the model into the Cube, and scheduling mainly writes the data of the current and future time periods in the model into the Cube. Complement and scheduling use the same write method. The intermediate table is the model formed by joining the fact table with the dimension table into a wide table. For example:
Cube complement:
After the Cube is constructed, complement is automatically initiated for all data within the life cycle of the model. The complement process runs on an offline computing platform: the fact table is left-joined with the dimension table to widen it, and the widened intermediate table is written into the Cube. The data write rules are as follows:
if the Cube is stored in DRUID, the REST interface of DRUID is called through spark-druid: a POST request is sent to the DRUID Overlord node to start a DRUID Hadoop Index Job task, which reads the data from the Hadoop cluster and writes it into DRUID;
if the Cube is stored in PG, the data is written into PG through the spark-jdbc interface.
Cube data generation rules are as follows:
if a Cube contains the following construction information:
Time granularity: day
Dimension combination fields: large district, city company, category, brand
Metric combination fields: quantity (aggregation function: sum), amount (aggregation function: sum)
then the Cube data is generated by the following schematic SQL:
SELECT
    DATE_FORMAT(time, 'day'),
    large_district,
    city_company,
    category,
    brand,
    SUM(quantity) AS quantity,
    SUM(amount) AS amount
FROM
    widened_intermediate_table
WHERE time >= T-2
    AND time <= T-1
GROUP BY DATE_FORMAT(time, 'day'),
    large_district,
    city_company,
    category,
    brand
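The schematic SQL above follows mechanically from the construction information, which the following Python sketch illustrates. The function name `build_cube_sql` and its parameters are hypothetical, the `T-2` and `T-1` bounds are kept as the schematic placeholders used in the text, and identifiers are inserted without escaping, so the sketch is not suitable for untrusted input.

```python
def build_cube_sql(time_granularity, dimensions, metrics, source_table, start, end):
    """Assemble the schematic aggregation query for one Cube.

    metrics is a list of (column, aggregation_function) pairs.
    """
    time_expr = f"DATE_FORMAT(time, '{time_granularity}')"
    group_cols = [time_expr] + list(dimensions)
    select_cols = group_cols + [
        f"{func.upper()}({col}) AS {col}" for col, func in metrics
    ]
    return (
        "SELECT " + ", ".join(select_cols)
        + f" FROM {source_table}"
        + f" WHERE time >= {start} AND time <= {end}"
        + " GROUP BY " + ", ".join(group_cols)
    )


sql = build_cube_sql(
    time_granularity="day",
    dimensions=["large_district", "city_company", "category", "brand"],
    metrics=[("quantity", "sum"), ("amount", "sum")],
    source_table="widened_intermediate_table",
    start="T-2",
    end="T-1",
)
print(sql)
```

Every non-aggregated column in the SELECT list is reused in the GROUP BY clause, which is why both are built from the same `group_cols` list.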
Cube scheduling:
After the Cube is constructed, a scheduling task for the Cube is automatically registered, and computation of the Cube data is automatically initiated at the frequency specified at registration. The Cube is computed and scheduled at a fixed frequency; in each run, the fact table is first joined with the dimension table to produce the widened intermediate table, and the data of all Cubes under the model is then written from that table. The data write rules for scheduling are the same as for complement.
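The widening step shared by complement and scheduling is a left join of the fact table with the dimension table. A minimal in-memory Python sketch is given below; the function name `widen`, the sample rows, and the assumption that the join key is unique in the dimension table are illustrative only, and the embodiment itself performs this step on an offline computing platform.

```python
def widen(fact_rows, dim_rows, key):
    """Left join: every fact row is kept; dimension attributes are added on a key match."""
    dim_by_key = {row[key]: row for row in dim_rows}  # assumes key is unique per dimension row
    dim_columns = sorted({c for row in dim_rows for c in row if c != key})
    widened = []
    for fact in fact_rows:
        match = dim_by_key.get(fact[key], {})
        row = dict(fact)
        for col in dim_columns:
            row[col] = match.get(col)  # None when no dimension row matches
        widened.append(row)
    return widened


facts = [
    {"store_id": 1, "amount": 10},
    {"store_id": 9, "amount": 5},
]
dims = [
    {"store_id": 1, "city_company": "Nanjing", "large_district": "East"},
]
wide = widen(facts, dims, key="store_id")
print(wide)
```

A fact row without a matching dimension row is still written, with empty dimension attributes, which is the behavior that distinguishes a left join from an inner join.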
In one embodiment, the method for generating a multidimensional data cube disclosed by the invention further comprises:
S5: calculating the hit rate of the multidimensional data cube within a preset time, where the hit rate is the ratio of the call volume of the multidimensional data cube to the call volume of the model.
Step S5 is mainly used to measure the accuracy of the constructed multidimensional data cube; a higher hit rate indicates a more accurate multidimensional data cube.
In an embodiment, based on the hit rate, the method for generating a multidimensional data cube disclosed by the present invention further includes:
acquiring the call volume of the multidimensional data cube within the preset time;
deleting any multidimensional data cube whose call volume does not meet the call volume threshold and whose hit rate does not meet the hit rate threshold within the same preset time.
This is an elimination mechanism for multidimensional data cubes, which ensures that the constructed multidimensional data cubes meet the query analysis requirements. When a Cube is deleted:
if the Cube is stored in DRUID, the REST interface of DRUID is called to delete the current Cube;
if the Cube is stored in PG, the Cube is deleted through JDBC;
the mark of the Cube is read, and if the current Cube is marked as an important Cube, it is not deleted automatically, and its daily hit rate information is provided for manual handling.
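The hit rate and the elimination rule can be sketched in Python as follows. The function names and the threshold values are hypothetical, and "does not meet the threshold" is interpreted here as "is strictly below the threshold", which is an assumption.

```python
def hit_rate(cube_calls, model_calls):
    """Ratio of the Cube call volume to the model call volume in the same period."""
    return cube_calls / model_calls if model_calls else 0.0


def should_delete(cube_calls, model_calls, call_threshold, hit_rate_threshold,
                  important=False):
    """Decide whether a Cube is eliminated automatically.

    A Cube marked as important is never deleted automatically; it is left for
    manual handling. Otherwise both conditions must fail for deletion.
    """
    if important:
        return False
    low_calls = cube_calls < call_threshold
    low_hit_rate = hit_rate(cube_calls, model_calls) < hit_rate_threshold
    return low_calls and low_hit_rate


print(hit_rate(50, 200))                                    # 0.25
print(should_delete(5, 1000, 10, 0.05))                     # True
print(should_delete(5, 1000, 10, 0.05, important=True))     # False
```

Because both conditions must fail, a Cube that is rarely called but still serves most of the queries of its model is retained.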
As shown in fig. 2, based on the method for generating a multidimensional data cube disclosed in the embodiment of the present invention, an embodiment of the present invention further provides a device for generating a multidimensional data cube, including:
a data collection module 201, configured to collect query data in an analysis engine, the query data including: the model information, dimension information, metric information, and time granularity involved in the queries.
A construction information determining module 202, configured to perform statistical analysis on the query data and determine construction information of the multidimensional data cube, where the construction information includes: a recommended model, a recommended dimension combination, recommended metrics, and a recommended time granularity.
In one embodiment, the construction information determining module 202 includes:
a recommended model determining module, configured to count the model call volume within a preset time according to the model information involved in the queries, and determine a model that meets the model call volume condition as the recommended model.
A recommended dimension combination determining module, configured to count, according to the dimension information involved in the queries, the call volume and response time of each dimension combination in the recommended model within the preset time, and determine a dimension combination that meets the dimension call volume condition and the response time condition as the recommended dimension combination.
A recommended time granularity determining module, configured to determine the time granularity corresponding to the recommended dimension combination as the recommended time granularity.
A recommended metric determining module, configured to determine the metric fields in the model as the recommended metrics.
In one embodiment, the recommended dimension combination determining module further includes:
a dimension combination expanding module, configured to expand the recommended dimension combinations by way of dimension combination according to the number of recommended dimension combinations.
A dimension combination supplementing module, configured to supplement the recommended dimension combination according to the dimension table.
In one embodiment, the recommended dimension combination determining module further has a dimension combination optimization function, including:
a dimension combination deduplication module, configured to compare the recommended dimension combination with the existing dimension combinations and delete any recommended dimension combination that is identical to an existing one.
In one embodiment, the construction information determining module 202 further includes:
a profit value calculation module, configured to calculate the profit value of the construction information, where the profit value is the sum, over every pre-optimization dimension combination that the recommended dimension combination can cover, of the product of that combination's call volume and its average response time.
In one embodiment, the construction information determining module 202 further includes:
a data volume calculation module, configured to estimate the data volume of the Cube to be constructed from the construction information, where the data volume is estimated according to the number of dimension combinations.
In one embodiment, the construction information determining module 202 further includes:
a similarity calculation module, configured to calculate, according to the profit value, the similarity of the recommended dimension combinations among the pieces of construction information, and specifically configured to:
sort the construction information by profit value in descending order, and select the construction information whose rank satisfies the profit value condition;
calculate the similarity of the recommended dimension combinations among the selected construction information.
The similarity may be calculated with the Jaccard coefficient.
A construction module 203, configured to construct the multidimensional data cube according to the construction information, and specifically configured to:
call the analysis engine according to the construction information, and determine the storage information and the creation interface of the multidimensional data cube in the analysis engine, so as to complete the creation of the multidimensional data cube.
In one embodiment, the apparatus disclosed herein further comprises:
a complement module, configured to write the intermediate table obtained from the fact table and the dimension table into the multidimensional data cube, so as to complete the complement of the multidimensional data cube.
A scheduling module, configured to write the intermediate table obtained from the fact table and the dimension table into the multidimensional data cube, so as to complete the scheduling of the multidimensional data cube.
In one embodiment, the apparatus disclosed herein further comprises:
a hit rate calculation module, configured to calculate the hit rate of the multidimensional data cube within the preset time, where the hit rate is the ratio of the call volume of the multidimensional data cube to the call volume of the model.
In one embodiment, the apparatus disclosed herein further comprises:
an elimination module, configured to delete any multidimensional data cube whose call volume does not meet the call volume threshold and whose hit rate does not meet the hit rate threshold within the same preset time.
Based on the foregoing method embodiment, an embodiment of the present invention further provides a computer system, including:
one or more processors; and
a memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform the method for generating the multidimensional data cube described above.
Fig. 3 illustrates an architecture of a computer system, which may include, in particular, a processor 310, a video display adapter 311, a disk drive 312, an input/output interface 313, a network interface 314, and a memory 320. The processor 310, the video display adapter 311, the disk drive 312, the input/output interface 313, the network interface 314, and the memory 320 may be communicatively connected by a communication bus 330.
The processor 310 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solution provided in the present Application.
The Memory 320 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 320 may store an operating system 321 for controlling the operation of the electronic device 300, a basic input output system 322 (BIOS) for controlling low-level operations of the electronic device 300. In addition, a web browser 323, a data storage management system 324, and a device identification information processing system 325, and the like may also be stored. The device identification information processing system 325 may be an application program that implements the operations of the foregoing steps in this embodiment of the present application. In summary, when the technical solution provided by the present application is implemented by software or firmware, the relevant program code is stored in the memory 320 and called to be executed by the processor 310.
The input/output interface 313 is used for connecting an input/output module to realize information input and output. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The network interface 314 is used for connecting a communication module (not shown in the figure) to implement communication between the present device and other devices. The communication module may communicate in a wired manner (such as USB or a network cable) or in a wireless manner (such as a mobile network, Wi-Fi, or Bluetooth).
Bus 330 includes a path that transfers information between various components of the device, such as processor 310, video display adapter 311, disk drive 312, input/output interface 313, network interface 314, and memory 320.
In addition, the electronic device 300 may also obtain information of specific pickup conditions from the virtual resource object pickup condition information database 341 for performing condition judgment, and the like.
It should be noted that although the above devices only show the processor 310, the video display adapter 311, the disk drive 312, the input/output interface 313, the network interface 314, the memory 320, the bus 330, etc., in a specific implementation, the devices may also include other components necessary for normal operation. Furthermore, it will be understood by those skilled in the art that the apparatus described above may also include only the components necessary to implement the solution of the present application, and not necessarily all of the components shown in the figures.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially implemented or the portions contributing to the prior art may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method of the embodiments or some portions of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, the system or system embodiments are substantially similar to the method embodiments and therefore are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described system and system embodiments are merely illustrative, wherein units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
1. the technical solution disclosed by the invention generates Cubes automatically, and solves the problems of missing dimensions, mismatched time granularity, and low Cube hit rate that may arise when Cubes are created manually;
2. because Cubes are generated automatically, a Cube contains a variety of dimension combinations, which makes the Cube more general and increases its call rate;
3. through the automatic Cube construction process, the complement and scheduling of Cubes are also automated.
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present invention, and are not described in detail herein.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (7)

1. A method for generating a multidimensional data cube, comprising:
collecting query data in an analysis engine, the query data comprising: model information, dimension information, metric information, and time granularity involved in the queries;
performing statistical analysis on the query data, and determining construction information of a multidimensional data cube, wherein the construction information comprises: a recommended model, a recommended dimension combination, a recommended metric, a recommended time granularity, a profit value, and a similarity to the construction information of other multidimensional data cubes;
constructing a multidimensional data cube according to the construction information;
wherein performing statistical analysis on the query data to determine the construction information of the multidimensional data cube comprises:
counting, according to the model information involved in the queries, the model call volume within a preset period, and determining a model that meets the model call volume condition as the recommended model;
counting, according to the dimension information involved in the queries, the call volume and response time of each dimension combination in the recommended model within the preset period, and determining a dimension combination that meets the dimension call volume condition and the response time condition as the recommended dimension combination;
determining the time granularity corresponding to the recommended dimension combination as the recommended time granularity;
determining a metric field in the model as a recommended metric;
calculating a profit value of the construction information, wherein the profit value is the sum of products of call quantity and average response time of each dimension combination before optimization which can be covered by the recommended dimension combination;
and calculating the similarity of the recommended dimension combination among the construction information according to the profit value.
2. The method of claim 1, wherein said building a multidimensional data cube from the build information comprises:
and calling the analysis engine according to the construction information, and determining the storage information and the creation interface of the multidimensional data cube in the analysis engine so as to complete the creation of the multidimensional data cube.
3. The method of claim 1, wherein determining the recommended dimension combination further comprises dimension combination optimization:
expanding the recommended dimension combinations by way of dimension combination according to the number of recommended dimension combinations; and/or
supplementing the recommended dimension combination according to a dimension table.
4. The method of any of claims 1-3, wherein after constructing the multidimensional data cube, the method further comprises:
and writing the intermediate table obtained according to the fact table and the dimension table into the multidimensional data cube to complete complement and scheduling of the multidimensional data cube.
5. The method of any of claims 1-3, wherein after constructing the multidimensional data cube, the method further comprises:
and calculating the hit rate of the multidimensional data cube in a preset time, wherein the hit rate is the ratio of the calling amount of the multidimensional data cube to the model calling amount.
6. An apparatus for implementing the multidimensional data cube generation method of claim 1, comprising:
a data acquisition module, configured to acquire query data in the analysis engine, the query data comprising: model information, dimension information, metric information, and time granularity involved in the queries;
a construction information determining module, configured to perform statistical analysis on the query data and determine construction information of the multidimensional data cube, the construction information comprising: a recommended model, a recommended dimension combination, a recommended metric, and a recommended time granularity;
and a construction module, configured to construct the multidimensional data cube according to the construction information.
7. A computer system, comprising:
one or more processors; and
memory associated with the one or more processors for storing program instructions which, when read and executed by the one or more processors, perform the method of any of claims 1 to 5.
CN202011016340.5A 2020-09-24 2020-09-24 Method, device and system for generating multidimensional data cube Active CN112162971B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011016340.5A CN112162971B (en) 2020-09-24 2020-09-24 Method, device and system for generating multidimensional data cube
CA3141598A CA3141598A1 (en) 2020-09-24 2021-09-24 Multi-dimensional data cube generation method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011016340.5A CN112162971B (en) 2020-09-24 2020-09-24 Method, device and system for generating multidimensional data cube

Publications (2)

Publication Number Publication Date
CN112162971A CN112162971A (en) 2021-01-01
CN112162971B true CN112162971B (en) 2022-11-11

Family

ID=73863745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011016340.5A Active CN112162971B (en) 2020-09-24 2020-09-24 Method, device and system for generating multidimensional data cube

Country Status (2)

Country Link
CN (1) CN112162971B (en)
CA (1) CA3141598A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040122646A1 (en) * 2002-12-18 2004-06-24 International Business Machines Corporation System and method for automatically building an OLAP model in a relational database
CN106600067A (en) * 2016-12-19 2017-04-26 广州视源电子科技股份有限公司 Method and device for optimizing multidimensional cube model

Also Published As

Publication number Publication date
CN112162971A (en) 2021-01-01
CA3141598A1 (en) 2022-03-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant