WO2022007592A1 - Method, Device and System for Multidimensional Data Analysis - Google Patents

Method, Device and System for Multidimensional Data Analysis

Info

Publication number
WO2022007592A1
Authority
WO
WIPO (PCT)
Prior art keywords
dimension
preset
dimension combination
combination
analysis
Application number
PCT/CN2021/099664
Inventor
冀怀远
汪金忠
孙迁
Original Assignee
苏宁易购集团股份有限公司
Application filed by 苏宁易购集团股份有限公司
Publication of WO2022007592A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying

Definitions

  • the present application relates to the technical field of intelligent data processing, and in particular to a method, device and system for multi-dimensional data analysis.
  • an indicator refers to a measure under one or more dimensions.
  • for example, an indicator may be the sales amount of a store in a certain region, where region and store are dimensions and sales amount is the measure.
  • indicators have a property called additivity.
  • Additivity means that the measure values can be summed along every dimension.
  • the measures of an additive indicator have corresponding physical fields in the detailed table, so an additive indicator can be obtained by directly aggregating the data in the business-related detailed table. For example, the sales amount of every record in the detailed table can be accumulated, and sales amount is additive along the region, department, product and time dimensions.
  • the sales amount under any combination of dimensions can therefore be calculated directly from the detailed table.
  • Non-additivity means that the measure values cannot be summed along any dimension.
  • the measures of non-additive indicators have no corresponding physical fields in the detailed table, so non-additive indicators cannot be obtained by directly aggregating the data in the detailed table. They fall into two main categories: relative indicators such as gross profit margin, year-on-year and period-over-period ratios, and complex indicators such as the deduplicated indicator UV (unique visitors).
  • for deduplication indicators, additivity can be achieved through certain algorithms, for example HLL (HyperLogLog) for approximate distinct counting and bitmaps for exact distinct aggregation.
  • the number of repurchases among repurchase indicators, for example, requires counting the members who purchased products on two or more days. The detailed table has no physical field corresponding to the number of repurchases, so it naturally cannot be aggregated directly.
  • the pre-aggregated data of non-additive indicators is not simply aggregated from detailed data, but has a more complex logical operation relationship. Such indicators also include TOPN, sales to new and existing buyers, etc. When a user needs to query a non-additive indicator, the calculation of such indicators may be very slow under the massive amount of data in the detailed table, which cannot meet the performance requirements of the business.
  • the present application provides a data multidimensional analysis method, device and system, which can speed up the query of non-additive indicators and improve the query performance.
  • a first aspect provides a method for multidimensional analysis of data, the method comprising:
  • the target pre-summary table corresponding to the target dimension combination is determined among the pre-summary tables pre-calculated in the preset logical model, where a pre-summary table is a summary table of measure information under a dimension combination, aggregated by the preset logical model from the preset detailed table;
  • the data information to be queried by the query request is obtained from the target pre-summary table and returned to the user.
  • the method also includes:
  • if the analysis dimension combination is not the same as any dimension combination calculated in the preset logical model, the query request is rewritten according to preset rules into a query statement for the preset detailed table, and the query statement is used to obtain the data requested by the query request from the preset detailed table.
  • the preset logical model includes dimension information, measurement information, model time period and calculation logic, and the method further includes the step of calculating dimension combinations in the preset logical model:
  • all dimension combinations are calculated from the dimensions in the dimension information, and the dimension combination identifiers corresponding to all the dimension combinations are calculated according to preset identifier rules.
  • the method also includes the step of pre-calculating a pre-summary table in the preset logic model:
  • the target time range corresponding to the current time is calculated from the current time and the model time period;
  • each dimension combination, its dimension combination identifier and the target time range are added as parameters to the calculation logic to form a calculation statement for that dimension combination; the calculation statement is used to calculate the measure information under the dimension combination within the target time range;
  • the data in the preset detailed table is aggregated and calculated by using the calculation statement of each of the dimension combinations, so as to obtain a pre-summary table of the measurement information under each of the dimension combinations.
  • the metric in the metric information has no corresponding physical field in the detailed table.
  • the method further includes:
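  • the query requests sent by users are monitored, the dimension combinations whose query count exceeds a preset number are collected, and the measure information under those dimension combinations is pre-aggregated to generate a pre-summary table.
  • before judging whether the analysis dimension combination is the same as any dimension combination calculated in the preset logical model, the method further includes: judging whether the query request contains a filter condition; if so, extracting the filter dimension from the filter condition and judging whether it is contained in the analysis dimension combination; if so, judging whether the analysis dimension combination is the same as any dimension combination calculated in the preset logical model.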
  • a second aspect of the present application provides a data multidimensional analysis device, the device comprising:
  • a receiving unit configured to receive a query request from a user and extract the analysis dimension combination from the query request;
  • a judging unit for judging whether the analysis dimension combination is the same as any dimension combination calculated in the preset logical model, and if so, determining the dimension combination that is the same as the analysis dimension combination as the target dimension combination;
  • a pre-summary table determining unit configured to determine, according to the preset correspondence between dimension combinations and pre-summary tables, the target pre-summary table corresponding to the target dimension combination among the pre-summary tables pre-calculated in the preset logical model,
  • where a pre-summary table is a summary table of measure information under a dimension combination, aggregated by the preset logical model from the preset detailed table;
  • a returning unit configured to obtain the data requested by the query request from the target pre-summary table and return it to the user.
  • a statement rewriting unit is also included, configured to, if the analysis dimension combination is judged not to be the same as any dimension combination calculated in the preset logical model, rewrite the query request according to preset rules into a query statement applicable to the preset detailed table, and obtain the data requested by the query request from the preset detailed table using the query statement.
  • a third aspect of the present application provides a computer system, the system comprising:
  • one or more processors; and
  • a memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform the method described above.
  • the present application uses a preset logical model that stores dimension combinations and the pre-summary table corresponding to each dimension combination, with the pre-summary tables for the different dimension combinations calculated in advance. The target pre-summary table is located according to the analysis dimension combination, and the data requested by the query request is read directly from it and returned to the user, which speeds up users' queries on non-additive indicators and improves query performance.
  • Fig. 1 shows the flow chart of the data multidimensional analysis method provided in Embodiment 1 of the present application
  • Fig. 2 shows the structure diagram of the data multi-dimensional analysis device provided in Embodiment 2 of the present application
  • FIG. 3 shows a structural diagram of a computer system provided by Embodiment 3 of the present application.
  • non-additivity means that a measure's values cannot be summed along any dimension.
  • the measures of non-additive indicators have no corresponding physical fields in the detailed table, so non-additive indicators cannot be obtained by directly aggregating the data in the detailed table.
  • the multi-dimensional analysis and summary calculation of such indicators is more complicated.
  • the number of repurchases among repurchase indicators, for example, requires counting the members who purchased products on two or more days. The detailed table has no physical field corresponding to the number of repurchases, so it naturally cannot be aggregated directly.
  • the pre-aggregated data of non-additive indicators is not simply aggregated from detailed data, but has a more complex logical operation relationship. When the business needs to query such indicators, under the massive amount of data in the detailed table, the calculation of such indicators is very slow and cannot meet the performance requirements of the business.
  • the present application proposes a data multi-dimensional analysis method.
  • the logical model stores dimension combinations and the pre-summary table corresponding to each dimension combination.
  • the analysis dimension combination is extracted from the query request, the corresponding target pre-summary table is found among the pre-summary tables, and the requested data is read directly from the target pre-summary table and returned to the user, speeding up users' queries on non-additive indicators and improving query performance.
  • the embodiment of the present application provides a data multidimensional analysis method, which is exemplified by the method being configured in a data multidimensional analysis apparatus.
  • the apparatus can be applied to any computer equipment, so that the computer equipment can execute the data multidimensional analysis method.
  • the above method includes:
  • the user's query request includes the combination of analysis dimensions.
  • the query request is to query the monthly number of repurchases and the repurchase rate aggregated by region and store within a certain time period.
  • the analysis dimension combination in the query request here is region, store and time granularity, and the number of repurchases and the repurchase rate are the metric information.
  • the preset logical model contains several dimension combinations, such as region, store and time; region and store; region and time; and so on. The analysis dimension combination is compared with them to judge whether it is the same as any one of them. For example, if the analysis dimension combination is region, store and time and the preset logical model also contains the combination of region, store and time, the two are the same, so the region, store and time combination in the logical model is determined as the target dimension combination.
  • a dimension combination in the logic model corresponds to a pre-summary table. After determining the target dimension combination, the corresponding target pre-summary table is found from all the pre-summary tables.
  • the pre-summary table is summarized from the detailed table by the logic model according to the preset calculation logic.
  • that is, the pre-summary table is the summary table of measure information under the calculated dimension combination. For example, if the target dimension combination is region, store and time, the target pre-summary table is the summary table of the number of repurchases and the repurchase rate under the region, store and time combination.
  • the query request is to query the monthly number of repurchases by region and store within a certain period of time.
  • Data can be directly obtained from the pre-summary table corresponding to the combination of regions, stores and time granularity dimensions and returned to the user.
  • the method also includes:
  • if the analysis dimension combination is not the same as any dimension combination calculated in the preset logical model, the query request is rewritten according to preset rules into a query statement for the preset detailed table, and the query statement is used to obtain the data requested by the query request from the preset detailed table.
  • the query request needs to be rewritten into a query statement suitable for the detailed table according to the preset rules.
  • because the measures have no corresponding physical fields, they cannot be calculated directly from the detailed table, and complex logical operations are needed to aggregate the measure information under the analysis dimension combination from the detailed table.
  • Such query requests are rare and account for a very small proportion of the total query volume, so the method as a whole meets business needs.
  • the preset logical model includes dimension information, measurement information, model time period and calculation logic, and the method further includes the step of calculating dimension combinations in the preset logical model:
  • all dimension combinations are calculated from the dimensions in the dimension information, and the dimension combination identifiers corresponding to all the dimension combinations are calculated according to preset identifier rules.
  • the dimension information included in the logical model is the dimensions to be combined, such as area, store, and time
  • the measurement information is the measurement to be calculated, such as the number of repurchases, the repurchase rate
  • the model time period is the calculation period, such as one year, one month or one week.
  • the calculation logic is used to calculate the measure information under a dimension combination, for example, the number of repurchases under the combination of region and store.
  • Mandatory dimensions refer to dimensions that must appear in each dimension combination, such as time and statistical period.
  • if time is a mandatory dimension, it is added to every dimension combination. For example, if the dimensions in the dimension information are area, store and time, the calculated dimension combinations are area and time; store and time; and area, store and time. If there is no mandatory dimension, the dimension combinations are simply all free combinations of the dimensions.
  • the current time refers to the time when the pre-summary table is calculated.
  • the model time period may be one year, one month or one week. For example, if the pre-summary table is calculated on June 20, 2020 and the model time period is one year, the target time range is January 1, 2020 to December 31, 2020; if the model time period is one month, the target time range is June 1, 2020 to June 30, 2020.
  • each dimension combination, its dimension combination identifier and the target time range are added as parameters to the calculation logic to form a calculation statement for that dimension combination; the calculation statement is used to calculate the measure information under the dimension combination within the target time range;
  • the data in the preset detailed table is aggregated and calculated by using the calculation statement of each of the dimension combinations, so as to obtain a pre-summary table of the measurement information under each of the dimension combinations.
  • the calculation logic is a pre-written SQL statement with variables.
  • the dimension combination, the dimension combination identifier and the target time range are passed as parameters into the SQL statement to form the target calculation statement, which calculates the measure information of the dimension combination within the target time range.
  • for example, the dimension combination (area and store), its dimension combination identifier (12) and the target time range (June 1, 2020 to June 30, 2020) are passed as parameters into the SQL statement to form the target calculation statement, which aggregates the data in the preset detailed table to calculate the number of repurchases and the repurchase rate under the area and store combination from June 1, 2020 to June 30, 2020.
  • the metrics in the metric information have no corresponding physical fields in the detailed table.
  • the purpose of this method is to speed up users' queries on non-additive indicators. Because the measures of non-additive indicators have no corresponding physical fields in the detailed table, such indicators cannot be obtained by directly aggregating the data in the detailed table; the calculation logic must be designed first, and the calculation statements are then used to aggregate the data in the detailed table into pre-summary tables.
  • the method also includes:
  • the query requests sent by users are monitored, the dimension combinations whose query count exceeds a preset number are collected, and the measure information under those dimension combinations is pre-aggregated to generate a pre-summary table.
  • before judging whether the analysis dimension combination is the same as any dimension combination calculated in the preset logical model, the method further includes judging whether the query request contains a filter condition and, if so, extracting the filter dimension and judging whether it is contained in the analysis dimension combination.
  • for example, suppose the analysis dimension combination is region, store and time granularity.
  • if the filter condition is region A, the filter dimension is region.
  • the filter dimension is then included in the analysis dimension combination, so the next step is to judge whether the analysis dimension combination is the same as any dimension combination calculated in the preset logical model.
  • if the filter condition is a certain product category, the filter dimension is category.
  • category is not included in the analysis dimension combination.
  • the above method can be specifically used in the scenario of querying repurchase indicators.
  • the tables in the data warehouse are widened into a wide table to obtain the detailed table, and a repurchase model is established.
  • the repurchase model includes dimensions, measure information, a model time period and calculation logic, where:
  • Dimensions include time (__time), store, region, category, detailed channel, business unit, marketing activity, member ID and time granularity (which can be set to month, quarter, half year, year, etc.);
  • Metrics include the number of purchasers (a calculated measure: the number of people who purchased products within the specified time), the number of repurchases (a calculated measure: the number of people who purchased on two or more days) and the repurchase rate (a derived indicator: number of repurchases / number of purchasers);
  • Model time period settings can be set to 1 year, 1 month and 1 week;
  • the calculation logic is an SQL statement with variables.
  • the SQL statement of the repurchase class model is as follows:
  • ${groupByColumns} refers to the dimension combination in this SQL statement
  • ${parquetTable} refers to the detailed table obtained by widening the model's data warehouse table
  • ${cuboid} is the dimension combination identifier, which is used to indicate a certain dimension combination. If groupByColumns is specified, the cuboid value can be calculated.
  • the number of purchasers and the number of repurchases are calculated measures in the model, and the actual name of the corresponding calculated measure in the model definition shall prevail.
  • This SQL statement can also add other filter conditions according to the actual situation.
  • the target time range, each dimension combination and the dimension combination identifier corresponding to the dimension combination are added as parameters to the SQL statement to form the target calculation statement;
  • the query SQL statement can be written as:
  • the analysis dimension is time granularity, region and store
  • the filter condition is the content behind where, that is, a region and a certain time period
  • the corresponding filter dimension is region and time.
  • time granularity is a time dimension, and time appears among the filter dimensions
  • the filter dimension is included in the analysis dimension.
  • the filter dimension can be set or not set according to the requirements.
  • the logic model calculates the pre-summary table of the number of repurchases, the number of purchasers and the repurchase rate under the combination of time, region and store dimensions;
  • the monthly number of repurchases and the repurchase rate of stores in a certain region within a certain time period to be queried by the query request are obtained from the pre-summary table and returned to the user.
  • Embodiment 2 of the present application provides a data multi-dimensional analysis device, and the device includes:
  • a receiving unit 21 configured to receive a query request from a user, and extract the analysis dimension combination in the query request;
  • the user's query request includes the combination of analysis dimensions.
  • the query request is to query the monthly number of repurchases and the repurchase rate aggregated by region and store within a certain time period.
  • the analysis dimension combination in the query request here is region, store and time granularity, and the number of repurchases and the repurchase rate are the metric information.
  • Judging unit 22 for judging whether the analysis dimension combination is the same as any dimension combination calculated in the preset logical model, and if so, determining the dimension combination that is the same as the analysis dimension combination as the target dimension combination;
  • the preset logical model contains several dimension combinations, such as region, store and time; region and store; region and time; and so on. The analysis dimension combination is compared with them to judge whether it is the same as any one of them. For example, if the analysis dimension combination is region, store and time and the preset logical model also contains the combination of region, store and time, the two are the same, so the region, store and time combination in the logical model is determined as the target dimension combination.
  • A pre-summary table determining unit 23 configured to determine, according to the preset correspondence between dimension combinations and pre-summary tables, the target pre-summary table corresponding to the target dimension combination among the pre-summary tables pre-calculated in the preset logical model, where a pre-summary table is a summary table of measure information under a dimension combination, aggregated by the preset logical model from the preset detailed table;
  • a dimension combination in the logic model corresponds to a pre-summary table. After determining the target dimension combination, the corresponding target pre-summary table is found from all the pre-summary tables.
  • the pre-summary table is summarized from the detailed table by the logic model according to the preset calculation logic.
  • that is, the pre-summary table is the summary table of measure information under the calculated dimension combination. For example, if the target dimension combination is region, store and time, the target pre-summary table is the summary table of the number of repurchases and the repurchase rate under the region, store and time combination.
  • the returning unit 24 is configured to obtain the data information to be queried by the query request from the target pre-summary table and return it to the user.
  • the query request is to query the monthly number of repurchases by region and store within a certain period of time.
  • Data can be directly obtained from the pre-summary table corresponding to the combination of regions, stores and time granularity dimensions and returned to the user.
  • a statement rewriting unit configured to, if it is determined that the analysis dimension combination is not the same as any dimension combination calculated in the preset logical model, rewrite the query request according to preset rules into a query statement applicable to the preset detailed table, and obtain the data requested by the query request from the preset detailed table using the query statement.
  • Embodiment 3 of the present application provides a computer system, including:
  • one or more processors; and
  • a memory associated with the one or more processors, the memory being used to store program instructions that, when read and executed by the one or more processors, perform the method steps of Embodiment 1, for example the following operations:
  • the target pre-summary table corresponding to the target dimension combination is determined among the pre-summary tables pre-calculated in the preset logical model, where a pre-summary table is a summary table of measure information under a dimension combination, aggregated by the preset logical model from the preset detailed table;
  • the data information to be queried by the query request is obtained from the target pre-summary table and returned to the user.
  • FIG. 3 exemplarily shows the architecture of the computer system, which may specifically include a processor 1510, a video display adapter 1511, a disk drive 1512, an input/output interface 1513, a network interface 1514 and a memory 1520.
  • the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514 and the memory 1520 may be communicatively connected through the communication bus 1530.
  • the processor 1510 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), one or more integrated circuits, or the like, and is used to execute the relevant programs to implement the technical solutions provided in this application.
  • the memory 1520 may be implemented in the form of ROM (Read-Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, and the like.
  • the memory 1520 may store an operating system 1521 for controlling the operation of the computer system 1500 , a basic input output system (BIOS) for controlling low-level operations of the computer system 1500 .
  • a web browser 1523, a data storage management system 1524, an icon font processing system 1525, and the like may also be stored.
  • the above-mentioned icon font processing system 1525 may be an application program that specifically implements the operations of the foregoing steps in this embodiment of the present application.
  • the relevant program codes are stored in the memory 1520 and invoked by the processor 1510 for execution.
  • the input/output interface 1513 is used for connecting input/output modules to realize information input and output.
  • the input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions.
  • the input device may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc.
  • the output device may include a display, a speaker, a vibrator, an indicator light, and the like.
  • the network interface 1514 is used to connect a communication module (not shown in the figure), so as to realize the communication interaction between the device and other devices.
  • the communication module can implement communication through wired means (such as USB, network cable, etc.), or can implement communication through wireless means (such as mobile network, WIFI, Bluetooth, etc.).
  • Bus 1530 includes a path that transfers information between the various components of the device (eg, processor 1510, video display adapter 1511, disk drive 1512, input/output interface 1513, network interface 1514, and memory 1520).
  • the computer system 1500 can also obtain the information of the specific collection conditions from the virtual resource object collection condition information database 1541, so as to be used for condition judgment, and so on.
  • although the above device only shows the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, the memory 1520, the bus 1530, etc., in a specific implementation the device may also include other components necessary for normal operation.
  • the above-mentioned device may also include only the necessary components to realize the solution of the present application, instead of all the components shown in the figures.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

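A multidimensional data analysis method, comprising: receiving a query request from a user and extracting the analysis dimension combination from the query request (S11); judging whether the analysis dimension combination is the same as any dimension combination calculated in a preset logical model and, if so, determining the dimension combination in the logical model that is the same as the analysis dimension combination as the target dimension combination (S12); determining, according to a preset correspondence between dimension combinations and pre-summary tables, the target pre-summary table corresponding to the target dimension combination among the pre-summary tables pre-calculated in the logical model (S13); and obtaining the data requested by the query request from the target pre-summary table and returning it to the user (S14). Compared with the prior art, this method can speed up queries on non-additive indicators and improve query performance.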

Description

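Multi-dimensional Data Analysis Method, Device and System
Technical Field
The present application relates to the technical field of intelligent data processing, and in particular to a method, device and system for multi-dimensional data analysis.
Background Art
In the fields of data warehousing and multi-dimensional analysis, a metric is a measure under one or more dimensions. For example, for the metric "sales amount of a given store in a given region", region and store are dimensions and sales amount is the measure. Metrics have a property called additivity: an additive measure can be summed along every dimension. The measure of an additive metric has a corresponding physical field in the detail table, so the metric can be obtained directly by aggregating the data in the business-related detail table. Sales amount is an example: the sales amounts of individual detail records can be accumulated and are additive along the region, department, product and time dimensions, so the sales amount under any dimension combination can be computed directly from the detail table.
A non-additive measure is one that cannot be summed along any dimension. The measure of a non-additive metric has no corresponding physical field in the detail table, so the metric cannot be obtained by directly aggregating the detail data. Non-additive metrics fall into two broad categories: relative metrics, such as gross margin and year-over-year or period-over-period growth, and complex metrics, such as the deduplicated metric UV (unique visitors).
Non-additive metrics can be roughly divided into three cases:
Ratio metrics, such as averages: although the measure is not additive and has no corresponding physical field in the detail table, an average is computed from a sum and a count, and both of those measures are additive.
Deduplicated (distinct-count) metrics: additivity can also be achieved with suitable algorithms, for example HLL (HyperLogLog) for approximate distinct counting or bitmaps for exact distinct-count aggregation.
Some metrics, however, cannot be made additive, and their multi-dimensional aggregation is usually complex. Take the number of repeat buyers among repurchase metrics: it requires counting the members who bought goods on two or more different days. The detail table has no physical field for this measure, so it naturally cannot be aggregated directly. The pre-aggregated data of such a non-additive metric is not simply rolled up from the detail data but involves relatively complex logical operations. Such metrics also include TOP N and sales amounts split by new and returning buyers. When a user queries such a metric against the massive data in the detail table, the computation can be very slow and fails to meet the business's performance requirements.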
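The three cases above can be illustrated with a small sketch. It assumes hypothetical purchase records of the form (day, member_id, amount) and uses a Python set as a stand-in for an exact-dedup bitmap (HLL is not shown). Per-day partial sums, counts and member sets merge correctly into period-level values, whereas per-day repeat-buyer counts do not add up to the period-level count.

```python
# Hypothetical purchase records: (day, member_id, amount).
RECORDS = [
    ("2020-06-01", "m1", 10.0),
    ("2020-06-01", "m2", 30.0),
    ("2020-06-02", "m1", 20.0),
]


def daily_partials(records):
    """Per-day partial aggregates that can later be merged."""
    partials = {}
    for day, member, amount in records:
        p = partials.setdefault(day, {"sum": 0.0, "count": 0, "members": set()})
        p["sum"] += amount      # additive
        p["count"] += 1         # additive
        p["members"].add(member)  # a set stands in for an exact-dedup bitmap
    return partials


def rollup(partials):
    """Merge the per-day partials into a period-level average and UV."""
    total_sum = sum(p["sum"] for p in partials.values())
    total_count = sum(p["count"] for p in partials.values())
    members = set().union(*(p["members"] for p in partials.values()))
    return total_sum / total_count, len(members)


def repeat_buyers(records):
    """Members who bought on two or more distinct days (not mergeable)."""
    days = {}
    for day, member, _ in records:
        days.setdefault(member, set()).add(day)
    return sum(1 for d in days.values() if len(d) >= 2)
```

Here member m1 buys on both days: each day on its own has zero repeat buyers, yet the two-day period has one, so per-day repeat-buyer counts cannot simply be summed.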
Technical Problem
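Technical Solution
The present application provides a multi-dimensional data analysis method, device and system that accelerate queries on non-additive metrics and improve query performance.
The present application provides the following solutions:
A first aspect provides a multi-dimensional data analysis method, the method comprising:
receiving a query request from a user, and extracting the analysis dimension combination from the query request;
determining whether the analysis dimension combination is identical to any dimension combination computed in a preset logical model, and if so, determining the dimension combination identical to the analysis dimension combination as the target dimension combination;
determining, according to a preset correspondence between dimension combinations and pre-aggregation tables, the target pre-aggregation table corresponding to the target dimension combination among the pre-aggregation tables pre-computed in the preset logical model, the pre-aggregation tables being summary tables of measure information under different dimension combinations, aggregated by the preset logical model from a preset detail table;
obtaining, from the target pre-aggregation table, the data requested by the query request and returning it to the user.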
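Further, the method also comprises:
if it is determined that the analysis dimension combination is not identical to any dimension combination computed in the preset logical model, rewriting the query request according to preset rules into a query statement applicable to the preset detail table, and using the query statement to obtain, from the preset detail table, the data requested by the query request.
Further, the preset logical model includes dimension information, measure information, a model time period and computation logic, and the method further comprises a step of computing the dimension combinations in the preset logical model:
computing all dimension combinations from the dimensions in the dimension information;
computing, according to a preset identification rule, the dimension combination identifier corresponding to each of the dimension combinations.
Further, the method also comprises a step of pre-computing the pre-aggregation tables in the preset logical model:
computing, from the current time and the model time period, the target time range corresponding to the current time;
adding each dimension combination, its corresponding dimension combination identifier and the target time range as parameters to the computation logic, so as to form a computation statement for each dimension combination, the computation statement being used to compute the measure information under that dimension combination within the target time range;
aggregating the data in the preset detail table with the computation statement of each dimension combination, to obtain a pre-aggregation table of the measure information under each dimension combination.
Further, the measures in the measure information have no corresponding physical field in the detail table.
Preferably, the method further comprises:
monitoring the query requests sent by users, collecting the dimension combinations whose number of queries exceeds a preset count, and pre-aggregating the measure information under those dimension combinations to generate pre-aggregation tables.
Further, before determining whether the analysis dimension combination is identical to any dimension combination computed in the preset logical model, the method further comprises:
determining whether the query request contains a filter condition; if so, extracting the filter dimension from the filter condition and determining whether the filter dimension is contained in the analysis dimension combination; and if so, determining whether the analysis dimension combination is identical to any dimension combination computed in the preset logical model.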
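A second aspect of the present application provides a multi-dimensional data analysis device, the device comprising:
a receiving unit, configured to receive a query request from a user and extract the analysis dimension combination from the query request;
a judging unit, configured to determine whether the analysis dimension combination is identical to any dimension combination computed in a preset logical model, and if so, to determine the dimension combination identical to the analysis dimension combination as the target dimension combination;
a pre-aggregation table determining unit, configured to determine, according to a preset correspondence between dimension combinations and pre-aggregation tables, the target pre-aggregation table corresponding to the target dimension combination among the pre-aggregation tables pre-computed in the preset logical model, the pre-aggregation tables being summary tables of measure information under different dimension combinations, aggregated by the preset logical model from a preset detail table;
a returning unit, configured to obtain, from the target pre-aggregation table, the data requested by the query request and return it to the user.
Further, the device also comprises a statement rewriting unit, configured to, if it is determined that the analysis dimension combination is not identical to any dimension combination computed in the preset logical model, rewrite the query request according to preset rules into a query statement applicable to the preset detail table, and use the query statement to obtain, from the preset detail table, the data requested by the query request.
A third aspect of the present application provides a computer system, the system comprising:
one or more processors; and
a memory associated with the one or more processors, the memory storing program instructions that, when read and executed by the one or more processors, perform the method described above.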
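Beneficial Effects
According to the specific embodiments provided herein, the present application discloses the following technical effects:
The present application presets a logical model that stores dimension combinations and their corresponding pre-aggregation tables, which are computed in advance. When a query request is received from a user, the analysis dimension combination is extracted from it, the corresponding target pre-aggregation table is found, and the requested data is obtained directly from that table and returned to the user. This accelerates user queries on non-additive metrics and improves query performance.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of the multi-dimensional data analysis method provided in Embodiment 1 of the present application;
FIG. 2 is a structural diagram of the multi-dimensional data analysis device provided in Embodiment 2 of the present application;
FIG. 3 is a structural diagram of the computer system provided in Embodiment 3 of the present application.
Embodiments of the Invention
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments fall within the protection scope of the present application.
As stated in the background, a non-additive measure cannot be summed along any dimension. The measure of a non-additive metric has no corresponding physical field in the detail table, so the metric cannot be obtained by directly aggregating the detail data.
The multi-dimensional aggregation of such metrics is usually complex. For example, the number of repeat buyers among repurchase metrics requires counting the members who bought goods on two or more different days; the detail table has no physical field for this measure, so it naturally cannot be aggregated directly. The pre-aggregated data of non-additive metrics is not simply rolled up from the detail data but involves relatively complex logical operations. When the business needs to query such metrics against the massive data in the detail table, the computation is very slow and fails to meet performance requirements.
To this end, the present application proposes a multi-dimensional data analysis method. A logical model is preset that stores dimension combinations and their corresponding pre-aggregation tables, computed in advance. When a query request is received from a user, the analysis dimension combination is extracted, the corresponding target pre-aggregation table is found, and the requested data is obtained directly from it and returned to the user. This accelerates user queries on non-additive metrics and improves query performance.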
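Embodiment 1
This embodiment provides a multi-dimensional data analysis method. By way of example, the method is configured in a multi-dimensional data analysis device, which can be applied in any computer device so that the computer device can execute the method.
As shown in FIG. 1, the method comprises:
S11. Receiving a query request from a user, and extracting the analysis dimension combination from the query request;
The user's query request contains an analysis dimension combination. For example, if the request asks for the monthly number of repeat buyers and repurchase rate, aggregated by region and store, over a certain period, then the analysis dimension combination is region, store and time granularity, and the number of repeat buyers and the repurchase rate are the measure information.
S12. Determining whether the analysis dimension combination is identical to any dimension combination computed in the preset logical model, and if so, determining the dimension combination identical to the analysis dimension combination as the target dimension combination;
The preset logical model contains several dimension combinations, such as region + store + time, region + store, region + time, and store + time. The analysis dimension combination is compared with each of them. For example, if the analysis dimension combination is region, store and time, and the preset logical model also contains the region + store + time combination, the two are identical, and the region + store + time combination is determined as the target dimension combination.
S13. Determining, according to the preset correspondence between dimension combinations and pre-aggregation tables, the target pre-aggregation table corresponding to the target dimension combination among the pre-aggregation tables pre-computed in the preset logical model, the pre-aggregation tables being summary tables of measure information under different dimension combinations, aggregated by the preset logical model from the preset detail table;
In the logical model, each dimension combination corresponds to one pre-aggregation table. Once the target dimension combination is determined, the corresponding target pre-aggregation table is found among all pre-aggregation tables. A pre-aggregation table is the summary of measure information under a dimension combination, computed by the logical model from the detail table according to preset computation logic. For example, if the target dimension combination is region + store + time, the target pre-aggregation table is the summary of the number of repeat buyers and the repurchase rate under that combination.
S14. Obtaining, from the target pre-aggregation table, the data requested by the query request and returning it to the user.
For example, if the query asks for the monthly number of repeat buyers aggregated by region and store over a certain period, the data can be obtained directly from the pre-aggregation table of the region + store + time granularity combination and returned to the user.
The method further comprises:
if it is determined that the analysis dimension combination is not identical to any dimension combination computed in the preset logical model, rewriting the query request according to preset rules into a query statement applicable to the preset detail table, and using the query statement to obtain, from the preset detail table, the data requested by the query request.
If the analysis dimension combination in the query request differs from every dimension combination in the preset logical model, the query request must be rewritten according to preset rules into a query statement applicable to the detail table. Because the measures have no corresponding physical fields in the detail table, they cannot be aggregated directly; complex logical operations are needed to compute the summary of measure information under the analysis dimension combination from the detail table. Such query requests are rare and make up only a small share of the query volume, so overall the method still meets the business requirements.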
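A minimal sketch of steps S12 to S14 together with the fallback follows. It assumes the model's metadata is available as a mapping from each dimension combination to the name of its pre-aggregation table. The names `route_query`, `preagg_tables` and `rewrite_for_detail` are hypothetical. The source does not specify the preset rewrite rules, so the rewrite is passed in as a function. Dimension combinations are compared as sets, so the order in which the query lists its dimensions does not matter.

```python
def route_query(analysis_dims, preagg_tables, rewrite_for_detail, query):
    """Return ("preagg", table) on a match (S12/S13), else ("detail", rewritten SQL).

    preagg_tables maps each model dimension combination (a tuple) to the name
    of its pre-aggregation table; it and rewrite_for_detail are hypothetical
    stand-ins for the model metadata and the preset rewrite rules.
    """
    # Index the model's combinations order-insensitively.
    by_combination = {frozenset(combo): table for combo, table in preagg_tables.items()}
    table = by_combination.get(frozenset(analysis_dims))
    if table is not None:
        return "preagg", table  # S14 then reads the data from this table
    # No identical combination: fall back to the (slow) detail table.
    return "detail", rewrite_for_detail(query)
```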
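The preset logical model includes dimension information, measure information, a model time period and computation logic, and the method further comprises a step of computing the dimension combinations in the preset logical model:
computing all dimension combinations from the dimensions in the dimension information;
computing, according to a preset identification rule, the dimension combination identifier corresponding to each of the dimension combinations.
The dimension information in the logical model lists the dimensions to be combined, such as region, store and time. The measure information lists the measures to be computed, such as the number of repeat buyers and the repurchase rate. The model time period specifies how long a span of data is computed, such as one year, one month or one week. The computation logic computes the measure information under a dimension combination, such as the number of repeat buyers under the region + store combination.
The dimension information may contain mandatory dimensions, i.e., dimensions that appear in every dimension combination, such as time and statistical period. When all dimension combinations are computed, the mandatory dimension is added to every combination. For example, if the dimensions are region, store and time, with time mandatory, the computed combinations are region + time, store + time, region + store + time, and so on. If there is no mandatory dimension, the dimension combinations are simply all free combinations of the dimensions.
The logical model presets an identification rule for labeling dimension combinations. For example, the three combinations region + store, region + time and store + time are identified as 12, 13 and 23 respectively.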
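The enumeration and identification steps might look as follows. This sketch rests on two assumptions. First, the identification rule numbers each dimension by its 1-based position and concatenates the numbers, which reproduces the 12/13/23 example. Second, every mandatory dimension is forced into each combination. The function name `enumerate_cuboids` is hypothetical.

```python
from itertools import combinations


def enumerate_cuboids(dimensions, mandatory=()):
    """Return {identifier: combination} for every dimension combination.

    Hypothetical identification rule: each dimension is numbered by its
    1-based position in `dimensions`, and a combination's identifier
    concatenates those numbers in ascending order. Every mandatory dimension
    (which must also appear in `dimensions`) is added to each combination.
    """
    index = {dim: i + 1 for i, dim in enumerate(dimensions)}
    free = [d for d in dimensions if d not in mandatory]
    result = {}
    for size in range(1, len(free) + 1):
        for combo in combinations(free, size):
            # Add the mandatory dimensions and restore the model's dimension order.
            full = sorted(set(combo) | set(mandatory), key=index.__getitem__)
            ident = "".join(str(index[d]) for d in full)
            result[ident] = tuple(full)
    return result
```

Concatenating single digits becomes ambiguous once there are more than nine dimensions (for example, 112 could mean dimensions 1 and 12 or dimensions 11 and 2). Encoding each combination as a bitmask is a common alternative.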
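The method further comprises a step of pre-computing the pre-aggregation tables in the preset logical model:
computing, from the current time and the model time period, the target time range corresponding to the current time;
The current time is the time at which the pre-aggregation tables are computed, and the model time period is one year, one month or one week. For example, if the pre-aggregation tables are computed on June 20, 2020 and the model time period is one year, the target time range is January 1, 2020 to December 31, 2020; if the model time period is one month, the target time range is June 1, 2020 to June 30, 2020.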
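The target-time-range computation can be sketched as below, assuming calendar periods and weeks that start on Monday. It returns a half-open range [start, end), which avoids off-by-one errors at the end of a period. `target_time_range` and `time_range_predicate` are hypothetical helper names, and the predicate format follows the `${timeRange}` example of the repurchase model.

```python
from datetime import date, datetime, timedelta


def target_time_range(current: date, period: str) -> tuple[datetime, datetime]:
    """Half-open [start, end) range of the model time period containing `current`."""
    if period == "year":
        start = date(current.year, 1, 1)
        end = date(current.year + 1, 1, 1)
    elif period == "month":
        start = current.replace(day=1)
        # First day of the next month (rolls over into January of the next year).
        end = date(current.year + (current.month == 12), current.month % 12 + 1, 1)
    elif period == "week":
        start = current - timedelta(days=current.weekday())  # back to Monday
        end = start + timedelta(days=7)
    else:
        raise ValueError(f"unsupported model time period: {period!r}")
    midnight = datetime.min.time()
    return datetime.combine(start, midnight), datetime.combine(end, midnight)


def time_range_predicate(start: datetime, end: datetime, column: str = "__time") -> str:
    """Render the range as a SQL predicate for the ${timeRange} variable."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return f"{column}>='{start:{fmt}}' and {column}<'{end:{fmt}}'"
```

For 2020-06-20 with a one-month period, this yields [2020-06-01 00:00:00, 2020-07-01 00:00:00). That range covers exactly June 1 to June 30.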
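adding each dimension combination, its corresponding dimension combination identifier and the target time range as parameters to the computation logic, so as to form a computation statement for each dimension combination, the computation statement being used to compute the measure information under that dimension combination within the target time range;
aggregating the data in the preset detail table with the computation statement of each dimension combination, to obtain a pre-aggregation table of the measure information under each dimension combination.
The computation logic is a pre-written SQL statement containing variables. The dimension combination, the dimension combination identifier and the target time range are substituted as parameters into the SQL statement to form the target computation statement, which computes the measure information under the dimension combination within the time range. For example, the dimension combination region + store, its identifier 12 and the target time range June 1, 2020 to June 30, 2020 are substituted as parameters into the SQL statement. The resulting target computation statement aggregates the data in the preset detail table and computes the number of repeat buyers and the repurchase rate under the region + store combination from June 1, 2020 to June 30, 2020.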
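Because the computation logic uses ${name} placeholders, Python's standard `string.Template` can perform the substitution directly. The template below is a shortened, hypothetical stand-in for the model's real SQL. `build_statement` and the table name `detail` are assumptions for illustration.

```python
from string import Template

# Shortened, hypothetical computation logic; ${...} marks the variables.
COMPUTATION_LOGIC = Template(
    "select ${cuboid} as cuboid, ${groupByColumns}, "
    "count(distinct member_id) as buyers "
    "from ${parquetTable} where ${timeRange} group by ${groupByColumns}"
)


def build_statement(columns, cuboid, time_range, table="detail"):
    """Fill the template for one dimension combination."""
    return COMPUTATION_LOGIC.substitute(
        groupByColumns=", ".join(columns),
        cuboid=cuboid,
        parquetTable=table,
        timeRange=time_range,
    )


sql = build_statement(
    ["region", "store"], "12",
    "__time>='2020-06-01 00:00:00' and __time<'2020-07-01 00:00:00'",
)
```

`substitute` raises `KeyError` if a placeholder is left unfilled, which catches template mistakes early. The values are spliced in as plain text, so they should come from the model definition rather than from user input.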
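The measures in the measure information have no corresponding physical field in the detail table.
The method aims to accelerate user queries on non-additive metrics. Because the measure of a non-additive metric has no corresponding physical field in the detail table, the metric cannot be obtained by directly aggregating the detail data. Instead, the computation logic is designed in advance, and computation statements aggregate the detail data into pre-aggregation tables.
The method further comprises:
monitoring the query requests sent by users, collecting the dimension combinations whose number of queries exceeds a preset count, and pre-aggregating the measure information under those dimension combinations to generate pre-aggregation tables.
The query requests sent by users are monitored, and frequently queried dimension combinations are pre-aggregated into pre-aggregation tables, which speeds up the user's subsequent queries.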
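A possible shape for the monitoring step is sketched below. It assumes each query's analysis dimensions are recorded as an order-insensitive set. `QueryMonitor` and its threshold semantics ("more than `threshold` queries") are illustrative assumptions. The source does not say how counts are windowed or reset.

```python
from collections import Counter


class QueryMonitor:
    """Counts queried dimension combinations and reports frequently queried ones."""

    def __init__(self, threshold: int, already_preaggregated=()):
        self.threshold = threshold
        self.counts = Counter()
        # Combinations that already have a pre-aggregation table.
        self.preaggregated = {frozenset(c) for c in already_preaggregated}

    def record(self, analysis_dimensions) -> None:
        """Record one query's analysis dimension combination."""
        self.counts[frozenset(analysis_dimensions)] += 1

    def combinations_to_preaggregate(self):
        """Combinations queried more than `threshold` times that have no table yet."""
        return [combo for combo, n in self.counts.items()
                if n > self.threshold and combo not in self.preaggregated]
```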
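Before determining whether the analysis dimension combination is identical to any dimension combination computed in the preset logical model, the method further comprises:
determining whether the query request contains a filter condition; if so, extracting the filter dimension from the filter condition and determining whether the filter dimension is contained in the analysis dimension combination; and if so, determining whether the analysis dimension combination is identical to any dimension combination computed in the preset logical model.
A query request may contain a filter condition. For example, if the user only wants the monthly number of repeat buyers of the stores in region A, the analysis dimension combination of the query request is region, store and time granularity, the filter condition is region A, and the filter dimension is region. The filter dimension is contained in the analysis dimension combination, so the next step is to determine whether the analysis dimension combination is identical to any dimension combination computed in the preset logical model. If instead the user wants the monthly number of repeat buyers of a certain product category in the stores of region A, the analysis dimension combination is still region, store and time granularity, but the filter condition is that category and the filter dimension is category, which is not part of the analysis dimension combination. In that case there is no need to match the analysis dimension combination against the preset logical model. Even if a pre-aggregation table with the same dimension combination were found, it would have no category dimension either and could not answer the category-level query, so the result must still be aggregated from the detail table.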
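The filter check in the example above can be sketched as follows. Following the later worked example, a filter on the raw time column is treated as covered when the query groups by a time granularity. The dimension names `__time` and `time_granularity` and the function name are hypothetical. Returning None means the query falls back to the detail table.

```python
def can_use_preaggregation(analysis_dims, filter_dims, model_combinations):
    """Return the matching model combination, or None to fall back to the detail table."""
    analysis = set(analysis_dims)
    # Assumption: a filter on the raw time column (__time) counts as covered
    # when the query groups by a time granularity.
    covered = analysis | ({"__time"} if "time_granularity" in analysis else set())
    if not set(filter_dims) <= covered:
        return None  # e.g. a filter on category, which is not an analysis dimension
    for combo in model_combinations:
        if set(combo) == analysis:
            return tuple(combo)
    return None
```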
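The above method can be used specifically for querying repurchase metrics. First, the tables in the data warehouse are widened (denormalized) into a detail table, and a repurchase model is built that contains dimensions, measure information, a model time period and computation logic, where:
the dimensions include time (__time), store, region, category, detailed channel, business unit, marketing campaign, member ID, and time granularity (which can be set to month, quarter, half-year, year, etc.);
the measures include the number of buyers (a computed measure: the number of people who bought goods within the specified time), the number of repeat buyers (a computed measure: the number of people who made purchases on two or more days) and the repurchase rate (a derived metric: number of repeat buyers / number of buyers);
the model time period can be set to 1 year, 1 month or 1 week;
the computation logic is an SQL statement with variables. The SQL statement of the repurchase model is as follows (the column aliases 购买人数 and 复购人数 denote the number of buyers and the number of repeat buyers):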
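with preAgg as (
  -- one row per member per dimension combination, with the first and last purchase time
  select ${groupByColumns}, member_id,
         max(__time) max_statis_date,
         min(__time) min_statis_date
  from ${parquetTable} tt2
  where ${timeRange}
  group by ${groupByColumns}, member_id)
-- 购买人数: buyers; 复购人数: members whose first and last purchase differ
-- (assumes __time is recorded at day granularity, i.e. two or more purchase days)
select ${cuboid}, ${groupByColumns},
       count(distinct member_id) as 购买人数,
       count(case when max_statis_date != min_statis_date then member_id else null end) as 复购人数
from preAgg
-- the cuboid identifier is a constant, so it is not repeated in GROUP BY
-- (some engines read an integer literal there as a column position)
group by ${groupByColumns}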
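Items beginning with $ are variables, which are replaced with actual values at execution time, where:
${groupByColumns} is the dimension combination used in this execution of the SQL statement;
${parquetTable} is the detail table obtained by widening the model's data-warehouse tables;
${timeRange} is the period time range corresponding to the current day. For example, if the model time period is a week and the current computation date is 2018-11-30, then timeRange is __time>='2018-11-26 00:00:00' and __time<'2018-12-03 00:00:00' (Monday through Sunday);
${cuboid} is the dimension combination identifier, which designates a particular dimension combination; once groupByColumns is specified, the cuboid value can be computed from it.
The number of buyers and the number of repeat buyers are computed measures of the model; their actual names follow the corresponding computed measure names in the model definition.
Other filter conditions can also be added to this SQL statement as needed.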
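To check the repurchase template end to end, the sketch below substitutes its variables with `string.Template` and runs it against a toy detail table in an in-memory SQLite database. It makes three assumptions. The Chinese aliases are replaced by `buyers` and `repeat_buyers`. `__time` holds day-level ISO date strings, so different maximum and minimum times mean two or more purchase days. The cuboid is left out of GROUP BY because it is a constant.

```python
import sqlite3
from string import Template

# The repurchase-model template, with English aliases for readability.
REPURCHASE_SQL = Template("""
with preAgg as (
  select ${groupByColumns}, member_id,
         max(__time) max_statis_date, min(__time) min_statis_date
  from ${parquetTable} tt2
  where ${timeRange}
  group by ${groupByColumns}, member_id)
select ${cuboid} as cuboid, ${groupByColumns},
       count(distinct member_id) as buyers,
       count(case when max_statis_date != min_statis_date then member_id else null end) as repeat_buyers
from preAgg
group by ${groupByColumns}
""")


def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table detail (__time text, region text, store text, member_id text)")
    conn.executemany("insert into detail values (?, ?, ?, ?)", [
        ("2020-06-02", "A", "s1", "m1"),
        ("2020-06-15", "A", "s1", "m1"),  # m1 buys on two days: a repeat buyer
        ("2020-06-03", "A", "s1", "m2"),
        ("2020-06-03", "A", "s2", "m3"),
        ("2020-05-20", "A", "s2", "m3"),  # outside the June range, filtered out
    ])
    sql = REPURCHASE_SQL.substitute(
        groupByColumns="region, store", cuboid=12, parquetTable="detail",
        timeRange="__time >= '2020-06-01' and __time < '2020-07-01'")
    return conn.execute(sql + " order by store").fetchall()


rows = run_demo()  # one row per (region, store): (cuboid, region, store, buyers, repeat_buyers)
```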
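After the model is built, the target time range is first determined from the model time period and the current computation time;
the dimensions in the dimension information are freely combined to compute all dimension combinations;
the dimension combination identifier of each combination is computed according to a given identification rule;
the target time range, each dimension combination and its identifier are added as parameters to the SQL statement to form the target computation statements;
the target computation statements aggregate the detail table to produce pre-aggregation tables of the number of repeat buyers, the number of buyers and the repurchase rate under all dimension combinations.
For the repurchase model above, to obtain the monthly number of repeat buyers and repurchase rate aggregated by region and store over a certain period, the query SQL can be written as follows (大区 is region, 门店 is store, 购买人数 and 复购人数 are the numbers of buyers and repeat buyers, and 复购率 is the repurchase rate):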
SELECT month(__time), 大区, 门店, sum(购买人数), sum(复购人数), sum(复购人数)/sum(购买人数) AS 复购率 FROM model WHERE 大区 = ? AND __time >= ? AND __time <= ? GROUP BY month(__time), 大区, 门店
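As the SQL statement above shows, the aggregate functions are very simple, essentially the same as in an ordinary metric query. The analysis dimensions are time granularity, region and store. The filter conditions, i.e., the content after WHERE, are a particular region and a particular time period, so the corresponding filter dimensions are region and time. Here the filter dimensions include time while the analysis dimensions include time granularity, and this is also treated as the filter dimensions being contained in the analysis dimensions. Filter dimensions are set as required and may also be omitted.
After the query request is received, the analysis dimension combination extracted from it is time, region and store;
the logical model has computed a pre-aggregation table of the number of repeat buyers, the number of buyers and the repurchase rate under the time + region + store combination;
the monthly number of repeat buyers and repurchase rate of the stores in the given region within the given period, as requested by the query, are obtained from this pre-aggregation table and returned to the user.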
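The query against the pre-aggregated model can be exercised the same way. SQLite has no `month()` function, so the sketch substitutes `strftime('%Y-%m', __time)`. It also multiplies by 1.0, because SQLite's `/` truncates when both operands are integers. The table layout (one row per month, region and store) and the English column names are assumptions.

```python
import sqlite3


def monthly_repurchase(rows, region, start, end):
    """Run the month/region/store query against a toy pre-aggregated table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table model (__time text, region text, store text, "
                 "buyers integer, repeat_buyers integer)")
    conn.executemany("insert into model values (?, ?, ?, ?, ?)", rows)
    sql = """
        select strftime('%Y-%m', __time) as month, region, store,
               sum(buyers), sum(repeat_buyers),
               1.0 * sum(repeat_buyers) / sum(buyers) as repurchase_rate
        from model
        where region = ? and __time >= ? and __time <= ?
        group by strftime('%Y-%m', __time), region, store
        order by month, store
    """
    return conn.execute(sql, (region, start, end)).fetchall()
```

Summing is safe here only because the grouping matches the pre-aggregated combination, so each sum covers a single pre-aggregated row. Summing repeat buyers across a coarser grouping, such as all stores in a region, would reintroduce the non-additivity problem described in the background.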
 
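Embodiment 2
Corresponding to the above method, as shown in FIG. 2, Embodiment 2 of the present application provides a multi-dimensional data analysis device, the device comprising:
a receiving unit 21, configured to receive a query request from a user and extract the analysis dimension combination from the query request;
The user's query request contains an analysis dimension combination. For example, if the request asks for the monthly number of repeat buyers and repurchase rate, aggregated by region and store, over a certain period, then the analysis dimension combination is region, store and time granularity, and the number of repeat buyers and the repurchase rate are the measure information.
a judging unit 22, configured to determine whether the analysis dimension combination is identical to any dimension combination computed in a preset logical model, and if so, to determine the dimension combination identical to the analysis dimension combination as the target dimension combination;
The preset logical model contains several dimension combinations, such as region + store + time, region + store, region + time, and store + time. The analysis dimension combination is compared with each of them. For example, if the analysis dimension combination is region, store and time, and the preset logical model also contains the region + store + time combination, the two are identical, and the region + store + time combination is determined as the target dimension combination.
a pre-aggregation table determining unit 23, configured to determine, according to a preset correspondence between dimension combinations and pre-aggregation tables, the target pre-aggregation table corresponding to the target dimension combination among the pre-aggregation tables pre-computed in the preset logical model, the pre-aggregation tables being summary tables of measure information under different dimension combinations, aggregated by the preset logical model from a preset detail table;
In the logical model, each dimension combination corresponds to one pre-aggregation table. Once the target dimension combination is determined, the corresponding target pre-aggregation table is found among all pre-aggregation tables. A pre-aggregation table is the summary of measure information under a dimension combination, computed by the logical model from the detail table according to preset computation logic. For example, if the target dimension combination is region + store + time, the target pre-aggregation table is the summary of the number of repeat buyers and the repurchase rate under that combination.
a returning unit 24, configured to obtain, from the target pre-aggregation table, the data requested by the query request and return it to the user.
For example, if the query asks for the monthly number of repeat buyers aggregated by region and store over a certain period, the data can be obtained directly from the pre-aggregation table of the region + store + time granularity combination and returned to the user.
a statement rewriting unit, configured to, if it is determined that the analysis dimension combination is not identical to any dimension combination computed in the preset logical model, rewrite the query request according to preset rules into a query statement applicable to the preset detail table, and use the query statement to obtain, from the preset detail table, the data requested by the query request.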
 
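Embodiment 3
Corresponding to the above method and device, Embodiment 3 of the present application provides a computer system, comprising:
one or more processors; and
a memory associated with the one or more processors, the memory storing program instructions that, when read and executed by the one or more processors, perform the method steps of Embodiment 1, for example the following operations:
receiving a query request from a user, and extracting the analysis dimension combination from the query request;
determining whether the analysis dimension combination is identical to any dimension combination computed in a preset logical model, and if so, determining the dimension combination identical to the analysis dimension combination as the target dimension combination;
determining, according to a preset correspondence between dimension combinations and pre-aggregation tables, the target pre-aggregation table corresponding to the target dimension combination among the pre-aggregation tables pre-computed in the preset logical model, the pre-aggregation tables being summary tables of measure information under different dimension combinations, aggregated by the preset logical model from a preset detail table;
obtaining, from the target pre-aggregation table, the data requested by the query request and returning it to the user.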
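FIG. 3 illustrates the architecture of the computer system, which may specifically include a processor 1510, a video display adapter 1511, a disk drive 1512, an input/output interface 1513, a network interface 1514 and a memory 1520. The processor 1510, video display adapter 1511, disk drive 1512, input/output interface 1513 and network interface 1514 can communicate with the memory 1520 via a communication bus 1530.
The processor 1510 can be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an ASIC (Application Specific Integrated Circuit), or one or more integrated circuits, and executes the relevant programs to implement the technical solutions provided by the present application.
The memory 1520 can be implemented as ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, etc. The memory 1520 can store an operating system 1521 for controlling the operation of the computer system 1500 and a basic input/output system (BIOS) for controlling its low-level operation. It can also store a web browser 1523, a data storage management system 1524, an icon font processing system 1525, and so on. The icon font processing system 1525 may be the application program that implements the foregoing steps in the embodiments of the present application. In short, when the technical solutions provided by the present application are implemented in software or firmware, the relevant program code is stored in the memory 1520 and invoked and executed by the processor 1510.
The input/output interface 1513 is used to connect input/output modules for information input and output. The input/output modules may be configured as components within the device (not shown) or connected externally to provide the corresponding functions. Input devices may include a keyboard, mouse, touch screen, microphone and various sensors, and output devices may include a display, speaker, vibrator and indicator lights.
The network interface 1514 is used to connect a communication module (not shown) to enable communication between this device and other devices. The communication module can communicate by wired means (e.g., USB or network cable) or by wireless means (e.g., mobile network, Wi-Fi or Bluetooth).
The bus 1530 includes a path that transfers information between the components of the device (e.g., the processor 1510, video display adapter 1511, disk drive 1512, input/output interface 1513, network interface 1514 and memory 1520).
In addition, the computer system 1500 can also obtain information on specific collection conditions from a virtual resource object collection condition information database 1541 for use in condition judgment, and so on.
It should be noted that although the above device shows only the processor 1510, video display adapter 1511, disk drive 1512, input/output interface 1513, network interface 1514, memory 1520, bus 1530, etc., in a specific implementation the device may also include other components necessary for normal operation. Furthermore, those skilled in the art will understand that the device may also include only the components necessary to implement the solution of the present application, rather than all the components shown in the drawings.
From the description of the above embodiments, those skilled in the art can clearly understand that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the essence of the technical solution of the present application, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium such as ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a cloud server, a network device, etc.) to execute the methods described in the embodiments, or in certain parts of the embodiments, of the present application.
The embodiments in this specification are described progressively. For the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are basically similar to the method embodiment and are therefore described relatively briefly; for relevant details, refer to the description of the method embodiment. The system embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.
The multi-dimensional data analysis method, device and system provided by the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present application; the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, a person of ordinary skill in the art may make changes to the specific implementations and the scope of application based on the idea of the present application. In summary, the contents of this specification should not be construed as limiting the present application.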

Claims (10)

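  1. A multi-dimensional data analysis method, characterized in that the method comprises:
    receiving a query request from a user, and extracting the analysis dimension combination from the query request;
    determining whether the analysis dimension combination is identical to any dimension combination computed in a preset logical model, and if so, determining the dimension combination identical to the analysis dimension combination as the target dimension combination;
    determining, according to a preset correspondence between dimension combinations and pre-aggregation tables, the target pre-aggregation table corresponding to the target dimension combination among the pre-aggregation tables pre-computed in the preset logical model, the pre-aggregation tables being summary tables of measure information under different dimension combinations, aggregated by the preset logical model from a preset detail table;
    obtaining, from the target pre-aggregation table, the data requested by the query request and returning it to the user.
  2. The multi-dimensional data analysis method according to claim 1, characterized in that the method further comprises:
    if it is determined that the analysis dimension combination is not identical to any dimension combination computed in the preset logical model, rewriting the query request according to preset rules into a query statement applicable to the preset detail table, and using the query statement to obtain, from the preset detail table, the data requested by the query request.
  3. The multi-dimensional data analysis method according to claim 1, characterized in that the preset logical model includes dimension information, measure information, a model time period and computation logic, and the method further comprises a step of computing the dimension combinations in the preset logical model:
    computing all dimension combinations from the dimensions in the dimension information;
    computing, according to a preset identification rule, the dimension combination identifier corresponding to each of the dimension combinations.
  4. The multi-dimensional data analysis method according to claim 3, characterized in that the method further comprises a step of pre-computing the pre-aggregation tables in the preset logical model:
    computing, from the current time and the model time period, the target time range corresponding to the current time;
    adding each dimension combination, its corresponding dimension combination identifier and the target time range as parameters to the computation logic, so as to form a computation statement for each dimension combination, the computation statement being used to compute the measure information under that dimension combination within the target time range;
    aggregating the data in the preset detail table with the computation statement of each dimension combination, to obtain a pre-aggregation table of the measure information under each dimension combination.
  5. The multi-dimensional data analysis method according to claim 3 or 4, characterized in that:
    the measures in the measure information have no corresponding physical field in the detail table.
  6. The multi-dimensional data analysis method according to claim 1, characterized in that the method further comprises:
    monitoring the query requests sent by users, collecting the dimension combinations whose number of queries exceeds a preset count, and pre-aggregating the measure information under those dimension combinations to generate pre-aggregation tables.
  7. The multi-dimensional data analysis method according to claim 1, characterized in that, before determining whether the analysis dimension combination is identical to any dimension combination computed in the preset logical model, the method further comprises:
    determining whether the query request contains a filter condition; if so, extracting the filter dimension from the filter condition and determining whether the filter dimension is contained in the analysis dimension combination; and if so, determining whether the analysis dimension combination is identical to any dimension combination computed in the preset logical model.
  8. A multi-dimensional data analysis device, characterized in that the device comprises:
    a receiving unit, configured to receive a query request from a user and extract the analysis dimension combination from the query request;
    a judging unit, configured to determine whether the analysis dimension combination is identical to any dimension combination computed in a preset logical model, and if so, to determine the dimension combination identical to the analysis dimension combination as the target dimension combination;
    a pre-aggregation table determining unit, configured to determine, according to a preset correspondence between dimension combinations and pre-aggregation tables, the target pre-aggregation table corresponding to the target dimension combination among the pre-aggregation tables pre-computed in the preset logical model, the pre-aggregation tables being summary tables of measure information under different dimension combinations, aggregated by the preset logical model from a preset detail table;
    a returning unit, configured to obtain, from the target pre-aggregation table, the data requested by the query request and return it to the user.
  9. The multi-dimensional data analysis device according to claim 8, characterized in that the device further comprises:
    a statement rewriting unit, configured to, if it is determined that the analysis dimension combination is not identical to any dimension combination computed in the preset logical model, rewrite the query request according to preset rules into a query statement applicable to the preset detail table, and use the query statement to obtain, from the preset detail table, the data requested by the query request.
  10. A computer system, characterized in that the system comprises:
    one or more processors; and
    a memory associated with the one or more processors, the memory storing program instructions that, when read and executed by the one or more processors, perform the method according to any one of claims 1-8.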
PCT/CN2021/099664 2020-07-08 2021-06-11 数据多维分析方法、装置及系统 WO2022007592A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010653271.2 2020-07-08
CN202010653271.2A CN112000747B (zh) 2020-07-08 2020-07-08 数据多维分析方法、装置及系统

Publications (1)

Publication Number Publication Date
WO2022007592A1 true WO2022007592A1 (zh) 2022-01-13

Family

ID=73466760

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/099664 WO2022007592A1 (zh) 2020-07-08 2021-06-11 数据多维分析方法、装置及系统

Country Status (2)

Country Link
CN (1) CN112000747B (zh)
WO (1) WO2022007592A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116383299A (zh) * 2023-03-31 2023-07-04 国任财产保险股份有限公司 一种基于分布式数据库的数据展示系统
CN116610715A (zh) * 2023-07-18 2023-08-18 国网浙江省电力有限公司宁波供电公司 一种用于多级存储数据的多维分析方法及系统

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112000747B (zh) * 2020-07-08 2022-11-18 苏宁云计算有限公司 数据多维分析方法、装置及系统
CN112540972A (zh) * 2020-12-16 2021-03-23 中盈优创资讯科技有限公司 一种基于RoaringBitmap海量用户高效圈选方法及装置
CN113486066B (zh) * 2021-07-15 2023-03-24 福建博思软件股份有限公司 一种报表分级汇总的方法及终端
CN115392799B (zh) * 2022-10-27 2023-04-11 平安科技(深圳)有限公司 归因分析方法、装置、计算机设备及存储介质
CN115455010B (zh) * 2022-11-09 2023-02-28 以萨技术股份有限公司 一种基于milvus数据库的数据处理方法、电子设备及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140156588A1 (en) * 2012-11-30 2014-06-05 Symantec Corporation Systems and methods for performing customized large-scale data analytics
TW201423616A (zh) * 2012-12-10 2014-06-16 Chunghwa Telecom Co Ltd 以對應化約架構處理多維度預先彙總之方法
CN105224534A (zh) * 2014-05-29 2016-01-06 腾讯科技(深圳)有限公司 一种请求响应的方法及装置
CN106528787A (zh) * 2016-11-09 2017-03-22 合网络技术(北京)有限公司 一种基于海量数据多维分析的查询方法及装置
CN109241159A (zh) * 2018-08-07 2019-01-18 威富通科技有限公司 一种数据立方体的分区查询方法、系统及终端设备
CN110276059A (zh) * 2019-06-24 2019-09-24 银联商务股份有限公司 一种动态报表的处理方法和装置
CN112000747A (zh) * 2020-07-08 2020-11-27 苏宁云计算有限公司 数据多维分析方法、装置及系统

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116383299A (zh) * 2023-03-31 2023-07-04 国任财产保险股份有限公司 一种基于分布式数据库的数据展示系统
CN116610715A (zh) * 2023-07-18 2023-08-18 国网浙江省电力有限公司宁波供电公司 一种用于多级存储数据的多维分析方法及系统
CN116610715B (zh) * 2023-07-18 2023-11-28 国网浙江省电力有限公司宁波供电公司 一种用于多级存储数据的多维分析方法及系统

Also Published As

Publication number Publication date
CN112000747B (zh) 2022-11-18
CN112000747A (zh) 2020-11-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21837658

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21837658

Country of ref document: EP

Kind code of ref document: A1