CN115827685A - Optimization processing method and device suitable for big data index and storage medium - Google Patents


Publication number
CN115827685A
Authority
CN
China
Prior art keywords
index
data
query
metadata
optimized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211510338.2A
Other languages
Chinese (zh)
Inventor
伍攀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shushi Yunchuang Technology Co ltd
Original Assignee
Beijing Shushi Yunchuang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shushi Yunchuang Technology Co ltd
Priority to CN202211510338.2A
Publication of CN115827685A

Abstract

The invention discloses an optimization processing method, an optimization processing device and a storage medium suitable for big data indexes. The optimization processing method comprises the following steps: selecting a corresponding preset optimization strategy to optimize the initial data of the index calculation elements, obtaining optimized data of the index calculation elements; and analyzing the query index calculation elements in an index query request, extracting the initial data and/or optimized data corresponding to the query index calculation elements, and constructing an index query execution plan. In query calculation for big data indexes, once the index calculation elements are determined, the method can analyze the queried index in advance and optimize the calculation link by applying several optimization strategies from a strategy library (such as a calculation optimization strategy, a cache strategy, a pre-calculation strategy and materialized views); when an index query is received, the optimized results can be used to accelerate the query, thereby improving query efficiency.

Description

Optimization processing method and device suitable for big data index and storage medium
Technical Field
The invention relates to the technical field of data analysis, in particular to an optimization processing method and device suitable for big data indexes and a storage medium.
Background
Driven by the national big data strategy and by enterprise digital transformation, indexes play an increasingly important role: they allow decisions to be made faster and more reliably, so that risks can be avoided, market trends grasped, market opportunities seized and business targets reached quickly, staying one step ahead.
In order to ensure the rapid and correct output of the index, various index calculation engines/schemes are available:
The first is the pre-computed (solidified) multidimensional analysis engine, represented by Apache Kylin. All dimension and index combinations are calculated in advance according to the configured model, trading space for time. Because the possible dimension combinations are calculated in advance, almost no joins or aggregations are needed at query time, and queries are extremely fast. The disadvantages of this solution are:
1. only aggregated data can be queried, not detailed data;
2. dimension combinations need to be planned in advance, otherwise a large number of useless combinations are generated and resources are wasted;
3. flexibility is poor: extremely fast queries are supported for fixed analysis conditions, but exploratory queries perform poorly or are not supported;
4. real-time data is poorly supported, because data can be queried only after it has been calculated in advance;
5. the learning cost is high: to use Apache Kylin, the user needs to understand its various special concepts in order to tune a suitable dimension combination;
6. the maintenance cost is high: Apache Kylin depends on other engines such as Apache Spark for pre-calculation, and data storage also depends on external components such as HBase.
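Disadvantage 2 above can be illustrated with a short sketch. It only enumerates the non-empty subsets of a set of dimensions to show how quickly the number of combinations grows; the dimension names are hypothetical, and the code does not reproduce how any particular engine actually plans or prunes its pre-calculated combinations.

```python
from itertools import combinations


def dimension_combinations(dimensions):
    """Return every non-empty subset of the given dimensions."""
    combos = []
    for size in range(1, len(dimensions) + 1):
        combos.extend(combinations(dimensions, size))
    return combos


# Hypothetical dimensions of a model; n dimensions give 2**n - 1 non-empty subsets.
dims = ["country", "province", "city", "product"]
combos = dimension_combinations(dims)
print(len(combos))  # 15, i.e. 2**4 - 1
```

Each additional dimension roughly doubles the number of combinations, which is why unplanned pre-calculation can waste resources.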
The second is the real-time analysis engine, represented by Apache Doris and StarRocks. Data acquisition efficiency is optimized in two directions: data storage and the query optimizer. The core optimization idea in data storage is to read less data, and includes partitioned and bucketed storage, columnar storage, per-column compression, indexing, metadata (data distribution statistics), materialized views, vectorized processing and the like. The core optimization idea in the query optimizer is to find the optimal distributed physical execution plan, and includes reusing common expressions, subquery rewriting, join reordering and distributed join execution strategies. Data is calculated in a distributed manner, making full use of multiple resources, to obtain extremely fast queries.
The disadvantages of this solution are:
1. data query efficiency is affected by machine resources and data volume;
2. the storage mode and distribution of the data are fixed when the table is created, leaving few means of subsequent optimization;
3. generality is good, but query efficiency still cannot be controlled even when the application scenario is known.
In view of this, the present invention is proposed.
Disclosure of Invention
In order to solve the above technical problems, the present invention provides an optimization processing method, apparatus and storage medium suitable for big data indexes, and specifically, the following technical solutions are adopted:
an optimization processing method suitable for big data indexes comprises the following steps:
selecting a corresponding preset optimization strategy for optimizing initial data of the index calculation element to obtain optimized data of the index calculation element;
analyzing the query index calculation elements in the index query request, extracting initial data and/or optimized data corresponding to the query index calculation elements, and constructing an index query execution plan.
As an optional implementation manner of the present invention, in the optimization processing method applicable to the big data index, selecting a corresponding preset optimization strategy for optimizing initial data of the index calculation element to obtain optimized data of the index calculation element includes:
the initial data of the index calculation element comprises indexes, models and metadata of a data table, which are set by a user and are obtained from a service system;
analyzing the calculation intention of the index set by the user and the current data structure, selecting a corresponding preset optimization strategy, and optimizing the storage structure and the calculation structure of the metadata;
and generating a production plan for producing the optimized data according to the computing structure of the metadata and the computing structure after the optimization strategy is applied and in combination with the data production window.
As an optional embodiment of the present invention, in the method for optimizing a big data index according to the present invention, the manner of obtaining the index, the model, and the metadata of the data table set by the user from the business system includes:
increment trigger type acquisition, namely acquiring increment metadata from a service system when the increment part of the metadata in the service system reaches a preset threshold value;
or periodically acquiring the full amount of metadata, periodically acquiring the full amount of metadata from the business system according to a preset time interval, and deleting the original full amount of metadata after acquiring new full amount of metadata.
As an optional embodiment of the present invention, in the optimization processing method applicable to big data indexes, metadata of indexes, models, and data tables set by a user are obtained from a business system and stored in a basic repository;
the analyzing the calculation intention of the user setting index and the current data structure, selecting a corresponding preset optimization strategy, and optimizing the storage structure and the calculation structure of the metadata comprises the following steps:
reading all metadata in the basic repository, loading the selected preset optimization strategy, producing a new optimized data structure, and writing the optimized data structure into the optimized repository.
As an optional embodiment of the present invention, in the method for optimizing a big data index according to the present invention, the generating a production plan for producing optimized data according to a computing structure of metadata and a computing structure optimized by applying an optimization policy, in combination with a data production window, includes:
generating the data synchronization calculation logic according to the data structures before and after optimization and the calculation logic of the optimized data structure;
generating a data synchronization task in combination with the data calculation window information;
registering the data synchronization task on a task scheduling platform;
and the task scheduling platform executes the task of data synchronization according to the configuration and synchronizes the data from the original table to the optimized table.
As an optional implementation manner of the present invention, in the method for optimizing a big data index according to the present invention, the analyzing query index calculation elements in the index query request, extracting initial data and/or optimized data corresponding to the query index calculation elements, and constructing the index query execution plan includes:
receiving a request of index query, wherein the content of the request comprises indexes, dimensions, index calculation logic and a query range;
analyzing the query index calculation element in the index query request, querying metadata information according to the query index element, and querying optimized data information at the same time;
if the metadata has been optimized, extracting the optimized data structure, otherwise extracting the metadata structure;
and constructing an index query execution plan according to the extracted optimized data structure and/or metadata structure.
As an optional embodiment of the present invention, in the optimization processing method applied to the big data index, the constructing an index query execution plan includes:
constructing an execution plan of the atomic index as the inner-layer SQL according to the extracted data structure;
constructing the filter condition of the outer-layer SQL according to the query range in the index query request;
constructing the projection of the outer-layer SQL according to the index calculation logic in the index query request;
and assembling the inner-layer SQL and the outer-layer SQL, constructing and optimizing the complete SQL, and returning the SQL of the index query execution plan.
As an optional embodiment of the present invention, in the optimization processing method applicable to the big data index, the index calculation element includes a first index calculation element set in an index definition stage and a second index calculation element set by a query condition in an index query stage;
the first index calculation element comprises a service time column, and/or a measurement column, and/or a dimension column, and/or an aggregation function, and/or an aggregation window size, and/or a window sliding step length;
the second index calculation element includes a query range.
The invention also provides an optimization processing device suitable for big data indexes, which comprises:
the strategy optimization module selects a corresponding preset optimization strategy for optimizing the initial data of the index calculation element to obtain the optimized data of the index calculation element;
and the query optimization module is used for analyzing the query index calculation elements in the index query request, extracting initial data and/or optimized data corresponding to the query index calculation elements and constructing an index query execution plan.
The invention also provides a storage medium which stores a computer executable program, and when the computer executable program is executed, the optimization processing method suitable for the big data index is realized.
Compared with the prior art, the invention has the following beneficial effects:
in the method for optimizing the big data index, when index calculation elements are determined in query calculation aiming at the big data index, the query index can be analyzed in advance, and a calculation link is optimized by applying a plurality of optimization strategies (such as a calculation optimization strategy, a cache strategy, a pre-calculation strategy, a materialized view and the like) in a strategy library; when index query is received, the optimized result can be used for accelerating query, and then query efficiency is improved.
Description of the drawings:
FIG. 1 is a flowchart of an optimization processing method for big data indicators according to an embodiment of the present invention;
FIG. 2 is a flowchart of obtaining metadata in an optimization processing method for big data indicators according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a specific implementation example of obtaining metadata in an optimization processing method for big data indicators according to an embodiment of the present invention;
FIG. 4 is a flowchart of data structure optimization in an optimization processing method for big data indicators according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating an exemplary implementation of data structure optimization in an optimization processing method for big data indicators according to an embodiment of the present invention;
FIG. 6 is a first example of data structure optimization in an optimization processing method applicable to big data indexes according to an embodiment of the present invention;
FIG. 7 is a second example of data structure optimization in an optimization processing method applicable to big data indexes according to an embodiment of the present invention;
FIG. 8 is a flowchart of a data production plan in an optimization processing method for big data indicators according to an embodiment of the present invention;
FIG. 9 is a diagram illustrating an exemplary implementation of a data production plan in an optimization processing method for big data indicators according to an embodiment of the present invention;
FIG. 10 is a flowchart of data query optimization in an optimization processing method for big data indicators according to an embodiment of the present invention;
FIG. 11 is a flowchart of index query in an optimization processing method for big data indexes according to an embodiment of the present invention;
FIG. 12 is an example of association between an index and a model in an optimization processing method suitable for big data indexes according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments.
Thus, the following detailed description of the embodiments of the invention is not intended to limit the scope of the invention as claimed, but is merely representative of some embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments of the present invention and the features and technical solutions thereof may be combined with each other without conflict.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, it should be noted that the terms "upper", "lower", and the like refer to orientations or positional relationships based on those shown in the drawings, or orientations or positional relationships that are conventionally arranged when the products of the present invention are used, or orientations or positional relationships that are conventionally understood by those skilled in the art, and such terms are used for convenience of description and simplification of the description, and do not refer to or imply that the devices or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like are used merely to distinguish one description from another, and are not to be construed as indicating or implying relative importance.
The technical terms related to the optimization processing method, device and storage medium for big data indexes in the embodiment are explained as follows:
Index (indicator)
An index is a defined value: a way of quantifying and abstracting facts in order to measure a target.
Atomic index
An atomic index consists of an indivisible event plus a measure.
Derived index
A derived index is an atomic index restricted to a particular business scope.
For example, index name: first-order amount of new users in the last 7 days. Business meaning: the amount (unit: yuan) of the earliest order placed and paid within the last 7 days by users who registered within the last month.
Atomic index: order amount.
Time period modifier: the last 7 days (i.e., statistical day N-7).
Other modifier 1: first order with payment completed (first order: the earliest order the user placed and paid for).
Other modifier 2: newly registered user (user registration time within the last month).
Derived index: first-order amount of new users in the last 7 days = last 7 days + first order with payment completed + newly registered user + order amount.
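The relationship between an atomic index and a derived index can be sketched as a small data structure. This is only an illustration of the composition described in the example above; the class names, field names and the `describe` helper are hypothetical and are not defined by this embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicIndex:
    # Hypothetical representation: an indivisible event plus a measure.
    name: str
    measure_column: str
    aggregate: str


@dataclass(frozen=True)
class DerivedIndex:
    # Hypothetical representation: an atomic index plus limiting modifiers.
    name: str
    atomic: AtomicIndex
    time_period_days: int
    modifiers: tuple = ()

    def describe(self):
        """Spell out the composition: time period + modifiers + atomic index."""
        parts = [f"last {self.time_period_days} days", *self.modifiers, self.atomic.name]
        return " + ".join(parts)


order_amount = AtomicIndex(name="order amount", measure_column="amount", aggregate="SUM")
first_order_amount = DerivedIndex(
    name="first-order amount of new users in the last 7 days",
    atomic=order_amount,
    time_period_days=7,
    modifiers=("first order with payment completed", "newly registered user"),
)
print(first_order_amount.describe())
```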
Service time column
The time at which a business event occurs, such as the order time or the browsing time.
Dimension
A dimension is the context of a measurement and reflects one class of attributes of the business; the set of such attributes constitutes a dimension, which may also be called an entity object. Dimensions belong to a data domain, such as a geographic dimension (with levels such as country, region, province and city) or a time dimension (with levels such as year, quarter, month, week and day).
Hierarchy dimension
A hierarchy dimension is a set of dimensions with a hierarchical relationship among them; in a family, for example, grandfather, father, and brother and sister form a hierarchy.
Joint dimension
A joint dimension binds multiple dimensions together; once built, these dimensions are always queried together as a combination.
Fact table
A fact table records the numerical measures of a particular event and typically consists of data values and foreign keys that point to dimension tables. The granularity of a fact table is usually designed to be fine so that it can record very primitive operational events; the drawback is that summing a large number of records can be time-consuming. There are three types of fact table:
transaction fact table: records the facts of a particular event, such as a sale;
snapshot fact table: records the facts at a given point in time, such as a monthly account balance;
cumulative fact table: records the aggregated facts at a given point in time, such as the cumulative sales of the current month.
It is generally necessary to design a surrogate key for the fact table as the unique identifier of each row. Surrogate keys are primary keys generated by the system that are not application data, have no business meaning, and are transparent to the user.
Dimension table
Dimension tables typically have fewer records than fact tables, but each record contains many attribute fields that describe the fact data. Dimension tables can define a wide variety of attributes.
The following are several of the most common dimension tables:
Time dimension table: describes the times at which the events recorded in the star model occur, at the lowest level of time granularity required. Data warehouses are time-varying data sets and need to record the history of the data, so every data warehouse needs a time dimension table.
Geographic dimension table: describes location information such as country, province, city, county and postal code.
Product dimension table: describes products and their attributes.
Staff dimension table: describes personnel-related information, such as salespeople, marketers and developers.
Range dimension table: describes information about segmented data, such as high, medium and low levels.
Star model
The star model is the simplest form of a dimensional model and is also the most widely used form in data warehouse and data mart development.
A star model consists of fact tables and dimension tables; it may contain one or more fact tables, each referencing any number of dimension tables. The physical model of a star model resembles a star: the fact table is at the center, and the surrounding dimension tables form the points of the star, which is the origin of the name.
Snowflake model
A snowflake model is a logical layout of tables in a multidimensional model whose entity relationship diagram has a shape similar to a snowflake, hence the name.
Like the star model, the snowflake model is composed of fact tables and dimension tables. Snowflaking means normalizing the dimension tables of a star model. When all dimension tables are normalized, a snowflake-shaped structure centered on the fact table, namely the snowflake model, is formed. Dimension tables are normalized by removing their low-cardinality attributes and placing them in separate tables. Cardinality refers to the number of distinct values in a field: a primary key column, for example, has unique values and therefore the highest cardinality, while a column such as gender has low cardinality.
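The fact table and dimension table relationship described above can be sketched with an in-memory SQLite database. This is a minimal illustration only: the table names, column names and sample rows are hypothetical, and SQLite is used simply because it ships with Python, not because the embodiment prescribes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical dimension table: few rows, descriptive attributes.
CREATE TABLE dim_city (city_id INTEGER PRIMARY KEY, city TEXT, province TEXT);
-- Hypothetical fact table: one row per event, with a foreign key to the dimension.
CREATE TABLE fact_order (
    order_id INTEGER PRIMARY KEY,  -- surrogate key
    city_id  INTEGER REFERENCES dim_city(city_id),
    amount   REAL
);
INSERT INTO dim_city VALUES
    (1, 'Hangzhou', 'Zhejiang'), (2, 'Ningbo', 'Zhejiang'), (3, 'Nanjing', 'Jiangsu');
INSERT INTO fact_order VALUES (1, 1, 10.0), (2, 2, 20.0), (3, 3, 5.0), (4, 1, 15.0);
""")


def amount_by_province(connection):
    """Join the fact table to its dimension table and aggregate the measure."""
    rows = connection.execute("""
        SELECT d.province, SUM(f.amount)
        FROM fact_order AS f
        JOIN dim_city AS d ON d.city_id = f.city_id
        GROUP BY d.province
        ORDER BY d.province
    """).fetchall()
    return dict(rows)


print(amount_by_province(conn))
```

In a snowflake model the low-cardinality `province` attribute would be moved out of `dim_city` into its own table, at the cost of one more join at query time.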
Aggregation window size (window size) & window sliding step (window slide)
Examples are: the number of registered users in the store in the last 7 days is calculated every day. Where the aggregation window size is 7 days and the window sliding step is 1 day.
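The two parameters can be illustrated with a small sketch that sums a daily count over a sliding window. The function name, its parameters and the sample data are hypothetical; the sketch only shows what a 7-day window advanced by a 1-day step means.

```python
from datetime import date, timedelta


def sliding_window_counts(daily_counts, window_days, slide_days):
    """Sum daily counts over windows of `window_days`, advancing by `slide_days`.

    Returns {window_end_date: total} for complete windows only.
    """
    days = sorted(daily_counts)
    if not days:
        return {}
    first, last = days[0], days[-1]
    results = {}
    end = first + timedelta(days=window_days - 1)
    while end <= last:
        start = end - timedelta(days=window_days - 1)
        results[end] = sum(c for d, c in daily_counts.items() if start <= d <= end)
        end += timedelta(days=slide_days)
    return results


# Hypothetical data: 1 registration on Nov 1, 2 on Nov 2, ..., 10 on Nov 10.
registrations = {date(2022, 11, 1) + timedelta(days=i): i + 1 for i in range(10)}
result = sliding_window_counts(registrations, window_days=7, slide_days=1)
print(result[date(2022, 11, 7)])  # 28 = 1 + 2 + ... + 7
```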
Referring to fig. 1, an optimization processing method suitable for big data indexes in this embodiment includes:
selecting a corresponding preset optimization strategy for optimizing the initial data of the index calculation element to obtain optimized data of the index calculation element;
analyzing the query index calculation elements in the index query request, extracting initial data and/or optimized data corresponding to the query index calculation elements, and constructing an index query execution plan.
In the optimization processing method of this embodiment, once the index calculation elements are determined in query calculation for big data indexes, the queried index can be analyzed in advance and the calculation link optimized by applying several optimization strategies from a strategy library (such as a calculation optimization strategy, a cache strategy, a pre-calculation strategy and materialized views); when an index query is received, the optimized results can be used to accelerate the query, thereby improving query efficiency.
Therefore, the optimization processing method of this embodiment speeds up big data index queries and improves query efficiency by optimizing both the calculation link and the index query.
In the optimization processing method applicable to the big data index according to this embodiment, selecting a corresponding preset optimization strategy for optimizing the initial data of the index calculation element to obtain the optimized data of the index calculation element includes:
the initial data of the index calculation element comprises indexes, models and metadata of a data table, which are set by a user and are obtained from a service system;
analyzing the calculation intention of the index set by the user and the current data structure, selecting a corresponding preset optimization strategy, and optimizing the storage structure and the calculation structure of the metadata; the optimization strategies comprise a calculation optimization strategy, a cache strategy, a precomputation strategy, a materialized view and the like;
and generating a production plan for producing the optimized data according to the computing structure of the metadata and the computing structure after the optimization strategy is applied and in combination with the data production window.
Specifically, referring to fig. 2, in the optimization processing method applicable to big data indexes in this embodiment, the manner of obtaining the indexes, models, and metadata of the data table set by the user from the business system (i.e., the metadata obtaining process) includes:
increment trigger type acquisition, namely acquiring increment metadata from a service system when the increment part of the metadata in the service system reaches a preset threshold value;
or periodically acquiring the full amount of metadata, periodically acquiring the full amount of metadata from the business system according to a preset time interval, and deleting the original full amount of metadata after acquiring new full amount of metadata.
Referring to FIG. 3, metadata acquisition in this embodiment is specifically implemented as follows: the metadata set by the user is pulled from the business system and stored in a repository for subsequent optimization. The metadata content includes index definitions, model definitions, fact table and dimension table definitions, and profile information.
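The two acquisition modes can be sketched as follows. This is a minimal sketch under stated assumptions: the class name, method names and the dictionary-based repository are hypothetical, and the scheduler that would call the full acquisition at the preset time interval is left out.

```python
class MetadataCollector:
    """Hypothetical collector that keeps pulled metadata in an in-memory repository."""

    def __init__(self, increment_threshold):
        self.increment_threshold = increment_threshold
        self.repository = {}

    def maybe_pull_increment(self, pending_changes):
        """Incremental trigger: pull only once enough changes have accumulated."""
        if len(pending_changes) < self.increment_threshold:
            return False
        self.repository.update(pending_changes)
        return True

    def pull_full(self, full_snapshot):
        """Periodic full acquisition: the new full set replaces the old one
        only after it has been obtained."""
        new_repository = dict(full_snapshot)
        self.repository = new_repository


collector = MetadataCollector(increment_threshold=2)
collector.maybe_pull_increment({"index:order_amount": {"aggregate": "SUM"}})  # below threshold
collector.maybe_pull_increment({
    "index:order_amount": {"aggregate": "SUM"},
    "table:fact_order": {"columns": ["order_id", "city_id", "amount"]},
})  # threshold reached, increment is stored
```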
Referring to fig. 4, in the optimization processing method applicable to big data indexes of the embodiment, metadata of indexes, models, and data tables set by a user are obtained from a business system and stored in a basic repository;
the analyzing the calculation intention of the index set by the user and the current data structure, selecting a corresponding preset optimization strategy, and optimizing the storage structure and the calculation structure of the metadata (namely, the data structure optimization process) comprises the following steps:
reading all metadata in the basic repository, loading the selected preset optimization strategy, producing a new optimized data structure, and writing the optimized data structure into the optimized repository.
Referring to FIG. 5, data structure optimization in this embodiment is specifically implemented as follows: the repository storing the metadata is divided into two parts: the base repository (base), which stores the data structures before optimization, and the optimized repository (optimized), which stores the data structures after optimization.
The data structure optimizer (StrutsOptimizer) reads everything in the base repository (including index information, model information and the original data table structures), loads the optimization rules, generates new data structures, and writes them into the optimized repository.
Referring to FIG. 6, one example of data structure optimization in this embodiment optimizes a star model into a pre-widened model. Referring to FIG. 7, another example of data structure optimization in this embodiment implements pre-aggregation.
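The flow of reading the base repository, applying a rule and writing the optimized repository can be sketched as below. The sketch assumes a dictionary-based repository and a single hypothetical rule that flattens a star model into one wide table; the class, function and key names are illustrative only and do not reproduce the optimizer of this embodiment.

```python
def widen_star_model(model):
    """Hypothetical rule: flatten a fact table and its dimension tables into one wide table."""
    columns = list(model["fact"]["columns"])
    for dim in model["dimensions"]:
        for column in dim["columns"]:
            if column not in columns:  # join keys appear only once
                columns.append(column)
    return {"table": model["fact"]["table"] + "_wide", "columns": columns}


class StructureOptimizer:
    """Hypothetical optimizer: applies the rule selected for each model."""

    def __init__(self, rules):
        self.rules = rules  # mapping: strategy name -> rule function

    def run(self, base_repository, optimized_repository):
        for name, model in base_repository.items():
            rule = self.rules.get(model.get("strategy"))
            if rule is not None:
                optimized_repository[name] = rule(model)
        return optimized_repository


base = {
    "orders": {
        "strategy": "pre_widen",
        "fact": {"table": "fact_order", "columns": ["order_id", "city_id", "amount"]},
        "dimensions": [{"table": "dim_city", "columns": ["city_id", "city", "province"]}],
    }
}
optimized = StructureOptimizer({"pre_widen": widen_star_model}).run(base, {})
print(optimized["orders"]["columns"])
```

Models without a matching rule are simply left out of the optimized repository, so a later query falls back to the structure before optimization.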
Referring to fig. 8, in the optimization processing method applicable to the big data index according to the embodiment, the generating a production plan for producing optimized data according to the computing structure of the metadata and the computing structure optimized by applying the optimization strategy and combining the data production window (i.e. the data production planning process) includes:
generating the data synchronization calculation logic according to the data structures before and after optimization and the calculation logic of the optimized data structure;
generating a data synchronization task in combination with the data calculation window information;
registering the data synchronization task on a task scheduling platform;
and the task scheduling platform executes the task of data synchronization according to the configuration and synchronizes the data from the original table to the optimized table.
Referring to FIG. 9, the data production plan of this embodiment is specifically implemented as follows:
the data synchronization calculation logic is generated according to the data structures before and after optimization and the calculation logic of the optimized data structure;
a data synchronization task is generated in combination with the data calculation window information;
registering the data synchronization task on a task scheduling platform;
and the task scheduling platform executes the task of data synchronization according to the configuration and synchronizes the data from the original table to the optimized table.
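The four steps above can be sketched as follows, assuming that the optimized structure is a pre-aggregated table. The `SyncTask` and `Scheduler` classes, the `build_sync_sql` helper, the table names and the schedule string are all hypothetical; a real task scheduling platform would have its own registration interface.

```python
from dataclasses import dataclass


@dataclass
class SyncTask:
    # Hypothetical task description handed to a scheduling platform.
    name: str
    sql: str
    schedule: str  # the data calculation window, as free text here


def build_sync_sql(source_table, target_table, group_by, measure, aggregate):
    """Derive the synchronization logic from the structures before and after optimization."""
    cols = ", ".join(group_by)
    return (
        f"INSERT INTO {target_table} ({cols}, {measure}) "
        f"SELECT {cols}, {aggregate}({measure}) FROM {source_table} GROUP BY {cols}"
    )


class Scheduler:
    """Hypothetical stand-in for a task scheduling platform."""

    def __init__(self):
        self.tasks = {}

    def register(self, task):
        self.tasks[task.name] = task


sync_sql = build_sync_sql(
    source_table="fact_order",
    target_table="fact_order_daily",
    group_by=["order_date", "city_id"],
    measure="amount",
    aggregate="SUM",
)
task = SyncTask(name="sync_fact_order_daily", sql=sync_sql, schedule="daily at 02:00")
scheduler = Scheduler()
scheduler.register(task)
print(task.sql)
```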
Referring to fig. 10, in the optimization processing method applicable to the big data index in this embodiment, analyzing the query index calculation element in the index query request, extracting initial data and/or optimized data corresponding to the query index calculation element, and constructing an index query execution plan (i.e., data query optimization) includes:
receiving a request of index query, wherein the content of the request comprises indexes, dimensions, index calculation logic and a query range;
analyzing the query index calculation element in the index query request, querying metadata information according to the query index element, and querying optimized data information at the same time;
if the metadata has been optimized, extracting the optimized data structure, otherwise extracting the metadata structure;
and constructing an index query execution plan according to the extracted optimized data structure and/or metadata structure.
Further, the constructing of the index query execution plan in this embodiment includes:
constructing an execution plan of the atomic index as an inner layer SQL according to the extracted data structure;
constructing a filtering condition of the outer layer SQL according to the query range in the index query request;
constructing the projection conditions of the outer layer SQL according to the index calculation logic in the index query request;
and assembling the inner layer SQL and the outer layer SQL, constructing and optimizing the complete SQL, and returning the SQL of the index query execution plan.
The data query optimization of this embodiment is specifically implemented as follows:
receiving an index query request, wherein the content of the request comprises information such as indexes, dimensions, index calculation logic, and query range;
parsing the request, first looking up the metadata of the indexes named in the request, and simultaneously querying the optimization information of those indexes;
if the data structure has been optimized, extracting the optimized data structure (including index metadata, a model, a data table and the like), otherwise extracting the data structure before optimization;
according to the data structure, constructing an execution plan (SQL) of the atomic index as an inner layer SQL;
constructing a filtering condition (filter) of the outer layer SQL according to the query range of the request;
constructing the projection (project) of the outer layer SQL according to the index calculation logic of the request;
assembling the inner layer SQL and the outer layer SQL to construct a complete SQL;
loading an execution plan optimizer (SQL optimize rule), and optimizing the constructed SQL;
returning the final optimized SQL;
and submitting the SQL to an execution engine for execution, and returning the execution result to the calling end.
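The inner/outer assembly described above might look like the following sketch. The dictionary layout of `structure` and `request`, the function name, and all table and column names are hypothetical. The optimizer step is omitted, and values are interpolated directly into the string only to keep the example short; a real system would bind them as parameters.

```python
def build_index_sql(structure, request):
    """Assemble the index query from an inner atomic-index query, an outer
    filter derived from the query range, and an outer projection derived
    from the index calculation logic."""
    dims = ", ".join(request["dimensions"])

    # Inner layer: the atomic index, aggregated per dimension and time value.
    inner = (
        f"SELECT {dims}, {structure['time_col']} AS ts, "
        f"{structure['agg']}({structure['measure_col']}) AS {request['index']} "
        f"FROM {structure['table']} "
        f"GROUP BY {dims}, {structure['time_col']}"
    )

    # Outer filter: built from the query range of the request.
    start, end = request["range"]
    outer_filter = f"ts >= '{start}' AND ts < '{end}'"

    # Outer projection: built from the index calculation logic of the request.
    outer_project = f"{dims}, {request['logic']} AS result"

    return f"SELECT {outer_project} FROM ({inner}) AS t WHERE {outer_filter}"


# Hypothetical extracted structure and incoming request.
structure = {"table": "orders_daily", "time_col": "dt",
             "measure_col": "amount", "agg": "SUM"}
request = {"index": "gmv", "dimensions": ["region"],
           "logic": "gmv / 100", "range": ("2022-11-01", "2022-11-29")}
sql = build_index_sql(structure, request)
```

Keeping the filter in the outer layer mirrors the order of the steps in the text; pushing that filter down into the inner query is the kind of rewrite an execution plan optimizer would typically apply afterwards.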
Referring to fig. 11, the basic flow of data query optimization of the present embodiment is shown.
In the optimization processing method suitable for big data indexes of this embodiment, the index calculation elements comprise a first index calculation element and a second index calculation element, wherein the first index calculation element is set in the index definition stage, and the second index calculation element is set through query conditions in the index query stage;
the first index calculation element comprises a business time column, and/or a measurement column, and/or a dimension column, and/or an aggregation function, and/or an aggregation window size, and/or a window sliding step size;
the second index calculation element includes a query range.
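One way to picture the two groups of elements is as two small record types, one fixed when the index is defined and one supplied with each query. The class and field names below are assumptions chosen for illustration; every definition-stage field is optional because the text joins the elements with "and/or".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple


@dataclass(frozen=True)
class DefinitionElements:
    """Elements set in the index definition stage (hypothetical field names)."""
    time_col: Optional[str] = None
    measure_col: Optional[str] = None
    dim_cols: Tuple[str, ...] = ()
    agg: Optional[str] = None
    window_size: Optional[timedelta] = None
    slide_step: Optional[timedelta] = None


@dataclass(frozen=True)
class QueryElements:
    """Element set through query conditions in the index query stage."""
    query_range: Tuple[datetime, datetime]


# Hypothetical daily sales index, queried over November 2022.
definition = DefinitionElements(
    time_col="order_time",
    measure_col="amount",
    dim_cols=("region",),
    agg="SUM",
    window_size=timedelta(days=1),
    slide_step=timedelta(days=1),
)
query = QueryElements(query_range=(datetime(2022, 11, 1), datetime(2022, 11, 29)))
```

Because the definition-stage elements are known before any query arrives, they are the part that can be analyzed and optimized in advance.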
To conveniently express the data calculation logic, data organization relations, and data storage locations in the index definition stage, two entities are introduced: a model (including a star model and a snowflake model) and a data table (a fact table and a dimension table).
The model is composed of data tables and their organization relations, and mainly expresses the organization of the data structure.
The data table records the data storage location.
The index is associated with the model, and the business time column, the measurement column, and the dimension column are mapped to field positions in the data table.
The aggregation function and the aggregation window size in the index metadata determine the calculation logic of the index.
The window sliding step size of the index metadata determines the index calculation frequency.
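How the aggregation window size and the sliding step interact can be shown with a short sketch. The text does not specify how windows are aligned, so the alignment used here (windows start at the beginning of the range, and only windows that fit entirely inside the range are produced) is an assumption, as is the function name.

```python
from datetime import datetime, timedelta


def sliding_windows(range_start, range_end, window_size, slide_step):
    """List the [start, end) aggregation windows inside a time range.

    A smaller slide_step yields more windows over the same range, that is,
    a higher index calculation frequency.
    """
    if slide_step <= timedelta(0):
        raise ValueError("slide_step must be positive")
    windows = []
    start = range_start
    while start + window_size <= range_end:
        windows.append((start, start + window_size))
        start += slide_step
    return windows


# Two-day windows sliding by one day over a four-day range.
windows = sliding_windows(
    datetime(2022, 11, 1), datetime(2022, 11, 5),
    timedelta(days=2), timedelta(days=1),
)
```

When the step equals the window size the windows tile the range without overlap; when the step is smaller, consecutive windows overlap and the index is recomputed more often.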
Referring to fig. 12, an example of the association of the index and the model according to the present embodiment is shown.
The optimization processing method suitable for the big data index in the embodiment has the following technical effects:
Deep optimization for index calculation scenarios: data is reorganized according to the characteristics of the data model, and different data organization modes are selected according to how the data is used.
Dynamic and continuous optimization: as the system continues to be used, the engine dynamically adjusts the data distribution based on the analysis.
Business-friendly and intelligent optimization: data analysts focus on data modeling, while data organization and retrieval are handed over to the engine for intelligent tuning.
Better compatibility: the most suitable underlying OLAP engine can be selected according to the characteristics of the business.
This embodiment also provides an optimization processing device suitable for big data indexes, including:
the strategy optimization module is used for selecting a corresponding preset optimization strategy for optimizing the initial data of the index calculation element to obtain the optimized data of the index calculation element;
and the query optimization module is used for analyzing the query index calculation elements in the index query request, extracting initial data and/or optimized data corresponding to the query index calculation elements and constructing an index query execution plan.
The embodiment also provides a computer-readable storage medium, which stores a computer-executable program, and when the computer-executable program is executed, the method for optimizing the big data index is implemented.
The computer readable storage medium of the present embodiments may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable storage medium may be any computer readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
The embodiment also provides an electronic device, which includes a processor and a memory, where the memory is used to store a computer executable program, and when the computer program is executed by the processor, the processor executes the optimization processing method suitable for the big data index.
The electronic device is in the form of a general purpose computing device. There may be one or more processors, which can work together. The invention does not exclude distributed processing, i.e. the processors may be distributed over different physical devices. The electronic device of the present invention is not limited to a single entity, and may be a combination of a plurality of physical devices.
The memory stores a computer executable program, typically machine readable code. The computer readable program may be executed by the processor to enable an electronic device to perform the method of the invention, or at least some of the steps of the method.
The memory may include volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may also be non-volatile memory, such as read-only memory (ROM).
It should be understood that elements or components not shown in the above examples may also be included in the electronic device of the present invention. For example, some electronic devices further include a display unit such as a display screen, and some electronic devices further include a human-computer interaction element such as a button, a keyboard, and the like. Electronic devices are considered to be covered by the present invention as long as the electronic devices are capable of executing a computer-readable program in a memory to implement the method of the present invention or at least a part of the steps of the method.
From the above description of the embodiments, those skilled in the art will readily appreciate that the present invention can be implemented by hardware capable of executing a specific computer program, such as the system of the present invention, and electronic processing units, servers, clients, mobile phones, control units, processors, etc. included in the system. The invention may also be implemented by computer software for performing the method of the invention, e.g. control software executed by a microprocessor, an electronic control unit, a client, a server, etc. It should be noted that the computer software for executing the method of the present invention is not limited to execution by a single or specific hardware entity, and can also be realized in a distributed manner by non-specific hardware. For computer software, the software product may be stored in a computer readable storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or may be distributed over a network, as long as it enables the electronic device to perform the method according to the present invention.
The above embodiments are only used to illustrate the invention and not to limit the technical solutions described in the invention. Although the present invention has been described in detail in this specification with reference to the above embodiments, it is not limited to those embodiments; any modification or equivalent replacement of the present invention, and all such modifications and variations, are intended to be included within the scope of this disclosure and the appended claims.

Claims (10)

1. An optimization processing method suitable for big data indexes is characterized by comprising the following steps:
selecting a corresponding preset optimization strategy for optimizing initial data of the index calculation element to obtain optimized data of the index calculation element;
analyzing the query index calculation elements in the index query request, extracting initial data and/or optimized data corresponding to the query index calculation elements, and constructing an index query execution plan.
2. The method as claimed in claim 1, wherein the selecting a corresponding preset optimization strategy for the initial data of the index calculation element to perform optimization to obtain the optimized data of the index calculation element comprises:
the initial data of the index calculation element comprises indexes, models and metadata of a data table, which are set by a user and are obtained from a service system;
analyzing the calculation intention of the index set by the user and the current data structure, selecting a corresponding preset optimization strategy, and optimizing the storage structure and the calculation structure of the metadata;
and generating a production plan for producing the optimized data according to the computing structure of the metadata and the computing structure after the optimization strategy is applied and in combination with the data production window.
3. The optimization processing method suitable for big data index according to claim 2, wherein the manner of obtaining the index, the model and the metadata of the data table set by the user from the business system includes:
increment trigger type acquisition, namely acquiring increment metadata from a service system when the increment part of the metadata in the service system reaches a preset threshold value;
or periodically acquiring the full amount of metadata, periodically acquiring the full amount of metadata from the business system according to a preset time interval, and deleting the original full amount of metadata after acquiring new full amount of metadata.
4. The optimization processing method suitable for big data index as claimed in claim 2, wherein metadata of index, model and data table set by user obtained from business system is stored in basic repository;
the analyzing the calculation intention of the user setting index and the current data structure, selecting a corresponding preset optimization strategy, and optimizing the storage structure and the calculation structure of the metadata comprises the following steps:
reading all metadata in the basic repository, loading the selected preset optimization strategy, producing a new optimized data structure, and writing the optimized data structure into the optimized repository.
5. The method of claim 2, wherein the generating a production plan for producing optimized data according to the computing structure of the metadata and the computing structure optimized by applying the optimization strategy and combining the data production window comprises:
generating a data synchronization calculation logic according to the data structures before and after optimization and the optimized data structure calculation logic;
generating a data synchronization task in combination with the data calculation window information;
registering the data synchronization task on a task scheduling platform;
and the task scheduling platform executes the task of data synchronization according to the configuration and synchronizes the data from the original table to the optimized table.
6. The method according to claim 2, wherein the analyzing of the query indicator calculation element in the indicator query request, the extracting of the initial data and/or the optimized data corresponding to the query indicator calculation element, and the constructing of the indicator query execution plan include:
receiving a request of index query, wherein the content of the request comprises indexes, dimensions, index calculation logic and a query range;
analyzing the query index calculation element in the index query request, querying metadata information according to the query index element, and querying optimized data information at the same time;
if the metadata has been optimized, extracting the optimized data structure, otherwise extracting the metadata structure;
and constructing an index query execution plan according to the extracted optimized data structure and/or metadata structure.
7. The optimization processing method suitable for big data index according to claim 6, wherein the constructing an index query execution plan includes:
constructing an execution plan of the atomic index as an inner layer sql according to the extracted data structure;
constructing a filtering condition of the outer-layer sql according to the query range in the index query request;
constructing project conditions of the outer-layer sql according to index calculation logic in the index query request;
and assembling the inner layer sql and the outer layer sql, constructing and optimizing the complete sql, and returning the sql of the index query execution plan.
8. The optimization processing method suitable for the big data index according to claim 1, wherein the index calculation element comprises a first index calculation element set in an index definition stage and a second index calculation element set by a query condition in an index query stage;
the first index calculation element comprises a service time column, and/or a measurement column, and/or a dimension column, and/or an aggregation function, and/or an aggregation window size, and/or a window sliding step length;
the second index calculation element includes a query range.
9. An optimization processing device suitable for big data index, comprising:
the strategy optimization module selects a corresponding preset optimization strategy for optimizing the initial data of the index calculation element to obtain the optimized data of the index calculation element;
and the query optimization module analyzes the query index calculation elements in the index query request, extracts initial data and/or optimized data corresponding to the query index calculation elements and constructs an index query execution plan.
10. A storage medium storing a computer-executable program, wherein the computer-executable program, when executed, implements an optimization processing method for big data index according to any one of claims 1 to 8.
CN202211510338.2A 2022-11-29 2022-11-29 Optimization processing method and device suitable for big data index and storage medium Pending CN115827685A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211510338.2A CN115827685A (en) 2022-11-29 2022-11-29 Optimization processing method and device suitable for big data index and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211510338.2A CN115827685A (en) 2022-11-29 2022-11-29 Optimization processing method and device suitable for big data index and storage medium

Publications (1)

Publication Number Publication Date
CN115827685A true CN115827685A (en) 2023-03-21

Family

ID=85532585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211510338.2A Pending CN115827685A (en) 2022-11-29 2022-11-29 Optimization processing method and device suitable for big data index and storage medium

Country Status (1)

Country Link
CN (1) CN115827685A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 1501, 15th Floor, Building 7, No.13 Huayuan Road, Haidian District, Beijing, 100088

Applicant after: Beijing Shushi yunchuang Technology Co.,Ltd.

Address before: No. 805, Floor 8, Building A, Zhizhen Building, No. 7, Zhichun Road, Haidian District, Beijing, 100088

Applicant before: Beijing Shushi yunchuang Technology Co.,Ltd.
