WO2021139427A1

WO2021139427A1 - Big data index construction method, apparatus and device, and storage medium

Info

Publication number: WO2021139427A1
Application number: PCT/CN2020/131753
Authority: WO
Inventors: 陈志兴
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-07-23
Filing date: 2020-11-26
Publication date: 2021-07-15
Also published as: CN111859299A

Abstract

Disclosed are a big data index construction method, apparatus and device, and a storage medium, relating to the field of big data. The method comprises: acquiring data to be predicted, and parsing said data to construct multiple indexes carrying different dimension attribute information; calculating the access frequency of each index according to a linear regression algorithm, and determining whether other dimension tables need to be associated with same during index calculation in order to determine the type of the index; according to a correlation table between the index type and a storage calculation engine and a correlation table between the index type and a dimension modeling mode for the index, making a query for a storage calculation engine corresponding to the index and calculating a preset dimension table with which the index needs to be associated; and using a routing decision engine to call the storage calculation engine in order to execute the preset dimension table, and calculating a value corresponding to the index. The method solves the problem of the timeliness of big data index calculation, and solves the technical problem of only a single data engine and dimension being used for modeling.

Description

Big data indicator construction method, device, equipment and storage medium

This application claims priority to Chinese patent application No. 202010714909.9, filed with the Chinese Patent Office on July 23, 2020 and entitled "Big Data Index Construction Method, Device, Equipment, and Storage Medium", the entire content of which is incorporated herein by reference.

Technical field

This application relates to the field of big data technology, and in particular to a method, device, equipment and storage medium for constructing big data indicators.

Background

With the progress of society and the development of big data, the development of fixed-dimensional indicator I faces challenges. Fixed-dimensional indicator I is based on extracting production indicators from production data. This process requires a large amount of calculation on the production data and, because indicators are divided into different levels, the calculation must also be very flexible. As production data volumes surge and application scenarios become more flexible, fixed-dimensional indicator I services can no longer serve effectively.

In the past, the problem was addressed by providing more computing resources or a more powerful computing engine. However, as data volumes surge, this approach consumes a large amount of resources. The inventor realized that, in terms of the calculation model, achieving flexible indicator calculation increases the time the calculation takes. To keep the calculation model consistent with the computing resources, the calculation of fixed-dimensional indicator I for big data is often limited to a single calculation engine.

Summary of the invention

The main purpose of this application is to solve the technical problem of using only a single data engine and dimensional modeling method.

To achieve the above objective, a first aspect of this application provides a method for constructing big data indicators, including: obtaining data to be predicted; analyzing the data to be predicted to construct multiple indicators that carry attribute information of different dimensions; calculating the access frequency of the indicator according to a linear regression algorithm, and determining whether the indicator is associated with a preset dimension table; determining the indicator type of the indicator based on the access frequency, wherein the indicator type includes multi-dimensional aggregated indicators and fixed-dimensional indicators; determining, based on the indicator type, the storage calculation engine and the dimensional modeling method corresponding to the indicator according to a preset correspondence table between indicator types and storage calculation engines and a correspondence table between indicator types and dimensional modeling methods of indicators; determining, according to the dimensional modeling method, the preset dimension table associated with the indicator, wherein the preset dimension table includes a dimension table constructed based on the dimensional modeling method corresponding to the indicator type or a dimension table constructed based on all dimensional modeling methods; and using a routing decision engine to call the storage calculation engine to execute the preset dimension table, and calculating the indicator value corresponding to the indicator.

A second aspect of the present application provides a device for constructing big data indicators, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions: obtaining data to be predicted; analyzing the data to be predicted to construct multiple indicators that carry attribute information of different dimensions; calculating the access frequency of the indicator according to a linear regression algorithm, and determining whether the indicator is associated with a preset dimension table; determining the indicator type of the indicator based on the access frequency, wherein the indicator type includes multi-dimensional aggregated indicators and fixed-dimensional indicators; determining, based on the indicator type, the storage calculation engine and the dimensional modeling method corresponding to the indicator according to a preset correspondence table between indicator types and storage calculation engines and a correspondence table between indicator types and dimensional modeling methods of indicators; determining, according to the dimensional modeling method, the preset dimension table associated with the indicator, wherein the preset dimension table includes a dimension table constructed based on the dimensional modeling method corresponding to the indicator type or a dimension table constructed based on all dimensional modeling methods; and using a routing decision engine to call the storage calculation engine to execute the preset dimension table, and calculating the indicator value corresponding to the indicator.

A third aspect of the present application provides a computer-readable storage medium that stores computer instructions which, when executed on a computer, cause the computer to perform the following steps: obtaining data to be predicted; analyzing the data to be predicted to construct multiple indicators that carry attribute information of different dimensions; calculating the access frequency of the indicator according to a linear regression algorithm, and determining whether the indicator is associated with a preset dimension table; determining the indicator type of the indicator based on the access frequency, wherein the indicator type includes multi-dimensional aggregated indicators and fixed-dimensional indicators; determining, based on the indicator type, the storage calculation engine and the dimensional modeling method corresponding to the indicator according to a preset correspondence table between indicator types and storage calculation engines and a correspondence table between indicator types and dimensional modeling methods of indicators; determining, according to the dimensional modeling method, the preset dimension table associated with the indicator, wherein the preset dimension table includes a dimension table constructed based on the dimensional modeling method corresponding to the indicator type or a dimension table constructed based on all dimensional modeling methods; and using a routing decision engine to call the storage calculation engine to execute the preset dimension table, and calculating the indicator value corresponding to the indicator.

A fourth aspect of the present application provides a big data indicator construction device, which includes: a first acquisition module, configured to acquire data to be predicted; a first construction module, configured to analyze the data to be predicted to construct multiple indicators that carry attribute information of different dimensions; a judging module, configured to calculate the access frequency of the indicator according to a linear regression algorithm, and to judge whether the indicator is associated with a preset dimension table; a first determining module, configured to determine the indicator type of the indicator based on the access frequency, wherein the indicator type includes multi-dimensional aggregated indicators and fixed-dimensional indicators; a second determining module, configured to determine, based on the indicator type, the storage calculation engine and the dimensional modeling method corresponding to the indicator according to a preset correspondence table between indicator types and storage calculation engines and a correspondence table between indicator types and dimensional modeling methods of indicators; a third determining module, configured to determine, according to the dimensional modeling method, the preset dimension table associated with the indicator, wherein the preset dimension table includes a dimension table constructed based on the dimensional modeling method corresponding to the indicator type or a dimension table constructed based on all dimensional modeling methods; and a calculation module, configured to use a routing decision engine to call the storage calculation engine to execute the preset dimension table, and to calculate the indicator value corresponding to the indicator.

Description of the drawings

Fig. 1 is a schematic diagram of a first embodiment of a method for constructing a big data indicator in an embodiment of the present invention;

Fig. 2 is a schematic diagram of a second embodiment of a method for constructing a big data indicator in an embodiment of the present invention;

Fig. 3 is a schematic diagram of a third embodiment of a method for constructing a big data indicator in an embodiment of the present invention;

Fig. 4 is a schematic diagram of a first embodiment of a device for constructing a big data indicator in an embodiment of the present invention;

Fig. 5 is a schematic diagram of a second embodiment of a device for constructing a big data indicator in an embodiment of the present invention;

Fig. 6 is a schematic diagram of an embodiment of a device for constructing a big data indicator in an embodiment of the present invention.

Detailed description

The embodiments of the present application provide a method, device, equipment, and storage medium for constructing a big data indicator, which resolve the conflict between the time consumed by, and the timeliness required of, the calculation of fixed-dimensional indicator I for big data, and at the same time solve the technical problem of using only a single data engine and dimensional modeling method.

The terms "first", "second", "third", "fourth", etc. (if any) in the description and claims of this application and in the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way can be interchanged under appropriate circumstances, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units that are clearly listed, but may include other steps or units that are not clearly listed or that are inherent to the process, method, product, or device.

For ease of understanding, the following describes the specific process of the embodiment of the present application. Please refer to FIG. 1. The first embodiment of the method for constructing a big data indicator in the embodiment of the present application includes:

101. Obtain the data to be predicted;

In this embodiment, all data to be predicted is acquired. The data contains many indicator labels, for example the premium of a certain type of insurance under a certain activity, or the premiums of all types of insurance under a certain activity. The data to be predicted refers to the data containing the indicators to be calculated. The data is analyzed to determine the indicator information it contains. Common attributes are then added to construct labels of multiple different basic dimensions (under those attributes). For example, taking the indicator "premium" and adding public attributes to it, multiple different indicators can be constructed, such as "premium of auto insurance", "premium under the Double 11 event" or "premium of auto insurance under the Double 11 event".

102. Analyze the data to be predicted to construct multiple indicators that carry attribute information of different dimensions;

In this embodiment, the indicator is an indicator label that the enterprise has obtained through data analysis. Adding a basic dimension means adding the calculated value of the dimension indicator under the enterprise's basic attributes, and adding a public attribute means adding an attribute that is common across the enterprise.

103. Calculate the access frequency of the indicator according to the linear regression algorithm, and determine whether the indicator is associated with a preset dimension table;

In this embodiment, to determine whether the indicator needs to be associated with other dimensional data during calculation, the access frequency of the indicator is first predicted, that is, it is determined whether the indicator is one that frequently needs to be counted (accessed) or one that is calculated according to a certain usage pattern. The calculation requirements of the indicator are then determined from its access frequency and from whether other data needs to be associated when it is calculated, and a suitable storage calculation engine is selected accordingly.

The dimension table mentioned in this embodiment can be understood, to a certain extent, as a data table containing information on many indicators (labels). For example, a dimension table recording xx insurance company's "total premiums in 2019" contains labels such as "time: 2019.01, 2019.02, ..., 2019.12" and "insurance types: auto insurance, life insurance, critical illness insurance, children's insurance", and includes indicators such as "2019.01 auto insurance premium", "2019.01 critical illness insurance premium", "2019.03 life insurance premium", "2019.03 children's insurance premium", etc.

In this embodiment, whether an indicator needs to be associated with other dimension tables during calculation depends on whether indicators from other dimension tables (data tables) must be introduced to calculate it. For example, suppose the indicator "total auto insurance premiums from 2017 to 2019" is to be calculated, but the local dimension table only contains the indicator "total auto insurance premiums in 2018". In that case, the calculation must associate the indicator information in the dimension table "2017 total auto insurance premiums" and the dimension table "2019 total auto insurance premiums". As another example, when calculating the indicator "total auto insurance premiums from April to June 2019", the local dimension table "total premiums for 2019" already includes the monthly auto insurance premiums from January to December 2019, so there is no need to associate indicator information from other dimension tables.
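The association check described above can be sketched as a set difference between the labels an indicator needs and the labels the local dimension table holds. This is a minimal illustration under assumptions: the function names and the (insurance type, period) tuple encoding of labels are hypothetical and do not come from the application.

```python
def missing_labels(required, local_labels):
    """Labels the indicator needs that the local dimension table does not hold."""
    return sorted(set(required) - set(local_labels))


def needs_association(required, local_labels):
    """True if some required label must come from another dimension table."""
    return bool(missing_labels(required, local_labels))


# Case 1: the local table only holds the 2018 total (hypothetical encoding).
local_2018 = {("auto insurance", "2018")}
required_2017_2019 = [("auto insurance", str(year)) for year in (2017, 2018, 2019)]

# Case 2: the local 2019 table holds every month of 2019.
local_2019 = {("auto insurance", f"2019.{month:02d}") for month in range(1, 13)}
required_q2_2019 = [("auto insurance", f"2019.{month:02d}") for month in (4, 5, 6)]
```

In the first case `missing_labels` reports the 2017 and 2019 entries, so other dimension tables must be associated; in the second case nothing is missing and the local table suffices.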

Linear regression in this embodiment refers to a regression analysis that models the relationship between one or more independent variables and a dependent variable using a least squares function called the linear regression equation. In this embodiment, a linear regression algorithm is used to predict the access frequency of the indicator. For example, in the course of using indicators, some indicators turn out to be used frequently, or their access frequency is affected by some other data; such indicators share the same characteristics, and the relationship is linear. From these characteristics it can be inferred which indicators have a relatively high access frequency or are calculated through regular statistics. For example, if certain indicators need to be checked for the Double 11 event and the same statistics are required for Double 12, the Double 12 indicators can be calculated and aggregated in advance according to the characteristics of Double 11. Regression here means predicting new data from existing data, such as predicting stock trends. Linear regression describes the relationship between the data with a straight line as accurately as possible, so that when new data appears, a value can be predicted simply.

The linear regression model looks like:

$$h(x) = w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + w_n x_n + b$$

The model obtained by linear regression is not necessarily a straight line:

(1) When there is only one variable, the model is a straight line in the plane;

(2) When there are two variables, the model is a plane in space;

(3) When there are more variables, the model will be higher dimensional.

In linear regression, the residual sum of squares is usually used, that is, the distance from each point to the straight line is measured parallel to the y-axis rather than perpendicular to the line. The residual sum of squares divided by the sample size n is the mean square error, which is used as the cost function of the linear regression model. Minimizing the sum of the squared distances from all points to the straight line is equivalent to minimizing the mean square error. This method is called the least squares method.

Because $h(x) = w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + w_n x_n + b$, the loss function is the mean square error between $h(x)$ and the observed values over the samples.

Finally, solving the minimization yields the calculation formulas for w and b.
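As a reference point, the single-variable case can be written out explicitly. The expressions below are the standard textbook forms for $h(x) = wx + b$ over $n$ samples $(x_i, y_i)$, with $\bar{x}$ and $\bar{y}$ denoting the sample means; they are an assumption for illustration, not a quotation of the application's own formulas.

```latex
J(w, b) = \frac{1}{n}\sum_{i=1}^{n}\bigl(w x_i + b - y_i\bigr)^2,
\qquad
w = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2},
\qquad
b = \bar{y} - w\,\bar{x}
```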

In this embodiment, when predicting the access frequency of indicator A in certain sample data, assume that the input data set D has n samples and d features:

$$D = \{(x^{(1)}, y_1), (x^{(2)}, y_2), \dots, (x^{(n)}, y_n)\}$$

The i-th sample is expressed as:

$$(x^{(i)}, y_i) = (x_1^{(i)}, x_2^{(i)}, \dots, x_d^{(i)}, y_i)$$

A linear model makes predictions by forming a linear combination of the features. The hypothesis function (1) is:

$$H_\theta(x_1, x_2, \dots, x_d) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_d x_d \tag{1}$$

where $\theta_0, \theta_1, \dots, \theta_d$ are the model parameters. Let $x_0 = 1$, so that each sample is the row vector $x^{(i)} = (x_0^{(i)}, x_1^{(i)}, \dots, x_d^{(i)})$; let $X$ be the $n \times (d+1)$ matrix whose rows are these vectors, $\theta$ the $(d+1) \times 1$ parameter vector, and $Y$ the $n \times 1$ vector of observed values $y_1, \dots, y_n$. The hypothesis function (1) can then be expressed as: $H_\theta(X) = X\theta$

The loss function $J(\theta)$ is the mean square error. The least squares method is used to solve for the parameters: the derivative of $J(\theta)$ with respect to $\theta$ is taken and set to zero, which gives

$$\theta = (X^T X)^{-1} X^T Y$$
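The intermediate expressions of this derivation can be written out as follows. This is a sketch under two assumptions that the text does not state: the mean square error is normalized by $1/n$ (some texts use $1/(2n)$, which does not change the minimizer), and $X^T X$ is invertible.

```latex
\begin{aligned}
J(\theta) &= \frac{1}{n}\,(X\theta - Y)^T (X\theta - Y) \\
\nabla_\theta J(\theta) &= \frac{2}{n}\,X^T (X\theta - Y) \\
\nabla_\theta J(\theta) = 0 \;&\Longrightarrow\; X^T X\,\theta = X^T Y \;\Longrightarrow\; \theta = (X^T X)^{-1} X^T Y
\end{aligned}
```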

In this embodiment, the linear regression algorithm is used to determine the important indicators in the sample data, and the mapping relationship equations between the indicators and the indicator factors that affect their access frequency are established respectively. According to the weight W with which each indicator factor affects the access frequency of indicator M, the main variables (that is, the main indicator factors) $a_1, a_2, a_3, \dots, a_n$ are determined, and the mapping relationship equation between the main indicator factors and the indicator is established: $y = \beta_0 + \beta_1 a_1 + \beta_2 a_2 + \dots + \beta_n a_n$, where $y$ is the access frequency of indicator M (in a certain time period) and $a_1, a_2, a_3, \dots, a_n$ are the indicator factors that affect the access frequency of indicator M (in a specific time period). Taking indicator M as an example, the access frequency of indicator M during a particular promotion from 2017 to 2019 is collected, together with the values of the indicator factors $a_1, a_2, a_3, \dots, a_n$ that affect it. The data is entered into the SPSS tool, which gives the equation $y = \beta_1 a_1 + \beta_2 a_2 + \beta_3 a_3 + \dots + \beta_n a_n$. Since the correlation coefficient of the indicator factors and the adjusted coefficient of multiple determination are both very close to 1, the model fits well, indicating that the linear relationship of the model is significant. Based on the F test, $a_1, a_2, a_3, \dots, a_n$ are taken as the main indicator factors. Finally, the predicted and actual values of indicator M are plotted in Python and compared; the comparison shows that the model is reasonable and can be used to predict the access frequency of indicator M (in a specific time period).
Repeat the above operation to establish a univariate linear regression model between each main indicator factor and the access frequency of indicator M (in a specific time period). The change inferred for a certain period of time is then fed into the prediction model as input data, and the predicted access frequency of indicator M (in a certain period of time) is finally obtained. Furthermore, this linear regression prediction identifies which data will be accessed with high frequency. Such high-frequency data needs to be pre-aggregated, while data that is not accessed with high frequency can use other storage engines.
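The univariate regression step can be sketched as follows. This is a minimal illustration, not the application's implementation: the function names, the sample values, and the `HIGH_FREQUENCY_THRESHOLD` cut-off are all hypothetical, and the fit uses the standard closed-form least-squares solution for a single factor.

```python
def fit_univariate(xs, ys):
    """Least-squares fit of y = w * x + b for a single indicator factor."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired samples")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("factor values must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = y_mean - w * x_mean
    return w, b


def predict_frequency(w, b, factor_value):
    """Predicted access frequency for a new value of the factor."""
    return w * factor_value + b


# Illustrative numbers only; they are not taken from the application.
factor_values = [1.0, 2.0, 3.0, 4.0]
access_frequencies = [3.0, 5.0, 7.0, 9.0]

w, b = fit_univariate(factor_values, access_frequencies)
predicted = predict_frequency(w, b, 5.0)

# Hypothetical cut-off above which an indicator would be pre-aggregated.
HIGH_FREQUENCY_THRESHOLD = 10.0
should_pre_aggregate = predicted >= HIGH_FREQUENCY_THRESHOLD
```

The pre-aggregation decision is kept separate from the fit so that the threshold can be tuned without refitting the model.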

104. Determine the indicator type of the indicator based on the access frequency, where the indicator type includes multi-dimensional aggregated indicators and fixed-dimensional indicators;

In this embodiment, the indicator type is determined according to the access frequency of the indicator and whether other dimension tables need to be associated when the indicator is calculated. For example, some indicators can be calculated only by associating multiple dimension tables, while others can be calculated without associating any other dimension table. There are accordingly two types of indicators: multi-dimensional aggregated indicators, which need to be associated with other dimension tables during calculation, and fixed-dimensional indicators, whose value can be calculated solely from the data in the wide table to which they belong, without associating data from other dimension tables.

105. Based on the indicator type, determine the storage calculation engine and the dimensional modeling method corresponding to the indicator according to the preset correspondence table between indicator types and storage calculation engines and the correspondence table between indicator types and dimensional modeling methods of indicators;
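A minimal sketch of this classification is given below. The text does not specify how the access frequency and the association requirement are combined, so the sketch makes an assumption: the association requirement selects the indicator type, and the predicted frequency only yields a separate pre-aggregation flag. The names and the threshold parameter are hypothetical.

```python
from enum import Enum


class IndicatorType(Enum):
    MULTI_DIMENSIONAL_AGGREGATED = "multi-dimensional aggregated"
    FIXED_DIMENSIONAL = "fixed-dimensional"


def classify_indicator(needs_other_dimension_tables, predicted_frequency,
                       high_frequency_threshold):
    """Return (indicator type, pre-aggregation flag) for one indicator.

    Assumption: the type depends only on whether other dimension tables
    must be associated; the frequency decides pre-aggregation.
    """
    if needs_other_dimension_tables:
        indicator_type = IndicatorType.MULTI_DIMENSIONAL_AGGREGATED
    else:
        indicator_type = IndicatorType.FIXED_DIMENSIONAL
    pre_aggregate = predicted_frequency >= high_frequency_threshold
    return indicator_type, pre_aggregate
```

For instance, `classify_indicator(False, 12.0, 10.0)` describes an indicator that is computable from its own wide table and is accessed often enough to be pre-aggregated.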

In this embodiment, according to the type of the indicator, the storage calculation engine corresponding to the indicator is looked up in the preset correspondence table between indicator types and storage calculation engines, and the information of the preset dimension table associated with the indicator is determined. Different types of indicators are stored in different storage calculation engines. For example, some indicators are stored in random reports or semi-aggregated reports. Calculating them requires associating indicators from other dimension tables, so when such an indicator is queried, its value can be calculated only after the table containing it has been associated with the other dimension tables. Fixed-dimensional indicators do not need to be associated with other dimension tables during calculation, so the aggregate reports built from them can be stored in the aggregation engine and calculated in advance. When a user queries such an indicator, the corresponding value is returned quickly without waiting for calculation, which improves the efficiency of data processing.

In this embodiment, according to the type of the indicator, it is determined whether multiple dimension tables need to be associated when querying (calculating) the indicator value, and if so, the corresponding dimension tables are queried. For example, to calculate the value of the fixed indicator "2018 Double 11 event auto insurance premiums", the data from three tables of different dimensions, namely the table "2018 insurance premiums", the table "2018 auto insurance premiums" and the table "2018 Double 11 event premiums", is stored in a single table, that is, a wide table, so no other data reports need to be associated during calculation. In contrast, to calculate the indicator "2018 premiums", the table "2018 auto insurance premiums", the table "2018 property insurance premiums", the table "2018 life insurance premiums", ... and the table "2018 XX insurance premiums" are needed, and all of these insurance premium tables must be associated together to obtain the value of the indicator "2018 premiums".

106. Determine the preset dimension table associated with the indicator according to the dimensional modeling method, where the preset dimension table includes a dimension table constructed based on the dimensional modeling method corresponding to the indicator type or a dimension table constructed based on all dimensional modeling methods;

In this embodiment, different types of indicators correspond to different modeling models to generate different types of reports, and the generated reports are also stored in different data storage calculation engines according to different report types.

107. Use the routing decision engine to call the storage calculation engine to execute the preset dimension table, and calculate the index value corresponding to the index.

In this embodiment, the routing decision engine sends the request to the storage calculation engine that corresponds to the report to which the indicator belongs. That is, depending on the indicator being queried, the routing decision engine selects the storage calculation engine corresponding to the current calculation request and distributes the request to it to calculate the value of the indicator. For example, if the indicator to be viewed is a basic (fixed) indicator, the query (calculation) request is forwarded to a basic database such as Hive (a non-aggregating database that supports multi-table association calculation); if a pre-calculated indicator is to be viewed, the request is forwarded to a database such as Druid (druid.io, an aggregating data engine). In this embodiment, the calculation requirement of the indicator can be understood simply as whether association and calculation of dimension tables is required.

In the technical solution provided by this application, the data to be predicted is obtained and analyzed to construct indicators with multiple dimensional attributes, and the access frequency of the indicators is predicted by the linear regression algorithm to determine their calculation requirements. According to those calculation requirements, an appropriate method is selected to store the indicators in the corresponding storage calculation engine, and the indicator values are calculated. This resolves the conflict between the time consumed by, and the timeliness required of, the calculation of fixed-dimensional indicator I for big data, and at the same time solves the technical problem of using only a single data engine and dimensional modeling method.
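The dispatch performed by the routing decision engine can be sketched as a lookup in a correspondence table followed by a call to the selected engine. Everything named here is hypothetical: the keys `"basic"` and `"pre_calculated"`, the table `ENGINE_TABLE`, and the two engine functions, which are placeholders that only return a tagged string and do not call any real Hive or Druid client API.

```python
from typing import Callable, Dict


def query_hive(indicator: str) -> str:
    # Placeholder: a real engine would run a multi-table association query.
    return f"hive:{indicator}"


def query_druid(indicator: str) -> str:
    # Placeholder: a real engine would read a pre-aggregated value.
    return f"druid:{indicator}"


# Hypothetical correspondence table between indicator kind and engine.
ENGINE_TABLE: Dict[str, Callable[[str], str]] = {
    "basic": query_hive,
    "pre_calculated": query_druid,
}


def route(indicator: str, kind: str) -> str:
    """Select the engine registered for `kind` and forward the request."""
    try:
        engine = ENGINE_TABLE[kind]
    except KeyError:
        raise ValueError(f"no storage calculation engine registered for {kind!r}") from None
    return engine(indicator)
```

Keeping the mapping in a table rather than in branching code means a new engine can be registered without changing `route`.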

Please refer to Fig. 2. The second embodiment of the method for constructing a big data indicator in the embodiment of the present application includes:

201. Obtain the data to be predicted;

202. Analyze the data to be predicted and define multiple indicators;

In this embodiment, the obtained data containing many indicator labels is analyzed, and multiple definable labels are obtained from it, for example "premium", "premium of life insurance", "premium of property insurance under Double 12 event", "premium of auto insurance under Double 11 event" and so on. In this embodiment, an indicator refers to a unit or method used to measure the degree of development of a thing; in IT it is also commonly called a measure. Examples include population, GDP, income, number of users, profit rate, retention rate, and coverage rate. Many companies have their own KPI indicator system, which uses several key indicators to measure the performance of the company's business operations. Indicators are obtained through summary calculations such as summation and averaging, and those calculations must be performed under certain preconditions, such as time, location, and cost, which are what is often called the statistical caliber and scope.

203. Use the preset model to classify the indicators and add dimension attributes;

In this embodiment, a preset model is used to classify the extracted indicators, and dimensional attribute information is added to each indicator. Taking "premium" as an example, gradually adding dimensional attribute information to the indicator "premium" yields "enterprise plan premiums" and then "enterprise plan premiums of secondary institutions", which further adds basic dimension attribute information to the indicator. Public attribute dimension information can be added at the same time, for example "whether it is the enterprise plan premium of a secondary institution participating in insurance activities".

In this embodiment, the dimension attribute of an indicator refers to a certain characteristic of a thing or phenomenon; gender, region, and time, for example, are all dimensions. Time is a commonly used and special dimension: by comparing values before and after a point in time, one can tell whether things are developing well or badly. For example, "the premium of auto insurance under the Double 11 event in 2019 is 10% higher than the premium of auto insurance under the Double 11 event in 2018", or "the premium of life insurance under the Double 12 event in 2019 is 20% higher than the premium of life insurance under the Double 11 event in 2019". This is comparison over time, also known as vertical comparison. The other kind is horizontal comparison. For example, comparing the "premium of auto insurance under the Double 11 event in 2018" with the "premium of life insurance under the Double 11 event in 2018" is a comparison between units of the same level, referred to as horizontal comparison. In this embodiment, dimensions can be divided into qualitative dimensions and quantitative dimensions according to the data type. If the data type is character (text) data, the dimension is qualitative, such as region and gender; if the data type is numerical data, the dimension is quantitative, such as income, age, and consumption.
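The qualitative/quantitative split by data type can be sketched as a simple type check. The function name and the rule for edge cases (an empty list, or booleans, are treated as qualitative) are assumptions made for illustration.

```python
from numbers import Number


def dimension_kind(values):
    """Classify a dimension by the data type of its values.

    Numerical values give a quantitative dimension; text (or anything
    else) gives a qualitative one. Booleans are not counted as numbers.
    """
    is_numeric = bool(values) and all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if is_numeric else "qualitative"
```

For example, `dimension_kind(["male", "female"])` yields a qualitative dimension, while a list of ages yields a quantitative one.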

204. Combine the indicators with the dimensional attributes to obtain multiple indicators of different dimensional attributes;

In this embodiment, the indicator and the dimensional attributes are combined according to the indicator and dimensional attribute information to obtain multiple indicators carrying different dimensional attribute information. Examples are "Premium for auto insurance under Double 11 in 2019", "Premium for auto insurance under Double 12 in 2019", "Premium for property insurance under Double 11 in 2019", and "Premium for property insurance under Double 12 in 2019".
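The combination in step 204 can be pictured as a Cartesian product of dimension values attached to a base indicator. The following is a minimal sketch under that assumption; the function name `combine_indicator_with_dimensions` and the dictionary layout are hypothetical and do not come from the application.

```python
from itertools import product


def combine_indicator_with_dimensions(indicator, dimensions):
    """Return one derived indicator per combination of dimension values.

    `dimensions` maps a dimension name to its candidate values. Both the
    function name and this layout are illustrative assumptions.
    """
    names = list(dimensions)
    derived = []
    for values in product(*(dimensions[name] for name in names)):
        # Attach one value of every dimension to the base indicator.
        derived.append({"indicator": indicator, **dict(zip(names, values))})
    return derived


dimensions = {
    "year": ["2019"],
    "event": ["Double 11", "Double 12"],
    "insurance_type": ["auto insurance", "property insurance"],
}
derived_indicators = combine_indicator_with_dimensions("premium", dimensions)
for item in derived_indicators:
    print(item)
```

With one year, two events, and two insurance types, the sketch yields the four derived indicators listed in the example above.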

205. Based on the linear regression algorithm, determine the main indicator factors that affect the frequency of indicator access;

In this embodiment, according to the linear regression algorithm, the indicators of different dimensional attributes in the data to be predicted are determined, and at the same time, the indicator factors that affect the access frequency of the indicators are determined.

206. Establish the mapping relationship equation between the index and the main index factor, and use the elastic coefficient method to predict the parameter value of the main index factor;

In this embodiment, a mapping relationship equation between the index obtained in the data to be predicted and the index factor corresponding to the index is established. The elastic coefficient method is used to predict the parameter value of each index factor under a certain activity of the data to be predicted. For example, predict the number of people who will purchase auto insurance during the Double 11 event in 2019. The elastic coefficient ET is calculated using the data of the most recent year and the farthest year (from the collected historical data), and then the access frequency of the corresponding indicator under a certain activity can be calculated. The access frequency in this embodiment can also be said to be a probability value.

207. Substitute the parameter value of the index factor into the mapping relationship equation to calculate the access frequency of the index;

In this embodiment, the mapping relationship equation between the index obtained from the data to be predicted and the index factor corresponding to the index is established, and the parameter value of the index factor is substituted into the mapping relationship equation to calculate (predict) the access frequency (probability value) of the index.
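Steps 206 and 207 can be illustrated with a small numerical sketch. The application does not give the formula for the elastic coefficient, so the version below assumes the common forecasting definition: the ratio of the average annual growth rate of the index factor to that of a reference driver, both computed from the earliest and latest years of the historical data. The linear form of the mapping equation, the reference driver, and all numbers and coefficients are hypothetical.

```python
def average_growth_rate(first, last, years):
    """Average annual growth rate between the earliest and latest values."""
    return (last / first) ** (1.0 / years) - 1.0


def elasticity_coefficient(factor_first, factor_last,
                           driver_first, driver_last, years):
    """Assumed definition: factor growth rate divided by driver growth rate."""
    return (average_growth_rate(factor_first, factor_last, years)
            / average_growth_rate(driver_first, driver_last, years))


def predict_factor(factor_last, elasticity, driver_growth_next):
    """Extrapolate the factor one period ahead from the driver's expected growth."""
    return factor_last * (1.0 + elasticity * driver_growth_next)


def access_frequency(intercept, coefficients, factor_values):
    """Substitute factor values into an assumed linear mapping equation."""
    return intercept + sum(c * v for c, v in zip(coefficients, factor_values))


# Hypothetical history: auto insurance buyers (factor) and total customers
# (driver, in thousands) observed two years apart.
elasticity = elasticity_coefficient(1000.0, 1210.0, 100.0, 121.0, years=2)
predicted_buyers = predict_factor(1210.0, elasticity, driver_growth_next=0.10)

# Hypothetical regression coefficients for the mapping equation.
frequency = access_frequency(0.05, [0.0005], [predicted_buyers])
print(elasticity, predicted_buyers, frequency)
```

The predicted factor value is the only input that changes between activities, so the same mapping equation can be reused for each event once its coefficients have been fitted.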

208. If the access frequency of the indicator is greater than the preset threshold and other dimension tables need to be associated when calculating the access frequency of the indicator, the indicator is an indicator type that requires multi-dimensional aggregation;

In this embodiment, if the access probability of the indicator is greater than the preset threshold and other dimension tables must be associated when querying (calculating) the indicator, the indicator is determined to be of the type that requires multi-dimensional aggregation. For example, to calculate the indicator "2018 premiums", the tables "2018 auto insurance premiums", "2018 property insurance premiums", "2018 life insurance premiums", ..., "XX insurance premiums in 2018", that is, the premium tables of all insurance types, must be associated together. The indicator "2018 premiums" is therefore an indicator that requires multi-dimensional aggregation.

209. If the access frequency of the indicator is greater than the preset threshold and there is no need to associate other dimension tables when calculating the access frequency of the indicator, the indicator type is a fixed-dimensional indicator type;

In this embodiment, if the access probability of the indicator is greater than the preset threshold and no other dimension table needs to be associated when querying (calculating) the indicator, so that only the data in the table to which the indicator belongs is used, the indicator is determined to be of the fixed-dimensional type, that is, a fixed indicator. For example, the dimensions of "2018 Double 11 event auto insurance premium" are a fixed set of three: "2018 + Double 11 event + auto insurance". To calculate this indicator, wide-table modeling is applied to three tables with different dimensions, "Premiums in 2018", "Premiums for auto insurance in 2018", and "Premiums under the Double 11 event in 2018", and they are stored in the same wide table. The calculation then queries only the data in this wide table and does not associate data in other tables, so "2018 Double 11 event auto insurance premium" is a fixed-dimensional indicator. In this embodiment, a wide table is a table in which all fields are built, so that no other table needs to be associated when computing statistics (calculating index values).
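The decision rule of steps 208 and 209 can be sketched as a small function. The names `IndicatorType` and `classify_indicator` are hypothetical. The text does not say how an indicator at or below the threshold is treated, so returning `None` in that case is an assumption of this sketch.

```python
from enum import Enum
from typing import Optional


class IndicatorType(Enum):
    MULTI_DIMENSIONAL_AGGREGATION = "multi-dimensional aggregation"
    FIXED_DIMENSION = "fixed dimension"


def classify_indicator(access_frequency: float,
                       needs_other_dimension_tables: bool,
                       threshold: float) -> Optional[IndicatorType]:
    """Apply the rule of steps 208 and 209 to one indicator."""
    if access_frequency <= threshold:
        # Assumption: the source leaves this case unspecified.
        return None
    if needs_other_dimension_tables:
        return IndicatorType.MULTI_DIMENSIONAL_AGGREGATION
    return IndicatorType.FIXED_DIMENSION


# "2018 premiums" must join the premium tables of every insurance type.
print(classify_indicator(0.8, True, threshold=0.5))
# "2018 Double 11 event auto insurance premium" reads a single wide table.
print(classify_indicator(0.8, False, threshold=0.5))
```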

210. Based on the indicator type, according to the preset correspondence table between the indicator type and the storage calculation engine, and the correspondence table between the indicator type and the dimensional modeling method of the indicator, determine the storage calculation engine and the dimensional modeling method corresponding to the indicator;

211. Determine the preset dimension table associated with the indicator according to the dimensional modeling method, where the preset dimension table includes a dimensional table constructed based on the dimensional modeling method corresponding to the indicator type or a dimensional table constructed based on all dimensional modeling methods;

212. Use the routing decision engine to call the storage calculation engine to execute the preset dimension table, and calculate the index value corresponding to the index.
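Steps 210 to 212 amount to two table lookups followed by a dispatch. The sketch below is only an illustration: the table contents, the class `RoutingDecisionEngine`, and the stand-in engine `sum_premiums` are all hypothetical, and a real storage calculation engine would be an external system rather than a Python function.

```python
# Correspondence tables (assumed contents, keyed by indicator type).
ENGINE_BY_TYPE = {
    "multi_dimensional_aggregation": "semi_aggregation_engine",
    "fixed_dimension": "aggregation_engine",
}
MODELING_BY_TYPE = {
    "multi_dimensional_aggregation": "dimensional_modeling",
    "fixed_dimension": "wide_table_modeling",
}


class RoutingDecisionEngine:
    """Looks up the engine for an indicator type and delegates the calculation."""

    def __init__(self, engines):
        # engines: mapping from engine name to a callable that computes a value
        self._engines = engines

    def compute(self, indicator_type, table_rows):
        engine_name = ENGINE_BY_TYPE[indicator_type]
        modeling = MODELING_BY_TYPE[indicator_type]
        value = self._engines[engine_name](table_rows)
        return {"engine": engine_name, "modeling": modeling, "value": value}


def sum_premiums(rows):
    """Stand-in for a real engine: totals the premium column."""
    return sum(row["premium"] for row in rows)


router = RoutingDecisionEngine({
    "semi_aggregation_engine": sum_premiums,
    "aggregation_engine": sum_premiums,
})
result = router.compute("fixed_dimension",
                        [{"premium": 100.0}, {"premium": 250.0}])
print(result)
```

Keeping the correspondence tables as plain data means a new indicator type or engine can be added without changing the routing logic.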

Referring to Fig. 3, the third embodiment of the method for constructing a big data indicator in the embodiment of the present application includes:

301. Obtain data to be predicted;

302. Analyze the data to be predicted to construct multiple indicators that carry attribute information of different dimensions;

303. Obtain historical data including indicators, where the historical data includes indicators in a specific period, the number of visits of the indicator in a specific period, and indicator factors that affect the number of visits of the indicator in a specific period;

In this embodiment, historical data containing the indicator to be predicted is obtained. For example, to understand the basic pattern of the indicator "Car insurance premiums under the Double 11 event in 2019", the data of the indicator "Car insurance premiums under the Double 11 event in 2018" is obtained and analyzed in order to predict the 2019 indicator. The historical data in this example therefore includes the indicator in a specific period, the number of visits (access frequency) of the indicator in that period, and the indicator factors that may affect the number of visits in that period. Because the indicator factors are related to the number of visits of the indicator in a specific period, a mapping relationship between the indicator factors and the access frequency is established, and the access frequency is calculated (or "predicted") from the historical data.

304. Use historical data as sample data, perform partial correlation analysis on the sample data, extract indicators, and respectively establish mapping relationship equations between indicators and corresponding indicator factors;

In this embodiment, historical data is used as sample data, for example, the data information of “auto insurance premium under the Double 11 event in 2018” is used as sample data.

305. Perform a T test on the mapping relationship equations to determine the main index factors that affect the frequency of index visits;

In this embodiment, the t test is a significance test used in the multiple linear regression algorithm; under the ordinary least squares method, the F test can be equivalent to the t test. Partial correlation analysis is used to further analyze the mapping relationship equation between each indicator and its indicator factors and to determine the main independent variables, that is, the main indicator factors. Many indicator factors affect the number of times an indicator is visited in a specific period, and the main indicator factors are the main influences. All main indicator factors are then retained in the mapping relationship equation between the indicator and the indicator factors. An indicator factor is a main indicator factor if its partial correlation coefficient lies within the preset value interval and its regression coefficient in the mapping relationship equation is greater than the F test parameter or the t test parameter.
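As an illustration of steps 304 and 305, the sketch below computes a first-order partial correlation between the number of visits and one indicator factor while controlling for a second factor, and then the t statistic of that partial correlation. It uses the standard textbook formulas rather than anything stated in the application; the sample data, the variable names, and the use of a 5% two-sided critical value as the "t test parameter" are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(y, x, z):
    """Correlation of y and x with the linear effect of z removed."""
    r_yx, r_yz, r_xz = pearson(y, x), pearson(y, z), pearson(x, z)
    return (r_yx - r_yz * r_xz) / math.sqrt((1 - r_yz ** 2) * (1 - r_xz ** 2))


def t_statistic(partial_r, n, controls=1):
    """t statistic of a partial correlation with n - 2 - controls degrees of freedom."""
    dof = n - 2 - controls
    return partial_r * math.sqrt(dof / (1 - partial_r ** 2))


# Hypothetical sample: visits of an indicator and two candidate factors.
visits = [3, 5, 4, 8, 9, 9, 13, 12]
factor_a = [1, 2, 3, 4, 5, 6, 7, 8]
factor_b = [2, 1, 4, 3, 6, 5, 8, 7]

partial_r = partial_correlation(visits, factor_a, factor_b)
t_value = t_statistic(partial_r, n=len(visits), controls=1)

T_CRITICAL = 2.571  # two-sided 5% critical value for 5 degrees of freedom
is_main_factor = abs(t_value) > T_CRITICAL
print(partial_r, t_value, is_main_factor)
```

A factor that fails the test would be dropped from the mapping relationship equation, leaving only the main indicator factors.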

306. Calculate the access frequency of the indicator according to the linear regression algorithm, and determine whether the indicator is associated with a preset dimension table;

307. Determine the indicator type of the indicator based on the access frequency, where the indicator type includes multi-dimensional aggregated indicators and fixed-dimensional indicators;

308. Based on the type of the indicator, query the model construction method corresponding to the indicator from the preset correspondence table between the indicator type and the model construction method;

In this embodiment, according to the type of the indicator, the model construction method corresponding to the indicator type is queried from the preset correspondence table between the indicator type and the model construction method. Indicators that must be associated with other dimension tables for calculation use dimensional modeling. Fixed-dimensional requirements use wide-table modeling, in which all fields are built into a single table so that no other table needs to be associated when computing statistics.

If the indicator is of the type that requires multi-dimensional aggregation, that is, an indicator that can only be calculated by associating multiple dimension tables, dimensional modeling is used to build random reports and/or semi-aggregated reports, and the random reports and/or semi-aggregated reports are stored in the non-aggregated engine and/or semi-aggregated engine.

If the indicator is of the fixed-dimensional type, that is, an indicator that can be calculated without associating multiple dimension tables, wide-table modeling is used to build an aggregate report, and the aggregate report is stored in the aggregation engine.

In this embodiment, wide-table modeling means that the indicators and dimensions are stored in one large table in which all fields are built, so that no other table needs to be associated when the data is queried. In dimensional modeling, by contrast, the data is divided into a fact table and dimension tables: the fact table records specific events, and the dimensions describe those events. Separating facts from dimension tables improves flexibility.
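The difference between the two modeling methods can be shown with in-memory tables. This is a sketch under simplifying assumptions: the table and column names are hypothetical, and Python lists and dictionaries stand in for the storage engines.

```python
# Dimensional modeling: a fact table plus a dimension table, joined at query time.
fact_premiums = [
    {"policy_id": 1, "insurance_type_id": 10, "premium": 1200.0},
    {"policy_id": 2, "insurance_type_id": 20, "premium": 500.0},
    {"policy_id": 3, "insurance_type_id": 10, "premium": 800.0},
]
dim_insurance_type = {10: "auto insurance", 20: "life insurance"}


def premium_by_type_dimensional(facts, dim):
    """Aggregate premiums per insurance type by joining facts to the dimension."""
    totals = {}
    for row in facts:
        type_name = dim[row["insurance_type_id"]]  # the join step
        totals[type_name] = totals.get(type_name, 0.0) + row["premium"]
    return totals


# Wide-table modeling: every dimension field is already stored in each row.
wide_premiums = [
    {"year": 2018, "event": "Double 11", "insurance_type": "auto insurance", "premium": 1200.0},
    {"year": 2018, "event": "Double 11", "insurance_type": "life insurance", "premium": 500.0},
    {"year": 2018, "event": "Double 11", "insurance_type": "auto insurance", "premium": 800.0},
]


def premium_fixed_dimension(rows, year, event, insurance_type):
    """Compute a fixed-dimensional indicator from the wide table alone."""
    return sum(
        row["premium"]
        for row in rows
        if (row["year"], row["event"], row["insurance_type"]) == (year, event, insurance_type)
    )


by_type = premium_by_type_dimensional(fact_premiums, dim_insurance_type)
auto_2018_double11 = premium_fixed_dimension(
    wide_premiums, 2018, "Double 11", "auto insurance")
print(by_type, auto_2018_double11)
```

The wide-table query never touches a second table, which is why fixed-dimensional indicators are suited to calculation in advance, while the dimensional query stays flexible at the cost of a join.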

309. Based on the indicator type, according to the preset correspondence table between the indicator type and the storage calculation engine, and the correspondence table between the indicator type and the dimensional modeling method of the indicator, determine the storage calculation engine and the dimensional modeling method corresponding to the indicator;

310. Determine the preset dimension table associated with the indicator according to the dimensional modeling method, where the preset dimension table includes a dimensional table constructed based on the dimensional modeling method corresponding to the indicator type or a dimensional table constructed based on all dimensional modeling methods;

If the indicator requires multi-dimensional aggregation, the indicator is downgraded and stored in a random report or semi-aggregated report. In this embodiment, an indicator that requires multi-dimensional aggregation does not need to be calculated in advance; downgrading it means storing the indicator and its data on a common calculation engine to save computing resources. Common calculation engines include non-aggregation engines and semi-aggregation engines.

If the indicator is a fixed-dimensional indicator, wide-table modeling is used to store all the fields in the dimension in the aggregate report. In this embodiment, a fixed-dimensional indicator is one that can be calculated without aggregating other dimension tables. All indicators of this type can be stored in an aggregate report and calculated in advance, which saves indicator query (calculation) time and improves the efficiency of data processing. The storage calculation engine corresponding to the fixed-dimensional indicator type is then queried, and the aggregate report is stored in the aggregation engine so that the indicator can be calculated in advance.

311. Use the routing decision engine to call the storage calculation engine to execute the preset dimension table, and calculate the index value corresponding to the index.

The method for constructing a big data indicator in the embodiments of the present application is described above; the device for constructing a big data indicator is described below. Referring to FIG. 4, an embodiment of the device for constructing a big data indicator in the embodiments of the present application includes:

An acquisition module 401, used to obtain the data to be predicted;

A first construction module 402, used to analyze the data to be predicted to construct a plurality of indicators carrying attribute information of different dimensions;

A judgment module 403, used to calculate the access frequency of the indicator according to the linear regression algorithm and determine whether the indicator is associated with a preset dimension table;

A first determining module 404, used to determine the indicator type of the indicator based on the access frequency, where the indicator type includes multi-dimensional aggregated indicators and fixed-dimensional indicators;

A second determining module 405, used to determine, based on the indicator type and according to the preset correspondence table between the indicator type and the storage calculation engine and the correspondence table between the indicator type and the dimensional modeling method of the indicator, the storage calculation engine and the dimensional modeling method corresponding to the indicator;

A third determining module 406, used to determine the preset dimension table associated with the indicator according to the dimensional modeling method, where the preset dimension table includes a dimension table constructed based on the dimensional modeling method corresponding to the indicator type or a dimension table constructed based on all dimensional modeling methods;

A calculation module 407, used to call the storage calculation engine through the routing decision engine to execute the preset dimension table and calculate the index value corresponding to the indicator.

Referring to FIG. 5, the second embodiment of the device for constructing big data indicators in the embodiment of the present application includes:

A first acquisition module 501, used to obtain the data to be predicted;

A first construction module 502, used to analyze the data to be predicted to construct a plurality of indicators carrying attribute information of different dimensions;

A judgment module 503, used to calculate the access frequency of the indicator according to the linear regression algorithm and determine whether the indicator is associated with a preset dimension table;

A first determining module 504, used to determine the indicator type of the indicator based on the access frequency, where the indicator type includes multi-dimensional aggregated indicators and fixed-dimensional indicators;

A second determining module 505, used to determine, based on the indicator type and according to the preset correspondence table between the indicator type and the storage calculation engine and the correspondence table between the indicator type and the dimensional modeling method of the indicator, the storage calculation engine and the dimensional modeling method corresponding to the indicator;

A third determining module 506, used to determine the preset dimension table associated with the indicator according to the dimensional modeling method, where the preset dimension table includes a dimension table constructed based on the dimensional modeling method corresponding to the indicator type or a dimension table constructed based on all dimensional modeling methods;

A calculation module 507, used to call the storage calculation engine through the routing decision engine to execute the preset dimension table and calculate the index value corresponding to the indicator;

A second acquisition module 508, used to obtain historical data including the indicator, where the historical data includes the indicator in a specific period, the number of visits of the indicator in the specific period, and the index factors that affect the number of visits of the indicator in the specific period;

An analysis module 509, used to take the historical data as sample data, perform partial correlation analysis on the sample data, extract indicators, and respectively establish the mapping relationship equations between the indicators and the corresponding index factors;

A test module 510, used to perform a T test on each mapping relationship equation to determine the main index factors that affect the access frequency of the indicator;

A second query module 511, used to query, based on the type of the indicator, the model construction method corresponding to the indicator from the preset correspondence table between the indicator type and the model construction method;

A second construction module 512, used, when the indicator is of the type that requires multi-dimensional aggregation, to construct random reports and/or semi-aggregated reports by dimensional modeling and store the random reports and/or semi-aggregated reports in a non-aggregated engine and/or semi-aggregated engine;

A first storage module 513, used, when the indicator is of the fixed-dimensional type, to build an aggregate report by wide-table modeling and store the aggregate report in the aggregation engine;

An indicator downgrade module 514, used, when the indicator requires multi-dimensional aggregation, to downgrade the indicator and store it in a random report or semi-aggregated report;

A fourth determining module 515, used to query the storage calculation engine corresponding to the indicator that requires multi-dimensional aggregation and the preset dimension table to be associated, and to determine the dimension table that must be associated when calculating the indicator that requires multi-dimensional aggregation;

A second storage module 516, used, when the indicator is a fixed-dimensional indicator, to store all fields in the dimension in the aggregate report by wide-table modeling;

A third storage module 517, used to query the storage calculation engine corresponding to the fixed-dimensional indicator type and store the aggregate report in the aggregation engine.

The first construction module 502 is specifically used to: analyze the data to be predicted and define multiple indicators; use a preset model to classify the indicators and add dimensional attributes; and, based on the indicators and dimensional attributes, combine the indicators with the dimensional attributes to obtain multiple indicators of different dimensional attributes.

The judgment module 503 is specifically used to: determine the main index factors that affect the access frequency of the index based on the linear regression algorithm; establish the mapping relationship equation between the index and the main index factor, and use the elastic coefficient method to predict the parameter value of the main index factor; and substitute the parameter value of the index factor into the mapping relationship equation to calculate the access frequency of the index.

The first determining module 504 is specifically used to: if the access frequency of the indicator is greater than a preset threshold and other dimension tables need to be associated when calculating the access frequency of the indicator, determine that the indicator is of the type that requires multi-dimensional aggregation; and if the access frequency of the indicator is greater than the preset threshold and no other dimension table needs to be associated when calculating the access frequency of the indicator, determine that the indicator is of the fixed-dimensional type.

The above figures 4 and 5 describe the big data indicator construction device in the embodiment of the present application in detail from the perspective of modular functional entities, and the following describes the big data indicator construction device in the embodiment of the present application in detail from the perspective of hardware processing.

FIG. 6 is a schematic structural diagram of a big data indicator construction device provided by an embodiment of the present application. The big data indicator construction device 600 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 610 (for example, one or more processors), a memory 620, and one or more storage media 630 (for example, one or more mass storage devices) storing application programs 633 or data 632. The memory 620 and the storage medium 630 may be short-term storage or persistent storage. The program stored in the storage medium 630 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the big data indicator construction device 600. Furthermore, the processor 610 may be configured to communicate with the storage medium 630 and execute the series of instruction operations in the storage medium 630 on the big data indicator construction device 600, so as to implement the steps of the big data indicator construction method in the foregoing embodiments.

The big data indicator construction device 600 may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input and output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc. Those skilled in the art can understand that the structure shown in FIG. 6 does not limit the big data indicator construction device, which may include more or fewer components than shown in the figure, combine certain components, or use a different arrangement of components.

The present application also provides a device for constructing a big data indicator. The device includes a memory and at least one processor, where instructions are stored in the memory and the memory and the at least one processor are interconnected by wires; the at least one processor calls the instructions in the memory so that the big data indicator construction device executes the steps of the big data indicator construction method.

The present application also provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium stores computer instructions. When the computer instructions are run on a computer, the computer executes the following steps: obtain the data to be predicted; analyze the data to be predicted to construct multiple indicators that carry attribute information of different dimensions; according to the linear regression algorithm, calculate the access frequency of the indicator, and determine whether the indicator is associated with a preset dimension table; based on the access frequency, determine the indicator type of the indicator, where the indicator type includes multi-dimensional aggregated indicators and fixed-dimensional indicators; based on the indicator type, according to the preset correspondence table between the indicator type and the storage calculation engine, and the correspondence table between the indicator type and the dimensional modeling method of the indicator, determine the storage calculation engine and the dimensional modeling method corresponding to the indicator; determine the preset dimension table associated with the indicator according to the dimensional modeling method, where the preset dimension table includes a dimension table constructed based on the dimensional modeling method corresponding to the indicator type or a dimension table constructed based on all dimensional modeling methods; and use the routing decision engine to call the storage calculation engine to execute the preset dimension table and calculate the index value corresponding to the indicator.

The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced; these modifications or replacements do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

A method for constructing big data indicators. The method for constructing big data indicators includes:

Obtain the data to be predicted;

Analyze the data to be predicted to construct multiple indicators that carry attribute information of different dimensions;

According to a linear regression algorithm, calculate the access frequency of the indicator, and determine whether the indicator is associated with a preset dimension table;

Determine an indicator type of the indicator based on the access frequency, where the indicator type includes a multi-dimensional aggregated indicator and a fixed-dimensional indicator;

Based on the indicator type, according to the preset correspondence table between the indicator type and the storage calculation engine, and the correspondence table between the indicator type and the dimensional modeling method of the indicator, determine the storage calculation engine and the dimensional modeling method corresponding to the indicator;

Determine the preset dimension table associated with the indicator according to the dimensional modeling method, wherein the preset dimension table includes a dimension table constructed based on the dimensional modeling method corresponding to the indicator type or a dimension table constructed based on all dimensional modeling methods;

The routing decision engine is used to call the storage calculation engine to execute the preset dimension table, and the index value corresponding to the index is calculated.
The method for constructing a big data indicator according to claim 1, wherein the analyzing the data to be predicted to construct a plurality of indicators carrying attribute information of different dimensions comprises:

Analyze the data to be predicted and define multiple indicators;

Use a preset model to classify the indicators and add dimensional attributes;

Based on the indicator and the dimensional attribute, the indicator and the dimensional attribute are combined to obtain multiple indicators of different dimensional attributes.
The method for constructing a big data indicator according to claim 1, wherein, before calculating the access frequency of the indicator according to a linear regression algorithm and determining whether the indicator is associated with a preset dimension table, the method for constructing a big data indicator further comprises:

Acquiring historical data including the indicator, where the historical data includes the indicator in a specific period, the number of visits of the indicator in a specific period, and an indicator factor that affects the number of visits of the indicator in a specific period;

Use the historical data as sample data, perform partial correlation analysis on the sample data, extract indicators, and respectively establish the mapping relationship equations between the indicators and corresponding indicator factors;

Perform a T test on the mapping relationship equations to determine the main indicator factors that affect the frequency of the indicator access.
The method for constructing a big data indicator according to claim 1, wherein the calculating the access frequency of the indicator according to a linear regression algorithm, and determining whether the indicator is associated with a preset dimension table comprises:

Based on a linear regression algorithm, determine the main indicator factors that affect the access frequency of the indicator;

Establish a mapping relationship equation between the index and the main index factor, and use the elastic coefficient method to predict the parameter value of the main index factor;

Substituting the parameter value of the index factor into the mapping relationship equation to calculate the access frequency of the index.
The method for constructing a big data indicator according to claim 1, wherein the determining the indicator type of the indicator based on the access frequency comprises:

If the access frequency of the indicator is greater than a preset threshold and calculation of the access frequency of the indicator needs to be associated with other dimension tables, the indicator is an indicator type that requires multi-dimensional aggregation;

If the access frequency of the indicator is greater than the preset threshold and no other dimension table is required to calculate the access frequency of the indicator, the indicator type is a fixed-dimensional indicator type.
The method for constructing a big data indicator according to claim 1, wherein, after the indicator type of the indicator is determined based on the access frequency, the method for constructing a big data indicator further comprises:

Based on the indicator type, query the model construction method corresponding to the indicator from the preset correspondence table between the indicator type and the model construction method;

If the indicator is of the type that requires multi-dimensional aggregation, use dimensional modeling to construct a random report and/or a semi-aggregated report, and store the random report and/or the semi-aggregated report in a non-aggregation engine and/or a semi-aggregation engine;

If the indicator is of the fixed-dimension type, use wide-table modeling to construct an aggregate report, and store the aggregate report in the aggregation engine.
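A preset correspondence table of this kind can be sketched as a plain lookup structure. The keys, field names, and engine labels below are hypothetical placeholders that mirror the wording of the claim; the application does not name concrete engines here.

```python
# Hypothetical correspondence table: indicator type -> modeling method,
# report kinds to build, and engine kinds in which to store them.
CORRESPONDENCE_TABLE = {
    "multi_dimensional_aggregation": {
        "modeling": "dimensional",
        "reports": ("random", "semi_aggregated"),
        "engines": ("non_aggregation", "semi_aggregation"),
    },
    "fixed_dimension": {
        "modeling": "wide_table",
        "reports": ("aggregate",),
        "engines": ("aggregation",),
    },
}

def plan_for(indicator_type: str) -> dict:
    """Look up how an indicator of the given type is modeled and stored."""
    try:
        return CORRESPONDENCE_TABLE[indicator_type]
    except KeyError:
        raise ValueError(f"no modeling method registered for {indicator_type!r}")
```

Keeping the mapping as data rather than branching logic means a new indicator type only needs a new table entry.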
The method for constructing a big data indicator according to claim 1, wherein, after the storage calculation engine and the dimensional modeling method corresponding to the indicator are determined based on the indicator type, according to the preset correspondence table between the indicator type and the storage calculation engine and the correspondence table between the indicator type and the dimensional modeling method of the indicator, the method for constructing a big data indicator further comprises:

If the indicator is an indicator that needs to be aggregated in multiple dimensions, downgrade the indicator and store it in a random report or a semi-aggregated report;

Query the storage calculation engine corresponding to the indicator that requires multi-dimensional aggregation and the preset dimension table that needs to be associated, and determine the dimension table that needs to be associated when calculating the indicator that requires multi-dimensional aggregation;

If the indicator is a fixed-dimensional indicator, use wide table modeling to store all the fields in the dimension in an aggregate report;

Query the storage calculation engine corresponding to the fixed-dimension indicator, and store the aggregate report in the aggregation engine.
A big data indicator construction device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:

Obtain the data to be predicted;

Analyze the data to be predicted to construct multiple indicators that carry attribute information of different dimensions;

According to a linear regression algorithm, calculate the access frequency of the indicator, and determine whether the indicator is associated with a preset dimension table;

Determine an indicator type of the indicator based on the access frequency, where the indicator type includes a multi-dimensional aggregated indicator and a fixed-dimensional indicator;

Based on the indicator type, determine the storage calculation engine and the dimensional modeling method corresponding to the indicator according to the preset correspondence table between the indicator type and the storage calculation engine and the correspondence table between the indicator type and the dimensional modeling method of the indicator;

Determine the preset dimension table associated with the indicator according to the dimensional modeling method, wherein the preset dimension table includes a dimension table constructed based on the dimensional modeling method corresponding to the indicator type or a dimension table constructed based on all dimensional modeling methods;

Use the routing decision engine to call the storage calculation engine to execute the preset dimension table, and calculate the indicator value corresponding to the indicator.
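The final routing step can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: the class name, the two toy engines, and the representation of a preset dimension table as a list of rows. The application does not specify these interfaces.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Engine = Callable[[List[Row], str], float]

def aggregation_engine(rows: List[Row], measure: str) -> float:
    # Wide-table case: a single pre-aggregated row already holds the value.
    return rows[0][measure]

def non_aggregation_engine(rows: List[Row], measure: str) -> float:
    # Detail-level case: aggregate the measure at query time.
    return sum(row[measure] for row in rows)

class RoutingDecisionEngine:
    """Pick the storage calculation engine registered for an indicator type."""

    def __init__(self, engine_by_type: Dict[str, Engine]) -> None:
        self.engine_by_type = engine_by_type

    def compute(self, indicator_type: str, dimension_table: List[Row],
                measure: str) -> float:
        engine = self.engine_by_type[indicator_type]
        return engine(dimension_table, measure)

router = RoutingDecisionEngine({
    "fixed_dimension": aggregation_engine,
    "multi_dimensional_aggregation": non_aggregation_engine,
})
fixed_value = router.compute("fixed_dimension", [{"sales": 60.0}], "sales")
multi_value = router.compute(
    "multi_dimensional_aggregation",
    [{"sales": 10.0}, {"sales": 20.0}, {"sales": 30.0}],
    "sales",
)
```

Both calls produce the same indicator value by different paths: one reads a stored aggregate, the other aggregates detail rows on demand.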
The device for constructing a big data indicator according to claim 8, wherein, when the processor executes the computer-readable instructions to implement the parsing of the data to be predicted to construct a plurality of indicators carrying attribute information of different dimensions, the following steps are included:

Analyze the data to be predicted and define multiple indicators;

Use a preset model to classify the indicators and add dimensional attributes;

Combine the indicators with the dimensional attributes to obtain multiple indicators with different dimensional attributes.
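The combination step can be read as a Cartesian product of indicators and dimensional attributes. The sketch below assumes that reading; the indicator and dimension names are made up for illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple

def combine(indicators: Sequence[str],
            dimension_attributes: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair every indicator with every dimensional attribute."""
    return list(product(indicators, dimension_attributes))

# Hypothetical indicators and dimensional attributes.
combined = combine(["order_count", "order_amount"], ["region", "channel"])
```

Each resulting pair, such as `("order_count", "region")`, stands for one indicator carrying one dimensional attribute.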
The device for constructing a big data indicator according to claim 8, wherein, when the processor executes the computer-readable instructions to implement the calculating of the access frequency of the indicator according to a linear regression algorithm and the determining of whether the indicator is associated with a preset dimension table, the following steps are further included:

Acquiring historical data including the indicator, where the historical data includes the indicator in a specific period, the number of visits of the indicator in a specific period, and an indicator factor that affects the number of visits of the indicator in a specific period;

Use the historical data as sample data, perform partial correlation analysis on the sample data, extract indicators, and respectively establish the mapping relationship equations between the indicators and corresponding indicator factors;

Perform a t-test on the mapping relationship equations to determine the main indicator factors that affect the access frequency of the indicator.
The device for constructing a big data indicator according to claim 8, wherein, when the processor executes the computer-readable instructions to implement the calculating of the access frequency of the indicator according to the linear regression algorithm and the determining of whether the indicator is associated with a preset dimension table, the following steps are included:

Based on a linear regression algorithm, determine the main indicator factors that affect the access frequency of the indicator;

Establish a mapping relationship equation between the indicator and the main indicator factor, and use the elasticity coefficient method to predict the parameter value of the main indicator factor;

Substitute the parameter value of the main indicator factor into the mapping relationship equation to calculate the access frequency of the indicator.
The device for constructing a big data indicator according to claim 8, wherein, when the processor executes the computer-readable instructions to implement the determining of the indicator type of the indicator based on the access frequency, the following steps are included:

If the access frequency of the indicator is greater than a preset threshold and calculating the access frequency of the indicator requires association with other dimension tables, the indicator is of the type that requires multi-dimensional aggregation;

If the access frequency of the indicator is greater than the preset threshold and calculating the access frequency of the indicator requires no other dimension table, the indicator is of the fixed-dimension type.
The device for constructing a big data indicator according to claim 8, wherein, after the processor executes the computer-readable instructions to implement the determining of the indicator type of the indicator based on the access frequency, the following steps are further included:

Based on the indicator type, query the model construction method corresponding to the indicator from the preset correspondence table between the indicator type and the model construction method;

If the indicator is of the type that requires multi-dimensional aggregation, use dimensional modeling to construct a random report and/or a semi-aggregated report, and store the random report and/or the semi-aggregated report in a non-aggregation engine and/or a semi-aggregation engine;

If the indicator is of the fixed-dimension type, use wide-table modeling to construct an aggregate report, and store the aggregate report in the aggregation engine.
The device for constructing a big data indicator according to claim 8, wherein, after the processor executes the computer-readable instructions to implement the determining, based on the indicator type, of the storage calculation engine and the dimensional modeling method corresponding to the indicator according to the preset correspondence table between the indicator type and the storage calculation engine and the correspondence table between the indicator type and the dimensional modeling method of the indicator, the following steps are further included:

If the indicator is an indicator that needs to be aggregated in multiple dimensions, downgrade the indicator and store it in a random report or a semi-aggregated report;

Query the storage calculation engine corresponding to the indicator that requires multi-dimensional aggregation and the preset dimension table that needs to be associated, and determine the dimension table that needs to be associated when calculating the indicator that requires multi-dimensional aggregation;

If the indicator is a fixed-dimensional indicator, use wide table modeling to store all the fields in the dimension in an aggregate report;

Query the storage calculation engine corresponding to the fixed-dimension indicator, and store the aggregate report in the aggregation engine.
A computer-readable storage medium storing computer instructions which, when executed on a computer, cause the computer to perform the following steps:

Obtain the data to be predicted;

Analyze the data to be predicted to construct multiple indicators that carry attribute information of different dimensions;

According to a linear regression algorithm, calculate the access frequency of the indicator, and determine whether the indicator is associated with a preset dimension table;

Determine an indicator type of the indicator based on the access frequency, where the indicator type includes a multi-dimensional aggregated indicator and a fixed-dimensional indicator;

Based on the indicator type, determine the storage calculation engine and the dimensional modeling method corresponding to the indicator according to the preset correspondence table between the indicator type and the storage calculation engine and the correspondence table between the indicator type and the dimensional modeling method of the indicator;

Determine the preset dimension table associated with the indicator according to the dimensional modeling method, wherein the preset dimension table includes a dimension table constructed based on the dimensional modeling method corresponding to the indicator type or a dimension table constructed based on all dimensional modeling methods;

Use the routing decision engine to call the storage calculation engine to execute the preset dimension table, and calculate the indicator value corresponding to the indicator.
The computer-readable storage medium according to claim 15, wherein, when the computer instructions are executed on the computer, the computer is caused to further perform the following steps:

Analyze the data to be predicted and define multiple indicators;

Use a preset model to classify the indicators and add dimensional attributes;

Combine the indicators with the dimensional attributes to obtain multiple indicators with different dimensional attributes.
The computer-readable storage medium according to claim 15, wherein, when the computer instructions are executed on the computer, the computer is caused to further perform the following steps:

Acquiring historical data including the indicator, where the historical data includes the indicator in a specific period, the number of visits of the indicator in a specific period, and an indicator factor that affects the number of visits of the indicator in a specific period;

Use the historical data as sample data, perform partial correlation analysis on the sample data, extract indicators, and respectively establish the mapping relationship equations between the indicators and corresponding indicator factors;

Perform a t-test on the mapping relationship equations to determine the main indicator factors that affect the access frequency of the indicator.
The computer-readable storage medium according to claim 15, wherein, when the computer instructions are executed on the computer, the computer is caused to further perform the following steps:

Based on a linear regression algorithm, determine the main indicator factors that affect the access frequency of the indicator;

Establish a mapping relationship equation between the indicator and the main indicator factor, and use the elasticity coefficient method to predict the parameter value of the main indicator factor;

Substitute the parameter value of the main indicator factor into the mapping relationship equation to calculate the access frequency of the indicator.
The computer-readable storage medium according to claim 15, wherein, when the computer instructions are executed on the computer, the computer is caused to further perform the following steps:

If the access frequency of the indicator is greater than a preset threshold and calculating the access frequency of the indicator requires association with other dimension tables, the indicator is of the type that requires multi-dimensional aggregation;

If the access frequency of the indicator is greater than the preset threshold and calculating the access frequency of the indicator requires no other dimension table, the indicator is of the fixed-dimension type.
A big data indicator construction device, comprising:

The first obtaining module is used to obtain the data to be predicted;

The first construction module is used to analyze the data to be predicted to construct multiple indicators carrying attribute information of different dimensions;

The judgment module is configured to calculate the access frequency of the indicator according to the linear regression algorithm, and judge whether the indicator is associated with a preset dimension table;

The first determining module is configured to determine an indicator type of the indicator based on the access frequency, where the indicator type includes a multi-dimensional aggregated indicator and a fixed-dimensional indicator;

The second determining module is configured to determine, based on the indicator type, the storage calculation engine and the dimensional modeling method corresponding to the indicator according to the preset correspondence table between the indicator type and the storage calculation engine and the correspondence table between the indicator type and the dimensional modeling method of the indicator;

The third determining module is configured to determine the preset dimension table associated with the indicator according to the dimensional modeling method, wherein the preset dimension table includes a dimension table constructed based on the dimensional modeling method corresponding to the indicator type or a dimension table constructed based on all dimensional modeling methods;

The calculation module is configured to use the routing decision engine to call the storage calculation engine to execute the preset dimension table, and to calculate the indicator value corresponding to the indicator.