CN106991101A - Method and apparatus for data table analysis and processing - Google Patents

Method and apparatus for data table analysis and processing

Info

Publication number
CN106991101A
Authority
CN
China
Prior art keywords
data table
cost
conventional data
parameter
scanning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610042109.0A
Other languages
Chinese (zh)
Other versions
CN106991101B (en)
Inventor
王伟
潘旻
罗金鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Tmall Technology Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201610042109.0A (CN106991101B)
Priority to PCT/CN2017/070977 (WO2017124959A1)
Priority to EP17740990.1A (EP3407212A4)
Priority to TW106101915A (TW201732641A)
Publication of CN106991101A
Priority to US16/041,336 (US10909481B2)
Application granted
Publication of CN106991101B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278 Data partitioning, e.g. horizontal or vertical partitioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation

Abstract

The embodiments of the present application provide a method and apparatus for data table analysis and processing. The data tables include conventional data tables in a data common layer and external data tables outside the data common layer. The method includes: calculating processing cost data for the conventional data tables of the data common layer; determining the conventional data tables on which the external data tables outside the data common layer depend; and calculating use cost data of the external data tables according to the processing cost data of the conventional data tables. As a result, when the cost of each conventional data table in the data common layer is evaluated, the storage and computation consumption of the current data table is no longer considered in isolation; the upstream data tables and sibling data tables of that table are also taken into account. The processing cost of a conventional data table can therefore be assessed reasonably and accurately, which reflects the quality of the data model built for the data common layer and provides decision support for optimizing and operating that model.

Description

Method and apparatus for data table analysis and processing
Technical field
The present application relates to the technical field of big data processing, and in particular to a method for data table analysis and processing and an apparatus for data table analysis and processing.
Background
With the arrival of the big data era, the demand for storing, computing and processing massive data has become prominent, and data association and data services have become particularly important. Such massive data is generally stored in cloud computing clusters, such as Hadoop and ODPS, in structured or semi-structured form. The relationships among the data are organized and embodied by the individual data tables stored in the cluster, and the data is accessed, circulated and exchanged between different companies and between different business departments of the same company, so that it can deliver the value it should have in the big data era.
Among the thousands of data tables in a cloud computing environment, some commonly used data can be processed and consolidated in a unified way to form data tables that are highly general, highly reusable and highly standardized. These tables constitute the data common layer. In general, the data tables of the data common layer hold data that every business department needs to use frequently.
It is well known that storing, computing, managing and maintaining data in the big data era consumes considerable hardware, software and labor costs. How to measure the cost incurred by data processing, and how to assess the cost incurred by data use, have therefore become important core problems in the access, circulation and exchange of data.
In the prior art, the processing cost of a data table is measured only by the computing hardware resources (such as CPU and memory consumption) and storage resources (storage media consumption) consumed during data processing; that is, only the storage and computation consumption generated by the current data table during processing is analyzed, in isolation. The use cost of a data table is likewise obtained simply by dividing the processing cost of the table equally among its users, which is clearly not fair or reasonable enough. As a result, neither the processing cost nor the use cost of data is measured accurately enough in the prior art, which seriously affects judgments about the effectiveness of data in the cloud computing environment and leads to excessive data cost and unnecessary resource consumption.
Summary of the invention
In view of the above problems, the embodiments of the present application are proposed to provide a method for data table analysis and processing, and a corresponding apparatus, that overcome the above problems or at least partially solve them.
To solve the above problems, the present application discloses a method for data table analysis and processing. The data tables include conventional data tables in a data common layer and external data tables outside the data common layer, and the method includes:
calculating processing cost data for the conventional data tables of the data common layer;
determining the conventional data tables on which the external data tables outside the data common layer depend;
calculating use cost data of the external data tables according to the processing cost data of the conventional data tables.
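The second step, determining which conventional data tables each external data table depends on, can be sketched as a lookup over a dependency map. This is only an illustration: the function name `common_layer_dependencies`, the table names, and the representation of dependencies as a dictionary of direct parents are assumptions that do not come from the application.

```python
from typing import Dict, List, Set


def common_layer_dependencies(
    parents: Dict[str, List[str]],
    common_layer: Set[str],
) -> Dict[str, List[str]]:
    """Map each external table to the conventional tables it reads directly.

    parents[t] lists the tables that table t reads from (hypothetical input).
    common_layer is the set of tables that belong to the data common layer.
    """
    result: Dict[str, List[str]] = {}
    for table, direct_parents in parents.items():
        if table in common_layer:
            continue  # only external tables are reported
        result[table] = sorted(p for p in direct_parents if p in common_layer)
    return result


# Hypothetical lineage: two conventional tables and two external tables.
parents = {
    "dim_user": ["ods_user"],
    "fact_order": ["ods_order", "dim_user"],
    "report_sales": ["fact_order", "dim_user"],
    "adhoc_export": ["report_sales"],
}
common_layer = {"dim_user", "fact_order"}
deps = common_layer_dependencies(parents, common_layer)
```

Here `report_sales` depends on both conventional tables, while `adhoc_export` reads only another external table and therefore has no direct dependency on the data common layer.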
Optionally, the step of calculating processing cost data for the conventional data tables of the data common layer includes:
extracting processing cost characteristic parameters of the conventional data tables of the data common layer;
calculating the processing cost data of the conventional data tables using the processing cost characteristic parameters.
Optionally, the processing cost characteristic parameters include a first scan cost parameter, and the sub-step of extracting the processing cost characteristic parameters of the conventional data tables of the data common layer further includes:
counting the number of parent tables on which the conventional data table depends;
obtaining the scan amount of the conventional data table on each parent table;
counting the number of all child tables under each parent table;
the sub-step of calculating the processing cost data of the conventional data tables using the processing cost characteristic parameters further includes:
calculating the first scan cost parameter using the number of parent tables on which the conventional data table depends, the scan amount of the conventional data table on each parent table, and the number of all child tables under each parent table.
Optionally, the processing cost characteristic parameters further include a first computation cost parameter and a first storage cost parameter, and the sub-step of extracting the processing cost characteristic parameters of the conventional data tables of the data common layer further includes:
extracting the complexity CU of the conventional data table as the first computation cost parameter;
extracting the storage amount of the conventional data table as the first storage cost parameter.
Optionally, the first scan cost parameter is calculated by the following formula, using the number of parent tables on which the conventional data table depends, the scan amount of the conventional data table on each parent table, and the number of all child tables under each parent table:
where Cost(j) is the processing cost data of data table j;
data tables j are the m parent tables on which data table i depends, numbered 1 … m;
ScanSize(i, j) is the scan amount of conventional data table i on parent table j;
data tables m are all the child tables of parent table j, numbered 1 … n.
Optionally, the processing cost data of the conventional data table is calculated from the processing cost characteristic parameters by the following formula:
where ComputeCost(i) is the first computation cost parameter of conventional data table i;
StorageCost(i) is the first storage cost parameter of conventional data table i;
ScanCost(i, j) is the first scan cost parameter of conventional data table i on parent table j.
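The formulas for the first scan cost parameter and for the processing cost data are not reproduced in this text, so the following sketch rests on an assumed reading of the symbol definitions: a table's scan cost is taken to be the sum, over its parent tables j, of Cost(j) multiplied by the table's share of all scanning done on j by j's child tables, and the processing cost is taken to be the sum of the computation, storage and scan cost parameters. Both forms, as well as the function and variable names, are assumptions rather than the application's stated formulas.

```python
from typing import Dict, List, Tuple


def processing_costs(
    compute: Dict[str, float],
    storage: Dict[str, float],
    scan_size: Dict[Tuple[str, str], float],
) -> Dict[str, float]:
    """Assumed form: Cost(i) = ComputeCost(i) + StorageCost(i) + ScanCost(i).

    scan_size[(child, parent)] is the amount of data child scans from parent.
    Every table that appears in scan_size must also appear in compute and
    storage, and the dependency graph is assumed to be acyclic.
    """
    # Total amount scanned from each parent by all of its child tables.
    total_scanned: Dict[str, float] = {}
    parents_of: Dict[str, List[str]] = {}
    for (child, parent), size in scan_size.items():
        total_scanned[parent] = total_scanned.get(parent, 0.0) + size
        parents_of.setdefault(child, []).append(parent)

    cache: Dict[str, float] = {}

    def cost(table: str) -> float:
        if table not in cache:
            scan = 0.0
            for parent in parents_of.get(table, []):
                if total_scanned[parent] == 0:
                    continue  # nothing was scanned, so nothing is shared
                share = scan_size[(table, parent)] / total_scanned[parent]
                scan += cost(parent) * share  # assumed ScanCost(i, j) term
            cache[table] = compute[table] + storage[table] + scan
        return cache[table]

    return {table: cost(table) for table in compute}


# Hypothetical example: b and c both read from a, in a 3:1 ratio.
costs = processing_costs(
    compute={"a": 10.0, "b": 5.0, "c": 1.0},
    storage={"a": 2.0, "b": 1.0, "c": 1.0},
    scan_size={("b", "a"): 30.0, ("c", "a"): 10.0},
)
```

Under these assumptions the cost of parent table `a` (12.0) is split between `b` and `c` in proportion to how much each scans, so a table is charged for its upstream and sibling tables as well as for its own storage and computation.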
Optionally, the step of calculating the use cost data of the external data table according to the processing cost data of the conventional data table is:
calculating the use cost data of the external data table according to the processing cost characteristic parameters of the conventional data table.
Optionally, the step of calculating the use cost data of the external data table according to the processing cost characteristic parameters of the conventional data table includes:
extracting the processing cost characteristic parameters of the conventional data table on which the external data table outside the data common layer depends;
calculating use cost characteristic parameters of the external data table using the processing cost characteristic parameters;
calculating the use cost data of the external data table using the use cost characteristic parameters.
Optionally, the use cost characteristic parameters include a second computation cost parameter;
the sub-step of extracting the processing cost characteristic parameters of the conventional data table on which the external data table outside the data common layer depends is:
extracting the first computation cost parameter of the conventional data table on which the external data table depends;
the step of calculating the use cost characteristic parameters of the external data table using the processing cost characteristic parameters includes:
obtaining a computation cost calculation factor between the external data table and the conventional data table on which it depends;
correcting the first computation cost parameter using the computation cost calculation factor to obtain the second computation cost parameter.
Optionally, the use cost characteristic parameters include a second storage cost parameter;
the sub-step of extracting the processing cost characteristic parameters of the conventional data table on which the external data table outside the data common layer depends is:
extracting the first storage cost parameter of the conventional data table on which the external data table depends;
the step of calculating the use cost characteristic parameters of the external data table using the processing cost characteristic parameters further includes:
obtaining a storage cost calculation factor between the external data table and the conventional data table on which it depends;
correcting the first storage cost parameter using the storage cost calculation factor to obtain the second storage cost parameter.
Optionally, the use cost characteristic parameters include a second scan cost parameter;
the sub-step of extracting the processing cost characteristic parameters of the conventional data table on which the external data table outside the data common layer depends is:
extracting the first scan cost parameter of the conventional data table on which the external data table depends;
the step of calculating the use cost characteristic parameters of the external data table using the processing cost characteristic parameters further includes:
obtaining a scan cost calculation factor between the external data table and the conventional data table on which it depends;
correcting the first scan cost parameter using the scan cost calculation factor to obtain the second scan cost parameter.
Optionally, the sub-step of obtaining the computation cost calculation factor between the external data table and the conventional data table on which it depends further includes:
obtaining, for each day in the most recent m days, the number of data tables that scanned the conventional data table, and the average number of child tables of the conventional data table over the most recent m days;
calculating the computation cost calculation factor by the following formula, according to the number of data tables that scanned the conventional data table on each day of the most recent m days and the average number of child tables of the conventional data table over the most recent m days:
where m indexes each day of the most recent m days;
Scan_m(j) is the number of data tables that scanned conventional data table j on day m;
the denominator is, as an example, the average number of child tables of conventional data table j over the most recent 90 days.
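The formula for the computation cost calculation factor is not reproduced in this text. The symbol definitions only state that its denominator is the average number of child tables over the window, so the sketch below assumes the simplest form consistent with that: the factor is the reciprocal of the average daily number of scanning tables, which spreads the computation cost evenly across the average number of consumers. The numerator of 1, the zero fallback, and the function name are assumptions.

```python
from typing import Sequence


def compute_cost_factor(daily_scanning_tables: Sequence[int]) -> float:
    """Assumed form: 1 / (average number of tables scanning j per day).

    daily_scanning_tables[d] is Scan_m(j) for day d of the window
    (90 days in the example given in the text).
    """
    days = len(daily_scanning_tables)
    if days == 0:
        return 0.0  # assumption: no history means no cost is attributed
    average_children = sum(daily_scanning_tables) / days
    if average_children == 0:
        return 0.0  # assumption: a table nobody scans has no consumers to charge
    return 1.0 / average_children


# Hypothetical table scanned by four tables on each of the last 90 days.
factor = compute_cost_factor([4] * 90)
```

Averaging over a window rather than using a single day's count keeps the factor stable when the number of downstream tables fluctuates from day to day.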
Optionally, the sub-step of obtaining the storage cost calculation factor between the external data table and the conventional data table on which it depends further includes:
obtaining the scan amount of the external data table on the conventional data table on which it depends, and the k tables that have a dependency relationship with the conventional data table;
calculating the storage cost calculation factor by the following formula, according to the scan amount of the external data table on the conventional data table on which it depends and the k tables that have a dependency relationship with the conventional data table:
where scansize(i, j) is the scan amount of external data table i on conventional data table j;
m denotes the k tables that have a dependency relationship with conventional data table j, numbered 1 … k.
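The formula for the storage cost calculation factor is likewise missing from this text. Given that the inputs are scansize(i, j) and the k tables that depend on j, the sketch below assumes the factor is external table i's share of the total amount scanned from j by those k tables. That proportional form and the names used are assumptions.

```python
from typing import Dict


def storage_cost_factor(external: str, scan_sizes: Dict[str, float]) -> float:
    """Assumed form: scansize(i, j) / sum over the k dependents of scansize(m, j).

    scan_sizes maps each table that depends on conventional table j
    to the amount of data it scans from j.
    """
    total = sum(scan_sizes.values())
    if total == 0:
        return 0.0  # assumption: nothing scanned means nothing to apportion
    return scan_sizes.get(external, 0.0) / total


# Hypothetical scan amounts of three tables that depend on the same table j.
sizes = {"report_a": 60.0, "report_b": 30.0, "report_c": 10.0}
factor_a = storage_cost_factor("report_a", sizes)
```

With this reading the factors of all dependents sum to 1, so the storage cost of the conventional table is fully distributed and heavier readers carry a larger share, in contrast to the equal split criticized in the background section.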
Optionally, the sub-step of obtaining the scan cost calculation factor between the external data table and the conventional data table on which it depends further includes:
obtaining the proportion of hot fields in the conventional data table, and the dependency level of the conventional data table in the current data common layer, where a hot field is a field whose number of uses within a certain time period is greater than the number of direct downstream data tables of the conventional data table;
calculating the scan cost calculation factor by the following formula, according to the proportion of hot fields in the conventional data table and the level of the conventional data table in the current data common layer:
where hot_ratio(j) is the ratio of the number of hot fields of conventional data table j to the total number of fields in the table;
level(j) is the dependency level of conventional data table j in the data common layer.
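The hot-field definition above is concrete enough to sketch hot_ratio(j) directly: a field counts as hot when its use count in the period exceeds the number of direct downstream tables. How hot_ratio(j) and level(j) are then combined into the scan cost calculation factor is not shown in this text, so the sketch stops at the ratio. The function name and the sample field names are hypothetical.

```python
from typing import Dict


def hot_field_ratio(
    field_use_counts: Dict[str, int],
    direct_downstream_tables: int,
) -> float:
    """Share of fields whose use count exceeds the direct downstream table count."""
    if not field_use_counts:
        return 0.0  # assumption: a table without fields has no hot fields
    hot_fields = [
        name
        for name, uses in field_use_counts.items()
        if uses > direct_downstream_tables
    ]
    return len(hot_fields) / len(field_use_counts)


# Hypothetical usage counts for a table with five direct downstream tables.
ratio = hot_field_ratio(
    {"user_id": 12, "order_id": 9, "remark": 1, "ext_json": 0},
    direct_downstream_tables=5,
)
```

In this example two of the four fields are used more than five times, so half of the table's fields are hot.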
Optionally, the use cost data of the external data table is calculated from the use cost characteristic parameters by the following formula:
Cost(i, j) = compcost(j) * compfac(i, j) + storcost(j) * storfac(i, j) + scancost(j) * scanfac(i, j)
where i is an external data table, j is a conventional data table, and a dependency relationship exists between data table i and data table j;
Cost(i, j) is the use cost data of external data table i using conventional data table j;
compcost(j) is the first computation cost parameter in the processing cost data of conventional data table j;
compfac(i, j) is the computation cost calculation factor between external data table i and conventional data table j;
storcost(j) is the first storage cost parameter in the processing cost data of conventional data table j;
storfac(i, j) is the storage cost calculation factor between external data table i and conventional data table j;
scancost(j) is the first scan cost parameter in the processing cost data of conventional data table j;
scanfac(i, j) is the scan cost calculation factor between external data table i and conventional data table j.
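The use cost formula is a weighted sum of the three first cost parameters of conventional table j, each scaled by the corresponding factor for the pair (i, j). A minimal sketch follows; the field names mirror the symbols in the formula, while the two container classes and the sample numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingCost:
    """First cost parameters of conventional table j."""
    compcost: float
    storcost: float
    scancost: float


@dataclass(frozen=True)
class Factors:
    """Calculation factors between external table i and conventional table j."""
    compfac: float
    storfac: float
    scanfac: float


def use_cost(cost: ProcessingCost, fac: Factors) -> float:
    """Cost(i, j) as the factor-weighted sum of j's three cost parameters."""
    return (
        cost.compcost * fac.compfac
        + cost.storcost * fac.storfac
        + cost.scancost * fac.scanfac
    )


# Hypothetical values for one (external table, conventional table) pair.
example = use_cost(
    ProcessingCost(compcost=100.0, storcost=40.0, scancost=20.0),
    Factors(compfac=0.25, storfac=0.5, scanfac=0.5),
)
```

Each product in the sum is one of the second cost parameters (computation, storage, scan), so the same function also yields those values if its terms are inspected individually.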
Optionally, the method further includes:
extracting the corresponding conventional data table when the processing cost data meets a first preset condition.
Optionally, the step of extracting the corresponding conventional data table when the processing cost data meets the first preset condition includes:
if the ratio of the first storage cost parameter to the first computation cost parameter of a conventional data table is higher than a first preset threshold, extracting that conventional data table;
and/or,
if the first computation cost parameter of a conventional data table is higher than a second preset threshold, extracting that conventional data table;
and/or,
if the ratio of the first scan cost parameter to the first computation cost parameter of a conventional data table is higher than a third preset threshold, extracting that conventional data table;
and/or,
summing the second computation cost parameters of the external data tables that have a direct dependency relationship with a conventional data table;
if the first computation cost parameter of the conventional data table is greater than the sum of the second computation cost parameters, extracting that conventional data table;
and/or,
summing the second storage cost parameters of the external data tables that have a direct dependency relationship with a conventional data table;
if the first storage cost parameter of the conventional data table is greater than the sum of the second storage cost parameters, extracting that conventional data table;
and/or,
summing the second scan cost parameters of the external data tables that have a direct dependency relationship with a conventional data table;
if the first scan cost parameter of the conventional data table is greater than the sum of the second scan cost parameters, extracting that conventional data table.
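The six conditions above can be sketched as a single check that returns the reasons a conventional data table is extracted. The dictionary keys, the threshold names and the sample values are hypothetical; the application does not specify threshold values or a data layout.

```python
from typing import Dict, List


def flag_conventional_table(
    first: Dict[str, float],
    second_sums: Dict[str, float],
    thresholds: Dict[str, float],
) -> List[str]:
    """Return the reasons a conventional table meets the first preset condition.

    first holds the table's first cost parameters under "comp", "stor", "scan".
    second_sums holds, per key, the sum of the second cost parameters of the
    external tables that directly depend on this table.
    """
    reasons: List[str] = []
    comp, stor, scan = first["comp"], first["stor"], first["scan"]
    if comp > 0 and stor / comp > thresholds["stor_to_comp"]:
        reasons.append("storage/computation ratio above first threshold")
    if comp > thresholds["comp"]:
        reasons.append("computation cost above second threshold")
    if comp > 0 and scan / comp > thresholds["scan_to_comp"]:
        reasons.append("scan/computation ratio above third threshold")
    for key in ("comp", "stor", "scan"):
        if first[key] > second_sums[key]:
            reasons.append(f"first {key} cost exceeds downstream second {key} costs")
    return reasons


# Hypothetical table that stores far more than its consumers account for.
reasons = flag_conventional_table(
    first={"comp": 100.0, "stor": 500.0, "scan": 20.0},
    second_sums={"comp": 120.0, "stor": 300.0, "scan": 25.0},
    thresholds={"stor_to_comp": 3.0, "comp": 1000.0, "scan_to_comp": 1.0},
)
```

Because the conditions are joined by "and/or", the sketch evaluates all of them and reports every match instead of stopping at the first.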
Optionally, the method further includes:
extracting the corresponding external data table when the use cost data meets a second preset condition.
Optionally, the step of extracting the corresponding external data table when the use cost data meets the second preset condition includes:
if the ratio of the second storage cost parameter to the second computation cost parameter of an external data table is higher than a fourth preset threshold, extracting that external data table;
and/or,
if an external data table can obtain from another conventional data table the same data that it obtains from the current conventional data table, and the second scan cost parameter when obtaining the data from the other conventional data table is smaller than the second scan cost parameter when obtaining the data from the current conventional data table, extracting that external data table.
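The second of these conditions compares the second scan cost parameter across conventional tables that can supply the same data. A minimal sketch, assuming the caller already knows which conventional tables are interchangeable sources and has computed the second scan cost for each; the function name and table names are hypothetical.

```python
from typing import Dict, Optional


def cheaper_source(
    current: str,
    second_scan_cost_by_source: Dict[str, float],
) -> Optional[str]:
    """Return the cheapest alternative source if it beats the current one.

    second_scan_cost_by_source maps each conventional table that can supply
    the same data (including the current one) to the second scan cost
    parameter the external table would incur by reading from it.
    """
    current_cost = second_scan_cost_by_source[current]
    cheaper = {
        source: cost
        for source, cost in second_scan_cost_by_source.items()
        if source != current and cost < current_cost
    }
    if not cheaper:
        return None  # the external table is not extracted by this condition
    return min(cheaper, key=cheaper.get)


# Hypothetical case: a daily partitioned table is cheaper to scan than the full table.
better = cheaper_source(
    "fact_order_full",
    {"fact_order_full": 8.0, "fact_order_daily": 2.5},
)
```

A non-`None` result means the external table meets the condition and is extracted, with the returned name indicating where it could read the same data at lower scan cost.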
To solve the above problems, the present application further discloses an apparatus for data table analysis and processing, characterized in that the data tables include conventional data tables in a data common layer and external data tables outside the data common layer, and the apparatus includes:
a processing cost calculation module, configured to calculate processing cost data for the conventional data tables of the data common layer;
a determining module, configured to determine the conventional data tables on which the external data tables outside the data common layer depend;
a use cost calculation module, configured to calculate use cost data of the external data tables according to the processing cost data of the conventional data tables.
Optionally, the processing cost calculation module includes:
a processing cost characteristic parameter extraction submodule, configured to extract the processing cost characteristic parameters of the conventional data tables of the data common layer;
a processing cost calculation submodule, configured to calculate the processing cost data of the conventional data tables using the processing cost characteristic parameters.
Optionally, the processing cost characteristic parameters include a first scan cost parameter, and the processing cost characteristic parameter extraction submodule further includes:
a parent table counting unit, configured to count the number of parent tables on which the conventional data table depends;
a scan amount obtaining unit, configured to obtain the scan amount of the conventional data table on each parent table;
a child table counting unit, configured to count the number of all child tables under each parent table;
the processing cost calculation submodule further includes:
a first scan cost calculation unit, configured to calculate the first scan cost parameter using the number of parent tables on which the conventional data table depends, the scan amount of the conventional data table on each parent table, and the number of all child tables under each parent table.
Optionally, the processing cost characteristic parameters further include a first computation cost parameter and a first storage cost parameter, and the processing cost characteristic parameter extraction submodule further includes:
a first computation cost parameter extraction unit, configured to extract the complexity CU of the conventional data table as the first computation cost parameter;
a first storage cost parameter extraction unit, configured to extract the storage amount of the conventional data table as the first storage cost parameter.
Optionally, the first scan cost parameter is calculated by the following formula, using the number of parent tables on which the conventional data table depends, the scan amount of the conventional data table on each parent table, and the number of all child tables under each parent table:
where Cost(j) is the processing cost data of data table j;
data tables j are the m parent tables on which data table i depends, numbered 1 … m;
ScanSize(i, j) is the scan amount of conventional data table i on parent table j;
data tables m are all the child tables of parent table j, numbered 1 … n.
Optionally, the processing cost data of the conventional data table is calculated from the processing cost characteristic parameters by the following formula:
where ComputeCost(i) is the first computation cost parameter of conventional data table i;
StorageCost(i) is the first storage cost parameter of conventional data table i;
ScanCost(i, j) is the first scan cost parameter of conventional data table i on parent table j.
Optionally, the use cost calculation module includes:
a use cost calculation submodule, configured to calculate the use cost data of the external data table according to the processing cost characteristic parameters of the conventional data table.
Optionally, the use cost calculation submodule includes:
a processing cost characteristic parameter extraction unit, configured to extract the processing cost characteristic parameters of the conventional data table on which the external data table outside the data common layer depends;
a use cost characteristic parameter calculation unit, configured to calculate use cost characteristic parameters of the external data table using the processing cost characteristic parameters;
a use cost data calculation unit, configured to calculate the use cost data of the external data table using the use cost characteristic parameters.
Optionally, the use cost characteristic parameters include a second computation cost parameter;
the processing cost characteristic parameter extraction unit includes:
a first computation cost parameter extraction subunit, configured to extract the first computation cost parameter of the conventional data table on which the external data table depends;
the use cost characteristic parameter calculation unit includes:
a computation cost calculation factor obtaining subunit, configured to obtain the computation cost calculation factor between the external data table and the conventional data table on which it depends;
a second computation cost parameter calculation subunit, configured to correct the first computation cost parameter using the computation cost calculation factor to obtain the second computation cost parameter.
Optionally, the use cost characteristic parameters include a second storage cost parameter;
the processing cost characteristic parameter extraction unit includes:
a first storage cost parameter extraction subunit, configured to extract the first storage cost parameter of the conventional data table on which the external data table depends;
the use cost characteristic parameter calculation unit further includes:
a storage cost calculation factor obtaining subunit, configured to obtain the storage cost calculation factor between the external data table and the conventional data table on which it depends;
a second storage cost parameter calculation subunit, configured to correct the first storage cost parameter using the storage cost calculation factor to obtain the second storage cost parameter.
Optionally, the use cost characteristic parameters include a second scan cost parameter;
the processing cost characteristic parameter extraction unit includes:
a first scan cost parameter extraction subunit, configured to extract the first scan cost parameter of the conventional data table on which the external data table depends;
the use cost characteristic parameter calculation unit further includes:
a scan cost calculation factor obtaining subunit, configured to obtain the scan cost calculation factor between the external data table and the conventional data table on which it depends;
a second scan cost parameter calculation subunit, configured to correct the first scan cost parameter using the scan cost calculation factor to obtain the second scan cost parameter.
Optionally, the computation cost calculation factor obtaining subunit is further configured to:
obtain, for each day in the most recent m days, the number of data tables that scanned the conventional data table, and the average number of child tables of the conventional data table over the most recent m days;
calculate the computation cost calculation factor by the following formula, according to the number of data tables that scanned the conventional data table on each day of the most recent m days and the average number of child tables of the conventional data table over the most recent m days:
where m indexes each day of the most recent m days;
Scan_m(j) is the number of data tables that scanned conventional data table j on day m;
the denominator is, as an example, the average number of child tables of conventional data table j over the most recent 90 days.
Optionally, the storage cost calculation factor obtaining subunit is further configured to:
obtain the scan amount of the external data table on the conventional data table on which it depends, and the k tables that have a dependency relationship with the conventional data table;
calculate the storage cost calculation factor by the following formula, according to the scan amount of the external data table on the conventional data table on which it depends and the k tables that have a dependency relationship with the conventional data table:
where scansize(i, j) is the scan amount of external data table i on conventional data table j;
m denotes the k tables that have a dependency relationship with conventional data table j, numbered 1 … k.
Optionally, the scanning cost calculation factor acquisition subunit is further configured to:
Obtain the proportion of hot fields (temperature fields) in the conventional data table, and the dependency level of the conventional data table within the current data common layer, where a hot field is a field whose number of uses within a given time period exceeds the number of direct downstream data tables of the conventional data table;
Calculate the scanning cost calculation factor from the proportion of hot fields in the conventional data table and the level of the conventional data table in the current data common layer using the following formula:
Where hot_ratio(j) is the ratio of the number of hot fields in conventional data table j to the total number of fields in the table;
Level(j) is the dependency level of conventional data table j in the data common layer.
Optionally, the use cost data of the external data table is calculated from the use cost characteristic parameters by the following formula:
Cost(i, j) = compcost(j) * compfac(i, j) + storcost(j) * storfac(i, j) + scancost(j) * scanfac(i, j)
Where i is an external data table, j is a conventional data table, and a dependency exists between data table i and data table j;
Cost(i, j) is the use cost data of external data table i using conventional data table j;
compcost(j) is the first calculation cost parameter in the processing cost data of conventional data table j;
compfac(i, j) is the calculation cost calculation factor between external data table i and conventional data table j;
storcost(j) is the first storage cost parameter in the processing cost data of conventional data table j;
storfac(i, j) is the storage cost calculation factor between external data table i and conventional data table j;
scancost(j) is the first scanning cost parameter in the processing cost data of conventional data table j;
scanfac(i, j) is the scanning cost calculation factor between external data table i and conventional data table j.
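The weighted sum above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the function name `use_cost` and all numeric values are hypothetical, and the three factors are supplied as plain inputs rather than derived from their own formulas.

```python
from dataclasses import dataclass


@dataclass
class ProcessingCost:
    """First cost parameters of a conventional data table j, in CT."""
    compcost: float  # first calculation cost parameter
    storcost: float  # first storage cost parameter
    scancost: float  # first scanning cost parameter


@dataclass
class Factors:
    """Calculation factors between external table i and conventional table j."""
    compfac: float
    storfac: float
    scanfac: float


def use_cost(cost_j: ProcessingCost, fac_ij: Factors) -> float:
    """Cost(i, j): each first cost parameter scaled by its factor, then summed."""
    return (cost_j.compcost * fac_ij.compfac
            + cost_j.storcost * fac_ij.storfac
            + cost_j.scancost * fac_ij.scanfac)


# Hypothetical inputs: the factor values are placeholders, not taken from the text.
table_j = ProcessingCost(compcost=1.0, storcost=2.0, scancost=2.0)
factors = Factors(compfac=0.5, storfac=0.25, scanfac=0.1)
print(use_cost(table_j, factors))
```

Keeping the factors separate from the cost parameters mirrors the text, where each of the three cost parts may use a different apportionment algorithm.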
Optionally, the device further includes:
A first extraction module, configured to extract the corresponding conventional data table when the processing cost data meets a first preset condition.
Optionally, the first extraction module includes:
A first extraction submodule, configured to extract a conventional data table when the ratio of its first storage cost parameter to its first calculation cost parameter is higher than a first preset threshold;
And/or,
A second extraction submodule, configured to extract a conventional data table when its first calculation cost parameter is higher than a second preset threshold;
And/or,
A third extraction submodule, configured to extract a conventional data table when the ratio of its first scanning cost parameter to its first calculation cost parameter is higher than a third preset threshold;
And/or,
A fourth statistics submodule, configured to compute the sum of the second calculation cost parameters of the external data tables that directly depend on a conventional data table;
A fourth extraction submodule, configured to extract the conventional data table when its first calculation cost parameter is greater than that sum of second calculation cost parameters;
And/or,
A fifth statistics submodule, configured to compute the sum of the second storage cost parameters of the external data tables that directly depend on a conventional data table;
A fifth extraction submodule, configured to extract the conventional data table when its first storage cost parameter is greater than that sum of second storage cost parameters;
And/or,
A sixth statistics submodule, configured to compute the sum of the second scanning cost parameters of the external data tables that directly depend on a conventional data table;
A sixth extraction submodule, configured to extract the conventional data table when its first scanning cost parameter is greater than that sum of second scanning cost parameters.
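The six extraction conditions listed above can be sketched as a single check that reports which conditions a table meets. This is a minimal illustration, not the device itself: the class, field and function names, the thresholds and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConventionalTable:
    name: str
    compute: float  # first calculation cost parameter (CT)
    storage: float  # first storage cost parameter (CT)
    scan: float     # first scanning cost parameter (CT)
    # Second cost parameters of the external tables that directly depend on this table.
    ext_compute: List[float] = field(default_factory=list)
    ext_storage: List[float] = field(default_factory=list)
    ext_scan: List[float] = field(default_factory=list)


def extraction_reasons(t: ConventionalTable, th1: float, th2: float,
                       th3: float) -> List[str]:
    """Return the conditions (1 to 6) under which table t would be extracted."""
    reasons = []
    if t.compute > 0 and t.storage / t.compute > th1:
        reasons.append("1: storage/calculation ratio above first threshold")
    if t.compute > th2:
        reasons.append("2: calculation cost above second threshold")
    if t.compute > 0 and t.scan / t.compute > th3:
        reasons.append("3: scanning/calculation ratio above third threshold")
    if t.compute > sum(t.ext_compute):
        reasons.append("4: calculation cost exceeds downstream sum")
    if t.storage > sum(t.ext_storage):
        reasons.append("5: storage cost exceeds downstream sum")
    if t.scan > sum(t.ext_scan):
        reasons.append("6: scanning cost exceeds downstream sum")
    return reasons


# Hypothetical table and thresholds, chosen only to exercise the checks.
table = ConventionalTable("B", compute=1.0, storage=4.0, scan=0.5,
                          ext_compute=[0.4, 0.3], ext_storage=[3.0, 2.0],
                          ext_scan=[0.3, 0.3])
for reason in extraction_reasons(table, th1=3.0, th2=2.0, th3=1.0):
    print(reason)
```

Because the conditions are joined by "and/or", the sketch evaluates each one independently and returns every match instead of stopping at the first.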
Optionally, the device further includes:
A second extraction module, configured to extract the corresponding external data table when the use cost data meets a second preset condition.
Optionally, the second extraction module includes:
A seventh extraction submodule, configured to extract an external data table when the ratio of its second storage cost parameter to its second calculation cost parameter is higher than a fourth preset threshold;
And/or,
An eighth extraction submodule, configured to extract an external data table when it can obtain from another conventional data table the same data that it obtains from the current conventional data table, and the second scanning cost parameter for obtaining the data from the other conventional data table is lower than the second scanning cost parameter for obtaining the data from the current conventional data table.
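The two conditions for external data tables can be sketched as follows. The function name, its parameters and the mapping `scan_cost_by_source` (each conventional table able to supply the same data, mapped to the second scanning cost of reading from it) are assumptions made for illustration.

```python
from typing import Dict


def should_extract_external(second_storage: float, second_compute: float,
                            th4: float, scan_cost_by_source: Dict[str, float],
                            current_source: str) -> bool:
    """Return True if either external-table condition is met."""
    # Seventh submodule: storage/calculation ratio above the fourth threshold.
    high_ratio = second_compute > 0 and second_storage / second_compute > th4
    # Eighth submodule: another conventional table supplies the same data
    # at a lower second scanning cost than the current one.
    current = scan_cost_by_source[current_source]
    cheaper_elsewhere = any(cost < current
                            for source, cost in scan_cost_by_source.items()
                            if source != current_source)
    return high_ratio or cheaper_elsewhere


# Hypothetical values: reading from table C is cheaper than from the current table B.
print(should_extract_external(1.0, 2.0, th4=3.0,
                              scan_cost_by_source={"B": 0.6, "C": 0.2},
                              current_source="B"))
```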
Compared with the background art, the embodiments of the present application have the following advantages:
First, by taking the dependencies between data tables into account and introducing a scanning cost parameter, the embodiments of the present application improve how the cost of a data table is assessed. The cost of each conventional data table in the data common layer is no longer estimated from the storage and calculation consumption of that table in isolation; the upstream data tables and sibling data tables of the table are considered as well, so that the processing cost of the conventional data table is assessed reasonably and accurately. This reflects the quality of the data model of the data common layer and provides decision support for optimizing and operating that model.
Second, measuring the use cost of external data tables makes clear the storage, calculation and scanning consumption incurred when external data tables access the conventional data tables of the data common layer. This makes it easier to assess whether such access is reasonable and necessary, helps business departments optimize the construction of their own data tables, avoids the resource waste caused by duplicated data construction, raises data resource utilization and lowers data cost, thereby saving cost overall.
Third, calculation factors are introduced so that the cost consumption of an upstream data table is inherited by its downstream data tables in a reasonable proportion. Factors such as storage amount, scanning amount, the degree of reuse of a data table, its processing level and its hot-field ratio are considered together, so that the use cost of an external data table is calculated more reasonably and accurately.
Fourth, the embodiments of the present application analyze the processing cost data of conventional data tables and the use cost data of external data tables and compare them with preset thresholds, so that data tables with excessive cost consumption can be identified specifically and then optimized, further saving cost.
Brief description of the drawings
Fig. 1 is a flowchart of the steps of embodiment one of a data table analysis and processing method of the present application;
Fig. 2 is a schematic diagram of a conventional data table model of a data common layer of the present application;
Fig. 3 is a schematic diagram of a relationship between conventional data tables and an external data table of the present application;
Fig. 4 is a flowchart of the steps of embodiment two of a data table analysis and processing method of the present application;
Fig. 5 is a schematic diagram of another relationship between conventional data tables and an external data table of the present application;
Fig. 6 is a structural block diagram of an embodiment of a data table analysis and processing device of the present application.
Detailed description of the embodiments
To make the above objects, features and advantages of the present application clearer and easier to understand, the application is described in further detail below with reference to the drawings and specific embodiments.
In the prior art, the processing cost of a data table is measured only by the computing hardware resources (such as CPU and memory consumption) and storage resources (storage media consumption) consumed during data processing. However, the data in a data table may come from N upstream data tables; that is, producing one data table may depend on N upstream data tables. The existing cost measurement model analyzes in isolation the storage and calculation consumption incurred while processing the current data table and does not consider the dependencies between data tables, so it ignores the scanning consumption between data tables.
For the use cost of a data table, the prior art simply divides the data processing cost of the table equally among its users, rather than apportioning it according to how each user actually accesses the table. Different users use the same data table differently: some access a large volume of data with complex calculations, while others read only a small amount of data with very simple calculations. Under equal sharing, both kinds of user bear the same scanning cost, which is clearly unfair and unreasonable.
In view of the above problems, the present application proposes two metering models for data table analysis and processing: a metering model for the data processing cost of the data common layer, and a data use cost metering model for external data objects (BU) accessing the data of the data common layer.
To help those skilled in the art better understand the present application, the core ideas of the two metering models involved in the embodiments are briefly described below:
First, the metering model for the data processing cost of the data common layer comprises three parts: calculation cost assessment, storage cost assessment and scanning cost assessment. The calculation and storage cost assessments take the viewpoint of the conventional data table itself and reflect the actual software and hardware consumption of the table during data processing. The scanning cost arises from the dependencies between data tables during data processing: the parent table's cost is apportioned by the ratio of a child table's scanning amount on the parent table to the total scanning amount on the parent table, and the apportioned share is the child table's scanning cost on the parent table.
Second, the data use cost metering model for external data objects (BU) accessing the data of the data common layer: following the metering method for data processing cost consumption, the cost of the data table being used is obtained in three parts, namely calculation cost, storage cost and scanning cost. The use cost of the data table can then be calculated by apportioning each of the three parts in its own proportion and summing the weighted results. The apportionment algorithm may differ for each of the three parts.
Applying the two metering models in actual data analysis and processing can address at least the following technical problems:
1) obtaining the proportions of storage cost, calculation cost and scanning cost for a data table of the data common layer;
2) when the storage cost is higher than a threshold, the storage amount can be reduced;
3) when the calculation cost is higher than a threshold, the calculation logic of the data table can be optimized to reduce the amount of calculation;
4) when the scanning cost is higher than a threshold, the processing chain of the data table can be optimized to reduce useless scanning of parent-table data;
5) guiding data users to read only the necessary data from the common layer, reducing the scanning of useless data;
6) guiding data users to use tables at deeper levels wherever possible (tables at deeper levels have been deeply processed by the common layer and are refined tables).
In general, if the data processing cost of a data table in the data common layer is less than the sum of the data use costs of the table's direct downstream tables, the data table meets the requirements of the data common layer and is worth keeping in it.
Referring to Fig. 1, a flowchart of the steps of embodiment one of a data table analysis and processing method of the present application is shown, wherein the data tables may include conventional data tables of the data common layer and external data tables of the non-data common layer, and the method may specifically include the following steps:
Step 101, calculating processing cost data for the conventional data tables of the data common layer;
In the embodiments of the present application, the processing cost data of a conventional data table may include not only the computing hardware resources (such as CPU and memory consumption) and storage resources (storage media consumption) consumed while processing the data table, but also the dependencies between data tables, that is, the scanning consumption between data tables.
The data in a data table may come from N upstream data tables, so the scanning consumption between data tables reflects the amount of data scanned from the tables it depends on while the data table is being processed. Referring to Fig. 2, a schematic diagram of a conventional data table model of a data common layer is shown. The circles A, B, C, D, E and F represent 6 conventional data tables of the data common layer, and an arrow between two circles represents a data access relationship, that is, a scanning relationship, between the two conventional data tables. For example, the arrow between conventional data table B and conventional data table A indicates that conventional data table B needs to scan conventional data table A. The number on an arrow is the scanning amount in TB, so in Fig. 2 conventional data table B needs to scan 2TB of data from conventional data table A.
In a preferred embodiment of the present application, the step of calculating processing cost data for the conventional data tables of the data common layer may specifically include the following sub-steps:
Sub-step 1011, extracting the processing cost characteristic parameters of the conventional data table of the data common layer;
Sub-step 1012, calculating the processing cost data of the conventional data table using the processing cost characteristic parameters.
In one embodiment of the present application, the processing cost characteristic parameters may include a first calculation cost parameter and a first storage cost parameter, and the sub-step of extracting the processing cost characteristic parameters of the conventional data table of the data common layer may further include:
Extracting the complexity CU of the conventional data table as the first calculation cost parameter;
Extracting the storage amount of the conventional data table as the first storage cost parameter.
In the embodiments of the present application, the first calculation cost parameter may be the CPU resources that the conventional data table needs to consume during data processing, and may be measured in complexity CU, where 1 CU represents the cost of running 1 CPU (core) for one day. The complexity CU can be obtained from the cluster metadata of the Open Data Processing Service (ODPS). ODPS is a large-scale distributed data processing service that supports the processing of massive data.
The first storage cost parameter may be the hard-disk storage resources consumed when storing the conventional data table, and may be measured in storage amount TU, where 1 TU represents the cost of storing 1TB of data for one day. The storage amount TU can also be obtained from the ODPS cluster metadata.
In the embodiments of the present application, to meter the complexity in CU and the storage amount in TU in a unified, combined way, a new resource consumption unit, the resource unit, denoted CT, can be introduced. The conversion between the resource unit and complexity CU is 1 CT = 4 CU; the conversion between the resource unit and storage amount TU is 1 CT = 9 TU.
For example, if processing a conventional data table consumes a complexity of 1 CU and a storage amount of 2 TU, the resources consumed by that table during processing are 1/4 + 2/9 = 0.47 CT.
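The unit conversion in this example can be checked with a short Python sketch. The constant and function names are hypothetical; the conversion ratios are the ones stated above.

```python
CU_PER_CT = 4  # 1 CT = 4 CU
TU_PER_CT = 9  # 1 CT = 9 TU


def to_ct(cu: float, tu: float) -> float:
    """Convert a complexity in CU and a storage amount in TU into resource units (CT)."""
    return cu / CU_PER_CT + tu / TU_PER_CT


# The example from the text: 1 CU of complexity and 2 TU of storage.
print(round(to_ct(cu=1, tu=2), 2))  # 0.47
```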
In another embodiment of the present application, the processing cost characteristic parameters may also include a first scanning cost parameter, and the sub-step of extracting the processing cost characteristic parameters of the conventional data table of the data common layer may further include:
Counting the parent tables on which the conventional data table depends;
Obtaining the scanning amount of the conventional data table on each parent table;
Counting all child tables under each parent table;
The sub-step of calculating the processing cost data of the conventional data table using the processing cost characteristic parameters may further include:
Calculating the first scanning cost parameter from the number of parent tables on which the conventional data table depends, the scanning amount of the conventional data table on each parent table, and the number of all child tables under each parent table.
For example, referring to Fig. 2, the arrow between conventional data table C and conventional data table A indicates that conventional data table C needs to scan conventional data table A, that is, conventional data table A is a parent table of conventional data table C. The number on the arrow indicates that the scanning amount of child table C on parent table A is 1TB. Parent table A has 3 child tables in total, namely conventional data table B, conventional data table C and conventional data table D. The first scanning cost parameter can be calculated from these data.
In a specific implementation, the first scanning cost parameter can be calculated by the following formula:
Where Cost(j) is the processing cost data of data table j;
data table j ranges over the m parent tables on which data table i depends, numbered 1…m;
ScanSize(i, j) is the amount of data that conventional data table i scans from parent table j;
data table m ranges over all child tables of parent table j, numbered 1…n.
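The formula itself is not reproduced in this text. The symbol definitions above, together with the worked figures for Fig. 2 (for example, 0.472 × 2/(2+1+1) for table B scanning table A), are consistent with the following reconstruction, offered as a reading of the text, not as the original formula:

```latex
\mathrm{ScanCost}(i, j) = \mathrm{Cost}(j) \times \frac{\mathrm{ScanSize}(i, j)}{\sum_{m=1}^{n} \mathrm{ScanSize}(m, j)}
```

Under this reading, each child table inherits a share of the parent table's processing cost in proportion to how much of the parent table it scans, and the shares of all n child tables add up to Cost(j).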
In a preferred embodiment of the present application, the processing cost data of the conventional data table can be calculated using the first calculation cost parameter, the first storage cost parameter and the first scanning cost parameter.
In a specific implementation, the processing cost data of the conventional data table can be calculated by the following formula:
Where ComputeCost(i) is the first calculation cost parameter of conventional data table i;
StorageCost(i) is the first storage cost parameter of conventional data table i;
ScanCost(i, j) is the first scanning cost parameter of conventional data table i on parent table j.
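The formula is not reproduced in this text. Based on the three terms defined above and on the worked figures for Fig. 2, a consistent reconstruction (an assumption, with j running over the m parent tables of table i) is:

```latex
\mathrm{Cost}(i) = \mathrm{ComputeCost}(i) + \mathrm{StorageCost}(i) + \sum_{j=1}^{m} \mathrm{ScanCost}(i, j)
```

A table with no parent tables, such as table A in Fig. 2, has an empty sum, so its processing cost is just its calculation cost plus its storage cost.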
Therefore, the processing cost data of each conventional data table in Fig. 2 can be calculated as follows:
Conventional data table A: 2/9 + 1/4 + 0 = 0.472 CT
Conventional data table B: 1/9 + 2/4 + 0.472*(2/(2+1+1)) = 0.847 CT
Conventional data table C: 2/9 + 2/4 + 0.472*(1/(2+1+1)) = 0.840 CT
Conventional data table D: 1/9 + 1/4 + 0.472*(1/(2+1+1)) = 0.479 CT
Conventional data table E: 0.5/9 + 3/4 + 0.847*(2/2) + 0.840*(1/(1+5)) = 1.793 CT
Conventional data table F: 1/9 + 3/4 + 0.840*(5/(1+5)) = 1.561 CT
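The calculation for Fig. 2 can be reproduced with a short recursive sketch. The input values (complexity in CU, storage in TU, scanned TB per parent table) are read off the expressions above, since the figure itself is not available here; the data layout and function names are hypothetical. The code evaluates the expressions without intermediate rounding, so its results may differ slightly from figures worked out by hand.

```python
CU_PER_CT = 4  # 1 CT = 4 CU
TU_PER_CT = 9  # 1 CT = 9 TU

# table -> (complexity in CU, storage in TU, {parent table: scanned TB})
TABLES = {
    "A": (1, 2, {}),
    "B": (2, 1, {"A": 2}),
    "C": (2, 2, {"A": 1}),
    "D": (1, 1, {"A": 1}),
    "E": (3, 0.5, {"B": 2, "C": 1}),
    "F": (3, 1, {"C": 5}),
}


def total_scanned(parent):
    """Total amount scanned from `parent` by all of its child tables."""
    return sum(parents.get(parent, 0) for _, _, parents in TABLES.values())


def processing_cost(table, cache=None):
    """Calculation cost + storage cost + inherited share of each parent's cost, in CT."""
    cache = {} if cache is None else cache
    if table not in cache:
        cu, tu, parents = TABLES[table]
        cost = cu / CU_PER_CT + tu / TU_PER_CT
        for parent, scanned in parents.items():
            # Share of the parent's cost, proportional to the amount scanned.
            cost += processing_cost(parent, cache) * scanned / total_scanned(parent)
        cache[table] = cost
    return cache[table]


for name in TABLES:
    print(name, round(processing_cost(name), 3))
```

The recursion assumes the dependency graph has no cycles, which holds for the layered model described in the text.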
The above example is only intended to aid understanding of the embodiments of the present application and should not be construed as limiting the application. Those skilled in the art can obtain the corresponding processing cost data from the actual dependencies between the conventional data tables in a data common layer, using the method and formulas described in the embodiments of the present application.
Step 102, determining the conventional data tables on which an external data table of the non-data common layer depends;
In the embodiments of the present application, for an external data table of the non-data common layer, the conventional data tables on which it depends can be determined first. Referring to Fig. 3, a schematic diagram of a relationship between conventional data tables and an external data table is shown. In Fig. 3, table A, table B and table C are conventional data tables of the data common layer, and table D is an external data table of the non-data common layer. External data table D can access conventional data table B and conventional data table C. The 4 numbers in the circle of each conventional data table are, respectively, its first calculation cost parameter, first storage cost parameter, first scanning cost parameter and total data storage amount.
For example, referring to Fig. 3, the first calculation cost parameter of conventional data table A is 1CT, its first storage cost parameter is 2CT, its first scanning cost parameter is 2CT, and its data storage amount is 10TB. The number on the arrow between external data table D and conventional data table B indicates that external data table D scans 2TB of data from conventional data table B.
The above is only one example of a relationship between conventional data tables and an external data table and should not be construed as limiting the application. Those skilled in the art can determine the actual dependencies and data scanning between external data tables and conventional data tables according to actual conditions, using the method described in the embodiments of the present application.
Step 103, calculating the use cost data of the external data table according to the processing cost data of the conventional data table;
In the embodiments of the present application, because the external data table depends on the conventional data table, the use cost data of the external data table can be calculated from the processing cost data of the conventional data table. Specifically, the use cost data of the external data table can be calculated from the processing cost characteristic parameters of the conventional data table.
In a preferred embodiment of the present application, the step of calculating the use cost data of the external data table according to the processing cost characteristic parameters of the conventional data table may specifically include:
Extracting the processing cost characteristic parameters of the conventional data table on which the external data table of the non-data common layer depends;
Calculating the use cost characteristic parameters of the external data table using the processing cost characteristic parameters;
Calculating the use cost data of the external data table using the use cost characteristic parameters.
In a specific implementation, after the conventional data table on which the external data table of the non-data common layer depends has been determined, the processing cost characteristic parameters of the conventional data table can be extracted. The use cost characteristic parameters of the external data table are then calculated according to the dependency between the external data table and the conventional data table, and the use cost data of the external data table is obtained from them.
Further, the use cost characteristic parameters may include a second calculation cost parameter, a second storage cost parameter and a second scanning cost parameter.
The second calculation cost parameter may be the CPU resources consumed when the external data table uses a conventional data table of the data common layer, and can likewise be measured in complexity CU. The second storage cost parameter may be the hard-disk storage resources required to store the conventional data table, and can be measured in storage amount TU. The second scanning cost parameter reflects the scanning relationship between the external data table and the conventional data table of the data common layer.
In a preferred embodiment of the present application, the method may further include step 104 and step 105.
Step 104, extracting the corresponding conventional data table when the processing cost data meets a first preset condition;
Step 105, extracting the corresponding external data table when the use cost data meets a second preset condition.
In a specific implementation, after the processing cost data of the conventional data table and the use cost data of the external data table have been obtained, the processing cost data can be compared with the first preset condition and the use cost data with the second preset condition to determine whether the corresponding condition is met. If so, the corresponding conventional data table or external data table can be extracted.
For example, for a conventional data table of the data common layer, after the first calculation cost parameter, the first storage cost parameter and the first scanning cost parameter have been obtained, each can be checked against its preset condition. If the first storage cost parameter is too high, reducing the storage amount of the conventional data table can be considered. If the first calculation cost parameter is high, the calculation logic of the conventional data table can be optimized to reduce computational complexity. If the first scanning cost parameter is high, the processing chain of the conventional data table can be optimized to reduce useless scanning of parent-table data.
For an external data table of the non-data common layer, data users can be urged, on the basis of the use cost data obtained, to read only the necessary data from the data common layer, reducing the scanning of useless data, and to use conventional data tables at deeper levels wherever possible, because conventional data tables at deeper levels have been deeply processed by the data common layer and are refined tables.
In the embodiments of the present application, taking the dependencies between data tables into account and introducing a scanning cost parameter improves how the cost of a data table is assessed. The cost of each conventional data table in the data common layer is no longer estimated from the storage and calculation consumption of that table in isolation; the upstream data tables and sibling data tables of the table are considered as well, so that the processing cost of the conventional data table is assessed reasonably and accurately. This reflects the quality of the data model of the data common layer and provides decision support for optimizing and operating that model.
In addition, measuring the use cost of external data tables makes clear the storage, calculation and scanning consumption incurred when external data tables access the conventional data tables of the data common layer. This makes it easier to assess whether such access is reasonable and necessary, helps business departments optimize the construction of their own data tables, avoids the resource waste caused by duplicated data construction, raises data resource utilization and lowers data cost, thereby saving cost overall.
Referring to Fig. 4, a flowchart of the steps of embodiment two of a data table analysis and processing method of the present application is shown, wherein the data tables may include conventional data tables of the data common layer and external data tables of the non-data common layer, and the method may specifically include the following steps:
Step 201, extracting the processing cost characteristic parameters of the conventional data table of the data common layer;
In the embodiments of the present application, the processing cost characteristic parameters of the conventional data table may include a first calculation cost parameter, a first storage cost parameter and a first scanning cost parameter.
The first calculation cost parameter may be the CPU resources that the conventional data table needs to consume during data processing, measured in complexity CU. The first storage cost parameter may be the hard-disk storage resources consumed when storing the conventional data table, measured in storage amount TU. The first scanning cost parameter reflects how much the conventional data table scans the conventional data tables associated with it, and can be calculated from the number of parent tables on which the conventional data table depends, the scanning amount of the conventional data table on each parent table, and the number of all child tables under each parent table.
In the embodiments of the present application, to meter the complexity in CU and the storage amount in TU in a unified, combined way, a new resource consumption unit, the resource unit, denoted CT, can be introduced. The conversions between the resource unit and complexity CU and storage amount TU may be 1 CT = 4 CU and 1 CT = 9 TU.
Step 202, calculating the processing cost data of the conventional data table using the processing cost characteristic parameters;
In a specific implementation, the processing cost data of the conventional data table can be calculated by the following formula:
Where ComputeCost(i) is the first calculation cost parameter of conventional data table i;
StorageCost(i) is the first storage cost parameter of conventional data table i;
ScanCost(i, j) is the first scanning cost parameter of conventional data table i on parent table j.
Step 203: determining the conventional data tables on which the external data table of the non-data common layer depends;
For example, as shown in Fig. 3, the conventional data tables on which external data table D of the non-data common layer depends include conventional data table B and conventional data table C.
Step 204: extracting the processing cost characteristic parameters of the conventional data tables on which the external data table of the non-data common layer depends;
In one embodiment of the present application, the use cost characteristic parameters may include a second calculation cost parameter; accordingly, the sub-step of extracting the processing cost characteristic parameters of the conventional data tables on which the external data table of the non-data common layer depends may be: extracting the first calculation cost parameter of the conventional data tables on which the external data table depends.
In another embodiment of the present application, the use cost characteristic parameters may also include a second storage cost parameter; accordingly, the sub-step of extracting the processing cost characteristic parameters of the conventional data tables on which the external data table of the non-data common layer depends may also be: extracting the first storage cost parameter of the conventional data tables on which the external data table depends.
In another embodiment of the present application, the use cost characteristic parameters may also include a second scanning cost parameter; accordingly, the sub-step of extracting the processing cost characteristic parameters of the conventional data tables on which the external data table of the non-data common layer depends may be: extracting the first scanning cost parameter of the conventional data tables on which the external data table depends.
For example, as shown in Fig. 3, the conventional data tables on which the external data table depends are conventional data table B and conventional data table C. For the second calculation cost parameter, the first calculation cost parameters of conventional data tables B and C can be extracted; both are 1 CT. For the second storage cost parameter, the first storage cost parameters of conventional data tables B and C can be extracted; that of conventional data table B is 1 CT and that of conventional data table C is 4 CT. For the second scanning cost parameter, the first scanning cost parameters of conventional data tables B and C can be extracted; that of conventional data table B is 3 CT and that of conventional data table C is 2 CT.
The above example is intended only to aid understanding of the embodiments of the present application and should not be regarded as limiting the application; those skilled in the art can obtain corresponding results according to actual conditions by following the method described in the embodiments of the present application.
Step 205: calculating the use cost characteristic parameters of the external data table using the processing cost characteristic parameters;
In one embodiment of the present application, the step of calculating the use cost characteristic parameters of the external data table using the processing cost characteristic parameters may include:
obtaining the calculation cost calculation factor between the external data table and the conventional data table on which it depends;
correcting the first calculation cost parameter using the calculation cost calculation factor to obtain the second calculation cost parameter.
The same conventional data table may be used by several different external data tables, and different users use it differently: some users access a large volume of data and perform complex calculations, while others read only a small amount of data and perform very simple calculations. If the cost were simply split equally, these two kinds of user would bear the same cost, which is clearly unfair and unreasonable. Therefore, in the embodiments of the present application, a calculation cost calculation factor is introduced, and the first calculation cost parameter is corrected with it to obtain the second calculation cost parameter. The calculation factor reflects, for an external table that uses a conventional data table, the proportion that the sublist's use of the parent table represents of the overall use of the parent table.
Specifically, the sub-step of obtaining the calculation cost calculation factor between the external data table and the conventional data table on which it depends may further include:
obtaining, for each day of the most recent m days, the number of data tables that scanned the conventional data table, and the average number of sublists of the conventional data table over the most recent m days;
For example, the following formula may be used to calculate the calculation cost calculation factor and thereby obtain the second calculation cost parameter:
where m indexes each day of the most recent m days;
Scan_m(j) is the number of data tables that scanned conventional data table j on day m;
the denominator is, by way of example, the average number of sublists of conventional data table j over the most recent 90 days.
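The formula is not reproduced in this text. Reading the definitions above, one plausible form is the reciprocal of the average daily number of tables that scanned conventional data table j over the window. The sketch below implements that reading; the reciprocal form, the function names and the multiplication used as the "correction" are assumptions made for illustration, not the application's confirmed expression.

```python
from typing import Sequence


def calculation_cost_factor(daily_scanner_counts: Sequence[int]) -> float:
    """Return 1 / (average number of tables scanning table j per day).

    daily_scanner_counts[d] plays the role of Scan_m(j): the number of data
    tables that scanned conventional data table j on day d of the window.
    The reciprocal form is an assumption, not taken from the source.
    """
    if not daily_scanner_counts:
        raise ValueError("the observation window must contain at least one day")
    average_sublists = sum(daily_scanner_counts) / len(daily_scanner_counts)
    if average_sublists == 0:
        raise ValueError("no table scanned the conventional data table in the window")
    return 1.0 / average_sublists


def second_calculation_cost(first_calculation_cost: float, factor: float) -> float:
    """Correct the first calculation cost parameter by multiplying by the factor."""
    return first_calculation_cost * factor
```

With a 90-day window, `daily_scanner_counts` would hold 90 entries, one per day.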
In another embodiment of the present application, the step of calculating the use cost characteristic parameters of the external data table using the processing cost characteristic parameters may also include:
obtaining the storage cost calculation factor between the external data table and the conventional data table on which it depends;
correcting the first storage cost parameter using the storage cost calculation factor to obtain the second storage cost parameter.
As with the second calculation cost parameter, the second storage cost parameter can be obtained by correcting the first storage cost parameter with the storage cost calculation factor.
Specifically, the sub-step of obtaining the storage cost calculation factor between the external data table and the conventional data table on which it depends may further include:
obtaining the scanning amount of the external data table on the conventional data table on which it depends, and the k tables that have a dependence relation with the conventional data table;
The following formula may be used to calculate the storage cost calculation factor and thereby obtain the second storage cost parameter:
where scansize(i, j) is the scanning amount of external data table i on conventional data table j;
m ranges over the k tables that have a dependence relation with conventional data table j, numbered 1 ... k.
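The formula is not reproduced in this text. The definitions suggest a share of scanning amount: the scanning amount of external data table i on conventional data table j, divided by the total scanning amount of all k dependent tables on j. The sketch below implements that reading as an assumption; the function name and the mapping argument are hypothetical.

```python
from typing import Mapping


def storage_cost_factor(scan_sizes: Mapping[str, float], table: str) -> float:
    """Share of the total scanning amount on table j that comes from `table`.

    scan_sizes maps each of the k dependent tables to scansize(m, j).
    The ratio form is an assumption inferred from the definitions.
    """
    total = sum(scan_sizes.values())
    if total <= 0:
        raise ValueError("total scanning amount must be positive")
    return scan_sizes[table] / total
```

For instance, if two hypothetical tables E and F scan 2 TB and 6 TB of the same conventional data table, `storage_cost_factor({"E": 2.0, "F": 6.0}, "E")` attributes a quarter of the storage cost to E.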
In another embodiment of the present application, the step of calculating the use cost characteristic parameters of the external data table using the processing cost characteristic parameters may also include:
obtaining the scanning cost calculation factor between the external data table and the conventional data table on which it depends;
correcting the first scanning cost parameter using the scanning cost calculation factor to obtain the second scanning cost parameter.
Similarly, the second scanning cost parameter can be obtained by obtaining the scanning cost calculation factor, which determines the proportion that the sublist's scanning amount on the parent table represents of the total amount by which the parent table is scanned, and adjusting the first scanning cost parameter by that proportion.
Specifically, the sub-step of obtaining the scanning cost calculation factor between the external data table and the conventional data table on which it depends may further include:
obtaining the proportion of temperature fields in the conventional data table, and the dependence level of the conventional data table in the current data common layer;
For any conventional data table and any field a in that table, if the number of times field a is used by downstream data tables within a certain time period is greater than the number of direct downstream tables of the conventional data table, then field a is a temperature field (that is, a frequently used field) of the conventional data table. Accordingly, for any conventional data table, the proportion of temperature fields is the ratio of the number of temperature fields in the table to the total number of fields in the table. The statistics period for temperature fields can generally be the past day.
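The temperature-field rule above can be sketched as follows. The function name and the shape of the input (a mapping from field name to the number of uses by downstream tables in the statistics period) are assumptions made for illustration.

```python
from typing import Mapping


def hot_field_ratio(field_use_counts: Mapping[str, int],
                    direct_downstream_tables: int) -> float:
    """Fraction of fields that qualify as temperature fields.

    A field qualifies when its use count in the period is strictly greater
    than the number of direct downstream tables of the conventional table.
    """
    if not field_use_counts:
        raise ValueError("the table must have at least one field")
    hot = [name for name, uses in field_use_counts.items()
           if uses > direct_downstream_tables]
    return len(hot) / len(field_use_counts)
```

The comparison is strict, following the wording "greater than" in the text.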
The dependence level of a conventional data table reflects the dependence relations between that table and the other conventional data tables in the current data common layer. As shown in Fig. 3, the data common layer includes three conventional data tables in total: conventional data tables A, B and C. The dependence level of conventional data table A is then 1, and the dependence level of conventional data tables B and C is 2.
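The text gives the levels for Fig. 3 but not a general rule. A minimal sketch, assuming that a table with no parent in the data common layer has level 1 and that any other table sits one level below its deepest parent, and assuming that B and C each depend directly on A, could look like this. The function name and the acyclic-graph requirement are part of the assumption.

```python
from typing import Dict, List, Mapping


def dependency_levels(parents: Mapping[str, List[str]]) -> Dict[str, int]:
    """Level 1 for tables without parents, else 1 + the deepest parent level.

    `parents` maps each conventional data table to the conventional data
    tables it depends on; the graph is assumed to be acyclic.
    """
    levels: Dict[str, int] = {}

    def level(table: str) -> int:
        if table not in levels:
            upstream = parents.get(table, [])
            levels[table] = 1 + max((level(p) for p in upstream), default=0)
        return levels[table]

    for name in parents:
        level(name)
    return levels


if __name__ == "__main__":
    # Assumed reading of Fig. 3: B and C each depend directly on A.
    print(dependency_levels({"A": [], "B": ["A"], "C": ["A"]}))
```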
In a specific implementation, the following formula may be used to calculate the scanning cost calculation factor and thereby obtain the second scanning cost parameter:
where hot_ratio(j) is the ratio of the number of temperature fields of conventional data table j to the total number of fields in the table;
level(j) is the dependence level of conventional data table j in the data common layer.
Step 206: calculating the use cost data of the external data table using the use cost characteristic parameters;
In the embodiments of the present application, once the second calculation cost parameter, the second storage cost parameter and the second scanning cost parameter of the external data table have each been obtained, the three parameters can be added together to obtain the use cost data of the external data table.
In a specific implementation, the use cost data of the external data table can be calculated by the following formula:
Cost(i, j) = compcost(j) * compfac(i, j) + storcost(j) * storfac(i, j) + scancost(j) * scanfac(i, j)
where i is an external data table, j is a conventional data table, and a dependence relation exists between data table i and data table j;
Cost(i, j) is the use cost data of external data table i using conventional data table j;
compcost(j) is the first calculation cost parameter in the processing cost data of conventional data table j;
compfac(i, j) is the calculation cost calculation factor between external data table i and conventional data table j;
storcost(j) is the first storage cost parameter in the processing cost data of conventional data table j;
storfac(i, j) is the storage cost calculation factor between external data table i and conventional data table j;
scancost(j) is the first scanning cost parameter in the processing cost data of conventional data table j;
scanfac(i, j) is the scanning cost calculation factor between external data table i and conventional data table j.
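The use cost formula translates directly into code. In the sketch below the class and function names are illustrative; the first cost parameters for conventional data table B (1 CT, 1 CT and 3 CT) are taken from the earlier example, while the three factor values are hypothetical numbers chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class ProcessingCost:
    """First cost parameters of a conventional data table j, in CT."""
    compcost: float
    storcost: float
    scancost: float


@dataclass
class Factors:
    """Calculation factors between external table i and conventional table j."""
    compfac: float
    storfac: float
    scanfac: float


def use_cost(cost: ProcessingCost, fac: Factors) -> float:
    """Cost(i, j): the sum of the three corrected (second) cost parameters."""
    return (cost.compcost * fac.compfac
            + cost.storcost * fac.storfac
            + cost.scancost * fac.scanfac)


if __name__ == "__main__":
    table_b = ProcessingCost(compcost=1.0, storcost=1.0, scancost=3.0)
    hypothetical = Factors(compfac=0.5, storfac=0.25, scanfac=0.5)
    print(use_cost(table_b, hypothetical))
```

Each product is one of the second cost parameters, so the function returns their sum as described in step 206.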
Step 207: when the processing cost data meet the first preset condition, extracting the corresponding conventional data table;
Step 208: when the use cost data meet the second preset condition, extracting the corresponding external data table.
In a specific implementation, once the processing cost data of the conventional data tables and the use cost data of the external data tables have each been obtained, the conventional data tables and external data tables can be analyzed according to the processing cost data and the use cost data to determine whether the data tables need to be optimized.
In a preferred embodiment of the present application, the step of extracting the corresponding conventional data table when the processing cost data meet the first preset condition may include:
if the ratio of the first storage cost parameter of a conventional data table to its first calculation cost parameter is higher than a first preset threshold, extracting the conventional data table;
and/or,
if the first calculation cost parameter of a conventional data table is higher than a second preset threshold, extracting the conventional data table;
and/or,
if the ratio of the first scanning cost parameter of a conventional data table to its first calculation cost parameter is higher than a third preset threshold, extracting the conventional data table;
and/or,
summing the second calculation cost parameters of the external data tables that have a direct dependence relation with a conventional data table;
if the first calculation cost parameter of the conventional data table is greater than the sum of the second calculation cost parameters, extracting the conventional data table;
and/or,
summing the second storage cost parameters of the external data tables that have a direct dependence relation with a conventional data table;
if the first storage cost parameter of the conventional data table is greater than the sum of the second storage cost parameters, extracting the conventional data table;
and/or,
summing the second scanning cost parameters of the external data tables that have a direct dependence relation with a conventional data table;
if the first scanning cost parameter of the conventional data table is greater than the sum of the second scanning cost parameters, extracting the conventional data table.
For example, if the ratio of the first storage cost parameter of a conventional data table to its first calculation cost parameter is greater than 1/4, the storage cost of the conventional data table can be considered high; the conventional data table can then be extracted and a reduction of its storage amount considered.
If the first calculation cost parameter of the conventional data table exceeds 30 CU, that is, the CPU computation has exceeded 30 min, optimizing the calculation logic of the conventional data table can be considered in order to reduce the amount of computation.
If the ratio of the first scanning cost parameter of the conventional data table to its first calculation cost parameter is greater than 10, the first scanning cost parameter can be considered high; optimizing the processing steps of the conventional data table can then be considered in order to reduce the volume of useless data scanned from the parent table.
In addition, if the first calculation cost parameter of the conventional data table is greater than the sum of the calculation costs of all users of the conventional data table, or the first storage cost parameter of the conventional data table is greater than the sum of the storage costs of all users of the conventional data table, or the first scanning cost parameter of the conventional data table is greater than the sum of the scanning costs of all users of the conventional data table, the conventional data table can be identified and extracted for further processing.
The above examples are intended only to aid understanding of the embodiments of the present application; those skilled in the art can determine the corresponding preset thresholds according to actual conditions, and the present application places no limitation on this.
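The example conditions above can be collected into a single check. The sketch below is illustrative only: the `Costs` tuple, the function name and the reason strings are hypothetical, the default thresholds are the example values from the text (1/4, 30 and 10), and all values are assumed to be expressed in one common unit, so the calculation threshold must be given in the same unit as the calculation cost.

```python
from typing import List, NamedTuple


class Costs(NamedTuple):
    compute: float
    storage: float
    scan: float


def reasons_to_extract(first: Costs, users_second_sum: Costs,
                       storage_ratio_max: float = 0.25,
                       compute_max: float = 30.0,
                       scan_ratio_max: float = 10.0) -> List[str]:
    """List which example conditions a conventional data table meets.

    `first` holds the table's first cost parameters; `users_second_sum`
    holds the summed second cost parameters of the external data tables
    that depend on it directly.
    """
    reasons = []
    if first.compute > 0 and first.storage / first.compute > storage_ratio_max:
        reasons.append("storage/compute ratio above threshold")
    if first.compute > compute_max:
        reasons.append("calculation cost above threshold")
    if first.compute > 0 and first.scan / first.compute > scan_ratio_max:
        reasons.append("scan/compute ratio above threshold")
    if first.compute > users_second_sum.compute:
        reasons.append("calculation cost exceeds users' total")
    if first.storage > users_second_sum.storage:
        reasons.append("storage cost exceeds users' total")
    if first.scan > users_second_sum.scan:
        reasons.append("scanning cost exceeds users' total")
    return reasons
```

A table is extracted when the returned list is non-empty, which mirrors the "and/or" combination of conditions; returning the reasons rather than a boolean keeps the suggested optimization visible.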
In another preferred embodiment of the present application, the step of extracting the corresponding external data table when the use cost data meet the second preset condition may include:
if the ratio of the second storage cost parameter of an external data table to its second calculation cost parameter is higher than a fourth preset threshold, extracting the external data table;
and/or,
if an external data table can obtain from other conventional data tables the same data as from the current conventional data table, and the second scanning cost parameter when obtaining the data from the other conventional data tables is smaller than the second scanning cost parameter when obtaining the data from the current conventional data table, extracting the external data table.
For example, if the ratio of the second storage cost parameter of the external data table to its second calculation cost parameter is greater than 1/4, the storage cost of the external data table can be considered high; the external data table can then be extracted and a reduction of its storage amount considered.
In addition, if the data on which the external data table depends can be obtained from another conventional data table, and the second scanning cost parameter when the external data table scans that other conventional data table is smaller than the second scanning cost parameter when it scans the current conventional data table, optimizing the dependence relation of the external data table can be considered in order to reduce the scanning cost.
The above examples are intended only to aid understanding of the embodiments of the present application; those skilled in the art can determine the corresponding preset thresholds according to actual conditions, and the present application places no limitation on this.
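The two example conditions for external data tables can be sketched in the same way. The function name, argument names and reason strings are hypothetical, and the default threshold is the example value of 1/4 from the text.

```python
from typing import Iterable, List


def reasons_to_extract_external(second_compute: float, second_storage: float,
                                current_scan: float,
                                alternative_scans: Iterable[float] = (),
                                storage_ratio_max: float = 0.25) -> List[str]:
    """List which example conditions an external data table meets.

    `current_scan` is the second scanning cost parameter against the
    conventional data table now used; `alternative_scans` holds the second
    scanning cost parameters against other conventional data tables that
    could supply the same data.
    """
    reasons = []
    if second_compute > 0 and second_storage / second_compute > storage_ratio_max:
        reasons.append("storage/compute ratio above threshold")
    if any(alt < current_scan for alt in alternative_scans):
        reasons.append("cheaper conventional data table available")
    return reasons
```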
In the embodiments of the present application, introducing calculation factors allows the cost consumed by an upstream data table to be inherited by downstream data tables in reasonable proportions; meanwhile, by taking into account factors such as the storage amount, the scanning amount, the degree of reuse of the data table, the processing level of the data table and the proportion of temperature fields in the data table, the use cost of the external data table is calculated more reasonably and more accurately.
Secondly, the embodiments of the present application analyze the processing cost data of the conventional data tables and the use cost data of the external data tables and compare them with preset thresholds, so that data tables with excessive cost consumption can be specifically identified, which helps to optimize those data tables further and thereby save cost.
To make the above objects, features and advantages of the present application more readily understood, a preferred embodiment of the present application is described in detail below with a complete example.
Suppose there are 6 data tables A, B, C, D, E and F, whose scanning relations with one another are shown in Table 1 below:
Table 1:
In Table 1, the data common layer includes 4 conventional data tables, namely conventional data tables A, B, C and D; the non-data common layer has 2 external data tables, namely external data tables E and F.
The first row of Table 1 can be read as follows: the storage amount of conventional data table B is 10 TB, the storage amount of conventional data table A is 20 TB, and conventional data table B scans 1 TB of data from conventional data table A. Conventional data table A has three sublists in total.
The second row of Table 1 can be read as follows: the storage amount of conventional data table C is 6 TB, the storage amount of conventional data table B is 10 TB, and conventional data table C scans 2 TB of data from conventional data table B. Conventional data table B has two sublists in total.
The fourth row of Table 1 can be read as follows: the storage amount of external data table E is 12 TB, the storage amount of conventional data table C is 6 TB, and external data table E scans 2 TB of data from conventional data table C. Conventional data table C has four sublists in total.
From the above scanning relations, another schematic diagram of the relations between the conventional data tables and external data tables of the present application can be constructed, as shown in Fig. 5.
According to the calculation formula for the processing cost data of conventional data tables described above,
the processing cost data of the conventional data tables can be obtained, as shown in Table 2 below:
Table 2:
Meanwhile, according to the calculation formula for the use cost data of external data tables described above,
Cost(i, j) = compcost(j) * compfac(i, j) + storcost(j) * storfac(i, j) + scancost(j) * scanfac(i, j)
the use cost data of the external data tables can be obtained, as shown in Table 3 below:
Table 3:
The processing cost data of the conventional data tables and the use cost data of the external data tables are then compared with the preset conditions, so that the conventional data tables and external data tables listed in Table 4 below are extracted:
Table 4:
The above example is intended only to aid understanding of the method described in the present application and should not be regarded as limiting the application. Those skilled in the art can, according to the actual dependence relations between data tables and following the method and formulas described in the present application, determine the processing cost data of the conventional data tables and the use cost data of the external data tables, and then use the processing cost data and the use cost data to identify whether the data tables need to be optimized.
It should be noted that, for brevity, the method embodiments are described as a series of combined actions, but those skilled in the art should understand that the embodiments of the present application are not limited by the described order of actions, because according to the embodiments of the present application some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.
Referring to Fig. 6, a structural block diagram of an embodiment of a device for data table analysis and processing of the present application is shown, wherein the data tables may include conventional data tables of a data common layer and external data tables of a non-data common layer, and the device may specifically include the following modules:
Processing cost calculation module 301, configured to calculate processing cost data for the conventional data tables of the data common layer;
Determining module 302, configured to determine the conventional data tables on which the external data table of the non-data common layer depends;
Use cost calculation module 303, configured to calculate the use cost data of the external data table according to the processing cost data of the conventional data tables.
In the embodiments of the present application, the processing cost calculation module 301 may specifically include the following submodules:
Processing cost characteristic parameter extraction submodule 3011, configured to extract the processing cost characteristic parameters of the conventional data tables of the data common layer;
Processing cost calculation submodule 3012, configured to calculate the processing cost data of the conventional data table using the processing cost characteristic parameters.
In one embodiment of the present application, the processing cost characteristic parameters may include a first scanning cost parameter, and the processing cost characteristic parameter extraction submodule 3011 may further include the following units:
Parent table quantity statistics unit 111A, configured to count the number of parent tables on which the conventional data table depends;
Scanning amount acquiring unit 111B, configured to obtain the scanning amount of the conventional data table on the parent table;
Sublist quantity statistics unit 111C, configured to count the number of all sublists under the parent table;
The processing cost calculation submodule 3012 may further include the following unit:
First scanning cost calculation unit 121A, configured to calculate the first scanning cost parameter using the number of parent tables on which the conventional data table depends, the scanning amount of the conventional data table on the parent table, and the number of all sublists under the parent table.
In another embodiment of the present application, the processing cost characteristic parameters may also include a first calculation cost parameter and a first storage cost parameter, and the processing cost characteristic parameter extraction submodule 3011 may further include the following units:
First calculation cost parameter extraction unit 112A, configured to extract the complexity CU of the conventional data table as the first calculation cost parameter;
First storage cost parameter extraction unit 113A, configured to extract the storage amount of the conventional data table as the first storage cost parameter.
In the embodiments of the present application, the first scanning cost parameter can be calculated by the following formula from the number of parent tables on which the conventional data table depends, the scanning amount of the conventional data table on each parent table, and the number of all sublists under each parent table:
where Cost(j) is the processing cost data of data table j;
data tables j are the m parent tables on which data table i depends, numbered 1 ... m;
ScanSize(i, j) is the scanning amount of conventional data table i on parent table j;
data tables m are all the sublists of parent table j, numbered 1 ... n.
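The formula is not reproduced in this text. One expression consistent with the four definitions above, given here as an assumption and not as the application's confirmed formula, apportions the processing cost of each parent table j to table i according to i's share of the total scanning amount on j, and sums over the m parent tables. The left-hand symbol ScanCost(i) is introduced for illustration, and the inner index is written c rather than m to avoid reusing m for both the parent count and the sublist index.

```latex
\mathrm{ScanCost}(i) = \sum_{j=1}^{m} \mathrm{Cost}(j) \cdot \frac{\mathrm{ScanSize}(i, j)}{\sum_{c=1}^{n} \mathrm{ScanSize}(c, j)}
```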
In the embodiments of the present application, the processing cost data of the conventional data table can be calculated from the processing cost characteristic parameters by the following formula:
where ComputeCost(i) is the first calculation cost parameter of conventional data table i;
StorageCost(i) is the first storage cost parameter of conventional data table i;
ScanCost(i, j) is the first scanning cost parameter of conventional data table i with respect to parent table j.
In the embodiments of the present application, the use cost calculation module 303 may specifically include the following submodule:
Use cost calculation submodule 3031, configured to calculate the use cost data of the external data table according to the processing cost characteristic parameters of the conventional data table.
In the embodiments of the present application, the use cost calculation submodule 3031 may specifically include the following units:
Processing cost characteristic parameter extraction unit 311, configured to extract the processing cost characteristic parameters of the conventional data tables on which the external data table of the non-data common layer depends;
Use cost characteristic parameter calculation unit 312, configured to calculate the use cost characteristic parameters of the external data table using the processing cost characteristic parameters;
Use cost data calculation unit 313, configured to calculate the use cost data of the external data table using the use cost characteristic parameters.
In the embodiments of the present application, the use cost characteristic parameters include a second calculation cost parameter;
The processing cost characteristic parameter extraction unit 311 may specifically include the following subunit:
First calculation cost parameter extraction subunit 311A, configured to extract the first calculation cost parameter of the conventional data table on which the external data table depends;
The use cost characteristic parameter calculation unit 312 may specifically include the following subunits:
Calculation cost calculation factor obtaining subunit 312A, configured to obtain the calculation cost calculation factor between the external data table and the conventional data table on which it depends;
Second calculation cost parameter calculation subunit 312B, configured to correct the first calculation cost parameter using the calculation cost calculation factor to obtain the second calculation cost parameter.
In the embodiments of the present application, the use cost characteristic parameters may also include a second storage cost parameter;
The processing cost characteristic parameter extraction unit 311 may specifically include the following subunit:
First storage cost parameter extraction subunit 311B, configured to extract the first storage cost parameter of the conventional data table on which the external data table depends;
The use cost characteristic parameter calculation unit 312 may also include the following subunits:
Storage cost calculation factor obtaining subunit 312C, configured to obtain the storage cost calculation factor between the external data table and the conventional data table on which it depends;
Second storage cost parameter calculation subunit 312D, configured to correct the first storage cost parameter using the storage cost calculation factor to obtain the second storage cost parameter.
In the embodiments of the present application, the use cost characteristic parameters may also include a second scanning cost parameter;
The processing cost characteristic parameter extraction unit 311 may also include the following subunit:
First scanning cost parameter extraction subunit 311C, configured to extract the first scanning cost parameter of the conventional data table on which the external data table depends;
The use cost characteristic parameter calculation unit 312 may also include the following subunits:
Scanning cost calculation factor obtaining subunit 312E, configured to obtain the scanning cost calculation factor between the external data table and the conventional data table on which it depends;
Second scanning cost parameter calculation subunit 312F, configured to correct the first scanning cost parameter using the scanning cost calculation factor to obtain the second scanning cost parameter.
In the embodiment of the present application, the calculating cost calculation factor obtain subelement 312A can be with It is further used for:
Obtain the number of the tables of data that every day is over-scanned to the conventional data table in nearest m days Mesh, and, the conventional data table average sublist number of nearest m days;
The conventional data table was carried out according to every day in described nearest m days using equation below The number of the tables of data of scanning, and, the conventional data table average sublist number of nearest m days, Calculate the cost calculation factor:
Wherein, m is every day in nearest m days;
Scanm (j) is the tables of data number over-scanned to conventional data table j for the m days;
Denominator is the example of the conventional data table j average sublist numbers of nearest 90 days.
In the embodiment of the present application, the carrying cost calculate the factor obtain subelement 312C can be with It is further used for:
The scanning amount for the conventional data table that the external data table is relied on it is obtained, and, with There are k tables of dependence in the conventional data table;
The scanning of the conventional data table relied on using equation below according to the external data table it Amount, and, there are k tables of dependence with the conventional data table, calculate carrying cost Calculate the factor:
Wherein, scansize (i, j) is scanning amounts of the external data table i to conventional data table j;
M is the k tables that there is dependence with conventional data table j, for numbering 1 ... k.
In the embodiment of the present application, the scanning cost calculation factor obtain subelement 312E can be with It is further used for:
Obtain the proportion of temperature fields in the conventional data table, and the dependence level of the conventional data table in the current data common layer;
Calculate the scanning cost calculation factor from the proportion of temperature fields in the conventional data table, and the level of the conventional data table in the current data common layer, using the following formula:
$$\mathrm{scanfac}(i,j) = \frac{\mathrm{hot\_ratio}(j)}{\log_2(\mathrm{level}(j) + 1)}$$
Wherein, hot_ratio(j) is the ratio of the number of temperature fields of conventional data table j to the total number of fields in the table;
level(j) is the dependence level of conventional data table j in the data common layer.
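The scanning cost calculation factor can be illustrated with the sketch below. It assumes the factor is hot_ratio(j) divided by log2(level(j) + 1), and that levels are numbered from 1 so the logarithm is never zero; both points are assumptions made for illustration, as are the function name and sample values.

```python
import math


def scan_cost_factor(hot_field_count, total_field_count, level):
    """Return hot_ratio(j) / log2(level(j) + 1) under the assumptions stated above."""
    if total_field_count <= 0:
        raise ValueError("table must have at least one field")
    if level < 1:
        raise ValueError("level is assumed to start at 1")
    hot_ratio = hot_field_count / total_field_count
    return hot_ratio / math.log2(level + 1)


# Hypothetical table with 5 temperature fields out of 20.
factor_level_1 = scan_cost_factor(5, 20, 1)  # log2(2) = 1
factor_level_3 = scan_cost_factor(5, 20, 3)  # log2(4) = 2
```

With this form, a table at level 1 contributes exactly its temperature field ratio, and the factor shrinks as the table sits deeper in the data common layer.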
In the embodiment of the present application, the use cost data of the external data table may be calculated from the use cost characteristic parameters using the following formula:
Cost(i, j) = compcost(j) * compfac(i, j) + storcost(j) * storfac(i, j) + scancost(j) * scanfac(i, j)
Wherein, i is an external data table, j is a conventional data table, and a dependence relationship exists between data table i and data table j;
Cost(i, j) is the use cost data of external data table i using conventional data table j;
compcost(j) is the first calculation cost parameter in the processing cost data of conventional data table j;
compfac(i, j) is the calculation cost calculation factor between external data table i and conventional data table j;
storcost(j) is the first carrying cost parameter in the processing cost data of conventional data table j;
storfac(i, j) is the carrying cost calculation factor between external data table i and conventional data table j;
scancost(j) is the first scanning cost parameter in the processing cost data of conventional data table j;
scanfac(i, j) is the scanning cost calculation factor between external data table i and conventional data table j.
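The use cost formula combines the three first cost parameters of table j with the three factors for the pair (i, j). A minimal sketch follows; the class names, the function name, and the numbers are hypothetical, and "storcost" here stands for the carrying cost parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingCost:
    """First cost parameters of conventional data table j."""
    compcost: float
    storcost: float
    scancost: float


@dataclass(frozen=True)
class CostFactors:
    """Calculation factors between external table i and conventional table j."""
    compfac: float
    storfac: float
    scanfac: float


def use_cost(cost, fac):
    """Cost(i, j) as the weighted sum of the three cost parameters."""
    return (cost.compcost * fac.compfac
            + cost.storcost * fac.storfac
            + cost.scancost * fac.scanfac)


# Hypothetical values: 100*0.2 + 40*0.3 + 20*0.25, roughly 37.
cost_j = ProcessingCost(compcost=100.0, storcost=40.0, scancost=20.0)
fac_ij = CostFactors(compfac=0.2, storfac=0.3, scanfac=0.25)
total = use_cost(cost_j, fac_ij)
```

Keeping the parameters and the factors in separate structures mirrors the text: the parameters belong to table j alone, while the factors describe the relationship between i and j.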
In the embodiment of the present application, the device may further include the following module:
First extraction module 304, configured to extract the corresponding conventional data table when the processing cost data meet the first preparatory condition.
In the embodiment of the present application, the first extraction module 304 may specifically include the following submodules:
First extracting sub-module 3041, configured to extract a given conventional data table when the ratio of its first carrying cost parameter to its first calculation cost parameter is higher than the first predetermined threshold value;
And/or,
Second extracting sub-module 3042, configured to extract a given conventional data table when its first calculation cost parameter is higher than the second predetermined threshold value;
And/or,
Third extracting sub-module 3043, configured to extract a given conventional data table when the ratio of its first scanning cost parameter to its first calculation cost parameter is higher than the third predetermined threshold value;
And/or,
Fourth statistic submodule 3044, configured to compute the sum of the second calculation cost parameters of the external data tables that have a direct dependence relationship with a given conventional data table;
Fourth extracting sub-module 3045, configured to extract the conventional data table when its first calculation cost parameter is greater than the sum of the second calculation cost parameters;
And/or,
Fifth statistic submodule 3046, configured to compute the sum of the second carrying cost parameters of the external data tables that have a direct dependence relationship with a given conventional data table;
Fifth extracting sub-module 3047, configured to extract the conventional data table when its first carrying cost parameter is greater than the sum of the second carrying cost parameters;
And/or,
Sixth statistic submodule 3048, configured to compute the sum of the second scanning cost parameters of the external data tables that have a direct dependence relationship with a given conventional data table;
Sixth extracting sub-module 3049, configured to extract the conventional data table when its first scanning cost parameter is greater than the sum of the second scanning cost parameters.
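The six extraction conditions listed above can be sketched as a single check that reports which conditions a conventional data table meets. The dictionary keys, threshold names, and sample values are hypothetical; "storage" stands for the carrying cost parameter, and skipping the ratio checks when the calculation cost is zero is an assumption.

```python
def flag_conventional_table(first, downstream_second, thresholds):
    """Return the reasons a conventional data table would be extracted.

    first: first cost parameters of the table (compute, storage, scan).
    downstream_second: second cost parameters of each directly dependent
    external table, with the same keys.
    """
    reasons = []
    compute = first["compute"]
    # Ratio conditions (first and third thresholds).
    if compute > 0:
        if first["storage"] / compute > thresholds["storage_to_compute"]:
            reasons.append("storage/compute ratio above threshold")
        if first["scan"] / compute > thresholds["scan_to_compute"]:
            reasons.append("scan/compute ratio above threshold")
    # Absolute condition (second threshold).
    if compute > thresholds["compute"]:
        reasons.append("compute cost above threshold")
    # Each first parameter compared with the summed second parameters.
    for kind in ("compute", "storage", "scan"):
        if first[kind] > sum(t[kind] for t in downstream_second):
            reasons.append(f"{kind} cost exceeds downstream total")
    return reasons


first_j = {"compute": 100.0, "storage": 300.0, "scan": 50.0}
downstream = [
    {"compute": 30.0, "storage": 200.0, "scan": 40.0},
    {"compute": 20.0, "storage": 150.0, "scan": 20.0},
]
limits = {"storage_to_compute": 2.0, "compute": 500.0, "scan_to_compute": 1.0}
reasons_j = flag_conventional_table(first_j, downstream, limits)
```

Since the conditions are joined by "and/or", the sketch evaluates all of them and returns every match rather than stopping at the first.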
In the embodiment of the present application, the device may further include the following module:
Second extraction module 305, configured to extract the corresponding external data table when the use cost data meet the second preparatory condition.
In the embodiment of the present application, the second extraction module 305 may specifically include the following submodules:
Seventh extracting sub-module 3051, configured to extract a given external data table when the ratio of its second carrying cost parameter to its second calculation cost parameter is higher than the fourth predetermined threshold value;
And/or,
Eighth extracting sub-module 3052, configured to extract a given external data table when it can obtain, from other conventional data tables, the same data as from the current conventional data table, and the second scanning cost parameter when obtaining the data through the other conventional data tables is less than the second scanning cost parameter when obtaining the data from the current conventional data table.
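The two conditions for extracting an external data table can be sketched in the same way. All names and values below are hypothetical, and it is assumed that the second scanning cost of each alternative conventional data table offering the same data has already been computed.

```python
def flag_external_table(second_storage, second_compute, ratio_threshold,
                        scan_cost_current, scan_cost_alternatives):
    """Return the reasons an external data table would be extracted."""
    reasons = []
    # Ratio of second carrying (storage) cost to second calculation cost.
    if second_compute > 0 and second_storage / second_compute > ratio_threshold:
        reasons.append("storage/compute ratio above threshold")
    # A different conventional table offers the same data at a lower scan cost.
    if any(alt < scan_cost_current for alt in scan_cost_alternatives):
        reasons.append("cheaper source table available")
    return reasons


reasons_i = flag_external_table(
    second_storage=12.0,
    second_compute=20.0,
    ratio_threshold=0.5,
    scan_cost_current=5.0,
    scan_cost_alternatives=[6.5, 3.0],
)
```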
As for the device embodiments, since they are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are identical or similar among the embodiments, the embodiments may be referred to one another.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a device, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
In a typical configuration, the computer device includes one or more processors (CPU), an input/output interface, a network interface, and memory. The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape and magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The embodiments of the present application are described with reference to flowcharts and/or block diagrams of the method, the terminal device (system), and the computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce a device for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, the instruction device implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal device, such that a series of operation steps are performed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present application.
Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that includes the element.
The method of data table analysis processing and the device of data table analysis processing provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. Meanwhile, for those of ordinary skill in the art, changes may be made to the specific implementations and the scope of application in accordance with the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (38)

1. A method of data table analysis processing, characterized in that the data tables include conventional data tables of a data common layer and external data tables of a non-data common layer, the method comprising:
Calculating processing cost data for the conventional data tables of the data common layer;
Determining the conventional data table on which an external data table of the non-data common layer relies;
Calculating the use cost data of the external data table according to the processing cost data of the conventional data table.
2. The method according to claim 1, characterized in that the step of calculating processing cost data for the conventional data tables of the data common layer comprises:
Extracting the processing cost characteristic parameters of the conventional data table of the data common layer;
Calculating the processing cost data of the conventional data table using the processing cost characteristic parameters.
3. The method according to claim 2, characterized in that the processing cost characteristic parameters include a first scanning cost parameter, and the sub-step of extracting the processing cost characteristic parameters of the conventional data table of the data common layer further comprises:
Counting the number of parent tables on which the conventional data table relies;
Obtaining the scanning amount of the conventional data table on the parent tables;
Counting the number of all sub-tables under the parent tables;
The sub-step of calculating the processing cost data of the conventional data table using the processing cost characteristic parameters further comprises:
Calculating the first scanning cost parameter using the number of parent tables on which the conventional data table relies, the scanning amount of the conventional data table on the parent tables, and the number of all sub-tables under the parent tables.
4. The method according to claim 3, characterized in that the processing cost characteristic parameters further include a first calculation cost parameter and a first carrying cost parameter, and the sub-step of extracting the processing cost characteristic parameters of the conventional data table of the data common layer further comprises:
Extracting the complexity CU of the conventional data table as the first calculation cost parameter;
Extracting the amount of storage of the conventional data table as the first carrying cost parameter.
5. The method according to claim 3 or 4, characterized in that the first scanning cost parameter is calculated using the number of parent tables on which the conventional data table relies, the scanning amount of the conventional data table on the parent tables, and the number of all sub-tables under the parent tables, by the following formula:
$$\mathrm{ScanCost}(i,j) = \mathrm{Cost}(j) \cdot \frac{\mathrm{ScanSize}(i,j)}{\sum_{m=1}^{n} \mathrm{ScanSize}(m,j)}$$
Wherein, Cost(j) is the processing cost data of data table j,
Data table j denotes the m parent tables on which data table i relies, numbered 1 ... m,
ScanSize(i, j) is the scanning amount of conventional data table i on parent table j,
Data table m denotes all sub-tables of parent table j, numbered 1 ... n.
6. The method according to claim 5, characterized in that the processing cost data of the conventional data table are calculated using the processing cost characteristic parameters by the following formula:
$$\mathrm{Cost}(i) = \mathrm{ComputeCost}(i) + \mathrm{StorageCost}(i) + \sum_{j=1}^{n} \mathrm{ScanCost}(i,j)$$
Wherein, ComputeCost(i) is the first calculation cost parameter of conventional data table i;
StorageCost(i) is the first carrying cost parameter of conventional data table i;
ScanCost(i, j) is the first scanning cost parameter of conventional data table i on parent table j.
7. The method according to claim 2, 3 or 4, characterized in that the step of calculating the use cost data of the external data table according to the processing cost data of the conventional data table is:
Calculating the use cost data of the external data table according to the processing cost characteristic parameters of the conventional data table.
8. The method according to claim 7, characterized in that the step of calculating the use cost data of the external data table according to the processing cost characteristic parameters of the conventional data table comprises:
Extracting the processing cost characteristic parameters of the conventional data table on which the external data table of the non-data common layer relies;
Calculating the use cost characteristic parameters of the external data table using the processing cost characteristic parameters;
Calculating the use cost data of the external data table using the use cost characteristic parameters.
9. The method according to claim 8, characterized in that the use cost characteristic parameters include a second calculation cost parameter;
The sub-step of extracting the processing cost characteristic parameters of the conventional data table on which the external data table of the non-data common layer relies is:
Extracting the first calculation cost parameter of the conventional data table on which the external data table relies;
The step of calculating the use cost characteristic parameters of the external data table using the processing cost characteristic parameters comprises:
Obtaining the calculation cost calculation factor between the external data table and the conventional data table on which it relies;
Correcting the first calculation cost parameter using the calculation cost calculation factor to obtain the second calculation cost parameter.
10. The method according to claim 9, characterized in that the use cost characteristic parameters include a second carrying cost parameter;
The sub-step of extracting the processing cost characteristic parameters of the conventional data table on which the external data table of the non-data common layer relies is:
Extracting the first carrying cost parameter of the conventional data table on which the external data table relies;
The step of calculating the use cost characteristic parameters of the external data table using the processing cost characteristic parameters further comprises:
Obtaining the carrying cost calculation factor between the external data table and the conventional data table on which it relies;
Correcting the first carrying cost parameter using the carrying cost calculation factor to obtain the second carrying cost parameter.
11. The method according to claim 10, characterized in that the use cost characteristic parameters include a second scanning cost parameter;
The sub-step of extracting the processing cost characteristic parameters of the conventional data table on which the external data table of the non-data common layer relies is:
Extracting the first scanning cost parameter of the conventional data table on which the external data table relies;
The step of calculating the use cost characteristic parameters of the external data table using the processing cost characteristic parameters further comprises:
Obtaining the scanning cost calculation factor between the external data table and the conventional data table on which it relies;
Correcting the first scanning cost parameter using the scanning cost calculation factor to obtain the second scanning cost parameter.
12. The method according to claim 9, characterized in that the sub-step of obtaining the calculation cost calculation factor between the external data table and the conventional data table on which it relies further comprises:
Obtaining the number of data tables that scanned the conventional data table on each day of the most recent m days, and the average number of sub-tables of the conventional data table over the most recent m days;
Calculating the calculation cost calculation factor from the number of data tables that scanned the conventional data table on each day of the most recent m days, and the average number of sub-tables of the conventional data table over the most recent m days, using the following formula:
$$\mathrm{compfac}(i,j) = \frac{1}{\mathrm{avg}\left(\sum_{m=1}^{90} \mathrm{scan}_m(j)\right)}$$
Wherein, m denotes each day in the most recent m days;
scan_m(j) is the number of data tables that scanned conventional data table j on day m;
The denominator is, by way of example, the average number of sub-tables of conventional data table j over the most recent 90 days.
13. The method according to claim 10, characterized in that the sub-step of obtaining the carrying cost calculation factor between the external data table and the conventional data table on which it relies further comprises:
Obtaining the scanning amount of the external data table on the conventional data table on which it relies, and the k tables that have a dependence relationship with the conventional data table;
Calculating the carrying cost calculation factor from the scanning amount of the external data table on the conventional data table on which it relies, and the k tables that have a dependence relationship with the conventional data table, using the following formula:
$$\mathrm{storfac}(i,j) = \frac{\mathrm{scansize}(i,j)}{\sum_{m=1}^{k} \mathrm{scansize}(m,j)}$$
Wherein, scansize(i, j) is the scanning amount of external data table i on conventional data table j;
m denotes the k tables that have a dependence relationship with conventional data table j, numbered 1 ... k.
14. The method according to claim 11, characterized in that the sub-step of obtaining the scanning cost calculation factor between the external data table and the conventional data table on which it relies further comprises:
Obtaining the proportion of temperature fields in the conventional data table, and the dependence level of the conventional data table in the current data common layer, a temperature field being a field whose number of uses within a certain period of time is greater than the number of direct downstream data tables of the conventional data table;
Calculating the scanning cost calculation factor from the proportion of temperature fields in the conventional data table, and the level of the conventional data table in the current data common layer, using the following formula:
$$\mathrm{scanfac}(i,j) = \frac{\mathrm{hot\_ratio}(j)}{\log_2(\mathrm{level}(j) + 1)}$$
Wherein, hot_ratio(j) is the ratio of the number of temperature fields of conventional data table j to the total number of fields in the table;
level(j) is the dependence level of conventional data table j in the data common layer.
15. The method according to claim 12, 13 or 14, characterized in that the use cost data of the external data table are calculated using the use cost characteristic parameters by the following formula:
Cost(i, j) = compcost(j) * compfac(i, j) + storcost(j) * storfac(i, j) + scancost(j) * scanfac(i, j)
Wherein, i is an external data table, j is a conventional data table, and a dependence relationship exists between data table i and data table j;
Cost(i, j) is the use cost data of external data table i using conventional data table j;
compcost(j) is the first calculation cost parameter in the processing cost data of conventional data table j;
compfac(i, j) is the calculation cost calculation factor between external data table i and conventional data table j;
storcost(j) is the first carrying cost parameter in the processing cost data of conventional data table j;
storfac(i, j) is the carrying cost calculation factor between external data table i and conventional data table j;
scancost(j) is the first scanning cost parameter in the processing cost data of conventional data table j;
scanfac(i, j) is the scanning cost calculation factor between external data table i and conventional data table j.
16. The method according to claim 1, 2, 3, 4, 6, 8, 9, 10, 11, 12, 13 or 14, characterized by further comprising:
Extracting the corresponding conventional data table when the processing cost data meet the first preparatory condition.
17. The method according to claim 16, characterized in that the step of extracting the corresponding conventional data table when the processing cost data meet the first preparatory condition comprises:
If the ratio of the first carrying cost parameter to the first calculation cost parameter of a given conventional data table is higher than the first predetermined threshold value, extracting the conventional data table;
And/or,
If the first calculation cost parameter of a given conventional data table is higher than the second predetermined threshold value, extracting the conventional data table;
And/or,
If the ratio of the first scanning cost parameter to the first calculation cost parameter of a given conventional data table is higher than the third predetermined threshold value, extracting the conventional data table;
And/or,
Computing the sum of the second calculation cost parameters of the external data tables that have a direct dependence relationship with a given conventional data table;
If the first calculation cost parameter of the conventional data table is greater than the sum of the second calculation cost parameters, extracting the conventional data table;
And/or,
Computing the sum of the second carrying cost parameters of the external data tables that have a direct dependence relationship with a given conventional data table;
If the first carrying cost parameter of the conventional data table is greater than the sum of the second carrying cost parameters, extracting the conventional data table;
And/or,
Computing the sum of the second scanning cost parameters of the external data tables that have a direct dependence relationship with a given conventional data table;
If the first scanning cost parameter of the conventional data table is greater than the sum of the second scanning cost parameters, extracting the conventional data table.
18. The method according to claim 1, 2, 3, 4, 6, 8, 9, 10, 11, 12, 13, 14 or 17, characterized by further comprising:
Extracting the corresponding external data table when the use cost data meet the second preparatory condition.
19. The method according to claim 18, characterized in that the step of extracting the corresponding external data table when the use cost data meet the second preparatory condition comprises:
If the ratio of the second carrying cost parameter to the second calculation cost parameter of a given external data table is higher than the fourth predetermined threshold value, extracting the external data table;
And/or,
If a given external data table can obtain, from other conventional data tables, the same data as from the current conventional data table, and the second scanning cost parameter when obtaining the data through the other conventional data tables is less than the second scanning cost parameter when obtaining the data from the current conventional data table, extracting the external data table.
20. A device of data table analysis processing, characterized in that the data tables include conventional data tables of a data common layer and external data tables of a non-data common layer, the device comprising:
Processing cost computing module, configured to calculate processing cost data for the conventional data tables of the data common layer;
Determining module, configured to determine the conventional data table on which an external data table of the non-data common layer relies;
Use cost computing module, configured to calculate the use cost data of the external data table according to the processing cost data of the conventional data table.
21. The device according to claim 20, characterized in that the processing cost computing module comprises:
Processing cost characteristic parameter extraction submodule, configured to extract the processing cost characteristic parameters of the conventional data table of the data common layer;
Processing cost calculating sub module, configured to calculate the processing cost data of the conventional data table using the processing cost characteristic parameters.
22. The device according to claim 21, characterized in that the processing cost characteristic parameters include a first scanning cost parameter, and the processing cost characteristic parameter extraction submodule further comprises:
Parent table quantity statistics unit, configured to count the number of parent tables on which the conventional data table relies;
Scanning amount acquiring unit, configured to obtain the scanning amount of the conventional data table on the parent tables;
Sub-table quantity statistics unit, configured to count the number of all sub-tables under the parent tables;
The processing cost calculating sub module further comprises:
First scanning cost computing unit, configured to calculate the first scanning cost parameter using the number of parent tables on which the conventional data table relies, the scanning amount of the conventional data table on the parent tables, and the number of all sub-tables under the parent tables.
23. The device according to claim 22, characterized in that the processing cost characteristic parameters further include a first calculation cost parameter and a first carrying cost parameter, and the processing cost characteristic parameter extraction submodule further comprises:
First calculation cost parameter extraction unit, configured to extract the complexity CU of the conventional data table as the first calculation cost parameter;
First carrying cost parameter extraction unit, configured to extract the amount of storage of the conventional data table as the first carrying cost parameter.
24. The device according to claim 22 or 23, characterized in that the first scanning cost parameter is calculated using the number of parent tables on which the conventional data table relies, the scanning amount of the conventional data table on the parent tables, and the number of all sub-tables under the parent tables, by the following formula:
$$\mathrm{ScanCost}(i,j) = \mathrm{Cost}(j) \cdot \frac{\mathrm{ScanSize}(i,j)}{\sum_{m=1}^{n} \mathrm{ScanSize}(m,j)}$$
Wherein, Cost(j) is the processing cost data of data table j,
Data table j denotes the m parent tables on which data table i relies, numbered 1 ... m,
ScanSize(i, j) is the scanning amount of conventional data table i on parent table j,
Data table m denotes all sub-tables of parent table j, numbered 1 ... n.
25. The device according to claim 24, characterized in that the processing cost data of the conventional data table are calculated using the processing cost characteristic parameters by the following formula:
$$\mathrm{Cost}(i) = \mathrm{ComputeCost}(i) + \mathrm{StorageCost}(i) + \sum_{j=1}^{n} \mathrm{ScanCost}(i,j)$$
Wherein, ComputeCost(i) is the first calculation cost parameter of conventional data table i;
StorageCost(i) is the first carrying cost parameter of conventional data table i;
ScanCost(i, j) is the first scanning cost parameter of conventional data table i on parent table j.
26. The device according to claim 21, 22 or 23, characterized in that the use cost computing module comprises:
Use cost calculating sub module, configured to calculate the use cost data of the external data table according to the processing cost characteristic parameters of the conventional data table.
27. The device according to claim 26, characterized in that the use cost calculating sub module comprises:
Processing cost characteristic parameter extraction unit, configured to extract the processing cost characteristic parameters of the conventional data table on which the external data table of the non-data common layer relies;
Use cost characteristic parameter calculation unit, configured to calculate the use cost characteristic parameters of the external data table using the processing cost characteristic parameters;
Use cost data computation unit, configured to calculate the use cost data of the external data table using the use cost characteristic parameters.
28. The device according to claim 27, characterized in that the use cost characteristic parameters include a second calculation cost parameter;
The processing cost characteristic parameter extraction unit comprises:
First calculation cost parameter extraction subelement, configured to extract the first calculation cost parameter of the conventional data table on which the external data table relies;
The use cost characteristic parameter calculation unit comprises:
Calculation cost calculation factor obtaining subelement, configured to obtain the calculation cost calculation factor between the external data table and the conventional data table on which it relies;
Second calculation cost parameter computation subelement, configured to correct the first calculation cost parameter using the calculation cost calculation factor to obtain the second calculation cost parameter.
29. The device according to claim 28, characterized in that the use cost characteristic parameters include a second carrying cost parameter;
The processing cost characteristic parameter extraction unit includes:
A first carrying cost parameter extraction subelement, for extracting the first carrying cost parameter of the conventional data table on which the external data table depends;
The use cost characteristic parameter calculation unit further includes:
A carrying cost calculation factor obtaining subelement, for obtaining the carrying cost calculation factor between the external data table and the conventional data table on which it depends;
A second carrying cost parameter calculation subelement, for correcting the first carrying cost parameter using the carrying cost calculation factor to obtain the second carrying cost parameter.
30. The device according to claim 29, characterized in that the use cost characteristic parameters include a second scanning cost parameter;
The processing cost characteristic parameter extraction unit includes:
A first scanning cost parameter extraction subelement, for extracting the first scanning cost parameter of the conventional data table on which the external data table depends;
The use cost characteristic parameter calculation unit further includes:
A scanning cost calculation factor obtaining subelement, for obtaining the scanning cost calculation factor between the external data table and the conventional data table on which it depends;
A second scanning cost parameter calculation subelement, for correcting the first scanning cost parameter using the scanning cost calculation factor to obtain the second scanning cost parameter.
31. The device according to claim 28, characterized in that the calculation cost calculation factor obtaining subelement is further used for:
Obtaining, for each day of the most recent m days, the number of data tables that scanned the conventional data table, and the average number of child tables of the conventional data table over the most recent m days;
Calculating the calculation cost calculation factor by the following formula, according to the number of data tables that scanned the conventional data table on each day of the most recent m days and the average number of child tables of the conventional data table over the most recent m days:
$$\mathrm{compfac}(i, j) = \frac{1}{\mathrm{avg}\left(\sum_{m=1}^{90} \mathrm{scan}_m(j)\right)}$$
where m indexes each day of the most recent m days;
scan_m(j) is the number of data tables that scanned conventional data table j on day m;
the denominator is, by way of example, the average number of child tables of conventional data table j over the most recent 90 days.
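The factor of claim 31 can be sketched as follows. This is a minimal illustration that reads the denominator as the mean of the daily scanner counts over the window, which is an interpretation of the claim's wording rather than something the patent states in code; the function name and the input list are hypothetical.

```python
def compute_cost_factor(daily_scan_counts):
    """compfac(i, j) = 1 / (average number of tables scanning table j per day).

    daily_scan_counts: one entry per day in the window (e.g. 90 entries),
    each the number of data tables that scanned conventional data table j that day.
    Assumption: avg(...) is taken as the arithmetic mean of these daily counts.
    """
    if not daily_scan_counts:
        raise ValueError("need at least one day of scan counts")
    average = sum(daily_scan_counts) / len(daily_scan_counts)
    if average == 0:
        raise ValueError("table j was never scanned in the window")
    return 1.0 / average


# Hypothetical window: table j is scanned by 4 downstream tables every day for 90 days.
factor = compute_cost_factor([4] * 90)
```

Under this reading, each downstream table is charged an equal share of the calculation cost of table j, so the more tables scan j, the smaller each share becomes.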
32. The device according to claim 29, characterized in that the carrying cost calculation factor obtaining subelement is further used for:
Obtaining the scanning amount of the external data table on the conventional data table on which it depends, and the k tables that have a dependency relationship with the conventional data table;
Calculating the carrying cost calculation factor by the following formula, according to the scanning amount of the external data table on the conventional data table on which it depends and the k tables that have a dependency relationship with the conventional data table:
$$\mathrm{storfac}(i, j) = \frac{\mathrm{scansize}(i, j)}{\sum_{m=1}^{k} \mathrm{scansize}(m, j)}$$
where scansize(i, j) is the scanning amount of external data table i on conventional data table j;
m ranges over the k tables that have a dependency relationship with conventional data table j, numbered 1 ... k.
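The proportional split described in claim 32 can be sketched as below. The function name, the dictionary layout, and the table names are hypothetical; the sketch only shows that each dependent table's factor is its own scanning amount divided by the total scanning amount of all k dependent tables.

```python
def storage_cost_factors(scan_sizes):
    """Return storfac(i, j) for every table i that depends on conventional data table j.

    scan_sizes maps each dependent table name to scansize(i, j), its scanning
    amount on table j (hypothetical input layout).
    """
    total = sum(scan_sizes.values())
    if total <= 0:
        raise ValueError("total scanning amount must be positive")
    return {table: size / total for table, size in scan_sizes.items()}


# Hypothetical scanning amounts of two external tables on the same conventional table.
factors = storage_cost_factors({"ext_a": 30.0, "ext_b": 10.0})
```

Because every factor is a share of the same total, the factors of all k dependent tables add up to 1, so the carrying cost of table j is fully distributed among them.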
33. The device according to claim 30, characterized in that the scanning cost calculation factor obtaining subelement is further used for:
Obtaining the proportion of temperature (hot) fields in the conventional data table, and the dependency level of the conventional data table in the current data common layer, a temperature field being a field whose number of uses within a certain time period exceeds the number of direct downstream data tables of the conventional data table;
Calculating the scanning cost calculation factor by the following formula, according to the proportion of temperature fields in the conventional data table and the level of the conventional data table in the current data common layer:
$$\mathrm{scanfac}(i, j) = \frac{\mathrm{hot\_ratio}(j)}{\log_2\left(\mathrm{level}(j) + 1\right)}$$
where hot_ratio(j) is the ratio of the number of temperature fields of conventional data table j to the total number of fields in the table;
level(j) is the dependency level of conventional data table j in the data common layer.
34. The device according to claim 31, 32, or 33, characterized in that the use cost data of the external data table is calculated from the use cost characteristic parameters by the following formula:
$$\mathrm{Cost}(i, j) = \mathrm{compcost}(j) \cdot \mathrm{compfac}(i, j) + \mathrm{storcost}(j) \cdot \mathrm{storfac}(i, j) + \mathrm{scancost}(j) \cdot \mathrm{scanfac}(i, j)$$
where i is an external data table, j is a conventional data table, and a dependency relationship exists between data table i and data table j;
Cost(i, j) is the use cost data of external data table i using conventional data table j;
compcost(j) is the first calculation cost parameter in the processing cost data of conventional data table j;
compfac(i, j) is the calculation cost calculation factor between external data table i and conventional data table j;
storcost(j) is the first carrying cost parameter in the processing cost data of conventional data table j;
storfac(i, j) is the carrying cost calculation factor between external data table i and conventional data table j;
scancost(j) is the first scanning cost parameter in the processing cost data of conventional data table j;
scanfac(i, j) is the scanning cost calculation factor between external data table i and conventional data table j.
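To make the combination in claim 34 concrete, the following minimal sketch multiplies each first cost parameter of conventional data table j by the matching factor for the pair (i, j) and adds the three products. The class name, function name, and numbers are hypothetical; the factors are taken as given inputs, however they were obtained.

```python
from dataclasses import dataclass


@dataclass
class ProcessingCost:
    """First cost parameters of a conventional data table j (hypothetical container)."""
    compcost: float
    storcost: float
    scancost: float


def use_cost(table_j, compfac, storfac, scanfac):
    """Cost(i, j): each first cost parameter of j corrected by its factor for (i, j)."""
    return (table_j.compcost * compfac
            + table_j.storcost * storfac
            + table_j.scancost * scanfac)


# Hypothetical figures: each product is a "second" cost parameter of external table i.
cost_ij = use_cost(ProcessingCost(compcost=100.0, storcost=40.0, scancost=20.0),
                   compfac=0.25, storfac=0.5, scanfac=0.5)
```

Each of the three products corresponds to one of the second cost parameters of claims 28 to 30, so the use cost is simply their sum.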
35. The device according to claim 20, 21, 22, 23, 25, 27, 28, 29, 30, 31, 32, or 33, characterized in that it further includes:
A first extraction module, for extracting the corresponding conventional data table when the processing cost data meets a first preset condition.
36. The device according to claim 35, characterized in that the first extraction module includes:
A first extracting submodule, for extracting a conventional data table when the ratio of its first carrying cost parameter to its first calculation cost parameter is higher than a first predetermined threshold;
And/or,
A second extracting submodule, for extracting a conventional data table when its first calculation cost parameter is higher than a second predetermined threshold;
And/or,
A third extracting submodule, for extracting a conventional data table when the ratio of its first scanning cost parameter to its first calculation cost parameter is higher than a third predetermined threshold;
And/or,
A fourth statistic submodule, for computing the sum of the second calculation cost parameters of the external data tables that have a direct dependency relationship with a conventional data table;
A fourth extracting submodule, for extracting the conventional data table when its first calculation cost parameter is greater than the sum of the second calculation cost parameters;
And/or,
A fifth statistic submodule, for computing the sum of the second carrying cost parameters of the external data tables that have a direct dependency relationship with a conventional data table;
A fifth extracting submodule, for extracting the conventional data table when its first carrying cost parameter is greater than the sum of the second carrying cost parameters;
And/or,
A sixth statistic submodule, for computing the sum of the second scanning cost parameters of the external data tables that have a direct dependency relationship with a conventional data table;
A sixth extracting submodule, for extracting the conventional data table when its first scanning cost parameter is greater than the sum of the second scanning cost parameters.
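The six extraction conditions of claim 36 can be sketched as a single check that reports which conditions a conventional data table meets. This is a hedged illustration: the function name, the dictionary keys, and the threshold values are hypothetical, and because the claim joins the submodules with "and/or", a real device might implement only some of them.

```python
def flag_conventional_table(first, downstream_sums, t1, t2, t3):
    """Return the numbers (1-6) of the claim-36 conditions that the table meets.

    first: first cost parameters of the table, keys "comp", "stor", "scan".
    downstream_sums: sums of the second cost parameters of the directly
        dependent external tables, same keys.
    t1, t2, t3: first, second and third predetermined thresholds (hypothetical values).
    """
    comp, stor, scan = first["comp"], first["stor"], first["scan"]
    met = []
    if comp > 0 and stor / comp > t1:      # 1: carrying/calculation ratio too high
        met.append(1)
    if comp > t2:                          # 2: calculation cost too high
        met.append(2)
    if comp > 0 and scan / comp > t3:      # 3: scanning/calculation ratio too high
        met.append(3)
    if comp > downstream_sums["comp"]:     # 4: calculation cost exceeds downstream sum
        met.append(4)
    if stor > downstream_sums["stor"]:     # 5: carrying cost exceeds downstream sum
        met.append(5)
    if scan > downstream_sums["scan"]:     # 6: scanning cost exceeds downstream sum
        met.append(6)
    return met


# Hypothetical table: heavy on storage relative to both computation and downstream usage.
met = flag_conventional_table(
    first={"comp": 100.0, "stor": 300.0, "scan": 10.0},
    downstream_sums={"comp": 120.0, "stor": 250.0, "scan": 10.0},
    t1=2.0, t2=500.0, t3=0.5,
)
```

In this example the table is extracted under conditions 1 and 5 only, since the remaining comparisons are not strictly exceeded.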
37. The device according to claim 20, 21, 22, 23, 25, 27, 28, 29, 30, 31, 32, 33, or 36, characterized in that it further includes:
A second extraction module, for extracting the corresponding external data table when the use cost data meets a second preset condition.
38. the device according to claim 37, it is characterised in that second extraction module Including:
7th extracting sub-module, for the second carrying cost parameter in certain external data table and the When two ratios for calculating cost parameter are higher than four predetermined threshold values, the external data table is extracted;
And/or,
8th extracting sub-module, for that can be obtained in certain external data table from other conventional data tables With current conventional data table identical data, and by other conventional data tables obtain data when Second scanning cost parameter be less than from current conventional data table obtain data when second scanning cost During parameter, the external data table is extracted.
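The two conditions of claim 38 can be sketched in the same style. The function name, argument names, and figures are hypothetical; in particular, the list of alternative scanning costs assumes that some other component has already established which other conventional data tables provide the same data.

```python
def flag_external_table(second_comp, second_stor, current_scan, alternative_scans, t4):
    """Return the numbers (7, 8) of the claim-38 conditions that the external table meets.

    second_comp, second_stor: second calculation and carrying cost parameters.
    current_scan: second scanning cost parameter against the current conventional table.
    alternative_scans: second scanning cost parameters the table would have if it read
        the same data from other conventional tables (hypothetical, precomputed input).
    t4: fourth predetermined threshold (hypothetical value).
    """
    met = []
    if second_comp > 0 and second_stor / second_comp > t4:
        met.append(7)  # carrying/calculation ratio above the fourth threshold
    if any(scan < current_scan for scan in alternative_scans):
        met.append(8)  # a cheaper source of the same data exists
    return met


# Hypothetical external table with one cheaper alternative source (6.0 < 8.0).
met_external = flag_external_table(10.0, 50.0, 8.0, [9.0, 6.0], t4=4.0)
```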
CN201610042109.0A 2016-01-21 2016-01-21 Data table analysis processing method and device Active CN106991101B (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN201610042109.0A CN106991101B (en) 2016-01-21 2016-01-21 Data table analysis processing method and device
PCT/CN2017/070977 WO2017124959A1 (en) 2016-01-21 2017-01-12 Method and device for use in analyzing data table
EP17740990.1A EP3407212A4 (en) 2016-01-21 2017-01-12 Method and device for use in analyzing data table
TW106101915A TW201732641A (en) 2016-01-21 2017-01-19 Method and device for use in analyzing data table
US16/041,336 US10909481B2 (en) 2016-01-21 2018-07-20 Method and apparatus for analyzing data table

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610042109.0A CN106991101B (en) 2016-01-21 2016-01-21 Data table analysis processing method and device

Publications (2)

Publication Number Publication Date
CN106991101A CN106991101A (en) 2017-07-28
CN106991101B CN106991101B (en) 2021-02-02

Family

ID=59361344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610042109.0A Active CN106991101B (en) 2016-01-21 2016-01-21 Data table analysis processing method and device

Country Status (5)

Country Link
US (1) US10909481B2 (en)
EP (1) EP3407212A4 (en)
CN (1) CN106991101B (en)
TW (1) TW201732641A (en)
WO (1) WO2017124959A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110457329B (en) * 2019-08-16 2022-05-06 第四范式(北京)技术有限公司 Method and device for realizing personalized recommendation

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5995958A (en) * 1997-03-04 1999-11-30 Xu; Kevin Houzhi System and method for storing and managing functions
US7260563B1 (en) * 2003-10-08 2007-08-21 Ncr Corp. Efficient costing for inclusion merge join
US8280876B2 (en) * 2007-05-11 2012-10-02 Nec Corporation System, method, and program product for database restructuring support
CN100483395C (en) * 2007-05-25 2009-04-29 金蝶软件(中国)有限公司 Electronic data table calculation chain generation method and device
US9020910B2 (en) * 2010-01-13 2015-04-28 International Business Machines Corporation Storing tables in a database system
CN102436494B (en) * 2011-11-11 2013-05-01 中国工商银行股份有限公司 Device and method for optimizing execution plan and based on practice testing
US9171158B2 (en) * 2011-12-12 2015-10-27 International Business Machines Corporation Dynamic anomaly, association and clustering detection
US10019478B2 (en) * 2013-09-05 2018-07-10 Futurewei Technologies, Inc. Mechanism for optimizing parallel execution of queries on symmetric resources

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060253473A1 (en) * 2005-05-06 2006-11-09 Microsoft Corporation Integrating vertical partitioning into physical database design
US20130031064A1 (en) * 2009-12-22 2013-01-31 At&T Intellectual Property I, L.P. Compressing Massive Relational Data
CN104899209A (en) * 2014-03-05 2015-09-09 阿里巴巴集团控股有限公司 Optimization method and device for open type data processing service
US20150347473A1 (en) * 2014-05-29 2015-12-03 International Business Machines Corporation Database partition
CN105224536A (en) * 2014-05-29 2016-01-06 国际商业机器公司 The method and apparatus of partition database

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110517009A (en) * 2019-07-29 2019-11-29 阿里巴巴集团控股有限公司 Real-time common layer building method, device and server
CN110517009B (en) * 2019-07-29 2023-01-24 创新先进技术有限公司 Real-time public layer construction method and device and server
WO2021174945A1 (en) * 2020-10-21 2021-09-10 平安科技(深圳)有限公司 Data cost calculation method, system, computer device, and storage medium

Also Published As

Publication number Publication date
WO2017124959A1 (en) 2017-07-27
TW201732641A (en) 2017-09-16
US10909481B2 (en) 2021-02-02
EP3407212A4 (en) 2019-06-19
US20180349811A1 (en) 2018-12-06
CN106991101B (en) 2021-02-02
EP3407212A1 (en) 2018-11-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211110

Address after: Room 507, floor 5, building 3, No. 969, Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province

Patentee after: Zhejiang Tmall Technology Co., Ltd.

Address before: P.O. Box 847, 4th floor, Grand Cayman capital building, British Cayman Islands

Patentee before: Alibaba Group Holding Limited
