CN106909566A - Data modeling method and device - Google Patents

Data modeling method and device

Info

Publication number
CN106909566A
Authority
CN
China
Prior art keywords
master table
data modeling
data
metadata
business
Prior art date
Legal status
Pending
Application number
CN201510980569.3A
Other languages
Chinese (zh)
Inventor
王赛
赵唯行
王永伟
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201510980569.3A
Publication of CN106909566A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

This application discloses a data modeling method. A master table for data modeling is determined according to the metadata of each source table, and the type of the target table to be generated by data modeling is determined according to the business meaning of the master table. Slave tables for data modeling are then determined according to the metadata of the master table, and the fields used for data modeling are selected from the master table and the slave tables. Finally, data modeling is performed on the master table, the slave tables, and the selected fields to generate the target table. Data modeling can thus be performed accurately on the basis of table metadata, ensuring the accuracy and efficiency of the modeling result.

Description

Data modeling method and device
Technical field
The present application relates to the field of communication technology, and in particular to a data modeling method. The application also relates to a data modeling device.
Background
With the continuous development of network technology, databases have been widely applied in the field of information technology. Nearly every sector of society maintains databases that store all kinds of data related to people's lives. Data warehouses emerged to manage such data in a unified way and to provide better services. A data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data used to support management decision making; it is created for analytical reporting and decision support. For enterprises that need business intelligence, it provides guidance for business-process improvement and for monitoring time, cost, quality, and control.
Data modeling is one of the key processes in building a data warehouse. It refers to abstracting and organizing the various kinds of real-world data, determining the scope of data the warehouse must manage, how that data is organized, and so on, until an actual data warehouse is produced. Turning data warehouse model construction into a tool-supported process can solve the industry's long-standing reliance on experience-based and manual modeling, and better supports the construction and optimization of the group-wide common data layer.
During data modeling, a data modeler who does not fully understand the business usually first investigates how the data is used downstream and then models according to the findings. Because such downstream surveys require a great deal of manpower, this approach is inefficient and the surveys are often incomplete, so much effort yields little result. A data modeler who does understand the business typically models from experience. Without data-driven guidance, however, the accuracy of the resulting model cannot be guaranteed.
Hence, how to perform data modeling quickly while ensuring accuracy has become a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
This application provides a data modeling method for improving both the accuracy and the efficiency of data modeling. The method includes:
determining, according to the metadata of each source table, a master table for data modeling;
determining, according to the business meaning of the master table, the type of the target table to be generated by the data modeling;
determining, according to the metadata of the master table, slave tables for data modeling;
selecting, from the master table and the slave tables, the fields used for data modeling;
performing data modeling on the master table, the slave tables, and the fields to generate the target table.
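The five steps above can be wired together as a small pipeline. The sketch below is only an illustration under stated assumptions: the `TableMeta` record, the "event"/"entity" encoding of business meaning, and the selection rules (most downstream consumers for the master table, a join-count threshold for slave tables, a query-count threshold for fields) are hypothetical choices, not the policies defined by the application.

```python
from dataclasses import dataclass, field

@dataclass
class TableMeta:
    # Hypothetical metadata record; attribute names are illustrative assumptions.
    name: str
    downstream_count: int    # number of downstream consumers of this table
    business_meaning: str    # assumed encoding: "event" or "entity"
    join_counts: dict = field(default_factory=dict)    # other table -> join count with this one
    field_queries: dict = field(default_factory=dict)  # field -> downstream query count

def build_model(sources, join_threshold=5, query_threshold=1):
    # Step 1 (assumed policy): the master table is the most widely used source table.
    master = max(sources, key=lambda t: t.downstream_count)
    # Step 2: event-like business meaning -> fact table, entity-like -> dimension table.
    target_type = "fact" if master.business_meaning == "event" else "dimension"
    # Step 3: slave tables are tables joined with the master at least join_threshold times.
    by_name = {t.name: t for t in sources}
    slaves = [by_name[n] for n, c in master.join_counts.items()
              if c >= join_threshold and n in by_name]
    # Step 4: keep only fields that downstream consumers actually query.
    fields = [(t.name, f) for t in [master, *slaves]
              for f, q in t.field_queries.items() if q >= query_threshold]
    # Step 5: hand a model specification to whatever modeling tool generates the target table.
    return {"master": master.name, "type": target_type,
            "slaves": [s.name for s in slaves], "fields": fields}

# Hypothetical example tables.
orders = TableMeta("orders", 40, "event", {"users": 12, "logs": 2},
                   {"order_id": 30, "user_id": 25, "memo": 0})
users = TableMeta("users", 10, "entity", {}, {"user_id": 20, "nick": 5})
logs = TableMeta("logs", 3, "event", {}, {"url": 4})
spec = build_model([orders, users, logs])
```

Here `orders` becomes the master table of a fact-type target, `users` is kept as a slave table, `logs` is dropped because it is joined too rarely, and the never-queried `memo` field is left out.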
Preferably, determining the type of the target table according to the business meaning of the master table specifically includes:
if the business meaning of the master table indicates that the target table is a fact table, determining the specific type of the fact table according to the metadata of the master table, the specific type being one of: transaction fact table, periodic snapshot fact table, and accumulating snapshot fact table;
if the business meaning of the master table indicates that the target table is a dimension table, determining according to the metadata of the master table whether the dimension table needs to be split and, if so, the split mode, the split mode being either a horizontal split or a vertical split.
Preferably, the metadata includes downstream usage information, and determining the slave tables according to the metadata of the master table specifically includes:
obtaining, according to the downstream usage information, the data tables associated with the master table;
obtaining the association information between the master table and each of those data tables, and taking as slave tables the data tables whose association information matches a preset selection policy.
Preferably, selecting the fields for data modeling from the master table and the slave tables specifically includes:
obtaining, according to the metadata, field usage information for the master table and for each slave table;
selecting the fields according to the field usage information;
wherein the field usage information includes at least: field query count, filter-condition count, join count, aggregation count, null-value ratio, and enumerated-value ratio.
Preferably, before data modeling is performed on the master table, the slave tables, and the fields, the method further includes:
when the target table is a transaction fact table, identifying the business processes of the master table according to the downstream usage information, and determining whether to generate a single-event fact table or a multi-event fact table;
when the target table is an accumulating snapshot fact table, identifying the business processes of the master table in the same way as for a transaction fact table, and also identifying the business processes of the other fact tables currently used in the data modeling;
when the target table is a dimension table and the split mode is a horizontal split, horizontally splitting the master table into multiple dimension tables according to the field usage information of the master table;
when the target table is a dimension table and the split mode is a vertical split, according to the association information between the master table and each slave table, generating a core dimension table by data modeling from the master table and the slave tables whose business change is higher than a preset threshold, and generating a custom dimension table by data modeling from the slave tables whose business change is not higher than the preset threshold.
Correspondingly, the application also proposes a data modeling device, including:
a first determining module, configured to determine, according to the metadata of each source table, a master table for data modeling;
a second determining module, configured to determine, according to the business meaning of the master table, the type of the target table to be generated by the data modeling;
a third determining module, configured to determine, according to the metadata of the master table, slave tables for data modeling;
a selecting module, configured to select, from the master table and the slave tables, the fields used for data modeling;
a modeling module, configured to perform data modeling on the master table, the slave tables, and the fields to generate the target table.
Preferably, the second determining module is specifically configured to:
if the business meaning of the master table indicates that the target table is a fact table, determine the specific type of the fact table according to the metadata of the master table, the specific type being one of: transaction fact table, periodic snapshot fact table, and accumulating snapshot fact table;
if the business meaning of the master table indicates that the target table is a dimension table, determine according to the metadata of the master table whether the dimension table needs to be split and, if so, the split mode, the split mode being either a horizontal split or a vertical split.
Preferably, the metadata includes downstream usage information, and the third determining module is specifically configured to:
obtain, according to the downstream usage information, the data tables associated with the master table;
obtain the association information between the master table and each of those data tables, and take as slave tables the data tables whose association information matches a preset selection policy.
Preferably, the selecting module is specifically configured to:
obtain, according to the metadata, field usage information for the master table and for each slave table;
select the fields according to the field usage information;
wherein the field usage information includes at least: field query count, filter-condition count, join count, aggregation count, null-value ratio, and enumerated-value ratio.
Preferably, the device further includes a processing module, wherein:
when the target table is a transaction fact table, the processing module identifies the business processes of the master table according to the downstream usage information and determines whether to generate a single-event fact table or a multi-event fact table;
when the target table is an accumulating snapshot fact table, the processing module identifies the business processes of the master table in the same way as for a transaction fact table, and also identifies the business processes of the other fact tables currently used in the data modeling;
when the target table is a dimension table and the split mode is a horizontal split, the processing module horizontally splits the master table into multiple dimension tables according to the field usage information of the master table;
when the target table is a dimension table and the split mode is a vertical split, the processing module, according to the association information between the master table and each slave table, generates a core dimension table by data modeling from the master table and the slave tables whose business change is higher than a preset threshold, and generates a custom dimension table by data modeling from the slave tables whose business change is not higher than the preset threshold.
It can thus be seen that, with the technical solution of this application, a master table for data modeling is determined according to the metadata of each source table; after the type of the target table is determined according to the business meaning of the master table, slave tables are determined according to the metadata of the master table, the fields for data modeling are selected from the master table and the slave tables, and finally data modeling is performed on the master table, the slave tables, and the fields to generate the target table. Data modeling can therefore be performed accurately on the basis of table metadata, ensuring the accuracy and efficiency of the modeling result.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the data modeling method proposed by the application;
Fig. 2 is a schematic diagram of the relationship between the source tables and the target table in a specific embodiment of the application;
Fig. 3 is a schematic diagram of the main modules in a specific embodiment of the application;
Fig. 4 is a schematic structural diagram of the metadata processing module in a specific embodiment of the application;
Fig. 5 is a schematic flowchart of data modeling in a specific embodiment of the application;
Fig. 6 is a schematic structural diagram of the data modeling device proposed by the application.
Detailed description
In the existing field of data warehouse modeling, data warehouse model design mainly follows two schools: Inmon's "third normal form modeling" and Kimball's "dimensional modeling". Under either theory and method, the output of designing a particular data warehouse model can ultimately be represented as an ER diagram or an ER-like diagram. In addition, there are standalone tool products for data warehouse modeling, such as DMDWDesigner. These techniques can carry out the data modeling process once the master table and slave tables have been determined, but, as described in the background, these modeling approaches do not integrate the theory and methods of data warehouse modeling, and the modeling process lacks data-driven guidance (it is mainly experience-based), which makes the modeling result inaccurate.
In view of the above technical problem, the application proposes a data modeling method. As shown in Fig. 1, the method includes the following steps:
S101: determine, according to the metadata of each source table, a master table for data modeling.
Master-slave tables are a data relationship model. The master table is a table created in the database that has a primary key, which uniquely identifies each record in the master table and is used to associate it with other tables. A slave table is a table that uses the primary-key values of the master table as a foreign key, through which it can be joined with the master table in queries. Slave-table data depends on the master table, and queries usually join the master table with the slave table. The master table can store the main information, such as customer data (customer number, customer name, customer company, customer organization, and so on), while the slave table stores extended customer information (customer orders, customer addresses, customer contacts, and so on).
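The primary-key / foreign-key relationship between a master table and a slave table can be shown with Python's built-in `sqlite3` module. This is a minimal sketch; the `customer` and `customer_order` tables and their rows are hypothetical examples matching the customer data described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
# Master table: customer_id is the primary key that uniquely identifies each customer.
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
# Slave table: customer_id is a foreign key referencing the master table's primary key.
conn.execute("""CREATE TABLE customer_order (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount REAL)""")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO customer_order VALUES (?, ?, ?)",
                 [(10, 1, 99.5), (11, 1, 20.0), (12, 2, 5.0)])

# Associated query: join the slave table to the master table through the foreign key.
rows = conn.execute("""SELECT c.name, o.order_id, o.amount
                       FROM customer AS c
                       JOIN customer_order AS o ON o.customer_id = c.customer_id
                       ORDER BY o.order_id""").fetchall()

# Slave-table data depends on the master table: an order for a missing customer is rejected.
try:
    conn.execute("INSERT INTO customer_order VALUES (13, 3, 1.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```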
In the technical solution of the application, because multiple data tables are turned into a target table through data modeling, a master table must first be selected from the current data tables (also called source tables). For example, a target table may be derived from source tables 1 to m, i.e., source table 1, source table 2, ..., source table m. Suppose source table 1 is the selected master table; the other source tables (source table 2, ..., source table m) are then the selected slave tables. In general, only one master table is involved in a data modeling run. The process is shown in Fig. 2.
S102: determine, according to the business meaning of the master table, the type of the target table to be generated by the data modeling.
This step determines the type of the target table. Broadly, target tables fall into two types, fact tables and dimension tables, with the following characteristics:
(1) Fact table
Each data warehouse contains one or more fact tables. A fact table may contain business sales data, such as the data produced by cash-register transactions, and usually contains a large number of rows. The main characteristic of a fact table is that it contains numerical data (facts) that can be aggregated to provide historical data about the business. Each fact table contains an index made up of several parts, which includes, as foreign keys, the primary keys of the related dimension tables. A fact table contains no descriptive information, and no data other than the numeric measure fields and the index fields that relate the facts to the corresponding entries in the dimension tables. There are two kinds of measures in a fact table: additive measures and non-additive measures. The most useful measures are additive, because their sums are meaningful. Users can obtain summary information by summing additive measures, for example the sales of a particular product across a group of stores over a particular period. Non-additive measures can also be used in fact tables: for example, when temperature is measured at different locations in a building, summing the temperatures of all locations is meaningless, but averaging them is meaningful. A fact table is associated with one or more dimension tables, and users can use one or more dimension tables when creating a cube from a fact table.
(2) Dimension table
A dimension table can be seen as the window through which users analyze data. It contains the attributes of the fact records in a fact table: some attributes provide descriptive information, while others specify how fact-table data is summarized to give analysts useful information. A dimension table contains hierarchies of attributes that help summarize data. For example, a product dimension table usually contains a hierarchy that divides products into categories such as food, beverages, and non-consumables, each of which is further subdivided, repeatedly, until every product reaches the lowest level.
Each dimension table contains attributes that are independent of the other dimension tables; for example, a customer dimension table contains data about customers. The columns of a dimension table can divide information into different levels of a hierarchy.
Building on the above, if a finer division is needed in this step, then when the target table is a fact table it can further be determined whether it is a transaction fact table, a periodic snapshot fact table, or an accumulating snapshot fact table; when the target table is a dimension table, it can further be determined whether to split it horizontally into several parallel dimension tables, split it vertically into a core dimension table and a custom dimension table, or leave it unsplit. In a preferred embodiment of the application, this is handled as follows:
(1) if the business meaning of the master table indicates that the target table is a fact table, the specific type of the fact table is determined according to the metadata of the master table, the specific type being one of: transaction fact table, periodic snapshot fact table, and accumulating snapshot fact table;
(2) if the business meaning of the master table indicates that the target table is a dimension table, whether the dimension table needs to be split and, if so, the split mode are determined according to the metadata of the master table, the split mode being either a horizontal split or a vertical split.
S103: determine, according to the metadata of the master table, slave tables for data modeling.
Because the purpose of data modeling is to aggregate and associate related data tables, in a preferred embodiment of the application the slave tables are selected mainly on the basis of the downstream usage information in the metadata. Specifically, the data tables associated with the master table are first obtained from the downstream usage information; then the association information between the master table and each of those data tables is obtained, and the data tables whose association information matches a preset selection policy are taken as slave tables.
S104: select, from the master table and the slave tables, the fields used for data modeling.
Based on the slave tables selected in S103, this step selects the fields needed for data modeling using the information about each field of the master table and the slave tables. In a preferred embodiment of the application, field usage information for the master table and the slave tables is first obtained from the metadata, and the fields are then selected according to that information.
Considering the factors that data modeling may need to take into account, the field usage information should include at least: field query count, filter-condition count, join count, aggregation count, null-value ratio, and enumerated-value ratio. Those skilled in the art can extend this list, and such extensions fall within the protection scope of the application.
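Field selection from usage information might look like the sketch below. The application lists the metrics but not the decision rule, so the rule here (keep a field used downstream at least `min_uses` times and not almost always NULL) and both thresholds are assumptions; the meaning of "enumerated-value ratio" is also assumed to be distinct values divided by non-null values.

```python
from dataclasses import dataclass

@dataclass
class FieldUsage:
    # Usage counts come from downstream metadata; the two ratios are profiled from the data.
    name: str
    query_count: int
    filter_count: int
    join_count: int
    agg_count: int
    null_ratio: float   # share of NULL values in the column
    enum_ratio: float   # assumed meaning: distinct values / non-null values

def profile(values):
    # Compute (null_ratio, enum_ratio) from a sample of column values.
    non_null = [v for v in values if v is not None]
    null_ratio = 1 - len(non_null) / len(values) if values else 1.0
    enum_ratio = len(set(non_null)) / len(non_null) if non_null else 0.0
    return null_ratio, enum_ratio

def select_fields(usages, min_uses=1, max_null=0.9):
    # Hypothetical rule: used downstream often enough and not (almost) always NULL.
    picked = []
    for u in usages:
        uses = u.query_count + u.filter_count + u.join_count + u.agg_count
        if uses >= min_uses and u.null_ratio <= max_null:
            picked.append(u.name)
    return picked

# Hypothetical field statistics for one table.
usages = [
    FieldUsage("item_id", 50, 10, 30, 0, 0.0, 1.0),
    FieldUsage("status", 5, 20, 0, 3, 0.0, 0.001),     # low enum ratio: enum-like filter field
    FieldUsage("ext_json", 0, 0, 0, 0, 0.2, 0.9),      # never used downstream
    FieldUsage("legacy_flag", 2, 0, 0, 0, 0.95, 0.5),  # almost always NULL
]
picked = select_fields(usages)
```

With these inputs only `item_id` and `status` are kept.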
S105: perform data modeling on the master table, the slave tables, and the fields to generate the target table.
As described above, whether the target table is a fact table or a dimension table is generally determined from the business meaning of the master table. For example, if the business meaning of the master table is an event (such as someone doing something at a certain place and time), the target table is generally a fact table; if the business meaning of the master table is an entity (such as a product or a buyer), the target table is generally a dimension table. Accordingly, once the target table is determined to be a fact table, it must further be determined whether it is a transaction fact table, a periodic snapshot fact table, or an accumulating snapshot fact table. Once the target table is determined to be a dimension table, it must be determined whether to split the master table: horizontally into several parallel dimension tables, vertically into a core dimension table and a custom dimension table, or not at all.
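The decision tree in this paragraph can be sketched as a single classification function. Only the event-to-fact and entity-to-dimension rule comes from the text; the metadata signals used for the finer decisions (`snapshot_period`, `milestone_fields`, `split_by_field`, `mixed_output_times`) are hypothetical stand-ins for whatever master-table metadata an implementation would consult.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MasterMeta:
    # All attributes except business_meaning are hypothetical metadata signals.
    business_meaning: str                   # "event" (who did what, where, when) or "entity"
    snapshot_period: Optional[str] = None   # e.g. "daily" if rows are periodic snapshots
    milestone_fields: int = 1               # business-process time fields updated in place
    split_by_field: Optional[str] = None    # field nearly every consumer filters on (e.g. BU)
    mixed_output_times: bool = False        # related slave tables are produced at different times

def target_type(m: MasterMeta):
    if m.business_meaning == "event":
        # Event-like master table: a fact table; pick the fact-table subtype.
        if m.snapshot_period:
            return ("fact", "periodic_snapshot")
        if m.milestone_fields > 1:
            return ("fact", "accumulating_snapshot")
        return ("fact", "transaction")
    # Entity-like master table: a dimension table; pick the split mode.
    if m.split_by_field:
        return ("dimension", "horizontal_split")
    if m.mixed_output_times:
        return ("dimension", "vertical_split")
    return ("dimension", "no_split")
```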
Given the above, before this step is finally carried out, the preferred embodiment of the application handles the different cases as follows:
(1) when the target table is a transaction fact table, identify the business processes of the master table according to the downstream usage information, and determine whether to generate a single-event fact table or a multi-event fact table;
(2) when the target table is an accumulating snapshot fact table, identify the business processes of the master table in the same way as for a transaction fact table, and also identify the business processes of the other fact tables currently used in the data modeling.
In a specific embodiment of the application, for a transaction fact table the business processes of the master table must be identified to decide whether to generate a single-event or a multi-event fact table. A business process is usually a natural business activity of the source system; a trade, for example, is typically completed through the business processes of placing an order, payment, and transaction completion. Business-process identification generally relies on downstream field usage metadata, mainly each field's filter-condition count. Business-process fields are usually time fields, and the time fields used most often as filter conditions downstream are the business processes this application focuses on.
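A minimal sketch of this identification rule: among the time fields, keep those that downstream queries filter on at least `min_filters` times, rank them by filter count, and treat one surviving field as a single-event fact table and several as a multi-event fact table. The threshold value and the mapping from field count to fact-table kind are assumptions; the field names are hypothetical.

```python
def identify_business_processes(filter_counts, time_fields, min_filters=10):
    """Return (business-process fields, fact-table kind).

    filter_counts: field -> number of downstream queries that filter on it.
    time_fields: fields holding timestamps (assumed known from the table schema).
    min_filters: hypothetical threshold for treating a time field as a business process.
    """
    processes = [f for f in time_fields if filter_counts.get(f, 0) >= min_filters]
    # Most frequently filtered first: these are the processes downstream cares most about.
    processes.sort(key=lambda f: filter_counts.get(f, 0), reverse=True)
    if not processes:
        return [], None
    return processes, ("single_event" if len(processes) == 1 else "multi_event")

# Hypothetical usage metadata: buyer_id is filtered often but is not a time field.
filters = {"gmt_create": 120, "pay_time": 45, "end_time": 3, "buyer_id": 200}
time_fields = ["end_time", "pay_time", "gmt_create"]
processes, kind = identify_business_processes(filters, time_fields)
```

Here `gmt_create` and `pay_time` qualify, giving a multi-event fact table; raising `min_filters` to 50 leaves only `gmt_create`.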
For a periodic snapshot fact table, the business process of the master table must be identified; once the business process for this modeling run has been identified, processing moves on to the next step.
For an accumulating snapshot fact table, the business processes of the master table are first identified as for a transaction fact table; the other fact tables involved in this modeling run are then introduced, and their business processes are likewise identified from the metadata. Once all the business processes involved have been identified, modeling proceeds to the next step.
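Once the business processes of the master table and of the other fact tables are identified, an accumulating snapshot fact table keeps one row per business key with one milestone column per process, filled in as each process happens. The sketch below assumes each process's fact table exposes the key and an `event_time` column; the layout and the order data are hypothetical.

```python
def build_accumulating_snapshot(key, process_tables):
    """Merge per-process fact rows into one accumulating-snapshot row per key.

    process_tables: business-process name -> rows of the fact table recording that process;
    each row has the key column and an "event_time" column (assumed layout).
    Milestones that have not happened yet stay None.
    """
    processes = list(process_tables)
    snapshot = {}
    for process, rows in process_tables.items():
        for row in rows:
            # First time this key is seen: create its row with every milestone still open.
            entry = snapshot.setdefault(row[key], {key: row[key], **dict.fromkeys(processes)})
            entry[process] = row["event_time"]
    return [snapshot[k] for k in sorted(snapshot)]

# Hypothetical business processes of a trade: order placed, paid, completed.
orders_by_process = {
    "order_time": [{"order_id": 1, "event_time": "10:00"},
                   {"order_id": 2, "event_time": "11:00"}],
    "pay_time": [{"order_id": 1, "event_time": "10:05"}],
    "done_time": [],
}
snapshot_rows = build_accumulating_snapshot("order_id", orders_by_process)
```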
(3) when the target table is a dimension table and the split mode is a horizontal split, horizontally split the master table into multiple dimension tables according to the field usage information of the master table;
(4) when the target table is a dimension table and the split mode is a vertical split, according to the association information between the master table and each slave table, generate a core dimension table by data modeling from the master table and the slave tables whose business change is higher than a preset threshold, and generate a custom dimension table by data modeling from the slave tables whose business change is not higher than the preset threshold.
A horizontal split generally relies on downstream field usage metadata, mainly each field's filter-condition count. For example, when several BUs (business units) share one product table and each BU filters on the BU field so that it only touches its own products, this specific embodiment splits the table horizontally by BU, producing one dimension table per BU.
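The BU example can be sketched in two parts: choose the split field as the one (almost) every downstream consumer filters on, then emit one dimension table per value of that field. The "share of consumers" rule and its 0.8 cut-off are assumptions; the field names and rows are hypothetical.

```python
from collections import defaultdict

def pick_split_field(filter_counts, consumers, min_share=0.8):
    """Return the field (almost) every downstream consumer filters on, or None.

    filter_counts: field -> number of distinct downstream consumers filtering on it.
    min_share: hypothetical cut-off for "almost every consumer".
    """
    if not filter_counts or consumers <= 0:
        return None
    best = max(filter_counts, key=filter_counts.get)
    return best if filter_counts[best] / consumers >= min_share else None

def horizontal_split(rows, split_field):
    # One dimension table (list of rows) per distinct value of the split field, e.g. per BU.
    parts = defaultdict(list)
    for row in rows:
        parts[row[split_field]].append(row)
    return dict(parts)

# Hypothetical shared product table and filter statistics from 10 downstream consumers.
items = [
    {"item_id": 1, "bu_id": "bu_a"},
    {"item_id": 2, "bu_id": "bu_b"},
    {"item_id": 3, "bu_id": "bu_a"},
]
split_field = pick_split_field({"bu_id": 9, "category_id": 4}, consumers=10)
tables = horizontal_split(items, split_field)
```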
A vertical split generally relies on metadata about the master table, the slave tables, the association between them, and their output times; changes in the business can also be taken into account, placing slave tables whose business changes often into the custom dimension table so as to reduce frequent changes to the target core dimension table. For example, suppose the metadata shows that associated table 1, associated table 2, and associated table 3 are each joined with the master table more often than a certain threshold, so this specific embodiment puts these three tables, together with the master table, into the target. However, the master table and associated table 1 are produced at 1:00 AM, while associated tables 2 and 3 are produced at 3:00 AM. So that downstream consumers of the master table and associated table 1 can use the data as early as possible, this specific embodiment models the master table and associated table 1 into a core dimension table, and models the master table, associated table 2, and associated table 3 into a custom dimension table.
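The worked example can be reproduced with a short function: slave tables joined with the master more often than a threshold are kept; those produced no later than the master go into the core dimension table, and later ones go, together with the master, into the custom dimension table. This follows the output-time criterion of the example only; the join counts and the threshold of 10 are hypothetical.

```python
def vertical_split(master, master_ready, candidates, join_threshold):
    """Split a dimension model into a core table and a custom table.

    master_ready: hour at which the master table is produced.
    candidates: slave table -> (join count with the master, hour at which it is produced).
    Returns (core table members, custom table members); custom is empty if nothing is late.
    """
    kept = {t: ready for t, (joins, ready) in candidates.items() if joins > join_threshold}
    core = [master] + [t for t, ready in kept.items() if ready <= master_ready]
    late = [t for t, ready in kept.items() if ready > master_ready]
    custom = [master] + late if late else []
    return core, custom

# Master and associated table 1 are ready at 1:00, tables 2 and 3 at 3:00 (as in the text);
# join counts are invented, and table4 is joined too rarely to be kept.
core, custom = vertical_split(
    "master", 1,
    {"table1": (20, 1), "table2": (15, 3), "table3": (12, 3), "table4": (2, 1)},
    join_threshold=10,
)
```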
After the preparation preceding data modeling has been completed through the above steps, data modeling can be performed with existing modeling tools to generate the target table. For example, ERwin, ER/Studio, and PowerDesigner abroad are ER design tools usable for operational systems (OLTP) or analytical systems (OLAP; a data warehouse is an OLAP system), and DMDWDesigner in China is a data warehouse modeling tool. As long as the target table can be generated, the choice of data modeling tool does not affect the protection scope of the application.
To further illustrate the technical idea of the application, the technical solution is now described with reference to the specific application scenario shown in Fig. 3 and Fig. 4. Fig. 3 shows the main modules in a specific embodiment of the application, namely a metadata processing module and a model construction module; Fig. 4 is a further breakdown of the metadata processing module in Fig. 3.
The downstream usage metadata of a table mainly includes the query count, the scheduling-system query count, the join count, the scheduling-system join count, the aggregation count, the scheduling-system aggregation count, the number of direct downstream consumers, the total number of downstream consumers, and so on, as shown in Table 1 below:
No. | Project | Table | Queries | Scheduling-system queries | JOIN count | Direct downstream | Total downstream
1 | A | A1 | 835.3 | 430.9 | 176 | 557 | 121496
2 | B | B1 | 343.7 | 160.4 | 127 | 290 | 70501
3 | C | C1 | 797.4 | 212.7 | 126 | 234 | 117312
4 | D | D1 | 229.2 | 155.2 | 114 | 206 | 160743
5 | E | E1 | 113.2 | 61.7 | 65 | 93 | 144155
Table 1
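Step a) of the embodiment (select an ODS-layer table with heavy downstream usage as the master table) can be sketched on the Table 1 figures. Table 1 has no warehouse-layer column, so the `layer_of` mapping is hypothetical, and which metric to rank by is an assumption: ranking by direct downstream count picks A1, while ranking by total downstream count picks D1.

```python
# Table 1 rows: (table, queries, JOIN count, direct downstream count, total downstream count)
TABLE_1 = [
    ("A1", 835.3, 176, 557, 121496),
    ("B1", 343.7, 127, 290, 70501),
    ("C1", 797.4, 126, 234, 117312),
    ("D1", 229.2, 114, 206, 160743),
    ("E1", 113.2, 65, 93, 144155),
]
DIRECT_DOWNSTREAM, TOTAL_DOWNSTREAM = 3, 4  # column indexes in the tuples above

def choose_master(rows, layer_of, metric=DIRECT_DOWNSTREAM):
    """Pick the ODS-layer table with the heaviest downstream usage.

    layer_of: table name -> warehouse layer (hypothetical; not part of Table 1).
    metric: column index used for ranking.
    """
    ods_rows = [r for r in rows if layer_of.get(r[0]) == "ods"]
    if not ods_rows:
        return None
    return max(ods_rows, key=lambda r: r[metric])[0]

# Hypothetical: treat every table in Table 1 as an ODS-layer table.
layers = {row[0]: "ods" for row in TABLE_1}
master = choose_master(TABLE_1, layers)
```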
The join-relationship metadata of a table mainly includes the join master table, the join slave table, the join type, the join count, the join logic, and so on, as shown in Table 2:
No. | Project | Associated table | Chinese name | Join count | Join logic
1 | F | F1 | Table 1 | 0 | Current master table
2 | G | G1 | Table 2 | 14 | Xx.url_item=t1.item_id
3 | H | H1 | Table 3 | 6 | Xx.url=t2.id
4 | I | I1 | Table 4 | 4 | Xx.cookieuid=t3.user_id
5 | J | J1 | Table 5 | 4 | Xx.visitor_id=t4.inf_user_id
Table 2
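Step c) of the embodiment selects slave tables whose join count with the master table exceeds a threshold. Applied to the Table 2 data, the sketch below keeps the associated tables and their join logic; the threshold value (5 here) is an assumption, since the text does not give one.

```python
# Table 2 rows: (associated table, join count with the master table, join logic)
TABLE_2 = [
    ("F1", 0, None),  # row 1 is the current master table itself
    ("G1", 14, "Xx.url_item=t1.item_id"),
    ("H1", 6, "Xx.url=t2.id"),
    ("I1", 4, "Xx.cookieuid=t3.user_id"),
    ("J1", 4, "Xx.visitor_id=t4.inf_user_id"),
]

def select_slaves(rows, master, min_joins):
    # Keep tables joined with the master more than min_joins times, with their join logic.
    return [(table, logic) for table, joins, logic in rows
            if table != master and joins > min_joins]

slaves = select_slaves(TABLE_2, master="F1", min_joins=5)
```

With a threshold of 5, G1 and H1 become slave tables; lowering it to 3 would also admit I1 and J1.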
The downstream field usage metadata of a table mainly includes the number of times each field is used downstream in WHERE, SELECT, JOIN, and GROUP BY clauses, together with the corresponding counts in the scheduling system, as shown in Table 3 below:
Table 3
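Per-clause field counts of this kind can be accumulated from downstream query logs. The sketch assumes the SQL has already been parsed (by a parser not shown here) into records listing the fields used in each clause; each field is counted once per query and clause. The query records are hypothetical.

```python
from collections import Counter

CLAUSES = ("select", "where", "join", "group_by")

def field_usage(queries):
    """Count how often each field appears in each clause across downstream queries.

    queries: list of dicts clause -> list of field names (assumed pre-parsed SQL).
    Returns field -> {clause: count}.
    """
    counts = Counter()
    for q in queries:
        for clause in CLAUSES:
            for f in set(q.get(clause, [])):  # count a field once per query and clause
                counts[(f, clause)] += 1
    usage = {}
    for (f, clause), n in counts.items():
        usage.setdefault(f, dict.fromkeys(CLAUSES, 0))[clause] = n
    return usage

# Hypothetical parsed downstream queries against a product table.
queries = [
    {"select": ["item_id", "title"], "where": ["bu_id"]},
    {"select": ["item_id"], "where": ["bu_id", "gmt_create"], "group_by": ["item_id"]},
]
usage = field_usage(queries)
```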
Based on the metadata tables above, the specific flow of this embodiment is shown schematically in FIG. 5 and mainly comprises the following steps:
Step a) Select the main table: referring to the metadata, select an ODS-layer table that has no intermediate-layer table built on it yet but is heavily used downstream.
Step b) Determine the target table: for a fact table, tag the business processes to decide whether to generate a single-transaction fact table or a multi-transaction fact table; for a dimension table, decide whether to split it horizontally or vertically.
Step c) Select the secondary tables: the metadata shows how the main table is used downstream, e.g. which tables it has been joined with, the join count, the join type, and so on. As an illustration, tables whose join count exceeds a certain threshold are selected here as secondary tables.
Step d) Select fields from the main table and the secondary tables: the metadata shows the field usage and data profile of the main and secondary tables, such as each field's query count, filter-condition count, join count, aggregate-statistics count, null-value ratio, and enumerated-value ratio. This embodiment uses these figures to guide field selection.
Step e) Generate the target model: the target model mainly consists of two parts. The first part is the ER diagram of the target model, i.e. which tables are joined to produce the target model and which of their fields are taken. The second part is an alternative view of the model, namely the model mapping, which includes the target table name and comment, the field names and types, the source tables, the source-table fields and types, the transformation logic, and so on.
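Steps d and e above can be sketched as follows. The scoring function is a hypothetical weighting (usage counts summed, then discounted by the null-value ratio), and the mapping rows simply copy each selected source field unchanged; the actual selection criteria and transformation logic are not specified in the text. Names such as `field_score`, `dwd_item_visit`, and the sample figures are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldUsage:
    """Field usage and profiling figures of the kind listed in Step d."""
    table: str
    field: str
    data_type: str
    query_count: int
    filter_count: int
    join_count: int
    aggregate_count: int
    null_ratio: float   # share of rows where the field is NULL
    enum_ratio: float   # profiling figure; carried along, unused by this score


def field_score(u: FieldUsage) -> float:
    # Hypothetical weighting: total usage, discounted for mostly-null fields.
    usage = u.query_count + u.filter_count + u.join_count + u.aggregate_count
    return usage * (1.0 - u.null_ratio)


def select_fields(
    usages: List[FieldUsage], min_score: float, max_null_ratio: float = 0.9
) -> List[FieldUsage]:
    """Step d: keep fields that are used often enough and are not mostly null."""
    return [
        u for u in usages
        if u.null_ratio <= max_null_ratio and field_score(u) >= min_score
    ]


def model_mapping(target_table: str, selected: List[FieldUsage]) -> List[Dict[str, str]]:
    """Step e (mapping part): one row per target field, copied unchanged."""
    return [
        {
            "target_table": target_table,
            "target_field": u.field,
            "target_type": u.data_type,
            "source_table": u.table,
            "source_field": u.field,
            "source_type": u.data_type,
            "transform": "direct copy",  # placeholder conversion logic
        }
        for u in selected
    ]


# Usage with made-up figures for a main table "xx" and a secondary table "t1".
usages = [
    FieldUsage("xx", "item_id", "bigint", 120, 40, 30, 10, 0.0, 0.01),
    FieldUsage("xx", "memo", "string", 2, 0, 0, 0, 0.95, 0.9),
    FieldUsage("t1", "item_title", "string", 50, 5, 0, 0, 0.1, 0.5),
]
picked = select_fields(usages, min_score=40)
mapping = model_mapping("dwd_item_visit", picked)
print([row["target_field"] for row in mapping])  # ['item_id', 'item_title']
```

The mostly-null `memo` field is dropped by the null-ratio filter before its score is even considered.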
The scheme of the above embodiment combines data-warehouse modeling theory and methods with a metadata-driven approach in which modeling decisions are guided by quantitative data, thereby improving the accuracy and efficiency of modeling.
To achieve the above technical objective, the present application further proposes a data modeling device, shown in FIG. 6, comprising:
First determining module 610, configured to determine, according to the metadata of each source table, the main table for data modeling;
Second determining module 620, configured to determine, according to the business meaning of the main table, the type of the target table to be generated by the data modeling;
Third determining module 630, configured to determine, according to the metadata of the main table, the secondary tables for data modeling;
Selecting module 640, configured to select fields for data modeling from the main table and the secondary tables;
Modeling module 650, configured to perform data modeling according to the main table, the secondary tables, and the fields, to generate the target table.
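The device of FIG. 6 is a pipeline of five modules. One hedged way to express that structure in Python is a class that takes each module as an injected callable and runs them in the order described above. The class and parameter names are hypothetical, and the lambdas in the usage lines are stubs standing in for real module logic.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence


@dataclass
class DataModelingDevice:
    """Hypothetical sketch of FIG. 6: each module is an injected callable."""
    determine_main_table: Callable[[Sequence[str]], str]            # module 610
    determine_target_type: Callable[[str], str]                      # module 620
    determine_secondary_tables: Callable[[str], List[str]]           # module 630
    select_fields: Callable[[str, List[str]], List[str]]             # module 640
    build_target_table: Callable[
        [str, str, List[str], List[str]], Dict[str, Any]
    ]                                                                # module 650

    def run(self, source_tables: Sequence[str]) -> Dict[str, Any]:
        main = self.determine_main_table(source_tables)
        target_type = self.determine_target_type(main)
        secondary = self.determine_secondary_tables(main)
        fields = self.select_fields(main, secondary)
        return self.build_target_table(main, target_type, secondary, fields)


# Usage with stub modules standing in for the real logic.
device = DataModelingDevice(
    determine_main_table=lambda tables: tables[0],
    determine_target_type=lambda main: "transaction fact table",
    determine_secondary_tables=lambda main: ["G1", "H1"],
    select_fields=lambda main, secondary: ["item_id", "url"],
    build_target_table=lambda main, kind, secondary, fields: {
        "main": main, "type": kind, "secondary": secondary, "fields": fields,
    },
)
result = device.run(["F1", "A1"])
print(result)
```

Injecting the modules keeps each step replaceable, which matches the note that modules may be merged or split differently in other implementation scenarios.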
In a specific application scenario, the second determining module is specifically configured to:
if the type of the target table is determined, according to the business meaning of the main table, to be a fact table, determine the specific type of the fact table according to the metadata of the main table, the specific type including a transaction fact table, a periodic snapshot fact table, and an accumulating snapshot fact table;
if the type of the target table is determined, according to the business meaning of the main table, to be a dimension table, determine according to the metadata of the main table whether the dimension table needs to be split and, if so, the split mode, the split mode including a horizontal split and a vertical split.
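The text names the target-table types and split modes but not the rules for choosing among them. The sketch below encodes the types as enums and applies one possible set of heuristics; every rule, the `wide_table_columns` cutoff, and the `MainTableProfile` fields are assumptions for illustration, not the claimed method.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class TargetType(Enum):
    TRANSACTION_FACT = "transaction fact table"
    PERIODIC_SNAPSHOT_FACT = "periodic snapshot fact table"
    ACCUMULATING_SNAPSHOT_FACT = "accumulating snapshot fact table"
    DIMENSION = "dimension table"


class SplitMode(Enum):
    NONE = "no split"
    HORIZONTAL = "horizontal split"
    VERTICAL = "vertical split"


@dataclass(frozen=True)
class MainTableProfile:
    """Hypothetical digest of the main table's business meaning and metadata."""
    is_fact: bool               # records business events or measurements
    is_periodic_snapshot: bool  # one row per entity per period (e.g. daily balance)
    milestone_time_fields: int  # timestamps for successive steps of one process
    subtype_count: int          # mutually exclusive record categories
    column_count: int


def decide_target(
    p: MainTableProfile, wide_table_columns: int = 100
) -> Tuple[TargetType, SplitMode]:
    """Assumed rules only; the patent does not spell these out."""
    if p.is_fact:
        if p.is_periodic_snapshot:
            return TargetType.PERIODIC_SNAPSHOT_FACT, SplitMode.NONE
        if p.milestone_time_fields >= 2:
            # Several milestones of one process (e.g. ordered, paid, shipped).
            return TargetType.ACCUMULATING_SNAPSHOT_FACT, SplitMode.NONE
        return TargetType.TRANSACTION_FACT, SplitMode.NONE
    if p.subtype_count > 1:
        return TargetType.DIMENSION, SplitMode.HORIZONTAL  # one table per category
    if p.column_count > wide_table_columns:
        return TargetType.DIMENSION, SplitMode.VERTICAL    # wide table: split into column groups
    return TargetType.DIMENSION, SplitMode.NONE


print(decide_target(MainTableProfile(True, False, 3, 1, 20)))
```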
In a specific application scenario, the metadata includes downstream usage information, and the third determining module is specifically configured to:
obtain, according to the downstream usage information, the data tables associated with the main table;
obtain the association information between the main table and each of these data tables, and take the data tables whose association information matches a preset selection strategy as the secondary tables.
In a specific application scenario, the selecting module is specifically configured to:
obtain, according to the metadata, the field usage information of the main table and of the secondary tables respectively;
select the fields according to the field usage information;
where the field usage information includes at least the field query count, the filter-condition count, the association count, the aggregate-statistics count, the null-value ratio, and the enumerated-value ratio.
In a specific application scenario, the device further includes a processing module, wherein:
when the target table is a transaction fact table, the processing module tags the business processes of the main table according to the downstream usage information, and determines whether to generate a single-transaction fact table or a multi-transaction fact table;
when the target table is an accumulating snapshot fact table, the processing module tags the business processes of the main table according to the transaction fact table, and tags the business processes of the other fact tables currently used for the data modeling;
when the target table is a dimension table and the split mode is a horizontal split, the processing module horizontally splits the main table into multiple dimension tables according to the field usage information of the main table;
when the target table is a dimension table and the split mode is a vertical split, the processing module, according to the association information between the main table and each secondary table, generates a core dimension table through the data modeling from the main table and the secondary tables whose business change is above a preset threshold, and generates a custom dimension table through the data modeling from the secondary tables whose business change is not above the preset threshold.
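For the vertical split, the rule described above is a threshold partition of the secondary tables. A minimal sketch, assuming the "business change" of each secondary table has already been reduced to a single number (a hypothetical `business_change` score) and using an illustrative source-table name `dim_user_src`:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SecondaryTableInfo:
    table: str
    business_change: float  # hypothetical numeric score for "business change"


def vertical_split(
    main_table: str, secondaries: List[SecondaryTableInfo], threshold: float
) -> Dict[str, List[str]]:
    """Partition source tables into core and custom dimension-table inputs."""
    # Above the threshold: modeled together with the main table (core dimension).
    core = [main_table] + [s.table for s in secondaries if s.business_change > threshold]
    # Not above the threshold: modeled into the custom dimension table.
    custom = [s.table for s in secondaries if s.business_change <= threshold]
    return {"core_dimension": core, "custom_dimension": custom}


split = vertical_split(
    "dim_user_src",
    [SecondaryTableInfo("G1", 0.8), SecondaryTableInfo("H1", 0.2)],
    threshold=0.5,
)
print(split)  # {'core_dimension': ['dim_user_src', 'G1'], 'custom_dimension': ['H1']}
```

The returned lists only name the source tables feeding each target; building the dimension tables themselves is left to the modeling step.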
With the technical solution of the present application, the main table for data modeling is determined according to the metadata of each source table, and the type of the target table to be generated is determined according to the business meaning of the main table. The secondary tables for data modeling are then determined according to the metadata of the main table, fields for data modeling are selected from the main table and the secondary tables, and finally data modeling is performed according to the main table, the secondary tables, and the fields to generate the target table. Data modeling can thus be carried out accurately on the basis of the metadata of the data tables, ensuring the accuracy and efficiency of the modeling results.
From the above description of the embodiments, those skilled in the art can clearly understand that the present application may be implemented by hardware, or by software plus the necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) and includes several instructions for causing a computer device (such as a personal computer, server, or network device) to perform the methods described in the implementation scenarios of the present application.
Those skilled in the art will understand that the accompanying drawings are merely schematic diagrams of a preferred implementation scenario, and the modules or flows in the drawings are not necessarily required for implementing the present application.
Those skilled in the art will understand that the modules of the device in an implementation scenario may be distributed in the device as described for that scenario, or may be changed accordingly and located in one or more devices different from that scenario. The modules of the above implementation scenarios may be combined into one module or further split into multiple sub-modules.
The sequence numbers of the above implementation scenarios are for description only and do not indicate their relative merits.
What is disclosed above is only several specific implementation scenarios of the present application; however, the present application is not limited thereto, and any variation that a person skilled in the art can conceive of shall fall within the protection scope of the present application.

Claims (10)

1. A data modeling method, characterized by comprising:
determining, according to the metadata of each source table, a main table for data modeling;
determining, according to the business meaning of the main table, the type of a target table to be generated by the data modeling;
determining, according to the metadata of the main table, secondary tables for data modeling;
selecting fields for data modeling from the main table and the secondary tables;
performing data modeling according to the main table, the secondary tables, and the fields, to generate the target table.
2. The method according to claim 1, characterized in that determining, according to the business meaning of the main table, the type of the target table to be generated by the data modeling specifically comprises:
if the type of the target table is determined, according to the business meaning of the main table, to be a fact table, determining the specific type of the fact table according to the metadata of the main table, the specific type including a transaction fact table, a periodic snapshot fact table, and an accumulating snapshot fact table;
if the type of the target table is determined, according to the business meaning of the main table, to be a dimension table, determining according to the metadata of the main table whether the dimension table needs to be split and, if so, the split mode, the split mode including a horizontal split and a vertical split.
3. The method according to claim 2, characterized in that the metadata includes downstream usage information, and determining, according to the metadata of the main table, the secondary tables for data modeling specifically comprises:
obtaining, according to the downstream usage information, the data tables associated with the main table;
obtaining the association information between the main table and each of these data tables, and taking the data tables whose association information matches a preset selection strategy as the secondary tables.
4. The method according to claim 1, characterized in that selecting fields for data modeling from the main table and the secondary tables specifically comprises:
obtaining, according to the metadata, the field usage information of the main table and of the secondary tables respectively;
selecting the fields according to the field usage information;
where the field usage information includes at least the field query count, the filter-condition count, the association count, the aggregate-statistics count, the null-value ratio, and the enumerated-value ratio.
5. The method according to claim 3 or 4, characterized in that, before performing data modeling according to the main table, the secondary tables, and the fields, the method further comprises:
when the target table is a transaction fact table, tagging the business processes of the main table according to the downstream usage information, and determining whether to generate a single-transaction fact table or a multi-transaction fact table;
when the target table is an accumulating snapshot fact table, tagging the business processes of the main table according to the transaction fact table, and tagging the business processes of the other fact tables currently used for the data modeling;
when the target table is the dimension table and the split mode is the horizontal split, horizontally splitting the main table into multiple dimension tables according to the field usage information of the main table;
when the target table is the dimension table and the split mode is the vertical split, according to the association information between the main table and each secondary table, generating a core dimension table through the data modeling from the main table and the secondary tables whose business change is above a preset threshold, and generating a custom dimension table through the data modeling from the secondary tables whose business change is not above the preset threshold.
6. A data modeling device, characterized by comprising:
a first determining module, configured to determine, according to the metadata of each source table, a main table for data modeling;
a second determining module, configured to determine, according to the business meaning of the main table, the type of a target table to be generated by the data modeling;
a third determining module, configured to determine, according to the metadata of the main table, secondary tables for data modeling;
a selecting module, configured to select fields for data modeling from the main table and the secondary tables;
a modeling module, configured to perform data modeling according to the main table, the secondary tables, and the fields, to generate the target table.
7. The device according to claim 6, characterized in that the second determining module is specifically configured to:
if the type of the target table is determined, according to the business meaning of the main table, to be a fact table, determine the specific type of the fact table according to the metadata of the main table, the specific type including a transaction fact table, a periodic snapshot fact table, and an accumulating snapshot fact table;
if the type of the target table is determined, according to the business meaning of the main table, to be a dimension table, determine according to the metadata of the main table whether the dimension table needs to be split and, if so, the split mode, the split mode including a horizontal split and a vertical split.
8. The device according to claim 7, characterized in that the metadata includes downstream usage information, and the third determining module is specifically configured to:
obtain, according to the downstream usage information, the data tables associated with the main table;
obtain the association information between the main table and each of these data tables, and take the data tables whose association information matches a preset selection strategy as the secondary tables.
9. The device according to claim 6, characterized in that the selecting module is specifically configured to:
obtain, according to the metadata, the field usage information of the main table and of the secondary tables respectively;
select the fields according to the field usage information;
where the field usage information includes at least the field query count, the filter-condition count, the association count, the aggregate-statistics count, the null-value ratio, and the enumerated-value ratio.
10. The device according to claim 6 or 9, characterized by further comprising a processing module, wherein:
when the target table is a transaction fact table, the processing module tags the business processes of the main table according to the downstream usage information, and determines whether to generate a single-transaction fact table or a multi-transaction fact table;
when the target table is an accumulating snapshot fact table, the processing module tags the business processes of the main table according to the transaction fact table, and tags the business processes of the other fact tables currently used for the data modeling;
when the target table is the dimension table and the split mode is the horizontal split, the processing module horizontally splits the main table into multiple dimension tables according to the field usage information of the main table;
when the target table is the dimension table and the split mode is the vertical split, the processing module, according to the association information between the main table and each secondary table, generates a core dimension table through the data modeling from the main table and the secondary tables whose business change is above a preset threshold, and generates a custom dimension table through the data modeling from the secondary tables whose business change is not above the preset threshold.
CN201510980569.3A 2015-12-23 2015-12-23 A kind of Data Modeling Method and equipment Pending CN106909566A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510980569.3A CN106909566A (en) 2015-12-23 2015-12-23 A kind of Data Modeling Method and equipment

Publications (1)

Publication Number Publication Date
CN106909566A (en) 2017-06-30

Family

ID=59200081

Country Status (1)

Country Link
CN (1) CN106909566A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170630