CN105404637A - Data mining method and device - Google Patents


Info

Publication number
CN105404637A
Authority
CN
China
Prior art keywords
data
model
dimension
index
fact table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510598360.0A
Other languages
Chinese (zh)
Other versions
CN105404637B (en)
Inventor
方铸
万月亮
火一莽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ruian Technology Co Ltd
Original Assignee
Beijing Ruian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ruian Technology Co Ltd
Priority to CN201510598360.0A
Publication of CN105404637A
Application granted
Publication of CN105404637B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F16/283: Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An embodiment of the invention discloses a data mining method and device. The method comprises: obtaining a data mining model, where the data mining model corresponds to data tables in a data warehouse and the data tables record the data mining rules on which data mining is based; and mining the factual data in the data warehouse according to the data mining rules. The data mining method and device provided by the embodiment achieve automatic data mining in a data warehouse system.

Description

Data mining method and device
Technical field
Embodiments of the present invention relate to the field of data warehouse technology, and in particular to a data mining method and device.
Background
Modern business intelligence applications combine advanced database technologies such as data extraction (extract-transform-load, ETL), data warehousing, data mining, key index analysis, and data presentation, and represent the future direction and trend of the database application field. Fig. 1 shows the key stages of a business intelligence application. Referring to Fig. 1, the goal of a business intelligence application is to convert data into knowledge: by scientific analysis methods, it finds in massive data the key indexes and data that are critical to an enterprise.
One stage of a business intelligence application is data mining, which means finding useful data within a large volume of data by various methods and ultimately converting that data into knowledge. A data warehouse integrates a massive amount of data about its target objects. The logical relationships among these data items are numerous and complex, and are hard to sort out. In addition, the data mining stage involves identifying, referencing, and aggregating massive data, and these computations are themselves difficult. As a result, automatic conversion from data to knowledge in a big data system such as a data warehouse is hard to achieve.
Summary of the invention
In view of the above technical problem, embodiments of the present invention provide a data mining method and device to achieve automatic data mining in a data warehouse.
In a first aspect, an embodiment of the present invention provides a data mining method, the method comprising:
obtaining a data mining model, where the data mining model corresponds to data tables in a data warehouse, and the data tables record the data mining rules on which data mining is based;
mining the factual data in the data warehouse according to the data mining rules.
In a second aspect, an embodiment of the present invention further provides a data mining device, the device comprising:
a model acquisition module, configured to obtain a data mining model, where the data mining model corresponds to data tables in a data warehouse, and the data tables record the data mining rules on which data mining is based;
a mining module, configured to mine the factual data in the data warehouse according to the data mining rules.
The data mining method and device provided by the embodiments of the present invention obtain a data mining model corresponding to the data tables in which data mining rules are stored, and mine the factual data in the data warehouse according to those rules, thereby achieving automatic data mining in the data warehouse.
Brief description of the drawings
Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a schematic diagram of a business intelligence application provided by the prior art;
Fig. 2 is a flowchart of the data mining method provided by the first embodiment of the present invention;
Fig. 3 is a data structure diagram of the data mining model provided by the first embodiment of the present invention;
Fig. 4 is a schematic diagram of the data dimension tables provided by the first embodiment of the present invention;
Fig. 5 is a schematic diagram of the relationships between indexes provided by the first embodiment of the present invention;
Fig. 6 is a schematic diagram of the attributes of an index provided by the first embodiment of the present invention;
Fig. 7 is a flowchart of the mining operation in the data mining method provided by the second embodiment of the present invention;
Fig. 8 is an inheritance diagram of the classes corresponding to different types of data tables provided by the second embodiment of the present invention;
Fig. 9 is a schematic diagram of the relationship between the dimension manager and the dimension model provided by the second embodiment of the present invention;
Fig. 10 is a flowchart of dimension mining in the mining operation provided by the third embodiment of the present invention;
Fig. 11 is an inheritance diagram of the interpreters added to the syntax parsing chain provided by the third embodiment of the present invention;
Fig. 12 is a flowchart of dimension mining in the mining operation provided by the fourth embodiment of the present invention;
Fig. 13 is a flowchart of index mining in the mining operation provided by the fifth embodiment of the present invention;
Fig. 14 is a structural diagram of the data mining device provided by the sixth embodiment of the present invention.
Detailed description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention rather than the entire structure.
First embodiment
This embodiment provides a technical solution for a data mining method. The data mining method is performed by a data mining device. The data mining device is integrated either in the server of the data warehouse or in another computing device that has a network connection to that server. In either case, the device that hosts the data mining device can read data from the data warehouse.
Referring to Fig. 2, the data mining method comprises:
S11, obtaining a data mining model, where the data mining model corresponds to data tables in a data warehouse, and the data tables record the data mining rules on which data mining is based.
Data processing in a traditional data warehouse generally consists of basic operations such as storing and managing data. Such basic processing only changes the storage format of the data and produces no new content. A business intelligence application is entirely different: through operations such as drilling, counting, classifying, and aggregating, it can mine new knowledge from the original data and so convert data into knowledge. For example, in a data warehouse dedicated to storing the power generation data of each region, mining the generation of different regions in each period of every year may reveal relationships among the generation of different regions. Such a relationship is the knowledge mined by the business intelligence application.
In this embodiment, the mining operation on the data in the data warehouse is based on a data mining model. Fig. 3 shows the data structure of the data mining model. Referring to Fig. 3, the data mining model comprises: a fact table model 31, a data row table model 32, a data column table model 33, a dimension model 34, and an index model 35. The fact table model 31 corresponds to the fact table in the data warehouse; the data row table model 32 corresponds to the data row table; the data column table model 33 corresponds to the data column table; the dimension model 34 corresponds to the dimension definition table; and the index model 35 corresponds to the index definition table. That is, the data mining model corresponds to the data tables in the data warehouse.
The fact table records the most basic factual data in the data warehouse. These data are also called source data. The data row table records the row names of the fact table and the dimension names. The data column table records the column names of the fact table and the index names.
In a traditional relational database, data is generally modeled with the entity-relationship model (E-R model). With this modeling method, each data table corresponds to one specific business. Its advantages are low data redundancy and ease of operation for the specific business. Its drawbacks are that the data is hard to read and that analysis of the stored data is very difficult.
To make data analysis easier, this embodiment models data by data dimension. Under this modeling approach, the data warehouse stores not only the fact table that records the factual data, but also dimension definition tables that describe the data dimensions.
A dimension is an angle from which a problem is observed; it provides an analytical means of decomposing the problem. For example, in a data warehouse dedicated to storing the power generation data of each region, region can be one dimension of that data warehouse.
The same fact table may correspond to multiple dimension definition tables. Fig. 4 gives an example of one fact table that corresponds to multiple dimension definition tables. Referring to Fig. 4, the fact table, namely the metadata definition table 41, corresponds simultaneously to the region dimension definition table 42, the time dimension definition table 43, the product dimension definition table 44, and the user dimension definition table 45. This pattern, in which one fact table corresponds to multiple dimension definition tables, is called a star schema.
Besides the star schema, the correspondence between fact tables and dimension definition tables can also follow a fact constellation schema or a snowflake schema. In a fact constellation schema, several different fact tables can correspond to the same dimension definition table. In a snowflake schema, a primary dimension table can have its own sub-dimension tables, and a sub-dimension table can in turn have next-level sub-dimension tables of its own.
Specifically, in this embodiment the data dimension tables comprise: a primary dimension table, a sub-dimension table, a dimension value enumeration table, and a dimension value set table. The primary dimension table represents a category of dimension. For example, region is defined as a primary dimension in the primary dimension table, meaning that data can be divided into different regions. The meaning defined by the primary dimension table must be clear and refined: after data has been calculated and aggregated over different dimensions it must still accurately reflect the defined dimension meaning, and the logical statement must be unambiguous. The sub-dimension table is in a containment relationship with the primary dimension table; that is, sub-dimensions can be contained in the primary dimension table. The dimension value enumeration table enumerates dimension values; for example, the dispatch type can be enumerated as grid dispatch and provincial dispatch. The dimension value set table supports set operations on dimension values. Set operations on dimension values reduce data retrieval to set operations: with the basic operations of union, intersection, universal set, and complement, the required data can be found quickly in a data table. Compared with traditional conditional retrieval, set operations are easier to express and the retrieval is more efficient. Note that a dimension value set and a dimension value are mutually exclusive: once a dimension value set is defined in a data row, a dimension value can no longer be defined in that row.
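The set operations named above can be illustrated with a minimal sketch. The class name DimValueSets and the use of plain strings for dimension values are assumptions made for illustration; the patent does not show how the dimension value set table is implemented.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not taken from the patent: dimension values are
// modeled as plain strings and each operation returns a new set.
final class DimValueSets {
    private DimValueSets() {}

    // Union: values that appear in either set.
    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> out = new LinkedHashSet<>(a);
        out.addAll(b);
        return out;
    }

    // Intersection: values that appear in both sets.
    static Set<String> intersection(Set<String> a, Set<String> b) {
        Set<String> out = new LinkedHashSet<>(a);
        out.retainAll(b);
        return out;
    }

    // Complement: values of the universal set that are not in the given set.
    static Set<String> complement(Set<String> universe, Set<String> a) {
        Set<String> out = new LinkedHashSet<>(universe);
        out.removeAll(a);
        return out;
    }
}
```

A retrieval such as "all regions except those in set A" then becomes a single complement call against the universal set of the region dimension, rather than a chain of conditions.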
In addition, to reduce the complexity of dimension analysis and the number of dimension tables, the dimension definition tables in this embodiment use two modes: real dimension tables and virtual dimension tables. Dimensions with multiple relationships are stored in real dimension tables, each of which uses an independent data table in the data warehouse. Independent dimension systems or sub-dimensions are stored in virtual dimension tables, which share one data table.
The dimension model described above separates data from dimensions: the dimension definitions of the data are stored in the various dimension definition tables. This dimension model has the following advantages: data redundancy is low, and the data is easy to maintain and manage; data retrieval performance is high, which suits online analytical processing and data mining; expression is multidimensional, dimension values can be inherited, and multidimensional data can be drilled; and dimensions are not affected by versions, so a change in data layout does not change the dimensions.
The index model defines the key index system of the business intelligence application. It can describe the relationships between values, define the calculation rules of values, and trace the original source of values. The index model corresponds to the index definition table in the data warehouse. The index definition table comprises: an index table, an index type table, a data granularity table, and an index set table.
The index table can define both basic indexes and aggregate indexes, and is the direct expression of the meaning of the data. The index type table expresses specific meaning beyond the basic index data and is mainly used to define the data type of an index. The data granularity table defines the granularities of the factual data. The index set table defines a higher-level meaning of the data: together with the data index, the index type, and the data granularity, it expresses a group of data.
As an example, Table 1 shows a data granularity table that identifies the time granularity of data. As Table 1 shows, the time granularity of data in this table can be daily, monthly, or annual.
Table 1
ID    Name     Description
1     Day      Daily data
2     Month    Monthly data
3     Year     Annual data
In this embodiment, three kinds of relationships exist between indexes: calculation, total-part, and dependency. Fig. 5 shows the relationships between different indexes. Referring to Fig. 5, taking the power generation index of a power plant as an example, the power generation index is made up of specific indexes such as the planned value, the completed value, and the cumulative value. The plan completion rate is the ratio of the completed value to the planned value, expressed as a percentage, so a calculation relationship exists between the plan completion rate and the planned and completed values. The current-period value is made up of the in-plan completed value and the over-fulfilled value, so the in-plan completed value, the over-fulfilled value, and the current-period value are in a total-part relationship. The in-plan completed value and the over-fulfilled value are mutually exclusive, and the relationship between them is a dependency relationship.
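The calculation and total-part relationships described here can be written down as two small formulas. The sketch below is only an illustration: the class and method names are hypothetical, and it assumes that the plan completion rate is the completed value divided by the planned value, times 100, which is the usual reading but is not spelled out as a formula in the text.

```java
// Hypothetical illustration of two index relationships; not the patent's code.
final class GenerationIndexes {
    private GenerationIndexes() {}

    // Total-part relationship: the current-period value is the sum of the
    // in-plan completed value and the over-fulfilled value.
    static double currentValue(double inPlanCompleted, double overFulfilled) {
        return inPlanCompleted + overFulfilled;
    }

    // Calculation relationship (assumed direction): completed / planned * 100.
    static double planCompletionRate(double completed, double planned) {
        if (planned == 0.0) {
            throw new IllegalArgumentException("planned value must be non-zero");
        }
        return completed / planned * 100.0;
    }
}
```

The dependency relationship (mutual exclusion of the two parts) is not modeled here, because the text does not say how it is enforced.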
Each index is made up of three attributes: index class name, index type, and data type. The index class name describes the category of the index's source data; the index type is the category of the index value; and the data type defines the category in which the value may be used. Fig. 6 shows an example of the three attributes of an index. Referring to Fig. 6, the index shown has the index class name "on-grid electricity", the index type "previous-period value", and the data type "monthly".
Preferably, the data mining model further comprises a data version model. The data version model records the version information of the data mining model.
The data tables described above record the data mining rules on which data mining is based. The data mining rules comprise: data definition rules, data value rules, and mining operation rules. The data mining rules are recorded in a markup language, preferably XML.
Different tags can be defined in XML to identify the different operations to be performed during data mining, and in this way the data mining rules are recorded.
S12, mining the factual data in the data warehouse according to the data mining rules.
Because the data tables record the data mining rules for data mining, the factual data can be mined according to those rules, and knowledge can be discovered from it.
Because the data mining rules are recorded in XML, and tags representing different operations are defined in the XML, semantic network technology can be used when mining the factual data: the factual data is mined according to the tags defined in the data mining rules.
In this embodiment, the factual data is mined by means of a syntax parsing chain. Because the syntax parsing chain mines the factual data according to the tags defined in XML and the values of those tags, it is also called an XML syntax parsing chain.
Interpreters corresponding to the different tags in the data mining rules can be added to the syntax parsing chain. Once an interpreter for a given tag has been added, the chain can be used to parse data mining rules that contain that tag.
Designing the data mining code in this way mainly follows the open-closed principle from design patterns. Specifically, if a new type of tag is added to the XML data mining rules, only the code of the interpreter for that tag needs to be written, and the newly defined interpreter is added to the syntax parsing chain when the tag needs to be parsed; the code of the chain itself does not change. If the interpretation logic of a tag type changes, only the code of that tag's interpreter needs to change, not the implementation logic of the chain. This approach improves code reuse and greatly simplifies maintenance of the program code.
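A minimal sketch of such a chain follows. Everything in it is an assumption for illustration: the names TagInterpreter, ParsingChain, SumInterpreter, and MaxInterpreter, the tag names "sum" and "max", and the choice to reduce each interpreter to a function over a list of numbers. The patent's own interpreter classes (Fig. 11) are not shown in the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interpreter contract: one interpreter handles one tag.
interface TagInterpreter {
    boolean supports(String tagName);
    double interpret(List<Double> values);
}

// The chain itself never changes when a new tag type is introduced.
final class ParsingChain {
    private final List<TagInterpreter> interpreters = new ArrayList<>();

    ParsingChain add(TagInterpreter interpreter) {
        interpreters.add(interpreter);
        return this;
    }

    // Hands the tag to the first interpreter on the chain that supports it.
    double evaluate(String tagName, List<Double> values) {
        for (TagInterpreter interpreter : interpreters) {
            if (interpreter.supports(tagName)) {
                return interpreter.interpret(values);
            }
        }
        throw new IllegalArgumentException("No interpreter for tag: " + tagName);
    }
}

// Interpreter for a hypothetical <sum> tag.
final class SumInterpreter implements TagInterpreter {
    public boolean supports(String tagName) { return "sum".equals(tagName); }
    public double interpret(List<Double> values) {
        double total = 0.0;
        for (double v : values) total += v;
        return total;
    }
}

// Interpreter for a hypothetical <max> tag, added later without touching ParsingChain.
final class MaxInterpreter implements TagInterpreter {
    public boolean supports(String tagName) { return "max".equals(tagName); }
    public double interpret(List<Double> values) {
        if (values.isEmpty()) throw new IllegalArgumentException("max needs at least one value");
        double best = values.get(0);
        for (double v : values) best = Math.max(best, v);
        return best;
    }
}
```

Supporting a new tag means writing one more TagInterpreter and calling add on the chain; changing how a tag is interpreted touches only that tag's class.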
Another benefit of implementing the code as a syntax parsing chain is that the interpreters for different tags are implemented independently and do not interfere with each other. This helps to provide a uniform programming interface externally and reduces the coupling between different objects.
In this embodiment, a data mining model is obtained and the factual data in the data warehouse is mined according to the data mining rules in that model. During this mining, operations such as identifying, drilling, counting, classifying, and aggregating the data are completed automatically, thereby achieving automatic data mining in the data warehouse.
Second embodiment
Based on the above embodiment, this embodiment further provides a technical solution for the mining operation in the data mining method. In this solution, mining the factual data in the data warehouse according to the data mining rules comprises: obtaining the fact table model from the fact table; obtaining the data version model of the fact table; obtaining the data row table model corresponding to the fact table model according to the data row table and the data version model; obtaining the dimension model corresponding to the fact table model according to the dimension definition table and the data version model; obtaining the row names of the fact table, and screening and aggregating the dimensions in the dimension model, according to the data mining rules corresponding to the data row table model and the dimension model; obtaining the data column table model corresponding to the fact table model according to the data column table and the data version model; obtaining the index model corresponding to the fact table model according to the index definition table and the data version model; and obtaining the column names of the fact table, and screening, calculating, counting, and classifying the indexes in the index model, according to the data mining rules corresponding to the data column table model and the index model.
Referring to Fig. 7, mining the factual data in the data warehouse according to the data mining rules comprises:
S71, obtaining the fact table model from the fact table.
In the present invention, the data mining model, which comprises the fact table model, the data row table model, the data column table model, and so on, is built from the various data tables stored in the data warehouse. These data tables fall into the following types: static tables, dump tables, middle tables, multidimensional tables, and view tables.
A static table is a table in its most original form. Data columns, dimensions, and indexes can be defined for a static table, but no data mining rules can be defined in it. A dump table is similar to a stored procedure in a database. Data rows and data columns can be defined for a dump table, as can data mining rules for those rows and columns. A middle table is a table produced while the original source tables are processed. Data columns and indexes can be defined for a middle table. A multidimensional table has multiple storage dimensions. Its data rows and data columns can be defined, together with the data mining rules for those rows and columns. A view table defines different views of the corresponding data table. Its data rows, data columns, dimensions, and indexes can be defined, together with their respective data mining rules.
When data mining over these types of data tables is implemented in program code, each type is represented by a different class. For each specific data table, the class representing its type is instantiated as an object, and the specific data mining operations are then carried out by calling the member functions of that object.
Fig. 8 shows the inheritance relationships between the classes corresponding to the different types of data tables. Referring to Fig. 8, the static table corresponds to the TableModel class 81. The TableModel class 81 is the parent class, that is, the base class, of the other classes shown in Fig. 8. The dump table corresponds to the TransTableModel class 82, the middle table to the MiddleTableModel class 83, the multidimensional table to the MultiTableModel class 84, and the view table to the ViewTableModel class 85. Each of these four classes has the TableModel class 81 as its base class.
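The hierarchy of Fig. 8 can be sketched as follows. The five class names come from the text; the TableElement enum and the definableElements method are hypothetical additions that simply encode which elements the text says can be defined for each table type. The real classes certainly carry more state than this.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical enum listing what can be defined on a table.
enum TableElement { ROW, COLUMN, DIMENSION, INDEX }

// Static table: the base class of all other table models.
class TableModel {
    Set<TableElement> definableElements() {
        return EnumSet.of(TableElement.COLUMN, TableElement.DIMENSION, TableElement.INDEX);
    }
}

// Dump table: rows and columns.
class TransTableModel extends TableModel {
    @Override
    Set<TableElement> definableElements() {
        return EnumSet.of(TableElement.ROW, TableElement.COLUMN);
    }
}

// Middle table: columns and indexes.
class MiddleTableModel extends TableModel {
    @Override
    Set<TableElement> definableElements() {
        return EnumSet.of(TableElement.COLUMN, TableElement.INDEX);
    }
}

// Multidimensional table: rows and columns.
class MultiTableModel extends TableModel {
    @Override
    Set<TableElement> definableElements() {
        return EnumSet.of(TableElement.ROW, TableElement.COLUMN);
    }
}

// View table: rows, columns, dimensions, and indexes.
class ViewTableModel extends TableModel {
    @Override
    Set<TableElement> definableElements() {
        return EnumSet.allOf(TableElement.class);
    }
}
```

Because every subclass is a TableModel, code that handles tables generically can hold any of them through the base type and still get type-specific behavior from the overridden method.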
In the operation of obtaining the fact table model, an object named table manager (TableManager) is used to generate the fact table model. The table manager is created with the factory pattern. After it is created, it looks up the specific fact table by the ID of the fact table for which a model is needed, and builds the fact table model from the fact table it finds. Taking the fact table with ID 2000L as an example, the code that generates the fact table model is as follows:
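// "ac" is not defined in the text; it is assumed to be a bean container that holds the TableManager.
TableManager tm = (TableManager) ac.getBean("TableManager");
// Look up the fact table with ID 2000 and build its model.
TableModel tableModel = tm.findTableModel(2000L);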
The object tableModel is the generated fact table model. The main internal members of the fact table model are: a connection (AConn) object, a data column manager (ColumnManager) object, a data row manager (RowManager) object, a version manager (VersionManager) object, and a reference table (refTables) object. The connection object connects to an external database and is usually defined in the fact table by the DATASOURCE field. The data column manager object manages the columns defined for the data table. The data row manager object manages the rows defined for the data table. The version manager object manages the versions defined for the data table. The reference table object holds the other data tables that the data table references. The definitions of the referenced data tables are stored in the data mining rules of the fact table. The data structure of this object is List<TableModel>.
S72, obtaining the data version model of the fact table.
Because the internal members of the data table include a version manager, the data version model of the fact table, namely the VersionModel object, can be obtained through the version manager. The data column manager uses the version model to obtain the data rows that are valid at a specified time, which solves the problem of a data table having multiple versions. When a data table is mapped, the version manager selects an available data version model according to the time passed in and stores it in a member variable named current version.
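One way to picture version selection by time is sketched below. The class names VersionModel and VersionManager appear in the text, but the validFrom field, the selection rule (latest version whose start date is not after the requested date), and all method names are assumptions; the patent does not state how a version's validity period is stored.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the patent's VersionModel; validFrom is an assumed field.
final class VersionModel {
    final String name;
    final LocalDate validFrom;

    VersionModel(String name, LocalDate validFrom) {
        this.name = name;
        this.validFrom = validFrom;
    }
}

// Simplified stand-in for the patent's VersionManager.
final class VersionManager {
    private final List<VersionModel> versions = new ArrayList<>();
    private VersionModel currentVersion; // the "current version" member variable

    void addVersion(VersionModel version) {
        versions.add(version);
    }

    // Assumed rule: pick the latest version that started on or before the given date.
    VersionModel selectVersion(LocalDate at) {
        VersionModel best = null;
        for (VersionModel v : versions) {
            boolean started = !v.validFrom.isAfter(at);
            if (started && (best == null || v.validFrom.isAfter(best.validFrom))) {
                best = v;
            }
        }
        if (best == null) {
            throw new IllegalStateException("No version valid at " + at);
        }
        currentVersion = best;
        return best;
    }

    VersionModel getCurrentVersion() {
        return currentVersion;
    }
}
```

Other managers can then read the current version to decide which rows or columns apply at that point in time.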
S73, obtaining the data row table model corresponding to the fact table model according to the data row table and the data version model.
In this embodiment, the RowManager object is used to obtain the data row table model corresponding to the fact table model, namely the RowModel object in code. According to the data version model, the RowManager object obtains the data row table model that defines the data rows at a given point in time.
The data row table model, namely the RowModel object in code, implements the ITreeNode interface, so a RowModel object has a tree structure. RowModel also implements the Comparable interface, so RowManager can sort the instantiated RowModel objects. More specifically, RowManager sorts RowModel objects by the SN field of the data row table that corresponds to each RowModel object.
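The sorting behavior can be sketched with the standard Comparable interface. The class names RowModel and RowManager come from the text; the fields, constructor, and the sortedRows method are assumptions, and the ITreeNode interface is left out because its definition is not given.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified stand-in for the patent's RowModel: only a name and the SN field.
final class RowModel implements Comparable<RowModel> {
    final String name;
    final int sn; // sequence number taken from the SN field of the data row table

    RowModel(String name, int sn) {
        this.name = name;
        this.sn = sn;
    }

    // Natural ordering is ascending SN.
    @Override
    public int compareTo(RowModel other) {
        return Integer.compare(this.sn, other.sn);
    }
}

// Simplified stand-in for the patent's RowManager.
final class RowManager {
    private final List<RowModel> rows = new ArrayList<>();

    void add(RowModel row) {
        rows.add(row);
    }

    // Returns a sorted copy so the insertion order is left untouched.
    List<RowModel> sortedRows() {
        List<RowModel> copy = new ArrayList<>(rows);
        Collections.sort(copy);
        return copy;
    }
}
```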
S74, obtaining the dimension model corresponding to the fact table model according to the dimension definition table and the data version model.
As with the data mining models described above, the dimension model is obtained through a manager, here a dimension manager. In code, the dimension model object is called the DimModel object and the dimension manager object is called the DimManager object.
Fig. 9 shows the relationships between the objects in the dimension model. Referring to Fig. 9, the dimension manager is injected into TableManager as a singleton. Before injection, the complete list of dimension models (DimModel) and the list of dimension values (DimValue) have already been constructed. Both DimModel and DimValue implement the clone interface and can therefore be deep-cloned. The DimValue members obtained from a DimValueSetModel are all cloned objects, which ensures that the hierarchy of a DimValue instance can be traced back to the correct DimValueSetModel.
A ColumnModel can be associated with a DimModel or with a DimValueSetModel. When a ColumnModel has a DimValue of a primary dimension, the DimValue can be converted into a DimValueSetModel. RowManager provides several methods for finding the rows related to dimension values. The searchByDimValueSetModel() method finds the multiple rows that match a DimValueSetModel acting as a subclass. The searchByDimValue() method finds the multiple rows that correspond to a primary dimension value. The findByDimValueSetModel() method finds the data row that corresponds to a dimension value set.
S75, according to described data line table model and described dimensional model corresponding data mining rule, obtain the data line title of described fact table, and the dimension in described dimensional model screened and is polymerized.
Described data mining rule defines with XML language.Further, in the XML language of the described data mining rule of definition, pre-defined and much there is certain semantic label.Introduce below and excavate relevant XML semantic label to dimension.
Dimension definition tag <dim-define>
The dimension definition tag is used to define a data dimension. Its child tags may include condition-type tags, operation-type tags, value-type tags, and measure-type tags.
Dimension value tag <dim>
The dimension value tag specifies the value of a dimension and can be used during dimension calculation. Its attributes include the identification number id and the corresponding value. For a dimension value tag whose identification number is 5 and whose value is 18, the code is: <dim id="5" key="18"/>.
Both of the above tags are used when defining data dimensions. They appear in the various dimension definition tables and set the calculation relationships between dimensions.
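As an illustration of how such a rule fragment might be read, the following Java sketch parses dimension value tags with the standard JAXP DOM API. The attribute names id and key are taken from the example in the text; the DimRuleReader class, its readDims method, and the nesting of the dim tag directly inside a dim-define tag are assumptions, not something the patent specifies.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader for dimension value tags inside a data mining rule.
class DimRuleReader {

    // Returns a map from each dim tag's id attribute to its key attribute, in document order.
    static Map<String, String> readDims(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList dims = doc.getElementsByTagName("dim");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < dims.getLength(); i++) {
                Element dim = (Element) dims.item(i);
                result.put(dim.getAttribute("id"), dim.getAttribute("key"));
            }
            return result;
        } catch (Exception e) {
            // Malformed rule text is reported as an unchecked exception.
            throw new IllegalArgumentException("Cannot parse rule XML", e);
        }
    }
}
```

A call such as DimRuleReader.readDims("<dim-define><dim id=\"5\" key=\"18\"/></dim-define>") yields a map with the single entry 5 to 18.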
Like the dimension definition tags, the data mining rule also defines condition-type tags, operation-type tags, value-type tags, date-time definition tags, measure-type tags, and data table tags. Condition-type tags define the conditional relationships in the data mining process. Operation-type tags define the various operators in the data mining process. Value-type tags define the value type conversions in the data mining process. Date-time definition tags define the conversion of date-type data. Measure-type tags define the various statistical operations. Data table tags define the operations on data tables.
When data dimensions are mined according to the data mining rule, the mining can be performed automatically according to the syntax tags defined in the rule.
S76, obtain the data column model corresponding to the fact table model according to the data column table and the data version model.
The data column manager object corresponding to the data column model is held inside the fact table model, so it can be obtained as an internal object of the fact table model. In particular, the data column manager object must be obtained with reference to the data version object, to avoid version problems in the data column manager object that is obtained.
In code, the data column manager object is called the ColumnManager object. The ColumnManager object can instantiate the data column model, that is, the ColumnModel object.
Like the RowModel object, the ColumnModel object implements the ITreeNode interface, so ColumnModel has a tree structure. ColumnModel also implements the Comparable interface, so ColumnManager can sort ColumnModel objects. More specifically, ColumnManager sorts ColumnModel objects by the SN field of the corresponding data column table.
S77, obtain the index model corresponding to the fact table model according to the index definition table and the data version model.
The index model is likewise obtained by having an index manager instantiate it. The index manager is implemented as a singleton and is placed in the TableManager object by dependency injection. When a ColumnModel object is instantiated, it obtains an instance of the index set model IndexSetModel from the index manager IndexSetManager.
An IndexSetModel instance uses the composite pattern to combine instances of the index model IndexModel, the index type model IndexTypeModel, and the data granularity model GradingSizeModel.
IndexTypeModel represents the statistical method of the data, that is, the different types that can be calculated from the same index; the index type is currently defined as a three-level index definition. GradingSizeModel describes the granularity of the data. The index-related classes provide a series of methods for comparing the similarity of two index sets. In ColumnManager, methods such as searchByIndexModelSet() and findByIndexSetModelId() use index sets to filter the data columns and thereby find the data columns whose indexes match.
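A minimal Java sketch of the composition and the index-set filtering is given below. Plain strings stand in for IndexModel, IndexTypeModel and GradingSizeModel, and the similarity score, the constructors, and the searchByIndexSet method name and signature are assumptions; the patent does not disclose how similarity is computed or what the real method signatures are.

```java
import java.util.ArrayList;
import java.util.List;

// Combination of one index, one index type and one data granularity.
// Strings stand in for IndexModel, IndexTypeModel and GradingSizeModel.
final class IndexSetModel {
    final String index;
    final String indexType;
    final String gradingSize;

    IndexSetModel(String index, String indexType, String gradingSize) {
        this.index = index;
        this.indexType = indexType;
        this.gradingSize = gradingSize;
    }

    // Hypothetical similarity score: how many of the three components are equal (0 to 3).
    int similarity(IndexSetModel other) {
        int score = 0;
        if (index.equals(other.index)) score++;
        if (indexType.equals(other.indexType)) score++;
        if (gradingSize.equals(other.gradingSize)) score++;
        return score;
    }
}

// A data column associated with one index set.
final class ColumnModel {
    final String name;
    final IndexSetModel indexSet;

    ColumnModel(String name, IndexSetModel indexSet) {
        this.name = name;
        this.indexSet = indexSet;
    }
}

class ColumnManager {
    private final List<ColumnModel> columns = new ArrayList<>();

    void add(ColumnModel column) { columns.add(column); }

    // Returns the columns whose index set equals the wanted one in all three components.
    List<ColumnModel> searchByIndexSet(IndexSetModel wanted) {
        List<ColumnModel> hits = new ArrayList<>();
        for (ColumnModel column : columns) {
            if (column.indexSet.similarity(wanted) == 3) {
                hits.add(column);
            }
        }
        return hits;
    }
}
```

A looser filter, for example one that ignores granularity, could be obtained by lowering the required score, which is one reason to expose similarity as a number rather than a boolean in this sketch.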
S78, obtain the data column names of the fact table according to the data mining rule corresponding to the data column table model and the index model, and filter, calculate, statistically summarize, and classify the indexes in the index model.
The XML tags used in index-related mining operations are mainly the index definition tag and the data granularity tag. The index definition tag is mainly used to define data indexes. The data granularity tag is mainly used to define data granularity.
It should further be noted that the condition-type tags, operation-type tags, value-type tags, date-time definition tags, and measure-type tags described above can also be used in index mining.
In this embodiment, the fact table model is obtained according to the fact table, and the data version model of the fact table is obtained. The data row table model and the dimension model corresponding to the fact table model are obtained according to the data row table and the dimension definition table, respectively, each together with the data version model; the data row names of the fact table are then obtained, and the dimensions in the dimension model are filtered and aggregated, according to the corresponding data mining rule. Likewise, the data column model and the index model corresponding to the fact table model are obtained according to the data column table and the index definition table, respectively, each together with the data version model; the data column names of the fact table are then obtained, and the indexes in the index model are filtered, calculated, statistically summarized, and classified, according to the corresponding data mining rule. In this way, the knowledge contained in the data of the data warehouse is discovered automatically.
3rd embodiment
This embodiment, based on the above embodiments of the invention, further provides a technical solution for dimension mining within the mining operation. In this solution, obtaining the data row names of the fact table according to the data mining rule corresponding to the data row table model and the dimension model, and filtering and aggregating the dimensions in the dimension model, comprises: creating a main syntax parsing chain for the data rows; adding interpreters to the main syntax parsing chain; and mining the data dimensions of the fact table according to the main syntax parsing chain.
Referring to Figure 10, obtaining the data row names of the fact table according to the data mining rule corresponding to the data row table model and the dimension model, and filtering and aggregating the dimensions in the dimension model, comprises:
S101, create a main syntax parsing chain for the data rows.
The main syntax parsing chain is an object of type IXmlNodeResolverChain. It is responsible for providing the main parsing chain interface and attributes and the actual interface to the methods implemented by the concrete implementation classes, and it can serve as a pointer to a sub-parsing chain.
S102, add interpreters to the main syntax parsing chain.
Creating the main syntax parsing chain is not sufficient on its own: before the chain can be used to parse the data mining rule, interpreters must be added to it.
Figure 11 shows the inheritance relationships among the interpreters. Referring to Figure 11, all interpreters inherit from the abstract class AbstractRuleResolver. Each concrete interpreter implements the resolve method of AbstractRuleResolver to provide the parsing logic for a particular kind of data object. The subclasses of AbstractRuleResolver include: the addition interpreter AddResolver for addition, the subtraction interpreter RecResolver for subtraction, the multiplication interpreter MulResolver for multiplication, the division interpreter DivResolver for division, the maximum interpreter MaxResolver for taking the maximum, and the minimum interpreter MinResolver for taking the minimum.
Taking as an example the addition of an instance divResolver of the division interpreter to the instance resolverChain of the main syntax parsing chain, the code is as follows:
resolverChain.append(divResolver);
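To make the append call concrete, the following self-contained Java sketch shows one way an interpreter hierarchy and a chain could fit together. Only the class names AbstractRuleResolver, AddResolver, DivResolver and MaxResolver and the method names resolve and append come from the text; the ResolverChain class, the tag method, the tag strings, and the two-operand resolve signature are assumptions, and the real interpreters appear to be IXmlNodeResolverChain nodes themselves rather than entries in a list.

```java
import java.util.ArrayList;
import java.util.List;

// Common base of all interpreters; the signatures here are hypothetical.
abstract class AbstractRuleResolver {
    // Name of the rule tag this interpreter handles (assumed for the sketch).
    abstract String tag();

    // Applies the operation to two operands.
    abstract double resolve(double left, double right);
}

class AddResolver extends AbstractRuleResolver {
    @Override String tag() { return "add"; }
    @Override double resolve(double left, double right) { return left + right; }
}

class DivResolver extends AbstractRuleResolver {
    @Override String tag() { return "div"; }
    @Override double resolve(double left, double right) { return left / right; }
}

class MaxResolver extends AbstractRuleResolver {
    @Override String tag() { return "max"; }
    @Override double resolve(double left, double right) { return Math.max(left, right); }
}

// Simplified chain: asks each appended interpreter in turn whether it handles the tag.
class ResolverChain {
    private final List<AbstractRuleResolver> resolvers = new ArrayList<>();

    // Adds an interpreter to the end of the chain and returns the chain for fluent use.
    ResolverChain append(AbstractRuleResolver resolver) {
        resolvers.add(resolver);
        return this;
    }

    // Delegates to the first interpreter whose tag matches; fails if none was appended.
    double resolve(String tag, double left, double right) {
        for (AbstractRuleResolver resolver : resolvers) {
            if (resolver.tag().equals(tag)) {
                return resolver.resolve(left, right);
            }
        }
        throw new IllegalArgumentException("No interpreter for tag: " + tag);
    }
}
```

With this sketch, a chain to which a DivResolver has been appended resolves the hypothetical tag div for operands 18 and 3 to 6.0, while a tag with no appended interpreter is rejected, which mirrors the statement that a chain without interpreters cannot parse the rule.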
S103, mine the data dimensions of the fact table according to the main syntax parsing chain.
After the main syntax parsing chain has been created and the various interpreters have been added to it, the data dimensions are mined according to the chain.
Specifically, when the mapping method of the TableModel object is called, the main syntax parsing chain is invoked and mines the data dimensions of the fact table. The mining results for the data dimensions are stored in the mapData member of the TableModel object.
If the TableModel object has reference tables, that is, if the refTable member of the TableModel object is not empty, calling the mapping method also mines the data dimensions of the reference tables. While the data dimensions of the reference tables are being mined, the mining results from the different reference tables are aggregated, and the aggregated result is stored in the rowMapData member of the RowModel object. The ColumnModel object then obtains the aggregated result from the rowMapData member of the RowModel object and likewise places it in the mapData member of the TableModel object.
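The flow of results through mapping can be sketched as follows. This is a heavily simplified illustration under several assumptions: results are keyed by a dimension name and held as doubles, aggregation is taken to be summation, refTable is modelled as a list of tables without cycles, the ownValues field is invented to stand in for what the parsing chain would produce, and a local map stands in for the rowMapData member of RowModel. None of these details are stated in the patent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified table model: mining results are kept per dimension name.
class TableModel {
    final Map<String, Double> mapData = new HashMap<>();   // final mining results
    final Map<String, Double> ownValues = new HashMap<>(); // stand-in for this table's mined values
    final List<TableModel> refTable = new ArrayList<>();   // reference tables, may be empty

    // Mines this table and, if present, its reference tables (assumed to be acyclic).
    void mapping() {
        mapData.clear();
        mapData.putAll(ownValues);

        // Stand-in for RowModel.rowMapData: aggregated results of all reference tables.
        Map<String, Double> rowMapData = new HashMap<>();
        for (TableModel ref : refTable) {
            ref.mapping();
            for (Map.Entry<String, Double> entry : ref.mapData.entrySet()) {
                rowMapData.merge(entry.getKey(), entry.getValue(), Double::sum);
            }
        }

        // The aggregated reference results are placed into mapData as well.
        for (Map.Entry<String, Double> entry : rowMapData.entrySet()) {
            mapData.merge(entry.getKey(), entry.getValue(), Double::sum);
        }
    }
}
```

When refTable is empty, the two loops do nothing and mapData simply holds the table's own results, which corresponds to the case without reference tables.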
This embodiment creates a main syntax parsing chain for the data rows, adds interpreters to the main syntax parsing chain, and mines the data dimensions of the fact table according to the main syntax parsing chain, thereby using a syntax parsing chain to mine data dimensions automatically according to predefined data mining rules.
4th embodiment
This embodiment, based on the above embodiments of the invention, further provides another technical solution for dimension mining within the mining operation. In this solution, obtaining the data row names of the fact table according to the data mining rule corresponding to the data row table model and the dimension model, and filtering and aggregating the dimensions in the dimension model, further comprises: after interpreters have been added to the main syntax parsing chain, creating a sub-syntax parsing chain for the main syntax parsing chain; and adding interpreters to the sub-syntax parsing chain. Mining the data dimensions of the fact table according to the main syntax parsing chain then comprises: mining the data dimensions of the fact table according to the main syntax parsing chain and the sub-syntax parsing chain.
Referring to Figure 12, obtaining the data row names of the fact table according to the data mining rule corresponding to the data row table model and the dimension model, and filtering and aggregating the dimensions in the dimension model, comprises:
S121, create a main syntax parsing chain for the data rows.
S122, add interpreters to the main syntax parsing chain.
S123, create a sub-syntax parsing chain for the main syntax parsing chain.
If the dimension model of the fact table is hierarchical, that is, if a main dimension has sub-dimensions of its own, the main syntax parsing chain alone cannot fully mine the dimensions at the different levels. To fully mine data dimensions that have a hierarchy, a sub-syntax parsing chain is created for the main syntax parsing chain, corresponding to the sub-dimensions in the dimension model.
Specifically, the code that creates the sub-syntax parsing chain is as follows:
IXmlNodeResolverChain subChain;
S124, add interpreters to the sub-syntax parsing chain.
As with the main syntax parsing chain, interpreters must be added to the sub-syntax parsing chain before it can be used to parse the data mining rule. The logic and code for adding interpreters to the sub-syntax parsing chain are similar to those for the main syntax parsing chain and are not repeated here.
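One way to picture the cooperation between a main chain and a sub-chain is the Java sketch below, in which the main chain handles the top level of a nested rule and hands the next level down to its sub-chain. The RuleNode and HierarchicalChain classes, the operator tags, and the use of DoubleBinaryOperator as a stand-in for an interpreter are all assumptions made for illustration; the patent only states that a sub-chain exists and receives its own interpreters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoubleBinaryOperator;

// A node of a parsed rule: a leaf carries a value, an inner node carries an operator tag.
class RuleNode {
    final String tag;
    final double value;
    final List<RuleNode> children;

    private RuleNode(String tag, double value, List<RuleNode> children) {
        this.tag = tag;
        this.value = value;
        this.children = children;
    }

    static RuleNode leaf(double value) {
        return new RuleNode(null, value, new ArrayList<>());
    }

    static RuleNode op(String tag, RuleNode... children) {
        return new RuleNode(tag, 0.0, Arrays.asList(children));
    }
}

// A chain that resolves one level itself and delegates the next level to its sub-chain.
class HierarchicalChain {
    private final Map<String, DoubleBinaryOperator> interpreters = new HashMap<>();
    private HierarchicalChain subChain; // handles the next level down, if set

    HierarchicalChain append(String tag, DoubleBinaryOperator interpreter) {
        interpreters.put(tag, interpreter);
        return this;
    }

    void setSubChain(HierarchicalChain subChain) {
        this.subChain = subChain;
    }

    // Leaves yield their value; inner nodes combine their children, which are
    // resolved by the sub-chain when one exists and by this chain otherwise.
    double resolve(RuleNode node) {
        if (node.children.isEmpty()) {
            return node.value;
        }
        DoubleBinaryOperator op = interpreters.get(node.tag);
        if (op == null) {
            throw new IllegalArgumentException("No interpreter for tag: " + node.tag);
        }
        HierarchicalChain next = (subChain != null) ? subChain : this;
        double result = next.resolve(node.children.get(0));
        for (int i = 1; i < node.children.size(); i++) {
            result = op.applyAsDouble(result, next.resolve(node.children.get(i)));
        }
        return result;
    }
}
```

In this sketch, a main chain that only knows the hypothetical tag add cannot evaluate a nested max rule on its own; once a sub-chain with a max interpreter is attached, both levels are resolved.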
S125, mine the data dimensions of the fact table according to the main syntax parsing chain and the sub-syntax parsing chain.
In this embodiment, after the main syntax parsing chain is created, a sub-syntax parsing chain is further created for it and interpreters are added to the sub-syntax parsing chain, so that the sub-syntax parsing chain can be used to mine data dimensions at different levels.
5th embodiment
This embodiment, based on the above embodiments of the invention, further provides a technical solution for index mining within the mining operation. In this solution, obtaining the data column names of the fact table according to the data mining rule corresponding to the data column table model and the index model, and filtering, calculating, statistically summarizing, and classifying the indexes in the index model, comprises: creating a syntax parsing chain; initializing the interpreters that the syntax parsing chain needs; adding the initialized interpreters to the syntax parsing chain; and mining the data indexes of the fact table according to the syntax parsing chain.
Referring to Figure 13, obtaining the data column names of the fact table according to the data mining rule corresponding to the data column table model and the index model, and filtering, calculating, statistically summarizing, and classifying the indexes in the index model, comprises:
S131, create a syntax parsing chain.
As in dimension mining, index mining first requires creating a syntax parsing chain for index mining. By way of example, the code that creates the syntax parsing chain is as follows:
IXmlNodeResolverChain indexSetResolver = new IndexSetResolver();
resolverChain = indexSetResolver;
S132, initialize the interpreters that the syntax parsing chain needs.
Likewise, index mining needs different kinds of interpreters to recognize the different semantic tags defined in the data mining rule. These interpreters must be initialized before they are added to the syntax parsing chain and used to recognize the semantic tags.
Taking the initialization of an instance of the addition interpreter as an example, the code is as follows:
IXmlNodeResolverChain addResolver = new AddResolver();
S133, add the initialized interpreters to the syntax parsing chain.
After the various interpreters have been initialized, they are added to the syntax parsing chain. The code is as follows:
resolverChain.append(addResolver);
S134, mine the data indexes of the fact table according to the syntax parsing chain.
After the various interpreters have been added, the syntax parsing chain, now equipped with the different interpreters, is used to mine the data indexes of the fact table. If the data mining rule also requires the index type and the data granularity to be mined, these are mined in the same pass.
This embodiment creates a syntax parsing chain, initializes the interpreters that the chain needs, adds the initialized interpreters to the chain, and mines the data indexes of the fact table according to the chain, thereby mining the data indexes of the fact table automatically.
6th embodiment
This embodiment provides a technical solution for a data mining device. Referring to Figure 14, in this solution the data mining device comprises a model acquisition module 141 and a mining module 142.
The model acquisition module 141 is configured to obtain a data mining model, the data mining model corresponding to the data tables in a data warehouse and recording the data mining rule according to which data mining is performed on those data tables.
The mining module 142 is configured to mine the fact data in the data warehouse according to the data mining rule.
Note that the above are only preferred embodiments of the invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the invention. Therefore, although the invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept; the scope of the invention is determined by the appended claims.

Claims (10)

1. A data mining method, characterized by comprising:
obtaining a data mining model, the data mining model corresponding to data tables in a data warehouse and recording the data mining rule according to which data mining is performed on the data tables;
mining the fact data in the data warehouse according to the data mining rule.
2. The method according to claim 1, characterized in that the data mining model comprises: a fact table model, a data row table model, a data column table model, a dimension model, and an index model.
3. The method according to claim 2, characterized in that the fact table model corresponds to a fact table in the data warehouse, the data row table model corresponds to a data row table in the data warehouse, the data column table model corresponds to a data column table in the data warehouse, the dimension model corresponds to a dimension definition table in the data warehouse, and the index model corresponds to an index definition table in the data warehouse.
4. The method according to claim 3, characterized in that the dimension definition table comprises: a main dimension table, a sub-dimension table, a dimension value enumeration table, and a dimension value set table.
5. The method according to claim 3, characterized in that the index definition table comprises: an index table, an index type table, a data granularity table, and an index set table.
6. The method according to claim 3, characterized in that mining the fact data in the data warehouse according to the data mining rule comprises:
obtaining the fact table model according to the fact table;
obtaining the data version model of the fact table;
obtaining the data row table model corresponding to the fact table model according to the data row table and the data version model;
obtaining the dimension model corresponding to the fact table model according to the dimension definition table and the data version model;
obtaining the data row names of the fact table according to the data mining rule corresponding to the data row table model and the dimension model, and filtering and aggregating the dimensions in the dimension model;
obtaining the data column model corresponding to the fact table model according to the data column table and the data version model;
obtaining the index model corresponding to the fact table model according to the index definition table and the data version model;
obtaining the data column names of the fact table according to the data mining rule corresponding to the data column table model and the index model, and filtering, calculating, statistically summarizing, and classifying the indexes in the index model.
7. The method according to claim 6, characterized in that obtaining the data row names of the fact table according to the data mining rule corresponding to the data row table model and the dimension model, and filtering and aggregating the dimensions in the dimension model, comprises:
creating a main syntax parsing chain for the data rows;
adding interpreters to the main syntax parsing chain;
mining the data dimensions of the fact table according to the main syntax parsing chain.
8. The method according to claim 7, characterized in that obtaining the data row names of the fact table according to the data mining rule corresponding to the data row table model and the dimension model, and filtering and aggregating the dimensions in the dimension model, further comprises:
after interpreters have been added to the main syntax parsing chain, creating a sub-syntax parsing chain for the main syntax parsing chain;
adding interpreters to the sub-syntax parsing chain;
wherein mining the data dimensions of the fact table according to the main syntax parsing chain comprises:
mining the data dimensions of the fact table according to the main syntax parsing chain and the sub-syntax parsing chain.
9. The method according to claim 6, characterized in that obtaining the data column names of the fact table according to the data mining rule corresponding to the data column table model and the index model, and filtering, calculating, statistically summarizing, and classifying the indexes in the index model, comprises:
creating a syntax parsing chain;
initializing the interpreters that the syntax parsing chain needs;
adding the initialized interpreters to the syntax parsing chain;
mining the data indexes of the fact table according to the syntax parsing chain.
10. A data mining device, characterized by comprising:
a model acquisition module, configured to obtain a data mining model, the data mining model corresponding to data tables in a data warehouse and recording the data mining rule according to which data mining is performed on the data tables;
a mining module, configured to mine the fact data in the data warehouse according to the data mining rule.
CN201510598360.0A 2015-09-18 2015-09-18 Data digging method and device Active CN105404637B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510598360.0A CN105404637B (en) 2015-09-18 2015-09-18 Data digging method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510598360.0A CN105404637B (en) 2015-09-18 2015-09-18 Data digging method and device

Publications (2)

Publication Number Publication Date
CN105404637A true CN105404637A (en) 2016-03-16
CN105404637B CN105404637B (en) 2019-03-01

Family

ID=55470127

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510598360.0A Active CN105404637B (en) 2015-09-18 2015-09-18 Data digging method and device

Country Status (1)

Country Link
CN (1) CN105404637B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609064A (en) * 2017-08-30 2018-01-19 成都中建科联网络科技有限公司 Rival's intelligent analysis method based on data mining
CN108241692A (en) * 2016-12-26 2018-07-03 北京国双科技有限公司 The querying method and device of data
CN112418721A (en) * 2020-12-08 2021-02-26 中国建设银行股份有限公司 Index determination method and device
CN112767058A (en) * 2021-02-05 2021-05-07 深圳市爱云信息科技有限公司 AIOT DaaS digital twin cloud platform
CN113449045A (en) * 2021-06-02 2021-09-28 中国人民解放军海军工程大学 Data warehouse system for ship propulsion system performance analysis
CN113487347A (en) * 2021-06-22 2021-10-08 南方电网能源发展研究院有限责任公司 Intelligent cost analysis method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040122646A1 (en) * 2002-12-18 2004-06-24 International Business Machines Corporation System and method for automatically building an OLAP model in a relational database
CN104123346A (en) * 2014-07-02 2014-10-29 广东电网公司信息中心 Structural data searching method
CN104281713A (en) * 2014-10-28 2015-01-14 用友软件股份有限公司 Data summarizing method and data summarizing device
CN104317936A (en) * 2014-10-31 2015-01-28 北京思特奇信息技术股份有限公司 ROLAP (relational on-line analysis processing) analysis engine design method and device on basis of star models
CN104572894A (en) * 2014-12-24 2015-04-29 天津南大通用数据技术股份有限公司 Method for describing service model by utilizing XML (Extensible Markup Language) in business intelligence and business intelligence system

Also Published As

Publication number Publication date
CN105404637B (en) 2019-03-01

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant