CN103853820A - Data processing method and data processing system - Google Patents

Publication number
CN103853820A
Authority
CN
China
Prior art keywords
dimension
data
subject heading
heading list
fact table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410058539.2A
Other languages
Chinese (zh)
Other versions
CN103853820B (en)
Inventor
陈国强
朱培冬
郝栋
姬永杰
刘广财
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing UYU Government Software Co.,Ltd.
Original Assignee
BEIJING UFIDA SOFTWARE CO LTD
Application filed by BEIJING UFIDA SOFTWARE CO LTD
Priority to CN201410058539.2A
Publication of CN103853820A
Application granted
Publication of CN103853820B
Legal status: Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof

Abstract

The invention discloses a data processing method and a data processing system. The method comprises the following steps: first, storing raw data in a theme table and recording in the theme table the codes and names of the dimensions for which dimension tables need to be generated; then generating the corresponding dimension tables according to the dimension codes and names recorded in the theme table, storing the corresponding dimension table data in the dimension tables, and generating a dimension ID for each dimension; generating, from the theme table and according to the dimension IDs, a fact table associated with the dimension tables, and storing the corresponding fact table data in the fact table; finally, generating an application summary table from the fact table as needed to obtain application summary data, and storing that data in the application summary table. By adding the theme table and the application summary table to the data processing cycle, the method and system allow data to be reused on the basis of the theme table; through calculations supported by derived dimensions, the data analysis caliber (statistical scope) can be converted, which effectively improves both the efficiency of data processing and the usefulness of the processed data.

Description

A data processing method and system
Technical field
The present invention relates to data processing for data warehouses, and in particular to a data processing method and system.
Background
With the rise of the big data concept, government agencies at all levels are using data warehouse technology to exploit the structured and unstructured data produced in the course of government administration. Traditional data warehouse technology presupposes that it is already clear how the data will be used. In practice, however, government departments need to collect the data first, before they have fully worked out how to use it.
The basic principle of traditional data warehouse technology is to extract raw data from the data source into a staging area, clean, convert, and process it in a unified way, and then load it into dimension tables and fact tables; data visualization tools then work on the multidimensional cube formed by the fact tables and dimension tables. Building a government data warehouse this way faces a series of challenges. Historical data is relatively coarse in caliber (statistical scope) and poorly standardized, data management calibers differ between years and between administrative divisions, and the business side lacks unified conversion rules, so business staff often have to confirm the conversion of business data item by item, which involves a prohibitive amount of work. Even when data has been converted to a unified standard, the caliber used for government administration differs greatly from the caliber used for analysis, so business data cannot be used directly for analysis. Technicians often have to develop complex conversion code and perform a large amount of ad hoc computation, which frequently leads to slow presentation or even inconsistent data. As a result, user needs cannot be met quickly and the data is far less useful.
Summary of the invention
In view of the defects of the prior art, the object of the present invention is to provide a data processing method and system that improve the efficiency of data processing and the usefulness of the processed data.
To achieve this object, the technical solution adopted by the present invention is a data processing method comprising the following steps:
(1) storing raw data in a theme table, and recording in the theme table the code and name of each dimension for which a dimension table needs to be generated; the theme table is a data table, built according to the description needs of the business objects, for storing all kinds of original business data;
(2) generating the corresponding dimension tables from the dimension codes and names recorded in the theme table, storing the corresponding dimension table data in the dimension tables, and generating a dimension ID for each dimension;
(3) generating, from the theme table and according to the dimension IDs, a fact table associated with the dimension tables, and storing the corresponding fact table data in the fact table;
(4) generating an application summary table from the fact table as required to obtain application summary data, and storing that data in the application summary table; the application summary table stores the fact table data after it has been converted according to a preset derived calculation relation.
Further, in the data processing method described above, in step (1) the raw data is first stored in a staging area before being stored in the theme table; the raw data is then read from the staging area and stored in the theme table, after which the corresponding raw data in the staging area is deleted.
Further, in the data processing method described above, in step (1) the raw data is preprocessed before being stored in the theme table; the preprocessing comprises filling in global attributes missing from the raw data, converting administrative division and date attributes to a unified form, and deleting data that is not needed in the theme table; the unneeded data comprises voided business data and in-transit business data.
Further, in the data processing method described above, in step (3), generating the fact table associated with the dimension tables from the theme table according to the dimension IDs comprises:
according to the dimension name of the required data, obtaining the corresponding dimension code from the theme table; associating the dimension code recorded in the theme table with the dimension code in the corresponding dimension table to obtain the dimension ID; storing the dimension ID in the fact table; and storing the non-dimension-table data under that dimension in the theme table directly in the fact table.
Further, in the data processing method described above, in step (3) the fact table is generated by incremental extraction: only theme table data within a set time period is extracted and updated into the fact table.
Further, in the data processing method described above, in step (4), generating the application summary table from the fact table as required comprises:
presetting a derived calculation relation as required, and generating a derived dimension table according to it; the derived calculation relation is the calculation relation between the derived dimension table and the dimension table;
associating the derived calculation formula with the dimension IDs in the fact table, converting the data in the fact table according to the derived calculation formula, and generating the application summary table.
Still further, in the data processing method described above, in step (4) the derived calculation relation comprises addition, subtraction, and multiplication.
Further, in the data processing method described above, in step (4), when the data in the fact table is converted according to the derived calculation formula, the derived calculation relation is converted into a Cartesian product operation.
A data processing system, comprising:
a theme table construction module, configured to build the theme table, store raw data in it, and record in it the code and name of each dimension for which a dimension table needs to be generated; the theme table is a data table, built according to the description needs of the business objects, for storing all kinds of business data;
a dimension table generation module, configured to generate the corresponding dimension tables from the dimension codes and names recorded in the theme table, store the corresponding dimension table data in the dimension tables, and generate a dimension ID for each dimension;
a fact table generation module, configured to generate, from the theme table and according to the dimension IDs, a fact table associated with the dimension tables, and to store the corresponding fact table data in the fact table;
an application summary table generation module, configured to generate an application summary table from the fact table as required to obtain application summary data, and to store that data in the application summary table; the application summary table stores the fact table data after it has been converted according to a preset derived calculation relation.
Further, in the data processing system described above, the application summary table generation module comprises:
a derived dimension table generation unit, configured to preset a derived calculation relation and generate a derived dimension table according to it; the derived calculation relation is the calculation relation between the derived dimension table and the dimension table;
an application summary table generation unit, configured to associate the derived calculation formula with the dimension IDs in the fact table, convert the data in the fact table according to the derived calculation formula, and generate the application summary table.
The beneficial effects of the present invention are as follows. By adding the "theme table", the "derived dimension", and the "application summary table" to the data processing flow, the method and system allow data to be reused on the basis of the theme table. In addition, calculations supported by derived dimensions convert the data analysis caliber, which effectively improves both the efficiency of data processing and the usefulness of the processed data.
Brief description of the drawings
Fig. 1 is an architecture diagram of a data processing system in an embodiment;
Fig. 2 is a structural block diagram of a data processing system in an embodiment;
Fig. 3 is a flowchart of a data processing method in an embodiment;
Fig. 4 is a schematic diagram of the structure of a theme table in an embodiment;
Fig. 5 is a schematic diagram of a dimension table in an embodiment;
Fig. 6 is a schematic diagram of a dimension table in an embodiment;
Fig. 7 is a schematic diagram of a fact table in an embodiment;
Fig. 8 is a schematic diagram of a derived dimension table in an embodiment;
Fig. 9 is a schematic diagram of the conversion of a derived calculation relation into a Cartesian product operation in an embodiment;
Fig. 10 is a schematic diagram of an application summary table in an embodiment.
Detailed description
The present invention is described in further detail below with reference to the drawings and embodiments.
For a better understanding of the present invention, the technical terms used in this description are explained first:
Staging area: temporarily stores the raw data obtained from the data source. The data stored here is generally incomplete, for example one year, one month, or one day of data. After the data has been extracted into the theme table, the staging area data can be discarded.
Theme table: permanently stores all kinds of government business data across years and across administrative divisions, and is the core of the data warehouse. Its structure is built according to the description needs of the business objects, and its data keeps the fine granularity of the original business so that it describes the original business as fully as possible.
Dimension table: a component of a traditional data warehouse, used to store dimension table data.
Fact table: a component of a traditional data warehouse, used to store processed business data. Together with the dimension tables it forms the data warehouse "cube", which supports flexible multidimensional analysis.
Derived dimension table: stores the data that defines derived dimensions, including the list of derived dimension values and the calculation formula of each derived dimension value.
Application summary table: stores the statistical analysis data calculated according to the derived dimension definitions. It is built for actual statistical analysis needs, stores data at a coarser granularity, and can be used directly by visualization tools.
Fig. 1 and Fig. 2 show, respectively, the architecture diagram and the structural block diagram of a data processing system in this embodiment. The system comprises a theme table construction module 11, a dimension table generation module 12, a fact table generation module 13, and an application summary table generation module 14, wherein:
The theme table construction module 11 builds the theme table, stores raw data in it, and records in it the code and name of each dimension for which a dimension table needs to be generated. The theme table is a data table, built according to the description needs of the business objects, for storing all kinds of business data.
The dimension table generation module 12 generates the corresponding dimension tables from the dimension codes and names recorded in the theme table, stores the corresponding dimension table data in the dimension tables, and generates a dimension ID for each dimension.
The fact table generation module 13 generates, from the theme table and according to the dimension IDs, a fact table associated with the dimension tables, and stores the corresponding fact table data in the fact table.
The application summary table generation module 14 generates an application summary table from the fact table as required to obtain application summary data, and stores that data in the application summary table. The application summary table stores the fact table data after it has been converted according to a preset derived calculation relation. This module comprises a derived dimension table generation unit 141 and an application summary table generation unit 142. The derived dimension table generation unit 141 presets a derived calculation relation and generates a derived dimension table according to it; the derived calculation relation is the calculation relation between the derived dimension table and the dimension table. The application summary table generation unit 142 associates the derived calculation formula with the dimension IDs in the fact table, converts the data in the fact table according to the derived calculation formula, and generates the application summary table.
Fig. 3 shows the flowchart of a data processing method in this embodiment, based on the system shown in Fig. 2. The method comprises the following steps:
Step S21: build the theme table and store the raw data in it;
A theme table is built first. The raw data from the data source is stored in the theme table, and the code and name of each dimension for which a dimension table needs to be generated are recorded in it. The theme table is a data table, built according to the description needs of the business objects, for storing all kinds of original business data. In this embodiment the theme table permanently stores all kinds of government business data across years and across administrative divisions and is the core of the data warehouse. Its data keeps the fine granularity of the original business data and so describes the original business as fully as possible.
In this embodiment, a staging area can also be set up, before raw data is stored in the theme table, to temporarily hold the raw data obtained from the data source. The raw data is first stored in the staging area and is then read from the staging area and stored in the theme table, as shown in Fig. 1. The staging area generally does not hold the complete raw data but only the data for a set time range, such as one year, one month, or one day. Once the raw data has been stored in the theme table, the corresponding raw data in the staging area can be deleted.
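The staging step above can be sketched as follows. This is only an illustration under stated assumptions: the table names `staging` and `theme`, their columns, and the function `move_batch` are hypothetical, and SQLite stands in for whatever database the warehouse actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging (src_id TEXT, year INTEGER, amount REAL);
    CREATE TABLE theme (src_id TEXT PRIMARY KEY, year INTEGER, amount REAL);
    INSERT INTO staging VALUES ('p1', 2013, 10.0), ('p2', 2013, 20.0);
""")

def move_batch(conn, year):
    """Copy one year of staged rows into the theme table, then clear them."""
    with conn:  # both statements commit together, or roll back on error
        conn.execute(
            "INSERT INTO theme SELECT src_id, year, amount FROM staging WHERE year = ?",
            (year,))
        conn.execute("DELETE FROM staging WHERE year = ?", (year,))

move_batch(conn, 2013)
```

Running the copy and the delete in one transaction means a failed load leaves the staged batch in place for a retry.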
The raw data from the data source can be a data file, such as an Excel file, or a snapshot of a database. Before the raw data is stored in the theme table, it also needs some preprocessing, which mainly consists of filling in global attributes missing from the raw data, converting administrative division and date attributes to a unified form, and deleting data that is not needed in the theme table.
The global attributes include, but are not limited to, the date (year, month, and so on) and the administrative division of the raw data. When division and date attributes are unified, the codes of these two kinds of attribute must be consistent with the codes in the division dimension table and the date dimension table that are created in step S22. The unneeded data comprises voided business data and in-transit business data.
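The preprocessing described above might look roughly like the sketch below. It is not the patent's implementation: the record layout, the field names (`year`, `division_code`, `status`), the status values, and the code mapping are all assumptions made for the example.

```python
from typing import Iterable

# Hypothetical status values for records that must not enter the theme table.
UNWANTED_STATUSES = {"voided", "in_transit"}

def preprocess(records: Iterable[dict], year: int, division_code: str,
               division_map: dict) -> list:
    """Fill missing global attributes, unify division codes, drop unwanted rows."""
    cleaned = []
    for rec in records:
        if rec.get("status") in UNWANTED_STATUSES:
            continue  # voided or in-transit business data is not loaded
        row = dict(rec)
        # Fill in global attributes that the source did not supply.
        row.setdefault("year", year)
        row.setdefault("division_code", division_code)
        # Map legacy codes onto the codes used by the division dimension table.
        row["division_code"] = division_map.get(row["division_code"],
                                                row["division_code"])
        cleaned.append(row)
    return cleaned

raw = [
    {"voucher_no": "A1", "amount": 100.0, "status": "paid"},
    {"voucher_no": "A2", "amount": 50.0, "status": "voided"},
    {"voucher_no": "A3", "amount": 70.0, "status": "paid", "division_code": "3201-old"},
]
theme_rows = preprocess(raw, year=2013, division_code="3201",
                        division_map={"3201-old": "3201"})
```

Here the voided voucher is dropped, the first record receives the default year and division, and the legacy division code on the third record is mapped to the unified code.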
Fig. 4 shows the structure of a "payment voucher" theme table in this embodiment. The "field name" column of the figure lists the attributes of the raw data, including the global attributes, and the field description explains each attribute to help the reader understand the field name; for example, the attribute whose field name is "YEAR" means "year". Besides keeping attribute information such as the original payment ID, the payment voucher number, and the summary, the table also provides code and name fields for the dimensions used in step S22. As shown in Fig. 4, "business office", "fund nature", "budget unit", and so on are attributes for which dimension tables need to be generated (the dimensions of step S22), so code and name fields are designed for them in this step. In this embodiment, the prefix of each such field name follows the theme table design specification, and the field names ending in "CODE" and "NAME" hold the dimension code and the dimension name respectively. In addition, the "source system ID" in Fig. 4 is the unique primary key of the original business data table. The theme table stores the primary key of the corresponding source row, so that theme table data can be associated with, and checked against, the originally collected data.
With a properly designed theme table, each record can describe one business transaction completely on its own. A record in the table reflects one concrete payment: when it was paid, which unit and which office it was paid to, the amount paid, and so on.
Which dimension codes and names the theme table needs is decided by the user as required, but the codes and names of the attributes for which dimension tables are generated must be consistent with the dimension codes and names in the dimension tables. Taking the fund nature in Fig. 4 as an example, the fund nature code and name in the theme table must be consistent with the fund nature code and name in the dimension table generated in step S22.
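A theme table following the CODE/NAME suffix convention could be declared as in the sketch below. Only the field name `YEAR` and the attribute meanings come from the text; every other column name, the table name, and the helper `dimension_prefixes` are hypothetical, and SQLite is used purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE theme_payment_voucher (
        SRC_ID      TEXT PRIMARY KEY,  -- primary key of the source business row
        YEAR        INTEGER NOT NULL,  -- global attribute
        VOUCHER_NO  TEXT,
        SUMMARY     TEXT,
        REGION_CODE TEXT, REGION_NAME TEXT,  -- administrative division dimension
        DEPT_CODE   TEXT, DEPT_NAME   TEXT,  -- business office dimension
        FUND_CODE   TEXT, FUND_NAME   TEXT,  -- fund nature dimension
        UNIT_CODE   TEXT, UNIT_NAME   TEXT,  -- budget unit dimension
        PAY_AMOUNT  REAL
    )
""")

def dimension_prefixes(conn, table):
    """Return the prefixes that have both a _CODE and a _NAME column."""
    cols = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    return sorted(c[:-5] for c in cols
                  if c.endswith("_CODE") and c[:-5] + "_NAME" in cols)

prefixes = dimension_prefixes(conn, "theme_payment_voucher")
```

Because every dimension attribute is stored as a code and name pair with a shared prefix, the dimensions that need dimension tables can be discovered from the schema itself.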
Step S22: generate dimension tables from the theme table as required;
Step S23: generate the fact table from the theme table as required;
In step S21, the codes and names of the dimensions for which dimension tables are needed were recorded in the theme table. A corresponding dimension table is generated for each of these dimensions from the recorded codes and names, the dimension table data (that is, the dimension names and dimension codes) is stored in the dimension table, and a dimension ID is generated for each dimension value. For example, for the "administrative division" dimension recorded in a theme table, a division dimension table is generated that records the codes and names of all division values together with a generated dimension ID for each. In the dimension table shown in Fig. 5, the entry "office leadership" has dimension name "office leadership", dimension code "01", and dimension ID "118301"; as described in step S21, the same dimension code "01" and dimension name "office leadership" are also recorded in the theme table.
In addition, to generate dimension tables efficiently and avoid duplicates, the dimension codes recorded in the theme table can first be compared with the dimension codes in the corresponding dimension table. If no matching dimension code and name is found in the dimension table, the code and name recorded in the theme table are stored in the dimension table and a dimension ID is generated for that code. If a match is found, the dimension table already holds this code and name and nothing needs to be generated. Updates to a dimension table follow these rules: the value set of a dimension table may only be extended and modified, never reduced (the value set of the same dimension can differ between years, and deleting values would leave historical business data without a matching dimension code, making it impossible to analyze); the value set of the local division and the current year serves as the base value set; when a code and name conflict in the current year, the latest name is used; and a code from a historical year that is not used in the current year is added to the dimension table as long as it does not conflict with the current hierarchy.
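The compare-then-insert rule for dimension tables can be sketched as follows. The table and column names are assumptions, the auto-incrementing ID merely stands in for whatever ID scheme the real system uses, and the sketch treats the theme table rows as current-year data, so a differing name simply overwrites the stored one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_dept (
        dim_id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated dimension ID
        dim_code TEXT UNIQUE NOT NULL,
        dim_name TEXT NOT NULL
    );
    CREATE TABLE theme (dept_code TEXT, dept_name TEXT, amount REAL);
    INSERT INTO dim_dept (dim_code, dim_name) VALUES ('01', 'Office leadership');
    INSERT INTO theme VALUES ('01', 'Office leadership (renamed)', 10.0),
                             ('02', 'Budget office', 20.0),
                             ('02', 'Budget office', 5.0);
""")

def refresh_dimension(conn):
    """Add unseen codes and refresh names; existing rows are never deleted."""
    pairs = conn.execute(
        "SELECT DISTINCT dept_code, dept_name FROM theme").fetchall()
    for code, name in pairs:
        found = conn.execute(
            "SELECT dim_id FROM dim_dept WHERE dim_code = ?", (code,)).fetchone()
        if found is None:
            conn.execute(
                "INSERT INTO dim_dept (dim_code, dim_name) VALUES (?, ?)",
                (code, name))
        else:
            # Code already known: keep its ID, adopt the latest name.
            conn.execute(
                "UPDATE dim_dept SET dim_name = ? WHERE dim_code = ?",
                (name, code))

refresh_dimension(conn)
rows = conn.execute(
    "SELECT dim_code, dim_name FROM dim_dept ORDER BY dim_code").fetchall()
```

Since the function only inserts and updates, re-running it against the same theme table leaves the dimension table unchanged, and historical codes keep their IDs.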
After the dimension tables have been generated, a fact table associated with the dimension tables is generated from the theme table according to the dimension IDs in the dimension tables, and the corresponding fact table data is stored in the fact table. Specifically:
According to the dimension name of the required data, the corresponding dimension code is obtained from the theme table. The dimension code recorded in the theme table is associated with the dimension code in the corresponding dimension table to obtain the dimension ID. The dimension ID is stored in the fact table, and the non-dimension-table data under that dimension in the theme table is stored directly in the fact table.
In other words, when the fact table is generated, the dimension code is first looked up in the theme table by the dimension name of the required data. The theme table is then joined to the dimension table on the dimension code to obtain the corresponding dimension ID, and the dimension ID is stored in the fact table together with the required non-dimension-table data from the theme table. Take the data for "business office" in Fig. 4 as an example: a business office code in the "payment voucher" theme table is first associated with the dimension code in the business office dimension table of Fig. 5 to obtain the dimension ID of that business office; the dimension ID is then stored in the fact table, and the non-dimension-table data of that business office in the theme table (the data other than the dimension code and name) is stored directly in the fact table.
The fact table holds only dimension IDs and concrete measure fields (the non-dimension-table data). The fact table is generated from the theme table by incremental extraction: depending on the type of fact table, only the theme table data for a set time period (for example the most recent year, month, or day) is extracted and updated into the fact table.
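A minimal sketch of this join-based, incremental fact load is given below. The dimension ID 118301 and code "01" come from the Fig. 5 description; the second dimension row, the table and column names, and the choice of "one year" as the extraction window are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_dept (dim_id INTEGER PRIMARY KEY, dim_code TEXT UNIQUE, dim_name TEXT);
    CREATE TABLE theme (src_id TEXT, year INTEGER, dept_code TEXT,
                        dept_name TEXT, pay_amount REAL);
    CREATE TABLE fact_payment (year INTEGER, dept_id INTEGER, pay_amount REAL);
    INSERT INTO dim_dept VALUES (118301, '01', 'Office leadership'),
                                (118302, '02', 'Budget office');
    INSERT INTO theme VALUES ('p1', 2012, '01', 'Office leadership', 40.0),
                             ('p2', 2013, '01', 'Office leadership', 60.0),
                             ('p3', 2013, '02', 'Budget office', 25.0);
""")

def load_fact(conn, year):
    """Reload one year of the fact table from the theme table."""
    with conn:
        # Clear the window first so the load can be repeated safely.
        conn.execute("DELETE FROM fact_payment WHERE year = ?", (year,))
        # Swap each dimension code for its dimension ID; copy the measure as is.
        conn.execute("""
            INSERT INTO fact_payment (year, dept_id, pay_amount)
            SELECT t.year, d.dim_id, t.pay_amount
            FROM theme t
            INNER JOIN dim_dept d ON d.dim_code = t.dept_code
            WHERE t.year = ?
        """, (year,))

load_fact(conn, 2013)
fact_rows = conn.execute(
    "SELECT year, dept_id, pay_amount FROM fact_payment ORDER BY dept_id").fetchall()
```

Only the 2013 rows are touched, so earlier periods already in the fact table would be left alone, and the resulting fact rows carry dimension IDs and measures but no codes or names.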
In this embodiment, the data stored in each table is named after the table: data stored in a dimension table is called dimension table data, data stored in a fact table is called fact table data, and data in an application summary table is called application summary data.
Step S24: generate the application summary table from the fact table as required.
An application summary table is generated from the fact table as required to obtain application summary data, which is stored in the application summary table. The application summary table stores the fact table data after it has been converted according to a preset derived calculation relation. In this embodiment, generating the application summary table from the fact table as required comprises:
presetting a derived calculation relation as required, and generating a derived dimension table according to it; the derived calculation relation is the calculation relation between the derived dimension table and the dimension table;
associating the derived calculation formula with the dimension IDs in the fact table, converting the data in the fact table according to the derived calculation formula, and generating the application summary table.
The derived calculation relation comprises addition, subtraction, and multiplication. When the data in the fact table is converted according to the calculation relation between the derived dimension table and the original dimension table (the dimension table generated in step S22), the calculation relation is converted into a Cartesian product operation to improve computational efficiency.
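One plausible reading of this conversion is that each derived value, defined by adding, subtracting, and scaling original values, is flattened into rows pairing the derived value with an original value and a signed coefficient. The sketch below illustrates that reading only; the formula encoding, the placeholder names D1, D2, A, B, C, and the function `to_conversion_rows` are all hypothetical and are not taken from the patent.

```python
# Each derived value is a signed, weighted sum of original dimension values.
formulas = {
    "D1": [("+", 1.0, "A"), ("+", 1.0, "B")],   # D1 = A + B
    "D2": [("+", 1.0, "A"), ("-", 0.5, "C")],   # D2 = A - 0.5 * C
}

def to_conversion_rows(formulas):
    """Flatten formulas into (derived, original, coefficient) rows."""
    rows = []
    for derived, terms in formulas.items():
        for sign, factor, original in terms:
            # Subtraction becomes a negative coefficient; multiplication
            # by a constant becomes the magnitude of the coefficient.
            coefficient = factor if sign == "+" else -factor
            rows.append((derived, original, coefficient))
    return rows

conversion_rows = to_conversion_rows(formulas)
```

Stored as a table, these rows let a single join and a grouped sum evaluate every formula at once, instead of interpreting each formula separately.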
In this embodiment, the generation of the application summary table depends on the preset derived calculation relation, from which the corresponding derived dimension table is generated. The derived calculation relation supports the operations "add", "subtract", and "multiply". For computational efficiency, these operations are converted into a Cartesian product operation; Fig. 9 shows an example in which part of the calculation logic of the derived dimension "statistical division" is converted into a Cartesian product. The pseudocode for generating the application summary table from the fact table is:
INSERT INTO [application summary table] ("derived dimension", "measure")
SELECT DIMT."derived dimension", SUM(FACT."measure" * DIMT."coefficient")
FROM [fact table] FACT
INNER JOIN [derived dimension conversion table] DIMT ON DIMT."original dimension" = FACT."original dimension"
GROUP BY DIMT."derived dimension";
In this pseudocode, the names in [ ] are table name placeholders, to be replaced with the actual table names during development, and the quoted names are field placeholders, to be replaced with the actual fields of the actual tables. Specifically, INSERT INTO [application summary table] inserts data into the application summary table; [application summary table] stands for the name of that table and is replaced with the real table name in the development code. ("derived dimension", "measure") are the fields of the application summary table (mainly the derived dimension ID field and the measure fields) and are replaced with the real field names of the application summary table. SELECT names the fields to be queried, that is, the dimension and the measure to be inserted into the application summary table. FROM indicates that the data is queried from the fact table and the derived dimension conversion table. INNER JOIN associates the original dimension ID that corresponds to each derived dimension with the original dimension ID in the fact table. GROUP BY sets the grouping level, here grouping by derived dimension. Writing such pseudocode is within the ordinary skill of those skilled in the art, and it can be adjusted as required.
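To make the pseudocode concrete, the sketch below runs an equivalent statement against SQLite with a few made-up rows. The table names, column names, dimension values (A, B, C, D1, D2), and amounts are all invented for the example and do not come from the patent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact (original_dim TEXT, measure REAL);
    CREATE TABLE dim_conversion (derived_dim TEXT, original_dim TEXT, coefficient REAL);
    CREATE TABLE app_summary (derived_dim TEXT PRIMARY KEY, measure REAL);
    INSERT INTO fact VALUES ('A', 100.0), ('A', 20.0), ('B', 30.0), ('C', 40.0);
    INSERT INTO dim_conversion VALUES
        ('D1', 'A', 1.0), ('D1', 'B', 1.0),      -- D1 = A + B
        ('D2', 'A', 1.0), ('D2', 'C', -0.5);     -- D2 = A - 0.5 * C
""")

# Join every fact row to the conversion rows for its original dimension value,
# weight the measure by the coefficient, and total per derived value.
conn.execute("""
    INSERT INTO app_summary (derived_dim, measure)
    SELECT c.derived_dim, SUM(f.measure * c.coefficient)
    FROM fact f
    INNER JOIN dim_conversion c ON c.original_dim = f.original_dim
    GROUP BY c.derived_dim
""")
summary = dict(conn.execute("SELECT derived_dim, measure FROM app_summary"))
```

With these rows, D1 totals 150.0 (120 from A plus 30 from B) and D2 totals 100.0 (120 from A minus half of C's 40), so add, subtract, and multiply are all handled by the same join and grouped sum.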
In the data processing step of above-mentioned steps S21-S24, there is no before how explicit data to utilize, can not perform step S22, S23, S24, after how explicit data utilizes, regeneration dimension table, fact table and application summary sheet as required; Certainly,, in the time that data utilize demand to change, step S22, S23, S24 need to re-execute.Carry out data processing by method and system of the present invention and there is following effect:
1) improve the quality of data by the flow chart of data processing of standard.Data enter subject heading list, dimension table, fact table and application summary sheet successively, and in the time that transformation rule is indefinite, by deposit data, in subject heading list, data are not deliberately changed, and have guaranteed the genuineness of data.
On the basis of dimension table and fact table, increase subject heading list, derived from peacekeeping application summary sheet concept, specified the purposes of every class tables of data, data have first been put into subject heading list, before fact table and the generation of application summary sheet are postponed till in data-switching work.
2) The processed data is highly practical. The data entering the application summary table is generated entirely according to business needs and meets the needs of statistical analysis. By introducing derived dimension definitions, the "add", "subtract", and "multiply" operation relations can be set flexibly, and converting them into Cartesian product operations greatly improves conversion efficiency. The operational data produced by business staff is thus quickly converted into the statistical analysis data that managers need, which makes the data more valuable.
3) Data can be reused. The data warehouse is built around managing the subject table, rather than solely for the purpose of generating a multidimensional cube. When analysis requirements change, the raw data does not need to be reloaded; the dimension tables, fact table, and application summary table can be regenerated from the subject table.
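To illustrate effect 2), the following sketch shows one possible way a derived-dimension formula with "add" and "subtract" terms could be expanded into relation rows. The representation of each term as a signed coefficient, and the function names formula_to_rows and evaluate, are assumptions made for this illustration; the patent does not specify the row layout, and the "multiply" relation is not covered here.

```python
import re

def formula_to_rows(derived_id, formula):
    """Expand e.g. '3201 + 3202 - 3203' into (derived_id, code, coefficient) rows."""
    rows = []
    # Each term is an optional sign followed by a numeric dimension code.
    for sign, code in re.findall(r"([+-]?)\s*(\d+)", formula):
        coefficient = -1 if sign == "-" else 1
        rows.append((derived_id, code, coefficient))
    return rows

def evaluate(rows, values):
    """Apply the relation rows to per-dimension values (hypothetical data)."""
    return sum(coefficient * values[code] for _derived, code, coefficient in rows)

rows = formula_to_rows("D1", "3201 + 3202 - 3203")
result = evaluate(rows, {"3201": 10, "3202": 20, "3203": 5})
print(rows, result)
```

Once the formula is stored as rows, evaluating it becomes a join between the rows and the fact data followed by a sum, so no formula parsing is needed at query time.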
For a better understanding of the present invention, it is further described below with reference to a specific embodiment.
Embodiment
The raw data in this embodiment is government data of Jiangsu Province. From this large body of government data, the following are extracted as required: the total "index amount" and "payment amount" for 2013 of the five prefecture-level cities of Nanjing, Wuxi, Xuzhou, Changzhou, and Suzhou; the "index amount" and "payment amount" of the eastern region among these five cities; and the "index amount" and "payment amount" of the special caliber.
Like "city total" and "eastern region", the "special caliber" above is a statistical analysis caliber, that is, a caliber label. The "special caliber" is mainly used as a data analysis caliber label for classification relative to the normal groupings. Caliber labels can take many forms, and different caliber labels can be set as needed. The "index amount" is the amount that a given administrative division (such as Nanjing) may spend in a given year (such as 2013); it is equivalent to the index quota. The "payment amount" is the amount actually spent by a given administrative division (such as Nanjing) in a given year (such as 2013); payments are made against the index quota.
Step 1: build a subject table and store the Jiangsu Province government data in it. Because the data ultimately required concerns five prefecture-level cities, an administrative-division dimension table must be set up, so the subject table must record the dimension code and name of each division. Note that the subject table data is the most complete: when the subject table data is generated, both dimension table data and non-dimension table data (such as amount data) are generated. The fact table data is then generated from the subject table data.
Step 2: generate the dimension table from the division dimension codes and names in the subject table. The dimension table stores the dimension codes and names of the five required division dimensions (Nanjing, Wuxi, Xuzhou, Changzhou, and Suzhou), and a dimension ID is generated for each division dimension, as shown in Figure 6.
Step 3: associate the dimension codes in the subject table with the dimension codes in the dimension table to obtain the dimension IDs of Nanjing, Wuxi, Xuzhou, Changzhou, and Suzhou, and store the obtained dimension IDs in the fact table. At the same time, the "index amount" and "payment amount" data under the corresponding dimension IDs in the subject table are stored in the fact table, as shown in Figure 7.
Step 4: preset the derived calculation relations as required. In this embodiment, the statistics needed are the total "index amount" and "payment amount" of the five prefecture-level cities, the "index amount" and "payment amount" of the eastern region among these five cities, and the "index amount" and "payment amount" of the special caliber. The derived calculation formulas are therefore set to "3201 Nanjing + 3202 Wuxi + 3203 Xuzhou + 3204 Changzhou + 3205 Suzhou", "3201 Nanjing + 3203 Xuzhou + 3204 Changzhou", and "3201 Nanjing + 3205 Suzhou", and the derived dimension table is generated from these formulas, as shown in Figure 8. The calculation relations are then converted into the Cartesian product relation table shown in Figure 9. Finally, the derived dimension table is associated with the fact table, and the application summary table is generated using the pseudocode, yielding the required data shown in Figure 10: the five-city totals of "index amount" and "payment amount" are 790 and 700 respectively, those of the eastern region are 470 and 410, and those of the special caliber are 220 and 190.
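The four steps of the embodiment can be traced with a minimal sketch. The division codes and the three totals come from the text above; the per-city amounts are hypothetical, chosen only so that the sums reproduce those totals, since the actual values appear only in the figures. All variable names are illustrative.

```python
# Step 1: subject table rows (division code, name, index amount, payment amount).
# Per-city amounts are hypothetical; only the resulting totals come from the text.
subject_table = [
    ("3201", "Nanjing", 100, 90),
    ("3202", "Wuxi", 200, 190),
    ("3203", "Xuzhou", 170, 150),
    ("3204", "Changzhou", 200, 170),
    ("3205", "Suzhou", 120, 100),
]

# Step 2: dimension table with one generated dimension ID per division code.
dimension_table = {}
for code, name, _index, _paid in subject_table:
    if code not in dimension_table:
        dimension_table[code] = {"dim_id": len(dimension_table) + 1, "name": name}

# Step 3: fact table keyed by dimension ID, carrying the two measures.
fact_table = [
    (dimension_table[code]["dim_id"], index_amount, paid_amount)
    for code, _name, index_amount, paid_amount in subject_table
]

# Step 4: derived dimensions, each a sum over original division codes.
derived_formulas = {
    "five-city total": ["3201", "3202", "3203", "3204", "3205"],
    "eastern region": ["3201", "3203", "3204"],
    "special caliber": ["3201", "3205"],
}

# Relation table: one (derived dimension, original dimension ID) row per term.
relation_table = [
    (derived, dimension_table[code]["dim_id"])
    for derived, codes in derived_formulas.items()
    for code in codes
]

# Join the relation table to the fact table and sum per derived dimension.
summary = {}
for derived, dim_id in relation_table:
    for fact_dim_id, index_amount, paid_amount in fact_table:
        if fact_dim_id == dim_id:
            total_index, total_paid = summary.get(derived, (0, 0))
            summary[derived] = (total_index + index_amount, total_paid + paid_amount)

print(summary)
```

With these assumed inputs, the summary reproduces the figures quoted in the embodiment: 790 and 700 for the five-city total, 470 and 410 for the eastern region, and 220 and 190 for the special caliber.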
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (10)

1. A data processing method, comprising the following steps:
(1) storing raw data in a subject table, and recording in the subject table the codes and names of the dimensions for which dimension tables need to be generated; the subject table being a data table, built according to the description requirements of business objects, for storing all kinds of original business data;
(2) generating corresponding dimension tables according to the codes and names of the dimensions recorded in the subject table, storing the corresponding dimension table data in the dimension tables, and generating a dimension ID for each dimension;
(3) according to the dimension IDs, generating from the subject table a fact table associated with the dimension tables, and storing the corresponding fact table data in the fact table;
(4) generating an application summary table from the fact table as required to obtain application summary data, and storing the application summary data in the application summary table; the application summary table being used to store the data obtained by converting the data of the fact table according to a preset derived calculation relation.
2. The data processing method as claimed in claim 1, characterized in that, in step (1), before the raw data is stored in the subject table, the raw data is first stored in a temporary area; the raw data is then obtained from the temporary area and stored in the subject table, and after the raw data has been stored in the subject table, the corresponding raw data in the temporary area is deleted.
3. The data processing method as claimed in claim 1 or 2, characterized in that, in step (1), the raw data is preprocessed before being stored in the subject table; the preprocessing comprises filling in global attributes missing from the raw data, uniformly converting division and date attributes, and deleting data that is not needed in the subject table; the unneeded data comprises cancelled business data and in-transit business data.
4. The data processing method as claimed in claim 1, characterized in that, in step (3), the specific manner of generating from the subject table the fact table associated with the dimension tables according to the dimension IDs comprises:
according to the dimension name of the required data, obtaining the dimension code corresponding to that dimension name from the subject table; associating the dimension code recorded in the subject table with the dimension code in the corresponding dimension table to obtain the dimension ID; storing the dimension ID in the fact table; and storing the non-dimension table data under that dimension in the subject table directly in the fact table.
5. The data processing method as claimed in claim 1 or 4, characterized in that, in step (3), when the fact table is generated, incremental extraction is used, whereby only the subject table data of a set time period is extracted and updated into the fact table.
6. The data processing method as claimed in one of claims 1 to 4, characterized in that, in step (4), the specific manner of generating the application summary table from the fact table as required comprises:
presetting a derived calculation relation as required, and generating a derived dimension table according to the derived calculation relation; the derived calculation relation being the calculation relation between the derived dimension table and the dimension table;
associating the derived calculation formula with the dimension IDs in the fact table, converting the data in the fact table according to the derived calculation formula, and generating the application summary table.
7. The data processing method as claimed in claim 6, characterized in that, in step (4), the derived calculation relation comprises the operation relations of addition, subtraction, and multiplication.
8. The data processing method as claimed in claim 7, characterized in that, in step (4), when the data in the fact table is converted according to the derived calculation formula, the derived calculation relation is converted into a Cartesian product operation relation.
9. A data processing system, comprising:
a subject table building module, configured to build a subject table, store raw data in the subject table, and record in the subject table the codes and names of the dimensions for which dimension tables need to be generated; the subject table being a data table, built according to the description requirements of business objects, for storing all kinds of business data;
a dimension table generation module, configured to generate corresponding dimension tables according to the codes and names of the dimensions recorded in the subject table, store the corresponding dimension table data in the dimension tables, and generate a dimension ID for each dimension;
a fact table generation module, configured to generate from the subject table, according to the dimension IDs, a fact table associated with the dimension tables, and store the corresponding fact table data in the fact table;
an application summary table generation module, configured to generate an application summary table from the fact table as required to obtain application summary data, and store the application summary data in the application summary table; the application summary table being used to store the data obtained by converting the data of the fact table according to a preset derived calculation relation.
10. The data processing system as claimed in claim 9, characterized in that the application summary table generation module comprises:
a derived dimension table generation unit, configured to preset a derived calculation relation and generate a derived dimension table according to the derived calculation relation; the derived calculation relation being the calculation relation between the derived dimension table and the dimension table;
an application summary table generation unit, configured to associate the derived calculation formula with the dimension IDs in the fact table, convert the data in the fact table according to the derived calculation formula, and generate the application summary table.
CN201410058539.2A 2014-02-20 2014-02-20 Data processing method and data processing system Expired - Fee Related CN103853820B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410058539.2A CN103853820B (en) 2014-02-20 2014-02-20 Data processing method and data processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410058539.2A CN103853820B (en) 2014-02-20 2014-02-20 Data processing method and data processing system

Publications (2)

Publication Number Publication Date
CN103853820A true CN103853820A (en) 2014-06-11
CN103853820B CN103853820B (en) 2017-05-03

Family

ID=50861475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410058539.2A Expired - Fee Related CN103853820B (en) 2014-02-20 2014-02-20 Data processing method and data processing system

Country Status (1)

Country Link
CN (1) CN103853820B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020038306A1 (en) * 2000-06-16 2002-03-28 Griffin David Antony John Method of managing slowly changing dimensions
CN101446964A (en) * 2008-12-31 2009-06-03 中国建设银行股份有限公司 Method of data mining and computer device
CN101866360A (en) * 2010-06-28 2010-10-20 北京用友政务软件有限公司 Data warehouse authentication method and system based on object multidimensional property space
CN101957852A (en) * 2010-09-26 2011-01-26 用友软件股份有限公司 Method and system for producing correlation information of table data
CN103020301A (en) * 2012-12-31 2013-04-03 中国科学院自动化研究所 Multidimensional data query and storage method and system

Also Published As

Publication number Publication date
CN103853820B (en) 2017-05-03

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100094 2F, building 11, UFIDA Software Park, 68 Beiqing Road, Haidian District, Beijing

Patentee after: Beijing UYU Government Software Co.,Ltd.

Address before: 100094 2F, building 11, UFIDA Software Park, 68 Beiqing Road, Haidian District, Beijing

Patentee before: YONYOU GOVERNMENT AFFAIRS SOFTWARE Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170503

Termination date: 20210220