CN103853820B - Data processing method and data processing system - Google Patents
- Publication number
- CN103853820B CN201410058539.2A
- Authority
- CN
- China
- Prior art keywords
- dimension
- data
- subject heading
- heading list
- true
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data processing method and a data processing system. The data processing method comprises the following steps: first, storing raw data in a theme table, and recording in the theme table the codes and names of the dimensions for which dimension tables need to be generated; then, generating the corresponding dimension tables according to the dimension codes and names recorded in the theme table, storing the corresponding dimension table data in the dimension tables, and generating a dimension ID for each dimension; generating, from the theme table and according to the dimension IDs, a fact table associated with the dimension tables, and storing the corresponding fact table data in the fact table; finally, generating an application summary table from the fact table as needed to obtain application summary data, and storing that data in the application summary table. By adding the theme table and the application summary table to the data processing cycle, the method and system allow data to be reused on the basis of the theme table; through calculations supported by derived dimensions, the data analysis caliber (the statistical scope used for analysis) can be converted, which effectively improves both the efficiency of data processing and the practicality of the processed data.
Description
Technical field
The present invention relates to the technical field of data processing for data warehouses, and in particular to a data processing method and system.
Background art
With the rise of the big data concept, government agencies at all levels are actively using data warehouse technology to exploit the structured and unstructured data of every type produced in the course of government management. Traditional data warehouse technology presupposes that it is already clear how the data will be used. In practice, however, government departments need to collect the data first, before they have fully thought through how to use it.
The basic principle of traditional data warehouse technology is to extract the raw data from the data sources into a staging area (temporary area) and, after unified cleaning, conversion, and processing, update it into dimension tables and fact tables; data visualization tools then use the data through the multidimensional data cube formed by the fact tables and dimension tables. However, building a government data warehouse with traditional data warehouse technology faces a series of challenges. Historical data is relatively coarse in caliber (statistical scope) and poorly standardized, the data management calibers of different years and different administrative divisions are inconsistent, and the business side lacks unified conversion rules, so business staff generally have to confirm the conversion of operational data item by item, with a prohibitive workload and difficulty. Even when the data has been converted according to a unified data standard, the government management caliber differs greatly from the analysis caliber, so operational data cannot be used directly for analysis. Technical staff generally have to develop complex conversion code and perform a large amount of ad hoc work, which often leads to slow presentation and even inconsistent data; customer needs cannot be met quickly, and the practicality of the data drops sharply.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a data processing method and system that improve the efficiency of data processing and the practicality of the processed data.
To achieve the above object, the technical solution adopted by the present invention is a data processing method comprising the following steps:
(1) Store the raw data in a theme table, and record in the theme table the codes and names of the dimensions for which dimension tables need to be generated. The theme table is a data table, built according to the description requirements of the business objects, for storing all kinds of raw business data.
(2) Generate the corresponding dimension tables according to the dimension codes and names recorded in the theme table, store the corresponding dimension table data in the dimension tables, and generate a dimension ID for each dimension.
(3) According to the dimension IDs, generate from the theme table a fact table associated with the dimension tables, and store the corresponding fact table data in the fact table.
(4) Generate an application summary table from the fact table as needed to obtain application summary data, and store that data in the application summary table. The application summary table stores the data obtained by converting the data in the fact table according to preset derived calculation relationships.
Further, in the data processing method described above, in step (1), before the raw data is stored in the theme table, it is first stored in a staging area (temporary area); the raw data is then read from the staging area and stored in the theme table, after which the corresponding raw data in the staging area is deleted.
Further, in the data processing method described above, in step (1), the raw data is preprocessed before it is stored in the theme table. The preprocessing includes filling in global attributes missing from the raw data, uniformly converting the administrative division and date attributes, and deleting data that the theme table does not need. The unneeded data includes voided and in-transit business data.
Further, in the data processing method described above, in step (3), the fact table associated with the dimension tables is generated from the theme table according to the dimension IDs as follows:
According to the dimension name of the required data, obtain the dimension code corresponding to that dimension name from the theme table; associate the dimension code recorded in the theme table with the dimension code in the corresponding dimension table to obtain the dimension ID; store the dimension ID in the fact table; and store the non-dimension-table data under that dimension in the theme table directly in the fact table.
Further, in the data processing method described above, in step (3), the fact table is generated by incremental extraction: only the theme table data within a set time period is extracted and updated into the fact table.
Further, in the data processing method described above, in step (4), the application summary table is generated from the fact table as needed as follows:
Preset the derived calculation relationships as needed, and generate a derived dimension table according to them. A derived calculation relationship is the calculation relationship between a derived dimension table and a dimension table.
Associate the derived calculation formula with the dimension IDs in the fact table, and convert the data in the fact table according to the derived calculation formula to generate the application summary table.
Further, in the data processing method described above, in step (4), the derived calculation relationships include addition, subtraction, and multiplication.
Further, in the data processing method described above, in step (4), when the data in the fact table is converted according to the derived calculation formula, the derived calculation relationship is converted into a Cartesian product operation.
A data processing system, comprising:
A theme table building module, for building the theme table, storing the raw data in the theme table, and recording in the theme table the codes and names of the dimensions for which dimension tables need to be generated. The theme table is a data table, built according to the description requirements of the business objects, for storing all kinds of business data.
A dimension table generation module, for generating the corresponding dimension tables according to the dimension codes and names recorded in the theme table, storing the corresponding dimension table data in the dimension tables, and generating a dimension ID for each dimension.
A fact table generation module, for generating from the theme table, according to the dimension IDs, a fact table associated with the dimension tables, and storing the corresponding fact table data in the fact table.
An application summary table generation module, for generating an application summary table from the fact table as needed to obtain application summary data, and storing that data in the application summary table. The application summary table stores the data obtained by converting the data in the fact table according to preset derived calculation relationships.
Further, in the data processing system described above, the application summary table generation module comprises:
A derived dimension table generation unit, for presetting the derived calculation relationships and generating a derived dimension table according to them. A derived calculation relationship is the calculation relationship between a derived dimension table and a dimension table.
An application summary table generation unit, for associating the derived calculation formula with the dimension IDs in the fact table, and converting the data in the fact table according to the derived calculation formula to generate the application summary table.
The beneficial effects of the present invention are as follows: by adding the "theme table", the "derived dimension", and the "application summary table" to the data processing flow, the method and system of the present invention allow data to be reused on the basis of the theme table; in addition, calculations supported by derived dimensions convert the data analysis caliber, which effectively improves the efficiency of data processing and the practicality of the processed data.
Description of the drawings
Fig. 1 is an architecture diagram of a data processing system in the specific embodiment;
Fig. 2 is a structural block diagram of a data processing system in the specific embodiment;
Fig. 3 is a flow chart of a data processing method in the specific embodiment;
Fig. 4 is a schematic diagram of the structure of the theme table in the specific embodiment;
Fig. 5 is a schematic diagram of a dimension table in the specific embodiment;
Fig. 6 is a schematic diagram of a dimension table in the embodiment;
Fig. 7 is a schematic diagram of the fact table in the embodiment;
Fig. 8 is a schematic diagram of the derived dimension table in the embodiment;
Fig. 9 is a schematic diagram of converting a derived calculation relationship into a Cartesian product operation in the embodiment;
Fig. 10 is a schematic diagram of the application summary table in the embodiment.
Specific embodiment
The present invention is described in further detail below with reference to the drawings and the specific embodiment.
For a better understanding of the present invention, the technical terms used in this specific embodiment are explained first:
Staging area (temporary area): temporarily stores the raw data obtained from the data sources. The data it holds is usually not complete; it covers, for example, one year, one month, or one day. After the data has been extracted into the theme table, the data in the staging area is discarded.
Theme table: permanently stores the government's business data of all kinds across years and across administrative divisions, and is the core of the data warehouse. The structure of the theme table is built according to the description requirements of the business objects, and its data keeps the fine granularity of the original business, so that it describes the original business as fully as possible.
Dimension table: a component of the traditional data warehouse, used to store dimension table data.
Fact table: a component of the traditional data warehouse, used to store the processed business data. Together with the dimension tables it forms the data warehouse "cube", which supports flexible multidimensional data analysis.
Derived dimension table: stores the data of the derived dimension definitions, including the list of derived dimension values and the calculation formula of each derived dimension value.
Application summary table: stores the statistical analysis data calculated according to the derived dimension definitions. Its data is stored at a coarser granularity, chosen according to the actual statistical analysis needs, and can be used directly by visualization tools.
Fig. 1 and Fig. 2 show, respectively, an architecture diagram and a structural block diagram of a data processing system in this specific embodiment. The system includes a theme table building module 11, a dimension table generation module 12, a fact table generation module 13, and an application summary table generation module 14, wherein:
The theme table building module 11 builds the theme table, stores the raw data in the theme table, and records in the theme table the codes and names of the dimensions for which dimension tables need to be generated. The theme table is a data table, built according to the description requirements of the business objects, for storing all kinds of business data.
The dimension table generation module 12 generates the corresponding dimension tables according to the dimension codes and names recorded in the theme table, stores the corresponding dimension table data in the dimension tables, and generates a dimension ID for each dimension.
The fact table generation module 13 generates from the theme table, according to the dimension IDs, a fact table associated with the dimension tables, and stores the corresponding fact table data in the fact table.
The application summary table generation module 14 generates an application summary table from the fact table as needed to obtain application summary data, and stores that data in the application summary table. The application summary table stores the data obtained by converting the data in the fact table according to preset derived calculation relationships. The module includes a derived dimension table generation unit 141 and an application summary table generation unit 142. The derived dimension table generation unit 141 presets the derived calculation relationships and generates a derived dimension table according to them; a derived calculation relationship is the calculation relationship between a derived dimension table and a dimension table. The application summary table generation unit 142 associates the derived calculation formula with the dimension IDs in the fact table, and converts the data in the fact table according to the derived calculation formula to generate the application summary table.
Fig. 3 shows a flow chart of a data processing method in this specific embodiment, based on the system shown in Fig. 2. The method comprises the following steps:
Step S21: Build the theme table and store the raw data in it;
First a theme table is built, the raw data from the data source is stored in it, and the codes and names of the dimensions for which dimension tables need to be generated are recorded in it. The theme table is a data table, built according to the description requirements of the business objects, for storing all kinds of raw business data. In this embodiment the theme table permanently stores the government's business data of all kinds across years and across administrative divisions. As the core of the data warehouse, it keeps the fine granularity of the original business data and can describe the original business as fully as possible.
In this embodiment, before the raw data is stored in the theme table, a staging area (temporary area) can also be set up to temporarily hold the raw data obtained from the data source. The raw data is first stored in the staging area and then read from the staging area into the theme table, as shown in Fig. 1. The staging area usually does not hold the complete raw data, only the data within a set time range, such as one year, one month, or one day. After the raw data has been stored in the theme table, the corresponding raw data in the staging area can be deleted.
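The flow from the temporary (staging) area into the theme table can be sketched as follows. This is only a minimal illustration using Python's built-in `sqlite3` module with an in-memory database; the table names `staging_payment` and `theme_payment`, their columns, and the function `load_staging_into_theme` are hypothetical and do not come from the patent.

```python
import sqlite3


def load_staging_into_theme(conn: sqlite3.Connection) -> int:
    """Copy every staged row into the theme table, then clear the staging area.

    Table and column names are illustrative assumptions.
    """
    with conn:  # one transaction: the copy and the delete succeed or fail together
        cur = conn.execute(
            "INSERT INTO theme_payment (source_id, year, amount) "
            "SELECT source_id, year, amount FROM staging_payment"
        )
        copied = cur.rowcount
        conn.execute("DELETE FROM staging_payment")
    return copied


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staging_payment (source_id TEXT, year INTEGER, amount REAL);
    CREATE TABLE theme_payment   (source_id TEXT, year INTEGER, amount REAL);
    INSERT INTO staging_payment VALUES ('p1', 2013, 100.0), ('p2', 2013, 250.0);
    """
)
copied = load_staging_into_theme(conn)
```

Wrapping both statements in one transaction is a design choice of this sketch: a batch is never removed from the staging area unless it has reached the theme table.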
The raw data of the data source can be a data file, such as an Excel file, or a database snapshot. Before the raw data is stored in the theme table, it also needs some preprocessing, which mainly includes filling in global attributes missing from the raw data, uniformly converting the administrative division and date attributes, and deleting data that the theme table does not need.
The global attributes include, but are not limited to, the date of the raw data (year, month, and so on) and the administrative division. When the division and date attributes are uniformly converted, the codes of these two kinds of attribute must be consistent with the codes in the division dimension table and the date dimension table among the dimension tables built in step S22. The unneeded data includes voided and in-transit business data.
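The three preprocessing tasks named above (filling in missing global attributes, unifying division codes, and dropping voided or in-transit records) can be sketched in plain Python. The record layout, the status values `"void"` and `"in_transit"`, and the mapping `division_codes` are assumptions made for illustration; the patent does not specify them.

```python
from typing import Dict, List, Optional

# Hypothetical status values marking records the theme table should not keep.
UNWANTED_STATUSES = {"void", "in_transit"}


def preprocess(
    records: List[Dict[str, Optional[object]]],
    batch_year: int,
    division_codes: Dict[str, str],
) -> List[Dict[str, Optional[object]]]:
    """Return cleaned copies of the records that belong in the theme table."""
    cleaned = []
    for rec in records:
        # Drop voided and in-transit business data.
        if rec.get("status") in UNWANTED_STATUSES:
            continue
        row = dict(rec)
        # Fill in a missing global attribute (here: the year of the batch).
        if row.get("year") is None:
            row["year"] = batch_year
        # Map a source-specific division code to the unified code, if one is known.
        code = row.get("division_code")
        row["division_code"] = division_codes.get(code, code)
        cleaned.append(row)
    return cleaned


records = [
    {"source_id": "p1", "division_code": "3201", "year": 2013, "status": "paid"},
    {"source_id": "p2", "division_code": "NJ", "year": None, "status": "paid"},
    {"source_id": "p3", "division_code": "3202", "year": 2013, "status": "void"},
]
cleaned = preprocess(records, batch_year=2013, division_codes={"NJ": "3201"})
```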
Fig. 4 shows a schematic diagram of the structure of the "payment voucher" theme table in this embodiment. The "field name" column of the theme table lists the global attributes of the raw data, and the field description explains each attribute to help in understanding the meaning of the field name; for example, the field named "YEAR" means "year". Besides retaining original attribute information such as the payment ID, payment voucher number, and summary, the table also contains code and name fields designed for the dimensions of step S22. As shown in Fig. 4, "business office", "fund nature", "budget unit", and so on are attributes for which dimension tables need to be generated (the dimensions of step S22), so in this step code and name fields are designed for these attributes. In this embodiment the prefix of such a field name follows the theme table design specification, and the suffixes "CODE" and "NAME" mark the dimension code field and the dimension name field respectively. In addition, "source system ID" in Fig. 4 refers to the unique primary key ID of the original business data table: the theme table stores the unique primary key ID of the corresponding record in the original business table, and through this ID the theme table can be associated with, and checked against, the originally collected data.
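A theme table laid out along these lines might be declared as in the sketch below. The column names (`SOURCE_ID`, `VOUCHER_NO`, the `OFFICE`, `FUND_NATURE`, and `BUDGET_UNIT` prefixes) and the helper functions are hypothetical; only the `CODE`/`NAME` suffix convention and the idea of keeping the source system's primary key are taken from the description. Using the source ID as the theme table's own primary key is an assumption of this sketch.

```python
import sqlite3
from typing import List

# Hypothetical prefixes for the attributes that need dimension tables.
DIMENSION_PREFIXES = ["OFFICE", "FUND_NATURE", "BUDGET_UNIT"]


def dimension_columns(prefix: str) -> List[str]:
    """Each dimension attribute gets a code field and a name field."""
    return [f"{prefix}_CODE", f"{prefix}_NAME"]


def theme_table_ddl(table: str, prefixes: List[str]) -> str:
    """Build a CREATE TABLE statement for an illustrative theme table."""
    columns = [
        "SOURCE_ID TEXT PRIMARY KEY",  # primary key of the original business record
        "YEAR INTEGER",
        "VOUCHER_NO TEXT",
        "SUMMARY TEXT",
        "AMOUNT REAL",
    ]
    for prefix in prefixes:
        columns += [f"{name} TEXT" for name in dimension_columns(prefix)]
    return f"CREATE TABLE {table} ({', '.join(columns)})"


conn = sqlite3.connect(":memory:")
conn.execute(theme_table_ddl("theme_payment", DIMENSION_PREFIXES))
column_names = [row[1] for row in conn.execute("PRAGMA table_info(theme_payment)")]
```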
With a well-designed theme table, each record can describe one business transaction independently and completely. The data in the table reflects a specific payment transaction: when the payment was made, which unit was paid, which office it belongs to, the payment amount, and so on.
Which dimension codes and names need to be designed in the theme table is determined by the user as needed. However, the codes and names of the attributes in the theme table for which dimension tables need to be generated must be consistent with the dimension codes and names in the dimension tables. Take the fund nature in Fig. 4: when the dimension table is generated, the code and name of the fund nature in the theme table must be consistent with the code and name of the fund nature in the dimension table generated in step S22.
Step S22: Generate dimension tables from the theme table as needed;
Step S23: Generate the fact table from the theme table as needed;
In step S21, the codes and names of the dimensions for which dimension tables need to be generated were recorded in the theme table. The corresponding dimension tables are generated from these recorded codes and names, the corresponding dimension table data (that is, the dimension names and dimension codes) is stored in the dimension tables, and a dimension ID is generated for each dimension. For example, from the codes and names of the "administrative division" dimension recorded in a theme table, a division dimension table is generated; it records the codes and names of all division dimensions, and a dimension ID is generated for each of them. As shown in Fig. 5, the dimension "office leader" has the dimension name "office leader", the dimension code "01", and the dimension ID "118301". As follows from the description of step S21, the dimension code "01" and dimension name "office leader" in the dimension table of Fig. 5 are also recorded in the theme table.
In addition, to make dimension table generation more efficient and avoid generating entries repeatedly, the dimension codes recorded in the theme table can first be compared with the dimension codes in the corresponding dimension table. If no matching dimension code and name is found in the dimension table, the dimension code and name recorded in the theme table are stored in the dimension table and a dimension ID is generated for that code. If a match is found, the dimension table already holds that dimension code and name, and nothing needs to be regenerated. The rules followed when a dimension table is updated include: the value set of a dimension table may only be added to or modified, never deleted (the same dimension may have different value sets in different years; if values were deleted, the business data of historical years could no longer find its dimension codes in the dimension table's value set and could not be analyzed again); the value set of the current division and the current year serves as the basis; when a code and name conflict in the current year, the latest name is used; and codes from historical years that are not used in the current year are added to the dimension table when their hierarchical relationships do not conflict.
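The add-only update of a dimension table can be sketched with `sqlite3` as below. The schema, the use of `AUTOINCREMENT` to generate dimension IDs, and the function `sync_dimension` are assumptions for illustration. The sketch covers only the "insert what is missing, never delete" rule; it does not implement the name-conflict or hierarchy rules, and when one code appears with several names it simply keeps the alphabetically last one.

```python
import sqlite3


def sync_dimension(conn: sqlite3.Connection, theme_table: str,
                   dim_table: str, prefix: str) -> int:
    """Insert dimension codes that the theme table has but the dimension table lacks.

    Existing rows are never deleted or renumbered. Identifiers are interpolated
    directly, so they must come from trusted configuration, not user input.
    """
    code_col, name_col = f"{prefix}_CODE", f"{prefix}_NAME"
    with conn:
        cur = conn.execute(
            f"INSERT INTO {dim_table} (code, name) "
            f"SELECT t.{code_col}, MAX(t.{name_col}) FROM {theme_table} t "
            f"WHERE NOT EXISTS (SELECT 1 FROM {dim_table} d WHERE d.code = t.{code_col}) "
            f"GROUP BY t.{code_col}"
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE theme_payment (source_id TEXT, OFFICE_CODE TEXT, OFFICE_NAME TEXT);
    CREATE TABLE dim_office (
        dim_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated dimension ID
        code   TEXT UNIQUE,
        name   TEXT
    );
    INSERT INTO theme_payment VALUES
        ('p1', '01', 'office leader'),
        ('p2', '02', 'finance office'),
        ('p3', '01', 'office leader');
    """
)
first = sync_dimension(conn, "theme_payment", "dim_office", "OFFICE")
second = sync_dimension(conn, "theme_payment", "dim_office", "OFFICE")
```

Running the function a second time inserts nothing, which is the "no repeated generation" behavior the paragraph describes.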
After the dimension tables have been generated, the fact table associated with them is generated from the theme table according to the dimension IDs in the dimension tables, and the corresponding fact table data is stored in the fact table. The fact table is generated as follows:
According to the dimension name of the required data, obtain the dimension code corresponding to that dimension name from the theme table; associate the dimension code recorded in the theme table with the dimension code in the corresponding dimension table to obtain the dimension ID; store the obtained dimension ID in the fact table; and store the non-dimension-table data under that dimension in the theme table directly in the fact table.
That is, when the fact table is generated, the dimension code corresponding to the dimension name of the required data is first obtained from the theme table; the corresponding dimension ID is then obtained by joining the theme table to the dimension table on the dimension code; and the dimension ID, together with the required non-dimension-table data under that dimension in the theme table, is stored in the fact table. Take the data for "business office" in Fig. 4: a business office code in the "payment voucher" theme table is first joined to the dimension code of the business office dimension table in Fig. 5 to obtain the dimension ID of that business office; the obtained dimension ID is then stored in the fact table, and the non-dimension-table data for that business office in the theme table (the data other than the dimension code and name) is stored directly in the fact table.
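The join from dimension code to dimension ID can be sketched with `sqlite3` as follows. The table and column names and the function `build_fact` are hypothetical. The row with code "01" and ID 118301 mirrors the example values given for Fig. 5; the second dimension row and all payment amounts are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_office (dim_id INTEGER PRIMARY KEY, code TEXT UNIQUE, name TEXT);
    CREATE TABLE theme_payment (source_id TEXT, office_code TEXT,
                                office_name TEXT, amount REAL);
    CREATE TABLE fact_payment (office_id INTEGER, amount REAL);
    INSERT INTO dim_office VALUES (118301, '01', 'office leader'),
                                  (118302, '02', 'finance office');
    INSERT INTO theme_payment VALUES
        ('p1', '01', 'office leader', 100.0),
        ('p2', '02', 'finance office', 40.0),
        ('p3', '01', 'office leader', 60.0);
    """
)


def build_fact(conn: sqlite3.Connection) -> None:
    """Rebuild the fact table: dimension IDs plus the measure, no codes or names."""
    with conn:
        conn.execute("DELETE FROM fact_payment")
        conn.execute(
            "INSERT INTO fact_payment (office_id, amount) "
            "SELECT d.dim_id, t.amount "
            "FROM theme_payment t "
            "JOIN dim_office d ON d.code = t.office_code"
        )


build_fact(conn)
fact_rows = sorted(conn.execute("SELECT office_id, amount FROM fact_payment"))
```

Because an inner join is used, a theme table row whose code is missing from the dimension table would be silently skipped; a real pipeline would probably synchronize the dimension table first or report such rows.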
The fact table contains only dimension IDs and the specific measure fields (the non-dimension-table data). The fact table is generated from the theme table by incremental extraction: depending on the type of fact table, only the theme table data within a set time period (such as the summary data of the most recent year, month, or day) is extracted and updated into the fact table.
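One way to realize such an incremental update is to delete the fact rows of the chosen time window and reload just that window from the theme table. The patent does not say how the update is performed, so the delete-and-reload strategy, the `pay_date` column, and the function `refresh_fact_window` below are assumptions of this sketch.

```python
import sqlite3


def refresh_fact_window(conn: sqlite3.Connection, start: str, end: str) -> int:
    """Reload only the fact rows whose pay_date lies in [start, end).

    Dates are ISO strings (YYYY-MM-DD), which compare correctly as text.
    """
    with conn:
        conn.execute(
            "DELETE FROM fact_payment WHERE pay_date >= ? AND pay_date < ?",
            (start, end),
        )
        cur = conn.execute(
            "INSERT INTO fact_payment (office_id, pay_date, amount) "
            "SELECT d.dim_id, t.pay_date, t.amount "
            "FROM theme_payment t JOIN dim_office d ON d.code = t.office_code "
            "WHERE t.pay_date >= ? AND t.pay_date < ?",
            (start, end),
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_office (dim_id INTEGER PRIMARY KEY, code TEXT UNIQUE);
    CREATE TABLE theme_payment (office_code TEXT, pay_date TEXT, amount REAL);
    CREATE TABLE fact_payment (office_id INTEGER, pay_date TEXT, amount REAL);
    INSERT INTO dim_office VALUES (118301, '01');
    INSERT INTO theme_payment VALUES
        ('01', '2012-12-30', 10.0),
        ('01', '2013-01-05', 20.0),
        ('01', '2013-01-20', 30.0);
    INSERT INTO fact_payment VALUES (118301, '2012-12-30', 10.0);
    """
)
loaded = refresh_fact_window(conn, "2013-01-01", "2013-02-01")
```

Rows outside the window are left untouched, and repeating the call for the same window does not create duplicates.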
In this embodiment, the data stored in each table is named after its table: the data stored in a dimension table is called dimension table data, the data stored in the fact table is called fact table data, and the data in the application summary table is called application summary data.
Step S24: Generate the application summary table from the fact table as needed.
An application summary table is generated from the fact table as needed to obtain application summary data, which is stored in the application summary table. The application summary table stores the data obtained by converting the data in the fact table according to preset derived calculation relationships. In this embodiment the application summary table is generated from the fact table as follows:
Preset the derived calculation relationships as needed, and generate a derived dimension table according to them. A derived calculation relationship is the calculation relationship between a derived dimension table and a dimension table.
Associate the derived calculation formula with the dimension IDs in the fact table, and convert the data in the fact table according to the derived calculation formula to generate the application summary table.
The derived calculation relationships include addition, subtraction, and multiplication. When the data in the fact table is converted according to the calculation relationship between the derived dimension table and the original dimension table (the dimension table generated in step S22), the calculation relationship is converted into a Cartesian product operation to improve computational efficiency.
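One plausible reading of this conversion is that each derived-dimension formula is flattened into rows of (original dimension, derived dimension, coefficient), so that the formula can later be applied with a single join and a sum. The sketch below follows that reading; it is an interpretation, not a method confirmed by the text. The formula representation, the dimension names, and the function `expand_formulas` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Formula = List[Tuple[float, str]]  # (coefficient, original dimension) terms


def expand_formulas(formulas: Dict[str, Formula]) -> List[Tuple[str, str, float]]:
    """Flatten formulas into (original, derived, coefficient) conversion rows.

    Addition is coefficient +1, subtraction is -1, and multiplication by a
    constant is any other coefficient. Repeated terms for the same pair are
    merged by adding their coefficients; pairs that cancel out are dropped.
    """
    merged: Dict[Tuple[str, str], float] = defaultdict(float)
    for derived, terms in formulas.items():
        for coefficient, original in terms:
            merged[(original, derived)] += coefficient
    return sorted(
        (original, derived, coefficient)
        for (original, derived), coefficient in merged.items()
        if coefficient != 0
    )


formulas = {
    "total": [(1, "A"), (1, "B"), (1, "C")],   # A + B + C
    "a_minus_b": [(1, "A"), (-1, "B")],        # A - B
    "half_c": [(0.5, "C")],                    # 0.5 * C
}
conversion_rows = expand_formulas(formulas)
```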
In this embodiment the generation of the application summary table depends on the preset derived calculation relationships, from which the corresponding derived dimension table is generated. When a derived calculation relationship is set up, the operations "add", "subtract", and "multiply" are supported. For reasons of computational efficiency the operations need to be converted into a Cartesian product operation; for an example of converting part of the arithmetic logic of the derived dimension "statistical division" into a Cartesian product, see Fig. 7. The pseudocode for generating the application summary table from the fact table in this embodiment is:
INSERT INTO [application summary table] ("derived dimension", "measure")
SELECT DIMT."derived dimension", SUM(FACT."measure" * DIMT."coefficient")
FROM [fact table] FACT
INNER JOIN [derived dimension conversion table] DIMT ON DIMT."original dimension" = FACT."original dimension"
GROUP BY DIMT."derived dimension"
In the pseudocode above, [] marks a table name placeholder, which is replaced by the actual table name during development, and the quoted names, such as those inside (), are table field placeholders, which are replaced by the actual fields of the actual tables. Specifically, INSERT INTO [application summary table] inserts data into the application summary table; [application summary table] stands for the table name of the application summary table and is replaced by the actual table name in the development code. ("derived dimension", "measure") stands for the fields of the application summary table (mainly the derived dimension ID field and the measure field) and is replaced by the field names of the actual application summary table in the development code. SELECT states which fields are queried, namely the dimension and measure to be inserted into the application summary table; FROM states that the data is queried from the fact table and the derived dimension conversion table; INNER JOIN associates the original dimension ID that corresponds to the derived dimension with the original dimension ID of the fact table. GROUP BY states the grouping level, here grouping by derived dimension. Writing such pseudocode is prior art for those skilled in the art, and it can be adjusted as needed.
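For readers who want to try the query, the following is a runnable rendering of the pseudocode using Python's built-in `sqlite3` module. The concrete table names (`fact`, `derived_dim_map`, `app_summary`), the column names, and all sample values are hypothetical stand-ins for the placeholders in the pseudocode.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fact (original_dim INTEGER, measure REAL);
    CREATE TABLE derived_dim_map (original_dim INTEGER, derived_dim TEXT,
                                  coefficient REAL);
    CREATE TABLE app_summary (derived_dim TEXT, measure REAL);

    INSERT INTO fact VALUES (1, 100.0), (2, 40.0), (3, 10.0), (1, 50.0);

    -- 'total' = dim 1 + dim 2 + dim 3; 'one_minus_two' = dim 1 - dim 2
    INSERT INTO derived_dim_map VALUES
        (1, 'total', 1), (2, 'total', 1), (3, 'total', 1),
        (1, 'one_minus_two', 1), (2, 'one_minus_two', -1);
    """
)

with conn:
    conn.execute(
        """
        INSERT INTO app_summary (derived_dim, measure)
        SELECT m.derived_dim, SUM(f.measure * m.coefficient)
        FROM fact f
        INNER JOIN derived_dim_map m ON m.original_dim = f.original_dim
        GROUP BY m.derived_dim
        """
    )

summary = dict(conn.execute("SELECT derived_dim, measure FROM app_summary"))
```

Each fact row contributes to every derived dimension that lists its original dimension, weighted by the coefficient, so addition, subtraction, and multiplication by a constant are all handled by the same join and sum.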
In the data processing of steps S21 to S24, steps S22, S23, and S24 need not be executed while it is still unclear how the data will be used; once the use of the data is clear, the dimension tables, the fact table, and the application summary table are generated as needed. When the data usage requirements change, steps S22, S23, and S24 need to be executed again. Data processing with the method and system of the present invention has the following effects:
1) Data quality is improved through a standardized data processing flow. Data pass in order through the theme table, the dimension table, the fact table and the application summary table. While the conversion rules are still unclear, the data are placed in the theme table and are not deliberately altered, which preserves the authenticity of the data.
The concepts of the theme table, the derived dimension and the application summary table are added on top of the dimension table and the fact table, and the purpose of each class of data table is made explicit: data are first put into the theme table, and the conversion work is deferred until the fact table and the application summary table are generated.
2) The processed data are highly practical. The data entering the application summary table are generated entirely according to business needs and meet the needs of statistical analysis. By introducing derived dimension definitions, "add", "subtract" and "multiply" operation relations can be set flexibly; converting these relations into Cartesian product operations greatly improves conversion efficiency, so that the operational data produced by business staff are quickly converted into the statistical analysis data that management needs, making the data more valuable.
3) Data can be reused. The data warehouse is built with management of the theme table at its core, not merely for the purpose of generating multidimensional data cubes. When analysis requirements change, the original data need not be reloaded; the dimension table, the fact table and the application summary table can all be regenerated from the theme table.
For a better understanding of the present invention, it is further described below with reference to a specific embodiment.
Embodiment
The original data in this embodiment are government data of Jiangsu Province. From the large body of government data, the following are extracted as needed: the total "index amount" and "expenditure amount" for 2013 of five prefecture-level cities (Nanjing, Wuxi, Xuzhou, Changzhou and Suzhou); the "index amount" and "expenditure amount" of the eastern region within these five cities; and the "index amount" and "expenditure amount" of a special caliber.
The "special caliber" above, like "city total" and "eastern region", denotes a statistical analysis caliber and is merely a caliber label. The "special caliber" is mainly used for data analysis calibers that cannot be assigned to a regular grouping. Calibers can be labeled in many ways, and different caliber labels can be set as needed. "Index amount" refers to the amount that a given administrative division (e.g., Nanjing) may spend in a given year (e.g., 2013), i.e., its spending quota. "Expenditure amount" refers to the amount actually spent by a given administrative division (e.g., Nanjing) in a given year (e.g., 2013); expenditure is made against the quota.
In the first step, a theme table is built and the government data of Jiangsu Province are stored in it. Because the data ultimately required relate to the five prefecture-level cities, an administrative division dimension table must be created, so the dimension code and name of each administrative division are recorded in the theme table. Note that the theme table data are the most complete: when the theme table data are generated, both the dimension table data and the non-dimension-table data (such as numeric values) are generated. The fact table data are then generated from the theme table data.
In the second step, the dimension table is generated from the dimension codes and names of the administrative divisions in the theme table. The dimension table stores the dimension codes and names of the five required administrative division dimensions (Nanjing, Wuxi, Xuzhou, Changzhou and Suzhou), and a dimension ID is generated for each administrative division dimension, as shown in Figure 6.
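The second step can be sketched roughly as below, again with Python's sqlite3 module. The division codes 3201 to 3205 come from the derived formulas of this embodiment; everything else (the table and column names, the amounts, the extra 3206 Nantong row standing in for data that is not needed, and the use of an auto-incrementing integer as the dimension ID) is an assumption made for illustration, since the contents of Figure 6 are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE theme (
    division_code TEXT, division_name TEXT, year INTEGER,
    index_amount INTEGER, expenditure_amount INTEGER
);
CREATE TABLE dim_division (
    dim_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated dimension ID (assumed scheme)
    division_code TEXT UNIQUE,
    division_name TEXT
);
-- Hypothetical theme-table rows; the amounts are invented for illustration.
INSERT INTO theme VALUES
    ('3201', 'Nanjing',   2013, 100,  90),
    ('3202', 'Wuxi',      2013, 200, 190),
    ('3203', 'Xuzhou',    2013, 150, 130),
    ('3204', 'Changzhou', 2013, 220, 190),
    ('3205', 'Suzhou',    2013, 120, 100),
    ('3206', 'Nantong',   2013,  80,  70);
""")

# Only the five divisions required by the analysis get a dimension row.
needed_codes = ("3201", "3202", "3203", "3204", "3205")
placeholders = ", ".join("?" for _ in needed_codes)
conn.execute(
    f"""
    INSERT INTO dim_division (division_code, division_name)
    SELECT DISTINCT division_code, division_name
    FROM theme
    WHERE division_code IN ({placeholders})
    ORDER BY division_code
    """,
    needed_codes,
)

dim_rows = conn.execute(
    "SELECT dim_id, division_code, division_name FROM dim_division ORDER BY dim_id"
).fetchall()
print(dim_rows)
```

Because the dimension table is derived purely from codes and names already recorded in the theme table, it can be dropped and regenerated whenever the set of required divisions changes.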
In the third step, the dimension codes in the theme table are associated with the dimension codes in the dimension table to obtain the dimension IDs of Nanjing, Wuxi, Xuzhou, Changzhou and Suzhou, and the dimension IDs obtained are stored in the fact table. At the same time, the "index amount" and "expenditure amount" data under the corresponding dimension IDs in the theme table are stored in the fact table, as shown in Figure 7.
In the fourth step, the derived calculation relations are preset as needed. Specifically, what must be computed are the total "index amount" and "expenditure amount" of the five prefecture-level cities, the "index amount" and "expenditure amount" of the eastern region within the five cities, and the "index amount" and "expenditure amount" of the special caliber. The derived calculation formulas are therefore set to "3201 Nanjing + 3202 Wuxi + 3203 Xuzhou + 3204 Changzhou + 3205 Suzhou", "3201 Nanjing + 3203 Xuzhou + 3204 Changzhou" and "3201 Nanjing + 3205 Suzhou", and the derived dimension table is generated from these formulas, as shown in Figure 8. The calculation relations are then converted into a Cartesian product relation table, as shown in Figure 9. Finally, the derived dimension table is associated with the fact table and the application summary table is generated by means of the pseudocode, yielding the required data shown in Figure 10: the total "index amount" and "expenditure amount" of the five cities are 790 and 700 respectively, those of the eastern region are 470 and 410, and those of the special caliber are 220 and 190.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.
Claims (9)
1. A data processing method, comprising the following steps:
(1) storing original data in a theme table, and recording in the theme table the codes and names of the dimensions for which dimension tables need to be generated; the theme table being a data table, built according to the description requirements of a business object, for storing all kinds of original business data;
(2) generating the corresponding dimension table according to the codes and names of the dimensions recorded in the theme table, storing the corresponding dimension table data in the dimension table, and generating a dimension ID for each dimension;
(3) generating, according to the dimension IDs, a fact table associated with the dimension table from the theme table, and storing the corresponding fact table data in the fact table; wherein generating the fact table associated with the dimension table from the theme table according to the dimension IDs specifically comprises: according to the dimension name of the required data, obtaining the dimension code corresponding to that dimension name in the theme table; associating the dimension code recorded in the theme table with the dimension code in the corresponding dimension table to obtain the dimension ID; storing the dimension ID in the fact table; and storing the non-dimension-table data under that dimension in the theme table directly in the fact table;
(4) generating an application summary table from the fact table as needed to obtain application summary data, and storing the application summary data in the application summary table; the application summary table being used to store the data obtained by converting the data in the fact table according to a preset derived calculation relation.
2. The data processing method as claimed in claim 1, characterized in that, in step (1), before the original data are stored in the theme table, the original data are first stored in a temporary area and then fetched from the temporary area and stored in the theme table; after the original data have been stored in the theme table, the corresponding original data in the temporary area are deleted.
3. The data processing method as claimed in claim 1 or 2, characterized in that, in step (1), the original data are preprocessed before being stored in the theme table; the preprocessing comprises completing global attributes missing from the original data, uniformly converting administrative division and date attributes, and deleting data not needed in the theme table; the unneeded data comprise voided and in-transit business data.
4. The data processing method as claimed in claim 1, characterized in that, in step (3), the fact table is generated by incremental extraction, in which only the theme table data within a set time period are extracted and updated into the fact table.
5. The data processing method as claimed in claim 1, characterized in that, in step (4), generating the application summary table from the fact table as needed specifically comprises:
presetting a derived calculation relation as needed, and generating a derived dimension table according to the derived calculation relation; the derived calculation relation being the calculation relation between the derived dimension table and the dimension table;
associating the derived calculation formula with the dimension IDs in the fact table, and converting the data in the fact table according to the derived calculation formula to generate the application summary table.
6. The data processing method as claimed in claim 5, characterized in that, in step (4), the derived calculation relation comprises the operation relations of addition, subtraction and multiplication.
7. The data processing method as claimed in claim 6, characterized in that, in step (4), when the data in the fact table are converted according to the derived calculation formula, the derived calculation relation is converted into a Cartesian product operation relation.
8. A data processing system, comprising:
a theme table building module, for building a theme table, storing original data in the theme table, and recording in the theme table the codes and names of the dimensions for which dimension tables need to be generated; the theme table being a data table, built according to the description requirements of a business object, for storing all kinds of business data;
a dimension table generation module, for generating the corresponding dimension table according to the codes and names of the dimensions recorded in the theme table, storing the corresponding dimension table data in the dimension table, and generating a dimension ID for each dimension;
a fact table generation module, for generating, according to the dimension IDs, a fact table associated with the dimension table from the theme table, and storing the corresponding fact table data in the fact table; wherein generating the fact table associated with the dimension table from the theme table according to the dimension IDs specifically comprises: according to the dimension name of the required data, obtaining the dimension code corresponding to that dimension name in the theme table; associating the dimension code recorded in the theme table with the dimension code in the corresponding dimension table to obtain the dimension ID; storing the dimension ID in the fact table; and storing the non-dimension-table data under that dimension in the theme table directly in the fact table;
an application summary table generation module, for generating an application summary table from the fact table as needed to obtain application summary data, and storing the application summary data in the application summary table; the application summary table being used to store the data obtained by converting the data in the fact table according to a preset derived calculation relation.
9. The data processing system as claimed in claim 8, characterized in that the application summary table generation module comprises:
a derived dimension table generation unit, for presetting a derived calculation relation and generating a derived dimension table according to the derived calculation relation; the derived calculation relation being the calculation relation between the derived dimension table and the dimension table;
an application summary table generation unit, for associating the derived calculation formula with the dimension IDs in the fact table, and converting the data in the fact table according to the derived calculation formula to generate the application summary table.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410058539.2A CN103853820B (en) | 2014-02-20 | 2014-02-20 | Data processing method and data processing system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103853820A (en) | 2014-06-11 |
CN103853820B (en) | 2017-05-03 |
Family
ID=50861475
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410058539.2A Expired - Fee Related CN103853820B (en) | 2014-02-20 | 2014-02-20 | Data processing method and data processing system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103853820B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101446964A (en) * | 2008-12-31 | 2009-06-03 | 中国建设银行股份有限公司 | Method of data mining and computer device |
CN101866360A (en) * | 2010-06-28 | 2010-10-20 | 北京用友政务软件有限公司 | Data warehouse authentication method and system based on object multidimensional property space |
CN101957852A (en) * | 2010-09-26 | 2011-01-26 | 用友软件股份有限公司 | Method and system for producing correlation information of table data |
CN103020301A (en) * | 2012-12-31 | 2013-04-03 | 中国科学院自动化研究所 | Multidimensional data query and storage method and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2311884A1 (en) * | 2000-06-16 | 2001-12-16 | Cognos Incorporated | Method of managing slowly changing dimensions |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | CP01 | Change in the name or title of a patent holder | Address after: 100094 2F, building 11, UFIDA Software Park, 68 Beiqing Road, Haidian District, Beijing; Patentee after: Beijing UYU Government Software Co.,Ltd.; Address before: 100094 2F, building 11, UFIDA Software Park, 68 Beiqing Road, Haidian District, Beijing; Patentee before: YONYOU GOVERNMENT AFFAIRS SOFTWARE Co.,Ltd. |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20170503; Termination date: 20210220 |