CN103853820B - Data processing method and data processing system - Google Patents

Data processing method and data processing system

Info

Publication number
CN103853820B
Authority
CN
China
Prior art keywords
dimension
data
subject heading
heading list
true
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410058539.2A
Other languages
Chinese (zh)
Other versions
CN103853820A (en)
Inventor
陈国强
朱培冬
郝栋
姬永杰
刘广财
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing UYU Government Software Co.,Ltd.
Original Assignee
BEIJING UFIDA SOFTWARE CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING UFIDA SOFTWARE CO LTD
Priority to CN201410058539.2A
Publication of CN103853820A
Application granted
Publication of CN103853820B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22: Indexing; Data structures therefor; Storage structures
    • G06F16/2282: Tablespace storage structures; Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data processing method and a data processing system. The method comprises the following steps: first, storing raw data in a theme table and recording in the theme table the codes and names of the dimensions for which dimension tables need to be generated; then generating the corresponding dimension tables according to the dimension codes and names recorded in the theme table, storing the corresponding dimension table data in the dimension tables, and generating a dimension ID for each dimension; generating from the theme table, according to the dimension IDs, a fact table associated with the dimension tables, and storing the corresponding fact table data in the fact table; finally, generating an application summary table from the fact table as needed to obtain application summary data, and storing that data in the application summary table. By adding the theme table and the application summary table to the data processing flow, the method and system allow data to be reused on the basis of the theme table. Calculations supported by derived dimensions convert the data analysis caliber, which effectively improves the efficiency of data processing and the practicality of the processed data.

Description

A data processing method and system
Technical field
The present invention relates to the technical field of data processing for data warehouses, and in particular to a data processing method and system.
Background
With the rise of the big data concept, government agencies at all levels are actively using data warehouse technology to exploit the structured and unstructured data of every type produced in the course of government administration. Traditional data warehouse technology presupposes that it is already clear how the data will be used. In practice, however, government departments need to collect the data first, before how to use it has been fully thought through.
The basic principle of traditional data warehouse technology is to extract the raw data from the data sources into a staging area and, after unified cleaning, conversion and processing, to load it into dimension tables and fact tables. Data visualization tools then use the data through the multidimensional cube formed by the fact tables and dimension tables. Building a government data warehouse with this traditional technology, however, faces a series of challenges. Historical data is relatively coarse in caliber (the scope and definition by which data is counted) and poorly standardized, and the data management calibers of different years and different administrative divisions are inconsistent. Because the business lacks unified conversion rules, business personnel usually have to confirm the conversion of operational data item by item, and the workload and difficulty involved are enormous. Even when the data has been converted according to a unified data standard, government management calibers differ greatly from analysis calibers, so operational data cannot be used directly for analysis. Technical staff usually have to develop complex conversion code and carry out a large number of ad hoc operations, which often leads to slow presentation and even inconsistent data. As a result, customer needs cannot be answered quickly and the practicality of the data is greatly reduced.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a data processing method and system that improve the efficiency of data processing and the practicality of the processed data.
To achieve the above object, the technical solution adopted by the present invention is a data processing method comprising the following steps:
(1) Store the raw data in a theme table, and record in the theme table the codes and names of the dimensions for which dimension tables need to be generated. The theme table is a data table, built according to the description requirements of the business objects, for storing all kinds of raw business data.
(2) Generate the corresponding dimension tables according to the dimension codes and names recorded in the theme table, store the corresponding dimension table data in the dimension tables, and generate a dimension ID for each dimension.
(3) According to the dimension IDs, generate from the theme table a fact table associated with the dimension tables, and store the corresponding fact table data in the fact table.
(4) Generate an application summary table from the fact table as needed to obtain application summary data, and store that data in the application summary table. The application summary table stores the data obtained by converting the data in the fact table according to a preset derived calculation relationship.
Further, in step (1) of the data processing method described above, before the raw data is stored in the theme table it is first stored in a staging area. The raw data is then read from the staging area and stored in the theme table, after which the corresponding raw data in the staging area is deleted.
Further, in step (1) of the data processing method described above, the raw data is pre-processed before it is stored in the theme table. The pre-processing includes filling in the global attributes that the raw data lacks, converting the administrative division and date attributes to a unified form, and deleting data that the theme table does not need. The unneeded data includes voided and in-transit business data.
Further, in step (3) of the data processing method described above, the fact table associated with the dimension tables is generated from the theme table according to the dimension IDs as follows:
According to the dimension name of the required data, obtain the corresponding dimension code from the theme table; associate the dimension code recorded in the theme table with the dimension code in the corresponding dimension table to obtain the dimension ID; store the dimension ID in the fact table; and store the non-dimension data under that dimension in the theme table directly in the fact table.
Further, in step (3) of the data processing method described above, the fact table is generated by incremental extraction: only the theme table data within a set time period is extracted and updated into the fact table.
Further, in step (4) of the data processing method described above, the application summary table is generated from the fact table as needed as follows:
Preset a derived calculation relationship as needed, and generate a derived dimension table according to it. The derived calculation relationship is the calculation relationship between the derived dimension table and the dimension tables.
Associate the derived calculation formula with the dimension IDs in the fact table, convert the data in the fact table according to the derived calculation formula, and generate the application summary table.
Further, in step (4) of the data processing method described above, the derived calculation relationship includes the operations of addition, subtraction and multiplication.
Further, in step (4) of the data processing method described above, when the data in the fact table is converted according to the derived calculation formula, the derived calculation relationship is converted into a Cartesian product operation.
A data processing system, including:
A theme table building module, for building a theme table, storing the raw data in the theme table, and recording in the theme table the codes and names of the dimensions for which dimension tables need to be generated. The theme table is a data table, built according to the description requirements of the business objects, for storing all kinds of business data.
A dimension table generation module, for generating the corresponding dimension tables according to the dimension codes and names recorded in the theme table, storing the corresponding dimension table data in the dimension tables, and generating a dimension ID for each dimension.
A fact table generation module, for generating from the theme table, according to the dimension IDs, a fact table associated with the dimension tables, and storing the corresponding fact table data in the fact table.
An application summary table generation module, for generating an application summary table from the fact table as needed to obtain application summary data, and storing that data in the application summary table. The application summary table stores the data obtained by converting the data in the fact table according to a preset derived calculation relationship.
Further, in the data processing system described above, the application summary table generation module includes:
A derived dimension table generation unit, for presetting a derived calculation relationship and generating a derived dimension table according to it. The derived calculation relationship is the calculation relationship between the derived dimension table and the dimension tables.
An application summary table generation unit, for associating the derived calculation formula with the dimension IDs in the fact table, converting the data in the fact table according to the derived calculation formula, and generating the application summary table.
The beneficial effects of the present invention are as follows. By adding the "theme table", the "derived dimension" and the "application summary table" to the data processing flow, the method and system of the present invention allow data to be reused on the basis of the theme table. In addition, the calculations supported by derived dimensions convert the data analysis caliber, which effectively improves the efficiency of data processing and the practicality of the processed data.
Brief description of the drawings
Fig. 1 is an architecture diagram of a data processing system in the specific embodiment;
Fig. 2 is a structural block diagram of a data processing system in the specific embodiment;
Fig. 3 is a flow chart of a data processing method in the specific embodiment;
Fig. 4 is a structural schematic diagram of the theme table in the specific embodiment;
Fig. 5 is a schematic diagram of a dimension table in the specific embodiment;
Fig. 6 is a schematic diagram of a dimension table in the embodiment;
Fig. 7 is a schematic diagram of the fact table in the embodiment;
Fig. 8 is a schematic diagram of the derived dimension table in the embodiment;
Fig. 9 is a schematic diagram of converting the derived calculation relationship into a Cartesian product operation in the embodiment;
Fig. 10 is a schematic diagram of the application summary table in the embodiment.
Detailed description
The present invention is described in further detail below with reference to the drawings and a specific embodiment.
For a better understanding of the present invention, the technical terms used in this embodiment are explained first:
Staging area: temporarily stores the raw data obtained from the data sources. The data stored is generally not complete, for example one year's, one month's or one day's data. After the data has been extracted to the theme table, the data in the staging area can be discarded.
Theme table: permanently stores all kinds of government business data across years and across administrative divisions, and is the core of the data warehouse. The structure of the theme table is built according to the description requirements of the business objects. The data in the theme table keeps the fine granularity of the original business and can describe the original business to the greatest extent.
Dimension table: a component of the traditional data warehouse, used to store dimension table data.
Fact table: a component of the traditional data warehouse, used to store the processed business data. Together with the dimension tables it forms the data warehouse "cube", which facilitates flexible multidimensional data analysis.
Derived dimension table: stores the data of the derived dimension definitions, including the list of derived dimension values and the calculation formula of each derived dimension value.
Application summary table: stores the statistical analysis data calculated according to the derived dimension definitions. The stored data is coarser in granularity, according to the actual needs of statistical analysis, and can be used directly by visualization tools.
Fig. 1 and Fig. 2 show, respectively, the architecture diagram and the structural block diagram of a data processing system in this embodiment. The system includes a theme table building module 11, a dimension table generation module 12, a fact table generation module 13 and an application summary table generation module 14, wherein:
The theme table building module 11 builds the theme table, stores the raw data in the theme table, and records in the theme table the codes and names of the dimensions for which dimension tables need to be generated. The theme table is a data table, built according to the description requirements of the business objects, for storing all kinds of business data.
The dimension table generation module 12 generates the corresponding dimension tables according to the dimension codes and names recorded in the theme table, stores the corresponding dimension table data in the dimension tables, and generates a dimension ID for each dimension.
The fact table generation module 13 generates from the theme table, according to the dimension IDs, a fact table associated with the dimension tables, and stores the corresponding fact table data in the fact table.
The application summary table generation module 14 generates an application summary table from the fact table as needed to obtain application summary data, and stores that data in the application summary table. The application summary table stores the data obtained by converting the data in the fact table according to a preset derived calculation relationship. The module includes a derived dimension table generation unit 141 and an application summary table generation unit 142. The derived dimension table generation unit 141 presets the derived calculation relationship and generates the derived dimension table according to it; the derived calculation relationship is the calculation relationship between the derived dimension table and the dimension tables. The application summary table generation unit 142 associates the derived calculation formula with the dimension IDs in the fact table, converts the data in the fact table according to the derived calculation formula, and generates the application summary table.
Fig. 3 shows a flow chart of the data processing method of this embodiment, based on the system shown in Fig. 2. The method comprises the following steps:
Step S21: Build the theme table and store the raw data in it;
A theme table is built first. The raw data of the data source is stored in the theme table, and the codes and names of the dimensions for which dimension tables need to be generated are recorded in the theme table. The theme table is a data table, built according to the description requirements of the business objects, for storing all kinds of raw business data. In this embodiment the theme table permanently stores all kinds of government business data across years and across administrative divisions. As the core of the data warehouse, the theme table keeps the fine granularity of the original business data and can describe the original business to the greatest extent.
In this embodiment, before the raw data is stored in the theme table, a staging area can also be set up to temporarily store the raw data obtained from the data source. The raw data is first stored in the staging area and then read from the staging area and stored in the theme table, as shown in Fig. 1. The staging area generally does not hold the complete raw data but only the data within a set time range, such as the data of one year, one month or one day. After the raw data has been stored in the theme table, the corresponding raw data in the staging area can be deleted.
The raw data of a data source can be a data file, such as an Excel file, or a database snapshot. Before the raw data is stored in the theme table, it also needs some pre-processing, which mainly includes filling in the global attributes that the raw data lacks, converting the administrative division and date attributes to a unified form, and deleting data that the theme table does not need.
The global attributes include, but are not limited to, the date (year, month, etc.) and the administrative division of the raw data. When the division and date attributes are converted to a unified form, the codes of these two kinds of attribute must be consistent with the codes in the division dimension table and the date dimension table among the dimension tables built in step S22. The unneeded data includes voided and in-transit business data.
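The pre-processing described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the patent's implementation: the record layout, the field names `REGION_CODE`, `STATUS`, `VOUCHER_NO` and `AMOUNT`, the status values, and the `REGION_CODE_MAP` mapping are all hypothetical; only the `YEAR` field name appears in the description.

```python
from typing import Optional

# Hypothetical mapping from source-system division codes to the unified codes
# that the division dimension table is assumed to use.
REGION_CODE_MAP = {"BJ": "110000", "110000": "110000"}

# Hypothetical status values marking voided and in-transit business data.
UNWANTED_STATUSES = {"VOIDED", "IN_TRANSIT"}


def preprocess(record: dict, batch_year: int, batch_region: str) -> Optional[dict]:
    """Return a cleaned copy of the record, or None if it should be dropped."""
    # Delete data the theme table does not need.
    if record.get("STATUS") in UNWANTED_STATUSES:
        return None
    cleaned = dict(record)
    # Fill in the global attributes the raw record lacks.
    cleaned.setdefault("YEAR", batch_year)
    cleaned.setdefault("REGION_CODE", batch_region)
    # Convert the date and division attributes to a unified form.
    cleaned["YEAR"] = int(cleaned["YEAR"])
    cleaned["REGION_CODE"] = REGION_CODE_MAP[cleaned["REGION_CODE"]]
    return cleaned


raw = [
    {"VOUCHER_NO": "P001", "AMOUNT": 100.0, "STATUS": "PAID", "YEAR": "2013"},
    {"VOUCHER_NO": "P002", "AMOUNT": 50.0, "STATUS": "VOIDED"},
    {"VOUCHER_NO": "P003", "AMOUNT": 70.0, "STATUS": "PAID", "REGION_CODE": "110000"},
]
cleaned_rows = [r for r in (preprocess(x, 2014, "BJ") for x in raw) if r is not None]
```

In this sketch an unknown division code raises a `KeyError`, which surfaces unmapped codes instead of loading them silently; a real pipeline might prefer to route such rows to a review queue.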
Fig. 4 shows a schematic diagram of the structure of a "payment voucher" theme table in this embodiment. The "field name" column of the theme table shows the global attributes of the raw data, and the "field description" column explains each attribute to help the reader understand the meaning of the field name; for example, the field named "YEAR" means "year". Besides retaining original attribute information such as the payment ID, the payment voucher number and the summary, the table also provides code and name fields for the dimensions of step S22. As shown in Fig. 4, "business department", "fund property" and "budget unit" are attributes for which dimension tables need to be generated (the dimensions of step S22), so in this step code and name fields are designed for these attributes. In this embodiment the prefix of each code and name field follows the design specification of the theme table, and the fields with the suffixes "CODE" and "NAME" hold the dimension code and the dimension name respectively. In addition, the "source system ID" in Fig. 4 is the unique primary key ID of the original business data table. The theme table stores the unique primary key ID of the corresponding row of the original business table, and through this ID the theme table can be associated with, and checked against, the data as originally collected.
With a properly designed theme table, each record independently and completely describes one business transaction. The data in the table reflects a specific payment transaction: when it was paid, which unit paid, in which department, the payment amount, and so on.
Which dimensions need codes and names in the theme table is determined by the user as needed. However, the codes and names of the attributes in the theme table for which dimension tables are to be generated must be consistent with the codes and names of the dimensions in the dimension tables. Take the fund property in Fig. 4 as an example: when the dimension table is generated, the code and name of the fund property in the theme table must be consistent with the code and name of the fund property in the dimension table generated in step S22.
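As an illustration of the code-and-name field convention described above, the following sketch creates a small theme table in SQLite and lists the dimensions it declares. The table name, the column prefixes and the set of columns are assumptions made for this example; the description itself fixes only the "CODE" and "NAME" suffixes and the "YEAR" field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE THEME_PAYMENT_VOUCHER (
        SRC_SYSTEM_ID   TEXT PRIMARY KEY,  -- unique key of the original business row
        YEAR            INTEGER NOT NULL,  -- global attribute: year
        REGION_CODE     TEXT NOT NULL,     -- dimension: administrative division
        REGION_NAME     TEXT NOT NULL,
        DEPT_CODE       TEXT NOT NULL,     -- dimension: business department
        DEPT_NAME       TEXT NOT NULL,
        FUND_TYPE_CODE  TEXT NOT NULL,     -- dimension: fund property
        FUND_TYPE_NAME  TEXT NOT NULL,
        VOUCHER_NO      TEXT,              -- retained original attribute
        SUMMARY         TEXT,              -- retained original attribute
        AMOUNT          REAL NOT NULL      -- measure
    )
    """
)


def dimension_prefixes(conn: sqlite3.Connection, table: str) -> list:
    """Return the column prefixes that have both a _CODE and a _NAME column."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return sorted(
        c[: -len("_CODE")]
        for c in cols
        if c.endswith("_CODE") and c[: -len("_CODE")] + "_NAME" in cols
    )


dims = dimension_prefixes(conn, "THEME_PAYMENT_VOUCHER")
```

Deriving the dimension list from the suffix convention means that a later step can discover which dimension tables to build without a separate configuration file, which is one possible reading of why the convention is fixed in the design specification.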
Step S22: Generate the dimension tables from the theme table as needed;
Step S23: Generate the fact table from the theme table as needed;
In step S21, the codes and names of the dimensions for which dimension tables are needed were recorded in the theme table. According to these codes and names, the corresponding dimension tables are generated, the corresponding dimension table data (that is, the dimension names and dimension codes) is stored in them, and a dimension ID is generated for each dimension. For example, from the codes and names of the "administrative division" dimension recorded in a theme table, a division dimension table is generated that records the codes and names of all division dimension values together with a dimension ID for each. As shown in Fig. 5, the dimension "office leader" has the dimension name "office leader", the dimension code "01" and the dimension ID "118301". It follows from the description of step S21 that the dimension code "01" and the dimension name "office leader" in the dimension table of Fig. 5 are also recorded in the theme table.
In addition, to generate the dimension tables efficiently and avoid repeated generation, the dimension codes recorded in the theme table can first be compared with the dimension codes in the corresponding dimension table. If no matching dimension code and name is found in the dimension table, the code and name recorded in the theme table are stored in the dimension table and a dimension ID is generated for that code. If a match is found, the dimension table already holds that dimension code and name and nothing needs to be regenerated. The following rules apply when a dimension table is updated. The value set of a dimension table can only be added to and modified, never deleted: the same dimension may have different value sets in different years, and if values were deleted, the business data of historical years would no longer find its dimension codes in the value set and could not be analyzed again. The value set of the current division and the current year serves as the base. When a code and a name conflict, the latest name of the current year is used. A code from a historical year that is not used in the current year is added to the dimension table when its hierarchical relationship does not conflict.
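The compare-then-add behaviour described above can be sketched with SQLite. The table and column names (`DIM_DEPT`, `THEME`, `DEPT_CODE`, `DEPT_NAME`) and the "budget office" row are hypothetical; the sketch covers only the add-if-missing rule and leaves out the name-conflict and hierarchy rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DIM_DEPT (
        DIM_ID INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated dimension ID
        CODE   TEXT NOT NULL UNIQUE,
        NAME   TEXT NOT NULL
    );
    CREATE TABLE THEME (DEPT_CODE TEXT, DEPT_NAME TEXT, AMOUNT REAL);
    INSERT INTO DIM_DEPT (CODE, NAME) VALUES ('01', 'office leader');
    INSERT INTO THEME VALUES ('01', 'office leader', 10.0),
                             ('02', 'budget office', 20.0),
                             ('02', 'budget office', 5.0);
    """
)


def refresh_dimension(conn: sqlite3.Connection) -> int:
    """Add codes that appear in the theme table but not in the dimension table.

    Existing dimension rows are never deleted, so historical data keeps
    resolving. Returns the number of dimension rows added.
    """
    before = conn.execute("SELECT COUNT(*) FROM DIM_DEPT").fetchone()[0]
    conn.execute(
        """
        INSERT INTO DIM_DEPT (CODE, NAME)
        SELECT DISTINCT t.DEPT_CODE, t.DEPT_NAME
        FROM THEME t
        WHERE NOT EXISTS (SELECT 1 FROM DIM_DEPT d WHERE d.CODE = t.DEPT_CODE)
        """
    )
    after = conn.execute("SELECT COUNT(*) FROM DIM_DEPT").fetchone()[0]
    return after - before


added_first = refresh_dimension(conn)
added_second = refresh_dimension(conn)  # a second run finds nothing new
dim_rows = conn.execute("SELECT CODE, NAME FROM DIM_DEPT ORDER BY CODE").fetchall()
```

Because the insert is guarded by `NOT EXISTS`, the function can be re-run after every theme table load without creating duplicate dimension values.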
After the dimension tables are generated, the fact table associated with them is generated from the theme table according to the dimension IDs in the dimension tables, and the corresponding fact table data is stored in the fact table. The fact table is generated as follows:
According to the dimension name of the required data, obtain the corresponding dimension code from the theme table; associate the dimension code recorded in the theme table with the dimension code in the corresponding dimension table to obtain the dimension ID; store the obtained dimension ID in the fact table; and store the non-dimension data under that dimension in the theme table directly in the fact table.
That is, when the fact table is generated, the dimension code corresponding to the dimension name of the required data is first obtained from the theme table. The corresponding dimension ID is then obtained by associating the dimension code in the theme table with the dimension code in the dimension table, and the dimension ID and the required non-dimension data under that dimension in the theme table are stored in the fact table. Taking the data corresponding to "business department" in Fig. 4 as an example, a business department code in the "payment voucher" theme table is first associated with the dimension code in the business department dimension table of Fig. 5 to obtain the dimension ID of that department. The dimension ID is then stored in the fact table, and the non-dimension data for that department in the theme table (the data other than the dimension code and name) is stored directly in the fact table.
The fact table contains only dimension IDs and specific measure fields (non-dimension data). When the fact table is generated from the theme table, incremental extraction is used: depending on the type of fact table, only the theme table data within a set time period (for example the summary data of the most recent year, month or day) is extracted and updated into the fact table.
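The code-to-ID association and the incremental extraction can be sketched as follows. The schema, the dimension ID 118302 and the "budget office" row are assumptions for this example, and replacing the rows of one year by delete-then-insert is just one possible way to "update" the fact table for a set time period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DIM_DEPT (DIM_ID INTEGER PRIMARY KEY, CODE TEXT UNIQUE, NAME TEXT);
    CREATE TABLE THEME (YEAR INTEGER, DEPT_CODE TEXT, DEPT_NAME TEXT, AMOUNT REAL);
    CREATE TABLE FACT (YEAR INTEGER, DEPT_DIM_ID INTEGER, AMOUNT REAL);
    INSERT INTO DIM_DEPT VALUES (118301, '01', 'office leader'),
                                (118302, '02', 'budget office');
    INSERT INTO THEME VALUES (2013, '01', 'office leader', 10.0),
                             (2014, '01', 'office leader', 30.0),
                             (2014, '02', 'budget office', 20.0);
    """
)


def load_fact(conn: sqlite3.Connection, year: int) -> None:
    """Rebuild the fact rows of one year from the theme table."""
    with conn:  # commit both statements together, or roll both back
        conn.execute("DELETE FROM FACT WHERE YEAR = ?", (year,))
        conn.execute(
            """
            INSERT INTO FACT (YEAR, DEPT_DIM_ID, AMOUNT)
            SELECT t.YEAR, d.DIM_ID, t.AMOUNT       -- dimension ID plus measure only
            FROM THEME t
            INNER JOIN DIM_DEPT d ON d.CODE = t.DEPT_CODE
            WHERE t.YEAR = ?                        -- incremental: one time window
            """,
            (year,),
        )


load_fact(conn, 2014)
load_fact(conn, 2014)  # re-running the same window does not duplicate rows
fact_rows = conn.execute(
    "SELECT YEAR, DEPT_DIM_ID, AMOUNT FROM FACT ORDER BY DEPT_DIM_ID"
).fetchall()
```

Only the 2014 rows reach the fact table here; the 2013 row stays in the theme table and can be loaded later by calling `load_fact(conn, 2013)`.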
In this embodiment, the data stored in each table is named after the table: the data stored in a dimension table is called dimension table data, the data stored in the fact table is called fact table data, and the data in the application summary table is called application summary data.
Step S24: Generate the application summary table from the fact table as needed.
An application summary table is generated from the fact table as needed to obtain application summary data, and that data is stored in the application summary table. The application summary table stores the data obtained by converting the data in the fact table according to a preset derived calculation relationship. In this embodiment, the application summary table is generated from the fact table as needed as follows:
Preset a derived calculation relationship as needed, and generate a derived dimension table according to it. The derived calculation relationship is the calculation relationship between the derived dimension table and the dimension tables.
Associate the derived calculation formula with the dimension IDs in the fact table, convert the data in the fact table according to the derived calculation formula, and generate the application summary table.
The derived calculation relationship includes the operations of addition, subtraction and multiplication. When the data in the fact table is converted according to the calculation relationship between the derived dimension table and the original dimension tables (the dimension tables generated in step S22), the calculation relationship is converted into a Cartesian product operation to improve computational efficiency.
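One way to read this conversion is that each derived formula is flattened into rows of (derived dimension, original dimension, coefficient), so that a single join-and-sum replaces formula evaluation. The sketch below follows that reading; the formula encoding, the dimension codes and the derived names are all hypothetical and are not taken from the patent.

```python
# Each derived dimension value is a list of (coefficient, original code) terms:
# "add" is coefficient 1, "subtract" is coefficient -1, "multiply" scales it.
DERIVED_FORMULAS = {
    "URBAN_TOTAL": [(1.0, "D01"), (1.0, "D02")],       # D01 + D02
    "NET_OF_TRANSFER": [(1.0, "D01"), (-1.0, "D03")],  # D01 - D03
    "HALF_SHARE": [(0.5, "D02")],                      # 0.5 * D02
}


def to_conversion_rows(formulas: dict) -> list:
    """Flatten the formulas into (derived, original, coefficient) rows."""
    return [
        (derived, original, coeff)
        for derived, terms in formulas.items()
        for coeff, original in terms
    ]


def apply_conversion(fact: list, rows: list) -> dict:
    """Join fact rows to conversion rows on the original code and sum per derived value."""
    totals: dict = {}
    for derived, original, coeff in rows:
        for code, measure in fact:
            if code == original:
                totals[derived] = totals.get(derived, 0.0) + measure * coeff
    return totals


conversion_rows = to_conversion_rows(DERIVED_FORMULAS)
fact = [("D01", 100.0), ("D02", 40.0), ("D03", 30.0), ("D01", 10.0)]
totals = apply_conversion(fact, conversion_rows)
```

Once the formulas are stored as rows, changing a derived definition becomes a data change in the conversion table rather than a code change, which matches the flexibility the description attributes to derived dimensions.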
In this embodiment, the generation of the application summary table depends on the preset derived calculation relationship, from which the corresponding derived dimension table is generated. When the derived calculation relationship is set, the operations "add", "subtract" and "multiply" are supported. For computational efficiency, the operations need to be converted into a Cartesian product operation; Fig. 7 gives an example in which part of the calculation logic of the derived dimension "statistical division" is converted into a Cartesian product. The pseudocode for generating the application summary table from the fact table in this embodiment is:
INSERT INTO [application summary table] ("derived dimension", "measure")
SELECT DIMT."derived dimension", SUM(FACT."measure" * DIMT."coefficient")
FROM [fact table] FACT
INNER JOIN [derived dimension conversion table] DIMT ON DIMT."original dimension" = FACT."original dimension"
GROUP BY DIMT."derived dimension"
In the above pseudocode, a name in [] is a table name placeholder that is replaced with the actual table name during development, and a quoted name, such as those listed in (), is a table field placeholder that is replaced with the actual field of the actual table. Specifically, INSERT INTO [application summary table] inserts data into the application summary table; [application summary table] stands for the table name of the application summary table and is replaced in the development code with the actual table name. ("derived dimension", "measure") stands for the fields of the application summary table (mainly the derived dimension ID field and the measure field) and is replaced in the development code with the actual field names of the application summary table. SELECT specifies which fields are queried, namely the dimension and measure to be inserted into the application summary table. FROM indicates that the data is queried from the fact table and the derived dimension conversion table. INNER JOIN associates the original dimension ID corresponding to each derived dimension with the original dimension ID of the fact table. GROUP BY specifies the grouping level, here grouping by derived dimension. Writing such pseudocode is prior art for those skilled in the art and can be adjusted as needed.
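For reference, the pseudocode can be made runnable by substituting concrete names for the placeholders. The following sketch uses SQLite with hypothetical table names, column names and sample values; it illustrates the query shape only and is not the schema used in the patent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE FACT (ORIG_DIM_ID INTEGER, MEASURE REAL);
    CREATE TABLE DERIVED_DIM_CONV (DERIVED_DIM_ID INTEGER, ORIG_DIM_ID INTEGER, COEFF REAL);
    CREATE TABLE APP_SUMMARY (DERIVED_DIM_ID INTEGER, MEASURE REAL);
    INSERT INTO FACT VALUES (1, 100.0), (1, 10.0), (2, 40.0), (3, 30.0);
    -- derived 901 = dim 1 + dim 2; derived 902 = dim 1 - dim 3
    INSERT INTO DERIVED_DIM_CONV VALUES (901, 1, 1.0), (901, 2, 1.0),
                                        (902, 1, 1.0), (902, 3, -1.0);
    """
)

# Same shape as the pseudocode: join on the original dimension, weight each
# measure by its coefficient, and group by the derived dimension.
conn.execute(
    """
    INSERT INTO APP_SUMMARY (DERIVED_DIM_ID, MEASURE)
    SELECT c.DERIVED_DIM_ID, SUM(f.MEASURE * c.COEFF)
    FROM FACT f
    INNER JOIN DERIVED_DIM_CONV c ON c.ORIG_DIM_ID = f.ORIG_DIM_ID
    GROUP BY c.DERIVED_DIM_ID
    """
)

summary = conn.execute(
    "SELECT DERIVED_DIM_ID, MEASURE FROM APP_SUMMARY ORDER BY DERIVED_DIM_ID"
).fetchall()
```

With the sample rows above, derived dimension 901 sums to 150.0 and derived dimension 902 to 80.0.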
In the data processing of steps S21-S24 above, steps S22, S23, and S24 need not be performed before it is clear how the data will be used; once the use is clear, the dimension tables, fact table, and application summary table are generated as needed. Of course, when the data-use requirements change, steps S22, S23, and S24 must be re-executed. Data processing with the method and system of the present invention has the following effects:
1) Data quality is improved through a standardized data processing flow. Data enters the subject table, dimension tables, fact table, and application summary table in sequence. When the conversion rules are not yet clear, the data is placed in the subject table without being deliberately altered, which preserves the authenticity of the data.
The concepts of the subject table, the derived dimension, and the application summary table are added on top of the dimension table and fact table, and the purpose of each class of data table is made explicit. Data is first put into the subject table, and data conversion is postponed until the fact table and the application summary table are generated.
2) The processed data is highly practical. The data entering the application summary table is generated entirely according to business needs and meets the needs of statistical analysis. By introducing derived dimension definitions, "add", "subtract", and "multiply" relationships can be configured flexibly, and converting them into Cartesian-product set operations greatly improves conversion efficiency. Operational data produced by business staff is thus quickly converted into the statistical analysis data that management needs, making the data more valuable.
3) Data can be reused. The data warehouse is built around managing the subject table, not merely for the purpose of generating multidimensional data cubes. When analysis requirements change, the original data need not be reloaded; the dimension tables, fact table, and application summary table can all be regenerated from the subject table.
For a better understanding of the present invention, it is further described below with reference to a specific embodiment.
Embodiment
The original data in this embodiment is government data of Jiangsu Province. As required, the following are extracted from the government data: the total "indicator amount" and "expenditure amount" for 2013 of five prefecture-level cities (Nanjing, Wuxi, Xuzhou, Changzhou, and Suzhou); the "indicator amount" and "expenditure amount" of the eastern region among these five cities; and the "indicator amount" and "expenditure amount" of the special caliber.
The "special caliber", like "city total" and "eastern region", is a statistical analysis caliber, that is, merely a caliber label. It is mainly used for data analysis calibers that cannot be assigned to a normal grouping. Caliber labels can take many forms, and different labels can be set as needed. The "indicator amount" is the amount that a given region (e.g. Nanjing) is allowed to spend in a given year (e.g. 2013), equivalent to a quota. The "expenditure amount" is the expenditure actually incurred by a given region (e.g. Nanjing) in a given year (e.g. 2013); expenditure is made against the indicator amount.
Step 1: a subject table is built first, and the government data of Jiangsu Province is stored in it. Because the data ultimately required relates to the five prefecture-level cities, a region dimension table must be established, so the dimension code and name of each region must be recorded in the subject table. Note that the subject table data is the most complete: when the subject table data is generated, both dimension table data and non-dimension-table data (such as numeric values) are generated. The fact table data is then generated from the subject table data.
Step 2: a dimension table is generated from the dimension codes and names of the regions in the subject table. The dimension table stores the dimension codes and names of the five required region dimensions (Nanjing, Wuxi, Xuzhou, Changzhou, and Suzhou), and a dimension ID is generated for each region dimension, as shown in Fig. 6.
Step 3: the dimension codes in the subject table are joined with the dimension codes in the dimension table to obtain the dimension IDs of Nanjing, Wuxi, Xuzhou, Changzhou, and Suzhou, and the obtained dimension IDs are stored in the fact table. At the same time, the "indicator amount" and "expenditure amount" data in the subject table are stored in the fact table under the corresponding dimension ID, as shown in Fig. 7.
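Steps 2 and 3 can be sketched with a small in-memory database. This is an illustrative sketch under stated assumptions, not the layout shown in Figs. 6 and 7: the table names (`subject`, `dim_region`, `fact`), the column names, and the amounts are all hypothetical, and only two regions are loaded to keep it short.

```python
import sqlite3

# In-memory database; all table names, column names, and amounts are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (region_code TEXT, region_name TEXT,
                      indicator_amount REAL, expenditure_amount REAL);
CREATE TABLE dim_region (dim_id INTEGER PRIMARY KEY AUTOINCREMENT,
                         region_code TEXT UNIQUE, region_name TEXT);
CREATE TABLE fact (dim_id INTEGER, indicator_amount REAL, expenditure_amount REAL);
""")

# Step 1 (already done in the text): raw rows land in the subject table unchanged.
conn.executemany("INSERT INTO subject VALUES (?, ?, ?, ?)", [
    ("3201", "Nanjing", 70, 60),
    ("3201", "Nanjing", 50, 40),
    ("3202", "Wuxi", 220, 200),
])

# Step 2: one dimension row per distinct code/name pair; the database assigns dim_id.
conn.execute("""
INSERT INTO dim_region (region_code, region_name)
SELECT DISTINCT region_code, region_name FROM subject ORDER BY region_code
""")

# Step 3: join on the dimension code to look up dim_id, then copy the
# non-dimension values (the amounts) into the fact table as they are.
conn.execute("""
INSERT INTO fact (dim_id, indicator_amount, expenditure_amount)
SELECT d.dim_id, s.indicator_amount, s.expenditure_amount
FROM subject AS s
INNER JOIN dim_region AS d ON d.region_code = s.region_code
""")

# Check: indicator totals per region code, read back through the dimension table.
totals = dict(conn.execute("""
SELECT d.region_code, SUM(f.indicator_amount)
FROM fact AS f INNER JOIN dim_region AS d ON d.dim_id = f.dim_id
GROUP BY d.region_code
"""))
print(totals)
```

Because the fact table stores only the generated dimension ID and the amounts, it can be dropped and rebuilt from the subject table whenever the dimension definitions change, which is the reuse property the description emphasizes.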
Step 4: the derivation calculation relationships are preset as needed. Specifically, the statistics required are the total "indicator amount" and "expenditure amount" of the five prefecture-level cities, the "indicator amount" and "expenditure amount" of the eastern region among the five cities, and the "indicator amount" and "expenditure amount" of the special caliber. The derivation formulas are therefore set to "3201 Nanjing + 3202 Wuxi + 3203 Xuzhou + 3204 Changzhou + 3205 Suzhou", "3201 Nanjing + 3203 Xuzhou + 3204 Changzhou", and "3201 Nanjing + 3205 Suzhou", and the derived dimension table is generated from these formulas, as shown in Fig. 8. The calculation relationships are then converted into the Cartesian-product relation table shown in Fig. 9. Finally, the derived dimension table is joined with the fact table and the application summary table is generated by means of the pseudocode, yielding the required data shown in Fig. 10: the total "indicator amount" and "expenditure amount" of the five cities are 790 and 700 respectively, those of the eastern region are 470 and 410, and those of the special caliber are 220 and 190.
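The conversion from formulas to a relation table can be sketched as follows. This is a hedged illustration, not the patent's algorithm: the function `formula_to_rows` and its parsing rules are hypothetical, and the per-city amounts are invented, chosen only so that the sums reproduce the totals stated in the text (the actual per-city values appear in Fig. 7, which is not reproduced here).

```python
import re
from collections import defaultdict

def formula_to_rows(derived_name, formula):
    """Expand a formula such as '3201 Nanjing + 3203 Xuzhou - 2*3204 Changzhou'
    into (derived dimension, original code, coefficient) rows.
    Hypothetical helper: assumes four-digit region codes."""
    rows = []
    pattern = r"([+-]?)\s*(?:(\d+(?:\.\d+)?)\s*\*\s*)?(\d{4})"
    for sign, factor, code in re.findall(pattern, formula):
        coeff = float(factor) if factor else 1.0
        rows.append((derived_name, code, -coeff if sign == "-" else coeff))
    return rows

# The three derivation formulas given in step 4.
formulas = {
    "City total": "3201 Nanjing + 3202 Wuxi + 3203 Xuzhou + 3204 Changzhou + 3205 Suzhou",
    "Eastern region": "3201 Nanjing + 3203 Xuzhou + 3204 Changzhou",
    "Special caliber": "3201 Nanjing + 3205 Suzhou",
}
conversion_rows = [row for name, f in formulas.items()
                   for row in formula_to_rows(name, f)]

# Invented (indicator amount, expenditure amount) per region code.
fact = {
    "3201": (120, 100), "3202": (220, 200), "3203": (150, 140),
    "3204": (200, 170), "3205": (100, 90),
}

# Join each conversion row to its fact row and accumulate weighted sums,
# mirroring SUM(measure * coefficient) ... GROUP BY derived dimension.
summary = defaultdict(lambda: [0.0, 0.0])
for derived, code, coeff in conversion_rows:
    indicator, expenditure = fact[code]
    summary[derived][0] += coeff * indicator
    summary[derived][1] += coeff * expenditure

for name, (indicator, expenditure) in summary.items():
    print(name, indicator, expenditure)
```

Each formula term becomes one row, so the three formulas yield ten rows in total. The aggregation step then needs no knowledge of the formulas themselves, only of the rows.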
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If such changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims (9)

1. A data processing method, comprising the following steps:
(1) storing original data in a subject table, and recording in the subject table the codes and names of the dimensions for which dimension tables need to be generated; the subject table being a data table, built according to the description requirements of business objects, for storing all kinds of original business data;
(2) generating the corresponding dimension table from the codes and names of the dimensions recorded in the subject table, storing the corresponding dimension table data in the dimension table, and generating a dimension ID for each dimension;
(3) generating, from the subject table and according to the dimension IDs, a fact table associated with the dimension table, and storing the corresponding fact table data in the fact table; wherein generating the fact table associated with the dimension table from the subject table according to the dimension IDs specifically comprises: obtaining, according to the dimension name of the required data, the dimension code corresponding to that dimension name in the subject table; joining the dimension code recorded in the subject table with the dimension code in the corresponding dimension table to obtain the dimension ID; storing the dimension ID in the fact table; and storing the non-dimension-table data under that dimension in the subject table directly in the fact table;
(4) generating an application summary table from the fact table as needed to obtain application summary data, and storing the application summary data in the application summary table; the application summary table being used to store the data obtained by converting the data in the fact table according to a preset derivation calculation relationship.
2. The data processing method according to claim 1, characterized in that, in step (1), before the original data is stored in the subject table, the original data is first stored in a temporary area and is then read from the temporary area and stored in the subject table; after the original data has been stored in the subject table, the corresponding original data in the temporary area is deleted.
3. The data processing method according to claim 1 or 2, characterized in that, in step (1), the original data is preprocessed before being stored in the subject table; the preprocessing comprises completing global attributes missing from the original data, uniformly converting region and date attributes, and deleting data not needed in the subject table; the unneeded data comprises voided business data and in-transit business data.
4. The data processing method according to claim 1, characterized in that, in step (3), the fact table is generated by incremental extraction, in which only the subject table data within a set time period is extracted and updated into the fact table.
5. The data processing method according to claim 1, characterized in that, in step (4), generating the application summary table from the fact table as needed specifically comprises:
presetting a derivation calculation relationship as needed, and generating a derived dimension table according to the derivation calculation relationship; the derivation calculation relationship being the calculation relationship between the derived dimension table and the dimension table;
associating the derivation calculation formula with the dimension IDs in the fact table, and converting the data in the fact table according to the derivation calculation formula to generate the application summary table.
6. The data processing method according to claim 5, characterized in that, in step (4), the derivation calculation relationship comprises the operations of addition, subtraction, and multiplication.
7. The data processing method according to claim 6, characterized in that, in step (4), when the data in the fact table is converted according to the derivation calculation formula, the derivation calculation relationship is converted into a Cartesian-product operation relationship.
8. A data processing system, comprising:
a subject table building module, configured to establish a subject table, store original data in the subject table, and record in the subject table the codes and names of the dimensions for which dimension tables need to be generated; the subject table being a data table, built according to the description requirements of business objects, for storing all kinds of business data;
a dimension table generation module, configured to generate the corresponding dimension table from the codes and names of the dimensions recorded in the subject table, store the corresponding dimension table data in the dimension table, and generate a dimension ID for each dimension;
a fact table generation module, configured to generate, from the subject table and according to the dimension IDs, a fact table associated with the dimension table, and to store the corresponding fact table data in the fact table; wherein generating the fact table associated with the dimension table from the subject table according to the dimension IDs specifically comprises: obtaining, according to the dimension name of the required data, the dimension code corresponding to that dimension name in the subject table; joining the dimension code recorded in the subject table with the dimension code in the corresponding dimension table to obtain the dimension ID; storing the dimension ID in the fact table; and storing the non-dimension-table data under that dimension in the subject table directly in the fact table;
an application summary table generation module, configured to generate an application summary table from the fact table as needed to obtain application summary data, and to store the application summary data in the application summary table; the application summary table being used to store the data obtained by converting the data in the fact table according to a preset derivation calculation relationship.
9. The data processing system according to claim 8, characterized in that the application summary table generation module comprises:
a derived dimension table generation unit, configured to preset a derivation calculation relationship and generate a derived dimension table according to the derivation calculation relationship; the derivation calculation relationship being the calculation relationship between the derived dimension table and the dimension table;
an application summary table generation unit, configured to associate the derivation calculation formula with the dimension IDs in the fact table, and to convert the data in the fact table according to the derivation calculation formula to generate the application summary table.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100094 2F, building 11, UFIDA Software Park, 68 Beiqing Road, Haidian District, Beijing

Patentee after: Beijing UYU Government Software Co.,Ltd.

Address before: 100094 2F, building 11, UFIDA Software Park, 68 Beiqing Road, Haidian District, Beijing

Patentee before: YONYOU GOVERNMENT AFFAIRS SOFTWARE Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170503

Termination date: 20210220