CN107729448A - A data processing system based on a data warehouse
- Publication number: CN107729448A
- Application number: CN201710919091.2A
- Authority: CN (China)
- Prior art keywords: data, layer, module, processing module, conformable
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
Abstract
The present invention provides a data processing system based on a data warehouse. The system includes an atomic layer and a conformable layer, and the atomic layer is connected to the conformable layer. The atomic layer includes a first processing module and a first memory module; the conformable layer includes a second processing module and a second memory module. The first memory module is connected to the first processing module, and the second memory module is connected to the second processing module. The atomic layer is used to obtain structured first data. The first processing module is used to organize and classify the first data according to preset rules to obtain second data. The first memory module is used to store the organized and classified second data in partitions. The atomic layer is used to update the second data to the conformable layer. The second processing module is used to merge the second data according to preset integration rules to generate third data. The second memory module is used to store the third data.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a data processing method based on a data warehouse.
Background
With the company's expansion plans and deployment in big-data city projects, the big-data city data warehouse project will be the cornerstone project of every city, and the multiplier and booster of all other data projects.
In the prior art, data warehouses often suffer from data redundancy, missing data, and similar problems, which lead to inconsistent data and higher maintenance costs. In addition, the data in a data warehouse often comes from numerous and jumbled sources and is ambiguous, so the accuracy of the data warehouse is relatively low.
Summary of the invention
In view of the above defects in the prior art, the present invention provides a data processing system based on a data warehouse. It can effectively avoid the data inconsistency caused by data redundancy, missing data, and similar problems, and reduce maintenance costs. At the same time, it can eliminate the ambiguity of data and thereby increase the accuracy of the data warehouse.
The data processing system based on a data warehouse provided by the present invention includes an atomic layer and a conformable layer.
The atomic layer is connected to the conformable layer.
The atomic layer includes a first processing module and a first memory module.
The conformable layer includes a second processing module and a second memory module.
The first memory module is connected to the first processing module.
The second memory module is connected to the second processing module.
The atomic layer is used to obtain structured first data.
The first processing module is used to organize and classify the first data according to preset rules to obtain second data.
The first memory module is used to store the organized and classified second data in partitions.
The atomic layer is used to update the second data to the conformable layer.
The second processing module is used to merge the second data according to preset integration rules to generate third data.
The second memory module is used to store the third data.
Optionally, the first memory module is specifically used to store the organized and classified data in partitions according to one or more of data source, data period, business category, and relationship type.
Optionally, the conformable layer further includes a rule establishing module.
The rule establishing module is connected to the first processing module.
The rule establishing module is used to establish the integration rules according to one or more of the non-null-first principle, the priority of the data, the timeliness of the data, majority rule, common sense, and reasonableness, and to send the integration rules to the first processing module.
Optionally, the conformable layer further includes a rule verification module.
The rule verification module is connected to both the rule establishing module and the first processing module.
The rule verification module is used to verify the integration rules established by the rule establishing module. If verification passes, the integration rules are sent to the first processing module.
If verification fails, verification-failure information is sent to the rule establishing module so that the rule establishing module re-establishes the integration rules.
Optionally, the system further includes a patch active layer (that is, a source-aligned layer that stays close to the source data).
The patch active layer is connected to the atomic layer.
The patch active layer is used to standardize buffered data to obtain the first data, and to perform history archiving on the buffered data and the first data.
The patch active layer is also used to update the first data to the atomic layer.
Optionally, the system further includes a cushion (that is, a buffer layer).
The cushion is connected to the patch active layer.
The cushion is used to cache structured source data from different sources in the source databases and generate the buffered data.
The cushion is also used to update the buffered data to the patch active layer.
Optionally, the system further includes a collection city level (that is, a data mart layer).
The collection city level is connected to the conformable layer.
The collection city level is used to obtain the third data from the conformable layer and to splice the fragment tables in the third data by association, generating a basic wide table.
Optionally, the system further includes an application layer.
The application layer is connected to the patch active layer, the atomic layer, the conformable layer, and the collection city level.
The application layer is used to periodically synchronize the data in the patch active layer, the atomic layer, the conformable layer, and the collection city level.
Optionally, the system further includes a large database.
The large database is connected to the application layer, the collection city level, the atomic layer, the patch active layer, and the conformable layer.
The large database is used to periodically synchronize and store the data in the application layer, the collection city level, the atomic layer, the patch active layer, and the conformable layer.
Optionally, the large database is a Hadoop large database.
As can be seen from the above technical solution, the present invention provides a data processing system based on a data warehouse that includes an atomic layer and a conformable layer. The atomic layer is connected to the conformable layer. The atomic layer includes a first processing module and a first memory module; the conformable layer includes a second processing module and a second memory module. The first memory module is connected to the first processing module, and the second memory module is connected to the second processing module. The atomic layer is used to obtain structured first data. The first processing module is used to organize and classify the first data according to preset rules to obtain second data. The first memory module is used to store the organized and classified second data in partitions. The atomic layer is used to update the second data to the conformable layer. The second processing module is used to merge the second data according to preset integration rules to generate third data. The second memory module is used to store the third data.
By using the atomic layer to organize and classify data, the present invention can effectively avoid the data inconsistency caused by data redundancy, missing data, and similar problems, and reduce maintenance costs. Storing the organized and classified data in partitions makes the information easy to trace to its source and easy to manage. At the same time, by using the conformable layer to merge the second data, multi-source data can be unified into unique information, which eliminates the ambiguity of the data and ensures that the data is unique and accurate.
Brief description of the drawings
In order to illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. In all the drawings, similar elements or parts are generally identified by similar reference numerals. In the drawings, the elements or parts are not necessarily drawn to scale.
Fig. 1 shows a schematic diagram of a data processing system based on a data warehouse provided by an embodiment of the present invention;
Fig. 2 shows a data flow diagram of a data processing system based on a data warehouse provided by an embodiment of the present invention;
Fig. 3 shows a data architecture diagram of a data processing system based on a data warehouse provided by an embodiment of the present invention.
Detailed description of the embodiments
The embodiments of the technical solution of the present invention are described in detail below with reference to the drawings. The following embodiments are only used to illustrate the technical solution of the present invention more clearly; they are therefore only examples and cannot be used to limit the protection scope of the present invention.
It should be noted that, unless otherwise specified, the technical or scientific terms used in this application have the ordinary meanings understood by those of ordinary skill in the art to which the present invention belongs.
The present invention provides a data processing system based on a data warehouse. The embodiments of the present invention are described below with reference to the drawings.
Fig. 1 shows a schematic diagram of a data processing system based on a data warehouse provided by an embodiment of the present invention; Fig. 2 shows a data flow diagram of the system; Fig. 3 shows a data architecture diagram of the system. As shown in Fig. 1, an embodiment of the present invention provides a data processing system based on a data warehouse, including an atomic layer and a conformable layer.
The atomic layer is connected to the conformable layer. The atomic layer includes a first processing module and a first memory module; the conformable layer includes a second processing module and a second memory module. The first memory module is connected to the first processing module, and the second memory module is connected to the second processing module. The atomic layer is used to obtain structured first data. The first processing module is used to organize and classify the first data according to preset rules to obtain second data. The first memory module is used to store the organized and classified second data in partitions. The atomic layer is used to update the second data to the conformable layer. The second processing module is used to merge the second data according to preset integration rules to generate third data. The second memory module is used to store the third data.
By using the atomic layer to organize and classify data, the present invention can effectively avoid the data inconsistency caused by data redundancy, missing data, and similar problems, and reduce maintenance costs. Storing the organized and classified data in partitions makes the information easy to trace to its source and easy to manage. At the same time, by using the conformable layer to merge the second data, multi-source data can be unified into unique information, which eliminates the ambiguity of the data and ensures that the data is unique and accurate.
The data warehouse may include the atomic layer and the conformable layer.
The first data refers to the structured data that is obtained by processing the raw data and is then organized and classified. The first data can take structured forms such as data tables and numbers.
Generally, the first data comes from many information sources and is fine-grained. Moreover, no merging is performed on the data from any source, so the data of each source remains completely independent. At the level of data columns, this means that each data table organizing this information contains few attributes. For example, information about a person can be split into fragments or stages: basic information, relationship information, contact information, contact address information (including household registration, residence, and work addresses, which can be associated with the address database and the housing database), education information, marriage information, growth and development information, talent market information (which can be associated with the legal-person database), guarantee information, provident fund information under the person's name, real estate under the person's name, vehicles under the person's name, enterprises under the person's name (which can be associated with the legal-person database), good records, bad-behavior records, death information, and so on.
Because the first data usually comes from multiple sources, is fine-grained, and is very large in volume, the organized and classified first data needs to be stored in partitions for ease of management.
In a specific embodiment provided by the present invention, the first memory module is specifically used to store the organized and classified data in partitions according to one or more of data source, data period, business category, and relationship type.
For example, a person's basic information table is partitioned by data source. A person's social relationships come in many types, and each type has multiple sources, so they are stored using a composite partition of relationship type and source (relationship type as the main partition, data source as the sub-partition). Social security payment records come only from the social security payment information but need to be stored in partitions by time period (for example, monthly), that is, partitioned by data period.
During partitioned storage, the data source identifier needs to be stored in a separate column. This makes it easy to trace the data back to its source.
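The partitioning examples above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the table names, field names, and the in-memory dictionary standing in for the first memory module are all hypothetical and do not come from the patent.

```python
from collections import defaultdict

# Hypothetical partitioning schemes, one per table, following the examples in the text.
PARTITION_SCHEMES = {
    "person_basic": ("source",),                     # partition by data source
    "person_relation": ("relation_type", "source"),  # main partition, then sub-partition
    "social_security_payment": ("period",),          # partition by data period (monthly)
}

def partition_key(table, record):
    """Return the partition key of a record as a tuple of field values."""
    return tuple(record[field] for field in PARTITION_SCHEMES[table])

def store(partitions, table, record):
    """Append the record to its partition; the source column stays in the record."""
    if "source" not in record:
        raise ValueError("every record must carry its data source identifier")
    partitions[(table, partition_key(table, record))].append(record)

partitions = defaultdict(list)
store(partitions, "person_relation",
      {"person_id": "P1", "relation_type": "spouse", "source": "civil_affairs", "other_id": "P2"})
store(partitions, "social_security_payment",
      {"person_id": "P1", "period": "2017-09", "source": "social_security", "amount": 500})
```

Keeping the source identifier as an ordinary column, even when the table is not partitioned by source, is what allows every stored row to be traced back to where it came from.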
For example, the atomic layer has the following partitions for storing the first data: basic information, education information, marriage information, work information, relationship information, contact information, residence information, company information, certificate information, and so on.
In the present invention, the atomic layer is also used to obtain the incremental data of the first data in real time. The first processing module is also used to organize and classify the incremental data according to the preset rules and to update the second data in the first memory module.
In the present invention, the atomic layer can periodically synchronize data snapshots of the incremental data to the large database according to a time period, and can also periodically synchronize data snapshots of the updated second data to the large database according to a time period, which facilitates subsequent lookup.
For example, a historical snapshot of the second data is dumped to the large database at the end of each month, and the incrementally updated data is synchronized to the large database every day.
After obtaining the incremental data of the first data, the first processing module likewise organizes and classifies the incremental data according to the preset rules and stores the organized and classified data in partitions, thereby updating the second data.
In the present invention, the first memory module is also used to store the data-class data of the second data and to update the data-class data of the second data by overwriting, which improves the reliability of the data-class data.
For example, the data-class data of the second data is updated monthly or weekly.
In the present invention, the atomic layer can also synchronize the data-class data of the second data to the large database so that the data is updated in time.
In a specific embodiment provided by the present invention, the system further includes a patch active layer. The patch active layer is connected to the atomic layer. The patch active layer is used to standardize buffered data to obtain the first data, and to perform history archiving on the buffered data and the first data. The patch active layer is also used to update the first data to the atomic layer.
In a specific embodiment provided by the present invention, the system further includes a cushion. The cushion is connected to the patch active layer. The cushion is used to cache structured source data from different sources in the source databases and generate the buffered data. The cushion is also used to update the buffered data to the patch active layer.
In the present invention, the cushion is the data entry point of the block database. The cushion can obtain structured source data from different sources from the source systems and cache it.
The cushion can include a third processing module and a third memory module; the third processing module is connected to the third memory module.
The third memory module can cache the source data from the source systems so that the patch active layer obtains data directly from the cushion. This avoids the unnecessary burden that repeated extraction would place on the source systems when back-end processing fails and has to be run again. It also prevents the situation in which, on a second extraction, the snapshot of the data at that time can no longer be found because the source system has been updated.
The third processing module can add a timestamp to the source data. The patch active layer can then re-extract data directly by timestamp, which makes extraction convenient, and the source data in the cushion is recorded by time. The timestamp is also used to identify incremental data when the patch active layer extracts data. The data table model of the cushion is therefore fully consistent with that of the source system: the third processing module makes no additional modification to the source tables and only adds a data loading time identifier (for example, SYS_UPDATE_TIME), which is used to identify incremental data when the patch active layer extracts data.
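A minimal Python sketch of this idea follows. Only the column name SYS_UPDATE_TIME is taken from the text; the function names, the row layout, and the use of an in-memory list for the cushion are assumptions made for illustration.

```python
from datetime import datetime

def load_into_cushion(source_rows, load_time):
    """Copy source rows unchanged and add only the loading-time column."""
    return [{**row, "SYS_UPDATE_TIME": load_time} for row in source_rows]

def extract_increment(cushion_rows, last_extracted_at):
    """Return the rows loaded into the cushion after the previous extraction."""
    return [row for row in cushion_rows if row["SYS_UPDATE_TIME"] > last_extracted_at]

cushion = (
    load_into_cushion([{"id": 1, "name": "A"}], datetime(2017, 9, 25, 1, 0))
    + load_into_cushion([{"id": 2, "name": "B"}], datetime(2017, 9, 26, 1, 0))
)
# The patch active layer last extracted at noon on 25 September,
# so only the row loaded on 26 September counts as incremental data.
increment = extract_increment(cushion, datetime(2017, 9, 25, 12, 0))
```

Because the source columns are left untouched, a failed downstream run can be repeated against the cushion with the same timestamp filter instead of querying the source system again.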
For example, the third processing module of the cushion does not modify the data. It only adds a column holding the data insertion time, which marks when the data was generated; the cached data is stored in circular partitions by this time, and the column serves as the incremental extraction field.
When adding the timestamp to the source data, the third processing module fills the added time-increment field with the system time as the data generation time. Data can be stored in append mode with circular partitions. If 7 days of data are retained, one week forms a partition cycle: if today is Monday, last Monday's data is overwritten, and so on, with each weekday's data overwriting the data of the same weekday in the previous week.
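The circular partitioning described above can be sketched as follows. This is an illustrative sketch, not the patented implementation: the slot calculation, the dictionary layout, and the constant RETENTION_DAYS are assumptions.

```python
from datetime import datetime

RETENTION_DAYS = 7  # assumption: keep one week of data, as in the 7-day example

def partition_index(load_time):
    """Days that are RETENTION_DAYS apart share the same circular slot."""
    return load_time.toordinal() % RETENTION_DAYS

def load(partitions, rows, load_time):
    """Reset the slot when a new day reuses it, then append within that day."""
    slot = partition_index(load_time)
    day = load_time.date()
    if partitions.get(slot, {}).get("day") != day:
        partitions[slot] = {"day": day, "rows": []}  # overwrites the data from one cycle ago
    partitions[slot]["rows"].extend({**row, "SYS_UPDATE_TIME": load_time} for row in rows)

partitions = {}
load(partitions, [{"id": 1}], datetime(2017, 9, 18, 2, 0))  # a Monday
load(partitions, [{"id": 2}], datetime(2017, 9, 25, 2, 0))  # the following Monday
```

With a 7-day cycle the second load lands in the slot used one week earlier and replaces its contents, while several loads on the same day are appended to the same slot.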
Depending on the actual situation, the cushion can store the cached data on a different host. In particular, where the intranet and extranet are isolated for security, the data is first saved on an intermediate host, which both keeps the data secure and bridges the different networks.
The cushion can synchronize structured data from the source systems periodically (or irregularly), forming data for a certain cycle (circular partitions of 7 or 30 periods). Data is usually extracted from the source systems by time-based incremental synchronization; of course, source data can also be synchronized to the cushion by incremental identifiers, by parsing database logs, or by full extraction followed by comparison.
After the data in the cushion is ready, the patch active layer can periodically synchronize the data from the cushion.
In the present invention, the fourth processing module of the patch active layer can standardize the buffered data, generate the first data, and store it in the fourth memory module.
The patch active layer can include a fourth processing module and a fourth memory module. The fourth processing module is connected to the fourth memory module.
Standardizing the data ensures that the data is expressed with one and the same set of standards.
At the same time, history archiving can be performed on the buffered data and the first data, which ensures that the data can be used for historical analysis and comparison at any time.
Standardization can include data cleansing, conversion, code mapping, and so on. Code mapping requires a large number of data element standards. For many items, such as gender, marital status, and educational background, national, local, or industry standards may already exist; these existing standards can be referenced directly, and the codes of the source data are then mapped to the standard codes. If there is no standard to refer to, a standard needs to be formulated for the data so that a unified standard is available when multi-source data is merged.
History archiving of data means storing the data in the fourth memory module as history zipper data. In addition, in the present invention, the patch active layer can synchronize the generated first data to the large database and use the large database to keep the data of all historical versions.
When the fourth processing module standardizes the data, the mapped column does not overwrite the original column. Instead, a corresponding new column is added to hold the mapped attribute; the original information is not deleted, and the converted field is simply added.
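The following Python sketch illustrates code mapping that adds a column rather than overwriting one. The source names, the gender codes, the fallback code, and the column name gender_std are hypothetical values chosen for illustration; they are not taken from the patent or from any particular standard.

```python
# Hypothetical mapping from source-specific gender codes to one standard code set.
GENDER_MAP = {
    "hospital": {"M": "1", "F": "2"},
    "police": {"male": "1", "female": "2"},
}
UNKNOWN = "0"  # assumed standard code for values that cannot be mapped

def standardize(row):
    """Return a copy of the row with a new mapped column; the original column is kept."""
    mapped = GENDER_MAP.get(row["source"], {}).get(row["gender"], UNKNOWN)
    return {**row, "gender_std": mapped}

rows = [
    {"id": 1, "source": "hospital", "gender": "F"},
    {"id": 2, "source": "police", "gender": "male"},
]
standardized = [standardize(row) for row in rows]
```

Keeping both the raw and the mapped value means a mapping error can be corrected later by re-running the mapping, without going back to the source system.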
After standardizing the data, the fourth processing module can also update the standardized data in real time.
When updating the data, an overwriting update can be used so that only the latest copy of the data is retained.
The patch active layer has to guarantee that the data of the data warehouse has history, so it keeps the history zipper data for a relatively long data period (usually 3 years). Zipper data is stored in append mode and existing data is not overwritten. Standardized data is generated by the standardization mapping process and is finally stored in partitions according to the update frequency of the data. If data needs to be extracted again, it is only necessary to empty the current partition and then load the data to be re-extracted according to the timestamp.
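The patent does not specify the layout of the history zipper data. The sketch below shows one common way to keep such a version chain, with a validity interval per version; the field names, the OPEN_END sentinel, and the choice to close the previous version by setting its end date are assumptions.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed sentinel meaning "still current"

def apply_change(history, key, value, effective):
    """Append a new version; the previous version is closed, never deleted."""
    for version in history:
        if version["key"] == key and version["end"] == OPEN_END:
            if version["value"] == value:
                return                  # nothing changed, keep the open version
            version["end"] = effective  # close the old version
    history.append({"key": key, "value": value, "start": effective, "end": OPEN_END})

def value_as_of(history, key, day):
    """Look up the value that was valid on a given day."""
    for version in history:
        if version["key"] == key and version["start"] <= day < version["end"]:
            return version["value"]
    return None

history = []
apply_change(history, "P1", "unmarried", date(2015, 1, 1))
apply_change(history, "P1", "married", date(2017, 5, 1))
```

Every earlier state stays queryable by date, which is what makes historical analysis and comparison possible.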
In addition, because the data sources of the block database are extensive and the data standards of the various sources are not at all unified, standardized data can be generated by the standardization mapping process. Standardized data is updated in overwrite mode rather than append mode. At the same time, the mapped columns of the patch active layer do not overwrite the original columns; instead, corresponding new columns are added to hold the mapped attributes.
In the present invention, the patch active layer is also connected to the large database. The patch active layer can store the first data in the large database, which facilitates the recording and analysis of massive historical data.
Because the history layer of the patch active layer keeps growing as the system runs, the expansion of space not only places high demands on storage but also puts great pressure on a traditional structured database (such as Oracle). The storage of historical data must therefore have a fixed period, such as 1 year or 3 years; otherwise the whole database becomes unbearably bloated. Importing the archived data directly into the large database both keeps the data safe and makes the analysis of massive historical data much easier.
For example:
[Table: data before standardization]
[Table: data after standardization]
In the present invention, the first data of the patch active layer can be supplied directly to applications or synchronized to the atomic layer. That is, the first data of the patch active layer can be provided not only to the application layer but also to the atomic layer, and both fall within the protection scope of the present invention.
In the present invention, the second processing module of the conformable layer is used to merge the second data according to the preset integration rules to generate the third data, and the second memory module is used to store the third data.
Merging the second data unifies multi-source data into unique information, which eliminates the ambiguity of the data and ensures that the data is unique and accurate.
For example, a person's marital status is one definite state, such as unmarried, first marriage, remarried, remarried to a former spouse, divorced, or widowed. For the same person, however, the second data may contain different states from multiple sources, so the data must be integrated to obtain a unique, accurate value. Only when the state of each attribute is determined can the various application scenarios be supported.
In the present invention, the conformable layer further includes a rule establishing module. The rule establishing module is connected to the first processing module. The rule establishing module is used to establish the integration rules according to one or more of the non-null-first principle, the priority of the data, the timeliness of the data, majority rule, common sense, and reasonableness, and to send the integration rules to the first processing module.
For example, the priority of data can be the authority of the data. For marriage information, the civil affairs department should be the most authoritative source. The timeliness of data mainly refers to the most recent update time of the data: if someone's marriage information comes from the civil affairs department but has not been updated for a year, then although the information is highly authoritative, its timeliness is poor and it is not necessarily accurate. Under majority rule, if political affiliation (which has no authoritative source) is drawn from 10 sources, of which 9 record "member" and only one records "party member", the result is likely to follow the 9 sources. Under common sense, for example, the highest educational background among all records is taken as the person's educational background, and marital status is judged together with age; a person under 18, for instance, cannot have the marital status "married".
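One way these principles might be combined is sketched below in Python. The order in which the principles are applied (non-null first, then timeliness, then source priority, with majority vote and recency as tie-breakers), the source ranking, and the one-year window are all assumptions; the patent leaves the concrete rule to be established and verified per attribute.

```python
from collections import Counter
from datetime import date

# Hypothetical authority ranking; a lower number means a more authoritative source.
SOURCE_PRIORITY = {"civil_affairs": 0, "police": 1, "community": 2}
MAX_AGE_DAYS = 365  # assumed timeliness window

def merge_attribute(records, today):
    """Pick one value from multi-source records and keep where it came from."""
    candidates = [r for r in records if r["value"] not in (None, "")]  # non-null first
    if not candidates:
        return None
    fresh = [r for r in candidates if (today - r["updated"]).days <= MAX_AGE_DAYS]
    pool = fresh or candidates                                         # timeliness
    votes = Counter(r["value"] for r in pool)                          # majority rule
    best = min(pool, key=lambda r: (SOURCE_PRIORITY.get(r["source"], 99),
                                    -votes[r["value"]],
                                    -r["updated"].toordinal()))
    return {"value": best["value"], "source": best["source"], "updated": best["updated"]}

records = [
    {"source": "civil_affairs", "value": "married", "updated": date(2015, 1, 1)},  # stale
    {"source": "police", "value": "divorced", "updated": date(2017, 6, 1)},
    {"source": "community", "value": None, "updated": date(2017, 8, 1)},           # empty
]
merged = merge_attribute(records, date(2017, 9, 29))
```

In this example the civil affairs record is the most authoritative but is older than the assumed window, so the fresher police record is chosen; the returned source and update time support later tracing.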
In the present invention, the conformable layer further includes a rule verification module. The rule verification module is connected to both the rule establishing module and the first processing module. The rule verification module is used to verify the integration rules established by the rule establishing module. If verification passes, the integration rules are sent to the first processing module. If verification fails, verification-failure information is sent to the rule establishing module so that the rule establishing module re-establishes the integration rules.
When the items of the second data are merged, every integration rule is verified with enough samples to determine the accuracy of the data integration.
Only after verification with a certain number of samples can a given rule be determined to be the most effective; determining each rule therefore requires a large amount of data verification.
The sample data can be part of the second data. Whether an integration rule can serve as the optimal integration rule can be judged from the probability that the sample data conforms to the rule.
If the coincidence rate of the sample data is lower than a preset threshold, the accuracy of the integration rule is not high, the rule cannot serve as the optimal integration rule, and verification fails. If the coincidence rate of the sample data is not lower than the preset threshold, the accuracy of the integration rule is high, the rule can serve as the optimal integration rule, and verification passes.
The integration rules need to be verified repeatedly; only the rule with the highest accuracy among the candidate rules can serve as the optimal rule.
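The threshold check can be sketched as follows. The sketch assumes that each sample carries a known correct value to compare against, which the patent does not state explicitly; the function names and the two toy rules are hypothetical.

```python
def coincidence_rate(rule, samples):
    """Fraction of samples for which the rule reproduces the known correct value."""
    if not samples:
        raise ValueError("at least one sample is required")
    hits = sum(1 for s in samples if rule(s["records"]) == s["expected"])
    return hits / len(samples)

def verify(rule, samples, threshold):
    """A rule passes when its coincidence rate is not below the threshold."""
    return coincidence_rate(rule, samples) >= threshold

def pick_best(rules, samples, threshold):
    """Among the rules that pass, return the name of the most accurate one, else None."""
    scored = {name: coincidence_rate(rule, samples) for name, rule in rules.items()}
    passing = {name: rate for name, rate in scored.items() if rate >= threshold}
    return max(passing, key=passing.get) if passing else None

# Two toy candidate rules: take the first reported value, or take the last one.
rules = {"first": lambda recs: recs[0], "last": lambda recs: recs[-1]}
samples = [
    {"records": ["a", "b"], "expected": "b"},
    {"records": ["c", "d"], "expected": "d"},
    {"records": ["e", "e"], "expected": "e"},
]
best_rule = pick_best(rules, samples, threshold=0.8)
```

A rule that fails the check would be returned to the rule establishing module for revision, as described above.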
In the present invention, merging the second data to obtain the third data may be a table-to-table process. When data is merged, many labels or statistics can be derived. For a person's telephone number, for example, the following indicators can be derived: the earliest registration time, the most recent registration time, the number of sources in which it has been registered, by whom it has been registered as a contact number, the earliest registration time among its owners, and so on. Optimizing the integration rules makes the derived indicator values accurate.
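The telephone-number example above could be computed roughly as shown below. The record layout (`source`, `registered_on`, `role`, `person`) and the function name `phone_indicators` are assumptions made for illustration; the patent does not specify a schema.

```python
from datetime import date
from typing import Dict, List


def phone_indicators(registrations: List[Dict]) -> Dict:
    """Derive summary indicators for one phone number from its registration records.

    Each record is assumed to hold: 'source' (originating system), 'registered_on'
    (a date), 'role' ('owner' or 'contact'), and 'person' (who registered it).
    """
    if not registrations:
        raise ValueError("no registrations for this phone number")
    dates = [r["registered_on"] for r in registrations]
    owner_dates = [r["registered_on"] for r in registrations if r["role"] == "owner"]
    return {
        "earliest_registration": min(dates),
        "latest_registration": max(dates),
        "source_count": len({r["source"] for r in registrations}),
        "registered_as_contact_by": sorted(
            {r["person"] for r in registrations if r["role"] == "contact"}
        ),
        "earliest_owner_registration": min(owner_dates) if owner_dates else None,
    }
```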
After integration, the data volume can usually be kept within the order of millions of records, so the second storage module does not need to store the third data in partitions. When merging, however, the second processing module needs to add to each attribute its source and the business time of its latest update. Besides making tracing easier, this is needed because the merging of an attribute is generally decided by the priority of the data source and by the update time; with these fields in place, incremental data can be merged by the same rules.
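A minimal sketch of attribute-level merging driven by source priority and update time is given below. The priority table, the source names, and the tie-breaking order (non-null first, then source priority, then the newer business time) are assumptions chosen for illustration; the patent only states that source and business time are kept per attribute and that the same rules apply to incremental data.

```python
from datetime import datetime
from typing import Any, Dict, NamedTuple, Optional


class AttrValue(NamedTuple):
    value: Any
    source: str               # where the value came from (kept for tracing)
    business_time: datetime   # business time of the update


# Hypothetical priority table: a lower number means a more trusted source.
SOURCE_PRIORITY = {"civil_registry": 0, "social_security": 1, "web_form": 2}


def merge_attribute(current: Optional[AttrValue], incoming: AttrValue) -> AttrValue:
    """Choose between the stored and the incoming value of one attribute."""
    if incoming.value is None:                      # a null never overwrites a value
        return current if current is not None else incoming
    if current is None or current.value is None:    # non-null beats missing or null
        return incoming
    cur_rank = SOURCE_PRIORITY.get(current.source, len(SOURCE_PRIORITY))
    new_rank = SOURCE_PRIORITY.get(incoming.source, len(SOURCE_PRIORITY))
    if new_rank != cur_rank:                        # the more trusted source wins
        return incoming if new_rank < cur_rank else current
    # Same priority: the more recent business time wins.
    return incoming if incoming.business_time > current.business_time else current


def merge_record(current: Dict[str, AttrValue], increment: Dict[str, AttrValue]) -> Dict[str, AttrValue]:
    """Merge an incremental record into the stored record, attribute by attribute."""
    merged = dict(current)
    for name, incoming in increment.items():
        merged[name] = merge_attribute(merged.get(name), incoming)
    return merged
```

Because each stored attribute carries its own source and business time, a later increment can be merged with `merge_record` without re-reading the full history.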
In the present invention, the profile-type data in the third data generated by integration may also be overwritten and updated at preset intervals.
In the present invention, the second processing module may also integrate the incremental data of the second data according to the preset integration rules to generate the incremental data of the third data.
In the present invention, the integration layer may store the third data in the big data store, and may also synchronize the incremental data of the third data to the big data store. Data snapshots may be synchronized to the big data store at a fixed cycle. For example, at the end of each month a historical snapshot of the third data is dumped into the big data store, and each day the incrementally updated data is synchronized to the big data store, so that the data can be traced later.
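One way to read the monthly snapshot plus daily increment scheme is that any past state can be rebuilt by replaying increments on top of the nearest earlier snapshot. The sketch below assumes in-memory dictionaries keyed by date and by record key; the function name `state_as_of` and this data layout are illustrative assumptions, not the patent's storage format.

```python
from datetime import date
from typing import Dict


def state_as_of(
    snapshots: Dict[date, Dict[str, object]],
    increments: Dict[date, Dict[str, object]],
    as_of: date,
) -> Dict[str, object]:
    """Rebuild the data as it stood at the end of `as_of`.

    Start from the latest full snapshot taken on or before `as_of`, then
    replay each later daily increment up to and including `as_of`.
    """
    eligible = [day for day in snapshots if day <= as_of]
    if not eligible:
        raise LookupError("no snapshot on or before the requested date")
    base_day = max(eligible)
    state = dict(snapshots[base_day])
    for day in sorted(d for d in increments if base_day < d <= as_of):
        state.update(increments[day])   # later values overwrite earlier ones
    return state
```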
In a specific embodiment provided by the invention, the system further includes a data mart layer. The data mart layer is connected to the integration layer. The data mart layer obtains the third data from the integration layer and, by joining, splices the fragment tables in the third data to generate base wide tables.
The data mart layer may include a fifth processing module and a fifth storage module; the fifth processing module is connected to the fifth storage module.
The fifth processing module splices the fragment tables in the third data by joining them to generate the base wide tables. The fifth storage module stores the base wide tables and the third data.
Generating base wide tables provides the wide tables that the various applications need, which reduces repeated computation in the applications.
For example, the statistical-analysis wide table gathers the basic attributes that statistical analyses require, such as gender, age, nationality, place of origin, political affiliation, marital status, household registration, residential area, working area, status of the five social insurances, most recent social security payment time, housing provident fund status, and most recent provident fund payment time, into more than 50 personal attributes, labels, or statistical indicators. Combining these dimensions and indicators supports a wide variety of statistical analysis applications. The base wide table for data mining may have more than 150 fields.
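The splicing of fragment tables into a base wide table can be pictured as a left join on a shared key. The sketch below uses plain lists of dictionaries; the function name `build_wide_table`, the choice of the first fragment as the driving table, and the column names in the usage note are all assumptions for illustration.

```python
from typing import Dict, List


def build_wide_table(key: str, *fragments: List[Dict]) -> List[Dict]:
    """Left-join fragment tables on `key`; the first fragment drives the result.

    Rows of later fragments whose key is absent from the first fragment are
    ignored, and columns missing for a given key are filled with None.
    """
    if not fragments:
        return []
    wide = {row[key]: dict(row) for row in fragments[0]}
    columns: List[str] = []
    for fragment in fragments:
        for row in fragment:
            for column in row:
                if column not in columns:
                    columns.append(column)
    for fragment in fragments[1:]:
        for row in fragment:
            target = wide.get(row[key])
            if target is None:
                continue
            for column, value in row.items():
                if column != key:
                    target.setdefault(column, value)   # first value seen is kept
    return [{column: row.get(column) for column in columns} for row in wide.values()]
```

For instance, joining a basic-attributes fragment with a social-security fragment and a provident-fund fragment on a person identifier would yield one row per person carrying all three groups of columns.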
In the present invention, the data mart layer may be divided into multiple mart partitions to make the data easier to use.
For example, the data mart layer may have a license mart, a livelihood services mart, an economy and industry mart, a statistical analysis mart, a mining analysis mart, and so on.
When the fifth storage module of the data mart layer stores data, it keeps the third data in partitions by time period, and this data is inserted in append mode. The fifth storage module may also store basic profile-type data, which may be stored by overwrite update, consistent with the storage method of the integration area.
In a specific embodiment provided by the invention, the system further includes an application layer. The application layer is connected to the source-aligned layer, the atomic layer, the integration layer, and the data mart layer. The application layer periodically synchronizes the data in the source-aligned layer, the atomic layer, the integration layer, and the data mart layer.
In the present invention, the application layer may obtain data from any of the source-aligned layer, the atomic layer, the integration layer, and the data mart layer, but most data should come from the data mart layer. The application layer may provide data support to external applications through database service interfaces.
In addition, the data of the application layer is also archived in the big data store as a historical archive.
In a specific embodiment provided by the invention, the system further includes a big data store. The big data store is connected to the application layer, the data mart layer, the atomic layer, the source-aligned layer, and the integration layer. The big data store periodically synchronizes and stores the data in the application layer, the data mart layer, the atomic layer, the source-aligned layer, and the integration layer.
In the present invention, the big data store may include a historical database, an unstructured database, a graph database, a log store, and so on, all of which fall within the scope of protection of the present invention.
The application layer may provide data support to external applications through database service interfaces. When using the data, a user may obtain it from the application layer or from the big data store; both fall within the scope of protection of the present invention.
In the application layer, statistical indicators are saved per period so that the statistics have a history. Query interfaces for profile-type data usually keep their data by overwrite update.
The big data store may be a Hadoop big data store. Hadoop's scalable computing and storage capacity keeps the historical data "online" at all times.
Transaction-log data is saved as a time-ordered log: data from a certain period (for example, the last 3 years) is kept in the traditional database, while all historical data is dumped into the big data store. Zipper-table (history chain) data is formed from the history, and all historical data is dumped into the Hadoop big data store.
Profile-type data is turned into zipper-table data, and the resulting change-history data is all saved in the big data store and kept synchronized using a monthly full load plus daily increments. In this way the data can be traced back to any point in time before the previous day.
Preserving this historical data ensures that the data is stored safely and also makes it possible to analyze changes in the historical data.
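The zipper-table (history chain) idea mentioned above is commonly implemented by giving each version of a record a validity interval, closing the open interval when the value changes. The sketch below is one such arrangement under stated assumptions: half-open intervals `[start, end)`, a sentinel end date for the current version, and the hypothetical names `apply_change` and `value_as_of`. The patent does not prescribe this layout.

```python
from datetime import date
from typing import Dict, List, Optional

OPEN_END = date(9999, 12, 31)   # sentinel marking the currently valid version


def apply_change(history: List[Dict], key: str, value: object, effective: date) -> List[Dict]:
    """Record a new value for `key`, closing the previously open version.

    Nothing is added when the value is unchanged.
    """
    for row in history:
        if row["key"] == key and row["end"] == OPEN_END:
            if row["value"] == value:
                return history
            row["end"] = effective          # close the old version
    history.append({"key": key, "value": value, "start": effective, "end": OPEN_END})
    return history


def value_as_of(history: List[Dict], key: str, day: date) -> Optional[object]:
    """Return the value that was valid for `key` on `day`, or None if unknown."""
    for row in history:
        if row["key"] == key and row["start"] <= day < row["end"]:
            return row["value"]
    return None
```

Looking up a past state is then a range filter on the validity interval, which is what allows the data to be traced back to an arbitrary earlier date.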
Because future applications are unpredictable, application-oriented data may come from any data area, or may even be fetched from the big data store, in order to support an application. Some personalized applications may therefore take data from other areas, but most application data needs should be met directly from the data mart area.
The present invention may also include unstructured data sources.
The unstructured data sources are connected to the big data store.
The big data store may obtain unstructured data in real time and store it in the unstructured database within the big data store.
Unstructured data may be transferred directly into the unstructured database by generating key-value pairs.
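How the keys are generated is not stated in the text. As one possible illustration, the sketch below derives the key from the source name and a SHA-256 digest of the content and uses an in-memory dictionary as a stand-in for the unstructured database; `to_key_value` and `ingest` are hypothetical names.

```python
import hashlib
from typing import Dict, Tuple


def to_key_value(payload: bytes, source: str) -> Tuple[str, bytes]:
    """Build a key-value pair for one unstructured item.

    Assumption: the key is '<source>:<sha256 of content>', so identical
    content arriving from the same source maps to the same key.
    """
    digest = hashlib.sha256(payload).hexdigest()
    return f"{source}:{digest}", payload


def ingest(store: Dict[str, bytes], payload: bytes, source: str) -> str:
    """Write the item into the key-value store and return its key."""
    key, value = to_key_value(payload, source)
    store[key] = value
    return key
```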
To sum up, the technical solution of the present invention has the following beneficial effects:
High scalability: Because the atomic layer strictly follows third normal form, new requirements added to the system do not require changes to the existing architecture; extending the existing model is enough to meet them. In addition, because the data warehouse is built by combining Hadoop with a traditional database, when unstructured data enters the warehouse, or when the accumulated data exceeds what a traditional data warehouse can support, Hadoop's high scalability can be used to expand the storage and computing capacity of the data warehouse.
Openness: The present invention is compatible with structured, semi-structured, unstructured, graph, and log data formats, and with heterogeneous data sources such as Oracle, MySQL, SQL Server, Access, DB2, PostgreSQL, Teradata, and local files. It provides data services to the outside; external service platforms do not need to know which storage medium holds the data internally, because a unified interface is provided in every case.
Security: A sound backup mechanism minimizes the risk of data loss.
Ease of maintenance: All data processing rules are configured in a rule table, and there is only one rule-parsing program. The rules (each rule is required to carry a business description) and the data processing engine are fully decoupled. When a data rule changes, modifying that one rule is enough to update every point it affects, which greatly improves the maintainability of the system.
Easy traceability: From the atomic layer through the integration layer and the data mart layer, the source of the data is recorded at a granularity as fine as each cell and as coarse as each record. Combined with the data processing rules, this makes it possible to trace from application-layer data all the way back to the source of the data and to the rule transformations it passed through during intermediate processing.
Integrity: When source data enters the data warehouse it goes directly into the source-aligned layer, so no information is lost. Historical data older than a certain time window (for example, more than 1 year) can be transferred to the Hadoop platform (running online, not an offline tape library) for full-lifecycle storage, computation, and management, which ensures the integrity of the data.
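The decoupling of rules from the processing engine described under ease of maintenance can be sketched as a rule table interpreted by a single generic engine. The rule fields (`rule_id`, `field`, `op`, `args`, `description`), the three operations, and the function name `apply_rules` are assumptions for illustration; the patent does not give the rule table's schema.

```python
from typing import Callable, Dict, List

# Hypothetical rule table: in the system described this would be a database table.
RULE_TABLE: List[Dict] = [
    {"rule_id": "R1", "field": "name", "op": "strip",
     "description": "Remove surrounding whitespace from names"},
    {"rule_id": "R2", "field": "gender", "op": "map", "args": {"M": "male", "F": "female"},
     "description": "Normalize gender codes"},
    {"rule_id": "R3", "field": "phone", "op": "digits_only",
     "description": "Keep only the digits of a phone number"},
]

# The engine knows a fixed set of generic operations and nothing about business fields.
OPERATIONS: Dict[str, Callable] = {
    "strip": lambda value, args: value.strip(),
    "map": lambda value, args: args.get(value, value),
    "digits_only": lambda value, args: "".join(ch for ch in value if ch.isdigit()),
}


def apply_rules(record: Dict, rule_table: List[Dict]) -> Dict:
    """Apply every rule in the table to a copy of the record and return the copy."""
    result = dict(record)
    for rule in rule_table:
        field = rule["field"]
        if result.get(field) is None:       # skip missing or null fields
            continue
        operation = OPERATIONS[rule["op"]]
        result[field] = operation(result[field], rule.get("args", {}))
    return result
```

Changing a business rule then means editing one row of the table; `apply_rules` itself stays untouched.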
In the description of this specification, a description that refers to the terms "one embodiment", "some embodiments", "an example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, such terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. In addition, provided they do not conflict, those skilled in the art may combine the different embodiments or examples described in this specification and their features.
It should be noted that the flowcharts and block diagrams in the drawings of the present invention show the architecture, functions, and operations of possible implementations of systems and computer program products according to embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In the several embodiments provided in this application, it should be understood that the disclosed system may be implemented in other ways. The system embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit.
If the functions are implemented as software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and they shall all be covered by the claims and the specification of the present invention.
Claims (10)
- 1. A data processing system based on a data warehouse, characterized by comprising: an atomic layer and an integration layer; the atomic layer is connected to the integration layer; the atomic layer comprises a first processing module and a first storage module; the integration layer comprises a second processing module and a second storage module; the first storage module is connected to the first processing module; the second storage module is connected to the second processing module; the atomic layer is configured to obtain structured first data; the first processing module is configured to organize and classify the first data according to preset rules to obtain second data; the first storage module is configured to store the organized and classified second data in partitions; the atomic layer is configured to update the second data to the integration layer; the second processing module is configured to merge the second data according to preset integration rules to generate third data; the second storage module is configured to store the third data.
- 2. The data processing system based on a data warehouse according to claim 1, characterized in that the first storage module is specifically configured to: store the organized and classified data in partitions according to one or more of data source, data period, business category, and relation type.
- 3. The data processing system based on a data warehouse according to claim 1, characterized in that the integration layer further comprises: a rule establishing module; the rule establishing module is connected to the first processing module; the rule establishing module is configured to establish reasonable integration rules according to one or more of the non-null-first principle, data priority, data timeliness, the majority-rule principle, and common sense, and to send the integration rules to the first processing module.
- 4. The data processing system based on a data warehouse according to claim 3, characterized in that the integration layer further comprises: a rule verification module; the rule verification module is connected to both the rule establishing module and the first processing module; the rule verification module is configured to verify the integration rules established by the rule establishing module; if verification passes, the integration rules are sent to the first processing module; if verification fails, a verification-failure message is sent to the rule establishing module so that the rule establishing module re-establishes the integration rules.
- 5. The data processing system based on a data warehouse according to claim 1, characterized in that the system further comprises: a source-aligned layer; the source-aligned layer is connected to the atomic layer; the source-aligned layer is configured to standardize buffered data to obtain the first data, and to archive the history of the buffered data and the first data; the source-aligned layer is further configured to update the first data to the atomic layer.
- 6. The data processing system based on a data warehouse according to claim 5, characterized in that the system further comprises: a buffer layer; the buffer layer is connected to the source-aligned layer; the buffer layer is configured to cache structured source data from different sources in source databases to generate the buffered data; the buffer layer is further configured to update the buffered data to the source-aligned layer.
- 7. The data processing system based on a data warehouse according to claim 6, characterized in that the system further comprises: a data mart layer; the data mart layer is connected to the integration layer; the data mart layer is configured to obtain the third data from the integration layer and, by joining, to splice the fragment tables in the third data to generate base wide tables.
- 8. The data processing system based on a data warehouse according to claim 7, characterized in that the system further comprises: an application layer; the application layer is connected to the source-aligned layer, the atomic layer, the integration layer, and the data mart layer; the application layer is configured to periodically synchronize the data in the source-aligned layer, the atomic layer, the integration layer, and the data mart layer.
- 9. The data processing system based on a data warehouse according to claim 8, characterized in that the system further comprises: a big data store; the big data store is connected to the application layer, the data mart layer, the atomic layer, the source-aligned layer, and the integration layer; the big data store is configured to periodically synchronize and store the data in the application layer, the data mart layer, the atomic layer, the source-aligned layer, and the integration layer.
- 10. The data processing system based on a data warehouse according to claim 9, characterized in that the big data store is a Hadoop big data store.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710919091.2A CN107729448A (en) | 2017-09-30 | 2017-09-30 | A kind of data handling system based on data warehouse |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710919091.2A CN107729448A (en) | 2017-09-30 | 2017-09-30 | A kind of data handling system based on data warehouse |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107729448A true CN107729448A (en) | 2018-02-23 |
Family
ID=61208516
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710919091.2A Pending CN107729448A (en) | 2017-09-30 | 2017-09-30 | A kind of data handling system based on data warehouse |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107729448A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108829830A (en) * | 2018-06-15 | 2018-11-16 | 四川众之金科技有限公司 | Data processing method and device |
CN110309108A (en) * | 2019-05-08 | 2019-10-08 | 江苏满运软件科技有限公司 | Data acquisition and storage method, device, electronic equipment, storage medium |
CN110990390A (en) * | 2019-12-02 | 2020-04-10 | 东莞中国科学院云计算产业技术创新与育成中心 | Data cooperative processing method and device, computer equipment and storage medium |
CN111125069A (en) * | 2019-11-13 | 2020-05-08 | 深圳市华傲数据技术有限公司 | Data cleaning and fusing system |
CN111337727A (en) * | 2020-03-05 | 2020-06-26 | 山东泰开互感器有限公司 | Current transformer and cloud computing-based current transformer information interaction system |
CN113377872A (en) * | 2021-06-25 | 2021-09-10 | 北京红山信息科技研究院有限公司 | Offline synchronization method, device and equipment of online system data in big data center |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101197876A (en) * | 2006-12-06 | 2008-06-11 | 中兴通讯股份有限公司 | Method and system for multi-dimensional analysis of message service data |
US20130132383A1 (en) * | 2007-05-29 | 2013-05-23 | Christopher Ahlberg | Information service for relationships between facts extracted from differing sources on a wide area network |
CN105608203A (en) * | 2015-12-24 | 2016-05-25 | Tcl集团股份有限公司 | Internet of things log processing method and device based on Hadoop platform |
CN106294521A (en) * | 2015-06-12 | 2017-01-04 | 交通银行股份有限公司 | Date storage method and data warehouse |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101197876A (en) * | 2006-12-06 | 2008-06-11 | 中兴通讯股份有限公司 | Method and system for multi-dimensional analysis of message service data |
US20130132383A1 (en) * | 2007-05-29 | 2013-05-23 | Christopher Ahlberg | Information service for relationships between facts extracted from differing sources on a wide area network |
CN106294521A (en) * | 2015-06-12 | 2017-01-04 | 交通银行股份有限公司 | Date storage method and data warehouse |
CN105608203A (en) * | 2015-12-24 | 2016-05-25 | Tcl集团股份有限公司 | Internet of things log processing method and device based on Hadoop platform |
Non-Patent Citations (1)
Title |
---|
XUZHENGZHU: "Oracle & BI & 大数据分析", 《HTTPS://WWW.CNBLOGS.COM/HONDAHSU/P/5314176.HTML》 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108829830A (en) * | 2018-06-15 | 2018-11-16 | 四川众之金科技有限公司 | Data processing method and device |
CN110309108A (en) * | 2019-05-08 | 2019-10-08 | 江苏满运软件科技有限公司 | Data acquisition and storage method, device, electronic equipment, storage medium |
CN111125069A (en) * | 2019-11-13 | 2020-05-08 | 深圳市华傲数据技术有限公司 | Data cleaning and fusing system |
CN111125069B (en) * | 2019-11-13 | 2023-04-28 | 深圳市华傲数据技术有限公司 | Data cleaning fusion system |
CN110990390A (en) * | 2019-12-02 | 2020-04-10 | 东莞中国科学院云计算产业技术创新与育成中心 | Data cooperative processing method and device, computer equipment and storage medium |
CN110990390B (en) * | 2019-12-02 | 2024-03-08 | 东莞中国科学院云计算产业技术创新与育成中心 | Data cooperative processing method, device, computer equipment and storage medium |
CN111337727A (en) * | 2020-03-05 | 2020-06-26 | 山东泰开互感器有限公司 | Current transformer and cloud computing-based current transformer information interaction system |
CN113377872A (en) * | 2021-06-25 | 2021-09-10 | 北京红山信息科技研究院有限公司 | Offline synchronization method, device and equipment of online system data in big data center |
CN113377872B (en) * | 2021-06-25 | 2024-02-27 | 北京红山信息科技研究院有限公司 | Offline synchronization method, device and equipment of online system data in big data center |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107657049A (en) | A kind of data processing method based on data warehouse | |
CN107729448A (en) | A kind of data handling system based on data warehouse | |
CN107704590A (en) | A kind of data processing method and system based on data warehouse | |
CN112241924B (en) | Wisdom gas system | |
US5745755A (en) | Method for creating and maintaining a database for a dynamic enterprise | |
CN110347719A (en) | A kind of enterprise's foreign trade method for prewarning risk and system based on big data | |
US8626703B2 (en) | Enterprise resource planning (ERP) system change data capture | |
CN106294521A (en) | Date storage method and data warehouse | |
Mađer et al. | Analysis of possibilities for linking land registers and other official registers in the Republic of Croatia based on LADM | |
CN102663008B (en) | Government integrated business platform business library and construction method of base library | |
Josélyne et al. | Partitioning microservices: A domain engineering approach | |
US11119989B1 (en) | Data aggregation with schema enforcement | |
CN111382956A (en) | Enterprise group relationship mining method and device | |
CN110109908B (en) | Analysis system and method for mining potential relationship of person based on social basic information | |
CN108959560A (en) | Information processing method, device and electronic equipment based on tables of data | |
CN107945014A (en) | One kind is based on LAOP platform small amount personal loan systems | |
CN111737335B (en) | Product information integration processing method and device, computer equipment and storage medium | |
CN110457333A (en) | Data real time updating method, device and computer readable storage medium | |
CN112506892A (en) | Index traceability management system based on metadata technology | |
CN107491558A (en) | Metadata updates method and device | |
CN114240333A (en) | Holographic application center system for electronic accounting archives | |
CN113688396A (en) | Automobile information safety risk assessment automation system | |
CN107506155A (en) | Date storage method and device based on block number evidence | |
CN110019237B (en) | System and method for analyzing criminal whereabouts based on map | |
Richter | In-tensions to infrastructure: developing digital property databases in urban Karnataka, India |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180223 |