CN107729448A - A data processing system based on a data warehouse - Google Patents

A data processing system based on a data warehouse

Info

Publication number
CN107729448A
Authority
CN
China
Prior art keywords
data
layer
module
processing module
conformable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710919091.2A
Other languages
Chinese (zh)
Inventor
黎仁全
唐明辉
李邱林
贾西贝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huaao Data Technology Co Ltd
Original Assignee
Shenzhen Huaao Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huaao Data Technology Co Ltd
Priority to CN201710919091.2A
Publication of CN107729448A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24552Database cache management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Abstract

The present invention provides a data processing system based on a data warehouse. The system comprises an atomic layer and an integration layer, which are connected to each other. The atomic layer comprises a first processing module and a first storage module; the integration layer comprises a second processing module and a second storage module. The first storage module is connected to the first processing module, and the second storage module is connected to the second processing module. The atomic layer obtains structured first data; the first processing module organizes and classifies the first data according to preset rules to obtain second data; the first storage module stores the organized and classified second data in partitions; and the atomic layer updates the second data to the integration layer. The second processing module merges the second data according to preset integration rules to generate third data, and the second storage module stores the third data.

Description

A data processing system based on a data warehouse
Technical field
The present invention relates to the technical field of data processing, and in particular to a data processing method based on a data warehouse.
Background
With the company's expansion plans and deployments in big-data city projects, the big-data city data warehouse project will be the cornerstone project of every city; it is the multiplier and booster of all other data projects.
In the prior art, data warehouses often suffer from data redundancy, missing data, and similar problems, which lead to inconsistent data and increase maintenance costs. In addition, the data in a data warehouse often comes from numerous and varied sources and is ambiguous, so the accuracy of the data warehouse is relatively low.
Summary of the invention
In view of the above defects in the prior art, the present invention provides a data processing system based on a data warehouse. It can effectively avoid the data inconsistency caused by data redundancy, missing data, and the like, and reduce maintenance costs; at the same time, it can eliminate the ambiguity of data and thereby increase the accuracy of the data warehouse.
The data processing system based on a data warehouse provided by the present invention comprises: an atomic layer and an integration layer;
The atomic layer is connected to the integration layer;
The atomic layer comprises: a first processing module and a first storage module;
The integration layer comprises: a second processing module and a second storage module;
The first storage module is connected to the first processing module;
The second storage module is connected to the second processing module;
The atomic layer is used to obtain structured first data;
The first processing module is used to organize and classify the first data according to preset rules to obtain second data;
The first storage module is used to store the organized and classified second data in partitions;
The atomic layer is used to update the second data to the integration layer;
The second processing module is used to merge the second data according to preset integration rules to generate third data;
The second storage module is used to store the third data.
Optionally, the first storage module is specifically used to:
Store the organized and classified data in partitions according to one or more of data source, data period, business category, and relationship type.
Optionally, the integration layer further comprises: a rule establishing module;
The rule establishing module is connected to the first processing module;
The rule establishing module is used to establish the integration rules according to one or more of the non-null-first principle for data, the priority of data, the timeliness of data, majority rule, common sense, and reasonableness, and to send the integration rules to the first processing module.
Optionally, the integration layer further comprises: a rule verification module;
The rule verification module is connected to both the rule establishing module and the first processing module;
The rule verification module is used to verify the integration rules established by the rule establishing module; if verification passes, the integration rules are sent to the first processing module;
If verification fails, verification-failure information is sent to the rule establishing module so that the rule establishing module re-establishes the integration rules.
Optionally, the system further comprises: a source-aligned layer;
The source-aligned layer is connected to the atomic layer;
The source-aligned layer is used to standardize the buffered data to obtain the first data, and to perform historical archiving on the buffered data and the first data;
The source-aligned layer is also used to update the first data to the atomic layer.
Optionally, the system further comprises: a buffer layer;
The buffer layer is connected to the source-aligned layer;
The buffer layer is used to cache structured source data of different origins from the source databases to generate the buffered data;
The buffer layer is also used to update the buffered data to the source-aligned layer.
Optionally, the system further comprises: a data mart layer;
The data mart layer is connected to the integration layer;
The data mart layer is used to obtain the third data from the integration layer and to splice the fragment tables in the third data by association, generating basic wide tables.
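The splicing of fragment tables into a basic wide table can be pictured as a join on a shared entity key. The following is only a minimal sketch under assumptions that are not in the patent: the key name `person_id`, the fragment names, and the convention of prefixing each column with its fragment name are all hypothetical.

```python
from collections import defaultdict


def build_wide_table(fragment_tables, key="person_id"):
    """Join fragment tables on a shared key into one wide row per entity.

    fragment_tables maps a (hypothetical) fragment name to a list of row dicts.
    """
    wide = defaultdict(dict)
    for table_name, rows in fragment_tables.items():
        for row in rows:
            entity = wide[row[key]]
            entity[key] = row[key]
            for column, value in row.items():
                if column != key:
                    # Prefix with the fragment name so columns cannot collide.
                    entity[f"{table_name}_{column}"] = value
    return dict(wide)


# Hypothetical fragments of the third data for one person.
fragments = {
    "basic": [{"person_id": 1, "name": "A"}],
    "marriage": [{"person_id": 1, "status": "married"}],
}
wide_rows = build_wide_table(fragments)
```

Here each entity ends up with a single row that carries the columns of every fragment, which is the property a wide table is meant to provide to downstream consumers.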
Optionally, the system further comprises: an application layer;
The application layer is connected to each of the source-aligned layer, the atomic layer, the integration layer, and the data mart layer;
The application layer is used to periodically synchronize the data in the source-aligned layer, the atomic layer, the integration layer, and the data mart layer.
Optionally, the system further comprises: a big database;
The big database is connected to each of the application layer, the data mart layer, the atomic layer, the source-aligned layer, and the integration layer;
The big database is used to periodically synchronize and store the data in the application layer, the data mart layer, the atomic layer, the source-aligned layer, and the integration layer.
Optionally, the big database is a Hadoop big database.
As can be seen from the above technical solution, the present invention provides a data processing system based on a data warehouse, comprising an atomic layer and an integration layer. The atomic layer is connected to the integration layer. The atomic layer comprises a first processing module and a first storage module; the integration layer comprises a second processing module and a second storage module. The first storage module is connected to the first processing module, and the second storage module is connected to the second processing module. The atomic layer is used to obtain structured first data; the first processing module is used to organize and classify the first data according to preset rules to obtain second data; the first storage module is used to store the organized and classified second data in partitions; the atomic layer is used to update the second data to the integration layer; the second processing module is used to merge the second data according to preset integration rules to generate third data; and the second storage module is used to store the third data.
By using the atomic layer to organize and classify data, the present invention can effectively avoid the data inconsistency caused by data redundancy, missing data, and the like, and reduce maintenance costs. Storing the organized and classified data in partitions makes it easier to trace information to its source and to manage it. At the same time, by using the integration layer to merge the second data, multi-source data can be unified into unique information, which eliminates the ambiguity of the data and ensures that the data is unique and accurate.
Brief description of the drawings
In order to illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. In all drawings, similar elements or parts are generally identified by similar reference numerals. In the drawings, elements or parts are not necessarily drawn to actual scale.
Fig. 1 shows a schematic diagram of a data processing system based on a data warehouse provided by an embodiment of the present invention;
Fig. 2 shows a data flow diagram of a data processing system based on a data warehouse provided by an embodiment of the present invention;
Fig. 3 shows a data architecture diagram of a data processing system based on a data warehouse provided by an embodiment of the present invention.
Detailed description
Embodiments of the technical solution of the present invention are described in detail below with reference to the drawings. The following embodiments are only used to illustrate the technical solution of the present invention more clearly; they are therefore examples only and cannot be used to limit the scope of protection of the present invention.
It should be noted that, unless otherwise stated, technical or scientific terms used in this application have the ordinary meaning understood by those of ordinary skill in the art to which the present invention belongs.
The present invention provides a data processing system based on a data warehouse. Embodiments of the present invention are described below with reference to the drawings.
Fig. 1 shows a schematic diagram of a data processing system based on a data warehouse provided by an embodiment of the present invention; Fig. 2 shows a data flow diagram of the system; Fig. 3 shows a data architecture diagram of the system. As shown in Fig. 1, the data processing system based on a data warehouse provided by an embodiment of the present invention comprises: an atomic layer and an integration layer;
The atomic layer is connected to the integration layer. The atomic layer comprises a first processing module and a first storage module; the integration layer comprises a second processing module and a second storage module. The first storage module is connected to the first processing module, and the second storage module is connected to the second processing module. The atomic layer is used to obtain structured first data; the first processing module is used to organize and classify the first data according to preset rules to obtain second data; the first storage module is used to store the organized and classified second data in partitions; the atomic layer is used to update the second data to the integration layer; the second processing module is used to merge the second data according to preset integration rules to generate third data; and the second storage module is used to store the third data.
By using the atomic layer to organize and classify data, the present invention can effectively avoid the data inconsistency caused by data redundancy, missing data, and the like, and reduce maintenance costs. Storing the organized and classified data in partitions makes it easier to trace information to its source and to manage it. At the same time, by using the integration layer to merge the second data, multi-source data can be unified into unique information, which eliminates the ambiguity of the data and ensures that the data is unique and accurate.
The data warehouse may be divided into the atomic layer and the integration layer.
The first data is the structured data, derived from the raw data, that is then organized and classified. The first data may take structured forms such as data tables or numbers.
Generally, the first data comes from many information sources and has a fine granularity, and the data from each source is not merged in any way; the data of each source remains completely independent. Reflected in the data columns, this means that each data table organizing this information contains relatively few attributes. For example, the information about a person can be split into fragments or stages: basic information, relationship information, contact information, contact address information (including household registration, residence, and work addresses, which can be associated with the address library and the housing library), education information, marriage information, growth and development information, talent-market information (which can be associated with the legal-person library), social-security information, provident fund information, real estate under the person's name, vehicles under the person's name, enterprises under the person's name (which can be associated with the legal-person library), good records, bad records, death information, and so on.
Because the first data usually comes from multiple sources, has a fine granularity, and is very large in volume, the organized and classified first data needs to be stored in partitions for ease of management.
In a specific embodiment provided by the present invention, the first storage module is specifically used to store the organized and classified data in partitions according to one or more of data source, data period, business category, and relationship type.
For example, a person's basic information table is partitioned by data source. A person's social relationships are of many types, and each type of relationship has multiple sources, so they are stored using a composite partition of relationship type and source (relationship type as the main partition, data source as the sub-partition). Social-security payment records come only from social-security payment information but need to be stored in partitions by time period (for example, monthly), that is, partitioned by data period.
When data is stored in partitions, the data source identifier needs to be stored in a separate column. This makes it easy to trace the data back to its source.
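The three partitioning examples above can be sketched as a function that derives a partition path for each row. This is an illustrative sketch only; the table names, column names (`source`, `relation_type`, `period`), and the in-memory store are assumptions made for the example and do not come from the patent.

```python
def partition_key(table, row):
    """Return the partition path for a row, following the three examples."""
    if table == "basic_info":
        # Partitioned by data source.
        return (row["source"],)
    if table == "social_relation":
        # Composite: relationship type is the main partition, source the sub-partition.
        return (row["relation_type"], row["source"])
    if table == "social_security_payment":
        # Partitioned by data period, for example a month such as "2017-09".
        return (row["period"],)
    raise ValueError(f"no partition scheme defined for table {table!r}")


def store(partitions, table, row):
    """Append a row to its partition; the source stays in its own column."""
    partitions.setdefault((table,) + partition_key(table, row), []).append(row)


partitions = {}
store(partitions, "social_relation",
      {"person_id": 1, "relation_type": "spouse", "source": "civil_affairs"})
store(partitions, "social_security_payment",
      {"person_id": 1, "period": "2017-09", "source": "social_security"})
```

Because every stored row keeps its `source` column, a record can be traced to its origin even in tables that are not partitioned by source.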
For example, the atomic layer stores the first data in the following partitions: basic information, education information, marriage information, work information, relationship information, contact information, residence information, enterprise information, certificate information, and so on.
In the present invention, the atomic layer is also used to obtain incremental data of the first data in real time; the first processing module is also used to organize and classify the incremental data according to the preset rules and to update the second data in the first storage module.
In the present invention, the atomic layer can periodically synchronize snapshots of the incremental data to the big database according to a time period, and can likewise periodically synchronize snapshots of the updated second data to the big database, to facilitate later lookup.
For example, at the end of each month a historical snapshot of the second data is dumped to the big database, and the incrementally updated data is synchronized to the big database every day.
After obtaining the incremental data of the first data, the first processing module likewise organizes and classifies the incremental data according to the preset rules and stores the classified data in partitions, thereby updating the second data.
In the present invention, the first storage module is also used to store the data-class data of the second data, and to update that data-class data by overwriting, which improves the reliability of the data-class data.
For example, the data-class data of the second data is updated monthly or weekly.
In the present invention, the atomic layer can also synchronize the data-class data of the second data to the big database, so that the data is updated in a timely manner.
In a specific embodiment provided by the present invention, the system further comprises a source-aligned layer, which is connected to the atomic layer. The source-aligned layer is used to standardize the buffered data to obtain the first data, and to perform historical archiving on the buffered data and the first data. The source-aligned layer is also used to update the first data to the atomic layer.
In a specific embodiment provided by the present invention, the system further comprises a buffer layer, which is connected to the source-aligned layer. The buffer layer is used to cache structured source data of different origins from the source databases to generate the buffered data. The buffer layer is also used to update the buffered data to the source-aligned layer.
In the present invention, the buffer layer is the data entry point of the block database; it can obtain structured source data of different origins from the source systems and cache it.
The buffer layer may comprise a third processing module and a third storage module, which are connected to each other.
The third storage module can cache the source data from the source systems, so that the source-aligned layer obtains data directly from the buffer layer. This avoids repeated extraction from a source system, which would cause it unnecessary trouble, when back-end data processing fails and has to be re-executed. It also prevents the situation in which, on a second extraction, the snapshot of the data at that time can no longer be found because the source system has been updated.
The third processing module can add a timestamp to the source data so that the source-aligned layer can re-extract data directly by timestamp, which makes extraction by the source-aligned layer convenient. The timestamp also lets the source data be recorded by time in the buffer layer, and it is used to identify incremental data when the source-aligned layer extracts data. The data table model of the buffer layer is therefore fully consistent with the source system: the third processing module makes no additional modification to the source tables and only adds a data load time marker (SYS_UPDATE_TIME can be used), which identifies incremental data when the source-aligned layer extracts data.
For example, the third processing module of the buffer layer does not modify the data; it only adds a data insertion time column to mark when the data was generated, uses this time to store the cached data in circular partitions, and uses the column as the incremental extraction field.
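The load-time marker and its use for incremental extraction can be sketched as follows. Only the column name `SYS_UPDATE_TIME` comes from the text; the function names and the in-memory row format are hypothetical, and a real deployment would express the same idea as a database column and a filtered query.

```python
from datetime import datetime


def load_into_buffer(source_rows, load_time):
    """Copy source rows unchanged, adding only the load-time column."""
    return [{**row, "SYS_UPDATE_TIME": load_time} for row in source_rows]


def extract_increment(buffer_rows, last_extracted):
    """Return the rows loaded after the previous extraction point."""
    return [row for row in buffer_rows if row["SYS_UPDATE_TIME"] > last_extracted]


buffer_rows = (
    load_into_buffer([{"id": 1, "name": "A"}], datetime(2017, 9, 25, 1, 0))
    + load_into_buffer([{"id": 2, "name": "B"}], datetime(2017, 9, 26, 1, 0))
)
# Only the batch loaded after the last extraction is picked up.
increment = extract_increment(buffer_rows, datetime(2017, 9, 25, 12, 0))
```

Since the source columns are copied as they are, the buffer table stays structurally identical to the source table apart from the one added column.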
When adding the timestamp to the source data, the third processing module fills the added incremental time field with the system time as the data generation time. Data can be stored in append mode, with circular partitions configured. If 7 days of data are retained, a week forms one cycle of partitions: if today is Monday, last Monday's data is overwritten, and so on, so that each weekday overwrites the data of the same weekday in the previous week.
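One way to picture the circular partitioning is to map each load date to a slot by taking the day number modulo the retention length, so that dates exactly one cycle apart share a slot. This is a sketch under assumptions: the slot formula and the dictionary layout are illustrative choices, not the patent's implementation.

```python
from datetime import date, timedelta

RETENTION_DAYS = 7  # assumed retention, matching the 7-day example in the text


def partition_slot(load_date, retention_days=RETENTION_DAYS):
    """Map a load date to a slot; dates one full cycle apart share a slot."""
    return load_date.toordinal() % retention_days


def write_daily_batch(partitions, load_date, rows, retention_days=RETENTION_DAYS):
    """Append rows to the slot for load_date, first clearing any older cycle's data."""
    slot = partition_slot(load_date, retention_days)
    held = partitions.get(slot)
    if held is None or held["load_date"] != load_date:
        # The slot still holds data from a previous cycle (or nothing): overwrite it.
        partitions[slot] = {"load_date": load_date, "rows": []}
    partitions[slot]["rows"].extend(rows)
    return slot


partitions = {}
first_day = date(2017, 9, 25)
slot_a = write_daily_batch(partitions, first_day, [{"id": 1}])
slot_b = write_daily_batch(partitions, first_day + timedelta(days=1), [{"id": 2}])
slot_c = write_daily_batch(partitions, first_day + timedelta(days=7), [{"id": 3}])
```

Within a single day the slot is appended to, and a write exactly one cycle later replaces the old contents, so storage never grows beyond the retention window.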
Depending on actual conditions, the buffer layer can store the cached data on different hosts. In particular, where the intranet and extranet are isolated for security, an intermediate host is used to hold the first data, which both ensures the safety of the data and bridges the different networks.
The buffer layer can synchronize structured data from the source systems periodically (or irregularly), forming data that covers a certain period (7 or 30 circular partitions). Generally, data is extracted from a source system by time-based incremental synchronization; of course, source data can also be synchronized to the buffer layer by means of an increment identifier, by parsing database logs, or by extracting the full data set and comparing it.
Once the data in the buffer layer is ready, the source-aligned layer can periodically synchronize data from the buffer layer.
In the present invention, the fourth processing module of the source-aligned layer can standardize the buffered data to generate the first data and store it in the fourth storage module.
The source-aligned layer may comprise a fourth processing module and a fourth storage module, which are connected to each other.
Standardizing the data ensures that the data is expressed according to a single set of standards.
The buffered data and the first data can also be historically archived, which ensures that historical analysis and comparison can be performed on the data at any time.
Standardization can include data cleansing, conversion, code mapping, and so on. The code mapping process requires a large number of data element standards, and many of them may already exist as national, local, or industry standards, such as the codes for gender, marital status, and educational level. These can directly reference the existing standards, and the codes in the source data are then mapped to the standard codes. If there is no standard to reference, a standard needs to be formulated for the data so that a unified standard is available when multi-source data is merged.
Historical archiving of data means storing the data as history zipper data in the fourth storage module. In the present invention, the source-aligned layer can also synchronize the generated first data to the big database and use the big database to preserve all historical versions of the data.
When the fourth processing module standardizes data, the mapped column does not overwrite the original column; instead, a corresponding new column is added to hold the mapped attribute value. The original information is not deleted; the converted field is simply added.
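Code mapping that preserves the original column can be sketched as below. The mapping table, the `_std` column suffix, and the code values are hypothetical placeholders chosen for illustration; they are not taken from the patent or quoted from any specific standard.

```python
# Hypothetical mapping from source-specific gender codes to one standard code set.
GENDER_TO_STANDARD = {"M": "1", "male": "1", "F": "2", "female": "2"}


def standardize_column(row, column, mapping, unknown="0"):
    """Return a copy of row with a new '<column>_std' column holding the mapped code.

    The original column is left untouched, as the text requires.
    """
    mapped = dict(row)
    mapped[f"{column}_std"] = mapping.get(row.get(column), unknown)
    return mapped


source_a = standardize_column({"id": 1, "gender": "M"}, "gender", GENDER_TO_STANDARD)
source_b = standardize_column({"id": 2, "gender": "female"}, "gender", GENDER_TO_STANDARD)
```

Keeping both the raw and the mapped value means a mapping error can be corrected later by re-running the mapping, without re-extracting from the source.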
The fourth processing module can also be used, after standardizing the data, to update the standardized data in real time.
When the data is updated, overwrite updating can be used, retaining only the latest copy of the data.
The data of the source-aligned layer must ensure that the data in the data warehouse has history. It can therefore retain history zipper data over a long data period (usually 3 years). When zipper data is stored, the data is appended and existing data is not overwritten. Standardized data can also be generated through the standardization mapping process and is finally stored in partitions by the update frequency of the data. If data needs to be re-extracted, it is only necessary to first clear the data of the current partition and then load the data to be re-extracted according to the timestamp.
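An append-only history of this kind can be sketched as a list of versions, each stamped with the date from which it applies, so that any past state can be looked up. This is a simplified illustration: the field names are hypothetical, and many zipper-table designs also store an explicit end date per version, which this sketch derives implicitly from the next version instead.

```python
from datetime import date


def append_version(history, key, attrs, effective):
    """Append a new version only when the attributes actually changed."""
    versions = [v for v in history if v["key"] == key]
    current = max(versions, key=lambda v: v["start_date"], default=None)
    if current is None or current["attrs"] != attrs:
        history.append({"key": key, "attrs": attrs, "start_date": effective})
    return history


def as_of(history, key, when):
    """Return the version of key that was valid on the given date, if any."""
    candidates = [v for v in history if v["key"] == key and v["start_date"] <= when]
    return max(candidates, key=lambda v: v["start_date"], default=None)


history = []
append_version(history, 1, {"marital_status": "unmarried"}, date(2015, 1, 1))
append_version(history, 1, {"marital_status": "unmarried"}, date(2016, 1, 1))  # unchanged
append_version(history, 1, {"marital_status": "married"}, date(2017, 5, 1))
```

Unchanged records add no new rows, which is what keeps this form of history far smaller than storing a full snapshot for every load.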
In addition, because the data sources of the block database are wide-ranging and the data standards of the various sources are entirely inconsistent, standardized data can be generated through the standardization mapping process. Standardized data is updated in overwrite mode, not in append mode. Likewise, the mapped columns of the source-aligned layer do not overwrite the original columns; corresponding new columns are added to hold the mapped attribute values.
In the present invention, the source-aligned layer is also connected to the big database; the source-aligned layer can store the first data in the big database, which facilitates the recording and analysis of massive historical data.
Because the history of the source-aligned layer grows as the system runs, the expansion of space not only places high demands on storage but also puts great pressure on traditional structured databases (such as Oracle). The storage of historical data must therefore have a fixed period, such as 1 year or 3 years; otherwise the whole database becomes unbearably bloated. Importing the archived data directly into the big database both ensures data safety and makes the analysis of massive historical data much easier.
Such as:
Data before standardization:
Data after standardization:
In the present invention, the first data of the source-aligned layer can be supplied directly to applications and can also be synchronized to the atomic layer. The first data of the source-aligned layer can therefore be provided both to the application layer and to the atomic layer, and both fall within the scope of protection of the present invention.
In the present invention, the second processing module of the integration layer is used to merge the second data according to preset integration rules to generate the third data; the second storage module is used to store the third data.
By merging the second data, multi-source data can be unified into unique information, which eliminates the ambiguity of the data and ensures that the data is unique and accurate.
For example, a person's marriage information is a definite state, one of unmarried, first marriage, remarried, remarried to a former spouse, divorced, or widowed. For the same person, the second data may contain records of different states from multiple sources, so the data needs to be integrated to obtain unique, accurate data. Only when the state of each attribute is determined can the various application scenarios be supported.
In the present invention, the integration layer further comprises a rule establishing module, which is connected to the first processing module. The rule establishing module is used to establish the integration rules according to one or more of the non-null-first principle for data, the priority of data, the timeliness of data, majority rule, common sense, and reasonableness, and to send the integration rules to the first processing module.
For example, the priority of data can be the authority of the data: for marriage information, the civil affairs department should be the most authoritative source. The timeliness of data mainly refers to the most recent update time of the data: if a person's marriage information comes from the civil affairs department but has not been updated for a year, then although its authority is high, its timeliness is poor and it is not necessarily accurate. Majority rule: if political affiliation (which has no authoritative source) is drawn from 10 sources in total, of which 9 record the person as a member and only one records the person as a party member, the result is likely to follow the 9 sources. Common sense: for educational level, the highest recorded level is usually taken as the person's educational level; marital status can be judged together with age, since, for example, a person under 18 cannot have a marital status of married.
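The following sketch shows how a few of these rules might be combined for a single attribute. The ordering chosen here (drop empty values, then prefer source priority with recency as the tie-breaker, and fall back to majority vote when no priorities are given) is an assumption made for illustration; the patent leaves the combination of rules open, and the source names and priority numbers are hypothetical.

```python
from collections import Counter


def non_null(candidates):
    """Non-null-first: discard candidates that carry no value."""
    return [c for c in candidates if c["value"] not in (None, "")]


def by_priority_then_recency(candidates, priority):
    """Prefer the most authoritative source; break ties by latest update date."""
    best = max(candidates, key=lambda c: (priority.get(c["source"], 0), c["updated"]))
    return best["value"]


def by_majority(candidates):
    """Majority rule: take the value reported by the most sources."""
    return Counter(c["value"] for c in candidates).most_common(1)[0][0]


def merge_attribute(candidates, priority=None):
    usable = non_null(candidates)
    if not usable:
        return None
    if priority:
        return by_priority_then_recency(usable, priority)
    return by_majority(usable)


marriage = [
    {"source": "civil_affairs", "value": "married", "updated": "2016-01-01"},
    {"source": "employer", "value": "unmarried", "updated": "2017-06-01"},
    {"source": "hospital", "value": None, "updated": "2017-08-01"},
]
political = [{"source": f"s{i}", "value": "member", "updated": "2017-01-01"} for i in range(9)]
political.append({"source": "s9", "value": "party member", "updated": "2017-01-01"})

marriage_value = merge_attribute(marriage, priority={"civil_affairs": 2, "employer": 1})
political_value = merge_attribute(political)
```

A timeliness rule as described in the text could be added by lowering the priority of records older than some cutoff before the comparison; that step is omitted here to keep the sketch short.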
In the present invention, the integration layer further comprises a rule verification module, which is connected to both the rule establishing module and the first processing module. The rule verification module is used to verify the integration rules established by the rule establishing module. If verification passes, the integration rules are sent to the first processing module; if verification fails, verification-failure information is sent to the rule establishing module so that the rule establishing module re-establishes the integration rules.
When each item of data in the second data is merged, the integration rules are verified against a sufficient number of samples to determine the accuracy of the data integration.
Only after verification against a certain number of samples can a rule be confirmed as the most effective; determining each rule therefore requires verification with a large amount of data.
The sample data can be part of the second data. Whether an integration rule can serve as the optimal integration rule is judged by the probability that the sample data conforms to the rule.
If the conformance rate of the sample data is below a preset threshold, the accuracy of the integration rule is not high; it cannot serve as the optimal integration rule, and verification fails. If the conformance rate of the sample data is not below the preset threshold, the accuracy of the integration rule is high; it can serve as the optimal integration rule, and verification passes.
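The threshold check can be sketched as below. The sample format (a list of candidate values paired with an expected merged value), the example rule, and the default threshold of 0.95 are assumptions for illustration; the patent does not state how samples are labeled or what threshold is used.

```python
def conformance_rate(rule, samples):
    """Fraction of samples for which the rule's output matches the expected value."""
    if not samples:
        raise ValueError("at least one sample is required")
    hits = sum(1 for candidates, expected in samples if rule(candidates) == expected)
    return hits / len(samples)


def verify_rule(rule, samples, threshold=0.95):
    """Pass when the conformance rate is not below the threshold."""
    return conformance_rate(rule, samples) >= threshold


def first_non_null(values):
    """Hypothetical candidate rule: take the first non-empty value."""
    return next((v for v in values if v is not None), None)


samples = [
    ([None, "married"], "married"),
    (["unmarried", "married"], "married"),
    (["divorced"], "divorced"),
    ([None, None, "widowed"], "widowed"),
]
rate = conformance_rate(first_non_null, samples)
passed = verify_rule(first_non_null, samples, threshold=0.9)
```

A failed verification, as in this example, corresponds to the case in which the rule is sent back to the rule establishing module to be re-established.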
Integration rules must be verified repeatedly; among several candidate rules, only the one with the highest accuracy can serve as the optimal rule.
In the present invention, merging the second data to obtain the third data may be a table-to-table process. When data is merged, many labels or statistics can be derived. For a person's telephone number, for example, the following indicators can be derived: the earliest registration time, the most recent registration time, the number of sources in which it is registered, the persons who have registered it as a contact method, the earliest registration time among all of its owners, and so on. By optimizing the integration rules, accurate derived indicator values can be obtained.
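As a sketch of how such indicators could be derived for one telephone number, assuming a hypothetical registration record with the fields `person_id`, `source`, and `registered` (none of which are named in the patent):

```python
from datetime import date

def phone_indicators(registrations):
    """Derive statistics for one phone number.

    `registrations` is a list of dicts with keys: person_id, source, registered (a date).
    """
    if not registrations:
        return None
    dates = [r["registered"] for r in registrations]
    earliest = min(registrations, key=lambda r: r["registered"])
    return {
        "earliest_registration": min(dates),
        "latest_registration": max(dates),
        # Number of distinct sources in which the number appears.
        "source_count": len({r["source"] for r in registrations}),
        # Everyone who has registered the number as a contact method.
        "registered_by": sorted({r["person_id"] for r in registrations}),
        # The person holding the earliest registration.
        "earliest_owner": earliest["person_id"],
    }
```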
After integration, the data volume can generally be kept on the order of millions of records, so the second storage module does not need to store the third data in partitions. When merging, however, the second processing module needs to add to each attribute its source and the business time of the corresponding update. Besides making tracing more convenient, this is needed because the merging of an attribute generally depends on the priority and update time of its data source, so that incremental data can later be merged by the same rules.
In the present invention, the profile-type data of the third data generated after integration may also be overwritten and updated at preset times.
In the present invention, the second processing module may also integrate the incremental data of the second data according to the preset integration rules to generate the incremental data of the third data.
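Because each merged attribute carries its source and business time, an increment can be merged with the same comparison as the full merge. The following is a minimal sketch under assumptions: the `PRIORITY` ranking, the cell layout, and the function names are hypothetical.

```python
from datetime import date

# Hypothetical source ranking: lower number = higher priority.
PRIORITY = {"civil_affairs": 0, "community": 1}

def wins(new, old):
    """True if `new` should replace `old`: better source, or same rank and newer."""
    new_key = (PRIORITY.get(new["source"], 99), -new["business_time"].toordinal())
    old_key = (PRIORITY.get(old["source"], 99), -old["business_time"].toordinal())
    return new_key < old_key

def merge_increment(merged, increment):
    """Apply incremental rows to the merged table in place, using the full-merge rule.

    `merged` maps (person_id, attribute) -> {value, source, business_time}.
    """
    for row in increment:
        key = (row["person_id"], row["attribute"])
        cell = {k: row[k] for k in ("value", "source", "business_time")}
        # Null values never overwrite anything (non-null priority).
        if row["value"] is not None and (key not in merged or wins(cell, merged[key])):
            merged[key] = cell
    return merged
```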
In the present invention, the integration layer may store the third data in the big data repository, or may synchronize the incremental data of the third data to the big data repository. Data snapshots may be synchronized to the big data repository at a fixed interval. For example, a historical snapshot of the third data is dumped to the big data repository at the end of each month and incrementally updated data is synchronized to it daily, which makes it convenient to trace the data later.
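One way to read the snapshot-plus-increment scheme is that the state on any past day can be rebuilt from the latest snapshot on or before that day plus the later daily increments. The sketch below assumes increments are upserts only (deletions are not modeled), and the function and parameter names are hypothetical.

```python
from datetime import date

def restore_as_of(snapshots, increments, day):
    """Rebuild the table state on `day`.

    `snapshots` maps a date to a full {key: row} copy; `increments` maps a date
    to a {key: row} dict of upserts applied on that date.
    """
    base_days = [d for d in snapshots if d <= day]
    if not base_days:
        raise ValueError("no snapshot on or before the requested day")
    base = max(base_days)
    state = dict(snapshots[base])
    # Replay, in date order, only the increments after the snapshot and up to `day`.
    for d in sorted(increments):
        if base < d <= day:
            state.update(increments[d])
    return state
```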
In a specific embodiment provided by the invention, the system further includes a data mart layer. The data mart layer is connected to the integration layer. The data mart layer is used to obtain the third data from the integration layer and, by means of association (joins), to splice the fragment tables in the third data into base wide tables.
The data mart layer may include a fifth processing module and a fifth storage module; the fifth processing module is connected to the fifth storage module.
The fifth processing module is used to splice the fragment tables in the third data by means of association, generating the base wide tables. The fifth storage module is used to store the base wide tables and the third data.
By generating base wide tables, and by providing further base wide tables as required by the various applications, repeated computation in the applications is reduced.
For example, a statistical-analysis wide table may gather the basic attributes needed by statistical analyses, such as sex, age, native place, political affiliation, marital status, household registration, residential area, working area, status of the five social insurances, most recent social security payment time, housing provident fund status, and most recent housing provident fund payment time, amounting to more than 50 attributes, labels, or statistical indicators. By combining these dimensions and indicators, a wide variety of statistical analysis applications can be carried out. A base wide table for data mining may contain even more, over 150 fields.
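Splicing fragment tables into a base wide table amounts to joining them on a shared key. A minimal in-memory sketch follows; the key name `person_id`, the fragment names, and the assumption that column names are unique across fragments are all hypothetical.

```python
def build_wide_table(fragments, key="person_id"):
    """Outer-join fragment tables on `key` into one wide row per key value.

    `fragments` maps a fragment name to a list of row dicts; every row carries `key`.
    Column names are assumed to be unique across fragments.
    """
    wide = {}
    columns = []
    for rows in fragments.values():
        for row in rows:
            for col in row:
                if col != key and col not in columns:
                    columns.append(col)
            wide.setdefault(row[key], {key: row[key]}).update(row)
    # Fill attributes that are missing for a person with None so all rows share columns.
    return [{key: k, **{c: wide[k].get(c) for c in columns}} for k in sorted(wide)]
```

Applications then read the precomputed wide rows instead of repeating the joins themselves.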
In the present invention, the data mart layer may be divided into multiple mart partitions to facilitate data application.
For example, the data mart layer may have a license mart, a livelihood service mart, an economic and industry mart, a statistical analysis mart, a mining analysis mart, and so on.
When the fifth storage module of the data mart layer stores data, it saves the third data in partitions by time period; such data is inserted in append mode. The fifth storage module may also store basic profile-type data, which may be stored by overwrite update, consistent with the storage method of the integration area.
In a specific embodiment provided by the invention, the system further includes an application layer. The application layer is connected to the source-aligned layer, the atomic layer, the integration layer, and the data mart layer. The application layer is used to periodically synchronize the data in the source-aligned layer, the atomic layer, the integration layer, and the data mart layer.
In the present invention, the application layer may obtain data from any of the source-aligned layer, the atomic layer, the integration layer, and the data mart layer, but most of the data should come from the data mart layer. The application layer may provide data support to external applications through database service interfaces.
In addition, the data of the application layer is also archived in the big data repository as a historical archive.
In a specific embodiment provided by the invention, the system further includes a big data repository. The big data repository is connected to the application layer, the data mart layer, the atomic layer, the source-aligned layer, and the integration layer. The big data repository is used to periodically synchronize and store the data in the application layer, the data mart layer, the atomic layer, the source-aligned layer, and the integration layer.
In the present invention, the big data repository may include a historical database, an unstructured database, a graph database, a log store, and so on, all of which fall within the protection scope of the present invention.
The application layer may provide data support to external applications through database service interfaces. When using data, a user may obtain it from the application layer or from the big data repository; both fall within the protection scope of the present invention.
In the application layer, statistical indicators are saved by period so that the statistics have a history. Query interfaces over profile-type data usually use overwrite update.
The big data repository may use a Hadoop big data store. Hadoop's practically unlimited scalability of computing and storage capacity ensures that historical data is always "online".
Transaction-flow data is saved as a time-ordered log: data within a certain period (for example, the last 3 years) is kept in the traditional database, while all historical data is dumped to the big data repository. Zipper data is formed from the history, and all historical data is dumped to the Hadoop big data store.
For profile-type data, zipper data is formed. The resulting change-history data is all saved in the big data repository and kept synchronized using monthly full loads and daily increments, so that the data can be traced back to any point in time up to the previous day.
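A zipper table keeps one row per version of a value together with its validity interval, so any past state can be looked up by date. The following is a minimal sketch; the row layout, the open-ended date `OPEN_END`, and the function names are assumptions, not details taken from the patent.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # conventional end date for the still-valid version

def apply_change(zipper, key, value, change_date):
    """Close the current version of `key` if its value changed, then open a new one."""
    for row in zipper:
        if row["key"] == key and row["end"] == OPEN_END:
            if row["value"] == value:
                return zipper  # unchanged value: keep the open version as is
            row["end"] = change_date
    zipper.append({"key": key, "value": value, "start": change_date, "end": OPEN_END})
    return zipper

def value_as_of(zipper, key, day):
    """Return the value valid on `day` (start inclusive, end exclusive), or None."""
    for row in zipper:
        if row["key"] == key and row["start"] <= day < row["end"]:
            return row["value"]
    return None
```

Only real changes add rows, so the history stays compact while every past date remains queryable.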
Preserving this historical data ensures the storage safety of the data on the one hand and, on the other, allows change analysis to be performed on the historical data.
Because future applications are unpredictable, application-oriented data may come from any data area, or may even be fetched from the big data repository to support an application. Some individualized applications may therefore fetch data from other areas, but most application data requirements should be generated directly from the data mart area.
The present invention may also include unstructured data sources.
The unstructured data sources are connected to the big data repository.
The big data repository can obtain unstructured data in real time and store it in the unstructured database within the big data repository.
Unstructured data can be streamed directly into the unstructured database by generating key-value pairs.
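The patent does not say how the keys are generated. As one hypothetical scheme, a key can be derived from the source name and a content hash, with a plain dictionary standing in for the unstructured database:

```python
import hashlib

def to_key_value(payload: bytes, source: str):
    """Build a (key, value) pair; the key combines the source name and a content hash."""
    digest = hashlib.sha256(payload).hexdigest()
    return f"{source}:{digest}", payload

def ingest(store: dict, payload: bytes, source: str) -> str:
    """Write one unstructured item into `store` and return its key."""
    key, value = to_key_value(payload, source)
    store[key] = value
    return key
```

A content-derived key makes repeated ingestion of the same item idempotent, which is a design choice of this sketch rather than a requirement stated in the text.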
In summary, the technical solution of the present invention has the following beneficial effects:
High scalability: because the atomic layer strictly follows the third normal form, when new requirements are added to the system the original architecture does not need to be modified; extending the original model is enough to meet them. In addition, because the data warehouse is built by combining Hadoop with a traditional database, when unstructured data enters the data warehouse, or when the accumulated data becomes difficult for a traditional data warehouse to support, Hadoop's high scalability can be used to expand the storage and computing capacity of the data warehouse.
Openness: the present invention is compatible with structured, semi-structured, unstructured, graph, and log data formats, and with heterogeneous data sources such as Oracle, MySQL, SQL Server, Access, DB2, Postgres, Teradata, and local files. It provides data services externally through a unified interface, so that external service platforms do not need to know which storage medium holds the data internally.
Security: sound backup mechanisms minimize the risk of data loss.
Ease of maintenance: all data processing rules are configured in rule tables, and there is only one program that parses the rules. The rules (each of which must carry a business description) are fully decoupled from the data processing engine. When a data rule changes, modifying that one rule is enough to update every affected point, which greatly improves the maintainability of the system.
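The decoupling of rules from the engine can be sketched as a rule table interpreted by a single function. The rule format and the operation names (`strip`, `map`, `upper`) are hypothetical; the patent does not specify them.

```python
# Rule table: each row is data, not code, and carries a business description.
RULES = [
    {"field": "name", "op": "strip", "desc": "remove surrounding whitespace"},
    {"field": "sex", "op": "map", "args": {"1": "M", "2": "F"}, "desc": "decode sex codes"},
    {"field": "id_number", "op": "upper", "desc": "normalize the check letter"},
]

# The single engine knows how to interpret each operation name.
OPERATIONS = {
    "strip": lambda value, args: value.strip(),
    "upper": lambda value, args: value.upper(),
    "map": lambda value, args: args.get(value, value),
}

def run_rules(record, rules=RULES):
    """Apply every rule in order to a copy of `record`; absent or None fields are skipped."""
    out = dict(record)
    for rule in rules:
        field = rule["field"]
        if out.get(field) is not None:
            out[field] = OPERATIONS[rule["op"]](out[field], rule.get("args"))
    return out
```

Changing how a field is processed then means editing one row of `RULES`; the engine code stays untouched.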
Ease of tracing: from the atomic layer through the integration layer and the data mart layer, the source of the data is recorded at a granularity as fine as each cell and as coarse as each record. Combined with the data processing rules, data in the application layer can be traced all the way back to its source, together with the rule transformations it underwent during intermediate processing.
Integrity: source data enters the source-aligned layer directly on entering the data warehouse, so no information is lost. Historical data older than a certain time window (for example, more than 1 year) can be transferred to the Hadoop platform (which runs online, unlike an offline tape library) for full-life-cycle storage, computation, and management, ensuring the integrity of the data.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict, those skilled in the art may combine the different embodiments or examples described in this specification and their features.
It should be noted that the flowcharts and block diagrams in the accompanying drawings show the possible architecture, functions, and operations of systems and computer program products according to embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In the several embodiments provided in this application, it should be understood that the disclosed system may be implemented in other ways. The system embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may each exist physically on their own, or two or more of them may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, and some or all of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and they shall all be covered by the scope of the claims and the specification of the present invention.

Claims (10)

  1. A data processing system based on a data warehouse, characterized by comprising: an atomic layer and an integration layer;
    The atomic layer is connected to the integration layer;
    The atomic layer includes: a first processing module and a first storage module;
    The integration layer includes: a second processing module and a second storage module;
    The first storage module is connected to the first processing module;
    The second storage module is connected to the second processing module;
    The atomic layer is used to obtain structured first data;
    The first processing module is used to organize and classify the first data according to a preset specification to obtain second data;
    The first storage module is used to store the organized and classified second data in partitions;
    The atomic layer is used to update the second data to the integration layer;
    The second processing module is used to merge the second data according to preset integration rules to generate third data;
    The second storage module is used to store the third data.
  2. The data processing system based on a data warehouse according to claim 1, characterized in that the first storage module is specifically used to:
    Store the organized and classified data in partitions according to one or more of data source, data period, business classification, and relationship type.
  3. The data processing system based on a data warehouse according to claim 1, characterized in that the integration layer further includes: a rule establishment module;
    The rule establishment module is connected to the first processing module;
    The rule establishment module is used to establish reasonable integration rules according to one or more of the non-null priority principle, the priority of data, the timeliness of data, majority rule (the minority yields to the majority), and common sense, and to send the integration rules to the first processing module.
  4. The data processing system based on a data warehouse according to claim 3, characterized in that the integration layer further includes: a rule verification module;
    The rule verification module is connected to both the rule establishment module and the first processing module;
    The rule verification module is used to verify the integration rules established by the rule establishment module; if verification passes, the integration rules are sent to the first processing module;
    If verification fails, a verification-failure message is sent to the rule establishment module so that the rule establishment module re-establishes the integration rules.
  5. The data processing system based on a data warehouse according to claim 1, characterized in that the system further includes: a source-aligned layer;
    The source-aligned layer is connected to the atomic layer;
    The source-aligned layer is used to standardize buffered data to obtain the first data, and to perform historical archiving of the buffered data and the first data;
    The source-aligned layer is also used to update the first data to the atomic layer.
  6. The data processing system based on a data warehouse according to claim 5, characterized in that the system further includes: a buffer layer;
    The buffer layer is connected to the source-aligned layer;
    The buffer layer is used to cache structured source data from different sources in source databases, generating the buffered data;
    The buffer layer is also used to update the buffered data to the source-aligned layer.
  7. The data processing system based on a data warehouse according to claim 6, characterized in that the system further includes: a data mart layer;
    The data mart layer is connected to the integration layer;
    The data mart layer is used to obtain the third data from the integration layer and, by means of association, to splice the fragment tables in the third data into base wide tables.
  8. The data processing system based on a data warehouse according to claim 7, characterized in that the system further includes: an application layer;
    The application layer is connected to the source-aligned layer, the atomic layer, the integration layer, and the data mart layer;
    The application layer is used to periodically synchronize the data in the source-aligned layer, the atomic layer, the integration layer, and the data mart layer.
  9. The data processing system based on a data warehouse according to claim 8, characterized in that the system further includes: a big data repository;
    The big data repository is connected to the application layer, the data mart layer, the atomic layer, the source-aligned layer, and the integration layer;
    The big data repository is used to periodically synchronize and store the data in the application layer, the data mart layer, the atomic layer, the source-aligned layer, and the integration layer.
  10. The data processing system based on a data warehouse according to claim 9, characterized in that the big data repository uses a Hadoop big data store.
CN201710919091.2A 2017-09-30 2017-09-30 A kind of data handling system based on data warehouse Pending CN107729448A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710919091.2A CN107729448A (en) 2017-09-30 2017-09-30 A kind of data handling system based on data warehouse

Publications (1)

Publication Number Publication Date
CN107729448A true CN107729448A (en) 2018-02-23

Family

ID=61208516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710919091.2A Pending CN107729448A (en) 2017-09-30 2017-09-30 A kind of data handling system based on data warehouse

Country Status (1)

Country Link
CN (1) CN107729448A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108829830A (en) * 2018-06-15 2018-11-16 四川众之金科技有限公司 Data processing method and device
CN110309108A (en) * 2019-05-08 2019-10-08 江苏满运软件科技有限公司 Data acquisition and storage method, device, electronic equipment, storage medium
CN110990390A (en) * 2019-12-02 2020-04-10 东莞中国科学院云计算产业技术创新与育成中心 Data cooperative processing method and device, computer equipment and storage medium
CN111125069A (en) * 2019-11-13 2020-05-08 深圳市华傲数据技术有限公司 Data cleaning and fusing system
CN111337727A (en) * 2020-03-05 2020-06-26 山东泰开互感器有限公司 Current transformer and cloud computing-based current transformer information interaction system
CN113377872A (en) * 2021-06-25 2021-09-10 北京红山信息科技研究院有限公司 Offline synchronization method, device and equipment of online system data in big data center

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101197876A (en) * 2006-12-06 2008-06-11 中兴通讯股份有限公司 Method and system for multi-dimensional analysis of message service data
US20130132383A1 (en) * 2007-05-29 2013-05-23 Christopher Ahlberg Information service for relationships between facts extracted from differing sources on a wide area network
CN105608203A (en) * 2015-12-24 2016-05-25 Tcl集团股份有限公司 Internet of things log processing method and device based on Hadoop platform
CN106294521A (en) * 2015-06-12 2017-01-04 交通银行股份有限公司 Date storage method and data warehouse

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XUZHENGZHU: "Oracle & BI & 大数据分析", 《HTTPS://WWW.CNBLOGS.COM/HONDAHSU/P/5314176.HTML》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108829830A (en) * 2018-06-15 2018-11-16 四川众之金科技有限公司 Data processing method and device
CN110309108A (en) * 2019-05-08 2019-10-08 江苏满运软件科技有限公司 Data acquisition and storage method, device, electronic equipment, storage medium
CN111125069A (en) * 2019-11-13 2020-05-08 深圳市华傲数据技术有限公司 Data cleaning and fusing system
CN111125069B (en) * 2019-11-13 2023-04-28 深圳市华傲数据技术有限公司 Data cleaning fusion system
CN110990390A (en) * 2019-12-02 2020-04-10 东莞中国科学院云计算产业技术创新与育成中心 Data cooperative processing method and device, computer equipment and storage medium
CN110990390B (en) * 2019-12-02 2024-03-08 东莞中国科学院云计算产业技术创新与育成中心 Data cooperative processing method, device, computer equipment and storage medium
CN111337727A (en) * 2020-03-05 2020-06-26 山东泰开互感器有限公司 Current transformer and cloud computing-based current transformer information interaction system
CN113377872A (en) * 2021-06-25 2021-09-10 北京红山信息科技研究院有限公司 Offline synchronization method, device and equipment of online system data in big data center
CN113377872B (en) * 2021-06-25 2024-02-27 北京红山信息科技研究院有限公司 Offline synchronization method, device and equipment of online system data in big data center

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180223
