CN102855277A - Data center system and data processing method - Google Patents
- Publication number
- CN102855277A (application number CN2012102570388A)
- Authority
- CN
- China
- Prior art keywords
- data
- relational database
- exported
- confidential
- mass
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention provides a data center system and a data processing method. The method comprises the following steps: after receiving a query request, the data center system extracts the raw data to be processed from a data source; confidential data in the raw data is exported to a relational database, and massive data in the raw data is exported to a non-relational database; statistical analysis is performed on the massive data in the non-relational database according to the query request to obtain valuable data; and the confidential data is associated with the valuable data to generate query result data, which is output. The method allows data to be processed while taking full advantage of the respective strengths of relational and non-relational databases.
Description
Technical field
The present invention relates to a data center system and a data processing method, and belongs to the field of data management technology.
Background art
Existing data center systems usually use either a relational database or a non-relational database. A relational database is a database built on the relational model; it processes data using mathematical concepts and methods such as set algebra. The relational model consists of three parts: relational data structures, a set of relational operations, and relational integrity constraints. A non-relational database is a database that is not built on the relational model. Specifically:
A relational database offers a high degree of security, but it has a drawback: when indexing massive data sources it runs into numerous technical bottlenecks in efficiency and implementation, so distributed queries are inefficient. A non-relational database has a free schema, accommodates horizontal scaling well, supports sharding, and can store massive data, but it also has a drawback: it lacks authorization constraints, so its security is lower.
The drawback of existing data center systems is that they can be built on only one of the two, either a relational database or a non-relational database, and therefore cannot take full advantage of both kinds of database.
Summary of the invention
The invention provides a data center system and a data processing method that take full advantage of the respective strengths of relational and non-relational databases.
One aspect of the present invention provides a data processing method, comprising:
after receiving a query request, a data center system extracts raw data to be processed from a data source;
confidential data in the raw data is exported to a relational database, and massive data in the raw data is exported to a non-relational database;
statistical analysis is performed on the massive data in the non-relational database according to the query request to obtain valuable data;
the confidential data is associated with the valuable data to generate query result data, which is output.
Another aspect of the present invention provides a data center system, comprising:
an extraction module, configured to extract raw data to be processed from a data source after a query request is received;
an export module, configured to export confidential data in the raw data to a relational database and to export massive data in the raw data to a non-relational database;
an analysis module, configured to perform statistical analysis on the massive data in the non-relational database according to the query request to obtain valuable data;
an output module, configured to associate the confidential data with the valuable data to generate query result data and to output the query result data.
By exporting confidential data to a relational database, the invention ensures the security of the confidential data; by exporting massive data to a non-relational database, it improves the efficiency of distributed queries. Data processing can thus take full advantage of the respective strengths of relational and non-relational databases.
Description of drawings
Fig. 1 is a flowchart of an embodiment of the data processing method of the present invention;
Fig. 2 is a schematic structural diagram of an embodiment of the data center system of the present invention.
Embodiment
Fig. 1 is a flowchart of an embodiment of the data processing method of the present invention. As shown in the figure, the method comprises the following steps.
First, after receiving a query request, the data center system extracts the raw data to be processed from a data source.
Here, the raw data to be processed is the raw data that must be processed in order to satisfy the query request. For example, to query the data shown in Table 5 below, the data in Tables 1 to 4 below is the raw data that needs to be extracted. The data source may consist of databases such as DB2, Oracle and SQL Server. Specifically, the extraction may be performed with an Extraction, Transformation and Loading (ETL) tool. An example of the raw data follows:
Table 1
Id | Password |
---|---|
1 | ********* |
2 | ********* |
3 | ********* |
4 | ********* |
Table 2
Id | Label (tags) |
---|---|
1 | Dog |
1 | Cat |
2 | Cat |
3 | Mouse |
3 | Cat |
3 | Dog |
Table 3
Label | Retail price |
---|---|
Dog | 2 |
Cat | 5 |
Mouse | 3 |
Table 4
Manufacturer | Membership level | Label | Wholesale price |
---|---|---|---|
Produce1 | 3 stars | Dog | 1 |
Produce2 | 2 stars | Cat | 2 |
Produce3 | 1 star | Mouse | 1.5 |
Where:
Table 1 is the basic information table of producers and stores their basic information.
Table 2 is the sales information list of producers and shows which goods each producer has sold. As Table 2 shows, producer 1 has sold goods with two labels, Dog and Cat; producer 2 has sold goods with the label Cat; and producer 3 has sold goods with three labels, Mouse, Cat and Dog.
Table 3 holds basic information about the goods of each label; for example, it shows the retail price for each label.
Table 4 holds basic information about the manufacturers of the goods, including the label of the goods produced, the manufacturer's membership level, and the wholesale price of the goods.
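The extraction step can be sketched as follows. This is only an illustration under stated assumptions, not the patent's implementation: an in-memory SQLite database stands in for the data source (the text names DB2, Oracle and SQL Server), and the names `SOURCE_TABLES`, `TABLES_FOR_QUERY`, `build_source` and `extract`, as well as the SQL column names, are hypothetical.

```python
import sqlite3

# Sample rows copied from Tables 1-4; SQL table and column names are assumptions.
SOURCE_TABLES = {
    "table1": ("id INTEGER, password TEXT",
               [(1, "*********"), (2, "*********"), (3, "*********"), (4, "*********")]),
    "table2": ("id INTEGER, label TEXT",
               [(1, "Dog"), (1, "Cat"), (2, "Cat"), (3, "Mouse"), (3, "Cat"), (3, "Dog")]),
    "table3": ("label TEXT, retail_price REAL",
               [("Dog", 2), ("Cat", 5), ("Mouse", 3)]),
    "table4": ("manufacturer TEXT, member_level TEXT, label TEXT, wholesale_price REAL",
               [("Produce1", "3 stars", "Dog", 1), ("Produce2", "2 stars", "Cat", 2),
                ("Produce3", "1 star", "Mouse", 1.5)]),
}

# Hypothetical mapping from a query target to the source tables it requires.
TABLES_FOR_QUERY = {"table5": ["table1", "table2", "table3", "table4"]}


def build_source() -> sqlite3.Connection:
    """Create an in-memory database that stands in for the data source."""
    conn = sqlite3.connect(":memory:")
    for name, (columns, rows) in SOURCE_TABLES.items():
        conn.execute(f"CREATE TABLE {name} ({columns})")
        placeholders = ", ".join("?" for _ in rows[0])
        conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)
    return conn


def extract(conn: sqlite3.Connection, query_target: str) -> dict:
    """Return {table name: list of rows} for every table the query target needs."""
    raw_data = {}
    for name in TABLES_FOR_QUERY[query_target]:
        # Table names come from the fixed mapping above, never from user input.
        raw_data[name] = conn.execute(f"SELECT * FROM {name}").fetchall()
    return raw_data
```

Calling `extract(build_source(), "table5")` returns the rows of all four sample tables, keyed by table name. A real ETL tool would also handle transformation and loading, which this sketch leaves out.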
Next, the confidential data in the raw data is exported to a relational database, and the massive data in the raw data is exported to a non-relational database.
Here, confidential data refers to data with higher security requirements; for example, the contents of Table 1, Table 3 and Table 4 are confidential data. Massive data refers to data of larger volume; for example, the content of Table 2 is massive data.
Specifically, the data center system may first identify the confidential data and the massive data in the raw data according to preset configuration information in the configuration file of the ETL tool, then export the identified confidential data to the relational database and the identified massive data to the non-relational database. The preset configuration information indicates which of the raw data is confidential data and which is massive data; it is written into the ETL configuration file before the ETL tool is used.
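The identify-then-export logic can be illustrated with a short sketch. It assumes a plain dictionary as the preset configuration, an SQLite connection as the relational database, and an ordinary Python dictionary as a stand-in for the non-relational database; `PRESET_CONFIG`, `identify` and `export` are hypothetical names, and the actual ETL configuration format is not given in the text.

```python
import sqlite3

# Hypothetical preset configuration, mirroring the classification in the text:
# Tables 1, 3 and 4 are confidential, Table 2 is massive.
PRESET_CONFIG = {
    "table1": "confidential",
    "table2": "massive",
    "table3": "confidential",
    "table4": "confidential",
}


def identify(raw_data: dict, config: dict) -> tuple:
    """Split the extracted tables into (confidential, massive) groups."""
    confidential, massive = {}, {}
    for name, rows in raw_data.items():
        kind = config[name]  # raises KeyError if a table is not configured
        if kind == "confidential":
            confidential[name] = rows
        elif kind == "massive":
            massive[name] = rows
        else:
            raise ValueError(f"unknown data class {kind!r} for {name}")
    return confidential, massive


def export(raw_data: dict, config: dict,
           relational: sqlite3.Connection, document_store: dict) -> None:
    """Write confidential tables to the relational store, massive ones elsewhere."""
    confidential, massive = identify(raw_data, config)
    for name, rows in confidential.items():
        width = len(rows[0])  # assumes every exported table has at least one row
        columns = ", ".join(f"c{i}" for i in range(width))
        placeholders = ", ".join("?" for _ in range(width))
        relational.execute(f"CREATE TABLE {name} ({columns})")
        relational.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)
    for name, rows in massive.items():
        # Stand-in for a bulk insert into the non-relational database.
        document_store[name] = [list(row) for row in rows]
```

Keeping the classification in configuration rather than in code matches the text's point that the split is decided before the ETL tool runs.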
In addition, the exported raw data uses a format consistent with the corresponding database. For example, when the non-relational database is a Mongo database, the raw data stored in it uses a JSON-style document format. The content of Table 2 in this format is expressed as:
{_id:1,tags:['dog','cat']}
{_id:2,tags:['cat']}
{_id:3,tags:['mouse','cat','dog']}
{_id:4,tags:[]}
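A minimal sketch of how the rows of Table 2 could be grouped into documents of this shape is given below. The function name `rows_to_documents` and the idea of passing the full id list (so that producer 4, who has no sales rows, still gets an empty `tags` array) are assumptions made for illustration; the labels are lowercased only to match the documents shown above.

```python
from collections import defaultdict

# Rows of Table 2 as (producer id, label) pairs.
TABLE2_ROWS = [(1, "Dog"), (1, "Cat"), (2, "Cat"), (3, "Mouse"), (3, "Cat"), (3, "Dog")]


def rows_to_documents(sales_rows, all_ids):
    """Group flat (id, label) rows into one document per producer id."""
    tags_by_id = defaultdict(list)
    for producer_id, label in sales_rows:
        tags_by_id[producer_id].append(label.lower())
    # Producers without any sales row get an empty tag list.
    return [{"_id": pid, "tags": tags_by_id.get(pid, [])} for pid in sorted(all_ids)]


documents = rows_to_documents(TABLE2_ROWS, [1, 2, 3, 4])
```

Storing one document per producer with an embedded array avoids a join when counting labels later.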
Then, statistical analysis is performed on the massive data in the non-relational database according to the query request to obtain valuable data.
Specifically, MapReduce may be used to perform real-time and/or offline statistical analysis on the massive data in the non-relational database to obtain the valuable data. Which statistical analysis is performed is determined by the query request. For example, if the data in Table 5 below is requested, the total sales volume of each label can be computed from the content of Table 2 in the Mongo database, and the valuable data obtained is as follows:
{"tags":"Cat","value":{"count":3}}
{"tags":"Dog","value":{"count":2}}
{"tags":"Mouse","value":{"count":1}}
This valuable data shows that, across all producers counted, the total sales volume of goods with the label "Cat" is 3, that of goods with the label "Dog" is 2, and that of goods with the label "Mouse" is 1.
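The label count can be reproduced with a small in-process imitation of the map, shuffle and reduce phases. This is a sketch of the general MapReduce pattern in plain Python, not the distributed job the text has in mind; the function names are hypothetical, and the labels are capitalized here to match the result documents above.

```python
from collections import defaultdict

# Documents corresponding to Table 2, one per producer.
DOCUMENTS = [
    {"_id": 1, "tags": ["Dog", "Cat"]},
    {"_id": 2, "tags": ["Cat"]},
    {"_id": 3, "tags": ["Mouse", "Cat", "Dog"]},
    {"_id": 4, "tags": []},
]


def map_phase(document):
    """Emit one (label, 1) pair for every label in a document."""
    for tag in document["tags"]:
        yield tag, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(tag, values):
    """Sum the emitted ones to get the total sales count for a label."""
    return {"tags": tag, "value": {"count": sum(values)}}


def count_tags(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return [reduce_phase(tag, grouped[tag]) for tag in sorted(grouped)]


valuable_data = count_tags(DOCUMENTS)
```

Because each map call looks at a single document and each reduce call at a single key, the same two functions could in principle be distributed across many nodes.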
Finally, the confidential data is associated with the valuable data to generate query result data, which is output.
For example, the query result data generated by associating the valuable data with the content of Table 4 above is shown in Table 5.
Table 5
Manufacturer | Label | Total sales volume |
---|---|---|
Produce1 | Dog | 2 |
Produce2 | Cat | 3 |
Produce3 | Mouse | 1 |
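The association step amounts to a join on the label column between Table 4 in the relational database and the per-label counts. The sketch below assumes SQLite as the relational database and performs the join in application code; `load_table4` and `associate` are hypothetical names, and the column names are assumptions.

```python
import sqlite3

# Per-label counts as produced by the statistical analysis step.
VALUABLE_DATA = [
    {"tags": "Cat", "value": {"count": 3}},
    {"tags": "Dog", "value": {"count": 2}},
    {"tags": "Mouse", "value": {"count": 1}},
]


def load_table4() -> sqlite3.Connection:
    """Create an in-memory relational store holding the rows of Table 4."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table4 (manufacturer TEXT, member_level TEXT, "
        "label TEXT, wholesale_price REAL)"
    )
    conn.executemany(
        "INSERT INTO table4 VALUES (?, ?, ?, ?)",
        [("Produce1", "3 stars", "Dog", 1),
         ("Produce2", "2 stars", "Cat", 2),
         ("Produce3", "1 star", "Mouse", 1.5)],
    )
    return conn


def associate(relational: sqlite3.Connection, valuable_data: list) -> list:
    """Join manufacturers with label counts; labels with no count default to 0."""
    counts = {item["tags"]: item["value"]["count"] for item in valuable_data}
    rows = relational.execute(
        "SELECT manufacturer, label FROM table4 ORDER BY manufacturer"
    ).fetchall()
    return [(manufacturer, label, counts.get(label, 0)) for manufacturer, label in rows]


query_result = associate(load_table4(), VALUABLE_DATA)
```

Only the two columns needed for the result are read from the relational database, so the wholesale price and membership level never leave it.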
Specifically, the query result data may be output as a data product through a REST (Representational State Transfer) interface based on the HTTP protocol. In addition, the output data product may have a URL (Uniform Resource Locator) address through which it can be accessed; a user can retrieve the data product via this URL.
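One possible shape for such an interface, using only the Python standard library, is sketched below. The URL path, the JSON field names, the port and the names `DATA_PRODUCTS`, `render`, `ProductHandler` and `serve` are all assumptions for illustration; the text does not specify the interface beyond REST over HTTP with an accessible URL. A single sample row is used as the stored data product.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical data products, keyed by the URL path under which they are published.
DATA_PRODUCTS = {
    "/products/sales-by-manufacturer": [
        {"manufacturer": "Produce3", "label": "Mouse", "total_sales": 1},
    ],
}


def render(path: str) -> tuple:
    """Return (HTTP status, JSON body as bytes) for a GET request on path."""
    product = DATA_PRODUCTS.get(path)
    if product is None:
        return 404, json.dumps({"error": "not found"}).encode("utf-8")
    return 200, json.dumps(product).encode("utf-8")


class ProductHandler(BaseHTTPRequestHandler):
    """Serve each data product read-only at its own URL."""

    def do_GET(self):
        status, body = render(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start the server; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), ProductHandler).serve_forever()
```

With `serve()` running, a GET request to `http://127.0.0.1:8000/products/sales-by-manufacturer` would return the stored rows as JSON.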
By exporting confidential data to a relational database, the method of this embodiment ensures the security of the confidential data; by exporting massive data to a non-relational database, it improves the efficiency of distributed queries. Data processing can thus take full advantage of the respective strengths of relational and non-relational databases.
Fig. 2 is a schematic structural diagram of an embodiment of the data center system of the present invention, which implements the method above. As shown in the figure, the system comprises at least an extraction module 10, an export module 20, an analysis module 30 and an output module 40; it may further comprise a data source 50, a relational database 61 and a non-relational database 62. The system works as follows:
After the data center system receives a query request, the extraction module 10 extracts the raw data to be processed from the data source 50; specifically, an ETL tool may be used for the extraction. The export module 20 then exports the confidential data in the raw data to the relational database 61 and the massive data in the raw data to the non-relational database 62.
Specifically, within the export module 20, a recognition unit 21 may first identify the confidential data and the massive data in the raw data according to the preset configuration information in the configuration file of the ETL tool; an export unit 22 then exports the confidential data identified by the recognition unit 21 to the relational database 61 and the identified massive data to the non-relational database 62.
After this, the analysis module 30 performs statistical analysis on the massive data in the non-relational database 62 according to the query request to obtain valuable data; specifically, MapReduce may be run on a plurality of cloud node servers to perform real-time and/or offline statistical analysis on the massive data in the non-relational database. The output module 40 then associates the confidential data with the valuable data to generate query result data and outputs it; specifically, the query result data may be output as a data product through a REST interface based on the HTTP protocol. For related examples, refer to the method embodiment above; they are not repeated here.
By exporting confidential data to a relational database, the system of this embodiment ensures the security of the confidential data; by exporting massive data to a non-relational database, it improves the efficiency of distributed queries. Data processing can thus take full advantage of the respective strengths of relational and non-relational databases.
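The way the four modules hand data to one another can be summarized in a short structural sketch. It models each module as a plain callable and is only one possible arrangement; the class name `DataCenterSystem`, the method `handle` and the callable signatures are assumptions, not part of the described system.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DataCenterSystem:
    """Wires together the four modules in the order described for Fig. 2."""

    extraction_module: Callable[[str], Any]      # module 10: query -> raw data
    export_module: Callable[[Any], tuple]        # module 20: raw -> (confidential, massive)
    analysis_module: Callable[[str, Any], Any]   # module 30: (query, massive) -> valuable data
    output_module: Callable[[Any, Any], Any]     # module 40: (confidential, valuable) -> result

    def handle(self, query_request: str) -> Any:
        """Process one query request end to end."""
        raw_data = self.extraction_module(query_request)
        confidential, massive = self.export_module(raw_data)
        valuable = self.analysis_module(query_request, massive)
        return self.output_module(confidential, valuable)
```

Each module can then be replaced independently, for example swapping an in-memory analysis function for one that submits a MapReduce job, without changing the control flow in `handle`.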
A person of ordinary skill in the art will understand that all or some of the steps of the method embodiments above may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments above. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk or an optical disc.
Finally, it should be noted that the embodiments above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (9)
1. A data processing method, characterized by comprising:
after receiving a query request, a data center system extracting raw data to be processed from a data source;
exporting confidential data in the raw data to a relational database, and exporting massive data in the raw data to a non-relational database;
performing statistical analysis on the massive data in the non-relational database according to the query request to obtain valuable data;
associating the confidential data with the valuable data to generate query result data, and outputting the query result data.
2. The method according to claim 1, characterized in that the data center system extracting raw data to be processed from a data source comprises: the data center system extracting the raw data to be processed using an ETL tool.
3. The method according to claim 2, characterized in that exporting the confidential data in the raw data to a relational database and exporting the massive data in the raw data to a non-relational database comprises:
identifying the confidential data and the massive data in the raw data according to preset configuration information in the configuration file of the ETL tool;
exporting the identified confidential data to the relational database, and exporting the identified massive data to the non-relational database.
4. The method according to claim 1, characterized in that performing statistical analysis on the massive data in the non-relational database to obtain valuable data comprises: using MapReduce on a plurality of cloud node servers to perform real-time and/or offline statistical analysis on the massive data in the non-relational database to obtain the valuable data.
5. The method according to claim 1, characterized in that outputting the query result data comprises: outputting the query result data as a data product through a REST interface based on the HTTP protocol.
6. A data center system, characterized by comprising:
an extraction module, configured to extract raw data to be processed from a data source after a query request is received;
an export module, configured to export confidential data in the raw data to a relational database and to export massive data in the raw data to a non-relational database;
an analysis module, configured to perform statistical analysis on the massive data in the non-relational database according to the query request to obtain valuable data;
an output module, configured to associate the confidential data with the valuable data to generate query result data and to output the query result data.
7. The system according to claim 6, characterized in that the export module comprises:
a recognition unit, configured to identify the confidential data and the massive data in the raw data according to preset configuration information in the configuration file of an ETL tool;
an export unit, configured to export the confidential data identified by the recognition unit to the relational database and to export the identified massive data to the non-relational database.
8. The system according to claim 6, characterized by further comprising:
the relational database, configured to receive the confidential data exported by the export module;
the non-relational database, configured to receive the massive data exported by the export module.
9. The system according to claim 6, characterized by further comprising: the data source, configured to store the raw data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012102570388A CN102855277A (en) | 2012-07-23 | 2012-07-23 | Data center system and data processing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012102570388A CN102855277A (en) | 2012-07-23 | 2012-07-23 | Data center system and data processing method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102855277A true CN102855277A (en) | 2013-01-02 |
Family
ID=47401865
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2012102570388A Pending CN102855277A (en) | 2012-07-23 | 2012-07-23 | Data center system and data processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102855277A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103747060A (en) * | 2013-12-26 | 2014-04-23 | 惠州华阳通用电子有限公司 | Distributed monitor system and method based on streaming media service cluster |
CN104965850A (en) * | 2015-04-29 | 2015-10-07 | 云南电网有限责任公司 | Database high-available implementation method based on open source technology |
CN106156174A (en) * | 2015-04-16 | 2016-11-23 | 中国移动通信集团山西有限公司 | The system and method that a kind of db transaction processes |
CN106383850A (en) * | 2016-08-31 | 2017-02-08 | 东软集团股份有限公司 | Data processing method and apparatus |
WO2017198227A1 (en) * | 2016-05-19 | 2017-11-23 | 中兴通讯股份有限公司 | Interactive internet protocol television system and real-time acquisition method for user data |
CN108228752A (en) * | 2017-12-21 | 2018-06-29 | 中国联合网络通信集团有限公司 | Data full dose deriving method, data distribution device and data export node |
CN111708778A (en) * | 2020-06-09 | 2020-09-25 | 樊馨 | Big data management method and system |
CN111931214A (en) * | 2020-08-31 | 2020-11-13 | 平安国际智慧城市科技股份有限公司 | Data processing method, device, server and storage medium |
CN112527836A (en) * | 2020-12-08 | 2021-03-19 | 航天科技控股集团股份有限公司 | Big data query method based on T-BOX platform |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102110150A (en) * | 2011-02-18 | 2011-06-29 | 中交四航工程研究院有限公司 | Autonomous examination and approval method based on distributed database |
CN102236867A (en) * | 2011-08-15 | 2011-11-09 | 悠易互通(北京)广告有限公司 | Cloud computing-based audience behavioral analysis advertisement targeting system |
CN102375891A (en) * | 2011-11-15 | 2012-03-14 | 山东浪潮金融信息系统有限公司 | Implementation tool for unloading and loading incremental data |
CN102508908A (en) * | 2011-11-11 | 2012-06-20 | 北京用友政务软件有限公司 | Method for acquiring subordinate financial business data and system for acquiring subordinate financial business data |
Non-Patent Citations (1)
Title |
---|
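Ma Juntao (马俊涛) et al.: "Processing massive telecom data on a cloud computing platform with a hybrid storage model" (以混合存储模型实现云计算平台对电信海量数据的处理), 《移动通信》 (Mobile Communications) * |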
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103747060A (en) * | 2013-12-26 | 2014-04-23 | 惠州华阳通用电子有限公司 | Distributed monitor system and method based on streaming media service cluster |
CN103747060B (en) * | 2013-12-26 | 2017-12-08 | 惠州华阳通用电子有限公司 | A kind of distributed monitoring system and method based on streaming media service cluster |
CN106156174A (en) * | 2015-04-16 | 2016-11-23 | 中国移动通信集团山西有限公司 | The system and method that a kind of db transaction processes |
CN104965850A (en) * | 2015-04-29 | 2015-10-07 | 云南电网有限责任公司 | Database high-available implementation method based on open source technology |
CN104965850B (en) * | 2015-04-29 | 2018-01-30 | 云南电网有限责任公司 | A kind of database high availability implementation method based on open source technology |
WO2017198227A1 (en) * | 2016-05-19 | 2017-11-23 | 中兴通讯股份有限公司 | Interactive internet protocol television system and real-time acquisition method for user data |
CN106383850A (en) * | 2016-08-31 | 2017-02-08 | 东软集团股份有限公司 | Data processing method and apparatus |
CN108228752A (en) * | 2017-12-21 | 2018-06-29 | 中国联合网络通信集团有限公司 | Data full dose deriving method, data distribution device and data export node |
CN111708778A (en) * | 2020-06-09 | 2020-09-25 | 樊馨 | Big data management method and system |
CN111931214A (en) * | 2020-08-31 | 2020-11-13 | 平安国际智慧城市科技股份有限公司 | Data processing method, device, server and storage medium |
CN112527836A (en) * | 2020-12-08 | 2021-03-19 | 航天科技控股集团股份有限公司 | Big data query method based on T-BOX platform |
CN112527836B (en) * | 2020-12-08 | 2022-12-30 | 航天科技控股集团股份有限公司 | Big data query method based on T-BOX platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20130102 |