CN102855277A - Data center system and data processing method - Google Patents

Publication number
CN102855277A
CN102855277A CN2012102570388A CN201210257038A CN102855277A CN 102855277 A CN102855277 A CN 102855277A CN 2012102570388 A CN2012102570388 A CN 2012102570388A CN 201210257038 A CN201210257038 A CN 201210257038A CN 102855277 A CN102855277 A CN 102855277A
Authority
CN
China
Prior art keywords
data
relational database
exported
confidential
mass
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012102570388A
Other languages
Chinese (zh)
Inventor
王伟华
李建功
李珩
齐飞
博格利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd
Priority to CN2012102570388A
Publication of CN102855277A

Abstract

The invention provides a data center system and a data processing method. The method comprises the following steps: after receiving a query request, the data center system extracts raw data to be processed from a data source; exports confidential data in the raw data to a relational database and exports mass data in the raw data to a non-relational database; performs statistical analysis on the mass data in the non-relational database according to the query request to obtain valuable data; and associates the confidential data with the valuable data to generate and output query result data. The method makes it possible to process data while taking full advantage of the respective strengths of the relational database and the non-relational database.

Description

Data center system and data processing method
Technical field
The present invention relates to a data center system and a data processing method, and belongs to the field of data management technology.
Background art
Existing data center systems usually use either a relational database or a non-relational database. A relational database is a database built on the relational model; it processes the data in the database by means of mathematical concepts and methods such as set algebra. The relational model consists of three parts: relational data structures, a set of relational operations, and relational integrity constraints. A non-relational database is a database that is not built on the relational model. Specifically:
A relational database offers a high degree of security, but has a drawback: when indexing a mass data source, it runs into numerous technical bottlenecks in efficiency and implementation, so distributed queries are inefficient. A non-relational database has a free schema, is well suited to horizontal scaling and sharding, and can store mass data, but has a drawback: it lacks permission constraints, so its security is lower.
The drawback of existing data center systems is that they can be built on only one of the two, either a relational database or a non-relational database, and therefore cannot take full advantage of both kinds of database.
Summary of the invention
The invention provides a data center system and a data processing method that take full advantage of the respective strengths of relational and non-relational databases.
One aspect of the present invention provides a data processing method, comprising:
after receiving a query request, a data center system extracts raw data to be processed from a data source;
exporting the confidential data in the raw data to a relational database, and exporting the mass data in the raw data to a non-relational database;
performing statistical analysis on the mass data in the non-relational database according to the query request to obtain valuable data;
associating the confidential data with the valuable data to generate query result data, and outputting the query result data.
Another aspect of the present invention provides a data center system, comprising:
an extraction module, configured to extract raw data to be processed from a data source after a query request is received;
an export module, configured to export the confidential data in the raw data to a relational database, and to export the mass data in the raw data to a non-relational database;
an analysis module, configured to perform statistical analysis on the mass data in the non-relational database according to the query request to obtain valuable data;
an output module, configured to associate the confidential data with the valuable data to generate query result data, and to output the query result data.
By exporting the confidential data to a relational database, the present invention ensures the security of the confidential data; by exporting the mass data to a non-relational database, it improves the efficiency of distributed queries. Data processing can therefore take full advantage of the respective strengths of the relational database and the non-relational database.
Description of drawings
Fig. 1 is a flowchart of an embodiment of the data processing method of the present invention;
Fig. 2 is a schematic structural diagram of an embodiment of the data center system of the present invention.
Embodiments
Fig. 1 is a flowchart of an embodiment of the data processing method of the present invention. As shown in the figure, the method comprises the following steps:
Step 100: after receiving a query request, the data center system extracts raw data to be processed from a data source.
Here, the raw data to be processed refers to the raw data that must be processed in order to satisfy the query request; for example, to query the data in Table 5 below, the data in Tables 1 to 4 below are the raw data to be extracted. The data source may consist of databases such as DB2, Oracle, and SQL Server. Specifically, an extract, transform, load (ETL) tool may be used to perform the extraction. Examples of the raw data are as follows:
Table 1
Id    Password
1     *********
2     *********
3     *********
4     *********
Table 2
Id    Label (tags)
1     Dog
1     Cat
2     Cat
3     Mouse
3     Cat
3     Dog
Table 3
Label    Retail price
Dog      2
Cat      5
Mouse    3
Table 4
Manufacturer    Member grade    Label    Wholesale price
Produce1        3 stars         Dog      1
Produce2        2 stars         Cat      2
Produce3        1 star          Mouse    1.5
Where:
Table 1 is the basic information table of the producers and stores their basic information data.
Table 2 is the sales information table of the producers and shows which goods each producer has sold. As Table 2 shows, producer 1 has sold goods with two labels, Dog and Cat; producer 2 has sold goods with the label Cat; producer 3 has sold goods with three labels, Mouse, Cat, and Dog.
Table 3 holds the basic information of the labeled goods; for example, it shows the retail price for each label.
Table 4 holds the basic information of the goods manufacturers, including the label of the goods produced, the manufacturer's member grade, and the wholesale price of the goods.
Step 200: the data center system exports the confidential data in the raw data to a relational database, and exports the mass data in the raw data to a non-relational database.
Here, confidential data refers to data with higher security requirements; for example, the contents of Table 1, Table 3, and Table 4 are confidential data. Mass data refers to data of larger volume; for example, the content of Table 2 is mass data.
Specifically, the data center system may first identify the confidential data and the mass data in the raw data according to preset configuration information in the configuration file of the ETL tool, then export the identified confidential data to the relational database and the identified mass data to the non-relational database. The preset configuration information indicates which of the raw data is confidential data and which is mass data; it is written into the configuration file of the ETL tool before the tool is used.
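The identification and export described in Step 200 can be sketched as follows. This is only a minimal illustration: the patent does not specify the layout of the ETL configuration file, so the `PRESET_CONFIG` mapping, the table names, and the `identify_and_export` helper are all assumptions, and plain dictionaries stand in for the two databases.

```python
# Hypothetical preset configuration: maps each extracted table to its class.
# The real ETL configuration format is not given in the source.
PRESET_CONFIG = {
    "table1": "confidential",
    "table2": "mass",
    "table3": "confidential",
    "table4": "confidential",
}


def identify_and_export(raw_data, config):
    """Route each table to a relational or a non-relational store.

    Both stores are plain dicts here; they only stand in for real databases.
    """
    relational_db, non_relational_db = {}, {}
    for table_name, rows in raw_data.items():
        kind = config.get(table_name)
        if kind == "confidential":
            relational_db[table_name] = rows
        elif kind == "mass":
            non_relational_db[table_name] = rows
        else:
            raise ValueError(f"table {table_name!r} is not classified in the configuration")
    return relational_db, non_relational_db


# A few rows taken from Tables 1 to 4 of the embodiment.
raw_data = {
    "table1": [(1, "*********"), (2, "*********")],
    "table2": [(1, "Dog"), (1, "Cat"), (2, "Cat")],
    "table3": [("Dog", 2), ("Cat", 5), ("Mouse", 3)],
    "table4": [("Produce1", "3 stars", "Dog", 1)],
}
relational_db, non_relational_db = identify_and_export(raw_data, PRESET_CONFIG)
print(sorted(relational_db))      # tables routed to the relational store
print(sorted(non_relational_db))  # tables routed to the non-relational store
```

Raising an error for an unclassified table is a design choice of this sketch, so that confidential data cannot silently fall through to the less secure store.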
In addition, the exported raw data takes a format consistent with the corresponding database. For example, when the non-relational database is a Mongo database, the raw data stored in it all uses the csv format. For example, the content of Table 2, stored as one record per producer, is expressed as:
{_id:1,tags:['dog','cat']}
{_id:2,tags:['cat']}
{_id:3,tags:['mouse','cat','dog']}
{_id:4,tags:[]}
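How the flat rows of Table 2 turn into one record per producer can be sketched as below. The helper name `rows_to_documents` is hypothetical, the labels are lowercased to match the records shown above, and the list of producer ids is assumed to come from Table 1, which is why producer 4 appears with an empty `tags` list.

```python
import json

# Rows of Table 2 as (producer id, label), lowercased as in the records above.
TABLE2_ROWS = [
    (1, "dog"), (1, "cat"),
    (2, "cat"),
    (3, "mouse"), (3, "cat"), (3, "dog"),
]
PRODUCER_IDS = [1, 2, 3, 4]  # ids taken from Table 1 (assumed source of id 4)


def rows_to_documents(rows, producer_ids):
    """Group (id, label) rows into one document per producer id."""
    docs = {pid: {"_id": pid, "tags": []} for pid in producer_ids}
    for pid, tag in rows:
        # setdefault also covers ids that appear only in the sales rows.
        docs.setdefault(pid, {"_id": pid, "tags": []})["tags"].append(tag)
    return [docs[pid] for pid in sorted(docs)]


documents = rows_to_documents(TABLE2_ROWS, PRODUCER_IDS)
for doc in documents:
    print(json.dumps(doc))
```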
Step 300: the data center system performs statistical analysis on the mass data in the non-relational database according to the query request to obtain valuable data.
Specifically, MapReduce technology may be used to perform real-time statistical analysis and/or offline statistical analysis on the mass data in the non-relational database to obtain the valuable data. Which statistical analysis is performed is determined by the query request. For example, if the request is to query the data in Table 5 below, the total sales volume of each label can be computed from the content of Table 2 in the Mongo database, and the valuable data obtained are as follows:
{"tags":"Cat","value":{"count":3}}
{"tags":"Dog","value":{"count":2}}
{"tags":"Mouse","value":{"count":1}}
The valuable data above show that, across all producers counted, the total sales volume of goods labeled "Cat" is 3, that of goods labeled "Dog" is 2, and that of goods labeled "Mouse" is 1.
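The counting in Step 300 can be illustrated with a small in-process map and reduce pass. This is a sketch of the general MapReduce pattern in plain Python, not the Mongo database's own facility; the function names are hypothetical, and capitalizing the labels is an assumption made so that the keys match the valuable data shown above.

```python
from collections import defaultdict

# Table 2 as stored in the non-relational database (one record per producer).
DOCUMENTS = [
    {"_id": 1, "tags": ["dog", "cat"]},
    {"_id": 2, "tags": ["cat"]},
    {"_id": 3, "tags": ["mouse", "cat", "dog"]},
    {"_id": 4, "tags": []},
]


def map_phase(doc):
    """Emit one (label, 1) pair per label sold by a producer."""
    for tag in doc["tags"]:
        yield tag.capitalize(), 1


def reduce_phase(key, values):
    """Sum the emitted counts for one label."""
    return {"tags": key, "value": {"count": sum(values)}}


def map_reduce(docs):
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_phase(doc):
            groups[key].append(value)  # shuffle step: group values by key
    return [reduce_phase(key, groups[key]) for key in sorted(groups)]


valuable_data = map_reduce(DOCUMENTS)
for row in valuable_data:
    print(row)
```

In a distributed setting the map calls would run on separate nodes and the grouped values would be shipped to the reducers; here everything runs in one process purely to show the data flow.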
Step 400: the data center system associates the confidential data with the valuable data to generate query result data, and outputs the query result data.
For example, the query result data generated by associating the valuable data with the content of Table 4 above are shown in Table 5.
Table 5
Manufacturer    Label    Total sales volume
Produce1        Dog      2
Produce2        Cat      3
Produce3        Mouse    1
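The association in Step 400 amounts to a join on the label column. The sketch below assumes the field names shown (`manufacturer`, `label`, `total_sales` are illustrative, not from the source) and uses the Step 300 counts of Cat 3, Dog 2, and Mouse 1, so the totals it produces follow directly from those counts.

```python
# Table 4 (confidential data held in the relational database).
TABLE4 = [
    {"manufacturer": "Produce1", "member_grade": "3 stars", "label": "Dog", "wholesale_price": 1},
    {"manufacturer": "Produce2", "member_grade": "2 stars", "label": "Cat", "wholesale_price": 2},
    {"manufacturer": "Produce3", "member_grade": "1 star", "label": "Mouse", "wholesale_price": 1.5},
]

# Valuable data obtained in Step 300.
VALUABLE_DATA = [
    {"tags": "Cat", "value": {"count": 3}},
    {"tags": "Dog", "value": {"count": 2}},
    {"tags": "Mouse", "value": {"count": 1}},
]


def associate(confidential_rows, valuable_rows):
    """Join manufacturers to per-label totals; only non-sensitive columns are kept."""
    totals = {row["tags"]: row["value"]["count"] for row in valuable_rows}
    return [
        {
            "manufacturer": row["manufacturer"],
            "label": row["label"],
            "total_sales": totals.get(row["label"], 0),  # 0 if a label never sold
        }
        for row in confidential_rows
    ]


query_result = associate(TABLE4, VALUABLE_DATA)
for row in query_result:
    print(row["manufacturer"], row["label"], row["total_sales"])
```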
Specifically, the query result data may be output as a data product through a REST (Representational State Transfer) interface based on the HTTP protocol. In addition, the output data product may have an accessible URL (Uniform Resource Locator) address, through which the user can retrieve the data product.
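One way such an HTTP-based output could look is sketched below with Python's standard `http.server` module. The path `/products/total-sales`, the JSON layout, and the helper names are assumptions made for illustration; the source does not describe the interface beyond REST over HTTP and a URL per data product.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical registry of data products keyed by URL path.
# The totals follow the Step 300 counts (Cat 3, Dog 2, Mouse 1).
DATA_PRODUCTS = {
    "/products/total-sales": [
        {"manufacturer": "Produce1", "label": "Dog", "total_sales": 2},
        {"manufacturer": "Produce2", "label": "Cat", "total_sales": 3},
        {"manufacturer": "Produce3", "label": "Mouse", "total_sales": 1},
    ],
}


def render(path):
    """Return (HTTP status, JSON body as bytes) for a requested path."""
    product = DATA_PRODUCTS.get(path)
    if product is None:
        return 404, json.dumps({"error": "not found"}).encode("utf-8")
    return 200, json.dumps(product).encode("utf-8")


class DataProductHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = render(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host="127.0.0.1", port=8000):
    """Build the server without starting it; call serve_forever() to run."""
    return HTTPServer((host, port), DataProductHandler)
```

Calling `make_server().serve_forever()` would serve the data product at `http://127.0.0.1:8000/products/total-sales` under these assumptions.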
By exporting the confidential data to a relational database, the method of this embodiment ensures the security of the confidential data; by exporting the mass data to a non-relational database, it improves the efficiency of distributed queries. Data processing can therefore take full advantage of the respective strengths of the relational database and the non-relational database.
Fig. 2 is a schematic structural diagram of an embodiment of the data center system of the present invention, which implements the method above. As shown in the figure, the system comprises at least an extraction module 10, an export module 20, an analysis module 30, and an output module 40; it may further comprise a data source 50, a relational database 61, and a non-relational database 62. The system works as follows:
After the data center system receives a query request, the extraction module 10 extracts the raw data to be processed from the data source 50, specifically by using an ETL tool; the export module 20 then exports the confidential data in the raw data to the relational database 61, and exports the mass data in the raw data to the non-relational database 62.
Specifically, the export module 20 may first use an identification unit 21 to identify the confidential data and the mass data in the raw data according to the preset configuration information in the configuration file of the ETL tool; an export unit 22 then exports the confidential data identified by the identification unit 21 to the relational database 61, and exports the identified mass data to the non-relational database 62.
After this, the analysis module 30 performs statistical analysis on the mass data in the non-relational database 62 according to the query request to obtain valuable data; specifically, a plurality of cloud node servers may use MapReduce technology to perform real-time statistical analysis and/or offline statistical analysis on the mass data in the non-relational database to obtain the valuable data. The output module 40 then associates the confidential data with the valuable data to generate query result data and outputs it; specifically, the query result data may be output as a data product through a REST interface based on the HTTP protocol. For related examples, refer to the method embodiment above; they are not repeated here.
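The cooperation of the four modules can be sketched as one small class. Everything here is illustrative: dictionaries stand in for the data source 50 and the databases 61 and 62, the configuration shape is assumed, and a `Counter` replaces the distributed MapReduce analysis for brevity.

```python
from collections import Counter


class DataCenterSystem:
    """Toy wiring of modules 10 to 40; all storage is in-memory."""

    def __init__(self, data_source, config):
        self.data_source = data_source   # stands in for data source 50
        self.config = config             # assumed preset configuration
        self.relational_db = {}          # stands in for relational database 61
        self.non_relational_db = {}      # stands in for non-relational database 62

    def extract(self, table_names):      # module 10
        return {name: self.data_source[name] for name in table_names}

    def export(self, raw_data):          # module 20
        for name, rows in raw_data.items():
            if self.config[name] == "confidential":   # unit 21: identify
                self.relational_db[name] = rows       # unit 22: export
            else:
                self.non_relational_db[name] = rows

    def analyze(self, table_name):       # module 30
        return Counter(label for _, label in self.non_relational_db[table_name])

    def output(self, table_name, counts):  # module 40
        return [
            (manufacturer, label, counts[label])
            for manufacturer, _, label, _ in self.relational_db[table_name]
        ]

    def handle_query(self):
        self.export(self.extract(["table2", "table4"]))
        return self.output("table4", self.analyze("table2"))


system = DataCenterSystem(
    data_source={
        "table2": [(1, "Dog"), (1, "Cat"), (2, "Cat"), (3, "Mouse"), (3, "Cat"), (3, "Dog")],
        "table4": [
            ("Produce1", "3 stars", "Dog", 1),
            ("Produce2", "2 stars", "Cat", 2),
            ("Produce3", "1 star", "Mouse", 1.5),
        ],
    },
    config={"table2": "mass", "table4": "confidential"},
)
result = system.handle_query()
print(result)
```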
By exporting the confidential data to a relational database, the system of this embodiment ensures the security of the confidential data; by exporting the mass data to a non-relational database, it improves the efficiency of distributed queries. Data processing can therefore take full advantage of the respective strengths of the relational database and the non-relational database.
A person of ordinary skill in the art will appreciate that all or some of the steps of the method embodiment above may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiment above. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the embodiments above are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in those embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A data processing method, characterized by comprising:
after receiving a query request, a data center system extracts raw data to be processed from a data source;
exporting the confidential data in the raw data to a relational database, and exporting the mass data in the raw data to a non-relational database;
performing statistical analysis on the mass data in the non-relational database according to the query request to obtain valuable data;
associating the confidential data with the valuable data to generate query result data, and outputting the query result data.
2. The method according to claim 1, characterized in that the data center system extracting raw data to be processed from a data source comprises: the data center system using an ETL tool to extract the raw data to be processed.
3. The method according to claim 2, characterized in that exporting the confidential data in the raw data to a relational database and exporting the mass data in the raw data to a non-relational database comprises:
identifying the confidential data and the mass data in the raw data according to preset configuration information in the configuration file of the ETL tool;
exporting the identified confidential data to the relational database, and exporting the identified mass data to the non-relational database.
4. The method according to claim 1, characterized in that performing statistical analysis on the mass data in the non-relational database to obtain valuable data comprises: a plurality of cloud node servers using MapReduce technology to perform real-time statistical analysis and/or offline statistical analysis on the mass data in the non-relational database to obtain the valuable data.
5. The method according to claim 1, characterized in that outputting the query result data comprises: outputting the query result data as a data product through a REST interface based on the HTTP protocol.
6. A data center system, characterized by comprising:
an extraction module, configured to extract raw data to be processed from a data source after a query request is received;
an export module, configured to export the confidential data in the raw data to a relational database, and to export the mass data in the raw data to a non-relational database;
an analysis module, configured to perform statistical analysis on the mass data in the non-relational database according to the query request to obtain valuable data;
an output module, configured to associate the confidential data with the valuable data to generate query result data, and to output the query result data.
7. The system according to claim 6, characterized in that the export module comprises:
an identification unit, configured to identify the confidential data and the mass data in the raw data according to preset configuration information in the configuration file of an ETL tool;
an export unit, configured to export the confidential data identified by the identification unit to the relational database, and to export the identified mass data to the non-relational database.
8. The system according to claim 6, characterized by further comprising:
the relational database, configured to receive the confidential data exported by the export module;
the non-relational database, configured to receive the mass data exported by the export module.
9. The system according to claim 6, characterized by further comprising: the data source, configured to store the raw data.
CN2012102570388A 2012-07-23 2012-07-23 Data center system and data processing method Pending CN102855277A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012102570388A CN102855277A (en) 2012-07-23 2012-07-23 Data center system and data processing method

Publications (1)

Publication Number Publication Date
CN102855277A true CN102855277A (en) 2013-01-02

Family

ID=47401865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012102570388A Pending CN102855277A (en) 2012-07-23 2012-07-23 Data center system and data processing method

Country Status (1)

Country Link
CN (1) CN102855277A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102110150A (en) * 2011-02-18 2011-06-29 中交四航工程研究院有限公司 Autonomous examination and approval method based on distributed database
CN102236867A (en) * 2011-08-15 2011-11-09 悠易互通(北京)广告有限公司 Cloud computing-based audience behavioral analysis advertisement targeting system
CN102508908A (en) * 2011-11-11 2012-06-20 北京用友政务软件有限公司 Method for acquiring subordinate financial business data and system for acquiring subordinate financial business data
CN102375891A (en) * 2011-11-15 2012-03-14 山东浪潮金融信息系统有限公司 Implementation tool for unloading and loading incremental data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
马俊涛等: "以混合存储模型实现云计算平台对电信海量数据的处理", 《移动通信》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103747060A (en) * 2013-12-26 2014-04-23 惠州华阳通用电子有限公司 Distributed monitor system and method based on streaming media service cluster
CN103747060B (en) * 2013-12-26 2017-12-08 惠州华阳通用电子有限公司 A kind of distributed monitoring system and method based on streaming media service cluster
CN106156174A (en) * 2015-04-16 2016-11-23 中国移动通信集团山西有限公司 The system and method that a kind of db transaction processes
CN104965850A (en) * 2015-04-29 2015-10-07 云南电网有限责任公司 Database high-available implementation method based on open source technology
CN104965850B (en) * 2015-04-29 2018-01-30 云南电网有限责任公司 A kind of database high availability implementation method based on open source technology
WO2017198227A1 (en) * 2016-05-19 2017-11-23 中兴通讯股份有限公司 Interactive internet protocol television system and real-time acquisition method for user data
CN106383850A (en) * 2016-08-31 2017-02-08 东软集团股份有限公司 Data processing method and apparatus
CN108228752A (en) * 2017-12-21 2018-06-29 中国联合网络通信集团有限公司 Data full dose deriving method, data distribution device and data export node
CN111708778A (en) * 2020-06-09 2020-09-25 樊馨 Big data management method and system
CN111931214A (en) * 2020-08-31 2020-11-13 平安国际智慧城市科技股份有限公司 Data processing method, device, server and storage medium
CN112527836A (en) * 2020-12-08 2021-03-19 航天科技控股集团股份有限公司 Big data query method based on T-BOX platform
CN112527836B (en) * 2020-12-08 2022-12-30 航天科技控股集团股份有限公司 Big data query method based on T-BOX platform

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20130102