CN102214236A - Method and system for processing mass data - Google Patents

Info

Publication number
CN102214236A
Authority
CN
China
Prior art keywords
data
platform
transmission
module
message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201110182296XA
Other languages
Chinese (zh)
Other versions
CN102214236B (en)
Inventor
祝博立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Feinno Communication Technology Co Ltd
Original Assignee
Beijing Feinno Communication Technology Co Ltd
Application filed by Beijing Feinno Communication Technology Co Ltd
Priority to CN 201110182296
Publication of CN102214236A
Application granted
Publication of CN102214236B
Legal status: Active

Abstract

The invention discloses a method for processing mass data. The method comprises the following steps that: a scheduling module judges whether to call a data warehouse operation statement (HQL) according to acquired current service information and a predetermined scheduling strategy, acquires a calling sequence according to the acquired current service information and the predetermined scheduling strategy if the HQL is called, and calls the HQL to a data warehouse platform according to the calling sequence; and the data warehouse platform reads configuration information which corresponds to a data warehouse from a relational database, triggers the HQL to perform operation on data stored in a distributed platform according to the calling sequence, generates result data and stores the result data into the distributed platform. The invention also discloses a system for processing the mass data. By the method and the system provided by the invention, the flexibility of processing of the mass data can be improved.

Description

A mass data processing method and system
Technical field
The present invention relates to data processing technology, and in particular to a mass data processing method and system.
Background
With the rapid development of Internet technology, the number of Internet users has increased sharply, and with it the demand for processing Internet user data, such as collection, cleaning, statistics, and analysis. At the same time, the volume of Internet user data is growing explosively, which further increases the pressure on such data processing.
At present, mass Internet user data is processed by combining distributed platform (Hadoop) technology with data warehouse platform (Hive) technology. The mass data is stored on the distributed platform, and data warehouse operation statements (HQL) are called through console commands to perform statistics, analysis, and other processing on the stored data. Because the statements are called by console command, this method has poor flexibility.
Summary of the invention
The present invention provides a mass data processing method that improves the flexibility of mass data processing.
The present invention also provides a mass data processing system that improves the flexibility of mass data processing.
To achieve the above objects, the technical solution of the present invention is implemented as follows:
The invention discloses a mass data processing method, comprising:
A scheduling module judges, according to acquired current service information and a preset scheduling strategy, whether to call a data warehouse operation statement; if so, it obtains a calling sequence according to the acquired current service information and the preset scheduling strategy;
The scheduling module calls the data warehouse operation statement on a data warehouse platform according to the calling sequence;
The data warehouse platform reads configuration information corresponding to the data warehouse operation statement from a relational database;
The data warehouse platform triggers, according to the calling sequence, the data warehouse operation statement to operate on data stored in a distributed platform, generates result data, and stores the result data in the distributed platform.
After the result file is generated and stored in the distributed platform, the method further comprises:
The scheduling module controls the distributed platform to import the result data into the relational database;
The scheduling module controls a cache module to extract commonly used result data from the relational database according to a preset presentation strategy;
A data presentation platform reads the commonly used result data from the cache module and displays it.
After the data presentation platform reads the commonly used result file from the cache module and displays it, the method further comprises:
The data presentation platform reads the result data from the relational database and displays it.
Before the scheduling module judges, according to the acquired current service information and the preset scheduling strategy, whether to call the data warehouse operation statement, the method further comprises:
A data access platform transmits data to the distributed platform at least once;
Each time a transmission is completed, the data access platform sends a data-transmission-complete message to a message interface module;
The scheduling module obtains at least one data-transmission-complete message from the message interface module as the current service information.
The data access platform sending the data-transmission-complete message to the message interface module comprises:
The data access platform sends the data-transmission-complete message to the message interface module using Google's message transmission scheme protoBuffer.
The invention discloses a mass data processing system, comprising:
A scheduling module, configured to judge, according to acquired current service information and a preset scheduling strategy, whether to call a data warehouse operation statement; if so, to obtain a calling sequence according to the acquired current service information and the preset scheduling strategy, and to call the data warehouse operation statement on a data warehouse platform according to the calling sequence;
The data warehouse platform, configured to read configuration information corresponding to the data warehouse operation statement from a relational database, to trigger, according to the calling sequence, the data warehouse operation statement to operate on data stored in a distributed platform, to generate result data, and to store the result data in the distributed platform;
The relational database, configured to store the configuration information corresponding to the data warehouse operation statement;
The distributed platform, configured to store the data and the result data.
The scheduling module is further configured to control the distributed platform to import the result data into the relational database, and to control a cache module to extract commonly used result data from the relational database according to a preset presentation strategy;
The system further comprises:
The cache module, configured to cache the commonly used result data;
A data presentation platform, configured to read the commonly used result data from the cache module and display it.
The data presentation platform is further configured to read the result data from the relational database and display it.
The system further comprises:
A data access platform, configured to transmit data to the distributed platform at least once and, each time a transmission is completed, to send a data-transmission-complete message to a message interface module;
The message interface module, configured to receive the data-transmission-complete message;
The scheduling module is further configured to obtain at least one data-transmission-complete message from the message interface module as the current service information.
The data access platform is specifically configured to send the data-transmission-complete message to the message interface module using Google's message transmission scheme protoBuffer.
As the above summary shows, a scheduling module is added to the mass data processing system. According to the current service information and the preset scheduling strategy, this module decides whether to call the data warehouse operation statement and determines the calling sequence, and the data processing is completed under its control. This avoids issuing commands one by one from the console, as in existing mass data processing systems. Because processing is controlled by the scheduling module, the scheduling strategy and calling sequence can be configured flexibly to match the logic of the service to be implemented, which improves the flexibility of mass data processing.
Brief description of the drawings
Fig. 1 is a flowchart of the mass data processing method of Embodiment 1 of the present invention;
Fig. 2 is a flowchart of the mass data processing method of Embodiment 2 of the present invention;
Fig. 3 is a schematic structural diagram of the mass data processing system of Embodiment 3 of the present invention.
Detailed description
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described below with reference to the drawings and specific embodiments.
The basic idea of the present invention is to add a scheduling module to the mass data processing system. According to the current service information and the preset scheduling strategy, this module decides whether to call the data warehouse operation statement and determines the calling sequence, and the data processing is completed under its control.
Fig. 1 is a flowchart of the mass data processing method of Embodiment 1 of the present invention. As shown in Fig. 1, the method comprises at least the following steps.
Step 101: the scheduling module judges, according to the acquired current service information and the preset scheduling strategy, whether to call the data warehouse operation statement; if so, it obtains the calling sequence according to the acquired current service information and the preset scheduling strategy.
Step 102: the scheduling module calls the data warehouse operation statement on the data warehouse platform according to the calling sequence.
Step 103: the data warehouse platform reads the configuration information corresponding to the data warehouse operation statement from the relational database (MySQL).
Step 104: the data warehouse platform triggers, according to the calling sequence, the data warehouse operation statement to operate on the data stored in the distributed platform, generates result data, and stores the result data in the distributed platform.
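Steps 101 to 104 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the `should_call` and `dispatch` methods, the statement names, and the dictionary standing in for the relational database are all hypothetical, and no real Hive or Hadoop call is made.

```python
from dataclasses import dataclass, field


@dataclass
class DataWarehousePlatform:
    """Stand-in for the data warehouse platform (hypothetical interface)."""
    config_store: dict                      # stands in for the relational database
    executed: list = field(default_factory=list)

    def run(self, statement_name: str) -> str:
        # Step 103: read the configuration information for the statement.
        config = self.config_store[statement_name]
        # Step 104: the real platform would run the statement on the
        # distributed platform; here we only record what would be executed.
        self.executed.append((statement_name, config["table"]))
        return f"result:{statement_name}"


@dataclass
class SchedulingModule:
    """Stand-in for the scheduling module (hypothetical interface)."""
    required_inputs: frozenset              # scheduling strategy: inputs that must exist
    calling_sequence: list                  # preset order of statement names

    def should_call(self, service_info: set) -> bool:
        # Step 101: compare current service information with the strategy.
        return self.required_inputs <= service_info

    def dispatch(self, service_info: set, platform: DataWarehousePlatform) -> list:
        if not self.should_call(service_info):
            return []
        # Step 102: call the statements in the calling sequence.
        return [platform.run(name) for name in self.calling_sequence]


platform = DataWarehousePlatform(
    config_store={"clean_logs": {"table": "raw_logs"},
                  "daily_stats": {"table": "clean_logs"}})
scheduler = SchedulingModule(
    required_inputs=frozenset({"logs", "users"}),
    calling_sequence=["clean_logs", "daily_stats"])

print(scheduler.dispatch({"logs"}, platform))           # strategy not satisfied
print(scheduler.dispatch({"logs", "users"}, platform))  # statements run in order
```

Keeping the decision (`should_call`) separate from the execution (`run`) mirrors the split between the scheduling module and the data warehouse platform in the description.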
Fig. 2 is a flowchart of the mass data processing method of Embodiment 2 of the present invention. As shown in Fig. 2, the method comprises the following steps.
Step 201: the data access platform transmits data to the distributed platform at least once.
In this step, in a preferred implementation, the data access platform periodically transfers the data it has received to the distributed platform. The distributed platform supports receiving data from peripheral systems, organizing it, computing on it, and distributing the computed results to systems such as a reporting system. Specifically, the distributed platform is a data storage platform from the open-source Apache foundation, composed of components such as the distributed file system (HDFS) and distributed file processing (MapReduce), which are its two most basic and most important components. HDFS is an open-source implementation of the Google File System (GFS). It is a highly fault-tolerant distributed file system that provides high-throughput data access and is suited to storing massive numbers of large files, for example PB-scale data in files larger than 64 MB. A large file is split into N smaller pieces that are distributed across different machines, and the number of backup copies can be configured, so the system keeps working normally when some machines fail. Distributed file processing is a powerful tool for large-scale computation, for example on TB-scale data, and comprises a distributed data extraction (Map) module and a distributed data processing (Reduce) module. The Map module is responsible for breaking the data up; the Reduce module is responsible for aggregating it. A user only needs to implement the Map and Reduce interfaces to complete a computation over TB-scale data. Distributed file processing can be applied to data analysis such as log analysis and data mining, and also to scientific computation such as calculating the constant π.
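The division of labor between the Map and Reduce interfaces can be illustrated with a single-process Python sketch. It is not Hadoop code: the function names, the `shuffle` helper, and the sample log lines are assumptions made for illustration, and a real job would run the two phases across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    """Map: break one input record into (key, value) pairs."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group values by key, as the framework does between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list):
    """Reduce: aggregate all values collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Made-up sample records standing in for log data on the distributed platform.
log_lines = ["login alice", "login bob", "logout alice"]
counts = run_job(log_lines)
print(counts)
```

Only `map_phase` and `reduce_phase` carry job-specific logic, which is the point the paragraph above makes about the user implementing just two interfaces.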
Step 202: each time a transmission is completed, the data access platform sends a data-transmission-complete message to the message interface module.
In this step, each time the data access platform finishes transmitting data to the distributed platform, it sends a data-transmission-complete message to the message interface module; this message synchronizes the fact that the transmission has completed to the application system of the data platform. In a preferred implementation, the data access platform sends the message using protoBuffer, a message transmission scheme from Google.
Step 203: the scheduling module obtains at least one data-transmission-complete message from the message interface module as the current service information.
In this step, for example, if the data access platform has transmitted data to the distributed platform 3 times, the scheduling module obtains 3 data-transmission-complete messages from the message interface module and takes these 3 messages as the current service information.
Step 204: the scheduling module judges, according to the acquired current service information and the preset scheduling strategy, whether to call the data warehouse operation statement. If so, step 205 is executed; if not, the method returns to step 201.
In this step, the scheduling strategy is preset in the scheduling module and specifies the trigger condition for calling the data warehouse operation statement. If the current service information satisfies the condition specified by the scheduling strategy, the scheduling module decides to call the data warehouse operation statement; otherwise it decides not to. For example, the data received by the data access platform covers several categories and is imported into the distributed platform in several batches, so the scheduling module obtains several data-transmission-complete messages from the message interface module and judges from them whether to call the data warehouse operation statement. According to the scheduling strategy, the statement is not called while only some of the messages have been received. Only after all categories of data have been completely imported into the distributed platform and all data-transmission-complete messages have been received does the scheduling module decide to start calling the data warehouse operation statement to perform the computation.
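The trigger condition in the example above (call only once every category of data has reported completion) might look like the following sketch. The message fields, the dataset names, and the paths are hypothetical; the source does not specify the content of a data-transmission-complete message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferComplete:
    """Hypothetical shape of a data-transmission-complete message."""
    dataset: str      # which category of data finished transferring
    hdfs_path: str    # where it was written on the distributed platform


class SchedulingStrategy:
    """Trigger condition: every required dataset has reported completion."""

    def __init__(self, required_datasets):
        self.required = set(required_datasets)

    def should_call(self, messages) -> bool:
        received = {m.dataset for m in messages}
        return self.required <= received


strategy = SchedulingStrategy(["login_logs", "message_logs", "user_profiles"])

partial = [TransferComplete("login_logs", "/data/in/login_logs")]
complete = partial + [
    TransferComplete("message_logs", "/data/in/message_logs"),
    TransferComplete("user_profiles", "/data/in/user_profiles"),
]

print(strategy.should_call(partial))   # only part of the data has arrived
print(strategy.should_call(complete))  # all required data has arrived
```

Using a set comparison means duplicate or out-of-order messages do not change the decision, which seems a reasonable property for a trigger of this kind.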
Step 205: the scheduling module obtains the calling sequence according to the acquired current service information and the preset scheduling strategy.
In this step, a data computation comprises many steps. Some steps have no logical connection with one another, while others must be executed in a particular order, so the data warehouse operation statements must be called in a certain calling sequence. The calling sequence is preset in the scheduling module. Several calling sequences can be preset, and the scheduling module selects the appropriate one according to the acquired current service information and the preset scheduling strategy.
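A minimal sketch of selecting a preset calling sequence follows. The sequence names and statement names are hypothetical. The second function is not described in the source: it shows, as an assumption, how an order could instead be derived from declared dependencies using the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical preset calling sequences, keyed by the kind of service event.
PRESET_SEQUENCES = {
    "daily_report": ["clean_logs", "count_active_users", "build_report"],
    "adhoc_recount": ["count_active_users"],
}


def select_sequence(service_kind: str) -> list:
    """Pick the preset calling sequence that matches the current service."""
    return PRESET_SEQUENCES[service_kind]


def derive_sequence(dependencies: dict) -> list:
    """Alternative (assumption): derive an order from 'statement -> prerequisites'."""
    return list(TopologicalSorter(dependencies).static_order())


# Statements with no mutual dependency may appear in either order;
# dependent statements always come after their prerequisites.
deps = {
    "build_report": {"count_active_users", "count_messages"},
    "count_active_users": {"clean_logs"},
    "count_messages": {"clean_logs"},
    "clean_logs": set(),
}
order = derive_sequence(deps)
print(select_sequence("daily_report"))
print(order)
```

The two counting statements illustrate steps "with no logical connection", while `clean_logs` and `build_report` illustrate steps that must run in a fixed position.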
Step 206: the scheduling module calls the data warehouse operation statement on the data warehouse platform according to the calling sequence.
Step 207: the data warehouse platform reads the configuration information corresponding to the data warehouse operation statement from the relational database.
In this step, the data warehouse platform is a Structured Query Language (SQL) parsing engine: it translates SQL statements into distributed data extraction/distributed data processing (Map/Reduce) jobs, which are then executed on the distributed platform, enabling rapid development. Tables stored in the data warehouse platform are directories on the distributed platform. Specifically, by default a table is stored under the data warehouse directory in the current working directory, in a folder named after the table; if the table is partitioned, each partition value is a subfolder. This data can be used directly by other Map/Reduce jobs. The data warehouse platform can be associated with a relational database. The files or directories on which a data warehouse operation statement operates are mapped to table name information, and the fields in the files are mapped to field information of the table to be operated on; both are stored in the relational database and together form the configuration information of that data warehouse operation statement. When the data warehouse platform receives a command that calls a data warehouse operation statement to perform a computation, it parses the command, reads the configuration information of the called statement from the relational database, and, according to this configuration information, translates the statement into a Map/Reduce process that performs the statistical computation.
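The mapping from directories and file fields to table names and field information can be sketched as a lookup against a relational store. In this sketch an in-memory SQLite database stands in for the relational database (MySQL in the description), and the two-table schema, the table name, and the directory path are hypothetical rather than the actual metadata layout used by Hive.

```python
import sqlite3

# In-memory SQLite stands in for the relational database; schema is hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE table_config (table_name TEXT PRIMARY KEY, hdfs_dir TEXT);
    CREATE TABLE field_config (table_name TEXT, position INTEGER,
                               field_name TEXT, field_type TEXT);
""")
db.execute("INSERT INTO table_config VALUES (?, ?)",
           ("login_logs", "/warehouse/login_logs"))
db.executemany("INSERT INTO field_config VALUES (?, ?, ?, ?)", [
    ("login_logs", 0, "user_id", "BIGINT"),
    ("login_logs", 1, "login_time", "STRING"),
])


def read_config(table_name: str) -> dict:
    """Return the directory and ordered fields mapped to a table name."""
    row = db.execute("SELECT hdfs_dir FROM table_config WHERE table_name = ?",
                     (table_name,)).fetchone()
    if row is None:
        raise KeyError(f"no configuration for table {table_name!r}")
    fields = db.execute(
        "SELECT field_name, field_type FROM field_config "
        "WHERE table_name = ? ORDER BY position", (table_name,)).fetchall()
    return {"directory": row[0], "fields": fields}


config = read_config("login_logs")
print(config)
```

With this information in hand, an engine knows which directory to scan and how to split each record into named fields before generating the Map/Reduce process.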
Step 208: the data warehouse platform triggers, according to the calling sequence, the data warehouse operation statement to operate on the data stored in the distributed platform, generates result data, and stores the result data in the distributed platform.
Step 209: the scheduling module controls the distributed platform to import the result data into the relational database.
In this step, specifically, the scheduling module uses an import algorithm to read from the distributed platform the result data generated by the data warehouse computation; this result data may be stored in the form of a result file. The scheduling module then imports the result data into multiple data tables in the relational database according to service requirements.
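The import in step 209 can be illustrated as follows. The source does not describe the import algorithm, so this is only one plausible shape: a tab-separated text buffer stands in for a result file read from the distributed platform, SQLite stands in for the relational database, and the file layout, table names, and sample values are all made up.

```python
import csv
import io
import sqlite3

# Tab-separated text stands in for a result file on the distributed platform.
# Columns (hypothetical): day, login count, signup count.
result_file = io.StringIO("day-1\t1200\t340\nday-2\t1350\t410\n")

db = sqlite3.connect(":memory:")  # stand-in for the relational database
db.execute("CREATE TABLE daily_logins (day TEXT PRIMARY KEY, logins INTEGER)")
db.execute("CREATE TABLE daily_signups (day TEXT PRIMARY KEY, signups INTEGER)")


def import_results(lines, conn) -> int:
    """Split each result row across the tables that the service needs."""
    imported = 0
    with conn:  # commit on success, roll back if any insert fails
        for day, logins, signups in csv.reader(lines, delimiter="\t"):
            conn.execute("INSERT INTO daily_logins VALUES (?, ?)",
                         (day, int(logins)))
            conn.execute("INSERT INTO daily_signups VALUES (?, ?)",
                         (day, int(signups)))
            imported += 1
    return imported


rows = import_results(result_file, db)
print(rows, db.execute("SELECT * FROM daily_logins ORDER BY day").fetchall())
```

Wrapping the inserts in one transaction keeps the tables consistent with each other if a malformed row interrupts the import.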
Step 210: the scheduling module controls the cache module to extract commonly used result data from the relational database according to the preset presentation strategy.
In this step, the presentation strategy is preset in the scheduling module and specifies which data is commonly used by the presentation platform. According to this strategy, the scheduling module has the result data that is commonly used by the presentation platform pulled from the relational database into the cache module. Specifically, the cache module may use memory cache (memcache) technology, a high-performance distributed in-memory object caching system that stores data in various formats, including images, videos, files, and database query results, by maintaining one large unified hash table in memory. Because the cache module is distributed, multiple users on different hosts can access it at the same time. This overcomes the limitation that shared memory can only be used on a single machine, reduces the load of database queries, and speeds up data access.
Step 211: the data presentation platform reads the commonly used result data from the cache module and displays it.
In this step, for the data that it commonly uses, the data presentation platform obtains the result data by reading it from the cache module and then displays it. Data that the data presentation platform does not commonly use cannot be read from the cache module, so step 212 below is executed for it.
Step 212: the data presentation platform reads the result data from the relational database and displays it.
In this step, for data that it does not commonly use, for example data that requires dynamic transformation or querying, the data presentation platform obtains the result data by reading it from the relational database and then displays it.
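Steps 210 to 212 amount to warming a cache with commonly used results and falling back to the database on a miss. The sketch below uses plain dictionaries in place of memcache and the relational database; the class name, the function name, the keys, and the sample values are hypothetical.

```python
class PresentationReader:
    """Read from the cache first and fall back to the relational database."""

    def __init__(self, cache: dict, database: dict):
        self.cache = cache          # stand-in for the memcache-style cache module
        self.database = database    # stand-in for the relational database
        self.database_reads = 0     # counts how often the fallback was needed

    def read(self, key: str):
        if key in self.cache:       # step 211: commonly used data
            return self.cache[key]
        self.database_reads += 1    # step 212: data not held in the cache
        return self.database[key]


def warm_cache(database: dict, common_keys, cache: dict) -> None:
    """Step 210: copy the results named by the presentation strategy."""
    for key in common_keys:
        if key in database:
            cache[key] = database[key]


database = {"daily_active_users": 1350, "retention_by_cohort": [0.6, 0.4]}
cache = {}
warm_cache(database, ["daily_active_users"], cache)

reader = PresentationReader(cache, database)
print(reader.read("daily_active_users"))   # served from the cache
print(reader.read("retention_by_cohort"))  # served from the database
print(reader.database_reads)
```

The `database_reads` counter makes the benefit visible: only requests for data outside the presentation strategy reach the database.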
Fig. 3 is a schematic structural diagram of the mass data processing system of Embodiment 3 of the present invention. As shown in Fig. 3, the system comprises at least: a scheduling module 31, a data warehouse platform 32, a relational database 33, and a distributed platform 34. On this basis it may further comprise: a data access platform 35, a message interface module 36, a cache module 37, and a data presentation platform 38. The message interface module 36 and the scheduling module 31 may both be located in the application system. For the processing performed by each component, see the descriptions of Embodiment 1 and Embodiment 2.
The scheduling module 31 judges, according to the acquired current service information and the preset scheduling strategy, whether to call the data warehouse operation statement; if so, it obtains the calling sequence according to the acquired current service information and the preset scheduling strategy, and calls the data warehouse operation statement on the data warehouse platform 32 according to the calling sequence.
The data warehouse platform 32 reads the configuration information corresponding to the data warehouse operation statement from the relational database 33, triggers, according to the calling sequence, the data warehouse operation statement to operate on the data stored in the distributed platform 34, generates result data, and stores the result data in the distributed platform 34.
The relational database 33 stores the configuration information corresponding to the data warehouse operation statement.
The distributed platform 34 stores the above data and the above result data.
On the basis of the above technical solution, when the system includes the data access platform 35 and the message interface module 36, the data access platform 35 transmits data to the distributed platform 34 at least once and, each time a transmission is completed, sends a data-transmission-complete message to the message interface module 36. The message interface module 36 receives the data-transmission-complete message. The scheduling module 31 obtains at least one data-transmission-complete message from the message interface module 36 as the current service information. Specifically, the data access platform 35 may send the data-transmission-complete message to the message interface module 36 using a message transmission scheme from Google, for example protoBuffer. The data access platform 35 handles data access for peripheral systems and supports real-time interface access. It generates text files, for example files in txt format, from the data it receives according to a prescribed format, and periodically transfers these text files into the HDFS file system of the distributed platform 34.
On the basis of the above technical solution, when the system includes the cache module 37, the scheduling module 31 also controls the distributed platform 34 to import the result data into the relational database 33, and controls the cache module 37 to extract commonly used result data from the relational database 33 according to the preset presentation strategy. The cache module 37 caches the commonly used result data.
The data presentation platform 38 is the interface that displays the result data finally produced by this data processing system. Its data comes from two sources: first, the cache module 37; second, the relational database 33. Specifically, the data presentation platform 38 reads the commonly used result data from the cache module 37 and displays it, and it also reads result data from the relational database 33 and displays it.
As the above embodiments show, a scheduling module is added to the mass data processing system. According to the current service information and the preset scheduling strategy, this module decides whether to call the data warehouse operation statement and determines the calling sequence, and the data processing is completed under its control. This avoids issuing commands one by one from the console, as in existing mass data processing systems. Because processing is controlled by the scheduling module, the scheduling strategy and calling sequence can be configured flexibly to match the logic of the service to be implemented, which improves the flexibility of mass data processing. In addition, the cache module stores the commonly used result data, and the data presentation platform reads result data from the cache module first and displays it; only when the required result data is not stored in the cache module does the data presentation platform read from the database. Adding the cache module therefore reduces the pressure that a large volume of accesses places on the data presentation platform.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A mass data processing method, characterized by comprising:
A scheduling module judges, according to acquired current service information and a preset scheduling strategy, whether to call a data warehouse operation statement; if so, it obtains a calling sequence according to the acquired current service information and the preset scheduling strategy;
The scheduling module calls the data warehouse operation statement on a data warehouse platform according to the calling sequence;
The data warehouse platform reads configuration information corresponding to the data warehouse from a relational database;
The data warehouse platform triggers, according to the calling sequence, the data warehouse operation statement to operate on data stored in a distributed platform, generates result data, and stores the result data in the distributed platform.
2. The mass data processing method according to claim 1, characterized in that, after the result file is generated and stored in the distributed platform, the method further comprises:
The scheduling module controls the distributed platform to import the result data into the relational database;
The scheduling module controls a cache module to extract commonly used result data from the relational database according to a preset presentation strategy;
A data presentation platform reads the commonly used result data from the cache module and displays it.
3. The mass data processing method according to claim 2, characterized in that, after the data presentation platform reads the commonly used result file from the cache module and displays it, the method further comprises:
The data presentation platform reads the result data from the relational database and displays it.
4. The mass data processing method according to any one of claims 1 to 3, characterized in that, before the scheduling module judges, according to the acquired current service information and the preset scheduling strategy, whether to call the data warehouse operation statement, the method further comprises:
A data access platform transmits data to the distributed platform at least once;
Each time a transmission is completed, the data access platform sends a data-transmission-complete message to a message interface module;
The scheduling module obtains at least one data-transmission-complete message from the message interface module as the current service information.
5. The mass data processing method according to claim 4, characterized in that the data access platform sending the data-transmission-complete message to the message interface module comprises:
The data access platform sends the data-transmission-complete message to the message interface module using Google's message transmission scheme protoBuffer.
6. A mass data processing system, characterized by comprising:
a scheduling module, configured to judge whether to call a data warehouse operation statement according to acquired current service information and a preset scheduling strategy and, when the judgment is yes, to acquire a calling sequence according to the acquired current service information and the preset scheduling strategy and to call the data warehouse operation statement to a data warehouse platform according to the calling sequence;
the data warehouse platform, configured to read configuration information corresponding to the data warehouse from a relational database, to trigger the data warehouse operation statement according to the calling sequence to perform operations on data stored in a distributed platform, and to generate result data and store the result data in the distributed platform;
the relational database, configured to store the configuration information corresponding to the data warehouse operation statement;
the distributed platform, configured to store the data and the result data.
7. The mass data processing system according to claim 6, characterized in that
the scheduling module is further configured to control the distributed platform to import the result data into the relational database, and to control a cache module to extract frequently used result data from the relational database according to a preset presentation strategy;
the system further comprises:
the cache module, configured to cache the frequently used result data;
a data presentation platform, configured to read the frequently used result data from the cache module and present it.
8. The mass data processing system according to claim 7, characterized in that
the data presentation platform is further configured to read the result data from the relational database and present it.
9. The mass data processing system according to any of claims 6 or 8, characterized in that the system further comprises:
a data access platform, configured to transmit data to the distributed platform at least once and, each time a transmission is completed, to send a data transmission completion message to a message interface module;
the message interface module, configured to receive the data transmission completion message;
the scheduling module is further configured to acquire at least one data transmission completion message from the message interface module as the current service information.
10. The mass data processing system according to claim 9, characterized in that
the data access platform is specifically configured to send the data transmission completion message to the message interface module using Google's message transmission scheme protoBuffer as the communication mode.
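The interaction recited in claims 4, 6 and 9 can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names, the `statement_config` table, the example statements and the rule "call a statement once all of its input datasets have arrived" are assumptions made for illustration, an in-memory SQLite database stands in for the relational database, and protoBuffer serialization of the completion messages is omitted.

```python
import sqlite3
from collections import deque


class MessageInterface:
    """Holds 'data transmission complete' messages from the data access platform."""

    def __init__(self):
        self._messages = deque()

    def publish(self, dataset):
        self._messages.append(dataset)

    def drain(self):
        drained = list(self._messages)
        self._messages.clear()
        return drained


class Scheduler:
    """Decides whether, and in what order, to call warehouse statements."""

    def __init__(self, policy, interface):
        # Hypothetical scheduling strategy: an ordered mapping
        # {statement name: set of datasets that must have arrived}.
        self.policy = policy
        self.interface = interface
        self.arrived = set()
        self.called = set()

    def plan(self):
        # Completion messages serve as the "current service information".
        self.arrived.update(self.interface.drain())
        order = [name for name, needs in self.policy.items()
                 if name not in self.called and needs <= self.arrived]
        self.called.update(order)
        return order  # an empty list means "do not call any statement"


class Warehouse:
    """Looks up each statement's configuration in a relational database."""

    def __init__(self, config_db):
        self.config_db = config_db
        self.executed = []

    def run(self, order):
        for name in order:
            row = self.config_db.execute(
                "SELECT hql FROM statement_config WHERE name = ?", (name,)
            ).fetchone()
            if row is None:
                raise KeyError(f"no configuration stored for {name!r}")
            # Stand-in for triggering the statement on the distributed platform.
            self.executed.append(row[0])


def demo():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE statement_config (name TEXT PRIMARY KEY, hql TEXT)")
    db.executemany("INSERT INTO statement_config VALUES (?, ?)", [
        ("daily_logins",
         "INSERT OVERWRITE TABLE daily_logins "
         "SELECT dt, COUNT(*) FROM logins GROUP BY dt"),
        ("daily_activity",
         "INSERT OVERWRITE TABLE daily_activity "
         "SELECT l.dt, COUNT(*) FROM logins l "
         "JOIN messages m ON l.uid = m.uid GROUP BY l.dt"),
    ])
    interface = MessageInterface()
    scheduler = Scheduler(
        {"daily_logins": {"logins"}, "daily_activity": {"logins", "messages"}},
        interface,
    )
    warehouse = Warehouse(db)
    interface.publish("logins")      # first transmission completed
    warehouse.run(scheduler.plan())
    interface.publish("messages")    # second transmission completed
    warehouse.run(scheduler.plan())
    return warehouse.executed
```

In this sketch, `plan()` returns an empty list until at least one completion message satisfies some statement's inputs, which corresponds to the judgment step of claim 6 coming out negative. The returned list plays the role of the calling sequence.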
CN 201110182296 2011-06-30 2011-06-30 Method and system for processing mass data Active CN102214236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110182296 CN102214236B (en) 2011-06-30 2011-06-30 Method and system for processing mass data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110182296 CN102214236B (en) 2011-06-30 2011-06-30 Method and system for processing mass data

Publications (2)

Publication Number Publication Date
CN102214236A (en) 2011-10-12
CN102214236B (en) 2013-10-23

Family

ID=44745544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110182296 Active CN102214236B (en) 2011-06-30 2011-06-30 Method and system for processing mass data

Country Status (1)

Country Link
CN (1) CN102214236B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750368A (en) * 2012-06-18 2012-10-24 天津神舟通用数据技术有限公司 High-speed importing method of cluster data in data base
CN102880503A (en) * 2012-08-24 2013-01-16 新浪网技术(中国)有限公司 Data analysis system and data analysis method
CN102904952A (en) * 2012-10-12 2013-01-30 北京锐安科技有限公司 Self-adapting system and method for efficiently processing input of mass data to database
CN102929961A (en) * 2012-10-10 2013-02-13 北京锐安科技有限公司 Data processing method and device thereof based on building quick data staging channel
CN104090901A (en) * 2013-12-31 2014-10-08 腾讯数码(天津)有限公司 Method, device and server for processing data
CN104102701A (en) * 2014-07-07 2014-10-15 浪潮(北京)电子信息产业有限公司 Hive-based method for filing and inquiring historical data
CN104298671A (en) * 2013-07-16 2015-01-21 深圳中兴网信科技有限公司 Data statistics analysis method and device
CN106446168A (en) * 2016-09-26 2017-02-22 北京赛思信安技术股份有限公司 Oriented distribution data warehouse high efficiency load client end realization method
CN106909641A (en) * 2017-02-16 2017-06-30 青岛高校信息产业股份有限公司 A kind of real-time data memory device
CN108153852A (en) * 2017-12-22 2018-06-12 中国平安人寿保险股份有限公司 A kind of data processing method, device, terminal device and storage medium
CN109408598A (en) * 2018-09-14 2019-03-01 深圳市新代信息技术研究院有限公司 A kind of mass data processing system for multimedia research and development training platform
CN111078770A (en) * 2019-11-28 2020-04-28 曙光信息产业股份有限公司 Data processing system, method and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1604042A (en) * 2003-09-30 2005-04-06 国际商业机器公司 Method for dispatching task, dispatcher and net computer system
CN101038559A (en) * 2006-09-11 2007-09-19 中国工商银行股份有限公司 Batch task scheduling engine and dispatching method
CN101127578A (en) * 2007-09-14 2008-02-20 广东威创日新电子有限公司 A method and system for processing a magnitude of data
CN101364891A (en) * 2007-08-10 2009-02-11 中兴通讯股份有限公司 System for collecting performance data by single point in distributed telecommunication network management and implementing method
US20100077107A1 (en) * 2008-09-19 2010-03-25 Oracle International Corporation Storage-side storage request management
US20100223297A1 (en) * 2007-03-30 2010-09-02 Alibaba Group Holding Limited Data Merging in Distributed Computing
CN101937524A (en) * 2009-06-30 2011-01-05 华中师范大学 Graduation design personalized guide system
CN102033912A (en) * 2010-11-25 2011-04-27 北京北纬点易信息技术有限公司 Distributed-type database access method and system

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750368B (en) * 2012-06-18 2014-03-26 天津神舟通用数据技术有限公司 High-speed importing method of cluster data in data base
CN102750368A (en) * 2012-06-18 2012-10-24 天津神舟通用数据技术有限公司 High-speed importing method of cluster data in data base
CN102880503B (en) * 2012-08-24 2015-04-15 新浪网技术(中国)有限公司 Data analysis system and data analysis method
CN102880503A (en) * 2012-08-24 2013-01-16 新浪网技术(中国)有限公司 Data analysis system and data analysis method
CN102929961A (en) * 2012-10-10 2013-02-13 北京锐安科技有限公司 Data processing method and device thereof based on building quick data staging channel
CN102929961B (en) * 2012-10-10 2016-12-21 北京锐安科技有限公司 Based on the data processing method and the device thereof that build rapid data classification passage
CN102904952A (en) * 2012-10-12 2013-01-30 北京锐安科技有限公司 Self-adapting system and method for efficiently processing input of mass data to database
CN102904952B (en) * 2012-10-12 2015-07-01 北京锐安科技有限公司 Self-adapting system and method for efficiently processing input of mass data to database
CN104298671B (en) * 2013-07-16 2018-02-13 深圳中兴网信科技有限公司 data statistical analysis method and device
CN104298671A (en) * 2013-07-16 2015-01-21 深圳中兴网信科技有限公司 Data statistics analysis method and device
CN104090901A (en) * 2013-12-31 2014-10-08 腾讯数码(天津)有限公司 Method, device and server for processing data
CN104090901B (en) * 2013-12-31 2017-06-13 腾讯数码(天津)有限公司 A kind of method that data are processed, device and server
CN104102701B (en) * 2014-07-07 2017-10-13 浪潮(北京)电子信息产业有限公司 A kind of historical data based on hive is achieved and querying method
CN104102701A (en) * 2014-07-07 2014-10-15 浪潮(北京)电子信息产业有限公司 Hive-based method for filing and inquiring historical data
CN106446168A (en) * 2016-09-26 2017-02-22 北京赛思信安技术股份有限公司 Oriented distribution data warehouse high efficiency load client end realization method
CN106446168B (en) * 2016-09-26 2019-11-01 北京赛思信安技术股份有限公司 A kind of load client realization method of Based on Distributed data warehouse
CN106909641A (en) * 2017-02-16 2017-06-30 青岛高校信息产业股份有限公司 A kind of real-time data memory device
CN106909641B (en) * 2017-02-16 2020-09-29 青岛高校信息产业股份有限公司 Real-time data memory
CN108153852A (en) * 2017-12-22 2018-06-12 中国平安人寿保险股份有限公司 A kind of data processing method, device, terminal device and storage medium
CN109408598A (en) * 2018-09-14 2019-03-01 深圳市新代信息技术研究院有限公司 A kind of mass data processing system for multimedia research and development training platform
CN111078770A (en) * 2019-11-28 2020-04-28 曙光信息产业股份有限公司 Data processing system, method and storage medium
CN111078770B (en) * 2019-11-28 2023-07-21 曙光信息产业股份有限公司 Data processing system, method and storage medium

Also Published As

Publication number Publication date
CN102214236B (en) 2013-10-23

Similar Documents

Publication Publication Date Title
CN102214236B (en) Method and system for processing mass data
CN107247808B (en) Distributed NewSQL database system and picture data query method
CN106897322B (en) A kind of access method and device of database and file system
KR101621137B1 (en) Low latency query engine for apache hadoop
CN102663117B (en) OLAP (On Line Analytical Processing) inquiry processing method facing database and Hadoop mixing platform
US20120130963A1 (en) User defined function database processing
CN112905595A (en) Data query method and device and computer readable storage medium
US10242051B2 (en) Efficient multi-tenant spatial and relational indexing
CN104679898A (en) Big data access method
CN105138679B (en) A kind of data processing system and processing method based on distributed caching
CN104903894A (en) System and method for distributed database query engines
CN103177027A (en) Method and system for obtaining dynamic feed index
CN104850572A (en) HBase non-primary key index building and inquiring method and system
CN101799808A (en) Data processing method and system thereof
CN110263061A (en) A kind of data query method and system
CN106484713A (en) A kind of based on service-oriented Distributed Request Processing system
CN107193898B (en) The inquiry sharing method and system of log data stream based on stepped multiplexing
CN107784103A (en) A kind of standard interface of access HDFS distributed memory systems
CN103853714A (en) Data processing method and device
CN103077197A (en) Data storing method and device
CN101329686A (en) System for implementing network search caching and search method
CN105159845A (en) Memory reading method
CN104036029A (en) Big data consistency comparison method and system
EP2469423A1 (en) Aggregation in parallel computation environments with shared memory
CN106055678A (en) Hadoop-based panoramic big data distributed storage method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: Room 810, 8 / F, 34 Haidian Street, Haidian District, Beijing 100080

Patentee after: BEIJING D-MEDIA COMMUNICATION TECHNOLOGY Co.,Ltd.

Address before: 100089 Beijing city Haidian District wanquanzhuang Road No. 28 Wanliu new building A block 5 layer

Patentee before: BEIJING D-MEDIA COMMUNICATION TECHNOLOGY Co.,Ltd.
