CN105243155A - Big data extracting and exchanging system - Google Patents

Publication number
CN105243155A
Authority
CN
China
Prior art keywords
data
csp
switching point
control switching
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510711186.6A
Other languages
Chinese (zh)
Inventor
姬源
黄育松
谢冬
王向东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Dispatch Control Center of Guizhou Power Grid Co Ltd
Original Assignee
Electric Power Dispatch Control Center of Guizhou Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Dispatch Control Center of Guizhou Power Grid Co Ltd filed Critical Electric Power Dispatch Control Center of Guizhou Power Grid Co Ltd
Priority to CN201510711186.6A priority Critical patent/CN105243155A/en
Publication of CN105243155A publication Critical patent/CN105243155A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a big data extracting and exchanging method and system. Efficient data exchange is achieved by combining a control and exchange center deployed on Spark with a plurality of exchange agents, supporting bidirectional data flow between relational databases, unstructured documents, and sensor databases on one side and the Hive, HBase, and HDFS systems of a Hadoop platform on the other, scheduling tasks in parallel, and storing all intermediate data in memory.

Description

A big data extraction and exchange system
Technical field
The present invention relates to a method and system for big data extraction and exchange. A control and exchange center deployed on the Spark platform works with a number of exchange agents to support bidirectional data flow between relational databases, unstructured documents, and sensor databases on one side and the Hive, HBase, and HDFS systems of the Hadoop platform on the other. Efficient data exchange is achieved by scheduling tasks in parallel and keeping all intermediate data in memory.
Background
As enterprise data volumes keep growing, the data that computers must process has grown from the MB level to the TB level and even the PB level. A single server can no longer store and analyze all of an enterprise's data, so the data must be extracted and aggregated into a big data platform for analysis and processing. An enterprise's legacy systems usually hold many types of data: business data stored in relational database systems, documents and log files stored as files, and real-time monitoring data from large numbers of sensors. Collecting all of this data efficiently and in real time is the first step toward a successful big data project.
Hadoop is currently the most commonly used big data platform software. It provides the runtime environment for MapReduce programs and supports distributed task execution. HDFS is a distributed file system that stores multiple replicas of its data and is therefore highly fault-tolerant. However, HDFS does not allow file contents to be modified in place; content can only be appended. Hive is a data warehouse whose data is stored in HDFS as unstructured text. Its upper layer provides an SQL-like query interface and a translation engine that automatically translates query statements into MapReduce programs for execution. Because the data is stored in HDFS, data in Hive can likewise only be read, not modified. HBase is a column-oriented database in which data is accessed by primary key. It does not support SQL queries, but it offers very high throughput and supports data modification.
Several big data collection systems that each handle a single type of source already exist. Sqoop, from the Hadoop ecosystem, supports parallel data extraction from relational databases; it currently supports databases such as Oracle, SQL Server, and MySQL, and runs extraction tasks in parallel through MapReduce. Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all the activity stream data of a consumer-scale website. Such activity (page views, searches, and other user actions) is a key ingredient of many social features on the modern web. Because of throughput requirements, this data is usually handled by processing logs and log aggregation. Nutch is a distributed crawler system that can fetch data from the Internet in parallel and store it in the Hadoop file system.
Tools for converting data between relational databases are widely used in enterprises; Oracle and SQL Server both provide tools for exporting data to other databases. Informatica and IBM also offer related products that support conversion of structured data such as relational data and semi-structured data such as XML. At present, however, no dedicated system supports convenient exchange between the systems of a big data platform and traditional relational databases and similar sources. Big data systems are numerous and still growing; NoSQL databases alone number in the dozens. Providing a system architecture that lets these databases plug into an exchange system is a challenging problem.
These big data collection systems currently exist independently of one another, and each Hadoop loading mechanism serves a single target. For example, data extracted from a relational database can only be loaded into Hive and cannot be loaded into HBase to serve fast queries. In addition, once data has been loaded into Hadoop, there is no method that supports moving it between the different Hadoop subsystems. For example, data in Hive may need large-scale cleaning, but Hive itself does not support data modification, so the data has to be transferred to HBase for processing.
Summary of the invention
In view of the above defects of the prior art, the technical problem to be solved by the present invention is to provide a big data extraction and exchange system that supports bidirectional data flow between relational databases, unstructured documents, and sensor databases and the Hive, HBase, and HDFS systems of the Hadoop platform, and that achieves efficient data exchange by scheduling tasks in parallel and keeping all intermediate data in memory.
To achieve the above object, the invention provides a big data extraction and exchange system comprising a control and exchange center deployed on the Spark platform. The Spark platform and the Hadoop platform are deployed in the same cluster through the Yarn resource management framework. The control and exchange center stores its in-memory objects in Spark, and all intermediate data and all conversion tasks between different data models are also handled by Spark;
The system comprises relational database systems, unstructured documents, and sensor data, each distributed across different servers;
The system comprises an independently cluster-deployed Hadoop big data platform, which comprises the HDFS, HBase, and Hive subsystems and is used to load the extracted data and to provide analysis functions;
The system comprises exchange agents deployed on the different data source systems or on the control and exchange center, which interact with the data sources through remote interfaces;
The system comprises a control message channel and a data channel between the exchange agents and the control and exchange center;
The control and exchange center comprises a task scheduling module, an in-memory object management module, and a data conversion module;
The task scheduling module schedules the exchange agents' data extraction tasks, data loading tasks, data model conversion tasks, and data transfer tasks;
The in-memory object management module manages the storage and updating of intermediate data;
The data conversion module converts between the different data models and the unified in-memory object model;
The control and exchange center notifies the exchange agent of a data source to extract data and transfer it to the control and exchange center, and converts the data from the source data model to the in-memory object model;
The control and exchange center is also responsible for unified task scheduling:
A) the data conversion tasks of the control and exchange center are developed in a programming language supported by Spark;
B) the order of task execution is scheduled according to demand or according to resource utilization;
When memory is insufficient to store newly arriving data, the control and exchange center notifies the exchange agent, according to the scheduling policy, to suspend the data extraction task; the task resumes once enough memory is available.
Preferably, the control and exchange center writes a log entry before each operation. When the system fails, it is restarted after the fault, the state before the failure is restored, all lost data is extracted again, and the in-memory data is rebuilt.
Preferably, the control and exchange center stores the intermediate data of an exchange in a unified in-memory object model, and the exchange agent of each data source maps and converts between the source's data model and the in-memory object model. The unified in-memory object model stores data as Spark RDDs. Data is relayed in memory and is not written to disk.
Preferably, the control and exchange center manages waiting tasks with a queue.
The beneficial effects of the invention are as follows. The invention supports data exchange between the different types of data in legacy systems and the different types of systems in a big data platform, and also supports data exchange between different systems within the big data platform, so it can meet different processing needs. All exchange tasks are scheduled and controlled in a unified way, which improves the efficiency of data exchange.
Brief description of the drawings
Fig. 1 is a structural diagram of an embodiment of the invention.
Fig. 2 shows an example of an exchange from a relational database to the Hive system.
Fig. 3 shows an example of an exchange from a sensor database to the HBase system.
Fig. 4 shows an example of an exchange from a file system to HDFS.
Detailed description
The invention is further described below with reference to the drawings and embodiments. A big data extraction and exchange system comprises a control and exchange center deployed on the Spark platform. The Spark platform and the Hadoop platform are deployed in the same cluster through the Yarn resource management framework. The control and exchange center stores its in-memory objects in Spark, and all intermediate data and all conversion tasks between different data models are also handled by Spark;
The system comprises relational database systems, unstructured documents, and sensor data, each distributed across different servers; data extraction and loading are carried out by the exchange agents.
The system comprises an independently cluster-deployed Hadoop big data platform, which comprises the HDFS, HBase, and Hive subsystems and is used to load the extracted data and to provide analysis functions;
The system comprises exchange agents deployed on the different data source systems or on the control and exchange center, which interact with the data sources through remote interfaces;
The system comprises a control message channel and a data channel between the exchange agents and the control and exchange center;
The control and exchange center comprises a task scheduling module, an in-memory object management module, and a data conversion module;
The task scheduling module schedules the exchange agents' data extraction tasks, data loading tasks, data model conversion tasks, and data transfer tasks;
The in-memory object management module manages the storage and updating of intermediate data; once data has been converted to the target data model, the original data object can be deleted so that memory is reclaimed as early as possible.
The data conversion module converts between the different data models and the unified in-memory object model;
The control and exchange center notifies the exchange agent of a data source to extract data and transfer it to the control and exchange center, and converts the data from the source data model to the in-memory object model;
The control and exchange center is also responsible for unified task scheduling:
A) the data conversion tasks of the control and exchange center are developed in a programming language supported by Spark, and task execution uses distributed memory;
B) the order of task execution is scheduled according to demand or according to resource utilization;
When memory is insufficient to store newly arriving data, the control and exchange center notifies the exchange agent, according to the scheduling policy, to suspend the data extraction task; the task resumes once enough memory is available.
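The suspend-and-resume behaviour described above can be illustrated with a minimal sketch. It is not the patented implementation: the class names `SourceAgent` and `ControlCenter`, the record-count memory budget, and the `release` method are all assumptions made for illustration, and plain Python lists stand in for Spark-managed memory.

```python
from dataclasses import dataclass, field


@dataclass
class SourceAgent:
    """Hypothetical stand-in for an exchange agent that extracts data in batches."""
    name: str
    batches: list            # pending batches, each a list of records
    paused: bool = False

    def next_batch(self):
        return self.batches.pop(0) if self.batches else None


@dataclass
class ControlCenter:
    """Hypothetical control and exchange center with a fixed budget of records."""
    capacity: int
    held: list = field(default_factory=list)   # intermediate data kept in memory

    def free(self):
        return self.capacity - len(self.held)

    def pull(self, agent):
        """Accept the agent's next batch if it fits; otherwise tell the agent to pause."""
        if not agent.batches:
            return False
        if len(agent.batches[0]) > self.free():
            agent.paused = True       # notify the agent: suspend extraction
            return False
        agent.paused = False          # enough room: extraction continues
        self.held.extend(agent.next_batch())
        return True

    def release(self, n):
        """Drop the n oldest records, e.g. after they were loaded into the target."""
        del self.held[:n]
```

In this sketch the agent stays paused until `release` frees enough room, at which point the next `pull` succeeds and clears the `paused` flag.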
In this embodiment, all data in memory may be lost when the system fails. The control and exchange center therefore writes a log entry before each operation. After a fault the system is restarted, the state before the failure is restored, all lost data is extracted again, and the in-memory data is rebuilt.
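One way to picture log-before-operation recovery is the sketch below. The log format (one JSON object per line), the state names, and the `OperationLog` class are assumptions; the patent does not specify them. The log records only which operations were started and finished, so that after a restart the unfinished extractions can be identified and run again.

```python
import json
import os


class OperationLog:
    """Append-only log written before each operation (hypothetical format)."""

    def __init__(self, path):
        self.path = path

    def record(self, task_id, state):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"task": task_id, "state": state}) + "\n")
            f.flush()
            os.fsync(f.fileno())   # make the entry durable before acting on it

    def tasks_to_redo(self):
        """Return tasks whose last logged state is not 'done', in first-seen order."""
        last = {}
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    entry = json.loads(line)
                    last[entry["task"]] = entry["state"]
        return [task for task, state in last.items() if state != "done"]
```

After a restart, a new `OperationLog` opened on the same path reports the tasks whose data was held only in memory, and those tasks are extracted again.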
In this embodiment, the control and exchange center stores the intermediate data of an exchange in a unified in-memory object model, and the exchange agent of each data source maps and converts between the source's data model and the in-memory object model. The unified in-memory object model stores data as Spark RDDs. Data is relayed in memory and is not written to disk.
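The role of the in-memory object management module can be sketched as follows. This is an illustration under assumptions: the class `MemoryObjectManager` and its methods are hypothetical, and a Python dict of lists stands in for the Spark RDDs that the embodiment uses. The sketch shows intermediate data held only in memory and the source object being released as soon as its conversion is complete.

```python
class MemoryObjectManager:
    """Keeps intermediate data in memory only and frees it once it is converted."""

    def __init__(self):
        self._objects = {}

    def put(self, key, records):
        self._objects[key] = list(records)

    def convert(self, source_key, target_key, convert_one):
        """Convert the source records, store the result, and drop the source."""
        records = self._objects.pop(source_key)   # original object is released here
        self._objects[target_key] = [convert_one(r) for r in records]
        return self._objects[target_key]

    def keys(self):
        return sorted(self._objects)
```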
The control and exchange center uses a large amount of memory to store intermediate data, so tasks must be scheduled to avoid running out of memory. In this embodiment, the control and exchange center manages waiting tasks with a queue.
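A waiting-task queue of this kind might look like the sketch below. The text only states that a queue is used; the numeric priorities, the per-task memory estimate, and the rule of skipping tasks that do not currently fit are assumptions added for illustration.

```python
import heapq
import itertools


class TaskQueue:
    """Waiting tasks ordered by priority (lower runs first), FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker that preserves submission order

    def submit(self, name, memory_needed, priority=10):
        heapq.heappush(self._heap, (priority, next(self._seq), name, memory_needed))

    def next_runnable(self, free_memory):
        """Pop the highest-priority task that fits in free_memory, or return None."""
        skipped, chosen = [], None
        while self._heap:
            item = heapq.heappop(self._heap)
            if item[3] <= free_memory:
                chosen = item[2]
                break
            skipped.append(item)
        for item in skipped:            # tasks that did not fit keep waiting
            heapq.heappush(self._heap, item)
        return chosen
```

A task that needs more memory than is currently free stays in the queue, so smaller tasks can proceed while it waits.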
Fig. 1 shows the overall composition of the system. The left side of the figure represents the data resources of existing legacy systems, and the right side represents the big data platform, currently built on Hadoop. The middle of the figure is the control and exchange center, which is deployed on the Spark platform and is responsible for scheduling and executing big data extraction and exchange tasks. An exchange agent is an independent service deployed on the machine that hosts a data source or a data loading target; it exchanges data and messages with that source or target. An exchange agent can also be deployed together with the control and exchange center.
The control and exchange center records the interaction mode that the user selects between two agents, including the mapping rules for metadata and data, the time and frequency at which the exchange runs, and whether the exchange covers the full data set or only incremental data. The control center sends messages to the agents, ordering them to perform the corresponding operations, and schedules tasks by priority according to user-defined rules.
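The information recorded for one exchange could be represented as in the following sketch. The field names, the `Mode` enum, and the shape of the control message are hypothetical; the patent lists what is recorded but not how.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    FULL = "full"
    INCREMENTAL = "incremental"


@dataclass
class ExchangeJob:
    """One user-defined exchange between a source agent and a target agent."""
    source_agent: str
    target_agent: str
    field_mapping: dict   # source field name -> target field name
    start_at: str         # time of day at which the exchange runs, e.g. "02:00"
    every_hours: int      # how often the exchange repeats
    mode: Mode = Mode.FULL
    priority: int = 10    # user-defined scheduling priority

    def control_message(self):
        """Message the control center would send to order the source agent to extract."""
        return {
            "to": self.source_agent,
            "command": "extract",
            "mode": self.mode.value,
            "fields": sorted(self.field_mapping),
        }
```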
All data exchange is carried out by the control center directing the agents. Specific data exchange tasks must be developed in advance; they can be supplied with the system or custom-developed by the application side. A data exchange task is a conversion between a data source's data model and the in-memory object model. All data exchange tasks are written in Scala or Java, both of which the Spark platform supports, and are ultimately deployed to and executed on the Spark platform, which enables parallel extraction and conversion. The purpose is to use the Spark platform's distributed memory technology to manage the massive volume of intermediate data.
Agents come in different types, and a corresponding agent is developed for each type of data conversion. For example, to move data from a relational database to HBase, an agent S is developed to extract data from the relational database and an agent D is developed to load data into HBase. The agent-plus-control-center architecture makes the system extensible: for a new kind of data source, developing the corresponding exchange agent is enough to exchange its data with the systems already connected.
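The extensibility argument can be made concrete with a small sketch of a common agent interface and a registry. Everything here is illustrative: the interface, the `InMemoryAgent` toy class, and the `REGISTRY` dict are assumptions, and the embodiment itself specifies Scala or Java on Spark rather than Python.

```python
from abc import ABC, abstractmethod


class ExchangeAgent(ABC):
    """Common interface; each data source or target system gets its own subclass."""

    @abstractmethod
    def extract(self):
        """Return records from the underlying system as a list."""

    @abstractmethod
    def load(self, records):
        """Write records into the underlying system."""


class InMemoryAgent(ExchangeAgent):
    """Toy agent backed by a Python list, standing in for a real database client."""

    def __init__(self, rows=None):
        self.rows = list(rows or [])

    def extract(self):
        return list(self.rows)

    def load(self, records):
        self.rows.extend(records)


REGISTRY = {}   # system kind -> agent, as the control center might keep it


def register(kind, agent):
    REGISTRY[kind] = agent


def exchange(source_kind, target_kind, convert=lambda r: r):
    """Extract from the source agent, convert each record, load into the target agent."""
    records = REGISTRY[source_kind].extract()
    REGISTRY[target_kind].load([convert(r) for r in records])
    return len(records)
```

Supporting a new data source in this sketch means writing one more `ExchangeAgent` subclass and registering it; `exchange` itself does not change.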
An agent can also be deployed at the control and exchange center. For data sources that expose a service interface, such as relational databases, an agent located at the control and exchange center can extract the data through the remote interface.
Three concrete scenarios are described next.
The first scenario is shown in Fig. 2. A subsidiary of a national power grid company needs to analyze a large number of user electricity consumption records stored in a relational database, but the traditional database cannot deliver the query and analysis performance required, so the data needs to be loaded into the Hive system of the Hadoop platform for analysis.
First, an exchange agent for the corresponding database is selected, for example an Oracle exchange agent, to extract the data and transfer it to the control and exchange center. The control and exchange center converts the relational data into the in-memory object model, in which each table corresponds to one class. The in-memory objects are stored in a Spark RDD, that is, in distributed memory. The control and exchange center then converts the in-memory objects in the RDD into the Hive data model and hands them to the Hive exchange agent, which writes the data into the Hive system.
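The two conversions in this scenario (relational row to in-memory object, in-memory object to Hive row) can be sketched as below. The table `USAGE(user_id, day, kwh)` and the class `UsageRecord` are invented for illustration, and the objects are handled as plain Python values rather than elements of a Spark RDD. The default delimiter is Ctrl-A, which Hive uses by default to separate fields in text files.

```python
from dataclasses import dataclass, astuple


@dataclass
class UsageRecord:
    """In-memory object for one hypothetical table, USAGE(user_id, day, kwh)."""
    user_id: int
    day: str
    kwh: float


def from_relational(row):
    """Map a database row (a tuple in column order) to the in-memory object."""
    user_id, day, kwh = row
    return UsageRecord(int(user_id), str(day), float(kwh))


def to_hive_line(record, delimiter="\x01"):
    """Serialize the object as one delimited text line for a Hive text table."""
    return delimiter.join(str(value) for value in astuple(record))
```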
Before the analysis in Hive, a large amount of dirty data is found and needs to be cleaned. Hive does not support modifying data, so the affected data is moved to HBase, cleaned there, and then written back to Hive. This requires two-way data transfer between Hive and HBase. As the figure shows, the arrows between the agents, the control and exchange center, and the big data systems are all bidirectional. A Hive agent and an HBase agent must be selected. Because the Hive and HBase data models differ, the user must define rules that specify which Hive tables are converted into which HBase tables, which columns serve as the HBase row key, and which columns are placed in which HBase column families and in what form. The invention does not enumerate specific mapping methods; they can be implemented in several ways.
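Since the mapping method is left open, the following is only one possible shape for such a rule. The function `to_hbase_put`, the `#` key separator, and the choice to skip empty values are assumptions; the sketch produces a row key and a dict of `family:qualifier` cells rather than calling any HBase client.

```python
def to_hbase_put(row, key_columns, families, key_sep="#"):
    """Turn a Hive row (dict) into (row_key, {"family:qualifier": value}).

    key_columns: Hive columns concatenated to form the HBase row key.
    families:    {family_name: [hive_column, ...]} for the remaining columns.
    """
    row_key = key_sep.join(str(row[c]) for c in key_columns)
    cells = {}
    for family, columns in families.items():
        for column in columns:
            if row.get(column) is not None:     # absent values produce no cell
                cells[f"{family}:{column}"] = str(row[column])
    return row_key, cells
```

For example, a rule that uses `user_id` and `day` as the row key and puts `kwh` in a family named `m` turns one Hive row into one row key plus a single `m:kwh` cell.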
When writing the modified data back to Hive, the corresponding original table can be deleted first and the modified data then written; alternatively, the modified data can be written to a new table that coexists with the original data in Hive.
The second scenario is shown in Fig. 3. In a concrete deployment, sensors are installed on a large number of power devices and in their surroundings to collect equipment operating parameters, temperature, humidity, and similar information in real time, and the readings are stored in a key-value database on a sensor server. Because the accumulated data volume is very large, the data now needs to be migrated to HBase. A key-value database exchange agent can be selected for data extraction and an HBase exchange agent for data loading. The key-value type maps easily onto the HBase data model; HBase is in effect reduced to a key-value database.
The third scenario is shown in Fig. 4. Some servers continuously produce large numbers of files, which may come from web crawlers or from server logs. Key information needs to be extracted from these files in real time and stored in HDFS at high speed. To improve data throughput, a dedicated agent can be designed. This agent contains a hook program that captures the operating system's file messages. When the server is about to perform a file write, the hook synchronously parses the message, obtains the file content, and places it in memory. The exchange agent then sends the content to the control and exchange center, which hands it to the HDFS exchange agent to write the file. All intermediate data therefore stays in memory, which greatly reduces disk I/O and improves data transfer efficiency.
The preferred embodiments of the invention have been described in detail above. It should be understood that those of ordinary skill in the art can make many modifications and variations according to the concept of the invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain on the basis of the prior art through logical analysis, reasoning, or limited experimentation under the concept of the invention shall fall within the scope of protection determined by the claims.

Claims (4)

1. A big data extraction and exchange system, characterized in that:
The system comprises a control and exchange center deployed on the Spark platform; the Spark platform and the Hadoop platform are deployed in the same cluster through the Yarn resource management framework; the control and exchange center stores its in-memory objects in Spark, and all intermediate data and all conversion tasks between different data models are also handled by Spark;
The system comprises relational database systems, unstructured documents, and sensor data, each distributed across different servers;
The system comprises an independently cluster-deployed Hadoop big data platform, which comprises the HDFS, HBase, and Hive subsystems and is used to load the extracted data and to provide analysis functions;
The system comprises exchange agents deployed on the different data source systems or on the control and exchange center, which interact with the data sources through remote interfaces;
The system comprises a control message channel and a data channel between the exchange agents and the control and exchange center;
The control and exchange center comprises a task scheduling module, an in-memory object management module, and a data conversion module;
The task scheduling module schedules the exchange agents' data extraction tasks, data loading tasks, data model conversion tasks, and data transfer tasks;
The in-memory object management module manages the storage and updating of intermediate data;
The data conversion module converts between the different data models and the unified in-memory object model;
The control and exchange center notifies the exchange agent of a data source to extract data and transfer it to the control and exchange center, and converts the data from the source data model to the in-memory object model;
The control and exchange center is also responsible for unified task scheduling:
A) the data conversion tasks of the control and exchange center are developed in a programming language supported by Spark;
B) the order of task execution is scheduled according to demand or according to resource utilization;
When memory is insufficient to store newly arriving data, the control and exchange center notifies the exchange agent, according to the scheduling policy, to suspend the data extraction task; the task resumes once enough memory is available.
2. The big data extraction and exchange system according to claim 1, characterized in that: the control and exchange center writes a log entry before each operation; when the system fails, it is restarted after the fault, the state before the failure is restored, all lost data is extracted again, and the in-memory data is rebuilt.
3. The big data extraction and exchange system according to claim 1, characterized in that: the control and exchange center stores the intermediate data of an exchange in a unified in-memory object model, and the exchange agent of each data source maps and converts between the source's data model and the in-memory object model; the unified in-memory object model stores data as Spark RDDs; data is relayed in memory and is not written to disk.
4. The big data extraction and exchange system according to claim 1, 2, or 3, characterized in that: the control and exchange center manages waiting tasks with a queue.
CN201510711186.6A 2015-10-29 2015-10-29 Big data extracting and exchanging system Pending CN105243155A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510711186.6A CN105243155A (en) 2015-10-29 2015-10-29 Big data extracting and exchanging system

Publications (1)

Publication Number Publication Date
CN105243155A true CN105243155A (en) 2016-01-13

Family

ID=55040803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510711186.6A Pending CN105243155A (en) 2015-10-29 2015-10-29 Big data extracting and exchanging system

Country Status (1)

Country Link
CN (1) CN105243155A (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105786996A (en) * 2016-02-18 2016-07-20 国网智能电网研究院 Electricity information data quality analyzing system
CN106250444A (en) * 2016-07-27 2016-12-21 北京集奥聚合科技有限公司 The real-time Input System of a kind of heterogeneous data source and method
CN106844496A (en) * 2016-12-26 2017-06-13 山东中创软件商用中间件股份有限公司 Data transmission scheduling method, device and server based on ESB
CN106897411A (en) * 2017-02-20 2017-06-27 广东奡风科技股份有限公司 ETL system and its method based on Spark technologies
CN107193854A (en) * 2016-03-14 2017-09-22 商业对象软件有限公司 Uniform client for distributed processing platform
CN107247799A (en) * 2017-06-27 2017-10-13 北京天机数测数据科技有限公司 Data processing method, system and its modeling method of compatible a variety of big data storages
CN107395669A (en) * 2017-06-01 2017-11-24 华南理工大学 A kind of collecting method and system based on the real-time distributed big data of streaming
WO2018129787A1 (en) * 2017-01-10 2018-07-19 网宿科技股份有限公司 Data persistence method and system in stream computing
CN108334603A (en) * 2018-02-01 2018-07-27 广东聚晨知识产权代理有限公司 A kind of big data interaction exchange system
CN108563787A (en) * 2018-04-26 2018-09-21 郑州云海信息技术有限公司 A kind of data interaction management system and method for data center's total management system
CN108733758A (en) * 2018-04-11 2018-11-02 北京三快在线科技有限公司 Hotel's static data method for pushing, device, electronic equipment and readable storage medium storing program for executing
CN108762921A (en) * 2018-05-18 2018-11-06 电子科技大学 A kind of method for scheduling task and device of the on-line optimization subregion of Spark group systems
CN108804606A (en) * 2018-05-29 2018-11-13 上海欣能信息科技发展有限公司 A kind of electric power measures class Data Migration to the method and system of HBase
CN109460408A (en) * 2018-10-29 2019-03-12 成都四方伟业软件股份有限公司 A kind of data processing method and device
CN109617734A (en) * 2018-12-25 2019-04-12 北京市天元网络技术股份有限公司 Network operation capability analysis method and device
CN110955645A (en) * 2019-10-10 2020-04-03 望海康信(北京)科技股份公司 Big data integration processing method and system
CN110971685A (en) * 2019-11-29 2020-04-07 腾讯科技(深圳)有限公司 Content processing method, content processing device, computer equipment and storage medium
CN111309719A (en) * 2020-05-13 2020-06-19 深圳市赢时胜信息技术股份有限公司 Data standardization method and system corresponding to HBase database
CN112015795A (en) * 2020-08-21 2020-12-01 广州欢网科技有限责任公司 System and method for large-data-volume ad hoc query
CN112671851A (en) * 2020-12-14 2021-04-16 南方电网数字电网研究院有限公司 Monitoring and early warning system using remote agent unit
CN112685385A (en) * 2020-12-31 2021-04-20 广西中科曙光云计算有限公司 Big data platform for smart city construction

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104361110A (en) * 2014-12-01 2015-02-18 广东电网有限责任公司清远供电局 Mass electricity consumption data analysis system as well as real-time calculation method and data mining method
US20150066646A1 (en) * 2013-08-27 2015-03-05 Yahoo! Inc. Spark satellite clusters to hadoop data stores
CN104699723A (en) * 2013-12-10 2015-06-10 北京神州泰岳软件股份有限公司 Data exchange adapter and system and method for synchronizing data among heterogeneous systems

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李佳玮 等: "电网企业大数据技术应用研究", 《电力信息与通信技术》 *

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105786996A (en) * 2016-02-18 2016-07-20 国网智能电网研究院 Electricity information data quality analyzing system
CN107193854A (en) * 2016-03-14 2017-09-22 商业对象软件有限公司 Uniform client for distributed processing platform
CN107193854B (en) * 2016-03-14 2022-02-25 商业对象软件有限公司 Unified client for distributed processing platform
CN106250444A (en) * 2016-07-27 2016-12-21 北京集奥聚合科技有限公司 The real-time Input System of a kind of heterogeneous data source and method
CN106844496A (en) * 2016-12-26 2017-06-13 山东中创软件商用中间件股份有限公司 Data transmission scheduling method, device and server based on ESB
CN106844496B (en) * 2016-12-26 2020-04-10 山东中创软件商用中间件股份有限公司 Data transmission scheduling method and device based on enterprise service bus and server
WO2018129787A1 (en) * 2017-01-10 2018-07-19 网宿科技股份有限公司 Data persistence method and system in stream computing
CN106897411A (en) * 2017-02-20 2017-06-27 广东奡风科技股份有限公司 ETL system and method based on Spark technology
CN107395669A (en) * 2017-06-01 2017-11-24 华南理工大学 Data acquisition method and system based on streaming real-time distributed big data
CN107395669B (en) * 2017-06-01 2020-04-07 华南理工大学 Data acquisition method and system based on streaming real-time distributed big data
CN107247799A (en) * 2017-06-27 2017-10-13 北京天机数测数据科技有限公司 Data processing method, system, and modeling method compatible with multiple big data stores
CN108334603A (en) * 2018-02-01 2018-07-27 广东聚晨知识产权代理有限公司 Big data interactive exchange system
CN108733758A (en) * 2018-04-11 2018-11-02 北京三快在线科技有限公司 Hotel static data pushing method and device, electronic equipment and readable storage medium
CN108733758B (en) * 2018-04-11 2022-04-05 北京三快在线科技有限公司 Hotel static data pushing method and device, electronic equipment and readable storage medium
CN108563787A (en) * 2018-04-26 2018-09-21 郑州云海信息技术有限公司 Data interaction management system and method for a data center integrated management system
CN108762921B (en) * 2018-05-18 2019-07-12 电子科技大学 Task scheduling method and device for online partition optimization of a Spark cluster system
CN108762921A (en) * 2018-05-18 2018-11-06 电子科技大学 Task scheduling method and device for online partition optimization of a Spark cluster system
CN108804606A (en) * 2018-05-29 2018-11-13 上海欣能信息科技发展有限公司 Method and system for migrating power measurement data to HBase
CN108804606B (en) * 2018-05-29 2021-08-31 上海欣能信息科技发展有限公司 Method and system for migrating power measurement data to HBase
CN109460408A (en) * 2018-10-29 2019-03-12 成都四方伟业软件股份有限公司 Data processing method and device
CN109617734A (en) * 2018-12-25 2019-04-12 北京市天元网络技术股份有限公司 Network operation capability analysis method and device
CN109617734B (en) * 2018-12-25 2021-12-07 北京市天元网络技术股份有限公司 Network operation capability analysis method and device
CN110955645A (en) * 2019-10-10 2020-04-03 望海康信(北京)科技股份公司 Big data integration processing method and system
CN110955645B (en) * 2019-10-10 2022-10-11 望海康信(北京)科技股份公司 Big data integration processing method and system
CN110971685A (en) * 2019-11-29 2020-04-07 腾讯科技(深圳)有限公司 Content processing method, content processing device, computer equipment and storage medium
CN111309719B (en) * 2020-05-13 2020-08-21 深圳市赢时胜信息技术股份有限公司 Data standardization method and system corresponding to HBase database
CN111309719A (en) * 2020-05-13 2020-06-19 深圳市赢时胜信息技术股份有限公司 Data standardization method and system corresponding to HBase database
CN112015795A (en) * 2020-08-21 2020-12-01 广州欢网科技有限责任公司 System and method for large-data-volume ad hoc query
CN112671851A (en) * 2020-12-14 2021-04-16 南方电网数字电网研究院有限公司 Monitoring and early warning system using remote agent unit
CN112685385A (en) * 2020-12-31 2021-04-20 广西中科曙光云计算有限公司 Big data platform for smart city construction
CN112685385B (en) * 2020-12-31 2021-11-16 广西中科曙光云计算有限公司 Big data platform for smart city construction

Similar Documents

Publication Publication Date Title
CN105243155A (en) Big data extracting and exchanging system
CN112534396B (en) Journaled tables in database systems
US11422982B2 (en) Scaling stateful clusters while maintaining access
US11093466B2 (en) Incremental out-of-place updates for index structures
Liu et al. Survey of real-time processing systems for big data
US10122783B2 (en) Dynamic data-ingestion pipeline
JP6416194B2 (en) Scalable analytic platform for semi-structured data
US10684990B2 (en) Reconstructing distributed cached data for retrieval
US10990288B2 (en) Systems and/or methods for leveraging in-memory storage in connection with the shuffle phase of MapReduce
US10061834B1 (en) Incremental out-of-place updates for datasets in data stores
Zhang et al. A video cloud platform combing online and offline cloud computing technologies
WO2014011434A2 (en) System and method for economical migration of legacy applications from mainframe and distributed platforms
CN104462185A (en) Digital library cloud storage system based on mixed structure
CN104239377A (en) Platform-crossing data retrieval method and device
CN112084190A (en) Big data based acquired data real-time storage and management system and method
Sundarakumar et al. A comprehensive study and review of tuning the performance on database scalability in big data analytics
CN104035522A (en) Large database appliance
Marcu KerA: A Unified Ingestion and Storage System for Scalable Big Data Processing
CN102360382B (en) High-speed object-based parallel storage system directory replication method
Su et al. A survey on big data analytics technologies
CN109800208B (en) Network traceability system and its data processing method, computer storage medium
CN108334603A (en) Big data interactive exchange system
Bui et al. ROARS: a scalable repository for data intensive scientific computing
CN204102026U (en) Large database appliance
Wu Big data processing with Hadoop

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160113