CN108334603A - A kind of big data interaction exchange system - Google Patents

A kind of big data interaction exchange system

Info

Publication number
CN108334603A
Authority
CN
China
Prior art keywords
data
csp
switching point
exchange system
control switching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810100144.2A
Other languages
Chinese (zh)
Inventor
郑英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Ji Chen Intellectual Property Agency Co Ltd
Original Assignee
Guangdong Ji Chen Intellectual Property Agency Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-02-01
Filing date
2018-02-01
Publication date
2018-07-27
Application filed by Guangdong Ji Chen Intellectual Property Agency Co Ltd
Priority to CN201810100144.2A
Publication of CN108334603A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a big data interaction exchange system. The system comprises: a data acquisition module, which collects data according to user behavior such as past browsing history and purchase records; a computing module, which processes the data collected by the data acquisition module and converts it into computer language; a database, which stores the collected data as converted into computer language by the computing module; an operating system, through which the information stored in the database can be retrieved and sent to cloud computing; and cloud computing, which receives the data to be calculated and processes it. With the embodiments of the present invention, the big data interaction exchange system can improve the precision of website advertising and of store merchandise display.

Description

A kind of big data interaction exchange system
Technical field
The invention belongs to the field of electronic technology, and more particularly relates to a big data interaction exchange system.
Background art
As enterprise data volumes keep growing, the data that computers must process has risen from the MB level to the TB and even the PB level. A single server can no longer store and analyze all of an enterprise's data, so the data must be extracted and aggregated into a big data platform for analysis. An enterprise's legacy systems typically hold many types of data, including business data stored in relational database systems, documents and log files stored as files, and real-time monitoring data from large numbers of sensors. Acquiring all of this data efficiently and in real time is the first step toward a successful big data project.
Several single-purpose big data acquisition systems already exist. Sqoop, from the Hadoop ecosystem, extracts data from relational databases in parallel; it currently supports Oracle, SQL Server, MySQL and various other relational databases, and runs extraction tasks in parallel through MapReduce. Kafka, a distributed message collection system, is a high-throughput distributed publish-subscribe messaging system that can handle all the action stream data of a consumer-scale website. Such actions (page views, searches and other user actions) are a key ingredient of many social features on the modern web. Because of throughput requirements, this data is usually handled through log processing and log aggregation. Nutch, a distributed crawler system, fetches data from the internet in parallel and stores it in the Hadoop file system.
Tools for converting between relational databases are widely used in enterprises; Oracle and SQL Server, among others, each provide tools for importing and exporting data to and from other databases. Informatica and IBM also offer related products that support conversion of structured and semi-structured data such as relational databases and XML. However, there is not yet a dedicated system that supports convenient exchange between big data platform systems and traditional relational databases. The number of big data systems is large and still growing (NoSQL databases alone number in the dozens), so providing a good system architecture that connects these databases to an exchange system is a challenging problem.
These big data acquisition systems currently exist in isolation from one another, and the loading mechanism of Hadoop is limited: for example, data extracted from a relational database can be loaded into Hive but cannot be loaded into HBase to provide fast query services. Moreover, once data has been loaded into Hadoop, there is no method that supports moving it between the different Hadoop subsystems. For example, data in Hive may need large-scale cleaning, but Hive itself does not support modifying data, so the data must be transferred to HBase for processing.
Summary of the invention
The purpose of the present invention is to provide a big data interaction exchange system that supports two-way data flow among relational databases, unstructured documents, sensor databases and the Hive, HBase and HDFS systems of the Hadoop platform, and that achieves efficient data exchange by scheduling tasks in parallel and storing all intermediate data in memory.
To achieve the above object, the present invention provides a big data interaction exchange system, the system comprising:
a control switching point (CSP), deployed on the Spark platform, with the Spark platform and the Hadoop platform deployed in the same cluster through the Yarn resource management framework; the memory objects of the CSP are stored in Spark, and all intermediate data and the conversion tasks between different types of data models are also handled by Spark; the data sources include relational database systems, unstructured documents and sensor data, all distributed across different servers;
a Hadoop big data platform, including the HDFS, HBase and Hive subsystems, which loads the extracted data and provides analysis functions;
exchange agents, deployed on the different data source systems or on the CSP, which interact with the data sources through remote interfaces; a control message channel and a data channel are provided between each exchange agent and the CSP.
As a preferred technical solution of the present invention, the control switching point (CSP) comprises a task scheduling module, a memory object management module and a data conversion module. The CSP notifies the exchange agent of a data source to extract data and transfer it to the CSP; the CSP converts the source data model into the memory object model; and the CSP is also responsible for the unified scheduling of tasks.
As a preferred technical solution of the present invention, the task scheduling module schedules the data extraction tasks of the exchange agents, data loading tasks, data model conversion tasks and data transfer tasks.
As a preferred technical solution of the present invention, the memory object management module manages the storage and updating of intermediate data, and the data conversion module performs the conversion between the different data models and the unified memory object.
As a preferred technical solution of the present invention, the control switching point (CSP) writes a log record before performing each operation. When the system fails, the system is restarted, the state before the failure is restored, all lost data is extracted again, and the memory space is rebuilt.
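The log-then-recover behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class `ControlCenter`, its methods, and the JSON log format are all hypothetical, and a Python list stands in for a durable log.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ControlCenter:
    """Toy stand-in for the control switching point (CSP); all names are hypothetical."""
    log: list = field(default_factory=list)      # stands in for a durable log
    memory: dict = field(default_factory=dict)   # in-memory intermediate data

    def extract(self, source: dict, key: str) -> None:
        # Record the intent BEFORE performing the operation.
        self.log.append(json.dumps({"op": "extract", "key": key}))
        self.memory[key] = source[key]

    def crash(self) -> None:
        # A failure wipes the memory space but not the log.
        self.memory.clear()

    def recover(self, source: dict) -> None:
        # Replay the log and re-extract every item that had been requested.
        for line in self.log:
            record = json.loads(line)
            if record["op"] == "extract":
                self.memory[record["key"]] = source[record["key"]]


source = {"t1": [1, 2], "t2": [3]}
csp = ControlCenter()
csp.extract(source, "t1")
csp.extract(source, "t2")
before = dict(csp.memory)
csp.crash()
csp.recover(source)
```

After `recover`, the rebuilt memory matches the state captured in `before`, because every extraction was logged before it ran.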
As a preferred technical solution of the present invention, the control switching point (CSP) uses a unified memory object model to store the intermediate data of a data exchange, and the data of each data source is mapped between its own data model and the memory object model through the exchange agent. The unified memory object model stores data in Spark RDD format. Data is transferred in memory and is not written to disk.
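One way to picture the unified memory object model is as a common record shape that every agent maps to and from. The sketch below is an assumption-laden illustration: the `ExchangeAgent` interface, the two toy agents and the `exchange` function are hypothetical, and a plain Python list of dicts stands in for the Spark RDD.

```python
from abc import ABC, abstractmethod


class ExchangeAgent(ABC):
    """Hypothetical agent interface: each agent maps its own data model to and
    from the unified memory object (here simply one dict per record)."""

    @abstractmethod
    def extract(self) -> list: ...

    @abstractmethod
    def load(self, objects: list) -> None: ...


class CsvAgent(ExchangeAgent):
    def __init__(self, header, lines=()):
        self.header, self.lines = header, list(lines)

    def extract(self):
        return [dict(zip(self.header, line.split(","))) for line in self.lines]

    def load(self, objects):
        self.lines += [",".join(o[h] for h in self.header) for o in objects]


class KeyValueAgent(ExchangeAgent):
    def __init__(self, key_field):
        self.key_field, self.store = key_field, {}

    def extract(self):
        return [{self.key_field: k, **v} for k, v in self.store.items()]

    def load(self, objects):
        for o in objects:
            o = dict(o)  # copy so the caller's object is not modified
            self.store[o.pop(self.key_field)] = o


def exchange(source: ExchangeAgent, target: ExchangeAgent) -> int:
    """The intermediate objects live only in memory between extract and load."""
    objects = source.extract()
    target.load(objects)
    return len(objects)


src = CsvAgent(["id", "temp"], ["s1,41.5", "s2,39.0"])
dst = KeyValueAgent("id")
moved = exchange(src, dst)
back = CsvAgent(["id", "temp"])
exchange(dst, back)
```

Because both agents speak the same intermediate shape, the same `exchange` call works in either direction, which is the point of a unified model.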
As a preferred technical solution of the present invention, the control switching point (CSP) manages waiting tasks through a queue.
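A queue of waiting tasks covering the four task types named earlier might look like the sketch below. The source does not state the queue discipline, so first-in-first-out order is an assumption here, and `TaskKind`, `TaskQueue` and the target names are hypothetical.

```python
from collections import deque
from enum import Enum


class TaskKind(Enum):
    EXTRACT = "data extraction"
    LOAD = "data loading"
    CONVERT = "data model conversion"
    TRANSFER = "data transfer"


class TaskQueue:
    """Queue of waiting tasks; FIFO order is an assumption, not from the source."""

    def __init__(self):
        self._waiting = deque()

    def submit(self, kind: TaskKind, target: str) -> None:
        self._waiting.append((kind, target))

    def next_task(self):
        # Returns None when nothing is waiting.
        return self._waiting.popleft() if self._waiting else None

    def __len__(self):
        return len(self._waiting)


queue = TaskQueue()
queue.submit(TaskKind.EXTRACT, "oracle.usage_records")
queue.submit(TaskKind.CONVERT, "oracle.usage_records")
queue.submit(TaskKind.LOAD, "hive.usage_records")

order = []
while len(queue):
    order.append(queue.next_task()[0])
```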
As a preferred technical solution of the present invention, when the memory space is insufficient to store newly arriving data, the control switching point (CSP) notifies the exchange agents, according to the scheduling policy, to suspend their data extraction tasks; once enough memory space is available, the data extraction tasks resume.
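The pause-and-resume behaviour can be sketched as a simple capacity check before each extraction step. Everything named here (`MemoryPool`, `Agent`, `step`, the unit-less sizes) is hypothetical; the source does not specify the scheduling policy, so this shows only the simplest possible one.

```python
class MemoryPool:
    """Tracks memory used by intermediate data; sizes are in arbitrary units."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.used = 0

    def has_room(self, size: int) -> bool:
        return self.used + size <= self.capacity


class Agent:
    """Hypothetical exchange agent that can be paused and resumed."""

    def __init__(self, batches):
        self.pending = list(batches)  # sizes of the batches still to extract
        self.paused = False


def step(pool: MemoryPool, agent: Agent) -> None:
    """One scheduling step: extract the next batch, or pause if it does not fit."""
    if not agent.pending:
        return
    size = agent.pending[0]
    if pool.has_room(size):
        agent.paused = False
        pool.used += size
        agent.pending.pop(0)
    else:
        agent.paused = True  # the CSP tells the agent to suspend extraction


pool = MemoryPool(capacity=10)
agent = Agent(batches=[6, 6])
step(pool, agent)              # the first batch fits
step(pool, agent)              # the second batch does not fit; the agent is paused
paused_when_full = agent.paused
pool.used -= 6                 # a load task drains the first batch to its target
step(pool, agent)              # extraction resumes
```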
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention provides a big data interaction exchange system that builds on the various existing big data solutions, can meet different processing requirements, and schedules and controls all exchange tasks in a unified manner, which can improve the efficiency of data exchange.
Description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a first structural schematic diagram of the big data interaction exchange system provided by the present invention.
Fig. 2 is a second structural schematic diagram of the big data interaction exchange system provided by the present invention.
Fig. 3 is a third structural schematic diagram of the big data interaction exchange system provided by the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The present invention is described in detail below through specific embodiments.
Fig. 1 shows the big data interaction exchange system. The scenarios below illustrate how it is used.
In the first scenario (see Fig. 2), a national grid subsidiary stores a large volume of user electricity consumption records in a relational database. These records now need to be analyzed, but the traditional database cannot deliver the required query and analysis performance, so the data must be loaded into the Hive system of the Hadoop platform for analysis.
First, an exchange agent matching the database is selected, for example an Oracle exchange agent, to extract the data and transfer it to the control switching point (CSP). The CSP converts the relational data into the memory object model, in which each table corresponds to a class. The memory object model is stored in a Spark RDD, that is, in distributed memory. The CSP then converts the memory object model in the RDD into the Hive data model and passes it to the Hive exchange agent, which writes the data into the Hive system.
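The "one class per table" idea from this scenario can be sketched as below. The table name, its columns and both helper functions are hypothetical; a plain list stands in for the Spark RDD, and the Hive side is shown only as a delimited text line using Ctrl-A (`\x01`), the default field delimiter of Hive text tables.

```python
from dataclasses import dataclass, astuple


@dataclass
class UsageRecord:
    """Memory object for a hypothetical `usage_record` table: one class per table."""
    user_id: int
    day: str
    kwh: float


def from_relational(row: tuple) -> UsageRecord:
    # Source data model -> memory object model.
    return UsageRecord(*row)


def to_hive_line(obj: UsageRecord, sep: str = "\x01") -> str:
    # Memory object model -> one delimited text row for a Hive text table.
    return sep.join(str(v) for v in astuple(obj))


rows = [(1, "2018-01-01", 12.5), (2, "2018-01-01", 7.0)]
objects = [from_relational(r) for r in rows]    # a list stands in for the RDD
hive_lines = [to_hive_line(o) for o in objects]
```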
Before the analysis in Hive, a large amount of dirty data is found and must be cleaned. Because Hive does not support modifying data, part of the data is moved into HBase for cleaning and then written back to Hive. This requires data transfer in both directions between Hive and HBase; as the figure shows, the arrows between each agent, the control switching point (CSP) and the big data systems are all bidirectional. A Hive agent and an HBase agent must be selected. Because the Hive and HBase data models differ, the user must define rules that specify which columns of which Hive table are transferred to HBase, which column serves as the HBase key, and which columns are placed, and in what form, into which HBase column family. The present invention does not enumerate specific mapping methods; they can be implemented in many ways.
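Since the source leaves the mapping method open, the following is only one possible shape for such a user-defined rule. `MappingRule`, `to_hbase` and `to_hive` are hypothetical names, and cells are written in the usual `family:qualifier` form.

```python
from dataclasses import dataclass


@dataclass
class MappingRule:
    """User-defined rule: which Hive column becomes the HBase row key, and
    which Hive columns go into which column family."""
    key_column: str
    families: dict  # column family name -> list of Hive column names


def to_hbase(row: dict, rule: MappingRule):
    """Return (row_key, {"family:qualifier": value}) for one Hive row."""
    cells = {
        f"{family}:{col}": str(row[col])
        for family, cols in rule.families.items()
        for col in cols
    }
    return str(row[rule.key_column]), cells


def to_hive(row_key: str, cells: dict, rule: MappingRule) -> dict:
    """Inverse mapping for the write-back; values come back as strings."""
    row = {rule.key_column: row_key}
    for name, value in cells.items():
        row[name.split(":", 1)[1]] = value
    return row


rule = MappingRule(key_column="user_id", families={"usage": ["day", "kwh"]})
hive_row = {"user_id": "u42", "day": "2018-01-01", "kwh": "12.5"}
key, cells = to_hbase(hive_row, rule)
restored = to_hive(key, cells, rule)
```

Keeping the rule as data rather than code means the same rule object can drive both directions of the Hive and HBase exchange.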
When modified data is written back to Hive, the corresponding original table can be deleted first and the modified data then written; alternatively, the modified data can be written to a new table that coexists with the original data in the Hive system.
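The two write-back options can be contrasted in a few lines. This is a toy model: a dict of table name to rows stands in for the Hive system, and the `write_back` function and the `_cleaned` suffix are hypothetical choices.

```python
def write_back(tables: dict, name: str, cleaned: list, replace: bool) -> str:
    """Write cleaned rows back; `tables` is a toy stand-in for the Hive system.

    replace=True  : drop the original table, then write the cleaned data
                    under the same name.
    replace=False : write to a new table that coexists with the original.
    """
    if replace:
        tables.pop(name, None)
        target = name
    else:
        target = f"{name}_cleaned"  # hypothetical naming convention
    tables[target] = list(cleaned)
    return target


hive = {"usage": [("u1", None), ("u2", 7.0)]}
cleaned = [("u2", 7.0)]

coexist = dict(hive)
new_name = write_back(coexist, "usage", cleaned, replace=False)

replaced = dict(hive)
write_back(replaced, "usage", cleaned, replace=True)
```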
In the second scenario (see Fig. 3), a number of sensors are installed on a large number of power devices and in their surroundings to collect information such as device operating parameters, temperature and humidity in real time, and the readings are stored in a key-value database on the sensor server. Because the accumulated data volume is very large, the data now needs to be migrated to HBase. A key-value database exchange agent can be selected for data extraction and an HBase exchange agent for data loading. The key-value type maps easily onto the HBase data model; HBase is simply used as a key-value database.
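Using HBase as a plain key-value store amounts to putting each value in a single cell under its key. In the sketch below the column family `d`, the qualifier `v` and the key layout of the sample readings are all placeholders, not taken from the source.

```python
def kv_to_hbase(key: str, value: str, family: str = "d", qualifier: str = "v"):
    """Map one key-value pair to an HBase-style row holding a single cell.

    The family and qualifier names are placeholders chosen for illustration.
    """
    return key, {f"{family}:{qualifier}": value}


readings = {
    "transformer-7/temperature/2018-01-01T00:00": "41.5",
    "transformer-7/humidity/2018-01-01T00:00": "0.63",
}
hbase_rows = dict(kv_to_hbase(k, v) for k, v in readings.items())

# Reading the single cell back recovers the original key-value pairs.
round_trip = {k: cells["d:v"] for k, cells in hbase_rows.items()}
```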
In the third scenario, several servers continuously generate large numbers of files, which may come from web crawlers or from server logs. Key information must be extracted from these files in real time and stored in HDFS at high speed. To improve data throughput, a dedicated agent can be designed. The agent contains a hook program that captures the file messages of the operating system: when the server is about to perform a file write, the hook synchronously parses the message command to obtain the file content and places it in memory. The content is then sent through the exchange agent to the control switching point (CSP) and passed to the HDFS exchange agent, which writes the file. All intermediate data thus stays in memory, which greatly reduces disk I/O and improves data transfer efficiency.
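The data path of this scenario can be imitated at application level. The sketch below does not hook the operating system; it only wraps a file-like object so that every write is also handed to an in-memory list, which is an assumption made purely for illustration. `CapturingWriter`, the sample log lines and the "404" filter are hypothetical.

```python
import io


class CapturingWriter:
    """Wraps a file-like object and copies every write to a callback.

    This is only an application-level stand-in for the operating-system hook.
    """

    def __init__(self, wrapped, on_write):
        self._wrapped = wrapped
        self._on_write = on_write

    def write(self, data: bytes) -> int:
        self._on_write(data)             # hand the content to the agent, in memory
        return self._wrapped.write(data)


in_flight = []                           # intermediate data, held in memory only
hdfs_file = io.BytesIO()                 # stand-in for the file the HDFS agent writes

local_log = CapturingWriter(io.BytesIO(), in_flight.append)
local_log.write(b"GET /index.html 200\n")
local_log.write(b"GET /missing 404\n")

# Extract the "key information" (here: lines with a 404 status) and pass it on.
for chunk in in_flight:
    if b" 404" in chunk:
        hdfs_file.write(chunk)
```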
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article or device. Without further limitation, an element introduced by "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes it.
The embodiments in this specification are described in a related manner; identical or similar parts of the embodiments can be cross-referenced, and each embodiment focuses on its differences from the others.
The above are only preferred embodiments of the present invention and are not intended to limit its protection scope. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention falls within the protection scope of the present invention.

Claims (9)

1. A big data interaction exchange system, characterized in that the system comprises:
a control switching point (CSP), deployed on the Spark platform, with the Spark platform and the Hadoop platform deployed in the same cluster through the Yarn resource management framework; the memory objects of the CSP are stored in Spark, and all intermediate data and the conversion tasks between different types of data models are also handled by Spark; the data sources include relational database systems, unstructured documents and sensor data, all distributed across different servers;
a Hadoop big data platform, including the HDFS, HBase and Hive subsystems, which loads the extracted data and provides analysis functions;
exchange agents, deployed on the different data source systems or on the CSP, which interact with the data sources through remote interfaces; a control message channel and a data channel are provided between each exchange agent and the CSP.
2. The big data interaction exchange system according to claim 1, characterized in that the control switching point (CSP) comprises a task scheduling module, a memory object management module and a data conversion module; the CSP notifies the exchange agent of a data source to extract data and transfer it to the CSP; the CSP converts the source data model into the memory object model; and the CSP is also responsible for the unified scheduling of tasks.
3. The big data interaction exchange system according to claim 2, characterized in that the task scheduling module schedules the data extraction tasks of the exchange agents, data loading tasks, data model conversion tasks and data transfer tasks.
4. The big data interaction exchange system according to claim 2, characterized in that the memory object management module manages the storage and updating of intermediate data.
5. The big data interaction exchange system according to claim 2, characterized in that the data conversion module performs the conversion between the different data models and the unified memory object.
6. The big data interaction exchange system according to claim 1, characterized in that the control switching point (CSP) writes a log record before performing each operation; when the system fails, the system is restarted, the state before the failure is restored, all lost data is extracted again, and the memory space is rebuilt.
7. The big data interaction exchange system according to claim 1, characterized in that the control switching point (CSP) uses a unified memory object model to store the intermediate data of a data exchange, and the data of each data source is mapped between its own data model and the memory object model through the exchange agent; the unified memory object model stores data in Spark RDD format; data is transferred in memory and is not written to disk.
8. The big data interaction exchange system according to claim 1 or 2, characterized in that the control switching point (CSP) manages waiting tasks through a queue.
9. The big data interaction exchange system according to claim 1, characterized in that, when the memory space is insufficient to store newly arriving data, the control switching point (CSP) notifies the exchange agents, according to the scheduling policy, to suspend their data extraction tasks; once enough memory space is available, the data extraction tasks resume.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810100144.2A CN108334603A (en) 2018-02-01 2018-02-01 A kind of big data interaction exchange system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810100144.2A CN108334603A (en) 2018-02-01 2018-02-01 A kind of big data interaction exchange system

Publications (1)

Publication Number Publication Date
CN108334603A true CN108334603A (en) 2018-07-27

Family

ID=62927855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810100144.2A Withdrawn CN108334603A (en) 2018-02-01 2018-02-01 A kind of big data interaction exchange system

Country Status (1)

Country Link
CN (1) CN108334603A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110543367A (en) * 2019-08-30 2019-12-06 联想(北京)有限公司 Resource processing method and device, electronic device and medium
CN112463868A (en) * 2020-12-04 2021-03-09 车智互联(北京)科技有限公司 Data processing method, data processing system and computing device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105243155A (en) * 2015-10-29 2016-01-13 贵州电网有限责任公司电力调度控制中心 Big data extracting and exchanging system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110543367A (en) * 2019-08-30 2019-12-06 联想(北京)有限公司 Resource processing method and device, electronic device and medium
CN110543367B (en) * 2019-08-30 2022-07-26 联想(北京)有限公司 Resource processing method and device, electronic device and medium
CN112463868A (en) * 2020-12-04 2021-03-09 车智互联(北京)科技有限公司 Data processing method, data processing system and computing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20180727