CN108334603A - A kind of big data interaction exchange system - Google Patents
A kind of big data interaction exchange system
- Publication number
- CN108334603A (application number CN201810100144.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- csp
- switching point
- exchange system
- control switching
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a big data interaction exchange system, the system comprising: a data acquisition module, which collects data according to user behavior such as past browsing history and purchase records; a computing module, which converts the data collected by the data acquisition module into computer language after calculation; a database, which stores the collected data after the computing module has converted it into computer language; an operating system, through which the information stored in the database can be called and the data in the database sent to cloud computing; and cloud computing, which receives the data to be calculated and processes it. With embodiments of the present invention, the big data interaction exchange system can improve the precision of website advertising and of store merchandise display.
Description
Technical field
The invention belongs to the field of electronic technology, and more particularly relates to a big data interaction exchange system.
Background art
As the volume of business data keeps growing, the data that computers must process has risen from the MB level to the TB or even PB level. A single server can no longer store and analyze all of an enterprise's data, so the data must be extracted and aggregated into a big data platform for analysis and processing. Enterprise legacy systems generally hold many types of data, including business data stored in relational database systems, document information and log files stored as files, and real-time monitoring data from large numbers of sensors. Acquiring all of this data efficiently and in real time is the first step toward a successful big data project.
Several single-purpose big data acquisition systems already exist. Sqoop, from the Hadoop ecosystem, supports parallel data extraction from relational databases; it currently supports various relational databases such as Oracle, SQL Server, and MySQL, and runs extraction tasks in parallel through MapReduce. Kafka, a distributed message collection system, is a high-throughput distributed publish-subscribe messaging system that can handle all the action stream data of a consumer-scale website. Such actions (web page browsing, searches, and other user actions) are a key ingredient of many social features on the modern web. Because of throughput requirements, this data is usually handled through log processing and log aggregation. Distributed crawler systems such as Nutch can fetch data from the Internet in parallel and store it in the Hadoop file system.
Tools for converting between relational databases are widely used in enterprises; Oracle and SQL Server, for example, both provide tools for importing and exporting the data of other databases. Informatica and IBM also offer related products that support conversion of structured and semi-structured data such as relational databases and XML. However, there is not yet a dedicated system that supports convenient exchange between big data platform systems and traditional relational databases. Big data systems are numerous and still growing in number (NoSQL databases alone come in dozens of varieties), so providing a good system architecture that connects these databases to an exchange system is a challenging problem.
These big data acquisition systems currently exist independently of one another, and the loading mechanism of Hadoop is limited: for example, data extracted from a relational database can be loaded into Hive, but cannot be loaded into HBase to provide fast query services. In addition, once data has been loaded into Hadoop, there is no method that supports moving data between the different Hadoop subsystems. For example, data in Hive may need large-scale cleaning, but Hive itself does not support modifying data, so the data has to be transferred to HBase for processing.
Summary of the invention
The object of the present invention is to provide a big data interaction exchange system that supports two-way data flow between relational databases, unstructured documents, sensor databases, and the Hive, HBase, and HDFS systems of the Hadoop platform. By scheduling tasks in parallel and storing all intermediate data in memory, it achieves efficient data exchange.
To achieve the above object, the present invention provides a big data interaction exchange system, the system comprising:
a control switching point (CSP), deployed on the Spark platform, with the Spark platform and the Hadoop platform deployed in the same cluster through the Yarn resource management framework; the memory objects of the control switching point (CSP) are stored in Spark, and all intermediate data and the conversion tasks between different types of data models are also handled by Spark;
data sources, including relational database systems, unstructured documents, and sensor data dispersed across different servers;
a Hadoop big data platform, including the HDFS, HBase, and Hive subsystems, for loading the extracted data and providing analysis functions;
clearing agents, deployed on the different data source systems or on the control switching point (CSP), for interacting with the data sources through remote interfaces; a control message channel and a data channel are provided between each clearing agent and the control switching point (CSP).
As a preferred technical solution of the present invention, the control switching point (CSP) comprises a task scheduling module, a memory object management module, and a data conversion module. The control switching point (CSP) notifies the clearing agent of a data source to extract data and transfer it to the control switching point (CSP); it converts the source data model into the memory object model; and it is also responsible for the unified scheduling of tasks.
As a preferred technical solution of the present invention, the task scheduling module schedules the clearing agents' data extraction tasks, data loading tasks, data model conversion tasks, and data transfer tasks.
As a preferred technical solution of the present invention, the memory object management module manages the storage and updating of intermediate data; the data conversion module performs conversion between the different data models and the unified memory objects.
As a preferred technical solution of the present invention, to handle system failures, the control switching point (CSP) records a log before performing each operation. After a failure, the system is restarted and the state before the failure is restored; all lost data is then extracted again and the memory space is rebuilt.
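The log-then-recover behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the `OperationLog` class, its JSON-lines file format, the `store`/`release` operation names, and the `extract` callback are all hypothetical, and a plain dictionary stands in for the Spark-backed memory space.

```python
import json
import os


class OperationLog:
    """Append-only log written before each operation.

    Hypothetical format: one JSON object per line.
    """

    def __init__(self, path):
        self.path = path

    def record(self, op, key):
        # Write and flush the entry before the operation itself is carried out.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"op": op, "key": key}) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        # Return the keys of the data sets that were held in memory
        # at the point where the log ends.
        live = []
        if not os.path.exists(self.path):
            return live
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                if entry["op"] == "store" and entry["key"] not in live:
                    live.append(entry["key"])
                elif entry["op"] == "release" and entry["key"] in live:
                    live.remove(entry["key"])
        return live


def recover(log, extract):
    """Rebuild the memory space after a restart by re-extracting lost data."""
    return {key: extract(key) for key in log.replay()}
```

In this sketch the log only records which data sets were in memory, so recovery depends on the clearing agents being able to extract the same data again from the sources.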
As a preferred technical solution of the present invention, the control switching point (CSP) stores the intermediate data of a data exchange using a unified memory object model, and the data of each data source is mapped and converted between its own data model and the memory object model by its clearing agent. The unified memory object model stores data in Spark RDD format. Data is transferred in memory and is not written to disk.
It is not written into disk.
As a preferred technical solution of the present invention, the control switching point (CSP) manages waiting tasks through a queue.
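A queue of waiting tasks covering the four task types named above could look like the sketch below. The text does not specify an ordering policy, so first-in, first-out order is an assumption here, and the `TaskType`, `Task`, and `TaskQueue` names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class TaskType(Enum):
    EXTRACT = "extract"    # data extraction by a clearing agent
    LOAD = "load"          # data loading into the target system
    CONVERT = "convert"    # data model conversion
    TRANSFER = "transfer"  # data transfer


@dataclass
class Task:
    task_type: TaskType
    source: str


class TaskQueue:
    """Queue of waiting tasks; FIFO ordering is an assumption of this sketch."""

    def __init__(self):
        self._waiting = deque()

    def submit(self, task):
        self._waiting.append(task)

    def next_task(self):
        # Return the oldest waiting task, or None when nothing is waiting.
        return self._waiting.popleft() if self._waiting else None

    def __len__(self):
        return len(self._waiting)
```

A priority-based policy would fit the same interface by replacing the deque with a heap.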
As a preferred technical solution of the present invention, when memory space is insufficient to store newly arriving data, the control switching point (CSP) notifies the clearing agents, according to the scheduling policy, to suspend their data extraction tasks; once enough memory space is available, the data extraction tasks resume.
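The suspend-and-resume behavior under memory pressure can be sketched as below. This is an illustration only: the `MemoryBudget` and `ExtractionController` classes are hypothetical, sizes are abstract units, and resuming held batches in arrival order is an assumed scheduling policy.

```python
class MemoryBudget:
    """Tracks used space against a fixed capacity (stand-in for cluster memory accounting)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def fits(self, size):
        return self.used + size <= self.capacity


class ExtractionController:
    """Suspends extraction when a batch does not fit and resumes once space is freed."""

    def __init__(self, budget):
        self.budget = budget
        self.paused = False
        self.pending = []  # batch sizes held back while extraction is suspended

    def offer(self, batch_size):
        # Accept the batch if extraction is running and the batch fits;
        # otherwise suspend extraction and hold the batch back.
        if not self.paused and self.budget.fits(batch_size):
            self.budget.used += batch_size
            return True
        self.paused = True
        self.pending.append(batch_size)
        return False

    def release(self, size):
        # Free memory, then admit held batches in arrival order while they fit.
        self.budget.used -= size
        while self.pending and self.budget.fits(self.pending[0]):
            self.budget.used += self.pending.pop(0)
        self.paused = bool(self.pending)
```

For example, with a capacity of 100 units, a 60-unit batch is accepted, a following 50-unit batch suspends extraction, and releasing the first batch lets the held batch in and resumes extraction.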
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention provides a big data interaction exchange system that builds on the various solutions of current big data technology, can meet different processing demands, and places all exchange tasks under unified scheduling and control, which can improve the efficiency of data exchange.
Description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a first structural schematic diagram of the big data interaction exchange system provided by the invention.
Fig. 2 is a second structural schematic diagram of the big data interaction exchange system provided by the invention.
Fig. 3 is a third structural schematic diagram of the big data interaction exchange system provided by the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The present invention is described in detail below through specific embodiments.
Referring to Fig. 1, a big data interaction exchange system is provided.
In the first scenario, shown in Fig. 2, a national grid subsidiary stores a large volume of user electricity consumption records in a relational database and now wants to analyze them. The traditional database cannot meet the demand for high-performance query and analysis, so the data needs to be loaded into the Hive system of the Hadoop platform for analysis.
First, a clearing agent matching the source database is selected, for example an Oracle clearing agent, to extract the data and transfer it to the control switching point (CSP). The control switching point (CSP) converts the relational data into the memory object model. In this memory object model, each table corresponds to a class. The memory object model is stored in a Spark RDD, that is, in distributed memory. The control switching point (CSP) then converts the memory object model in the RDD into the Hive data model and transfers it to the Hive clearing agent, which writes the data into the Hive system.
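The "each table is a class" idea can be illustrated with a small sketch. Everything specific here is an assumption: the `usage_record` table and its columns are invented for the example, plain Python objects stand in for the contents of a Spark RDD, and the Hive side is represented only as a delimited text line (Ctrl-A, `\x01`, is the default field delimiter of Hive text tables).

```python
from dataclasses import dataclass, fields, astuple


@dataclass
class UsageRecord:
    """In-memory object for one row of a hypothetical `usage_record` table."""
    user_id: int
    meter_date: str
    kwh: float


def from_relational(row):
    # Convert a relational result row (a tuple in column order) into an object.
    return UsageRecord(*row)


def to_hive_text(record, delimiter="\x01"):
    # Serialize one object as a delimited text line for a Hive text table.
    return delimiter.join(str(value) for value in astuple(record))


def hive_columns(cls):
    # Derive the target column names from the class definition.
    return [f.name for f in fields(cls)]
```

In a real deployment the objects would live in distributed memory rather than in a local list; the sketch only shows the two conversions on either side of the memory object model.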
Before analysis in Hive, a large amount of dirty data is found that needs to be cleaned up. Hive, however, does not support modifying data, so part of the data is moved into HBase for cleaning and then written back to Hive. This requires data transfer in both directions between Hive and HBase. As the figure shows, the connections between each agent, the control switching point (CSP), and the big data systems are all two-way arrows. A Hive agent and an HBase agent need to be selected. Because the Hive and HBase data models differ, the user must define rules that specify which columns of which Hive table are converted into HBase, which column serves as the HBase row key, and which columns form HBase column families and in what form. The present invention does not enumerate the specific mapping methods, which can be implemented in many ways.
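Since the mapping method is left open, the following is only one possible form of such a user-defined rule. The `MappingRule` structure, the table and column names, and the `family:qualifier` cell naming are assumptions made for illustration; no HBase client library is used.

```python
from dataclasses import dataclass


@dataclass
class MappingRule:
    """User-defined rule for moving Hive rows into HBase (hypothetical structure)."""
    hive_table: str
    key_column: str  # Hive column used as the HBase row key
    families: dict   # column family name -> list of Hive column names


def to_hbase_put(rule, hive_row):
    # Build (row_key, {"family:qualifier": value}) from one Hive row given as a dict.
    row_key = str(hive_row[rule.key_column])
    cells = {}
    for family, columns in rule.families.items():
        for column in columns:
            cells[f"{family}:{column}"] = str(hive_row[column])
    return row_key, cells
```

A rule such as `MappingRule("usage_record", "user_id", {"m": ["meter_date", "kwh"]})` would place both measurement columns in a single column family `m`, keyed by user.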
When modified data is written back to Hive, the corresponding original table can first be deleted and the modified data then written; alternatively, the modified data can be written to a new table that coexists with the original data in the Hive system.
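The two write-back options can be expressed as HiveQL statements, as in the sketch below. It assumes the cleaned data has already been made available to Hive in a staging table; the staging table, the `_cleaned` suffix, and the helper name are hypothetical, and the function only builds the statements without running them.

```python
def write_back_statements(table, staging, replace_original=True):
    """Return HiveQL statements for writing cleaned data back (names are placeholders)."""
    if replace_original:
        # Option 1: delete the original table, then write the modified data.
        return [
            f"DROP TABLE IF EXISTS {table}",
            f"CREATE TABLE {table} AS SELECT * FROM {staging}",
        ]
    # Option 2: keep the original and write the modified data to a new table.
    return [f"CREATE TABLE {table}_cleaned AS SELECT * FROM {staging}"]
```

The second option keeps the original data available for comparison at the cost of extra storage.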
In the second scenario, shown in Fig. 3, a number of sensors are deployed on a large number of power devices and in their surroundings in a specific implementation environment to collect information such as device operating parameters, temperature, and humidity in real time, and the information is stored in a key-value database on the sensor server. Because the accumulated data volume has become very large, the data now needs to be migrated to HBase. A key-value database clearing agent can be selected to extract the data, and an HBase clearing agent to load it. The key-value type maps easily onto the HBase data model; HBase is simply reduced to a key-value database.
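Treating HBase as a plain key-value store amounts to one row with a single cell per pair, as in this minimal sketch. The default column family `d`, the qualifier `v`, and the example key layout are assumptions, not details from the text.

```python
def kv_to_hbase(key, value, family="d", qualifier="v"):
    # One key-value pair becomes one HBase row holding a single cell.
    return key, {f"{family}:{qualifier}": value}


def kv_batch_to_hbase(pairs, family="d", qualifier="v"):
    # Convert an iterable of (key, value) pairs, preserving their order.
    return [kv_to_hbase(k, v, family, qualifier) for k, v in pairs]
```

A sensor reading stored under a key such as `sensor-7:temp:1517443200` would keep that key as its HBase row key.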
In the third scenario, a large number of files are continuously generated on several servers. These files may come from web crawlers or from server logs. Key information now needs to be extracted from these files in real time and stored in HDFS at high speed. To improve data throughput, a dedicated agent can be designed. The agent contains a hook program that can capture the operating system's file messages: when the server is about to perform a file write, the agent synchronously parses the message command to obtain the file content and places it in memory. The content is then sent through the clearing agent to the control switching point (CSP) and passed on to the HDFS clearing agent, which writes the file. In this way all intermediate data stays in memory, which greatly reduces disk I/O and improves data transfer efficiency.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its protection scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the protection scope of the present invention.
Claims (9)
1. A big data interaction exchange system, characterized in that the system comprises:
a control switching point (CSP), deployed on the Spark platform, with the Spark platform and the Hadoop platform deployed in the same cluster through the Yarn resource management framework; the memory objects of the control switching point (CSP) are stored in Spark, and all intermediate data and the conversion tasks between different types of data models are also handled by Spark;
data sources, including relational database systems, unstructured documents, and sensor data dispersed across different servers;
a Hadoop big data platform, including the HDFS, HBase, and Hive subsystems, for loading the extracted data and providing analysis functions;
clearing agents, deployed on the different data source systems or on the control switching point (CSP), for interacting with the data sources through remote interfaces; a control message channel and a data channel are provided between each clearing agent and the control switching point (CSP).
2. The big data interaction exchange system according to claim 1, characterized in that the control switching point (CSP) comprises a task scheduling module, a memory object management module, and a data conversion module; the control switching point (CSP) notifies the clearing agent of a data source to extract data and transfer it to the control switching point (CSP); the control switching point (CSP) converts the source data model into the memory object model; and the control switching point (CSP) is also responsible for the unified scheduling of tasks.
3. The big data interaction exchange system according to claim 2, characterized in that the task scheduling module schedules the clearing agents' data extraction tasks, data loading tasks, data model conversion tasks, and data transfer tasks.
4. The big data interaction exchange system according to claim 2, characterized in that the memory object management module manages the storage and updating of intermediate data.
5. The big data interaction exchange system according to claim 2, characterized in that the data conversion module performs conversion between the different data models and the unified memory objects.
6. The big data interaction exchange system according to claim 1, characterized in that, to handle system failures, the control switching point (CSP) records a log before performing each operation; after a failure, the system is restarted and the state before the failure is restored, and all lost data is then extracted again and the memory space is rebuilt.
7. The big data interaction exchange system according to claim 1, characterized in that the control switching point (CSP) stores the intermediate data of a data exchange using a unified memory object model, and the data of each data source is mapped and converted between its own data model and the memory object model by its clearing agent; the unified memory object model stores data in Spark RDD format; data is transferred in memory and is not written to disk.
8. The big data interaction exchange system according to claim 1 or 2, characterized in that the control switching point (CSP) manages waiting tasks through a queue.
9. The big data interaction exchange system according to claim 1, characterized in that, when memory space is insufficient to store newly arriving data, the control switching point (CSP) notifies the clearing agents, according to the scheduling policy, to suspend their data extraction tasks; once enough memory space is available, the data extraction tasks resume.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810100144.2A CN108334603A (en) | 2018-02-01 | 2018-02-01 | A kind of big data interaction exchange system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810100144.2A CN108334603A (en) | 2018-02-01 | 2018-02-01 | A kind of big data interaction exchange system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108334603A true CN108334603A (en) | 2018-07-27 |
Family ID: 62927855
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810100144.2A Withdrawn CN108334603A (en) | 2018-02-01 | 2018-02-01 | A kind of big data interaction exchange system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108334603A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110543367A (en) * | 2019-08-30 | 2019-12-06 | 联想(北京)有限公司 | Resource processing method and device, electronic device and medium |
CN112463868A (en) * | 2020-12-04 | 2021-03-09 | 车智互联(北京)科技有限公司 | Data processing method, data processing system and computing device |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105243155A (en) * | 2015-10-29 | 2016-01-13 | 贵州电网有限责任公司电力调度控制中心 | Big data extracting and exchanging system |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110543367A (en) * | 2019-08-30 | 2019-12-06 | 联想(北京)有限公司 | Resource processing method and device, electronic device and medium |
CN110543367B (en) * | 2019-08-30 | 2022-07-26 | 联想(北京)有限公司 | Resource processing method and device, electronic device and medium |
CN112463868A (en) * | 2020-12-04 | 2021-03-09 | 车智互联(北京)科技有限公司 | Data processing method, data processing system and computing device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20180727 |