CN110209646A - Data platform system based on real-time stream computing - Google Patents
Data platform system based on real-time stream computing
- Publication number
- CN110209646A
- Authority
- CN
- China
- Prior art keywords
- data
- module
- workflow
- management
- real
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Probability & Statistics with Applications (AREA)
- Fuzzy Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data platform system based on real-time stream computing. The system acquires and stores the behavioral data of predetermined data and converts that data into a series of small data streams, enabling high-throughput, fault-tolerant processing of real-time streaming data. By converting the behavioral data into a series of short batch jobs and parsing them to extract key information, the system turns large volumes of data into micro-batches and computes them quickly in a distributed manner, achieving high throughput and low latency. This meets the timeliness requirements of data delivery systems and suits latency-sensitive services.
Description
Technical field
The present invention relates to data warehouse design technology for computer internet applications, and in particular to a data platform system based on real-time stream computing.
Background art
Real-time stream computing segments a data stream into a series of short batch jobs and passes the segmented stream to a batch-processing engine. The engine cleans, extracts and transforms the data and keeps the processing results in memory.
A traditional standardized data warehouse relies mainly on mechanisms such as Oracle scheduled tasks, triggers and partitioned tables to perform near-real-time statistical analysis on stored data. In both performance and functionality, this approach can no longer meet users' demands for timely and accurate data processing and analysis.
With the rapid development of the Internet, the big data era has arrived and data volumes are growing explosively. Traditional data processing methods are no longer suited to analyzing massive data, and a new data processing method is needed to handle complex business logic in real time.
Summary of the invention
The technical problem to be solved by the present invention is to establish a mature real-time stream computing system that provides data services quickly, effectively and accurately. To solve this problem, the present invention provides a data platform system based on real-time stream computing, comprising a data source module, a real-time data computation module, a data dashboard display module and a job scheduling management platform.
The data source module is responsible for configuring data source connection information and extracting the data that the business needs to compute and synchronize in real time.
The real-time data computation module is responsible for real-time data computation and storage, and for the resource management and scheduling of each node.
The data dashboard display module is used for interactive analysis.
The job scheduling management platform is responsible for configuring workflows, keeping the real-time computation tasks running normally and scheduling workflows on a timer, thereby ensuring data accuracy.
The data source module comprises a data volume module and a data origin module.
The data volume module determines whether the data needs to be initialized. When the data volume is below roughly 100,000 records (written 10W+ in the original notation), the data only needs to be synchronized by loading the full table, and no initialization is required.
The data origin module covers structured data, such as Oracle databases, MySQL databases, structured files and Hive tables, together with the data that the business needs to compute and synchronize in real time, such as sales volume, sales orders, number of members and number of vouchers.
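The data volume rule can be illustrated with a minimal Python sketch. The threshold constant, the function name and the strategy labels are assumptions introduced for illustration; the text only states that tables below roughly 100,000 records are reloaded in full without initialization.

```python
# Assumed threshold: the text gives "10W+", read here as 100,000 rows.
INIT_THRESHOLD_ROWS = 100_000


def choose_sync_strategy(row_count: int, threshold: int = INIT_THRESHOLD_ROWS) -> str:
    """Pick a synchronization strategy from the size of the source table."""
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    if row_count < threshold:
        # Small table: reload the whole table on every sync, no initialization.
        return "full_load"
    # Large table: one-off initialization, then incremental synchronization.
    return "initialize_then_incremental"


if __name__ == "__main__":
    print(choose_sync_strategy(50_000))
    print(choose_sync_strategy(1_200_000))
```

Keeping the threshold as a parameter makes it possible to tune the cut-off per data source rather than hard-coding one value for all tables.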
The real-time data computation module comprises a real-time data synchronization module, a YARN distributed management system, a data storage module and a data computation module.
The real-time data synchronization module comprises a data volume module and a data origin module, whose functions are identical to those in the data source module.
The YARN distributed management system schedules the resources of each node in the cluster to achieve efficient resource management. For example, when data needs to be stored, the master node ResourceManager of the YARN distributed management system allocates resources appropriately according to the resource requests of the slave NodeManager nodes.
The data storage module uses Spark SQL micro-batch processing to fetch data and synchronize it to HDFS (Hadoop Distributed File System), and keeps it in memory; after processing, the final data resides on the nodes of the cluster.
The data computation module registers the data held in cluster memory as a temporary table and performs logical computation on it with SQL statements. For example, after data has been extracted from the data source into Hive in real-time micro-batches, SQL statements filter the data and join it with other tables to compute the required metric data.
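The "register a temporary table, then filter and join with SQL" pattern can be sketched as follows. This is not Spark code: Python's built-in sqlite3 module stands in for Spark SQL so that the example runs anywhere, and the table names, column subset and the 'PAID' status value are assumptions made for illustration.

```python
import sqlite3


def sales_by_org(orders, orgs):
    """Register rows as temporary tables, then filter and join them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        # Temporary tables play the role of the registered in-memory tables.
        conn.execute(
            "CREATE TEMP TABLE orders "
            "(orderno TEXT, totalprice REAL, orderstatus TEXT, orgid TEXT)"
        )
        conn.execute("CREATE TEMP TABLE orgs (orgid TEXT, orgname TEXT)")
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", orders)
        conn.executemany("INSERT INTO orgs VALUES (?, ?)", orgs)
        # Filter on order status, join with the organization table, aggregate.
        return conn.execute(
            """
            SELECT g.orgname, SUM(o.totalprice) AS sales
            FROM orders AS o
            JOIN orgs AS g ON g.orgid = o.orgid
            WHERE o.orderstatus = 'PAID'
            GROUP BY g.orgname
            ORDER BY g.orgname
            """
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    demo_orders = [
        ("A1", 10.0, "PAID", "o1"),
        ("A2", 5.0, "CANCELLED", "o1"),
        ("A3", 7.5, "PAID", "o2"),
    ]
    demo_orgs = [("o1", "East"), ("o2", "West")]
    print(sales_by_org(demo_orders, demo_orgs))
```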
The data dashboard display module comprises an interactive analysis module, a report display module and a permission control module.
The interactive analysis module performs logical computation on data sets and analyzes and processes them by different dimensions and themes. For example, the data of one table can be aggregated and its metrics computed daily along the time dimension, or divided by province along the regional dimension, yielding different data sets from which different reports are produced.
The report display module presents data through different chart types, such as bar charts, line charts and text boxes.
The permission control module controls the permission to view and modify each report, granting each business party the corresponding permissions.
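Analysis by dimension, as performed by the interactive analysis module, amounts to grouping the same rows by different keys. A minimal sketch, assuming rows are dictionaries with hypothetical `date`, `province` and `amount` fields:

```python
from collections import defaultdict


def aggregate_by(rows, dimension, measure="amount"):
    """Sum one measure over all rows, grouped by the chosen dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


if __name__ == "__main__":
    rows = [
        {"date": "2019-05-13", "province": "Jiangsu", "amount": 100.0},
        {"date": "2019-05-13", "province": "Zhejiang", "amount": 50.0},
        {"date": "2019-05-14", "province": "Jiangsu", "amount": 25.0},
    ]
    print(aggregate_by(rows, "date"))      # time dimension
    print(aggregate_by(rows, "province"))  # regional dimension
```

The same input yields a different data set for each dimension, which is what allows one table to feed several reports.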
The job scheduling management platform is used for the near-real-time scheduling of Spark tasks.
Near-real-time scheduling of Spark tasks comprises Workflow configuration, Coordinator configuration and workflow monitoring management. Workflows can be scheduled in parallel or serially, and a failed workflow can be rerun, that is, its scheduled execution can be restarted.
The Workflow configuration is used to schedule Spark tasks. Each Spark task requires one job to be configured, and jobs of the same category are configured in one workflow. When multiple workflows are scheduled in parallel, the failure of one Spark task does not affect the other tasks running in the same workflow.
The Coordinator configuration manages the timed scheduling of workflows; it requires the corresponding workflow, the scheduling time and the scheduling frequency to be specified.
The workflow monitoring management monitors the state and run time of each Spark task.
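The two scheduling properties described here, failure isolation between parallel tasks and rerunning of failed work, can be sketched with a thread pool. This illustrates the behavior only and is not how the scheduler itself is implemented; the function names and the state labels `SUCCEEDED` and `FAILED` are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(jobs, max_workers=4):
    """Run all jobs in parallel and return a {job_name: state} mapping."""

    def run_one(job):
        # Each job's exception is caught here, so one failure cannot
        # abort the other jobs of the same workflow.
        try:
            job()
            return "SUCCEEDED"
        except Exception:
            return "FAILED"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(run_one, job) for name, job in jobs.items()}
        return {name: future.result() for name, future in futures.items()}


def rerun_failed(jobs, states):
    """Rerun only the jobs whose last state was FAILED."""
    failed = {name: jobs[name] for name, state in states.items() if state == "FAILED"}
    updated = dict(states)
    updated.update(run_workflow(failed))
    return updated


if __name__ == "__main__":
    def broken():
        raise RuntimeError("simulated task failure")

    print(run_workflow({"load_orders": lambda: None, "load_members": broken}))
```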
The data storage module uses Spark SQL micro-batch processing to fetch data and synchronize it to HDFS. The data synchronized to HDFS is a DataFrame (a structured data set), which can be registered as a temporary table and then operated on with SQL.
The YARN distributed management system is responsible for cluster resource management and scheduling; it is used to store the data of the data source module in a distributed manner on the HDFS nodes and to schedule the resources of each node.
The data stored in a distributed manner on HDFS is mapped to a GreenPlum external table.
The data of the GreenPlum external table undergoes logical computation in a stored procedure and is inserted into a GreenPlum internal table, through the following steps:
Step 1: extract the data of the data source module in micro-batches using Spark SQL;
Step 2: register the extracted data as a temporary table;
Step 3: insert the data of the temporary table into a Hive external table;
Step 4: map the data from the Hive external table into a GreenPlum external table;
Step 5: run a stored procedure that performs logical computation on the data of the GreenPlum external table (the computation is expressed in SQL statements) and inserts the result into a GreenPlum internal table;
Step 6: write the data of the internal table into a data set and display it in reports.
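The micro-batch flow of the steps above can be sketched in plain Python. Lists stand in for the temporary, external and internal tables, and the `transform` callable stands in for the stored procedure; all names and the batch size are assumptions made for illustration, and no Spark, Hive or GreenPlum API is used.

```python
from itertools import islice


def micro_batches(rows, batch_size):
    """Split an iterable of rows into consecutive lists of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def run_pipeline(source_rows, batch_size, transform):
    """Extract in micro-batches, stage each batch, transform it, load the result."""
    internal_table = []
    for batch in micro_batches(source_rows, batch_size):  # step 1: micro-batch extract
        staged = list(batch)  # steps 2 to 4: temporary table and external tables
        internal_table.extend(transform(row) for row in staged)  # step 5: compute and insert
    return internal_table  # step 6: data set handed to the reports


if __name__ == "__main__":
    print(list(micro_batches(range(5), 2)))
    print(run_pipeline(range(5), 2, lambda value: value * 10))
```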
The present invention has the following advantages:
Spark SQL is used to acquire and store the behavioral data of predetermined data, and the data is converted into a series of small data streams. This enables high-throughput, fault-tolerant processing of real-time streaming data: large volumes of data are converted into micro-batches and computed quickly in a distributed manner, achieving high throughput and low latency.
The YARN distributed management system schedules the resources of each node in the cluster so as to maximize resource utilization.
The distributed data storage system uses the disk space of every server over the network, combining the scattered storage resources into one virtual storage device. Data is stored dispersed across the network, which gives high resource efficiency and high security.
Real-time data computation converts the data into RDD data sets cached in memory. Because the data sets are used frequently, caching reduces the I/O operations, network transfers and recomputation time for intermediate results, greatly speeding up the jobs and achieving near-real-time synchronization and computation of data.
The interactive analysis module can specify different dimensions and themes according to different requirements in order to produce the corresponding reports.
The permission control module controls permissions per user and per report. Permissions can be set on users and reports according to user needs, so that each user sees only the reports associated with them.
Job monitoring: the running state of a job can be viewed in real time by job, job name and state (ready, success, running, failed, alarm), and a job can be rerun with the reset button.
Description of the drawings
The present invention is further described below with reference to the accompanying drawing and specific embodiments, from which the above and other advantages of the invention will become more apparent.
Fig. 1 is the system structure diagram of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawing and embodiments.
As shown in Fig. 1, the invention discloses a data platform system based on real-time stream computing, which is a system for near-real-time data synchronization and computation. It comprises a data source module, a real-time data computation module, a data dashboard display module and a job scheduling management platform:
Data source module: data volume and data origin;
Real-time data computation module: real-time data synchronization module, YARN distributed management system, data storage and data computation;
Data dashboard display module: interactive analysis module, report display module and permission control module;
Job scheduling management platform: Workflow configuration, Coordinator configuration and workflow monitoring management.
1. Data source module
Responsible for configuring data source connection information and extracting the data that the business needs to compute and synchronize in real time.
1.1 Data volume
When the data volume of a data source is small, all of the data can be extracted on every run for synchronization and computation. When the data volume is large, all of the data is first extracted, synchronized and computed once as initialization; subsequent synchronization and computation runs then re-extract only a recent period of data (a look-back window), thereby achieving real-time data synchronization.
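The look-back approach for large tables can be sketched as follows. The one-day default window, the `updated_at` field and the replace-in-window merge are assumptions made for illustration; the text only states that a recent period of data is re-extracted on each run.

```python
from datetime import datetime, timedelta


def lookback_window(now, lookback=timedelta(days=1)):
    """Return the (start, end) interval that a synchronization run re-extracts."""
    return now - lookback, now


def incremental_sync(source_rows, target_rows, now, lookback=timedelta(days=1)):
    """Replace the target rows inside the window with fresh source rows."""
    start, end = lookback_window(now, lookback)

    def in_window(row):
        return start <= row["updated_at"] <= end

    # Rows outside the window are kept as they are; rows inside it are
    # dropped and reloaded, so repeating the run gives the same result.
    kept = [row for row in target_rows if not in_window(row)]
    fresh = [row for row in source_rows if in_window(row)]
    return kept + fresh


if __name__ == "__main__":
    run_time = datetime(2019, 5, 14, 12, 0)
    print(lookback_window(run_time))
```

Reloading the whole window instead of tracking individual changes keeps each run idempotent, at the cost of re-reading rows that did not change.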
1.2 Data origin
Data origins include structured data, such as Oracle databases, MySQL databases, structured files and Hive tables.
2. Real-time data computation module
Responsible for real-time data computation and storage, and for the resource management and scheduling of each node.
3. Real-time data synchronization module
The real-time data synchronization module uses Spark SQL to process in micro-batches the subset of source fields required by the metrics and synchronizes them to HDFS in real time.
4. YARN distributed management system
The YARN distributed management system consists of the ResourceManager and the ApplicationMaster. The ResourceManager is responsible for resource management and scheduling across the whole cluster, while the ApplicationMaster handles application-level concerns such as task scheduling, task monitoring and fault tolerance.
5. Data storage
Data storage consists of data caching and data persistence. When data is extracted to HDFS, it is cached in server memory, and the final data is stored on the individual server nodes.
6. Data computation
The data cached in server memory is registered as a temporary table through a DataFrame, and complex logical computation is performed with SQL statements. Combining Hive external tables with GreenPlum external tables addresses both the slow loading of massive data and the data type inconsistencies between systems: the data is mapped through a Hive external table into a GreenPlum external table, and a stored procedure is called to insert the data into a GreenPlum internal table in the result layer. For example, a stored procedure is called to perform complex logical computation in SQL on the GreenPlum external table.
7. Data dashboard display module
Responsible for the interactive analysis module for data, the report display module and the permission control module for reports.
8. Interactive analysis module
The interactive analysis module classifies the data in a data set by different dimensions and themes.
9. Report display module
The report display module consists of two parts, report authoring and report presentation. When authoring a report, the metric data in a data set can be dragged and dropped directly and presented as bar charts, line charts, text boxes and similar forms; the page refresh interval is set according to requirements.
10. Permission control module
The permission control module sets each user's permission to view reports, so that a user can view only the reports related to that user; other reports do not appear in the interface visible to the user.
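Per-user report visibility can be sketched as a simple filter over a grant table. The nested-dictionary layout and the permission labels `view` and `modify` are assumptions introduced for illustration.

```python
def visible_reports(user, reports, grants):
    """Return only the reports that the user is allowed to view."""
    user_grants = grants.get(user, {})
    return [name for name in reports if "view" in user_grants.get(name, set())]


def can_modify(user, report, grants):
    """Check whether the user may modify the given report."""
    return "modify" in grants.get(user, {}).get(report, set())


if __name__ == "__main__":
    grants = {
        "sales_manager": {"daily_sales": {"view", "modify"}, "members": {"view"}},
        "auditor": {"daily_sales": {"view"}},
    }
    reports = ["daily_sales", "members", "vouchers"]
    print(visible_reports("auditor", reports, grants))
    print(can_modify("auditor", "daily_sales", grants))
```

A user with no entry in the grant table sees nothing, which matches the rule that unrelated reports do not appear in the user's interface.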
11. Job scheduling management platform
Responsible for configuring workflows, keeping the real-time computation tasks running normally and scheduling workflows on a timer, thereby ensuring data accuracy.
12. Workflow configuration
A Workflow is configured in Oozie. Each job is configured with one Spark program, and one workflow can contain multiple jobs that are processed concurrently. The configuration must import the connection package for the data source and the project's jar package. When multiple jobs are processed concurrently, their states do not affect one another.
13. Coordinator configuration
A Coordinator is configured in Oozie: first select the workflow to be scheduled on a timer, then set the start time, the end time and the frequency of the timed schedule.
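The three Coordinator settings (start time, end time, frequency) determine a sequence of trigger times. A minimal sketch of that arithmetic follows; it does not use Oozie itself, and treating the end time as exclusive is an assumption of this sketch.

```python
from datetime import datetime, timedelta


def trigger_times(start, end, frequency):
    """List the run times from start (inclusive) up to end (exclusive)."""
    if frequency <= timedelta(0):
        raise ValueError("frequency must be positive")
    times = []
    current = start
    while current < end:
        times.append(current)
        current += frequency
    return times


if __name__ == "__main__":
    for run_time in trigger_times(
        datetime(2019, 5, 14, 0, 0),
        datetime(2019, 5, 14, 1, 0),
        timedelta(minutes=20),
    ):
        print(run_time.isoformat())
```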
14. Workflow monitoring management
Workflow monitoring management covers the running state of workflows and the running state of Spark tasks. After a workflow is started by the timed schedule, its Spark tasks begin to run. The running state and run time of a Spark task can be looked up by workflow and task name, and the log files of the Spark task run can also be inspected.
In the present embodiment, the development and deployment environments are as follows:
Development environment:
Scala version: 2.10.5
Spark: reuses the existing CDH component
IDE: IDEA
Deployment environment:
Scala version: 2.10.5
Spark version: 1.6.0
Zookeeper version: 3.4.5
MySQL version: 5.1.4
Oracle version: 10g
GreenPlum version: 5.7.0
The deployment locations of the stream computing system servers are shown in Table 1 below:
Table 1
Embodiment
In this embodiment, a table in an Oracle database holds 260M of data (Table 2 shows the sales data of a company).
Table 2
In Table 2, the first column ORDERNO is the order number, the second column MEMBERNO is the member number, the third column TOTALPRICE is the total price, the fourth column ORDERSTATUS is the order status, the fifth column ORGID is the number of the organization the order belongs to, and the sixth column OPERATORID is the operator number.
In the data source module, the Oracle database connection information is configured, including the user name, password and Oracle driver. Because the data volume is large, reaching 1.2 million rows (120W), a data initialization process must be run first, followed by the data synchronization process.
The data initialization process first extracts all the data in Oracle in micro-batches with Spark SQL. The data computation module keeps the extracted data in memory, converts it into a data set by registering a temporary table and inserts it into Hive. The data storage module stores the data on the individual server nodes, and the YARN distributed management system allocates resources appropriately across the cluster nodes. Through the mapping feature of Hive external tables, the data is mapped and synchronized into a GreenPlum external table. A stored procedure is then called to apply the required logic to the external table data in SQL, including joins with other tables, conditional filtering and logical computation, and the data is inserted into a GreenPlum internal table.
The data synchronization process likewise first extracts data from Oracle in micro-batches with Spark SQL, but only the data from the current time back over one day. The subsequent steps are the same as in initialization, except that the Hive external table and the GreenPlum external table must be emptied before the data is inserted each time. Each data synchronization run is one scheduled execution, dispatched by the job scheduling management platform. Job scheduling requires configuring the jar package of the executable code, the jar package of the connection library, the allocation of scheduling resources, the time of the timed schedule and so on. After processing, the interactive analysis module of the data dashboard display module uses data sets to process the result data along different dimensions, for example the time dimension (classifying data by time period) or the regional dimension (classifying data by area). The permission control module controls the permissions of each report for different users, and the report display module presents the data visually.
The present invention provides a data platform system based on real-time stream computing. There are many methods and ways to implement this technical solution, and the above is only a preferred embodiment of the invention. It should be noted that a person of ordinary skill in the art may make various improvements and modifications without departing from the principle of the invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the invention. Any component not specified in this embodiment can be implemented with existing technology.
Claims (10)
1. A data platform system based on real-time stream computing, characterized in that it comprises a data source module, a real-time data computation module, a data dashboard display module and a job scheduling management platform;
wherein the data source module is responsible for configuring data source connection information and extracting the data that the business needs to compute and synchronize in real time;
the real-time data computation module is responsible for real-time data computation and storage, and for the resource management and scheduling of each node;
the data dashboard display module is used for interactive analysis;
the job scheduling management platform is responsible for configuring workflows, keeping the real-time computation tasks running normally and scheduling workflows on a timer, thereby ensuring data accuracy.
2. The system according to claim 1, characterized in that the data source module comprises a data volume module and a data origin module;
the data volume module determines whether the data needs to be initialized: when the data volume is below roughly 100,000 records (written 10W+ in the original notation), the data only needs to be synchronized by loading the full table, and no initialization is required;
the data origin module covers structured data.
3. The system according to claim 2, characterized in that the real-time data computation module comprises a real-time data synchronization module, a YARN distributed management system, a data storage module and a data computation module;
the real-time data synchronization module comprises a data volume module and a data origin module;
the functions of the data volume module and the data origin module are identical to those in the data source module;
the YARN distributed management system schedules the resources of each node in the cluster to achieve efficient resource management;
the data storage module uses Spark SQL micro-batch processing to fetch data and synchronize it to HDFS, and keeps it in memory; after processing, the final data resides on the nodes of the cluster;
the data computation module registers the data held in cluster memory as a temporary table and performs logical computation on it with SQL statements.
4. The system according to claim 3, characterized in that the data dashboard display module comprises an interactive analysis module, a report display module and a permission control module;
the interactive analysis module performs logical computation on data sets and analyzes and processes them by different dimensions and themes;
the report display module presents data through different charts;
the permission control module controls the permission to view and modify each report, granting each business party the corresponding permissions.
5. The system according to claim 4, characterized in that the job scheduling management platform is used for the near-real-time scheduling of Spark tasks;
the near-real-time scheduling of Spark tasks comprises Workflow configuration, Coordinator configuration and workflow monitoring management; workflows can be scheduled in parallel or serially, and a failed workflow can be rerun, that is, its scheduled execution can be restarted;
the Workflow configuration is used to schedule Spark tasks; each Spark task requires one job to be configured, and jobs of the same category are configured in one workflow; when multiple workflows are scheduled in parallel, the failure of one Spark task does not affect the other tasks running in the same workflow;
the Coordinator configuration manages the timed scheduling of workflows and requires the corresponding workflow, the scheduling time and the scheduling frequency to be specified;
the workflow monitoring management monitors the state and run time of each Spark task.
6. The system according to claim 5, characterized in that the data storage module uses Spark SQL micro-batch processing to fetch data and synchronize it to HDFS; the data synchronized to HDFS is a DataFrame structured data set, which can be registered as a temporary table and then operated on with SQL.
7. The system according to claim 6, characterized in that the YARN distributed management system is responsible for cluster resource management and scheduling, and is used to store the data of the data source module in a distributed manner on the HDFS nodes and to schedule the resources of each node.
8. The system according to claim 7, characterized in that the data stored in a distributed manner on HDFS is mapped to a GreenPlum external table.
9. The system according to claim 8, characterized in that the data of the GreenPlum external table undergoes logical computation in a stored procedure and is inserted into a GreenPlum internal table.
10. The system according to claim 9, characterized in that the data of the GreenPlum external table undergoes logical computation in a stored procedure and is inserted into a GreenPlum internal table, through the following steps:
Step 1: extract the data of the data source module in micro-batches using Spark SQL;
Step 2: register the extracted data as a temporary table;
Step 3: insert the data of the temporary table into a Hive external table;
Step 4: map the data from the Hive external table into a GreenPlum external table;
Step 5: run a stored procedure that performs logical computation on the data of the GreenPlum external table and inserts the result into a GreenPlum internal table;
Step 6: write the data of the internal table into a data set and display it in reports.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910397951.XA CN110209646A (en) | 2019-05-14 | 2019-05-14 | Data platform system based on real-time stream computing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110209646A true CN110209646A (en) | 2019-09-06 |
Family
ID=67787178
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910397951.XA Pending CN110209646A (en) | Data platform system based on real-time stream computing | 2019-05-14 | 2019-05-14 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110209646A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110908641A (en) * | 2019-11-27 | 2020-03-24 | 中国建设银行股份有限公司 | Visualization-based stream computing platform, method, device and storage medium |
CN111198884A (en) * | 2019-12-27 | 2020-05-26 | 福建威盾科技集团有限公司 | Information processing method and information processing system for vehicle initial entering city |
CN111400352A (en) * | 2020-03-18 | 2020-07-10 | 北京三维天地科技股份有限公司 | Workflow engine capable of processing data in batches |
CN112561368A (en) * | 2020-12-22 | 2021-03-26 | 绿瘦健康产业集团有限公司 | Visual achievement calculation method and device of OA examination and approval system |
CN112632114A (en) * | 2019-10-08 | 2021-04-09 | 中国移动通信集团辽宁有限公司 | Method and device for MPP database to quickly read data and computing equipment |
CN113064704A (en) * | 2021-03-18 | 2021-07-02 | 北京沃东天骏信息技术有限公司 | Task processing method and device, electronic equipment and computer readable medium |
CN115618194A (en) * | 2022-12-19 | 2023-01-17 | 江苏未至科技股份有限公司 | Spark-based data processing method |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160364211A1 (en) * | 2015-06-11 | 2016-12-15 | Electronics And Telecommunications Research Institute | Method for generating workflow model and method and apparatus for executing workflow model |
CN107515927A (en) * | 2017-08-24 | 2017-12-26 | 深圳市云房网络科技有限公司 | A kind of real estate user behavioural analysis platform |
CN108255855A (en) * | 2016-12-29 | 2018-07-06 | 北京国双科技有限公司 | Date storage method and device |
CN108446145A (en) * | 2018-03-21 | 2018-08-24 | 苏州提点信息科技有限公司 | A kind of distributed document loads MPP data base methods automatically |
CN108681569A (en) * | 2018-05-04 | 2018-10-19 | 亚洲保理(深圳)有限公司 | A kind of automatic data analysis system and its method |
CN108984547A (en) * | 2017-05-31 | 2018-12-11 | 北京京东尚科信息技术有限公司 | The method and apparatus of data processing |
CN109408546A (en) * | 2018-10-17 | 2019-03-01 | 深圳中顺易金融服务有限公司 | A kind of stream data processing method and processing device |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112632114A (en) * | 2019-10-08 | 2021-04-09 | 中国移动通信集团辽宁有限公司 | Method and device for MPP database to quickly read data and computing equipment |
CN112632114B (en) * | 2019-10-08 | 2024-03-19 | 中国移动通信集团辽宁有限公司 | Method, device and computing equipment for fast reading data by MPP database |
CN110908641A (en) * | 2019-11-27 | 2020-03-24 | 中国建设银行股份有限公司 | Visualization-based stream computing platform, method, device and storage medium |
CN110908641B (en) * | 2019-11-27 | 2024-04-26 | 中国建设银行股份有限公司 | Visualization-based stream computing platform, method, device and storage medium |
CN111198884A (en) * | 2019-12-27 | 2020-05-26 | 福建威盾科技集团有限公司 | Information processing method and information processing system for vehicle initial entering city |
CN111198884B (en) * | 2019-12-27 | 2023-06-06 | 福建威盾科技集团有限公司 | Method and system for processing information of first entering city of vehicle |
CN111400352A (en) * | 2020-03-18 | 2020-07-10 | 北京三维天地科技股份有限公司 | Workflow engine capable of processing data in batches |
CN112561368A (en) * | 2020-12-22 | 2021-03-26 | 绿瘦健康产业集团有限公司 | Visual achievement calculation method and device of OA examination and approval system |
CN112561368B (en) * | 2020-12-22 | 2023-08-01 | 广东壹健康健康产业集团股份有限公司 | Visual performance calculation method and device for OA approval system |
CN113064704A (en) * | 2021-03-18 | 2021-07-02 | 北京沃东天骏信息技术有限公司 | Task processing method and device, electronic equipment and computer readable medium |
CN115618194A (en) * | 2022-12-19 | 2023-01-17 | 江苏未至科技股份有限公司 | Spark-based data processing method |
Similar Documents
Publication | Title |
---|---|
CN110209646A (en) | A kind of data platform system calculated based on real-time streaming |
Hu et al. | Time- and cost-efficient task scheduling across geo-distributed data centers |
CN103092698B (en) | Cloud computing application automatic deployment system and method |
CN103930875B (en) | Software virtual machine for acceleration of transactional data processing |
CN107103064B (en) | Data statistical method and device |
CN111917887A (en) | System for realizing data governance under big data environment |
Isah et al. | A scalable and robust framework for data stream ingestion |
Ju et al. | iGraph: an incremental data processing system for dynamic graph |
CN103885986A (en) | Main and auxiliary database synchronization method and device |
CN105574082A (en) | Storm based stream processing method and system |
Huang et al. | Yugong: Geo-distributed data and job placement at scale |
CN106354729A (en) | Graph data handling method, device and system |
CN105930417B (en) | A kind of big data ETL interactive process platform based on cloud computing |
CN110704465B (en) | Method, device and storage medium for processing service work list |
CN103077192B (en) | A kind of data processing method and system thereof |
CN110308984A (en) | A cross-cluster computing system for processing geographically distributed data |
CN106407231A (en) | A data multi-thread export method and system |
CN110134430A (en) | A kind of data packing method, device, storage medium and server |
CN105553732B (en) | A kind of distributed network analogy method and system |
CN102129443A (en) | Real-time data transmission channel and method based on USAS (Univac Standard Airline Systems) host |
CN113672240A (en) | Container-based multi-machine-room batch automatic deployment application method and system |
CN114756629B (en) | Multi-source heterogeneous data interaction analysis engine and method based on SQL |
Sathya et al. | Application of Hadoop MapReduce technique to Virtual Database system design |
CN108563787A (en) | A kind of data interaction management system and method for data center's total management system |
CN105824892A (en) | Method for synchronizing and processing data by data pool |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |