CN105930373A - Spark streaming based big data stream processing method and system - Google Patents

Info

Publication number
CN105930373A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610228189.9A
Other languages
Chinese (zh)
Inventor
杜旭苗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Si Tech Information Technology Co Ltd
Original Assignee
Beijing Si Tech Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Si Tech Information Technology Co Ltd
Priority to CN201610228189.9A
Publication of CN105930373A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a Spark Streaming based big data stream processing method and system. The method includes: step S1, receiving data sent by a data source at a designated location, then executing step S2 if the data source is HDFS, or step S3 if the data source is FLUME; step S2, storing the data as a file, then executing step S3; step S3, processing the received data or file with Spark Streaming; and step S4, writing the processing result of the file or data to a result directory with Spark Streaming at each time interval. In terms of fault tolerance and data guarantees, the method and system provide better support for fault-tolerant stateful computation; in terms of the programming API, they support both Scala and Java programming; and in terms of cluster management integration, Spark Streaming can run on its own cluster as well as on YARN and Mesos.

Description

A big data stream processing method and system based on Spark Streaming
Technical field
The present invention relates to the field of big data stream processing, and in particular to a big data stream processing method and system based on Spark Streaming.
Background art
In the prior art, Storm is commonly used to implement the data flow model. In Storm's model, data flows continuously through a network of transformation entities. The abstraction of a data flow is called a stream, which is an unbounded sequence of tuples. A tuple is like a structure that, with some additional serialization code, can represent standard data types (such as integers, floating-point numbers and byte arrays) or user-defined types. Each stream is defined by a unique ID, which can be used to build the topology of data sources and sinks.
However, Storm has its own shortcomings. In terms of fault tolerance and data guarantees, Storm must track every individual record as it passes through the system, so it can guarantee that each record is processed at least once; but duplicate records are allowed when recovering from a failure, which means that mutable state may be incorrectly updated twice. In terms of implementation and programming API, Storm's core is written in Clojure (although most of the extension work is written in Java), which makes its implementation harder for us to understand. In terms of cluster management integration, Storm can run on its own cluster and can also run on Mesos, but running on YARN requires a third-party supporting component, Storm on YARN, rather than native support.
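The double-update problem described above can be illustrated with a minimal sketch. It is plain Python rather than Storm code, and every name in it (`apply_records`, `apply_batch_once`, the integer batch id) is an assumption made for illustration. The second half shows one possible remedy, remembering which batches have already been applied; it is not presented as the mechanism of Storm or of Spark Streaming.

```python
from collections import Counter


def apply_records(state, records):
    """Update mutable state in place, one record at a time."""
    for key in records:
        state[key] += 1


def apply_batch_once(state, applied_ids, batch_id, records):
    """Apply a whole batch only if its id has not been applied before."""
    if batch_id in applied_ids:
        return False
    apply_records(state, records)
    applied_ids.add(batch_id)
    return True


records = ["a", "b", "a"]

# At-least-once replay: the same records are applied again after a failure,
# so the mutable counter is updated twice.
naive = Counter()
apply_records(naive, records)
apply_records(naive, records)  # replay during recovery

# Batch-level bookkeeping (hypothetical): the replayed batch is skipped.
guarded = Counter()
applied = set()
first = apply_batch_once(guarded, applied, 0, records)
second = apply_batch_once(guarded, applied, 0, records)
```

After the replay, `naive` holds doubled counts, while `guarded` holds each count once because the second call for batch 0 is rejected.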
Summary of the invention
The technical problem to be solved by the present invention is to provide, in view of the deficiencies of the prior art, a big data stream processing method and system based on Spark Streaming.
The technical solution by which the present invention solves the above technical problem is as follows: a big data stream processing method based on Spark Streaming, comprising the following steps:
Step S1: receive the data sent by the data source at a designated location; if the data source is HDFS, execute step S2; if the data source is FLUME, execute step S3;
Step S2: store the data as a file, then execute step S3;
Step S3: Spark Streaming processes the received data or file;
Step S4: Spark Streaming writes the processing result of the file or data to a result directory at each time interval.
The beneficial effects of the present invention are as follows: Spark Streaming processes the received files or data in batches according to a time interval and writes the results to the result directory according to that interval. Compared with the prior art, in which Storm processes data or files one record at a time and records each processing result individually, the present invention speeds up processing and improves processing efficiency; and because results are recorded per time interval, fault tolerance is also better.
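Steps S1 to S4 can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the patented implementation: local temporary directories stand in for HDFS and for the result directory, word counting stands in for the unspecified processing of step S3, and all function names and file names are hypothetical.

```python
import os
import tempfile


def store_as_file(data, directory):
    """Step S2: persist the received data as a file and return its path."""
    os.makedirs(directory, exist_ok=True)
    name = "incoming-%d.txt" % len(os.listdir(directory))
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(data)
    return path


def process(text):
    """Step S3: placeholder processing (an assumption): count the words."""
    return len(text.split())


def receive(source, data, staging_dir):
    """Step S1: route by data source, then run S2 and/or S3."""
    if source == "HDFS":
        path = store_as_file(data, staging_dir)   # S2
        with open(path, encoding="utf-8") as fh:
            return process(fh.read())             # S3 on the stored file
    if source == "FLUME":
        return process(data)                      # S3 directly on the data
    raise ValueError("unsupported data source: %r" % source)


def write_result(result, result_dir, batch_index):
    """Step S4: write one result file per time interval (batch)."""
    os.makedirs(result_dir, exist_ok=True)
    path = os.path.join(result_dir, "batch-%04d.txt" % batch_index)
    with open(path, "w", encoding="utf-8") as out:
        out.write(str(result))
    return path


with tempfile.TemporaryDirectory() as root:
    staging = os.path.join(root, "in")
    hdfs_count = receive("HDFS", "a b c", staging)
    flume_count = receive("FLUME", "d e", staging)
    result_path = write_result(hdfs_count + flume_count,
                               os.path.join(root, "out"), 0)
    with open(result_path, encoding="utf-8") as check:
        written = check.read()
```

The only difference between the two branches is the extra file-storage step for HDFS; both converge on the same processing and the same per-interval result file.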
On the basis of the above technical solution, the present invention can be further improved as follows.
Further, in step S1, if the data source is HDFS, the designated location is a fixed HDFS directory; if the data source is FLUME, the designated location is an agreed port on an agreed host.
The beneficial effect of this further scheme is that the location at which data is received is assigned sensibly according to the type of data source, which avoids missing data and ensures that the data remains original and accurate.
Further, if the data source is HDFS, the method also includes, before step S3 is executed:
Spark Streaming monitors, at each time interval, whether any new file exists under the fixed HDFS directory; if so, step S3 is executed to process the new file; otherwise monitoring continues.
The beneficial effect of this further scheme is that, when the data source is HDFS, monitoring the fixed HDFS directory at each time interval to determine whether new files exist allows the new files to be collected for subsequent processing.
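The directory monitoring described above can be sketched as a polling function that reports only the files that have appeared since the previous check. This is a minimal illustration assuming a local directory in place of the fixed HDFS directory; the function name `new_files` and the file name are hypothetical. In Spark Streaming itself, directory monitoring of this kind is provided by `StreamingContext.textFileStream`.

```python
import os
import tempfile


def new_files(directory, seen):
    """Return files in `directory` not seen before, and remember them."""
    current = set(os.listdir(directory))
    fresh = sorted(current - seen)
    seen.update(current)
    return fresh


with tempfile.TemporaryDirectory() as watched:
    seen = set()
    first_poll = new_files(watched, seen)      # nothing yet: keep monitoring

    with open(os.path.join(watched, "part-0.txt"), "w") as fh:
        fh.write("hello")
    second_poll = new_files(watched, seen)     # one new file: process it

    third_poll = new_files(watched, seen)      # no change: keep monitoring
```

In a long-running job, `new_files` would be called once per time interval, with a non-empty result triggering step S3.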
Further, if the data source is FLUME and FLUME operates in push mode, the method also includes, before step S1 is executed:
Starting Spark Streaming and, based on Spark Streaming, monitoring at each time interval whether the agreed port of the agreed host has new data; if so, step S1 is executed to receive the new data; otherwise monitoring continues.
The beneficial effect of this further scheme is that, when the data source is FLUME and FLUME operates in push mode, monitoring the agreed port at each time interval to determine whether new data exists allows the new data to be collected for subsequent processing.
Further, if the data source is FLUME and FLUME operates in pull mode, the method also includes, before step S3 is executed:
Starting Spark Streaming and, based on Spark Streaming, monitoring at each time interval whether the agreed port of the agreed host has new data; if so, step S3 is executed to process the new data; otherwise monitoring continues.
The beneficial effect of this further scheme is that, when the data source is FLUME and FLUME operates in pull mode, monitoring the agreed port at each time interval to determine whether new data exists allows the new data to be collected for subsequent processing.
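The text does not spell out how the two FLUME modes differ internally. The sketch below assumes the usual reading: in push mode the source delivers each event to a waiting receiver, while in pull mode the source buffers events and the consumer fetches them when it is ready. It uses in-memory Python objects only; the class and method names are hypothetical and are not the Flume or Spark APIs.

```python
from collections import deque


class PushReceiver:
    """Push mode: the source calls the receiver for every event it sends."""

    def __init__(self):
        self.received = []

    def on_event(self, event):
        self.received.append(event)


class BufferingSink:
    """Pull mode: the source buffers events until the consumer asks."""

    def __init__(self):
        self._buffer = deque()

    def append(self, event):
        self._buffer.append(event)

    def poll(self, max_events):
        """Hand over up to `max_events` buffered events, oldest first."""
        batch = []
        while self._buffer and len(batch) < max_events:
            batch.append(self._buffer.popleft())
        return batch


events = ["e1", "e2", "e3"]

receiver = PushReceiver()
for event in events:
    receiver.on_event(event)       # the source drives delivery

sink = BufferingSink()
for event in events:
    sink.append(event)             # the source only buffers
pulled_first = sink.poll(2)        # the consumer drives delivery
pulled_second = sink.poll(2)
pulled_empty = sink.poll(2)        # nothing new: keep monitoring
```

An empty result from `poll` corresponds to the "otherwise monitoring continues" branch in the text.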
Another technical solution by which the present invention solves the above technical problem is as follows: a big data stream processing system based on Spark Streaming, comprising a data receiving module, a file storage module, a processing module and a writing module:
The data receiving module is configured to receive the data sent by the data source at a designated location, to call the file storage module if the data source is HDFS, and to call the processing module if the data source is FLUME;
The file storage module is configured to store the data as a file and to call the processing module;
The processing module is configured to process the received data or file based on Spark Streaming and to call the writing module;
The writing module is configured to write, based on Spark Streaming, the processing result of the file or data to a result directory at each time interval.
The beneficial effects of the present invention are as follows: Spark Streaming processes the received files or data in batches according to a time interval and writes the results to the result directory according to that interval. Compared with the prior art, in which Storm processes data or files one record at a time and records each processing result individually, the present invention speeds up processing and improves processing efficiency; and because results are recorded per time interval, fault tolerance is also better.
On the basis of the above technical solution, the present invention can be further improved as follows.
Further, if the data source is HDFS, the designated location is a fixed HDFS directory; if the data source is FLUME, the designated location is an agreed port on an agreed host.
The beneficial effect of this further scheme is that the location at which data is received is assigned sensibly according to the type of data source, which avoids missing data and ensures that the data remains original and accurate.
Further, if the data source is HDFS, the system also includes:
A first monitoring module, connected to the file storage module and the processing module respectively, configured to monitor, based on Spark Streaming and at each time interval, whether any new file exists under the fixed HDFS directory; if so, it calls the processing module; otherwise it continues monitoring.
The beneficial effect of this further scheme is that, when the data source is HDFS, monitoring the fixed HDFS directory at each time interval to determine whether new files exist allows the new files to be collected for subsequent processing.
Further, if the data source is FLUME and FLUME operates in push mode, the system also includes:
A second monitoring module, connected to the data receiving module, configured to start Spark Streaming and, based on Spark Streaming, to monitor at each time interval whether the agreed port of the agreed host has new data; if so, it calls the data receiving module; otherwise it continues monitoring.
The beneficial effect of this further scheme is that, when the data source is FLUME and FLUME operates in push mode, monitoring the agreed port at each time interval to determine whether new data exists allows the new data to be collected for subsequent processing.
Further, if the data source is FLUME and FLUME operates in pull mode, the system also includes:
A third monitoring module, connected to the data receiving module and the processing module respectively, configured to start Spark Streaming and, based on Spark Streaming, to monitor at each time interval whether the agreed port of the agreed host has new data; if so, it calls the processing module; otherwise it continues monitoring.
The beneficial effect of this further scheme is that, when the data source is FLUME and FLUME operates in pull mode, monitoring the agreed port at each time interval to determine whether new data exists allows the new data to be collected for subsequent processing.
Brief description of the drawings
Fig. 1 is a flowchart of the big data stream processing method based on Spark Streaming according to the present invention;
Fig. 2 is a flowchart of stream processing by Spark Streaming when the data source is HDFS;
Fig. 3 is a flowchart of stream processing by Spark Streaming when the data source is FLUME in push mode;
Fig. 4 is a flowchart of stream processing by Spark Streaming when the data source is FLUME in pull mode;
Fig. 5 is a structural diagram of the big data stream processing system based on Spark Streaming according to the present invention;
Fig. 6 is a system structure diagram for stream processing by Spark Streaming when the data source is HDFS;
Fig. 7 is a system structure diagram for stream processing by Spark Streaming when the data source is FLUME in push mode;
Fig. 8 is a system structure diagram for stream processing by Spark Streaming when the data source is FLUME in pull mode.
Detailed description of the embodiments
The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the invention and are not intended to limit its scope.
Spark Streaming is an extension of the Spark core API that enables high-throughput, fault-tolerant stream processing of real-time data streams. Spark Streaming supports many data sources, including Kafka, Flume, Twitter, ZeroMQ and traditional TCP sockets.
Spark Streaming does not process a data stream one record at a time as Storm does. Instead, before processing, it cuts the stream into a series of batch jobs, one segment per time interval. Spark's abstraction for a continuous data stream is called a DStream (Discretized Stream); a DStream is a micro-batched series of RDDs (Resilient Distributed Datasets). An RDD is a distributed dataset that can be operated on in parallel in two ways: transformations by arbitrary functions and transformations over sliding windows of data.
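The cutting of a stream into per-interval batches can be sketched in a few lines. This is an illustration of the micro-batching idea only, not the DStream implementation; the function name and the `(timestamp, value)` record shape are assumptions, and timestamps are assumed to be non-negative.

```python
def batch_by_interval(records, interval):
    """Group (timestamp, value) records into consecutive fixed-width batches.

    Batch k holds the records with k * interval <= timestamp < (k + 1) * interval.
    Intervals without records yield empty batches, as a periodic job would.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    if not records:
        return []
    last_index = int(max(ts for ts, _ in records) // interval)
    batches = [[] for _ in range(last_index + 1)]
    for ts, value in records:
        batches[int(ts // interval)].append(value)
    return batches


stream = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (3.5, "d")]
batches = batch_by_interval(stream, 1.0)
```

With a one-second interval, the four records fall into four batches, the third of which is empty because no record arrived between 2.0 and 3.0.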
Fig. 1 is a flowchart of the big data stream processing method based on Spark Streaming according to the present invention.
As shown in Fig. 1, a big data stream processing method based on Spark Streaming includes the following steps:
Step S1: receive the data sent by the data source at a designated location. If the data source is HDFS, the designated location is a fixed HDFS directory; if the data source is FLUME, the designated location is an agreed port on an agreed host. If the data source is HDFS, execute step S2; if the data source is FLUME, execute step S3;
Step S2: store the data as a file, then execute step S3;
Step S3: Spark Streaming processes the received data or file;
Step S4: Spark Streaming writes the processing result of the file or data to a result directory at each time interval.
Fig. 2 is a flowchart of stream processing by Spark Streaming when the data source is HDFS. As shown in Fig. 2, if the data source is HDFS, the method also includes, before step S3 is executed: Spark Streaming monitors, at each time interval, whether any new file exists under the fixed HDFS directory; if so, step S3 is executed to process the new file; otherwise monitoring continues.
Fig. 3 is a flowchart of stream processing by Spark Streaming when the data source is FLUME in push mode. As shown in Fig. 3, if the data source is FLUME and FLUME operates in push mode, the method also includes, before step S1 is executed: starting Spark Streaming and, based on Spark Streaming, monitoring at each time interval whether the agreed port of the agreed host has new data; if so, step S1 is executed to receive the new data; otherwise monitoring continues.
Fig. 4 is a flowchart of stream processing by Spark Streaming when the data source is FLUME in pull mode. As shown in Fig. 4, if the data source is FLUME and FLUME operates in pull mode, the method also includes, before step S3 is executed: starting Spark Streaming and, based on Spark Streaming, monitoring at each time interval whether the agreed port of the agreed host has new data; if so, step S3 is executed to process the new data; otherwise monitoring continues.
Fig. 5 is a structural diagram of the big data stream processing system based on Spark Streaming according to the present invention. From the above method, a big data stream processing system based on Spark Streaming can be derived as shown in Fig. 5, comprising a data receiving module, a file storage module, a processing module and a writing module. The data receiving module is configured to receive the data sent by the data source at a designated location. If the data source is HDFS, the designated location is a fixed HDFS directory; if the data source is FLUME, the designated location is an agreed port on an agreed host. If the data source is HDFS, the data receiving module calls the file storage module; if the data source is FLUME, it calls the processing module. The file storage module is configured to store the data as a file and to call the processing module. The processing module is configured to process the received data or file based on Spark Streaming and to call the writing module. The writing module is configured to write, based on Spark Streaming, the processing result of the file or data to a result directory at each time interval.
Fig. 6 is a system structure diagram for stream processing by Spark Streaming when the data source is HDFS. As shown in Fig. 6, if the data source is HDFS, the system also includes a first monitoring module, connected to the file storage module and the processing module respectively, configured to monitor, based on Spark Streaming and at each time interval, whether any new file exists under the fixed HDFS directory; if so, it calls the processing module; otherwise it continues monitoring.
Fig. 7 is a system structure diagram for stream processing by Spark Streaming when the data source is FLUME in push mode. As shown in Fig. 7, if the data source is FLUME and FLUME operates in push mode, the system also includes a second monitoring module, connected to the data receiving module, configured to start Spark Streaming and, based on Spark Streaming, to monitor at each time interval whether the agreed port of the agreed host has new data; if so, it calls the data receiving module; otherwise it continues monitoring.
Fig. 8 is a system structure diagram for stream processing by Spark Streaming when the data source is FLUME in pull mode. As shown in Fig. 8, if the data source is FLUME and FLUME operates in pull mode, the system also includes a third monitoring module, connected to the data receiving module and the processing module respectively, configured to start Spark Streaming and, based on Spark Streaming, to monitor at each time interval whether the agreed port of the agreed host has new data; if so, it calls the processing module; otherwise it continues monitoring.
Compared with Storm in the prior art, the advantages of the present invention are as follows. In terms of fault tolerance and data guarantees, Spark Streaming provides better support for fault-tolerant stateful computation. In terms of implementation and programming API, Spark Streaming is programmed in Scala and also supports Java. A further good property of Spark Streaming is that it runs on Spark, so the same code can be written as for batch processing, with no need to write separate code to process real-time streaming data and historical data. In terms of cluster management integration, Spark Streaming can run on its own cluster and can also run on both YARN and Mesos; Spark Streaming supports YARN natively.
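The point about sharing code between batch and streaming work can be illustrated with a small sketch. It is plain Python rather than Spark code; the function `count_words` and the sample data are assumptions chosen for illustration. The same function is applied once to stored historical data and once per micro-batch to streaming data.

```python
from collections import Counter


def count_words(lines):
    """Shared logic: count words in any iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


historical = ["spark storm", "spark"]
micro_batches = [["spark flume"], ["storm"]]

# Batch use: one call over the stored history.
history_counts = count_words(historical)

# Streaming use: the same function is applied to each micro-batch in turn,
# and the per-batch results are merged into a running total.
stream_counts = Counter()
for batch in micro_batches:
    stream_counts += count_words(batch)
```

Only the driver loop differs between the two uses; the processing logic is written once.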
In the description of this specification, descriptions referring to the terms "embodiment one", "embodiment two", "example", "specific example" or "some examples" mean that the specific method, device or feature described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, illustrative uses of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, methods, devices or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict each other, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those different embodiments or examples.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A big data stream processing method based on Spark Streaming, characterized by comprising the following steps:
Step S1: receive the data sent by the data source at a designated location; if the data source is HDFS, execute step S2; if the data source is FLUME, execute step S3;
Step S2: store the data as a file, then execute step S3;
Step S3: Spark Streaming processes the received data or file;
Step S4: Spark Streaming writes the processing result of the file or data to a result directory at each time interval.
2. The big data stream processing method based on Spark Streaming according to claim 1, characterized in that, in step S1, if the data source is HDFS, the designated location is a fixed HDFS directory; if the data source is FLUME, the designated location is an agreed port on an agreed host.
3. The big data stream processing method based on Spark Streaming according to claim 2, characterized in that, if the data source is HDFS, the method further comprises, before step S3 is executed:
Spark Streaming monitors, at each time interval, whether any new file exists under the fixed HDFS directory; if so, step S3 is executed to process the new file; otherwise monitoring continues.
4. The big data stream processing method based on Spark Streaming according to claim 2, characterized in that, if the data source is FLUME and FLUME operates in push mode, the method further comprises, before step S1 is executed:
Starting Spark Streaming and, based on Spark Streaming, monitoring at each time interval whether the agreed port of the agreed host has new data; if so, step S1 is executed to receive the new data; otherwise monitoring continues.
5. The big data stream processing method based on Spark Streaming according to claim 2, characterized in that, if the data source is FLUME and FLUME operates in pull mode, the method further comprises, before step S3 is executed:
Starting Spark Streaming and, based on Spark Streaming, monitoring at each time interval whether the agreed port of the agreed host has new data; if so, step S3 is executed to process the new data; otherwise monitoring continues.
6. A big data stream processing system based on Spark Streaming, characterized by comprising a data receiving module, a file storage module, a processing module and a writing module:
The data receiving module is configured to receive the data sent by the data source at a designated location, to call the file storage module if the data source is HDFS, and to call the processing module if the data source is FLUME;
The file storage module is configured to store the data as a file and to call the processing module;
The processing module is configured to process the received data or file based on Spark Streaming and to call the writing module;
The writing module is configured to write, based on Spark Streaming, the processing result of the file or data to a result directory at each time interval.
7. The big data stream processing system based on Spark Streaming according to claim 6, characterized in that, if the data source is HDFS, the designated location is a fixed HDFS directory; if the data source is FLUME, the designated location is an agreed port on an agreed host.
8. The big data stream processing system based on Spark Streaming according to claim 7, characterized in that, if the data source is HDFS, the system further comprises:
A first monitoring module, connected to the file storage module and the processing module respectively, configured to monitor, based on Spark Streaming and at each time interval, whether any new file exists under the fixed HDFS directory; if so, it calls the processing module; otherwise it continues monitoring.
9. The big data stream processing system based on Spark Streaming according to claim 7, characterized in that, if the data source is FLUME and FLUME operates in push mode, the system further comprises:
A second monitoring module, connected to the data receiving module, configured to start Spark Streaming and, based on Spark Streaming, to monitor at each time interval whether the agreed port of the agreed host has new data; if so, it calls the data receiving module; otherwise it continues monitoring.
10. The big data stream processing system based on Spark Streaming according to claim 7, characterized in that, if the data source is FLUME and FLUME operates in pull mode, the system further comprises:
A third monitoring module, connected to the data receiving module and the processing module respectively, configured to start Spark Streaming and, based on Spark Streaming, to monitor at each time interval whether the agreed port of the agreed host has new data; if so, it calls the processing module; otherwise it continues monitoring.
CN201610228189.9A 2016-04-13 2016-04-13 Spark streaming based big data stream processing method and system Pending CN105930373A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610228189.9A CN105930373A (en) 2016-04-13 2016-04-13 Spark streaming based big data stream processing method and system

Publications (1)

Publication Number Publication Date
CN105930373A true CN105930373A (en) 2016-09-07

Family

ID=56839072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610228189.9A Pending CN105930373A (en) 2016-04-13 2016-04-13 Spark streaming based big data stream processing method and system

Country Status (1)

Country Link
CN (1) CN105930373A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106371366A (en) * 2016-09-22 2017-02-01 南京中新赛克科技有限责任公司 ARM architecture-based big data acquisition and analysis platform
CN107256158A (en) * 2017-06-07 2017-10-17 广州供电局有限公司 The detection method and system of power system load reduction
CN107294801A (en) * 2016-12-30 2017-10-24 江苏号百信息服务有限公司 Stream Processing method and system based on magnanimity real-time Internet DPI data
CN107566341A (en) * 2017-07-31 2018-01-09 南京邮电大学 A kind of data persistence storage method and system based on federal distributed file storage system
CN108132986A (en) * 2017-12-14 2018-06-08 北京航天测控技术有限公司 A kind of immediate processing method of aircraft magnanimity biosensor assay data
CN108540407A (en) * 2018-03-01 2018-09-14 山东大学 Spark Streaming receivers Dynamic Configurations and device in a kind of big data platform
CN108647329A (en) * 2018-05-11 2018-10-12 中国联合网络通信集团有限公司 Processing method, device and the computer readable storage medium of user behavior data
CN110287215A (en) * 2019-05-20 2019-09-27 湖南大学 Large-scale area target real-time searching method based on hibert curve
WO2020233262A1 (en) * 2019-07-12 2020-11-26 之江实验室 Spark-based multi-center data collaborative computing stream processing method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104636494A (en) * 2015-03-04 2015-05-20 浪潮电子信息产业股份有限公司 Spark-based log auditing and reversed checking system for big data platforms
CN105207826A (en) * 2015-10-26 2015-12-30 南京联成科技发展有限公司 Security attack alarm positioning system based on Spark big data platform of Tachyou
CN105302890A (en) * 2015-10-16 2016-02-03 海信集团有限公司 Multimedia content online recommendation method and multimedia content online recommendation auxiliary method and apparatus

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104636494A (en) * 2015-03-04 2015-05-20 浪潮电子信息产业股份有限公司 Spark-based log auditing and reversed checking system for big data platforms
CN105302890A (en) * 2015-10-16 2016-02-03 海信集团有限公司 Multimedia content online recommendation method and multimedia content online recommendation auxiliary method and apparatus
CN105207826A (en) * 2015-10-26 2015-12-30 南京联成科技发展有限公司 Security attack alarm positioning system based on Spark big data platform of Tachyou

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
何美斌 等: ""基于SPARK 的信令数据实时处理平台设计"", 《电子技术与软件工程》 *
王家林 等: "《Spark大数据实例开发教程》", 31 January 2016 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106371366A (en) * 2016-09-22 2017-02-01 Nanjing Sinovatio Technology Co Ltd ARM architecture-based big data acquisition and analysis platform
CN107294801B (en) * 2016-12-30 2020-03-31 Jiangsu Best Tone Information Service Co Ltd Streaming processing method and system based on massive real-time internet DPI data
CN107294801A (en) * 2016-12-30 2017-10-24 Jiangsu Best Tone Information Service Co Ltd Stream processing method and system based on massive real-time Internet DPI data
CN107256158A (en) * 2017-06-07 2017-10-17 Guangzhou Power Supply Bureau Co Ltd Method and system for detecting load reduction of power system
CN107256158B (en) * 2017-06-07 2021-06-18 Guangzhou Power Supply Bureau Co Ltd Method and system for detecting load reduction of power system
CN107566341A (en) * 2017-07-31 2018-01-09 Nanjing University of Posts and Telecommunications Data persistence storage method and system based on federal distributed file storage system
CN107566341B (en) * 2017-07-31 2020-03-31 Nanjing University of Posts and Telecommunications Data persistence storage method and system based on federal distributed file storage system
CN108132986A (en) * 2017-12-14 2018-06-08 Beijing Aerospace Measurement and Control Technology Co Ltd Immediate processing method for massive aircraft biosensor assay data
CN108540407A (en) * 2018-03-01 2018-09-14 Shandong University Dynamic configuration method and device for Spark Streaming receivers in a big data platform
CN108647329A (en) * 2018-05-11 2018-10-12 China United Network Communications Group Co Ltd Processing method, device, and computer-readable storage medium for user behavior data
CN108647329B (en) * 2018-05-11 2021-08-10 China United Network Communications Group Co Ltd User behavior data processing method and device and computer readable storage medium
CN110287215A (en) * 2019-05-20 2019-09-27 Hunan University Large-scale area target real-time searching method based on Hilbert curve
WO2020233262A1 (en) * 2019-07-12 2020-11-26 Zhejiang Lab Spark-based multi-center data collaborative computing stream processing method

Similar Documents

Publication Publication Date Title
CN105930373A (en) Spark streaming based big data stream processing method and system
US9292448B2 (en) Dynamic sizing of memory caches
CN109918141B (en) Thread execution method, thread execution device, terminal and storage medium
CN107317838B (en) Astronomical metadata filing method and system based on streaming data processing architecture
CN105468735A (en) Stream preprocessing system and method based on mass information of mobile internet
CN1967487A (en) Cooperative scheduling using coroutines and threads
CN101944114A (en) Data synchronization method between memory database and physical database
CN102810184A (en) Method and device for dynamically executing workflow and enterprise system
CN104301442A (en) Method for implementing a client for accessing an object storage cluster based on FUSE
US20150112934A1 (en) Parallel scanners for log based replication
CN106339217B (en) Event management method and system based on Unity
CN106997394B (en) Method and system for processing out-of-order data arrival
CN107678923A (en) Optimization method for message processing in a distributed file system
Petrov et al. Adaptive performance model for dynamic scaling Apache Spark Streaming
CN110647392A (en) Intelligent elastic expansion method based on container cluster
Liu et al. Optimizing shuffle in wide-area data analytics
CN111209467A (en) Data real-time query system under multi-concurrency multi-channel environment
US9515886B2 (en) Rule set orchestration processing method and apparatus, and cluster data system
CN102915344A (en) SQL (structured query language) statement processing method and device
CN104410511B (en) Server management method and system
CN107479966B (en) Signaling acquisition method based on multi-core CPU
CN1687899A (en) Method, system and module for dynamic downloading of application program to user identification
CN106407233A (en) Data processing method and apparatus
CN1737764A (en) Task scheduling method for embedded real-time operation system supporting OSEK standard
CN108108479A (en) Database connection detection method, system, device and computer medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160907
