CN107704594A - Real-time processing method for log data of power system based on Spark Streaming

Publication number
CN107704594A
Authority
CN
China
Prior art keywords
time
block interval
block
data
batch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710951969.0A
Other languages
Chinese (zh)
Other versions
CN107704594B (en)
Inventor
宋爱波
涂金林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN201710951969.0A
Publication of CN107704594A
Application granted
Publication of CN107704594B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/244 Grouping and aggregation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24532 Query optimisation of parallel queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries


Abstract

The invention discloses a real-time processing method for power system log data based on Spark Streaming. First, because network-wide log data streams are growing sharply and the categories and associated attributes of the log data obtained by the processing system vary widely, statistical models are predefined to reduce the preprocessing time of the processing system. Then, by analyzing the relationship between block interval and processing time, it is found that dynamic adjustment of the block interval can optimize the processing time of query tasks. Finally, an efficient dynamic adjustment strategy is designed on this basis to seek the optimal block interval in time and reduce the processing time of query tasks, so that the running state and trajectory of the power dispatching automation system can be analyzed and the analysis of power system health can be converted from qualitative to quantitative. The invention thus provides an efficient, easy-to-use real-time processing method for the effective management of power system log data.

Description

Real-time processing method for power system log data based on Spark Streaming
Technical field
The present invention relates to a real-time processing method for power system log data, and more particularly to a real-time processing method for power system log data based on Spark Streaming.
Background
Electric power is a basic industry underpinning the operation and development of modern society, and the safety and stability of the power system affect every aspect of social life. As a data processing system, the power dispatching automation system comprises power system operation information, analysis and decision tools, and control means. During operation it produces status, debugging, error and other data, collectively called log data. As one form of power system operation information, log data must be analyzed quickly and accurately, which plays an important role in guaranteeing the safe and stable operation of the power system.
With the continuous expansion of dispatching automation systems, the volume of log data that the power system must process in real time has increased sharply. Network-wide real-time log data is large in volume and grows rapidly; the demands for computing, analyzing, simulating and optimizing it far exceed the capacity of ordinary computing systems, and traditional log management means cannot meet the management and analysis needs of massive log data. Earlier stream processing systems cope either by discarding part of the input data stream (for example, classification-based load shedding) and processing only data with distinctive features, or by elastically adding extra resources. In general, however, discarding data is not a good choice: the discarded data may well be important, which affects the correctness of the results. And for high-throughput real-time data streams, acquiring the relevant resources in advance is very costly.
To determine the trends and patterns of system operation and to locate faults, the running state and trajectory of the power dispatching automation system must be analyzed online in real time. Because of limited disk performance, log data that is not processed in time may be lost, so the high throughput of memory is needed. Meanwhile, as system resources and state keep changing, the processing system must adjust in time so that its processing time remains optimal.
In view of these problems, researchers have begun to focus on using memory resources to break through the I/O bottleneck, improve data throughput and accelerate data processing. Apache Spark is an open-source computing framework that stands out in this regard. Spark's in-memory iterative computing framework can operate on a given data set many times in memory, enabling fast analysis and processing of big data. Spark Streaming, a higher-level tool built on Spark, provides interval-based real-time processing. The time over which the data stream is divided into a data block is called the block interval, and the time over which several data blocks are combined into a batch is called the batch interval. This approach meets well the power dispatching automation system's requirement for real-time processing of data within a given time period.
Generally, the lower the parallelism with which Spark Streaming processes data (the number of data blocks in a batch = batch interval / block interval), the smaller the resource overhead and the resource utilization, for example for task creation and interaction. Large-scale parallel computation incurs substantial resource overhead, accompanied by high resource utilization. To understand the running state and trajectory of the power dispatching automation system in time and to convert the analysis of power system health from qualitative to quantitative, query tasks must achieve lower resource overhead and higher resource utilization. To balance resource overhead against utilization, the processing parallelism needs to be adjusted in time as the system state and resources change.
In recent years, the demand for processing real-time data streams has driven the development of distributed real-time computing frameworks. For example, the document "High-Throughput Robust Architecture for Log Analysis and Data Stream Mining" uses Apache Storm as the real-time computing framework, receiving real-time data and then analyzing it. Spark Streaming is a higher-level extension of Spark and differs from Storm: it does not process the data stream record by record, but divides the stream in advance by time interval into batch jobs over multiple time periods. Storm is an event-level real-time computing framework, whereas the power dispatching automation system mostly requires stateful batch computation and analysis of the data stream within a given time period. Moreover, Storm guarantees that each record is processed at least once; when a node recovers from a failure, records may be recomputed, which does not satisfy the safety and reliability requirements of the power dispatching automation system.
Dynamically adjusting the batch interval or the data block size can indeed keep the system running stably without prior knowledge of the stream state and running environment. However, these approaches focus mainly on data read/write throughput and resource utilization. For complex computations, such dynamic adjustment fails to select a better batch interval or data block size, so processing time grows longer and longer, and the dispatching automation system's need for fast processing is neglected entirely.
Summary of the invention
Object of the invention: In view of the above problems, the present invention proposes a real-time processing method for power system log data based on Spark Streaming.
Technical solution: To achieve the object of the invention, the technical solution adopted is a real-time processing method for power system log data based on Spark Streaming, comprising the following steps:
(1) defining statistical models for different log categories;
(2) building a relational model between the Spark Streaming block interval and the data stream processing time;
(3) dynamically adjusting the block interval to seek the optimal block interval.
Further, in step (1), the statistical model comprises the elements: data set, result set, grouping condition, group filter and rule action.
Further, in step (2), the time over which the data stream is divided into a data block is the block interval; the time over which several data blocks are combined into a batch is the batch interval.
The relational model is built as follows:
(1) the batching module divides the received data stream into independent data blocks according to the block interval;
(2) the data blocks within one batch interval are packed into a batch, which enters the batch queue to wait for processing;
(3) the data of all block intervals within one batch interval are processed in parallel.
Further, in step (3), given the batch interval, a greedy algorithm is used to dynamically adjust the block interval and seek the optimal block interval.
The steps of the greedy algorithm are:
(1) the initial block interval is β and the adjustment step is i;
(2) if the batch processing time at block interval β is less than that at block interval β+i, the optimal block interval lies to the left of the initial block interval; if the batch processing time at block interval β is less than that at block interval β-i, the optimal block interval lies to the right of the initial block interval;
(3) once the direction of the optimal block interval has been determined, exploration continues in a loop until the processing time can no longer be reduced.
Beneficial effects: The method takes the characteristics of power system log data into account. As system resources and state keep changing, the processing system can adjust quickly and dynamically according to changes in the data stream, without redefining statistical functions and models, thereby achieving higher resource utilization and shorter processing time.
Brief description of the drawings
Fig. 1 is a schematic diagram of the block interval;
Fig. 2 shows curves of the influence of the block interval on processing time.
Detailed description
The technical solution of the invention is further described below with reference to the accompanying drawings and embodiments.
Addressing the shortcomings of existing real-time computing frameworks in processing log data streams, and considering the relationship between the block interval and the data stream processing time, the invention proposes a real-time processing method for power system log data based on Spark Streaming. It aims to ensure that the Spark Streaming block interval can be adjusted dynamically as system resources and state keep changing, accelerating the processing of real-time data streams, so that the running state and trajectory of the power dispatching automation system can be analyzed and the analysis of power system health can be converted from qualitative to quantitative.
The invention first addresses the sharp growth of network-wide log data streams and the wide variation in the categories and associated attributes of the log data obtained by the processing system by defining statistical models for the different log categories in advance, thereby reducing preprocessing time. Then, by analyzing the relationship between the block interval and the processing time, it finds that dynamic adjustment of the block interval can effectively reduce the system's processing time. Finally, based on this analysis, it designs a dynamic adjustment strategy based on a greedy algorithm that seeks the optimal block interval in time, accelerates the processing of log data streams and reduces the processing time of query tasks.
The real-time processing method for power system log data based on Spark Streaming comprises the following steps:
Step 1: Define statistical models for different log categories and, according to the statistical models, perform fast real-time analysis;
When the categories and associated attributes of the log data obtained by the processing system keep changing, statistical models are defined in advance for each field involved in processing and analyzing the different log categories, reducing the preprocessing time of the processing system.
A statistical model describes the set of elements needed during one real-time analysis. Following the format of the SELECT statement in Structured Query Language, a statistical model needs to include the following elements:
(1) Data set: equivalent to the FROM and WHERE clauses. The data set must specify the subscribed log category, the statistical time window, and so on; if log data belonging to a category needs further screening, logical expressions based on layout elements are supported.
(2) Result set: equivalent to the SELECT clause. The result set must specify the result fields finally produced by the current analysis, mainly layout elements and static fields. Static fields support several statistical functions: COUNT, SUM, MAX, MIN, TOP(N), ASSERT.
(3) Grouping condition: equivalent to the GROUP BY clause. The grouping condition may contain only fields defined in the result set.
(4) Group filter: the group filter may contain only static fields in the result set. The operators supported for numeric elements are =, >, >=, <, <=, !=; the operators supported for character elements are EQUAL, CONTAIN, BEGINWITH, ENDWITH.
(5) Rule action: rules are matched against the content of the result set: store, alarm. Store means storing the computed result in an external system; alarm means setting a threshold on the result of a statistical operation and sending alarm information when the result exceeds the threshold.
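The five elements above can be sketched as a small data structure. The following is a minimal illustration in Python, not the patent's implementation: the class names (`StatModel`, `StaticField`, `GroupFilter`), the field names and the example values are all hypothetical, and `TOP(N)` is simplified to the bare name `TOP`. Only the function names, operators and the two containment constraints come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Functions and operators listed in the description.
STAT_FUNCS = {"COUNT", "SUM", "MAX", "MIN", "TOP", "ASSERT"}
NUMERIC_OPS = {"=", ">", ">=", "<", "<=", "!="}
STRING_OPS = {"EQUAL", "CONTAIN", "BEGINWITH", "ENDWITH"}


@dataclass
class StaticField:
    name: str    # name of the result field (hypothetical)
    func: str    # statistical function, one of STAT_FUNCS
    source: str  # layout element the function is applied to


@dataclass
class GroupFilter:
    field_name: str                  # must name a static field of the result set
    op: str                          # numeric or string operator, by value type
    value: Union[int, float, str]


@dataclass
class StatModel:
    log_category: str                # data set: subscribed log category (FROM)
    window_seconds: int              # data set: statistical time window
    where: Optional[str] = None      # data set: optional screening expression (WHERE)
    layout_fields: List[str] = field(default_factory=list)        # result set
    static_fields: List[StaticField] = field(default_factory=list)
    group_by: List[str] = field(default_factory=list)             # grouping condition
    group_filters: List[GroupFilter] = field(default_factory=list)
    actions: List[str] = field(default_factory=list)              # "store" / "alarm"

    def validate(self) -> List[str]:
        """Return a list of constraint violations (empty if the model is valid)."""
        errors = []
        static_names = {f.name for f in self.static_fields}
        result_names = set(self.layout_fields) | static_names
        for f in self.static_fields:
            if f.func not in STAT_FUNCS:
                errors.append(f"unknown function: {f.func}")
        for g in self.group_by:
            if g not in result_names:
                errors.append(f"group-by field not in result set: {g}")
        for flt in self.group_filters:
            if flt.field_name not in static_names:
                errors.append(f"filter on non-static field: {flt.field_name}")
            allowed = STRING_OPS if isinstance(flt.value, str) else NUMERIC_OPS
            if flt.op not in allowed:
                errors.append(f"operator {flt.op} not allowed for {flt.field_name}")
        for a in self.actions:
            if a not in {"store", "alarm"}:
                errors.append(f"unknown action: {a}")
        return errors


# Hypothetical model: count error messages per host over 60 s, alarm above 100.
model = StatModel(
    log_category="error",
    window_seconds=60,
    layout_fields=["host"],
    static_fields=[StaticField("error_count", "COUNT", "message")],
    group_by=["host"],
    group_filters=[GroupFilter("error_count", ">", 100)],
    actions=["alarm"],
)
print(model.validate())  # prints []
```

Defining the model once and validating it before the stream starts is one way to realize the idea that predefined models reduce preprocessing work; how the patent's system stores or evaluates its models is not stated in the text.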
An example of analysis targets and statistical models is shown in Table 1:
Table 1
Step 2: Build a relational model between the Spark Streaming block interval and the data stream processing time;
The relationship between the Spark Streaming block interval and the data stream processing time is analyzed to find the condition on the block interval under which the data stream processing time is minimized.
As shown in Fig. 1, the batching module in the figure is the batching module of Spark Streaming; its role is to divide the received data stream into multiple batches and then process each batch separately. To form a batch, the batching module needs two important parameters: the block interval and the batch interval. The time over which the data stream is divided into a data block is called the block interval, and the time over which several data blocks are combined into a batch is called the batch interval.
Therefore, the batching module first divides the received data stream into independent data blocks according to the block interval (block interval < batch interval); after one batch interval has elapsed, all data blocks in that period are packed into a batch; finally, the batch enters the batch queue to wait for processing.
It can be seen that the execution parallelism of a batch is determined by batch interval / block interval, which is the number of data blocks in a batch. Under equal resource allocation, the lower the processing parallelism, the smaller the resource overhead and utilization, for example for task creation and interaction; large-scale parallel computation incurs substantial resource overhead, accompanied by high resource utilization. To balance resource overhead against utilization, the processing parallelism needs to be adjusted in time as the system state and resources change. Understanding the running state and trajectory of the power dispatching automation system and converting the analysis of power system health from qualitative to quantitative means that the batch interval needs to stay relatively constant. Therefore, the execution parallelism of the processing system is mainly affected by the block interval.
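The batching behaviour described above can be illustrated with a small simulation. This is a sketch in plain Python rather than Spark code; the function names and the sample records are hypothetical, and intervals are expressed in integer milliseconds to avoid floating-point rounding.

```python
from typing import Dict, List, Tuple


def blocks_per_batch(batch_ms: int, block_ms: int) -> int:
    """Parallelism of a batch: batch interval / block interval (rounded up)."""
    if not 0 < block_ms < batch_ms:
        raise ValueError("require 0 < block interval < batch interval")
    return -(-batch_ms // block_ms)  # ceiling division


def split_into_batches(
    records: List[Tuple[int, str]], batch_ms: int, block_ms: int
) -> List[List[List[str]]]:
    """Group (arrival_ms, payload) records into batches, each a list of blocks."""
    n_blocks = blocks_per_batch(batch_ms, block_ms)
    batches: Dict[int, List[List[str]]] = {}
    for arrival_ms, payload in records:
        batch_id = arrival_ms // batch_ms                 # which batch interval
        block_id = (arrival_ms % batch_ms) // block_ms    # which block inside it
        blocks = batches.setdefault(batch_id, [[] for _ in range(n_blocks)])
        blocks[block_id].append(payload)
    # Batches leave in arrival order, as they would from the batch queue.
    return [batches[k] for k in sorted(batches)]


# Hypothetical stream: five log lines, 1 s batch interval, 200 ms block interval.
stream = [(0, "a"), (150, "b"), (250, "c"), (999, "d"), (1000, "e")]
batches = split_into_batches(stream, batch_ms=1000, block_ms=200)
print(blocks_per_batch(1000, 200))  # prints 5
print(batches[0])                   # prints [['a', 'b'], ['c'], [], [], ['d']]
```

With the batch interval held constant, halving the block interval doubles the number of blocks per batch, which is why the text treats the block interval as the main control on parallelism.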
From the above analysis, the block interval determines the execution parallelism of the processing system and hence affects its processing performance. Fig. 2 shows the influence of the block interval on processing time at data receiving rates of 2 MB/s and 4 MB/s, with the batch interval of the Reduce workflow fixed at 3 seconds and that of the Join workflow fixed at 1 second. For each data receiving rate, the resulting curve approximates a parabola, so the optimal block interval that minimizes the processing time is the vertex of the parabola. In practice, because of changes in the running environment and interference from noise, the relationship between block interval and processing time is not a true parabola. Nevertheless, the optimal block interval undoubtedly changes with the data receiving rate: the faster the receiving rate, the more data in a block interval; the slower the receiving rate, the less data in a block interval; and the amount of data directly affects the processing time of the processing system.
Based on the above observations, for a given batch interval, the processing time of query tasks can be optimized by adjusting the size of the block interval.
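To make the parabola observation concrete, the vertex of a parabola through three measured points can be computed in closed form. The sketch below is an illustration only: the function name and the sample measurements are hypothetical and are not taken from Fig. 2, and the patent itself does not fit a curve but searches greedily, precisely because real measurements are noisy.

```python
from typing import Tuple

Point = Tuple[float, float]  # (block interval, processing time)


def parabola_vertex(p1: Point, p2: Point, p3: Point) -> float:
    """Block interval at the vertex of y = a*x^2 + b*x + c through three points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    denom = (x1 - x2) * (x1 - x3) * (x2 - x3)
    if denom == 0:
        raise ValueError("the three block intervals must be distinct")
    a = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / denom
    b = (x3**2 * (y1 - y2) + x2**2 * (y3 - y1) + x1**2 * (y2 - y3)) / denom
    if a <= 0:
        raise ValueError("points do not describe an upward-opening parabola")
    return -b / (2 * a)


# Made-up samples lying on y = (x - 2)^2 + 1, so the minimum is at x = 2.
print(parabola_vertex((1, 2), (2, 1), (4, 5)))  # prints 2.0
```

A three-point fit like this is sensitive to noise in any single measurement, which is consistent with the text's remark that the real relationship is only approximately parabolic.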
Step 3: When analyzing the log data stream in real time, dynamically adjust the Spark Streaming block interval according to the relational model of step 2 to reduce the processing time of query tasks.
According to the condition on the block interval under which the data stream processing time is minimized, a greedy method is used to seek the optimal block interval in time; the block interval is adjusted dynamically as the resources and state of the processing system keep changing, reducing the processing time of query tasks.
The optimization goal of the invention is to ensure that each time the processing system finishes processing a batch, the block interval used to receive the next batch of data is determined. As Fig. 2 shows, if the chosen initial block interval is too small or too large, exploring for the optimal block interval takes a long time. A compromise is to select half of the batch interval as the initial block interval, which avoids frequent exploration; the block interval is then gradually increased or decreased until the processing time can no longer be reduced.
Table 2 gives the algorithm for computing the next block interval. The initial block interval is β and the adjustment step is i; during the computation, β then denotes the next block interval. P1 and P2 denote the processing times of the previous two batches.
The dynamic adjustment strategy based on the greedy algorithm is shown in Table 2:
Table 2
The computation mainly comprises two parts. If the batch processing time at block interval β is less than that at block interval β+i, the optimal block interval lies to the left of the initial block interval; if the batch processing time at block interval β is less than that at block interval β-i, the optimal block interval lies to the right of the initial block interval. Once the direction of the optimal block interval has been determined, exploration continues in a loop until the processing time can no longer be reduced.
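The two-part procedure described above can be sketched as follows. This is not a reproduction of the patent's Table 2, which is not available in this text; it is a minimal Python illustration under stated assumptions. The function name, the `measure` callback (standing in for "run one batch at this block interval and report its processing time"), the default start at half the batch interval, and the toy cost function are all hypothetical.

```python
from typing import Callable, Optional, Tuple


def greedy_block_interval(
    measure: Callable[[int], float],
    batch_ms: int,
    step_ms: int,
    start_ms: Optional[int] = None,
) -> Tuple[int, float]:
    """Greedy search for the block interval (ms) with the lowest processing time."""
    if step_ms <= 0 or batch_ms <= step_ms:
        raise ValueError("require 0 < step < batch interval")
    beta = batch_ms // 2 if start_ms is None else start_ms  # assumed default start
    best = measure(beta)
    for direction in (1, -1):            # probe beta + i first, then beta - i
        moved = False
        while True:
            candidate = beta + direction * step_ms
            if not 0 < candidate < batch_ms:   # keep block interval < batch interval
                break
            t = measure(candidate)
            if t >= best:                # no further reduction in this direction
                break
            beta, best, moved = candidate, t, True
        if moved:                        # direction found and exhausted: stop
            break
    return beta, best


def toy_cost(block_ms: int) -> float:
    """Toy stand-in for a measured batch processing time, minimal at 200 ms."""
    return (block_ms - 200) ** 2 + 50


print(greedy_block_interval(toy_cost, batch_ms=1000, step_ms=100))  # prints (200, 50)
```

In a live system each call to `measure` would cost one batch interval of wall-clock time, so the number of probes matters; that is the motivation the text gives for choosing the starting point carefully.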
If the data receiving rate and the system running environment remain unchanged, the optimal block interval remains stable. When the running environment changes, however, the optimal block interval changes too, and the algorithm must adjust in time to adapt to the new environment. Restarting from scratch would lengthen the convergence time, so the invention uses the block interval in effect before the change as the initial block interval and restarts the greedy adjustment from it.
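The benefit of restarting from the previous block interval can be illustrated by counting how many batches a search needs. The sketch below is a hypothetical Python illustration: the `search` helper is a compact greedy search that also counts measurements, and the two toy cost functions (optimum moving from 200 ms to 300 ms) are invented for the example. How the patent detects that the environment has changed is not stated in the text and is not modelled here.

```python
from typing import Callable, Tuple


def search(
    measure: Callable[[int], float], start_ms: int, step_ms: int, batch_ms: int
) -> Tuple[int, int]:
    """Greedy search from start_ms; returns (block interval, batches measured)."""
    beta, best, probes = start_ms, measure(start_ms), 1
    for direction in (1, -1):
        moved = False
        while 0 < beta + direction * step_ms < batch_ms:
            candidate = beta + direction * step_ms
            t = measure(candidate)
            probes += 1
            if t >= best:
                break
            beta, best, moved = candidate, t, True
        if moved:
            break
    return beta, probes


def before_change(block_ms: int) -> float:
    return (block_ms - 200) ** 2   # toy cost: optimum at 200 ms


def after_change(block_ms: int) -> float:
    return (block_ms - 300) ** 2   # toy cost after the environment shifts


BATCH_MS, STEP_MS = 1000, 100
old_opt, _ = search(before_change, BATCH_MS // 2, STEP_MS, BATCH_MS)
warm_opt, warm_probes = search(after_change, old_opt, STEP_MS, BATCH_MS)
cold_opt, cold_probes = search(after_change, BATCH_MS // 2, STEP_MS, BATCH_MS)
print(warm_opt, warm_probes)  # prints 300 3
print(cold_opt, cold_probes)  # prints 300 5
```

In this toy case both searches reach the same block interval, but the warm restart needs fewer measured batches because the new optimum is close to the old one; if the environment changed drastically, that advantage could shrink.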

Claims (6)

  1. A real-time processing method for power system log data based on Spark Streaming, characterized by comprising the following steps:
    (1) defining statistical models for different log categories;
    (2) building a relational model between the Spark Streaming block interval and the data stream processing time;
    (3) dynamically adjusting the block interval to seek the optimal block interval.
  2. The real-time processing method for power system log data based on Spark Streaming according to claim 1, characterized in that: in step (1), the statistical model comprises the elements: data set, result set, grouping condition, group filter and rule action.
  3. The real-time processing method for power system log data based on Spark Streaming according to claim 2, characterized in that: in step (2), the time over which the data stream is divided into a data block is the block interval; the time over which several data blocks are combined into a batch is the batch interval.
  4. The real-time processing method for power system log data based on Spark Streaming according to claim 3, characterized in that the relational model in step (2) is built as follows:
    (1) the batching module divides the received data stream into independent data blocks according to the block interval;
    (2) the data blocks within one batch interval are packed into a batch, which enters the batch queue to wait for processing;
    (3) the data of all block intervals within one batch interval are processed in parallel.
  5. The real-time processing method for power system log data based on Spark Streaming according to claim 4, characterized in that: in step (3), given the batch interval, a greedy algorithm is used to dynamically adjust the block interval and seek the optimal block interval.
  6. The real-time processing method for power system log data based on Spark Streaming according to claim 5, characterized in that the steps of the greedy algorithm are:
    (1) the initial block interval is β and the adjustment step is i;
    (2) if the batch processing time at block interval β is less than that at block interval β+i, the optimal block interval lies to the left of the initial block interval; if the batch processing time at block interval β is less than that at block interval β-i, the optimal block interval lies to the right of the initial block interval;
    (3) once the direction of the optimal block interval has been determined, exploration continues in a loop until the processing time can no longer be reduced.
CN201710951969.0A 2017-10-13 2017-10-13 Real-time processing method for log data of power system based on spark streaming Active CN107704594B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710951969.0A CN107704594B (en) 2017-10-13 2017-10-13 Real-time processing method for log data of power system based on spark streaming

Publications (2)

Publication Number Publication Date
CN107704594A true CN107704594A (en) 2018-02-16
CN107704594B CN107704594B (en) 2021-02-09

Family

ID=61183445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710951969.0A Active CN107704594B (en) 2017-10-13 2017-10-13 Real-time processing method for log data of power system based on spark streaming

Country Status (1)

Country Link
CN (1) CN107704594B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102281816A (en) * 2008-11-20 2011-12-14 人体媒介公司 Method and apparatus for determining critical care parameters
US20140033161A1 (en) * 2012-07-30 2014-01-30 Synopsys, Inc. Accurate approximation of the objective function for solving the gate-sizing problem using a numerical solver
CN104616205A (en) * 2014-11-24 2015-05-13 北京科东电力控制系统有限责任公司 Distributed log analysis based operation state monitoring method of power system
CN105005585A (en) * 2015-06-24 2015-10-28 上海卓悠网络科技有限公司 Log data processing method and device
CN105677489A (en) * 2016-03-04 2016-06-15 山东大学 System and method for dynamically setting batch intervals under disperse flow processing model
CN105868019A (en) * 2016-02-01 2016-08-17 中国科学院大学 Automatic optimization method for performance of Spark platform
CN106168909A (en) * 2016-06-30 2016-11-30 北京奇虎科技有限公司 A kind for the treatment of method and apparatus of daily record
CN106227832A (en) * 2016-07-26 2016-12-14 浪潮软件股份有限公司 The Internet big data technique framework application process in operational analysis in enterprise
US20170046412A1 (en) * 2014-04-01 2017-02-16 Huawei Technologies Co., Ltd. Method for Querying and Updating Entries in a Database
US20170063888A1 (en) * 2015-08-31 2017-03-02 Splunk Inc. Malware communications detection
CN106547854A (en) * 2016-10-20 2017-03-29 天津大学 Distributed file system storage optimization power-economizing method based on greedy glowworm swarm algorithm
CN106599182A (en) * 2016-12-13 2017-04-26 飞狐信息技术(天津)有限公司 Feature engineering recommendation method and device based on spark streaming real-time streams and video website
CN106778033A (en) * 2017-01-10 2017-05-31 南京邮电大学 A kind of Spark Streaming abnormal temperature data alarm methods based on Spark platforms
CN106936812A (en) * 2017-01-10 2017-07-07 南京邮电大学 File privacy leakage detection method based on Petri network under a kind of cloud environment

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
JAVASTART: "Spark Streaming场景应用- Spark Streaming计算模型及监控", 《HTTPS://BLOG.CSDN.NET/JAVASTART/ARTICLE/DETAILS/77510886》 *
W397090770: "Spark Streaming性能调优详解Spark", 《HTTPS://WWW.ITEBLOG.COM/ARCHIVES/1333.HTML》 *
WEIQING687: "Faster Stateful Stream Processing in Apache Spark Streaming", 《HTTPS://BLOG.CSDN.NET/QQ_26222859/ARTICLE/DETAILS/54836445》 *
张彬: "基于Spark大数据平台日志审计系统的设计与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
村里的INTERN: "基于ELK Stack和Spark Streaming的日志处理平台设计与实现", 《HTTPS://BLOG.CSDN.NET/BIGSTAR863/ARTICLE/DETAILS/49099531》 *
涂金林: "基于Spark的电力系统日志数据的分析处理", 《中国优秀硕士学位论文全文数据库 工程科技Ⅱ辑》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109831316A (en) * 2018-12-17 2019-05-31 国网浙江省电力有限公司 Massive logs real-time analyzer, real-time analysis method and readable storage medium storing program for executing
CN112632020A (en) * 2020-12-25 2021-04-09 中国电子科技集团公司第三十研究所 Log information type extraction method and mining method based on spark big data platform
CN112632020B (en) * 2020-12-25 2022-03-18 中国电子科技集团公司第三十研究所 Log information type extraction method and mining method based on spark big data platform

Also Published As

Publication number Publication date
CN107704594B (en) 2021-02-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant