CN107704594A - Real-time processing method for power system log data based on Spark Streaming - Google Patents
- Publication number: CN107704594A (application number CN201710951969.0A)
- Authority: CN (China)
- Prior art keywords: time, block interval, block, data, batch
- Legal status: Granted (as listed by Google Patents, which notes that the status is an assumption and not a legal conclusion)
Classifications
- G06F16/2433: Query languages (under G06F16/242, Query formulation)
- G06F16/244: Grouping and aggregation
- G06F16/2453: Query optimisation (under G06F16/245, Query processing)
- G06F16/24532: Query optimisation of parallel queries
- G06F16/24568: Data stream processing; Continuous queries (under G06F16/2455, Query execution)
All of the above fall under G (Physics) > G06 (Computing; calculating or counting) > G06F (Electric digital data processing) > G06F16/00 (Information retrieval; database structures therefor; file system structures therefor) > G06F16/20 (of structured data, e.g. relational data) > G06F16/24 (Querying).
Abstract
The invention discloses a real-time processing method for power system log data based on Spark Streaming. First, to cope with the sharp growth of grid-wide log data streams and with the many categories and frequently changing attributes of the log data the processing system receives, statistical models are predefined, which reduces the processing system's preprocessing time. Next, an analysis of the relationship between the block interval and the processing time shows that dynamically adjusting the block interval can bring the processing time of query tasks to its optimum. Finally, an efficient dynamic adjustment strategy is designed on this basis that promptly finds the optimal block interval and reduces the processing time of query tasks. The running state and trajectory of the power dispatching automation system can thereby be analyzed, moving the assessment of power system health from qualitative to quantitative. The invention thus provides an efficient, easy-to-use real-time processing method for the effective management of power system log data.
Description
Technical field
The present invention relates to a real-time processing method for power system log data, and more particularly to a real-time processing method for power system log data based on Spark Streaming.
Background art
Electric power is a basic industry underpinning the operation and development of modern society, and the safety and stability of the power system affect every aspect of human life. The power dispatching automation system is a data processing system that contains power system operating information, analysis and decision-making tools, and control equipment. While running, it produces status, debugging, error, and other data, collectively called log data. Because log data is one form in which power system operating information is expressed, analyzing it quickly and accurately plays an important role in keeping the power system running safely and stably.
As dispatching automation systems keep growing in scale, the volume of log data that the power system must process in real time has increased sharply. Grid-wide real-time log data is large and grows rapidly; the computation, analysis, simulation, and optimization it demands far exceed what ordinary computing systems can bear, and traditional log management can no longer meet the needs of managing and analyzing massive log data. Earlier stream processing systems either discarded part of the input stream (for example, by tiered load shedding) and processed only data with distinctive features, or provisioned extra resources elastically. In most cases, however, discarding data is a poor choice: the discarded data may well be important, which compromises the correctness of the results. For high-throughput real-time streams, acquiring the required resources in advance is also very costly.
To determine operating trends and patterns and to locate faults, the running state and trajectory of the power dispatching automation system must be analyzed online in real time. Because of disk performance limits, log data that is not processed in time can be lost, so the high throughput of memory must be exploited. At the same time, as system resources and state keep changing, the processing system must adjust promptly to keep its processing time optimal.
To address these problems, researchers have turned to memory resources to break through the I/O bottleneck, raise data throughput, and speed up processing. Apache Spark is an open-source computing framework that stands out among these efforts. Its memory-based iterative computing model can operate on a given dataset in memory many times, enabling fast analysis of big data. Spark Streaming, a higher-level tool built on Spark, provides interval-based real-time processing. The time over which the data stream is divided into data blocks is called the block interval, and the time over which data blocks are combined into one batch is called the batch interval. This approach fits the dispatching automation system's requirement to process the data of each time period in real time.
In general, the lower the degree of parallelism with which Spark Streaming processes data (the number of data blocks in a batch, i.e. batch interval / block interval), the lower both the resource overhead (for example, task creation and communication) and the resource utilization. Large-scale parallel computing, by contrast, incurs heavy resource overhead together with high resource utilization. To understand the running state and trajectory of the dispatching automation system in time and move the assessment of power system health from qualitative to quantitative, query tasks must achieve low resource overhead and high resource utilization. Balancing overhead against utilization means that the degree of parallelism must be adjusted promptly as system state and resources change.
In recent years, the demand for processing real-time streams has driven the development of distributed real-time computing frameworks. For example, the paper "High-Throughput Robust Architecture for Log Analysis and Data Stream Mining" uses Apache Storm as its real-time computing framework to receive and analyze real-time data. Spark Streaming, a higher-level extension of Spark, differs from Storm: it does not process the stream record by record, but divides it in advance, by time interval, into batch jobs covering successive periods. Storm is an event-level real-time computing framework, whereas the dispatching automation system mostly performs stateful batch computation and analysis on the data of each period. In addition, Storm guarantees only that each record is processed at least once: when a node recovers from a failure, records may be recomputed, which does not meet the dispatching automation system's safety and reliability requirements.
Dynamically adjusting the batch interval or the data block size can indeed keep the system running stably when the stream pattern and the runtime environment are not known in advance. However, these approaches focus mainly on read/write throughput and resource utilization. For complex computations, such dynamic adjustment also fails to select a better batch interval or block size, so processing times grow longer and longer and the dispatching automation system's need for fast processing is ignored entirely.
Summary of the invention
Objective: To address the above problems, the present invention proposes a real-time processing method for power system log data based on Spark Streaming.
Technical solution: To achieve this objective, the present invention adopts the following technical solution: a real-time processing method for power system log data based on Spark Streaming, comprising the following steps:
(1) defining statistical models for the different log categories;
(2) building a model of the relationship between the Spark Streaming block interval and the data stream processing time;
(3) dynamically adjusting the block interval to find the optimal block interval.
Further, in step (1), a statistical model comprises the following elements: data set, result set, grouping condition, group filter, and rule action.
Further, in step (2), the time over which the data stream is divided into data blocks is the block interval, and the time over which data blocks are combined into a batch is the batch interval.
The relationship model is built as follows:
(1) the batching module divides the received data stream into independent data blocks according to the block interval;
(2) the data blocks received within one batch interval are combined into a batch, which enters the batch queue and waits to be processed;
(3) the data blocks of all block intervals within one batch interval are processed in parallel.
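The three construction steps can be mimicked in plain Python as a minimal sketch: a batch is a list of blocks, the batch queue is a `collections.deque`, and a thread pool stands in for Spark running one task per block. The per-block job `process_block` (counting records by a `level` field) and the record layout are hypothetical.

```python
from collections import Counter, deque
from concurrent.futures import ThreadPoolExecutor


def process_block(block):
    """Hypothetical per-block task: count the log records of each severity level."""
    return Counter(record["level"] for record in block)


def process_batch(batch, max_workers=4):
    """Process every block of one batch in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(process_block, batch))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def drain(batch_queue):
    """Take batches from the queue in arrival order and process them one at a time."""
    results = []
    while batch_queue:
        results.append(process_batch(batch_queue.popleft()))
    return results
```

Batches are handled strictly in queue order while the blocks inside each batch run concurrently, which mirrors the distinction between step (2) and step (3).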
Further, in step (3), with the batch interval given, a greedy algorithm dynamically adjusts the block interval to find the optimal block interval.
The greedy algorithm comprises the following steps:
(1) the initial block interval is β and the adjustment step is i;
(2) if the batch processing time with block interval β is less than that with block interval β+i, the optimal block interval lies to the left of (below) the initial block interval; if the batch processing time with block interval β is less than that with block interval β−i, the optimal block interval lies to the right of (above) the initial block interval;
(3) once the direction toward the optimal block interval is found, the search continues in that direction until the processing time can no longer be reduced.
Beneficial effects: The method takes the characteristics of power system log data into account. As system resources and state keep changing, the processing system can adjust quickly and dynamically to changes in the data stream without redefining statistical functions and models, thereby achieving higher resource utilization and shorter processing time.
Brief description of the drawings
Fig. 1 is a schematic diagram of the block interval;
Fig. 2 shows curves of the effect of the block interval on processing time.
Embodiment
The technical solution of the present invention is further described below with reference to the drawings and embodiments.
To address the shortcomings of existing real-time computing frameworks in processing log data streams, the present invention considers the relationship between the block interval and the data stream processing time and proposes a real-time processing method for power system log data based on Spark Streaming. The aim is to let the Spark Streaming block interval adjust dynamically as system resources and state change, speeding up the processing of real-time streams. The running state and trajectory of the power dispatching automation system can thereby be analyzed, moving the assessment of power system health from qualitative to quantitative.
First, to cope with the sharp growth of grid-wide log data streams and with the many categories and frequently changing attributes of the log data the processing system receives, the invention defines statistical models for the different log categories in advance, reducing the processing system's preprocessing time. Next, an analysis of the relationship between the system's block interval and processing time shows that dynamically adjusting the block interval can effectively reduce the system's processing time. Finally, based on this analysis, a dynamic adjustment strategy built on a greedy algorithm is designed that promptly finds the optimal block interval, speeds up processing of the log data stream, and reduces the processing time of query tasks.
The real-time processing method for power system log data based on Spark Streaming comprises the following steps:
Step 1: Define statistical models for the different log categories, and use them for fast real-time analysis.
Because the categories and associated attributes of the log data the processing system receives change constantly, statistical models are defined in advance for each field involved in processing and analyzing each log category, which reduces the processing system's preprocessing time.
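One way to picture defining statistical models in advance is a registry that binds each log category to its analysis ahead of time, so an incoming batch is only routed, not re-interpreted. This is a hypothetical sketch: the categories (`alarm`, `login`), the record fields, and the `register`/`dispatch` helpers are illustrative names, not part of the patent.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical registry: each log category is bound in advance to its analysis
# function, so incoming records are dispatched without per-batch preprocessing.
MODELS: Dict[str, Callable[[List[dict]], dict]] = {}


def register(category: str):
    """Decorator that binds an analysis function to a log category ahead of time."""
    def wrap(fn):
        MODELS[category] = fn
        return fn
    return wrap


@register("alarm")
def alarms_per_station(records: List[dict]) -> dict:
    """Count alarm records per station."""
    counts: Dict[str, int] = defaultdict(int)
    for r in records:
        counts[r["station"]] += 1
    return dict(counts)


@register("login")
def failed_logins(records: List[dict]) -> dict:
    """Count failed login attempts."""
    return {"failed": sum(1 for r in records if not r["ok"])}


def dispatch(records: List[dict]) -> dict:
    """Route each record to the model predefined for its category; unknown categories are skipped."""
    by_category: Dict[str, List[dict]] = defaultdict(list)
    for r in records:
        by_category[r["category"]].append(r)
    return {cat: MODELS[cat](rows) for cat, rows in by_category.items() if cat in MODELS}
```

Adding a new log category then means registering one more function, not changing the dispatch path.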
A statistical model describes the elements that must be set for one real-time analysis. Following the format of the SELECT statement in the Structured Query Language (SQL), a statistical model contains the following elements:
(1) Data set: equivalent to the FROM and WHERE clauses. The data set specifies the subscribed log category, the statistical time window, and so on. If the log data of a category needs further screening, logical expressions over layout elements are supported.
(2) Result set: equivalent to the SELECT clause. The result set specifies the fields that the current analysis ultimately produces, mainly layout elements and static fields. Static fields support several statistical functions: COUNT, SUM, MAX, MIN, TOP(N), and ASSERT.
(3) Grouping condition: equivalent to the GROUP BY clause. The grouping condition may only contain fields defined in the result set.
(4) Group filter: the group filter may only contain static fields of the result set. Numeric elements support the operators =, >, >=, <, <=, and !=; character elements support EQUAL, CONTAIN, BEGINWITH, and ENDWITH.
(5) Rule action: the action taken when the content of the result set matches a rule, either storage or alarm. Storage writes the computed results to an external system; alarm sets a threshold on the result of a statistical operation and sends an alarm message when the result exceeds the threshold.
Example analysis targets and their statistical models are shown in Table 1:
Table 1
Step 2: Build a model of the relationship between the Spark Streaming block interval and the data stream processing time.
The relationship between the Spark Streaming block interval and the data stream processing time is analyzed to find the block interval at which the processing time reaches its minimum.
As shown in Fig. 1, the batching module in the figure is the Spark Streaming batching module. It divides the received data stream into batches and then processes each batch separately. Forming a batch involves two important parameters: the block interval and the batch interval. The time over which the data stream is divided into data blocks is called the block interval, and the time over which data blocks are combined into a batch is called the batch interval.
The batching module therefore first divides the received data stream into independent data blocks according to the block interval (block interval < batch interval). When one batch interval has elapsed, all data blocks received in that period are combined into a batch, which then enters the batch queue and waits to be processed.
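A simplified, single-receiver model of this batching step assigns each record to a block and a batch by its arrival time. It is a sketch under stated assumptions (arrival times in milliseconds, events sorted by time, empty blocks dropped), not Spark's internal implementation.

```python
from typing import Iterable, List, Tuple


def split_into_batches(events: Iterable[Tuple[int, str]],
                       block_ms: int, batch_ms: int) -> List[List[List[str]]]:
    """Group (arrival_ms, record) pairs into blocks of block_ms and batches of batch_ms.

    Simplified single-receiver model of the batching module, not Spark's own code.
    """
    if not 0 < block_ms < batch_ms:
        raise ValueError("require 0 < block interval < batch interval")
    blocks_per_batch = batch_ms // block_ms
    batches: List[List[List[str]]] = []
    for arrival_ms, record in events:
        batch_idx, offset = divmod(arrival_ms, batch_ms)
        # A trailing partial block interval (batch_ms not a multiple of block_ms)
        # is folded into the last block of the batch.
        block_idx = min(offset // block_ms, blocks_per_batch - 1)
        while len(batches) <= batch_idx:
            batches.append([[] for _ in range(blocks_per_batch)])
        batches[batch_idx][block_idx].append(record)
    # Assumption of this sketch: empty blocks are dropped before a batch is queued.
    return [[block for block in batch if block] for batch in batches]
```

With 200 ms blocks and 1000 ms batches, records `a`, `b`, `c`, `d` arriving at 0, 150, 250, and 1100 ms yield `[[["a", "b"], ["c"]], [["d"]]]`: two blocks in the first batch and one in the second.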
It follows that the execution parallelism of a batch is determined by batch interval / block interval, i.e. the number of data blocks in the batch. Under the same resource allocation, the lower the parallelism, the lower both the resource overhead (task creation, communication, and so on) and the resource utilization; large-scale parallel computing incurs heavy resource overhead together with high resource utilization. To balance overhead against utilization, the parallelism must be adjusted promptly as system state and resources change. Understanding the running state and trajectory of the dispatching automation system and moving the assessment of power system health from qualitative to quantitative requires the batch interval to stay relatively constant. The execution parallelism of the processing system is therefore governed mainly by the block interval.
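The parallelism rule is plain arithmetic. In the minimal sketch below, only the 3 s and 1 s batch intervals come from Fig. 2; the block interval values are arbitrary illustration values.

```python
def parallelism(batch_interval_ms: int, block_interval_ms: int) -> int:
    """Blocks per batch (and so parallel tasks per receiver): batch interval / block interval."""
    if not 0 < block_interval_ms < batch_interval_ms:
        raise ValueError("require 0 < block interval < batch interval")
    return batch_interval_ms // block_interval_ms


# The batch interval stays fixed, so only the block interval moves the parallelism.
# 3000 ms and 1000 ms are the batch intervals of the Reduce and Join workloads in Fig. 2.
for batch_ms in (3000, 1000):
    print(batch_ms, [(block_ms, parallelism(batch_ms, block_ms)) for block_ms in (100, 200, 500)])
```

This prints `3000 [(100, 30), (200, 15), (500, 6)]` and `1000 [(100, 10), (200, 5), (500, 2)]`: halving the block interval doubles the number of tasks per batch.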
As analyzed above, the block interval determines the execution parallelism of the processing system and therefore also affects its processing performance. Fig. 2 shows the effect of the block interval on processing time for a Reduce workload with the batch interval fixed at 3 s and a Join workload with the batch interval fixed at 1 s, under data stream receiving rates of 2 MB/s and 4 MB/s. At each receiving rate, the resulting curve is approximately a parabola, and the optimal block interval, which minimizes the processing time, is the vertex of that parabola. In practice, because of changes in the runtime environment, noise, and similar interference, the relationship between block interval and processing time is not a true parabola. There is no doubt, however, that the optimal block interval changes with the data receiving rate: the faster the data arrives, the more data each block interval contains; the slower it arrives, the less data each block interval contains; and the amount of data directly affects the processing time of the system.
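Why a U-shaped curve appears can be illustrated with a deliberately simple, hypothetical cost model that is not taken from the patent: the serial work W of one batch is proportional to the receiving rate and is split over p = batch / block tasks, each adding a fixed overhead c, so T(b) = W·b/batch + c·batch/b, whose minimum lies at b* = batch·sqrt(c/W). The constants (400 ms per MB, 5 ms per task) are made-up illustration values, not measurements.

```python
import math


def toy_processing_time(block_ms: float, batch_ms: float, rate_mb_s: float,
                        cost_ms_per_mb: float = 400.0, overhead_ms_per_task: float = 5.0) -> float:
    """Hypothetical cost model (not from the patent) for one batch.

    The batch's serial work W grows with the receiving rate and is split over
    p = batch / block parallel tasks, each adding a fixed overhead c:
        T(b) = W / p + c * p = W * b / batch + c * batch / b
    """
    work_ms = rate_mb_s * (batch_ms / 1000.0) * cost_ms_per_mb
    p = batch_ms / block_ms
    return work_ms / p + overhead_ms_per_task * p


def toy_optimal_block_ms(batch_ms: float, rate_mb_s: float,
                         cost_ms_per_mb: float = 400.0, overhead_ms_per_task: float = 5.0) -> float:
    """Vertex of the U-shaped curve: dT/db = W/batch - c*batch/b**2 = 0  =>  b* = batch*sqrt(c/W)."""
    work_ms = rate_mb_s * (batch_ms / 1000.0) * cost_ms_per_mb
    return batch_ms * math.sqrt(overhead_ms_per_task / work_ms)


for rate in (2, 4):  # MB/s, the receiving rates used in Fig. 2
    best = toy_optimal_block_ms(3000, rate)
    print(rate, round(best, 1), round(toy_processing_time(best, 3000, rate), 1))
```

In this toy model a faster receiving rate (larger W) moves the optimum toward smaller block intervals. The measured curves in Fig. 2 are noisier and need not follow this exact shape, which is why the method searches for the optimum empirically instead of computing it in closed form.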
Based on these observations, for a given batch interval the processing time of query tasks can be optimized by adjusting the block interval.
Step 3: During real-time analysis of the log data stream, use the relationship model from Step 2 to adjust the Spark Streaming block interval dynamically and reduce the processing time of query tasks.
Based on the condition under which the data stream processing time reaches its minimum, a greedy method promptly finds the optimal block interval, and the interval is adjusted dynamically as the resources and state of the processing system change, reducing the processing time of query tasks.
The optimization goal of the invention is that each time the processing system finishes a batch, the block interval for receiving the next batch is determined. Fig. 2 shows that if the initial block interval is chosen too small or too large, exploring for the optimal block interval takes a long time. The compromise is to choose half of the batch interval as the initial block interval, avoiding frequent exploration, and then gradually increase or decrease the block interval until the processing time can no longer be reduced.
Table 2 gives the algorithm for computing the next block interval. The initial block interval is β and the adjustment step is i; during the computation, β denotes the next block interval. P1 and P2 denote the processing times of the two preceding batches.
The dynamic adjustment strategy based on the greedy algorithm is shown in Table 2:
Table 2
The computation has two main parts. If the batch processing time with block interval β is less than that with block interval β+i, the optimal block interval lies to the left of the initial block interval; if the batch processing time with block interval β is less than that with block interval β−i, the optimal block interval lies to the right of the initial block interval. Once the direction toward the optimal block interval is found, exploration continues in that direction until the processing time can no longer be reduced.
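Table 2 is not reproduced in this text, so the following is only a hypothetical per-batch reconstruction from the prose above. After each batch the tuner receives that batch's processing time (P2, compared with the best time so far, P1), probes β+i and then β−i to find the direction, and keeps stepping by i while the processing time falls. The bounds `lower`/`upper` are added assumptions that keep β between zero and the batch interval.

```python
class GreedyBlockIntervalTuner:
    """Per-batch greedy adjustment of the block interval (hypothetical reconstruction).

    Call next_interval() with the processing time of the batch that just ran at
    self.beta; it returns the block interval to use for the next batch.
    """

    def __init__(self, beta: float, step: float, lower: float, upper: float):
        self.beta = beta          # block interval of the batch being measured
        self.step = step          # adjustment step i
        self.lower, self.upper = lower, upper
        self.best_beta = beta     # interval with the lowest processing time so far
        self.best_time = None
        self.direction = 0        # -1: optimum lies left, +1: optimum lies right
        self.phase = "init"       # init -> probe_right -> probe_left -> move -> done

    def _schedule(self, candidate: float, phase: str) -> float:
        """Use `candidate` for the next batch if it is in range, otherwise stop at the best interval."""
        if self.lower <= candidate <= self.upper:
            self.beta, self.phase = candidate, phase
        else:
            self.beta, self.phase = self.best_beta, "done"
        return self.beta

    def next_interval(self, processing_time: float) -> float:
        if self.phase == "done":
            return self.best_beta
        improved = self.best_time is None or processing_time < self.best_time
        if improved:
            self.best_beta, self.best_time = self.beta, processing_time
        if self.phase == "init":  # first measurement at beta: probe beta + i next
            if self.best_beta + self.step <= self.upper:
                return self._schedule(self.best_beta + self.step, "probe_right")
            return self._schedule(self.best_beta - self.step, "probe_left")
        if self.phase == "probe_right":
            if improved:  # P(beta + i) < P(beta): keep moving right
                self.direction = +1
                return self._schedule(self.best_beta + self.step, "move")
            # P(beta) <= P(beta + i): the optimum is not to the right, so probe beta - i
            return self._schedule(self.best_beta - self.step, "probe_left")
        if self.phase == "probe_left":
            if improved:  # P(beta - i) < P(beta): keep moving left
                self.direction = -1
                return self._schedule(self.best_beta - self.step, "move")
            return self._schedule(self.best_beta, "done")
        # phase == "move": continue while the processing time keeps decreasing
        if improved:
            return self._schedule(self.best_beta + self.direction * self.step, "move")
        return self._schedule(self.best_beta, "done")
```

Driven by a synthetic convex cost such as `(b - 700) ** 2 / 1000 + 5`, with a start of 1500 ms and a 100 ms step, the tuner probes 1600 ms, turns left, and settles on 700 ms after eleven batches.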
If the data receiving rate and the runtime environment stay constant, the optimal block interval remains stable. When the runtime environment changes, however, the optimal block interval changes too, and the algorithm must adjust promptly to the new environment. Because restarting the search from scratch would lengthen convergence, the invention restarts the greedy adjustment using the block interval in effect before the environment changed as the initial block interval.
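The benefit of restarting from the previous optimum can be shown with a synthetic example; the cost curve, the 100 ms step, and the 100 to 2900 ms bounds are assumptions. `greedy_search` is a compact offline version of the greedy rule that also counts how many batches it spends measuring.

```python
from typing import Callable, Tuple


def greedy_search(measure: Callable[[float], float], beta: float, step: float,
                  lower: float, upper: float) -> Tuple[float, int]:
    """Greedy block-interval search; returns (best interval, number of batches spent measuring)."""
    batches = 0

    def timed(b: float) -> float:
        nonlocal batches
        batches += 1
        return measure(b)

    best = timed(beta)
    for direction in (+1, -1):  # probe beta + i first, then beta - i
        b = beta + direction * step
        if not lower <= b <= upper:
            continue
        t = timed(b)
        if t < best:  # direction found: walk until no further improvement
            beta, best = b, t
            while lower <= beta + direction * step <= upper:
                t = timed(beta + direction * step)
                if t >= best:
                    break
                beta, best = beta + direction * step, t
            break
    return beta, batches


def after(b: float) -> float:
    """Hypothetical cost after an environment change: the optimum moves from 700 ms to 900 ms."""
    return (b - 900) ** 2 / 1000 + 5


cold = greedy_search(after, 1500, 100, 100, 2900)  # restart from half of a 3 s batch interval
warm = greedy_search(after, 700, 100, 100, 2900)   # restart from the pre-change optimum
print(cold, warm)  # (900, 9) (900, 4)
```

Here the warm restart reaches the new optimum after measuring 4 batches instead of 9, because it starts next to where the optimum moved.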
Claims (6)
- 1. A real-time processing method for power system log data based on Spark Streaming, characterized by comprising the following steps: (1) defining statistical models for different log categories; (2) building a model of the relationship between the Spark Streaming block interval and the data stream processing time; (3) dynamically adjusting the block interval to find the optimal block interval.
- 2. The real-time processing method for power system log data based on Spark Streaming according to claim 1, characterized in that in step (1), the statistical model comprises the elements: data set, result set, grouping condition, group filter, and rule action.
- 3. The real-time processing method for power system log data based on Spark Streaming according to claim 2, characterized in that in step (2), the time over which the data stream is divided into data blocks is the block interval, and the time over which data blocks are combined into a batch is the batch interval.
- 4. The real-time processing method for power system log data based on Spark Streaming according to claim 3, characterized in that the relationship model in step (2) is built as follows: (1) the batching module divides the received data stream into independent data blocks according to the block interval; (2) the data blocks within one batch interval are combined into a batch, which enters the batch queue and waits to be processed; (3) the data blocks of all block intervals within one batch interval are processed in parallel.
- 5. The real-time processing method for power system log data based on Spark Streaming according to claim 4, characterized in that in step (3), with the batch interval given, a greedy algorithm dynamically adjusts the block interval to find the optimal block interval.
- 6. The real-time processing method for power system log data based on Spark Streaming according to claim 5, characterized in that the greedy algorithm comprises: (1) the initial block interval is β and the adjustment step is i; (2) if the batch processing time with block interval β is less than that with block interval β+i, the optimal block interval lies to the left of the initial block interval; if the batch processing time with block interval β is less than that with block interval β−i, the optimal block interval lies to the right of the initial block interval; (3) once the direction toward the optimal block interval is found, exploration continues in that direction until the processing time can no longer be reduced.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710951969.0A CN107704594B (en) | 2017-10-13 | 2017-10-13 | Real-time processing method for log data of power system based on spark streaming |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107704594A (en) | 2018-02-16 |
CN107704594B (en) | 2021-02-09 |
Family
ID=61183445
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710951969.0A Active CN107704594B (en) | 2017-10-13 | 2017-10-13 | Real-time processing method for log data of power system based on spark streaming |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107704594B (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102281816A (*) | 2008-11-20 | 2011-12-14 | 人体媒介公司 | Method and apparatus for determining critical care parameters |
US20140033161A1 (*) | 2012-07-30 | 2014-01-30 | Synopsys, Inc. | Accurate approximation of the objective function for solving the gate-sizing problem using a numerical solver |
CN104616205A (*) | 2014-11-24 | 2015-05-13 | 北京科东电力控制系统有限责任公司 | Power system operating state monitoring method based on distributed log analysis |
CN105005585A (*) | 2015-06-24 | 2015-10-28 | 上海卓悠网络科技有限公司 | Log data processing method and device |
CN105677489A (*) | 2016-03-04 | 2016-06-15 | 山东大学 | System and method for dynamically setting batch intervals under a discretized stream processing model |
CN105868019A (*) | 2016-02-01 | 2016-08-17 | 中国科学院大学 | Automatic optimization method for performance of Spark platform |
CN106168909A (*) | 2016-06-30 | 2016-11-30 | 北京奇虎科技有限公司 | Log processing method and apparatus |
CN106227832A (*) | 2016-07-26 | 2016-12-14 | 浪潮软件股份有限公司 | Method for applying an Internet big data technology framework to enterprise operational analysis |
US20170046412A1 (*) | 2014-04-01 | 2017-02-16 | Huawei Technologies Co., Ltd. | Method for Querying and Updating Entries in a Database |
US20170063888A1 (*) | 2015-08-31 | 2017-03-02 | Splunk Inc. | Malware communications detection |
CN106547854A (*) | 2016-10-20 | 2017-03-29 | 天津大学 | Energy-saving storage optimization method for distributed file systems based on a greedy glowworm swarm algorithm |
CN106599182A (*) | 2016-12-13 | 2017-04-26 | 飞狐信息技术(天津)有限公司 | Feature engineering recommendation method and device based on Spark Streaming real-time streams, and video website |
CN106778033A (*) | 2017-01-10 | 2017-05-31 | 南京邮电大学 | Spark Streaming abnormal temperature data alarm method based on the Spark platform |
CN106936812A (*) | 2017-01-10 | 2017-07-07 | 南京邮电大学 | File privacy leakage detection method based on Petri nets in a cloud environment |
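Non-Patent Citations (6)
Title |
---|
JAVASTART: "Spark Streaming scenario applications: Spark Streaming computation model and monitoring", HTTPS://BLOG.CSDN.NET/JAVASTART/ARTICLE/DETAILS/77510886 * |
W397090770: "Spark Streaming performance tuning explained", HTTPS://WWW.ITEBLOG.COM/ARCHIVES/1333.HTML * |
WEIQING687: "Faster Stateful Stream Processing in Apache Spark Streaming", HTTPS://BLOG.CSDN.NET/QQ_26222859/ARTICLE/DETAILS/54836445 * |
张彬 (Zhang Bin): "Design and implementation of a log audit system based on the Spark big data platform", China Master's Theses Full-text Database, Information Science and Technology * |
村里的INTERN: "Design and implementation of a log processing platform based on ELK Stack and Spark Streaming", HTTPS://BLOG.CSDN.NET/BIGSTAR863/ARTICLE/DETAILS/49099531 * |
涂金林 (Tu Jinlin): "Analysis and processing of power system log data based on Spark", China Master's Theses Full-text Database, Engineering Science and Technology II * |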
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109831316A (*) | 2018-12-17 | 2019-05-31 | 国网浙江省电力有限公司 | Real-time analysis device and real-time analysis method for massive logs, and readable storage medium |
CN112632020A (*) | 2020-12-25 | 2021-04-09 | 中国电子科技集团公司第三十研究所 | Log information type extraction method and mining method based on Spark big data platform |
CN112632020B (*) | 2020-12-25 | 2022-03-18 | 中国电子科技集团公司第三十研究所 | Log information type extraction method and mining method based on Spark big data platform |
Also Published As
Publication number | Publication date |
---|---|
CN107704594B (en) | 2021-02-09 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |