CN108683560A - Performance benchmark test system and method for a big data stream processing framework
- Publication number
- CN108683560A (application number CN201810461515.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- performance
- test
- application
- parameter
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/04—Processing captured monitoring data, e.g. for logfile generation
- H04L43/045—Processing captured monitoring data, e.g. for logfile generation for graphical visualisation of monitoring data
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0852—Delays
- H04L43/0876—Network utilisation, e.g. volume of load or congestion level
- H04L43/0888—Throughput
- H04L43/0894—Packet rate
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/12—Avoiding congestion; Recovering from congestion
- H04L47/125—Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
- H04L47/27—Evaluation or update of window size, e.g. using information derived from acknowledged [ACK] packets
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Environmental & Geological Engineering (AREA)
- Data Mining & Analysis (AREA)
- Debugging And Monitoring (AREA)
Abstract
The present invention relates to a performance benchmark test system and method for a big data stream processing framework. The system consists of four parts: a streaming workload generator, a streaming scenario and application builder, a performance data collection tool, and a performance data analysis tool. The invention selects applications that match the computational characteristics of the stream processing model, generates workloads that match the data characteristics of that model, and tests the performance of the big data stream processing framework under typical scenarios and applications. At run time it collects performance indicators such as backpressure, throughput, latency, system resources, and per-node data, and finally diagnoses the bottleneck of the stream processing framework by analyzing and aggregating the collected data.
Description
Technical field
The present invention relates to a performance benchmark test system and method for a big data stream processing framework, and more particularly to the performance of the framework when running under typical streaming scenarios and applications. It belongs to the field of software technology.
Background art
With the arrival of the Internet era and the continuous development of technologies such as the mobile Internet, social networks, and e-commerce, data is growing explosively, and big data has become a focus of attention in science and technology, in business, and even in government.
In general, data can be divided into bounded data and unbounded data. Bounded data, also called batch data, is a fixed, bounded data set stored on a persistent medium; its volume does not change during computation. Typically, a batch big data processing framework (hereinafter, batch processing framework) accepts a task submitted by the user, performs logical processing and analysis on the stored data set, and finally outputs the result. For example, a machine learning algorithm can mine a historical data set to build a prediction model. Many mature batch processing frameworks are already in use, such as Hadoop and Spark.
However, with the rise and widespread use of sensing devices and social networks, the demand for real-time analysis of massive, high-speed data keeps growing. Data that is generated continuously and without end is called unbounded data, also known as stream data. A survey of enterprise IT adoption by an overseas consulting firm shows that 70% of enterprises need to process stream data in real time (Liu X, Iftikhar N, Xie X. Survey of real-time processing systems for big data. International Database Engineering & Applications Symposium. ACM, New York, USA, 2014: 356-361). For example, Alibaba uses the Blink framework to update its product search engine in real time and to build an online machine learning platform; Meituan uses the Storm framework to analyze user behavior and provide near-real-time recommendation feedback; Didi Chuxing uses the Samza framework to monitor where orders originate and to draw geographic heat maps for early warning.
Stream data differs in character from batch data, and traditional batch processing frameworks cannot process it well, so streaming big data processing frameworks (hereinafter, stream processing frameworks) emerged. Although stream processing frameworks are still developing, stream processing scenarios are becoming increasingly important, and these frameworks have become a focus of both academia and industry. Current mainstream stream processing frameworks include Storm and Flink.
As cluster environments grow, the probability of system performance problems grows with them: nodes may fail or run short of resources at unpredictable times or on unpredictable data (Sun Dawei, Zhang Guangyan, Zheng Weimin. Big data stream computing: key technologies and system examples [J]. Journal of Software, 2014, (04): 839-862.). In stream processing scenarios, excessive load or unreasonable parameter configuration may cause throughput to drop and latency to rise; when a node's processing rate falls below its input rate, backpressure may occur; and unbalanced data distribution may lead to a single-point resource bottleneck. Stream processing has strict real-time requirements and users have little tolerance for delay, so keeping system performance stable is particularly important. At present, however, performance problems in big data systems are usually addressed only after they occur. If scenarios and applications that may trigger performance problems can be built in advance and tested on the actual production cluster, resource or configuration problems can be discovered early, reducing losses during actual operation.
Stream processing frameworks have only begun to develop in recent years, so the industry has no mature, unified standard for benchmarking their performance, and existing benchmarks are few. Yahoo Streaming Benchmarks (Yahoo Streaming Benchmarks https://github.com/yahoo/streaming-benchmarks) is a stream processing framework benchmark designed by Yahoo. It generates data through Kafka, runs it on the stream processing framework under test, and interacts with an external Redis database. However, the benchmark provides only one test application, built around a filter operation, so its overall completeness is low. StreamBench (Lu R, Wu G, Xie B, et al. Stream bench: Towards benchmarking modern distributed stream computing frameworks [C]//Utility and Cloud Computing (UCC), 2014 IEEE/ACM 7th International Conference on. IEEE, 2014: 69-78.) is also a benchmark developed for stream processing frameworks. It contains an application set of 7 filtering or statistical operations and tests the latency, throughput, and failure recovery capability of stream processing frameworks. However, it does not support complex test applications such as windows, nor dynamically changing data sources. Some research papers also touch on this field. Chintapalli S et al. (Chintapalli S, Dagit D, Evans B, et al. Benchmarking streaming computation engines: storm, flink and spark streaming [C]//Parallel and Distributed Processing Symposium Workshops, 2016 IEEE International. IEEE, 2016: 1789-1792.), in order to compare the performance of the Spark Streaming, Storm, and Flink stream processing frameworks, designed a simple application that takes Kafka as the data input and performs filter, join, and aggregation operations. Karimov J et al. (Karimov J, Rabl T, Katsifodimos A, et al. Benchmarking Distributed Stream Data Processing Systems [J]) built two test applications, windowed aggregation and windowed join, to compare and analyze stream processing frameworks. However, all of these works suffer from problems such as test sets with too little coverage.
In summary, existing performance benchmarks for stream processing frameworks have three deficiencies, in the application test set, the stream data source, and the performance indicators. First, they do not support dynamically changing data sources. Second, the applications are too simple and cover few of the characteristics of stream processing. Third, most consider only latency and throughput as performance indicators and ignore others such as backpressure.
Summary of the invention
The technical problem solved by the present invention: to overcome the deficiencies of the prior art by providing a performance benchmark test system and method for a big data stream processing framework which, guided by the characteristics of the stream processing model, builds data sources that cover those characteristics, tests the framework's performance under typical streaming scenarios, and analyzes and diagnoses the bottleneck of the stream processing framework.
The technical solution of the present invention is a performance benchmark test system for a big data stream processing framework, comprising four modules: a streaming workload generator, a streaming scenario and application builder, a performance data collection tool, and a performance data analysis tool.
The streaming workload generator is responsible for generating workloads that match the data characteristics of the stream processing model. Unlike traditional batch data generation, it covers two tasks: designing the rate pattern and designing the data set attributes. The rate pattern describes how the framework's input rate changes over time; the rate may be constant, may vary, or may follow a given function. The data set attributes describe the data flowing in each second, including dimension, degree of disorder, and skew. Combining a rate pattern with data set attributes yields a streaming workload.
The streaming scenario and application builder is responsible for building scenarios and applications that cover the computational characteristics of stream data. The scenarios and applications of this test system come mainly from two sources: stream data processing scenarios frequently encountered in real life, and the test cases provided by existing stream processing framework benchmarks. The builder pays particular attention to the window mechanism in stream processing and performs controlled-variable parameter tests by varying the values of the parameters that affect windows.
The performance data collection tool is responsible for collecting the performance indicators of the test application during the test. Besides throughput, latency, backpressure, and system resources, these indicators include finer-grained node information, such as the processing rate, the volume of data processed, and the buffer pool usage of each node of the stream processing framework during the test.
The performance data analysis tool is responsible for processing the collected data. Following a top-down statistical analysis method, it visualizes the data as charts that show how parameter changes affect indicators such as throughput, latency, backpressure, and system performance, and it analyzes the causes in order to diagnose the bottleneck of the stream processing framework.
The invention further relates to a performance benchmark test method for a big data stream processing framework, implemented in the following steps:
1) Deploy the test cluster and determine the configuration parameters of the stream processing framework under test, including memory, CPU, number of slave nodes, and maximum cluster parallelism.
2) Select the streaming scenario and application to be tested, set the rate pattern and data set attributes, and configure the required parameters. Test parameters fall into two main categories: data source parameters, including rate, data skew, and degree of disorder; and window parameters, including window type, parallelism, and window size. The tester then chooses one parameter to vary, keeps all other environment settings unchanged, and runs multiple groups of tests on the cluster.
3) Run the test script of the corresponding application and, at the same time, start the performance data collection tool.
4) Wait for the test to complete. During the test, the performance data collection tool periodically accesses the cluster to obtain data, and persists the data to disk after the test ends.
5) While the test application is running on the cluster or after it completes, analyze the performance data collected at run time, compare the effect of changing the parameter on throughput, latency, backpressure, and system resources, and locate the bottleneck by top-down, layer-by-layer analysis.
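The controlled-variable procedure of step 2) can be sketched as follows. This is a minimal illustration, not the patented implementation: the parameter names and baseline values in `BASELINE` are hypothetical, chosen only to mirror the two parameter categories named above (data source and window).

```python
from typing import Any, Dict, List

# Hypothetical baseline configuration; names and values are illustrative only.
BASELINE: Dict[str, Any] = {
    "rate_per_sec": 10_000,     # data source: rate
    "skew": 0,                  # data source: data skew
    "out_of_order_ms": 0,       # data source: degree of disorder
    "window_type": "tumbling",  # window: type
    "window_size_sec": 10,      # window: size
    "parallelism": 2,           # window: operator parallelism
}


def control_variable_runs(
    baseline: Dict[str, Any], param: str, values: List[Any]
) -> List[Dict[str, Any]]:
    """Return one configuration per value of `param`; all other settings stay fixed."""
    if param not in baseline:
        raise KeyError(f"unknown parameter: {param}")
    # Each run is a copy of the baseline with exactly one parameter changed.
    return [{**baseline, param: value} for value in values]


# Vary only the skew, as in the skew experiment described later in the text.
runs = control_variable_runs(BASELINE, "skew", [0, 1, 4])
for cfg in runs:
    print(cfg)
```

Because every run is derived from the same baseline, any difference in the collected indicators can be attributed to the one parameter that changed.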
Compared with existing big data processing framework benchmarks, the present invention has the following advantages:
(1) It generates dynamically changing streaming workloads that match real conditions. For the rate pattern, four different rate generation patterns are designed; for the data set attributes, the generation of out-of-order and skewed data sets is considered.
(2) It builds scenarios and applications that cover the characteristics of the stream processing model, in particular the window mechanism in stream data processing, and comprehensively considers window types and the parameters that affect windows.
(3) It considers not only the common throughput and latency indicators but also collects and analyzes the backpressure indicator and node data specific to stream processing frameworks, and it takes the system resources of the cluster into account as well.
Description of the drawings
Fig. 1 is the architecture diagram of the performance benchmark test system of the present invention;
Fig. 2 is the window selection flowchart of the present invention;
Fig. 3 shows the performance comparison results of the ProductStatis application under different degrees of skew; the left panel compares backpressure under different skew, and the right panel shows the actual output rate of the data source under different skew;
Fig. 4 is a schematic diagram of how skew affects data distribution;
Fig. 5 shows the performance comparison results of the TransactionJoin application under different window sizes; the left panel compares backpressure under different window sizes, and the right panel compares latency under different window sizes;
Fig. 6 is a schematic diagram of how window size affects the amount of computation; the left panel shows the amount of data computed under a small window, and the right panel shows the amount under a large window;
Fig. 7 compares the performance of the ProductStatis and TransactionJoin applications under different nominal rates. The upper-left panel compares backpressure of the ProductStatis application under different nominal rates, the upper-right panel compares its actual output rate under different nominal rates, the lower-left panel compares backpressure of the TransactionJoin application under different nominal rates, and the lower-right panel compares its actual output rate under different nominal rates.
Detailed description
The present invention is described in detail below with reference to the drawings and specific embodiments.
The present invention proposes a performance benchmark test system and method for a big data stream processing framework. The core idea is to build scenarios and applications that cover the characteristics of the stream processing model, to test under typical stream processing scenarios by combining varying parameters, to collect the performance indicator data of the stream processing framework while the application runs, and finally to visualize that data as charts in order to discover and analyze the performance problems of the stream processing framework and their causes.
The stream processing model has its own distinctive characteristics, which are divided here into data characteristics and computational characteristics. Data characteristics refer mainly to the stream data the framework must process and include the following five:
1) Real-time. Stream data is generated in real time, arrives in real time, and must be processed in real time.
2) Temporal order. Stream data has temporal logic and is related to time. However, the data source is not unique and the arrival order of the tuples in a stream is mutually independent, so time disorder may occur during processing.
3) Unboundedness. Stream data is organized in tuples and is generated endlessly and continuously.
4) Dynamic variability. The rate and distribution of stream data are affected by the actual production environment at the current moment and cannot be predicted in advance.
5) Hard to reproduce. Once data has been processed, it cannot be retrieved again unless it was deliberately saved, or retrieving it again is very costly.
In addition, the computational characteristics of the stream processing model can be divided into the following four:
1) Real-time computation. In a stream processing framework, data items emitted one by one from the data source flow between operators, and results are produced after computation and analysis. Stream data is generated in real time, arrives in real time, and must be processed in real time, which places a strict low-latency requirement on the framework.
2) Ordered computation. Stream data is closely tied to time, but the order in which data arrives is unpredictable; network delay and similar causes inevitably produce out-of-order streams. An operator therefore needs a time-handling strategy when it receives data, so that out-of-order data is computed in order. The watermark mechanism (Watermark) is a common out-of-order handling strategy in stream processing frameworks.
3) Bounded computation. Stream data is generated continuously, which places new demands on the semantics of operations such as aggregation and join that were originally performed on bounded data sets: because the data is unbounded, processing cannot wait until all data has been received. Stream processing therefore generally uses a window mechanism, which marks off bounded sets from the unbounded stream according to certain rules and then performs the relevant aggregation or join on each bounded set. The window mechanism is also a major feature that distinguishes stream processing from batch processing.
4) Highly reliable operation. Batch processing computes on persisted offline data, so if a problem occurs during operation, the data simply needs to be reloaded and recomputed. Recomputing stream data, however, is costly, so stream processing must be able to restore the cluster to its state at a recent point in time after a failure and avoid recomputation. The failure recovery semantics of stream processing frameworks fall into three guarantees: at-most-once, at-least-once, and exactly-once. Different stream processing frameworks provide different failure recovery guarantees.
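The watermark and window ideas in items 2) and 3) can be illustrated with a simplified sketch. It assumes a bounded-out-of-orderness watermark (largest event time seen so far minus a fixed allowance) and treats any record whose timestamp is already below the watermark as late; real frameworks apply lateness per window and offer more options, so this is an illustration of the principle only. The function names and sample timestamps are hypothetical.

```python
from typing import Iterable, List, Tuple


def split_late_events(
    event_times_ms: Iterable[int], max_out_of_order_ms: int
) -> Tuple[List[int], List[int]]:
    """Classify arrivals as on-time or late with a bounded-out-of-orderness watermark.

    watermark = (largest event time seen so far) - max_out_of_order_ms
    A record is late if its timestamp is below the watermark when it arrives.
    """
    on_time: List[int] = []
    late: List[int] = []
    max_seen = None
    for t in event_times_ms:
        watermark = None if max_seen is None else max_seen - max_out_of_order_ms
        if watermark is not None and t < watermark:
            late.append(t)
        else:
            on_time.append(t)
        max_seen = t if max_seen is None else max(max_seen, t)
    return on_time, late


def tumbling_window_start(event_time_ms: int, size_ms: int) -> int:
    """Start of the fixed-size, non-overlapping window that contains the event."""
    return event_time_ms - (event_time_ms % size_ms)


# Event timestamps in arrival order; 900 and 1100 arrive out of order.
arrivals = [1000, 1200, 900, 2500, 1100, 2400]
on_time, late = split_late_events(arrivals, max_out_of_order_ms=500)
print(on_time, late)
print(tumbling_window_start(2400, size_ms=1000))
```

With a 500 ms allowance, the record stamped 900 is still accepted (the watermark is 700 when it arrives), while the record stamped 1100 arrives after the watermark has advanced to 2000 and is classified as late.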
At present, stream processing applications generally run on distributed stream processing frameworks; the mainstream ones in industry include Storm, Spark Streaming, and Flink. Flink is an open-source distributed stream processing framework. It uses a continuous-flow processing model, offers high throughput and low latency, and supports failure recovery with "exactly-once" semantics. Compared with other stream processing frameworks, Flink provides well-developed mechanisms designed for the characteristics of the stream processing model; it also treats batch processing as a special case of stream processing, unifying stream and batch at the bottom layer of the framework. Flink is therefore chosen here as the framework under test.
The architecture used in this embodiment is shown in Fig. 1. The upper half is the logical diagram of the application under test running in the Flink stream processing framework; the lower half is the module architecture of the system of the present invention (the stream processing framework benchmark test system), which consists of four modules: the streaming workload generator, the scenario and application test set, the performance data collection tool, and the performance data analysis tool.
(1) Streaming workload generator
It is responsible for generating streaming workloads that match the data characteristics of the stream processing model. Based on those data characteristics, the streaming workload generator covers two design aspects: the rate pattern and the data set attributes.
For the rate pattern, the present invention designs four different patterns: fixed rate, where the data source generates a constant volume of data each second; random rate, where the volume generated each second is drawn at random from a range; burst rate, where the data source jumps within a short time from a stable, small value to a peak rate; and exponential rate, where the volume generated each second grows according to an exponential function and repeats with a fixed period. Three data set attributes are also specified: data dimension, where a larger dimension means a larger record and therefore more network traffic; data skew, which for data in Key/Value form describes how unevenly the keys are distributed, with higher skew meaning a more uneven key distribution; and degree of disorder, which describes how far the input stream is out of time order, with a larger degree meaning that some records in the input stream are more severely out of order.
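The four rate patterns can be sketched as functions that return the number of records to emit in each second. This is a minimal sketch under stated assumptions: the function names, the parameters (`base`, `peak`, `factor`, `period`), and the exact shape of the burst and exponential curves are hypothetical, since the text describes the patterns only qualitatively.

```python
import random
from typing import List


def fixed_rate(base: int, seconds: int) -> List[int]:
    """Fixed rate: the same number of records every second."""
    return [base] * seconds


def random_rate(low: int, high: int, seconds: int, seed: int = 0) -> List[int]:
    """Random rate: each second's volume is drawn uniformly from [low, high]."""
    rng = random.Random(seed)  # seeded so that a test run can be repeated
    return [rng.randint(low, high) for _ in range(seconds)]


def burst_rate(base: int, peak: int, start: int, length: int, seconds: int) -> List[int]:
    """Burst rate: a stable low rate that jumps to a peak for a short interval."""
    return [peak if start <= s < start + length else base for s in range(seconds)]


def exponential_rate(base: int, factor: float, period: int, seconds: int) -> List[int]:
    """Exponential rate: grows by `factor` each second and restarts every `period` seconds."""
    return [int(base * factor ** (s % period)) for s in range(seconds)]


print(fixed_rate(100, 3))
print(burst_rate(10, 100, start=2, length=2, seconds=6))
print(exponential_rate(10, 2.0, period=3, seconds=6))
```

A generator built on these schedules would emit the listed number of records in each one-second slot; the data set attributes (dimension, skew, disorder) are applied to the records themselves and are independent of the schedule.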
(2) Streaming scenario and application builder
It contains the test set of the performance benchmark test system. On the one hand, the test set must build a comprehensive range of scenarios and applications to cover the computational characteristics of the stream processing model; on the other hand, the processing logic of the applications must be usable in real production. Based on these two points, the test set comprises five typical stream processing scenarios and eight test applications in total, as shown in Table 1.
Table 1. Scenarios and typical applications
Table 2 compares the feature coverage of the application set in this performance benchmark test system with that of other benchmarks and related research work. This system covers more than the other benchmarks in two respects: window types and operator operations.
Table 2. Feature coverage of the application set of each benchmark
Window processing is a major feature of stream processing frameworks, and its parameters are the most complex. Many parameters affect a window, such as window size, sliding step, operator parallelism, and the function that triggers computation; some are independent of one another and some are interdependent. When implementing a windowed application, the benchmark test system maps different parameter configurations to different windows. The selection flow is shown in Fig. 2.
1) First determine whether the window is keyed. If it is a keyed window, the data stream must be partitioned with the keyby function before the window operator is applied. If it is an ordinary, non-keyed window, the parallelism of its window operator can only be set to 1 and it cannot compute in parallel.
2) Then determine what drives the window: a count-driven window or a time-driven window. A count window triggers computation based on the number of records in the window; a time window triggers computation based on a configured time. Time windows further distinguish event time (Event-time) from processing time (Processing-time). With event time, a watermark operator and a maximum allowed out-of-order time must also be configured to handle disorder.
3) Then determine how the window moves: tumbling window, sliding window, or session window. A tumbling window needs only a window size; a sliding window needs a window size and a sliding step; a session window can only be time-driven and needs an inactivity gap.
4) Finally, attach the window computation function. This function is part of the application logic and is not passed as a parameter.
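The dependencies between window parameters in steps 1) to 3) can be expressed as a small validation routine. The sketch below is an assumption-laden illustration: the `WindowConfig` fields and the error messages are hypothetical, and only the rules themselves are taken from the selection flow described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WindowConfig:
    keyed: bool
    driver: str                            # "count" or "time"
    movement: str                          # "tumbling", "sliding" or "session"
    parallelism: int = 1
    size: Optional[int] = None             # window size
    slide: Optional[int] = None            # sliding step
    session_gap: Optional[int] = None      # inactivity gap for session windows
    time_semantics: Optional[str] = None   # "event" or "processing"
    max_out_of_order: Optional[int] = None


def validate(cfg: WindowConfig) -> List[str]:
    """Return the list of rule violations; an empty list means the config is consistent."""
    errors: List[str] = []
    # Step 1: a non-keyed window cannot run in parallel.
    if not cfg.keyed and cfg.parallelism != 1:
        errors.append("non-keyed window requires parallelism 1")
    # Step 2: count-driven or time-driven; event time needs an out-of-order bound.
    if cfg.driver not in ("count", "time"):
        errors.append("driver must be 'count' or 'time'")
    if cfg.driver == "time":
        if cfg.time_semantics not in ("event", "processing"):
            errors.append("time window requires event or processing time")
        elif cfg.time_semantics == "event" and cfg.max_out_of_order is None:
            errors.append("event time requires max_out_of_order")
    # Step 3: each movement type needs its own parameters.
    if cfg.movement == "tumbling":
        if cfg.size is None:
            errors.append("tumbling window requires size")
    elif cfg.movement == "sliding":
        if cfg.size is None or cfg.slide is None:
            errors.append("sliding window requires size and slide")
    elif cfg.movement == "session":
        if cfg.driver != "time":
            errors.append("session window must be time-driven")
        if cfg.session_gap is None:
            errors.append("session window requires session_gap")
    else:
        errors.append("unknown movement type")
    return errors


ok = WindowConfig(keyed=True, driver="time", movement="sliding", parallelism=4,
                  size=10, slide=5, time_semantics="event", max_out_of_order=2)
bad = WindowConfig(keyed=False, driver="count", movement="session", parallelism=4)
print(validate(ok))
print(validate(bad))
```

Checking a configuration before a test run keeps an invalid combination, such as a count-driven session window, from being mistaken for a performance result.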
(3) Performance data collection tool
It is responsible for collecting the performance indicators of the test application. As shown in Table 3, these indicators include throughput, latency, backpressure, and system resources, as well as finer-grained run-time information such as the rate, processed data volume, and buffer pool usage of each node. The present invention collects data with the Profiler sampling tool, which periodically accesses the Master node of the cluster to obtain the run-time data of the test application at the current moment. After the test ends, Profiler persists all data from that test to disk in order of collection time, in Json format.
Table 3. Performance indicators
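The periodic sampling and persistence performed by the collection tool can be sketched as below. The text does not describe how Profiler queries the Master node, so the query is injected as a callable; `fake_fetch`, the metric names, and the output file name are all hypothetical placeholders rather than part of the described tool.

```python
import json
import os
import tempfile
import time
from typing import Callable, Dict, List


def collect_samples(
    fetch_metrics: Callable[[], Dict], samples: int, interval_sec: float
) -> List[Dict]:
    """Poll `fetch_metrics` a fixed number of times, stamping each sample."""
    collected: List[Dict] = []
    for _ in range(samples):
        collected.append({"collected_at": time.time(), **fetch_metrics()})
        time.sleep(interval_sec)
    return collected


def persist(samples: List[Dict], path: str) -> None:
    """Write all samples to disk as JSON, ordered by collection time."""
    ordered = sorted(samples, key=lambda s: s["collected_at"])
    with open(path, "w", encoding="utf-8") as f:
        json.dump(ordered, f, indent=2)


def fake_fetch() -> Dict:
    """Stand-in for the real query to the cluster's Master node (assumption)."""
    return {"throughput": 4000, "latency_ms": 120, "backpressure": "LOW"}


samples = collect_samples(fake_fetch, samples=3, interval_sec=0.0)
out_path = os.path.join(tempfile.gettempdir(), "profiler_samples.json")
persist(samples, out_path)
print(len(samples), "samples written to", out_path)
```

Keeping the samples in memory and writing them once at the end matches the description that data is persisted after the test finishes; a long-running test might instead append to the file periodically.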
(4) Performance data analysis tool
It is responsible for statistical analysis of the collected performance data and works in two stages. The first stage aggregates and analyzes the persisted Json data, combines and compares the results of multiple tests, and converts them into intermediate results in Csv format. The second stage converts the Csv data into visual charts: based on the intermediate results stored in the Csv files, it draws backpressure charts, latency charts, throughput charts, and so on, which show how parameter changes affect performance and help diagnose the bottleneck of the stream processing framework.
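The first analysis stage, turning per-run samples into one comparable Csv table, can be sketched as follows. The column names, the choice of average throughput and maximum latency as summary statistics, and the sample values are assumptions made for illustration; the text does not specify the intermediate Csv layout.

```python
import csv
import io
from typing import Dict, List


def summarize_runs(runs: Dict[str, List[Dict]]) -> str:
    """Combine several test runs into one CSV table with one row per run.

    `runs` maps a run label (for example the parameter value under test)
    to the list of samples collected during that run.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["run", "avg_throughput", "max_latency_ms"])
    for label, samples in runs.items():
        avg_throughput = sum(s["throughput"] for s in samples) / len(samples)
        max_latency = max(s["latency_ms"] for s in samples)
        writer.writerow([label, avg_throughput, max_latency])
    return buf.getvalue()


# Hypothetical samples for two runs that differ only in the skew parameter.
runs = {
    "skew=0": [{"throughput": 4000, "latency_ms": 100},
               {"throughput": 4100, "latency_ms": 120}],
    "skew=4": [{"throughput": 1900, "latency_ms": 300},
               {"throughput": 1800, "latency_ms": 450}],
}
csv_text = summarize_runs(runs)
print(csv_text)
```

The resulting table is the kind of intermediate result the second stage would read in order to draw the backpressure, latency, and throughput charts.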
Taking Apache Flink as the framework under test, the five scenarios and eight applications that were built are tested on a cluster in order to find and summarize the performance problems Flink exhibits in a variety of typical stream processing scenarios.
(1) Effect of skew on performance
Fig. 3 shows the performance comparison results of the ProductStatis application under different degrees of skew. The left panel compares backpressure under different skew, and the right panel shows the actual output rate of the data source under different skew. The data skew takes the values {0, 1, 4}, denoting no skew, low skew, and high skew respectively. The left panel shows that, with all other configuration parameters identical, backpressure of grade HIGH appears under high skew, whereas under no skew and low skew the backpressure grade is LOW. The right panel shows that in the low-skew and no-skew tests the Source outputs similar volumes of data, at a rate of about 4000K records/second, but in the high-skew test the Source generates less data, under 2000K records/second.
Analysis leads to the following conclusion. As shown in Fig. 4, the window operator processes data by key, and records with the same key are assigned to the same stream for computation. In the figure, records with Key=1 and Key=3 are assigned to the window (1/2) node, and records with Key=2 and Key=4 to the window (2/2) node. In the no-skew and low-skew tests, each window node receives a similar amount of data, or the difference is small, so performance is governed mainly by the data source rate and the backpressure grade is LOW. In the high-skew test, one or a few keys account for a very large share of the data set, so some window node receives a very large volume of data and its window computation takes longer. The node's processing rate then falls below the data input rate, unprocessed data accumulates in the input buffer, and once the input buffer is full, blocking and HIGH-grade backpressure occur. To slow or stop the growth of the buffer, the system feeds the congestion back to the upstream node, which sends less and processes more slowly; this in turn is fed back further upstream, level by level, until the output volume of the Source itself decreases.
(2) influence of the window size to performance
Fig. 5 compares the performance of the TransactionJoin application under different window sizes: the left figure compares back-pressure and the right figure compares delay. The left figure shows that the back-pressure grade rises as the window size in the window operator increases. The right figure shows the maximum delay collected at the output during test execution. Because there are two data sources, there are two delay series, and comparing them shows little difference in delay between the two sources. However, the delay observed at the output increases as the window size in the window operator increases.
The analysis leads to the following conclusion. As shown in Fig. 6, a larger window contains more data, and the computation triggered by the window operator is a complex join whose execution time grows with the square of the data volume. When the computation is triggered, the processing rate of the window node therefore falls below its input rate, data accumulates in the input buffer, and back-pressure appears once that buffer is full. After the back-pressure propagates to the data source, the data input of the whole system decreases. Because the computation takes a long time, data waits longer in the node, and the delay increases accordingly.
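The quadratic growth can be made concrete with a short sketch. It assumes the join compares every record of one stream with every record of the other inside a window (a nested-loop join); the patent does not state which join algorithm the framework uses, and `join_comparisons` and the rates are hypothetical.

```python
def join_comparisons(left_rate: int, right_rate: int, window_seconds: int) -> int:
    """Pair comparisons performed when one window fires, assuming a nested-loop join."""
    left_records = left_rate * window_seconds    # records buffered from source 1
    right_records = right_rate * window_seconds  # records buffered from source 2
    return left_records * right_records

# Illustrative input rates of 1000 records/second per source.
costs = {w: join_comparisons(1000, 1000, w) for w in (1, 2, 4)}
print(costs)
```

Under this assumption, doubling the window size quadruples the work done each time the window fires, while the input rate stays the same.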
(3) Influence of rate on performance
Fig. 7 compares the performance of the ProductStatis and TransactionJoin applications under different nominal rates. The upper left compares back-pressure for ProductStatis under different nominal rates, the upper right compares its actual output rate, the lower left compares back-pressure for TransactionJoin, and the lower right compares its actual output rate.
In the ProductStatis results, the back-pressure changes from OK to LOW as the nominal rate increases. The rate actually produced by the data source over time meets expectations at nominal rates of 10k/s and 160k/s but falls short at 640k/s and 1000k/s, where the two actual rates are close to each other. This indicates that the growth of the system rate has hit a bottleneck. Because no HIGH-grade back-pressure occurs at that point, the bottleneck is judged to be network bandwidth. In the TransactionJoin results, the back-pressure grade is OK at 1k/s, and HIGH-grade back-pressure occurs at every other nominal rate. Except at the 1k/s nominal rate, the actual output rate of the data source falls short of the nominal value. Comparing the two applications shows that the complexity of the computation logic affects the system's back-pressure at a given nominal rate.
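The reasoning used here to attribute a bottleneck can be written as a simple decision rule. This is only a sketch of the argument in the text, not the analysis tool of the invention: the function name `diagnose`, the 5% tolerance, and the rates in the usage lines are assumptions made for illustration.

```python
def diagnose(nominal_rate: float, actual_rate: float, backpressure: str,
             tolerance: float = 0.05) -> str:
    """Attribute a rate shortfall following the reasoning in the text."""
    # The source keeps up with the nominal rate (within an assumed tolerance).
    if actual_rate >= nominal_rate * (1 - tolerance):
        return "no bottleneck"
    # Shortfall with HIGH back-pressure: operators cannot keep up with the input.
    if backpressure == "HIGH":
        return "operator computation"
    # Shortfall without HIGH back-pressure: the text attributes this to the network.
    return "network bandwidth"

# Illustrative inputs only; these are not measurements from the patent.
print(diagnose(160_000, 160_000, "OK"))
print(diagnose(1_000_000, 650_000, "LOW"))
print(diagnose(10_000, 4_000, "HIGH"))
```

A real diagnosis would also look at system resources and per-node metrics, which this rule deliberately leaves out.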
Although the ProductStatis application does not reach the nominal rate, its rate stays stable, whereas the actual rate of the TransactionJoin application fluctuates considerably over time in a cycle: it rises for a period, falls back quickly, and then rises again. The cause of the fluctuation is as follows. For an application with complex window logic such as TransactionJoin, while the window has not triggered a computation the input keeps receiving data and no back-pressure occurs. When the window computation runs, it takes a long time, so the processing rate lags behind the receiving rate, data accumulates in the buffer and produces high back-pressure, and the feedback reduces the actual output rate of the data source at that time. After the window computation finishes, the backlog in the buffer is consumed and reception recovers, so the back-pressure grade falls and the rate rises. When the window computation is triggered again, data starts to accumulate in the buffer again. Back-pressure and rate therefore fluctuate inversely. For a simple application such as ProductStatis, the window computation takes little time and the buffer does not fill during it, so a reduced input rate is enough to keep the system roughly in equilibrium.
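The periodic fluctuation described above can be reproduced with a toy simulation of a bounded input buffer. It is a minimal sketch under strong assumptions: time advances in discrete ticks, the node drains nothing while a window computation is running, and back-pressure is modelled as the source being capped by the free buffer space. All names and numbers are hypothetical.

```python
def simulate(ticks, source_rate, drain_rate, buffer_capacity, window_every, window_cost):
    """Return, per tick, how many records the source is allowed to emit."""
    buffered = 0
    busy = 0  # ticks remaining in the current window computation
    emitted = []
    for tick in range(1, ticks + 1):
        if tick % window_every == 0:
            busy = window_cost  # the window fires and occupies the node
        free = buffer_capacity - buffered
        accepted = min(source_rate, free)  # back-pressure caps the source
        buffered += accepted
        emitted.append(accepted)
        if busy > 0:
            busy -= 1  # computing the window: nothing is drained this tick
        else:
            buffered -= min(buffered, drain_rate)
    return emitted

# Simple application: the window computation costs no extra ticks.
simple_app = simulate(10, 100, 100, 300, window_every=5, window_cost=0)
# Complex application: each window computation blocks the node for 4 ticks.
complex_app = simulate(10, 100, 150, 300, window_every=5, window_cost=4)
print(simple_app)
print(complex_app)
```

In this model the simple application keeps a constant source rate, while the complex one drops to zero once the buffer fills during a window computation and recovers after the backlog is drained.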
Although specific embodiments and drawings of the present invention are disclosed for the purpose of illustration, to help readers understand and implement the invention, those skilled in the art will appreciate that the corresponding methods and tools can also be realized on other platforms without departing from the spirit and scope of the present invention and the appended claims. Therefore, the present invention should not be limited to the content disclosed in the embodiments and drawings.
Claims (7)
1. A performance benchmark test system for a large data stream processing framework, characterized by comprising: a streaming workload generator, a streaming scene and application builder, a performance data collection tool, and a performance data analysis tool;
the streaming workload generator generates stream data that includes data parameters;
the streaming scene and application builder builds a specific scene and application, and runs the application under different parameters to test the performance of the framework in processing the stream data in the scene and application;
the performance data collection tool collects performance indicators during the performance test;
the performance data analysis tool processes and analyzes the performance indicators collected by the performance data collection tool, so as to reflect the influence of parameter changes on the performance indicators and to diagnose the bottleneck of the framework in stream data processing.
2. The performance benchmark test system for a large data stream processing framework according to claim 1, characterized in that: the parameters include but are not limited to data parameters, application parameters, or system parameters.
3. The performance benchmark test system for a large data stream processing framework according to claim 1 or 2, characterized in that: the data parameters include but are not limited to streaming load characteristics; the streaming load characteristics include but are not limited to a flow rate model or data set attributes; the flow rate model includes but is not limited to a fixed rate, a random rate, an abruptly changing rate, or an exponential rate; and the data set attributes include but are not limited to data dimension, data skewness, or degree of data disorder.
4. The performance benchmark test system for a large data stream processing framework according to claim 2, characterized in that: the system parameters include but are not limited to memory, CPU, number of Slaves, or maximum cluster parallelism.
5. The performance benchmark test system for a large data stream processing framework according to claim 1, characterized in that: the performance indicators include but are not limited to throughput, delay, back-pressure, system resources, node processing rate, node data volume, or node buffer pool usage.
6. The performance benchmark test system for a large data stream processing framework according to claim 2, characterized in that: the application is a window application, the application parameters are window parameters, and the window parameters include but are not limited to window type, parallelism, or window size.
7. A performance benchmark test method for a large data stream processing framework, characterized by comprising the following steps:
(1) deploying a test cluster and configuring the system parameters of the framework;
(2) choosing a scene and application, configuring data parameters, application parameters, and test parameters, determining multiple values of a parameter to be tested while keeping the other parameters constant, and running the application on the cluster to carry out multiple groups of tests;
(3) running the application to be tested while starting the performance collection tool, and waiting for the test to complete;
(4) during the test, the performance collection tool collects performance indicators by periodically accessing the test cluster, and persists the collected performance indicators to a storage device after the test;
(5) when or after the application finishes running on the test cluster, analyzing the performance data collected by the performance collection tool and the influence of changes in the parameter to be tested on the performance indicators, proceeding top-down layer by layer to locate the bottleneck point of the framework.
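Step (2) of the method, varying one parameter while holding the others constant, can be sketched as a small test-plan generator. The parameter names and values below are hypothetical examples, and `build_test_plan` is an illustrative helper rather than part of the claimed system.

```python
def build_test_plan(baseline: dict, parameter: str, values: list) -> list:
    """Return one configuration per value, varying only `parameter`."""
    if parameter not in baseline:
        raise KeyError(f"unknown parameter: {parameter}")
    # Every other parameter is copied unchanged from the baseline.
    return [{**baseline, parameter: value} for value in values]

# Hypothetical baseline configuration for one scene and application.
baseline = {"window_size_s": 10, "rate_per_s": 10_000, "skew": "none", "parallelism": 2}
plan = build_test_plan(baseline, "window_size_s", [5, 10, 20])
for config in plan:
    print(config)
```

Each configuration in the plan corresponds to one group of tests, so differences in the collected performance indicators can be attributed to the single parameter that changed.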
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810461515.XA CN108683560B (en) | 2018-05-15 | 2018-05-15 | Performance benchmark test system and method for large data stream processing framework |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810461515.XA CN108683560B (en) | 2018-05-15 | 2018-05-15 | Performance benchmark test system and method for large data stream processing framework |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108683560A true CN108683560A (en) | 2018-10-19 |
CN108683560B CN108683560B (en) | 2021-03-30 |
Family
ID=63806177
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810461515.XA Active CN108683560B (en) | 2018-05-15 | 2018-05-15 | Performance benchmark test system and method for large data stream processing framework |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108683560B (en) |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109542985B (en) * | 2018-11-27 | 2023-09-19 | 南京擎天科技有限公司 | Universal stream data analysis model and construction method thereof |
CN109542985A (en) * | 2018-11-27 | 2019-03-29 | 江苏擎天信息科技有限公司 | A kind of general streaming Data Analysis Model and its construction method |
WO2020147480A1 (en) * | 2019-01-14 | 2020-07-23 | 阿里巴巴集团控股有限公司 | Stream processing-based monitoring index abnormality detection method, device and equipment |
CN110058977A (en) * | 2019-01-14 | 2019-07-26 | 阿里巴巴集团控股有限公司 | Monitor control index method for detecting abnormality, device and equipment based on Stream Processing |
CN110058977B (en) * | 2019-01-14 | 2020-08-14 | 阿里巴巴集团控股有限公司 | Monitoring index abnormity detection method, device and equipment based on stream processing |
CN110069331A (en) * | 2019-04-24 | 2019-07-30 | 北京百度网讯科技有限公司 | A kind of data processing method, device and electronic equipment |
CN110704998A (en) * | 2019-06-25 | 2020-01-17 | 眸芯科技(上海)有限公司 | Multimedia IP bandwidth performance verification method and device |
CN110704998B (en) * | 2019-06-25 | 2023-04-18 | 眸芯科技(上海)有限公司 | Multimedia IP bandwidth performance verification method and device |
CN110740079A (en) * | 2019-10-16 | 2020-01-31 | 北京航空航天大学 | full link benchmark test system for distributed scheduling system |
CN110971483A (en) * | 2019-11-08 | 2020-04-07 | 苏宁云计算有限公司 | Pressure testing method and device and computer system |
CN110971483B (en) * | 2019-11-08 | 2021-11-09 | 苏宁云计算有限公司 | Pressure testing method and device and computer system |
CN111049684A (en) * | 2019-12-12 | 2020-04-21 | 闻泰通讯股份有限公司 | Data analysis method, device, equipment and storage medium |
CN111143143A (en) * | 2019-12-26 | 2020-05-12 | 北京神州绿盟信息安全科技股份有限公司 | Performance test method and device |
CN111143143B (en) * | 2019-12-26 | 2024-02-23 | 绿盟科技集团股份有限公司 | Performance test method and device |
CN111737097A (en) * | 2020-06-05 | 2020-10-02 | 浪潮电子信息产业股份有限公司 | Performance test method and related device of stream processing system |
CN111737097B (en) * | 2020-06-05 | 2022-06-07 | 浪潮电子信息产业股份有限公司 | Performance test method and related device of stream processing system |
CN111930630B (en) * | 2020-08-17 | 2024-01-05 | 电信科学技术第十研究所有限公司 | Method and device for generating big data test case based on data stream |
CN111930630A (en) * | 2020-08-17 | 2020-11-13 | 电信科学技术第十研究所有限公司 | Big data test case generation method and device based on data flow |
CN112070235A (en) * | 2020-09-08 | 2020-12-11 | 北京小米松果电子有限公司 | Abnormity positioning method and device of deep learning framework and storage medium |
CN113760989A (en) * | 2021-02-04 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | Method, device and equipment for processing unbounded stream data and storage medium |
CN115033457A (en) * | 2022-06-22 | 2022-09-09 | 浙江大学 | Multi-source data real-time acquisition method and system capable of monitoring and early warning |
CN115033457B (en) * | 2022-06-22 | 2023-08-25 | 浙江大学 | Multi-source data real-time acquisition method and system capable of monitoring and early warning |
Also Published As
Publication number | Publication date |
---|---|
CN108683560B (en) | 2021-03-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||