CN108683560A - A performance benchmark testing system and method for a big data stream processing framework - Google Patents


Info

Publication number
CN108683560A
Authority
CN
China
Prior art keywords: data, performance, test, application, parameter
Legal status
Granted
Application number
CN201810461515.XA
Other languages
Chinese (zh)
Other versions
CN108683560B (en)
Inventor
黄涛
许利杰
魏峻
王伟
郑莹莹
刘重瑞
胡家煊
Current Assignee
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Application filed by Institute of Software of CAS
Priority to CN201810461515.XA
Publication of CN108683560A
Application granted
Publication of CN108683560B
Legal status: Active


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • H04L43/045 Processing captured monitoring data, e.g. for logfile generation for graphical visualisation of monitoring data
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0852 Delays
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H04L43/0888 Throughput
    • H04L43/0894 Packet rate
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/12 Avoiding congestion; Recovering from congestion
    • H04L47/125 Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H04L47/27 Evaluation or update of window size, e.g. using information derived from acknowledged [ACK] packets

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Mining & Analysis (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present invention relates to a performance benchmark testing system and method for a big data stream processing framework. The system consists of four parts: a streaming workload generator, a streaming scenario and application builder, a performance data collection tool and a performance data analysis tool. By selecting applications that match the computational characteristics of the stream processing model and generating workloads that match its data characteristics, the invention tests the performance of a big data stream processing framework under typical scenarios and applications, collects runtime performance metrics such as backpressure, throughput, latency, system resources and per-node data, and finally diagnoses where the bottlenecks of the stream processing framework lie by analysing and aggregating the collected data.

Description

A performance benchmark testing system and method for a big data stream processing framework
Technical field
The present invention relates to a performance benchmark testing system and method for a big data stream processing framework, and more particularly to measuring framework performance at runtime under typical streaming scenarios and applications. It belongs to the field of software technology.
Background
With the arrival of the Internet era and the continuous development of technologies such as the mobile Internet, social networks and e-commerce, data volumes have grown explosively, and big data has become a focus of attention for the scientific and technological community, the business community and even government.
In general, data can be divided into bounded data and unbounded data. Bounded data, also called batch data, is a fixed, finite data set stored on a persistent medium whose volume does not change during computation. Typically, a batch big data processing framework (hereinafter, batch processing framework) receives a job submitted by a user, performs logical processing and analysis on the stored data set, and finally outputs a result; for example, it may mine a historical data set with machine learning algorithms to build a prediction model. Many mature batch processing frameworks, such as Hadoop and Spark, are now widely used.
However, with the rise and widespread use of sensing devices and social networks, the demand for real-time analysis of massive, high-speed data keeps growing. Such continuously generated, unbounded data is called unbounded data, or stream data. A survey of enterprise IT adoption by overseas consulting firms shows that 70% of enterprises need real-time processing of stream data (Liu X, Iftikhar N, Xie X. Survey of real-time processing systems for big data. International Database Engineering & Applications Symposium, ACM, New York, USA, 2014: 356-361). For example, Alibaba uses the Blink framework to update its product search engine in real time and to build an online machine learning platform; Meituan uses the Storm framework to analyse user behaviour and deliver near-real-time recommendation feedback; and Ooze row uses the Samza framework to monitor where orders originate and draw geographic heat maps for early warning.
Stream data, however, has characteristics that differ from batch data, and traditional batch processing frameworks cannot process it well; streaming big data processing frameworks (hereinafter, stream processing frameworks) emerged in response. Although stream processing frameworks are still developing, the growing importance of streaming scenarios has made them a focus of attention in both academia and industry. Today's mainstream stream processing frameworks include Storm and Flink.
As clusters grow larger, the probability of system performance problems rises accordingly, and nodes may suffer failures or resource shortages at unpredictable times or on unpredictable data (Sun Dawei, Zhang Guangyan, Zheng Weimin. Big data stream computing: key technologies and system examples [J]. Journal of Software, 2014, (04): 839-862). In streaming scenarios, excessive load or unreasonable parameter configuration may reduce system throughput and increase latency; when a node's processing rate falls below its input rate, backpressure may occur; and unbalanced data distribution may create a single-point resource bottleneck. Stream processing has strict real-time requirements and users have little tolerance for delay, so stable system performance is especially important. However, current solutions to performance problems in big data systems are usually applied only after a problem has occurred. If scenarios and applications that may trigger performance problems could be built in advance and tested on the actual production cluster, resource or configuration problems in the system could be discovered early, reducing losses during actual operation.
Stream processing frameworks have only begun to develop in recent years, so the industry has no mature, unified standard for benchmarking their performance, and existing benchmarks are few. Yahoo Streaming Benchmarks (https://github.com/yahoo/streaming-benchmarks) is a stream processing framework benchmark designed by Yahoo. It generates data through Kafka, runs it on the stream processing framework under test, and interacts with an external Redis database. However, it provides only a single test application built around a filter operation, so its overall completeness is low. StreamBench (Lu R, Wu G, Xie B, et al. StreamBench: Towards benchmarking modern distributed stream computing frameworks [C]//Utility and Cloud Computing (UCC), 2014 IEEE/ACM 7th International Conference on. IEEE, 2014: 69-78) is another benchmark developed for stream processing frameworks. It contains an application set of 7 filtering or statistical operations for testing the latency, throughput and fault tolerance of stream processing frameworks, but its test applications do not support complex features such as windows, nor do they support dynamically changing data sources. Some research papers have also touched on this area. To compare the performance of the Spark Streaming, Storm and Flink stream processing frameworks, Chintapalli S et al. (Chintapalli S, Dagit D, Evans B, et al. Benchmarking streaming computation engines: Storm, Flink and Spark Streaming [C]//Parallel and Distributed Processing Symposium Workshops, 2016 IEEE International. IEEE, 2016: 1789-1792) designed a simple application that ingests data from Kafka and performs filter, join and aggregation operations; Karimov J et al. (Karimov J, Rabl T, Katsifodimos A, et al. Benchmarking Distributed Stream Data Processing Systems [J]) built two test applications, windowed aggregation and windowed join, to compare stream processing frameworks. However, all of these works suffer from problems such as test application sets with too little coverage.
In summary, existing performance benchmarks for stream processing frameworks fall short in three respects: test applications, streaming data sources and performance metrics. First, they do not support dynamically changing data sources; second, their applications are too simple and cover few of the characteristics of stream processing; third, most of their metrics consider only latency and throughput and ignore others such as backpressure.
Summary of the invention
Technical problem solved by the present invention: to overcome the deficiencies of the prior art by providing a performance benchmark testing system and method for a big data stream processing framework that, targeting the characteristics of the stream processing model, builds data sources covering those characteristics, tests framework performance under representative streaming scenarios, and analyses and diagnoses where the bottlenecks of a stream processing framework lie.
Technical solution of the present invention: a performance benchmark testing system for a big data stream processing framework, comprising four modules: a streaming workload generator, a streaming scenario and application builder, a performance data collection tool and a performance data analysis tool.
The streaming workload generator is responsible for generating workloads that match the data characteristics of the stream processing model. Unlike traditional batch data generation, it covers two design tasks: the flow rate pattern and the data set attributes. The flow rate pattern describes how the framework's input rate changes over time; the rate may be a constant, may vary, or may follow some function. The data set attributes describe the characteristics of the data flowing in each second, including dimensionality, degree of disorder (out-of-orderness) and skewness. Combining a flow rate pattern with data set attributes produces a streaming workload.
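The four flow rate patterns the embodiment defines later (fixed, random, sudden-change and exponential) can each be sketched as a function mapping second `t` to a target number of records to emit in that second. This is a minimal Python illustration, not the patent's generator: the function names, parameters and the choice to restart the exponential curve every `period` seconds are assumptions.

```python
import random

# Hypothetical rate-pattern functions: each maps second t to a target
# number of records to emit during that second.

def fixed_rate(t: int, base: float) -> float:
    """Fixed rate: the same volume every second."""
    return base

def random_rate(t: int, low: float, high: float, rng: random.Random) -> float:
    """Random rate: volume drawn uniformly from [low, high] each second."""
    return rng.uniform(low, high)

def burst_rate(t: int, base: float, peak: float, start: int, length: int) -> float:
    """Sudden change: a stable low value that jumps to a peak for a short interval."""
    return peak if start <= t < start + length else base

def exponential_rate(t: int, base: float, growth: float, period: int) -> float:
    """Exponential rate: base * growth**k, where k restarts at 0 every `period` seconds."""
    return base * growth ** (t % period)

def schedule(rate_fn, seconds: int, **params) -> list[int]:
    """Per-second emission targets for the first `seconds` seconds of a run."""
    return [int(rate_fn(t, **params)) for t in range(seconds)]

if __name__ == "__main__":
    print(schedule(burst_rate, 6, base=100, peak=5000, start=2, length=2))
    # [100, 100, 5000, 5000, 100, 100]
```

A generator built on this would emit `schedule(...)[t]` records during second `t`, spreading them across the second to avoid micro-bursts.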
The streaming scenario and application builder is responsible for building scenarios and applications that cover the computational characteristics of stream data. The scenarios and applications of this test system come mainly from two sources: stream data processing scenarios frequently encountered in practice, and test cases provided by existing stream processing framework benchmarks. The builder pays special attention to the window mechanism in stream processing, and performs controlled-variable parameter tests by changing the values of the different parameters that affect windows.
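Controlled-variable testing, as described here and in step 2) of the method, amounts to generating one run configuration per value of the varied parameter while copying every other setting from a fixed baseline. A minimal Python sketch; the parameter names and baseline values are hypothetical placeholders, not values taken from the patent.

```python
from copy import deepcopy

# Hypothetical baseline configuration; names and values are illustrative only.
BASELINE = {
    "rate_per_s": 10_000,
    "skewness": 0,
    "disorder": 0.0,
    "window_type": "tumbling",
    "parallelism": 4,
    "window_size_s": 10,
}

def sweep(baseline: dict, param: str, values: list) -> list[dict]:
    """One configuration per value of `param`; all other settings stay at the baseline."""
    if param not in baseline:
        raise KeyError(f"unknown parameter: {param}")
    runs = []
    for value in values:
        cfg = deepcopy(baseline)  # never mutate the shared baseline
        cfg[param] = value
        runs.append(cfg)
    return runs
```

For example, `sweep(BASELINE, "skewness", [0, 1, 4])` yields three runs that differ only in skewness, mirroring the three skewness levels used in the Fig. 3 experiment.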
The performance data collection tool is responsible for collecting each performance metric of the test application during the test. Besides throughput, latency, backpressure and system resources, these metrics include finer-grained node information, such as the processing rate, volume of data processed and buffer pool usage of each node of the stream processing framework under test.
The performance data analysis tool is responsible for processing the collected data. Following a top-down statistical analysis method, it visualizes the data as charts showing how parameter changes affect metrics such as throughput, latency, backpressure and system performance, and analyses the causes to diagnose the bottlenecks of the stream processing framework.
The invention further relates to a performance benchmark testing method for a big data stream processing framework, implemented in the following steps:
1) Deploy the test cluster and determine the configuration parameters of the stream processing framework under test, including memory, CPU, number of slaves and maximum cluster parallelism.
2) Choose the streaming scenario and application to be tested, set the flow rate pattern and data set attributes, and configure the required parameters. Test parameters fall into two broad classes: data source parameters, including flow rate, data skewness and degree of disorder; and window module parameters, including window type, parallelism and window size. The tester then chooses one parameter to vary, keeps all other environment settings unchanged, and runs multiple groups of tests on the cluster.
3) Run the test script of the corresponding application and start the performance data collection tool at the same time.
4) Wait for the test to complete. During the test, the performance data collection tool periodically queries the cluster for data and, after the test, persists the data to disk.
5) When the test application has completed on the cluster, analyse the performance data collected during the run, compare how changes in the varied parameter affect throughput, latency, backpressure and system resources, and locate the bottleneck through top-down, layer-by-layer analysis.
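One way to mechanize the top-down localization in step 5) is a common backpressure heuristic that the patent does not spell out: walking the pipeline from source to sink, a task that reports HIGH backpressure is being slowed by its downstream, so the first downstream task that is not itself backpressured is the likely bottleneck. A minimal Python sketch, assuming a linear pipeline and the OK/LOW/HIGH levels used in the experiments; the function name is hypothetical.

```python
from typing import Optional

def locate_bottleneck(pipeline: list[tuple[str, str]]) -> Optional[str]:
    """pipeline: (operator name, backpressure level) pairs ordered source -> sink."""
    for (_, up_level), (down_name, down_level) in zip(pipeline, pipeline[1:]):
        # An upstream task under HIGH backpressure that feeds a task which is
        # not backpressured points at that downstream task as the slow one.
        if up_level == "HIGH" and down_level != "HIGH":
            return down_name
    return None  # no HIGH-to-non-HIGH transition found
```

Real topologies are DAGs rather than chains, so a fuller version would apply the same test along every edge.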
Compared with existing big data processing framework benchmarks, the present invention has the following advantages:
(1) It generates dynamically changing streaming workloads that match real conditions. For the flow rate pattern, four different rate generation patterns are designed; for data set attributes, the generation of out-of-order and skewed data sets is supported.
(2) It builds scenarios and applications that cover the characteristics of the stream processing model, in particular the window mechanism of stream data processing, taking both window types and the parameters that affect windows into account.
(3) Besides the common throughput and latency metrics, it collects and analyses the backpressure metric and node data specific to stream processing frameworks, and also takes the cluster's system resources into account.
Description of the drawings
Fig. 1 is an architecture diagram of the performance benchmark testing system of the present invention;
Fig. 2 is a flow chart of window selection in the present invention;
Fig. 3 shows performance comparison results of the ProductStatis application under different skewness levels; the left panel compares backpressure under different skewness levels, and the right panel shows the actual output rate of the data source under different skewness levels;
Fig. 4 illustrates how skewness affects data distribution;
Fig. 5 shows performance comparison results of the TransactionJoin application under different window sizes; the left panel compares backpressure and the right panel compares latency under different window sizes;
Fig. 6 illustrates how window size affects the amount of computation; the left panel shows the computation under a small window and the right panel under a large window;
Fig. 7 compares the performance of the ProductStatis and TransactionJoin applications under different nominal rates. The top-left panel compares backpressure of ProductStatis under different nominal rates, the top-right panel compares its actual output rate, the bottom-left panel compares backpressure of TransactionJoin under different nominal rates, and the bottom-right panel compares its actual output rate.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and specific embodiments.
The present invention proposes a performance benchmark testing system and method for a big data stream processing framework. Its core idea is to build scenarios and applications that cover the characteristics of the stream processing model, test representative streaming scenarios by combining varying parameters, collect the performance metrics of the stream processing framework at runtime, and finally visualize them as charts to discover and analyse the framework's performance problems and their causes.
The characteristics unique to the stream processing model are divided here into data characteristics and computational characteristics. Data characteristics refer mainly to properties of the stream data to be processed by the framework, and comprise the following five:
1) Real-time: stream data is generated and arrives in real time and must be processed in real time.
2) Temporal ordering: stream data has a time-based order. However, there may be more than one data source and the arrival order of tuples in a stream is independent, so events may arrive out of time order during processing.
3) Unboundedness: stream data consists of tuples that are generated continuously and without end.
4) Dynamic variability: the rate and distribution of stream data depend on the actual production environment at each moment and cannot be predicted in advance.
5) Hard to reproduce: once processed, data cannot be retrieved again unless it was deliberately saved, or retrieving it again is very costly.
In addition, the computational characteristics of the stream processing model can be divided into the following four:
1) Real-time computation: in a stream processing framework, records emitted one by one from the data source flow between operators and produce results after computation and analysis. Because stream data is generated, arrives and must be processed in real time, the framework faces a strict low-latency requirement.
2) Ordered computation: stream data is closely tied to time, but the arrival order is unpredictable, and causes such as network delay inevitably produce out-of-order streams. Operators therefore need a time-handling strategy when they receive data so that out-of-order data can be computed in order; the watermark mechanism (Watermark) is the common out-of-order handling strategy in stream processing frameworks.
3) Bounded computation: stream data is produced continuously, which imposes new requirements on the semantics of operations such as aggregation and join that are originally defined over bounded data sets; because the data is unbounded, processing cannot wait until all data has been received. Stream processing therefore generally uses a window mechanism: according to some rule, bounded sets are carved out of the unbounded stream, and the aggregation or join is then performed on each bounded set. The window mechanism is also a major feature distinguishing stream processing from batch processing.
4) Highly reliable execution: batch processing computes over persisted offline data, so if a problem occurs during execution, the job can simply be rerun. Recomputing stream data, however, is very costly, so after a failure stream processing must restore the cluster to its state at some recent point in time and avoid recomputation. The failure recovery semantics of stream processing frameworks are divided into three guarantees: at most once, at least once and exactly once; different stream processing frameworks provide different guarantees.
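The watermark strategy named in item 2) can be illustrated with a bounded-out-of-orderness watermark: the watermark trails the largest event time seen so far by a fixed allowed delay, and an event whose timestamp is at or below the current watermark is treated as late. This is a minimal Python sketch under that assumption; it follows the general idea of Flink's bounded-out-of-orderness strategy (which also subtracts 1 ms) but is not Flink code, and the class and function names are hypothetical.

```python
from typing import Optional

class BoundedOutOfOrderWatermark:
    """Watermark = max event time seen - allowed delay - 1 (milliseconds)."""

    def __init__(self, max_delay_ms: int) -> None:
        self.max_delay_ms = max_delay_ms
        self.max_ts: Optional[int] = None

    def observe(self, ts: int) -> None:
        if self.max_ts is None or ts > self.max_ts:
            self.max_ts = ts

    def current(self) -> Optional[int]:
        if self.max_ts is None:
            return None  # no event seen yet, so no watermark
        return self.max_ts - self.max_delay_ms - 1

def split_late(timestamps: list[int], max_delay_ms: int) -> tuple[list[int], list[int]]:
    """Classify events (in arrival order) as on-time or late against the running watermark."""
    wm = BoundedOutOfOrderWatermark(max_delay_ms)
    on_time, late = [], []
    for ts in timestamps:
        w = wm.current()
        (late if w is not None and ts <= w else on_time).append(ts)
        wm.observe(ts)
    return on_time, late
```

With a 1000 ms allowance, an event stamped 2000 ms that arrives after one stamped 3000 ms is still on time, while an event stamped 1500 ms that arrives after one stamped 5000 ms is late.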
At present, most stream processing applications run on distributed stream processing frameworks; mainstream ones in industry include Storm, Spark Streaming and Flink. Flink is an open-source distributed stream processing framework that uses a continuous-streaming processing model, offers high throughput and low latency, and supports exactly-once failure recovery semantics. Compared with other stream processing frameworks, Flink has a complete set of mechanisms designed for the characteristics of the stream processing model, and treats batch processing as a special case of stream processing, unifying stream and batch processing at the lowest layer of the framework. Flink is therefore chosen here as the framework under test.
The application architecture used in this embodiment is shown in Fig. 1. The upper half is the logical graph of the application under test when it runs in the Flink stream processing framework; the lower half is the module architecture of the system of the present invention (the stream processing framework benchmark testing system), which consists of four modules: the streaming workload generator, the scenario and application test set, the performance data collection tool and the performance data analysis tool.
(1) Streaming workload generator
It is responsible for generating streaming workloads that match the data characteristics of the stream processing model. Based on these data characteristics, the streaming workload generator covers two aspects: flow rate patterns and data set attributes.
For flow rate, the present invention designs four different patterns: fixed rate, where the data source generates a constant amount of data per second; random rate, where the amount generated per second is drawn at random from a range; sudden-change rate, where the data source jumps within a short time from a stable low value to a peak rate; and exponential rate, where the amount generated per second grows according to an exponential function and repeats cyclically within a period. Three data set attributes are also specified: data dimensionality, where a higher dimensionality means larger records and thus more network traffic; data skewness, which for Key/Value data describes how skewed the key distribution is, a higher skewness meaning a more uneven key distribution; and degree of disorder, which describes how far the input stream is out of time order, a higher degree meaning that some records in the input stream are more severely out of order.
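The skewness and disorder attributes can be sketched as follows. The patent does not say how skewness is parameterized, so this Python sketch assumes a Zipf-like key distribution whose exponent `s` plays that role (s = 0 is uniform; larger s concentrates records on a few keys), and models disorder by delaying the arrival of a chosen fraction of events by a random amount. All names are hypothetical.

```python
import random

def skewed_keys(n: int, num_keys: int, s: float, rng: random.Random) -> list[int]:
    """Draw n keys from 0..num_keys-1 with P(key k) proportional to 1 / (k + 1) ** s."""
    weights = [1.0 / (k + 1) ** s for k in range(num_keys)]
    return rng.choices(range(num_keys), weights=weights, k=n)

def out_of_order(n: int, fraction: float, max_delay_ms: int, rng: random.Random) -> list[int]:
    """Event i carries event time i ms; roughly `fraction` of events arrive
    1..max_delay_ms ms late. Returns event times in arrival order."""
    arrivals = []
    for ts in range(n):
        delay = rng.randint(1, max_delay_ms) if rng.random() < fraction else 0
        arrivals.append((ts + delay, ts))
    arrivals.sort()  # order by arrival time; ties broken by event time
    return [ts for _, ts in arrivals]
```

With `s = 4` and 8 keys, key 0 receives roughly 92% of the records, which is the kind of concentration that produces the hot window instance discussed for Fig. 4.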
(2) Streaming scenario and application builder
This module contains the test set of the performance benchmark testing system. On the one hand, the test set must build comprehensive scenarios and applications that cover the computational characteristics of the stream processing model; on the other hand, the processing logic of the applications must be practical in real production. Based on these two considerations, the test set comprises five typical stream processing scenarios and eight test applications, as shown in Table 1.
Table 1 Scenarios and typical applications
Table 2 compares the feature coverage of the application set in this performance benchmark testing system with that of other benchmarks and related research work. This system covers more than the other benchmarks in two respects: window types and operator operations.
Table 2 Feature coverage of the application sets of each benchmark
Among these, the window processing mechanism is a major feature of stream processing frameworks, and its parameters are the most complex. Many parameters affect windows, such as window size, sliding step, operator parallelism and the trigger function; some are independent of each other and some are interdependent. In the implementation, the benchmark testing system connects to different windows through different parameter configurations; the window selection flow is shown in Fig. 2.
1) First determine whether the window is keyed. For a keyed window, the data stream must be partitioned with the keyBy function before entering the window operator; for an ordinary non-keyed window, the parallelism of the window operator can only be set to 1, so it cannot compute in parallel.
2) Then determine how the window is driven: count-driven or time-driven. A count window triggers computation based on the number of records in the window, while a time window triggers computation based on a configured time. Time windows further distinguish between event time (Event-time) and processing time (Processing-time); when using event time, a watermark operator and a maximum allowed out-of-orderness parameter must also be set to handle out-of-order data.
3) Next determine how the window moves: tumbling, sliding or session windows. A tumbling window needs only a window size; a sliding window needs a window size and a sliding step; a session window exists only as a time-driven window and needs an inactivity gap.
4) Finally attach the window calculation function. This function is contained in the application logic and is not passed as a parameter.
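The window types in steps 2) and 3) can be illustrated by how each assigns elements to windows. This Python sketch assumes integer timestamps and epoch-aligned time windows (the convention Flink's built-in time-window assigners use when no offset is set); the function names are hypothetical, and the count window shown fires only on full windows.

```python
def tumbling_window(ts: int, size: int) -> tuple[int, int]:
    """The single window [start, start + size) containing ts."""
    start = ts - ts % size
    return (start, start + size)

def sliding_windows(ts: int, size: int, slide: int) -> list[tuple[int, int]]:
    """Every window [start, start + size), with start a multiple of slide, that contains ts."""
    windows = []
    start = ts - ts % slide
    while start > ts - size:
        windows.append((start, start + size))
        start -= slide
    return windows

def session_windows(timestamps: list[int], gap: int) -> list[tuple[int, int]]:
    """Group events so that a pause of at least `gap` closes the current session."""
    sessions: list[list[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            sessions[-1][1] = ts + gap  # extend the open session
        else:
            sessions.append([ts, ts + gap])  # start a new session
    return [(s, e) for s, e in sessions]

def count_windows(records: list, n: int) -> list[list]:
    """Tumbling count windows: fire every n records; a partial tail does not fire."""
    return [records[i:i + n] for i in range(0, len(records) - n + 1, n)]
```

A sliding window with size 10 and slide 5 places each element in two windows, which is why sliding windows multiply per-element work relative to tumbling windows of the same size.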
(3) Performance data collection tool
It is responsible for collecting each performance metric of the test application. As shown in Table 3, these metrics include throughput, latency, backpressure, system resources and finer-grained runtime information such as each node's rate, volume of data processed and buffer pool usage. The present invention collects data with a Profiler sampling tool, which periodically queries the cluster's Master node to obtain the runtime data of the test application at that moment. After the test, the Profiler persists all data from that run to disk, ordered by collection time and stored in JSON format.
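A sampler of this kind can be sketched as a loop that polls a fetch function at a fixed interval, records each snapshot with its offset from the start of the run, and serializes the run to JSON in time order. The patent does not describe the Profiler's interface: in this Python sketch `fetch` is a hypothetical callable that, for a Flink cluster, might wrap requests to the JobManager's REST API, and the clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import json
import time
from typing import Callable

def sample(fetch: Callable[[], dict], interval_s: float, duration_s: float,
           clock: Callable[[], float] = time.monotonic,
           sleep: Callable[[float], None] = time.sleep) -> list[dict]:
    """Call `fetch` every interval_s seconds for duration_s seconds."""
    samples = []
    start = clock()
    while clock() - start < duration_s:
        samples.append({"t": clock() - start, "metrics": fetch()})
        sleep(interval_s)
    return samples

def to_json(samples: list[dict]) -> str:
    """Serialize one test run, ordered by collection time."""
    return json.dumps(sorted(samples, key=lambda s: s["t"]), indent=2)

def persist(samples: list[dict], path: str) -> None:
    """Write one run to disk once the test has finished."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_json(samples))
```

Persisting only after the run, as the text describes, keeps disk I/O out of the measurement window.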
Table 3 Performance metrics
(4) Performance data analysis tool
It is responsible for statistical analysis of the collected performance data, in two stages. In the first stage, it aggregates and analyses the persisted JSON data, combines and compares the results of multiple test runs, and converts them into intermediate results in CSV format. In the second stage, it converts the CSV data into visual charts: from the intermediate results stored in the CSV files it draws backpressure, latency and throughput charts that show how parameter changes affect performance and help diagnose where the bottlenecks of the stream processing framework lie.
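The first analysis stage could reduce each persisted run to one summary row and write the rows for several runs to CSV for side-by-side comparison. In this Python sketch the metric field names (`throughput`, `latency_ms`, `backpressure`) and the chosen statistics are hypothetical; the patent lists the metrics in Table 3 but not their schema.

```python
import csv
import io
import statistics

def summarize(run: str, samples: list[dict]) -> dict:
    """Collapse one run's samples into a single row of summary statistics."""
    metrics = [s["metrics"] for s in samples]
    return {
        "run": run,
        "mean_throughput": statistics.mean(m["throughput"] for m in metrics),
        "max_latency_ms": max(m["latency_ms"] for m in metrics),
        "high_backpressure_ratio": sum(m["backpressure"] == "HIGH" for m in metrics) / len(metrics),
    }

def to_csv(rows: list[dict]) -> str:
    """Intermediate CSV with one row per run, ready for charting."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Reporting the maximum latency matches the Fig. 5 description, which plots the maximum latency observed at the output end.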
Taking Apache Flink as the framework under test, the eight applications of the five scenarios built above were run on a cluster to discover and summarize the performance problems Flink exhibits in a variety of representative stream processing scenarios.
(1) Effect of skewness on performance
Fig. 3 shows performance comparison results of the ProductStatis application under different skewness levels: the left panel compares backpressure and the right panel shows the actual output rate of the data source. The data skewness takes the values {0, 1, 4}, meaning no skew, low skew and high skew respectively. The left panel shows that, with all other configuration parameters equal, HIGH-level backpressure appears under high skew, whereas backpressure stays at LOW under no skew and low skew. The right panel shows that in the no-skew and low-skew tests the Source end outputs similar volumes of data, at about 4000K records/second, whereas in the high-skew test the Source end produces less data, under 2000K records/second.
Analysis leads to the following conclusion. As shown in Fig. 4, window operator processing is key-dependent: records with the same key are assigned to the same stream for computation. In the figure, records with Key=1 and Key=3 are assigned to window node (1/2), and those with Key=2 and Key=4 to window node (2/2). In the no-skew and low-skew tests, each window node receives a similar volume of data, or the gap is small; performance is then governed mainly by the data source rate, and the backpressure level is LOW. In the high-skew test, one or a few keys account for a very large share of the data set, so one window node receives a very large volume of data and its window computation time increases. That node's processing rate then falls below its input rate; records it cannot process in time pile up in its input buffer, and once the input buffer is full, blocking and HIGH-level backpressure occur. To slow or stop buffer growth, the system feeds the congestion back to the upstream node, reducing the upstream sending volume and slowing its processing; that node in turn feeds back to its own upstream, and so on, level by level, until the Source end's output volume drops.
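This mechanism can be reproduced numerically: route each record to a parallel window instance by key, then compare instance loads. This Python sketch simplifies routing to `key % parallelism`, which matches the assignment drawn in Fig. 4 up to instance numbering (Flink itself hashes keys into key groups), and assumes every instance has the same hypothetical processing capacity; under those assumptions the busiest instance caps the rate the source can sustain.

```python
from collections import Counter

def instance_loads(keys: list[int], parallelism: int) -> list[int]:
    """Records received by each window instance under simplified key % parallelism routing."""
    counts = Counter(k % parallelism for k in keys)
    return [counts.get(i, 0) for i in range(parallelism)]

def max_source_rate(loads: list[int], capacity_per_instance: float) -> float:
    """Highest source rate (records/s) before the busiest instance saturates,
    assuming each instance processes at most capacity_per_instance records/s."""
    return capacity_per_instance * sum(loads) / max(loads)
```

With balanced keys, two instances sustain twice one instance's capacity; if one instance receives 80% of the records, the source is throttled to 1.25 times one instance's capacity. This is qualitatively the same effect as in Fig. 3, not a model of its measured numbers.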
(2) Influence of window size on performance
Fig. 5 shows the performance comparison of the TransactionJoin application under different window sizes: the left panel compares back-pressure and the right panel compares delay. The left panel shows that the back-pressure grade rises as the window size of the window operator increases. The right panel plots the maximum delay observed at the output during test execution. Because there are two data sources, the delay also has two sources; comparing them shows little difference between the delays of the two sources. However, the delay observed at the output increases as the window size of the window operator increases.
Analysis leads to the following conclusion. As shown in Fig. 6, the larger the window, the more data it contains. The computation triggered by the window operator is a complex join whose execution time grows with the square of the data volume, so while the computation is running, the processing rate of the window node falls below the input rate. Data accumulates in the input buffer, and back-pressure appears once that buffer is full. After the back-pressure propagates to the data sources, the data input of the whole system decreases; the long computation also increases the time data waits in the node, so the delay increases accordingly.
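The quadratic growth can be illustrated with a naive nested-loop window join, which compares every record in one source's window with every record in the other's. This is an assumption about how the TransactionJoin computation behaves (the document only states that the cost grows with the square of the data volume; a hash join would scale differently), and the record layout and function names are hypothetical. With a per-source rate r and a window length W, each window holds n = r·W records and the join performs n² comparisons, so doubling the window size quadruples the work.

```python
from itertools import product

def make_window(rate_per_s, window_s, n_keys=50):
    """Hypothetical window contents: rate_per_s * window_s records of (key, seq)."""
    n = rate_per_s * window_s
    return [(i % n_keys, i) for i in range(n)]

def window_join(left, right):
    """Naive nested-loop equi-join on the first field of each record.
    Returns the joined pairs and the number of comparisons performed."""
    joined = []
    comparisons = 0
    for l, r in product(left, right):
        comparisons += 1
        if l[0] == r[0]:
            joined.append((l, r))
    return joined, comparisons

for window_s in (1, 2, 4):
    left = make_window(100, window_s)
    right = make_window(100, window_s)
    joined, comparisons = window_join(left, right)
    print(f"window={window_s}s records/source={len(left)} "
          f"comparisons={comparisons} joined={len(joined)}")
```

At 100 records/s per source this performs 10,000, 40,000 and 160,000 comparisons for 1, 2 and 4 second windows, which is why the window node's processing rate falls further behind its input rate as the window grows.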
(3) Influence of rate on performance
Fig. 7 shows the performance comparison of the ProductStatis and TransactionJoin applications under different nominal rates. The upper-left panel shows back-pressure for ProductStatis under different nominal rates, the upper-right panel the actual output rate for ProductStatis, the lower-left panel back-pressure for TransactionJoin, and the lower-right panel the actual output rate for TransactionJoin. In the ProductStatis results, the back-pressure changes from OK to LOW as the nominal rate increases. The actual data source rate over time meets expectations at nominal rates of 10k/s and 160k/s, but falls short at 640k/s and 1000k/s, where the two actual rates are close to each other. This indicates that the growth of the system rate has hit a bottleneck; since no HIGH back-pressure occurs in these runs, the cause of the rate bottleneck is judged to be network bandwidth. In the TransactionJoin results, HIGH back-pressure occurs at every nominal rate except 1k/s, where the back-pressure grade is OK. Likewise, the actual output rate of the data source reaches the nominal rate only at 1k/s. Comparing the two applications shows that the complexity of the computation logic affects the system's back-pressure at a given nominal rate.
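The bottleneck reasoning above (an actual rate below the nominal rate without HIGH back-pressure points away from the operators, whereas HIGH back-pressure points at a slow operator) can be written as a small rule. This is a hypothetical helper, not part of the described tool: the 0.10 and 0.5 thresholds are an assumption modeled on the OK/LOW/HIGH levels of Apache Flink's web UI back-pressure monitor, the 90% tolerance is arbitrary, and the example rates are made-up inputs rather than measurements from Fig. 7.

```python
from enum import Enum

class BackPressure(Enum):
    OK = "OK"
    LOW = "LOW"
    HIGH = "HIGH"

def grade(blocked_ratio):
    """Map the fraction of samples in which a task was blocked by back-pressure
    to a grade. Thresholds are assumed, see the lead-in."""
    if blocked_ratio <= 0.10:
        return BackPressure.OK
    if blocked_ratio <= 0.5:
        return BackPressure.LOW
    return BackPressure.HIGH

def diagnose(nominal_rate, actual_rate, backpressure, tolerance=0.9):
    """Hypothetical rule of thumb for locating a source-rate bottleneck."""
    if actual_rate >= tolerance * nominal_rate:
        return "rate met"
    if backpressure is BackPressure.HIGH:
        # Operators cannot keep up; congestion propagates back to the source.
        return "operator bottleneck"
    # Rate is capped although operators are not saturated, e.g. network bandwidth.
    return "external bottleneck"

print(diagnose(160_000, 159_000, grade(0.05)))   # rate met
print(diagnose(1_000_000, 600_000, grade(0.3)))  # external bottleneck
print(diagnose(10_000, 4_000, grade(0.9)))       # operator bottleneck
```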
Although the ProductStatis application does not reach the higher nominal rates, its rate remains stable throughout. The actual rate of the TransactionJoin application, by contrast, fluctuates considerably over time: it rises for a period, drops rapidly, and then rises again in a repeating cycle. The cause of this fluctuation is as follows. For applications with complex window logic such as TransactionJoin, the input side keeps receiving data while the window has not yet triggered its computation, and no back-pressure occurs. When the window computes, the computation takes so long that processing falls behind reception; buffered data accumulates and produces high back-pressure, which propagates back and reduces the actual output rate of the data source at that time. After the window computation finishes, data reception into the buffer resumes, the back-pressure grade drops and the rate rises. When the window triggers again, buffered data starts to accumulate again, so back-pressure and rate fluctuate in opposite directions. For a simple computation such as ProductStatis, the window computation takes little time and the buffer does not fill during it, so the system can remain roughly in equilibrium simply by reducing the input rate.
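The inverse oscillation of back-pressure and source rate can be reproduced with a discrete-time toy model. It is a sketch under stated assumptions, not the framework's actual flow-control mechanism: one source feeds one window operator through a bounded input buffer; the source emits at most its nominal rate per tick but never more than the free buffer space; and during the last `compute_ticks` ticks of every window period the operator is busy evaluating the fired window and drains nothing. Buffer occupancy is used as a hypothetical stand-in for the back-pressure grade, and all parameter values are illustrative.

```python
def simulate(ticks, nominal_rate, buffer_cap, service_rate,
             window_period, compute_ticks):
    """Return a list of (tick, records_emitted_by_source, buffer_occupancy)."""
    buffer = 0
    trace = []
    for t in range(ticks):
        # The source can emit only as much as the downstream buffer can accept.
        emitted = min(nominal_rate, buffer_cap - buffer)
        buffer += emitted
        # The last `compute_ticks` ticks of each period are spent computing
        # the fired window; during that time the operator drains nothing.
        computing = t % window_period >= window_period - compute_ticks
        drained = 0 if computing else min(buffer, service_rate)
        buffer -= drained
        trace.append((t, emitted, buffer / buffer_cap))
    return trace

def occupancy_grade(occupancy):
    """Hypothetical mapping from buffer occupancy to OK/LOW/HIGH."""
    if occupancy <= 0.10:
        return "OK"
    if occupancy <= 0.5:
        return "LOW"
    return "HIGH"

# Expensive window (TransactionJoin-like): 4 of every 10 ticks spent computing.
trace = simulate(ticks=20, nominal_rate=100, buffer_cap=300,
                 service_rate=150, window_period=10, compute_ticks=4)
for t, emitted, occ in trace:
    print(f"t={t:2d} source={emitted:3d} occupancy={occ:.2f} {occupancy_grade(occ)}")
```

With `compute_ticks=4` the source emits 100 records per tick while the window is idle but stalls at 0 once the buffer fills during the computation, and occupancy rises exactly when output falls, averaging 80 records per tick from the second period on. Setting `compute_ticks=1`, which mimics a cheap window like ProductStatis, keeps the buffer from filling, and the source emits its full 100 records on every tick.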
Specific embodiments and drawings of the present invention are disclosed for the purpose of illustration, to help in understanding and implementing the invention. Those skilled in the art will appreciate, however, that the corresponding methods and tools can also be implemented on other platforms without departing from the spirit and scope of the invention and the appended claims. The invention should therefore not be limited to the content disclosed in the embodiments and drawings.

Claims (7)

1. A performance benchmark test system for a big data stream processing framework, characterized by comprising: a streaming workload generator, a streaming scenario and application builder, a performance data collection tool, and a performance data analysis tool;
the streaming workload generator generates stream data having specified data parameters;
the streaming scenario and application builder builds specific scenarios and applications, and runs the applications under different parameters to test the performance of the framework in processing the stream data in those scenarios and applications;
the performance data collection tool collects performance indicators during the performance test;
the performance data analysis tool processes and analyzes the performance indicators collected by the performance data collection tool, so as to reflect the influence of parameter changes on the performance indicators and to diagnose bottlenecks of the framework in stream data processing.
2. The performance benchmark test system for a big data stream processing framework according to claim 1, characterized in that: the parameters include, but are not limited to, data parameters, application parameters, or system parameters.
3. The performance benchmark test system for a big data stream processing framework according to claim 1 or 2, characterized in that: the data parameters include, but are not limited to, streaming load characteristics; the streaming load characteristics include, but are not limited to, a flow rate model or data set attributes; the flow rate model includes, but is not limited to, a fixed rate, a random rate, a sudden-change rate, or an exponential rate; and the data set attributes include, but are not limited to, data dimension, data skewness, or the degree of data disorder.
4. The performance benchmark test system for a big data stream processing framework according to claim 2, characterized in that: the system parameters include, but are not limited to, memory, CPU, number of Slaves, or maximum cluster parallelism.
5. The performance benchmark test system for a big data stream processing framework according to claim 1, characterized in that: the performance indicators include, but are not limited to, throughput, delay, back-pressure, system resources, node processing rate, node data volume, or node buffer pool usage.
6. The performance benchmark test system for a big data stream processing framework according to claim 2, characterized in that: the application is a window scenario application, the application parameters are window parameters, and the window parameters include, but are not limited to, window type, parallelism, or window size.
7. A performance benchmark test method for a big data stream processing framework, characterized by comprising the following steps:
(1) deploying a test cluster and configuring the system parameters of the framework;
(2) selecting a scenario and an application; configuring data parameters, application parameters and test parameters; determining multiple values of a parameter to be tested while keeping the other parameters constant; and running the application on the cluster for multiple groups of tests;
(3) running the application to be tested, starting the performance sampling tool at the same time, and waiting for the test to complete;
(4) during the test, the performance sampling tool periodically accesses the test cluster to collect performance indicators, and persists the collected performance indicators to a storage device after the test;
(5) when or after the application has finished running on the test cluster, analyzing the performance data collected by the performance sampling tool and the influence of changes in the parameter to be tested on the performance indicators, and locating the bottlenecks of the framework through top-down, layer-by-layer analysis.
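The workload side of claims 3 and 7 can be sketched as follows: four rate functions corresponding to the fixed, random, sudden-change and exponential flow rate models, and a generator that varies one parameter while holding all others constant, as in step (2) of claim 7. The parameter names, the baseline configuration and the shape of each rate function are illustrative assumptions; the claims do not define them. The skewness values and nominal rates used with it are those of the experiments described above.

```python
import math
import random

# Flow rate models named in claim 3. Each returns a function rate(t) giving the
# target records/second at time t (seconds). The shapes are assumptions.
def fixed_rate(base):
    return lambda t: base

def random_rate(low, high, seed=0):
    rng = random.Random(seed)
    return lambda t: rng.uniform(low, high)

def sudden_change_rate(base, burst, start, end):
    # Jumps from `base` to `burst` during [start, end), then returns to `base`.
    return lambda t: burst if start <= t < end else base

def exponential_rate(base, growth):
    return lambda t: base * math.exp(growth * t)

def sweep(baseline, name, values):
    """Claim 7, step (2): one configuration per tested value of `name`,
    with every other parameter copied unchanged from `baseline`."""
    if name not in baseline:
        raise KeyError(name)
    for value in values:
        config = dict(baseline)
        config[name] = value
        yield config

# Hypothetical baseline; parameter names are not taken from the patent.
baseline = {
    "application": "ProductStatis",
    "rate_model": "fixed",
    "nominal_rate": 160_000,
    "skewness": 0,
    "parallelism": 2,
}

for config in sweep(baseline, "skewness", [0, 1, 4]):
    print(config)
```

Each generated configuration would correspond to one group of tests; sweeping `"nominal_rate"` over 10_000, 160_000, 640_000 and 1_000_000 instead would reproduce the structure of the rate experiment.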
CN201810461515.XA 2018-05-15 2018-05-15 Performance benchmark test system and method for large data stream processing framework Active CN108683560B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810461515.XA CN108683560B (en) 2018-05-15 2018-05-15 Performance benchmark test system and method for large data stream processing framework

Publications (2)

Publication Number Publication Date
CN108683560A (en) 2018-10-19
CN108683560B (en) 2021-03-30

Family

ID=63806177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810461515.XA Active CN108683560B (en) 2018-05-15 2018-05-15 Performance benchmark test system and method for large data stream processing framework

Country Status (1)

Country Link
CN (1) CN108683560B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109542985A (en) * 2018-11-27 2019-03-29 江苏擎天信息科技有限公司 A kind of general streaming Data Analysis Model and its construction method
CN110058977A (en) * 2019-01-14 2019-07-26 阿里巴巴集团控股有限公司 Monitor control index method for detecting abnormality, device and equipment based on Stream Processing
CN110069331A (en) * 2019-04-24 2019-07-30 北京百度网讯科技有限公司 A kind of data processing method, device and electronic equipment
CN110704998A (en) * 2019-06-25 2020-01-17 眸芯科技(上海)有限公司 Multimedia IP bandwidth performance verification method and device
CN110740079A (en) * 2019-10-16 2020-01-31 北京航空航天大学 full link benchmark test system for distributed scheduling system
CN110971483A (en) * 2019-11-08 2020-04-07 苏宁云计算有限公司 Pressure testing method and device and computer system
CN111049684A (en) * 2019-12-12 2020-04-21 闻泰通讯股份有限公司 Data analysis method, device, equipment and storage medium
CN111143143A (en) * 2019-12-26 2020-05-12 北京神州绿盟信息安全科技股份有限公司 Performance test method and device
CN111737097A (en) * 2020-06-05 2020-10-02 浪潮电子信息产业股份有限公司 Performance test method and related device of stream processing system
CN111930630A (en) * 2020-08-17 2020-11-13 电信科学技术第十研究所有限公司 Big data test case generation method and device based on data flow
CN112070235A (en) * 2020-09-08 2020-12-11 北京小米松果电子有限公司 Abnormity positioning method and device of deep learning framework and storage medium
CN113760989A (en) * 2021-02-04 2021-12-07 北京沃东天骏信息技术有限公司 Method, device and equipment for processing unbounded stream data and storage medium
CN115033457A (en) * 2022-06-22 2022-09-09 浙江大学 Multi-source data real-time acquisition method and system capable of monitoring and early warning

Cited By (22)

Publication number Priority date Publication date Assignee Title
CN109542985B (en) * 2018-11-27 2023-09-19 南京擎天科技有限公司 Universal stream data analysis model and construction method thereof
CN109542985A (en) * 2018-11-27 2019-03-29 江苏擎天信息科技有限公司 A kind of general streaming Data Analysis Model and its construction method
WO2020147480A1 (en) * 2019-01-14 2020-07-23 阿里巴巴集团控股有限公司 Stream processing-based monitoring index abnormality detection method, device and equipment
CN110058977A (en) * 2019-01-14 2019-07-26 阿里巴巴集团控股有限公司 Monitor control index method for detecting abnormality, device and equipment based on Stream Processing
CN110058977B (en) * 2019-01-14 2020-08-14 阿里巴巴集团控股有限公司 Monitoring index abnormity detection method, device and equipment based on stream processing
CN110069331A (en) * 2019-04-24 2019-07-30 北京百度网讯科技有限公司 A kind of data processing method, device and electronic equipment
CN110704998A (en) * 2019-06-25 2020-01-17 眸芯科技(上海)有限公司 Multimedia IP bandwidth performance verification method and device
CN110704998B (en) * 2019-06-25 2023-04-18 眸芯科技(上海)有限公司 Multimedia IP bandwidth performance verification method and device
CN110740079A (en) * 2019-10-16 2020-01-31 北京航空航天大学 full link benchmark test system for distributed scheduling system
CN110971483A (en) * 2019-11-08 2020-04-07 苏宁云计算有限公司 Pressure testing method and device and computer system
CN110971483B (en) * 2019-11-08 2021-11-09 苏宁云计算有限公司 Pressure testing method and device and computer system
CN111049684A (en) * 2019-12-12 2020-04-21 闻泰通讯股份有限公司 Data analysis method, device, equipment and storage medium
CN111143143A (en) * 2019-12-26 2020-05-12 北京神州绿盟信息安全科技股份有限公司 Performance test method and device
CN111143143B (en) * 2019-12-26 2024-02-23 绿盟科技集团股份有限公司 Performance test method and device
CN111737097A (en) * 2020-06-05 2020-10-02 浪潮电子信息产业股份有限公司 Performance test method and related device of stream processing system
CN111737097B (en) * 2020-06-05 2022-06-07 浪潮电子信息产业股份有限公司 Performance test method and related device of stream processing system
CN111930630B (en) * 2020-08-17 2024-01-05 电信科学技术第十研究所有限公司 Method and device for generating big data test case based on data stream
CN111930630A (en) * 2020-08-17 2020-11-13 电信科学技术第十研究所有限公司 Big data test case generation method and device based on data flow
CN112070235A (en) * 2020-09-08 2020-12-11 北京小米松果电子有限公司 Abnormity positioning method and device of deep learning framework and storage medium
CN113760989A (en) * 2021-02-04 2021-12-07 北京沃东天骏信息技术有限公司 Method, device and equipment for processing unbounded stream data and storage medium
CN115033457A (en) * 2022-06-22 2022-09-09 浙江大学 Multi-source data real-time acquisition method and system capable of monitoring and early warning
CN115033457B (en) * 2022-06-22 2023-08-25 浙江大学 Multi-source data real-time acquisition method and system capable of monitoring and early warning

Also Published As

Publication number Publication date
CN108683560B (en) 2021-03-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant