CN114036034A - Performance test method applied to real-time streaming computation - Google Patents



Publication number
CN114036034A
Authority
CN
China
Prior art keywords
real
test
time
data
time streaming
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111266696.9A
Other languages
Chinese (zh)
Inventor
薛鹏
于红建
易琴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shansong Technology Co ltd
Original Assignee
Beijing Shansong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shansong Technology Co ltd
Priority to CN202111266696.9A
Publication of CN114036034A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3414 Workload generation, e.g. scripts, playback

Abstract

The invention discloses a performance test method applied to real-time streaming computation. The method comprises preparing a performance test environment for real-time streaming computation, designing performance test scenarios, simulating real-time streaming data by writing producer classes for the various business scenarios, preparing test data, preparing a stress test script, carrying out the stress test process, and outputting a test report; together these form a comprehensive and complete performance test scheme for real-time streaming computation. Compared with the prior art, the method simulates the real-time streaming data of different scenarios through code, which gives greater flexibility and reusability, and completes the performance test of real-time streaming computation in combination with a stress testing tool.

Description

Performance test method applied to real-time streaming computation
Technical Field
This patent relates to a performance test method applied to real-time streaming computing, and belongs to the field of computer technology and internet testing.
Background
Big data has enriched people's lives, and people can fully enjoy the value mined from it. However, because big data is generated so quickly and in such endless volume, data is mainly processed in two ways, offline computation and real-time computation, whose technical characteristics are batch processing and stream processing respectively. The value of real-time streaming computing is that a business result can be obtained quickly after data is generated, so that the real-time value of the data can be analyzed and mined. For real-time computing, if data cannot be processed as a stream in real time and produce results immediately, its value decreases over time.
Real-time streaming computing has many application scenarios, such as real-time data warehouses, real-time risk control, real-time recommendation, and real-time machine learning. With the emergence of real-time computing frameworks such as Storm, Spark Streaming, and Flink, and the rise of Kafka and Elasticsearch (es), the technology of the real-time computing field has become increasingly mature, so real-time streaming computing plays an increasingly important role in finance, online education, e-commerce, advertising, and other industries.
The main characteristics of real-time streaming data are as follows: 1) real-time: the data handled by real-time computation has time-sensitive value, which requires the computation latency to be small; 2) randomness: the data flow is random in both time and volume, so the flow rate may be low during one period and high during another; 3) disorder: streaming data is a sequence of events over time, and strict global ordering is difficult or almost impossible to guarantee, but data can be ordered within a local time window. In the current mainstream real-time stream computing frameworks, a common approach is to assign received events to time windows according to their timestamps and, after waiting for a period of time, trigger a unified processing operation on the data in each window; 4) unboundedness: streaming data has no concept of "each time"; it is generated continuously and without end. The unbounded nature of real-time streaming data requires the system architecture to provide high availability and real-time processing capability. When the system's processing capacity cannot keep up with the rate at which the data stream is generated, more and more messages pile up waiting to be processed, and a system with limited storage will eventually crash once the backlog exceeds a threshold. To clear an existing backlog, the processing capacity must exceed the rate at which the data stream is generated; otherwise the backlog persists.
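The window-based ordering described in point 3 can be illustrated with a minimal sketch. It assumes fixed-size (tumbling) windows and integer timestamps in seconds; the function name `assign_to_windows` and the sample events are hypothetical and are not taken from the patent or from any particular framework.

```python
from collections import defaultdict


def assign_to_windows(events, window_seconds):
    """Group (timestamp, payload) events into fixed-size windows.

    The key of each window is the start time of the window containing the event.
    """
    windows = defaultdict(list)
    for timestamp, payload in events:
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start].append((timestamp, payload))
    # Arrival order is not globally sorted, so sort inside each window only.
    return {start: sorted(items) for start, items in sorted(windows.items())}


# Events arrive out of order; timestamps are seconds from an arbitrary origin.
arrived = [(12, "c"), (3, "a"), (7, "b"), (10, "d"), (21, "e")]
windows = assign_to_windows(arrived, window_seconds=10)
for start, items in windows.items():
    print(start, items)
```

A real framework would additionally wait for late events before firing a window; that waiting step is left out here to keep the sketch short.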
Given these characteristics of real-time streaming data, testers need to run performance tests on the real-time streaming computation program to obtain its performance capacity indicators, produce a performance test report, give developers a reference for tuning, and ultimately deliver a reliable and stable service that meets the requirements for going online.
Disclosure of Invention
The invention provides a performance test method applied to real-time streaming computing, which simulates data of different scenarios through code and completes the performance test with the help of a stress testing tool.
The sources of real-time streaming data vary widely. Log collection is common: for example, event tracking (buried point) data is collected through Filebeat, or data is collected through Canal from the database incremental log (binlog), and the collected data is then output to the Kafka distributed messaging system. The real-time computation program then consumes and processes the data in Kafka, and the processed data is stored in databases such as Redis, es, Kudu, Hive, and ClickHouse.
A performance test of real-time streaming computing requires simulating a large amount of data, and there are two ways to obtain it:
1. When the test environment is connected to the online environment, part or all of the online data is fed into the test environment. The advantage is that the data is real. The disadvantages are: first, data security problems; second, the performance results are tied to the intensity of the online traffic; third, for data such as event tracking (buried point) data, this approach makes the final data statistics difficult.
2. Data is simulated by writing code in a language such as Java or Python and sent from multiple threads. The advantages are flexibility and the ability to simulate data for all business scenarios; the disadvantages are that it demands a high level of coding skill from the tester, offers no visualization, and makes the load difficult to control.
The present invention provides a performance test method applied to real-time streaming computing that solves the above problems. Data of different scenarios is simulated through code, and the performance test is completed with the help of a stress testing tool. The advantages are: 1) high flexibility: a familiar programming language such as Java or Python can be used, and data of different scenarios can be simulated through code; 2) the stress testing tool is visual, user-friendly, and full-featured: it can configure thread groups, control the load, parameterize data, and assert on returned results, and it can even generate load from multiple machines, monitor server resources, and generate test reports. Examples are the open source tools Apache JMeter (for Java) and Locust (for Python).
Drawings
FIG. 1: Physical diagram of the stress test
FIG. 2: Schematic diagram of the test flow
Detailed Description
1. Preparing the test environment
Prepare the test environment for real-time streaming computing: prepare servers and deploy the performance test environment, including the stress testing tool, message middleware, the real-time streaming computation program, a cache cluster, es middleware, databases, and so on. A clean, dedicated stress test environment is preferred.
2. Test scenario design
Design the test scenarios, that is, the test cases, for real-time streaming computation. The following aspects are considered: 1) from the business perspective: for example, for indicators on a courier's real-time data dashboard such as completed orders, cancelled orders, income, positive reviews, and negative reviews, simulate flows such as place order -> grab order -> complete -> review and place order -> dispatch order -> transfer order -> complete, as well as order cancellation, and design scenarios such as single-transaction load and mixed stress tests; 2) from the test plan perspective: stress testing, reliability testing, stability testing, scalability testing, cluster effectiveness, and so on; 3) from the characteristics of streaming computation: single parallelism and multiple parallelism.
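As an illustration of the business-scenario design in point 1, the following sketch generates the message sequence for one order in each flow and mixes the flows by weight for a mixed test. The event names, JSON field names (`order_id`, `courier_id`, `event`, `ts`), the two-step cancellation flow, and the weights are all hypothetical; the patent does not specify a message format.

```python
import json
import random

# Hypothetical order lifecycles modelled on the flows named in the text.
SCENARIOS = {
    "grab": ["order_placed", "order_grabbed", "order_completed", "order_reviewed"],
    "dispatch": ["order_placed", "order_dispatched", "order_transferred", "order_completed"],
    "cancel": ["order_placed", "order_cancelled"],  # assumed shape of a cancellation
}


def build_events(scenario, order_id, courier_id, start_ts, step_seconds=60):
    """Return the JSON messages for one order following the given scenario."""
    events = []
    for step, event_type in enumerate(SCENARIOS[scenario]):
        events.append(json.dumps({
            "order_id": order_id,
            "courier_id": courier_id,
            "event": event_type,
            "ts": start_ts + step * step_seconds,
        }))
    return events


def build_mixed_load(order_count, weights, seed=0):
    """Pick a scenario per order according to the weights (mixed test)."""
    rng = random.Random(seed)
    names = list(weights)
    messages = []
    for i in range(order_count):
        scenario = rng.choices(names, weights=[weights[n] for n in names])[0]
        messages.extend(build_events(scenario, f"o{i}", f"c{i % 5}", 1_700_000_000 + i))
    return messages


cancel_events = build_events("cancel", "o1", "c1", start_ts=100)
mixed = build_mixed_load(10, {"grab": 6, "dispatch": 3, "cancel": 1})
print(len(cancel_events), len(mixed))
```

A single-transaction load test would call `build_events` with one scenario only, while the weighted mix approximates a mixed stress test.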
3. Simulating real-time streaming data by writing producer classes (in Java)
1) Import the corresponding dependency packages, mainly those of the Kafka messaging system and the stress testing tool. Simulate the data changes of the different business scenarios and send them to the messaging system, using a separate class to manage each business scenario;
2) When writing a producer class, first make sure that the class itself does not introduce performance problems: use the singleton pattern so that objects are not frequently created and destroyed when the stress testing tool runs multi-threaded tests, which reduces memory usage.
3) In the overridden methods of the stress testing tool, make the data as realistic as possible and reserve variables and parameterized fields, such as the Kafka IP address and topic and the parameterized fields in the message being sent.
4) Generate the jar package.
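The singleton idea in step 2 and the reserved parameters in step 3 can be sketched as follows. The patent writes its producer classes in Java; this sketch uses Python for brevity, and the class name `OrderEventProducer`, the injected `send` callable (a stand-in for a real Kafka client), and the message fields are assumptions made for illustration.

```python
import json
import threading


class OrderEventProducer:
    """Process-wide singleton shared by all load-test threads."""

    _instance = None
    _lock = threading.Lock()

    def __init__(self, send, topic):
        self._send = send    # callable(topic, payload_bytes); stands in for a Kafka client
        self._topic = topic  # reserved parameter: the topic is supplied by the test script

    @classmethod
    def get_instance(cls, send, topic):
        # Double-checked locking: only the first caller creates the object.
        # Later calls return the existing instance and ignore their arguments.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = cls(send, topic)
        return cls._instance

    def send_event(self, order_id, event):
        # order_id and event are the fields left open for parameterization.
        payload = json.dumps({"order_id": order_id, "event": event}).encode("utf-8")
        self._send(self._topic, payload)


sent = []
sent_lock = threading.Lock()


def fake_send(topic, payload):
    # Records messages in memory instead of sending them anywhere.
    with sent_lock:
        sent.append((topic, payload))


first = OrderEventProducer.get_instance(fake_send, "orders")


def worker(n):
    producer = OrderEventProducer.get_instance(fake_send, "orders")
    producer.send_event(f"o{n}", "order_placed")


threads = [threading.Thread(target=worker, args=(n,)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(sent))
```

Because every thread receives the same object, no producer is created or destroyed per request, which is the memory-saving behaviour the step describes.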
4. Preparing test data
This comprises two parts: 1) the parameterized data in the stress test script, mainly the fields reserved for parameterization in the producer classes, which make the simulated data more realistic; 2) database test data: for example, if event tracking (buried point) data is written into a Kudu table by the real-time computation, Kudu tables at different data volumes need to be simulated, because the data volume affects the performance of inserting and querying data.
How database test data at different volumes is generated depends on the specific database. There are several ways: 1) insert data directly through code; most databases provide a way to generate data in batches, but creating hundreds of millions of rows this way is very time-consuming; 2) query other tables, or insert part of the data first, and then multiply the data volume with a Cartesian product; this is efficient but only works on some databases.
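The Cartesian-product approach can be sketched with an in-memory SQLite database, used here only as a stand-in because it ships with Python; whether the same statement works on Kudu or another target store depends on that database. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seed (user_id INTEGER, event TEXT)")
conn.execute("CREATE TABLE multiplier (n INTEGER)")
conn.execute("CREATE TABLE events (user_id INTEGER, event TEXT, batch INTEGER)")

# Insert a small part of the data first.
conn.executemany("INSERT INTO seed VALUES (?, ?)", [(i, "click") for i in range(100)])
conn.executemany("INSERT INTO multiplier VALUES (?)", [(n,) for n in range(50)])

# CROSS JOIN pairs every seed row with every multiplier row: 100 x 50 rows.
# Shifting user_id by the batch number keeps the generated ids distinct.
conn.execute("""
    INSERT INTO events (user_id, event, batch)
    SELECT s.user_id + m.n * 100, s.event, m.n
    FROM seed AS s CROSS JOIN multiplier AS m
""")

row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
distinct_users = conn.execute("SELECT COUNT(DISTINCT user_id) FROM events").fetchone()[0]
print(row_count, distinct_users)
```

Repeating the join against the enlarged table multiplies the volume again, which is why this route reaches large row counts faster than row-by-row inserts.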
5. Writing the stress test script
Import the jar package into the stress testing tool, and select the relevant classes in different requests as required to simulate different scenarios. Add the parameter files required for parameterization. Add a timer to control the QPS so that the planned test scenarios are exercised.
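What a QPS-controlling timer does can be shown with a minimal pacing loop. This is a sketch of the idea, not the timer of any specific tool; the function `run_at_constant_qps` and the `FakeClock` helper used for a dry run without real waiting are hypothetical.

```python
import time


def run_at_constant_qps(send, total, qps, clock=time.monotonic, sleep=time.sleep):
    """Call send(i) `total` times, paced so the average rate is `qps`."""
    interval = 1.0 / qps
    start = clock()
    for i in range(total):
        # Schedule against the start time so slow sends do not accumulate drift.
        due = start + i * interval
        delay = due - clock()
        if delay > 0:
            sleep(delay)
        send(i)
    return clock() - start


class FakeClock:
    """Deterministic clock: sleeping advances time instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


fake = FakeClock()
sent = []
elapsed = run_at_constant_qps(sent.append, total=5, qps=2, clock=fake, sleep=fake.sleep)
print(sent, elapsed)
```

With the default `clock` and `sleep` arguments the same function paces real sends; the fake clock only makes the dry run repeatable.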
6. Stress test process for real-time computation
In the test environment preparation stage, the stress test server is prepared and the corresponding version of the stress testing tool is installed on it. To reflect the real conditions of real-time computation, the stress test server and the program under test are placed in the same intranet, which avoids inaccurate performance results caused by network overhead. Before the stress test, the script and parameter files, including the jar packages used by the script, are deployed on the stress test server.
Execute the test scenarios and pay attention to the following points during the stress test: 1) monitor resource consumption on all servers of both the load-generating side and the side under test, mainly server resources such as CPU, memory, IO, and network. Real-time computation involves the servers running Spark or Flink computation, the cache servers, and the storage servers; the data can be obtained from a resource monitoring platform or from tools such as nmon; 2) check whether the real-time computation is processing messages steadily: monitor the Kafka lag value with custom code (Java or Python), sampling once per second, to obtain a graph of the program's message consumption over the period; 3) check the task manager logs of the real-time computation program for error logs.
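The lag check in point 2 rests on simple arithmetic: for each partition, lag is the log end offset minus the consumer group's committed offset. The sketch below applies that to hand-written samples; in a real test the offsets would be fetched from Kafka once per second, which is not shown. The function names and sample numbers are hypothetical.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag = log end offset minus committed offset."""
    # Assumption: a partition with no committed offset is treated as unread from 0.
    return {partition: end - committed_offsets.get(partition, 0)
            for partition, end in end_offsets.items()}


def lag_trend(lag_samples):
    """Change in total lag between the first and the last sample."""
    totals = [sum(sample.values()) for sample in lag_samples]
    return totals[-1] - totals[0]


# Hypothetical once-per-second samples for a two-partition topic.
samples = [
    consumer_lag({0: 1000, 1: 1000}, {0: 990, 1: 995}),
    consumer_lag({0: 2000, 1: 2000}, {0: 1985, 1: 1990}),
    consumer_lag({0: 3000, 1: 3000}, {0: 2940, 1: 2960}),
]
growth = lag_trend(samples)
print(samples, growth)
```

A total lag that keeps growing across samples indicates that the program consumes more slowly than the load is produced, which is the backlog condition the stress test is meant to expose.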
7. Stress test results for real-time computation
The performance result of real-time computation cannot be obtained directly from the stress testing tool. The stress test only writes data to Kafka at a given load; the real-time computation program consumes the data from Kafka and, after computation, outputs the final result. One kind of result is presented as indicators, such as the completed orders, cancelled orders, and income on the courier's real-time data dashboard. When order and income messages arrive, the real-time computation program computes the result and stores it in a database or cache, and an interface is then written to read it and display it on the courier side. Accuracy is verified by comparing the indicators against the simulated data. The performance result is obtained by measuring elapsed time from the logs, which gives the processing time. The other kind of result is written directly into big data tables, as in a real-time data warehouse or an event tracking (buried point) system. Data accuracy can be checked through the row count of the big data table and a comparison of the data for some events, and the throughput per second, that is, QPS, is obtained by dividing the total number of records by the difference between the write time of the last record and the write time of the first record.
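The QPS calculation for the big-data-table case is a single division, sketched here with illustrative numbers. The function name `processing_qps` and the sample record count and timestamps are hypothetical.

```python
from datetime import datetime


def processing_qps(total_records, first_write, last_write):
    """QPS = total records / (last write time - first write time), in seconds."""
    elapsed = (last_write - first_write).total_seconds()
    if elapsed <= 0:
        raise ValueError("last write must be later than first write")
    return total_records / elapsed


first = datetime(2021, 10, 27, 10, 0, 0)
last = datetime(2021, 10, 27, 10, 5, 0)
qps = processing_qps(600_000, first, last)
print(qps)
```

Here 600,000 records written over 300 seconds give 2,000 records per second.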
If the stress test result does not meet expectations, consider the following aspects: 1) start with the script you wrote, and analyze whether the script or its parameters have problems; 2) analyze the returned results for errors, and check whether the log of the real-time computation program contains errors; 3) check the resource consumption of the load-generating machines; if they are the bottleneck, add agent machines; 4) analyze the specific phenomena observed during the stress test, for example by applying the load in gradients.
If the stress test result meets expectations, write the performance test report for the real-time computation, which includes the following parts: 1) the test result description, including the version under test, the configuration of each server, and a comparison between the test servers and the production servers; 2) the performance indicators: the stress test result of each test scenario, including the total number processed, concurrency, response time, QPS, and success rate; 3) server resource usage reports for each scenario; 4) the performance test conclusion, which serves as a reference when the new service goes online.

Claims (4)

1. A performance test method applied to real-time streaming computing, comprising the following steps:
- preparing a test environment for real-time streaming computing: preparing servers and deploying the performance test environment, including a stress testing tool, message middleware, a real-time streaming computation program, a cache cluster, es middleware, databases, and so on;
- designing test scenarios for real-time streaming computing, considered 1) from the business scenarios; 2) from the test plan, such as stress tests, reliability tests, and stability tests; 3) from the characteristics of streaming computation, namely single parallelism and multiple parallelism;
- simulating real-time streaming data: writing producer classes for the various business scenarios and, using the dependency package provided by the stress testing tool, overriding the methods of the stress testing tool while reserving parameterized fields;
- preparing test data for real-time streaming computing: 1) the parameterized data in the stress test script; 2) database data at magnitudes that affect performance;
- preparing a stress test script: selecting the relevant classes in different requests as required to simulate different scenarios, and adding the parameter files required for parameterization;
- carrying out the stress test process of real-time streaming computing: executing the test scenarios, monitoring and recording server resources, and monitoring the logs of the real-time computation program;
- troubleshooting and tuning: reusing the stress test scripts and data to assist developers in troubleshooting and tuning problems in the real-time program;
- outputting a test report: outputting the test result description, the test results, the resource consumption report, and the performance test conclusion, to provide a performance reference for bringing the real-time computation program online.
2. The performance test method for real-time streaming computing of claim 1, being a comprehensive and complete performance test scheme for real-time streaming computing and a solution for obtaining the performance capacity of a streaming computation program.
3. The performance test method for real-time streaming computing of claim 2, comprising a method of designing stress test scenarios for real-time streaming computing, a method of simulating the real-time streaming data of different scenarios through code, a method of parameterizing data, a method of writing a performance test script, an execution process, and a method of obtaining the final performance test result of the real-time computation.
4. The performance test method for real-time streaming computing of claim 3, wherein code is written in any programming language to simulate real-time streaming data in different business scenarios, and a stress testing tool is combined to conveniently apply load in each test scenario.
CN202111266696.9A 2021-10-27 2021-10-27 Performance test method applied to real-time streaming computation Pending CN114036034A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111266696.9A CN114036034A (en) 2021-10-27 2021-10-27 Performance test method applied to real-time streaming computation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111266696.9A CN114036034A (en) 2021-10-27 2021-10-27 Performance test method applied to real-time streaming computation

Publications (1)

Publication Number Publication Date
CN114036034A true CN114036034A (en) 2022-02-11

Family

ID=80135676

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111266696.9A Pending CN114036034A (en) 2021-10-27 2021-10-27 Performance test method applied to real-time streaming computation

Country Status (1)

Country Link
CN (1) CN114036034A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114756630A (en) * 2022-04-18 2022-07-15 焦点科技股份有限公司 Real-time warehouse counting construction method based on Flink state
CN114756630B (en) * 2022-04-18 2024-04-19 焦点科技股份有限公司 Real-time bin counting construction method based on Flink state
CN114780584A (en) * 2022-06-22 2022-07-22 云账户技术(天津)有限公司 Multi-scene streaming data processing method, system, network equipment and storage medium
CN114780584B (en) * 2022-06-22 2022-09-02 云账户技术(天津)有限公司 Multi-scene streaming data processing method, system, network equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination