CN111309612A - Distributed file system based data current limiting test method and system - Google Patents
- Publication number
- CN111309612A (application CN202010094784.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- task
- recording
- cluster
- current
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3668—Software testing
- G06F11/3672—Test management
- G06F11/3688—Test management for test execution, e.g. scheduling of test suites
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention provides a data current-limiting (data throttling) test method and system based on a distributed file system, used to test cluster stability before and after data throttling is enabled. The test checks the cluster's current read-write and task scale, records the time and bandwidth occupancy of large-scale concurrent data read-write, records the time and bandwidth occupancy of large-scale concurrent compute and streaming tasks, and compares the data collected before and after throttling. This makes it possible to evaluate whether data throttling on a large-scale HDFS cluster in a production environment delivers its intended benefit, to evaluate the effect of a data throttling strategy accurately, to provide evaluation results for innovation in HDFS data throttling technology, and to add one more dimension to big data cluster stability.
Description
Technical Field
The invention relates to the technical field of server clusters, and in particular to a data throttling test method and system based on a distributed file system.
Background
With the development of Hadoop community technology, HDFS has steadily added support for different storage policies to handle data of different temperatures. SSM has been adopted for more intelligent storage management, HDFS high availability keeps maturing, and HDFS Federation has made it possible to manage cluster data at a larger scale. As data volumes grow, improving storage efficiency and reliability was only the first step. Today the number of large-scale clusters is growing exponentially, and the underlying DataNodes are no longer concentrated. Data is continuously stored in clusters and tasks keep running. SSM has improved storage efficiency, and engines with greater computing power such as Spark and Flink have improved the computing capacity of the software layer, but the continuous data flows and tasks consume a large amount of network bandwidth. In a large-scale cluster, data reads and writes are very frequent, the volume of transferred data is large, and there are many compute and streaming tasks, while the network bandwidth of any single machine is necessarily limited. If some tasks on a machine use up its bandwidth, network transfers for normal tasks are affected, and if the bandwidth stays saturated for a long time, machine IO alarms may also be triggered. This is where throttling is needed. Bandwidth can fill up quickly without any malicious program or service being involved: a careless process or a small bug in a program can trigger large-scale data transfers.
In current large-scale HDFS clusters, a few large running tasks can instantly saturate the network bandwidth of a machine room, causing jitter in some online services and affecting other services. To address this, the Hadoop community has proposed a throttling scheme on the DataNode side, but the related series of features has not yet been fully released. Large-scale cluster optimization technologies, represented by the Hadoop ecosystem, can add one dimension to the stability of current HDFS clusters and prevent such saturation by applying data throttling and similar operations at the DataNode to ensure cluster stability. As big data keeps growing, data throttling will mature as the feature is refined, Hadoop community patches are updated, and new versions are released. For such an intelligent and complex tuning scheme, an important problem for those skilled in the art is how to evaluate whether data throttling delivers the corresponding benefit, whether data operations and tasks in the cluster are intelligently limited and managed, and how to assess the benefit of a data throttling strategy.
Disclosure of Invention
The invention aims to provide a data throttling test method and system based on a distributed file system, to solve the problem that the prior art lacks a way to evaluate data throttling strategies, so that the effect of a data throttling strategy can be evaluated accurately and the stability of big data clusters is improved.
To achieve this technical purpose, the invention provides a data throttling test method based on a distributed file system, comprising the following operations:
executing a cluster stability test both before and after data throttling is enabled, the cluster stability test comprising checking the cluster's current read-write and task scale, recording the time and bandwidth occupancy of large-scale concurrent data read-write, and recording the time and bandwidth occupancy of large-scale concurrent compute and streaming tasks;
and comparing the read-write data, compute data and streaming data collected before and after data throttling, and evaluating whether the current data throttling strategy meets the requirements.
Preferably, recording the time and bandwidth occupancy of large-scale concurrent data read-write specifically comprises:
before data throttling: executing concurrent read-write tasks on files of random sizes, recording the concurrent read-write time T1-0, and recording the cluster bandwidth occupancy BW1-0 during task execution;
after data throttling: executing concurrent read-write tasks on the same number of files of random sizes, recording the concurrent read-write time T1-1, and recording the cluster bandwidth occupancy BW1-1 during task execution.
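The before/after measurement in the two steps above can be sketched as a small harness. This is a minimal illustration under stated assumptions, not the patented implementation: each read-write task is modeled as a Python callable, and `sample_bandwidth` is a hypothetical probe that returns the cluster's current bandwidth occupancy. The names `run_and_measure`, `max_workers` and `sample_interval_s` are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def run_and_measure(
    tasks: Sequence[Callable[[], None]],
    sample_bandwidth: Callable[[], float],
    max_workers: int = 32,
    sample_interval_s: float = 0.5,
) -> Tuple[float, float]:
    """Run tasks concurrently and return (elapsed seconds, mean bandwidth occupancy).

    `sample_bandwidth` is a hypothetical probe returning the cluster's current
    bandwidth occupancy, e.g. as a fraction in [0, 1].
    """
    samples: List[float] = []
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task) for task in tasks]
        # Sample the probe periodically until every task has finished.
        while not all(f.done() for f in futures):
            samples.append(sample_bandwidth())
            time.sleep(sample_interval_s)
        for f in futures:
            f.result()  # re-raise the first task failure, if any
    elapsed = time.monotonic() - start
    if not samples:  # tasks finished before the first poll
        samples.append(sample_bandwidth())
    return elapsed, sum(samples) / len(samples)
```

Running the harness once with throttling disabled would yield T1-0 and BW1-0, and once more on the same task list with throttling enabled would yield T1-1 and BW1-1. Averaging the samples is one design choice; peak occupancy may matter more when the concern is short bursts that saturate a link.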
Preferably, recording the time and bandwidth occupancy of large-scale concurrent compute and streaming tasks specifically comprises:
before data throttling: executing a Wordcount task on files of random sizes, recording the time T2-0 taken by the Wordcount task, and recording the cluster bandwidth occupancy BW2-0 during task execution; executing a deduplication task on Hive tables of random sizes, storing the resulting tables in HDFS through Kafka, recording the time T3-0 taken by the Hive deduplication task, and recording the cluster bandwidth occupancy BW3-0 during task execution;
after data throttling: executing Wordcount tasks on the same number of files of random sizes, recording the time T2-1 taken by the Wordcount task, and recording the cluster bandwidth occupancy BW2-1 during task execution; executing the same number of deduplication tasks on Hive tables of random sizes, storing the resulting tables in HDFS through Kafka, recording the time T3-1 taken by the Hive deduplication tasks, and recording the cluster bandwidth occupancy BW3-1 during task execution.
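To make the two compute workloads concrete, the following single-process sketch shows the logic of a Wordcount job (map each line to (word, 1) pairs, then sum per word) and of row-level deduplication, comparable in effect to a Hive `SELECT DISTINCT`. It is an illustration only: the distributed execution on YARN, the Hive tables, and the Kafka-to-HDFS write path described in the text are not modeled, and the function names are hypothetical.

```python
from collections import Counter
from itertools import chain
from typing import Dict, Hashable, Iterable, List, Tuple


def map_line(line: str) -> List[Tuple[str, int]]:
    # Map step: emit (word, 1) for every whitespace-separated token.
    return [(word, 1) for word in line.split()]


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Shuffle and reduce collapsed into one pass: sum the 1s per word.
    totals: Counter = Counter()
    for word, one in chain.from_iterable(map_line(line) for line in lines):
        totals[word] += one
    return dict(totals)


def deduplicate(rows: Iterable[Tuple[Hashable, ...]]) -> List[Tuple[Hashable, ...]]:
    # Keep the first occurrence of each row; this version preserves input order,
    # whereas SQL DISTINCT makes no ordering guarantee.
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique
```

For example, `word_count(["a b a", "b c"])` returns `{"a": 2, "b": 2, "c": 1}`.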
Preferably, when the read-write data satisfy the following condition:
T1-1 > T1-0 and BW1-1 < BW1-0
data throttling is effective: read-write tasks are limited after throttling, so they take longer to execute but occupy less bandwidth;
when the compute data satisfy the following condition:
T2-1 > T2-0 and BW2-1 < BW2-0
data throttling is effective: compute tasks are limited after throttling, so they take longer to execute but occupy less bandwidth;
when the streaming data satisfy the following condition:
T3-1 > T3-0 and BW3-1 < BW3-0
data throttling is effective: streaming tasks are limited after throttling, so they take longer to execute but occupy less bandwidth.
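The three criteria share one rule: a category passes if its task time increased and its bandwidth occupancy decreased after throttling, with strict inequalities as in the text. A minimal sketch of that rule follows; the `Measurement` type and the category names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Measurement:
    elapsed_s: float   # T: task execution time
    bandwidth: float   # BW: cluster bandwidth occupancy during the task


def throttling_effective(before: Measurement, after: Measurement) -> bool:
    # Rule from the text: T(after) > T(before) and BW(after) < BW(before).
    return after.elapsed_s > before.elapsed_s and after.bandwidth < before.bandwidth


def evaluate(results: Dict[str, Tuple[Measurement, Measurement]]) -> Dict[str, bool]:
    """Map each category (read-write, compute, streaming) to pass/fail."""
    return {name: throttling_effective(b, a) for name, (b, a) in results.items()}
```

For example, `evaluate({"read_write": (Measurement(100, 0.9), Measurement(130, 0.6))})` returns `{"read_write": True}`.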
Preferably, the method further comprises:
saving all records of the test process to the log Xianliu_test.
The invention also provides a data throttling test system based on a distributed file system, comprising:
a before/after-throttling stability test module, configured to execute a cluster stability test both before and after data throttling, including checking the cluster's current read-write and task scale, recording the time and bandwidth occupancy of large-scale concurrent data read-write, and recording the time and bandwidth occupancy of large-scale concurrent compute and streaming tasks;
and a data comparison module, configured to compare the read-write data, compute data and streaming data collected before and after data throttling, and to evaluate whether the current data throttling strategy meets the requirements.
Preferably, the system further comprises:
a log saving module, configured to save all records of the test process to the log Xianliu_test.
The invention also provides a data throttling test device based on a distributed file system, comprising:
a memory for storing a computer program;
and a processor, configured to execute the computer program to implement the above data throttling test method based on a distributed file system.
The invention also provides a readable storage medium storing a computer program which, when executed by a processor, implements the above data throttling test method based on a distributed file system.
The effects described in this summary are only effects of the embodiments, not all effects of the invention. One of the above technical solutions has the following advantages or beneficial effects:
compared with the prior art, cluster stability is tested before and after data throttling by checking the cluster's current read-write and task scale, recording the time and bandwidth occupancy of large-scale concurrent data read-write, recording the time and bandwidth occupancy of large-scale concurrent compute and streaming tasks, and comparing the data collected before and after throttling. This makes it possible to evaluate whether data throttling on a large-scale HDFS cluster in a production environment achieves its intended benefit, to evaluate the effect of a data throttling strategy accurately, to provide evaluation results for innovation in HDFS data throttling technology, and to add one dimension to big data cluster stability.
Drawings
Fig. 1 is a flowchart of a data throttling test method based on a distributed file system according to an embodiment of the present invention;
Fig. 2 is a block diagram of a data throttling test system based on a distributed file system according to an embodiment of the present invention.
Detailed Description
In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
The data throttling test method and system based on a distributed file system are described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the present invention discloses a data throttling test method based on a distributed file system, comprising the following operations:
executing a cluster stability test both before and after data throttling is enabled, the cluster stability test comprising checking the cluster's current read-write and task scale, recording the time and bandwidth occupancy of large-scale concurrent data read-write, and recording the time and bandwidth occupancy of large-scale concurrent compute and streaming tasks;
and comparing the read-write data, compute data and streaming data collected before and after data throttling, and evaluating whether the current data throttling strategy meets the requirements.
The embodiment of the invention simulates read-write, compute, streaming and other operations of a big data cluster before and after the data throttling strategy is changed, and records the time of each task, the bandwidth occupancy during each task, and the degree of jitter in online services. It judges whether the data throttling strategy delivers its benefit by comparing task execution time, cluster bandwidth occupancy during the tasks, and online service jitter before and after the strategy is enabled, and it keeps a log of the test process so that the execution records can be analyzed retrospectively.
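The overall before/after flow can be sketched as a driver that runs the same workloads with throttling off and then on, and pairs the results for later comparison. `set_throttling` and the workload callables are hypothetical hooks standing in for cluster-specific operations (toggling the DataNode strategy, submitting jobs, sampling bandwidth); the sketch does not judge the results itself.

```python
from typing import Callable, Dict, Tuple

# (task time in seconds, bandwidth occupancy) for one workload run
Result = Tuple[float, float]


def run_before_after(
    set_throttling: Callable[[bool], None],
    workloads: Dict[str, Callable[[], Result]],
) -> Dict[str, Tuple[Result, Result]]:
    """Run every workload with throttling off, then on, and pair the results."""
    set_throttling(False)  # pre-throttling phase
    before = {name: run() for name, run in workloads.items()}
    set_throttling(True)   # post-throttling phase
    after = {name: run() for name, run in workloads.items()}
    return {name: (before[name], after[name]) for name in workloads}
```

Running both phases with identical workloads in the same order keeps the comparison like-for-like; on a production cluster, background load can still differ between the two phases, so repeating each phase may help separate the throttling effect from noise.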
Performing the following operations prior to data throttling:
Checking the DataNode throttling strategy: detecting whether the data throttling strategy is enabled on the thousands or tens of thousands of DataNodes in the large-scale cluster, and disabling data throttling if it is enabled.
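The check-and-toggle step can be sketched as computing which DataNodes are not yet in the desired state. Purely for illustration, this assumes that each DataNode's throttle is exposed as a bandwidth limit in bytes per second, where 0 means unlimited. How the limits are actually read and changed on a real cluster is not specified in the text and is left out. The same function covers the post-throttling check with `want_enabled=True`.

```python
from typing import Dict, List


def throttling_enabled(limit_bytes_per_s: int) -> bool:
    # Assumed convention (hypothetical): a positive limit means throttling is on,
    # and 0 means the DataNode is unlimited.
    return limit_bytes_per_s > 0


def nodes_to_toggle(limits: Dict[str, int], want_enabled: bool) -> List[str]:
    """Return the DataNodes whose throttling state differs from the desired one."""
    return sorted(
        node for node, limit in limits.items()
        if throttling_enabled(limit) != want_enabled
    )
```

Before the pre-throttling run, `nodes_to_toggle(limits, want_enabled=False)` lists the nodes on which throttling would have to be disabled.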
A cluster stability test is performed.
Checking the cluster's current read-write and task scale: the data read-write scale of the cluster is checked through Edit.log, and the scale of compute, streaming and other tasks is checked through YARN, to ensure that the cluster's current indicators are within the normal range.
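Confirming that the cluster's indicators are within the normal range amounts to a bounds check. A minimal sketch, assuming the read-write scale (for example, derived from Edit.log) and the task scale (for example, reported by YARN) have already been collected into a dictionary; the metric names and bounds are hypothetical.

```python
from typing import Dict, List, Tuple


def out_of_range(
    metrics: Dict[str, float],
    bounds: Dict[str, Tuple[float, float]],
) -> List[str]:
    """Return names of metrics that are missing or outside their [low, high] bounds."""
    problems = []
    for name, (low, high) in bounds.items():
        value = metrics.get(name)
        if value is None or not (low <= value <= high):
            problems.append(name)
    return problems
```

For example, `out_of_range({"edit_log_ops_per_s": 5000, "yarn_running_apps": 900}, {"edit_log_ops_per_s": (0, 10000), "yarn_running_apps": (0, 500)})` returns `["yarn_running_apps"]`, so the test would wait until the task scale falls back into range.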
Recording the time and bandwidth occupancy of large-scale concurrent data read-write: executing 100,000 concurrent read-write tasks on files of random sizes from 100 MB to 10 GB, recording the concurrent read-write time T1-0 of these 100,000 tasks, and recording the cluster bandwidth occupancy BW1-0 during task execution.
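Because the post-throttling run must use the same number of files with random sizes, one way to keep the two runs comparable is to draw the sizes from a seeded generator so that both phases get an identical list. This is a sketch under assumptions the text does not state: sizes are uniform between 100 MB and 10 GB in binary units, and the seed value is arbitrary.

```python
import random
from typing import List

MB = 1024 ** 2
GB = 1024 ** 3


def file_sizes(
    count: int = 100_000,
    low: int = 100 * MB,
    high: int = 10 * GB,
    seed: int = 2020,
) -> List[int]:
    """Reproducible random file sizes in bytes, uniform over [low, high].

    Reusing the same seed for the pre- and post-throttling runs gives both
    phases exactly the same number and sizes of files.
    """
    rng = random.Random(seed)  # private generator, unaffected by global seeding
    return [rng.randint(low, high) for _ in range(count)]
```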
Recording the time and bandwidth occupancy of large-scale concurrent compute and streaming tasks: executing 100,000 Wordcount tasks on files of random sizes from 100 MB to 10 GB, recording the time T2-0 taken by the Wordcount tasks, and recording the cluster bandwidth occupancy BW2-0 during task execution; executing 100,000 deduplication tasks on Hive tables of random sizes from 100 MB to 10 GB, storing the resulting tables in HDFS (Hadoop Distributed File System) through Kafka, recording the time T3-0 taken by the Hive deduplication tasks, and recording the cluster bandwidth occupancy BW3-0 during task execution.
Checking the cluster network bandwidth and online services: checking whether the cluster network bandwidth is stable and whether online services show jitter or instability.
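The text does not define how bandwidth stability or service jitter is measured. One possible proxy, offered here as an assumption, is the coefficient of variation (standard deviation divided by mean) of a series of samples, such as bandwidth occupancy or online service latency, compared against a hypothetical threshold.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(samples: Sequence[float]) -> float:
    # Population standard deviation relative to the mean; 0 for a flat series.
    mean = statistics.fmean(samples)
    if mean == 0:
        return float("inf")  # undefined for a zero mean; treated as unstable here
    return statistics.pstdev(samples) / mean


def is_stable(samples: Sequence[float], max_cv: float = 0.2) -> bool:
    """Treat a series (bandwidth occupancy, service latency) as stable if its CV is small.

    The 0.2 default threshold is a hypothetical value, not one from the text.
    """
    return len(samples) >= 2 and coefficient_of_variation(samples) <= max_cv
```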
Performing the following operations after data throttling:
Checking the DataNode throttling strategy: detecting whether the data throttling strategy is enabled on the thousands or tens of thousands of DataNodes in the large-scale cluster, and enabling data throttling if it is not.
A cluster stability test is performed.
Checking the cluster's current read-write and task scale: the data read-write scale of the cluster is checked through Edit.log, and the scale of compute, streaming and other tasks is checked through YARN, to ensure that the cluster's current indicators are within the normal range.
Recording the time and bandwidth occupancy of large-scale concurrent data read-write: executing 100,000 concurrent read-write tasks on files of random sizes from 100 MB to 10 GB, recording the concurrent read-write time T1-1 of these 100,000 tasks, and recording the cluster bandwidth occupancy BW1-1 during task execution.
Recording the time and bandwidth occupancy of large-scale concurrent compute and streaming tasks: executing 100,000 Wordcount tasks on files of random sizes from 100 MB to 10 GB, recording the time T2-1 taken by the Wordcount tasks, and recording the cluster bandwidth occupancy BW2-1 during task execution; executing 100,000 deduplication tasks on Hive tables of random sizes from 100 MB to 10 GB, storing the resulting tables in HDFS (Hadoop Distributed File System) through Kafka, recording the time T3-1 taken by the Hive deduplication tasks, and recording the cluster bandwidth occupancy BW3-1 during task execution.
Checking the cluster network bandwidth and online services: checking whether the cluster network bandwidth is stable and whether online services show jitter or instability.
Comparing the data before and after data throttling, including read-write data, compute data and streaming data.
When the read-write data satisfy the following condition:
T1-1 > T1-0 and BW1-1 < BW1-0
data throttling is effective: read-write tasks are limited after throttling, so they take longer to execute but occupy less bandwidth.
When the compute data satisfy the following condition:
T2-1 > T2-0 and BW2-1 < BW2-0
data throttling is effective: compute tasks are limited after throttling, so they take longer to execute but occupy less bandwidth.
When the streaming data satisfy the following condition:
T3-1 > T3-0 and BW3-1 < BW3-0
data throttling is effective: streaming tasks are limited after throttling, so they take longer to execute but occupy less bandwidth.
All records of the test process are saved to the log Xianliu_test.
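A sketch of the log-saving step using Python's standard `logging` module, writing one JSON record per line to a file named `Xianliu_test` as in the text. The record fields (`phase`, `category`, `elapsed_s`, `bandwidth`) and the function names are illustrative choices, not a format the text specifies.

```python
import json
import logging


def make_test_logger(path: str = "Xianliu_test") -> logging.Logger:
    """Return a logger that appends every test record to `path`."""
    logger = logging.getLogger("xianliu_test")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep test records out of the root logger
    if not logger.handlers:  # attach once; on repeated calls the first path wins
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger


def log_record(logger: logging.Logger, phase: str, category: str,
               elapsed_s: float, bandwidth: float) -> None:
    # One JSON object per line keeps the log easy to parse in later analysis.
    logger.info(json.dumps({"phase": phase, "category": category,
                            "elapsed_s": elapsed_s, "bandwidth": bandwidth}))
```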
The embodiment of the invention tests cluster stability before and after data throttling by checking the cluster's current read-write and task scale, recording the time and bandwidth occupancy of large-scale concurrent data read-write, recording the time and bandwidth occupancy of large-scale concurrent compute and streaming tasks, and comparing the data collected before and after throttling. This makes it possible to evaluate whether data throttling on a large-scale HDFS cluster in a production environment achieves its intended benefit, to evaluate the effect of a data throttling strategy accurately, to provide evaluation results for innovation in HDFS data throttling technology, and to add one dimension to big data cluster stability.
As shown in Fig. 2, an embodiment of the present invention further discloses a data throttling test system based on a distributed file system, comprising:
a before/after-throttling stability test module, configured to execute a cluster stability test both before and after data throttling, including checking the cluster's current read-write and task scale, recording the time and bandwidth occupancy of large-scale concurrent data read-write, and recording the time and bandwidth occupancy of large-scale concurrent compute and streaming tasks;
and a data comparison module, configured to compare the read-write data, compute data and streaming data collected before and after data throttling, and to evaluate whether the current data throttling strategy meets the requirements.
The system further comprises a log saving module, configured to save all records of the test process to the log Xianliu_test.
An embodiment of the invention also discloses a data throttling test device based on a distributed file system, comprising:
a memory for storing a computer program;
and a processor, configured to execute the computer program to implement the above data throttling test method based on a distributed file system.
An embodiment of the invention also discloses a readable storage medium storing a computer program which, when executed by a processor, implements the above data throttling test method based on a distributed file system.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (9)
1. A data throttling test method based on a distributed file system, characterized by comprising the following operations:
executing a cluster stability test both before and after data throttling is enabled, the cluster stability test comprising checking the cluster's current read-write and task scale, recording the time and bandwidth occupancy of large-scale concurrent data read-write, and recording the time and bandwidth occupancy of large-scale concurrent compute and streaming tasks;
and comparing the read-write data, compute data and streaming data collected before and after data throttling, and evaluating whether the current data throttling strategy meets the requirements.
2. The data throttling test method based on a distributed file system according to claim 1, wherein recording the time and bandwidth occupancy of large-scale concurrent data read-write specifically comprises:
before data throttling: executing concurrent read-write tasks on files of random sizes, recording the concurrent read-write time T1-0, and recording the cluster bandwidth occupancy BW1-0 during task execution;
after data throttling: executing concurrent read-write tasks on the same number of files of random sizes, recording the concurrent read-write time T1-1, and recording the cluster bandwidth occupancy BW1-1 during task execution.
3. The distributed file system based data throttling test method according to claim 1, wherein recording the time and bandwidth occupancy of large-scale concurrently executed computation and streaming tasks specifically comprises:
before data throttling: executing a Wordcount task on random-size files, recording the time T2-0 taken by the Wordcount task, and recording the cluster bandwidth occupancy BW2-0 during task execution; executing a deduplication task on a random-size Hive table, storing the resulting table into HDFS through Kafka, recording the time T3-0 taken by the Hive deduplication task, and recording the cluster bandwidth occupancy BW3-0 during task execution;
after data throttling: executing Wordcount tasks on the same number of random-size files, recording the time T2-1 taken by the Wordcount task, and recording the cluster bandwidth occupancy BW2-1 during task execution; executing the same number of deduplication tasks on random-size Hive tables, storing the resulting tables into HDFS through Kafka, recording the time T3-1 taken by the Hive deduplication tasks, and recording the cluster bandwidth occupancy BW3-1 during task execution.
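The claim-3 workloads run as cluster jobs. The sketch below only mirrors their logic in-process so that the timing pattern is concrete. `wordcount` stands in for the Wordcount task (T2), and `deduplicate` stands in for the Hive table deduplication (T3). The Kafka-to-HDFS write is not modeled. All names and the generated vocabulary are hypothetical, and seeding `random_lines` is an assumed way to reproduce the same workload after throttling.

```python
import random
import time
from collections import Counter


def wordcount(lines):
    """Count word occurrences across lines (single-process stand-in for a Wordcount job)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def deduplicate(rows):
    """Drop duplicate rows, keeping first-seen order (stand-in for a Hive dedup query)."""
    return list(dict.fromkeys(rows))


def timed(fn, *args):
    """Return (result, elapsed_seconds) for one call of fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def random_lines(n, seed, vocab=("hdfs", "kafka", "hive", "yarn")):
    """Generate n lines of random words; the same seed yields the same workload."""
    rng = random.Random(seed)
    return [" ".join(rng.choices(vocab, k=rng.randint(1, 10))) for _ in range(n)]
```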
4. The distributed file system based data throttling test method according to claim 3, wherein, for the read/write data, if:
T1-1 > T1-0 and BW1-1 < BW1-0
then the data throttling is effective: the read/write tasks are limited after throttling, so they take longer to execute but occupy less bandwidth;
for the computation data, if:
T2-1 > T2-0 and BW2-1 < BW2-0
then the data throttling is effective: the computation tasks are limited after throttling, so they take longer to execute but occupy less bandwidth;
for the streaming data, if:
T3-1 > T3-0 and BW3-1 < BW3-0
then the data throttling is effective: the streaming tasks are limited after throttling, so they take longer to execute but occupy less bandwidth.
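The claim-4 decision rule comes down to one comparison per task class, and the sketch below encodes it. `Run`, `throttling_effective`, and `evaluate` are hypothetical names. The numbers in the example are illustrative inputs, not measurements from this application.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical record of one run: task time T and bandwidth occupancy BW."""
    time_s: float
    bandwidth_pct: float


def throttling_effective(before: Run, after: Run) -> bool:
    """Claim-4 rule: effective if T_after > T_before and BW_after < BW_before."""
    return after.time_s > before.time_s and after.bandwidth_pct < before.bandwidth_pct


def evaluate(results: dict[str, tuple[Run, Run]]) -> dict[str, bool]:
    """Apply the rule to each task class (read/write, computation, streaming)."""
    return {name: throttling_effective(before, after)
            for name, (before, after) in results.items()}


# Illustrative inputs only, not measured results.
example = {
    "read_write": (Run(120.0, 85.0), Run(150.0, 60.0)),
    "computation": (Run(300.0, 70.0), Run(310.0, 72.0)),
}
print(evaluate(example))  # {'read_write': True, 'computation': False}
```

A class that returns `False` simply fails the claimed criterion. In the example, the computation run got slower but used more bandwidth, so throttling did not have the intended effect there.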
5. The distributed file system based data throttling test method according to claim 1, wherein the method further comprises:
saving all records of the test process to the log Xianliu_test.
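Claim 5 only states that every record of the test process goes into a log named Xianliu_test. "Xianliu" is the pinyin for 限流, meaning throttling. A minimal sketch using Python's standard `logging` module follows. The `.log` suffix, the record format, and the `make_test_logger` helper are assumptions.

```python
import logging
import os


def make_test_logger(log_dir: str, name: str = "Xianliu_test") -> logging.Logger:
    """Return a logger that appends every test record to <log_dir>/<name>.log."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    path = os.path.abspath(os.path.join(log_dir, f"{name}.log"))
    # Attach the file handler only once, even if this helper is called repeatedly.
    already = any(
        isinstance(h, logging.FileHandler) and h.baseFilename == path
        for h in logger.handlers
    )
    if not already:
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


# Example record for the pre-throttling read/write run (t1_0, bw1_0 are placeholders):
# log = make_test_logger("/var/log/throttle-test")
# log.info("before throttling: T1-0=%.1fs BW1-0=%.1f%%", t1_0, bw1_0)
```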
6. A distributed file system based data throttling test system, the system comprising:
a pre- and post-throttling stability test module, configured to execute cluster stability tests before and after data throttling respectively, including checking the cluster's current read/write load and task scale, recording the time and bandwidth occupancy of large-scale concurrently executed data read/write tasks, and recording the time and bandwidth occupancy of large-scale concurrently executed computation and streaming tasks;
and a data comparison module, configured to compare the read/write, computation, and streaming data recorded before and after data throttling and to evaluate whether the current data throttling strategy meets the requirements.
7. The distributed file system based data throttling test system according to claim 6, wherein the system further comprises:
a log storage module, configured to save all records of the test process to the log Xianliu_test.
8. A distributed file system based data throttling test device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the distributed file system based data throttling test method according to any one of claims 1 to 5.
9. A readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the distributed file system based data throttling test method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010094784.4A CN111309612A (en) | 2020-02-16 | 2020-02-16 | Distributed file system based data current limiting test method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010094784.4A CN111309612A (en) | 2020-02-16 | 2020-02-16 | Distributed file system based data current limiting test method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111309612A (en) | 2020-06-19 |
Family
ID=71158221
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010094784.4A Pending CN111309612A (en) | 2020-02-16 | 2020-02-16 | Distributed file system based data current limiting test method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111309612A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103116542A (en) * | 2013-01-24 | 2013-05-22 | 浪潮(北京)电子信息产业有限公司 | Test method of equipment expansion stability |
US20170124464A1 (en) * | 2015-10-28 | 2017-05-04 | Fractal Industries, Inc. | Rapid predictive analysis of very large data sets using the distributed computational graph |
CN106095646A (en) * | 2016-06-27 | 2016-11-09 | 江苏迪纳数字科技股份有限公司 | Hadoop performance cluster computational methods based on multiple linear regression model |
CN110061930A (en) * | 2019-02-01 | 2019-07-26 | 阿里巴巴集团控股有限公司 | A kind of limitation of data traffic, cut-off current determination method and apparatus |
Non-Patent Citations (3)
Title |
---|
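ANDROID路上的人: "HDFS的读写限流方案" (HDFS read/write throttling scheme), 《HTTPS://BLOG.CSDN.NET/ANDROIDLUSHANGDEREN/ARTICLE/DETAILS/51235380》 * |
ZHIGANG1007: "hadoop集群基准测试" (Hadoop cluster benchmark testing), 《HTTPS://BLOG.CSDN.NET/ZHIGANG1007/ARTICLE/DETAILS/78695064》 * |
万川梅: 《Hadoop应用开发实战详解》 (Hadoop Application Development in Practice, Explained), 31 August 2014, China Railway Publishing House (中国铁道出版社) * |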
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114598708A (en) * | 2020-11-20 | 2022-06-07 | 马上消费金融股份有限公司 | Information processing method, device, system, equipment and readable storage medium |
CN114598708B (en) * | 2020-11-20 | 2024-04-26 | 马上消费金融股份有限公司 | Information processing method, device, system, equipment and readable storage medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110147411B (en) | Data synchronization method, device, computer equipment and storage medium | |
Rabl et al. | Solving big data challenges for enterprise application performance management | |
CN108595664B (en) | Agricultural data monitoring method in hadoop environment | |
Donvito et al. | Testing of several distributed file-systems (HDFS, Ceph and GlusterFS) for supporting the HEP experiments analysis | |
US11782609B2 (en) | Method and apparatus for auditing abnormality of block device in cloud platform, device, and storage medium | |
CN111459698A (en) | Database cluster fault self-healing method and device | |
WO2017161471A1 (en) | Heterogeneous type database storage system based on optical disk, and method for using system | |
CN106850321A (en) | A kind of simulated testing system of cluster server | |
US20210191903A1 (en) | Generating hash trees for database schemas | |
CN105677251A (en) | Storage system based on Redis cluster | |
Wang et al. | Benchmarking replication and consistency strategies in cloud serving databases: Hbase and cassandra | |
US20150089042A1 (en) | Dynamic discovery of applications, external dependencies, and relationships | |
CN111309612A (en) | Distributed file system based data current limiting test method and system | |
Ivanov et al. | Performance evaluation of enterprise big data platforms with HiBench | |
CN111240936A (en) | Data integrity checking method and equipment | |
WO2024066506A1 (en) | Data monitoring and analysis method and apparatus, and server, operation and maintenance system, and storage medium | |
Zou et al. | Improving log-based fault diagnosis by log classification | |
CN113778795B (en) | Cross-version Oracle monitoring system based on Python language | |
Iuhasz et al. | Monitoring of exascale data processing | |
Tovarnák et al. | Structured and interoperable logging for the cloud computing Era: The pitfalls and benefits | |
CN107870824A (en) | A kind of method and device that inspection is carried out to component | |
Hasanpuri et al. | Comparative analysis of techniques for big-data performance testing | |
US20140358616A1 (en) | Asset management for a computer-based system using aggregated weights of changed assets | |
CN113688017B (en) | Automatic abnormality testing method and device for multi-node BeeGFS file system | |
Ibrahim | Workstation Cluster’s Hadoop Distributed File System Simulation and Modeling |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200619 |