CN111309612A - Distributed file system based data current limiting test method and system - Google Patents

Publication number
CN111309612A
Authority
CN
China
Prior art keywords
data
task
recording
cluster
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010094784.4A
Other languages
Chinese (zh)
Inventor
张东东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010094784.4A
Publication of CN111309612A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a data throttling (current-limiting) test method and system based on a distributed file system, used to test cluster stability before and after data throttling. The test checks the cluster's current read/write and task scale, records the time and bandwidth occupancy of large-scale concurrent data reads and writes, and records the time and bandwidth occupancy of large-scale concurrent computing and streaming tasks. The data recorded before and after throttling are then compared to evaluate whether data throttling on a large-scale HDFS cluster in a production environment delivers its intended benefit. This allows the effect of a data throttling strategy to be evaluated accurately, provides an evaluation result for HDFS data throttling work, and adds one more dimension to the stability of big data clusters.

Description

Distributed file system based data current limiting test method and system
Technical Field
The invention relates to the technical field of server clusters, in particular to a data current-limiting test method and system based on a distributed file system.
Background
As Hadoop community technology has developed, HDFS has continued to add storage policies for data of different temperatures, has adopted SSM for more intelligent storage management, has steadily improved its high availability, and has introduced federation to handle cluster data at larger scales. As data volumes grow, each gain in storage efficiency and reliability has been a step forward. Today, however, the number of large-scale clusters is growing exponentially, and the underlying data nodes are struggling to keep up as data is continuously written to the clusters and tasks run continuously. SSM and higher-performance engines such as Spark and Flink have improved storage efficiency and computing power at the software layer, but the continuous data flows and tasks consume a large amount of network bandwidth. In a large-scale cluster, data reads and writes are very frequent, the volume of data transferred is large, and there are many computing and streaming tasks. The network bandwidth of any one machine is necessarily limited, and if some tasks on a machine use up its bandwidth, the network transfers of normal tasks are affected. If the bandwidth stays saturated for a long time, machine I/O alarms may also be triggered. This is the purpose of throttling. It is not necessarily a malicious program or service that saturates network bandwidth quickly; an inadvertent operation or a small error in a program can also trigger large-scale data transfers.
In current large-scale HDFS clusters, a few large tasks can instantly saturate the network bandwidth of a machine room, causing jitter in some online services and affecting the operation of other services. To address this, the Hadoop community has proposed a throttling scheme on the DataNode side, but the related functionality has not yet been fully released or refined. Large-scale cluster optimization techniques of this kind from the Hadoop ecosystem can add one more dimension to the stability of current HDFS clusters and prevent problems before they occur, by restricting throttling-related operations to the DataNode so as to ensure cluster stability. As big data continues to grow, data throttling will become more complete as its functionality matures, as Hadoop community patches are updated, and as new versions are released. For such an intelligent and complex tuning scheme, how to evaluate whether data throttling achieves the corresponding benefit, whether data operations and tasks in the cluster are intelligently limited and managed, and what benefit a data throttling strategy delivers is an important problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a data throttling test method and system based on a distributed file system, to solve the problem that the prior art lacks a way to evaluate data throttling strategies, so that a data throttling strategy can be evaluated accurately and the stability of big data clusters improved.
To achieve this technical purpose, the invention provides a data throttling test method based on a distributed file system, comprising the following operations:
executing a cluster stability test before data throttling and again after data throttling, wherein the cluster stability test comprises checking the cluster's current read/write and task scale, recording the time and bandwidth occupancy of large-scale concurrent data reads and writes, and recording the time and bandwidth occupancy of large-scale concurrent computing and streaming tasks;
and comparing the read/write data, the computing data and the streaming data from before and after data throttling, and evaluating whether the current data throttling strategy meets the requirements.
Preferably, recording the time and bandwidth occupancy of large-scale concurrent data reads and writes specifically includes:
before data throttling: executing concurrent read/write tasks on files of random sizes, recording the concurrent read/write time T1-0, and recording the cluster bandwidth occupancy rate BW1-0 during task execution;
after data throttling: executing concurrent read/write tasks on the same number of files of random sizes, recording the concurrent read/write time T1-1, and recording the cluster bandwidth occupancy rate BW1-1 during task execution.
Preferably, recording the time and bandwidth occupancy of large-scale concurrent computing and streaming tasks specifically includes:
before data throttling: executing Wordcount tasks on files of random sizes, recording the time T2-0 taken by the Wordcount tasks, and recording the cluster bandwidth occupancy rate BW2-0 during task execution; executing deduplication tasks on Hive tables of random sizes, storing the resulting tables into HDFS through Kafka, recording the time T3-0 taken by the Hive deduplication tasks, and recording the cluster bandwidth occupancy rate BW3-0 during task execution;
after data throttling: executing Wordcount tasks on the same number of files of random sizes, recording the time T2-1 taken by the Wordcount tasks, and recording the cluster bandwidth occupancy rate BW2-1 during task execution; executing the same number of deduplication tasks on Hive tables of random sizes, storing the resulting tables into HDFS through Kafka, recording the time T3-1 taken by the Hive deduplication tasks, and recording the cluster bandwidth occupancy rate BW3-1 during task execution.
Preferably, when the read/write data satisfy:
T1-1 > T1-0 and BW1-1 < BW1-0
data throttling has taken effect: the read/write tasks are limited after throttling, so they take longer to execute but occupy less bandwidth;
when the computing data satisfy:
T2-1 > T2-0 and BW2-1 < BW2-0
data throttling has taken effect: the computing tasks are limited after throttling, so they take longer to execute but occupy less bandwidth;
when the streaming data satisfy:
T3-1 > T3-0 and BW3-1 < BW3-0
data throttling has taken effect: the streaming tasks are limited after throttling, so they take longer to execute but occupy less bandwidth.
Preferably, the method further comprises:
log saving: saving all records of the test process to the log Xianliu_test.
The invention also provides a data throttling test system based on a distributed file system, comprising:
a before-and-after stability test module, used for executing a cluster stability test before data throttling and again after data throttling, comprising checking the cluster's current read/write and task scale, recording the time and bandwidth occupancy of large-scale concurrent data reads and writes, and recording the time and bandwidth occupancy of large-scale concurrent computing and streaming tasks;
and a data comparison module, used for comparing the read/write data, the computing data and the streaming data from before and after data throttling, and evaluating whether the current data throttling strategy meets the requirements.
Preferably, the system further comprises:
a log saving module, used for saving all records of the test process to the log Xianliu_test.
The invention also provides a data current-limiting test device based on the distributed file system, which comprises:
a memory for storing a computer program;
and the processor is used for executing the computer program to realize the distributed file system data current limiting test method.
The invention also provides a readable storage medium for storing a computer program, wherein the computer program is executed by a processor to implement the distributed file system data current limiting test method.
The effects described in this summary are only those of the embodiments, not all the effects of the invention. One of the above technical solutions has the following advantages or beneficial effects:
compared with the prior art, the invention tests cluster stability before and after data throttling. The test checks the cluster's current read/write and task scale, records the time and bandwidth occupancy of large-scale concurrent data reads and writes, records the time and bandwidth occupancy of large-scale concurrent computing and streaming tasks, and compares the data from before and after throttling. This evaluates whether data throttling on a large-scale HDFS cluster in a production environment delivers its intended benefit, allows the effect of a data throttling strategy to be evaluated accurately, provides an evaluation result for HDFS data throttling work, and adds one more dimension to the stability of big data clusters.
Drawings
Fig. 1 is a flowchart of a distributed file system based data throttling test method provided in an embodiment of the present invention;
Fig. 2 is a block diagram of a distributed file system based data throttling test system according to an embodiment of the present invention.
Detailed Description
In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
The following describes a data current limiting test method and system based on a distributed file system in detail with reference to the accompanying drawings.
As shown in Fig. 1, the present invention discloses a data throttling test method based on a distributed file system, comprising the following operations:
executing a cluster stability test before data throttling and again after data throttling, wherein the cluster stability test comprises checking the cluster's current read/write and task scale, recording the time and bandwidth occupancy of large-scale concurrent data reads and writes, and recording the time and bandwidth occupancy of large-scale concurrent computing and streaming tasks;
and comparing the read/write data, the computing data and the streaming data from before and after data throttling, and evaluating whether the current data throttling strategy meets the requirements.
The embodiment of the invention simulates the read/write, computing and streaming operations of a big data cluster before and after the data throttling strategy is changed, and records the time of each task, the bandwidth occupancy during the task, and the degree of online service jitter. By comparing the task execution time, the cluster bandwidth occupancy during the task, and the degree of online service jitter before and after the data throttling strategy is enabled, it judges whether the strategy achieves its benefit. The logs of the test process are retained so that the execution records can be analyzed retrospectively.
The following operations are performed before data throttling:
Check the DataNode throttling strategy: detect whether data throttling is enabled on the thousands or tens of thousands of DataNodes in the large-scale cluster, and if it is enabled, disable it.
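The check-then-toggle step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not say how the throttling state of a DataNode is read or changed, so `get_enabled` and `set_enabled` are hypothetical callables supplied by the caller, and `ensure_throttle_state` is a name chosen here.

```python
def ensure_throttle_state(nodes, desired_enabled, get_enabled, set_enabled):
    """Bring every node to the desired throttling state.

    get_enabled(node) and set_enabled(node, flag) are hypothetical hooks
    for whatever mechanism the cluster exposes; they are not a Hadoop API.
    Returns the list of nodes whose state had to be changed.
    """
    changed = []
    for node in nodes:
        if get_enabled(node) != desired_enabled:
            set_enabled(node, desired_enabled)
            changed.append(node)
    return changed
```

Calling it with `desired_enabled=False` corresponds to the "before" phase; the same helper with `desired_enabled=True` would cover the "after" phase.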
A cluster stability test is performed.
Check the cluster's current read/write and task scale: check the current data read/write scale of the cluster through Edit.log, and check the current scale of computing, streaming and other tasks through YARN, to ensure that the cluster's current indicators are within the normal range.
Record the time and bandwidth occupancy of large-scale concurrent data reads and writes: execute 100,000 concurrent read/write tasks on files of random sizes from 100 MB to 10 GB, record the concurrent read/write time T1-0 for these 100,000 files, and record the cluster bandwidth occupancy rate BW1-0 during task execution.
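A minimal Python sketch of how such a workload might be generated and timed is given below. It rests on several assumptions not stated in the source: sizes are drawn uniformly, a fixed seed is used so that the same workload can be replayed after throttling, and the per-file operation is passed in as a hypothetical `task` callable. The thread pool only stands in for whatever concurrency mechanism a real test harness would use.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

MB = 1024 ** 2
GB = 1024 ** 3


def make_workload(num_files, seed, min_size=100 * MB, max_size=10 * GB):
    """Return a reproducible list of random file sizes in bytes."""
    rng = random.Random(seed)  # seeded so before/after runs see the same sizes
    return [rng.randint(min_size, max_size) for _ in range(num_files)]


def timed_concurrent_run(sizes, task, max_workers=32):
    """Run task(size) for every size concurrently; return elapsed seconds."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for all tasks and re-raises the first task exception
        list(pool.map(task, sizes))
    return time.monotonic() - start
```

With a hypothetical `read_write_task(size)` that writes and reads one file of the given size, `timed_concurrent_run(make_workload(100_000, seed=1), read_write_task)` would yield a value playing the role of T1-0.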
Record the time and bandwidth occupancy of large-scale concurrent computing and streaming tasks: execute 100,000 Wordcount tasks on files of random sizes from 100 MB to 10 GB, record the time T2-0 taken by the Wordcount tasks, and record the cluster bandwidth occupancy rate BW2-0 during task execution; execute 100,000 deduplication tasks on Hive tables of random sizes from 100 MB to 10 GB, store the resulting tables into HDFS (Hadoop Distributed File System) through Kafka, record the time T3-0 taken by the Hive deduplication tasks, and record the cluster bandwidth occupancy rate BW3-0 during task execution.
Check the cluster network bandwidth and the online services: check whether the cluster network bandwidth is stable and whether the online services show jitter or instability.
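The source does not define how the bandwidth occupancy rate is computed. One plausible reading, shown here purely as an assumption, is average throughput during the task divided by link capacity, derived from cumulative byte-counter samples. The function name, the sample format and the capacity parameter are all illustrative.

```python
def bandwidth_occupancy(samples, link_capacity_bps):
    """Average link occupancy (0.0 to 1.0) over a task.

    samples: list of (timestamp_seconds, cumulative_bytes) in time order,
             e.g. read periodically from a network interface counter.
    link_capacity_bps: assumed link capacity in bits per second.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must increase")
    throughput_bps = (b1 - b0) * 8 / (t1 - t0)  # bytes to bits, per second
    return throughput_bps / link_capacity_bps
```

Computing the same quantity over the same kind of samples before and after throttling gives comparable values for BW1-0 and BW1-1, which is what the later comparison relies on.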
The following operations are performed after data throttling:
Check the DataNode throttling strategy: detect whether data throttling is enabled on the thousands or tens of thousands of DataNodes in the large-scale cluster, and if it is not enabled, enable it.
A cluster stability test is performed.
Check the cluster's current read/write and task scale: check the current data read/write scale of the cluster through Edit.log, and check the current scale of computing, streaming and other tasks through YARN, to ensure that the cluster's current indicators are within the normal range.
Record the time and bandwidth occupancy of large-scale concurrent data reads and writes: execute 100,000 concurrent read/write tasks on files of random sizes from 100 MB to 10 GB, record the concurrent read/write time T1-1 for these 100,000 files, and record the cluster bandwidth occupancy rate BW1-1 during task execution.
Record the time and bandwidth occupancy of large-scale concurrent computing and streaming tasks: execute 100,000 Wordcount tasks on files of random sizes from 100 MB to 10 GB, record the time T2-1 taken by the Wordcount tasks, and record the cluster bandwidth occupancy rate BW2-1 during task execution; execute 100,000 deduplication tasks on Hive tables of random sizes from 100 MB to 10 GB, store the resulting tables into HDFS (Hadoop Distributed File System) through Kafka, record the time T3-1 taken by the Hive deduplication tasks, and record the cluster bandwidth occupancy rate BW3-1 during task execution.
Check the cluster network bandwidth and the online services: check whether the cluster network bandwidth is stable and whether the online services show jitter or instability.
The data from before and after data throttling are then compared, including a read/write data comparison, a computing data comparison and a streaming data comparison.
When the read/write data satisfy:
T1-1 > T1-0 and BW1-1 < BW1-0
data throttling has taken effect: the read/write tasks are limited after throttling, so they take longer to execute but occupy less bandwidth.
When the computing data satisfy:
T2-1 > T2-0 and BW2-1 < BW2-0
data throttling has taken effect: the computing tasks are limited after throttling, so they take longer to execute but occupy less bandwidth.
When the streaming data satisfy:
T3-1 > T3-0 and BW3-1 < BW3-0
data throttling has taken effect: the streaming tasks are limited after throttling, so they take longer to execute but occupy less bandwidth.
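The three comparisons share one rule: throttling is judged effective for a task class when the time after throttling is longer and the bandwidth occupancy after throttling is lower. A small Python sketch of that rule follows; the `Measurement` type and the function names are illustrative and do not come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    seconds: float    # task execution time (T)
    occupancy: float  # cluster bandwidth occupancy rate (BW)


def throttling_effective(before, after):
    """True when T_after > T_before and BW_after < BW_before."""
    return after.seconds > before.seconds and after.occupancy < before.occupancy


def evaluate(results):
    """results maps a task class name to a (before, after) pair."""
    return {name: throttling_effective(b, a) for name, (b, a) in results.items()}
```

For example, passing `{"read_write": (Measurement(100.0, 0.9), Measurement(150.0, 0.6))}` to `evaluate` reports `True` for the read/write class, since the run took longer and used less bandwidth after throttling.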
Log saving: all records of the test process are saved to the log Xianliu_test.
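As a hedged illustration of the log-saving step, the sketch below uses Python's standard `logging` module to append timestamped records to a file. The logger name, the record format and the file path passed in by the caller are assumptions; the source only names the log Xianliu_test and does not specify its location or format.

```python
import logging


def make_test_logger(log_path):
    """Return a logger that appends timestamped records to log_path."""
    logger = logging.getLogger("xianliu_test")  # logger name is illustrative
    logger.setLevel(logging.INFO)
    # Attach the file handler only once; later calls reuse the first path
    if not logger.handlers:
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

A test run could then record each measurement as it is taken, for example `logger.info("T1-0=%s BW1-0=%s", t1_0, bw1_0)`, so that the execution record can be analyzed afterwards.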
The embodiment of the invention tests cluster stability before and after data throttling. The test checks the cluster's current read/write and task scale, records the time and bandwidth occupancy of large-scale concurrent data reads and writes, records the time and bandwidth occupancy of large-scale concurrent computing and streaming tasks, and compares the data from before and after throttling. This evaluates whether data throttling on a large-scale HDFS cluster in a production environment delivers its intended benefit, allows the effect of a data throttling strategy to be evaluated accurately, provides an evaluation result for HDFS data throttling work, and adds one more dimension to the stability of big data clusters.
As shown in Fig. 2, an embodiment of the present invention further discloses a data throttling test system based on a distributed file system, comprising:
a before-and-after stability test module, used for executing a cluster stability test before data throttling and again after data throttling, comprising checking the cluster's current read/write and task scale, recording the time and bandwidth occupancy of large-scale concurrent data reads and writes, and recording the time and bandwidth occupancy of large-scale concurrent computing and streaming tasks;
and a data comparison module, used for comparing the read/write data, the computing data and the streaming data from before and after data throttling, and evaluating whether the current data throttling strategy meets the requirements.
The following operations are performed before data throttling:
Check the DataNode throttling strategy: detect whether data throttling is enabled on the thousands or tens of thousands of DataNodes in the large-scale cluster, and if it is enabled, disable it.
A cluster stability test is performed.
Check the cluster's current read/write and task scale: check the current data read/write scale of the cluster through Edit.log, and check the current scale of computing, streaming and other tasks through YARN, to ensure that the cluster's current indicators are within the normal range.
Record the time and bandwidth occupancy of large-scale concurrent data reads and writes: execute 100,000 concurrent read/write tasks on files of random sizes from 100 MB to 10 GB, record the concurrent read/write time T1-0 for these 100,000 files, and record the cluster bandwidth occupancy rate BW1-0 during task execution.
Record the time and bandwidth occupancy of large-scale concurrent computing and streaming tasks: execute 100,000 Wordcount tasks on files of random sizes from 100 MB to 10 GB, record the time T2-0 taken by the Wordcount tasks, and record the cluster bandwidth occupancy rate BW2-0 during task execution; execute 100,000 deduplication tasks on Hive tables of random sizes from 100 MB to 10 GB, store the resulting tables into HDFS (Hadoop Distributed File System) through Kafka, record the time T3-0 taken by the Hive deduplication tasks, and record the cluster bandwidth occupancy rate BW3-0 during task execution.
Check the cluster network bandwidth and the online services: check whether the cluster network bandwidth is stable and whether the online services show jitter or instability.
The following operations are performed after data throttling:
Check the DataNode throttling strategy: detect whether data throttling is enabled on the thousands or tens of thousands of DataNodes in the large-scale cluster, and if it is not enabled, enable it.
A cluster stability test is performed.
Check the cluster's current read/write and task scale: check the current data read/write scale of the cluster through Edit.log, and check the current scale of computing, streaming and other tasks through YARN, to ensure that the cluster's current indicators are within the normal range.
Record the time and bandwidth occupancy of large-scale concurrent data reads and writes: execute 100,000 concurrent read/write tasks on files of random sizes from 100 MB to 10 GB, record the concurrent read/write time T1-1 for these 100,000 files, and record the cluster bandwidth occupancy rate BW1-1 during task execution.
Record the time and bandwidth occupancy of large-scale concurrent computing and streaming tasks: execute 100,000 Wordcount tasks on files of random sizes from 100 MB to 10 GB, record the time T2-1 taken by the Wordcount tasks, and record the cluster bandwidth occupancy rate BW2-1 during task execution; execute 100,000 deduplication tasks on Hive tables of random sizes from 100 MB to 10 GB, store the resulting tables into HDFS (Hadoop Distributed File System) through Kafka, record the time T3-1 taken by the Hive deduplication tasks, and record the cluster bandwidth occupancy rate BW3-1 during task execution.
Check the cluster network bandwidth and the online services: check whether the cluster network bandwidth is stable and whether the online services show jitter or instability.
The data from before and after data throttling are then compared, including a read/write data comparison, a computing data comparison and a streaming data comparison.
When the read/write data satisfy:
T1-1 > T1-0 and BW1-1 < BW1-0
data throttling has taken effect: the read/write tasks are limited after throttling, so they take longer to execute but occupy less bandwidth.
When the computing data satisfy:
T2-1 > T2-0 and BW2-1 < BW2-0
data throttling has taken effect: the computing tasks are limited after throttling, so they take longer to execute but occupy less bandwidth.
When the streaming data satisfy:
T3-1 > T3-0 and BW3-1 < BW3-0
data throttling has taken effect: the streaming tasks are limited after throttling, so they take longer to execute but occupy less bandwidth.
The system also comprises a log saving module, which is used for saving all records of the test process to the log Xianliu_test.
The embodiment of the invention also discloses a data current-limiting test device based on the distributed file system, which comprises:
a memory for storing a computer program;
and the processor is used for executing the computer program to realize the distributed file system data current limiting test method.
The embodiment of the invention also discloses a readable storage medium for storing a computer program, wherein the computer program is executed by a processor to realize the distributed file system data current limiting test method.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (9)

1. A data throttling test method based on a distributed file system, characterized by comprising the following operations:
executing a cluster stability test before data throttling and again after data throttling, wherein the cluster stability test comprises checking the cluster's current read/write and task scale, recording the time and bandwidth occupancy of large-scale concurrent data reads and writes, and recording the time and bandwidth occupancy of large-scale concurrent computing and streaming tasks;
and comparing the read/write data, the computing data and the streaming data from before and after data throttling, and evaluating whether the current data throttling strategy meets the requirements.
2. The data throttling test method based on a distributed file system according to claim 1, wherein recording the time and bandwidth occupancy of large-scale concurrent data reads and writes specifically comprises:
before data throttling: executing concurrent read/write tasks on files of random sizes, recording the concurrent read/write time T1-0, and recording the cluster bandwidth occupancy rate BW1-0 during task execution;
after data throttling: executing concurrent read/write tasks on the same number of files of random sizes, recording the concurrent read/write time T1-1, and recording the cluster bandwidth occupancy rate BW1-1 during task execution.
3. The data throttling test method based on a distributed file system according to claim 1, wherein recording the time and bandwidth occupancy of large-scale concurrent computing and streaming tasks specifically comprises:
before data throttling: executing Wordcount tasks on files of random sizes, recording the time T2-0 taken by the Wordcount tasks, and recording the cluster bandwidth occupancy rate BW2-0 during task execution; executing deduplication tasks on Hive tables of random sizes, storing the resulting tables into HDFS through Kafka, recording the time T3-0 taken by the Hive deduplication tasks, and recording the cluster bandwidth occupancy rate BW3-0 during task execution;
after data throttling: executing Wordcount tasks on the same number of files of random sizes, recording the time T2-1 taken by the Wordcount tasks, and recording the cluster bandwidth occupancy rate BW2-1 during task execution; executing the same number of deduplication tasks on Hive tables of random sizes, storing the resulting tables into HDFS through Kafka, recording the time T3-1 taken by the Hive deduplication tasks, and recording the cluster bandwidth occupancy rate BW3-1 during task execution.
4. The distributed file system based data current limiting test method according to claim 3, wherein when the read-write data satisfy the following condition:
T1-1 > T1-0 and BW1-1 < BW1-0
the data current limiting is effective for data read-write tasks: after current limiting, the task execution time is longer but the bandwidth occupancy is lower;
when the computing data satisfy the following condition:
T2-1 > T2-0 and BW2-1 < BW2-0
the data current limiting is effective for computing tasks: after current limiting, the task execution time is longer but the bandwidth occupancy is lower;
when the streaming data satisfy the following condition:
T3-1 > T3-0 and BW3-1 < BW3-0
the data current limiting is effective for streaming tasks: after current limiting, the task execution time is longer but the bandwidth occupancy is lower.
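The three conditions in claim 4 share one shape, so the comparison can be sketched as a single predicate applied per task class. The `Measurement` type, the function names and the sample numbers below are illustrative assumptions and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    seconds: float    # task execution time (T)
    bandwidth: float  # cluster bandwidth occupancy (BW), as a fraction


def limiting_effective(before, after):
    """True when the task ran longer and used less bandwidth after limiting."""
    return after.seconds > before.seconds and after.bandwidth < before.bandwidth


def evaluate(results):
    """Map each task class to whether the limiting condition holds for it."""
    return {name: limiting_effective(b, a) for name, (b, a) in results.items()}


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    results = {
        "read_write": (Measurement(12.4, 0.91), Measurement(19.8, 0.55)),  # T1, BW1
        "compute": (Measurement(30.0, 0.85), Measurement(41.2, 0.60)),     # T2, BW2
        "streaming": (Measurement(25.0, 0.70), Measurement(24.0, 0.72)),   # T3, BW3
    }
    for name, ok in evaluate(results).items():
        print(name, "effective" if ok else "not effective")
```

Both inequalities are strict, matching the claim: equal time or equal bandwidth occupancy does not count as an effective limit.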
5. The distributed file system based data current limiting test method according to claim 1, wherein the method further comprises:
log saving: saving all records of the test process into the log Xianliu_test.
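One way to keep every test record in a single log file is Python's standard `logging` module, sketched below. The record format, the function names and the logger name are assumptions. The file name is passed in as a parameter, and the example uses the name from the claim with the stray spaces removed and no extension added.

```python
import logging
import os
import tempfile


def make_test_logger(log_path):
    """Return a logger that appends timestamped records to `log_path`."""
    logger = logging.getLogger("current_limit_test:" + log_path)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep test records out of the root logger
    if not logger.handlers:   # avoid adding a second handler on repeated calls
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger


def record(logger, task, phase, seconds, bandwidth):
    """Write one measurement: task class, before/after phase, time, occupancy."""
    logger.info(
        "task=%s phase=%s time_s=%.3f bandwidth=%.4f",
        task, phase, seconds, bandwidth,
    )


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "Xianliu_test")
    log = make_test_logger(log_path)
    # Illustrative values only.
    record(log, "read_write", "before", 12.4, 0.91)
    record(log, "read_write", "after", 19.8, 0.55)
    with open(log_path, encoding="utf-8") as f:
        print(f.read())
```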
6. A distributed file system based data current limiting test system, the system comprising:
a pre- and post-limiting stability test module, configured to execute cluster stability tests before and after data current limiting respectively, including checking the current read-write load and task scale of the cluster, recording the time and bandwidth occupancy of large-scale concurrent data reads and writes, and recording the time and bandwidth occupancy of large-scale concurrent computing tasks and streaming tasks;
and a data comparison module, configured to compare the read-write data, the computing data and the streaming data before and after data current limiting, and to evaluate whether the current data current limiting strategy meets the requirements.
7. The distributed file system based data current limiting test system according to claim 6, wherein the system further comprises:
a log storage module, configured to store all records of the test process into the log Xianliu_test.
8. A data current limiting test device based on a distributed file system is characterized by comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the distributed file system based data current limiting test method according to any one of claims 1 to 5.
9. A readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the distributed file system based data current limiting test method according to any one of claims 1 to 5.
CN202010094784.4A 2020-02-16 2020-02-16 Distributed file system based data current limiting test method and system Pending CN111309612A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010094784.4A CN111309612A (en) 2020-02-16 2020-02-16 Distributed file system based data current limiting test method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010094784.4A CN111309612A (en) 2020-02-16 2020-02-16 Distributed file system based data current limiting test method and system

Publications (1)

Publication Number Publication Date
CN111309612A (en) 2020-06-19

Family

ID=71158221

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010094784.4A Pending CN111309612A (en) 2020-02-16 2020-02-16 Distributed file system based data current limiting test method and system

Country Status (1)

Country Link
CN (1) CN111309612A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114598708A (en) * 2020-11-20 2022-06-07 马上消费金融股份有限公司 Information processing method, device, system, equipment and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116542A (en) * 2013-01-24 2013-05-22 浪潮(北京)电子信息产业有限公司 Test method of equipment expansion stability
CN106095646A (en) * 2016-06-27 2016-11-09 江苏迪纳数字科技股份有限公司 Hadoop performance cluster computational methods based on multiple linear regression model
US20170124464A1 (en) * 2015-10-28 2017-05-04 Fractal Industries, Inc. Rapid predictive analysis of very large data sets using the distributed computational graph
CN110061930A (en) * 2019-02-01 2019-07-26 阿里巴巴集团控股有限公司 A kind of limitation of data traffic, cut-off current determination method and apparatus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116542A (en) * 2013-01-24 2013-05-22 浪潮(北京)电子信息产业有限公司 Test method of equipment expansion stability
US20170124464A1 (en) * 2015-10-28 2017-05-04 Fractal Industries, Inc. Rapid predictive analysis of very large data sets using the distributed computational graph
CN106095646A (en) * 2016-06-27 2016-11-09 江苏迪纳数字科技股份有限公司 Hadoop performance cluster computational methods based on multiple linear regression model
CN110061930A (en) * 2019-02-01 2019-07-26 阿里巴巴集团控股有限公司 A kind of limitation of data traffic, cut-off current determination method and apparatus

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ANDROID路上的人: "HDFS的读写限流方案", 《HTTPS://BLOG.CSDN.NET/ANDROIDLUSHANGDEREN/ARTICLE/DETAILS/51235380》 *
ZHIGANG1007: "hadoop集群基准测试", 《HTTPS://BLOG.CSDN.NET/ZHIGANG1007/ARTICLE/DETAILS/78695064》 *
万川梅: "《Hadoop应用开发实战详解》", 31 August 2014, 中国铁道出版社 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114598708A (en) * 2020-11-20 2022-06-07 马上消费金融股份有限公司 Information processing method, device, system, equipment and readable storage medium
CN114598708B (en) * 2020-11-20 2024-04-26 马上消费金融股份有限公司 Information processing method, device, system, equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN110147411B (en) Data synchronization method, device, computer equipment and storage medium
Rabl et al. Solving big data challenges for enterprise application performance management
CN108595664B (en) Agricultural data monitoring method in hadoop environment
Donvito et al. Testing of several distributed file-systems (HDFS, Ceph and GlusterFS) for supporting the HEP experiments analysis
US11782609B2 (en) Method and apparatus for auditing abnormality of block device in cloud platform, device, and storage medium
CN111459698A (en) Database cluster fault self-healing method and device
WO2017161471A1 (en) Heterogeneous type database storage system based on optical disk, and method for using system
CN106850321A (en) A kind of simulated testing system of cluster server
US20210191903A1 (en) Generating hash trees for database schemas
CN105677251A (en) Storage system based on Redis cluster
Wang et al. Benchmarking replication and consistency strategies in cloud serving databases: Hbase and cassandra
US20150089042A1 (en) Dynamic discovery of applications, external dependencies, and relationships
CN111309612A (en) Distributed file system based data current limiting test method and system
Ivanov et al. Performance evaluation of enterprise big data platforms with HiBench
CN111240936A (en) Data integrity checking method and equipment
WO2024066506A1 (en) Data monitoring and analysis method and apparatus, and server, operation and maintenance system, and storage medium
Zou et al. Improving log-based fault diagnosis by log classification
CN113778795B (en) Cross-version Oracle monitoring system based on Python language
Iuhasz et al. Monitoring of exascale data processing
Tovarnák et al. Structured and interoperable logging for the cloud computing Era: The pitfalls and benefits
CN107870824A (en) A kind of method and device that inspection is carried out to component
Hasanpuri et al. Comparative analysis of techniques for big-data performance testing
US20140358616A1 (en) Asset management for a computer-based system using aggregated weights of changed assets
CN113688017B (en) Automatic abnormality testing method and device for multi-node BeeGFS file system
Ibrahim Workstation Cluster’s Hadoop Distributed File System Simulation and Modeling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200619
