CN111858365A - Method and equipment for testing performance of Flink K-Means - Google Patents

Method and equipment for testing performance of Flink K-Means

Info

Publication number
CN111858365A
Authority
CN
China
Prior art keywords
data
flink
parameters
test
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010724528.9A
Other languages
Chinese (zh)
Inventor
蔡丽敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010724528.9A
Publication of CN111858365A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3684 Test management for test design, e.g. generating new test cases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3692 Test management for test results analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a method and equipment for testing the performance of Flink K-Means, wherein the method comprises the following steps: defining parameters needed by data generation based on a k-means clustering algorithm, analyzing the parameters and generating original data based on the analyzed parameters; carrying out format conversion on the original data to form a data format required by the test; testing the Flink K-Means based on the data after format conversion, and graphically displaying the test result; and analyzing specific parameters of the test result to judge whether the Flink distributed real-time processing engine can meet the current production requirement. By using the scheme of the invention, the performance of the Flink batch K-Means can be tested, each node of the Flink cluster can be tested, the test data result can be analyzed and counted to determine whether the Flink distributed real-time processing engine can meet the current production requirement, and whether the physical machine, the memory, the CPU and the hard disk can meet the on-line production requirement of the Flink.

Description

Method and equipment for testing performance of Flink K-Means
Technical Field
The present invention relates to the field of computers, and more particularly to a method and equipment for testing the performance of Flink K-Means.
Background
In recent years, the rapid development of big data has given rise to a number of popular open-source communities, the best known being Hadoop, Storm, Spark, and Flink. Having developed rapidly in recent years, Apache Flink has become a mainstream choice for users in the field of real-time computing.
Apache Flink is a distributed, high-performance, highly available, and accurate open-source stream processing framework for data streaming applications. At its core, Flink is a distributed computing engine that provides data distribution, communication, and fault tolerance over data streams. Flink also provides unified batch and stream computing on top of its stream processing engine, as well as SQL expressiveness. Flink technology is becoming increasingly mature and is gradually gaining the upper hand in its competition with Spark, making it a popular new choice in the field of real-time processing. Apache Flink is an open-source stream processing framework developed by the Apache Software Foundation, at the heart of which is a distributed streaming dataflow engine written in Java and Scala. Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner, and its pipelined runtime system can execute both batch and stream processing programs. In addition, the Flink runtime natively supports the execution of iterative algorithms. The K-means clustering algorithm (also written k-means) is a simple, classic distance-based clustering algorithm. Distance is used as the measure of similarity: the closer two objects are, the more similar they are. The algorithm treats a cluster as a group of objects that lie close together, so its final goal is to obtain compact, well-separated clusters.
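The distance-based idea described above can be illustrated with a minimal Python sketch of a single K-means iteration. It is not taken from the patent or from Flink; the function names and the sample points are assumptions chosen for illustration, and Euclidean distance is assumed as the distance measure.

```python
import math

def nearest_center(point, centers):
    """Return the index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def kmeans_step(points, centers):
    """One iteration: assign each point to its nearest center, then move
    each center to the mean of the points assigned to it."""
    clusters = [[] for _ in centers]
    for p in points:
        clusters[nearest_center(p, centers)].append(p)
    new_centers = []
    for old, members in zip(centers, clusters):
        if not members:
            # An empty cluster keeps its previous center.
            new_centers.append(old)
            continue
        new_centers.append(tuple(
            sum(m[d] for m in members) / len(members) for d in range(len(old))
        ))
    return new_centers

# Hypothetical sample data: two well-separated groups of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers = kmeans_step(points, [(1.0, 1.0), (9.0, 9.0)])
```

Repeating `kmeans_step` until the centers stop moving, or for a fixed number of iterations, yields the compact clusters the algorithm aims for.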
HiBench is a big data benchmarking tool open-sourced by Intel that can evaluate the speed, throughput, and system resource utilization of different big data frameworks. Its workloads include Sort, WordCount, TeraSort, Sleep, SQL, PageRank, Nutch indexing, Bayes, Kmeans, NWeight, and enhanced DFSIO. For the Flink framework, however, it currently supports only stream computing tests; the batch computing mode of the Flink distributed dataflow engine cannot be tested.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a method and equipment for testing the performance of Flink K-Means. With the method of the present invention, the performance of Flink batch K-Means can be tested, each node of the Flink cluster can be tested, and the test data results can be analyzed and summarized to determine whether the Flink distributed real-time processing engine can meet the current production requirement, and whether the physical machine, memory, CPU, and hard disk can meet the online production requirements of Flink.
In view of the above object, an aspect of the embodiments of the present invention provides a method for testing performance of Flink K-Means, comprising the following steps:
defining parameters needed by data generation based on a k-means clustering algorithm, analyzing the parameters and generating original data based on the analyzed parameters;
carrying out format conversion on the original data to form a data format required by the test;
testing the Flink K-Means based on the data after format conversion, and graphically displaying the test result;
and analyzing specific parameters of the test result to judge whether the Flink distributed real-time processing engine can meet the current production requirement.
According to an embodiment of the present invention, defining parameters required for generating data based on a k-means clustering algorithm, analyzing the parameters and generating raw data based on the analyzed parameters includes:
selecting a point and adding the point to a central set S;
acquiring the mean value of all points of each dimension in the central set S, and calculating a new point through the mean value of the dimension plus the variance;
adding a new point to the center set S, and circulating the previous step until enough initial centers are obtained;
generating points around the initial center from the initial center through Gaussian distribution, dividing the number of the points to be generated, and writing the data points generated by each partition into the result to generate original data.
According to one embodiment of the invention, the parameters include file output location, dimensionality, number of data points, number of clusters, minimum distance to all central means, and standard deviation of data points.
According to one embodiment of the invention, the specific parameters include the average of the total throughput and of the delay, the error rate, and the discontinuous throughput, delay, and error rate averaged over equal time periods.
According to one embodiment of the invention, the data format includes a data volume of 500G, a thread number of 100, and a k value of 3.
In another aspect of the embodiments of the present invention, there is also provided an apparatus for performing a Flink K-Means performance test, the apparatus including:
the parsing module is configured to define parameters required for generating data based on a k-means clustering algorithm, parse the parameters and generate original data based on the parsed parameters;
the conversion module is configured to convert the format of the original data to form a data format required by the test;
the testing module is configured to test the Flink K-Means based on the data after format conversion, and graphically display a testing result;
and the analysis module is configured to analyze specific parameters of the test result so as to judge whether the Flink distributed real-time processing engine can meet the current production requirement.
According to an embodiment of the invention, the parsing module is further configured to:
selecting a point and adding the point to a central set S;
acquiring the mean value of all points of each dimension in the central set S, and calculating a new point through the mean value of the dimension plus the variance;
adding a new point to the center set S, and circulating the previous step until enough initial centers are obtained;
generating points around the initial center from the initial center through Gaussian distribution, dividing the number of the points to be generated, and writing the data points generated by each partition into the result to generate original data.
According to one embodiment of the invention, the parameters include file output location, dimensionality, number of data points, number of clusters, minimum distance to all central means, and standard deviation of data points.
According to one embodiment of the invention, the specific parameters include the average of the total throughput and of the delay, the error rate, and the discontinuous throughput, delay, and error rate averaged over equal time periods.
According to one embodiment of the invention, the data format includes a data volume of 500G, a thread number of 100, and a k value of 3.
The invention has the following beneficial technical effects. In the method for testing the performance of Flink K-Means provided by the embodiments of the present invention, parameters needed for data generation are defined based on a k-means clustering algorithm, the parameters are parsed, and original data are generated based on the parsed parameters; format conversion is carried out on the original data to form the data format required by the test; the Flink K-Means is tested based on the data after format conversion, and the test result is displayed graphically; and specific parameters of the test result are analyzed to judge whether the Flink distributed real-time processing engine can meet the current production requirement. With this technical scheme, the performance of Flink batch K-Means can be tested, each node of the Flink cluster can be tested, and the test data results can be analyzed and summarized to determine whether the Flink distributed real-time processing engine can meet the current production requirement and whether the physical machines, memory, CPUs, and hard disks can meet the online production requirements of Flink.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other embodiments from these drawings without creative effort.
FIG. 1 is a schematic flow chart diagram of a method of Flink K-Means performance testing in accordance with one embodiment of the present invention;
FIG. 2 is a schematic diagram of a device for Flink K-Means performance testing according to one embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
In view of the above objects, a first aspect of embodiments of the present invention provides an embodiment of a method for performing a performance test of Flink K-Means. Fig. 1 shows a schematic flow diagram of the method.
As shown in fig. 1, the method may include the steps of:
S1: defining parameters needed by data generation based on a k-means clustering algorithm, analyzing the parameters and generating original data based on the analyzed parameters;
S2: converting the format of the original data to form the data format required by the test;
S3: testing the Flink K-Means based on the data after format conversion, and graphically displaying the test result;
S4: analyzing the specific parameters of the test result to judge whether the Flink distributed real-time processing engine can meet the current production requirement.
By the technical scheme, the invention can test the performance of the Flink batch K-Means, test each node of the Flink cluster, analyze and count the test data result to determine whether the Flink distributed real-time processing engine can meet the current production requirement and whether the physical machine, the memory, the CPU and the hard disk can meet the on-line production requirement of the Flink.
In a preferred embodiment of the present invention, defining parameters required for generating data based on a k-means clustering algorithm, and analyzing the parameters and generating raw data based on the analyzed parameters includes:
selecting a point and adding the point to a central set S;
acquiring the mean value of all points of each dimension in the central set S, and calculating a new point through the mean value of the dimension plus the variance;
adding a new point to the center set S, and circulating the previous step until enough initial centers are obtained;
generating points around the initial center from the initial center through Gaussian distribution, dividing the number of the points to be generated, and writing the data points generated by each partition into the result to generate original data.
The Flink K-Means data generation module is developed in Java. Its core is a distributed K-means data generator, which generates simulation data in a distributed manner using Java IO stream programming and the original K-means algorithm. The module source code must be compiled with mvn clean package to produce the data generator jar, and the jar package is then copied to ${FLINK_HOME}/example/batch.
Specifically, the module receives parameters such as the file output location, dimension, number of data points, number of clusters, minimum distance from all central means, and standard deviation of the data points, then parses and initializes them. It then selects a point, adds it to the center set S, and creates the next point: it obtains the mean of all points in S for each dimension and calculates a new point as the dimension mean + variance, where variance = minDistance + (minDistance * rnd.nextGaussian()). The new point is added to S, and the process repeats from the previous step until enough initial centers are obtained. Points are then generated around the initial centers using a Gaussian distribution: the number of points to be generated is divided into partitions, each partition generates its data points, and the results are written out. Data generation mainly uses the BufferedWriter and FileWriter classes of the java.io package and the Random class of the java.util package. Next, the data format is converted: the data generated in the previous step is taken as input and formatted, and the data output ('--output') is defined using the Flink DataSet API. The file sink is lazy, and the output location is specified by this parameter.
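The generation steps described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented Java module: the starting point (the origin), the reading of the offset as minDistance + minDistance multiplied by a standard Gaussian sample, the random choice of a center for each point, and all function and parameter names are assumptions.

```python
import random

def generate_centers(k, dims, min_distance, rnd):
    """Build k initial centers: start from one point, then repeatedly add
    a new point at (per-dimension mean of S) + offset."""
    centers = [[0.0] * dims]  # assumption: the first point is the origin
    while len(centers) < k:
        new_point = []
        for d in range(dims):
            mean = sum(c[d] for c in centers) / len(centers)
            # Assumed reading of the offset ("variance") formula in the text.
            offset = min_distance + min_distance * rnd.gauss(0.0, 1.0)
            new_point.append(mean + offset)
        centers.append(new_point)
    return centers

def generate_points(centers, total, std_dev, partitions, rnd):
    """Split `total` points across partitions; each partition draws its
    points from a Gaussian distribution around a randomly chosen center."""
    base, extra = divmod(total, partitions)
    result = []
    for part in range(partitions):
        count = base + (1 if part < extra else 0)
        for _ in range(count):
            center = rnd.choice(centers)
            result.append([rnd.gauss(c, std_dev) for c in center])
    return result

rnd = random.Random(42)  # fixed seed so the run is repeatable
centers = generate_centers(k=3, dims=2, min_distance=10.0, rnd=rnd)
points = generate_points(centers, total=1000, std_dev=1.0, partitions=4, rnd=rnd)
```

In a distributed setting each partition would run on a separate task and write its own part of the output; here the partitions are simply processed in sequence.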
In a preferred embodiment of the invention, the parameters include file output location, dimension, number of data points, number of clusters, minimum distance from all central means, and standard deviation of data points.
In a preferred embodiment of the invention, the specific parameters include the average of the total throughput and of the delay, the error rate, and the discontinuous throughput, delay, and error rate averaged over equal time periods. Judging whether the Flink distributed real-time processing engine can meet the current production requirement based on the specific parameters comprises the following steps:
1. during execution of the test task, a test data analysis module calls the Flink RESTful web interface GET /v1/jobs/jobid to obtain the throughput (AvgQPS/TPS, in records/min), delay, and total number of errors for each fixed time period of the test task and accumulates them; when the test finishes it records the execution time and outputs the records in real time, using a queue service when the data volume is large;
2. based on the total data and the execution time, a trimmed mean is used to calculate the average of the total throughput and of the delay, the error rate, and the discontinuous throughput, delay, and error rate averaged over equal time periods;
3. when the test is finished, the test result (a single test result) is output through an interface to an Excel file in a defined format, and whether the Flink distributed real-time processing engine can meet the current production requirement is judged from the test result.
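The statistics in step 2 can be illustrated with a short Python sketch of a trimmed mean applied to per-interval samples. The sample layout (records, delay in milliseconds, errors per interval), the 10% trim fraction, and the function names are assumptions made for illustration; the patent does not specify them.

```python
def trimmed_mean(values, trim_fraction=0.1):
    """Mean after discarding the lowest and highest `trim_fraction` of samples."""
    if not values:
        raise ValueError("no samples")
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    cut = int(len(ordered) * trim_fraction)
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)

def summarize(samples, elapsed_minutes):
    """samples: one (records, delay_ms, errors) tuple per fixed interval."""
    total_records = sum(s[0] for s in samples)
    total_errors = sum(s[2] for s in samples)
    return {
        "avg_throughput_per_min": total_records / elapsed_minutes,
        "trimmed_interval_throughput": trimmed_mean([s[0] for s in samples]),
        "trimmed_delay_ms": trimmed_mean([s[1] for s in samples]),
        "error_rate": total_errors / total_records if total_records else 0.0,
    }

# Hypothetical samples: nine ordinary intervals and one outlier interval.
samples = [(100, 5.0, 0)] * 9 + [(1000, 50.0, 1)]
summary = summarize(samples, elapsed_minutes=10)
```

Trimming removes the single outlier interval from the per-interval averages, while the overall throughput and error rate are still computed from the accumulated totals.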
In a preferred embodiment of the present invention, the data format includes a data volume of 500G, a thread number of 100, and a k value of 3. The data generator jar is used to generate the simulation data; the data volume and number of threads are user-definable, and the output location, number of threads, k value, and so on can be set, so that test data can be generated accurately and efficiently.
flink run -c DistributedDataGenerator data-generator-1.0-SNAPSHOT.jar --output hdfs://xx.xx.xx.xx:9000/flink/kmeans --d 100 --size 500000000 --k 3
Here, the flink command specifies a data volume of 500G, a thread number of 100, and a k value of 3.
Finally, the data is output to the Hadoop system and stored at hdfs://xx.xx.xx.xx:9000/flink/kmeans. The output is invoked through an interface developed to conform to the RESTful interface specification, and the function is packaged into a module so that data output is automated.
In a preferred embodiment of the present invention, testing the Flink K-Means based on the transformed data comprises: running the Flink K-Means test module, using the bundled K-Means application CLI at ${FLINK_HOME}/example/batch/KMeans.jar, and storing the test result in the Hadoop distributed system.
flink run KMeans.jar --points hdfs://xx.xx.xx.xx:9000/flink/kmeans --output hdfs://xx.xx.xx.xx:9000/flink/kmeans/Output --k 3 --iterations 20
Here, in the flink command, the data input source is the data generated in the first step and stored in Hadoop, the result data is output to and stored in Hadoop, and iterations is the number of iterations.
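For automating the two commands shown above, a test harness might assemble their argument lists programmatically. The following Python sketch only builds the argument lists and does not run them; the helper names are hypothetical, and the flags are copied from the commands in the text rather than checked against any particular Flink release.

```python
import shlex

def generator_command(jar, output, d, size, k):
    """Argument list for the data generation job, mirroring the command above."""
    return ["flink", "run", "-c", "DistributedDataGenerator", jar,
            "--output", output, "--d", str(d), "--size", str(size), "--k", str(k)]

def kmeans_command(jar, points, output, k, iterations):
    """Argument list for the K-Means test job, mirroring the command above."""
    return ["flink", "run", jar, "--points", points, "--output", output,
            "--k", str(k), "--iterations", str(iterations)]

base = "hdfs://xx.xx.xx.xx:9000/flink/kmeans"  # placeholder address as in the text
gen = generator_command("data-generator-1.0-SNAPSHOT.jar", base, 100, 500000000, 3)
test_run = kmeans_command("KMeans.jar", base, base + "/Output", 3, 20)
print(shlex.join(gen))
print(shlex.join(test_run))
```

Each list could be passed to `subprocess.run` on a machine where the flink CLI is on the PATH; keeping the arguments as a list avoids shell quoting problems.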
In a preferred embodiment of the present invention, graphically displaying the test results comprises:
1. Flink is configured with metrics monitoring to observe the Flink test process:
metrics.scope.jm, metrics.scope.jm.job,
and other metrics items.
The above are Flink's metrics configuration items, which allow index data for jobs, tasks, and so on to be monitored and collected;
2. Flink is configured to push monitoring information to Prometheus:
metrics.reporter.promgateway.host and the like are Flink's metrics configuration items, used to configure the Prometheus service information;
3. Prometheus is configured: the Prometheus components are downloaded and installed, and Prometheus is configured;
4. the Flink service calls the metrics interface to collect monitoring information and at the same time calls the Prometheus RESTful web interface, pushing monitoring data to Prometheus in real time; monitoring results such as throughput and delay can be viewed graphically in real time through the Prometheus UI.
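To make the push step concrete, the following Python sketch formats metric samples in the Prometheus text exposition format and builds the URL under which a Prometheus Pushgateway accepts pushed metrics. It assumes that the Pushgateway is the push target; the host name, job name, metric name, and label are hypothetical, and in practice Flink's own metrics reporter performs this work rather than user code.

```python
def to_exposition(metrics, labels):
    """Render gauge samples in the Prometheus text exposition format."""
    label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name}{{{label_text}}} {value}")
    # The format requires a trailing newline after the last sample.
    return "\n".join(lines) + "\n"

def pushgateway_url(host, port, job):
    """URL path under which the Pushgateway groups metrics for one job."""
    return f"http://{host}:{port}/metrics/job/{job}"

# Hypothetical values for illustration only.
body = to_exposition({"kmeans_throughput_records_per_min": 190.0}, {"node": "tm-1"})
url = pushgateway_url("pushgateway.example", 9091, "flink_kmeans_test")
```

Sending `body` to `url` with an HTTP POST or PUT request would make the sample available for Prometheus to scrape and display in its UI.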
By the technical scheme, the invention can test the performance of the Flink batch K-Means, test each node of the Flink cluster, analyze and count the test data result to determine whether the Flink distributed real-time processing engine can meet the current production requirement and whether the physical machine, the memory, the CPU and the hard disk can meet the on-line production requirement of the Flink.
It should be noted that, as will be understood by those skilled in the art, all or part of the processes in the methods of the above embodiments may be implemented by instructing relevant hardware through a computer program, and the above programs may be stored in a computer-readable storage medium, and when executed, the programs may include the processes of the embodiments of the methods as described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
Furthermore, the method disclosed according to an embodiment of the present invention may also be implemented as a computer program executed by a CPU, and the computer program may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method disclosed in the embodiments of the present invention.
In view of the above object, according to a second aspect of the embodiments of the present invention, there is provided an apparatus for performing a Flink K-Means performance test. As shown in fig. 2, the apparatus 200 includes:
the parsing module is configured to define parameters required for generating data based on a k-means clustering algorithm, parse the parameters and generate original data based on the parsed parameters;
the conversion module is configured to convert the format of the original data to form a data format required by the test;
the testing module is configured to test the Flink K-Means based on the data after format conversion, and graphically display a testing result;
and the analysis module is configured to analyze specific parameters of the test result so as to judge whether the Flink distributed real-time processing engine can meet the current production requirement.
In a preferred embodiment of the present invention, the parsing module is further configured to:
selecting a point and adding the point to a central set S;
acquiring the mean value of all points of each dimension in the central set S, and calculating a new point through the mean value of the dimension plus the variance;
adding a new point to the center set S, and circulating the previous step until enough initial centers are obtained;
generating points around the initial center from the initial center through Gaussian distribution, dividing the number of the points to be generated, and writing the data points generated by each partition into the result to generate original data.
In a preferred embodiment of the invention, the parameters include file output location, dimension, number of data points, number of clusters, minimum distance from all central means, and standard deviation of data points.
In a preferred embodiment of the invention, the specific parameters include the average of the total throughput and of the delay, the error rate, and the discontinuous throughput, delay, and error rate averaged over equal time periods.
In a preferred embodiment of the present invention, the data format includes a data volume of 500G, a thread number of 100, and a k value of 3.
It should be particularly noted that the above apparatus embodiment relies on the above method embodiments to describe the working process of each module in detail, and those skilled in the art can readily conceive of applying these modules to other embodiments of the method.
Further, the above-described method steps and system elements or modules may also be implemented using a controller and a computer-readable storage medium for storing a computer program for causing the controller to implement the functions of the above-described steps or elements or modules.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The embodiments described above, particularly any "preferred" embodiments, are possible examples of implementations and are presented merely to clearly understand the principles of the invention. Many variations and modifications may be made to the above-described embodiments without departing from the spirit and principles of the technology described herein. All such modifications are intended to be included within the scope of this disclosure and protected by the following claims.

Claims (10)

1. A method for testing the performance of Flink K-Means is characterized by comprising the following steps:
defining parameters needed by data generation based on a k-means clustering algorithm, analyzing the parameters and generating original data based on the analyzed parameters;
carrying out format conversion on the original data to form a data format required by the test;
testing the Flink K-Means based on the data after format conversion, and graphically displaying the test result;
and analyzing specific parameters of the test result to judge whether the Flink distributed real-time processing engine can meet the current production requirement.
2. The method of claim 1, wherein parameters needed to generate data are defined based on a k-means clustering algorithm, and wherein parsing the parameters and generating raw data based on the parsed parameters comprises:
selecting a point and adding the point to a central set S;
acquiring the mean value of all points of each dimension in the central set S, and calculating a new point through the mean value of the dimension + the variance;
adding the new point to the center set S, and circulating the previous step until enough initial centers are obtained;
generating points around the initial center from the initial center through Gaussian distribution, dividing the number of points to be generated, and writing data points generated by each partition into a result to generate the original data.
3. The method of claim 1, wherein the parameters include file output location, dimensionality, number of data points, number of clusters, minimum distance from all central means, and standard deviation of data points.
4. The method of claim 1, wherein the specific parameters include the average of the total throughput and of the delay, the error rate, and the discontinuous throughput, delay, and error rate averaged over equal time periods.
5. The method of claim 1, wherein the data format comprises a data volume of 500G, a thread count of 100, and a k value of 3.
6. An apparatus for performing a Flink K-Means performance test, the apparatus comprising:
the parsing module is configured to define parameters required for generating data based on a k-means clustering algorithm, parse the parameters and generate original data based on the parsed parameters;
the conversion module is configured to convert the format of the original data to form a data format required by the test;
the testing module is configured to test the Flink K-Means based on the data after format conversion, and graphically display a testing result;
an analysis module configured to perform parameter-specific analysis on the test result to determine whether the Flink distributed real-time processing engine can meet the current production requirement.
7. The device of claim 6, wherein the parsing module is further configured to:
selecting a point and adding the point to a central set S;
acquiring the mean value of all points of each dimension in the central set S, and calculating a new point through the mean value of the dimension + the variance;
adding the new point to the center set S, and circulating the previous step until enough initial centers are obtained;
generating points around the initial center from the initial center through Gaussian distribution, dividing the number of points to be generated, and writing data points generated by each partition into a result to generate the original data.
8. The apparatus of claim 6, wherein the parameters include file output location, dimensionality, number of data points, number of clusters, minimum distance from all central means, and standard deviation of data points.
9. The apparatus of claim 6, wherein the specific parameters comprise total throughput, average latency, error rate, and the average non-continuous throughput, latency, and error rate over the same period of time.
10. The apparatus of claim 6, wherein the data format comprises a data volume of 500G, a thread count of 100, and a k value of 3.
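The specific parameters named in claims 4 and 9 could be computed along the following lines. This is a hedged sketch: the `Sample` record, its field names, and the reading of the "non-continuous" values as per-window figures are assumptions made for illustration, since the claims do not define how the measurements are collected.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    timestamp_s: float  # completion time, in seconds since the test started
    latency_ms: float   # observed processing latency for this record
    ok: bool            # False if the record was processed with an error


def summarize(samples: List[Sample], duration_s: float) -> Dict[str, float]:
    """Throughput, average latency, and error rate over one time span."""
    if not samples or duration_s <= 0:
        raise ValueError("need at least one sample and a positive duration")
    n = len(samples)
    return {
        "throughput_per_s": n / duration_s,
        "avg_latency_ms": sum(s.latency_ms for s in samples) / n,
        "error_rate": sum(1 for s in samples if not s.ok) / n,
    }


def summarize_by_window(samples: List[Sample], window_s: float) -> Dict[int, Dict[str, float]]:
    """The same three figures, computed separately for each fixed-length window."""
    buckets: Dict[int, List[Sample]] = {}
    for s in samples:
        buckets.setdefault(int(s.timestamp_s // window_s), []).append(s)
    return {w: summarize(group, window_s) for w, group in sorted(buckets.items())}
```

Comparing the overall figures with the per-window figures is one way to see whether throughput or latency degrades partway through a run.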
CN202010724528.9A 2020-07-24 2020-07-24 Method and equipment for testing performance of Flink K-Means Withdrawn CN111858365A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010724528.9A CN111858365A (en) 2020-07-24 2020-07-24 Method and equipment for testing performance of Flink K-Means

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010724528.9A CN111858365A (en) 2020-07-24 2020-07-24 Method and equipment for testing performance of Flink K-Means

Publications (1)

Publication Number Publication Date
CN111858365A true CN111858365A (en) 2020-10-30

Family

ID=72951203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010724528.9A Withdrawn CN111858365A (en) 2020-07-24 2020-07-24 Method and equipment for testing performance of Flink K-Means

Country Status (1)

Country Link
CN (1) CN111858365A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114880103A (en) * 2022-07-11 2022-08-09 中电云数智科技有限公司 System and method for adapting flink task to hadoop ecology

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114880103A (en) * 2022-07-11 2022-08-09 中电云数智科技有限公司 System and method for adapting flink task to hadoop ecology
CN114880103B (en) * 2022-07-11 2022-09-09 中电云数智科技有限公司 System and method for adapting flink task to hadoop ecology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20201030
