WO2017157203A1 - Benchmark test method and device for a supervised learning algorithm in a distributed environment - Google Patents

Benchmark test method and device for a supervised learning algorithm in a distributed environment

Info

Publication number
WO2017157203A1
WO2017157203A1 (PCT/CN2017/075854; CN2017075854W)
Authority
WO
WIPO (PCT)
Prior art keywords
data
benchmark test
supervised learning
learning algorithm
tested
Prior art date
Application number
PCT/CN2017/075854
Other languages
English (en)
French (fr)
Inventor
孙忠英
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 filed Critical 阿里巴巴集团控股有限公司
Publication of WO2017157203A1 publication Critical patent/WO2017157203A1/zh
Priority to US16/134,939 priority Critical patent/US20190019111A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 - Error detection; Error correction; Monitoring
    • G06F11/30 - Monitoring
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 - Error detection; Error correction; Monitoring
    • G06F11/30 - Monitoring
    • G06F11/3003 - Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 - Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 - Error detection; Error correction; Monitoring
    • G06F11/30 - Monitoring
    • G06F11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3428 - Benchmarking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 - Error detection; Error correction; Monitoring
    • G06F11/36 - Preventing errors by testing or debugging software
    • G06F11/3668 - Software testing
    • G06F11/3672 - Test management
    • G06F11/3688 - Test management for test execution, e.g. scheduling of test suites

Definitions

  • the present application relates to the field of machine learning technology, and in particular to a benchmark test method for supervised learning algorithms in a distributed environment and a benchmark test device for supervised learning algorithms in a distributed environment.
  • Machine learning is a multi-disciplinary subject that has emerged in the past 20 years. It involves many disciplines such as probability theory, statistics, approximation theory, convex analysis, and algorithm complexity theory. Machine learning algorithms are a class of algorithms that automatically analyze and obtain rules from data and use rules to predict unknown data.
  • Machine learning has a wide range of applications, such as data mining, computer vision, natural language processing, biometric recognition, search engines, medical diagnosis, credit card fraud detection, securities market analysis, DNA sequencing, speech and handwriting recognition, strategy games and robotics.
  • Supervised learning: a function is generated from the known correspondence between existing input data and output data, mapping inputs to appropriate outputs, for example classification.
  • Unsupervised learning: the input data set is modeled directly, for example clustering.
  • Semi-supervised learning: data with class labels and data without class labels are used together to generate an appropriate classification function.
  • Depending on the deployment structure, supervised learning is divided into supervised learning in a stand-alone environment and supervised learning in a distributed environment.
  • Supervised learning in a distributed environment refers to a supervised learning solution in which supervised learning algorithms are executed by multiple devices, located at different physical positions, that have different and/or identical physical structures.
  • In view of the above problems, the embodiments of the present application are proposed in order to provide a benchmark test method for supervised learning algorithms in a distributed environment that overcomes the above problems or at least partially solves them, and a corresponding benchmark test device for supervised learning algorithms in a distributed environment.
  • the present application discloses a benchmark test method for supervised learning algorithms in a distributed environment, the method comprising:
  • the method further includes:
  • the first benchmark test result is determined based on the output data in the benchmark test.
  • Preferably, benchmarking the to-be-tested supervised learning algorithm according to the evaluation model to obtain output data includes:
  • benchmarking the to-be-tested supervised learning algorithm according to a cross-validation model, and/or according to a Label proportional allocation model, to obtain output data.
  • Preferably, benchmarking the to-be-tested supervised learning algorithm according to the cross-validation model to obtain output data includes: taking a test data sample; dividing the data in the test data sample into N equal parts; and performing M rounds of benchmark test on the N parts of data, wherein each round includes the following steps:
  • N-1 of the N parts of data are determined as training data and the remaining part is determined as prediction data, wherein in the M rounds of benchmark test each part of data is determined as prediction data only once, and
  • the M and N are positive integers;
  • the determined N-1 parts of training data are provided to the supervised learning algorithm to be tested for learning to obtain a function, and the input data in the determined part of prediction data is provided to the function to derive output data.
  • Preferably, benchmarking the to-be-tested supervised learning algorithm according to the Label proportional allocation model to obtain output data includes:
  • taking a test data sample, wherein the test data sample includes data having a first mark and data having a second mark; dividing the data having the first mark and the data having the second mark in the test data sample into N equal parts respectively; and performing M rounds of benchmark test on the resulting 2N parts of data, wherein each round includes the following steps:
  • one of the N parts of data having the first mark is determined as training data and one or more of the remaining parts are determined as prediction data, and at the same time one of the N parts of data having the second mark is determined as training data and one or more of the remaining parts are determined as prediction data, wherein the M and N are positive integers;
  • the determined training data having the first mark and the second mark is provided to the supervised learning algorithm to be tested for learning to obtain a function, and the input data in the determined prediction data having the first mark and the second mark is provided to the function to obtain output data.
  • Preferably, the first benchmark test result includes at least one of the following indicators: the true positives TP, the true negatives TN, the false positives FP, the false negatives FN, the precision Precision, the recall Recall and the accuracy Accuracy;
  • the second benchmark test result includes at least one of the following indicators: the processor usage CPU of the supervised learning algorithm to be tested, the memory usage MEM of the supervised learning algorithm to be tested, the number of iterations Iterate of the supervised learning algorithm to be tested, and the usage time Duration of the supervised learning algorithm to be tested.
  • the method further includes:
  • To solve the above problems, the present application also discloses a benchmark test device for a supervised learning algorithm in a distributed environment, the device comprising: a first benchmark test result obtaining module, an indicator obtaining module, a second benchmark test result determining module and a benchmark test total result determining module; wherein
  • the first benchmark test result obtaining module is configured to obtain a first benchmark test result determined according to output data in the benchmark test
  • the indicator obtaining module is configured to obtain a distributed performance indicator in the benchmark test
  • the second benchmark test result determining module is configured to determine the distributed performance indicator as a second benchmark test result
  • the benchmark test total result determining module is configured to combine the first benchmark test result and the second benchmark test result to obtain a benchmark test total result.
  • the device further comprises:
  • a determining module configured to determine a supervised learning algorithm to be tested before the first benchmark test result obtaining module obtains the first benchmark test result determined according to the output data in the benchmark test;
  • the benchmark test module is configured to perform benchmark test on the to-be-tested supervised learning algorithm according to the evaluation model to obtain output data;
  • the first benchmark test result determining module is configured to determine a first benchmark test result according to the output data in the benchmark test.
  • Preferably, the benchmark test module is configured to benchmark the supervised learning algorithm to be tested according to a cross-validation model; or to benchmark the supervised learning algorithm to be tested according to a Label proportional allocation model; or to benchmark the supervised learning algorithm to be tested according to the cross-validation model and the Label proportional allocation model respectively, to obtain output data; wherein
  • the benchmark test module includes: a first benchmark test submodule and a second benchmark test submodule; wherein
  • the first benchmark test sub-module is configured to benchmark the supervised learning algorithm to be tested according to the cross-validation model or the Label proportional allocation model;
  • the second benchmark test sub-module is configured to benchmark the supervised learning algorithm to be tested according to the cross-validation model or the Label proportional allocation model.
  • Preferably, the first benchmark test sub-module comprises:
  • a first data-taking unit configured to take a test data sample, and a first equal-dividing unit configured to divide the data in the test data sample into N equal parts;
  • a first determining unit configured to determine, in each round of the benchmark test, N-1 of the N parts of data as training data and the remaining part as prediction data, wherein in the M rounds of benchmark test each part of data is determined as prediction data only once, and M and N are positive integers;
  • a first providing unit configured to provide, in each round of the benchmark test, the determined N-1 parts of training data to the supervised learning algorithm to be tested for learning to obtain a function;
  • a second providing unit configured to provide, in each round of the benchmark test, the input data in the determined part of prediction data to the function to derive output data.
  • Preferably, the second benchmark test sub-module comprises:
  • a second data-taking unit configured to take a test data sample, wherein the test data sample includes data having a first mark and data having a second mark;
  • a second equal-dividing unit configured to divide the data having the first mark and the data having the second mark in the test data sample into N equal parts respectively;
  • a second determining unit configured to determine, in each round of the benchmark test, one of the N parts of data having the first mark as training data and one or more of the remaining parts as prediction data, and at the same time determine one of the N parts of data having the second mark as training data and one or more of the remaining parts as prediction data, wherein M and N are positive integers;
  • a third providing unit configured to provide, in each round of the benchmark test, the determined training data having the first mark and the second mark to the supervised learning algorithm to be tested for learning to obtain a function;
  • a fourth providing unit configured to provide, in each round of the benchmark test, the input data in the determined prediction data having the first mark and the second mark to the function to obtain output data.
  • Preferably, the first benchmark test result includes at least one of the following indicators:
  • the true positives TP, the true negatives TN, the false positives FP, the false negatives FN, the precision Precision, the recall Recall and the accuracy Accuracy;
  • the second benchmark test result includes at least one of the following indicators: the processor usage CPU of the supervised learning algorithm to be tested, the memory usage MEM of the supervised learning algorithm to be tested, the number of iterations Iterate of the supervised learning algorithm to be tested, and the usage time Duration of the supervised learning algorithm to be tested.
  • the device further comprises:
  • a performance evaluation module configured to determine an F1 score according to the first benchmark test result; and perform performance evaluation on the supervised learning algorithm to be tested by:
  • The embodiments of the present application obtain a first benchmark test result determined according to the output data in a benchmark test, and obtain a second benchmark test result by acquiring the distributed performance indicators in the benchmark test; the first benchmark test result and the second benchmark test result are then combined, so that the resulting benchmark test total result contains performance analysis indicators of different dimensions. Since multi-dimensional performance indicators can reflect the running performance of an algorithm to the greatest extent, a person skilled in the art can, by analyzing these benchmark test results of different dimensions, comprehensively and accurately evaluate the performance of a supervised learning algorithm in a distributed environment, avoiding the evaluation error caused by relying on a single performance indicator.
  • Further, the second benchmark test result contains the distributed performance indicators obtained from the distributed system, and these indicators accurately reflect the current hardware consumption of the distributed system while the supervised learning algorithm is running. Therefore, by comprehensively analyzing the distributed performance indicators together with the first benchmark test result, the performance of the current distributed system when running the algorithm can be judged accurately and quickly, overcoming the problem in the prior art that supervised learning algorithms in a distributed environment cannot be benchmarked because no complete benchmarking solution for them exists.
  • FIG. 1 is a flow chart of steps of an embodiment of a benchmark test method for a supervised learning algorithm in a distributed environment according to an embodiment of the present application;
  • FIG. 2 is a flow chart of steps of an embodiment of a benchmark test method for a supervised learning algorithm in a distributed environment according to an embodiment of the present application;
  • FIG. 3 is a structural block diagram of an embodiment of a benchmark test device for a supervised learning algorithm in a distributed environment according to a device embodiment of the present application;
  • FIG. 4 is a structural block diagram of an embodiment of a benchmarking apparatus for a supervised learning algorithm in a distributed environment according to an embodiment of the present application;
  • FIG. 5 is a structural block diagram of an embodiment of a benchmark test device for a supervised learning algorithm in a distributed environment according to a device embodiment of the present application;
  • FIG. 6 is a schematic diagram showing a logical sequence of data type division in each round of benchmark test process according to an embodiment of a benchmark test method for a supervised learning algorithm in a distributed environment according to an example of the present application;
  • FIG. 7 is a structural diagram of a benchmark test system for a supervised learning algorithm in a distributed environment according to an example of the present application.
  • FIG. 8 is a service flow diagram of an embodiment of Benchmark testing using a cross-validation model and a Label proportional allocation model according to an embodiment of the present application;
  • FIG. 9 is a process flow diagram of a supervised learning algorithm in a distributed environment, according to an example of the present application.
  • In terms of resource usage, the difference between supervised learning in a distributed environment and supervised learning in a traditional stand-alone environment is that the resources consumed by supervised learning in a distributed environment are not easily calculated and counted. Taking 128 MB of training data as an example, it is easy to calculate the CPU and memory consumed while a supervised learning algorithm runs in a stand-alone environment; however, when the supervised learning algorithm is executed in a distributed environment, the total computing resources are composed of the figures produced on several machines.
  • Taking a cluster of five machines, each with 2 cores and 4 GB of memory, as an example, the total resources are 10 cores and 20 GB. Suppose the training data of a supervised learning algorithm is 128 MB; this 128 MB of training data will expand during the training phase. In a distributed environment the data can be sliced according to its size in order to apply for resources. For example, if the training data expands to 1 GB and is calculated at 256 MB of data per instance, 4 instances are needed to complete the algorithm task. Assuming that CPU and memory are applied for dynamically for each instance, that the four instances run at the same time, and that the various resources coordinate with one another in the distributed case, the CPU and memory consumed by the task must be accumulated over the resource consumption of all 4 instances, and the resource consumption under each instance is not easy to count.
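  • As an illustration of the resource accounting described above, the following Python sketch estimates the number of instances needed for an expanded training set and sums per-instance samples into a task-level total (the helper names and sample figures are assumptions for illustration, not part of the patent):

```python
import math

def estimate_instances(expanded_data_mb, mb_per_instance=256):
    """Number of instances needed when data is sliced by size (e.g. 1024 MB / 256 MB = 4)."""
    return math.ceil(expanded_data_mb / mb_per_instance)

def aggregate_task_consumption(per_instance_samples):
    """Sum CPU and memory figures reported by each instance.

    per_instance_samples: list of dicts like {"cpu": 1.8, "mem": 3500}
    """
    total_cpu = sum(s["cpu"] for s in per_instance_samples)
    total_mem = sum(s["mem"] for s in per_instance_samples)
    return {"cpu": total_cpu, "mem": total_mem}

# Example: 128 MB of training data expands to 1 GB during training.
instances = estimate_instances(1024)            # -> 4 instances
task_usage = aggregate_task_consumption(
    [{"cpu": 1.8, "mem": 3500}] * instances)    # combine the 4 instances
```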
  • In view of the difficulty of counting resource consumption in a distributed environment, one of the core concepts of the embodiments of the present application is to obtain a first benchmark test result determined according to the output data in a benchmark test; to obtain the distributed performance indicators in the benchmark test and determine them as a second benchmark test result; and to combine the first benchmark test result and the second benchmark test result to obtain a benchmark test total result.
  • Referring to FIG. 1, a flow chart of the steps of an embodiment of a benchmark test method for a supervised learning algorithm in a distributed environment of the present application is shown, which may specifically include the following steps:
  • Step 101 Acquire a first benchmark test result determined according to output data in a benchmark test
  • a first benchmark test result may be determined, and the first benchmark test result is an analysis result obtained by analyzing the output data.
  • In a specific application, the first benchmark test result may include at least one of the following performance indicators: true positives (TP), true negatives (TN), false positives (FP), false negatives (FN), precision (Precision), recall (Recall) and accuracy (Accuracy).
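  • A minimal sketch of how these indicators can be derived from the output data and the standard output data of a two-class test sample is given below (function and variable names are illustrative assumptions; the patent does not prescribe an implementation):

```python
def first_benchmark_result(predicted, actual, positive=1):
    """Count TP/TN/FP/FN and derive Precision, Recall and Accuracy."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == positive and a == positive)
    tn = sum(1 for p, a in zip(predicted, actual) if p != positive and a != positive)
    fp = sum(1 for p, a in zip(predicted, actual) if p == positive and a != positive)
    fn = sum(1 for p, a in zip(predicted, actual) if p != positive and a == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(actual) if actual else 0.0
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn,
            "Precision": precision, "Recall": recall, "Accuracy": accuracy}
```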
  • Step 102 Obtain a distributed performance indicator in the benchmark test, and determine the distributed performance indicator as a second benchmark test result.
  • Specifically, in the benchmark test of a supervised learning algorithm in a distributed environment, the distributed performance indicators to be obtained are the hardware consumption information generated during the benchmark test of the supervised learning algorithm, such as the processor usage CPU, the memory usage MEM, the number of algorithm iterations Iterate and the algorithm usage time Duration.
  • Step 103 Combine the first benchmark test result and the second benchmark test result to obtain a benchmark test total result.
  • In a specific application, the performance indicator data in the first benchmark test result and the second benchmark test result may be combined and displayed in various ways, such as a table, a graph or a curve. For example, the combined benchmark test total result may be displayed in the form of an evaluation dimension table whose columns are TP, FP, TN, FN, CPU, MEM, Iterate and Duration (Table 1).
  • Whatever form it takes, the benchmark test total result reflects the performance indicator information of the algorithm from multiple dimensions; based on this information, a technician with the relevant expertise can analyze it and evaluate the performance of the supervised learning algorithm to be tested. In other words, the method provided by this embodiment assists technicians in completing the performance evaluation of a supervised learning algorithm.
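  • For illustration, the merge of Step 103 can be as simple as joining the two result dictionaries into one evaluation-dimension row in the style of Table 1 (the dictionary layout and example numbers are assumed, not taken from the patent):

```python
def merge_benchmark_results(first_result, second_result):
    """Combine the first (quality) and second (distributed) benchmark results
    into one total-result row, as in the evaluation dimension table (Table 1)."""
    total = dict(first_result)        # TP, FP, TN, FN, Precision, Recall, Accuracy
    total.update(second_result)       # CPU, MEM, Iterate, Duration
    return total

row = merge_benchmark_results(
    {"TP": 180, "FP": 20, "TN": 170, "FN": 30},
    {"CPU": "65%", "MEM": "2.1G", "Iterate": 37, "Duration": "412s"})
print(" | ".join(f"{k}={v}" for k, v in row.items()))
```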
  • In summary, the embodiments of the present application obtain a first benchmark test result determined according to the output data in the benchmark test, and obtain a second benchmark test result by acquiring the distributed performance indicators in the benchmark test; the two results are then combined, so that the benchmark test total result contains performance analysis indicators of different dimensions. Because multi-dimensional performance indicators can reflect the running performance of the algorithm to the greatest extent, a person skilled in the art can, by analyzing these benchmark test results of different dimensions, comprehensively and accurately evaluate the performance of the supervised learning algorithm in the distributed environment, avoiding the evaluation error caused by a single performance indicator.
  • Further, the second benchmark test result contains the distributed performance indicators obtained from the distributed system, and these indicators accurately reflect the current hardware consumption of the system while it runs the supervised learning algorithm. Therefore, by comprehensively analyzing the distributed performance indicators together with the first benchmark test result, the performance of the current distributed system when running the algorithm can be judged accurately and quickly, overcoming the problem in the prior art that supervised learning algorithms in a distributed environment cannot be benchmarked because no complete benchmarking solution for them exists.
  • In addition, a benchmark test platform can be constructed on the basis of the benchmark test method provided by the embodiments of the present application; the method or platform analyzes the output data and distributed performance indicators acquired while the supervised learning algorithm executes in a distributed environment, thereby providing a comprehensive and accurate performance evaluation of the supervised learning algorithm in the distributed environment.
  • Referring to FIG. 2, a flow chart of the steps of an embodiment of the benchmark test method for a supervised learning algorithm in a distributed environment of the present application is shown, which may specifically include the following steps:
  • Step 201 Determine a supervised learning algorithm to be tested.
  • a supervised learning algorithm to be tested needs to be determined, and then the supervised learning algorithm to be tested is benchmarked to evaluate the performance of the supervised learning algorithm to be tested.
  • the method provided in the second embodiment of the present application mainly tests the supervised learning algorithm in a distributed environment.
  • This step can be driven by the user. In an actual implementation, the user can submit a supervised learning algorithm directly to the benchmark test system, and the benchmark test system determines the received supervised learning algorithm as the supervised learning algorithm to be tested; alternatively, the user selects the supervised learning algorithm to be tested in a selection interface of the benchmark test system, and the benchmark test system determines the algorithm selected by the user as the supervised learning algorithm to be tested.
  • Step 202 Perform benchmark test on the to-be-tested supervised learning algorithm according to the evaluation model to obtain output data.
  • Before this step, an evaluation model needs to be preset; this model has the function of benchmarking the supervised learning algorithm to be tested.
  • Specifically, in the field of algorithm evaluation, the cross-validation model and the Label proportional allocation model are two widely used models with high accuracy and good algorithm stability. Therefore, the embodiments of the present application take these two models as examples of the evaluation model when describing the method provided by the present application;
  • that is, in step 202, the evaluation model includes a cross-validation model and/or a Label proportional allocation model.
  • Accordingly, benchmarking the to-be-tested supervised learning algorithm according to the evaluation model includes:
  • benchmarking the to-be-tested supervised learning algorithm according to the cross-validation model; or benchmarking it according to the Label proportional allocation model; or benchmarking it according to the cross-validation model and the Label proportional allocation model respectively.
  • Referring to FIG. 8, FIG. 8 shows a service flow diagram of an embodiment of the present application in which Benchmark testing is performed using the cross-validation model and the Label proportional allocation model. In a specific implementation, the user can select either of the two models to run the task and obtain the displayed result as needed.
  • In an optional embodiment of the present application, benchmarking the to-be-tested supervised learning algorithm according to the cross-validation model to obtain output data includes the following steps:
  • Step 1 Take a test data sample
  • Specifically, the test data sample is usually a sample of measured data. The data sample includes a plurality of pieces of data, each of which includes input data and output data; the input and output values in each piece of data are usually actually monitored values and may also be called standard input data and standard output data, respectively. For example, in a data sample used to predict house prices, the input of each piece of data is the size of a house and the corresponding output is the average price, and the specific values are the real values obtained.
  • Step 2: Divide the data in the test data sample into N equal parts;
  • Step 3: Perform M rounds of benchmark test on the N parts of data, wherein each round includes the following steps:
  • N-1 of the N parts of data are determined as training data and the remaining part is determined as prediction data, wherein in the M rounds of benchmark test each part of data is determined as prediction data only once, and M and N are positive integers;
  • the determined N-1 parts of training data are provided to the supervised learning algorithm to be tested for learning to obtain a function, and the input data in the determined part of prediction data is provided to the function to derive the output data.
  • The above method of benchmarking the supervised learning algorithm to be tested according to the cross-validation model is described in detail below through a specific application example. Assume a test data sample 1 containing 1000 pieces of data is taken and, according to a preset rule, N = 5. The benchmark test system therefore first divides the data in test data sample 1 into 5 equal parts, namely data 1, data 2, data 3, data 4 and data 5, so that each part contains 200 pieces of data; the value of M is also 5, so the benchmark test system performs 5 rounds of benchmark test on the 5 parts of data.
  • In each round of the benchmark test, the data types need to be divided; specifically, N-1 = 4, so 4 parts are selected as training data and 1 part as prediction data.
  • FIG. 6 is a schematic diagram of one data type division method. As shown in FIG. 6, each row shows the way the 5 parts of data are divided in one round of benchmark test, the columns from left to right corresponding to data 1 through data 5. In the first row, data 1 to data 4 are divided as training data and data 5 is the prediction data; in the second row, data 1 to data 3 and data 5 are training data and data 4 is the prediction data; in the third row, data 1, data 2, data 4 and data 5 are training data and data 3 is the prediction data; and so on: in the fourth row data 2 is the prediction data and the rest are training data, and in the fifth row data 1 is the prediction data and the rest are training data. After the data is divided, five rounds of benchmark test are performed on the data: in each round, the determined 4 parts of training data are provided to the supervised learning algorithm to be tested for learning to obtain a function (which may also be called a model), and then the input data in the remaining part of prediction data is provided to the function to obtain the output data, the output data being the predicted value obtained by predicting the input data with the function. Thus, after the five rounds of benchmark test are completed, 5 groups of output data are obtained.
  • It should be noted that the data types in each round of the benchmark test may be divided in the logical order given in FIG. 6, or in another logical order, for example by shuffling the top-to-bottom order of the rows in FIG. 6, as long as it is ensured that each part of data is determined as prediction data only once in the M rounds of benchmark test.
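  • A compact sketch of the cross-validation benchmark described above is given below; `train` and `predict` stand in for the supervised learning algorithm under test, all names are assumptions, and the sample count is assumed to be divisible by the number of parts:

```python
def cross_validation_benchmark(samples, n_parts, train, predict):
    """Split (input, output) samples into N equal parts; in each of M = N rounds
    use N-1 parts as training data and the remaining part as prediction data,
    so that every part is used as prediction data exactly once."""
    size = len(samples) // n_parts
    parts = [samples[i * size:(i + 1) * size] for i in range(n_parts)]
    outputs = []
    for round_idx in range(n_parts):                       # M rounds
        prediction_part = parts[round_idx]                 # chosen once per part
        training_parts = [p for i, p in enumerate(parts) if i != round_idx]
        training_data = [row for part in training_parts for row in part]
        model = train(training_data)                       # learn a function
        outputs.append([predict(model, x) for x, _y in prediction_part])
    return outputs                                         # N groups of output data
```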
  • In another optional embodiment of the present application, benchmarking the to-be-tested supervised learning algorithm according to the Label proportional allocation model to obtain output data includes the following steps:
  • Step 1 Take a test data sample, where the test data sample includes: data having a first mark and data having a second mark;
  • Specifically, the test data sample includes, and includes only, data having a first mark and data having a second mark; the first mark and the second mark are labels used to classify the data according to a specific need, so this scheme is applied to a two-class scenario that contains two types of data.
  • Step 2: Divide the data having the first mark and the data having the second mark in the test data sample into N equal parts respectively;
  • Step 3: Perform M rounds of benchmark test on the resulting 2N parts of data, wherein each round includes the following steps:
  • one of the N parts of data having the first mark is determined as training data and one or more of the remaining parts are determined as prediction data, and at the same time one of the N parts of data having the second mark is determined as training data and one or more of the remaining parts are determined as prediction data, where M and N are positive integers; the determined training data having the first mark and the second mark is provided to the supervised learning algorithm to be tested for learning to obtain a function; and the input data in the determined prediction data having the first mark and the second mark is provided to the function to obtain output data.
  • It should be noted that the terms first mark and second mark are only used to distinguish different marks and are not intended as limitations. In practice, the first mark and the second mark may use different mark symbols; for example, the first mark may be 1 and the second mark 0, or the first mark Y and the second mark N, and so on.
  • In short, the Label proportional allocation model classifies the data according to the Label value, then divides each class into equal parts, and combines the parts in different proportions for training and prediction.
  • For example, test data sample 2 contains 1000 pieces of data, of which 600 pieces have a Label value of 1 and 400 pieces have a Label value of 0. The 600 pieces of data with Label value 1 can be divided into 10 parts of 60 pieces each, and the 400 pieces with Label value 0 are likewise divided into 10 parts of 40 pieces each.
  • One way of dividing test data sample 2 is shown in Table 2, in which each row represents one part of data; data 1 to data 10 represent the 10 parts with Label value 1, and data 11 to data 20 represent the 10 parts with Label value 0.
  • Table 2 (part of test data sample 2, Label value): Data 1 = 1; Data 2 = 1; Data 3 = 1; Data 4 = 1; Data 5 = 1; Data 6 = 1; Data 7 = 1; Data 8 = 1; Data 9 = 1; Data 10 = 1; Data 11 = 0; Data 12 = 0; Data 13 = 0; Data 14 = 0; Data 15 = 0; Data 16 = 0; Data 17 = 0; Data 18 = 0; Data 19 = 0; Data 20 = 0
  • In each round of the benchmark test, the benchmark system can determine one part of the data with Label value 1 and one part of the data with Label value 0 as training data, and determine another part of the data with Label value 1 and another part with Label value 0 as prediction data, or determine more than one of the remaining parts of the data with Label value 1 and Label value 0 as prediction data.
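  • A sketch of the Label proportional allocation split for the two-class example above follows; the names and the particular part-selection policy are assumptions, and the model allows other combinations of parts:

```python
from collections import defaultdict

def label_proportional_parts(samples, n_parts):
    """Group (features, label) samples by Label value, then divide each class into
    N equal parts, e.g. 600 Label-1 rows -> 10 parts of 60, 400 Label-0 rows -> 10 parts of 40."""
    by_label = defaultdict(list)
    for features, label in samples:
        by_label[label].append((features, label))
    parts = {}
    for label, rows in by_label.items():
        size = len(rows) // n_parts
        parts[label] = [rows[i * size:(i + 1) * size] for i in range(n_parts)]
    return parts

def one_round_split(parts, train_idx=0, predict_idx=1):
    """Pick one part of each class as training data and one of the remaining
    parts of each class as prediction data for a single benchmark round."""
    training = [row for label_parts in parts.values() for row in label_parts[train_idx]]
    prediction = [row for label_parts in parts.values() for row in label_parts[predict_idx]]
    return training, prediction
```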
  • In addition, benchmarking the supervised learning algorithm to be tested according to the cross-validation model and the Label proportional allocation model respectively means benchmarking the test data samples according to the cross-validation model and the Label proportional allocation model separately, so that one group of output data is obtained under each evaluation model, and the two groups of output data are together determined as the output data of the entire benchmark test process.
  • Step 203 Acquire a first benchmark test result determined according to output data in the benchmark test
  • Specifically, a plurality of performance indicators may be determined according to the deviation between the output data and the standard output data, that is, the output data corresponding to the input data in the test data sample. As in the first embodiment, the first benchmark test result may include at least one of the following performance indicators: TP, TN, FP, FN, Precision, Recall and Accuracy.
  • Step 204 Obtain a distributed performance indicator in the benchmark test, and determine the distributed performance indicator as a second benchmark test result.
  • Specifically, a system performance detection module in the benchmark test system can obtain the various distributed performance indicators during the benchmark test process, and these distributed performance indicators are the second benchmark test result. The distributed performance indicators include at least one of the following: the processor usage CPU of the supervised learning algorithm to be tested, the memory usage MEM of the supervised learning algorithm to be tested, the number of iterations Iterate of the supervised learning algorithm to be tested, and the usage time Duration of the supervised learning algorithm to be tested.
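  • The patent does not fix how these indicators are sampled; as one possibility, a worker could report process-level CPU and memory with the third-party psutil package and wall-clock duration with time, as sketched below (`benchmark_task` is an assumed callable that returns its own iteration count):

```python
import time
import psutil  # third-party; one possible way to sample CPU/MEM on an instance

def run_with_metrics(benchmark_task):
    """Run one benchmark task and return second-benchmark-result style indicators."""
    proc = psutil.Process()
    proc.cpu_percent(interval=None)                # prime the CPU counter
    start = time.time()
    iterations = benchmark_task()                  # assumed to return its iteration count
    return {
        "CPU": proc.cpu_percent(interval=None),    # average CPU % since the priming call
        "MEM": proc.memory_info().rss // 2**20,    # resident memory in MB
        "Iterate": iterations,
        "Duration": round(time.time() - start, 2)  # seconds
    }
```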
  • Step 205 Combine the first benchmark test result and the second benchmark test result to obtain a benchmark test total result.
  • In a specific application, the two benchmark test results may be merged to generate a list of the corresponding results, and the list is displayed to the user on a display screen; a technician with evaluation and analysis skills can then perform a comprehensive analysis based on the data presented in the list and evaluate the performance of the supervised learning algorithm to be tested.
  • The list may include one or more rows of output results, where each row corresponds to the first benchmark test result and the second benchmark test result determined by one round of benchmark test, or each row corresponds to the first benchmark test result and the second benchmark test result determined by comprehensively analyzing multiple rounds of benchmark test.
  • For example, Table 3 is an exemplary list of benchmark test total results.
  • Step 206 Perform performance evaluation on the supervised learning algorithm to be tested according to the benchmark test result.
  • Specifically, performing performance evaluation on the to-be-tested supervised learning algorithm according to the benchmark test total result includes: determining an F1 score according to the first benchmark test result, and then evaluating the performance of the supervised learning algorithm to be tested directly, namely: when the F1 scores are the same or close, the number of iterations Iterate of the supervised learning algorithms to be tested is compared, and the algorithm with the smaller number of iterations is determined to perform better.
  • The F1 score, that is, the F1-score, can be regarded as a weighted average of the algorithm's precision and recall, and it is an important indicator for evaluating the quality of the supervised learning algorithm to be tested. Its calculation formula is: F1 = 2 × Precision × Recall / (Precision + Recall), where Precision and Recall are indicators in the first benchmark test result, Precision being the precision and Recall the recall rate.
  • Alternatively, performance evaluation of the supervised learning algorithm to be tested may also be performed as follows: when the F1 indicators are the same, the smaller the CPU, MEM, Iterate and Duration values of the supervised learning algorithm to be tested, the better its performance is determined to be.
  • the benchmark test result and the F1 score can also be outputted at the same time, which is convenient for the technician to view and analyze.
  • An exemplary list is shown in Table 4 below, which is a schematic table of the benchmark test results and the F1 score output simultaneously for another example of the present application:
  • Further, the performance evaluation result may be sent to the user; specifically, the performance evaluation result may be displayed on the display interface for the user to view, so as to assist the user in evaluating the performance of the algorithm.
  • the method further includes:
  • Specifically, the user can preset a standard value of the F1 score for different supervised learning algorithms to be tested and set a deviation range: when the deviation of the F1 score is within the range set by the user, it is determined that the benchmark test is successful; if the deviation of the F1 score exceeds the range set by the user, it is determined that the benchmark test is unsuccessful and the user can retest.
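  • Putting these rules together, the following sketch shows the F1 computation (standard formula), the deviation-range check and the iteration tie-break; the threshold values and helper names are assumptions for illustration:

```python
def f1_score(precision, recall):
    """F1 = 2 * Precision * Recall / (Precision + Recall)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def benchmark_ok(f1, standard_f1, allowed_deviation):
    """The benchmark succeeds when the F1 score stays within the user-set deviation range."""
    return abs(f1 - standard_f1) <= allowed_deviation

def better_algorithm(result_a, result_b, tolerance=1e-3):
    """When F1 scores are the same or close, the algorithm with fewer iterations wins."""
    if abs(result_a["F1"] - result_b["F1"]) <= tolerance:
        return result_a if result_a["Iterate"] < result_b["Iterate"] else result_b
    return result_a if result_a["F1"] > result_b["F1"] else result_b
```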
  • Compared with the first embodiment above, the method provided in the second embodiment of the present application performs further performance analysis on the benchmark test total result to determine the F1 value and then, based on the F1 value, directly judges the running performance of the supervised learning algorithm in the distributed environment and provides the judgment result to the user. A person skilled in the art can thus learn the running performance of the supervised learning algorithm in the distributed environment intuitively from the output result, without having to recalculate the analysis indicators, which reduces the time the user needs for analysis and judgment and further improves analysis efficiency.
  • Referring to FIG. 3, it is a structural block diagram of an embodiment of a benchmark test device for a supervised learning algorithm in a distributed environment according to the present application. The device may include: a first benchmark test result obtaining module 31, an indicator obtaining module 32, a second benchmark test result determining module 33 and a benchmark test total result determining module 34; wherein
  • the first benchmark test result obtaining module 31 is configured to obtain a first benchmark test result determined according to the output data in the benchmark test;
  • the indicator obtaining module 32 is configured to obtain a distributed performance indicator in the benchmark test
  • the second benchmark test result determining module 33 is configured to determine the distributed performance indicator as a second benchmark test result
  • the benchmark total result determining module 34 is configured to combine the first benchmark test result and the second benchmark test result to obtain a benchmark test total result.
  • the device further includes:
  • a determining module 35 configured to determine a supervised learning algorithm to be tested before the first benchmark test result obtaining module obtains the first benchmark test result determined according to the output data in the benchmark test;
  • the benchmark test module 36 is configured to benchmark the to-be-tested supervised learning algorithm according to the evaluation model to obtain output data;
  • the first benchmark test result determining module 37 is configured to determine a first benchmark test result according to the output data in the benchmark test.
  • In an embodiment, the benchmark test module 36 is configured to benchmark the supervised learning algorithm to be tested according to the cross-validation model; or to benchmark it according to the Label proportional allocation model; or to benchmark it according to the cross-validation model and the Label proportional allocation model respectively to obtain output data; wherein,
  • the benchmark test module 36 includes: a first benchmark test submodule and a second benchmark test submodule; wherein
  • the first benchmark test sub-module is configured to benchmark the supervised learning algorithm to be tested according to the cross-validation model or the Label proportional allocation model;
  • the second benchmark test sub-module is configured to benchmark the supervised learning algorithm to be tested according to the cross-validation model or the Label proportional allocation model.
  • the first benchmark test sub-module includes:
  • a first data-taking unit configured to take a test data sample, and a first equal-dividing unit configured to divide the data in the test data sample into N equal parts;
  • a first determining unit configured to determine, in each round of the benchmark test, N-1 of the N parts of data as training data and the remaining part as prediction data, wherein in the M rounds of benchmark test each part of data is determined as prediction data only once, and M and N are positive integers;
  • a first providing unit configured to provide, in each round of the benchmark test, the determined N-1 parts of training data to the supervised learning algorithm to be tested for learning to obtain a function;
  • a second providing unit configured to provide, in each round of the benchmark test, the input data in the determined part of prediction data to the function to derive output data.
  • the second benchmark test sub-module includes:
  • a second data-taking unit configured to take a test data sample, wherein the test data sample includes data having a first mark and data having a second mark;
  • a second equal-dividing unit configured to divide the data having the first mark and the data having the second mark in the test data sample into N equal parts respectively;
  • a second determining unit configured to determine, in each round of the benchmark test, one of the N parts of data having the first mark as training data and one or more of the remaining parts as prediction data, and at the same time determine one of the N parts of data having the second mark as training data and one or more of the remaining parts as prediction data, wherein M and N are positive integers;
  • a third providing unit configured to provide, in each round of the benchmark test, the determined training data having the first mark and the second mark to the supervised learning algorithm to be tested for learning to obtain a function;
  • a fourth providing unit configured to provide, in each round of the benchmark test, the input data in the determined prediction data having the first mark and the second mark to the function to derive output data.
  • the first benchmark test result includes at least one of the following indicators:
  • the true positives TP, the true negatives TN, the false positives FP, the false negatives FN, the precision Precision, the recall Recall and the accuracy Accuracy;
  • the second benchmark test result includes at least one of the following indicators: the processor usage CPU of the supervised learning algorithm to be tested, the memory usage MEM of the supervised learning algorithm to be tested, the number of iterations Iterate of the supervised learning algorithm to be tested, and the usage time Duration of the supervised learning algorithm to be tested.
  • In an embodiment, the apparatus further includes a performance evaluation module 38 configured to determine an F1 score according to the first benchmark test result and to perform performance evaluation on the supervised learning algorithm to be tested in the following way:
  • The F1 score, that is, the F1-score, can be regarded as a weighted average of the algorithm's precision and recall, and it is an important indicator for evaluating the quality of the supervised learning algorithm to be tested. Its calculation formula is: F1 = 2 × Precision × Recall / (Precision + Recall), where Precision and Recall are indicators in the first benchmark test result, Precision being the precision and Recall the recall rate.
  • In a practical application, the first benchmark test result determining module 37 and the performance evaluation module 38 can be implemented by a central processing unit (CPU), a micro processing unit (MPU), a digital signal processor (DSP) or a field-programmable gate array (FPGA) in the benchmark test system.
  • The description here is relatively brief; for relevant details, refer to the description in the method embodiments.
  • Referring to FIG. 7, FIG. 7 is a structural diagram of an exemplary benchmark test system, including: a task creation module 71, a task splitting module 72, a task execution module 73, a data statistics module 74, a distributed indicator collection module 75 and a data storage module 76; wherein
  • the task creation module 71 is configured to establish a benchmark test task according to the user indication
  • the user determines the supervised learning algorithm to be tested, thereby establishing a benchmark test task for the supervised learning algorithm to be tested.
  • the task splitting module 72 is configured to split the benchmark test task established according to the user's instruction; in a specific application, each supervised learning algorithm to be tested is split into one benchmark test task.
  • the task execution module 73 is configured to perform benchmark testing on the benchmark test task and generate test data.
  • the data statistics module 74 is configured to statistically combine the test data generated during the benchmark test process to obtain the benchmark test result;
  • the distributed indicator collection module 75 is configured to collect distributed indicators generated during the benchmark test process
  • the data storage module 76 is configured to store the benchmark test result and the distributed indicator.
  • Further, the task execution module 73 includes: a training module 731, a prediction module 732 and an analysis module 733; wherein the training module 731 is configured to provide training data to the supervised learning algorithm to be tested for learning to obtain a function;
  • the prediction module 732 is configured to provide prediction data to the function to obtain output data;
  • the analyzing module 733 is configured to generate test data according to the output data.
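  • The module split of FIG. 7 can be read as a simple pipeline; the sketch below wires hypothetical stand-ins for these modules together and is not the patent's implementation:

```python
def run_benchmark(algorithm, sample, modules):
    """Task creation -> splitting -> execution (train/predict/analyze)
    -> statistics -> distributed indicator collection -> storage.
    `modules` is a dict of callables standing in for the modules of FIG. 7."""
    task = modules["create_task"](algorithm, sample)         # task creation module 71
    subtasks = modules["split_task"](task)                   # task splitting module 72
    test_data = []
    for sub in subtasks:                                      # task execution module 73
        model = modules["train"](sub["training_data"])        #   training module 731
        outputs = modules["predict"](model, sub["prediction_data"])  # prediction module 732
        test_data.append(modules["analyze"](outputs, sub))    #   analysis module 733
    first = modules["statistics"](test_data)                  # data statistics module 74
    second = modules["collect_indicators"]()                  # distributed indicator collection 75
    modules["store"](first, second)                           # data storage module 76
    return {**first, **second}                                # benchmark test total result
```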
  • Referring to FIG. 9, FIG. 9 shows a flow chart of an exemplary benchmark test method, which includes the following steps:
  • Step 901 creating a new task
  • the user creates a new task according to requirements, and the task is directed to a specific supervised learning algorithm, so the user needs to set a supervised learning algorithm to be tested;
  • Step 902 Perform a task
  • Specifically, the supervised learning algorithm is benchmarked according to the cross-validation model or the Label proportional allocation model.
  • Step 903: Generate a benchmark test total result. The benchmark test total result here includes the benchmark test result determined based on the test data produced while benchmarking the supervised learning algorithm, together with the distributed indicators obtained during benchmark execution.
  • Step 904 determining an F1 score
  • the F1 score is determined according to the benchmark test result.
  • Step 905 determining whether the F1 score is reasonable; when the F1 score is reasonable, go to step 906; when the F1 score is unreasonable, go to step 907;
  • Step 906 Instruct the user to create a new benchmark test task.
  • Step 907 indicating that the benchmark test task fails
  • an indication message that the benchmark test task fails is sent to the user.
  • The embodiments of the present application can be provided as a method, an apparatus or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.
  • In a typical configuration, the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology. The information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • As defined herein, computer readable media do not include transitory computer readable media, such as modulated data signals and carrier waves.
  • The embodiments of the present application are described with reference to flowcharts and/or block diagrams of the method, the terminal device (system) and the computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions.
  • These computer program instructions can be provided to a processor of a general purpose computer, a special purpose computer, an embedded processor or another programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be stored in a computer readable memory that can direct a computer or another programmable data processing terminal device to operate in a particular manner, so that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction device, and the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • The benchmark test method for a supervised learning algorithm in a distributed environment and the benchmark test device for a supervised learning algorithm in a distributed environment provided by the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core ideas. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementations and application scope according to the ideas of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A benchmark test method and device for a supervised learning algorithm in a distributed environment, the method comprising: obtaining a first benchmark test result determined according to output data in a benchmark test (101); obtaining distributed performance indicators in the benchmark test and determining the distributed performance indicators as a second benchmark test result (102); and combining the first benchmark test result and the second benchmark test result to obtain a benchmark test total result (103). A complete solution for the benchmark testing of supervised learning algorithms in a distributed environment is provided, which can assist technicians in evaluating the performance of supervised learning algorithms accurately and quickly.

Description

Benchmark test method and device for a supervised learning algorithm in a distributed environment
This application claims priority to Chinese Patent Application No. 201610158881.9, filed on March 18, 2016 and entitled "Benchmark test method and device for a supervised learning algorithm in a distributed environment", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of machine learning technology, and in particular to a benchmark test method for a supervised learning algorithm in a distributed environment and a benchmark test device for a supervised learning algorithm in a distributed environment.
Background Art
Machine learning is a multi-disciplinary subject that has emerged over the past twenty-odd years, involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. Machine learning algorithms are a class of algorithms that automatically analyze data to obtain rules and use those rules to predict unknown data.
At present, machine learning has very wide applications, for example: data mining, computer vision, natural language processing, biometric recognition, search engines, medical diagnosis, credit card fraud detection, securities market analysis, DNA sequencing, speech and handwriting recognition, strategy games and robotics.
In the field of machine learning, supervised learning, unsupervised learning and semi-supervised learning are three widely studied and widely applied machine learning techniques, briefly described as follows:
Supervised learning: a function is generated from the known correspondence between part of the existing input data and output data, mapping inputs to appropriate outputs, for example classification.
Unsupervised learning: the input data set is modeled directly, for example clustering.
Semi-supervised learning: data with class labels and data without class labels are used together to generate an appropriate classification function.
Depending on the deployment structure, supervised learning is divided into supervised learning in a stand-alone environment and supervised learning in a distributed environment; supervised learning in a distributed environment refers to a supervised learning solution in which supervised learning algorithms are executed by multiple devices, located at different physical positions, that have different and/or identical physical structures.
Owing to the complexity of device deployment, supervised learning in a distributed environment involves more factors in resource coordination, communication and consumption, which makes the benchmark testing of supervised learning algorithms in a distributed environment (that is, evaluating the performance of supervised learning algorithms in a distributed environment) more difficult.
At present, no complete and effective solution has been proposed for the benchmark testing of supervised learning algorithms in a distributed environment.
Summary of the Invention
In view of the above problems, the embodiments of the present application are proposed in order to provide a benchmark test method for supervised learning algorithms in a distributed environment that overcomes the above problems or at least partially solves them, and a corresponding benchmark test device for supervised learning algorithms in a distributed environment.
To solve the above problems, the present application discloses a benchmark test method for a supervised learning algorithm in a distributed environment, the method comprising:
obtaining a first benchmark test result determined according to output data in a benchmark test;
obtaining distributed performance indicators in the benchmark test, and determining the distributed performance indicators as a second benchmark test result;
combining the first benchmark test result and the second benchmark test result to obtain a benchmark test total result.
Preferably, before obtaining the first benchmark test result determined according to the output data in the benchmark test, the method further includes:
determining a supervised learning algorithm to be tested;
benchmarking the supervised learning algorithm to be tested according to an evaluation model to obtain output data;
determining the first benchmark test result according to the output data in the benchmark test.
Preferably, benchmarking the supervised learning algorithm to be tested according to the evaluation model to obtain output data includes:
benchmarking the supervised learning algorithm to be tested according to a cross-validation model to obtain output data; or,
benchmarking the supervised learning algorithm to be tested according to a Label proportional allocation model to obtain output data; or,
benchmarking the supervised learning algorithm to be tested according to the cross-validation model and the Label proportional allocation model respectively to obtain output data.
Preferably, benchmarking the supervised learning algorithm to be tested according to the cross-validation model to obtain output data includes:
taking a test data sample;
dividing the data in the test data sample into N equal parts;
performing M rounds of benchmark test on the N parts of data; wherein,
each round of the benchmark test includes the following steps:
determining N-1 of the N parts of data as training data and the remaining part as prediction data, wherein in the M rounds of benchmark test each part of data is determined as prediction data only once, and M and N are positive integers;
providing the determined N-1 parts of training data to the supervised learning algorithm to be tested for learning to obtain a function;
providing the input data in the determined part of prediction data to the function to derive output data.
Preferably, benchmarking the supervised learning algorithm to be tested according to the Label proportional allocation model to obtain output data includes:
taking a test data sample, the test data sample including data having a first mark and data having a second mark;
dividing the data having the first mark and the data having the second mark in the test data sample into N equal parts respectively;
performing M rounds of benchmark test on the resulting 2N parts of data; wherein,
each round of the benchmark test includes the following steps:
determining one of the N parts of data having the first mark as training data and one or more of the remaining parts as prediction data, and at the same time determining one of the N parts of data having the second mark as training data and one or more of the remaining parts as prediction data, wherein M and N are positive integers;
providing the determined training data having the first mark and the second mark to the supervised learning algorithm to be tested for learning to obtain a function;
providing the input data in the determined prediction data having the first mark and the second mark to the function to obtain output data.
Preferably, the first benchmark test result includes at least one of the following indicators: the true positives TP, the true negatives TN, the false positives FP, the false negatives FN, the precision Precision, the recall Recall and the accuracy Accuracy;
the second benchmark test result includes at least one of the following indicators: the processor usage CPU of the supervised learning algorithm to be tested, the memory usage MEM of the supervised learning algorithm to be tested, the number of iterations Iterate of the supervised learning algorithm to be tested, and the usage time Duration of the supervised learning algorithm to be tested.
Preferably, after the benchmark test total result is obtained, the method further includes:
determining an F1 score according to the first benchmark test result; and performing performance evaluation on the supervised learning algorithm to be tested in the following ways:
when the F1 scores are the same or close, the smaller the Iterate value of a supervised learning algorithm to be tested, the better its performance is determined to be; or,
when the F1 indicators are the same, the smaller the CPU, MEM, Iterate and Duration values of a supervised learning algorithm to be tested, the better its performance is determined to be.
To solve the above problems, the present application also discloses a benchmark test device for a supervised learning algorithm in a distributed environment, the device comprising: a first benchmark test result obtaining module, an indicator obtaining module, a second benchmark test result determining module and a benchmark test total result determining module; wherein,
the first benchmark test result obtaining module is configured to obtain a first benchmark test result determined according to output data in a benchmark test;
the indicator obtaining module is configured to obtain distributed performance indicators in the benchmark test;
the second benchmark test result determining module is configured to determine the distributed performance indicators as a second benchmark test result;
the benchmark test total result determining module is configured to combine the first benchmark test result and the second benchmark test result to obtain a benchmark test total result.
Preferably, the device further includes:
a determining module configured to determine a supervised learning algorithm to be tested before the first benchmark test result obtaining module obtains the first benchmark test result determined according to the output data in the benchmark test;
a benchmark test module configured to benchmark the supervised learning algorithm to be tested according to an evaluation model to obtain output data;
a first benchmark test result determining module configured to determine the first benchmark test result according to the output data in the benchmark test.
Preferably, the benchmark test module is configured to benchmark the supervised learning algorithm to be tested according to a cross-validation model; or to benchmark the supervised learning algorithm to be tested according to a Label proportional allocation model; or to benchmark the supervised learning algorithm to be tested according to the cross-validation model and the Label proportional allocation model respectively to obtain output data; wherein,
the benchmark test module includes: a first benchmark test sub-module and a second benchmark test sub-module; wherein,
the first benchmark test sub-module is configured to benchmark the supervised learning algorithm to be tested according to the cross-validation model or the Label proportional allocation model;
the second benchmark test sub-module is configured to benchmark the supervised learning algorithm to be tested according to the cross-validation model or the Label proportional allocation model.
Preferably, the first benchmark test sub-module includes:
a first data-taking unit configured to take a test data sample;
a first equal-dividing unit configured to divide the data in the test data sample into N equal parts;
a first determining unit configured to determine, in each round of the benchmark test, N-1 of the N parts of data as training data and the remaining part as prediction data, wherein in the M rounds of benchmark test each part of data is determined as prediction data only once, and M and N are positive integers;
a first providing unit configured to provide, in each round of the benchmark test, the determined N-1 parts of training data to the supervised learning algorithm to be tested for learning to obtain a function;
a second providing unit configured to provide, in each round of the benchmark test, the input data in the determined part of prediction data to the function to derive output data.
Preferably, the second benchmark test sub-module includes:
a second data-taking unit configured to take a test data sample, the test data sample including data having a first mark and data having a second mark;
a second equal-dividing unit configured to divide the data having the first mark and the data having the second mark in the test data sample into N equal parts respectively;
a second determining unit configured to determine, in each round of the benchmark test, one of the N parts of data having the first mark as training data and one or more of the remaining parts as prediction data, and at the same time determine one of the N parts of data having the second mark as training data and one or more of the remaining parts as prediction data, wherein M and N are positive integers;
a third providing unit configured to provide, in each round of the benchmark test, the determined training data having the first mark and the second mark to the supervised learning algorithm to be tested for learning to obtain a function;
a fourth providing unit configured to provide, in each round of the benchmark test, the input data in the determined prediction data having the first mark and the second mark to the function to derive output data.
Preferably, the first benchmark test result includes at least one of the following indicators:
the true positives TP, the true negatives TN, the false positives FP, the false negatives FN, the precision Precision, the recall Recall and the accuracy Accuracy;
the second benchmark test result includes at least one of the following indicators: the processor usage CPU of the supervised learning algorithm to be tested, the memory usage MEM of the supervised learning algorithm to be tested, the number of iterations Iterate of the supervised learning algorithm to be tested, and the usage time Duration of the supervised learning algorithm to be tested.
Preferably, the device further includes:
a performance evaluation module configured to determine an F1 score according to the first benchmark test result, and to perform performance evaluation on the supervised learning algorithm to be tested in the following ways:
when the F1 scores are the same or close, the smaller the number of iterations of a supervised learning algorithm to be tested, the better its performance is determined to be; or,
when the F1 indicators are the same, the smaller the CPU, MEM, Iterate and Duration values of a supervised learning algorithm to be tested, the better its performance is determined to be.
The embodiments of the present application include the following advantages:
The embodiments of the present application obtain a first benchmark test result determined according to the output data in a benchmark test and obtain a second benchmark test result by acquiring the distributed performance indicators in the benchmark test; the first benchmark test result and the second benchmark test result are then combined, so that the resulting benchmark test total result contains performance analysis indicators of different dimensions. Since multi-dimensional performance indicators can reflect the running performance of an algorithm to the greatest extent, a person skilled in the art can, by analyzing these benchmark test results of different dimensions, comprehensively and accurately evaluate the performance of a supervised learning algorithm in a distributed environment, avoiding the evaluation error caused by relying on a single performance indicator.
Further, the second benchmark test result contains the distributed performance indicators obtained from the distributed system, and these indicators accurately reflect the current hardware consumption of the distributed system while it runs the supervised learning algorithm. Therefore, by comprehensively analyzing the distributed performance indicators together with the first benchmark test result, the performance of the current distributed system when running the algorithm can be judged accurately and quickly, overcoming the problem in the prior art that supervised learning algorithms in a distributed environment cannot be benchmarked because no complete benchmarking solution for them exists.
Brief Description of the Drawings
FIG. 1 is a flow chart of the steps of an embodiment of a benchmark test method for a supervised learning algorithm in a distributed environment according to a method embodiment of the present application;
FIG. 2 is a flow chart of the steps of an embodiment of a benchmark test method for a supervised learning algorithm in a distributed environment according to a method embodiment of the present application;
FIG. 3 is a structural block diagram of an embodiment of a benchmark test device for a supervised learning algorithm in a distributed environment according to a device embodiment of the present application;
FIG. 4 is a structural block diagram of an embodiment of a benchmark test device for a supervised learning algorithm in a distributed environment according to a device embodiment of the present application;
FIG. 5 is a structural block diagram of an embodiment of a benchmark test device for a supervised learning algorithm in a distributed environment according to a device embodiment of the present application;
FIG. 6 is a schematic diagram of the logical order of data type division in each round of the benchmark test process according to an embodiment of a benchmark test method for a supervised learning algorithm in a distributed environment provided by an example of the present application;
FIG. 7 is a structural diagram of a benchmark test system for a supervised learning algorithm in a distributed environment according to an example of the present application;
FIG. 8 is a service flow diagram of an embodiment of Benchmark testing using a cross-validation model and a Label proportional allocation model according to an embodiment of the present application;
FIG. 9 is a processing flow chart of a supervised learning algorithm in a distributed environment according to an example of the present application.
具体实施方式
为使本申请的上述目的、特征和优点能够更加明显易懂,下面结合附图和具体实施方式对本申请作进一步详细的说明。
在资源使用方面,分布式环境下的监督学习和传统的单机环境下的监督学习的区别在于分布式环境下监督学习的资源不易被计算统计,以一份128M的训练数据为例,在单机环境下计算执行监督学习算法过程中cpu和内存的消耗很容易,然而,在分布式环境下执行监督学习算法时,所有计算资源由若干台机器上所产生的数据结果组成。
以5台2核4G内存的机器集群为例，其总资源为10核、20G内存。假设一个监督学习算法的训练数据为128M，这128M的训练数据在训练阶段会发生数据膨胀，分布式环境下可以根据数据大小对数据进行切片从而进行资源的申请。比如，训练数据膨胀到了1G，以256M数据一个实例（instance）来计算，则需要4个instance来完成这个算法任务。假设为每个instance动态申请CPU和内存，在分布式环境下4个instance同时运行，加上分布式情况下各种资源间相互协调，最终，该任务消耗的CPU、内存需要同时统计4个instance下的资源消耗，而各个instance下的资源消耗是不容易被统计的。
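为便于理解上述实例数与资源消耗的估算方式，下面给出一段示意性的Python代码（假设性示例，其中的切片大小、各instance的资源数值等均为假设取值，并非本申请方案的限定），按数据切片大小估算实例数，并对各instance上报的资源消耗进行简单汇总：

```python
def estimate_instances(expanded_data_mb, slice_mb=256):
    """按每个instance处理slice_mb大小的数据切片，向上取整得到所需instance数。"""
    return -(-expanded_data_mb // slice_mb)

def aggregate_resources(instance_stats):
    """instance_stats: 各instance上报的资源消耗列表，如[{'cpu': 1.2, 'mem_mb': 800}, ...]。"""
    return {
        'cpu': sum(s['cpu'] for s in instance_stats),
        'mem_mb': sum(s['mem_mb'] for s in instance_stats),
    }

# 例如：训练数据膨胀到1G（约1024M）、以256M数据一个instance计算时，需要4个instance
print(estimate_instances(1024))  # 输出 4
```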
针对分布式环境下资源消耗不易统计的这一问题,本申请实施例的核心构思之一在于,获取根据基准测试中的输出数据所确定的第一基准测试结果;获取所述基准测试中的分布式性能指标,将所述分布式性能指标确定为第二基准测试结果;将所述第一基准测试结果和第二基准测试结果合并得到基准测试总结果。
方法实施例一
参照图1，示出了本申请的一种分布式环境下监督学习算法的基准测试(benchmark)方法实施例的步骤流程图，具体可以包括如下步骤：
步骤101、获取根据基准测试中的输出数据所确定的第一基准测试结果;
基于基准测试过程中所获得的输出数据,可以确定第一基准测试结果,该第一基准测试结果是对所述输出数据进行分析而获得的分析结果。
具体应用中，所述第一基准测试结果可以包括以下性能指标至少其中之一：判断为真的正确率（True Positives，TP）、判断为假的正确率（True Negative，TN）、误报率（False Positives，FP）、漏报率（False Negative，FN）、精度Precision、召回率Recall、准确度Accuracy。
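作为对上述指标含义的一个直观说明，下面给出一段示意性的Python代码（假设性示例，以正类标记为1的二分类场景为例，函数名与变量名均为示例性假设），根据预测输出与标准输出统计TP、TN、FP、FN，并据此计算Precision、Recall和Accuracy：

```python
def first_benchmark_result(y_true, y_pred, positive=1):
    """y_true: 标准输出数据；y_pred: 基准测试得到的输出数据（二分类示意）。"""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return {'TP': tp, 'TN': tn, 'FP': fp, 'FN': fn,
            'Precision': precision, 'Recall': recall, 'Accuracy': accuracy}
```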
步骤102、获取所述基准测试中的分布式性能指标,将所述分布式性能指标确定为第二基准测试结果;
具体的,在分布式环境下的监督学习算法基准测试过程中,所需要获取的分布式性能指标为对监督学习算法基准测试过程中所产生的硬件消耗信息,如,处理器使用情况CPU、内存使用情况MEM、算法迭代次数Iterate及算法使用时间Duration等等。
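下面给出一段示意性的Python代码（假设性示例），说明在单个计算节点上采集这类硬件消耗信息的一种可能做法；示例中假设运行环境已安装psutil库、且待测算法的运行函数返回其迭代次数，实际分布式系统中通常由各节点的监控组件采集后再统一汇总：

```python
import time
import psutil  # 假设运行环境已安装psutil

def sample_distributed_metrics(run_algorithm):
    """run_algorithm: 无参可调用对象，假设其返回值为算法迭代次数Iterate。"""
    proc = psutil.Process()
    start = time.time()
    proc.cpu_percent(interval=None)              # 重置CPU占用统计的起点
    iterate = run_algorithm()                    # 运行待测试监督学习算法
    cpu = proc.cpu_percent(interval=None)        # 自上次调用以来的平均CPU占用（百分比）
    mem = proc.memory_info().rss / 1024 / 1024   # 当前进程内存占用（MB）
    duration = time.time() - start               # 运行时长（秒）
    return {'CPU': cpu, 'MEM': mem, 'Iterate': iterate, 'Duration': duration}
```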
需要说明的是,在具体应用时,本领域技术人员还可根据实际所选择的不同评估模型确定上述第一基准测试结果和第二基准测试结果中所包含的性能指标,本申请对性能指标的内容不作限制。
步骤103、将所述第一基准测试结果和第二基准测试结果合并得到基准测试总结果。
具体应用时,可将第一基准测试结果和第二基准测试结果中的各个性能指标数据以表格、图形、曲线等多种方式进行合并展示,例如,参见表1所示,是以评估维度表的形式对所述合并得到的基准测试总结果进行展示:
TP | FP | TN | FN | CPU | MEM | Iterate | Duration
表1
容易理解的是,基准测试总结果无论以何种形式展现,其都能够从多个维度反映算法的性能指标信息,基于这些信息,具备专业知识的技术人员可以对这些信息进行分析,从而对待测试监督学习算法的性能进行评估。也就是说,本申请实施例一所提供的方法能够协助技术人员完成对监督学习算法的性能评估。
综上，本申请实施例获取根据基准测试中的输出数据所确定的第一基准测试结果，以及获取基准测试中的分布式性能指标得到第二基准测试结果，然后，通过合并所述第一基准测试结果和第二基准测试结果，使得合并后得到的基准测试总结果包含了不同维度的性能分析指标，由于多维度的性能指标能够最大程度地表现算法的运行性能，因此，本领域技术人员通过分析该不同维度的基准测试结果就能够对分布式环境下的监督学习算法进行全面、准确地性能评估，避免了性能指标单一所带来的评估误差。
进一步的,由于第二基准测试结果中包含了从分布式系统中所获取的分布式性能指标,而这些分布式性能指标能够准确反映当分布式系统运行监督学习算法时系统当前的硬件消耗信息,因此,通过对这些分布式性能指标和第一基准测试结果进行综合分析,即可对当前分布式系统运行算法时的性能状况进行准确、快速地判断,克服了现有技术中,由于不具备对分布式环境下的监督学习算法进行基准测试的完整方案而无法对分布式环境下的监督学习算法进行基准测试的问题。
另外,基于本申请实施例提供的一种基准测试方法可以构建基准测试平台,该基准测试方法或平台能够基于对分布式环境下监督学习算法执行过程中所获取的输出数据和分布式性能指标进行分析,从而对分布式环境下的监督学习算法进行全面、准确地性能评估。
方法实施例二
参照图2,示出了本申请的一种分布式环境下监督学习算法的基准测试方法实施例的步骤流程图,具体可以包括如下步骤:
步骤201、确定待测试监督学习算法;
具体的,在该步骤中需要确定出一个待测试监督学习算法,之后,对该待测试监督学习算法进行基准测试,从而对该待测试监督学习算法的性能进行评估。
由于机器学习技术的广泛应用,不同领域针对不同应用场景会产生各种各样的学习算法,而对不同学习算法的性能进行评估就成为了一项重要内容。
本申请实施例二所提供的方法,主要对分布式环境下的监督学习算法进行基准测试。
该步骤可以由用户进行选择,实际实现中,用户可以直接将某一监督学习算法提交至基准测试系统,则基准测试系统将接收到的监督学习算法确定为待测试监督学习算法;或者,用户在基准测试系统中的选择界面中选择需要被测试的监督学习算法,则基准测试系统将用户所选择的监督学习算法确定为待测试监督学习算法。
步骤202、按照评估模型对所述待测试监督学习算法进行基准测试得到输出数据;
这一步骤之前,需要预先设置评估模型,该模型具备对待测试监督学习算法进行基准测试的功能。
具体的，在算法评估领域，交叉验证模型和标记Label按比例分配模型是被广泛应用的两种模型，具备较高的准确度和算法稳定性，因此，本申请实施例选择这两种模型作为评估模型示例对本申请提供的方法进行描述；
即,在步骤202中,所述评估模型包括:交叉验证模型和/或标记Label按比例分配模型。
因此,所述按照评估模型对所述待测试监督学习算法进行基准测试,包括:
按照交叉验证模型对所述待测监督学习算法进行基准测试;或者,
按照标记Label按比例分配模型对所述待测监督学习算法进行基准测试;或者,
按照交叉验证模型和Label按比例分配模型分别对所述待测监督学习算法进行基准测试。
参照图8,图8示出的是本申请一个采用交叉验证模型和Label按比例分配模型进行Benchmark基准测试实施例的业务流程图。具体实现时,用户可根据需要选择上述两种模型中其中任意一种模型运行任务并得到展示结果。
在本申请的一个可选实施例中,所述按照交叉验证模型对所述待测试监督学习算法进行基准测试得到输出数据,包括以下步骤:
步骤一、取一测试数据样本;
具体的,测试数据样本通常为一实测数据样本,该数据样本中包括多条数据,每一条数据均包括输入数据和输出数据,而每一条数据中的输入和输出的值通常都为实际的监测值,也可以分别称为标准输入数据和标准输出数据。例如,某一个对房价进行预测的数据样本中,每一条数据的输入为房子大小,对应的输出为均价,其具体取值均为获取的真实值。
步骤二、将所述测试数据样本中的数据等分为N份;
步骤三、对所述N份数据执行M轮基准测试;
其中,在每一轮基准测试中,包括以下步骤:
将所述N份数据中的N-1份确定为训练数据、其余一份确定为预测数据,其中,M轮基准测试中,每一份数据仅有一次被确定为预测数据的机会,M、N为正整数;将所确定的N-1份训练数据提供给所述待测试监督学习算法进行学习得到一个函数;将所确定的一份预测数据中的输入数据提供给所述函数,得出输出数据。
下面通过一个具体应用示例对上述按照交叉验证模型对所述待测试监督学习算法进行基准测试的方法进行详细介绍:
假设，取一个包含1000条数据的测试数据样本1，按照预设规则，N=5，因此，基准测试系统首先将所述测试数据样本1中的数据等分为5份，分别为数据1、数据2、数据3、数据4及数据5，这样，每份包含200条数据；M值也为5，这样基准测试系统对所述5份数据进行5轮基准测试。
每轮基准测试中,需要对数据类型进行划分,具体的,N-1=4,因此,选择4份作为训练数据,1份作为预测数据。
图6为一种数据类型划分方法的示意图,如图6所示,每一行示出的是5份数据在一轮基准测试中的数据划分方式,其中,每一行中从左至右依次为数据1至数据5的划分方式;第一行中,数据1至数据4被划分为训练数据,数据5为预测数据;第二行中,数据1至数据3及数据5被划分为训练数据,数据4为预测数据;第三行中,数据1、数据2、数据4至数据5为训练数据,而数据3为预测数据;依次类推,第四行中,数据2为预测数据,其余为训练数据;第五行中,数据1为预测数据,其余为训练数据;对数据划分完成之后,需要对数据进行五轮基准测试,在每一轮基准测试中,将所确定的4份训练数据提供给待测试监督学习算法进行学习,得到一个函数(或者,也可称为模型),接下来,将剩余的一份预测数据中的输入数据提供给所述函数,就可以得到输出数据,该输出数据是使用所述函数对输入数据进行预测后得到的预测值;这样,五轮基准测试完成后,可以得到5组输出数据。
需要说明的是,五轮基准测试中,可以按照图6给出的方式中的逻辑顺序对每一轮基准测试过程中的数据类型进行划分,也可以按照其它逻辑顺序对基准测试过程中的数据类型进行划分,例如,将图6中自上至下的行与行之间的次序打乱,只要确保M轮基准测试中,每一份数据只有一次机会被确定为预测数据即可。
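结合上述示例，下面给出一段按交叉验证模型划分数据并组织M轮基准测试的示意性Python代码（假设性示例，N取5仅为与上述示例一致的假设取值）：样本被等分为N份，共进行M=N轮测试，每轮取其中1份作预测数据、其余N-1份作训练数据，从而保证每份数据仅有一次被确定为预测数据。

```python
def cross_validation_rounds(samples, n=5):
    """samples: 测试数据样本列表；返回每一轮的(训练数据, 预测数据)划分。"""
    fold_size = len(samples) // n
    folds = [samples[i * fold_size:(i + 1) * fold_size] for i in range(n)]
    rounds = []
    for i in range(n):                       # M = N 轮基准测试
        predict_fold = folds[i]              # 本轮的1份预测数据
        train_folds = [x for j, fold in enumerate(folds) if j != i for x in fold]
        rounds.append((train_folds, predict_fold))
    return rounds

# 例如：1000条数据、N=5时，每轮训练数据800条、预测数据200条，共5轮
```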
在本申请的另一可选实施例中,所述按照Label按比例分配模型对所述待测试监督学习算法进行基准测试得到输出数据,包括以下步骤:
步骤一、取一测试数据样本,所述测试数据样本包括:具备第一标记的数据和具备第二标记的数据;
需要说明的是,在该方案中,所述测试数据样本中包括且仅包括具备第一标记的数据和具备第二标记的数据,第一标记和第二标记是指基于某特定需要而用于对数据进行分类的标记,因此,该方案应用于包含两类数据的二分类场景下。
步骤二、分别将所述测试数据样本中具备第一标记的数据和具备第二标记的数据等分为N份;
步骤三、对所述N份数据执行M轮基准测试:
其中,在每一轮基准测试中,包括以下步骤:
将所述N份具备第一标记的数据中的一份确定为训练数据、并将剩余数据中的一份或多份确定为预测数据,同时,将所述N份具备第二标记的数据中的一份确定为训练数据、并将剩余数据中的一份或多份确定为预测数据,其中,M、N为正整数;将所确定的具备第一标记和第二标记的训练数据提供给所述待测试监督学习算法进行学习得到一个函数;将所确定的具备第一标记和第二标记的预测数据中的输入数据提供给所述函数,得出输出数据。
具体的,第一标记和第二标记只是用于对不同标记进行区分,并不用于限定。实际应用中,第一标记和第二标记可以使用不同的标记符号,例如,第一标记可以为1,第二标记为0;或者,第一标记为Y,第二标记为N等等。
下面通过一个应用示例对按照Label按比例分配模型对所述待测试监督学习算法进行基准测试的方法进行详细介绍:
Label按比例分配模型是先根据label值对数据进行分类，之后对每个类别的数据进行等比例划分，然后再以不同比例的组合进行训练。
假设，一个测试数据样本2包含1000条数据，其中，600条数据的label值为1、400条数据的label值为0。按照Label按比例分配模型，可以把600条label值为1的数据分成10份，每份60条数据；将400条label值为0的数据也分成10份，每份40条数据。所述测试数据样本2的划分方法如表2所示，其中，每一行代表一份数据，数据1至数据10代表10份label值为1的数据，数据11至数据20代表10份label值为0的数据。
测试数据样本2 Label
数据1 1
数据2 1
数据3 1
数据4 1
数据5 1
数据6 1
数据7 1
数据8 1
数据9 1
数据10 1
数据11 0
数据12 0
数据13 0
数据14 0
数据15 0
数据16 0
数据17 0
数据18 0
数据19 0
数据20 0
表2
在进行基准测试时，基准测试系统可以将1份label值为1的数据和1份label值为0的数据确定为训练数据，并将另外1份label值为1的数据和1份label值为0的数据确定为预测数据，或者将1份以上的label值为1的数据和label值为0的数据确定为预测数据。
对数据划分完成之后，就可以对数据进行基准测试，假设M=4，则需要进行四轮基准测试。在每一轮基准测试中，将所确定的训练数据提供给待测试监督学习算法进行学习，得到一个函数（或者，也可称为模型），接下来，将预测数据中的输入数据提供给所述函数，就可以得到输出数据，该输出数据是使用所述函数对输入数据进行预测后得到的预测值；这样，四轮基准测试完成后，可以得到四组输出数据。
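下面给出一段与上述划分方式对应的示意性Python代码（假设性示例，份数N与每轮所取的份数均为示例取值）：先按label值对样本分组，再把每一类等分为N份，每轮测试从每一类中各取1份作训练数据，并从剩余份中取出指定的一份或多份作预测数据。

```python
def split_by_label(samples, n=10):
    """samples: [(输入数据, label), ...]；返回 {label值: [等分后的N份数据]}。"""
    groups = {}
    for item in samples:
        groups.setdefault(item[1], []).append(item)
    parts = {}
    for label, items in groups.items():
        size = len(items) // n
        parts[label] = [items[i * size:(i + 1) * size] for i in range(n)]
    return parts

def one_round(parts, train_idx, predict_idxs):
    """每一类取下标为train_idx的1份作训练数据，取predict_idxs指定的一份或多份作预测数据。"""
    train = [x for label in parts for x in parts[label][train_idx]]
    predict = [x for label in parts for i in predict_idxs for x in parts[label][i]]
    return train, predict
```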
相应的，所述按照交叉验证模型和Label按比例分配模型分别对所述待测监督学习算法进行基准测试，是指将测试数据样本分别按照交叉验证模型和Label按比例分配模型进行基准测试，这样，在每种评估模型下将分别得到一组输出数据，将这两组输出数据共同确定为整个基准测试过程的输出数据。
步骤203、获取根据基准测试中的输出数据所确定的第一基准测试结果;
具体的,通过基准测试获得输出数据以后,可以根据输出数据与标准输出数据,即,输入数据在测试数据样本中所对应的输出数据的偏差来确定多个参数指标,具体应用中,所述第一基准测试结果可以包括以下性能指标至少其中之一:TP、TN、FP、FN、Precision、Recall、Accuracy。
步骤204、获取所述基准测试中的分布式性能指标,将所述分布式性能指标确定为第二基准测试结果;
具体的,基准测试系统中的系统性能检测模块能够在基准测试过程中获得各种分布式性能指标,这些分布式性能指标即为第二基准测试结果,具体的,所述分布式性能指标,包括以下指标至少其中之一:待测试监督学习算法对处理器的使用情况CPU、待测试监督学习算法对内存的使用情况MEM、待测试监督学习算法的迭代次数Iterate及待测试监督学习算法的使用时间Duration。
步骤205、将所述第一基准测试结果和第二基准测试结果合并得到基准测试总结果。
在对待测试监督学习算法进行基准测试(也就是性能评估)时,需要结合第一基准测试结果和第二基准测试结果来进行综合分析。
因此,可以在获得第一基准测试结果和第二基准测试结果之后,将这两种基准测试结果合并,生成这些结果所对应的列表,并将该列表通过显示屏显示给用户,当用户为具备算法评估分析能力的技术人员时,可以直接根据列表中所呈现的数据进行综合分析,从而对待测试监督学习算法的性能进行评估。
一个示例性的基准测试总结果列表如下表3所示:
TP | FP | TN | FN | Precision | Recall | Accuracy | CPU | MEM | Iterate | Duration
表3
该列表可以包括一行或多行输出结果，每一行输出结果对应一轮基准测试所确定的第一基准测试结果和第二基准测试结果；或者，每一行输出结果对应多轮基准测试综合分析后所确定的第一基准测试结果和第二基准测试结果。
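作为展示形式的一个示意，下面的Python代码（假设性示例，列名与展示形式仅为示例）将第一基准测试结果与第二基准测试结果合并为一行记录，并按表3的列顺序以文本列表形式输出：

```python
COLUMNS = ['TP', 'FP', 'TN', 'FN', 'Precision', 'Recall', 'Accuracy',
           'CPU', 'MEM', 'Iterate', 'Duration']

def merge_results(first_result, second_result):
    """将第一基准测试结果与第二基准测试结果合并为基准测试总结果的一行记录。"""
    total = dict(first_result)
    total.update(second_result)
    return [total.get(col, '') for col in COLUMNS]

def render_table(rows):
    """以制表符分隔的文本形式展示基准测试总结果列表。"""
    lines = ['\t'.join(COLUMNS)]
    lines += ['\t'.join(str(v) for v in row) for row in rows]
    return '\n'.join(lines)
```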
步骤206、根据所述基准测试结果对所述待测试监督学习算法进行性能评估。
具体的,所述根据所述基准测试结果对所述待测试监督学习算法进行性能评估,包括:
根据所述第一基准测试结果确定F1得分;以及,通过以下方式对所述待测试监督学习算法进行性能评估:
当F1得分相同或者接近时，待测试监督学习算法的迭代次数越小则待测试监督学习算法性能越好。依据这种方式可以直接对待测试监督学习算法的性能进行评估，也就是，在F1得分相同或相近时，确定待测试监督学习算法的迭代次数，而迭代次数越小的待测试监督学习算法被确定为性能更好。
其中，F1得分，即F1 score，可以看作是算法精度（precision）和召回率（recall）的一种加权平均，是用于评估待测试监督学习算法好坏的一个重要指标，其计算公式如下：
F1 = 2 × precision × recall / (precision + recall)
其中,precision和recall均为第一基准测试结果中的指标,具体的,precision为精度,recall为召回率。
因此,在这种性能评估方式中,只需要确定precision、recall及待测试监督学习算法的迭代次数的取值,即可对待测试监督学习算法的性能进行评估。
另外,也可以通过以下方式对所述待测试监督学习算法进行性能评估:
当F1指标相同时,待测试监督学习算法的CPU、MEM、Iterate及Duration值越小,则确定待测试监督学习算法性能越好。
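下面用一段示意性的Python代码（假设性示例，其中判断F1得分“接近”的阈值f1_tol为假设参数）概括上述评估方式：先由precision与recall计算F1得分，当两个待测算法的F1得分相同或接近时，再比较Iterate等指标，取值越小者性能越好。

```python
def f1_score(precision, recall):
    """根据精度precision与召回率recall计算F1得分。"""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def better_algorithm(result_a, result_b, f1_tol=0.01):
    """result_a、result_b: 两个算法的基准测试总结果（含Precision、Recall、Iterate等字段）。"""
    f1_a = f1_score(result_a['Precision'], result_a['Recall'])
    f1_b = f1_score(result_b['Precision'], result_b['Recall'])
    if abs(f1_a - f1_b) <= f1_tol:   # F1得分相同或接近时，比较迭代次数
        return result_a if result_a['Iterate'] <= result_b['Iterate'] else result_b
    return result_a if f1_a > f1_b else result_b
```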
上述方案中,也可以将基准测试结果和F1得分同时列表输出,方便技术人员查看和分析。一个示例性的列表如下表4所示,表4是本申请另一个示例的基准测试结果和F1得分同时输出的示意表:
F1 | TP | FP | TN | FN | Precision | Recall | Accuracy | CPU | MEM | Iterate | Duration
表4
在本申请的另一种可选实施例中,对待测试监督学习算法进行性能评估之后,可以将性能评估结果发送给用户,具体的,可以将性能评估结果展示于显示界面之上,供用户查看,从而辅助用户进行算法性能评估。
在本申请的另一种可选实施例中,所述方法还包括:
判断F1得分的偏差是否合理，如果合理，确定基准测试成功；如果不合理，确定基准测试不成功，且向用户发送报警指示信息。由于F1得分是用于判断待测试监督学习算法性能的一个重要指标，在实际应用中，用户可以针对不同待测试监督学习算法预先设置F1得分的一个标准值，并设置偏差范围，当F1得分的偏差在用户设置的范围内，则确定基准测试成功，如果F1得分的偏差超出用户设置的范围，则确定基准测试不成功，用户可以重新进行测试。
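下面的示意性Python代码（假设性示例，标准值与允许偏差均由用户预先设定）说明这种判断逻辑：实际F1得分与标准值的偏差落在允许范围内则判定基准测试成功，否则判定不成功并返回报警提示信息。

```python
def check_benchmark(f1, standard_f1, allowed_deviation):
    """返回(基准测试是否成功, 提示信息)。"""
    deviation = abs(f1 - standard_f1)
    if deviation <= allowed_deviation:
        return True, '基准测试成功'
    return False, '基准测试不成功：F1偏差%.4f超出允许范围%.4f，请重新测试' % (deviation, allowed_deviation)
```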
综上,本申请实施例二所提供的方法,通过对基准测试总结果作进一步性能分析确定F1值,然后,可基于该F1值直接对监督算法在分布式环境下的运行性能做出判断并将判断结果提供给用户,使得本领域技术人员能够从输出结果中直观地获知监督学习算法在分布式环境下的运行性能,与上述实施例一相比,由于用户无需重新计算分析指标,因此减少了用户分析判断所需的时间,进一步提高了分析效率。
需要说明的是,对于方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请实施例并不受所描述的动作顺序的限制,因为依据本申请实施例,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作并不一定是本申请实施例所必须的。
装置实施例
参照图3,示出了本申请的一种分布式环境下监督学习算法的基准测试装置实施例的结构框图,具体可以包括:第一基准测试结果获取模块31、指标获取模块32、第二基准测试结果确定模块33及基准测试总结果确定模块34;其中,
所述第一基准测试结果获取模块31，用于获取根据基准测试中的输出数据所确定的第一基准测试结果；
所述指标获取模块32,用于获取所述基准测试中的分布式性能指标;
所述第二基准测试结果确定模块33,用于将所述分布式性能指标确定为第二基准测试结果;
所述基准测试总结果确定模块34,用于将所述第一基准测试结果和第二基准测试结果合并得到基准测试总结果。
在本申请的一种可选实施例中,如图4所示,所述装置还包括:
确定模块35,用于在所述第一基准测试结果获取模块获取根据基准测试中的输出数据所确定第一基准测试结果之前,确定待测试监督学习算法;
所述基准测试模块36,用于按照评估模型对所述待测试监督学习算法进行基准测试得到输出数据;
所述第一基准测试结果确定模块37，用于根据基准测试中的输出数据确定第一基准测试结果。
具体的,所述基准测试模块36,用于按照交叉验证模型对所述待测监督学习算法进行基准测试;或者,按照标记Label按比例分配模型对所述待测监督学习算法进行基准测试;或者,按照交叉验证模型和Label按比例分配模型分别对所述待测监督学习算法进行基准测试得到输出数据;其中,
所述基准测试模块36,包括:第一基准测试子模块和第二基准测试子模块;其中,
所述第一基准测试子模块,用于按照交叉验证模型或标记Label按比例分配模型对所述待测监督学习算法进行基准测试;
所述第二基准测试子模块,用于按照交叉验证模型或标记Label按比例分配模型对所述待测监督学习算法进行基准测试。
具体的,所述第一基准测试子模块,包括:
第一取数据单元,用于取一测试数据样本;
第一等分单元,用于将所述测试数据样本中的数据等分为N份;
第一确定单元,用于在每一轮基准测试中,将所述N份数据中的N-1份确定为训练数据、其余一份确定为预测数据,其中,M轮基准测试中,每一份数据仅有一次被确定为预测数据的机会,M、N为正整数;
第一提供单元,用于在每一轮基准测试中,将所确定的N-1份训练数据提供给所述待测试监督学习算法进行学习得到一个函数;
第二提供单元,用于在每一轮基准测试中,将所确定的一份预测数据中的输入数据提供给所述函数,得出输出数据。
具体的,所述第二基准测试子模块,包括:
第二取数据单元,用于取一测试数据样本,所述测试数据样本包括:具备第一标记的数据和具备第二标记的数据;
第二等分单元,用于分别将所述测试数据样本中具备第一标记的数据和具备第二标记的数据等分为N份;
第二确定单元,用于在每一轮基准测试中,将所述N份具备第一标记的数据中的一份确定为训练数据、并将剩余数据中的一份或多份确定为预测数据,同时,将所述N份具备第二标记的数据中的一份确定为训练数据、并将剩余数据中的一份或多份确定为预测数据,其中,M、N为正整数;
第三提供单元，用于在每一轮基准测试中，将所确定的具备第一标记和第二标记的训练数据提供给所述待测试监督学习算法进行学习得到一个函数；
第四提供单元,用于在每一轮基准测试中,将所确定的具备第一标记和第二标记的预测数据中的输入数据提供给所述函数,得出输出数据。
具体的,所述第一基准测试结果包括以下指标至少其中之一:
判断为真的正确率TP、判断为假的正确率TN、误报率FP、漏报率FN、精度Precision、召回率Recall及准确度Accuracy;
所述第二基准测试结果包括以下指标至少其中之一:待测试监督学习算法对处理器的使用情况CPU、待测试监督学习算法对内存的使用情况MEM、待测试监督学习算法的迭代次数Iterate及待测试监督学习算法的使用时间Duration。
在本申请的另一种可选实施例中,如图5所示,所述装置还包括:性能评估模块38,用于根据所述第一基准测试结果确定F1得分;以及,用于通过以下方式对所述待测试监督学习算法进行性能评估:
当F1得分相同或者接近时,待测试监督学习算法的迭代次数越小则确定待测试监督学习算法性能越好;或者,
当F1指标相同时,待测试监督学习算法的CPU、MEM、Iterate及Duration值越小,则确定待测试监督学习算法性能越好。
其中，F1得分，即F1 score，可以看作是算法精度（precision）和召回率（recall）的一种加权平均，是用于评估待测试监督学习算法好坏的一个重要指标，其计算公式如下：
F1 = 2 × precision × recall / (precision + recall)
其中,precision和recall均为第一基准测试结果中的指标,具体的,precision为精度,recall为召回率。
在具体实施过程中,上述第一基准测试结果获取模块31、指标获取模块32、第二基准测试结果确定模块33、基准测试总结果确定模块34、确定模块35、基准测试模块36、第一基准测试结果确定模块37及性能评估模块38可以由基准测试系统内的中央处理器(CPU,Central Processing Unit)、微处理器(MPU,Micro Processing Unit)、数字信号处理器(DSP,Digital Signal Processor)或可编程逻辑阵列(FPGA,Field-Programmable Gate Array)来实现。
对于装置实施例而言，由于其与方法实施例基本相似，所以描述的比较简单，相关之处参见方法实施例的部分说明即可。
应用实例
图7为一种示例性的基准测试系统的结构图,该基准测试系统包括:任务新建模块71、任务拆分模块72、任务执行模块73、数据统计模块74、分布式指标采集模块75及数据存储模块76;其中,
所述任务新建模块71,用于根据用户指示建立基准测试任务;
具体的,用户确定待测试监督学习算法,从而建立针对该待测试监督学习算法的基准测试任务。
所述任务拆分模块72,用于对用户指示建立的基准测试任务进行拆分;
当用户所设置的待测试监督学习算法包括一种以上时,将每一种待测试监督学习算法拆分为一个基准测试任务。
所述任务执行模块73,用于对所述基准测试任务进行基准测试并生成测试数据;
所述数据统计模块74，用于对基准测试过程中生成的测试数据进行统计，得到基准测试结果；
具体的，将基准测试过程中生成的测试数据合并得到基准测试结果。
所述分布式指标采集模块75,用于采集基准测试过程中所产生的分布式指标;
所述数据存储模块76,用于对所述基准测试结果和分布式指标进行存储。
其中,所述任务执行模块73,进一步包括:训练模块731、预测模块732及分析模块733;其中,所述训练模块731,用于将训练数据提供给所述待测试监督学习算法进行学习得到一个函数;所述预测模块732,用于将预测数据提供给所述函数,得到输出数据。所述分析模块733,用于根据所述输出数据生成测试数据。
基于上述基准测试系统,一种示例性的基准测试方法的步骤流程图如图9所示,该方法包括以下步骤:
步骤901、新建任务;
具体的,用户根据需要新建一个任务,该任务针对一特定监督学习算法,因此用户需要设置待测试的监督学习算法;
步骤902、执行任务;
具体的,按照交叉验证模型或者按比例分配模型对所述监督学习算法进行基准测试。
步骤903、生成基准测试总结果;
这里的基准测试总结果包括：对所述监督学习算法进行基准测试时根据测试数据所确定的基准测试结果和基准测试执行过程中所获取的分布式指标。
步骤904、确定F1得分;
具体的,根据所述基准测试结果确定F1得分。
步骤905、判断F1得分是否合理;当F1得分合理时,转至步骤906;当F1得分不合理时,转至步骤907;
步骤906、指示用户新建基准测试任务;
同时,指示用户上一个基准测试任务测试成功。
步骤907、指示基准测试任务失败;
具体的,向用户发出基准测试任务失败的指示消息。
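下面用一段示意性的Python代码（假设性示例）串联图9所示的处理流程，其中benchmark为执行基准测试并返回第一、第二基准测试结果的占位函数，f1_score与check_benchmark沿用前文示例中的同名函数，标准值与允许偏差为用户设定的假设参数：

```python
def run_benchmark_task(algorithm, samples, benchmark, standard_f1, allowed_deviation):
    """串联新建任务之后的各步骤：执行任务、生成总结果、确定并校验F1得分。"""
    first_result, second_result = benchmark(algorithm, samples)        # 步骤902：执行任务
    total_result = {**first_result, **second_result}                   # 步骤903：基准测试总结果
    f1 = f1_score(first_result['Precision'], first_result['Recall'])   # 步骤904：确定F1得分
    ok, message = check_benchmark(f1, standard_f1, allowed_deviation)  # 步骤905：判断是否合理
    if ok:
        return total_result, '基准测试任务成功，可新建下一个基准测试任务'  # 步骤906
    return total_result, message                                        # 步骤907：指示任务失败
```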
本说明书中的各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似的部分互相参见即可。
本领域内的技术人员应明白，本申请的实施例可提供为方法、装置、或计算机程序产品。因此，本申请实施例可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且，本申请实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质（包括但不限于磁盘存储器、CD-ROM、光学存储器等）上实施的计算机程序产品的形式。
在一个典型的配置中,所述计算机设备包括一个或多个处理器(CPU)、输入数据/输出数据接口、网络接口和内存。内存可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带,磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括非持续性的电脑可读媒体(transitory media),如调制的数据信号和载波。
本申请实施例是参照根据本申请实施例的方法、终端设备（系统）、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理终端设备的处理器以产生一个机器，使得通过计算机或其他可编程数据处理终端设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理终端设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理终端设备上,使得在计算机或其他可编程终端设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程终端设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
尽管已描述了本申请实施例的优选实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例做出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本申请实施例范围的所有变更和修改。
最后,还需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者终端设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者终端设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者终端设备中还存在另外的相同要素。
以上对本申请所提供的一种分布式环境下监督学习算法的基准测试方法和一种分布式环境下监督学习算法的基准测试装置，进行了详细介绍，本文中应用了具体个例对本申请的原理及实施方式进行了阐述，以上实施例的说明只是用于帮助理解本申请的方法及其核心思想；同时，对于本领域的一般技术人员，依据本申请的思想，在具体实施方式及应用范围上均会有改变之处，综上所述，本说明书内容不应理解为对本申请的限制。

Claims (14)

  1. 一种分布式环境下监督学习算法的基准测试方法,其特征在于,所述方法包括:
    获取根据基准测试中的输出数据所确定的第一基准测试结果;
    获取所述基准测试中的分布式性能指标,将所述分布式性能指标确定为第二基准测试结果;
    将所述第一基准测试结果和第二基准测试结果合并得到基准测试总结果。
  2. 根据权利要求1所述的方法,其特征在于,所述获取根据基准测试中的输出数据所确定第一基准测试结果之前,所述方法还包括:
    确定待测试监督学习算法;
    按照评估模型对所述待测试监督学习算法进行基准测试得到输出数据;
    根据基准测试中的输出数据确定第一基准测试结果。
  3. 根据权利要求2所述的方法,其特征在于,所述按照评估模型对所述待测试监督学习算法进行基准测试得到输出数据,包括:
    按照交叉验证模型对所述待测监督学习算法进行基准测试得到输出数据;或者,
    按照标记Label按比例分配模型对所述待测监督学习算法进行基准测试得到输出数据;或者,
    按照交叉验证模型和Label按比例分配模型分别对所述待测监督学习算法进行基准测试得到输出数据。
  4. 根据权利要求3所述的方法,其特征在于,所述按照交叉验证模型对所述待测试监督学习算法进行基准测试得到输出数据,包括:
    取一测试数据样本;
    将所述测试数据样本中的数据等分为N份;
    对所述N份数据执行M轮基准测试;其中,
    在每一轮基准测试中,包括以下步骤:
    将所述N份数据中的N-1份确定为训练数据,其余一份确定为预测数据,其中,M轮基准测试中,每一份数据仅有一次被确定为预测数据的机会,其中,所述M、N为正整数;
    将所确定的N-1份训练数据提供给所述待测试监督学习算法进行学习得到一个函数;
    将所确定的一份预测数据中的输入数据提供给所述函数,得出输出数据。
  5. 根据权利要求3所述的方法,其特征在于,所述按照Label按比例分配模型对所述待测试监督学习算法进行基准测试得到输出数据,包括:
    取一测试数据样本,所述测试数据样本包括:具备第一标记的数据和具备第二标记的数据;
    分别将所述测试数据样本中具备第一标记的数据和具备第二标记的数据等分为N份;
    对所述等分后得到的2N份数据执行M轮基准测试;其中,
    在每一轮基准测试中包括以下步骤:
    将所述N份具备第一标记的数据中的一份确定为训练数据、并将剩余数据中的一份或多份确定为预测数据,同时,将所述N份具备第二标记的数据中的一份确定为训练数据、并将剩余数据中的一份或多份确定为预测数据,其中,所述M、N为正整数;
    将所确定的具备第一标记和第二标记的训练数据提供给所述待测试监督学习算法进行学习得到一个函数;
    将所确定的具备第一标记和第二标记的预测数据中的输入数据提供给所述函数,得到输出数据。
  6. 根据权利要求1至5其中任一项所述的方法,其特征在于,所述第一基准测试结果包括以下指标至少其中之一:判断为真的正确率TP、判断为假的正确率TN、误报率FP及漏报率FN、精度Precision、召回率Recall及准确度Accuracy;
    所述第二基准测试结果包括以下指标至少其中之一:待测试监督学习算法对处理器的使用情况CPU、待测试监督学习算法对内存的使用情况MEM、待测试监督学习算法的迭代次数Iterate及待测试监督学习算法的使用时间Duration。
  7. 根据权利要求1至5其中任一项所述的方法,其特征在于,所述得到基准测试总结果后,所述方法还包括:
    根据所述第一基准测试结果确定F1得分;以及,通过以下方式对待测试监督学习算法进行性能评估:
    当F1得分相同或者接近时,待测试监督学习算法的Iterate值越小则确定待测试监督学习算法性能越好;或者,
    当F1指标相同时,待测试监督学习算法的CPU、MEM、Iterate及Duration值越小,则确定待测试监督学习算法性能越好。
  8. 一种分布式环境下监督学习算法的基准测试装置，其特征在于，所述装置包括：第一基准测试结果获取模块、指标获取模块、第二基准测试结果确定模块及基准测试总结果确定模块；其中，
    所述第一基准测试结果获取模块,用于获取根据基准测试中的输出数据所确定的第一基准测试结果;
    所述指标获取模块,用于获取所述基准测试中的分布式性能指标;
    所述第二基准测试结果确定模块,用于将所述分布式性能指标确定为第二基准测试结果;
    所述基准测试总结果确定模块,用于将所述第一基准测试结果和第二基准测试结果合并得到基准测试总结果。
  9. 根据权利要求8所述的装置,其特征在于,所述装置还包括:
    确定模块,用于在所述第一基准测试结果获取模块获取根据基准测试中的输出数据所确定第一基准测试结果之前,确定待测试监督学习算法;
    所述基准测试模块,用于按照评估模型对所述待测试监督学习算法进行基准测试得到输出数据;
    所述第一基准测试结果确定模块,用于根据基准测试中的输出数据确定第一基准测试结果。
  10. 根据权利要求9所述的装置,其特征在于,所述基准测试模块,用于按照交叉验证模型对所述待测监督学习算法进行基准测试;或者,按照标记Label按比例分配模型对所述待测监督学习算法进行基准测试;或者,按照交叉验证模型和Label按比例分配模型分别对所述待测监督学习算法进行基准测试得到输出数据;其中,
    所述基准测试模块,包括:第一基准测试子模块和第二基准测试子模块;其中,
    所述第一基准测试子模块,用于按照交叉验证模型或标记Label按比例分配模型对所述待测监督学习算法进行基准测试;
    所述第二基准测试子模块,用于按照交叉验证模型或标记Label按比例分配模型对所述待测监督学习算法进行基准测试。
  11. 根据权利要求10所述的装置,其特征在于,所述第一基准测试子模块,包括:
    第一取数据单元,用于取一测试数据样本;
    第一等分单元,用于将所述测试数据样本中的数据等分为N份;
    第一确定单元，用于在每一轮基准测试中，将所述N份数据中的N-1份确定为训练数据、其余一份确定为预测数据，其中，M轮基准测试中，每一份数据仅有一次被确定为预测数据的机会，M、N为正整数；
    第一提供单元,用于在每一轮基准测试中,将所确定的N-1份训练数据提供给所述待测试监督学习算法进行学习得到一个函数;
    第二提供单元,用于在每一轮基准测试中,将所确定的一份预测数据中的输入数据提供给所述函数,得出输出数据。
  12. 根据权利要求10所述的装置,其特征在于,所述第二基准测试子模块,包括:
    第二取数据单元,用于取一测试数据样本,所述测试数据样本包括:具备第一标记的数据和具备第二标记的数据;
    第二等分单元,用于分别将所述测试数据样本中具备第一标记的数据和具备第二标记的数据等分为N份;
    第二确定单元,用于在每一轮基准测试中,将所述N份具备第一标记的数据中的一份确定为训练数据、并将剩余数据中的一份或多份确定为预测数据,同时,将所述N份具备第二标记的数据中的一份确定为训练数据、并将剩余数据中的一份或多份确定为预测数据,其中,M、N为正整数;
    第三提供单元,用于在每一轮基准测试中,将所确定的具备第一标记和第二标记的训练数据提供给所述待测试监督学习算法进行学习得到一个函数;
    第四提供单元,用于在每一轮基准测试中,将所确定的具备第一标记和第二标记的预测数据中的输入数据提供给所述函数,得出输出数据。
  13. 根据权利要求8至12其中任一项所述的装置,其特征在于,所述第一基准测试结果包括以下指标至少其中之一:
    判断为真的正确率TP、判断为假的正确率TN、误报率FP、漏报率FN、精度Precision、召回率Recall及准确度Accuracy;
    所述第二基准测试结果包括以下指标至少其中之一:待测试监督学习算法对处理器的使用情况CPU、待测试监督学习算法对内存的使用情况MEM、待测试监督学习算法的迭代次数Iterate及待测试监督学习算法的使用时间Duration。
  14. 根据权利要求8至12其中任一项所述的装置,其特征在于,所述装置还包括:
    性能评估模块,用于根据所述第一基准测试结果确定F1得分;以及,通过以下方式对待测试监督学习算法进行性能评估:
    当F1得分相同或者接近时,待测试监督学习算法的迭代次数越小则确定待测试监督学习算法性能越好;或者,
    当F1指标相同时,待测试监督学习算法的CPU、MEM、Iterate及Duration值越小,则确定待测试监督学习算法性能越好。
PCT/CN2017/075854 2016-03-18 2017-03-07 一种分布式环境下监督学习算法的基准测试方法和装置 WO2017157203A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/134,939 US20190019111A1 (en) 2016-03-18 2018-09-18 Benchmark test method and device for supervised learning algorithm in distributed environment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610158881.9 2016-03-18
CN201610158881.9A CN107203467A (zh) 2016-03-18 2016-03-18 一种分布式环境下监督学习算法的基准测试方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/134,939 Continuation US20190019111A1 (en) 2016-03-18 2018-09-18 Benchmark test method and device for supervised learning algorithm in distributed environment

Publications (1)

Publication Number Publication Date
WO2017157203A1 true WO2017157203A1 (zh) 2017-09-21

Family

ID=59850091

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/075854 WO2017157203A1 (zh) 2016-03-18 2017-03-07 一种分布式环境下监督学习算法的基准测试方法和装置

Country Status (4)

Country Link
US (1) US20190019111A1 (zh)
CN (1) CN107203467A (zh)
TW (1) TWI742040B (zh)
WO (1) WO2017157203A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110262939A (zh) * 2019-05-14 2019-09-20 苏宁金融服务(上海)有限公司 算法模型运行监控方法、装置、计算机设备和存储介质
CN111242314A (zh) * 2020-01-08 2020-06-05 中国信息通信研究院 深度学习加速器基准测试方法和装置
CN111274821A (zh) * 2020-02-25 2020-06-12 北京明略软件系统有限公司 一种命名实体识别数据标注质量评估方法及装置

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11704610B2 (en) * 2017-08-31 2023-07-18 Accenture Global Solutions Limited Benchmarking for automated task management
US10949252B1 (en) * 2018-02-13 2021-03-16 Amazon Technologies, Inc. Benchmarking machine learning models via performance feedback
US11301909B2 (en) * 2018-05-22 2022-04-12 International Business Machines Corporation Assigning bias ratings to services
US11263484B2 (en) * 2018-09-20 2022-03-01 Innoplexus Ag System and method for supervised learning-based prediction and classification on blockchain
CN113168206A (zh) 2018-12-07 2021-07-23 惠普发展公司,有限责任合伙企业 使用预测模型的自动超频
US11275672B2 (en) 2019-01-29 2022-03-15 EMC IP Holding Company LLC Run-time determination of application performance with low overhead impact on system performance
US11138088B2 (en) 2019-01-31 2021-10-05 Hewlett Packard Enterprise Development Lp Automated identification of events associated with a performance degradation in a computer system
CN110362492B (zh) * 2019-07-18 2024-06-11 腾讯科技(深圳)有限公司 人工智能算法测试方法、装置、服务器、终端及存储介质
CN114328166A (zh) * 2020-09-30 2022-04-12 阿里巴巴集团控股有限公司 Ab测试算法的性能信息获取方法、装置和存储介质
WO2022136904A1 (en) * 2020-12-23 2022-06-30 Intel Corporation An apparatus, a method and a computer program for benchmarking a computing system
CN113392976A (zh) * 2021-06-05 2021-09-14 清远市天之衡传感科技有限公司 一种量子计算系统性能监测方法及装置
JP7176158B1 (ja) * 2021-06-30 2022-11-21 楽天グループ株式会社 学習モデル評価システム、学習モデル評価方法、及びプログラム
TWI817237B (zh) * 2021-11-04 2023-10-01 關貿網路股份有限公司 風險預測方法、系統及其電腦可讀媒介

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6381558B1 (en) * 1998-12-18 2002-04-30 International Business Machines Corporation Alternative profiling methodology and tool for analyzing competitive benchmarks
US20090083717A1 (en) * 2007-09-20 2009-03-26 Michael John Branson Benchmark profiling for distributed systems
US20110296249A1 (en) * 2010-05-26 2011-12-01 Merchant Arif A Selecting a configuration for an application
CN104077218A (zh) * 2013-03-29 2014-10-01 百度在线网络技术(北京)有限公司 MapReduce分布式系统的测试方法及设备
CN104809063A (zh) * 2015-04-24 2015-07-29 百度在线网络技术(北京)有限公司 分布式系统的测试方法及装置
CN105068934A (zh) * 2015-08-31 2015-11-18 浪潮集团有限公司 一种用于云平台的基准测试系统及方法

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559303A (zh) * 2013-11-15 2014-02-05 南京大学 一种对数据挖掘算法的评估与选择方法
TWI519965B (zh) * 2013-12-26 2016-02-01 Flexible assembly system and method for cloud service service for telecommunication application


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110262939A (zh) * 2019-05-14 2019-09-20 苏宁金融服务(上海)有限公司 算法模型运行监控方法、装置、计算机设备和存储介质
CN111242314A (zh) * 2020-01-08 2020-06-05 中国信息通信研究院 深度学习加速器基准测试方法和装置
CN111242314B (zh) * 2020-01-08 2023-03-21 中国信息通信研究院 深度学习加速器基准测试方法和装置
CN111274821A (zh) * 2020-02-25 2020-06-12 北京明略软件系统有限公司 一种命名实体识别数据标注质量评估方法及装置
CN111274821B (zh) * 2020-02-25 2024-04-26 北京明略软件系统有限公司 一种命名实体识别数据标注质量评估方法及装置

Also Published As

Publication number Publication date
CN107203467A (zh) 2017-09-26
TWI742040B (zh) 2021-10-11
TW201734841A (zh) 2017-10-01
US20190019111A1 (en) 2019-01-17

Similar Documents

Publication Publication Date Title
WO2017157203A1 (zh) 一种分布式环境下监督学习算法的基准测试方法和装置
CN113792825B (zh) 一种用电信息采集设备故障分类模型训练方法及装置
US11048729B2 (en) Cluster evaluation in unsupervised learning of continuous data
WO2021174811A1 (zh) 车流量时间序列的预测方法及预测装置
CN113092981B (zh) 晶圆数据检测方法及系统、存储介质及测试参数调整方法
CN116450399B (zh) 微服务系统故障诊断及根因定位方法
CN109891508A (zh) 单细胞类型检测方法、装置、设备和存储介质
CN113010389A (zh) 一种训练方法、故障预测方法、相关装置及设备
Grbac et al. Stability of software defect prediction in relation to levels of data imbalance
CN111863135B (zh) 一种假阳性结构变异过滤方法、存储介质及计算设备
CN108446213A (zh) 一种静态代码质量分析方法和装置
CN108133234B (zh) 基于稀疏子集选择算法的社区检测方法、装置及设备
CN114896024B (zh) 基于核密度估计的虚拟机运行状态检测方法和装置
Liu et al. Random rounded integer-valued autoregressive conditional heteroskedastic process
CN113032998B (zh) 医疗器械寿命评估方法和装置
CN111367781B (zh) 一种实例处理方法及其装置
CN109886288A (zh) 一种用于电力变压器的状态评价方法及装置
CN107291722B (zh) 一种描述词的分类方法及设备
JP2011141674A (ja) ソフトウェア品質指標値管理システム、ソフトウェア品質指標値の真値を推定する推定方法及び推定プログラム
EP4287198A1 (en) Method and system for determining which stage a user performance belongs to
EP4254182A1 (en) Method and apparatus of detecting running state of a virtual machine based on kernel density estimation
CN112884167B (zh) 一种基于机器学习的多指标异常检测方法及其应用系统
WO2023029065A1 (zh) 数据集质量评估方法、装置、计算机设备及存储介质
US11977987B2 (en) Automatic hypothesis generation using geospatial data
Wu et al. Estimate the Precision of Defects Based on Reports Duplication in Crowdsourced Testing

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17765745

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17765745

Country of ref document: EP

Kind code of ref document: A1