CN108491226B - Spark configuration parameter automatic tuning method based on cluster scaling - Google Patents

Spark configuration parameter automatic tuning method based on cluster scaling

Info

Publication number
CN108491226B
CN108491226B (application CN201810110273.XA)
Authority
CN
China
Prior art keywords
configuration
distributed memory
cluster
spark
memory computing
Prior art date
Legal status: Active
Application number
CN201810110273.XA
Other languages
Chinese (zh)
Other versions
CN108491226A
Inventor
鲍亮
陈炜昭
卜晓璇
Current Assignee
Hegang Digital Technology Co ltd
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN201810110273.XA
Publication of CN108491226A
Application granted
Publication of CN108491226B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers


Abstract

The invention discloses a Spark configuration parameter automatic tuning method based on cluster scaling, which comprises the following steps: (1) building a cluster; (2) selecting a configuration parameter set; (3) determining the value types and ranges of the configuration parameters; (4) scaling the cluster; (5) training a random forest model; (6) screening the optimal configuration; (7) verifying the configuration effect. The invention can be applied to the technical field of mass data processing. By scaling the value ranges of the memory configuration parameters of the distributed memory computing framework Spark and the volume of data to be processed, it shortens the time needed to evaluate each configuration; a random forest model establishes the relationship between configurations and the performance of the distributed memory computing framework Spark cluster; and the method searches for the configuration under which a Spark cluster composed of several computers with identical hardware configuration performs best.

Description

Spark configuration parameter automatic tuning method based on cluster scaling
Technical Field
The invention belongs to the technical field of computers, and further relates to a Spark configuration parameter automatic tuning method based on cluster scaling in the technical field of mass data processing. By scaling the distributed memory computing framework Spark cluster and training a random forest model, the invention obtains a configuration under which the Spark cluster performs better than it does under the default configuration.
Background
The distributed memory computing framework Spark is a big data parallel computing framework based on in-memory computing. Because it computes in memory, Spark improves the real-time performance of data processing in big data environments while guaranteeing high fault tolerance and high scalability, and it allows users to deploy clusters on large numbers of inexpensive machines. Spark is currently used by many major companies, including Amazon, eBay, and Yahoo!, and many organizations run it on clusters with thousands of nodes. Configuration parameter optimization has long been a research hotspot for Spark: the configuration parameters are numerous (more than 100), performance is strongly affected by them, and the default configuration is far from the best achievable performance. Automatic optimization of the configuration parameters of the distributed memory computing framework Spark is therefore an urgent problem.
The patent document "An automatic optimization method for data-aware Spark configuration parameters" (application number 201611182310.5, filing date 2016.12.20, publication number CN106648654A), filed by the Shenzhen Institutes of Advanced Technology, discloses an automatic optimization method for data-aware Spark configuration parameters. The method selects a Spark application program, determines the parameters that influence the application's performance, and determines their value ranges; randomly generates parameter values within those ranges, generates configuration files for Spark, runs the configured application, and collects data; builds row vectors from the collected Spark running times, input data sets, and configuration parameter values, assembles a training set from many such vectors, and models the training set with a random forest algorithm; and then searches for the optimal configuration parameters with a genetic algorithm over the constructed performance model. The drawback of this method is that the influence of every configuration on the performance of the distributed memory computing framework Spark cluster must be evaluated in the actual environment to serve as the training set of the random forest model, which wastes a large amount of time.
The patent document "A Spark platform performance automatic optimization method" (application number 201610068611.9, filing date 2016.02.01, publication number CN105868019A), filed by the University of Chinese Academy of Sciences, discloses an automatic Spark platform performance optimization method. It creates a Spark application performance model from the execution mechanism of the Spark platform; for a given Spark application, it runs part of the application's data load on the Spark platform and collects performance data from the run; it inputs the collected performance data into the Spark application performance model to determine the values of the model's parameters; and it computes the platform's performance (total application execution time) under different configuration parameter combinations to obtain the combination for which performance is optimal. The drawback of this method is that building the Spark application performance model requires understanding the execution mechanism of the distributed memory computing framework Spark, and the modeling process is complex and difficult.
Disclosure of Invention
The invention aims to provide a Spark configuration parameter automatic optimization method based on cluster scaling, addressing the high time cost and complex model creation process of prior-art methods for automatically optimizing the configuration parameters of the distributed memory computing framework Spark.
The idea for realizing this purpose is to scale the value ranges of the Spark memory configuration parameters and the input data volume according to the cluster scaling, shortening the time needed to evaluate the influence of each configuration on cluster performance, so that a sufficient training set can be obtained in less time and a more accurate random forest model can be trained. A random forest model and an optimal-configuration screening method are then used to search for the configuration under which a Spark cluster composed of several computers with identical hardware configuration performs best.
The method comprises the following specific steps:
(1) building a cluster:
building a cluster consisting of several computers with the same hardware configuration, each installed with the distributed memory computing framework Spark;
(2) selecting a configuration parameter set:
selecting the configuration parameters recommended to be modified in the optimization standard from all the configuration parameters to be modified of the Spark cluster of the distributed memory computing framework to form a configuration parameter set to be optimized;
(3) determining the value type and range of the configuration parameters:
setting the value type and range of each parameter in a configuration parameter set to be optimized in a Spark cluster of a distributed memory computing framework according to a parameter description standard, extracting a default value from the value range of each parameter, and forming all default values into default configuration;
(4) scaling the cluster:
scaling the value ranges of the memory configuration parameters in the configuration parameter set to be optimized, and the data to be processed, by using the scaling strategy of the distributed memory computing framework Spark cluster;
(5) training a random forest model:
(5a) recording the starting time of the searching process;
(5b) forming a multi-dimensional space by using the configuration parameter sets to be optimized as a search space, and sampling the search space by using a uniform sampling strategy to obtain configuration parameter sets uniformly distributed in the search space as an initial search configuration parameter set;
(5c) evaluating all configurations in the initial search configuration parameter set by using a configuration evaluation strategy to obtain a training set which is ordered from large to small according to the performance influence of a Spark cluster of a distributed memory computing framework;
(5d) taking the top configurations from the training set to form an iterative search configuration parameter set (the number taken is given by an equation that appears in the source only as an image), wherein m represents the total number of configurations searched in each iterative search process, as specified by the user;
(5e) inputting the training set into a random forest model to train the model;
(6) screening an optimal configuration:
(6a) generating a configuration parameter set by using a uniform sampling strategy and randomly taking a number of configurations from it (the count is given by an equation that appears in the source only as an image); evaluating each configuration with the configuration evaluation strategy; if a configuration's influence on the performance of the distributed memory computing framework Spark cluster is larger than that of the first configuration in the training set, creating an ordered configuration parameter set and putting the configuration into it, sorted in descending order of performance influence; and adding each configuration evaluation result to the training set;
(6b) for each actual configuration in the iterative search configuration parameter set, reducing a search space according to a range approximation strategy, and generating a configuration parameter set by using a uniform sampling strategy; inputting each configuration in the configuration parameter set into a random forest model, predicting the performance influence of the configuration on a distributed memory computing frame Spark cluster, and obtaining the predicted configuration with the maximum performance influence in the prediction result;
(6c) obtaining the performance influence of the predicted configuration on the distributed memory computing frame Spark cluster by using a configuration evaluation strategy, forming a sequence by the predicted configuration and the performance influence of the configuration on the distributed memory computing frame Spark cluster, adding the sequence into a training set, and replacing the actual configuration according to two situations in a configuration replacement strategy; if the actual configuration is not replaced, the next search does not adopt a range approximation strategy for the actual configuration;
(6d) subtracting the initial time of the searching process from the time when the configuration replacement is completed to obtain the time of the searching process;
(6e) judging whether the time of the searching process is less than the searching time specified by the user, if so, executing the step (6a), otherwise, executing the step (6 f);
(6f) extracting the configuration with the maximum influence on the performance of the distributed memory computing framework Spark cluster in the training set as the optimal configuration;
(7) verifying the configuration effect:
(7a) restoring the values of the scaled memory configuration and the scaled data to be processed by using the distributed memory computing framework Spark cluster restoring strategy, obtaining the configuration to be verified and the actual data to be processed;
(7b) evaluating, with the configuration evaluation strategy, the performance influence of the configuration to be verified and of the default configuration on the distributed memory computing framework Spark cluster, and, if the configuration to be verified has the greater performance influence, taking it as the automatically tuned configuration parameters of the Spark cluster.
Compared with the prior art, the invention has the following advantages:
First, the invention scales the value ranges of the memory configuration parameters in the configuration parameter set to be optimized, and the data to be processed, using the scaling strategy of the distributed memory computing framework Spark cluster. This shortens the time needed to evaluate the influence of each configuration on cluster performance, solving the prior-art problem that every configuration must be evaluated in the actual environment to build the random forest training set, and thereby reduces the time cost of obtaining the model training set.
Second, the training set is input into a random forest model, which learns the relationship between configurations and cluster performance directly from the data rather than from a model of Spark's execution mechanism. This solves the prior-art problems that building a Spark application performance model requires understanding the execution mechanism of the distributed memory computing framework Spark and that the modeling process is complex and difficult, thereby lowering the threshold for users to optimize a Spark cluster.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of a simulation experiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The specific steps of the present invention are further described with reference to fig. 1.
Step 1, building a cluster.
A cluster consisting of several computers with the same hardware configuration, each installed with the distributed memory computing framework Spark, is built.
Step 2, selecting a configuration parameter set.
The configuration parameters recommended for modification by the tuning guidance are selected from all modifiable configuration parameters of the distributed memory computing framework Spark cluster to form the configuration parameter set to be optimized.
The tuning page of the official Spark documentation specifies the configuration parameters that should be optimized.
Step 3, determining the value types and ranges of the configuration parameters.
The value type and range of each parameter in the configuration parameter set to be optimized are set according to the parameter description in the official documentation; a default value is extracted from each parameter's value range, and all default values together form the default configuration.
The configuration page of the official Spark documentation specifies in detail the role, default value, and value range of each configuration parameter.
Step 4, scaling the cluster.
The value ranges of the memory configuration parameters in the configuration parameter set to be optimized, and the data to be processed, are scaled using the distributed memory computing framework Spark cluster scaling strategy.
The steps of the distributed memory computing framework Spark cluster scaling strategy are as follows:
Step 1, calculate the scaling ratio of the distributed memory computing framework Spark cluster. The formula appears in the source only as an image; it is expressed in terms of ⌊log₂ M⌋, wherein R represents the scaling ratio of the distributed memory computing framework Spark cluster, ⌊ ⌋ represents the rounding-down operation, log₂ represents the base-2 logarithm, and M represents the memory size of each computer in megabytes.
Step 2, calculate the value range of the scaled memory configuration parameters. The formula appears in the source only as an image; consistent with the restoring formula C = (m - 300) × R + 300 in step 7, the inferred form, applied to both endpoints of the value range, is

m = (C - 300) / R + 300

wherein m represents the scaled memory configuration parameter, C represents the corresponding value before scaling, and ∈ (the membership symbol mentioned in the source) denotes that m ranges over the scaled interval.
Step 3, calculate the scaled data to be processed. The formula appears in the source only as an image; consistent with the restoring formula D = d × R in step 7, the inferred form is

d = D / R

wherein d represents the data to be processed after scaling and D represents the data to be processed before scaling.
Step 5, training a random forest model.
The starting moment of the search process is recorded.
The configuration parameter set to be optimized spans a multi-dimensional search space; this space is sampled with the uniform sampling strategy to obtain configurations uniformly distributed over the space, which serve as the initial search configuration parameter set.
The steps of the uniform sampling strategy are as follows:
Step 1, divide each dimension of the search space into k equal intervals, wherein k is the user-specified total number of configurations to be searched in the initial search.
Step 2, randomly select a floating-point number within each interval.
Step 3, for each dimension, collect the k selected floating-point numbers into a k-dimensional sequence and randomly shuffle its order, obtaining a shuffled k-dimensional sequence.
Step 4, across all dimensions, combine the floating-point numbers at the same position of the shuffled sequences into one configuration, obtaining k configurations.
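The four steps above amount to Latin-hypercube-style sampling. A minimal Python sketch (function and parameter names are illustrative, not from the patent):

```python
import random

def uniform_sample(bounds, k, seed=None):
    """Uniform sampling strategy: one value per interval per dimension,
    columns shuffled independently, then recombined by position.

    bounds: list of (low, high) value ranges, one per configuration parameter.
    k: number of configurations to generate.
    Returns k configurations, each a list with one value per parameter.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        step = (high - low) / k
        # step 1 + step 2: one floating-point number from each of the k intervals
        col = [rng.uniform(low + i * step, low + (i + 1) * step) for i in range(k)]
        rng.shuffle(col)  # step 3: shuffle this dimension's sequence
        columns.append(col)
    # step 4: the i-th configuration takes the i-th value from every dimension
    return [[col[i] for col in columns] for i in range(k)]
```

Because every interval of every dimension contributes exactly one value, the k configurations cover the search space evenly rather than clustering.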
All configurations in the initial search configuration parameter set are evaluated with the configuration evaluation strategy, yielding a training set sorted in descending order of performance influence on the distributed memory computing framework Spark cluster.
The configuration evaluation strategy is as follows: run the distributed memory computing framework Spark cluster under the configuration to be evaluated; analyze the data to be processed with the user-specified analysis method; record the time the analysis takes; take the reciprocal of that time as the configuration's performance influence on the cluster; and combine the configuration and its performance influence into a pair. The user-specified analysis method is any data processing method the user selects from the fields of statistical analysis, machine learning, and web page retrieval.
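The configuration evaluation strategy can be sketched as follows; `run_workload` is a stand-in for actually submitting the user-specified analysis job to the Spark cluster under the given configuration (an illustrative assumption, since the patent runs a real Spark job):

```python
import time

def evaluate_configuration(config, run_workload):
    """Configuration evaluation strategy: performance influence = 1 / running time.

    config: the configuration under which the cluster would be run.
    run_workload: caller-supplied function that runs the user-specified
    analysis job under `config` and returns when it finishes.
    """
    start = time.perf_counter()
    run_workload(config)
    elapsed = time.perf_counter() - start  # time taken by the analysis
    performance = 1.0 / elapsed            # reciprocal = performance influence
    return (config, performance)           # the (configuration, influence) pair
```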
The top configurations are then taken from the training set (the count is given by an equation that appears in the source only as an image) to form the iterative search configuration parameter set, wherein m represents the user-specified total number of configurations searched in each iterative search process.
The training set is input into a random forest model to train it.
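The patent does not name a library for the random forest; a minimal sketch with scikit-learn's `RandomForestRegressor` (all names here are illustrative), where each training pair maps a configuration vector to its measured performance influence:

```python
from sklearn.ensemble import RandomForestRegressor

def train_performance_model(training_set, seed=0):
    """Fit a random forest mapping configurations to performance influence.

    training_set: list of (configuration_vector, performance_influence) pairs,
    as produced by the configuration evaluation strategy.
    """
    X = [list(cfg) for cfg, _ in training_set]
    y = [perf for _, perf in training_set]
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X, y)
    return model
```

The trained model is what step 6 queries to predict, without running the cluster, which candidate configuration is likely to have the greatest performance influence.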
Step 6, screening the optimal configuration.
A configuration parameter set is generated with the uniform sampling strategy, and a number of configurations (the count is given by an equation that appears in the source only as an image) are randomly taken from it. Each is evaluated with the configuration evaluation strategy; if a configuration's influence on the performance of the distributed memory computing framework Spark cluster is greater than that of the first configuration in the training set, an ordered configuration parameter set is created and the configuration is put into it, sorted in descending order of performance influence. Every evaluation result is also added to the training set.
The uniform sampling strategy and the configuration evaluation strategy are the same as described in step 5.
For each actual configuration in the iterative search configuration parameter set, the search space is reduced according to the range approximation strategy and a configuration parameter set is generated with the uniform sampling strategy; each configuration in this set is input into the random forest model to predict its performance influence on the distributed memory computing framework Spark cluster, and the configuration with the greatest predicted performance influence is selected as the predicted configuration.
The uniform sampling strategy is the same as described in step 5.
The steps of the range approximation strategy are as follows:
Step 1, in each dimension, among all configurations in the training set, find the value closest to the configuration being processed from above and take it as the upper boundary, and the value closest to it from below and take it as the lower boundary.
Step 2, take the upper and lower boundaries of each dimension as that dimension's value range; the value ranges of all dimensions together form the reduced search space.
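The two steps above can be sketched in Python (names are illustrative; the fallback to the target's own value when no training value lies on one side is an assumption, since the source does not cover that case):

```python
def approximate_range(training_configs, target):
    """Range approximation strategy: shrink each dimension to the nearest
    training-set values bracketing the configuration being processed.

    training_configs: list of configuration vectors from the training set.
    target: the configuration being processed.
    Returns a list of (lower, upper) bounds, one per dimension.
    """
    bounds = []
    for dim, t in enumerate(target):
        above = [c[dim] for c in training_configs if c[dim] > t]
        below = [c[dim] for c in training_configs if c[dim] < t]
        upper = min(above) if above else t  # closest value from above
        lower = max(below) if below else t  # closest value from below
        bounds.append((lower, upper))
    return bounds
```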
Obtaining the performance influence of the predicted configuration on the distributed memory computing frame Spark cluster by using a configuration evaluation strategy, forming a sequence by the predicted configuration and the performance influence of the configuration on the distributed memory computing frame Spark cluster, adding the sequence into a training set, and replacing the actual configuration according to two situations in a configuration replacement strategy; if the actual configuration is not replaced, the next search does not employ a range approximation strategy for the actual configuration.
The configuration evaluation strategy is the same as described in step 5.
The two cases in which the configuration replacement policy replaces the actual configuration are:
A. If the predicted configuration's performance influence is greater than the actual configuration's, the actual configuration is replaced with the predicted configuration.
B. If the ordered configuration parameter set is not empty, the first configuration is extracted from it and replaces the actual configuration.
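The replacement policy can be sketched as follows. The precedence between cases A and B is not specified in the source; this sketch checks case B first, and all names are illustrative:

```python
def replace_actual(actual, actual_perf, predicted, predicted_perf, ordered_set):
    """Configuration replacement policy (cases A and B above).

    ordered_set: configurations sorted by descending performance influence.
    Returns (new_actual, replaced). When replaced is False, the caller skips
    the range approximation strategy for this configuration in the next round.
    """
    if ordered_set:                    # case B: take the best sampled config
        return ordered_set.pop(0), True
    if predicted_perf > actual_perf:   # case A: the model's pick did better
        return predicted, True
    return actual, False
```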
The range approximation strategy is as described above.
The starting moment of the search process is subtracted from the moment the configuration replacement finishes, giving the elapsed search time.
If the elapsed search time is less than the user-specified search time, step 6 is executed again; otherwise, the configuration in the training set with the greatest performance influence on the distributed memory computing framework Spark cluster is extracted as the optimal configuration.
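The timed outer loop of step 6 can be sketched in Python (names are illustrative; `iterate_once` stands for one round of steps (6a) through (6c), which appends newly evaluated pairs to the training set):

```python
import time

def search_optimal(training_set, search_minutes, iterate_once):
    """Repeat the iterative search until the user-specified search time is
    exhausted, then return the best configuration seen.

    training_set: list of (configuration, performance_influence) pairs,
    updated in place by iterate_once.
    """
    start = time.monotonic()
    while (time.monotonic() - start) / 60.0 < search_minutes:
        iterate_once(training_set)
    # the optimal configuration has the greatest performance influence
    return max(training_set, key=lambda pair: pair[1])[0]
```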
Step 7, verifying the configuration effect.
The values of the scaled memory configuration and the scaled data to be processed are restored with the distributed memory computing framework Spark cluster restoring strategy, yielding the configuration to be verified and the actual data to be processed.
The steps of the distributed memory computing framework Spark cluster restoring strategy are as follows:
Step 1, calculate the restored memory configuration according to the following formula:
C = (m - 300) × R + 300
wherein C represents the restored memory configuration, m the scaled memory configuration parameter from step 4, and R the scaling ratio from step 4.
Step 2, calculate the restored data to be processed according to the following formula:
D = d × R
wherein D represents the restored data to be processed (equal to the data before scaling) and d the scaled data.
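The restoring formulas, together with the corresponding inverse scaling used in step 4, can be sketched as follows (the forward scaling form is an inference from the restoring formulas; the source shows the scaling equations only as images):

```python
RESERVED_MB = 300  # the constant appearing in the restoring formulas above

def restore(m_scaled_mb, d_scaled, R):
    """Restoring strategy: C = (m - 300) * R + 300 and D = d * R."""
    return (m_scaled_mb - RESERVED_MB) * R + RESERVED_MB, d_scaled * R

def scale(memory_mb, data, R):
    """Inverse mapping assumed for step 4, so that restore(scale(x)) == x."""
    return (memory_mb - RESERVED_MB) / R + RESERVED_MB, data / R
```

With this pairing, a configuration tuned on the scaled-down cluster maps back exactly to a full-size memory setting and data volume for verification.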
The performance influence of the configuration to be verified and of the default configuration on the distributed memory computing framework Spark cluster are each evaluated with the configuration evaluation strategy; if the configuration to be verified has the greater performance influence, it is taken as the automatically tuned configuration parameters of the Spark cluster.
The effect of the invention is further verified below with a simulation experiment.
1. Simulation conditions:
The simulation environment consists of 6 computers with the same hardware configuration on Alibaba Cloud, each installed with the distributed memory computing framework Spark and built into a Spark cluster. The specification of each computer in the simulation is shown in Table 1.
Table 1. Computer specification
Operating system: CentOS 6.8
Processor cores: 4
Memory: 32 GB
Hard disk: 250 GB
2. Simulation content:
Simulation experiments were performed with the cluster-scaling-based automatic tuning method for distributed memory computing framework Spark configuration parameters under three different user inputs, verifying that the performance of the distributed memory computing framework Spark cluster under the searched configuration is superior to that under the default configuration. Table 2 lists, for each experiment, the serial number, the user-specified data to be processed, the analysis method, the search time, the total number k of the configuration parameter set searched initially, and the total number m of configurations searched in each iteration.
Table 2. Simulation parameters
No.  Data to be processed  Analysis method                         Search time  k    m
1    506.9 MB              PageRank (web-page retrieval)           485 minutes  317  20
2    7.5 GB                Logistic regression (machine learning)  360 minutes  163  20
3    76.5 GB               WordCount (statistical analysis)        320 minutes  211  20
3. Simulation result analysis:
The simulation results are described with reference to Fig. 2. The abscissa of Fig. 2 is the serial number of each user input, and the ordinate is the time, in seconds, the distributed memory computing framework Spark cluster takes to analyze the data to be processed. Diagonal columns represent the default configuration and solid columns the optimized configuration. Fig. 2 records, for the three user inputs, the time to complete the analysis with the user-specified analysis method under the optimized configuration and under the default configuration of the distributed memory computing framework Spark cluster. For every serial number the solid column is lower than the diagonal column: under each of the three optimized configurations, the cluster analyzes the data to be processed faster than under the default configuration. This shows that the optimized configurations outperform the default configuration and verifies the effectiveness of the cluster-scaling-based automatic tuning method for Spark configuration parameters.
In summary, the invention discloses a cluster-scaling-based automatic tuning method for Spark configuration parameters, which addresses the high time cost and the complex model-creation process of prior automatic tuning methods for distributed computing framework configuration parameters. The method comprises the following steps: (1) building a cluster; (2) selecting a configuration parameter set; (3) determining the value types and ranges of the configuration parameters; (4) scaling the cluster; (5) training a random forest model; (6) screening the optimal configuration; (7) verifying the configuration effect. Training the random forest model and screening the optimal configuration while the distributed memory computing framework Spark cluster is scaled is the innovation of this work: scaling the cluster reduces the time cost of obtaining the training set, while training the random forest model and screening the optimal configuration set avoids a complex model-creation process and yields an optimal configuration whose performance on the distributed memory computing framework Spark cluster exceeds that of the default configuration. The invention can be applied to the field of mass data processing: by scaling the value range of the distributed memory computing framework Spark memory configuration parameters and the input data volume according to the cluster scale, it searches for the configuration under which a distributed memory computing framework Spark cluster composed of several computers with the same hardware configuration performs best.
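The random-forest surrogate at the heart of steps (5) and (6) can be sketched as follows. scikit-learn's RandomForestRegressor stands in for the patent's unspecified random forest implementation, and the training data here is synthetic:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(64, 4))   # 64 sampled configurations, 4 parameters each
y = 1.0 / (1.0 + X.sum(axis=1))           # performance influence = 1 / runtime (synthetic)

# Step (5e): fit the surrogate on (configuration, performance) pairs.
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Step (6b): predict the performance influence of fresh candidates and
# keep the one the surrogate rates highest, for real evaluation in (6c).
candidates = rng.uniform(0.0, 1.0, size=(32, 4))
best = candidates[np.argmax(model.predict(candidates))]
```

Only the surrogate-selected candidate is run on the real cluster, which is what keeps the number of expensive evaluations low.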

Claims (6)

1. A cluster-scaling-based automatic tuning method for distributed memory computing framework Spark configuration parameters, characterized in that the value range of the distributed memory computing framework Spark configuration parameters and the input data volume are scaled according to the cluster scale, and the configuration under which a distributed memory computing framework Spark cluster composed of a plurality of computers with the same hardware configuration performs best is searched, wherein the method comprises the following specific steps:
(1) building a cluster:
building a cluster consisting of a plurality of computers with the same hardware configuration, each provided with the distributed memory computing framework Spark;
(2) selecting a configuration parameter set:
selecting, from all modifiable configuration parameters of the distributed memory computing framework Spark cluster, the configuration parameters recommended for modification in the optimization standard to form the configuration parameter set to be optimized;
(3) determining the value type and range of the configuration parameters:
setting the value type and range of each parameter in a configuration parameter set to be optimized in a Spark cluster of a distributed memory computing framework according to a parameter description standard, extracting a default value from the value range of each parameter, and forming all default values into default configuration;
(4) scaling the clusters:
scaling the value range of the memory configuration parameters in the configuration parameter set to be optimized and the data to be processed by using the distributed memory computing framework Spark cluster scaling strategy;
the steps of the distributed memory computing framework Spark cluster scaling strategy are as follows:
firstly, calculating the scale of the distributed memory computing framework Spark cluster according to the following formula:

R = 2^⌊log₂(M/1024)⌋

wherein R represents the scale of the distributed memory computing framework Spark cluster, ⌊·⌋ represents the rounding-down operation, log₂ represents the logarithm with base 2, and M represents the memory size of each computer in megabytes;
secondly, calculating the value range of the scaled memory configuration parameter according to the following formula:

m ∈ [300, (M − 300)/R + 300]

wherein m represents the scaled memory configuration parameter and ∈ represents the set-membership symbol;
thirdly, calculating the scaled data to be processed according to the following formula:

d = D/R

wherein d represents the data to be processed after scaling and D represents the data to be processed before scaling;
(5) training a random forest model:
(5a) recording the starting time of the searching process;
(5b) forming a multi-dimensional search space from the configuration parameter set to be optimized, and sampling the search space with a uniform sampling strategy to obtain configuration parameters uniformly distributed in the search space as the initial search configuration parameter set;
(5c) evaluating all configurations in the initial search configuration parameter set by using a configuration evaluation strategy to obtain a training set which is ordered from large to small according to the performance influence of a Spark cluster of a distributed memory computing framework;
(5d) taking the top ⌈m/2⌉ configurations from the training set to form the iterative search configuration parameter set, wherein m represents the user-specified total number of configurations searched in each iterative search process;
(5e) inputting the training set into a random forest model to train the model;
(6) screening an optimal configuration:
(6a) generating a configuration parameter set with the uniform sampling strategy and randomly taking ⌊m/2⌋ configurations from it; evaluating each taken configuration with the configuration evaluation strategy; if a configuration's influence on the performance of the distributed memory computing framework Spark cluster is greater than that of the first configuration in the training set, creating an ordered configuration parameter set and inserting the configuration into it, the set being sorted in descending order of performance influence on the distributed memory computing framework Spark cluster; adding every configuration evaluation result to the training set;
(6b) for each actual configuration in the iterative search configuration parameter set, reducing a search space according to a range approximation strategy, and generating a configuration parameter set by using a uniform sampling strategy; inputting each configuration in the configuration parameter set into a random forest model, predicting the performance influence of the configuration on a distributed memory computing frame Spark cluster, and obtaining the predicted configuration with the maximum performance influence in the prediction result;
(6c) obtaining the performance influence of the predicted configuration on the distributed memory computing frame Spark cluster by using a configuration evaluation strategy, forming a sequence by the predicted configuration and the performance influence of the configuration on the distributed memory computing frame Spark cluster, adding the sequence into a training set, and replacing the actual configuration according to two situations in a configuration replacement strategy; if the actual configuration is not replaced, the next search does not adopt a range approximation strategy for the actual configuration;
(6d) subtracting the initial time of the searching process from the time when the configuration replacement is completed to obtain the time of the searching process;
(6e) judging whether the time of the search process is less than the user-specified search time; if so, executing step (6a), otherwise executing step (6f);
(6f) extracting the configuration with the maximum influence on the performance of the distributed memory computing framework Spark cluster in the training set as the optimal configuration;
(7) and (3) verifying configuration effect:
(7a) reducing the values of the reduced memory configuration and the data to be processed by using a distributed memory computing framework Spark cluster reduction strategy to obtain the configuration to be verified and the actual data to be processed;
(7b) evaluating the performance influence of the configuration to be verified and of the default configuration on the distributed memory computing framework Spark cluster respectively with the configuration evaluation strategy, and taking the configuration to be verified as the automatically tuned configuration parameters of the distributed memory computing framework Spark cluster if its performance influence exceeds that of the default configuration.
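Steps (5a)-(6f) above amount to a time-budgeted search loop. A skeleton, where `propose` and `evaluate` are hypothetical stand-ins for the surrogate-guided candidate generation and the configuration evaluation strategy:

```python
import time

def timed_search(budget_seconds, propose, evaluate, training_set):
    """Iterate until the user-specified search time is exhausted (6e),
    then return the configuration with the largest performance influence (6f).
    `training_set` is a list of (configuration, performance influence) pairs."""
    start = time.monotonic()                          # (5a) record the start time
    while time.monotonic() - start < budget_seconds:  # (6e) time check
        candidate = propose()                         # (6a)/(6b) generate a candidate
        training_set.append((candidate, evaluate(candidate)))  # (6c) evaluate, record
    return max(training_set, key=lambda pair: pair[1])[0]      # (6f) best configuration
```

The real method only evaluates the surrogate's top-rated candidate per iteration; this skeleton shows the budget and bookkeeping, not the full model-guided selection.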
2. The cluster scaling-based automatic optimization method for the Spark configuration parameters of the distributed memory computing framework according to claim 1, wherein: the steps of the uniform sampling strategy in the steps (5b), (6a) and (6b) are as follows:
the method comprises the following steps: firstly, dividing each dimension of the search space into k intervals of equal size, wherein k is the user-specified total number of configurations in the initial search configuration parameter set;
secondly, randomly selecting a floating point number in each interval;
thirdly, combining, for each dimension, the floating point numbers selected in its k intervals into a k-dimensional sequence, and randomly shuffling the order of the floating point numbers to obtain a shuffled k-dimensional sequence;
and fourthly, forming a configuration from the floating point numbers at the same position of the shuffled k-dimensional sequences of all dimensions, thereby obtaining k configurations.
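The four steps of claim 2 describe Latin-hypercube-style sampling. A sketch over continuous dimensions (the parameter bounds and the seed are illustrative):

```python
import random

def uniform_sample(bounds, k, seed=None):
    """Divide each dimension into k equal intervals, draw one random float
    per interval, shuffle each dimension's column independently, then read
    the i-th float of every dimension off as the i-th configuration."""
    rng = random.Random(seed)
    columns = []
    for lo, hi in bounds:                    # one shuffled column per dimension
        step = (hi - lo) / k
        col = [rng.uniform(lo + i * step, lo + (i + 1) * step) for i in range(k)]
        rng.shuffle(col)
        columns.append(col)
    return [tuple(col[i] for col in columns) for i in range(k)]
```

The shuffle is what decouples the dimensions: each configuration lands in a distinct interval of every dimension, but the interval indices are paired at random.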
3. The cluster scaling-based automatic optimization method for the Spark configuration parameters of the distributed memory computing framework according to claim 1, wherein: the configuration evaluation strategy in steps (5c), (6a) and (6c) is to run the distributed memory computing framework Spark cluster under the configuration to be evaluated, analyze the data to be processed with the user-specified analysis method, record the time the analysis takes, and take the reciprocal of that time as the performance influence on the distributed memory computing framework Spark cluster; the configuration and its performance influence on the distributed memory computing framework Spark cluster form a sequence; the user-specified analysis method is any data processing method the user selects from the fields of statistical analysis, machine learning and web-page retrieval.
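Claim 3's evaluation strategy reduces to timing a run and inverting it. A sketch, where `run_workload` is a hypothetical stand-in for launching the user's Spark analysis job under the given configuration:

```python
import time

def evaluate_configuration(config, run_workload):
    """Run the analysis job under `config`, time it, and report the pair
    (config, 1 / runtime); the reciprocal is the performance influence,
    so faster runs score higher."""
    start = time.monotonic()
    run_workload(config)
    elapsed = time.monotonic() - start
    return config, 1.0 / elapsed
```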
4. The cluster scaling-based automatic optimization method for the Spark configuration parameters of the distributed memory computing framework according to claim 1, wherein: the range approximation strategy in the steps (6b) and (6c) comprises the following steps:
firstly, in each dimension, among the values taken by the other configurations of the training set within the search space, extracting the value closest to the configuration to be processed from those greater than its value as the upper boundary, and the value closest to the configuration to be processed from those smaller than its value as the lower boundary;
and secondly, taking the upper and lower boundaries of each dimension as the value range of the dimension, and forming a reduced search space by the value ranges of all the dimensions.
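A sketch of claim 4's range approximation, taking configurations as numeric tuples; when a dimension has no neighbor on one side, this sketch falls back to the pivot's own value (an assumption the claim does not spell out):

```python
def approximate_range(pivot, training_configs):
    """Shrink the search space around `pivot`: per dimension, the new
    bounds are the nearest training-set values below and above the
    pivot's value in that dimension."""
    shrunk = []
    for i, v in enumerate(pivot):
        below = [c[i] for c in training_configs if c[i] < v]
        above = [c[i] for c in training_configs if c[i] > v]
        shrunk.append((max(below) if below else v, min(above) if above else v))
    return shrunk
```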
5. The cluster scaling-based automatic optimization method for the Spark configuration parameters of the distributed memory computing framework according to claim 1, wherein: the step (6c) of replacing the actual configuration according to two situations in the configuration replacement policy includes:
A. for the case that the predicted configuration performance impact is greater than the actual configuration, replacing the actual configuration with the predicted configuration;
B. for the case where the ordered set of configuration parameters is not empty, a first configuration is extracted from the ordered set of configuration parameters in place of the actual configuration.
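The two replacement cases of claim 5, sketched as a single function returning the configuration to use and whether a replacement happened (the boolean drives the "skip range approximation next round" rule of step (6c)):

```python
def replace_actual(actual, actual_score, predicted, predicted_score, ordered_set):
    """Case A: the predicted configuration's performance influence beats
    the actual one -> take the predicted configuration.
    Case B: the ordered configuration set is non-empty -> take its first entry.
    Otherwise keep the actual configuration unchanged."""
    if predicted_score > actual_score:   # case A
        return predicted, True
    if ordered_set:                      # case B
        return ordered_set.pop(0), True
    return actual, False
```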
6. The cluster scaling-based automatic optimization method for the Spark configuration parameters of the distributed memory computing framework according to claim 1, wherein: the steps of the distributed memory computing framework Spark cluster restore strategy described in step (7a) are as follows:
step one, calculating the restored memory configuration according to the following formula:
C=(m-300)×R+300
wherein C represents the restored memory configuration, m represents the scaled memory configuration, and R represents the cluster scale;
secondly, calculating the restored data to be processed according to the following formula:
D=d×R
wherein D represents the restored data to be processed and d represents the scaled data to be processed.
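The scaling of claim 1 step (4) and the restore of claim 6 are inverses of each other. A round-trip sketch, assuming the scale formula R = 2^⌊log₂(M/1024)⌋ (the original equation is rendered as an image, so this exact form is an assumption):

```python
import math

def cluster_scale(memory_mb):
    """Cluster scale R; assumed form 2**floor(log2(M / 1024))."""
    return 2 ** math.floor(math.log2(memory_mb / 1024))

def scale_memory(c, R):
    """Scale a memory value down; inverse of C = (m - 300) * R + 300."""
    return (c - 300) / R + 300

def restore_memory(m, R):
    """Claim 6 restore: C = (m - 300) * R + 300."""
    return (m - 300) * R + 300
```

Scaling a value down and restoring it should reproduce the original, which is easy to check numerically.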
CN201810110273.XA 2018-02-05 2018-02-05 Spark configuration parameter automatic tuning method based on cluster scaling Active CN108491226B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810110273.XA CN108491226B (en) 2018-02-05 2018-02-05 Spark configuration parameter automatic tuning method based on cluster scaling


Publications (2)

Publication Number Publication Date
CN108491226A CN108491226A (en) 2018-09-04
CN108491226B true CN108491226B (en) 2021-03-23

Family

ID=63344582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810110273.XA Active CN108491226B (en) 2018-02-05 2018-02-05 Spark configuration parameter automatic tuning method based on cluster scaling

Country Status (1)

Country Link
CN (1) CN108491226B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109388565B (en) * 2018-09-27 2021-08-06 西安电子科技大学 Software system performance optimization method based on generating type countermeasure network
CN110134665B (en) * 2019-04-17 2021-05-25 北京百度网讯科技有限公司 Database self-learning optimization method and device based on flow mirror image
CN111259933B (en) * 2020-01-09 2023-06-13 中国科学院计算技术研究所 High-dimensional characteristic data classification method and system based on distributed parallel decision tree
CN111629048B (en) * 2020-05-22 2023-04-07 浪潮电子信息产业股份有限公司 spark cluster optimal configuration parameter determination method, device and equipment
CN112418311A (en) * 2020-11-21 2021-02-26 安徽理工大学 Distributed random forest method for risk assessment of communication network
CN114565001A (en) * 2020-11-27 2022-05-31 深圳先进技术研究院 Automatic tuning method for graph data processing framework based on random forest
CN113032367A (en) * 2021-03-24 2021-06-25 安徽大学 Dynamic load scene-oriented cross-layer configuration parameter collaborative tuning method and system for big data system

Citations (5)

Publication number Priority date Publication date Assignee Title
CN103327118A (en) * 2013-07-09 2013-09-25 南京大学 Intelligent virtual machine cluster scaling method and system for web application in cloud computing
CN105868019A (en) * 2016-02-01 2016-08-17 中国科学院大学 Automatic optimization method for performance of Spark platform
CN106648654A (en) * 2016-12-20 2017-05-10 深圳先进技术研究院 Data sensing-based Spark configuration parameter automatic optimization method
CN106844673A (en) * 2017-01-24 2017-06-13 山东亿海兰特通信科技有限公司 A kind of method and system based on the public security data acquisition intimate degree of multidimensional personnel
CN107360026A (en) * 2017-07-07 2017-11-17 西安电子科技大学 Distributed message performance of middle piece is predicted and modeling method

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US10031747B2 (en) * 2015-12-15 2018-07-24 Impetus Technologies, Inc. System and method for registration of a custom component in a distributed computing pipeline
US10430725B2 (en) * 2016-06-15 2019-10-01 Akw Analytics Inc. Petroleum analytics learning machine system with machine learning analytics applications for upstream and midstream oil and gas industry


Non-Patent Citations (4)

Title
A Fireworks Algorithm for Modern Web Information Retrieval with Visual Results Mining;Hadj Ahmed Bouarara等;《International Journal of Swarm Intelligence Research》;20151231;第6卷(第3期);第1-23页 *
An Orthogonal Genetic Algorithm for QoS-Aware Service Composition;Bao Liang等;《COMPUTER JOURNAL》;20161231;第59卷(第12期);第1857-1871页 *
BigDataBench: an Open-source Big Data Benchmark Suite; Zhan Jianfeng et al.; Chinese Journal of Computers; 20160131; Vol. 39, No. 1; pp. 196-211 *
Research on Web Service Composition Technology Based on Functional Programming; Bao Liang; China Doctoral Dissertations Full-text Database, Information Science and Technology; 20101015; p. I139-19 *


Similar Documents

Publication Publication Date Title
CN108491226B (en) Spark configuration parameter automatic tuning method based on cluster scaling
Hu et al. Sensitivity-guided metaheuristics for accurate discrete gate sizing
CN106648654A (en) Data sensing-based Spark configuration parameter automatic optimization method
JP2023522567A (en) Generation of integrated circuit layouts using neural networks
CN109388565B (en) Software system performance optimization method based on generating type countermeasure network
CN110647995A (en) Rule training method, device, equipment and storage medium
US11841839B1 (en) Preprocessing and imputing method for structural data
US11531831B2 (en) Managing machine learning features
CN110968564A (en) Data processing method and training method of data state prediction model
CN117236278B (en) Chip production simulation method and system based on digital twin technology
Jingbiao et al. Research and improvement of clustering algorithm in data mining
KR102352036B1 (en) Device and method for variable selection using stochastic gradient descent
CN113743453A (en) Population quantity prediction method based on random forest
CN112445746B (en) Automatic cluster configuration optimization method and system based on machine learning
Trushkowsky et al. Getting it all from the crowd
Faricha et al. The comparative study for predicting disease outbreak
CN105824976A (en) Method and device for optimizing word segmentation banks
US8666986B2 (en) Grid-based data clustering method
CN107491417A (en) A kind of document structure tree method under topic model based on particular division
CN111259117B (en) Short text batch matching method and device
Lu et al. On the auto-tuning of elastic-search based on machine learning
CN112507181B (en) Search request classification method, device, electronic equipment and storage medium
JP2006072820A (en) Device for analyzing combination optimization problem
Luo et al. Research on the anonymous customer segmentation model of telecom
CN117041073B (en) Network behavior prediction method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230602

Address after: Building 1, Science and Technology Innovation Service Center, No. 856 Zhongshan East Road, High tech Zone, Shijiazhuang City, Hebei Province, 050035

Patentee after: Hegang Digital Technology Co.,Ltd.

Address before: 710071 Taibai South Road, Yanta District, Xi'an, Shaanxi Province, No. 2

Patentee before: XIDIAN University