CN112231356A

CN112231356A - Data processing method and device, electronic equipment and computer readable storage medium

Info

Publication number: CN112231356A
Application number: CN202011124091.1A
Authority: CN
Inventors: 刘煜
Original assignee: China Construction Bank Corp
Current assignee: China Construction Bank Corp
Priority date: 2020-10-20
Filing date: 2020-10-20
Publication date: 2021-01-15

Abstract

The application provides a data processing method and device, electronic equipment and a computer readable storage medium. The method comprises: collecting data usage information and data usage capacity information of a big data cluster, including at least the data usage of each mode and the data usage of each data table in the big data cluster; determining the application system preset to correspond to each mode and each data table; and obtaining the data usage of each application system in the big data cluster according to the usage of the modes corresponding to the application system and the usage of the data tables corresponding to the application system. Because an application system uses the data resources of the big data cluster mainly through the modes and data tables of the big data cluster, the data usage of an application system in the big data cluster can be obtained by determining the data usage of the modes and data tables corresponding to that application system.

Description

Data processing method and device, electronic equipment and computer readable storage medium

Technical Field

The present application relates to the field of electronic information, and in particular, to a method and an apparatus for data processing, an electronic device, and a computer-readable storage medium.

Background

With the development of big data, many big data clusters based on products such as Greenplum and HADOOP have emerged. A big data cluster can provide a large amount of data resources for application systems.

At present, when the data usage of a big data cluster is analyzed, existing query tools can only obtain the data usage rate of the big data cluster as a whole, and research and development staff generally determine a capacity expansion scheme for the big data cluster from that usage rate. However, using the cluster-wide data usage rate as the basis for capacity expansion often wastes the data resources of the big data cluster, so how to obtain data better suited to capacity expansion of the big data cluster has become an urgent problem.

Disclosure of Invention

The inventor found that using the data usage rate of the big data cluster as the basis for capacity expansion wastes data resources because multiple application systems actually share the same big data cluster, different application systems use different amounts of data in the big data cluster, and those amounts grow at different rates. With only the cluster-wide data usage rate available, research and development staff cannot expand the big data cluster in a targeted way according to each application system's data usage (for example, they expand all data tables in the big data cluster), which often wastes data resources. Therefore, how to obtain each application system's usage of the data of the big data cluster has become an urgent problem.

In order to achieve the above object, the present application provides the following technical solutions:

a method of data processing, comprising:

acquiring preset data acquisition items of a big data cluster, wherein the data acquisition items at least comprise data usage of each mode in the big data cluster and data usage of each data table;

determining the application system preset to correspond to each mode and to each data table; a mode corresponding to an application system means that the application system is preset to use the mode, and a data table corresponding to an application system means that the application system is preset to use the data table;

and, for each application system, obtaining the data usage of the application system in the big data cluster according to the data usage of the mode corresponding to the application system and the data usage of the data table corresponding to the application system.

In the above method, optionally, the big data cluster is a plurality of big data clusters;

acquiring the preset data acquisition items of the big data cluster comprises acquiring the preset data acquisition items of the plurality of big data clusters;

before acquiring the data acquisition items of the plurality of big data clusters, the method further comprises:

acquiring big data cluster information of each big data cluster, wherein the big data cluster information at least comprises a use peak time period of the big data cluster and connection configuration information for connecting the big data cluster;

for each big data cluster, connecting to the big data cluster according to the connection configuration information of the big data cluster;

and configuring the acquisition time period and the acquisition frequency for acquiring each data acquisition item according to the peak use time period of each big data cluster.

Optionally, in the foregoing method, for each big data cluster, acquiring a preset data acquisition item of the big data cluster includes:

acquiring a preset acquisition script corresponding to the data acquisition item; the acquisition script comprises a script execution statement for acquiring the data acquisition item;

and executing the acquisition script according to the acquisition time period and the acquisition frequency to obtain the data acquisition item.

Optionally, determining the application system preset to correspond to each mode and each data table includes:

acquiring a preset correspondence table, wherein the correspondence table comprises the application system corresponding to each mode and the application system corresponding to each data table;

and obtaining, according to the correspondence table, the application system preset to correspond to each mode and each data table.

In the above method, optionally, the data collection item further includes a cluster data usage amount of the big data cluster.

The above method, optionally, further includes:

inputting the cluster data usage amount of the big data cluster and the historically obtained cluster data usage amounts of the big data cluster into a pre-established prediction model to obtain the cluster data usage amount of the big data cluster in a future period, wherein the prediction model is obtained by modeling according to a linear regression algorithm;

for each application system, inputting the data usage of the application system and the historically obtained data usage of the application system into the prediction model to obtain the data usage of the application system in the future period;

wherein the cluster data usage amount of the big data cluster in the future period and the data usage of the application system in the future period serve as basis data for capacity expansion of the big data cluster.

The above method, optionally, further includes:

responding to a query request of the data acquisition items input by a user, and displaying each data acquisition item;

and responding to a modification request of the data acquisition items input by a user, displaying each data acquisition item, and taking a new data acquisition item input by the user as the data acquisition item of the big data cluster.

An apparatus for data processing, comprising:

an acquisition unit, configured to acquire preset data acquisition items of a big data cluster, wherein the data acquisition items at least comprise the data usage of each mode in the big data cluster and the data usage of each data table;

a determining unit, configured to determine the application system preset to correspond to each mode and to each data table; a mode corresponding to an application system means that the application system is preset to use the mode, and a data table corresponding to an application system means that the application system is preset to use the data table;

and a calculating unit, configured to obtain, for each application system, the data usage of the application system in the big data cluster according to the data usage of the mode corresponding to the application system and the data usage of the data table corresponding to the application system.

An electronic device, comprising: a processor and a memory for storing a program; the processor is configured to run the program to implement the above data processing method.

A computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the above-described method of data processing.

The method and device of the present application collect data usage information and data usage capacity information of a big data cluster, including at least the data usage of each mode (that is, each schema of the big data cluster) and the data usage of each data table in the big data cluster; determine the application system preset to correspond to each mode and each data table, where a mode corresponding to an application system means that the application system is preset to use the mode, and a data table corresponding to an application system means that the application system is preset to use the data table; and obtain the data usage of each application system in the big data cluster according to the usage of the modes corresponding to the application system and the usage of the data tables corresponding to the application system. Because an application system uses the data resources of the big data cluster mainly through the modes and data tables of the big data cluster, the data usage of an application system in the big data cluster can be obtained by determining the data usage of the modes and data tables corresponding to that application system.

Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a data processing method according to an embodiment of the present application;

Fig. 2 is a flowchart of a method for predicting data usage according to an embodiment of the present disclosure;

Fig. 3 is a block diagram of a data processing platform according to an embodiment of the present disclosure;

Fig. 4 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

At present, existing query tools can only obtain the data usage rate of a big data cluster as a whole, and research and development staff generally determine a capacity expansion scheme for the big data cluster from that usage rate. However, using the cluster-wide data usage rate as the basis for capacity expansion often wastes the data resources of the big data cluster: with only that usage rate available, research and development staff cannot expand the big data cluster in a targeted way according to each application system's data usage in the big data cluster.

Therefore, the data processing method provided by the application aims to obtain the data usage of each application system in the big data cluster by combining the usage of each mode and each data table in the big data cluster, so that research and development staff can later expand the big data cluster in a targeted way according to each application system's data usage in the big data cluster.

In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that, in this application, the "mode" of the big data cluster refers to the "schema" of the big data cluster, where a schema contains schema objects, and the schema objects may be tables, columns, data types, views, stored procedures, relationships, primary keys, and foreign keys.

Fig. 1 shows a data processing method provided in an embodiment of the present application, and the method may include the following steps:

S101, collecting preset data collection items of the big data cluster.

The big data cluster includes a large number of pre-created modes and data tables. The data collection items at least comprise the data usage of each mode in the big data cluster and the data usage of each data table.

This step may be implemented as follows: acquire the preset collection script corresponding to each data collection item and execute the collection script to obtain the data collection item. The collection script includes the script execution statements for collecting the data collection item, and the statements for a specific collection item can be set as required. The collection period and collection frequency may be set with reference to the peak usage period of the big data cluster. For example, the collection period is set to a period after the peak usage period of the big data cluster, and the collection frequency may be a preset number of collections within the collection period.

It should be noted that the data collection items of the big data cluster can be queried and modified, so that collection items can be extended or added later as data collection requirements change. This may be implemented as follows: in response to a query request for the data collection items input by a user, display each data collection item; in response to a modification request for the data collection items input by the user, display each data collection item and take the new data collection items input by the user as the data collection items of the big data cluster.

S102, determining the application system preset to correspond to each mode and each data table.

In this embodiment, a mode corresponding to an application system means that the application system is preset to use the mode, and a data table corresponding to an application system means that the application system is preset to use the data table. It should be noted that an application system may be preset to use multiple modes and multiple data tables of the big data cluster.

This step may be implemented as follows: acquire a preset correspondence table, which includes the application system corresponding to each mode and the application system corresponding to each data table, and obtain from the correspondence table the application system preset to correspond to each mode and each data table.

S103, for each application system, obtaining the data usage of the application system in the big data cluster according to the usage of the modes corresponding to the application system and the usage of the data tables corresponding to the application system.

This step may be implemented as follows: for each application system, take the sum of the usage of the modes corresponding to the application system and the usage of the data tables corresponding to the application system as the data usage of the application system in the big data cluster.
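As a minimal sketch of steps S102 and S103, assuming the collected usage values and the preset correspondence table are available as in-memory dictionaries, the per-system total can be computed as below. All names and figures (`mode_usage`, `app_sales`, the GB values) are hypothetical and only illustrate the summation. The sketch also assumes that the listed data tables are not already counted inside a mode assigned to the same application system.

```python
from collections import defaultdict

# Hypothetical collected usage in GB (result of S101).
mode_usage = {"sales_schema": 120.0, "risk_schema": 80.0}
table_usage = {"public.orders": 30.0, "public.alerts": 10.0}

# Hypothetical preset correspondence table (S102): object name -> application system.
mode_to_system = {"sales_schema": "app_sales", "risk_schema": "app_risk"}
table_to_system = {"public.orders": "app_sales", "public.alerts": "app_risk"}


def usage_by_system(mode_usage, table_usage, mode_to_system, table_to_system):
    """Sum mode usage and data table usage per application system (S103)."""
    totals = defaultdict(float)
    for mode, used in mode_usage.items():
        system = mode_to_system.get(mode)
        if system is not None:  # skip modes with no preset application system
            totals[system] += used
    for table, used in table_usage.items():
        system = table_to_system.get(table)
        if system is not None:  # skip tables with no preset application system
            totals[system] += used
    return dict(totals)


system_usage = usage_by_system(mode_usage, table_usage, mode_to_system, table_to_system)
print(system_usage)
```

Objects that have no entry in the correspondence table are ignored here; a real deployment might instead report them so that the table can be completed.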

The method provided by this embodiment collects data usage information and data usage capacity information of a big data cluster, including at least the data usage of each mode and each data table in the big data cluster; determines the application system preset to correspond to each mode and each data table, where a mode corresponding to an application system means that the application system is preset to use the mode, and a data table corresponding to an application system means that the application system is preset to use the data table; and obtains the data usage of each application system in the big data cluster according to the usage of the modes and the usage of the data tables corresponding to the application system. Because an application system uses the data resources of the big data cluster mainly through the modes and data tables of the big data cluster, the data usage of an application system in the big data cluster can be obtained by determining the data usage of the modes and data tables corresponding to that application system.

In this application, the data processing method described in the above embodiment may be executed on multiple big data clusters in parallel or serially to obtain the data usage of the application systems in each of the big data clusters. It should be noted that the multiple big data clusters may be of different types, for example, Greenplum big data clusters and HADOOP big data clusters. The collection items of each big data cluster can be collected by connecting to the big data cluster, setting a collection period and a collection frequency, and executing the data processing method shown in fig. 1.

Therefore, for any big data cluster, before its data collection items are collected, it is necessary to connect to the big data cluster and configure the collection period and the collection frequency. A specific embodiment may include the following steps A1 to A3:

A1, acquiring big data cluster information of each big data cluster.

The big data cluster information includes at least the peak usage period of the big data cluster and the connection configuration information for connecting to the big data cluster, for example, an identification of the big data cluster (the identification may be the name of the big data cluster) and the IP address of the big data cluster.

A2, for each big data cluster, connecting to the big data cluster according to the connection configuration information of the big data cluster.

A3, configuring the collection period and the collection frequency for collecting each data collection item according to the peak usage period of each big data cluster.

For example, the off-peak period of each big data cluster is determined from its peak usage period. If the off-peak periods of the big data clusters are the same, the clusters can be collected in parallel, in which case every big data cluster has the same collection period and collection frequency. If the off-peak periods differ, the clusters can be collected serially, in which case the collection period and collection frequency of each big data cluster can be set individually. It should be noted that the collection period is any period within the off-peak period.
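The choice between parallel and serial collection described above can be sketched as follows. This is only an illustration under stated assumptions: the off-peak period of each cluster is represented as a hypothetical `(start_hour, end_hour)` tuple, and the function name `plan_collection` and the cluster names are invented for the example.

```python
def plan_collection(off_peak_periods):
    """Choose parallel or serial collection from each cluster's off-peak period.

    off_peak_periods maps a cluster name to a (start_hour, end_hour) tuple.
    """
    if not off_peak_periods:
        raise ValueError("at least one big data cluster is required")
    distinct = set(off_peak_periods.values())
    # Identical off-peak periods allow one shared collection window (parallel);
    # otherwise each cluster keeps its own window and is collected in turn (serial).
    mode = "parallel" if len(distinct) == 1 else "serial"
    return {"mode": mode, "windows": dict(off_peak_periods)}


same = plan_collection({"hadoop_a": (22, 6), "greenplum_b": (22, 6)})
different = plan_collection({"hadoop_a": (22, 6), "greenplum_b": (1, 5)})
print(same["mode"], different["mode"])
```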

The scheme provided by the application connects to a big data cluster using only the connection configuration information of that cluster, independently of the cluster's product type. After connecting, it can collect the data collection items of all the big data clusters in parallel or serially according to the configured collection period and collection frequency. That is, the data collection items of big data clusters of different product types can be collected; for example, a HADOOP big data cluster and a Greenplum big data cluster can be collected at the same time. In addition, as big data products and clusters continue to increase, the scheme can connect to a newly added big data cluster according to its connection configuration information and collect its data collection items, so it has good extensibility.

In the foregoing embodiment of the present application, the data collection items may further include the cluster data usage of the big data cluster. Similarly, the cluster data usage of the big data cluster may be obtained by executing a preset cluster data usage query statement.

In order to grasp the growth trend of the cluster data usage of the big data cluster and the growth trend of each application system's data usage in the big data cluster as a basis for capacity expansion, the application provides a method for predicting data usage. As shown in fig. 2, the method may include the following steps:

S201, acquiring first data to be predicted of the big data cluster and second data to be predicted of each application system.

The first data to be predicted of the big data cluster is the cluster data usage of the big data cluster collected in the collection period, together with the historically obtained cluster data usage of the big data cluster.

The second data to be predicted of an application system is the data usage of the application system obtained in the collection period, together with the historically obtained data usage of the application system.

S202, performing data processing on the first data to be predicted to obtain first model data.

For example, the first data to be predicted is cleaned and transformed according to the format required for data input into the prediction model.

S203, performing data processing on the second data to be predicted of each application system to obtain second model data of each application system.

S204, inputting the first model data into the prediction model to obtain the cluster data usage of the big data cluster in the future period.

The prediction model is obtained in advance by modeling according to a linear regression algorithm. The specific modeling process can refer to the prior art.

S205, inputting the second model data of each application system into the prediction model to obtain the data usage of each application system in the future period.
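The source only states that the prediction model is built with a linear regression algorithm and leaves the modeling to the prior art. As one possible sketch, assuming the usage history is a list of hypothetical `(day_index, usage)` samples in which missing readings appear as `None`, the cleaning step and an ordinary least-squares line fit could look like this. The function names `clean`, `fit_line`, and `predict` are illustrative, not taken from the source.

```python
def clean(samples):
    """Drop missing readings and sort (day, usage) samples by day."""
    return sorted((day, float(used)) for day, used in samples if used is not None)


def fit_line(points):
    """Ordinary least-squares fit of usage = slope * day + intercept."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two samples are required")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("samples must cover at least two distinct days")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(points, future_day):
    """Predict usage on a future day from the fitted line."""
    slope, intercept = fit_line(points)
    return slope * future_day + intercept


# Hypothetical history: usage in GB per day index, with one missing reading.
raw = [(3, 130), (1, 110), (2, None), (4, 140), (0, 100)]
model_data = clean(raw)
forecast = predict(model_data, 10)
print(forecast)
```

The same functions can be applied once to the cluster-level history and once per application system, which mirrors steps S204 and S205.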

In the method provided by this embodiment, the future data usage trend of the big data cluster obtained from the prediction model can serve as early warning information for capacity expansion. For example, if the trend shows that the cluster data usage will reach the total data capacity of the big data cluster at some future time, the big data cluster needs to be expanded before that time. The predicted future data usage of the big data cluster therefore provides an early warning for capacity expansion.
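The early warning idea can be illustrated with a small calculation. The sketch below assumes a constant daily growth taken from a linear trend; the function name `days_until_full` and the figures are hypothetical.

```python
import math


def days_until_full(current_usage, daily_growth, total_capacity):
    """Return the number of whole days until usage reaches capacity.

    Returns 0 if the cluster is already full and None if usage is not growing.
    """
    if current_usage >= total_capacity:
        return 0
    if daily_growth <= 0:
        return None
    return math.ceil((total_capacity - current_usage) / daily_growth)


# Hypothetical cluster: 800 GB used, growing 10 GB per day, 1000 GB in total.
warning_days = days_until_full(800.0, 10.0, 1000.0)
print(warning_days)
```

Capacity expansion would then need to be scheduled before the returned number of days has elapsed.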

In addition, the predicted data usage of each application system in the future period makes it possible to expand the big data cluster in a targeted and reasonable way. For example, if the future data usage of application system A grows quickly and the future data usage of application system B grows slowly, capacity expansion can focus on the data tables or schemas of application system A in the big data cluster, so that the data resources of the big data cluster are allocated reasonably. Therefore, in this embodiment, the cluster data usage of the big data cluster in the future period and the data usage of each application system in the future period may be used as the basis data for capacity expansion of the big data cluster.
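To make the targeted expansion concrete, application systems can be ordered by their predicted growth so that the fastest-growing ones are considered first. The sketch below is illustrative only; `rank_by_growth`, the system names, and the GB values are assumptions.

```python
def rank_by_growth(current, forecast):
    """Order application systems by predicted growth, largest first."""
    growth = {name: forecast[name] - current[name] for name in current}
    return sorted(growth.items(), key=lambda item: item[1], reverse=True)


# Hypothetical current and predicted usage in GB per application system.
current = {"app_a": 100.0, "app_b": 300.0}
forecast = {"app_a": 180.0, "app_b": 320.0}
ranking = rank_by_growth(current, forecast)
print(ranking)
```

Here `app_a` ranks first even though `app_b` currently uses more data, which is the distinction a cluster-wide usage rate cannot show.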

Corresponding to the above embodiments of the present application, fig. 3 is an architecture diagram of a data processing platform 300 provided in an embodiment of the present application. As shown in fig. 3, the platform includes a scheduling module 301, an acquisition module 302, and an analysis prediction module 303.

The scheduling module 301 has functions of configuration management, scheduling management, and rights management.

The authority management function controls and manages query authority and modification authority. Specifically, in response to a query request for the data collection items input by a user, each data collection item is displayed; in response to a modification request for the data collection items input from a management account with management authority, the data collection items are displayed and the newly input data collection items are taken as the data collection items of the big data cluster.
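A minimal sketch of this authority check follows, assuming an in-memory list of collection items and a set of management accounts. The class `CollectionItems`, the account names, and the item names are hypothetical.

```python
class CollectionItems:
    """Collection items that any user may query but only managers may modify."""

    def __init__(self, items, manager_accounts):
        self._items = list(items)
        self._managers = set(manager_accounts)

    def query(self):
        # Return a copy so callers cannot change the stored items directly.
        return list(self._items)

    def modify(self, account, new_items):
        if account not in self._managers:
            raise PermissionError(f"{account} has no management authority")
        self._items = list(new_items)
        return self.query()


items = CollectionItems(["mode_usage", "table_usage"], manager_accounts={"admin"})
updated = items.modify("admin", ["mode_usage", "table_usage", "cluster_usage"])
print(updated)
```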

The configuration management function manages the information of each big data cluster input by a user, including the big data cluster name, the peak usage period of the big data cluster, the connection configuration information of the big data cluster, and the like.

The scheduling management function can configure parallel or serial collection tasks, together with the collection period and collection frequency of each big data cluster, according to the peak usage period of each big data cluster. Collection tasks can be distributed to the acquisition module 302 on a timer according to the configured collection period and collection frequency, where each collection task includes at least the connection configuration information of the big data cluster to be collected. When the scheduling module 301 is abnormal and the timer cannot trigger, collection tasks can be distributed to the acquisition module 302 by manual triggering.
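How a collection period and collection frequency translate into concrete trigger times can be sketched as follows. The even spacing, the function name `collection_times`, and the sample date are assumptions made for illustration; the source does not specify how collections are distributed within the period.

```python
from datetime import datetime, timedelta


def collection_times(period_start, period_length, count):
    """Return `count` evenly spaced trigger times within the collection period."""
    if count < 1:
        raise ValueError("count must be at least 1")
    step = period_length / count
    return [period_start + step * i for i in range(count)]


# Hypothetical collection period: six hours starting when the peak ends at 18:00.
peak_end = datetime(2020, 10, 20, 18, 0)
times = collection_times(peak_end, timedelta(hours=6), 3)
for t in times:
    print(t.isoformat())
```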

The acquisition module 302 consists of multiple acquisition nodes, each pre-installed with the clients of the big data clusters to be collected, such as a client for a HADOOP big data cluster and a client for a Greenplum big data cluster. After receiving the connection configuration information of a big data cluster, an acquisition node connects to the big data cluster and starts the collection task: it collects the data usage of each mode, the data usage of each data table, and the cluster data usage of the big data cluster, and stores the collected data in the MySQL database of the analysis and prediction module 303.

The analysis and prediction module 303 includes analysis management and prediction management.

Analysis management analyzes and processes the data in the MySQL database to obtain the data usage of each application system in the big data cluster.
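The analysis step can be pictured as a join between the stored usage samples and the correspondence table. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for the MySQL database so that it runs without a server; the table names, column names, and values are hypothetical.

```python
import sqlite3

# In-memory SQLite database used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE usage_sample (object_name TEXT, object_type TEXT, used_gb REAL);
    CREATE TABLE correspondence (object_name TEXT, app_system TEXT);
    """
)
conn.executemany(
    "INSERT INTO usage_sample VALUES (?, ?, ?)",
    [
        ("sales_schema", "mode", 120.0),
        ("risk_schema", "mode", 80.0),
        ("public.orders", "table", 30.0),
        ("public.alerts", "table", 10.0),
    ],
)
conn.executemany(
    "INSERT INTO correspondence VALUES (?, ?)",
    [
        ("sales_schema", "app_sales"),
        ("risk_schema", "app_risk"),
        ("public.orders", "app_sales"),
        ("public.alerts", "app_risk"),
    ],
)

# Sum mode and data table usage per application system.
rows = conn.execute(
    """
    SELECT c.app_system, SUM(u.used_gb)
    FROM usage_sample AS u
    JOIN correspondence AS c ON c.object_name = u.object_name
    GROUP BY c.app_system
    ORDER BY c.app_system
    """
).fetchall()
conn.close()
print(rows)
```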

Prediction management inputs the cluster data usage of the big data cluster and the historically obtained cluster data usage of the big data cluster into the pre-established prediction model to obtain the cluster data usage of the big data cluster in a future period. For each application system, it inputs the data usage of the application system and the historically obtained data usage of the application system into the prediction model to obtain the data usage of the application system in the future period. Finally, it outputs the cluster data usage of the big data cluster in the future period and the data usage of each application system in the future period.

The data processing platform provided by the application can obtain both the data usage of each application system in the big data cluster and the historical data usage of the application system.

Further, as big data products and clusters continue to increase, the data processing platform can connect to a newly added big data cluster according to its connection configuration information and collect its data collection items, so it has good extensibility and practicality.

Furthermore, operation and maintenance manpower can be saved. As big data products and clusters continue to increase, operation and maintenance personnel do not need to log in to the tool pages of different big data clusters to check data usage information. The data processing platform allows the data usage of all big data clusters to be viewed in one place.

Further, dynamic early warning and basis data for cluster capacity expansion are provided. The future data usage trend of the big data cluster provides an early warning for capacity expansion. Combined with the data usage of each application system in the future period predicted by the prediction model, the big data cluster can be expanded in a targeted and reasonable way.

Fig. 4 is a schematic structural diagram of an apparatus 400 for data processing according to an embodiment of the present application, including:

an acquisition unit 401, configured to acquire preset data acquisition items of a big data cluster, where the data acquisition items at least comprise the data usage of each mode and the data usage of each data table in the big data cluster;

a determining unit 402, configured to determine the application system preset to correspond to each mode and to each data table; a mode corresponding to an application system means that the application system is preset to use the mode, and a data table corresponding to an application system means that the application system is preset to use the data table;

a calculating unit 403, configured to obtain, for each application system, the data usage of the application system in the big data cluster according to the data usage of the mode corresponding to the application system and the data usage of the data table corresponding to the application system.

Optionally, the big data cluster is a plurality of big data clusters;

the acquisition unit 401 acquiring the preset data acquisition items of the big data cluster comprises: the acquisition unit 401 acquiring the preset data acquisition items of the plurality of big data clusters.

Optionally, the apparatus 400 further includes a configuration unit 404, configured to obtain big data cluster information of each big data cluster, where the big data cluster information at least includes the peak usage period of the big data cluster and the connection configuration information for connecting to the big data cluster; for each big data cluster, connect to the big data cluster according to the connection configuration information of the big data cluster; and configure the acquisition time period and the acquisition frequency for acquiring each data acquisition item according to the peak usage period of each big data cluster.

Optionally, the data collection item further includes a cluster data usage amount of the big data cluster.

Optionally, as shown in fig. 4, the apparatus 400 further includes a prediction unit 405, configured to input the cluster data usage of the big data cluster and the historically obtained cluster data usage of the big data cluster into a pre-established prediction model, so as to obtain the cluster data usage of the big data cluster in a future time period; the prediction model is obtained by modeling according to a linear regression algorithm.
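The prediction described above may be sketched as follows, for illustration only. The sketch assumes a simple one-variable linear regression (ordinary least squares) of cluster data usage against the index of the acquisition, with equally spaced acquisitions; the features actually used by the prediction model are not specified in the text. The function names and sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_usage(history, current, periods_ahead):
    """Predict cluster data usage `periods_ahead` acquisitions after `current`.

    history: historically obtained cluster data usage, oldest first.
    current: the most recently acquired cluster data usage.
    """
    series = list(history) + [current]
    xs = list(range(len(series)))
    slope, intercept = fit_line(xs, series)
    return slope * (len(series) - 1 + periods_ahead) + intercept


# Hypothetical usage values (for example, in terabytes).
history = [100.0, 110.0, 120.0]
forecast = predict_usage(history, 130.0, 2)
print(forecast)
```

At least two data points are needed for the fit; with a single point the denominator `sxx` is zero.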

Optionally, as shown in fig. 4, the apparatus 400 further includes a user interaction unit 406, configured to respond to a query request for the data acquisition items input by a user by displaying each of the data acquisition items.

Optionally, the acquisition unit 401 acquires, for each big data cluster, the preset data acquisition items of the big data cluster specifically as follows: obtaining a preset acquisition script corresponding to the data acquisition item, where the acquisition script includes a script execution statement for acquiring the data acquisition item; and executing the acquisition script according to the acquisition time period and the acquisition frequency to obtain the data acquisition item.
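By way of a hedged example, the script-based acquisition described above may be sketched as follows. The script statements and the view names inside them are placeholders rather than real catalog objects, `fake_execute` stands in for an actual cluster connection, and the sketch only enumerates the execution times inside the acquisition time period instead of waiting for them as a real scheduler would.

```python
from datetime import datetime, timedelta

# Hypothetical acquisition scripts: data acquisition item -> execution statement.
ACQUISITION_SCRIPTS = {
    "mode_usage": "SELECT mode_name, used_bytes FROM hypothetical_mode_usage",
    "table_usage": "SELECT table_name, used_bytes FROM hypothetical_table_usage",
}


def run_times(start, end, interval):
    """Yield every execution time from start (inclusive) to end (exclusive)."""
    current = start
    while current < end:
        yield current
        current += interval


def collect(item, execute, start, end, interval):
    """Execute the script for `item` once per scheduled time; return the samples."""
    statement = ACQUISITION_SCRIPTS[item]
    return [(moment, execute(statement)) for moment in run_times(start, end, interval)]


def fake_execute(statement):
    # Stand-in for running the statement on a cluster; returns fixed rows.
    return [("sales", 500)]


samples = collect(
    "mode_usage",
    fake_execute,
    start=datetime(2020, 10, 20, 9, 0),
    end=datetime(2020, 10, 20, 10, 0),
    interval=timedelta(minutes=20),
)
for moment, rows in samples:
    print(moment.isoformat(), rows)
```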

Optionally, the determining unit 402 determines the application system preset to correspond to each mode and each data table specifically as follows:

The apparatus provided by this embodiment acquires data usage information and data usage capacity information of a big data cluster, which at least include the data usage of each mode and of each data table in the big data cluster. It determines the application system preset to correspond to each mode and each data table, where a mode corresponds to an application system when the application system is preset to use that mode, and a data table corresponds to an application system when the application system is preset to use that data table. It then obtains the data usage of each application system in the big data cluster from the data usage of the modes and of the data tables corresponding to that application system. Because an application system uses the data resources of a big data cluster mainly through the modes and data tables of that cluster, determining the modes and data tables corresponding to the application system, together with their data usage, yields the data usage of the application system in the big data cluster.

The present application further provides an electronic device 500, a schematic structural diagram of which is shown in fig. 5, including a processor 501 and a memory 502; the memory 502 is used for storing an application program, and the processor 501 is used for executing the application program to implement the data processing method of the present application, that is, to execute the following steps:

The present application also provides a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to perform the method of data processing of the present application, namely to perform the steps of:

The functions described in the method of the embodiments of the present application, if implemented in the form of software functional units and sold or used as independent products, may be stored in a storage medium readable by a computing device. Based on such understanding, the part of the embodiments of the present application that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.

The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of data processing, comprising:

2. The method of claim 1, wherein the big data cluster is a plurality of big data clusters;

3. The method of claim 2, wherein collecting the preset data collection items of the big data clusters for each big data cluster comprises:

4. The method of claim 1, wherein said determining the application system preset to correspond to each of said modes and each of said data tables, respectively, comprises:

5. The method of claim 1, wherein the data collection item further comprises a cluster data usage of the big data cluster.

6. The method of claim 5, further comprising:

inputting the cluster data usage amount of the big data cluster and historical cluster data usage amount of the big data cluster obtained in a historical manner into a pre-established prediction model to obtain the cluster data usage amount of the big data cluster in a future period; the prediction model is obtained by modeling according to a linear regression algorithm;

7. The method of claim 1, further comprising:

8. An apparatus for data processing, comprising:

9. An electronic device, comprising: a processor and a memory for storing a program; the processor is configured to execute the program to implement the method of data processing according to any one of claims 1 to 7.

10. A computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the method of data processing of any of claims 1-7.