CN115061978A - Construction method of hadoop parameter optimization model - Google Patents

Construction method of hadoop parameter optimization model

Info

Publication number
CN115061978A
Authority
CN
China
Prior art keywords
node
time
hadoop
server
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210671845.8A
Other languages
Chinese (zh)
Inventor
付学良
罗小玲
潘新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia Agricultural University
Original Assignee
Inner Mongolia Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia Agricultural University filed Critical Inner Mongolia Agricultural University
Priority to CN202210671845.8A
Publication of CN115061978A
Legal status: Pending (current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/11File system administration, e.g. details of archiving or snapshots
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of distributed processing, and in particular to a method for constructing a hadoop parameter optimization model, comprising the following steps: collecting, with a server, the data volume generated by each data source within a certain time; analyzing the characteristics of each individual data source, assigning characteristic values in proportion to the data volume generated, and using the server to estimate the scale of the file to be processed from those characteristic values; having the server collect, with a certain time as the period, the resource storage amount of each preparation node in its normal running state and group the nodes accordingly; having the server estimate the number of nodes and the processing time from the scale of the file to be processed; and having the server adjust the hadoop parameters according to the estimated number of nodes and processing time. By analyzing the characteristics of the data sources to assign characteristic values, estimating the file scale from those values, grouping the hadoop distributed nodes, and adjusting the hadoop parameters according to the file scale and the node groups, the method saves the resources of hadoop projects.

Description

Construction method of hadoop parameter optimization model
Technical Field
The invention relates to a hadoop optimization method, in particular to a construction method of a hadoop parameter optimization model.
Background
As the volume of data generated by information systems keeps expanding, hadoop has been widely applied as an important means of processing large files, and in practice the adjustment of hadoop configuration parameters plays a crucial role in overall operating efficiency and resource utilization. Chinese patent publication No. CN104317610A discloses "a method and apparatus for automatic installation and deployment of a hadoop platform", which loads hadoop end nodes with a host cluster and adjusts the necessary parameters to default values. Chinese patent publication No. CN103064664A discloses "an automatic Hadoop parameter optimization method and system based on performance estimation", which adjusts Hadoop parameters by running simulated jobs on Hadoop projects, thereby reducing cost. Chinese patent publication No. CN104750780A discloses "a Hadoop configuration parameter optimization method based on statistical analysis", which classifies applications with strong characteristics and establishes a prediction model to guide Hadoop parameter optimization.
It can be seen that the above methods and systems share the following problem: when the project's information sources are in varied states, the scale of the project is difficult to judge, making it difficult for hadoop parameter optimization to achieve the goal of saving resources.
Disclosure of Invention
Therefore, the invention provides a method for constructing a hadoop parameter optimization model, which solves the problem in the prior art that, when the project's information sources are in varied states, the scale of the project is difficult to judge and hadoop parameter optimization can hardly achieve the goal of saving resources.
In order to achieve the above object, the present invention provides a method for constructing a hadoop parameter optimization model, comprising:
step S1, collecting, with a server, the data volume generated by each data source within a certain time, and analyzing the maximum and minimum amounts of data generated by a single data source within a preset time;
step S2, analyzing the characteristics of each single data source empirically and inputting them into the server, the server assigning characteristic values to the data sources in proportion to the data volume they generate and estimating the scale of the file to be processed from the characteristic values;
step S3, the server collecting, with a certain time as the period, the resource storage amount of each preparation node in its normal running state, and grouping the preparation nodes by resource storage amount and time;
step S4, the server pre-estimates the number of nodes and the processing time according to the scale of the file to be processed;
and step S5, adjusting hadoop parameters according to the node number and the processing time estimated by the server.
Further, the data volume generated by a data source in a preset period is D, and D changes regularly within a preset time T;
for the data amount Dij generated by the ith data source in the jth of the time periods into which the preset time T is evenly divided, there are a maximum value maxDij and a minimum value minDij, with i = 1, 2, 3, …, N and j = 1, 2, 3, …, m.
Further, the maximum data amount generated by the data sources within the preset time T is denoted maxDT, the minimum data amount generated within the preset time T is denoted minDT, and maxDT = maxDij × N × m, minDT = minDij × N × m;
taking minDT as the standard file scale, a hadoop standard parameter A corresponding to minDT is set, and the preparation nodes are grouped with A as the reference according to their running states.
Further, for a single preparation node, the resource storage amount R of the node has a highest value maxR within the preset time T; a first preset resource storage amount R1 and a second preset resource storage amount R2 are set, where R1 = 0.3 × maxR and R2 = 0.7 × maxR,
if R < R1, the server judges that the preparation node's resources are insufficient and records the node's time period under this condition as an unavailable period;
if R1 ≤ R < R2, the server judges that the preparation node's resource storage amount is low and records the node's time period under this condition as an inefficient period;
if R ≥ R2, the server judges that the preparation node's resource storage amount is high and records the node's time period under this condition as an efficient period.
Further, for the kth preparation node, its state Pkj within the jth time period is assigned a value, where k = 1, 2, 3, …, n:
if the period is recorded as an unavailable period, Pkj is assigned the value 0,
if the period is recorded as an inefficient period, Pkj is assigned the value 1,
if the period is recorded as an efficient period, Pkj is assigned the value 2,
and, using the standard parameter A, the preparation nodes are grouped into groups of the optimal node number NA corresponding to A, so that a group of nodes, given their Pkj states, completes an item of data size minDT within the optimal execution time tA.
Further, for the kth preparation node,
when j + t ≤ T,
if Pkj = Pk,j+1 = … = Pk,j+t, the server records the node as a stable node for the period (j, j + t) and includes it in a group;
if Pkj = Pk,j+1 = … = Pk,j+t = 0, the server regards the node as an unavailable node for the period (j, j + t);
and when j + t > T, the server determines that the node is unavailable.
Further, the hadoop parameter corresponding to the mean (maxDT + minDT)/2 of the maximum data amount maxDT and the minimum data amount minDT is denoted A', where the optimal node number for A' is NA' and the optimal execution time is tA'; a function of the optimal node number and the optimal execution time is set as f(D) = N × t,
and after the data volume D is obtained, the server judges the adjustment mode of the working hadoop parameters according to the function f(D).
Furthermore, the data amount Dij corresponds to a characteristic attribute of time period j, and this characteristic attribute influences the scale of Dij; within the preset time T, each Dij is multiplied by the proportion that the characteristic attribute of its period j takes within T, and the products are summed to obtain an approximation of DT, which is used as the estimated data amount and serves as reference data for hadoop parameter optimization.
Further, the node group operates normally provided that no single node is assigned more than 70% of the maximum data amount maxDT.
Further, the preset time has no continuity, and the processing time has continuity.
Compared with the prior art, the method has the advantages that the characteristic values are given by analyzing the characteristics of the data source, the file scale is estimated according to the characteristic values, the hadoop distributed nodes are grouped, the node groups are distributed according to the file scale, and the parameters of the hadoop are adjusted according to the file scale and the node groups, so that the resources of hadoop projects are saved.
Furthermore, by means of classifying the data volume generated by the data source according to time and scale, the data scale judgment error caused by unclear classification of the data source is avoided, and meanwhile, the classification efficiency of the data source is effectively improved, so that the resources of a hadoop project are further saved.
Furthermore, by means of the mode that the maximum scale and the minimum scale which can be reached by the acquired data source within a certain time are pre-estimated, and the minimum-scale data volume is set as a reference group, the calculation power waste caused by no reference in file segmentation is avoided, meanwhile, the reasonability of the distribution of calculation power resources is effectively improved, and therefore the resources of a hadoop project are further saved.
Furthermore, the nodes with larger workload except the hadoop project are separated from the nodes with smaller workload by utilizing the node classification mode, so that the working reliability of the nodes is effectively improved while the time waste caused by insufficient processing capacity of a single node is avoided, and the resources of the hadoop project are further saved.
Furthermore, the nodes are classified by assigning the working states of the nodes, so that the uneven processing capacity of the nodes is avoided, the working reliability of the nodes is effectively improved, and the resources of hadoop projects are further saved.
Furthermore, by comparing each node's assigned state values with its working periods, the period in which a node works most efficiently is determined and the nodes are grouped accordingly, which avoids project duration increases caused by differing node processing efficiency and effectively improves the reliability of node operation, thereby further saving the resources of the hadoop project.
Furthermore, the completion result of the hadoop project is estimated in a mode of setting functions of the number of nodes, the execution time and the data volume, the calculation amount is reduced, and meanwhile, resource shortage or waste caused by inaccurate estimation of the data scale is avoided, so that the resources of the hadoop project are further saved.
Furthermore, the pre-estimation attribute of the data source is adjusted through the time attribute, so that the problem that the pre-estimation is inaccurate due to abnormal increase of data amount caused by special time is avoided, the stability of the hadoop project is improved, and the resources of the hadoop project are further saved.
Furthermore, by obtaining the maximum data volume, uncertainty in project estimation caused by node groups that are too small is avoided, and the system's resistance to interference is improved, thereby saving the resources of hadoop projects.
Furthermore, through the continuity of the set time, the data volume is prevented from being abnormally increased due to continuous information collection, and meanwhile, the stability of the hadoop project is improved, so that the resources of the hadoop project are further saved.
Drawings
FIG. 1 is a schematic flow chart of a method for constructing a hadoop parameter optimization model according to the present invention.
Detailed Description
In order that the objects and advantages of the invention will be more clearly understood, the invention is further described below with reference to examples; it should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and do not limit the scope of the present invention.
It should be noted that in the description of the present invention, the terms of direction or positional relationship indicated by the terms "upper", "lower", "left", "right", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, which are only for convenience of description, and do not indicate or imply that the device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present invention.
Furthermore, it should be noted that, in the description of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
Fig. 1 is a schematic flow chart of a method for constructing a hadoop parameter optimization model according to the present invention, which includes:
step S1, collecting, with a server, the data volume generated by each data source within a certain time, and analyzing the maximum and minimum amounts of data generated by a single data source within a preset time;
step S2, analyzing the characteristics of each single data source empirically and inputting them into the server, the server assigning characteristic values to the data sources in proportion to the data volume they generate and estimating the scale of the file to be processed from the characteristic values;
step S3, the server collecting, with a certain time as the period, the resource storage amount of each preparation node in its normal running state, and grouping the preparation nodes by resource storage amount and time;
step S4, the server pre-estimates the number of nodes and the processing time according to the scale of the file to be processed;
and step S5, the server adjusts the hadoop parameters according to the number of nodes and the processing time estimated by the server.
The method comprises the steps of giving a characteristic value by analyzing the characteristics of a data source, estimating the file scale according to the characteristic value, grouping hadoop distributed nodes, distributing node groups according to the file scale, and adjusting the parameters of hadoops according to the file scale and the node groups, so that the resources of hadoop projects are saved.
Specifically, the data volume generated by a data source in a preset period is D, and D changes regularly within the preset time T;
for the data amount Dij generated by the ith data source in the jth of the time periods into which the preset time T is evenly divided, there are a maximum value maxDij and a minimum value minDij, with i = 1, 2, 3, …, N and j = 1, 2, 3, …, m.
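Purely as an illustration (not part of the claimed method), the collection of Dij and its per-source extremes could be sketched in Python as follows; the function name collect_extremes is hypothetical, and the sketch assumes the measured amounts are already available as a dictionary keyed by (source index i, period index j):

```python
def collect_extremes(samples, n_sources, m_periods):
    """samples: dict mapping (i, j) -> data amount Dij for source i in period j."""
    max_dij, min_dij = {}, {}
    for i in range(1, n_sources + 1):
        amounts = [samples[(i, j)] for j in range(1, m_periods + 1)]
        max_dij[i] = max(amounts)   # maxDij for source i
        min_dij[i] = min(amounts)   # minDij for source i
    return max_dij, min_dij
```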
By means of the mode that the data volume generated by the data source is classified according to time and scale, the data scale judgment error caused by unclear classification of the data source is avoided, meanwhile, the classification efficiency of the data source is effectively improved, and therefore resources of a hadoop project are further saved.
Specifically, the maximum data amount generated by the data sources within the preset time T is denoted maxDT, the minimum data amount generated within the preset time T is denoted minDT, and maxDT = maxDij × N × m, minDT = minDij × N × m;
taking minDT as the standard file scale, a hadoop standard parameter A corresponding to minDT is set, and the preparation nodes are grouped with A as the reference according to their running states.
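Continuing the illustration, the scale bounds and the standard parameter set A tied to minDT could be sketched as follows. Interpreting maxDij and minDij as the largest and smallest per-source, per-period amounts observed is an assumption, and the contents of STANDARD_PARAMS_A are placeholders, since the patent does not list which hadoop parameters A contains:

```python
def scale_bounds(max_dij, min_dij, n_sources, m_periods):
    # maxDT = maxDij * N * m, minDT = minDij * N * m (per the formulas above)
    max_dt = max(max_dij.values()) * n_sources * m_periods
    min_dt = min(min_dij.values()) * n_sources * m_periods
    return max_dt, min_dt

# Placeholder standard parameter set A chosen for the minDT scale.
STANDARD_PARAMS_A = {"optimal_node_number_NA": 4, "optimal_execution_time_tA": 60}
```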
The maximum scale and the minimum scale which can be reached by the data source within a certain time are obtained for pre-estimation, and the minimum-scale data volume is set as a reference group, so that the calculation power waste caused by no reference in file segmentation is avoided, and meanwhile, the reasonability of the allocation of calculation power resources is effectively improved, and the resources of a hadoop project are further saved.
Specifically, for a single preparation node, the resource storage amount R of the node has a highest value maxR within the preset time T; a first preset resource storage amount R1 and a second preset resource storage amount R2 are set, where R1 = 0.3 × maxR and R2 = 0.7 × maxR,
if R < R1, the server judges that the preparation node's resources are insufficient and records the node's time period under this condition as an unavailable period;
if R1 ≤ R < R2, the server judges that the preparation node's resource storage amount is low and records the node's time period under this condition as an inefficient period;
if R ≥ R2, the server judges that the preparation node's resource storage amount is high and records the node's time period under this condition as an efficient period.
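A minimal sketch of this classification with R1 = 0.3 × maxR and R2 = 0.7 × maxR; the function name classify_period is hypothetical:

```python
def classify_period(r, max_r):
    r1, r2 = 0.3 * max_r, 0.7 * max_r
    if r < r1:
        return "unavailable"   # resources insufficient
    if r < r2:
        return "inefficient"   # resource storage amount is low
    return "efficient"         # resource storage amount is high
```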
By means of node classification, nodes with larger workload except the hadoop project are separated from nodes with smaller workload, time waste caused by insufficient processing capacity of a single node is avoided, reliability of node work is effectively improved, and resources of the hadoop project are further saved.
Specifically, for the kth preparation node, its state Pkj within the jth time period is assigned a value, where k = 1, 2, 3, …, n:
if the period is recorded as an unavailable period, Pkj is assigned the value 0,
if the period is recorded as an inefficient period, Pkj is assigned the value 1,
if the period is recorded as an efficient period, Pkj is assigned the value 2,
and, using the standard parameter A, the preparation nodes are grouped into groups of the optimal node number NA corresponding to A, so that a group of nodes, given their Pkj states, can complete a project with data volume minDT within the optimal execution time tA.
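As an illustration only, the state assignment and the grouping into groups of NA nodes might be sketched as follows; STATE_VALUE, state_values and form_group are hypothetical names, and taking the first NA candidates is an assumption, since the patent does not specify how the NA nodes are chosen:

```python
# Map each recorded period label to the state value Pkj (0/1/2).
STATE_VALUE = {"unavailable": 0, "inefficient": 1, "efficient": 2}

def state_values(period_labels):
    """period_labels: dict mapping (k, j) -> label for node k in period j."""
    return {(k, j): STATE_VALUE[label] for (k, j), label in period_labels.items()}

def form_group(candidate_nodes, na):
    """Take NA candidate nodes as one group (selection policy assumed)."""
    return candidate_nodes[:na]
```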
The nodes are classified by assigning the working states of the nodes, so that the uneven processing capacity of the nodes is avoided, the working reliability of the nodes is effectively improved, and the resources of hadoop projects are further saved.
Specifically, for the kth preparation node,
when j + t ≤ T,
if Pkj = Pk,j+1 = … = Pk,j+t, the server records the node as a stable node for the period (j, j + t) and includes it in a group;
if Pkj = Pk,j+1 = … = Pk,j+t = 0, the server regards the node as an unavailable node for the period (j, j + t);
and when j + t > T, the server determines that the node is unavailable.
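A minimal sketch of this window check follows, assuming the per-period states Pkj of one node are stored in a dictionary keyed by the period index; the "unstable" label for a non-constant window is an assumption, since the patent only names the stable and unavailable cases:

```python
def window_status(p_k, j, t, preset_t):
    """p_k: dict mapping period index -> Pkj for a single node k."""
    if j + t > preset_t:
        return "unavailable"            # window runs past the preset time T
    window = [p_k[x] for x in range(j, j + t + 1)]
    if len(set(window)) != 1:
        return "unstable"               # state changes within (j, j + t)
    return "unavailable" if window[0] == 0 else "stable"
```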
By comparing each node's assigned state values with its working periods, the period in which a node works most efficiently is determined and the nodes are grouped accordingly, which avoids project duration increases caused by differing node processing efficiency and effectively improves the reliability of node operation, thereby further saving the resources of the hadoop project.
Specifically, the hadoop parameter corresponding to the mean (maxDT + minDT)/2 of the maximum data amount maxDT and the minimum data amount minDT is denoted A', where the optimal node number for A' is NA' and the optimal execution time is tA'; a function of the optimal node number and the optimal execution time is set as f(D) = N × t,
and after the data volume D is obtained, the adjustment mode of the working hadoop parameters can be judged from f(D).
The completion result of the hadoop project is estimated by setting functions of the number of nodes, the execution time and the data volume, so that the amount of calculation is reduced, and simultaneously, the resource shortage or waste caused by inaccurate estimation of the data scale is avoided, and the resource of the hadoop project is further saved.
Specifically, the data amount Dij corresponds to a characteristic attribute of time period j, and this characteristic attribute influences the scale of Dij; within the preset time T, each Dij is multiplied by the proportion that the characteristic attribute of its period j takes within T, and the products are summed to obtain an approximation of DT, which is used as the estimated data amount and serves as reference data for hadoop parameter optimization. For example, when the information of a shopping application over 7 consecutive days is processed, with 2 rest days, 1 special shopping festival, 1 travel discount day and 4 working days, the server may assign a characteristic value of 1 to a working day, 1.5 to a rest day, 5 to the special shopping festival and 0.5 to the travel discount day, so that the data volume Di' generated by the data source over the 7 consecutive days is Di' = Di × (1 × 4 + 1.5 × 2 + 0.5 × 1 + 5 × 1); if the travel discount day overlaps with one of the rest days, Di' is computed with the combined characteristic value of the overlapping day (formula given as an image in the original).
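As an illustration of the weighted estimate in the example above, a sketch; DAY_WEIGHTS and estimate_window_volume are hypothetical names:

```python
# Characteristic values from the 7-day shopping example above.
DAY_WEIGHTS = {"working": 1.0, "rest": 1.5, "festival": 5.0, "travel_discount": 0.5}

def estimate_window_volume(base_amount, day_types):
    """day_types: day-type labels for the window; returns Di' = Di * sum of weights."""
    return base_amount * sum(DAY_WEIGHTS[d] for d in day_types)

# Di' = Di * (1*4 + 1.5*2 + 0.5*1 + 5*1) for the example week.
week = ["working"] * 4 + ["rest"] * 2 + ["travel_discount", "festival"]
```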
The pre-estimation attribute of the data source is adjusted through the time attribute, so that the problem that pre-estimation is inaccurate due to abnormal increase of data amount caused by special time is avoided, the stability of the hadoop project is improved, and the resources of the hadoop project are further saved.
Specifically, the node group can operate normally provided that no single node is assigned more than 70% of the maximum data amount maxDT. By obtaining the maximum data volume, uncertainty in project estimation caused by node groups that are too small is avoided, and the system's resistance to interference is improved, thereby saving the resources of hadoop projects.
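A trivial sketch of this check, assuming node_shares holds the planned data amount per node (hypothetical name):

```python
def group_ok(node_shares, max_dt):
    """True if no single node's share exceeds 70% of maxDT."""
    return all(share <= 0.7 * max_dt for share in node_shares)
```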
Specifically, the preset time has no continuity, and the processing time has continuity. Through the continuity of the set time, the data volume is prevented from being abnormally increased due to continuous information collection, and meanwhile, the stability of the hadoop project is improved, so that the resources of the hadoop project are further saved.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention; various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for constructing a hadoop parameter optimization model is characterized by comprising the following steps:
step S1, collecting, with a server, the data volume generated by each data source within a certain time, and analyzing the maximum and minimum amounts of data generated by a single data source within a preset time;
step S2, analyzing the characteristics of each single data source empirically and inputting them into the server, the server assigning characteristic values to the data sources in proportion to the data volume they generate and estimating the scale of the file to be processed from the characteristic values;
step S3, the server collecting, with a certain time as the cycle, the resource storage amount of each preparation node in each time period under normal operation, and grouping the preparation nodes according to the resource storage amount and the time;
step S4, the server pre-estimates the number of nodes and the processing time according to the scale of the file to be processed;
and step S5, adjusting hadoop parameters according to the node number and the processing time estimated by the server.
2. The method for constructing the hadoop parameter optimization model according to claim 1, wherein the data volume generated by the data source in a preset period is D, and D changes regularly in a preset time T;
for the data amount Dij generated by the ith data source in the jth of the time periods into which the preset time T is evenly divided, there are a maximum value maxDij and a minimum value minDij, with i = 1, 2, 3, …, N and j = 1, 2, 3, …, m.
3. The method for constructing the hadoop parameter optimization model according to claim 2, wherein the maximum data amount generated by the data sources within the preset time T is denoted maxDT, the minimum data amount generated within the preset time T is denoted minDT, and maxDT = maxDij × N × m, minDT = minDij × N × m;
taking minDT as the standard file scale, a hadoop standard parameter A corresponding to minDT is set, and the preparation nodes are grouped with A as the reference according to their running states.
4. The method for constructing the hadoop parameter optimization model according to claim 3, wherein, for a single preparation node, the resource storage amount R of the node has a maximum value maxR within the preset time T, and a first preset resource storage amount R1 and a second preset resource storage amount R2 are set, where R1 = 0.3 × maxR and R2 = 0.7 × maxR,
if R < R1, the server judges that the preparation node's resources are insufficient and records the node's time period under this condition as an unavailable period;
if R1 ≤ R < R2, the server judges that the preparation node's resource storage amount is low and records the node's time period under this condition as an inefficient period;
and if R ≥ R2, the server judges that the preparation node's resource storage amount is high and records the node's time period under this condition as an efficient period.
5. The method for constructing the hadoop parameter optimization model according to claim 4, wherein, for the kth preparation node, its state Pkj within the jth time period is assigned a value, where k = 1, 2, 3, …, n:
if the period is recorded as an unavailable period, Pkj is assigned the value 0,
if the period is recorded as an inefficient period, Pkj is assigned the value 1,
if the period is recorded as an efficient period, Pkj is assigned the value 2,
and, using the standard parameter A, the preparation nodes are grouped into groups of the optimal node number NA corresponding to A, so that a group of nodes, given their Pkj states, completes an item of data size minDT within the optimal execution time tA.
6. The method for constructing the hadoop parameter optimization model according to claim 5, wherein, for the kth preparation node,
when j + t ≤ T,
if Pkj = Pk,j+1 = … = Pk,j+t, the server records the node as a stable node for the period (j, j + t) and includes it in a group;
if Pkj = Pk,j+1 = … = Pk,j+t = 0, the server regards the node as an unavailable node for the period (j, j + t);
and when j + t > T, the server determines that the node is unavailable.
7. The method for constructing the hadoop parameter optimization model according to claim 6, wherein the hadoop parameter corresponding to the mean (maxDT + minDT)/2 of the maximum data amount maxDT and the minimum data amount minDT is denoted A', the optimal node number for A' is NA', the optimal execution time is tA', and a function of the optimal node number and the optimal execution time is set as f(D) = N × t,
and after the data volume D is obtained, the server calculates the adjustment mode of the working hadoop parameters according to the function f(D).
8. The method for constructing the hadoop parameter optimization model according to claim 7, wherein the data amount Dij corresponds to a characteristic attribute of time period j, the characteristic attribute influences the scale of Dij, and within the preset time T, each Dij is multiplied by the proportion that the characteristic attribute of its period j takes within T, and the products are summed to obtain an approximation of DT, which is used as the estimated data amount and serves as reference data for hadoop parameter optimization.
9. The method for constructing the hadoop parameter optimization model according to claim 8, wherein the node group operates normally provided that no single node is assigned more than 70% of the maximum data volume maxDT.
10. The method of constructing a hadoop parameter optimization model according to claim 1, wherein the preset time is not continuous and the processing time is continuous.
CN202210671845.8A 2022-06-15 2022-06-15 Construction method of hadoop parameter optimization model Pending CN115061978A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210671845.8A CN115061978A (en) 2022-06-15 2022-06-15 Construction method of hadoop parameter optimization model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210671845.8A CN115061978A (en) 2022-06-15 2022-06-15 Construction method of hadoop parameter optimization model

Publications (1)

Publication Number Publication Date
CN115061978A true CN115061978A (en) 2022-09-16

Family

ID=83200687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210671845.8A Pending CN115061978A (en) 2022-06-15 2022-06-15 Construction method of hadoop parameter optimization model

Country Status (1)

Country Link
CN (1) CN115061978A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116149790A (en) * 2023-02-15 2023-05-23 北京景安云信科技有限公司 Method for starting multiple Springboot items based on JVM process reduction
CN116149790B (en) * 2023-02-15 2023-11-21 北京景安云信科技有限公司 Method for starting multiple Springboot items based on JVM process reduction

Similar Documents

Publication Publication Date Title
US20230004436A1 (en) Container scheduling method and apparatus, and non-volatile computer-readable storage medium
CN105005570A (en) Method and apparatus for mining massive intelligent power consumption data based on cloud computing
CN111414070B (en) Case power consumption management method and system, electronic device and storage medium
CN117036104B (en) Intelligent electricity utilization method and system based on electric power Internet of things
CN113010260A (en) Elastic expansion method and system for container quantity
CN115061978A (en) Construction method of hadoop parameter optimization model
CN113228574A (en) Computing resource scheduling method, scheduler, internet of things system and computer readable medium
CN113872813A (en) Full life cycle management method and system for carrier communication equipment
US20040117408A1 (en) Systems, methods and articles of manufacture for determining available space in a database
CN103607731A (en) Method and device for processing measurement reports
CN111314234B (en) Flow distribution method and device, storage medium and electronic equipment
CN117472652A (en) Data backup method, device and system of cloud computing operation and maintenance platform
CN115766473B (en) Resource capacity planning method suitable for cloud platform operation
CN109450672B (en) Method and device for identifying bandwidth demand burst
CN117593171B (en) Image acquisition, storage and processing method based on FPGA
WO2013128836A1 (en) Virtual server management device and method for determining destination of virtual server
CN114079997B (en) High-performance communication method based on WSN (Wireless sensor network) improved routing protocol
CN112383949B (en) Edge computing and communication resource allocation method and system
CN116974468B (en) Equipment data storage management method and system based on big data
CN117519913B (en) Method and system for elastically telescoping scheduling of container memory resources
CN102752122A (en) Device and method for acquiring multidimensional static performance data in network management
CN117909069A (en) Container storage performance optimization method, device, equipment and storage medium
CN116755887A (en) Automatic slot deployment system and method based on big data balanced acquisition
CN117370138A (en) High capacity distributed storage system
CN118118457A (en) Method for automatically registering working node of distributed system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination