WO2023115931A1 - Big data component parameter adjustment method, apparatus, electronic device and storage medium - Google Patents

Big data component parameter adjustment method, apparatus, electronic device and storage medium

Info

Publication number
WO2023115931A1
WO2023115931A1 PCT/CN2022/107123 CN2022107123W WO2023115931A1 WO 2023115931 A1 WO2023115931 A1 WO 2023115931A1 CN 2022107123 W CN2022107123 W CN 2022107123W WO 2023115931 A1 WO2023115931 A1 WO 2023115931A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
component
parameters
big data
parameter
Prior art date
Application number
PCT/CN2022/107123
Other languages
English (en)
French (fr)
Inventor
田家辉
Original Assignee
浪潮通信信息系统有限公司
Priority date
Filing date
Publication date
Application filed by 浪潮通信信息系统有限公司
Publication of WO2023115931A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Definitions

  • The present disclosure relates to the technical field of operation and maintenance data processing, and in particular to a method, apparatus, electronic device and storage medium for adjusting parameters of a big data component.
  • The present disclosure provides a big data component parameter adjustment method, apparatus, electronic device and medium to solve the technical problems of poor user experience and high labor costs caused by the shortcomings of manually adjusting component parameters in the prior art.
  • The present disclosure achieves adaptive adjustment of component parameters, reducing labor costs and improving user experience.
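  • The present disclosure provides a method for adjusting parameters of a big data component, including:
  • monitoring the operation of a target component in real time, where the target component is any one of the big data components;
  • adjusting multiple target parameters corresponding to the target component according to the operation of the target component;
  • according to the adjusted target parameters, a first process applies to a resource manager for additional resources, and re-determines a target execution process according to the additional resources to complete unfinished tasks.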
  • Adjusting the multiple target parameters corresponding to the target component according to the operation of the target component includes:
  • determining the multiple target parameters according to the running time and a preset timeout threshold;
  • adjusting the multiple target parameters according to their values and multiple target threshold intervals.
  • Adjusting the multiple target parameters according to their values and the multiple target threshold intervals includes:
  • when a first target parameter falls within a first target threshold interval, adjusting the first target parameter;
  • the first target parameter is any one of the multiple target parameters, and the first target threshold interval is any one of the multiple target threshold intervals.
  • The first process applying to the resource manager for additional resources and re-determining the target execution process according to the additional resources to complete unfinished tasks includes:
  • the first process applies to the resource manager for additional resources;
  • after receiving the additional resources allocated by the resource manager, the first process starts the target execution process and assigns the unfinished tasks to the target execution process.
  • The first process starting the target execution process after receiving the additional resources allocated by the resource manager, and assigning unfinished tasks to the target execution process, includes:
  • the first process requests the multiple execution processes that have been assigned tasks but have not completed them to hand back their unfinished tasks;
  • the withdrawn unfinished tasks are reassigned to the target execution process.
  • The first process starting the target execution process after receiving the additional resources allocated by the resource manager, and assigning unfinished tasks to the target execution process, further includes:
  • when the tasks in a first execution process are completed, the first execution process reports a completion message to the first process, and the first process applies to the resource manager to cancel the first execution process and restarts it according to the target parameters;
  • the restarted first execution process executes new tasks;
  • the first execution process is any one of the multiple execution processes.
  • the real-time monitoring of the operation of the target component includes:
  • The present disclosure also provides a big data component parameter adjustment device, including:
  • a monitoring module configured to monitor the operation of the target component in real time, where the target component is any one of the big data components;
  • an adjustment module configured to adjust multiple target parameters corresponding to the target component according to the operation of the target component;
  • an execution module configured so that, according to the adjusted target parameters, the first process applies to the resource manager for additional resources and re-determines the target execution process according to the additional resources to complete unfinished tasks.
  • The present disclosure also provides an electronic device, including a memory, a processor, and a computer program stored on the memory and operable on the processor. When the processor executes the program, the steps of any one of the above methods for adjusting parameters of a big data component are implemented.
  • The present disclosure also provides a non-transitory computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of any one of the above methods for adjusting parameters of a big data component are implemented.
  • The disclosure provides a big data component parameter adjustment method, apparatus, electronic device and medium. The operation of the target component is monitored in real time, where the target component is any one of the big data components; multiple target parameters corresponding to the target component are adjusted according to the operation of the target component; and, according to the adjusted target parameters, the first process applies to the resource manager for additional resources and re-determines the target execution process according to the additional resources to complete unfinished tasks.
  • The big data component parameter adjustment method provided in the present disclosure can adjust parameters adaptively according to the production environment, realize dynamic adjustment of component parameters, reduce labor costs, and improve user experience.
  • Fig. 1 is a schematic flowchart of a method for adjusting parameters of a big data component provided by the present disclosure.
  • Fig. 2 is a schematic diagram of the overall flow of the method for adjusting parameters of a big data component provided by the present disclosure.
  • Fig. 3 is a schematic structural diagram of a big data component parameter adjustment device provided by the present disclosure.
  • Fig. 4 is a schematic structural diagram of an electronic device provided by the present disclosure.
  • Fig. 1 is a schematic flowchart of a method for adjusting parameters of a big data component provided by the present disclosure. As shown in Fig. 1, the method specifically includes the following steps:
  • Step 101: Monitor the operation of the target component in real time; the target component is any one of the big data components.
  • Monitoring software is used to monitor the operation of the target component in the big data cluster in real time.
  • The monitoring software in this embodiment is implemented in Scala and Java and uses ZooKeeper to synchronize and dynamically adjust parameter configuration; the source code of the big data component is modified to work with the monitoring software. ZooKeeper is coordination software that provides consistency services for distributed applications.
  • Big data components include computing components such as Spark, Hadoop and Flink, and storage components such as Hive, HBase and Redis.
  • The target component can be any one of them, selected according to the actual needs of the user, and is not specifically limited here.
  • Step 102: Adjust multiple target parameters corresponding to the target component according to the running status of the target component.
  • A target parameter is a dynamically adjustable parameter that describes the operating performance of the target component.
  • Examples include the total number of executor processes started in the cluster and the memory of each executor process. The target parameters are written to a specified znode in ZooKeeper so that the running status of the target component can be monitored in real time.
  • The target parameters are selected and confirmed according to the target component chosen by the user and are not specifically limited here.
  • For example, when the target component is Spark, the monitoring software monitors the running status of the Spark component in real time, which can specifically include the running status of its Application, Stage and TaskSet.
  • According to the running status, the monitoring software dynamically adjusts the multiple target parameters and writes them to the specified znode in ZooKeeper for storage.
  • The target parameters can be the total number of executor processes started in the cluster (num-executors), the memory of each executor process (executor-memory), and the memory of the Driver process that manages the component (driver-memory).
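By way of non-limiting illustration, the three Spark target parameters named above correspond to the spark-submit options `--num-executors`, `--executor-memory` and `--driver-memory`. The sketch below groups them in one record and renders them as command-line options. The `TargetParams` class and the `to_spark_submit_args` function are assumptions introduced for illustration and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TargetParams:
    """Hypothetical container for the three Spark target parameters."""
    num_executors: int        # total number of executor processes
    executor_memory_gb: int   # memory of each executor process, in GiB
    driver_memory_gb: int     # memory of the Driver process, in GiB


def to_spark_submit_args(params: TargetParams) -> List[str]:
    """Render the parameters as spark-submit command-line options."""
    return [
        "--num-executors", str(params.num_executors),
        "--executor-memory", f"{params.executor_memory_gb}g",
        "--driver-memory", f"{params.driver_memory_gb}g",
    ]
```

Keeping the parameters in one immutable record makes it straightforward to compare an adjusted set with the previous one before anything is written out.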
  • Yarn is a Hadoop resource manager and a general resource management system that can provide unified resource management and scheduling for upper-layer applications.
  • the monitoring software adopts distributed deployment to realize distributed monitoring, which can obtain the operation status of each node in the cluster more quickly and quickly adjust target parameters.
  • Step 103: According to the adjusted target parameters, the first process applies to the resource manager for additional resources, and re-determines the target execution process according to the additional resources to complete unfinished tasks.
  • The first process applies to the resource manager for additional resources according to the adjusted values of the multiple target parameters, and then re-determines the target execution process according to the additional resources obtained, so as to execute the unfinished tasks.
  • The additional resource refers to a new execution process, which is requested in order to complete the unfinished tasks of the old process.
  • For example, the first process is the Driver process, which periodically scans for changes in the multiple target parameters. When the target parameters change, it applies to the resource manager Yarn for additional resources for the target component, and then starts the target execution process according to the allocated additional resources to execute the unfinished tasks.
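By way of non-limiting illustration, the periodic scan for parameter changes may be sketched as a poller that compares the current parameter values with the values seen on the previous scan. The `ParamWatcher` class and its `read_params` callback are assumptions introduced for illustration; in the disclosure the values would be read from the specified znode, which is replaced here by a plain dictionary.

```python
from typing import Callable, Dict


class ParamWatcher:
    """Hypothetical poller that reports which target parameters changed."""

    def __init__(self, read_params: Callable[[], Dict[str, int]]) -> None:
        self._read = read_params
        # Snapshot the initial values so the first poll reports no change.
        self._last = dict(read_params())

    def poll(self) -> Dict[str, int]:
        """Return {name: new_value} for every parameter changed since the last poll."""
        current = dict(self._read())
        changed = {k: v for k, v in current.items() if self._last.get(k) != v}
        self._last = current
        return changed
```

A caller would invoke `poll()` on a timer and request additional resources only when the returned dictionary is non-empty.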
  • In the method provided by the present disclosure, the operation of the target component is monitored in real time, where the target component is any one of the big data components; multiple target parameters corresponding to the target component are adjusted according to its running status; and, according to the adjusted target parameters, the first process applies to the resource manager for additional resources and re-determines the target execution process according to the additional resources to complete unfinished tasks.
  • the big data component parameter adjustment method provided in the present disclosure can adjust parameters adaptively according to the production environment, realize dynamic adjustment of component parameters, reduce labor costs, and improve user experience.
  • Adjusting the multiple target parameters corresponding to the target component according to the operation of the target component includes:
  • determining the multiple target parameters according to the running time and a preset timeout threshold;
  • adjusting the multiple target parameters according to their values and multiple target threshold intervals.
  • For example, the Spark component is submitted to the resource manager Yarn to run. When the monitoring software detects that the Spark component is running very slowly and that its running time exceeds the timeout threshold configured in advance in the monitoring software, the monitoring software determines multiple target values, such as the total number of executor processes started, the memory of each executor process, and the number of Central Processing Unit (CPU) cores of each executor process. It then uses a calculation method to evaluate the determined target values against the target threshold intervals, and dynamically adjusts target parameters such as num-executors, executor-memory and driver-memory.
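By way of non-limiting illustration, the timeout trigger may be sketched as a check of each component's elapsed running time against the preset timeout threshold. The function name, the use of seconds as the unit, and the dictionary of start times are assumptions introduced for illustration; the disclosure does not specify how running time is measured.

```python
from typing import Dict, List


def components_over_timeout(
    started_at: Dict[str, float], now: float, timeout_s: float
) -> List[str]:
    """Return the names of components whose running time exceeds the threshold.

    started_at maps a component or job name to its start time in seconds;
    a component is reported only when its elapsed time is strictly greater
    than timeout_s.
    """
    return sorted(
        name for name, start in started_at.items() if now - start > timeout_s
    )
```

Only the components returned by this check would proceed to the parameter adjustment step.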
  • The target threshold interval determines the range within which a target parameter may be adjusted. For example, if the value of the target parameter is 5 and the target threshold interval is 3 to 10, the target parameter can be automatically adjusted as needed to another value within the interval, such as 6. If the target parameter value would be 12, it is automatically adjusted to the boundary value of 10, and the monitoring module then records a log and sends an alarm to the operation and maintenance personnel.
  • In the method provided by the present disclosure, judging the relationship between the target parameter and the preset target threshold interval achieves dynamic adjustment of the target parameter and reduces the cost and difficulty of operating and maintaining the big data cluster.
  • Adjusting the multiple target parameters according to their values and the multiple target threshold intervals includes:
  • when a first target parameter falls within a first target threshold interval, adjusting the first target parameter;
  • the first target parameter is any one of the multiple target parameters, and the first target threshold interval is any one of the multiple target threshold intervals.
  • When the first target parameter falls within the preset first target threshold interval, the first target parameter is adjusted. The first target parameter can be any one of three parameters: the total number of executor processes started in the cluster, the memory of each executor process, and the memory of the Driver process that manages the component. Note that the adjusted value of the first target parameter also lies within the first target threshold interval.
  • When the first target parameter does not fall within the preset first target threshold interval, it is adjusted to the critical value of that interval. Falling outside the interval means that the resources corresponding to the target parameter cannot be fully satisfied, so only a partial adjustment is made; the monitoring software records the event and sends an alarm so that operation and maintenance personnel can review it and adjust the threshold in time.
  • The critical value may be the maximum value or the minimum value of the first target threshold interval; it may be set according to the actual needs of the user and is not specifically limited here.
  • In the method provided by the present disclosure, setting the target threshold intervals in advance, with a maximum and a minimum value for each target parameter, prevents adjustments from being too large or too small during dynamic adjustment and avoids resource waste or exceptions.
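By way of non-limiting illustration, the threshold-interval rule may be sketched as a clamp that also reports whether an alarm is needed. The function name and the choice of the nearest boundary as the critical value are assumptions introduced for illustration; the disclosure leaves the choice of critical value to the user.

```python
from typing import Tuple


def adjust_within_interval(proposed: int, low: int, high: int) -> Tuple[int, bool]:
    """Clamp a proposed parameter value to [low, high].

    Returns (value, alarm). alarm is True when the proposed value fell
    outside the interval and was replaced by the nearest boundary, which
    is the case in which a log entry and an alarm would be produced.
    """
    if low > high:
        raise ValueError("interval lower bound exceeds upper bound")
    if proposed < low:
        return low, True
    if proposed > high:
        return high, True
    return proposed, False
```

With the interval 3 to 10 from the example, a proposed value of 6 is accepted unchanged, while a proposed value of 12 is replaced by 10 and flagged for an alarm.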
  • The first process applying to the resource manager for additional resources and re-determining the target execution process according to the additional resources to complete unfinished tasks includes:
  • the first process applies to the resource manager for additional resources;
  • after receiving the additional resources allocated by the resource manager, the first process starts the target execution process and assigns the unfinished tasks to the target execution process.
  • For example, when the target component is the Spark component, the first process is the Driver process. The Driver process periodically scans for changes in the multiple target parameters and then applies to the resource manager Yarn for additional resources for the Spark component.
  • After obtaining the allocated additional resources, the first process automatically starts the target execution process and assigns the unfinished tasks to it.
  • The target execution process is the new Executor process, which is determined from the allocated additional resources.
  • In the method provided by the present disclosure, the first process applies to the resource manager for additional resources and, after receiving the additional resources allocated by the resource manager, starts the target execution process and assigns unfinished tasks to it, which realizes resource scheduling and improves resource utilization.
  • The first process starting the target execution process after receiving the additional resources allocated by the resource manager, and assigning unfinished tasks to the target execution process, includes:
  • the first process requests the multiple execution processes that have been assigned tasks but have not completed them to hand back their unfinished tasks;
  • the withdrawn unfinished tasks are reassigned to the target execution process.
  • For example, after obtaining the additional resources allocated by Yarn, the Driver process starts a new Executor process and then requests the old Executor processes that already have assigned tasks to hand back the tasks that have not yet started running. The old Executor processes return those tasks to the Driver process, and the Driver process reassigns them to the new Executor process for execution.
  • In the method provided by the present disclosure, the first process requests the multiple execution processes that have been assigned tasks but have not completed them to hand back their unfinished tasks, and then reassigns the withdrawn tasks to the target execution process, which realizes the scheduling and allocation of resources and improves the running speed of big data components.
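By way of non-limiting illustration, withdrawing tasks that have not yet started and reassigning them to the new execution process may be sketched with in-memory queues. The `Executor` class here is a hypothetical stand-in, not Spark's executor, and `reassign_unstarted` is a name introduced for illustration; a task that is already running is left where it is.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional


class Executor:
    """Hypothetical stand-in for an execution process with a task queue."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running: Optional[str] = None   # task currently executing
        self.pending: Deque[str] = deque()   # assigned but not yet started

    def withdraw_pending(self) -> List[str]:
        """Hand back every task that has not started running."""
        tasks = list(self.pending)
        self.pending.clear()
        return tasks


def reassign_unstarted(old: Iterable[Executor], target: Executor) -> List[str]:
    """Move all unstarted tasks from the old executors to the target executor."""
    moved: List[str] = []
    for executor in old:
        moved.extend(executor.withdraw_pending())
    target.pending.extend(moved)
    return moved
```

The function returns the moved tasks in withdrawal order, so the caller can log exactly what was reassigned.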
  • The first process starting the target execution process after receiving the additional resources allocated by the resource manager, and assigning unfinished tasks to the target execution process, further includes:
  • when the tasks in a first execution process are completed, the first execution process reports a completion message to the first process, and the first process applies to the resource manager to cancel the first execution process and restarts it according to the target parameters;
  • the restarted first execution process executes new tasks;
  • the first execution process is any one of the multiple execution processes.
  • When the tasks in the first execution process are completed, it reports the completion to the first process. Based on the received message, the first process applies to the resource manager Yarn to cancel the first execution process and then restarts it according to the multiple target parameters, so that the first execution process can execute new tasks.
  • The first execution process refers to an old Executor process that already has assigned tasks.
  • In the method provided by the present disclosure, reporting the completion message to the first process, having the first process apply to the resource manager to cancel the first execution process, and restarting the first execution process according to the target parameters to execute new tasks realizes resource reuse, maintains the running speed of big data components, and saves resources.
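By way of non-limiting illustration, cancelling a finished execution process and restarting it with the adjusted parameters may be sketched against an in-memory stand-in for the resource manager. `FakeResourceManager` and `on_tasks_completed` are assumptions introduced for illustration and do not represent the Yarn API.

```python
from typing import Dict, List


class FakeResourceManager:
    """In-memory stand-in for a resource manager such as Yarn (not a real API)."""

    def __init__(self) -> None:
        self.live: Dict[str, Dict[str, str]] = {}  # executor id -> launch params
        self.events: List[str] = []                # ordered record of actions

    def launch(self, executor_id: str, params: Dict[str, str]) -> None:
        self.live[executor_id] = dict(params)
        self.events.append(f"launch {executor_id}")

    def cancel(self, executor_id: str) -> None:
        del self.live[executor_id]
        self.events.append(f"cancel {executor_id}")


def on_tasks_completed(
    rm: FakeResourceManager, executor_id: str, adjusted_params: Dict[str, str]
) -> None:
    """Handle a completion report: cancel the old executor, then restart it
    with the adjusted target parameters so it can take new tasks."""
    rm.cancel(executor_id)
    rm.launch(executor_id, adjusted_params)
```

Cancelling before relaunching means the old allocation is released before the new one is requested, which matches the order described above.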
  • The real-time monitoring of the operation of the target component includes:
  • The way parameters are obtained is changed by modifying the source code of the big data component. For example, if the target component is a Spark component, the source code of the Spark component is modified so that parameters are no longer read from the configuration file but are obtained from the specified znode in ZooKeeper, which the monitoring software uses. Then, according to the running status of the cluster, the latest parameter values are dynamically written to the specified znode in ZooKeeper.
  • In this way the operation status of the target component can be accurately determined, and data support can be provided for the adjustment of the target parameters.
  • This embodiment can also realize distributed monitoring.
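By way of non-limiting illustration, publishing a parameter to a znode-like path and reading it back may be sketched with an in-memory store. The `InMemoryZnodes` class is a stand-in that does not use any ZooKeeper client library, and the `/tuning/<component>/<name>` path layout is a hypothetical convention; the disclosure does not specify the znode paths.

```python
from typing import Dict


class InMemoryZnodes:
    """Stand-in for a znode store: maps a path to raw bytes."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def set(self, path: str, value: bytes) -> None:
        self._data[path] = value

    def get(self, path: str) -> bytes:
        return self._data[path]  # raises KeyError if the path was never written


def param_path(component: str, name: str) -> str:
    """Hypothetical path layout for a component's parameter."""
    return f"/tuning/{component}/{name}"


def publish_param(store: InMemoryZnodes, component: str, name: str, value: str) -> None:
    """Monitoring side: write the latest parameter value."""
    store.set(param_path(component, name), value.encode("utf-8"))


def load_param(store: InMemoryZnodes, component: str, name: str) -> str:
    """Component side: read the parameter instead of consulting a config file."""
    return store.get(param_path(component, name)).decode("utf-8")
```

Values are stored as bytes because znodes hold byte arrays; encoding and decoding stay at the edges so callers work with strings.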
  • Taking a Spark component as the target component, the logic for obtaining the target parameters is changed so that they are read from the specified znode in ZooKeeper; the monitoring software dynamically writes the latest target parameter values to the specified znode in ZooKeeper according to the operation status of the big data components in the cluster.
  • The Spark component is submitted to the resource manager Yarn to run. If the Spark component is found to be running very slowly and its running time exceeds the timeout threshold configured in advance in the monitoring software, the monitoring software determines the number of Executor processes, the memory of each Executor process and the number of CPU cores of each Executor process, and then adjusts the three target parameters num-executors, executor-memory and driver-memory according to a calculation algorithm and the target threshold intervals configured in advance.
  • The adjusted target parameters are saved on the specified znode in ZooKeeper, and the Driver process periodically scans the three target parameters.
  • When the target parameters change, the Driver process applies again to the resource manager Yarn for additional resources for the Spark component. After obtaining the additional resources allocated by Yarn, the Driver process starts a new Executor process and then requests the old Executor processes that already have assigned tasks to hand back the tasks that have not yet run.
  • The old Executor processes return the tasks that have not yet run to the Driver process, the Driver process reassigns them to the new target Executor process, and the target execution process executes them.
  • Note that after an old Executor process finishes the tasks it is running, it reports to the Driver process. The Driver process then applies to the resource manager Yarn to cancel the old Executor process, restarts it according to the adjusted target parameters, and assigns it corresponding tasks.
  • The monitoring software can also detect other abnormalities of the Spark component. For example, if Shuffle runs slowly, the monitoring software adjusts the Shuffle parallelism parameter. As another example, when data skew occurs, the monitoring software detects that a certain task is running abnormally slowly, with a running time much higher than that of the other tasks in the TaskSet. It then modifies the parameters of the shuffle read task and resubmits that batch of TaskSets on other idle Executor processes according to the new target parameters. The results from whichever Executor processes finish first are used as the final results, and the Executor processes that have not finished are terminated directly.
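By way of non-limiting illustration, detecting a task whose running time is much higher than that of the other tasks in the same TaskSet may be sketched by comparing each task against a multiple of the median running time. The median-based rule and the default factor of 3.0 are assumptions introduced for illustration; the disclosure does not state how "much higher" is quantified.

```python
from statistics import median
from typing import Dict, List


def find_stragglers(durations: Dict[str, float], factor: float = 3.0) -> List[str]:
    """Return the tasks whose running time exceeds factor times the median.

    durations maps a task id to its running time so far. With fewer than
    two tasks there is nothing to compare against, so no task is reported.
    """
    if len(durations) < 2:
        return []
    threshold = factor * median(durations.values())
    return sorted(task for task, d in durations.items() if d > threshold)
```

The median is used rather than the mean so that a single very slow task does not raise the threshold it is being compared against.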
  • Corresponding dynamic parameter adjustment logic and algorithms are developed for the various big data components, making the approach compatible with most common big data components. The target threshold interval of each target parameter needs to be set in advance to prevent too many or too few resources from being requested and to avoid resource waste. This reduces the workload of operation and maintenance personnel, who can then spend more time on the business logic itself, and indirectly lowers the skill threshold for operation and maintenance work.
  • Fig. 3 is a schematic structural diagram of a big data component parameter adjustment device provided by the present disclosure. As shown in Fig. 3, the device includes:
  • a monitoring module 301 configured to monitor the operation of the target component in real time, where the target component is any one of the big data components;
  • an adjustment module 302 configured to adjust multiple target parameters corresponding to the target component according to the operation of the target component;
  • an execution module 303 configured so that, according to the adjusted target parameters, the first process applies to the resource manager for additional resources and re-determines the target execution process according to the additional resources to complete unfinished tasks.
  • The big data component parameter adjustment device provided by the present disclosure monitors the operation of the target component in real time, where the target component is any one of the big data components; adjusts multiple target parameters corresponding to the target component according to its operation; and, according to the adjusted target parameters, has the first process apply to the resource manager for additional resources and re-determine the target execution process according to the additional resources to complete unfinished tasks.
  • the big data component parameter adjustment method provided in the present disclosure can adjust parameters adaptively according to the production environment, realize dynamic adjustment of component parameters, reduce labor costs, and improve user experience.
  • Fig. 4 is a schematic diagram of the physical structure of an electronic device provided in an embodiment of the present disclosure. As shown in Fig. 4, the present disclosure provides an electronic device including a processor 401, a memory 402 and a bus 403.
  • The processor 401 and the memory 402 communicate with each other through the bus 403.
  • The processor 401 is configured to call the program instructions in the memory 402 to execute the methods provided in the above method embodiments, for example: monitoring the operation of the target component in real time, where the target component is any one of the big data components; adjusting multiple target parameters corresponding to the target component according to the operation of the target component; and, according to the adjusted target parameters, the first process applying to the resource manager for additional resources and re-determining the target execution process according to the additional resources to complete unfinished tasks.
  • The above logic instructions in the memory 402 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium.
  • The technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • The aforementioned storage media include a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
  • In another aspect, the present disclosure also provides a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions. When the program instructions are executed by a computer, the computer can execute the methods provided above, the method including: monitoring the running status of the target component in real time, the target component being any one of the big data components; adjusting, according to the running status of the target component, a plurality of target parameters corresponding to the target component; and, according to the adjusted target parameters, a first process applying to the resource manager for additional resources and re-determining, according to the additional resources, a target execution process to complete the unfinished tasks.
  • In yet another aspect, the present disclosure also provides a non-transitory computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the methods provided above are implemented, the method including: monitoring the running status of the target application in real time; monitoring the running status of the target component in real time, the target component being any one of the big data components; adjusting, according to the running status of the target component, a plurality of target parameters corresponding to the target component; and, according to the adjusted target parameters, a first process applying to the resource manager for additional resources and re-determining, according to the additional resources, a target execution process to complete the unfinished tasks.
  • The apparatus embodiments described above are only illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.
  • From the description of the above implementations, a person skilled in the art can clearly understand that each implementation can be realized by software plus a necessary general-purpose hardware platform, and of course also by hardware.
  • Based on this understanding, the above technical solution, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments or in some parts of the embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)
  • General Factory Administration (AREA)

Abstract

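The present disclosure provides a big data component parameter adjustment method and apparatus, an electronic device, and a medium. The method includes: monitoring the running status of a target component in real time, the target component being any one of the big data components; adjusting, according to the running status of the target component, a plurality of target parameters corresponding to the target component; and, according to the adjusted target parameters, a first process applying to a resource manager for additional resources and re-determining, according to the additional resources, a target execution process to complete the unfinished tasks.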

Description

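Big data component parameter adjustment method and apparatus, electronic device, and storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 2021115726590, filed on December 21, 2021 and entitled "Big data component parameter adjustment method and apparatus, electronic device, and storage medium", which is incorporated herein by reference in its entirety.
Technical Field
The present disclosure relates to the technical field of operation and maintenance data processing, and in particular to a big data component parameter adjustment method and apparatus, an electronic device, and a storage medium.
Background
As the big data ecosystem continues to develop and mature, more and more big data components are being released and put into use.
At present, big data components are gradually being deployed in production environments, and as their functionality grows richer, the overall big data system becomes increasingly complex. Because the volume of component data in big data components is very large, it is very difficult to optimize the parameters of big data components used in a production environment while keeping the system running normally.
In the prior art, adjusting and optimizing big data component parameters relies on manual operation. This requires a large number of operation and maintenance personnel, involves a heavy workload, and is difficult to carry out; adjustments are also not timely enough, which results in a poor user experience.
Summary
The present disclosure provides a big data component parameter adjustment method and apparatus, an electronic device, and a medium, to solve the technical problems in the prior art that manual adjustment of component parameters is not timely enough, which leads to a poor user experience, and that labor costs are high. The present disclosure achieves adaptive adjustment of component parameters, reduces labor costs, and improves user experience.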
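In a first aspect, the present disclosure provides a big data component parameter adjustment method, including:
monitoring the running status of a target component in real time, the target component being any one of the big data components;
adjusting, according to the running status of the target component, a plurality of target parameters corresponding to the target component; and
according to the adjusted target parameters, applying, by a first process, to a resource manager for additional resources, and re-determining, according to the additional resources, a target execution process to complete the unfinished tasks.
Further, according to the big data component parameter adjustment method provided by the present disclosure, adjusting the plurality of target parameters corresponding to the target component according to the running status of the target component includes:
when the target component is a Spark component and the Spark component is running abnormally, determining the plurality of target parameters according to the running duration and a preset timeout threshold; and
adjusting the plurality of target parameters according to the plurality of target parameters and a plurality of target threshold intervals.
Further, according to the big data component parameter adjustment method provided by the present disclosure, adjusting the plurality of target parameters according to the plurality of target parameters and the plurality of target threshold intervals includes:
when a first target parameter falls within a first target threshold interval, adjusting the first target parameter; and
when the first target parameter does not fall within the first target threshold interval, adjusting the first target parameter to the boundary value of the first target threshold interval that is closest to the first target parameter;
wherein the first target parameter is any one of the plurality of target parameters, and the first target threshold interval is any one of the plurality of target threshold intervals.
Further, according to the big data component parameter adjustment method provided by the present disclosure, the step in which, according to the adjusted target parameters, the first process applies to the resource manager for additional resources and re-determines, according to the additional resources, the target execution process to complete the unfinished tasks includes:
when the target component is a Spark component, applying, by the first process, to the resource manager for additional resources; and
after receiving the additional resources allocated by the resource manager, starting, by the first process, the target execution process and assigning the unfinished tasks to the target execution process.
Further, according to the big data component parameter adjustment method provided by the present disclosure, the step in which the first process, after receiving the additional resources allocated by the resource manager, starts the target execution process and assigns the unfinished tasks to the target execution process includes:
requesting, by the first process, withdrawal of the unfinished tasks from a plurality of execution processes that have been assigned tasks and have not completed them; and
reassigning the withdrawn unfinished tasks to the target execution process.
Further, according to the big data component parameter adjustment method provided by the present disclosure, the step in which the first process, after receiving the additional resources allocated by the resource manager, starts the target execution process and assigns the unfinished tasks to the target execution process further includes:
when the tasks in a first execution process have finished executing, reporting a completion message to the first process, whereupon the first process applies to the resource manager to deregister the first execution process, restarts the first execution process according to the target parameters, and has it execute new tasks;
wherein the first execution process is any one of the plurality of execution processes.
Further, according to the big data component parameter adjustment method provided by the present disclosure, monitoring the running status of the target component in real time includes:
modifying the way the target component obtains its parameters so that they are obtained from a node in the monitoring agent software; and
determining the running status of the target component according to the obtained parameters.
In a second aspect, the present disclosure further provides a big data component parameter adjustment apparatus, including:
a monitoring module, configured to monitor the running status of a target component in real time, the target component being any one of the big data components;
an adjustment module, configured to adjust, according to the running status of the target component, a plurality of target parameters corresponding to the target component; and
an execution module, configured such that, according to the adjusted target parameters, a first process applies to a resource manager for additional resources and re-determines, according to the additional resources, a target execution process to complete the unfinished tasks.
In a third aspect, the present disclosure further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of any of the big data component parameter adjustment methods described above.
In a fourth aspect, the present disclosure further provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of any of the big data component parameter adjustment methods described above.
With the big data component parameter adjustment method and apparatus, electronic device, and medium provided by the present disclosure, the running status of a target component, which is any one of the big data components, is monitored in real time; a plurality of target parameters corresponding to the target component are adjusted according to that running status; and, according to the adjusted target parameters, a first process applies to a resource manager for additional resources and re-determines, according to the additional resources, a target execution process to complete the unfinished tasks. The method can adaptively adjust parameters according to the production environment, achieves dynamic adjustment of component parameters, reduces labor costs, and improves user experience.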
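Brief Description of the Drawings
To describe the technical solutions of the present disclosure or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present disclosure, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of the big data component parameter adjustment method provided by the present disclosure;
FIG. 2 is a schematic diagram of the overall flow of the big data component parameter adjustment method provided by the present disclosure;
FIG. 3 is a schematic structural diagram of the big data component parameter adjustment apparatus provided by the present disclosure;
FIG. 4 is a schematic structural diagram of the electronic device provided by the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the present disclosure clearer, the technical solutions of the present disclosure are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some rather than all of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.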
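FIG. 1 is a schematic flowchart of the big data component parameter adjustment method provided by the present disclosure. As shown in FIG. 1, the method specifically includes the following steps:
Step 101: Monitor the running status of a target component in real time; the target component is any one of the big data components.
In this embodiment, the running status of the target component in the big data system must be monitored in real time, where the running status includes whether the component is running slowly, whether it is running abnormally, and so on. Monitoring software is used for this purpose. The monitoring software in this embodiment is implemented in Scala and Java, uses ZooKeeper to synchronize and dynamically adjust parameter configuration, and is adapted to the big data components by modifying their source code. ZooKeeper is agent software that provides consistency services for distributed application components.
It should be noted that the target component is any one of the big data components. For example, computing components include Spark, Hadoop, and Flink, and storage components include Hive, HBase, and Redis. The target component in this embodiment may be any of these and can be chosen according to the user's actual needs; no specific limitation is imposed here.
Step 102: Adjust, according to the running status of the target component, a plurality of target parameters corresponding to the target component.
In this embodiment, a plurality of target parameters corresponding to the target component are adjusted according to its running status. Target parameters are parameters that describe the running performance of the target component and can be adjusted dynamically, for example the total number of executor processes started in the cluster and the memory of each executor process. The target parameters are written to a designated node (znode) in ZooKeeper and are used for real-time monitoring of the target component's running status. It should be noted that the target parameters can be selected according to the target component chosen by the user; no specific limitation is imposed here.
For example, when the target component is Spark, the Spark component is submitted to Yarn to run, and the monitoring software monitors its running status in real time, which may specifically include the running status of the Application, Stage, and Taskset components. Based on the running status, the target parameters are adjusted dynamically and written to the designated znode in ZooKeeper for storage. The target parameters may be the total number of executor processes started in the cluster (num-executors), the memory of each executor process (executor-memory), and the component's management memory (driver-memory).
Yarn is a Hadoop resource manager, a general-purpose resource management system that provides unified resource management and scheduling for upper-layer applications.
It should be noted that, in this embodiment, the monitoring software is deployed in a distributed manner to achieve distributed monitoring, so that it can obtain the running status of each node in the cluster more quickly and adjust the target parameters promptly.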
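As an illustration of Step 102, the following is a minimal sketch of how the three Spark target parameters might be packaged before being written to the designated znode. The znode path, the JSON encoding, the megabyte units, and every name in the block are assumptions made for the example; the disclosure does not specify a storage format, and no ZooKeeper client is used here.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical path; the disclosure only says "a designated znode".
PARAMS_ZNODE = "/bigdata-tuning/spark/target-params"


@dataclass(frozen=True)
class SparkTargetParams:
    num_executors: int       # total executor processes (num-executors)
    executor_memory_mb: int  # memory per executor (executor-memory), assumed in MB
    driver_memory_mb: int    # driver memory (driver-memory), assumed in MB


def encode_params(params: SparkTargetParams) -> bytes:
    """Serialize the parameters to the bytes that would be stored on the znode."""
    return json.dumps(asdict(params), sort_keys=True).encode("utf-8")


def decode_params(payload: bytes) -> SparkTargetParams:
    """Inverse of encode_params, as a reader of the znode would apply it."""
    return SparkTargetParams(**json.loads(payload.decode("utf-8")))
```

In a real deployment the encoded bytes would be handed to whatever ZooKeeper client the monitoring software uses; that call is deliberately left out of the sketch.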
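Step 103: According to the adjusted target parameters, a first process applies to a resource manager for additional resources and re-determines, according to the additional resources, a target execution process to complete the unfinished tasks.
In this embodiment, the first process applies to the resource manager for additional resources according to the values of the adjusted target parameters, and then re-determines, according to the resources granted, a target execution process to execute the tasks that have not yet been completed. In this embodiment, the additional resources are new execution processes: new execution processes are requested to complete the tasks that the old processes have not yet completed.
It should be noted that, in this embodiment, the first process is the Driver process, which periodically scans for changes in the target parameters. When the target parameters change, it applies to the resource manager Yarn for additional resources for the target component, and then starts the target execution process on the allocated resources to execute the unfinished tasks.
With the big data component parameter adjustment method provided by the present disclosure, the running status of a target component, which is any one of the big data components, is monitored in real time; a plurality of target parameters corresponding to the target component are adjusted according to that running status; and, according to the adjusted target parameters, a first process applies to a resource manager for additional resources and re-determines, according to the additional resources, a target execution process to complete the unfinished tasks. The method can adaptively adjust parameters according to the production environment, achieves dynamic adjustment of component parameters, reduces labor costs, and improves user experience.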
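Based on any of the above embodiments, in this embodiment, adjusting the plurality of target parameters corresponding to the target component according to the running status of the target component includes:
when the target component is a Spark component and the Spark component is running abnormally, determining the plurality of target parameters according to the running duration and a preset timeout threshold; and
adjusting the plurality of target parameters according to the plurality of target parameters and a plurality of target threshold intervals.
In this embodiment, when the Spark component is submitted to the resource manager Yarn to run, and it is observed during execution that the Spark component is running very slowly and its running duration exceeds the timeout threshold configured in advance in the monitoring software, the monitoring software determines a plurality of target values, such as the total number of executor processes started, the memory of each executor process, and the number of central processing unit (CPU) cores of each executor process. Then, based on the determined target values and a certain calculation method, it determines the relationship between the target values and the target threshold intervals and dynamically adjusts the target parameters, such as num-executors, executor-memory, and driver-memory.
It should be noted that a target threshold interval defines the range within which a target parameter may be adjusted. For example, if the target parameter value is 5 and the configured target threshold interval is 3 to 10, the target parameter may be automatically adjusted to 6 as needed. If the target parameter value is 12, it is first automatically adjusted to the boundary value 10, and the monitoring module then records a log entry and sends an alert to the operation and maintenance personnel.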
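The interval rule in the example above can be sketched as a small clamping function. This is an illustrative reading, not the disclosed algorithm: it assumes the value being checked is the proposed new value of the parameter, that the interval is closed at both ends, and that an out-of-range proposal should raise an alert flag. The names `Adjustment` and `adjust_to_interval` are hypothetical.

```python
from typing import NamedTuple


class Adjustment(NamedTuple):
    value: int   # value that is actually applied
    alert: bool  # True when the proposal fell outside the interval


def adjust_to_interval(proposed: int, low: int, high: int) -> Adjustment:
    """Apply a proposed value, clamping it to the nearest boundary of [low, high]."""
    if low > high:
        raise ValueError("interval lower bound exceeds upper bound")
    if proposed < low:
        return Adjustment(low, True)    # nearest boundary is the minimum
    if proposed > high:
        return Adjustment(high, True)   # nearest boundary is the maximum
    return Adjustment(proposed, False)  # inside the interval: apply as is
```

With the interval 3 to 10 from the text, a proposal of 6 is applied unchanged, while a proposal of 12 is reduced to 10 and flagged so that the monitoring side can log it and notify operations staff.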
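With the big data component parameter adjustment method provided by the present disclosure, target parameters are adjusted dynamically by determining the relationship between each target parameter and its preset target threshold interval, which reduces the operation and maintenance cost and difficulty of a big data cluster.
Based on any of the above embodiments, in this embodiment, adjusting the plurality of target parameters according to the plurality of target parameters and the plurality of target threshold intervals includes:
when a first target parameter falls within a first target threshold interval, adjusting the first target parameter; and
when the first target parameter does not fall within the first target threshold interval, adjusting the first target parameter to the boundary value of the first target threshold interval that is closest to the first target parameter;
wherein the first target parameter is any one of the plurality of target parameters, and the first target threshold interval is any one of the plurality of target threshold intervals.
In this embodiment, when the first target parameter falls within the preset first target threshold interval, the first target parameter is adjusted. The first target parameter may be any one of three parameters: the total number of executor processes started in the cluster, the memory of each executor process, and the component's management memory. It should be noted that the value of the first target parameter after adjustment also lies within the first target threshold interval.
In this embodiment, when the first target parameter does not fall within the preset first target threshold interval, it is adjusted to a boundary value of that interval. A parameter falling outside the interval indicates that the resources corresponding to that target parameter cannot be fully satisfied; a corresponding partial adjustment can be made, and the monitoring software records the event and sends an alert so that operation and maintenance personnel can refer to it and adjust the threshold in time. It should be noted that the boundary value may be the maximum or the minimum of the first target threshold interval and can be set according to the user's actual needs; no specific limitation is imposed here.
With the big data component parameter adjustment method provided by the present disclosure, target threshold intervals are configured in advance, and the maximum and minimum set for each target parameter prevent adjustments that are too large or too small during dynamic adjustment, avoiding wasted resources and anomalies.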
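Based on any of the above embodiments, in this embodiment, the step in which, according to the adjusted target parameters, the first process applies to the resource manager for additional resources and re-determines, according to the additional resources, the target execution process to complete the unfinished tasks includes:
when the target component is a Spark component, applying, by the first process, to the resource manager for additional resources; and
after receiving the additional resources allocated by the resource manager, starting, by the first process, the target execution process and assigning the unfinished tasks to the target execution process.
In this embodiment, when the target component is a Spark component, the first process is the Driver process. The Driver process periodically scans for changes in the target parameters and then applies to the resource manager Yarn for additional resources for the Spark component. After obtaining the allocated additional resources, the first process automatically starts the target execution process and assigns the unfinished tasks to it. It should be noted that the target execution process is a new Executor process, determined from the additional resources that were allocated.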
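The periodic scan performed by the Driver process can be sketched as a simple change detector. This is a minimal sketch under stated assumptions: `read_params` is a hypothetical callable standing in for "read the current target parameters from the designated znode", and the class only reports what changed; requesting resources from the resource manager is outside the sketch.

```python
from typing import Callable, Dict, Optional, Tuple

Params = Dict[str, int]


class ParamWatcher:
    """Detects changes between successive scans of the target parameters."""

    def __init__(self, read_params: Callable[[], Params]) -> None:
        self._read = read_params          # hypothetical reader of the znode contents
        self._last: Optional[Params] = None

    def poll(self) -> Dict[str, Tuple[Optional[int], int]]:
        """Return {name: (old, new)} for each parameter changed since the last scan."""
        current = dict(self._read())      # copy, so later mutation cannot hide a change
        previous, self._last = self._last, current
        if previous is None:
            return {}                     # the first scan only records a baseline
        return {
            key: (previous.get(key), value)
            for key, value in current.items()
            if previous.get(key) != value
        }
```

A driver-side loop would call `poll()` on a timer and, when the result is non-empty, decide how many additional executors to request. ZooKeeper also offers watches as an alternative to polling, but the disclosure describes periodic scanning, so the sketch follows that.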
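With the big data component parameter adjustment method provided by the embodiments of the present disclosure, when the target component is a Spark component, the first process applies to the resource manager for additional resources and, after receiving the additional resources allocated by the resource manager, starts the target execution process and assigns the unfinished tasks to it. This enables resource scheduling and improves resource utilization.
Based on any of the above embodiments, in this embodiment, the step in which the first process, after receiving the additional resources allocated by the resource manager, starts the target execution process and assigns the unfinished tasks to the target execution process includes:
requesting, by the first process, withdrawal of the unfinished tasks from a plurality of execution processes that have been assigned tasks and have not completed them; and
reassigning the withdrawn unfinished tasks to the target execution process.
In this embodiment, after receiving the newly allocated additional resources, the first process starts a new execution process and also asks the old execution processes, which have already been assigned multiple tasks, to hand back the tasks that have not yet started; it then reassigns the withdrawn tasks to the target execution process for execution. It should be noted that, in this embodiment, after obtaining the additional resources allocated by Yarn, the Driver process starts a new Executor process and then asks the old Executors that have been assigned tasks to withdraw the tasks that have not yet run. The old Executor processes return the unstarted tasks to the Driver process, and the Driver process reassigns these tasks to the new Executor process for execution.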
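The withdraw-and-reassign step can be modelled with a few in-memory queues. This is a toy model, not Spark's scheduler: the `Executor` class, its `running` and `pending` fields, and the `rebalance` function are hypothetical names introduced only to show that tasks already running stay where they are while tasks that have not started move to the new executor.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Executor:
    name: str
    running: Optional[str] = None                        # task currently executing
    pending: Deque[str] = field(default_factory=deque)   # assigned but not yet started

    def withdraw_pending(self) -> List[str]:
        """Hand back every task that has not started; the running task is kept."""
        tasks = list(self.pending)
        self.pending.clear()
        return tasks


def rebalance(old_executors: List[Executor], target: Executor) -> int:
    """Move all unstarted tasks from the old executors to the target executor."""
    moved = 0
    for executor in old_executors:
        for task in executor.withdraw_pending():
            target.pending.append(task)
            moved += 1
    return moved
```

For example, two old executors holding `t2`, `t3`, and `t5` in their pending queues would hand all three to the new executor while continuing to run `t1` and `t4`.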
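With the big data component parameter adjustment method provided by the present disclosure, the first process asks the execution processes that have been assigned tasks and have not completed them to hand back their unfinished tasks, and then reassigns the withdrawn tasks to the target execution process. This enables the scheduling and allocation of resources and increases the running speed of the big data component.
Based on any of the above embodiments, in this embodiment, the step in which the first process, after receiving the additional resources allocated by the resource manager, starts the target execution process and assigns the unfinished tasks to the target execution process further includes:
when the tasks in a first execution process have finished executing, reporting a completion message to the first process, whereupon the first process applies to the resource manager to deregister the first execution process, restarts the first execution process according to the target parameters, and has it execute new tasks;
wherein the first execution process is any one of the plurality of execution processes.
In this embodiment, when the tasks in the first execution process have finished executing, it reports completion to the first process. Based on the message received, the first process applies to the resource manager Yarn to deregister the first execution process, restarts it according to the target parameters, and has it execute new tasks. The first execution process is an old Executor process that had already been assigned tasks.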
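The report, deregister, and restart sequence can be sketched against an in-memory stand-in for the resource manager. Everything here is hypothetical scaffolding for illustration: `FakeResourceManager` is not a YARN client, and `ExecutorSpec` and `on_executor_finished` are names invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ExecutorSpec:
    memory_mb: int
    cores: int


class FakeResourceManager:
    """In-memory stand-in for the resource manager (not a YARN client)."""

    def __init__(self) -> None:
        self.registered: Dict[str, ExecutorSpec] = {}
        self.log: List[str] = []

    def register(self, executor_id: str, spec: ExecutorSpec) -> None:
        self.registered[executor_id] = spec
        self.log.append(f"register {executor_id}")

    def deregister(self, executor_id: str) -> None:
        del self.registered[executor_id]
        self.log.append(f"deregister {executor_id}")


def on_executor_finished(rm: FakeResourceManager, executor_id: str,
                         new_spec: ExecutorSpec) -> None:
    """Driver-side handler for a completion report from an old executor."""
    rm.deregister(executor_id)          # release the old allocation first
    rm.register(executor_id, new_spec)  # restart with the adjusted parameters
```

The ordering matters in this reading: the old executor is only recycled after it reports that its running tasks are done, so no in-flight work is lost when its resources are resized.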
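With the big data component parameter adjustment method provided by the present disclosure, when the tasks in the first execution process have finished executing, a completion message is reported to the first process, and the first process applies to the resource manager to deregister the first execution process, restarts it according to the target parameters, and has it execute new tasks. This allows resources to be reused, maintains the running speed of the big data component, and saves resources.
Based on any of the above embodiments, in this embodiment, monitoring the running status of the target component in real time includes:
modifying the way the target component obtains its parameters so that they are obtained from a node in the monitoring agent software; and
determining the running status of the target component according to the obtained parameters.
In this embodiment, the way the target component obtains its parameters is modified so that they are obtained from a node in the monitoring agent software, and the running status of the target component is then determined from the obtained parameters. The parameter source is changed by modifying the source code of the big data component. For example, when the target component is a Spark component, the Spark source code is modified so that the logic that reads parameters from the configuration file instead reads them from the designated znode in the monitoring agent software ZooKeeper. The latest parameter values are then written dynamically to the designated znode in ZooKeeper according to the running status of the cluster.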
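The change of parameter source can be illustrated with a small lookup class. This is a sketch under assumptions: plain mappings stand in for the znode contents and for the static configuration file, and falling back to the file when the znode has no entry is an assumption of the example rather than something the disclosure states. The keys used are standard Spark configuration properties; the class name `ParamSource` is hypothetical.

```python
from typing import Mapping, Optional


class ParamSource:
    """Looks a parameter up on the coordination node first, then in the file config."""

    def __init__(self, znode_data: Mapping[str, str],
                 file_conf: Mapping[str, str]) -> None:
        self._znode = znode_data  # stand-in for values stored on the designated znode
        self._file = file_conf    # values from the static configuration file

    def get(self, key: str) -> Optional[str]:
        if key in self._znode:
            return self._znode[key]   # dynamically adjusted value wins
        return self._file.get(key)    # assumed fallback: the original static value


# Example: the monitoring side has raised the executor count from 4 to 6.
znode = {"spark.executor.instances": "6"}
conf = {"spark.executor.instances": "4", "spark.executor.memory": "2g"}
source = ParamSource(znode, conf)
```

Because the lookup consults the live mapping on every call, a value written to the znode later (for example a new `spark.executor.memory`) is picked up without restarting the reader, which is the behaviour the embodiment relies on.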
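With the big data component parameter adjustment method provided by the present disclosure, obtaining the relevant parameters from the designated node in the monitoring agent software makes it possible to determine the running status of the target component accurately and provides data support for adjusting the target parameters.
Based on any of the above embodiments, in this embodiment, as shown in FIG. 2, distributed monitoring can be achieved. When the target component is a Spark component, the Spark source code must be modified so that the logic that reads target parameters from the configuration file instead reads them from the designated znode in the monitoring agent software ZooKeeper. The monitoring software must write the latest target parameter values dynamically to the designated znode in ZooKeeper according to the running status of the big data components in the cluster.
In this embodiment, the running status of the Spark component is monitored in real time, mainly the running status of the Application, Stage, and Taskset components. Based on the running status of each of them, the target parameters are adjusted dynamically and written to the designated znode in ZooKeeper.
It should be noted that the Spark component must also be submitted to the resource manager Yarn to run. If the Spark component is found to be running very slowly and its running duration exceeds the timeout threshold configured in advance in the monitoring software, the monitoring software determines the number of Executor processes, the memory of each Executor process, and the number of CPU cores of each Executor, and then adjusts the three target parameters num-executors, executor-memory, and driver-memory according to a certain algorithm and the preconfigured target threshold intervals.
It should be noted that, in this embodiment, the adjusted target parameters must also be saved to the designated znode in ZooKeeper. The Driver process periodically scans the three target parameters; when it determines that they have changed, it applies again to the resource manager Yarn for additional resources for the Spark component. After obtaining the additional resources allocated by Yarn, the Driver process starts a new Executor process and then asks the old execution processes that have been assigned tasks to withdraw the tasks that have not yet run.
It should be noted that the old Executor processes return the unstarted tasks to the Driver process, and the Driver then reassigns them to the new target Executor process, which executes them.
It should be noted that, after an old Executor process finishes the tasks it is running, it reports to the Driver process. The Driver process then applies to the resource manager Yarn to deregister the old Executor process, restarts it according to the adjusted target parameters, and assigns it the corresponding tasks.
It should be noted that the monitoring software can also detect other running anomalies of the Spark component. For example, when Shuffle runs slowly, the monitoring software adjusts the Shuffle parallelism parameter. As another example, when data skew occurs, the monitoring software detects that a task is running abnormally slowly, with a running duration far longer than that of the other tasks in the taskSet. It then modifies the shuffle read task parameter, resubmits this taskSet on other idle Executor processes according to the new target parameters, takes the result of whichever Executor process finishes first as the final result, and directly terminates the Executor processes that have not finished.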
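The data-skew check described above, a task whose running duration is far longer than that of the other tasks in the same taskSet, can be sketched as a straggler detector. The disclosure does not say how "far longer" is measured; comparing each task against a multiple of the median of the other tasks, and the default factor of 3.0, are assumptions chosen for this example.

```python
from statistics import median
from typing import Dict, List


def find_stragglers(runtimes_s: Dict[str, float], factor: float = 3.0) -> List[str]:
    """Return tasks whose runtime exceeds `factor` times the median of the other tasks."""
    stragglers = []
    for task, runtime in runtimes_s.items():
        others = [r for t, r in runtimes_s.items() if t != task]
        # A single task has nothing to be compared against, so it is never flagged.
        if others and runtime > factor * median(others):
            stragglers.append(task)
    return sorted(stragglers)
```

Tasks flagged this way would be candidates for resubmission on idle executors, with the first result to arrive being kept and the slower attempt terminated, as the paragraph above describes.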
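With the big data component parameter adjustment method provided by the embodiments of the present disclosure, dynamic parameter adjustment logic and algorithms are developed for each big data component so as to adapt to a wide variety of big data components and remain compatible with most common ones. A target threshold interval must be configured in advance for each target parameter to prevent requesting too many or too few resources and to avoid wasting resources. This reduces the workload of operation and maintenance personnel, who can then spend more time on the business logic itself, and indirectly lowers the skill threshold for operation and maintenance personnel.
FIG. 3 shows a big data component parameter adjustment apparatus provided by the present disclosure. As shown in FIG. 3, the apparatus includes:
a monitoring module 301, configured to monitor the running status of a target component in real time, the target component being any one of the big data components;
an adjustment module 302, configured to adjust, according to the running status of the target component, a plurality of target parameters corresponding to the target component; and
an execution module 303, configured such that, according to the adjusted target parameters, a first process applies to a resource manager for additional resources and re-determines, according to the additional resources, a target execution process to complete the unfinished tasks.
With the big data component parameter adjustment apparatus provided by the present disclosure, the running status of a target component, which is any one of the big data components, is monitored in real time; a plurality of target parameters corresponding to the target component are adjusted according to that running status; and, according to the adjusted target parameters, a first process applies to a resource manager for additional resources and re-determines, according to the additional resources, a target execution process to complete the unfinished tasks. The big data component parameter adjustment method provided by the present disclosure can adaptively adjust parameters according to the production environment, achieves dynamic adjustment of component parameters, reduces labor costs, and improves user experience.
Because the apparatus described in the embodiments of the present disclosure works on the same principle as the method described in the above embodiments, a more detailed explanation is not repeated here.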
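FIG. 4 is a schematic diagram of the physical structure of an electronic device provided in an embodiment of the present disclosure. As shown in FIG. 4, the present disclosure provides an electronic device, including: a processor 401, a memory 402, and a bus 403;
wherein the processor 401 and the memory 402 communicate with each other through the bus 403;
the processor 401 is configured to call the program instructions in the memory 402 to execute the methods provided in the above method embodiments, for example including: monitoring the running status of the target component in real time, the target component being any one of the big data components; adjusting, according to the running status of the target component, a plurality of target parameters corresponding to the target component; and, according to the adjusted target parameters, a first process applying to the resource manager for additional resources and re-determining, according to the additional resources, a target execution process to complete the unfinished tasks.
In addition, the above-mentioned logic instructions in the memory 402 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.
In another aspect, the present disclosure also provides a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions. When the program instructions are executed by a computer, the computer can execute the methods provided above, the method including: monitoring the running status of the target component in real time, the target component being any one of the big data components; adjusting, according to the running status of the target component, a plurality of target parameters corresponding to the target component; and, according to the adjusted target parameters, a first process applying to the resource manager for additional resources and re-determining, according to the additional resources, a target execution process to complete the unfinished tasks.
In yet another aspect, the present disclosure also provides a non-transitory computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the methods provided above are implemented, the method including: monitoring the running status of the target application in real time; monitoring the running status of the target component in real time, the target component being any one of the big data components; adjusting, according to the running status of the target component, a plurality of target parameters corresponding to the target component; and, according to the adjusted target parameters, a first process applying to the resource manager for additional resources and re-determining, according to the additional resources, a target execution process to complete the unfinished tasks.
The apparatus embodiments described above are only illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.
From the description of the above implementations, a person skilled in the art can clearly understand that each implementation can be realized by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solution, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments or in some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure.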

Claims (16)

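  1. A big data component parameter adjustment method, comprising:
    monitoring the running status of a target component in real time, the target component being any one of the big data components;
    adjusting, according to the running status of the target component, a plurality of target parameters corresponding to the target component; and
    according to the adjusted target parameters, applying, by a first process, to a resource manager for additional resources, and re-determining, according to the additional resources, a target execution process to complete the unfinished tasks.
  2. The big data component parameter adjustment method according to claim 1, wherein adjusting the plurality of target parameters corresponding to the target component according to the running status of the target component comprises:
    when the target component is a Spark component and the Spark component is running abnormally, determining the plurality of target parameters according to the running duration and a preset timeout threshold; and
    adjusting the plurality of target parameters according to the plurality of target parameters and a plurality of target threshold intervals.
  3. The big data component parameter adjustment method according to claim 2, wherein adjusting the plurality of target parameters according to the plurality of target parameters and the plurality of target threshold intervals comprises:
    when a first target parameter falls within a first target threshold interval, adjusting the first target parameter; and
    when the first target parameter does not fall within the first target threshold interval, adjusting the first target parameter to the boundary value of the first target threshold interval that is closest to the first target parameter;
    wherein the first target parameter is any one of the plurality of target parameters, and the first target threshold interval is any one of the plurality of target threshold intervals.
  4. The big data component parameter adjustment method according to claim 1, wherein the step in which, according to the adjusted target parameters, the first process applies to the resource manager for additional resources and re-determines, according to the additional resources, the target execution process to complete the unfinished tasks comprises:
    when the target component is a Spark component, applying, by the first process, to the resource manager for additional resources; and
    after receiving the additional resources allocated by the resource manager, starting, by the first process, the target execution process and assigning the unfinished tasks to the target execution process.
  5. The big data component parameter adjustment method according to claim 4, wherein the step in which the first process, after receiving the additional resources allocated by the resource manager, starts the target execution process and assigns the unfinished tasks to the target execution process comprises:
    requesting, by the first process, withdrawal of the unfinished tasks from a plurality of execution processes that have been assigned tasks and have not completed them; and
    reassigning the withdrawn unfinished tasks to the target execution process.
  6. The big data component parameter adjustment method according to claim 5, wherein the step in which the first process, after receiving the additional resources allocated by the resource manager, starts the target execution process and assigns the unfinished tasks to the target execution process further comprises:
    when the tasks in a first execution process have finished executing, reporting a completion message to the first process, whereupon the first process applies to the resource manager to deregister the first execution process, restarts the first execution process according to the target parameters, and has it execute new tasks;
    wherein the first execution process is any one of the plurality of execution processes.
  7. The big data component parameter adjustment method according to claim 1, wherein monitoring the running status of the target component in real time comprises:
    modifying the way the target component obtains its parameters so that they are obtained from a node in the monitoring agent software; and
    determining the running status of the target component according to the obtained parameters.
  8. A big data component parameter adjustment apparatus, comprising:
    a monitoring module, configured to monitor the running status of a target component in real time, the target component being any one of the big data components;
    an adjustment module, configured to adjust, according to the running status of the target component, a plurality of target parameters corresponding to the target component; and
    an execution module, configured such that, according to the adjusted target parameters, a first process applies to a resource manager for additional resources and re-determines, according to the additional resources, a target execution process to complete the unfinished tasks.
  9. The big data component parameter adjustment apparatus according to claim 8, wherein the adjustment module comprises a determination unit and an adjustment unit;
    the determination unit is configured to, when the target component is a Spark component and the Spark component is running abnormally, determine the plurality of target parameters according to the running duration and a preset timeout threshold; and
    the adjustment unit is configured to adjust the plurality of target parameters according to the plurality of target parameters and a plurality of target threshold intervals.
  10. The big data component parameter adjustment apparatus according to claim 9, wherein the adjustment unit is specifically configured to:
    when a first target parameter falls within a first target threshold interval, adjust the first target parameter; and
    when the first target parameter does not fall within the first target threshold interval, adjust the first target parameter to the boundary value of the first target threshold interval that is closest to the first target parameter;
    wherein the first target parameter is any one of the plurality of target parameters, and the first target threshold interval is any one of the plurality of target threshold intervals.
  11. The big data component parameter adjustment apparatus according to claim 8, wherein the execution module comprises an application unit and an allocation unit;
    the application unit is configured such that, when the target component is a Spark component, the first process applies to the resource manager for additional resources; and
    the allocation unit is configured such that the first process, after receiving the additional resources allocated by the resource manager, starts the target execution process and assigns the unfinished tasks to the target execution process.
  12. The big data component parameter adjustment apparatus according to claim 11, wherein the allocation unit is specifically configured such that:
    the first process requests withdrawal of the unfinished tasks from a plurality of execution processes that have been assigned tasks and have not completed them; and
    the withdrawn unfinished tasks are reassigned to the target execution process.
  13. The big data component parameter adjustment apparatus according to claim 12, wherein the allocation unit is further configured such that:
    when the tasks in a first execution process have finished executing, a completion message is reported to the first process, whereupon the first process applies to the resource manager to deregister the first execution process, restarts the first execution process according to the target parameters, and has it execute new tasks;
    wherein the first execution process is any one of the plurality of execution processes.
  14. The big data component parameter adjustment apparatus according to claim 8, wherein the monitoring module is specifically configured to:
    modify the way the target component obtains its parameters so that they are obtained from a node in the monitoring agent software; and
    determine the running status of the target component according to the obtained parameters.
  15. An electronic device, comprising a processor, a memory, and a bus, wherein
    the processor and the memory communicate with each other through the bus; and
    the memory stores program instructions executable by the processor, and the processor, by calling the program instructions, is able to perform the steps of the big data component parameter adjustment method according to any one of claims 1 to 7.
  16. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions cause a computer to perform the steps of the big data component parameter adjustment method according to any one of claims 1 to 7.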
PCT/CN2022/107123 2021-12-21 2022-07-21 Big data component parameter adjustment method and apparatus, electronic device, and storage medium WO2023115931A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111572659.0A CN114371975A (zh) 2021-12-21 2021-12-21 Big data component parameter adjustment method and apparatus, electronic device, and storage medium
CN202111572659.0 2021-12-21

Publications (1)

Publication Number Publication Date
WO2023115931A1 true WO2023115931A1 (zh) 2023-06-29

Family

ID=81140660

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/107123 WO2023115931A1 (zh) 2021-12-21 2022-07-21 Big data component parameter adjustment method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN114371975A (zh)
WO (1) WO2023115931A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
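CN116610537A (zh) * 2023-07-20 2023-08-18 中债金融估值中心有限公司 Data volume monitoring method, system, device, and storage medium
CN117407177A (zh) * 2023-12-13 2024-01-16 苏州元脑智能科技有限公司 Task execution method and apparatus, electronic device, and readable storage medium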

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114371975A (zh) * 2021-12-21 2022-04-19 浪潮通信信息系统有限公司 Big data component parameter adjustment method and apparatus, electronic device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
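US7757214B1 (en) * 2005-11-10 2010-07-13 Symantec Operating Corporation Automated concurrency configuration of multi-threaded programs
CN108845884A (zh) * 2018-06-15 2018-11-20 中国平安人寿保险股份有限公司 Physical resource allocation method and apparatus, computer device, and storage medium
CN110389842A (zh) * 2019-07-26 2019-10-29 中国工商银行股份有限公司 Dynamic resource allocation method and apparatus, storage medium, and device
CN114371975A (zh) * 2021-12-21 2022-04-19 浪潮通信信息系统有限公司 Big data component parameter adjustment method and apparatus, electronic device, and storage medium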

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
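US7757214B1 (en) * 2005-11-10 2010-07-13 Symantec Operating Corporation Automated concurrency configuration of multi-threaded programs
CN108845884A (zh) * 2018-06-15 2018-11-20 中国平安人寿保险股份有限公司 Physical resource allocation method and apparatus, computer device, and storage medium
CN110389842A (zh) * 2019-07-26 2019-10-29 中国工商银行股份有限公司 Dynamic resource allocation method and apparatus, storage medium, and device
CN114371975A (zh) * 2021-12-21 2022-04-19 浪潮通信信息系统有限公司 Big data component parameter adjustment method and apparatus, electronic device, and storage medium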

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
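CN116610537A (zh) * 2023-07-20 2023-08-18 中债金融估值中心有限公司 Data volume monitoring method, system, device, and storage medium
CN116610537B (zh) * 2023-07-20 2023-11-17 中债金融估值中心有限公司 Data volume monitoring method, system, device, and storage medium
CN117407177A (zh) * 2023-12-13 2024-01-16 苏州元脑智能科技有限公司 Task execution method and apparatus, electronic device, and readable storage medium
CN117407177B (zh) * 2023-12-13 2024-03-08 苏州元脑智能科技有限公司 Task execution method and apparatus, electronic device, and readable storage medium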

Also Published As

Publication number Publication date
CN114371975A (zh) 2022-04-19

Similar Documents

Publication Publication Date Title
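WO2023115931A1 (zh) Big data component parameter adjustment method and apparatus, electronic device, and storage medium
WO2020147330A1 (zh) Data stream processing method and system
US20170031622A1 (en) Methods for allocating storage cluster hardware resources and devices thereof
EP2614436B1 (en) Controlled automatic healing of data-center services
US9104498B2 (en) Maximizing server utilization within a datacenter
CN108132837B (zh) Distributed cluster scheduling system and method
US20090265707A1 (en) Optimizing application performance on virtual machines automatically with end-user preferences
US20140130048A1 (en) Dynamic scaling of management infrastructure in virtual environments
CN111694633A (zh) Cluster node load balancing method and apparatus, and computer storage medium
CN105760234A (zh) Thread pool management method and apparatus
US8239872B2 (en) Method and system for controlling distribution of work items to threads in a server
US10733024B2 (en) Task packing scheduling process for long running applications
CN111666141A (zh) Task scheduling method, apparatus, and device, and computer storage medium
CN113467908B (zh) Task execution method and apparatus, computer-readable storage medium, and terminal device
CN114116173A (zh) Method, apparatus, and system for dynamically adjusting task allocation
US11429435B1 (en) Distributed execution budget management system
CN112817758A (zh) Dynamic resource consumption control method, system, storage medium, and device
CN111158896A (zh) Distributed process scheduling method and system
CN116257333A (zh) Distributed task scheduling method and apparatus, and distributed task scheduling system
CN114281479A (zh) Container management method and apparatus
US11366692B2 (en) Task execution based on whether task completion time exceeds execution window of device to which task has been assigned
CN112350837B (zh) Cloud-platform-based power application cluster management method and apparatus
CN112860391A (zh) Dynamic cluster rendering resource management system and method
CN112433838A (zh) Batch scheduling method, apparatus, and device, and computer storage medium
CN114546631A (zh) Task scheduling method, control method, core, electronic device, and readable medium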

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22909261

Country of ref document: EP

Kind code of ref document: A1