CN110046046A - Distributed hyperparameter optimization system and method based on Mesos - Google Patents

Distributed hyperparameter optimization system and method based on Mesos

Info

Publication number
CN110046046A
Authority
CN
China
Prior art keywords
mesos
hyperparameter optimization
distributed
application
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910278557.4A
Other languages
Chinese (zh)
Inventor
陆忠华
李铄
孙永泽
代闯闯
邓笋根
牛北方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Network Information Center of CAS
Original Assignee
Computer Network Information Center of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-04-09
Filing date
2019-04-09
Publication date
2019-07-23
Application filed by Computer Network Information Center of CAS
Priority to CN201910278557.4A
Publication of CN110046046A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027: Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a distributed hyperparameter optimization system and method based on Mesos. The system comprises a computation layer and a scheduling layer. The computation layer is combined with distributed hyperparameter optimization algorithms; it is composed of various optimization algorithms and is used to perform the sampling for distributed hyperparameter optimization and to generate computing tasks. The scheduling layer is a concrete implementation of a Mesos job framework and is mainly responsible for allocating resources and executing computing tasks. The proposed system and method support multi-tenant use in mixed-deployment scenarios on high-performance computing clusters, improve the efficiency of hyperparameter optimization, have a wider scope of application, and are minimally intrusive to an existing Mesos cluster system.

Description

Distributed hyperparameter optimization system and method based on Mesos
Technical field
The present invention relates to the technical field of schedulers, and in particular to a distributed hyperparameter optimization system and method based on Mesos.
Background
A scheduler is a system that provides the ability to trigger tasks on a timer, to manage resources, and to maintain the dependencies and execution order of tasks; some scheduling systems also integrate tools for task monitoring and for measuring various metrics. Cluster resource scheduling refers to the system that manages and schedules resources in a distributed system, and it mainly provides resource management. Parameter optimization is a method for reaching a design objective: the design objective is parameterized, and an optimization method repeatedly adjusts the design variables so that the design result keeps approaching the parameterized target value. The concept of hyperparameter optimization is commonly used with today's artificial intelligence learning models. Such models usually have some parameters that must be specified manually; these parameters are the hyperparameters. Optimizing the hyperparameters means that, once the model and the combination of parameters are fixed, an optimization range is set for each parameter and the parameters are then optimized to reach a satisfactory training result.
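As an illustration of setting an optimization range for each parameter and then optimizing within it, the following minimal sketch uses plain random search. It is not taken from the patent: the parameter names, the ranges, and `toy_objective` (a stand-in for a real training run) are all hypothetical.

```python
import random

# Hypothetical search space: one optimization range per hyperparameter.
SEARCH_SPACE = {
    "learning_rate": (1e-4, 1e-1),  # continuous range
    "batch_size": (16, 256),        # integer range
}


def sample_config(space, rng):
    """Draw one candidate configuration uniformly from the given ranges."""
    config = {}
    for name, (low, high) in space.items():
        if isinstance(low, int) and isinstance(high, int):
            config[name] = rng.randint(low, high)
        else:
            config[name] = rng.uniform(low, high)
    return config


def toy_objective(config):
    """Stand-in for one training run; lower is better (illustrative only)."""
    lr_term = (config["learning_rate"] - 0.01) ** 2
    batch_term = abs(config["batch_size"] - 64) / 1000
    return lr_term + batch_term


def random_search(space, objective, trials, seed=0):
    """Evaluate `trials` sampled configurations one by one and keep the best."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(trials):
        config = sample_config(space, rng)
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

Each call to `objective` here runs one after another; when a single evaluation is an expensive training run, this loop is the serial bottleneck that a distributed design aims to remove.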
Current implementations of hyperparameter optimization have problems in two respects. On the one hand, the sampling and execution steps of hyperparameter optimization are implemented serially. In hyperparameter optimization for the machine learning and deep learning algorithms of today's artificial intelligence field, a single evaluation of one parameter setting costs a large amount of resources and time, so the drawback of serial hyperparameter optimization becomes apparent. On the other hand, the hyperparameter optimization process lacks a resource scheduling mechanism: current implementations do not support submitting hyperparameter optimization jobs in a multi-tenant, platform-style manner. This is at odds with the now widespread use of large-scale cluster resources, and it means that users may have to carry out hyperparameter optimization with very limited resources and cannot make good use of the cluster. One further point must be mentioned: existing solutions have mainly been developed by cloud service providers such as Google on top of their own system architectures, so their scope of application is very narrow.
Mesos is a cluster resource scheduling system. In view of the above problems, the present invention focuses on designing and implementing a distributed hyperparameter optimization system and method based on Mesos, so that hyperparameter optimization jobs can run on large distributed clusters. This addresses the large computing resource demand of hyperparameter optimization and the resource contention that arises in multi-tenant scenarios.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings of the prior art by proposing a distributed hyperparameter optimization system and method based on Mesos.
To achieve this purpose, the present invention adopts the following technical solution:
A distributed hyperparameter optimization system based on Mesos comprises a computation layer and a scheduling layer. The computation layer is combined with distributed hyperparameter optimization algorithms; it is composed of various optimization algorithms and is used to perform the sampling for distributed hyperparameter optimization and to generate computing tasks. The scheduling layer is a concrete implementation of a Mesos job framework and is mainly responsible for allocating resources and executing computing tasks;
The computation layer is formed by a master application and worker applications, which use a client/server model. The master application is responsible for running the hyperparameter optimization algorithm, and the worker applications are responsible for running the artificial intelligence training program. The scheduling layer consists of a client, a scheduler, and an executor: the client is used by the user, the scheduler is responsible for scheduling system resources, and the executor is responsible for executing the tasks of the computation layer. All parts communicate with one another over the network.
Preferably, exactly one master application exists while the computation layer is running, and one or more worker applications perform the computation according to user demand.
A distributed hyperparameter optimization method based on Mesos comprises the following steps:
(1) the user specifies the artificial intelligence program to be optimized, configures the hyperparameters to be optimized, and submits the task to the Mesos cluster through the system;
(2) the scheduling algorithm integrated in the scheduling layer allocates cluster resources to the task according to the resource requirements of the task submitted by the user and the running state of the task, and starts the components of the computation layer;
(3) the computation layer first starts the master application, which performs hyperparameter optimization according to the user's configuration and requests new resources from the scheduling layer to run worker applications; each worker application performs the computation with the parameters provided by the master application and feeds the result back to the master application, which judges whether the user's stopping requirement has been met;
(4) finally, when the master application reaches the user's stopping requirement, the hyperparameter optimization result is returned to the user and the job ends.
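Steps (3) and (4) can be sketched as a loop in which one main (master) application samples parameters, hands them to several slave (worker) applications in parallel, and checks a stopping requirement after each round. This is a minimal single-machine sketch, not the patented implementation: a thread pool stands in for worker applications launched on cluster resources, and `worker_train`, the `lr` range, and the stopping rule are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def worker_train(params):
    """Stand-in for a worker application: 'trains' and returns a loss."""
    return (params["lr"] - 0.01) ** 2


def master_optimize(num_workers, max_rounds, target_loss, seed=0):
    """Master application: sample, dispatch to workers, check the stop rule."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    rounds_run = 0
    # The pool plays the role of worker applications started on new resources.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for _ in range(max_rounds):
            rounds_run += 1
            # Sampling step: one parameter set per worker.
            batch = [{"lr": rng.uniform(1e-4, 1e-1)} for _ in range(num_workers)]
            # Workers evaluate in parallel and feed results back to the master.
            losses = list(pool.map(worker_train, batch))
            for params, loss in zip(batch, losses):
                if loss < best_loss:
                    best_params, best_loss = params, loss
            # Stopping requirement supplied by the user (assumed: target loss).
            if best_loss <= target_loss:
                break
    return best_params, best_loss, rounds_run
```

A call such as `master_optimize(num_workers=4, max_rounds=20, target_loss=1e-5)` returns the best parameters found, their loss, and the number of rounds actually run. In a real deployment the evaluations would run as separate tasks on cluster nodes rather than as threads.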
Compared with the prior art, the beneficial effects of the present invention are as follows:
The distributed hyperparameter optimization system and method based on Mesos proposed by the present invention support multi-tenant use in mixed-deployment scenarios on high-performance computing clusters, improve the efficiency of hyperparameter optimization, have a wider scope of application, and are minimally intrusive to an existing Mesos cluster system.
Brief description of the drawings
Fig. 1 is a schematic diagram of the computation layer architecture of the distributed hyperparameter optimization system and method based on Mesos proposed by the present invention.
Fig. 2 is a schematic diagram of the scheduling layer architecture of the distributed hyperparameter optimization system and method based on Mesos proposed by the present invention.
Fig. 3 is a flow diagram of the job execution method of the distributed hyperparameter optimization system and method based on Mesos proposed by the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them.
In the description of the present invention, it should be understood that orientation or position terms such as "upper", "lower", "front", "rear", "left", "right", "top", "bottom", "inner", and "outer" are based on the orientations or positions shown in the drawings. They are used only to describe the invention conveniently and to simplify the description; they do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and they therefore should not be understood as limiting the invention.
Referring to Figs. 1-3, a distributed hyperparameter optimization system based on Mesos comprises a computation layer and a scheduling layer. The computation layer is combined with distributed hyperparameter optimization algorithms; it is composed of various optimization algorithms and is used to perform the sampling for distributed hyperparameter optimization and to generate computing tasks. The scheduling layer is a concrete implementation of a Mesos job framework and is mainly responsible for allocating resources and executing computing tasks;
The computation layer is formed by a master application and worker applications, which use a client/server model. The master application is responsible for running the hyperparameter optimization algorithm, and the worker applications are responsible for running the artificial intelligence training program. The scheduling layer consists of a client, a scheduler, and an executor: the client is used by the user, the scheduler is responsible for scheduling system resources, and the executor is responsible for executing the tasks of the computation layer. All parts communicate with one another over the network.
Exactly one master application exists while the computation layer is running, and one or more worker applications perform the computation according to user demand.
A distributed hyperparameter optimization method based on Mesos comprises the following steps:
(1) the user specifies the artificial intelligence program to be optimized, configures the hyperparameters to be optimized, and submits the task to the Mesos cluster through the system;
(2) the scheduling algorithm integrated in the scheduling layer allocates cluster resources to the task according to the resource requirements of the task submitted by the user and the running state of the task, and starts the components of the computation layer;
(3) the computation layer first starts the master application, which performs hyperparameter optimization according to the user's configuration and requests new resources from the scheduling layer to run worker applications; each worker application performs the computation with the parameters provided by the master application and feeds the result back to the master application, which judges whether the user's stopping requirement has been met;
(4) finally, when the master application reaches the user's stopping requirement, the hyperparameter optimization result is returned to the user and the job ends.
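The resource allocation in step (2) is not specified in detail in the text. As one possible reading, the sketch below matches the resource requirements of the main (master) and slave (worker) tasks against the resources offered by cluster nodes using a simple first-fit rule. The `Offer` and `Task` types, the field names, and the first-fit policy are assumptions made for illustration; they are not the Mesos API and not necessarily the scheduling algorithm the patent integrates.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Hypothetical view of the resources one cluster node makes available."""
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    """Hypothetical computation-layer task with its resource requirement."""
    name: str
    cpus: float
    mem_mb: int


def assign_tasks(offers, tasks):
    """First-fit: place each task on the first node with enough free resources."""
    remaining = {o.agent: [o.cpus, o.mem_mb] for o in offers}
    placement, pending = {}, []
    for task in tasks:
        for agent, (cpus, mem) in remaining.items():
            if cpus >= task.cpus and mem >= task.mem_mb:
                # Reserve the resources on this node for the task.
                remaining[agent][0] -= task.cpus
                remaining[agent][1] -= task.mem_mb
                placement[task.name] = agent
                break
        else:
            # No node can host the task now; it waits for later resources.
            pending.append(task.name)
    return placement, pending
```

With two nodes offering (2 CPUs, 4096 MB) and (4 CPUs, 8192 MB), a master task needing (1 CPU, 1024 MB) and three worker tasks needing (2 CPUs, 4096 MB) each, the master lands on the first node, two workers land on the second, and the third worker stays pending until more resources become available.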
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any equivalent substitution or change made by a person skilled in the art within the technical scope disclosed by the present invention, according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims (3)

1. A distributed hyperparameter optimization system based on Mesos, comprising a computation layer and a scheduling layer, characterized in that the computation layer is combined with distributed hyperparameter optimization algorithms, the computation layer is composed of various optimization algorithms and is used to perform the sampling for distributed hyperparameter optimization and to generate computing tasks, and the scheduling layer is a concrete implementation of a Mesos job framework and is mainly responsible for allocating resources and executing computing tasks;
The computation layer is formed by a master application and worker applications, which use a client/server model; the master application is responsible for running the hyperparameter optimization algorithm, and the worker applications are responsible for running the artificial intelligence training program; the scheduling layer consists of a client, a scheduler, and an executor, wherein the client is used by the user, the scheduler is responsible for scheduling system resources, and the executor is responsible for executing the tasks of the computation layer; all parts communicate with one another over the network.
2. The distributed hyperparameter optimization system based on Mesos according to claim 1, characterized in that exactly one master application exists while the computation layer is running, and one or more worker applications perform the computation according to user demand.
3. A distributed hyperparameter optimization method based on Mesos, the method comprising the following steps:
(1) the user specifies the artificial intelligence program to be optimized, configures the hyperparameters to be optimized, and submits the task to the Mesos cluster through the system;
(2) the scheduling algorithm integrated in the scheduling layer allocates cluster resources to the task according to the resource requirements of the task submitted by the user and the running state of the task, and starts the components of the computation layer;
(3) the computation layer first starts the master application, which performs hyperparameter optimization according to the user's configuration and requests new resources from the scheduling layer to run worker applications; each worker application performs the computation with the parameters provided by the master application and feeds the result back to the master application, which judges whether the user's stopping requirement has been met;
(4) finally, when the master application reaches the user's stopping requirement, the hyperparameter optimization result is returned to the user and the job ends.
CN201910278557.4A 2019-04-09 2019-04-09 A kind of distributed hyperparameter optimization system and method based on Mesos Pending CN110046046A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910278557.4A CN110046046A (en) 2019-04-09 2019-04-09 A kind of distributed hyperparameter optimization system and method based on Mesos

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910278557.4A CN110046046A (en) 2019-04-09 2019-04-09 A kind of distributed hyperparameter optimization system and method based on Mesos

Publications (1)

Publication Number Publication Date
CN110046046A true CN110046046A (en) 2019-07-23

Family

ID=67276384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910278557.4A Pending CN110046046A (en) 2019-04-09 2019-04-09 A kind of distributed hyperparameter optimization system and method based on Mesos

Country Status (1)

Country Link
CN (1) CN110046046A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113608722A (en) * 2021-07-31 2021-11-05 云南电网有限责任公司信息中心 Algorithm packaging method based on distributed technology
CN113712511A (en) * 2021-09-03 2021-11-30 湖北理工学院 Stable mode discrimination method for brain imaging fusion features
US12067420B2 (en) 2020-10-22 2024-08-20 Hewlett Packard Enterprise Development Lp Deep learning autotuning task optimization

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170330105A1 (en) * 2016-05-13 2017-11-16 Cognitive Scale, Inc. Ranking of Parse Options Using Machine Learning
CN107463356A (en) * 2017-08-17 2017-12-12 北京云纵信息技术有限公司 The execution method and apparatus of flow of task
CN108984257A (en) * 2018-07-06 2018-12-11 无锡雪浪数制科技有限公司 A kind of machine learning platform for supporting custom algorithm component

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
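李铄, 陆忠华, 孙永泽 (Li Shuo, Lu Zhonghua, Sun Yongze): "基于Mesos的分布式参数优化调度策略及系统设计" (Mesos-based distributed parameter optimization scheduling strategy and system design), 《科研信息化技术与应用》 *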

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12067420B2 (en) 2020-10-22 2024-08-20 Hewlett Packard Enterprise Development Lp Deep learning autotuning task optimization
CN113608722A (en) * 2021-07-31 2021-11-05 云南电网有限责任公司信息中心 Algorithm packaging method based on distributed technology
CN113712511A (en) * 2021-09-03 2021-11-30 湖北理工学院 Stable mode discrimination method for brain imaging fusion features
CN113712511B (en) * 2021-09-03 2023-05-30 湖北理工学院 Stable mode discrimination method for brain imaging fusion characteristics

Similar Documents

Publication Publication Date Title
CN110046046A (en) A kind of distributed hyperparameter optimization system and method based on Mesos
CN106776005A (en) A kind of resource management system and method towards containerization application
Chen et al. Optimal deadline scheduling with commitment
CN104636197B (en) A kind of evaluation method of data center's virtual machine (vm) migration scheduling strategy
CN107045455A (en) A kind of Docker Swarm cluster resource method for optimizing scheduling based on load estimation
CN107508901A (en) Distributed data processing method, apparatus, server and system
CN110018817A (en) The distributed operation method and device of data, storage medium and processor
CN107330516A (en) Model parameter training method, apparatus and system
CN103699446A (en) Quantum-behaved particle swarm optimization (QPSO) algorithm based multi-objective dynamic workflow scheduling method
CN109857534A (en) A kind of intelligent task scheduling strategy training method based on Policy-Gradient Reinforcement Learning
CN108985709A (en) Workflow management method towards more satellite data centers collaboration Remote Sensing Products production
CN110060765A (en) A kind of standardization cloud radiotherapy planning method, storage medium and system
CN103164190A (en) Rapid parallelization method of totally-distributed type watershed eco-hydrology model
CN107239337A (en) The distribution of virtual resources and dispatching method and system
CN101661520A (en) Synergetic design method for mechanical and electrical products
CN106570679A (en) Participant capability based excitation method for crowdsourcing task of group
CN104965762B (en) A kind of scheduling system towards hybrid task
CN101639788A (en) Multi-core parallel method for continuous system simulation based on TBB threading building blocks
CN110347489A (en) A kind of method for stream processing that the multicenter data collaborative based on Spark calculates
CN115085202A (en) Power grid multi-region intelligent power collaborative optimization method, device, equipment and medium
Balla et al. Reliability-aware: task scheduling in cloud computing using multi-agent reinforcement learning algorithm and neural fitted Q.
CN112488542A (en) Intelligent building site material scheduling method and system based on machine learning
CN105528250B (en) The evaluation and test of Multi-core computer system certainty and control method
CN112967148B (en) Block chain consensus mechanism for intelligent Internet of things computing service
CN105426247B (en) A kind of HLA federal members programming dispatching method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190723