CN109062700A - Resource management method and server based on a distributed system - Google Patents

Resource management method and server based on a distributed system

Info

Publication number
CN109062700A
CN109062700A (application CN201810953290.XA)
Authority
CN
China
Prior art keywords
server
resource
node server
deep learning
training data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810953290.XA
Other languages
Chinese (zh)
Inventor
赵仁明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd filed Critical Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201810953290.XA priority Critical patent/CN109062700A/en
Publication of CN109062700A publication Critical patent/CN109062700A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/542Event management; Broadcasting; Multicasting; Notifications
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/508Monitor

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a resource management method based on a distributed system. The distributed system includes a first server that executes a Spark task, a second server that runs the ResourceManager process, and node servers on which worker processes are deployed. The method is executed by the first server and includes: when the Spark task starts, estimating the required resources from the volume of the preprocessed training data and the computational cost of a preset deep-learning neural network model, and applying to the second server for those resources; after receiving from the second server the information of the node servers with sufficient resources, broadcasting the model file of the previously exported deep-learning neural network model to each of those node servers. The scheme can automatically shard the input data and thereby complete data-parallel model training efficiently.

Description

Resource management method and server based on a distributed system
Technical field
The present invention relates to communication technology, and in particular to a resource management method and server based on a distributed system.
Background technique
YARN (Yet Another Resource Negotiator) is a general-purpose resource management system that provides unified resource management and scheduling for upper-layer applications. Its introduction brings major benefits in cluster utilization, unified resource management, and related aspects. YARN can also monitor the state of every subtask. The ApplicationMaster of YARN is a replaceable component: users can write their own AppMaster for different models, which allows many kinds of models to run under the unified YARN framework.
The concept of deep learning originates from research on artificial neural networks; a multilayer perceptron with several hidden layers is one kind of deep learning structure. Deep learning combines low-level features to form more abstract high-level representations of attribute categories or features, in order to discover distributed feature representations of data. TensorFlow is an artificial intelligence system developed by Google on the basis of DistBelief (Google's earlier deep learning system); it can be applied to many machine learning and deep learning fields such as speech recognition and image recognition, and it runs on devices ranging from a single smartphone to thousands of data-center servers. Spark is a fast, general-purpose computing engine designed for large-scale data processing, a universal parallel framework analogous to MapReduce on Hadoop (a distributed system platform).
When applying deep learning to practical problems with existing technology, users must manage computing, storage, and other resources themselves, and must build the deep learning framework (for example TensorFlow) on their own. Data preprocessing, dataset splitting, feature engineering, model training, model validation and evaluation, and model deployment must all be carried out manually. When there are multiple tasks, the tasks are complex, and the input data volume is large, resources cannot be matched and scheduled for the tasks automatically, and the state of the tasks cannot be monitored effectively.
Summary of the invention
To solve the above technical problems, the present invention provides a resource management method and server based on a distributed system, which automatically shard the input data so that data-parallel model training can be completed efficiently.
To achieve the object of the invention, the present invention provides a resource management method based on a distributed system, wherein the distributed system includes a first server for executing a Spark task, a second server for executing the ResourceManager process, and node servers on which worker processes are deployed. The method is executed by the first server and includes:
when the Spark task starts, estimating the required resources from the preprocessed training data and the computational cost of a preset deep-learning neural network model, and applying to the second server for the resources;
after receiving from the second server the information of the node servers with sufficient resources, broadcasting the model file of the previously exported deep-learning neural network model to each node server with sufficient resources.
Further, the training data is stored in advance in the Hadoop distributed file system.
A server for executing a Spark task, including:
an application module, configured to, when the Spark task starts, estimate the required resources from the preprocessed training data and the computational cost of a preset deep-learning neural network model, and apply to the second server for the resources;
a broadcast module, configured to, after receiving the information of the node servers with sufficient resources returned by the second server, broadcast the model file of the previously exported deep-learning neural network model to each node server with sufficient resources.
Further, the training data is stored in advance in the Hadoop distributed file system.
A resource management method based on a distributed system, wherein the distributed system includes a first server for executing a Spark task, a second server for executing the ResourceManager process, and node servers on which worker processes are deployed. The method is executed by the second server and includes:
after receiving the resource application of the first server, sharding the training data stored in the Hadoop distributed file system, each shard of the training data corresponding to one worker process;
creating worker processes on the node servers whose resource utilization is below a threshold, and sending the information of those node servers to the first server.
A server for executing the ResourceManager process, including:
a sharding module, configured to, after receiving the resource application of the first server, shard the training data stored in the Hadoop distributed file system, each shard corresponding to one worker process;
a creation module, configured to create worker processes on the node servers whose resource utilization is below a threshold, and to send the information of those node servers to the first server.
A resource management method based on a distributed system, wherein the distributed system includes a first server for executing a Spark task, a second server for executing the ResourceManager process, and node servers on which worker processes are deployed. The method is executed by a node server and includes:
receiving the model file of the deep-learning neural network model broadcast by the first server;
reading the corresponding training data from the Hadoop distributed file system and training the model file.
Further, after training the model file, the method also includes:
exporting the trained deep-learning neural network model and the relevant parameters and storing them in the Hadoop distributed file system.
A node server on which a worker process is deployed, including:
a receiving module, configured to receive the model file of the deep-learning neural network model broadcast by the first server;
a training module, configured to read the corresponding training data from the Hadoop distributed file system and train the model file.
Further, the node server may also include:
an export module, configured to export the trained deep-learning neural network model and the relevant parameters and store them in the Hadoop distributed file system.
A resource management method based on a distributed system, including:
when the first server for executing a Spark task starts the Spark task, estimating the required resources from the preprocessed training data and the computational cost of a preset deep-learning neural network model, and applying for the resources to the second server for executing the ResourceManager process;
after the second server receives the resource application of the first server, sharding the training data stored in the Hadoop distributed file system, each shard corresponding to one worker process; creating worker processes on the node servers whose resource utilization is below a threshold, and sending the information of those node servers to the first server;
after the first server receives the information of the node servers with sufficient resources returned by the second server, broadcasting the model file of the previously exported deep-learning neural network model to each node server with sufficient resources;
the node server receiving the model file of the deep-learning neural network model broadcast by the first server, reading the corresponding training data from the Hadoop distributed file system, and training the model file.
A distributed system, including the above servers and node servers.
In summary, the resource management method, server, and node server based on a distributed system can build a DAG through a workflow engine and link together actions such as data preprocessing, training, model export, and model saving for deep learning, which makes it convenient to orchestrate multiple tasks. The evaluation, application, and allocation of resources are completed automatically. Through the framework's automated hyperparameter distribution and model deployment, the input data can be sharded automatically so that data-parallel model training is completed efficiently.
Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description or be understood by implementing the invention. The objects and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the description, the claims, and the accompanying drawings.
Detailed description of the invention
The accompanying drawings are provided for a further understanding of the technical solution of the present invention and constitute part of the specification. Together with the embodiments of the application, they serve to explain the technical solution of the present invention and do not limit it.
Fig. 1 is a flowchart of a resource management method based on a distributed system according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the server executing the Spark task according to an embodiment of the present invention;
Fig. 3 is a flowchart of the resource management method based on a distributed system on the ResourceManager process side according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the server executing the ResourceManager process according to an embodiment of the present invention;
Fig. 5 is a flowchart of the resource management method based on a distributed system on the node server side according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the node server according to an embodiment of the present invention;
Fig. 7 is a flowchart of an application example of the resource management method based on a distributed system.
Specific embodiment
To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted that, in the absence of conflict, the embodiments in the application and the features in the embodiments can be combined with one another arbitrarily.
The steps shown in the flowcharts of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions. Moreover, although a logical order is shown in the flowcharts, in some cases the steps shown or described can be executed in an order different from the one given here.
Fig. 1 is a flowchart of a resource management method based on a distributed system according to an embodiment of the present invention. The distributed system includes a first server for executing a Spark task, a second server for executing the ResourceManager process, and node servers on which worker processes are deployed. The method is executed by the first server and, as shown in Fig. 1, includes:
Step 11: when the Spark task starts, estimating the required resources from the preprocessed training data and the computational cost of a preset deep-learning neural network model, and applying to the second server for the resources;
Step 12: after receiving the information of the node servers with sufficient resources returned by the second server, broadcasting the model file of the previously exported deep-learning neural network model to each node server with sufficient resources.
The embodiment of the present invention proposes using the broadcast mechanism of Spark to distribute common elements such as data and model descriptions. Hadoop-related ETL (Extraction, Transformation, Loading) tools complete preprocessing work such as cleaning and converting the training data, and the data is loaded onto HDFS (Hadoop Distributed File System), so that Spark can read the data directly for model training.
Correspondingly, this embodiment provides a server 200. As shown in Fig. 2, the server 200 of this embodiment is for executing a Spark task and includes:
an application module 201, configured to, when the Spark task starts, estimate the required resources from the preprocessed training data and the computational cost of a preset deep-learning neural network model, and apply to the second server for the resources;
a broadcast module 202, configured to, after receiving the information of the node servers with sufficient resources returned by the second server, broadcast the model file of the previously exported deep-learning neural network model to each node server with sufficient resources.
In one embodiment, the broadcast module 202 broadcasts the model file of the previously exported deep-learning neural network model to each node server with sufficient resources through the context process of Spark.
Fig. 3 is a flowchart of the resource management method based on a distributed system on the ResourceManager process side according to an embodiment of the present invention. As shown in Fig. 3, the method of this embodiment includes:
Step 31: after receiving the resource application of the first server, sharding the training data stored in the Hadoop distributed file system according to the volume of the training data and the number of worker processes;
Step 32: creating worker processes on the node servers whose resource utilization is below a threshold, and sending the information of those node servers to the first server.
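The node-selection rule of step 32 can be sketched as a simple filter over per-node utilization metrics. This is a minimal illustration under assumptions, not the patented implementation: the node names, the CPU/memory fields, and the 0.8 threshold are all invented for the example.

```python
# Hypothetical sketch of step 32: pick the nodes whose resource
# utilization is below a threshold and create a worker on each.

def select_nodes(nodes, threshold=0.8):
    """Return the names of nodes whose CPU and memory utilization
    are both below the threshold (assumed selection rule)."""
    return [name for name, usage in nodes.items()
            if usage["cpu"] < threshold and usage["mem"] < threshold]

def create_workers(nodes, threshold=0.8):
    eligible = select_nodes(nodes, threshold)
    # In the real system a worker process would be started on each
    # eligible node; here we only record the assignment.
    return {name: f"worker-on-{name}" for name in eligible}

if __name__ == "__main__":
    cluster = {
        "node-a": {"cpu": 0.35, "mem": 0.50},
        "node-b": {"cpu": 0.95, "mem": 0.40},  # CPU saturated: skipped
        "node-c": {"cpu": 0.20, "mem": 0.85},  # memory saturated: skipped
    }
    print(create_workers(cluster))  # only node-a is eligible
```

A real ResourceManager would also weigh GPU availability and queued requests; the single-threshold filter here only mirrors the "utilization below a threshold" criterion the text describes.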
With the scheme of this embodiment, the evaluation, application, and allocation of resources can be completed automatically according to the type and input of the task, which makes it convenient to monitor and manage the task state and simplifies the submission and management of tasks.
Correspondingly, this embodiment also provides a server 400 for executing the ResourceManager process. As shown in Fig. 4, the server 400 of this embodiment may include:
a sharding module 401, configured to, after receiving the resource application of the first server, shard the training data stored in the Hadoop distributed file system according to the volume of the training data and the number of worker processes;
a creation module 402, configured to create worker processes on the node servers whose resource utilization is below a threshold, and to send the information of those node servers to the first server.
Fig. 5 is a flowchart of the resource management method based on a distributed system on the node server side according to an embodiment of the present invention. The node server in this embodiment is deployed with a worker process. As shown in Fig. 5, the method of this embodiment may include:
Step 51: receiving the model file of the deep-learning neural network model broadcast by the first server;
Step 52: reading the corresponding training data from the Hadoop distributed file system and training the model file.
The method of this embodiment automates the preparation of deep learning data and the construction of the training environment, and completes the application and allocation of resources according to the task type and the data, so that data-parallel training can be carried out conveniently, improving the degree of automation of job submission and the efficiency of deep learning model training.
Correspondingly, this embodiment provides a node server 600 on which a worker process is deployed. As shown in Fig. 6, the node server 600 of this embodiment may include:
a receiving module 601, configured to receive the model file of the deep-learning neural network model broadcast by the first server;
a training module 602, configured to read the corresponding training data from the Hadoop distributed file system and train the model file.
In one embodiment, the node server 600 may also include:
an export module 603, configured to export the trained deep-learning neural network model and the relevant parameters and store them in the Hadoop distributed file system.
The embodiment of the present invention covers the automated construction of the deep learning environment, the scheduling of tasks, and the monitoring and error handling of tasks. At the same time, through the combination of deep learning with Spark, each worker conveniently holds a complete model, and parallel data processing is realized by running different data on different workers.
A deep learning library will automatically create training algorithms for neural networks of various shapes and sizes. The actual process of building a neural network, however, is more complex than merely running an algorithm on a dataset. There are usually hyperparameters to set, such as the number of neurons per layer and the learning rate. Choosing the right parameters yields a well-performing model, while bad parameters lead to prolonged training and poor inference performance in production. In practice, machine learning practitioners rerun the same model many times with different hyperparameters in order to find the optimal set.
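The rerun-with-different-hyperparameters procedure described above is, in its simplest form, a grid search: train once per combination and keep the best. A toy sketch under assumptions — the quadratic "loss" stands in for a real training run, and all names and values are illustrative:

```python
import itertools

def train_and_evaluate(lr, neurons):
    """Stand-in for a full training run: returns a fake validation
    loss that happens to be minimized at lr=0.1, neurons=64."""
    return (lr - 0.1) ** 2 + ((neurons - 64) / 64) ** 2

def grid_search(learning_rates, layer_sizes):
    # Try every combination of hyperparameters, keep the best one.
    best, best_loss = None, float("inf")
    for lr, n in itertools.product(learning_rates, layer_sizes):
        loss = train_and_evaluate(lr, n)
        if loss < best_loss:
            best, best_loss = (lr, n), loss
    return best

if __name__ == "__main__":
    print(grid_search([0.01, 0.1, 1.0], [32, 64, 128]))  # (0.1, 64)
```

In the distributed setting the patent describes, each combination could be dispatched to a different worker, which is what makes automated hyperparameter distribution attractive.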
The following is the flow of a specific embodiment of the resource management method based on a distributed system. As shown in Fig. 7, it may include:
Step 101: storing the preprocessed training data in the Hadoop distributed file system.
In this embodiment, data processing tools such as Kettle (an open-source ETL tool) or Sqoop (a tool for transferring data between Hadoop and relational databases) complete the parallel extraction, conversion, and cleaning of the training data, and the data is stored on HDFS.
Step 102: completing the design and construction of the deep-learning neural network model, and exporting the designed model as a model file.
The deep-learning neural network model can be obtained by modifying an existing mature model (such as VGG or AlexNet), for example by changing the number of layers or the number of nodes in a certain layer. A brand-new neural network model can also be designed according to the characteristics of one's own business. The design method differs from one deep learning framework to another.
Step 103: when the Spark task starts, the input data source is specified as the training data that finished processing on HDFS in step 101. The system estimates the required resources from the volume of the training data and the computational cost of the deep-learning neural network model of step 102, and applies for resources to the second server executing the ResourceManager process.
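The estimate in step 103 can be approximated from the data volume and a rough per-sample compute cost. A hypothetical heuristic — the patent does not give a formula, and every constant and field name below is an assumption made for illustration:

```python
def estimate_resources(data_bytes, model_flops_per_sample, sample_bytes,
                       flops_per_cpu=2e9, mem_overhead=2.0):
    """Rough resource estimate: number of samples times per-sample
    FLOPs gives total compute; memory is data size times an assumed
    overhead factor. Constants are illustrative, not the patent's."""
    n_samples = data_bytes // sample_bytes
    total_flops = n_samples * model_flops_per_sample
    return {
        "cpu_seconds": total_flops / flops_per_cpu,
        "memory_bytes": int(data_bytes * mem_overhead),
    }

if __name__ == "__main__":
    req = estimate_resources(data_bytes=10 * 1024**3,   # 10 GiB of data
                             model_flops_per_sample=5e6,
                             sample_bytes=4096)
    print(req)
```

The application sent to the ResourceManager would then request CPU time and memory of at least these estimated amounts.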
Training a model requires complex operations such as matrix computation, which consume large amounts of CPU, memory, storage, and other resources.
After Spark applies to the ResourceManager for resources, the ResourceManager evaluates the currently used resources against the demand. If resources are sufficient, it directly creates the corresponding resources and feeds the created resource situation back to Spark. If resources are insufficient, the request queues until resources are released, and the resources are then created.
Step 104: the ResourceManager process automatically shards the training data stored on HDFS according to the volume of the training data and the number of worker processes, with a one-to-one correspondence between the shards and the worker processes.
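The one-to-one sharding of step 104 amounts to splitting the input file list into as many contiguous shards as there are workers. A minimal sketch under assumptions (the HDFS interaction is omitted, and the `part-*` file names are made up):

```python
def shard(files, n_workers):
    """Split the list of input files into n_workers contiguous shards,
    one per worker process, with shard sizes differing by at most one."""
    base, extra = divmod(len(files), n_workers)
    shards, start = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)  # spread the remainder
        shards.append(files[start:start + size])
        start += size
    return shards

if __name__ == "__main__":
    files = [f"part-{i:05d}" for i in range(10)]
    for worker_id, s in enumerate(shard(files, 3)):
        print(worker_id, s)
```

Balancing shard sizes this way keeps the data-parallel workers roughly equally loaded, which matters because the slowest worker bounds the overall training time.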
Step 105: the ResourceManager process allocates the relevant resources according to the current resource usage of each node server, and creates worker processes on the corresponding nodes (a deep learning framework such as TensorFlow is integrated in the worker process).
The ResourceManager process evaluates the resource usage of each node: if the memory of a node is fully occupied, that node is not used; if a node still has spare resources, it can be chosen and a worker process created on it.
The ResourceManager process is responsible for managing resources. It communicates with every node and manages the resource usage and release of each node. When Spark starts a task, it must apply to the ResourceManager process for resources: if resources are sufficient the task starts, and if not it waits for resources to be released.
Step 106: after Spark receives the resources returned by the ResourceManager process, the ApplicationMaster of Spark reads the model file saved in step 102 and automatically broadcasts the model file into each worker process through the Context (context process) of Spark.
Step 107: each worker process uses the received model file and reads its own data shard on HDFS to carry out parallel model training.
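Steps 106 and 107 — broadcast the model once, then have each worker train on its own shard — can be simulated with a thread pool. This is a plain-Python sketch under assumptions: the "model" and "training" are placeholders, and in the real system the broadcast goes through Spark's context and the workers run on separate node servers, not threads.

```python
from concurrent.futures import ThreadPoolExecutor

def train_on_shard(model_file, shard):
    """Placeholder for one worker's training step: 'trains' by summing
    its shard, tagged with the broadcast model identifier."""
    return {"model": model_file, "samples_seen": len(shard),
            "checksum": sum(shard)}

def run_workers(model_file, shards):
    # Every worker receives the same broadcast model file plus its own
    # data shard, mirroring steps 106-107.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        futures = [pool.submit(train_on_shard, model_file, s)
                   for s in shards]
        return [f.result() for f in futures]

if __name__ == "__main__":
    shards = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
    for result in run_workers("model.pb", shards):
        print(result)
```

The key property mirrored here is that the model is distributed once and each worker touches only its own data, which is what makes the training data-parallel.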
Step 108: the worker processes export the trained model and the relevant parameters and store them on HDFS.
With the scheme of this embodiment, a DAG (Directed Acyclic Graph) can be constructed through a workflow engine, and actions such as data preprocessing, training, model export, and model saving for deep learning are linked together, which makes it convenient to orchestrate multiple tasks and improves the overall degree of automation.
With the scheme of this embodiment, the evaluation, application, and allocation of resources are completed automatically according to the type and input of the task, which makes it convenient to monitor and manage the task state and simplifies the submission and management of tasks. Through the framework's automated hyperparameter distribution and model deployment, the input data can be sharded automatically so that data-parallel model training is completed efficiently, improving the degree of automation of job submission and the efficiency of deep learning model training.
The embodiment of the present invention also provides an apparatus including a processor and a computer-readable storage medium. Instructions are stored in the computer-readable storage medium, and when the instructions are executed by the processor, the above resource management method based on a distributed system is implemented.
The embodiment of the invention also provides a computer-readable storage medium storing computer-executable instructions which, when executed, implement the above resource management method based on a distributed system.
Those skilled in the art will appreciate that all or some of the steps in the methods disclosed above, and the functional modules/units in the systems and apparatuses, may be implemented as software, firmware, hardware, and appropriate combinations thereof. In a hardware embodiment, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be executed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage medium includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Claims (12)

1. A resource management method based on a distributed system, characterized in that the distributed system includes a first server for executing a Spark task, a second server for executing the ResourceManager process, and node servers on which worker processes are deployed, the method being executed by the first server and comprising:
when the Spark task starts, estimating the required resources from the preprocessed training data and the computational cost of a preset deep-learning neural network model, and applying to the second server for the resources;
after receiving the information of the node servers with sufficient resources returned by the second server, broadcasting the model file of the previously exported deep-learning neural network model to each node server with sufficient resources.
2. The resource management method based on a distributed system according to claim 1, characterized in that
the training data is stored in advance in the Hadoop distributed file system.
3. A server for executing a Spark task, characterized by comprising:
an application module, configured to, when the Spark task starts, estimate the required resources from the preprocessed training data and the computational cost of a preset deep-learning neural network model, and apply to the second server for the resources;
a broadcast module, configured to, after receiving the information of the node servers with sufficient resources returned by the second server, broadcast the model file of the previously exported deep-learning neural network model to each node server with sufficient resources.
4. The server according to claim 3, characterized in that
the training data is stored in advance in the Hadoop distributed file system.
5. A resource management method based on a distributed system, wherein the distributed system comprises a first server for executing a Spark task, a second server for executing a ResourceManager process, and node servers on which worker processes are deployed, the method being executed by the second server and comprising:
after receiving a resource application from the first server, sharding the training data stored in the Hadoop distributed file system, each shard of the training data corresponding to one worker process;
creating the worker processes on the node servers whose resource utilization is below a threshold, and sending information on the node servers whose resource utilization is below the threshold to the first server.
6. A server for executing a ResourceManager process, comprising:
a sharding module, configured to, after a resource application from a first server is received, shard the training data stored in the Hadoop distributed file system, each shard of the training data corresponding to one worker process;
a creation module, configured to create the worker processes on the node servers whose resource utilization is below a threshold, and send information on the node servers whose resource utilization is below the threshold to the first server.
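Claims 5 and 6 describe the ResourceManager side: shard the HDFS training data across worker processes, and select node servers whose resource utilization is below a threshold. A self-contained sketch of that selection logic, with plain Python lists standing in for HDFS files and all names hypothetical:

```python
def shard_training_data(records, num_workers):
    """Round-robin split of the training records, one shard per worker process."""
    return [records[i::num_workers] for i in range(num_workers)]

def select_node_servers(utilization, threshold):
    """Return the node servers whose resource utilization is below the threshold."""
    return sorted(name for name, load in utilization.items() if load < threshold)

shards = shard_training_data(list(range(10)), 3)
nodes = select_node_servers({"node-a": 0.35, "node-b": 0.92, "node-c": 0.60}, 0.8)
print(shards)  # -> [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
print(nodes)   # -> ['node-a', 'node-c']
```

In the claimed system the selected node names would be returned to the first server; the patent does not specify the sharding scheme, so the round-robin split above is only one possibility.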
7. A resource management method based on a distributed system, wherein the distributed system comprises a first server for executing a Spark task, a second server for executing a ResourceManager process, and node servers on which worker processes are deployed, the method being executed by a node server and comprising:
receiving the model file of the deep learning neural network model broadcast by the first server;
reading the corresponding training data from the Hadoop distributed file system, and training the model file.
8. The resource management method based on a distributed system according to claim 7, further comprising, after the training of the model file:
exporting the trained deep learning neural network model and its relevant parameters, and storing them in the Hadoop distributed file system.
9. A node server on which a worker process is deployed, comprising:
a receiving module, configured to receive the model file of the deep learning neural network model broadcast by a first server;
a training module, configured to read the corresponding training data from the Hadoop distributed file system and train the model file.
10. The node server according to claim 9, further comprising:
an export module, configured to export the trained deep learning neural network model and its relevant parameters, and store them in the Hadoop distributed file system.
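Claims 7-10 describe the node-server lifecycle: receive the broadcast model file, train it on the local data shard, and export the trained model and parameters back to HDFS. A toy sketch of that lifecycle, where the "model" is a single parameter, the "training" is one averaging step, and a dict stands in for HDFS; every class and method name is hypothetical:

```python
class WorkerNode:
    """Toy stand-in for a node server running a worker process."""

    def __init__(self, shard):
        self.shard = shard   # this node's slice of the training data
        self.weight = None   # model parameter received by broadcast

    def receive_model(self, weight):
        # Claim 7: receive the broadcast model file (here a single parameter).
        self.weight = weight

    def train(self, lr=0.5):
        # Claim 7: one toy update step, moving the weight toward the shard mean.
        target = sum(self.shard) / len(self.shard)
        self.weight += lr * (target - self.weight)

    def export_model(self, store):
        # Claim 8: export the trained parameter into a dict standing in for HDFS.
        store["model"] = self.weight

hdfs = {}
node = WorkerNode(shard=[1.0, 3.0])
node.receive_model(0.0)
node.train()            # shard mean is 2.0, weight moves 0.0 -> 1.0
node.export_model(hdfs)
print(hdfs)             # -> {'model': 1.0}
```

A real node server would of course run a full deep learning training loop over its shard rather than this single scalar update.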
11. A resource management method based on a distributed system, comprising:
when a first server for executing a Spark task starts the Spark task, estimating the required resources according to the preprocessed training data and the computation amount of a preset deep learning neural network model, and applying for the resources to a second server for executing a ResourceManager process;
after the second server receives the resource application from the first server, sharding the training data stored in the Hadoop distributed file system, each shard of the training data corresponding to one worker process; creating the worker processes on the node servers whose resource utilization is below a threshold, and sending information on the node servers whose resource utilization is below the threshold to the first server;
after the first server receives the information on the node servers with sufficient resources returned by the second server, broadcasting the model file of the pre-exported deep learning neural network model to each node server with sufficient resources;
receiving, by each node server, the model file of the deep learning neural network model broadcast by the first server, reading the corresponding training data from the Hadoop distributed file system, and training the model file.
12. A distributed system, comprising: the server according to claim 3 or 4, the server according to claim 6, and the node server according to claim 9 or 10.
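Claim 11 ties the three roles together end to end. One way to picture that flow in plain Python, with trivial stand-ins for Spark, the ResourceManager, and HDFS, and every name (including the model file) hypothetical:

```python
def run_job(records, utilization, threshold, model_file):
    """Hypothetical end-to-end flow of claim 11."""
    # Second server: pick node servers under the utilization threshold.
    nodes = sorted(n for n, load in utilization.items() if load < threshold)
    # Second server: one training-data shard per worker process.
    shards = {node: records[i::len(nodes)] for i, node in enumerate(nodes)}
    # First server: broadcast the pre-exported model file to each chosen node.
    received = {node: model_file for node in nodes}
    # Node servers: each would train the received model on its own shard
    # (the training step itself is elided here).
    return {node: (received[node], shards[node]) for node in nodes}

result = run_job(list(range(6)), {"a": 0.2, "b": 0.95, "c": 0.4}, 0.8, "model.pb")
print(result)  # -> {'a': ('model.pb', [0, 2, 4]), 'c': ('model.pb', [1, 3, 5])}
```

Node "b" is skipped because its utilization (0.95) exceeds the threshold; in the claimed system the first server would only learn of nodes "a" and "c" from the second server before broadcasting.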
CN201810953290.XA 2018-08-21 2018-08-21 A kind of method for managing resource and server based on distributed system Pending CN109062700A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810953290.XA CN109062700A (en) 2018-08-21 2018-08-21 A kind of method for managing resource and server based on distributed system


Publications (1)

Publication Number Publication Date
CN109062700A true CN109062700A (en) 2018-12-21

Family

ID=64686540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810953290.XA Pending CN109062700A (en) 2018-08-21 2018-08-21 A kind of method for managing resource and server based on distributed system

Country Status (1)

Country Link
CN (1) CN109062700A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109685160A (en) * 2019-01-18 2019-04-26 创新奇智(合肥)科技有限公司 Online model automatic training and deployment method and system
CN111753997A (en) * 2020-06-28 2020-10-09 北京百度网讯科技有限公司 Distributed training method, system, device and storage medium
CN112035261A (en) * 2020-09-11 2020-12-04 杭州海康威视数字技术股份有限公司 Data processing method and system
CN112394944A (en) * 2019-08-13 2021-02-23 阿里巴巴集团控股有限公司 Distributed development method, device, storage medium and computer equipment
CN113127163A (en) * 2019-12-31 2021-07-16 杭州海康威视数字技术股份有限公司 Model verification method and device and electronic equipment
CN113157252A (en) * 2021-04-13 2021-07-23 中国航天科工集团八五一一研究所 Electromagnetic signal general distributed intelligent processing and analyzing platform and method
CN113824650A (en) * 2021-08-13 2021-12-21 上海光华智创网络科技有限公司 Parameter transmission scheduling algorithm and system in distributed deep learning system
CN114169427A (en) * 2021-12-06 2022-03-11 北京百度网讯科技有限公司 Distributed training method, device and equipment based on end-to-end self-adaptation
CN114675965A (en) * 2022-03-10 2022-06-28 北京百度网讯科技有限公司 Federal learning method, apparatus, device and medium
US11847500B2 (en) 2019-12-11 2023-12-19 Cisco Technology, Inc. Systems and methods for providing management of machine learning components

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080222646A1 (en) * 2007-03-06 2008-09-11 Lev Sigal Preemptive neural network database load balancer
CN103699440A (en) * 2012-09-27 2014-04-02 北京搜狐新媒体信息技术有限公司 Method and device for cloud computing platform system to distribute resources to task
CN107145395A (en) * 2017-07-04 2017-09-08 北京百度网讯科技有限公司 Method and apparatus for handling task
CN107766148A (en) * 2017-08-31 2018-03-06 北京百度网讯科技有限公司 Heterogeneous cluster and task processing method and apparatus
CN107888669A (en) * 2017-10-31 2018-04-06 武汉理工大学 Large-scale resource scheduling system and method based on deep learning neural network
CN107918560A (en) * 2016-10-14 2018-04-17 郑州云海信息技术有限公司 A kind of server apparatus management method and device
WO2018121282A1 (en) * 2016-12-26 2018-07-05 华为技术有限公司 Data processing method, end device, cloud device, and end-cloud collaboration system

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080222646A1 (en) * 2007-03-06 2008-09-11 Lev Sigal Preemptive neural network database load balancer
CN103699440A (en) * 2012-09-27 2014-04-02 北京搜狐新媒体信息技术有限公司 Method and device for cloud computing platform system to distribute resources to task
CN107918560A (en) * 2016-10-14 2018-04-17 郑州云海信息技术有限公司 A kind of server apparatus management method and device
WO2018121282A1 (en) * 2016-12-26 2018-07-05 华为技术有限公司 Data processing method, end device, cloud device, and end-cloud collaboration system
CN107145395A (en) * 2017-07-04 2017-09-08 北京百度网讯科技有限公司 Method and apparatus for handling task
CN107766148A (en) * 2017-08-31 2018-03-06 北京百度网讯科技有限公司 Heterogeneous cluster and task processing method and apparatus
CN107888669A (en) * 2017-10-31 2018-04-06 武汉理工大学 Large-scale resource scheduling system and method based on deep learning neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Liu Ronghui, "Big Data Architecture Technology and Case Analysis", Northeast Normal University Press, 31 January 2018 *
Tao Wan, "Cloud Computing and Big Data", Xidian University Press, 31 January 2017 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109685160B (en) * 2019-01-18 2020-11-27 创新奇智(合肥)科技有限公司 Online model automatic training and deploying method and system
CN109685160A (en) * 2019-01-18 2019-04-26 创新奇智(合肥)科技有限公司 Online model automatic training and deployment method and system
CN112394944A (en) * 2019-08-13 2021-02-23 阿里巴巴集团控股有限公司 Distributed development method, device, storage medium and computer equipment
US11847500B2 (en) 2019-12-11 2023-12-19 Cisco Technology, Inc. Systems and methods for providing management of machine learning components
CN113127163A (en) * 2019-12-31 2021-07-16 杭州海康威视数字技术股份有限公司 Model verification method and device and electronic equipment
CN111753997A (en) * 2020-06-28 2020-10-09 北京百度网讯科技有限公司 Distributed training method, system, device and storage medium
CN112035261A (en) * 2020-09-11 2020-12-04 杭州海康威视数字技术股份有限公司 Data processing method and system
CN113157252A (en) * 2021-04-13 2021-07-23 中国航天科工集团八五一一研究所 Electromagnetic signal general distributed intelligent processing and analyzing platform and method
CN113157252B (en) * 2021-04-13 2022-11-25 中国航天科工集团八五一一研究所 Electromagnetic signal general distributed intelligent processing and analyzing platform and method
CN113824650B (en) * 2021-08-13 2023-10-20 上海光华智创网络科技有限公司 Parameter transmission scheduling algorithm and system in distributed deep learning system
CN113824650A (en) * 2021-08-13 2021-12-21 上海光华智创网络科技有限公司 Parameter transmission scheduling algorithm and system in distributed deep learning system
CN114169427A (en) * 2021-12-06 2022-03-11 北京百度网讯科技有限公司 Distributed training method, device and equipment based on end-to-end self-adaptation
CN114169427B (en) * 2021-12-06 2022-10-04 北京百度网讯科技有限公司 Distributed training method, device and equipment based on end-to-end self-adaptation
CN114675965A (en) * 2022-03-10 2022-06-28 北京百度网讯科技有限公司 Federal learning method, apparatus, device and medium

Similar Documents

Publication Publication Date Title
CN109062700A (en) A kind of method for managing resource and server based on distributed system
US11429433B2 (en) Process discovery and automatic robotic scripts generation for distributed computing resources
Li et al. A scientific workflow management system architecture and its scheduling based on cloud service platform for manufacturing big data analytics
EP3047376B1 (en) Type-to-type analysis for cloud computing technical components
US11074107B1 (en) Data processing system and method for managing AI solutions development lifecycle
US20200097847A1 (en) Hyperparameter tuning using visual analytics in a data science platform
CN103092683B Heuristics-based scheduling for data analysis
CN106067080B (en) Configurable workflow capabilities are provided
EP3495951A1 (en) Hybrid cloud migration delay risk prediction engine
CN111861020A (en) Model deployment method, device, equipment and storage medium
US10453165B1 (en) Computer vision machine learning model execution service
US20110313966A1 (en) Activity schemes for support of knowledge-intensive tasks
CN112287015B (en) Image generation system, image generation method, electronic device, and storage medium
CN108243012B (en) Charging application processing system, method and device in OCS (online charging System)
CN109918184A (en) Picture processing system, method and relevant apparatus and equipment
WO2022048648A1 (en) Method and apparatus for achieving automatic model construction, electronic device, and storage medium
Bhattacharjee et al. Stratum: A bigdata-as-a-service for lifecycle management of iot analytics applications
CN114254033A (en) Data processing method and system based on BS architecture
CN107168795B (en) Codon deviation factor model method based on CPU-GPU isomery combined type parallel computation frame
WO2020147601A1 (en) Graph learning system
CN113762514A (en) Data processing method, device, equipment and computer readable storage medium
Gu et al. Characterizing job-task dependency in cloud workloads using graph learning
CN110442753A (en) A kind of chart database auto-creating method and device based on OPC UA
US11775264B2 (en) Efficient deployment of machine learning and deep learning model's pipeline for serving service level agreement
CN114862098A (en) Resource allocation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181221