WO2020062277A1 - Method and apparatus for managing computing resources in the data preprocessing stage of a neural network - Google Patents

Method and apparatus for managing computing resources in the data preprocessing stage of a neural network

Info

Publication number
WO2020062277A1
Authority
WO
WIPO (PCT)
Prior art keywords
computing
node
resource
information
adjusted
Prior art date
Application number
PCT/CN2018/109181
Other languages
English (en)
French (fr)
Inventor
范礼
路石
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201880098036.4A (patent CN112753016A)
Priority to PCT/CN2018/109181 (patent WO2020062277A1)
Publication of WO2020062277A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • the present application relates to the field of computing technology, and more particularly, to a method and device for computing resource management applied to a data preprocessing stage in a neural network.
  • Artificial intelligence is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic theories of AI.
  • the application of AI technology has penetrated into all walks of life.
  • the ability to train and rapidly deploy neural network models has become a core competence of technology companies, and how to improve the speed of network model training has become a research hotspot that attracts wide attention.
  • the neural network training process can be roughly divided into two stages, namely the data preprocessing stage and the network training stage.
  • the speed of network training largely depends on the computing acceleration capabilities of dedicated hardware, such as the powerful computing power of GPUs.
  • Most of the research has focused on the miniaturization of network structures, quantization and pruning, and operator fusion acceleration.
  • the speed of data preprocessing is often neglected. In many scenarios, one round of GPU iteration has finished while the next round of data is not yet ready, so the GPU can only wait, which greatly reduces the overall computing efficiency.
  • the present application provides a method and device for computing resource management applied to a data preprocessing stage in a neural network, which can dynamically adjust computing resources and implement load balancing, thereby making more reasonable use of computing resources.
  • a method is provided for managing computing resources applied to the data preprocessing stage in a neural network, where the computing resources include multiple heterogeneous computing nodes.
  • the method includes: separately monitoring resource usage information of the multiple computing nodes, where the resource usage information indicates the resource usage on each computing node; generating, according to the resource usage information and based on a preset resource scheduling policy, resource adjustment information corresponding to the nodes to be adjusted among the multiple computing nodes; and dynamically adjusting the computing resources of the nodes to be adjusted according to the resource adjustment information. In this way, the data preprocessing computation can be adaptively and dynamically adjusted according to the size of the neural network and the actual data throughput requirements, and the computing deployment can be optimized, which improves the computational throughput of data preprocessing and thereby the overall training efficiency of the neural network.
  • the resource scheduling policy is set according to the computational throughput requirement that the model training stage of the neural network imposes on the data preprocessing stage.
  • the node to be adjusted may be a part of the computing nodes determined from multiple heterogeneous computing nodes, and may be one or more computing nodes.
  • the resource scheduling policy includes at least one of a load balancing policy or a resource utilization policy. That is, the resource scheduling policy may include only a load balancing policy; or only a resource utilization policy; or both a load balancing policy and a resource utilization policy.
  • the resource scheduling policy includes a load balancing policy. If the load on the nodes to be adjusted is unbalanced, the resource adjustment information includes topology information for adjusting the nodes to be adjusted in the computing topology. Dynamically adjusting the computing resources according to the resource adjustment information then includes: adjusting, according to the topology information, the topology position of the nodes to be adjusted in the computing topology, and migrating the computing load on a bottleneck computing node among the nodes to be adjusted to an idle computing node.
  • the load balancing of computing resources is a basic requirement. If load balancing is implemented, the data preprocessed in the data preprocessing phase can be obtained on time in the model training phase.
  • the method further includes: modifying information of a computing device corresponding to the idle computing node and information of a computing device corresponding to the bottleneck computing node.
  • the information of the computing device corresponding to the computing node and the information of the computing device corresponding to the node to be adjusted can be changed in time for the subsequent monitoring and use.
  • the resource scheduling policy further includes a resource utilization policy. If the resource utilization of a first computing node among the nodes to be adjusted is lower than a first resource utilization threshold, the resource adjustment information includes information about a processing thread to be added or information about a process to be added. Dynamically adjusting the computing resources according to the resource adjustment information then includes: adding, according to the information about the processing thread to be added, a processing thread or a processing process on the path where the first computing node is located, where the added processing thread or processing process includes one or more newly added computing nodes.
  • the resource utilization strategy can be used to further optimize the deployment of computing resources, which can further improve the overall training efficiency.
  • the method further includes: recording attribute information of the one or more newly added computing nodes, the attribute information including one or more of the following: the computing device corresponding to the computing node, the computation type of the computing node, and the execution probability of the computing node; and monitoring the one or more newly added computing nodes.
  • a device for managing computing resources applied to a data pre-processing phase in a neural network includes a module for executing the above-mentioned first aspect or a method in any possible implementation manner of the first aspect.
  • a computer-readable storage medium stores a program that causes a computer to execute the method for managing computing resources in the data preprocessing stage in a neural network according to the first aspect described above or any of its implementations.
  • a computer program product containing instructions, which, when run on a computer, causes the computer to execute the method for managing computing resources in the data preprocessing stage in a neural network according to the first aspect described above.
  • a device for managing computing resources in a data preprocessing stage in a neural network includes a processor, a memory, and an interface.
  • the processor is connected to a memory and an interface (or interface circuit).
  • the memory is used to store instructions
  • the processor is used to execute the instructions
  • the transceiver is used to communicate with other computing nodes under the control of the processor.
  • the processor executes the instructions stored in the memory, the execution causes the processor to execute the method for managing a computing resource in a data preprocessing stage in the neural network in the first aspect.
  • FIG. 1 is an example diagram of an application architecture according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an example of a calculation process for training a neural network model.
  • FIG. 3 is a schematic flowchart of a computing resource management method applied to a data preprocessing stage in a neural network according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a specific implementation example of a method for managing a computing resource applied to a data preprocessing stage in a neural network according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an example of adjusting a computing resource according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of another example of adjusting a computing resource according to an embodiment of the present application.
  • FIG. 7 is a schematic block diagram of a computing resource management device applied to a data preprocessing stage in a neural network according to an embodiment of the present application.
  • FIG. 8 is a schematic structural block diagram of a computing resource management device applied to a data preprocessing stage in a neural network according to an embodiment of the present application.
  • the technical solutions of the embodiments of the present application can be applied to various training model scenarios, machine learning fields, deep learning fields and other learning frameworks, such as the Tensorflow framework based on Google's open source software library, and neural network training models.
  • FIG. 1 is a schematic diagram of a data preprocessing heterogeneous computing application according to an embodiment of the present application.
  • FIG. 1 uses image data as an example.
  • the architecture includes multiple computing units (FIG. 1 is described using two computing units as an example), and each computing unit may include an encoder/decoder, a central processing unit (CPU), an advanced RISC machine (ARM) processor, a digital signal processor (DSP), a graphics processing unit (GPU), a neural-network processing unit (NPU), and so on.
  • the CPU, ARM, DSP, GPU, and NPU can support different operations.
  • FIG. 1 is used only as an example for description and does not limit the application architecture of the embodiments of the present application; for example, the architecture may include more computing units, or each computing unit may include more types or numbers of processors, which is not specifically limited.
  • Model training (such as model training in neural networks) is a cross-device, multi-stage process, and the computational efficiency of any stage will affect the overall training speed.
  • Figure 2 shows a schematic diagram of the calculation process in model training.
  • a training iteration calculation can be divided into two parts: a data preprocessing phase and a model training phase (also called a training calculation phase).
  • the data preprocessing stage includes four stages: data read-in/output (Data I/O), decoding (Decoding), pre-processing (Pre-Process), and augmentation (Augmentation).
  • the Data I/O stage includes fetching data from local storage, network file systems, distributed file systems, and the like into a pipeline for processing; the Decoding stage decodes compressed data; the Pre-Process stage performs necessary preprocessing such as cropping; and the Augmentation stage applies augmentation operations to enrich the diversity of the sample data.
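  • for illustration, the four stages of such a data preprocessing pipeline could be expressed with the TensorFlow tf.data API roughly as in the following sketch; the file pattern, image size, and the specific decode and augmentation operations are assumptions chosen for the example and are not details specified by the present application:

```python
import tensorflow as tf  # assumes TensorFlow 2.x

def decode(path):                              # Decoding stage
    image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    return tf.image.convert_image_dtype(image, tf.float32)

def preprocess(image):                         # Pre-Process stage (e.g. crop/resize)
    return tf.image.resize(image, [224, 224])

def augment(image):                            # Augmentation stage
    return tf.image.random_flip_left_right(image)

# Data I/O stage: "images/*.jpg" is a hypothetical local path.
dataset = (tf.data.Dataset.list_files("images/*.jpg")
           .map(decode,     num_parallel_calls=tf.data.AUTOTUNE)
           .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
           .map(augment,    num_parallel_calls=tf.data.AUTOTUNE)
           .batch(32)
           .prefetch(tf.data.AUTOTUNE))        # overlap preprocessing with training
```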
  • the model training phase refers to: after the data preprocessing phase, the neural network training calculation is started.
  • the data preprocessing stage is most likely to affect the efficiency of the training model.
  • the calculation process in the data preprocessing stage can be implemented using computing resources, for example, deployed on a central processing unit (CPU) or a graphics processing unit (GPU) to implement data preprocessing.
  • the prior-art computing deployment is fixed, that is, calculations are statically deployed in advance or simply placed on the CPU, which is not flexible enough and makes it difficult to fully use computing resources to achieve high-throughput data preprocessing.
  • nor are the data throughput requirements of different neural network models considered. For example, some lightweight network models compute quickly and require high data throughput, while some larger networks compute slowly and require lower data throughput.
  • This application proposes a method for managing computing resources applied to the data preprocessing stage of a neural network, which can adaptively and dynamically adjust the data preprocessing computation according to the size of the neural network and the actual data throughput requirements, optimize the computing deployment, flexibly adjust computing resources, and improve the computational throughput of data preprocessing, thereby improving the overall training efficiency of the neural network.
  • the data pre-processing process may be implemented through computing resources.
  • a computing resource can be understood as a computing device, and computing can be implemented by multiple processors.
  • Computing resources can implement one or more of the following computing functions: encoding, decoding, filtering, cropping, translation, rotation, enhanced contrast, inverse color, equalization, color gain, brightness, sharpness, cutout, etc.
  • the embodiment of the present application does not limit the type of the processor; the processor may be any one or more of the following: a CPU, an advanced RISC machine (ARM) processor, a digital signal processor (DSP), a GPU, or another device with computing or data processing capabilities.
  • FIG. 3 is a schematic flowchart of a method 300 for computing resource management in a data preprocessing stage in a neural network according to an embodiment of the present application.
  • the computing resource includes multiple heterogeneous computing nodes, and the method 300 includes:
  • S310 Monitor resource usage information of the multiple computing nodes, respectively.
  • the plurality of computing nodes constitute a computing topology.
  • Each computing node can be used to process the data in the data preprocessing process of the neural network.
  • the resource usage information is used to indicate the resource usage situation on each computing node. After the resource usage is obtained, it can be known which computing nodes have low resource utilization, which computing nodes have bottlenecks, which computing nodes are idle, and so on, and based on this information the resource adjustment information corresponding to the nodes to be adjusted can be obtained.
  • a device status monitoring process can be used to detect the resource usage of each node.
  • each computing node may be understood as a logical node, and one computing node or multiple computing nodes may correspond to one computing device (such as a processor).
  • the resource usage of each computing node can also be understood as the resource usage of the computing device corresponding to the computing node.
  • each computing node can also be understood as a physical node, and the resource usage of a computing node can be understood as the resource usage of a computing device.
  • the resource usage of the computing node or the resource usage of the computing device may include one or more of the following factors: processor utilization, memory utilization, memory bandwidth utilization, network bandwidth utilization, disk I / O rate, thread wait time, etc.
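  • as a minimal sketch of how such resource usage information might be represented, the record below covers the monitored factors; the field names are assumptions, not terms defined by the present application:

```python
from dataclasses import dataclass

@dataclass
class ResourceUsage:
    """Resource usage reported for one computing node (or its computing device)."""
    node_id: str
    processor_util: float      # processor utilization, 0.0-1.0
    memory_util: float         # memory occupancy rate
    mem_bandwidth_util: float  # memory bandwidth utilization
    net_bandwidth_util: float  # network bandwidth utilization
    disk_io_rate: float        # disk I/O rate, e.g. MB/s
    thread_wait_time: float    # average thread wait time, e.g. ms
```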
  • the resource usage information may indicate which computing nodes have low resource utilization, or may also indicate which computing nodes have excessive resource utilization.
  • the computing node to be adjusted can be determined according to the resource usage information.
  • the node to be adjusted here may be a part of the computing nodes determined from multiple heterogeneous computing nodes, and may be one or more computing nodes, which is not limited.
  • the number of nodes to be adjusted is not specifically limited in the embodiment of the present application.
  • the resource adjustment information may include various information required for computing resources, such as information such as a thread or process number, a node attribute, and a processor corresponding to the node, which is not specifically limited.
  • the resource scheduling strategy is set according to a computational throughput requirement of the data preprocessing stage during a model training phase of the neural network.
  • the model training phase of the neural network occurs after the data preprocessing phase.
  • “Calculating throughput rate requirements” means that during the model training phase, the calculation results of the data preprocessing phase need to be obtained within a certain period of time. That is, the execution speed of the data preprocessing stage needs to match the requirements of the model training stage.
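  • as a rough illustration of this matching requirement, the data preprocessing stage must produce preprocessed batches at least as fast as the training stage consumes them; the function and variable names below are hypothetical:

```python
def preprocessing_fast_enough(batches_per_sec_preprocess: float,
                              training_step_time_sec: float) -> bool:
    # The training stage consumes one preprocessed batch per training step, so the
    # preprocessing stage must sustain at least 1 / step_time batches per second.
    required_rate = 1.0 / training_step_time_sec
    return batches_per_sec_preprocess >= required_rate
```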
  • the resource scheduling policy includes at least one of a load balancing policy or a resource utilization policy. That is, the resource scheduling policy may include only a load balancing policy; or only a resource utilization policy; or both a load balancing policy and a resource utilization policy.
  • when the computing resources are used to carry out the data preprocessing stage, load balancing of the computing resources is a basic requirement; if load balancing is achieved, the model training stage can obtain the data preprocessed in the data preprocessing stage on time.
  • for the case where the resource scheduling policy includes both a load balancing policy and a resource utilization policy: further, if load balancing of the computing resources has already been achieved but the resource utilization of the computing nodes can be further improved, the resource utilization policy can be used to further optimize the deployment of computing resources, thereby further improving the overall training efficiency.
  • the adjustment includes one or more of the following processes: adjusting the number of processing threads of the node to be adjusted, adjusting the number of processing processes of the node to be adjusted, and adjusting the node to be adjusted in the calculation Topological location in the topology.
  • the resource usage information of each computing node is monitored, resource adjustment information of the nodes to be adjusted among the multiple computing nodes is generated according to the resource usage information and a preset resource scheduling policy, and finally the computing resources of the nodes to be adjusted are dynamically adjusted according to the resource adjustment information. This makes it possible to dynamically adjust the computing resources in the data preprocessing process, achieve load balancing, increase the degree of computational parallelism, and make the most of the computing resources, thereby speeding up data preprocessing in the neural network (accelerating the computational throughput of the data I/O, decoding, preprocessing, and augmentation operations) and reducing the training time of the network model.
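  • a minimal sketch of this monitor-generate-adjust loop (S310 to S330) might look as follows; the node objects with a sample_usage() method, the policy object with a generate() method, and the apply() call are illustrative assumptions rather than interfaces defined by the present application:

```python
import time

def manage_preprocessing_resources(nodes, scheduling_policy, interval_sec=1.0):
    """Sketch of S310-S330: monitor nodes, derive adjustments, apply them."""
    while True:
        # S310: separately monitor resource usage information of each node.
        usage = {node.node_id: node.sample_usage() for node in nodes}
        # S320: generate resource adjustment information from the preset policy.
        adjustments = scheduling_policy.generate(usage)
        # S330: dynamically adjust the computing resources of the nodes to adjust,
        # e.g. add processing threads/processes or move a node in the topology.
        for adjustment in adjustments:
            adjustment.apply()
        time.sleep(interval_sec)
```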
  • the resource scheduling policy includes a load balancing policy. If the load on the nodes to be adjusted is unbalanced, the resource adjustment information includes topology information used to adjust the nodes to be adjusted in the computing topology. That is, if a load balancing policy is used and the load on the nodes to be adjusted is detected to be unbalanced, the resource adjustment information generated in step S320 will include topology information for adjusting the nodes to be adjusted in the computing topology.
  • S330 includes:
  • adjusting, according to the topology information, the topology position of the nodes to be adjusted in the computing topology, and migrating the computing load on a bottleneck computing node among the nodes to be adjusted to an idle computing node.
  • the computing load on the bottleneck computing node in the node to be adjusted may be migrated or transferred to the idle computing node.
  • judging the level of resource utilization on a computing node can be achieved by using a second resource utilization threshold: if the resource utilization on a computing node is higher than or equal to the second resource utilization threshold, the node can be regarded as a bottleneck computing node; if the resource utilization on a computing node is lower than the second resource utilization threshold, the node can be regarded as an idle computing node.
  • the resource adjustment information may include topology information adjusted by the node to be adjusted in the computing topology.
  • the adjusted topology information of the node to be adjusted in the computing topology may include the adjusted topology position of the bottleneck computing node or the adjusted topology position of the idle computing node.
  • migration or transfer of the computing load on the bottleneck computing node to an idle computing node adjacent to or close to the bottleneck computing node can be selected to reduce the migration complexity.
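  • the following sketch illustrates one possible realisation of this load balancing adjustment, classifying nodes with a second resource utilization threshold and preferring an adjacent idle node as the migration target; the dictionary-based topology representation and the threshold value are assumptions used only for illustration:

```python
def plan_load_migrations(usage, topology, second_threshold=0.85):
    """Sketch of the load balancing policy: pair bottleneck nodes with idle nodes.

    usage:    dict mapping node id -> resource utilization in [0.0, 1.0]
    topology: dict mapping node id -> list of neighbouring node ids
    Returns a list of (bottleneck_node, idle_node) migration pairs.
    """
    bottlenecks = [n for n, u in usage.items() if u >= second_threshold]
    idle_nodes = {n for n, u in usage.items() if u < second_threshold}
    migrations = []
    for bottleneck in bottlenecks:
        # Prefer an idle node adjacent (or close) to the bottleneck node,
        # which keeps the migration complexity low.
        neighbours = [n for n in topology.get(bottleneck, []) if n in idle_nodes]
        candidates = neighbours or sorted(idle_nodes)
        if candidates:
            target = candidates[0]
            migrations.append((bottleneck, target))
            idle_nodes.discard(target)
    return migrations
```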
  • the foregoing second resource utilization threshold may be determined based on actual requirements, and the specific value of the second resource utilization threshold is not limited in the embodiment of the present application.
  • the method 300 further includes: modifying the information of the computing device corresponding to the idle computing node and the information of the computing device corresponding to the bottleneck computing node.
  • the information of a computing device may include attributes of the computing device, information of the node corresponding to the computing device, and the like, which is not limited.
  • the resource scheduling policy may further include a resource utilization policy.
  • the resource utilization strategy can be used to further optimize the deployment of computing resources. That is, the embodiment of the present application may combine a load balancing policy and a resource utilization policy together as a judgment condition for adjusting computing resources.
  • the resource adjustment information includes information about the processing threads to be added or information about the processes to be added;
  • S330 includes:
  • adding, according to the information about the processing threads to be added, a processing thread or a processing process on the path where the first computing node is located.
  • the resource adjustment information includes information of a processing thread to be added or information of a process to be added.
  • the processing thread or processing process added to the path where the first computing node is located may include one or more newly added computing nodes to increase the computing load on the node to be adjusted.
  • the first resource utilization threshold here may be determined based on actual requirements, and the specific value of the first resource utilization threshold is not limited in the embodiment of the present application.
  • the embodiment of the present application does not limit the relationship between the first resource utilization threshold and the second resource utilization threshold, and may be the same or different, which is not limited.
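  • a corresponding sketch of the resource utilization policy might simply request extra parallelism on the path of any node whose utilization falls below the first resource utilization threshold; the ThreadRequest record and the one-thread-at-a-time heuristic are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class ThreadRequest:
    path_node_id: str   # first computing node whose path should get more parallelism
    extra_threads: int  # number of processing threads (or processes) to add

def plan_thread_additions(usage, first_threshold=0.5):
    """Sketch of the resource utilization policy: grow under-utilized paths."""
    requests = []
    for node_id, utilization in usage.items():
        if utilization < first_threshold:
            # Add one thread at a time; a real scheduler could scale the number
            # of threads with the amount of spare capacity on the device.
            requests.append(ThreadRequest(path_node_id=node_id, extra_threads=1))
    return requests
```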
  • the computing topology may be updated accordingly.
  • the method 300 further includes:
  • the attribute information includes one or more of the following information: a computing device corresponding to the computing node, a computing type of the computing node, and an execution probability of the computing node;
  • after one or more computing nodes are added, the attribute information of these computing nodes also needs to be recorded, where the attribute information may include one or more of the following: the computing device (or processor) corresponding to the newly added computing node, the computation type of the newly added computing node, and the probability that the newly added computing node will subsequently be used to execute the neural network data preprocessing process, so that these newly added computing nodes can be monitored and their subsequent resource usage obtained, which facilitates subsequent adjustments of the computing resources.
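  • the attribute information kept for a newly added computing node could be represented by a small record such as the following sketch; the field names are assumptions:

```python
from dataclasses import dataclass

@dataclass
class NodeAttributes:
    """Attribute information recorded for a newly added computing node."""
    node_id: str
    device: str              # computing device/processor the node is deployed on
    op_type: str             # computation type of the node, e.g. "decode" or "resize"
    exec_probability: float  # probability the node is used in later preprocessing runs
```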
  • the method for managing computing resources in the embodiments of the present application is not limited to the neural network data preprocessing process; it may also be applied to other training scenarios that require data preprocessing, such as fuzzy model scenarios and support vector machine training model scenarios, which is not limited.
  • the embodiments of the present application are applicable to automated computing deployment tasks on any heterogeneous hardware platform, including distributed systems, embedded systems, and standard PC servers.
  • FIG. 4 is a schematic diagram illustrating a specific implementation example of a method for managing a computing resource applied to a data preprocessing stage in a neural network according to an embodiment of the present application.
  • the method for managing computing resources applied to the data preprocessing stage in a neural network according to an embodiment of the present application may be implemented by the following components: a topology management component (or topology manager), a resource management component (or resource manager), and a state management component (or state manager).
  • the topology management component is used to manage the computing topology, and is specifically used to: create dynamic topology graphs and processing flows, manage the thread pool, implement parallel computing, and make full use of the devices' computing resources.
  • the resource management component is used to manage the deployment of computing resources. It is specifically used to: monitor and manage resources such as computation, memory, and threads, analyze the resource utilization of each computing device, and adjust the computing deployment of the nodes and the computing topology based on feedback information about the computing devices.
  • the resource management component can monitor the nodes or computing devices through a device status monitoring process. Specifically, it can monitor the following information: processor utilization, memory occupancy rate, memory bandwidth utilization, network bandwidth utilization, disk I/O rate, thread wait time, and so on.
  • the state management component is used to record the attribute information of the computing node, and the attribute information includes parameters such as the computing device, operator calculation type, and execution probability.
  • when the computation process of the data needs to be redeployed across platforms, the state management component may transmit the attribute information of the node to the resource management component.
  • in FIG. 4, when the resource management component learns that the computing resource utilization of a certain computing node is not high, the resource management component can inform the topology management component that the topology needs to be adjusted and a processing thread needs to be added, so that the topology management component returns a thread number or process number to the resource management component. When the resource management component learns that the platform load is unbalanced, the resource management component can inform the state management component that the computing device corresponding to the node needs to be adjusted, so that the state management component returns the deployment device ID to the resource management component. When a new computing node is added, the topology management component also needs to inform the state management component of the information of the newly added computing node, so that the state management component records this information. The state management component can feed back to the topology management component a confirmation that the attribute information of the newly added computing node has been recorded.
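  • the exchange between the three components could be sketched as follows; the class and method names are illustrative assumptions rather than interfaces defined by the present application:

```python
class TopologyManager:
    def add_processing_thread(self, path_id):
        # ... create the thread and any new computing nodes on this path ...
        return f"thread-{path_id}-1"        # thread/process number for the resource manager

class StateManager:
    def reassign_device(self, node_id, target_device):
        # ... record that the node's computation now runs on the target device ...
        return target_device                # deployment device ID for the resource manager

class ResourceManager:
    def __init__(self, topology_manager, state_manager):
        self.topology_manager = topology_manager
        self.state_manager = state_manager

    def on_low_utilization(self, path_id):
        # Ask the topology manager to adjust the topology and add a processing thread.
        thread_id = self.topology_manager.add_processing_thread(path_id)
        return thread_id                    # added to the monitoring/analysis scope

    def on_load_imbalance(self, node_id, idle_device):
        # Ask the state manager to adjust the computing device corresponding to the node.
        return self.state_manager.reassign_device(node_id, idle_device)
```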
  • the computing system initializes a session.
  • the initialization includes initialization of a computing topology, and initialization of components (including: a topology management component, a resource management component, and a state management component).
  • the resource management component can start the device status monitoring process and collect, in real time, resource information of the various devices on the hardware platform, such as processor utilization, memory occupancy rate, memory bandwidth utilization, network bandwidth utilization, disk I/O rate, thread wait time, and so on.
  • the computing system performs data processing based on the computing topology, and at the same time, the resource management component synchronously analyzes the status information of each hardware device fed back by the device status monitoring process.
  • if the resource management component detects that the resource utilization of certain computing nodes in the computing topology, or of the path where a computing node is located, is not high, it can notify the topology management component to increase the number of threads or processes on that path, that is, to dynamically add a path on the basis of the original computing topology, where the path is composed of multiple nodes and edges. A node represents a computing operation, and an edge represents the flow direction of the processed data, which indicates the data processing flow.
  • the topology management component can feed back the newly created thread ID or process ID to the resource management component, so that the resource management component can update its monitoring and analysis scope.
  • the topology management component notifies the state management component of the new node.
  • the state management component creates the attribute information of the new node based on the new node, and returns the creation confirmation information of the attribute information of the new computing node to the topology management component.
  • FIG. 5 is a schematic diagram illustrating an example of a computing topology in which computing resources are adjusted according to an embodiment of the present application.
  • differently shaped boxes in the computing topology represent different processors (including processor A, processor B, processor C, and processor D), and the numbers in the boxes (1, 2, 3, ..., 10) represent different computing operations (which may be called operators or computing nodes).
  • the upper graph in Figure 5 is the calculation topology before adjustment.
  • when the resource management module learns that the resource utilization of processors A and B is not high, the computing topology is adjusted to obtain the lower computing topology in FIG. 5.
  • the adjusted calculation topology adds calculation branches of operators 1, 2, 3, 4, 5, and 7, that is, adds parallel processing threads and improves the calculation throughput of data processing.
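  • the adjustment in FIG. 5 can be pictured as duplicating a sub-path of the computing topology; the adjacency-list representation and the join operator below are assumptions used only to illustrate the idea:

```python
def add_parallel_branch(topology, branch_ops, join_op):
    """Duplicate a chain of operators (e.g. [1, 2, 3, 4, 5, 7]) as a parallel path.

    topology: dict mapping an operator to the list of its downstream operators.
    join_op:  operator at which the duplicated branch rejoins the original graph.
    """
    copies = [f"{op}b" for op in branch_ops]              # e.g. "1b", "2b", ...
    for op, nxt in zip(copies, copies[1:]):
        topology.setdefault(op, []).append(nxt)           # chain the duplicated operators
    topology.setdefault(copies[-1], []).append(join_op)   # rejoin the original topology
    return topology

# Example (hypothetical operator numbering): duplicate operators 1-5 and 7,
# rejoining the original graph at operator 8.
# add_parallel_branch(topology, [1, 2, 3, 4, 5, 7], join_op=8)
```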
  • if the resource management module detects that the resource utilization of some computing nodes in the computing topology (that is, bottleneck computing nodes and idle computing nodes) is unbalanced, for example, data is blocked and the load is unbalanced, it notifies the state management module of the information of the bottleneck computing nodes and the information of the idle computing nodes.
  • the state management module deploys the computation of the bottleneck computing nodes to the devices of the idle nodes based on the information of the bottleneck computing nodes and the information of the idle computing nodes, thereby achieving load balancing.
  • the state management module then needs to modify the information of the bottleneck computing nodes and the information of the idle computing nodes, and feed back the changed node attribute information to the resource management module.
  • the resource management module updates the correspondence between the hardware device and the computing node based on the changed node attribute information, and updates the monitoring analysis.
  • FIG. 6 is a schematic diagram of another example of a computing topology to which the embodiments of the present application are applied.
  • differently shaped boxes in the computing topology represent different processors, and the numbers in the boxes represent different computing operations (which may be referred to as operators or nodes or computing nodes).
  • the upper graph in Figure 6 is the calculation topology before adjustment.
  • when the resource management module learns that the resource utilization of processor A and processor B is unbalanced, it adjusts the computing topology to obtain the lower computing topology in FIG. 6.
  • the adjusted computing topology transfers the computing load on node 5 (a node corresponding to processor A), where the bottleneck occurred, to the idle computing node 6 (a node corresponding to processor B), thereby achieving load balancing.
  • FIG. 5 and FIG. 6 are merely for the convenience of those skilled in the art to understand the embodiments of the present application, and the embodiments of the present application are not intended to limit the embodiments to the specific scenarios illustrated. Those skilled in the art can obviously make various equivalent modifications or changes according to the examples of FIGS. 5 and 6, and such modifications or changes also fall within the scope of the embodiments of the present application.
  • it should be understood that the size of the sequence numbers of the above processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the method for managing a computing resource applied to a data preprocessing stage in a neural network is described in detail above with reference to FIGS. 1 to 6.
  • the following describes a management device for computing resources applied to a data preprocessing stage in a neural network according to an embodiment of the present application with reference to FIG. 7 and FIG. 8. It should be understood that the technical features described in the method embodiments are also applicable to the following device embodiments.
  • FIG. 7 shows a schematic block diagram of a computing resource management device 700 applied to a data preprocessing stage in a neural network according to an embodiment of the present application.
  • the apparatus 700 is configured to execute the method embodiment described above.
  • the specific form of the apparatus 700 may be a software component and / or hardware.
  • the apparatus 700 may be a processor or a chip in a processor.
  • the computing resource includes multiple heterogeneous computing nodes, and the apparatus 700 includes:
  • a monitoring module 710 configured to separately monitor resource usage information of the plurality of computing nodes
  • a generating module 720 configured to generate resource adjustment information corresponding to a node to be adjusted among the multiple computing nodes according to the resource usage information and based on a preset resource scheduling policy;
  • a processing module 730 is configured to dynamically adjust a computing resource of the node to be adjusted according to the resource adjustment information.
  • the resource scheduling strategy is set according to a computational throughput requirement of the data preprocessing stage of a model training phase of the neural network.
  • the resource scheduling policy includes at least one of a load balancing policy or a resource utilization policy.
  • the resource scheduling policy includes a load balancing policy. If a load imbalance exists among the nodes to be adjusted, the resource adjustment information includes topology information for adjusting the nodes to be adjusted in the computing topology;
  • the processing module 730 is configured to dynamically adjust the computing resources according to the resource adjustment information, and specifically includes:
  • adjusting, according to the topology information, the topology position of the nodes to be adjusted in the computing topology, and migrating the computing load on a bottleneck computing node among the nodes to be adjusted to an idle computing node.
  • the processing module 730 is further configured to: modify the information of the computing device corresponding to the idle computing node and the information of the computing device corresponding to the bottleneck computing node.
  • the resource scheduling policy further includes a resource utilization policy. If the resource utilization of the first computing node in the node to be adjusted is lower than a first resource utilization threshold, the resource The adjustment information includes information of a processing thread that needs to be added or information of a process that needs to be added;
  • the processing module is configured to dynamically adjust the computing resources according to the resource adjustment information, and specifically includes: adding, according to the information about the processing threads to be added, a processing thread or a processing process on the path where the first computing node is located, where the added processing thread or processing process includes one or more newly added computing nodes.
  • the processing module 730 is further configured to: record attribute information of the one or more newly added computing nodes;
  • the attribute information includes one or more of the following information: a computing device corresponding to the computing node, a computing type of the computing node, and an execution probability of the computing node;
  • the monitoring module 710 is further configured to monitor the one or more newly added computing nodes.
  • the device 700 may be used to execute the method of the foregoing method embodiment, for example, the method in FIG. 3, and the above and other management operations and/or functions of the modules in the device 700 are respectively intended to implement the corresponding steps of the method in the foregoing method embodiment and can also achieve the beneficial effects of the foregoing method embodiment. For brevity, details are not described here.
  • each module in the apparatus 700 may be implemented in the form of software and / or hardware, which is not specifically limited.
  • the device 700 is presented in the form of a functional module.
  • the "module" herein may refer to an application-specific integrated circuit (ASIC), a circuit, a processor and a memory that execute one or more software or firmware programs, an integrated logic circuit, and/or another device that can provide the above functions.
  • the apparatus 700 may take the form shown in FIG. 8.
  • the monitoring module 710, the generating module 720, and the processing module 730 may be implemented by the processor 801 shown in FIG. 8; specifically, they are implemented by the processor executing a computer program stored in a memory.
  • the sending and receiving functions and / or implementation processes involved in the device 700 may also be implemented through pins or interface circuits.
  • the memory may be a storage unit in the chip, such as a register or a cache, and the storage unit may also be a storage unit located outside the chip in the computer device, such as the memory 802 shown in FIG. 8.
  • FIG. 8 shows a schematic structural diagram of a computing resource management device 800 applied to a data preprocessing stage in a neural network according to an embodiment of the present application.
  • the apparatus 800 includes: a processor 801.
  • the processor 801 is configured to perform the following actions: separately monitor resource usage information of the plurality of computing nodes; and generate information based on the resource usage information and based on a preset resource scheduling policy. Resource adjustment information corresponding to the node to be adjusted among the plurality of computing nodes; and dynamically adjusting the computing resource of the node to be adjusted according to the resource adjustment information.
  • the processor 801 may call an interface to perform related sending and receiving or communication actions, and the called interface may be a logical interface or a physical interface, which is not limited thereto.
  • the physical interface may be implemented through a transceiver circuit.
  • the apparatus 800 further includes an interface 803.
  • the apparatus 800 further includes a memory 802, and the program code in the foregoing method embodiment may be stored in the memory 802, so as to be called by the processor 801.
  • when the device 800 includes a processor 801, a memory 802, and an interface 803, the processor 801, the memory 802, and the interface 803 communicate with each other through an internal connection path to transfer control and/or data signals.
  • the processor 801, the memory 802, and the interface 803 may be implemented by a chip; they may be implemented in the same chip, or may be implemented in different chips respectively, or any two of them may be combined in one chip.
  • the memory 802 may store program code, and the processor 801 calls the program code stored in the memory 802 to implement a corresponding function of the device 800.
  • the device 800 may also be used to perform other steps and / or operations of the method in the foregoing embodiments. For brevity, details are not described herein.
  • the present application also provides a computing system for a neural network.
  • the computing system includes one or more computing nodes and a management device for computing resources (such as the device 700 described above) applied to a data preprocessing stage in the neural network.
  • the management device is configured to manage the one or more computing nodes.
  • the management device may be deployed on a computing device corresponding to any one of the computing nodes, or may be independently deployed, which is not limited.
  • the methods disclosed in the embodiments of the present application may be applied to a processor, or implemented by a processor.
  • the processor may be an integrated circuit chip with signal processing capabilities.
  • each step of the foregoing method embodiment may be completed by using an integrated logic circuit of hardware in a processor or an instruction in a form of software.
  • the above processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component; it may also be a system on chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processor (DSP), a microcontroller unit (MCU), a programmable logic device (PLD), or another integrated chip.
  • Various methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in combination with the embodiments of the present application may be directly implemented by a hardware decoding processor, or may be performed by using a combination of hardware and software modules in the decoding processor.
  • the software module may be located in a mature storage medium such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, or an electrically erasable programmable memory, a register, and the like.
  • the storage medium is located in a memory, and the processor reads the information in the memory and completes the steps of the foregoing method in combination with its hardware.
  • the memory in the embodiment of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • the volatile memory may be a random access memory (RAM), which is used as an external cache.
  • by way of example and not limitation, many forms of RAM are available, such as dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced SDRAM, synchronous connection dynamic random access memory (SLDRAM), and direct RAMbus RAM.
  • Computer-readable media may include, but are not limited to: magnetic storage devices (for example, hard disks, floppy disks, or magnetic tapes), optical discs (for example, compact discs (CDs) and digital versatile discs (DVDs)), smart cards, and flash memory devices (for example, erasable programmable read-only memory (EPROM), cards, sticks, or key drives).
  • various storage media described herein may represent one or more devices and / or other machine-readable media used to store information.
  • machine-readable medium may include, but is not limited to, wireless channels and various other media capable of storing, containing, and / or carrying instruction (s) and / or data.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.
  • if the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of this application, in essence, or the part contributing to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the method described in the embodiments of the present application.
  • the aforementioned storage media include: USB flash drives, removable hard disks, read-only memories (ROMs), random access memories (RAMs), magnetic disks, optical discs, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A method and apparatus for managing computing resources applied to the data preprocessing stage of a neural network, where the computing resources include multiple heterogeneous computing nodes. The method includes: separately monitoring resource usage information of the multiple computing nodes (S310); generating, according to the resource usage information and based on a preset resource scheduling policy, resource adjustment information corresponding to nodes to be adjusted among the multiple computing nodes (S320); and dynamically adjusting the computing resources of the nodes to be adjusted according to the resource adjustment information (S330). The method can improve the utilization of computing resources and helps reduce the training time of neural network models.

Description

Method and Apparatus for Managing Computing Resources in the Data Preprocessing Stage of a Neural Network
Technical Field
The present application relates to the field of computing technology, and more particularly, to a method and apparatus for managing computing resources applied to the data preprocessing stage in a neural network.
Background
Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, and so on.
The application of AI technology has penetrated all walks of life. The ability to train and rapidly deploy neural network models has become a core competence of technology companies, and how to improve the speed of network model training has become a research hotspot that attracts wide attention. The neural network training process can be roughly divided into two stages, namely the data preprocessing stage and the network training stage. The speed of network training largely depends on the computing acceleration capability of dedicated hardware, such as the powerful computing capability of GPUs; most research has focused on the miniaturization of network structures, quantization and pruning, operator fusion acceleration, and the like, while the speed of data preprocessing is often neglected. As a result, in many scenarios one round of GPU iteration has finished while the next round of data is not yet ready, so the GPU can only wait, which greatly reduces the overall computing efficiency.
At present, the computation of data preprocessing in the prior art is still deployed on the CPU, so that performance bottlenecks still occur in many training scenarios, for example, when the network is not large but the amount of data is large. Therefore, there is an urgent need for a method to improve the efficiency of data preprocessing.
Summary
In view of this, the present application provides a method and apparatus for managing computing resources applied to the data preprocessing stage in a neural network, which can dynamically adjust computing resources and achieve load balancing, so that computing resources are used more reasonably.
According to a first aspect, a method is provided for managing computing resources applied to the data preprocessing stage in a neural network, where the computing resources include multiple heterogeneous computing nodes. The method includes: separately monitoring resource usage information of the multiple computing nodes, where the resource usage information indicates the resource usage on each computing node; generating, according to the resource usage information and based on a preset resource scheduling policy, resource adjustment information corresponding to nodes to be adjusted among the multiple computing nodes; and dynamically adjusting the computing resources of the nodes to be adjusted according to the resource adjustment information. In this way, the data preprocessing computation can be adaptively and dynamically adjusted according to the size of the neural network and the actual data throughput requirements, and the computing deployment can be optimized, which improves the computational throughput of data preprocessing and thereby the overall training efficiency of the neural network.
Optionally, the resource scheduling policy is set according to the computational throughput requirement that the model training stage of the neural network imposes on the data preprocessing stage.
Optionally, the nodes to be adjusted may be some of the computing nodes determined from the multiple heterogeneous computing nodes, and may be one or more computing nodes.
In a possible implementation, the resource scheduling policy includes at least one of a load balancing policy or a resource utilization policy.
Optionally, the resource scheduling policy may include only a load balancing policy; or only a resource utilization policy; or both a load balancing policy and a resource utilization policy.
In a possible implementation, the resource scheduling policy includes a load balancing policy. If the load on the nodes to be adjusted is unbalanced, the resource adjustment information includes topology information for adjusting the nodes to be adjusted in the computing topology. Dynamically adjusting the computing resources according to the resource adjustment information includes: adjusting, according to the topology information, the topology position of the nodes to be adjusted in the computing topology, and migrating the computing load on a bottleneck computing node among the nodes to be adjusted to an idle computing node.
Here, when the computing resources are used to carry out the data preprocessing stage, load balancing of the computing resources is a basic requirement; if load balancing is achieved, the model training stage can obtain the data preprocessed in the data preprocessing stage on time.
Optionally, the method further includes: modifying the information of the computing device corresponding to the idle computing node and the information of the computing device corresponding to the bottleneck computing node. Here, after the computing load on the bottleneck computing node is migrated to the idle computing node, the information of the computing device corresponding to each computing node and the information of the computing device corresponding to the nodes to be adjusted can also be updated in time for subsequent monitoring.
Optionally, the resource scheduling policy further includes a resource utilization policy. If the resource utilization of a first computing node among the nodes to be adjusted is lower than a first resource utilization threshold, the resource adjustment information includes information about a processing thread to be added or information about a process to be added. Dynamically adjusting the computing resources according to the resource adjustment information includes: adding, according to the information about the processing thread to be added, a processing thread or a processing process on the path where the first computing node is located, where the added processing thread or processing process includes one or more newly added computing nodes.
Here, if load balancing of the computing resources has already been achieved but the resource utilization of the computing nodes can be further improved, the resource utilization policy can be used to further optimize the deployment of computing resources, thereby further improving the overall training efficiency.
In a possible implementation, the method further includes: recording attribute information of the one or more newly added computing nodes, where the attribute information includes one or more of the following: the computing device corresponding to the computing node, the computation type of the computing node, and the execution probability of the computing node; and
monitoring the one or more newly added computing nodes.
According to a second aspect, an apparatus is provided for managing computing resources applied to the data preprocessing stage in a neural network, where the apparatus includes modules for performing the method according to the first aspect or any possible implementation of the first aspect.
According to a third aspect, a computer-readable storage medium is provided, which stores a program that causes a computer to execute the method for managing computing resources in the data preprocessing stage in a neural network according to the first aspect or any of its implementations.
According to a fourth aspect, a computer program product containing instructions is provided, which, when run on a computer, causes the computer to execute the method for managing computing resources in the data preprocessing stage in a neural network according to the first aspect.
According to a fifth aspect, an apparatus is provided for managing computing resources in the data preprocessing stage in a neural network, where the apparatus includes a processor, a memory, and an interface. The processor is connected to the memory and the interface (or interface circuit). The memory is used to store instructions, the processor is used to execute the instructions, and the transceiver is used to communicate with other computing nodes under the control of the processor. When the processor executes the instructions stored in the memory, the execution causes the processor to perform the method for managing computing resources in the data preprocessing stage in a neural network according to the first aspect.
Brief Description of the Drawings
FIG. 1 is an example diagram of an application architecture according to an embodiment of the present application.
FIG. 2 is a schematic diagram of an example of the calculation process of neural network model training.
FIG. 3 is a schematic flowchart of a method for managing computing resources applied to the data preprocessing stage in a neural network according to an embodiment of the present application.
FIG. 4 is a schematic diagram of an example of a specific implementation of the method for managing computing resources applied to the data preprocessing stage in a neural network according to an embodiment of the present application.
FIG. 5 is a schematic diagram of an example of adjusting computing resources to which an embodiment of the present application is applied.
FIG. 6 is a schematic diagram of another example of adjusting computing resources to which an embodiment of the present application is applied.
FIG. 7 is a schematic block diagram of an apparatus for managing computing resources applied to the data preprocessing stage in a neural network according to an embodiment of the present application.
FIG. 8 is a schematic structural block diagram of an apparatus for managing computing resources applied to the data preprocessing stage in a neural network according to an embodiment of the present application.
Detailed Description
The technical solutions in the present application are described below with reference to the accompanying drawings.
In the description of the embodiments of the present application, unless otherwise stated, "multiple" means two or more. In addition, "at least one" may be interpreted as "one or more", and "at least one item" may be interpreted as "one or more items".
The technical solutions of the embodiments of the present application can be applied to various model training scenarios and learning frameworks in the fields of machine learning, deep learning, and the like, for example, the TensorFlow framework based on Google's open-source software library and neural network training models.
It should be understood that the technical solutions of the embodiments of the present application are applicable not only to homogeneous hardware resources on a single machine, but also to distributed cluster systems and to large-scale computing platforms in which each node of the distributed cluster system contains many heterogeneous computing units. On such a large-scale and complex hardware platform, handling the data preprocessing of neural network model training well and matching the preprocessing speed to the network computation speed require an effective scheme for dynamically deploying computing resources.
FIG. 1 is a schematic diagram of an architecture of heterogeneous computing for data preprocessing to which an embodiment of the present application is applied. FIG. 1 uses image data as an example. As shown in FIG. 1, the architecture may include multiple computing units (FIG. 1 is described using two computing units as an example), and each computing unit may include an encoder/decoder, a central processing unit (CPU), an advanced RISC machine (ARM) processor, a digital signal processor (DSP), a graphics processing unit (GPU), a neural-network processing unit (NPU), and so on.
The CPU, ARM, DSP, GPU, and NPU can support different operations.
It should be understood that FIG. 1 is used only as an example for description and does not limit the application architecture of the embodiments of the present application; for example, FIG. 1 may include more computing units, or each computing unit may include more types or numbers of processors, which is not specifically limited.
Model training (for example, model training in a neural network) is a cross-device, multi-stage process, and the computational efficiency of any stage affects the overall training speed. FIG. 2 shows a schematic diagram of the calculation process in model training. As shown in FIG. 2, generally speaking, one training iteration can be divided into two parts: a data preprocessing stage and a model training stage (also called a training calculation stage). The data preprocessing stage includes four stages: data read-in/output (Data I/O), decoding (Decoding), pre-processing (Pre-Process), and augmentation (Augmentation). The Data I/O stage includes fetching data from local storage, a network file system, a distributed file system, or the like into a pipeline for processing; the Decoding stage decodes compressed data; the Pre-Process stage performs necessary preprocessing on the data, such as cropping; and the Augmentation stage applies augmentation operations to the data to enrich the diversity of the sample data. The model training stage means that after the data preprocessing stage, the training calculation of the neural network starts.
Among these stages, the data preprocessing stage is most likely to affect the efficiency of model training. The calculation process of the data preprocessing stage can be implemented with computing resources, for example, deployed on a central processing unit (CPU) or a graphics processing unit (GPU). However, the computing deployment in the prior art is fixed, that is, calculations are statically deployed in advance or simply placed on the CPU, which is not flexible enough, makes it difficult to fully use computing resources to achieve high-throughput data preprocessing, and does not consider the data throughput requirements of different neural network models. For example, some lightweight network models compute quickly and require high data throughput, while some larger networks compute slowly and require lower data throughput. The present application proposes a method for managing computing resources applied to the data preprocessing stage of a neural network, which can adaptively and dynamically adjust the data preprocessing computation according to the size of the neural network and the actual data throughput requirements, optimize the computing deployment, flexibly adjust computing resources, and improve the computational throughput of data preprocessing, thereby improving the overall training efficiency of the neural network.
In the embodiments of the present application, the data preprocessing process can be implemented with computing resources. A computing resource can be understood as a computing device, and the computation can be implemented by multiple processors. The computing resources can perform one or more of the following computing functions: encoding, decoding, filtering, cropping, translation, rotation, contrast enhancement, color inversion, equalization, color gain, brightness, sharpness, cutout, and so on. The embodiments of the present application do not limit the type of processor; the processor may be any one or more of the following: a CPU, an advanced RISC machine (ARM) processor, a digital signal processor (DSP), a GPU, or another device with computing or data processing capabilities.
FIG. 3 is a schematic flowchart of a method 300 for managing computing resources in the data preprocessing stage in a neural network according to an embodiment of the present application. The computing resources include multiple heterogeneous computing nodes, and the method 300 includes:
S310: Separately monitor resource usage information of the multiple computing nodes. The multiple computing nodes form a computing topology, and each computing node can be used to process data in the data preprocessing process of the neural network.
The resource usage information indicates the resource usage on each computing node. After the resource usage is obtained, it can be known which computing nodes have low resource utilization, which computing nodes have bottlenecks, which computing nodes are idle, and so on, and the resource adjustment information corresponding to the nodes to be adjusted can then be obtained based on this information.
For example, a device status monitoring process can be used to detect the resource usage of each node.
Optionally, each computing node can be understood as a logical node, and one computing node or multiple computing nodes may correspond to one computing device (for example, a processor). The resource usage of each computing node can also be understood as the resource usage of the computing device corresponding to the computing node. Alternatively, each computing node can be understood as a physical node, in which case the resource usage of a computing node can be understood as the resource usage of a computing device.
Optionally, the resource usage of a computing node or a computing device may include one or more of the following factors: processor utilization, memory occupancy rate, memory bandwidth utilization, network bandwidth utilization, disk I/O rate, thread wait time, and so on.
S320: Generate, according to the resource usage information and based on a preset resource scheduling policy, resource adjustment information corresponding to the nodes to be adjusted among the multiple computing nodes.
Optionally, the resource usage information may indicate which computing nodes have low resource utilization, or may also indicate which computing nodes have excessively high resource utilization. In this way, the computing nodes to be adjusted can be determined according to the resource usage information. The nodes to be adjusted here may be some of the computing nodes determined from the multiple heterogeneous computing nodes, and may be one or more computing nodes, which is not limited. The embodiments of the present application do not specifically limit the number of nodes to be adjusted.
The resource adjustment information may include various information required for the computing resources, for example, a thread or process number, node attributes, and the processor corresponding to the node, which is not specifically limited.
Optionally, the resource scheduling policy is set according to the computational throughput requirement that the model training stage of the neural network imposes on the data preprocessing stage.
The model training stage of the neural network occurs after the data preprocessing stage. For a specific description of these two stages, refer to the foregoing description of FIG. 2. The "computational throughput requirement" means that, when the model training stage is carried out, the calculation results of the data preprocessing stage need to be obtained within a certain period of time. That is, the execution speed of the data preprocessing stage needs to match the requirements of the model training stage.
Optionally, the resource scheduling policy includes at least one of a load balancing policy or a resource utilization policy. That is, the resource scheduling policy may include only a load balancing policy; or only a resource utilization policy; or both a load balancing policy and a resource utilization policy. Specifically, when the computing resources are used to carry out the data preprocessing stage, load balancing of the computing resources is a basic requirement; if load balancing is achieved, the model training stage can obtain the data preprocessed in the data preprocessing stage on time. For the case where the resource scheduling policy includes both a load balancing policy and a resource utilization policy: further, if load balancing of the computing resources has already been achieved but the resource utilization of the computing nodes can be further improved, the resource utilization policy can be used to further optimize the deployment of computing resources, thereby further improving the overall training efficiency.
S330,根据所述资源调整信息,对所述待调整节点的计算资源进行动态调整。
可选地,所述调整包括以下处理中的一项或多项:调整所述待调整节点的处理线程数目,调整所述待调整节点的处理进程数目,调整所述待调整节点在所述计算拓扑中的拓扑位置。
在本申请实施例中,通过监测每个计算节点的资源使用信息,并根据所述资源使用信息以及预设的资源调度策略,生成多个计算节点中的待调整节点的资源调整信息,最后根据所述资源调整信息,对所述待调整节点的计算资源进行动态调整处理,能够动态地调整数据预处理过程中的计算资源,实现了负载均衡,增加了计算并行度,最大程度的利用了计算资源,从而加快了神经网络中数据预处理的处理速度(加快了数据I/O、解码、预处理、增强操作的计算吞吐),减少了网络模型的训练时间。
可选地，所述资源调度策略包括负载均衡策略，若所述待调整节点上的负载不均衡，所述资源调整信息包括用于调整所述待调整节点在计算拓扑中的拓扑信息。也就是说，如果采用的是负载均衡策略，且监测到待调整节点上的负载不均衡，那么步骤S320中生成的资源调整信息中就会包括用于调整所述待调整节点在计算拓扑中的拓扑信息。
其中,S330包括:
根据所述拓扑信息,对所述待调整节点在所述计算拓扑中的拓扑位置进行调整,并将所述待调整节点中瓶颈计算节点上的计算负载迁移到空闲计算节点。
具体地，如果待调整节点中存在瓶颈计算节点和空闲计算节点，即发生了负载不均衡的情形，那么可以将所述待调整节点中瓶颈计算节点上的计算负载迁移或转移到空闲计算节点上。可选地，判断计算节点上的资源利用率的高低可以通过第二资源利用门限来实现，比如：如果存在计算节点上的资源利用率高于或等于第二资源利用门限，则认为该计算节点上的资源利用率过高，可以认为是瓶颈计算节点；如果存在计算节点上的资源利用率小于第二资源利用门限，则认为该计算节点上的资源利用率过低，可以认为是空闲计算节点。应理解，这里只是以第二资源利用门限为例进行描述，并不对本申请实施例构成限定。其中，资源调整信息中可以包括所述待调整节点在所述计算拓扑中调整后的拓扑信息。这里，所述待调整节点在所述计算拓扑中调整后的拓扑信息，可以包括瓶颈计算节点调整后的拓扑位置，也可以包括空闲计算节点调整后的拓扑位置。可选地，可以选择将瓶颈计算节点上的计算负载迁移或转移到与瓶颈计算节点相邻或相近的空闲计算节点上，以降低迁移复杂度。
应理解,上述第二资源利用门限可以基于实际需求确定,本申请实施例对第二资源利用门限的具体取值不作限定。
可选地,所述方法300还包括:
修改所述空闲计算节点对应的计算设备的信息以及所述瓶颈计算节点对应的计算设备的信息。
比如，计算设备的信息可以包括计算设备的属性、计算设备对应的节点的信息等等，对此不作限定。
也就是说,在将瓶颈计算节点上的计算负载迁移到空闲计算节点后,还需要及时更改计算节点对应的计算设备的信息以及待调整节点对应的计算设备的信息,以便于后续监测使用。
进一步地,所述资源调度策略还可以包括资源利用率策略。这里,使用负载均衡策略调整了计算资源后,如果存在一些计算节点上的资源利用率还有进一步提升的空间,那么可以使用资源利用率策略进一步对计算资源进行优化部署。也就是说,本申请实施例可以联合负载均衡策略和资源利用率策略一起,作为调整计算资源的判断条件。
示例性地，若所述待调整节点中存在第一计算节点的资源利用率低于第一资源利用门限，所述资源调整信息包括需要增加的处理线程的信息或者需要增加的进程的信息；
其中,S330,包括:
根据所述需要增加的处理线程的信息，对所述第一计算节点所在的路径增加处理线程或处理进程，其中，增加后的处理线程或处理进程中包括一个或多个新增计算节点。
具体地，如果第一计算节点上的资源利用率不高(也可以理解为第一计算节点对应的处理器的资源利用率不高)，那么可以对第一计算节点所在的路径增加处理线程或者处理进程。这里，判断第一计算节点上的资源利用率的高低可以通过第一资源利用门限来实现，具体即：如果第一计算节点上的资源利用率低于第一资源利用门限，则认为第一计算节点上的资源利用率不高。其中，资源调整信息中包括需要增加的处理线程的信息或需要增加的进程的信息。这里，对第一计算节点所在的路径增加的处理线程或者处理进程中，可以包括一个或多个新增的计算节点，以提高待调整节点上的计算负载。
应理解,这里的第一资源利用门限可以基于实际需求确定,本申请实施例对第一资源利用门限的具体取值不作限定。
还应理解,本申请实施例对上述第一资源利用门限和上述第二资源利用门限的关系不作限定,可以相同,也可以不同,对此不作限定。
还应理解,上述关于资源利用率策略的实施例可以单独实施,而不依赖于负载均衡策略,本申请实施例对此不作限定。
可选地,在增加上述一个或多个新增的计算节点后,还可以对计算拓扑进行相应的更新。
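作为示意，下面的草图给出在第一计算节点的资源利用率偏低时，为其所在路径增加处理线程的一种可能做法。线程池接口来自Python标准库concurrent.futures，其余的类名以及简化的线程池重建方式均为示例性假设，实际实现还需要同步更新计算拓扑并上报新增的线程号：

```python
from concurrent.futures import ThreadPoolExecutor

class PathWorkers:
    # 管理某条预处理路径(由若干计算节点/算子组成)上的处理线程的简化示意
    def __init__(self, pipeline_fn, num_workers=2):
        self.pipeline_fn = pipeline_fn          # 该路径上的算子处理流程
        self.num_workers = num_workers
        self.pool = ThreadPoolExecutor(max_workers=num_workers)

    def scale_out(self, extra=1):
        # 资源利用率低于第一资源利用门限时增加处理线程(简化做法：重建线程池)
        self.num_workers += extra
        old_pool = self.pool
        self.pool = ThreadPoolExecutor(max_workers=self.num_workers)
        old_pool.shutdown(wait=False)
        return self.num_workers                 # 供资源管理组件更新监控分析范围

    def submit(self, sample):
        return self.pool.submit(self.pipeline_fn, sample)
```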
可选地,所述方法300还包括:
记录所述一个或多个新增计算节点的属性信息,所述属性信息包括以下信息中的一项或多项:计算节点对应的计算设备,计算节点的计算类型,计算节点的执行概率;
对所述一个或多个新增计算节点进行监测。
也就是说,在新增一个或多个计算节点后,还需要记录这些计算节点的属性信息,其中,属性信息可以包括以下信息中的一项或多项:新增计算节点对应的计算设备(或处理器)、新增计算节点的计算类型、新增计算节点后续用于执行神经网络数据预处理过程的概率,以便于对这些新增的计算节点进行检测,得到这些新增计算节点上的后续的资源使用情形,从而便于后续调整计算资源时使用。
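下面的草图给出新增计算节点属性信息的一种可能的记录方式，其中的字段名与类名均为示例性假设：

```python
from dataclasses import dataclass

@dataclass
class NodeAttr:
    node_id: int
    device: str       # 计算节点对应的计算设备，如"CPU0"、"GPU1"
    op_type: str      # 计算节点的计算类型，如"decode"、"augment"
    exec_prob: float  # 计算节点的执行概率(后续用于数据预处理过程的概率)

class StateManager:
    # 状态管理组件的简化示意：记录、更新并查询节点属性信息
    def __init__(self):
        self._attrs = {}

    def register(self, attr: NodeAttr):
        self._attrs[attr.node_id] = attr
        return attr.node_id                    # 作为记录确认返回，便于后续监测使用

    def update_device(self, node_id: int, device: str):
        self._attrs[node_id].device = device   # 负载迁移后更新节点对应的计算设备
```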
应理解，本申请实施例的管理计算资源的方法不限于神经网络数据预处理的过程，还可以应用于其他有数据预处理需求的训练场景，比如，模糊模型场景、支持向量机训练模型场景，对此不作限定。可选地，本申请实施例适用于任何异构硬件平台(包括分布式系统、嵌入式系统、标准PC服务器(Server)等)上的自动化计算部署任务。
为了便于本领域技术人员理解,下面结合具体的例子进行描述。
图4示出了根据本申请实施例的应用于神经网络中数据预处理阶段的计算资源的管理方法的一个具体实现的例子的示意图。如图4所示,本申请实施例的应用于神经网络中数据预处理阶段的计算资源的管理方法可以通过以下组件实现:拓扑管理组件(或称作拓扑管理器),资源管理组件(或称作资源管理器)和状态管理组件(或称作状态管理器)。
其中,拓扑管理组件用于管理计算拓扑,具体用于执行:创建动态拓扑图以及处理流程,管理线程池,实现并行计算,充分利用设备计算资源。
资源管理组件用于管理计算资源的部署,具体用于:监控并管理计算、内存、线程等资源,分析各个计算设备的资源利用率,并根据关于计算设备的反馈信息调整节点的计算部署以及计算拓扑。资源管理组件可以通过设备状态监控进程对节点或计算设备进行监控,具体可以监控以下信息:处理器利用率、内存占用率、内存带宽利用率、网络带宽利用率、磁盘I/O速率、线程等待时间等。
状态管理组件用于记录计算节点的属性信息，属性信息包括计算设备、算子计算类型、执行概率等参数。可选地，当数据的计算过程需要跨平台重新部署时，状态管理组件可以将节点的属性信息传输给资源管理组件。
在图4中,在资源管理组件得知某一计算节点的计算资源利用率不高时,资源管理组件可以告知拓扑管理组件需要调整拓扑,增加处理线程,使得拓扑管理组件向资源管理组件返回线程号或进程号;在资源管理组件得知平台负载不均衡时,资源管理组件可以告知状态管理组件需要调整节点对应的计算设备,使得状态管理组件向资源管理组件返回部署设备ID。其中,在新增计算节点时,拓扑管理组件还需要告知状态管理组件新增计算节点的信息,使得状态管理组件记录新增计算节点的信息。状态管理组件可以将新增的计算节点属性信息的记录确认反馈给拓扑管理组件。
具体来说,计算系统初始化一个会话(session),初始化包括计算拓扑的初始化,组件(包括:拓扑管理组件,资源管理组件和状态管理组件)的初始化。资源管理组件可以启动设备状态监控进程,实时搜集硬件平台的各种设备的资源信息,比如,处理器利用率,内存占用率,内存带宽利用率,网络带宽利用率,磁盘I/O速率,线程等待时间等等。计算系统基于计算拓扑进行数据处理,同时,资源管理组件同步分析设备状态监控进程反馈的各个硬件设备的状态信息。
例如,如果资源管理组件监测到计算拓扑中的某些计算节点或者计算节点所在的路径的资源利用率不高,可以通知拓扑管理组件增加该路径上计算的线程数目或者进程数目,即在原有计算拓扑的基础上动态增加一条路径,该路径由多个节点和边组成。其中,一个节点表示一个计算操作,边表示处理数据的流向,用于表明数据的处理流程。拓扑管理组件可以将新创建的线程ID或者进程ID反馈给资源管理组件,使得资源管理组件可以更新监控分析范围。另外,拓扑管理组件还将新增节点通知给状态管理组件。状态管理组件基于新增节点创建新增节点的属性信息,并将新增计算节点的属性信息的创建确认信息返回给拓扑管理组件。
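下面以伪码形式给出资源管理组件、拓扑管理组件与状态管理组件处理一次监控反馈的简化交互草图。各对象的方法名均为示例性假设，并非实际接口，仅用于说明上述信息流向：

```python
def on_monitor_feedback(resource_mgr, topo_mgr, state_mgr, usage_by_node):
    # 资源管理组件分析设备状态监控进程反馈的资源使用信息
    decision = resource_mgr.analyze(usage_by_node)
    if decision is None:
        return

    if decision["type"] == "scale_out":
        # 资源利用率不高：请求拓扑管理组件在对应路径上增加处理线程/进程
        new_threads, new_nodes = topo_mgr.add_parallel_path(decision["nodes"])
        resource_mgr.track(new_threads)            # 更新监控分析范围(线程号/进程号)
        for node in new_nodes:
            ack = state_mgr.register(node)         # 记录新增计算节点的属性信息
            topo_mgr.confirm(ack)                  # 记录确认反馈给拓扑管理组件
    elif decision["type"] == "rebalance":
        # 负载不均衡：请求状态管理组件调整节点对应的计算设备
        device_id = state_mgr.redeploy(decision["from"], decision["to"])
        resource_mgr.update_mapping(device_id)     # 更新硬件设备与计算节点的对应关系
```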
图5给出了应用本申请实施例调整计算资源的计算拓扑的一个例子的示意图。如图5所示，该计算拓扑中的不同形状的框代表不同的处理器(包括处理器A、处理器B、处理器C和处理器D)，其中，框内的数字(包括1、2、3…10)代表不同的计算操作(可以称作算子或计算节点)。图5中上面的图是调整之前的计算拓扑。资源管理组件在得知处理器A和处理器B的资源利用率不高时，对计算拓扑进行调整，得到图5中下面的计算拓扑。调整后的计算拓扑相比于调整前的计算拓扑，增加了算子1、2、3、4、5、7的计算分支，即增加了并行处理线程，提高了数据处理的计算吞吐。
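以图5为例，下面的草图用邻接表表示计算拓扑，并示意性地为一条由若干算子组成的路径复制出一个并行分支。节点编号沿用图5中的示例，编号偏移量等细节为示例性假设，新分支入口与原路径读取同一数据源的处理在此省略：

```python
def add_parallel_branch(graph, path):
    # graph: {节点编号: [后继节点编号列表]}; path: 需要复制的算子路径，如[1, 2, 3, 4, 5, 7]
    offset = max(graph) + 100                 # 为新增计算节点生成不冲突的编号(示例做法)
    clones = {n: n + offset for n in path}

    for a, b in zip(path, path[1:]):          # 复制路径内部的边
        graph.setdefault(clones[a], []).append(clones[b])

    # 新分支的出口汇入原路径出口的后继节点，形成并行处理分支
    graph.setdefault(clones[path[-1]], []).extend(graph.get(path[-1], []))
    return clones                             # 新增节点的编号映射，供状态管理组件记录属性信息

# 用法示例(与图5类似的简化拓扑)
g = {1: [2], 2: [3], 3: [4], 4: [5], 5: [7], 7: [10], 10: []}
new_nodes = add_parallel_branch(g, [1, 2, 3, 4, 5, 7])
```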
又例如，如果资源管理组件监测到计算拓扑中的某些计算节点(即瓶颈计算节点和空闲计算节点)的资源利用率不均衡，比如，出现了数据量阻塞、负载不均衡的情形，则将瓶颈计算节点的信息和空闲计算节点的信息通知给状态管理组件。状态管理组件基于瓶颈计算节点的信息和空闲计算节点的信息，将瓶颈计算节点的计算部署到空闲计算节点对应的设备上，从而实现负载均衡。状态管理组件需要修改瓶颈计算节点的信息和空闲计算节点的信息，并将更改后的节点属性信息反馈给资源管理组件。资源管理组件基于更改后的节点属性信息，更新硬件设备和计算节点的对应关系，并更新监控分析。
举例来说，图6给出了应用本申请实施例的计算拓扑的另一个例子的示意图。如图6所示，该计算拓扑中的不同形状的框代表不同的处理器，其中，框内的数字代表不同的计算操作(可以称作算子或节点或计算节点)。图6中上面的图是调整之前的计算拓扑。资源管理组件在得知处理器A和处理器B的资源利用率不均衡时，对计算拓扑进行调整，得到图6中下面的计算拓扑。调整后的计算拓扑相比于调整前的计算拓扑，将出现瓶颈的节点5(处理器A对应的节点)上的计算负载转移到了空闲计算节点6(处理器B对应的节点)上，实现了负载均衡。
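与图6对应，负载迁移在实现上可以简化为修改节点与计算设备的对应关系。下面给出一个示意性草图，其中的设备名与节点编号沿用图6的示例，函数与数据结构均为示例性假设：

```python
def migrate(node_to_device, bottleneck_node, idle_node):
    # 将瓶颈计算节点上的计算迁移到空闲计算节点对应的设备上(示意)
    src = node_to_device[bottleneck_node]       # 例如节点5原本部署在处理器A上
    dst = node_to_device[idle_node]             # 例如节点6部署在处理器B上
    node_to_device[bottleneck_node] = dst       # 调整后节点5的计算在处理器B上执行
    return {"node": bottleneck_node, "from": src, "to": dst}  # 更改后的信息，反馈给资源管理组件

# 用法示例(与图6一致的假设场景)
mapping = {5: "处理器A", 6: "处理器B"}
print(migrate(mapping, bottleneck_node=5, idle_node=6))
```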
应理解,图5和图6中的例子仅仅是为了便于本领域技术人员理解本申请实施例,并非要将本申请实施例限于例示的具体场景。本领域技术人员根据图5和图6的例子,显然可以进行各种等价的修改或变化,这样的修改或变化也落入本申请实施例的范围内。
还应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
上文结合图1至图6详细描述了根据本申请实施例的应用于神经网络中数据预处理阶段的计算资源的管理方法。下面将结合图7和图8描述根据本申请实施例的应用于神经网络中数据预处理阶段的计算资源的管理装置。应理解,方法实施例所描述的技术特征同样适用于以下装置实施例。
图7示出了根据本申请实施例的应用于神经网络中数据预处理阶段的计算资源的管理装置700的示意性框图。所述装置700用于执行前文描述的方法实施例。可选地,所述装置700的具体形态可以是软件组件和/或硬件。可选地,所述装置700可以是处理器或处理器中的芯片。所述计算资源包括多个异构的计算节点,所述装置700包括:
监测模块710,用于分别监测所述多个计算节点的资源使用信息;
生成模块720,用于根据所述资源使用信息,并基于预设的资源调度策略,生成与所述多个计算节点中的待调整节点对应的资源调整信息;
处理模块730,用于根据所述资源调整信息,对所述待调整节点的计算资源进行动态调整。
在一种可选的实现方式中,所述资源调度策略是根据所述神经网络的模型训练阶段对所述数据预处理阶段的计算吞吐率需求设定的。
在一种可选的实现方式中,所述资源调度策略包括负载均衡策略或资源利用率策略中的至少一种。
在一种可选的实现方式中,所述资源调度策略包括负载均衡策略,若所述待调整节点中存在负载不均衡的情形,所述资源调整信息包括用于调整所述待调整节点在计算拓扑中的拓扑信息;
其中,所述处理模块730用于根据所述资源调整信息,对所述计算资源进行动态调整,具体包括:
根据所述拓扑信息,对所述待调整节点在所述计算拓扑中的拓扑位置进行调整,并将所述待调整节点中瓶颈计算节点上的计算负载迁移到空闲计算节点。
在一种可选的实现方式中,所述处理模块730还用于:
修改所述空闲计算节点对应的计算设备的信息以及所述瓶颈计算节点对应的计算设备的信息。
在一种可选的实现方式中，所述资源调度策略还包括资源利用率策略，若所述待调整节点中存在第一计算节点的资源利用率低于第一资源利用门限，所述资源调整信息包括需要增加的处理线程的信息或者需要增加的进程的信息；
其中,所述处理模块用于根据所述资源调整信息,对所述计算资源进行动态调整,具体包括:
根据所述需要增加的处理线程的信息，对所述第一计算节点所在的路径增加处理线程或处理进程，其中，增加后的处理线程或处理进程中包括一个或多个新增计算节点。
在一种可选的实现方式中,所述处理模块730还用于:
记录所述一个或多个新增计算节点的属性信息,所述属性信息包括以下信息中的一项或多项:计算节点对应的计算设备,计算节点的计算类型,计算节点的执行概率;
其中,所述监测模块710还用于:对所述一个或多个新增计算节点进行监测。
应理解,根据本申请实施例的装置700可用于执行前述方法实施例的方法,比如,图3中的方法,并且装置700中的各个模块的上述和其它管理操作和/或功能分别为了实现前述方法实施例的方法的相应步骤,因此也可以实现前述方法实施例中的有益效果,为了简洁,这里不作赘述。
还应理解，装置700中的各个模块可以通过软件和/或硬件形式实现，对此不作具体限定。换言之，装置700是以功能模块的形式来呈现。这里的“模块”可以指特定应用集成电路ASIC、电路、执行一个或多个软件或固件程序的处理器和存储器、集成逻辑电路，和/或其他可以提供上述功能的器件。可选地，在一个简单的实施例中，本领域的技术人员可以想到装置700可以采用图8所示的形式。监测模块710、生成模块720和处理模块730可以通过图8所示的处理器801实现。具体地，处理器通过执行存储器中存储的计算机程序来实现。可选地，当所述装置700是芯片时，装置700中涉及的收发的功能和/或实现过程还可以通过管脚或接口电路等来实现。可选地，所述存储器为所述芯片内的存储单元，比如寄存器、缓存等，所述存储单元还可以是所述计算机设备内的位于所述芯片外部的存储单元，如图8所示的存储器802。
图8示出了根据本申请实施例的应用于神经网络中数据预处理阶段的计算资源的管理装置800的示意性结构图。如图8所示,所述装置800包括:处理器801。
在一种可能的实现方式中,所述处理器801用于执行以下动作:分别监测所述多个计算节点的资源使用信息;根据所述资源使用信息,并基于预设的资源调度策略,生成与所述多个计算节点中的待调整节点对应的资源调整信息;根据所述资源调整信息,对所述待调整节点的计算资源进行动态调整。
应理解,所述处理器801可以调用接口执行相关的收发或通信动作,其中,调用的接口可以是逻辑接口或物理接口,对此不作限定。可选地,物理接口可以通过收发电路实现。可选地,所述装置800还包括接口803。
可选地,所述装置800还包括存储器802,存储器802中可以存储上述方法实施例中的程序代码,以便于处理器801调用。
具体地，若所述装置800包括处理器801、存储器802和接口803，则处理器801、存储器802和接口803之间通过内部连接通路互相通信，传递控制和/或数据信号。在一个可能的设计中，处理器801、存储器802和接口803可以通过芯片实现，处理器801、存储器802和接口803可以是在同一个芯片中实现，也可能分别在不同的芯片实现，或者其中任意两个功能组合在一个芯片中实现。该存储器802可以存储程序代码，处理器801调用存储器802存储的程序代码，以实现装置800的相应功能。
应理解,所述装置800还可用于执行前文实施例中方法的其他步骤和/或操作,为了简洁,这里不作赘述。
本申请还提供了一种神经网络的计算系统,所述计算系统包括一个或多个计算节点,以及应用于神经网络中数据预处理阶段的计算资源的管理装置(比如前文所述的装置700),所述管理装置用于管理所述一个或多个计算节点。可选地,所述管理装置可以部署在任意一个计算节点对应的计算设备上,或者,也可以独立部署,对此不作限定。
上述本申请实施例揭示的方法可以应用于处理器中,或者由处理器实现。处理器可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器可以是通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现成可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件,还可以是系统芯片(system on chip,SoC),还可以是中央处理器(central processor unit,CPU),还可以是网络处理器(network processor,NP),还可以是数字信号处理电路(digital signal processor,DSP),还可以是微控制器(micro controller unit,MCU),还可以是可编程控制器(programmable logic device,PLD)或其他集成芯片。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法的步骤。
可以理解,本申请实施例中的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(dynamic RAM,DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。应注意,本文描述的系统和方法的存储器旨在包括但不限于这些和任意其它适合类型的存储器。
本申请的各个方面或特征可以实现成方法、装置或使用标准编程和/或工程技术的制品。本申请中使用的术语“制品”涵盖可从任何计算机可读器件、载体或介质访问的计算机程序。例如,计算机可读介质可以包括,但不限于:磁存储器件(例如,硬盘、软盘或磁带等),光盘(例如,压缩盘(compact disc,CD)、数字通用盘(digital versatile disc, DVD)等),智能卡和闪存器件(例如,可擦写可编程只读存储器(erasable programmable read-only memory,EPROM)、卡、棒或钥匙驱动器等)。另外,本文描述的各种存储介质可代表用于存储信息的一个或多个设备和/或其它机器可读介质。术语“机器可读介质”可包括但不限于,无线信道和能够存储、包含和/或承载指令和/或数据的各种其它介质。
还应理解,本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。

Claims (18)

  1. 一种应用于神经网络中数据预处理阶段的计算资源的管理方法,其特征在于,所述计算资源包括多个异构的计算节点,所述方法包括:
    分别监测所述多个计算节点的资源使用信息;
    根据所述资源使用信息,并基于预设的资源调度策略,生成与所述多个计算节点中的待调整节点对应的资源调整信息;
    根据所述资源调整信息,对所述待调整节点的计算资源进行动态调整。
  2. 根据权利要求1所述的方法,其特征在于,所述资源调度策略是根据所述神经网络的模型训练阶段对所述数据预处理阶段的计算吞吐率需求设定的。
  3. 根据权利要求1或2所述的方法,其特征在于,所述资源调度策略包括负载均衡策略或资源利用率策略中的至少一种。
  4. 根据权利要求3所述的方法,其特征在于,所述资源调度策略包括负载均衡策略,若所述待调整节点中存在负载不均衡的情形,所述资源调整信息包括用于调整所述待调整节点在计算拓扑中的拓扑信息;
    其中,所述根据所述资源调整信息,对所述计算资源进行动态调整,包括:
    根据所述拓扑信息,对所述待调整节点在所述计算拓扑中的拓扑位置进行调整,并将所述待调整节点中的瓶颈计算节点上的计算负载迁移到空闲计算节点。
  5. 根据权利要求4所述的方法,其特征在于,所述方法还包括:
    修改所述空闲计算节点对应的计算设备的信息以及所述瓶颈计算节点对应的计算设备的信息。
  6. 根据权利要求4或5所述的方法，其特征在于，所述资源调度策略还包括资源利用率策略，若所述待调整节点中存在第一计算节点的资源利用率低于第一资源利用门限，所述资源调整信息包括需要增加的处理线程的信息或者需要增加的进程的信息；
    其中,所述根据所述资源调整信息,对所述计算资源进行动态调整,包括:
    根据所述需要增加的处理线程的信息，对所述第一计算节点所在的路径增加处理线程或处理进程，其中，增加后的处理线程或处理进程中包括一个或多个新增计算节点。
  7. 根据权利要求6所述的方法,其特征在于,所述方法还包括:
    记录所述一个或多个新增计算节点的属性信息,所述属性信息包括以下信息中的一项或多项:计算节点对应的计算设备,计算节点的计算类型,计算节点的执行概率;
    对所述一个或多个新增计算节点进行监测。
  8. 一种应用于神经网络中数据预处理阶段的计算资源的管理装置,其特征在于,所述计算资源包括多个异构的计算节点,所述装置包括:
    监测模块,用于分别监测所述多个计算节点的资源使用信息;
    生成模块,用于根据所述资源使用信息,并基于预设的资源调度策略,生成与所述多个计算节点中的待调整节点对应的资源调整信息;
    处理模块,用于根据所述资源调整信息,对所述待调整节点的计算资源进行动态调整。
  9. 根据权利要求8所述的装置，其特征在于，所述资源调度策略是根据所述神经网络的模型训练阶段对所述数据预处理阶段的计算吞吐率需求设定的。
  10. 根据权利要求8或9所述的装置,其特征在于,所述资源调度策略包括负载均衡策略或资源利用率策略中的至少一种。
  11. 根据权利要求10所述的装置,其特征在于,所述资源调度策略包括负载均衡策略,若所述待调整节点中存在负载不均衡的情形,所述资源调整信息包括用于调整所述待调整节点在计算拓扑中的拓扑信息;
    其中,所述处理模块用于根据所述资源调整信息,对所述计算资源进行动态调整,具体包括:
    根据所述拓扑信息,对所述待调整节点在所述计算拓扑中的拓扑位置进行调整,并将所述待调整节点中瓶颈计算节点上的计算负载迁移到空闲计算节点。
  12. 根据权利要求11所述的装置,其特征在于,所述处理模块还用于:
    修改所述空闲计算节点对应的计算设备的信息以及所述瓶颈计算节点对应的计算设备的信息。
  13. 根据权利要求11或12所述的装置，其特征在于，所述资源调度策略还包括资源利用率策略，若所述待调整节点中存在第一计算节点的资源利用率低于第一资源利用门限，所述资源调整信息包括需要增加的处理线程的信息或者需要增加的进程的信息；
    其中,所述处理模块用于根据所述资源调整信息,对所述计算资源进行动态调整,具体包括:
    根据所述需要增加的处理线程的信息，对所述第一计算节点所在的路径增加处理线程或处理进程，其中，增加后的处理线程或处理进程中包括一个或多个新增计算节点。
  14. 根据权利要求13所述的装置,其特征在于,所述处理模块还用于:
    记录所述一个或多个新增计算节点的属性信息,所述属性信息包括以下信息中的一项或多项:计算节点对应的计算设备,计算节点的计算类型,计算节点的执行概率;
    其中,所述监测模块还用于:对所述一个或多个新增计算节点进行监测。
  15. 一种计算机程序存储介质,其特征在于,所述计算机程序存储介质具有程序指令,当所述程序指令被直接或者间接执行时,使得如权利要求1-7中任一所述的方法在计算设备中得以实现。
  16. 一种管理神经网络的数据预处理过程中的计算资源的装置,其特征在于,包括:至少一个处理器和接口,所述接口用于所述装置与一个或多个计算节点进行信息交互,当程序指令在所述至少一个处理器中执行时,使得所述装置实现如权利要求1-7中任一所述的方法。
  17. 根据权利要求16所述的装置,其特征在于,所述装置还包括:存储器,所述存储器中存储有所述程序指令。
  18. 一种神经网络的计算系统,其特征在于,包括:一个或多个计算节点,以及如权利要求8至14中任一项所述的应用于神经网络中数据预处理阶段的计算资源的管理装置。
PCT/CN2018/109181 2018-09-30 2018-09-30 神经网络中数据预处理阶段的计算资源的管理方法和装置 WO2020062277A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880098036.4A CN112753016A (zh) 2018-09-30 2018-09-30 神经网络中数据预处理阶段的计算资源的管理方法和装置
PCT/CN2018/109181 WO2020062277A1 (zh) 2018-09-30 2018-09-30 神经网络中数据预处理阶段的计算资源的管理方法和装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/109181 WO2020062277A1 (zh) 2018-09-30 2018-09-30 神经网络中数据预处理阶段的计算资源的管理方法和装置

Publications (1)

Publication Number Publication Date
WO2020062277A1 true WO2020062277A1 (zh) 2020-04-02

Family

ID=69950217

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/109181 WO2020062277A1 (zh) 2018-09-30 2018-09-30 神经网络中数据预处理阶段的计算资源的管理方法和装置

Country Status (2)

Country Link
CN (1) CN112753016A (zh)
WO (1) WO2020062277A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113391907A (zh) * 2021-06-25 2021-09-14 中债金科信息技术有限公司 一种任务的放置方法、装置、设备和介质
CN114091688B (zh) * 2021-11-25 2022-05-20 北京九章云极科技有限公司 一种计算资源获取方法、装置、电子设备和存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040216114A1 (en) * 2003-04-22 2004-10-28 Lin Sheng Ling Balancing loads among computing nodes
CN103617086A (zh) * 2013-11-20 2014-03-05 东软集团股份有限公司 一种并行计算方法及系统
CN103812895A (zh) * 2012-11-12 2014-05-21 华为技术有限公司 调度方法、管理节点以及云计算集群
CN104168332A (zh) * 2014-09-01 2014-11-26 广东电网公司信息中心 高性能计算中负载均衡与节点状态监控方法
CN108200156A (zh) * 2017-12-29 2018-06-22 南京邮电大学 一种云环境下分布式文件系统的动态负载均衡方法

Also Published As

Publication number Publication date
CN112753016A (zh) 2021-05-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18935102; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18935102; Country of ref document: EP; Kind code of ref document: A1)