US20230350718A1 - Computer-readable recording medium having stored therein program for controlling accelerator, method for controlling accelerator, and information processing apparatus - Google Patents

Computer-readable recording medium having stored therein program for controlling accelerator, method for controlling accelerator, and information processing apparatus

Info

Publication number
US20230350718A1
US20230350718A1 (Application No. US18/157,846)
Authority
US
United States
Prior art keywords
temperature
accelerator
accelerators
prospective
gpu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US18/157,846
Other languages
English (en)
Inventor
Shinya Kuwamura
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Assigned to FUJITSU LIMITED reassignment FUJITSU LIMITED ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: KUWAMURA, SHINYA
Publication of US20230350718A1 publication Critical patent/US20230350718A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5094Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/509Offload
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the embodiments discussed herein relate to a computer-readable recording medium having stored therein a program for controlling an accelerator, a method for controlling an accelerator, and an information processing apparatus.
  • task scheduling is sometimes performed that allocates a task to the GPU having the minimum load.
  • Examples of the load include a utilization of each GPU and the number of waiting tasks.
  • An inference GPU is a GPU specialized for inference processing, and has characteristics of, for example, a simplified and compact cooling mechanism, a large difference between the upper limit and the lower limit of the clock frequency (for example, 600 MHz to 1.6 GHz), and a clock frequency that fluctuates according to the load thereon.
  • An example of the fluctuation in the clock frequency according to the load is a case where the clock frequency is lowered when the load is low and raised when the load is high. In this case, the processing time may be shorter when the load is higher.
  • the processing time of an inference process may be prolonged due to the characteristics of the inference GPU, in other words, the processing performance may be degraded.
  • an inference GPU sometimes carries out control to compensate for cooling performance that is degraded by adopting a simple cooling mechanism, in other words, control to suppress temperature rise of the inference GPU (temperature rise suppressing control).
  • This control includes, for example, a control that lowers the clock frequency when the consumed power reaches the upper limit, and a control that lowers the clock frequency to near the lower limit when the temperature of the inference GPU reaches the upper limit.
  • If the inference GPU continues to operate at a high clock frequency, the temperature may reach the upper limit and the clock frequency may consequently decrease to the lower limit, so that the processing performance may degrade rapidly.
  • an information processing apparatus performs video analyzing processes such as object recognition and anomaly detection on images sequentially or periodically obtained from a device such as a camera. If the image is taken at 10 fps (frames per second), the information processing apparatus will perform a real-time process that analyzes ten images per second.
  • the video analyzing process may not be completed within a time limit (for example, 0.1 second per image), making it difficult to perform the real-time processing.
  • the above-described inconvenience is not limited to an inference GPU, and may also occur in various types of accelerators that are set to operate at a given (lower) frequency when the temperature thereof rises to a threshold or higher, such as GPUs (including inference GPUs) and dedicated accelerators.
  • a non-transitory computer-readable recording medium has stored therein a program for controlling an accelerator of a plurality of accelerators for causing a computer to execute a control process including: obtaining a correlation between an execution time of the accelerator according to a processing load of a process and a temperature difference of the accelerator between temperature before and after execution of the process, the plurality of accelerators each being set to have, as a clock frequency, a first frequency when temperature is first threshold or higher, the correlation being preset for each predetermined clock frequency; obtaining, when a first process is started, a prospective execution time when each of the plurality of accelerators executes the first process and a prospective temperature of each of the plurality of accelerators after execution of the first process is completed which are based on the correlation and information about a current processing load, a current clock frequency, and a current temperature of each of the plurality of accelerators; obtaining, when an accelerator having the obtained temperature of the first threshold or higher is present, a prospective execution time and a prospective temperature when
  • FIG. 1 is a block diagram illustrating an example of a configuration of a video analyzing system according to a first embodiment
  • FIG. 2 is a block diagram illustrating an example of a hardware (HW) configuration of a computer that achieves a function of the video analyzing apparatus of the first embodiment;
  • FIG. 3 is a block diagram illustrating an example of a software configuration of the video analyzing apparatus of the first embodiment
  • FIG. 4 is a diagram illustrating an example of a temperature table of the first embodiment
  • FIG. 5 is a flow diagram illustrating an example of operation of the video analyzing apparatus of the first embodiment
  • FIG. 6 is a block diagram illustrating an example of a software configuration of a video analyzing apparatus of a second embodiment
  • FIG. 7 is a diagram illustrating an example of a temperature table of the second embodiment.
  • FIG. 8 is a diagram illustrating an example of a utilization table of the second embodiment.
  • FIG. 1 is a block diagram illustrating an example of a configuration of a video analyzing system 1 according to a first embodiment.
  • the video analyzing system 1 may illustratively include a video analyzing apparatus 2 and multiple cameras 3 - 1 to 3 -M (where, M is an integer of two or more in the example of FIG. 1 ).
  • the cameras 3 - 1 to 3 -M are simply referred to as “cameras 3 ”.
  • the multiple cameras 3 may be provided in a video analyzing apparatus 2 .
  • the video analyzing system 1 is an example of the information processing system and executes a video analyzing process based on video data 4 obtained by the cameras 3 .
  • the video data 4 (multiple image frames) is an example of input data.
  • the video analyzing process is an example of an inference process, and is exemplified by an object recognizing process and an anomaly detecting process.
  • the first embodiment assumes that the video analyzing processing is object recognition.
  • Each of the multiple cameras 3 transmits the captured video data 4 to the video analyzing apparatus 2 .
  • the video data 4 may be transmitted from the cameras 3 to the video analyzing apparatus 2 via a non-illustrated network.
  • the video analyzing apparatus 2 is an example of an information processing apparatus.
  • the video analyzing apparatus 2 may include a scheduler 2 a and multiple GPUs 2 b (N GPUs in FIG. 1 ; N is an integer of two or more).
  • the GPUs 2 b - 1 to 2 b -N are simply referred to as “GPUs 2 b”.
  • the scheduler 2 a performs task scheduling to allocate a task of the object recognizing process to any one of the multiple GPUs 2 b . If the video analyzing system 1 executes the real-time process as an inference process, the scheduler 2 a may allocate the task of the object recognizing process on the received video data 4 to a GPU 2 b by executing the task scheduling each time it receives the video data 4 from one of the multiple cameras 3 .
  • the time limit is an example of acceptable execution time of an inference process in the execution of the real-time process, and may be a time period in an extent of 100 ms, for example.
  • the GPU 2 b is an example of an accelerator that executes an inference process on the input data, using trained machine learning model 21 c (see FIG. 3 ).
  • the GPU 2 b executes a task allocated by the scheduler 2 a and outputs, as an example of the inference result, recognition result 5 .
  • the first embodiment assumes that the GPU 2 b is an inference GPU, but is not limited thereto, and may be various accelerators.
  • control for suppressing the temperature rise of the GPU 2 b may be performed.
  • the temperature rise suppressing control may include a first control and a second control.
  • the first control is one that sets the clock frequency to a first frequency near to the lower limit when the temperature of the GPU 2 b becomes equal to or higher than the first threshold (threshold Th_t) serving as the upper limit.
  • the second control is one that sets the clock frequency to a second frequency lower than the current clock frequency when the consumed power becomes equal to or higher than the second threshold (threshold Th_e) serving as the upper limit.
  • the first control may be performed by the HW (Hardware) of the GPU 2 b and the second control may be performed by the FW (Firmware) of the GPU 2 b , which are however not limited thereto.
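  • As a rough Python sketch of how the two controls behave on the GPU 2 b side, the following illustration assumes a small set of selectable clock frequencies and illustrative threshold values; none of the constants, names, or the one-stage frequency drop are taken from the embodiment.

      # Illustrative sketch of the temperature rise suppressing control.
      # The frequency steps and thresholds below are assumptions, not embodiment values.
      FREQ_STEPS_MHZ = [500, 1000, 1500]      # selectable clock frequencies (assumed)
      LOWEST_FREQ_MHZ = FREQ_STEPS_MHZ[0]     # "first frequency" near the lower limit
      TH_T = 135.0                            # first threshold: temperature upper limit (deg C)
      TH_E = 70.0                             # second threshold: consumed-power upper limit (W)

      def suppress_temperature_rise(temp_c: float, power_w: float, clock_mhz: int) -> int:
          """Return the clock frequency after the first and second controls are applied."""
          if temp_c >= TH_T:
              # First control: fall back to the first frequency near the lower limit.
              return LOWEST_FREQ_MHZ
          if power_w >= TH_E:
              # Second control: fall back to a frequency lower than the current one
              # (one stage lower in this sketch).
              idx = FREQ_STEPS_MHZ.index(clock_mhz)
              return FREQ_STEPS_MHZ[max(0, idx - 1)]
          return clock_mhz
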
  • multiple GPUs 2 b are provided in video analyzing apparatus 2 , but arrangement of the GPUs 2 b is not limited thereto.
  • video analyzing system 1 is a distributed system such as a MEC (Multi-access Edge Computing) system
  • each of the multiple GPUs 2 b may be provided in a device, such as an edge server, connected to the video analyzing apparatus 2 via a non-illustrated network.
  • the video analyzing apparatus 2 may be a device such as a Gateway server.
  • the video analyzing apparatus 2 may be a virtual server (Virtual Machine:VM) or a physical server.
  • the function of the video analyzing apparatus 2 may be achieved by a single computer or by two or more computers.
  • FIG. 2 is a block diagram illustrating an example of a hardware (HW) configuration of a computer 10 that achieves a function of the video analyzing apparatus 2 of the first embodiment. If multiple computers are used as the HW resources for achieving the functions of the video analyzing apparatus 2 , each of the computers may include the HW configuration illustrated in FIG. 2 .
  • the computer 10 may illustratively include a HW configuration formed of a processor 10 a , multiple accelerators 10 b , a memory 10 c , a storing device 10 d , an I/F (Interface) device 10 e , an IO (Input/Output) device 10 f , and a reader 10 g.
  • the processor 10 a is an example of an arithmetic operation processing device that performs various controls and calculations.
  • the processor 10 a may be communicably connected to the blocks in the computer 10 via a bus 10 j .
  • the processor 10 a may be a multiprocessor including multiple processors, may be a multicore processor having multiple processor cores, or may have a configuration having multiple multicore processors.
  • the processor 10 a may be any one of integrated circuits (ICs) such as Central Processing Units (CPUs), Micro Processing Units (MPUs), Accelerated Processing Units (APUs), Digital Signal Processors (DSPs), Application Specific ICs (ASICs) and Field Programmable Gate Arrays (FPGAs), or combinations of two or more of these ICs.
  • the multiple accelerators 10 b each execute an inference process by inputting data into a machine learning model, and output the inference result.
  • Examples of each accelerator 10 b are ICs such as GPUs, APUs, DSPs, ASICs, and FPGAs.
  • the GPU 2 b illustrated in FIG. 1 is an example of the accelerator 10 b.
  • the memory 10 c is an example of a HW device that stores information such as various types of data and programs.
  • Examples of the memory 10 c include one or both of a volatile memory such as a Dynamic Random Access Memory (DRAM) and a non-volatile memory such as a Persistent Memory (PM).
  • the storing device 10 d is an example of a HW device that stores information such as various types of data and programs.
  • Examples of the storing device 10 d include a magnetic disk device such as a Hard Disk Drive (HDD), a semiconductor drive device such as a Solid-State Drive (SSD), and various storing devices such as a non-volatile memory.
  • Examples of the non-volatile memory include a flash memory, a Storage Class Memory (SCM), and a Read Only Memory (ROM).
  • the storing device 10 d may store a program 10 h (program for controlling) that implements all or part of various functions of the computer 10 .
  • the processor 10 a can achieve the functions of the video analyzing apparatus 2 (for example, a controlling unit 28 illustrated in FIG. 3 ) to be detailed below by expanding the program 10 h stored in the storing device 10 d onto the memory 10 c and executing the expanded program 10 h.
  • the I/F device 10 e is an example of a communication IF that controls connection and communication between a video analyzing apparatus 2 and each of multiple cameras 3 .
  • the I/F device 10 e may include an adapter conforming to a Local Area Network (LAN) standard such as Ethernet (registered trademark) or to optical communication such as Fibre Channel (FC).
  • the adapter may be compatible with either or both of wireless and wired communication schemes.
  • the video analyzing apparatus 2 may be communicably connected, through the I/F device 10 e and a non-illustrated network, to each of the multiple cameras 3 .
  • the program 10 h may be downloaded from the network to the computer through the communication IF and be stored in the storing device 10 d , for example.
  • the IO device 10 f may include one or both of an input device and an output device.
  • Examples of the input device include a keyboard, a mouse, and a touch panel.
  • Examples of the output device include a monitor, a projector, and a printer.
  • the IO device 10 f may include, for example, a touch panel that integrates an input device and an output device.
  • the output device may be connected to the accelerator 10 b serving as a GPU or an APU.
  • the reader 10 g is an example of a reader that reads data and programs recorded on a recording medium 10 i .
  • the reader 10 g may include a connecting terminal or device to which the recording medium 10 i can be connected or inserted.
  • Examples of the reader 10 g include an adapter conforming to, for example, Universal Serial Bus (USB), a drive apparatus that accesses a recording disk, and a card reader that accesses a flash memory such as an SD card.
  • the program 10 h may be stored in the recording medium 10 i .
  • the reader 10 g may read the program 10 h from the recording medium 10 i and store the read program 10 h into the storing device 10 d.
  • the recording medium 10 i is an example of a non-transitory computer-readable recording medium such as a magnetic/optical disk and a flash memory.
  • Examples of the magnetic/optical disk include a flexible disk, a Compact Disc (CD), a Digital Versatile Disc (DVD), a Blu-ray disc, and a Holographic Versatile Disc (HVD).
  • Examples of the flash memory include a semiconductor memory such as a USB memory and an SD card.
  • the HW configuration of the computer 10 described above is exemplary. Accordingly, the computer 10 may appropriately undergo increase or decrease of HW devices (e.g., addition or deletion of arbitrary blocks), division, integration in an arbitrary combination, and addition or deletion of the bus.
  • a computer that achieves a function of the edge server may have the same HW configuration as that of the computer illustrated in FIG. 2 .
  • FIG. 3 is a diagram illustrating an example of software configuration of the video analyzing apparatus 2 according to the first embodiment.
  • the video analyzing apparatus 2 may illustratively include a memory unit 21 , a video obtaining unit 22 , a GPU information obtaining unit 23 , a calculating unit 24 , a task allocating unit 25 , an object recognizing process unit 26 , and an outputting unit 27 .
  • the video obtaining unit 22 , the GPU information obtaining unit 23 , the calculating unit 24 , the task allocating unit 25 , the object recognizing process unit 26 , and the outputting unit 27 are an example of a controlling unit 28 .
  • Processes performed by the video obtaining unit 22 , the GPU information obtaining unit 23 , the calculating unit 24 , and the task allocating unit 25 are examples of a task scheduling process performed by the scheduler 2 a illustrated in FIG. 1 .
  • the object recognizing process unit 26 and the outputting unit 27 are examples of an inference processing unit that outputs a recognition result 5 of the object recognizing process, using the multiple GPUs 2 b illustrated in FIG. 1 , and may be achieved by the function of the processor 10 a illustrated in FIG. 2 .
  • the memory unit 21 is an example of a storing region and stores various data used by the video analyzing apparatus 2 .
  • the memory unit 21 may be achieved by, for example, a storing region of one or both of the memory 10 c and the storing device 10 d illustrated in FIG. 2 .
  • the memory unit 21 may illustratively be capable of storing a temperature table 21 a , GPU information 21 b , a machine learning model 21 c , video data 4 , and the recognition result 5 .
  • the temperature table 21 a is expressed in a table form for convenience, but is not limited to this form.
  • the temperature table 21 a may be in various forms such as DB (Database) or an array.
  • the video analyzing apparatus 2 (controlling unit 28 ) may create the temperature table 21 a as a preliminary setting process performed prior to the start of the operation by the video analyzing system 1 .
  • FIG. 4 is a diagram illustrating an example of a temperature table 21 a of the first embodiment.
  • the temperature table 21 a is an example of information indicating a correlation generated in advance for each predetermined clock frequency.
  • the temperature table 21 a may associate an execution time according to a processing load of a process on the GPU 2 b , a consumed power that the GPU 2 b consumes during the execution of the process corresponding to the processing load, and a temperature difference of the GPU 2 b between before and after the execution of the process with each predetermined clock frequency.
  • an example of the processing load is the number of processes of a task (tasks) that the GPU 2 b executes (is executing).
  • the “number of analyzing processes” represents the number of analyzing processes allocated to one GPU 2 b , in other words, the number n of processes of the task that the GPU 2 b simultaneously executes (where, n is an integer of one or more).
  • the “clock frequency” (MHz) is the clock frequency (operating frequency) at which the GPU 2 b operates.
  • three stages of clock frequencies (500 MHz, 1000 MHz, and 1500 MHz, at intervals of 500 MHz) are set in the temperature table 21 a , but the clock frequencies are not limited to this.
  • multiple stages of clock frequencies may be set at intervals of a frequency in the range of less than 500 MHz or in the range of greater than 500 MHz.
  • the “execution time” (ms), the “consumed power” (W), and the “temperature difference” (° C.) are set for each combination of the “number of analyzing processes” and the “clock frequency”.
  • the “execution time” is the time (required time) from the start to the completion of the analyzing process performed by the GPU 2 b .
  • the “consumed power” is the amount of power to be consumed by the GPU 2 b when the GPU 2 b executes the analyzing process.
  • the “temperature difference” is a difference between the temperature before the execution of the analyzing process by the GPU 2 b and the temperature after the execution.
  • the video analyzing apparatus 2 may measure the execution time, the consumed power, and the temperature difference for each clock frequency when the GPU 2 b is caused to execute n tasks, and set them into the temperature table 21 a . Even if the multiple GPUs 2 b are the same commercial product, the performance thereof may have individual differences among the GPUs 2 b . Thus, the temperature table 21 a may be created for each GPU 2 b.
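  • For illustration only, the temperature table 21 a of FIG. 4 can be held as a structure keyed by the number of analyzing processes and the clock frequency, as in the following Python sketch; the numeric values are placeholders, not measurements from the embodiment.

      # Illustrative in-memory form of the temperature table 21a (placeholder values).
      # Key:   (number of analyzing processes n, clock frequency in MHz)
      # Value: (execution time in ms, consumed power in W, temperature difference in deg C)
      TEMPERATURE_TABLE = {
          (1, 500):  (90.0, 30.0, 2.0),
          (1, 1000): (55.0, 45.0, 4.0),
          (1, 1500): (40.0, 60.0, 6.0),
          (2, 500):  (170.0, 35.0, 3.0),
          (2, 1000): (95.0, 50.0, 5.0),
          (2, 1500): (65.0, 68.0, 8.0),
      }

      def lookup_entry(n_processes: int, clock_mhz: int):
          """Return (execution time, consumed power, temperature difference) of one entry."""
          return TEMPERATURE_TABLE[(n_processes, clock_mhz)]
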
  • the video obtaining unit 22 obtains the video data 4 from each of multiple cameras 3 and stores the obtained video data 4 into the memory unit 21 .
  • the analyzing process is started in the video analyzing apparatus 2 .
  • the GPU information obtaining unit 23 obtains the GPU information 21 b indicating the current status of each of the multiple GPUs 2 b and stores the GPU information 21 b into the memory unit 21 .
  • the GPU information 21 b may be obtained from, for example, the OS (Operating System) or a driver of the computer 10 .
  • the GPU information 21 b may include, for example, information of the current temperature of the GPU 2 b and information on one or both of the current operating frequency and the current consumed power of the GPU 2 b .
  • the GPU information 21 b may include the number of the object recognizing processes (analyzing processes) being executed by the GPU 2 b.
  • the calculating unit 24 calculates (obtains) an execution time and the consumed power of the object recognizing process on the video data 4 , and GPU temperature after the execution of the object recognizing process for each of the multiple GPUs 2 b with reference to the temperature table 21 a and the GPU information 21 b.
  • the calculating unit 24 specifies, from the temperature table 21 a , an entry corresponding to the number of analyzing processes obtained by adding the number of processes (tasks) to be allocated and the current number of processes included in the GPU information 21 b and also to the current operating frequency of the GPU 2 b included in the GPU information 21 b .
  • the process (task) to be allocated is an example of the first process.
  • the calculating unit 24 obtains the execution time and the consumed power of the specified entry. In addition, the calculating unit 24 calculates the temperature of the GPU 2 b after the object recognizing process by adding the temperature difference in the specified entry and the current temperature of the GPU 2 b included in the GPU information 21 b.
  • the calculating unit 24 obtains, for each GPU 2 b , the execution time, the consumed power, and the GPU temperature when the object recognizing process is executed.
  • the calculating unit 24 is assumed to specify an entry from the temperature table 21 a based on the number of analyzing processes and the current operating frequency of the GPU 2 b , but the manner of the specification is not limited to this. Alternatively, the calculating unit 24 may specify an entry corresponding to the number of analyzing processes and the current consumed power of the GPU 2 b from the temperature table 21 a , or may specify an entry corresponding to the number of analyzing processes and both the operating frequency and the consumed power of the GPU 2 b from the temperature table 21 a.
  • the calculating unit 24 obtains, from the temperature table 21 a , the prospective execution time of a process to be allocated, the prospective consumed power and the prospective temperature after the completion of the process to be allocated for each GPU 2 b on the basis of the number of analyzing processes and one or both of the current operating frequency and the current consumed power of the GPU 2 b.
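  • Continuing the sketch above (all names are assumptions), the calculation described here amounts to one table lookup plus adding the temperature difference to the current GPU temperature:

      def prospective_values(gpu_info: dict, tasks_to_allocate: int = 1):
          """Estimate execution time (ms), consumed power (W), and post-execution
          temperature (deg C) of one GPU 2b for the process to be allocated."""
          # New number of analyzing processes = processes being executed + processes to allocate.
          n = gpu_info["running_processes"] + tasks_to_allocate
          exec_time_ms, power_w, temp_diff_c = lookup_entry(n, gpu_info["clock_mhz"])
          prospective_temp_c = gpu_info["temperature_c"] + temp_diff_c
          return exec_time_ms, power_w, prospective_temp_c
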
  • the temperature rise suppressing control (first control and second control) is performed.
  • the clock frequency of the GPU 2 b is lowered by the control. Since the processing performance (processing rate) of the GPU 2 b lowers when the clock frequency lowers, the execution time of the object recognizing process may exceed the execution time calculated (specified) by the calculating unit 24 .
  • the calculating unit 24 calculates (obtains) the execution time and the GPU temperature of the GPU 2 b that is estimated to be under the temperature rise suppressing control on the basis of the obtained power and temperature and each threshold of the first control and the second control by the following method.
  • when the calculated GPU temperature is equal to or higher than the threshold Th_t, the calculating unit 24 assumes that the first control, which lowers the clock frequency to near the lower limit when the temperature of the GPU 2 b reaches the upper limit, will be executed.
  • the calculating unit 24 calculates (obtains) the prospective execution time and the prospective GPU temperature when the clock frequency is assumed to be lowered to near to the lower limit.
  • the “near to the lower limit” is, for example, near the lower limit (e.g., 600 MHz) of the rated operating frequency of GPU 2 b .
  • the clock frequency “near to the lower limit” is illustratively assumed to be the lowest clock frequency that can be set for the GPU 2 b.
  • the calculating unit 24 specifies, from the temperature table 21 a , an entry corresponding to the number of analyzing processes calculated on the basis of the GPU information 21 b and the lowest clock frequency. Then, the calculating unit 24 obtains the execution time of the specified entry. The calculating unit 24 calculates the GPU temperature by adding the temperature difference of the specified entry and the GPU temperature included in the GPU information 21 b.
  • the calculating unit 24 obtains, from the temperature table 21 a , the prospective execution time and the prospective temperature when the clock frequency of the GPU 2 b is set to the first frequency, in place of the execution time and the temperature obtained for the GPU 2 b.
  • the threshold Th_t is an example of a first threshold, and may be set according to, for example, the specification of the GPU 2 b to be subjected to the first control.
  • the threshold Th_t may be a value near the rated maximum temperature, for example, 135° C. or the like.
  • when the obtained power consumption is equal to or higher than the threshold Th_e, the calculating unit 24 assumes that the second control, which lowers the clock frequency when the power consumption reaches the upper limit, will be executed.
  • the calculating unit 24 calculates (obtains), on the basis of the temperature table 21 a , the prospective execution time and the prospective GPU temperature when the clock frequency is assumed to be lowered.
  • the calculating unit 24 may specify, from the temperature table 21 a , an entry corresponding to the number of analyzing processes calculated on the basis of the GPU information 21 b and a clock frequency that is one-stage lower than the operating frequency included in the GPU information 21 b . Then, the calculating unit 24 obtains the execution time of the specified entry. The calculating unit 24 calculates the GPU temperature by adding the temperature difference of the specified entry and the GPU temperature included in the GPU information 21 b . In the illustrated example, the calculating unit 24 lowers the clock frequency by one stage, but the extent of lowering is not limited to this, and may lower the clock frequency by two or more stages.
  • the calculating unit 24 obtains, from the temperature table 21 a , the prospective execution time and the prospective temperature when the clock frequency of the GPU 2 b is set to the second frequency, in place of the execution time and the temperature obtained for the GPU 2 b.
  • the threshold Th_e is an example of a second threshold, and may be set according to, for example, the specification of the GPU 2 b to be subjected to the second control.
  • threshold Th_e may be a value near to the rated consumed power, for example, 70 W.
  • Threshold Th_e may be power consumed when the temperature of the GPU 2 b becomes lower than the Th_t (e.g., 85° C.).
  • for a GPU 2 b estimated to be subjected to the temperature rise suppressing control, the calculating unit 24 adopts, as the execution time and the GPU temperature of that GPU 2 b , the prospective execution time and the prospective GPU temperature when the clock frequency is assumed to be lowered or to be the lowest.
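  • A sketch of this re-estimation, continuing the earlier examples (the one-stage frequency drop for the second control follows the illustrated example, and all names and thresholds are assumptions):

      def adjusted_prospective_values(gpu_info: dict, tasks_to_allocate: int = 1):
          """Re-estimate a GPU 2b that is expected to undergo the temperature rise
          suppressing control (illustrative sketch of the adjustments described above)."""
          exec_time_ms, power_w, temp_c = prospective_values(gpu_info, tasks_to_allocate)
          n = gpu_info["running_processes"] + tasks_to_allocate
          clock_mhz = gpu_info["clock_mhz"]

          if temp_c >= TH_T:
              # First control expected: re-estimate at the lowest clock frequency.
              clock_mhz = LOWEST_FREQ_MHZ
              exec_time_ms, power_w, temp_diff = lookup_entry(n, clock_mhz)
              temp_c = gpu_info["temperature_c"] + temp_diff

          if power_w >= TH_E:
              # Second control expected: re-estimate one frequency stage lower.
              idx = FREQ_STEPS_MHZ.index(clock_mhz)
              clock_mhz = FREQ_STEPS_MHZ[max(0, idx - 1)]
              exec_time_ms, power_w, temp_diff = lookup_entry(n, clock_mhz)
              temp_c = gpu_info["temperature_c"] + temp_diff

          return exec_time_ms, power_w, temp_c
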
  • the task allocating unit 25 allocates the task of the object recognizing process to a GPU 2 b having a prospective execution time within the time limit and a prospective GPU temperature satisfying a predetermined condition among the multiple GPUs 2 b on the basis of the execution time and the GPU temperature of each GPU 2 b calculated by the calculating unit 24 .
  • the predetermined condition may include, for example, having the lowest GPU temperature among the GPUs 2 b having prospective execution times within the time limit.
  • the object recognizing process unit 26 executes the object recognizing process serving as an example of the analyzing process (inference process), using the GPU 2 b allocated with the task. Specifically, the object recognizing process unit 26 causes the GPU 2 b allocated with the task to execute machine learning model 21 c using the video data 4 as the input, consequently obtains the recognition result 5 from the GPU 2 b , and stores the recognition result 5 into the memory unit 21 .
  • the machine learning model 21 c is a trained machine learning model that has undergone machine learning (training) of the object recognizing process using training data.
  • the task allocating unit 25 and object recognizing process unit 26 cause a GPU 2 b having a temperature being obtained by the calculation unit 24 and satisfying the predetermined condition, among one or more GPUs 2 b each having the prospective execution time being obtained by the calculating unit 24 and being within the time limit of the process to be allocated, to execute the process to be allocated.
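  • The allocation rule described in these bullets can be sketched as follows, continuing the earlier examples (the time limit value is an assumption):

      TIME_LIMIT_MS = 100.0   # acceptable execution time of the real-time process (assumed)

      def select_gpu(gpu_infos: list) -> int:
          """Return the index of the GPU 2b to which the process should be allocated:
          within the time limit and, among those, with the lowest prospective temperature."""
          candidates = []
          for i, info in enumerate(gpu_infos):
              exec_time_ms, _power_w, temp_c = adjusted_prospective_values(info)
              if exec_time_ms <= TIME_LIMIT_MS:
                  candidates.append((temp_c, i))
          if not candidates:
              raise RuntimeError("no GPU 2b can meet the time limit")
          return min(candidates)[1]
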
  • the outputting unit 27 outputs the output data.
  • the output data may include, for example, the recognition result 5 serving as an example of the inference result.
  • the outputting unit 27 may transmit (provide) the output data to, for example, another non-illustrated computer in the outputting of the output data, or may store and manage the output data in the memory unit 21 so as to be obtainable from the video analyzing apparatus 2 or another computer.
  • the outputting unit 27 may output, in the outputting of the output data, information indicating the output data to an output device such as the video analyzing apparatus 2 , or may output the output data in various other ways.
  • FIG. 5 is a flow diagram illustrating an example of operation of the video analyzing apparatus 2 of the first embodiment.
  • the video obtaining unit 22 of the video analyzing apparatus 2 obtains the video data 4 transmitted from the cameras 3 (Step S 1 ) and stores the video data 4 into the memory unit 21 .
  • the GPU information obtaining unit 23 obtains the GPU information 21 b of each of the multiple GPUs 2 b (Step S 2 ) and stores the GPU information 21 b into the memory unit 21 .
  • the calculating unit 24 calculates, based on the temperature table 21 a and the GPU information 21 b , the consumed power, the execution time, and the temperature of each GPU 2 b when executing the task (Step S 3 ).
  • the calculating unit 24 determines whether or not a GPU 2 b having a calculated temperature equal to or higher than threshold Th_t is present among the multiple GPUs 2 b (Step S 4 ).
  • If a GPU 2 b having a calculated temperature equal to or higher than the threshold Th_t is present (YES in Step S 4 ), the calculating unit 24 obtains the prospective consumed power, the prospective execution time, and the prospective temperature of the GPU 2 b when the GPU 2 b is operating at the lowest clock frequency (Step S 5 ), and the process proceeds to Step S 6 .
  • the calculating unit 24 uses, for the GPU 2 b , the prospective execution time, the prospective consumed power, and the prospective temperature obtained in Step S 5 in place of the execution time, the consumed power, and the temperature calculated in Step S 3 .
  • In Step S 6 , the calculating unit 24 determines whether or not a GPU 2 b whose obtained consumed power is the threshold Th_e or more is present among the multiple GPUs 2 b.
  • If such a GPU 2 b is present (YES in Step S 6 ), the calculating unit 24 obtains the prospective consumed power, the prospective execution time, and the prospective temperature of the GPU 2 b when the clock frequency is lowered (Step S 7 ), and the process proceeds to Step S 8 .
  • the calculating unit 24 uses, for the GPU 2 b , the prospective execution time, the prospective consumed power, and the prospective temperature obtained in Step S 7 in place of the execution time, the consumed power, and the temperature calculated in Step S 3 .
  • the task allocating unit 25 specifies a GPU 2 b having an execution time within the time limit and also having the lowest temperature among the multiple GPUs 2 b , and allocates the task to the specified GPU 2 b.
  • the object recognizing process unit 26 executes the task with a machine learning model 21 c (Step S 8 ) by inputting the video data 4 into the GPU 2 b allocated with the task, and stores the recognition result 5 into the memory unit 21 .
  • the outputting unit 27 outputs output data including the recognition result 5 , and the process ends.
  • Steps S 4 and S 5 and Steps S 6 and S 7 may be performed in the reverse order. In addition, obtaining of the consumed power may be omitted in Step S 7 .
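  • As a usage sketch of how the steps above fit together, continuing the earlier examples (the camera input and the task execution are stubbed, and all names are assumptions):

      def run_object_recognition(gpu_index: int, video_frame):
          """Stub standing in for executing the machine learning model 21c on the GPU."""
          return {"gpu": gpu_index, "objects": []}

      def schedule_one_frame(video_frame, gpu_infos: list):
          """Illustrative flow corresponding to Steps S1 to S8 of FIG. 5."""
          # Steps S1 and S2: the video data 4 and the GPU information 21b are assumed
          # to have been obtained already and are passed in as arguments.
          # Steps S3 to S7: estimate every GPU 2b, re-estimating those expected to be
          # throttled, then select the GPU within the time limit with the lowest temperature.
          gpu_index = select_gpu(gpu_infos)
          # Step S8: execute the task on the selected GPU 2b and return the recognition result 5.
          return run_object_recognition(gpu_index, video_frame)
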
  • the video analyzing apparatus 2 obtains, for each of the multiple GPUs 2 b that are to be subjected to at least the first control, a correlation (temperature table 21 a ) generated in advance for each predetermined clock frequency, the correlation being between the execution time of the GPU 2 b according to a processing load of a process and a temperature difference of the GPU 2 b between before and after the execution of the process of that processing load.
  • the video analyzing apparatus 2 obtains, for each of the multiple GPUs 2 b , a prospective execution time when the first process is executed and the temperature of each GPU 2 b after execution of the first process is completed, which are based on the correlation and information about the current processing load, the current clock frequency, and the current temperature. Furthermore, when a GPU 2 b having the obtained temperature of the first threshold or higher is present, the video analyzing apparatus 2 obtains a prospective execution time and a prospective temperature when the clock frequency of the GPU 2 b is set to the first frequency from the correlation, in place of the obtained execution time and the obtained temperature of the GPU 2 b . Then, the video analyzing apparatus 2 causes one GPU 2 b having the obtained temperature satisfying a predetermined condition among the multiple GPUs 2 b having execution times within the time limit of the first process to execute the first process.
  • the video analyzing apparatus 2 allocates a task to any one of the multiple GPUs 2 b on the assumption that the clock frequency of the GPU 2 b in question will become the lowest.
  • the video analyzing apparatus 2 can suppress the temperature rise of the GPU 2 b while satisfying the time constraint of a real-time process (for example, 10 fps) by the scheduling considering the temperature of the GPUs 2 b , so that the task can be executed by a GPU 2 b having a lower temperature.
  • the GPUs 2 b may reach the upper limit (first threshold) of the temperature and continue to operate at the lowest clock frequency when the system continues operating for an extended period of time. In this case, the processing time may be prolonged, and consequently the analyzing process may not be completed within the time limit.
  • the video analyzing apparatus 2 can shorten the processing time by lowering the possibility that the GPU 2 b continues to operate at the lowest clock frequency and consequently reserving a longer time during which the GPU 2 b operates at a higher clock frequency. For example, assuming that the performance when the GPU 2 b operates at a clock frequency near the lower limit differs by a factor of three from the performance at a clock frequency near the upper limit, the video analyzing apparatus 2 can increase the processing speed by up to three times.
  • the video analyzing apparatus 2 obtains a prospective execution time and a prospective temperature when the clock frequency of the GPU 2 b is set to the second frequency from the correlation, in place of the execution time and the temperature obtained with respect to the GPU 2 b .
  • by also considering the power consumed by the GPU 2 b , it is possible to lower the possibility that the GPU 2 b continues to operate at the lowest clock frequency and consequently to reserve a longer time during which the GPU 2 b operates at a higher clock frequency, so that the processing time can be shortened.
  • the temperature table 21 a includes, as the processing load, the number of the first processes that the GPU 2 b simultaneously executes. Accordingly, the video analyzing apparatus 2 can easily specify an entry of the temperature table 21 a by specifying the number of the first processes.
  • the description of the first embodiment assumes that the analyzing process performed by the video analyzing apparatus 2 is one type of the object recognizing process.
  • the utilization (ratio) of the GPU 2 b may be different with a type of analyzing process. If the utilization of GPU 2 b is different, the clock frequency, the consumed power, and the temperature will vary with the utilization of the GPU 2 b .
  • the video analyzing apparatus 2 A according to the second embodiment executes the task scheduling process of the GPU 2 b , considering the utilization of the GPU 2 b.
  • FIG. 6 is a block diagram illustrating an example of a software configuration of a video analyzing apparatus 2 A of a second embodiment.
  • the video analyzing apparatus 2 A includes the memory unit 21 A, the GPU information obtaining unit 23 A, and the calculating unit 24 A in place of the memory unit 21 , the GPU information obtaining unit 23 , and the calculating unit 24 of the video analyzing apparatus 2 illustrated in FIG. 3 .
  • like reference numbers designate same or substantially same elements described with respect to the video analyzing apparatus 2 of FIG. 3 unless specified otherwise.
  • part (functions, processes, and the like) not particularly described with respect to the memory unit 21 A, the GPU information obtaining unit 23 A, and the calculating unit 24 A are the same as those of the memory unit 21 , the GPU information obtaining unit 23 , and the calculating unit 24 .
  • the memory unit 21 A may be capable of storing a temperature table 21 d , a utilization table 21 e , and GPU information 21 f in place of the temperature table 21 a and the GPU information 21 b of the memory unit 21 illustrated in FIG. 3 .
  • the temperature table 21 d and the utilization table 21 e are expressed in a table format, but the present invention is not limited thereto.
  • the temperature table 21 d and the utilization table 21 e may be in various forms such as a DB or an array.
  • the video analyzing apparatus 2 A may create the temperature table 21 d and the utilization table 21 e as a preliminary setting process prior to starting the operation by the video analyzing system 1 .
  • FIG. 7 is a diagram illustrating an example of a temperature table 21 d of the second embodiment.
  • the temperature table 21 d is an example of information indicating a correlation among an execution time, a consumed power, and a GPU temperature according to a processing load on the GPU 2 b for each clock frequency of the GPU 2 b .
  • an example of the processing load is a utilization (ratio) of the GPU 2 b.
  • the temperature table 21 d includes an item of “utilization” instead of the item of “number of analyzing processes” of the temperature table 21 a illustrated in FIG. 4 .
  • the “utilization” (%) is the utilization of the GPU 2 b when the GPU 2 b executes the task of the object recognizing process.
  • FIG. 8 is a diagram illustrating an example of a utilization table 21 e of the second embodiment.
  • the utilization table 21 e is an example of information indicating a correlation between the type of task being executed by the GPU 2 b and the GPU utilization. As illustrated in FIG. 8 , the utilization table 21 e may include items of “analyzing process” and “utilization”.
  • the “analyzing process” represents a type of analyzing process that the GPU 2 b executes, and may include, for example, analyzing process A, analyzing process B, and analyzing process C.
  • the object recognizing process is an example of an “analyzing process”.
  • the “utilization” (%) represents a utilization when the GPU 2 b executes a single “analyzing process”.
  • the video analyzing apparatus 2 A may measure, as the preliminary setting process, the execution time, the consumed power, and the temperature difference for each clock frequency when GPU 2 b is caused to execute the task of each individual type of analyzing process or each combination of multiple types of analyzing processes, and set them into the temperature table 21 d . Even if the multiple GPUs 2 b are the same commercial product, the performance thereof may have individual differences among the GPUs 2 b . For this reason, each of the temperature table 21 d and the utilization table 21 e may be generated for each individual GPU 2 b.
  • the GPU information obtaining unit 23 A obtains the GPU information 21 f indicating the current status of each of the multiple GPUs 2 b and stores the GPU information 21 f into the memory unit 21 .
  • the GPU information 21 f may be obtained, for example, from the OS or a driver of the computer 10 .
  • the GPU information 21 f may include, for example, information of the current temperature of the GPU 2 b , information on one or both of the current operating frequency and the current consumed power of the GPU 2 b , and the number of the object recognizing processes (analyzing processes) being executed by the GPU 2 b .
  • the GPU information 21 f may include, in addition to the content of the GPU information 21 b , a type of analyzing process being executed by the GPU 2 b.
  • the calculating unit 24 A calculates (obtains) an execution time and the consumed power of the object recognizing process on the video data 4 , and GPU temperature after the execution of the object recognizing process for each of the multiple GPUs 2 b with reference to the temperature table 21 d , the utilization table 21 e , and the GPU information 21 f.
  • the calculating unit 24 A may include a utilization calculating unit 240 .
  • the utilization rate calculating unit 240 calculates a prospective GPU utilization when the GPU 2 b executes a process to be allocated, on the basis of the type and the number of the processes (tasks) to be allocated and the type and the number of the object recognizing processes (analyzing processes) being executed by the GPU 2 b included in the GPU information 21 f.
  • the utilization rate calculating unit 240 multiplies, for each type of analyzing process, the utilization of that type by the number of processes of that type on the basis of the utilization table 21 e . Then, the utilization rate calculating unit 240 adds (sums) the products over all the types to obtain a prospective utilization when the GPU 2 b executes the process to be allocated.
  • for example, an analyzing process A on a single piece of video data 4 is assumed to be executed (i.e., a single “analyzing process A” is to be allocated).
  • in this case, the utilization is 10% according to the utilization table 21 e .
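  • A sketch of this utilization-based calculation under the same illustrative assumptions as the earlier examples (the utilization values below are placeholders, not values from the embodiment):

      # Illustrative utilization table 21e: utilization (%) of one GPU 2b per single process.
      UTILIZATION_TABLE = {
          "analyzing process A": 10.0,
          "analyzing process B": 20.0,
          "analyzing process C": 30.0,
      }

      def prospective_utilization(running: dict, to_allocate: dict) -> float:
          """Sum "utilization" x "number of processes" over all analyzing-process types,
          covering both the processes being executed and the processes to be allocated."""
          total = 0.0
          for counts in (running, to_allocate):
              for process_type, n in counts.items():
                  total += UTILIZATION_TABLE[process_type] * n
          return total

      # Example matching the bullets above: a single "analyzing process A" allocated
      # to an idle GPU 2b gives a prospective utilization of 10.0 (%).
      # prospective_utilization(running={}, to_allocate={"analyzing process A": 1})
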
  • the calculating unit 24 A identifies, from the temperature table 21 d , an entry corresponding to the utilization calculated by the utilization rate calculating unit 240 and also to the current operating frequency of the GPU 2 b included in the GPU information 21 f .
  • the process of the calculating unit 24 A after the specification of the entry of the temperature table 21 d is similar to that performed by the calculating unit 24 .
  • the calculating unit 24 A obtains the execution time and the consumed power of the specified entry.
  • the calculating unit 24 A calculates the temperature of the GPU 2 b after the object recognizing process by adding the temperature difference in the specified entry and the current temperature of the GPU 2 b included in the GPU information 21 f.
  • the calculating unit 24 A specifies, from the temperature table 21 d , an entry based on the calculated utilization and the current operating frequency of the GPU 2 b , but the manner of the specification is not limited to this. Alternatively, the calculating unit 24 A may specify an entry corresponding to the calculated utilization and the current consumed power of the GPU 2 b from the temperature table 21 d , or may specify an entry corresponding to the calculated utilization and both the operating frequency and the consumed power from the temperature table 21 d.
  • the calculating unit 24 A calculates the prospective execution time of a process to be allocated, the prospective consumed power and the prospective temperature after the completion of the process to be allocated for each GPU 2 b from the temperature table 21 d on the basis of the calculated utilization and one or both of the current operating frequency and the current consumed power of the GPU 2 b.
  • the calculating unit 24 A determines whether or not the temperature rise suppressing control will be executed on each GPU 2 b on the basis of the obtained consumed power and temperature, and the thresholds Th_t and Th_e of the first control and the second control, respectively. Then, for a GPU 2 b estimated to be subjected to the temperature rise suppressing control, the calculating unit 24 A adopts the prospective execution time and the prospective GPU temperature when the clock frequency is assumed to be lowered or to be the lowest.
  • the processes performed by the task allocating unit 25 , the object recognizing process unit 26 , and the outputting unit 27 on the basis of the execution time and the GPU temperature calculated for each GPU 2 b by the calculating unit 24 A are the same as those in the first embodiment.
  • the video analyzing apparatus 2 A of the second embodiment brings the same advantageous effects as those of the video analyzing apparatus 2 of the first embodiment.
  • since the video analyzing apparatus 2 A can specify the GPU utilization according to the type of process (analyzing process), it is possible to accurately estimate whether the temperature rise suppressing control is to be performed by the GPU 2 b.
  • the functional blocks 22 to 27 included in the video analyzing apparatus 2 or 2 A illustrated in FIGS. 3 and 6 may be merged in any combination or may be divided.
  • the information 21 a to 21 c stored in memory unit 21 illustrated in FIG. 3 may be merged by any combination or may be divided.
  • the information 21 a to 21 e stored in memory unit 21 illustrated in FIG. 6 may be merged by any combination or may be divided.
  • the video analyzing apparatus 2 may use the “utilization” of the GPU 2 b like the video analyzing apparatus 2 A according to the second embodiment.
  • the “number of processes” of the temperature table 21 a may be set to the value of “utilization” × “number of processes”.
  • the GPU information obtaining unit 23 may further obtain the “utilization” as the GPU information 21 b .
  • calculating unit 24 may specify, from the temperature table 21 a , an entry corresponding to the calculated utilization and the clock frequency in the GPU information 21 b.
  • although the video analyzing apparatus 2 or 2 A executes a video analyzing process on the video data 4 input from the cameras 3 , the process is not limited to this. Alternatively, the video analyzing apparatus 2 or 2 A may execute an inference process on various types of input data.
  • the video analyzing apparatus 2 or 2 A illustrated in FIG. 3 or 6 may have a configuration that achieves each processing function by multiple apparatuses cooperating with each other via a network.
  • the video obtaining unit 22 and the outputting unit 27 may be a Web server and an application server;
  • the GPU information obtaining unit 23 or 23 A, the calculating unit 24 or 24 A, the task allocating unit 25 , and the object recognizing process unit 26 may be an application server;
  • the memory unit 21 or 21 A may be a DB server, or the like.
  • the processing function as the video analyzing apparatus 2 or 2 A may be achieved by the web server, the application server, and the DB server cooperating with one another via a network.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Power Sources (AREA)
US18/157,846 2022-05-02 2023-01-23 Computer-readable recording medium having stored therein program for controlling accelerator, method for controlling accelerator, and information processing apparatus Abandoned US20230350718A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022075725A JP2023165100A (ja) 2022-05-02 2022-05-02 アクセラレータの制御プログラム及び制御方法、並びに、情報処理装置
JP2022-075725 2022-05-02

Publications (1)

Publication Number Publication Date
US20230350718A1 true US20230350718A1 (en) 2023-11-02

Family

ID=88513143

Family Applications (1)

Application Number Title Priority Date Filing Date
US18/157,846 Abandoned US20230350718A1 (en) 2022-05-02 2023-01-23 Computer-readable recording medium having stored therein program for controlling accelerator, method for controlling accelerator, and information processing apparatus

Country Status (2)

Country Link
US (1) US20230350718A1 (ja)
JP (1) JP2023165100A (ja)

Also Published As

Publication number Publication date
JP2023165100A (ja) 2023-11-15

Similar Documents

Publication Publication Date Title
US10826980B2 (en) Command process load balancing system
JP6249953B2 (ja) ヘテロジニアスマルチプロセッサシステムオンチップにおける熱駆動作業負荷スケジューリング
US9715407B2 (en) Computer product, multicore processor system, and scheduling method
US20150220370A1 (en) Job scheduling apparatus and method therefor
US20120072749A1 (en) Multi-core power management
US9342133B2 (en) Information processing apparatus and power saving control method
US9329648B2 (en) Performance management of subsystems in a server by effective usage of resources
US8843672B2 (en) Access method, computer and recording medium
US20170371761A1 (en) Real-time performance tracking using dynamic compilation
US11144234B2 (en) Apparatus, method for storage access management, and non-transitory computer-readable storage medium for storing program
US9495491B2 (en) Reliability aware thermal design
US8196146B2 (en) Information processing apparatus, parallel processing optimization method, and program
US20230350718A1 (en) Computer-readable recording medium having stored therein program for controlling accelerator, method for controlling accelerator, and information processing apparatus
US9772964B2 (en) Multicore processor system, computer product, assigning method, and control method
US10089151B2 (en) Apparatus, method, and program medium for parallel-processing parameter determination
US20130055281A1 (en) Information processing apparatus and scheduling method
WO2022166679A1 (zh) 计算核、计算核温度调整方法、设备、介质、芯片和系统
US11669429B2 (en) Configuration cluster-based performance optimization of applications in an information handling system (IHS)
US11467748B2 (en) Control apparatus and computer-readable recording medium having stored therein control program
KR101586712B1 (ko) 멀티 프로세서 시스템에서 태스크 의존성 그래프를 이용한 스케줄링 방법 및 장치
US10417050B2 (en) Apparatus and method to control calculation resources of an information processing device based on predictive values of reference data
KR102144211B1 (ko) 데이터 센터의 발열 관리 방법 및 장치
US20220366239A1 (en) Storage medium, machine learning method, and information processing device
JP6379841B2 (ja) 情報処理装置、試験方法および試験制御プログラム
US20230162067A1 (en) Computer-readable recording medium storing causal search program, causal search method, and information processing apparatus

Legal Events

Date Code Title Description
AS Assignment

Owner name: FUJITSU LIMITED, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:KUWAMURA, SHINYA;REEL/FRAME:062448/0425

Effective date: 20221221

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STCB Information on status: application discontinuation

Free format text: EXPRESSLY ABANDONED -- DURING EXAMINATION