CN111221715A - Method, system, device and medium for dynamically optimizing Caffe performance - Google Patents

Method, system, device and medium for dynamically optimizing Caffe performance Download PDF

Info

Publication number
CN111221715A
Authority
CN
China
Prior art keywords
cpu
utilization rate
gpu
judging whether
threshold value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010007563.9A
Other languages
Chinese (zh)
Inventor
李磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010007563.9A
Publication of CN111221715A
Legal status: Withdrawn (current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466Performance evaluation by tracing or monitoring
    • G06F11/3495Performance evaluation by tracing or monitoring for systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3058Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations

Abstract

The invention discloses a method, a system, a device and a storage medium for dynamically optimizing Caffe performance, wherein the method comprises the following steps: acquiring the utilization rates and temperatures of a CPU and a GPU at intervals of a first preset time, and judging whether the heat dissipation requirement is met according to the temperatures of the CPU and the GPU; in response to the heat dissipation requirement being met, judging whether the utilization rate of the CPU is less than or equal to a utilization rate threshold; in response to the utilization rate of the CPU being less than or equal to the utilization rate threshold, calculating the coefficient of variation of the GPU utilization rate within a second preset time, and judging whether a performance bottleneck exists based on the coefficient of variation; and in response to the existence of a performance bottleneck, increasing the batch size and the number of data transfer threads. The method, system, device and medium for dynamically optimizing Caffe performance provided by the invention discover and eliminate bottlenecks by monitoring the operation of system resources, ensuring that Caffe running on the GPU reaches its optimal performance.

Description

Method, system, device and medium for dynamically optimizing Caffe performance
Technical Field
The present invention relates to the field of servers, and more particularly, to a method, a system, a computer device, and a readable medium for dynamically optimizing Caffe performance.
Background
In recent years, AI (Artificial Intelligence) technology has made great breakthroughs in fields such as image recognition, natural language processing and recommendation systems, opening up unlimited possibilities for commercial deployment. An AI model first requires training on a large amount of data to reach high accuracy, so that it can function in actual production. Beyond the algorithms themselves, the most important driver of these breakthroughs is the rapid growth in computing power, in which the GPU accelerator card plays a crucial role.
Caffe can run on a CPU or a GPU. In the model training stage, the GPU is currently the computing component with the strongest performance, but for the GPU to deliver its maximum computing performance it still needs the cooperation of the CPU, the memory system, the PCIE system, the heat dissipation system and other IO systems. The prior art is concerned only with GPU utilization. However, performance optimization cannot be limited to the GPU alone; the CPU, memory system, PCIE system, heat dissipation system and IO system also need to be dynamically monitored to find performance bottlenecks and thereby provide an effective optimization scheme.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a method, a system, a computer device and a computer-readable storage medium for dynamically optimizing Caffe performance, which discover and eliminate bottlenecks by monitoring the operation of system resources and ensure that Caffe running on the GPU reaches its optimal performance.
Based on the above object, an aspect of the embodiments of the present invention provides a method for dynamically optimizing Caffe performance, including the following steps: acquiring the utilization rates and temperatures of a CPU and a GPU at intervals of a first preset time, and judging whether the heat dissipation requirement is met according to the temperatures of the CPU and the GPU; in response to the heat dissipation requirement being met, judging whether the utilization rate of the CPU is less than or equal to a utilization rate threshold; in response to the utilization rate of the CPU being less than or equal to the utilization rate threshold, calculating a coefficient of variation of the GPU utilization rate within a second preset time, and judging whether a performance bottleneck exists based on the coefficient of variation; and in response to the existence of a performance bottleneck, increasing the batch size and the number of data transfer threads.
In some embodiments, the judging whether the heat dissipation requirement is met according to the temperatures of the CPU and the GPU includes: judging whether the temperature of the CPU is less than a first temperature threshold; and judging whether the temperature of the GPU is less than a second temperature threshold.
In some embodiments, the method further comprises: in response to the heat dissipation requirement not being met, adjusting the duty cycle of the fan according to the temperatures of the CPU and the GPU.
In some embodiments, the method further comprises: in response to the utilization rate of the CPU being greater than the utilization rate threshold, judging whether the utilization rate of the CPU is less than or equal to a second utilization rate threshold; and in response to the utilization rate of the CPU being less than or equal to the second utilization rate threshold, increasing the operating frequency of the CPU.
In some embodiments, the method further comprises: acquiring the disk input/output rate, the memory usage and the size of the training data set, and judging whether the training data set has been cached in the memory based on them.
In some embodiments, the judging whether the training data set has been cached in the memory includes: judging whether the disk input/output rate is less than an input/output rate threshold; and in response to the disk input/output rate being less than the input/output rate threshold, judging whether the size of the cache in the memory has stopped increasing and is larger than the size of the training data set.
In another aspect of the embodiments of the present invention, a system for dynamically optimizing Caffe performance is further provided, including: a sampling module configured to acquire the utilization rates and temperatures of the CPU and the GPU at intervals of a first preset time, and judge whether the heat dissipation requirement is met according to the temperatures of the CPU and the GPU; a judging module configured to, in response to the heat dissipation requirement being met, judge whether the utilization rate of the CPU is less than or equal to a utilization rate threshold; an analysis module configured to, in response to the utilization rate of the CPU being less than or equal to the utilization rate threshold, calculate a coefficient of variation of the GPU utilization rate within a second preset time, and judge whether a performance bottleneck exists based on the coefficient of variation; and a processing module configured to increase the batch size and the number of data transfer threads in response to the existence of a performance bottleneck.
In some embodiments, the sampling module is further configured to: judge whether the temperature of the CPU is less than a first temperature threshold; and judge whether the temperature of the GPU is less than a second temperature threshold.
In another aspect of the embodiments of the present invention, there is also provided a computer device, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of the method as above.
In a further aspect of the embodiments of the present invention, a computer-readable storage medium is also provided, which stores a computer program that implements the above method steps when executed by a processor.
The invention has the following beneficial technical effects: by monitoring the operation of system resources, bottlenecks are discovered and eliminated, ensuring that Caffe running on the GPU reaches its optimal performance.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of an embodiment of a method for dynamically optimizing Caffe performance provided by the present invention;
FIG. 2 is a schematic diagram of a hardware structure of an embodiment of the method for dynamically optimizing Caffe performance provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two non-identical entities or parameters that share the same name. "First" and "second" are merely for convenience of description and should not be construed as limiting the embodiments of the present invention, and this will not be repeated in the following embodiments.
In view of the above, a first aspect of the embodiments of the present invention provides an embodiment of a method for dynamically optimizing Caffe performance. Fig. 1 is a schematic diagram illustrating an embodiment of the method for dynamically optimizing Caffe performance according to the present invention. As shown in fig. 1, the embodiment of the present invention includes the following steps:
s1, acquiring the utilization rate and the temperature of the CPU and the GPU every other first preset time, and judging whether the heat dissipation requirements are met according to the temperatures of the CPU and the GPU;
s2, responding to the heat dissipation requirement, judging whether the utilization rate of the CPU is less than or equal to a utilization rate threshold value;
s3, responding to the fact that the utilization rate of the CPU is smaller than or equal to the utilization rate threshold value, calculating the variation coefficient of the GPU utilization rate in the second preset time, and judging whether a performance bottleneck exists or not based on the variation coefficient; and
and S4, responding to the existence of the performance bottleneck, and increasing the batch size and the data transmission line number.
Caffe (Convolutional Architecture for Fast Feature Embedding) is an open-source software framework that provides a set of basic programming interfaces and template frameworks for implementing deep learning algorithms, such as deep convolutional neural networks, under a GPU parallel architecture. At present, Caffe deployed in production environments is mainly used for inference tasks such as face recognition, gene information extraction and image processing.
The utilization rates and temperatures of the CPU and the GPU are acquired at intervals of a first preset time, and whether the heat dissipation requirement is met is judged according to the temperatures of the CPU and the GPU. The Caffe program is run and the model training job is started. The system is monitored to obtain the utilization rate and operating frequency of the CPU cores and the CPU temperature; the memory usage and the cache usage; the disk IO (input/output) rate and the process of reading disk data into the memory; the memory bandwidth; and the GPU utilization rate and temperature. Further, the sampling interval does not exceed 1 s, and the sampled data are archived as spreadsheet data for subsequent analysis.
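The patent does not name specific tooling for this monitoring step; the following minimal sampler is only a sketch, assuming a Linux host where the third-party psutil package and the nvidia-smi utility are available, and it archives samples to a CSV file rather than an Excel sheet. The helper names and CSV columns are illustrative only.

```python
import csv
import subprocess
import time

import psutil  # third-party package, assumed available for host-side metrics


def sample_gpu():
    """Query GPU utilization (%) and temperature (deg C) of GPU 0 via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,temperature.gpu",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    util, temp = out.strip().splitlines()[0].split(", ")
    return float(util), float(temp)


def sample_host():
    """Collect CPU utilization/frequency/temperature, memory usage, cache size and disk reads."""
    cpu_util = psutil.cpu_percent(interval=None)
    freq = psutil.cpu_freq()
    cpu_freq = freq.current if freq else None
    temps = psutil.sensors_temperatures().get("coretemp", [])
    cpu_temp = max((t.current for t in temps), default=None)
    mem = psutil.virtual_memory()
    io = psutil.disk_io_counters()
    return (cpu_util, cpu_freq, cpu_temp,
            mem.used, getattr(mem, "cached", 0), io.read_bytes)


def monitor(path="samples.csv", interval=1.0):
    """Sample every `interval` seconds (<= 1 s per the method) and archive each row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "cpu_util", "cpu_freq_mhz", "cpu_temp",
                         "mem_used", "page_cache", "disk_read_bytes",
                         "gpu_util", "gpu_temp"])
        while True:
            gpu_util, gpu_temp = sample_gpu()
            writer.writerow([time.time(), *sample_host(), gpu_util, gpu_temp])
            f.flush()
            time.sleep(interval)
```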
In some embodiments, the judging whether the heat dissipation requirement is met according to the temperatures of the CPU and the GPU includes: judging whether the temperature of the CPU is less than a first temperature threshold; and judging whether the temperature of the GPU is less than a second temperature threshold. The first temperature threshold may be 55 degrees and the second temperature threshold may be 60 degrees; if the CPU temperature is lower than 55 degrees and the GPU temperature is lower than 60 degrees, the heat dissipation requirement is satisfied, which may be defined as level I. A third temperature threshold and a fourth temperature threshold may also be set, for example 80 degrees and 70 degrees respectively. If the CPU temperature is higher than 55 degrees but lower than 80 degrees and the GPU temperature is higher than 60 degrees but lower than 70 degrees, the heat dissipation requirement is not met, which is level II. If the CPU temperature is higher than 80 degrees while the GPU temperature is lower than 70 degrees, or the CPU temperature is lower than 80 degrees while the GPU temperature is higher than 70 degrees, the heat dissipation requirement is not met, which is level III. If the CPU temperature is higher than 80 degrees and the GPU temperature is higher than 70 degrees, the heat dissipation requirement is not met, which is level IV.
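As a sketch, this four-level classification can be written as a small lookup function. The thresholds are the example values from the paragraph above (55/80 degrees for the CPU, 60/70 degrees for the GPU); combinations the paragraph does not spell out, such as a cool CPU with a moderately warm GPU, are folded into level II here as a simplifying assumption.

```python
def thermal_level(cpu_temp, gpu_temp,
                  cpu_lo=55, cpu_hi=80, gpu_lo=60, gpu_hi=70):
    """Map CPU/GPU temperatures to heat-dissipation levels I-IV as described above."""
    if cpu_temp < cpu_lo and gpu_temp < gpu_lo:
        return "I"    # heat dissipation requirement is met
    if cpu_temp > cpu_hi and gpu_temp > gpu_hi:
        return "IV"   # both components above their upper thresholds
    if cpu_temp > cpu_hi or gpu_temp > gpu_hi:
        return "III"  # exactly one component above its upper threshold
    return "II"       # above the first thresholds but below the upper ones
```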
In some embodiments, the method further comprises: in response to the heat dissipation requirement not being met, adjusting the duty cycle of the fan according to the temperatures of the CPU and the GPU. For example, at level II the fan speed may be increased so that the fan duty cycle is 45%; at level III the fan duty cycle may be adjusted to 75%; and at level IV the fan may be run at full speed, i.e. a fan duty cycle of 100%.
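A corresponding duty-cycle mapping might look like the sketch below. How the duty cycle is actually applied is platform-specific (server fans are typically driven through the BMC/IPMI), so the Linux hwmon pwm path used here is purely illustrative.

```python
def apply_fan_duty(level, pwm_path="/sys/class/hwmon/hwmon0/pwm1"):
    """Map a thermal level to a fan duty cycle and, where a hwmon pwm file
    exists, write the corresponding 0-255 value. Path and mechanism are illustrative."""
    duty = {"II": 0.45, "III": 0.75, "IV": 1.00}.get(level)
    if duty is None:
        return  # level I: heat dissipation requirement met, leave the fan policy alone
    try:
        with open(pwm_path, "w") as f:
            f.write(str(int(duty * 255)))
    except OSError:
        pass  # no writable pwm node; on servers, adjust via BMC/IPMI instead
```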
In response to the heat dissipation requirement being met, whether the utilization rate of the CPU is less than or equal to a utilization rate threshold is judged. The utilization rate threshold may be 70%; a CPU utilization rate less than or equal to 70% indicates that processor resources are sufficient.
In some embodiments, the method further comprises: in response to the utilization rate of the CPU being greater than the utilization rate threshold, judging whether the utilization rate of the CPU is less than or equal to a second utilization rate threshold; and in response to the utilization rate of the CPU being less than or equal to the second utilization rate threshold, increasing the operating frequency of the CPU. The second utilization rate threshold may be 90%. A CPU utilization rate greater than 70% but less than 90% indicates that the processor frequency may be insufficient, while a CPU utilization rate greater than 90% indicates that processor resources are exhausted. When processor resources are insufficient, the user is prompted to add physical processor cores; when the processor frequency is insufficient, the overclocking function of the processor is enabled.
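The two-threshold decision can be summarized as a small helper; 70% and 90% are the example values given above, and the returned action labels are illustrative. Raising the operating frequency would be done through the platform's turbo or frequency-scaling interface, which the patent does not specify.

```python
def cpu_action(cpu_util, util_lo=70.0, util_hi=90.0):
    """Decide the CPU-side action from the sampled utilization (in percent)."""
    if cpu_util <= util_lo:
        return "ok"               # processor resources are sufficient
    if cpu_util <= util_hi:
        return "raise_frequency"  # frequency may be insufficient: enable overclocking/turbo
    return "add_cores"            # resources exhausted: prompt the user to add physical cores
```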
And responding to the condition that the utilization rate of the CPU is less than or equal to the utilization rate threshold value, calculating the variation coefficient of the GPU utilization rate in the second preset time, and judging whether a performance bottleneck exists or not based on the variation coefficient. The Coefficient of Variation (CV) is the ratio of the standard deviation to the mean, and a larger result indicates a larger degree of dispersion of the data, i.e., a poorer stability. The second predetermined time may be 5 minutes, a coefficient of variation of the GPU utilization within 5 minutes may be calculated, a coefficient of variation threshold may be preset, for example, 3%, if the coefficient of variation is less than 3%, it is indicated that the GPU performance is stable, there is no performance bottleneck, and it may be defined as level I. Determining whether a performance bottleneck exists based on the coefficient of variation may include: if the coefficient of variation is greater than or equal to 3% and less than 8%, defining as class II; if the coefficient of variation is greater than or equal to 8% and less than 12%, defining as class III; if the coefficient of variation is greater than or equal to 12%, it is defined as class IV.
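A sketch of the coefficient-of-variation calculation and the level mapping follows, using the example thresholds above. The patent does not say whether the population or sample standard deviation is meant, so the population form is assumed here.

```python
import statistics


def gpu_cv(samples):
    """Coefficient of variation (standard deviation / mean) of GPU utilization samples."""
    mean = statistics.fmean(samples)
    return statistics.pstdev(samples) / mean if mean else 0.0


def bottleneck_level(cv):
    """Map the CV over the second preset window (e.g. 5 minutes) to levels I-IV."""
    if cv < 0.03:
        return "I"    # GPU performance is stable, no bottleneck
    if cv < 0.08:
        return "II"
    if cv < 0.12:
        return "III"
    return "IV"
```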
In response to the existence of a performance bottleneck, the batch size and the number of data transfer threads are increased. If the coefficient of variation is level I, the Batchsize (batch size) keeps its current value and no measure is taken. If the coefficient of variation is level II, the Batchsize is increased to 2^N times its current value, where 2 ≤ N ≤ 6, and the number of data transfer threads is increased to the number of processor cores. If the coefficient of variation is level III, the Batchsize is increased to 2^N times its current value, where 6 ≤ N ≤ 8, and the number of data transfer threads is increased to the number of processor cores. If the coefficient of variation is level IV, the Batchsize is increased to 2^N times its current value, where 10 ≤ N ≤ 12, and the number of data transfer threads is increased to the number of processor cores.
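The adjustment rule can be sketched as follows. The patent only constrains the exponent N to an interval per level, so using the lower bound of each interval is an assumption of this sketch.

```python
import os

# Exponent intervals per bottleneck level: Batchsize is scaled by 2**N with N in the interval.
BATCH_EXPONENT = {"II": (2, 6), "III": (6, 8), "IV": (10, 12)}


def adjust_training_params(level, batch_size, data_threads):
    """Return the new (Batchsize, data-transfer thread count) for a bottleneck level."""
    if level == "I":
        return batch_size, data_threads  # stable: keep the current values
    n_low, _n_high = BATCH_EXPONENT[level]
    new_batch = batch_size * 2 ** n_low                              # conservative choice in the interval
    new_threads = max(data_threads, os.cpu_count() or data_threads)  # raise to the processor core count
    return new_batch, new_threads
```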
In some embodiments, the method further comprises: acquiring the disk input/output rate, the memory usage and the size of the training data set, and judging whether the training data set has been cached in the memory based on them.
In some embodiments, the judging whether the training data set has been cached in the memory includes: judging whether the disk input/output rate is less than an input/output rate threshold; and in response to the disk input/output rate being less than the input/output rate threshold, judging whether the size of the cache in the memory has stopped increasing and is larger than the size of the training data set. The input/output rate threshold may be 5 KB/s; a disk input/output rate below 5 KB/s indicates that disk input/output has stopped. At this point, if the size of the cache in the memory has stopped increasing and is larger than the size of the training data set, the training data set has been cached in the memory. If the size of the cache is less than or equal to the size of the training data set, the memory capacity is insufficient and the physical memory space may be increased.
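A sketch of this check with psutil is given below, using the 5 KB/s example threshold. It assumes a Linux host where the page cache size is exposed as psutil.virtual_memory().cached, and that the caller passes in the disk read counter and cache size from the previous sample so that the rate and cache growth can be computed.

```python
import psutil


def dataset_cache_state(prev_read_bytes, prev_cached, interval_s, dataset_bytes,
                        io_threshold=5 * 1024):
    """Return 'cached', 'insufficient_memory' or 'loading' for the training data set."""
    io = psutil.disk_io_counters()
    mem = psutil.virtual_memory()
    read_rate = (io.read_bytes - prev_read_bytes) / interval_s   # bytes per second
    cached = getattr(mem, "cached", 0)                           # page cache size (Linux)
    if read_rate < io_threshold and cached <= prev_cached:
        # disk reads have effectively stopped and the page cache is no longer growing
        return "cached" if cached > dataset_bytes else "insufficient_memory"
    return "loading"
```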
A display module is started, which provides a graphical interface through which all of the above data can be viewed. The acquired data can be analyzed dynamically, and the analysis results are stored in a log.
It should be particularly noted that the steps in the embodiments of the method for dynamically optimizing Caffe performance described above can be intersected, replaced, added and deleted with respect to one another. Therefore, such reasonable permutations and combinations of the method for dynamically optimizing Caffe performance also belong to the scope of the present invention, and the scope of the present invention should not be limited to the described embodiments.
In view of the above object, a second aspect of the embodiments of the present invention provides a system for dynamically optimizing Caffe performance, including: a sampling module configured to acquire the utilization rates and temperatures of the CPU and the GPU at intervals of a first preset time, and judge whether the heat dissipation requirement is met according to the temperatures of the CPU and the GPU; a judging module configured to, in response to the heat dissipation requirement being met, judge whether the utilization rate of the CPU is less than or equal to a utilization rate threshold; an analysis module configured to, in response to the utilization rate of the CPU being less than or equal to the utilization rate threshold, calculate the coefficient of variation of the GPU utilization rate within a second preset time, and judge whether a performance bottleneck exists based on the coefficient of variation; and a processing module configured to increase the batch size and the number of data transfer threads in response to the existence of a performance bottleneck.
In some embodiments, the sampling module is further configured to: judge whether the temperature of the CPU is less than a first temperature threshold; and judge whether the temperature of the GPU is less than a second temperature threshold.
In some embodiments, the processing module is further configured to: in response to the heat dissipation requirement not being met, adjust the duty cycle of the fan according to the temperatures of the CPU and the GPU.
In some embodiments, the system further comprises a second judging module configured to: in response to the utilization rate of the CPU being greater than the utilization rate threshold, judge whether the utilization rate of the CPU is less than or equal to a second utilization rate threshold; and in response to the utilization rate of the CPU being less than or equal to the second utilization rate threshold, increase the operating frequency of the CPU.
In some embodiments, the sampling module is further configured to: acquire the disk input/output rate, the memory usage and the size of the training data set, and judge whether the training data set has been cached in the memory based on them.
In some embodiments, the sampling module is further configured to: judge whether the disk input/output rate is less than the input/output rate threshold; and in response to the disk input/output rate being less than the input/output rate threshold, judge whether the size of the cache in the memory has stopped increasing and is larger than the size of the training data set.
In view of the above object, a third aspect of the embodiments of the present invention provides a computer device, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions, when executed by the processor, performing the following steps: S1, acquiring the utilization rates and temperatures of the CPU and the GPU at intervals of a first preset time, and judging whether the heat dissipation requirement is met according to the temperatures of the CPU and the GPU; S2, in response to the heat dissipation requirement being met, judging whether the utilization rate of the CPU is less than or equal to a utilization rate threshold; S3, in response to the utilization rate of the CPU being less than or equal to the utilization rate threshold, calculating the coefficient of variation of the GPU utilization rate within a second preset time, and judging whether a performance bottleneck exists based on the coefficient of variation; and S4, in response to the existence of a performance bottleneck, increasing the batch size and the number of data transfer threads.
In some embodiments, the judging whether the heat dissipation requirement is met according to the temperatures of the CPU and the GPU includes: judging whether the temperature of the CPU is less than a first temperature threshold; and judging whether the temperature of the GPU is less than a second temperature threshold.
In some embodiments, the steps further comprise: in response to the heat dissipation requirement not being met, adjusting the duty cycle of the fan according to the temperatures of the CPU and the GPU.
In some embodiments, the steps further comprise: in response to the utilization rate of the CPU being greater than the utilization rate threshold, judging whether the utilization rate of the CPU is less than or equal to a second utilization rate threshold; and in response to the utilization rate of the CPU being less than or equal to the second utilization rate threshold, increasing the operating frequency of the CPU.
In some embodiments, the steps further comprise: acquiring the disk input/output rate, the memory usage and the size of the training data set, and judging whether the training data set has been cached in the memory based on them.
In some embodiments, the judging whether the training data set has been cached in the memory includes: judging whether the disk input/output rate is less than an input/output rate threshold; and in response to the disk input/output rate being less than the input/output rate threshold, judging whether the size of the cache in the memory has stopped increasing and is larger than the size of the training data set.
Fig. 2 is a schematic diagram of a hardware structure of an embodiment of the method for dynamically optimizing Caffe performance according to the present invention.
Taking the apparatus shown in fig. 2 as an example, the apparatus includes a processor 301 and a memory 302, and may further include: an input device 303 and an output device 304.
The processor 301, the memory 302, the input device 303 and the output device 304 may be connected by a bus or other means, and fig. 2 illustrates the connection by a bus as an example.
The memory 302, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the method for dynamically optimizing Caffe performance in the embodiments of the present application. The processor 301 executes the various functional applications and data processing of the server by running the non-volatile software programs, instructions and modules stored in the memory 302, that is, it implements the method for dynamically optimizing Caffe performance of the above method embodiment.
The memory 302 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created by the use of the method for dynamically optimizing Caffe performance, and the like. Further, the memory 302 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some embodiments, the memory 302 optionally includes memory located remotely from the processor 301, and such remote memory may be connected to the local module via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 303 may receive information such as a user name and a password that are input. The output means 304 may comprise a display device such as a display screen.
Program instructions/modules corresponding to one or more methods for dynamically optimizing Caffe performance are stored in memory 302 and, when executed by processor 301, perform the methods for dynamically optimizing Caffe performance in any of the above-described method embodiments.
Any embodiment of a computer apparatus implementing the method for dynamically optimizing Caffe's performance as described above may achieve the same or similar effects as any of the preceding method embodiments corresponding thereto.
The invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, performs the method as above.
Finally, it should be noted that, as those of ordinary skill in the art will appreciate, all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing related hardware. The program of the method for dynamically optimizing Caffe performance can be stored in a computer-readable storage medium, and when executed, the program can include the processes of the embodiments of the methods described above. The storage medium of the program may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
Furthermore, the methods disclosed according to embodiments of the present invention may also be implemented as a computer program executed by a processor, which may be stored in a computer-readable storage medium. Which when executed by a processor performs the above-described functions defined in the methods disclosed in embodiments of the invention.
Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.
Further, it should be appreciated that the computer-readable storage media (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions herein: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, of embodiments of the invention is limited to these examples; within the idea of an embodiment of the invention, also technical features in the above embodiment or in different embodiments may be combined and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (10)

1. A method for dynamically optimizing Caffe performance, comprising the steps of:
acquiring the utilization rates and temperatures of a CPU and a GPU at intervals of a first preset time, and judging whether the heat dissipation requirement is met according to the temperatures of the CPU and the GPU;
in response to the heat dissipation requirement being met, judging whether the utilization rate of the CPU is less than or equal to a utilization rate threshold;
in response to the utilization rate of the CPU being less than or equal to the utilization rate threshold, calculating a coefficient of variation of the GPU utilization rate within a second preset time, and judging whether a performance bottleneck exists based on the coefficient of variation; and
in response to the existence of a performance bottleneck, increasing the batch size and the number of data transfer threads.
2. The method of claim 1, wherein the judging whether the heat dissipation requirement is met according to the temperatures of the CPU and the GPU comprises:
judging whether the temperature of the CPU is less than a first temperature threshold; and
judging whether the temperature of the GPU is less than a second temperature threshold.
3. The method of claim 1, further comprising:
in response to the heat dissipation requirement not being met, adjusting the duty cycle of the fan according to the temperatures of the CPU and the GPU.
4. The method of claim 1, further comprising:
in response to the utilization rate of the CPU being greater than the utilization rate threshold, judging whether the utilization rate of the CPU is less than or equal to a second utilization rate threshold; and
in response to the utilization rate of the CPU being less than or equal to the second utilization rate threshold, increasing the operating frequency of the CPU.
5. The method of claim 1, further comprising:
acquiring the disk input/output rate, the memory usage and the size of the training data set, and judging whether the training data set has been cached in the memory based on them.
6. The method of claim 5, wherein determining whether the training data set has been cached in memory comprises:
judging whether the disk input/output rate is less than an input/output rate threshold; and
in response to the disk input/output rate being less than the input/output rate threshold, judging whether the size of the cache in the memory has stopped increasing and is larger than the size of the training data set.
7. A system for dynamically optimizing Caffe performance, comprising:
a sampling module configured to acquire the utilization rates and temperatures of the CPU and the GPU at intervals of a first preset time, and judge whether the heat dissipation requirement is met according to the temperatures of the CPU and the GPU;
a judging module configured to, in response to the heat dissipation requirement being met, judge whether the utilization rate of the CPU is less than or equal to a utilization rate threshold;
an analysis module configured to, in response to the utilization rate of the CPU being less than or equal to the utilization rate threshold, calculate a coefficient of variation of the GPU utilization rate within a second preset time, and judge whether a performance bottleneck exists based on the coefficient of variation; and
a processing module configured to increase the batch size and the number of data transfer threads in response to the existence of a performance bottleneck.
8. The system of claim 7, wherein the sampling module is further configured to:
judge whether the temperature of the CPU is less than a first temperature threshold; and
judge whether the temperature of the GPU is less than a second temperature threshold.
9. A computer device, comprising:
at least one processor; and
a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of the method of any one of claims 1 to 5.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
CN202010007563.9A 2020-01-04 2020-01-04 Method, system, device and medium for dynamically optimizing Caffe performance Withdrawn CN111221715A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010007563.9A CN111221715A (en) 2020-01-04 2020-01-04 Method, system, device and medium for dynamically optimizing Caffe performance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010007563.9A CN111221715A (en) 2020-01-04 2020-01-04 Method, system, device and medium for dynamically optimizing Caffe performance

Publications (1)

Publication Number Publication Date
CN111221715A true CN111221715A (en) 2020-06-02

Family

ID=70806247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010007563.9A Withdrawn CN111221715A (en) 2020-01-04 2020-01-04 Method, system, device and medium for dynamically optimizing Caffe performance

Country Status (1)

Country Link
CN (1) CN111221715A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112612349A (en) * 2020-12-18 2021-04-06 苏州浪潮智能科技有限公司 Method and equipment for increasing CPU heat dissipation efficiency
CN114138449A (en) * 2021-12-14 2022-03-04 河南省儿童医院郑州儿童医院 Rehabilitation training system based on virtual reality

Similar Documents

Publication Publication Date Title
CN110032449A (en) A kind of method and device for the performance optimizing GPU server
Liu et al. Monitoring and analyzing big traffic data of a large-scale cellular network with Hadoop
CN109688207B (en) Log transmission method and device and server
CN111901377B (en) AI training platform-based file transmission method, device, equipment and medium
CN111221715A (en) Method, system, device and medium for dynamically optimizing Caffe performance
CN110413475B (en) Method and device for correcting server power consumption measured value
CN111258856A (en) Method, system, equipment and medium for monitoring running state of solid state disk
US20220303198A1 (en) Method and apparatus for detecting anomaly of traffic of internet of things device based on automata
CN111324533A (en) A/B test method and device and electronic equipment
CN111858284A (en) Resource monitoring method and device for artificial intelligence server
CN103177080B (en) The method and apparatus that file pre-reads
CN114035748A (en) Data file access method and system
CN111078497A (en) Data storage method, equipment and storage medium of BMC (baseboard management controller)
CN114579533A (en) Method and device for acquiring user activity index, electronic equipment and storage medium
CN111309264B (en) Method, system, device and medium for making directory quota compatible with snapshot
CN111176932B (en) Method and device for recording abnormal event log and readable medium
CN112380088A (en) Test method and device and electronic equipment
WO2020232903A1 (en) Monitoring task dynamic adjustment method and apparatus, and computer device and storage medium
US20170235717A1 (en) Method and Unit for Building Semantic Rule for a Semantic Data
CN107562790B (en) Method and system for realizing batch warehousing of data processing
CN111338811A (en) User writing behavior analysis method, server, terminal, system and electronic equipment
KR20210000041A (en) Method and apparatus for analyzing log data in real time
CN114218586B (en) Business data intelligent management method and device, electronic equipment and storage medium
WO2021244441A1 (en) Service configuration method and apparatus therefor
CN112367384B (en) Kafka cluster-based dynamic speed limiting method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200602