CN111045912B - AI application performance evaluation method, device and related equipment - Google Patents

AI application performance evaluation method, device and related equipment

Info

Publication number
CN111045912B
Authority
CN
China
Prior art keywords
application
performance
target
data
gpu
Prior art date
Legal status
Active
Application number
CN201911386452.7A
Other languages
Chinese (zh)
Other versions
CN111045912A (en)
Inventor
Wang Pengfei (王鹏飞)
Current Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd filed Critical Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201911386452.7A priority Critical patent/CN111045912B/en
Publication of CN111045912A publication Critical patent/CN111045912A/en
Application granted granted Critical
Publication of CN111045912B publication Critical patent/CN111045912B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3419Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time
    • G06F11/3423Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time where the assessed time is active or idle time
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/865Monitoring of software

Abstract

The application discloses an AI application performance evaluation method comprising: starting a target AI application according to a received evaluation instruction; calling NVPROF to monitor the running process of the target AI application to obtain first performance index data; calling TEYE to monitor the running process of the target AI application to obtain second performance index data; and calculating a performance evaluation value of the target AI application from the first and second performance index data using a preset performance index calculation rule. The method evaluates AI application performance more effectively, improving evaluation efficiency while maintaining high accuracy. The application also discloses an AI application performance evaluation device, equipment and a computer-readable storage medium with the same beneficial effects.

Description

AI application performance evaluation method, device and related equipment
Technical Field
The present application relates to the field of network application technologies, and in particular, to an AI application performance evaluation method, and further, to an AI application performance evaluation device, an AI application performance evaluation apparatus, and a computer-readable storage medium.
Background
AI (Artificial Intelligence) applications broadly fall into two categories: training and inference. AI training produces a deep learning model for feature extraction or target prediction; the more efficiently a model is generated, the faster it reaches production practice. AI inference uses the trained model to carry out scenario tasks; the higher the inference efficiency, the better the user experience and the more efficient the system. Tracking, monitoring and analyzing AI application performance is therefore important, as it guides subsequent improvements to training and inference efficiency.
Conventional AI applications are generally developed on deep learning frameworks such as PyTorch and TensorFlow. When such an application is tracked and monitored, either a Debug-based code positioning method is used to analyze the call time of each functional module, or the NVIDIA-SMI tool is used to determine GPU core usage. However, because deep learning frameworks are heavily encapsulated, it is difficult to locate low-level call durations and functions; and because module-by-module positioning analysis is time-consuming, it is difficult to grasp the relationship between the whole and its parts. For these two reasons, current AI application performance analysis lacks a systematic method and relies largely on the experience of development personnel for tuning, which is inefficient and inaccurate.
Therefore, how to perform more effective performance evaluation on the AI application, and improve the performance evaluation efficiency while ensuring higher accuracy is a problem to be urgently solved by those skilled in the art.
Disclosure of Invention
An object of the present application is to provide an AI application performance evaluation method that can evaluate AI application performance more effectively, improving evaluation efficiency while ensuring high accuracy; another object of the present application is to provide an AI application performance evaluation apparatus, system, device, and computer-readable storage medium with the same advantageous effects.
In order to solve the above technical problem, the present application provides an AI application performance evaluation method, where the AI application performance evaluation method includes:
starting a target AI application according to the received evaluation instruction;
calling NVPROF to monitor the running process of the target AI application to obtain first performance index data;
calling TEYE to monitor the running process of the target AI application to obtain second performance index data;
and calculating the first performance index data and the second performance index data by using a preset performance index calculation rule to obtain a performance evaluation value of the target AI application.
Preferably, the starting the target AI application according to the received evaluation instruction includes:
determining the target AI application according to the evaluation instruction;
and starting the target AI application so that the target AI application calls a preset data set to perform specified model training.
Preferably, the calling TEYE to monitor the running process of the target AI application to obtain second performance index data includes:
and calling the TEYE, monitoring the running process of the target AI application in a TIMELINE mode, and obtaining the second performance index data.
Preferably, the first performance indicator data includes GPU data transmission time and GPU data calculation time.
Preferably, the second performance indicator data includes GPU core utilization and GPU bandwidth utilization.
Preferably, the calculating the first performance index data and the second performance index data by using a preset performance index calculation rule to obtain the performance evaluation value of the target AI application includes:
calculating according to the GPU data transmission time and the GPU data calculation time to obtain a time index value;
calculating to obtain a hardware index value according to the GPU core utilization rate and the GPU bandwidth utilization rate;
and calculating to obtain the performance evaluation value according to the time index value and the hardware index value.
Preferably, the AI application performance evaluation method further includes:
comparing and analyzing each performance index data with a corresponding preset threshold value; wherein the performance indicator data comprises the first performance indicator data and the second performance indicator data;
and when the performance index data does not meet the preset threshold, calling a corresponding preset optimization strategy to perform performance optimization on the target AI application.
In order to solve the above technical problem, the present application further provides an AI application performance evaluation apparatus, including:
the AI application starting module is used for starting the target AI application according to the received evaluation instruction;
the first index calculation module is used for calling the NVPROF to monitor the running process of the target AI application to obtain first performance index data;
the second index calculation module is used for calling TEYE to monitor the running process of the target AI application and obtain second performance index data;
and the performance evaluation module is used for calculating the first performance index data and the second performance index data by using a preset performance index calculation rule to obtain a performance evaluation value of the target AI application.
In order to solve the above technical problem, the present application further provides an AI application performance evaluation device, including:
a memory for storing a computer program;
a processor for implementing the steps of any one of the above-described AI application performance evaluation methods when executing the computer program.
To solve the above technical problem, the present application further provides a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements the steps of any one of the above AI application performance evaluation methods.
The AI application performance evaluation method comprises the steps of starting a target AI application according to a received evaluation instruction; calling NVPROF to monitor the running process of the target AI application to obtain first performance index data; calling TEYE to monitor the running process of the target AI application to obtain second performance index data; and calculating the first performance index data and the second performance index data by using a preset performance index calculation rule to obtain a performance evaluation value of the target AI application.
Therefore, the AI application performance evaluation method provided by the application monitors the AI application during operation through NVPROF and TEYE, and calculates the monitored performance index data under a preset performance index calculation rule to obtain the performance evaluation value of the AI application, thereby realizing performance evaluation. This implementation requires no debugging of code modules, which effectively lowers the threshold for performance positioning analysis; it can also analyze the AI application's API calls as a whole and the proportion occupied by each single operation call, giving a more comprehensive and accurate detection of system performance. In addition, it does not depend on manual experience, which effectively saves labor while ensuring high performance evaluation efficiency and accuracy.
The AI application performance evaluation device, the AI application performance evaluation apparatus, and the computer-readable storage medium provided by the present application all have the above beneficial effects, and are not described herein again.
Drawings
To more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of an AI application performance evaluation method provided in the present application;
fig. 2 is a schematic structural diagram of an AI application performance evaluation apparatus provided in the present application;
fig. 3 is a schematic structural diagram of an AI application performance evaluation device provided in the present application.
Detailed Description
The core of the application is to provide an AI application performance evaluation method that can evaluate AI application performance more effectively, improving evaluation efficiency while ensuring high accuracy; another core of the present application is to provide an AI application performance evaluation apparatus, device, and computer-readable storage medium with the same advantageous effects.
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a schematic flow chart of an AI application performance evaluation method provided in the present application, where the AI application performance evaluation method may include:
s101: starting a target AI application according to the received evaluation instruction;
the step aims to realize the starting of a target AI application, namely the AI application needing performance evaluation, and because the performance evaluation of the AI application needs the AI application to be in a running state, when the performance evaluation of a certain AI application is needed, the AI application needs to be started first. Specifically, when the evaluation instruction is received, the relevant information of the target AI application can be acquired from the evaluation instruction to determine the target AI application, so as to start the AI application.
Preferably, the starting the target AI application according to the received evaluation instruction may include: determining a target AI application according to the evaluation instruction; and starting the target AI application so that the target AI application calls a preset data set to perform specified model training.
This preferred embodiment provides a more specific starting method for the target AI application. AI applications are generally used for model training and inference, so the target AI application can be started to train a designated model. The training process is realized with a preset data set, i.e., a set of pre-collected data information stored in a preset storage space that can be called directly. The type of data in the preset data set is not unique; it corresponds to the type of the model to be trained (the designated model). Likewise, the type of the model to be trained does not affect the technical solution and may be any network model, such as a CNN (Convolutional Neural Network) model.
S102: calling NVPROF to monitor the running process of the target AI application to obtain first performance index data;
This step obtains the first performance index data: NVPROF can be called directly to monitor the target AI application while it runs, collecting the corresponding performance index data. Specifically, NVPROF is a profiling tool for GPU applications (CUDA, OpenACC, and the like) used to test, understand and optimize application performance. It lets users collect and view profiling data from the command line and is mainly used to measure a program's time performance.
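The command-line invocation described above can be sketched in Python as follows; the application command (`train.py`) and log-file path are placeholder assumptions, and `--log-file` is nvprof's flag for writing the profiler output to a file:

```python
# Hedged sketch: running the target AI application under nvprof from Python.
import subprocess

def build_nvprof_cmd(app_cmd, log_path="nvprof.log"):
    """Compose the nvprof invocation that wraps the application command."""
    return ["nvprof", "--log-file", log_path] + list(app_cmd)

def profile_app(app_cmd, log_path="nvprof.log"):
    """Launch the application under nvprof and return the captured log text."""
    subprocess.run(build_nvprof_cmd(app_cmd, log_path), check=True)
    with open(log_path) as f:
        return f.read()
```

For example, `profile_app(["python", "train.py"])` would run the training script under the profiler and return the summary log for later parsing.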
Preferably, the first performance indicator data may include a GPU data transmission time and a GPU data calculation time.
This preferred embodiment provides specific types of first performance index data, namely GPU data transmission time and GPU data calculation time. The GPU data transmission time may be the time spent transferring data between the functional modules of the system; the GPU data calculation time may be the time spent by each functional module on data calculation. These types are only one implementation of the preferred embodiment and are not exclusive; other types of performance indexes may also be included, which the present application does not limit.
S103: calling TEYE to monitor the running process of the target AI application to obtain second performance index data;
This step obtains the second performance index data: TEYE can be called directly to monitor the target AI application while it runs, collecting the corresponding performance index data. Specifically, TEYE (a feature monitoring and analysis system) is a high-performance application feature extraction tool based on X86 servers. It is mainly used to capture system resource occupation while a high-performance application runs on a large-scale cluster and to reflect the application's running features in real time, helping users exploit the computing potential of the application on the existing platform to the fullest and providing scientific guidance for system optimization, application optimization, and algorithm adjustment and improvement.
Preferably, the invoking TEYE to monitor the running process of the target AI application to obtain the second performance index data may include: calling TEYE, monitoring the running process of the target AI application in TIMELINE (time axis) mode, and obtaining the second performance index data.
This preferred embodiment provides a more specific method for acquiring the second performance index data, implemented in the TIMELINE mode. During performance monitoring, performance analysis items can be refined by locating the utilization of the different GPU modules. For example, during model training the system can be divided into a data input reading module, a data preprocessing module, a network model forward computation module, a back propagation and optimization module, and a model transmission and storage module; the idleness of the GPU device while a process runs each functional module can then be judged from the sub-module timeline, so that performance bottlenecks are discovered in time.
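Judging per-module GPU idleness from timeline samples could be sketched as below; the sample format (module name paired with a sampled GPU utilization percentage) and the 10% idle threshold are illustrative assumptions, not taken from the patent:

```python
# Hypothetical sketch: per-module GPU idle rate from TIMELINE-style samples.
def module_idle_rates(samples, idle_threshold=10.0):
    """samples: iterable of (module_name, gpu_util_percent) pairs.
    Returns {module: fraction of samples during which the GPU was idle}."""
    totals, idles = {}, {}
    for module, util in samples:
        totals[module] = totals.get(module, 0) + 1
        if util < idle_threshold:
            idles[module] = idles.get(module, 0) + 1
    return {m: idles.get(m, 0) / totals[m] for m in totals}
```

A module with a high idle rate (e.g., data input reading) would then be flagged as a likely performance bottleneck.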
Preferably, the second performance indicator data may include GPU core utilization and GPU bandwidth utilization.
This preferred embodiment provides specific types of second performance index data, namely GPU core utilization and GPU bandwidth utilization. As above, these types are only one implementation of the preferred embodiment and are not exclusive; other performance indexes may also be included, such as memory bandwidth utilization, video memory bandwidth utilization, CPU utilization, PCIE bandwidth utilization, and hard disk read/write indexes, which the present application does not limit.
S104: and calculating the first performance index data and the second performance index data by using a preset performance index calculation rule to obtain a performance evaluation value of the target AI application.
Specifically, after the first and second performance index data are obtained by monitoring, a preset performance index calculation rule can be called and the data substituted into it to obtain the performance evaluation value of the target AI application. The preset performance index calculation rule is a pre-established rule for AI application performance evaluation calculation; it is stored in a preset storage space and can be called directly when needed. Its specific content is not unique and may be set by technicians according to actual requirements; for example, it may correspond to a system type or application platform, which the present application does not limit.
Preferably, the calculating the first performance index data and the second performance index data by using the preset performance index calculation rule to obtain the performance evaluation value of the target AI application may include: calculating according to the GPU data transmission time and the GPU data calculation time to obtain a time index value; calculating according to the GPU core utilization rate and the GPU bandwidth utilization rate to obtain a hardware index value; and calculating according to the time index value and the hardware index value to obtain a performance evaluation value.
This preferred embodiment provides a more specific method for calculating the performance evaluation value, based on GPU data transmission time, GPU data calculation time, GPU core utilization and GPU bandwidth utilization: a time index value for the target AI application during running is calculated from the GPU data transmission time and GPU data calculation time; a hardware index value is calculated from the GPU core utilization and GPU bandwidth utilization; finally, combining the time index and the hardware index yields the overall performance evaluation of the target AI application.
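The two index values could be computed along the following lines. The patent's exact calculation rules are not reproduced in this text, so the formulas below are plausible stand-ins (transfer share of total GPU time; time-weighted product of utilizations), labeled as assumptions:

```python
# Illustrative only: assumed forms for the time index and hardware index.
def time_index(gpu_calc_time, gpu_transfer_time):
    """Assumed form: fraction of total GPU time spent on data transfer."""
    return gpu_transfer_time / (gpu_calc_time + gpu_transfer_time)

def hardware_index(modules):
    """Assumed form: time-weighted product of GPU core and bandwidth
    utilization. modules: list of (core_util, bw_util, run_time),
    utilizations in [0, 1]."""
    total = sum(t for _, _, t in modules)
    return sum(core * bw * t for core, bw, t in modules) / total
```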
Preferably, the AI application performance evaluation method may further include: comparing and analyzing each performance index data with a corresponding preset threshold value; wherein the performance indicator data comprises first performance indicator data and second performance indicator data; and when the performance index data does not meet the preset threshold value, calling a corresponding preset optimization strategy to perform performance optimization on the target AI application.
This preferred embodiment implements performance optimization of the AI application. Specifically, after each performance index is obtained by monitoring, it can be compared with the corresponding standard threshold, i.e., the preset threshold. When the performance index data does not meet its preset threshold, the corresponding performance is inadequate and needs optimizing, and the corresponding optimization strategy can be called. For example, an optimization strategy table may be established in advance, mapping each performance index to an optimization strategy; after the performance index to be optimized is determined, the corresponding strategy is called directly from the table, thereby optimizing the target AI application.
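A pre-built optimization strategy table of the kind described could look like this; the indicator names, threshold values, and strategy strings are illustrative assumptions:

```python
# Hypothetical optimization strategy table: indicator -> threshold/strategy.
THRESHOLDS = {"gpu_core_util": 0.85, "gpu_bw_util": 0.70}
STRATEGIES = {
    "gpu_core_util": "increase batch size / use mixed-precision training",
    "gpu_bw_util": "optimize the data input pipeline (reduce I/O)",
}

def pick_optimizations(metrics):
    """Return the strategy for every indicator below its preset threshold."""
    return {k: STRATEGIES[k] for k, v in metrics.items()
            if k in THRESHOLDS and v < THRESHOLDS[k]}
```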
According to the AI application performance evaluation method provided by the application, the AI application is monitored during operation through NVPROF and TEYE, and the monitored performance index data are calculated under a preset performance index calculation rule to obtain the performance evaluation value, thereby realizing performance evaluation. In addition, this implementation does not depend on manual experience, which effectively saves labor while ensuring high performance evaluation efficiency and accuracy.
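The S101–S104 flow can be sketched end-to-end as follows; the callables for starting the application, collecting NVPROF/TEYE data, and scoring are hypothetical placeholders for the patent's tools:

```python
# Minimal sketch of the claimed four-step flow (S101-S104).
def evaluate_ai_application(start_app, collect_nvprof, collect_teye, score):
    app = start_app()              # S101: start the target AI application
    first = collect_nvprof(app)    # S102: NVPROF -> first performance index data
    second = collect_teye(app)     # S103: TEYE  -> second performance index data
    return score(first, second)    # S104: preset rule -> performance evaluation value
```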
On the basis of the above embodiments, and taking CNN model training as an example, this embodiment of the present application provides a more specific AI application performance evaluation method: CNN model training is performed by the AI application, and the training process is monitored and analyzed. The test platform is an X86 Tesla GPU platform, and the basic software environment on which the test depends includes NVIDIA-DRIVER, CUDA, CUDNN, TensorFlow, TEYE, NVPROF, and the like. The specific implementation flow of the AI application performance monitoring method is as follows:
(1) acquiring index data based on NVPROF:
First, a basic CNN training model algorithm is implemented on the TensorFlow framework and a test data set is called; in this embodiment a ResNet50 model and the cifar10 data set are selected for the AI application training test. The NVPROF command is then used on top of the existing run command, and the key items tested (the first performance index data) include: kernels of the form volta_sgemm(cudnn)_x, which represent the basic pattern of APIs performing mathematical operations on the GPU (GPU data calculation time). For example, in volta_scudnn_128x64_relu_interior_nn_v1, volta denotes the computing device, scudnn the computing library cuDNN, 128x64 the matrix size, relu the computation type, and nn the name of the developing library. CUDA memcpy entries represent data communication (GPU data transfer time). Referring to table 1, table 1 is the index data table obtained based on NVPROF monitoring provided by the present application:
TABLE 1 index data sheet obtained based on NVPROF monitoring
[Table 1 appears as images in the original publication and is not reproduced here.]
Table 1 shows the important NVPROF feature results. Analyzing them, it can be found that the data transmission proportion is large, but the transmission occurs before and after each module program runs, which causes the GPU usage rate to fluctuate regularly. The CUDA memcpy PtoP time occupies little, proving that communication between devices is not a bottleneck; from these data the communication time consumed by memory bandwidth can be determined, so the key point is the I/O optimization of the data. From Table 1, GPU data calculation accounts for about 74.1% of GPU_time and GPU data transmission for about 25.9%, then:
P1 = GPU_time2 / (GPU_time1 + GPU_time2)
wherein P1 is the time index value, GPU_time1 is the GPU data calculation time, and GPU_time2 is the GPU data transfer time.
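The embodiment's arithmetic can be reproduced under the assumption that the time index P1 is the data-transfer share of total GPU time, using the 74.1% / 25.9% split measured above:

```python
# Reproducing the embodiment's time-index arithmetic (assumed formula).
gpu_time_calc = 74.1   # GPU data calculation share of GPU_time (%)
gpu_time_xfer = 25.9   # GPU data transfer share of GPU_time (%)
p1 = gpu_time_xfer / (gpu_time_calc + gpu_time_xfer)
print(round(p1, 3))  # 0.259
```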
(2) Index data acquisition based on TEYE:
While the AI application runs, the TEYE tool in TIMELINE mode is used to extract key feature data (the second performance index data), which may specifically include: memory bandwidth detection (Mem_total_bw_GB), PCIE bandwidth detection (PCIE_WR_bw_GB), GPU core usage detection (GPU_rate), GPU bandwidth usage detection (GPU_Mem_rate), and GPU power consumption detection (GPU_Power). If neither the memory bandwidth nor the PCIE bandwidth reaches its practical peak, the video memory bandwidth utilization indirectly cannot exceed 70%; the video memory bandwidth utilization reflects, to a certain extent, the utilization efficiency of the GPU, i.e., the efficiency of the AI application. When the GPU utilization and GPU power consumption fluctuate between 60% and 90% of their peaks, there is room for efficiency improvement; at this point improving computation or communication performance is the key to improving AI application performance, e.g., by increasing the batch size, changing the data reading mode, or using mixed-precision training. Further, performance analysis items are refined by locating the utilization of the different GPU modules, which can be divided into: 1. a data input reading module; 2. a data preprocessing module; 3. a CNN or neural network forward computation module; 4. a back propagation and optimization module; 5. a model transmission and storage module. The idle rate of the GPU device while a process runs each module can be judged from the sub-module timeline so as to discover performance bottlenecks. Please refer to table 2; table 2 is the index data table obtained based on TEYE monitoring provided in the present application:
TABLE 2 index data sheet based on TEYE monitoring
[Table 2 appears as an image in the original publication and is not reproduced here.]
From the important TEYE feature results in Table 2, the GPU_rate, GPU_mem_rate, and time data of the 5 modules can be obtained, then:
P2 = Σ(i=1..n) GPU_rate_i × GPU_mem_rate_i × (time_i / time)
n = 5; i ∈ [1, n]
wherein P2 is the hardware index value; GPU_rate_i is the GPU core utilization and GPU_mem_rate_i the GPU bandwidth utilization while module i runs; n is the total number of modules; time_i is the running time of the i-th module; and time is the total duration.
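The hardware index over the five modules can be sketched as below, assuming a time-weighted utilization form for P2; the per-module numbers are made up for illustration, since Table 2 is not reproduced here:

```python
# Computing the hardware index P2 over the five modules (assumed form,
# illustrative numbers): (gpu_core_util, gpu_bw_util, run_time_seconds).
modules = [
    (0.90, 0.60, 12.0),  # 1. data input reading
    (0.40, 0.30, 4.0),   # 2. data preprocessing
    (0.95, 0.70, 30.0),  # 3. forward computation
    (0.92, 0.65, 25.0),  # 4. back propagation and optimization
    (0.20, 0.10, 2.0),   # 5. model transmission and storage
]
total_time = sum(t for _, _, t in modules)
p2 = sum(core * bw * t for core, bw, t in modules) / total_time
```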
(3) Overall performance evaluation:
A preset performance index calculation rule is called to perform an overall analysis on the two sets of data; the rule is as follows:
(The formula for the performance evaluation value P appears in the original publication only as an image, Figure BDA0002343762560000102.)

i ∈ [1, n]
The P value is the performance evaluation value and can characterize the GPU computing performance: when the P value is lower than 0.2, the computation can be considered sufficient and the computation efficiency high; otherwise, the efficiency is low.
Thus, based on the calculation data in Tables 1 and 2, the value of P is 0.17, which is less than 0.2, indicating that the AI application makes full use of the GPU device.
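The preset rule that combines the two data sets into P is likewise shown only as an image, so the sketch below is a hypothetical stand-in: it blends a time index p1 with the shortfall of the hardware index p2 and applies the 0.2 threshold from the text. The weighting scheme and the example inputs are invented for illustration.

```python
def overall_performance(p1, p2, weight=0.5, threshold=0.2):
    """Hypothetical overall evaluation value P (lower is better).
    p1: time index value (e.g. share of GPU time spent on data transfer);
    p2: hardware index value (utilization-based, in [0, 1]).
    The actual preset rule appears only as an image in the source."""
    p = weight * p1 + (1 - weight) * (1 - p2)
    verdict = "computation sufficient, high efficiency" if p < threshold else "low efficiency"
    return p, verdict

# Illustrative indices chosen so that P comes out at the 0.17 of the text:
p, verdict = overall_performance(p1=0.14, p2=0.80)
```

The threshold test mirrors the document: any P below 0.2 is read as the GPU being used sufficiently.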
(4) Performance optimization:
According to experience, when the GPU core utilization rate is 85%-100%, the device can be considered fully used. If the proportion of computation operations is large, improving AI application performance must rely on the computing power of new devices; if the total running time of CUDA memcpy HtoD operations is large, the data input flow should be further optimized, and the memory bandwidth and PCIE bandwidth should be checked for bottlenecks. When the GPU core utilization rate is lower than 85% most of the time, indicating that the computing device is not fully used, it is necessary to increase computation or communication, for example by increasing the batch size, increasing the number of computation layers, reducing data IO, or using a more efficient data communication method.
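The decision rules in the paragraph above can be summarized as a small rule-of-thumb advisor. The function and its inputs (a sampled history of GPU core utilization plus aggregate memcpy and compute times) are hypothetical; only the 85% threshold and the suggested remedies come from the text.

```python
def optimization_advice(core_rate_history, memcpy_time_s, compute_time_s):
    """Map the empirical thresholds from the text to concrete suggestions.
    core_rate_history: sampled GPU core utilization values in [0, 1]."""
    busy_share = sum(1 for r in core_rate_history if r >= 0.85) / len(core_rate_history)
    advice = []
    if busy_share >= 0.5:  # at or above 85% most of the time: device fully used
        advice.append("device fully used; further gains need stronger hardware")
        if memcpy_time_s > compute_time_s:  # data copies dominate the runtime
            advice.append("optimize the data input flow; check memory and PCIE bandwidth for bottlenecks")
    else:  # computing device underused: raise computation or communication load
        advice.append("increase batch size, add computation layers, reduce data IO, "
                      "or use a more efficient data communication method")
    return advice
```

For example, a run that spends most samples below 85% core utilization yields only the "increase computation" suggestion.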
It can be seen that the AI application performance evaluation method provided by the embodiments of the present application monitors the performance of an AI application during its running process through NVPROF and TEYE, and calculates the monitored performance index data based on a preset performance index calculation rule, thereby obtaining a performance evaluation value of the AI application and realizing performance evaluation. This implementation does not require debugging the code module, which effectively lowers the threshold for performance localization and analysis; it can also analyze as a whole how the AI application calls APIs and the proportion taken by each single operation call, so that the detection result of system performance is obtained more comprehensively and accurately. In addition, this implementation does not rely on manual experience, so labor can be effectively saved while higher performance evaluation efficiency and accuracy are guaranteed.
To solve the above problem, please refer to fig. 2, fig. 2 is a schematic structural diagram of an AI application performance evaluation apparatus provided in the present application, where the AI application performance evaluation apparatus may include:
an AI application starting module 100, configured to start a target AI application according to a received evaluation instruction;
the first index calculation module 200 is configured to invoke the NVPROF to monitor an operation process of the target AI application, and obtain first performance index data;
the second index calculation module 300 is configured to call TEYE to monitor an operation process of the target AI application, and obtain second performance index data;
the performance evaluation module 400 is configured to calculate the first performance index data and the second performance index data by using a preset performance index calculation rule, so as to obtain a performance evaluation value of the target AI application.
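The four modules of Fig. 2 can be sketched as one class whose methods mirror modules 100-400. The profiler calls are stubbed with placeholder numbers — actually driving NVPROF and TEYE is outside this sketch — and the final combination rule is an assumption, since the source gives it only as an image.

```python
class AIAppEvaluator:
    """Sketch of the apparatus in Fig. 2; all numeric data are placeholders."""

    def start_app(self, evaluation_instruction):        # AI application starting module 100
        self.target = evaluation_instruction["target_app"]

    def first_index_data(self):                         # first index calculation module 200 (NVPROF)
        return {"transfer_s": 1.4, "compute_s": 8.6}    # GPU data transfer / calculation time

    def second_index_data(self):                        # second index calculation module 300 (TEYE)
        return {"gpu_rate": 0.92, "gpu_mem_rate": 0.71}

    def evaluate(self):                                 # performance evaluation module 400
        d1, d2 = self.first_index_data(), self.second_index_data()
        p1 = d1["transfer_s"] / (d1["transfer_s"] + d1["compute_s"])  # time index value
        p2 = d2["gpu_rate"] * d2["gpu_mem_rate"]                      # hardware index value
        return 0.5 * p1 + 0.5 * (1 - p2)  # assumed combination rule, not the patented one

evaluator = AIAppEvaluator()
evaluator.start_app({"target_app": "resnet50_training"})  # hypothetical target name
p = evaluator.evaluate()
```

The split into four methods matches the module boundaries of the apparatus; swapping the stubbed returns for real profiler output would not change the control flow.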
It can be seen that the AI application performance evaluation device provided by the embodiments of the present application monitors the performance of an AI application during its running process through NVPROF and TEYE, and calculates the monitored performance index data based on a preset performance index calculation rule, thereby obtaining a performance evaluation value of the AI application and realizing performance evaluation. This implementation does not require debugging the code module, which effectively lowers the threshold for performance localization and analysis; it can also analyze as a whole how the AI application calls APIs and the proportion taken by each single operation call, so that the detection result of system performance is obtained more comprehensively and accurately. In addition, this implementation does not rely on manual experience, so labor can be effectively saved while higher performance evaluation efficiency and accuracy are guaranteed.
As a preferred embodiment, the AI application starting module 100 may be specifically configured to determine the target AI application according to the evaluation instruction, and to start the target AI application so that the target AI application calls a preset data set to perform specified model training.
As a preferred embodiment, the second index calculating module 300 may be specifically configured to invoke TEYE and monitor the running process of the target AI application in TIMELINE mode to obtain the second performance index data.
As a preferred embodiment, the first performance indicator data may include a GPU data transmission time and a GPU data calculation time.
In a preferred embodiment, the second performance indicator data may include GPU core utilization and GPU bandwidth utilization.
As a preferred embodiment, the performance evaluation module 400 may include:
the time index calculation unit is used for calculating and obtaining a time index value according to the GPU data transmission time and the GPU data calculation time;
the hardware index calculation unit is used for calculating and obtaining a hardware index value according to the GPU core utilization rate and the GPU bandwidth utilization rate;
and the performance evaluation unit is used for calculating and obtaining a performance evaluation value according to the time index value and the hardware index value.
As a preferred embodiment, the AI application performance evaluation device may further include a performance optimization module, configured to compare and analyze each performance index data with a corresponding preset threshold; wherein the performance indicator data comprises first performance indicator data and second performance indicator data; and when the performance index data does not meet the preset threshold value, calling a corresponding preset optimization strategy to perform performance optimization on the target AI application.
For the introduction of the apparatus provided in the present application, please refer to the above method embodiments, which are not described herein again.
To solve the above problem, please refer to fig. 3, fig. 3 is a schematic structural diagram of an AI application performance evaluation device provided in the present application, where the AI application performance evaluation device may include:
a memory 1 for storing a computer program;
a processor 2, configured to implement the steps of any one of the AI application performance evaluation methods described above when executing the computer program.
For the introduction of the device provided in the present application, please refer to the above method embodiment, which is not described herein again.
To solve the above problem, the present application also provides a computer-readable storage medium having a computer program stored thereon, where the computer program can be executed by a processor to implement any one of the steps of the AI application performance evaluation method.
The computer-readable storage medium may include: various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
For the introduction of the computer-readable storage medium provided in the present application, please refer to the above method embodiments, which are not described herein again.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), flash memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The AI application performance evaluation method, apparatus, device, and computer-readable storage medium provided by the present application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that those skilled in the art may make several improvements and modifications to the present application without departing from the principle of the present application, and these improvements and modifications also fall within the protection scope of the claims of the present application.

Claims (6)

1. An AI application performance evaluation method, comprising:
starting a target AI application according to the received evaluation instruction;
calling NVPROF to monitor the running process of the target AI application to obtain first performance index data;
calling TEYE to monitor the running process of the target AI application to obtain second performance index data;
calculating the first performance index data and the second performance index data by using a preset performance index calculation rule to obtain a performance evaluation value of the target AI application;
the calling TEYE monitors the running process of the target AI application to obtain second performance index data, and the method comprises the following steps:
calling the TEYE, monitoring the running process of the target AI application in a TIMELINE mode, and obtaining the second performance index data;
the first performance index data comprises GPU data transmission time and GPU data calculation time;
the second performance index data comprises GPU core utilization and GPU bandwidth utilization;
the calculating the first performance index data and the second performance index data by using a preset performance index calculation rule to obtain the performance evaluation value of the target AI application includes:
calculating according to the GPU data transmission time and the GPU data calculation time to obtain a time index value;
calculating to obtain a hardware index value according to the GPU core utilization rate and the GPU bandwidth utilization rate;
and calculating to obtain the performance evaluation value according to the time index value and the hardware index value.
2. The AI application performance evaluation method of claim 1, wherein the launching of the target AI application in accordance with the received evaluation instruction comprises:
determining the target AI application according to the evaluation instruction;
and starting the target AI application so that the target AI application calls a preset data set to perform specified model training.
3. The AI application performance evaluation method of claim 1 or 2, further comprising:
comparing and analyzing each performance index data with a corresponding preset threshold value; wherein the performance indicator data comprises the first performance indicator data and the second performance indicator data;
and when the performance index data does not meet the preset threshold, calling a corresponding preset optimization strategy to perform performance optimization on the target AI application.
4. An AI application performance evaluation apparatus, comprising:
the AI application starting module is used for starting the target AI application according to the received evaluation instruction;
the first index calculation module is used for calling the NVPROF to monitor the running process of the target AI application to obtain first performance index data;
the second index calculation module is used for calling TEYE to monitor the running process of the target AI application and obtain second performance index data;
the performance evaluation module is used for calculating the first performance index data and the second performance index data by using a preset performance index calculation rule to obtain a performance evaluation value of the target AI application;
the second index calculation module is specifically configured to invoke the TEYE, and monitor an operation process of the target AI application in TIMELINE mode to obtain the second performance index data;
the first performance index data comprises GPU data transmission time and GPU data calculation time;
the second performance index data comprises GPU core utilization and GPU bandwidth utilization;
the performance evaluation module is specifically used for calculating and obtaining a time index value according to the GPU data transmission time and the GPU data calculation time; calculating to obtain a hardware index value according to the GPU core utilization rate and the GPU bandwidth utilization rate; and calculating to obtain the performance evaluation value according to the time index value and the hardware index value.
5. An AI application performance evaluation device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the AI application performance evaluation method according to any one of claims 1 to 3 when executing the computer program.
6. A computer-readable storage medium, characterized in that a computer program is stored thereon, which, when being executed by a processor, carries out the steps of the AI application performance evaluation method according to any one of claims 1 to 3.
CN201911386452.7A 2019-12-29 2019-12-29 AI application performance evaluation method, device and related equipment Active CN111045912B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911386452.7A CN111045912B (en) 2019-12-29 2019-12-29 AI application performance evaluation method, device and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911386452.7A CN111045912B (en) 2019-12-29 2019-12-29 AI application performance evaluation method, device and related equipment

Publications (2)

Publication Number Publication Date
CN111045912A CN111045912A (en) 2020-04-21
CN111045912B true CN111045912B (en) 2022-03-22

Family

ID=70241155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911386452.7A Active CN111045912B (en) 2019-12-29 2019-12-29 AI application performance evaluation method, device and related equipment

Country Status (1)

Country Link
CN (1) CN111045912B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113742243B (en) * 2021-09-17 2024-03-01 京东科技信息技术有限公司 Application evaluation method, device, electronic equipment and computer readable medium
CN116701125B (en) * 2023-07-31 2023-10-27 太初(无锡)电子科技有限公司 Performance data acquisition method of AI chip

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108683663A (en) * 2018-05-14 2018-10-19 中国科学院信息工程研究所 A kind of appraisal procedure and device of network safety situation
CN110442516A (en) * 2019-07-12 2019-11-12 上海陆家嘴国际金融资产交易市场股份有限公司 Information processing method, equipment and computer readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9715663B2 (en) * 2014-05-01 2017-07-25 International Business Machines Corporation Predicting application performance on hardware accelerators

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108683663A (en) * 2018-05-14 2018-10-19 中国科学院信息工程研究所 A kind of appraisal procedure and device of network safety situation
CN110442516A (en) * 2019-07-12 2019-11-12 上海陆家嘴国际金融资产交易市场股份有限公司 Information processing method, equipment and computer readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"A novel CNN-DDPG based AI-trader: Performance and roles in business operations"; Suyuan Luo et al.; Transportation Research Part E: Logistics and Transportation Review; 2019-11-30; pp. 68-79 *
"Inspur Showcases Three Major AI Computing Forces at Baidu AI Developer Conference" (浪潮携三大AI计算主力军亮相百度AI开发者大会); anonymous; Computer Knowledge and Technology (Tips and Tricks); 2018-08-31; pp. 111-112 *

Also Published As

Publication number Publication date
CN111045912A (en) 2020-04-21

Similar Documents

Publication Publication Date Title
US10642642B2 (en) Techniques to manage virtual classes for statistical tests
CN114862656B (en) Multi-GPU-based acquisition method for training cost of distributed deep learning model
CN111045912B (en) AI application performance evaluation method, device and related equipment
CN106776455B (en) Single-machine multi-GPU communication method and device
US20230305880A1 (en) Cluster distributed resource scheduling method, apparatus and device, and storage medium
CN114840322B (en) Task scheduling method and device, electronic equipment and storage
CN114580263A (en) Knowledge graph-based information system fault prediction method and related equipment
CN110796591B (en) GPU card using method and related equipment
CN112801434A (en) Method, device, equipment and storage medium for monitoring performance index health degree
CN112463432A (en) Inspection method, device and system based on index data
CN116126346A (en) Code compiling method and device of AI model, computer equipment and storage medium
CN115150471A (en) Data processing method, device, equipment, storage medium and program product
Cunha et al. Context-aware execution migration tool for data science Jupyter Notebooks on hybrid clouds
EP4357924A1 (en) Application performance testing method, method and apparatus for establishing performance testing model
CN114021733B (en) Model training optimization method, device, computer equipment and storage medium
CN115328891A (en) Data migration method and device, storage medium and electronic equipment
CN112069022B (en) NPU type server power consumption testing method and system
CN113723538A (en) Cross-platform power consumption performance prediction method and system based on hierarchical transfer learning
CN114661571A (en) Model evaluation method, model evaluation device, electronic equipment and storage medium
CN112131468A (en) Data processing method and device in recommendation system
CN108073502B (en) Test method and system thereof
CN110633742A (en) Method for acquiring characteristic information and computer storage medium
WO2014117566A1 (en) Ranking method and system
CN113722292B (en) Disaster response processing method, device, equipment and storage medium of distributed data system
CN113626070B (en) Method, device, equipment and storage medium for configuring code quality index

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant