CN116627433B - Real-time parameter prediction method, system, equipment and medium for AI processor - Google Patents


Info

Publication number
CN116627433B
CN116627433B (application number CN202310881075.4A)
Authority
CN
China
Prior art keywords
processor
real-time
parameters
power consumption
function model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310881075.4A
Other languages
Chinese (zh)
Other versions
CN116627433A (en)
Inventor
章弋嘉
王丙强
徐鹏翔
田永鸿
高文
Current Assignee
Peng Cheng Laboratory
Original Assignee
Peng Cheng Laboratory
Priority date
Filing date
Publication date
Application filed by Peng Cheng Laboratory filed Critical Peng Cheng Laboratory
Priority to CN202310881075.4A
Publication of CN116627433A
Application granted
Publication of CN116627433B
Status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00: Arrangements for software engineering
    • G06F 8/60: Software deployment
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application provides a real-time parameter prediction method, system, device, and medium for an AI processor, belonging to the technical field of artificial intelligence. The method comprises: running a target application program on the AI processor and, in a modeling stage, collecting basic parameters while the AI processor runs; adjusting the basic parameters in real time through a non-intrusive adjustment tool corresponding to the AI processor, and acquiring the target operation parameters corresponding to the basic parameters during the real-time adjustment; performing a fitting operation on a plurality of target operation parameters and their corresponding basic parameters, and establishing a function model that maps the basic parameters to the target operation parameters; and, in a prediction stage, acquiring real-time basic parameters and inputting them into the function model to obtain predicted target operation parameters, from which a parameter prediction result is determined. The method can be deployed and run online on the AI processor, requires no additional computing resources, and yields highly reliable prediction results.

Description

Real-time parameter prediction method, system, equipment and medium for AI processor
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a real-time parameter prediction method, system, equipment and medium of an AI processor.
Background
An artificial intelligence (AI) processor is a processor that meets the requirements of AI computation. Unlike a traditional CPU performing general-purpose computation, an AI processor is highly optimized for AI training and inference tasks and can complete AI computing tasks efficiently. The AI processor is a core component of the AI server and a key tool for artificial intelligence research and industrial development in China.
Monitoring the key operating parameters of the AI processor, such as performance, power consumption, and other energy-efficiency parameters, is an important basis for system monitoring and tuning of the AI server. In the related art, directly acquiring these key operating parameters consumes substantial computing resources, so they must instead be predicted by modeling based on the AI processor's basic parameters. Existing modeling-and-prediction methods for key operating parameters, however, cannot run online, waste resources severely, and lack reliability, and therefore cannot be deployed and run online on the AI processor.
Disclosure of Invention
The main purpose of the embodiments of the present application is to provide a real-time parameter prediction method, system, device, and medium for an AI processor that can be deployed and run online on the AI processor, require no additional computing resources during operation, and offer high reliability.
To achieve the above object, a first aspect of the embodiments of the present application provides a real-time parameter prediction method for an AI processor, the method comprising: running a target application program on the AI processor, wherein the running stage of the target application program comprises a modeling stage and a prediction stage after the modeling stage; in the modeling stage, collecting basic parameters during the operation of the AI processor; adjusting the basic parameters in real time through a non-intrusive adjustment tool corresponding to the AI processor, and acquiring target operation parameters corresponding to the basic parameters during the real-time adjustment; performing a fitting operation based on a plurality of the target operation parameters and the corresponding basic parameters, and establishing a function model that maps the basic parameters to the target operation parameters; and, in the prediction stage, acquiring real-time basic parameters and inputting them into the function model to obtain predicted target operation parameters, so as to determine a parameter prediction result according to the predicted target operation parameters.
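For illustration only, the collect-fit-predict flow above can be sketched as a least-squares fit over samples gathered while the basic parameter (here, frequency) is swept. The sample values and the linear model form are hypothetical stand-ins, not taken from the patent:

```python
# Illustrative sketch of the modeling and prediction stages described above.
# The sample data and the linear model form are hypothetical.

def fit_linear(xs, ys):
    """Closed-form ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Modeling stage: sweep the basic parameter (frequency, MHz) with the
# non-intrusive adjustment tool and record the target operation
# parameter (here, a utilization percentage) at each setting.
samples = [(900, 45.0), (1100, 55.0), (1300, 65.0)]
a, b = fit_linear([f for f, _ in samples], [u for _, u in samples])

# Prediction stage: map a newly observed basic parameter through the model.
def predict(f):
    return a * f + b
```

The real method fits richer model forms (piecewise and cubic, as described below), but the sample-then-fit-then-predict structure is the same.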
In some embodiments, if the function model is a piecewise function model, the target operation parameter is a feature utilization index; inputting the real-time basic parameters into the function model to obtain the predicted target operation parameters, so as to determine a parameter prediction result, comprises: inputting the real-time basic parameters into the piecewise function model to obtain a predicted feature utilization index; and acquiring a preset linear fitting function model that maps the feature utilization index to operation performance, and inputting the predicted feature utilization index into the linear fitting function model to obtain a corresponding operation performance prediction result. Alternatively, if the function model is an operation performance prediction model, the target operation parameter is operation performance, and the inputting step comprises: inputting the real-time basic parameters into the operation performance prediction model to obtain a predicted operation performance, which serves as the corresponding operation performance prediction result. Alternatively, if the function model is an operation power consumption prediction model, the target operation parameter is operation power consumption, and the inputting step comprises: inputting the real-time basic parameters into the operation power consumption prediction model to obtain a predicted operation power consumption, which serves as the corresponding operation power consumption prediction result.
In some embodiments, if the function model is a piecewise function model, the target operation parameter is a feature utilization index and the basic parameter is frequency; performing the fitting operation based on the target operation parameters and the corresponding basic parameters, and establishing the function model that maps the basic parameters to the target operation parameters, comprises: establishing a roofline model from each frequency and the corresponding feature utilization index during the real-time adjustment, and determining a critical frequency among the frequencies according to the roofline model; establishing initial first fitting coefficients according to the roofline model, and establishing an initial piecewise function model that maps frequency to the feature utilization index according to the critical frequency and the initial first fitting coefficients; inputting each frequency observed during the real-time adjustment into the initial piecewise function model according to its magnitude relative to the critical frequency, and updating the initial first fitting coefficients according to the obtained results to obtain updated first fitting coefficients; and adjusting the initial piecewise function model according to the updated first fitting coefficients to obtain the final piecewise function model.
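As a hedged sketch of the piecewise idea, the fragment below fits two branches around an assumed critical frequency f_c: utilization proportional to frequency below f_c, and saturated at a constant plateau at or above it. The branch forms, sample data, and f_c are illustrative stand-ins for the roofline-derived segments, not the patent's actual coefficients:

```python
# Hypothetical two-branch piecewise fit around an assumed critical
# frequency f_c (the roofline knee). Data and branch forms are synthetic.

def fit_piecewise(freqs, utils, f_c):
    lo = [(f, u) for f, u in zip(freqs, utils) if f < f_c]
    hi = [(f, u) for f, u in zip(freqs, utils) if f >= f_c]
    # Below f_c: utilization assumed proportional to frequency
    # (least-squares slope through the origin).
    slope = sum(f * u for f, u in lo) / sum(f * f for f, _ in lo)
    # At or above f_c: utilization assumed saturated at its mean value.
    plateau = sum(u for _, u in hi) / len(hi)
    return slope, plateau

def predict_util(f, f_c, slope, plateau):
    return slope * f if f < f_c else plateau

freqs = [600, 800, 1000, 1200, 1400]          # MHz, swept during adjustment
utils = [30.0, 40.0, 50.0, 50.0, 50.0]        # saturates near f_c = 1000
slope, plateau = fit_piecewise(freqs, utils, 1000)
```

Minimizing the squared error per branch, as done here in closed form, mirrors the total-squared-error minimization the embodiment describes for updating the first fitting coefficients.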
In some embodiments, inputting each frequency observed during the real-time adjustment into the initial piecewise function model according to its magnitude relative to the critical frequency, and updating the initial first fitting coefficients according to the obtained results, comprises: designating each frequency greater than or equal to the critical frequency as a first frequency and each frequency smaller than the critical frequency as a second frequency, and inputting the first and second frequencies into the initial piecewise function model respectively to obtain a first index and a second index for the real-time adjustment; computing the difference between the first index and the feature utilization index measured at the first frequency to obtain a first difference, and the difference between the second index and the feature utilization index measured at the second frequency to obtain a second difference; and computing the total squared error over the first and second differences, minimizing that total squared error over all frequencies, solving for the first fitting coefficients at the minimum, and taking them as the updated first fitting coefficients.
In some embodiments, the linear fitting function model is obtained by: running a reference (benchmark) application program, corresponding to the target application program, on the AI processor; collecting the basic parameters of the AI processor while the reference application program runs, and adjusting the basic parameters through the non-intrusive adjustment tool; acquiring, during the adjustment, the operation performance corresponding to each basic parameter and at least one sub-feature utilization index corresponding to each basic parameter, the sub-feature utilization indices including compute-unit utilization, memory occupancy, or memory-bandwidth utilization; performing a linear fitting operation between the operation performance obtained during the adjustment and the corresponding sub-feature utilization indices, and determining the feature weight fitted to each sub-feature utilization index; and obtaining a total feature utilization index from the sub-feature utilization indices and their corresponding feature weights, and establishing the linear fitting function model from the feature utilization index and the operation performance obtained during the adjustment.
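A minimal sketch of the weight-fitting step, under stated assumptions: two sub-utilization indices (compute-unit utilization and memory-bandwidth utilization) are regressed against measured performance via the 2x2 normal equations, and the total feature index is their weighted sum. All names and numbers are synthetic, not the patent's:

```python
# Hypothetical least-squares fit of performance against two
# sub-feature utilization indices, solved via 2x2 normal equations.

def fit_weights(u1, u2, perf):
    a11 = sum(x * x for x in u1)
    a12 = sum(x * y for x, y in zip(u1, u2))
    a22 = sum(y * y for y in u2)
    b1 = sum(x * p for x, p in zip(u1, perf))
    b2 = sum(y * p for y, p in zip(u2, perf))
    det = a11 * a22 - a12 * a12
    w1 = (b1 * a22 - b2 * a12) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2

u_compute = [0.2, 0.4, 0.6]        # compute-unit utilization (fractions)
u_membw   = [0.3, 0.2, 0.5]        # memory-bandwidth utilization
perf      = [40.0, 40.0, 80.0]     # benchmark performance readings
w1, w2 = fit_weights(u_compute, u_membw, perf)

# Total feature utilization index: weighted sum of the sub-indices.
index = [w1 * a + w2 * b for a, b in zip(u_compute, u_membw)]
```

With the synthetic data above, the fit recovers the generating weights exactly (w1 = 50, w2 = 100), since the data was constructed from that linear model.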
In some embodiments, if the function model is a piecewise function model, the target operation parameter is a feature utilization index; acquiring the target operation parameters corresponding to the basic parameters during the real-time adjustment comprises: acquiring a plurality of sub-feature utilization indices corresponding to the basic parameters during the real-time adjustment; and performing a weighted calculation on at least one sub-feature utilization index and the corresponding feature weight to obtain the feature utilization index for the real-time adjustment.
In some embodiments, if the function model is an operation power consumption prediction model, the target operation parameter is operation power consumption and the basic parameter is frequency; performing the fitting operation based on the target operation parameters and the corresponding basic parameters, and establishing the function model, comprises: establishing, according to the power consumption types of the AI processor in different states during actual operation, an initial cubic function model that maps frequency to operation power consumption, and configuring initial second fitting coefficients for the initial cubic function model; inputting each frequency observed during the real-time adjustment into the initial cubic function model, and updating the initial second fitting coefficients according to the obtained results to obtain updated second fitting coefficients; and adjusting the initial cubic function model according to the updated second fitting coefficients, the adjusted cubic function model serving as the operation power consumption prediction model.
In some embodiments, establishing the initial cubic function model that maps frequency to operation power consumption according to the power consumption types of the AI processor in different states during actual operation, and configuring initial second fitting coefficients for it, comprises: determining the static power consumption, dynamic power consumption, and heat dissipation power consumption of the AI processor in different states during actual operation; determining, from the proportional relations between these power consumption types and frequency, that the dynamic power consumption bears a cubic relation to frequency; and establishing, from that cubic relation, an initial cubic function model that maps frequency to operation power consumption, generating initial second fitting coefficients as the coefficients of the function in the cubic function model.
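As a hedged illustration of the cubic relation, the sketch below models power as P(f) = P_static + k * f^3 and fits it by ordinary least squares with f^3 as the single regressor. The patent's full model carries separate static, dynamic, and heat-dissipation terms; collapsing them to an intercept plus one cubic term is a simplifying assumption, and the data is synthetic:

```python
# Hypothetical fit of P(f) = P_static + k * f**3 (intercept plus a
# single cubic term), a simplified stand-in for the cubic power model.

def fit_cubic_power(freqs, powers):
    xs = [f ** 3 for f in freqs]          # regress on f^3
    n = len(xs)
    mx, mp = sum(xs) / n, sum(powers) / n
    k = sum((x - mx) * (p - mp) for x, p in zip(xs, powers)) / \
        sum((x - mx) ** 2 for x in xs)
    p_static = mp - k * mx
    return p_static, k

freqs  = [0.8, 1.0, 1.2]                      # GHz, swept during adjustment
powers = [50 + 100 * f ** 3 for f in freqs]   # synthetic: 50 W static term
p_static, k = fit_cubic_power(freqs, powers)
```

Minimizing the squared error of the power differences, as this closed-form fit does, matches the coefficient-update step described in the following embodiment.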
In some embodiments, inputting each frequency observed during the real-time adjustment into the initial cubic function model and updating the initial second fitting coefficients according to the obtained results comprises: inputting each frequency into the initial cubic function model to obtain the corresponding first power consumption during the real-time adjustment; computing the difference between the first power consumption and the operation power consumption measured at that frequency to obtain a power consumption difference; and computing the squared error of the power consumption differences, minimizing it over all frequencies, solving for the second fitting coefficients at the minimum, and taking them as the updated second fitting coefficients.
In some embodiments, the parameter prediction results include an operation performance prediction result and an operation power consumption prediction result, and the method further comprises: calculating, from the operation performance prediction result and the operation power consumption prediction result, the energy efficiency ratio of the AI processor running the target application program under the real-time basic parameters; and adjusting the basic parameters in real time according to one of the operation performance prediction result, the operation power consumption prediction result, and the energy efficiency ratio.
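A minimal sketch of this tuning step: the energy efficiency ratio is taken as predicted performance per watt, and the candidate frequency maximizing it is selected. The linear-performance and static-plus-cubic-power model forms and all numbers are hypothetical:

```python
# Hypothetical energy-efficiency-ratio tuning: evaluate predicted
# performance / predicted power at candidate frequencies and pick the best.

def perf_model(f):
    return 50.0 * f                 # assumed linear performance model

def power_model(f):
    return 50.0 + 100.0 * f ** 3    # assumed static + cubic dynamic power

def eer(f):
    return perf_model(f) / power_model(f)   # performance per watt

candidates = [0.6, 0.8, 1.0, 1.2, 1.4]      # GHz settings the tool can apply
best = max(candidates, key=eer)
```

Under these assumed models the ratio peaks at the low end of the sweep (the continuous optimum sits near 0.63 GHz), illustrating why online tuning can pick a frequency well below the maximum.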
In some embodiments, the method further comprises: if the AI processor switches to running a new target application program, or the target application program enters a new running stage, re-entering the modeling stage, the running stages of the target application program including initialization, data loading, vector calculation, scalar calculation, or data write-back; and performing the fitting operation again on the basic parameters and target operation parameters newly obtained after re-entering the modeling stage, to establish a new function model.
In some embodiments, the AI processor runs the target application program periodically, and the method further comprises: dividing each running cycle of the target application program into a plurality of running phases; and assigning the modeling stage and the prediction stage among those phases, so that modeling is performed in the modeling stage within each running cycle as the target application program runs periodically, with at least one modeling stage preceding the prediction stage.
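A minimal sketch of this periodic scheme: each running cycle is carved into phases, the first of which serves as the modeling stage and the rest as prediction stages. The phase counts and the helper name are hypothetical:

```python
# Hypothetical phase scheduler for a periodically running application:
# the first modeling_phases phases of every cycle are modeling windows,
# the remainder are prediction windows.

def phase_role(step, phases_per_cycle, modeling_phases=1):
    return ("modeling"
            if step % phases_per_cycle < modeling_phases
            else "prediction")

# Two cycles of four phases each: modeling recurs at the start of each cycle.
roles = [phase_role(s, 4) for s in range(8)]
```

This guarantees the property stated above: within every cycle at least one modeling stage precedes the prediction stages that consume its model.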
To achieve the above object, a second aspect of the embodiments of the present application proposes a real-time parameter prediction system of an AI processor, the system including: the program running module is used for running a target application program on the AI processor, wherein the running stage of the target application program comprises a modeling stage and a prediction stage after the modeling stage; the data acquisition module is used for acquiring basic parameters in the operation process of the AI processor in the modeling stage; the adjusting module is used for adjusting the basic parameters through a non-invasive adjusting tool corresponding to the AI processor in real time and acquiring target operation parameters corresponding to the basic parameters in a real-time adjusting process; the modeling module is used for carrying out fitting operation based on a plurality of target operation parameters and the corresponding basic parameters, and establishing a function model mapped to the target operation parameters by the basic parameters; and the prediction module is used for acquiring the real-time basic parameters in the prediction stage, inputting the real-time basic parameters into the function model, and obtaining the predicted target operation parameters so as to determine a parameter prediction result according to the predicted target operation parameters.
To achieve the above object, a third aspect of the embodiments of the present application provides an electronic device, where the electronic device includes a memory and a processor, where the memory stores a computer program, and the processor implements the real-time parameter prediction method of the AI processor according to the embodiment of the first aspect when executing the computer program.
To achieve the above object, a fourth aspect of the embodiments of the present application proposes a storage medium, which is a computer-readable storage medium, storing a computer program, which when executed by a processor, implements the real-time parameter prediction method of the AI processor according to the embodiment of the first aspect.
The embodiment of the application has the following beneficial effects:
during the running of a target application program on the AI processor, when the program enters the modeling stage a prediction model must be established. The basic parameters of the AI processor are first collected during operation; the basic parameters are then adjusted in real time through the non-intrusive adjustment tool corresponding to the AI processor, so that parameter adjustment is achieved without changing the logic of the AI processor and without additional computing resources. The target operation parameters corresponding to the basic parameters are acquired during the real-time adjustment; because the target operation parameters change as the basic parameters change, a fitting operation can be performed on these parameters to establish a function model that maps the basic parameters to the target operation parameters, achieving online modeling. In the subsequent prediction stage, real-time basic parameters are input into the model to obtain the required parameter prediction result. The embodiments of the present application can therefore be deployed and run online on the AI processor without additional computing resources; and because the model is built from data sampled while the program runs, it is interpretable and predictable, and the reliability of the prediction results is high.
Drawings
FIG. 1 is a schematic diagram of an AI processor real-time parameter prediction system provided in an embodiment of the application;
FIG. 2 is a flowchart of a real-time parameter prediction method of an AI processor according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating how the operation performance prediction result is obtained in step S105 of FIG. 2;
FIG. 4 is a flowchart illustrating the creation of the piecewise function model in step S104 of FIG. 2;
FIG. 5 is a schematic diagram of the relationship between the measured operating frequency and the feature utilization index in an embodiment of the present application;
FIG. 6 is a schematic diagram of a roofline model provided in an embodiment of the present application;
FIG. 7 is a detailed flowchart of step S303 in FIG. 4;
FIG. 8 is a flowchart of the linear fitting function model creation process provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of the feature utilization index versus operation performance relationship in a benchmark test procedure provided in an embodiment of the present application;
FIG. 10 is a schematic flowchart of step S103 in FIG. 2;
FIG. 11 is a complete flow chart provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of a relationship between various data and time in a real-time sampling process according to an embodiment of the present application;
FIG. 13 is a schematic diagram of online performance modeling and online power consumption modeling based on the data in FIG. 12;
FIG. 14 is a flowchart illustrating the establishment of the operation power consumption prediction model in step S104 of FIG. 2;
FIG. 15 is a schematic flowchart of step S701 in FIG. 14;
FIG. 16 is a schematic flowchart of step S702 in FIG. 14;
FIG. 17 is a schematic flow chart of calculating energy efficiency ratios and adjusting basic parameters provided by embodiments of the present application;
FIG. 18 is a flow diagram of a remodelling process provided by an embodiment of the present application;
FIG. 19 is a flow diagram of a periodic modeling process provided by an embodiment of the present application;
FIG. 20 is a schematic diagram of the functional blocks of a real-time parameter prediction system of an AI processor according to an embodiment of the present application;
fig. 21 is a schematic hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
It should be noted that although functional blocks are divided in the device diagrams and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed with a different block division, or in a different order, than illustrated. The terms "first", "second", and the like in the description, the claims, and the drawings are used to distinguish similar elements and do not necessarily describe a particular sequence or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.
First, several nouns referred to in this application are parsed:
artificial intelligence (artificial intelligence, AI): is a new technical science for researching and developing theories, methods, technologies and application systems for simulating, extending and expanding the intelligence of people; artificial intelligence is a branch of computer science that attempts to understand the nature of intelligence and to produce a new intelligent machine that can react in a manner similar to human intelligence, research in this field including robotics, language recognition, image recognition, natural language processing, and expert systems. Artificial intelligence can simulate the information process of consciousness and thinking of people. Artificial intelligence is also a theory, method, technique, and application system that utilizes a digital computer or digital computer-controlled machine to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
An artificial intelligence processor (AI processor for short) refers to a processor that can meet AI computing requirements, encompassing different types such as graphics processing units (GPUs), neural processing units (NPUs), and tensor processing units (TPUs). Unlike a traditional CPU performing general-purpose computation, the AI processor is highly optimized for AI training and inference tasks and can complete AI computing tasks efficiently. The AI processor is a core component of the AI server and a key tool for artificial intelligence research and industrial development in China.
The AI server runs application programs whose computing tasks take a long time and consume considerable power. The typical running time of an AI training task is often days to months; the typical power consumption of an AI server is several kilowatts, and that of an AI computing cluster exceeds several megawatts. To shorten computation time and reduce power consumption, it is therefore important to monitor and tune the performance and power consumption of the AI server. Monitoring the key operating parameters of the AI processor, such as performance, power consumption, and other energy-efficiency parameters, is an important basis for system monitoring and tuning of the AI server. For example, by establishing an accurate online model relating the operating frequency of the AI processor to program performance and power consumption, the energy efficiency ratio of the AI server can be adjusted and optimized online, effectively reducing its power consumption.
Existing approaches to monitoring key operating parameters of the AI processor, such as performance and power consumption, have the following problems:
the existing AI processor performance power consumption modeling prediction method cannot be performed online.
The existing way to measure AI processor performance is to divide the workload completed by a program, after each complete run, by the computation time, i.e. the formula "performance = workload / time". However, in the middle of a run there is no general-purpose tool for obtaining the workload completed so far. Even though online workload measurement can be achieved for a single program by modifying its source code, source code differs from program to program, so this approach cannot be made into a general automated tool; the existing method therefore cannot achieve online performance measurement, modeling, or prediction. Some related work proposes fitting program performance and power consumption from processor hardware indices, but those indices can only be collected after a complete run finishes, and the tools used, such as CUPTI (CUDA Profiling Tools Interface) and nvprof (the NVIDIA profiler), must intrude into the application program, modify its source code, and severely slow its normal execution, so non-interfering online modeling and prediction during a run cannot be achieved.
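The offline formula quoted above can be stated directly; the point is that it needs the total completed workload, which is only known after the run finishes:

```python
# "performance = workload / time": computable only once a run has
# completed, since the full workload must be known.

def offline_performance(workload_flops, elapsed_seconds):
    return workload_flops / elapsed_seconds
```

For example, a hypothetical job of 1.2e15 FLOPs finishing in 60 s yields 2e13 FLOP/s; mid-run, neither number is generally available, which is the limitation the patent's online method avoids.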
(II) Existing methods for modeling and predicting AI processor performance and power consumption seriously waste computing resources.
Existing AI processor performance and power consumption modeling needs to run the same application repeatedly in different operating states to obtain its average performance and power consumption in each state; the performance and power consumption models are then obtained by fitting. The time required is consequently several times that of a single run: for example, comparing the performance and power consumption of a program at 10 different frequencies multiplies the execution time by 10, which seriously wastes computing resources. Because of this drawback of existing methods, AI server users currently lack sufficient machine resources to search for the optimal operating settings, so servers often do not run at their optimal settings.
(III) Existing methods for modeling and predicting AI processor performance and power consumption lack reliability.
Some existing AI processor performance and power consumption prediction methods use neural network models to give prediction results. These methods require training the neural network in advance on a subset of programs and then generalizing the performance and power consumption predictions to other programs. However, compared with analytical and linear models that have a theoretical basis, neural network models are at present essentially uninterpretable and unpredictable, so one cannot judge whether the network guarantees sufficient prediction accuracy, and it may produce significant prediction errors outside the training set. Because advanced AI servers are generally valuable equipment costing millions, operators are usually unwilling to let models lacking interpretability participate in regulating server system settings, to avoid potential safety hazards; neural-network-based performance and power consumption models are therefore also difficult to deploy on practical systems.
Based on this, the embodiments of the present application provide a real-time parameter prediction method, system, device, and medium for an AI processor. The method can be deployed and run online on the AI processor without additional computing resources at runtime, and the model is built from data sampled while the program runs, so the model is interpretable and predictable and the prediction results are highly reliable.
The method is based on real-time sampling of chip utilization indices and, combined with roofline model theory, constructs the performance model and power consumption model of the AI processor online by numerical fitting, thereby providing a means of predicting performance and power consumption.
The method, system, device, and medium for real-time parameter prediction of an AI processor provided in the embodiments of the present application are specifically described through the following embodiments; the real-time parameter prediction system of the AI processor is described first.
As shown in fig. 1, the real-time parameter prediction system of the AI processor is provided with a processing module, a non-invasive adjustment tool, and a collection tool. The processing module is connected to the non-invasive adjustment tool and the collection tool, each of which is in turn connected to the AI processor. After the target application starts running on the AI processor, in the modeling stage the processing module controls the collection tool to collect basic parameters during the AI processor's operation, adjusts the basic parameters in real time through the non-invasive adjustment tool corresponding to the AI processor, and, during this real-time adjustment, obtains the target operating parameters corresponding to the basic parameters through the collection tool; it then performs a fitting operation on the collected target operating parameters and their corresponding basic parameters to build a function model mapping the basic parameters to the target operating parameters. In the prediction stage, real-time basic parameters continue to be collected through the collection tool, and the processing module inputs them into the function model to obtain predicted target operating parameters, from which the parameter prediction result is determined.
The AI processor may be located in an AI server or in an AI computing cluster; this application takes an AI server as an example. A large number of AI processors are disposed in the AI server, and each AI processor can execute training tasks by running programs, which is not specifically limited in the embodiments of the present application.
The processing module can be any type of processor; it can control the working states of the non-invasive adjustment tool and the collection tool, and can receive and process the data the collection tool gathers. The processing module can be applied on a terminal or a server side. In some embodiments, the terminal may be a smartphone, tablet, notebook, desktop, or the like; the server side may be configured as an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, and basic cloud computing services such as big data and artificial intelligence platforms; the software may be an application that implements the real-time parameter prediction method of the AI processor, but is not limited to the above forms.
The non-invasive adjustment tool is a tool provided by the AI processor vendor; for example, the nvidia-smi and nvml tools can be used as non-invasive adjustment tools on a company's GPU. It is worth noting that changing basic parameters such as the processor operating frequency with the non-invasive adjustment tool does not change the program's logic, so program correctness is unaffected and the application source code need not be modified; the method can therefore be implemented online in a practical system at low cost and non-invasively, avoiding the impracticality of traditional methods caused by excessive overhead.
The collection tool is a tool that can acquire data from the AI processor. AI processors generally support simultaneous acquisition of real-time chip data such as operating power consumption, feature utilization indices, and running performance; the specific obtainable index names and collection tool commands differ between AI processor models. For example, a company's NPU processors support using the npu-smi tool to collect chip utilization data in real time, and a company's GPU processors support using the nvidia-smi and nvml tools to collect data in real time.
The operating power consumption is the power the AI processor consumes during operation; there may be multiple types of power consumption. For example, the power consumption of an AI processor mainly includes static power consumption, dynamic power consumption, and heat dissipation power consumption.
The feature utilization index is a utilization index capable of indicating performance; it can comprise multiple sub-feature utilization indices, each of which can influence the running performance of the AI processor. In one embodiment, the sub-feature utilization indices include computing unit utilization, memory occupancy, memory bandwidth utilization, and the like.
The running performance is the completion speed of the AI processor when executing the program. It may be obtained by dividing the workload the program completed over a certain period by the computation time, and can be expressed as the formula "performance = workload / time".
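As a minimal, hypothetical illustration of this formula (the workload and timing figures below are invented for the example), the computation can be sketched as:

```python
def running_performance(workload_flops: float, elapsed_seconds: float) -> float:
    """Running performance = workload / time, e.g. in Flop/s."""
    return workload_flops / elapsed_seconds

# Hypothetical sampling window: 3e12 floating-point operations in 2 seconds.
perf = running_performance(3e12, 2.0)  # 1.5e12 Flop/s
```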
It should be noted that both the non-invasive adjustment tool and the collection tool are selected such that real-time adjustment and collection can be performed while the program is running; consequently, tools such as nvprof, which cannot do this, are unsuitable for online performance and power consumption modeling during the running of an arbitrary program.
Furthermore, the non-invasive adjustment tool and the collection tool may be the same tool, with the adjustment and collection functions implemented by one tool, such as the nvidia-smi and nvml tools supported by a company's GPU processors.
In the embodiments of the present application, when acquiring data of the AI processor, it is necessary to acquire permission or consent of the user. Moreover, the collection, use, processing, etc. of such data would comply with relevant laws and regulations.
Fig. 2 is an optional flowchart of a real-time parameter prediction method of an AI processor provided in an embodiment of the present application, where the method in fig. 2 may include, but is not limited to, steps S101 to S105.
Step S101, a target application program is operated on an AI processor, wherein the operation stage of the target application program comprises a modeling stage and a prediction stage after the modeling stage;
the target application is the program executed by the AI processor during operation, and it may be a program with any computing pattern, which is not particularly limited herein. The AI processor generally runs the target application for a long period, so during this long-running execution, predicting key parameters of the AI processor, such as running performance and operating power consumption, is an important basis for system monitoring and tuning of the AI processor and even of the AI server where it is located.
Since the target application runs for a long time, the embodiment of the application divides the program's run into a modeling stage and a prediction stage, with the modeling stage preceding the prediction stage. The modeling stage is mainly used to establish a prediction model of the parameters; in the subsequent prediction stage, the prediction model can be used to predict the key parameters.
Step S102, in a modeling stage, basic parameters in the operation process of an AI processor are collected;
for example, the AI processor may enter the modeling stage during the running of the target application; the modeling stage is the earlier stage, i.e., the stage first entered during the program's run.
Illustratively, the basic parameters are parameters characterizing the "processor specification" during the AI processor's operation, and may include a variety of parameters describing basic features and performance metrics of the processor, such as the voltage, current, frequency, and number of cores of the AI processor at runtime; these can reflect the AI processor's power consumption, operational stability, and computational performance under a given workload. The voltage is the voltage required by the AI processor chip, in volts (V); the current is the current load of the AI processor chip, in amperes (A); the frequency is the speed at which the AI processor executes instructions, typically in hertz (Hz); the core count is the number of computing cores in the AI processor, i.e., the hardware units that perform complex neural network computing tasks. Depending on their function and design, the computing cores may include vector/matrix computing units, operator accelerators, neural network accelerators, or heterogeneous computing units, and on a typical AI processor the number of computing cores can reach the millions, which is not particularly limited in the embodiments of the present application.
In the embodiments of the application, the basic parameters are selected according to the need to adjust the key parameters of the AI processor; the basic parameters are later adjusted to influence the key parameters, i.e., after modeling they are tuned according to the actual key parameter values. The key parameters in the embodiments of the present application include the running performance and operating power consumption of the AI processor; since the energy efficiency ratio can be obtained from the running performance and the operating power consumption, the key parameters may also include the energy efficiency ratio, which is not described in detail later.
For the AI processor, its operating state, performance, and power consumption can be controlled in various possible ways, such as frequency regulation, voltage regulation, core switching, and precision control. Wherein:
frequency regulation achieves the largest power consumption adjustment range and the finest control granularity: changing the chip's running frequency online changes the chip's pipeline execution rate, thereby influencing application performance and processor power consumption; at a higher frequency the performance is higher, but the power consumption also rises rapidly. On the latest GPU processors, the allowable chip frequency adjustment range can reach 10 times, frequency adjustment does not affect normal program execution, and the adjustment delay is far less than 1 second;
voltage regulation influences the chip's power consumption by changing its operating voltage online; within a suitable range, stable and correct chip operation can be ensured, and voltage regulation is often performed in coordination with frequency regulation;
core switching changes performance and power consumption by changing the number of running computing cores online; the number of computing cores on a typical AI processor reaches the millions, and program execution logic is insensitive to this number, so core switching can regulate program performance and chip power consumption over a wide range without affecting correct program execution;
precision control means switching the floating-point/integer computation mode of the AI processor among double precision, single precision, half precision, and the like. Lower precision gives a higher energy efficiency ratio at the cost of some computational accuracy; for applications with low accuracy requirements, low-precision computation can be adopted to increase speed and reduce power consumption.
Based on the foregoing, the basic parameters required in the embodiments of the present application can be determined. In the embodiments of the present application, the voltage, current, frequency, and core count of the AI processor at runtime are taken merely as examples of basic parameters and do not limit the embodiments of the present application.
Step S103, basic parameters are regulated in real time through non-invasive regulating tools corresponding to the AI processor, and target operation parameters corresponding to the basic parameters are obtained in the real-time regulating process;
illustratively, in order to build an interpretable, predictable model, the embodiments of the present application change the target operating parameters by adjusting the basic parameters, so that a reliable model can be built based on the relationship between changes in the target operating parameters and the basic parameters.
The target operating parameter is a parameter that changes as the basic parameter changes and that can be acquired by the collection tool. In an embodiment, as the basic parameters change, operating parameters in the AI processor such as the feature utilization index, running performance, and operating power consumption change accordingly, so in the embodiments of the application these can serve as target operating parameters.
Illustratively, in the embodiment of the present application, the basic parameters are adjusted in real time in the modeling stage as the program runs, and the target operating parameters that change accordingly are collected (sampled) in real time, so that modeling can be performed accurately based on these two kinds of data.
In the embodiments of the present application, the basic parameters are adjusted by the non-invasive adjustment tool corresponding to the AI processor; the non-invasive adjustment tool is described in the above embodiments and is not repeated here. It can be understood that adjusting basic parameters through the non-invasive adjustment tool, for example adjusting the AI processor's operating frequency, does not change the program's own logic, so program correctness is unaffected and the application source code need not be modified; this step can therefore be executed online in a practical system at low cost and non-invasively, avoiding the impracticality of traditional methods caused by excessive overhead.
By way of example, in the embodiment of the present application, only the modeling stage adjusts the basic parameters and collects the corresponding target operating parameters. Although this consumes some computing resources and may slightly affect the AI processor's running speed (performance) and power consumption, the tool is non-invasive (for example, provided by the AI processor manufacturer), so the effect is relatively small. More importantly, the modeling stage in the embodiment of the present application is short, far shorter than the subsequent prediction stage (for example, only 1.8 s), so the overall effect of the collection process on program running speed and power consumption is small, even negligible.
Step S104, fitting operation is carried out based on a plurality of target operation parameters and corresponding basic parameters, and a function model mapped from the basic parameters to the target operation parameters is established;
after the target operating parameters are obtained, the embodiments of the application perform fitting operations on the target operating parameters and the corresponding basic parameters, and build, through multiple fittings, a function model mapping the basic parameters to the target operating parameters.
For example, a fitting method may be selected according to the relationship between changes in the basic parameter and the target operating parameter; the fitting operation in the embodiment of the present application may be exponential fitting, logarithmic fitting, polynomial fitting, or nonlinear fitting, with an appropriate method chosen according to the data. In the embodiment of the present application, frequency is selected as the basic parameter and linear fitting is selected as the fitting method, i.e., the target operating parameter is fitted as varying linearly with frequency.
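The linear fit itself can be sketched with an ordinary least-squares estimate; the sample values below are hypothetical and stand in for (frequency, target operating parameter) pairs collected during the modeling stage:

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Hypothetical samples: frequency in MHz vs. a measured target operating parameter.
freqs = [600.0, 800.0, 1000.0, 1200.0]
values = [30.0, 40.0, 50.0, 60.0]  # exactly linear here, for illustration
a, b = linear_fit(freqs, values)   # slope a = 0.05, intercept b = 0.0
```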
The function model in the embodiment of the application realizes the mapping from the basic parameters to the target operating parameters; that is, the basic parameters can be input into the function model, and the corresponding predicted target operating parameters are obtained through the model's calculation.
The parameters required to establish the model are acquired in the modeling stage of the program run, which, together with the subsequent prediction stage, belongs to the program's normal operation; the model established from the earlier data is therefore suitable for the subsequent operation of the same program.
Illustratively, after the model is built, the function model built in the modeling stage can be directly applied in the prediction stage.
Step S105, in the prediction stage, acquiring real-time basic parameters, and inputting the real-time basic parameters into the function model to obtain predicted target operation parameters, so as to determine parameter prediction results according to the predicted target operation parameters.
Illustratively, in the prediction stage, parameter prediction requires acquiring real-time basic parameters. The real-time basic parameters are obtained at the moment the key operating parameters need to be predicted, so the resulting parameter prediction is also a real-time result; the prediction time can be selected as needed, and the embodiments of the present application do not specifically limit it.
Illustratively, since there may be multiple target operating parameters, the final parameter prediction result must be determined in the prediction stage based on the type of target operating parameter. Although the target operating parameters are varied and can correspond to different parameter prediction results, according to actual prediction needs the required parameter prediction results include a running performance prediction result, an operating power consumption prediction result, and an energy efficiency ratio prediction result.
The different parameter prediction results obtained for different target operating parameters are described below:
if the target operating parameter is the feature utilization index, the parameter prediction result may be a feature utilization index prediction result: after the basic parameters are input into the function model in the prediction stage, the model outputs the predicted feature utilization index, which serves as the final prediction result. The parameter prediction result may also be a running performance prediction result: a mapping relationship exists between running performance and the feature utilization index, so when the basic parameters are input into the function model in the prediction stage, the model outputs a predicted feature utilization index that cannot be used directly as the prediction result; it must be mapped to obtain the predicted running performance, which serves as the final prediction result.
If the target operating parameter is running performance, the parameter prediction result may be a running performance prediction result: after the basic parameters are input into the function model in the prediction stage, the model outputs the predicted running performance, which serves as the final prediction result.
If the target operating parameter is operating power consumption, the parameter prediction result may be an operating power consumption prediction result: after the basic parameters are input into the function model in the prediction stage, the model outputs the predicted operating power consumption, which serves as the final prediction result.
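A sketch of how the three kinds of prediction results relate, using two hypothetical already-fitted linear models (the coefficients are invented for the example, not taken from the embodiments):

```python
def predict_performance(freq_mhz: float) -> float:
    """Hypothetical fitted model: predicted running performance (GFlop/s) vs. frequency."""
    return 0.8 * freq_mhz + 100.0

def predict_power(freq_mhz: float) -> float:
    """Hypothetical fitted model: predicted operating power consumption (W) vs. frequency."""
    return 0.2 * freq_mhz + 50.0

def predict_energy_efficiency(freq_mhz: float) -> float:
    """Energy efficiency ratio = predicted performance / predicted power consumption."""
    return predict_performance(freq_mhz) / predict_power(freq_mhz)

ratio = predict_energy_efficiency(1000.0)  # 900.0 / 250.0 = 3.6 GFlop/s per watt
```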
In summary, during the running of the target application on the AI processor, when the program enters the modeling stage a prediction model needs to be built. First, basic parameters of the AI processor during operation are collected; then the basic parameters are adjusted in real time through the non-invasive adjustment tool corresponding to the AI processor, so that parameter adjustment is achieved without changing the logic of the program and without additional computing resources. The target operating parameters corresponding to the basic parameters are obtained during the real-time adjustment; since the target operating parameters change with the basic parameters during adjustment, a fitting operation can be performed and a function model mapping the basic parameters to the target operating parameters can be built, realizing online modeling. In the subsequent prediction stage, real-time basic parameters can be input into the model to obtain the required parameter prediction result. Thus, the embodiments of the application can be deployed and run online on the AI processor without additional computing resources at runtime, and the model is built from data sampled during program operation, so the model is interpretable and predictable and the prediction results are highly reliable.
For example, if the target operating parameter is the feature utilization index, the parameter prediction result may be a running performance prediction result. In that case the function model maps the basic parameters to the feature utilization index, and the running performance prediction result must additionally be obtained through a linear fitting function model.
Referring to fig. 3, in some embodiments, the process of obtaining the running performance prediction result in step S105 may include steps S201 to S202:
step S201, inputting real-time basic parameters into a piecewise function model to obtain predicted characteristic utilization indexes;
step S202, obtaining a preset linear fitting function model mapping the feature utilization index to the running performance, and inputting the predicted feature utilization index into the linear fitting function model to obtain the corresponding running performance prediction result.
The piecewise function model is obtained by fitting between the basic parameters and the feature utilization index, so after the real-time basic parameters are input into it, the piecewise function model outputs a predicted feature utilization index. The predicted feature utilization index cannot be used directly as the final prediction result; it must therefore be converted.
Illustratively, in the embodiment of the application, the feature utilization index is mapped to the final required prediction result through a pre-established linear fitting function model. The linear fitting function model is a prediction model mapping the feature utilization index to running performance; in the embodiment of the application it can be pre-established in the modeling stage, after which the feature utilization index predicted by the piecewise function model is input into it. Moreover, because the relationship between the feature utilization index and running performance is determined, pre-establishing the linear fitting function model saves resources and modeling time.
After the linear fitting function model is obtained, it characterizes the mapping relationship between the feature utilization index and the running performance, so it can output the predicted running performance, which finally serves as the running performance prediction result.
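This two-stage prediction (piecewise model first, then the linear map to performance) can be sketched as follows; both models here are hypothetical placeholders for the fitted ones, with invented coefficients:

```python
CRITICAL_FREQ_MHZ = 900.0  # hypothetical critical frequency found in the modeling stage

def predicted_utilization(freq_mhz: float) -> float:
    """Hypothetical piecewise (broken-line) model: frequency -> feature utilization index.
    Below the critical frequency the program is compute-bound, so the computing units
    stay nearly fully utilized; above it a memory bottleneck lowers the utilization."""
    if freq_mhz <= CRITICAL_FREQ_MHZ:
        return 95.0
    return 95.0 - 0.03 * (freq_mhz - CRITICAL_FREQ_MHZ)

def utilization_to_performance(util: float) -> float:
    """Hypothetical pre-established linear fitting function model:
    feature utilization index -> running performance (GFlop/s)."""
    return 12.0 * util + 5.0

perf = utilization_to_performance(predicted_utilization(1100.0))
# utilization 89.0 maps to a predicted performance of 1073.0 GFlop/s
```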
Based on this, the embodiment of the present application describes the process of establishing the piecewise function model, taking frequency as the basic parameter. The method comprises the following steps:
Referring to fig. 4, in some embodiments, the process of creating the piecewise function model in step S104 may include steps S301 to S304:
step S301, building a roofline model according to each frequency and its corresponding feature utilization index during the real-time adjustment, and determining the critical frequency among the frequencies according to the roofline model;
step S302, establishing an initial first fitting coefficient according to the roofline model, and establishing an initial piecewise function model mapping frequency to the feature utilization index according to the critical frequency and the initial first fitting coefficient;
step S303, inputting the frequencies from the real-time adjustment into the initial piecewise function model according to their magnitude relation with the critical frequency, and updating the initial first fitting coefficient according to the obtained results to obtain an updated first fitting coefficient;
step S304, adjusting the initial piecewise function model according to the updated first fitting coefficient to obtain the final piecewise function model.
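A sketch of steps S301 to S304 under simplifying assumptions: the critical frequency is found by exhaustively splitting the frequency-sorted samples, fitting a least-squares line to each side, and keeping the split with the smallest total squared error. The sample data are synthetic.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    b = mean_y - a * mean_x
    return a, b

def squared_error(xs, ys, a, b):
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

def fit_broken_line(freqs, utils):
    """Fit U(f) as two line segments joined at a critical frequency: try every
    split of the frequency-sorted samples, least-squares fit each side, and keep
    the split with the smallest total squared error."""
    best = None
    for k in range(2, len(freqs) - 1):  # each segment needs at least 2 points
        lo = linear_fit(freqs[:k], utils[:k])
        hi = linear_fit(freqs[k:], utils[k:])
        err = (squared_error(freqs[:k], utils[:k], *lo)
               + squared_error(freqs[k:], utils[k:], *hi))
        if best is None or err < best[0]:
            best = (err, freqs[k - 1], lo, hi)
    _, f_critical, fit_lo, fit_hi = best
    return f_critical, fit_lo, fit_hi

def piecewise_predict(f, f_critical, fit_lo, fit_hi):
    a, b = fit_lo if f <= f_critical else fit_hi
    return a * f + b

# Synthetic samples: utilization grows with frequency (compute-bound) up to
# 900 MHz, then stays flat (memory-bound) above it.
freqs = [600.0, 700.0, 800.0, 900.0, 1000.0, 1100.0, 1200.0, 1300.0]
utils = [54.0, 63.0, 72.0, 81.0, 85.0, 85.0, 85.0, 85.0]
f_crit, lo, hi = fit_broken_line(freqs, utils)  # f_crit = 900.0
```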
In the embodiment of the application, a relationship is established between each frequency and its corresponding feature utilization index during the real-time adjustment; with f representing frequency and U representing the feature utilization index, the relationship between frequency and feature utilization index is U(f).
The operating frequency-feature utilization index relationship U(f) in the embodiment of the present application is shown in fig. 5, which comes from actual measurement data of experiments performed on an AI server. Each curve represents the frequency-feature utilization index relationship of the AI processor when running one target application. In fact, this is not a simple linear relationship: the function generally has an obvious point of abrupt slope change, and each curve can be approximately fitted by a broken line formed of two line segments.
Based on this, classical Roofline Model theory from the field of high-performance computing explains why the frequency-feature utilization index relationship generally exhibits the above broken-line shape. Fig. 6 shows a schematic diagram of the roofline model, in which the compute-to-memory ratio (Flop/Byte) is the abscissa and the computational intensity (Flop/second) is the ordinate. The diagonal line shows the upper limit imposed by the AI processor's memory access bandwidth (in bytes/second), and the horizontal lines show the upper limits of computational intensity at different operating frequencies; since a processor's computational intensity ceiling is positively correlated with its operating frequency, multiple horizontal lines are arranged from high to low. The diagonal and horizontal lines are determined by the characteristics of the processor and the system and are unaffected by the specific application being run. When a program runs on the processor, it is subject to the double constraints of memory bandwidth and computational intensity; its position in the figure must therefore lie under both the computational intensity and bandwidth constraint lines.
It should be noted that, for a particular program, the compute-to-memory-access ratio is determined by its arithmetic logic, and changing the processor operating frequency does not substantially affect this ratio, so the program's abscissa in fig. 6 is fixed, as indicated by the thick arrow. When the processor operating frequency is high, the computational upper limit is high, the program encounters a memory access bottleneck, and the performance achieved in actual operation is limited by the diagonal line in the figure; when the operating frequency is low, the computational upper limit is low, the program encounters a compute bottleneck, and the actual performance is limited by the lowest horizontal line in the figure. Therefore, the program has a critical frequency, which corresponds to the intersection of the arrow for the program's compute-to-memory-access ratio with the diagonal line: above it the program encounters a memory bottleneck, and below it the program encounters a compute bottleneck. Since the feature utilization index is proportional to the actually achieved performance, the roofline model implies that a program's frequency-feature utilization index relationship appears as a polyline with exactly one break point, namely the critical frequency shown in fig. 6.
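The roofline relationship described above can be sketched numerically. The following is a minimal illustration, where all processor figures (FLOPs per cycle, memory bandwidth) and the program's compute-to-memory-access ratio are hypothetical values, not taken from the embodiment:

```python
def attainable_performance(freq_mhz, flops_per_cycle, bandwidth_gbs, intensity):
    """Roofline model: attainable FLOP/s is the minimum of the compute roof
    (which scales with frequency) and the memory roof (bandwidth x intensity)."""
    compute_roof = flops_per_cycle * freq_mhz * 1e6   # FLOP/s at this frequency
    memory_roof = bandwidth_gbs * 1e9 * intensity     # FLOP/s allowed by bandwidth
    return min(compute_roof, memory_roof)

def critical_frequency(flops_per_cycle, bandwidth_gbs, intensity):
    """Frequency (MHz) at which the compute roof meets the memory roof:
    below it the program is compute-bound, above it memory-bound."""
    return bandwidth_gbs * 1e9 * intensity / (flops_per_cycle * 1e6)

# Hypothetical processor: 1024 FLOP/cycle, 900 GB/s; program at 2 FLOP/Byte
fc = critical_frequency(1024, 900, 2.0)   # about 1758 MHz
```

Raising the frequency past `fc` no longer increases attainable performance, which is exactly why the utilization curve flattens past the break point.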
Because the numerical value of the compute-to-memory-access ratio is difficult to extract online while an arbitrary program is running, roofline parameters such as the critical frequency cannot be calculated directly. Therefore, in the embodiment of the application, a polyline fit is performed on the collected feature utilization indexes and frequencies to obtain the relationship U (f). Because of the existence of the critical frequency, the frequencies visited during real-time adjustment are divided into two classes according to their magnitude relative to the critical frequency, one class greater than or equal to the critical frequency and the other smaller than it; a piecewise function model mapping frequency to feature utilization index is then established around the critical frequency, and initial first fitting coefficients are configured according to the roofline model as the coefficients of each segment of the piecewise function model.
For example, the formula of the established piecewise function model is as follows:

U (f) = a × f + b, when f < fc; U (f) = c × f + d, when f ≥ fc

wherein a, b, c, d are the first fitting coefficients and fc is the critical frequency.
Illustratively, the fitting process is the process of solving for the most effective first fitting coefficients. Therefore, in the embodiment of the application, the frequencies from the real-time adjustment process are input into the initial piecewise function model, the initial first fitting coefficients are updated according to the obtained results, the fitting operation is completed, and the updated first fitting coefficients are finally obtained.
Illustratively, after obtaining the updated first fitting coefficient, the coefficients in the piecewise function model may be refined to obtain a final piecewise function model.
After the frequency is input into the piecewise function model, a predicted feature utilization index can be obtained, optimization can be performed based on the feature utilization index actually collected, so as to obtain a final first fitting coefficient, and the following description is given by taking one of optimization modes as an example:
referring to fig. 7, in some embodiments, step S303 may include steps S401 to S403:
step S401, determining that each frequency greater than or equal to the critical frequency in the real-time adjustment process is a first frequency and each frequency less than the critical frequency is a second frequency, and respectively inputting the first frequency and the second frequency into an initial piecewise function model to obtain a first index and a second index corresponding to each other in the real-time adjustment process;
Step S402, calculating a difference value between the first index and the characteristic utilization index corresponding to the first frequency to obtain a first difference value, and calculating a difference value between the second index and the characteristic utilization index corresponding to the second frequency to obtain a second difference value;
step S403, calculating the total square error between the first difference and the second difference, minimizing the corresponding total square error under all frequencies, solving to obtain a first fitting coefficient under the minimum total square error, and taking the first fitting coefficient as the updated first fitting coefficient.
The frequencies obtained in the real-time adjustment process are divided based on the critical frequency, wherein each frequency greater than or equal to the critical frequency is a first frequency, each frequency less than the critical frequency is a second frequency, and the first frequency and the second frequency are respectively input into the piecewise function model to obtain a corresponding first index and a corresponding second index. It should be understood that the first index and the second index are only for convenience of description, and the first index and the second index are both characteristic utilization indexes.
For example, if fk is a first frequency (fk ≥ fc) and fj is a second frequency (fj < fc), then the first index is U (fk) = c × fk + d and the second index is U (fj) = a × fj + b.

Subsequently, letting Uk denote the collected feature utilization index corresponding to each first frequency and Uj denote the collected feature utilization index corresponding to each second frequency, the difference between the first index and the feature utilization index corresponding to the first frequency is calculated to obtain the first difference (c × fk + d - Uk), and the difference between the second index and the feature utilization index corresponding to the second frequency is calculated to obtain the second difference (a × fj + b - Uj).
Finally, the total squared error over the first differences and second differences is calculated, this total squared error over all frequencies is minimized, the first fitting coefficients at the minimum total squared error are solved for, and these are taken as the updated first fitting coefficients. The optimization problem is:

Cost (fc) = min over a, b, c, d of [ Σj (a × fj + b - Uj)² + Σk (c × fk + d - Uk)² ]

where Cost (fc) is the total squared error. In the embodiment of the application, all data points (fi, Ui) are divided into two classes according to frequency: the first-frequency data points (fk, Uk) and the second-frequency data points (fj, Uj). Linear fits are performed on the two groups of data points simultaneously to obtain the first fitting coefficients minimizing the total squared error, i.e., to solve the optimization problem.
Illustratively, the optimization problem may be solved directly by the conventional Lagrange multiplier method; the specific solution formulas are omitted here. The first fitting coefficients are calculated for one specific critical frequency fc, so the fitting problem must be repeated for each possible critical frequency, all the total squared errors Cost (fc) compared, and the critical frequency minimizing the total squared error taken, yielding the final frequency-feature utilization index relationship, i.e., the piecewise function model.
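The fitting procedure of steps S401 to S403, together with the search over candidate critical frequencies, can be sketched as follows. This is an illustrative least-squares implementation under the stated assumption that U(f) is linear on each side of fc; it is not the patent's exact solver, and the synthetic data are hypothetical:

```python
import numpy as np

def fit_piecewise(freqs, utils):
    """Fit U(f) with two line segments joined at a critical frequency fc:
    U(f) = a*f + b for f < fc, and U(f) = c*f + d for f >= fc.
    Every observed frequency is tried as a candidate fc; the candidate with
    the minimum total squared error Cost(fc) is kept."""
    freqs = np.asarray(freqs, dtype=float)
    utils = np.asarray(utils, dtype=float)
    best = None
    for fc in np.unique(freqs):
        lo = freqs < fc          # second frequencies (compute-bound side)
        hi = ~lo                 # first frequencies (memory-bound side)
        if lo.sum() < 2 or hi.sum() < 2:
            continue             # need at least two points per segment
        a, b = np.polyfit(freqs[lo], utils[lo], 1)
        c, d = np.polyfit(freqs[hi], utils[hi], 1)
        cost = (np.sum((a * freqs[lo] + b - utils[lo]) ** 2)
                + np.sum((c * freqs[hi] + d - utils[hi]) ** 2))
        if best is None or cost < best[0]:
            best = (cost, float(fc), (a, b, c, d))
    cost, fc, coeffs = best
    return fc, coeffs, cost
```

On a synthetic polyline that rises linearly and then flattens, the search recovers the break point as the critical frequency.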
The establishment of the piecewise function model has been described above. The above steps also involve a linear fitting function model, which in the embodiment of the present application is pre-established; its establishment process is described below, specifically as follows:
referring to fig. 8, in some embodiments, the linear fit function model is obtained by the following steps, which may include steps S501 to S505:
step S501, a reference application program is operated on an AI processor, wherein the reference application program corresponds to a target application program;
step S502, basic parameters of an AI processor in the process of running a reference application program are collected, and the basic parameters are regulated by a non-invasive regulating tool;
step S503, obtaining operation performance corresponding to each basic parameter and at least one sub-feature utilization index corresponding to each basic parameter in the adjustment process, wherein the sub-feature utilization index comprises a calculation unit utilization rate, a memory occupancy rate or a memory bandwidth utilization rate;
step S504, performing linear fitting operation between the operation performance obtained in the adjustment process and at least one corresponding sub-feature utilization index, and determining feature weights fitted to the at least one sub-feature utilization index;
Step S505, obtaining a total characteristic utilization index according to at least one sub-characteristic utilization index and a corresponding characteristic weight, and establishing a linear fitting function model according to the characteristic utilization index and the operation performance obtained in the adjusting process.
Illustratively, the linear fitting function model is pre-established. During the pre-establishment process, the AI processor runs a reference application program that corresponds to the target application program, e.g., is of the same type. This step obtains corresponding program operation performance and feature utilization index data by completely running a series of benchmark test programs, thereby establishing a linear fitting function model for this model of AI processor.
It should be noted that, for more accurate modeling, the number of benchmark test programs in the embodiments of the present application should be as large as possible, for example at least 10, so as to cover different computing modes as far as possible. The benchmark test programs that may be used include both open source test sets provided by chip designers and test program sets independently developed by academic organizations (such as AIPerf, MLPerf), which is not particularly limited.
For example, while the benchmark test programs run, the basic parameters of the AI processor during the run can be collected; the specific contents collected are similar to the modeling stage in the above steps and are not repeated here. Likewise, the basic parameters are adjusted here by a non-invasive adjustment tool.
For convenience of distinction, the specific feature utilization quantities actually collected are described as sub-feature utilization indexes in the embodiments of the present application. The chip utilization indexes that can be collected in real time on an AI processor generally include the computing unit utilization, the memory occupancy rate, the memory bandwidth utilization rate, and the like.
For each benchmark program, the program needs to be completely run multiple times under different basic parameters (such as frequency) of the AI processor while feature utilization indexes are collected synchronously, as shown in fig. 9, and a relation diagram between the feature utilization index and benchmark running performance is constructed. Each line shown in fig. 9 represents one benchmark program (excluding the auxiliary line through the origin); different positions on the same line are running results under different basic parameters (such as frequency). The abscissa of each point is the time average of a feature utilization index collected during one run of the program; the ordinate of each point, i.e., the running performance of the benchmark program, is defined as the total work volume divided by the total computation time, normalized by each program's respective maximum performance.
It should be noted that fig. 9 shows real experimental data from a GPU server in the embodiment of the present application. The graph uses the GPU computing unit utilization index collected by the nvml tool as the feature utilization index; as can be clearly seen from fig. 9, this index establishes a good linear homogeneous functional relationship with benchmark program performance.
For example, since there are multiple sub-feature utilization indexes and not every sub-feature utilization quantity has a linear relationship with running performance, a feature utilization index capable of indicating running performance must be found in order to establish the linear fitting function model required for online modeling. Therefore, each collectable sub-feature utilization index is plotted in the manner of fig. 9, the fitting accuracy of the different sub-feature utilization indexes is compared through linear fitting, linear combinations of multiple indexes are also considered, and the index (or combination) with the best linear relationship is selected; this overall quantity is referred to as the feature utilization index. The relationship between the feature utilization index and running performance is thus established as Q = A × U, where Q denotes running performance, U denotes the feature utilization index, and A is a coefficient that differs from program to program.
For example, let U1, U2, and U3 denote the computing unit utilization, the memory occupancy rate, and the memory bandwidth utilization rate respectively. The data (Q, U1, U2, U3) collected from the benchmark test programs may be fit to the linear function Q = A1 × U1 + A2 × U2 + A3 × U3, where A1, A2, and A3 are the feature weights of the corresponding computing unit utilization, memory occupancy rate, and memory bandwidth utilization rate; after the linear fitting operation, the values of at least one of A1, A2, and A3 are determined and the linear fitting function model is finally established. It will be appreciated that if the sub-feature utilization information includes the computing unit utilization, the memory occupancy rate, and the memory bandwidth utilization rate, then A1 × U1 + A2 × U2 + A3 × U3 may represent the feature utilization index.
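The determination of the feature weights A1, A2, A3 can be sketched as an ordinary least-squares problem without intercept (the relation Q = A × U through the origin is homogeneous). The data below are synthetic, for illustration only:

```python
import numpy as np

def fit_feature_weights(Q, U1, U2, U3):
    """Least-squares fit of Q = A1*U1 + A2*U2 + A3*U3 (no intercept).
    Returns the feature weights [A1, A2, A3]."""
    X = np.column_stack([U1, U2, U3])            # one row per benchmark run
    weights, *_ = np.linalg.lstsq(X, np.asarray(Q, dtype=float), rcond=None)
    return weights

# Synthetic benchmark data generated from weights (0.7, 0.2, 0.1)
U1 = [0.9, 0.5, 0.3, 0.8]   # computing unit utilization
U2 = [0.4, 0.6, 0.2, 0.1]   # memory occupancy rate
U3 = [0.5, 0.3, 0.9, 0.2]   # memory bandwidth utilization rate
Q = [0.7 * a + 0.2 * b + 0.1 * c for a, b, c in zip(U1, U2, U3)]
A = fit_feature_weights(Q, U1, U2, U3)           # recovers [0.7, 0.2, 0.1]
```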
It should be noted that, in the embodiment of the present application, feature utilization indexes at multiple different frequency points need to be collected to construct fig. 9; collecting feature utilization indexes at only a single frequency is insufficient. The measured data in the embodiment of the present application show that even if two programs have identical utilization data at some frequency, their respective frequency-running performance relationships may still differ significantly, so to distinguish such differences, feature utilization indexes at multiple frequency points must be collected to establish an effective performance model.
In the embodiment of the present application, the sub-feature utilization indexes comprising the computing unit utilization, the memory occupancy rate, and the memory bandwidth utilization rate are merely examples; any one or more of them may be selected for linear fitting on the premise of meeting the requirements of the embodiments, which is not particularly limited here.
Referring to fig. 10, in some embodiments, step S103 may include steps S601 to S602:
step S601, a plurality of sub-feature utilization rate indexes corresponding to each basic parameter are obtained in a real-time adjustment process;
step S602, weighting calculation is carried out according to at least one sub-feature utilization index and corresponding feature weights, and the feature utilization index in the real-time adjustment process is obtained.
For example, the relationship between the sub-feature utilization indexes and the feature utilization index was described in the process of pre-establishing the linear fitting function model. In the same way, during the modeling stage, as the basic parameters are adjusted, multiple real-time sub-feature utilization indexes of the AI processor are obtained, including the computing unit utilization, the memory occupancy rate, and the memory bandwidth utilization rate.
In the embodiment of the present application, the feature weights corresponding to the sub-feature utilization indexes were determined in the earlier process of establishing the linear fitting function model, and the relationship between the sub-feature utilization indexes and the feature utilization index can be expressed using these feature weights. Therefore, the sub-feature utilization indexes U1, U2, and U3 obtained in real time during the modeling stage are weighted by the feature weights A1, A2, and A3 respectively, obtaining the final feature utilization index (A1 × U1 + A2 × U2 + A3 × U3).
The foregoing describes an example in which the function model is a piecewise function model, and in addition, the function model in the embodiment of the present application may also be an operation performance prediction model, where the target operation parameter at this time is an operation performance, and a process of establishing an operation performance prediction result is specifically as follows:
in some embodiments, the obtaining the running performance prediction result in step S105 may further include step S203:
step S203, inputting real-time basic parameters into an operation performance prediction model to obtain predicted operation performance, and taking the predicted operation performance as a corresponding operation performance prediction result.
As an example, according to the above embodiments, the basic parameter (such as frequency) and the feature utilization index are in a piecewise functional relationship, and the feature utilization index and the running performance are in a linear relationship; therefore, a fitting operation can be performed directly between the sampled basic parameters and the sampled running performance, and a running performance prediction model mapping basic parameters to running performance can be established.
After the running performance prediction model is established, the basic parameters collected in real time can be input directly into it during the prediction stage; the predicted running performance is obtained and taken as the final running performance prediction result.
The foregoing describes an example in which the function model is a piecewise function model and an operation performance prediction model, and in addition, the function model in the embodiment of the present application may also be an operation power consumption prediction model, where the target operation parameter is operation power consumption, and a process of establishing an operation power consumption prediction result is specifically as follows:
in some embodiments, obtaining the operation power consumption prediction result in step S105 may further include step S204:
step S204, inputting real-time basic parameters into an operation power consumption prediction model to obtain predicted operation power consumption, and taking the predicted operation power consumption as a corresponding operation power consumption prediction result.
Before explaining the process of establishing the running power consumption prediction result in the embodiment of the present application, in order to make the flow in the embodiment of the present application clearer, the complete implementation flow in the embodiment of the present application is described herein with reference to the above process of pre-establishing the linear fitting function model. As shown in fig. 11, a complete flowchart of the embodiment of the present application is shown, in which the following steps are performed:
step (1), characteristic utilization index-operation performance relation: constructing a characteristic utilization index-operation performance relation through data on a benchmark test program aiming at AI processor hardware;
Step (2), sampling in real time: when a user runs an actual target application program, the frequency of the AI processor is regulated for a plurality of times in a modeling stage, and real-time data sampling is carried out;
step (3), online performance modeling and online power consumption modeling: performing linear modeling by using the sampling data, and establishing a prediction model for predicting the running performance and the running power consumption;
step (4), online prediction: and predicting the running performance and the running power consumption of the target application program when the target application program runs at other frequencies in a prediction stage by using the established prediction model.
Step (1) only needs to be performed once for a given piece of AI processor hardware to obtain the feature utilization index-running performance relationship model, i.e., the linear fitting function model; step (1) need not be executed again when user programs run later. When a user runs an AI task on the AI server, the system administrator runs steps (2)(3)(4) in the background, either manually or through a program script.
Illustratively, step (2) is illustrated by sampling the operating frequency (MHz), the computing unit utilization (%), the memory bandwidth occupancy (%), and the power consumption (watts), resulting in fig. 12. This step is performed while the AI processor runs the target application and needs to collect the sub-feature utilization indexes and power consumption of the AI processor under different basic parameters (taking frequency as an example). The acquisition process changes the frequency step by step and samples within fixed periods (0.2 seconds in the figure). As shown by the shaded portion at the 1.4-3.2 second position in the figure, during this interval the frequency of the AI processor is increased step by step from about 1000 MHz to about 1500 MHz, one step per 0.2-second period, and then reduced step by step back to about 1000 MHz; the computing unit utilization index, the memory bandwidth utilization index, and the power consumption value at the corresponding frequency are collected in each 0.2-second period. In this step, the specific length of the sampling period and the specific values of the sampled frequency points can be adjusted according to the characteristics of the AI processor, but the processor's supported regulation range and its inherent regulation latency must be considered; in addition, the time interval should not be too long, lest the sampling step take too much time.
In fig. 12, the frequency rises and falls in two stages within the shaded sampling interval, so there are two sampled values of the feature utilization index for each frequency. If various indexes fluctuate strongly during the running of some programs due to specific computation patterns, the number of rising and falling stages can be increased to obtain more sampled values; averaging multiple sampled values mitigates the influence of uncontrollable fluctuation and improves the accuracy of subsequent modeling and prediction.
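The stepped up-and-down sampling sweep described above can be sketched generically as follows. Here `set_freq` and `read_metrics` are hypothetical hooks standing in for a non-invasive tool such as nvml or npu-smi; they are not part of the embodiment:

```python
import time

def sample_sweep(set_freq, read_metrics, freq_points, period=0.2, passes=1):
    """Step the frequency up through freq_points and back down, dwelling
    `period` seconds at each point before sampling, so each frequency is
    sampled twice per pass (matching the two-stage rise/fall of fig. 12).
    Increase `passes` to collect more samples per frequency for averaging."""
    samples = []
    sweep = list(freq_points) + list(reversed(freq_points))
    for _ in range(passes):
        for f in sweep:
            set_freq(f)            # e.g. wraps a clock-setting tool command
            time.sleep(period)     # let the processor settle at the new clock
            samples.append((f, read_metrics()))
    return samples
```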
Based on the above sampling, and following the steps of the foregoing embodiments, a prediction model for the currently running program can be established, yielding the relationship between operating frequency and application running performance shown in the upper part of fig. 13 and the relationship between operating frequency and running power consumption shown in the lower part of fig. 13. In addition, steps (2)(3)(4) may be re-executed at intervals to cope with scenarios in which the currently running program may change.
The built prediction model can be used for predicting key parameters of the program under other different frequencies, such as running performance and running power consumption, and can be applied to system regulation and control under a power consumption limiting scene. For example, if the power consumption of the AI processor is required to be controlled within a certain period of time and cannot exceed a certain upper limit, the system administrator may determine the upper limit of the operating frequency of the AI processor by using the model; for another example, if control program performance is required to be not below a certain lower limit for a certain period of time, the system administrator may determine a lower limit for the AI processor operating frequency using the model. This modeling and regulatory process is not dependent on user involvement.
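Determining the operating-frequency upper limit under a power cap, as in the administrator scenario above, can be sketched as a simple search over the supported frequency range, assuming the power model has the cubic form P(f) = e0 + e1·f + e2·f² + e3·f³ described in this embodiment. The coefficients and limits below are hypothetical:

```python
import numpy as np

def max_freq_under_power(coeffs, power_cap, f_min, f_max, steps=2000):
    """Given fitted cubic power coefficients [e0, e1, e2, e3], return the
    highest frequency in [f_min, f_max] whose predicted power stays at or
    under power_cap, or None if no frequency in the range qualifies."""
    e0, e1, e2, e3 = coeffs
    freqs = np.linspace(f_min, f_max, steps)
    power = e0 + e1 * freqs + e2 * freqs ** 2 + e3 * freqs ** 3
    feasible = freqs[power <= power_cap]
    return float(feasible.max()) if feasible.size else None

# Hypothetical model and a 150 W cap over a 500-2000 MHz range
f_cap = max_freq_under_power([50.0, 0.05, 0.0, 1e-8], 150.0, 500.0, 2000.0)
```

The lower-limit case (guaranteeing minimum performance) is symmetric: search the performance model for the lowest feasible frequency instead.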
This completes the description of the complete flow of the embodiment of the present application. In the modeling stage, the feature utilization indexes that can be collected in real time on an AI processor generally include the computing unit utilization, the memory occupancy rate, the memory bandwidth utilization rate, and the like, and AI processors also generally support simultaneous collection of real-time chip power consumption; the specific collectable index names and collection tool commands differ according to the type of AI processor. For example, one company's NPU processor supports using the npu-smi tool to collect chip utilization data in real time, and another company's GPU processor supports using the nvidia-smi and nvml tools to collect data in real time. It should be noted that these tools must all be adapted to perform real-time collection while the program runs.
As can be seen from the above embodiments, the basic parameter (such as frequency) and the operation power consumption are in a linear relationship, so that the fitting operation can be directly performed between the basic parameter and the operation power consumption obtained by directly sampling, and the operation power consumption prediction model mapped from the basic parameter to the operation power consumption can be established.
After the running power consumption prediction model is established, the basic parameters collected in real time can be input directly into it during the prediction stage; the predicted running power consumption is obtained and taken as the final running power consumption prediction result.
The following describes the process of establishing the running power consumption prediction model, taking the basic parameters as the frequency as an example:
referring to fig. 14, in some embodiments, the process of establishing the operation power consumption prediction model in step S104 may include steps S701 to S703:
step S701, an initial cubic function model mapped from frequency to operation power consumption is established according to the power consumption types in different states in the actual operation process of the AI processor, and an initial second fitting coefficient is configured for the initial cubic function model;
step S702, inputting the frequency in the real-time adjustment process into an initial cubic function model, and updating an initial second fitting coefficient according to the obtained result to obtain an updated second fitting coefficient;
and step S703, adjusting the initial cubic function model according to the updated second fitting coefficient, and taking the adjusted cubic function model as an operation power consumption prediction model.
In the embodiment of the present application, a relative relationship is established between each frequency and the corresponding operation power consumption in the real-time adjustment process, where f represents the frequency, P represents the operation power consumption, and the frequency-operation power consumption relationship is P (f).
According to the power consumption types present in different states during actual operation of the AI processor, an initial cubic function model mapping frequency to running power consumption is established, and initial second fitting coefficients are configured for it; the choice of a cubic fit is determined by the theoretical formula for AI processor running power consumption. The running power consumption of an AI processor is positively correlated with frequency, i.e., the higher the frequency, the higher the running power consumption.
In one possible implementation, the dominant power consumption type in the AI processor has a cubic relationship with frequency, while the remaining power consumption types grow no faster than cubically, so a cubic function model is built to achieve the best fitting effect. Therefore, in the embodiment of the application, a cubic function model is pre-established, initial second fitting coefficients are configured for it, and the second fitting coefficients are continuously optimized in the subsequent fitting process.
After the initial cubic function model is obtained, the frequencies from the real-time adjustment process can be input into it to obtain the model's predicted running power consumption as output; comparing the predicted running power consumption with the actually sampled running power consumption allows the fit to be performed, the optimal second fitting coefficients to be solved, and the updated second fitting coefficients to be obtained.
Illustratively, after obtaining the updated second fitting coefficient, coefficients in the cubic function model may be perfected to obtain a final cubic function model, which in the embodiment of the present application is referred to as an operation power consumption prediction model.
After the frequency is input into the cubic function model, the predicted running power consumption can be obtained, and optimization can be performed based on the actually collected running power consumption, so as to obtain a final second fitting coefficient, and the following description is given by taking one of the optimization modes as an example:
referring to fig. 15, in some embodiments, step S701 may include steps S801 to S803:
step S801, determining static power consumption, dynamic power consumption and heat dissipation power consumption of an AI processor in different states in the actual operation process;
step S802, determining that the dynamic power consumption and the frequency are in a cubic relation from the direct proportion relation between different power consumption types of the static power consumption, the dynamic power consumption and the heat dissipation power consumption and the frequency respectively;
step S803, an initial cubic function model mapped from frequency to operation power consumption is established according to the cubic relation, and initial second fitting coefficients are generated as coefficients of functions in the cubic function model.
Illustratively, fitting with a cubic function model is justified by the theoretical formula for the AI processor's running power consumption. The power consumed during actual operation of the AI processor mainly comprises static power consumption, dynamic power consumption and heat dissipation power consumption. The dominant dynamic power consumption P_dyn is caused by the charging and discharging of capacitance when logic gates switch, and follows the formula P_dyn = C·V²·f, where C is the capacitance, V is the voltage and f is the frequency. Since the operating voltage V is approximately proportional to the frequency f, the dynamic power consumption P_dyn is approximately cubic in the frequency. The remaining static power consumption and heat dissipation power consumption generally relate to frequency by no more than a cubic function, so the total power consumption of the AI processor can be fitted by a cubic function model. Experimental data in the embodiments of the present application also show that linear or quadratic functions are insufficient for optimal modeling and produce significant errors, while cubic function modeling achieves the best fit.
Illustratively, the formula of the cubic function model established in the embodiment of the present application is as follows:

P(f) = e0 + e1·f + e2·f² + e3·f³

wherein e0, e1, e2 and e3 are the second fitting coefficients, P(f) is the predicted running power consumption, and f is the frequency input into the cubic function model.
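As an illustrative sketch (not part of the patent text), evaluating the cubic model with second fitting coefficients e0 to e3 can be expressed as follows; the function and variable names are assumptions:

```python
def predict_power(f, e):
    """Evaluate the cubic power model P(f) = e0 + e1*f + e2*f^2 + e3*f^3.

    f is the frequency; e = [e0, e1, e2, e3] holds the second fitting
    coefficients. Names are illustrative, not taken from the patent.
    """
    e0, e1, e2, e3 = e
    return e0 + e1 * f + e2 * f ** 2 + e3 * f ** 3
```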
Referring to fig. 16, in some embodiments, step S702 may include steps S901 to S903:
step S901, inputting the frequency in the real-time adjustment process into an initial cubic function model to obtain corresponding first power consumption in the real-time adjustment process;
step S902, calculating a difference value between the first power consumption and the operation power consumption corresponding to the frequency to obtain a power consumption difference value;
Step S903, calculating the square error of the power consumption difference, minimizing the corresponding square error under all frequencies, solving to obtain a second fitting coefficient under the minimum square error, and using the second fitting coefficient as an updated second fitting coefficient.
Exemplarily, let the frequency points collected during the real-time adjustment process be f_i (i = 1, 2, …, n), with corresponding collected actual power consumption values P_i (i = 1, 2, …, n). For these data points (f_i, P_i), a cubic function model fitting is performed, i.e. the following optimization problem is solved:

min over e0, e1, e2, e3 of Σ_{i=1}^{n} (P(f_i) − P_i)²

wherein P(f_i) is the first power consumption and P(f_i) − P_i is the power consumption difference. In the embodiment of the present application, the squared error of the power consumption difference is calculated, the sum of the squared errors over all frequencies is minimized, and the second fitting coefficients e0, e1, e2 and e3 at the minimum squared error are obtained by solving and used as the updated second fitting coefficients.
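A minimal sketch of this least-squares fit, assuming NumPy is available (the function name and data layout are illustrative, not from the patent):

```python
import numpy as np

def fit_cubic_power_model(freqs, powers):
    """Solve min over e0..e3 of sum_i (e0 + e1*f_i + e2*f_i^2 + e3*f_i^3 - P_i)^2
    by ordinary least squares over the sampled (f_i, P_i) points."""
    # Design matrix with columns 1, f, f^2, f^3
    A = np.vander(freqs, N=4, increasing=True)
    e, *_ = np.linalg.lstsq(A, powers, rcond=None)
    return e  # [e0, e1, e2, e3]
```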
Finally, based on the obtained relation between frequency and running power consumption, the embodiment of the application obtains the final cubic function model and uses it as the operation power consumption prediction model to realize the prediction of running power consumption. As with the frequency-to-running-performance relation, this functional relation applies only to the prediction phase of the currently running program, so when the running program changes significantly, the modeling phase must be repeated: the data are resampled and the model is rebuilt. It should be noted that, based on the above embodiments, since the resources consumed by sampling and modeling are small, the overall impact on operation is negligible.
In summary, whereas previous methods can obtain the running performance or power consumption of a program only after the program has finished running completely, the method provided by the embodiment of the application realizes online modeling by collecting the characteristic utilization rate indexes, running power consumption and the like. The AI processor allows the required characteristic utilization rate indexes, running power consumption and the like to be collected in real time through a system tool, the collection of these indexes does not affect the normal running of the application program, and the process requires no invasive instrumentation, i.e. no modification of the application program's source code. The method provided by the embodiment of the application is therefore very easy to deploy on an actual cluster and has high practicability.
In addition, the serious waste of computing resources in conventional performance and power consumption modeling on AI processors is solved. Some conventional modeling methods rely on application feature analysis of a program, and the feature analysis tool used (such as the nvprof tool for NVIDIA GPUs) must run the program to completion to obtain an analysis result; the analysis process requires the same application program to be run repeatedly multiple times, causing serious waste of computing resources. The method provided by the embodiment of the application only needs to sample the characteristic utilization rate index or running power consumption during a single run of the program, so that modeling and prediction can be realized within a single run, avoiding the need to run the same program repeatedly.
It should be noted that all fitted models used in the embodiments of the present application, as described above, have a sufficient theoretical basis and are composed of basic functions with interpretability and predictability (in particular the piecewise function model, the cubic function model, etc.), so the obtained prediction models are easy to understand and can be guaranteed not to pose hidden dangers to the safe operation of the system when deployed and run on the AI computing cluster.
Next, a flow after obtaining a parameter prediction result in the embodiment of the present application will be described. In an embodiment, the parameter prediction result includes an operation performance prediction result and an operation power consumption prediction result, that is, in the embodiment of the present application, a prediction model for predicting operation performance and operation power consumption is established at the same time, which is not described herein.
Referring to fig. 17, in some embodiments, the real-time parameter prediction method of the AI processor may further calculate an energy efficiency ratio, and adjust the basic parameters according to the energy efficiency ratio, which specifically includes steps S1001 to S1002:
step S1001, calculating the energy efficiency ratio of the AI processor to run the target application program under the real-time basic parameters according to the running performance prediction result and the running power consumption prediction result;
Step S1002, adjusting the basic parameters in real time according to one of the operation performance prediction result, the operation power consumption prediction result, and the energy efficiency ratio.
For example, if the operation performance prediction result and the operation power consumption prediction result are obtained at the same time, the energy efficiency ratio may be calculated. The energy efficiency ratio of the AI processor reflects the balance between its performance and power consumption under a given workload, and can generally be calculated by the following formula: energy efficiency ratio = running performance / running power consumption. Since the operation performance prediction result and the operation power consumption prediction result are predictions of running performance and running power consumption respectively, they can be substituted into this formula to calculate the energy efficiency ratio of the AI processor running the target application program under the real-time basic parameters.
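As a minimal sketch of this calculation (the model callables and candidate frequency list are assumptions, not from the patent), the energy efficiency ratio can be evaluated at each candidate setting and the best one selected:

```python
def energy_efficiency_ratio(performance, power):
    """Energy efficiency ratio = running performance / running power consumption."""
    return performance / power

def best_frequency_by_eer(freqs, perf_model, power_model):
    """Return the candidate frequency with the highest predicted energy
    efficiency ratio. perf_model and power_model are hypothetical callables
    returning predicted running performance and power at a given frequency."""
    return max(freqs, key=lambda f: energy_efficiency_ratio(perf_model(f),
                                                            power_model(f)))
```

In practice the two callables would be the fitted running performance and running power consumption prediction models described above.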
It should be noted that there may be a large difference between the running performance and the running power consumption of the AI processor under different workloads, so that when the energy efficiency ratio is calculated, an appropriate workload should be selected according to the actual situation, and the reliability of the result may be improved by taking an average value after multiple tests. Meanwhile, when the type selection and optimization of the AI processor are carried out, indexes such as energy efficiency ratio and the like are considered, so that the AI processor with good balance of performance and power consumption is selected, and effective energy conservation and management measures are carried out on the system.
Therefore, after the energy efficiency ratio is obtained, the embodiment of the application can adjust the basic parameters in real time according to one of the operation performance prediction result, the operation power consumption prediction result and the energy efficiency ratio. For example, if a user does not care about running power consumption, the user may focus only on the operation performance prediction result and improve running performance by adjusting the basic parameters; if the user does not care about running performance, the user may focus only on the operation power consumption prediction result and reduce running power consumption by adjusting the basic parameters; and if the user needs to consider the overall energy efficiency ratio, the basic parameters are adjusted to maximize the energy efficiency ratio, realizing online energy efficiency ratio adjustment of the AI processor while the program runs. Compared with calculating the energy efficiency ratio only after the program has finished running, the embodiment of the application realizes online energy efficiency ratio adjustment and optimization from the running state, improving the management level.
Referring to fig. 18, in some embodiments, the real-time parameter prediction method of the AI processor may further be re-modeled after switching the target application or entering a new operation phase, and specifically includes steps S1101 to S1102:
step S1101, if the AI processor is switched to run to a new target application or the target application runs to a new running stage, re-entering the modeling stage, wherein the running stage of the target application includes initializing, loading data, vector computing, scalar computing or data writing back;
Step S1102, re-fitting operation is performed according to the basic parameters and the target operation parameters newly obtained after re-entering the modeling stage, so as to build a new function model.
For example, an AI processor may run different programs at different times, and the same program may also consist of multiple phases with significantly different computing characteristics, for example initialization, loading data, vector calculation, scalar calculation and data writing back. Therefore, in the embodiment of the present application, after the target application program is switched or a new running stage is entered, modeling must be performed again; that is, the prediction model established by the above steps is specific to the current program, or even to a particular running stage of the current program. Likewise, if the AI processor is replaced, modeling must be performed again.
Thus, when the program is switched or enters a different phase on the AI processor, the modeling phase needs to be performed again, resampled to obtain updated data and re-modeled.
Referring to fig. 19, in some embodiments, if the AI processor periodically runs the target application, the real-time parameter prediction method of the AI processor may also periodically perform modeling, which specifically includes steps S1201 to S1202:
Step S1201, dividing each operation cycle of the target application into a plurality of operation phases;
step S1202, setting the modeling phase and the prediction phase in the running phase, so that the modeling phase in each running period is performed during the period in which the target application runs, wherein at least one modeling phase is located before the prediction phase.
For example, in the embodiment of the present application, the target application may be periodically run based on the task requirement of the AI processor, and the running time of one target application is long, so in order to ensure the stability of modeling, that is, ensure the accuracy of prediction, in the embodiment of the present application, modeling may be periodically performed.
In the embodiment of the present application, a corresponding modeling stage is set for each running period of the program. Specifically, each running period of the target application is divided into a plurality of running stages, which are then partitioned into modeling stages and prediction stages, with at least one modeling stage located before the prediction stages. That is, modeling may be set in multiple stages and prediction may also be set in multiple stages, but there is always one modeling stage before the prediction stages, so that as the target application runs periodically, each period is guaranteed to first enter a modeling stage and then apply the established prediction model in the prediction stages.
In addition, when there are multiple modeling stages in one period, they can be interleaved among multiple prediction stages, so the modeling stage can be performed repeatedly within the period. Since the overhead of the two specific steps, sampling and modeling, is negligible even when the modeling phase is executed repeatedly, it is entirely acceptable to perform the sampling-and-modeling task on the computing cluster at regular intervals, e.g. once every few tens of seconds or minutes.
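As an illustrative sketch of such interleaving (the labels and the recurrence interval are assumptions, not fixed by the patent), the running stages of one period can be assigned so that a modeling stage always comes first and recurs at a regular interval:

```python
def label_phases(num_phases, modeling_every=3):
    """Label the running stages of one period as 'modeling' or 'prediction'.

    A modeling stage is placed first, so at least one modeling stage
    precedes every prediction stage, and it recurs every `modeling_every`
    stages; the interval is an illustrative assumption.
    """
    return ["modeling" if i % modeling_every == 0 else "prediction"
            for i in range(num_phases)]
```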
For example, the start-stop times of the modeling phase and the prediction phase may not be divided according to time nodes, but according to the degree of establishment of the model. For example, after entering the modeling stage, if the final prediction model is completely established, then the execution of the modeling stage is judged to be completed, and the prediction stage can be entered at any time according to the requirement. Of course, in the embodiment of the present application, explicit start-stop time nodes may be set for the modeling stage and the prediction stage, which is not specifically limited.
Referring to fig. 20, the embodiment of the present application further provides a real-time parameter prediction system of an AI processor, which may implement the real-time parameter prediction method of an AI processor, where the real-time parameter prediction system of an AI processor includes:
A program running module 2001 for running the target application program on the AI processor, wherein a running stage of the target application program includes a modeling stage and a prediction stage after the modeling stage;
the data acquisition module 2002 is used for acquiring basic parameters in the operation process of the AI processor in a modeling stage;
the adjusting module 2003 is used for adjusting basic parameters in real time through a non-invasive adjusting tool corresponding to the AI processor, and acquiring target operation parameters corresponding to the basic parameters in the real-time adjusting process;
a modeling module 2004 for performing a fitting operation based on the plurality of target operating parameters and the corresponding basic parameters, and establishing a function model mapped from the basic parameters to the target operating parameters;
the prediction module 2005 is configured to obtain real-time basic parameters in a prediction stage, and input the real-time basic parameters into the function model to obtain predicted target operation parameters, so as to determine a parameter prediction result according to the predicted target operation parameters.
By way of example, the real-time parameter prediction system of the AI processor may execute the real-time parameter prediction method of the AI processor. While a target application program runs on the AI processor, if a prediction model needs to be built after the program enters the modeling stage, the basic parameters of the AI processor during running are first collected, and the basic parameters are then adjusted in real time through the non-invasive adjustment tool corresponding to the AI processor, so that parameter adjustment is achieved without changing the logic of the AI processor itself and without additional computing resources. The target operation parameters corresponding to each basic parameter are obtained during the real-time adjustment; since the target operation parameters change as the basic parameters change, a fitting operation can be performed accordingly and a function model mapping the basic parameters to the target operation parameters established, realizing online modeling. In the subsequent prediction stage, the real-time basic parameters can be input into the model to obtain the required parameter prediction result. Therefore, the embodiment of the application can be deployed and run online on the AI processor without additional computing resources during running, and since the model is built from data sampled while the program runs, it has interpretability and predictability, and the reliability of the prediction result is high.
The specific implementation of the real-time parameter prediction system of the AI processor is basically the same as the specific embodiment of the real-time parameter prediction method of the AI processor, and will not be described herein. On the premise of meeting the requirements of the embodiment of the application, the real-time parameter prediction system of the AI processor can also be provided with other functional modules so as to realize the real-time parameter prediction method of the AI processor in the embodiment.
The embodiment of the application also provides electronic equipment, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the real-time parameter prediction method of the AI processor when executing the computer program. The electronic equipment can be any intelligent terminal including a tablet personal computer, a vehicle-mounted computer and the like.
Referring to fig. 21, fig. 21 illustrates a hardware structure of an electronic device according to another embodiment, the electronic device includes:
the processor 2101 may be implemented by a general purpose CPU (Central Processing Unit), a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc., for executing related programs to implement the technical solutions provided by the embodiments of the present application;
Memory 2102 may be implemented in the form of read-only memory (Read Only Memory, ROM), static storage, dynamic storage, or random access memory (Random Access Memory, RAM), among others. The memory 2102 may store an operating system and other application programs; when the technical solutions provided in the embodiments of the present application are implemented by software or firmware, the relevant program code is stored in the memory 2102, and the processor 2101 invokes it to execute the real-time parameter prediction method of the AI processor of the embodiments of the present application;
an input/output interface 2103 for implementing information input and output;
the communication interface 2104 is configured to implement communication interaction between the device and other devices, and may implement communication in a wired manner (e.g., USB, network cable, etc.), or may implement communication in a wireless manner (e.g., mobile network, WIFI, bluetooth, etc.);
a bus 2105 for transferring information between components of the device (e.g., the processor 2101, memory 2102, input/output interfaces 2103, and communication interfaces 2104);
wherein the processor 2101, memory 2102, input/output interface 2103 and communication interface 2104 enable communication connections within the device between each other via bus 2105.
The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the real-time parameter prediction method of the AI processor when being executed by the processor.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiments described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application, and as those skilled in the art can know that, with the evolution of technology and the appearance of new application scenarios, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
It will be appreciated by those skilled in the art that the technical solutions shown in the figures do not constitute limitations of the embodiments of the present application, and may include more or fewer steps than shown, or may combine certain steps, or different steps.
The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the present application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in this application, "at least one" means one or more, and "a plurality" means two or more. "and/or" for describing the association relationship of the association object, the representation may have three relationships, for example, "a and/or B" may represent: only a, only B and both a and B are present, wherein a, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
In the several embodiments provided in this application, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative, e.g., the division of the above elements is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in part or all of the technical solution or in part in the form of a software product stored in a storage medium, including multiple instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods of the various embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing a program.
Preferred embodiments of the present application are described above with reference to the accompanying drawings, and thus do not limit the scope of the claims of the embodiments of the present application. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims (15)

1. A method for predicting parameters of an AI processor in real time, the method comprising:
running a target application on an AI processor, wherein a run phase of the target application includes a modeling phase and a prediction phase after the modeling phase;
in the modeling stage, basic parameters in the operation process of the AI processor are collected;
the basic parameters are regulated in real time through a non-invasive regulating tool corresponding to the AI processor, and target operation parameters corresponding to the basic parameters are obtained in the real-time regulating process, wherein the target operation parameters are parameters which change along with the change of the basic parameters, and the target operation parameters are characteristic utilization rate indexes, operation performance or operation power consumption;
performing fitting operation based on a plurality of target operation parameters and the corresponding basic parameters, and establishing a function model mapped to the target operation parameters by the basic parameters;
And in the prediction stage, acquiring the real-time basic parameters, and inputting the real-time basic parameters into the function model to obtain the predicted target operation parameters so as to determine a parameter prediction result according to the predicted target operation parameters.
2. The AI processor real-time parameter prediction method of claim 1, wherein if the function model is a piecewise function model, the target operating parameter is a feature utilization index; inputting the real-time basic parameters into the function model to obtain the predicted target operation parameters, so as to determine a parameter prediction result according to the predicted target operation parameters, wherein the method comprises the following steps:
inputting the real-time basic parameters into the piecewise function model to obtain a predicted characteristic utilization index;
acquiring a preset linear fitting function model mapped to the running performance by the characteristic utilization index, and inputting the predicted characteristic utilization index into the linear fitting function model to obtain a corresponding running performance prediction result;
or if the function model is an operation performance prediction model, the target operation parameter is operation performance; inputting the real-time basic parameters into the function model to obtain the predicted target operation parameters, so as to determine a parameter prediction result according to the predicted target operation parameters, wherein the method comprises the following steps:
Inputting the real-time basic parameters into the running performance prediction model to obtain predicted running performance, and taking the predicted running performance as a corresponding running performance prediction result;
or if the function model is an operation power consumption prediction model, the target operation parameter is operation power consumption; inputting the real-time basic parameters into the function model to obtain the predicted target operation parameters, so as to determine a parameter prediction result according to the predicted target operation parameters, wherein the method comprises the following steps:
inputting the real-time basic parameters into the operation power consumption prediction model to obtain predicted operation power consumption, and taking the predicted operation power consumption as a corresponding operation power consumption prediction result.
3. The AI processor real-time parameter prediction method of claim 2, wherein if the function model is a piecewise function model, the target operation parameter is a feature utilization index, and the basic parameter is a frequency;
said performing a fitting operation based on the target operation parameters and the corresponding basic parameters to establish a function model that maps the basic parameters to the target operation parameters comprises:
establishing a roofline model according to each frequency and the corresponding feature utilization index in the real-time adjustment process, and determining a critical frequency among the frequencies according to the roofline model;
establishing an initial first fitting coefficient according to the roofline model, and establishing an initial piecewise function model that maps the frequency to the feature utilization index according to the critical frequency and the initial first fitting coefficient;
inputting each frequency in the real-time adjustment process into the initial piecewise function model according to its magnitude relative to the critical frequency, and updating the initial first fitting coefficient according to the obtained results to obtain an updated first fitting coefficient;
and adjusting the initial piecewise function model according to the updated first fitting coefficient to obtain the final piecewise function model.
4. The AI processor real-time parameter prediction method of claim 3, wherein said inputting each frequency in the real-time adjustment process into the initial piecewise function model according to its magnitude relative to the critical frequency, and updating the initial first fitting coefficient according to the obtained results to obtain the updated first fitting coefficient, comprises:
determining each frequency in the real-time adjustment process that is greater than or equal to the critical frequency to be a first frequency, and each frequency smaller than the critical frequency to be a second frequency, and inputting the first frequencies and the second frequencies into the initial piecewise function model respectively, to obtain first indexes and second indexes corresponding to the real-time adjustment process;
calculating the difference between each first index and the feature utilization index corresponding to the first frequency to obtain a first difference, and calculating the difference between each second index and the feature utilization index corresponding to the second frequency to obtain a second difference;
and calculating the total squared error over the first differences and the second differences, minimizing the total squared error over all frequencies, solving for the first fitting coefficient that minimizes the total squared error, and taking it as the updated first fitting coefficient.
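The segmented least-squares fit of claims 3–4 can be sketched as follows. The two-segment linear form, the variable names, and the use of `numpy.polyfit` are illustrative assumptions; the claims do not fix the exact functional shape of each segment.

```python
import numpy as np

def fit_piecewise(freqs, utils, f_crit):
    """Fit a two-segment frequency -> feature-utilization-index model.

    Frequencies below the critical frequency form one segment ("second
    frequencies"), those at or above it form the other ("first
    frequencies"); each segment is fit by least squares, and the total
    squared error summed over both segments is the quantity minimized
    in claim 4.
    """
    freqs = np.asarray(freqs, dtype=float)
    utils = np.asarray(utils, dtype=float)
    lo = freqs < f_crit                              # second frequencies
    hi = ~lo                                         # first frequencies
    coef_lo = np.polyfit(freqs[lo], utils[lo], 1)    # fitting coefficients,
    coef_hi = np.polyfit(freqs[hi], utils[hi], 1)    # one pair per segment
    pred = np.where(lo, np.polyval(coef_lo, freqs), np.polyval(coef_hi, freqs))
    total_sq_err = float(np.sum((pred - utils) ** 2))
    return coef_lo, coef_hi, total_sq_err
```

With synthetic data lying exactly on two lines split at the critical frequency, the recovered coefficients match the generating lines and the total squared error is essentially zero.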
5. The AI processor real-time parameter prediction method of claim 2, wherein the linear fitting function model is obtained by:
running a reference application program on the AI processor, wherein the reference application program corresponds to the target application program;
collecting the basic parameters of the AI processor while running the reference application program, and adjusting the basic parameters through the non-invasive adjustment tool;
acquiring the operation performance corresponding to each basic parameter and at least one sub-feature utilization index corresponding to each basic parameter in the adjustment process, wherein the sub-feature utilization index comprises a computing unit utilization rate, a memory occupancy rate, or a memory bandwidth utilization rate;
performing a linear fitting operation between the operation performance obtained in the adjustment process and the corresponding at least one sub-feature utilization index, and determining the feature weight fitted to the at least one sub-feature utilization index;
and obtaining a total feature utilization index according to the at least one sub-feature utilization index and the corresponding feature weight, and establishing the linear fitting function model according to the feature utilization index and the operation performance obtained in the adjustment process.
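A numpy-based sketch of the weight-fitting procedure in claim 5 follows. The least-squares formulation and all names are illustrative assumptions; the claim specifies only that feature weights are obtained by a linear fit between the sub-feature indices and the measured performance.

```python
import numpy as np

def fit_feature_model(sub_features, performance):
    """Fit feature weights and an index -> performance linear model.

    sub_features: (n_samples, n_sub) matrix of sub-feature utilization
    indices (e.g. computing unit utilization, memory occupancy, memory
    bandwidth utilization), collected while sweeping the basic parameters.
    performance:  measured operation performance per sample.
    """
    X = np.asarray(sub_features, dtype=float)
    y = np.asarray(performance, dtype=float)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)  # feature weights
    total_index = X @ weights             # total feature utilization index
    k, b = np.polyfit(total_index, y, 1)  # linear fit: perf ≈ k*index + b
    return weights, (k, b)
```

The weighted sum `X @ weights` is the total feature utilization index of claim 6, and `(k, b)` is the preset linear fitting function model that claim 2 later uses to map a predicted index to a performance prediction.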
6. The AI processor real-time parameter prediction method of claim 5, wherein if the function model is a piecewise function model, the target operation parameter is a feature utilization index;
said acquiring the target operation parameters corresponding to the basic parameters in the real-time adjustment process comprises:
acquiring at least one sub-feature utilization index corresponding to the basic parameters in the real-time adjustment process;
and performing a weighted calculation according to the at least one sub-feature utilization index and the corresponding feature weight, to obtain the feature utilization index in the real-time adjustment process.
7. The AI processor real-time parameter prediction method of claim 2, wherein if the function model is an operation power consumption prediction model, the target operation parameter is operation power consumption, and the basic parameter is frequency;
said performing a fitting operation based on the target operation parameters and the corresponding basic parameters to establish a function model that maps the basic parameters to the target operation parameters comprises:
establishing, according to the power consumption types in different states during actual operation of the AI processor, an initial cubic function model that maps the frequency to the operation power consumption, and configuring an initial second fitting coefficient for the initial cubic function model;
inputting the frequency in the real-time adjustment process into the initial cubic function model, and updating the initial second fitting coefficient according to the obtained result to obtain an updated second fitting coefficient;
and adjusting the initial cubic function model according to the updated second fitting coefficient, and taking the adjusted cubic function model as the operation power consumption prediction model.
8. The AI processor real-time parameter prediction method of claim 7, wherein said establishing an initial cubic function model that maps the frequency to the operation power consumption according to the power consumption types in different states during actual operation of the AI processor, and configuring an initial second fitting coefficient for the initial cubic function model, comprises:
determining the static power consumption, dynamic power consumption, and heat dissipation power consumption of the AI processor in different states during actual operation;
determining, from the proportional relation between each of the static, dynamic, and heat dissipation power consumption types and the frequency, that the dynamic power consumption has a cubic relation with the frequency;
and establishing an initial cubic function model that maps the frequency to the operation power consumption according to the cubic relation, and generating initial second fitting coefficients as the coefficients of the cubic function model.
9. The AI processor real-time parameter prediction method of claim 7, wherein said inputting the frequency in the real-time adjustment process into the initial cubic function model, and updating the initial second fitting coefficient according to the obtained result to obtain the updated second fitting coefficient, comprises:
inputting the frequency in the real-time adjustment process into the initial cubic function model, to obtain a corresponding first power consumption in the real-time adjustment process;
calculating the difference between the first power consumption and the operation power consumption corresponding to the frequency, to obtain a power consumption difference;
and calculating the squared error of the power consumption difference, minimizing the squared error over all frequencies, solving for the second fitting coefficient that minimizes the squared error, and taking it as the updated second fitting coefficient.
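The least-squares fit of the cubic power model in claims 7–9 can be sketched as a degree-3 polynomial fit over the sampled frequencies. The exact coefficient structure (which terms are free versus fixed) is an assumption here; the claims state only that the model is cubic and that the squared error is minimized.

```python
import numpy as np

def fit_power_model(freqs, power):
    """Fit operation power consumption as a cubic function of frequency.

    The dynamic term scales roughly as f^3 (P_dyn ~ C * V^2 * f with the
    voltage V scaling with f); the lower-order coefficients absorb static
    and heat-dissipation power. np.polyfit minimizes the total squared
    error over all sampled frequencies, yielding the updated "second
    fitting coefficients" of claim 9.
    """
    coeffs = np.polyfit(np.asarray(freqs, dtype=float),
                        np.asarray(power, dtype=float), 3)
    return coeffs

def predict_power(coeffs, freq):
    """Evaluate the fitted cubic model at a given frequency."""
    return float(np.polyval(coeffs, freq))
```

Given measurements that actually follow a cubic law, the fit recovers the generating polynomial and predicts unseen frequencies accurately.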
10. The AI processor real-time parameter prediction method of claim 2, wherein the parameter prediction result includes an operation performance prediction result and an operation power consumption prediction result, and the method further comprises:
calculating, according to the operation performance prediction result and the operation power consumption prediction result, the energy efficiency ratio of the AI processor running the target application program under the real-time basic parameters;
and adjusting the basic parameters in real time according to one of the operation performance prediction result, the operation power consumption prediction result, and the energy efficiency ratio.
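The energy efficiency ratio of claim 10 is predicted performance per unit of predicted power. A hypothetical selection loop over candidate frequencies (the claim itself does not prescribe how the ratio is used to pick a parameter) might look like:

```python
def energy_efficiency(perf_pred, power_pred):
    """Energy efficiency ratio: predicted performance per watt."""
    return perf_pred / power_pred

def best_frequency(candidates, perf_model, power_model):
    """Pick the candidate frequency maximizing the predicted efficiency.

    perf_model and power_model stand in for the two fitted prediction
    models; their exact form is whatever the modeling stage produced.
    """
    return max(candidates,
               key=lambda f: energy_efficiency(perf_model(f), power_model(f)))
```

For example, with performance growing linearly in frequency but power growing cubically, the efficiency-optimal frequency sits well below the maximum candidate.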
11. The AI processor real-time parameter prediction method of claim 1, further comprising:
if the AI processor switches to running a new target application program, or the target application program enters a new running stage, re-entering the modeling stage, wherein the running stage of the target application program comprises initialization, data loading, vector calculation, scalar calculation, or data write-back;
and performing the fitting operation again according to the basic parameters and target operation parameters newly obtained after re-entering the modeling stage, to establish a new function model.
12. The AI processor real-time parameter prediction method of claim 1, wherein the AI processor runs the target application program periodically, and the method further comprises:
dividing each running cycle of the target application program into a plurality of running stages;
setting the modeling stage and the prediction stage among the running stages, such that modeling is performed in the modeling stage within each running cycle during the periodic running of the target application program, wherein at least one modeling stage precedes the prediction stage.
13. A real-time parameter prediction system for an AI processor, the system comprising:
a program running module, configured to run a target application program on the AI processor, wherein the running stage of the target application program comprises a modeling stage and a prediction stage after the modeling stage;
a data acquisition module, configured to acquire basic parameters during operation of the AI processor in the modeling stage;
an adjustment module, configured to adjust the basic parameters in real time through a non-invasive adjustment tool corresponding to the AI processor, and acquire target operation parameters corresponding to the basic parameters in the real-time adjustment process, wherein the target operation parameters are parameters that change as the basic parameters change, and are a feature utilization index, operation performance, or operation power consumption;
a modeling module, configured to perform a fitting operation based on a plurality of the target operation parameters and the corresponding basic parameters, and establish a function model that maps the basic parameters to the target operation parameters;
and a prediction module, configured to acquire real-time basic parameters in the prediction stage, and input the real-time basic parameters into the function model to obtain predicted target operation parameters, so as to determine a parameter prediction result according to the predicted target operation parameters.
14. An electronic device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the AI processor real-time parameter prediction method of any one of claims 1-12.
15. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the AI processor real-time parameter prediction method of any one of claims 1-12.
CN202310881075.4A 2023-07-18 2023-07-18 Real-time parameter prediction method, system, equipment and medium for AI processor Active CN116627433B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310881075.4A CN116627433B (en) 2023-07-18 2023-07-18 Real-time parameter prediction method, system, equipment and medium for AI processor

Publications (2)

Publication Number Publication Date
CN116627433A CN116627433A (en) 2023-08-22
CN116627433B true CN116627433B (en) 2024-01-09

Family

ID=87621544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310881075.4A Active CN116627433B (en) 2023-07-18 2023-07-18 Real-time parameter prediction method, system, equipment and medium for AI processor

Country Status (1)

Country Link
CN (1) CN116627433B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844024A (en) * 2016-12-30 2017-06-13 中国科学院计算技术研究所 The GPU/CPU dispatching methods and system of a kind of self study run time forecast model
CN107515663A (en) * 2016-06-15 2017-12-26 北京京东尚科信息技术有限公司 The method and apparatus for adjusting central processor core running frequency
CN109308246A (en) * 2017-07-27 2019-02-05 阿里巴巴集团控股有限公司 Optimization method, device and the equipment of system parameter, readable medium
CN111198808A (en) * 2019-12-25 2020-05-26 东软集团股份有限公司 Method, device, storage medium and electronic equipment for predicting performance index
CN112230964A (en) * 2020-10-29 2021-01-15 Oppo广东移动通信有限公司 Application program development method, application program running method, device, equipment and medium
CN114895773A (en) * 2022-04-08 2022-08-12 中山大学 Energy consumption optimization method, system and device of heterogeneous multi-core processor and storage medium
CN115080209A (en) * 2022-06-28 2022-09-20 北京百度网讯科技有限公司 System resource scheduling method and device, electronic equipment and storage medium
CN116069152A (en) * 2023-03-06 2023-05-05 鹏城实验室 Operation frequency control method, system and related equipment for AI (advanced technology attachment) computing cluster
CN116166550A (en) * 2023-02-10 2023-05-26 华为技术有限公司 Processor performance prediction system, method and related equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10180828B2 (en) * 2014-04-29 2019-01-15 Significs And Elements, Llc Systems and methods for power optimization of processors
US9870275B2 (en) * 2015-05-12 2018-01-16 International Business Machines Corporation Processor thread management
EP3182288B1 (en) * 2015-12-15 2019-02-13 Tata Consultancy Services Limited Systems and methods for generating performance prediction model and estimating execution time for applications
US11354579B2 (en) * 2019-07-15 2022-06-07 Microsoft Technology Licensing, Llc Dynamic multi-layer execution for artificial intelligence modeling
US11436019B2 (en) * 2019-07-15 2022-09-06 Microsoft Technology Licensing, Llc Data parallelism in distributed training of artificial intelligence models
US20220215956A1 (en) * 2021-01-05 2022-07-07 Shenzhen Keya Medical Technology Corporation System and method for image analysis using sequential machine learning models with uncertainty estimation


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Lin Yisong et al., "A GPU Power Consumption Optimization Technique Based on a Parallelism Analysis Model," Chinese Journal of Computers, 2011, no. 4, full text *
Guo Bing et al., "Research on Processor Energy-Saving Technology Based on BP Neural Networks," Advanced Engineering Sciences, no. 1, full text *


Similar Documents

Publication Publication Date Title
Romero et al. Goal programming, compromise programming and reference point method formulations: linkages and utility interpretations
US9964980B2 (en) Method and apparatus for optimal power flow with voltage stability for large-scale electric power systems
Conlin et al. Keras2c: A library for converting Keras neural networks to real-time compatible C
US20140122546A1 (en) Tuning for distributed data storage and processing systems
CN112433819A (en) Heterogeneous cluster scheduling simulation method and device, computer equipment and storage medium
JP2008521138A (en) Method for analysis of control systems
CN107480432B (en) Load decomposition method based on cloud platform
US20170026305A1 (en) System to place virtual machines onto servers based upon backup runtime constraints
Jha et al. Multiobjective deployment of data analysis operations in heterogeneous IoT infrastructure
JP6086875B2 (en) Power generation amount prediction device and power generation amount prediction method
JP2023086678A (en) Method and apparatus for generating and applying deep learning model based on deep learning framework
EP3117286B1 (en) Power monitoring system for virtual platform simulation
Švogor et al. An extended model for multi-criteria software component allocation on a heterogeneous embedded platform
US20210142286A1 (en) Automated Parameterized Modeling And Scoring Intelligence System
Rizvandi et al. On modeling dependency between mapreduce configuration parameters and total execution time
CN116627433B (en) Real-time parameter prediction method, system, equipment and medium for AI processor
CN105243504A (en) Distributed energy management system based on the Internet
CN116991558A (en) Computing power resource scheduling method, multi-architecture cluster, device and storage medium
Wiesner et al. Software‐in‐the‐loop simulation for developing and testing carbon‐aware applications
KR101830582B1 (en) Simulation Program of BESS Providing Frequency Response for Power System Analysis
CN114971736A (en) Power metering material demand prediction method and device, electronic equipment and storage medium
Yamato Study and evaluation of FPGA reconfiguration during service operation for environment-adaptive software
Rodriguez-Gonzalo et al. Improving the energy efficiency of MPI applications by means of malleability
CN114328047A (en) System test method, device, electronic equipment and storage medium
CN114138597A (en) Operating system performance tuning device, method, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant