CN117827710B - DMA bandwidth determining method, device, equipment and medium based on AI chip - Google Patents

Publication number
CN117827710B
Authority
CN
China
Prior art keywords
dma
data
bandwidth
chip
historical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410251615.5A
Other languages
Chinese (zh)
Other versions
CN117827710A (en
Inventor
朱佳乐
王军伟
孙诚程
贾明桥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Suiyuan Intelligent Technology Co ltd
Original Assignee
Shanghai Suiyuan Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Suiyuan Intelligent Technology Co ltd filed Critical Shanghai Suiyuan Intelligent Technology Co ltd
Priority to CN202410251615.5A priority Critical patent/CN117827710B/en
Publication of CN117827710A publication Critical patent/CN117827710A/en
Application granted granted Critical
Publication of CN117827710B publication Critical patent/CN117827710B/en
Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a DMA bandwidth determining method, apparatus, device, and medium based on an AI chip. The method includes: acquiring current DMA data corresponding to a target AI chip; processing the current DMA data through a pre-constructed data preprocessing method to obtain standard current DMA data; inputting the standard current DMA data into a pre-trained chip bandwidth determination mathematical model to determine the DMA bandwidth of the target AI chip under different data handling conditions; and feeding back the DMA bandwidth to determine the optimal performance of the DMA bandwidth in data handling for the target AI chip. This addresses the inaccurate derivation and evaluation of DMA bandwidth across different AI chips, enables limit-performance evaluation of AI chips, and improves the accuracy of DMA bandwidth derivation for AI chips of different types.

Description

DMA bandwidth determining method, device, equipment and medium based on AI chip
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a medium for determining DMA bandwidth based on an AI chip.
Background
In a deep learning AI (artificial intelligence) chip, on-chip DMA (direct memory access) operations are the atomic-level operations of every computational data stream, and the bandwidth of a DMA operation is one of the key factors affecting the actual performance of operators on hardware.
In implementing the present invention, the inventors found the following drawbacks in the prior art: DMA bandwidth rules differ from one hardware platform to another. Developing computing nodes therefore requires different optimizations for different hardware, so evaluating the limit performance of DMA bandwidth involves a large amount of calculation and a complex inference process, consumes considerable labor and time, and yields low efficiency and accuracy.
Disclosure of Invention
The invention provides a DMA bandwidth determining method, device, equipment and medium based on AI chips, so as to improve the accuracy and efficiency of the DMA bandwidth deduction of the AI chips of different types.
According to an aspect of the present invention, there is provided an AI chip-based DMA bandwidth determination method, including:
acquiring current DMA data corresponding to a target AI chip, wherein the current DMA data is data handled directly by DMA at minimum granularity in the target AI chip;
performing data processing on the current DMA data through a pre-constructed data preprocessing method to obtain standard current DMA data;
inputting the standard current DMA data into a pre-trained chip bandwidth determination mathematical model, and determining the DMA bandwidth corresponding to the target AI chip under different data handling conditions;
and feeding back the DMA bandwidth to determine the optimal performance of the DMA bandwidth in data handling corresponding to the target AI chip.
According to another aspect of the present invention, there is provided an AI chip-based DMA bandwidth determining apparatus, including:
a current DMA data acquisition module, used for acquiring current DMA data corresponding to the target AI chip, wherein the current DMA data is data handled directly by DMA at minimum granularity in the target AI chip;
a standard current DMA data determining module, used for performing data processing on the current DMA data through a pre-constructed data preprocessing method to obtain standard current DMA data;
a DMA bandwidth determining module, used for inputting the standard current DMA data into a pre-trained chip bandwidth determination mathematical model to determine the DMA bandwidth of the target AI chip under different data handling conditions;
and a DMA bandwidth feedback module, used for feeding back the DMA bandwidth so as to determine the optimal performance of the DMA bandwidth in data handling corresponding to the target AI chip.
According to another aspect of the present invention, there is provided an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the AI-chip-based DMA bandwidth determination method according to any of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement the AI chip-based DMA bandwidth determination method of any one of the embodiments of the present invention when executed.
According to the technical scheme, the current DMA data corresponding to the target AI chip is obtained; performing data processing on the current DMA data through a pre-constructed data preprocessing method to obtain standard current DMA data; inputting standard current DMA data into a pre-trained chip bandwidth determination mathematical model, and determining the corresponding DMA bandwidth of a target AI chip under different data carrying conditions; and feeding back the DMA bandwidth to determine the optimal performance of the DMA bandwidth in data handling corresponding to the target AI chip. The problem of inaccurate DMA bandwidth deduction and evaluation of different AI chips is solved, limit evaluation of the AI chips is realized, accuracy and efficiency of DMA bandwidth deduction of the AI chips of different types are improved, and labor cost and time cost are saved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a DMA bandwidth determining method based on an AI chip according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a DMA bandwidth determining method based on an AI chip according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a DMA bandwidth determining apparatus based on an AI chip according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "target," "current," and the like in the description and claims of the present invention and the above-described drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of an AI-chip-based DMA bandwidth determination method according to the first embodiment of the present invention. This embodiment is applicable to determining the DMA bandwidths of AI chips of different types. The method may be performed by an AI-chip-based DMA bandwidth determination apparatus, which may be implemented in hardware and/or software.
Accordingly, as shown in fig. 1, the method includes:
S110, acquiring current DMA data corresponding to the target AI chip.
The current DMA data is data handled directly by DMA at minimum granularity in the target AI chip.
The current DMA data may be data collected from AI chips, and the DMA data collected from AI chips of different types may differ.
In this embodiment, experimental cases can be constructed from the minimum-granularity DMA operations of the AI chip, and AI chip data transmission can be divided into transmissions between different memory levels, for example: L3 to L3, L3 to L2, L2 to L3, L2 to L2, L2 to L1, and L1 to L2. For each kind of transmission between levels, a minimum-granularity, single-variable data transmission that can be run or simulated on the AI chip is implemented. The current DMA data can thus be acquired from the target AI chip in the manner described above.
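The case construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `DIRECTIONS`, `DmaSample`, and `build_cases`, the use of bytes and seconds as units, and the example sizes are all assumptions introduced here.

```python
from dataclasses import dataclass

# Memory-level transfer directions listed in the text.
DIRECTIONS = [("L3", "L3"), ("L3", "L2"), ("L2", "L3"),
              ("L2", "L2"), ("L2", "L1"), ("L1", "L2")]


@dataclass(frozen=True)
class DmaSample:
    """One minimum-granularity transfer (hypothetical record layout)."""
    src: str          # source memory level
    dst: str          # destination memory level
    num_bytes: int    # size of the single transfer
    seconds: float    # measured or simulated run time

    @property
    def bandwidth(self) -> float:
        """Achieved bandwidth in bytes per second."""
        return self.num_bytes / self.seconds


def build_cases(sizes):
    """One single-variable experiment per (direction, size) pair."""
    return [(src, dst, n) for (src, dst) in DIRECTIONS for n in sizes]


cases = build_cases([256, 1024])
sample = DmaSample("L3", "L2", 1024, 2e-6)
```

Each case varies only one quantity (here the transfer size) within a fixed direction, which matches the single-variable requirement stated in the text.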
S120, carrying out data processing on the current DMA data through a pre-constructed data preprocessing method to obtain standard current DMA data.
In this embodiment, the current DMA data is processed by the data preprocessing method, which may include data compression measurement and data shuffling reorganization of the current DMA data.
Performing these two steps on the current DMA data yields the standard current DMA data, which the chip bandwidth determination mathematical model can then use to determine the DMA bandwidth more reliably.
S130, inputting the standard current DMA data into a pre-trained chip bandwidth determination mathematical model, and determining the corresponding DMA bandwidth of the target AI chip under different data carrying conditions.
The chip bandwidth determination mathematical model may be a model for performing bandwidth identification determination on AI chips of different types.
In this embodiment, the standard current DMA data may be identified by the chip bandwidth determining mathematical model, so as to determine the DMA bandwidth corresponding to the target AI chip under different data handling conditions.
Specifically, since the current DMA data is data that is directly handled using DMA with minimum granularity in the target AI chip, the current DMA data acquired in different data handling cases is different. Therefore, the DMA bandwidths respectively corresponding to the AI chips of different types under different data carrying conditions can be identified more flexibly, and the accuracy of determining the DMA bandwidths can be improved.
Optionally, inputting the standard current DMA data into the pre-trained chip bandwidth determination mathematical model and determining the DMA bandwidth corresponding to the target AI chip under different data handling conditions includes: performing data step frequency feature extraction on the standard current DMA data to determine the current data step frequency feature; and acquiring the current chip type corresponding to the target AI chip, inputting the current chip type and the current data step frequency feature into the pre-trained chip bandwidth determination mathematical model, and determining the DMA bandwidth corresponding to the target AI chip.
The current data step frequency feature may be obtained by extracting step frequencies from the different layers of the standard current DMA data.
Specifically, step frequencies are extracted from the standard current DMA data by means of MLE (maximum likelihood estimation) and MAP (maximum a posteriori estimation). Because DMA data exhibits multi-layer segmentation, nodes with a consistent step frequency are searched for within each layer; the next child node at the same frequency is then generated automatically and used to judge whether the step frequency is valid and to correct it, until a subsequence with a consistent step frequency is formed. Once a subsequence has been found, it is classified and mapped to the bandwidth of its layer to obtain a multi-layer perception equation, from which the DMA bandwidth is determined.
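For reference, the two estimators named above have the following textbook definitions. The symbols are assumptions introduced here for illustration: \theta stands for the candidate step frequency parameters and D for the observed standard current DMA data. The source does not state which likelihood or prior it uses.

```latex
\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \; p(D \mid \theta),
\qquad
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} \; p(D \mid \theta)\, p(\theta)
```

When the prior p(\theta) is uniform, the MAP estimate coincides with the MLE estimate.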
It will be appreciated that the current data step frequency characteristic may be a sub-sequence.
The current chip type describes the specific type of the AI chip.
In this embodiment, the chip bandwidth determination mathematical model can determine the DMA bandwidth corresponding to the target AI chip from the current chip type and the current data step frequency feature.
And S140, feeding back the DMA bandwidth to determine the optimal performance of the DMA bandwidth in data handling corresponding to the target AI chip.
In this embodiment, after determining the DMA bandwidth through the chip bandwidth determining mathematical model, the DMA bandwidth needs to be fed back, so that the user can obtain the DMA bandwidths corresponding to different AI chips in time. Therefore, the optimal performance of the DMA bandwidth in the data carrying process corresponding to the target AI chip can be determined.
According to the technical scheme, the current DMA data corresponding to the target AI chip is obtained; performing data processing on the current DMA data through a pre-constructed data preprocessing method to obtain standard current DMA data; inputting standard current DMA data into a pre-trained chip bandwidth determination mathematical model, and determining the corresponding DMA bandwidth of a target AI chip under different data carrying conditions; and feeding back the DMA bandwidth to determine the optimal performance of the DMA bandwidth in data handling corresponding to the target AI chip. The problem of inaccurate DMA bandwidth deduction and evaluation of different AI chips is solved, limit evaluation of the AI chips is realized, accuracy and efficiency of DMA bandwidth deduction of the AI chips of different types are improved, and labor cost and time cost are saved.
Example 2
Fig. 2 is a flowchart of a DMA bandwidth determining method based on an AI chip according to a second embodiment of the present invention. This embodiment builds on the embodiments above and further includes training the chip bandwidth determination mathematical model before the current DMA data corresponding to the target AI chip is acquired.
Accordingly, as shown in fig. 2, the method includes:
S210, acquiring a plurality of historical DMA data corresponding to AI chips of different types respectively, and constructing a historical DMA data set by each historical DMA data.
The historical DMA data may be DMA data of different AI chips under different handling conditions obtained at a historical time.
Further, the collected plurality of historical DMA data may be stored to obtain a historical DMA data set. It is understood that the historical DMA data set may be a collection that stores different historical DMA data.
Optionally, the constructing a historical DMA data set from each historical DMA data includes: acquiring historical data shape information, DMA running time of the historical data, historical standard DMA bandwidth and chip type corresponding to each historical DMA data respectively; and carrying out joint storage on each historical DMA data, the historical data shape information, the DMA running time of the historical data, the historical standard DMA bandwidth and the chip type, and constructing a historical DMA data set.
The historical data shape information may be information describing the tensor shape involved in the DMA data handling. The DMA running time of the historical data may be the running time required to transfer the DMA data; in particular, running times need to be distinguished for data combinations in different directions.
The historical standard DMA bandwidth may be the DMA bandwidth determined from the minimum-granularity data size, and it may also be used to calculate DMA bandwidth utilization from the DMA data. The chip type may be the type of the corresponding AI chip.
The advantages of this arrangement are that: the historical DMA data sets are constructed through the historical DMA data, so that the obtained historical DMA data sets can be better used for training the chip bandwidth determination mathematical model, and a rich data set is provided for training the chip bandwidth determination mathematical model.
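One way to store these fields jointly is sketched below. The record layout, the field names, the `direction` field, and the example values are hypothetical; the source only lists which kinds of information are stored together.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class HistoricalRecord:
    """Jointly stored fields for one historical DMA transfer (hypothetical layout)."""
    chip_type: str                  # type of the AI chip
    direction: Tuple[str, str]      # (source level, destination level), an assumed field
    shape: Tuple[int, ...]          # tensor shape of the transferred data
    run_time_s: float               # DMA running time in seconds
    standard_bandwidth: float       # historical standard DMA bandwidth, bytes per second


def build_dataset(rows) -> List[HistoricalRecord]:
    """Turn raw tuples into records so every field travels with its sample."""
    return [HistoricalRecord(*row) for row in rows]


# Illustrative values only, not measurements.
dataset = build_dataset([
    ("chip_a", ("L3", "L2"), (64, 128), 2.0e-6, 4.0e9),
    ("chip_b", ("L2", "L1"), (32, 32), 1.0e-6, 1.0e9),
])
chip_a_records = [r for r in dataset if r.chip_type == "chip_a"]
```

Keeping the chip type on every record lets later stages select or weight samples per chip type without a separate lookup table.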
S220, performing data processing on the historical DMA data set through a pre-constructed data preprocessing method to obtain a standard training historical DMA data set.
Wherein the standard training history DMA data set comprises a plurality of standard training history DMA data.
The standard training history DMA data set may be a data set obtained through data preprocessing. The standard training historical DMA data can be data obtained by respectively carrying out data preprocessing on each historical DMA data.
Optionally, the data processing is performed on the historical DMA data set by a pre-constructed data preprocessing method to obtain a standard training historical DMA data set, which includes: performing data compression measurement on the historical DMA data set to obtain a historical correct DMA data set; performing data shuffling and reorganizing treatment on the historical correct DMA data set to obtain a historical reorganized DMA data set; dividing the historical reorganization DMA data set according to a preset training test proportion to obtain a standard test historical DMA data set and a standard training historical DMA data set.
The historical correct DMA data set may be a data set obtained by performing multiple rounds of compression measurement on the historical DMA data set to eliminate incorrect data, for example values that exceed physical limits because of data collection problems, or other anomalies in the data.
The historical recombined DMA data set can be a data set obtained by carrying out data shuffling and recombination on the historical correct DMA data set, and specifically, the recombination can be a multi-round randomized out-of-order recombination process.
The standard test history DMA data set may be a data set for performing a model accuracy test on a trained chip bandwidth determination mathematical model. The standard training history DMA data set may be a data set for training a chip bandwidth determination mathematical model.
In this embodiment, the historical reorganized DMA data set is obtained by performing data compression measurement and data shuffling reorganization on each historical DMA data in the historical DMA data set. The historical reorganized DMA data set can then be divided according to a preset proportion to obtain a standard test history DMA data set and a standard training history DMA data set.
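The three preprocessing steps can be sketched as below. This is an illustration under stated assumptions: the compression measurement is approximated here by a simple filter against a hypothetical `peak_bandwidth` physical limit, and the function name, dictionary key, default ratio, and number of shuffle rounds are not taken from the source.

```python
import random


def preprocess(records, peak_bandwidth, train_ratio=0.8, rounds=3, seed=0):
    """Filter, shuffle, and split historical DMA records.

    records: list of dicts with a "bandwidth" key (assumed layout).
    peak_bandwidth: assumed physical upper limit used to reject bad samples.
    """
    # Step 1: drop records that violate physical limits.
    correct = [r for r in records if 0.0 < r["bandwidth"] <= peak_bandwidth]

    # Step 2: several rounds of randomized out-of-order reorganization.
    rng = random.Random(seed)
    for _ in range(rounds):
        rng.shuffle(correct)

    # Step 3: split by the preset training/test proportion.
    cut = int(len(correct) * train_ratio)
    return correct[:cut], correct[cut:]


raw = [{"bandwidth": float(b)} for b in [10, 20, 30, 40, 50, 60, 70, 80, -5, 500]]
train_set, test_set = preprocess(raw, peak_bandwidth=100.0)
```

A fixed seed keeps the split reproducible, which makes model error values comparable across training runs.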
S230, inputting each standard training history DMA data into an initial chip bandwidth determination mathematical model for model training, and determining that training of the chip bandwidth determination mathematical model is complete when the initial model outputs a DMA bandwidth and the error on the standard training history DMA data set is smaller than a preset error value threshold.
In this embodiment, each standard training history DMA data in the standard training history DMA data set needs to be input to the initial chip bandwidth determination mathematical model to perform model training, so that a corresponding DMA bandwidth may be output.
Further, when the DMA bandwidth is output (that is, the historical model determined bandwidth value), it is compared with the historical standard DMA bandwidth stored in advance in the standard training history DMA data set to determine the error, and the error is then compared with the error value threshold to determine whether training of the chip bandwidth determination mathematical model is complete.
Optionally, inputting each standard training history DMA data into the initial chip bandwidth determination mathematical model for model training, and determining that training of the chip bandwidth determination mathematical model is complete when the model error value on the standard training history DMA data set is smaller than the preset error value threshold, includes: performing data step frequency feature extraction on each standard training history DMA data to obtain the historical data step frequency feature corresponding to each standard training history DMA data; inputting each historical data step frequency feature into the initial chip bandwidth determination mathematical model for model training to obtain a historical model determined bandwidth value; calculating a model error value from the historical model determined bandwidth value; and acquiring the error value threshold, judging whether the model error value of the current chip bandwidth determination mathematical model is smaller than the error value threshold, and if so, determining that training of the chip bandwidth determination mathematical model is complete. The function expression of the initial chip bandwidth determination mathematical model is as follows: [formula not reproduced in the source text], wherein x represents the model input parameter, formed by the historical data step frequency feature and the historical standard DMA bandwidth; the model output represents the historical model determined bandwidth value; i denotes the AI chip type; the coefficients of the expression (symbols not reproduced in the source text) all represent adaptively updated parameters, and AI chips of different types correspond to different adaptively updated parameters; F represents an outlier elimination term that increases generalization; and the per-type output represents the historical model determined bandwidth value corresponding to the AI chip of type i.
In this embodiment, data step frequency feature extraction processing needs to be performed on each standard training history DMA data to obtain the step frequency features of the history data. Further, training of the initial chip bandwidth determination mathematical model is performed by combining the historical data step frequency characteristics and the historical chip types, and the historical model determination bandwidth value can be output.
In particular, different model inputs may yield different historical model determined bandwidth values. The historical model determined bandwidth value is obtained according to the formula [formula not reproduced in the source text].
Assuming there are multiple inputs (say n inputs), they may be split across multiple perception layers to obtain the corresponding results: [formula not reproduced in the source text]
Further, a model error value may be calculated from the historical model determined bandwidth value and the historical standard DMA bandwidth. An error value threshold is then obtained, and it is judged whether the model error value of the current chip bandwidth determination mathematical model is smaller than the error value threshold; if so, training of the chip bandwidth determination mathematical model is determined to be complete. If not, new standard training history DMA data must be acquired and the model retrained until the error value threshold is met.
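The threshold-based stopping rule can be illustrated with a deliberately simple stand-in model. The sketch below fits one linear model per chip type by gradient descent on the mean squared error; the linear form, the squared error, the learning rate, and all names are assumptions chosen for illustration, since the patent's own function expression is not reproduced in the source text.

```python
def train_linear(samples, error_threshold, lr=0.05, max_epochs=5000):
    """Fit bandwidth ~ a * feature + b; stop once the error drops below the threshold.

    samples: list of (feature, standard_bandwidth) pairs.
    Returns (a, b, error, converged).
    """
    a, b = 0.0, 0.0
    n = len(samples)
    error = float("inf")
    for _ in range(max_epochs):
        residuals = [a * x + b - y for x, y in samples]
        error = sum(r * r for r in residuals) / n
        if error < error_threshold:
            return a, b, error, True          # training is considered complete
        grad_a = 2.0 * sum(r * x for r, (x, _) in zip(residuals, samples)) / n
        grad_b = 2.0 * sum(residuals) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b, error, False                 # threshold not met within the budget


def train_per_chip_type(dataset, error_threshold):
    """Different chip types get their own adaptively updated parameters."""
    return {chip: train_linear(samples, error_threshold)
            for chip, samples in dataset.items()}


dataset = {"chip_a": [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]}
models = train_per_chip_type(dataset, error_threshold=1e-6)
```

If `converged` comes back false, the text's remedy applies: acquire more standard training history DMA data and retrain.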
Optionally, calculating the model error value from the historical model determined bandwidth value includes: acquiring the historical standard DMA bandwidths respectively corresponding to the standard training history DMA data; and calculating the model error value from the historical standard DMA bandwidth and the historical model determined bandwidth value through a preset Loss function calculation formula.
In this embodiment, specifically, the Loss function calculation formula may be: [formula not reproduced in the source text], where Loss represents the model error value, and the other two symbols represent the historical model determined bandwidth value and the historical standard DMA bandwidth, respectively.
In this embodiment, the calculation of the model error value may be performed by the above formula, or may be performed in a different manner.
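As one example of such a different manner, the model error value could be computed as a mean relative error between the model determined bandwidth values and the historical standard DMA bandwidths. This choice and the function name are assumptions for illustration and are not claimed to match the patent's own Loss formula.

```python
def mean_relative_error(predicted, standard):
    """Mean of |predicted - standard| / standard over all samples."""
    if len(predicted) != len(standard) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - s) / s for p, s in zip(predicted, standard)) / len(predicted)


# Two samples, each off by 10 percent of the standard bandwidth.
loss = mean_relative_error([90.0, 110.0], [100.0, 100.0])
```

A relative measure keeps transfers of very different bandwidths on the same scale, so a single error value threshold can be applied across memory levels.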
In addition, after the Loss function calculation formula has been constructed, other measures ([symbols not reproduced in the source text], KL, or others) may also be used so that the precision parameter can be searched and adjusted.
The advantages of this arrangement are that: by constructing a Loss function calculation formula and adjusting precision parameters, the calculation of an accurate model error value is performed, so that the determination of the DMA bandwidth can be better performed, and the accuracy of the determination of the DMA bandwidth is improved.
In a preferred implementation of this embodiment, the historical data step frequency feature may be regarded as a subsequence. The subsequence is assumed to be valid, that is, all points in it belong to the same pattern.
Further, the subsequence and the bandwidths corresponding to its points are fed into the model, which tries to fit an optimal equation; after fitting, the average model error value of the equation over all sample points is evaluated. If the model error value is greater than the error value threshold, the points of the subsequence (an equidistant subsequence) are considered not to belong to the same pattern, and the search is performed again. If the model error value is less than the error value threshold, a pattern is considered to have been found; the subsequence is deleted from the sample set, and the search for patterns continues in the remaining samples. A stop condition can also be set on the search, so that it does not run excessively long when the remaining data set objectively contains no pattern.
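The search loop described above can be sketched as follows. The sketch assumes integer-valued features so that equidistant points can be looked up exactly, uses a least-squares line as the fitted equation, and caps the number of attempts as the stop condition. These choices, and all function names, are illustrative assumptions rather than the patented procedure.

```python
def fit_line(points):
    """Least-squares line through (x, y) points; returns slope, intercept, mean abs error."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx if sxx else 0.0
    intercept = my - slope * mx
    err = sum(abs(slope * x + intercept - y) for x, y in points) / n
    return slope, intercept, err


def equidistant_subsequence(points, start, step):
    """Collect points at start, start + step, ... while they exist."""
    by_x = dict(points)
    seq, x = [], start
    while x in by_x:
        seq.append((x, by_x[x]))
        x += step
    return seq


def search_patterns(points, threshold, min_len=3, max_attempts=1000):
    """Repeatedly find an equidistant subsequence that one equation fits well."""
    remaining, patterns, attempts = sorted(points), [], 0
    found = True
    while found and len(remaining) >= min_len:
        found = False
        xs = [x for x, _ in remaining]
        for i in range(len(xs)):
            for j in range(i + 1, len(xs)):
                attempts += 1
                if attempts > max_attempts:      # stop condition
                    return patterns, remaining
                seq = equidistant_subsequence(remaining, xs[i], xs[j] - xs[i])
                if len(seq) < min_len:
                    continue
                slope, intercept, err = fit_line(seq)
                if err < threshold:              # same pattern: accept and remove
                    patterns.append((seq, slope, intercept))
                    taken = set(seq)
                    remaining = [p for p in remaining if p not in taken]
                    found = True
                    break
            if found:
                break
    return patterns, remaining


data = [(0, 1.0), (4, 9.0), (8, 17.0), (12, 25.0), (1, 10.0), (3, 10.0), (5, 10.0)]
patterns, leftover = search_patterns(data, threshold=0.5)
```

In this example the points with step 4 follow one equation and the points with step 2 follow another, so the search removes them as two separate patterns and leaves no samples behind.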
S240, acquiring current DMA data corresponding to the target AI chip.
The current DMA data is data which is carried by using DMA directly and has minimum granularity in the target AI chip.
S250, carrying out data processing on the current DMA data through a pre-constructed data preprocessing method to obtain standard current DMA data.
S260, inputting the standard current DMA data into a pre-trained chip bandwidth determination mathematical model, and determining the corresponding DMA bandwidth of the target AI chip under different data carrying conditions.
And S270, feeding back the DMA bandwidth to determine the optimal performance of the DMA bandwidth in data handling corresponding to the target AI chip.
According to the technical scheme, a plurality of historical DMA data corresponding to AI chips of different types are obtained, and a historical DMA data set is constructed by the historical DMA data; performing data processing on the historical DMA data set through a pre-constructed data preprocessing method to obtain a standard training historical DMA data set; inputting each standard training history DMA data to an initial chip bandwidth determination mathematical model to perform model training, and determining that training is completed when the initial chip bandwidth determination mathematical model outputs a DMA bandwidth and errors on the standard training history DMA data set are smaller than a preset error value threshold value; acquiring current DMA data corresponding to a target AI chip; performing data processing on the current DMA data through a pre-constructed data preprocessing method to obtain standard current DMA data; inputting standard current DMA data into a pre-trained chip bandwidth determination mathematical model, and determining the corresponding DMA bandwidth of a target AI chip under different data carrying conditions; and feeding back the DMA bandwidth to determine the optimal performance of the DMA bandwidth in data handling corresponding to the target AI chip. The problem of inaccurate DMA bandwidth deduction and evaluation of different AI chips is solved, a more accurate chip bandwidth determination mathematical model can be trained, limit evaluation of the AI chips can be achieved, accuracy and efficiency of DMA bandwidth deduction of the AI chips of different types are improved, and labor cost and time cost are saved.
Example 3
Fig. 3 is a schematic structural diagram of a DMA bandwidth determining device based on an AI chip according to a third embodiment of the present invention. The device for determining the DMA bandwidth based on the AI chip provided by the embodiment of the invention can be realized through software and/or hardware, and can be configured in terminal equipment or a server to realize the method for determining the DMA bandwidth based on the AI chip. As shown in fig. 3, the apparatus includes: a current DMA data acquisition module 310, a standard current DMA data determination module 320, a DMA bandwidth determination module 330, and a DMA bandwidth feedback module 340.
The current DMA data obtaining module 310 is configured to obtain current DMA data corresponding to the target AI chip; the current DMA data is data which is carried by using DMA and has minimum granularity in the target AI chip;
The standard current DMA data determining module 320 is configured to perform data processing on the current DMA data by using a pre-constructed data preprocessing method, so as to obtain standard current DMA data;
The DMA bandwidth determination module 330 is configured to input the standard current DMA data into a pre-trained chip bandwidth determination mathematical model, and determine DMA bandwidths of the target AI chip under different data handling conditions;
The DMA bandwidth feedback module 340 is configured to feed back the DMA bandwidth to determine the optimal performance of the DMA bandwidth in data handling corresponding to the target AI chip.
According to the technical scheme of this embodiment, current DMA data corresponding to the target AI chip is acquired; the current DMA data is processed by a pre-constructed data preprocessing method to obtain standard current DMA data; the standard current DMA data is input into a pre-trained chip bandwidth determination mathematical model to determine the DMA bandwidth of the target AI chip under different data carrying conditions; and the DMA bandwidth is fed back to determine the optimal performance of the DMA bandwidth in data handling corresponding to the target AI chip. This solves the problem of inaccurate DMA bandwidth derivation and evaluation for different AI chips, achieves limit evaluation of AI chips, improves the accuracy and efficiency of DMA bandwidth derivation for AI chips of different types, and saves labor cost and time cost.
Based on the above embodiments, the DMA bandwidth determining module 330 may be specifically configured to: carrying out data step frequency characteristic extraction processing on the standard current DMA data, and determining the step frequency characteristic of the current data; and acquiring the current chip type corresponding to the target AI chip, inputting the current chip type and the current data step frequency characteristic into a pre-trained chip bandwidth determination mathematical model, and determining the DMA bandwidth corresponding to the target AI chip.
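The flow through modules 310 to 340 can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the `DmaSample` fields, the choice of element count and innermost stride as the "step frequency" features, and `toy_model` are all hypothetical stand-ins, since the text does not define the feature set or the model form.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical model signature: (chip_type, features) -> DMA bandwidth.
BandwidthModel = Callable[[str, List[float]], float]


@dataclass(frozen=True)
class DmaSample:
    """One minimum-granularity DMA transfer (field names are assumptions)."""
    chip_type: str
    shape: Tuple[int, ...]    # logical shape of the data being carried
    strides: Tuple[int, ...]  # per-dimension stride, in elements


def preprocess(sample: DmaSample) -> DmaSample:
    """Stand-in for the pre-constructed preprocessing: basic validation only."""
    if not sample.shape or len(sample.shape) != len(sample.strides):
        raise ValueError("shape and strides must be non-empty and of equal rank")
    return sample


def extract_step_features(sample: DmaSample) -> List[float]:
    """Assumed features: total element count and innermost stride."""
    count = 1
    for dim in sample.shape:
        count *= dim
    return [float(count), float(sample.strides[-1])]


def determine_bandwidths(samples: Dict[str, DmaSample],
                         model: BandwidthModel) -> Dict[str, float]:
    """Map each named data carrying condition to a model-predicted bandwidth."""
    results: Dict[str, float] = {}
    for condition, sample in samples.items():
        standard = preprocess(sample)
        features = extract_step_features(standard)
        results[condition] = model(standard.chip_type, features)
    return results


def feed_back(results: Dict[str, float]) -> Tuple[str, float]:
    """Report the condition with the highest predicted bandwidth."""
    best = max(results, key=lambda name: results[name])
    return best, results[best]


def toy_model(chip_type: str, features: List[float]) -> float:
    """Hypothetical placeholder for the pre-trained model; peaks are made up."""
    peak = {"chip_a": 100.0, "chip_b": 200.0}[chip_type]
    _count, inner_stride = features
    return peak / inner_stride
```

In this sketch the chip type is passed to the model alongside the features, mirroring the description above; a real system would substitute the trained model for `toy_model`.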
Based on the above embodiments, the chip bandwidth determining mathematical model training module may specifically include:
the historical DMA data set determining unit is used for acquiring a plurality of historical DMA data respectively corresponding to the AI chips of different types before the current DMA data corresponding to the AI chips of the target is acquired, and constructing a historical DMA data set by each historical DMA data;
The standard training historical DMA data set determining unit is used for carrying out data processing on the historical DMA data set through a pre-constructed data preprocessing method to obtain a standard training historical DMA data set; wherein the standard training history DMA data set comprises a plurality of standard training history DMA data;
The chip bandwidth determining mathematical model determining unit is used for inputting each standard training history DMA data into the initial chip bandwidth determination mathematical model to perform model training, and determining that training is completed when the error of the DMA bandwidth output by the initial chip bandwidth determination mathematical model on the standard training history DMA data set is smaller than a preset error value threshold.
On the basis of the above embodiments, the historical DMA data set determination unit may be specifically configured to: acquiring historical data shape information, DMA running time of the historical data, historical standard DMA bandwidth and chip type corresponding to each historical DMA data respectively; and carrying out joint storage on each historical DMA data, the historical data shape information, the DMA running time of the historical data, the historical standard DMA bandwidth and the chip type, and constructing a historical DMA data set.
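One way to picture the joint storage described above is a single record per historical transfer, serialized one record per line. The sketch below is an assumption-laden illustration: the class name, field names, units (microseconds, GB/s), and the JSON Lines format are all hypothetical, and the raw DMA payload is omitted for brevity.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class HistoricalDmaRecord:
    """Metadata stored together for one historical DMA transfer (hypothetical)."""
    chip_type: str                  # type of AI chip the data came from
    shape: Tuple[int, ...]          # historical data shape information
    dma_runtime_us: float           # DMA running time (unit is an assumption)
    standard_bandwidth_gbps: float  # historical standard DMA bandwidth


def dump_dataset(records: Iterable[HistoricalDmaRecord]) -> str:
    """Serialize records jointly, one JSON object per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def load_dataset(text: str) -> List[HistoricalDmaRecord]:
    """Rebuild the historical DMA data set from its serialized form."""
    records: List[HistoricalDmaRecord] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        row["shape"] = tuple(row["shape"])  # JSON has no tuple type
        records.append(HistoricalDmaRecord(**row))
    return records
```

Keeping the chip type in every record lets one data set cover AI chips of different types, as the embodiment requires.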
On the basis of the above embodiments, the standard training history DMA data set determination unit may be specifically configured to: performing data compression measurement on the historical DMA data set to obtain a historical correct DMA data set; performing data shuffling and reorganizing treatment on the historical correct DMA data set to obtain a historical reorganized DMA data set; dividing the historical reorganization DMA data set according to a preset training test proportion to obtain a standard test historical DMA data set and a standard training historical DMA data set.
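The shuffle and split steps can be sketched as below. This is a minimal sketch, assuming an 80/20 training/test proportion and a fixed random seed, neither of which is specified in the text; the validity check that yields the "historical correct" data set is represented by a caller-supplied predicate because the text does not detail it.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def keep_valid(records: Sequence[T], is_valid: Callable[[T], bool]) -> List[T]:
    """Keep only records that pass the (caller-defined) correctness check."""
    return [r for r in records if is_valid(r)]


def shuffle_and_split(records: Sequence[T],
                      train_ratio: float = 0.8,
                      seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records, then split into (training, test) sets."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Shuffling before the split avoids a training set dominated by whichever chip type or data shape happened to be collected first.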
On the basis of the above embodiments, the chip bandwidth determination mathematical model determination unit may be specifically configured to: respectively carrying out data step frequency feature extraction processing on each standard training historical DMA data to obtain historical data step frequency features respectively corresponding to each standard training historical DMA data; inputting each historical data step frequency characteristic into an initial chip bandwidth determination mathematical model to perform model training, and obtaining a historical model determination bandwidth value; calculating a model error value according to the historical model determination bandwidth value; and acquiring an error value threshold, judging whether the model error value of the current chip bandwidth determination mathematical model is smaller than the error value threshold, and if so, determining that training is completed, thereby obtaining the chip bandwidth determination mathematical model.
On the basis of the above embodiments, the chip bandwidth determination mathematical model determination unit may be further specifically configured to: acquiring historical standard DMA bandwidths respectively corresponding to the standard training historical DMA data; and calculating the model error value from the historical standard DMA bandwidth and the historical model determination bandwidth value through a preset loss function calculation formula.
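The threshold-terminated training described above can be illustrated with a small sketch. The text specifies neither the model family nor the loss function, so the linear model, the mean squared error loss, the learning rate, and the epoch limit below are all assumptions chosen to keep the example short.

```python
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float,
            features: Sequence[float]) -> float:
    """Model-determined bandwidth for one feature vector (linear, by assumption)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def mse_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Assumed loss: mean squared error against the standard bandwidths."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def train(features: Sequence[Sequence[float]],
          targets: Sequence[float],
          error_threshold: float,
          learning_rate: float = 0.1,
          max_epochs: int = 5000) -> Tuple[List[float], float, float]:
    """Gradient descent that stops once the loss drops below the threshold.

    Returns (weights, bias, final_loss); the caller should check that
    final_loss < error_threshold, since max_epochs may be hit first.
    """
    n = len(targets)
    weights = [0.0] * len(features[0])
    bias = 0.0
    loss = mse_loss([predict(weights, bias, f) for f in features], targets)
    for _ in range(max_epochs):
        if loss < error_threshold:
            break  # training is considered complete
        errors = [predict(weights, bias, f) - t
                  for f, t in zip(features, targets)]
        for j in range(len(weights)):
            grad = 2.0 * sum(e * f[j] for e, f in zip(errors, features)) / n
            weights[j] -= learning_rate * grad
        bias -= learning_rate * 2.0 * sum(errors) / n
        loss = mse_loss([predict(weights, bias, f) for f in features], targets)
    return weights, bias, loss
```

Here `targets` plays the role of the historical standard DMA bandwidths and the values returned by `predict` play the role of the historical model determination bandwidth values.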
The AI chip-based DMA bandwidth determining device provided by the embodiment of the invention can execute the AI chip-based DMA bandwidth determining method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the executing method.
Example IV
Fig. 4 shows a schematic diagram of an electronic device 10 that may be used to implement a fourth embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 4, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the AI chip-based DMA bandwidth determination method.
In some embodiments, the AI chip-based DMA bandwidth determination method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the AI chip-based DMA bandwidth determination method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the AI chip-based DMA bandwidth determination method by any other suitable means (e.g., by means of firmware).
The method comprises the following steps: acquiring current DMA data corresponding to a target AI chip; the current DMA data is data which is carried by using DMA and has minimum granularity in the target AI chip; performing data processing on the current DMA data through a pre-constructed data preprocessing method to obtain standard current DMA data; inputting the standard current DMA data into a pre-trained chip bandwidth determination mathematical model, and determining the corresponding DMA bandwidth of the target AI chip under different data carrying conditions; and feeding back the DMA bandwidth to determine the optimal performance of the DMA bandwidth in data handling corresponding to the target AI chip.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Example V
A fifth embodiment of the present invention also provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, perform an AI chip-based DMA bandwidth determination method, the method comprising: acquiring current DMA data corresponding to a target AI chip; the current DMA data is data which is carried by using DMA and has minimum granularity in the target AI chip; performing data processing on the current DMA data through a pre-constructed data preprocessing method to obtain standard current DMA data; inputting the standard current DMA data into a pre-trained chip bandwidth determination mathematical model, and determining the corresponding DMA bandwidth of the target AI chip under different data carrying conditions; and feeding back the DMA bandwidth to determine the optimal performance of the DMA bandwidth in data handling corresponding to the target AI chip.
Of course, the computer-executable instructions of the computer-readable storage medium provided by the embodiments of the present invention are not limited to the method operations described above, and can also perform the related operations in the AI chip-based DMA bandwidth determination method provided by any of the embodiments of the present invention.
From the above description of embodiments, it will be clear to a person skilled in the art that the present invention may be implemented by means of software and necessary general purpose hardware, and of course also by means of hardware alone, although in many cases the former is the preferred embodiment. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer readable storage medium, such as a floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk, or an optical disk of a computer, and which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments of the present invention.
It should be noted that, in the above embodiment of determining the DMA bandwidth based on the AI chip, each unit and module included are only divided according to the functional logic, but not limited to the above division, so long as the corresponding function can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present invention.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (9)

1. A DMA bandwidth determination method based on an AI chip, comprising:
Acquiring current DMA data corresponding to a target AI chip; the current DMA data is data which is carried by using DMA and has minimum granularity in the target AI chip;
performing data processing on the current DMA data through a pre-constructed data preprocessing method to obtain standard current DMA data;
Inputting the standard current DMA data into a pre-trained chip bandwidth determination mathematical model, and determining the corresponding DMA bandwidth of the target AI chip under different data carrying conditions;
feeding back the DMA bandwidth to determine the optimal performance of the DMA bandwidth in data handling corresponding to the target AI chip;
wherein inputting the standard current DMA data into the pre-trained chip bandwidth determination mathematical model, and determining the corresponding DMA bandwidth of the target AI chip under different data carrying conditions, comprises:
carrying out data step frequency characteristic extraction processing on the standard current DMA data, and determining the step frequency characteristic of the current data;
And acquiring the current chip type corresponding to the target AI chip, inputting the current chip type and the current data step frequency characteristic into a pre-trained chip bandwidth determination mathematical model, and determining the DMA bandwidth corresponding to the target AI chip.
2. The method of claim 1, further comprising, prior to the acquiring the current DMA data corresponding to the target AI chip:
Acquiring a plurality of historical DMA data corresponding to AI chips of different types respectively, and constructing a historical DMA data set by each historical DMA data;
Performing data processing on the historical DMA data set through a pre-constructed data preprocessing method to obtain a standard training historical DMA data set;
wherein the standard training history DMA data set comprises a plurality of standard training history DMA data;
and inputting each standard training history DMA data to an initial chip bandwidth determination mathematical model to perform model training, and determining that training is completed when the initial chip bandwidth determination mathematical model outputs the DMA bandwidth and the error on the standard training history DMA data set is smaller than a preset error value threshold value.
3. The method of claim 2, wherein said constructing a historical DMA data set from each of said historical DMA data comprises:
acquiring historical data shape information, DMA running time of the historical data, historical standard DMA bandwidth and chip type corresponding to each historical DMA data respectively;
and carrying out joint storage on each historical DMA data, the historical data shape information, the DMA running time of the historical data, the historical standard DMA bandwidth and the chip type, and constructing a historical DMA data set.
4. A method according to claim 3, wherein said data processing of said historical DMA data set by a pre-constructed data preprocessing method results in a standard training historical DMA data set comprising:
Performing data compression measurement on the historical DMA data set to obtain a historical correct DMA data set;
Performing data shuffling and reorganizing treatment on the historical correct DMA data set to obtain a historical reorganized DMA data set;
Dividing the historical reorganization DMA data set according to a preset training test proportion to obtain a standard test historical DMA data set and a standard training historical DMA data set.
5. The method according to claim 4, wherein inputting each standard training history DMA data into the initial chip bandwidth determination mathematical model to perform model training, and determining that training is completed when the error of the DMA bandwidth output by the initial chip bandwidth determination mathematical model on the standard training history DMA data set is smaller than the preset error value threshold, comprises:
respectively carrying out data step frequency feature extraction processing on each standard training historical DMA data to obtain historical data step frequency features respectively corresponding to each standard training historical DMA data;
Inputting each historical data step frequency characteristic into an initial chip bandwidth determination mathematical model to perform model training, and obtaining a historical model determination bandwidth value;
calculating a model error value according to the historical model determination bandwidth value;
And acquiring an error value threshold, judging whether the model error value of the current chip bandwidth determination mathematical model is smaller than the error value threshold, and if so, determining that training is completed to determine the chip bandwidth determination mathematical model.
6. The method of claim 5, wherein said calculating a model error value according to the historical model determination bandwidth value comprises:
Acquiring historical standard DMA bandwidths respectively corresponding to the standard training historical DMA data;
and calculating the model error value from the historical standard DMA bandwidth and the historical model determination bandwidth value through a preset loss function calculation formula.
7. An AI chip-based DMA bandwidth determining apparatus, comprising:
The current DMA data acquisition module is used for acquiring current DMA data corresponding to the target AI chip; the current DMA data is data which is carried by using DMA and has minimum granularity in the target AI chip;
the standard current DMA data determining module is used for carrying out data processing on the current DMA data through a pre-constructed data preprocessing method to obtain standard current DMA data;
the DMA bandwidth determining module is used for inputting the standard current DMA data into a pre-trained chip bandwidth determination mathematical model to determine the DMA bandwidth of the target AI chip under different data carrying conditions;
The DMA bandwidth feedback module is used for feeding back the DMA bandwidth so as to determine the optimal performance of the DMA bandwidth in the data handling corresponding to the target AI chip;
Wherein, the DMA bandwidth determining module is configured to: carrying out data step frequency characteristic extraction processing on the standard current DMA data, and determining the step frequency characteristic of the current data; and acquiring the current chip type corresponding to the target AI chip, inputting the current chip type and the current data step frequency characteristic into a pre-trained chip bandwidth determination mathematical model, and determining the DMA bandwidth corresponding to the target AI chip.
8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements an AI chip-based DMA bandwidth determination method as recited in any one of claims 1-6 when the computer program is executed by the processor.
9. A computer readable storage medium storing computer instructions for causing a processor to implement an AI chip-based DMA bandwidth determination method as recited in any one of claims 1-6 when executed.
CN202410251615.5A 2024-03-06 2024-03-06 DMA bandwidth determining method, device, equipment and medium based on AI chip Active CN117827710B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410251615.5A CN117827710B (en) 2024-03-06 2024-03-06 DMA bandwidth determining method, device, equipment and medium based on AI chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410251615.5A CN117827710B (en) 2024-03-06 2024-03-06 DMA bandwidth determining method, device, equipment and medium based on AI chip

Publications (2)

Publication Number Publication Date
CN117827710A CN117827710A (en) 2024-04-05
CN117827710B true CN117827710B (en) 2024-05-24

Family

ID=90506145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410251615.5A Active CN117827710B (en) 2024-03-06 2024-03-06 DMA bandwidth determining method, device, equipment and medium based on AI chip

Country Status (1)

Country Link
CN (1) CN117827710B (en)

Also Published As

Publication number Publication date
CN117827710A (en) 2024-04-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant