CN110376503B - AI acceleration chip performance test method and device - Google Patents


Info

Publication number
CN110376503B
Authority
CN
China
Prior art keywords
instruction
time
chip
module
performance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910565843.9A
Other languages
Chinese (zh)
Other versions
CN110376503A (en)
Inventor
陈坚
汪玉
林峰
葛广君
梁爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou Institute Of Data Technology Co ltd
Original Assignee
Fuzhou Institute Of Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou Institute Of Data Technology Co ltd
Priority to CN201910565843.9A
Publication of CN110376503A
Application granted
Publication of CN110376503B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01R: MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R31/00: Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
    • G01R31/28: Testing of electronic circuits, e.g. by signal tracer
    • G01R31/2851: Testing of integrated circuits [IC]
    • G01R31/2882: Testing timing characteristics

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method and a device for testing the performance of an AI acceleration chip. The start time and end time of each instruction of each module in the chip are sampled and recorded to form data records; the data records are then sorted into a list, and the list is processed to obtain the instruction run time and related parameters of each module of the chip; in addition, the parallel instructions for a specified module or a specified time can be looked up in the list and either printed as text or displayed as a chart for parallelism analysis. The invention provides not only performance analysis of the computing part but also parallelism analysis of the communication part and the computing part. The invention can search for parallel instructions within a range defined by set conditions and analyze instruction parallelism, thereby providing better support for performance optimization of the chip.

Description

AI acceleration chip performance test method and device
Technical Field
The invention relates to the field of chip testing, in particular to an AI acceleration chip performance testing method and device.
Background
An AI acceleration chip generally includes modules for instruction scheduling, convolution operation, pooling, activation function calculation, and data loading and unloading. Because mainstream AI algorithms currently have a huge number of parameters, data must be moved repeatedly between on-chip and off-chip memory. The degree to which the system's computing bandwidth matches its data bandwidth is therefore the main factor affecting the performance of an AI acceleration chip. In addition, the diversity of AI network models makes the operating efficiency of the individual modules highly uncertain. The performance of an AI chip thus depends not only on the individual performance of each instruction module, but also on data loading and unloading and on the scheduling efficiency among the instruction modules. A testing method is needed that analyzes both the performance of each instruction module and the parallelism between the instruction modules, so as to facilitate subsequent chip performance improvement.
Because chips are used for different purposes, existing performance test methods have different focuses. Some test the performance of the chip at different utilization rates, which helps with chip selection; some focus on the running performance of a particular module, which facilitates subsequent improvement; and some focus on the parallelism of multiple modules of a chip in order to predict the final performance of the chip during the development stage. The typical existing methods are mainly the following:
1) Controlling the utilization rate of the chip, obtaining chip performance test results at different utilization rates, and taking the geometric mean. For example, in the patent "CPU performance evaluation method and device", application number 201310161217.6, the utilization rate of the CPU is controlled by a control instruction; a benchmark test is performed on the central processing unit (CPU) to obtain performance test results at each utilization rate, each result representing the performance of the CPU under one load; and the geometric mean of the multiple performance test results is calculated to obtain the final performance evaluation result of the CPU. That patent can only predict performance by testing instruction segments; it cannot accurately measure the run time of each internal module or the parallelism of different internal modules, and it cannot identify the modules that cause performance bottlenecks.
2) Providing bypass circuits for different modules in the chip, so that the module under test can be tested after the others are bypassed. For example, in the patent "cell performance test method and system chip for artificial intelligence module", application number 201910103596.0, for a plurality of AI processing units arranged in a two-dimensional array, each processing unit includes an enable input terminal that receives an enable signal and pauses or starts the operation of the processing unit according to that signal; the processing units that share dimension 1 and/or dimension 2 with the processing unit under test can be configured into a bypass state so that the unit under test can be performance tested; giving the processing units a bypass function makes the AI module more convenient to test. That patent tests the module under test through the bypass and cannot estimate the overall performance of the chip; nor can it observe the parallelism among different modules in a real scenario.
3) Predicting the overall performance of the chip by charting the run time of each computation block in the chip. For example, in the patent "method for predicting GPU performance and corresponding computer system", application number 201510387995.6, a set of test applications is run on the GPU chip to be evaluated; a set of scalar performance counters and vector performance counters is captured; a model for evaluating and predicting GPU performance for different chip configurations is created based on the captured counters; and a performance score of the GPU chip is predicted and bottlenecks in the GPU pipeline are identified. That patent builds a performance model for the parallelism of the GPU's computation modules, but a chip includes both a communication part and a computation part, and the communication part sometimes has the greater influence on overall performance.
Disclosure of Invention
The invention aims to provide a method and a device for testing the performance of an AI accelerating chip, which not only provide the performance analysis of a computing part, but also provide the parallelism analysis of a communication part and the computing part. The invention can search the parallel instructions in the condition range according to the set conditions and analyze the instruction parallelism, thereby providing better support for the performance optimization of the chip.
The technical scheme adopted by the invention is as follows:
A performance test method for an AI acceleration chip comprises the following steps:
step 1, starting a global test, and distributing test instructions to each module of the AI acceleration chip;
step 2, sampling the start time and the end time of each instruction run by each module to form data records, and uploading the data records to an external performance analyzer;
step 3, the performance analyzer arranges the data records into a list by module;
step 4, processing the list with a scripting language to obtain the run time of each instruction;
step 5, accumulating the run times of all instructions of each module to obtain the total run time of each module, and using the per-module totals to identify the instructions that occupy the most run time;
step 6, processing the list with a scripting language to obtain the interval between adjacent instructions in each module;
step 7, looking up in the list the parallel instruction run results within a specified range to be analyzed;
step 8, outputting and displaying the parallel instruction run results found.
Further, the modules of the AI acceleration chip in step 1 include specific computation modules and communication modules responsible for data transfer.
Further, in step 2, the AI acceleration chip obtains a reference time from a timer and uses the reference time to record the start time and the end time of each instruction.
Further, the data records in step 2 are written into a volatile memory and uploaded to the performance analyzer once the memory occupancy reaches a watermark or the instructions have finished running.
Further, the data records in step 2 are uploaded directly to the performance analyzer through a communication interface provided on the AI acceleration chip.
Further, the scripting language in step 4 or 6 is the Python language.
Further, the list lookup in step 7 includes three modes, as follows:
mode 1: searching for parallel instructions within a time range given by a set time;
mode 2: searching for parallel instructions within the time range corresponding to specified instruction sequence numbers of a module;
mode 3: finding the instruction index with the longest instruction interval in a given module, and searching for parallel instructions within a time range around it.
Further, the specific steps of mode 3 are:
step 7-1, sorting the instruction intervals in the specified module to obtain the index of the instruction with the largest instruction interval;
step 7-2, taking the start time of the instruction q instructions before that index as Time min, and the start time of the instruction p instructions after it as Time max, where the values of q and p are set by the user;
step 7-3, searching for parallel instructions using Time min and Time max as the time range.
The invention also discloses an AI acceleration chip performance testing device, which comprises an on-chip data record generating circuit and a performance analyzer. The on-chip data record generating circuit comprises a test control circuit, a timer, and an instruction time record summarizing and communication circuit. The test control circuit is connected to the timer and to each module in the chip and is used to control the chip to start or finish the performance test; the timer is used to generate a time reference and provide it to each module in the chip; each module of the AI acceleration chip is provided with a time sampling circuit, which acquires the run start time and run end time of each instruction in that module; and the instruction time record summarizing and communication circuit summarizes the generated instruction run times and uploads them to the performance analyzer.
Further, the instruction time record summarizing and communication circuit comprises a contention resolution and record keeping circuit, an internal RAM memory and a communication interface;
the contention resolution and record keeping circuit writes the records into the internal RAM (random access memory) according to a fair round-robin principle, and when the number of AI acceleration chip modules is X, all records are written within X clock cycles;
either a single contention resolution and record keeping circuit is used, in which case the number of run cycles of each instruction is greater than the number of AI acceleration chip modules; or a plurality of round-robin contention resolution and record keeping circuits is used, in which case the number of run cycles of each instruction is greater than the number of AI acceleration chip modules and the rotation period of each circuit is less than the instruction run period;
the internal RAM memory is used for storing the record data and providing storage status information;
the communication interface is in communication connection with the performance analyzer, and the records are uploaded to the performance analyzer through the communication interface once the occupancy of the internal RAM memory reaches a watermark or the instructions have finished running;
the performance analyzer is a PC, a tablet, a smartphone or a cloud server.
By adopting the above technical scheme, the start time and end time of each instruction of each module in the chip are sampled and recorded to form data records; the data records are then sorted into a list, and the list is processed to obtain the instruction run time and related parameters of each module of the chip; in addition, the parallel instructions for a specified module or a specified time can be looked up in the list and either printed as text or displayed as a chart for parallelism analysis. The invention provides not only performance analysis of the computing part but also parallelism analysis of the communication part and the computing part. The invention can search for parallel instructions within a range defined by set conditions and analyze instruction parallelism, thereby providing better support for performance optimization of the chip.
Drawings
The invention is described in further detail below with reference to the accompanying drawings and the detailed description:
FIG. 1 is a schematic diagram of the testing principle of the present invention;
FIG. 2 is a schematic diagram of an instruction time record summarizing and communication circuit according to the present invention;
FIG. 3 is a schematic diagram of a performance test data generation process according to the present invention;
FIG. 4 is a schematic analysis flow chart of the performance analyzer of the present invention;
FIG. 5 is a schematic representation of the list collated by the performance analyzer of the present invention;
FIG. 6 is a schematic diagram of mode 1 in the list lookup of the present invention;
FIG. 7 is a schematic diagram of mode 2 in the list lookup of the present invention;
FIG. 8 is a schematic diagram of mode 3 in the list lookup of the present invention;
FIG. 9 is a graph illustrating the result of the total duration of instruction execution according to the present invention;
FIG. 10 is a diagram illustrating the search results of parallel instructions within a set time range according to the present invention;
FIG. 11 is a diagram illustrating the lookup result of parallel instructions within the time range corresponding to instructions with specified indexes according to the present invention.
Detailed Description
As shown in any of FIGS. 1 to 11, the invention discloses an AI acceleration chip performance testing device, which comprises an on-chip data record generating circuit and a performance analyzer, wherein the on-chip data record generating circuit comprises a test control circuit, a timer, and an instruction time record summarizing and communication circuit.
The test control circuit is connected to the timer and to each module in the chip and is used to control the chip to start or finish the performance test. The timer generates a time reference and provides it to each module in the chip. Each module of the AI acceleration chip is provided with a time sampling circuit that acquires the run start time and run end time of each instruction in that module. A module of the AI acceleration chip may be a specific computation module or a communication module responsible for data transfer, so the invention can test communication modules as well as computation modules. The instruction time record summarizing and communication circuit summarizes the sampled instruction run time data and uploads it to the performance analyzer.
Specifically, as shown in fig. 2, the instruction time record summarizing and communication circuit includes a contention resolution and record keeping circuit which, when several records arrive at the same time, writes them into an internal small-capacity RAM in fair round-robin order. With X modules, all records can be written within X clock cycles, so the circuit has an applicability condition: the number of cycles each instruction runs must be greater than the number of modules. If this condition is not met, several such circuits are provided, so that the number of modules served in each rotation is smaller than the instruction run period.
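The fair round-robin write described above can be illustrated with a small software model. This is only a sketch under assumptions: the function name `round_robin_write`, the one-record-per-cycle RAM port, and the rotating priority pointer are hypothetical choices and are not taken from the patent's circuit.

```python
from collections import deque


def round_robin_write(pending, start=0):
    """Simulate fair-rotation arbitration with one RAM write per clock cycle.

    pending: list of deques, one per module, holding records awaiting a write.
    Returns a list of (cycle, module_index, record) tuples in write order.
    """
    num_modules = len(pending)
    written = []
    cycle = 0
    pointer = start  # module that has priority in the current cycle (assumed scheme)
    while any(pending):
        # Scan from the priority pointer for the next module with a pending record.
        for offset in range(num_modules):
            m = (pointer + offset) % num_modules
            if pending[m]:
                written.append((cycle, m, pending[m].popleft()))
                pointer = (m + 1) % num_modules  # rotate priority to the next module
                break
        cycle += 1
    return written


# Four modules (X = 4) all finish an instruction in the same cycle.
queues = [deque([f"rec{m}"]) for m in range(4)]
log = round_robin_write(queues)
print(log)
```

In this model the four simultaneous records are drained in cycles 0 to 3, that is, within X clock cycles, which is why each instruction needs to run for more than X cycles if no record is to be lost.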
The internal small-capacity RAM stores the record data and provides storage status information. Each record written contains the instruction type, the sequence number of the instruction within its type (i.e. which instruction of that type it is), and the start and end times of the instruction.
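On the analyzer side, one such record could be represented as follows. This is a minimal sketch: the class name `InstRecord`, the field names, and the example values are assumptions for illustration, and the patent does not specify field widths or encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstRecord:
    inst_type: str  # instruction type, e.g. "load", "save", "conv" (labels assumed)
    seq: int        # sequence number of the instruction within its type
    start: int      # start time, in timer ticks
    end: int        # end time, in timer ticks


# Hypothetical record: convolution instruction number 3 ran from tick 1200 to tick 1850.
rec = InstRecord(inst_type="conv", seq=3, start=1200, end=1850)
print(rec.end - rec.start)
```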
When the storage occupancy reaches a set watermark (chosen so that the RAM is guaranteed not to overflow), or when all instructions have finished running, the communication interface is started and the records are uploaded to the performance analyzer. The communication interface may be an out-of-band interface of the chip, such as an SPI interface or a gigabit interface. Note that the transmission bandwidth of the communication interface must be sufficient to ensure that the internal small-capacity RAM does not overflow.
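The watermark-triggered upload can be modelled in a few lines. This is a sketch under assumptions: `RecordBuffer`, its parameters, and the `upload` callback (standing in for the communication interface) are hypothetical, and the upload is treated as instantaneous, which real hardware would not guarantee.

```python
class RecordBuffer:
    """Toy model of the on-chip record RAM with a watermark-triggered upload."""

    def __init__(self, capacity, watermark, upload):
        if not 0 < watermark <= capacity:
            raise ValueError("watermark must lie in (0, capacity]")
        self.capacity = capacity
        self.watermark = watermark
        self.upload = upload  # callable standing in for the communication interface
        self._ram = []

    def write(self, record):
        if len(self._ram) >= self.capacity:
            raise OverflowError("record RAM overflow")
        self._ram.append(record)
        if len(self._ram) >= self.watermark:
            self.flush()

    def flush(self):
        """Upload everything buffered; also called once all instructions finish."""
        if self._ram:
            self.upload(list(self._ram))
            self._ram.clear()


batches = []
buf = RecordBuffer(capacity=8, watermark=4, upload=batches.append)
for rec in range(6):
    buf.write(rec)
buf.flush()  # end of run: upload the remaining records
print(batches)
```

Setting the watermark below the capacity leaves headroom for records that arrive while an upload is in progress, which is the role the bandwidth requirement plays in the text.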
Specifically, as one implementation, the performance analyzer is a PC, a tablet, or a smartphone, and has the following functions:
a) performing data analysis on the uploaded data records;
b) displaying or printing the run time of each instruction;
c) counting the total run time of each type of instruction;
d) searching for and displaying the instructions running in parallel within a range defined by set conditions.
Further, the invention also discloses a method for testing the performance of an AI acceleration chip, with the following specific steps.
Steps 1 to 2 constitute the performance test data generation process, shown in fig. 3.
Step 1, receiving an instruction run start signal, starting the global timer to provide a time reference, starting the global test, and distributing test instructions to each module of the AI acceleration chip. Further, the modules of the AI acceleration chip in step 1 include specific computation modules and communication modules responsible for data transfer.
Step 2, sampling the start time and the end time of each instruction run by each module to form data records, and uploading the data records to an external performance analyzer.
Specifically, the start time and the end time of each instruction are saved as one record.
Further, as one embodiment, the data records in step 2 are written into a volatile memory and uploaded to the performance analyzer once the memory occupancy reaches the watermark or the instructions have finished running. As another embodiment, the data records in step 2 are uploaded directly to the performance analyzer through a communication interface provided on the AI acceleration chip, which reduces the use of internal memory.
Steps 3 to 8 constitute the analysis flow of the performance analyzer, shown in fig. 4, where Start(n) is the start time of the nth instruction in a module and End(n) is the end time of the nth instruction in the module.
Step 3, the performance analyzer arranges the data records into a list by module; the instruction time records of each module are arranged in the list format shown in fig. 5 to facilitate subsequent analysis and processing, where Start denotes the start time of an instruction and End denotes its end time.
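Step 3 can be sketched in Python as a grouping of the uploaded records by module. The tuple layout `(module, seq, start, end)`, the function name `build_table`, and the sample values are assumptions for illustration; the actual list layout is the one shown in fig. 5.

```python
from collections import defaultdict


def build_table(records):
    """Group (module, seq, start, end) records into one list per module, ordered by seq."""
    table = defaultdict(list)
    for module, seq, start, end in records:
        table[module].append((seq, start, end))
    return {module: sorted(rows) for module, rows in table.items()}


# Hypothetical uploaded records, arriving in arbitrary order.
records = [
    ("conv", 0, 120, 900),
    ("load", 1, 110, 200),
    ("load", 0, 0, 100),
    ("save", 0, 950, 1000),
]
table = build_table(records)
print(table["load"])
```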
Step 4, processing the list with a scripting language to obtain the run time of each instruction.
Specifically, a scripting language such as Python is used to process the above list and calculate the run time of each instruction. The formula is as follows:
Module_x_Inst_cycle(n) = End(n) - Start(n), 1 ≤ n ≤ index_max
where Start(n) is the start time of the nth instruction in module x; End(n) is the end time of the nth instruction in module x; Module_x_Inst_cycle(n) is the run time of the nth instruction of module x; and index_max is the maximum instruction number.
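A minimal Python rendering of this formula, assuming each module's list is a sequence of `(start, end)` pairs; the function name `inst_cycles` and the sample values are hypothetical.

```python
def inst_cycles(rows):
    """Return End(n) - Start(n) for each (start, end) row of one module."""
    return [end - start for start, end in rows]


# Hypothetical start/end times of three load instructions.
load_rows = [(0, 100), (110, 200), (260, 300)]
load_cycles = inst_cycles(load_rows)
print(load_cycles)
```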
Step 5, accumulating the run times of all instructions of each module to obtain the total run time of each module; the per-module totals are used to identify the instructions that occupy the most run time.
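The accumulation in step 5 could look like the following sketch, which assumes a dictionary mapping each module name to its `(start, end)` rows; the names `total_cycles`, `table` and `busiest`, and the sample values, are illustrative only.

```python
def total_cycles(table):
    """Sum the run times of all instructions of each module."""
    return {
        module: sum(end - start for start, end in rows)
        for module, rows in table.items()
    }


# Hypothetical per-module (start, end) lists.
table = {
    "load": [(0, 100), (110, 200), (260, 300)],
    "conv": [(120, 900)],
    "save": [(950, 1000)],
}
totals = total_cycles(table)
busiest = max(totals, key=totals.get)  # module whose instructions take the most run time
print(totals, busiest)
```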
Step 6, processing the list with a scripting language to obtain the interval between adjacent instructions in each module. The scripting language is Python, and the formula is as follows:
Module_x_gap_cycle(n) = Start(n) - End(n-1), 2 ≤ n ≤ index_max
where Module_x_gap_cycle(n) is the interval between the nth instruction and the (n-1)th instruction in module x.
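The interval formula can be sketched the same way, again assuming a list of `(start, end)` pairs ordered by instruction number; `gap_cycles` and the sample values are hypothetical.

```python
def gap_cycles(rows):
    """Return Start(n) - End(n-1) for every pair of adjacent instructions."""
    return [nxt[0] - prev[1] for prev, nxt in zip(rows, rows[1:])]


# Hypothetical start/end times of three load instructions.
load_rows = [(0, 100), (110, 200), (260, 300)]
load_gaps = gap_cycles(load_rows)
print(load_gaps)
```

A large gap means the module sat idle between two instructions, which is the situation that mode 3 of the lookup in step 7 is meant to investigate.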
Step 7, looking up in the list the parallel instruction run results within a specified range to be analyzed. The list lookup in step 7 includes three modes, as follows:
Mode 1: searching for parallel instructions within a time range given by a set time. The principle is shown in fig. 6, where Time min and Time max are the display time range set by the user.
Mode 2: searching for parallel instructions within the time range corresponding to specified instruction sequence numbers of a module. The principle is shown in fig. 7, where Time min and Time max are the display time range set by the user.
Mode 3: finding the instruction index with the longest instruction interval in a given module, and searching for parallel instructions within a time range around it.
Further, as shown in fig. 8, the specific steps of mode 3 are:
step 7-1, sorting the instruction intervals in the specified module to obtain the index of the instruction with the largest instruction interval;
step 7-2, taking the start time of the instruction q instructions before that index as Time min, and the start time of the instruction p instructions after it as Time max, where the values of q and p are set by the user;
step 7-3, searching for parallel instructions using Time min and Time max as the time range.
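The three lookup modes can be sketched as follows. Several points are assumptions rather than statements from the patent: an instruction counts as "parallel" if its interval overlaps [Time min, Time max]; mode 2 takes the window from the start of the first specified instruction to the end of the last; indexes are 0-based; and q and p are clamped to the ends of the list. All function names and sample values are hypothetical.

```python
def parallel_in_window(table, t_min, t_max):
    """Mode 1: per module, first and last instruction index overlapping [t_min, t_max]."""
    result = {}
    for module, rows in table.items():
        hits = [n for n, (start, end) in enumerate(rows)
                if start <= t_max and end >= t_min]
        if hits:
            result[module] = (hits[0], hits[-1])
    return result


def window_from_indexes(rows, first, last):
    """Mode 2: time window spanned by instructions first..last of one module (assumed)."""
    return rows[first][0], rows[last][1]


def window_from_max_gap(rows, q, p):
    """Mode 3: window around the instruction that follows the largest gap."""
    if len(rows) < 2:
        raise ValueError("need at least two instructions to have a gap")
    gaps = [nxt[0] - prev[1] for prev, nxt in zip(rows, rows[1:])]
    idx = max(range(len(gaps)), key=gaps.__getitem__) + 1  # step 7-1
    lo = max(idx - q, 0)                                   # clamp to list bounds
    hi = min(idx + p, len(rows) - 1)
    return idx, rows[lo][0], rows[hi][0]                   # step 7-2: start times


# Hypothetical per-module (start, end) lists.
table = {
    "load": [(0, 100), (110, 200), (260, 300), (310, 400)],
    "conv": [(120, 250), (255, 420)],
}
idx, t_min, t_max = window_from_max_gap(table["load"], q=1, p=1)
found = parallel_in_window(table, t_min, t_max)            # step 7-3
print(idx, t_min, t_max, found)
```

Modes 2 and 3 only differ from mode 1 in how the time window is derived; once Time min and Time max are known, the same overlap search is applied to every module.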
Step 8, outputting and displaying the parallel instruction run results found. The results are either displayed as a chart or printed as text.
The effect of the present invention will be described below with reference to a test example of an AI accelerator chip.
1) Fig. 9 shows the displayed total instruction run times for the tested AI acceleration chip, where Load is the total run time of data load instructions, Save is the total run time of instructions that store calculation results, Conv is the total run time of convolution instructions, and Pooling is the total run time of pooling instructions.
2) Fig. 10 shows a further test that looks up the parallel instructions within a set time range on the AI acceleration chip; the displayed result is as follows:
Time from 1000 to 15000000 inst seq is:
Load inst seq: [2 : 7141]
save inst seq: [0 : 1749]
conv inst seq: [0 : 311]
misc inst seq: [0 : 271]
3) Fig. 11 shows a further test that looks up the parallel instructions within the time range corresponding to instructions with specified indexes (instruction indexes 1000 to 1200 of specified module 1); the displayed result is as follows:
Time from 2841401 to 3155305 inst seq is:
Load inst seq: [1000 : 1200]
save inst seq: [536 : 639]
conv inst seq: [74 : 81]
misc inst seq: [110 : 130]
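Text output in the layout of the two results above can be produced with a small formatter. This is a sketch only: the function name `format_result` and the dictionary layout are assumptions, and the values simply reproduce the figures quoted in test example 3.

```python
def format_result(t_min, t_max, ranges):
    """Format per-module instruction index ranges in the text layout shown above."""
    lines = [f"Time from {t_min} to {t_max} inst seq is:"]
    for module, (first, last) in ranges.items():
        lines.append(f"{module} inst seq: [{first} : {last}]")
    return "\n".join(lines)


report = format_result(
    2841401,
    3155305,
    {"Load": (1000, 1200), "save": (536, 639), "conv": (74, 81), "misc": (110, 130)},
)
print(report)
```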
By adopting the above technical scheme, the start time and end time of each instruction of each module in the chip are sampled and recorded to form data records; the data records are then sorted into a list, and the list is processed to obtain the instruction run time and related parameters of each module of the chip; in addition, the parallel instructions for a specified module or a specified time can be looked up in the list and either printed as text or displayed as a chart for parallelism analysis. The invention provides not only performance analysis of the computing part but also parallelism analysis of the communication part and the computing part. The invention can search for parallel instructions within a range defined by set conditions and analyze instruction parallelism, thereby providing better support for performance optimization of the chip.

Claims (10)

1. A performance test method for an AI acceleration chip, characterized by comprising the following steps:
step 1, starting a global test, and distributing test instructions to each module of the AI acceleration chip;
step 2, sampling the start time and the end time of each instruction run by each module to form data records, and uploading the data records to an external performance analyzer;
step 3, the performance analyzer arranges the data records into a list by module;
step 4, processing the list with a scripting language to obtain the run time of each instruction;
step 5, accumulating the run times of all instructions of each module to obtain the total run time of each module, and using the per-module totals to identify the instructions that occupy the most run time;
step 6, processing the list with a scripting language to obtain the interval between adjacent instructions in each module;
step 7, looking up in the list the parallel instruction run results within a specified range to be analyzed;
step 8, outputting and displaying the parallel instruction run results found.
2. The AI acceleration chip performance testing method of claim 1, characterized in that: the modules of the AI acceleration chip in step 1 include specific computation modules and communication modules responsible for data transfer.
3. The AI acceleration chip performance testing method of claim 1, characterized in that: in step 2, the AI acceleration chip obtains a reference time from a timer and uses the reference time to record the start time and the end time of each instruction.
4. The AI acceleration chip performance testing method of claim 1, characterized in that: the data records in step 2 are written into a volatile memory and uploaded to the performance analyzer once the memory occupancy reaches a watermark or the instructions have finished running.
5. The AI acceleration chip performance testing method of claim 1, characterized in that: the data records in step 2 are uploaded directly to the performance analyzer through a communication interface provided on the AI acceleration chip.
6. The AI acceleration chip performance testing method of claim 1, characterized in that: the scripting language in step 4 or 6 is the Python language.
7. The AI acceleration chip performance testing method of claim 1, characterized in that: the list lookup in step 7 includes three modes, as follows:
mode 1: searching for parallel instructions within a time range given by a set time;
mode 2: searching for parallel instructions within the time range corresponding to specified instruction sequence numbers of a module;
mode 3: finding the instruction index with the longest instruction interval in a given module, and searching for parallel instructions within a time range around it.
8. The AI acceleration chip performance testing method of claim 7, characterized in that: the specific steps of mode 3 are:
step 7-1, sorting the instruction intervals in the specified module to obtain the index of the instruction with the largest instruction interval;
step 7-2, taking the start time of the instruction q instructions before that index as Time min, and the start time of the instruction p instructions after it as Time max, where the values of q and p are set by the user;
step 7-3, searching for parallel instructions using Time min and Time max as the time range.
9. An AI acceleration chip performance testing device, characterized in that: the device comprises an on-chip data record generating circuit and a performance analyzer, wherein the on-chip data record generating circuit comprises a test control circuit, a timer, and an instruction time record summarizing and communication circuit; the test control circuit is connected to the timer and to each module in the chip and is used to control the chip to start or finish the performance test; the timer is used to generate a time reference and provide it to each module in the chip; each module of the AI acceleration chip is provided with a time sampling circuit, which acquires the run start time and run end time of each instruction in that module; and the instruction time record summarizing and communication circuit summarizes the sampled instruction run time data and uploads it to the performance analyzer.
10. The AI acceleration chip performance testing device of claim 9, characterized in that: the instruction time record summarizing and communication circuit comprises a competition judging and record holding circuit, an internal RAM memory and a communication interface;
the competition judging and record holding circuit writes the records into the internal RAM memory according to a fair rotation principle, so that when the number of AI acceleration chip modules is X, all records are written within X clock cycles;
either a single set of the competition judging and record holding circuit is used, in which case the number of running cycles of each instruction is greater than the number of AI acceleration chip modules; or multiple sets of competition judging and record holding circuits working in rotation are used, in which case the number of running cycles of each instruction is greater than the number of AI acceleration chip modules and the rotation period of each set is less than the running period of an instruction;
the internal RAM memory is used for storing the recorded data and providing storage state information;
the communication interface is in communication connection with the performance analyzer, and the records are uploaded to the performance analyzer through the communication interface once the occupancy of the internal RAM memory reaches a watermark or the instruction run is finished;
the performance analyzer is a PC, a tablet, a smartphone or a cloud server.
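The fair-rotation write and watermark upload described in claim 10 can be modelled in software roughly as follows. This is a behavioural sketch, not the claimed circuit: the class `RecordCollector`, its method names, the one-slot-per-module holding register, and the `upload` callback standing in for the communication interface are all assumptions made for illustration.

```python
from collections import deque


class RecordCollector:
    """Behavioural model of a single competition judging and record holding set."""

    def __init__(self, num_modules, watermark, upload):
        self.slots = [None] * num_modules  # one held record per module (assumed)
        self.turn = 0                      # module whose turn it is this cycle
        self.ram = deque()                 # stands in for the internal RAM
        self.watermark = watermark         # RAM occupancy that triggers an upload
        self.upload = upload               # stands in for the communication interface

    def submit(self, module_id, record):
        """A module's time sampling circuit hands over a finished record."""
        if self.slots[module_id] is not None:
            # Happens if an instruction finishes before the previous record
            # from the same module has had its turn.
            raise RuntimeError("held record would be overwritten")
        self.slots[module_id] = record

    def clock(self):
        """One clock cycle: serve exactly one module, in fixed rotation."""
        record = self.slots[self.turn]
        if record is not None:
            self.ram.append(record)
            self.slots[self.turn] = None
        self.turn = (self.turn + 1) % len(self.slots)
        if len(self.ram) >= self.watermark:
            self.flush()

    def flush(self):
        """Upload everything in RAM (watermark reached or run finished)."""
        if self.ram:
            self.upload(list(self.ram))
            self.ram.clear()


# Three modules each submit one record in the same cycle.
uploaded = []
collector = RecordCollector(num_modules=3, watermark=2, upload=uploaded.append)
for module_id in range(3):
    collector.submit(module_id, (module_id, 0, 10))  # (module, start, end)
for _ in range(3):          # X = 3 clock cycles drain all three holding slots
    collector.clock()
collector.flush()           # end of the instruction run
```

The model also illustrates why the claim requires each instruction to run for more cycles than there are modules: a module's slot is guaranteed to be served only once every X cycles, so a shorter instruction could produce a new record before the held one is written.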
CN201910565843.9A 2019-06-27 2019-06-27 AI acceleration chip performance test method and device Active CN110376503B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910565843.9A CN110376503B (en) 2019-06-27 2019-06-27 AI acceleration chip performance test method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910565843.9A CN110376503B (en) 2019-06-27 2019-06-27 AI acceleration chip performance test method and device

Publications (2)

Publication Number Publication Date
CN110376503A (en) 2019-10-25
CN110376503B (en) 2021-07-27

Family

ID=68250991

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910565843.9A Active CN110376503B (en) 2019-06-27 2019-06-27 AI acceleration chip performance test method and device

Country Status (1)

Country Link
CN (1) CN110376503B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112306829B (en) * 2020-10-12 2023-05-09 成都安易迅科技有限公司 Method and device for determining performance information, storage medium and terminal
CN113568821A (en) * 2021-07-26 2021-10-29 北京百度网讯科技有限公司 Method, device, equipment and medium for testing computation performance of AI chip
CN113884857B (en) * 2021-09-29 2024-03-08 上海阵量智能科技有限公司 Chip, chip pressure testing method and device, electronic equipment and storage medium

Citations (7)

Publication number Priority date Publication date Assignee Title
CN101196556A (en) * 2006-12-07 2008-06-11 上海华虹Nec电子有限公司 SOC for parallel test judgement and its implementing method
EP1293989B1 (en) * 2001-09-14 2008-07-16 Fujitsu Limited Test method for semiconductor memory circuit
CN102967815A (en) * 2012-11-07 2013-03-13 北京华大信安科技有限公司 Chip testing method, automated testing equipment and system
CN105527560A (en) * 2016-01-11 2016-04-27 福州瑞芯微电子股份有限公司 Chip difference monitoring method and monitoring circuit
CN106353667A (en) * 2015-07-13 2017-01-25 赵俭 Fast test scheduling method
CN108226751A (en) * 2017-12-14 2018-06-29 芯海科技(深圳)股份有限公司 Multiprocessor collaborative chip performance assessment system and method
CN108333497A (en) * 2017-11-28 2018-07-27 上海华力微电子有限公司 Chip testing method

Patent Citations (7)

Publication number Priority date Publication date Assignee Title
EP1293989B1 (en) * 2001-09-14 2008-07-16 Fujitsu Limited Test method for semiconductor memory circuit
CN101196556A (en) * 2006-12-07 2008-06-11 上海华虹Nec电子有限公司 SOC for parallel test judgement and its implementing method
CN102967815A (en) * 2012-11-07 2013-03-13 北京华大信安科技有限公司 Chip testing method, automated testing equipment and system
CN106353667A (en) * 2015-07-13 2017-01-25 赵俭 Fast test scheduling method
CN105527560A (en) * 2016-01-11 2016-04-27 福州瑞芯微电子股份有限公司 Chip difference monitoring method and monitoring circuit
CN108333497A (en) * 2017-11-28 2018-07-27 上海华力微电子有限公司 Chip testing method
CN108226751A (en) * 2017-12-14 2018-06-29 芯海科技(深圳)股份有限公司 Multiprocessor collaborative chip performance assessment system and method

Also Published As

Publication number Publication date
CN110376503A (en) 2019-10-25

Similar Documents

Publication Publication Date Title
CN110376503B (en) AI acceleration chip performance test method and device
CN109298998B (en) Workload evaluation and model training method, electronic equipment and storage medium
CN111563014B (en) Interface service performance test method, device, equipment and storage medium
CN110546654A (en) Enhancing processing performance of DNN modules by configuring bandwidth control of interfaces
CN109240965B (en) FPGA logic capturing processing display suite and use method thereof
CN111989655B (en) SOC chip, method for determining hotspot function and terminal equipment
CN116362168B (en) Modeling method and device for GPGPU offline clock and storage medium
CN112464599B (en) Method for determining power supply voltage data in static time sequence analysis of circuit
CN103246566A (en) Resource monitoring method and device for application program
CN103164321A (en) Occupancy rate measuring method and device of central processing unit
CN109359727A (en) Structure determination methodology, device, equipment and the readable medium of neural network
CN115018081B (en) Feature selection method, application program prediction method and device
CN108681510A (en) Data processing method and device
CN107491484A (en) A kind of data matching method, device and equipment
CN107769987B (en) Message forwarding performance evaluation method and device
CN103389413A (en) Real-time statistical method for frequency spectrum histogram
CN109725785A (en) Task execution situation method for tracing, device, equipment and readable storage medium storing program for executing
CN109710521B (en) Multimedia application performance test method and device, computer equipment and storage medium
CN115238837B (en) Data processing method and device, electronic equipment and storage medium
US10496524B2 (en) Separating test coverage in software processes using shared memory
CN115409195A (en) Quantum program visual deduction method and device based on QASM programming architecture
CN110688289B (en) Processor performance event dynamic monitoring method based on simulation
CN114356512A (en) Data processing method, data processing equipment and computer readable storage medium
CN113806231A (en) Code coverage rate analysis method, device, equipment and medium
CN114764372A (en) Data processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant