CN111242314A - Deep learning accelerator benchmark test method and device - Google Patents



Publication number
CN111242314A
Authority
CN
China
Prior art keywords
network model
test
accelerator
tested
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010017521.3A
Other languages
Chinese (zh)
Other versions
CN111242314B (en)
Inventor
张蔚敏
孙明俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Information and Communications Technology CAICT
Original Assignee
China Academy of Information and Communications Technology CAICT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Academy of Information and Communications Technology CAICT filed Critical China Academy of Information and Communications Technology CAICT
Priority to CN202010017521.3A
Publication of CN111242314A
Application granted
Publication of CN111242314B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application provides a deep learning accelerator benchmark test method and device, wherein the method comprises the following steps: selecting a network model for testing according to the deployment position and the application scene of the accelerator to be tested; acquiring test data and preprocessing the test data; configuring a benchmark test environment component, loading the network model, and running the preprocessed test data on the accelerator to be tested; and acquiring a test result, and determining a test index of the tested accelerator. The method can comprehensively and accurately carry out benchmark test on the deep learning accelerator.

Description

Deep learning accelerator benchmark test method and device
Technical Field
The invention relates to the technical field of benchmark testing, in particular to a deep learning accelerator benchmark testing method and device.
Background
With the rapid development of AI technologies and applications represented by deep learning, many vendors integrate dedicated AI acceleration chips/processors or IP blocks into their platforms or products. Such hardware systems are proliferating, and how to fairly and systematically evaluate chip performance and guide hardware optimization has become a research hotspot.
As algorithms and applications gradually mature, deploying trained models to complete inference tasks has become a market-demand hotspot, and inference scenarios are abundant, for example in data centers, autonomous driving, security, mobile phones, and smart homes. Against a background in which the market's technical routes are unsettled and products are uneven, a benchmark test evaluation system based on a clear index system reflecting the current state of the art is provided to objectively characterize AI chips and thereby promote the healthy development of the industry.
As computer architectures evolve, it becomes more difficult to compare the performance of various computer systems simply by looking at their specifications. It is therefore necessary to develop corresponding tests that can compare different architectures, and how to perform a comprehensive and efficient benchmark test is an urgent technical problem to be solved.
Disclosure of Invention
In view of this, the present application provides a deep learning accelerator benchmark testing method and apparatus, which can comprehensively and accurately perform a benchmark test on a deep learning accelerator.
In order to solve the technical problem, the technical scheme of the application is realized as follows:
in one embodiment, a deep learning accelerator benchmarking method is provided, the method comprising:
selecting a network model for testing according to the deployment position and the application scene of the accelerator to be tested;
acquiring test data and preprocessing the test data;
configuring a benchmark test environment component, loading the network model, and running the preprocessed test data on the accelerator to be tested;
and acquiring a test result, and determining a test index of the tested accelerator.
In another embodiment, a deep learning accelerator benchmark test apparatus is provided, the apparatus comprising: a selection unit, a processing unit, a test unit, an acquisition unit, and a determination unit;
the selection unit is used for selecting a network model for testing according to the deployment position and the application scene of the accelerator to be tested;
the processing unit is used for acquiring test data and preprocessing the test data;
the test unit is used for configuring a benchmark test environment component, loading the network model selected by the selection unit, and running test data preprocessed by the processing unit on the accelerator to be tested;
the acquisition unit is used for acquiring a test result;
the determining unit is used for determining the test index of the tested accelerator according to the test result acquired by the acquiring unit.
In another embodiment, an electronic device is provided that includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the deep learning accelerator benchmarking method as described when executing the program.
In another embodiment, a computer readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, performs the steps of the deep learning accelerator benchmarking method.
According to the technical scheme, the benchmark test is realized by selecting the model, preprocessing the test data and configuring the test environment assembly, the benchmark test index is determined according to the test result, and the benchmark test can be comprehensively and accurately performed on the deep learning accelerator.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.
FIG. 1 is a schematic diagram of a basic test flow of a deep learning accelerator according to an embodiment of the present application;
FIG. 2 is a diagram of benchmark set-up data flow;
FIG. 3 is a diagram illustrating a benchmark hardware environment in an embodiment of the present application;
FIG. 4 is a schematic diagram of a benchmark index system in an embodiment of the present application;
FIG. 5 is a schematic diagram of an apparatus for implementing the above technique in an embodiment of the present application;
fig. 6 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not explicitly listed or inherent to such process, method, article, or apparatus.
The technical solution of the present invention will be described in detail with specific examples. Several of the following embodiments may be combined with each other and some details of the same or similar concepts or processes may not be repeated in some embodiments.
The embodiment of the application provides a deep learning accelerator benchmark test method, which realizes benchmark test by selecting a model, preprocessing test data and configuring a test environment assembly, determines benchmark test indexes according to test results, and can comprehensively and accurately perform benchmark test on a deep learning accelerator.
Referring to fig. 1, fig. 1 is a schematic diagram of a basic test flow of a deep learning accelerator in an embodiment of the present application. The method comprises the following specific steps:
step 101, selecting a network model for testing according to the deployment position and the application scene of the accelerator to be tested.
The deployment location may be: cloud, edge, end;
the application scenario may be: image classification, target detection, semantic segmentation, super-resolution, face recognition and the like.
Selecting the network model for testing according to the deployment position and application scenario of the accelerator to be tested includes:
selecting a network model according to the deployment position and the application scenario of the accelerator to be tested.
Taking the deployment location "end" (device side) as an example:
the network model selected for the image classification application scenario may be: MobileNet;
the network model selected for the target detection application scenario may be: SSD;
the network model selected for the semantic segmentation application scenario may be: DeepLab v3;
the network model selected for the super-resolution application scenario may be: VDSR;
the network model selected for the face recognition application scenario may be: Inception-ResNet.
During a specific test, one network model can be selected according to the corresponding deployment position and application scenario, or further network models can be added to form a set of models under test; each network model is then used for testing to obtain a test result, and finally the benchmark indices are determined.
If network models are added, the newly added network model and the already-selected network models must satisfy the model-difference-degree principle.
The model-difference-degree principle is satisfied when the difference degree between two models is greater than a preset value, such as 30%, but is not limited thereto.
Suppose one model is denoted as M and the other model is denoted as M′. The difference degree between the two models may be calculated, for example, as a weighted sum of the relative differences of five model parameters:
Diff(M, M′) = K1·d(MAC) + K2·d(Input size) + K3·d(Model size) + K4·d(Accuracy) + K5·d(Layer), where d(p) = |p_M − p_M′| / max(p_M, p_M′),
wherein MAC is the accumulated multiply-add operation count of the model, Input size is the input size of the model, Model size is the memory occupied by the model, Accuracy is the preset accuracy of the model, Layer is the number of layers of the model, and K1, K2, K3, K4, K5 are the weights of the respective parameters, the sum of the 5 weights being 1.
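The model-selection check above can be sketched in Python. The relative-difference normalization and all parameter values below are illustrative assumptions; only the five parameters and the sum-to-1 weights follow the text:

```python
def model_difference(m1, m2, weights=None):
    """Difference degree between two network models as a weighted sum of
    per-parameter relative differences (the normalization is an assumption;
    the five parameters and sum-to-1 weights K1..K5 follow the text)."""
    keys = ("mac", "input_size", "model_size", "accuracy", "layers")
    if weights is None:
        weights = dict.fromkeys(keys, 0.2)  # K1..K5, summing to 1
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(
        weights[k] * abs(m1[k] - m2[k]) / max(m1[k], m2[k]) for k in keys
    )

# Illustrative parameter values, not measurements from the patent.
mobilenet = {"mac": 569e6, "input_size": 224, "model_size": 16.9,
             "accuracy": 0.709, "layers": 28}
candidate = {"mac": 1200e6, "input_size": 300, "model_size": 27.3,
             "accuracy": 0.750, "layers": 50}

# A candidate joins the model pool only if it differs enough, e.g. > 30%.
print(model_difference(mobilenet, candidate) > 0.30)  # True
```

A model identical to one already in the pool yields a difference of 0 and is rejected.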
Step 102, obtaining test data and preprocessing the test data.
In this step, the test data is preprocessed, including:
and carrying out scaling processing and standardization processing on the test data so that the processed test data meets the requirements of the selected network model.
An image preprocessing process is given below by taking image test data as an example, but is not limited to the following preprocessing process:
firstly, the size of each frame image in the test data is scaled according to the input requirement of the network model:
if the test data contains N frames of images, with Ii denoting the ith frame and height and width being the input size required by the model, the scaled image is denoted I′i, where I′i = createScaledBitmap(Ii, height, width), createScaledBitmap being the scaling function provided by Java (the Android Bitmap API);
secondly, three-channel z-score normalization is applied to the scaled images. Denoting the benchmark input image sequence as images:
images_i^(j)(k, l) = (I′_i^(j)(k, l) − μj) / σj,
where j = 1, 2, 3 indexes the RGB channels, k = 1, …, height, l = 1, …, width, μj is the mean of channel j of the image, and σj is the standard deviation of channel j.
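The scaling and per-channel z-score steps above can be sketched with NumPy. The nearest-neighbour resize stands in for the Java createScaledBitmap call and the epsilon guard is an illustrative assumption:

```python
import numpy as np

def scale_image(img, height, width):
    """Nearest-neighbour resize standing in for Android's createScaledBitmap
    (the resampling method is an assumption; only the target size matters)."""
    h, w = img.shape[:2]
    rows = np.arange(height) * h // height
    cols = np.arange(width) * w // width
    return img[rows][:, cols]

def z_score_normalize(img):
    """Three-channel z-score: per RGB channel, subtract the mean and divide
    by the standard deviation."""
    mean = img.mean(axis=(0, 1), keepdims=True)  # one mean per channel
    std = img.std(axis=(0, 1), keepdims=True)    # one std per channel
    return (img - mean) / (std + 1e-8)           # epsilon guards flat channels

# One synthetic test frame; a real harness would iterate over N frames.
frame = np.random.randint(0, 256, size=(480, 640, 3)).astype(np.float32)
model_input = z_score_normalize(scale_image(frame, 224, 224))
print(model_input.shape)  # (224, 224, 3)
```

After normalization each channel of the model input has mean approximately 0 and standard deviation approximately 1, as the selected network models expect.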
step 103, configuring a benchmark test environment component, loading the network model, and running the preprocessed test data on the accelerator to be tested.
After the network model is selected, the benchmark test environment components are configured to complete construction of the runtime environment, including determining the system environment, the hardware acceleration mode of the processor, and the implementation format of the model, and completing model conversion, including work such as pruning and quantization.
When the environment is built, the way the framework calls the accelerator under test can be determined according to the model of the accelerator; several ways can be prepared in advance, and the way used in a specific test is determined by the accelerator model.
The accelerator is invoked using the selected network model (the selected network model is by default a trained model; if no trained model exists, training can be performed first, and the trained network model is then used for testing). The invocation modes are specifically as follows:
first, invocation through a deep learning inference framework;
for example, the NCNN framework + the CPU of the accelerator under test (mobile phone side, ARM-based).
Secondly, invocation through the accelerator's own acceleration stack;
for example, HiAI (the own stack of the accelerator under test) + NPU (Kirin 980 board);
SNPE + the DSP of the accelerator under test (Qualcomm Snapdragon 855/865 handset);
TensorRT + the GPU of the accelerator under test (NVIDIA T4).
And thirdly, invocation through Android NN.
For example, the TFLite framework via the Android NN API + the CPU of the accelerator under test (an Android handset, such as one carrying an MTK chip);
referring to fig. 2, fig. 2 is a schematic diagram of benchmark test component building data flow, fig. 2 shows three ways of calling the accelerator under test, wherein ① represents a first calling way, ② represents a second calling way, and ③ represents a third calling way.
If the accelerator under test is a CPU, the first or third calling mode can be selected; if it is a GPU, the second or third; if a DSP or NPU, the second or third; if an NNA, the second; and if a DPU, the second.
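The mapping from accelerator type to permitted invocation modes can be prototyped as a small lookup table. The mode names and this function are hypothetical; only the CPU/GPU/DSP/NPU/NNA/DPU mapping follows the text:

```python
# Permitted invocation modes per accelerator type.
CALL_MODES = {
    "CPU": {"inference_framework", "android_nn"},  # e.g. NCNN / TFLite
    "GPU": {"vendor_stack", "android_nn"},         # e.g. TensorRT
    "DSP": {"vendor_stack", "android_nn"},         # e.g. SNPE
    "NPU": {"vendor_stack", "android_nn"},         # e.g. HiAI
    "NNA": {"vendor_stack"},
    "DPU": {"vendor_stack"},
}

def select_call_mode(accelerator_type, preferred=None):
    """Pick how the benchmark harness invokes the accelerator under test."""
    modes = CALL_MODES[accelerator_type.upper()]
    if preferred is not None and preferred in modes:
        return preferred
    return sorted(modes)[0]  # deterministic fallback

print(select_call_mode("NNA"))                          # vendor_stack
print(select_call_mode("CPU", preferred="android_nn"))  # android_nn
```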
During testing, determining a mode for calling the accelerator to be tested according to the model of the accelerator to be tested;
and loading the network model during testing, and calling the accelerator to be tested by the preprocessed test data in a determined mode.
And 104, acquiring a test result and determining a test index of the tested accelerator.
During a specific test, the test time, the number of input image frames, the test accuracy, and the power consumption need to be acquired; acquiring power consumption additionally requires a power consumption test tool, for example a power consumption test instrument. Referring to fig. 3, fig. 3 is a schematic diagram of the benchmark test hardware environment in an embodiment of the present application, showing the connection between the test apparatus, the power consumption test tool, and the Device Under Test (DUT). The accelerator under test is carried on the device under test, such as a mobile phone, board, or server.
In the embodiment of the application, determining the test index of the tested accelerator according to the test result includes one or any combination of the following:
time delay, throughput, loss of precision value, power consumption, computational power energy consumption ratio.
Wherein,
determining the time delay comprises: using the ith network model test for each input of an image in a single frameTest time T of1iAnd the total number of frames N of the input image1iThen the time delay Metric is determined1Comprises the following steps:
Figure BDA0002359457100000071
determining the throughput comprises: when inputting images in batches of B2i = 2^j frames (j = 0, 1, 2, 3, …, 7), recording the test time T2ij used when testing with the ith network model and the number N2ij of inputs, and determining the throughput of the ith network model at batch size 2^j as Throughput2ij = N2ij × B2ij / T2ij; among the batch modes satisfying T2ij/N2ij < 7 s, the throughput with the largest value is taken as the throughput Throughput2i determined when testing with the ith network model (for a network model with 8 batch input modes, one throughput is obtained per mode; only modes with T2ij/N2ij < 7 s are considered, and the maximum throughput under that constraint is selected); the throughput Metric2 of the accelerator under test is then determined as:
Metric2 = (1/Num) · Σ_i Throughput2i;
determining the loss of precision value comprises: obtaining the test accuracy Accuracy_i^real of the ith network model when testing with it; the precision loss Metric3 is then determined as:
Metric3 = (1/Num) · Σ_i (Accuracy_i − Accuracy_i^real) / Accuracy_i;
wherein Accuracy_i is the preset accuracy of the ith network model;
determining the power consumption comprises: acquiring the energy Wi consumed during the test with the ith network model and the total frame number N4i of the input images; the power consumption Metric4 is then determined as:
Metric4 = (1/Num) · Σ_i (Wi / N4i);
the unit of Wi is the watt-second;
determining the computational-power-to-energy ratio comprises: determining the computational-power-to-energy ratio Metric5 as the ratio of the computing rate to the average power, averaged over the models:
Metric5 = (1/Num) · Σ_i (MACi × N4i / T4i) / (Wi / T4i);
wherein T4i is the test time when testing with the ith network model, N4i is the total frame number of the input images, MACi is the accumulated multiply-add operation count of the ith network model, and Num is the number of network models.
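The five indices above can be aggregated from raw measurements. The sketch below assumes a plain average over the Num network models and per-frame accounting; the field names and all values are illustrative, not taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class ModelRun:
    """Per-model measurements collected by the benchmark harness (illustrative)."""
    t_single: float         # total test time in single-frame mode, seconds
    n_single: int           # frames processed in single-frame mode
    best_throughput: float  # best frames/s among batch sizes meeting the 7 s rule
    acc_real: float         # accuracy measured on the accelerator under test
    acc_ref: float          # preset (reference) accuracy of the model
    energy: float           # energy consumed during the test, watt-seconds
    n_frames: int           # frames processed during the energy test
    macs: float             # multiply-add operations per frame

def benchmark_indices(runs):
    """Aggregate the five benchmark indices by averaging over the model set."""
    num = len(runs)
    latency = sum(r.t_single / r.n_single for r in runs) / num          # s/frame
    throughput = sum(r.best_throughput for r in runs) / num             # frames/s
    precision_loss = sum((r.acc_ref - r.acc_real) / r.acc_ref for r in runs) / num
    power = sum(r.energy / r.n_frames for r in runs) / num              # J/frame
    ops_per_joule = sum(r.macs * r.n_frames / r.energy for r in runs) / num
    return latency, throughput, precision_loss, power, ops_per_joule

run = ModelRun(t_single=50.0, n_single=1000, best_throughput=120.0,
               acc_real=0.700, acc_ref=0.709, energy=200.0,
               n_frames=1000, macs=569e6)
lat, thr, loss, pw, ratio = benchmark_indices([run])
print(lat)  # 0.05
```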
The five test indices can be displayed in an index-system diagram, which gives a clear and visual picture of each index for the processors under test.
Referring to fig. 4, fig. 4 is a schematic diagram of the benchmark test index system in the embodiment of the present application. In fig. 4, the center is the origin, and the farther an index value lies from the center, the stronger the corresponding capability of the processor under test. The overall performance of a processor under test can also be evaluated comprehensively: the larger the area enclosed by connecting the five index values in sequence, the better its performance. In fig. 4, since the area for processor 1 (enclosed by the dashed line) is larger than the area for processor 2 (enclosed by the solid line), the performance of processor 1 is better than that of processor 2.
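The area comparison described for fig. 4 can be computed directly: for five values plotted on equally spaced radar axes, the enclosed polygon is a sum of triangles. A minimal sketch, with illustrative index values already normalized so that larger is better:

```python
import math

def radar_area(values):
    """Area of the polygon obtained by plotting index values on equally
    spaced radar-chart axes and connecting them in sequence."""
    n = len(values)
    wedge = math.sin(2.0 * math.pi / n)  # sine of the angle between axes
    return sum(0.5 * values[i] * values[(i + 1) % n] * wedge
               for i in range(n))

processor1 = [0.9, 0.8, 0.7, 0.9, 0.8]  # dashed pentagon in fig. 4
processor2 = [0.6, 0.7, 0.5, 0.6, 0.7]  # solid pentagon in fig. 4
print(radar_area(processor1) > radar_area(processor2))  # True
```

Indices where smaller is better, such as time delay and power consumption, would first be inverted or rescaled before plotting.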
The embodiment of the application is based on benchmark testing, an objective evaluation method that plays an important role in the evolution of computer architectures, and provides an end-to-end deep learning processor evaluation method targeted at the particularities of deep learning processors in architecture design and implementation. Model similarity is analyzed through key model information and a threshold is set to determine the model pool, so that the tested models cover operator-level evaluation (micro test) while reflecting processor performance in specified application scenarios (macro test); the five classes of indices output by the benchmark tool objectively and comprehensively reflect the level of a deep learning processor/accelerator/IP. The method provided by the embodiment of the application for evaluating the performance of a deep learning processor as actually deployed offers valuable ideas and suggestions for building an AI chip benchmark test system and evaluation platform, and has broad market prospects and application value.
Based on the same inventive concept, the embodiment of the application also provides a deep learning accelerator benchmark testing device. Referring to fig. 5, fig. 5 is a schematic structural diagram of an apparatus applied to the above technology in the embodiment of the present application. The device comprises: a selection unit 501, a processing unit 502, a test unit 503, an acquisition unit 504, and a determination unit 505;
a selecting unit 501, configured to select a network model for testing according to a deployment position and an application scenario of an accelerator to be tested;
a processing unit 502, configured to obtain test data and perform preprocessing on the test data;
the test unit 503 is configured to configure a benchmark test environment component, load the network model selected by the selection unit 501, and run test data preprocessed by the processing unit 502 on the accelerator to be tested;
an obtaining unit 504, configured to obtain a test result;
a determining unit 505, configured to determine a test index of the accelerator under test according to the test result obtained by the obtaining unit 504.
Preferably,
the selection unit 501 is specifically configured to select a reference network model according to the deployment position and the application scenario of the accelerator to be tested; if the network model is added, the principle of model difference degree is satisfied between the newly added network model and the selected network model.
Preferably,
the processing unit 502 is specifically configured to perform scaling processing and normalization processing on the test data, so that the processed test data meets the requirements of the selected network model.
The test unit 503 is specifically configured to determine a mode of invoking the accelerator to be tested according to the model of the accelerator to be tested; and loading the network model, and calling the accelerator to be tested by the preprocessed test data in a determined mode.
Wherein the mode is:
invocation through a deep learning inference framework;
or, invocation through the accelerator's own acceleration stack;
or, invocation through Android NN.
Preferably, the test index comprises one or any combination of the following:
time delay, throughput, loss of precision value, power consumption, computational power energy consumption ratio.
The units of the above embodiments may be integrated into one body, or may be separately deployed; may be combined into one unit or further divided into a plurality of sub-units.
In another embodiment, an electronic device is also provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the deep learning accelerator benchmarking method when executing the program.
In another embodiment, a computer readable storage medium is also provided having stored thereon computer instructions that, when executed by a processor, may implement the steps in the deep learning accelerator benchmarking method.
Fig. 6 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 6, the electronic device may include: a Processor (Processor)610, a communication Interface (Communications Interface)620, a Memory (Memory)630 and a communication bus 640, wherein the Processor 610, the communication Interface 620 and the Memory 630 communicate with each other via the communication bus 640. The processor 610 may call logic instructions in the memory 630 to perform the following method:
selecting a network model for testing according to the deployment position and the application scene of the accelerator to be tested;
acquiring test data and preprocessing the test data;
configuring a benchmark test environment component, loading the network model, and running the preprocessed test data on the accelerator to be tested;
and acquiring a test result, and determining a test index of the tested accelerator.
In addition, the logic instructions in the memory 630 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A deep learning accelerator benchmarking method, the method comprising:
selecting a network model for testing according to the deployment position and the application scene of the accelerator to be tested;
acquiring test data and preprocessing the test data;
configuring a benchmark test environment component, loading the network model, and running the preprocessed test data on the accelerator to be tested;
and acquiring a test result, and determining a test index of the tested accelerator.
2. The method of claim 1, wherein selecting the network model for testing according to the deployment location and application scenario of the accelerator under test comprises:
selecting a network model according to the deployment position and the application scene of the accelerator to be tested;
if the network model is added, the principle of model difference degree is satisfied between the newly added network model and the selected network model.
3. The method of claim 1, wherein the preprocessing the test data comprises:
and carrying out scaling processing and standardization processing on the test data so that the processed test data meets the requirements of the selected network model.
4. The method of claim 1, wherein the loading the network model and the running of the preprocessed test data on the accelerator under test comprises:
determining a mode for calling the accelerator to be tested according to the model of the accelerator to be tested;
and loading the network model, and calling the accelerator to be tested by the preprocessed test data in a determined mode.
5. The method of claim 4, wherein the mode is:
calling through a deep learning inference framework;
or, calling through the accelerator's own architecture;
or, calling through Android NN.
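Claims 4 and 5 together describe a dispatch from the accelerator model to one of three invocation modes. A minimal sketch follows; the lookup table entries (`"npu-x1"`, `"gpu-a"`) are purely hypothetical device names, and no real framework API is invoked.

```python
from enum import Enum, auto

class CallMode(Enum):
    """The three invocation modes enumerated in claim 5."""
    INFERENCE_FRAMEWORK = auto()   # via a deep learning inference framework
    VENDOR_ARCHITECTURE = auto()   # via the accelerator's own architecture
    ANDROID_NN = auto()            # via Android NN

def choose_call_mode(accelerator_model):
    """Claim 4: determine the call mode from the accelerator model.

    The lookup table below is hypothetical; real entries would come
    from the vendor's documentation.
    """
    table = {
        "npu-x1": CallMode.ANDROID_NN,
        "gpu-a": CallMode.INFERENCE_FRAMEWORK,
    }
    return table.get(accelerator_model, CallMode.VENDOR_ARCHITECTURE)
```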
6. The method of claim 1, wherein the test indicator comprises one or any combination of the following:
time delay, throughput, precision loss value, power consumption, and computing-power-to-energy ratio.
7. The method of claim 6, wherein:
determining the time delay comprises: recording, for each test in which images are input one frame at a time, the test time T1i used when testing with the ith network model and the total number of input image frames N1i, and determining the time delay Metric1 as:

Metric1 = (1/Num) · Σ(i=1..Num) T1i / N1i

determining the throughput comprises: recording, for each test in which images are input B2ij = 2^j frames at a time, the test time T2ij used when testing with the ith network model and the total number of input image frames N2ij; determining the throughput of the ith network model at 2^j frames per input as Throughput2ij = N2ij × B2ij / T2ij; taking, among the throughputs satisfying T2ij / N2ij < 7 s, the largest value as the throughput Throughput2i determined when testing with the ith network model; and determining the throughput Metric2 of the accelerator under test as:

Metric2 = (1/Num) · Σ(i=1..Num) Throughput2i

wherein j is an integer not less than 0 and not more than 7;

determining the precision loss value comprises: acquiring, when testing with the ith network model, the test accuracy Accuracyi_real of the ith network model, and determining the precision loss value Metric3 as:

Metric3 = (1/Num) · Σ(i=1..Num) (Accuracyi − Accuracyi_real) / Accuracyi

wherein Accuracyi is the preset accuracy of the ith network model;

determining the power consumption comprises: acquiring the energy Wi consumed during the test with the ith network model and the total number of input image frames N4i, and determining the power consumption Metric4 as:

Metric4 = (1/Num) · Σ(i=1..Num) Wi / N4i

determining the computing-power-to-energy ratio comprises: determining the computing-power-to-energy ratio Metric5 as:

Metric5 = (1/Num) · Σ(i=1..Num) (N4i / T4i) / (Wi / T4i)

wherein T4i is the test time used when testing with the ith network model and Num is the number of network models.
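The five indicators of claim 7 can be sketched as one aggregation function. The formula images in the published text are unavailable, so the arithmetic mean over the Num models and the record layout (`t1`, `n1`, `throughputs`, `acc`, `w`, `n4`, `t4`) are assumptions made for illustration.

```python
# Hedged sketch of claim 7's five indicators, assuming each indicator
# is the arithmetic mean over the Num tested network models.

def metrics(records, preset_acc):
    num = len(records)
    # Metric1: mean per-frame latency in single-frame mode.
    m1 = sum(r["t1"] / r["n1"] for r in records) / num
    # Metric2: per model, best throughput among the batch runs
    # (batch, time, frames) whose per-frame latency stays below 7 s.
    m2 = sum(
        max(n * b / t for (b, t, n) in r["throughputs"] if t / n < 7.0)
        for r in records
    ) / num
    # Metric3: mean relative loss of test accuracy vs. preset accuracy.
    m3 = sum((p - r["acc"]) / p for r, p in zip(records, preset_acc)) / num
    # Metric4: mean energy consumed per input frame.
    m4 = sum(r["w"] / r["n4"] for r in records) / num
    # Metric5: frame rate divided by average power draw, i.e. frames
    # processed per unit of energy.
    m5 = sum((r["n4"] / r["t4"]) / (r["w"] / r["t4"]) for r in records) / num
    return m1, m2, m3, m4, m5

# One-model example: 100 frames in 10 s, 50 J consumed, test accuracy
# 0.70 against a preset accuracy of 0.75.
record = {"t1": 10.0, "n1": 100, "acc": 0.70, "w": 50.0, "n4": 100,
          "t4": 10.0, "throughputs": [(1, 10.0, 100), (2, 10.0, 100)]}
m1, m2, m3, m4, m5 = metrics([record], [0.75])
```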
8. A deep learning accelerator benchmark test apparatus, the apparatus comprising: a selection unit, a processing unit, a test unit, an acquisition unit, and a determination unit;
the selection unit is configured to select a network model for testing according to the deployment location and application scenario of the accelerator under test;
the processing unit is configured to acquire test data and preprocess the test data;
the test unit is configured to configure the benchmark test environment components, load the network model selected by the selection unit, and run the test data preprocessed by the processing unit on the accelerator under test;
the acquisition unit is configured to acquire a test result;
the determination unit is configured to determine the test indicators of the accelerator under test according to the test result acquired by the acquisition unit.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-7 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 7.
CN202010017521.3A 2020-01-08 2020-01-08 Deep learning accelerator benchmark test method and device Active CN111242314B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010017521.3A CN111242314B (en) 2020-01-08 2020-01-08 Deep learning accelerator benchmark test method and device


Publications (2)

Publication Number Publication Date
CN111242314A true CN111242314A (en) 2020-06-05
CN111242314B CN111242314B (en) 2023-03-21

Family

ID=70880216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010017521.3A Active CN111242314B (en) 2020-01-08 2020-01-08 Deep learning accelerator benchmark test method and device

Country Status (1)

Country Link
CN (1) CN111242314B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112581366A (en) * 2020-11-30 2021-03-30 黑龙江大学 Portable image super-resolution system and system construction method
WO2022012046A1 (en) * 2020-07-17 2022-01-20 苏州浪潮智能科技有限公司 Method and apparatus for selecting optimization direction of benchmark test in deep learning
CN117494759A (en) * 2023-11-24 2024-02-02 深圳市蓝鲸智联科技股份有限公司 Micro hardware machine learning method and system

Citations (5)

Publication number Priority date Publication date Assignee Title
WO2017157203A1 (en) * 2016-03-18 2017-09-21 阿里巴巴集团控股有限公司 Reference test method and device for supervised learning algorithm in distributed environment
CN109376041A (en) * 2018-09-19 2019-02-22 广州优亿信息科技有限公司 A kind of Benchmark test system and its workflow for AI chip for cell phone
CN109918281A (en) * 2019-03-12 2019-06-21 中国人民解放军国防科技大学 Multi-bandwidth target accelerator efficiency testing method
CN110096401A (en) * 2019-05-13 2019-08-06 苏州浪潮智能科技有限公司 A kind of server data process performance test method and device
CN110515811A (en) * 2019-08-09 2019-11-29 中国信息通信研究院 Terminal artificial intelligence performance benchmark test method and device


Non-Patent Citations (2)

Title
ZHANG Weimin: "Current Status and Development Trends of Deep Neural Network Hardware Benchmarking", Information and Communications Technology and Policy *
YANG Xuyu et al.: "Research on Deep Learning Acceleration Technology", Computer Systems & Applications *


Also Published As

Publication number Publication date
CN111242314B (en) 2023-03-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant