WO2022246705A1 - Method, device and computer storage medium for deep learning model testing - Google Patents

Method, device and computer storage medium for deep learning model testing Download PDF

Info

Publication number
WO2022246705A1
WO2022246705A1 · PCT/CN2021/096132
Authority
WO
WIPO (PCT)
Prior art keywords
deep learning
learning model
compiler
acceleration
library
Prior art date
Application number
PCT/CN2021/096132
Other languages
English (en)
French (fr)
Inventor
胡鹏 (HU Peng)
Original Assignee
京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Priority to CN202180001277.4A (published as CN115701302A)
Priority to PCT/CN2021/096132
Publication of WO2022246705A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology

Definitions

  • The present disclosure relates to the technical field of automated testing, and in particular to a method, device and computer storage medium for deep learning model testing.
  • A method for deep learning model testing provided by an embodiment of the present disclosure is applied to an edge device and includes: obtaining a deep learning model to be deployed; obtaining an acceleration instruction specified by a user, and accelerating the deep learning model according to the acceleration method corresponding to the acceleration instruction, so as to increase the inference speed of the deep learning model; after the acceleration is completed, obtaining test samples corresponding to the deep learning model; and testing the deep learning model by using the test samples.
  • If the acceleration method corresponding to the acceleration instruction includes multiple acceleration methods, an acceleration method that satisfies a preset performance index is selected from the multiple acceleration methods according to the system type of the edge device currently used for testing the deep learning model and the hardware performance of that edge device.
  • Before the test samples are used to test the deep learning model, the method further includes: determining a compiler according to the system type of the edge device currently used for testing the deep learning model; and compiling the algorithm code corresponding to the deep learning model with the compiler and packaging it into a library.
  • The type of the packaged library is determined as follows: if the compiler is one of gcc, g++ and a cross compiler, the packaged library is a .so library; if the compiler is a Windows compiler, the packaged library is a .dll library.
  • The compiler is determined in one or more of the following ways: if the current test uses a Linux, ARM-Linux or Android system, the compiler is one of gcc, g++ and a cross compiler; if the current test uses a Windows system, the compiler is a Windows compiler.
  • After the algorithm code corresponding to the deep learning model is compiled with the compiler and packaged into a library, the method further includes: encapsulating at least one preset function library into the library, the preset function library being used to implement one or more of an authentication function, an encryption function and a network function.
  • After the test samples are used to test the deep learning model, the method further includes: generating a test report according to the test data obtained from the test.
  • The acceleration method includes one or more of the following: the Mobile Neural Network (MNN) inference engine; the TNN inference framework; the Tengine-Lite neural network inference engine.
  • A device for deep learning model testing includes a processor and a memory, the memory being used to store a program executable by the processor, and the processor being used to read the program in the memory and perform the following steps: obtaining a deep learning model to be deployed; obtaining an acceleration instruction specified by a user, and accelerating the deep learning model according to the acceleration method corresponding to the acceleration instruction, so as to increase the inference speed of the deep learning model; after the acceleration is completed, obtaining test samples corresponding to the deep learning model; and testing the deep learning model by using the test samples.
  • Before the deep learning model is accelerated, the processor is further configured to execute: if the acceleration method corresponding to the acceleration instruction includes multiple acceleration methods, selecting from the multiple acceleration methods an acceleration method that satisfies a preset performance index, according to the system type of the edge device currently used for testing the deep learning model and the hardware performance of that edge device.
  • Before the test samples are used to test the deep learning model, the processor is further configured to execute: determining a compiler according to the system type of the edge device currently used for testing the deep learning model; and compiling the algorithm code corresponding to the deep learning model with the compiler and packaging it into a library.
  • The processor is specifically configured to determine the type of the packaged library as follows: if the compiler is one of gcc, g++ and a cross compiler, the packaged library is a .so library; if the compiler is a Windows compiler, the packaged library is a .dll library.
  • The processor is specifically configured to determine the compiler in one or more of the following ways: if the current test uses a Linux, ARM-Linux or Android system, the compiler is one of gcc, g++ and a cross compiler; if the current test uses a Windows system, the compiler is a Windows compiler.
  • After the algorithm code corresponding to the deep learning model is compiled with the compiler and packaged into a library, the processor is further configured to execute: encapsulating at least one preset function library into the library, the preset function library being used to implement one or more of an authentication function, an encryption function and a network function.
  • After the test samples are used to test the deep learning model, the processor is further configured to execute: generating a test report according to the test data obtained from the test.
  • The acceleration method includes one or more of the following: the Mobile Neural Network (MNN) inference engine; the TNN inference framework; the Tengine-Lite neural network inference engine.
  • An embodiment of the present disclosure also provides an apparatus for deep learning model testing, including: a model obtaining unit, configured to obtain a deep learning model to be deployed; a model acceleration unit, configured to obtain an acceleration instruction specified by a user and accelerate the deep learning model according to the acceleration method corresponding to the acceleration instruction, so as to increase the inference speed of the deep learning model; a sample obtaining unit, configured to obtain, after the acceleration is completed, test samples corresponding to the deep learning model; and a model testing unit, configured to test the deep learning model by using the test samples.
  • Before the deep learning model is accelerated, the model acceleration unit is further used to: if the acceleration method corresponding to the acceleration instruction includes multiple acceleration methods, select from the multiple acceleration methods an acceleration method that satisfies a preset performance index, according to the system type of the edge device currently used for testing the deep learning model and the hardware performance of that edge device.
  • Before the test samples are used to test the deep learning model, a compiling unit is specifically used to: determine a compiler according to the system type of the edge device currently used for testing the deep learning model; and compile the algorithm code corresponding to the deep learning model with the compiler and package it into a library.
  • The compiling unit is used to determine the type of the packaged library as follows: if the compiler is one of gcc, g++ and a cross compiler, the packaged library is a .so library; if the compiler is a Windows compiler, the packaged library is a .dll library.
  • The compiling unit is used to determine the compiler in one or more of the following ways: if the current test uses a Linux, ARM-Linux or Android system, the compiler is one of gcc, g++ and a cross compiler; if the current test uses a Windows system, the compiler is a Windows compiler.
  • After the algorithm code corresponding to the deep learning model is compiled with the compiler and packaged into a library, the compiling unit is further used to: encapsulate at least one preset function library into the library, the preset function library being used to implement one or more of an authentication function, an encryption function and a network function.
  • After the test samples are used to test the deep learning model, the model testing unit is further used to: generate a test report according to the test data obtained from the test.
  • The acceleration method includes one or more of the following: the Mobile Neural Network (MNN) inference engine; the TNN inference framework; the Tengine-Lite neural network inference engine.
  • An embodiment of the present disclosure further provides a computer storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the method described in the above first aspect are implemented.
  • FIG. 1 is a flowchart of a method for deep learning model testing provided by an embodiment of the present disclosure;
  • FIG. 2 is a flowchart of an automated test provided by an embodiment of the present disclosure;
  • FIG. 3A is a schematic diagram of a configuration for enabling the authentication function provided by an embodiment of the present disclosure;
  • FIG. 3B is a schematic diagram of a configuration for enabling the authentication function provided by an embodiment of the present disclosure;
  • FIG. 3C is a schematic diagram of a configuration for enabling the authentication function provided by an embodiment of the present disclosure;
  • FIG. 4 is a flowchart of an automated test provided by an embodiment of the present disclosure;
  • FIG. 5 is a flowchart of a complete automated test provided by an embodiment of the present disclosure;
  • FIG. 6 is a schematic diagram of a device for deep learning model testing provided by an embodiment of the present disclosure;
  • FIG. 7 is a schematic diagram of an apparatus for deep learning model testing provided by an embodiment of the present disclosure.
  • With the wide application of deep learning models, a deep learning model must be manually accelerated and tested before it is deployed on an edge device, which consumes substantial manpower and is costly. This embodiment provides a method for automatically testing a deep learning model, usable online or offline, which improves testing efficiency and saves labor costs. It should be noted that because the computation of a deep learning model is complex, the model cannot be directly deployed on an edge device; it therefore needs to be accelerated to reduce parameter redundancy, storage occupation and computational complexity.
  • In some embodiments, the method for deep learning model testing provided by this embodiment is applied to offline devices such as edge devices, where the edge devices include but are not limited to computing workstations, PC terminals, chip boards, etc., and the operating systems of the edge devices include but are not limited to Windows, Linux, Android, etc.
  • The core idea of the deep learning model testing method provided by the embodiments of the present disclosure is to establish an automated process on the edge device, running from obtaining the deep learning model and accelerating it to obtaining test samples for testing after the acceleration, thereby implementing one-click accelerated testing and improving the deployment efficiency of deep learning models.
  • Because the algorithms of deep learning models are complex, testing a deep learning model requires a device equipped with a processor with large data processing capability; generally a cloud server can be used to test the deep learning model, and such tests cover only the testing process of the deep learning algorithm itself.
  • Since the testing process in this embodiment is applicable to all kinds of deep learning models, it is a standardized and automated deep learning model testing process that can meet the testing requirements of various deep learning models; moreover, the tests can be run on edge devices, which improves the efficiency of deploying deep learning models on edge devices.
  • As shown in FIG. 1, the implementation flow of the testing method provided by this embodiment is as follows:
  • Step 100: obtain a deep learning model to be deployed;
  • In some embodiments, the deep learning models to be deployed can be stored in a model warehouse, where the model warehouse is used to store various deep learning models. In implementation, the storage forms include but are not limited to deep learning model code, image code, and the like.
  • In some embodiments, the deep learning model to be deployed can be obtained from a local server or a cloud server.
  • Step 101: obtain an acceleration instruction specified by the user, and accelerate the deep learning model according to the acceleration method corresponding to the acceleration instruction, so as to increase the inference speed of the deep learning model;
  • The correspondence between the acceleration instructions and acceleration methods obtained in this embodiment includes, but is not limited to, some or all of the following relationships: one-to-one, one-to-many, and many-to-many.
  • One-to-one means that one acceleration instruction is obtained and corresponds to one acceleration method; in implementation, the deep learning model is accelerated according to the acceleration method corresponding to that instruction. One-to-many means that one acceleration instruction is obtained and corresponds to multiple acceleration methods; in implementation, the deep learning model is accelerated simultaneously or in stages according to the multiple acceleration methods corresponding to that instruction. Many-to-many means that multiple acceleration instructions are obtained, each corresponding to one acceleration method; in implementation, the deep learning model is accelerated simultaneously or in stages according to the acceleration method corresponding to each instruction.
  • In some embodiments, if the user-specified acceleration instruction corresponds to a single acceleration method, the deep learning model is accelerated according to the acceleration method specified by the user.
  • If the acceleration methods corresponding to the user-specified acceleration instruction include multiple methods, an acceleration method that satisfies a preset performance index is selected from them according to the system type and hardware performance of the edge device currently used for testing the deep learning model. The preset performance index includes but is not limited to the best performance and/or the fastest running speed.
  • In implementation, the system type includes but is not limited to Windows, Linux and Android; the hardware performance can be determined according to the CPU performance, memory performance and memory size of the edge device.
  • In implementation, an acceleration method that satisfies a preset performance index is selected in one or more of the following ways (a selection sketch is given after this list):
  • Method 1: while the edge device accelerates the deep learning model, select the acceleration method that runs the most lines of the deep learning model's code per unit time, i.e. the acceleration method with the fastest running speed;
  • Method 2: while the edge device accelerates the deep learning model, select the acceleration method with the lowest CPU usage on the edge device, i.e. the acceleration method with the best performance;
  • Method 3: while the edge device accelerates the deep learning model, select the acceleration method that runs the most lines of code per unit time with the lowest CPU usage on the edge device, i.e. the acceleration method that is both the fastest and the lightest on the CPU;
  • Method 4: while the edge device accelerates the deep learning model, use the weights corresponding to performance and running speed to compute a weighted sum of the CPU usage and the running time measured during acceleration, and select the acceleration method with the smallest sum.
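  • As an illustration of Method 4, the weighted selection could be sketched in Python as follows; the candidate registry, the psutil-based CPU sampling, the weights and the use of per-run latency as the speed measure are assumptions for illustration, not the disclosed implementation:

```python
import time

import psutil  # assumed dependency for CPU sampling


def select_acceleration_method(candidates, w_cpu=0.5, w_time=0.5, runs=10):
    """Pick the candidate with the smallest weighted sum of CPU usage
    (percent) and mean per-run latency (milliseconds)."""
    best_name, best_score = None, float("inf")
    for name, run_once in candidates.items():
        psutil.cpu_percent(interval=None)            # reset the CPU counter
        start = time.perf_counter()
        for _ in range(runs):
            run_once()                               # one accelerated inference pass
        latency_ms = (time.perf_counter() - start) / runs * 1000
        cpu_pct = psutil.cpu_percent(interval=None)  # mean CPU % since reset
        score = w_cpu * cpu_pct + w_time * latency_ms
        if score < best_score:
            best_name, best_score = name, score
    return best_name
```

  • A caller would register one callable per acceleration backend (for example, wrappers around MNN, TNN and Tengine-Lite runs) and keep whichever name is returned.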
  • The acceleration methods provided in this embodiment include but are not limited to one or more of the following:
  • the Mobile Neural Network (MNN) inference engine;
  • the TNN inference framework;
  • the Tengine-Lite neural network inference engine.
  • Step 102: after the acceleration is completed, obtain test samples corresponding to the deep learning model;
  • In implementation, the test samples may be stored in a data warehouse, where the data warehouse is used to store test samples corresponding to deep learning models.
  • Step 103: test the deep learning model by using the test samples.
  • To make it easier to deploy the deep learning model on the edge device, the automated testing process provided by this embodiment completes the full automated flow of the deep learning model from acquisition through acceleration to testing, which improves the efficiency of pre-deployment preparation and saves labor costs.
  • Because the deep learning model is accelerated, the automated testing process provided by this embodiment effectively reduces the computational load of the deep learning model and improves processing speed; in particular, for an edge device at the local end, the testing process can still be carried out offline.
  • The acceleration methods involved in this embodiment include but are not limited to MNN, TNN and Tengine-Lite. Among these, the user can specify the acceleration method with the shortest running time and the lowest CPU usage.
  • The three acceleration methods involved in this embodiment are described as follows:
  • MNN is a lightweight deep neural network inference engine whose core purpose is to run inference for deep neural network models on the device side, covering the optimization, conversion and inference of deep neural network models.
  • MNN can be divided into two parts: the Converter and the Interpreter.
  • The Converter consists of Frontends and Graph Optimize. Frontends is responsible for supporting different training frameworks; MNN currently supports TensorFlow (Lite), Caffe and ONNX.
  • Graph Optimize optimizes the graph through operator fusion, operator substitution and layout adjustment.
  • The Interpreter consists of the Engine and the Backends. The Engine is responsible for loading models and scheduling the computation graph; the Backends handle memory allocation on each computing device.
  • Within the Engine and Backends, MNN applies a variety of optimization schemes, including the Winograd algorithm for convolution and deconvolution, the Strassen algorithm for matrix multiplication, low-precision computation, handwritten assembly, multithreading optimization, memory reuse, heterogeneous computing, etc.
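  • For context on the matrix-multiplication optimization mentioned above, the following is a minimal sketch of one Strassen recursion level over 2x2 blocks, trading 8 block multiplications for 7; this is textbook material given for illustration, not MNN's actual kernel code:

```python
import numpy as np


def strassen_one_level(A, B):
    """One level of Strassen recursion on square matrices of even size:
    7 block multiplications instead of the naive 8."""
    n = A.shape[0] // 2
    A11, A12, A21, A22 = A[:n, :n], A[:n, n:], A[n:, :n], A[n:, n:]
    B11, B12, B21, B22 = B[:n, :n], B[:n, n:], B[n:, :n], B[n:, n:]
    M1 = (A11 + A22) @ (B11 + B22)
    M2 = (A21 + A22) @ B11
    M3 = A11 @ (B12 - B22)
    M4 = A22 @ (B21 - B11)
    M5 = (A11 + A12) @ B22
    M6 = (A21 - A11) @ (B11 + B12)
    M7 = (A12 - A22) @ (B21 + B22)
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6
    return np.block([[C11, C12], [C21, C22]])
```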
  • TNN is a high-performance, lightweight inference framework for mobile terminals, with many outstanding advantages such as cross-platform support, high performance, model compression and code tailoring.
  • TNN covers model conversion, low-precision optimization, operator compilation optimization, a computing engine and hardware architectures.
  • Model conversion is used for model parsing and conversion;
  • low-precision optimization is used for FP16 low-precision conversion and INT8 post-training quantization; operator compilation optimization includes operator tuning, layout optimization, computation-graph optimization, etc.;
  • the computing engine includes high-performance kernel implementations and efficient memory scheduling;
  • the hardware architectures include ARM, GPU, NPU, etc.
  • Tengine-Lite enables fast and efficient deployment of deep learning neural network models on embedded devices.
  • The characteristics of Tengine-Lite are: it depends only on the C library, has an independent model loading process, keeps a unified application interface with Tengine (the Web server project), supports the CMSIS-NN and HCL-M operator libraries, supports AI accelerators and heterogeneous computing, openly supports Caffe/TensorFlow/MXNet models, and provides model quantization training tools.
  • Its advantages are: being lightweight and easy to deploy, decoupling model deployment from model running code, a unified Cortex-A/M ecosystem, easy porting of MCU applications to an AP, support for customized operator development while improving performance, suitability for embedded AI platforms, and more freedom of choice for developers.
  • Before the test samples are used to test the deep learning model, this embodiment also provides a compiling method, with the following specific steps:
  • Step 1): determine the compiler according to the system type of the edge device used for the current test of the deep learning model;
  • In some embodiments, the compiler is determined in one or more of the following ways depending on the system: 11) if the current test uses a Linux system, the compiler is one of gcc, g++ and a cross compiler; 12) if the current test uses an ARM-Linux system, the compiler is one of the GNU Compiler Collection (gcc), the GNU C++ compiler (g++) and a cross compiler; 13) if the current test uses an Android system, the compiler is one of gcc, g++ and a cross compiler; 14) if the current test uses a Windows system, the compiler is a Windows compiler.
  • Step 2): compile the algorithm code corresponding to the deep learning model with the compiler, and package it into a library.
  • In some embodiments, for Linux, ARM-Linux and Android systems, the cross-platform installation (build) tool CMAKE is used to select one of gcc, g++ and a cross compiler, and the algorithm code is compiled and packaged into the form of a .so library; for Windows systems, CMAKE controls which compiler is executed by setting a macro switch that specifies whether the Windows compiler should run, and the algorithm code is compiled and packaged into the form of a .dll library. That is, this embodiment provides one or more library forms, and the type of the packaged library can be determined as follows (a build-helper sketch follows this list):
  • if the compiler is one of gcc, g++ and a cross compiler, the type of the packaged library is a .so library;
  • if the compiler is a Windows compiler, the type of the packaged library is a .dll library.
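  • A build-helper sketch following these rules is given below; the toolchain file names, the ENABLE_MSVC macro switch and the directory layout are assumptions for illustration, not the disclosed build scripts:

```python
import subprocess

# Hypothetical toolchain selection per system type; a CMakeLists.txt is
# assumed to honor the ENABLE_MSVC macro switch on Windows.
TOOLCHAINS = {
    "linux":     ["-DCMAKE_C_COMPILER=gcc", "-DCMAKE_CXX_COMPILER=g++"],
    "arm-linux": ["-DCMAKE_TOOLCHAIN_FILE=arm-linux-gnueabihf.cmake"],  # cross compiler
    "android":   ["-DCMAKE_TOOLCHAIN_FILE=android.toolchain.cmake"],
    "windows":   ["-DENABLE_MSVC=ON"],  # macro switch for the Windows compiler
}


def build_library(system_type: str, src_dir: str = ".", build_dir: str = "build"):
    """Configure and build the algorithm code; gcc/g++/cross toolchains
    yield a .so library, the Windows compiler yields a .dll library."""
    args = ["cmake", "-S", src_dir, "-B", build_dir] + TOOLCHAINS[system_type]
    subprocess.run(args, check=True)
    subprocess.run(["cmake", "--build", build_dir], check=True)
    return "dll" if system_type == "windows" else "so"
```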
  • This embodiment provides an automated process for accelerating, compiling and testing a deep learning model, realizing one-click compilation and one-click packaging, which speeds up the deployment of the deep learning model.
  • In some embodiments, the entire automated process of this embodiment includes but is not limited to one or more of the following middleware:
  • a model warehouse, used to store the deep learning models to be deployed;
  • a code warehouse, used to store the algorithm code corresponding to the deep learning models to be deployed;
  • a data warehouse, used to store the test samples, test data, test reports, etc. corresponding to the deep learning models to be deployed;
  • a docker image of the compilation platform, used to compile and package the deep learning model.
  • As shown in FIG. 2, an embodiment of the present disclosure provides an automated testing process whose specific implementation steps are as follows (an end-to-end sketch is given after this list):
  • Step 200: obtain the deep learning model to be deployed, and store the deep learning model in the model warehouse;
  • Step 201: obtain an acceleration instruction specified by the user, and select from the acceleration library the acceleration method corresponding to the acceleration instruction that has the shortest running time and the least memory usage;
  • Step 202: accelerate the deep learning model with the selected acceleration method;
  • Step 203: determine that the acceleration is completed;
  • Step 204: determine the compiler according to the system type used in the current test;
  • Step 205: compile the algorithm code corresponding to the deep learning model with the compiler, and package it into a library;
  • Step 206: obtain test samples corresponding to the deep learning model from the database;
  • Step 207: test the deep learning model with the test samples.
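  • An end-to-end sketch of steps 200 through 207 is given below; the in-memory warehouses and the callables standing in for acceleration, compilation and testing are illustrative placeholders, not the disclosed middleware:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TestPipeline:
    """Minimal stand-ins for the model/data warehouses and acceleration library."""
    model_warehouse: Dict[str, bytes] = field(default_factory=dict)
    data_warehouse: Dict[str, List[bytes]] = field(default_factory=dict)
    accel_library: Dict[str, Callable[[bytes], bytes]] = field(default_factory=dict)

    def run(self, name: str, model: bytes, accel_instruction: str,
            compile_fn: Callable[[bytes], bytes],
            test_fn: Callable[[bytes, List[bytes]], dict]) -> dict:
        self.model_warehouse[name] = model                  # step 200
        accelerate = self.accel_library[accel_instruction]  # step 201
        accelerated = accelerate(model)                     # steps 202-203
        library = compile_fn(accelerated)                   # steps 204-205
        samples = self.data_warehouse.get(name, [])         # step 206
        return test_fn(library, samples)                    # step 207
```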
  • In some embodiments, so that the deep learning model offers certain functions to users after being deployed on the edge device, some function libraries can also be encapsulated into the algorithm code of the deep learning model after compilation by using a compilation macro, so that the functions of those libraries can be used once the deep learning model is deployed on the edge device.
  • The specific implementation is as follows (a flag-gating sketch is given below):
  • At least one preset function library is encapsulated into the deep learning model, where the preset function library is used to implement one or more of an authentication function, an encryption function and a network function.
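  • As a sketch of gating the preset function libraries with compilation macros, a build helper might pass macro definitions as follows; the macro names and the g++ invocation are assumptions for illustration, not the disclosed code:

```python
import subprocess

# Hypothetical feature flags mapped to compilation macros.
FEATURE_MACROS = {
    "auth":    "-DENABLE_AUTH",
    "crypto":  "-DENABLE_CRYPTO",
    "network": "-DENABLE_NETWORK",
}


def compile_with_features(src: str, out: str, features: list):
    """Bake the selected preset function libraries into the packaged library
    by passing compilation macros; the network function implies the
    encryption function (see the note on the network function below)."""
    if "network" in features and "crypto" not in features:
        features = features + ["crypto"]
    macros = [FEATURE_MACROS[f] for f in features]
    subprocess.run(["g++", "-shared", "-fPIC", src, "-o", out, *macros],
                   check=True)
```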
  • The functions that each function library can implement are described as follows:
  • Authentication function: authorized activation is adopted, based on the hardware fingerprint of the device (an edge device or a cloud device), which is read by a fingerprint tool and is unique to the device. Each license applied for has a trial validity of 3 months from the application date, and permanent validity can be applied for after a formal purchase.
  • Taking the Linux platform as an example, enabling the authentication function requires the configuration shown in FIG. 3A, FIG. 3B and FIG. 3C.
  • After the configuration of FIG. 3A, in response to a click on the license application button, the interface shown in FIG. 3C is displayed; on this interface, the license can be downloaded in response to the selection of the running platform and a click on the download button.
  • Encryption function: Advanced Encryption Standard (AES) encryption is adopted to protect the algorithm model and the security of network data transmission.
  • Network function: the HTTP POST request method is used, and the data is encrypted and transmitted in the form of JSON messages, so the network function must be enabled together with the encryption function (a combined sketch is given after this list).
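  • As an illustration of the combined network and encryption functions, the following is a minimal sketch of AES-encrypted JSON sent via HTTP POST, assuming the pycryptodome and requests packages; the endpoint, key handling and message layout are illustrative, not the disclosed protocol:

```python
import json

import requests                      # assumed HTTP client
from Crypto.Cipher import AES        # assumed: pycryptodome
from Crypto.Random import get_random_bytes


def post_encrypted(url: str, payload: dict, key: bytes) -> requests.Response:
    """Serialize a payload as JSON, encrypt it with AES-GCM, and POST it."""
    plaintext = json.dumps(payload).encode("utf-8")
    nonce = get_random_bytes(12)
    cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
    ciphertext, tag = cipher.encrypt_and_digest(plaintext)
    body = {"nonce": nonce.hex(), "tag": tag.hex(), "data": ciphertext.hex()}
    return requests.post(url, json=body, timeout=10)
```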
  • An embodiment of the present disclosure also provides an automated testing process that encapsulates the user-specified function libraries into the compiled and packaged library of the deep learning model, so as to implement the authentication, encryption and network functions, etc. for the deep learning model.
  • As shown in FIG. 4, the specific implementation of the process is as follows:
  • Step 400: obtain the deep learning model to be deployed, and store the deep learning model in the model warehouse;
  • Step 401: obtain an acceleration instruction specified by the user, and select from the acceleration library the acceleration method corresponding to the acceleration instruction that has the shortest running time and the lowest CPU usage;
  • Step 402: accelerate the deep learning model with the selected acceleration method;
  • Step 403: determine that the acceleration is completed;
  • Step 404: determine the compiler according to the system type used in the current test;
  • Step 405: compile the algorithm code corresponding to the deep learning model with the compiler, and package it into a library;
  • Step 406: package one or more of the authentication function library, the encryption function library and the network function library into the packaged library;
  • Step 407: obtain test samples corresponding to the deep learning model from the database;
  • Step 408: test the deep learning model with the test samples.
  • The deep learning model can be tested by one or more of the following devices: a server device; a cloud device; an edge device.
  • This embodiment can likewise accelerate the deep learning model by one or more of the following devices: a server device; a cloud device; an edge device.
  • This embodiment can likewise compile the deep learning model by one or more of the following devices: a server device; a cloud device; an edge device.
  • After the test samples are used to test the deep learning model, the method further includes: generating a test report according to the test data obtained from the test, so that technicians can conveniently review it and judge, based on the content of the test report, whether the deep learning model can be deployed on the edge device (a report-generation sketch is given below).
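  • A minimal report-generation sketch follows; the metric names and the deployability criterion are illustrative assumptions, not the patent's report format:

```python
import json
import time


def generate_report(model_name: str, metrics: dict, path: str = "report.json"):
    """Write the test data to a JSON report with a simple pass criterion."""
    report = {
        "model": model_name,
        "generated_at": time.strftime("%Y-%m-%d %H:%M:%S"),
        "metrics": metrics,  # e.g. accuracy, mean latency, peak CPU usage
        "deployable": (metrics.get("accuracy", 0.0) >= 0.9
                       and metrics.get("latency_ms", 1e9) <= 100),
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(report, f, indent=2)
    return report
```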
  • Based on the GitLab Runner function, this embodiment can associate the model warehouse, the code warehouse and the data warehouse to implement the automated acceleration, compilation and testing process for the deep learning model, so that the entire process is standardized, automated and modularized, which greatly shortens the algorithm development cycle.
  • As shown in FIG. 5, this embodiment also provides a complete automated testing process, applied to edge devices.
  • The specific implementation steps of the process are as follows:
  • Step 500: obtain the deep learning model to be deployed;
  • The deep learning model to be deployed can be obtained from a cloud server or a local server, which is not limited in this embodiment.
  • Step 501: store the deep learning model in the model warehouse;
  • The model warehouse is a model storage partition in the edge device used to store deep learning models.
  • Step 502: obtain an acceleration instruction specified by the user, and select from the acceleration library the acceleration method corresponding to the acceleration instruction that has the shortest running time and the least memory usage;
  • In implementation, the acceleration docker image corresponding to the acceleration method can be pulled automatically through the command line (code), and the deep learning model is accelerated by using the code corresponding to the acceleration method in the acceleration image (see the sketch below).
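  • A sketch of pulling and running an acceleration image from the command line is given below; the image names are hypothetical placeholders, not images named in the disclosure:

```python
import subprocess

# Hypothetical image names standing in for MNN/TNN/Tengine-Lite conversion images.
ACCEL_IMAGES = {"mnn": "example/mnn-convert:latest",
                "tnn": "example/tnn-convert:latest"}


def accelerate_in_docker(method: str, model_dir: str):
    """Pull the acceleration image and run it over a mounted model directory."""
    image = ACCEL_IMAGES[method]
    subprocess.run(["docker", "pull", image], check=True)
    subprocess.run(["docker", "run", "--rm",
                    "-v", f"{model_dir}:/workspace", image], check=True)
```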
  • The acceleration library stores multiple acceleration methods and is an acceleration storage partition in the edge device.
  • Step 503: accelerate the deep learning model with the selected acceleration method, and determine that the acceleration is completed;
  • Step 504: determine the compiler according to the system type used in the current test;
  • In implementation, the compiler can be determined according to business requirements or the system type.
  • Step 505: compile the algorithm code corresponding to the deep learning model with the compiler, and package it into a library;
  • Step 506: package one or more of the authentication function library, the encryption function library and the network function library into the packaged library;
  • Step 507: obtain test samples corresponding to the deep learning model from the database;
  • In implementation, the test samples are automatically pulled from the database.
  • Step 508: test the deep learning model with the test samples.
  • Step 509: generate a test report according to the test data obtained from the test.
  • Based on the same inventive concept, an embodiment of the present disclosure also provides a device for deep learning model testing. Since this device is the device used in the method of the embodiments of the present disclosure, and the principle by which the device solves the problem is similar to that of the method, the implementation of the device can refer to the implementation of the method, and repeated parts are not described again.
  • As shown in FIG. 6, the device includes a processor 600 and a memory 601, where the memory is used to store a program executable by the processor, and the processor is used to read the program in the memory and perform the following steps: obtaining a deep learning model to be deployed; obtaining an acceleration instruction specified by the user, and accelerating the deep learning model according to the acceleration method corresponding to the acceleration instruction, so as to increase the inference speed of the deep learning model; after the acceleration is completed, obtaining test samples corresponding to the deep learning model; and testing the deep learning model by using the test samples.
  • Before the deep learning model is accelerated, the processor is further configured to execute: if the acceleration method corresponding to the acceleration instruction includes multiple acceleration methods, selecting from the multiple acceleration methods an acceleration method that satisfies a preset performance index, according to the system type of the edge device currently used for testing the deep learning model and the hardware performance of that edge device.
  • Before the test samples are used to test the deep learning model, the processor is further configured to execute: determining a compiler according to the system type of the edge device currently used for testing the deep learning model; and compiling the algorithm code corresponding to the deep learning model with the compiler and packaging it into a library.
  • The processor is specifically configured to determine the type of the packaged library as follows: if the compiler is one of gcc, g++ and a cross compiler, the packaged library is a .so library; if the compiler is a Windows compiler, the packaged library is a .dll library.
  • The processor is specifically configured to determine the compiler in one or more of the following ways: if the current test uses a Linux, ARM-Linux or Android system, the compiler is one of gcc, g++ and a cross compiler; if the current test uses a Windows system, the compiler is a Windows compiler.
  • After the algorithm code corresponding to the deep learning model is compiled with the compiler and packaged into a library, the processor is further configured to execute: encapsulating at least one preset function library into the library, the preset function library being used to implement one or more of an authentication function, an encryption function and a network function.
  • After the test samples are used to test the deep learning model, the processor is further configured to execute: generating a test report according to the test data obtained from the test.
  • The acceleration method includes one or more of the following: the Mobile Neural Network (MNN) inference engine; the TNN inference framework; the Tengine-Lite neural network inference engine.
  • Based on the same inventive concept, an embodiment of the present disclosure also provides an apparatus for deep learning model testing. Since this apparatus is the apparatus used in the method of the embodiments of the present disclosure, and the principle by which the apparatus solves the problem is similar to that of the method, the implementation of the apparatus can refer to the implementation of the method, and repeated parts are not described again.
  • As shown in FIG. 7, the apparatus includes:
  • a model obtaining unit 700, configured to obtain a deep learning model to be deployed;
  • a model acceleration unit 701, configured to obtain an acceleration instruction specified by the user, and accelerate the deep learning model according to the acceleration method corresponding to the acceleration instruction, so as to increase the inference speed of the deep learning model;
  • a sample obtaining unit 702, configured to obtain, after the acceleration is completed, test samples corresponding to the deep learning model;
  • a model testing unit 703, configured to test the deep learning model by using the test samples.
  • Before the deep learning model is accelerated, the model acceleration unit is further used to: if the acceleration method corresponding to the acceleration instruction includes multiple acceleration methods, select from the multiple acceleration methods an acceleration method that satisfies a preset performance index, according to the system type of the edge device currently used for testing the deep learning model and the hardware performance of that edge device.
  • Before the test samples are used to test the deep learning model, a compiling unit is specifically used to: determine a compiler according to the system type of the edge device currently used for testing the deep learning model; and compile the algorithm code corresponding to the deep learning model with the compiler and package it into a library.
  • The compiling unit is used to determine the type of the packaged library as follows: if the compiler is one of gcc, g++ and a cross compiler, the packaged library is a .so library; if the compiler is a Windows compiler, the packaged library is a .dll library.
  • The compiling unit is used to determine the compiler in one or more of the following ways: if the current test uses a Linux, ARM-Linux or Android system, the compiler is one of gcc, g++ and a cross compiler; if the current test uses a Windows system, the compiler is a Windows compiler.
  • After the algorithm code corresponding to the deep learning model is compiled with the compiler and packaged into a library, the compiling unit is further used to: encapsulate at least one preset function library into the library, the preset function library being used to implement one or more of an authentication function, an encryption function and a network function.
  • After the test samples are used to test the deep learning model, the model testing unit is further used to: generate a test report according to the test data obtained from the test.
  • The acceleration method includes one or more of the following: the Mobile Neural Network (MNN) inference engine; the TNN inference framework; the Tengine-Lite neural network inference engine.
  • An embodiment of the present disclosure also provides a computer storage medium on which a computer program is stored; when the program is executed by a processor, the following steps are implemented: obtaining a deep learning model to be deployed; obtaining an acceleration instruction specified by the user, and accelerating the deep learning model according to the acceleration method corresponding to the acceleration instruction, so as to increase the inference speed of the deep learning model; after the acceleration is completed, obtaining test samples corresponding to the deep learning model; and testing the deep learning model by using the test samples.
  • Those skilled in the art should understand that the embodiments of the present disclosure may be provided as methods, systems, or computer program products. Accordingly, the present disclosure can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present disclosure discloses a method, device and computer storage medium for deep learning model testing, used to provide an automated process for accelerating and testing a deep learning model. The method includes: obtaining a deep learning model to be deployed; in response to an acceleration instruction specified by a user, accelerating the deep learning model according to an acceleration method corresponding to the acceleration instruction, so as to increase the inference speed of the deep learning model; after determining that the acceleration is completed, obtaining a test sample corresponding to the deep learning model; and testing the deep learning model by using the test sample.

Description

Method, Device and Computer Storage Medium for Deep Learning Model Testing
Technical Field
The present disclosure relates to the technical field of automated testing, and in particular to a method, device and computer storage medium for deep learning model testing.
Background
At present, deep learning algorithms are widely used in many fields. For the current challenge of effectively deploying deep learning models on different hardware platforms, the explosive growth in the size and computational cost of deep learning models brings difficulties of varying degrees to the actual deployment process. At present, before a deep learning model is deployed on an edge device, inference acceleration and compilation testing must be performed manually, which incurs high labor costs and low efficiency.
Therefore, given different hardware resources (such as on-chip memory size and the number of arithmetic units), how to automatically accelerate and test different deep learning models so that they can be efficiently deployed on edge devices is a technical problem that urgently needs to be solved.
Summary
In a first aspect, an embodiment of the present disclosure provides a method for deep learning model testing, applied to an edge device, including:
obtaining a deep learning model to be deployed;
obtaining an acceleration instruction specified by a user, and accelerating the deep learning model according to an acceleration method corresponding to the acceleration instruction, so as to increase the inference speed of the deep learning model;
after the acceleration is completed, obtaining a test sample corresponding to the deep learning model;
testing the deep learning model by using the test sample.
In some embodiments, before the deep learning model is accelerated, the method further includes:
if the acceleration method corresponding to the acceleration instruction includes multiple acceleration methods, selecting, from the multiple acceleration methods, an acceleration method that satisfies a preset performance index according to the system type of the edge device currently used for testing the deep learning model and the hardware performance of the edge device.
In some embodiments, before the deep learning model is tested by using the test sample, the method further includes:
determining a compiler according to the system type of the edge device currently used for testing the deep learning model;
compiling the algorithm code corresponding to the deep learning model by using the compiler, and packaging it into a library.
In some embodiments, the type of the packaged library is determined as follows:
if the compiler is one of gcc, g++ and a cross compiler, determining that the type of the packaged library is a .so library;
if the compiler is a Windows compiler, determining that the type of the packaged library is a .dll library.
In some embodiments, the compiler is determined in one or more of the following ways:
if the current test uses a Linux system, determining that the compiler is one of gcc, g++ and a cross compiler;
if the current test uses an ARM-Linux system, determining that the compiler is one of gcc, g++ and a cross compiler;
if the current test uses an Android system, determining that the compiler is one of gcc, g++ and a cross compiler;
if the current test uses a Windows system, determining that the compiler is a Windows compiler.
In some embodiments, after the algorithm code corresponding to the deep learning model is compiled by using the compiler and packaged into a library, the method further includes:
encapsulating at least one preset function library into the library, where the preset function library is used to implement one or more of an authentication function, an encryption function and a network function.
In some embodiments, after the deep learning model is tested by using the test sample, the method further includes:
generating a test report according to the test data obtained from the test.
In some embodiments, the acceleration method includes one or more of the following:
the Mobile Neural Network (MNN);
the TNN inference framework;
the Tengine-Lite neural network inference engine.
In a second aspect, an embodiment of the present disclosure provides a device for deep learning model testing, including a processor and a memory, where the memory is used to store a program executable by the processor, and the processor is used to read the program in the memory and perform the following steps:
obtaining a deep learning model to be deployed;
obtaining an acceleration instruction specified by a user, and accelerating the deep learning model according to an acceleration method corresponding to the acceleration instruction, so as to increase the inference speed of the deep learning model;
after the acceleration is completed, obtaining a test sample corresponding to the deep learning model;
testing the deep learning model by using the test sample.
In some embodiments, before the deep learning model is accelerated, the processor is further configured to execute:
if the acceleration method corresponding to the acceleration instruction includes multiple acceleration methods, selecting, from the multiple acceleration methods, an acceleration method that satisfies a preset performance index according to the system type of the edge device currently used for testing the deep learning model and the hardware performance of the edge device.
In some embodiments, before the deep learning model is tested by using the test sample, the processor is further configured to execute:
determining a compiler according to the system type of the edge device currently used for testing the deep learning model;
compiling the algorithm code corresponding to the deep learning model by using the compiler, and packaging it into a library.
In some embodiments, the processor is specifically configured to determine the type of the packaged library as follows:
if the compiler is one of gcc, g++ and a cross compiler, determining that the type of the packaged library is a .so library;
if the compiler is a Windows compiler, determining that the type of the packaged library is a .dll library.
In some embodiments, the processor is specifically configured to determine the compiler in one or more of the following ways:
if the current test uses a Linux system, determining that the compiler is one of gcc, g++ and a cross compiler;
if the current test uses an ARM-Linux system, determining that the compiler is one of gcc, g++ and a cross compiler;
if the current test uses an Android system, determining that the compiler is one of gcc, g++ and a cross compiler;
if the current test uses a Windows system, determining that the compiler is a Windows compiler.
In some embodiments, after the algorithm code corresponding to the deep learning model is compiled by using the compiler and packaged into a library, the processor is further configured to execute:
encapsulating at least one preset function library into the library, where the preset function library is used to implement one or more of an authentication function, an encryption function and a network function.
In some embodiments, after the deep learning model is tested by using the test sample, the processor is further configured to execute:
generating a test report according to the test data obtained from the test.
In some embodiments, the acceleration method includes one or more of the following:
the Mobile Neural Network (MNN);
the TNN inference framework;
the Tengine-Lite neural network inference engine.
In a third aspect, an embodiment of the present disclosure further provides an apparatus for deep learning model testing, including:
a model obtaining unit, configured to obtain a deep learning model to be deployed;
a model acceleration unit, configured to obtain an acceleration instruction specified by a user, and accelerate the deep learning model according to an acceleration method corresponding to the acceleration instruction, so as to increase the inference speed of the deep learning model;
a sample obtaining unit, configured to obtain, after the acceleration is completed, a test sample corresponding to the deep learning model;
a model testing unit, configured to test the deep learning model by using the test sample.
In some embodiments, before the deep learning model is accelerated, the model acceleration unit is further configured to:
if the acceleration method corresponding to the acceleration instruction includes multiple acceleration methods, select, from the multiple acceleration methods, an acceleration method that satisfies a preset performance index according to the system type of the edge device currently used for testing the deep learning model and the hardware performance of the edge device.
In some embodiments, the apparatus further includes a compiling unit configured to, before the deep learning model is tested by using the test sample:
determine a compiler according to the system type of the edge device currently used for testing the deep learning model;
compile the algorithm code corresponding to the deep learning model by using the compiler, and package it into a library.
In some embodiments, the compiling unit is configured to determine the type of the packaged library as follows:
if the compiler is one of gcc, g++ and a cross compiler, determine that the type of the packaged library is a .so library;
if the compiler is a Windows compiler, determine that the type of the packaged library is a .dll library.
In some embodiments, the compiling unit is configured to determine the compiler in one or more of the following ways:
if the current test uses a Linux system, determine that the compiler is one of gcc, g++ and a cross compiler;
if the current test uses an ARM-Linux system, determine that the compiler is one of gcc, g++ and a cross compiler;
if the current test uses an Android system, determine that the compiler is one of gcc, g++ and a cross compiler;
if the current test uses a Windows system, determine that the compiler is a Windows compiler.
In some embodiments, after the algorithm code corresponding to the deep learning model is compiled by using the compiler and packaged into a library, the compiling unit is further configured to:
encapsulate at least one preset function library into the library, where the preset function library is used to implement one or more of an authentication function, an encryption function and a network function.
In some embodiments, after the deep learning model is tested by using the test sample, the model testing unit is further configured to:
generate a test report according to the test data obtained from the test.
In some embodiments, the acceleration method includes one or more of the following:
the Mobile Neural Network (MNN);
the TNN inference framework;
the Tengine-Lite neural network inference engine.
In a fourth aspect, an embodiment of the present disclosure further provides a computer storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the method described in the above first aspect.
These and other aspects of the present disclosure will be more readily understood in the following description of the embodiments.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present disclosure more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present disclosure, and for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flowchart of a method for deep learning model testing provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart of an automated test provided by an embodiment of the present disclosure;
FIG. 3A is a schematic diagram of a configuration for enabling the authentication function provided by an embodiment of the present disclosure;
FIG. 3B is a schematic diagram of a configuration for enabling the authentication function provided by an embodiment of the present disclosure;
FIG. 3C is a schematic diagram of a configuration for enabling the authentication function provided by an embodiment of the present disclosure;
FIG. 4 is a flowchart of an automated test provided by an embodiment of the present disclosure;
FIG. 5 is a flowchart of a complete automated test provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a device for deep learning model testing provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of an apparatus for deep learning model testing provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present disclosure clearer, the present disclosure is described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present disclosure.
The application scenarios described in the embodiments of the present disclosure are intended to explain the technical solutions of the embodiments more clearly and do not constitute a limitation on the technical solutions provided by the embodiments of the present disclosure. Those of ordinary skill in the art will appreciate that, as new application scenarios emerge, the technical solutions provided by the embodiments of the present disclosure are equally applicable to similar technical problems. In the description of the present disclosure, unless otherwise stated, "multiple" means two or more.
With the wide application of deep learning models, a deep learning model must be manually accelerated and tested before it is deployed on an edge device, which consumes substantial manpower and is costly. This embodiment provides a method for automatically testing a deep learning model, usable online or offline, which improves testing efficiency and saves labor costs. It should be noted that because the computation of a deep learning model is complex, the model cannot be directly deployed on an edge device; it therefore needs to be accelerated to reduce parameter redundancy, storage occupation and computational complexity.
In some embodiments, the method for deep learning model testing provided by this embodiment is applied to offline devices such as edge devices, where the edge devices include but are not limited to computing workstations, PC terminals, chip boards, etc., and the operating systems of the edge devices include but are not limited to Windows, Linux, Android, etc.
The core idea of the deep learning model testing method provided by the embodiments of the present disclosure is to establish an automated process on the edge device, running from obtaining the deep learning model and accelerating it to obtaining test samples for testing after the acceleration, thereby implementing one-click accelerated testing and improving the deployment efficiency of deep learning models. Because the algorithms of deep learning models are complex, testing a deep learning model requires a device equipped with a processor with large data processing capability; generally a cloud server is used to test the deep learning model, and current tests of deep learning models target only the testing process of the deep learning algorithm itself. Even after such a test process is completed, because the running process of a deep learning model is complex, if the tested deep learning model is directly deployed on an edge device, the data processing capability of the edge device cannot support the complex algorithm running process of the deep learning model, so the deep learning model cannot run on the edge device. It can be seen that existing testing of deep learning models can only be performed on cloud servers, and the model still cannot be deployed on edge devices after testing. The present disclosure provides a method for automated testing of a deep learning model that can be carried out on an edge device: after the obtained deep learning model is automatically accelerated, its computational load is reduced, so that the automated testing process can be performed on the edge device, which speeds up the deployment of deep learning models on edge devices and can effectively shorten the deployment cycle. Since the testing process in this embodiment is applicable to all kinds of deep learning models, it is a standardized and automated deep learning model testing process that can meet the testing requirements of various deep learning models; moreover, testing with edge devices can improve the efficiency of deploying deep learning models on edge devices.
As shown in FIG. 1, the implementation flow of the testing method provided by this embodiment is as follows:
Step 100: obtain a deep learning model to be deployed;
This embodiment can automatically accelerate and test a variety of deep learning models. In some embodiments, the deep learning model to be deployed may be stored in a model warehouse, where the model warehouse is used to store various deep learning models. In implementation, different deep learning models can be stored at corresponding path addresses, and the storage forms include but are not limited to deep learning model code, image code, and the like.
In some embodiments, the deep learning model to be deployed can be obtained from a local server or a cloud server.
Step 101: obtain an acceleration instruction specified by the user, and accelerate the deep learning model according to the acceleration method corresponding to the acceleration instruction, so as to increase the inference speed of the deep learning model;
The correspondence between the acceleration instructions and acceleration methods obtained in this embodiment includes, but is not limited to, some or all of the following relationships: one-to-one, one-to-many, and many-to-many.
One-to-one means that one acceleration instruction is obtained and corresponds to one acceleration method; in implementation, the deep learning model is accelerated according to the acceleration method corresponding to that instruction. One-to-many means that one acceleration instruction is obtained and corresponds to multiple acceleration methods; in implementation, the deep learning model is accelerated simultaneously or in stages according to the multiple acceleration methods corresponding to that instruction. Many-to-many means that multiple acceleration instructions are obtained, each corresponding to one acceleration method; in implementation, the deep learning model is accelerated simultaneously or in stages according to the acceleration method corresponding to each instruction.
In some embodiments, if the acceleration instruction specified by the user corresponds to one acceleration method, the deep learning model is accelerated according to the acceleration method specified by the user.
In some embodiments, if the acceleration methods corresponding to the user-specified acceleration instruction include multiple methods, an acceleration method that satisfies a preset performance index is selected from the multiple acceleration methods according to the system type and hardware performance of the edge device currently used for testing the deep learning model. In some examples, the preset performance index includes but is not limited to the best performance and/or the fastest running speed. In implementation, the system type includes but is not limited to Windows, Linux and Android; the hardware performance can be determined according to the CPU performance, memory performance and memory size of the edge device. In implementation, selecting an acceleration method that satisfies a preset performance index specifically includes one or more of the following implementations:
Method 1: while the edge device accelerates the deep learning model, select the acceleration method that runs the most lines of the deep learning model's code per unit time, i.e. the acceleration method with the fastest running speed;
Method 2: while the edge device accelerates the deep learning model, select the acceleration method with the lowest CPU usage on the edge device, i.e. the acceleration method with the best performance;
Method 3: while the edge device accelerates the deep learning model, select the acceleration method that runs the most lines of code per unit time with the lowest CPU usage on the edge device, i.e. the acceleration method that is both the fastest and the lightest on the CPU;
Method 4: while the edge device accelerates the deep learning model, use the weights corresponding to performance and running speed to compute a weighted sum of the CPU usage and the running time measured during acceleration, and select the acceleration method with the smallest sum.
In some embodiments, the acceleration methods provided by this embodiment include but are not limited to one or more of the following:
the Mobile Neural Network (MNN); the TNN inference framework; the Tengine-Lite neural network inference engine.
Step 102: after the acceleration is completed, obtain a test sample corresponding to the deep learning model;
In implementation, the test samples may be stored in a data warehouse, where the data warehouse is used to store test samples corresponding to deep learning models.
Step 103: test the deep learning model by using the test sample.
To make it easier to deploy the deep learning model on the edge device, the automated testing process provided by this embodiment completes the full automated flow of the deep learning model from acquisition through acceleration to testing, which improves the efficiency of pre-deployment preparation and saves labor costs. Moreover, because the deep learning model is accelerated, the automated testing process provided by this embodiment effectively reduces the computational load of the deep learning model and improves processing speed; in particular, for an edge device at the local end, the testing process of this embodiment can still be carried out offline.
In some embodiments, the acceleration methods involved in this embodiment include but are not limited to MNN, TNN and Tengine-Lite; among these three, the user can specify the acceleration method with the shortest running time and the lowest CPU usage. The three acceleration methods involved in this embodiment are described as follows:
Method 1: MNN.
MNN is a lightweight deep neural network inference engine whose core purpose is to run inference for deep neural network models on the device side, covering the optimization, conversion and inference of deep neural network models. MNN can be divided into two parts: the Converter and the Interpreter.
The Converter consists of Frontends and Graph Optimize. Frontends is responsible for supporting different training frameworks; MNN currently supports TensorFlow (Lite), Caffe and ONNX. Graph Optimize optimizes the graph through operator fusion, operator substitution and layout adjustment.
The Interpreter consists of the Engine and the Backends. The Engine is responsible for loading models and scheduling the computation graph; the Backends handle memory allocation on each computing device. Within the Engine and Backends, MNN applies a variety of optimization schemes, including the Winograd algorithm for convolution and deconvolution, the Strassen algorithm for matrix multiplication, low-precision computation, handwritten assembly, multithreading optimization, memory reuse, heterogeneous computing, etc.
Method 2: TNN.
TNN is a high-performance, lightweight inference framework for mobile terminals, with many outstanding advantages such as cross-platform support, high performance, model compression and code tailoring. TNN covers model conversion, low-precision optimization, operator compilation optimization, a computing engine and hardware architectures: model conversion is used for model parsing and conversion; low-precision optimization is used for FP16 low-precision conversion and INT8 post-training quantization; operator compilation optimization includes operator tuning, layout optimization, computation-graph optimization, etc.; the computing engine includes high-performance kernel implementations and efficient memory scheduling; the hardware architectures include ARM, GPU, NPU, etc.
Method 3: Tengine-Lite.
Tengine-Lite enables fast and efficient deployment of deep learning neural network models on embedded devices. Its characteristics are: it depends only on the C library, has an independent model loading process, keeps a unified application interface with Tengine (the Web server project), supports the CMSIS-NN and HCL-M operator libraries, supports AI accelerators and heterogeneous computing, openly supports Caffe/TensorFlow/MXNet models, and provides model quantization training tools. Its advantages are: being lightweight and easy to deploy, decoupling model deployment from model running code, a unified Cortex-A/M ecosystem, easy porting of MCU applications to an AP, support for customized operator development while improving performance, suitability for embedded AI platforms, and more freedom of choice for developers.
In some embodiments, before the test sample is used to test the deep learning model, this embodiment also provides a compiling method, with the following specific steps:
Step 1): determine the compiler according to the system type of the edge device used for the current test of the deep learning model;
In some embodiments, the compiler is determined in one or more of the following ways depending on the system:
11) if the current test uses a Linux system, the compiler is one of gcc, g++ and a cross compiler;
12) if the current test uses an ARM-Linux system, the compiler is one of the GNU Compiler Collection (gcc), the GNU C++ compiler (g++) and a cross compiler;
13) if the current test uses an Android system, the compiler is one of gcc, g++ and a cross compiler;
14) if the current test uses a Windows system, the compiler is a Windows compiler.
Step 2): compile the algorithm code corresponding to the deep learning model with the compiler, and package it into a library.
In some embodiments, for Linux, ARM-Linux and Android systems, the cross-platform installation (build) tool CMAKE is used to select one of gcc, g++ and a cross compiler, and the algorithm code is compiled and packaged into the form of a .so library; for Windows systems, the execution of the compiler is controlled by setting a macro switch that specifies whether the Windows compiler needs to run, and the algorithm code is compiled and packaged into the form of a .dll library. That is, this embodiment provides one or more library forms, and the type of the packaged library can be determined as follows:
if the compiler is one of gcc, g++ and a cross compiler, the type of the packaged library is a .so library;
if the compiler is a Windows compiler, the type of the packaged library is a .dll library.
This embodiment provides an automated process for accelerating, compiling and testing a deep learning model, realizing one-click compilation and one-click packaging, which speeds up the deployment of the deep learning model.
In some embodiments, the entire automated process of this embodiment includes but is not limited to one or more of the following middleware:
1. a model warehouse, used to store the deep learning models to be deployed;
2. a code warehouse, used to store the algorithm code corresponding to the deep learning models to be deployed;
3. a data warehouse, used to store the test samples, test data, test reports, etc. corresponding to the deep learning models to be deployed;
4. a docker image of the compilation platform, used to compile and package the deep learning model.
In some embodiments, as shown in FIG. 2, an embodiment of the present disclosure provides an automated testing process whose specific implementation steps are as follows:
Step 200: obtain the deep learning model to be deployed, and store the deep learning model in the model warehouse;
Step 201: obtain an acceleration instruction specified by the user, and select from the acceleration library the acceleration method corresponding to the acceleration instruction that has the shortest running time and the least memory usage;
Step 202: accelerate the deep learning model with the selected acceleration method;
Step 203: determine that the acceleration is completed;
Step 204: determine the compiler according to the system type used in the current test;
Step 205: compile the algorithm code corresponding to the deep learning model with the compiler, and package it into a library;
Step 206: obtain a test sample corresponding to the deep learning model from the database;
Step 207: test the deep learning model with the test sample.
In some embodiments, so that the deep learning model offers certain functions to users after being deployed on the edge device, some function libraries can also be encapsulated into the algorithm code of the deep learning model after compilation by using a compilation macro, so that the functions of those libraries can be used once the deep learning model is deployed on the edge device. The specific implementation is as follows:
At least one preset function library is encapsulated into the deep learning model, where the preset function library is used to implement one or more of an authentication function, an encryption function and a network function. The functions that each function library can implement are described as follows:
1. Authentication function;
In implementation, authorized activation is adopted, based on the hardware fingerprint of the device (an edge device or a cloud device), which is read by a fingerprint tool and is unique to the device. Each license applied for has a trial validity of 3 months from the application date, and permanent validity can be applied for after a formal purchase. Taking the Linux platform as an example, enabling the authentication function requires the configuration shown in FIG. 3A, FIG. 3B and FIG. 3C. After the configuration of FIG. 3A, in response to a click on the license application button, the interface shown in FIG. 3C is displayed; on this interface, the license can be downloaded in response to the selection of the running platform and a click on the download button.
2. Encryption function;
In implementation, Advanced Encryption Standard (AES) encryption is adopted to protect the algorithm model and the security of network data transmission.
3. Network function.
In implementation, the HTTP POST request method is used, and the data is encrypted and transmitted in the form of JSON messages, so the network function must be enabled together with the encryption function.
In some embodiments, an embodiment of the present disclosure also provides an automated testing process that encapsulates the user-specified function libraries into the compiled and packaged library of the deep learning model, so as to implement the authentication, encryption and network functions, etc. for the deep learning model. As shown in FIG. 4, the specific implementation of the process is as follows:
Step 400: obtain the deep learning model to be deployed, and store the deep learning model in the model warehouse;
Step 401: obtain an acceleration instruction specified by the user, and select from the acceleration library the acceleration method corresponding to the acceleration instruction that has the shortest running time and the lowest CPU usage;
Step 402: accelerate the deep learning model with the selected acceleration method;
Step 403: determine that the acceleration is completed;
Step 404: determine the compiler according to the system type used in the current test;
Step 405: compile the algorithm code corresponding to the deep learning model with the compiler, and package it into a library;
Step 406: package one or more of the authentication function library, the encryption function library and the network function library into the packaged library;
Step 407: obtain a test sample corresponding to the deep learning model from the database;
Step 408: test the deep learning model with the test sample.
In some embodiments, the deep learning model can be tested by one or more of the following devices: a server device; a cloud device; an edge device.
In some embodiments, the deep learning model can be accelerated by one or more of the following devices: a server device; a cloud device; an edge device.
In some embodiments, the deep learning model can be compiled by one or more of the following devices: a server device; a cloud device; an edge device.
In some embodiments, after the deep learning model is tested with the test sample, the method further includes: generating a test report according to the test data obtained from the test, so that technicians can conveniently review it and judge, based on the content of the test report, whether the deep learning model can be deployed on the edge device.
In some embodiments, based on the GitLab Runner function, this embodiment can associate the model warehouse, the code warehouse and the data warehouse to implement the automated acceleration, compilation and testing process for the deep learning model, so that the entire process is standardized, automated and modularized, which greatly shortens the algorithm development cycle.
In some embodiments, as shown in FIG. 5, this embodiment also provides a complete automated testing process, applied to edge devices. The specific implementation steps of the process are as follows:
Step 500: obtain the deep learning model to be deployed;
The deep learning model to be deployed can be obtained from a cloud server or a local server, which is not limited in this embodiment.
Step 501: store the deep learning model in the model warehouse;
The model warehouse is a model storage partition in the edge device and is used to store deep learning models.
Step 502: obtain an acceleration instruction specified by the user, and select from the acceleration library the acceleration method corresponding to the acceleration instruction that has the shortest running time and the least memory usage;
In implementation, the acceleration docker image corresponding to the acceleration method can be pulled automatically through the command line (code), and the deep learning model is accelerated by using the code corresponding to the acceleration method in the acceleration image.
The acceleration library stores multiple acceleration methods and is an acceleration storage partition in the edge device.
Step 503: accelerate the deep learning model with the selected acceleration method, and determine that the acceleration is completed;
Step 504: determine the compiler according to the system type used in the current test;
In implementation, the compiler can be determined according to business requirements or the system type.
Step 505: compile the algorithm code corresponding to the deep learning model with the compiler, and package it into a library;
Step 506: package one or more of the authentication function library, the encryption function library and the network function library into the packaged library;
Step 507: obtain a test sample corresponding to the deep learning model from the database;
In implementation, the test samples are automatically pulled from the database.
Step 508: test the deep learning model with the test sample.
Step 509: generate a test report according to the test data obtained from the test.
In some embodiments, based on the same inventive concept, an embodiment of the present disclosure also provides a device for deep learning model testing. Since this device is the device used in the method of the embodiments of the present disclosure, and the principle by which the device solves the problem is similar to that of the method, the implementation of the device can refer to the implementation of the method, and repeated parts are not described again.
As shown in FIG. 6, the device includes a processor 600 and a memory 601, where the memory is used to store a program executable by the processor, and the processor is used to read the program in the memory and perform the following steps:
obtaining a deep learning model to be deployed;
obtaining an acceleration instruction specified by the user, and accelerating the deep learning model according to the acceleration method corresponding to the acceleration instruction, so as to increase the inference speed of the deep learning model;
after the acceleration is completed, obtaining a test sample corresponding to the deep learning model;
testing the deep learning model by using the test sample.
In some embodiments, before the deep learning model is accelerated, the processor is further configured to execute:
if the acceleration method corresponding to the acceleration instruction includes multiple acceleration methods, selecting, from the multiple acceleration methods, an acceleration method that satisfies a preset performance index according to the system type of the edge device currently used for testing the deep learning model and the hardware performance of the edge device.
In some embodiments, before the deep learning model is tested by using the test sample, the processor is further configured to execute:
determining a compiler according to the system type of the edge device currently used for testing the deep learning model;
compiling the algorithm code corresponding to the deep learning model by using the compiler, and packaging it into a library.
In some embodiments, the processor is specifically configured to determine the type of the packaged library as follows:
if the compiler is one of gcc, g++ and a cross compiler, determining that the type of the packaged library is a .so library;
if the compiler is a Windows compiler, determining that the type of the packaged library is a .dll library.
In some embodiments, the processor is specifically configured to determine the compiler in one or more of the following ways:
if the current test uses a Linux system, determining that the compiler is one of gcc, g++ and a cross compiler;
if the current test uses an ARM-Linux system, determining that the compiler is one of gcc, g++ and a cross compiler;
if the current test uses an Android system, determining that the compiler is one of gcc, g++ and a cross compiler;
if the current test uses a Windows system, determining that the compiler is a Windows compiler.
In some embodiments, after the algorithm code corresponding to the deep learning model is compiled by using the compiler and packaged into a library, the processor is further configured to execute:
encapsulating at least one preset function library into the library, where the preset function library is used to implement one or more of an authentication function, an encryption function and a network function.
In some embodiments, after the deep learning model is tested by using the test sample, the processor is further configured to execute:
generating a test report according to the test data obtained from the test.
In some embodiments, the acceleration method includes one or more of the following:
the Mobile Neural Network (MNN);
the TNN inference framework;
the Tengine-Lite neural network inference engine.
In some embodiments, based on the same inventive concept, an embodiment of the present disclosure also provides an apparatus for deep learning model testing. Since this apparatus is the apparatus used in the method of the embodiments of the present disclosure, and the principle by which the apparatus solves the problem is similar to that of the method, the implementation of the apparatus can refer to the implementation of the method, and repeated parts are not described again.
As shown in FIG. 7, the apparatus includes:
a model obtaining unit 700, configured to obtain a deep learning model to be deployed;
a model acceleration unit 701, configured to obtain an acceleration instruction specified by the user, and accelerate the deep learning model according to the acceleration method corresponding to the acceleration instruction, so as to increase the inference speed of the deep learning model;
a sample obtaining unit 702, configured to obtain, after the acceleration is completed, a test sample corresponding to the deep learning model;
a model testing unit 703, configured to test the deep learning model by using the test sample.
In some embodiments, before the deep learning model is accelerated, the model acceleration unit is further configured to:
if the acceleration method corresponding to the acceleration instruction includes multiple acceleration methods, select, from the multiple acceleration methods, an acceleration method that satisfies a preset performance index according to the system type of the edge device currently used for testing the deep learning model and the hardware performance of the edge device.
In some embodiments, the apparatus further includes a compiling unit configured to, before the deep learning model is tested by using the test sample:
determine a compiler according to the system type of the edge device currently used for testing the deep learning model;
compile the algorithm code corresponding to the deep learning model by using the compiler, and package it into a library.
In some embodiments, the compiling unit is configured to determine the type of the packaged library as follows:
if the compiler is one of gcc, g++ and a cross compiler, determine that the type of the packaged library is a .so library;
if the compiler is a Windows compiler, determine that the type of the packaged library is a .dll library.
In some embodiments, the compiling unit is configured to determine the compiler in one or more of the following ways:
if the current test uses a Linux system, determine that the compiler is one of gcc, g++ and a cross compiler;
if the current test uses an ARM-Linux system, determine that the compiler is one of gcc, g++ and a cross compiler;
if the current test uses an Android system, determine that the compiler is one of gcc, g++ and a cross compiler;
if the current test uses a Windows system, determine that the compiler is a Windows compiler.
In some embodiments, after the algorithm code corresponding to the deep learning model is compiled by using the compiler and packaged into a library, the compiling unit is further configured to:
encapsulate at least one preset function library into the library, where the preset function library is used to implement one or more of an authentication function, an encryption function and a network function.
In some embodiments, after the deep learning model is tested by using the test sample, the model testing unit is further configured to:
generate a test report according to the test data obtained from the test.
In some embodiments, the acceleration method includes one or more of the following:
the Mobile Neural Network (MNN);
the TNN inference framework;
the Tengine-Lite neural network inference engine.
In some embodiments, based on the same inventive concept, an embodiment of the present disclosure also provides a computer storage medium on which a computer program is stored; when executed by a processor, the program implements the following steps:
obtaining a deep learning model to be deployed;
obtaining an acceleration instruction specified by the user, and accelerating the deep learning model according to the acceleration method corresponding to the acceleration instruction, so as to increase the inference speed of the deep learning model;
after the acceleration is completed, obtaining a test sample corresponding to the deep learning model;
testing the deep learning model by using the test sample.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as methods, systems, or computer program products. Accordingly, the present disclosure can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of the methods, devices (systems) and computer program products according to the embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present disclosure have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the present disclosure.
Obviously, those skilled in the art can make various changes and variations to the embodiments of the present disclosure without departing from the spirit and scope of the embodiments of the present disclosure. Thus, if these modifications and variations of the embodiments of the present disclosure fall within the scope of the claims of the present disclosure and their equivalent technologies, the present disclosure is also intended to include these changes and variations.

Claims (10)

  1. A method for deep learning model testing, applied to an edge device, the method comprising:
    obtaining a deep learning model to be deployed;
    obtaining an acceleration instruction specified by a user, and accelerating the deep learning model according to an acceleration method corresponding to the acceleration instruction, so as to increase the inference speed of the deep learning model;
    after the acceleration is completed, obtaining a test sample corresponding to the deep learning model;
    testing the deep learning model by using the test sample.
  2. The method according to claim 1, wherein before the accelerating of the deep learning model, the method further comprises:
    if the acceleration method corresponding to the acceleration instruction comprises multiple acceleration methods, selecting, from the multiple acceleration methods, an acceleration method that satisfies a preset performance index according to a system type of the edge device currently used for testing the deep learning model and hardware performance of the edge device.
  3. The method according to claim 1, wherein before the testing of the deep learning model by using the test sample, the method further comprises:
    determining a compiler according to the system type of the edge device currently used for testing the deep learning model;
    compiling algorithm code corresponding to the deep learning model by using the compiler, and packaging it into a library.
  4. The method according to claim 3, wherein the type of the packaged library is determined as follows:
    if the compiler is one of gcc, g++ and a cross compiler, determining that the type of the packaged library is a .so library;
    if the compiler is a Windows compiler, determining that the type of the packaged library is a .dll library.
  5. The method according to claim 3, wherein the compiler is determined in one or more of the following ways:
    if the current test uses a Linux system, determining that the compiler is one of gcc, g++ and a cross compiler;
    if the current test uses an ARM-Linux system, determining that the compiler is one of gcc, g++ and a cross compiler;
    if the current test uses an Android system, determining that the compiler is one of gcc, g++ and a cross compiler;
    if the current test uses a Windows system, determining that the compiler is a Windows compiler.
  6. The method according to claim 3, wherein after the compiling of the algorithm code corresponding to the deep learning model by using the compiler and the packaging into a library, the method further comprises:
    encapsulating at least one preset function library into the library, wherein the preset function library is used to implement one or more of an authentication function, an encryption function and a network function.
  7. The method according to any one of claims 1 to 6, wherein after the testing of the deep learning model by using the test sample, the method further comprises:
    generating a test report according to test data obtained from the test.
  8. The method according to any one of claims 1 to 6, wherein the acceleration method comprises one or more of the following:
    the Mobile Neural Network (MNN);
    the TNN inference framework;
    the Tengine-Lite neural network inference engine.
  9. A device for deep learning model testing, comprising a processor and a memory, wherein the memory is used to store a program executable by the processor, and the processor is used to read the program in the memory and perform the steps of the method according to any one of claims 1 to 8.
  10. A computer storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
PCT/CN2021/096132 2021-05-26 2021-05-26 Method, device and computer storage medium for deep learning model testing WO2022246705A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180001277.4A 2021-05-26 2021-05-26 Method, device and computer storage medium for deep learning model testing
PCT/CN2021/096132 2021-05-26 2021-05-26 Method, device and computer storage medium for deep learning model testing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/096132 WO2022246705A1 (zh) 2021-05-26 2021-05-26 一种深度学习模型测试的方法、设备及计算机存储介质

Publications (1)

Publication Number Publication Date
WO2022246705A1 (zh)

Family

ID=84229319

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096132 2021-05-26 2021-05-26 Method, device and computer storage medium for deep learning model testing

Country Status (2)

Country Link
CN (1) CN115701302A (zh)
WO (1) WO2022246705A1 (zh)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180197317A1 (en) * 2017-01-06 2018-07-12 General Electric Company Deep learning based acceleration for iterative tomographic reconstruction
CN109213649A (zh) * 2018-09-18 2019-01-15 郑州云海信息技术有限公司 GTX graphics card deep learning optimization test method, device, terminal and storage medium
CN109977813A (zh) * 2019-03-13 2019-07-05 山东沐点智能科技有限公司 Inspection robot target positioning method based on a deep learning framework
CN110991614A (zh) * 2019-11-29 2020-04-10 苏州浪潮智能科技有限公司 GPU neural network deep learning test method and system under Linux
US20200151289A1 (en) * 2018-11-09 2020-05-14 Nvidia Corp. Deep learning based identification of difficult to test nodes
CN111709522A (zh) * 2020-05-21 2020-09-25 哈尔滨工业大学 Deep learning object detection system based on server-embedded cooperation
CN112835583A (zh) * 2021-01-12 2021-05-25 京东方科技集团股份有限公司 Deep learning model packaging method, apparatus, device and medium


Also Published As

Publication number Publication date
CN115701302A (zh) 2023-02-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21942286

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 18563594

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 23/01/2024)