CN117273171A - Deep learning framework adaptation method, deep learning framework adaptation device, computer equipment and storage medium - Google Patents
- Publication number
- CN117273171A CN202311244999.XA
- Authority
- CN
- China
- Prior art keywords
- deep learning
- modified
- learning framework
- test
- code
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3668—Software testing
- G06F11/3672—Test management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
Abstract
The application relates to a deep learning framework adaptation method, a deep learning framework adaptation device, computer equipment and a storage medium. The method comprises the following steps: acquiring operation information and operation requirements of a heterogeneous accelerator, wherein the operation information and the operation requirements both represent information about the heterogeneous accelerator when it executes the computing tasks of a target deep learning framework; modifying the source code of the target deep learning framework according to the operation information, and modifying the third-party library code of the target deep learning framework according to the operation requirements; evaluating and verifying the modified deep learning framework; and, if the modified deep learning framework passes evaluation and verification, determining the modified deep learning framework to be a deep learning framework adapted to the heterogeneous accelerator. The method improves the compatibility between the deep learning framework and the heterogeneous accelerator, so that the deep learning framework can run stably on the heterogeneous accelerator.
Description
Technical Field
The present disclosure relates to the field of computing technologies, and in particular, to a deep learning framework adaptation method, apparatus, computer device, and storage medium.
Background
With the rapid development of deep learning models, higher requirements are placed on the training and inference speed of deep learning frameworks.
In the related art, in order to improve the training and inference speed of a deep learning model, a deep learning framework can be adapted to a heterogeneous accelerator through research and development, so that the computing tasks of the deep learning framework are executed in parallel by the heterogeneous accelerator.
However, in the related art, compatibility between the deep learning framework and the heterogeneous accelerator is poor, so the deep learning framework cannot run stably on the heterogeneous accelerator.
Disclosure of Invention
Based on the foregoing, there is a need to provide a deep learning framework adaptation method, apparatus, computer device and storage medium, which can improve compatibility between a deep learning framework and heterogeneous accelerators, so that the deep learning framework can stably run on the heterogeneous accelerators.
In a first aspect, the present application provides a deep learning framework adaptation method, including:
acquiring operation information and operation requirements of the heterogeneous accelerator, wherein the operation information and the operation requirements both represent information when the heterogeneous accelerator executes a calculation task of a target deep learning framework;
modifying the source code of the target deep learning framework according to the operation information, and modifying the third-party library code of the target deep learning framework according to the operation requirements;
evaluating and verifying the modified deep learning framework;
if the modified deep learning framework passes evaluation and verification, determining the modified deep learning framework to be a deep learning framework adapted to the heterogeneous accelerator.
In the technical solution of this embodiment of the application, the operation information and operation requirements of the heterogeneous accelerator are acquired first; both represent information about the heterogeneous accelerator when it executes the computing tasks of the target deep learning framework. The source code of the target deep learning framework is modified according to the operation information, the third-party library code of the target deep learning framework is modified according to the operation requirements, and the modified deep learning framework is then evaluated and verified. If the modified deep learning framework passes evaluation and verification, it is determined to be a deep learning framework adapted to the heterogeneous accelerator. Because the source code and the third-party library of the target deep learning framework are modified according to the operation information and operation requirements, the adaptation process takes into account how the heterogeneous accelerator actually executes the target deep learning framework. This improves the compatibility between the target deep learning framework and the heterogeneous accelerator and ensures that the target deep learning framework runs stably on the heterogeneous accelerator. In addition, evaluating and verifying the modified deep learning framework further ensures its correctness and reliability.
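The four steps of the first aspect can be read as a simple control flow. The following is only a minimal sketch under stated assumptions: the function name `adapt_framework` and its four callable parameters are hypothetical placeholders for the acquisition, modification and evaluation steps, none of which the application specifies at code level.

```python
from typing import Any, Callable, Optional, Tuple


def adapt_framework(
    framework: Any,
    acquire: Callable[[], Tuple[dict, dict]],
    modify_source: Callable[[Any, dict], Any],
    modify_third_party: Callable[[Any, dict], Any],
    evaluate: Callable[[Any], bool],
) -> Optional[Any]:
    """Hypothetical driver for the adaptation flow described above."""
    # Step 1: acquire operation information and operation requirements.
    operation_info, operation_requirements = acquire()
    # Step 2: modify source code, then third-party library code.
    modified = modify_source(framework, operation_info)
    modified = modify_third_party(modified, operation_requirements)
    # Steps 3 and 4: only a framework that passes evaluation is accepted.
    return modified if evaluate(modified) else None
```

Returning `None` on failure is a design choice of this sketch; the application itself only states what happens when evaluation and verification pass.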
In one embodiment, the operation information includes runtime interface information and the library files on which the heterogeneous accelerator depends when using a math library; modifying the source code of the target deep learning framework according to the operation information includes:
modifying the corresponding interface code in the source code of the target deep learning framework according to the runtime interface information; and modifying the corresponding library file path in the source code of the target deep learning framework according to the library files on which the heterogeneous accelerator depends when using the math library.
In the technical solution of this embodiment, the corresponding interface code in the source code of the target deep learning framework is modified according to the runtime interface information, and the corresponding library file path is modified according to the library files on which the heterogeneous accelerator depends when using the math library. The interface code and library file paths between the target deep learning framework and the heterogeneous accelerator are thus modified in advance, so that the corresponding files can be found when the target deep learning framework is later compiled and run, which improves the compatibility between the target deep learning framework and the heterogeneous accelerator.
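One way to picture this interface and path modification is as a text rewrite driven by two mappings. The sketch below is an assumption, not the method of the application: the helper names, the interface names (`oldrtMalloc`, `accMalloc`) and the library paths are all hypothetical, and a real adaptation would likely need more than textual substitution.

```python
import re
from typing import Dict


def rewrite_interfaces(source: str, interface_map: Dict[str, str]) -> str:
    """Replace whole-identifier occurrences of each mapped interface name."""
    if not interface_map:
        return source
    # Longest names first so a longer name is preferred over its prefix.
    names = sorted(interface_map, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda m: interface_map[m.group(0)], source)


def rewrite_library_paths(text: str, path_map: Dict[str, str]) -> str:
    """Replace each old library file path with the accelerator's path."""
    for old_path, new_path in path_map.items():
        text = text.replace(old_path, new_path)
    return text


# Hypothetical mappings derived from the runtime interface information
# and from the math library files of the accelerator.
INTERFACE_MAP = {"oldrtMalloc": "accMalloc"}
PATH_MAP = {"/opt/oldrt/lib/liboldblas.so": "/opt/acc/lib/libaccblas.so"}
```

The word-boundary match keeps identifiers that merely contain a mapped name (for example `my_oldrtMallocHelper`) unchanged.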
In one embodiment, the operation requirements include an execution mode; modifying the third-party library code of the target deep learning framework according to the operation requirements includes:
acquiring the third-party library code of the target deep learning framework according to the storage location of the third-party library code;
acquiring the instruction set code of the heterogeneous accelerator based on the execution mode of the heterogeneous accelerator, and modifying the third-party library code according to the instruction set code.
In the technical solution of this embodiment, the third-party library code of the target deep learning framework is acquired according to its storage location, the instruction set code of the heterogeneous accelerator is acquired based on the execution mode of the heterogeneous accelerator, and the third-party library code is modified according to the instruction set code. Modifying the third-party library adapts it to the instruction set of the heterogeneous accelerator, which solves the problem of code usability on the heterogeneous accelerator; it can also improve the execution efficiency of the code on the heterogeneous accelerator.
In one embodiment, evaluating and verifying the modified deep learning framework includes:
compiling the modified deep learning framework;
if the compilation passes, performing unit tests on the code of the modified deep learning framework;
if the unit tests pass, performing a benchmark test on the code of the modified deep learning framework;
if the benchmark test passes, determining that the modified deep learning framework passes evaluation and verification.
In the technical solution of this embodiment, the modified deep learning framework is compiled; if the compilation passes, unit tests are performed on its code; if the unit tests pass, a benchmark test is performed; and if the benchmark test passes, the modified deep learning framework is determined to pass evaluation and verification. Unit tests verify whether each unit of the modified deep learning framework functions normally, and the benchmark test evaluates the reliability of its performance, so performing both on a framework that has passed compilation further improves the accuracy and reliability of the modified deep learning framework. Moreover, because unit tests are performed only after compilation passes and the benchmark test only after the unit tests pass, every test runs on code that is known to be correct up to that stage, which improves the reliability of the evaluation and verification.
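The gating described above, where each stage runs only if the previous one passed, can be sketched as follows. The function name and the representation of a stage as a `(name, callable)` pair are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def evaluate_and_verify(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first stage that fails.

    Returns whether every stage passed, plus the names of the stages
    that were actually executed.
    """
    executed: List[str] = []
    for name, run in stages:
        executed.append(name)
        if not run():
            return False, executed
    return True, executed
```

For example, with stages `compile`, `unit_test` and `benchmark`, a failing unit test means the benchmark is never started, which matches the ordering in this embodiment.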
In one embodiment, compiling the modified deep learning framework includes:
determining the compilation environment of the modified deep learning framework according to the configuration version of the target deep learning framework;
compiling the modified deep learning framework according to the compilation environment and a preset compilation command;
if the compilation fails, adjusting the modified deep learning framework according to the error information in the log generated by the failed compilation, until the modified deep learning framework passes compilation.
In the technical solution of this embodiment, the compilation environment of the modified deep learning framework is determined according to the configuration version of the target deep learning framework, and the modified deep learning framework is compiled according to the compilation environment and a preset compilation command; if the compilation fails, the modified deep learning framework is adjusted according to the error information in the log generated by the failed compilation, until it passes compilation. Compiling the modified deep learning framework and adjusting it whenever compilation fails ensures the correctness and validity of the modified deep learning framework, and improves the accuracy and efficiency of adapting the target deep learning framework to the heterogeneous accelerator.
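The compile, inspect log, adjust loop can be sketched as below. This is an illustration under assumptions: `run_compile`, `compile_until_pass` and the `max_attempts` cap are hypothetical (the application says only "until the modified deep learning framework passes compilation"), and the adjustment itself is left to a caller-supplied function.

```python
import subprocess
from typing import Callable, Sequence, Tuple


def run_compile(command: Sequence[str]) -> Tuple[bool, str]:
    """Run one preset compilation command; return (passed, error log)."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode == 0, completed.stderr


def compile_until_pass(
    compile_once: Callable[[], Tuple[bool, str]],
    adjust: Callable[[str], None],
    max_attempts: int = 5,  # assumption: bound the loop instead of retrying forever
) -> int:
    """Compile repeatedly, adjusting the framework after each failure.

    Returns the number of the attempt on which compilation passed.
    """
    for attempt in range(1, max_attempts + 1):
        passed, error_log = compile_once()
        if passed:
            return attempt
        adjust(error_log)  # use the logged errors to fix the framework
    raise RuntimeError(f"compilation still failing after {max_attempts} attempts")
```

In use, `compile_once` would typically wrap `run_compile` with the preset command for the chosen compilation environment, for example `lambda: run_compile(preset_command)`.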
In one embodiment, performing unit tests on the code of the modified deep learning framework includes:
performing at least one type of unit test on the modified deep learning framework;
if the test results of all types of unit tests fall within the test standard range, determining that the code of the modified deep learning framework passes the unit tests;
if not all of the test results fall within the test standard range, acquiring the failed test cases and the corresponding error information from the test results, and adjusting the modified deep learning framework based on the error information of the failed test cases, until the test results of all types of unit tests fall within the test standard range.
In the technical solution of this embodiment, at least one type of unit test is performed on the modified deep learning framework. If the test results of all types of unit tests fall within the test standard range, the code of the modified deep learning framework is determined to pass the unit tests; otherwise, the failed test cases and the corresponding error information are acquired from the test results, and the modified deep learning framework is adjusted accordingly until the test results of all types of unit tests fall within the test standard range. Setting a test standard range for the results, and treating a unit test as passed only when its result falls within that range, together with performing at least one type of unit test, makes the unit testing of the modified deep learning framework more comprehensive and its test results more accurate. Adjusting the modified deep learning framework according to the failed test cases whenever a result falls outside the range improves the accuracy and efficiency of the adjustment.
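The range check and the collection of failed cases can be sketched as follows. The application does not say what quantity the "test result" is; this sketch assumes it is the pass rate of each unit test type, and the function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple

Case = Dict[str, object]  # assumed keys: "name", "passed", "error"


def collect_out_of_range_failures(
    results: Dict[str, List[Case]],
    standard_ranges: Dict[str, Tuple[float, float]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Return failed cases for every test type whose pass rate is out of range.

    An empty dict means all types of unit tests reached the standard range.
    """
    failures: Dict[str, List[Tuple[str, str]]] = {}
    for test_type, cases in results.items():
        passed = sum(1 for case in cases if case["passed"])
        pass_rate = passed / len(cases) if cases else 0.0
        low, high = standard_ranges[test_type]
        if not (low <= pass_rate <= high):
            failures[test_type] = [
                (str(case["name"]), str(case["error"]))
                for case in cases
                if not case["passed"]
            ]
    return failures
```

The returned names and error messages are what an adjustment step would consume before the unit tests are run again.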
In one embodiment, performing a benchmark test on the code of the modified deep learning framework includes:
packaging the code of the modified deep learning framework according to a preset binary packaging command to generate a binary file;
training a network model on a preset data set based on the binary file to obtain a deep learning model;
performing multiple performance tests on the deep learning model;
if the deep learning model passes all of the performance tests, determining that the modified deep learning framework passes the benchmark test.
In the technical solution of this embodiment, the code of the modified deep learning framework is packaged according to a preset binary packaging command to generate a binary file, a network model is trained on a preset data set based on the binary file to obtain a deep learning model, and multiple performance tests are then performed on the deep learning model; if all of the performance tests pass, the modified deep learning framework is determined to pass the benchmark test. Because a network model is trained with the modified deep learning framework on a preset data set and the resulting deep learning model is performance tested, the modified deep learning framework is not only adapted to the heterogeneous accelerator but also highly accurate.
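The package, train, test sequence can be sketched as below. All names are hypothetical, and the packaging and training steps are passed in as callables because the application names neither the packaging command nor the training procedure; each performance test is modelled, as an assumption, as a predicate over measured metrics.

```python
from typing import Callable, Dict, Tuple

Metrics = Dict[str, float]


def run_benchmark(
    package: Callable[[], str],
    train: Callable[[str], Metrics],
    performance_tests: Dict[str, Callable[[Metrics], bool]],
) -> Tuple[bool, Dict[str, bool]]:
    """Package the framework, train a model, then run every performance test."""
    binary_path = package()       # preset binary packaging command
    metrics = train(binary_path)  # train on the preset data set
    results = {name: check(metrics) for name, check in performance_tests.items()}
    # The benchmark passes only if every performance test passes.
    return all(results.values()), results
```

Keeping the per-test results alongside the overall verdict makes it clear which performance test caused a failure.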
In a second aspect, the present application further provides a deep learning framework adaptation device, including:
an acquisition module, configured to acquire operation information and operation requirements of the heterogeneous accelerator, wherein the operation information and the operation requirements both represent information about the heterogeneous accelerator when it executes the computing tasks of the target deep learning framework;
a modification module, configured to modify the source code of the target deep learning framework according to the operation information and to modify the third-party library code of the target deep learning framework according to the operation requirements;
an evaluation module, configured to evaluate and verify the modified deep learning framework;
and a determining module, configured to determine the modified deep learning framework to be a deep learning framework adapted to the heterogeneous accelerator if the modified deep learning framework passes evaluation and verification.
In a third aspect, embodiments of the present application provide a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of the method provided by any of the embodiments of the first aspect described above when the computer program is executed.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method provided by any of the embodiments of the first aspect described above.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method provided by any of the embodiments of the first aspect described above.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings that are required to be used in the embodiments or the related technical descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to the drawings without inventive effort for a person having ordinary skill in the art.
FIG. 1 is an internal block diagram of a computer device in one embodiment;
FIG. 2 is a flow diagram of a deep learning framework adaptation method in one embodiment;
FIG. 3 is a flow chart of a deep learning framework adaptation method in another embodiment;
FIG. 4 is a flowchart of a deep learning framework adaptation method according to another embodiment;
FIG. 5 is a flowchart of a deep learning framework adaptation method according to another embodiment;
FIG. 6 is a flowchart of a deep learning framework adaptation method according to another embodiment;
FIG. 7 is a flow chart of a deep learning framework adaptation method in another embodiment;
FIG. 8 is a flow chart of a deep learning framework adaptation method in another embodiment;
FIG. 9 is a block diagram of a deep learning framework adaptation device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The deep learning framework adaptation method provided by the embodiment of the application can be applied to computer equipment. The computer device may be a server, which may be implemented as a stand-alone server or as a cluster of servers. The internal structure of the computer device may be as shown in fig. 1. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is for storing deep learning framework adaptation data. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a deep learning framework adaptation method.
It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
A deep learning framework is an open-source framework for machine learning that can be used to quickly build a neural network model and to quickly train, evaluate and save a network. A heterogeneous accelerator, also called a computing card, is hardware dedicated to processing computing tasks; it provides high-performance computing capability cost-effectively and is highly scalable. However, because heterogeneous accelerators are still at an early stage of development, there is currently no complete method for adapting a deep learning framework to a heterogeneous accelerator. Heterogeneous accelerators are incompatible or poorly compatible with deep learning frameworks, and their support for functions such as model training, inference and saving is incomplete. As a result, a software ecosystem for heterogeneous accelerators cannot be well established, which hinders the ecosystem development of domestic heterogeneous accelerators.
On this basis, to solve the problem that heterogeneous accelerators do not support, or do not fully support, deep learning frameworks, the embodiments of the application provide a deep learning framework adaptation method. The method helps the adaptation and testing work for a deep learning framework on a heterogeneous accelerator to be completed in a standardized and correct manner, and effectively supports fast, stable and correct operation of the deep learning framework on the heterogeneous accelerator, thereby improving the core competitiveness of the heterogeneous accelerator.
In an exemplary embodiment, as shown in fig. 2, a deep learning framework adaptation method is provided. Taking the application of the method to the computer device in fig. 1 as an example, the method includes the following steps:
S201, acquiring operation information and operation requirements of the heterogeneous accelerator, wherein the operation information and the operation requirements both represent information about the heterogeneous accelerator when it executes the computing tasks of a target deep learning framework.
The heterogeneous accelerator is a hardware device that provides high-performance computing capability and works together with a main processor to accelerate a specific type of computing task; for example, a heterogeneous accelerator may accelerate the computing tasks of a deep learning framework. Optionally, the heterogeneous accelerator may be a graphics processing unit (GPU), a field programmable gate array (FPGA), a tensor processing unit (TPU), a non-volatile memory (NVM), an NVM accelerator, or the like. The target deep learning framework can be any deep learning framework whose computing tasks need to be accelerated by a heterogeneous accelerator; for example, the target deep learning framework may be TensorFlow, PyTorch, OneFlow, or the like.
The operation information and operation requirements of the heterogeneous accelerator when it executes the computing tasks of the target deep learning framework may be acquired as follows: first, attribute information of the target deep learning framework is acquired, which may include, for example, the version of the target deep learning framework and the hardware platforms and operating systems it supports; then, the operation information and operation requirements of the heterogeneous accelerator are determined from the attribute information of the target deep learning framework and from the correspondences between attribute information and, respectively, the operation information and the operation requirements of the heterogeneous accelerator.
Specifically, the correspondences include the correspondence between the attribute information of a deep learning framework and the operation information of the heterogeneous accelerator, and the correspondence between the attribute information of a deep learning framework and the operation requirements of the heterogeneous accelerator. The correspondences may cover the attribute information of a plurality of different deep learning frameworks; the operation information and operation requirements corresponding to the attribute information of each deep learning framework may be the same or different, depending on the actual situation.
The operation information and operation requirements of the heterogeneous accelerator that correspond to the target deep learning framework can therefore be obtained from the correspondences.
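Such correspondences can be pictured as a lookup table keyed by the attribute information of the framework. The sketch below is only an assumption about one possible representation: the class name, the table contents (framework name, versions, driver and storage values) and the function name are all hypothetical.

```python
from typing import Dict, NamedTuple, Tuple


class FrameworkAttributes(NamedTuple):
    name: str
    version: str
    operating_system: str


# Hypothetical correspondence table:
# attribute information -> (operation information, operation requirements)
CORRESPONDENCE: Dict[FrameworkAttributes, Tuple[dict, dict]] = {
    FrameworkAttributes("ExampleFramework", "2.0", "linux"): (
        {"sdk_version": "1.4", "driver": "example-driver-5"},
        {"operating_system": "linux", "min_storage_gb": 32},
    ),
}


def lookup_operation_data(
    attributes: FrameworkAttributes,
    table: Dict[FrameworkAttributes, Tuple[dict, dict]] = CORRESPONDENCE,
) -> Tuple[dict, dict]:
    """Return (operation information, operation requirements) for a framework."""
    try:
        return table[attributes]
    except KeyError:
        raise LookupError(f"no correspondence recorded for {attributes}") from None
```

When no entry exists, the alternative described in the application, testing the framework on the accelerator to determine the operation information and requirements, would be the fallback.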
In an alternative embodiment, the target deep learning framework may also be tested on heterogeneous accelerators to determine the operational information and operational requirements that the heterogeneous accelerators produce when performing the computing tasks of the target deep learning framework.
The operation information may include the driver, software development kit (SDK) version, hardware specification, computing capability, memory bandwidth and the like of the heterogeneous accelerator; the operation requirements may include storage requirements, operating system requirements, driver requirements and the like.
S202, modifying the source code of the target deep learning framework according to the operation information, and modifying the third-party library code of the target deep learning framework according to the operation requirements.
In one embodiment, the code to be modified in the target deep learning framework can be determined according to a preset code modification model. For example, the operation information is input into the code modification model, which analyzes the operation information of the heterogeneous accelerator to obtain a modification mode for the source code of the target deep learning framework, and the source code is modified based on that modification mode; likewise, the operation requirements are input into the code modification model, which analyzes the operation requirements of the heterogeneous accelerator to obtain a modification mode for the third-party library of the target deep learning framework, and the third-party library code is modified based on that modification mode.
Optionally, the source code of the target deep learning framework may instead be modified as follows: a modification mode for the source code is determined according to the operation information of the heterogeneous accelerator and the type of the target deep learning framework, and the source code is modified according to that modification mode. Correspondingly, the third-party library may be modified as follows: a modification mode for the third-party library is determined according to the operation requirements of the heterogeneous accelerator and the type of the target deep learning framework, and the third-party library code of the target deep learning framework is modified according to that modification mode.
Both the modification mode of the source code and the modification mode of the third-party library may include the location of the code to be modified and the correct code with which to replace it at that location.
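As a minimal sketch, a modification mode consisting of a location and replacement code could be applied as shown below. Representing the location as a 1-based line number is an assumption made for illustration; the names `Modification` and `apply_modifications` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Modification:
    line_no: int      # 1-based line number of the code to be modified (assumed form of "location")
    replacement: str  # correct code to place at that location


def apply_modifications(source: str, mods: list) -> str:
    """Replace each targeted line of the source text with its replacement code."""
    lines = source.splitlines()
    for mod in mods:
        if not 1 <= mod.line_no <= len(lines):
            raise IndexError(f"line {mod.line_no} is outside the source")
        lines[mod.line_no - 1] = mod.replacement
    return "\n".join(lines)


original = "load_backend('default')\nrun()"
patched = apply_modifications(original, [Modification(1, "load_backend('accelerator')")])
```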
S203, evaluating and verifying the modified deep learning framework.
The modified deep learning framework can be evaluated and verified to ensure functional correctness, performance optimization, compatibility, robustness, reliability, and the like.
Thus, the manner of evaluating and verifying the modified deep learning framework may include at least one of: evaluating the performance of the modified deep learning framework, verifying the functionality of the modified deep learning framework, testing the compatibility of the modified deep learning framework, and testing the robustness of the modified deep learning framework.
A performance test of the modified deep learning framework can compare the performance of the modified deep learning framework with that of the original target deep learning framework in the same hardware environment, for example on the same heterogeneous accelerator. The performance of the modified deep learning framework on the heterogeneous accelerator can be evaluated by using a standard deep learning benchmark dataset (for example, ImageNet) and common evaluation indexes (for example, accuracy, training time, inference time, and the like) and comparing the resulting evaluation data.
Functional verification of the modified deep learning framework ensures that it works normally in various deep learning tasks (such as image classification, object detection, speech recognition, and the like). The functional verification may use a preset standard test dataset and test cases to test the modified deep learning framework and verify whether it correctly processes input data, generates correct output, and performs stably in various scenarios.
A compatibility test of the modified deep learning framework ensures that it can be integrated with common deep learning models. The compatibility test may use some common deep learning models (such as ResNet, the YOLO series, and the like) to test the modified deep learning framework and verify that it can load and run these models correctly.
A robustness test of the modified deep learning framework confirms how it behaves under various abnormal conditions. The robustness test may simulate some common abnormal conditions, for example missing input data, noise interference, erroneous model parameters, and the like, and check whether the modified deep learning framework handles them correctly and exhibits good robustness and fault tolerance.
It should be noted that the manner of evaluating and verifying the modified deep learning framework may be determined according to specific requirements and application scenarios, and carried out in combination with actual test data and test cases. Optionally, the repeatability and scalability of the tests may also be considered, so that repeated tests and extended verification can be performed on the modified deep learning framework under different environments and configurations.
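The performance comparison between the original and the modified framework could, for instance, be expressed as a relative change per evaluation index. The sketch below is only an illustration; the metric names and all numbers are invented placeholders, not measured results.

```python
def relative_change(baseline: float, modified: float) -> float:
    """Fractional change of a metric relative to the baseline value."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (modified - baseline) / baseline


def compare_metrics(baseline: dict, modified: dict) -> dict:
    """Compare every metric that appears in both measurement sets."""
    shared = baseline.keys() & modified.keys()
    return {name: relative_change(baseline[name], modified[name]) for name in shared}


# Placeholder values for illustration only.
baseline = {"accuracy": 0.75, "training_time_s": 1000.0, "inference_time_ms": 8.0}
modified = {"accuracy": 0.75, "training_time_s": 900.0, "inference_time_ms": 10.0}
report = compare_metrics(baseline, modified)
```

With these placeholder inputs, a negative value for a time metric indicates that the modified framework was faster, and a positive value that it was slower.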
S204, if the modified deep learning framework passes the evaluation and verification, determining the modified deep learning framework as a deep learning framework adapted to the heterogeneous accelerator.
Based on the evaluation and verification described above, if the modified deep learning framework passes the evaluation and verification, it may be determined that the performance verification of the modified deep learning framework has passed, and the modified deep learning framework may be determined as a deep learning framework adapted to the heterogeneous accelerator.
If the evaluation and verification includes multiple verification modes, the condition for passing may be that the modified deep learning framework passes all of the verification modes.
In the embodiment of the application, the operation information and the operation requirements of the heterogeneous accelerator are first acquired, both of which represent information arising when the heterogeneous accelerator executes the computing tasks of the target deep learning framework. The source code of the target deep learning framework is modified according to the operation information, the third-party library code of the target deep learning framework is modified according to the operation requirements, and the modified deep learning framework is then evaluated and verified; if it passes, the modified deep learning framework is determined as a deep learning framework adapted to the heterogeneous accelerator. Because the operation information and the operation requirements both arise when the heterogeneous accelerator executes the computing tasks of the target deep learning framework, modifying the source code and the third-party library accordingly means that the adaptation takes into account how the target deep learning framework actually runs on the heterogeneous accelerator. This improves the compatibility between the target deep learning framework and the heterogeneous accelerator and ensures stable operation of the target deep learning framework on the heterogeneous accelerator. Moreover, evaluating and verifying the modified deep learning framework further ensures its correctness and reliability.
The operation information comprises runtime interface information and the dependency library files used when the heterogeneous accelerator uses a math library. In one exemplary embodiment, modifying the source code of the target deep learning framework according to the operation information includes: modifying the corresponding interface code in the source code of the target deep learning framework according to the runtime interface information; and modifying the corresponding library file path in the source code of the target deep learning framework according to the dependency library files used when the heterogeneous accelerator uses the math library.
The runtime interface information may describe the interfaces that need to be invoked when the heterogeneous accelerator executes a computing task of the target deep learning framework; thus, when adapting the target deep learning framework to the heterogeneous accelerator, the corresponding interface code in the source code of the target deep learning framework may be modified according to the runtime interface information.
Optionally, the corresponding interface in the source code of the target deep learning framework is determined according to the interface in the runtime interface information, wherein the interface in the runtime interface information is consistent with the corresponding interface in the source code of the target deep learning framework; a modification mode for the interface is then determined according to the function and parameters of the interface in the runtime interface information, and the corresponding interface code in the source code of the target deep learning framework is modified according to that modification mode. The modification mode of the interface may cover the interface calling mode, the data transmission mode, the calculation logic, the algorithm, and the like.
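As one hedged illustration of modifying interface code, the sketch below assumes the adaptation is from the CUDA runtime to the HIP runtime used on ROCm, and renames three runtime calls by whole-word matching. The application does not prescribe this mapping or this technique, and a real port would generally involve more than textual renaming (calling conventions, data transfer, and calculation logic may also change).

```python
import re

# Illustrative mapping from CUDA runtime calls to their HIP counterparts.
INTERFACE_MAP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
}

_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, INTERFACE_MAP)) + r")\b")


def rewrite_interfaces(source: str) -> str:
    """Replace each whole-word occurrence of a mapped interface name."""
    return _PATTERN.sub(lambda match: INTERFACE_MAP[match.group(1)], source)


snippet = "cudaMalloc(&buf, n); cudaMemcpyAsync(dst, src, n); cudaFree(buf);"
rewritten = rewrite_interfaces(snippet)
```

Whole-word matching leaves `cudaMemcpyAsync` untouched because it is not in the mapping, which makes unmapped interfaces easy to spot for manual review.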
The dependency library files used when the heterogeneous accelerator uses the math library may be the dependencies of the various libraries used by the target deep learning framework when it executes computing tasks on the heterogeneous accelerator; therefore, when adapting the target deep learning framework to the heterogeneous accelerator, the corresponding library file path in the source code of the target deep learning framework can be modified according to these dependency library files.
Optionally, the dependency relationships between library files are obtained first, that is, all library files on which the math library files depend; then the paths of the dependency library files used when the heterogeneous accelerator uses the math library are obtained, and the corresponding library file paths in the source code of the target deep learning framework are modified according to those paths.
Modifying the corresponding library file path in the source code of the target deep learning framework may include modifying the search path of the library file: the part of the source code that loads the math library file is located, and the corresponding library file search path is modified. This is typically accomplished by modifying an environment variable, a configuration file, or a path string in the code.
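For the environment-variable variant of modifying the library search path, a minimal sketch follows. It assumes a Linux-style `LD_LIBRARY_PATH` with `:` as separator; the directory `/opt/accel/mathlib/lib` is a hypothetical location of the accelerator's math library.

```python
def prepend_library_path(env: dict, lib_dir: str, var: str = "LD_LIBRARY_PATH") -> dict:
    """Return a copy of env in which lib_dir is searched first."""
    updated = dict(env)
    # Drop empty entries and any earlier occurrence of lib_dir to avoid duplicates.
    existing = [p for p in updated.get(var, "").split(":") if p and p != lib_dir]
    updated[var] = ":".join([lib_dir, *existing])
    return updated


env = prepend_library_path({"LD_LIBRARY_PATH": "/usr/lib"}, "/opt/accel/mathlib/lib")
```

The function returns a copy rather than mutating the input, so the resulting mapping can be passed to a build or test subprocess without altering the calling process.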
Optionally, the kernel functions (kernels) of the target deep learning framework may also be modified according to the characteristics of the heterogeneous accelerator.
In the embodiment of the application, the corresponding interface code in the source code of the target deep learning framework is modified according to the runtime interface information, and the corresponding library file path in the source code of the target deep learning framework is modified according to the dependency library files used when the heterogeneous accelerator uses the math library. This amounts to adapting the interface code and library file paths between the target deep learning framework and the heterogeneous accelerator in advance, so that the corresponding files can be found when the target deep learning framework is later compiled and run, which improves the compatibility between the target deep learning framework and the heterogeneous accelerator.
The operation requirements comprise an execution mode. In one exemplary embodiment, as shown in FIG. 3, modifying the third-party library code of the target deep learning framework according to the operation requirements includes the following steps:
S301, acquiring the third-party library code of the target deep learning framework according to the storage location of the third-party library code.
The execution mode may be the mode in which the heterogeneous accelerator executes at the bottom layer; for example, execution in the mode of the Compute Unified Device Architecture (CUDA) development platform or in the mode of the Radeon Open Compute (ROCm) development platform.
The storage location of the third-party library code can be obtained from the source code of the target deep learning framework, and the third-party library code of the target deep learning framework is then obtained directly from that storage location.
Optionally, obtaining some third-party libraries may require access to an external network. In this case, the third-party library packages may be pulled to a local or self-built Git server, and the third-party library fetch locations in the code of the target deep learning framework may then be modified, so that the third-party libraries can subsequently be downloaded and compiled correctly.
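Redirecting third-party library fetch locations to a self-built server might look like the following sketch. The mirror layout (host name followed by the original path), the example URLs, and the function names are assumptions made for illustration only.

```python
from urllib.parse import urlparse


def mirror_url(url: str, mirror_base: str) -> str:
    """Map an external archive URL onto a path under the internal mirror."""
    parsed = urlparse(url)
    return f"{mirror_base.rstrip('/')}/{parsed.netloc}{parsed.path}"


def redirect_fetch_locations(build_text: str, urls: list, mirror_base: str) -> str:
    """Rewrite every listed external URL in the build configuration text."""
    for url in urls:
        build_text = build_text.replace(url, mirror_url(url, mirror_base))
    return build_text


external = "https://example.com/libs/foo-1.0.tar.gz"  # hypothetical third-party archive
config = f'urls = ["{external}"]'
redirected = redirect_fetch_locations(config, [external], "http://git.internal.example/mirror/")
```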
S302, acquiring the instruction set code of the heterogeneous accelerator based on the execution mode of the heterogeneous accelerator, and modifying the third-party library code according to the instruction set code.
The instruction set code of the heterogeneous accelerator is determined based on a correspondence between execution modes and instruction set codes. Specifically, the instruction set code that the correspondence associates with the execution mode used by the heterogeneous accelerator when executing the computing tasks of the target deep learning framework is determined as the instruction set code of the heterogeneous accelerator, and the third-party library code is then modified according to that instruction set code.
The third-party library code may be modified by adding the instruction set code to the third-party library code, or by using it to replace the instruction set code of the original heterogeneous accelerator in the third-party library.
For example, if the third-party library file is an LLVM library file, the instruction set code corresponding to the heterogeneous accelerator may be added to the LLVM library file, so that programs of the target deep learning framework can run correctly and stably on the corresponding heterogeneous accelerator.
In the embodiment of the application, the third-party library code of the target deep learning framework is obtained according to the storage location of the third-party library code, the instruction set code of the heterogeneous accelerator is obtained based on the execution mode of the heterogeneous accelerator, and the third-party library code is modified according to the instruction set code. Modifying the third-party library adapts it to the instruction set of the heterogeneous accelerator, which solves the problem of code usability on the heterogeneous accelerator; at the same time, modifying the third-party library code of the target deep learning framework can improve the execution efficiency of the code on the heterogeneous accelerator.
In one exemplary embodiment, as shown in FIG. 4, evaluating and verifying the modified deep learning framework comprises the following steps:
S401, compiling the modified deep learning framework.
Compiling can improve execution efficiency and can detect syntax errors, type errors, some common logic errors, and the like in the code; therefore, the modified deep learning framework is compiled first.
In one exemplary embodiment, as shown in FIG. 5, compiling the modified deep learning framework comprises the following steps:
S501, determining the compiling environment of the modified deep learning framework according to the configuration version of the target deep learning framework.
Before the modified deep learning framework is compiled, the relevant compiling environment needs to be configured.
The compiling environment corresponding to the configuration version of the target deep learning framework can be obtained from a correspondence between deep learning framework versions and compiling environments, and is then determined as the compiling environment of the modified deep learning framework.
In one embodiment, taking TensorFlow as the target deep learning framework, the corresponding Python package types may be determined according to the TensorFlow configuration version and the Python version, and the corresponding Python packages are then installed.
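Determining the compiling environment from a version correspondence could be sketched as below. The version label `2.x`, the supported Python versions, and the package list are placeholders and do not describe the requirements of any actual TensorFlow release.

```python
# Hypothetical correspondence between framework versions and compiling environments.
COMPILE_ENVIRONMENTS = {
    "2.x": {"python": ["3.9", "3.10"], "build_tool": "bazel", "packages": ["numpy", "wheel"]},
}


def resolve_environment(framework_version: str, python_version: str) -> dict:
    """Return the build tool and Python packages to install for this configuration."""
    env = COMPILE_ENVIRONMENTS.get(framework_version)
    if env is None:
        raise LookupError(f"no compiling environment recorded for {framework_version}")
    if python_version not in env["python"]:
        raise ValueError(f"Python {python_version} is not listed for {framework_version}")
    return {
        "python": python_version,
        "build_tool": env["build_tool"],
        "packages": list(env["packages"]),
    }


resolved = resolve_environment("2.x", "3.10")
```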
S502, compiling the modified deep learning framework according to the compiling environment and a preset compiling command.
The compiling command is used to compile the modified deep learning framework; optionally, after the compiling environment is configured, the modified deep learning framework is compiled according to the preset compiling command.
The modified deep learning framework is compiled by a compiling tool according to the preset compiling command; for example, the compiling tool may be Bazel, CMake, or the like. In the case where the compiling tool is Bazel, the compiling command may be a bazel build command.
Optionally, the modified deep learning framework can be compiled on a CUDA development platform, a ROCm development platform, or a custom software platform; the platform on which the modified deep learning framework is compiled can be determined according to actual requirements.
Optionally, compiling the modified deep learning framework essentially means compiling its source code; the modified source code may be compiled into an executable file or a library file.
S503, if the compiling fails, adjusting the modified deep learning framework according to the log error information generated by the compiling failure, until the modified deep learning framework compiles successfully.
After the modified deep learning framework is compiled, a compiling result is output; the compiling result is either success or failure.
If the compiling fails, log error information is output, so that the cause of the failure can be located from it. The modified deep learning framework is then adjusted, and the source code of the adjusted deep learning framework is compiled again. If the compiling fails again, the framework is adjusted again according to the newly generated log error information, and this is repeated until the compiling succeeds.
Optionally, the manner of adjusting the modified deep learning framework includes: modifying the source code of the modified deep learning framework, modifying the third-party library code, or modifying the compiling environment.
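The compile, inspect log, adjust, and recompile cycle of S503 can be sketched as a loop over two injected callables. The attempt limit is an added safeguard that is not part of the described method, and the simulated build and the header name `accel_math.h` are invented for the demonstration; in practice `compile_once` might wrap an invocation of the build tool.

```python
from typing import Callable, Tuple


def compile_until_success(
    compile_once: Callable[[], Tuple[bool, str]],
    adjust: Callable[[str], None],
    max_attempts: int = 5,
) -> int:
    """Return the number of attempts needed; raise if the limit is exhausted."""
    for attempt in range(1, max_attempts + 1):
        ok, log = compile_once()
        if ok:
            return attempt
        # Adjust source code, third-party library code, or the compiling environment.
        adjust(log)
    raise RuntimeError(f"compilation still failing after {max_attempts} attempts")


# Simulated build that fails until the reported problem has been "fixed".
state = {"fixed": False}


def fake_compile() -> Tuple[bool, str]:
    return (True, "") if state["fixed"] else (False, "error: accel_math.h not found")


def fake_adjust(log: str) -> None:
    if "not found" in log:
        state["fixed"] = True


attempts = compile_until_success(fake_compile, fake_adjust)
```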
In the embodiment of the application, the compiling environment of the modified deep learning framework is determined according to the configuration version of the target deep learning framework, and the modified deep learning framework is compiled according to the compiling environment and a preset compiling command; if the compiling fails, the modified deep learning framework is adjusted according to the log error information generated by the compiling failure, until it compiles successfully. Compiling the modified deep learning framework and adjusting it whenever the compiling fails ensures the correctness and validity of the modified deep learning framework and improves the accuracy and efficiency of adapting the target deep learning framework to the heterogeneous accelerator.
S402, if the compiling passes, performing unit testing on the code of the modified deep learning framework.
Unit testing is an important step in ensuring that code quality and functionality are correct; therefore, once the modified deep learning framework compiles successfully, unit testing can be performed on its code.
Unit testing of the code of the modified deep learning framework may proceed as follows: the code is divided into a plurality of small test units, each of which may test a particular function or behavior; test cases are acquired for each test unit; each test case is then run under a preset unit test framework to test each test unit, and the test result of each test case is determined. If any test case fails, the test unit corresponding to that test case fails, and the code unit test of the modified deep learning framework is determined to have failed; if all test cases pass, the code unit test of the modified deep learning framework is determined to have passed.
The test cases of each test unit may be obtained by writing test cases for each test unit according to its function, or by directly retrieving from a database the test cases corresponding to the function of each test unit. A test case may include input data, expected output, assertion statements, and the like, and the test cases should cover as many code paths and boundary conditions as possible.
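To illustrate what a test case with input data, expected output, and an assertion might look like, the sketch below uses Python's standard `unittest` module. The `relu` function is a stand-in for one small unit of framework functionality and is not code from any real framework.

```python
import unittest


def relu(values):
    """Stand-in for one small unit of framework functionality."""
    return [v if v > 0 else 0.0 for v in values]


class ReluTest(unittest.TestCase):
    def test_mixed_signs(self):
        # Input data, expected output, and the assertion comparing them.
        self.assertEqual(relu([-1.0, 0.0, 2.5]), [0.0, 0.0, 2.5])

    def test_empty_input_boundary(self):
        # Boundary condition: an empty input yields an empty output.
        self.assertEqual(relu([]), [])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ReluTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The overall unit test is treated as passed only if every case passes, which corresponds to `result.wasSuccessful()` here.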
S403, if the unit test passes, continuing to perform a benchmark test on the code of the modified deep learning framework.
A benchmark test can evaluate the performance and efficiency of the modified deep learning framework; therefore, once the code unit test of the modified deep learning framework passes, a benchmark test can be performed on its code.
The benchmark test of the modified deep learning framework may proceed as follows: a benchmark test task and a benchmark test dataset are obtained for the modified deep learning framework, wherein the benchmark test task can be determined according to the application field and usage scenario, and the benchmark test dataset comprises a plurality of samples and scenarios; a plurality of benchmark test experiments, such as training and inference, are determined according to the benchmark test task and the benchmark test dataset; and the benchmark test experiments are run with the modified deep learning framework in the configured environment to obtain the benchmark test results.
Optionally, the benchmark test result is either that the code of the modified deep learning framework passes the benchmark test or that it fails the benchmark test.
If the benchmark test fails, the modified deep learning framework can be adjusted, and compiling, unit testing, and benchmark testing are performed on it again until all three pass. The manner of adjusting the modified deep learning framework includes: modifying the source code of the modified deep learning framework, modifying the third-party library code, or modifying the compiling environment.
S404, if the benchmark test passes, determining that the modified deep learning framework passes the evaluation and verification.
When the code of the modified deep learning framework passes the benchmark test, the compiling, the unit test, and the benchmark test have all passed, and the modified deep learning framework is therefore determined to have passed the evaluation and verification, which comprises compiling, unit testing, and benchmark testing.
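The gating of steps S401 to S404, in which each stage runs only if the previous one passed, can be sketched as follows. The stage callables are hypothetical stand-ins for the actual compiling, unit testing, and benchmark testing.

```python
from typing import Callable, Optional, Sequence, Tuple


def evaluate_framework(stages: Sequence[Tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Run stages in order; return the name of the first failing stage, or None."""
    for name, run in stages:
        if not run():
            return name
    return None


executed = []


def stage(name: str, outcome: bool) -> Callable[[], bool]:
    """Build a stand-in stage that records its execution and returns a fixed outcome."""
    def run() -> bool:
        executed.append(name)
        return outcome
    return run


failed_at = evaluate_framework([
    ("compile", stage("compile", True)),
    ("unit_test", stage("unit_test", False)),
    ("benchmark", stage("benchmark", True)),
])
```

In this example the benchmark stage never runs because the unit test stage fails, so later tests are only ever executed against code that has passed the earlier checks.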
In the embodiment of the application, the modified deep learning framework is compiled; if the compiling passes, unit testing is performed on the code of the modified deep learning framework; if the unit test passes, a benchmark test is then performed on the code; and if the benchmark test passes, the modified deep learning framework is determined to have passed the evaluation and verification. The unit test verifies whether the functions of each unit of the modified deep learning framework are normal, and the benchmark test evaluates the reliability of its performance; performing both on code that has already compiled successfully further improves the accuracy and reliability of the modified deep learning framework. Because the unit test is performed only after the compiling passes, and the benchmark test only after the unit test passes, each test is run on code that is already known to be correct at the previous level, which improves the reliability of the evaluation and verification.
The following describes in detail, through one embodiment, how unit testing is performed on the code of the modified deep learning framework. In one exemplary embodiment, as shown in FIG. 6, the unit testing comprises the following steps:
S601, performing at least one type of unit test on the modified deep learning framework.
The unit test is a functional verification test comprising a large number of test cases. The types of unit test may include C++-based unit tests (ctest) and Python-based unit tests (pytest); thus, both ctest-based and pytest-based unit tests may be performed on the modified deep learning framework.
Each type of unit test can be executed according to a preset unit test instruction to obtain the unit test results of each type. The unit test results may take one of several states, such as passed (PASSED), failed (FAILED), timed out (TIMEOUT), and unstable (FLAKY); a TIMEOUT result may record how long the timeout lasted, and a FLAKY result may identify the test cases that failed in some of their runs.
Optionally, the unit test instruction may perform unit testing on the code of the modified deep learning framework on a Compute Unified Device Architecture (CUDA) development platform, a Radeon Open Compute (ROCm) platform, or a custom software platform; the platform on which the modified deep learning framework is unit tested can be determined according to actual requirements.
S602, if the test results of all types of unit test reach the test standard range, determining that the code unit test of the modified deep learning framework passes.
The test standard range may be a preset standard for evaluating whether the test result of a unit test counts as passing.
For example, the test results include passed, failed, timed out, unstable, and the like, and the test standard range may be defined as the test result being "passed". In that case, if the test results of all types of unit test are "passed", the code unit test of the modified deep learning framework is determined to pass; otherwise, it is determined not to pass.
S603, if the test results of the unit tests do not all reach the test standard range, acquiring the failed test cases and the corresponding error information according to the test results, and adjusting the modified deep learning framework based on the error information corresponding to the failed test cases, until the test results of all types of unit test reach the test standard range.
If the test result of a unit test is not "passed", it is determined that the test result of that unit test does not reach the test standard range; the failed test cases and the corresponding error information are acquired, and the modified deep learning framework is adjusted based on that error information until the test results of all types of unit test reach the test standard range.
Adjusting the modified deep learning framework based on the error information corresponding to a failed test case may consist of analyzing the code of the failed test case according to the error information, determining the cause of the failure, and adjusting the modified deep learning framework according to that cause.
The manner of adjusting the modified deep learning framework includes: modifying the source code of the modified deep learning framework, modifying the third-party library code, or modifying the compiling environment.
After the modified deep learning framework is adjusted, it is compiled again; if the compiling passes, the ctest-based and pytest-based unit tests are performed on it again, until the test results of the unit tests reach the test standard range; otherwise, the modified deep learning framework continues to be adjusted according to the error information.
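The check against the test standard range in S602 and the collection of failed cases in S603 might be combined as in the sketch below. It assumes the test standard range is the single status PASSED; the test case names and error messages are invented for illustration.

```python
ACCEPTED_STATUSES = {"PASSED"}  # assumed test standard range


def evaluate_unit_tests(results: dict, errors: dict, accepted=ACCEPTED_STATUSES):
    """Return (passed, failures); failures lists every case outside the accepted range."""
    failures = [
        (test_type, case, status, errors.get((test_type, case), ""))
        for test_type, cases in results.items()
        for case, status in cases.items()
        if status not in accepted
    ]
    return (not failures, failures)


# Invented results for the two unit test types named in the text.
results = {
    "ctest": {"kernel_add": "PASSED", "kernel_matmul": "TIMEOUT"},
    "pytest": {"test_layers": "PASSED", "test_io": "FAILED"},
}
errors = {
    ("ctest", "kernel_matmul"): "exceeded time limit",
    ("pytest", "test_io"): "AssertionError: shape mismatch",
}
passed, failures = evaluate_unit_tests(results, errors)
```

The `failures` list pairs each failed case with its error information, which is the input needed for deciding how to adjust the framework before recompiling and retesting.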
In the embodiment of the application, at least one type of unit test is performed on the modified deep learning framework. If the test results of all types of unit test reach the test standard range, the code unit test of the modified deep learning framework is determined to pass; if they do not, the failed test cases and the corresponding error information are acquired according to the test results, and the modified deep learning framework is adjusted based on that error information until the test results of all types of unit test reach the test standard range. A test standard range is set for the unit test results, and a unit test is determined to pass only when its result reaches that range; performing at least one type of unit test makes the unit testing of the modified deep learning framework more comprehensive and its test results more accurate. When a unit test result does not reach the test standard range, the modified deep learning framework is adjusted according to the error information of the corresponding failed test cases, which improves the accuracy and efficiency of the adjustment.
The following describes in detail, through one embodiment, how the benchmark test is performed on the code of the modified deep learning framework. In one exemplary embodiment, as shown in FIG. 7, performing the benchmark test on the code of the modified deep learning framework includes the following steps:
S701, packaging the code of the modified deep learning framework according to a preset binary packaging command to generate a binary file.
When the modified deep learning framework has compiled successfully and the test results of all types of unit test reach the test standard range, the binary packaging command is executed to package the modified deep learning framework into a binary file. The binary packaging command may be build_pip_package, and the binary file is a whl package.
Optionally, the whl packaging command of build_pip_package is executed to package the modified deep learning framework as a whl package.
It should be noted that, if the name of the binary file needs to be modified, the package name of the binary file may be modified in the binary file.
Meanwhile, a preset build instruction can be executed on the modified deep learning framework to build the libtensorflow.so library file, which corresponds to the C-language application programming interface (API), and the libtensorflow_cc.so library file, which corresponds to the C++-language API. The build instruction may be executed on a Compute Unified Device Architecture (CUDA) development platform, the Radeon Open Compute (ROCm) open-source computing platform, or a custom software platform; the platform used to build the library files of the modified deep learning framework can be chosen according to actual requirements.
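As a rough sketch of the packaging and library build steps above, the following Python helper only assembles the command lines and does not run them. The Bazel labels and script path follow the layout of TensorFlow source trees that still ship the build_pip_package script and should be checked against the tree actually being adapted; the helper name `packaging_commands` and the flag passed in the usage note are assumptions, not part of the application.

```python
from typing import List, Sequence

# Bazel labels as found in TensorFlow source trees that provide the
# build_pip_package script; verify them against the tree being adapted.
PIP_PACKAGE_TARGET = "//tensorflow/tools/pip_package:build_pip_package"
C_API_TARGET = "//tensorflow:libtensorflow.so"
CC_API_TARGET = "//tensorflow:libtensorflow_cc.so"
PIP_PACKAGE_SCRIPT = "./bazel-bin/tensorflow/tools/pip_package/build_pip_package"


def packaging_commands(output_dir: str, bazel_flags: Sequence[str] = ()) -> List[List[str]]:
    """Return the commands for building the whl package and the C/C++ libraries."""
    flags = list(bazel_flags)
    return [
        # 1. Build the packaging tool itself.
        ["bazel", "build", *flags, PIP_PACKAGE_TARGET],
        # 2. Run the tool to write the whl package into output_dir.
        [PIP_PACKAGE_SCRIPT, output_dir],
        # 3. Build the C and C++ API shared libraries.
        ["bazel", "build", *flags, C_API_TARGET, CC_API_TARGET],
    ]
```

Each returned command could be passed to `subprocess.run(cmd, check=True)` from the root of the source tree, for example with `bazel_flags=["--config=rocm"]` on a ROCm platform; a custom software platform would need its own configuration.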
S702, training a network model through a preset data set based on the binary file to obtain a deep learning model.
The preset data set may be the ImageNet or COCO data set, and the network model may be a network model such as ResNet50, VGGNet, or InceptionNet.
Optionally, training the network model through the preset data set may consist of a long-running, unattended training job of 90 iteration cycles (epochs) on the data set, based on the binary file, to obtain the deep learning model.
S703, performing various performance tests on the deep learning model.
Various performance tests are performed on the trained deep learning model. The performance tests may include types such as performance benchmark tests, accuracy tests, memory usage tests, stability tests, single-process single-card tests, single-process multi-card tests, and multi-process multi-card tests.
The result of each performance test can be compared with publicly disclosed data for a plurality of heterogeneous accelerators, or with performance test results obtained by running the same tests on the related heterogeneous accelerators, and whether the performance test of the deep learning model passes is determined based on the comparison.
Optionally, the result of each performance test on the deep learning model can be compared with the standard test result of that performance test. For any performance test, if the test result is the same as the corresponding standard test result, that performance test of the deep learning model is determined to pass; if the test result is inconsistent with the corresponding standard test result, that performance test is determined to fail. The test result of each performance test of the deep learning model can be determined in this way.
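The comparison against standard test results can be illustrated with a short Python sketch. The text above describes a pass as the result being the same as the standard result; the optional relative tolerance `rel_tol` is an assumption added here because measured throughput rarely matches a reference exactly, and the metric names used by a caller are hypothetical.

```python
import math
from typing import Dict


def evaluate_performance(
    measured: Dict[str, float],
    standard: Dict[str, float],
    rel_tol: float = 0.0,  # 0.0 means "must be the same"; larger values are an assumption
) -> Dict[str, bool]:
    """Return a pass/fail verdict for every performance test in `standard`."""
    verdicts: Dict[str, bool] = {}
    for name, expected in standard.items():
        value = measured.get(name)
        # A test with no measured value is treated as failed.
        verdicts[name] = value is not None and math.isclose(
            value, expected, rel_tol=rel_tol, abs_tol=0.0
        )
    return verdicts


def benchmark_passed(verdicts: Dict[str, bool]) -> bool:
    """The benchmark passes only if every performance test passed."""
    return bool(verdicts) and all(verdicts.values())
```

Keeping per-test verdicts, rather than a single boolean, makes it possible to see which performance test failed when the framework has to be adjusted.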
S704, if all performance tests of the deep learning model pass, determining that the benchmark test of the modified deep learning framework passes.
If all performance tests of the deep learning model pass, the deep learning model performs well, and the benchmark test of the modified deep learning framework is determined to pass.
If any performance test of the deep learning model fails, the modified deep learning framework can be adjusted according to the failed performance test and the corresponding errors.
Specifically, the modified deep learning framework can be adjusted by compiling and debugging its source code. The adjustment may include modifying the source code of the modified deep learning framework, modifying the third-party library code, or modifying the compiling environment.
Compiling and debugging the source code of the modified deep learning framework can be performed on a unified computing device architecture (Compute Unified Device Architecture, CUDA) development platform, an open source computing platform (Radeon Open Compute, ROCm) or a custom software platform; the platform for compiling and debugging the modified deep learning framework can be determined according to actual requirements.
A performance test problem can be located, debugged, and resolved through the source code of the modified deep learning framework, after which the performance tests are run again. This is repeated until every performance test passes; while any test still fails, the cause of the error continues to be located and the modified deep learning framework continues to be adjusted.
In this embodiment of the application, the code of the modified deep learning framework is packaged according to a preset binary packaging command to generate a binary file, a network model is trained on a preset data set based on the binary file to obtain a deep learning model, and various performance tests are then performed on the deep learning model; if all performance tests of the deep learning model pass, the benchmark test of the modified deep learning framework is determined to pass. Because a model is trained on a preset data set with the modified deep learning framework and the resulting deep learning model is performance tested, the modified deep learning framework is not only adapted to the heterogeneous accelerator but also highly accurate.
In an exemplary embodiment, the embodiment of the present application further provides a deep learning framework adaptation method. Taking TensorFlow as the target deep learning framework, as shown in FIG. 8, the embodiment includes the following steps:
S801, modifying the API interfaces and the dependency paths in TensorFlow according to the runtime interfaces and the various library-related dependencies of the heterogeneous accelerator that TensorFlow can use.
S802, modifying the third-party library code according to the requirements of the heterogeneous accelerator.
The third-party library code may be modified by changing its instructions and optimizations; for example, if the third-party library is the LLVM library, instruction set code related to the heterogeneous accelerator can be added to the LLVM library files.
S803, configuring a compiling environment for the modified TensorFlow.
Configuring the compiling environment may include installing the corresponding versions of Python packages according to the configured TensorFlow version and the Python version.
S804, compiling the source code of the modified TensorFlow; if the compiling fails, step S805 is executed, and if the compiling passes, step S806 is executed.
The source code of the modified TensorFlow is compiled according to a preset compiling command to obtain a compiling result, which indicates whether the compilation passed or failed.
S805, adjusting the source code, the third-party library, or the compiling environment of the modified TensorFlow.
If the compilation fails, the cause of the failure is located according to the error information in the log, and the source code, the third-party library, or the compiling environment is then adjusted until the compilation succeeds.
S806, performing unit test on the modified TensorFlow; if the unit test fails, step S805 is performed, and if the unit test passes, step S807 is performed.
If the unit test fails, the source code, the third-party library, or the compiling environment of the modified TensorFlow can be adjusted according to the test failure cases of the failed unit test, and the modified TensorFlow is then compiled and tested again.
S807, packaging the source code according to a preset packaging command to obtain a whl package.
S808, performing a benchmark test on the whl package; if the benchmark test passes, step S809 is performed, and if the benchmark test fails, step S805 is performed.
The benchmark test process includes: training a deep learning model on a data set based on the whl package; if the training passes, the benchmark test succeeds; if the training does not pass, the source code of the modified TensorFlow can be debugged and modified, and the modified TensorFlow is then compiled, unit tested, and benchmarked again until the benchmark test passes.
S809, publishing the whl package.
The published whl package can be made available to the relevant users of the heterogeneous accelerator.
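The control flow of steps S804 to S809 can be summarized with the following Python sketch. It is only an illustration of the loop structure: the callables passed in stand for the compile, unit test, packaging, benchmark, and adjustment activities and are hypothetical, and the `max_rounds` limit is an assumption added so that the loop always terminates (the steps above simply repeat until each stage passes).

```python
from typing import Callable, Optional


def adapt_framework(
    compile_fn: Callable[[], bool],
    unit_test_fn: Callable[[], bool],
    package_fn: Callable[[], str],
    benchmark_fn: Callable[[str], bool],
    adjust_fn: Callable[[str], None],
    max_rounds: int = 10,  # assumed safety limit, not part of the described steps
) -> Optional[str]:
    """Return the path of the whl package to publish, or None if no round succeeded."""
    for _ in range(max_rounds):
        if not compile_fn():              # S804: compile the modified source code
            adjust_fn("compile")          # S805: adjust source, library, or environment
            continue
        if not unit_test_fn():            # S806: unit test
            adjust_fn("unit_test")        # back to S805
            continue
        wheel = package_fn()              # S807: build the whl package
        if not benchmark_fn(wheel):       # S808: benchmark test
            adjust_fn("benchmark")        # back to S805
            continue
        return wheel                      # S809: the package is ready to publish
    return None
```

Every failure routes back through the same adjustment step and restarts from compilation, which mirrors how S806 and S808 both return to S805 in the flow above.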
In this embodiment of the application, adapting and testing the TensorFlow framework on the heterogeneous accelerator allows functions of the TensorFlow framework such as model training, testing, and model saving to run accurately and stably on the heterogeneous accelerator. This promotes the development of artificial intelligence technology and the ecosystem and development of heterogeneous accelerators, effectively improves the core competitiveness of heterogeneous accelerators, and strengthens the development of the chip industry.
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to this order of execution and may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides a deep learning framework adaptation apparatus for implementing the deep learning framework adaptation method described above. The solution provided by the apparatus is implemented in a manner similar to that described for the method, so for the specific limitations in the one or more embodiments of the deep learning framework adaptation apparatus provided below, reference may be made to the limitations of the deep learning framework adaptation method above, which are not repeated here.
In one exemplary embodiment, as shown in FIG. 9, a deep learning framework adaptation apparatus 900 is provided, comprising an acquisition module 901, a modification module 902, an evaluation module 903, and a determination module 904, wherein:
the acquisition module 901 is configured to acquire operation information and operation requirements of the heterogeneous accelerator, where the operation information and the operation requirements both represent information when the heterogeneous accelerator executes a computing task of the target deep learning framework;
the modification module 902 is configured to modify the source code of the target deep learning framework according to the operation information, and to modify the third-party library code of the target deep learning framework according to the operation requirements;
the evaluation module 903 is configured to evaluate and verify the modified deep learning framework;
the determination module 904 is configured to determine the modified deep learning framework as a deep learning framework adapted to the heterogeneous accelerator if the modified deep learning framework passes the evaluation verification.
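One way to picture how the four modules cooperate is the following Python sketch. It is a simplified illustration under stated assumptions: the class name, the callable-based modules, and the use of `None` to signal a failed verification are all hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class FrameworkAdaptationApparatus:
    """Hypothetical composition of modules 901 to 904 as plain callables."""
    acquire: Callable[[], Tuple[Any, Any]]   # 901: returns (operation info, operation requirements)
    modify: Callable[[Any, Any], Any]        # 902: returns the modified framework
    evaluate: Callable[[Any], bool]          # 903: evaluation verification

    def run(self) -> Optional[Any]:
        info, requirements = self.acquire()
        modified = self.modify(info, requirements)
        # 904: accept the modified framework only if evaluation verification passes.
        return modified if self.evaluate(modified) else None
```

Modeling each module as a callable keeps the sketch independent of whether a module is realized in software, hardware, or a combination of the two.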
In one embodiment, the operation information includes runtime interface information and the library files on which the heterogeneous accelerator depends when using a math library; the modification module 902 includes:
a first modification unit, configured to modify the corresponding interface code in the source code of the target deep learning framework according to the runtime interface information, and to modify the corresponding library file path in the source code of the target deep learning framework according to the library files on which the heterogeneous accelerator depends when using the math library.
In one embodiment, the operation requirements include an execution mode; the modification module 902 includes:
an acquisition unit, configured to acquire the third-party library code of the target deep learning framework according to the storage location of the third-party library code;
a second modification unit, configured to acquire the instruction set code of the heterogeneous accelerator based on the execution mode of the heterogeneous accelerator, and to modify the third-party library code according to the instruction set code.
In one embodiment, the evaluation module 903 includes:
a compiling unit, configured to compile the modified deep learning framework;
a first test unit, configured to perform a unit test on the code of the modified deep learning framework if the compilation passes;
a second test unit, configured to continue with a benchmark test on the code of the modified deep learning framework if the unit test passes;
an evaluation unit, configured to determine that the evaluation verification of the modified deep learning framework passes if the benchmark test passes.
In one embodiment, the compiling unit comprises:
a first determining subunit, configured to determine a compiling environment of the modified deep learning framework according to the configuration version of the target deep learning framework;
a compiling subunit, configured to compile the modified deep learning framework according to the compiling environment and a preset compiling command;
a first adjustment subunit, configured to, if the compilation fails, adjust the modified deep learning framework according to the error information in the log generated by the compilation failure, until the modified deep learning framework passes compilation.
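The compile-and-adjust behavior of the compiling unit can be sketched in Python as follows. This is an illustrative loop only: the convention that error lines in the build log start with `ERROR:`, the helper names, and the `max_attempts` limit are assumptions made for the example.

```python
from typing import Callable, List, Tuple


def first_errors(log_text: str, limit: int = 5) -> List[str]:
    """Pick out the first few error lines of a build log (assumed 'ERROR:' prefix)."""
    errors = [
        line.strip()
        for line in log_text.splitlines()
        if line.lstrip().startswith("ERROR:")
    ]
    return errors[:limit]


def compile_until_pass(
    run_compile: Callable[[], Tuple[bool, str]],   # returns (passed, log text)
    adjust: Callable[[List[str]], None],           # adjusts code, library, or environment
    max_attempts: int = 5,                         # assumed limit so the loop terminates
) -> int:
    """Compile repeatedly, adjusting after each failure; return the attempt that passed."""
    for attempt in range(1, max_attempts + 1):
        passed, log_text = run_compile()
        if passed:
            return attempt
        adjust(first_errors(log_text))
    raise RuntimeError("compilation still failing after max_attempts")
```

In practice the adjustment is a manual or semi-manual debugging step, so `adjust` is best read as a placeholder for that work rather than an automated fix.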
In one embodiment, the first test unit comprises:
a first test subunit, configured to perform at least one type of unit test on the modified deep learning framework;
a second determination subunit, configured to determine that the code unit test of the modified deep learning framework passes if the test results of every type of unit test reach the test standard range;
a second adjustment subunit, configured to, if the test results of the unit tests do not reach the test standard range, acquire the test failure cases and the corresponding error information according to the test results, and adjust the modified deep learning framework based on the error information corresponding to the test failure cases until the test results of every type of unit test reach the test standard range.
In one embodiment, the second test unit includes:
a generation subunit, configured to package the code of the modified deep learning framework according to a preset binary packaging command to generate a binary file;
a training subunit, configured to train a network model on a preset data set based on the binary file to obtain a deep learning model;
a second test subunit, configured to perform various types of performance tests on the deep learning model;
a third determination subunit, configured to determine that the benchmark test of the modified deep learning framework passes if all performance tests of the deep learning model pass.
Each module in the deep learning framework adaptation apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
The implementation principle and technical effect of each step implemented by the processor in this embodiment are similar to those of the deep learning framework adaptation method described above, and are not repeated here.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
The principles and technical effects of the steps implemented when the computer program is executed by the processor in this embodiment are similar to those of the deep learning framework adaptation method described above, and are not repeated here.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
The principles and technical effects of the steps implemented when the computer program is executed by the processor in this embodiment are similar to those of the deep learning framework adaptation method described above, and are not repeated here.
It should be noted that, the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are all information and data authorized by the user or sufficiently authorized by each party, and the collection, use, and processing of the relevant data are required to meet the relevant regulations.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the program may include the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational databases and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors referred to in the embodiments provided herein may be, but are not limited to, general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, and data processing logic units based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The above examples represent only a few embodiments of the present application. They are described in relative detail, but they are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, and these would fall within the scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.
Claims (10)
1. A deep learning framework adaptation method, the method comprising:
acquiring operation information and operation requirements of a heterogeneous accelerator, wherein the operation information and the operation requirements both represent information when the heterogeneous accelerator executes a calculation task of a target deep learning framework;
modifying the source code of the target deep learning framework according to the operation information, and modifying the third-party library code of the target deep learning framework according to the operation requirements;
evaluating and verifying the modified deep learning framework;
and if the modified deep learning framework passes the evaluation verification, determining the modified deep learning framework as a deep learning framework adapted to the heterogeneous accelerator.
2. The method of claim 1, wherein the operation information includes runtime interface information and library files on which the heterogeneous accelerator depends when using a math library; the modifying the source code of the target deep learning framework according to the operation information comprises:
modifying the corresponding interface code in the source code of the target deep learning framework according to the runtime interface information; and modifying the corresponding library file path in the source code of the target deep learning framework according to the library files on which the heterogeneous accelerator depends when using the math library.
3. The method of claim 1 or 2, wherein the operation requirements comprise an execution mode; the modifying the third-party library code of the target deep learning framework according to the operation requirements comprises:
acquiring the third-party library code of the target deep learning framework according to the storage location of the third-party library code;
and acquiring an instruction set code of the heterogeneous accelerator based on the execution mode of the heterogeneous accelerator, and modifying the third-party library code according to the instruction set code.
4. The method according to claim 1 or 2, wherein said evaluating verification of the modified deep learning framework comprises:
compiling the modified deep learning framework;
if the compilation passes, performing a unit test on the code of the modified deep learning framework;
if the unit test passes, continuing with a benchmark test on the code of the modified deep learning framework;
and if the benchmark test is passed, determining that the evaluation verification of the modified deep learning framework is passed.
5. The method of claim 4, wherein compiling the modified deep learning framework comprises:
determining the compiling environment of the modified deep learning framework according to the configuration version of the target deep learning framework;
compiling the modified deep learning framework according to the compiling environment and a preset compiling command;
and if the compilation fails, adjusting the modified deep learning framework according to the error information in the log generated by the compilation failure, until the modified deep learning framework passes compilation.
6. The method of claim 4, wherein said performing unit testing on code of said modified deep learning framework comprises:
performing at least one type of unit test on the modified deep learning framework;
if the test results of the unit tests of all types reach the test standard range, determining that the code unit test of the modified deep learning framework passes;
if the test results of the unit tests of the respective types do not reach the test standard range, acquiring test failure cases and corresponding error information according to the test results, and adjusting the modified deep learning framework based on the error information corresponding to the test failure cases until the test results of the unit tests of all types reach the test standard range.
7. The method of claim 4, wherein benchmarking the code of the modified deep learning framework comprises:
packaging the code of the modified deep learning framework according to a preset binary packaging command to generate a binary file;
training a network model through a preset data set based on the binary file to obtain a deep learning model;
performing various types of performance tests on the deep learning model;
and if all the performance tests of the deep learning model pass, determining that the benchmark test of the modified deep learning framework passes.
8. A deep learning framework adaptation apparatus, the apparatus comprising:
an acquisition module, configured to acquire operation information and operation requirements of a heterogeneous accelerator, wherein the operation information and the operation requirements both represent information when the heterogeneous accelerator executes a calculation task of a target deep learning framework;
a modification module, configured to modify the source code of the target deep learning framework according to the operation information, and to modify the third-party library code of the target deep learning framework according to the operation requirements;
an evaluation module, configured to evaluate and verify the modified deep learning framework;
and a determination module, configured to determine the modified deep learning framework as a deep learning framework adapted to the heterogeneous accelerator if the modified deep learning framework passes the evaluation verification.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311244999.XA CN117273171A (en) | 2023-09-25 | 2023-09-25 | Deep learning framework adaptation method, deep learning framework adaptation device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117273171A true CN117273171A (en) | 2023-12-22 |
Family
ID=89215572
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311244999.XA Pending CN117273171A (en) | 2023-09-25 | 2023-09-25 | Deep learning framework adaptation method, deep learning framework adaptation device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117273171A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117827523A (en) * | 2024-03-05 | 2024-04-05 | 北京壁仞科技开发有限公司 | Model exception handling method and device, electronic equipment and storage medium |
CN117827523B (en) * | 2024-03-05 | 2024-05-14 | 北京壁仞科技开发有限公司 | Model exception handling method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||