WO2023123828A1 - Model processing method and apparatus, electronic device, computer storage medium, and program - Google Patents

Model processing method and apparatus, electronic device, computer storage medium, and program

Info

Publication number
WO2023123828A1
WO2023123828A1 · PCT/CN2022/093836 · CN2022093836W
Authority
WO
WIPO (PCT)
Prior art keywords
deep learning
learning model
hash value
speed measurement
operator node
Prior art date
Application number
PCT/CN2022/093836
Other languages
English (en)
French (fr)
Inventor
刘亮
龚睿昊
王裕淞
王燕飞
余锋伟
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023123828A1 publication Critical patent/WO2023123828A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • The present disclosure relates to the technical field of deep learning, and in particular, but not exclusively, to a model processing method, apparatus, electronic device, computer storage medium, and computer program product.
  • Speed testing is a necessary step in the deployment phase of a deep neural network model. Because different hardware platforms are designed differently, the performance of the same model varies across hardware platforms, and its running speed can differ greatly.
  • The speed performance of a model is closely related to its practical deployment and application. After training a model with a deep learning framework, obtaining its speed performance in the deployment environment typically requires converting the model, applying for access to the corresponding hardware platform, configuring the hardware operating environment, and other steps before the model can be correctly deployed and its speed measured. This process is lengthy and complicated, and because hardware vendors provide different development kits, different models and different hardware platforms often require different deployment processes.
  • The embodiments of the present disclosure aim to provide a model processing method, apparatus, electronic device, computer storage medium, and computer program product that, by building a database, allow the speed information of a neural network model to be obtained relatively quickly.
  • An embodiment of the present disclosure provides a model processing method, the method comprising:
  • acquiring a set of deep learning models and a set of operating platforms, wherein a hash value is used as the identifier of each deep learning model in the set of deep learning models, and each deep learning model includes a network topology and attribute information of each operator node; each operating platform in the operating platform set is formed by deploying each deployment tool in the acceleration library on each hardware in the hardware set; determining the speed measurement result of each deep learning model on each operating platform; and
  • constructing a database based on the mapping relationship between each speed measurement result and the corresponding deep learning model and operating platform.
  • An embodiment of the present disclosure also provides a model processing device, the device comprising:
  • the acquisition part is configured to acquire a set of deep learning models and a set of operating platforms, wherein a hash value is used as an identifier characterizing each deep learning model in the set of deep learning models, and each deep learning model includes a network topology and attribute information of each operator node; each operating platform in the operating platform set is formed by deploying each deployment tool in the acceleration library on each hardware in the hardware set; the determining part is configured to determine the speed measurement result of each deep learning model on each operating platform; and the construction part is configured to construct a database based on the mapping relationship between each speed measurement result and the corresponding deep learning model and operating platform.
  • An embodiment of the present disclosure also provides an electronic device, including a processor and a memory for storing a computer program that can run on the processor; wherein,
  • the processor is configured to run the computer program to execute any one of the above model processing methods.
  • An embodiment of the present disclosure also provides a computer storage medium, on which a computer program is stored, and when the computer program is executed by a processor, any one of the above-mentioned model processing methods is implemented.
  • An embodiment of the present disclosure also provides a computer program product, the computer program product carries a program code, and instructions included in the program code can be used to execute any one of the above-mentioned model processing methods.
  • The method includes: obtaining a set of deep learning models and a set of operating platforms, wherein a hash value is used as the identifier of each deep learning model in the set of deep learning models, and each deep learning model includes a network topology and attribute information of each operator node; each operating platform in the operating platform set is formed by deploying each deployment tool in the acceleration library on each hardware in the hardware set; determining the speed measurement result of each deep learning model on each operating platform; and building a database based on the mapping relationship between each speed measurement result and the corresponding deep learning model and operating platform.
  • In this way, the speed measurement results of each deep learning model on each operating platform are stored in the database. Since each deep learning model can be represented by a hash value, when a user's speed measurement request for a first deep learning model is received, the hash value of the first deep learning model can be compared with the hash values of the deep learning models stored in the database to quickly obtain a deep learning model matching the first deep learning model; the speed measurement result of the matched deep learning model found in the database can then be used as the speed measurement result of the first deep learning model. In this way, the acquisition of the speed information of the model can be accelerated.
  • FIG. 1A is a flowchart of a model processing method according to an embodiment of the present disclosure
  • FIG. 1B is a flowchart of determining the hash value of each deep learning model according to an embodiment of the present disclosure
  • FIG. 1C is a flowchart of determining the hash value of each operator node and the hash value of the attribute information of each operator node in each deep learning model according to an embodiment of the present disclosure
  • FIG. 1D is another flow chart for determining the hash value of each deep learning model according to an embodiment of the present disclosure
  • FIG. 1E is another flow chart for determining the hash value of each deep learning model according to an embodiment of the present disclosure
  • FIG. 1F is a flow chart of building a database in an embodiment of the present disclosure.
  • FIG. 1G is a flow chart of determining the speed measurement result of the first deep learning model in an embodiment of the present disclosure
  • FIG. 1H is another flow chart for determining the speed measurement result of the first deep learning model in an embodiment of the present disclosure
  • FIG. 1I is a flow chart of determining the speed measurement results of each deep learning model on the first operating platform in an embodiment of the present disclosure
  • FIG. 1J is a flow chart of adding the speed measurement results of the first deep learning model on each operating platform to the database in an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of another model processing method according to an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of the composition and structure of a model processing device according to an embodiment of the present disclosure
  • FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
  • The terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, so that a method or apparatus comprising a series of elements includes not only the explicitly stated elements but also other elements not explicitly listed, or elements inherent in implementing the method or apparatus.
  • Unless otherwise restricted, an element defined by the phrase "comprising a/an" does not exclude the presence of other related elements in the method or apparatus comprising that element (such as a step in the method or a unit in the apparatus; for example, a unit may be part of a circuit, part of a processor, part of a program or software, etc.).
  • The model processing method provided by the embodiments of the present disclosure includes a series of steps, but is not limited to the steps described.
  • The model processing apparatus provided by the embodiments of the present disclosure includes a series of parts, but is not limited to the parts explicitly recorded, and may also include parts that need to be provided for obtaining relevant information or performing processing based on the information.
  • The present disclosure can be implemented based on an electronic device, where the electronic device can be a thin client, thick client, handheld or laptop device, microprocessor-based system, set-top box, programmable consumer electronics, networked personal computer, minicomputer system, etc.
  • the electronic device can realize corresponding functions through the execution of program modules.
  • program modules may include routines, programs, objects, components, logic, data structures, and so on. They perform specific tasks or implement specific abstract data types.
  • the computer system can be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computing system storage media including storage devices.
  • Some speed measurement platforms can assist users in convenient model deployment and testing. These platforms eliminate the cumbersome process of manual model deployment; instead, drawing on actual deployment experience on multiple platforms, they unify the model parsing interface and the back-end runtime library interfaces, and perform model deployment and evaluation in a relatively automatic manner.
  • However, each acquisition of model speed information needs to go through a complete deployment process including automatic model conversion, model compilation, and remote execution; therefore, obtaining the speed information of the same model often requires repeated actual testing on the hardware platform. Since the speed information of a model is related to its practical deployment and application, how to obtain the speed information of a model quickly is a technical problem that urgently needs to be solved.
  • The model processing method can be implemented by a processor in the model processing apparatus, and the processor can be at least one of an Application-Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field-Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, or a microprocessor.
  • Fig. 1A is a flowchart of a model processing method according to an embodiment of the present disclosure. As shown in Fig. 1A, the process may include:
  • Step 100 Obtain a set of deep learning models and a set of operating platforms, wherein the hash value is used as the identifier representing each deep learning model in the set of deep learning models, and each deep learning model includes network topology and attributes of each operator node Information; each running platform in the running platform set is formed by deploying each deployment tool in the acceleration library on each hardware in the hardware set.
  • The type of each deep learning model in the deep learning model set is not limited; for example, it can be a Deep Neural Network (DNN) model, a Recurrent Neural Network (RNN) model, etc.
  • FIG. 1B is a flowchart of determining the hash value of each deep learning model in an embodiment of the present disclosure. As shown in FIG. 1B, the process may include the following steps:
  • Step 1000 Obtain the network topology of each deep learning model and attribute information of each operator node
  • Step 1001 Determine the hash value of each operator node and the hash value of the attribute information of each operator node in each deep learning model
  • Step 1002 Perform hash processing on the hash value of each operator node and the hash value of the attribute information of each operator node to obtain the hash value of each deep learning model.
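  • Steps 1001-1002 can be sketched as follows (a minimal illustration: the node names, operator types, and attribute dictionaries are hypothetical, and SHA-256 is chosen arbitrarily since the embodiment does not fix a hash function). Each operator node and its attribute information receives a hash value, and the per-node hashes are hashed again to obtain the model-level hash:

```python
import hashlib

def sha256_hex(text: str) -> str:
    """SHA-256 hex digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical operator nodes: (name, operator type, attribute info).
nodes = [
    ("n0", "Conv", {"kernel": "3x3", "dtype": "float32"}),
    ("n1", "Relu", {"dtype": "float32"}),
]

# Step 1001: a hash value for each operator node and for its attributes.
node_hashes = {name: sha256_hex(op) for name, op, _ in nodes}
attr_hashes = {name: sha256_hex(repr(sorted(attrs.items())))
               for name, _, attrs in nodes}

# Step 1002: hash the per-node hashes together to obtain the model hash.
model_hash = sha256_hex("".join(node_hashes[n] + attr_hashes[n]
                                for n, _, _ in nodes))
```

Because only operator types and attribute information enter the digest, two models with identical structure produce identical hashes regardless of where or when they were built.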
  • Each deep learning model may represent a network model including multiple neural network layers, where each neural network layer may include at least one neuron; for example, each operator node in each deep learning model corresponds to a neuron included in that deep learning model; according to the connection relationships of the neurons included in each neural network layer, the network topology of each deep learning model can be determined.
  • For example, if a deep learning model includes three neural network layers, namely an input layer, a hidden layer, and an output layer, and the input layer includes three neurons, the hidden layer includes four neurons, and the output layer includes three neurons, then the ten neurons included in the deep learning model correspond to ten operator nodes in the model, and the connection relationships among these ten operator nodes constitute the network topology of the deep learning model.
  • the attribute information of each operator node may include information such as the weight corresponding to the node, the data type corresponding to the node, and the calculation method of the function corresponding to the node (for example, it may be a sum function).
  • the network topology structure of each deep learning model in the deep learning model set and the attribute information of each operator node can be acquired through a graph traversal method.
  • Thus, the implementation of determining the hash value of each deep learning model in the deep learning model set through the attribute information and topology of each node is clarified; subsequently, comparing the hash values of two deep learning models can quickly determine whether the two models have the same network topology, improving the efficiency of judging whether the two models are identical.
  • the hash value of each operator node in each deep learning model and the hash value of the attribute information of each operator node can be determined through the steps shown in Figure 1C:
  • Step 1003 Construct a directed acyclic graph of each deep learning model based on the network topology of each deep learning model and the attribute information of each operator node;
  • Step 1004 Determine the hash value of each operator node in each DAG and the hash value of the attribute information of each operator node.
  • In implementation, the corresponding directed acyclic graph is constructed based on the network topology of each deep learning model and the attribute information of each operator node; after obtaining the directed acyclic graph of each deep learning model, the hash value of each operator node in each directed acyclic graph and the hash value of the attribute information of each operator node are determined.
  • After obtaining the hash value of each operator node in each directed acyclic graph and the hash value of the attribute information of each operator node, the hash value of each deep learning model can be determined through the following steps shown in FIG. 1D:
  • Step 1005 Perform hash processing on the hash value of each operator node in each directed acyclic graph and the hash value of the attribute information of each operator node to obtain the unique representation value of each operator node in each directed acyclic graph;
  • Step 1006 Sort the unique representation values of each operator node in each directed acyclic graph;
  • Step 1007 Perform hash processing on the sorted unique representation values of each operator node to obtain the hash value of each deep learning model.
  • In implementation, following a reverse topological sorting method, the hash value of each operator node is combined with the hash value of its attribute information and further hashed to obtain the unique representation value of each operator node in each directed acyclic graph; here, operator nodes with the same unique representation value not only have the same operator type and the same attribute information, but also have the same topological relationship of child nodes.
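  • This reverse-topological combination can be sketched as follows (the DAG, per-node operator information, and SHA-256 are illustrative assumptions; the recursion visits children before parents, which is equivalent to processing nodes in reverse topological order):

```python
import hashlib

def h(s: str) -> str:
    """SHA-256 hex digest of a string."""
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

# Hypothetical DAG: node -> list of child nodes, plus per-node
# "operator type | attribute" information.
children = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
info = {"a": "Conv|3x3", "b": "Relu|", "c": "Relu|", "d": "Add|"}

unique = {}  # node -> unique representation value

def unique_value(node: str) -> str:
    """Combine a node's own hash with the unique values of its children
    (children are resolved first, i.e. reverse topological order)."""
    if node not in unique:
        child_part = "".join(sorted(unique_value(c) for c in children[node]))
        unique[node] = h(h(info[node]) + child_part)
    return unique[node]

for n in children:
    unique_value(n)

# "b" and "c" share operator type, attributes, and child-subgraph
# topology, so they receive the same unique representation value.
assert unique["b"] == unique["c"]

# Steps 1006-1007: sort the unique values and hash them into a model hash.
model_hash = h("".join(sorted(unique.values())))
```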
  • In this way, the hash value of each deep learning model is determined based on the hash value of each operator node in the model and the hash value of the attribute information of each operator node; subsequently, a matching deep learning model can be quickly obtained through the hash value of a deep learning model.
  • The unique representation values of the operator nodes are sorted and further hashed to obtain the hash value of each deep learning model.
  • the process of determining the hash value of each deep learning model is described below in conjunction with FIG. 1E.
  • the process may include:
  • Step 1008 Delete the parameter information of each deep learning model;
  • Step 1009 Extract the network topology of each deep learning model and the attribute information of each operator node;
  • Step 1010 Construct a directed acyclic graph based on the network topology and the attribute information of each operator node;
  • Step 1011 Determine the unique representation value of each operator node in each directed acyclic graph;
  • Step 1012 Determine the hash value of each deep learning model.
  • step 1009 is implemented in the same manner as step 1000 above
  • step 1010 is implemented in the same manner as step 1003 above
  • step 1011 is implemented in the same manner as step 1005 above.
  • The hash value of each deep learning model can be determined according to the above processing flow; each deep learning model in the deep learning model set corresponds to a hash value, and the hash values of different deep learning models are different; that is, each deep learning model has a unique corresponding hash value, which uniquely identifies the network topology of the deep learning model and the attribute information of each operator node.
  • In implementation, after the parameter information of each deep learning model is deleted, the format of each deep learning model can be converted into a specific format that can be hashed; here, the type of the specific format is not limited; for example, it may be Open Neural Network Exchange (ONNX), or another type of conversion format.
  • the parameter information of each deep learning model can be deleted first, so that the memory usage of the model data can be reduced.
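  • The effect of deleting parameter information before hashing can be sketched as follows (the dict-based model representation and field names are hypothetical stand-ins; a real implementation would operate on, e.g., an ONNX graph): two models that differ only in trained weights yield the same hash.

```python
import hashlib
import json

def topology_hash(model: dict) -> str:
    """Hash a model after deleting its parameter information, so the
    digest identifies topology and attributes, not trained weights."""
    stripped = {k: v for k, v in model.items() if k != "params"}
    canonical = json.dumps(stripped, sort_keys=True)  # stable byte string
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

model_a = {
    "nodes": [{"name": "fc1", "op": "Gemm", "dtype": "float32"}],
    "edges": [["input", "fc1"]],
    "params": {"fc1.weight": [0.12, -0.53, 0.98]},
}
# Same topology and attributes, different trained weights.
model_b = dict(model_a, params={"fc1.weight": [1.0, 2.0, 3.0]})

assert topology_hash(model_a) == topology_hash(model_b)
```

Dropping the weights also keeps the data that must be held in memory during hashing small, as noted above.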
  • Each operating platform in the operating platform set reflects the connection between a hardware in the hardware set and a deployment tool in the acceleration library; it represents the combination of hardware and acceleration library used when running a model, and is a many-to-many connection; that is, through different combinations of hardware and acceleration libraries, multiple different types of operating platforms can be obtained; here, hardware can represent a hardware platform.
  • Step 101 Determine the speed measurement results of each deep learning model on each operating platform.
  • the speed measurement results of each deep learning model in the set of deep learning models on each running platform in the set of running platforms will be determined.
  • The speed measurement results reflect the connection between each operating platform and the network topology of each deep learning model; because different operating platforms in the operating platform set differ in hardware configuration, deployment tools, and operating environment, the performance of the same deep learning model on different operating platforms also differs, and its running speed can vary greatly.
  • When determining the speed measurement results of each deep learning model on each operating platform, one way is to deploy each deep learning model on each operating platform manually and obtain the speed measurement result of each deep learning model; another way is to unify the model parsing interface and the back-end runtime interface, deploy and evaluate the models in a relatively automatic manner, and perform speed measurement operations on each operating platform through operations such as model conversion, model compilation, and remote execution, to obtain the speed measurement results of each deep learning model.
  • Step 102 Build a database based on the mapping relationship between each speed measurement result and the corresponding deep learning model and operating platform.
  • A database can be constructed according to the mapping relationship between each speed measurement result and the corresponding deep learning model and operating platform; at the same time, the model information, operating platform information, speed information, etc. of each deep learning model are stored in the database.
  • the model information may also include information such as the network topology of the model; the operating platform information may include hardware information and acceleration library information, etc.; the speed information represents each of the above speed measurement results.
  • In this way, relevant information of each deep learning model, such as its hash value and speed measurement results, can be persistently stored in the database.
  • the deep learning model matching the first deep learning model can be quickly obtained by comparing the hash value of the deep learning model with the hash value of each deep learning model stored in the database.
  • the speed measurement result of the matching deep learning model found in the database can be used as the speed measurement result of the first deep learning model, thus, the acquisition of speed information of the model can be accelerated.
  • The implementation of building the database can also be as follows: first, the attribute information of entities and relationships is extracted according to the information to be stored for each deep learning model, and an entity relationship diagram is designed according to the attribute information of the entities and relationships. The main entities in the entity relationship diagram may include the network topology, the hardware, etc., wherein the attribute information of the network topology may include attributes such as serial number, name, hash value, input shape, output shape, and attribute topology map; among these, the hash value, input shape, and output shape can uniquely index a network topology. The attribute information of the hardware may include attributes such as serial number, name, and architecture, and the name of the hardware can uniquely index the hardware.
  • the main relationship in the entity relationship diagram may include the running platform, speed, etc.
  • The running platform is the connection between the hardware and the acceleration library, representing the combination of hardware and acceleration library used when running a model; it is a many-to-many connection. The speed is the connection between the running platform and the network topology, indicating the running speed of a model on a specific running platform; it also includes attribute information such as creation time, occupied memory, and number of tests, and is likewise a many-to-many connection.
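  • The entities and relationships described above might map to relational tables as in the following sketch (SQLite is used for illustration; the table and column names are assumptions, not taken from the embodiment):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Entity: network topology; (hash, input shape, output shape)
-- together uniquely index a topology.
CREATE TABLE topology (
    id           INTEGER PRIMARY KEY,
    name         TEXT,
    hash         TEXT NOT NULL,
    input_shape  TEXT,
    output_shape TEXT,
    UNIQUE (hash, input_shape, output_shape)
);
-- Entity: hardware; the name uniquely indexes the hardware.
CREATE TABLE hardware (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE,
    architecture TEXT
);
-- Relationship: running platform = a combination of hardware
-- and acceleration library (many-to-many).
CREATE TABLE platform (
    id            INTEGER PRIMARY KEY,
    hardware_id   INTEGER NOT NULL REFERENCES hardware(id),
    accel_library TEXT NOT NULL
);
-- Relationship: speed links a platform and a topology (many-to-many),
-- with creation time, occupied memory, and test count as attributes.
CREATE TABLE speed (
    platform_id INTEGER NOT NULL REFERENCES platform(id),
    topology_id INTEGER NOT NULL REFERENCES topology(id),
    latency_ms  REAL,
    created_at  TEXT,
    memory_mb   REAL,
    test_count  INTEGER
);
""")
```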
  • Based on the above, the database can be constructed; for a specific implementation, refer to FIG. 1F.
  • FIG. 1F is a flow chart of building a database in an embodiment of the present disclosure. As shown in FIG. 1F, the process may include the following steps:
  • Step 1013 extracting attribute information of entities and relationships to construct an entity relationship graph
  • Step 1014 Determine the access field and its data representation
  • Step 1015 Build database tables and interfaces for adding, deleting, modifying, and querying.
  • In implementation, an entity relationship graph can be constructed according to the extracted attribute information of each entity and connection; then, the fields of the database tables and their data representations can be determined using the entity relationship graph, so as to realize structured storage of the related attribute information; finally, operation interfaces such as data connection, database table construction, addition, deletion, modification, and query can be implemented through corresponding coding, providing support for model performance information storage and query services.
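  • The add/delete/modify/query interfaces of step 1015 might look like the following sketch (SQLite again; a single flattened table and hypothetical column names are used to keep the example short):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE speed (
    model_hash TEXT, platform_id INTEGER, latency_ms REAL,
    UNIQUE (model_hash, platform_id))""")

def add_result(model_hash: str, platform_id: int, latency_ms: float) -> None:
    """Persist a newly measured speed result."""
    conn.execute("INSERT INTO speed VALUES (?, ?, ?)",
                 (model_hash, platform_id, latency_ms))

def query_result(model_hash: str, platform_id: int):
    """Look up a stored result; None means the model is not yet measured."""
    row = conn.execute(
        "SELECT latency_ms FROM speed WHERE model_hash = ? AND platform_id = ?",
        (model_hash, platform_id)).fetchone()
    return row[0] if row else None

def update_result(model_hash: str, platform_id: int, latency_ms: float) -> None:
    """Replace a stored result, e.g. after a re-test."""
    conn.execute(
        "UPDATE speed SET latency_ms = ? WHERE model_hash = ? AND platform_id = ?",
        (latency_ms, model_hash, platform_id))

def delete_result(model_hash: str, platform_id: int) -> None:
    """Remove a stored result."""
    conn.execute("DELETE FROM speed WHERE model_hash = ? AND platform_id = ?",
                 (model_hash, platform_id))
```

A caller would first compute the model's hash value and try `query_result`; only on a miss would the model actually be deployed and measured, after which `add_result` persists the new result.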
  • Embodiments of the present disclosure propose a model processing method, device, electronic equipment, computer storage medium, and computer program product.
  • The method includes: acquiring a set of deep learning models and a set of operating platforms, wherein a hash value is used as the identifier of each deep learning model in the set of deep learning models, and each deep learning model includes a network topology and attribute information of each operator node; each operating platform in the operating platform set is formed by deploying each deployment tool in the acceleration library on each hardware in the hardware set; determining the speed measurement result of each deep learning model on each operating platform; and building a database based on the mapping relationship between each speed measurement result and the corresponding deep learning model and operating platform.
  • In this way, the speed measurement results of each deep learning model on each operating platform are stored in the database. Since each deep learning model can be represented by a hash value, when a user's speed measurement request for a first deep learning model is received, a deep learning model matching the first deep learning model can be quickly obtained by comparing the hash value of the first deep learning model with the hash value of each stored deep learning model; the speed measurement result of the matched deep learning model can then be used as the speed measurement result of the first deep learning model, so that the acquisition of the speed information of the model is accelerated.
  • the speed measurement result of the first deep learning model can be determined through the steps shown in FIG. 1G:
  • Step 1016 In response to the user's first speed measurement request, determine the hash value of the first deep learning model based on the identifier of the first deep learning model included in the first speed measurement request;
  • Step 1017 Based on the hash value of the first deep learning model, search in the database to obtain the speed measurement result of the first deep learning model.
  • When the user wants to query the speed measurement result of a first deep learning model that the user specifies, the user can send a first speed measurement request including the identifier of the first deep learning model to the running platform; after the running platform receives the user's first speed measurement request, the system service backend corresponding to the running platform obtains the network topology of the first deep learning model and the attribute information of each operator node according to the identifier of the first deep learning model in the first speed measurement request.
  • Then, the hash value of the first deep learning model can be determined based on its network topology and the attribute information of each operator node; here, the hash value of the deep learning model is determined from the attribute information and topology of each node, as described above.
  • Afterwards, the hash value of the first deep learning model is compared with the hash value of each deep learning model stored in the database to obtain a comparison result; here, the way of comparing the hash values is not limited; for example, a one-by-one comparison may be adopted, or a simultaneous comparison may be adopted.
  • the speed measurement result of the deep learning model that matches it can be directly found in the database based on the hash value.
  • the acquisition of the speed information of the model can be accelerated.
  • If a deep learning model with the same hash value as the first deep learning model is stored in the database, it means that a matching deep learning model has been found; that is, the network topology of the first deep learning model is the same as that of the found deep learning model. Therefore, the speed measurement results of the found deep learning model on each operating platform can be used as the speed measurement results of the first deep learning model.
  • The speed measurement result of the found deep learning model can be used as the speed measurement result of the first deep learning model. It can be seen that the embodiment of the present disclosure does not need to repeatedly deploy the first deep learning model on the running platform to obtain its speed measurement result, which effectively improves the efficiency of obtaining speed measurement results.
  • In some embodiments, the first speed measurement request also includes the identifier of a target operating platform; for the implementation of searching the database based on the hash value of the first deep learning model to obtain the speed measurement result of the first deep learning model, refer to the steps shown in FIG. 1H:
  • Step 1018 Search the database for a second deep learning model whose hash value is the same as that of the first deep learning model and whose operating platform identifier is the same as that of the target operating platform;
  • Step 1019 Determine the speed measurement result of the second deep learning model on the target operating platform as the speed measurement result of the first deep learning model on the target operating platform.
  • In implementation, the first speed measurement request may also include the hardware corresponding to the first deep learning model and the deployment tool in the acceleration library, so that the target operating platform corresponding to the first deep learning model can be determined based on that hardware and deployment tool; this makes it convenient to subsequently query the database for the speed measurement result of the deep learning model on the target operating platform.
  • the speed measurement result of the second deep learning model on the target operating platform may be determined as the speed measurement result of the first deep learning model on the target operating platform.
  • For example, suppose the hash value of the first deep learning model is 111 and the identifier of its target operating platform is 3. If the database stores deep learning models 1 to 3 with hash values 101, 110, and 111 respectively, and the operating-platform identifiers of models 1, 2, and 3 are 1, 3, and both 1 and 3 respectively, then the hash value of the first deep learning model matches that of deep learning model 3, and the operating-platform identifier of the first deep learning model matches one of the identifiers of deep learning model 3; the speed measurement result of deep learning model 3 on the platform with identifier 3 can therefore be used as the speed measurement result of the first deep learning model on the target operating platform.
  • In this way, the acquisition of the model's speed information can be accelerated.
  • a query operation may be performed on the hash value based on the operation interface of the database, so as to obtain a deep learning model with the same hash value from the database. It can be seen that the use of database technology can realize the storage and retrieval of related information such as the structure and performance of deep learning models.
  • Step 1020 In response to the user's third speed measurement request, based on the identification of the first operating platform included in the third speed measurement request, search for a second operating platform with the same identification as the first operating platform in the database;
  • Step 1021 Determine the speed measurement results of each deep learning model on the first operating platform based on the speed measurement results of each deep learning model in the database on the second operating platform.
  • If a user wants to query the speed measurement results of each deep learning model in the database on a first operating platform, a third speed measurement request including the identifier of the first operating platform may be sent to the running platform;
  • the system service backend corresponding to the running platform then searches the database, according to the identifier of the first operating platform in the third speed measurement request, for a second operating platform with the same identifier;
  • the speed measurement results of each deep learning model in the database on the second operating platform are then used to determine the speed measurement results of each deep learning model on the first operating platform.
  • It can be seen that the speed measurement results of each deep learning model on the platform corresponding to a given identifier can be found in the database according to the identifier included in the user's speed measurement request, which better meets application requirements.
  • Step 1022: In response to the hash value of the first deep learning model not being found in the database, measure the speed of the first deep learning model on each operating platform in a first thread to obtain its speed measurement results on each operating platform;
  • Step 1023 Add the speed measurement results of the first deep learning model on each operating platform to the database.
  • In this case, the system service backend corresponding to the running platform needs to launch an actual speed measurement task for the first deep learning model; that is, the speed measurement must be performed on each actual running platform through operations such as model conversion, model compilation, and remote execution, yielding the speed measurement results of the first deep learning model on each operating platform.
  • After the speed measurement task for the first deep learning model completes, the returned result is checked; if the result indicates success, performance records such as model structure information, operating platform information, and speed information are stored in the database, i.e., the speed measurement results of the first deep learning model on each operating platform are added to the database; otherwise, if the result indicates failure, an error message is returned.
  • By adding the obtained speed measurement results to the database, subsequent performance queries for deep learning models with the same network structure do not need to be repeated on the actual hardware platform each time, which accelerates the acquisition of model performance data.
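The look-up-or-measure flow described above amounts to a cache keyed by (model hash, platform). The following is a minimal sketch, not the patent's implementation: an in-memory dict stands in for the database, and the `measure` callback stands in for the convert/compile/remote-execute pipeline (both are assumptions of this sketch).

```python
def get_speed(cache, model_hash, platform, measure):
    """Return the cached result when (hash, platform) is known;
    otherwise run the real measurement once and persist it so that
    identical network topologies reuse the stored result."""
    key = (model_hash, platform)
    if key in cache:
        # Cache hit: no redeployment on the hardware platform needed.
        return cache[key]
    # Cache miss: perform the actual speed measurement task.
    result = measure(model_hash, platform)
    cache[key] = result  # persist for later queries
    return result
```

With this arrangement, only the first query for a given topology/platform pair incurs the full deployment pipeline.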
  • the above method may further include: while measuring the speed of the first deep learning model in the first thread, responding to the user's second speed measurement request in the second thread.
  • Since the actual speed measurement process is relatively slow, the system service backend can launch the speed measurement task in a separate thread (namely the first thread) and wait for its result; this does not block the backend from receiving, in the second thread, a user's second speed measurement request for other deep learning models, thereby increasing the parallelism of the system service. Here, the first thread and the second thread denote two threads performing different tasks.
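The two-thread arrangement can be sketched as follows. The helper name `measure_async` and the queue-based result hand-off are assumptions of this sketch, not the patent's design: the worker thread plays the role of the first thread running the slow measurement, while the calling thread (the second thread) stays free to accept further requests.

```python
import threading
import queue

def measure_async(model_hash, measure, results):
    """Launch the slow speed measurement in a dedicated worker thread
    (the 'first thread') and return immediately, so the caller (the
    'second thread') can keep accepting new speed measurement requests.
    The (hash, result) pair is delivered through the results queue."""
    worker = threading.Thread(
        target=lambda: results.put((model_hash, measure(model_hash)))
    )
    worker.start()
    return worker  # caller may join() later to wait for completion
```

The caller joins the worker only when it actually needs the result, so a long-running measurement never blocks request handling.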
  • The order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • FIG. 2 is a schematic flowchart of another model processing method according to an embodiment of the present disclosure. As shown in FIG. 2, the process may include:
  • Step 200 Receive a speed measurement request of the first deep learning model.
  • the speed measurement request of the first deep learning model designated by the user is received; here, the speed measurement request may include the first deep learning model information, hardware platform information, acceleration information, and the like.
  • Step 201 Query the database.
  • the hash value of the first deep learning model is determined according to the information of the first deep learning model included in the speed measurement request, and is compared with the hash value in the database.
  • the performance record includes the speed information of the model.
  • Step 202 Determine whether a performance record is found.
  • Here, the hash value of each deep learning model among the various deep learning models, together with related information such as the speed measurement results on each operating platform, is pre-stored in the database; at this time, whether a matching performance record exists can be determined according to the hash value of the first deep learning model.
  • Step 203 Determine whether the same speed measurement task has been submitted.
  • Specifically, it is determined whether the system service backend corresponding to the running platform has already submitted a model speed measurement task with the same hash value as the first deep learning model; if so, step 207 is performed; otherwise, step 204 is performed.
  • Step 204 Measure the speed of the first deep learning model on the running platform.
  • If the system service backend has not submitted a model speed measurement task with the same hash value as the first deep learning model, the backend launches an actual speed measurement task for the first deep learning model; that is, the speed measurement is performed on the model on the actual running platform, and the result is obtained.
  • Step 205 Determine whether the speed measurement is successful.
  • If the speed measurement succeeds, step 206 is executed; otherwise, step 208 is executed.
  • Step 206 Insert related performance records into the database.
  • When it is determined according to step 205 that the speed measurement succeeded, relevant performance records of the first deep learning model, such as model structure information, operating platform information, and speed information, are stored in the database, and step 208 is executed.
  • Step 207 Wait for the completion of the speed measurement task and return the speed measurement result.
  • After it is determined according to step 203 that a model speed measurement task with the same hash value as the first deep learning model has already been submitted, wait for that task to finish executing, and return the speed measurement result corresponding to it.
  • Step 208 Return relevant performance records or error information.
  • When it is determined according to step 205 that the speed measurement failed, an error message is returned; if the relevant performance records were stored in the database according to step 206, those records are returned when the database is queried.
  • the database is used to persistently store the network structure, platform information, speed information and other related performance records of the deep learning model, which can reduce the repeated deployment and evaluation of models with the same network structure, and accelerate the model performance query process.
  • the embodiments of the present disclosure propose a model processing device.
  • Fig. 3 is a schematic diagram of the composition and structure of a model processing device according to an embodiment of the present disclosure. As shown in Fig. 3, the device may include:
  • the acquisition part 300 is configured to acquire a set of deep learning models and a set of running platforms, wherein a hash value is used as the identifier characterizing each deep learning model in the set, and each deep learning model includes a network topology and attribute information of each operator node; each running platform in the set is formed by deploying each deployment tool in the acceleration library on each piece of hardware in the hardware set; the determination part 301 is configured to determine the speed measurement result of each deep learning model on each running platform; the construction part 302 is configured to build a database based on the mapping relationships between each speed measurement result and the corresponding deep learning model and running platform.
  • the acquiring part 300 is further configured to: acquire the network topology of each deep learning model and attribute information of each operator node;
  • the determination part 301 is configured to: determine the hash value of each operator node in each deep learning model and the hash value of each operator node's attribute information; and hash the operator-node hash values together with the attribute-information hash values to obtain the hash value of each deep learning model.
  • In some embodiments, determining the hash value of each operator node and of each node's attribute information includes: constructing the directed acyclic graph of each deep learning model based on its network topology and operator-node attribute information; and determining the hash value of each operator node in each directed acyclic graph and the hash value of each node's attribute information.
  • In some embodiments, obtaining the hash value of each deep learning model includes: hashing each operator node's hash value together with the hash value of its attribute information to obtain the unique representation value of each operator node in the directed acyclic graph; sorting the unique representation values of the operator nodes in each directed acyclic graph; and hashing the sorted unique representation values to obtain the hash value of each deep learning model.
  • the acquisition part 300 is further configured to: delete the parameter information of each deep learning model before acquiring the network topology structure of each deep learning model and the attribute information of each operator node.
  • the device further includes a query part, the query part is configured to: in response to the user's first speed measurement request, based on the identifier of the first deep learning model included in the first speed measurement request, determine the The hash value of the first deep learning model; based on the hash value of the first deep learning model, search in the database to obtain the speed measurement result of the first deep learning model.
  • In some embodiments, the first speed measurement request further includes an identifier of the target operating platform, and the query part is configured to search the database based on the hash value of the first deep learning model by: finding, in the database, a second deep learning model matching both the hash value of the first deep learning model and the identifier of the target operating platform; and determining the speed measurement result of the second deep learning model on the target operating platform as the speed measurement result of the first deep learning model on the target operating platform.
  • In some embodiments, the first speed measurement request further includes the hardware corresponding to the first deep learning model and the deployment tool in the acceleration library, and the query part is further configured to determine the target operating platform corresponding to the first deep learning model based on that hardware and deployment tool.
  • In some embodiments, the query part is further configured to: in response to the hash value of the first deep learning model not being found in the database, measure the speed of the first deep learning model on each running platform in a first thread to obtain its speed measurement results on each running platform, and add those results to the database.
  • the query part is further configured to: respond to the user's second speed measurement request in the second thread while measuring the speed of the first deep learning model in the first thread.
  • In some embodiments, the query part is further configured to: in response to a user's third speed measurement request, find in the database, based on the identifier of the first operating platform included in the request, a second operating platform with the same identifier as the first operating platform; and determine, based on the speed measurement results of each deep learning model in the database on the second operating platform, the speed measurement results of each deep learning model on the first operating platform.
  • In practical applications, the acquisition part 300, the determination part 301, the construction part 302, and the query part can all be implemented by a processor in an electronic device, and the processor can be at least one of an ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller, and microprocessor.
  • Each functional part in this embodiment may be integrated into one processing unit, or each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented not only in the form of hardware, but also in the form of functional parts of software.
  • If the integrated unit is implemented as a software functional part and is not sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment, in essence, or the part contributing to the prior art, or the whole or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the method described in this embodiment.
  • the aforementioned storage medium includes: U disk, mobile hard disk, read only memory (Read Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other various media that can store program codes.
  • The computer program instructions corresponding to the model processing method of this embodiment can be stored on a storage medium such as an optical disk, a hard disk, or a USB flash drive.
  • FIG. 4 shows an electronic device 4 provided by an embodiment of the present disclosure, which may include: a memory 401 and a processor 402; wherein,
  • the memory 401 is configured to store computer programs and data
  • the processor 402 is configured to execute the computer program stored in the memory, so as to implement any model processing method of the foregoing embodiments.
  • The above-mentioned memory 401 can be a volatile memory, such as RAM; or a non-volatile memory, such as ROM, flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination of the above types of memory, and it provides instructions and data to the processor 402.
  • The aforementioned processor 402 may be at least one of an ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller, and microprocessor. It can be understood that, for different devices, the electronic component used to implement the above processor function may also be something else, which is not specifically limited in this embodiment of the present disclosure.
  • An embodiment of the present disclosure provides a computer program product including computer-readable code; when the code runs on an electronic device, a processor in the electronic device executes any one of the model processing methods of the foregoing embodiments.
  • The functions or parts included in the apparatus provided by the embodiments of the present disclosure can be used to execute the methods described in the method embodiments above; for their specific implementation, refer to the description of the method embodiments, which, for brevity, is not repeated here.
  • The methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk) and contains several instructions that enable a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the various embodiments of the present disclosure.


Abstract

Embodiments of the present disclosure provide a model processing method and apparatus, an electronic device, a computer storage medium, and a computer program product. The method includes: acquiring a set of deep learning models and a set of running platforms, where a hash value is used as the identifier characterizing each deep learning model in the set, and each deep learning model includes a network topology and attribute information of each operator node; each running platform in the set is formed by deploying each deployment tool in an acceleration library on each piece of hardware in a hardware set; determining the speed measurement result of each deep learning model on each running platform; and building a database based on the mapping relationships between each speed measurement result and the corresponding deep learning model and running platform.

Description

Model processing method and apparatus, electronic device, computer storage medium, and program
Cross-reference to related applications
The present disclosure is based on, and claims priority of, the Chinese patent application with application number 202111672855.5, filed on December 31, 2021 by applicant Chengdu SenseTime Technology Co., Ltd., entitled "Model processing method and apparatus, electronic device, and computer storage medium"; the entire content of that Chinese patent application is incorporated herein by reference.
Technical field
The present disclosure relates to the technical field of deep learning, and relates to, but is not limited to, a model processing method and apparatus, an electronic device, a computer storage medium, and a computer program product.
Background
Speed testing is a necessary step in the deployment phase of a deep neural network model. Because hardware platforms differ in design, the performance of the same model varies across platforms, and running speeds can differ considerably.
At present, a model's speed performance determines whether it can be deployed and applied in practice. After a model is trained with a deep learning framework, obtaining its speed performance in the deployment environment usually requires converting the model, applying for access to the corresponding hardware platform, and configuring the hardware running environment before the model can be correctly deployed and its speed tested. This process is long and complex, and because hardware vendors' development kits differ, different deployment workflows often have to be tried for different models and platforms. In some schemes, performance results obtained through manual deployment must be recorded by hand, and deployment experience cannot be reused across models or hardware platforms; as a result, the speed-testing process is not only tedious and error-prone but also reduces the efficiency of obtaining a model's speed information.
Summary
Embodiments of the present disclosure aim to provide a model processing method and apparatus, an electronic device, a computer storage medium, and a computer program product that, by building a database, can obtain the speed information of a neural network model relatively quickly.
An embodiment of the present disclosure provides a model processing method, the method including:
acquiring a set of deep learning models and a set of running platforms, where a hash value is used as the identifier characterizing each deep learning model in the set, and each deep learning model includes a network topology and attribute information of each operator node; each running platform in the set is formed by deploying each deployment tool in an acceleration library on each piece of hardware in a hardware set;
determining the speed measurement result of each deep learning model on each running platform; and
building a database based on the mapping relationships between each speed measurement result and the corresponding deep learning model and running platform.
An embodiment of the present disclosure further provides a model processing apparatus, the apparatus including:
an acquisition part configured to acquire a set of deep learning models and a set of running platforms, where a hash value is used as the identifier characterizing each deep learning model in the set, each deep learning model includes a network topology and attribute information of each operator node, and each running platform in the set is formed by deploying each deployment tool in an acceleration library on each piece of hardware in a hardware set; a determination part configured to determine the speed measurement result of each deep learning model on each running platform; and a construction part configured to build a database based on the mapping relationships between each speed measurement result and the corresponding deep learning model and running platform.
An embodiment of the present disclosure further provides an electronic device, including a processor and a memory for storing a computer program executable on the processor, where
the processor is configured to run the computer program to execute any one of the model processing methods above.
An embodiment of the present disclosure further provides a computer storage medium on which a computer program is stored; when the computer program is executed by a processor, any one of the model processing methods above is implemented.
An embodiment of the present disclosure further provides a computer program product carrying program code, where the instructions included in the program code can be used to execute any one of the model processing methods above.
In the model processing method and apparatus, electronic device, computer storage medium, and computer program product proposed by the embodiments of the present disclosure, the method includes: acquiring a set of deep learning models and a set of running platforms, where a hash value is used as the identifier characterizing each deep learning model in the set, and each deep learning model includes a network topology and attribute information of each operator node; each running platform in the set is formed by deploying each deployment tool in an acceleration library on each piece of hardware in a hardware set; determining the speed measurement result of each deep learning model on each running platform; and building a database based on the mapping relationships between each speed measurement result and the corresponding deep learning model and running platform.
It can be seen that in the embodiments of the present disclosure, once the database is built, it stores the speed measurement results of each deep learning model on each running platform. Since each deep learning model can be characterized by a hash value, when a user's speed measurement request for some deep learning model is later received, the deep learning model matching the first deep learning model can be obtained quickly by comparing that model's hash value with the hash value of each deep learning model stored in the database; the speed measurement result of the matching model found in the database can then be used as the speed measurement result of the first deep learning model, thereby accelerating the acquisition of the model's speed information.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated into and constitute part of this specification, illustrate embodiments consistent with the present disclosure and serve, together with the specification, to explain the technical solution of the present disclosure.
Figure 1A is a flowchart of a model processing method according to an embodiment of the present disclosure;
Figure 1B is a flowchart of determining the hash value of each deep learning model according to an embodiment of the present disclosure;
Figure 1C is a flowchart of determining the hash value of each operator node in each deep learning model and the hash value of each operator node's attribute information according to an embodiment of the present disclosure;
Figure 1D is another flowchart of determining the hash value of each deep learning model according to an embodiment of the present disclosure;
Figure 1E is yet another flowchart of determining the hash value of each deep learning model according to an embodiment of the present disclosure;
Figure 1F is a flowchart of building a database in an embodiment of the present disclosure;
Figure 1G is a flowchart of determining the speed measurement result of a first deep learning model in an embodiment of the present disclosure;
Figure 1H is another flowchart of determining the speed measurement result of the first deep learning model in an embodiment of the present disclosure;
Figure 1I is a flowchart of determining the speed measurement results of deep learning models on a first running platform in an embodiment of the present disclosure;
Figure 1J is a flowchart of adding the speed measurement results of the first deep learning model on each running platform to the database in an embodiment of the present disclosure;
Figure 2 is a schematic flowchart of another model processing method according to an embodiment of the present disclosure;
Figure 3 is a schematic diagram of the composition of a model processing apparatus according to an embodiment of the present disclosure;
Figure 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed description
The present disclosure is described in further detail below with reference to the drawings and embodiments. It should be understood that the embodiments provided here merely explain the present disclosure and do not limit it. In addition, the embodiments provided below are some, not all, of the embodiments for implementing the present disclosure; where no conflict arises, the technical solutions recorded in the embodiments may be combined in any manner.
It should be noted that in the embodiments of the present disclosure, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a method or apparatus including a series of elements includes not only the explicitly recorded elements but also other elements not explicitly listed, or elements inherent to implementing the method or apparatus. Without further limitation, an element defined by the phrase "including a..." does not exclude the existence of other related elements in the method or apparatus including that element (for example, a step in the method or a unit in the apparatus; a unit may, for example, be part of a circuit, part of a processor, part of a program or software, and so on).
For example, the model processing method provided by the embodiments of the present disclosure includes a series of steps, but is not limited to the recorded steps; likewise, the model processing apparatus provided by the embodiments includes a series of parts, but is not limited to the explicitly recorded parts and may also include parts needed to acquire relevant information or to process information.
The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist: for example, "A and/or B" may mean that A exists alone, A and B both exist, or B exists alone. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of multiple items; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.
The present disclosure may be implemented on an electronic device; here, the electronic device may be a thin client, a thick client, a handheld or laptop device, a microprocessor-based system, a set-top box, a programmable consumer electronic product, a networked personal computer, a small computer system, and so on.
The electronic device can implement corresponding functions through the execution of program modules. Generally, program modules may include routines, programs, object programs, components, logic, data structures, and so on, which perform specific tasks or implement specific abstract data types. The computer system may be implemented in a distributed cloud computing environment, in which tasks are executed by remote processing devices linked through a communication network; in such an environment, program modules may be located on local or remote computing system storage media that include storage devices.
In the related art, some speed measurement platforms can help users deploy and test models conveniently. These platforms remove the tedious manual deployment workflow and instead, drawing on practical deployment experience across multiple platforms, unify the model parsing interface and the backend runtime library interface to carry out model deployment and evaluation in a relatively automated way. However, each acquisition of a model's speed information must go through a complete deployment pipeline of automated model conversion, model compilation, and remote execution, so obtaining the speed information of the same model often requires repeated actual tests on the hardware platform. Since a model's speed information determines its deployment and application, how to obtain a model's speed information quickly is a technical problem to be solved urgently.
In view of the above technical problem, some embodiments of the present disclosure propose a model query technical solution.
In some embodiments of the present disclosure, the model processing method may be implemented by a processor in a model processing apparatus; the processor may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor.
Figure 1A is a flowchart of a model processing method according to an embodiment of the present disclosure; as shown in Figure 1A, the process may include:
Step 100: Acquire a set of deep learning models and a set of running platforms, where a hash value is used as the identifier characterizing each deep learning model in the set, and each deep learning model includes a network topology and attribute information of each operator node; each running platform in the set is formed by deploying each deployment tool in an acceleration library on each piece of hardware in a hardware set.
Here, the type of each deep learning model in the set is not limited; it may, for example, be a Deep Neural Network (DNN) model, a Recurrent Neural Network (RNN) model, or the like; any two deep learning models in the set may be of the same or different types.
In some embodiments, after the set of deep learning models is acquired, the hash value of each deep learning model in the set needs to be determined. Figure 1B is a flowchart of determining the hash value of each deep learning model according to an embodiment of the present disclosure; as shown in Figure 1B, the process may include the following steps:
Step 1000: Acquire the network topology of each deep learning model and the attribute information of each operator node;
Step 1001: Determine the hash value of each operator node in each deep learning model and the hash value of each operator node's attribute information;
Step 1002: Hash the operator-node hash values together with the attribute-information hash values to obtain the hash value of each deep learning model.
In the embodiments of the present disclosure, each deep learning model may be a network model comprising multiple neural network layers, each of which includes at least one neuron. Illustratively, the operator nodes of a deep learning model correspond to the neurons it contains, and the model's network topology can be determined from the connections among the neurons of each layer.
Illustratively, suppose a deep learning model has three neural network layers, an input layer, a hidden layer, and an output layer, containing three, four, and three neurons respectively; the ten neurons then correspond to ten operator nodes in the model, and the connections among these ten operator nodes constitute the model's network topology.
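The 3-4-3 example above can be sketched in code. This is an illustration only, assuming operator nodes are numbered consecutively layer by layer and that every neuron connects to each neuron in the next layer; the function name `mlp_dag` is hypothetical and not part of the patent.

```python
def mlp_dag(layer_sizes):
    """Build the operator nodes and directed edges of a fully connected
    topology. Nodes are numbered consecutively per layer; each neuron in
    a layer connects to every neuron in the next layer."""
    offsets = [0]
    for size in layer_sizes:
        offsets.append(offsets[-1] + size)
    nodes = list(range(offsets[-1]))  # one operator node per neuron
    edges = []
    for layer in range(len(layer_sizes) - 1):
        for src in range(offsets[layer], offsets[layer + 1]):
            for dst in range(offsets[layer + 1], offsets[layer + 2]):
                edges.append((src, dst))
    return nodes, edges
```

For `[3, 4, 3]` this yields the ten operator nodes of the example and 3×4 + 4×3 = 24 connections, which together constitute the network topology.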
In some embodiments, the attribute information of an operator node may include the node's weights, the type of the node's data, the node's computation function (for example, a summation function), and similar information.
Illustratively, the network topology and operator-node attribute information of each deep learning model in the set can be obtained through a graph traversal method.
It can be seen that the embodiments of the present disclosure specify how each deep learning model's hash value is determined from its topology and the attribute information of its nodes; this makes it easy to determine quickly, by comparing the hash values of two deep learning models, whether they have the same network topology, improving the efficiency of judging the identity of two models.
In some embodiments, the hash value of each operator node in each deep learning model and the hash value of each node's attribute information may be determined through the steps shown in Figure 1C:
Step 1003: Based on the network topology and operator-node attribute information of each deep learning model, construct a directed acyclic graph of the model;
Step 1004: Determine the hash value of each operator node in each directed acyclic graph and the hash value of each node's attribute information.
In the embodiments of the present disclosure, after the network topology and operator-node attribute information of each model are obtained, a corresponding directed acyclic graph can be constructed from them; once each model's directed acyclic graph is obtained, the hash value of each operator node in it and the hash value of each node's attribute information are determined.
This specifies how the hash value of each operator node and of each node's attribute information is determined; further, by combining these two kinds of hash values, the hash value of each deep learning model can be determined quickly.
In some embodiments, after the operator-node hash values and attribute-information hash values of each directed acyclic graph are obtained, the hash value of each deep learning model may be determined through the steps shown in Figure 1D:
Step 1005: Hash each operator node's hash value together with the hash value of its attribute information to obtain a unique representation value for each operator node in the directed acyclic graph;
Step 1006: Sort the unique representation values of the operator nodes in each directed acyclic graph;
Step 1007: Hash the sorted unique representation values to obtain the hash value of each deep learning model.
Illustratively, after the hash values of the operator nodes and of their attribute information are determined, a reverse topological sort is used to combine each node's hash value with the hash value of its attribute information, and the combination is hashed further to obtain each operator node's unique representation value; here, operator nodes with the same hash value share not only the same operator type and the same attribute information but also the same child-node topology.
This specifies how each model's hash value is determined from the hash values of its operator nodes and of their attribute information; a matching deep learning model can subsequently be found quickly based on a model's hash value.
In some embodiments, after the unique representation values of the operator nodes in each directed acyclic graph are obtained, the values are sorted and hashed further to obtain the hash value of each deep neural network model.
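Steps 1005 to 1007 can be sketched as a Merkle-style hash over the directed acyclic graph. The following is a minimal illustration, not the patent's implementation: SHA-256 is an assumed choice of hash function, and the node/edge dictionaries are a hypothetical representation of the graph. The key property is that node names do not affect the result, while operator types, attributes, and child-node topology do.

```python
import hashlib

def _sha(*parts):
    """Hash a tuple of strings into one hex digest."""
    return hashlib.sha256("|".join(parts).encode()).hexdigest()

def model_hash(nodes, edges):
    """nodes: {name: (op_type, attrs_dict)}; edges: {name: [child names]}.
    Returns a hash identifying topology + attributes, independent of
    node names (steps 1005-1007)."""
    memo = {}

    def unique_value(name):
        # Step 1005: combine the node's own hash (operator type plus
        # attribute information) with its children's unique values, so
        # equal values imply equal operator, attributes, and subgraph.
        if name not in memo:
            op_type, attrs = nodes[name]
            node_h = _sha(op_type, repr(sorted(attrs.items())))
            child_hs = sorted(unique_value(c) for c in edges.get(name, []))
            memo[name] = _sha(node_h, *child_hs)
        return memo[name]

    # Steps 1006-1007: sort every node's unique representation value,
    # then hash the sorted list into the model's hash value.
    return _sha(*sorted(unique_value(n) for n in nodes))
```

Two graphs that differ only in node naming hash to the same value, while changing any attribute or connection changes the result, matching the identity-judgment property described above.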
Illustratively, the process of determining each deep learning model's hash value is described below with reference to Figure 1E; as shown in Figure 1E, the process may include:
Step 1008: Delete the parameter information of each deep learning model;
Step 1009: Extract the network topology of each deep learning model and the attribute information of each operator node;
Step 1010: Construct a directed acyclic graph based on the network topology and the operator-node attribute information;
Step 1011: Determine the unique representation value of each operator node in each directed acyclic graph;
Step 1012: Determine the hash value of each deep learning model.
Here, step 1009 is implemented in the same way as step 1000 above, step 1010 in the same way as step 1003, and step 1011 in the same way as step 1005.
Illustratively, the hash value of each deep learning model can be determined by the above process. Each model in the set corresponds to one hash value, and different models have different hash values; that is, every model has a unique hash value that identifies its network topology and the attribute information of its operator nodes.
Illustratively, as the process in Figure 1E shows, the parameter information of each deep learning model is deleted before its network topology and operator-node attribute information are acquired.
In the embodiments of the present disclosure, before the topology and attribute information are acquired, each model's format may first be converted into a specific format that can be hashed; the type of that format is not limited here, and may, for example, be the Open Neural Network Exchange (ONNX) format or another conversion format.
In some embodiments, after a model in the specific format is obtained, its parameter information may be deleted first, which reduces the memory footprint of the model data.
In some embodiments, each running platform in the set reflects the association between each piece of hardware in the hardware set and each deployment tool in the acceleration library; a running platform represents the combination of hardware and acceleration library used when running a model, and the association is many-to-many. That is, different combinations of hardware and acceleration libraries yield many different types of running platforms; here, "hardware" may denote a hardware platform.
Step 101: Determine the speed measurement result of each deep learning model on each running platform.
In the embodiments of the present disclosure, after the set of deep learning models and the set of running platforms are acquired, the speed measurement result of each deep learning model in the model set on each running platform in the platform set is determined.
Here, a speed measurement result reflects the association between a running platform and the network topology of a deep learning model. Because running platforms in the set differ in hardware configuration, acceleration-library deployment tools, and running environments, the same deep learning model performs differently on different platforms, and its running speed can differ considerably.
Illustratively, the speed measurement result of each deep learning model on each running platform may be determined by manually deploying each model onto each platform; alternatively, the model parsing interface and the backend runtime library interface may be unified so that deployment and evaluation are carried out in a relatively automated way, with the measurement performed on each platform through operations such as model conversion, model compilation, and remote execution.
It can be understood that, in the process of determining these results, the hash value of each deep learning model determined in the above steps can be used to determine whether identical deep learning models exist in the set; if identical models exist, only one of them needs to be measured on each running platform, which reduces the number of operations.
Step 102: Build a database based on the mapping relationships between each speed measurement result and the corresponding deep learning model and running platform.
In the embodiments of the present disclosure, after the speed measurement results of each deep learning model on each running platform are obtained by the above process, a database can be built from the mapping relationships between each result and the corresponding model and platform; at the same time, each model's model information, running platform information, speed information, and so on are stored in the database.
Here, in addition to the hash value, the model information may include the model's network topology and the like; the running platform information may include hardware information, acceleration library information, and the like; the speed information denotes each speed measurement result described above.
In the embodiments of the present disclosure, by building a database, each deep learning model and related information such as its hash value and speed measurement results can be stored persistently. Thus, when a user's speed measurement request for some deep learning model is later received, the matching deep learning model can be found quickly by comparing that model's hash value with the hash value of each model stored in the database, and the found model's speed measurement result can be used as the speed measurement result of the first deep learning model, accelerating the acquisition of the model's speed information.
Illustratively, the database may also be built as follows. First, attribute information of entities and associations is extracted from the to-be-stored information of each deep learning model, and an entity-relationship diagram is designed from it. The main entities in the diagram may include the network topology, the hardware, and so on. The attribute information of a network topology may include attributes such as serial number, name, hash value, input shape, output shape, and attribute topology diagram, where the hash value, input shape, and output shape uniquely index a network topology; the attribute information of hardware may include attributes such as serial number, name, and architecture, where the hardware name uniquely indexes the hardware.
Illustratively, the main associations in the entity-relationship diagram may include the running platform, the speed, and so on. The running platform is the association between hardware and acceleration library, representing the combination of the two used when running a model, and it is many-to-many; the speed is the association between a running platform and a network topology, representing a model's running speed on a specific platform, together with attribute information such as creation time, memory usage, and number of tests, and it is also many-to-many. The database can be built from the attribute information of the above entities and associations; a specific implementation is shown in Figure 1F, which is a flowchart of building a database in an embodiment of the present disclosure. As shown in Figure 1F, the process may include:
Step 1013: Extract the attribute information of entities and associations, and construct an entity-relationship diagram;
Step 1014: Determine the storage fields and their data representations;
Step 1015: Build the database tables and the create, delete, update, and query interfaces.
It can be seen that in the embodiments of the present disclosure, an entity-relationship diagram can first be constructed from the extracted attribute information of the entities and associations; next, the diagram is used to determine the fields of the database tables and their data representations, achieving structured storage of the attribute information; finally, operation interfaces such as data connection, table construction, and create, delete, update, and query can be implemented through corresponding coding, providing support for model performance information storage and query services.
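The table construction and query interfaces of steps 1013 to 1015 can be sketched with an embedded database. This is an illustrative schema only: SQLite, the table names, and the column names are assumptions of this sketch, not the patent's design; the `UNIQUE` constraints mirror the unique indexing by hash value, input shape, and output shape described above.

```python
import sqlite3

def build_db(path=":memory:"):
    """Create the topology, platform, and speed tables (step 1015)."""
    con = sqlite3.connect(path)
    con.executescript("""
        CREATE TABLE IF NOT EXISTS topology(
            id INTEGER PRIMARY KEY, hash TEXT,
            input_shape TEXT, output_shape TEXT,
            UNIQUE(hash, input_shape, output_shape));
        CREATE TABLE IF NOT EXISTS platform(
            id INTEGER PRIMARY KEY, hardware TEXT, toolkit TEXT,
            UNIQUE(hardware, toolkit));
        CREATE TABLE IF NOT EXISTS speed(
            topology_id INTEGER REFERENCES topology(id),
            platform_id INTEGER REFERENCES platform(id),
            latency_ms REAL, created_at TEXT DEFAULT CURRENT_TIMESTAMP,
            PRIMARY KEY(topology_id, platform_id));
    """)
    return con

def query_speed(con, model_hash, platform_id):
    """Look up a stored speed record by model hash + platform id."""
    row = con.execute(
        "SELECT s.latency_ms FROM speed s "
        "JOIN topology t ON s.topology_id = t.id "
        "WHERE t.hash = ? AND s.platform_id = ?",
        (model_hash, platform_id)).fetchone()
    return row[0] if row else None
```

The `speed` table is the many-to-many association between topology and platform; a `None` result corresponds to the cache-miss case in which an actual measurement task must be launched.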
An embodiment of the present disclosure proposes a model processing method and apparatus, an electronic device, a computer storage medium, and a computer program product. The method includes: acquiring a set of deep learning models and a set of running platforms, where a hash value is used as the identifier characterizing each deep learning model in the set, and each deep learning model includes a network topology and attribute information of each operator node; each running platform in the set is formed by deploying each deployment tool in an acceleration library on each piece of hardware in a hardware set; determining the speed measurement result of each deep learning model on each running platform; and building a database based on the mapping relationships between each speed measurement result and the corresponding deep learning model and running platform.
It can be seen that in the embodiments of the present disclosure, once the database is built, it stores the speed measurement results of each deep learning model on each running platform. Since each model can be characterized by a hash value, when a user's speed measurement request for some deep learning model is later received, the deep learning model matching the first deep learning model can be obtained quickly by comparing the first model's hash value with the hash value of each stored model; the matching model's speed measurement result can then be used as the speed measurement result of the first deep learning model, accelerating the acquisition of the model's speed information.
在一些实施例中,可以通过如图1G所示的步骤确定第一深度学习模型的测速结果:
步骤1016:响应于用户的第一测速请求,基于第一测速请求包括的第一深度学习模型的标识,确定第一深度学习模型的哈希值;
步骤1017:基于第一深度学习模型的哈希值,在数据库中进行查找,得到第一深度学习模型的测速结果。
示例性地,若用户想要查询自己指定的第一深度学习模型的测速结果,则可以向运行平台发送包括第一深度学习模型的标识的第一测速请求;在运行平台接收到用户的第一测速请求后,运行平台对应的系统服务后端会根据第一测速请求中的第一深度学习模型的标识,获取第一深度学习模型的网络拓扑结构以及各个算子节点的属性信息,如此, 可以基于第一深度学习模型的网络拓扑结构以及各个算子节点的属性信息确定该模型的哈希值;这里,通过各个节点的属性信息和拓扑结构确定深度学习模型的哈希值的实现方式,已经在前述记载的内容中进行相应说明,这里不再赘述。
示例性地,在得到第一深度学习模型的哈希值后,将第一深度学习模型的哈希值分别与数据库中存储的每一深度学习模型的哈希值进行对比,得到对比结果;这里,对于两者的哈希值进行对比的方式不作限定,例如,可以采用逐一对比的方式,也可以采用同时对比的方式。
可以看出,本公开实施例中,在确定与用户的测速请求对应的深度学习模型的哈希值后,可直接基于该哈希值在数据库中查找到与其匹配的深度学习模型的测速结果,通过将该测速结果作为与用户的测速请求对应的深度学习模型的测速结果,可以加快模型的速度信息的获取。
在一些实施例中,若根据对比结果确定数据库中存储着与第一深度学习模型的哈希值相同的深度学习模型时,则说明查找到与第一深度学习模型的哈希值相同的深度学习模型,即,第一深度学习模型的网络拓扑结构与查找到的深度学习模型相同,此时,可以将查找到的深度学习模型在各运行平台的测速结果,作为第一深度学习模型的测速结果。
相关技术中,在需要判断两个深度学习模型的网络拓扑结构是否相同时,由于深度学习模型是相互连接的算子图结构,因此一般基于图同构算法判定深度模型的同一性,但是该算法基于深度优先搜索策略,对于包含多个节点的深度学习模型,该方法速度过慢,难以满足实际要求。然而,与相关技术相比,本公开实施例中通过对比两个深度学习模型的哈希值,可以快速确定这两个深度学习模型是否具有相同的网络拓扑结构;有效提高判断两个模型的同一性的效率。在一些实施例中,由于可以直接从数据库中查找到与第一深度学习模型的哈希值相同的深度学习模型,因而,可以将查找到的深度学习模型的测速结果作为第一深度学习模型的测速结果;可见,本公开实施例无需在运行平台上对第一深度学习模型进行重复部署以获取测速结果,有效提高获取测速结果的效率。
在一些实施例中,第一测速请求还包括目标运行平台的标识;对于基于第一深度学习模型的哈希值,在数据库中进行查找,得到第一深度学习模型的测速结果的流程,可以参照如图1H所示的步骤:
步骤1018:在数据库中查找到与第一深度学习模型的哈希值以及目标运行平台的标识相同的第二深度学习模型;
步骤1019:将第二深度学习模型在目标运行平台的测速结果,确定为第一深度学习模型在目标运行平台上的测速结果。
在一些实施例中,第一测速请求还可以包括第一深度学习模型对应的硬件和加速库中的部署工具,这样,可以基于第一深度学习模型对应的硬件和加速库中的部署工具,确定第一深度学习模型对应的目标运行平台;便于后续从数据库中查询到深度学习模型在目标运行平台的测速结果。
Exemplarily, after the hash value of the first deep learning model and the target running platform are determined from the first speed measurement request, if a second deep learning model having the same hash value as the first deep learning model and the same identifier as the target running platform is found in the database, the speed measurement result of the second deep learning model on the target running platform can be determined as that of the first deep learning model on the target running platform.
Exemplarily, suppose the hash value of the first deep learning model is 111 and the identifier of its target running platform is 3. Suppose the database stores deep learning models 1 to 3 with hash values 101, 110 and 111, respectively, where the running platform identifier of model 1 is 1, that of model 2 is 3, and those of model 3 are 1 and 3. It can then be determined that the hash value of the first deep learning model is the same as that of model 3, and that the running platform identifier of the first deep learning model also matches one of those of model 3; that is, the speed measurement result of model 3 on the running platform with identifier 3 can be used as the speed measurement result of the first deep learning model on the target running platform.
It can be seen that, in the embodiments of the present disclosure, after the hash value of the deep learning model corresponding to the user's speed measurement request is determined, the matching deep learning model can be found in the database directly from that hash value and the target running platform identifier included in the request; using that model's speed measurement result on the target running platform as the result for the requested model speeds up obtaining the model's speed information.
Exemplarily, after the hash value of the first deep learning model is determined, a query operation can be performed on that hash value through the database's operation interface to obtain from the database the deep learning model with the same hash value. It can be seen that database technology enables the storage and retrieval of the structure, performance and other related information of deep learning models.
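A minimal sketch of such a query operation, assuming a hypothetical one-table schema keyed by (model hash, platform identifier), might look like this; the table and column names are illustrative, not part of the disclosure:

```python
import sqlite3

# Hypothetical schema: one row per (model hash, running platform) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE perf (model_hash TEXT, platform_id INTEGER, latency_ms REAL)")
conn.execute("INSERT INTO perf VALUES ('111', 3, 4.2)")

def lookup(model_hash, platform_id):
    """Return the stored speed measurement result, or None on a cache miss."""
    row = conn.execute(
        "SELECT latency_ms FROM perf WHERE model_hash = ? AND platform_id = ?",
        (model_hash, platform_id),
    ).fetchone()
    return row[0] if row else None

assert lookup("111", 3) == 4.2   # hit: reuse the stored result
assert lookup("110", 3) is None  # miss: an actual speed test would be needed
```

A hit means the requested model's result can be served without redeploying it; a miss falls through to the actual speed measurement flow described later.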
In some embodiments, the process of determining the speed measurement results of each deep learning model on a first running platform may follow the steps shown in FIG. 1I:
Step 1020: in response to a third speed measurement request from a user, find in the database a second running platform having the same identifier as the first running platform, based on the identifier of the first running platform included in the third speed measurement request;
Step 1021: determine the speed measurement results of each deep learning model on the first running platform based on the speed measurement results of each deep learning model in the database on the second running platform.
Exemplarily, if a user wants to query the speed measurement results of each deep learning model in the database on a first running platform, the user may send a third speed measurement request including the identifier of the first running platform to the running platform. After the running platform receives the third speed measurement request, the system service backend corresponding to the running platform searches the database, based on the identifier of the first running platform in the third speed measurement request, for a second running platform with the same identifier; if the second running platform is found, the speed measurement results of each deep learning model in the database on the second running platform are determined as the speed measurement results of each deep learning model on the first running platform.
It can be seen that, in the embodiments of the present disclosure, the speed measurement results of each deep learning model on the running platform corresponding to the identifier included in a user's speed measurement request can also be found in the database, which better meets application needs.
In some embodiments, the process of adding the speed measurement results of the first deep learning model on each running platform to the database may follow the steps shown in FIG. 1J:
Step 1022: in response to the hash value of the first deep learning model not being found in the database, measure the speed of the first deep learning model on each running platform in a first thread to obtain the speed measurement results of the first deep learning model on each running platform;
Step 1023: add the speed measurement results of the first deep learning model on each running platform to the database.
Exemplarily, if the hash value of the first deep learning model cannot be found in the database, its hash value differs from that of every deep learning model stored there, and no speed measurement result for it can be obtained from the database. In this case, the system service backend corresponding to the running platform needs to launch an actual speed measurement task for the first deep learning model; that is, speed measurement must be performed on each actual running platform through operations such as model conversion, model compilation and remote execution, to obtain the speed measurement results of the first deep learning model on each running platform.
Here, after the speed measurement task for the first deep learning model completes, the returned speed measurement result is checked. If it indicates success, the performance records such as model structure information, running platform information and speed information are stored in the database; that is, the speed measurement results of the first deep learning model on each running platform are added to the database. Otherwise, if the result indicates failure, an error message is returned.
Here, by adding the obtained speed measurement results of the deep learning model on each running platform to the database, subsequent performance queries for deep learning models with the same network structure no longer require repeated deployment and evaluation on the actual hardware platforms each time, which accelerates obtaining model performance data.
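The record-on-success step can be sketched as follows, again against the hypothetical one-table schema used above (names are illustrative):

```python
import sqlite3

def record_results(conn, model_hash, results):
    """Store measured latencies, one row per (model hash, platform).
    results: {platform_id: latency_ms} from the actual speed tests."""
    conn.executemany(
        "INSERT INTO perf VALUES (?, ?, ?)",
        [(model_hash, pid, ms) for pid, ms in results.items()],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE perf (model_hash TEXT, platform_id INTEGER, latency_ms REAL)")
# After a successful benchmark of a previously unseen model on platforms 1 and 3:
record_results(conn, "abc123", {1: 7.5, 3: 4.2})
rows = conn.execute("SELECT COUNT(*) FROM perf WHERE model_hash = 'abc123'").fetchone()
assert rows[0] == 2  # later queries for this hash are now served from the database
```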
In some embodiments, the above method may further include: while measuring the speed of the first deep learning model in the first thread, responding to a second speed measurement request from the user in a second thread.
Exemplarily, since the actual speed measurement process is slow, the system service backend can launch the speed measurement task and wait for its result in a separate thread (i.e., the first thread). This does not block the backend from receiving, in the second thread, the user's second speed measurement request for another deep learning model, thereby increasing the parallelism of the system service; here, the first thread and the second thread denote two threads executing different tasks.
It can be seen that, in the embodiments of the present disclosure, only performance queries for deep learning models with different network topologies require deployment and evaluation on the actual hardware platforms; performance queries for deep learning models with the same network structure do not require repeated deployment and evaluation each time, which accelerates obtaining model performance data.
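The two-thread arrangement above can be sketched minimally: the slow benchmark runs in a background thread while the service thread stays free to answer another request (here a cache hit). The sleep stands in for the slow deploy-and-measure step; all names are illustrative.

```python
import threading
import time

results = {}

def benchmark(model_hash):
    """Stand-in for the slow model-conversion / compile / remote-run pipeline."""
    time.sleep(0.1)
    results[model_hash] = 4.2

# First request: launch the slow speed test in its own thread (the "first thread").
t = threading.Thread(target=benchmark, args=("111",))
t.start()

# Meanwhile the service thread (the "second thread") is not blocked and can
# answer a second request for another model, here served from a cached record.
cached = {"110": 3.7}
second_answer = cached["110"]

t.join()  # the first request's result becomes available once the task finishes
assert second_answer == 3.7
assert results["111"] == 4.2
```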
Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or impose any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
To better reflect the purpose of the present disclosure, further description is given on the basis of the above embodiments of the present disclosure.
FIG. 2 is a schematic flowchart of another model processing method according to an embodiment of the present disclosure. As shown in FIG. 2, the flow may include:
Step 200: receive a speed measurement request for a first deep learning model.
Exemplarily, a speed measurement request for a user-specified first deep learning model is received; the request may include first deep learning model information, hardware platform information, acceleration library information, and the like.
Step 201: query the database.
Exemplarily, after the speed measurement request for the user-specified first deep learning model is received, the hash value of the first deep learning model is determined from the first deep learning model information included in the request, and the performance record corresponding to that hash value is looked up in the database; here, the performance record includes the model's speed information.
Step 202: judge whether a performance record is found.
Exemplarily, since the database stores in advance the hash value of each of multiple deep learning models and related information such as their speed measurement results on each running platform, it can be queried, based on the hash value of the first deep learning model, whether a performance record exists for a deep learning model with the same hash value; if so, step 208 is executed, otherwise step 203.
Step 203: judge whether the same speed measurement task has already been submitted.
Exemplarily, it is judged whether the system service backend corresponding to the running platform has already submitted a model speed measurement task with the same hash value as the first deep learning model; if so, step 207 is executed, otherwise step 204.
Step 204: measure the speed of the first deep learning model on the running platforms.
Exemplarily, if the system service backend has not submitted a model speed measurement task with the same hash value as the first deep learning model, the backend launches an actual speed measurement task for the first deep learning model; that is, the speed of the first deep learning model is measured on the actual running platforms to obtain speed measurement results.
Step 205: judge whether the speed measurement succeeds.
Exemplarily, if it is determined from the speed measurement results of step 204 that the measurement succeeded, step 206 is executed, otherwise step 208.
Step 206: insert the related performance records into the database.
Exemplarily, when it is determined in step 205 that the speed measurement succeeded, the related performance records of the first deep learning model, e.g., model structure information, running platform information and speed information, are stored in the database, and step 208 is executed.
Step 207: wait for the speed measurement task to complete and return its result.
Exemplarily, after it is determined in step 203 that a model speed measurement task with the same hash value as the first deep learning model has already been submitted, the flow waits for that task to finish and returns the speed measurement result corresponding to that task.
Step 208: return the related performance records or an error message.
Exemplarily, when it is determined in step 205 that the speed measurement failed, an error message is returned; if the related performance records were stored in the database in step 206, those records are returned when the database is queried.
It can be seen that the embodiments of the present disclosure use a database to persistently store the related performance records of deep learning models, such as network structure, platform information and speed information, which reduces repeated deployment and evaluation of models with the same network structure and accelerates the model performance query process.
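Steps 200 to 208 above can be condensed into a single dispatch function. This is a deliberately simplified sketch (the "wait" branch of step 207 is reduced to a status value, and the database is a plain dict); every name is hypothetical.

```python
def query_speed(model_hash, db, pending, run_benchmark):
    """db: {model_hash: record}; pending: set of already-submitted hashes.
    run_benchmark(hash) returns a record on success, None on failure."""
    if model_hash in db:                  # step 202: performance record found
        return db[model_hash]             # step 208: return the record
    if model_hash in pending:             # step 203: same task already submitted
        return "waiting"                  # step 207, simplified to a status
    pending.add(model_hash)
    record = run_benchmark(model_hash)    # step 204: actual speed measurement
    if record is not None:                # step 205: measurement succeeded
        db[model_hash] = record           # step 206: insert into the database
        return record                     # step 208
    return "error"                        # step 208: measurement failed

db, pending = {"111": 4.2}, set()
assert query_speed("111", db, pending, lambda h: None) == 4.2   # cache hit
assert query_speed("222", db, pending, lambda h: 5.0) == 5.0    # miss, then measured
assert db["222"] == 5.0                                         # result persisted
```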
On the basis of the model processing method provided in the foregoing embodiments, the embodiments of the present disclosure provide a model processing apparatus.
FIG. 3 is a schematic diagram of the composition structure of a model processing apparatus according to an embodiment of the present disclosure. As shown in FIG. 3, the apparatus may include:
an obtaining part 300, configured to obtain a set of deep learning models and a set of running platforms, where a hash value is used as the identifier characterizing each deep learning model in the set of deep learning models, and each deep learning model includes a network topology and attribute information of each operator node; each running platform in the set of running platforms is formed by deploying each deployment tool in an acceleration library on each piece of hardware in a hardware set; a determining part 301, configured to determine the speed measurement result of each deep learning model on each running platform; and a building part 302, configured to build a database based on the mapping between each speed measurement result and the corresponding deep learning model and running platform.
In some embodiments, after the set of deep learning models is obtained, the obtaining part 300 is further configured to obtain the network topology of each deep learning model and the attribute information of each operator node;
the determining part 301 is configured to: determine the hash value of each operator node in each deep learning model and the hash value of the attribute information of each operator node; and hash the hash values of the operator nodes and the hash values of the attribute information of the operator nodes to obtain the hash value of each deep learning model.
In some embodiments, the determining part 301, configured to determine the hash value of each operator node in each deep learning model and the hash value of the attribute information of each operator node, is configured to: build a directed acyclic graph of each deep learning model based on its network topology and the attribute information of each operator node; and determine the hash value of each operator node in each directed acyclic graph and the hash value of the attribute information of each operator node.
In some embodiments, the determining part 301, configured to hash the hash values of the operator nodes and the hash values of the attribute information of the operator nodes to obtain the hash value of each deep learning model, is configured to: hash the hash value of each operator node in each directed acyclic graph together with the hash value of its attribute information to obtain a unique representation value of each operator node in each directed acyclic graph; sort the unique representation values of the operator nodes in each directed acyclic graph; and hash the sorted unique representation values of the operator nodes to obtain the hash value of each deep learning model.
In some embodiments, the obtaining part 300 is further configured to delete the parameter information of each deep learning model before obtaining the network topology of each deep learning model and the attribute information of each operator node.
In some embodiments, the apparatus further includes a query part, configured to: in response to a first speed measurement request from a user, determine the hash value of the first deep learning model based on the identifier of the first deep learning model included in the first speed measurement request; and look up the hash value of the first deep learning model in the database to obtain the speed measurement result of the first deep learning model.
In some embodiments, the first speed measurement request further includes the identifier of a target running platform, and the query part, configured to look up the hash value of the first deep learning model in the database to obtain the speed measurement result of the first deep learning model, is configured to: find in the database a second deep learning model having the same hash value as the first deep learning model and the same identifier as the target running platform; and determine the speed measurement result of the second deep learning model on the target running platform as the speed measurement result of the first deep learning model on the target running platform.
In some embodiments, the first speed measurement request further includes the hardware corresponding to the first deep learning model and the deployment tool in the acceleration library, and the query part is further configured to determine the target running platform corresponding to the first deep learning model based on the hardware corresponding to the first deep learning model and the deployment tool in the acceleration library.
In some embodiments, the query part is further configured to: in response to the hash value of the first deep learning model not being found in the database, measure the speed of the first deep learning model on each running platform in a first thread to obtain the speed measurement results of the first deep learning model on each running platform; and add the speed measurement results of the first deep learning model on each running platform to the database.
In some embodiments, the query part is further configured to respond, in a second thread, to a second speed measurement request from the user while the speed of the first deep learning model is being measured in the first thread.
In some embodiments, the query part is further configured to: in response to a third speed measurement request from a user, find in the database a second running platform having the same identifier as the first running platform, based on the identifier of the first running platform included in the third speed measurement request; and determine the speed measurement results of each deep learning model on the first running platform based on the speed measurement results of each deep learning model in the database on the second running platform.
In practical applications, the obtaining part 300, the determining part 301, the building part 302 and the query part can all be implemented by a processor in an electronic device, where the processor may be at least one of an ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller or microprocessor.
In addition, the functional parts in this embodiment may be integrated in one processing unit, or each unit may exist physically alone, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional part.
If the integrated unit is implemented in the form of a software functional part and is not sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solution of this embodiment, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or some of the steps of the method of this embodiment. The foregoing storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Specifically, the computer program instructions corresponding to the model processing method in this embodiment may be stored on a storage medium such as an optical disc, a hard disk or a USB flash drive; when the computer program instructions corresponding to the model processing method in the storage medium are read or executed by an electronic device, any one of the model processing methods of the foregoing embodiments is implemented.
Based on the same technical concept as the foregoing embodiments, referring to FIG. 4, an electronic device 4 provided by an embodiment of the present disclosure is shown, which may include a memory 401 and a processor 402, where:
the memory 401 is configured to store computer programs and data; and
the processor 402 is configured to execute the computer program stored in the memory to implement any one of the model processing methods of the foregoing embodiments.
In practical applications, the memory 401 may be a volatile memory such as a RAM; or a non-volatile memory such as a ROM, a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); or a combination of the above kinds of memory, and it provides instructions and data to the processor 402.
The processor 402 may be at least one of an ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller or microprocessor. It can be understood that, for different devices, other electronic components may be used to implement the functions of the processor, which is not specifically limited in the embodiments of the present disclosure.
An embodiment of the present disclosure provides a computer program product including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes any one of the model processing methods of the foregoing embodiments.
In some embodiments, the functions of, or the parts included in, the apparatus provided by the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments; for their specific implementation, reference may be made to the description of the above method embodiments, which is not repeated here for brevity.
The above description of the embodiments tends to emphasize the differences between them; for their identical or similar parts, reference may be made to one another, and they are not repeated here for brevity.
The methods disclosed in the method embodiments provided by the present disclosure can be combined arbitrarily without conflict to obtain new method embodiments.
The features disclosed in the product embodiments provided by the present disclosure can be combined arbitrarily without conflict to obtain new product embodiments.
The features disclosed in the method or device embodiments provided by the present disclosure can be combined arbitrarily without conflict to obtain new method or device embodiments.
Through the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the essence of the technical solution of the present disclosure, or the part contributing to the prior art, may be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk or an optical disc) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present disclosure.
The embodiments of the present disclosure have been described above with reference to the accompanying drawings, but the present disclosure is not limited to the above specific implementations, which are merely illustrative rather than restrictive. Under the inspiration of the present disclosure, those of ordinary skill in the art can make many other forms without departing from the spirit of the present disclosure and the scope protected by the claims, all of which fall within the protection of the present disclosure.

Claims (25)

  1. A model processing method, applied to an electronic device, the method comprising:
    obtaining a set of deep learning models and a set of running platforms, wherein a hash value is used as an identifier characterizing each deep learning model in the set of deep learning models, and each deep learning model comprises a network topology and attribute information of each operator node; each running platform in the set of running platforms is formed by deploying each deployment tool in an acceleration library on each piece of hardware in a hardware set;
    determining a speed measurement result of each deep learning model on each running platform; and
    building a database based on a mapping between each speed measurement result and the corresponding deep learning model and running platform.
  2. The method according to claim 1, wherein after the set of deep learning models is obtained, the method further comprises:
    obtaining the network topology of each deep learning model and the attribute information of each operator node;
    determining a hash value of each operator node in each deep learning model and a hash value of the attribute information of each operator node; and
    hashing the hash values of the operator nodes and the hash values of the attribute information of the operator nodes to obtain the hash value of each deep learning model.
  3. The method according to claim 2, wherein the determining a hash value of each operator node in each deep learning model and a hash value of the attribute information of each operator node comprises:
    building a directed acyclic graph of each deep learning model based on the network topology of each deep learning model and the attribute information of each operator node; and
    determining the hash value of each operator node in each directed acyclic graph and the hash value of the attribute information of each operator node.
  4. The method according to claim 3, wherein the hashing the hash values of the operator nodes and the hash values of the attribute information of the operator nodes to obtain the hash value of each deep learning model comprises:
    hashing the hash value of each operator node in each directed acyclic graph and the hash value of the attribute information of each operator node to obtain a unique representation value of each operator node in each directed acyclic graph;
    sorting the unique representation values of the operator nodes in each directed acyclic graph; and
    hashing the sorted unique representation values of the operator nodes to obtain the hash value of each deep learning model.
  5. The method according to any one of claims 1 to 4, wherein the method further comprises:
    deleting parameter information of each deep learning model before obtaining the network topology of each deep learning model and the attribute information of each operator node.
  6. The method according to any one of claims 1 to 5, wherein the method further comprises:
    in response to a first speed measurement request from a user, determining a hash value of a first deep learning model based on an identifier of the first deep learning model included in the first speed measurement request; and
    looking up the hash value of the first deep learning model in the database to obtain a speed measurement result of the first deep learning model.
  7. The method according to claim 6, wherein the first speed measurement request further includes an identifier of a target running platform, and the looking up the hash value of the first deep learning model in the database to obtain the speed measurement result of the first deep learning model comprises:
    finding in the database a second deep learning model having the same hash value as the first deep learning model and the same identifier as the target running platform; and
    determining the speed measurement result of the second deep learning model on the target running platform as the speed measurement result of the first deep learning model on the target running platform.
  8. The method according to claim 6, wherein the first speed measurement request further includes hardware corresponding to the first deep learning model and a deployment tool in the acceleration library, and the method further comprises:
    determining the target running platform corresponding to the first deep learning model based on the hardware corresponding to the first deep learning model and the deployment tool in the acceleration library.
  9. The method according to any one of claims 6 to 8, wherein the method further comprises:
    in response to the hash value of the first deep learning model not being found in the database, measuring the speed of the first deep learning model on each running platform in a first thread to obtain speed measurement results of the first deep learning model on each running platform; and
    adding the speed measurement results of the first deep learning model on each running platform to the database.
  10. The method according to claim 9, wherein the method further comprises:
    responding, in a second thread, to a second speed measurement request from the user while the speed of the first deep learning model is being measured in the first thread.
  11. The method according to any one of claims 1 to 10, wherein the method further comprises:
    in response to a third speed measurement request from a user, finding in the database a second running platform having the same identifier as a first running platform, based on an identifier of the first running platform included in the third speed measurement request; and
    determining the speed measurement results of each deep learning model on the first running platform based on the speed measurement results of each deep learning model in the database on the second running platform.
  12. A model processing apparatus, applied to an electronic device, the apparatus comprising:
    an obtaining part, configured to obtain a set of deep learning models and a set of running platforms, wherein a hash value is used as an identifier characterizing each deep learning model in the set of deep learning models, and each deep learning model comprises a network topology and attribute information of each operator node; each running platform in the set of running platforms is formed by deploying each deployment tool in an acceleration library on each piece of hardware in a hardware set;
    a determining part, configured to determine a speed measurement result of each deep learning model on each running platform; and
    a building part, configured to build a database based on a mapping between each speed measurement result and the corresponding deep learning model and running platform.
  13. The apparatus according to claim 12, wherein after the set of deep learning models is obtained, the obtaining part is further configured to obtain the network topology of each deep learning model and the attribute information of each operator node; and
    the determining part is configured to determine a hash value of each operator node in each deep learning model and a hash value of the attribute information of each operator node, and to hash the hash values of the operator nodes and the hash values of the attribute information of the operator nodes to obtain the hash value of each deep learning model.
  14. The apparatus according to claim 13, wherein the determining part, configured to determine the hash value of each operator node in each deep learning model and the hash value of the attribute information of each operator node, is configured to:
    build a directed acyclic graph of each deep learning model based on the network topology of each deep learning model and the attribute information of each operator node; and
    determine the hash value of each operator node in each directed acyclic graph and the hash value of the attribute information of each operator node.
  15. The apparatus according to claim 14, wherein the determining part, configured to hash the hash values of the operator nodes and the hash values of the attribute information of the operator nodes to obtain the hash value of each deep learning model, is configured to:
    hash the hash value of each operator node in each directed acyclic graph and the hash value of the attribute information of each operator node to obtain a unique representation value of each operator node in each directed acyclic graph;
    sort the unique representation values of the operator nodes in each directed acyclic graph; and
    hash the sorted unique representation values of the operator nodes to obtain the hash value of each deep learning model.
  16. The apparatus according to any one of claims 12 to 15, wherein the obtaining part is further configured to delete parameter information of each deep learning model before obtaining the network topology of each deep learning model and the attribute information of each operator node.
  17. The apparatus according to any one of claims 12 to 16, wherein the apparatus further comprises a query part, configured to:
    in response to a first speed measurement request from a user, determine a hash value of a first deep learning model based on an identifier of the first deep learning model included in the first speed measurement request; and
    look up the hash value of the first deep learning model in the database to obtain a speed measurement result of the first deep learning model.
  18. The apparatus according to claim 17, wherein the first speed measurement request further includes an identifier of a target running platform, and the query part, configured to look up the hash value of the first deep learning model in the database to obtain the speed measurement result of the first deep learning model, is configured to:
    find in the database a second deep learning model having the same hash value as the first deep learning model and the same identifier as the target running platform; and
    determine the speed measurement result of the second deep learning model on the target running platform as the speed measurement result of the first deep learning model on the target running platform.
  19. The apparatus according to claim 17, wherein the first speed measurement request further includes hardware corresponding to the first deep learning model and a deployment tool in the acceleration library, and the query part is configured to determine the target running platform corresponding to the first deep learning model based on the hardware corresponding to the first deep learning model and the deployment tool in the acceleration library.
  20. The apparatus according to any one of claims 17 to 19, wherein the query part is further configured to: in response to the hash value of the first deep learning model not being found in the database, measure the speed of the first deep learning model on each running platform in a first thread to obtain speed measurement results of the first deep learning model on each running platform; and
    add the speed measurement results of the first deep learning model on each running platform to the database.
  21. The apparatus according to claim 20, wherein the query part is further configured to respond, in a second thread, to a second speed measurement request from the user while the speed of the first deep learning model is being measured in the first thread.
  22. The apparatus according to any one of claims 12 to 21, wherein the query part is further configured to: in response to a third speed measurement request from a user, find in the database a second running platform having the same identifier as a first running platform, based on an identifier of the first running platform included in the third speed measurement request; and
    determine the speed measurement results of each deep learning model on the first running platform based on the speed measurement results of each deep learning model in the database on the second running platform.
  23. An electronic device, comprising a processor and a memory for storing a computer program executable on the processor; wherein
    the processor is configured to run the computer program to execute the model processing method according to any one of claims 1 to 11.
  24. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the model processing method according to any one of claims 1 to 11.
  25. A computer program product, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes the model processing method according to any one of claims 1 to 11.
PCT/CN2022/093836 2021-12-31 2022-05-19 Model processing method and apparatus, electronic device, computer storage medium and program WO2023123828A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111672855.5 2021-12-31
CN202111672855.5A CN114330668A (zh) 2021-12-31 2021-12-31 Model processing method and apparatus, electronic device and computer storage medium

Publications (1)

Publication Number Publication Date
WO2023123828A1 true WO2023123828A1 (zh) 2023-07-06

Family

ID=81020392

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/093836 WO2023123828A1 (zh) 2021-12-31 2022-05-19 Model processing method and apparatus, electronic device, computer storage medium and program

Country Status (2)

Country Link
CN (1) CN114330668A (zh)
WO (1) WO2023123828A1 (zh)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114330668A (zh) * 2021-12-31 2022-04-12 成都商汤科技有限公司 模型处理方法、装置、电子设备和计算机存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109358944A (zh) * 2018-09-17 2019-02-19 深算科技(重庆)有限公司 Deep learning distributed computing method and apparatus, computer device and storage medium
CN111340237A (zh) * 2020-03-05 2020-06-26 腾讯科技(深圳)有限公司 Data processing and model running method and apparatus, and computer device
CN112561081A (zh) * 2020-12-18 2021-03-26 北京百度网讯科技有限公司 Deep learning model conversion method and apparatus, electronic device and storage medium
CN114330668A (zh) * 2021-12-31 2022-04-12 成都商汤科技有限公司 Model processing method and apparatus, electronic device and computer storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11132687B2 (en) * 2019-10-04 2021-09-28 Visa International Service Association Method for dynamically reconfiguring machine learning models
CN111222637B (zh) * 2020-01-17 2023-11-28 上海商汤智能科技有限公司 Neural network model deployment method and apparatus, electronic device and storage medium
CN111353608B (zh) * 2020-02-26 2023-09-12 Oppo广东移动通信有限公司 Model transplanting method and related device
CN111447108B (zh) * 2020-03-23 2022-12-06 中电福富信息科技有限公司 Deep-learning-based progressive latency and bandwidth speed measurement analysis method and system
CN112506796B (zh) * 2020-12-21 2022-06-10 北京百度网讯科技有限公司 Data processing method, apparatus, device and storage medium
CN113282941A (zh) * 2021-06-15 2021-08-20 深圳市商汤科技有限公司 Method and apparatus for obtaining object identifier, electronic device and storage medium


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117311998A (zh) * 2023-11-30 2023-12-29 卓世未来(天津)科技有限公司 Large model deployment method and system
CN117311998B (zh) * 2023-11-30 2024-03-05 卓世未来(天津)科技有限公司 Large model deployment method and system

Also Published As

Publication number Publication date
CN114330668A (zh) 2022-04-12


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22913123

Country of ref document: EP

Kind code of ref document: A1