CN117455015B - Model optimization method and device, storage medium and electronic equipment - Google Patents


Info

Publication number: CN117455015B
Authority
CN
China
Prior art keywords: model, hardware resource, target model, different granularities, use information
Legal status: Active
Application number: CN202311774499.7A
Other languages: Chinese (zh)
Other versions: CN117455015A
Inventor: name not published, at the applicant's request
Current Assignee: Moore Thread Intelligent Technology Chengdu Co ltd
Original Assignee: Moore Thread Intelligent Technology Chengdu Co ltd
Application filed by Moore Thread Intelligent Technology Chengdu Co ltd
Priority: CN202311774499.7A
Publication of CN117455015A
Application granted; publication of CN117455015B

Classifications

    • G06N 20/00: Machine learning
    • G06F 11/302: Monitoring arrangements specially adapted to a software system
    • G06F 11/3055: Monitoring the status of the computing system or of a computing system component (e.g. on, off, available, not available)
    • G06F 11/3093: Configuration details of monitoring arrangements determined by the sensing means (e.g. interfaces, connectors, sensors, probes, agents)
    • G06F 11/3404: Recording or statistical evaluation of computer activity for parallel or distributed programming
    • G06F 11/3442: Recording or statistical evaluation of computer activity for planning or managing the needed capacity
    • G06F 2201/865: Monitoring of software

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Stored Programmes (AREA)

Abstract

The application discloses a model optimization method and apparatus, a storage medium, and an electronic device. After a user specifies a target model, preset data collection programs are called to collect the operation data of the model units of different granularities contained in the target model. This operation data is then analyzed and processed to determine the hardware resource usage information of those model units, and the target model is evaluated and optimized at multiple granularity levels according to that usage information, thereby effectively improving the performance of the target model.

Description

Model optimization method and device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of machine learning technologies, and in particular, to a method and apparatus for model optimization, a storage medium, and an electronic device.
Background
With the development of machine learning technology, the performance of machine learning models has received a great deal of attention. In general, a user can collect performance data of a machine learning model during training (such as the input and output sizes of operators, the execution time of operators, and the access frequency of operators) through code instrumentation, evaluate the machine learning model according to this performance data, and optimize the structure of the model according to the evaluation result, so as to improve its performance.
However, instrumentation guided only by a user's personal experience often yields little performance data, so the accuracy of a performance evaluation based on that data is low, the resulting optimization of the machine learning model is poor, and the performance improvement is small.
Therefore, how to effectively improve the performance of the machine learning model is a problem to be solved.
Disclosure of Invention
The invention provides a method, a device, a storage medium and electronic equipment for model optimization, which are used for partially solving the problems existing in the prior art.
The invention adopts the following technical scheme:
the invention provides a model optimization method, which comprises the following steps:
obtaining a target model;
calling preset data acquisition programs, and acquiring operation data of model units with different granularities in the target model and hardware resource global use information of hardware equipment when the target model is operated; wherein, the model unit of each granularity includes: at least one of an operator unit under operator granularity, a computation graph unit under computation graph granularity, and a target model under model granularity;
determining the hardware resource use information corresponding to the model units with different granularities according to the operation data and the hardware resource global use information;
and optimizing the target model according to the hardware resource use information and the operation data to obtain an optimized target model.
Optionally, before determining the hardware resource usage information corresponding to the model units with different granularities according to the operation data and the hardware resource global usage information, the method further includes:
determining the running start-stop time periods of model units with different granularities according to the running data;
sorting the running start-stop time periods of the model units of different granularities in chronological order to obtain run time sequences of the model units of different granularities;
according to the operation data and the hardware resource global use information, determining the hardware resource use information corresponding to model units with different granularities, wherein the method specifically comprises the following steps:
and determining the hardware resource use information corresponding to the model units with different granularities according to the run time sequence and the hardware resource global use information.
Optionally, determining the hardware resource usage information corresponding to the model units with different granularities according to the operation data and the hardware resource global usage information specifically includes:
determining call stack information of model units with different granularities according to the operation data;
and determining the hardware resource use information corresponding to the model units with different granularities according to the call stack information and the hardware resource global use information.
Optionally, optimizing the target model according to the hardware resource usage information and the operation data specifically includes:
and optimizing the target model according to the hierarchical relationship among different granularities, the hardware resource use information and the operation data.
Optionally, calling preset data acquisition programs to acquire the operation data of model units of different granularities in the target model specifically includes:
calling preset data acquisition programs to acquire operation data of model units of different granularities contained in the target model under different frameworks;
according to the operation data and the hardware resource global use information, determining the hardware resource use information corresponding to model units with different granularities, wherein the method specifically comprises the following steps:
determining the hardware resource use information corresponding to the model units of different granularities contained in the target model under each framework, according to the collected operation data of the model units of different granularities contained in the target model under each framework;
optimizing the target model according to the hardware resource use information and the operation data to obtain an optimized target model, wherein the method specifically comprises the following steps:
optimizing the target model according to the hardware resource use information corresponding to the model units of different granularities contained in the target model under each framework and the operation data of the model units of different granularities contained in the target model under each framework.
Optionally, optimizing the target model according to the hardware resource usage information and the operation data to obtain an optimized target model, which specifically includes:
generating a data visualization component corresponding to the target model according to the hardware resource use information and the operation data, wherein the data visualization component comprises: a hardware resource usage information statistics diagram component, a hardware resource usage information table component;
displaying the data visualization component to the user through equipment used by the user, and receiving an operation instruction sent by the user;
and optimizing the target model according to the operation instruction to obtain an optimized target model.
Optionally, the method further comprises:
storing the determined hardware resource use information into a database;
and if a data acquisition instruction sent by a user is received, retrieving the hardware resource use information matched with the acquisition instruction from the database, and sending the hardware resource use information to equipment used by the user so as to display the hardware resource use information matched with the acquisition instruction to the user through the equipment.
The invention provides a model optimization device, which comprises:
the acquisition module is used for acquiring the target model;
the acquisition module is used for calling preset data acquisition programs, and acquiring operation data of model units with different granularities in the target model and hardware resource global use information of hardware equipment when the target model is operated; wherein, the model unit of each granularity includes: at least one of an operator unit under operator granularity, a computation graph unit under computation graph granularity, and a target model under model granularity;
the processing module is used for determining the hardware resource use information corresponding to the model units with different granularities according to the operation data and the hardware resource global use information;
and the deployment module is used for optimizing the target model according to the hardware resource use information and the operation data to obtain an optimized target model.
The present invention provides a computer readable storage medium storing a computer program which when executed by a processor implements the method of model optimization described above.
The invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of model optimization described above when executing the program.
The at least one technical scheme adopted by the invention can achieve the following beneficial effects:
in the model optimization method provided by the invention, a target model is first obtained; preset data acquisition programs are called to acquire the operation data of the model units of different granularities in the target model and the hardware resource global use information of the hardware device when the target model is run, where the model units of each granularity include at least one of an operator unit at operator granularity, a computation graph unit at computation graph granularity, and the target model at model granularity; the hardware resource use information corresponding to the model units of different granularities is determined according to the operation data and the hardware resource global use information; and the target model is optimized according to the hardware resource use information and the operation data to obtain an optimized target model.
According to this method, after the user specifies the target model, the operation data of the model units of different granularities contained in the specified target model can be collected by calling the preset data collection programs, and then analyzed and processed to determine the hardware resource usage information of those model units. The target model can then be evaluated and optimized at multiple granularity levels according to that usage information, effectively improving its performance.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a flow chart of a method of model optimization provided in the present invention;
FIG. 2 is a schematic illustration of a visualization component provided in the present invention;
FIG. 3 is a schematic diagram of a visualization component generation process provided in the present invention;
FIG. 4 is a schematic diagram of a model optimization device provided by the invention;
fig. 5 is a schematic diagram of an electronic device corresponding to fig. 1 according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to specific embodiments of the present invention and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The following describes in detail the technical solutions provided by the embodiments of the present invention with reference to the accompanying drawings.
FIG. 1 is a schematic flow chart of a method for model optimization provided in the present invention, comprising the following steps:
s101: and obtaining a target model.
In an actual application scenario, in a process of training, deploying or maintaining a machine learning model, a user generally needs to optimize the machine learning model to improve performance of the machine learning model, and before that, the user may take the machine learning model needing to be optimized as a target model, and may send basic information of the target model to a service platform through equipment used by the user, where the basic information includes: model name, data set used by the model, hardware environment in which the model is run.
Of course, the user may also select the basic information of the target model from the candidate basic information provided by the service platform by performing a designating operation in the client of the service platform, where the designating operation may be: drag operation, text input operation, click operation, etc.
After receiving the basic information of the target model, the service platform can determine the target model according to the received basic information and generate a model analysis task aiming at the target model, so that the model analysis task aiming at the target model can be executed, each operation data of the target model is collected and analyzed, and the target model can be optimized according to an analysis result, so that the optimized target model is obtained and deployed.
In the present invention, the execution body of the model optimization method may be a designated device such as a server provided on a service platform, or a designated device such as a desktop computer, notebook computer, or mobile phone. For convenience of description, the model optimization method provided by the invention is described below with the server as the execution body.
S102: calling preset data acquisition programs, and acquiring operation data of model units with different granularities in the target model and hardware resource global use information of hardware equipment when the target model is operated; wherein, the model unit of each granularity includes: at least one of an operator unit at operator granularity, a computational graph unit at computational graph granularity, and a target model at model granularity.
Further, when executing a model analysis task for a target model, the server may collect operation data of model units with different granularities included in the target model by calling preset data collection programs, and collect global use information of hardware resources of a hardware device when the target model is operated, where the model units with different granularities included in the target model include: at least one of an operator unit at operator granularity, a computation graph unit at computation graph granularity, and a target model at model granularity, the operation data includes: the running start-stop time of each model unit, the call information of each model unit, and the like.
It should be noted that, at the code level, the model units of different granularities contained in the target model are composed of elements such as functions, methods, and operators. Therefore, when the server needs to acquire the operation data of these model units, it can call the preset data collection programs to collect the operation data of the functions, methods, and operators contained in the model units of different granularities, and then determine the operation data of the model units themselves from the collected data.
In the above, each data acquisition program is used for acquiring the operation data of at least one granularity model unit, and in addition, at least one data acquisition program is also used for acquiring the hardware resource global use information when the target model is operated.
The above-mentioned hardware resource global usage information may refer to data on the hardware resources of the hardware device occupied by the overall operation of the target model at each moment while the target model is running. The hardware device may be, for example, a graphics processor (Graphics Processing Unit, GPU), a central processing unit (Central Processing Unit, CPU), or a neural network processor (Neural network Processing Unit, NPU), and the hardware resource global usage information may include data such as GPU/CPU/NPU utilization, memory utilization, GPU/CPU/NPU power consumption, and bandwidth.
Each data acquisition program can be selected according to actual requirements, for example: PyTorch Profiler, NVIDIA Nsight, and the like.
The call information of each model unit described above may be data characterizing how that model unit is called, for example: the number of times it is called, the size of its input data when called, and the size of its output data after being called.
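At the code level, such a collection program can be sketched as a lightweight instrumentation wrapper that records each model unit's running start-stop time and call information. The following pure-Python sketch is illustrative only; the names `collect_run_data` and `RUN_RECORDS`, and the stand-in operator, are hypothetical and not taken from the patent:

```python
import time
from collections import defaultdict

# Hypothetical record store: one list of run records per model unit name.
RUN_RECORDS = defaultdict(list)

def collect_run_data(unit_name, granularity):
    """Decorator sketch: records start-stop time and call information
    (input size, output size) each time a model unit runs."""
    def wrap(fn):
        def inner(data):
            start = time.monotonic()
            out = fn(data)
            RUN_RECORDS[unit_name].append({
                "granularity": granularity,
                "start": start,
                "stop": time.monotonic(),
                "input_size": len(data),
                "output_size": len(out),
            })
            return out
        return inner
    return wrap

@collect_run_data("op_a", "operator")
def op_a(xs):
    # Stand-in operator for illustration: squares its inputs.
    return [x * x for x in xs]

op_a([1, 2, 3])
print(len(RUN_RECORDS["op_a"]))  # one run record collected
```

A real collector would more likely hook framework internals (e.g. a profiler callback) than decorate functions by hand, but the recorded fields correspond to the operation data described above.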
S103: and determining the hardware resource use information corresponding to the model units with different granularities according to the operation data and the hardware resource global use information.
After the server obtains the operation data of the model units of different granularities and the hardware resource global use information of the hardware device when the target model is run, it can organize the obtained operation data and hardware resource global use information into a specified data format, and either perform performance analysis on the target model based on the organized data, or store the organized data in a database so that it can be managed through the database.
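As one illustration of the storage step, the collated records can be written to a database keyed by model unit and granularity, so that they can later be retrieved in response to a user's data acquisition instruction. This sketch uses Python's built-in sqlite3 with a hypothetical schema; the patent does not specify a storage layout:

```python
import json
import sqlite3

# Hypothetical schema: one row per model unit's usage record.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE usage (unit TEXT, granularity TEXT, info TEXT)")
con.execute("INSERT INTO usage VALUES (?, ?, ?)",
            ("op_a", "operator", json.dumps({"cpu_pct": 10.0})))

# Retrieval in response to a (hypothetical) data acquisition instruction.
row = con.execute("SELECT info FROM usage WHERE unit = ?", ("op_a",)).fetchone()
print(json.loads(row[0])["cpu_pct"])  # 10.0
```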
Further, the server can determine the hardware resource usage information corresponding to the model units with different granularities according to the collected operation data of the model units with different granularities and the hardware resource global usage information.
Specifically, the server may determine the running start-stop time periods of the model units of different granularities according to the collected operation data, sort those start-stop time periods in chronological order to obtain the run time sequences of the model units of different granularities, and then determine the hardware resource usage information corresponding to the model units of different granularities according to the determined run time sequences and the hardware resource global usage information.
The server can align the determined run time sequences and the hardware resource global use information at different moments on a common time axis in chronological order, so that the hardware resource use information corresponding to the model units of different granularities can be determined from the overlap, on the aligned time axis, between the running start-stop periods of the model units contained in the run time sequences and the hardware resource global use information at different moments.
For example: if, according to the hardware resource global use information at time t1, the utilization rate of the CPU is determined to be ten percent, and at that moment operator a, an operator-granularity model unit contained in the target model, is running, then the CPU utilization rate corresponding to operator a can be determined to be ten percent.
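The alignment-and-overlap step in this example can be sketched as a small interval-overlap computation. The sampling representation and the function name are assumptions made for illustration, not the patent's implementation:

```python
def attribute_usage(intervals, samples):
    """Attribute global utilization samples to model units: a sample counts
    toward a unit if its timestamp falls inside the unit's running
    start-stop period on the shared time axis (mean over matching samples)."""
    usage = {}
    for unit, (start, stop) in intervals.items():
        hits = [u for t, u in samples if start <= t <= stop]
        usage[unit] = sum(hits) / len(hits) if hits else 0.0
    return usage

# The t1 example from the text: CPU is at ten percent while operator a runs.
samples = [(1.0, 10.0), (2.0, 40.0)]                     # (time, CPU %)
intervals = {"op_a": (0.9, 1.1), "graph_g": (1.8, 2.2)}  # run periods
print(attribute_usage(intervals, samples))  # op_a is attributed 10.0
```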
In addition, the server can determine the call stack information of the model units of different granularities according to the collected operation data, and determine the corresponding hardware resource use information according to the call stack information and the hardware resource global use information. Here, call stack information refers to the call information of the methods, functions, and operators invoked while running the program code corresponding to the target model, organized as a sequence according to the call relationships.
For example: the utilization rate of the CPU can be determined to be ten percent according to the hardware resource global utilization information at the time t1, the utilization rate of the CPU can be determined to be fifteen percent according to the hardware resource global utilization information at the time t2, the utilization rate of the CPU corresponding to the b operator can be determined to be five percent according to the call stack information when the b operator is called from the time t1 to the time t 2.
It should be noted that the server may further determine the distribution of the operators contained in the target model, for example which operators each computation graph contains and their execution order, from the overlap between the running start-stop periods of the model units and the call information of the model units. The operator distribution within each computation graph can then be optimized and adjusted according to the determined hardware resource use information corresponding to that graph.
S104: and optimizing the target model according to the hardware resource use information and the operation data to obtain an optimized target model.
Further, the server can optimize the target model according to the hardware resource usage information corresponding to the model units with different granularities and the operation data of the model units with different granularities to obtain an optimized target model.
In addition, after the optimized target model is obtained, the server can deploy the optimized target model and execute tasks through the deployed optimized target model.
The tasks performed by the optimized object model may be determined according to the object model, for example: if the target model is an image recognition model, image recognition can be performed according to the optimized image recognition model. For another example: if the target model is a search recommendation model, commodity recommendation can be carried out for the user according to the optimized search recommendation model.
When the server optimizes the target model according to the hardware resource usage information and the operation data of the model units of different granularities, it may do so according to a single-granularity model unit contained in the target model, or according to the hierarchical relationship between different granularities; both optimization methods are described in detail below.
If the server optimizes the target model according to single-granularity model units, then the model units of each granularity contained in the target model can be optimized according to their corresponding hardware resource usage information.
For example: if, according to the hardware resource usage information corresponding to a certain computation graph, the CPU usage rate is determined to be low when the graph is executed, the operators contained in the graph can be adjusted to raise the CPU usage rate during execution, thereby optimizing the target model.
For another example: if the power consumption of a certain operator is determined to be high when it is executed, according to its corresponding hardware resource usage information, a method or function contained in the operator can be adjusted to reduce its power consumption.
If the server needs to optimize the target model according to the hierarchical relationship between different granularities, the target model may be optimized according to the hierarchical relationship between different granularities, the hardware resource usage information corresponding to the model units with different granularities, and the operation data of the model units with different granularities, where the hierarchical relationship may be a containment relationship.
For example: if the power consumption of executing a computation graph is high, and the power consumption of an operator used heavily within that graph is also determined to be high, this indicates to some extent that the graph's high runtime power consumption is caused by that heavily used operator; the graph's use of that operator can therefore be reduced to optimize the target model.
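One illustrative way to exploit the containment hierarchy is to score each operator in a computation graph by how often it is used times its per-call cost, and flag the top scorer as the likely cause of the graph's high consumption. The scoring rule and all figures below are assumptions for illustration, not the patent's method:

```python
def find_hot_operator(call_counts, op_power):
    """Roll per-operator power up into the containing computation graph:
    score = call count * per-call power; the top scorer explains most of
    the graph's consumption and is the candidate to use less often."""
    return max(call_counts, key=lambda op: call_counts[op] * op_power[op])

# Hypothetical profile of one computation graph.
call_counts = {"matmul": 120, "relu": 120, "softmax": 4}    # calls per run
op_power    = {"matmul": 9.0, "relu": 0.3, "softmax": 1.5}  # power per call
print(find_hot_operator(call_counts, op_power))  # matmul
```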
It should be noted that the server may further optimize any one of the model units with different granularities contained in the target model according to the call information included in that unit's operation data and the hardware resource usage information corresponding to that unit. For example: the server can determine, from the call information of a model unit, the size of the input data each time the unit is called, and can then determine the hardware resource usage information corresponding to the unit when it processes input data of different sizes, so that an input data size suited to the unit's processing can be determined for it.
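The input-size example above could be implemented along the following lines; all names and the throughput heuristic are illustrative assumptions, not the patent's prescription.

```python
from collections import defaultdict
from typing import List, Tuple

def best_input_size(call_records: List[Tuple[int, float]]) -> int:
    """call_records: (input_size, latency_ms) pairs gathered from the
    call information of one model unit. Returns the input size with the
    best average throughput (elements processed per millisecond)."""
    by_size = defaultdict(list)
    for size, latency_ms in call_records:
        by_size[size].append(size / latency_ms)  # throughput of this call
    avg = {size: sum(t) / len(t) for size, t in by_size.items()}
    return max(avg, key=avg.get)

# Invented measurements: size 128 yields the best elements-per-ms here.
size = best_input_size([(64, 2.0), (64, 2.2), (128, 3.0), (256, 9.0)])
```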
In addition, the same method, function, or operator may differ somewhat between frameworks; for example: the code of different frameworks may differ in the specific implementation of the same method, function, or operator, as may the naming. Therefore, when performing performance evaluation on the target model, the target model also needs to be evaluated under different frameworks in order to optimize it.
Based on this, the server may also collect, through the preset data collection programs, operation data of the model units with different granularities contained in the target model under different frameworks, and optimize the target model according to the collected operation data of those model units under the different frameworks.
Specifically, the server may collect, through each data collection program, the operation data of the model units with different granularities contained in the target model under each framework; determine, according to the collected data, the hardware resource usage information corresponding to the model units with different granularities under each framework; and optimize the target model according to the hardware resource usage information and the operation data of the model units with different granularities contained in the target model under each framework.
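A minimal sketch of the cross-framework comparison, under assumed data shapes (the patent does not define how per-framework measurements are keyed), might look like this:

```python
from typing import Dict

def compare_across_frameworks(usage: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """usage: {framework: {unit_name: cpu_time_ms}}. For each model unit,
    returns the framework whose implementation spent the least CPU time."""
    units = set().union(*(per_fw.keys() for per_fw in usage.values()))
    best = {}
    for unit in units:
        # Only frameworks that actually measured this unit participate.
        candidates = {fw: per_fw[unit]
                      for fw, per_fw in usage.items() if unit in per_fw}
        best[unit] = min(candidates, key=candidates.get)
    return best

# Invented numbers: framework_b's matmul is faster, framework_a's softmax is.
winner = compare_across_frameworks({
    "framework_a": {"matmul": 4.1, "softmax": 0.9},
    "framework_b": {"matmul": 3.2, "softmax": 1.4},
})
```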
In addition, the server may generate a data visualization component corresponding to the target model according to the hardware resource usage information corresponding to the model units with different granularities and the operation data of the model units with different granularities, where the data visualization component includes: a hardware resource usage information statistics diagram component and a hardware resource usage information table component, as specifically shown in fig. 2.
Fig. 2 is a schematic diagram of a visualization component provided in the present invention.
As can be seen from fig. 2, the server may display, through the data visualization component, the model units with different granularities contained in the target model, where Op1, Op2, and Op3 in fig. 2 are different operators contained in the target model at operator granularity, and Fn1, Fn2, Fn3, and Fn4 are functions or methods contained in the operator Op1. The area of the sector corresponding to each operator may represent the CPU utilization corresponding to that operator; of course, it may instead represent the power consumption corresponding to that operator.
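The proportional-sector layout of a chart like fig. 2 can be sketched as pure geometry: each item's sector spans an angle proportional to its share of the displayed metric. The function name and the CPU figures below are invented for illustration.

```python
from typing import Dict, Tuple

def sector_angles(metric_by_item: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Map each item (e.g. an operator) to a (start_deg, end_deg) sector
    whose angular span is proportional to its share of the total metric."""
    total = sum(metric_by_item.values())
    angles, cursor = {}, 0.0
    for item, value in metric_by_item.items():
        span = 360.0 * value / total
        angles[item] = (cursor, cursor + span)
        cursor += span
    return angles

# Op1 accounts for half the CPU usage, so it occupies half the circle.
outer = sector_angles({"Op1": 50.0, "Op2": 30.0, "Op3": 20.0})
```

An inner ring for Fn1-Fn4 would be computed the same way, restricted to Op1's angular range.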
Further, the server can display the data visualization component to the user through the device used by the user, receive an operation instruction sent by the user, and optimize the target model according to the received operation instruction to obtain an optimized target model.
In addition, the server can store the determined hardware resource usage information corresponding to the model units with different granularities in a database; if a data acquisition instruction sent by a user is received, the hardware resource usage information matching the acquisition instruction is retrieved from the database and sent to the device used by the user, so that it can be displayed to the user through that device.
To further describe the above, the overall flow of generating and displaying the visualization component will be described in detail with reference to fig. 3.
Fig. 3 is a schematic diagram of a process for generating a visualization component provided in the present invention.
As can be seen from fig. 3, the server may generate a model analysis task for the target model according to the basic information of the target model input by the user. The generated model analysis task can then be executed: each preset data collection program is invoked to collect the operation data of the model units with different granularities contained in the target model, and the hardware resource global usage information of the hardware device while the target model runs. The collected operation data and hardware resource global usage information can then be analyzed to obtain the operation data of each model unit at different granularities contained in the target model, and finally the visualization component can be generated according to the determined operation data of those model units.
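The four stages of the fig. 3 flow can be outlined end to end as below. Every function name and data shape here is an illustrative stub, since the patent describes the flow only at the level of steps, not as an API.

```python
from typing import Callable, Dict

def run_analysis_task(model_info: dict,
                      collectors: Dict[str, Callable[[], dict]]) -> dict:
    # 1. Generate an analysis task from the model's basic information.
    task = {"model": model_info["name"], "collectors": list(collectors)}
    # 2. Invoke each data collection program; each returns per-unit
    #    operation data gathered while the target model runs.
    raw = {name: collect() for name, collect in collectors.items()}
    # 3. Analyze the collected data into operation data per model unit.
    per_unit = {unit: data
                for name in raw
                for unit, data in raw[name].items()}
    # 4. Assemble the payload from which the visualization component
    #    would be generated.
    return {"task": task, "units": per_unit}

result = run_analysis_task(
    {"name": "demo_model"},
    {"op_collector": lambda: {"Op1": {"cpu_util": 0.6}},
     "graph_collector": lambda: {"graph_0": {"cpu_util": 0.4}}},
)
```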
According to the above method, after the user specifies the target model, the preset data collection programs can be invoked to collect the operation data of the model units with different granularities contained in the specified target model, and that data can be analyzed to determine the hardware resource usage information of those model units. The target model can then be evaluated and optimized from multiple granularity levels according to the determined hardware resource usage information, so that the performance of the target model can be effectively improved.
The above is the model optimization method provided by one or more embodiments of the present invention. Based on the same idea, the present invention also provides a corresponding model optimization apparatus, as shown in fig. 4.
FIG. 4 is a schematic diagram of a model optimization apparatus according to the present invention, including:
an acquisition module 401, configured to acquire a target model;
the collection module 402 is configured to invoke preset data collection programs to collect operation data of model units with different granularities in the target model, and the hardware resource global usage information of the hardware device while the target model runs; wherein the model unit of each granularity includes: at least one of an operator unit at operator granularity, a computation graph unit at computation graph granularity, and the target model at model granularity;
a processing module 403, configured to determine, according to the operation data and the hardware resource global usage information, hardware resource usage information corresponding to model units with different granularities;
and the deployment module 404 is configured to optimize the target model according to the hardware resource usage information and the operation data, so as to obtain an optimized target model.
Optionally, the processing module 403 is specifically configured to: determine, according to the operation data, the run start-stop time periods of the model units with different granularities; sort the run start-stop time periods in time order to obtain the run time sequences of the model units with different granularities; and determine the hardware resource usage information corresponding to the model units with different granularities according to the run time sequences and the hardware resource global usage information.
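This timeline approach can be sketched as follows: periodic global hardware samples are attributed to whichever model units were running at each sample time. The data shapes are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

def attribute_samples(intervals: Dict[str, Tuple[float, float]],
                      samples: List[Tuple[float, float]]) -> Dict[str, List[float]]:
    """intervals: {unit: (start_s, end_s)} run start-stop periods;
    samples: (timestamp_s, cpu_util) global hardware measurements.
    Returns per-unit lists of the samples taken while that unit ran."""
    # Sort intervals into a run time sequence, as the module describes.
    timeline = sorted(intervals.items(), key=lambda kv: kv[1][0])
    per_unit = {unit: [] for unit in intervals}
    for ts, cpu in samples:
        for unit, (start, end) in timeline:
            if start <= ts <= end:
                per_unit[unit].append(cpu)
    return per_unit

usage = attribute_samples(
    {"op_a": (0.0, 1.0), "op_b": (1.0, 2.0)},
    [(0.5, 0.7), (1.5, 0.9)],
)
```

Nested units (an operator inside a computation graph) would simply receive the same samples as their parent, which matches the containment relationship discussed earlier.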
Optionally, the processing module 403 is specifically configured to determine call stack information of model units with different granularities according to the operation data; and determining the hardware resource use information corresponding to the model units with different granularities according to the call stack information and the hardware resource global use information.
Optionally, the deployment module 404 is specifically configured to optimize the target model according to the hierarchical relationship between different granularities, the hardware resource usage information, and the operation data.
Optionally, the collection module 402 is specifically configured to invoke preset data collection procedures to collect operation data of model units with different granularities included in the target model under different frames;
the processing module 403 is specifically configured to determine, according to collected operation data of model units with different granularities included in the target model under each frame, hardware resource usage information corresponding to the model units with different granularities included in the target model under each frame;
the deployment module 404 is specifically configured to optimize the target model according to the hardware resource usage information corresponding to the model units with different granularities included in the target model under each framework, and the operation data of the model units with different granularities included in the target model under each framework.
Optionally, the processing module 403 is specifically configured to: generate, according to the hardware resource usage information and the operation data, a data visualization component corresponding to the target model, where the data visualization component includes: a hardware resource usage information statistics diagram component and a hardware resource usage information table component; display the data visualization component to the user through the device used by the user, and receive an operation instruction sent by the user; and optimize the target model according to the operation instruction to obtain an optimized target model.
Optionally, the apparatus further comprises: a management module 405;
the management module 405 is specifically configured to store the determined usage information of the hardware resource in a database; and if a data acquisition instruction sent by a user is received, retrieving the hardware resource use information matched with the acquisition instruction from the database, and sending the hardware resource use information to equipment used by the user so as to display the hardware resource use information matched with the acquisition instruction to the user through the equipment.
The present invention also provides a computer readable storage medium storing a computer program operable to perform the model optimization method provided in fig. 1 above.
The invention also provides a schematic block diagram of the electronic device shown in fig. 5, corresponding to fig. 1. As shown in fig. 5, at the hardware level the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into the memory and then runs it to implement the model optimization method described above with respect to fig. 1. Of course, the present invention does not exclude other implementations, such as logic devices or combinations of hardware and software; that is, the execution subject of the processing flows is not limited to logic units, but may also be hardware or logic devices.
For an improvement to a technology, it used to be possible to clearly distinguish whether the improvement was in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or in software (an improvement to a method flow). However, with the development of technology, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD, without requiring the chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the original code before compiling must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained merely by briefly programming the method flow into an integrated circuit using one of the above hardware description languages.
The controller may be implemented in any suitable manner; for example, the controller may take the form of a microprocessor or processor together with a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer readable program code, it is entirely possible to implement the same functionality by logically programming the method steps such that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. Or even the means for performing various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in the same piece or pieces of software and/or hardware when implementing the present invention.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments of the present invention are described in a progressive manner, and the same and similar parts of the embodiments are all referred to each other, and each embodiment is mainly described in the differences from the other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing is merely exemplary of the present invention and is not intended to limit the present invention. Various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are to be included in the scope of the claims of the present invention.

Claims (10)

1. A method of model optimization, comprising:
obtaining a target model;
calling preset data acquisition programs, and acquiring operation data of model units with different granularities in the target model and hardware resource global use information of hardware equipment when the target model is operated; wherein, the model unit of each granularity includes: at least one of an operator unit under operator granularity, a computation graph unit under computation graph granularity and a target model under model granularity, wherein the hardware resource global use information is used for representing the occupation condition of the whole target model on the hardware resources in the hardware equipment when the target model runs;
Determining the hardware resource use information corresponding to the model units with different granularities according to the operation data and the hardware resource global use information;
and optimizing the target model according to the hardware resource use information corresponding to the model units with different granularities and the operation data to obtain an optimized target model.
2. The method of claim 1, wherein prior to determining hardware resource usage information corresponding to model units of different granularity based on the operational data and the hardware resource global usage information, the method further comprises:
determining the running start-stop time periods of model units with different granularities according to the running data;
sequencing the running start-stop time periods of the model units with different granularities according to the sequence of time to obtain the running time sequences of the model units with different granularities;
according to the operation data and the hardware resource global use information, determining the hardware resource use information corresponding to model units with different granularities, wherein the method specifically comprises the following steps:
and determining the hardware resource use information corresponding to the model units with different granularities according to the run time sequence and the hardware resource global use information.
3. The method of claim 1, wherein determining hardware resource usage information corresponding to model units of different granularities according to the operation data and the hardware resource global usage information specifically comprises:
determining call stack information of model units with different granularities according to the operation data;
and determining the hardware resource use information corresponding to the model units with different granularities according to the call stack information and the hardware resource global use information.
4. The method of claim 1, wherein optimizing the object model based on the hardware resource usage information and the operational data, comprises:
and optimizing the target model according to the hierarchical relationship among different granularities, the hardware resource use information and the operation data.
5. The method of claim 1, wherein calling each preset data acquisition program to acquire the operation data of model units with different granularities in the target model, specifically comprises:
calling preset data acquisition programs to acquire operation data of model units with different granularities contained in the target model under different frames;
According to the operation data and the hardware resource global use information, determining the hardware resource use information corresponding to model units with different granularities, wherein the method specifically comprises the following steps:
determining hardware resource use information corresponding to model units with different granularities contained in the target model under each frame according to collected operation data of the model units with different granularities contained in the target model under each frame;
optimizing the target model according to the hardware resource use information and the operation data to obtain an optimized target model, wherein the method specifically comprises the following steps:
and optimizing the target model according to the hardware resource usage information corresponding to the model units with different granularities contained in the target model under each frame and the operation data of the model units with different granularities contained in the target model under each frame.
6. The method of claim 1, wherein optimizing the target model based on the hardware resource usage information and the operational data to obtain an optimized target model, specifically comprises:
generating a data visualization component corresponding to the target model according to the hardware resource use information and the operation data, wherein the data visualization component comprises: a hardware resource usage information statistics diagram component, a hardware resource usage information table component;
Displaying the data visualization component to the user through equipment used by the user, and receiving an operation instruction sent by the user;
and optimizing the target model according to the operation instruction to obtain an optimized target model.
7. The method of claim 1, wherein the method further comprises:
storing the determined hardware resource use information into a database;
and if a data acquisition instruction sent by a user is received, retrieving the hardware resource use information matched with the acquisition instruction from the database, and sending the hardware resource use information to equipment used by the user so as to display the hardware resource use information matched with the acquisition instruction to the user through the equipment.
8. A model-optimized apparatus, comprising:
the acquisition module is used for acquiring the target model;
the collection module is used for calling preset data acquisition programs, and acquiring operation data of model units with different granularities in the target model and hardware resource global use information of hardware equipment when the target model is operated; wherein, the model unit of each granularity includes: at least one of an operator unit under operator granularity, a computation graph unit under computation graph granularity and a target model under model granularity, wherein the hardware resource global use information is used for representing the occupation condition of the whole target model on the hardware resources in the hardware equipment when the target model runs;
The processing module is used for determining the hardware resource use information corresponding to the model units with different granularities according to the operation data and the hardware resource global use information;
the deployment module is used for optimizing the target model according to the hardware resource use information corresponding to the model units with different granularities and the operation data to obtain an optimized target model.
9. A computer readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of the preceding claims 1-7 when executing the program.
CN202311774499.7A 2023-12-20 2023-12-20 Model optimization method and device, storage medium and electronic equipment Active CN117455015B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311774499.7A CN117455015B (en) 2023-12-20 2023-12-20 Model optimization method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN117455015A CN117455015A (en) 2024-01-26
CN117455015B true CN117455015B (en) 2024-04-02

Family

ID=89585880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311774499.7A Active CN117455015B (en) 2023-12-20 2023-12-20 Model optimization method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117455015B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113934410A (en) * 2021-10-19 2022-01-14 北京航空航天大学 Multi-hardware target depth model optimization deployment framework supporting custom operators
CN115034402A (en) * 2022-06-20 2022-09-09 寒武纪行歌(南京)科技有限公司 Model reasoning performance optimization method and device and related products
CN115129460A (en) * 2021-03-25 2022-09-30 上海寒武纪信息科技有限公司 Method and device for acquiring operator hardware time, computer equipment and storage medium
CN115222950A (en) * 2022-07-26 2022-10-21 西安工业大学 Lightweight target detection method for embedded platform
CN116126365A (en) * 2023-04-18 2023-05-16 之江实验室 Model deployment method, system, storage medium and electronic equipment
CN116306812A (en) * 2023-03-06 2023-06-23 南京大学 Back-end extensible framework and method for deep learning model parsing, optimization and deployment
CN116306856A (en) * 2023-05-17 2023-06-23 之江实验室 Deep learning model deployment method and device based on search
CN116400963A (en) * 2023-03-27 2023-07-07 杭州电子科技大学 Model automatic parallel method, device and storage medium based on load balancing
CN117170685A (en) * 2023-11-02 2023-12-05 腾讯科技(深圳)有限公司 Data processing method, device, equipment and medium
CN117217280A (en) * 2022-05-31 2023-12-12 华为技术有限公司 Neural network model optimization method and device and computing equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10261806B2 (en) * 2017-04-28 2019-04-16 International Business Machines Corporation Adaptive hardware configuration for data analytics
CN113095474A (en) * 2020-01-09 2021-07-09 微软技术许可有限责任公司 Resource usage prediction for deep learning models
EP4261749A1 (en) * 2022-04-13 2023-10-18 Tata Consultancy Services Limited Automated creation of tiny deep learning models based on multi-objective reward function


Also Published As

Publication number Publication date
CN117455015A (en) 2024-01-26

Similar Documents

Publication Publication Date Title
CN117010571A (en) Traffic prediction method, device and equipment
CN111783018B (en) Page processing method, device and equipment
CN110826894A (en) Hyper-parameter determination method and device and electronic equipment
CN116521380A (en) Resource self-adaptive collaborative model training acceleration method, device and equipment
CN116225669B (en) Task execution method and device, storage medium and electronic equipment
CN112364074B (en) Time sequence data visualization method, equipment and medium
CN117828360A (en) Model training method, model training device, model code generating device, storage medium and storage medium
CN116402165B (en) Operator detection method and device, storage medium and electronic equipment
CN116167431B (en) Service processing method and device based on hybrid precision model acceleration
CN113886033A (en) Task processing method and device
CN117370536A (en) Task execution method and device, storage medium and electronic equipment
CN116341642B (en) Data processing method and device, storage medium and electronic equipment
CN117455015B (en) Model optimization method and device, storage medium and electronic equipment
CN115017915B (en) Model training and task execution method and device
CN116842715A (en) Simulation data structuring processing system
CN112307371B (en) Applet sub-service identification method, device, equipment and storage medium
CN111967769B (en) Risk identification method, apparatus, device and medium
CN110704742B (en) Feature extraction method and device
CN111967767A (en) Business risk identification method, device, equipment and medium
CN112104716A (en) Method and device for collecting data of software project, readable storage medium and equipment
CN116755862B (en) Training method, device, medium and equipment for operator optimized scheduling model
CN117992600B (en) Service execution method and device, storage medium and electronic equipment
CN109903165A (en) A kind of model merging method and device
CN117573359B (en) Heterogeneous cluster-based computing framework management system and method
CN117591217A (en) Information display method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant