WO2022161059A1 - A model running method and related apparatus - Google Patents

A model running method and related apparatus

Info

Publication number
WO2022161059A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
target
module
control instruction
computing unit
Prior art date
Application number
PCT/CN2021/141399
Other languages
English (en)
French (fr)
Inventor
罗佳
张忠立
Original Assignee
展讯通信(上海)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 展讯通信(上海)有限公司
Publication of WO2022161059A1 publication Critical patent/WO2022161059A1/zh

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00: Arrangements for software engineering
    • G06F 8/40: Transformation of program code
    • G06F 8/41: Compilation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation using electronic means
    • G06N 3/065: Analogue means

Definitions

  • the present application relates to the field of information processing, and in particular, to a model running method and related devices.
  • terminal devices, such as mobile phones, tablet computers, and vehicle-mounted terminals, can run application models that give them diversified functions, providing great convenience for people's daily life.
  • the process of a terminal device calling an application model to perform a task can be summarized into two stages: the model compilation stage (also called the model preparation stage) and the model calculation stage. For example, when an image recognition model based on a neural network performs an image recognition task in a terminal device, the terminal device needs to first load and compile the image recognition model (i.e., the model compilation stage), and then recognize the target image to be recognized based on the model compilation result of the image recognition model (i.e., the model calculation stage).
  • the model compilation stage can be simply understood as a necessary prerequisite for the model calculation stage. If the application model is too complex, the time consumed by the model compilation stage (including model loading, model optimization, model compilation, etc.) increases, and the efficiency with which the terminal device calls the application model to perform tasks decreases.
  • the embodiments of the present application provide a model running method and a related device.
  • a terminal device can obtain the model compilation result of the model to be called from the memory, thereby shortening the time for calling the model to perform the target task and improving the efficiency of performing the target task.
  • an embodiment of the present application provides a method for running a model, and the method includes:
  • receiving a model call instruction, where the model call instruction is used to call a target model to perform a target task; determining, based on the model call instruction, at least one module model corresponding to the target model, and obtaining a first control instruction and a first model parameter of each module model from a cache; and executing the target task based on the first control instruction and the first model parameter corresponding to each module model.
  • the terminal device can obtain the model control instructions and model parameters compiled in advance from the memory, which saves model compilation time during the model calling process, thereby improving the efficiency with which the model performs tasks (also called the efficiency of calling the model).
  • in one implementation, before the first control instruction and the first model parameter of each module model are obtained from the cache, the target model is segmented, based on a plurality of operators included in the target model, into one or more module models, where each module model includes at least one operator; each module model is compiled to obtain its first control instruction and first model parameter; and the first control instruction and the first model parameter of each module model are stored in the cache.
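As a rough illustration, the segment-compile-store flow described above might be sketched as follows. All class and function names here (ModuleModel, CompiledModule, prepare_model, etc.) are hypothetical, not identifiers from the patent, and the "compiler" is a trivial placeholder:

```python
from dataclasses import dataclass

@dataclass
class ModuleModel:
    """A segment of the target model containing at least one operator."""
    name: str
    operators: list

@dataclass
class CompiledModule:
    """Result of compiling one module model."""
    control_instruction: bytes   # the "first control instruction"
    model_parameters: dict       # the "first model parameters"

def compile_module(module: ModuleModel) -> CompiledModule:
    # Placeholder compiler: a real implementation would emit hardware
    # instructions for the module's target computing unit.
    instr = "|".join(module.operators).encode()
    return CompiledModule(instr, {"weights": f"params_of_{module.name}"})

def prepare_model(model_id: str, modules: list, cache: dict) -> None:
    """Compile every module model once and store the results in the cache."""
    for m in modules:
        cache[(model_id, m.name)] = compile_module(m)

cache = {}
modules = [ModuleModel("module1", ["op1"]),
           ModuleModel("module2", ["op2", "op3"])]
prepare_model("model1", modules, cache)
```

A later call to the same model can then read `cache[(model_id, module_name)]` directly instead of recompiling.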
  • in one implementation, there are multiple module models, the multiple module models correspond to at least two types of computing units, and the first control instruction and the first model parameter of each module model correspond to its computing unit.
  • the type of the computing unit is: a central processing unit (CPU), a graphics processing unit (GPU), or a neural network processor (NPU).
  • in one implementation, it is detected whether the computing unit corresponding to each module model is in an unavailable state; a target module model whose first target computing unit is in an unavailable state is determined based on the detection result; the target module model is compiled based on a second target computing unit to obtain a second control instruction and a second model parameter corresponding to the target module model, where the second target computing unit has a corresponding relationship with the target module model and is different from the first target computing unit; and the first control instruction and the first model parameter of the target module model stored in the cache are updated according to the second control instruction and the second model parameter.
  • it can also be detected whether the first target computing unit corresponding to the target module model is still in an unavailable state; if the first target computing unit is no longer in an unavailable state, the target module model is compiled based on the first target computing unit to obtain a third control instruction and a third model parameter corresponding to the target module model, and the first control instruction and the first model parameter of the target module model stored in the cache are updated according to the third control instruction and the third model parameter.
  • an embodiment of the present application provides a model running device, where the model running device includes:
  • a receiving unit configured to receive a model invocation instruction, the model invocation instruction is used to invoke the target model to perform the target task;
  • a processing unit configured to determine at least one module model corresponding to the target model based on the model invocation instruction, and obtain the first control instruction and the first model parameter of each of the module models from the cache;
  • the processing unit is further configured to execute the target task based on the first control instruction and the first model parameter corresponding to each of the module models.
  • in one implementation, before acquiring the first control instruction and the first model parameter of each of the module models from the cache, the processing unit is further configured to: segment the target model based on a plurality of operators included in the target model to obtain one or more module models, each module model including at least one of the operators; compile each of the module models to obtain the first control instruction and the first model parameter of each module model; and store the first control instruction and the first model parameter of each of the module models in the cache.
  • in one implementation, there are multiple module models, the multiple module models correspond to at least two types of computing units, and the first control instruction and the first model parameter of each module model correspond to its computing unit.
  • the type of the computing unit is: a central processing unit (CPU), a graphics processing unit (GPU), or a neural network processor (NPU).
  • in one implementation, the processing unit is specifically configured to: detect whether the computing unit corresponding to each module model is in an unavailable state; determine a target module model from the module models based on the detection result, where the first target computing unit corresponding to the target module model is in an unavailable state; compile the target module model based on a second target computing unit to obtain a second control instruction and a second model parameter corresponding to the target module model, where the second target computing unit has a corresponding relationship with the target module model and is different from the first target computing unit; and update, according to the second control instruction and the second model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.
  • in one implementation, the processing unit is further configured to: detect whether the first target computing unit corresponding to the target module model is still in an unavailable state; and if the first target computing unit is no longer in an unavailable state, compile the target module model based on the first target computing unit, obtain a third control instruction and a third model parameter corresponding to the target module model, and update, according to the third control instruction and the third model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.
  • the present application provides a chip, which is configured to: receive a model call instruction, where the model call instruction is used to call a target model to perform a target task; determine, based on the model call instruction, at least one module model corresponding to the target model, and obtain the first control instruction and the first model parameter of each module model from the cache; and execute the target task based on the first control instruction and the first model parameter corresponding to each module model.
  • the present application provides a chip module, and the chip module includes the chip of the aforementioned third aspect.
  • the present application provides a terminal device, the terminal device comprising:
  • a processor, which invokes a computer program and is configured to perform the following operations: receive a model invocation instruction, where the model invocation instruction is used to invoke the target model to perform the target task; determine at least one module model corresponding to the target model based on the model invocation instruction, and obtain the first control instruction and the first model parameter of each module model from the cache; and execute the target task based on the first control instruction and the first model parameter corresponding to each module model.
  • an embodiment of the present application provides a computer-readable storage medium for storing computer software instructions used by the terminal device, including a program for executing any of the methods described in the first aspect.
  • FIG. 1 is a schematic flowchart of a model running method provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of storing data in a cache provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of a target model provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of another model running method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the correspondence between each module model of a target model and a computing unit according to an embodiment of the present application
  • FIG. 6 is a schematic structural diagram of a model running device provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
  • At least one (item) means one or more
  • plural means two or more
  • at least two (items) means two or more than two
  • "and/or" is used to describe an association relationship between associated objects, indicating that three kinds of relationships can exist; for example, "A and/or B" can mean: only A exists, both A and B exist, or only B exists, where A and B can be singular or plural.
  • the character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
  • "at least one of the following items" or similar expressions refer to any combination of these items, including any combination of a single item or multiple items.
  • at least one (item) of a, b, or c can mean: a; b; c; "a and b"; "a and c"; "b and c"; or "a and b and c", where each of a, b, and c can be singular or plural.
  • the model running method in this application can be applied to terminal equipment.
  • the terminal equipment mentioned in this application can also be referred to as a terminal, user equipment, access terminal, subscriber unit, subscriber station, mobile station, remote station, remote terminal, mobile device, user terminal, user agent, or user device.
  • the terminal device in the embodiments of the present application may be a mobile phone, a tablet computer (Pad), a computer with a wireless transceiver function, a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in remote medical, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, etc.
  • Cache (also known as cache memory): a small-capacity but high-speed memory located between the central processing unit (Central Processing Unit, CPU) and dynamic random access memory (Dynamic Random Access Memory, DRAM), usually composed of static random access memory (Static Random Access Memory, SRAM).
  • GPU: Graphics Processing Unit
  • SIMD: Single Instruction Multiple Data
  • NPU (Neural Network Processing Unit): its operating principle is to simulate human neurons and synapses at the circuit level, and to use a deep learning instruction set to directly process large-scale neurons and synapses, with one instruction completing the processing of a group of neurons. Compared with the CPU and GPU, the NPU is much more efficient at such workloads.
  • FIG. 1 is a schematic flowchart of a model running method provided by an embodiment of the present application.
  • the execution body of the method shown in FIG. 1 may be a terminal device, or a chip in the terminal device.
  • FIG. 1 takes a terminal device as an execution subject of the method as an example for description.
  • the model running method includes steps S101 to S103.
  • the terminal device receives the model invocation instruction input by the user.
  • for example, application 1 is used to realize a portrait recognition function, and the portrait recognition function is implemented based on a convolutional neural network (Convolutional Neural Networks, CNN) model.
  • Application 1 is installed in the user's mobile phone terminal.
  • when the user opens application 1 to start the portrait recognition function, it can be regarded that the terminal device has received a model invocation instruction; based on the model invocation instruction, the CNN model corresponding to application 1 is determined as the target model, and the CNN model is called to perform portrait recognition (i.e., the target task).
  • S102 Based on the model calling instruction, determine at least one module model corresponding to the target model, and obtain the first control instruction and the first model parameter of each module model from the cache.
  • the cache of the terminal device stores the first control instruction and the first model parameter of at least one module model, and each module model carries the identifier of the target model.
  • the terminal device may determine the first control instruction and the first model parameter of the module model corresponding to the target model from the cache according to the identifier of the target model in the model invocation instruction.
  • a module model corresponds to a first control instruction and a first model parameter, and the first control instruction and the first model parameter have an association or a calling relationship.
  • as shown in FIG. 2, the cache of the terminal device stores the first control instructions and the first model parameters included in multiple module models, and the module models with the same model identifier form one model.
  • a model corresponds to an application program installed in the terminal device.
  • one model corresponds to one application program; in other embodiments, one application program may correspond to multiple models.
  • the present application does not limit the correspondence between models and application programs.
  • when the terminal device receives the model invocation instruction and determines, according to the model invocation instruction, that model 1 is the target model, the terminal device obtains from the cache the first control instruction and the first model parameter included in module model 1 of model 1, as well as the first control instruction and the first model parameter included in module model 2 of model 1.
  • the present application introduces in detail the specific manner of acquiring the first control instruction and the first model parameter of each module model.
  • the terminal device divides the target model based on multiple operators included in the target model to obtain one or more module models, where the module model includes at least one operator. Further, the terminal device compiles each module model, obtains the first control instruction and the first model parameter of each module model, and stores the first control instruction and the first model parameter of each module model in the cache.
  • an operator is a basic computational unit of the target model, and can be understood as a function in the model, or as a collection of multiple functions in the model, which is not specifically limited here.
  • each convolutional layer, pooling layer, and fully connected layer can be considered as one operator.
  • the target model is shown in FIG. 3 .
  • the target model includes operator 1, operator 2, operator 3, operator 4, operator 5 and operator 6.
  • based on the connection structure between the operators, the terminal device can divide the target model into module model 1, module model 2, module model 3, and module model 4 as shown in FIG. 3, where module model 1 includes operator 1, module model 2 includes operator 2, operator 3, and operator 4 connected in series, module model 3 includes operator 5, and module model 4 includes operator 6. Further, the terminal device compiles module model 1, module model 2, module model 3, and module model 4 to obtain the first control instruction and the first model parameter of each of module model 1, module model 2, module model 3, and module model 4, and stores the first control instruction and the first model parameter of each module model in the cache.
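One plausible reading of partitioning by "connection structure" is merging maximal chains of serially connected operators into one module model. The sketch below is illustrative only: the patent does not fix the exact partitioning rule, and the operator graph used here is a hypothetical branching structure chosen so that the result matches the four module models described above (FIG. 3 itself is not visible in this text):

```python
def partition_serial_chains(edges, operators):
    """Group operators into module models by merging maximal serial chains.

    An operator joins the current chain only when it has exactly one
    predecessor, that predecessor has exactly one successor, and the
    predecessor is the chain's last operator (a purely serial link).
    """
    succ = {op: [] for op in operators}
    pred = {op: [] for op in operators}
    for a, b in edges:
        succ[a].append(b)
        pred[b].append(a)

    modules, current = [], []
    for op in operators:  # operators listed in topological order
        if (current and len(pred[op]) == 1
                and len(succ[pred[op][0]]) == 1
                and pred[op][0] == current[-1]):
            current.append(op)
        else:
            if current:
                modules.append(current)
            current = [op]
    if current:
        modules.append(current)
    return modules

# Hypothetical graph: op1 fans out to op2 and op5; op4 and op5 join at op6.
edges = [("op1", "op2"), ("op1", "op5"), ("op2", "op3"),
         ("op3", "op4"), ("op4", "op6"), ("op5", "op6")]
ops = ["op1", "op2", "op3", "op4", "op5", "op6"]
modules = partition_serial_chains(edges, ops)
# modules == [["op1"], ["op2", "op3", "op4"], ["op5"], ["op6"]]
```

Under these assumptions, the four resulting groups correspond to module models 1 through 4 in the example.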
  • the number of module models of the target model is multiple, and the multiple module models correspond to at least two types of computing units (also referred to as chips or processors).
  • the first control instruction and the first model parameter of the module model correspond to the computing unit.
  • specifically, each module model is compiled based on its corresponding computing unit to obtain the first control instruction and the first model parameter of that module model. For example, if a module model corresponds to the CPU, the terminal device compiles the module model into a first control instruction (also called a hardware instruction) in a format corresponding to the CPU and a first model parameter corresponding to that first control instruction.
  • the computing unit includes CPU, GPU and NPU.
  • for example, the terminal device includes a CPU, a GPU, and an NPU. Taking the target model shown in FIG. 3 as an example, the terminal device divides the target model into module model 1, module model 2, module model 3, and module model 4. If module model 1 and module model 4 run on the CPU, module model 2 runs on the GPU, and module model 3 runs on the NPU, the terminal device compiles module model 1 and module model 4 based on the CPU to obtain the first control instruction and the first model parameter of module model 1 and of module model 4 respectively; compiles module model 2 based on the GPU to obtain the first control instruction and the first model parameter of module model 2; and compiles module model 3 based on the NPU to obtain the first control instruction and the first model parameter of module model 3.
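The heterogeneous compilation step above can be sketched as a dispatch over per-unit compilers. The compiler functions and the `placement` mapping below are hypothetical stand-ins; a real system would invoke each chip's actual toolchain:

```python
# Hypothetical per-unit compilers; each would normally emit hardware
# instructions in the format of its computing unit.
COMPILERS = {
    "CPU": lambda module: f"cpu_instr({module})",
    "GPU": lambda module: f"gpu_instr({module})",
    "NPU": lambda module: f"npu_instr({module})",
}

# Correspondence between module models and computing units, as in the example.
placement = {
    "module1": "CPU",
    "module2": "GPU",
    "module3": "NPU",
    "module4": "CPU",
}

def compile_heterogeneous(placement):
    """Compile each module model with the compiler of its computing unit."""
    return {mod: COMPILERS[unit](mod) for mod, unit in placement.items()}

compiled = compile_heterogeneous(placement)
```

Each entry of `compiled` would then be stored in the cache as that module model's first control instruction.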
  • the number of module models of the target model is one.
  • that is, the target model corresponds to one module model (the target model does not have model heterogeneity). In this case, the module model is the complete target model, and the terminal device compiles the target model (including one or more of model loading, model optimization, and model compilation) to obtain the first control instruction and the first model parameter of the target model.
  • S103 Execute the target task based on the first control instruction and the first model parameter corresponding to each module model.
  • the terminal device executes the target task based on the connection relationship between each module model, the first control instruction and the first model parameter.
  • the terminal device stores the first control instruction and the first model parameter of each module model corresponding to the target model in the cache, so that after subsequently receiving a model calling instruction for the target model, it can directly obtain the first control instruction and the first model parameter from the cache, shortening the calling time of the target model and improving the efficiency with which the target model performs the target task.
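The cached invocation flow of steps S101 to S103 might be sketched as follows. All names (`run_target_model`, `modules_of`, the executor callback) are illustrative assumptions, not identifiers from the patent:

```python
def run_target_model(model_id, cache, modules_of, execute):
    """Handle a model call instruction (S101) using precompiled cache entries.

    `cache` maps (model_id, module_name) -> (control_instruction, params);
    `modules_of` maps model_id -> ordered module names; `execute` runs one
    compiled module model and returns its output.
    """
    outputs = []
    for name in modules_of[model_id]:            # S102: each module model
        instr, params = cache[(model_id, name)]  # cache hit: no recompilation
        outputs.append(execute(instr, params))   # S103: run the compiled module
    return outputs

# Illustrative usage with a trivial executor.
cache = {("model1", "module1"): ("cpu_instr_1", {"w": 1}),
         ("model1", "module2"): ("gpu_instr_2", {"w": 2})}
modules_of = {"model1": ["module1", "module2"]}
result = run_target_model("model1", cache, modules_of,
                          lambda instr, params: f"ran:{instr}")
# result == ["ran:cpu_instr_1", "ran:gpu_instr_2"]
```

The design point is simply that the hot path reads compiled artifacts from the cache; the compilation cost is paid once, outside the call path.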
  • FIG. 4 is a schematic flowchart of a model running method provided by an embodiment of the present application.
  • the execution body of the method shown in FIG. 4 may be a terminal device, or a chip in the terminal device.
  • FIG. 4 takes a terminal device as an execution subject of the method as an example for description.
  • the model running method includes steps S401 to S407 .
  • S401 Receive a model invocation instruction, where the model invocation instruction is used to invoke a target model to execute a target task.
  • S402 Determine at least one module model corresponding to the target model based on the model calling instruction.
  • For specific implementations of steps S401 to S402, reference may be made to the relevant descriptions of steps S101 to S102 in the foregoing embodiment, and details are not repeated here.
  • that a computing unit is in an unavailable state can be understood as: the computing unit is in a closed state (for example, the terminal device turns off the NPU in order to save power; in this case, the terminal device detects that the NPU is unavailable), or the computing unit has insufficient cache.
  • specifically, the terminal device detects whether each computing unit running a module model is in an unavailable state. For example, the terminal device divides the target model into module model 1, module model 2, module model 3, and module model 4, where module model 1 and module model 4 run on the CPU, module model 2 runs on the GPU, and module model 3 runs on the NPU. In this case, the terminal device detects whether the CPU, GPU, and NPU are in an unavailable state.
  • the terminal device determines a first target computing unit in an unavailable state from a plurality of computing units included in the terminal device, and determines a module model whose target model runs on the first target computing unit as a target module model.
  • S405 Compile the target module model based on the second target computing unit to obtain the second control instruction and the second model parameter corresponding to the target module model, where the second target computing unit has a corresponding relationship with the target module model and is different from the first target computing unit.
  • the second target computing unit has a corresponding relationship with the target module model, which can be understood as the target module model running on the second target computing unit.
  • after the terminal device determines the first target computing unit and the target module model, the terminal device determines a second target computing unit from the computing units other than the first target computing unit, and compiles the target module model based on the second target computing unit to obtain the second control instruction and the second model parameter.
  • for example, the target model includes module model 1, module model 2, module model 3, and module model 4 as shown in FIG. 3, where the correspondence between each module model and the computing units is shown as 5a in FIG. 5: module model 1 and module model 4 run on the CPU, module model 2 runs on the GPU, and module model 3 runs on the NPU.
  • if the terminal device detects that the NPU is in an unavailable state, the terminal device regards the NPU as the first target computing unit and determines module model 3, which runs on the NPU, as the target module model.
  • if the terminal device determines the GPU as the second target computing unit (that is, module model 3 is to run on the GPU) from the computing units other than the NPU (i.e., the CPU and GPU), the terminal device compiles module model 3 based on the GPU to obtain the second control instruction and the second model parameter of module model 3. In other words, the updated correspondence between each module model and the computing units is shown as 5b in FIG. 5: module model 1 and module model 4 run on the CPU, and module model 2 and module model 3 run on the GPU.
  • the terminal device may also determine the CPU as the second target unit. After that, the terminal device compiles the module model 3 based on the CPU, and the subsequent process can refer to the above introduction, which will not be repeated here.
  • S406 Update the first control instruction and the first model parameter of the target module model stored in the cache according to the second control instruction and the second model parameter.
  • after the terminal device obtains the second control instruction and the second model parameter of the target module model, the terminal device updates the first control instruction and the first model parameter of the target module model stored in the cache: the updated first control instruction is the second control instruction, and the updated first model parameter is the second model parameter. Further, the terminal device also obtains from the cache the first control instructions and the first model parameters of the module models other than the target module model.
  • in one implementation, after the terminal device updates the first control instruction and the first model parameter of the target module model in the cache, the terminal device detects whether the first target computing unit corresponding to the target module model is still in an unavailable state. If the first target computing unit is no longer in an unavailable state, the terminal device compiles the target module model based on the first target computing unit to obtain a third control instruction and a third model parameter corresponding to the target module model, and updates, according to the third control instruction and the third model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.
  • the target models include module model 1 , module model 2 , module model 3 and module model 4 .
  • the terminal device regards the NPU as the first target unit, and determines the module model 3 running on the NPU as the target module model.
  • after the terminal device obtains the updated first control instruction (i.e., the second control instruction) and the updated first model parameter (i.e., the second model parameter) by compiling module model 3 based on the GPU, the terminal device may detect that the NPU (the first target computing unit) is no longer in an unavailable state (an example scenario: due to insufficient battery power, the terminal device turns on power saving mode and turns off the high-power NPU, and turns the NPU back on when power is sufficient). The terminal device then compiles module model 3 based on the NPU to obtain the third control instruction and the third model parameter of module model 3.
  • accordingly, the terminal device updates the first control instruction (that is, the foregoing second control instruction) and the first model parameter (that is, the foregoing second model parameter) of the target module model stored in the cache: the updated first control instruction is the third control instruction, and the updated first model parameter is the third model parameter. Further, the terminal device also obtains from the cache the first control instructions and the first model parameters of the module models other than the target module model.
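Both directions of this availability-driven cache update (falling back when a unit becomes unavailable, and restoring the original unit once it is available again) can be sketched in one helper. The names, the per-module fallback table, and the stand-in compiler are all illustrative assumptions, not taken from the patent:

```python
def refresh_cache_for_unit_state(placement, fallback, available, compile_for, cache):
    """Recompile module models according to computing-unit availability.

    For each module, use its preferred unit when available, otherwise its
    fallback unit, and overwrite the cached compilation result accordingly.
    """
    for module, preferred in placement.items():
        unit = preferred if available.get(preferred, True) else fallback[module]
        cache[module] = compile_for(unit, module)

# Stand-in compiler returning a tagged instruction string.
compile_for = lambda unit, module: f"{unit.lower()}_instr({module})"
placement = {"module3": "NPU"}
fallback = {"module3": "GPU"}
cache = {}

# NPU unavailable (e.g. power-saving mode): module model 3 falls back to the GPU.
refresh_cache_for_unit_state(placement, fallback, {"NPU": False}, compile_for, cache)
after_fallback = cache["module3"]
# after_fallback == "gpu_instr(module3)"

# NPU available again: module model 3 is compiled back for the NPU.
refresh_cache_for_unit_state(placement, fallback, {"NPU": True}, compile_for, cache)
# cache["module3"] == "npu_instr(module3)"
```

The cached entry is simply overwritten each time, which matches the described behavior of replacing the first control instruction and first model parameter in place.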
  • S407 Execute the target task based on the first control instruction and the first model parameter corresponding to each module model.
  • For the specific implementation of step S407, reference may be made to the specific implementation of step S103 in the foregoing embodiment, which will not be repeated here.
  • it can be seen that the terminal device can dynamically adjust the target model according to the state of its own computing units (whether they are in an unavailable state), which improves both the running efficiency and the running reliability of the model.
  • FIG. 6 is a schematic structural diagram of a model running device provided by an embodiment of the present invention.
  • the model running device is configured in a terminal device, and the model running device includes:
  • a receiving unit 601 configured to receive a model invocation instruction, where the model invocation instruction is used to invoke a target model to perform a target task;
  • a processing unit 602 configured to determine at least one module model corresponding to the target model based on the model invocation instruction, and obtain a first control instruction and a first model parameter of each of the module models from a cache;
  • the processing unit 602 is further configured to execute the target task based on the first control instruction and the first model parameter corresponding to each of the module models.
  • In a possible implementation, before acquiring the first control instruction and the first model parameter of each of the module models from the cache, the processing unit 602 is further configured to: split the target model based on the multiple operators included in the target model to obtain one or more module models, the module models including at least one of the operators; compile each module model to obtain the first control instruction and the first model parameter of each module model; and store the first control instruction and the first model parameter of each module model in the cache.
  • In a possible implementation, there are multiple module models, the multiple module models correspond to at least two types of computing units, and the first control instruction and the first model parameter of a module model correspond to the computing unit.
  • the type of the computing unit is: a central processing unit (CPU), a graphics processing unit (GPU), or a neural network processor (NPU).
  • In a possible implementation, the processing unit 602 is specifically configured to: detect whether the computing unit corresponding to each module model is in an unavailable state; determine, based on the detection result, a target module model from the module models, the first target computing unit corresponding to the target module model being in an unavailable state; compile the target module model based on a second target computing unit to obtain a second control instruction and a second model parameter corresponding to the target module model, the second target computing unit having a correspondence with the target module model, and the second target unit being different from the first target unit; and update, according to the second control instruction and the second model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.
  • In a possible implementation, the processing unit 602 is further configured to: detect whether the first target computing unit corresponding to the target module model is in an unavailable state; and if the first target computing unit is not in an unavailable state, compile the target module model based on the first target computing unit to obtain a third control instruction and a third model parameter corresponding to the target module model, and update, according to the third control instruction and the third model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.
  • It should be noted that each unit module of the model running device described in this embodiment of the present invention may be specifically implemented according to the methods in the method embodiments described in FIG. 1 or FIG. 4; for the specific implementation process, reference may be made to the relevant description of those method embodiments, which is not repeated here.
  • Embodiments of the present application further provide a chip, where the chip can perform the relevant steps of the terminal device in the foregoing method embodiments.
  • The chip is configured to: receive a model invocation instruction, where the model invocation instruction is used to invoke a target model to perform a target task; determine, based on the model invocation instruction, at least one module model corresponding to the target model, and obtain the first control instruction and the first model parameter of each module model from the cache; and execute the target task based on the first control instruction and the first model parameter corresponding to each module model.
  • In a possible implementation, the chip is further configured to: split the target model based on the multiple operators included in the target model to obtain one or more module models, the module models including at least one operator; compile each module model to obtain the first control instruction and the first model parameter of each module model; and store the first control instruction and the first model parameter of each module model in the cache.
  • In a possible implementation, there are multiple module models, and the multiple module models correspond to at least two types of computing units, where the first control instruction and the first model parameter of a module model correspond to the computing unit.
  • the type of the computing unit is: a central processing unit (CPU), a graphics processing unit (GPU), or a neural network processor (NPU).
  • In a possible implementation, the chip is specifically configured to: detect whether the computing unit corresponding to each module model is in an unavailable state; determine, based on the detection result, a target module model from the module models, the first target computing unit corresponding to the target module model being in an unavailable state; compile the target module model based on a second target computing unit to obtain a second control instruction and a second model parameter corresponding to the target module model, the second target computing unit having a correspondence with the target module model, and the second target unit being different from the first target unit; and update, according to the second control instruction and the second model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.
  • In a possible implementation, the chip is further configured to detect whether the first target computing unit corresponding to the target module model is in an unavailable state; and if the first target computing unit is not in an unavailable state, compile the target module model based on the first target computing unit to obtain a third control instruction and a third model parameter corresponding to the target module model, and update, according to the third control instruction and the third model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.
  • An embodiment of the present application further provides a chip module, which can be applied in a terminal device; the chip module includes the aforementioned chip applicable to a terminal device.
  • FIG. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
  • the terminal device 70 described in the embodiments of the present application includes: a processor 701 and a memory 702.
  • the processor 701 and the memory 702 are connected through one or more communication buses.
  • The above-mentioned processor 701 may be a central processing unit (Central Processing Unit, CPU); the processor may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the processor 701 is configured to support the user equipment to perform the corresponding functions of the terminal equipment in the method described in FIG. 1 or FIG. 4 .
  • the above-mentioned memory 702 may include read-only memory and random access memory, and provides computer programs and data to the processor 701 .
  • a portion of memory 702 may also include non-volatile random access memory.
  • When invoking the computer program, the processor 701 is configured to: receive a model invocation instruction, where the model invocation instruction is used to invoke a target model to perform a target task; determine, based on the model invocation instruction, at least one module model corresponding to the target model, and obtain the first control instruction and the first model parameter of each module model from the cache; and execute the target task based on the first control instruction and the first model parameter corresponding to each of the module models.
  • In a possible implementation, before acquiring the first control instruction and the first model parameter of each of the module models from the cache, the processor 701 is specifically configured to: split the target model based on the multiple operators included in the target model to obtain one or more module models, the module models including at least one of the operators; compile each module model to obtain the first control instruction and the first model parameter of each module model; and store the first control instruction and the first model parameter of each module model in the cache.
  • In a possible implementation, there are multiple module models, the multiple module models correspond to at least two types of computing units, and the first control instruction and the first model parameter of a module model correspond to the computing unit.
  • the computing unit includes: a central processing unit (CPU), a graphics processing unit (GPU), and a neural network processor (NPU).
  • In a possible implementation, the processor 701 is specifically configured to: detect whether the computing unit corresponding to each module model is in an unavailable state; determine, based on the detection result, a target module model from the module models, the first target computing unit corresponding to the target module model being in an unavailable state; compile the target module model based on a second target computing unit to obtain a second control instruction and a second model parameter corresponding to the target module model, the second target computing unit having a correspondence with the target module model, and the second target unit being different from the first target unit; and update, according to the second control instruction and the second model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.
  • the processor 701 is further configured to: detect whether the first target computing unit corresponding to the target module model is in an unavailable state; if the first target computing unit is not in an unavailable state, based on the The first target computing unit compiles the target module model, obtains a third control instruction and a third model parameter corresponding to the target module model, and updates the cache according to the third control instruction and the third model parameter The first control instruction and the first model parameter of the target module model stored in .
  • In specific implementation, the processor 701 and the memory 702 described in this embodiment of the present invention may execute the implementations described in the method embodiments of FIG. 1 or FIG. 4 provided by the embodiments of the present invention, and may also execute the implementation of the model running device described in FIG. 6, which is not repeated here.
  • Each module/unit included in each device and product described in the above embodiments may be a software module/unit, a hardware module/unit, or partly a software module/unit and partly a hardware module/unit.
  • For each device or product applied to or integrated in a chip, each module/unit it includes may be implemented by hardware such as circuits, or at least some modules/units may be implemented by a software program running on a processor integrated inside the chip, with the remaining (if any) modules/units implemented by hardware such as circuits. For each device or product applied to or integrated in a chip module, each module/unit it includes may be implemented by hardware such as circuits, and different modules/units may be located in the same component (such as a chip or a circuit module) of the chip module or in different components; alternatively, at least some modules/units may be implemented by a software program running on a processor integrated inside the chip module, with the remaining (if any) modules/units implemented by hardware such as circuits.
  • For each device or product applied to or integrated in a terminal, each module/unit it includes may be implemented by hardware such as circuits, and different modules/units may be located in the same component (for example, a chip or a circuit module) of the terminal or in different components; alternatively, at least some modules/units may be implemented by a software program running on a processor integrated inside the terminal, with the remaining (if any) modules/units implemented by hardware such as circuits.
  • An embodiment of the present application further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, it can be used to implement the model running method described in the embodiments corresponding to FIG. 1 or FIG. 4 of the present application, which is not repeated here.
  • the computer-readable storage medium may be an internal storage unit of the terminal device described in any of the foregoing embodiments, such as a hard disk or a memory of the device.
  • The computer-readable storage medium may also be an external storage device of the terminal device, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card or a flash card (Flash Card) equipped on the device.
  • the computer-readable storage medium may also include both an internal storage unit of the terminal device and an external storage device.
  • the computer-readable storage medium is used to store the computer program and other programs and data required by the terminal device.
  • the computer-readable storage medium can also be used to temporarily store data that has been or will be output.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM) or a random access memory (Random Access Memory, RAM), and the like.

Abstract

Embodiments of the present application disclose a model running method and related devices. The model running method includes: receiving a model invocation instruction, where the model invocation instruction is used to invoke a target model to perform a target task; determining, based on the model invocation instruction, at least one module model corresponding to the target model, and obtaining a first control instruction and a first model parameter of each module model from memory; and executing the target task based on the first control instruction and the first model parameter corresponding to each module model. In this way, when invoking a model to perform a target task, the terminal device can obtain the compilation result of the model directly from memory, thereby shortening the time needed to invoke the model to perform the target task and improving the efficiency with which the model performs the target task.

Description

Model running method and related device

Technical Field

The present application relates to the field of information processing, and in particular to a model running method and related devices.

Background

With the rapid development of terminal devices (such as mobile phones, tablet computers and vehicle-mounted terminals), terminal devices can run application models to provide diversified functions, bringing great convenience to people's daily lives.

The process by which a terminal device invokes an application model to perform a task can be summarized as two phases: a model compilation phase (also called a model preparation phase) and a model computation phase. For example, when a neural-network-based image recognition model performs an image recognition task on a terminal device, the terminal device first needs to load and compile the image recognition model (the model compilation phase), and then recognize the target image based on the compilation result of the image recognition model (the model computation phase). The model compilation phase can simply be understood as a necessary precondition of the model computation phase. If an application model is overly complex, the time cost of the model compilation phase (including model loading, model optimization and model compilation) increases, lowering the efficiency with which the terminal device invokes the application model to perform a task.

It can be seen that how to improve the running efficiency of a model when invoking an application model to perform a task is a problem to be solved urgently.
Summary

Embodiments of the present application provide a model running method and related devices. With the method provided by the present application, a terminal device can obtain the compilation result of a model to be invoked from memory, thereby shortening the time needed to invoke the model to perform a target task and improving the efficiency with which the model performs the target task.

In a first aspect, an embodiment of the present application provides a model running method, the method including:

receiving a model invocation instruction, where the model invocation instruction is used to invoke a target model to perform a target task; determining, based on the model invocation instruction, at least one module model corresponding to the target model, and obtaining a first control instruction and a first model parameter of each module model from a cache; and executing the target task based on the first control instruction and the first model parameter corresponding to each module model.

It can be seen that, with this model running method, the terminal device can obtain pre-compiled model control instructions and model parameters from memory, saving model compilation time during model invocation and thus improving the efficiency with which the model performs the task (also called the efficiency of invoking the model).

In a possible implementation, before the first control instruction and the first model parameter of each module model are obtained from the cache, the target model may further be split based on the multiple operators it includes to obtain one or more module models, each module model including at least one operator; each module model is compiled to obtain its first control instruction and first model parameter; and the first control instruction and the first model parameter of each module model are stored in the cache.

In a possible implementation, there are multiple module models, the multiple module models correspond to at least two types of computing units, and the first control instruction and the first model parameter of a module model correspond to the computing unit.

In a possible implementation, the types of the computing unit are: a central processing unit CPU, a graphics processing unit GPU, or a neural network processing unit NPU.

In a possible implementation, whether the computing unit corresponding to each module model is in an unavailable state is detected; based on the detection result, a target module model is determined from the module models, the first target computing unit corresponding to the target module model being in an unavailable state; the target module model is compiled based on a second target computing unit to obtain a second control instruction and a second model parameter corresponding to the target module model, the second target computing unit having a correspondence with the target module model, and the second target unit being different from the first target unit; and the first control instruction and the first model parameter of the target module model stored in the cache are updated according to the second control instruction and the second model parameter.

In a possible implementation, after the first control instruction and the first model parameter of the target module model stored in the cache are updated according to the second control instruction and the second model parameter, whether the first target computing unit corresponding to the target module model is in an unavailable state may further be detected; if the first target computing unit is not in an unavailable state, the target module model is compiled based on the first target computing unit to obtain a third control instruction and a third model parameter corresponding to the target module model, and the first control instruction and the first model parameter of the target module model stored in the cache are updated according to the third control instruction and the third model parameter.
In a second aspect, an embodiment of the present application provides a model running device, the model running device including:

a receiving unit, configured to receive a model invocation instruction, where the model invocation instruction is used to invoke a target model to perform a target task;

a processing unit, configured to determine, based on the model invocation instruction, at least one module model corresponding to the target model, and obtain a first control instruction and a first model parameter of each module model from a cache;

the processing unit being further configured to execute the target task based on the first control instruction and the first model parameter corresponding to each module model.

In a possible implementation, before obtaining the first control instruction and the first model parameter of each module model from the cache, the processing unit is further configured to: split the target model based on the multiple operators it includes to obtain one or more module models, each module model including at least one of the operators; compile each module model to obtain its first control instruction and first model parameter; and store the first control instruction and the first model parameter of each module model in the cache.

In a possible implementation, there are multiple module models, the multiple module models correspond to at least two types of computing units, and the first control instruction and the first model parameter of a module model correspond to the computing unit.

In a possible implementation, the types of the computing unit are: a central processing unit CPU, a graphics processing unit GPU, or a neural network processing unit NPU.

In a possible implementation, the processing unit is specifically configured to: detect whether the computing unit corresponding to each module model is in an unavailable state; determine, based on the detection result, a target module model from the module models, the first target computing unit corresponding to the target module model being in an unavailable state; compile the target module model based on a second target computing unit to obtain a second control instruction and a second model parameter corresponding to the target module model, the second target computing unit having a correspondence with the target module model, and the second target unit being different from the first target unit; and update, according to the second control instruction and the second model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.

In a possible implementation, after the first control instruction and the first model parameter of the target module model stored in the cache are updated according to the second control instruction and the second model parameter, the processing unit is further configured to: detect whether the first target computing unit corresponding to the target module model is in an unavailable state; and if the first target computing unit is not in an unavailable state, compile the target module model based on the first target computing unit to obtain a third control instruction and a third model parameter corresponding to the target module model, and update, according to the third control instruction and the third model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.
In a third aspect, the present application provides a chip, configured to: receive a model invocation instruction, where the model invocation instruction is used to invoke a target model to perform a target task; determine, based on the model invocation instruction, at least one module model corresponding to the target model, and obtain a first control instruction and a first model parameter of each module model from a cache; and execute the target task based on the first control instruction and the first model parameter corresponding to each module model.

In a fourth aspect, the present application provides a chip module, the chip module including the chip of the foregoing third aspect.

In a fifth aspect, the present application provides a terminal device, the terminal device including:

a memory, configured to store a computer program; and

a processor, invoking the computer program to perform the following operations: receiving a model invocation instruction, where the model invocation instruction is used to invoke a target model to perform a target task; determining, based on the model invocation instruction, at least one module model corresponding to the target model, and obtaining a first control instruction and a first model parameter of each module model from a cache; and executing the target task based on the first control instruction and the first model parameter corresponding to each module model.

In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium for storing the computer software instructions used by the above terminal device, including a program for performing the method of any one of the foregoing first aspect.
Brief Description of the Drawings

FIG. 1 is a schematic flowchart of a model running method provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of data stored in a cache provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a target model provided by an embodiment of the present application;

FIG. 4 is a schematic flowchart of another model running method provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of the correspondence between the module models of a target model and computing units provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a model running device provided by an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
Detailed Description

Embodiments of the present application provide a model running method and related devices. To make the objectives, technical solutions and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings.

The terms "first" and "second" in the specification, claims and drawings of the present application are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that includes a series of operations or units is not limited to the listed operations or units, but optionally further includes unlisted operations or units, or optionally further includes other operations or units inherent to the process, method, product or device.

Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearance of this phrase in various places in the specification does not necessarily all refer to the same embodiment, nor to independent or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

In the present application, "at least one (item)" means one or more, "multiple" means two or more, and "at least two (items)" means two, three or more. "And/or" describes the correspondence of associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: only A exists, only B exists, and both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following (items)" or similar expressions refers to any combination of these items, including any combination of single or plural items. For example, at least one of a, b or c may indicate: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may each be single or multiple.

The model running method in the present application can be applied to a terminal device. It should be known that the terminal device mentioned in the present application may also be called a terminal, user equipment, access terminal, subscriber unit, subscriber station, mobile station, mobile, remote station, remote terminal, mobile device, user terminal, user agent or user apparatus. The terminal device in the embodiments of the present application may be a mobile phone, a tablet computer (Pad), a computer with wireless transceiver functions, a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal in industrial control, a wireless terminal in self driving, a wireless terminal in remote medical, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, and so on.
To facilitate understanding of the embodiments disclosed in the present application, some concepts involved in the embodiments are first explained. The explanation of these concepts includes, but is not limited to, the following.

Cache memory: a small but very fast memory located between the central processing unit (CPU) and dynamic random access memory (DRAM), usually composed of static random access memory (SRAM).

Central processing unit (CPU): one of the core components of a terminal device. Its function is mainly to interpret computer instructions and process data in computer software; it is the computation core and control core of the terminal device.

Graphics processing unit (GPU): also called the display core, visual processor or display chip, it adopts a multi-threaded single instruction multiple data (SIMD) architecture and is the microprocessor that performs image- and graphics-related computation on a terminal device.

Neural network processing unit (NPU): its operating principle is to simulate human neurons and synapses at the circuit level and to process large-scale neurons and synapses directly with a deep learning instruction set, one instruction completing the processing of a group of neurons. Compared with a CPU or GPU, an NPU runs much more efficiently.
To better understand the solutions provided by the present application, the embodiments of the present application are described below with reference to the accompanying drawings.

Referring to FIG. 1, FIG. 1 is a schematic flowchart of a model running method provided by an embodiment of the present application. The method shown in FIG. 1 may be performed by a terminal device or by a chip in a terminal device; FIG. 1 is described taking a terminal device as the performer of the method. As shown in FIG. 1, the model running method includes steps S101 to S103.

S101: Receive a model invocation instruction, where the model invocation instruction is used to invoke a target model to perform a target task.

The terminal device receives a model invocation instruction input by a user. For example, application 1 implements a portrait recognition function based on a convolutional neural network (CNN) model. Application 1 is installed on the user's mobile phone; when the user opens application 1 and starts the portrait recognition function, the terminal device can be regarded as having received a model invocation instruction, determines based on that instruction that the CNN model corresponding to application 1 is the target model, and invokes the CNN model to perform portrait recognition (the target task).

S102: Determine, based on the model invocation instruction, at least one module model corresponding to the target model, and obtain the first control instruction and the first model parameter of each module model from the cache.

In other words, the cache of the terminal device stores the first control instructions and first model parameters of at least one module model, and each module model carries an identifier of the target model. After receiving the model invocation instruction, the terminal device can determine, from the cache and according to the identifier of the target model in the model invocation instruction, the first control instruction and the first model parameter of each module model corresponding to the target model. One module model corresponds to one first control instruction and one first model parameter, and there is an association or invocation relationship between the first control instruction and the first model parameter.

Illustratively, FIG. 2 shows the first control instructions and first model parameters of multiple module models stored in the cache of the terminal device; module models carrying the same model identifier form one model. Models correspond to applications installed on the terminal device. Illustratively, in some implementations one model corresponds to one application; in other embodiments one application may correspond to multiple models. The present application does not limit the correspondence between models and applications.

When the terminal device receives a model invocation instruction and determines model 1 to be the target model according to the instruction, the terminal device obtains from the cache the first control instruction and the first model parameter of module model 1 of model 1, and the first control instruction and the first model parameter of module model 2 of model 1.
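The cache layout of FIG. 2, a pool of per-module compilation results looked up by the parent model's identifier, can be sketched as follows. This is an illustrative sketch only; the class and field names (`CompiledModule`, `ModelCache`, `control_instr`) are hypothetical, not from the patent.

```python
from dataclasses import dataclass

@dataclass
class CompiledModule:
    """First control instruction and first model parameter of one module model."""
    model_id: str         # identifier of the parent model (e.g. "model1")
    module_id: str        # identifier of this module model
    control_instr: bytes  # compiled control instruction for the target unit
    params: bytes         # model parameters associated with the instruction

class ModelCache:
    """Cache holding compiled results, looked up by the parent model's identifier."""
    def __init__(self):
        self._entries = []

    def store(self, entry: CompiledModule):
        self._entries.append(entry)

    def lookup(self, model_id: str):
        """Return the compiled result of every module model tagged with model_id."""
        return [e for e in self._entries if e.model_id == model_id]

cache = ModelCache()
cache.store(CompiledModule("model1", "module1", b"\x01", b"\x10"))
cache.store(CompiledModule("model1", "module2", b"\x02", b"\x20"))
cache.store(CompiledModule("model2", "module1", b"\x03", b"\x30"))
hits = cache.lookup("model1")  # both module models of model 1, none of model 2
```

The lookup mirrors the behavior described above: the invocation instruction carries the target model's identifier, and every cached module entry tagged with that identifier is returned together.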
Next, the present application describes in detail how the first control instruction and the first model parameter of each module model are obtained.

In a possible implementation, the terminal device splits the target model based on the multiple operators it includes to obtain one or more module models, each module model including at least one operator. Further, the terminal device compiles each module model to obtain its first control instruction and first model parameter, and stores the first control instruction and the first model parameter of each module model in the cache.

It should be noted that an operator is a computation unit of the target model; it can be understood as one function in the model, or as a collection of several functions in the model, which is not specifically limited here. For example, for a CNN model containing two convolutional layers, two pooling layers and one fully connected layer, each convolutional layer, pooling layer and fully connected layer can be regarded as one operator.

Illustratively, the target model is shown in FIG. 3 and includes operator 1, operator 2, operator 3, operator 4, operator 5 and operator 6. In this case, the terminal device can split the target model, according to the connection structure among the operators, into module model 1, module model 2, module model 3 and module model 4 as shown in FIG. 3, where module model 1 includes operator 1, module model 2 includes the serially connected operator 2, operator 3 and operator 4, module model 3 includes operator 5, and module model 4 includes operator 6. Further, the terminal device compiles module model 1, module model 2, module model 3 and module model 4 to obtain the first control instruction and the first model parameter of each, and stores the first control instruction and the first model parameter of each module model in the cache.
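The split in FIG. 3, where the serially connected operators 2, 3 and 4 form one module model, can be sketched as grouping maximal consecutive runs of operators that share the same assigned computing unit. This is only one plausible splitting criterion; the patent requires merely that each module model contains at least one operator, and all names below are illustrative.

```python
from itertools import groupby

def split_model(operators):
    """Split a model, given as a list of (operator_name, computing_unit)
    pairs in execution order, into module models: maximal consecutive
    runs of operators assigned to the same computing unit."""
    modules = []
    for unit, ops in groupby(operators, key=lambda pair: pair[1]):
        modules.append({"unit": unit, "operators": [name for name, _ in ops]})
    return modules

# Assignment mirroring FIG. 3 / FIG. 5a: operator 1 on the CPU,
# operators 2-4 on the GPU, operator 5 on the NPU, operator 6 on the CPU.
target_model = [("op1", "CPU"), ("op2", "GPU"), ("op3", "GPU"),
                ("op4", "GPU"), ("op5", "NPU"), ("op6", "CPU")]
modules = split_model(target_model)
```

With this assignment the sketch reproduces the four module models of FIG. 3: {op1}, {op2, op3, op4}, {op5} and {op6}.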
In a possible implementation, the target model has multiple module models, and the multiple module models correspond to at least two types of computing units (also called chips or processors). The first control instruction and the first model parameter of a module model correspond to its computing unit.

In other words, if there are at least two types of computing units in the terminal device, then after splitting the target model into multiple module models, the terminal device compiles each module model based on its corresponding computing unit to obtain the first control instruction and the first model parameter of that module model. For example, if a module model corresponds to the CPU, the terminal device compiles the module model into a first control instruction (also called a hardware instruction) in the CPU's format and the first model parameter corresponding to that instruction. The computing units include the CPU, GPU and NPU.

Illustratively, the terminal device includes a CPU, a GPU and an NPU. Taking the target model shown in FIG. 3 as an example, the terminal device splits the target model into module model 1, module model 2, module model 3 and module model 4. If module model 1 and module model 4 run on the CPU, module model 2 runs on the GPU and module model 3 runs on the NPU, the terminal device compiles module model 1 and module model 4 based on the CPU to obtain their respective first control instructions and first model parameters; compiles module model 2 based on the GPU to obtain its first control instruction and first model parameter; and compiles module model 3 based on the NPU to obtain its first control instruction and first model parameter.

In another possible implementation, the target model has a single module model. In this application scenario, when the terminal device has only one chip (also called a processor or computing unit), or when the terminal device has multiple chips but the target model runs on only one of them, the target model corresponds to one module model (that is, there is no model heterogeneity). In this case, the module model is the complete target model, and the terminal device compiles the target model (including one or more of model loading, model optimization and model compilation) to obtain the first control instruction and the first model parameter of the target model.

S103: Execute the target task based on the first control instruction and the first model parameter corresponding to each module model.

The terminal device executes the target task based on the connection relationships among the module models and their first control instructions and first model parameters.

It can be seen that the terminal device stores the first control instructions and first model parameters of the module models corresponding to the target model in the cache, so that after subsequently receiving a model invocation instruction for the target model, it can obtain the first control instructions and first model parameters directly from the cache, shortening the time needed to invoke the target model and improving the efficiency with which the target model performs the target task.
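The compile-once, reuse-afterwards flow of S101 to S103 can be sketched as follows. The function names and the string-based stand-in for compilation are hypothetical; a real implementation would produce hardware instructions for the CPU/GPU/NPU.

```python
compiled_cache = {}  # model_id -> list of (control_instruction, model_parameters)

def compile_module(module):
    """Stand-in for the per-unit compiler (illustrative only)."""
    return ("instr:" + module, "params:" + module)

def invoke_model(model_id, modules, log):
    """Handle a model invocation instruction (S101-S103): take compiled
    results from the cache when available, otherwise compile once and cache."""
    if model_id not in compiled_cache:
        log.append("compile")  # slow path: the model preparation phase runs
        compiled_cache[model_id] = [compile_module(m) for m in modules]
    else:
        log.append("cache-hit")  # fast path: the preparation phase is skipped
    # Model computation phase: execute with the cached instructions/parameters.
    return compiled_cache[model_id]

log = []
invoke_model("model1", ["module1", "module2"], log)           # first call compiles
result = invoke_model("model1", ["module1", "module2"], log)  # second call hits cache
```

The second invocation returns the same compiled results without re-entering the compilation phase, which is exactly the time saving the method aims at.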
Referring to FIG. 4, FIG. 4 is a schematic flowchart of another model running method provided by an embodiment of the present application. The method shown in FIG. 4 may be performed by a terminal device or by a chip in a terminal device; FIG. 4 is described taking a terminal device as the performer of the method. As shown in FIG. 4, the model running method includes steps S401 to S407.

S401: Receive a model invocation instruction, where the model invocation instruction is used to invoke a target model to perform a target task.

S402: Determine, based on the model invocation instruction, at least one module model corresponding to the target model.

For the specific implementation of steps S401 and S402, reference may be made to the description of steps S101 and S102 in the foregoing Embodiment 1, which is not repeated here.

S403: Detect whether the computing unit corresponding to each module model is in an unavailable state.

A computing unit being in an unavailable state can be understood as one or more of the following: the computing unit is powered off (for example, to save power the terminal device shuts down the NPU; in this case the terminal device detects that the NPU is unavailable), the computing unit cannot run normally due to insufficient cache, or the computing unit is unavailable due to hardware damage.

If there are at least two types of computing units in the terminal device, then after splitting the target model into multiple module models, the terminal device detects whether the computing units running the module models are in an unavailable state. For example, the terminal device splits the target model into module model 1, module model 2, module model 3 and module model 4, where module model 1 and module model 4 run on the CPU, module model 2 runs on the GPU and module model 3 runs on the NPU. In this case, the terminal device detects whether the CPU, GPU and NPU are in an unavailable state.
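The detection in S403 can be sketched as classifying each computing unit against the three causes of unavailability listed above (powered off, insufficient cache/memory, hardware fault). The predicate names are illustrative, not from the patent.

```python
def detect_unavailable(units):
    """Classify each computing unit as unavailable if it is powered off,
    short of cache/memory, or reports a hardware fault."""
    return {name: (u["powered_off"] or u["out_of_memory"] or u["faulty"])
            for name, u in units.items()}

# The NPU has been shut down by power-saving mode; the CPU and GPU are healthy.
state = detect_unavailable({
    "CPU": {"powered_off": False, "out_of_memory": False, "faulty": False},
    "GPU": {"powered_off": False, "out_of_memory": False, "faulty": False},
    "NPU": {"powered_off": True,  "out_of_memory": False, "faulty": False},
})
```

Any module model whose unit maps to `True` in this classification becomes a candidate target module model for the fallback compilation of S404 and S405.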
S404: Determine, based on the detection result, a target module model from the module models, the first target computing unit corresponding to the target module model being in an unavailable state.

The terminal device determines the first target computing unit, which is in an unavailable state, from the multiple computing units it contains, and determines the module model of the target model running on that first target computing unit as the target module model.

S405: Compile the target module model based on a second target computing unit to obtain a second control instruction and a second model parameter corresponding to the target module model, the second target computing unit having a correspondence with the target module model, and the second target unit being different from the first target unit.

The second target computing unit having a correspondence with the target module model can be understood as the target module model running on the second target computing unit. In other words, after determining the first target unit and the target module model, the terminal device determines a second target unit from the computing units other than the first target unit, and compiles the target module model based on the second target unit to obtain the second control instruction and the second model parameter.

Illustratively, the target model shown in FIG. 3 includes module model 1, module model 2, module model 3 and module model 4, and the correspondence between the module models and the computing units is as shown in 5a of FIG. 5: module model 1 and module model 4 run on the CPU, module model 2 runs on the GPU and module model 3 runs on the NPU. In this case, if the terminal device detects that the NPU is in an unavailable state, the terminal device regards the NPU as the first target unit and determines module model 3, which runs on the NPU, as the target module model. Further, the terminal device determines the GPU as the second target unit from the computing units other than the NPU (that is, the CPU and the GPU), which is to say that module model 3 will run on the GPU, and compiles module model 3 based on the GPU to obtain the second control instruction and the second model parameter of module model 3. In other words, the updated correspondence between the module models and the computing units is as shown in 5b of FIG. 5: module model 1 and module model 4 run on the CPU, and module model 2 and module model 3 run on the GPU. Optionally, in a possible implementation, the terminal device may instead determine the CPU as the second target unit; the terminal device then compiles module model 3 based on the CPU, and the subsequent flow follows the description above, which is not repeated here.
S406: Update, according to the second control instruction and the second model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.

After obtaining the second control instruction and the second model parameter of the target module model, the terminal device updates the first control instruction and the first model parameter of the target module model stored in the cache: the updated first control instruction is the second control instruction, and the updated first model parameter is the second model parameter. Further, the terminal device also obtains from the cache the first control instructions and first model parameters of the module models other than the target module model.

In a possible implementation, after updating the first control instruction and the first model parameter of the target module model in the cache, the terminal device detects whether the first target computing unit corresponding to the target module model is in an unavailable state. If the first target computing unit is not in an unavailable state, the terminal device compiles the target module model based on the first target computing unit to obtain a third control instruction and a third model parameter corresponding to the target module model, and updates the first control instruction and the first model parameter of the target module model stored in the cache according to the third control instruction and the third model parameter.

Illustratively, the target model includes module model 1, module model 2, module model 3 and module model 4. The terminal device detects that the NPU is in an unavailable state, regards the NPU as the first target unit, and determines module model 3, which runs on the NPU, as the target module model. After the terminal device compiles module model 3 based on the GPU and obtains the updated first control instruction (i.e., the second control instruction) and the updated first model parameter (i.e., the second model parameter), the terminal device detects that the NPU (the first target computing unit) is no longer in an unavailable state (an example application scenario: because the battery of the terminal device was low, the terminal device entered power-saving mode and shut down the power-hungry NPU, then re-enabled the NPU once sufficient power was available). The terminal device then compiles module model 3 based on the NPU to obtain the third control instruction and the third model parameter of module model 3. The terminal device updates the first control instruction (i.e., the aforementioned second control instruction) and the first model parameter (i.e., the aforementioned second model parameter) of the target module model stored in the cache: the updated first control instruction is the third control instruction, and the updated first model parameter is the third model parameter. Further, the terminal device also obtains from the cache the first control instructions and first model parameters of the module models other than the target module model.

S407: Execute the target task based on the first control instruction and the first model parameter corresponding to each module model.

For the specific implementation of step S407, reference may be made to the specific implementation of step S103 in the foregoing embodiment, which is not repeated here.

It can be seen that, with this way of running a model, the terminal device can dynamically adjust the target model according to the state of its own computing units (whether any is in an unavailable state), which improves both the efficiency and the reliability of model running.
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of a model running device provided by an embodiment of the present invention. The model running device is configured in a terminal device and includes:

a receiving unit 601, configured to receive a model invocation instruction, where the model invocation instruction is used to invoke a target model to perform a target task;

a processing unit 602, configured to determine, based on the model invocation instruction, at least one module model corresponding to the target model, and obtain a first control instruction and a first model parameter of each module model from a cache;

the processing unit 602 being further configured to execute the target task based on the first control instruction and the first model parameter corresponding to each module model.

In a possible implementation, before obtaining the first control instruction and the first model parameter of each module model from the cache, the processing unit 602 is further configured to: split the target model based on the multiple operators it includes to obtain one or more module models, each module model including at least one of the operators; compile each module model to obtain the first control instruction and the first model parameter of each module model; and store the first control instruction and the first model parameter of each module model in the cache.

In a possible implementation, there are multiple module models, the multiple module models correspond to at least two types of computing units, and the first control instruction and the first model parameter of a module model correspond to the computing unit.

In a possible implementation, the types of the computing unit are: a central processing unit CPU, a graphics processing unit GPU, or a neural network processing unit NPU.

In a possible implementation, the processing unit 602 is specifically configured to: detect whether the computing unit corresponding to each module model is in an unavailable state; determine, based on the detection result, a target module model from the module models, the first target computing unit corresponding to the target module model being in an unavailable state; compile the target module model based on a second target computing unit to obtain a second control instruction and a second model parameter corresponding to the target module model, the second target computing unit having a correspondence with the target module model, and the second target unit being different from the first target unit; and update, according to the second control instruction and the second model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.

In a possible implementation, after the first control instruction and the first model parameter of the target module model stored in the cache are updated according to the second control instruction and the second model parameter, the processing unit 602 is further configured to: detect whether the first target computing unit corresponding to the target module model is in an unavailable state; and if the first target computing unit is not in an unavailable state, compile the target module model based on the first target computing unit to obtain a third control instruction and a third model parameter corresponding to the target module model, and update, according to the third control instruction and the third model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.

It should be noted that the functions of the unit modules of the model running device described in this embodiment of the present invention may be specifically implemented according to the methods in the method embodiments described in FIG. 1 or FIG. 4; for the specific implementation process, reference may be made to the relevant description of those method embodiments, which is not repeated here.
An embodiment of the present application further provides a chip, which can perform the relevant steps of the terminal device in the foregoing method embodiments. The chip is configured to: receive a model invocation instruction, where the model invocation instruction is used to invoke a target model to perform a target task; determine, based on the model invocation instruction, at least one module model corresponding to the target model, and obtain the first control instruction and the first model parameter of each module model from the cache; and execute the target task based on the first control instruction and the first model parameter corresponding to each module model.

In a possible implementation, the chip is further configured to: split the target model based on the multiple operators it includes to obtain one or more module models, each module model including at least one operator; compile each module model to obtain the first control instruction and the first model parameter of each module model; and store the first control instruction and the first model parameter of each module model in the cache.

In a possible implementation, there are multiple module models, and the multiple module models correspond to at least two types of computing units, where the first control instruction and the first model parameter of a module model correspond to the computing unit.

In a possible implementation, the types of the computing unit are: a central processing unit CPU, a graphics processing unit GPU, or a neural network processing unit NPU.

In a possible implementation, the chip is specifically configured to: detect whether the computing unit corresponding to each module model is in an unavailable state; determine, based on the detection result, a target module model from the module models, the first target computing unit corresponding to the target module model being in an unavailable state; compile the target module model based on a second target computing unit to obtain a second control instruction and a second model parameter corresponding to the target module model, the second target computing unit having a correspondence with the target module model, and the second target unit being different from the first target unit; and update, according to the second control instruction and the second model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.

In a possible implementation, the chip is further configured to detect whether the first target computing unit corresponding to the target module model is in an unavailable state; and if the first target computing unit is not in an unavailable state, compile the target module model based on the first target computing unit to obtain a third control instruction and a third model parameter corresponding to the target module model, and update, according to the third control instruction and the third model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.

An embodiment of the present application further provides a chip module, which can be applied in a terminal device; the chip module includes the above chip applicable to a terminal device.
Referring to FIG. 7, FIG. 7 is a schematic structural diagram of a terminal device provided by an embodiment of the present application. The terminal device 70 described in this embodiment includes a processor 701 and a memory 702, connected through one or more communication buses.

The processor 701 may be a central processing unit (Central Processing Unit, CPU); the processor may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor 701 is configured to support the user equipment in performing the functions of the terminal device in the methods described in FIG. 1 or FIG. 4.

The memory 702 may include read-only memory and random access memory, and provides computer programs and data to the processor 701. A portion of the memory 702 may also include non-volatile random access memory. When invoking the computer program, the processor 701 is configured to perform:

receiving a model invocation instruction, where the model invocation instruction is used to invoke a target model to perform a target task;

determining, based on the model invocation instruction, at least one module model corresponding to the target model, and obtaining a first control instruction and a first model parameter of each module model from a cache; and

executing the target task based on the first control instruction and the first model parameter corresponding to each module model.

In a possible implementation, before obtaining the first control instruction and the first model parameter of each module model from the cache, the processor 701 is specifically configured to: split the target model based on the multiple operators it includes to obtain one or more module models, each module model including at least one of the operators; compile each module model to obtain the first control instruction and the first model parameter of each module model; and store the first control instruction and the first model parameter of each module model in the cache.

In a possible implementation, there are multiple module models, the multiple module models correspond to at least two types of computing units, and the first control instruction and the first model parameter of a module model correspond to the computing unit.

In a possible implementation, the computing units include: a central processing unit CPU, a graphics processing unit GPU and a neural network processing unit NPU.

In a possible implementation, the processor 701 is specifically configured to: detect whether the computing unit corresponding to each module model is in an unavailable state; determine, based on the detection result, a target module model from the module models, the first target computing unit corresponding to the target module model being in an unavailable state; compile the target module model based on a second target computing unit to obtain a second control instruction and a second model parameter corresponding to the target module model, the second target computing unit having a correspondence with the target module model, and the second target unit being different from the first target unit; and update, according to the second control instruction and the second model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.

In a possible implementation, after the first control instruction and the first model parameter of the target module model stored in the cache are updated according to the second control instruction and the second model parameter, the processor 701 is further configured to: detect whether the first target computing unit corresponding to the target module model is in an unavailable state; and if the first target computing unit is not in an unavailable state, compile the target module model based on the first target computing unit to obtain a third control instruction and a third model parameter corresponding to the target module model, and update, according to the third control instruction and the third model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.

In specific implementation, the processor 701 and the memory 702 described in this embodiment of the present invention may execute the implementations described in the method embodiments of FIG. 1 or FIG. 4 provided by the embodiments of the present invention, and may also execute the implementation of the model running device described in FIG. 6, which is not repeated here.
Each module/unit included in each device and product described in the above embodiments may be a software module/unit, a hardware module/unit, or partly a software module/unit and partly a hardware module/unit. For example, for each device or product applied to or integrated in a chip, each module/unit it includes may be implemented by hardware such as circuits, or at least some modules/units may be implemented by a software program running on a processor integrated inside the chip, with the remaining (if any) modules/units implemented by hardware such as circuits. For each device or product applied to or integrated in a chip module, each module/unit it includes may be implemented by hardware such as circuits, and different modules/units may be located in the same component (such as a chip or a circuit module) of the chip module or in different components; alternatively, at least some modules/units may be implemented by a software program running on a processor integrated inside the chip module, with the remaining (if any) modules/units implemented by hardware such as circuits. For each device or product applied to or integrated in a terminal, each module/unit it includes may be implemented by hardware such as circuits, and different modules/units may be located in the same component (for example, a chip or a circuit module) of the terminal or in different components; alternatively, at least some modules/units may be implemented by a software program running on a processor integrated inside the terminal, with the remaining (if any) modules/units implemented by hardware such as circuits.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, it can be used to implement the model running method described in the embodiments corresponding to FIG. 1 or FIG. 4 of the present application, which is not repeated here.

The computer-readable storage medium may be an internal storage unit of the terminal device described in any of the foregoing embodiments, such as a hard disk or memory of the device. The computer-readable storage medium may also be an external storage device of the terminal device, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card or a flash card (Flash Card) equipped on the device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the terminal device. The computer-readable storage medium is used to store the computer program and other programs and data required by the terminal device, and may also be used to temporarily store data that has been output or is to be output.

Those of ordinary skill in the art can understand that all or part of the flows in the methods of the above embodiments can be accomplished by a computer program instructing related hardware. The program may be stored in a readable storage medium, and when executed, may include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.

What is disclosed above is merely preferred embodiments of the present application and certainly cannot be used to limit the scope of the claims of the present application; therefore, equivalent changes made according to the claims of the present application still fall within the scope covered by the present application.

Claims (26)

  1. A model running method, characterized in that the method comprises:
    receiving a model invocation instruction, wherein the model invocation instruction is used to invoke a target model to perform a target task;
    determining, based on the model invocation instruction, at least one module model corresponding to the target model, and obtaining a first control instruction and a first model parameter of each of the module models from a cache;
    executing the target task based on the first control instruction and the first model parameter corresponding to each of the module models.
  2. The method according to claim 1, characterized in that, before the obtaining a first control instruction and a first model parameter of each of the module models from a cache, the method further comprises:
    splitting the target model based on a plurality of operators comprised in the target model to obtain one or more module models, the module models comprising at least one of the operators;
    compiling each of the module models to obtain the first control instruction and the first model parameter of each of the module models;
    storing the first control instruction and the first model parameter of each of the module models in the cache.
  3. The method according to claim 1 or 2, characterized in that there are a plurality of module models, the plurality of module models correspond to at least two types of computing units, and the first control instruction and the first model parameter of the module model correspond to the computing unit.
  4. The method according to claim 3, characterized in that the types of the computing unit are: a central processing unit CPU, a graphics processing unit GPU, or a neural network processing unit NPU.
  5. The method according to claim 4, characterized in that the obtaining a first control instruction and a first model parameter of each of the module models from a cache comprises:
    detecting whether the computing unit corresponding to each of the module models is in an unavailable state;
    determining, based on a detection result, a target module model from the module models, a first target computing unit corresponding to the target module model being in an unavailable state;
    compiling the target module model based on a second target computing unit to obtain a second control instruction and a second model parameter corresponding to the target module model, the second target computing unit having a correspondence with the target module model, and the second target unit being different from the first target unit;
    updating, according to the second control instruction and the second model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.
  6. The method according to claim 5, characterized in that, after the updating, according to the second control instruction and the second model parameter, of the first control instruction and the first model parameter of the target module model stored in the cache, the method further comprises:
    detecting whether the first target computing unit corresponding to the target module model is in an unavailable state;
    if the first target computing unit is not in an unavailable state, compiling the target module model based on the first target computing unit to obtain a third control instruction and a third model parameter corresponding to the target module model, and updating, according to the third control instruction and the third model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.
  7. A model running device, characterized in that the model running device comprises:
    a receiving unit, configured to receive a model invocation instruction, wherein the model invocation instruction is used to invoke a target model to perform a target task;
    a processing unit, configured to determine, based on the model invocation instruction, at least one module model corresponding to the target model, and obtain a first control instruction and a first model parameter of each of the module models from a cache;
    the processing unit being further configured to execute the target task based on the first control instruction and the first model parameter corresponding to each of the module models.
  8. The device according to claim 7, characterized in that, before the obtaining a first control instruction and a first model parameter of each of the module models from the cache, the processing unit is further configured to:
    split the target model based on a plurality of operators comprised in the target model to obtain one or more module models, the module models comprising at least one of the operators;
    compile each of the module models to obtain the first control instruction and the first model parameter of each of the module models;
    store the first control instruction and the first model parameter of each of the module models in the cache.
  9. The device according to claim 7 or 8, characterized in that there are a plurality of module models, the plurality of module models correspond to at least two types of computing units, and the first control instruction and the first model parameter of the module model correspond to the computing unit.
  10. The device according to claim 9, characterized in that the types of the computing unit are: a central processing unit CPU, a graphics processing unit GPU, or a neural network processing unit NPU.
  11. The device according to claim 10, characterized in that the processing unit is specifically configured to:
    detect whether the computing unit corresponding to each of the module models is in an unavailable state;
    determine, based on a detection result, a target module model from the module models, a first target computing unit corresponding to the target module model being in an unavailable state;
    compile the target module model based on a second target computing unit to obtain a second control instruction and a second model parameter corresponding to the target module model, the second target computing unit having a correspondence with the target module model, and the second target unit being different from the first target unit;
    update, according to the second control instruction and the second model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.
  12. The device according to claim 11, characterized in that, after the updating, according to the second control instruction and the second model parameter, of the first control instruction and the first model parameter of the target module model stored in the cache, the processing unit is further configured to:
    detect whether the first target computing unit corresponding to the target module model is in an unavailable state;
    if the first target computing unit is not in an unavailable state, compile the target module model based on the first target computing unit to obtain a third control instruction and a third model parameter corresponding to the target module model, and update, according to the third control instruction and the third model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.
  13. A chip, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor is configured to perform the following steps:
    receiving a model invocation instruction, wherein the model invocation instruction is used to invoke a target model to execute a target task;
    determining, based on the model invocation instruction, at least one module model corresponding to the target model, and obtaining a first control instruction and a first model parameter of each of the module models from a cache; and
    executing the target task based on the first control instruction and the first model parameter corresponding to each of the module models.
  14. The chip according to claim 13, wherein before the obtaining the first control instruction and the first model parameter of each of the module models from the cache, the processor is further configured to:
    split the target model, based on a plurality of operators included in the target model, into one or more module models, wherein each module model includes at least one of the operators;
    compile each of the module models to obtain the first control instruction and the first model parameter of each module model; and
    store the first control instruction and the first model parameter of each of the module models in the cache.
  15. The chip according to claim 13 or 14, wherein there are a plurality of the module models, the plurality of module models correspond to at least two kinds of computing units, and the first control instruction and the first model parameter of each module model correspond to its computing unit.
  16. The chip according to claim 15, wherein the kinds of the computing units are: a central processing unit (CPU), a graphics processing unit (GPU), or a neural-network processing unit (NPU).
  17. The chip according to claim 16, wherein the processor is specifically configured to:
    detect whether the computing unit corresponding to each of the module models is in an unavailable state;
    determine, based on a detection result, a target module model from the module models, wherein a first target computing unit corresponding to the target module model is in the unavailable state;
    compile the target module model based on a second target computing unit to obtain a second control instruction and a second model parameter corresponding to the target module model, wherein the second target computing unit has a correspondence with the target module model, and the second target computing unit is different from the first target computing unit; and
    update, according to the second control instruction and the second model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.
  18. The chip according to claim 17, wherein after the updating, according to the second control instruction and the second model parameter, the first control instruction and the first model parameter of the target module model stored in the cache, the processor is further configured to:
    detect whether the first target computing unit corresponding to the target module model is in the unavailable state; and
    if the first target computing unit is no longer in the unavailable state, compile the target module model based on the first target computing unit to obtain a third control instruction and a third model parameter corresponding to the target module model, and update, according to the third control instruction and the third model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.
  19. A chip module, comprising a transceiver component and a chip, the chip comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor is configured to perform the following steps:
    receiving a model invocation instruction, wherein the model invocation instruction is used to invoke a target model to execute a target task;
    determining, based on the model invocation instruction, at least one module model corresponding to the target model, and obtaining a first control instruction and a first model parameter of each of the module models from a cache; and
    executing the target task based on the first control instruction and the first model parameter corresponding to each of the module models.
  20. The chip module according to claim 19, wherein before the obtaining the first control instruction and the first model parameter of each of the module models from the cache, the processor is further configured to:
    split the target model, based on a plurality of operators included in the target model, into one or more module models, wherein each module model includes at least one of the operators;
    compile each of the module models to obtain the first control instruction and the first model parameter of each module model; and
    store the first control instruction and the first model parameter of each of the module models in the cache.
  21. The chip module according to claim 19 or 20, wherein there are a plurality of the module models, the plurality of module models correspond to at least two kinds of computing units, and the first control instruction and the first model parameter of each module model correspond to its computing unit.
  22. The chip module according to claim 21, wherein the kinds of the computing units are: a central processing unit (CPU), a graphics processing unit (GPU), or a neural-network processing unit (NPU).
  23. The chip module according to claim 22, wherein the processor is specifically configured to:
    detect whether the computing unit corresponding to each of the module models is in an unavailable state;
    determine, based on a detection result, a target module model from the module models, wherein a first target computing unit corresponding to the target module model is in the unavailable state;
    compile the target module model based on a second target computing unit to obtain a second control instruction and a second model parameter corresponding to the target module model, wherein the second target computing unit has a correspondence with the target module model, and the second target computing unit is different from the first target computing unit; and
    update, according to the second control instruction and the second model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.
  24. The chip module according to claim 23, wherein after the updating, according to the second control instruction and the second model parameter, the first control instruction and the first model parameter of the target module model stored in the cache, the processor is further configured to:
    detect whether the first target computing unit corresponding to the target module model is in the unavailable state; and
    if the first target computing unit is no longer in the unavailable state, compile the target module model based on the first target computing unit to obtain a third control instruction and a third model parameter corresponding to the target module model, and update, according to the third control instruction and the third model parameter, the first control instruction and the first model parameter of the target module model stored in the cache.
  25. A terminal device, comprising a processor and a memory connected to each other, wherein the memory is configured to store a computer program, the computer program comprises program instructions, and the processor is configured to invoke the program instructions to perform the method according to any one of claims 1-6.
  26. A computer-readable storage medium, storing a computer program, wherein the computer program comprises program instructions which, when executed by a processor, cause the processor to perform the method according to any one of claims 1-6.
PCT/CN2021/141399 2021-01-29 2021-12-25 Model running method and related device WO2022161059A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110126081.XA CN112783506B (zh) 2021-01-29 2021-01-29 Model running method and related device
CN202110126081.X 2021-01-29

Publications (1)

Publication Number Publication Date
WO2022161059A1 true WO2022161059A1 (zh) 2022-08-04

Family

ID=75759742

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/141399 WO2022161059A1 (zh) 2021-01-29 2021-12-25 Model running method and related device

Country Status (2)

Country Link
CN (1) CN112783506B (zh)
WO (1) WO2022161059A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112783506B (zh) * 2021-01-29 2022-09-30 展讯通信(上海)有限公司 Model running method and related device
CN114327671A (zh) * 2021-12-03 2022-04-12 北京达佳互联信息技术有限公司 Parameter configuration method, apparatus, device, and storage medium
CN115056784B (zh) * 2022-07-04 2023-12-05 小米汽车科技有限公司 Vehicle control method and apparatus, vehicle, storage medium, and chip
CN116985830A (zh) * 2023-07-26 2023-11-03 小米汽车科技有限公司 Vehicle mode running method and apparatus, vehicle, and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
US10261765B1 (en) * 2018-03-09 2019-04-16 Oracle International Corporation Enhancing program execution using optimization-driven inlining
CN111273953A (zh) * 2018-11-19 2020-06-12 Oppo广东移动通信有限公司 Model processing method, apparatus, terminal, and storage medium
CN111651207A (zh) * 2020-08-06 2020-09-11 腾讯科技(深圳)有限公司 Neural network model computing chip, method, apparatus, device, and medium
CN112204524A (zh) * 2018-05-24 2021-01-08 赛灵思公司 Embedded scheduling of hardware resources for hardware acceleration
CN112783506A (zh) * 2021-01-29 2021-05-11 展讯通信(上海)有限公司 Model running method and related device

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US10802992B2 (en) * 2016-08-12 2020-10-13 Xilinx Technology Beijing Limited Combining CPU and special accelerator for implementing an artificial neural network
CA2974556C (en) * 2016-08-25 2018-06-05 Sas Institute Inc. Compilation for node device gpu-based parallel processing
CN110908667B (zh) * 2019-11-18 2021-11-16 北京迈格威科技有限公司 Method and apparatus for joint compilation of neural networks, and electronic device
CN111340237B (zh) * 2020-03-05 2024-04-26 腾讯科技(深圳)有限公司 Data processing and model running method and apparatus, and computer device
CN112035220A (zh) * 2020-09-30 2020-12-04 北京百度网讯科技有限公司 Method, apparatus, device, and storage medium for processing development-machine operation tasks

Also Published As

Publication number Publication date
CN112783506A (zh) 2021-05-11
CN112783506B (zh) 2022-09-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21922649

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21922649

Country of ref document: EP

Kind code of ref document: A1