CN113614749B - Processing method, device and equipment of artificial intelligence model and readable storage medium - Google Patents

Info

Publication number
CN113614749B
CN113614749B (Application CN202180002364.1A)
Authority
CN
China
Prior art keywords
operator
unit
model
artificial intelligence
storage unit
Prior art date
Legal status
Active
Application number
CN202180002364.1A
Other languages
Chinese (zh)
Other versions
CN113614749A (en)
Inventor
朱湘毅
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN113614749A
Application granted
Publication of CN113614749B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 20/00 Machine learning

Abstract

The application discloses a processing method, apparatus, and device for an artificial intelligence model, and a readable storage medium. The processing method is applied to an artificial intelligence processing unit that includes a control unit, an arithmetic logic unit, and a storage unit. The method includes: acquiring an artificial intelligence (AI) model, where the AI model includes a control operator and a calculation operator and is issued by a processor based on a user-mode application programming interface (API), the API including a first API used for issuing the control operator; storing, by the arithmetic logic unit, the data produced after the calculation operator is executed in the storage unit; and executing the control operator based on the data in the storage unit. The method and device can improve the performance of model inference or model training.

Description

Processing method, device and equipment of artificial intelligence model and readable storage medium
Technical Field
The present application relates to the field of processor technologies, and in particular, to a processing method of an artificial intelligence model, a processing apparatus of an artificial intelligence model, a processing device of an artificial intelligence model, a computer-readable storage medium, and a computer program.
Background
Artificial intelligence (AI) is a theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. AI is a leading-edge technology in current scientific and industrial development, applied to a wide variety of scenarios in daily life.
An automatic driving system has a large number of scenarios that require AI model inference. These AI models are basically deep neural networks, and neural networks are matrix- and vector-computation intensive, with high computing-power requirements (at the tera scale). An ordinary CPU generally cannot meet the computing-power requirement of a deep neural network, that is, an AI model, for massive computation, so a dedicated accelerator is required to execute the AI model, such as a specially customized graphics processing unit (GPU) or neural network processing unit (NPU).
An application loads the AI model into an AI accelerator, and the AI model is trained or inferred (executed) by that accelerator. When a prior-art AI accelerator encounters a conditional-branch judgment operator or a loop operator, it can only execute the current operator and cannot control the execution of subsequent operators according to the execution result of the current operator; the AI accelerator needs to fall back to the main processor to execute that part of the control functions. Therefore, in the process of training or inference on the AI model, the main processor and the AI accelerator need to interact frequently, which results in low performance of model inference or model training.
Disclosure of Invention
The embodiment of the application provides a processing method of an artificial intelligence model, a processing device of the artificial intelligence model, processing equipment of the artificial intelligence model, a computer readable storage medium and a computer program, so as to improve the performance of model reasoning or model training.
In a first aspect, an embodiment of the present application provides a processing method for an artificial intelligence model, which is applied to an artificial intelligence processing unit, where the artificial intelligence processing unit includes a control unit, an arithmetic logic unit, and a storage unit, and the method includes:
the artificial intelligence processing unit obtains an AI model issued by a processor side based on an Application Programming Interface (API); the AI model comprises a control operator and a calculation operator, the API comprises a first API, and the first API is used for issuing the control operator;
the artificial intelligence processing unit executes the calculation operator through the arithmetic logic unit in the process of training or inference on the AI model, and the arithmetic logic unit stores the resulting data in the storage unit after executing the calculation operator;
the artificial intelligence processing unit executes the control operator based on the data in the storage unit through the control unit.
According to this embodiment of the application, an API used for issuing the control operator is provided, so that an AI model including a control operator can be issued to the artificial intelligence processing unit based on the API. By providing the storage unit inside the artificial intelligence processing unit, the data produced after an operator or task is executed can be stored in the storage unit during training or inference on the AI model; the control unit can then execute the control operator directly based on the data in the storage unit, and can therefore control the execution of subsequent operators according to the execution result of the current operator. The whole AI model is executed inside the control unit and the arithmetic logic unit, and no control function needs to be returned to the main processor for processing. This solves the prior-art technical problem that the main processor and the AI accelerator must interact frequently to complete training or inference on the AI model, which makes the performance of model inference or model training low, and thereby improves that performance.
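The flow described above can be illustrated with a minimal, hypothetical sketch: compute operators run on the arithmetic logic unit and write their result into the storage unit, and control (branch) operators run on the control unit, reading that stored result to pick the next operator without any round-trip to the main processor. All class, field, and operator names here are invented for illustration; the patent does not specify an implementation.

```python
class AIProcessingUnit:
    """Toy model of the described architecture (illustrative names)."""

    def __init__(self):
        self.storage_unit = None  # holds the latest compute result

    def run(self, model, x):
        pc = 0  # index of the operator currently being executed
        while pc < len(model):
            kind, payload = model[pc]
            if kind == "compute":
                # arithmetic logic unit: execute, then store the result
                x = payload(x)
                self.storage_unit = x
                pc += 1
            elif kind == "branch":
                # control unit: choose the next operator from the stored
                # data, with no fallback to the main processor
                predicate, taken_pc, fallthrough_pc = payload
                pc = taken_pc if predicate(self.storage_unit) else fallthrough_pc
        return x

# An issued model that doubles x until it exceeds 100, then adds 1.
model = [
    ("compute", lambda v: v * 2),            # op 0: calculation operator
    ("branch", (lambda s: s <= 100, 0, 2)),  # op 1: control operator
    ("compute", lambda v: v + 1),            # op 2: second branch path
]
unit = AIProcessingUnit()
print(unit.run(model, 3))  # 3 doubles to 192, then +1 -> 193
```

The key design point mirrored here is that the branch decision consumes `storage_unit` directly, which is what removes the per-operator host interaction.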
In one possible implementation, the storage unit includes a first storage unit and a second storage unit; the storing the data after the calculation operator is executed in the storage unit includes: storing the data after the calculation operator is executed in the second storage unit;
the executing, by the control unit, the control operator based on the data in the storage unit includes: reading the data in the second storage unit through the control unit, and then writing the data from the second storage unit into the first storage unit; then, when executing the control operator, the control operator may be read and executed based on the data in the first storage unit.
According to this embodiment of the application, a first storage unit and a second storage unit are provided in the artificial intelligence processing unit. The second storage unit stores the data produced after the arithmetic logic unit executes an operator or task, and the control unit reads this data into the first storage unit, so that a control operator can be executed directly based on the data in the first storage unit, and the control unit can control the execution of subsequent operators according to the execution result of the current operator.
In a possible implementation, the first storage unit may be integrated in the control unit; that is, the first storage unit may be added to the control unit and may be a dedicated register of the control unit. The second storage unit may be integrated in the arithmetic logic unit; that is, the second storage unit may be added to the arithmetic logic unit and may be a dedicated register of the arithmetic logic unit. The control unit can then quickly and efficiently read the data produced after the arithmetic logic unit executes an operator or task, and can thus control the execution of subsequent operators according to the execution result of the current operator. The whole AI model is executed inside the control unit and the arithmetic logic unit, and no control function needs to be returned to the main processor for processing.
Moreover, a second storage unit (a register dedicated to the arithmetic logic unit) can be added to each arithmetic logic unit in the artificial intelligence processing unit, so that every arithmetic logic unit can cooperate with the control unit to execute control operators, further improving the performance of model inference or model training of the artificial intelligence processing unit.
In one possible implementation, the AI model corresponds to at least one execution sequence, and each first storage unit corresponds to a different execution sequence.
In this embodiment, the processor may set the number of the first storage units in the control unit in a customized manner according to the number of the AI models and the number of the execution sequences corresponding to each AI model.
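The per-sequence register layout can be sketched as follows: one dedicated first storage unit is allocated for each execution sequence of each AI model. This is a hypothetical rendering; the function and key names are invented, and the patent leaves the allocation policy to the processor.

```python
def allocate_first_storage(models):
    """models: mapping of model name -> number of execution sequences.

    Returns one dedicated register slot per (model, sequence) pair,
    mirroring "each first storage unit corresponds to a different
    execution sequence"."""
    registers = {}
    for name, num_sequences in models.items():
        for i in range(num_sequences):
            registers[(name, f"stream{i}")] = None  # empty register slot
    return registers

regs = allocate_first_storage({"model_a": 2, "model_b": 1})
print(sorted(regs))  # three slots, one per execution sequence
```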
In one possible implementation, the control operator includes a branch judgment operator, wherein the branch judgment operator is used for judging whether to execute the first branch operator or the second branch operator;
the reading and executing the control operator based on the data in the first storage unit includes:
reading data in the first storage unit;
judging whether to execute the first branch operator based on the data in the first storage unit and the parameters in the branch judgment operator;
if the judgment result is yes, executing the first branch operator; if not, the second branch operator is executed.
In this embodiment, the control operator may include a branch judgment operator; that is, when executing an issued branch judgment operator, the control unit can directly read the data required for the branch judgment and then execute the next operator or task (the first branch operator or the second branch operator). The AI model is executed inside the control unit and the arithmetic logic unit, and no control function needs to be returned to the main processor for processing.
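The three steps above (read, judge, dispatch) can be sketched directly. The comparison between the stored data and the operator's parameter is left abstract in the text, so a simple ">=" predicate stands in for it here; all names are illustrative.

```python
def execute_branch_judgment(first_storage, threshold,
                            first_branch, second_branch):
    # step 1: read the data in the first storage unit
    data = first_storage["value"]
    # step 2: judge using the data and the branch operator's parameter
    if data >= threshold:        # hypothetical condition
        return first_branch(data)   # judged yes: first branch operator
    return second_branch(data)      # judged no: second branch operator

result = execute_branch_judgment(
    {"value": 7}, threshold=5,
    first_branch=lambda d: f"first:{d}",
    second_branch=lambda d: f"second:{d}",
)
print(result)  # first:7
```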
In a possible implementation, the control operator further includes a loop operator, and the loop operator is configured to execute the first calculation operator of the AI model in a loop. After the executing of the first branch operator, the method further includes: executing the loop operator to jump back to the first calculation operator of the AI model, so that the arithmetic logic unit executes the first calculation operator cyclically and loop iteration proceeds; the loop iteration ends and the second branch operator is executed when, in a subsequent iteration, it is judged from the data in the first storage unit and the parameters in the branch judgment operator that the first branch operator is not to be executed.
The AI model in this embodiment of the application can thus further include a loop operator, whose execution controls the arithmetic logic unit to jump back to the first calculation operator of the AI model and execute it cyclically. The branch judgment operator and the loop operator in the AI model can therefore be executed directly in the control unit, without frequent interaction with the main processor to complete them, which solves the technical problem of low performance of model inference or model training and improves that performance.
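The interplay of the loop operator and the branch judgment operator reduces to a familiar loop shape, sketched below with an invented calculation operator (`x + 3`) and condition (`x < limit`); the patent specifies only the control structure, not the computation.

```python
def run_loop(x, limit):
    """Loop operator jumps back to the first calculation operator until
    the branch judgment operator judges 'no', ending the iteration."""
    iterations = 0
    while True:
        x = x + 3          # first calculation operator (illustrative)
        iterations += 1
        # branch judgment operator: "yes" keeps looping, "no" falls
        # through to the second branch operator
        if not (x < limit):
            break
    return x, iterations

print(run_loop(0, 10))  # (12, 4): values 3, 6, 9, 12
```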
In one possible implementation, the API further includes a second API and a third API; the second API is used for creating a label, and the third API is used for setting the position of a label in the AI model. The AI model further includes a first label and a second label used for jumping, where the first label is placed immediately before the first calculation operator of the AI model, and the second label is placed immediately before the second branch operator;
before the reading and executing, by the control unit, of the control operator based on the data in the first storage unit, the method further includes: executing the first calculation operator by the arithmetic logic unit;
the executing of the loop operator to execute the calculation operator of the AI model iteratively through the arithmetic logic unit then includes: executing the loop operator and jumping to the position of the first label, so that the first calculation operator is executed cyclically by the arithmetic logic unit;
and the executing of the second branch operator when the judgment result is no includes: jumping to the position of the second label to execute the second branch operator.
The AI model in this embodiment of the application may further include a first label and a second label, together with their respective positions in the AI model; jumps to the operators after the branch judgment operator (for example, the jump performed when executing the loop operator) are implemented efficiently through the first and second labels. The branch judgment operator and the loop operator in the AI model can therefore be executed directly in the control unit without frequent interaction with the main processor, which solves the technical problem of low performance of model inference or model training and improves that performance.
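A minimal sketch of the label mechanism: one function stands in for the second API (create a label) and another for the third API (pin the label at a position in the operator list), and the control unit "jumps" by resetting its program counter to a label's position. All names are invented; the real APIs are defined only abstractly in the text.

```python
class Label:
    def __init__(self):
        self.position = None  # set later by the "third API"

def create_label():           # stands in for the second API
    return Label()

def set_label(model, label):  # stands in for the third API
    # pin the label at the current end of the operator list
    label.position = len(model)

model = []
loop_start = create_label()
set_label(model, loop_start)       # first label: before the first calc op
model.append("first_calc_op")
model.append("branch_judgment_op")
exit_point = create_label()
set_label(model, exit_point)       # second label: before the second branch
model.append("second_branch_op")

# executing the loop operator: jump back to the first label's position
pc = loop_start.position
print(pc, exit_point.position)  # 0 2
```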
In a possible implementation manner, before the reading of the data in the second storage unit by the control unit, the method further includes: setting the second storage unit to an invalid value;
the writing the data in the second memory cell into the first memory cell includes: writing the data of the second storage unit into the first storage unit under the condition that the read data of the second storage unit is judged to be a valid value; and if the read data of the second storage unit is judged to be invalid, not writing the data of the second storage unit into the first storage unit.
In this embodiment of the application, the control unit first sets the second storage unit of the arithmetic logic unit to an invalid value, and writes the data into the first storage unit only when the data read from the second storage unit is judged to be a valid value, which ensures the accuracy of training or inference on the AI model.
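The valid/invalid protocol above can be illustrated with a small sketch: the second storage unit starts marked invalid, and the copy into the first storage unit happens only after the arithmetic logic unit has overwritten it with a valid result. The `None` sentinel and all names are illustrative choices, not the patent's.

```python
INVALID = None  # illustrative sentinel for "no valid result yet"

def copy_if_valid(second_storage, first_storage, key):
    """Copy second -> first only when the read value is valid."""
    data = second_storage[key]
    if data is not INVALID:      # valid value: safe to copy
        first_storage[key] = data
        return True
    return False                 # invalid value: skip the write

second = {"r0": INVALID}         # control unit pre-sets invalid
first = {"r0": 0}
copy_if_valid(second, first, "r0")   # nothing written: still invalid

second["r0"] = 42                # ALU finishes a calculation operator
copy_if_valid(second, first, "r0")   # now the copy proceeds
print(first["r0"])  # 42
```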
In a second aspect, the present application provides a method for processing an artificial intelligence model, the method comprising:
a processor (also called a main processor) creates an artificial intelligence (AI) model, where the AI model includes a control operator and a calculation operator; and
the processor issues the AI model to an artificial intelligence processing unit based on an API;
the API comprises a first API, and the first API is used for issuing the control operator; the artificial intelligence processing unit is used for executing the control operator and the calculation operator in the process of training or reasoning the AI model.
According to this embodiment of the application, an API used for issuing the control operator is provided, and an AI model including a control operator can be issued to the artificial intelligence processing unit based on the API, so that the artificial intelligence processing unit can execute the whole AI model independently in the process of training or inference, and no control function needs to be returned to the main processor for processing. This solves the prior-art technical problem that the main processor and the AI accelerator must interact frequently to complete training or inference on the AI model, which makes the performance of model inference or model training low, and thereby improves that performance.
In one possible implementation, the control operator includes a branch judgment operator for judging whether to execute the first branch operator or the second branch operator.
According to this embodiment of the application, an API used for issuing the branch judgment operator can be provided, and the AI model can then be issued to the artificial intelligence processing unit, so that the artificial intelligence processing unit can complete the execution of the branch judgment operator independently in the process of training or inference on the AI model, without interacting with the main processor.
In a possible implementation, the control operator further includes a loop operator, and the loop operator is configured to execute the first calculation operator of the AI model in a loop; the API further includes a second API and a third API, the second API being used for creating a label and the third API being used for setting the position of the label in the AI model.
In one possible implementation, the artificial intelligence processing unit includes a control unit, an arithmetic logic unit, and a storage unit. The calculation operator in the AI model is dispatched by the control unit in the artificial intelligence processing unit to the arithmetic logic unit for execution, and the resulting data is stored in the storage unit each time a calculation operator is executed, so that the artificial intelligence processing unit can execute the control operator through the control unit based on the data in the storage unit.
In this embodiment of the application, an API for issuing the loop operator may further be provided, so that the AI model issued to the artificial intelligence processing unit enables it to complete the execution of the loop operator independently in the process of training or inference. In the process of executing the loop operator, the artificial intelligence processing unit can perform the related jumps through the API for creating a label and the API for setting the position of a label in the AI model, so that execution of the loop operator can be completed quickly and efficiently.
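A hypothetical host-side sketch ties the second aspect together: the processor builds an AI model containing calculation, branch, and loop operators plus labels, then issues the whole model to the AI processing unit in one shot. Every function name below is invented for illustration; the patent defines the APIs only abstractly.

```python
def create_model():
    return {"ops": [], "labels": {}}

def add_compute_op(model, fn):
    model["ops"].append(("compute", fn))

def add_control_op(model, kind, payload):   # stands in for the first API
    model["ops"].append((kind, payload))

def create_label(model, name):              # stands in for the second API
    model["labels"][name] = None

def set_label(model, name):                 # stands in for the third API
    model["labels"][name] = len(model["ops"])  # pin at current position

def issue(model, unit_queue):
    # issuing hands the whole model, control operators included, to the
    # AI processing unit; no further host interaction is needed
    unit_queue.append(model)

model = create_model()
create_label(model, "loop_start")
set_label(model, "loop_start")              # label before the first calc op
add_compute_op(model, lambda v: v * 2)
add_control_op(model, "loop", "loop_start") # loop operator jumps to label

queue = []
issue(model, queue)
print(len(queue), model["labels"]["loop_start"])  # 1 0
```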
In a third aspect, the present application provides an artificial intelligence model processing apparatus, which is an artificial intelligence processing unit, including:
the acquisition unit is used for acquiring an AI model issued by the processor side based on the API; the AI model includes a control operator and a calculation operator; the API includes a first API, and the first API is used for issuing the control operator;
the first execution unit is used for executing the calculation operator through the arithmetic logic unit of the artificial intelligence processing unit in the process of training or inference on the AI model;
the storage processing unit is used for storing the executed data in the storage unit of the artificial intelligence processing unit after the first execution unit executes the calculation operator;
and the second execution unit is used for executing the control operator based on the data in the storage unit through the control unit of the artificial intelligence processing unit.
In one possible implementation, the storage unit includes a first storage unit and a second storage unit;
the storage processing unit is specifically configured to: storing the data after the calculation operator is executed in the second storage unit;
the second execution unit includes:
a first reading unit for reading the data in the second storage unit;
a first writing unit for writing the data in the second storage unit into the first storage unit;
and the reading execution unit is used for reading and executing the control operator based on the data in the first storage unit.
In one possible implementation, the control operator includes a branch judgment operator, wherein the branch judgment operator is used for judging whether to execute the first branch operator or the second branch operator; the read execution unit includes:
a second reading unit for reading the data in the first storage unit;
a judging unit, configured to judge whether to execute the first branch operator based on the data in the first storage unit and the parameter in the branch judgment operator;
and a judgment processing unit, configured to execute the first branch operator if the judging unit judges yes, and to execute the second branch operator if the judging unit judges no.
In one possible implementation, the control operator further includes a loop operator, and the loop operator is configured to loop through the first calculation operator of the AI model; the processing apparatus further includes:
and a third execution unit, configured to execute the loop operator after the judgment processing unit executes the first branch operator, so as to jump back to the first calculation operator of the AI model and execute it cyclically through the arithmetic logic unit for loop iteration; the loop iteration ends, and the judgment processing unit executes the second branch operator, when the judging unit next judges, based on the data in the first storage unit and the parameter in the branch judgment operator, that the first branch operator is not to be executed.
In one possible implementation, the API further includes a second API and a third API; the second API is used for creating a label, and the third API is used for setting the position of a label in the AI model. The AI model further includes a first label and a second label used for jumping, where the first label is placed immediately before the first calculation operator of the AI model, and the second label is placed immediately before the second branch operator. The processing apparatus further includes:
a fourth execution unit for executing the first calculation operator by the arithmetic logic unit before the read execution unit reads and executes the control operator based on the data in the first storage unit;
the third execution unit is specifically configured to: after the judgment processing unit executes the first branch operator, execute the loop operator and jump to the position of the first label, so that the first calculation operator is executed cyclically through the arithmetic logic unit;
and if the judging unit judges no, the judgment processing unit is specifically configured to jump to the position of the second label to execute the second branch operator.
In one possible implementation, the AI model corresponds to at least one execution sequence, and each of the first storage units corresponds to a different execution sequence.
In one possible implementation, the processing apparatus further includes:
a setting unit configured to set the second storage unit to an invalid value before the first reading unit reads the data in the second storage unit;
the first write unit is specifically configured to: writing the data of the second storage unit into the first storage unit under the condition that the read data of the second storage unit is judged to be a valid value; and if the read data of the second storage unit is judged to be invalid, not writing the data of the second storage unit into the first storage unit.
In a fourth aspect, the present application provides a processing apparatus for an artificial intelligence model, the processing apparatus comprising:
a creating unit for creating an AI model; the AI model comprises a control operator and a calculation operator;
the issuing unit is used for issuing the AI model to the artificial intelligence processing unit based on the API;
the API comprises a first API, and the first API is used for issuing the control operator; the artificial intelligence processing unit is used for executing the control operator and the calculation operator in the process of training or reasoning the AI model.
In one possible implementation, the control operator includes a branch judgment operator for judging whether to execute the first branch operator or the second branch operator.
In a possible implementation, the control operator further includes a loop operator for looping execution of the first calculation operator of the AI model; the API also includes a second API and a third API, the second API for creating a tag; the third API is used to set the location of the tag in the AI model.
In a fifth aspect, the present application provides an artificial intelligence model processing apparatus, comprising the artificial intelligence processing unit and a memory; wherein the memory is configured to store program codes, and the artificial intelligence processing unit calls the program codes stored in the memory to cause the processing device of the artificial intelligence model to execute the method in the first aspect and various possible implementations thereof.
In a sixth aspect, the present application provides an artificial intelligence model processing apparatus, comprising a processor and a memory; wherein the memory is used for storing program codes, and the processor calls the program codes stored in the memory to make the processing device of the artificial intelligence model execute the method in the second aspect and the various possible implementation manners.
In a seventh aspect, the present application provides an artificial intelligence model processing apparatus, comprising a processor, an artificial intelligence processing unit, and a memory; there may be a plurality of memories for storing program code. The processor is coupled with the artificial intelligence processing unit. The artificial intelligence processing unit may invoke program code stored in a memory coupled to itself, or in a memory internal to itself, to cause the processing device of the artificial intelligence model to perform the methods of the first aspect and its various possible implementations described above. The processor may invoke program code stored in the memory coupled to the processor to cause the processing device of the artificial intelligence model to perform the methods of the second aspect and its various possible implementations described above.
In an eighth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of the first or second aspect and its various possible implementations.
In a ninth aspect, the present application provides a computer program comprising instructions which, when executed by a processor, cause the processor to perform the method of the first or second aspect and its various possible implementations.
Drawings
Fig. 1 is a functional block diagram of a vehicle 100 according to an embodiment of the present application.
Fig. 2 is a schematic structural diagram of an AI computing architecture provided in the embodiment of the present application.
Fig. 3 is a schematic structural diagram of a processing device of an artificial intelligence model according to an embodiment of the present application.
Fig. 4 is a schematic structural diagram of a processing device of an artificial intelligence model according to another embodiment provided in the present application.
Fig. 5 is a schematic structural diagram of a processing device of an artificial intelligence model according to another embodiment provided in the present application.
Fig. 6 is a schematic diagram of a loop statement of an AI model provided in an embodiment of the present application.
Fig. 7 is a schematic diagram of an execution sequence of stream1 in the AI model provided in the embodiment of the present application.
Fig. 8 is a schematic diagram of an execution flow of a control unit according to an embodiment of the present application.
Fig. 9 is a schematic diagram illustrating an artificial intelligence processing unit according to an embodiment of the present disclosure.
Fig. 10 is a schematic structural diagram of a processing apparatus of an artificial intelligence model according to an embodiment of the present application.
Fig. 11 is a schematic structural diagram of a processing apparatus of an artificial intelligence model according to another embodiment of the present disclosure.
Fig. 12 is a flowchart illustrating a processing method of an artificial intelligence model according to an embodiment of the present application.
Fig. 13 is a flowchart illustrating a processing method of an artificial intelligence model according to another embodiment of the present disclosure.
FIG. 14 is a conceptual partial view of a computer program or computer program product provided by embodiments of the present application.
Detailed Description
The embodiments of the present application will be described below with reference to the drawings. The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
As used in this specification, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between 2 or more computers. In addition, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from two components interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
First, some terms in the present application are explained so as to be easily understood by those skilled in the art.
(1) A register is a component within the CPU. Registers are high-speed storage elements of limited capacity that may be used to temporarily store instructions, data, and addresses. The control unit of a central processing unit includes registers such as the Instruction Register (IR) and the Program Counter (PC); the arithmetic and logic part of a central processing unit includes registers such as the Accumulator (ACC).
(2) AI accelerators are a class of specialized hardware accelerators or computer systems intended to accelerate artificial intelligence applications, particularly artificial neural networks, machine vision, and machine learning. Typical applications include robotics, the Internet of Things, and other data-intensive or sensor-driven tasks. As hardware accelerators dedicated to specific tasks, AI accelerators often serve as an aid or supplement to the main processor in a computer system; examples include, but are not limited to, a specially tailored GPU or NPU for executing AI models.
(3) An API is a predefined interface (e.g., a function or an HTTP interface), or a convention for linking different components of a software system. An API provides a set of routines that applications and developers can access based on certain software or hardware, without accessing source code or understanding the details of the internal workings.
(4) Runtime refers to the state in which a program is running (being executed), that is, the period during which the program runs. A Runtime library is a library that a program depends on while it runs. In some programming languages, certain reusable programs or instances are packaged or rebuilt into a Runtime library; these instances may be linked or called by any program while it is running. The Runtime library in the embodiments of the present application provides the API of an artificial intelligence processing unit (e.g., a GPU or NPU), including, for example, an API for generating and issuing a control operator.
(5) Execution sequence. An AI model is generally split into a plurality of streams, and each stream corresponds to an execution sequence. Each execution sequence contains multiple tasks (also called operators; one task encapsulates one operator), and events are used for synchronization between streams. Tasks in different streams can be executed in parallel on the artificial intelligence processing unit, while tasks within one stream can only be executed serially. The operators or tasks in the embodiments of the present application may include calculation operators, which are used for data calculation, and control operators, which are used for controlling the execution order of an execution sequence. An operator or task is essentially code in the AI model; for example, convolution code in an AI model is an operator or task. That is to say, a calculation operator in the embodiments of the present application is code that implements or completes data calculation, and generally runs on an arithmetic logic unit in the artificial intelligence processing unit to complete a data calculation task. A control operator in the embodiments of the present application is code that controls the execution order of an execution sequence, and can be executed by a control unit in the artificial intelligence processing unit.
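The stream/task/event decomposition described above can be sketched as follows. This is a minimal illustration only; the class and field names are hypothetical and do not correspond to the actual runtime data structures.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    """One operator wrapped as a task: computes data or controls execution order."""
    name: str
    kind: str  # "compute" (runs on the arithmetic logic unit) or "control" (runs on the control unit)

@dataclass
class Stream:
    """One execution sequence: its tasks run serially; different streams may run in parallel."""
    tasks: List[Task] = field(default_factory=list)

@dataclass
class Event:
    """A synchronization point between two streams."""
    recorded_in: str   # stream that records the event
    waited_on_by: str  # stream that waits on it

# An AI model split into two streams, with one event for cross-stream synchronization.
model = {
    "streams": {
        "s0": Stream([Task("conv1", "compute"), Task("branch", "control")]),
        "s1": Stream([Task("conv2", "compute")]),
    },
    "events": [Event(recorded_in="s0", waited_on_by="s1")],
}

# Tasks within one stream execute serially, in list order.
serial_order_s0 = [t.name for t in model["streams"]["s0"].tasks]
print(serial_order_s0)  # ['conv1', 'branch']
```

Note that in this sketch the compute/control distinction is carried as a plain field; on a real accelerator it determines which unit (arithmetic logic unit or control unit) executes the task.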
The processing method, processing apparatus, and processing device of the artificial intelligence model provided in the embodiments of the present application may be applied to any application scenario that requires an AI accelerator. These include scenarios in which AI processing is performed on camera images using a Convolutional Neural Network (CNN) model or a Mask Region-based Convolutional Neural Network (Mask RCNN) model, for example automated driving, driver monitoring, and parking in an intelligent vehicle scenario. They may also include scenarios in which data is processed using a Recurrent Neural Network (RNN) model, such as voice interaction between the car and the driver or passengers in an intelligent vehicle scenario.
In order to facilitate understanding of the embodiments of the present application, the technical problems to be solved by the present application are further analyzed and presented. The following description will be made taking an automatic driving scenario of a vehicle as an example:
First, fig. 1 shows a functional block diagram of a vehicle 100 provided in the embodiment of the present application. In one embodiment, the vehicle 100 is configured in a fully or partially autonomous driving mode. For example, the vehicle 100 may control itself while in the autonomous driving mode: it may determine the current state of the vehicle and its surroundings, determine the possible behavior of at least one other vehicle in the surroundings, determine a confidence level corresponding to the likelihood that the other vehicle performs that behavior, and control the vehicle 100 based on the determined information. While in the autonomous driving mode, the vehicle 100 may be placed into operation without human interaction.
The vehicle 100 may be a car, a truck, a motorcycle, a bus, a boat, an airplane, a helicopter, a lawn mower, a recreational vehicle, an amusement park vehicle, construction equipment, a tram, a golf cart, a train, a trolley, etc.; the embodiment of the present application is not particularly limited in this regard.
The vehicle 100 may include various subsystems such as a travel system 102, a sensing system 104, a control system 106, one or more peripherals 108, as well as a power supply 110, a computer system 112, and a user interface 116. Alternatively, vehicle 100 may include more or fewer subsystems, and each subsystem may include multiple elements. In addition, each of the sub-systems and elements of the vehicle 100 may be interconnected by wire or wirelessly.
The travel system 102 may include components that provide powered motion to the vehicle 100. In one embodiment, the travel system 102 may include an engine 118, an energy source 119, a transmission 120, and wheels/tires 121. The engine 118 may be an internal combustion engine, an electric motor, an air compression engine, or other types of engine combinations, such as a hybrid engine of a gasoline engine and an electric motor, or a hybrid engine of an internal combustion engine and an air compression engine. The engine 118 converts the energy source 119 into mechanical energy.
Examples of energy sources 119 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas-based fuels, ethanol, solar panels, batteries, and other sources of electrical power. The energy source 119 may also provide energy to other systems of the vehicle 100.
The transmission 120 may transmit mechanical power from the engine 118 to the wheels 121. The transmission 120 may include a gearbox, a differential, and a drive shaft. In one embodiment, the transmission 120 may also include other devices, such as a clutch. Wherein the drive shaft may comprise one or more shafts that may be coupled to one or more wheels 121.
The sensing system 104 may include several sensors that sense information about the environment surrounding the vehicle 100. For example, the sensing system 104 may include a global positioning system 122 (which may be a GPS system, a beidou system, or other positioning system), an Inertial Measurement Unit (IMU) 124, a radar 126, a laser range finder 128, and a camera 130. The sensing system 104 may also include sensors of internal systems of the monitored vehicle 100 (e.g., an in-vehicle air quality monitor, a fuel gauge, an oil temperature gauge, etc.). Sensor data from one or more of these sensors may be used to detect the object and its corresponding characteristics (position, shape, orientation, velocity, etc.). Such detection and identification is a critical function of the safe operation of the autonomous vehicle 100.
The global positioning system 122 may be used to estimate the geographic location of the vehicle 100. The IMU 124 is used to sense position and orientation changes of the vehicle 100 based on inertial acceleration. In one embodiment, the IMU 124 may be a combination of an accelerometer and a gyroscope.
The radar 126 may utilize radio signals to sense objects within the surrounding environment of the vehicle 100. In some embodiments, in addition to sensing objects, radar 126 may also be used to sense the speed and/or heading of an object.
The laser rangefinder 128 may utilize laser light to sense objects in the environment in which the vehicle 100 is located. In some embodiments, the laser rangefinder 128 may include one or more laser sources, laser scanners, and one or more detectors, among other system components.
The camera 130 may be used to capture multiple images of the surrounding environment of the vehicle 100. The camera 130 may be a still camera or a video camera.
The control system 106 is for controlling the operation of the vehicle 100 and its components. Control system 106 may include various elements including a steering system 132, a throttle 134, a braking unit 136, a sensor fusion algorithm 138, a computer vision system 140, a route control system 142, and an obstacle avoidance system 144.
The steering system 132 is operable to adjust the heading of the vehicle 100. For example, in one embodiment, it may be a steering wheel system.
The throttle 134 is used to control the operating speed of the engine 118 and thus the speed of the vehicle 100.
The brake unit 136 is used to control the deceleration of the vehicle 100. The brake unit 136 may use friction to slow the wheel 121. In other embodiments, the brake unit 136 may convert the kinetic energy of the wheel 121 into an electric current. The brake unit 136 may take other forms to slow the rotational speed of the wheels 121 to control the speed of the vehicle 100.
The computer vision system 140 may be operable to process and analyze images captured by the camera 130 to identify objects and/or features in the environment surrounding the vehicle 100. The objects and/or features may include traffic signals, road boundaries, and obstacles. The computer vision system 140 may use object recognition algorithms, Structure from Motion (SFM) algorithms, video tracking, and other computer vision techniques. In some embodiments, the computer vision system 140 may be used to map an environment, track objects, estimate the speed of objects, and so forth.
The route control system 142 is used to determine a travel route of the vehicle 100. In some embodiments, the route control system 142 may combine data from the sensor fusion algorithm 138, the global positioning system 122, and one or more predetermined maps to determine a travel route for the vehicle 100.
Obstacle avoidance system 144 is used to identify, assess, and avoid or otherwise negotiate potential obstacles in the environment of vehicle 100.
Of course, in one example, the control system 106 may additionally or alternatively include components other than those shown and described, or some of the components shown above may be omitted.
Vehicle 100 interacts with external sensors, other vehicles, other computer systems, or users through peripherals 108. The peripheral devices 108 may include a wireless communication system 146, an in-vehicle computer 148, a microphone 150, and/or speakers 152.
In some embodiments, the peripheral devices 108 provide a means for a user of the vehicle 100 to interact with the user interface 116. For example, the onboard computer 148 may provide information to a user of the vehicle 100. The user interface 116 may also operate the in-vehicle computer 148 to receive user input. The in-vehicle computer 148 may be operated via a touch screen. In other cases, the peripheral devices 108 may provide a means for the vehicle 100 to communicate with other devices located within the vehicle. For example, the microphone 150 may receive audio (e.g., voice commands or other audio input) from a user of the vehicle 100. Similarly, the speaker 152 may output audio to a user of the vehicle 100.
The wireless communication system 146 may communicate wirelessly with one or more devices, either directly or via a communication network. For example, the wireless communication system 146 may use 3G cellular communication (such as CDMA, EVDO, or GSM/GPRS), 4G cellular communication (such as LTE), or 5G cellular communication. The wireless communication system 146 may communicate with a wireless local area network (WLAN) using WiFi. In some embodiments, the wireless communication system 146 may communicate directly with a device using an infrared link, Bluetooth, or ZigBee. Other wireless protocols may also be used, such as various vehicle communication systems; for example, the wireless communication system 146 may include one or more Dedicated Short Range Communications (DSRC) devices, which may carry public and/or private data communication between vehicles and/or roadside stations.
The power supply 110 may provide power to various components of the vehicle 100. In one embodiment, power source 110 may be a rechargeable lithium ion or lead acid battery. One or more battery packs of such batteries may be configured as a power source to provide power to various components of the vehicle 100. In some embodiments, the power source 110 and the energy source 119 may be implemented together, such as in some all-electric vehicles.
Some or all of the functionality of the vehicle 100 is controlled by the computer system 112. The computer system 112 may include at least one processor 113, the processor 113 executing instructions 115 stored in a non-transitory computer readable medium, such as a data storage device 114. The computer system 112 may also be a plurality of computing devices that control individual components or subsystems of the vehicle 100 in a distributed manner.
The processor 113 may be any conventional processor, such as a commercially available CPU. Alternatively, the processor may be a dedicated device such as an ASIC or other hardware-based processor. Although fig. 1 functionally illustrates the processor, memory, and other elements of the computer 110 in the same block, those of ordinary skill in the art will appreciate that the processor, computer, or memory may actually comprise multiple processors, computers, or memories that may or may not be stored within the same physical housing. For example, the memory may be a hard disk drive or other storage medium located in a housing different from that of the computer 110. Thus, a reference to a processor or computer is to be understood as including a reference to a collection of processors, computers, or memories that may or may not operate in parallel. Rather than using a single processor to perform the steps described herein, some components, such as the steering component and the deceleration component, may each have their own processor that performs only computations related to that component's specific function.
In various aspects described herein, the processor may be located remotely from the vehicle and in wireless communication with the vehicle. In other aspects, some of the processes described herein are executed on a processor disposed within the vehicle and others are executed by a remote processor, including taking the steps necessary to perform a single maneuver.
In some embodiments, the data storage device 114 may include instructions 115 (e.g., program logic), and the instructions 115 may be executed by the processor 113 to perform various functions of the vehicle 100, including those described above. The data storage device 114 may also contain additional instructions, including instructions to send data to, receive data from, interact with, and/or control one or more of the travel system 102, the sensing system 104, the control system 106, and the peripheral devices 108.
In addition to instructions 115, data storage device 114 may also store data such as road maps, route information, the location, direction, speed of the vehicle, and other such vehicle data, among other information. Such information may be used by the vehicle 100 and the computer system 112 during operation of the vehicle 100 in autonomous, semi-autonomous, and/or manual modes.
The user interface 116 is used to provide information to and receive information from a user of the vehicle 100. Optionally, the user interface 116 may include one or more input/output devices within the set of peripheral devices 108, such as the wireless communication system 146, the in-vehicle computer 148, the microphone 150, and the speaker 152.
The computer system 112 may control the functions of the vehicle 100 based on inputs received from various subsystems (e.g., the travel system 102, the sensing system 104, and the control system 106) and from the user interface 116. For example, the computer system 112 may utilize input from the control system 106 to control the steering unit 132 to avoid obstacles detected by the sensing system 104 and the obstacle avoidance system 144. In some embodiments, the computer system 112 is operable to provide control over many aspects of the vehicle 100 and its subsystems.
Alternatively, one or more of these components described above may be mounted or associated separately from the vehicle 100. For example, the data storage device 114 may exist partially or completely separate from the vehicle 100. The above components may be communicatively coupled together in a wired and/or wireless manner.
Optionally, the above components are only an example, in an actual application, components in the above modules may be added or deleted according to an actual need, and fig. 1 should not be construed as limiting the embodiment of the present invention.
An autonomous automobile traveling on a roadway, such as the vehicle 100 above, may identify objects within its surrounding environment to determine an adjustment to its current speed. The objects may be other vehicles, traffic control devices, or other types of objects. In some examples, each identified object may be considered independently, and the object's respective characteristics, such as its current speed, acceleration, and separation from the vehicle, may be used to determine the speed to which the autonomous vehicle is to be adjusted.
Alternatively, the autonomous vehicle 100 or a computing device associated with the autonomous vehicle 100 (e.g., the computer system 112, the computer vision system 140, or the data storage device 114 of fig. 1) may predict the behavior of an identified object based on the characteristics of the identified object and the state of the surrounding environment (e.g., traffic, rain, ice on the road, etc.). Optionally, the identified objects' behaviors may depend on one another, so the behavior of a single identified object may also be predicted by considering all identified objects together. The vehicle 100 is able to adjust its speed based on the predicted behavior of the identified object. In other words, the autonomous vehicle is able to determine what steady state the vehicle needs to adjust to (e.g., accelerate, decelerate, or stop) based on the predicted behavior of the object. In this process, other factors may also be considered to determine the speed of the vehicle 100, such as the lateral position of the vehicle 100 in the road on which it is traveling, the curvature of the road, and the proximity of static and dynamic objects.
In addition to providing instructions to adjust the speed of the autonomous vehicle, the computing device may also provide instructions to modify the steering angle of the vehicle 100 to cause the autonomous vehicle to follow a given trajectory and/or to maintain a safe lateral and longitudinal distance from objects in the vicinity of the autonomous vehicle (e.g., cars in adjacent lanes on the road).
Further, the data storage device 114 in the computer system 112 may include a system memory, and data running in the system memory may include an operating system and an application program APP of the computer. The processor 113 in the computer system 112 may be connected to the system memory through a system bus, and read and process data in the system memory.
The operating system includes a Shell and a kernel. Shell is an interface between the user and the kernel of the operating system. The shell is the outermost layer of the operating system. The shell manages the interaction between the user and the operating system, waiting for user input, interpreting the user input to the operating system, and processing the output results of the various operating systems.
The kernel is made up of those parts of the operating system that manage memory, files, peripherals, and system resources. Interacting directly with the hardware, the operating system kernel typically runs processes and provides inter-process communication, CPU time-slice management, interrupt handling, memory management, I/O management, and the like.
The application programs include programs related to controlling the automatic driving of the vehicle, such as programs for managing the interaction of the automatically driven vehicle with obstacles on the road, programs for controlling the route or speed of the automatically driven vehicle, and programs for controlling the interaction of the automatically driven vehicle with other automatically driven vehicles on the road. The application program also exists on the system of the software deploying server. In one embodiment, computer system 112 may download an application from a deploying server when needed to execute the application.
The processing method of the artificial intelligence model provided by the embodiments of the present application may be specifically applied to the computer system 112 of fig. 1, in a scenario where the processor 113 of the computer system 112 performs AI processing on images acquired by the camera using an AI model such as a CNN model or a Mask RCNN model, so as to predict the behavior of an identified object according to the characteristics of the identified object and the state of the surrounding environment (e.g., traffic, rain, ice on the road, and the like), and thereby determine what stable state the vehicle needs to adjust to (e.g., accelerate, decelerate, or stop). The processing apparatus and the processing device of the artificial intelligence model provided in the present application may specifically correspond to the computer system 112 in fig. 1.
The following describes the process by which the processor of the computer system 112 trains or performs inference on an AI model, with reference to the schematic structural diagram of the AI computing architecture provided in the embodiment of the present application shown in fig. 2. The AI computing architecture may correspond to the processor 113 in the computer system 112, and may specifically include a main processor (Host CPU) and an artificial intelligence processing unit, wherein:
the Host CPU may include an artificial intelligence processing unit driver (driver), a Runtime unit (also called the Runtime layer or user-mode driver layer), and a library (Library); that is, the Host CPU may read the above data from system memory or storage. The artificial intelligence processing unit driver provides the driving function of the artificial intelligence processing unit. Runtime provides the application programming interface (API) of the artificial intelligence processing unit, and is deployed in the application program APP. The Library provides operator library functions that can be executed directly on the arithmetic logic unit of the artificial intelligence processing unit, facilitating the development of APP service functions.
The artificial intelligence processing unit, which may also be referred to as an AI accelerator, may include an AI processor such as a GPU or an NPU, where the NPU may be a dedicated or customized neural network processor. The artificial intelligence processing unit may include a control unit (or controller) and an arithmetic logic unit.
In one implementation, the Runtime of the artificial intelligence processing unit may provide APIs such as model, stream, task, and event. An upper-layer service (such as an APP) splits the AI computation graph (i.e., the AI model) into streams, tasks, events, and the like that the artificial intelligence processing unit can process, and issues the AI model to the artificial intelligence processing unit by calling these APIs. The control unit of the artificial intelligence processing unit may be used to receive the AI model issued by the Host CPU, schedule the AI model for training or inference, and report the execution result to the Host CPU. The arithmetic logic unit may execute the tasks in the AI model issued by the control unit and return the result of each task to the control unit (one AI model may include a plurality of tasks).
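The call sequence in which an upper-layer service splits a computation graph and issues it through such APIs might look roughly like the sketch below. The `Runtime` class and all of its method names are invented for illustration; they stand in for whatever model/stream/task/event interface a real user-mode runtime exposes.

```python
class Runtime:
    """Hypothetical user-mode runtime exposing model/stream/task/event APIs."""
    def __init__(self):
        self.log = []       # records API calls, standing in for real driver work
        self.n_streams = 0
    def create_model(self):
        self.log.append("model")
        return "m0"
    def create_stream(self, model):
        self.log.append("stream")
        sid = "s%d" % self.n_streams
        self.n_streams += 1
        return sid
    def launch_task(self, stream, op):
        self.log.append(("task", stream, op))
    def record_event(self, stream):
        self.log.append(("event", stream))
        return "e0"
    def wait_event(self, stream, event):
        self.log.append(("wait", stream, event))
    def execute(self, model):
        self.log.append("execute")

rt = Runtime()
m = rt.create_model()
s0 = rt.create_stream(m)
s1 = rt.create_stream(m)
rt.launch_task(s0, "conv")   # compute operator on stream s0
e = rt.record_event(s0)      # event for cross-stream synchronization
rt.wait_event(s1, e)         # stream s1 waits until s0 reaches the event
rt.launch_task(s1, "relu")   # compute operator on stream s1
rt.execute(m)                # issue the whole model to the accelerator
print(rt.log[-1])  # execute
```

The point of the sketch is the division of labor: the APP only builds and issues the structure through APIs, while scheduling of streams and tasks happens on the accelerator's control unit.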
To improve the execution efficiency of the model, the APP loads the AI model to the artificial intelligence processing unit, and the artificial intelligence processing unit stores the AI model. The model only needs to be loaded once and can then be executed multiple times; when the APP exits or the service ends, the artificial intelligence processing unit is notified to unload the previously loaded model. On the artificial intelligence processing unit side, the loaded AI model is stored in a similar structure of streams, tasks, and events.
When the AI accelerator does not support branch judgment operators and loop operators, part of the computation must be performed on the Host CPU during model training or model execution, which reduces the inference performance of the model.
That is to say, for a branch judgment operator or a loop operator in the AI model, after the arithmetic logic unit finishes executing an operator, it cannot control the execution of the following operators; it can only return data to the Host CPU side through the control unit, and the Host CPU executes the branch judgment operator or the loop operator to trigger the control unit to schedule operators or tasks for the arithmetic logic unit again. This approach therefore requires multiple interactions with the Host CPU, and part of the computation must fall back to the Host CPU, just to complete the branch judgment operators and loop operators of the AI model, resulting in poor inference or training performance of the model.
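The cost of this fallback can be illustrated schematically: when the loop condition must be evaluated on the Host CPU, every iteration costs one round trip between the accelerator and the host, whereas a device-side control operator costs none. The functions and counters below are purely illustrative, not the actual scheduling logic.

```python
def run_loop_without_device_control(iterations):
    """Loop condition evaluated on the Host CPU: one round trip per iteration."""
    host_round_trips = 0
    i = 0
    while True:
        # the accelerator executes the loop body (compute operators) ...
        i += 1
        # ... then must return data to the Host CPU to evaluate the loop condition
        host_round_trips += 1
        if i >= iterations:
            break
    return host_round_trips

def run_loop_with_device_control(iterations):
    """Loop condition evaluated by a control operator on the accelerator itself."""
    host_round_trips = 0  # the control unit handles the branch; no host interaction
    for _ in range(iterations):
        pass  # accelerator executes both the body and the condition locally
    return host_round_trips

print(run_loop_without_device_control(10))  # 10
print(run_loop_with_device_control(10))     # 0
```

For a loop with many iterations, the per-iteration round trip dominates, which is exactly the performance loss the embodiments aim to remove.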
The following describes how the AI computing architecture provided by the present application improves the performance of model inference or model training, with reference to a schematic structural diagram of a processing device of an artificial intelligence model provided by the embodiment of the present application shown in fig. 3. The processing device 30 of the artificial intelligence model comprises a processor 300 and an artificial intelligence processing unit 301, wherein:
the processor 300 is used for creating an artificial intelligence (AI) model, the AI model comprising a control operator and a calculation operator, and then issuing the AI model to the artificial intelligence processing unit 301 based on a user-mode interface (API). The API comprises a first API, and the first API is used for issuing the control operator. The processor 300 may correspond to the main processor.
Specifically, the APP carries the program code of an AI model, which can be used for AI processing of input data. The processor 300 reads the data of the APP and runs the program code of the AI model therein, i.e., creates the AI model.
The AI model in the embodiments of the present application includes a control operator, which may be code or a function that implements control logic. In the embodiments of the present application, an API for issuing the control operator may be provided at the Runtime layer; the processor 300 may issue the AI model to the artificial intelligence processing unit 301 by calling this Runtime-layer API.
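A Runtime-layer entry point for issuing a control operator might be sketched as below. The function name `launch_control_op`, its parameters, and the operator type codes are assumptions made for illustration; they are not the actual interface of any particular runtime.

```python
# Hypothetical control-operator type codes: unconditional jump and conditional switch.
OP_GOTO, OP_COND_SWITCH = 0, 1

def launch_control_op(stream, op_type, target_label, cond_tensor=None):
    """Hypothetical first API: package a control operator as a task on a stream.

    The task is appended to the stream's serial task list; the accelerator's
    control unit (not the arithmetic logic unit) would execute it, steering
    which operators in the execution sequence run next.
    """
    task = {"kind": "control", "op": op_type,
            "target": target_label, "cond": cond_tensor}
    stream.append(task)
    return task

stream = []  # a stream's serial task list
launch_control_op(stream, OP_COND_SWITCH,
                  target_label="else_branch", cond_tensor="pred0")
print(stream[0]["kind"])  # control
```

The essential point is that the control operator travels down to the accelerator as an ordinary task in a stream, so the branch decision no longer requires a trip back to the main processor.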
The artificial intelligence processing unit 301 is configured to acquire the artificial intelligence (AI) model and to execute the control operator and the calculation operator in the process of training or performing inference on the AI model.
It will be appreciated that the processor 300 of the embodiments of the present application may include its own controller, arithmetic unit, etc. for interpreting computer instructions and processing data in computer software. The processor 300 is a central hardware unit of the processing device 30 of the artificial intelligence model, and is mainly responsible for computation and overall coordination, including controlling and allocating all hardware resources (such as a memory, an input/output unit, and the artificial intelligence processing unit 301 of the embodiment of the present application) of the processing device 30 of the artificial intelligence model and executing general operations.
The artificial intelligence processing unit 301 of the embodiments of the present application may actually be a processor or a processing chip, such as a dedicated or customized GPU or NPU, and may be mounted on the processor 300 as a coprocessor, or coupled to the processor 300, and assigned with tasks by the processor 300.
Taking an NPU as an example, the core component of the artificial intelligence processing unit 301 is the arithmetic logic unit, and the control unit controls the arithmetic logic unit to extract matrix data and perform arithmetic. The arithmetic logic unit may include a plurality of processing elements (PEs). In some implementations, the arithmetic logic unit is a two-dimensional systolic array; it may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic logic unit is a general-purpose matrix processor. The artificial intelligence processing unit 301 may further include a unified memory, a Direct Memory Access Controller (DMAC), a weight memory, a bus interface unit, a vector calculation unit, an instruction fetch memory (Instruction Fetch Buffer), and the like. Wherein:
the unified memory may be used to store input data as well as output data. The weight data may be carried directly through the DMAC into the weight memory. Input data may also be carried through the DMAC to the unified memory.
The bus interface unit may be used for interaction between the AXI bus and both the DMAC and the instruction fetch memory. Specifically, it enables the instruction fetch memory to fetch instructions from the external memory, and enables the DMAC to fetch the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory to the unified memory or transfer weight data into the weight memory or transfer input data into the input memory.
The vector calculation unit may comprise a plurality of arithmetic processing units and, where necessary, further processes the output of the arithmetic logic unit, performing vector multiplication, vector addition, exponential operations, logarithmic operations, magnitude comparisons, and the like. It is mainly used for non-convolution/FC layer network calculation in the neural network, such as pooling (Pooling), batch normalization (Batch Normalization), local response normalization (Local Response Normalization), and the like.
In some implementations, the vector calculation unit can store the vector of processed outputs to a unified buffer. For example, the vector calculation unit may apply a non-linear function to the output of the arithmetic logic unit, such as a vector of accumulated values, to generate the activation value. In some implementations, the vector calculation unit generates normalized values, combined values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to an arithmetic logic unit, e.g., for use in subsequent layers in a neural network.
The instruction fetch memory to which the control unit is coupled may be used to store instructions used or executed by the control unit.
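To make the division of labor among the components described above concrete, the following Python sketch models the datapath as software: the DMAC carries data into the on-chip memories, the arithmetic logic unit performs the matrix multiply, and the vector calculation unit post-processes the result. All function names and the memory model are illustrative assumptions, not the actual hardware interfaces.

```python
# Hypothetical software model of the NPU datapath described above.
# The DMAC moves data into the weight memory / unified memory, the
# arithmetic logic unit performs the matrix multiply, and the vector
# calculation unit applies a non-linear function (here: ReLU).

def dmac_copy(external, region):
    """Model the DMAC carrying data from external memory into on-chip memory."""
    region.clear()
    region.extend(external)

def matmul(a, b):
    """Arithmetic logic unit: multiply matrix A (m x k) by matrix B (k x n)."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(n)]
            for i in range(m)]

def relu(matrix):
    """Vector calculation unit: post-process the arithmetic logic unit's output."""
    return [[max(0, x) for x in row] for row in matrix]

# External memory holds input matrix A and weight matrix B.
unified_memory, weight_memory = [], []
dmac_copy([[1, -2], [3, 4]], unified_memory)   # input data -> unified memory
dmac_copy([[1, 0], [0, 1]], weight_memory)     # weights -> weight memory

out = relu(matmul(unified_memory, weight_memory))
print(out)  # [[1, 0], [3, 4]]
```

With an identity weight matrix the multiply returns the input unchanged, so only the ReLU step alters the data, which makes the role of each stage easy to see.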
According to the embodiment of the application, an API for issuing the control operator is provided, and an AI model including control operators can be issued to the artificial intelligence processing unit based on this API. The artificial intelligence processing unit can then execute the entire AI model independently during training or inference, without returning part of the control functions to the main processor for processing. This solves the technical problem in the prior art that the main processor and the AI accelerator must interact frequently to complete the training or inference of an AI model, which limits performance; the performance of model inference or model training is thereby improved.
In a possible implementation manner, the control operator of the embodiment of the present application may include a branch judgment operator, where the branch judgment operator is used to judge whether to execute the first branch operator or the second branch operator.
In a possible implementation, the control operator further includes a loop operator, and the loop operator is configured to loop through the first calculation operator of the AI model; the API also includes a second API and a third API, the second API for creating a tag; the third API is used to set the location of the tag in the AI model.
That is, the API for issuing the control operator (i.e., the first API) may include an API for issuing a branch judgment operator and an API for issuing a loop operator. Specifically, the method comprises the following steps:
the processor 300 can create or add 4 APIs for issuing AI models at the Runtime layer:
1. Second API, create label (CreateLabel): creates a label used for positioning when jumping;
2. Third API, label set (LabelSet) (label, stream): places the label at the current position of the stream; that is, for the task at which the label is placed, it sets the label's position in the execution sequence or data stream;
3. Branch judgment (Switch) (value, condition, false_label, stream): compares the data (or value) in the condition register with value according to the given condition; if the result is false, execution jumps to the task after false_label. The condition supports unsigned integer comparisons such as ">", "<", "==", "!=", "<=", and ">=";
4. Loop (Goto) (label, stream): unconditionally jumps to the task after label.
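The four Runtime-layer APIs above can be sketched as calls that append tasks to a stream's task list. The class and function signatures below are illustrative assumptions for exposition, not the actual Runtime interface.

```python
# Hypothetical sketch of the four Runtime-layer APIs (CreateLabel,
# LabelSet, Switch, Goto), modeled as calls that record tasks into a
# stream's task list for later execution by the control unit.

class Stream:
    def __init__(self):
        self.tasks = []

def create_label(name):
    # CreateLabel: a label has no position until LabelSet places it
    return {"name": name, "pos": None}

def label_set(label, stream):
    # LabelSet: place the label at the current position of the stream
    label["pos"] = len(stream.tasks)
    stream.tasks.append(("LABEL", label["name"]))

def switch(value, condition, false_label, stream):
    # Switch: compare the condition register with value; jump to
    # false_label when the comparison result is false
    stream.tasks.append(("SWITCH", value, condition, false_label["name"]))

def goto(label, stream):
    # Goto: unconditional jump to the task after the label
    stream.tasks.append(("GOTO", label["name"]))

s = Stream()
l1 = create_label("label1")
label_set(l1, s)
switch(10, "<", l1, s)
goto(l1, s)
print(s.tasks)  # [('LABEL', 'label1'), ('SWITCH', 10, '<', 'label1'), ('GOTO', 'label1')]
```

Note the design choice this mirrors: issuing a control operator is just recording another task in the stream, so the artificial intelligence processing unit receives control flow and computation through the same task queue.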
Then, in the process of developing the application APP, a developer may develop based on the APIs provided by the Runtime layer, split the AI computation graph according to the development requirements, and convert it into streams, tasks, events, and the like that can be processed by the artificial intelligence processing unit 301. The control operators (such as the branch judgment operator and the loop operator) are then issued to the artificial intelligence processing unit 301 for execution by calling the corresponding APIs.
In the embodiment of the application, an API for issuing the loop operator may further be provided, so that the AI model issued to the artificial intelligence processing unit enables it to independently complete the execution of the loop operator during training or inference. Moreover, through the API for creating a label and the API for setting the label's position in the AI model, the artificial intelligence processing unit can perform the related jumps while executing the loop operator, so that the loop operator can be completed quickly and efficiently.
Further, the following describes, with reference to fig. 4, a schematic structural diagram of a processing device of an artificial intelligence model according to another embodiment provided by the present application, a specific structure of an artificial intelligence processing unit provided by the present application, and how to improve performance of model inference or model training.
The processing device 40 of the artificial intelligence model of fig. 4 may include a control unit (or controller) 400 and an arithmetic logic unit 402. As described above, the processing device 40 may also include other units or modules, such as a unified memory, a DMAC, a weight memory, and a bus interface unit, which are not shown in fig. 4. As shown in fig. 4, the processing device 40 may include a plurality of arithmetic logic units 402. The control unit 400 may include a plurality of first storage units (which may be referred to as first registers or condition registers (COND)); during training or inference of the AI model, each first storage unit corresponds to one execution sequence of the AI model, and different parallel execution sequences correspond to different first storage units. The arithmetic logic unit 402 includes a second storage unit (which may be referred to as a second register or a special purpose condition register (COND_SPR)). The control unit 400 is used to schedule the execution of the AI model, that is, to dispatch an operator or task of the model either to the arithmetic logic unit 402 for execution (e.g., a calculation-type operator) or to the control unit 400 itself (e.g., an event-type task). The arithmetic logic unit executes a single calculation-type operator. Specifically, the method comprises the following steps:
the control unit 400 is configured to obtain or read an AI model, where the AI model includes a control operator and a calculation operator; the AI model is issued by the main processor based on the API; the API comprises a first API, and the first API is used for issuing the control operator;
the second storage unit is used for storing data after the arithmetic logic unit 402 executes the calculation operator.
The control unit 400 is also used to read the data of the second memory cell. After the arithmetic logic unit 402 executes the task, the executed data is written into its second storage unit, and the control unit 400 may be notified that the execution is completed to trigger the control unit 400 to read the data in the second storage unit. After reading the data in the second storage unit, the control unit 400 writes the data in the first storage unit corresponding to the execution sequence. The control unit 400 may then execute the next task of the execution sequence based on the data stored in the first storage unit and the parameters in the control operator;
the control operator of the embodiment of the present application is a control task that the control unit 400 needs to execute according to the data read from the arithmetic logic unit 402. In one possible implementation, the control operator may include a branch judgment operator for judging whether to execute the first branch operator or the second branch operator;
the control unit 400 may specifically determine whether to execute the first branch operator based on the data and the parameter in the branch judgment operator when executing the next task of the execution sequence based on the data and the parameter in the control operator;
if it is judged that the first branch operator is to be executed, the first branch operator is executed; if not, the second branch operator is executed.
Further, the AI model of the embodiment of the present application may further include a loop operator, where the loop operator is configured to loop through the first calculation operator of the AI model. After judging that the first branch operator is to be executed, the control unit 400 may execute the loop operator to schedule the arithmetic logic unit 402 to loop through the first calculation operator of the execution sequence, until the control unit 400 judges, based on the data and the parameters in the branch judgment operator, not to execute the first branch operator.
It should be noted that, when executing the branch judgment operator, the control unit 400 typically judges True or False according to specific judgment logic or judgment conditions based on the data and the parameters in the branch judgment operator. For example, the first branch operator is executed when the judgment is True, and the second branch operator is executed when the judgment is False; alternatively, the first branch operator may be executed when the judgment is False and the second branch operator when the judgment is True. The embodiment of the present application expresses this judgment of True or False as judging whether to execute the first branch operator. Since the loop operator is executed after it is judged that the first branch operator is to be executed, the embodiment of the present application may execute the loop operator either when the branch judgment operator evaluates to True through the specific judgment logic or conditions, or when it evaluates to False.
For example, in the execution sequence of the AI model, before the control unit 400 determines whether to execute the first branch operator based on the data and the parameters in the branch judgment operator, the arithmetic logic unit 402 is invoked by the control unit 400 to execute a certain calculation operator (e.g., the first calculation operator in the present application). When the control unit 400 then executes the loop operator, it may trigger the control unit 400 to call the arithmetic logic unit 402 to execute that calculation operator again, thereby jumping back to the first calculation operator for loop execution, until the control unit 400 judges, based on the data and the parameters in the branch judgment operator, not to execute the first branch operator.
In a possible implementation manner, the foregoing execution of the branch judgment operator and the loop operator may specifically be implemented as follows:
the AI model of the embodiment of the application may further include a first tag and a second tag for jumping, together with their respective positions in the execution sequence; the first tag is placed immediately before the first calculation operator of the execution sequence, and the second tag is placed immediately before the second branch operator;
assume that, in the execution sequence of the AI model, before executing the next task of the execution sequence based on the data and the parameters in the control operator, the control unit 400 has already scheduled the first calculation operator as a task to the arithmetic logic unit for execution;
after judging that the first branch operator is executed, the control unit 400 may execute the loop operator, jump to the position of the first tag, and return the next task to the first calculation operator of the execution sequence, that is, the control unit 400 schedules the operation logic unit 402 to execute the first calculation operator again for iteration;
when the control unit 400 judges not to execute the first branch operator, it jumps to the position of the second tag, so that the next task is the second branch operator, and the second branch operator is executed.
The control unit 400 is further configured to output execution completion instruction information when the execution sequence is completed.
According to the embodiment of the application, a plurality of first storage units are arranged in the control unit and a second storage unit is arranged in the arithmetic logic unit. The arithmetic logic unit writes data into the second storage unit after executing a task, and the control unit can then read the data of the second storage unit and write it into the first storage unit corresponding to the execution sequence. That is to say, the control unit can read the data produced by the current operator and control the execution of subsequent operators according to the current operator's result. The whole AI model is executed inside the control unit and the arithmetic logic unit, and no part of the control functions needs to be returned to the main processor for processing. This solves the technical problem in the prior art that the main processor and the AI accelerator must interact frequently to complete the training or inference of an AI model, which limits performance; the performance of model inference or model training is thereby improved.
The structure of the main processor and the artificial intelligence processing unit is further described with reference to fig. 5. Fig. 5 shows a schematic diagram of a processing device 50 of an artificial intelligence model according to another embodiment of the present disclosure, which includes an artificial intelligence processing unit that may include a control unit 500 and an arithmetic logic unit 502. In one implementation, the processing device 50 of the artificial intelligence model may also include a main processor 504. The control unit 500 is coupled to the main processor 504.
A certain number of first storage units may be customized or provided for the control unit 500 as desired. The first storage units may allow reading and writing only by the control unit 500. In a possible implementation manner, in the process of training or reasoning the artificial intelligence AI model, one first storage unit corresponds to one execution sequence of the AI model, and different parallel execution sequences correspond to different first storage units.
As in fig. 5, 3 AI models are listed as examples: AI model1 may correspond to 2 execution sequences, the 2 execution sequences corresponding to first storage unit 0 and first storage unit 1, respectively. That is, the execution sequence 0 (including task 01, task 02, and the like) corresponds to the first storage unit 0; the execution sequence 1 (including the task 11, the task 12, and the like) corresponds to the first storage unit 1. The number of tasks corresponding to the execution sequence 0 and the number of tasks corresponding to the execution sequence 1 can be set by the AI model1 according to the requirement.
Similarly, the AI model 2 may correspond to 1 execution sequence, and the 1 execution sequence corresponds to the first storage unit 2. The AI model 3 may correspond to 3 execution sequences, which correspond to the first storage unit 3, the first storage unit 4, and the first storage unit 5, respectively.
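The fig. 5 allocation of execution sequences to first storage units can be sketched as a simple mapping built in issue order. The allocation function and naming below are illustrative assumptions; the actual hardware may assign registers differently.

```python
# Illustrative mapping from parallel execution sequences (streams) to
# first storage units (condition registers) inside the control unit,
# mirroring the fig. 5 example: AI model 1 -> 2 streams, AI model 2 ->
# 1 stream, AI model 3 -> 3 streams, registers allocated in order.

def allocate_registers(models):
    """models: {model_name: number_of_streams} -> {(model, stream): register}"""
    mapping, next_reg = {}, 0
    for name, n_streams in models.items():
        for s in range(n_streams):
            mapping[(name, s)] = next_reg   # one register per parallel stream
            next_reg += 1
    return mapping

regs = allocate_registers({"model1": 2, "model2": 1, "model3": 3})
print(regs[("model1", 0)], regs[("model2", 0)], regs[("model3", 2)])  # 0 2 5
```

The key property the mapping preserves is that no two concurrently running execution sequences ever share a first storage unit, so one sequence's condition data can never overwrite another's.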
Each arithmetic logic unit 502 may include a respective second memory location. The arithmetic logic unit 502 may write data or arithmetic results after executing the task into its own second storage unit; the control unit 500 may access the second memory cell and read data in the second memory cell.
It is understood that the embodiments of the present application are not limited to 3 AI models, nor 6 execution sequences in fig. 5. The processing apparatus of the embodiment of the application may set the number of the first storage units in the control unit according to the actual scene requirement, for example, 50 first storage units are provided, and then the control unit may concurrently process 50 execution sequences, where the 50 execution sequences may be execution sequences of one or more AI models.
Assuming that all three applications (application 1, application 2, and application 3) are currently running, 10 AI models need to be loaded for training or reasoning. The 10 AI models have 100 execution sequences in total, so the first 50 execution sequences may be issued to the control unit for scheduling execution, and after the execution is completed and a result is returned, the subsequent execution sequences may be issued and executed in sequence until all execution sequences are completed.
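The batching described above, issuing 100 execution sequences against 50 first storage units, amounts to splitting the sequence list into register-sized batches. This is a minimal sketch under that assumption; real schedulers could refill registers individually as sequences complete rather than in whole batches.

```python
# Sketch of the batching described above: with 50 first storage units,
# at most 50 execution sequences run concurrently; remaining sequences
# are issued after earlier ones complete and return results.

def schedule(sequences, num_registers=50):
    batches = []
    for start in range(0, len(sequences), num_registers):
        batches.append(sequences[start:start + num_registers])
    return batches

seqs = [f"seq{i}" for i in range(100)]   # 10 AI models, 100 sequences total
batches = schedule(seqs)
print(len(batches), len(batches[0]), len(batches[1]))  # 2 50 50
```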
In one implementation, the second storage unit in the arithmetic logic unit 502 stores only the data from the most recently executed calculation operator; after a new calculation operator is subsequently executed, the latest data is refreshed into the second storage unit. Similarly, the control unit may refresh its first storage unit with newly read data; that is, the first storage unit also stores only the most recently written data. When new data is written, the control unit deletes the old data and stores only the newly written data, thereby refreshing the data stored therein.
The processing device 50 of the artificial intelligence model in the embodiment of the present application may include, but is not limited to, a mobile phone (mobile phone), a tablet computer, a notebook computer, a palm computer, a Mobile Internet Device (MID), a wearable device, a Virtual Reality (VR) device, an Augmented Reality (AR) device, a wireless terminal in industrial control (industrial control), a wireless terminal in self driving (self driving), a wireless terminal in remote surgery (remote medical), a wireless terminal in smart grid (smart grid), a wireless terminal in transportation safety (transportation safety), a wireless terminal in smart city (smart city), a wireless terminal in smart home (smart home), a smart robot, a vehicle-mounted system, or a vehicle including a cabin controller, and the like.
It is understood that the processing device 50 of the artificial intelligence model may also include at least one of other processors, memory modules, communication modules, a display screen, a battery, a battery management module, multiple sensors with different functions, and the like; these are simply not shown in fig. 5.
The following description is made by way of example with reference to a schematic diagram of a loop statement of an AI model provided in the embodiment of the present application and shown in fig. 6. Assume that the execution logic of the loop statement for model1 (i.e., AI model 1) is:
i is initially 0; while i < 10, the operation i = i + 1 is performed in a loop. In fig. 6, the entry of the computation graph is the input (Enter) operator, which passes through the input tensor 0. The merge (Merge) operator starts execution after receiving 0, transmits 0 to the Switch operator, and at the same time outputs 0 for comparison with 10; the comparison result (True) serves as the control input (p) of Switch. The Switch operator forwards the input 0 to the True branch, and the add (Add) operator starts the 0+1 computation after receiving the input 0, outputting the result 1 to the loop/iteration (NextIteration) operator, which transmits the input 1 back to Merge. The loop executes until the output of Merge is >= 10, at which point the Switch operator forwards its input to the False branch and the loop exits through the exit (Exit) operator.
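The dataflow semantics just described can be modeled in a few lines. The following sketch collapses the Enter/Merge/Switch/Add/NextIteration/Exit operators into plain control flow; it is an illustration of the loop's semantics, not of how the operators are actually scheduled on hardware.

```python
# Software model of the fig. 6 loop: Enter feeds the initial tensor
# into Merge, Switch compares the merged value against the limit, the
# True branch runs Add (+1) and NextIteration feeds the sum back to
# Merge, and the False branch exits with the final value.

def run_loop(enter_value=0, limit=10):
    i = enter_value                  # Enter operator passes in tensor 0
    while True:
        merged = i                   # Merge receives Enter or NextIteration
        if merged < limit:           # Switch control input p = (merged < limit)
            i = merged + 1           # Add on the True branch, then NextIteration
        else:
            return merged            # Exit on the False branch

print(run_loop())  # 10
```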
Then, in combination with the above 4 APIs created or added at Runtime layer, that is, the API including the first API (the API that generates the branch judgment operator and the API that generates the loop operator), the second API and the third API, the encoding may be as follows:
Create model1
Create stream1
Create label1
Create label2
Model1.add(stream 1)
Launch(stream1,Enter,…)
LabelSet(label1,stream1)
Launch(stream1,Merge,…)
Switch(10,“<”,label2,stream1)
Launch(stream1,Add,…)
Goto(label1,stream1)
LabelSet(label2,stream1)
Launch(stream1,Exit,…)
that is to say, the first tag label1 and the second tag label2 for jumping are created and placed at positions in stream1. The position of label1 (i.e., the position of the executed task at which label1 is placed) is adjacent to and before the Merge operator. The position of label2 (i.e., the position of the executed task at which label2 is placed) is adjacent to and before the Exit operator. Referring to the schematic diagram of the execution sequence of stream1 in the AI model provided in the embodiment of the present application shown in fig. 7, the execution sequence of stream1 may include 8 tasks; for example, as shown in the figure, Enter corresponds to task 1, label1 corresponds to task 2, Merge corresponds to task 3, Switch corresponds to task 4, Add corresponds to task 5, Goto corresponds to task 6, label2 corresponds to task 7, and Exit corresponds to task 8. The execution sequence of stream1 corresponds to the first storage unit in fig. 7.
After the application program (for example, APP1) sends the code to the control unit 500 through the Runtime layer and the NPU driver, that is, after the model1 that needs training or inference is sent to the control unit 500, the control unit 500 is triggered to dispatch the calculation operators of the execution sequence of the AI model to the arithmetic logic unit 502 for execution. When executing the loop (Goto) operator, the control unit 500 jumps, based on label1, to the position of label1 in the execution sequence of the AI model (i.e., the position of the executed task at which label1 is placed), thereby triggering the arithmetic logic unit 502 to iteratively execute the tasks of the execution sequence. The control unit 500 executes the branch judgment operator (Switch); if the judgment is negative, it jumps, based on label2, to the position of label2 in the execution sequence (i.e., the position of the executed task at which label2 is placed), and then schedules the exit task (i.e., the Exit operator) to the arithmetic logic unit 502. Upon receiving a notification that the arithmetic logic unit 502 has completed executing the task, the control unit 500 outputs instruction information indicating that execution has been completed.
Specifically, with reference to the schematic diagram of the control unit execution flow provided in the embodiment of the present application shown in fig. 8 and the schematic diagram of the artificial intelligence processing unit execution principle shown in fig. 9, an execution flow of the control unit is described:
step S800: the control unit dispatches the Enter task to the arithmetic logic unit for execution;
step S802: the arithmetic logic unit executes the Enter task;
step S804: after the arithmetic logic unit finishes executing, the control unit is informed of the completion of executing;
specifically, after the arithmetic logic unit is executed, the result or data after the task is executed may be stored in the second storage unit.
Step S806: after receiving the notification, the control unit can directly skip the task or operator (Label1 task) of the first Label and execute the following task;
specifically, after learning that the operator task has been executed by the arithmetic logic unit, the control unit may read the data in the second storage unit and store (or write) the data into the first storage unit COND corresponding to the execution sequence. The next task is the Label1 task, which the control unit skips directly; that is, step S808 is executed.
Step S808: the control unit dispatches the Merge task to the arithmetic logic unit for execution; in the embodiment of the present application, the Merge task may be equivalent to the first calculation operator.
Step S810: the arithmetic logic unit executes the Merge task and writes i into a second storage unit COND _ SPR of the arithmetic logic unit;
specifically, when the arithmetic logic unit executes the Merge task for the first time, the value of i is directly passed through to the Merge task by the Enter task. When the Merge task is executed in the subsequent iteration, the value of i is transferred to the Merge task by the loop operator.
Step S812: after the operation logic unit finishes executing, the operation logic unit informs the control unit of finishing executing;
step S814: after receiving the notification, the control unit reads the value (i.e., i) of the COND _ SPR and writes the read data into a first storage unit COND corresponding to stream 1;
specifically, after learning that the arithmetic logic unit has completed executing the Merge task, the control unit may read the data (i.e., i) in the second storage unit, and write the data in the first storage unit COND corresponding to the execution sequence. It can be understood that if the first memory cell COND has stored data, the current writing is equivalent to refreshing the data in the first memory cell COND.
Step S816: the control unit executes the Switch task and performs judgment processing according to the read data and the parameters in the Switch task (for example, compares i with value to judge whether i is smaller than value); if so, it continues to execute the following task (i.e., executes step S818); if not, it jumps to label2 (i.e., jumps to step S824);
specifically, for example, if the value is 10, if it is determined that the current i is less than 10, step S818 is executed; if not less than 10, go to step S824.
Step S818: the control unit dispatches the add task (Add task, corresponding to the next task, such as the first branch operator) to the arithmetic logic unit for execution; that is, in this flow, the loop operator in the subsequent step S822 is executed only when the judgment in step S816 is True. In fact, as described above in the embodiment of fig. 4, the embodiments of the present application are not limited to this.
Step S820: the arithmetic logic unit informs the control unit of the completion of execution;
for example, the arithmetic logic unit may execute Add task (first branch operator), for example, Add 1 to the i value, store the current i value in its own second storage unit, and notify the control unit after the execution is completed.
Step S822: the control unit executes the loop operator (Goto task) and then unconditionally jumps to label1 to start executing (i.e., jumps to step S806, which then triggers execution of the first calculation operator, the Merge task, again);
specifically, after learning that the Goto task has been executed by the arithmetic logic unit, the control unit may read the data (i.e., the current i value) in the second storage unit and write the data into the first storage unit COND corresponding to the execution sequence. The data in the current first storage unit COND (i.e., the current i value) is then passed to the Merge task when the Goto task is executed. The control unit of the embodiment of the application supports reading data from the second storage unit of the arithmetic logic unit, so that execution of the branch judgment operator can be completed inside the control unit without repeated interaction with the Host CPU.
Step S824: for the Label2 task (which may be equivalent to another next task, such as the second branch operator), the control unit can skip the Label2 task directly and schedule the Exit task to be executed by the arithmetic logic unit;
step S826: the arithmetic logic unit informs the control unit of the completion of execution;
step S828: and after receiving the notification, the control unit judges that the execution sequence is completed and then outputs the instruction information of the completed execution.
Specifically, the execution completion instruction information may be output to the NPU driver.
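The whole S800-S828 flow can be condensed into a small interpreter over the 8-task sequence of stream1: the arithmetic logic unit writes its result into COND_SPR, the control unit copies it into the stream's COND register, and Switch/Goto are evaluated by the control unit itself. The task encoding and register model below are illustrative assumptions, and the Switch condition is hard-coded as "<" for this example.

```python
# End-to-end sketch of the control-unit flow in steps S800-S828 for
# the stream1 task list of fig. 7. Labels are skipped directly,
# calculation operators are "dispatched" (executed inline here), and
# the control unit evaluates Switch/Goto against the COND register.

def run_stream(tasks):
    cond = None          # first storage unit (COND) for this stream
    i = 0                # loop variable carried by the Merge task
    pc = 0
    labels = {t[1]: idx for idx, t in enumerate(tasks) if t[0] == "LABEL"}
    while pc < len(tasks):
        op = tasks[pc]
        if op[0] == "LABEL":
            pc += 1                      # labels are skipped (S806, S824)
        elif op[0] == "LAUNCH":          # dispatch a task to the ALU
            if op[1] == "Merge":
                cond_spr = i             # ALU writes i into COND_SPR (S810)
                cond = cond_spr          # control unit: COND_SPR -> COND (S814)
            elif op[1] == "Add":
                i += 1                   # Add task on the True branch (S818)
            pc += 1                      # Enter / Exit: no register update here
        elif op[0] == "SWITCH":          # ("SWITCH", value, condition, false_label)
            _, value, _, false_label = op
            pc = pc + 1 if cond < value else labels[false_label]  # S816
        elif op[0] == "GOTO":
            pc = labels[op[1]]           # unconditional jump to label1 (S822)
    return i

stream1 = [
    ("LAUNCH", "Enter"), ("LABEL", "label1"), ("LAUNCH", "Merge"),
    ("SWITCH", 10, "<", "label2"), ("LAUNCH", "Add"), ("GOTO", "label1"),
    ("LABEL", "label2"), ("LAUNCH", "Exit"),
]
print(run_stream(stream1))  # 10
```

Running the interpreter shows why no round trip to the main processor occurs: every branch decision is resolved from the COND register that the control unit itself maintains.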
In a possible implementation manner, the control unit 500 of the embodiment of the present application may be further configured to set the second storage unit of the arithmetic logic unit 502 to an invalid value before reading the data of the second storage unit. The control unit 500 then writes the read data into the first storage unit corresponding to the execution sequence only when it determines that the data read from the second storage unit is a valid value. If the data is judged to be invalid, the read data is not written into the first storage unit corresponding to the execution sequence.
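A minimal sketch of this invalid-value guard follows: the control unit pre-sets COND_SPR to a sentinel and only propagates the value into COND if the arithmetic logic unit actually refreshed it. The sentinel object and register model are assumptions for illustration only.

```python
# Sketch of the invalid-value guard: before reading, the control unit
# resets COND_SPR to an invalid value; it copies the data into the
# stream's COND register only if the ALU wrote a valid result.

INVALID = object()   # stand-in for the hardware's "invalid value"

class AluRegister:
    def __init__(self):
        self.cond_spr = INVALID   # second storage unit (COND_SPR)

def read_into_cond(alu, cond_registers, stream_id):
    """Control unit reads COND_SPR; only valid data reaches COND."""
    value = alu.cond_spr
    if value is not INVALID:
        cond_registers[stream_id] = value
    return value is not INVALID

alu = AluRegister()
conds = {}                                   # first storage units, one per stream
alu.cond_spr = INVALID                       # control unit pre-sets invalid value
ok_before = read_into_cond(alu, conds, 0)    # operator wrote nothing -> skipped
alu.cond_spr = 7                             # ALU writes a result after executing
ok_after = read_into_cond(alu, conds, 0)
print(ok_before, ok_after, conds)  # False True {0: 7}
```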
It is understood that the above description of the embodiments of fig. 6 to 8 is only one embodiment of the present application; the first branch operator of the present application is not limited to the Add task, and the second branch operator is not limited to the Label2 task or the Exit task.
Correspondingly, the application also provides a processing device of the artificial intelligence model and a processing method of the artificial intelligence model, which will be described with reference to fig. 10 to 13.
As shown in fig. 10, which is a schematic structural diagram of a processing apparatus of an artificial intelligence model provided in an embodiment of the present application, the processing apparatus 16 of the artificial intelligence model may be the processor 300 of the processing device 30 of the artificial intelligence model in fig. 3 or the main processor 504 of the processing device 50 of the artificial intelligence model in fig. 5. The processing device 16 of the artificial intelligence model may include a creating unit 160 and an issuing unit 162, wherein:
the creating unit 160 is used for creating an artificial intelligence AI model; the AI model comprises a control operator and a calculation operator;
in particular, the creation unit 160 may correspond to program code executed in the processor 300 or the main processor 504 for creating an artificial intelligence AI model.
The issuing unit 162 is configured to issue the AI model to the artificial intelligence processing unit based on the user mode interface API; the API comprises a first API, and the first API is used for issuing the control operator; the artificial intelligence processing unit is used for executing the control operator and the calculation operator in the process of training or reasoning the AI model.
Specifically, the issuing unit 162 may correspond to program code executed in the processor 300 or the main processor 504 for issuing the AI model to the artificial intelligence processing unit based on a user-mode interface API.
In one implementation, the control operator includes a branch judgment operator for judging whether to execute the first branch operator or the second branch operator.
In one implementation, the control operator further includes a loop operator, and the loop operator is configured to loop through the first calculation operator of the AI model; the API also includes a second API and a third API, the second API for creating a tag; the third API is used to set the location of the tag in the AI model.
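The three APIs above (a first API that issues control operators, a second API that creates a label, and a third API that sets the label's position in the AI model) can be sketched host-side as follows. Every name and signature here (`AIModel`, `create_label`, `set_label_position`, `issue_control_operator`) is a hypothetical stand-in: the patent describes the APIs by role only.

```python
class AIModel:
    """Assumed container: an AI model is an ordered list of operators."""
    def __init__(self):
        self.operators = []

def create_label(name):
    """Second API (assumed signature): create a label object."""
    return {"kind": "label", "name": name}

def set_label_position(model, label, index):
    """Third API (assumed signature): set where the label sits in the model."""
    model.operators.insert(index, label)

def issue_control_operator(model, operator):
    """First API (assumed signature): issue a control operator into the model."""
    model.operators.append(operator)

# Build a tiny model: a label placed before the first compute operator,
# then a loop operator that jumps back to that label.
model = AIModel()
model.operators.append({"kind": "compute", "name": "first_compute"})
loop_start = create_label("loop_start")
set_label_position(model, loop_start, 0)
issue_control_operator(model, {"kind": "loop", "target": "loop_start"})
```

The ordering that results (label, compute operator, loop operator) matches the layout the later embodiments rely on for jumping.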
The specific implementation of the creating unit 160 and the issuing unit 162 may refer to the process of performing AI model processing on the processor 300 or the main processor 504 in the embodiments of fig. 3 to 8, which is not described herein again.
Fig. 11 is a schematic structural diagram of a processing apparatus of an artificial intelligence model according to another embodiment of the present application. The processing apparatus 17 of the artificial intelligence model may be the artificial intelligence processing unit 301 of the processing device 30 of the artificial intelligence model in fig. 3, the processing device 40 of the artificial intelligence model in fig. 4, or the artificial intelligence processing unit of the processing device 50 of the artificial intelligence model in fig. 5. The processing apparatus 17 of the artificial intelligence model may comprise an obtaining unit 170 and an execution operator unit 172, wherein:
the obtaining unit 170 is configured to obtain an artificial intelligence AI model; the AI model comprises a control operator and a calculation operator, and is issued by the processor based on a user mode interface API; the API comprises a first API, and the first API is used for issuing the control operator;
the execution operator unit 172 is configured to execute the control operator and the calculation operator during the process of training or reasoning the AI model.
Specifically, the obtaining unit 170 may correspond to program code for acquiring the artificial intelligence AI model, executed in the control unit of the artificial intelligence processing unit 301, of the processing device 40 of the artificial intelligence model, or of the artificial intelligence processing unit of the processing device 50 of the artificial intelligence model.
The execution operator unit 172 may correspond to program code for executing the control operator in the course of training or reasoning the AI model, executed cooperatively by the control unit and the arithmetic logic unit in the artificial intelligence processing unit 301, in the processing device 40 of the artificial intelligence model, or in the artificial intelligence processing unit of the processing device 50 of the artificial intelligence model.
In one implementation, the execution operator unit 172 may include a first execution unit 1720, a storage processing unit 1721, and a second execution unit 1722, where:
the first execution unit 1720 is used for executing a calculation operator in the AI model through an arithmetic logic unit of the artificial intelligence processing unit in the process of training or reasoning the AI model; that is, the first execution unit 1720 may correspond to a program code for executing a calculation operator in the AI model on an arithmetic logic unit.
The storage processing unit 1721 is used for storing the data after the calculation operator is executed in the storage unit of the artificial intelligence processing unit; specifically, the storage processing unit 1721 may correspond to a program code on an arithmetic logic unit for storing data after the calculation operator is executed in a storage unit of the artificial intelligence processing unit.
The second execution unit 1722 is used for executing the control operator based on the data in the storage unit through the control unit of the artificial intelligence processing unit. In particular, the second execution unit 1722 may correspond to a program code on a control unit for executing the control operator based on data in the storage unit.
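The division of labour among the three sub-units above can be sketched as follows: a compute operator runs on the arithmetic logic unit, its result is deposited in the storage unit, and the control unit decides from that stored data. Names are illustrative only, and the storage unit is modelled as a plain dictionary.

```python
storage_unit = {}  # stands in for the artificial intelligence processing unit's storage unit

def run_compute_operator(op, operand):
    """First execution unit: run the compute operator on the ALU,
    then (storage processing unit) deposit its result in the storage unit."""
    result = op(operand)
    storage_unit["result"] = result
    return result

def run_control_operator(predicate):
    """Second execution unit: the control unit executes the control
    operator based on the data in the storage unit."""
    return predicate(storage_unit["result"])

run_compute_operator(lambda v: v + 1, 41)          # compute operator
take_first_branch = run_control_operator(lambda r: r < 100)  # control operator
```

The compute result never returns to the host; the control decision is made entirely from the on-device storage unit, which is the behaviour the embodiment is optimizing for.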
In one implementation, the storage unit in the artificial intelligence processing unit may include a first storage unit and a second storage unit;
the storage processing unit 1721 may specifically be configured to: storing the data after the calculation operator is executed in the second storage unit;
the second execution unit 1722 may specifically include:
a first reading unit for reading the data in the second storage unit;
a first writing unit for writing the data in the second storage unit into the first storage unit;
and the reading execution unit is used for reading and executing the control operator based on the data in the first storage unit.
In one implementation manner, the read execution unit may specifically include:
a second reading unit for reading the data in the first storage unit;
a judging unit, configured to judge whether to execute the first branch operator based on the data in the first storage unit and the parameter in the branch judgment operator;
the judgment processing unit is used for executing the first branch operator if the judgment of the judging unit is yes, and executing the second branch operator if the judgment of the judging unit is no.
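The judging unit and judgment processing unit above can be sketched as follows. The patent leaves the comparison carried in the branch judgment operator's parameters abstract, so a (comparison, threshold) pair is assumed here purely for illustration.

```python
def judge(first_storage_value, params):
    """Judging unit: decide from the first storage unit's data and the
    parameters carried in the branch judgment operator (assumed encoding)."""
    comparison, threshold = params
    if comparison == "lt":
        return first_storage_value < threshold
    if comparison == "ge":
        return first_storage_value >= threshold
    raise ValueError(f"unknown comparison {comparison!r}")

def judgment_processing(first_storage_value, params, first_branch, second_branch):
    """Judgment processing unit: yes -> first branch operator,
    no -> second branch operator."""
    if judge(first_storage_value, params):
        return first_branch()
    return second_branch()
```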
In one implementation, the control operator may further include a loop operator, and the loop operator is configured to loop through the first calculation operator of the AI model; the processing means 17 of the artificial intelligence model may further comprise a third execution unit 174 for executing the loop operator after the judgment processing unit executes the first branch operator to loop execute the first calculation operator through the arithmetic logic unit until the judgment is no. In particular, the third execution unit 174 may correspond to a program code for executing the loop operator on a control unit.
In one implementation, the API further includes a second API and a third API, the second API being used for creating a label and the third API for setting the position of a label in the AI model. The AI model includes a first label and a second label for jumping, wherein the first label is placed at the operator immediately preceding the first calculation operator of the AI model, and the second label is placed at the operator immediately preceding the second branch operator. The processing apparatus 17 of the artificial intelligence model may further comprise a fourth execution unit 176, configured to execute the first calculation operator through the arithmetic logic unit before the reading execution unit reads and executes the control operator based on the data in the first storage unit; that is, the fourth execution unit 176 may correspond to program code for executing the first calculation operator on the arithmetic logic unit.
The third execution unit 174 is specifically configured to: after the judgment processing unit executes the first branch operator, execute the loop operator, jump to the position of the first label, and cyclically execute the first calculation operator through the arithmetic logic unit;
if the judgment of the judging unit is no, the judgment processing unit is specifically configured to jump to the position of the second label to execute the second branch operator.
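The label-jump scheme above amounts to a small interpreter: the loop operator jumps back to the first label to repeat the first compute operator, and a "no" verdict jumps forward to the second label, after which the second branch operator runs. The program encoding below is a hypothetical sketch, not the patent's actual task format.

```python
def run(program, state):
    """Execute a list of (kind, ...) operators with label-based jumps."""
    labels = {op[1]: i for i, op in enumerate(program) if op[0] == "label"}
    pc = 0
    while pc < len(program):
        op = program[pc]
        if op[0] == "label":
            pc += 1                       # labels are positions, not work
        elif op[0] == "compute":
            op[1](state)                  # compute operator on the ALU
            pc += 1
        elif op[0] == "branch":           # branch judgment operator
            pc = pc + 1 if op[1](state) else labels[op[2]]
        elif op[0] == "jump":             # loop operator
            pc = labels[op[1]]
    return state

program = [
    ("label", "loop_start"),                         # first label
    ("compute", lambda s: s.update(i=s["i"] + 1)),   # first compute operator
    ("branch", lambda s: s["i"] < 3, "exit"),        # judge: loop again?
    ("jump", "loop_start"),                          # loop operator
    ("label", "exit"),                               # second label
    ("compute", lambda s: s.update(done=True)),      # second branch operator
]
state = run(program, {"i": 0})
```

After the predicate first judges "no", control lands at the second label and the second branch operator runs exactly once.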
In one implementation, the processing apparatus 17 of the artificial intelligence model may further include a setting unit 178 for setting the second storage unit to an invalid value before the first reading unit reads the data in the second storage unit; in particular, the setting unit 178 may correspond to program code on the control unit for setting the second storage unit to an invalid value.
The first write unit is specifically configured to: and writing the data of the second storage unit into the first storage unit under the condition that the read data of the second storage unit is judged to be a valid value.
It should be noted that, for a specific implementation manner of the processing apparatus 17 of the artificial intelligence model, reference may be made to the process of performing AI model processing by the artificial intelligence processing unit 301 of the processing device 30 of the artificial intelligence model in the embodiments in fig. 3 to 8, the processing device 40 of the artificial intelligence model in fig. 4, or the artificial intelligence processing unit of the processing device 50 of the artificial intelligence model in the embodiment in fig. 5, which is not described herein again.
Fig. 12 is a flowchart illustrating a processing method of an artificial intelligence model according to an embodiment of the present application. The method is applied to the processor 300 of the processing device 30 of the artificial intelligence model in fig. 3 or the main processor 504 of the processing device 50 of the artificial intelligence model in fig. 5, and includes the following steps:
step S120: a processor (or called a main processor) creates an Artificial Intelligence (AI) model; the AI model comprises a control operator and a calculation operator;
step S122: and issuing the AI model to an artificial intelligence processing unit based on the user mode interface API.
The API comprises a first API, and the first API is used for issuing the control operator; the artificial intelligence processing unit is used for executing the control operator and the calculation operator in the process of training or reasoning the AI model.
For a specific implementation of the processing method of the artificial intelligence model in this embodiment, reference may be made to the process of performing AI model processing on the processor 300 or the main processor 504 in the embodiments of fig. 3 to 8, which is not described herein again.
Fig. 13 is a flowchart of a processing method of an artificial intelligence model according to another embodiment of the present application. The method is applied to the artificial intelligence processing unit 301 of the processing device 30 of the artificial intelligence model in fig. 3, the processing device 40 of the artificial intelligence model in fig. 4, or the artificial intelligence processing unit of the processing device 50 of the artificial intelligence model in fig. 5. The artificial intelligence processing unit may include a control unit, an arithmetic logic unit, and a storage unit, and may perform the following steps:
step S130: reading an Artificial Intelligence (AI) model; the AI model comprises a control operator and a calculation operator, and is issued by the processor based on a user mode interface API; the API comprises a first API, and the first API is used for issuing the control operator;
step S132: executing the calculation operator, and storing the data after the calculation operator is executed in the storage unit; the control operator is executed based on the data in the storage unit.
In the process of training or reasoning the AI model, the artificial intelligence processing unit can execute a calculation operator in the AI model through the arithmetic logic unit, and the arithmetic logic unit stores the data after the calculation operator is executed in the storage unit; the control unit may then execute the control operator based on the data in the storage unit.
In one possible implementation, the storage unit may include a first storage unit and a second storage unit; then storing the data after the calculation operator in the storage unit may include: storing the data after the calculation operator is executed in the second storage unit;
the executing the control operator based on the data in the storage unit may include: reading the data in the second storage unit and writing the data in the second storage unit into the first storage unit; and reading and executing the control operator based on the data in the first storage unit.
In a possible implementation, the first storage unit may be integrated in the control unit, that is, the first storage unit may be added to the control unit, and the first storage unit may be a register dedicated to the control unit. The second storage unit may be integrated in the arithmetic logic unit, that is, the second storage unit may be added to the arithmetic logic unit, and the second storage unit may be a register dedicated to the arithmetic logic unit. The control unit can thus quickly and efficiently read the data produced after the arithmetic logic unit executes an operator or task, so that execution of subsequent operators can be controlled according to the execution result of the current operator. The whole AI model is executed within the control unit and the arithmetic logic unit, without returning part of the control functions to the main processor for processing.
Moreover, a second storage unit (a register dedicated to the arithmetic logic unit) can be added to each arithmetic logic unit in the artificial intelligence processing unit, so that each arithmetic logic unit can cooperate with the control unit to execute a control operator, which can further improve the model inference or model training performance of the artificial intelligence processing unit.
In one possible implementation, the AI model corresponds to at least one execution sequence, and each of the first storage units corresponds to a different execution sequence.
In this embodiment, the processor may set the number of the first storage units in the control unit in a customized manner according to the number of the AI models and the number of the execution sequences corresponding to each AI model.
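The sizing rule above — one first storage unit per execution sequence of each loaded AI model — can be sketched as follows. The mapping-based structure is an assumption made for illustration; the patent does not prescribe how the slots are indexed.

```python
def allocate_first_storage_units(models):
    """models maps an AI model name to its number of execution sequences;
    return one dedicated first-storage slot per (model, sequence) pair."""
    units = {}
    for name, sequence_count in models.items():
        for seq in range(sequence_count):
            units[(name, seq)] = None   # one first storage unit per sequence
    return units

# Two models: one with two execution sequences, one with a single sequence.
units = allocate_first_storage_units({"model_a": 2, "model_b": 1})
```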
In one possible implementation, the control operator may include a branch judgment operator and a loop operator, wherein the branch judgment operator is used for judging whether to execute the first branch operator or the second branch operator;
the reading and executing the control operator based on the data in the first memory location may include: reading data in the first storage unit; judging whether to execute the first branch operator based on the data in the first storage unit and the parameters in the branch judgment operator; if the judgment result is yes, executing the first branch operator; if not, the second branch operator is executed.
In a possible implementation, the control operator may further include a loop operator, and then after the executing the first branch operator, further includes: and executing the loop operator to iteratively execute the calculation operator of the AI model through the arithmetic logic unit until the judgment is negative.
In one possible implementation, the AI model may include a first tag and a second tag for jumping; wherein the first label is placed in a last operator adjacent to a first computation operator of the AI model, and the second label is placed in a last operator adjacent to the second branch operator; before the reading and executing the control operator based on the data in the first storage unit, the method may further include: executing the first calculation operator by the arithmetic logic unit;
the executing the loop operator to iteratively execute the calculation operator of the AI model through the arithmetic logic unit may include: executing the loop operator, jumping to the position of the first label, and executing the first calculation operator of the AI model by iteration of the operation logic unit; if not, executing the second branch operator, including: if not, jumping to the position of the second label to execute the second branch operator.
In a possible implementation manner, before the reading the data in the second storage unit, the method may further include: setting the second storage unit to an invalid value; the writing the data in the second memory cell to the first memory cell may include: and writing the data of the second storage unit into the first storage unit under the condition that the read data of the second storage unit is judged to be a valid value.
For a specific implementation of the processing method of the artificial intelligence model in this embodiment, reference may be made to the process of performing AI model processing by the artificial intelligence processing unit 301 of the processing device 30 of the artificial intelligence model in the embodiments in fig. 3 to 8, or the processing device 40 of the artificial intelligence model in fig. 4, or the artificial intelligence processing unit of the processing device 50 of the artificial intelligence model in the embodiment in fig. 5, which is not described herein again.
The present application also provides a computer-readable storage medium, where the computer-readable storage medium may store a program, and when the program is executed by a processor, the program implements some or all of the steps of the processing method of any one of the artificial intelligence models described in the above method embodiments.
For example, the program, when executed by the processor, may create an artificial intelligence AI model; the AI model includes a control operator; and then based on the user mode interface API, the AI model is issued to the artificial intelligence processing unit. Wherein, the API comprises an API used for issuing the control operator; the artificial intelligence processing unit is used for executing the control operator in the process of training or reasoning the AI model. For a specific implementation manner, reference may be made to the process of performing AI model processing on the processor 300 or the main processor 504 in the embodiments of fig. 3 to fig. 8, which is not described herein again.
For another example, when executed by the processor, the program may obtain or read an artificial intelligence AI model; the AI model comprises a control operator, and is issued by the processor through establishing the AI model and based on the user mode interface API; the API comprises an API used for issuing the control operator; the control operator is then executed during training or reasoning about the AI model. For a specific implementation manner of the AI model processing process, reference may be made to artificial intelligence processing unit 301 of processing device 30 of the artificial intelligence model in the embodiments in fig. 3 to fig. 8, or processing device 40 of the artificial intelligence model in fig. 4, or artificial intelligence processing unit of processing device 50 of the artificial intelligence model in the embodiment in fig. 5, which is not described herein again.
The embodiment of the present application further provides a computer program, where the computer program includes instructions, and when the computer program is executed by a multi-core processor, the processor may be enabled to perform part or all of the steps of the processing method of any one of the artificial intelligence models described above.
In some embodiments, the disclosed methods may be implemented as computer program instructions encoded on a computer-readable storage medium in a machine-readable format or encoded on other non-transitory media or articles of manufacture. Fig. 14 schematically illustrates a conceptual partial view of an example computer program or computer program product, arranged according to at least some embodiments presented herein, comprising a computer program for executing a computer process on a computing device. In one embodiment, the example computer program product 1400 is provided using a signal bearing medium 1401. The signal bearing medium 1401 may comprise one or more program instructions 1402 which, when executed by one or more processors, may provide the functions or portions of the functions described above with respect to the processor 300 or the main processor 504, the artificial intelligence processing unit 301 of the processing device 30 of the artificial intelligence model in the embodiments of figs. 3-8, the processing device 40 of the artificial intelligence model of fig. 4, or the artificial intelligence processing unit of the processing device 50 of the artificial intelligence model of the fig. 5 embodiment.
In some examples, the signal bearing medium 1401 may comprise a computer-readable medium 1403 such as, but not limited to, a hard disk drive, a Compact Disc (CD), a Digital Video Disc (DVD), a digital tape, a memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like. In some embodiments, the signal bearing medium 1401 may comprise a computer recordable medium 1404 such as, but not limited to, a memory, a read/write (R/W) CD, an R/W DVD, and the like. In some implementations, the signal bearing medium 1401 may include a communication medium 1405 such as, but not limited to, a digital and/or analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communication link, a wireless communication link, etc.). Thus, for example, the signal bearing medium 1401 may be conveyed by a wireless form of the communication medium 1405 (e.g., a wireless communication medium conforming to the IEEE 802.11 standard or another transmission protocol). The one or more program instructions 1402 may be, for example, computer-executable instructions or logic-implementing instructions. In some examples, the processor 300 or the main processor 504, the artificial intelligence processing unit 301 of the processing device 30 of the artificial intelligence model in the embodiments of figs. 3-8, the processing device 40 of the artificial intelligence model of fig. 4, or the artificial intelligence processing unit of the processing device 50 of the artificial intelligence model of the fig. 5 embodiment may be configured to provide various operations, functions, or actions in response to the program instructions 1402 conveyed to the computing device by one or more of the computer-readable medium 1403, the computer recordable medium 1404, and/or the communication medium 1405. It should be understood that the arrangements described herein are for illustrative purposes only.
Thus, those skilled in the art will appreciate that other arrangements and other elements (e.g., machines, interfaces, functions, orders, and groupings of functions, etc.) can be used instead, and that some elements may be omitted altogether depending upon the desired results. In addition, many of the described elements are functional entities that may be implemented as discrete or distributed components or in conjunction with other components, in any suitable combination and location.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the above-described division of the units is only one type of division of logical functions, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit may be stored in a computer-readable storage medium if it is implemented in the form of a software functional unit and sold or used as a separate product. Based on such understanding, the technical solution of the present application, in essence, or the part thereof contributing over the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like, and may specifically be a processor in the computer device) to execute all or part of the steps of the above-described methods of the embodiments of the present application. The storage medium may include: a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), and the like.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (22)

1. The processing method of the artificial intelligence model is characterized by being applied to an artificial intelligence processing unit, wherein the artificial intelligence processing unit comprises a control unit, an arithmetic logic unit and a storage unit; the method comprises the following steps:
acquiring an artificial intelligence AI model; the AI model comprises a control operator and a calculation operator, and is issued by the processor based on the user mode interface API; the API comprises a first API, and the first API is used for issuing the control operator;
executing the calculation operator, and storing data after the calculation operator is executed in the storage unit;
executing the control operator based on the data in the storage unit.
2. The method of claim 1, wherein the memory cell comprises a first memory cell and a second memory cell; the storing the data after the calculation operator is executed in the storage unit comprises: storing the data after the calculation operator is executed in the second storage unit;
the executing the control operator based on the data in the storage unit comprises: reading the data in the second storage unit through the control unit, and writing the data in the second storage unit into the first storage unit; and reading and executing the control operator based on the data in the first storage unit.
3. The method of claim 2, wherein the control operator comprises a branch predicate operator and a loop operator, wherein the branch predicate operator is used to predicate execution of a first branch operator or a second branch operator;
the reading and executing the control operator based on the data in the first storage unit includes:
reading data in the first storage unit;
judging whether to execute the first branch operator based on the data in the first storage unit and the parameters in the branch judgment operator;
if the judgment result is yes, executing the first branch operator; if not, executing the second branch operator.
4. The method of claim 3, wherein the control operator further comprises a loop operator for looping through a first computation operator of the AI model; after the executing the first branch operator, further comprising: and executing the cyclic operator to cyclically execute the first calculation operator through the arithmetic logic unit until the judgment is negative.
5. The method of claim 4, wherein the APIs further comprise a second API and a third API, the second API for creating a tag; the third API is used for setting the position of a tag in the AI model; the AI model further comprises a first tag and a second tag for jumping; wherein the first label is placed in a last operator adjacent to the first calculation operator and the second label is placed in a last operator adjacent to the second branch operator;
before the reading and executing the control operator based on the data in the first storage unit, the method further includes: executing the first computational operator by the arithmetic logic unit;
the executing the loop operator to loop execute the first computation operator through the arithmetic logic unit comprises: executing the loop operator, jumping to the position of the first label, and executing the first calculation operator in a loop mode through the operation logic unit;
said executing said second branch operator comprises: jumping to the position of the second label to execute the second branch operator.
6. The method of any of claims 2-5, wherein prior to reading the data in the second memory location, further comprising: setting the second storage unit to an invalid value;
the writing the data in the second storage unit into the first storage unit includes: and writing the data of the second storage unit into the first storage unit under the condition that the read data of the second storage unit is judged to be a valid value.
7. A method for processing an artificial intelligence model, the method comprising:
creating an Artificial Intelligence (AI) model; the AI model comprises a control operator and a calculation operator;
based on user mode interface API, the AI model is sent to artificial intelligence processing unit;
the API comprises a first API, and the first API is used for issuing the control operator; the artificial intelligence processing unit is used for executing the control operator and the calculation operator in the process of training or reasoning the AI model.
8. The method of claim 7, wherein the control operator comprises a branch judgment operator for judging whether to execute the first branch operator or the second branch operator.
9. The method of claim 8, wherein the control operator further comprises a loop operator for looping through a first computation operator of the AI model; the APIs further comprise a second API and a third API, the second API is used for creating a label; the third API is used to set the location of a tag in the AI model.
10. An apparatus for processing an artificial intelligence model, the apparatus being an artificial intelligence processing unit comprising:
an acquisition unit, configured to acquire an artificial intelligence (AI) model, wherein the AI model comprises a control operator and a computation operator and is issued by a processor based on a user-mode interface (API), the API comprising a first API used for issuing the control operator;
a first execution unit, configured to execute the computation operator;
a storage processing unit, configured to store data obtained after the computation operator is executed in a storage unit of the artificial intelligence processing unit; and
a second execution unit, configured to execute the control operator based on the data in the storage unit.
11. The processing apparatus according to claim 10, wherein the storage unit includes a first storage unit and a second storage unit;
the storage processing unit is specifically configured to store the data obtained after the computation operator is executed in the second storage unit; and
the second execution unit includes:
a first reading unit, configured to read the data in the second storage unit;
a first writing unit, configured to write the data in the second storage unit into the first storage unit; and
a reading execution unit, configured to read and execute the control operator based on the data in the first storage unit.
12. The processing apparatus of claim 11, wherein the control operator comprises a branch judgment operator for judging whether to execute a first branch operator or a second branch operator; and the reading execution unit comprises:
a second reading unit, configured to read the data in the first storage unit;
a judging unit, configured to judge, based on the data in the first storage unit and a parameter in the branch judgment operator, whether to execute the first branch operator; and
a judgment processing unit, configured to execute the first branch operator if the judging unit judges that the first branch operator is to be executed, and to execute the second branch operator if the judging unit judges that the first branch operator is not to be executed.
13. The processing apparatus of claim 12, wherein the control operator further comprises a loop operator for cyclically executing a first computation operator of the AI model; and the processing apparatus further comprises:
a third execution unit, configured to execute the loop operator after the judgment processing unit executes the first branch operator, so as to cyclically execute the first computation operator through an arithmetic logic unit until the judgment result is negative.
14. The processing apparatus of claim 13, wherein the API further comprises a second API and a third API, the second API being used for creating a label and the third API being used for setting a position of the label in the AI model; the AI model further comprises a first label and a second label used for jumping, wherein the first label is placed in the operator immediately preceding the first computation operator of the AI model, and the second label is placed in the operator immediately preceding the second branch operator; and the processing apparatus further comprises:
a fourth execution unit, configured to execute the first computation operator through the arithmetic logic unit before the reading execution unit reads and executes the control operator based on the data in the first storage unit;
wherein the third execution unit is specifically configured to execute the loop operator after the judgment processing unit executes the first branch operator, and to jump to the position of the first label so as to cyclically execute the first computation operator through the arithmetic logic unit; and
if the judging unit judges that the first branch operator is not to be executed, the judgment processing unit is specifically configured to jump to the position of the second label so as to execute the second branch operator.
15. The processing apparatus according to any one of claims 11 to 14, wherein the processing apparatus further comprises:
a setting unit, configured to set the second storage unit to an invalid value before the first reading unit reads the data in the second storage unit;
wherein the first writing unit is specifically configured to write the data of the second storage unit into the first storage unit when the data read from the second storage unit is determined to be a valid value.
16. An apparatus for processing an artificial intelligence model, comprising:
a creating unit, configured to create an artificial intelligence (AI) model, the AI model comprising a control operator and a computation operator; and
an issuing unit, configured to issue the AI model to an artificial intelligence processing unit based on a user-mode interface (API);
wherein the API comprises a first API, and the first API is used for issuing the control operator; and the artificial intelligence processing unit is used for executing the control operator and the computation operator in a process of training or inference of the AI model.
17. The processing apparatus of claim 16, wherein the control operator comprises a branch judgment operator for judging whether to execute a first branch operator or a second branch operator.
18. The processing apparatus of claim 17, wherein the control operator further comprises a loop operator for cyclically executing a first computation operator of the AI model; the API further comprises a second API and a third API, wherein the second API is used for creating a label, and the third API is used for setting a position of the label in the AI model.
19. A processing device of an artificial intelligence model, comprising an artificial intelligence processing unit and a memory, wherein the memory is configured to store program code, and the artificial intelligence processing unit invokes the program code stored in the memory to cause the processing device of the artificial intelligence model to perform the method of any one of claims 1-6.
20. A processing device of an artificial intelligence model, comprising a processor and a memory, wherein the memory is configured to store program code, and the processor invokes the program code stored in the memory to cause the processing device of the artificial intelligence model to perform the method of any one of claims 7-9.
21. A processing device of an artificial intelligence model, characterized by comprising a processor, an artificial intelligence processing unit, and a memory, wherein the memory is configured to store program code and the processor is coupled to the artificial intelligence processing unit; the artificial intelligence processing unit invokes the program code stored in the memory to cause the processing device to perform the method of any one of claims 1-6; and the processor invokes the program code stored in the memory to cause the processing device to perform the method of any one of claims 7-9.
22. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1-9.
CN202180002364.1A 2021-06-25 2021-06-25 Processing method, device and equipment of artificial intelligence model and readable storage medium Active CN113614749B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/102522 WO2022267049A1 (en) 2021-06-25 2021-06-25 Artificial intelligence model processing method, apparatus, and device, and readable storage medium

Publications (2)

Publication Number Publication Date
CN113614749A CN113614749A (en) 2021-11-05
CN113614749B true CN113614749B (en) 2022-08-09

Family

ID=78310967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180002364.1A Active CN113614749B (en) 2021-06-25 2021-06-25 Processing method, device and equipment of artificial intelligence model and readable storage medium

Country Status (2)

Country Link
CN (1) CN113614749B (en)
WO (1) WO2022267049A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117675608A (en) * 2022-08-31 2024-03-08 华为技术有限公司 Processing device and control method
CN115782835B (en) * 2023-02-09 2023-04-28 江苏天一航空工业股份有限公司 Automatic parking remote driving control method for passenger boarding vehicle

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10747631B2 (en) * 2018-01-19 2020-08-18 DinoplusAI Holdings Limited Mission-critical AI processor with record and replay support
US10931588B1 (en) * 2019-05-10 2021-02-23 Innovium, Inc. Network switch with integrated compute subsystem for distributed artificial intelligence and other applications
CN112308198A (en) * 2019-07-26 2021-02-02 中科寒武纪科技股份有限公司 Calculation method of recurrent neural network and related product
CN112465129B (en) * 2019-09-09 2024-01-09 上海登临科技有限公司 On-chip heterogeneous artificial intelligent processor
WO2022032628A1 (en) * 2020-08-14 2022-02-17 华为技术有限公司 Data interaction method between main cpu and npu, and computing device
CN112231270A (en) * 2020-10-14 2021-01-15 苏州浪潮智能科技有限公司 Artificial intelligence accelerator and computer equipment
CN112465116B (en) * 2020-11-25 2022-12-09 安徽寒武纪信息科技有限公司 Compiling method, operation method, electronic device, and storage medium

Also Published As

Publication number Publication date
CN113614749A (en) 2021-11-05
WO2022267049A1 (en) 2022-12-29

Similar Documents

Publication Publication Date Title
US20210262808A1 (en) Obstacle avoidance method and apparatus
CN109901572B (en) Automatic driving method, training method and related device
CN113879295B (en) Track prediction method and device
WO2022027304A1 (en) Testing method and apparatus for autonomous vehicle
WO2021102955A1 (en) Path planning method for vehicle and path planning apparatus for vehicle
CN112703506B (en) Lane line detection method and device
CN110371132B (en) Driver takeover evaluation method and device
CN113614749B (en) Processing method, device and equipment of artificial intelligence model and readable storage medium
CN109131340A (en) Active vehicle adjusting performance based on driving behavior
CN110532846B (en) Automatic channel changing method, device and storage medium
CN113835421B (en) Method and device for training driving behavior decision model
CN115291596A (en) Road travelable area reasoning method and device
WO2022000127A1 (en) Target tracking method and device therefor
CN113525373A (en) Lane changing control system and method for vehicle
WO2022062825A1 (en) Vehicle control method, device, and vehicle
US20230048680A1 (en) Method and apparatus for passing through barrier gate crossbar by vehicle
CN113632033A (en) Vehicle control method and device
CN114693540A (en) Image processing method and device and intelligent automobile
CN112810603B (en) Positioning method and related product
WO2022017307A1 (en) Autonomous driving scenario generation method, apparatus and system
CN113954858A (en) Method for planning vehicle driving route and intelligent automobile
JP2023024276A (en) Action planning for autonomous vehicle in yielding scenario
CN113859265A (en) Reminding method and device in driving process
US20230107033A1 (en) Method for optimizing decision-making regulation and control, method for controlling traveling of vehicle, and related apparatus
WO2022068643A1 (en) Multi-task deployment method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant