CN111340202A - Operation method, device and related product - Google Patents

Operation method, device and related product

Info

Publication number
CN111340202A
Authority
CN
China
Prior art keywords
module
target
synchronous control
control instruction
modules
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910755816.8A
Other languages
Chinese (zh)
Other versions
CN111340202B (en
Inventor
不公告发明人
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to PCT/CN2019/110167 priority Critical patent/WO2020073925A1/en
Publication of CN111340202A publication Critical patent/CN111340202A/en
Application granted granted Critical
Publication of CN111340202B publication Critical patent/CN111340202B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
    • G06F9/3889Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The disclosure relates to an operation method, an operation device, and a related product. The combined processing device comprises a machine learning arithmetic device, a universal interconnection interface, and other processing devices; the machine learning arithmetic device interacts with the other processing devices to jointly complete a calculation operation designated by the user. The combined processing device further comprises a storage device, which is connected to the machine learning arithmetic device and to the other processing devices and stores their data. The operation method, the operation device, and the related products provided by the embodiments of the disclosure have a wide application range and process instructions with high efficiency and speed.

Description

Operation method, device and related product
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for processing a synchronization control instruction, and a related product.
Background
With the continuous development of science and technology, machine learning, and neural network algorithms in particular, is used more and more widely, with good results in fields such as image recognition, speech recognition, and natural language processing. However, as neural network algorithms grow more complex, the types and number of data operations involved keep increasing. In the related art, the way in which the data processing process is synchronously controlled cannot meet these operation requirements, so operations on data are performed with low efficiency and speed.
Disclosure of Invention
In view of this, the present disclosure provides a method, an apparatus, and a related product for processing a synchronous control instruction, which perform synchronous control on a data processing process to improve efficiency and speed of performing operations on data.
According to a first aspect of the present disclosure, there is provided a synchronous control instruction processing apparatus, the apparatus including:
the control module is used for analyzing the obtained synchronous control instruction to obtain an operation code of the synchronous control instruction and determining a target operation module which needs to execute the synchronous control instruction;
the target operation module is used for entering a pause state when the synchronous control instruction is executed;
the control module is also used for monitoring the running states of the plurality of operation modules, controlling the target operation modules in the pause state to synchronously enter the working state when the target operation modules are all determined to be in the pause state,
wherein the operation code indicates that the synchronous control instruction is used to synchronously control a plurality of operation modules of the device.
According to a second aspect of the present disclosure, there is provided a machine learning arithmetic device, the device including:
one or more synchronous control instruction processing devices according to the first aspect, configured to acquire data to be operated and control information from another processing device, execute a specified machine learning operation, and transmit an execution result to the other processing device through an I/O interface;
when the machine learning arithmetic device comprises a plurality of synchronous control instruction processing devices, the plurality of synchronous control instruction processing devices can be connected through a specific structure and transmit data;
the plurality of synchronous control instruction processing devices are interconnected and transmit data through a Peripheral Component Interconnect Express (PCIe) bus, so as to support larger-scale machine learning operations; the plurality of synchronous control instruction processing devices share one control system or have their own control systems; they share a memory or have their own memories; and they may be interconnected in any interconnection topology.
According to a third aspect of the present disclosure, there is provided a combined processing apparatus, the apparatus comprising:
the machine learning arithmetic device, the universal interconnect interface, and the other processing device according to the second aspect;
and the machine learning arithmetic device interacts with the other processing devices to jointly complete the calculation operation designated by the user.
According to a fourth aspect of the present disclosure, there is provided a machine learning chip including the machine learning arithmetic device of the second aspect or the combined processing device of the third aspect.
According to a fifth aspect of the present disclosure, there is provided a machine learning chip package structure, which includes the machine learning chip of the fourth aspect.
According to a sixth aspect of the present disclosure, a board card is provided, which includes the machine learning chip packaging structure of the fifth aspect.
According to a seventh aspect of the present disclosure, there is provided an electronic device, which includes the machine learning chip of the fourth aspect or the board of the sixth aspect.
According to an eighth aspect of the present disclosure, there is provided a synchronous control instruction processing method applied to a synchronous control instruction processing apparatus including a control module and a plurality of operation modules, the method including:
controlling the control module to analyze the synchronous control instruction to obtain an operation code of the synchronous control instruction, and determining a target operation module which needs to execute the synchronous control instruction;
controlling the target operation module to enter a pause state when the synchronous control instruction is executed;
controlling the control module to monitor the running states of the plurality of operation modules, controlling the target operation modules in the pause state to synchronously enter the working state when the target operation modules are all determined to be in the pause state,
wherein the operation code indicates that the synchronous control instruction is used to synchronously control a plurality of operation modules of the device.
According to a ninth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method of the above eighth aspect.
In some embodiments, the electronic device comprises a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a tachograph, a navigator, a sensor, a camera, a server, a cloud server, a camcorder, a projector, a watch, a headset, a mobile storage, a wearable device, a means of transport, a household appliance, and/or a medical device.
In some embodiments, the means of transport comprises an aircraft, a ship, and/or a vehicle; the household appliance comprises a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and/or a range hood; the medical device comprises a nuclear magnetic resonance apparatus, a B-mode ultrasound scanner, and/or an electrocardiograph.
The device comprises a control module and a plurality of operation modules, wherein the control module is used for analyzing the obtained synchronous control instruction to obtain an operation code of the synchronous control instruction and determining a target operation module which needs to execute the synchronous control instruction; the target operation module is used for entering a pause state when a synchronous control instruction is executed; the control module is also used for monitoring the running states of the plurality of operation modules and controlling the target operation modules in the pause state to synchronously enter the working state when the target operation modules are determined to be in the pause state. The synchronous control instruction processing method, the synchronous control instruction processing device and the related products provided by the embodiment of the disclosure have the advantages of wide application range, high synchronous control instruction processing efficiency and high synchronous control instruction processing speed, and the synchronous control processing efficiency and the synchronous control processing speed of the corresponding operation module are improved, so that the efficiency and the speed of data operation are improved.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1a shows a block diagram of a synchronous control instruction processing apparatus according to an embodiment of the present disclosure.
Fig. 1b is a schematic structural diagram of a module cluster in a synchronous control instruction processing apparatus according to an embodiment of the present disclosure.
Figs. 2a to 2f show block diagrams of a synchronous control instruction processing apparatus according to an embodiment of the present disclosure.
Fig. 3 is a schematic diagram illustrating an application scenario of a synchronous control instruction processing apparatus according to an embodiment of the present disclosure.
Figs. 4a and 4b show block diagrams of a combined processing device according to an embodiment of the present disclosure.
Fig. 5 shows a schematic structural diagram of a board card according to an embodiment of the present disclosure.
Fig. 6 illustrates a flow diagram of a synchronization control instruction processing method according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, not all embodiments of the present disclosure. All other embodiments, which can be derived by one skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the scope of protection of the present disclosure.
It should be understood that the terms "zero," "first," "second," and the like in the claims, the description, and the drawings of the present disclosure are used for distinguishing between different objects and not for describing a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
Owing to the wide use of neural network algorithms, the computing power of computer hardware keeps improving, and the types and number of data operations involved in practical applications keep growing. Because programming languages are diverse and the related art has no synchronous control instruction that can be applied across them, technicians need to customize multiple instructions for each programming language environment in order to implement synchronous control, which makes synchronous control inefficient and slow. The present disclosure provides a synchronous control instruction processing method, apparatus, computer device, and storage medium that can implement synchronous control with only one instruction and can significantly improve the efficiency and speed of synchronous control.
Fig. 1a shows a block diagram of a synchronous control instruction processing apparatus according to an embodiment of the present disclosure. As shown in Fig. 1a, the apparatus comprises a control module 11 and a plurality of operation modules 12.
The control module 11 is configured to parse the obtained synchronous control instruction to obtain an operation code of the synchronous control instruction, and to determine the target operation modules that need to execute the synchronous control instruction. The operation code indicates that the synchronous control instruction is used to synchronously control a plurality of operation modules of the apparatus.
Each target operation module among the plurality of operation modules 12 is configured to enter a pause state when it executes the synchronous control instruction. In the pause state, the target operation module stops working: it performs no data operations and cannot continue to execute the calculation instructions it still has to execute.
The control module 11 is further configured to monitor the running states of the plurality of operation modules 12 and, when it determines that all the target operation modules are in the pause state, to control the paused target operation modules to enter the working state synchronously. A target operation module in the working state can perform data operations and execute the calculation instructions it has to execute.
In this embodiment, the synchronous control instruction synchronously controls the process by which the operation modules execute calculation instructions: an operation module suspends work when it executes the synchronous control instruction and waits for the control module to signal it to continue, thereby achieving synchronous control.
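The pause-and-release behaviour described above can be sketched as a small single-threaded simulation. This is only an illustration under stated assumptions: the class names, the `execute_sync` and `monitor` methods, and the choice of module numbers are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class State(Enum):
    WORKING = "working"
    PAUSED = "paused"


@dataclass
class OperationModule:
    module_id: int
    state: State = State.WORKING

    def execute_sync(self) -> None:
        # Executing the synchronous control instruction pauses this module.
        self.state = State.PAUSED


@dataclass
class ControlModule:
    modules: Dict[int, OperationModule]
    targets: Set[int] = field(default_factory=set)

    def monitor(self) -> bool:
        """Release the targets together once every one of them is paused."""
        if not self.targets:
            return False
        if all(self.modules[i].state is State.PAUSED for i in self.targets):
            for i in self.targets:
                self.modules[i].state = State.WORKING
            return True
        return False


# Hypothetical setup: four modules, of which modules 0, 1 and 2 are targets.
modules = {i: OperationModule(i) for i in range(4)}
control = ControlModule(modules, targets={0, 1, 2})

modules[0].execute_sync()
modules[1].execute_sync()
released_early = control.monitor()  # module 2 has not reached the instruction yet
modules[2].execute_sync()
released = control.monitor()        # all targets are paused, so they resume together
```

In this sketch the first call to `monitor` leaves modules 0 and 1 paused because module 2 is still working; only the second call returns the three targets to the working state at once.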
Optionally, the apparatus may include a general-purpose processor and an artificial intelligence processor. The artificial intelligence processor may include the above-mentioned control module, operation modules, and the like (its specific structure is described below), and it may parse the received synchronous control instruction and execute the corresponding instruction.
In this embodiment, the operation code may be the part of an instruction, or the field (usually represented by a code), that a computer program specifies to perform an operation; it is an instruction sequence number that tells the device executing the instruction which instruction is to be executed. The operation domain may be the source of all data required to execute the corresponding instruction, including parameters such as the data to be operated on, a quantity threshold, and the corresponding operation method. A synchronous control instruction may include an operation code and an operation domain.
It should be understood that the instruction format of the synchronization control instruction and the contained operation code and operation domain may be set as required by those skilled in the art, and the present disclosure does not limit this.
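To make the split between operation code and operation domain concrete, the following minimal sketch parses a textual instruction into those two parts. The whitespace-separated encoding, the `SyncInstruction` type, and the example strings are assumptions made for illustration; the disclosure leaves the instruction format open.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SyncInstruction:
    opcode: str                         # tells the device which instruction to execute
    operation_domain: Tuple[str, ...]   # data the instruction needs, e.g. identifiers


def parse(text: str) -> SyncInstruction:
    """Split a hypothetical 'opcode operand operand ...' line into its two parts."""
    fields = text.split()
    if not fields:
        raise ValueError("empty instruction")
    return SyncInstruction(opcode=fields[0], operation_domain=tuple(fields[1:]))


# Hypothetical instruction strings: one with an opcode only, one with an operation domain.
only_opcode = parse("sync_all")
with_domain = parse("sync_cluster cluster0 cluster1")
```

An instruction that carries only an operation code yields an empty operation domain, which matches the case in which the device falls back on preset information.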
In this embodiment, the apparatus may include one or more control modules and one or more operation modules, and the number of the control modules and the number of the operation modules may be set according to actual needs, which is not limited in this disclosure. When the device comprises a control module, the control module can receive the synchronous control instruction and realize the synchronous control of at least one corresponding operation module. When the device comprises a plurality of control modules, the plurality of control modules can respectively receive the synchronous control instruction and respectively realize the synchronous control of the corresponding plurality of operation modules.
In this embodiment, the operation module may be a core (core) of the apparatus, a device, a module, or the like capable of executing a computation instruction, such as a processor in the apparatus, and the disclosure is not limited thereto.
The device for processing the synchronous control instruction provided by the embodiment of the disclosure comprises a control module and a plurality of operation modules, wherein the control module is used for analyzing the obtained synchronous control instruction to obtain an operation code of the synchronous control instruction, and determining a target operation module which needs to execute the synchronous control instruction and a target signal which needs to be synchronized by the target operation module; the target operation module is used for controlling the processing related to the target signal to enter a pause state when the synchronous control instruction is executed; the control module is also used for monitoring the running states of the plurality of operation modules, and controlling the processing related to the target signal in the target operation module in the pause state to synchronously enter the working state when the target operation modules are all determined to be in the pause state. The synchronous control instruction processing method, the synchronous control instruction processing device and the related products provided by the embodiment of the disclosure have the advantages of wide application range, high synchronous control instruction processing efficiency and high synchronous control instruction processing speed, and the synchronous control processing efficiency and the synchronous control processing speed of the corresponding operation module are improved, so that the efficiency and the speed of data operation are improved.
In a possible implementation, an operation code may be used to indicate a target signal required for synchronization of the target operation module, or the operation domain may include a target signal required for synchronization of the target operation module, so that the control module determines the target signal according to the operation code or the operation domain. And the target operation module is also used for controlling the processing corresponding to the target signal determined by the control module to enter a pause state when the synchronous control instruction is executed.
The target signal may comprise at least one of: a computation queue signal, an IO signal, and an arrival signal. The arrival signal is a signal that arrives in parallel across the operation modules and covers all signals that the operation modules need to execute synchronously. The computation queue signal may be a signal of the queue of computation tasks waiting to be executed in the operation module, and the IO signal may be an input and/or output signal of the operation module.
In this implementation, when the synchronization control instruction does not indicate the target signal, a preset default target signal may be determined as the target signal, which is not limited by the present disclosure.
In this implementation, specific operation codes or operation domains may be set for different target signals, so that the processor may determine the signal to be suspended in the target operation module according to the specific operation codes or operation domains. The target signal may be other signals related to the operation, control, operation and other processing of the operation module, and the disclosure does not limit this.
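The choice of target signal, including the fallback to a preset default, might look like the sketch below. The enum values, the function name, and the choice of the arrival signal as the default are all assumptions; the text only states that a preset default may be used when the instruction indicates no target signal.

```python
from enum import Enum
from typing import Optional


class TargetSignal(Enum):
    COMPUTE_QUEUE = "compute_queue"
    IO = "io"
    ARRIVAL = "arrival"


# Assumption: which signal is the preset default is not specified in the text.
DEFAULT_SIGNAL = TargetSignal.ARRIVAL


def resolve_signal(indicated: Optional[str]) -> TargetSignal:
    """Return the signal named by the instruction, or the preset default."""
    if indicated is None:
        return DEFAULT_SIGNAL
    return TargetSignal(indicated)  # raises ValueError for an unknown signal name


explicit = resolve_signal("io")
fallback = resolve_signal(None)
```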
In a possible implementation manner, determining a target operation module that needs to execute a synchronous control instruction may include: and determining an operation module executing the target task from the plurality of operation modules as a target operation module according to the identification of the target task. The identification of the target task comprises at least one of: the task name, the task type, the task number, and the identifier of the target task may further include other information capable of characterizing the target task, which is not limited in this disclosure.
In the device, the control module allocates one or more operation modules to the task according to the type of the task (including the target task) and the working state of the operation module, so that the task is executed.
By the mode, synchronous control of all the operation modules executing the target task can be realized.
In a possible implementation manner, the control module may determine, according to a preset target task identifier, the target operation modules that need to execute the synchronous control instruction. In this case, the synchronous control instruction may include only an operation code. The instruction format of the synchronous control instruction may be "sync_all()"; based on this instruction, the device can synchronously control all the operation modules executing the target task identified by the preset target task identifier.
Optionally, the synchronous control instruction may be included in a kernel function, and the general-purpose processor of the apparatus may send the kernel function to the corresponding operation modules on the artificial intelligence processor for execution. The apparatus may further determine the identifier of the target task according to a characteristic of the kernel function, so that the target operation modules for the synchronous control instruction in the kernel function can be determined from that identifier.
Fig. 1b is a schematic structural diagram of a module cluster in a synchronous control instruction processing apparatus according to an embodiment of the present disclosure. As shown in Fig. 1b, the apparatus includes a control module 200, an interconnection module 500, a global memory 400, a plurality of module clusters, and a shared storage corresponding to each module cluster. Fig. 1b shows only two module clusters, module cluster 1 and module cluster 2; the remaining module clusters have a similar structure and are not shown. Each module cluster comprises four operation modules: operation module 1, operation module 2, operation module 3, and operation module 4. The interconnection module 500 implements communication among the global memory 400, the control module 200, and the module clusters. The global memory 400 provides storage for the control module 200 and the module clusters in the apparatus.
For example, assume that the identifier UNION of the target task specifies the corresponding operation modules by specifying a number of module clusters. UNION1 indicates that calling a kernel function to execute the task occupies 1 module cluster, 4 operation modules in total. UNION2 indicates that the task occupies 2 module clusters, 8 operation modules in total. UNION4 indicates that the task occupies 4 module clusters, 16 operation modules in total. UNION8 indicates that the task occupies 8 module clusters, 32 operation modules in total. Taking UNION1 as an example, the control module 200 may, according to UNION1, designate a module cluster that is idle or able to execute the task, such as module cluster 1, so that module cluster 1 executes the task identified by UNION1.
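The arithmetic in this example follows from four operation modules per module cluster, as in Fig. 1b. A short sketch, in which the function name and the string parsing are illustrative assumptions:

```python
from typing import Tuple

MODULES_PER_CLUSTER = 4  # as in the Fig. 1b example


def union_footprint(identifier: str) -> Tuple[int, int]:
    """Return (module clusters, operation modules) occupied by a UNION<n> task."""
    prefix = "UNION"
    if not identifier.startswith(prefix):
        raise ValueError(f"not a UNION identifier: {identifier}")
    clusters = int(identifier[len(prefix):])
    return clusters, clusters * MODULES_PER_CLUSTER


footprints = {
    name: union_footprint(name)
    for name in ("UNION1", "UNION2", "UNION4", "UNION8")
}
```

Here `footprints` maps UNION1, UNION2, UNION4, and UNION8 to 4, 8, 16, and 32 operation modules respectively.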
In one possible implementation, the plurality of operation modules are divided into a plurality of module clusters, and each module cluster includes one or more operation modules (as shown in Fig. 2b). Determining the target operation modules that need to execute the synchronous control instruction may include: determining, according to the identifier of the target task, all operation modules in the target module clusters involved in executing the target task as target operation modules, where all or only some of the operation modules in a target module cluster are used to execute the target task.
By the method, all the operation modules in the target module cluster related to the execution of the target task can be synchronously controlled.
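The difference between selecting only the modules that execute the target task and selecting every module of the clusters involved can be sketched as follows. The data layout (a dictionary of cluster members and a dictionary of running tasks) and all names are hypothetical.

```python
from typing import Dict, List


def targets_module_level(running: Dict[int, str], task_id: str) -> List[int]:
    """Only the operation modules that are executing the target task."""
    return sorted(m for m, t in running.items() if t == task_id)


def targets_cluster_level(
    clusters: Dict[str, List[int]], running: Dict[int, str], task_id: str
) -> List[int]:
    """Every operation module of each cluster in which the target task runs."""
    targets: List[int] = []
    for members in clusters.values():
        if any(running.get(m) == task_id for m in members):
            targets.extend(members)
    return sorted(targets)


# Hypothetical layout: two clusters of four modules; taskA uses part of cluster1.
clusters = {"cluster1": [1, 2, 3, 4], "cluster2": [5, 6, 7, 8]}
running = {1: "taskA", 2: "taskA", 5: "taskB"}

module_level = targets_module_level(running, "taskA")
cluster_level = targets_cluster_level(clusters, running, "taskA")
```

With this layout the module-level rule selects modules 1 and 2, while the cluster-level rule also includes modules 3 and 4 because they belong to the same target module cluster.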
For example, the control module may determine, according to a preset target task identifier, the number of module clusters that the identified target task needs to occupy, and from that determine the target operation modules that need to execute the synchronous control instruction. In this case, the synchronous control instruction may include only an operation code. The instruction format of the synchronous control instruction may be "sync_all0()"; based on this instruction, the device can synchronously control the operation modules in all the module clusters of the target task identified by the preset target task identifier.
Optionally, the synchronous control instruction may be included in a kernel function, and the general-purpose processor of the apparatus may send the instructions in the kernel function to the corresponding operation modules on the artificial intelligence processor for execution. The apparatus may further determine the identifier of the target task according to a characteristic of the kernel function, so that the target module cluster for the synchronous control instruction in the kernel function can be determined from that identifier. The control module can then take all the operation modules in the target module cluster as the target operation modules.
In one possible implementation, the operation code or the operation field may be used to indicate the identity of the target task.
In one possible implementation, the instruction format of the synchronous control instruction may be "sync_sign2_all1()", where sign2 is the identifier of the target task. Based on this instruction, the device can synchronously control all the operation modules in the target module clusters related to the target task identified as sign2.
In a possible implementation manner, the plurality of operation modules are divided into a plurality of module clusters, each module cluster includes one or more operation modules, and the operation code or the operation domain is used to indicate an identifier of a target module cluster. The determining of the target operation module that needs to execute the synchronous control instruction may include: and determining the operation module belonging to the target module cluster in the plurality of operation modules as a target operation module according to the identification of the target module cluster.
In this implementation manner, the identifier of the target module cluster may be identification information that can represent the target module cluster, such as a number, a serial number, a name, and the like of the target module cluster in the plurality of module clusters, which is not limited in this disclosure.
By the method, synchronous control of all the operation modules in one or more target module clusters can be realized.
In one possible implementation, the instruction format of the synchronous control instruction may be "sync_cluster", where cluster is the identifier of the target module cluster. Through this synchronous control instruction, the device can synchronously control all the operation modules in the target module cluster identified as cluster. When there are multiple target module clusters, the instruction format may be "sync_cluster0cluster1 … cluster", where cluster0cluster1 … cluster are the identifiers of the first, second, …, and nth target module clusters, respectively, so that all the operation modules in the multiple module clusters are synchronously controlled.
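As a minimal sketch of how target operation modules might be resolved from module cluster identifiers, the following Python snippet assumes a hypothetical table mapping each cluster identifier to the identifiers of its operation modules. The names `CLUSTERS` and `resolve_targets` and the example layout are illustrative assumptions, not part of the disclosure.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: cluster identifier -> identifiers of its operation modules.
CLUSTERS: Dict[str, List[int]] = {
    "cluster0": [0, 1, 2, 3],
    "cluster1": [4, 5, 6, 7],
    "cluster2": [8, 9],
}


def resolve_targets(cluster_ids: Sequence[str]) -> List[int]:
    """Return every operation module that belongs to one of the target clusters."""
    targets: List[int] = []
    for cid in cluster_ids:
        if cid not in CLUSTERS:
            raise KeyError(f"unknown module cluster: {cid}")
        targets.extend(CLUSTERS[cid])
    # Deduplicate and sort so the result does not depend on operand order.
    return sorted(set(targets))


print(resolve_targets(["cluster0", "cluster2"]))  # [0, 1, 2, 3, 8, 9]
```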
In one possible implementation, the operation code or the operation domain may be used to indicate the identifier of the target operation module. Determining the target operation module that needs to execute the synchronous control instruction according to the operation code or the operation domain may include: determining the target operation module from the plurality of operation modules according to the identifier of the target operation module.
In this implementation manner, the identifier of the target operation module may be identification information that can represent the target operation module, such as a number, a serial number, a name, and the like of the target operation module in the plurality of operation modules, which is not limited in this disclosure.
By the mode, synchronous control of one or more target operation modules can be realized.
In one possible implementation, the instruction format of the synchronous control instruction may be "syn_sign3_0sign3_1 … sign3_n", where sign3_0sign3_1 … sign3_n are the identifiers of the first, second, …, and nth target operation modules, respectively. Through this synchronous control instruction, the device can synchronously control the target operation modules identified as sign3_0sign3_1 … sign3_n.
In a possible implementation manner, when the target operation module is not indicated in both the operation code and the operation domain of the synchronous control instruction, the control module is further configured to determine a kernel (kernel) in which the synchronous control instruction is located, and determine an operation module, which calls the kernel, among the plurality of operation modules as the target operation module.
In this implementation, the device may call one or more kernel functions, and the operation module may call the kernel functions to perform tasks that require the kernel functions. The synchronous control instruction can be written in the kernel function in advance, and the control module can determine the target operation module according to the record of the operation module calling the kernel function.
When the control module controls the plurality of operation modules to execute the tasks, the operation modules (or under the control of the control module) can determine the kernel function to be called according to the task information such as the type of the executed task, the task parallelism and the like, so as to complete the tasks.
By the mode, the synchronous control of the target operation module calling the kernel function where the synchronous control instruction is located can be realized.
In one possible implementation, a quantity threshold is included in the operational domain. The control module is further configured to control the target operation module in the suspended state to enter the working state when it is determined that the number of the target operation modules in the suspended state reaches the number threshold.
In this implementation, on the basis of the determined target operation modules, the number of target operation modules for synchronous control may be limited, and the number threshold may be smaller than or equal to the number of the determined target operation modules.
In one possible implementation, the instruction format of the synchronization control instruction may be "barrier N". Wherein N is a number threshold. barrier is only used to indicate that the command is a synchronous control command and that its target signal is an arrival signal. Through the synchronous control instruction, the device can realize synchronous control on the target operation module calling the kernel function where the synchronous control instruction is located, and can control the target operation module in the pause state to enter the working state when the number of the target operation modules in the pause state is determined to reach the number threshold value.
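The number-threshold behavior can be illustrated with a small, single-threaded model. This is only a sketch under stated assumptions: the class `ControlModule` and its `arrive` method are hypothetical names, and a real device would track pause states in hardware rather than in a Python set.

```python
from typing import List, Set


class ControlModule:
    """Toy model of a control module handling a 'barrier N' instruction."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold      # N, the number threshold
        self.paused: Set[int] = set()   # target modules currently paused

    def arrive(self, module_id: int) -> List[int]:
        """Record that a module paused; return the modules released, if any."""
        self.paused.add(module_id)
        if len(self.paused) >= self.threshold:
            released = sorted(self.paused)
            self.paused.clear()         # all paused modules resume together
            return released
        return []


ctrl = ControlModule(threshold=3)
print(ctrl.arrive(5))  # []
print(ctrl.arrive(2))  # []
print(ctrl.arrive(7))  # [2, 5, 7]
```

In this model no module resumes until the third arrival, at which point all three are released in the same step.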
For the above-described synchronous control instructions "sync_sign1_all0()", "sync_sign2_all1()", "sync_cluster0cluster1 … cluster", "syn_sign3_0sign3_1 … sign3_n", and "barrier N", sync, syn, and barrier can also be used to indicate target signals: the target signals indicated by sync may be calculation queue signals and IO signals, and the target signals indicated by syn and barrier may be arrival signals. Whether each component of a synchronous control instruction (sync, syn, barrier, all0(), all1(), cluster0cluster1 … cluster, sign3_0sign3_1 … sign3_n) serves as an operation code or an operation domain, and its position in the instruction, may be set as needed, and the present disclosure is not limited thereto. Moreover, the above synchronous control instructions are only a few examples of the technical solutions of the present disclosure, and a person skilled in the art may set the instruction format according to the technical solutions of the present disclosure as needed, which is not limited by the present disclosure.
Fig. 2a shows a block diagram of a synchronous control instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2a, the operation module 12 may include a plurality of operators 120. The plurality of operators 120 are for performing operations corresponding to operation types of the computation instructions.
In this implementation, the operator may include an adder, a divider, a multiplier, a comparator, and the like capable of performing arithmetic operations, logical operations, and the like on data. The type and number of the arithmetic units may be set according to the requirements of the size of the data amount of the arithmetic operation to be performed, the type of the arithmetic operation, the processing speed and efficiency of the arithmetic operation on the data, and the like, which is not limited by the present disclosure.
Fig. 2b shows a block diagram of a synchronous control instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2b, the operation module 12 may include a master operation sub-module 121 and a plurality of slave operation sub-modules 122. The main arithmetic sub-module 121 may include a plurality of operators (not shown in the drawings).
The control module 11 is further configured to obtain a calculation instruction and the data to be operated on that is required for executing it, parse the calculation instruction to obtain a plurality of operation instructions, and send the data to be operated on and the plurality of operation instructions to the main operation sub-module 121.
And the operation module 12 is configured to perform operation on the data to be operated according to the calculation instruction to obtain an operation result.
The main operation sub-module 121 is configured to perform preamble processing on data to be operated, and transmit data and operation instructions with the plurality of slave operation sub-modules 122.
The slave operation submodule 122 is configured to execute intermediate operations in parallel according to the data and the operation instructions transmitted from the master operation submodule 121 to obtain a plurality of intermediate results, and to transmit the plurality of intermediate results to the master operation submodule 121.
The main operation sub-module 121 is further configured to perform subsequent processing on the plurality of intermediate results to obtain an operation result.
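The division of labor described above (preamble processing by the master, parallel intermediate operations by the slaves, subsequent processing by the master) can be sketched with a dot product. The choice of a dot product, the function names, and the use of a thread pool to stand in for slave operation sub-modules are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def slave_partial_dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Intermediate operation run by one slave: a partial dot product."""
    return sum(x * y for x, y in zip(a, b))


def master_dot(a: Sequence[float], b: Sequence[float], num_slaves: int = 4) -> float:
    """Master splits the data, slaves compute in parallel, master reduces."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Preamble processing: split the operands into one chunk per slave.
    chunk = max(1, -(-len(a) // num_slaves))  # ceiling division
    pieces: List[Tuple[Sequence[float], Sequence[float]]] = [
        (a[i:i + chunk], b[i:i + chunk]) for i in range(0, len(a), chunk)
    ]
    # Intermediate operations: each slave handles its chunk in parallel.
    with ThreadPoolExecutor(max_workers=num_slaves) as pool:
        intermediate = list(pool.map(lambda p: slave_partial_dot(*p), pieces))
    # Subsequent processing: combine the intermediate results.
    return sum(intermediate)


print(master_dot([1, 2, 3, 4, 5], [1, 1, 1, 1, 1], num_slaves=2))  # 15
```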
In this implementation, the calculation instruction may be an instruction for performing arithmetic operation, logical operation, and the like on data such as a scalar, a vector, and a tensor, and a person skilled in the art may set the calculation instruction according to actual needs, which is not limited by the present disclosure.
In this implementation manner, the control module is further configured to analyze the compiled calculation instruction to obtain an operation code of the calculation instruction, and obtain data to be calculated according to the operation code. Optionally, the control module is further configured to analyze the calculation instruction to obtain an operation code and an operation domain of the calculation instruction, and obtain data to be operated according to the operation code and the operation domain.
It should be noted that, a person skilled in the art may set the connection manner between the master operation submodule and the plurality of slave operation submodules according to actual needs to implement the configuration setting of the operation module, for example, the configuration of the operation module may be an "H" configuration, an array configuration, a tree configuration, and the like, which is not limited in the present disclosure.
Fig. 2c shows a block diagram of a synchronous control instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2c, the operation module 12 may further include one or more branch operation sub-modules 123, and the branch operation sub-module 123 is configured to forward data and/or operation instructions between the master operation sub-module 121 and the slave operation sub-module 122. The main operation sub-module 121 is connected to one or more branch operation sub-modules 123. Therefore, the main operation sub-module, the branch operation sub-module and the slave operation sub-module in the operation module are connected by adopting an H-shaped structure, and data and/or operation instructions are forwarded by the branch operation sub-module, so that the resource occupation of the main operation sub-module is saved, and the instruction processing speed is further improved.
Fig. 2d shows a block diagram of a synchronous control instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in FIG. 2d, a plurality of slave operation sub-modules 122 are distributed in an array.
Each slave operation submodule 122 is connected to another adjacent slave operation submodule 122, the master operation submodule 121 is connected to k slave operation submodules 122 of the plurality of slave operation submodules 122, and the k slave operation submodules 122 are: n slave operator sub-modules 122 of row 1, n slave operator sub-modules 122 of row m, and m slave operator sub-modules 122 of column 1.
As shown in fig. 2d, the k slave operation submodules include only the n slave operation submodules in the 1st row, the n slave operation submodules in the m-th row, and the m slave operation submodules in the 1st column; that is, the k slave operation submodules are those slave operation submodules that are directly connected to the master operation submodule. The k slave operation submodules forward data and instructions between the master operation submodule and the plurality of slave operation submodules. With the slave operation submodules distributed in an array, the master operation submodule can send data and/or operation instructions to the slave operation submodules faster, which further increases the instruction processing speed.
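To make the set of directly connected slave operation sub-modules concrete, the following sketch enumerates them for an m-by-n array using 1-based (row, column) coordinates. The function name is hypothetical, and the sketch assumes that the two corner sub-modules shared between a row and the first column are counted once.

```python
from typing import Set, Tuple


def directly_connected(m: int, n: int) -> Set[Tuple[int, int]]:
    """Slaves wired to the master: row 1, row m, and column 1 of an m x n array."""
    first_row = {(1, c) for c in range(1, n + 1)}
    last_row = {(m, c) for c in range(1, n + 1)}
    first_col = {(r, 1) for r in range(1, m + 1)}
    # Set union counts the shared corners (1, 1) and (m, 1) only once.
    return first_row | last_row | first_col


k_modules = directly_connected(m=4, n=4)
print(len(k_modules))       # 10
print((2, 1) in k_modules)  # True: column 1 is directly connected
print((2, 2) in k_modules)  # False: interior slaves are reached by forwarding
```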
Fig. 2e shows a block diagram of a synchronization control instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2e, the operation module may further include a tree sub-module 124. The tree submodule 124 includes a root port 401 and a plurality of branch ports 402. The root port 401 is connected to the master operation submodule 121, and the plurality of branch ports 402 are connected to the plurality of slave operation submodules 122, respectively. The tree sub-module 124 has a transceiving function, and is configured to forward data and/or operation instructions between the master operation sub-module 121 and the slave operation sub-module 122. Therefore, the operation modules are connected in a tree-shaped structure under the action of the tree-shaped sub-modules, and the speed of sending data and/or operation instructions from the main operation sub-module to the auxiliary operation sub-module can be increased by utilizing the forwarding function of the tree-shaped sub-modules, so that the instruction processing speed is increased.
In one possible implementation, the tree submodule 124 may be an optional component of the apparatus and may include at least one level of nodes. The nodes are line structures with a forwarding function and have no operation function. The lowest-level nodes are connected to the slave operation sub-modules to forward data and/or operation instructions between the master operation sub-module 121 and the slave operation sub-modules 122. In particular, if the tree submodule has zero levels of nodes, the apparatus does not require the tree submodule.
In one possible implementation, the tree submodule 124 may include a plurality of nodes of an n-ary tree structure, and the plurality of nodes of the n-ary tree structure may have a plurality of layers.
For example, fig. 2f shows a block diagram of a synchronous control instruction processing apparatus according to an embodiment of the present disclosure. As shown in FIG. 2f, the n-ary tree structure may be a binary tree structure with tree-type sub-modules including 2 levels of nodes 01. The lowest level node 01 is connected with the slave operation sub-module 122 to forward data and/or operation instructions between the master operation sub-module 121 and the slave operation sub-module 122.
In this implementation, the n-ary tree structure may also be a ternary tree structure or the like, where n is a positive integer greater than or equal to 2. The value of n and the number of layers of nodes in the n-ary tree structure may be set by those skilled in the art as needed, and the disclosure is not limited thereto.
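The forwarding role of the tree nodes can be sketched as a recursive traversal. The nested-list representation, the function name `forward`, and the particular binary layout (one root node, two lower nodes, four slaves) are assumptions for illustration and may not match the figure exactly.

```python
from typing import Dict, List, Union

# A tree is either a slave identifier (leaf) or a forwarding node (list of subtrees).
Tree = Union[int, List["Tree"]]


def forward(tree: Tree, payload: str, delivered: Dict[int, str]) -> None:
    """Forward a payload from a node down to every slave beneath it."""
    if isinstance(tree, int):
        delivered[tree] = payload  # reached a slave operation sub-module
        return
    for child in tree:             # a node only forwards; it does no computation
        forward(child, payload, delivered)


# Binary tree with 2 levels of nodes: one root node, two lower nodes, four slaves.
binary_tree: Tree = [[0, 1], [2, 3]]
delivered: Dict[int, str] = {}
forward(binary_tree, "operation instruction", delivered)
print(sorted(delivered))  # [0, 1, 2, 3]
```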
In one possible implementation, as shown in fig. 2 a-2 f, the apparatus may further include a storage module 13. The storage module 13 is used for storing data to be operated.
In this implementation, the storage module may include one or more of a cache and a register, and the cache may include a temporary cache and may further include at least one NRAM (Neuron Random Access Memory). The cache can be used for storing data to be operated on, and the register can be used for storing scalar data in the data to be operated on.
In one possible implementation, the cache may include a neuron cache. The neuron buffer, i.e., the neuron random access memory, may be configured to store neuron data in data to be operated on, where the neuron data may include neuron vector data. The data to be calculated comprises data related to scalar type conversion and/or data related to operation of other calculation instructions.
In a possible implementation manner, the apparatus may further include a direct memory access module for reading or storing data from the storage module.
In one possible implementation, as shown in fig. 2 a-2 f, the control module 11 may include an instruction storage sub-module 111, an instruction processing sub-module 112, and a queue storage sub-module 113.
The instruction storage submodule 111 is used for storing synchronous control instructions and calculation instructions.
The instruction processing sub-module 112 is configured to analyze the synchronization control instruction and the calculation instruction respectively to obtain corresponding operation codes. Namely, the synchronous control instruction is analyzed to obtain the operation code of the synchronous control instruction, and the calculation instruction is analyzed to obtain the operation code of the calculation instruction. Optionally, the instruction processing sub-module 112 is configured to analyze the synchronization control instruction and the calculation instruction respectively to obtain a corresponding operation code and an operation domain.
The queue storage submodule 113 is configured to store an instruction queue, where the instruction queue includes multiple instructions to be executed that are sequentially arranged according to an execution order, and the multiple instructions to be executed may include a synchronization control instruction and a computation instruction.
In this implementation manner, the execution order of the multiple instructions to be executed may be arranged according to the receiving time, the priority level, and the like of the instructions to be executed to obtain an instruction queue, so that the multiple instructions to be executed are sequentially executed according to the instruction queue.
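One way such an ordering could be produced is to sort pending instructions by priority and then by receiving time. The sketch below assumes that a lower priority value means earlier execution and that receiving time is an integer timestamp; the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass(order=True)
class PendingInstruction:
    priority: int                    # assumption: lower value executes earlier
    received_at: int                 # assumption: integer receive timestamp
    text: str = field(compare=False)


def build_queue(instructions: Iterable[PendingInstruction]) -> List[str]:
    """Order instructions by priority, breaking ties by receiving time."""
    return [ins.text for ins in sorted(instructions)]


pending = [
    PendingInstruction(1, 10, "add"),
    PendingInstruction(0, 12, "barrier 16"),
    PendingInstruction(1, 8, "mul"),
]
print(build_queue(pending))  # ['barrier 16', 'mul', 'add']
```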
In one possible implementation, as shown in fig. 2 a-2 f, the control module 11 may further include a dependency processing sub-module 114. The dependency relationship processing submodule 114 is configured to, when it is determined that a first to-be-executed instruction in the multiple to-be-executed instructions is associated with a zeroth to-be-executed instruction before the first to-be-executed instruction, cache the first to-be-executed instruction in the instruction storage submodule 111, and after the zeroth to-be-executed instruction is executed, extract the first to-be-executed instruction from the instruction storage submodule 111 and send the first to-be-executed instruction to the operation module 12.
The first to-be-executed instruction is associated with the zeroth to-be-executed instruction preceding it when the first storage address interval, which stores the data required by the first to-be-executed instruction, overlaps the zeroth storage address interval, which stores the data required by the zeroth to-be-executed instruction. Conversely, the two instructions are not associated when the first storage address interval and the zeroth storage address interval have no overlapping area.
By the method, according to the dependency relationship between the first to-be-executed instruction and the zeroth to-be-executed instruction before the first to-be-executed instruction, the subsequent first to-be-executed instruction is executed after the execution of the previous zeroth to-be-executed instruction is finished, and the accuracy of the operation result is ensured.
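The overlap test on storage address intervals can be written in a few lines. This sketch assumes half-open intervals (start inclusive, end exclusive); the type and function names are illustrative.

```python
from typing import NamedTuple


class AddressInterval(NamedTuple):
    start: int  # inclusive (assumption)
    end: int    # exclusive (assumption)


def has_dependency(first: AddressInterval, zeroth: AddressInterval) -> bool:
    """True when the two storage address intervals share at least one address."""
    return first.start < zeroth.end and zeroth.start < first.end


print(has_dependency(AddressInterval(0, 16), AddressInterval(8, 24)))   # True
print(has_dependency(AddressInterval(0, 16), AddressInterval(16, 32)))  # False
```

When the test returns True, the first instruction would be held back until the zeroth instruction has finished; otherwise it can be issued without waiting.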
In one possible implementation manner, the apparatus may be disposed in one or more of a Graphics Processing Unit (GPU), a Central Processing Unit (CPU), and an embedded Neural Network Processor (NPU).
It should be noted that, although the synchronous control instruction processing apparatus has been described above by taking the above-described embodiment as an example, those skilled in the art will understand that the present disclosure should not be limited thereto. In fact, the user can flexibly set each module according to personal preference and/or actual application scene, as long as the technical scheme of the disclosure is met.
Application example
An application example according to the embodiment of the present disclosure is given below, taking "synchronous control with a synchronous control instruction processing apparatus" as an exemplary application scenario, to facilitate understanding of the flow of the synchronous control instruction processing apparatus. It is understood by those skilled in the art that the following application example is merely for the purpose of facilitating understanding of the embodiments of the present disclosure and should not be construed as limiting the embodiments of the present disclosure.
Fig. 3 is a schematic diagram illustrating an application scenario of a synchronous control instruction processing apparatus according to an embodiment of the present disclosure. As shown in fig. 3, the process of processing the synchronization control command by the synchronization control command processing means is as follows:
The control module 11 parses the acquired synchronous control instruction 1 (for example, "barrier 16") to obtain its operation code and operation domain. The operation code of synchronous control instruction 1 is barrier and the number threshold is 16; according to barrier, the target signal is determined to be an arrival signal. The control module 11 sends the synchronous control instruction to all the operation modules of the device.
When executing the synchronous control instruction, each target operation module among the plurality of operation modules 12 puts the processing related to the arrival signal into the pause state.
The control module 11 also monitors the working states of the plurality of operation modules 12 and, when it determines that the number of operation modules in the pause state has reached the number threshold of 16, controls the processing related to the arrival signal in those 16 paused operation modules 12 to enter the working state synchronously.
Thus, the synchronous control instruction processing device can efficiently and quickly process the synchronous control instruction. The working process of the above modules can refer to the above related description.
The present disclosure provides a machine learning arithmetic device, which may include one or more of the above-described synchronous control instruction processing devices and is configured to acquire data to be operated on and control information from other processing devices and to execute a specified machine learning operation. The machine learning arithmetic device can obtain a synchronous control instruction from other machine learning arithmetic devices or non-machine-learning arithmetic devices, and transmit the execution result to peripheral equipment (also called other processing devices) through an I/O interface. Examples of peripheral equipment include cameras, displays, mice, keyboards, network cards, WiFi interfaces, and servers. When more than one synchronous control instruction processing device is included, the devices can be linked and can transmit data through a specific structure, for example, interconnected through a PCIE bus, so as to support larger-scale neural network operations. In this case, the devices may share one control system or have separate control systems, and they may share memory or each accelerator may have its own memory. In addition, the interconnection may use any interconnection topology.
The machine learning arithmetic device has high compatibility and can be connected with various types of servers through PCIE interfaces.
Fig. 4a shows a block diagram of a combined processing device according to an embodiment of the present disclosure. As shown in fig. 4a, the combined processing device includes the machine learning arithmetic device, the universal interconnection interface, and other processing devices. The machine learning arithmetic device interacts with other processing devices to jointly complete the operation designated by the user.
Other processing devices include one or more types of general-purpose/special-purpose processors, such as central processing units (CPUs), graphics processing units (GPUs), and neural network processors. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning arithmetic device and external data and control; they perform data transfer and basic control of the machine learning arithmetic device, such as starting and stopping it. The other processing devices may also cooperate with the machine learning arithmetic device to complete computing tasks.
And the universal interconnection interface is used for transmitting data and control instructions between the machine learning arithmetic device and other processing devices. The machine learning arithmetic device acquires required input data from other processing devices and writes the input data into a storage device on the machine learning arithmetic device; control instructions can be obtained from other processing devices and written into a control cache on a machine learning arithmetic device chip; the data in the storage module of the machine learning arithmetic device can also be read and transmitted to other processing devices.
Fig. 4b shows a block diagram of a combined processing device according to an embodiment of the present disclosure. In a possible implementation manner, as shown in fig. 4b, the combined processing device may further include a storage device, and the storage device is connected to the machine learning operation device and the other processing device respectively. The storage device is used for storing data stored in the machine learning arithmetic device and the other processing device, and is particularly suitable for data which is required to be calculated and cannot be stored in the internal storage of the machine learning arithmetic device or the other processing device.
The combined processing device can be used as the SOC (system on chip) of equipment such as a mobile phone, a robot, an unmanned aerial vehicle, or video monitoring equipment, which effectively reduces the core area of the control part, increases the processing speed, and reduces the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, a display, a mouse, a keyboard, a network card, or a WiFi interface.
The present disclosure provides a machine learning chip, which includes the above machine learning arithmetic device or combined processing device.
The present disclosure provides a machine learning chip package structure, which includes the above machine learning chip.
Fig. 5 shows a schematic structural diagram of a board card according to an embodiment of the present disclosure. As shown in fig. 5, the board includes the above-mentioned machine learning chip package structure or the above-mentioned machine learning chip. The board may include, in addition to the machine learning chip 389, other kits including, but not limited to: memory device 390, interface device 391 and control device 392.
The memory device 390 is coupled to the machine learning chip 389 (or a machine learning chip within a machine learning chip package structure) via a bus and is used for storing data. Memory device 390 may include multiple groups of memory cells 393. Each group of memory cells 393 is coupled to the machine learning chip 389 via a bus. It is understood that each group of memory cells 393 may be DDR SDRAM (Double Data Rate SDRAM).
DDR can double the speed of SDRAM without increasing the clock frequency. DDR allows data to be read out on the rising and falling edges of the clock pulse. DDR is twice as fast as standard SDRAM.
In one embodiment, memory device 390 may include 4 groups of memory cells 393. Each group of memory cells 393 may include a plurality of DDR4 chips (particles). In one embodiment, the machine learning chip 389 may include four 72-bit DDR4 controllers, in each of which 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be seen that when DDR4-3200 chips are used in each group of memory cells 393, the theoretical data transfer bandwidth can reach 25600 MB/s.
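The 25600 MB/s figure follows from the transfer rate and the data width of one controller. The short calculation below assumes 3200 mega-transfers per second for DDR4-3200 and counts only the 64 data bits (the 8 ECC bits carry no payload); the function name is illustrative.

```python
def ddr_bandwidth_mb_s(transfers_mt_s: int, data_bits: int) -> float:
    """Theoretical bandwidth in MB/s: transfers per second times bytes per transfer."""
    return transfers_mt_s * data_bits / 8


# DDR4-3200 with a 64-bit data path (ECC bits excluded).
print(ddr_bandwidth_mb_s(3200, 64))  # 25600.0
```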
In one embodiment, each group 393 of memory cells includes a plurality of double rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling DDR is provided in the machine learning chip 389 for controlling data transfer and data storage of each memory unit 393.
Interface device 391 is electrically coupled to the machine learning chip 389 (or a machine learning chip within a machine learning chip package structure). The interface device 391 is used to implement data transmission between the machine learning chip 389 and an external device (e.g., a server or a computer). For example, in one embodiment, the interface device 391 may be a standard PCIE interface: the data to be processed is transmitted from the server to the machine learning chip 389 through the standard PCIE interface, thereby implementing data transfer. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device 391 may also be another interface; the disclosure does not limit the specific form of the other interface, as long as the interface device can implement the transfer function. In addition, the calculation result of the machine learning chip is transmitted back to the external device (e.g., a server) by the interface device.
The control device 392 is electrically connected to the machine learning chip 389. The control device 392 is used to monitor the state of the machine learning chip 389. Specifically, the machine learning chip 389 and the control device 392 may be electrically connected through an SPI interface. The control device 392 may include a microcontroller unit (MCU). The machine learning chip 389 may include multiple processing chips, multiple processing cores, or multiple processing circuits, and may drive multiple loads. Therefore, the machine learning chip 389 can be in different working states, such as heavy load and light load. The control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the machine learning chip.
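The 16000 MB/s figure corresponds to the raw signaling rate of the link. The sketch below assumes the PCIe 3.0 per-lane rate of 8 GT/s and also shows the effect of 128b/130b line encoding, which lowers the usable figure slightly; the function name is illustrative.

```python
def pcie_bandwidth_mb_s(gt_per_s: float, lanes: int, encoding: float = 1.0) -> float:
    """Per-direction bandwidth in MB/s for a PCIe link.

    encoding is the payload fraction of the line code (1.0 means raw rate).
    """
    return gt_per_s * 1000 * lanes * encoding / 8


raw = pcie_bandwidth_mb_s(8.0, 16)                    # raw signaling rate
effective = pcie_bandwidth_mb_s(8.0, 16, 128 / 130)   # after 128b/130b encoding
print(raw)                  # 16000.0
print(round(effective, 1))  # 15753.8
```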
The present disclosure provides an electronic device, which includes the above machine learning chip or board card.
The electronic device may include a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a tachograph, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle may include an aircraft, a ship, and/or a land vehicle. The household appliances may include televisions, air conditioners, microwave ovens, refrigerators, electric rice cookers, humidifiers, washing machines, electric lamps, gas cookers, and range hoods. The medical device may include a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus, and/or an electrocardiograph.
Fig. 6 illustrates a flow diagram of a synchronization control instruction processing method according to an embodiment of the present disclosure. As shown in fig. 6, the method is applied to the above-described synchronous control instruction processing apparatus, and includes steps S51 through S53.
In step S51, the control module is controlled to parse the acquired synchronous control instruction to obtain an operation code of the synchronous control instruction, and to determine a target operation module that needs to execute the synchronous control instruction. The operation code indicates that the synchronous control instruction is used for synchronously controlling a plurality of operation modules of the apparatus.
In step S52, the target operation module is controlled to enter a pause state when executing the synchronous control instruction.
In step S53, the control module is controlled to monitor the operating states of the plurality of operation modules, and to control the target operation modules in the pause state to synchronously enter the working state when it determines that the target operation modules are all in the pause state.
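Steps S51 to S53 behave like a barrier across the target operation modules. The following is a minimal single-threaded sketch of that behavior, not the disclosed hardware implementation. The class names, the dictionary encoding of the instruction, and the "SYNC" opcode string are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    WORKING = "working"
    PAUSED = "paused"


@dataclass
class OperationModule:
    module_id: int
    state: State = State.WORKING

    def execute_sync(self) -> None:
        # Step S52: a target module pauses when it executes the instruction.
        self.state = State.PAUSED


class ControlModule:
    def __init__(self, modules):
        self.modules = {m.module_id: m for m in modules}
        self.targets = []

    def parse(self, instruction: dict) -> str:
        # Step S51: obtain the opcode and determine the target modules.
        # The dict layout is a hypothetical encoding, not the real format.
        self.targets = [self.modules[i] for i in instruction["targets"]]
        return instruction["opcode"]

    def monitor(self) -> bool:
        # Step S53: release the targets only once all of them are paused.
        if self.targets and all(m.state is State.PAUSED for m in self.targets):
            for m in self.targets:
                m.state = State.WORKING
            return True
        return False


modules = [OperationModule(i) for i in range(4)]
ctrl = ControlModule(modules)
opcode = ctrl.parse({"opcode": "SYNC", "targets": [0, 1, 2]})
modules[0].execute_sync()
modules[1].execute_sync()
released_early = ctrl.monitor()  # module 2 has not reached the instruction yet
modules[2].execute_sync()
released = ctrl.monitor()        # all three targets are paused, so they resume
```

In this sketch module 3 is not a target, so its state never affects the release decision.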
In a possible implementation manner, the synchronization control instruction further includes an operation field, where the operation code is used to indicate a target signal that the target operation module needs to synchronize, or the operation field includes the target signal that the target operation module needs to synchronize, so that the control module determines the target signal according to the operation code or the operation field,
wherein, controlling the target operation module to enter a pause state when executing the synchronous control instruction comprises:
controlling the target operation module to control the processing corresponding to the target signal determined by the control module to enter a pause state when the synchronous control instruction is executed,
wherein the target signal comprises at least one of: a computation queue signal, an IO signal, and an arrival signal.
In one possible implementation, determining a target operation module that needs to execute a synchronous control instruction includes:
determining an operation module, which executes the target task, of the plurality of operation modules as a target operation module according to the identifier of the target task, where the identifier of the target task includes at least one of: task name, task type, task number.
In one possible implementation, the plurality of operation modules are divided into a plurality of module clusters, each module cluster comprises one or more operation modules,
the method for determining the target operation module which needs to execute the synchronous control instruction comprises the following steps:
determining all operation modules in a target module cluster related to the execution of the target task in the plurality of module clusters as target operation modules according to the identification of the target task, wherein all or part of the operation modules in the target cluster are used for executing the target task, and the identification of the target task comprises at least one of the following: task name, task type, task number.
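The two task-based selection rules above differ in granularity: the first selects only the modules that execute the target task, while the second selects every module of any cluster involved in the task. A minimal sketch follows, assuming a hypothetical mapping from module number to task name and from cluster name to member modules. Neither data layout comes from the disclosure.

```python
def targets_by_task(module_tasks, task_id):
    """Select only the modules that execute the target task."""
    return sorted(m for m, t in module_tasks.items() if t == task_id)


def targets_by_task_cluster(clusters, module_tasks, task_id):
    """Select all modules of every cluster in which at least one
    module executes the target task."""
    selected = []
    for members in clusters.values():
        if any(module_tasks.get(m) == task_id for m in members):
            selected.extend(members)
    return sorted(selected)


# Hypothetical example data: task names keyed by module number.
module_tasks = {0: "conv", 1: "conv", 2: "pool", 3: None}
clusters = {"c0": [0, 2], "c1": [1], "c2": [3]}

per_module = targets_by_task(module_tasks, "conv")
per_cluster = targets_by_task_cluster(clusters, module_tasks, "conv")
```

Here module 2 is selected by the cluster-level rule even though it runs a different task, because it shares cluster c0 with module 0.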
In a possible implementation manner, the operation code or the operation field is used to indicate the identifier of the target task.
In a possible implementation manner, the plurality of operation modules are divided into a plurality of module clusters, each module cluster comprises one or more operation modules, the operation code or the operation domain is used for indicating the identification of a target module cluster,
the method for determining the target operation module which needs to execute the synchronous control instruction comprises the following steps:
and determining the operation module belonging to the target module cluster in the plurality of operation modules as a target operation module according to the identification of the target module cluster.
In one possible implementation, the operation code or the operation field is used to indicate the identity of the target operation module,
the method for determining the target operation module which needs to execute the synchronous control instruction comprises the following steps:
and determining a target operation module from the plurality of operation modules according to the identification of the target operation module.
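When the instruction carries the identification of a module cluster or of individual modules directly, target selection reduces to a lookup. A minimal sketch under assumed data structures follows. The cluster dictionary and the integer module identifiers are illustrative only.

```python
def targets_by_cluster_id(clusters, cluster_id):
    """All modules belonging to the identified cluster."""
    return sorted(clusters[cluster_id])


def targets_by_module_ids(all_modules, module_ids):
    """The identified modules, restricted to modules that actually exist."""
    wanted = set(module_ids)
    return [m for m in all_modules if m in wanted]


# Hypothetical layout: two clusters over four modules.
clusters = {"c0": [0, 1], "c1": [2, 3]}
all_modules = [0, 1, 2, 3]

from_cluster = targets_by_cluster_id(clusters, "c1")
from_ids = targets_by_module_ids(all_modules, [3, 0, 7])
```

Identifier 7 does not correspond to any module in this example and is ignored. Whether an unknown identifier should instead raise an error is a design choice the disclosure does not address.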
In a possible implementation manner, the control module is controlled to determine a kernel function where the synchronous control instruction is located, and an operation module, which calls the kernel function, of the plurality of operation modules is determined as a target operation module.
In one possible implementation, the operation domain includes a quantity threshold, and the method further includes:
and controlling the control module to control the target operation module in the pause state to enter a working state when the control module determines that the number of the target operation modules in the pause state reaches the number threshold.
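With a quantity threshold, the release condition is relaxed from "all targets paused" to "at least the threshold number of targets paused". A minimal sketch follows, assuming module states are tracked as strings in a dictionary. This representation is an assumption, not part of the disclosure.

```python
def release_if_threshold_met(states, targets, threshold):
    """Resume the paused target modules once enough of them are paused.

    Returns the list of modules that were released (empty if none)."""
    paused = [m for m in targets if states[m] == "paused"]
    if len(paused) >= threshold:
        for m in paused:
            states[m] = "working"
        return paused
    return []


states = {0: "paused", 1: "paused", 2: "working"}
not_yet = release_if_threshold_met(states, [0, 1, 2], threshold=3)
released = release_if_threshold_met(states, [0, 1, 2], threshold=2)
```

In this example two paused modules are not enough for a threshold of three, but they are released once the threshold is two.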
In one possible implementation, the operation module includes a master operation submodule and a plurality of slave operation submodules, and the method may further include:
controlling the control module to obtain a calculation instruction and the data to be operated on that is required for executing the calculation instruction, and to parse the calculation instruction to obtain a plurality of operation instructions;
controlling the master operation submodule to perform preorder processing on the data to be operated on and to transmit data and operation instructions;
controlling the slave operation submodules to execute intermediate operations in parallel according to the transmitted data and operation instructions to obtain a plurality of intermediate results;
and controlling the master operation submodule to perform subsequent processing on the plurality of intermediate results to obtain an operation result.
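The master/slave flow above can be illustrated with a dot product: the master splits the operands (preorder processing), the slaves compute partial sums in parallel (intermediate operations), and the master adds the partial sums (subsequent processing). The choice of a dot product, the chunking scheme, and the use of a thread pool to stand in for slave submodules are all assumptions made for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def master_preprocess(a, b, n_slaves):
    """Preorder processing: split the operands into one chunk per slave."""
    step = max(1, -(-len(a) // n_slaves))  # ceiling division, at least 1
    return [(a[i:i + step], b[i:i + step]) for i in range(0, len(a), step)]


def slave_compute(chunk):
    """Intermediate operation: partial dot product of one chunk."""
    xs, ys = chunk
    return sum(x * y for x, y in zip(xs, ys))


def master_postprocess(intermediate_results):
    """Subsequent processing: combine the intermediate results."""
    return sum(intermediate_results)


def run(a, b, n_slaves=4):
    chunks = master_preprocess(a, b, n_slaves)
    # The thread pool stands in for the slave operation submodules.
    with ThreadPoolExecutor(max_workers=n_slaves) as pool:
        partials = list(pool.map(slave_compute, chunks))
    return master_postprocess(partials)


result = run([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], n_slaves=2)
```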
In one possible implementation, the method may further include: storing the data to be operated on in a storage module of the apparatus,
wherein the storage module comprises at least one of a register and a cache,
the cache is used for storing the data to be operated, and comprises at least one neuron cache NRAM;
the register is used for storing scalar data in the data to be operated;
the neuron cache is used for storing neuron data in the data to be operated, wherein the neuron data comprises neuron vector data.
In one possible implementation, the method may further include:
the control module stores a synchronous control instruction and a calculation instruction;
the control module respectively analyzes the synchronous control instruction and the calculation instruction to obtain a corresponding operation code and an operation domain;
the control module stores an instruction queue, the instruction queue comprises a plurality of instructions to be executed which are sequentially arranged according to an execution sequence, and the plurality of instructions to be executed comprise synchronous control instructions and calculation instructions.
In one possible implementation, the method may further include: when determining that the first to-be-executed instruction in the plurality of to-be-executed instructions is in an association relation with a zeroth to-be-executed instruction before the first to-be-executed instruction, caching the first to-be-executed instruction, and after determining that the zeroth to-be-executed instruction is completely executed, controlling to execute the first to-be-executed instruction,
wherein the association relationship between the first to-be-executed instruction and the zeroth to-be-executed instruction before the first to-be-executed instruction may include: a first storage address interval storing the data required by the first to-be-executed instruction and a zeroth storage address interval storing the data required by the zeroth to-be-executed instruction have an overlapping area.
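The association test above is an interval-overlap check on storage addresses. A minimal sketch follows, assuming each instruction is represented by a half-open address interval (start, end). Both the representation and the function names are illustrative, not taken from the disclosure.

```python
def intervals_overlap(first, zeroth):
    """Half-open intervals (start, end) overlap when each one starts
    before the other one ends."""
    return first[0] < zeroth[1] and zeroth[0] < first[1]


def blocking_instructions(queue, index):
    """Indices of earlier instructions whose address interval overlaps
    that of the instruction at `index`; it must wait for all of them."""
    current = queue[index]
    return [i for i in range(index) if intervals_overlap(current, queue[i])]


# Hypothetical instruction queue: one address interval per instruction.
queue = [(0, 64), (64, 128), (32, 96)]

deps_of_second = blocking_instructions(queue, 1)
deps_of_third = blocking_instructions(queue, 2)
```

The second instruction only touches the first one's interval at its boundary, so it is not held back, while the third overlaps both earlier instructions and would be cached until they finish.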
The present disclosure also provides a non-transitory computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, implement the above-described synchronization control instruction processing method.
It is noted that while for simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present disclosure is not limited by the order of acts, as some steps may, in accordance with the present disclosure, occur in other orders and concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are exemplary embodiments and that acts and modules referred to are not necessarily required by the disclosure.
It should be further noted that, although the steps in the flowchart of fig. 6 are shown in sequence as indicated by the arrows, the steps are not necessarily executed in that sequence. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and described, and may be performed in other orders. Moreover, at least a portion of the steps in fig. 6 may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, and the order of performance of the sub-steps or stages is not necessarily sequential, but may be performed in turn or alternately with other steps or at least a portion of the sub-steps or stages of other steps.
It should be understood that the above-described apparatus embodiments are merely exemplary, and that the apparatus of the present disclosure may be implemented in other ways. For example, the division of the units/modules in the above embodiments is only one logical function division, and there may be another division manner in actual implementation. For example, multiple units, modules, or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented.
In addition, unless otherwise specified, each functional unit/module in the embodiments of the present disclosure may be integrated into one unit/module, each unit/module may exist alone physically, or two or more units/modules may be integrated together. The integrated units/modules may be implemented in the form of hardware or software program modules.
If the integrated unit/module is implemented in hardware, the hardware may be digital circuits, analog circuits, etc. Physical implementations of hardware structures include, but are not limited to, transistors, memristors, and the like. Unless otherwise specified, the storage module may be any suitable magnetic storage medium or magneto-optical storage medium, such as Resistive Random Access Memory (RRAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Enhanced Dynamic Random Access Memory (eDRAM), High-Bandwidth Memory (HBM), Hybrid Memory Cube (HMC), and so on.
The integrated units/modules, if implemented in the form of software program modules and sold or used as a stand-alone product, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned memory includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and various other media capable of storing program code.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments. The technical features of the embodiments may be arbitrarily combined. For the sake of brevity, not all possible combinations of the technical features in the embodiments are described, but any such combination should be considered within the scope of the present specification as long as the combined technical features do not contradict one another.
The foregoing may be better understood in light of the following clauses:
Clause A1. A synchronous control instruction processing apparatus, the apparatus including a control module and a plurality of operation modules,
the control module is used for analyzing the obtained synchronous control instruction to obtain an operation code of the synchronous control instruction and determining a target operation module which needs to execute the synchronous control instruction;
the target operation module is used for entering a pause state when the synchronous control instruction is executed;
the control module is also used for monitoring the running states of the plurality of operation modules, controlling the target operation modules in the pause state to synchronously enter the working state when the target operation modules are all determined to be in the pause state,
the operation code is used for indicating the synchronous control instruction to be used for synchronously controlling a plurality of operation modules of the device.
Clause A2. The apparatus according to clause A1, wherein the synchronous control instruction further comprises an operation field, the opcode indicating a target signal that the target operation module needs to synchronize, or the operation field including the target signal that the target operation module needs to synchronize, so that the control module determines the target signal according to the opcode or the operation field,
the target operation module is also used for controlling the processing corresponding to the target signal determined by the control module to enter a pause state when the synchronous control instruction is executed,
wherein the target signal comprises at least one of: a computation queue signal, an IO signal, and an arrival signal.
Clause A3. The apparatus of clause A1 or clause A2, wherein determining the target operation module that needs to execute the synchronous control instruction comprises:
determining an operation module, which executes the target task, of the plurality of operation modules as a target operation module according to the identifier of the target task, where the identifier of the target task includes at least one of: task name, task type, task number.
Clause A4. The apparatus of clause A1 or clause A2, wherein the plurality of operation modules are divided into a plurality of module clusters, each module cluster including one or more operation modules,
the method for determining the target operation module which needs to execute the synchronous control instruction comprises the following steps:
determining all operation modules in a target module cluster related to the execution of the target task in the plurality of module clusters as target operation modules according to the identification of the target task, wherein all or part of the operation modules in the target cluster are used for executing the target task, and the identification of the target task comprises at least one of the following: task name, task type, task number.
Clause A5. The apparatus of clause A3 or clause A4, wherein the opcode or the operation field is used to indicate an identification of the target task.
Clause A6. The apparatus of clause A1 or clause A2, wherein the plurality of operation modules are divided into a plurality of module clusters, each module cluster including one or more operation modules, and the opcode or the operation field is used to indicate an identification of a target module cluster,
the method for determining the target operation module which needs to execute the synchronous control instruction comprises the following steps:
and determining the operation module belonging to the target module cluster in the plurality of operation modules as a target operation module according to the identification of the target module cluster.
Clause A7. The apparatus of clause A1 or clause A2, wherein the opcode or the operation field is used to indicate an identification of the target operation module,
the method for determining the target operation module which needs to execute the synchronous control instruction comprises the following steps:
and determining a target operation module from the plurality of operation modules according to the identification of the target operation module.
Clause A8. The apparatus of clause A1 or clause A2, wherein the control module is further configured to determine a kernel function in which the synchronous control instruction is located, and to determine an operation module, of the plurality of operation modules, that calls the kernel function as a target operation module.
Clause A9. The apparatus of clause A3 or clause A8, wherein the operation field includes a quantity threshold,
the control module is further configured to control the target operation module in the suspended state to enter the working state when it is determined that the number of the target operation modules in the suspended state reaches the number threshold.
Clause A10. The apparatus of clause A1, wherein the operation module comprises a master operation submodule and a plurality of slave operation submodules,
the control module is further configured to obtain a calculation instruction and the data to be operated on that is required for executing the calculation instruction, parse the calculation instruction to obtain a plurality of operation instructions, and send the data to be operated on and the plurality of operation instructions to the master operation submodule;
the master operation submodule is used for performing preorder processing on the data to be operated on, and for transmitting data and operation instructions to and from the plurality of slave operation submodules;
the slave operation submodules are used for executing intermediate operations in parallel according to the data and the operation instructions transmitted from the master operation submodule to obtain a plurality of intermediate results, and for transmitting the plurality of intermediate results to the master operation submodule;
and the master operation submodule is also used for performing subsequent processing on the plurality of intermediate results to obtain an operation result.
Clause A11. The apparatus of clause A10, further comprising:
a storage module for storing the data to be operated,
wherein the storage module comprises at least one of a register and a cache,
the cache is used for storing the data to be operated, and comprises at least one neuron cache NRAM;
the register is used for storing scalar data in the data to be operated;
the neuron cache is used for storing neuron data in the data to be operated, wherein the neuron data comprises neuron vector data.
Clause A12. The apparatus of clause A10, wherein the control module comprises:
the instruction storage submodule is used for storing the synchronous control instruction and the calculation instruction;
the instruction processing submodule is used for respectively analyzing the synchronous control instruction and the calculation instruction to obtain a corresponding operation code and an operation domain;
and the queue storage submodule is used for storing an instruction queue, the instruction queue comprises a plurality of instructions to be executed which are sequentially arranged according to an execution sequence, and the plurality of instructions to be executed comprise the synchronous control instruction and the calculation instruction.
Clause A13. The apparatus of clause A12, wherein the control module further comprises:
the dependency relationship processing submodule is used for caching a first instruction to be executed in the instruction storage submodule when the fact that the first instruction to be executed in the plurality of instructions to be executed is associated with a zeroth instruction to be executed before the first instruction to be executed is determined, extracting the first instruction to be executed from the instruction storage submodule after the zeroth instruction to be executed is executed, and sending the first instruction to be executed to the operation module,
wherein the association relationship between the first to-be-executed instruction and a zeroth to-be-executed instruction before the first to-be-executed instruction comprises:
and a first storage address interval for storing the data required by the first instruction to be executed and a zeroth storage address interval for storing the data required by the zeroth instruction to be executed have an overlapped area.
Clause A14. A machine learning arithmetic device, the device comprising:
one or more synchronous control instruction processing apparatuses as described in any of clauses A1 to A13, configured to obtain data to be operated on and control information from other processing devices, execute a specified machine learning operation, and transmit an execution result to the other processing devices through an I/O interface;
when the machine learning arithmetic device comprises a plurality of synchronous control instruction processing apparatuses, the plurality of synchronous control instruction processing apparatuses can be connected through a specific structure and transmit data;
the synchronous control instruction processing apparatuses are interconnected through a Peripheral Component Interconnect Express (PCIe) bus and transmit data so as to support larger-scale machine learning operations; the synchronous control instruction processing apparatuses share the same control system or have their own control systems; the synchronous control instruction processing apparatuses share a memory or have their own memories; the interconnection mode of the plurality of synchronous control instruction processing apparatuses is any interconnection topology.
Clause A15. A combined processing apparatus, comprising:
the machine learning arithmetic device of clause A14, a universal interconnect interface, and other processing devices;
the machine learning arithmetic device interacts with the other processing devices to jointly complete the calculation operation designated by the user,
wherein the combination processing apparatus further comprises: and a storage device connected to the machine learning arithmetic device and the other processing device, respectively, for storing data of the machine learning arithmetic device and the other processing device.
Clause A16. A machine learning chip, the machine learning chip comprising:
the machine learning arithmetic device of clause A14 or the combined processing apparatus of clause A15.
Clause A17. An electronic device, comprising:
the machine learning chip of clause A16.
Clause A18. A board card, the board card comprising: a storage device, an interface device, a control device, and the machine learning chip of clause A16;
wherein the machine learning chip is connected with the storage device, the control device and the interface device respectively;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the machine learning chip and external equipment;
and the control device is used for monitoring the state of the machine learning chip.
Clause A19. A synchronous control instruction processing method applied to a synchronous control instruction processing apparatus including a plurality of operation modules and a control module, the method comprising:
controlling the control module to analyze the obtained synchronous control instruction to obtain an operation code of the synchronous control instruction, and determining a target operation module which needs to execute the synchronous control instruction;
controlling the target operation module to enter a pause state when the synchronous control instruction is executed;
controlling the control module to monitor the running states of the plurality of operation modules, controlling the target operation modules in the pause state to synchronously enter the working state when the target operation modules are all determined to be in the pause state,
the operation code is used for indicating that the processing of the synchronous control instruction on a plurality of operation modules of the device is synchronous control.
Clause A20. The method according to clause A19, wherein the synchronous control instruction further comprises an operation field, the opcode indicating a target signal that the target operation module needs to synchronize, or the operation field including the target signal that the target operation module needs to synchronize, so that the control module determines the target signal according to the opcode or the operation field,
wherein, controlling the target operation module to enter a pause state when executing the synchronous control instruction comprises:
controlling the target operation module to control the processing corresponding to the target signal determined by the control module to enter a pause state when the synchronous control instruction is executed,
wherein the target signal comprises at least one of: a computation queue signal, an IO signal, and an arrival signal.
Clause A21. The method of clause A19 or clause A20, wherein determining the target operation module that needs to execute the synchronous control instruction comprises:
determining an operation module, which executes the target task, of the plurality of operation modules as a target operation module according to the identifier of the target task, where the identifier of the target task includes at least one of: task name, task type, task number.
Clause A22. The method of clause A19 or clause A20, wherein the plurality of operation modules are divided into a plurality of module clusters, each module cluster including one or more operation modules,
the method for determining the target operation module which needs to execute the synchronous control instruction comprises the following steps:
determining all operation modules in a target module cluster related to the execution of the target task in the plurality of module clusters as target operation modules according to the identification of the target task, wherein all or part of the operation modules in the target cluster are used for executing the target task, and the identification of the target task comprises at least one of the following: task name, task type, task number.
Clause A23. The method of clause A21 or clause A22, wherein the opcode or the operation field is used to indicate an identification of the target task.
Clause A24. The method of clause A19 or clause A20, wherein the plurality of operation modules are divided into a plurality of module clusters, each module cluster including one or more operation modules, and the opcode or the operation field is used to indicate an identification of a target module cluster,
the method for determining the target operation module which needs to execute the synchronous control instruction comprises the following steps:
and determining the operation module belonging to the target module cluster in the plurality of operation modules as a target operation module according to the identification of the target module cluster.
Clause A25. The method of clause A19 or clause A20, wherein the opcode or the operation field is used to indicate an identification of the target operation module,
the method for determining the target operation module which needs to execute the synchronous control instruction comprises the following steps:
and determining a target operation module from the plurality of operation modules according to the identification of the target operation module.
Clause A26. The method of clause A19 or clause A20, further comprising: controlling the control module to determine a kernel function in which the synchronous control instruction is located, and to determine an operation module, of the plurality of operation modules, that calls the kernel function as a target operation module.
Clause A27. The method of clause A21 or clause A26, wherein the operation field includes a quantity threshold, the method further comprising:
and controlling the control module to control the target operation module in the pause state to enter a working state when the control module determines that the number of the target operation modules in the pause state reaches the number threshold.
Clause A28. The method of clause A19, wherein the operation module comprises a master operation submodule and a plurality of slave operation submodules, the method further comprising:
controlling the control module to obtain a calculation instruction and the data to be operated on that is required for executing the calculation instruction, and to parse the calculation instruction to obtain a plurality of operation instructions;
controlling the master operation submodule to perform preorder processing on the data to be operated on and to transmit data and operation instructions;
controlling the slave operation submodules to execute intermediate operations in parallel according to the transmitted data and operation instructions to obtain a plurality of intermediate results;
and controlling the master operation submodule to perform subsequent processing on the plurality of intermediate results to obtain an operation result.
Clause A29. The method of clause A28, further comprising:
controlling a storage module of the device to store the data to be operated,
wherein the storage module comprises at least one of a register and a cache,
the cache is used for storing the data to be operated, and comprises at least one neuron cache NRAM;
the register is used for storing scalar data in the data to be operated;
the neuron cache is used for storing neuron data in the data to be operated, wherein the neuron data comprises neuron vector data.
Clause A30. The method of clause A28, further comprising:
controlling the control module to store the synchronous control instruction and the calculation instruction;
controlling the control module to respectively analyze the synchronous control instruction and the calculation instruction to obtain a corresponding operation code and an operation domain;
and controlling the control module to store an instruction queue, wherein the instruction queue comprises a plurality of instructions to be executed which are sequentially arranged according to an execution sequence, and the plurality of instructions to be executed comprise the synchronous control instruction and the calculation instruction.
Clause A31. The method of clause A30, further comprising:
controlling the control module to cache the first instruction to be executed when determining that the first instruction to be executed in the plurality of instructions to be executed has an association relation with a zeroth instruction to be executed before the first instruction to be executed, and controlling execution of the first instruction to be executed after determining that the zeroth instruction to be executed is executed,
wherein the association relationship between the first to-be-executed instruction and a zeroth to-be-executed instruction before the first to-be-executed instruction comprises:
and a first storage address interval for storing the data required by the first instruction to be executed and a zeroth storage address interval for storing the data required by the zeroth instruction to be executed have an overlapped area.
Clause A32. A non-transitory computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method of any of clauses A19 to A31.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (16)

1. A synchronous control instruction processing device, characterized by comprising a control module and a plurality of operation modules, wherein
the control module is configured to parse an obtained synchronous control instruction to obtain an operation code of the synchronous control instruction, and to determine a target operation module that needs to execute the synchronous control instruction;
the target operation module is configured to enter a pause state when executing the synchronous control instruction;
the control module is further configured to monitor the running states of the plurality of operation modules and, when it is determined that the target operation modules are all in the pause state, to control the target operation modules in the pause state to synchronously enter a working state,
wherein the operation code indicates that the synchronous control instruction is used for synchronously controlling the plurality of operation modules of the device.
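The behaviour recited in claim 1 resembles a barrier: each target operation module pauses on reaching the synchronous control instruction, and the control module releases them together once all are paused. The following is a minimal single-threaded sketch of that idea; the class name, method names, and state strings are hypothetical and are not the applicant's implementation.

```python
class ControlModule:
    """Toy model of the control module in claim 1 (all names hypothetical)."""

    def __init__(self, module_ids):
        # Every operation module starts in the working state.
        self.state = {m: "working" for m in module_ids}
        self.targets = set()

    def dispatch_sync(self, targets):
        """Record which operation modules must execute the sync instruction."""
        self.targets = set(targets)

    def execute_sync(self, module_id):
        """A target module reaches the sync instruction and pauses.

        Returns the list of modules released as a result (possibly empty).
        """
        if module_id in self.targets:
            self.state[module_id] = "paused"
        return self.monitor()

    def monitor(self):
        """Release all targets together once every one of them is paused."""
        if self.targets and all(
            self.state[m] == "paused" for m in self.targets
        ):
            for m in self.targets:
                self.state[m] = "working"
            released = sorted(self.targets)
            self.targets = set()
            return released
        return []
```

For example, with modules 0 to 3 and targets 0 and 2, module 0 stays paused until module 2 also executes the instruction, at which point both return to the working state in the same step.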
2. The apparatus of claim 1, wherein the synchronization control instruction further comprises an operation field, wherein the operation code is used to indicate a target signal required to be synchronized by the target operation module, or the operation field comprises the target signal required to be synchronized by the target operation module, so that the control module determines the target signal according to the operation code or the operation field,
the target operation module is also used for controlling the processing corresponding to the target signal determined by the control module to enter a pause state when the synchronous control instruction is executed,
wherein the target signal comprises at least one of: a calculation queue signal, an IO signal, an arrival signal.
3. The apparatus of claim 1 or 2, wherein determining the target operation module that needs to execute the synchronous control instruction comprises:
determining, according to an identifier of a target task, the operation module among the plurality of operation modules that executes the target task as the target operation module, wherein the identifier of the target task comprises at least one of: a task name, a task type, a task number.
4. The apparatus of claim 1 or 2, wherein the plurality of operation modules are divided into a plurality of module clusters, each module cluster including one or more operation modules,
wherein determining the target operation module that needs to execute the synchronous control instruction comprises:
determining, according to an identifier of a target task, all operation modules in a target module cluster, among the plurality of module clusters, that is involved in executing the target task as target operation modules, wherein all or some of the operation modules in the target module cluster are used for executing the target task, and the identifier of the target task comprises at least one of: a task name, a task type, a task number.
5. The apparatus of claim 3 or 4, wherein the operation code or the operation field is used to indicate the identifier of the target task.
6. The apparatus of claim 1 or 2, wherein the plurality of operation modules are divided into a plurality of module clusters, each module cluster includes one or more operation modules, and the operation code or the operation field is used to indicate an identifier of a target module cluster,
wherein determining the target operation module that needs to execute the synchronous control instruction comprises:
determining, according to the identifier of the target module cluster, the operation modules among the plurality of operation modules that belong to the target module cluster as target operation modules.
7. The apparatus of claim 1 or 2, wherein the operation code or the operation field is used to indicate an identifier of a target operation module,
wherein determining the target operation module that needs to execute the synchronous control instruction comprises:
determining the target operation module from the plurality of operation modules according to the identifier of the target operation module.
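Claims 3, 4, 6 and 7 describe four different ways of choosing the target operation modules. The sketch below contrasts them on a made-up set of four modules; the `MODULES` table, the cluster and task names, and the function names are all assumptions introduced for illustration and do not come from the application.

```python
# Hypothetical description of four operation modules: each has an id,
# belongs to one module cluster, and is currently running one task.
MODULES = [
    {"id": 0, "cluster": "c0", "task": "conv"},
    {"id": 1, "cluster": "c0", "task": "pool"},
    {"id": 2, "cluster": "c1", "task": "pool"},
    {"id": 3, "cluster": "c2", "task": "fc"},
]


def targets_by_task(modules, task):
    """Claim 3 style: only the modules that execute the target task."""
    return [m["id"] for m in modules if m["task"] == task]


def targets_by_task_cluster(modules, task):
    """Claim 4 style: every module in any cluster involved in the task."""
    clusters = {m["cluster"] for m in modules if m["task"] == task}
    return [m["id"] for m in modules if m["cluster"] in clusters]


def targets_by_cluster(modules, cluster):
    """Claim 6 style: every module belonging to the named cluster."""
    return [m["id"] for m in modules if m["cluster"] == cluster]


def targets_by_module(modules, module_ids):
    """Claim 7 style: the modules named directly by identifier."""
    wanted = set(module_ids)
    return [m["id"] for m in modules if m["id"] in wanted]
```

Note the difference between the first two functions: selecting by task "conv" picks only module 0, whereas selecting by the cluster involved in "conv" also picks module 1, which shares cluster c0 but runs a different task.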
8. The apparatus according to claim 1 or 2, wherein the control module is further configured to determine a kernel function where the synchronous control instruction is located, and determine an operation module, which calls the kernel function, among the plurality of operation modules as a target operation module.
9. The apparatus of claim 3 or 8, wherein the operation field includes a quantity threshold,
the control module is further configured to control the target operation modules in the pause state to enter the working state when it is determined that the number of target operation modules in the pause state reaches the quantity threshold.
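The quantity threshold of claim 9 relaxes the "all targets paused" condition: the paused targets are released as soon as enough of them have paused. A minimal sketch follows, assuming a plain dictionary of module states and a hypothetical helper name; neither is specified in the application.

```python
def release_on_threshold(states, targets, threshold):
    """Release the paused targets once at least `threshold` are paused.

    `states` maps a module id to "working" or "paused" and is updated in
    place. Returns the ids that were released, or [] if too few are paused.
    """
    paused = [m for m in targets if states[m] == "paused"]
    if len(paused) >= threshold:
        for m in paused:
            states[m] = "working"
        return paused
    return []
```

With three targets of which two are paused, a threshold of 3 releases nothing, while a threshold of 2 returns both paused modules to the working state.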
10. A machine learning arithmetic device, the device comprising:
one or more synchronous control instruction processing devices according to any one of claims 1 to 9, configured to obtain data to be operated and control information from other processing devices, execute a specified machine learning operation, and transmit an execution result to the other processing devices through an I/O interface;
when the machine learning arithmetic device comprises a plurality of synchronous control instruction processing devices, the plurality of synchronous control instruction processing devices can be connected through a specific structure and transmit data;
wherein the plurality of synchronous control instruction processing devices are interconnected and transmit data through a Peripheral Component Interconnect Express (PCIe) bus, so as to support larger-scale machine learning operations; the plurality of synchronous control instruction processing devices share the same control system or have their own control systems; the plurality of synchronous control instruction processing devices share a memory or have their own memories; and the plurality of synchronous control instruction processing devices may be interconnected in any interconnection topology.
11. A combined processing apparatus, characterized in that the combined processing apparatus comprises:
the machine learning arithmetic device of claim 10, a universal interconnect interface, and other processing devices;
the machine learning arithmetic device interacts with the other processing devices to jointly complete the calculation operation designated by the user,
wherein the combined processing apparatus further comprises a storage device, connected to the machine learning arithmetic device and to the other processing devices respectively, for storing data of the machine learning arithmetic device and the other processing devices.
12. A machine learning chip, the machine learning chip comprising:
the machine learning arithmetic device according to claim 10 or the combined processing apparatus according to claim 11.
13. An electronic device, characterized in that the electronic device comprises:
the machine learning chip of claim 12.
14. A board card, characterized in that the board card comprises: a storage device, an interface device, a control device, and the machine learning chip according to claim 12;
wherein the machine learning chip is connected with the storage device, the control device and the interface device respectively;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the machine learning chip and external equipment;
and the control device is used for monitoring the state of the machine learning chip.
15. A synchronous control instruction processing method, applied to a synchronous control instruction processing device comprising a control module and a plurality of operation modules, the method comprising:
controlling the control module to analyze the obtained synchronous control instruction to obtain an operation code of the synchronous control instruction, and determining a target operation module which needs to execute the synchronous control instruction;
controlling the target operation module to enter a pause state when the synchronous control instruction is executed;
controlling the control module to monitor the running states of the plurality of operation modules, controlling the target operation modules in the pause state to synchronously enter the working state when the target operation modules are all determined to be in the pause state,
wherein the operation code indicates that the synchronous control instruction is used for synchronously controlling the plurality of operation modules of the device.
16. A non-transitory computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method of claim 15.
CN201910755816.8A 2018-10-09 2019-08-15 Operation method, device and related product Active CN111340202B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/110167 WO2020073925A1 (en) 2018-10-09 2019-10-09 Operation method and apparatus, computer device and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2018115482005 2018-12-18
CN201811548200 2018-12-18

Publications (2)

Publication Number Publication Date
CN111340202A (en) 2020-06-26
CN111340202B (en) 2023-06-09

Family

ID=71185095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910755816.8A Active CN111340202B (en) 2018-10-09 2019-08-15 Operation method, device and related product

Country Status (1)

Country Link
CN (1) CN111340202B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377440A (en) * 2021-06-03 2021-09-10 昆山丘钛微电子科技股份有限公司 FPGA-based instruction processing method and device, electronic equipment and medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107908472A (en) * 2017-09-30 2018-04-13 平安科技(深圳)有限公司 Data synchronization unit, method and computer-readable recording medium
CN108572926A (en) * 2017-03-13 2018-09-25 阿里巴巴集团控股有限公司 A kind of method and apparatus for synchronizing caching belonging to central processing unit

Also Published As

Publication number Publication date
CN111340202B (en) 2023-06-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant