CN111813450A - Operation method, device and related product - Google Patents

Operation method, device and related product

Info

Publication number
CN111813450A
Authority
CN
China
Prior art keywords
matrix
instruction
mirror image
processing
executed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910294130.3A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN201910294130.3A priority Critical patent/CN111813450A/en
Priority to PCT/CN2019/120893 priority patent/WO2020108471A1/en
Publication of CN111813450A publication Critical patent/CN111813450A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30043 LOAD or STORE instructions; Clear instruction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to an operation method, an operation device and a related product. The board card includes a storage device, an interface device, a control device and a machine learning chip; the machine learning chip is connected with the storage device, the control device and the interface device, respectively; the storage device is used for storing data; the interface device is used for realizing data transmission between the machine learning chip and an external device; and the control device is used for monitoring the state of the machine learning chip.

Description

Operation method, device and related product
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a matrix mirroring instruction processing method and apparatus, and a related product.
Background
With the continuous development of science and technology, machine learning, and neural network algorithms in particular, is used more and more widely, and has been applied successfully in fields such as image recognition, speech recognition and natural language processing. However, as neural network algorithms grow more complex, the types and the number of data operations involved keep increasing. In the related art, mirroring matrix data is inefficient and slow.
Disclosure of Invention
In view of this, the present disclosure provides a method and an apparatus for processing a matrix mirroring instruction, and a related product, so as to improve efficiency and speed of mirroring a matrix.
According to a first aspect of the present disclosure, there is provided a matrix mirroring instruction processing apparatus, the apparatus comprising:
the control module is used for parsing the received matrix mirroring instruction to obtain an operation code and an operation domain of the matrix mirroring instruction, determining, according to the operation code and the operation domain, a matrix to be mirrored and a target address required for executing the matrix mirroring instruction, and determining a mirroring strategy required for the mirroring processing;
a processing module for mirroring the matrix to be mirrored according to the mirroring strategy to obtain a mirrored matrix and storing the mirrored matrix at the target address,
wherein the operation code is used for indicating that the processing performed by the matrix mirroring instruction on the matrix data is mirroring, and the operation domain includes the address of the matrix to be mirrored and the target address.
According to a second aspect of the present disclosure, there is provided a machine learning arithmetic device, the device including:
one or more matrix mirroring instruction processing devices according to the first aspect, configured to obtain a matrix to be mirrored and control information from another processing device, execute a specified machine learning operation, and transmit an execution result to the other processing device through an I/O interface;
when the machine learning arithmetic device comprises a plurality of matrix mirroring instruction processing devices, the plurality of matrix mirroring instruction processing devices can be connected through a specific structure and transmit data;
the plurality of matrix mirroring instruction processing devices are interconnected and transmit data through a PCIE (Peripheral Component Interconnect Express) bus to support larger-scale machine learning operations; the plurality of matrix mirroring instruction processing devices share the same control system or have their own control systems; the devices share a memory or have their own memories; and the interconnection manner of the plurality of matrix mirroring instruction processing devices is any interconnection topology.
According to a third aspect of the present disclosure, there is provided a combined processing apparatus, the apparatus comprising:
the machine learning arithmetic device, the universal interconnect interface, and the other processing device according to the second aspect;
and the machine learning arithmetic device interacts with the other processing devices to jointly complete the calculation operation designated by the user.
According to a fourth aspect of the present disclosure, there is provided a machine learning chip including the machine learning arithmetic device of the second aspect or the combined processing device of the third aspect.
According to a fifth aspect of the present disclosure, there is provided a machine learning chip package structure, which includes the machine learning chip of the fourth aspect.
According to a sixth aspect of the present disclosure, a board card is provided, which includes the machine learning chip packaging structure of the fifth aspect.
According to a seventh aspect of the present disclosure, there is provided an electronic device, which includes the machine learning chip of the fourth aspect or the board of the sixth aspect.
According to an eighth aspect of the present disclosure, there is provided a matrix mirroring instruction processing method, which is applied to a matrix mirroring instruction processing apparatus, the method including:
parsing a received matrix mirroring instruction to obtain an operation code and an operation domain of the matrix mirroring instruction, determining, according to the operation code and the operation domain, a matrix to be mirrored and a target address required for executing the matrix mirroring instruction, and determining a mirroring strategy required for the mirroring processing;
mirroring the matrix to be mirrored according to the mirroring strategy to obtain a mirrored matrix, and storing the mirrored matrix at the target address,
wherein the operation code is used for indicating that the processing performed by the matrix mirroring instruction on the matrix is mirroring, and the operation domain includes the address of the matrix to be mirrored and the target address.
In some embodiments, the electronic device comprises a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a tachograph, a navigator, a sensor, a camera, a server, a cloud server, a camera, a camcorder, a projector, a watch, a headset, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
In some embodiments, the vehicle comprises an aircraft, a ship, and/or a vehicle; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance apparatus, a B-ultrasonic apparatus and/or an electrocardiograph.
The matrix mirroring instruction processing method and apparatus and the related product provided by the embodiments of the present disclosure include a control module and a processing module. The control module is used for parsing the received matrix mirroring instruction to obtain the operation code and the operation domain of the matrix mirroring instruction, and determining, according to the operation code and the operation domain, the matrix to be mirrored and the target address required for executing the matrix mirroring instruction. The processing module is used for mirroring the matrix to be mirrored according to the mirroring strategy to obtain a mirrored matrix, and storing the mirrored matrix at the target address. The matrix mirroring instruction processing method and apparatus and the related product provided by the embodiments of the present disclosure have a wide application range, and mirror a matrix according to a matrix mirroring instruction with high processing efficiency and high processing speed.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1a, 1b show block diagrams of a combined processing device according to an embodiment of the present disclosure.
Fig. 2 shows a schematic structural diagram of a board card according to an embodiment of the present disclosure.
FIG. 3 shows a block diagram of a matrix mirroring instruction processing apparatus according to an embodiment of the present disclosure.
FIG. 4 shows a block diagram of a matrix mirroring instruction processing apparatus according to an embodiment of the present disclosure.
Fig. 5 is a schematic diagram illustrating an application scenario of a matrix mirroring instruction processing apparatus according to an embodiment of the present disclosure.
FIG. 6 shows a flow diagram of a matrix mirroring instruction processing method according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
As neural network algorithms are used more and more widely in fields such as image recognition, speech recognition and natural language processing, their complexity keeps growing, and the types and the number of data operations involved keep increasing. The matrix is a common data format in neural network algorithms and is composed of numbers and/or characters. Processing a matrix in a neural network algorithm includes mirroring the matrix. In the related art, mirroring a matrix is inefficient and slow.
The machine learning arithmetic device may perform operations related to neural network algorithms, and may include one or more matrix mirroring instruction processing devices that mirror a matrix according to a received matrix mirroring instruction. The matrix mirroring instruction processing device is used for acquiring the matrix to be mirrored and control information from other processing devices and executing a specified machine learning operation. The machine learning arithmetic device may obtain the matrix mirroring instruction from other machine learning arithmetic devices or non-machine-learning arithmetic devices, and transmit the execution result to peripheral equipment (also called other processing devices) through an I/O interface; the peripheral equipment is, for example, a camera, a display, a mouse, a keyboard, a network card, a WiFi interface or a server. When more than one matrix mirroring instruction processing device is included, the matrix mirroring instruction processing devices may be linked and transmit data through a specific structure, for example interconnected and transmitting data through a PCIE bus, so as to support larger-scale neural network operations. In this case, the devices may share the same control system or have their own control systems; they may share a memory or each have its own memory. In addition, the interconnection manner may be any interconnection topology.
The machine learning arithmetic device has high compatibility and can be connected with various types of servers through PCIE interfaces.
FIG. 1a shows a block diagram of a combined processing device according to an embodiment of the present disclosure. As shown in fig. 1a, the combined processing device includes the machine learning arithmetic device, the universal interconnection interface and other processing devices. The machine learning arithmetic device interacts with other processing devices to jointly complete the operation designated by the user.
Other processing devices include one or more of general-purpose/special-purpose processors such as central processing units (CPUs), graphics processing units (GPUs), neural network processors, and the like; the number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning arithmetic device and external data and control, perform data transfer, and complete basic control of the machine learning arithmetic device such as starting and stopping; the other processing devices may also cooperate with the machine learning arithmetic device to complete computing tasks.
And the universal interconnection interface is used for transmitting data and control instructions between the machine learning arithmetic device and other processing devices. The machine learning arithmetic device acquires required input data from other processing devices and writes the input data into a storage device on the machine learning arithmetic device; control instructions can be obtained from other processing devices and written into a control cache on a machine learning arithmetic device chip; the data in the storage module of the machine learning arithmetic device can also be read and transmitted to other processing devices.
FIG. 1b shows a block diagram of a combined processing device according to an embodiment of the present disclosure. In a possible implementation manner, as shown in fig. 1b, the combined processing device may further include a storage device, and the storage device is connected to the machine learning arithmetic device and the other processing device respectively. The storage device is used for storing data of the machine learning arithmetic device and the other processing device, and is particularly suitable for data to be computed that cannot be completely stored in the internal storage of the machine learning arithmetic device or the other processing device.
The combined processing device can serve as an SOC (system on chip) for devices such as mobile phones, robots, unmanned aerial vehicles and video monitoring equipment, effectively reducing the core area of the control part, increasing the processing speed and reducing the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the apparatus, such as a camera, a display, a mouse, a keyboard, a network card or a WiFi interface.
The present disclosure provides a machine learning chip, which includes the above machine learning arithmetic device or combined processing device.
The present disclosure provides a machine learning chip package structure, which includes the above machine learning chip.
Fig. 2 shows a schematic structural diagram of a board card according to an embodiment of the present disclosure. As shown in fig. 2, the board card includes the above-mentioned machine learning chip package structure or the above-mentioned machine learning chip. In addition to the machine learning chip 389, the board card may include other components, including but not limited to: a memory device 390, an interface device 391 and a control device 392.
The memory device 390 is connected to the machine learning chip 389 (or the machine learning chip within the machine learning chip package structure) via a bus and is used for storing data. The memory device 390 may include multiple groups of memory cells 393. Each group of memory cells 393 is connected to the machine learning chip 389 via a bus. It is understood that each group of memory cells 393 may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
DDR can double the speed of SDRAM without increasing the clock frequency. DDR allows data to be read out on the rising and falling edges of the clock pulse. DDR is twice as fast as standard SDRAM.
In one embodiment, the memory device 390 may include 4 groups of memory cells 393. Each group of memory cells 393 may include a plurality of DDR4 memory chips. In one embodiment, the machine learning chip 389 may include four 72-bit DDR4 controllers, in which 64 bits are used for data transmission and 8 bits are used for ECC checking. It is understood that when DDR4-3200 memory chips are used in each group of memory cells 393, the theoretical bandwidth of data transfer may reach 25600 MB/s.
In one embodiment, each group of memory cells 393 includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice within one clock cycle. A controller for controlling the DDR is provided in the machine learning chip 389 for controlling the data transfer and data storage of each group of memory cells 393.
The interface device 391 is electrically connected to the machine learning chip 389 (or the machine learning chip within the machine learning chip package structure). The interface device 391 is used to implement data transmission between the machine learning chip 389 and an external device (e.g., a server or a computer). For example, in one embodiment, the interface device 391 may be a standard PCIE interface, and the data to be processed is transmitted from the server to the machine learning chip 389 through the standard PCIE interface, thereby implementing the data transfer. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device 391 may also be another interface, and the present disclosure does not limit the specific form of the other interface as long as the interface device can implement the transfer function. In addition, the calculation result of the machine learning chip is transmitted back to the external device (e.g., the server) by the interface device.
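As a quick sanity check, and not as part of the patent, the two theoretical bandwidth figures quoted above can be recomputed with the following Python sketch; the 64-bit data bus per DDR4 channel and the roughly 1 GB/s of usable bandwidth per PCIe 3.0 lane are assumptions taken from the surrounding text and common practice rather than from the patent itself.

# Rough arithmetic behind the theoretical bandwidth figures quoted above.
ddr4_transfer_rate_mt_s = 3200          # DDR4-3200: mega-transfers per second
ddr4_bus_width_bits = 64                # data bits per channel (ECC bits excluded)
ddr4_bandwidth_mb_s = ddr4_transfer_rate_mt_s * ddr4_bus_width_bits // 8
print(ddr4_bandwidth_mb_s)              # 25600 (MB/s), matching the text

pcie3_per_lane_mb_s = 1000              # approximate usable bandwidth per PCIe 3.0 lane (assumption)
pcie3_x16_mb_s = 16 * pcie3_per_lane_mb_s
print(pcie3_x16_mb_s)                   # 16000 (MB/s), matching the text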
The control device 392 is electrically connected to the machine learning chip 389. The control device 392 is used to monitor the state of the machine learning chip 389. Specifically, the machine learning chip 389 and the control device 392 may be electrically connected through an SPI interface. The control device 392 may include a single-chip microcomputer (MCU). The machine learning chip 389 may include multiple processing chips, multiple processing cores, or multiple processing circuits, which may carry multiple loads; therefore, the machine learning chip 389 can be in different operating states such as heavy load and light load. The control device can regulate the working states of the multiple processing chips, multiple processing cores and/or multiple processing circuits in the machine learning chip.
The present disclosure provides an electronic device, which includes the above machine learning chip or board card.
The electronic device may include a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a tachograph, a navigator, a sensor, a camera, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle may include an aircraft, a ship, and/or a vehicle. The household appliances may include televisions, air conditioners, microwave ovens, refrigerators, electric rice cookers, humidifiers, washing machines, electric lamps, gas cookers, and range hoods. The medical device may include a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus and/or an electrocardiograph.
FIG. 3 shows a block diagram of a matrix mirroring instruction processing apparatus according to an embodiment of the present disclosure. As shown in fig. 3, the apparatus includes a control module 11 and a processing module 12.
The control module 11 is configured to parse the received matrix mirroring instruction to obtain the operation code and the operation domain of the matrix mirroring instruction, determine, according to the operation code and the operation domain, the matrix to be mirrored and the target address required for executing the matrix mirroring instruction, and determine the mirroring strategy required for the mirroring processing. The operation code is used for indicating that the processing performed by the matrix mirroring instruction on the matrix data is mirroring, and the operation domain includes the address of the matrix to be mirrored and the target address.
The processing module 12 is configured to mirror the matrix to be mirrored according to the mirroring strategy to obtain a mirrored matrix, and store the mirrored matrix at the target address.
In this embodiment, the matrix to be mirrored may be a data set in which a plurality of numbers and/or characters are arranged in an array. Mirroring is a transformation that flips the matrix to be mirrored about a specific flip line (in a two-dimensional plane) or a specific flip plane (in three-dimensional space) to obtain the mirrored matrix. For example, if the matrix to be mirrored lies in a two-dimensional plane, the mirroring strategy may include at least one of flipping the matrix to be mirrored along its horizontal direction and flipping it along its vertical direction. If the matrix to be mirrored lies in three-dimensional space, the mirroring strategy may include at least one of flipping the matrix to be mirrored along its horizontal plane, flipping it along its vertical plane, and flipping it along the plane perpendicular to both the horizontal plane and the vertical plane. The mirroring strategy may include the parameters required for the mirroring, such as the flip line or flip plane about which the matrix to be mirrored is flipped, and a matrix mirroring instruction may mirror a matrix one or more times, which is not limited by the present disclosure.
For example, assume that the matrix to be mirrored is [ [1,4,7], [2,5,8], [3,6,9] ]. If the mirroring strategy determined from the matrix mirroring instruction is "horizontal mirroring", the device performs horizontal mirroring on the matrix to be mirrored and obtains the mirrored matrix [ [3,6,9], [2,5,8], [1,4,7] ]. If the mirroring strategy determined from the matrix mirroring instruction is "vertical mirroring", the device performs vertical mirroring on the matrix to be mirrored and obtains the mirrored matrix [ [9,6,3], [8,5,2], [7,4,1] ].
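Purely as an illustrative sketch, and not as the patent's implementation, the two-dimensional flips discussed above can be expressed in Python with NumPy; how the names "horizontal mirroring" and "vertical mirroring" map onto the flip axes is an assumption made here for illustration.

import numpy as np

m = np.array([[1, 4, 7],
              [2, 5, 8],
              [3, 6, 9]])

# Flip about a horizontal flip line: the row order is reversed.
print(np.flipud(m).tolist())   # [[3, 6, 9], [2, 5, 8], [1, 4, 7]]

# Flip about a vertical flip line: the order within each row is reversed.
print(np.fliplr(m).tolist())   # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]

# Flipping about both lines is equivalent to rotating the matrix by 180 degrees.
print(np.flip(m).tolist())     # [[9, 6, 3], [8, 5, 2], [7, 4, 1]]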
In this embodiment, the control module may obtain the matrix to be mirrored from the address of the matrix to be mirrored. The address of the matrix to be mirrored may be a physical address, such as the head address at which the matrix to be mirrored is stored, or a logical address or a linear address. The control module may store the mirrored matrix at the target address. The target address may likewise be a physical address, such as the head address at which the mirrored matrix is stored, or a logical address or a linear address. The present disclosure does not limit how the address of the matrix to be mirrored and the target address are represented. The control module may obtain the matrix mirroring instruction and the matrix to be mirrored through a data input/output unit, which may be one or more data I/O interfaces or I/O pins.
In this embodiment, a matrix mirroring instruction may include an operation code and an operation domain. The operation code may be the part of the instruction or field (usually represented by a code) specified in a computer program as the operation to be performed; it is an instruction serial number that tells the device executing the instruction which instruction specifically needs to be executed. The operation domain may be the source of all the data required for executing the corresponding instruction, where the required data includes the matrix to be mirrored and the corresponding mirroring strategy, or the addresses at which the matrix to be mirrored, the corresponding mirroring strategy and so on are stored. For example, the operation domain may include the address of the matrix to be mirrored and the target address.
It should be understood that the instruction format of the matrix mirroring instruction and the contained opcodes and operation domains may be set as desired by those skilled in the art, and the present disclosure is not limited thereto.
In this embodiment, the apparatus may include one or more control modules and one or more processing modules, and the number of the control modules and the number of the processing modules may be set according to actual needs, which is not limited by this disclosure. When the apparatus includes a control module, the control module may receive the matrix mirroring instruction and control one or more processing modules to perform the mirroring process. When the device comprises a plurality of control modules, the plurality of control modules can respectively receive the matrix mirroring instruction and control the corresponding one or more processing modules to perform mirroring.
The matrix mirroring instruction processing device provided by the embodiments of the present disclosure includes a control module and a processing module. The control module is used for parsing the received matrix mirroring instruction to obtain the operation code and the operation domain of the matrix mirroring instruction, determining, according to the operation code and the operation domain, the matrix to be mirrored and the target address required for executing the matrix mirroring instruction, and determining the mirroring strategy required for the mirroring processing. The processing module mirrors the matrix to be mirrored according to the mirroring strategy to obtain a mirrored matrix, and stores the mirrored matrix at the target address. The matrix mirroring instruction processing device provided by the embodiments of the present disclosure has a wide application range, and mirrors a matrix according to a matrix mirroring instruction with high processing efficiency and high processing speed.
In one possible implementation, the operation domain may also include input shapes of the matrix to be mirrored. The processing module 12 may also be configured to perform mirror processing on the matrix to be mirrored according to the input shape and the mirror policy, so as to obtain a matrix after mirroring.
In this implementation manner, the input shape of the matrix to be mirrored facilitates mirroring the matrix, and the shape of the mirrored matrix can also be determined from the input shape of the matrix to be mirrored. The shape of the matrix may be represented by the number of numbers and/or characters in the rows and columns of the matrix to be mirrored. For example, the matrix 1 to be mirrored is [ [0,1,1], [0,1,-1] ], and the shape of the matrix 1 to be mirrored is 3 × 2, that is, the matrix 1 to be mirrored has 3 rows and 2 columns and is composed of 6 numbers.
In one possible implementation, default input shapes for the matrix to be mirrored may be preset. When the input shape of the matrix to be mirrored is not included in the operation domain, the default input shape of the matrix to be mirrored may be determined as the input shape of the matrix to be mirrored of the current matrix mirroring instruction. The present disclosure is not so limited.
In one possible implementation, the operation domain may also include the output shape of the mirrored matrix. The processing module 12 is further configured to perform mirror processing on the matrix to be mirrored according to the output shape and the mirror policy, so as to obtain a matrix after mirroring.
In this implementation, the output shape may be the shape of the mirrored matrix. For example, the mirrored matrix is [ [1,0], [0,1], [ -1,0] ], and the shape of the mirrored matrix is 2 × 3, that is, the mirrored matrix is 2 rows and 3 columns and is composed of 6 numbers.
In one possible implementation, a default output shape of the mirrored matrix may be preset. When the output shape of the post-mirror matrix is not included in the operation domain, the default output shape of the post-mirror matrix may be determined as the output shape of the post-mirror matrix of the current matrix-mirroring instruction. The present disclosure is not so limited.
In one possible implementation, the operation domain may also be used to indicate a mirroring policy.
In one possible implementation, the opcode may also be used to indicate a mirroring policy.
In one possible implementation, the mirroring policy may be determined according to an opcode or an operation domain of the matrix mirroring instruction. The default mirror strategy of the matrix to be mirrored can be preset. When the operation domain does not contain the mirror strategy of the matrix to be mirrored, the default mirror strategy of the matrix to be mirrored can be determined as the mirror strategy of the matrix to be mirrored of the current matrix mirror instruction.
FIG. 4 shows a block diagram of a matrix mirroring instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation manner, as shown in fig. 4, the matrix mirroring instruction processing apparatus may further include a storage module 13 for storing the matrix to be mirrored.
In this implementation, the storage module may include one or more of a memory, a cache, and a register, and the cache may include a scratch pad cache. The matrix to be mirrored can be stored in a memory, a cache and/or a register in the storage module as needed, which is not limited by the present disclosure.
In a possible implementation manner, the apparatus may further include a direct memory access module for reading or storing data from the storage module.
In one possible implementation, as shown in fig. 4, the control module 11 may include an instruction storage sub-module 111, an instruction processing sub-module 112, and a queue storage sub-module 113.
The instruction storage submodule 111 is used for storing matrix mirroring instructions.
The instruction processing sub-module 112 is configured to analyze the matrix mirroring instruction to obtain an operation code and an operation domain of the matrix mirroring instruction.
The queue storage submodule 113 is configured to store an instruction queue, where the instruction queue includes a plurality of instructions to be executed arranged in sequence according to an execution order, and the plurality of instructions to be executed may include the matrix mirroring instruction as well as other computation instructions related to it.
In this implementation manner, the execution order of the multiple instructions to be executed may be arranged according to the receiving time, the priority level, and the like of the instructions to be executed to obtain an instruction queue, so that the multiple instructions to be executed are sequentially executed according to the instruction queue.
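As a minimal sketch only, and assuming hypothetically that each instruction to be executed carries a priority level and a receive time (the text names these factors but does not fix their encoding), the ordering described above could look like this in Python:

from dataclasses import dataclass

@dataclass
class PendingInstruction:
    name: str            # hypothetical label, e.g. a matrix mirroring instruction
    priority: int        # larger value means higher priority (assumption)
    receive_time: int    # monotonically increasing arrival stamp

def build_instruction_queue(pending):
    # Higher priority first; ties are broken by earlier receive time.
    return sorted(pending, key=lambda ins: (-ins.priority, ins.receive_time))

queue = build_instruction_queue([
    PendingInstruction("load", priority=0, receive_time=0),
    PendingInstruction("rotate2", priority=1, receive_time=1),
    PendingInstruction("store", priority=0, receive_time=2),
])
print([ins.name for ins in queue])   # ['rotate2', 'load', 'store']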
In one possible implementation, as shown in fig. 4, the control module 11 may further include a dependency processing sub-module 114.
The dependency relationship processing submodule 114 is configured to, when it is determined that a first instruction to be executed among the plurality of instructions to be executed has a dependency relationship with a zeroth instruction to be executed that precedes it, cache the first instruction to be executed in the instruction storage submodule 111, and after the zeroth instruction to be executed has finished executing, extract the first instruction to be executed from the instruction storage submodule 111 and send it to the processing module 12. The first instruction to be executed and the zeroth instruction to be executed are both instructions among the plurality of instructions to be executed.
The first instruction to be executed has a dependency relationship with the zeroth instruction to be executed that precedes it when a first storage address interval storing the data required by the first instruction to be executed overlaps a zeroth storage address interval storing the data required by the zeroth instruction to be executed. Conversely, there is no dependency relationship between the first instruction to be executed and the zeroth instruction to be executed when the first storage address interval and the zeroth storage address interval have no overlapping area.
In this way, according to the dependency relationships among the instructions to be executed, a subsequent instruction to be executed is executed only after the preceding instruction to be executed has finished, which ensures the accuracy of the operation result.
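A minimal sketch of this overlap test, assuming half-open [start, end) address intervals purely for illustration, is shown below.

def has_dependency(first_start, first_end, zeroth_start, zeroth_end):
    # Half-open intervals [start, end): the two instructions touch at least one
    # common address, and therefore depend on each other, when the ranges overlap.
    return first_start < zeroth_end and zeroth_start < first_end

# The first instruction to be executed uses addresses [200, 236) and the zeroth
# uses [100, 220): the intervals overlap, so the first instruction must wait
# until the zeroth instruction has finished executing.
print(has_dependency(200, 236, 100, 220))   # True
print(has_dependency(300, 336, 100, 220))   # False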
In one possible implementation, the instruction format of the matrix mirroring instruction may be:
Rotate2 type dst src src_shape dst_shape
wherein Rotate2 is the operation code, and type, dst, src, src_shape and dst_shape are the operation domain. Rotate2 is used to indicate that the instruction is a matrix mirroring instruction. type is the mirroring strategy. dst is the target address. src is the address of the matrix to be mirrored. src_shape is the input shape. dst_shape is the output shape.
In one possible implementation, the instruction format of the matrix mirroring instruction may also be:
Rotate2_type dst src src_shape dst_shape
wherein Rotate2_type is the operation code, and dst, src, src_shape and dst_shape are the operation domain. The Rotate2 in Rotate2_type is used to indicate that the instruction is a matrix mirroring instruction, and the type in Rotate2_type is the mirroring strategy. dst is the target address. src is the address of the matrix to be mirrored. src_shape is the input shape. dst_shape is the output shape.
It should be understood that the opcode of the matrix mirroring instruction, the opcode in the instruction format, and the location of the opcode field may be set by one skilled in the art as desired, and the disclosure is not limited thereto.
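For illustration only, since the patent fixes neither a textual encoding nor error handling, the second format above could be split into an operation code and an operation domain roughly as in the following Python sketch; the integer addresses and the string-valued shapes are assumptions.

def parse_rotate2(instruction_text):
    # Assumed textual form: "Rotate2_<type> dst src src_shape dst_shape"
    opcode, dst, src, src_shape, dst_shape = instruction_text.split()
    assert opcode.startswith("Rotate2_"), "not a matrix mirroring instruction"
    return {
        "strategy": opcode[len("Rotate2_"):],   # the 'type' part of the operation code
        "dst": int(dst),                        # target address
        "src": int(src),                        # address of the matrix to be mirrored
        "src_shape": src_shape,                 # input shape
        "dst_shape": dst_shape,                 # output shape
    }

print(parse_rotate2("Rotate2_horizontal 200 100 S1 S2"))
# {'strategy': 'horizontal', 'dst': 200, 'src': 100, 'src_shape': 'S1', 'dst_shape': 'S2'}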
In one possible implementation manner, the apparatus may be disposed in one or more of a Graphics Processing Unit (GPU), a Central Processing Unit (CPU), and an embedded Neural Network Processor (NPU).
It should be noted that, although the matrix mirroring instruction processing apparatus is described above by taking the above-described embodiment as an example, those skilled in the art will understand that the present disclosure should not be limited thereto. In fact, the user can flexibly set each module according to personal preference and/or actual application scene, as long as the technical scheme of the disclosure is met.
Application example
An application example according to an embodiment of the present disclosure is given below, taking "mirroring a matrix to be mirrored with the matrix mirroring instruction processing apparatus" as an exemplary application scenario, to facilitate understanding of the flow of the matrix mirroring instruction processing apparatus. It should be understood by those skilled in the art that the following application example is only intended to facilitate understanding of the embodiments of the present disclosure and should not be construed as limiting them.
Fig. 5 is a schematic diagram illustrating an application scenario of a matrix mirroring instruction processing apparatus according to an embodiment of the present disclosure. As shown in fig. 5, the matrix mirroring instruction processing means processes the matrix mirroring instruction as follows.
Example 1
Upon receiving the matrix mirroring instruction 1 (Rotate2_type 200 100 S1 S2), the control module 11 parses the matrix mirroring instruction to obtain the operation code and the operation domain of the matrix mirroring instruction 1. The operation code of the matrix mirroring instruction 1 is Rotate2_type. From the operation code it can be determined that the instruction is a matrix mirroring instruction and that the mirroring strategy is type. From the operation domain it can be determined that the address of the matrix to be mirrored is 100, the input shape is S1, the target address is 200, and the output shape is S2. The control module 11 then obtains the matrix 1 to be mirrored, with input shape S1, from the address 100 of the matrix to be mirrored.
The processing module 12 performs mirror processing on the to-be-mirrored matrix 1 according to the mirror strategy to obtain a mirrored matrix 1 ', and stores the mirrored matrix 1' in the target address 200.
In addition to Rotate2_type 200 100 S1 S2 and Rotate2 type 200 100 S1 S2, the matrix mirroring instruction 1 may also be an instruction with a different instruction format that represents the same processing procedure; the processing procedure of the matrix mirroring instruction processing apparatus is similar and will not be described again.
The above processing is detailed in the above description.
Therefore, the matrix mirroring instruction processing device can quickly and efficiently mirror the matrix according to the matrix mirroring instruction.
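Tying the pieces together, a toy end-to-end simulation of example 1 is sketched below in Python; modelling addresses as keys of a dictionary and reusing the flip convention assumed earlier are illustrative assumptions, not the patent's implementation.

import numpy as np

memory = {100: np.array([[1, 4, 7], [2, 5, 8], [3, 6, 9]])}   # matrix 1 at address 100

def execute_matrix_mirroring(instruction_text, memory):
    # Control-module step: parse the operation code and the operation domain.
    opcode, dst, src, src_shape, dst_shape = instruction_text.split()
    strategy = opcode.split("_", 1)[1]
    # Processing-module step: mirror the matrix and store it at the target address.
    matrix = memory[int(src)]
    mirrored = np.flipud(matrix) if strategy == "horizontal" else np.fliplr(matrix)
    memory[int(dst)] = mirrored

execute_matrix_mirroring("Rotate2_horizontal 200 100 S1 S2", memory)
print(memory[200].tolist())   # [[3, 6, 9], [2, 5, 8], [1, 4, 7]]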
FIG. 6 shows a flow diagram of a matrix mirroring instruction processing method according to an embodiment of the present disclosure. As shown in fig. 6, the method is applied to the above-described matrix mirroring instruction processing apparatus, and includes step S51 and step S52.
In step S51, the received matrix mirroring instruction is parsed to obtain the operation code and the operation domain of the matrix mirroring instruction, and the matrix to be mirrored and the target address required for executing the matrix mirroring instruction, as well as the mirroring strategy required for the mirroring processing, are determined according to the operation code and the operation domain. The operation code is used for indicating that the processing performed by the matrix mirroring instruction on the matrix is mirroring, and the operation domain includes the address of the matrix to be mirrored and the target address.
In step S52, the matrix to be mirrored is mirrored according to the mirroring strategy to obtain a mirrored matrix, and the mirrored matrix is stored at the target address.
In one possible implementation, the operation domain may also include the input shape of the matrix to be mirrored. In this case, mirroring the matrix to be mirrored according to the mirroring strategy to obtain the mirrored matrix may include: mirroring the matrix to be mirrored according to the input shape and the mirroring strategy to obtain the mirrored matrix.
In a possible implementation manner, the operation domain may further include the output shape of the mirrored matrix. In this case, mirroring the matrix to be mirrored according to the mirroring strategy to obtain the mirrored matrix may include: mirroring the matrix to be mirrored according to the output shape and the mirroring strategy to obtain the mirrored matrix.
In one possible implementation, the operation domain may also be used to indicate a mirroring policy.
In one possible implementation, the opcode may also be used to indicate a mirroring policy.
In one possible implementation, the method may further include: and storing the matrix to be mirrored.
In a possible implementation manner, analyzing the received matrix mirroring instruction to obtain an operation code and an operation domain of the matrix mirroring instruction may include:
storing a matrix mirroring instruction;
analyzing the matrix mirror image instruction to obtain an operation code and an operation domain of the matrix mirror image instruction;
the method includes storing an instruction queue, where the instruction queue includes a plurality of instructions to be executed that are sequentially arranged according to an execution order, and the plurality of instructions to be executed may include matrix mirror instructions.
In one possible implementation, the method may further include:
when determining that a first to-be-executed instruction in the plurality of to-be-executed instructions has a dependency relationship with a zeroth to-be-executed instruction before the first to-be-executed instruction, caching the first to-be-executed instruction, and after determining that the zeroth to-be-executed instruction is completely executed, controlling to execute the first to-be-executed instruction,
the method for judging whether the first to-be-executed instruction and the zeroth to-be-executed instruction before the first to-be-executed instruction have a dependency relationship comprises the following steps:
the first storage address interval for storing the data required by the first to-be-executed instruction and the zeroth storage address interval for storing the data required by the zeroth to-be-executed instruction have an overlapped area.
It should be noted that, although the matrix mirroring instruction processing method is described above by taking the above-described embodiment as an example, those skilled in the art will understand that the present disclosure should not be limited thereto. In fact, the user can flexibly set each step according to personal preference and/or actual application scene, as long as the technical scheme of the disclosure is met.
The matrix mirroring instruction processing method provided by the embodiments of the present disclosure has a wide application range, and mirrors a matrix with high processing efficiency and high processing speed.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that acts and modules referred to are not necessarily required by the disclosure.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present disclosure, it should be understood that the disclosed system and apparatus may be implemented in other ways. For example, the above-described embodiments of systems and apparatuses are merely illustrative, and for example, a division of a device, an apparatus, and a module is merely a logical division, and an actual implementation may have another division, for example, a plurality of modules may be combined or integrated into another system or apparatus, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices, apparatuses or modules, and may be an electrical or other form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present disclosure may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a form of hardware or a form of a software program module.
The integrated modules, if implemented in the form of software program modules and sold or used as a stand-alone product, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned memory comprises: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash Memory disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (17)

1. A matrix mirroring instruction processing apparatus, the apparatus comprising:
the control module is used for parsing the received matrix mirroring instruction to obtain an operation code and an operation domain of the matrix mirroring instruction, determining, according to the operation code and the operation domain, a matrix to be mirrored and a target address required for executing the matrix mirroring instruction, and determining a mirroring strategy required for the mirroring processing;
a processing module for mirroring the matrix to be mirrored according to the mirroring strategy to obtain a mirrored matrix and storing the mirrored matrix at the target address,
wherein the operation code is used for indicating that the processing performed by the matrix mirroring instruction on the matrix data is mirroring, and the operation domain includes the address of the matrix to be mirrored and the target address.
2. The apparatus of claim 1, wherein the operation domain further comprises input shapes of a matrix to be mirrored,
the processing module is further configured to perform mirror image processing on the matrix to be mirrored according to the input shape and the mirror image policy to obtain a matrix after mirror image processing.
3. The apparatus of claim 1, wherein the operation domain further comprises an output shape of a mirrored matrix,
the processing module is further configured to perform mirror image processing on the matrix to be mirrored according to the output shape and the mirror image policy to obtain a matrix after mirror image processing.
4. The apparatus of claim 1, wherein the operation domain is further configured to indicate a mirroring policy.
5. The apparatus of claim 1, wherein the opcode is further configured to indicate the mirroring policy.
6. The apparatus of claim 1,
the device further comprises: a storage module for storing the matrix to be mirrored,
wherein the control module comprises:
the instruction storage submodule is used for storing the matrix mirror image instruction;
the instruction processing submodule is used for analyzing the matrix mirror image instruction to obtain an operation code and an operation domain of the matrix mirror image instruction;
a queue storage submodule, configured to store an instruction queue, where the instruction queue includes multiple instructions to be executed that are sequentially arranged according to an execution order, where the multiple instructions to be executed include the matrix mirroring instruction,
wherein, the control module further comprises:
the dependency relationship processing submodule is used for caching a first instruction to be executed in the instruction storage submodule when the dependency relationship between the first instruction to be executed in the plurality of instructions to be executed and a zeroth instruction to be executed before the first instruction to be executed is determined, extracting the first instruction to be executed from the instruction storage submodule after the zeroth instruction to be executed is executed, and sending the first instruction to be executed to the processing module,
wherein the dependency relationship between the first to-be-executed instruction and a zeroth to-be-executed instruction before the first to-be-executed instruction comprises:
and a first storage address interval for storing the data required by the first instruction to be executed and a zeroth storage address interval for storing the data required by the zeroth instruction to be executed have an overlapped area.
7. A machine learning arithmetic device, the device comprising:
one or more matrix mirroring instruction processing devices according to any one of claims 1 to 6, configured to obtain a matrix to be mirrored and control information from another processing device, perform a specified machine learning operation, and transmit an execution result to the other processing device through an I/O interface;
when the machine learning arithmetic device comprises a plurality of matrix mirroring instruction processing devices, the plurality of matrix mirroring instruction processing devices can be connected through a specific structure and transmit data;
the plurality of matrix mirroring instruction processing devices are interconnected and transmit data through a PCIE (Peripheral Component Interconnect Express) bus to support larger-scale machine learning operations; the plurality of matrix mirroring instruction processing devices share the same control system or have their own control systems; the devices share a memory or have their own memories; and the interconnection manner of the plurality of matrix mirroring instruction processing devices is any interconnection topology.
8. A combined processing apparatus, characterized in that the combined processing apparatus comprises:
the machine learning computing device, the universal interconnect interface, and the other processing device of claim 7;
the machine learning arithmetic device interacts with the other processing devices to jointly complete the calculation operation designated by the user,
wherein the combination processing apparatus further comprises: and a storage device connected to the machine learning arithmetic device and the other processing device, respectively, for storing data of the machine learning arithmetic device and the other processing device.
9. A machine learning chip, the machine learning chip comprising:
the machine learning arithmetic device according to claim 7 or the combined processing device according to claim 8.
10. An electronic device, characterized in that the electronic device comprises:
the machine learning chip of claim 9.
11. A board card, characterized in that the board card comprises: a storage device, an interface device and a control device, and a machine learning chip according to claim 9;
wherein the machine learning chip is connected with the storage device, the control device and the interface device respectively;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the machine learning chip and external equipment;
and the control device is used for monitoring the state of the machine learning chip.
12. A matrix mirroring instruction processing method, applied to a matrix mirroring instruction processing device, the method comprising:
parsing a received matrix mirroring instruction to obtain an operation code and an operation domain of the matrix mirroring instruction, determining, according to the operation code and the operation domain, a matrix to be mirrored and a target address required for executing the matrix mirroring instruction, and determining a mirroring strategy required for the mirroring processing; and
performing mirroring processing on the matrix to be mirrored according to the mirroring strategy to obtain a mirrored matrix, and storing the mirrored matrix at the target address,
wherein the operation code is used for indicating that the processing performed by the matrix mirroring instruction on the matrix is mirroring processing, and the operation domain comprises an address of the matrix to be mirrored and the target address.
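As a non-authoritative illustration of the flow in claim 12, the Python sketch below parses a toy text encoding of a matrix mirroring instruction into an operation code and an operation domain, applies a mirroring strategy, and writes the result to the target address; the "MIRROR" mnemonic, the field layout, and the MEMORY dictionary are assumptions made purely for the example, not the claimed encoding.

    import numpy as np

    # Toy "memory": address -> matrix. A real device addresses raw storage; this
    # dictionary only makes the example self-contained.
    MEMORY = {0x100: np.arange(6).reshape(2, 3)}

    def execute_matrix_mirror(instruction):
        opcode, operation_domain = instruction.split(maxsplit=1)
        if opcode != "MIRROR":                        # operation code marks the processing as mirroring
            raise ValueError("not a matrix mirroring instruction")
        src_addr, dst_addr, strategy = [f.strip() for f in operation_domain.split(",")]
        matrix = MEMORY[int(src_addr, 16)]            # matrix to be mirrored
        if strategy == "horizontal":
            mirrored = matrix[:, ::-1]                # flip left-right
        else:
            mirrored = matrix[::-1, :]                # flip up-down (vertical mirror)
        MEMORY[int(dst_addr, 16)] = mirrored          # store the mirrored matrix at the target address
        return mirrored

    print(execute_matrix_mirror("MIRROR 0x100, 0x200, horizontal"))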
13. The method of claim 12, wherein the operation domain further comprises an input shape of the matrix to be mirrored,
and wherein performing mirroring processing on the matrix to be mirrored according to the mirroring strategy to obtain the mirrored matrix comprises:
performing mirroring processing on the matrix to be mirrored according to the input shape and the mirroring strategy to obtain the mirrored matrix.
14. The method of claim 12, wherein the operation domain further comprises an output shape of the mirrored matrix,
and wherein performing mirroring processing on the matrix to be mirrored according to the mirroring strategy to obtain the mirrored matrix comprises:
performing mirroring processing on the matrix to be mirrored according to the output shape and the mirroring strategy to obtain the mirrored matrix.
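A hedged sketch covering claims 13 and 14: if the operation domain also supplies an input shape, the source data is first interpreted with that shape; if it supplies an output shape, the mirrored result is checked against it. The reshape-then-check behaviour and the function name are assumptions chosen only to make the example concrete.

    import numpy as np

    def mirror_with_shapes(flat_data, strategy, input_shape=None, output_shape=None):
        matrix = np.asarray(list(flat_data))
        if input_shape is not None:
            matrix = matrix.reshape(input_shape)          # interpret the source per the input shape
        mirrored = matrix[:, ::-1] if strategy == "horizontal" else matrix[::-1, :]
        if output_shape is not None:
            assert mirrored.shape == tuple(output_shape), "output shape mismatch"
        return mirrored

    print(mirror_with_shapes(range(6), "horizontal", input_shape=(2, 3), output_shape=(2, 3)))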
15. The method of claim 12, wherein the operation domain is further used to indicate the mirroring strategy.
16. The method of claim 12, wherein the operation code is further used to indicate the mirroring strategy.
17. The method of claim 12,
wherein the method further comprises: storing the matrix to be mirrored,
and wherein parsing the received matrix mirroring instruction to obtain the operation code and the operation domain of the matrix mirroring instruction comprises:
storing the matrix mirroring instruction; and
parsing the matrix mirroring instruction to obtain the operation code and the operation domain of the matrix mirroring instruction;
storing an instruction queue, the instruction queue comprising a plurality of instructions to be executed arranged in sequence according to an execution order, the plurality of instructions to be executed comprising the matrix mirroring instruction,
wherein the method further comprises:
when it is determined that a first to-be-executed instruction in the plurality of to-be-executed instructions has a dependency relationship with a zeroth to-be-executed instruction preceding the first to-be-executed instruction, caching the first to-be-executed instruction, and, after it is determined that the zeroth to-be-executed instruction has finished executing, controlling execution of the first to-be-executed instruction,
wherein the dependency relationship between the first to-be-executed instruction and a zeroth to-be-executed instruction before the first to-be-executed instruction comprises:
a first storage address interval for storing data required by the first to-be-executed instruction and a zeroth storage address interval for storing data required by the zeroth to-be-executed instruction having an overlapping area.
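The Python sketch below models, under stated assumptions (a thread pool stands in for the processing hardware, and address intervals are (start, end) pairs), how a to-be-executed instruction can be held back until every overlapping, still-running earlier instruction completes; it illustrates the control flow of this claim, not the claimed implementation.

    import concurrent.futures as cf

    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    def run_queue(queue, do_execute):
        """queue: list of (name, (start, end)) in execution order."""
        in_flight = []                                    # (interval, future) of already-issued instructions
        with cf.ThreadPoolExecutor(max_workers=4) as pool:
            for name, interval in queue:
                for earlier_interval, future in in_flight:
                    if overlaps(interval, earlier_interval):
                        future.result()                   # cache/wait until the earlier instruction finishes
                in_flight.append((interval, pool.submit(do_execute, name)))

    # Example: "MIRROR B" touches addresses that "MIRROR A" also touches, so it waits for A;
    # "LOAD C" is independent and is issued immediately.
    run_queue([("MIRROR A", (0x000, 0x100)),
               ("MIRROR B", (0x080, 0x180)),
               ("LOAD C",   (0x400, 0x500))],
              do_execute=print)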
CN201910294130.3A 2018-11-30 2019-04-12 Operation method, device and related product Withdrawn CN111813450A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910294130.3A CN111813450A (en) 2019-04-12 2019-04-12 Operation method, device and related product
PCT/CN2019/120893 WO2020108471A1 (en) 2018-11-30 2019-11-26 Computing method and apparatus, and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910294130.3A CN111813450A (en) 2019-04-12 2019-04-12 Operation method, device and related product

Publications (1)

Publication Number Publication Date
CN111813450A true CN111813450A (en) 2020-10-23

Family

ID=72843971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910294130.3A Withdrawn CN111813450A (en) 2018-11-30 2019-04-12 Operation method, device and related product

Country Status (1)

Country Link
CN (1) CN111813450A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150278131A1 (en) * 2014-03-26 2015-10-01 Kay Hesse Direct memory access controller with general purpose inputs and outputs
CN108388446A (en) * 2018-02-05 2018-08-10 上海寒武纪信息科技有限公司 Computing module and method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BRYAN_QAQ: "Geometric Transformation of Images in Matlab: Image Mirroring" (Matlab图像的几何变换之图像镜像), CSDN *

Similar Documents

Publication Publication Date Title
CN110096310B (en) Operation method, operation device, computer equipment and storage medium
CN110119807B (en) Operation method, operation device, computer equipment and storage medium
CN111813449A (en) Operation method, device and related product
CN111047005A (en) Operation method, operation device, computer equipment and storage medium
CN111813450A (en) Operation method, device and related product
CN111381873A (en) Operation method, device and related product
CN111382851A (en) Operation method, device and related product
CN111382850A (en) Operation method, device and related product
CN111047030A (en) Operation method, operation device, computer equipment and storage medium
CN111353595A (en) Operation method, device and related product
CN111381872A (en) Operation method, device and related product
CN112395003A (en) Operation method, device and related product
CN111813376A (en) Operation method, device and related product
CN111813448A (en) Operation method, device and related product
CN111325331B (en) Operation method, device and related product
CN111401536A (en) Operation method, device and related product
CN111382390B (en) Operation method, device and related product
CN111078125B (en) Operation method, device and related product
CN111400341B (en) Scalar lookup instruction processing method and device and related product
CN111275197B (en) Operation method, device, computer equipment and storage medium
CN111079915B (en) Operation method, device and related product
CN111079914B (en) Operation method, system and related product
CN111399905B (en) Operation method, device and related product
CN111813537A (en) Operation method, device and related product
CN112394993A (en) Half-precision floating point to short shaping instruction processing device and method and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20201023)