CN112396169B

CN112396169B - Operation method, device, computer equipment and storage medium

Info

Publication number: CN112396169B
Application number: CN201910745723.7A
Authority: CN
Inventors: Name withheld upon request
Original assignee: Shanghai Cambricon Information Technology Co Ltd
Current assignee: Shanghai Cambricon Information Technology Co Ltd
Priority date: 2019-08-13
Filing date: 2019-08-13
Publication date: 2024-04-02
Anticipated expiration: 2039-08-13
Also published as: CN112396169A

Abstract

The present disclosure relates to an operation method, an apparatus, a computer device, and a storage medium. The combined processing device comprises a machine learning computing device, a universal interconnection interface, and other processing devices. The machine learning computing device interacts with the other processing devices to jointly complete a computing operation specified by the user. The combined processing device further comprises a storage device, connected to both the machine learning computing device and the other processing devices, for storing data of the machine learning computing device and the other processing devices. The operation method, apparatus, computer device, and storage medium provided by the embodiments of the disclosure have a wide application range and achieve high operation processing efficiency and speed.

Description

Operation method, device, computer equipment and storage medium

Technical Field

The disclosure relates to the field of computer technology, and in particular to a summation pooling instruction processing method and apparatus, a computer device, and a storage medium.

Background

With the continuous development of technology, machine learning, and neural network algorithms in particular, are used more and more widely. They have been applied successfully in fields such as image recognition, speech recognition, and natural language processing. However, as neural network algorithms grow more complex, the types and number of data operations involved keep increasing. In the related art, performing sum pooling operations on data is inefficient and slow.

Disclosure of Invention

In view of this, the present disclosure proposes a summation pooling instruction processing method, apparatus, computer device, and storage medium to improve the efficiency and speed of performing summation pooling operations on data.

According to a first aspect of the present disclosure there is provided a summation pooling instruction processing apparatus, the apparatus comprising:

the control module is configured to parse the obtained summation pooling instruction to obtain its operation code and operation domain, and to obtain, according to the operation domain, the data to be operated on, the pooling core, and the target address required for executing the summation pooling instruction;

and the operation module is configured to perform a summation pooling operation on the data to be operated on according to the pooling core, obtain an operation result, and store the operation result at the target address.

According to a second aspect of the present disclosure, there is provided a machine learning arithmetic device, the device comprising:

one or more summation pooling instruction processing apparatuses according to the first aspect, configured to obtain data to be operated on and control information from other processing apparatuses, perform a specified machine learning operation, and transmit the execution result to the other processing apparatuses through an I/O interface;

when the machine learning operation device comprises a plurality of summation pooling instruction processing apparatuses, these apparatuses may be connected through a specific structure and transmit data to one another;

the plurality of summation pooling instruction processing apparatuses are interconnected and transmit data through a Peripheral Component Interconnect Express (PCIe) bus to support larger-scale machine learning operations; they may share the same control system or each have their own control system; they may share a memory or each have their own memory; and they may be interconnected in any interconnection topology.

According to a third aspect of the present disclosure, there is provided a combination processing apparatus, the apparatus comprising:

the machine learning arithmetic device described in the second aspect, the universal interconnection interface, and the other processing devices;

the machine learning operation device interacts with the other processing devices to jointly complete the calculation operation designated by the user.

According to a fourth aspect of the present disclosure, there is provided a machine learning chip including the machine learning arithmetic device described in the second aspect above or the combined processing device described in the third aspect above.

According to a fifth aspect of the present disclosure, there is provided a machine learning chip package structure including the machine learning chip of the fourth aspect described above.

According to a sixth aspect of the present disclosure, there is provided a board including the machine learning chip package structure of the fifth aspect.

According to a seventh aspect of the present disclosure, there is provided an electronic device including the machine learning chip described in the fourth aspect or the board described in the sixth aspect.

According to an eighth aspect of the present disclosure, there is provided a summation pooling instruction processing method, the method being applied to a summation pooling instruction processing apparatus, the method comprising:

parsing the obtained summation pooling instruction to obtain its operation code and operation domain, and obtaining, according to the operation domain, the data to be operated on, the pooling core, and the target address required for executing the summation pooling instruction;

and performing a summation pooling operation on the data to be operated on according to the pooling core to obtain an operation result, and storing the operation result at the target address.

According to a ninth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described summation pooling instruction processing method.

In some embodiments, the electronic device comprises a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a vehicle recorder, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a means of transport, a household appliance, and/or a medical device.

In some embodiments, the means of transport comprises an aircraft, a ship, and/or a vehicle; the household appliance comprises a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas cooker, and/or a range hood; the medical device comprises a nuclear magnetic resonance apparatus, a B-mode ultrasonic apparatus, and/or an electrocardiograph.

The device comprises a control module and an operation module. The control module parses the obtained summation pooling instruction to obtain its operation code and operation domain, and obtains, according to the operation domain, the data to be operated on, the pooling core, and the target address required for executing the summation pooling instruction; the operation module performs a summation pooling operation on the data to be operated on according to the pooling core, obtains an operation result, and stores the operation result at the target address. The summation pooling instruction processing method, device, and related products provided by the embodiments of the disclosure have a wide application range and process both summation pooling instructions and summation pooling operations with high efficiency and speed.

Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features and aspects of the present disclosure and together with the description, serve to explain the principles of the disclosure.

Fig. 1 illustrates a block diagram of a summation pooling instruction processing apparatus according to an embodiment of the present disclosure.

Figs. 2a-2f illustrate block diagrams of a summation pooling instruction processing apparatus according to embodiments of the present disclosure.

Fig. 3 shows a schematic diagram of an application scenario of a summation pooling instruction processing apparatus according to an embodiment of the disclosure.

Figs. 4a and 4b show block diagrams of a combined processing apparatus according to embodiments of the disclosure.

Fig. 5 shows a schematic structural diagram of a board according to an embodiment of the present disclosure.

Fig. 6 illustrates a flow chart of a summation pooling instruction processing method according to an embodiment of the present disclosure.

Figs. 7a-7c illustrate schematic diagrams of a sum pooling operation according to an embodiment of the present disclosure.

Detailed Description

The technical solutions in the embodiments of the present disclosure are described clearly and fully below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the disclosure. All other embodiments obtained by those skilled in the art based on the embodiments in this disclosure without inventive effort fall within the scope of the present disclosure.

It should be understood that the terms "zero," "first," "second," and the like in the claims, specification and drawings of the present disclosure are used for distinguishing between different objects and not for describing a particular sequential order. The terms "comprises" and "comprising" when used in the specification and claims of the present disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the present disclosure is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the present disclosure and claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.

As used in this specification and the claims, the term "if" may be interpreted as "when", "once", "in response to determining", or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".

With the wide use of neural network algorithms and the continuous improvement in the computing capability of computer hardware, the variety and number of data operations involved in practical applications keep growing. A sum pooling operation (underpool) sums the data in the region covered by the pooling core. Because there are many programming languages and, at present, no sum pooling instruction that applies broadly across them, in the related art a technician must write multiple custom instructions for each programming language environment to implement the sum pooling operation, which makes the operation inefficient and slow. The present disclosure provides a summation pooling instruction processing method, apparatus, computer device, and storage medium with which a sum pooling operation can be implemented by a single instruction, significantly improving the efficiency and speed of sum pooling operations.

Fig. 1 illustrates a block diagram of a summation pooling instruction processing apparatus according to an embodiment of the present disclosure. As shown in fig. 1, the apparatus includes a control module 11 and an operation module 12.

The control module 11 is configured to parse the obtained summation pooling instruction to obtain its operation code and operation domain, and to obtain, according to the operation domain, the data to be operated on, the pooling core, and the target address required for executing the summation pooling instruction. The operation code indicates that the operation the instruction performs on the data is a summation pooling operation, and the operation domain comprises the address of the data to be operated on, the pooling core, and the target address.

The operation module 12 is configured to perform a summation pooling operation on the data to be operated on according to the pooling core, obtain an operation result, and store the operation result at the target address.

In this embodiment, the control module may obtain the data to be operated from the data address to be operated. The control module may obtain instructions and data through a data input output unit, which may be one or more data I/O interfaces or I/O pins.

In this embodiment, the opcode may be the part of an instruction, or a field (usually represented by a code), specified in a computer program to indicate the operation to be performed; it is an instruction sequence number that tells the apparatus executing the instruction which instruction is to be executed. The operation domain may be the source of the data required for executing the corresponding instruction, including parameters such as the data to be operated on, the pooling core, and the corresponding operation method. A sum pooling instruction must include an opcode and an operation domain, and the operation domain includes at least the address of the data to be operated on, the pooling core, and the target address.
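As an illustration only, the sketch below shows how a control module might split a textual sum pooling instruction into an opcode and an operation domain and check that the mandatory fields are present. The mnemonic SUMPOOL, the key=value encoding, and the field names src_addr, kernel_h, kernel_w, and dst_addr are hypothetical; the disclosure leaves the instruction format open.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical mnemonic and field names; the disclosure leaves the format open.
SUMPOOL_OPCODE = "SUMPOOL"
REQUIRED_FIELDS = ("src_addr", "kernel_h", "kernel_w", "dst_addr")


@dataclass
class SumPoolInstruction:
    opcode: str
    operation_domain: Dict[str, int] = field(default_factory=dict)


def parse_instruction(text: str) -> SumPoolInstruction:
    """Split e.g. 'SUMPOOL src_addr=0 kernel_h=2 kernel_w=2 dst_addr=64'
    into an opcode and an operation domain of named integer fields."""
    opcode, *items = text.split()
    if opcode != SUMPOOL_OPCODE:
        raise ValueError(f"not a sum pooling instruction: {opcode}")
    domain: Dict[str, int] = {}
    for item in items:
        key, value = item.split("=", 1)
        domain[key] = int(value)
    # The data address, pooling core and target address are mandatory.
    missing = [name for name in REQUIRED_FIELDS if name not in domain]
    if missing:
        raise ValueError(f"operation domain is missing {missing}")
    return SumPoolInstruction(opcode, domain)
```

Here the pooling core is carried as two size fields; a hardware encoding would more likely pack fields into fixed bit ranges.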

It should be appreciated that the instruction format of the sum pooling instruction, as well as the operation codes and operation fields involved, may be set by those skilled in the art as desired, and this disclosure is not limited in this regard.

In this embodiment, the apparatus may include one or more control modules and one or more operation modules, and the number of control modules and operation modules may be set according to actual needs, which is not limited in this disclosure. When the device comprises one control module, that control module can receive the summation pooling instruction and control one or more operation modules to perform the summation pooling operation. When the device comprises a plurality of control modules, each control module can receive summation pooling instructions and control its corresponding one or more operation modules to perform the summation pooling operation.

The summation pooling instruction processing device provided by the embodiment of the disclosure comprises a control module and an operation module. The control module parses the acquired summation pooling instruction to obtain its operation code and operation domain, and obtains, according to the operation domain, the data to be operated on, the pooling core, and the target address required for executing the instruction; the operation module performs the summation pooling operation on the data to be operated on according to the pooling core, obtains an operation result, and stores it at the target address. The device has a wide application range and processes summation pooling instructions, and hence summation pooling operations, with high efficiency and speed.

Fig. 2a shows a block diagram of a summation pooling instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2a, the operation module 12 may include one or more adders 120. The adder 120 is configured to perform a summation operation on data to be operated in an area corresponding to the pooling core, so as to obtain an operation result.
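A rough software model of the adder(s) 120: the function below sums the values covered by one pooling-core window and spreads the additions over num_adders partial accumulators that are combined at the end. The round-robin split and the num_adders parameter are illustrative assumptions, not the hardware's actual scheduling.

```python
from typing import List, Sequence


def window_sum(data: Sequence[Sequence[float]], top: int, left: int,
               kernel_h: int, kernel_w: int, num_adders: int = 1) -> float:
    """Sum the kernel_h x kernel_w region whose top-left corner is (top, left)."""
    if num_adders < 1:
        raise ValueError("need at least one adder")
    values = [data[r][c]
              for r in range(top, top + kernel_h)
              for c in range(left, left + kernel_w)]
    # Each hypothetical adder accumulates every num_adders-th value.
    partial: List[float] = [0.0] * num_adders
    for i, v in enumerate(values):
        partial[i % num_adders] += v
    # Combine the partial sums into the final operation result.
    return sum(partial)
```

More adders shorten each accumulation chain, which is one way to read the statement that the adder count may be chosen according to data volume and required speed.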

In this implementation, the number of adders may be set according to the size of the data amount of the summation operation to be performed, the processing speed, the efficiency, and the like of the summation operation, which is not limited by the present disclosure.

Fig. 2b illustrates a block diagram of a summation pooling instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2b, the operation module 12 may include a master operation sub-module 121 and a plurality of slave operation sub-modules 122. The master operation sub-module 121 includes one or more adders.

In one possible implementation, the main operation submodule 121 is configured to perform a summation operation on data to be operated in an area corresponding to the pooling core by using an adder, obtain an operation result, and store the operation result in the target address.

In one possible implementation manner, the control module 11 may be further configured to parse the obtained calculation instruction to obtain an operation domain and an operation code of the calculation instruction, and obtain data to be operated required for executing the calculation instruction according to the operation domain. The operation module 12 may be further configured to operate on the data to be operated according to the calculation instruction, so as to obtain a calculation result of the calculation instruction. The operation module may further include a plurality of operators for performing operations corresponding to operation types of the calculation instruction.

In this implementation manner, the calculation instruction may be other instructions for performing arithmetic operations, logical operations, and other operations on data such as scalar, vector, matrix, tensor, etc., and those skilled in the art may set the calculation instruction according to actual needs, which is not limited in this disclosure.

In this implementation, the operators may include units capable of performing arithmetic, logical, and similar operations on data, such as adders, dividers, multipliers, and comparators. The type and number of operators may be set according to the amount of data to be processed, the operation type, and the required processing speed and efficiency; the present disclosure is not limited in this respect.

In one possible implementation, as shown in fig. 2b, the operation module 12 may include a master operation sub-module 121 and a plurality of slave operation sub-modules 122, each slave operation sub-module 122 including one or more adders.

In a possible implementation manner, the control module 11 is further configured to parse the calculation instruction to obtain a plurality of operation instructions, and send the data to be operated and the plurality of operation instructions to the main operation sub-module 121.

The master operation sub-module 121 is configured to receive the data to be operated on, the pooling core, and the target address required for executing the summation pooling instruction, as acquired by the control module, and to allocate and transmit them to the slave operation sub-modules.

The slave operation sub-module 122 is configured to receive the data to be operated on, the pooling core, and the target address required for executing the summation pooling instruction, as allocated and transmitted by the master operation sub-module, sum the data in the region covered by the pooling core using one or more adders to obtain an operation result, and store the operation result at the target address.
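One way to picture the master/slave split is the sketch below: the master deals output rows round-robin to the slave sub-modules, each slave sums its windows, and the partial results are gathered into a shared target buffer. The row-wise partitioning, the square pooling core, and the function names are assumptions made for illustration; the disclosure does not fix a partitioning scheme. Slaves run sequentially here rather than in parallel.

```python
from typing import Dict, List


def slave_compute(data: List[List[int]], rows: List[int], k: int, stride: int,
                  out_w: int) -> Dict[int, List[int]]:
    """One slave sub-module: sum every k x k window in its assigned output rows."""
    result = {}
    for oy in rows:
        result[oy] = [sum(data[oy * stride + dy][ox * stride + dx]
                          for dy in range(k) for dx in range(k))
                      for ox in range(out_w)]
    return result


def master_dispatch(data: List[List[int]], k: int, stride: int,
                    num_slaves: int) -> List[List[int]]:
    """Master sub-module: split output rows across slaves, gather results."""
    out_h = (len(data) - k) // stride + 1
    out_w = (len(data[0]) - k) // stride + 1
    target: List[List[int]] = [[] for _ in range(out_h)]  # stands in for the target address
    for s in range(num_slaves):
        assigned = list(range(s, out_h, num_slaves))
        for oy, row in slave_compute(data, assigned, k, stride, out_w).items():
            target[oy] = row
    return target
```

Because each output row depends only on its own input rows, the result does not depend on how many slaves are used.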

In this implementation, when the calculation instruction is an operation performed on scalar, vector data, the apparatus may control the main operation submodule to perform an operation corresponding to the calculation instruction using an operator therein. When the calculation instruction is an operation for data with dimensions greater than or equal to 2, such as a matrix, a tensor and the like, the device can control the slave operation submodule to perform an operation corresponding to the calculation instruction by using an operator in the slave operation submodule.

It should be noted that, a person skilled in the art may set the connection manner between the master operator module and the plurality of slave operator modules according to actual needs, so as to implement the architecture setting of the operation module, for example, the architecture of the operation module may be an "H" type architecture, an array type architecture, a tree type architecture, etc., which is not limited in this disclosure.

Fig. 2c shows a block diagram of a summation pooling instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2c, the operation module 12 may further include one or more branch operation sub-modules 123, which forward data and/or operation instructions between the master operation sub-module 121 and the slave operation sub-modules 122. The master operation sub-module 121 is connected to the one or more branch operation sub-modules 123. In this way, the master, branch, and slave operation sub-modules of the operation module are connected in an "H"-type architecture, and data and/or operation instructions are forwarded through the branch operation sub-modules, which saves resources of the master operation sub-module and further increases instruction processing speed.

Fig. 2d shows a block diagram of a summation pooling instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in FIG. 2d, a plurality of slave operator modules 122 are distributed in an array.

Each slave operation sub-module 122 is connected to its adjacent slave operation sub-modules 122, and the master operation sub-module 121 is connected to k of the slave operation sub-modules 122, namely the n slave operation sub-modules 122 in row 1, the n slave operation sub-modules 122 in row m, and the m slave operation sub-modules 122 in column 1.

As shown in fig. 2d, the k slave operator modules only include n slave operator modules in row 1, n slave operator modules in row m, and m slave operator modules in column 1, that is, the k slave operator modules are slave operator modules directly connected with the master operator module from the plurality of slave operator modules. The k slave operation sub-modules are used for forwarding data and instructions among the master operation sub-module and the plurality of slave operation sub-modules. In this way, the plurality of slave operation sub-modules are distributed in an array, so that the speed of sending data and/or operating instructions to the slave operation sub-modules by the master operation sub-module can be improved, and the processing speed of the instructions can be further improved.
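The directly connected slave sub-modules of Fig. 2d can be enumerated as below, using 0-based (row, column) positions in an m x n array. When the positions are treated as a set, row 1, row m, and column 1 share two corner modules, so for m ≥ 2 there are 2n + m - 2 distinct modules. The function name is hypothetical.

```python
from typing import Set, Tuple


def directly_connected(m: int, n: int) -> Set[Tuple[int, int]]:
    """Positions (row, col) of slave sub-modules wired to the master."""
    first_row = {(0, c) for c in range(n)}
    last_row = {(m - 1, c) for c in range(n)}
    first_col = {(r, 0) for r in range(m)}
    # Set union counts the shared corner modules only once.
    return first_row | last_row | first_col
```

All other slave sub-modules receive data by forwarding from these neighbours.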

Fig. 2e shows a block diagram of a summation pooling instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2e, the operation module may further include a tree submodule 124. The tree submodule 124 includes a root port 401 and a plurality of branch ports 402. The root port 401 is connected to the master operator module 121, and the plurality of branch ports 402 are connected to the plurality of slave operator modules 122, respectively. The tree submodule 124 has a transceiving function and is used for forwarding data and/or operation instructions between the master operation submodule 121 and the slave operation submodule 122. Therefore, the operation module is connected in a tree-shaped structure through the action of the tree-shaped sub-module, and the forwarding function of the tree-shaped sub-module is utilized, so that the speed of transmitting data and/or operation instructions to the slave operation sub-module by the master operation sub-module can be improved, and the processing speed of the instructions is further improved.

In one possible implementation, the tree submodule 124 may be an optional component of the apparatus and may include at least one layer of nodes. A node is a wiring structure with a forwarding function and has no computing function of its own. The lowest layer of nodes is connected to the slave operation sub-modules to forward data and/or operation instructions between the master operation sub-module 121 and the slave operation sub-modules 122. In particular, if the tree submodule has zero layers of nodes, the device does not require the tree submodule.

In one possible implementation, tree submodule 124 may include a plurality of nodes of an n-ary tree structure, which may have a plurality of layers.

For example, fig. 2f shows a block diagram of a summation pooling instruction processing device according to an embodiment of the disclosure. As shown in fig. 2f, the n-ary tree structure may be a binary tree structure, and the tree submodule comprises two layers of nodes 01. The lowest-layer nodes 01 are connected to the slave operation sub-modules 122 to forward data and/or operation instructions between the master operation sub-module 121 and the slave operation sub-modules 122.

In this implementation, the n-ary tree structure may also be a ternary tree structure or the like, where n is a positive integer greater than or equal to 2. Those skilled in the art may set the value of n and the number of node layers in the n-ary tree structure as desired; this disclosure is not limited in this regard.
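Because tree nodes only forward data, a practical sizing question is how many node layers an n-ary tree needs to reach a given number of slave sub-modules. The sketch assumes that each layer multiplies the fan-out by n. This is an illustrative model, not a rule from the disclosure. A single slave needs zero layers, matching the case above in which no tree submodule is required.

```python
def tree_layers(n: int, num_slaves: int) -> int:
    """Smallest number of node layers so that n ** layers >= num_slaves."""
    if n < 2:
        raise ValueError("an n-ary tree needs n >= 2")
    layers, fan_out = 0, 1
    # Integer loop avoids floating-point error from math.log.
    while fan_out < num_slaves:
        fan_out *= n
        layers += 1
    return layers
```

For instance, under this model a binary tree serving four slave sub-modules needs two layers.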

In one possible implementation, the operation domain may further include an input height and an input width.

The control module is further used for acquiring the data to be operated corresponding to the input width and the input height from the data address to be operated.

In this implementation, the input height and input width may define the data amount and size of the obtained data to be operated on. The operation field may include a specific value for the input height and input width, or may be a memory address for storing the input height and input width. When a specific value of the input height and the input width is directly included in the operation domain, the specific value is determined as the corresponding input height and input width, respectively. When the memory addresses of the input height and the input width are included in the operation domain, the input height and the input width may be obtained from the memory addresses of the input height and the input width, respectively.

In one possible implementation, when the operation domain does not include the input height and the input width, the data to be operated may be obtained according to a default input height and a default input width that are set in advance.
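The three cases above can be sketched as follows. An immediate value is used directly, an address is dereferenced, and a missing field falls back to a preset default. The tagged-tuple encoding, the memory dict, and the default values of 32 are hypothetical.

```python
from typing import Dict, Optional, Tuple

Operand = Tuple[str, int]  # ("imm", value) or ("addr", address); hypothetical encoding

DEFAULTS = {"input_h": 32, "input_w": 32}  # hypothetical preset defaults


def resolve(domain: Dict[str, Operand], name: str,
            memory: Dict[int, int]) -> int:
    """Return the value of one operation-domain field."""
    operand: Optional[Operand] = domain.get(name)
    if operand is None:
        return DEFAULTS[name]      # field absent: use the preset default
    kind, payload = operand
    if kind == "imm":
        return payload             # specific value carried in the instruction
    if kind == "addr":
        return memory[payload]     # value stored at the given address
    raise ValueError(f"unknown operand kind {kind!r}")
```

The same helper would serve the input channel count or the pooling core size, which are described with the same value-or-address-or-default pattern.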

In this way, the amount and size of the data to be operated on are bounded, which ensures the accuracy of the operation result and allows the device to execute the summation pooling instruction.

In one possible implementation, the operation domain may also include the number of input channels.

The control module is further used for obtaining data to be operated corresponding to the number of input channels from the data address to be operated.

In this implementation, the number of input channels may define the number of channels of the obtained data to be operated on. The number of input channels included in the operation field may be a specific value, or may be a storage address storing the number of input channels. When a specific value of the number of input channels is directly included in the operation domain, the specific value is determined as the corresponding number of input channels. When the memory address of the number of input channels is included in the operation domain, the number of input channels may be obtained from the memory address of the number of input channels.

In one possible implementation, when the operation domain does not include the number of input channels, the data to be operated may be obtained according to a default input channel number set in advance.
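Once the channel count, height, and width are known, the control module can read exactly C × H × W elements starting at the data address. The sketch assumes that a flat list stands in for memory and that the data is stored contiguously in channel-major (C, H, W) order. The disclosure does not specify a layout.

```python
from typing import List


def load_tensor(memory: List[float], addr: int, channels: int,
                height: int, width: int) -> List[List[List[float]]]:
    """Read a channels x height x width block stored contiguously at addr."""
    count = channels * height * width
    if addr < 0 or addr + count > len(memory):
        raise IndexError("requested block exceeds memory")
    flat = memory[addr:addr + count]
    plane = height * width  # elements per channel
    return [[flat[c * plane + r * width:c * plane + (r + 1) * width]
             for r in range(height)]
            for c in range(channels)]
```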

In this way, the number of input channels of the data to be operated on is bounded, which ensures the accuracy of the operation result and allows the device to execute the summation pooling instruction.

In one possible implementation, the operation domain may also include a pooled core height and a pooled core width.

Wherein the control module 11 is further configured to perform a summation pooling operation according to the pooling core height and the pooling core width.

In one possible implementation, the operation domain may also include a first stride. The operation module 12 may be further configured to move the pooling core in the width direction according to the first stride.

In one possible implementation, the operation domain may further include a second stride. The operation module 12 may be further configured to move the pooling core in the height direction according to the second stride.

In this implementation, the stride of the sum pooling operation is the distance the pooling core moves at each step during the sum pooling operation. The first stride is the distance the pooling core moves in the width direction, and the second stride is the distance it moves in the height direction.
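A reference-style sketch of two-dimensional sum pooling with separate strides follows. The first stride moves the pooling core along the width and the second along the height. Output sizes use the common floor((W - kernel_w) / first_stride) + 1 and floor((H - kernel_h) / second_stride) + 1 rule. This is assumed here, but it is consistent with skipping rows or columns that cannot fill a whole window. Names are illustrative.

```python
from typing import List


def sum_pool_2d(data: List[List[float]], kernel_h: int, kernel_w: int,
                first_stride: int, second_stride: int) -> List[List[float]]:
    """Sum each kernel_h x kernel_w window; first_stride moves along the
    width, second_stride along the height."""
    in_h, in_w = len(data), len(data[0])
    out_h = (in_h - kernel_h) // second_stride + 1
    out_w = (in_w - kernel_w) // first_stride + 1
    result = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            top, left = oy * second_stride, ox * first_stride
            row.append(sum(data[top + dy][left + dx]
                           for dy in range(kernel_h)
                           for dx in range(kernel_w)))
        result.append(row)
    return result
```

With a 3 × 3 input, a 2 × 2 core, and both strides equal to 1, the windows overlap and give four sums.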

In this disclosure, the parameters required for the sum pooling operation, such as the height, width, first stride, and second stride of the pooling core, are described using a two-dimensional pooling core only as an example; if the pooling core is multidimensional, its parameters include the size and stride of each of its dimensions.
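For a multidimensional pooling core, each dimension carries its own size and stride. The pure-Python sketch below works on a row-major flat array with an explicit shape. The flat representation and the function name are assumptions made for the example.

```python
from itertools import product
from math import prod
from typing import List, Sequence, Tuple


def sum_pool_nd(flat: Sequence[float], shape: Sequence[int],
                kernel: Sequence[int], strides: Sequence[int]
                ) -> Tuple[List[float], Tuple[int, ...]]:
    """Sum pooling over a row-major N-D array with per-dimension size/stride."""
    if not (len(shape) == len(kernel) == len(strides)):
        raise ValueError("shape, kernel and strides must have the same rank")
    out_shape = tuple((s - k) // st + 1
                      for s, k, st in zip(shape, kernel, strides))
    # Row-major offsets: how far one step in each dimension moves in `flat`.
    offsets = [prod(shape[d + 1:]) for d in range(len(shape))]
    result = []
    for out_idx in product(*(range(n) for n in out_shape)):
        total = 0.0
        for k_idx in product(*(range(k) for k in kernel)):
            pos = sum((o * st + k) * off
                      for o, st, k, off in zip(out_idx, strides, k_idx, offsets))
            total += flat[pos]
        result.append(total)
    return result, out_shape
```

The two-dimensional case is the special case with shape (H, W), kernel (kernel_h, kernel_w), and strides (second stride, first stride).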

In one possible implementation, when the first stride and the second stride are not given in the operation domain of the summation pooling instruction, the operation module may use the pooling core width as the stride in the width direction and the pooling core height as the stride in the height direction, so that the summation pooling operation can still proceed normally. For example, the operation module 12 may be further configured to move the pooling core over the data to be operated on in a non-overlapping manner, and to sum the plurality of data to be operated on in the area corresponding to the pooling core to obtain the operation result.
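The default-stride behaviour can be sketched as follows, assuming that a missing stride is represented by `None`. The helper names `resolve_strides` and `block_sum_pool` are hypothetical. With both strides defaulted to the pooling core size, the pooling core tiles the input without overlap.

```python
from typing import List, Optional, Tuple


def resolve_strides(kernel_h: int, kernel_w: int,
                    stride_x: Optional[int] = None,
                    stride_y: Optional[int] = None) -> Tuple[int, int]:
    """Return (first stride, second stride), defaulting to the core size."""
    # First stride (width direction) defaults to the pooling core width.
    sx = kernel_w if stride_x is None else stride_x
    # Second stride (height direction) defaults to the pooling core height.
    sy = kernel_h if stride_y is None else stride_y
    return sx, sy


def block_sum_pool(data: List[List[int]], kernel_h: int, kernel_w: int,
                   stride_x: Optional[int] = None,
                   stride_y: Optional[int] = None) -> List[List[int]]:
    """Sum pooling in which omitted strides fall back to the core size."""
    sx, sy = resolve_strides(kernel_h, kernel_w, stride_x, stride_y)
    in_h, in_w = len(data), len(data[0])
    return [
        [sum(data[top + i][left + j]
             for i in range(kernel_h) for j in range(kernel_w))
         for left in range(0, in_w - kernel_w + 1, sx)]
        for top in range(0, in_h - kernel_h + 1, sy)
    ]


if __name__ == "__main__":
    x = [[1, 1, 2, 2],
         [1, 1, 2, 2],
         [3, 3, 4, 4],
         [3, 3, 4, 4]]
    # No strides given: each 2x2 block is summed once, without overlap.
    print(block_sum_pool(x, 2, 2))  # prints [[4, 8], [12, 16]]
```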

In one possible implementation, when the operation domain does not include the pooling core height and the pooling core width, a preset default pooling core height and a preset default pooling core width may be used, so that the control module and the operation module can execute the summation pooling instruction.

In one possible implementation, the operation module 12 is further configured to, when the size of the data to be operated on is a non-integer multiple of the size of the pooling core, perform the summation pooling operation on the portion of the data to be operated on that is an integer multiple of the size of the pooling core. The size of the data to be operated on being a non-integer multiple of the size of the pooling core may include at least one of the following: when the first stride is not included or is equal to the pooling core width, the input width of the data to be operated on is a non-integer multiple of the pooling core width; when the first stride is included, the difference between the input width of the data to be operated on and the pooling core width is a non-integer multiple of the first stride; when the second stride is not included or is equal to the pooling core height, the input height of the data to be operated on is a non-integer multiple of the pooling core height; when the second stride is included, the difference between the input height of the data to be operated on and the pooling core height is a non-integer multiple of the second stride. For example, suppose the input width and input height are 5 and 4, respectively, the pooling core width and height are both 2, and the first stride and the second stride are both 2. In this example all sizes are in the same unit, which may be bytes, pixels, or the like; the disclosure is not limited in this regard. Since the pooling core width equals the first stride and the input width of 5 is not an integer multiple of the pooling core width, the summation pooling operation is performed on the portion that is an integer multiple of the pooling core size, i.e., on the first 4 data in the width direction.
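The four conditions above can be checked mechanically, one dimension at a time. The sketch below uses a hypothetical function name and assumes that a missing stride is passed as `None`. It returns whether a remainder exists and how many leading elements are covered by whole pooling-core positions; for the width-5, core-width-2, stride-2 example it reports that only the first 4 elements are covered.

```python
from typing import Optional, Tuple


def covered_extent(size: int, kernel: int,
                   stride: Optional[int] = None) -> Tuple[bool, int]:
    """For one dimension, return (has_remainder, covered_length).

    covered_length is the number of leading elements that lie inside
    whole pooling-core positions; has_remainder is True when the size is
    a non-integer multiple of the pooling core in the sense described above.
    """
    if size < kernel:
        # Not even one full pooling-core position fits.
        return True, 0
    if stride is None or stride == kernel:
        # Non-overlapping case: compare the input size with the core size.
        return size % kernel != 0, (size // kernel) * kernel
    # Strided case: compare (input size - core size) with the stride.
    steps = (size - kernel) // stride
    return (size - kernel) % stride != 0, steps * stride + kernel


if __name__ == "__main__":
    # Example from the text: input width 5, input height 4,
    # 2x2 pooling core, first and second strides both 2.
    print(covered_extent(5, 2, 2))  # width:  (True, 4)
    print(covered_extent(4, 2, 2))  # height: (False, 4)
```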

In this implementation, the remaining data, i.e., the part of the data to be operated on that is left over because the data size is a non-integer multiple of the pooling core size, and that is smaller than the pooling core, may be handled in several ways.

As shown in fig. 7a, the summation pooling operation may be skipped for the remaining data whose size is smaller than the pooling core size. That is, in the above example, the last datum of each row in the width direction is not operated on.

As shown in fig. 7a, the summation pooling operation may instead be performed directly on the remaining data whose size is smaller than the pooling core size. That is, in the above example, even though only the last datum remains in each row in the width direction, which is narrower than the pooling core, that last datum is summed to obtain an operation result.

As shown in fig. 7b, the remaining data whose size is smaller than the pooling core size may be padded and then subjected to the summation pooling operation to obtain the operation result. That is, in the above example, padding is applied in the width direction: one default value, such as 0, is appended in the width direction. The padded input width is then 6, which is an integer multiple of the pooling core width, and the summation pooling operation is performed to obtain the operation result.

As shown in fig. 7c, for the remaining data whose size is smaller than the pooling core size, the position of the pooling core may be moved in the reverse direction so that the area covered by the pooling core after the reverse movement has the same size as the pooling core and includes the remaining data; the summation pooling operation is then performed on the data in that area to obtain the operation result. That is, in the above example, when the pooling core covers the 1st and 2nd data and then the 3rd and 4th data of a row, the covered data match the pooling core size and summation pooling is performed normally. When the pooling core reaches the 5th datum, the data it would cover no longer fill the pooling core, so the pooling core is moved in the reverse direction, i.e., back toward the start of the row, until it covers the 4th and 5th data, which fill the pooling core; summation pooling is then performed on them to obtain the operation result.
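The four ways of handling the remaining data can be compared on a single row of the example (input width 5, pooling core width 2, first stride 2). The sketch below is a one-dimensional illustration only: the strategy names `"drop"`, `"partial"`, `"pad"`, and `"shift"` are hypothetical labels for the four options described above, and 0 is assumed as the padding value.

```python
from typing import List


def sum_pool_row(row: List[int], kernel: int, stride: int,
                 strategy: str) -> List[int]:
    """1-D sum pooling of one row with a chosen remainder strategy.

    Hypothetical strategy labels:
      "drop"    - do not operate on the remaining data
      "partial" - sum the remaining data even though it is smaller than the core
      "pad"     - pad the remaining data with zeros to a full core, then sum
      "shift"   - move the last core position back so it ends at the row end
    """
    n = len(row)
    if n < kernel:
        raise ValueError("row is shorter than the pooling core")
    # Start positions where the pooling core lies entirely inside the row.
    starts = list(range(0, n - kernel + 1, stride))
    out = [sum(row[s:s + kernel]) for s in starts]
    last_end = starts[-1] + kernel    # index just past the last full core position
    next_start = starts[-1] + stride  # where the core would move next
    if last_end >= n or next_start >= n or strategy == "drop":
        # Nothing is left over, the next position is off the row, or the
        # remaining data are deliberately skipped.
        return out
    if strategy == "partial":
        out.append(sum(row[next_start:]))
    elif strategy == "pad":
        padding = [0] * (next_start + kernel - n)
        out.append(sum(row[next_start:] + padding))
    elif strategy == "shift":
        # Reverse-move the core so that it covers the last `kernel` elements.
        out.append(sum(row[n - kernel:]))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return out


if __name__ == "__main__":
    row = [1, 2, 3, 4, 5]  # one row of the width-5 example
    for name in ("drop", "partial", "pad", "shift"):
        print(name, sum_pool_row(row, kernel=2, stride=2, strategy=name))
    # drop [3, 7]; partial [3, 7, 5]; pad [3, 7, 5]; shift [3, 7, 9]
```

Because 0 is the additive identity, zero padding gives the same sums as operating directly on the partial window; the two options would differ only if a non-zero default value were used for padding.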

In one possible implementation, as shown in figs. 2a to 2f, the apparatus may further include a storage module 13. The storage module 13 is configured to store the data to be operated on and the pooling core.

In this implementation, the storage module may include at least one of a cache and a register. The cache may include a scratchpad cache and may also include at least one NRAM (Neuron Random Access Memory). The cache may be used to store the data to be operated on and the operation results, and the register may be used to store the data to be operated on, scalar data, parameters, and the like.

In one possible implementation, the cache may comprise a neuron cache. The neuron cache, that is, the above-mentioned neuron random access memory, may be used to store neuron data in the data to be operated on, and the neuron data may include neuron vector data.

In one possible implementation, the apparatus may further include a direct memory access module for reading data from, or storing data into, the storage module.

In one possible implementation, as shown in figs. 2a to 2f, the control module 11 may include an instruction storage sub-module 111, an instruction processing sub-module 112, and a queue storage sub-module 113.

The instruction storage sub-module 111 is configured to store summation pooling instructions.

The instruction processing sub-module 112 is configured to parse the summation pooling instruction to obtain an operation code and an operation domain of the summation pooling instruction.

The queue storage sub-module 113 is configured to store an instruction queue. The instruction queue includes a plurality of to-be-executed instructions arranged in sequence according to an execution order, and the plurality of to-be-executed instructions may include the summation pooling instruction.

In this implementation, the execution order of the plurality of to-be-executed instructions may be determined according to their receiving time, priority level, and the like to obtain the instruction queue, so that the to-be-executed instructions are executed in sequence according to the instruction queue.
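One way to build such an instruction queue is sketched below. It assumes that a smaller priority number means higher priority and that ties are broken by earlier receiving time; the `PendingInstruction` type and its fields are hypothetical, and the disclosure does not fix a particular ordering rule.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class PendingInstruction:
    """A to-be-executed instruction (hypothetical representation)."""
    priority: int          # smaller value = higher priority (assumption)
    receive_time: int      # ties broken by earlier receiving time
    text: str = field(compare=False)  # not used for ordering


def build_instruction_queue(pending: List[PendingInstruction]) -> List[str]:
    """Arrange to-be-executed instructions in execution order."""
    # dataclass(order=True) compares (priority, receive_time) tuples.
    return [item.text for item in sorted(pending)]


if __name__ == "__main__":
    pending = [
        PendingInstruction(1, 3, "instruction A"),
        PendingInstruction(0, 5, "instruction B"),
        PendingInstruction(1, 1, "instruction C"),
    ]
    print(build_instruction_queue(pending))
    # prints ['instruction B', 'instruction C', 'instruction A']
```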

In one possible implementation, as shown in figs. 2a to 2f, the control module 11 may further include a dependency relationship processing sub-module 114.

The dependency relationship processing sub-module 114 is configured to: when it is determined that a first to-be-executed instruction among the plurality of to-be-executed instructions has an association relationship with a zeroth to-be-executed instruction preceding it, cache the first to-be-executed instruction in the instruction storage sub-module 111; and after execution of the zeroth to-be-executed instruction is completed, fetch the first to-be-executed instruction from the instruction storage sub-module 111 and send it to the operation module 12.

The first to-be-executed instruction has an association relationship with the zeroth to-be-executed instruction preceding it when a first storage address interval, which stores the data required by the first to-be-executed instruction, overlaps a zeroth storage address interval, which stores the data required by the zeroth to-be-executed instruction. Conversely, when the first storage address interval and the zeroth storage address interval do not overlap, the first to-be-executed instruction has no association relationship with the zeroth to-be-executed instruction.
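The association test reduces to an interval-overlap check. The sketch below assumes that each storage address interval is half-open, [start, end), and uses the hypothetical name `has_association`; a `True` result corresponds to caching the first to-be-executed instruction until the zeroth one has finished.

```python
from typing import Tuple

Interval = Tuple[int, int]  # half-open storage address interval [start, end)


def has_association(first: Interval, zeroth: Interval) -> bool:
    """True if the two storage address intervals share at least one address.

    When True, the first to-be-executed instruction must wait for the
    zeroth to-be-executed instruction to finish before it is dispatched.
    """
    return first[0] < zeroth[1] and zeroth[0] < first[1]


if __name__ == "__main__":
    print(has_association((100, 200), (150, 300)))  # True: 150..199 shared
    print(has_association((100, 200), (200, 300)))  # False: intervals only touch
```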

In this way, based on the dependency relationship between the first to-be-executed instruction and the zeroth to-be-executed instruction preceding it, the first to-be-executed instruction is executed only after execution of the zeroth to-be-executed instruction has finished, which ensures the accuracy of the operation result.

In one possible implementation, the instruction format of the summation pooling instruction may be one of the following:

sumpool dst src0 srcChannel srcHeigh srcWidth

where sumpool is the opcode of the summation pooling instruction, and dst, src0, srcChannel, srcHeigh, and srcWidth make up the operation domain of the summation pooling instruction. dst is the target address, src0 is the data address to be operated on, srcChannel is the number of input channels, srcHeigh is the input height, and srcWidth is the input width. That is, the data to be operated on are obtained from src0, with srcChannel input channels, an input height of srcHeigh, and an input width of srcWidth. The pooling core size takes a default value. The operation result of the summation pooling is stored at address dst.

sumpool dst src0 srcChannel srcHeigh srcWidth kernelHeight kernelWidth

where sumpool is the opcode of the summation pooling instruction, and dst, src0, srcChannel, srcHeigh, srcWidth, kernelHeight, and kernelWidth make up the operation domain of the summation pooling instruction. dst is the target address, src0 is the data address to be operated on, srcChannel is the number of input channels, srcHeigh is the input height, srcWidth is the input width, kernelHeight is the pooling core height, and kernelWidth is the pooling core width. That is, the data to be operated on are obtained from src0, with srcChannel input channels, an input height of srcHeigh, and an input width of srcWidth; the pooling core has a height of kernelHeight and a width of kernelWidth. The stride by which the pooling core moves each time takes a default value; for example, the stride in the width direction is kernelWidth and the stride in the height direction is kernelHeight. The operation result of the summation pooling is stored at address dst.

sumpool dst src0 srcChannel srcHeigh srcWidth kernelHeight kernelWidth strideX strideY

where sumpool is the opcode of the summation pooling instruction, and dst, src0, srcChannel, srcHeigh, srcWidth, kernelHeight, kernelWidth, strideX, and strideY make up the operation domain of the summation pooling instruction. dst is the target address, src0 is the data address to be operated on, srcChannel is the number of input channels, srcHeigh is the input height, srcWidth is the input width, kernelHeight is the pooling core height, kernelWidth is the pooling core width, strideX is the first stride by which the pooling core moves in the width direction, and strideY is the second stride by which the pooling core moves in the height direction. That is, the data to be operated on are obtained from src0, with srcChannel input channels, an input height of srcHeigh, and an input width of srcWidth; the pooling core has a height of kernelHeight and a width of kernelWidth, and it moves by strideX each time in the width direction and by strideY each time in the height direction. The operation result of the summation pooling is stored at address dst.
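A software decoder for the three formats can be sketched as follows. It assumes that operands are whitespace-separated decimal integers, and it takes the default pooling core size as a parameter because the disclosure does not state the default value; `parse_sumpool`, `SumPoolFields`, and the `(2, 2)` default are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SumPoolFields:
    dst: int            # target address
    src0: int           # data address to be operated on
    src_channel: int    # srcChannel
    src_height: int     # srcHeigh
    src_width: int      # srcWidth
    kernel_height: int
    kernel_width: int
    stride_x: int       # first stride (width direction)
    stride_y: int       # second stride (height direction)


def parse_sumpool(text: str,
                  default_kernel: Tuple[int, int] = (2, 2)) -> SumPoolFields:
    """Decode a sumpool instruction written in one of the three formats.

    default_kernel is (kernelHeight, kernelWidth) used when the kernel is
    omitted; its value here is an assumption, not taken from the disclosure.
    """
    opcode, *operands = text.split()
    if opcode != "sumpool":
        raise ValueError(f"not a sumpool instruction: {opcode}")
    values = [int(v) for v in operands]
    if len(values) not in (5, 7, 9):
        raise ValueError(f"expected 5, 7 or 9 operands, got {len(values)}")
    dst, src0, channels, height, width = values[:5]
    # Format 1 omits the kernel, so the default pooling core size is used.
    kh, kw = tuple(values[5:7]) if len(values) >= 7 else default_kernel
    # Formats 1 and 2 omit strides: width stride = kw, height stride = kh.
    sx, sy = tuple(values[7:9]) if len(values) == 9 else (kw, kh)
    return SumPoolFields(dst, src0, channels, height, width, kh, kw, sx, sy)


if __name__ == "__main__":
    print(parse_sumpool("sumpool 500 100 5 64 32 2 1 2 1"))
    print(parse_sumpool("sumpool 500 100 5 64 32 3 1"))
```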

It should be appreciated that those skilled in the art may set the opcode of the summation pooling instruction, and the positions of the opcode and the operation domain in the instruction format, as needed; the disclosure is not limited in this regard.

In one possible implementation, the apparatus may be disposed in one or more of a graphics processing unit (GPU), a central processing unit (CPU), and an embedded neural-network processing unit (NPU).

It should be noted that, although the summation pooling instruction processing apparatus has been described above by way of example, those skilled in the art will understand that the disclosure is not limited thereto. In fact, users can flexibly configure each module according to personal preference and/or the actual application scenario, as long as the configuration conforms to the technical solution of the disclosure.

Application example

An application example according to an embodiment of the present disclosure is given below, taking "performing a summation pooling operation with the summation pooling instruction processing apparatus" as an exemplary application scenario, to facilitate understanding of the flow of the apparatus. Those skilled in the art will appreciate that the following application example is intended only to facilitate understanding of the embodiments of the present disclosure and should not be construed as limiting them.

Fig. 3 shows a schematic diagram of an application scenario of a summation pooling instruction processing apparatus according to an embodiment of the disclosure. As shown in fig. 3, the summation pooling instruction processing apparatus processes a summation pooling instruction as follows:

The control module 11 parses the obtained summation pooling instruction 1 (for example, sumpool 500 100 5 64 32 2 1 2 1) to obtain the operation code and the operation domain of summation pooling instruction 1. The operation code of summation pooling instruction 1 is sumpool, the target address is 500, the data address to be operated on is 100, the number of input channels is 5, the input height is 64, the input width is 32, the pooling core height is 2, the pooling core width is 1, the first stride is 2, and the second stride is 1. The control module 11 acquires 64×32×5 data to be operated on from data address 100.

The operation module 12 uses the pooling core to perform the summation pooling operation on the 64×32 data to be operated on in each of the 5 input channels, obtains the operation result, and stores it at target address 500.
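The shape of the result in this application example can be worked out from the operation domain. The sketch below assumes that pooling-core positions extending past the input edge are not generated (the first remainder option described earlier) and that each channel is pooled independently; `pooled_length` is a hypothetical helper. Under these assumptions each of the 5 channels yields a 63 × 16 result.

```python
def pooled_length(size: int, kernel: int, stride: int) -> int:
    """Number of full pooling-core positions along one dimension."""
    return (size - kernel) // stride + 1


if __name__ == "__main__":
    channels, in_h, in_w = 5, 64, 32
    kernel_h, kernel_w = 2, 1
    stride_x, stride_y = 2, 1  # first stride (width), second stride (height)

    out_h = pooled_length(in_h, kernel_h, stride_y)  # (64 - 2) // 1 + 1 = 63
    out_w = pooled_length(in_w, kernel_w, stride_x)  # (32 - 1) // 2 + 1 = 16
    print(channels, out_h, out_w, channels * out_h * out_w)  # 5 63 16 5040
```

Since (32 − 1) is not a multiple of the first stride 2, the last column (width index 31) is not covered by any pooling-core position in this variant; one of the other remainder options would be needed to include it.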

For the operation of each of the above modules, refer to the relevant description above.

In this way, the summation pooling instruction can be processed efficiently and quickly, which improves the efficiency and speed of the summation pooling operation.

The present disclosure provides a machine learning operation device, which may include one or more of the above summation pooling instruction processing devices and is configured to acquire data to be operated on and control information from other processing devices and to perform a specified machine learning operation. The machine learning operation device may obtain summation pooling instructions from other machine learning operation devices or non-machine learning operation devices, and transmit the execution results to peripheral devices (also referred to as other processing devices) through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, Wi-Fi interfaces, and servers. When more than one summation pooling instruction processing device is included, the devices may be linked and transmit data through a specific structure, for example, interconnected through a PCIE bus, to support operation of larger-scale neural networks. In this case, the devices may share one control system or have independent control systems, and they may share memory or each accelerator may have its own memory. In addition, the devices may be interconnected in any topology.

The machine learning operation device has good compatibility and can be connected to various types of servers through a PCIE interface.

Fig. 4a shows a block diagram of a combined processing apparatus according to an embodiment of the disclosure. As shown in fig. 4a, the combined processing device includes the machine learning computing device, the universal interconnect interface, and other processing devices. The machine learning operation device interacts with other processing devices to jointly complete the operation designated by the user.

The other processing devices may include one or more types of general-purpose/special-purpose processors, such as central processing units (CPUs), graphics processing units (GPUs), and neural network processors. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning operation device and external data and control, performing data transfer and basic control such as starting and stopping the machine learning operation device; they may also cooperate with the machine learning operation device to complete computing tasks.

The universal interconnect interface is configured to transfer data and control instructions between the machine learning operation device and the other processing devices. The machine learning operation device acquires the required input data from the other processing devices and writes it into the on-chip storage device of the machine learning operation device; it may obtain control instructions from the other processing devices and write them into the on-chip control cache of the machine learning operation device; and the data in the storage module of the machine learning operation device may be read and transmitted to the other processing devices.

Fig. 4b shows a block diagram of a combined processing apparatus according to an embodiment of the disclosure. In one possible implementation, as shown in fig. 4b, the combined processing device may further include a storage device connected to the machine learning operation device and the other processing devices, respectively. The storage device is configured to store data of the machine learning operation device and the other processing devices, and is particularly suitable for data to be computed that cannot be held entirely in the internal storage of the machine learning operation device or the other processing devices.

The combined processing device can serve as the SOC (system on chip) of devices such as mobile phones, robots, drones, and video surveillance equipment, which effectively reduces the core area of the control portion, increases processing speed, and reduces overall power consumption. In this case, the universal interconnect interface of the combined processing device is connected to certain components of the device, such as cameras, displays, mice, keyboards, network cards, and Wi-Fi interfaces.

The present disclosure provides a machine learning chip including the machine learning arithmetic device or the combination processing device described above.

The present disclosure provides a machine learning chip packaging structure including the machine learning chip described above.

The present disclosure provides a board card, and fig. 5 shows a schematic structural diagram of the board card according to an embodiment of the present disclosure. As shown in fig. 5, the board card includes the above machine learning chip package structure or the above machine learning chip. In addition to the machine learning chip 389, the board card may include other supporting components, including but not limited to a memory device 390, an interface device 391, and a control device 392.

The memory device 390 is connected to the machine learning chip 389 (or the machine learning chip within the machine learning chip package structure) through a bus and is configured to store data. The memory device 390 may include multiple sets of memory units 393, each of which is connected to the machine learning chip 389 through a bus. It can be understood that each set of memory units 393 may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).

DDR doubles the speed of SDRAM without increasing the clock frequency, because it allows data to be read out on both the rising and falling edges of the clock pulse. DDR is therefore twice as fast as standard SDRAM.

In one embodiment, the memory device 390 may include 4 sets of memory units 393. Each set of memory units 393 may include a plurality of DDR4 chips. In one embodiment, the machine learning chip 389 may internally include four 72-bit DDR4 controllers; of the 72 bits of each controller, 64 bits are used for data transfer and 8 bits are used for ECC (error-correcting code) checking.
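The 64/8 bit split and the double-data-rate behaviour translate into simple bandwidth arithmetic. The transfer rate of 3200 MT/s used below (the DDR4-3200 speed grade) is an assumption for illustration only, since the disclosure does not specify a speed grade; `peak_bandwidth_mb_s` is a hypothetical helper, and MB means 10^6 bytes.

```python
def peak_bandwidth_mb_s(transfer_rate_mt_s: int, data_bits: int) -> int:
    """Peak data bandwidth of one controller in MB/s (10**6 bytes/s).

    DDR performs two transfers per clock cycle, so a transfer rate of
    3200 MT/s corresponds to a 1600 MHz clock.
    """
    return transfer_rate_mt_s * data_bits // 8


if __name__ == "__main__":
    bus_bits, data_bits = 72, 64   # the remaining 8 bits carry ECC
    rate = 3200                    # assumed DDR4-3200 speed grade
    per_controller = peak_bandwidth_mb_s(rate, data_bits)
    print(per_controller)          # 25600 MB/s per controller
    print(4 * per_controller)      # 102400 MB/s for four controllers
    print(f"ECC bits per controller: {bus_bits - data_bits}/{bus_bits}")
```

Only the 64 data bits count toward usable bandwidth; the 8 ECC bits travel alongside each transfer for error checking.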

In one embodiment, each set of memory units 393 includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A DDR controller is provided in the machine learning chip 389 to control the data transfer and data storage of each set of memory units 393.

The interface device 391 is electrically connected to the machine learning chip 389 (or the machine learning chip within the machine learning chip package structure). The interface device 391 is configured to implement data transfer between the machine learning chip 389 and an external device (e.g., a server or computer). For example, in one embodiment, the interface device 391 may be a standard PCIE interface, and the data to be processed are transferred from the server to the machine learning chip 389 through the standard PCIE interface.

In another embodiment, the interface device 391 may be another interface; the disclosure does not limit the specific form of that interface, as long as the interface device can implement the transfer function. In addition, the calculation result of the machine learning chip is transmitted back to the external device (e.g., a server) by the interface device.

The control device 392 is electrically connected to the machine learning chip 389 and is configured to monitor the status of the machine learning chip 389. Specifically, the machine learning chip 389 and the control device 392 may be electrically connected through an SPI interface. The control device 392 may include a microcontroller unit (MCU). For example, the machine learning chip 389 may include a plurality of processing chips, processing cores, or processing circuits and may drive a plurality of loads, so the machine learning chip 389 may be in different operating states such as heavy load and light load. The control device can regulate the operating states of the plurality of processing chips, processing cores, and/or processing circuits in the machine learning chip.

The present disclosure provides an electronic device including the machine learning chip or the board card described above.

The electronic device may include a data processing apparatus, a computer device, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a mobile phone, a driving recorder, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.

The vehicle may include an aircraft, a ship, and/or an automobile. The household appliance may include a television, an air conditioner, a microwave oven, a refrigerator, an electric rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood. The medical device may include a nuclear magnetic resonance apparatus, a B-mode ultrasound scanner, and/or an electrocardiograph.

Fig. 6 illustrates a flow chart of a summation pooling instruction processing method according to an embodiment of the present disclosure. The method may be applied to, for example, a computer device including a memory and a processor, where the memory stores data used during execution of the method and the processor performs the related processing and operation steps, such as steps S51 and S52 described below. As shown in fig. 6, the method is applied to the above summation pooling instruction processing apparatus and includes steps S51 and S52.

In step S51, the obtained summation pooling instruction is parsed to obtain its operation code and operation domain, and the data to be operated on, the pooling core, and the target address required for executing the summation pooling instruction are obtained according to the operation domain. The operation code indicates that the operation the summation pooling instruction performs on the data is a summation pooling operation, and the operation domain includes the data address to be operated on, the target address, and the pooling core.

In step S52, the summation pooling operation is performed on the data to be operated on according to the pooling core to obtain an operation result, and the operation result is stored at the target address.

In one possible implementation, performing the summation pooling operation on the data to be operated on according to the pooling core to obtain the operation result may include:

summing, by a plurality of adders in the operation module, the plurality of data to be operated on in the area corresponding to the pooling core to obtain the operation result.
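Steps S51 and S52 can be simulated end to end in software. The sketch below is a hypothetical model, not the disclosed hardware: memory is a Python dict from address to value, only the 9-operand instruction format is handled, the data are assumed to be laid out channel by channel and row by row starting at src0, results are written to consecutive addresses from dst, and each addition inside a pooling-core area stands in for the work of one adder.

```python
from typing import Dict

Memory = Dict[int, int]  # toy model: address -> stored value


def execute_sumpool(instruction: str, memory: Memory) -> None:
    """Software model of steps S51 and S52 for the 9-operand format."""
    # S51: split the instruction into its opcode and operation domain.
    opcode, *operands = instruction.split()
    if opcode != "sumpool":
        raise ValueError(f"unexpected opcode: {opcode}")
    dst, src0, c, h, w, kh, kw, sx, sy = (int(v) for v in operands)
    # S51: fetch c*h*w values starting at the data address to be operated on
    # (assumed layout: channel by channel, then row by row).
    data = [memory[src0 + i] for i in range(c * h * w)]
    # S52: sum every pooling-core area and store results from dst onward.
    out_addr = dst
    for ch in range(c):
        base = ch * h * w
        for top in range(0, h - kh + 1, sy):
            for left in range(0, w - kw + 1, sx):
                acc = 0
                for i in range(kh):
                    for j in range(kw):
                        # Each addition stands in for one adder operation.
                        acc += data[base + (top + i) * w + left + j]
                memory[out_addr] = acc
                out_addr += 1


if __name__ == "__main__":
    # One channel, 2 rows x 4 columns, stored at address 100.
    mem: Memory = {100 + i: v for i, v in enumerate([1, 2, 3, 4, 5, 6, 7, 8])}
    execute_sumpool("sumpool 500 100 1 2 4 2 2 2 2", mem)
    print(mem[500], mem[501])  # prints 14 22
```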

In one possible implementation, the operation module includes a master operation sub-module and a plurality of slave operation sub-modules, and the master operation sub-module includes an adder.

Performing the summation pooling operation on the data to be operated on according to the pooling core to obtain the operation result, and storing the operation result at the target address, may include:

summing, by the adder, the data to be operated on in the area corresponding to the pooling core to obtain the operation result, and storing the operation result at the target address.

In one possible implementation, the operation domain may further include an input height and an input width. Obtaining, according to the operation domain, the data to be operated on, the pooling core, and the target address required for executing the summation pooling instruction may include:

acquiring, from the data address to be operated on, the data to be operated on corresponding to the input width and the input height.

In one possible implementation, the operation domain may further include the number of input channels. Obtaining, according to the operation domain, the data to be operated on, the pooling core, and the target address required for executing the summation pooling instruction may include:

acquiring, from the data address to be operated on, the data to be operated on corresponding to the number of input channels.

In one possible implementation, the operation domain may further include a first stride. Performing the summation pooling operation on the data to be operated on according to the pooling core may include: moving the pooling core in the width direction by the first stride.

In one possible implementation, the operation domain may further include a second stride. Performing the summation pooling operation on the data to be operated on according to the pooling core may include: moving the pooling core in the height direction by the second stride.

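In one possible implementation, performing the summation pooling operation on the data to be operated on according to the pooling core to obtain the operation result may include: moving the pooling core over the data to be operated on, and summing the plurality of data to be operated on in the area corresponding to the pooling core to obtain the operation result.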

In one possible implementation, performing the summation pooling operation on the data to be operated on according to the pooling core to obtain the operation result may include: when the size of the data to be operated on is a non-integer multiple of the size of the pooling core, performing the summation pooling operation on the portion of the data to be operated on that is an integer multiple of the size of the pooling core.

The size of the data to be operated on being a non-integer multiple of the size of the pooling core may include at least one of the following: when the first stride is not included or is equal to the pooling core width, the input width of the data to be operated on is a non-integer multiple of the pooling core width; when the first stride is included, the difference between the input width of the data to be operated on and the pooling core width is a non-integer multiple of the first stride; when the second stride is not included or is equal to the pooling core height, the input height of the data to be operated on is a non-integer multiple of the pooling core height; when the second stride is included, the difference between the input height of the data to be operated on and the pooling core height is a non-integer multiple of the second stride.

In one possible implementation manner, performing the summation pooling operation on the data to be operated according to the pooling core to obtain an operation result may further include: when the size of the residual data in the data to be operated is smaller than the size of the pooling core, not performing the summation pooling operation on the residual data.

In one possible implementation manner, performing the summation pooling operation on the data to be operated according to the pooling core to obtain an operation result may further include: when the size of the residual data in the data to be operated is smaller than the size of the pooling core, performing the summation pooling operation on the residual data to obtain an operation result.

In one possible implementation manner, performing the summation pooling operation on the data to be operated according to the pooling core to obtain an operation result may further include: when the size of the residual data in the data to be operated is smaller than the size of the pooling core, padding the residual data and then performing the summation pooling operation to obtain an operation result.

In one possible implementation manner, performing the summation pooling operation on the data to be operated according to the pooling core to obtain an operation result may further include: when the size of the residual data in the data to be operated is smaller than the size of the pooling core, moving the pooling core in the reverse direction so that the region it covers after the move has the same size as the pooling core and includes the residual data, and performing the summation pooling operation on the data in that region to obtain an operation result.
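The four ways of handling residual data described above (leave it unpooled, sum the partial window, pad and then sum, or move the core back) can be compared in a small pure-Python sketch. It assumes a row-major 2-D list as input, zero as the default padding value, strides that default to the core size and are no larger than it, and an input at least as large as the core; `window_spans`, `sum_pool2d` and the mode names are illustrative only, not taken from the disclosure.

```python
from typing import List, Optional, Tuple

MODES = ("drop", "partial", "pad", "shift_back")


def window_spans(extent: int, kernel: int, stride: int, mode: str) -> List[Tuple[int, int]]:
    """Half-open [start, end) spans of the pooling core along one axis."""
    if extent < kernel:
        raise ValueError("input extent must be at least the pooling-core size")
    starts = list(range(0, extent - kernel + 1, stride))
    spans = [(s, s + kernel) for s in starts]
    if starts[-1] + kernel < extent:  # residual data not covered by any full window
        if mode in ("partial", "pad"):
            nxt = starts[-1] + stride
            spans.append((nxt, nxt + kernel))  # may run past `extent`
        elif mode == "shift_back":
            # Move the core back so it ends at the edge: full size, includes residual.
            spans.append((extent - kernel, extent))
        # mode == "drop": residual data is left unpooled
    return spans


def sum_pool2d(data: List[List[float]], k_h: int, k_w: int,
               stride_h: Optional[int] = None, stride_w: Optional[int] = None,
               mode: str = "drop", pad_value: float = 0) -> List[List[float]]:
    if mode not in MODES:
        raise ValueError(f"unknown residual mode: {mode}")
    in_h, in_w = len(data), len(data[0])
    rows = window_spans(in_h, k_h, k_h if stride_h is None else stride_h, mode)
    cols = window_spans(in_w, k_w, k_w if stride_w is None else stride_w, mode)
    out = []
    for r0, r1 in rows:
        out_row = []
        for c0, c1 in cols:
            total = 0
            for r in range(r0, r1):
                for c in range(c0, c1):
                    if r < in_h and c < in_w:
                        total += data[r][c]
                    elif mode == "pad":
                        total += pad_value  # padded position
                    # mode == "partial": positions outside the input are skipped
            out_row.append(total)
        out.append(out_row)
    return out


data = [[1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10]]
for m in MODES:
    print(m, sum_pool2d(data, 2, 2, mode=m))
# drop [[16, 24]]
# partial [[16, 24, 15]]
# pad [[16, 24, 15]]
# shift_back [[16, 24, 28]]
```

With zero padding, the "pad" and "partial" modes give identical sums, since padded zeros do not change a sum; they differ only for a non-zero padding value. The "shift_back" mode keeps every window full size at the cost of re-reading data already covered by the previous window (column 3 in the example).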

In one possible implementation, the method may further include: storing the data to be operated and the operation result by using a storage module of the device.

In one possible implementation manner, parsing the obtained summation pooling instruction to obtain the operation code and the operation domain of the summation pooling instruction may include:

storing the summation pooling instruction;

parsing the summation pooling instruction to obtain its operation code and operation domain;

storing an instruction queue, the instruction queue comprising a plurality of instructions to be executed that are arranged in execution order, where the instructions to be executed may include the summation pooling instruction.
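The three steps above (store the instruction, parse it into an operation code and an operation domain, keep instructions in execution order) can be sketched as a small software model. The textual instruction format, the operand names in `FIELDS` and their order, and the `ControlModule` class are all hypothetical assumptions for illustration; the disclosure does not define this encoding.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List

# Hypothetical operand order; the disclosure does not fix a textual encoding here.
FIELDS = ("dst", "src", "in_h", "in_w", "k_h", "k_w", "stride_w", "stride_h")


@dataclass
class Instruction:
    opcode: str
    domain: Dict[str, int]  # operation domain: operand name -> value


def parse(text: str) -> Instruction:
    """Split an instruction into its operation code and operation domain."""
    opcode, _, rest = text.strip().partition(" ")
    values = [v.strip() for v in rest.split(",") if v.strip()]
    if len(values) > len(FIELDS):
        raise ValueError("too many operands")
    # int(..., 0) accepts decimal and 0x-prefixed hexadecimal addresses.
    domain = {name: int(v, 0) for name, v in zip(FIELDS, values)}
    return Instruction(opcode.upper(), domain)


@dataclass
class ControlModule:
    stored: List[str] = field(default_factory=list)           # instruction storage
    queue: Deque[Instruction] = field(default_factory=deque)  # execution order

    def submit(self, text: str) -> None:
        self.stored.append(text)        # store the raw instruction
        self.queue.append(parse(text))  # parse, then enqueue in program order

    def next_instruction(self) -> Instruction:
        return self.queue.popleft()


ctrl = ControlModule()
ctrl.submit("sumpool 0x200, 0x100, 4, 6, 2, 2")
ctrl.submit("sumpool 0x300, 0x100, 4, 6, 2, 2, 1, 1")
first = ctrl.next_instruction()
print(first.opcode, first.domain["dst"], first.domain["k_w"])  # SUMPOOL 512 2
```

Optional operands such as the strides are simply absent from the parsed domain when the instruction omits them, which matches the "when the first stride is not included" cases above.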

In one possible implementation, the method may further include: when it is determined that a first to-be-executed instruction among the plurality of to-be-executed instructions has an association relationship with a zeroth to-be-executed instruction preceding it, caching the first to-be-executed instruction and executing it after execution of the zeroth to-be-executed instruction is completed.

The association relationship between the first to-be-executed instruction and the preceding zeroth to-be-executed instruction includes at least one of the following:

a first storage address interval storing the data required by the first to-be-executed instruction overlaps a zeroth storage address interval storing the data required by the zeroth to-be-executed instruction;

a first operation unit required to execute the first to-be-executed instruction is identical or partially identical to a zeroth operation unit required to execute the zeroth to-be-executed instruction.
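The two association conditions above can be expressed as a simple predicate. This is a minimal sketch, assuming half-open address intervals and operation units identified by name strings; `PendingInstruction` and `depends_on` are hypothetical names, and a real dependency processing sub-module would operate on hardware descriptors rather than Python objects.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class PendingInstruction:
    name: str
    addr_start: int        # first address of the data the instruction needs
    addr_end: int          # one past the last address (half-open interval)
    units: FrozenSet[str]  # operation units the instruction occupies


def intervals_overlap(a0: int, a1: int, b0: int, b1: int) -> bool:
    """Half-open intervals [a0, a1) and [b0, b1) share at least one address."""
    return a0 < b1 and b0 < a1


def depends_on(first: PendingInstruction, zeroth: PendingInstruction) -> bool:
    """True if `first` must wait until `zeroth` has finished executing."""
    overlap = intervals_overlap(first.addr_start, first.addr_end,
                                zeroth.addr_start, zeroth.addr_end)
    shares_unit = bool(first.units & zeroth.units)  # identical or partially identical
    return overlap or shares_unit


i0 = PendingInstruction("i0", 0x100, 0x200, frozenset({"adder0"}))
i1 = PendingInstruction("i1", 0x180, 0x280, frozenset({"adder1"}))
i2 = PendingInstruction("i2", 0x200, 0x300, frozenset({"adder2"}))
print(depends_on(i1, i0))  # True: address intervals overlap
print(depends_on(i2, i0))  # False: intervals only touch, no shared unit
```

An instruction for which `depends_on` returns True for any earlier unfinished instruction would be cached and issued only after that instruction completes.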

It should be noted that, although the summation pooling instruction processing method has been described above by way of example, those skilled in the art will appreciate that the present disclosure is not limited thereto. In practice, each step can be configured flexibly according to personal preference and/or the actual application scenario, as long as the technical solution of the present disclosure is satisfied.

The summation pooling instruction processing method provided by the embodiments of the present disclosure has a wide range of application, and both processes summation pooling instructions and performs summation pooling operations with high efficiency and high speed.

The present disclosure also provides a non-transitory computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the summation pooling instruction processing method described above.

It should be noted that, for simplicity of description, the foregoing method embodiments are all depicted as a series of acts, but it should be understood by those skilled in the art that the present disclosure is not limited by the order of acts described, as some steps may occur in other orders or concurrently in accordance with the disclosure. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments, and that the acts and modules referred to are not necessarily required by the present disclosure.

It should be further noted that, although the steps in the flowchart of FIG. 6 are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in FIG. 6 may include multiple sub-steps or stages, which need not be completed at the same moment but may be performed at different times; nor need these sub-steps or stages be performed sequentially: they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

It should be understood that the above-described device embodiments are merely illustrative and that the device of the present disclosure may be implemented in other ways. For example, the division of the units/modules in the above embodiments is merely a logic function division, and there may be another division manner in actual implementation. For example, multiple units, modules, or components may be combined, or may be integrated into another system, or some features may be omitted or not performed.

In addition, unless specifically stated, each functional unit/module in the embodiments of the present disclosure may be integrated into one unit/module, or each unit/module may exist alone physically, or two or more units/modules may be integrated together. The integrated units/modules described above may be implemented either in hardware or in software program modules.

If implemented in hardware, the integrated units/modules may be digital circuits, analog circuits, and the like. Physical implementations of the hardware structures include, but are not limited to, transistors, memristors, and the like. Unless otherwise indicated, the storage module may be any suitable storage medium, for example a magnetic or magneto-optical storage medium, or a memory such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), or hybrid memory cube (HMC).

If implemented as software program modules and sold or used as a stand-alone product, the integrated units/modules may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments. The technical features of the foregoing embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described, but all such combinations should be considered within the scope of the present disclosure.

The foregoing may be better understood in light of the following clauses:

Clause A1, a summation pooling instruction processing apparatus, the apparatus comprising:

Clause A2, the apparatus of clause A1, the operation module comprising:

an adder, configured to sum the data to be operated in the region corresponding to the pooling core to obtain an operation result.

Clause A3, the apparatus of clause A2, the operation module comprising a master operation sub-module and a plurality of slave operation sub-modules, the master operation sub-module comprising one or more adders,

the master operation sub-module being configured to use the one or more adders to sum the data to be operated in the region corresponding to the pooling core to obtain an operation result, and to store the operation result in the target address.

Clause A4, the apparatus of clause A2, the operation module comprising a master operation sub-module and a plurality of slave operation sub-modules, each slave operation sub-module comprising one or more adders,

the master operation sub-module being configured to receive the data to be operated, the pooling core and the target address that are acquired by the control module and required for executing the summation pooling instruction, and to distribute to each slave operation sub-module the data to be operated, the pooling core and the target address that it requires for executing the summation pooling instruction;

each slave operation sub-module being configured to receive the data to be operated, the pooling core and the target address distributed to it by the master operation sub-module, to use its one or more adders to sum the data to be operated in the region corresponding to the pooling core to obtain an operation result, and to store the operation result in the target address.

Clause A5, the apparatus of clause A1, the operation domain further comprising an input height and an input width,

the control module being further configured to acquire, from the address of the data to be operated, the data to be operated corresponding to the input width and the input height.

Clause A6, the apparatus of clause A1, the operation domain further comprising a number of input channels,

the control module being further configured to acquire, from the address of the data to be operated, the data to be operated corresponding to the number of input channels.

Clause A7, the apparatus of clause A1, the operation domain further comprising a pooling core height and a pooling core width,

the operation module being further configured to perform the summation pooling operation on the data to be operated according to the pooling core height and the pooling core width.

Clause A8, the apparatus of clause A1, the operation domain further comprising a first stride,

the operation module being further configured to move the pooling core in the width direction according to the first stride.

Clause A9, the apparatus of clause A1, the operation domain further comprising a second stride,

the operation module being further configured to move the pooling core in the height direction according to the second stride.

Clause A10, the apparatus of clause A1, wherein the operation module is further configured to move the pooling core over the data to be operated and to sum the plurality of data to be operated in the region corresponding to the pooling core to obtain the operation result.

Clause A11, the apparatus of clause A1, wherein the operation module is further configured to, when the size of the data to be operated is a non-integer multiple of the size of the pooling core, perform the summation pooling operation on the portion of the data to be operated that is an integer multiple of the size of the pooling core.

Clause A12, the apparatus of clause A11, wherein the operation module is further configured to, when the size of the residual data in the data to be operated is smaller than the size of the pooling core, perform the summation pooling operation directly on the residual data, or pad the residual data and then perform the summation pooling operation, to obtain an operation result.

Clause A13, the apparatus of clause A11, wherein the operation module is further configured to, when the size of the residual data in the data to be operated is smaller than the size of the pooling core, move the pooling core in the reverse direction so that the region it covers after the move has the same size as the pooling core and includes the residual data, and perform the summation pooling operation on the data in that region to obtain an operation result.

Clause A14, the apparatus of clause A1, further comprising:

a storage module, configured to store the data to be operated and the operation result.

Clause A15, the apparatus of clause A1, the control module comprising:

an instruction storage sub-module, configured to store the summation pooling instruction;

an instruction processing sub-module, configured to parse the summation pooling instruction to obtain its operation code and operation domain;

a queue storage sub-module, configured to store an instruction queue, the instruction queue comprising a plurality of instructions to be executed that are arranged in execution order, the instructions to be executed comprising the summation pooling instruction.

Clause A16, the apparatus of clause A15, the control module further comprising:

a dependency relationship processing sub-module, configured to, when determining that a first to-be-executed instruction among the plurality of to-be-executed instructions has an association relationship with a zeroth to-be-executed instruction preceding it, cache the first to-be-executed instruction in the instruction storage sub-module, and, after execution of the zeroth to-be-executed instruction is completed, extract the first to-be-executed instruction from the instruction storage sub-module and send it to the operation module,

wherein the association relationship between the first to-be-executed instruction and the preceding zeroth to-be-executed instruction includes at least one of the following: a first storage address interval storing the data required by the first to-be-executed instruction overlaps a zeroth storage address interval storing the data required by the zeroth to-be-executed instruction; a first operation unit required to execute the first to-be-executed instruction is identical or partially identical to a zeroth operation unit required to execute the zeroth to-be-executed instruction.

Clause A17, a machine learning computing device, the device comprising:

one or more summation pooling instruction processing apparatuses according to any one of clauses A1 to A16, configured to acquire data to be operated and control information from other processing apparatuses, perform a specified machine learning operation, and transmit the execution result to the other processing apparatuses through an I/O interface;

Clause A18, a combination processing device, the combination processing device comprising:

the machine learning computing device of clause A17, a universal interconnection interface, and other processing devices;

the machine learning computing device interacting with the other processing devices to jointly complete a computing operation specified by the user,

wherein the combination processing device further comprises a storage device, connected to the machine learning computing device and to the other processing devices respectively, for storing data of the machine learning computing device and the other processing devices.

Clause A19, a machine learning chip, the machine learning chip comprising:

the machine learning computing device of clause A17 or the combination processing device of clause A18.

Clause A20, an electronic device, comprising:

the machine learning chip of clause A19.

Clause A21, a board card, the board card comprising: a storage device, an interface device, a control device, and the machine learning chip of clause A19;

wherein the machine learning chip is respectively connected with the storage device, the control device and the interface device;

the storage device is used for storing data;

the interface device is used for realizing data transmission between the machine learning chip and external equipment;

the control device is used for monitoring the state of the machine learning chip.

Clause A22, a summation pooling instruction processing method, applied to a summation pooling instruction processing apparatus, the method comprising:

Clause A23, the method of clause A22, wherein performing the summation pooling operation on the data to be operated according to the pooling core to obtain an operation result comprises:

summing the data to be operated in the region corresponding to the pooling core to obtain the operation result.

Clause A24, the method of clause A23, the operation module comprising a master operation sub-module and a plurality of slave operation sub-modules, the master operation sub-module comprising the adder,

wherein performing the summation pooling operation on the data to be operated according to the pooling core to obtain an operation result, and storing the operation result in the target address, comprises:

summing the data to be operated in the region corresponding to the pooling core to obtain an operation result, and storing the operation result in the target address.

Clause A25, the method of clause A23, the operation module comprising a master operation sub-module and a plurality of slave operation sub-modules, the slave operation sub-modules comprising the adder, wherein summing the plurality of data to be operated in the region corresponding to the pooling core to obtain an operation result, and storing the operation result in the target address, comprises:

receiving the data to be operated, the pooling core and the target address required by the summation pooling instruction, and distributing to each slave operation sub-module the data to be operated, the pooling core and the target address that it requires for the summation pooling instruction;

receiving the data to be operated, the pooling core and the target address distributed by the master operation sub-module for the summation pooling instruction, summing the data to be operated in the region corresponding to the pooling core to obtain an operation result, and storing the operation result in the target address.

Clause A26, the method of clause A22, the operation domain further comprising an input height and an input width,

wherein acquiring, according to the operation domain, the data to be operated, the pooling core and the target address required for executing the summation pooling instruction comprises: acquiring the data to be operated corresponding to the input width and the input height from the address of the data to be operated.

Clause A27, the method of clause A22, the operation domain further comprising a number of input channels, wherein acquiring, according to the operation domain, the data to be operated, the pooling core and the target address required for executing the summation pooling instruction comprises:

acquiring the data to be operated corresponding to the number of input channels from the address of the data to be operated.

Clause A28, the method of clause A22, the operation domain further comprising a pooling core height and a pooling core width, wherein performing the summation pooling operation on the data to be operated according to the pooling core comprises:

performing the summation pooling operation on the data to be operated according to the pooling core height and the pooling core width.

Clause A29, the method of clause A22, the operation domain further comprising a first stride,

wherein performing the summation pooling operation on the data to be operated according to the pooling core comprises:

moving the pooling core in the width direction according to the first stride.

Clause A30, the method of clause A22, the operation domain further comprising a second stride, wherein performing the summation pooling operation on the data to be operated according to the pooling core comprises:

moving the pooling core in the height direction according to the second stride.

Clause A31, the method of clause A22, wherein performing the summation pooling operation on the data to be operated according to the pooling core to obtain an operation result comprises:

moving the pooling core over the data to be operated without overlap, and summing the plurality of data to be operated in the region corresponding to the pooling core to obtain the operation result.

Clause A32, the method of clause A22, wherein performing the summation pooling operation on the data to be operated according to the pooling core to obtain an operation result comprises:

when the size of the data to be operated is a non-integer multiple of the size of the pooling core, performing the summation pooling operation on the portion of the data to be operated that is an integer multiple of the size of the pooling core.

Clause A33, the method of clause A32, wherein performing the summation pooling operation on the data to be operated according to the pooling core to obtain an operation result further comprises: when the size of the residual data in the data to be operated is smaller than the size of the pooling core, performing the summation pooling operation directly on the residual data, or padding the residual data and then performing the summation pooling operation, to obtain an operation result.

Clause A34, the method of clause A32, wherein performing the summation pooling operation on the data to be operated according to the pooling core to obtain an operation result further comprises:

when the size of the residual data in the data to be operated is smaller than the size of the pooling core, moving the pooling core in the reverse direction so that the region it covers after the move has the same size as the pooling core and includes the residual data, and performing the summation pooling operation on the data in that region to obtain an operation result.

Clause A35, the method of clause A22, further comprising:

storing the data to be operated and the operation result by using a storage module of the device.

Clause A36, the method of clause A22, wherein parsing the obtained summation pooling instruction to obtain the operation code and the operation domain of the summation pooling instruction comprises:

storing the summation pooling instruction;

parsing the summation pooling instruction to obtain its operation code and operation domain;

storing an instruction queue, the instruction queue comprising a plurality of instructions to be executed that are arranged in execution order, the instructions to be executed comprising the summation pooling instruction.

Clause A37, the method of clause A36, further comprising:

when it is determined that a first to-be-executed instruction among the plurality of to-be-executed instructions has an association relationship with a zeroth to-be-executed instruction preceding it, caching the first to-be-executed instruction, and controlling execution of the first to-be-executed instruction after it is determined that execution of the zeroth to-be-executed instruction is completed.

Clause A38, a non-transitory computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the method of any one of clauses A22 to A37.
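The master/slave arrangement in clauses A3, A4, A24 and A25 can be pictured with a small sequential simulation. It is only a sketch under stated assumptions: the master distributes pooling-core windows round-robin, strides equal the core size, residual data is discarded, and a Python dict stands in for the target address. The distribution policy and the function names `master_distribute`, `slave_compute` and `sum_pool_master_slave` are hypothetical; the clauses do not specify how work is split.

```python
from typing import Dict, List, Tuple

Window = Tuple[Tuple[int, int], int, int, int, int]  # (output index, r0, r1, c0, c1)


def master_distribute(windows: List[Window], num_slaves: int) -> List[List[Window]]:
    """Master sub-module: split the pooling-core windows round-robin across slaves."""
    return [windows[i::num_slaves] for i in range(num_slaves)]


def slave_compute(data: List[List[int]], assigned: List[Window],
                  target: Dict[Tuple[int, int], int]) -> None:
    """Slave sub-module: sum each assigned window and write it to the target."""
    for out_idx, r0, r1, c0, c1 in assigned:
        target[out_idx] = sum(data[r][c] for r in range(r0, r1) for c in range(c0, c1))


def sum_pool_master_slave(data: List[List[int]], k_h: int, k_w: int,
                          num_slaves: int) -> List[List[int]]:
    in_h, in_w = len(data), len(data[0])
    out_h, out_w = in_h // k_h, in_w // k_w  # residual rows/columns are dropped
    windows = [((i, j), i * k_h, (i + 1) * k_h, j * k_w, (j + 1) * k_w)
               for i in range(out_h) for j in range(out_w)]
    target: Dict[Tuple[int, int], int] = {}  # stands in for the target address
    for assigned in master_distribute(windows, num_slaves):
        slave_compute(data, assigned, target)  # hardware slaves would run in parallel
    return [[target[(i, j)] for j in range(out_w)] for i in range(out_h)]


data = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(sum_pool_master_slave(data, 2, 2, num_slaves=3))  # [[14, 22], [46, 54]]
```

Because each window is summed independently, the result does not depend on how many slaves are used or how the windows are assigned; only throughput would change on real hardware.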

The embodiments of the present application have been described in detail above, and specific examples have been used herein to explain its principles and implementations; the description of these embodiments is only intended to help understand the method of the present application and its core ideas. Meanwhile, those skilled in the art may make changes to the specific implementations and the scope of application based on the ideas of the present application. In summary, this description should not be construed as limiting the present application.

Claims

1. A summation pooling instruction processing apparatus, the apparatus comprising:

the operation module is configured to perform a summation pooling operation on the data to be operated according to the pooling core to obtain an operation result, and to store the operation result in the target address, wherein, for residual data that remains because the size of the data to be operated is a non-integer multiple of the size of the pooling core, and whose size is smaller than the size of the pooling core, the summation pooling operation is performed in any one of the following modes:

performing the summation pooling operation directly on the residual data;

padding the residual data and then performing the summation pooling operation;

moving the pooling core in the reverse direction so that the region it covers after the move has the same size as the pooling core and includes the residual data, and performing the summation pooling operation on the data in that region.

2. The apparatus of claim 1, wherein the operation module comprises:

an adder, configured to add the data to be operated in the region corresponding to the pooling core to obtain an operation result.

3. The apparatus of claim 2, wherein the operation module comprises a master operation sub-module and a plurality of slave operation sub-modules, the master operation sub-module comprising the adder,

the master operation sub-module being configured to use the adder to sum the plurality of data to be operated in the region corresponding to the pooling core to obtain an operation result, and to store the operation result in the target address.

4. A machine learning computing device, the device comprising:

one or more summation pooling instruction processing apparatuses according to any one of claims 1-3, configured to obtain data to be operated and control information from other processing apparatuses, perform a specified machine learning operation, and transmit the execution result to the other processing apparatuses through an I/O interface;

5. A combination processing apparatus, characterized in that the combination processing apparatus comprises:

the machine learning computing device, universal interconnect interface, and other processing device of claim 4;

6. A machine learning chip, the machine learning chip comprising:

the machine learning computing device of claim 4 or the combination processing device of claim 5.

7. An electronic device, the electronic device comprising:

the machine learning chip of claim 6.

8. A board card, characterized in that the board card comprises: a storage device, an interface device, a control device, and the machine learning chip of claim 6;

the storage device is used for storing data;

9. A method of processing a summation pooling instruction, the method being applied to a summation pooling instruction processing apparatus, the method comprising:

performing a summation pooling operation on the data to be operated according to the pooling core to obtain an operation result, and storing the operation result in the target address, wherein, for residual data that remains because the size of the data to be operated is a non-integer multiple of the size of the pooling core, and whose size is smaller than the size of the pooling core, the summation pooling operation is performed in any one of the following modes:

performing the summation pooling operation directly on the residual data;

padding the residual data and then performing the summation pooling operation;

10. A non-transitory computer readable storage medium, having stored thereon computer program instructions, which when executed by a processor, implement the method of claim 9.