WO2022037422A1

WO2022037422A1 - Processor, implementation method, electronic device, and storage medium

Info

Publication number: WO2022037422A1
Application number: PCT/CN2021/110952
Authority: WO
Inventors: 严小平
Original assignee: 北京百度网讯科技有限公司
Priority date: 2020-08-21
Filing date: 2021-08-05
Publication date: 2022-02-24
Also published as: EP4075759A1; US11784946B2; CN112152947B; CN112152947A; EP4075759A4; JP2023517921A; JP7379794B2; US20230179546A1; KR20220122756A

Abstract

The present application relates to the field of artificial intelligence and deep learning. Disclosed are a processor, an implementation method, an electronic device, and a storage medium. The processor comprises: a system controller, used for transmitting predetermined data pack information to a data packing and unpacking module; the data packing and unpacking module, used for acquiring corresponding data pack data from a storage array module on the basis of the data pack information, packing the data pack information, transmitting a first data pack produced by packing to a computing module for computational processing, acquiring a second data pack returned by the computing module, unpacking to produce computation result data, and storing same in the storage array module; the storage array module, used for data storage; and the computing module, used for computational processing with respect to the first data pack received, generating the second data pack on the basis of the computation result, and returning to the data packing and unpacking module. The application of the solution of the present application reduces design difficulty and increases overall processing efficiency.

Description

Processor and implementation method, electronic device and storage medium

This application claims the priority of the Chinese patent application with the filing date of August 21, 2020 and the application number of 2020108517577, the invention title is "processor and implementation method, electronic device and storage medium".

technical field

The present application relates to computer application technologies, in particular to processors and implementation methods, electronic devices and storage media in the field of artificial intelligence and deep learning.

Background technique

More and more intelligent applications make neural network algorithms more diverse, making the overall neural network model more and more complex, correspondingly, bringing a larger amount of operations and data storage interactions. More and more attention has been paid to neural network-based processors such as NPU (Network Processing Unit) chips.

The current NPU includes two mainstream design methods with accelerators as the core and instruction expansion as the core. The former design method is rarely used due to its poor versatility and scalability, and the latter design method is mainly used. However, in the latter design method, it is necessary to expand the tedious instruction set corresponding to the operation of the neural network, and it is necessary to develop a special compiler to support it. The design is very difficult, especially when it is applied to real-time processing of speech data.

SUMMARY OF THE INVENTION

The present application provides processors and implementation methods, electronic devices, and storage media.

A processor, comprising: a system controller, a storage array module, a data packing and unpacking module, and an arithmetic module;

the system controller, configured to send predetermined data packet information to the data packaging and unpacking module;

The data packaging and unpacking module is configured to obtain the corresponding data packet data from the storage array module according to the data packet information, package the data packet data and the data packet information, and sending the data packet to the operation module for operation processing, and acquiring the second data packet returned by the operation module, obtaining operation result data by unpacking the second data packet, and storing it in the storage array module;

The storage array module is used for data storage;

The operation module is configured to perform operation processing on the acquired first data packet, generate the second data packet according to the operation result data, and return it to the data packaging and unpacking module.

A processor implementation method, comprising:

Build a processor consisting of a system controller, a storage array module, a data packaging and unpacking module, and an arithmetic module;

Use the processor to perform neural network operations; wherein, the system controller is configured to send predetermined data packet information to the data packing and unpacking module; the data packing and unpacking module is configured to convert from The storage array module obtains the corresponding data packet data, packages the data packet data and the data packet information, sends the first data packet obtained by packaging to the operation module for operation processing, and obtains the operation For the second data packet returned by the module, the operation result data is obtained by unpacking the second data packet and stored in the storage array module; the storage array module is used for data storage; the operation module is used for Perform operation processing on the acquired first data packet, generate the second data packet according to the operation result data, and return it to the data packaging and unpacking module.

An electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.

A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method as described above.

An embodiment in the above application has the following advantages or beneficial effects: a storage-computing integration implementation is proposed, and the overall interaction from neural network storage to computing is completed in the processor, avoiding complex instruction design and difficult operations. Compiler development, etc., thereby reducing the design difficulty and improving the overall processing efficiency.

It should be understood that what is described in this section is not intended to identify key or critical features of embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become readily understood from the following description.

Description of drawings

The accompanying drawings are used for better understanding of the present solution, and do not constitute a limitation to the present application. in:

FIG. 1 is a schematic diagram of the composition and structure of the first embodiment of the processor 10 described in this application;

FIG. 2 is a schematic structural diagram of the composition of the processor 10 according to the second embodiment of the present application;

FIG. 3 is a schematic diagram of the composition and structure of the processor 10 according to the third embodiment of the present application;

FIG. 4 is a flowchart of an embodiment of a method for implementing a processor described in this application;

FIG. 5 is a block diagram of an electronic device according to the method described in the embodiment of the present application.

detailed description

Exemplary embodiments of the present application are described below with reference to the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.

In addition, it should be understood that the term "and/or" in this document is only an association relationship for describing associated objects, indicating that there can be three kinds of relationships, for example, A and/or B, which can mean that A exists alone, and A exists at the same time and B, there are three cases of B alone. In addition, the character "/" in this document generally indicates that the related objects are an "or" relationship.

FIG. 1 is a schematic structural diagram of a first embodiment of the processor 10 described in this application. As shown in FIG. 1 , it includes: a system controller 101 , a storage array module 102 , a data packing and unpacking module 103 and an arithmetic module 104 .

The system controller 101 is configured to send the predetermined data packet information to the data packing and unpacking module 103 .

The data packaging and unpacking module 103 is used to obtain the corresponding data packet data from the storage array module 102 according to the data packet information, package the data packet data and the data packet information, and send the packaged first data packet to the computing module 104 for processing. The operation processing is performed, and the second data packet returned by the operation module 104 is acquired, and the operation result data is obtained by unpacking the second data packet, and stored in the storage array module 102 .

The storage array module 102 is used for data storage.

The operation module 104 is configured to perform operation processing on the acquired first data packet, generate a second data packet according to the operation result data, and return it to the data packaging and unpacking module 103.

It can be seen that the above-mentioned embodiment proposes an integrated implementation of storage and computing, which completes the overall interaction between neural network storage and computing in the processor, and avoids complex instruction design and difficult compiler development. It reduces the design difficulty and improves the overall processing efficiency.

On the basis shown in FIG. 1 , the processor 10 may further include one or all of the following: a direct memory access (DMA, Direct Memory Access) module, and a routing switch module.

Preferably, the above two modules may be included at the same time. Correspondingly, FIG. 2 is a schematic structural diagram of the composition of the second embodiment of the processor 10 described in this application. As shown in FIG. 2 , it includes: a system controller 101 , a storage array module 102 , a data packing and unpacking module 103 , an arithmetic module 104 , a DMA module 105 and a routing switching module 106 .

The DMA module 105 is used to realize high-speed exchange of external storage data and internal storage array data in the storage array module 103 under the control of the system controller 101 .

The routing switching module 106 is configured to send the first data packet obtained from the data packing and unpacking module 103 to the computing module 104 , and send the second data packet obtained from the computing module 104 to the data packing and unpacking module 103 .

As shown in FIG. 2 , the operation module 104 may further include: a general operation module 1041 and an activation operation module 1042 . As the name implies, the general operation module 1041 can be used to perform general operations, and the activation operation module 1042 can be used to perform activation operations.

The system controller 101 may adopt a simple control logic or state machine design, or may include a complex processor IP, where IP is the abbreviation of Intellectual Property, such as the complex processor IP may include an advanced reduced instruction set Machine (ARM, Advanced RISC Machine), Digital Signal Processing (DSP, Digital Signal Processing), X86, Microcontroller Unit (MCU, Microcontroller Unit) core IP, etc.

The storage array module 102 can be composed of multiple groups of Static Random-Access Memory (SRAM, Static Random-Access Memory), supports multi-port high-speed simultaneous reading or writing, and can implement data cache or storage by means of a matrix. The data stored in the storage array module 102 may include neural network model data, external input data, temporary data of the middle layer, and the like.

The data packaging and unpacking module 103 can perform data reading and storage operations on the storage array module 102, perform packaging operations on the data packet information obtained from the system controller 101 and the data packet data in the storage array module 102, and store the data obtained by packaging. A data packet is sent to the computing module 104 through the routing switching module 106 , and the second data packet returned by the computing module 104 through the routing switching module 106 is unpacked, and the obtained operation result data is stored in the storage array module 102 .

Correspondingly, the routing switching module 106 can receive the data packets of the data packing and unpacking module 103 and the computing module 104, and perform data exchange and the like.

The general operations performed by the general operation module 1041 may include general vector operations such as vector arithmetic operations, logical operations, comparison operations, dot multiplication, accumulation, and summation. The activation operation performed by the activation operation module 1042 may include one or more of nonlinear functions sigmoid, tanh, relu, softmax operation, and the like.

The system controller 101 can manage and control the whole, such as sending the data packet information to the data packing and unpacking module 102 as mentioned above, so that the data packing and unpacking module 102 can carry out the packing and unpacking of the data, etc., and can be responsible for starting the DMA module. 105 to realize high-speed exchange of external storage data and internal storage array data in the storage array module 102, and the like.

It can be seen that, in the above embodiment, the processor adopts the main structure of the storage array module + the data packaging and unpacking module + the routing switching module as a whole, which completes the overall interaction from the neural network storage to the operation, and avoids complex instruction design and high difficulty. Compiler development, etc., thus reducing the design difficulty and improving the overall processing efficiency.

FIG. 3 is a schematic structural diagram of a third embodiment of the processor 10 described in this application. As shown in FIG. 3 , it includes: a system controller 101 , a storage array module 102 , a data packing and unpacking module 103 , an arithmetic module 104 , a DMA module 105 and a routing switching module 106 . The storage array module 102 may include N1 storage units 1021, each storage unit 1021 may be a set of SRAM, etc., and the data packing and unpacking module 103 may include N2 data packing and unpacking units 1031, each data packing and unpacking unit 1031 The packet unit 1031 can be respectively connected to the routing switching module 106 through a data channel, and N1 and N2 are both positive integers greater than one. In addition, the general operation module 1041 can include M operation units 10411, and the activation operation module 1042 can include There are P arithmetic units 10421, and each arithmetic unit 10411/10421 can be connected to the routing switching module 106 through a data channel respectively, and M and P are both positive integers greater than one. The specific values of N1, N2, M and P can be determined according to actual needs.

Correspondingly, the data packaging and unpacking unit 1031 can package the data packet data obtained from the storage unit 1021 and the data packet information obtained from the system controller 101, and use the data channel to exchange the first data packet obtained through routing. The module 106 sends the operation unit 10411/10421 for operation processing, and uses the data channel to obtain the second data packet returned by the operation unit 10411/10421 through the routing switching module 106, and obtains operation result data by unpacking the second data packet, stored in the storage unit 1021.

In practical applications, the system controller 101 can simulate the details of each neural network operation, such as what data needs to be obtained, where to obtain it, what kind of operation needs to be performed, etc. Correspondingly, it can generate data packet information and send it to the relevant The data packing and unpacking unit 1031. Each data packing and unpacking unit 1031 can work in parallel, such as respectively acquiring data packet information from the system controller 101 and performing packing and unpacking operations.

Correspondingly, the data packet information may include: source channel, source address, destination channel (operation channel), operation type, data packet length, and the like. The data packing and unpacking unit 1031 can obtain the data packet data from the source address of the storage unit 1021 corresponding to the source channel, and the routing switching module 106 can send the obtained first data packet to the operation unit 10411/10421 corresponding to the destination channel, and calculate The units 10411/10421 can perform corresponding types of operation processing according to the operation type.

Preferably, the values of N1 and N2 are the same, that is, the number of storage units 1021 and data packing and unpacking units 1031 is the same, and each data packing and unpacking unit 1031 may respectively correspond to one storage unit 1021, and the data from the corresponding storage unit 1021 Get packet data. In this way, the parallel work of each data packing and unpacking unit 1031 can be better ensured. Assuming that both data packing and unpacking units 1031 can obtain data from a certain storage unit 1021, there may be a waiting situation, that is, one of the data The packing and unpacking unit 1031 needs to wait for another data packing and unpacking unit 1031 to obtain the data before it can obtain the data, thereby causing a reduction in efficiency and the like.

In the above processing method, by dividing the units, the parallel processing capability is improved, and the data storage and interaction capability is further improved.

In the existing NPU with instruction extension as the core, the data storage interaction adopts a unified load/store (load/store) mode, and the operation is synchronized in sequence, which is inefficient. However, after the processing method described in the present application is adopted, the processing can be performed in parallel, and the waiting time and the like caused by the synchronous operation are avoided, thereby making the system control and data storage interaction more efficient.

The data packet information may further include: destination address or storage policy. If the data packet information includes the destination address, then the data packing and unpacking unit 1031 can store the operation result data in the corresponding storage unit 1021 according to the destination address. If the data packet information includes the storage policy, then the data packing and unpacking unit 1031 The operation result data can be stored in the corresponding storage unit 1021 according to the storage policy. The storage strategy may be a storage strategy that achieves data alignment.

After the operation unit 10411/10421 completes the operation, the data in the data segment in the first data packet can be replaced with the operation result data, and the data length usually changes, so it is also necessary to modify the data length information in the data packet, etc. The second data packet is returned to the data packaging and unpacking unit 1031 according to the transmission path of the first data packet. After the data packaging and unpacking unit 1031 parses the operation result data from the second data packet, it will involve how the operation result data is. storage problems.

Correspondingly, the data packet information may include: source channel, source address, destination channel, and destination address, etc., that is, the source address, the destination address, and the channel addresses on both sides. In this way, for the obtained operation result data, the data is packaged. The unpacking unit 1031 may store the destination address in the corresponding storage unit 1021 according to the destination address. Alternatively, the data packet information may not include the destination address, but include a storage policy, and the data packaging and unpacking unit 1031 may store the operation result data in the corresponding storage unit 1021 according to the storage policy, thereby realizing automatic data alignment and the like.

The specific strategy of the storage strategy may be determined according to actual needs, for example, it may include upward alignment, downward alignment, and how to process other places after alignment (such as filling processing).

The operations involved in the neural network will cause the data to shrink or expand, that is, the length of the above data will change, which can easily cause the data after the operation to be misaligned. In the existing NPU with instruction expansion as the core, additional data conversion or Transpose is used to solve the data alignment problem. This extra operation will reduce the overall processing efficiency. Since the neural network operation involves a large number of repeated storage operations and interactive iterative operations, it will have a great impact on the overall processing efficiency. However, in the processing method described in this application, the free interaction of storage and operation is realized by means of routing exchange, and the storage is automatically completed through storage policies, etc., and automatic data alignment is realized. The implementation method is simple, and the overall processing efficiency is improved.

As shown in FIG. 3 , the system controller 101 can interact with the processing unit through the external bus interface, and the DMA module 105 can interact with the double-rate (DDR, Double Data Rate) external storage unit through the external bus storage interface. for existing technology.

The above is the introduction of the apparatus embodiments, and the solution described in the present application will be further described below through the method embodiments.

FIG. 4 is a flowchart of an embodiment of a method for implementing a processor described in this application. As shown in FIG. 4 , the following specific implementations are included.

In 401, a processor consisting of a system controller, a storage array module, a data packing and unpacking module, and an arithmetic module is constructed.

In 402, use the processor to perform neural network operation; wherein, the system controller is used for sending predetermined data packet information to the data packaging and unpacking module; the data packaging and unpacking module is used for obtaining corresponding data from the storage array module according to the data packet information Packet data, pack the packet data and the packet information, send the first packet obtained by packing to the arithmetic module for arithmetic processing, and obtain the second packet returned by the arithmetic module, and disassemble the second packet by disassembling the second packet. The operation result data is obtained in the package and stored in the storage array module; the storage array module is used for data storage; the operation module is used to perform operation processing on the obtained first data packet, generate a second data packet according to the operation result data, and return it to Data packing and unpacking module.

On the above basis, a DMA module can also be added to the processor, and the DMA module can be used to realize high-speed exchange of external storage data and internal storage array data in the storage array module under the control of the system controller.

In addition, a routing switching module can be added to the processor, and the routing switching module can be used to send the first data packet obtained from the data packing and unpacking module to the computing module, and send the second data packet obtained from the computing module to the data Pack and unpack modules.

The operation modules may include: a general operation module for performing general operations and an activation operation module for performing activation operations.

In addition, the storage array module may include N1 storage units, the data packing and unpacking module may include N2 data packing and unpacking units, and each data packing and unpacking unit is respectively connected to the routing switch module through a data channel, and N1 and N2 is a positive integer greater than one. The general operation module may include M operation units, the activation operation module may include P operation units, each operation unit may be connected to the routing exchange module through a data channel, and M and P are both positive integers greater than one.

Correspondingly, the data packing and unpacking unit can be used to pack the packet data obtained from the storage unit and the packet information obtained from the system controller, and use the data channel to send the first packet obtained by packing to the routing switch module. The arithmetic unit performs arithmetic processing, and uses the data channel to obtain the second data packet returned by the arithmetic unit through the routing switch module, and unpacks the second data packet to obtain the arithmetic result data, which is stored in the storage unit.

The data packet information may include: source channel, source address, destination channel and operation type. Correspondingly, the data packet data can be the data packet data obtained by the data packaging and unpacking unit from the source address of the storage unit corresponding to the source channel, and the operation unit that obtains the first data packet can be the destination channel determined by the routing switching module. The operation unit, the operation processing may be the operation processing of the operation type performed by the operation unit.

Preferably, the values of N1 and N2 are the same, each data packing and unpacking unit corresponds to a storage unit respectively, and data packet data is acquired from the corresponding storage unit.

The data packet information may further include: destination address or storage policy. If the data packet information includes the destination address, the data packaging and unpacking unit can store the operation result data in the corresponding storage unit according to the destination address. If the data packet information includes the storage strategy, the data packaging and unpacking unit can store the data according to the The strategy is to store the operation result data in the corresponding storage unit. The storage strategy may be a storage strategy that achieves data alignment.

For the specific work flow of the method embodiment shown in FIG. 4 , please refer to the relevant descriptions in the foregoing apparatus embodiments, which will not be repeated.

In a word, by adopting the solutions described in the method embodiments of the present application, an implementation method integrating storage and computing is proposed, which completes the overall interaction from neural network storage to computing in the processor, avoiding complex instruction design and highly difficult compilers. development, etc., thereby reducing the design difficulty and improving the overall processing efficiency.

According to the embodiments of the present application, the present application further provides an electronic device and a readable storage medium.

As shown in FIG. 5 , it is a block diagram of an electronic device according to the method described in the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are by way of example only, and are not intended to limit implementations of the application described and/or claimed herein.

As shown in FIG. 5, the electronic device includes: one or more processors Y01, a memory Y02, and interfaces for connecting various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or otherwise as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on memory to display graphical information of a graphical user interface on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used with multiple memories and multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing some of the necessary operations (eg, as a server array, a group of blade servers, or a multiprocessor system). In FIG. 5, a processor Y01 is taken as an example.

The memory Y02 is the non-transitory computer-readable storage medium provided in this application. Wherein, the memory stores instructions executable by at least one processor, so that the at least one processor executes the method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the methods provided by the present application.

As a non-transitory computer-readable storage medium, the memory Y02 can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the methods in the embodiments of the present application. The processor Y01 executes various functional applications and data processing of the server by running the non-transitory software programs, instructions and modules stored in the memory Y02, that is, to implement the methods in the above method embodiments.

The memory Y02 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function; the storage data area may store data created according to the use of the electronic device, and the like. In addition, the memory Y02 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory Y02 may optionally include memory located remotely relative to processor Y01, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, blockchain networks, local area networks, mobile communication networks, and combinations thereof.

The electronic device may further include: an input device Y03 and an output device Y04. The processor Y01, the memory Y02, the input device Y03, and the output device Y04 may be connected by a bus or in other ways, and the connection by a bus is taken as an example in FIG. 5 .

Input device Y03 can receive input numerical or character information, and generate key signal input related to user settings and function control of electronic equipment, such as touch screen, keypad, mouse, track pad, touch pad, pointing stick, one or more Input devices such as mouse buttons, trackballs, joysticks, etc. The output device Y04 may include a display device, an auxiliary lighting device, a haptic feedback device (eg, a vibration motor), and the like. The display devices may include, but are not limited to, liquid crystal displays, light emitting diode displays, and plasma displays. In some implementations, the display device may be a touch screen.

Various implementations of the systems and techniques described herein can be implemented in digital electronic circuitry, integrated circuit systems, application specific integrated circuits, computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor that The processor, which may be a special purpose or general-purpose programmable processor, may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device an output device.

These computational programs (also referred to as programs, software, software applications, or codes) include machine instructions for programmable processors, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages calculation program. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or apparatus for providing machine instructions and/or data to a programmable processor ( For example, a magnetic disk, an optical disk, a memory, a programmable logic device), including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (eg, a cathode ray tube or liquid crystal display monitor) for displaying information to the user; and a keyboard and pointing A device (eg, a mouse or trackball) through which the user can provide input to the computer through the keyboard and the pointing device. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (eg, visual feedback, auditory feedback, or tactile feedback); and can be in any form (including acoustic input, voice input, or tactile input) to receive input from the user.

The systems and techniques described herein may be implemented on a computing system that includes back-end components (eg, as a data server), or a computing system that includes middleware components (eg, an application server), or a computing system that includes front-end components (eg, a user's computer having a graphical user interface or web browser through which a user may interact with implementations of the systems and techniques described herein), or including such backend components, middleware components, Or any combination of front-end components in a computing system. The components of the system may be interconnected by any form or medium of digital data communication (eg, a communication network). Examples of communication networks include: local area networks, wide area networks, blockchain networks, and the Internet.

A computer system can include clients and servers. Clients and servers are generally remote from each other and usually interact through a communication network. The relationship of client and server arises by computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also known as a cloud computing server or a cloud host. It is a host product in the cloud computing service system to solve the traditional physical host and VPS services, which are difficult to manage and weak in business scalability. defect.

It should be understood that steps may be reordered, added or deleted using the various forms of flow shown above. For example, the steps described in the present application can be executed in parallel, sequentially or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved, no limitation is imposed herein.

The above-mentioned specific embodiments do not constitute a limitation on the protection scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may occur depending on design requirements and other factors. Any modifications, equivalent replacements and improvements made within the spirit and principles of this application shall be included within the protection scope of this application.

Claims

A processor, comprising: a system controller, a storage array module, a data packing and unpacking module, and an arithmetic module;

the system controller, configured to send predetermined data packet information to the data packaging and unpacking module;

The data packaging and unpacking module is configured to obtain the corresponding data packet data from the storage array module according to the data packet information, package the data packet data and the data packet information, and sending the data packet to the operation module for operation processing, and acquiring the second data packet returned by the operation module, obtaining operation result data by unpacking the second data packet, and storing it in the storage array module;

The storage array module is used for data storage;

The operation module is configured to perform operation processing on the acquired first data packet, generate the second data packet according to the operation result data, and return it to the data packaging and unpacking module.
The processor of claim 1, further comprising:

The direct memory access module is used for realizing high-speed exchange of external storage data and internal storage array data in the storage array module under the control of the system controller.
The processor of claim 1, further comprising:

A routing switching module, configured to send the first data packet obtained from the data packaging and unpacking module to the computing module, and send the second data packet obtained from the computing module to the data Pack and unpack modules.
The processor of claim 3, wherein,

The operation module includes: a general operation module and an activation operation module;

The general operation module is used for general operation; the activation operation module is used for activation operation.
The processor of claim 4, wherein,

The storage array module includes N1 storage units;

The data packing and unpacking module includes N2 data packing and unpacking units, each data packing and unpacking unit is respectively connected to the routing switching module through a data channel, and N1 and N2 are both positive integers greater than one;

The general computing module includes M computing units, the activation computing module includes P computing units, each computing unit is connected to the routing switching module through a data channel, and M and P are both greater than one. positive integer;

The data packaging and unpacking unit packs the data packet data obtained from the storage unit and the data packet information obtained from the system controller, and uses the data channel to pack the data obtained by packaging. A data packet is sent to the computing unit through the routing switching module for computing processing, and the data channel is used to obtain the second data packet returned by the computing unit through the routing switching module. The data packets are unpacked to obtain operation result data, which are stored in the storage unit.
The processor of claim 5, wherein,

The data packet information includes: source channel, source address, destination channel and operation type;

The data packaging and unpacking unit obtains the data packet data from the source address of the storage unit corresponding to the source channel;

The routing switching module sends the first data packet to the operation unit corresponding to the destination channel for operation processing of the operation type.
The processor of claim 6, wherein,

The values of N1 and N2 are the same, each data packing and unpacking unit corresponds to a storage unit respectively, and the data packet data is acquired from the corresponding storage unit.
The processor of claim 7, wherein,

The data packet information further includes: destination address or storage policy;

If the destination address is included in the data packet information, the data packing and unpacking unit stores the operation result data in the corresponding storage unit according to the destination address;

If the data packet information includes the storage policy, the data packaging and unpacking unit stores the operation result data in a corresponding storage unit according to the storage policy.
The processor of claim 8, wherein the storage policy comprises a storage policy that implements data alignment.
A processor implementation method, comprising:

Build a processor consisting of a system controller, a storage array module, a data packaging and unpacking module, and an arithmetic module;

Use the processor to perform neural network operations; wherein, the system controller is configured to send predetermined data packet information to the data packing and unpacking module; the data packing and unpacking module is configured to convert from The storage array module obtains the corresponding data packet data, packages the data packet data and the data packet information, sends the first data packet obtained by packaging to the operation module for operation processing, and obtains the operation For the second data packet returned by the module, the operation result data is obtained by unpacking the second data packet and stored in the storage array module; the storage array module is used for data storage; the operation module is used for Perform operation processing on the acquired first data packet, generate the second data packet according to the operation result data, and return it to the data packaging and unpacking module.
The method of claim 10, further comprising:

A direct memory access module is added to the processor, and the direct memory access module is used to realize high-speed exchange of external storage data and internal storage array data in the storage array module under the control of the system controller .
The method of claim 10, further comprising:

A routing switching module is added to the processor, and the routing switching module is configured to send the first data packet obtained from the data packaging and unpacking module to the computing module, and obtain the first data packet from the computing module The second data packet is sent to the data packaging and unpacking module.
The method of claim 12, wherein,

The operation module includes: a general operation module for performing general operations and an activation operation module for performing activation operations.
The method of claim 13, wherein,

The storage array module includes N1 storage units;

The data packing and unpacking module includes N2 data packing and unpacking units, each data packing and unpacking unit is respectively connected to the routing switching module through a data channel, and N1 and N2 are both positive integers greater than one;

The general computing module includes M computing units, the activation computing module includes P computing units, each computing unit is connected to the routing switching module through a data channel, and M and P are both greater than one. positive integer;

The data packaging and unpacking unit is used to package the data packet data obtained from the storage unit and the data packet information obtained from the system controller, and use the data channel to package the data obtained by packaging. The first data packet is sent to the computing unit through the routing switching module for operation processing, and the second data packet returned by the computing unit is acquired through the routing switching module by using the data channel, and the The second data packet is unpacked to obtain operation result data, which is stored in the storage unit.
The method of claim 14, wherein,

The data packet information includes: source channel, source address, destination channel and operation type;

The data packet data is the data packet data obtained by the data packaging and unpacking unit from the source address of the storage unit corresponding to the source channel;

The operation unit that obtains the first data packet is the operation unit corresponding to the destination channel determined by the routing switching module;

The operation processing is the operation processing of the operation type performed by the operation unit.
The method of claim 15, wherein,

The values of N1 and N2 are the same, each data packing and unpacking unit corresponds to a storage unit respectively, and the data packet data is acquired from the corresponding storage unit.
The method of claim 16, wherein,

The data packet information further includes: destination address or storage policy;

If the destination address is included in the data packet information, the data packaging and unpacking unit stores the operation result data in the corresponding storage unit according to the destination address;

If the data packet information includes the storage policy, the data packaging and unpacking unit stores the operation result data in a corresponding storage unit according to the storage policy.
18. The method of claim 17, wherein the storage policy comprises a storage policy that implements data alignment.
An electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

The memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the execution of any of claims 10-18 Methods.
A non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method of any of claims 10-18.