WO2022037422A1 - Processor, implementation method, electronic device, and storage medium - Google Patents

Processor, implementation method, electronic device, and storage medium Download PDF

Info

Publication number
WO2022037422A1
WO2022037422A1 PCT/CN2021/110952 CN2021110952W WO2022037422A1 WO 2022037422 A1 WO2022037422 A1 WO 2022037422A1 CN 2021110952 W CN2021110952 W CN 2021110952W WO 2022037422 A1 WO2022037422 A1 WO 2022037422A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
module
data packet
unpacking
storage
Prior art date
Application number
PCT/CN2021/110952
Other languages
French (fr)
Chinese (zh)
Inventor
严小平
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Priority to JP2022554384A priority Critical patent/JP7379794B2/en
Priority to US17/792,867 priority patent/US11784946B2/en
Priority to KR1020227027218A priority patent/KR20220122756A/en
Priority to EP21857517.3A priority patent/EP4075759A4/en
Publication of WO2022037422A1 publication Critical patent/WO2022037422A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/90Buffering arrangements
    • H04L49/9057Arrangements for supporting packet reassembly or resequencing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • G06F3/0635Configuration or reconfiguration of storage systems by changing the path, e.g. traffic rerouting, path reconfiguration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0679Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/065Analogue means
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/74Address processing for routing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/90Buffering arrangements
    • H04L49/9063Intermediate storage in different physical parts of a node or terminal
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions

Definitions

  • the present application relates to computer application technologies, in particular to processors and implementation methods, electronic devices and storage media in the field of artificial intelligence and deep learning.
  • neural network-based processors, such as Network Processing Unit (NPU) chips, are therefore receiving more and more attention.
  • the current NPU includes two mainstream design approaches: one with accelerators as the core and one with instruction extension as the core.
  • the former design approach is rarely used because of its poor versatility and scalability, so the latter design approach is mainly used.
  • with the latter design approach, however, a cumbersome instruction set corresponding to the neural network operations must be defined, and a dedicated compiler must be developed to support it.
  • the design is therefore very difficult, especially when applied to real-time processing of speech data.
  • the present application provides processors and implementation methods, electronic devices, and storage media.
  • a processor comprising: a system controller, a storage array module, a data packing and unpacking module, and an arithmetic module;
  • the system controller, configured to send predetermined data packet information to the data packaging and unpacking module;
  • the data packaging and unpacking module, configured to obtain the corresponding data packet data from the storage array module according to the data packet information, package the data packet data with the data packet information, send the first data packet obtained by packaging to the operation module for operation processing, acquire the second data packet returned by the operation module, obtain operation result data by unpacking the second data packet, and store it in the storage array module;
  • the storage array module is used for data storage
  • the operation module is configured to perform operation processing on the acquired first data packet, generate the second data packet according to the operation result data, and return it to the data packaging and unpacking module.
  • a processor implementation method comprising:
  • the system controller is configured to send predetermined data packet information to the data packing and unpacking module;
  • the data packing and unpacking module is configured to obtain the corresponding data packet data from the storage array module according to the data packet information, package the data packet data with the data packet information, send the first data packet obtained by packaging to the operation module for operation processing, acquire the second data packet returned by the operation module, obtain the operation result data by unpacking the second data packet, and store it in the storage array module;
  • the storage array module is used for data storage;
  • the operation module is used to perform operation processing on the acquired first data packet, generate the second data packet according to the operation result data, and return it to the data packaging and unpacking module.
  • An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor;
  • the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
  • a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method as described above.
  • An embodiment in the above application has the following advantages or beneficial effects: a storage-computing integrated implementation is proposed, in which the overall interaction from neural network storage to computation is completed inside the processor, avoiding complex instruction design and highly difficult compiler development, thereby reducing the design difficulty and improving the overall processing efficiency.
  • FIG. 1 is a schematic diagram of the composition and structure of the first embodiment of the processor 10 described in this application;
  • FIG. 2 is a schematic structural diagram of the composition of the processor 10 according to the second embodiment of the present application.
  • FIG. 3 is a schematic diagram of the composition and structure of the processor 10 according to the third embodiment of the present application.
  • FIG. 4 is a flowchart of an embodiment of a method for implementing a processor described in this application;
  • FIG. 5 is a block diagram of an electronic device according to the method described in the embodiment of the present application.
  • FIG. 1 is a schematic structural diagram of a first embodiment of the processor 10 described in this application. As shown in FIG. 1 , it includes: a system controller 101 , a storage array module 102 , a data packing and unpacking module 103 and an arithmetic module 104 .
  • the system controller 101 is configured to send the predetermined data packet information to the data packing and unpacking module 103 .
  • the data packaging and unpacking module 103 is used to obtain the corresponding data packet data from the storage array module 102 according to the data packet information, package the data packet data with the data packet information, send the packaged first data packet to the operation module 104 for operation processing, acquire the second data packet returned by the operation module 104, obtain the operation result data by unpacking the second data packet, and store it in the storage array module 102.
  • the storage array module 102 is used for data storage.
  • the operation module 104 is configured to perform operation processing on the acquired first data packet, generate a second data packet according to the operation result data, and return it to the data packaging and unpacking module 103.
  • the above-mentioned embodiment proposes an integrated implementation of storage and computing, which completes the overall interaction between neural network storage and computing in the processor, avoids complex instruction design and difficult compiler development, and thereby reduces the design difficulty and improves the overall processing efficiency.
  • the processor 10 may further include one or all of the following: a direct memory access (DMA, Direct Memory Access) module, and a routing switch module.
  • FIG. 2 is a schematic structural diagram of the composition of the second embodiment of the processor 10 described in this application. As shown in FIG. 2 , it includes: a system controller 101 , a storage array module 102 , a data packing and unpacking module 103 , an arithmetic module 104 , a DMA module 105 and a routing switching module 106 .
  • the DMA module 105 is used to realize high-speed exchange of external storage data and internal storage array data in the storage array module 102 under the control of the system controller 101.
  • the routing switching module 106 is configured to send the first data packet obtained from the data packing and unpacking module 103 to the computing module 104 , and send the second data packet obtained from the computing module 104 to the data packing and unpacking module 103 .
  • the operation module 104 may further include: a general operation module 1041 and an activation operation module 1042 .
  • the general operation module 1041 can be used to perform general operations
  • the activation operation module 1042 can be used to perform activation operations.
  • the system controller 101 may adopt simple control logic or a state machine design, or may include a complex processor IP core, where IP is the abbreviation of Intellectual Property; for example, the complex processor IP may include an Advanced RISC Machine (ARM), Digital Signal Processing (DSP), X86, or Microcontroller Unit (MCU) core IP.
  • the storage array module 102 can be composed of multiple groups of Static Random-Access Memory (SRAM), supports multi-port high-speed simultaneous reading and writing, and can implement high-speed data caching or storage in a matrix manner.
  • the data stored in the storage array module 102 may include neural network model data, external input data, temporary data of the middle layer, and the like.
  • the data packaging and unpacking module 103 can perform data read and store operations on the storage array module 102, package the data packet information obtained from the system controller 101 with the data packet data from the storage array module 102, send the packaged first data packet to the computing module 104 through the routing switching module 106, unpack the second data packet returned by the computing module 104 through the routing switching module 106, and store the obtained operation result data in the storage array module 102.
  • the routing switching module 106 can receive the data packets of the data packing and unpacking module 103 and the computing module 104, and perform data exchange and the like.
  • the general operations performed by the general operation module 1041 may include general vector operations such as vector arithmetic operations, logical operations, comparison operations, dot multiplication, accumulation, and summation.
  • the activation operation performed by the activation operation module 1042 may include one or more of nonlinear functions sigmoid, tanh, relu, softmax operation, and the like.
  • the system controller 101 can manage and control the processor as a whole, for example by sending the data packet information to the data packing and unpacking module 103 so that the data packing and unpacking module 103 can carry out the packing and unpacking of data, and by starting the DMA module 105 to realize high-speed exchange of external storage data and internal storage array data in the storage array module 102.
  • in the above embodiment, the processor as a whole adopts the main structure of the storage array module + the data packaging and unpacking module + the routing switching module, which completes the overall interaction from neural network storage to operation and avoids complex instruction design and highly difficult compiler development, thus reducing the design difficulty and improving the overall processing efficiency.
  • FIG. 3 is a schematic structural diagram of a third embodiment of the processor 10 described in this application. As shown in FIG. 3 , it includes: a system controller 101 , a storage array module 102 , a data packing and unpacking module 103 , an arithmetic module 104 , a DMA module 105 and a routing switching module 106 .
  • the storage array module 102 may include N1 storage units 1021, each of which may be a group of SRAM or the like; the data packing and unpacking module 103 may include N2 data packing and unpacking units 1031, each of which is connected to the routing switching module 106 through its own data channel, where N1 and N2 are both positive integers greater than one; the general operation module 1041 may include M operation units 10411 and the activation operation module 1042 may include P operation units 10421, each of which is connected to the routing switching module 106 through its own data channel, where M and P are both positive integers greater than one.
  • the specific values of N1, N2, M and P can be determined according to actual needs.
  • the data packaging and unpacking unit 1031 can package the data packet data obtained from the storage unit 1021 with the data packet information obtained from the system controller 101, send the packaged first data packet over its data channel through the routing switching module 106 to the operation unit 10411/10421 for operation processing, obtain over the data channel the second data packet returned by the operation unit 10411/10421 through the routing switching module 106, obtain operation result data by unpacking the second data packet, and store it in the storage unit 1021.
  • the system controller 101 can simulate the details of each neural network operation, such as what data needs to be obtained, where to obtain it, and what kind of operation needs to be performed; correspondingly, it can generate data packet information and send it to the relevant data packing and unpacking unit 1031.
  • Each data packing and unpacking unit 1031 can work in parallel, such as respectively acquiring data packet information from the system controller 101 and performing packing and unpacking operations.
  • the data packet information may include: source channel, source address, destination channel (operation channel), operation type, data packet length, and the like.
  • the data packing and unpacking unit 1031 can obtain the data packet data from the source address of the storage unit 1021 corresponding to the source channel, the routing switching module 106 can send the resulting first data packet to the operation unit 10411/10421 corresponding to the destination channel, and the operation unit 10411/10421 can perform the corresponding type of operation processing according to the operation type.
  • the values of N1 and N2 may be the same, that is, the number of storage units 1021 and the number of data packing and unpacking units 1031 are the same, and each data packing and unpacking unit 1031 may correspond to one storage unit 1021 and obtain data packet data from that corresponding storage unit 1021.
  • in this way, the parallel operation of the data packing and unpacking units 1031 can be better ensured; if two data packing and unpacking units 1031 could both obtain data from the same storage unit 1021, a waiting situation might occur, in which one data packing and unpacking unit 1031 would have to wait until the other had finished obtaining data before it could obtain data itself, thereby reducing efficiency.
  • in existing NPUs centered on instruction extension, data storage interaction adopts a unified load/store mode with sequential synchronous operation, which is inefficient.
  • with the processing manner described in this application, processing can be performed in parallel and the waiting delays caused by synchronous operation are avoided, thereby making system control and data storage interaction more efficient.
  • the data packet information may further include a destination address or a storage policy. If the data packet information includes the destination address, the data packing and unpacking unit 1031 can store the operation result data in the corresponding storage unit 1021 according to the destination address; if the data packet information includes the storage policy, the data packing and unpacking unit 1031 can store the operation result data in the corresponding storage unit 1021 according to the storage policy (a minimal sketch of both options is given after this list).
  • the storage strategy may be a storage strategy that achieves data alignment.
  • when generating the second data packet, the data in the data segment of the first data packet can be replaced with the operation result data; since the data length usually changes, the data length information in the data packet also needs to be modified accordingly.
  • the second data packet is returned to the data packaging and unpacking unit 1031 along the transmission path of the first data packet; after the data packaging and unpacking unit 1031 parses the operation result data out of the second data packet, the question arises of how the operation result data should be stored.
  • the data packet information may include: source channel, source address, destination channel, and destination address, etc., that is, the source address, the destination address, and the channel addresses on both sides.
  • the data packing and unpacking unit 1031 may store the operation result data in the corresponding storage unit 1021 according to the destination address.
  • the data packet information may not include the destination address, but include a storage policy, and the data packaging and unpacking unit 1031 may store the operation result data in the corresponding storage unit 1021 according to the storage policy, thereby realizing automatic data alignment and the like.
  • the specifics of the storage policy may be determined according to actual needs; for example, it may specify upward alignment or downward alignment, and how the remaining positions are handled after alignment (such as padding).
  • the operations involved in the neural network will cause the data to shrink or expand, that is, the length of the above data will change, which can easily cause the data after the operation to be misaligned.
  • in existing approaches, additional data conversion or transposition is used to solve the data alignment problem; this extra operation reduces the overall processing efficiency, and since neural network operation involves a large number of repeated storage operations and interactive iterative operations, it has a great impact on the overall processing efficiency.
  • the free interaction of storage and operation is realized by means of routing exchange, and the storage is automatically completed through storage policies, etc., and automatic data alignment is realized.
  • the implementation method is simple, and the overall processing efficiency is improved.
  • the system controller 101 can interact with an external processing unit through an external bus interface, and the DMA module 105 can interact with a Double Data Rate (DDR) external storage unit through an external bus storage interface; both interfaces may use existing technology.
  • FIG. 4 is a flowchart of an embodiment of a method for implementing a processor described in this application. As shown in FIG. 4 , the following specific implementations are included.
  • a processor consisting of a system controller, a storage array module, a data packing and unpacking module, and an arithmetic module is constructed.
  • the system controller is used for sending predetermined data packet information to the data packaging and unpacking module;
  • the data packaging and unpacking module is used to obtain corresponding data packet data from the storage array module according to the data packet information, package the data packet data with the data packet information, send the first data packet obtained by packing to the operation module for operation processing, obtain the second data packet returned by the operation module, obtain the operation result data by unpacking the second data packet, and store it in the storage array module; the storage array module is used for data storage; the operation module is used to perform operation processing on the obtained first data packet, generate a second data packet according to the operation result data, and return it to the data packing and unpacking module.
  • a DMA module can also be added to the processor, and the DMA module can be used to realize high-speed exchange of external storage data and internal storage array data in the storage array module under the control of the system controller.
  • a routing switching module can also be added to the processor, and the routing switching module can be used to send the first data packet obtained from the data packing and unpacking module to the computing module, and to send the second data packet obtained from the computing module to the data packing and unpacking module.
  • the operation modules may include: a general operation module for performing general operations and an activation operation module for performing activation operations.
  • the storage array module may include N1 storage units
  • the data packing and unpacking module may include N2 data packing and unpacking units
  • each data packing and unpacking unit is respectively connected to the routing switch module through a data channel
  • N1 and N2 are both positive integers greater than one
  • the general operation module may include M operation units
  • the activation operation module may include P operation units
  • each operation unit may be connected to the routing exchange module through a data channel
  • M and P are both positive integers greater than one.
  • the data packing and unpacking unit can be used to package the packet data obtained from the storage unit with the packet information obtained from the system controller, send the first packet obtained by packing to the arithmetic unit through the routing switch module over its data channel for arithmetic processing, obtain over the data channel the second data packet returned by the arithmetic unit through the routing switch module, unpack the second data packet to obtain the arithmetic result data, and store it in the storage unit.
  • the data packet information may include: source channel, source address, destination channel and operation type.
  • the data packet data may be the data packet data obtained by the data packaging and unpacking unit from the source address of the storage unit corresponding to the source channel; the operation unit that receives the first data packet may be the operation unit corresponding to the destination channel, as determined by the routing switching module; and the operation processing may be operation processing of the operation type carried in the data packet information, performed by that operation unit.
  • each data packing and unpacking unit corresponds to a storage unit respectively, and data packet data is acquired from the corresponding storage unit.
  • the data packet information may further include a destination address or a storage policy. If the data packet information includes the destination address, the data packaging and unpacking unit can store the operation result data in the corresponding storage unit according to the destination address; if the data packet information includes the storage policy, the data packaging and unpacking unit can store the operation result data in the corresponding storage unit according to the storage policy.
  • the storage strategy may be a storage strategy that achieves data alignment.
  • in the above embodiment, an implementation method integrating storage and computing is proposed, which completes the overall interaction from neural network storage to computing in the processor, avoiding complex instruction design and highly difficult compiler development, thereby reducing the design difficulty and improving the overall processing efficiency.
  • the present application further provides an electronic device and a readable storage medium.
  • FIG. 5 it is a block diagram of an electronic device according to the method described in the embodiment of the present application.
  • Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are by way of example only, and are not intended to limit implementations of the application described and/or claimed herein.
  • the electronic device includes: one or more processors Y01, a memory Y02, and interfaces for connecting various components, including a high-speed interface and a low-speed interface.
  • the various components are interconnected using different buses and may be mounted on a common motherboard or otherwise as desired.
  • the processor may process instructions executed within the electronic device, including instructions stored in or on memory to display graphical information of a graphical user interface on an external input/output device, such as a display device coupled to the interface.
  • multiple processors and/or multiple buses may be used with multiple memories, if desired.
  • multiple electronic devices may be connected, each providing some of the necessary operations (eg, as a server array, a group of blade servers, or a multiprocessor system).
  • a processor Y01 is taken as an example.
  • the memory Y02 is the non-transitory computer-readable storage medium provided in this application.
  • the memory stores instructions executable by at least one processor, so that the at least one processor executes the method provided by the present application.
  • the non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the methods provided by the present application.
  • the memory Y02 can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the methods in the embodiments of the present application.
  • the processor Y01 executes various functional applications and data processing of the server by running the non-transitory software programs, instructions and modules stored in the memory Y02, that is, to implement the methods in the above method embodiments.
  • the memory Y02 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function; the storage data area may store data created according to the use of the electronic device, and the like.
  • the memory Y02 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device.
  • memory Y02 may optionally include memory located remotely relative to processor Y01, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, blockchain networks, local area networks, mobile communication networks, and combinations thereof.
  • the electronic device may further include: an input device Y03 and an output device Y04.
  • the processor Y01, the memory Y02, the input device Y03, and the output device Y04 may be connected by a bus or in other ways, and the connection by a bus is taken as an example in FIG. 5 .
  • the input device Y03 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device; examples include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a trackball, a joystick, and other input devices.
  • the output device Y04 may include a display device, an auxiliary lighting device, a haptic feedback device (eg, a vibration motor), and the like.
  • the display devices may include, but are not limited to, liquid crystal displays, light emitting diode displays, and plasma displays. In some implementations, the display device may be a touch screen.
  • Various implementations of the systems and techniques described herein can be implemented in digital electronic circuitry, integrated circuit systems, application-specific integrated circuits, computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and which can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (for example, a magnetic disk, an optical disk, a memory, or a programmable logic device) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal.
  • machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • to provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (e.g., a cathode ray tube or liquid crystal display monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or trackball) through which the user can provide input to the computer.
  • other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic, voice, or tactile input).
  • the systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user's computer having a graphical user interface or web browser through which a user may interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • the components of the system may be interconnected by any form or medium of digital data communication (eg, a communication network). Examples of communication networks include: local area networks, wide area networks, blockchain networks, and the Internet.
  • a computer system can include clients and servers. Clients and servers are generally remote from each other and usually interact through a communication network. The relationship of client and server arises by computer programs running on the respective computers and having a client-server relationship to each other.
  • the server can be a cloud server, also known as a cloud computing server or a cloud host; it is a host product in the cloud computing service system intended to overcome the defects of traditional physical hosts and VPS services, namely difficult management and weak business scalability.
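To make the destination-address versus storage-policy alternative described in the storage-policy items above concrete, the following C sketch shows one possible behaviour of a data packing and unpacking unit when writing operation result data back to its storage unit. The policy codes, the alignment granularity `ALIGN_WORDS`, and the zero-padding step are assumptions for illustration only; the application itself only states that the policy automatically achieves data alignment, for example by aligning upward or downward and handling the remaining positions (e.g. by filling).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UNIT_WORDS  256        /* assumed size of one storage unit, in words */
#define ALIGN_WORDS 16         /* assumed alignment granularity              */

enum store_policy { STORE_AT_ADDR, STORE_ALIGN_UP, STORE_ALIGN_DOWN };

/* Store operation result data into a storage unit, either at an explicit
 * destination address or at an address derived from an alignment policy. */
void store_result(float *unit, uint32_t dst_addr, int policy,
                  const float *result, size_t len)
{
    uint32_t addr = dst_addr;

    if (policy == STORE_ALIGN_UP)
        addr = (dst_addr + ALIGN_WORDS - 1) / ALIGN_WORDS * ALIGN_WORDS;
    else if (policy == STORE_ALIGN_DOWN)
        addr = dst_addr / ALIGN_WORDS * ALIGN_WORDS;

    if (addr >= UNIT_WORDS)               /* stay inside the storage unit */
        return;
    if (addr + len > UNIT_WORDS)
        len = UNIT_WORDS - addr;

    memcpy(&unit[addr], result, len * sizeof(float));

    /* Pad the remainder of the aligned block with zeros so that later reads
     * remain aligned (one possible reading of the "filling processing"
     * mentioned above). */
    size_t padded = (len + ALIGN_WORDS - 1) / ALIGN_WORDS * ALIGN_WORDS;
    if (addr + padded > UNIT_WORDS)
        padded = UNIT_WORDS - addr;
    for (size_t i = len; i < padded; ++i)
        unit[addr + i] = 0.0f;
}
```

In this sketch the destination-address case corresponds to `STORE_AT_ADDR`, where the result is written exactly where the packet information says; the storage-policy case lets the unit choose an aligned address itself, which is how repeated store-and-iterate rounds can stay aligned without extra data conversion or transposition.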

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Human Computer Interaction (AREA)
  • Neurology (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Multi Processors (AREA)
  • Memory System (AREA)
  • Credit Cards Or The Like (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The present application relates to the fields of artificial intelligence and deep learning. Disclosed are a processor, an implementation method, an electronic device, and a storage medium. The processor comprises: a system controller, used for transmitting predetermined data pack information to a data packing and unpacking module; the data packing and unpacking module, used for acquiring corresponding data pack data from a storage array module on the basis of the data pack information, packing the data pack data with the data pack information, transmitting a first data pack produced by packing to a computing module for computational processing, acquiring a second data pack returned by the computing module, unpacking it to produce computation result data, and storing the result data in the storage array module; the storage array module, used for data storage; and the computing module, used for computational processing of the first data pack received, generating the second data pack on the basis of the computation result data, and returning it to the data packing and unpacking module. Application of the solution of the present application reduces design difficulty and increases overall processing efficiency.

Description

Processor and implementation method, electronic device and storage medium
This application claims priority to the Chinese patent application filed on August 21, 2020 with application number 2020108517577 and entitled "Processor and implementation method, electronic device and storage medium".
Technical Field
The present application relates to computer application technologies, and in particular to processors and implementation methods, electronic devices and storage media in the fields of artificial intelligence and deep learning.
Background
Increasingly intelligent applications make neural network algorithms more diverse and overall neural network models more and more complex, which in turn brings a larger amount of computation and more data storage interaction. As a result, neural-network-based processors such as Network Processing Unit (NPU) chips are receiving more and more attention.
The current NPU follows two mainstream design approaches: one centered on accelerators and one centered on instruction extension. The former is rarely adopted because of its poor versatility and scalability, so the latter is mainly used. With the latter approach, however, a cumbersome instruction set corresponding to the neural network operations must be defined, and a dedicated compiler must be developed to support it. The design is therefore very difficult, especially when applied to real-time processing of speech data.
Summary of the Invention
The present application provides processors and implementation methods, electronic devices, and storage media.
A processor, comprising: a system controller, a storage array module, a data packing and unpacking module, and an operation module;
the system controller, configured to send predetermined data packet information to the data packing and unpacking module;
the data packing and unpacking module, configured to obtain corresponding data packet data from the storage array module according to the data packet information, package the data packet data with the data packet information, send the first data packet obtained by packaging to the operation module for operation processing, acquire the second data packet returned by the operation module, obtain operation result data by unpacking the second data packet, and store it in the storage array module;
the storage array module, configured to perform data storage;
the operation module, configured to perform operation processing on the acquired first data packet, generate the second data packet according to the operation result data, and return it to the data packing and unpacking module.
A processor implementation method, comprising:
constructing a processor composed of a system controller, a storage array module, a data packing and unpacking module, and an operation module;
using the processor to perform neural network operations, wherein the system controller is configured to send predetermined data packet information to the data packing and unpacking module; the data packing and unpacking module is configured to obtain corresponding data packet data from the storage array module according to the data packet information, package the data packet data with the data packet information, send the first data packet obtained by packaging to the operation module for operation processing, acquire the second data packet returned by the operation module, obtain operation result data by unpacking the second data packet, and store it in the storage array module; the storage array module is configured to perform data storage; and the operation module is configured to perform operation processing on the acquired first data packet, generate the second data packet according to the operation result data, and return it to the data packing and unpacking module.
An electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method described above.
A non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the method described above.
An embodiment of the above application has the following advantages or beneficial effects: a storage-computing integrated implementation is proposed, in which the overall interaction from neural network storage to computation is completed inside the processor, avoiding complex instruction design and highly difficult compiler development, thereby reducing design difficulty and improving overall processing efficiency.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become easy to understand from the following description.
Description of Drawings
The accompanying drawings are provided for a better understanding of the solution and do not constitute a limitation on the present application. In the drawings:
FIG. 1 is a schematic diagram of the composition and structure of a first embodiment of the processor 10 described in this application;
FIG. 2 is a schematic diagram of the composition and structure of a second embodiment of the processor 10 described in this application;
FIG. 3 is a schematic diagram of the composition and structure of a third embodiment of the processor 10 described in this application;
FIG. 4 is a flowchart of an embodiment of the processor implementation method described in this application;
FIG. 5 is a block diagram of an electronic device for the method described in the embodiments of the present application.
Detailed Description
Exemplary embodiments of the present application are described below with reference to the accompanying drawings, including various details of the embodiments to facilitate understanding; they should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
In addition, it should be understood that the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean that A exists alone, that both A and B exist, or that B exists alone. The character "/" herein generally indicates an "or" relationship between the associated objects before and after it.
FIG. 1 is a schematic diagram of the composition and structure of a first embodiment of the processor 10 described in this application. As shown in FIG. 1, it includes: a system controller 101, a storage array module 102, a data packing and unpacking module 103, and an operation module 104.
The system controller 101 is configured to send predetermined data packet information to the data packing and unpacking module 103.
The data packing and unpacking module 103 is configured to obtain corresponding data packet data from the storage array module 102 according to the data packet information, package the data packet data with the data packet information, send the first data packet obtained by packaging to the operation module 104 for operation processing, acquire the second data packet returned by the operation module 104, obtain operation result data by unpacking the second data packet, and store it in the storage array module 102.
The storage array module 102 is configured to perform data storage.
The operation module 104 is configured to perform operation processing on the acquired first data packet, generate a second data packet according to the operation result data, and return it to the data packing and unpacking module 103.
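As an illustration of how these four modules might interact, the following C sketch models one fetch-pack-compute-unpack-store round trip. It is a minimal sketch only: the type names (`PacketInfo`, `DataPacket`), the field layout, the `result_addr` field, and the per-element `compute` callback are assumptions made for illustration, not structures defined by this application.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UNIT_WORDS 256        /* assumed capacity of one storage unit, in words */

/* Hypothetical packet information prepared by the system controller 101. */
typedef struct {
    int      src_channel;     /* storage unit that holds the input data        */
    uint32_t src_addr;        /* word offset of the input inside that unit     */
    int      dst_channel;     /* operation unit that should process the data   */
    int      op_type;         /* kind of operation to perform                  */
    size_t   len;             /* number of data words in the packet            */
    uint32_t result_addr;     /* where to store the unpacked result (assumed)  */
} PacketInfo;

/* Hypothetical data packet exchanged between the modules. */
typedef struct {
    PacketInfo info;
    float      payload[UNIT_WORDS];
} DataPacket;

/* One round trip as performed by the data packing and unpacking module 103.
 * `storage` stands in for the storage array module 102; `compute` stands in
 * for the operation module 104, modelled here as a per-element callback.
 * The caller must keep src_addr/result_addr + len within one unit.          */
void packing_unpacking_round_trip(const PacketInfo *info,
                                  float storage[][UNIT_WORDS],
                                  float (*compute)(int op_type, float x))
{
    /* 1. Pack: combine the packet information with data read from storage. */
    DataPacket first = { .info = *info };
    memcpy(first.payload, &storage[info->src_channel][info->src_addr],
           info->len * sizeof(float));

    /* 2. Operate: the operation module consumes the first data packet and
     *    produces a second data packet carrying the operation results.     */
    DataPacket second = first;
    for (size_t i = 0; i < info->len; ++i)
        second.payload[i] = compute(info->op_type, first.payload[i]);

    /* 3. Unpack: extract the result data from the second packet and write
     *    it back into the storage array (here, the same storage unit).     */
    memcpy(&storage[info->src_channel][info->result_addr], second.payload,
           info->len * sizeof(float));
}
```

In the processor these steps are of course performed by parallel hardware blocks rather than a software function; the point of the sketch is the division of labour it preserves: the system controller only prepares the packet information, and the data itself never passes through the controller.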
It can be seen that the above embodiment proposes a storage-computing integrated implementation in which the overall interaction from neural network storage to computation is completed inside the processor, avoiding complex instruction design and highly difficult compiler development, thereby reducing design difficulty and improving overall processing efficiency.
On the basis shown in FIG. 1, the processor 10 may further include one or both of the following: a Direct Memory Access (DMA) module and a routing switching module.
Preferably, both of the above modules may be included at the same time. Accordingly, FIG. 2 is a schematic diagram of the composition and structure of a second embodiment of the processor 10 described in this application. As shown in FIG. 2, it includes: a system controller 101, a storage array module 102, a data packing and unpacking module 103, an operation module 104, a DMA module 105, and a routing switching module 106.
The DMA module 105 is configured to realize high-speed exchange between external storage data and the internal storage array data in the storage array module 102 under the control of the system controller 101.
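The DMA module's role is bulk movement between external memory and the internal storage array without occupying the system controller for every word. A minimal software model of such a transfer is sketched below; the descriptor fields are hypothetical, since the application does not specify the DMA register layout, and real hardware would perform the copy in the background once the controller has programmed the descriptor.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical DMA transfer descriptor programmed by the system controller. */
typedef struct {
    void   *ext_addr;     /* address in external storage (e.g. off-chip DDR)   */
    void   *int_addr;     /* address inside the storage array module           */
    size_t  bytes;        /* transfer length in bytes                          */
    int     to_internal;  /* 1: external -> internal, 0: internal -> external  */
} DmaDescriptor;

/* Software model of one DMA transfer; only the data movement is modelled. */
void dma_transfer(const DmaDescriptor *d)
{
    if (d->to_internal)
        memcpy(d->int_addr, d->ext_addr, d->bytes);
    else
        memcpy(d->ext_addr, d->int_addr, d->bytes);
}
```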
The routing switching module 106 is configured to send the first data packet obtained from the data packing and unpacking module 103 to the operation module 104, and to send the second data packet obtained from the operation module 104 to the data packing and unpacking module 103.
As shown in FIG. 2, the operation module 104 may further include a general operation module 1041 and an activation operation module 1042. As the names imply, the general operation module 1041 may be used to perform general operations, and the activation operation module 1042 may be used to perform activation operations.
The system controller 101 may adopt simple control logic or a state machine design, or may include a complex processor IP core, where IP is the abbreviation of Intellectual Property; for example, the complex processor IP may include an Advanced RISC Machine (ARM), Digital Signal Processing (DSP), X86, or Microcontroller Unit (MCU) core IP.
The storage array module 102 may be composed of multiple groups of Static Random-Access Memory (SRAM), supports multi-port high-speed simultaneous reading and writing, and may implement high-speed data caching or storage in a matrix manner. The data stored in the storage array module 102 may include neural network model data, external input data, temporary data of intermediate layers, and the like.
The data packing and unpacking module 103 may perform data read and store operations on the storage array module 102, package the data packet information obtained from the system controller 101 with the data packet data from the storage array module 102, send the packaged first data packet to the operation module 104 through the routing switching module 106, unpack the second data packet returned by the operation module 104 through the routing switching module 106, and store the obtained operation result data in the storage array module 102.
Correspondingly, the routing switching module 106 may receive data packets from the data packing and unpacking module 103 and the operation module 104 and perform data exchange between them.
The general operations performed by the general operation module 1041 may include common vector operations such as vector arithmetic operations, logical operations, comparison operations, dot multiplication, accumulation, and summation. The activation operations performed by the activation operation module 1042 may include one or more of the nonlinear functions sigmoid, tanh, relu, and softmax.
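The functions named here are standard. For concreteness, the sketch below writes out the listed activation functions and two of the general vector operations as plain C reference code following their usual mathematical definitions; it is not the hardware implementation used inside modules 1041 and 1042.

```c
#include <math.h>
#include <stddef.h>

/* Activation functions named above. tanh is available directly as tanhf(). */
float sigmoid_f(float x) { return 1.0f / (1.0f + expf(-x)); }
float relu_f(float x)    { return x > 0.0f ? x : 0.0f; }

/* Numerically stable softmax over a vector of n elements (n >= 1). */
void softmax_f(const float *in, float *out, size_t n)
{
    float max = in[0];
    for (size_t i = 1; i < n; ++i)
        if (in[i] > max) max = in[i];

    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        out[i] = expf(in[i] - max);
        sum += out[i];
    }
    for (size_t i = 0; i < n; ++i)
        out[i] /= sum;
}

/* Two of the general vector operations mentioned for module 1041. */
float dot_f(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}

float sum_f(const float *a, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += a[i];
    return acc;
}
```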
系统控制器101可对整体进行管理和控制,如前述的将数据包信息发送给数据打包拆包模块102,以便数据打包拆包模块102进行数据的打包拆包工作等,并可负责启动DMA模块105以实现外部存储数据与存储阵列模块102中的内部存储阵列数据的高速交换等。The system controller 101 can manage and control the whole, such as sending the data packet information to the data packing and unpacking module 102 as mentioned above, so that the data packing and unpacking module 102 can carry out the packing and unpacking of the data, etc., and can be responsible for starting the DMA module. 105 to realize high-speed exchange of external storage data and internal storage array data in the storage array module 102, and the like.
可以看出,上述实施例中,处理器整体采用存储阵列模块+数据打包拆包模块+路由交换模块的主体结构,完成了神经网络存储到运算的整体交互,避免了复杂的指令设计和高难度的编译器开发等,从而降低了设计难度,提升了整体处理效率等。It can be seen that, in the above embodiment, the processor adopts the main structure of the storage array module + the data packaging and unpacking module + the routing switching module as a whole, which completes the overall interaction from the neural network storage to the operation, and avoids complex instruction design and high difficulty. Compiler development, etc., thus reducing the design difficulty and improving the overall processing efficiency.
图3为本申请所述处理器10第三实施例的组成结构示意图。如图3所示,包括:系统控制器101、存储阵列模块102、数据打包拆包模块103、运算模块104、DMA模块105以及路由交换模块106。其中,存储阵列模块102中可包括N1个存储单元1021,每个存储单元1021可为一组SRAM等,数据打包拆包模块103中可包括N2个数据打包拆包单元1031,每个数据打包拆包单元1031可分别通过一个数据通道连接到路由交换模块106上,N1和N2均为大于一的正整数,另外,通用运算模块1041中可包括M个运算单元10411,激活运算模块1042中可包括P个运算单元10421,每个运算单元10411/10421可分别通过一个数据通道连接到路由交换模块106上,M和P均为大于一的正整数。N1、N2、M和P的具体取值均可根据实际需要而定。FIG. 3 is a schematic structural diagram of a third embodiment of the processor 10 described in this application. As shown in FIG. 3 , it includes: a system controller 101 , a storage array module 102 , a data packing and unpacking module 103 , an arithmetic module 104 , a DMA module 105 and a routing switching module 106 . The storage array module 102 may include N1 storage units 1021, each storage unit 1021 may be a set of SRAM, etc., and the data packing and unpacking module 103 may include N2 data packing and unpacking units 1031, each data packing and unpacking unit 1031 The packet unit 1031 can be respectively connected to the routing switching module 106 through a data channel, and N1 and N2 are both positive integers greater than one. In addition, the general operation module 1041 can include M operation units 10411, and the activation operation module 1042 can include There are P arithmetic units 10421, and each arithmetic unit 10411/10421 can be connected to the routing switching module 106 through a data channel respectively, and M and P are both positive integers greater than one. The specific values of N1, N2, M and P can be determined according to actual needs.
Correspondingly, the data packing and unpacking unit 1031 can pack the data packet data obtained from the storage unit 1021 together with the data packet information obtained from the system controller 101, send the first data packet obtained by the packing to an operation unit 10411/10421 through the routing switching module 106 over its data channel for operation processing, obtain the second data packet returned by the operation unit 10411/10421 through the routing switching module 106 over its data channel, obtain the operation result data by unpacking the second data packet, and store it in the storage unit 1021.
In practical applications, the system controller 101 can work out the details of each neural network operation, such as what data needs to be obtained, where to obtain it from and what kind of operation needs to be performed, generate the data packet information accordingly and send it to the relevant data packing and unpacking unit 1031. The data packing and unpacking units 1031 can work in parallel, for example each obtaining its own data packet information from the system controller 101 and performing its own packing and unpacking operations.
Correspondingly, the data packet information may include: a source channel, a source address, a destination channel (operation channel), an operation type, a data packet length and the like. The data packing and unpacking unit 1031 can obtain the data packet data from the source address of the storage unit 1021 corresponding to the source channel, the routing switching module 106 can send the resulting first data packet to the operation unit 10411/10421 corresponding to the destination channel, and the operation unit 10411/10421 can perform the corresponding type of operation processing according to the operation type.
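A minimal software sketch of such data packet information and of the packing step is given below; the field names and the flat list model of a storage unit are assumptions for illustration, since the text does not fix a concrete bit-level packet format.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class PacketInfo:
        source_channel: int   # storage/packing channel the data comes from
        source_address: int   # address inside the corresponding storage unit
        dest_channel: int     # operation channel that should receive the packet
        op_type: str          # kind of operation the unit should perform
        length: int           # number of data words in the payload

    @dataclass
    class Packet:
        info: PacketInfo      # header: packet information from the system controller
        payload: List[float]  # data segment: packet data read from storage

    def pack(info: PacketInfo, storage: List[List[float]]) -> Packet:
        # Read `length` words starting at `source_address` from the storage unit
        # selected by `source_channel`, and bundle them with the header.
        unit = storage[info.source_channel]
        data = unit[info.source_address : info.source_address + info.length]
        return Packet(info=info, payload=list(data))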
Preferably, N1 and N2 have the same value, that is, the number of storage units 1021 equals the number of data packing and unpacking units 1031, and each data packing and unpacking unit 1031 corresponds to one storage unit 1021 and obtains its data packet data from the corresponding storage unit 1021. This better guarantees the parallel operation of the data packing and unpacking units 1031: if two data packing and unpacking units 1031 could both obtain data from the same storage unit 1021, a waiting situation might arise in which one unit has to wait until the other has finished obtaining its data, reducing efficiency.
In the above processing approach, dividing the modules into units improves the parallel processing capability and thus the data storage and interaction capability.
In existing NPUs centered on instruction extension, data storage interaction adopts a unified load/store mode with sequential, synchronous operation, which is inefficient. With the processing approach described in this application, processing can be performed in parallel and the waiting latency caused by synchronous operation is avoided, making system control and data storage interaction more efficient.
The data packet information may further include: a destination address or a storage policy. If the data packet information includes a destination address, the data packing and unpacking unit 1031 can store the operation result data in the corresponding storage unit 1021 according to the destination address; if the data packet information includes a storage policy, the data packing and unpacking unit 1031 can store the operation result data in the corresponding storage unit 1021 according to the storage policy. The storage policy may be a storage policy that achieves data alignment.
After the operation unit 10411/10421 completes the operation, it can replace the data in the data segment of the first data packet with the operation result data; since the data length usually changes, the data length information in the packet also needs to be modified. The generated second data packet is returned to the data packing and unpacking unit 1031 along the transmission path of the first data packet, and once the data packing and unpacking unit 1031 has parsed the operation result data out of the second data packet, the question of how to store the operation result data arises.
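Under the assumption that the Packet and PacketInfo structures from the earlier sketch are reused, the behavior of an operation unit described above could be sketched as follows; the choice of a summation as the example operation is arbitrary and only illustrates a result whose length differs from the input.

    import dataclasses

    def compute_second_packet(first: Packet) -> Packet:
        # Perform the operation named in the header; a summation is used here as
        # an example of an operation that shrinks the data.
        if first.info.op_type == "sum":
            result = [sum(first.payload)]
        else:
            result = list(first.payload)  # placeholder for other operation types
        # Replace the data segment with the operation result data and update the
        # data length field, since the length usually changes.
        new_info = dataclasses.replace(first.info, length=len(result))
        return Packet(info=new_info, payload=result)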
Correspondingly, the data packet information may include: a source channel, a source address, a destination channel, a destination address and the like, that is, the source address, the destination address and the channel addresses on both sides. In this way, the data packing and unpacking unit 1031 can store the obtained operation result data in the corresponding storage unit 1021 according to the destination address. Alternatively, the data packet information may include a storage policy instead of a destination address, and the data packing and unpacking unit 1031 can store the operation result data in the corresponding storage unit 1021 according to the storage policy, thereby realizing automatic data alignment and the like.
The specific storage policy can be determined according to actual needs; for example, it may specify upward alignment, downward alignment, and how the remaining positions are handled after alignment (such as padding).
The operations involved in a neural network cause data to shrink or expand, that is, the data length mentioned above changes, which easily leaves the data misaligned after the operation. Existing NPUs centered on instruction extension usually solve the data alignment problem through additional data conversion or transposition, and this extra work reduces the overall processing efficiency; since neural network computation involves a large number of repeated storage-computation iterations, the impact on overall processing efficiency is large. In the processing approach described in this application, free interaction between storage and computation is realized through routing exchange, and storage is completed automatically according to the storage policy, so data alignment happens automatically; the implementation is simple and the overall processing efficiency is improved.
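One way the unpacking side could apply an alignment storage policy when writing result data back is sketched below, assuming a flat word-addressed storage unit; the policy names "align_up" and "align_down", the alignment boundary and the padding value are illustrative choices, not details mandated by the text.

    from typing import List

    def store_aligned(result: List[float], storage_unit: List[float],
                      base_address: int, policy: str,
                      boundary: int = 8, pad: float = 0.0) -> int:
        # Derive the aligned start address from the base address and the policy,
        # write the result, and pad the rest of the aligned region so that later
        # reads stay aligned. Returns the address actually used.
        if policy == "align_down":
            start = (base_address // boundary) * boundary
        elif policy == "align_up":
            start = ((base_address + boundary - 1) // boundary) * boundary
        else:
            raise ValueError("unknown storage policy: " + policy)
        end = start + ((len(result) + boundary - 1) // boundary) * boundary
        storage_unit[start:start + len(result)] = result
        storage_unit[start + len(result):end] = [pad] * (end - start - len(result))
        return start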
As shown in FIG. 3, the system controller 101 can interact with a processing unit through an external bus interface, and the DMA module 105 can interact with a double data rate (DDR) external storage unit through an external bus storage interface; the specific implementation follows the prior art.
The above describes the apparatus embodiments. The solution described in this application is further explained below through the method embodiments.
FIG. 4 is a flowchart of an embodiment of the processor implementation method described in this application. As shown in FIG. 4, it includes the following specific implementations.
In 401, a processor composed of a system controller, a storage array module, a data packing and unpacking module and a computing module is constructed.
In 402, the processor is used to perform neural network operations, wherein the system controller is configured to send predetermined data packet information to the data packing and unpacking module; the data packing and unpacking module is configured to obtain the corresponding data packet data from the storage array module according to the data packet information, pack the data packet data with the data packet information, send the first data packet obtained by the packing to the computing module for operation processing, obtain the second data packet returned by the computing module, obtain the operation result data by unpacking the second data packet and store it in the storage array module; the storage array module is configured to store data; and the computing module is configured to perform operation processing on the obtained first data packet, generate the second data packet according to the operation result data and return it to the data packing and unpacking module.
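Tying the earlier sketches together, the flow of 401 and 402 could be exercised roughly as follows; everything here is an illustrative software model of the described behavior, reusing the assumed PacketInfo, pack, compute_second_packet and store_aligned helpers from the previous sketches, and is not an actual hardware implementation.

    # 401: model four storage units of 64 words each as the constructed processor state.
    storage = [[float(v) for v in range(64)] for _ in range(4)]

    # 402: the system controller issues packet information, a packing/unpacking unit
    # packs, an operation unit computes, and the result is stored back with alignment.
    info = PacketInfo(source_channel=0, source_address=8,
                      dest_channel=2, op_type="sum", length=16)
    first = pack(info, storage)             # data packing/unpacking unit
    second = compute_second_packet(first)   # operation unit
    store_aligned(second.payload, storage[info.source_channel],
                  base_address=40, policy="align_up")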
On this basis, a DMA module can also be added to the processor, and the DMA module can be used to realize, under the control of the system controller, high-speed exchange between external storage data and the internal storage array data in the storage array module.
In addition, a routing switching module can be added to the processor, and the routing switching module can be used to send the first data packet obtained from the data packing and unpacking module to the computing module and send the second data packet obtained from the computing module to the data packing and unpacking module.
The computing module may include: a general operation module for performing general operations and an activation operation module for performing activation operations.
In addition, the storage array module may include N1 storage units, the data packing and unpacking module may include N2 data packing and unpacking units, and each data packing and unpacking unit is connected to the routing switching module through its own data channel, where N1 and N2 are both positive integers greater than one. The general operation module may include M operation units and the activation operation module may include P operation units, each operation unit may be connected to the routing switching module through its own data channel, and M and P are both positive integers greater than one.
Correspondingly, the data packing and unpacking unit can be used to pack the data packet data obtained from the storage unit with the data packet information obtained from the system controller, send the first data packet obtained by the packing to an operation unit through the routing switching module over the data channel for operation processing, obtain the second data packet returned by the operation unit through the routing switching module over the data channel, obtain the operation result data by unpacking the second data packet, and store it in the storage unit.
The data packet information may include: a source channel, a source address, a destination channel and an operation type. Correspondingly, the data packet data can be the data packet data obtained by the data packing and unpacking unit from the source address of the storage unit corresponding to the source channel, the operation unit that obtains the first data packet can be the operation unit corresponding to the destination channel determined by the routing switching module, and the operation processing can be operation processing of that operation type performed by the operation unit.
Preferably, N1 and N2 have the same value, and each data packing and unpacking unit corresponds to one storage unit and obtains the data packet data from the corresponding storage unit.
The data packet information may further include: a destination address or a storage policy. If the data packet information includes a destination address, the data packing and unpacking unit can store the operation result data in the corresponding storage unit according to the destination address; if the data packet information includes a storage policy, the data packing and unpacking unit can store the operation result data in the corresponding storage unit according to the storage policy. The storage policy may be a storage policy that achieves data alignment.
For the specific workflow of the method embodiment shown in FIG. 4, please refer to the relevant descriptions in the foregoing apparatus embodiments, which are not repeated here.
In short, the solution described in the method embodiments of this application proposes an implementation that integrates storage and computation, completing the overall interaction from neural network storage to computation within the processor and avoiding complex instruction design and difficult compiler development, thereby reducing the design difficulty and improving the overall processing efficiency.
According to the embodiments of the present application, the present application further provides an electronic device and a readable storage medium.
As shown in FIG. 5, it is a block diagram of an electronic device for the method according to the embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the application described and/or claimed herein.
As shown in FIG. 5, the electronic device includes: one or more processors Y01, a memory Y02, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a graphical user interface on an external input/output device (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if required. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (for example, as a server array, a group of blade servers or a multiprocessor system). In FIG. 5, one processor Y01 is taken as an example.
The memory Y02 is the non-transitory computer-readable storage medium provided in this application. The memory stores instructions executable by at least one processor, so that the at least one processor executes the method provided by this application. The non-transitory computer-readable storage medium of this application stores computer instructions for causing a computer to execute the method provided by this application.
As a non-transitory computer-readable storage medium, the memory Y02 can be used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/modules corresponding to the method in the embodiments of this application. The processor Y01 executes the various functional applications and data processing of the server by running the non-transitory software programs, instructions and modules stored in the memory Y02, that is, implements the method in the above method embodiments.
The memory Y02 may include a program storage area and a data storage area, where the program storage area may store an operating system and the application required by at least one function, and the data storage area may store data created according to the use of the electronic device, and the like. In addition, the memory Y02 may include a high-speed random access memory and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device or another non-transitory solid-state storage device. In some embodiments, the memory Y02 may optionally include memories located remotely from the processor Y01, and these remote memories may be connected to the electronic device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, blockchain networks, local area networks, mobile communication networks and combinations thereof.
The electronic device may further include: an input device Y03 and an output device Y04. The processor Y01, the memory Y02, the input device Y03 and the output device Y04 may be connected by a bus or in other ways; in FIG. 5, connection by a bus is taken as an example.
The input device Y03 can receive input numeric or character information and generate key signal inputs related to the user settings and function control of the electronic device, and may be, for example, a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a trackball, a joystick or another input device. The output device Y04 may include a display device, an auxiliary lighting device, a haptic feedback device (for example, a vibration motor) and the like. The display device may include, but is not limited to, a liquid crystal display, a light-emitting diode display and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described herein can be realized in digital electronic circuit systems, integrated circuit systems, application-specific integrated circuits, computer hardware, firmware, software and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device and at least one output device, and transmit data and instructions to the storage system, the at least one input device and the at least one output device.
These computing programs (also referred to as programs, software, software applications or code) include machine instructions for a programmable processor and can be implemented using high-level procedural and/or object-oriented programming languages and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, device and/or apparatus (for example, a magnetic disk, an optical disk, a memory or a programmable logic device) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device (for example, a cathode-ray tube or liquid crystal display monitor) for displaying information to the user; and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (for example, visual feedback, auditory feedback or tactile feedback), and the input from the user can be received in any form (including acoustic input, voice input or tactile input).
The systems and techniques described herein can be implemented in a computing system that includes a back-end component (for example, as a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a front-end component (for example, a user computer having a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include: local area networks, wide area networks, blockchain networks and the Internet.
A computer system may include a client and a server. The client and the server are generally far away from each other and usually interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server can be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system and addresses the defects of difficult management and weak business scalability found in traditional physical hosts and VPS services.
It should be understood that steps can be reordered, added or deleted using the various forms of flow shown above. For example, the steps described in this application can be executed in parallel, sequentially or in a different order, as long as the desired results of the technical solutions disclosed in this application can be achieved; no limitation is imposed herein.
The above specific embodiments do not constitute a limitation on the protection scope of this application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions can be made according to design requirements and other factors. Any modifications, equivalent replacements and improvements made within the spirit and principles of this application shall be included within the protection scope of this application.

Claims (20)

  1. A processor, comprising: a system controller, a storage array module, a data packing and unpacking module, and a computing module;
    wherein the system controller is configured to send predetermined data packet information to the data packing and unpacking module;
    the data packing and unpacking module is configured to obtain corresponding data packet data from the storage array module according to the data packet information, pack the data packet data with the data packet information, send a first data packet obtained by the packing to the computing module for operation processing, obtain a second data packet returned by the computing module, obtain operation result data by unpacking the second data packet, and store the operation result data in the storage array module;
    the storage array module is configured to store data; and
    the computing module is configured to perform operation processing on the obtained first data packet, generate the second data packet according to the operation result data, and return the second data packet to the data packing and unpacking module.
  2. The processor according to claim 1, further comprising:
    a direct memory access module, configured to realize, under the control of the system controller, high-speed exchange between external storage data and internal storage array data in the storage array module.
  3. The processor according to claim 1, further comprising:
    a routing switching module, configured to send the first data packet obtained from the data packing and unpacking module to the computing module, and send the second data packet obtained from the computing module to the data packing and unpacking module.
  4. The processor according to claim 3, wherein
    the computing module comprises: a general operation module and an activation operation module;
    the general operation module is configured to perform general operations, and the activation operation module is configured to perform activation operations.
  5. The processor according to claim 4, wherein
    the storage array module comprises N1 storage units;
    the data packing and unpacking module comprises N2 data packing and unpacking units, each data packing and unpacking unit is connected to the routing switching module through a data channel, and N1 and N2 are both positive integers greater than one;
    the general operation module comprises M operation units, the activation operation module comprises P operation units, each operation unit is connected to the routing switching module through a data channel, and M and P are both positive integers greater than one; and
    the data packing and unpacking unit packs the data packet data obtained from the storage unit with the data packet information obtained from the system controller, sends, over the data channel, the first data packet obtained by the packing to an operation unit through the routing switching module for operation processing, obtains, over the data channel, the second data packet returned by the operation unit through the routing switching module, obtains the operation result data by unpacking the second data packet, and stores the operation result data in the storage unit.
  6. The processor according to claim 5, wherein
    the data packet information comprises: a source channel, a source address, a destination channel, and an operation type;
    the data packing and unpacking unit obtains the data packet data from the source address of the storage unit corresponding to the source channel; and
    the routing switching module sends the first data packet to the operation unit corresponding to the destination channel for operation processing of the operation type.
  7. The processor according to claim 6, wherein
    N1 and N2 have the same value, and each data packing and unpacking unit corresponds to one storage unit and obtains the data packet data from the corresponding storage unit.
  8. The processor according to claim 7, wherein
    the data packet information further comprises: a destination address or a storage policy;
    if the data packet information comprises the destination address, the data packing and unpacking unit stores the operation result data in the corresponding storage unit according to the destination address; and
    if the data packet information comprises the storage policy, the data packing and unpacking unit stores the operation result data in the corresponding storage unit according to the storage policy.
  9. The processor according to claim 8, wherein the storage policy comprises: a storage policy that achieves data alignment.
  10. A processor implementation method, comprising:
    constructing a processor composed of a system controller, a storage array module, a data packing and unpacking module, and a computing module; and
    using the processor to perform neural network operations, wherein the system controller is configured to send predetermined data packet information to the data packing and unpacking module; the data packing and unpacking module is configured to obtain corresponding data packet data from the storage array module according to the data packet information, pack the data packet data with the data packet information, send a first data packet obtained by the packing to the computing module for operation processing, obtain a second data packet returned by the computing module, obtain operation result data by unpacking the second data packet, and store the operation result data in the storage array module; the storage array module is configured to store data; and the computing module is configured to perform operation processing on the obtained first data packet, generate the second data packet according to the operation result data, and return the second data packet to the data packing and unpacking module.
  11. The method according to claim 10, further comprising:
    adding a direct memory access module to the processor, the direct memory access module being configured to realize, under the control of the system controller, high-speed exchange between external storage data and internal storage array data in the storage array module.
  12. The method according to claim 10, further comprising:
    adding a routing switching module to the processor, the routing switching module being configured to send the first data packet obtained from the data packing and unpacking module to the computing module, and send the second data packet obtained from the computing module to the data packing and unpacking module.
  13. The method according to claim 12, wherein
    the computing module comprises: a general operation module for performing general operations and an activation operation module for performing activation operations.
  14. The method according to claim 13, wherein
    the storage array module comprises N1 storage units;
    the data packing and unpacking module comprises N2 data packing and unpacking units, each data packing and unpacking unit is connected to the routing switching module through a data channel, and N1 and N2 are both positive integers greater than one;
    the general operation module comprises M operation units, the activation operation module comprises P operation units, each operation unit is connected to the routing switching module through a data channel, and M and P are both positive integers greater than one; and
    the data packing and unpacking unit is configured to pack the data packet data obtained from the storage unit with the data packet information obtained from the system controller, send, over the data channel, the first data packet obtained by the packing to an operation unit through the routing switching module for operation processing, obtain, over the data channel, the second data packet returned by the operation unit through the routing switching module, obtain the operation result data by unpacking the second data packet, and store the operation result data in the storage unit.
  15. The method according to claim 14, wherein
    the data packet information comprises: a source channel, a source address, a destination channel, and an operation type;
    the data packet data is the data packet data obtained by the data packing and unpacking unit from the source address of the storage unit corresponding to the source channel;
    the operation unit that obtains the first data packet is the operation unit corresponding to the destination channel determined by the routing switching module; and
    the operation processing is operation processing of the operation type performed by the operation unit.
  16. The method according to claim 15, wherein
    N1 and N2 have the same value, and each data packing and unpacking unit corresponds to one storage unit and obtains the data packet data from the corresponding storage unit.
  17. The method according to claim 16, wherein
    the data packet information further comprises: a destination address or a storage policy;
    if the data packet information comprises the destination address, the data packing and unpacking unit stores the operation result data in the corresponding storage unit according to the destination address; and
    if the data packet information comprises the storage policy, the data packing and unpacking unit stores the operation result data in the corresponding storage unit according to the storage policy.
  18. The method according to claim 17, wherein the storage policy comprises: a storage policy that achieves data alignment.
  19. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 10-18.
  20. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 10-18.
PCT/CN2021/110952 2020-08-21 2021-08-05 Processor, implementation method, electronic device, and storage medium WO2022037422A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2022554384A JP7379794B2 (en) 2020-08-21 2021-08-05 Processors and implementation methods, electronic devices, and storage media
US17/792,867 US11784946B2 (en) 2020-08-21 2021-08-05 Method for improving data flow and access for a neural network processor
KR1020227027218A KR20220122756A (en) 2020-08-21 2021-08-05 Processor and implementation method, electronic device, and recording medium
EP21857517.3A EP4075759A4 (en) 2020-08-21 2021-08-05 Processor, implementation method, electronic device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010851757.7 2020-08-21
CN202010851757.7A CN112152947B (en) 2020-08-21 2020-08-21 Processor, implementation method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2022037422A1 true WO2022037422A1 (en) 2022-02-24

Family

ID=73888869

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/110952 WO2022037422A1 (en) 2020-08-21 2021-08-05 Processor, implementation method, electronic device, and storage medium

Country Status (6)

Country Link
US (1) US11784946B2 (en)
EP (1) EP4075759A4 (en)
JP (1) JP7379794B2 (en)
KR (1) KR20220122756A (en)
CN (1) CN112152947B (en)
WO (1) WO2022037422A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112152947B (en) * 2020-08-21 2021-07-20 北京百度网讯科技有限公司 Processor, implementation method, electronic device and storage medium
CN114049978A (en) * 2021-10-28 2022-02-15 国核自仪系统工程有限公司 Nuclear power station safety instrument control system
CN117195989B (en) * 2023-11-06 2024-06-04 深圳市九天睿芯科技有限公司 Vector processor, neural network accelerator, chip and electronic equipment

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000322400A (en) 1999-05-10 2000-11-24 Fuji Xerox Co Ltd Information processor
JP6147131B2 (en) * 2013-07-30 2017-06-14 オリンパス株式会社 Arithmetic unit
US11029949B2 (en) * 2015-10-08 2021-06-08 Shanghai Zhaoxin Semiconductor Co., Ltd. Neural network unit
US12073328B2 (en) * 2016-07-17 2024-08-27 Gsi Technology Inc. Integrating a memory layer in a neural network for one-shot learning
US12118451B2 (en) 2017-01-04 2024-10-15 Stmicroelectronics S.R.L. Deep convolutional network heterogeneous architecture
CN107590535A (en) * 2017-09-08 2018-01-16 西安电子科技大学 Programmable neural network processor
US11636327B2 (en) * 2017-12-29 2023-04-25 Intel Corporation Machine learning sparse computation mechanism for arbitrary neural networks, arithmetic compute microarchitecture, and sparsity for training mechanism
CN108256628B (en) * 2018-01-15 2020-05-22 合肥工业大学 Convolutional neural network hardware accelerator based on multicast network-on-chip and working method thereof
US10698730B2 (en) 2018-04-03 2020-06-30 FuriosaAI Co. Neural network processor
US11861484B2 (en) 2018-09-28 2024-01-02 Qualcomm Incorporated Neural processing unit (NPU) direct memory access (NDMA) hardware pre-processing and post-processing
CN111241028A (en) * 2018-11-28 2020-06-05 北京知存科技有限公司 Digital-analog hybrid storage and calculation integrated chip and calculation device
CN111382847B (en) * 2018-12-27 2022-11-22 上海寒武纪信息科技有限公司 Data processing device and related product
CN111523652B (en) * 2019-02-01 2023-05-02 阿里巴巴集团控股有限公司 Processor, data processing method thereof and image pickup device
US11726950B2 (en) * 2019-09-28 2023-08-15 Intel Corporation Compute near memory convolution accelerator
US12067479B2 (en) * 2019-10-25 2024-08-20 T-Head (Shanghai) Semiconductor Co., Ltd. Heterogeneous deep learning accelerator

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109284823A (en) * 2017-04-20 2019-01-29 上海寒武纪信息科技有限公司 A kind of arithmetic unit and Related product
US20200202215A1 (en) * 2018-12-21 2020-06-25 Advanced Micro Devices, Inc. Machine intelligence processor with compute unit remapping
US20200242459A1 (en) * 2019-01-30 2020-07-30 Intel Corporation Instruction set for hybrid cpu and analog in-memory artificial intelligence processor
CN110334799A (en) * 2019-07-12 2019-10-15 电子科技大学 Integrated ANN Reasoning and training accelerator and its operation method are calculated based on depositing
CN110990060A (en) * 2019-12-06 2020-04-10 北京瀚诺半导体科技有限公司 Embedded processor, instruction set and data processing method of storage and computation integrated chip
CN112152947A (en) * 2020-08-21 2020-12-29 北京百度网讯科技有限公司 Processor, implementation method, electronic device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
IMANI, MOHSEN; GUPTA, SARANSH; ROSING, TAJANA: "Digital-based processing in-memory: a highly-parallel accelerator for data intensive applications", Proceedings of the International Symposium on Memory Systems (MEMSYS '19), ACM, New York, NY, USA, 30 September to 3 October 2019, pages 38-40, XP058468857, ISBN: 978-1-4503-7206-0, DOI: 10.1145/3357526.3357551 *

Also Published As

Publication number Publication date
EP4075759A1 (en) 2022-10-19
US11784946B2 (en) 2023-10-10
CN112152947B (en) 2021-07-20
CN112152947A (en) 2020-12-29
EP4075759A4 (en) 2023-09-20
JP2023517921A (en) 2023-04-27
JP7379794B2 (en) 2023-11-15
US20230179546A1 (en) 2023-06-08
KR20220122756A (en) 2022-09-02

Similar Documents

Publication Publication Date Title
WO2022037422A1 (en) Processor, implementation method, electronic device, and storage medium
JP2022008781A (en) Decentralized training method, system, device, storage medium and program
CN104820657A (en) Inter-core communication method and parallel programming model based on embedded heterogeneous multi-core processor
JP7210830B2 (en) Speech processing system, speech processing method, electronic device and readable storage medium
US20210373799A1 (en) Method for storing data and method for reading data
US11863469B2 (en) Utilizing coherently attached interfaces in a network stack framework
CN103645994A (en) Data processing method and device
KR102522958B1 (en) Method and apparatus for traversing graph database
JP7488322B2 (en) Access method, device, electronic device and computer storage medium
US20220318163A1 (en) Transporting request types with different latencies
CN112929183B (en) Intelligent network card, message transmission method, device, equipment and storage medium
JP7482223B2 (en) APPLET PAGE RENDERING METHOD, DEVICE, ELECTRONIC DEVICE, AND STORAGE MEDIUM
CN103338156A (en) Thread pool based named pipe server concurrent communication method
JP2021144730A (en) Instruction executing method, apparatus, electronic device, computer-readable storage medium, and program
JP2021068414A (en) Wrapping method, registration method, device, rendering device, and program
CN116225419A (en) Code-free development method, device, storage medium and equipment for back-end transaction processing
US10078601B2 (en) Approach for interfacing a pipeline with two or more interfaces in a processor
US20210318920A1 (en) Low latency remoting to accelerators
Chiu et al. Design and Implementation of the Link-List DMA Controller for High Bandwidth Data Streaming
CN103257940B (en) A kind of SOC(system on a chip) SoC writes the method and device of data
US12079516B2 (en) Host-preferred memory operation
WO2024174258A1 (en) Deep neural network checkpoint optimization system and method based on nonvolatile memory
JP2024129010A (en) Data transmission method, device, equipment and medium
CN116846977A (en) Network sharing method, device and system, electronic equipment and storage medium
WO2024035807A1 (en) System and method for ghost bridging

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 20227027218

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2021857517

Country of ref document: EP

Effective date: 20220713

ENP Entry into the national phase

Ref document number: 2022554384

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE