WO2021134521A1 - Storage management apparatus and chip - Google Patents

Storage management apparatus and chip

Info

Publication number
WO2021134521A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
instruction
data packet
chip
processing core
Application number
PCT/CN2019/130640
Other languages
French (fr)
Chinese (zh)
Inventor
罗飞
王维伟
Original Assignee
北京希姆计算科技有限公司
Application filed by 北京希姆计算科技有限公司 filed Critical 北京希姆计算科技有限公司
Priority to CN201980102940.2A priority Critical patent/CN114902619B/en
Priority to PCT/CN2019/130640 priority patent/WO2021134521A1/en
Publication of WO2021134521A1 publication Critical patent/WO2021134521A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2115/00 Details relating to the type of the circuit
    • G06F 2115/02 System on chip [SoC] design
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present invention relates to the field of chip technology, in particular to a storage management device and a chip.
  • the chip is the cornerstone of data processing, and it fundamentally determines people's ability to process data. From the perspective of application fields, there are two main routes for chips: one is the general-purpose route, such as the central processing unit (CPU); these chips provide great flexibility, but their effective computing power on algorithms in specific fields is relatively low. The other is the dedicated-chip route, such as the Tensor Processing Unit (TPU); these chips deliver higher effective computing power in certain specific fields, but in the more general, flexible and fast-changing fields their processing capability is poor, or they cannot handle the workload at all.
  • due to the wide variety and huge amount of data in the intelligent era, the chip is required to have both extremely high flexibility, so that it can handle rapidly changing algorithms from different fields, and extremely strong processing capability, so that it can quickly process an extremely large and rapidly growing amount of data.
  • the object of the present invention is to provide a storage management device and a chip. The storage management device can parse an instruction received from the on-chip network, generate at least one data packet based on the data released from an external storage device, and send all of the data packets to the on-chip network, which forwards them to the corresponding processing cores. As a result, each processing core that requires the same data does not need to fetch it from the storage device separately, which reduces latency, saves the power consumption of the chip and the storage device, and saves the time those processing cores spend reading the data.
  • the first aspect of the present invention provides a storage management device arranged between the on-chip network and the storage device. The storage management device includes: an instruction parsing unit, configured to parse an instruction received from the on-chip network and to generate a control signal according to the instruction; and a data packet generating unit, configured to generate, according to the control signal, at least one data packet based on the data released from the storage device and to send all of the data packets to the on-chip network.
  • the storage management device provided by the embodiments of the present invention can parse instructions received from the on-chip network, generate at least one data packet based on the data released from the external storage device, and send all of the data packets to the on-chip network. By arranging a storage management device between the on-chip network and the storage device and having it send the data packets to all processing cores that require the same data, each of those processing cores no longer needs to fetch the data from the storage device separately, which reduces latency, saves the power consumption of the chip and the storage device, and saves the time those processing cores spend reading the data.
  • the data packet generating unit is configured to generate a data packet based on the data released from the storage device according to the control signal; that is, according to the control signal, it generates the data packet corresponding to the processing core based on the released data and the processing core indicated by the instruction.
  • when the instruction indicates multiple processing cores, the data packet generating unit is configured to generate, based on the released data and the processing cores indicated by the instruction, a data packet corresponding to each of those processing cores, and to send the generated data packets to the on-chip network in parallel.
  • the instruction includes fetch information and store information.
  • the device further includes an access number indicating unit; the access number indicating unit is configured to receive the fetch information and to instruct the storage device to release the data indicated by the fetch information, and is further configured to receive the store information and to generate, according to the store information, the storage address of the data corresponding to each processing core indicated by the instruction.
  • the data packet generating unit is configured to generate a data packet based on the data released from the storage device according to the control signal; specifically, it generates the address of the processing core indicated by the instruction according to the control signal, and generates the data packet based on the released data, the storage address, and the address of the processing core indicated by the instruction.
  • the data packet generating unit includes switches, and the number of switches is greater than or equal to the number of processing cores connected to the on-chip network; each processing core corresponds to one switch, and the control signal is used to control the state of the switch.
  • the data packet generating unit is configured to send the data packets in parallel to the processing cores indicated by the instruction; specifically, according to the control signal it turns on the switches corresponding to the processing cores indicated by the instruction and sends the data packets to the on-chip network in parallel, so that the on-chip network forwards each data packet to the corresponding processing core.
  • each switch includes a control terminal, an input terminal, and an output terminal; the input terminal of each switch is used to receive the storage address sent by the access number indicating unit and the data released by the storage device, and the control terminal of each switch enters an on state or an off state according to the received control signal.
  • the instruction parsing unit is configured to send the control signal to the control terminal to turn the switch on, and the data packet generating unit generates a data packet from the address of the processing core together with the storage address and the data obtained through the input terminal.
  • the instruction includes control bits for controlling the switches; the control bit corresponding to each processing core indicated by the instruction is a preset value, and when the instruction parsing unit determines that a control bit equals the preset value, it generates the control signal.
  • the instruction includes the number of fetches, and the fetch information includes the first address of the fetch addresses; the access number indicating unit is configured to generate the fetch addresses according to the first address of the fetch addresses and the number of fetches, and to instruct the storage device to release the data according to those fetch addresses.
  • the store information includes the first address of the storage addresses of the data in the processing cores indicated by the instruction, and the access number indicating unit is configured to generate the storage addresses according to the first address of the storage addresses and the number of fetches.
  • the instruction includes: control bits, information bits, the first fetch address, the number of fetches, and the first address of the data's storage addresses in each processing core that is to receive the data.
  • the control bits are used to control the state of the switches, and the information bits are used to indicate the processing cores that are to receive the data.
  • the storage management device further includes an instruction caching unit, configured to receive the instruction sent by the on-chip network, cache the instruction, and send the cached instruction to the instruction parsing unit.
  • a chip including an on-chip network, a plurality of processing cores connected to the on-chip network, a storage device, and the storage management device provided in the first aspect.
  • a card board which includes one or more chips provided in the second aspect.
  • an electronic device including one or more cards provided in the third aspect.
  • a storage management method including: parsing an instruction received from an on-chip network and generating a control signal according to the instruction; and, according to the control signal, generating at least one data packet based on the data released from the storage device and sending all of the data packets to the on-chip network.
  • a computer storage medium having a computer program stored on the computer storage medium, and when the program is executed by a processor, a storage management method of the fifth aspect is implemented.
  • an electronic device including a memory, a processor, and a computer program stored on the memory and runnable on the processor, where the processor implements a storage management method of the fifth aspect when executing the program.
  • a computer program product which includes computer instructions, and when the computer instructions are executed by a computing device, the computing device can execute a storage management method of the fifth aspect.
  • the storage management device provided by the embodiments of the present invention can parse instructions received from the on-chip network, generate at least one data packet based on the data released from the external storage device, and send all of the data packets to the on-chip network. By arranging a storage management device between the on-chip network and the storage device and having it send the data packets to all processing cores that require the same data, each of those processing cores no longer needs to fetch the data from the storage device separately, which reduces latency, saves the power consumption of the chip and the storage device, and saves the time those processing cores spend reading the data.
  • FIG. 1 is a schematic diagram of a flow of multiple cores obtaining the same data in the prior art
  • FIG. 2 is a schematic diagram of a flow of multiple cores obtaining the same data in the prior art
  • FIG. 3 is a schematic structural diagram of a storage management device according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a storage management device according to another embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a chip according to still another embodiment of the present invention.
  • Fig. 6 is a schematic flowchart of a storage management method according to an embodiment of the present invention.
  • in an image recognition application, for example, the chip has N cores, all cores process images independently and in parallel, and the algorithm and weights used by each core are the same. The way to maximize the computing power of the chip with the least delay and power consumption is therefore for all cores to share the same copy of the weights, that is, for multiple cores to fetch the same data.
  • FIG. 1 is a schematic diagram of a flow of multiple cores acquiring the same data in the prior art.
  • as shown in FIG. 1, a CPU includes N processing cores and a shared memory (Shared Memory, SM). The N processing cores are processing core C1 to processing core Cn, and each processing core separately reads the data it needs from the SM. When the processing cores need the same data, for example the same weights or other identical parameters, the flow of multiple cores fetching data is as follows:
  • processing core C1 issues a fetch instruction to the SM and receives the data released by the SM; after processing core C1 has finished fetching, the other processing cores issue fetch instructions to the SM in turn and fetch the data stored at the same location from the SM again.
  • the single arrow in FIG. 1 indicates the data transmission direction for multiple cores to obtain the same data, and the dashed line indicates that the core can implement two-way data transmission with the shared memory module SM.
  • this approach has the following drawbacks: when multiple processing cores need the same data, each processing core has to read it independently. On the one hand, the same data is read from the SM multiple times, resulting in high power consumption in the SM. On the other hand, if the SM stores only one copy of the data, the processing cores have to read it one after another; a core that is later in the order must wait until the earlier cores have finished reading before it can read, so its read of the data is delayed. Sometimes a core can only start computing after it has read the data, so the later cores wait a long time, computing power is wasted, and the performance of the chip is reduced.
  • Fig. 2 is a schematic diagram of a process in which multiple cores obtain the same data in the prior art.
  • as shown in FIG. 2, a GPU includes N processing cores and an SM. The N processing cores are processing core C1 to processing core Cn. The SM stores multiple identical copies of the data contiguously; when one processing core reads the required data, the SM releases the contiguously stored identical copies for the multiple processing cores to read.
  • the flow of multiple processing cores fetching data is as follows:
  • the first processing core C1 issues a fetch instruction to the SM, the SM releases the contiguously stored identical data, and the processing cores C1 to Cn that require the same data each receive a copy of the data released by the SM. This approach requires the SM to keep multiple contiguous copies of the same data, which wastes the memory of the SM.
  • the single arrow in FIG. 2 indicates the direction of data transmission for multiple cores to obtain the same data, and the dotted line indicates that the cores can implement two-way data transmission with the shared memory module SM.
  • Fig. 3 is a schematic structural diagram of a storage management device according to an embodiment of the present invention.
  • the storage management device is arranged between the on-chip network and the storage device; wherein the on-chip network NoC is connected to N processing cores, and N is an integer greater than or equal to 1.
  • the storage management device includes: an instruction parsing unit ID and a data packet generating unit PG-SW.
  • the instruction parsing unit ID is used for parsing the instruction received from the on-chip network, and generating a control signal according to the instruction.
  • the data packet generating unit PG-SW is configured to generate at least one data packet based on the data released from the storage device according to the control signal, and send all the data packets to the network on chip.
  • the storage management device can parse the instructions received from the on-chip network, generate at least one data packet based on the data released from the external storage device, and send all of the data packets to the on-chip network. By arranging a storage management device between the on-chip network and the storage device and having it send the data packets to all processing cores that require the same data, processing cores that require the same data do not need to fetch the data from the storage device separately, which reduces the delay, saves the power consumption of the chip and the storage device, and saves the time for processing cores that require the same data to read the data.
  • the data packet generating unit PG-SW is configured to generate a data packet based on the data released from the storage device according to the control signal; that is, according to the control signal, it generates the data packet corresponding to the processing core based on the released data and the processing core indicated by the instruction.
  • when the instruction indicates multiple processing cores, the data packet generating unit generates, based on the released data and the processing cores indicated by the instruction, a data packet corresponding to each processing core, and sends the generated data packets to the on-chip network in parallel; each data packet is then delivered through the on-chip network to the processing core indicated by the instruction.
  • here, corresponding to each processing core refers to corresponding to the address of each processing core.
  • corresponding to each processing core may also refer to corresponding both to the address of each processing core and to the storage address of the data in each processing core.
  • the instruction includes fetch information and store information.
  • the storage management device further includes an access number indicating unit Addr-G; the access number indicating unit Addr-G is configured to receive the fetch information and, according to the fetch information, instruct the storage device to release the data indicated by the fetch information, where the data can be parameters such as weights.
  • the instruction includes the number of fetches, and the fetch information includes the first address of the fetch addresses; the access number indicating unit is used to generate the fetch addresses according to the first address of the fetch addresses and the number of fetches, and to instruct the storage device to release the data according to those fetch addresses.
  • setting the fetch information to be the first address of the fetch addresses saves instruction bits, so that the instruction can carry more other information; it also makes it more convenient for the access number indicating unit to extract the data according to the first fetch address and the number of fetches carried in the instruction.
  • the access number indicating unit Addr-G is further configured to receive the store information and to generate, according to the store information, the storage address of the data corresponding to each processing core indicated by the instruction.
  • the storage management device further includes an instruction caching unit for receiving instructions sent by the on-chip network, caching the instructions, and sending the cached instructions to the instruction parsing unit ID.
  • the instruction cache unit IS caches the instructions in the order of the time when the instructions are received, and sends the instructions to the instruction parsing unit ID in the order of the cache.
  • the instruction cache unit IS may be an instruction stack for temporarily storing the received data access instruction, and the storage management device MME will execute the instructions in the stack one by one.
  • the store information includes the first address of the storage addresses of the data in the processing cores indicated by the instruction, and the access number indicating unit is configured to generate the storage addresses according to the first address of the storage addresses and the number of fetches.
  • setting the store information to be the first address of the data's storage addresses in the processing core, rather than carrying every storage address of the data in the processing core, saves instruction bits.
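  • The address expansion just described can be illustrated with a short C sketch. This is not part of the patent text: the 32-bit word size, the 4-byte stride, and the function names are assumptions made purely for illustration.

        #include <stdint.h>

        /* Illustrative only: expand "first address + number of fetches" into
         * concrete addresses, assuming one 32-bit word (4 bytes) per fetch. */
        #define WORD_BYTES 4u

        /* Fetch addresses in the storage device: Addr_S, Addr_S + 4, ... */
        static void gen_fetch_addrs(uint32_t addr_s, uint32_t m, uint32_t *out)
        {
            for (uint32_t i = 0; i < m; i++)
                out[i] = addr_s + i * WORD_BYTES;
        }

        /* Storage addresses inside one destination core: Addr_D, Addr_D + 4, ... */
        static void gen_store_addrs(uint32_t addr_d, uint32_t m, uint32_t *out)
        {
            for (uint32_t i = 0; i < m; i++)
                out[i] = addr_d + i * WORD_BYTES;
        }

  • Under this sketch, carrying only the first addresses and the count M in the instruction, as described above, keeps the instruction short while still addressing M data words.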
  • the data packet generating unit PG-SW is configured to generate a data packet based on the data released from the storage device according to the control signal; specifically, it generates the address of the processing core indicated by the instruction according to the control signal, and generates the data packet based on the released data, the storage address, and the address of the processing core indicated by the instruction.
  • the instruction includes control bits, and the control bit corresponding to each processing core that the instruction indicates should receive the data is a preset value; when the instruction parsing unit ID determines that a control bit equals the preset value, it generates the control signal corresponding to that processing core.
  • the instruction includes the information bits used to indicate the processing cores that are to receive the data, the first fetch address, the number of fetches, the first address of the data's storage addresses in each processing core that is to receive the data, and the control bits used to control the switches.
  • the format of the instruction is as follows:
  • the multiple processing cores can first be numbered. Since the storage management device needs to receive instructions through the NoC and also needs to send the generated data packets through the NoC to the one or more processing cores indicated by the instruction, the storage management device is numbered together with the processing cores. For example, if there are N processing cores in the chip, the number of the storage management device can be set to 1, the number of the first processing core C1 to 2, and the number of the Nth processing core Cn to N+1.
  • alternatively, the number of the storage management device can be set to 0, the number of the first processing core C1 to 1, and the number of the Nth processing core Cn to N; the present invention takes a storage management device numbered 1 as an example without being limited to it.
  • the instruction can be 148 bits.
  • the information bits are denoted ID_C, can for example be 4 bits, and are used to indicate the address of the processing core that is to receive the data.
  • the information bits consist of N+1 bits; the first of the N+1 bits corresponds to the storage management device, and the second to the (N+1)th bits correspond to the N processing cores respectively.
  • the network-on-chip NoC uses this information bit to determine the recipient of the instruction or data packet.
  • the receiver here can be a processing core or a storage management device. That is, when the first bit in ID_C is 1, it means that the instruction or data packet is sent to the storage management device; when one of the second to N+1 bits in ID_C is 1, it means the data The packet is sent to the processing core corresponding to this bit.
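  • Purely as an illustration of the one-hot addressing just described, the recipients encoded in ID_C could be resolved as follows; the core count and the C helper are assumptions, not part of the patent, and the right-to-left bit numbering follows the example given later in this description.

        #include <stdint.h>
        #include <stdio.h>

        #define N_CORES 4u   /* assumed number of processing cores */

        /* ID_C has N+1 bits: bit 0 (rightmost) = storage management device (MME),
         * bits 1..N = processing cores C1..Cn. */
        static void print_recipients(uint32_t id_c)
        {
            if (id_c & 0x1u)
                printf("recipient: storage management device (MME)\n");
            for (uint32_t k = 1; k <= N_CORES; k++)
                if (id_c & (1u << k))
                    printf("recipient: processing core C%u\n", k);
        }

  • For example, print_recipients(0x2) would report only the first processing core C1, matching an ID_C of 0010.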
  • the instruction code can be 8 bits.
  • the instruction code can encode a variety of operations. For example, the fetch instruction code can be set to 00000001; when the instruction parsing unit ID finds that the instruction code is 00000001, it knows that this is a fetch instruction.
  • the control bits also consist of N+1 bits, which correspond one-to-one with the N+1 information bits; the second to the (N+1)th bits correspond to the N processing cores respectively.
  • the control bit corresponding to each processing core indicated by the instruction is a preset value, for example 1; that is, when the instruction parsing unit recognizes that a control bit is 1, it generates a control signal.
  • for example, when the chip has 4 processing cores, the control bits have 5 bits, and the second to fifth bits correspond one-to-one to the 4 processing cores. If CC is 10110, the first and fourth bits are 0, indicating that the storage management device and the third processing core do not need to receive the data, while the second, third, and fifth bits are 1, indicating that the first processing core, the second processing core, and the fourth processing core all need to receive the data.
  • in this example the first digit on the right of the N+1 digits of CC corresponds to the first bit (the storage management device), the second digit on the right corresponds to the second bit (the first processing core), and the last digit on the left corresponds to the last bit (the Nth processing core); the numbering can also be reversed, with the first digit on the left corresponding to the first bit, and the present invention is not limited to either convention.
  • the fetch information, the store information, and the number of fetches may each be 16 bits.
  • reserved means that those bits have not been assigned a coding.
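  • The example field widths above can be collected into a single sketch. The following C structure and helper are assumptions made only for illustration (a real instruction would pack these fields, together with reserved bits, into the 148-bit word mentioned above); they are not the patent's definition of the encoding.

        #include <stdbool.h>
        #include <stdint.h>

        #define N_CORES      4u      /* assumed core count for this example */
        #define OPCODE_FETCH 0x01u   /* example fetch instruction code 00000001 */

        /* Unpacked view of the example instruction fields described above. */
        struct mme_instr {
            uint8_t  id_c;                  /* information bits: one-hot recipient(s) */
            uint8_t  opcode;                /* 8-bit instruction code */
            uint8_t  cc;                    /* control bits, N+1 bits, one per switch */
            uint16_t fetch_first;           /* Addr_S: first fetch address in the SM */
            uint16_t fetch_count;           /* M: number of data words to fetch */
            uint16_t store_first[N_CORES];  /* first storage address per destination core */
        };

        /* A control signal for core k (1..N) is generated when the instruction is a
         * fetch and the control bit of that core equals the preset value 1. */
        static bool control_signal_for_core(const struct mme_instr *ins, unsigned core)
        {
            if (ins->opcode != OPCODE_FETCH || core == 0 || core > N_CORES)
                return false;
            return (ins->cc >> core) & 0x1u;   /* bit 0 = MME, bits 1..N = C1..Cn */
        }

  • With CC set to 10110 as in the example above, control_signal_for_core returns true for the first, second, and fourth processing cores and false for the third, matching the reading of the control bits given in this description.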
  • FIG. 4 is a schematic structural diagram of a storage management device according to another embodiment of the present invention.
  • the data packet generating unit includes a Tx transmitter and an Rx receiver, where the Tx transmitter includes switches and a packet header generating unit, and the number of switches is greater than or equal to the number of processing cores connected to the on-chip network NoC; each processing core corresponds to one switch, and the control signal is used to control the state of the switch.
  • the data packet generating unit PG-SW is configured to send a plurality of the data packets to the on-chip network NoC in parallel; specifically, according to the control signal it turns on the switches corresponding to the processing cores indicated by the instruction and sends the data packets to the on-chip network in parallel, so that all of the data packets are delivered to the processing cores through the on-chip network.
  • each switch includes a control terminal, an input terminal, and an output terminal; the input terminal of each switch is used to receive the storage address sent by the access number indicating unit and the data released from the storage device, and the control terminal of each switch enters the on state or the off state according to the received control signal.
  • the control bits included in the instruction are used to control the state of the switches; the control bit corresponding to each processing core indicated by the instruction is a preset value, and when the instruction parsing unit ID determines that a control bit is the preset value, it generates the control signal.
  • the above-mentioned storage management device MME stores the received data in the storage device through the Rx receiver.
  • as an example, the first processing core C1 sends an instruction to the storage management device MME through the on-chip network NoC; the instruction cache unit IS caches the instruction and sends it to the instruction parsing unit ID.
  • the instruction parsing unit ID decodes the instruction code as 00000001, which means that data needs to be fetched from the storage device. It further parses the first fetch address Addr_S, the number of fetches M, and the control information C_C, which is 0110 and indicates that the M data are to be sent to the first processing core C1 and the second processing core C2, as well as the first address Addr_D1 of the storage addresses of the M data in the first processing core C1 and the first address Addr_D2 of the storage addresses of the M data in the second processing core C2.
  • the instruction parsing unit ID generates the first control signal C_C1 (0010) and the second control signal C_C2 (0100) according to the control information 0110.
  • the instruction parsing unit ID sends the number M, Addr_S, Addr_D1 and Addr_D2 to the access number indicating unit Addr_G, and sends the first control signal C_C1 and the second control signal C_C2 to the first switch S1 and the second switch S2 in the data packet generating unit PG_SW, which correspond to the first processing core and the second processing core respectively.
  • Addr_G generates the specific fetch addresses according to M and Addr_S and fetches the data from the external storage device SM; it also generates the specific storage addresses Addr_D1_N and Addr_D2_N according to M, Addr_D1 and Addr_D2, and sends Addr_D1_N and Addr_D2_N to PG_SW.
  • the packet header generating unit ID-GEN generates the first information bits ID_C1 and the second information bits ID_C2 according to the first control signal C_C1 and the second control signal C_C2 respectively; the first information bits are 0010 and the second information bits are 0100.
  • the fourth bit of the information bits can be set to 0.
  • PG_SW generates a first packet header according to the storage address Addr_D1_N and the first information bit ID_C1 obtained from the output terminal of the first switch, and generates a second packet header according to the storage address Addr_D2_N and the second information bit ID_C2 obtained from the output terminal of the second switch.
  • PG_SW also generates the first data packet based on the first packet header and the data released from the storage device obtained from the output terminal of the first switch, generates the second data packet based on the second packet header and the data released from the storage device obtained from the output terminal of the second switch, and sends the first data packet and the second data packet to the on-chip network NoC in parallel.
  • NoC sends the first data packet to the first processing core, and sends the second data packet to the second processing core.
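  • The packet assembly in this example can be sketched in C as follows. The packet layout, the 32-bit word size, and the helper names are assumptions made only for illustration and are not the patent's definition of the data packet format.

        #include <stddef.h>
        #include <stdint.h>

        /* Assumed data packet: a header carrying the destination information bits and
         * the storage address inside the destination core, plus one data word. */
        struct data_packet {
            uint8_t  id_c;        /* information bits, e.g. 0010 for C1, 0100 for C2 */
            uint32_t store_addr;  /* Addr_D1_N or Addr_D2_N */
            uint32_t payload;     /* one data word released by the SM */
        };

        /* Build the two packet streams of the example: the same SM data is packed
         * once for C1 (ID_C1 = 0010) and once for C2 (ID_C2 = 0100). */
        static void assemble_example(const uint32_t *sm_data, size_t m,
                                     const uint32_t *addr_d1_n, const uint32_t *addr_d2_n,
                                     struct data_packet *to_c1, struct data_packet *to_c2)
        {
            for (size_t i = 0; i < m; i++) {
                to_c1[i] = (struct data_packet){ 0x2u /* 0010 */, addr_d1_n[i], sm_data[i] };
                to_c2[i] = (struct data_packet){ 0x4u /* 0100 */, addr_d2_n[i], sm_data[i] };
            }
            /* In hardware, both streams would be driven onto the NoC in parallel. */
        }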
  • the data packet received by the storage management device MME or the data packet sent by the MME may be in the following format:
  • in this example the data packet generating unit is provided with three switches, and the third processing core does not require the data; the instruction parsing unit ID may generate a third control signal to put the third switch in the off state, or it may simply not generate a third control signal.
  • the storage management device provided by the embodiments of the present invention can send data packets to all processing cores that require the same data, so that each processing core that requires the same data does not need to fetch the data from the storage device separately, which reduces the time delay, saves the power consumption of the chip and the storage device, and saves the time for processing cores that require the same data to read the data.
  • Fig. 5 is a schematic structural diagram of a chip according to still another embodiment of the present invention.
  • the chip includes an on-chip network NoC, a plurality of processing cores connected to the on-chip network, a storage device, and the storage management device MME provided in the above-mentioned embodiments.
  • the chip has N processing cores, that is, the first processing core C1 to the Nth processing core Cn.
  • the storage management device parses the instruction obtained from the network-on-chip, generates, based on the instruction and the data released from the storage device, a data packet corresponding to each processing core that is to receive the data indicated by the instruction, and sends all of the generated data packets to the network-on-chip, which delivers them to the processing cores.
  • in FIG. 5, the single arrows indicate the direction of data transmission provided by the embodiment of the present invention, and the dashed lines indicate that two-way interaction can be realized.
  • the dashed double arrow between processing core C1 and the storage management device MME indicates that the processing core and the storage management device MME can send and receive data in both directions; the storage management device MME and the storage device SM can likewise transmit and receive data in both directions.
  • An embodiment of the present invention provides a card board, which includes one or more chips provided in the foregoing embodiments.
  • An embodiment of the present invention provides an electronic device, including one or more of the card boards provided in the foregoing embodiments.
  • Fig. 6 is a schematic flowchart of a storage management method according to an embodiment of the present invention.
  • the storage management method includes steps S101 and S102:
  • Step S101: the instruction parsing unit parses the instruction received from the on-chip network and generates a control signal according to the instruction.
  • Step S102: the data packet generating unit generates, according to the control signal, at least one data packet based on the data released from the storage device, and sends all of the data packets to the on-chip network.
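  • Steps S101 and S102 can be tied together in a compact, non-normative C sketch; the types, the 4-byte stride, and the single destination shown here are placeholders for illustration, not interfaces defined by the method.

        #include <stddef.h>
        #include <stdint.h>

        /* Placeholder types for the sketch. */
        typedef struct { uint8_t dest_id_c; uint32_t store_first; uint32_t count; } ctrl_t;
        typedef struct { uint8_t id_c; uint32_t store_addr; uint32_t payload; } pkt_t;

        /* S101: parse the instruction received from the NoC into a control signal. */
        static ctrl_t parse_instruction(uint8_t id_c, uint32_t store_first, uint32_t m)
        {
            ctrl_t c = { id_c, store_first, m };
            return c;
        }

        /* S102: generate one packet per released data word; handing the packets to
         * the NoC is represented here by filling the output array. */
        static size_t generate_packets(ctrl_t c, const uint32_t *released, pkt_t *out)
        {
            for (uint32_t i = 0; i < c.count; i++) {
                out[i].id_c       = c.dest_id_c;
                out[i].store_addr = c.store_first + 4u * i;   /* assumed 4-byte stride */
                out[i].payload    = released[i];
            }
            return c.count;
        }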
  • An embodiment of the present invention also provides a computer storage medium having a computer program stored on the computer storage medium, and when the program is executed by a processor, the storage management method provided in the foregoing embodiment is implemented.
  • Another embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and runnable on the processor; when the processor executes the program, it implements the storage management method provided in the foregoing embodiment.
  • Another embodiment of the present invention provides a computer program product, which includes computer instructions, and when the computer instructions are executed by a computing device, the computing device can execute a storage management method provided in the foregoing embodiments.

Abstract

A storage management apparatus and a chip. The storage management apparatus is disposed between a network on chip and a storage apparatus, and the storage management apparatus comprises: an instruction parsing unit, which is used to parse an instruction received from the network on chip, and generate a control signal according to the instruction; and a data packet generation unit, which is used to generate, according to the control signal, at least one data packet on the basis of data released from the storage apparatus, and send all the data packets to the network on chip. The described storage management apparatus can send data packets to all processing cores that require the same data, thus each processing core that requires the same data need not separately fetch data from the storage apparatus, thereby reducing delay, reducing the power consumption of the chip and the storage apparatus, and reducing the time spent by processing cores that require the same data reading data.

Description

Storage management device and chip
Technical field
The present invention relates to the field of chip technology, and in particular to a storage management device and a chip.
Background technology
With the development of science and technology, human society is rapidly entering the intelligent era. An important feature of the intelligent era is that people obtain more and more types of data and ever larger amounts of data, while the required speed of data processing keeps increasing.
A chip is the cornerstone of data processing; it fundamentally determines people's ability to process data. From the perspective of application fields, there are two main routes for chips: one is the general-purpose route, such as the central processing unit (CPU); these chips provide great flexibility, but their effective computing power on algorithms in specific fields is relatively low. The other is the dedicated-chip route, such as the Tensor Processing Unit (TPU); these chips deliver higher effective computing power in certain specific fields, but in the more general, flexible and fast-changing fields their processing capability is poor, or they cannot handle the workload at all.
Because the intelligent era brings a wide variety and a huge amount of data, a chip is required to have both extremely high flexibility, so that it can handle rapidly changing algorithms from different fields, and extremely strong processing capability, so that it can quickly process an extremely large and rapidly growing amount of data.
Summary of the invention
(1) Purpose of the invention
The object of the present invention is to provide a storage management device and a chip. The storage management device can parse an instruction received from the on-chip network, generate at least one data packet based on the data released from an external storage device, and send all of the data packets to the on-chip network, which forwards them to the corresponding processing cores. As a result, each processing core that requires the same data does not need to fetch it from the storage device separately, which reduces latency, saves the power consumption of the chip and the storage device, and saves the time those processing cores spend reading the data.
(2) Technical solution
To solve the above problems, a first aspect of the present invention provides a storage management device arranged between the on-chip network and a storage device. The storage management device includes: an instruction parsing unit, configured to parse an instruction received from the on-chip network and to generate a control signal according to the instruction; and a data packet generating unit, configured to generate, according to the control signal, at least one data packet based on the data released from the storage device and to send all of the data packets to the on-chip network.
The storage management device provided by the embodiments of the present invention can parse instructions received from the on-chip network, generate at least one data packet based on the data released from the external storage device, and send all of the data packets to the on-chip network. By arranging a storage management device between the on-chip network and the storage device and having it send the data packets to all processing cores that require the same data, each of those processing cores no longer needs to fetch the data from the storage device separately, which reduces latency, saves the power consumption of the chip and the storage device, and saves the time those processing cores spend reading the data.
Further, the data packet generating unit being configured to generate a data packet based on the data released from the storage device according to the control signal includes: the data packet generating unit is configured to generate, according to the control signal, the data packet corresponding to the processing core based on the released data and the processing core indicated by the instruction.
Further, when the number of processing cores indicated by the instruction is more than one, the data packet generating unit is configured to generate, based on the released data and the processing cores indicated by the instruction, a data packet corresponding to each of the processing cores, and to send the generated data packets to the on-chip network in parallel.
Further, the instruction includes fetch information and store information.
Optionally, the device further includes an access number indicating unit; the access number indicating unit is configured to receive the fetch information and to instruct the storage device to release the data indicated by the fetch information, and is further configured to receive the store information and to generate, according to the store information, the storage address of the data corresponding to each processing core indicated by the instruction.
Optionally, the data packet generating unit being configured to generate a data packet based on the data released from the storage device according to the control signal includes: the data packet generating unit is configured to generate the address of the processing core indicated by the instruction according to the control signal, and to generate the data packet based on the released data, the storage address, and the address of the processing core indicated by the instruction.
Optionally, the data packet generating unit includes switches, and the number of switches is greater than or equal to the number of processing cores connected to the on-chip network; each processing core corresponds to one switch, and the control signal is used to control the state of the switch.
Optionally, the data packet generating unit being configured to send the data packets in parallel to the processing cores indicated by the instruction includes: the data packet generating unit is configured to turn on, according to the control signal, the switches corresponding to the processing cores indicated by the instruction, and to send the data packets to the on-chip network in parallel, so that the on-chip network forwards each data packet to the corresponding processing core.
Optionally, each switch includes a control terminal, an input terminal, and an output terminal; the input terminal of each switch is used to receive the storage address sent by the access number indicating unit and the data released by the storage device, and the control terminal of each switch enters an on state or an off state according to the received control signal.
Further optionally, the instruction parsing unit is configured to send the control signal to the control terminal to turn the switch on, and the data packet generating unit generates a data packet from the address of the processing core together with the storage address and the data obtained through the input terminal.
Optionally, the instruction includes control bits for controlling the switches; the control bit corresponding to each processing core indicated by the instruction is a preset value, and when the instruction parsing unit determines that a control bit equals the preset value, it generates the control signal.
Preferably, the instruction includes the number of fetches, and the fetch information includes the first address of the fetch addresses; the access number indicating unit is configured to generate the fetch addresses according to the first address of the fetch addresses and the number of fetches, and to instruct the storage device to release the data according to the fetch addresses.
Further preferably, the store information includes the first address of the storage addresses of the data in the processing cores indicated by the instruction, and the access number indicating unit is configured to generate the storage addresses according to the first address of the storage addresses and the number of fetches.
Optionally, the instruction includes: control bits, information bits, the first fetch address, the number of fetches, and the first address of the data's storage addresses in each processing core that is to receive the data.
The control bits are used to control the state of the switches, and the information bits are used to indicate the processing cores that are to receive the data.
Optionally, the storage management device further includes an instruction caching unit, configured to receive the instruction sent by the on-chip network, cache the instruction, and send the cached instruction to the instruction parsing unit.
According to a second aspect of the present invention, a chip is provided, including an on-chip network, a plurality of processing cores connected to the on-chip network, a storage device, and the storage management device provided in the first aspect.
According to a third aspect of the present invention, a card board is provided, including one or more chips provided in the second aspect.
According to a fourth aspect of the present invention, an electronic device is provided, including one or more card boards provided in the third aspect.
According to a fifth aspect of the present invention, a storage management method is provided, including: parsing an instruction received from the on-chip network and generating a control signal according to the instruction; and, according to the control signal, generating at least one data packet based on the data released from a storage device and sending all of the data packets to the on-chip network.
According to a sixth aspect of the present invention, a computer storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the storage management method of the fifth aspect is implemented.
According to a seventh aspect of the present invention, an electronic device is provided, including a memory, a processor, and a computer program stored on the memory and runnable on the processor; when the processor executes the program, the storage management method of the fifth aspect is implemented.
According to an eighth aspect of the present invention, a computer program product is provided, which includes computer instructions; when the computer instructions are executed by a computing device, the computing device can execute the storage management method of the fifth aspect.
(3) Beneficial effects
The storage management device provided by the embodiments of the present invention can parse instructions received from the on-chip network, generate at least one data packet based on the data released from the external storage device, and send all of the data packets to the on-chip network. By arranging a storage management device between the on-chip network and the storage device and having it send the data packets to all processing cores that require the same data, each of those processing cores no longer needs to fetch the data from the storage device separately, which reduces latency, saves the power consumption of the chip and the storage device, and saves the time those processing cores spend reading the data.
Description of the drawings
FIG. 1 is a schematic flowchart of multiple cores obtaining the same data in the prior art;
FIG. 2 is a schematic flowchart of multiple cores obtaining the same data in the prior art;
FIG. 3 is a schematic structural diagram of a storage management device according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a storage management device according to another embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a chip according to still another embodiment of the present invention;
FIG. 6 is a schematic flowchart of a storage management method according to an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to specific embodiments and the accompanying drawings. It should be understood that these descriptions are only exemplary and are not intended to limit the scope of the present invention. In addition, descriptions of well-known structures and technologies are omitted in the following description to avoid unnecessarily obscuring the concepts of the present invention.
In neural network computing, multi-core or many-core chip architectures are often used. Enabling the many cores to deliver as much computing power as possible is key to chip performance, and how much of that computing power is realized depends on how efficiently each core can read data into the core for computation.
For example, in an image recognition application, the chip has N cores, all cores process images independently and in parallel, and the algorithm and weights used by each core are the same. The way to maximize the chip's computing power with the least delay and power consumption is therefore for all cores to share the same copy of the weights, that is, for multiple cores to fetch the same data.
FIG. 1 is a schematic flowchart of multiple cores obtaining the same data in the prior art.
As shown in FIG. 1, a CPU includes N processing cores and a shared memory (Shared Memory, SM). The N processing cores are processing core C1 to processing core Cn, and each processing core separately reads the data it needs from the SM. When the processing cores need the same data, for example the same weights or other identical parameters, the flow of multiple cores fetching data is as follows:
Processing core C1 issues a fetch instruction to the SM and receives the data released by the SM. After processing core C1 has finished fetching, the other processing cores issue fetch instructions to the SM in turn and fetch the data stored at the same location from the SM again.
The single arrows in FIG. 1 indicate the direction of data transmission when multiple cores obtain the same data, and the dashed lines indicate that a core can exchange data with the shared memory SM in both directions.
This approach usually has the following drawbacks:
When multiple processing cores need the same data, each processing core has to read it independently. On the one hand, the same data is read from the SM multiple times, which causes high power consumption in the SM. On the other hand, if the SM stores only one copy of the data, the processing cores have to read it one after another; a core that is later in the order must wait until the earlier cores have finished reading before it can read, so its read of the data is delayed. Sometimes a core can only start computing after it has read the data, so the later cores wait a long time, computing power is wasted, and the performance of the chip is reduced.
FIG. 2 is a schematic flowchart of multiple cores acquiring the same data in the prior art.
As shown in FIG. 2, a GPU includes N processing cores and an SM. The N processing cores are processing core C1 to processing core Cn. The SM stores multiple identical copies of the data in consecutive locations; when one processing core reads the data it needs, the SM releases all of the consecutively stored identical copies for the multiple processing cores to read. The flow in which multiple processing cores fetch the data is as follows:
The first processing core C1 issues a fetch instruction to the SM, the SM releases the identical data stored consecutively, and the processing cores C1 to Cn that need the same data each receive one of the identical copies released by the SM. This approach requires the SM to store multiple consecutive copies of the same data, which wastes SM memory.
The single arrows in FIG. 2 indicate the direction of data transmission when multiple cores acquire the same data, and the dashed lines indicate that each core can exchange data bidirectionally with the shared memory module SM.
The technical solution of the present invention is proposed to solve the above problems.
The chip provided by an embodiment of the present application is described in detail below. In the description of the present invention, it should be noted that the terms "first", "second", "third" and "fourth" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance. In addition, the technical features involved in the different embodiments of the present invention described below can be combined with each other as long as they do not conflict.
FIG. 3 is a schematic structural diagram of a storage management apparatus according to an embodiment of the present invention.
In this embodiment, the storage management apparatus is arranged between a network on chip and a storage device, where the network on chip NoC is connected to N processing cores, N being an integer greater than or equal to 1.
As shown in FIG. 3, the storage management apparatus includes an instruction parsing unit ID and a data packet generating unit PG-SW.
The instruction parsing unit ID is configured to parse an instruction received from the network on chip and generate a control signal according to the instruction.
The data packet generating unit PG-SW is configured to generate, according to the control signal, at least one data packet based on the data released from the storage device, and to send all of the data packets to the network on chip.
The storage management apparatus can parse the instruction received from the network on chip, generate at least one data packet based on the data released from the external storage device, and send all of the data packets to the network on chip. By arranging the storage management apparatus between the network on chip and the storage device, the storage management apparatus sends the data packets to all processing cores that need the same data, so the processing cores that need the same data do not have to fetch the data from the storage device separately. This reduces latency, lowers the power consumption of the chip and of the storage device, and saves the time the processing cores that need the same data spend reading it.
In an embodiment, the data packet generating unit PG-SW being configured to generate, according to the control signal, a data packet based on the data released from the storage device includes: the data packet generating unit PG-SW is configured to generate, according to the control signal, the data packet corresponding to the processing core indicated by the instruction, based on the released data and that processing core.
In a preferred embodiment, when the instruction indicates multiple processing cores, the data packet generating unit being configured to generate, based on the released data and the processing cores indicated by the instruction, the data packets corresponding to the processing cores and to send the data packets to the network on chip includes: the data packet generating unit is configured to generate, based on the released data and the processing cores indicated by the instruction, a data packet corresponding to each of the processing cores, and to send the generated data packets to the network on chip in parallel, so that the network on chip delivers each data packet to the processing core indicated by the instruction.
It should be noted that "corresponding to each processing core" means corresponding to the address of each processing core. Preferably, "corresponding to each processing core" may also mean corresponding both to the address of each processing core and to the storage address of the data within each processing core.
In an embodiment, the instruction includes fetch information and store information.
In an embodiment, the storage management apparatus further includes an access indication unit Addr-G. The access indication unit Addr-G is configured to receive the fetch information and, according to the fetch information, instruct the storage device to release the data indicated by the fetch information, where the data may be parameters such as weights.
In a preferred embodiment, the instruction includes a fetch count, and the fetch information includes the first address of the fetch addresses. The access indication unit is configured to generate the fetch addresses according to the first address of the fetch addresses and the fetch count, and to instruct the storage device to release the data according to the fetch addresses.
In this embodiment, setting the fetch information to be the first address of the fetch addresses on the one hand saves instruction bits, so the instruction can carry more other information, and on the other hand makes it more convenient to extract the data, because the access indication unit derives the addresses from the first fetch address and the fetch count carried by the instruction.
In an embodiment, the access indication unit Addr-G is further configured to receive the store information and, according to the store information, generate the storage addresses of the data packets corresponding to the processing cores indicated by the instruction.
In an optional embodiment, the storage management apparatus further includes an instruction cache unit configured to receive the instructions sent by the network on chip, cache the instructions, and send the cached instructions to the instruction parsing unit ID.
When the network on chip sends multiple instructions, the instruction cache unit IS caches the instructions in the order in which they are received and sends them to the instruction parsing unit ID in the cached order.
Optionally, the instruction cache unit IS may be an instruction stack for temporarily storing the received data access instructions, and the storage management apparatus MME executes the instructions in the stack one by one.
In a preferred embodiment, the store information includes the first address of the storage addresses of the data in the processing cores indicated by the instruction, and the access indication unit is configured to generate the storage addresses according to the first address of the storage addresses and the fetch count.
In this embodiment, setting the store information to be the first address of the storage addresses of the data in a processing core saves instruction bits compared with carrying the full storage addresses of the data in the processing core in the instruction.
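As a minimal sketch of how the access indication unit Addr-G could expand a first address plus a count into concrete fetch and store addresses, the following fragment is offered. The 16-byte stride (one 128-bit data word per address step), the example addresses, and the function name are assumptions; the patent only states that the addresses are generated from the first address and the fetch count.

```python
# Sketch (assumptions: 16-byte stride, illustrative names and addresses).
# Addr-G expands a first address and a count into a list of concrete addresses.
def expand_addresses(first_addr: int, count: int, stride: int = 16) -> list[int]:
    return [first_addr + i * stride for i in range(count)]

fetch_addrs = expand_addresses(first_addr=0x1000, count=4)   # from Addr_S: 4 reads from the SM
store_addrs = expand_addresses(first_addr=0x0200, count=4)   # from Addr_D: 4 writes inside a core
print(fetch_addrs)   # [4096, 4112, 4128, 4144]
print(store_addrs)   # [512, 528, 544, 560]
```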
In an embodiment, the data packet generating unit being configured to generate, according to the control signal, a data packet based on the data released from the storage device includes:
the data packet generating unit PG-SW is configured to generate, according to the control signal, the address of the processing core indicated by the instruction, and to generate the data packet based on the released data, the storage address, and the address of the processing core indicated by the instruction.
In an embodiment, the instruction includes control bits, the control bit corresponding to each processing core that the instruction indicates should receive data is a preset value, and when the instruction parsing unit ID determines that a control bit is the preset value, it generates the control signal corresponding to that processing core.
In an embodiment, the instruction includes information bits indicating the processing cores that are to receive the data, a first fetch address, a fetch count, the first address of the storage addresses of the data in each processing core that is to receive the data, and control bits for controlling switches.
In a specific embodiment, the format of the instruction is as follows:
[Instruction format table: see Figure PCTCN2019130640-appb-000001 — fields: information bits ID_C, instruction code, fetch information, store information, fetch count, control bits C-C, reserved]
It should be noted that the multiple processing cores may be numbered first. Because the storage management apparatus needs to receive instructions through the NoC and also needs to send the generated data packets through the NoC to the one or more processing cores indicated by the instruction, the storage management apparatus is numbered together with the processing cores. For example, if there are N processing cores in the chip, the storage management apparatus may be assigned number 1, the first processing core C1 number 2, and the N-th processing core Cn number N+1.
It should be understood that the storage management apparatus could also be assigned number 0, the first processing core C1 number 1, and the N-th processing core Cn number N. The present invention takes the case in which the storage management apparatus is number 1 as an example, but is not limited thereto.
The instruction may be 148 bits.
The information bits, denoted ID_C, may be 4 bits and are used to indicate the address of the processing core that is to receive the data. There are N+1 information bit positions: the first of the N+1 bits corresponds to the storage management apparatus, and the second to the (N+1)-th bits correspond one-to-one to the N processing cores. The network on chip NoC uses these information bits to determine the receiver of an instruction or data packet, where the receiver may be a processing core or the storage management apparatus. That is, when the first bit of ID_C is 1, the instruction or data packet is sent to the storage management apparatus; when one of the second to (N+1)-th bits of ID_C is 1, the data packet is sent to the processing core corresponding to that bit.
The instruction code may be 8 bits and can encode various kinds of information. For example, the fetch instruction code may be set to 00000001; when the instruction parsing unit ID finds that the instruction code is 00000001, the instruction is a fetch instruction.
The control bits have N+1 bit positions, in one-to-one correspondence with the N+1 information bit positions, where the second to the (N+1)-th bits correspond one-to-one to the N processing cores. The control bit corresponding to each processing core indicated by the instruction is a preset value; for example, the preset value is 1, so when the instruction parsing unit recognizes that a control bit is 1, it generates a control signal. For example, if the chip has 4 processing cores, there are 5 control bits, and the second to fifth bits correspond one-to-one to the 4 processing cores. When C-C is 10110, the first and fourth bits are 0 and the second, third and fifth bits are 1, indicating that the first processing core, the second processing core and the fourth processing core all need to receive the data.
It should be noted that in this embodiment of the present invention, among the N+1 bits of C-C the first number on the right corresponds to the first bit (the storage management apparatus), the second number on the right corresponds to the second bit (the first processing core), and the last number counting from the right corresponds to the last bit (the N-th processing core). Of course, the first number on the left could instead correspond to the first bit (the storage management apparatus), the second number on the left to the second bit (the first processing core), and the last number counting from the left to the last bit (the N-th processing core); the present invention is not limited in this respect.
The fetch information, the store information and the fetch count may each be 16 bits.
Here, "reserved" refers to bits that are not encoded.
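To make the bit convention above concrete, here is a small sketch of how the control bits might be decoded into the set of receiving processing cores. The string representation and the function name are illustrative assumptions; the bit ordering (rightmost bit for the storage management apparatus, then cores C1 to CN) and the 10110 example follow the description above.

```python
# Decode the N+1 control bits C-C into the indices of the processing cores that
# must receive data. The rightmost bit corresponds to the storage management
# apparatus; bits 2..N+1, counted from the right, correspond to cores C1..CN.
def decode_targets(c_c: str) -> list[int]:
    bits = c_c[::-1]   # bits[0] is the rightmost bit (the storage management apparatus)
    return [i for i, b in enumerate(bits[1:], start=1) if b == "1"]

# The example from the text: 4 cores, C-C = 10110 -> cores C1, C2 and C4.
assert decode_targets("10110") == [1, 2, 4]
```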
FIG. 4 is a schematic structural diagram of a storage management apparatus according to another embodiment of the present invention.
In an embodiment, the data packet generating unit includes a Tx transmitter and an Rx receiver, where the Tx transmitter includes switches and a packet header generating unit, and the number of switches is greater than or equal to the number of processing cores connected to the network on chip NoC. Each processing core corresponds to one of the switches, and the control signal is used to control the state of the switch.
In an embodiment, the data packet generating unit PG-SW being configured to send multiple data packets to the network on chip NoC in parallel includes: the data packet generating unit is configured to turn on, according to the control signals, the switches corresponding to the processing cores indicated by the instruction, and to send the data packets to the network on chip in parallel, so that all of the data packets are sent to the processing cores through the network on chip.
In a specific embodiment, each switch includes a control terminal, an input terminal and an output terminal. The input terminal of each switch is configured to receive the storage address sent by the access indication unit and the data released from the storage device, and the control terminal of each switch is configured to enter the on state or the off state according to the received control signal.
The instruction includes control bits used to control the states of the switches; the control bit corresponding to each processing core indicated by the instruction is a preset value, and when the instruction parsing unit ID determines that a control bit is the preset value, it generates the control signal.
In an embodiment, the above storage management apparatus MME stores received data in the storage device through the Rx receiver.
In the embodiment shown in FIG. 4, the first processing core C1 sends the instruction to the storage management apparatus MME through the network on chip NoC, where the instruction cache unit IS caches the instruction and sends it to the instruction parsing unit ID.
The instruction parsing unit ID parses the instruction code as 00000001, indicating that data needs to be fetched from the storage device. The instruction parsing unit ID also parses the first fetch address Addr_S, a fetch count of M, and the control information C_C of 0110, indicating that the M data items need to be sent to the first processing core C1 and the second processing core C2, and parses the first address Addr_D1 of the storage addresses at which the M data items are to be stored in the first processing core C1 and the first address Addr_D2 of the storage addresses at which the M data items are to be stored in the second processing core C2. According to the control information 0110, the instruction parsing unit ID generates a first control signal C_C1, namely 0010, and a second control signal C_C2, namely 0100.
The instruction parsing unit ID sends the fetch count M, Addr_S and Addr_D to the access indication unit Addr_G, and sends the first control signal C_C1 and the second control signal C_C2 respectively to the first switch S1 and the second switch S2 in the data packet generating unit PG_SW, which correspond to the first processing core and the second processing core.
Addr_G generates the concrete fetch addresses according to M and Addr_S and fetches the data from the external storage device SM, generates the concrete store addresses Addr_D1_N and Addr_D2_N according to M, Addr_D1 and Addr_D2, and sends Addr_D1_N and Addr_D2_N to PG_SW.
PG_SW turns on the first switch and the second switch according to the received first control signal C_C1 and second control signal C_C2. The packet header generating unit ID-GEN generates first information bits ID_C1 and second information bits ID_C2 from the first control signal C_C1 and the second control signal C_C2, respectively; the first information bits are 0010 and the second information bits are 0100.
It should be noted that when the number of processing cores is less than 3, the fourth information bit may be set to 0.
PG_SW generates a first packet header from the storage address Addr_D1_N and the first information bits ID_C1 obtained at the output terminal of the first switch, and a second packet header from the storage address Addr_D2_N and the second information bits ID_C2 obtained at the output terminal of the second switch. PG_SW also generates a first data packet from the first packet header and the data released from the storage device obtained at the output terminal of the first switch, generates a second data packet from the second packet header and the data released from the storage device obtained at the output terminal of the second switch, and sends the first data packet and the second data packet to the network on chip NoC in parallel.
The NoC sends the first data packet to the first processing core and the second data packet to the second processing core.
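The worked example above can be summarized in the following sketch. The Packet dataclass, the build_packets helper and the integer payload are illustrative assumptions, while the control information 0110, the resulting information bits 0010 and 0100, and the parallel dispatch of the two packets follow the description.

```python
# Sketch of the FIG. 4 flow (names and types are illustrative assumptions).
from dataclasses import dataclass

@dataclass
class Packet:
    id_c: str        # information bits: tell the NoC which receiver to route to
    addr_d: int      # first storage address inside the target processing core
    payload: list    # the M data words released by the storage device SM

def decode_targets(c_c: str) -> list[int]:
    # Same decoding convention as sketched earlier: the rightmost bit is the MME.
    return [i for i, b in enumerate(c_c[::-1][1:], start=1) if b == "1"]

def build_packets(c_c: str, data: list, store_first_addr: dict) -> list[Packet]:
    packets = []
    for core in decode_targets(c_c):
        id_c = format(1 << core, f"0{len(c_c)}b")   # one-hot information bits for that core
        packets.append(Packet(id_c, store_first_addr[core], data))
    return packets

data = list(range(4))                               # M = 4 words fetched via Addr_S
pkts = build_packets("0110", data, {1: 0x0200, 2: 0x0200})
print([p.id_c for p in pkts])                       # ['0010', '0100'] -> C1 and C2, sent in parallel
```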
In an embodiment, the data packets received by the storage management apparatus MME and the data packets sent by the MME may both have the following format:
Information bits | Storage address of the data in the processing core | Data
(ID_C, 4 bit) | (Addr_D, 16 bit) | (128 bit)
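A bit-level sketch of this 148-bit packet layout follows. The field widths (4-bit ID_C, 16-bit Addr_D, 128-bit data) come from the table above; the placement of ID_C in the most significant bits and the helper names are assumptions.

```python
# Pack / unpack the 148-bit packet: 4-bit ID_C | 16-bit Addr_D | 128-bit data.
# Field order (ID_C in the most significant bits) is an assumption.
def pack_packet(id_c: int, addr_d: int, data: int) -> int:
    assert id_c < (1 << 4) and addr_d < (1 << 16) and data < (1 << 128)
    return (id_c << 144) | (addr_d << 128) | data

def unpack_packet(packet: int) -> tuple[int, int, int]:
    return packet >> 144, (packet >> 128) & 0xFFFF, packet & ((1 << 128) - 1)

p = pack_packet(id_c=0b0010, addr_d=0x0200, data=0xDEADBEEF)
assert unpack_packet(p) == (0b0010, 0x0200, 0xDEADBEEF)
```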
It should be noted that the embodiment shown in FIG. 4 has two processing cores. When there are three processing cores, the data packet generating unit is provided with three switches. If the instruction sent by the first processing core indicates that the first processing core and the second processing core both need the data while the third processing core does not, the instruction parsing unit ID may generate a third control signal to keep the third switch in the off state, or the instruction parsing unit ID may simply not generate a third control signal.
The storage management apparatus provided by the embodiments of the present invention can send the data packets to all processing cores that need the same data, so the processing cores that need the same data do not have to fetch the data from the storage device separately, which reduces latency, lowers the power consumption of the chip and of the storage device, and saves the time the processing cores that need the same data spend reading it.
FIG. 5 is a schematic structural diagram of a chip according to still another embodiment of the present invention.
As shown in FIG. 5, the chip includes a network on chip NoC, the multiple processing cores connected to the network on chip, a storage device, and the storage management apparatus MME provided by the above embodiments.
The multiple processing cores are N processing cores, namely the first processing core C1 to the N-th processing core Cn.
The storage management apparatus parses the instruction obtained from the network on chip, generates, according to the instruction and the data released from the storage module, the data packets corresponding to the processing cores indicated by the instruction as receivers of the data, sends all of the generated data packets to the network on chip, and delivers all of the data packets to the processing cores through the network on chip.
It should be noted that in FIG. 5 the single arrows indicate the direction of data transmission in this embodiment of the present invention, and the dashed lines indicate that bidirectional interaction is possible. For example, the dashed double arrow between the processing core C1 and the storage management apparatus MME indicates that the processing core and the storage management apparatus MME can send and receive data in both directions. The storage management apparatus MME and the storage device SM can likewise send and receive data in both directions.
An embodiment of the present invention provides a board card including one or more of the chips provided by the above embodiments.
An embodiment of the present invention provides an electronic device including one or more of the board cards provided by the above embodiment.
FIG. 6 is a schematic flowchart of a storage management method according to an embodiment of the present invention.
As shown in FIG. 6, the storage management method includes step S101 and step S102:
Step S101: the instruction parsing unit parses the instruction received from the network on chip and generates a control signal according to the instruction.
Step S102: the data packet generating unit generates, according to the control signal, at least one data packet based on the data released from the storage device, and sends all of the data packets to the network on chip.
An embodiment of the present invention further provides a computer storage medium on which a computer program is stored, where the program, when executed by a processor, implements the storage management method provided by the above embodiments.
Yet another embodiment of the present invention provides an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the program, implements the storage management method provided by the above embodiments.
Yet another embodiment of the present invention provides a computer program product including computer instructions which, when executed by a computing device, cause the computing device to execute the storage management method provided by the above embodiments.
It should be understood that the above specific embodiments of the present invention are used only to exemplarily illustrate or explain the principles of the present invention and do not constitute a limitation on the present invention. Therefore, any modifications, equivalent substitutions, improvements and the like made without departing from the spirit and scope of the present invention shall fall within the protection scope of the present invention. In addition, the appended claims of the present invention are intended to cover all changes and modifications that fall within the scope and boundaries of the appended claims, or the equivalents of such scope and boundaries.

Claims (11)

  1. A storage management apparatus, wherein the storage management apparatus is arranged between a network on chip and a storage device;
    the storage management apparatus comprising:
    an instruction parsing unit configured to parse an instruction received from the network on chip and generate a control signal according to the instruction; and
    a data packet generating unit configured to generate, according to the control signal, at least one data packet based on data released from the storage device, and to send all of the data packets to the network on chip.
  2. The apparatus according to claim 1, wherein the data packet generating unit being configured to generate, according to the control signal, at least one data packet based on the data released from the storage device comprises:
    the data packet generating unit is configured to generate, according to the control signal, the data packet corresponding to the processing core indicated by the instruction, based on the released data and that processing core.
  3. The apparatus according to claim 2, wherein,
    when the instruction indicates multiple processing cores,
    the data packet generating unit being configured to generate, based on the released data and the processing cores indicated by the instruction, the data packets corresponding to the processing cores and to send the data packets to the network on chip comprises:
    the data packet generating unit is configured to generate, based on the released data and the processing cores indicated by the instruction, the data packet corresponding to each of the processing cores, and to send the generated data packets to the network on chip in parallel.
  4. The apparatus according to any one of claims 1 to 3, wherein the instruction comprises fetch information and store information.
  5. The apparatus according to claim 3, wherein the apparatus further comprises an access indication unit;
    the access indication unit is configured to receive the fetch information and, according to the fetch information, instruct the storage device to release the data indicated by the fetch information;
    the access indication unit is further configured to receive the store information and, according to the store information, generate the storage address of the data packet corresponding to the processing core indicated by the instruction.
  6. The apparatus according to claim 4, wherein the data packet generating unit being configured to generate, according to the control signal, a data packet based on the data released from the storage device comprises:
    the data packet generating unit is configured to generate, according to the control signal, the address of the processing core indicated by the instruction, and to generate the data packet based on the released data, the storage address and the address of the processing core indicated by the instruction.
  7. The apparatus according to claim 5, wherein the data packet generating unit comprises switches, and the number of the switches is greater than or equal to the number of processing cores connected to the network on chip;
    wherein each processing core corresponds to one of the switches, and the control signal is used to control the state of the switch.
  8. The apparatus according to claim 6, wherein the data packet generating unit being configured to send multiple data packets to the network on chip in parallel comprises:
    the data packet generating unit is configured to turn on, according to the control signal, the switches corresponding to the processing cores indicated by the instruction, and to send the multiple data packets to the network on chip in parallel, so that the multiple data packets are sent to the corresponding processing cores through the network on chip.
  9. The apparatus according to claim 6 or 7, wherein each switch comprises a control terminal, an input terminal and an output terminal;
    the input terminal of each switch is configured to receive the storage address sent by the access indication unit and the data released from the storage device;
    the control terminal of each switch is configured to enter an on state or an off state according to the received control signal.
  10. The apparatus according to any one of claims 1 to 8, wherein
    the instruction comprises control bits, and the control bit corresponding to each processing core indicated by the instruction is a preset value;
    when the instruction parsing unit determines that the control bit is the preset value, the control signal is generated.
  11. A chip, comprising a network on chip, a plurality of processing cores connected to the network on chip, a storage device, and the storage management apparatus according to any one of claims 1 to 9.
PCT/CN2019/130640 2019-12-31 2019-12-31 Storage management apparatus and chip WO2021134521A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980102940.2A CN114902619B (en) 2019-12-31 2019-12-31 Storage management device and chip
PCT/CN2019/130640 WO2021134521A1 (en) 2019-12-31 2019-12-31 Storage management apparatus and chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/130640 WO2021134521A1 (en) 2019-12-31 2019-12-31 Storage management apparatus and chip

Publications (1)

Publication Number Publication Date
WO2021134521A1 true WO2021134521A1 (en) 2021-07-08

Family

ID=76686059

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/130640 WO2021134521A1 (en) 2019-12-31 2019-12-31 Storage management apparatus and chip

Country Status (2)

Country Link
CN (1) CN114902619B (en)
WO (1) WO2021134521A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070115939A1 (en) * 2005-10-12 2007-05-24 Samsung Electronics Co., Ltd. Network on chip system employing an advanced extensible interface protocol
CN101141261A (en) * 2007-10-10 2008-03-12 山东大学 Network-on-chip digital router and its parallel data transmission method
CN102567278A (en) * 2011-12-29 2012-07-11 中国科学院计算技术研究所 On-chip multi-core data transmission method and device
CN105095150A (en) * 2015-08-14 2015-11-25 中国电子科技集团公司第五十八研究所 Network interface supporting network-on-chip
CN105528311A (en) * 2015-12-11 2016-04-27 中国航空工业集团公司西安航空计算技术研究所 Memory reading-writing circuit and method based on data packet
CN108470009A (en) * 2018-03-19 2018-08-31 上海兆芯集成电路有限公司 Processing circuit and its neural network computing method

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080177979A1 (en) * 2006-03-01 2008-07-24 Gheorghe Stefan Hardware multi-core processor optimized for object oriented computing
CN100458751C (en) * 2007-05-10 2009-02-04 忆正存储技术(深圳)有限公司 Paralleling flash memory controller
US8200992B2 (en) * 2007-09-24 2012-06-12 Cognitive Electronics, Inc. Parallel processing computer systems with reduced power consumption and methods for providing the same
FR2925187B1 (en) * 2007-12-14 2011-04-08 Commissariat Energie Atomique SYSTEM COMPRISING A PLURALITY OF TREATMENT UNITS FOR EXECUTING PARALLEL STAINS BY MIXING THE CONTROL TYPE EXECUTION MODE AND THE DATA FLOW TYPE EXECUTION MODE
US8490111B2 (en) * 2011-04-16 2013-07-16 Throughputer, Inc. Efficient network and memory architecture for multi-core data processing system
CN102508643A (en) * 2011-11-16 2012-06-20 刘大可 Multicore-parallel digital signal processor and method for operating parallel instruction sets
CN102521201A (en) * 2011-11-16 2012-06-27 刘大可 Multi-core DSP (digital signal processor) system-on-chip and data transmission method
CN103034615B (en) * 2012-12-07 2016-04-13 无锡美森微电子科技有限公司 A kind of being applicable to flows the memory management method applying polycaryon processor
CN103092788B (en) * 2012-12-24 2016-01-27 华为技术有限公司 Polycaryon processor and data access method
WO2015050594A2 (en) * 2013-06-16 2015-04-09 President And Fellows Of Harvard College Methods and apparatus for parallel processing
CN103744644B (en) * 2014-01-13 2017-03-01 上海交通大学 The four core processor systems built using four nuclear structures and method for interchanging data
US9658675B1 (en) * 2015-02-19 2017-05-23 Amazon Technologies, Inc. Achieving power saving by a circuit including pluralities of processing cores based on status of the buffers used by the processing cores
US20170147513A1 (en) * 2015-11-24 2017-05-25 Knuedge, Inc. Multiple processor access to shared program memory
US10068041B2 (en) * 2016-02-01 2018-09-04 King Fahd University Of Petroleum And Minerals Multi-core compact executable trace processor
CN106293642B (en) * 2016-08-08 2018-10-02 合肥工业大学 A kind of branch process module and its branch process mechanism for coarseness multinuclear computing system
CN109241641B (en) * 2018-09-18 2022-09-13 西安微电子技术研究所 Dual-core ARM type SoC application verification realization method and application verification board


Also Published As

Publication number Publication date
CN114902619B (en) 2023-07-25
CN114902619A (en) 2022-08-12

Similar Documents

Publication Publication Date Title
US20210240634A1 (en) Highly integrated scalable, flexible dsp megamodule architecture
CN110647480B (en) Data processing method, remote direct access network card and equipment
JP7221242B2 (en) Neural network data processor, method and electronics
US10049061B2 (en) Active memory device gather, scatter, and filter
JP5859017B2 (en) Control node for processing cluster
KR101150928B1 (en) Network architecture and method for processing packet data using the same
TWI506444B (en) Processor and method to improve mmio request handling
KR102409024B1 (en) Multi-core interconnect in a network processor
CN109542830B (en) Data processing system and data processing method
US11301408B1 (en) Asymmetric read / write architecture for enhanced throughput and reduced latency
US20230017643A1 (en) Composable infrastructure enabled by heterogeneous architecture, delivered by cxl based cached switch soc
US20190294570A1 (en) Technologies for dynamic multi-core network packet processing distribution
US20220358002A1 (en) Network attached mpi processing architecture in smartnics
US7657724B1 (en) Addressing device resources in variable page size environments
US10346049B2 (en) Distributed contiguous reads in a network on a chip architecture
US7466716B2 (en) Reducing latency in a channel adapter by accelerated I/O control block processing
CN100504824C (en) Opportunistic read completion combining
WO2021134521A1 (en) Storage management apparatus and chip
US11456972B2 (en) Methods and arrangements to accelerate array searches
WO2022199357A1 (en) Data processing method and apparatus, electronic device, and computer-readable storage medium
CN113138711B (en) Storage management device and chip
WO2018052718A1 (en) Method and apparatus for masking and transmitting data
CN113778937A (en) System and method for performing transaction aggregation in a network on chip (NoC)
US11960727B1 (en) System and method for large memory transaction (LMT) stores
CN108234147A (en) DMA broadcast data transmission method based on host counting in GPDSP

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19958209

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19958209

Country of ref document: EP

Kind code of ref document: A1