WO2021134521A1 - Storage management apparatus and chip - Google Patents

Storage management apparatus and chip

Info

Publication number
WO2021134521A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
instruction
data packet
chip
processing core
Application number
PCT/CN2019/130640
Other languages
French (fr)
Chinese (zh)
Inventor
罗飞
王维伟
Original Assignee
北京希姆计算科技有限公司
Application filed by 北京希姆计算科技有限公司 filed Critical 北京希姆计算科技有限公司
Priority to CN201980102940.2A priority Critical patent/CN114902619B/en
Priority to PCT/CN2019/130640 priority patent/WO2021134521A1/en
Publication of WO2021134521A1 publication Critical patent/WO2021134521A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2115/00 Details relating to the type of the circuit
    • G06F 2115/02 System on chip [SoC] design
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present invention relates to the field of chip technology, in particular to a storage management device and a chip.
  • the chip is the cornerstone of data processing, and it fundamentally determines people's ability to process data. From the perspective of application fields, there are two main routes for chips: one is the general-purpose route, such as the central processing unit (CPU); these chips provide great flexibility, but their effective computing power on algorithms in specific fields is relatively low. The other is the dedicated-chip route, such as the Tensor Processing Unit (TPU); these chips deliver higher effective computing power in certain specific fields, but in the more general, flexible and fast-changing fields their processing capability is poor, or they cannot handle the workload at all.
  • due to the wide variety and huge amount of data in the intelligent era, the chip is required to have both extremely high flexibility, so that it can handle rapidly changing algorithms from different fields, and extremely strong processing capability, so that it can quickly process an extremely large and rapidly growing amount of data.
  • the object of the present invention is to provide a storage management device and a chip. The storage management device can parse an instruction received from the on-chip network, generate at least one data packet based on the data released from an external storage device, and send all of the data packets to the on-chip network, which forwards them to the corresponding processing cores. As a result, each processing core that requires the same data does not need to fetch it from the storage device separately, which reduces latency, saves the power consumption of the chip and the storage device, and saves the time those processing cores spend reading the data.
  • the first aspect of the present invention provides a storage management device arranged between the on-chip network and the storage device. The storage management device includes: an instruction parsing unit, configured to parse an instruction received from the on-chip network and to generate a control signal according to the instruction; and a data packet generating unit, configured to generate, according to the control signal, at least one data packet based on the data released from the storage device and to send all of the data packets to the on-chip network.
  • the storage management device provided by the embodiments of the present invention can parse instructions received from the on-chip network, generate at least one data packet based on the data released from the external storage device, and send all of the data packets to the on-chip network. By arranging a storage management device between the on-chip network and the storage device and having it send the data packets to all processing cores that require the same data, each of those processing cores no longer needs to fetch the data from the storage device separately, which reduces latency, saves the power consumption of the chip and the storage device, and saves the time those processing cores spend reading the data.
  • the data packet generating unit is configured to generate a data packet based on the data released from the storage device according to the control signal; that is, according to the control signal, it generates the data packet corresponding to the processing core based on the released data and the processing core indicated by the instruction.
  • when the instruction indicates multiple processing cores, the data packet generating unit is configured to generate, based on the released data and the processing cores indicated by the instruction, a data packet corresponding to each of those processing cores, and to send the generated data packets to the on-chip network in parallel.
  • the instruction includes fetch information and store information.
  • the device further includes an access number indicating unit; the access number indicating unit is configured to receive the fetch information and to instruct the storage device to release the data indicated by the fetch information, and is further configured to receive the store information and to generate, according to the store information, the storage address of the data corresponding to each processing core indicated by the instruction.
  • the data packet generating unit is configured to generate a data packet based on the data released from the storage device according to the control signal; specifically, it generates the address of the processing core indicated by the instruction according to the control signal, and generates the data packet based on the released data, the storage address, and the address of the processing core indicated by the instruction.
  • the data packet generating unit includes switches, and the number of switches is greater than or equal to the number of processing cores connected to the on-chip network; each processing core corresponds to one switch, and the control signal is used to control the state of the switch.
  • the data packet generating unit is configured to send the data packets in parallel to the processing cores indicated by the instruction; specifically, according to the control signal it turns on the switches corresponding to the processing cores indicated by the instruction and sends the data packets to the on-chip network in parallel, so that the on-chip network forwards each data packet to the corresponding processing core.
  • each switch includes a control terminal, an input terminal, and an output terminal; the input terminal of each switch is used to receive the storage address sent by the access number indicating unit and the data released by the storage device, and the control terminal of each switch enters an on state or an off state according to the received control signal.
  • the instruction parsing unit is configured to send the control signal to the control terminal to turn the switch on, and the data packet generating unit generates a data packet from the address of the processing core together with the storage address and the data obtained through the input terminal.
  • the instruction includes control bits for controlling the switches; the control bit corresponding to each processing core indicated by the instruction is a preset value, and when the instruction parsing unit determines that a control bit equals the preset value, it generates the control signal.
  • the instruction includes the number of fetches, and the fetch information includes the first address of the fetch addresses; the access number indicating unit is configured to generate the fetch addresses according to the first address of the fetch addresses and the number of fetches, and to instruct the storage device to release the data according to those fetch addresses.
  • the store information includes the first address of the storage addresses of the data in the processing cores indicated by the instruction, and the access number indicating unit is configured to generate the storage addresses according to the first address of the storage addresses and the number of fetches.
  • the instruction includes: control bits, information bits, the first fetch address, the number of fetches, and the first address of the data's storage addresses in each processing core that is to receive the data.
  • the control bits are used to control the state of the switches, and the information bits are used to indicate the processing cores that are to receive the data.
  • the storage management device further includes an instruction caching unit, configured to receive the instruction sent by the on-chip network, cache the instruction, and send the cached instruction to the instruction parsing unit.
  • a chip including an on-chip network, a plurality of processing cores connected to the on-chip network, a storage device, and the storage management device provided in the first aspect.
  • a card board which includes one or more chips provided in the second aspect.
  • an electronic device including one or more cards provided in the third aspect.
  • a storage management method including: parsing an instruction received from an on-chip network and generating a control signal according to the instruction; and, according to the control signal, generating at least one data packet based on the data released from the storage device and sending all of the data packets to the on-chip network.
  • a computer storage medium having a computer program stored on the computer storage medium, and when the program is executed by a processor, a storage management method of the fifth aspect is implemented.
  • an electronic device including a memory, a processor, and a computer program stored on the memory and runnable on the processor, where the processor implements a storage management method of the fifth aspect when executing the program.
  • a computer program product which includes computer instructions, and when the computer instructions are executed by a computing device, the computing device can execute a storage management method of the fifth aspect.
  • the storage management device provided by the embodiments of the present invention can parse instructions received from the on-chip network, generate at least one data packet based on the data released from the external storage device, and send all of the data packets to the on-chip network. By arranging a storage management device between the on-chip network and the storage device and having it send the data packets to all processing cores that require the same data, each of those processing cores no longer needs to fetch the data from the storage device separately, which reduces latency, saves the power consumption of the chip and the storage device, and saves the time those processing cores spend reading the data.
  • FIG. 1 is a schematic diagram of a flow of multiple cores obtaining the same data in the prior art
  • FIG. 2 is a schematic diagram of a flow of multiple cores obtaining the same data in the prior art
  • FIG. 3 is a schematic structural diagram of a storage management device according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a storage management device according to another embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a chip according to still another embodiment of the present invention.
  • Fig. 6 is a schematic flowchart of a storage management method according to an embodiment of the present invention.
  • in an image recognition application, for example, the chip has N cores, all cores process images independently and in parallel, and the algorithm and weights used by each core are the same. The way to maximize the computing power of the chip with the least delay and power consumption is therefore for all cores to share the same copy of the weights, that is, for multiple cores to fetch the same data.
  • FIG. 1 is a schematic diagram of a flow of multiple cores acquiring the same data in the prior art.
  • as shown in FIG. 1, a CPU includes N processing cores and a shared memory (Shared Memory, SM). The N processing cores are processing core C1 to processing core Cn, and each processing core separately reads the data it needs from the SM. When the processing cores need the same data, for example the same weights or other identical parameters, the flow of multiple cores fetching data is as follows:
  • processing core C1 issues a fetch instruction to the SM and receives the data released by the SM; after processing core C1 has finished fetching, the other processing cores issue fetch instructions to the SM in turn and fetch the data stored at the same location from the SM again.
  • the single arrow in FIG. 1 indicates the data transmission direction for multiple cores to obtain the same data, and the dashed line indicates that the core can implement two-way data transmission with the shared memory module SM.
  • this approach has the following drawbacks: when multiple processing cores need the same data, each processing core has to read it independently. On the one hand, the same data is read from the SM multiple times, resulting in high power consumption in the SM. On the other hand, if the SM stores only one copy of the data, the processing cores have to read it one after another; a core that is later in the order must wait until the earlier cores have finished reading before it can read, so its read of the data is delayed. Sometimes a core can only start computing after it has read the data, so the later cores wait a long time, computing power is wasted, and the performance of the chip is reduced.
  • Fig. 2 is a schematic diagram of a process in which multiple cores obtain the same data in the prior art.
  • as shown in FIG. 2, a GPU includes N processing cores and an SM. The N processing cores are processing core C1 to processing core Cn. The SM stores multiple identical copies of the data contiguously; when one processing core reads the required data, the SM releases the contiguously stored identical copies for the multiple processing cores to read.
  • the flow of multiple processing cores fetching data is as follows:
  • the first processing core C1 issues a fetch instruction to the SM, the SM releases the contiguously stored identical data, and the processing cores C1 to Cn that require the same data each receive a copy of the data released by the SM. This approach requires the SM to keep multiple contiguous copies of the same data, which wastes the memory of the SM.
  • the single arrow in FIG. 2 indicates the direction of data transmission for multiple cores to obtain the same data, and the dotted line indicates that the cores can implement two-way data transmission with the shared memory module SM.
  • Fig. 3 is a schematic structural diagram of a storage management device according to an embodiment of the present invention.
  • the storage management device is arranged between the on-chip network and the storage device; wherein the on-chip network NoC is connected to N processing cores, and N is an integer greater than or equal to 1.
  • the storage management device includes: an instruction parsing unit ID and a data packet generating unit PG-SW.
  • the instruction parsing unit ID is used for parsing the instruction received from the on-chip network, and generating a control signal according to the instruction.
  • the data packet generating unit PG-SW is configured to generate at least one data packet based on the data released from the storage device according to the control signal, and send all the data packets to the network on chip.
  • the storage management device can parse the instructions received from the on-chip network, generate at least one data packet based on the data released from the external storage device, and send all of the data packets to the on-chip network. By arranging a storage management device between the on-chip network and the storage device and having it send the data packets to all processing cores that require the same data, processing cores that require the same data do not need to fetch the data from the storage device separately, which reduces the delay, saves the power consumption of the chip and the storage device, and saves the time for processing cores that require the same data to read the data.
  • the data packet generating unit PG-SW is configured to generate a data packet based on the data released from the storage device according to the control signal; that is, according to the control signal, it generates the data packet corresponding to the processing core based on the released data and the processing core indicated by the instruction.
  • when the instruction indicates multiple processing cores, the data packet generating unit generates, based on the released data and the processing cores indicated by the instruction, a data packet corresponding to each processing core, and sends the generated data packets to the on-chip network in parallel; each data packet is then delivered through the on-chip network to the processing core indicated by the instruction.
  • here, corresponding to each processing core refers to corresponding to the address of each processing core.
  • corresponding to each processing core may also refer to corresponding both to the address of each processing core and to the storage address of the data in each processing core.
  • the instruction includes fetch information and store information.
  • the storage management device further includes an access number indicating unit Addr-G; the access number indicating unit Addr-G is configured to receive the fetch information and, according to the fetch information, instruct the storage device to release the data indicated by the fetch information, where the data can be parameters such as weights.
  • the instruction includes the number of fetches, and the fetch information includes the first address of the fetch addresses; the access number indicating unit is used to generate the fetch addresses according to the first address of the fetch addresses and the number of fetches, and to instruct the storage device to release the data according to those fetch addresses.
  • setting the fetch information to be the first address of the fetch addresses saves instruction bits, so that the instruction can carry more other information; it also makes it more convenient for the access number indicating unit to extract the data according to the first fetch address and the number of fetches carried in the instruction.
  • the access number indicating unit Addr-G is further configured to receive the store information and to generate, according to the store information, the storage address of the data corresponding to each processing core indicated by the instruction.
  • the storage management device further includes an instruction caching unit for receiving instructions sent by the on-chip network, caching the instructions, and sending the cached instructions to the instruction parsing unit ID.
  • the instruction cache unit IS caches the instructions in the order of the time when the instructions are received, and sends the instructions to the instruction parsing unit ID in the order of the cache.
  • the instruction cache unit IS may be an instruction stack for temporarily storing the received data access instruction, and the storage management device MME will execute the instructions in the stack one by one.
  • the store information includes the first address of the storage addresses of the data in the processing cores indicated by the instruction, and the access number indicating unit is configured to generate the storage addresses according to the first address of the storage addresses and the number of fetches.
  • setting the store information to be the first address of the data's storage addresses in the processing core, rather than carrying every storage address of the data in the processing core, saves instruction bits.
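  • The address expansion just described can be illustrated with a short C sketch. This is not part of the patent text: the 32-bit word size, the 4-byte stride, and the function names are assumptions made purely for illustration.

        #include <stdint.h>

        /* Illustrative only: expand "first address + number of fetches" into
         * concrete addresses, assuming one 32-bit word (4 bytes) per fetch. */
        #define WORD_BYTES 4u

        /* Fetch addresses in the storage device: Addr_S, Addr_S + 4, ... */
        static void gen_fetch_addrs(uint32_t addr_s, uint32_t m, uint32_t *out)
        {
            for (uint32_t i = 0; i < m; i++)
                out[i] = addr_s + i * WORD_BYTES;
        }

        /* Storage addresses inside one destination core: Addr_D, Addr_D + 4, ... */
        static void gen_store_addrs(uint32_t addr_d, uint32_t m, uint32_t *out)
        {
            for (uint32_t i = 0; i < m; i++)
                out[i] = addr_d + i * WORD_BYTES;
        }

  • Under this sketch, carrying only the first addresses and the count M in the instruction, as described above, keeps the instruction short while still addressing M data words.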
  • the data packet generating unit PG-SW is configured to generate a data packet based on the data released from the storage device according to the control signal; specifically, it generates the address of the processing core indicated by the instruction according to the control signal, and generates the data packet based on the released data, the storage address, and the address of the processing core indicated by the instruction.
  • the instruction includes control bits, and the control bit corresponding to each processing core that the instruction indicates should receive the data is a preset value; when the instruction parsing unit ID determines that a control bit equals the preset value, it generates the control signal corresponding to that processing core.
  • the instruction includes the information bits used to indicate the processing cores that are to receive the data, the first fetch address, the number of fetches, the first address of the data's storage addresses in each processing core that is to receive the data, and the control bits used to control the switches.
  • the format of the instruction is as follows:
  • the multiple processing cores can first be numbered. Since the storage management device needs to receive instructions through the NoC and also needs to send the generated data packets through the NoC to the one or more processing cores indicated by the instruction, the storage management device is numbered together with the processing cores. For example, if there are N processing cores in the chip, the number of the storage management device can be set to 1, the number of the first processing core C1 to 2, and the number of the Nth processing core Cn to N+1.
  • alternatively, the number of the storage management device can be set to 0, the number of the first processing core C1 to 1, and the number of the Nth processing core Cn to N; the present invention takes a storage management device numbered 1 as an example without being limited to it.
  • the instruction can be 148 bits.
  • the information bits are denoted ID_C, can for example be 4 bits, and are used to indicate the address of the processing core that is to receive the data.
  • the information bits consist of N+1 bits; the first of the N+1 bits corresponds to the storage management device, and the second to the (N+1)th bits correspond to the N processing cores respectively.
  • the network-on-chip NoC uses this information bit to determine the recipient of the instruction or data packet.
  • the receiver here can be a processing core or a storage management device. That is, when the first bit in ID_C is 1, it means that the instruction or data packet is sent to the storage management device; when one of the second to N+1 bits in ID_C is 1, it means the data The packet is sent to the processing core corresponding to this bit.
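  • Purely as an illustration of the one-hot addressing just described, the recipients encoded in ID_C could be resolved as follows; the core count and the C helper are assumptions, not part of the patent, and the right-to-left bit numbering follows the example given later in this description.

        #include <stdint.h>
        #include <stdio.h>

        #define N_CORES 4u   /* assumed number of processing cores */

        /* ID_C has N+1 bits: bit 0 (rightmost) = storage management device (MME),
         * bits 1..N = processing cores C1..Cn. */
        static void print_recipients(uint32_t id_c)
        {
            if (id_c & 0x1u)
                printf("recipient: storage management device (MME)\n");
            for (uint32_t k = 1; k <= N_CORES; k++)
                if (id_c & (1u << k))
                    printf("recipient: processing core C%u\n", k);
        }

  • For example, print_recipients(0x2) would report only the first processing core C1, matching an ID_C of 0010.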
  • the instruction code can be 8 bits.
  • the instruction code can encode a variety of operations. For example, the fetch instruction code can be set to 00000001; when the instruction parsing unit ID finds that the instruction code is 00000001, it knows that this is a fetch instruction.
  • the control bits also consist of N+1 bits, which correspond one-to-one with the N+1 information bits; the second to the (N+1)th bits correspond to the N processing cores respectively.
  • the control bit corresponding to each processing core indicated by the instruction is a preset value, for example 1; that is, when the instruction parsing unit recognizes that a control bit is 1, it generates a control signal.
  • for example, when the chip has 4 processing cores, the control bits have 5 bits, and the second to fifth bits correspond one-to-one to the 4 processing cores. If CC is 10110, the first and fourth bits are 0, indicating that the storage management device and the third processing core do not need to receive the data, while the second, third, and fifth bits are 1, indicating that the first processing core, the second processing core, and the fourth processing core all need to receive the data.
  • in this example the first digit on the right of the N+1 digits of CC corresponds to the first bit (the storage management device), the second digit on the right corresponds to the second bit (the first processing core), and the last digit on the left corresponds to the last bit (the Nth processing core); the numbering can also be reversed, with the first digit on the left corresponding to the first bit, and the present invention is not limited to either convention.
  • the fetch information, the store information, and the number of fetches may each be 16 bits.
  • reserved means that those bits have not been assigned a coding.
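  • The example field widths above can be collected into a single sketch. The following C structure and helper are assumptions made only for illustration (a real instruction would pack these fields, together with reserved bits, into the 148-bit word mentioned above); they are not the patent's definition of the encoding.

        #include <stdbool.h>
        #include <stdint.h>

        #define N_CORES      4u      /* assumed core count for this example */
        #define OPCODE_FETCH 0x01u   /* example fetch instruction code 00000001 */

        /* Unpacked view of the example instruction fields described above. */
        struct mme_instr {
            uint8_t  id_c;                  /* information bits: one-hot recipient(s) */
            uint8_t  opcode;                /* 8-bit instruction code */
            uint8_t  cc;                    /* control bits, N+1 bits, one per switch */
            uint16_t fetch_first;           /* Addr_S: first fetch address in the SM */
            uint16_t fetch_count;           /* M: number of data words to fetch */
            uint16_t store_first[N_CORES];  /* first storage address per destination core */
        };

        /* A control signal for core k (1..N) is generated when the instruction is a
         * fetch and the control bit of that core equals the preset value 1. */
        static bool control_signal_for_core(const struct mme_instr *ins, unsigned core)
        {
            if (ins->opcode != OPCODE_FETCH || core == 0 || core > N_CORES)
                return false;
            return (ins->cc >> core) & 0x1u;   /* bit 0 = MME, bits 1..N = C1..Cn */
        }

  • With CC set to 10110 as in the example above, control_signal_for_core returns true for the first, second, and fourth processing cores and false for the third, matching the reading of the control bits given in this description.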
  • FIG. 4 is a schematic structural diagram of a storage management device according to another embodiment of the present invention.
  • the data packet generating unit includes a Tx transmitter and an Rx receiver, where the Tx transmitter includes switches and a packet header generating unit, and the number of switches is greater than or equal to the number of processing cores connected to the on-chip network NoC; each processing core corresponds to one switch, and the control signal is used to control the state of the switch.
  • the data packet generating unit PG-SW is configured to send a plurality of the data packets to the on-chip network NoC in parallel; specifically, according to the control signal it turns on the switches corresponding to the processing cores indicated by the instruction and sends the data packets to the on-chip network in parallel, so that all of the data packets are delivered to the processing cores through the on-chip network.
  • each switch includes a control terminal, an input terminal, and an output terminal; the input terminal of each switch is used to receive the storage address sent by the access number indicating unit and the data released from the storage device, and the control terminal of each switch enters the on state or the off state according to the received control signal.
  • the control bits included in the instruction are used to control the state of the switches; the control bit corresponding to each processing core indicated by the instruction is a preset value, and when the instruction parsing unit ID determines that a control bit is the preset value, it generates the control signal.
  • the above-mentioned storage management device MME stores the received data in the storage device through the Rx receiver.
  • as an example, the first processing core C1 sends an instruction to the storage management device MME through the on-chip network NoC; the instruction cache unit IS caches the instruction and sends it to the instruction parsing unit ID.
  • the instruction parsing unit ID decodes the instruction code as 00000001, which means that data needs to be fetched from the storage device. It further parses the first fetch address Addr_S, the number of fetches M, and the control information C_C, which is 0110 and indicates that the M data are to be sent to the first processing core C1 and the second processing core C2, as well as the first address Addr_D1 of the storage addresses of the M data in the first processing core C1 and the first address Addr_D2 of the storage addresses of the M data in the second processing core C2.
  • the instruction parsing unit ID generates the first control signal C_C1 (0010) and the second control signal C_C2 (0100) according to the control information 0110.
  • the instruction parsing unit ID sends the number M, Addr_S, Addr_D1 and Addr_D2 to the access number indicating unit Addr_G, and sends the first control signal C_C1 and the second control signal C_C2 to the first switch S1 and the second switch S2 in the data packet generating unit PG_SW, which correspond to the first processing core and the second processing core respectively.
  • Addr_G generates the specific fetch addresses according to M and Addr_S and fetches the data from the external storage device SM; it also generates the specific storage addresses Addr_D1_N and Addr_D2_N according to M, Addr_D1 and Addr_D2, and sends Addr_D1_N and Addr_D2_N to PG_SW.
  • the packet header generating unit ID-GEN generates the first information bits ID_C1 and the second information bits ID_C2 according to the first control signal C_C1 and the second control signal C_C2 respectively; the first information bits are 0010 and the second information bits are 0100.
  • the fourth bit of the information bits can be set to 0.
  • PG_SW generates a first packet header according to the storage address Addr_D1_N and the first information bit ID_C1 obtained from the output terminal of the first switch, and generates a second packet header according to the storage address Addr_D2_N and the second information bit ID_C2 obtained from the output terminal of the second switch.
  • PG_SW also generates the first data packet based on the first packet header and the data released from the storage device obtained from the output terminal of the first switch, generates the second data packet based on the second packet header and the data released from the storage device obtained from the output terminal of the second switch, and sends the first data packet and the second data packet to the on-chip network NoC in parallel.
  • NoC sends the first data packet to the first processing core, and sends the second data packet to the second processing core.
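  • The packet assembly in this example can be sketched in C as follows. The packet layout, the 32-bit word size, and the helper names are assumptions made only for illustration and are not the patent's definition of the data packet format.

        #include <stddef.h>
        #include <stdint.h>

        /* Assumed data packet: a header carrying the destination information bits and
         * the storage address inside the destination core, plus one data word. */
        struct data_packet {
            uint8_t  id_c;        /* information bits, e.g. 0010 for C1, 0100 for C2 */
            uint32_t store_addr;  /* Addr_D1_N or Addr_D2_N */
            uint32_t payload;     /* one data word released by the SM */
        };

        /* Build the two packet streams of the example: the same SM data is packed
         * once for C1 (ID_C1 = 0010) and once for C2 (ID_C2 = 0100). */
        static void assemble_example(const uint32_t *sm_data, size_t m,
                                     const uint32_t *addr_d1_n, const uint32_t *addr_d2_n,
                                     struct data_packet *to_c1, struct data_packet *to_c2)
        {
            for (size_t i = 0; i < m; i++) {
                to_c1[i] = (struct data_packet){ 0x2u /* 0010 */, addr_d1_n[i], sm_data[i] };
                to_c2[i] = (struct data_packet){ 0x4u /* 0100 */, addr_d2_n[i], sm_data[i] };
            }
            /* In hardware, both streams would be driven onto the NoC in parallel. */
        }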
  • the data packet received by the storage management device MME or the data packet sent by the MME may be in the following format:
  • in this example the data packet generating unit is provided with three switches, and the third processing core does not require the data; the instruction parsing unit ID may generate a third control signal to put the third switch in the off state, or it may simply not generate a third control signal.
  • the storage management device provided by the embodiments of the present invention can send data packets to all processing cores that require the same data, so that each processing core that requires the same data does not need to fetch the data from the storage device separately, which reduces the time delay, saves the power consumption of the chip and the storage device, and saves the time for processing cores that require the same data to read the data.
  • Fig. 5 is a schematic structural diagram of a chip according to still another embodiment of the present invention.
  • the chip includes an on-chip network NoC, a plurality of processing cores connected to the on-chip network, a storage device, and the storage management device MME provided in the above-mentioned embodiments.
  • the chip has N processing cores, that is, the first processing core C1 to the Nth processing core Cn.
  • the storage management device parses the instruction obtained from the network-on-chip, generates, based on the instruction and the data released from the storage device, a data packet corresponding to each processing core that is to receive the data indicated by the instruction, and sends all of the generated data packets to the network-on-chip, which delivers them to the processing cores.
  • in FIG. 5, the single arrows indicate the direction of data transmission provided by the embodiment of the present invention, and the dashed lines indicate that two-way interaction can be realized.
  • the dashed double arrow between processing core C1 and the storage management device MME indicates that the processing core and the storage management device MME can send and receive data in both directions; the storage management device MME and the storage device SM can likewise transmit and receive data in both directions.
  • An embodiment of the present invention provides a card board, which includes one or more chips provided in the foregoing embodiments.
  • An embodiment of the present invention provides an electronic device, including one or more of the card boards provided in the foregoing embodiments.
  • Fig. 6 is a schematic flowchart of a storage management method according to an embodiment of the present invention.
  • the storage management method includes steps S101 and S102:
  • Step S101: the instruction parsing unit parses the instruction received from the on-chip network and generates a control signal according to the instruction.
  • Step S102: the data packet generating unit generates, according to the control signal, at least one data packet based on the data released from the storage device, and sends all of the data packets to the on-chip network.
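  • Steps S101 and S102 can be tied together in a compact, non-normative C sketch; the types, the 4-byte stride, and the single destination shown here are placeholders for illustration, not interfaces defined by the method.

        #include <stddef.h>
        #include <stdint.h>

        /* Placeholder types for the sketch. */
        typedef struct { uint8_t dest_id_c; uint32_t store_first; uint32_t count; } ctrl_t;
        typedef struct { uint8_t id_c; uint32_t store_addr; uint32_t payload; } pkt_t;

        /* S101: parse the instruction received from the NoC into a control signal. */
        static ctrl_t parse_instruction(uint8_t id_c, uint32_t store_first, uint32_t m)
        {
            ctrl_t c = { id_c, store_first, m };
            return c;
        }

        /* S102: generate one packet per released data word; handing the packets to
         * the NoC is represented here by filling the output array. */
        static size_t generate_packets(ctrl_t c, const uint32_t *released, pkt_t *out)
        {
            for (uint32_t i = 0; i < c.count; i++) {
                out[i].id_c       = c.dest_id_c;
                out[i].store_addr = c.store_first + 4u * i;   /* assumed 4-byte stride */
                out[i].payload    = released[i];
            }
            return c.count;
        }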
  • An embodiment of the present invention also provides a computer storage medium having a computer program stored on the computer storage medium, and when the program is executed by a processor, the storage management method provided in the foregoing embodiment is implemented.
  • Another embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and runnable on the processor; when the processor executes the program, it implements the storage management method provided in the foregoing embodiment.
  • Another embodiment of the present invention provides a computer program product, which includes computer instructions, and when the computer instructions are executed by a computing device, the computing device can execute a storage management method provided in the foregoing embodiments.

Abstract

A storage management apparatus and a chip. The storage management apparatus is disposed between a network on chip and a storage apparatus, and the storage management apparatus comprises: an instruction parsing unit, which is used to parse an instruction received from the network on chip, and generate a control signal according to the instruction; and a data packet generation unit, which is used to generate, according to the control signal, at least one data packet on the basis of data released from the storage apparatus, and send all the data packets to the network on chip. The described storage management apparatus can send data packets to all processing cores that require the same data, thus each processing core that requires the same data need not separately fetch data from the storage apparatus, thereby reducing delay, reducing the power consumption of the chip and the storage apparatus, and reducing the time spent by processing cores that require the same data reading data.

Description

Storage management device and chip
Technical field
The present invention relates to the field of chip technology, and in particular to a storage management device and a chip.
Background technology
With the development of science and technology, human society is rapidly entering the intelligent era. An important feature of the intelligent era is that people obtain more and more types of data and ever larger amounts of data, while the required speed of data processing keeps increasing.
A chip is the cornerstone of data processing; it fundamentally determines people's ability to process data. From the perspective of application fields, there are two main routes for chips: one is the general-purpose route, such as the central processing unit (CPU); these chips provide great flexibility, but their effective computing power on algorithms in specific fields is relatively low. The other is the dedicated-chip route, such as the Tensor Processing Unit (TPU); these chips deliver higher effective computing power in certain specific fields, but in the more general, flexible and fast-changing fields their processing capability is poor, or they cannot handle the workload at all.
Because the intelligent era brings a wide variety and a huge amount of data, a chip is required to have both extremely high flexibility, so that it can handle rapidly changing algorithms from different fields, and extremely strong processing capability, so that it can quickly process an extremely large and rapidly growing amount of data.
Summary of the invention
(1) Purpose of the invention
The object of the present invention is to provide a storage management device and a chip. The storage management device can parse an instruction received from the on-chip network, generate at least one data packet based on the data released from an external storage device, and send all of the data packets to the on-chip network, which forwards them to the corresponding processing cores. As a result, each processing core that requires the same data does not need to fetch it from the storage device separately, which reduces latency, saves the power consumption of the chip and the storage device, and saves the time those processing cores spend reading the data.
(2) Technical solution
To solve the above problems, a first aspect of the present invention provides a storage management device arranged between the on-chip network and a storage device. The storage management device includes: an instruction parsing unit, configured to parse an instruction received from the on-chip network and to generate a control signal according to the instruction; and a data packet generating unit, configured to generate, according to the control signal, at least one data packet based on the data released from the storage device and to send all of the data packets to the on-chip network.
The storage management device provided by the embodiments of the present invention can parse instructions received from the on-chip network, generate at least one data packet based on the data released from the external storage device, and send all of the data packets to the on-chip network. By arranging a storage management device between the on-chip network and the storage device and having it send the data packets to all processing cores that require the same data, each of those processing cores no longer needs to fetch the data from the storage device separately, which reduces latency, saves the power consumption of the chip and the storage device, and saves the time those processing cores spend reading the data.
Further, the data packet generating unit being configured to generate a data packet based on the data released from the storage device according to the control signal includes: the data packet generating unit is configured to generate, according to the control signal, the data packet corresponding to the processing core based on the released data and the processing core indicated by the instruction.
Further, when the number of processing cores indicated by the instruction is more than one, the data packet generating unit is configured to generate, based on the released data and the processing cores indicated by the instruction, a data packet corresponding to each of the processing cores, and to send the generated data packets to the on-chip network in parallel.
Further, the instruction includes fetch information and store information.
Optionally, the device further includes an access number indicating unit; the access number indicating unit is configured to receive the fetch information and to instruct the storage device to release the data indicated by the fetch information, and is further configured to receive the store information and to generate, according to the store information, the storage address of the data corresponding to each processing core indicated by the instruction.
Optionally, the data packet generating unit being configured to generate a data packet based on the data released from the storage device according to the control signal includes: the data packet generating unit is configured to generate the address of the processing core indicated by the instruction according to the control signal, and to generate the data packet based on the released data, the storage address, and the address of the processing core indicated by the instruction.
Optionally, the data packet generating unit includes switches, and the number of switches is greater than or equal to the number of processing cores connected to the on-chip network; each processing core corresponds to one switch, and the control signal is used to control the state of the switch.
Optionally, the data packet generating unit being configured to send the data packets in parallel to the processing cores indicated by the instruction includes: the data packet generating unit is configured to turn on, according to the control signal, the switches corresponding to the processing cores indicated by the instruction, and to send the data packets to the on-chip network in parallel, so that the on-chip network forwards each data packet to the corresponding processing core.
Optionally, each switch includes a control terminal, an input terminal, and an output terminal; the input terminal of each switch is used to receive the storage address sent by the access number indicating unit and the data released by the storage device, and the control terminal of each switch enters an on state or an off state according to the received control signal.
Further optionally, the instruction parsing unit is configured to send the control signal to the control terminal to turn the switch on, and the data packet generating unit generates a data packet from the address of the processing core together with the storage address and the data obtained through the input terminal.
Optionally, the instruction includes control bits for controlling the switches; the control bit corresponding to each processing core indicated by the instruction is a preset value, and when the instruction parsing unit determines that a control bit equals the preset value, it generates the control signal.
Preferably, the instruction includes the number of fetches, and the fetch information includes the first address of the fetch addresses; the access number indicating unit is configured to generate the fetch addresses according to the first address of the fetch addresses and the number of fetches, and to instruct the storage device to release the data according to the fetch addresses.
Further preferably, the store information includes the first address of the storage addresses of the data in the processing cores indicated by the instruction, and the access number indicating unit is configured to generate the storage addresses according to the first address of the storage addresses and the number of fetches.
Optionally, the instruction includes: control bits, information bits, the first fetch address, the number of fetches, and the first address of the data's storage addresses in each processing core that is to receive the data.
The control bits are used to control the state of the switches, and the information bits are used to indicate the processing cores that are to receive the data.
Optionally, the storage management device further includes an instruction caching unit, configured to receive the instruction sent by the on-chip network, cache the instruction, and send the cached instruction to the instruction parsing unit.
According to a second aspect of the present invention, a chip is provided, including an on-chip network, a plurality of processing cores connected to the on-chip network, a storage device, and the storage management device provided in the first aspect.
According to a third aspect of the present invention, a card board is provided, including one or more chips provided in the second aspect.
According to a fourth aspect of the present invention, an electronic device is provided, including one or more card boards provided in the third aspect.
According to a fifth aspect of the present invention, a storage management method is provided, including: parsing an instruction received from the on-chip network and generating a control signal according to the instruction; and, according to the control signal, generating at least one data packet based on the data released from a storage device and sending all of the data packets to the on-chip network.
According to a sixth aspect of the present invention, a computer storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the storage management method of the fifth aspect is implemented.
According to a seventh aspect of the present invention, an electronic device is provided, including a memory, a processor, and a computer program stored on the memory and runnable on the processor; when the processor executes the program, the storage management method of the fifth aspect is implemented.
According to an eighth aspect of the present invention, a computer program product is provided, which includes computer instructions; when the computer instructions are executed by a computing device, the computing device can execute the storage management method of the fifth aspect.
(3) Beneficial effects
The storage management device provided by the embodiments of the present invention can parse instructions received from the on-chip network, generate at least one data packet based on the data released from the external storage device, and send all of the data packets to the on-chip network. By arranging a storage management device between the on-chip network and the storage device and having it send the data packets to all processing cores that require the same data, each of those processing cores no longer needs to fetch the data from the storage device separately, which reduces latency, saves the power consumption of the chip and the storage device, and saves the time those processing cores spend reading the data.
Description of the drawings
FIG. 1 is a schematic flowchart of multiple cores obtaining the same data in the prior art;
FIG. 2 is a schematic flowchart of multiple cores obtaining the same data in the prior art;
FIG. 3 is a schematic structural diagram of a storage management device according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a storage management device according to another embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a chip according to still another embodiment of the present invention;
FIG. 6 is a schematic flowchart of a storage management method according to an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to specific embodiments and the accompanying drawings. It should be understood that these descriptions are only exemplary and are not intended to limit the scope of the present invention. In addition, descriptions of well-known structures and technologies are omitted in the following description to avoid unnecessarily obscuring the concepts of the present invention.
In neural network computing, multi-core or many-core chip architectures are often used. Enabling the many cores to deliver as much computing power as possible is key to chip performance, and how much of that computing power is realized depends on how efficiently each core can read data into the core for computation.
For example, in an image recognition application, the chip has N cores, all cores process images independently and in parallel, and the algorithm and weights used by each core are the same. The way to maximize the chip's computing power with the least delay and power consumption is therefore for all cores to share the same copy of the weights, that is, for multiple cores to fetch the same data.
FIG. 1 is a schematic flowchart of multiple cores obtaining the same data in the prior art.
As shown in FIG. 1, a CPU includes N processing cores and a shared memory (Shared Memory, SM). The N processing cores are processing core C1 to processing core Cn, and each processing core separately reads the data it needs from the SM. When the processing cores need the same data, for example the same weights or other identical parameters, the flow of multiple cores fetching data is as follows:
Processing core C1 issues a fetch instruction to the SM and receives the data released by the SM. After processing core C1 has finished fetching, the other processing cores issue fetch instructions to the SM in turn and fetch the data stored at the same location from the SM again.
The single arrows in FIG. 1 indicate the direction of data transmission when multiple cores obtain the same data, and the dashed lines indicate that a core can exchange data with the shared memory SM in both directions.
This approach usually has the following drawbacks:
When multiple processing cores need the same data, each processing core has to read it independently. On the one hand, the same data is read from the SM multiple times, which causes high power consumption in the SM. On the other hand, if the SM stores only one copy of the data, the processing cores have to read it one after another; a core that is later in the order must wait until the earlier cores have finished reading before it can read, so its read of the data is delayed. Sometimes a core can only start computing after it has read the data, so the later cores wait a long time, computing power is wasted, and the performance of the chip is reduced.
FIG. 2 is a schematic flowchart of multiple cores acquiring the same data in the prior art.
As shown in FIG. 2, a GPU includes N processing cores and an SM. The N processing cores are processing core C1 to processing core Cn. The SM stores multiple identical copies of the data in consecutive locations; when one processing core reads the data it needs, the SM releases all of the consecutively stored identical copies for the multiple processing cores to read. The flow in which multiple processing cores fetch the data is as follows:
The first processing core C1 issues a fetch instruction to the SM, the SM releases the identical data stored consecutively, and the processing cores C1 to Cn that need the same data each receive one of the identical copies released by the SM. This approach requires the SM to store multiple consecutive copies of the same data, which wastes SM memory.
The single arrows in FIG. 2 indicate the direction of data transmission when multiple cores acquire the same data, and the dashed lines indicate that each core can exchange data bidirectionally with the shared memory module SM.
The technical solution of the present invention is proposed to solve the above problems.
The chip provided by an embodiment of the present application is described in detail below. In the description of the present invention, it should be noted that the terms "first", "second", "third" and "fourth" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance. In addition, the technical features involved in the different embodiments of the present invention described below can be combined with each other as long as they do not conflict.
FIG. 3 is a schematic structural diagram of a storage management apparatus according to an embodiment of the present invention.
In this embodiment, the storage management apparatus is arranged between a network on chip and a storage device, where the network on chip NoC is connected to N processing cores, N being an integer greater than or equal to 1.
As shown in FIG. 3, the storage management apparatus includes an instruction parsing unit ID and a data packet generating unit PG-SW.
The instruction parsing unit ID is configured to parse an instruction received from the network on chip and generate a control signal according to the instruction.
The data packet generating unit PG-SW is configured to generate, according to the control signal, at least one data packet based on the data released from the storage device, and to send all of the data packets to the network on chip.
The storage management apparatus can parse the instruction received from the network on chip, generate at least one data packet based on the data released from the external storage device, and send all of the data packets to the network on chip. By arranging the storage management apparatus between the network on chip and the storage device, the storage management apparatus sends the data packets to all processing cores that need the same data, so the processing cores that need the same data do not have to fetch the data from the storage device separately. This reduces latency, lowers the power consumption of the chip and of the storage device, and saves the time the processing cores that need the same data spend reading it.
In an embodiment, the data packet generating unit PG-SW being configured to generate, according to the control signal, a data packet based on the data released from the storage device includes: the data packet generating unit PG-SW is configured to generate, according to the control signal, the data packet corresponding to the processing core indicated by the instruction, based on the released data and that processing core.
In a preferred embodiment, when the instruction indicates multiple processing cores, the data packet generating unit being configured to generate, based on the released data and the processing cores indicated by the instruction, the data packets corresponding to the processing cores and to send the data packets to the network on chip includes: the data packet generating unit is configured to generate, based on the released data and the processing cores indicated by the instruction, a data packet corresponding to each of the processing cores, and to send the generated data packets to the network on chip in parallel, so that the network on chip delivers each data packet to the processing core indicated by the instruction.
It should be noted that "corresponding to each processing core" means corresponding to the address of each processing core. Preferably, "corresponding to each processing core" may also mean corresponding both to the address of each processing core and to the storage address of the data within each processing core.
In an embodiment, the instruction includes fetch information and store information.
In an embodiment, the storage management apparatus further includes an access indication unit Addr-G. The access indication unit Addr-G is configured to receive the fetch information and, according to the fetch information, instruct the storage device to release the data indicated by the fetch information, where the data may be parameters such as weights.
In a preferred embodiment, the instruction includes a fetch count, and the fetch information includes the first address of the fetch addresses. The access indication unit is configured to generate the fetch addresses according to the first address of the fetch addresses and the fetch count, and to instruct the storage device to release the data according to the fetch addresses.
In this embodiment, setting the fetch information to be the first address of the fetch addresses on the one hand saves instruction bits, so the instruction can carry more other information, and on the other hand makes it more convenient to extract the data, because the access indication unit derives the addresses from the first fetch address and the fetch count carried by the instruction.
In an embodiment, the access indication unit Addr-G is further configured to receive the store information and, according to the store information, generate the storage addresses of the data packets corresponding to the processing cores indicated by the instruction.
In an optional embodiment, the storage management apparatus further includes an instruction cache unit configured to receive the instructions sent by the network on chip, cache the instructions, and send the cached instructions to the instruction parsing unit ID.
When the network on chip sends multiple instructions, the instruction cache unit IS caches the instructions in the order in which they are received and sends them to the instruction parsing unit ID in the cached order.
Optionally, the instruction cache unit IS may be an instruction stack for temporarily storing the received data access instructions, and the storage management apparatus MME executes the instructions in the stack one by one.
In a preferred embodiment, the store information includes the first address of the storage addresses of the data in the processing cores indicated by the instruction, and the access indication unit is configured to generate the storage addresses according to the first address of the storage addresses and the fetch count.
In this embodiment, setting the store information to be the first address of the storage addresses of the data in a processing core saves instruction bits compared with carrying the full storage addresses of the data in the processing core in the instruction.
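As a minimal sketch of how the access indication unit Addr-G could expand a first address plus a count into concrete fetch and store addresses, the following fragment is offered. The 16-byte stride (one 128-bit data word per address step), the example addresses, and the function name are assumptions; the patent only states that the addresses are generated from the first address and the fetch count.

```python
# Sketch (assumptions: 16-byte stride, illustrative names and addresses).
# Addr-G expands a first address and a count into a list of concrete addresses.
def expand_addresses(first_addr: int, count: int, stride: int = 16) -> list[int]:
    return [first_addr + i * stride for i in range(count)]

fetch_addrs = expand_addresses(first_addr=0x1000, count=4)   # from Addr_S: 4 reads from the SM
store_addrs = expand_addresses(first_addr=0x0200, count=4)   # from Addr_D: 4 writes inside a core
print(fetch_addrs)   # [4096, 4112, 4128, 4144]
print(store_addrs)   # [512, 528, 544, 560]
```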
In an embodiment, the data packet generating unit being configured to generate, according to the control signal, a data packet based on the data released from the storage device includes:
the data packet generating unit PG-SW is configured to generate, according to the control signal, the address of the processing core indicated by the instruction, and to generate the data packet based on the released data, the storage address, and the address of the processing core indicated by the instruction.
In an embodiment, the instruction includes control bits, the control bit corresponding to each processing core that the instruction indicates should receive data is a preset value, and when the instruction parsing unit ID determines that a control bit is the preset value, it generates the control signal corresponding to that processing core.
In an embodiment, the instruction includes information bits indicating the processing cores that are to receive the data, a first fetch address, a fetch count, the first address of the storage addresses of the data in each processing core that is to receive the data, and control bits for controlling switches.
In a specific embodiment, the format of the instruction is as follows:
[Instruction format table: see Figure PCTCN2019130640-appb-000001 — fields: information bits ID_C, instruction code, fetch information, store information, fetch count, control bits C-C, reserved]
It should be noted that the multiple processing cores may be numbered first. Because the storage management apparatus needs to receive instructions through the NoC and also needs to send the generated data packets through the NoC to the one or more processing cores indicated by the instruction, the storage management apparatus is numbered together with the processing cores. For example, if there are N processing cores in the chip, the storage management apparatus may be assigned number 1, the first processing core C1 number 2, and the N-th processing core Cn number N+1.
It should be understood that the storage management apparatus could also be assigned number 0, the first processing core C1 number 1, and the N-th processing core Cn number N. The present invention takes the case in which the storage management apparatus is number 1 as an example, but is not limited thereto.
The instruction may be 148 bits.
The information bits, denoted ID_C, may be 4 bits and are used to indicate the address of the processing core that is to receive the data. There are N+1 information bit positions: the first of the N+1 bits corresponds to the storage management apparatus, and the second to the (N+1)-th bits correspond one-to-one to the N processing cores. The network on chip NoC uses these information bits to determine the receiver of an instruction or data packet, where the receiver may be a processing core or the storage management apparatus. That is, when the first bit of ID_C is 1, the instruction or data packet is sent to the storage management apparatus; when one of the second to (N+1)-th bits of ID_C is 1, the data packet is sent to the processing core corresponding to that bit.
The instruction code may be 8 bits and can encode various kinds of information. For example, the fetch instruction code may be set to 00000001; when the instruction parsing unit ID finds that the instruction code is 00000001, the instruction is a fetch instruction.
The control bits have N+1 bit positions, in one-to-one correspondence with the N+1 information bit positions, where the second to the (N+1)-th bits correspond one-to-one to the N processing cores. The control bit corresponding to each processing core indicated by the instruction is a preset value; for example, the preset value is 1, so when the instruction parsing unit recognizes that a control bit is 1, it generates a control signal. For example, if the chip has 4 processing cores, there are 5 control bits, and the second to fifth bits correspond one-to-one to the 4 processing cores. When C-C is 10110, the first and fourth bits are 0 and the second, third and fifth bits are 1, indicating that the first processing core, the second processing core and the fourth processing core all need to receive the data.
It should be noted that in this embodiment of the present invention, among the N+1 bits of C-C the first number on the right corresponds to the first bit (the storage management apparatus), the second number on the right corresponds to the second bit (the first processing core), and the last number counting from the right corresponds to the last bit (the N-th processing core). Of course, the first number on the left could instead correspond to the first bit (the storage management apparatus), the second number on the left to the second bit (the first processing core), and the last number counting from the left to the last bit (the N-th processing core); the present invention is not limited in this respect.
The fetch information, the store information and the fetch count may each be 16 bits.
Here, "reserved" refers to bits that are not encoded.
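To make the bit convention above concrete, here is a small sketch of how the control bits might be decoded into the set of receiving processing cores. The string representation and the function name are illustrative assumptions; the bit ordering (rightmost bit for the storage management apparatus, then cores C1 to CN) and the 10110 example follow the description above.

```python
# Decode the N+1 control bits C-C into the indices of the processing cores that
# must receive data. The rightmost bit corresponds to the storage management
# apparatus; bits 2..N+1, counted from the right, correspond to cores C1..CN.
def decode_targets(c_c: str) -> list[int]:
    bits = c_c[::-1]   # bits[0] is the rightmost bit (the storage management apparatus)
    return [i for i, b in enumerate(bits[1:], start=1) if b == "1"]

# The example from the text: 4 cores, C-C = 10110 -> cores C1, C2 and C4.
assert decode_targets("10110") == [1, 2, 4]
```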
FIG. 4 is a schematic structural diagram of a storage management apparatus according to another embodiment of the present invention.
In an embodiment, the data packet generating unit includes a Tx transmitter and an Rx receiver, where the Tx transmitter includes switches and a packet header generating unit, and the number of switches is greater than or equal to the number of processing cores connected to the network on chip NoC. Each processing core corresponds to one of the switches, and the control signal is used to control the state of the switch.
In an embodiment, the data packet generating unit PG-SW being configured to send multiple data packets to the network on chip NoC in parallel includes: the data packet generating unit is configured to turn on, according to the control signals, the switches corresponding to the processing cores indicated by the instruction, and to send the data packets to the network on chip in parallel, so that all of the data packets are sent to the processing cores through the network on chip.
In a specific embodiment, each switch includes a control terminal, an input terminal and an output terminal. The input terminal of each switch is configured to receive the storage address sent by the access indication unit and the data released from the storage device, and the control terminal of each switch is configured to enter the on state or the off state according to the received control signal.
The instruction includes control bits used to control the states of the switches; the control bit corresponding to each processing core indicated by the instruction is a preset value, and when the instruction parsing unit ID determines that a control bit is the preset value, it generates the control signal.
In an embodiment, the above storage management apparatus MME stores received data in the storage device through the Rx receiver.
In the embodiment shown in FIG. 4, the first processing core C1 sends the instruction to the storage management apparatus MME through the network on chip NoC, where the instruction cache unit IS caches the instruction and sends it to the instruction parsing unit ID.
The instruction parsing unit ID parses the instruction code as 00000001, indicating that data needs to be fetched from the storage device. The instruction parsing unit ID also parses the first fetch address Addr_S, a fetch count of M, and the control information C_C of 0110, indicating that the M data items need to be sent to the first processing core C1 and the second processing core C2, and parses the first address Addr_D1 of the storage addresses at which the M data items are to be stored in the first processing core C1 and the first address Addr_D2 of the storage addresses at which the M data items are to be stored in the second processing core C2. According to the control information 0110, the instruction parsing unit ID generates a first control signal C_C1, namely 0010, and a second control signal C_C2, namely 0100.
The instruction parsing unit ID sends the fetch count M, Addr_S and Addr_D to the access indication unit Addr_G, and sends the first control signal C_C1 and the second control signal C_C2 respectively to the first switch S1 and the second switch S2 in the data packet generating unit PG_SW, which correspond to the first processing core and the second processing core.
Addr_G generates the concrete fetch addresses according to M and Addr_S and fetches the data from the external storage device SM, generates the concrete store addresses Addr_D1_N and Addr_D2_N according to M, Addr_D1 and Addr_D2, and sends Addr_D1_N and Addr_D2_N to PG_SW.
PG_SW turns on the first switch and the second switch according to the received first control signal C_C1 and second control signal C_C2. The packet header generating unit ID-GEN generates first information bits ID_C1 and second information bits ID_C2 from the first control signal C_C1 and the second control signal C_C2, respectively; the first information bits are 0010 and the second information bits are 0100.
It should be noted that when the number of processing cores is less than 3, the fourth information bit may be set to 0.
PG_SW generates a first packet header from the storage address Addr_D1_N and the first information bits ID_C1 obtained at the output terminal of the first switch, and a second packet header from the storage address Addr_D2_N and the second information bits ID_C2 obtained at the output terminal of the second switch. PG_SW also generates a first data packet from the first packet header and the data released from the storage device obtained at the output terminal of the first switch, generates a second data packet from the second packet header and the data released from the storage device obtained at the output terminal of the second switch, and sends the first data packet and the second data packet to the network on chip NoC in parallel.
The NoC sends the first data packet to the first processing core and the second data packet to the second processing core.
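The worked example above can be summarized in the following sketch. The Packet dataclass, the build_packets helper and the integer payload are illustrative assumptions, while the control information 0110, the resulting information bits 0010 and 0100, and the parallel dispatch of the two packets follow the description.

```python
# Sketch of the FIG. 4 flow (names and types are illustrative assumptions).
from dataclasses import dataclass

@dataclass
class Packet:
    id_c: str        # information bits: tell the NoC which receiver to route to
    addr_d: int      # first storage address inside the target processing core
    payload: list    # the M data words released by the storage device SM

def decode_targets(c_c: str) -> list[int]:
    # Same decoding convention as sketched earlier: the rightmost bit is the MME.
    return [i for i, b in enumerate(c_c[::-1][1:], start=1) if b == "1"]

def build_packets(c_c: str, data: list, store_first_addr: dict) -> list[Packet]:
    packets = []
    for core in decode_targets(c_c):
        id_c = format(1 << core, f"0{len(c_c)}b")   # one-hot information bits for that core
        packets.append(Packet(id_c, store_first_addr[core], data))
    return packets

data = list(range(4))                               # M = 4 words fetched via Addr_S
pkts = build_packets("0110", data, {1: 0x0200, 2: 0x0200})
print([p.id_c for p in pkts])                       # ['0010', '0100'] -> C1 and C2, sent in parallel
```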
In an embodiment, the data packets received by the storage management apparatus MME and the data packets sent by the MME may both have the following format:
Information bits | Storage address of the data in the processing core | Data
(ID_C, 4 bit) | (Addr_D, 16 bit) | (128 bit)
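A bit-level sketch of this 148-bit packet layout follows. The field widths (4-bit ID_C, 16-bit Addr_D, 128-bit data) come from the table above; the placement of ID_C in the most significant bits and the helper names are assumptions.

```python
# Pack / unpack the 148-bit packet: 4-bit ID_C | 16-bit Addr_D | 128-bit data.
# Field order (ID_C in the most significant bits) is an assumption.
def pack_packet(id_c: int, addr_d: int, data: int) -> int:
    assert id_c < (1 << 4) and addr_d < (1 << 16) and data < (1 << 128)
    return (id_c << 144) | (addr_d << 128) | data

def unpack_packet(packet: int) -> tuple[int, int, int]:
    return packet >> 144, (packet >> 128) & 0xFFFF, packet & ((1 << 128) - 1)

p = pack_packet(id_c=0b0010, addr_d=0x0200, data=0xDEADBEEF)
assert unpack_packet(p) == (0b0010, 0x0200, 0xDEADBEEF)
```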
It should be noted that the embodiment shown in FIG. 4 has two processing cores. When there are three processing cores, the data packet generating unit is provided with three switches. If the instruction sent by the first processing core indicates that the first processing core and the second processing core both need the data while the third processing core does not, the instruction parsing unit ID may generate a third control signal to keep the third switch in the off state, or the instruction parsing unit ID may simply not generate a third control signal.
The storage management apparatus provided by the embodiments of the present invention can send the data packets to all processing cores that need the same data, so the processing cores that need the same data do not have to fetch the data from the storage device separately, which reduces latency, lowers the power consumption of the chip and of the storage device, and saves the time the processing cores that need the same data spend reading it.
FIG. 5 is a schematic structural diagram of a chip according to still another embodiment of the present invention.
As shown in FIG. 5, the chip includes a network on chip NoC, the multiple processing cores connected to the network on chip, a storage device, and the storage management apparatus MME provided by the above embodiments.
The multiple processing cores are N processing cores, namely the first processing core C1 to the N-th processing core Cn.
The storage management apparatus parses the instruction obtained from the network on chip, generates, according to the instruction and the data released from the storage module, the data packets corresponding to the processing cores indicated by the instruction as receivers of the data, sends all of the generated data packets to the network on chip, and delivers all of the data packets to the processing cores through the network on chip.
It should be noted that in FIG. 5 the single arrows indicate the direction of data transmission in this embodiment of the present invention, and the dashed lines indicate that bidirectional interaction is possible. For example, the dashed double arrow between the processing core C1 and the storage management apparatus MME indicates that the processing core and the storage management apparatus MME can send and receive data in both directions. The storage management apparatus MME and the storage device SM can likewise send and receive data in both directions.
An embodiment of the present invention provides a board card including one or more of the chips provided by the above embodiments.
An embodiment of the present invention provides an electronic device including one or more of the board cards provided by the above embodiment.
FIG. 6 is a schematic flowchart of a storage management method according to an embodiment of the present invention.
As shown in FIG. 6, the storage management method includes step S101 and step S102:
Step S101: the instruction parsing unit parses the instruction received from the network on chip and generates a control signal according to the instruction.
Step S102: the data packet generating unit generates, according to the control signal, at least one data packet based on the data released from the storage device, and sends all of the data packets to the network on chip.
An embodiment of the present invention further provides a computer storage medium on which a computer program is stored, where the program, when executed by a processor, implements the storage management method provided by the above embodiments.
Yet another embodiment of the present invention provides an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the program, implements the storage management method provided by the above embodiments.
Yet another embodiment of the present invention provides a computer program product including computer instructions which, when executed by a computing device, cause the computing device to execute the storage management method provided by the above embodiments.
It should be understood that the above specific embodiments of the present invention are used only to exemplarily illustrate or explain the principles of the present invention and do not constitute a limitation on the present invention. Therefore, any modifications, equivalent substitutions, improvements and the like made without departing from the spirit and scope of the present invention shall fall within the protection scope of the present invention. In addition, the appended claims of the present invention are intended to cover all changes and modifications that fall within the scope and boundaries of the appended claims, or the equivalents of such scope and boundaries.

Claims (11)

  1. A storage management apparatus, wherein the storage management apparatus is arranged between a network on chip and a storage device;
    the storage management apparatus comprising:
    an instruction parsing unit configured to parse an instruction received from the network on chip and generate a control signal according to the instruction; and
    a data packet generating unit configured to generate, according to the control signal, at least one data packet based on data released from the storage device, and to send all of the data packets to the network on chip.
  2. The apparatus according to claim 1, wherein the data packet generating unit being configured to generate, according to the control signal, at least one data packet based on the data released from the storage device comprises:
    the data packet generating unit is configured to generate, according to the control signal, the data packet corresponding to the processing core indicated by the instruction, based on the released data and that processing core.
  3. The apparatus according to claim 2, wherein,
    when the instruction indicates multiple processing cores,
    the data packet generating unit being configured to generate, based on the released data and the processing cores indicated by the instruction, the data packets corresponding to the processing cores and to send the data packets to the network on chip comprises:
    the data packet generating unit is configured to generate, based on the released data and the processing cores indicated by the instruction, the data packet corresponding to each of the processing cores, and to send the generated data packets to the network on chip in parallel.
  4. The apparatus according to any one of claims 1 to 3, wherein the instruction comprises fetch information and store information.
  5. The apparatus according to claim 3, wherein the apparatus further comprises an access indication unit;
    the access indication unit is configured to receive the fetch information and, according to the fetch information, instruct the storage device to release the data indicated by the fetch information;
    the access indication unit is further configured to receive the store information and, according to the store information, generate the storage address of the data packet corresponding to the processing core indicated by the instruction.
  6. The apparatus according to claim 4, wherein the data packet generating unit being configured to generate, according to the control signal, a data packet based on the data released from the storage device comprises:
    the data packet generating unit is configured to generate, according to the control signal, the address of the processing core indicated by the instruction, and to generate the data packet based on the released data, the storage address and the address of the processing core indicated by the instruction.
  7. The apparatus according to claim 5, wherein the data packet generating unit comprises switches, and the number of the switches is greater than or equal to the number of processing cores connected to the network on chip;
    wherein each processing core corresponds to one of the switches, and the control signal is used to control the state of the switch.
  8. The apparatus according to claim 6, wherein the data packet generating unit being configured to send multiple data packets to the network on chip in parallel comprises:
    the data packet generating unit is configured to turn on, according to the control signal, the switches corresponding to the processing cores indicated by the instruction, and to send the multiple data packets to the network on chip in parallel, so that the multiple data packets are sent to the corresponding processing cores through the network on chip.
  9. The apparatus according to claim 6 or 7, wherein each switch comprises a control terminal, an input terminal and an output terminal;
    the input terminal of each switch is configured to receive the storage address sent by the access indication unit and the data released from the storage device;
    the control terminal of each switch is configured to enter an on state or an off state according to the received control signal.
  10. The apparatus according to any one of claims 1 to 8, wherein
    the instruction comprises control bits, and the control bit corresponding to each processing core indicated by the instruction is a preset value;
    when the instruction parsing unit determines that the control bit is the preset value, the control signal is generated.
  11. A chip, comprising a network on chip, a plurality of processing cores connected to the network on chip, a storage device, and the storage management apparatus according to any one of claims 1 to 9.
PCT/CN2019/130640 2019-12-31 2019-12-31 Storage management apparatus and chip WO2021134521A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980102940.2A CN114902619B (en) 2019-12-31 2019-12-31 Storage management device and chip
PCT/CN2019/130640 WO2021134521A1 (en) 2019-12-31 2019-12-31 Storage management apparatus and chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/130640 WO2021134521A1 (en) 2019-12-31 2019-12-31 Storage management apparatus and chip

Publications (1)

Publication Number Publication Date
WO2021134521A1 true WO2021134521A1 (en) 2021-07-08

Family

ID=76686059

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/130640 WO2021134521A1 (en) 2019-12-31 2019-12-31 Storage management apparatus and chip

Country Status (2)

Country Link
CN (1) CN114902619B (en)
WO (1) WO2021134521A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070115939A1 (en) * 2005-10-12 2007-05-24 Samsung Electronics Co., Ltd. Network on chip system employing an advanced extensible interface protocol
CN101141261A (en) * 2007-10-10 2008-03-12 山东大学 Network-on-chip digital router and its parallel data transmission method
CN102567278A (en) * 2011-12-29 2012-07-11 中国科学院计算技术研究所 On-chip multi-core data transmission method and device
CN105095150A (en) * 2015-08-14 2015-11-25 中国电子科技集团公司第五十八研究所 Network interface supporting network-on-chip
CN105528311A (en) * 2015-12-11 2016-04-27 中国航空工业集团公司西安航空计算技术研究所 Memory reading-writing circuit and method based on data packet
CN108470009A (en) * 2018-03-19 2018-08-31 上海兆芯集成电路有限公司 Processing circuit and its neural network computing method

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080177979A1 (en) * 2006-03-01 2008-07-24 Gheorghe Stefan Hardware multi-core processor optimized for object oriented computing
CN100458751C (en) * 2007-05-10 2009-02-04 忆正存储技术(深圳)有限公司 Paralleling flash memory controller
US8200992B2 (en) * 2007-09-24 2012-06-12 Cognitive Electronics, Inc. Parallel processing computer systems with reduced power consumption and methods for providing the same
FR2925187B1 (en) * 2007-12-14 2011-04-08 Commissariat Energie Atomique SYSTEM COMPRISING A PLURALITY OF TREATMENT UNITS FOR EXECUTING PARALLEL STAINS BY MIXING THE CONTROL TYPE EXECUTION MODE AND THE DATA FLOW TYPE EXECUTION MODE
US8490111B2 (en) * 2011-04-16 2013-07-16 Throughputer, Inc. Efficient network and memory architecture for multi-core data processing system
CN102508643A (en) * 2011-11-16 2012-06-20 刘大可 Multicore-parallel digital signal processor and method for operating parallel instruction sets
CN102521201A (en) * 2011-11-16 2012-06-27 刘大可 Multi-core DSP (digital signal processor) system-on-chip and data transmission method
CN103034615B (en) * 2012-12-07 2016-04-13 无锡美森微电子科技有限公司 A kind of being applicable to flows the memory management method applying polycaryon processor
CN103092788B (en) * 2012-12-24 2016-01-27 华为技术有限公司 Polycaryon processor and data access method
WO2015050594A2 (en) * 2013-06-16 2015-04-09 President And Fellows Of Harvard College Methods and apparatus for parallel processing
CN103744644B (en) * 2014-01-13 2017-03-01 上海交通大学 The four core processor systems built using four nuclear structures and method for interchanging data
US9658675B1 (en) * 2015-02-19 2017-05-23 Amazon Technologies, Inc. Achieving power saving by a circuit including pluralities of processing cores based on status of the buffers used by the processing cores
US20170147513A1 (en) * 2015-11-24 2017-05-25 Knuedge, Inc. Multiple processor access to shared program memory
US10068041B2 (en) * 2016-02-01 2018-09-04 King Fahd University Of Petroleum And Minerals Multi-core compact executable trace processor
CN106293642B (en) * 2016-08-08 2018-10-02 合肥工业大学 A kind of branch process module and its branch process mechanism for coarseness multinuclear computing system
CN109241641B (en) * 2018-09-18 2022-09-13 西安微电子技术研究所 Dual-core ARM type SoC application verification realization method and application verification board


Also Published As

Publication number Publication date
CN114902619B (en) 2023-07-25
CN114902619A (en) 2022-08-12

Similar Documents

Publication Publication Date Title
US20210240634A1 (en) Highly integrated scalable, flexible dsp megamodule architecture
CN110647480B (en) Data processing method, remote direct access network card and equipment
JP7221242B2 (en) Neural network data processor, method and electronics
US10049061B2 (en) Active memory device gather, scatter, and filter
JP5859017B2 (en) Control node for processing cluster
KR101150928B1 (en) Network architecture and method for processing packet data using the same
TWI506444B (en) Processor and method to improve mmio request handling
KR102409024B1 (en) Multi-core interconnect in a network processor
CN109542830B (en) Data processing system and data processing method
US11301408B1 (en) Asymmetric read / write architecture for enhanced throughput and reduced latency
US20230017643A1 (en) Composable infrastructure enabled by heterogeneous architecture, delivered by cxl based cached switch soc
US20190294570A1 (en) Technologies for dynamic multi-core network packet processing distribution
US20220358002A1 (en) Network attached mpi processing architecture in smartnics
US7657724B1 (en) Addressing device resources in variable page size environments
US10346049B2 (en) Distributed contiguous reads in a network on a chip architecture
US7466716B2 (en) Reducing latency in a channel adapter by accelerated I/O control block processing
CN100504824C (en) Opportunistic read completion combining
WO2021134521A1 (en) Storage management apparatus and chip
US11456972B2 (en) Methods and arrangements to accelerate array searches
WO2022199357A1 (en) Data processing method and apparatus, electronic device, and computer-readable storage medium
CN113138711B (en) Storage management device and chip
WO2018052718A1 (en) Method and apparatus for masking and transmitting data
CN113778937A (en) System and method for performing transaction aggregation in a network on chip (NoC)
US11960727B1 (en) System and method for large memory transaction (LMT) stores
CN108234147A (en) DMA broadcast data transmission method based on host counting in GPDSP

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19958209

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19958209

Country of ref document: EP

Kind code of ref document: A1