WO2013185660A1 - Instruction storage device of network processor and instruction storage method for same - Google Patents

Instruction storage device of network processor and instruction storage method for same Download PDF

Info

Publication number
WO2013185660A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
cache
memory
instruction data
low speed
Prior art date
Application number
PCT/CN2013/078736
Other languages
French (fr)
Chinese (zh)
Inventor
郝宇
安康
王志忠
刘衡祁
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Publication of WO2013185660A1 publication Critical patent/WO2013185660A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation

Definitions

  • Instruction storage device of a network processor and instruction storage method of the device
  • The present invention relates to the field of the Internet, and in particular to an instruction storage device of a network processor and an instruction storage method of the instruction storage device.
  • With the rapid development of the Internet, the interface rate of core routers used for backbone-network interconnection has reached 100 Gbps, which requires a core router's line cards to rapidly process the packets passing through them. Most of the industry currently uses a multi-core network-processor architecture, and instruction fetch efficiency is a key factor affecting the performance of multi-core network processors.
  • Some traditional multi-core network processors use a multi-level cache structure: each micro-engine is equipped with a separate level-1 cache, and a group of micro-engines shares a level-2 cache to achieve storage-space sharing, as shown in Figure 1.
  • These caches are given a large capacity to ensure a good hit rate, but because network packets arrive randomly, instruction locality is weak; a large cache therefore does not guarantee fetch efficiency and also wastes a great deal of resources.
  • Other network processors use a polling instruction storage scheme to store the instructions required by a group of microengines in the same number of random access memories (RAMs) as the microengines, as shown in Figure 2.
  • The four micro-engines in the figure poll the instructions in the four RAMs through an arbitration module. Each micro-engine accesses all of the RAMs in turn, and their accesses are always in different "phases", so different micro-engines never collide on the same RAM, realizing storage-space sharing.
  • Embodiments of the present invention provide an instruction storage device of a network processor and an instruction storage method of the instruction storage device, which can save hardware resources.
  • An embodiment of the present invention provides an instruction storage device for a network processor, including: a fast memory (Qmem), a cache (cache), a first low-speed instruction memory, and a second low-speed instruction memory, where:
  • the network processor includes two or more micro-engine large groups, each micro-engine large group includes N micro-engines, and the N micro-engines are divided into two or more micro-engine groups;
  • Each microengine corresponds to a Qmem and a cache, the Qmem is connected to the microengine, and the cache is connected to the Qmem;
  • Each micro-engine group corresponds to a first low-speed instruction memory, and a cache corresponding to each of the micro-engine groups is connected to the first low-speed instruction memory;
  • Each micro-engine large group corresponds to a second low-speed instruction memory, and a cache corresponding to each micro-engine in the micro-engine large group is connected to the second low-speed instruction memory.
  • The Qmem is configured to: after receiving the instruction data request sent by the micro-engine, determine whether the Qmem holds the instruction data; if so, return the instruction data to the micro-engine, and if not, send the instruction data request to the cache.
  • The Qmem stores the instructions of the address segment with the most demanding processing-quality requirements.
  • The cache includes two Cache Lines, each of which stores multiple consecutive instructions. The cache is configured to: after receiving the instruction data request sent by the Qmem, determine whether the cache holds the instruction data; if so, return the instruction data to the micro-engine through the Qmem, and if not, send the instruction data request to the first low-speed instruction memory or the second low-speed instruction memory.
  • the two Cache Lines process the message with a ping-pong operation, and the ping-pong operation is synchronized with the ping-pong operation of the message store.
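  • As a reading aid, the Qmem → Cache → low-speed-memory lookup chain described in the items above can be sketched in a few lines. This is an illustrative model only: the class name, the dictionary-backed memories, and the IMEM address range are assumptions of the sketch, not details from the patent.

```python
# Illustrative model of the hierarchical instruction fetch (assumed names):
# Qmem hit -> returned directly; Cache hit -> returned via Qmem;
# miss -> the request is steered to IMEM or IMEM-COM by address segment.
class InstructionFetchHierarchy:
    def __init__(self, qmem, cache, imem, imem_com, imem_range):
        self.qmem = qmem              # fast memory: address -> instruction
        self.cache = cache            # ping-pong cache: address -> instruction
        self.imem = imem              # per-group low-speed instruction memory
        self.imem_com = imem_com      # per-large-group low-speed instruction memory
        self.imem_range = imem_range  # (lo, hi) address segment served by IMEM

    def fetch(self, addr):
        if addr in self.qmem:         # Qmem hit: returned to the micro-engine
            return self.qmem[addr]
        if addr in self.cache:        # Cache hit: returned via the Qmem
            return self.cache[addr]
        lo, hi = self.imem_range      # miss: steer by address segment
        backing = self.imem if lo <= addr < hi else self.imem_com
        data = backing[addr]
        self.cache[addr] = data       # the Cache updates its content and tag
        return data
```

  • In this model a request that hits the Qmem never reaches the low-speed memories, which is what lets the scheme reserve the fast SRAM for the most performance-critical address segment.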
  • the instruction storage device further includes a first arbitration module, a second arbitration module, and a third arbitration module, wherein:
  • Each microengine corresponds to a first arbitration module, and the first arbitration module is connected to a cache of each microengine;
  • Each micro-engine group corresponds to a second arbitration module, one end of the second arbitration module is connected to the first arbitration module of each micro-engine in the micro-engine group, and the other end is connected to the first low-speed instruction memory;
  • Each microengine large group corresponds to a third arbitration module, one end of the third arbitration module is connected to a first arbitration module of each microengine in the microengine large group, and the other end is connected to the second low speed instruction memory. Connected.
  • The first arbitration module is configured to: when the cache sends the instruction data request, determine whether the requested instruction is located in the first low-speed instruction memory or in the second low-speed instruction memory; when the requested instruction is in the first low-speed instruction memory, send the instruction data request to the first low-speed instruction memory, and when it is in the second low-speed instruction memory, send the instruction data request to the second low-speed instruction memory; and receive the instruction data returned by the first or second low-speed instruction memory and return it to the cache;
  • the second arbitration module is configured to: when receiving instruction data requests sent by one or more first arbitration modules, select one instruction data request and send it to the first low-speed instruction memory for processing, and return the instruction data fetched from the first low-speed instruction memory to the first arbitration module;
  • the third arbitration module is configured to: when receiving instruction data requests sent by one or more first arbitration modules, select one instruction data request and send it to the second low-speed instruction memory for processing, and return the instruction data fetched from the second low-speed instruction memory to the first arbitration module.
  • the cache is further configured to: update the cached content and the tag after receiving the instruction data returned by the first arbitration module.
  • Each microengine large group includes 32 microengines, which are divided into 4 microengine groups, and each microengine group includes 8 microengines.
  • An embodiment of the present invention further provides a method for storing instructions using the instruction storage device described above, the method including:
  • the fast memory (Qmem), after receiving the instruction data request sent by the micro-engine, determines whether it holds the instruction data; if so, it returns the instruction data to the micro-engine, and if not, it sends the instruction data request to the cache;
  • a Cache Line in the cache determines whether the cache holds the instruction data; if so, it returns the instruction data to the micro-engine through the Qmem, and if not, it sends the instruction data request to the first low-speed instruction memory or the second low-speed instruction memory;
  • after receiving the instruction data request sent by the cache, the first low-speed instruction memory looks up the instruction data and returns the found instruction data to the cache; and
  • after receiving the instruction data request sent by the cache, the second low-speed instruction memory looks up the instruction data and returns the found instruction data to the cache.
  • The instruction storage method further includes: a Cache Line in the cache sends the instruction data request to the first arbitration module when it determines that the cache does not hold the instruction data; the first arbitration module sends the instruction data request to the first low-speed instruction memory when it determines that the requested instruction is located there, and sends the instruction data request to the second low-speed instruction memory when it determines that the requested instruction is located in the second low-speed instruction memory.
  • the method for storing an instruction further includes:
  • when the first arbitration module determines that the requested instruction is located in the first low-speed instruction memory, it sends the instruction data request to the second arbitration module; when the second arbitration module receives instruction data requests from one or more first arbitration modules, it selects one instruction data request and sends it to the first low-speed instruction memory; and
  • when the first arbitration module determines that the requested instruction is located in the second low-speed instruction memory, it sends the instruction data request to the third arbitration module; when the third arbitration module receives instruction data requests from one or more first arbitration modules, it selects one instruction data request and sends it to the second low-speed instruction memory.
  • In the fast-memory-and-cache-based instruction storage scheme for a multi-core network processor provided by the embodiment of the present invention, a fast memory, a small-capacity ping-pong cache, and low-speed dynamic RAM (DRAM) are combined, and the memories use a hierarchical grouping strategy. This instruction storage scheme effectively guarantees a high fetch efficiency for some instructions and a high average fetch efficiency, saves a large amount of hardware storage resources, and keeps the compiler very simple to implement.
  • FIG. 1 is a schematic structural diagram of a conventional two-level cache.
  • FIG. 2 is a schematic structural diagram of an instruction storage scheme of a polling mode.
  • FIG. 3 is a block diagram showing the structure of an instruction storage device in accordance with Embodiment 1 of the present invention.
  • FIG. 4 is a schematic structural diagram of a specific instruction storage device according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a ping-pong operation of a message memory and an icache according to an embodiment of the present invention.
  • Figure 6 is a process flow diagram of an instruction storage device in accordance with an embodiment of the present invention.
  • FIG. 7 is a detailed process flow diagram of an instruction storage device in accordance with an embodiment of the present invention.
  • FIG. 8 is a process diagram of a Cache Line operation in a cache module according to an embodiment of the present invention.
  • a fast memory (Quick Memory, referred to as Qmem);
  • a small-capacity ping-pong cache (Cache); and
  • a low-speed RAM, for example, an Instruction Memory (IMEM).
  • the instruction storage device of this embodiment is as shown in Fig. 3, and the following structure is employed.
  • A micro-engine large group includes N micro-engines, and the N micro-engines are divided into two or more micro-engine groups. Each micro-engine corresponds to one Qmem and one Cache, each micro-engine group corresponds to a first low-speed instruction memory (hereinafter IMEM), and the N micro-engines of the large group correspond to a second low-speed instruction memory (hereinafter IMEM-COM).
  • the Qmem is set to: after receiving the instruction data request sent by the microengine, determine whether the Qmem has instruction data, and if so, return the instruction data to the microengine, and if not, send the instruction data request to the cache.
  • The Qmem stores the instructions of the address segment with the most demanding processing-quality requirements and can be implemented with SRAM, whose read/write speed is fast. The content of the Qmem is not updated during message processing.
  • Qmem can return the required instruction data of the micro engine in one clock cycle, which greatly improves the efficiency of indexing.
  • The Cache has two Cache Lines, and each Cache Line can store multiple consecutive instructions. A Cache Line is configured to determine, after receiving the instruction data request sent by the Qmem, whether the cache holds the instruction data; if so, it returns the instruction data to the micro-engine via the Qmem, and if not, it sends the instruction data request to the IMEM or the IMEM-COM.
  • the two Cache Lines process the message with a ping-pong operation, and the ping-pong operation is synchronized with the ping-pong operation of the message memory.
  • The IMEM and the IMEM-COM are respectively configured to store instructions located in different address segments, look up instruction data according to the instruction data request, and return the instruction data.
  • Hierarchical memory can effectively utilize the difference in the probability of instruction execution, thereby optimizing the efficiency of the micro-engine fetching instructions. Since more low-speed memory is used, hardware resources are saved.
  • The apparatus further includes a first arbitration module (arbiter1), a second arbitration module (arbiter2), and a third arbitration module (arbiter3).
  • Each micro-engine corresponds to an arbiter1, and the arbiter1 is connected to the cache of that micro-engine;
  • each micro-engine group corresponds to an arbiter2, one end of which is connected to the arbiter1 of each micro-engine in the group, and the other end is connected to the IMEM; and
  • each micro-engine large group corresponds to an arbiter3, one end of which is connected to the arbiter1 of each micro-engine in the large group, and the other end is connected to the IMEM-COM.
  • The arbiter1 is configured to: when the cache sends an instruction data request, determine whether the requested instruction is located in the IMEM or in the IMEM-COM; when the requested instruction is in the IMEM, send the instruction data request to the IMEM, and when it is in the IMEM-COM, send the instruction data request to the IMEM-COM; and receive the instruction data returned by the IMEM or the IMEM-COM and return it to the cache;
  • the arbiter2 is configured to: when receiving instruction data requests sent by one or more arbiter1 modules, select one instruction data request and send it to the IMEM for processing, and return the instruction data fetched by the IMEM to the arbiter1; and
  • the arbiter3 is configured to: when receiving instruction data requests sent by one or more arbiter1 modules, select one instruction data request and send it to the IMEM-COM for processing, and return the instruction data fetched by the IMEM-COM to the arbiter1.
  • Each large group of 32 micro-engines can be divided into 4 groups, each group including 8 micro-engines.
  • Each micro-engine corresponds to one Qmem and one Cache (comprising two instruction caches (icache)); each group of 8 micro-engines shares an IMEM, and each large group of 32 micro-engines shares an IMEM-COM.
  • A1 represents arbiter1;
  • A2 represents arbiter2; and
  • A3 represents arbiter3.
  • The two icaches correspond to the two message memories in the ME; they work in turn to mask the delay of message storage and instruction fetching.
  • the instruction storage method of the instruction storage device is as shown in FIG. 6, and includes the following steps.
  • Step 1: After receiving the instruction data request sent by the micro-engine, the Qmem determines whether it holds the instruction data; if so, it returns the instruction data to the micro-engine, and if not, it sends the instruction data request to the cache.
  • Step 2: After receiving the instruction data request sent by the Qmem, a Cache Line in the cache determines whether the cache holds the instruction data; if so, it returns the instruction data to the micro-engine through the Qmem, and if not, it sends the instruction data request to the IMEM or the IMEM-COM.
  • Step 3: After receiving the instruction data request sent by the cache, the IMEM looks up the instruction data and returns the found instruction data to the cache; after receiving the instruction data request sent by the cache, the IMEM-COM likewise looks up the instruction data and returns the found instruction data to the cache.
  • Step 110: The micro-engine sends the required instruction address and address enable to its Qmem. When the message memory in the micro-engine receives a message, it sends the instruction address and address enable in the message to the instruction storage device, that is, to the Qmem corresponding to the micro-engine.
  • Step 120: The Qmem determines whether the instruction address is within the address range of the instructions it stores. If yes, step 130 is performed; if not, step 140 is performed.
  • Step 130: The Qmem uses the instruction address and address enable to fetch the instruction data and returns the instruction data to the micro-engine; the fetch process ends.
  • Step 140: The Qmem transmits the instruction address and address enable to the Cache of the micro-engine.
  • Step 150: The Cache determines whether the instruction address is within the address range of the instructions it stores. If yes, step 160 is performed; if not, step 170 is performed. Since each half of the Cache has only one Cache Line, the Cache holds only one piece of tag information, so when an address request arrives it can immediately determine from the tag whether the required data is in the Cache: the tag bits of the instruction address are compared with the tag of the currently working Cache Line; if they are the same, the instruction is in the Cache, and if they differ, it is not.
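  • Because the working half of the Cache holds a single Cache Line, the hit test in step 150 reduces to one tag comparison. A minimal sketch, assuming the tag is simply the instruction address with the in-line offset bits stripped (the function name and bit widths are hypothetical):

```python
def cache_hit(instr_addr, line_tag, offset_bits):
    # With a single working Cache Line there is exactly one tag:
    # strip the in-line offset bits and compare against the stored tag.
    return (instr_addr >> offset_bits) == line_tag
```

  • For example, with 4-instruction lines (offset_bits = 2), addresses 20 through 23 all map to tag 5, so any of them hits a line whose tag is 5.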
  • Step 160: The Cache extracts the instruction data at the corresponding location in the Cache Line based on the address and sends it to the micro-engine through the Qmem; the fetch process ends.
  • Step 170: The Cache sends the instruction address and address enable to the first arbitration module (arbiter1).
  • Step 180: arbiter1 determines, based on the instruction address, whether the instruction is in the IMEM corresponding to the micro-engine's group or in the IMEM-COM corresponding to the micro-engine's large group. If it is in the IMEM, step 190 is performed; if it is in the IMEM-COM, step 210 is performed.
  • Step 190: arbiter1 sends the instruction address and address enable to the second arbitration module (arbiter2).
  • Step 200: arbiter2 selects one instruction request and sends it to the IMEM; the IMEM fetches the instruction data according to the instruction address and address enable in the request and returns the instruction data to the Cache through arbiter1, after which step 230 is performed.
  • When the arbiter1 modules of multiple micro-engines initiate fetch requests to arbiter2, arbiter2 processes the requests of each Cache by polling, selecting one fetch request and sending it to the IMEM for processing; since the data return takes multiple clock cycles, a branch that has already issued a request is no longer polled.
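  • The polling behaviour just described — round-robin selection, with branches that have an outstanding multi-cycle request excluded from polling until their data returns — can be sketched as follows; the class and method names are hypothetical, not from the patent:

```python
class PollingArbiter:
    """Round-robin arbiter that skips branches with an outstanding request
    until their data has returned (the backing RAM takes multiple cycles)."""
    def __init__(self, n_ports):
        self.n = n_ports
        self.last = -1             # last granted port, for round-robin fairness
        self.outstanding = set()   # ports waiting for data to return

    def grant(self, requests):
        # requests: set of port indices currently requesting a fetch
        for i in range(1, self.n + 1):
            port = (self.last + i) % self.n
            if port in requests and port not in self.outstanding:
                self.last = port
                self.outstanding.add(port)  # stop polling this branch
                return port
        return None                # nothing grantable this cycle

    def data_returned(self, port):
        # the low-speed memory delivered data: the branch is pollable again
        self.outstanding.discard(port)
```

  • The same sketch applies to arbiter3, with the IMEM-COM as the backing memory.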
  • Step 210: arbiter1 sends the instruction address and address enable to the third arbitration module (arbiter3).
  • The function of arbiter3 is the same as the function of arbiter2, except that arbiter3 arbitrates the requests from the arbiter1 of each micro-engine in the large group and forwards the selected request to the IMEM-COM.
  • Step 230: The Cache updates the contents of the Cache Line and the Tag and returns the instruction data to the micro-engine through the Qmem; the fetch process ends.
  • Figure 8 shows the structure of the icache in Figure 5. After the icache receives the instruction address sent by the Qmem, it compares the address with the Tag to judge whether there is a hit. If it is a hit, then after decoding, the instruction content is fetched from the physical storage location of the icache according to the address enable and output through the multiplexer. If it is a miss, the request continues to the low-speed instruction memory, which fetches the instruction data, and the returned instruction data is output through the multiplexer.
  • If the Cache Line 1 used by the current message finds the corresponding instruction data in the Cache, no read request is issued to the lower-level low-speed instruction memory (IMEM or IMEM-COM). If Cache Line 2 detects the first instruction address of the next message, Cache Line 2 issues a read request to the lower-level low-speed instruction memory with that first address to obtain the instruction data required by the next message. After the packet of the current Cache Line 1 is processed, the Cache switches to the other half, Cache Line 2, to prepare for processing the next packet.
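  • A minimal sketch of this ping-pong operation between the two Cache Lines, assuming the caller supplies the low-speed fetch function; the class name, method names, and prefetch length are illustrative assumptions:

```python
class PingPongCache:
    def __init__(self, fetch_from_low_speed):
        self.lines = [{}, {}]      # the two Cache Lines
        self.active = 0            # index of the line serving the current message
        self.fetch = fetch_from_low_speed

    def prefetch_next(self, first_addr, n_instr):
        # While the active line serves the current message, the idle line
        # issues a read for the next message's first instructions,
        # hiding the low-speed memory's latency.
        idle = 1 - self.active
        self.lines[idle] = {a: self.fetch(a)
                            for a in range(first_addr, first_addr + n_instr)}

    def switch(self):
        # The current message is done: swap lines for the next message.
        self.active = 1 - self.active
```

  • The switch mirrors the ping-pong operation of the two message memories, which is why the patent keeps the two synchronized.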
  • Processing messages with the ping-pong operation can effectively hide the message storage time and the delay of fetching instructions from the low-speed instruction memory, so the required instructions can be obtained immediately and fetch efficiency is increased, which improves the processing efficiency of the micro-engine.
  • The instruction storage scheme of the embodiment of the invention effectively ensures a high fetch efficiency for a part of the instructions and a high average fetch efficiency, saves a large amount of hardware storage resources, and keeps the compiler very simple to implement.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Multi Processors (AREA)

Abstract

An instruction storage device of a network processor and an instruction storage method for same. The device comprises: quick memories (Qmems), buffers, a first low-speed instruction memory and a second low-speed instruction memory. The network processor comprises more than two micro engine large groups, each micro engine large group comprising N micro engines, and the N micro engines being divided into more than two micro engine subgroups; each micro engine corresponds to one Qmem and one buffer, the Qmem being connected to the micro engine, and the buffer being connected to the Qmem; each micro engine subgroup corresponds to one first low-speed instruction memory, the buffer corresponding to each micro engine in the micro engine subgroup being connected to the first low-speed instruction memory; and each micro engine large group corresponds to one second low-speed instruction memory. In this solution, a high instruction fetch efficiency is ensured, a large amount of hardware storage resources is saved, and the realization of a compiler is made simpler.

Description

Instruction storage device of a network processor and instruction storage method of the device
Technical field
The present invention relates to the field of the Internet, and in particular to an instruction storage device of a network processor and an instruction storage method of the instruction storage device.
Background
With the rapid development of the Internet, the interface rate of core routers used for backbone-network interconnection has reached 100 Gbps, which requires a core router's line cards to rapidly process the packets passing through them. Most of the industry currently uses a multi-core network-processor architecture, and instruction fetch efficiency is a key factor affecting the performance of multi-core network processors.
In a network processor system with a multi-core structure, the micro-engines (Micro Engines, MEs) in the same group have the same instruction requirements. Due to chip-area and process limitations, it is impossible to equip each micro-engine with an exclusive storage space for these instructions. A scheme is therefore needed that lets a group of micro-engines share one instruction storage space while maintaining high fetch efficiency.
Some traditional multi-core network processors use a multi-level cache structure: for example, each micro-engine is equipped with a separate level-1 cache, and a group of micro-engines shares a level-2 cache to achieve storage-space sharing, as shown in Figure 1. These caches are given a large capacity to ensure a good hit rate, but because network packets arrive randomly, instruction locality is weak; a large cache therefore does not guarantee fetch efficiency and also wastes a great deal of resources.
Other network processors use a polling instruction storage scheme, storing the instructions required by a group of micro-engines in the same number of random access memories (RAM) as there are micro-engines. As shown in Figure 2, the four micro-engines in the figure poll the instructions in the four RAMs through an arbitration module. Each micro-engine accesses all of the RAMs in turn, and their accesses are always in different "phases", so different micro-engines never collide on the same RAM, realizing storage-space sharing. However, the instructions contain a large number of jump instructions. Suppose that, for a pipelined micro-engine, n clock cycles elapse from fetching a jump instruction to completing the jump; to guarantee that the target of a jump instruction lies in the (n+1)-th RAM after the RAM holding the jump instruction, empty instructions must be inserted when writing the instructions so that the jump target lands in the correct position. When jump instructions account for a large proportion, many empty instructions must be inserted, wasting a large amount of instruction space and increasing the complexity of the compiler. Moreover, this scheme requires every RAM to return data within one clock cycle, so it must be implemented with static RAM (SRAM), and the heavy use of SRAM also incurs a large resource overhead.
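The rotating-"phase" access pattern of this prior-art polling scheme can be illustrated with a one-line mapping; the function name and the choice of four RAMs are assumptions made for the sketch:

```python
def ram_for_engine(engine_id, cycle, n_rams=4):
    # Engine e reads RAM (e + t) mod N in cycle t: every engine visits every
    # RAM in turn, and within any one cycle the engine-to-RAM mapping is a
    # permutation, so two engines never access the same RAM simultaneously.
    return (engine_id + cycle) % n_rams
```

Because the mapping is a permutation in every cycle, the collision-free sharing follows directly; the NOP-insertion problem the patent criticizes arises because a jump's target must land in a specific RAM in the rotation.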
发明内容 Summary of the invention
本发明实施例提供一种网络处理器的指令存储装置及该指令存储装置的 指令存储方法, 能够节约硬件资源。  Embodiments of the present invention provide an instruction storage device of a network processor and an instruction storage method of the instruction storage device, which can save hardware resources.
本发明实施例提供了一种网络处理器的指令存储装置, 包括: 快速存储 器(Qmem ) 、 緩存(cache ) 、 第一低速指令存储器和第二低速指令存储器, 其中:  An embodiment of the present invention provides an instruction storage device for a network processor, including: a fast memory (Qmem), a cache (cache), a first low-speed instruction memory, and a second low-speed instruction memory, where:
所述网络处理器包括两个以上的微引擎大组, 每个微引擎大组包括 N个 微引擎, 所述 N个微引擎分成两个以上的微引擎小组;  The network processor includes two or more micro-engine large groups, each micro-engine large group includes N micro-engines, and the N micro-engines are divided into two or more micro-engine groups;
每个微引擎对应一个 Qmem和一个緩存,所述 Qmem与所述微引擎连接, 所述緩存与所述 Qmem相连;  Each microengine corresponds to a Qmem and a cache, the Qmem is connected to the microengine, and the cache is connected to the Qmem;
每个微引擎小组对应一个第一低速指令存储器, 所述微引擎小组中每个 微引擎对应的緩存与所述第一低速指令存储器相连; 以及  Each micro-engine group corresponds to a first low-speed instruction memory, and a cache corresponding to each of the micro-engine groups is connected to the first low-speed instruction memory;
每个微引擎大组对应一个第二低速指令存储器, 所述微引擎大组中每个 微引擎对应的緩存与所述第二低速指令存储器相连。  Each micro-engine large group corresponds to a second low-speed instruction memory, and a cache corresponding to each micro-engine in the micro-engine large group is connected to the second low-speed instruction memory.
可选地 ,  Optionally,
所述 Qmem设置成: 在接收到所述微引擎发送的指令数据请求后, 判断 本 Qmem是否有指令数据, 如果有, 则将所述指令数据返回给所述微引擎, 如果没有, 则向所述緩存发送所述指令数据请求。  The Qmem is configured to: after receiving the instruction data request sent by the microengine, determine whether the Qmem has instruction data, and if yes, return the instruction data to the microengine, if not, then The cache sends the instruction data request.
可选地 ,  Optionally,
所述 Qmem中存储对处理质量要求最高的一个地址段的指令。  The Qmem stores an instruction for an address segment that has the highest processing quality.
可选地 , 所述緩存包括两个 Cache Line, 每个 Cache Line存放多条连续的指令; 所述緩存设置成: 在接收到所述 Qmem发送的指令数据请求后, 判断本 緩存是否有所述指令数据, 如果有, 则将所述指令数据通过所述 Qmem返回 给所述微引擎, 如果没有, 则向所述第一低速指令存储器或所述第二低速指 令存储器发送所述指令数据请求。 Optionally, The cache includes two cache lines, each cache line stores a plurality of consecutive instructions; the cache is set to: after receiving the instruction data request sent by the Qmem, determining whether the cache has the instruction data, if If yes, the instruction data is returned to the microengine through the Qmem, and if not, the instruction data request is sent to the first low speed instruction memory or the second low speed instruction memory.
可选地 ,  Optionally,
所述两个 Cache Line釆用乒乓操作处理报文, 且所述乒乓操作与报文存 储器的乒乓操作同步。  The two Cache Lines process the message with a ping-pong operation, and the ping-pong operation is synchronized with the ping-pong operation of the message store.
Optionally,

the instruction storage device further includes a first arbitration module, a second arbitration module and a third arbitration module, wherein:

each microengine corresponds to one first arbitration module, and the first arbitration module is connected to the cache of the corresponding microengine;

each microengine group corresponds to one second arbitration module, one end of the second arbitration module being connected to the first arbitration module of each microengine in the microengine group and the other end being connected to the first low-speed instruction memory; and

each microengine large group corresponds to one third arbitration module, one end of the third arbitration module being connected to the first arbitration module of each microengine in the microengine large group and the other end being connected to the second low-speed instruction memory.
Optionally,

the first arbitration module is configured to: upon receiving the instruction data request from the cache, determine whether the requested instruction is located in the first low-speed instruction memory or in the second low-speed instruction memory; when the requested instruction is located in the first low-speed instruction memory, send the instruction data request to the first low-speed instruction memory; when the requested instruction is located in the second low-speed instruction memory, send the instruction data request to the second low-speed instruction memory; and receive the instruction data returned by the first low-speed instruction memory or the second low-speed instruction memory and return the instruction data to the cache;

the second arbitration module is configured to: when receiving instruction data requests sent by one or more first arbitration modules, select one instruction data request and send it to the first low-speed instruction memory for processing, and return the instruction data fetched by the first low-speed instruction memory to the corresponding first arbitration module; and

the third arbitration module is configured to: when receiving instruction data requests sent by one or more first arbitration modules, select one instruction data request and send it to the second low-speed instruction memory for processing, and return the instruction data fetched by the second low-speed instruction memory to the corresponding first arbitration module.
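The first arbitration module's routing decision is made from the instruction address. A sketch (illustrative only; the concrete address ranges are assumptions — the disclosure only states that the two low-speed memories hold different address segments):

```python
def route_request(addr, imem_base, imem_limit):
    """Decide which lower-level memory a missed request goes to.
    [imem_base, imem_limit) is the segment held by the group's IMEM."""
    if imem_base <= addr < imem_limit:
        return "IMEM"      # first low-speed memory, shared by one group
    return "IMEM_COM"      # second low-speed memory, shared by the large group
```

The second and third arbitration modules then serialize the requests arriving from the several first arbitration modules on each path.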
Optionally,

the cache is further configured to: after receiving the instruction data returned by the first arbitration module, update the cache contents and the tag.
Optionally,

each microengine large group includes 32 microengines, the 32 microengines are divided into 4 microengine groups, and each microengine group includes 8 microengines.
An embodiment of the present invention further provides a method for storing instructions with the instruction storage device described above, the method including:

a fast memory (Qmem), after receiving an instruction data request sent by a microengine, determines whether this Qmem holds the requested instruction data; if it does, it returns the instruction data to the microengine; if it does not, it sends the instruction data request to a cache;

a Cache Line in the cache, after receiving the instruction data request sent by the Qmem, determines whether this cache holds the instruction data; if it does, it returns the instruction data to the microengine through the Qmem; if it does not, it sends the instruction data request to a first low-speed instruction memory or a second low-speed instruction memory;

the first low-speed instruction memory, after receiving the instruction data request sent by the cache, looks up the instruction data and returns the found instruction data to the cache; and

the second low-speed instruction memory, after receiving the instruction data request sent by the cache, looks up the instruction data and returns the found instruction data to the cache.
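The steps of the method form a single walk down the hierarchy, which can be sketched as follows (a behavioral model for illustration; the dict-backed memories and the `(base, limit)` range are assumptions, not part of the disclosure):

```python
def fetch(addr, qmem, cache, imem, imem_com, imem_range):
    """Walk the storage hierarchy in the order of the method:
    Qmem -> cache -> first/second low-speed instruction memory."""
    if addr in qmem:            # Qmem hit: served immediately
        return qmem[addr]
    if addr in cache:           # Cache Line hit
        return cache[addr]
    base, limit = imem_range
    backing = imem if base <= addr < limit else imem_com
    data = backing[addr]        # low-speed memory lookup
    cache[addr] = data          # refill the cache on the way back
    return data
```

Each lower level is consulted only when the level above misses, so the fast paths dominate the average fetch latency.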
Optionally, the method further includes:

when a Cache Line in the cache determines that this cache does not hold the instruction data, it sends the instruction data request to the first arbitration module; when the first arbitration module determines that the requested instruction is located in the first low-speed instruction memory, it sends the instruction data request to the first low-speed instruction memory; when the first arbitration module determines that the requested instruction is located in the second low-speed instruction memory, it sends the instruction data request to the second low-speed instruction memory.
Optionally, the method further includes:

when the first arbitration module determines that the requested instruction is located in the first low-speed instruction memory, it sends the instruction data request to the second arbitration module; when the second arbitration module receives instruction data requests from one or more first arbitration modules, it selects one instruction data request and sends it to the first low-speed instruction memory; and

when the first arbitration module determines that the requested instruction is located in the second low-speed instruction memory, it sends the instruction data request to the third arbitration module; when the third arbitration module receives instruction data requests from one or more first arbitration modules, it selects one instruction data request and sends it to the second low-speed instruction memory.
In the instruction storage scheme based on fast memory and cache for multi-core network processors provided by the embodiments of the present invention, a fast memory, a small-capacity cache operated in ping-pong fashion, and low-speed dynamic RAM (Dynamic RAM, DRAM) are combined, and the memories adopt a hierarchical grouping strategy. This instruction storage scheme effectively guarantees a high instruction fetch efficiency for a portion of the instructions together with a high average fetch efficiency, saves a large amount of hardware storage resources, and keeps the compiler implementation very simple.
Brief Description of the Drawings

FIG. 1 is a schematic structural diagram of a conventional two-level cache.

FIG. 2 is a schematic structural diagram of a polling-based instruction storage scheme.

FIG. 3 is a schematic structural diagram of an instruction storage device according to Embodiment 1 of the present invention.

FIG. 4 is a schematic structural diagram of a specific instruction storage device according to an embodiment of the present invention.

FIG. 5 is a schematic diagram of the ping-pong operation of the packet memory and the icache according to an embodiment of the present invention.

FIG. 6 is a processing flowchart of the instruction storage device according to an embodiment of the present invention.

FIG. 7 is a detailed processing flowchart of an instruction storage device according to an embodiment of the present invention.

FIG. 8 is a process diagram of the operation of one Cache Line in the cache module according to an embodiment of the present invention.
Preferred Embodiments of the Invention

In the embodiments of the present invention, a fast memory (Quick Memory, Qmem for short), a small-capacity cache (Cache) operated in ping-pong fashion, and low-speed RAM (e.g., a low-speed instruction memory (Instruction Memory, IMEM for short)) are combined to serve as the instruction store of the microengines.

The embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted that, provided there is no conflict, the embodiments of the present application and the features in the embodiments may be combined with one another arbitrarily.
Embodiment 1

The instruction storage device of this embodiment, shown in FIG. 3, adopts the following structure.
A microengine large group includes N microengines, and the N microengines are divided into two or more microengine groups. Each microengine corresponds to one Qmem and one Cache; each microengine group corresponds to one first low-speed instruction memory (hereinafter IMEM); and the N microengines of the large group correspond to one second low-speed instruction memory (hereinafter IMEM_COM). As shown in FIG. 3, the Qmem is connected to the microengine, and the cache is connected to the Qmem; the cache corresponding to each microengine in a microengine group is connected to the first low-speed instruction memory; and the cache corresponding to each microengine in the large group is connected to the second low-speed instruction memory.

The Qmem is configured to: after receiving an instruction data request sent by the microengine, determine whether this Qmem holds the instruction data; if it does, return the instruction data to the microengine; if it does not, send the instruction data request to the cache. The Qmem stores the instructions of the one address segment with the highest processing-quality requirements and can be implemented with SRAM, whose read/write speed is high. The contents of the Qmem are never updated during packet processing; when the microengine needs these instructions, the Qmem can return the requested instruction data within one clock cycle, which greatly improves the fetch efficiency.

The Cache has two Cache Lines, and each Cache Line can store a plurality of consecutive instructions. A Cache Line is configured to: after receiving the instruction data request sent by the Qmem, determine whether this cache holds the instruction data; if it does, return the instruction data to the microengine through the Qmem; if it does not, send the instruction data request to the IMEM or the IMEM_COM. The two Cache Lines process packets in a ping-pong manner, and this ping-pong operation is synchronized with the ping-pong operation of the packet memory.

The IMEM and the IMEM_COM are each configured to: store a block of instructions located in a distinct address segment, look up instruction data in response to an instruction data request, and return the instruction data.

Across the four storage locations — Qmem, Cache, IMEM and IMEM_COM — the access speed decreases in turn. The hierarchical memories effectively exploit the differing execution probabilities of instructions, thereby optimizing the efficiency with which the microengines fetch instructions; and because the larger share of storage uses low-speed memory, hardware resources are saved.
Optionally, the device further includes a first arbitration module (arbiter1), a second arbitration module (arbiter2) and a third arbitration module (arbiter3). Each microengine corresponds to one arbiter1, which is connected to the cache of that microengine; each microengine group corresponds to one arbiter2, one end of which is connected to the arbiter1 of each microengine in the group and the other end to the IMEM; and each microengine large group corresponds to one arbiter3, one end of which is connected to the arbiter1 of each microengine in the large group and the other end to the IMEM_COM.

The arbiter1 is configured to: upon receiving the instruction data request from the cache, determine whether the requested instruction is located in the IMEM or in the IMEM_COM; when the requested instruction is located in the IMEM, send the instruction data request to the IMEM; when it is located in the IMEM_COM, send the instruction data request to the IMEM_COM; and receive the instruction data returned by the IMEM or the IMEM_COM and return it to the cache.

The arbiter2 is configured to: when receiving instruction data requests sent by one or more arbiter1 modules, select one instruction data request, send it to the IMEM for processing, and return the instruction data fetched by the IMEM to the corresponding arbiter1.

The arbiter3 is configured to: when receiving instruction data requests sent by one or more arbiter1 modules, select one instruction data request, send it to the IMEM_COM for processing, and return the instruction data fetched by the IMEM_COM to the corresponding arbiter1.
Taking N=32 as an example, the 32 microengines of each large group can be divided into 4 groups of 8 microengines each. As shown in FIG. 4, each microengine corresponds to one Qmem and one Cache (comprising two instruction caches (icache)); the 8 microengines of each group share one IMEM, and the 32 microengines of each large group share one IMEM_COM. In FIG. 4, A1 denotes arbiter1, A2 denotes arbiter2, and A3 denotes arbiter3. As shown in FIG. 5, the two icaches correspond one-to-one to the two packet memories in the ME; they work in turn to hide the latency of packet storage and instruction fetch.
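For the N=32 example, the mapping from a microengine's index to the memories it uses can be written down directly (the naming scheme below is purely illustrative; only the 4×8 grouping comes from the embodiment):

```python
GROUP_SIZE = 8   # microengines per group (from the embodiment)
GROUPS = 4       # groups per large group

def memories_for(me_index):
    """Map a microengine index (0..31) to its private and shared memories."""
    assert 0 <= me_index < GROUP_SIZE * GROUPS
    return {
        "qmem": f"Qmem_{me_index}",                # private, one per microengine
        "cache": f"Cache_{me_index}",              # private, two Cache Lines
        "imem": f"IMEM_{me_index // GROUP_SIZE}",  # shared by the 8 of one group
        "imem_com": "IMEM_COM",                    # shared by all 32
    }
```

The sharing ratios (1:1, 8:1, 32:1) mirror the decreasing access speed of the four levels.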
Embodiment 2

Corresponding to the instruction storage device shown in FIG. 3, the instruction storage method of the device, shown in FIG. 6, includes the following steps.

Step 1: after receiving an instruction data request sent by the microengine, the Qmem determines whether this Qmem holds the instruction data; if it does, it returns the instruction data to the microengine; if it does not, it sends the instruction data request to the cache.

Step 2: after receiving the instruction data request sent by the Qmem, a Cache Line in the cache determines whether this cache holds the instruction data; if it does, it returns the instruction data to the microengine through the Qmem; if it does not, it sends the instruction data request to the IMEM or the IMEM_COM.

Step 3: after receiving the instruction data request sent by the cache, the IMEM looks up the instruction data and returns the found instruction data to the cache; likewise, after receiving the instruction data request sent by the cache, the IMEM_COM looks up the instruction data and returns the found instruction data to the cache.
For any microengine, the instruction fetch process, shown in FIG. 7, includes the following steps.

Step 110: the microengine sends the required instruction address and address enable to its Qmem.

When the packet memory in the microengine receives a packet, it sends the instruction address and address enable carried in the packet to the instruction storage device, i.e., to the Qmem corresponding to that microengine.

Step 120: the Qmem determines whether the instruction address is within the address range of the instructions it stores; if so, step 130 is performed; if not, step 140 is performed.

Step 130: the Qmem fetches the instruction data using the instruction address and address enable and returns the instruction data to the microengine; this fetch process ends.

Step 140: the Qmem forwards the instruction address and address enable to the Cache of the microengine.

Step 150: the Cache determines whether the instruction address is within the address range of the instructions it stores; if so, step 160 is performed; if not, step 170 is performed.

Since each half of the Cache has only one Cache Line, the Cache tag (Tag) holds the information of a single tag; when an address request arrives at the Cache, whether the required data is in the Cache can be determined immediately from the Tag: the relevant bits of the instruction address are compared with the Tag of the currently working Cache Line; if they are equal, the instruction is in the Cache; if they differ, the instruction is not in the Cache.
Step 160: based on the address enable, the Cache reads the instruction data from the corresponding position in the Cache Line and sends it to the microengine through the Qmem; this fetch process ends.

Step 170: the Cache sends the instruction address and address enable to the first arbitration module (arbiter1).

Step 180: arbiter1 determines whether the instruction address lies in the IMEM corresponding to the microengine group of this microengine or in the IMEM_COM corresponding to its large group; if in the IMEM, step 190 is performed; if in the IMEM_COM, step 210 is performed.

arbiter1 determines from the instruction address whether the instruction is in the IMEM or in the IMEM_COM.

Step 190: arbiter1 sends the instruction address and address enable to the second arbitration module (arbiter2).

Step 200: arbiter2 selects one instruction request and sends it to the IMEM; the IMEM fetches the instruction data according to the instruction address and address enable in the request and returns the instruction data to the Cache through arbiter1; step 230 is performed.

When the arbiter1 modules of several microengines issue fetch requests to arbiter2 at the same time, arbiter2 handles the Cache requests by polling and selects one fetch request to send to the IMEM; since the data return takes several clock cycles, a branch that has already issued a request is not polled again until its data returns.
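The polling behavior just described — grant one request per cycle, round-robin, and skip any branch whose reply is still outstanding — can be sketched as follows (an illustrative model; class and method names are assumptions, not from the disclosure):

```python
class RoundRobinArbiter:
    """Sketch of arbiter2/arbiter3: one grant per cycle, round-robin,
    with branches awaiting data masked out of the polling."""

    def __init__(self, n_ports):
        self.n = n_ports
        self.next_port = 0
        self.outstanding = set()   # branches waiting for instruction data

    def grant(self, pending):
        for i in range(self.n):
            port = (self.next_port + i) % self.n
            if port in pending and port not in self.outstanding:
                self.next_port = (port + 1) % self.n
                self.outstanding.add(port)
                return port
        return None                # nothing grantable this cycle

    def complete(self, port):
        # Data returned after several cycles; the branch may request again.
        self.outstanding.discard(port)
```

Masking outstanding branches keeps a single slow reply from being granted twice while its data is still in flight.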
Step 210: arbiter1 sends the instruction address and address enable to the third arbitration module (arbiter3).

Step 220: arbiter3 selects one instruction request and sends it to the IMEM_COM; the IMEM_COM fetches the instruction data according to the instruction address and address enable in the request and returns the instruction data to the Cache through arbiter1; step 230 is performed.

The arbiter corresponding to each microengine functions as arbiter1 does, and arbiter3 functions as arbiter2 does.

Step 230: the Cache updates the contents of the Cache Line and the Tag and returns the instruction data to the microengine through the Qmem; this fetch process ends.

FIG. 8 is a schematic structural diagram of the icache in FIG. 5. After receiving the instruction address sent by the Qmem, the icache compares it with the Tag to determine whether it is a hit. On a hit, after decoding, the instruction content is read from the physical storage location of the icache according to the address enable and output through a multiplexer. On a miss, the request continues to the low-speed instruction memory to fetch the instruction data, and the returned instruction data is output through the multiplexer.
When processing a given packet, only one Cache Line in the Cache is used. While Cache Line 1, used by the current packet, is finding the corresponding instruction data in the Cache without issuing read requests to the lower-level low-speed instruction memory (IMEM or IMEM_COM), if Cache Line 2 detects a request carrying the first instruction address of the next packet, Cache Line 2 issues a read request to the lower-level low-speed instruction memory with that first instruction address so as to obtain the instruction data needed for the next packet. Once the current packet of Cache Line 1 has been processed, the Cache switches to the other half, Cache Line 2, ready to process the next packet. Processing packets with this ping-pong operation effectively hides the packet storage time and the latency of fetching from the low-speed instruction memory; the needed instructions are available the moment the microengine switches to the next packet, which improves the fetch efficiency and hence the processing efficiency of the microengine.
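The role swap between the two Cache Lines can be sketched in a few lines (a behavioral model for illustration; the dict-backed memory and method names are assumptions):

```python
class PingPongCache:
    """Sketch of the two-Cache-Line ping-pong: one line serves the
    current packet while the idle line prefetches the next packet's
    first instructions, hiding the low-speed-memory latency."""

    def __init__(self, backing):
        self.backing = backing     # low-speed memory: first address -> block
        self.lines = [None, None]  # blocks held by the two Cache Lines
        self.active = 0            # index of the line for the current packet

    def prefetch_next(self, first_addr):
        # The idle line fetches the head instructions of the next packet.
        self.lines[1 - self.active] = self.backing[first_addr]

    def switch_packet(self):
        # Swap roles: the prefetched line becomes the working line.
        self.active = 1 - self.active
        return self.lines[self.active]
```

In the model the prefetch is instantaneous; in hardware it overlaps with the processing of the current packet, which is the point of the scheme.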
Those of ordinary skill in the art will understand that all or some of the steps of the above method may be implemented by a program instructing the relevant hardware, the program being stored in a computer-readable storage medium such as a read-only memory, a magnetic disk or an optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented with one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware or in the form of a software function module. The embodiments of the present invention are not limited to any specific combination of hardware and software.
Of course, the present invention may also have various other embodiments; those skilled in the art can make corresponding changes and variations without departing from the spirit and essence of the present invention, and all such corresponding changes and variations shall fall within the protection scope of the claims appended to the present invention.
Industrial Applicability

The instruction storage scheme of the embodiments of the present invention effectively guarantees a high instruction fetch efficiency for a portion of the instructions and a high average fetch efficiency, saves a large amount of hardware storage resources, and keeps the compiler implementation very simple.

Claims

1. An instruction storage device of a network processor, the network processor comprising two or more microengine large groups, each microengine large group comprising N microengines, the N microengines comprising two or more microengine groups, the instruction storage device comprising: a fast memory (Qmem), a cache, a first low-speed instruction memory and a second low-speed instruction memory, wherein:

each microengine corresponds to one Qmem and one cache, the Qmem is arranged to be connected to the microengine, and the cache is connected to the Qmem;

each microengine group corresponds to one first low-speed instruction memory, and the cache corresponding to each microengine in the microengine group is connected to the first low-speed instruction memory; and

each microengine large group corresponds to one second low-speed instruction memory, and the cache corresponding to each microengine in the microengine large group is connected to the second low-speed instruction memory.
2. The instruction storage device according to claim 1, wherein:

the Qmem is configured to: after receiving an instruction data request sent by the microengine, determine whether this Qmem holds the requested instruction data; if it does, return the instruction data to the microengine; if it does not, send the instruction data request to the cache.
3. The device according to claim 1 or 2, wherein:

the Qmem stores the instructions of the one address segment with the highest processing-quality requirements.
4. The instruction storage device according to claim 1, wherein:

the cache comprises two Cache Lines, each Cache Line storing a plurality of consecutive instructions; and a Cache Line is configured to: after receiving the instruction data request sent by the Qmem, determine whether this cache holds the instruction data; if it does, return the instruction data to the microengine through the Qmem; if it does not, send the instruction data request to the first low-speed instruction memory or the second low-speed instruction memory.
5. The device according to claim 4, wherein:

the two Cache Lines process packets in a ping-pong manner, and the ping-pong operation is synchronized with the ping-pong operation of the packet memory.
6. The instruction storage device according to claim 1, 2, 4 or 5, further comprising a first arbitration module, a second arbitration module and a third arbitration module, wherein:

each microengine corresponds to one first arbitration module, and the first arbitration module is connected to the cache of the corresponding microengine;

each microengine group corresponds to one second arbitration module, one end of the second arbitration module being connected to the first arbitration module of each microengine in the microengine group and the other end being connected to the first low-speed instruction memory; and

each microengine large group corresponds to one third arbitration module, one end of the third arbitration module being connected to the first arbitration module of each microengine in the microengine large group and the other end being connected to the second low-speed instruction memory.
7. The instruction storage device according to claim 6, wherein:

the first arbitration module is configured to: upon receiving the instruction data request from the cache, determine whether the requested instruction is located in the first low-speed instruction memory or in the second low-speed instruction memory; when determining that the requested instruction is located in the first low-speed instruction memory, send the instruction data request to the first low-speed instruction memory; when determining that the requested instruction is located in the second low-speed instruction memory, send the instruction data request to the second low-speed instruction memory; and receive the instruction data returned by the first low-speed instruction memory or the second low-speed instruction memory and return the instruction data to the cache;

the second arbitration module is configured to: upon receiving instruction data requests sent by one or more first arbitration modules, select one instruction data request and send it to the first low-speed instruction memory for processing, and return the instruction data fetched by the first low-speed instruction memory to the first arbitration module; and

the third arbitration module is configured to: upon receiving instruction data requests sent by one or more first arbitration modules, select one instruction data request and send it to the second low-speed instruction memory for processing, and return the instruction data fetched by the second low-speed instruction memory to the first arbitration module.
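The selection step shared by the second and third arbitration modules in claim 7 (several first arbitration modules may compete for the same low-speed instruction memory, and exactly one request is granted) can be illustrated with a minimal arbiter sketch. The claim does not specify an arbitration policy, so the round-robin scheme below is an assumption for illustration only, and the class and method names are hypothetical:

```python
class RoundRobinArbiter:
    """Sketch of the second/third arbitration module's selection step:
    among the requesters with a pending instruction data request, grant
    exactly one per cycle. Round-robin is an illustrative assumption;
    the patent does not fix a particular selection policy."""

    def __init__(self, num_requesters):
        self.num_requesters = num_requesters
        # Start one position "behind" requester 0 so requester 0 wins first.
        self.last_grant = num_requesters - 1

    def grant(self, request_mask):
        """request_mask[i] is True if requester i has a pending request.
        Returns the index of the granted requester, or None if idle."""
        for offset in range(1, self.num_requesters + 1):
            i = (self.last_grant + offset) % self.num_requesters
            if request_mask[i]:
                self.last_grant = i
                return i
        return None
```

With eight first arbitration modules all requesting at once, successive calls rotate the grant (0, then 1, and so on), so no requester is starved while only one request per cycle reaches the low-speed instruction memory.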
8. The instruction storage device according to claim 7, wherein:

the cache is further configured to: after receiving the instruction data returned by the first arbitration module, update the cache content and the tag.
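The refill behavior of claim 8 (on returned instruction data, the cache updates both its content and its tag) can be sketched as follows. The direct-mapped organization and all names are assumptions for illustration; the claims do not fix the cache geometry:

```python
class CacheLine:
    """One line of the instruction cache: a valid bit, a tag, and data."""
    def __init__(self):
        self.valid = False
        self.tag = None
        self.data = None

class InstructionCache:
    """Sketch of the cache refill in claim 8: when instruction data comes
    back via the first arbitration module, the selected line's content and
    tag are updated together. Direct mapping is an illustrative assumption."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = [CacheLine() for _ in range(num_lines)]

    def lookup(self, addr):
        """Return the cached instruction data, or None on a miss
        (a miss would forward the request to the first arbitration module)."""
        line = self.lines[addr % self.num_lines]
        if line.valid and line.tag == addr // self.num_lines:
            return line.data
        return None

    def refill(self, addr, data):
        """Update cache content and tag with returned instruction data."""
        line = self.lines[addr % self.num_lines]
        line.data = data
        line.tag = addr // self.num_lines
        line.valid = True
```

Updating data and tag together is what keeps a later hit check (valid bit plus tag comparison) consistent: a line whose tag was updated without its data would return stale instructions.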
9. The instruction storage device according to claim 1, 2, 4, 5, 7, or 8, wherein each microengine large group comprises 32 microengines, the 32 microengines being divided into 4 microengine groups, and each microengine group comprising 8 microengines.
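Combining the grouping of claim 9 with the per-level arbitration modules of claim 6 gives a fixed wiring budget, which the small sketch below computes. The function and dictionary field names are hypothetical; only the counts follow from the claims:

```python
def build_topology(num_microengines=32, group_size=8):
    """Sketch of the hierarchy in claims 6 and 9: a large group of 32
    microengines split into 4 groups of 8, with one first arbitration
    module per microengine, one second arbitration module per group, and
    one third arbitration module per large group."""
    assert num_microengines % group_size == 0
    num_groups = num_microengines // group_size
    groups = [list(range(g * group_size, (g + 1) * group_size))
              for g in range(num_groups)]
    return {
        "first_arbiters": num_microengines,  # one per microengine
        "second_arbiters": num_groups,       # one per microengine group
        "third_arbiters": 1,                 # one per microengine large group
        "groups": groups,
    }
```

For the numbers in claim 9 this yields 32 first arbitration modules, 4 second arbitration modules, and a single third arbitration module per large group.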
10. A method for storing instructions using the instruction storage device according to claim 1, the method comprising:

after receiving an instruction data request sent by a microengine, the fast memory (Qmem) determining whether the Qmem itself holds the instruction data, and if so, returning the instruction data to the microengine, or if not, sending the instruction data request to the cache;

after receiving the instruction data request sent by the Qmem, a cache line in the cache determining whether the cache holds the instruction data, and if so, returning the instruction data to the microengine through the Qmem, or if not, sending the instruction data request to the first low-speed instruction memory or the second low-speed instruction memory;

after receiving the instruction data request sent by the cache, the first low-speed instruction memory looking up the instruction data and returning the found instruction data to the cache; and

after receiving the instruction data request sent by the cache, the second low-speed instruction memory looking up the instruction data and returning the found instruction data to the cache.
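The lookup order of claim 10 (fast memory first, then cache, then one of the two low-speed instruction memories) can be sketched as a three-level fetch path. Modeling each level as a dictionary, and splitting the two low-speed memories by an address boundary, are assumptions made for illustration; the claims do not specify how addresses map to the two memories:

```python
def fetch(addr, qmem, cache, first_mem, second_mem, boundary):
    """Sketch of the fetch path in claim 10: try the fast memory (Qmem),
    then the cache, and on a cache miss go to the first or second
    low-speed instruction memory, refilling the cache on the way back
    (per claim 8). The address 'boundary' splitting the two low-speed
    memories is an illustrative assumption."""
    if addr in qmem:          # Qmem hit: return directly to the microengine
        return qmem[addr]
    if addr in cache:         # cache hit: returned through the Qmem
        return cache[addr]
    backing = first_mem if addr < boundary else second_mem
    data = backing[addr]      # low-speed instruction memory lookup
    cache[addr] = data        # update cache content with the returned data
    return data
```

The point of the hierarchy is that the common case (a Qmem or cache hit) never touches the shared low-speed memories, so the arbitration of claims 6 and 7 only has to absorb the miss traffic.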
11. The method for storing instructions according to claim 10, further comprising:

when a cache line in the cache determines that the cache does not hold the instruction data, sending the instruction data request to the first arbitration module; when the first arbitration module determines that the requested instruction is located in the first low-speed instruction memory, sending the instruction data request to the first low-speed instruction memory; and when the first arbitration module determines that the requested instruction is located in the second low-speed instruction memory, sending the instruction data request to the second low-speed instruction memory.
12. The method for storing instructions according to claim 11, further comprising:

when the first arbitration module determines that the requested instruction is located in the first low-speed instruction memory, sending the instruction data request to the second arbitration module, and when the second arbitration module receives instruction data requests sent by one or more first arbitration modules, selecting one instruction data request and sending it to the first low-speed instruction memory; and

when the first arbitration module determines that the requested instruction is located in the second low-speed instruction memory, sending the instruction data request to the third arbitration module, and when the third arbitration module receives instruction data requests sent by one or more first arbitration modules, selecting one instruction data request and sending it to the second low-speed instruction memory.
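Claims 11 and 12 together describe a routing decision in the first arbitration module: decide which low-speed instruction memory holds the requested instruction, then forward the request to the second or third arbitration module accordingly. An address-range check is assumed below as the decision rule; the claims do not specify how the determination is made, and the names are hypothetical:

```python
def route_request(addr, first_mem_base, first_mem_size):
    """Sketch of the first arbitration module's decision in claims 11-12:
    a request for an address inside the first low-speed instruction
    memory's (assumed) range goes to the second arbitration module, and
    any other request goes to the third arbitration module."""
    if first_mem_base <= addr < first_mem_base + first_mem_size:
        return "second_arbiter"   # path to the first low-speed memory
    return "third_arbiter"        # path to the second low-speed memory
```

Because the second arbiter is shared only within a microengine group while the third arbiter is shared by the whole large group, this routing decision also determines how much contention a given miss experiences.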
PCT/CN2013/078736 2012-07-06 2013-07-03 Instruction storage device of network processor and instruction storage method for same WO2013185660A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210233710.XA CN102855213B (en) 2012-07-06 2012-07-06 Instruction storage device of network processor and instruction storage method of the device
CN201210233710.X 2012-07-06

Publications (1)

Publication Number Publication Date
WO2013185660A1 true WO2013185660A1 (en) 2013-12-19

Family

ID=47401809

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/078736 WO2013185660A1 (en) 2012-07-06 2013-07-03 Instruction storage device of network processor and instruction storage method for same

Country Status (2)

Country Link
CN (1) CN102855213B (en)
WO (1) WO2013185660A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102855213B (en) * 2012-07-06 2017-10-27 ZTE Corporation Instruction storage device of network processor and instruction storage method of the device
CN106293999B (en) 2015-06-25 2019-04-30 Sanechips Technology Co., Ltd. Method and device for implementing a micro-engine function for snapshotting intermediate data of processed messages
CN108804020B (en) * 2017-05-05 2020-10-09 Huawei Technologies Co., Ltd. Storage processing method and device
CN109493857A (en) * 2018-09-28 2019-03-19 广州智伴人工智能科技有限公司 Automatic sleep/wake-up robot system
EP3893122A4 (en) * 2018-12-24 2022-01-05 Huawei Technologies Co., Ltd. Network processor and message processing method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1845529A (en) * 2006-01-25 2006-10-11 Huawei Technologies Co., Ltd. Network processing device and method
US20070234310A1 (en) * 2006-03-31 2007-10-04 Wenjie Zhang Checking for memory access collisions in a multi-processor architecture
US20110289034A1 (en) * 2010-05-19 2011-11-24 Palmer Douglas A Neural Processing Unit
CN102270180A (en) * 2011-08-09 2011-12-07 清华大学 Multicore processor cache and management method thereof
CN102855213A (en) * 2012-07-06 2013-01-02 ZTE Corporation Network processor instruction storage device and instruction storage method thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050102474A1 (en) * 2003-11-06 2005-05-12 Sridhar Lakshmanamurthy Dynamically caching engine instructions
CN100456271C (en) * 2007-03-19 2009-01-28 National University of Defense Technology Stream application-oriented on-chip memory


Also Published As

Publication number Publication date
CN102855213A (en) 2013-01-02
CN102855213B (en) 2017-10-27

Similar Documents

Publication Publication Date Title
US11809321B2 (en) Memory management in a multiple processor system
US7558925B2 (en) Selective replication of data structures
US7555597B2 (en) Direct cache access in multiple core processors
US6772268B1 (en) Centralized look up engine architecture and interface
US10970214B2 (en) Selective downstream cache processing for data access
US9529622B1 (en) Systems and methods for automatic generation of task-splitting code
CN108257078B (en) Memory aware reordering source
US20060179277A1 (en) System and method for instruction line buffer holding a branch target buffer
WO2013185660A1 (en) Instruction storage device of network processor and instruction storage method for same
WO2016101664A1 (en) Instruction scheduling method and device
US9418018B2 (en) Efficient fill-buffer data forwarding supporting high frequencies
US9384131B2 (en) Systems and methods for accessing cache memory
WO2015176315A1 (en) Hash join method, device and database management system
US9697127B2 (en) Semiconductor device for controlling prefetch operation
JP2007510989A (en) Dynamic caching engine instructions
US20140089587A1 (en) Processor, information processing apparatus and control method of processor
CN114924794B (en) Address storage and scheduling method and device for transmission queue of storage component
CN114063923A (en) Data reading method and device, processor and electronic equipment
James-Roxby et al. Time-critical software deceleration in a FCCM
US9811467B2 (en) Method and an apparatus for pre-fetching and processing work for procesor cores in a network processor
WO2021061269A1 (en) Storage control apparatus, processing apparatus, computer system, and storage control method
US10303483B2 (en) Arithmetic processing unit and control method for arithmetic processing unit
CN110674138A (en) Message searching method and device
CN118012510A (en) Network processor, network data processing device and chip
JPWO2012172694A1 (en) Arithmetic processing device, information processing device, and control method of arithmetic processing device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 13803552; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 13803552; Country of ref document: EP; Kind code of ref document: A1