CN102855213B - Instruction storage device for a network processor and instruction storage method thereof - Google Patents

Instruction storage device for a network processor and instruction storage method thereof

Info

Publication number
CN102855213B
CN102855213B (application CN201210233710.XA; application publication CN102855213A)
Authority
CN
China
Prior art keywords
low speed
instruction data
micro engine
instruction memory
speed instruction
Prior art date
Legal status
Active
Application number
CN201210233710.XA
Other languages
Chinese (zh)
Other versions
CN102855213A (en)
Inventor
郝宇
安康
王志忠
刘衡祁
Current Assignee
Sanechips Technology Co Ltd
Original Assignee
ZTE Corp
Priority date
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201210233710.XA
Publication of CN102855213A
Priority to PCT/CN2013/078736
Application granted
Publication of CN102855213B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Multi Processors (AREA)

Abstract

The invention discloses an instruction storage device for a network processor and an instruction storage method for the device, which can save hardware resources. The network processor comprises two or more large groups of micro engines; each large group of micro engines comprises N micro engines, and the N micro engines are divided into two or more micro engine groups. The instruction storage device comprises a Qmem, a cache, a first low-speed instruction memory and a second low-speed instruction memory, wherein: each micro engine corresponds to one Qmem and one cache, the Qmem is connected with the micro engine, and the cache is connected with the Qmem; each micro engine group corresponds to one first low-speed instruction memory, and the cache of each micro engine in the micro engine group is connected with the first low-speed instruction memory; each large group of micro engines corresponds to one second low-speed instruction memory, and the cache of each micro engine in the large group is connected with the second low-speed instruction memory. The scheme saves a large amount of hardware storage resources.

Description

Instruction storage device for a network processor and instruction storage method thereof
Technical field
The present invention relates to the field of the Internet, and in particular to an instruction storage device for a network processor and an instruction storage method for the device.
Background technology
With the rapid development of the Internet, the interface rate of the core routers interconnecting the core network has reached 100Gbps. For the line card of such a router to process packets at this rate, the industry currently mostly adopts the multi-core network processor architecture, and instruction fetch efficiency is one of the key factors affecting multi-core network processor performance.
In a multi-core network processor system, the micro engines (Micro Engine, ME) of the same group have the same instruction demand. Owing to limits on chip area and process technology, it is impossible to equip each micro engine with a dedicated storage space for these instructions. A scheme is therefore needed that lets a group of micro engines share one instruction storage space while still achieving high fetch efficiency.
Some traditional multi-core network processors use a multi-level cache structure: for example, each micro engine is equipped with its own level-1 cache and a group of micro engines shares one level-2 cache, so that the storage space is shared, as shown in Fig. 1. These caches must all be fairly large to guarantee the hit rate, but because the randomness of network packets gives instructions poor locality, even a large cache cannot guarantee fetch efficiency, and a large amount of resource is wasted.
Other network processors adopt a polling-type instruction storage scheme: the instructions needed by a group of micro engines are stored in as many RAMs as there are micro engines. As shown in Fig. 2, four micro engines poll instructions from four RAMs through an arbitration module. Each micro engine accesses all the RAMs in turn, and their accesses always stay in different "phases", so different micro engines never collide on the same RAM and the storage space is shared. However, programs contain a large number of jump instructions. Suppose that, for a pipelined micro engine, n clock cycles elapse from fetching a jump instruction to completing the jump; then the target of a jump instruction in one RAM must reside in the (n+1)-th RAM after it, so empty (NOP) instructions have to be inserted when writing the instructions to keep jump targets at the correct positions. When the proportion of jump instructions is large, a large number of NOPs must be inserted, which wastes much of the instruction space and increases the complexity of the compiler. The scheme also requires every RAM to return data within one clock cycle, which forces the use of SRAM, and the large amount of SRAM in turn causes a large resource overhead.
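For illustration only (this sketch is not part of the patent), the padding cost of this prior-art layout can be made concrete with a small C program; the 4-RAM count, the 3-cycle jump latency and all names are assumptions:

    #include <stdio.h>

    /* Prior-art polled-RAM constraint: the instruction at slot i lives in
     * RAM (i % num_rams); the target of a jump fetched from RAM r must sit
     * in RAM (r + n + 1) % num_rams. Returns the number of NOPs that must
     * be inserted before the target to satisfy this. */
    static int nop_padding(int jump_slot, int target_slot,
                           int num_rams, int n_cycles)
    {
        int required_ram = (jump_slot + n_cycles + 1) % num_rams;
        int actual_ram = target_slot % num_rams;
        return (required_ram - actual_ram + num_rams) % num_rams;
    }

    int main(void)
    {
        /* 4 RAMs, 3-cycle jump latency, jump at slot 5, target at slot 20:
         * one NOP is needed; in the worst case num_rams - 1 are needed. */
        printf("NOPs needed: %d\n", nop_padding(5, 20, 4, 3));
        return 0;
    }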
Summary of the invention
The technical problem to be solved by the present invention is to provide an instruction storage device for a network processor and an instruction storage method for the device, which can save hardware resources.
To solve the above technical problem, the invention provides an instruction storage device for a network processor. The network processor comprises two or more large groups of micro engines; each large group of micro engines comprises N micro engines, and the N micro engines are divided into two or more micro engine groups. The instruction storage device comprises a quick memory (Qmem), a cache, a first low-speed instruction memory and a second low-speed instruction memory, wherein:
each micro engine corresponds to one Qmem and one cache; the Qmem is connected with the micro engine, and the cache is connected with the Qmem;
each micro engine group corresponds to one first low-speed instruction memory, and the cache of each micro engine in the micro engine group is connected with the first low-speed instruction memory;
each large group of micro engines corresponds to one second low-speed instruction memory, and the cache of each micro engine in the large group is connected with the second low-speed instruction memory.
Further, the Qmem is configured to judge, after receiving an instruction data request sent by the micro engine, whether this Qmem holds the instruction data; if so, to return the instruction data to the micro engine; if not, to send an instruction data request to the cache.
Further, the Qmem stores the instructions of the address segment with the highest processing-quality requirement.
Further, the cache comprises two Cache Lines, each of which stores a plurality of consecutive instructions. A Cache Line is configured to judge, after receiving an instruction data request sent by the Qmem, whether this cache holds the instruction data; if so, to return the instruction data to the micro engine through the Qmem; if not, to send an instruction data request to the first low-speed instruction memory or the second low-speed instruction memory.
Further, the two Cache Lines work in ping-pong fashion, synchronized with the ping-pong operation of the packet memory.
Further, the device also comprises a first arbitration module, a second arbitration module and a third arbitration module, wherein:
each micro engine corresponds to one first arbitration module, and the first arbitration module is connected with the cache of the micro engine;
each micro engine group corresponds to one second arbitration module; one end of the second arbitration module is connected with the first arbitration module of each micro engine in the micro engine group, and the other end is connected with the first low-speed instruction memory;
each large group of micro engines corresponds to one third arbitration module; one end of the third arbitration module is connected with the first arbitration module of each micro engine, and the other end is connected with the second low-speed instruction memory.
Further, the first arbitration module is configured to judge, when the cache requests instruction data, whether the requested instruction is located in the first low-speed instruction memory or in the second low-speed instruction memory, and to send the instruction data request to the first or the second low-speed instruction memory accordingly; and to receive the instruction data returned by the first or the second low-speed instruction memory and return the instruction data to the cache;
the second arbitration module is configured to select, on receiving instruction data requests sent by one or more first arbitration modules, one instruction data request and send it to the first low-speed instruction memory for processing, and to return the instruction data fetched by the first low-speed instruction memory to the corresponding first arbitration module;
the third arbitration module is configured to select, on receiving instruction data requests sent by one or more first arbitration modules, one instruction data request and send it to the second low-speed instruction memory for processing, and to return the instruction data fetched by the second low-speed instruction memory to the corresponding first arbitration module.
Further, the cache is also configured to update the cache content and the tag after receiving the instruction data returned by the first arbitration module.
Further, each large group of micro engines comprises 32 micro engines, the 32 micro engines comprising 4 micro engine groups of 8 micro engines each.
To solve the above technical problem, the invention also provides an instruction storage method for an instruction storage device, the instruction storage device being the aforementioned instruction storage device, the method comprising:
the quick memory (Qmem), after receiving an instruction data request sent by the micro engine, judges whether this Qmem holds the instruction data; if so, it returns the instruction data to the micro engine; if not, it sends an instruction data request to the cache;
a Cache Line in the cache, after receiving the instruction data request sent by the Qmem, judges whether this cache holds the instruction data; if so, it returns the instruction data to the micro engine through the Qmem; if not, it sends an instruction data request to the first low-speed instruction memory or the second low-speed instruction memory;
the first low-speed instruction memory, after receiving the instruction data request sent by the cache, looks up the instruction data and returns the found instruction data to the cache;
the second low-speed instruction memory, after receiving the instruction data request sent by the cache, looks up the instruction data and returns the found instruction data to the cache.
Further, the method also comprises:
when a Cache Line in the cache judges that this cache does not hold the instruction data, it sends the instruction data request to a first arbitration module; if the first arbitration module judges that the requested instruction is located in the first low-speed instruction memory, it sends the instruction data request to the first low-speed instruction memory; if the requested instruction is located in the second low-speed instruction memory, it requests the instruction data from the second low-speed instruction memory.
Further, the method also comprises:
if the first arbitration module judges that the requested instruction is located in the first low-speed instruction memory, it sends an instruction data request to a second arbitration module; when the second arbitration module receives instruction data requests sent by one or more first arbitration modules, it selects one instruction data request and sends it to the first low-speed instruction memory;
if the first arbitration module judges that the requested instruction is located in the second low-speed instruction memory, it sends an instruction data request to a third arbitration module; when the third arbitration module receives instruction data requests sent by one or more first arbitration modules, it selects one instruction data request and sends it to the second low-speed instruction memory.
The embodiments of the present invention provide an instruction storage scheme for multi-core network processors based on a quick memory and caches: a quick memory, small caches operated in ping-pong fashion and low-speed DRAM memories are combined, and the memories adopt a hierarchical grouping strategy. This instruction storage scheme effectively guarantees high fetch efficiency for part of the instructions as well as a high average fetch efficiency, saves a large amount of hardware storage resources, and keeps the compiler implementation very simple.
Brief description of the drawings
Fig. 1 is a structural diagram of a traditional two-level cache;
Fig. 2 is a structural diagram of the polling-type instruction storage scheme;
Fig. 3 is a structural diagram of an instruction storage device according to Embodiment 1;
Fig. 4 is a structural diagram of a specific instruction storage device;
Fig. 5 is a schematic diagram of the ping-pong operation of the packet memories and icaches;
Fig. 6 is a processing flow chart of the instruction storage device;
Fig. 7 is a detailed flow chart of an instruction storage device;
Fig. 8 is a diagram of the operation of one Cache Line in the Cache module of the present invention.
Detailed description of the embodiments
The present invention combines a quick memory (Quick Memory, Qmem), a small-capacity cache (Cache) operated in ping-pong fashion and a low-speed RAM memory (for example a low-speed instruction memory (Instruction Memory, IMEM)) to serve as the instruction store of a micro engine.
To make the purpose, technical scheme and advantages of the present invention clearer, the embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted that, where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with each other.
Embodiment 1
The instruction memory of this embodiment, shown in Fig. 3, adopts the following structure:
the N micro engines of one large group are divided into two or more groups, each micro engine corresponds to one Qmem and one Cache, each micro engine group corresponds to one first low-speed instruction memory (hereinafter IMEM), and the N micro engines of the large group correspond to one second low-speed instruction memory (hereinafter IMEM_COM). As shown in Fig. 3, the Qmem is connected with the micro engine and the cache is connected with the Qmem; the cache of each micro engine in a micro engine group is connected with the first low-speed instruction memory; the cache of each micro engine in the large group is connected with the second low-speed instruction memory, wherein:
the Qmem is used to judge, after receiving an instruction data request sent by the micro engine, whether this Qmem holds the instruction data; if so, it returns the instruction data to the micro engine; if not, it sends an instruction data request to the cache. The Qmem preferably stores the instructions of the address segment with the highest processing-quality requirement, and is preferably implemented with fast SRAM. The content of the Qmem is never updated during packet processing, so whenever the micro engine needs this part of the instructions the Qmem can return the required instruction data within one clock cycle, which greatly improves fetch efficiency;
the Cache has two Cache Lines, each of which can store a plurality of consecutive instructions. A Cache Line is used to judge, after receiving an instruction data request sent by the Qmem, whether this cache holds the instruction data; if so, it returns the instruction data to the micro engine through the Qmem; if not, it sends an instruction data request to the IMEM or the IMEM_COM. The two Cache Lines work in ping-pong fashion, synchronized with the ping-pong operation of the packet memory;
the above IMEM and IMEM_COM respectively store instructions located in different address segments, look up instruction data according to instruction data requests, and return the results.
The four storage locations above, Qmem, Cache, IMEM and IMEM_COM, have successively decreasing access speeds. Such a hierarchical memory effectively exploits the different probabilities with which instructions are executed, optimizing the efficiency with which the micro engine fetches instructions. Because more slow memory is used, hardware resources are saved.
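As an illustrative sketch only (not part of the patent text), the four-level lookup cascade can be modeled in C as follows; the types, names and address-range checks are hypothetical stand-ins for the hardware address decoding:

    #include <stdbool.h>
    #include <stdint.h>

    /* Each level either serves the request or forwards it to the next,
     * slower level; IMEM and IMEM_COM cover different address segments. */
    typedef struct {
        uint32_t base, size;  /* address segment covered by this level */
        const uint32_t *data; /* backing instruction storage */
    } level_t;

    static bool level_holds(const level_t *lv, uint32_t addr)
    {
        return addr >= lv->base && addr < lv->base + lv->size;
    }

    static uint32_t fetch(uint32_t addr,
                          const level_t *qmem, const level_t *cache,
                          const level_t *imem, const level_t *imem_com)
    {
        if (level_holds(qmem, addr))   /* served within one clock cycle */
            return qmem->data[addr - qmem->base];
        if (level_holds(cache, addr))  /* hit in the working Cache Line */
            return cache->data[addr - cache->base];
        if (level_holds(imem, addr))   /* group-shared low-speed IMEM */
            return imem->data[addr - imem->base];
        return imem_com->data[addr - imem_com->base]; /* large-group IMEM_COM */
    }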
Preferably, the device also comprises a first arbitration module (arbiter1), a second arbitration module (arbiter2) and a third arbitration module (arbiter3). Each micro engine corresponds to one arbiter1, which is connected with the cache of the micro engine; each micro engine group corresponds to one arbiter2, one end of which is connected with the arbiter1 of each micro engine in the group while the other end is connected with the IMEM; each large group of micro engines corresponds to one arbiter3, one end of which is connected with the arbiter1 of each micro engine while the other end is connected with the IMEM_COM.
The arbiter1 is used to judge, when the cache requests instruction data, whether the requested instruction is located in the IMEM or in the IMEM_COM and to send the instruction data request to the IMEM or the IMEM_COM accordingly, and to receive the instruction data returned by the IMEM or the IMEM_COM and return the instruction data to the cache;
the arbiter2 is used to select, when receiving instruction data requests sent by one or more arbiter1 modules, one instruction data request and send it to the IMEM for processing, and to return the instruction data fetched by the IMEM to the corresponding arbiter1;
the arbiter3 is used to select, when receiving instruction data requests sent by one or more arbiter1 modules, one instruction data request and send it to the IMEM_COM for processing, and to return the instruction data fetched by the IMEM_COM to the corresponding arbiter1.
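Step 200 below notes that arbiter2 serves competing requests by polling and does not poll again a branch whose request is still in flight. A minimal round-robin sketch under those assumptions (all names hypothetical, not the patent's implementation):

    #include <stdbool.h>

    #define NUM_REQ 8  /* e.g. the 8 arbiter1 modules of one micro engine group */

    /* Grant the next requester after the previous grant, skipping branches
     * that already have a request in flight (returned data needs several
     * clock cycles). Returns the granted index, or -1 if none is grantable. */
    static int rr_arbitrate(const bool req[NUM_REQ],
                            const bool in_flight[NUM_REQ],
                            int *last_grant)
    {
        for (int i = 1; i <= NUM_REQ; i++) {
            int cand = (*last_grant + i) % NUM_REQ;
            if (req[cand] && !in_flight[cand]) {
                *last_grant = cand;
                return cand;
            }
        }
        return -1;
    }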
Taking N=32 as an example, the 32 micro engines of one large group can be divided into 4 groups of 8 micro engines each. As shown in Fig. 4, each micro engine corresponds to one Qmem and one Cache (comprising two instruction caches (icache)); every group of 8 micro engines shares one IMEM, and the 32 micro engines of the large group share one IMEM_COM. In Fig. 4, A1 denotes arbiter1, A2 denotes arbiter2 and A3 denotes arbiter3. As shown in Fig. 5, the two icaches correspond one-to-one to the two packet memories in the ME and operate in ping-pong fashion to hide the latency of packet storage and instruction fetch.
Embodiment 2
For the instruction storage device shown in Fig. 3, the corresponding instruction storage method, shown in Fig. 6, comprises:
step 1, the Qmem, after receiving an instruction data request sent by the micro engine, judges whether this Qmem holds the instruction data; if so, it returns the instruction data to the micro engine; if not, it sends an instruction data request to the cache;
step 2, a Cache Line in the cache, after receiving the instruction data request sent by the Qmem, judges whether this cache holds the instruction data; if so, it returns the instruction data to the micro engine through the Qmem; if not, it sends an instruction data request to the IMEM or the IMEM_COM;
step 3, the IMEM, after receiving the instruction data request sent by the cache, looks up the instruction data and returns the found instruction data to the cache; the IMEM_COM, after receiving the instruction data request sent by the cache, looks up the instruction data and returns the found instruction data to the cache.
Specifically, for any one micro engine, the instruction fetch process, shown in Fig. 7, comprises the following steps:
Step 110: the micro engine sends the address of the required instruction and an address enable to the Qmem of the micro engine;
Specifically, when the packet memory in the micro engine receives a packet, it sends the first instruction address of the packet and the address enable to the instruction storage device, i.e. to the Qmem corresponding to the micro engine.
Step 120: the Qmem judges whether the instruction address is within the address range of the instructions it stores; if so, step 130 is executed, otherwise step 140 is executed;
Step 130: the instruction data is fetched according to the instruction address and the address enable and returned to the micro engine; this fetch process ends;
Step 140: the instruction address and the address enable are sent to the Cache of the micro engine;
Step 150: the Cache judges whether the instruction address is within the address range of the instructions it stores; if so, step 160 is executed, otherwise step 170 is executed;
Since the Cache has only one working Cache Line per half, its tag (Tag) holds the information of only one tag, so as soon as an address request reaches the Cache it can be determined immediately from the tag whether the required data is in the Cache: the corresponding bits of the instruction address are compared with the tag of the currently working Cache Line; if they are identical, the instruction is in the Cache, otherwise it is not.
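A minimal sketch of this single-tag hit check (the patent does not specify the address split, so the field width below is an assumption, as are all names):

    #include <stdbool.h>
    #include <stdint.h>

    #define OFFSET_BITS 5u  /* assumed: 32 consecutive instructions per Cache Line */

    typedef struct {
        uint32_t tag;   /* tag of the currently working Cache Line */
        bool valid;
    } line_tag_t;

    /* Only one tag per working Cache Line, so a hit is a single compare of
     * the high bits of the instruction address against that tag. */
    static bool cache_hit(const line_tag_t *t, uint32_t instr_addr)
    {
        return t->valid && (instr_addr >> OFFSET_BITS) == t->tag;
    }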
Step 160: the instruction data at the corresponding position in the Cache Line is read out according to the address enable and handed to the micro engine through the Qmem; this fetch process ends;
Step 170: the Cache delivers the instruction address and the address enable to the first arbitration module (arbiter1);
Step 180: arbiter1 judges whether the instruction address lies in the IMEM corresponding to the group where the micro engine is located or in the IMEM_COM corresponding to the large group where the micro engine is located; if in the IMEM, step 190 is executed; if in the IMEM_COM, step 210 is executed;
Specifically, arbiter1 judges from the instruction address whether the instruction is in the IMEM or in the IMEM_COM;
Step 190: arbiter1 sends the instruction address and the address enable to the second arbitration module (arbiter2);
Step 200: arbiter2 selects one instruction request and sends it to the IMEM; the IMEM fetches the instruction data according to the instruction address and the address enable in the request and returns it to the Cache through arbiter1; step 230 is executed;
When the arbiter1 modules of several micro engines issue fetch requests to arbiter2, arbiter2 handles the cache requests by polling and selects one fetch request to send to the IMEM; since the returned data needs multiple clock cycles, a branch that has already issued a request is not polled again;
Step 210: arbiter1 sends the instruction address and the address enable to the third arbitration module (arbiter3);
Step 220: arbiter3 selects one instruction request and sends it to the IMEM_COM; the IMEM_COM fetches the instruction data according to the instruction address and the address enable in the request and returns it to the Cache through arbiter1; step 230 is executed;
The arbiter corresponding to each micro engine functions in the same way as arbiter1, and arbiter3 functions in the same way as arbiter2.
Step 230: the Cache Line and the tag content are updated, and the instruction data is returned to the micro engine through the Qmem; this fetch process ends.
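For illustration only, the refill performed at step 230 can be sketched as follows, under the same assumed address split as the hit-check sketch above (LINE_WORDS and all names are hypothetical):

    #include <stdint.h>

    #define OFFSET_BITS 5u
    #define LINE_WORDS (1u << OFFSET_BITS)

    /* Refill the working Cache Line with the consecutive instructions
     * returned by the IMEM/IMEM_COM and update its tag to match the address. */
    static void cache_fill(uint32_t *line_data, uint32_t *line_tag,
                           uint32_t instr_addr,
                           const uint32_t returned[LINE_WORDS])
    {
        *line_tag = instr_addr >> OFFSET_BITS;
        for (uint32_t i = 0; i < LINE_WORDS; i++)
            line_data[i] = returned[i];
    }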
Fig. 8 is the structural diagram of the icache in Fig. 5. After the icache receives the instruction address sent by the Qmem, it compares the address with the tag to judge whether there is a hit. On a hit, the instruction content is fetched, after decoding, from the physical storage location of the icache according to the address enable and output through a multiplexer (MUX); on a miss, the instruction data is requested from the low-speed instruction memory, and the returned instruction data is output through the MUX.
Only one of the Cache Lines works while one packet is being processed. When the Cache Line 1 used by the current packet finds the corresponding instruction data in the Cache and sends no read request to the lower-level slow memory (IMEM or IMEM_COM), then, if Cache Line 2 detects the first-address request of the next packet, it sends a read request to the lower-level slow memory with the first instruction address contained in the next packet, so as to obtain the instruction data required by the next packet. After the packet of the current Cache Line 1 has been processed, the Cache switches to the other half, Cache Line 2, to prepare to process the next packet. Handling packets with such ping-pong operation effectively hides the time of packet storage and the latency of fetching from the low-speed instruction memory: when the micro engine switches to the next packet it gets the required instructions at once, which improves fetch efficiency and thus the processing efficiency of the micro engine.
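As an illustrative sketch (the structure and the prefetch trigger are simplified assumptions, not the patent's implementation), the ping-pong switching can be modeled like this:

    #include <stdbool.h>
    #include <stdint.h>

    /* 'active' serves the current packet; the other Cache Line may prefetch
     * for the next packet so its instructions are ready at switch time. */
    typedef struct {
        uint32_t tag;
        bool busy;  /* read request to IMEM/IMEM_COM still in flight */
    } line_t;

    typedef struct {
        line_t line[2];
        int active;  /* index of the Cache Line serving the current packet */
    } icache_t;

    /* Prefetch with the first instruction address of the next packet while
     * the active line is serving the current packet without a pending miss. */
    static void prefetch_next(icache_t *c, uint32_t next_first_addr)
    {
        line_t *idle = &c->line[1 - c->active];
        if (!c->line[c->active].busy && !idle->busy) {
            idle->busy = true;             /* issue read to IMEM/IMEM_COM */
            idle->tag = next_first_addr;   /* simplified tag update */
        }
    }

    /* When the current packet is finished, switch to the other half. */
    static void switch_packet(icache_t *c)
    {
        c->active = 1 - c->active;
    }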
Those of ordinary skill in the art will understand that all or part of the steps of the above method may be completed by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk or an optical disc. Alternatively, all or part of the steps of the above embodiments may also be implemented with one or more integrated circuits. Correspondingly, each module/unit in the above embodiments may be realized in the form of hardware or in the form of a software functional module. The present invention is not restricted to any particular form of combination of hardware and software.
Of course, the present invention may have various other embodiments, and those skilled in the art may make corresponding changes and variations according to the present invention without departing from its spirit and essence; all such corresponding changes and variations shall fall within the protection scope of the appended claims of the present invention.

Claims (12)

1. An instruction storage device for a network processor, the network processor comprising two or more large groups of micro engines, each large group of micro engines comprising N micro engines, the N micro engines being divided into two or more micro engine groups, the instruction storage device comprising: a quick memory (Qmem), a cache, a first low-speed instruction memory and a second low-speed instruction memory, wherein:
each micro engine corresponds to one Qmem and one cache; the Qmem is connected with the micro engine, and the cache is connected with the Qmem;
each micro engine group corresponds to one first low-speed instruction memory, and the cache of each micro engine in the micro engine group is connected with the first low-speed instruction memory;
each large group of micro engines corresponds to one second low-speed instruction memory, and the cache of each micro engine in the large group is connected with the second low-speed instruction memory.
2. The device according to claim 1, characterized in that:
the Qmem is configured to judge, after receiving an instruction data request sent by the micro engine, whether this Qmem holds the instruction data; if so, to return the instruction data to the micro engine; if not, to send an instruction data request to the cache.
3. The device according to claim 1 or 2, characterized in that:
the Qmem stores the instructions of the address segment with the highest processing-quality requirement.
4. The device according to claim 1, characterized in that:
the cache comprises two Cache Lines, each of which stores a plurality of consecutive instructions; a Cache Line is configured to judge, after receiving an instruction data request sent by the Qmem, whether this cache holds the instruction data; if so, to return the instruction data to the micro engine through the Qmem; if not, to send an instruction data request to the first low-speed instruction memory or the second low-speed instruction memory.
5. The device according to claim 4, characterized in that:
the two Cache Lines work in ping-pong fashion, synchronized with the ping-pong operation of the packet memory.
6. The device according to claim 1, 2, 4 or 5, characterized in that:
the device further comprises a first arbitration module, a second arbitration module and a third arbitration module, wherein:
each micro engine corresponds to one first arbitration module, and the first arbitration module is connected with the cache of the micro engine;
each micro engine group corresponds to one second arbitration module; one end of the second arbitration module is connected with the first arbitration module of each micro engine in the micro engine group, and the other end is connected with the first low-speed instruction memory;
each large group of micro engines corresponds to one third arbitration module; one end of the third arbitration module is connected with the first arbitration module of each micro engine, and the other end is connected with the second low-speed instruction memory.
7. The device according to claim 6, characterized in that:
the first arbitration module is configured to judge, when the cache requests instruction data, whether the requested instruction is located in the first low-speed instruction memory or in the second low-speed instruction memory and to send the instruction data request to the first or the second low-speed instruction memory accordingly, and to receive the instruction data returned by the first or the second low-speed instruction memory and return the instruction data to the cache;
the second arbitration module is configured to select, on receiving instruction data requests sent by one or more first arbitration modules, one instruction data request and send it to the first low-speed instruction memory for processing, and to return the instruction data fetched by the first low-speed instruction memory to the corresponding first arbitration module;
the third arbitration module is configured to select, on receiving instruction data requests sent by one or more first arbitration modules, one instruction data request and send it to the second low-speed instruction memory for processing, and to return the instruction data fetched by the second low-speed instruction memory to the corresponding first arbitration module.
8. The device according to claim 7, characterized in that:
the cache is further configured to update the cache content and the tag after receiving the instruction data returned by the first arbitration module.
9. The device according to claim 1, 2, 4, 5, 7 or 8, characterized in that:
each large group of micro engines comprises 32 micro engines, the 32 micro engines being divided into 4 micro engine groups of 8 micro engines each.
10. An instruction storage method for an instruction storage device, the instruction storage device being the instruction storage device according to claim 1, the method comprising:
the quick memory Qmem, after receiving an instruction data request sent by the micro engine, judges whether this Qmem holds the instruction data; if so, it returns the instruction data to the micro engine; if not, it sends an instruction data request to the cache;
a Cache Line in the cache, after receiving the instruction data request sent by the Qmem, judges whether this cache holds the instruction data; if so, it returns the instruction data to the micro engine through the Qmem; if not, it sends an instruction data request to the first low-speed instruction memory or the second low-speed instruction memory;
the first low-speed instruction memory, after receiving the instruction data request sent by the cache, looks up the instruction data and returns the found instruction data to the cache;
the second low-speed instruction memory, after receiving the instruction data request sent by the cache, looks up the instruction data and returns the found instruction data to the cache.
11. The method according to claim 10, characterized in that the method further comprises:
when a Cache Line in the cache judges that this cache does not hold the instruction data, it sends the instruction data request to a first arbitration module; if the first arbitration module judges that the requested instruction is located in the first low-speed instruction memory, it sends the instruction data request to the first low-speed instruction memory; if the requested instruction is located in the second low-speed instruction memory, it requests the instruction data from the second low-speed instruction memory.
12. The method according to claim 11, characterized in that the method further comprises:
if the first arbitration module judges that the requested instruction is located in the first low-speed instruction memory, it sends an instruction data request to a second arbitration module; when the second arbitration module receives instruction data requests sent by one or more first arbitration modules, it selects one instruction data request and sends it to the first low-speed instruction memory;
if the first arbitration module judges that the requested instruction is located in the second low-speed instruction memory, it sends an instruction data request to a third arbitration module; when the third arbitration module receives instruction data requests sent by one or more first arbitration modules, it selects one instruction data request and sends it to the second low-speed instruction memory.
CN201210233710.XA 2012-07-06 2012-07-06 Instruction storage device for a network processor and instruction storage method thereof Active CN102855213B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201210233710.XA CN102855213B (en) 2012-07-06 2012-07-06 Instruction storage device for a network processor and instruction storage method thereof
PCT/CN2013/078736 WO2013185660A1 (en) 2012-07-06 2013-07-03 Instruction storage device of network processor and instruction storage method for same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210233710.XA CN102855213B (en) 2012-07-06 2012-07-06 Instruction storage device for a network processor and instruction storage method thereof

Publications (2)

Publication Number Publication Date
CN102855213A CN102855213A (en) 2013-01-02
CN102855213B (en) 2017-10-27

Family

ID=47401809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210233710.XA Active CN102855213B (en) Instruction storage device for a network processor and instruction storage method thereof

Country Status (2)

Country Link
CN (1) CN102855213B (en)
WO (1) WO2013185660A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102855213B (en) * 2012-07-06 2017-10-27 ZTE Corp Instruction storage device for a network processor and instruction storage method thereof
CN106293999B (en) 2015-06-25 2019-04-30 Shenzhen ZTE Microelectronics Technology Co Ltd Method and device for implementing a snapshot function for intermediate data of packets processed by a micro engine
CN108804020B (en) * 2017-05-05 2020-10-09 Huawei Technologies Co Ltd Storage processing method and device
CN109493857A (en) * 2018-09-28 2019-03-19 广州智伴人工智能科技有限公司 An automatic sleep and wake-up robot system
EP3893122A4 (en) * 2018-12-24 2022-01-05 Huawei Technologies Co., Ltd. Network processor and message processing method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100454899C * 2006-01-25 2009-01-21 Huawei Technologies Co Ltd Network processing device and method
US7836435B2 (en) * 2006-03-31 2010-11-16 Intel Corporation Checking for memory access collisions in a multi-processor architecture
CA2799167A1 (en) * 2010-05-19 2011-11-24 Douglas A. Palmer Neural processing unit
CN102270180B (en) * 2011-08-09 2014-04-02 清华大学 Multicore processor cache and management method thereof
CN102855213B (en) * 2012-07-06 2017-10-27 ZTE Corp Instruction storage device for a network processor and instruction storage method thereof

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1997973A * 2003-11-06 2007-07-11 Intel Corporation Dynamically caching engine instructions
CN101021818A * 2007-03-19 2007-08-22 National University of Defense Technology Stream application-oriented on-chip memory

Also Published As

Publication number Publication date
CN102855213A (en) 2013-01-02
WO2013185660A1 (en) 2013-12-19


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20221116

Address after: 518055 ZTE Industrial Park, Liuxian Avenue, Xili Street, Nanshan District, Shenzhen, Guangdong

Patentee after: SANECHIPS TECHNOLOGY Co.,Ltd.

Address before: 518057 Ministry of justice, Zhongxing building, South Science and technology road, Nanshan District hi tech Industrial Park, Shenzhen, Guangdong

Patentee before: ZTE Corp.