GB2539407A

GB2539407A - Data processing

Info

Publication number: GB2539407A
Application number: GB1510432.6A
Authority: GB
Inventors: Mcconnell Ray; Powell Ifor
Original assignee: Bluwireless Technology Ltd
Current assignee: Bluwireless Technology Ltd
Priority date: 2015-06-15
Filing date: 2015-06-15
Publication date: 2016-12-21
Anticipated expiration: 2035-06-15
Also published as: GB201510432D0; GB2539407B

Abstract

A network node device for a wireless mesh communications network including a processing unit operable to execute instructions delivered thereto, the instructions being for execution on data relating to modulated data packets being communicated to and from the network node device. The processing unit comprises a processing element having a plurality of function units (16) operable to execute respective functions in dependence upon received instructions in parallel with one another, and an instruction controller (40). The instruction controller (40) comprises a plurality of instruction pipelines (42), each of which is operable to deliver instructions to an associated function unit of the processing element for execution thereby, and a register (41) having a plurality of register entries, each of which is operable to store an instruction word therein. Delivery to function units is determined by a timing value for the instruction. An internal storage unit, selected based on timing position, is used to ensure that the delay between executing instructions is minimised. A media access control device may control flow of data packets to the unit.

Description

DATA PROCESSING

The present invention relates to data processing, and in particular to execution of instructions in a network node device of a wireless mesh communications network.

BACKGROUND TO THE INVENTION

Digital communications networks transfer data between sending and receiving devices using a series of network node devices. In wireless networks, these network node devices typically communicate using radio frequency signals. Each network node device processes received data packets to determine, for example, the next step of a route through the network for a data packet. Such processing requires efficient low latency processing devices at each network node device.

Figure 1 of the accompanying drawings illustrates a data processing device 1 having a processing element 10. Each processing element 10 receives data 11 to be processed in accordance with a received instruction 12. The processing element 10 receives a clock signal input 13 for synchronising operation and execution of the received instructions. Following execution of the instruction 12 on the data 11 by the processing element 10, a result 14 is output. The processing element 10 can be arranged to provide any appropriate function or functions.

Figure 2 of the accompanying drawings illustrates an exemplary processing element 10 which includes a plurality of function units 16 (16A to 16F) which have respective individual functions. For example, a function unit may provide a memory read function, a memory write function, an add function, a divide function, or a multiply function. The plurality of function units 16 can be arranged to provide a desired range of functions. It will be readily appreciated that each function unit may have any appropriate function, and that any appropriate combination of functions may be provided.

A data input 17 delivers data to be processed to the processing element 10, and a multiplexer 18 routes the data to the correct function unit dependent upon the contents of the data being received. An enable signal and a clock signal (not shown in Figure 2 for the sake of clarity) are provided to the function units. When the enable signal is provided to a function unit, then the function unit executes its function on received data on the next clock cycle or cycles. The number of cycles taken for execution of a particular function is dependent upon that function as is well known.

Following execution of the function, a function unit 16 provides processed data as an output 20 (20A to 20F). These outputs 20 (20A to 20F) are provided as inputs to a multiplexer 21 which operates to select one of the outputs 20 for output from the processing element 10 as an output 22.

In a previously considered processing element, instructions are executed serially in order of receipt, so that only one function unit in the plurality of function units is operating at any one time. This order of execution is determined by the program being executed on the processing device 1. In such an arrangement, only one output 20 is active at any one time, and the multiplexer 21 selects that output 20 as the output from the processing element 10.

In order to provide enhanced processing capabilities, to reduce the need for external memory write and read operations (which add to the delay and latency of processing), and to increase the number of instructions executing in parallel in one cycle, a processed data feedback architecture has been proposed for the processing element. Figure 3 of the accompanying drawings illustrates such an architecture schematically. The processing element 10' of Figure 3 includes an input multiplexer 24 for supplying data to be processed to the function units 16. In contrast with the Figure 2 example, the input multiplexer 24 of the processing element 10' is connected to receive the outputs 22A to 22F of the function units 16A to 16F. In this manner, the input multiplexer is able to feed the result of one function unit back to one of the function units for further processing, in dependence upon the program being executed. A series of instructions can therefore be executed without the need for external memory input/output operations, and the number of instructions executing in parallel in one cycle is increased. Such a technique enables a series of instructions to be processed more quickly and with lower delay.

However, when a program contains multiple sequences of instructions, executing the instructions in a single series can lead to unnecessarily extended delays. To overcome this issue, in a paper entitled "Cheap Out-of-Order Execution using Delayed Issue" (0-7695-0801-4/00), J.P. Grossman of the Dept of EECS, MIT, presents a technique in which instruction sequences that are independent of one another are interleaved. In such a technique, instructions are executed such that multiple function units operate in parallel, with the requirement that instructions in a given sequence are executed in the correct order. Grossman achieves this by delaying issue of instructions to function units and controlling the order in which these instructions are executed. Grossman also discusses applying such a technique to looped instruction sequences. In this way it is possible to reduce the overall execution time of the independent instruction sequences.

However, such a technique can still result in unnecessary delays in processing sequences of instructions, particularly if those sequences include looped instructions.

The problem is particularly acute in data processing applications where low latency is desirable, if not essential. One example of such an application is in the wireless telecommunications field in which streams of data packets must be processed with low latency whilst maintaining data packet order and low rates of packet dropping.

SUMMARY OF THE INVENTION

The present invention seeks to address the problems of the prior art.

According to one aspect of the present invention, there is provided a method of processing instructions in a network node of a wireless mesh communications network, the network node including a processing unit which includes a processing element having a plurality of function units arranged in parallel with one another, and an instruction controller having a plurality of instruction pipelines associated with respective function units of the processing element, the method comprising receiving a plurality of instruction words from a data storage device, each instruction word including a plurality of instructions, and each instruction belonging to an instruction sequence, and having a timing value indicative of a relative timing for execution of the instruction in the instruction sequence to which the instruction belongs; storing such received instruction words in respective instruction register entries in a register of the instruction controller; retrieving an instruction word from an active register entry of the instruction register; supplying the instructions of the retrieved instruction word to respective instruction pipelines of the instruction controller in dependence upon a function of the instruction concerned, retaining the retrieved instruction word in the instruction register for subsequent further retrieval; propagating such supplied instructions through the instruction pipelines to respective function units of the processing element, such that the instructions for an instruction sequence are delivered to the associated function units for execution in an order determined by the timing value for the instructions of that instruction sequence; and executing instructions on respective function units of the processing element of the processing unit, wherein the instructions are executed on data relating to modulated data packets being communicated in such a communications network.

In one example, the instructions are supplied to the instruction pipeline such that delay between processing instructions in adjacent instruction sequences is minimized.

In one example, the initial timing positions are determined such that adjacent instruction sequences do not overlap in time on any one instruction pipeline.

In one example, the instruction pipeline comprises a series of storage units, each of which is operable to store an instruction for a predetermined number of system clock cycles, and the method includes, after a predetermined number of system clock cycles, for each such unit except the last in the series, passing a stored instruction to the next unit in the series, and, for the last unit in the series, passing an instruction to the processing unit associated with the instruction pipeline concerned.

In one example, relative timing between instruction sequences is determined by detecting a position in at least one instruction pipeline for an instruction belonging to a first instruction sequence, detecting whether an instruction from a second instruction sequence is destined for execution at the same function unit as an instruction from the first instruction sequence, and, if so, determining a position in the pipeline such that the second instruction sequence does not overlap in time with the first instruction sequence.

According to another aspect of the present invention, there is provided a network node device for a wireless mesh communications network, the network node device comprising a processing unit operable to execute instructions delivered thereto, the instructions being for execution on data relating to modulated data packets being communicated to and from the network node device; wherein the processing unit comprises a processing element having a plurality of function units operable to execute respective functions in dependence upon received instructions in parallel with one another; and an instruction controller comprising a plurality of instruction pipelines, each of which is operable to deliver instructions to an associated function unit of the processing element for execution thereby, each instruction belonging to an instruction sequence, and having a timing value indicative of a relative timing for execution of the instruction in the instruction sequence to which the instruction belongs; and a register having a plurality of register entries, each of which is operable to store an instruction word therein, wherein each instruction pipeline comprises a plurality of storage units, arranged in a series and each operable to store an instruction therein, each storage unit, except the last in the series, being operable to transfer an instruction to the next unit in the series, the last unit in the series being operable to transfer an instruction to an associated function unit of a processing element; a timing controller operable to receive timing information for a received instruction, and to determine an initial storage unit into which the instruction is to be loaded, the initial storage unit being determined by the timing value of the instruction concerned, and by relative timing between instruction sequences; an instruction handler operable to receive, from an active register entry in the register, an instruction for a function unit of a processing element associated with the instruction pipeline concerned, and to load that instruction into a storage unit determined by the timing controller; and wherein each instruction pipeline is operable to propagate an instruction from the initial storage unit determined by the timing controller to a function unit associated with the pipeline concerned.

In one example, the timing controller is operable to determine an initial storage unit for an instruction so that the instructions are supplied to the instruction pipelines with minimal delay between execution of instructions in adjacent instruction sequences.

In one example, each instruction pipeline includes a position detector operable to determine a position of an instruction in the instruction pipeline concerned, and to transmit position information to the timing controllers of the instruction pipelines, and wherein the timing controllers are operable to use received position information in determining the initial storage unit for an instruction.

In one example, the timing controller of each pipeline is operable to determine relative timing between instruction sequences by detecting a position in at least one instruction pipeline for an instruction belonging to a first instruction sequence, detecting whether an instruction from a second instruction sequence is destined for execution at the same function unit as an instruction from the first instruction sequence, and, if so, determining a position in the pipeline such that the second instruction sequence does not overlap in time with the first instruction sequence.

In one example, the processing unit is a physical layer processor of such a network node.

In one example, the network node device further comprises a media access control device operable to control flow of data packets to the processing unit, and a network processor operable to determine routing for data packets across a network.

In one example, the network node device further comprises an antenna unit having an array of antennas for transmission and/or reception of radio frequency signals, wherein the processing unit is operable to transfer signals with the antenna unit, and wherein the instructions also relate to a process for determining antenna weightings for antennas in the antenna array of the antenna unit.

According to another aspect of the present invention, there is provided a mesh communications network comprising a plurality of such network node devices.

BRIEF DESCRIPTION OF THE DRAWINGS

An embodiment of the invention will now be described, by way of example only, and with reference to the accompanying drawings, in which:
Figure 1 illustrates schematically a simple data processor;
Figure 2 illustrates schematically a processing element;
Figure 3 illustrates schematically a second processing element;
Figure 4 illustrates a compacted instruction word;
Figure 5 illustrates an instruction from the compacted instruction word of Figure 4;
Figure 6 illustrates an instruction pipeline for use with the function units of Figure 2;
Figures 7A to 7D illustrate a plurality of instruction sequences stored in a register;
Figure 8 is a flow chart illustrating steps in a method of delivering instructions to respective function units;
Figures 9 to 21 illustrate delivery of instructions from the sequences of Figure 7 to respective function units;
Figure 22 illustrates a single instruction multiple data (SIMD) architecture;
Figure 23 illustrates a communications network node;
Figure 24 illustrates a beamforming antenna; and
Figure 25 illustrates a mesh communications network.

DETAILED DESCRIPTION OF THE INVENTION

As described above, Figure 3 illustrates a processing element 10' comprising a plurality of function units 16 (16A to 16F) connected to receive respective data items 19 (19A to 19F) via an input multiplexer 24, and to output respective processed data items 22A to 22F. The function unit outputs 22A to 22F may be supplied to the input multiplexer 24 for supply to the function units 16A to 16F for further processing in accordance with the instruction sequence received by the processing element 10'. One or more of the outputs 22A to 22F may be supplied out of the processing element 10' as an output of the processing element 10'. In an example, the outputs 22A to 22F of the function units 16A to 16F are made available at the output of the function units from generation until the function unit concerned executes another instruction. In this way, the processed data is available locally within the processing element until needed. In another example embodiment, the function units may provide one or more registers or local storage elements for holding processed data locally in the processing element.

The function units 16A to 16F are arranged to receive sequences of instructions in order to perform the overall desired function. In one example embodiment of the present invention, these instructions are provided in the form of a compacted instruction word, such as is illustrated in Figure 4.

The compacted instruction word 30 in Figure 4 includes an instruction field 30A to 30F for each of the function units 16A to 16F of the processing element 10'. For any given instruction sequence, each of the instruction fields 30A to 30F may include an instruction or may be empty.

Figure 5 illustrates an example of an instruction for one function unit of the compacted instruction word of Figure 4. The instruction comprises an enable/type field 31, first and second data fields 33 and 34, and a timing field 35. It will be readily appreciated that the structure illustrated in Figure 5 is merely exemplary and is intended to enable explanation of the principles of the present invention. Other instruction field formats and contents may be used as appropriate.
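To make the field layout concrete, the following Python sketch models the Figure 5 instruction and the Figure 4 compacted word as plain data classes. It is illustrative only: the class names, the `sequence` label (shown in Figures 7A to 7D but not listed as a field in Figure 5) and the placeholder operation and operand values are assumptions, and no bit widths or encodings are modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

COLUMNS = "ABCDEF"  # one instruction field per function unit 16A-16F


@dataclass(frozen=True)
class Instruction:
    """Fields of Figure 5; bit widths and encodings are not modelled."""
    enable_type: str  # field 31: enables the unit and selects the function type
    data_a: int       # field 33: operand, or a reference to where it is stored
    data_b: int       # field 34: operand, or a reference to where it is stored
    timing: int       # field 35: relative timing within the instruction's sequence
    sequence: str     # illustrative label (v, w, x, y, z) as drawn in Figures 7A-7D


@dataclass
class CompactedWord:
    """Compacted instruction word 30: fields 30A-30F, any of which may be empty."""
    slots: Dict[str, Optional[Instruction]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        unknown = set(self.slots) - set(COLUMNS)
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        for col in COLUMNS:
            self.slots.setdefault(col, None)  # absent field = empty field

    def occupied(self) -> List[Tuple[str, Instruction]]:
        """Non-empty fields in column order A-F."""
        return [(c, self.slots[c]) for c in COLUMNS if self.slots[c] is not None]


# The x-sequence word of the worked example: x1 on A, x2 on C, x3 on B, x4 on F.
# "op" and the zero operands are placeholders.
x_word = CompactedWord({
    "A": Instruction("op", 0, 0, 1, "x"),
    "C": Instruction("op", 0, 0, 2, "x"),
    "B": Instruction("op", 0, 0, 3, "x"),
    "F": Instruction("op", 0, 0, 4, "x"),
})
```

Here `x_word.occupied()` lists the x-sequence fields in column order A, B, C, F, with the fields for columns D and E empty.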

When supplied to the function unit 16 to which the instruction relates, the enable/type field 31 causes the function unit 16 to become active, and defines the particular type of function to be executed. For example, a multiply function may relate to a "simple" multiplication or to a floating point multiplication. The enable/type field 31 is decoded by the function unit 16 upon receipt by the unit.

Each instruction belongs to an instruction sequence determined by the overall program being executed.

The first and second data fields 33 and 34 indicate the data to be used for the instruction execution. The fields 33 and 34 may include the data itself, or may include a reference to a location of the data, such as a memory or register location for the data.

The timing field 35 includes information relating to the relative timing of the instruction in the instruction sequence to which the instruction belongs.

Figure 6 illustrates schematically an instruction controller 40 in combination with a processing element having a plurality of function units 16. The instruction controller 40 operates to receive compacted instructions, to expand and schedule those instructions and to deliver the instructions to the function units 16A to 16F for execution. The instruction controller 40 is arranged in a plurality of columns, one for each of the function units 16A to 16F. These columns are labelled A to F in Figure 6.

For the sake of clarity, Figure 6 illustrates the controller 40 arranged to deliver instructions to a single processing element. However, it will be appreciated that the instruction controller is also able to deliver the same instructions to a plurality of processing elements, thus providing a SIMD (single instruction multiple data) processing architecture. The description below of the operation of the controller 40 is set out with reference to a single processing element, but applies to the SIMD architecture.

The columns of the instruction controller 40 are identical to one another. A column includes an instruction register 41 into which an instruction is input. Since a compacted instruction word is used to provide the instructions for execution, the instruction register for each of the columns is loaded with the next instruction field for processing at the same time.

A column also includes an instruction pipeline 42 which is operable to deliver the instructions to the associated function unit 16. The instruction pipeline 42 comprises a plurality of storage units 43 which are connected in a series. In Figure 6, each pipeline 42 comprises nine storage units 43, but any number of units may be provided. Each storage unit 43 in the series, except the last in the series, is operable to store an instruction and to pass that instruction to the next unit in the series on receipt of a clock pulse signal. The last storage unit 43 in the series is operable to pass the instruction to the function unit 16 associated with the pipeline 42.
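The chain of storage units behaves like a shift register. A minimal Python sketch, assuming one instruction per storage unit and one shift per clock pulse; the class and method names are hypothetical, and position 1 denotes the last unit in the series, nearest the function unit (as in the labels 43A1, 43C2 used later).

```python
from typing import List, Optional


class InstructionPipeline:
    """One instruction pipeline 42 modelled as a chain of storage units 43.

    Index 0 is the last unit in the series (position 1, nearest the function
    unit). Nine units matches Figure 6, but the depth is a parameter.
    """

    def __init__(self, depth: int = 9) -> None:
        self.units: List[Optional[str]] = [None] * depth

    def load(self, position: int, instr: str) -> None:
        """Load an instruction into storage unit `position` (1 = nearest)."""
        if not 1 <= position <= len(self.units):
            raise ValueError(f"no storage unit at position {position}")
        if self.units[position - 1] is not None:
            raise RuntimeError(f"storage unit {position} is occupied")
        self.units[position - 1] = instr

    def clock(self) -> Optional[str]:
        """One clock pulse: the last unit passes its instruction to the function
        unit (returned, or None), and every other unit passes its instruction on."""
        delivered = self.units[0]
        self.units = self.units[1:] + [None]
        return delivered


pipe = InstructionPipeline()
pipe.load(3, "x3")                            # like x3 entering storage unit 43B3
delivered = [pipe.clock() for _ in range(4)]  # [None, None, 'x3', None]
```

An instruction loaded into position p therefore reaches its function unit on the p-th clock pulse, which is why the choice of initial storage unit fixes when the instruction executes.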

An instruction handler 44 is operable to retrieve the instruction for the column concerned from the instruction register 41, and to transfer that retrieved instruction to an appropriate one of the storage units 43 in the pipeline 42 via an instruction delivery bus 45. A timing controller 46 detects the timing information (35, Figure 5) of an instruction and is operable to cause the instruction handler 44 to deliver the instruction to the correct storage unit 43 in the pipeline 42. A position detector 48 provides synchronisation information to the timing controller 46 in order to enable synchronisation of delivery of instructions to the pipeline 42 across different instruction sequences. Operation of the pipeline 42 will be described below in more detail.
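One way the timing controller might combine a timing value with position-detector information is sketched below. This is a hypothetical policy, not a rule stated in this description: every instruction in a word is shifted by the same offset, so its order within the sequence is preserved, and the smallest offset is chosen for which each target storage unit is still empty. The function name, the occupancy map and the nine-unit default depth are assumptions.

```python
from typing import Dict, List, Set, Tuple


def choose_offset(
    requests: List[Tuple[str, int]],  # (column, timing value) per instruction in a word
    occupied: Dict[str, Set[int]],    # column -> occupied positions (1 = nearest)
    depth: int = 9,
) -> int:
    """Return the smallest offset >= 0 such that every instruction's storage
    unit (timing + offset) is empty and inside the pipeline.

    Shifting a whole word by one common offset keeps its instructions in
    timing order. Hypothetical policy, for illustration only.
    """
    max_timing = max(t for _, t in requests)
    for offset in range(depth - max_timing + 1):
        if all(t + offset not in occupied.get(col, set()) for col, t in requests):
            return offset
    raise RuntimeError("no conflict-free placement fits in the pipeline")


# Occupancy after the Figure 12 shift: x2 at 43C1, x3 at 43B2, x4 at 43F3.
after_x = {"C": {1}, "B": {2}, "F": {3}}
y_offset = choose_offset([("A", 1), ("C", 2), ("B", 3), ("F", 4)], after_x)  # 0
clash_offset = choose_offset([("C", 1), ("D", 2)], after_x)  # 1: 43C1 is taken
```

The first call reproduces the y-sequence placement of Figure 12 (no offset needed); the second uses a made-up word whose first instruction would collide with x2, so the whole word is pushed back by one storage unit.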

In accordance with the principles of the present invention, the instruction register 41 of Figure 6 is provided by a multi-entry register, such as that illustrated in Figures 7A to 7D. The register 41 of Figures 7A to 7D has five entries 411 to 415, although it will be appreciated that the register 41 may be of any desired depth. In applications where sequences of instructions are repeated to form instruction loops, it is desirable to be able to reduce memory access operations. In accordance with the principles of the present invention, the instruction register 41 enables such reduction in memory access by providing a reloading loop 416.

Instructions are supplied to the instruction register 41 as a series of compacted instruction words. Such a series is illustrated in Figures 7A to 7D, in which each instruction has been simplified for the sake of clarity. Each instruction comprises the instruction fields set out in Figure 5, but representation in Figures 7A to 7D shows only the instruction sequence to which the instruction belongs and the timing field 35 for the sake of clarity.

In Figures 7A to 7D, the instruction sequence is represented by a letter (v, w, x, y or z), and the timing field 35 is represented by an integer (1 to 4). It will be readily appreciated that the representations of the sequences and timings shown here are purely exemplary and have been simplified for the sake of the clarity of the description of the principles of the invention. It will also be readily appreciated that any number of sequences may be supplied to the processing element, and each of these sequences may be of any depth. The sequences shown in Figures 7A to 7D are merely exemplary. The example of Figures 7A to 7D gives five sequences to be processed. The individual instructions may be any appropriate type of instruction, as required by the overall processing of the processing element.

The series of instruction sequences are loaded into the instruction register 41 from memory (for example random access memory, RAM) in the order in which the sequences are to be executed. In this example, the sequences are delivered to the instruction register 41 in the order x, y, z, w, v.

As will be described below, the instructions stored in the first entry 411 of the instruction register are delivered to the columns A to F of the instruction pipeline 42 for propagation to the function units. The remaining instruction words are moved one position closer to the pipeline 42, leaving an empty register entry. This entry may be filled by a new instruction word retrieved from memory, or by the instruction word that has just been transferred to the pipeline 42. The reloading loop 416 enables this feedback of an instruction word into the empty register entry 415.
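The reloading loop can be sketched in Python as a fixed-depth queue. The sketch assumes that the decision to loop a word back is supplied by the caller, and the class name is hypothetical; it reproduces the x, y, z, w, v order of Figures 7A to 7D with x, y and z looped back.

```python
from collections import deque
from typing import Deque, Iterator, Optional


class LoopingInstructionRegister:
    """Register 41 with reloading loop 416, modelled as a fixed-depth queue.

    entries[0] plays the part of the active entry 411 and entries[-1] the part
    of entry 415. Issuing a word moves the others one entry closer to the
    pipeline; the freed last entry is refilled with the issued word (loop) or
    with the next word from memory (None once memory is exhausted).
    """

    def __init__(self, memory: Iterator[str], depth: int = 5) -> None:
        self.memory = memory
        self.entries: Deque[Optional[str]] = deque(
            (next(self.memory, None) for _ in range(depth)), maxlen=depth)

    def issue(self, loop_back: bool) -> Optional[str]:
        word = self.entries.popleft()
        self.entries.append(word if loop_back else next(self.memory, None))
        return word


reg = LoopingInstructionRegister(iter(["x", "y", "z", "w", "v"]))
issued = [reg.issue(loop) for loop in (True, True, True, False, False)]
# issued == ['x', 'y', 'z', 'w', 'v']; the looped x word is now in the active entry
```

After the v word has been issued, the looped x word sits in the active entry, which matches the state described below for the clock cycle of Figure 14, and no further memory read is needed to replay the x, y, z loop.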

The register entries may be provided by physical memory devices, and the instruction words may be moved between the devices. Alternatively, the register 41 may have addressable entries, with the address being used to cycle through the instruction words stored in the instruction register 41 to achieve the looping capability.

Figures 7A to 7D show an exemplary loop in which the first, second and third sequences x, y and z are looped back into the instruction register 41.

In the example shown, the first three sequences x, y, z are identical and in themselves form an effective looped sequence. The sequences x, y, and z form an xyz super-sequence and are identified as such for timing and synchronisation purposes, as will be explained below.

Another way of forming such a looped sequence is to repeat a single instruction sequence. For example, looping the lookup address of the instruction register enables a repeat (or repeats) of a single instruction word to be achieved.
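For the addressable alternative, the words can stay where they are while a read address cycles over a loop region. In this sketch the class name and the `start`/`end` bounds are hypothetical: a region covering x, y and z replays the super-sequence, and a region of one entry repeats a single instruction word.

```python
from typing import List


class AddressedLoopRegister:
    """Words stay in fixed entries; a read address cycles over [start, end].

    A region of several entries replays a looped sequence; start == end
    repeats a single instruction word. Names and bounds are hypothetical.
    """

    def __init__(self, words: List[str]) -> None:
        self.words = list(words)
        self.addr = 0

    def issue(self, start: int, end: int) -> str:
        if not 0 <= start <= end < len(self.words):
            raise ValueError("loop bounds outside the register")
        if not start <= self.addr <= end:
            self.addr = start  # jump into the loop region
        word = self.words[self.addr]
        # advance the lookup address, wrapping back to the start of the loop
        self.addr = start if self.addr == end else self.addr + 1
        return word


reg = AddressedLoopRegister(["x", "y", "z", "w", "v"])
xyz_loop = [reg.issue(0, 2) for _ in range(7)]     # x y z x y z x
single = AddressedLoopRegister(["x", "y", "z", "w", "v"])
repeat_w = [single.issue(3, 3) for _ in range(3)]  # w w w
```

Compared with moving words between physical entries, only the address changes on each issue, so looping costs no data movement inside the register.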

Figure 8 is a flow chart illustrating steps in a method for loading and propagating an instruction from the compacted instruction word to the function unit, and will also be referred to in relation to Figures 9 to 21.

The compacted instruction words relating to the instruction sequences (x, y, z, w, v) are loaded (step 100) into the instruction register 41. On the next clock cycle, respective instructions from the first sequence x are loaded (step 101) into the instruction handlers 44. The timing fields 35 are read (step 102) by the timing controllers 46. The instructions for the x-sequence are then looped back into the instruction register (Figure 7B).

In dependence upon the contents of the timing field 35, the appropriate storage unit 43 of the pipeline 42 is enabled (step 103), such that the instruction handler 44 is able to load (step 104) the instruction into the storage unit 43 via the delivery bus 45. Loading of the instructions into the storage units 43 is assumed to take a single clock cycle. Following loading, subsequent clock cycles cause the instructions to propagate (step 105) through the pipelines 42A to 42F until loaded (step 106) into the function units 16A to 16F.

Once the instructions from one sequence have been loaded into the appropriate storage units 43, the next sequence can be loaded. During such loading, the instructions reaching the function units 16 are executed and outputs provided.

Figures 9 to 21 illustrate propagation of instructions from the instruction sequences of Figure 7 to the function units 16A to 16F. Figure 9 shows the compacted instruction word for the first sequence x loaded into the instruction register 41 ready for delivery to the pipeline 42. In Figures 9 to 21, only the current or active register entry is shown for the sake of clarity; the contents of all the register entries are shown in Figures 7A to 7D.

Figure 10 shows one clock cycle later with the x-sequence instructions loaded into the instruction handlers 44 and the timing field 35 into the timing controllers 46. The instructions for the second sequence, y, are loaded into the active register entry 41. In the example, the first instruction x1 of the x-sequence is to be executed by the function unit 16A, and so enters column A of the instruction controller 40.

After a further clock cycle, as illustrated in Figure 11, and in dependence upon the timing information, the x-sequence instructions are loaded into the appropriate storage units 43, with the y-sequence instructions loaded into the instruction handlers 44 and timing controllers 46, and the z-sequence instructions loaded into the active register entry 41. The timing controllers make use of position information supplied by the position detectors, in combination with sequence and timing information, to determine the location into which an instruction is to be loaded in the pipeline 42. The x-sequence instruction word is looped back into the register entry 415.

In the example shown in Figure 11, the first x-sequence instruction x1 is loaded into storage unit 43A1, that is the unit 43 in column A closest to the function unit 16A. The second instruction x2 is loaded into storage unit 43C2, the third instruction x3 into unit 43B3, and the fourth instruction x4 into unit 43F4. The instructions are loaded in this manner in order that they are delivered to the functions units 16, and hence are executed, in the correct order for the sequence concerned. This simplified example assumes that each instruction takes a single clock cycle to execute.

As shown in Figure 12, after a further clock cycle, the first x sequence instruction x1 is transferred to the appropriate function unit (16A), with each of the other x-sequence instructions being moved along in the pipeline 42 by one position closer to the function units 16. Thus, the second x-sequence instruction x2 is moved to position 43C1, x3 is moved to 43B2 and x4 is moved to 43F3. The first x-sequence instruction x1 is available for processing by the function unit 16A on the next clock cycle.

The instructions for the y-sequence are loaded into the pipeline 42 at appropriate positions. In this example, the y-sequence instructions match those for the x-sequence, and are therefore placed in the pipeline 42 queueing behind the corresponding x-sequence instruction. In this case, that results in the first y-sequence instruction y1 being loaded into position 43A1, the second y-sequence instruction y2 into the position 43C2, the third instruction y3 into 43B3 and the fourth instruction y4 into 43F4. The instructions for the z-sequence are loaded into the instruction handlers 44 and timing controllers 46. The w-sequence instructions are loaded into the active register entry 41.

After a further clock cycle, as illustrated in Figure 13, the result Rx1 of the execution of x-sequence instruction x1 is available at the output of the function unit 16A. All other instructions are moved one place closer to the function units, with x-sequence instruction x2 being loaded into function unit 16C for execution on the next clock cycle, and y-sequence instruction y1 being loaded into function unit 16A for execution on the next clock cycle. The z-sequence instructions, which are identical to the x and y sequence instructions, are loaded into appropriate locations in the pipelines 42, as illustrated in Figure 13. Since the x, y, and z sequences form a single super-sequence, the instructions are able to be loaded into the pipeline, and hence executed by the function units, with minimal delay between instructions, as illustrated in Figure 13.

The w-sequence instructions are loaded into the instruction handlers 44 and timing controllers 46, with the v-sequence instructions loaded into the active register entry 41.
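The back-to-back behaviour of the xyz super-sequence can be checked with a small cycle-level model. It assumes, as the simplified example does, single-cycle instructions, one new sequence entering the pipelines per clock cycle, and an initial storage unit equal to the timing value; everything else (names, the `when` record, depth 9) is illustrative.

```python
from typing import Dict, List, Optional

DEPTH, COLUMNS = 9, "ABCDEF"
PATTERN = [("A", 1), ("C", 2), ("B", 3), ("F", 4)]  # (column, timing) shared by x, y, z

pipes: Dict[str, List[Optional[str]]] = {c: [None] * DEPTH for c in COLUMNS}
delivered: Dict[str, List[str]] = {c: [] for c in COLUMNS}
when: Dict[str, List[int]] = {c: [] for c in COLUMNS}
to_load = ["x", "y", "z"]  # one sequence enters the pipelines per clock cycle

for cycle in range(DEPTH + 3):
    # Each pipeline hands its nearest instruction to its function unit, then shifts.
    for c in COLUMNS:
        head = pipes[c][0]
        if head is not None:
            delivered[c].append(head)
            when[c].append(cycle)
        pipes[c] = pipes[c][1:] + [None]
    # The next sequence is loaded at storage unit = timing value.
    if to_load:
        seq = to_load.pop(0)
        for i, (c, timing) in enumerate(PATTERN, start=1):
            if pipes[c][timing - 1] is not None:
                raise RuntimeError(f"storage unit {c}{timing} already occupied")
            pipes[c][timing - 1] = f"{seq}{i}"
```

Function unit 16A receives x1, y1 and z1 on three consecutive cycles (`when["A"] == [1, 2, 3]`), and on the cycle where x2 reaches 16C, y1 reaches 16A, as described for Figure 13; all twelve instructions are delivered with no idle cycle inside any unit's run.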

After a further clock cycle, as illustrated in Figure 14, the result Rx2 of the execution of the second x-sequence instruction x2 is available at the output of function unit 16C. The result Ry1 of the processing of the first y-sequence instruction y1 is available at the output of the first function unit 16A. The third x-sequence instruction x3, the second y-sequence instruction y2 and the first z-sequence instruction z1 are loaded into respective function units 16B, 16C and 16A for execution on the next clock cycle. All other queued instructions are moved one place closer to the function units 16 in the pipeline 42.

The w-sequence instructions w1 and w2 are placed in the pipeline at an appropriate position. The w-sequence is not part of the xyz super-sequence, so the timing controller 46 must take account of the position information provided by the position detectors 48 in order to determine the appropriate locations for the w-sequence instructions in the pipeline 42. In this case, the instructions w1 and w2 make use of function units which are not executing instructions in the x, y and z sequences. As such, the position detectors 48 do not generate any position information.

Accordingly, the instruction handler is able to insert the w1 and w2 instructions as close as possible to the appropriate function units (16D and 16E) in columns D and E of the instruction controller. In this way, it is possible to execute instructions from the w sequence in advance of non-interfering instructions in other sequences, thereby reducing latency of execution of the w sequence.

On this same clock cycle, the v-sequence instructions are loaded into the instruction handlers 44 and their timing information into the timing controllers 46. The looped x-sequence instructions are now located in the active register entry 41.

As shown in Figure 15, after a further clock cycle, the result Rx3 of the execution of the third x-sequence instruction x3 is available at the output of function unit 16B, the result Ry2 of processing of the second y-sequence instruction y2 is available at the output of the function unit 16C, and the result Rz1 of processing the first z-sequence instruction z1 is available at the output of the function unit 16A. All of the other queued instructions are moved one place closer to the function units 16, or into the function units themselves, so that the fourth x-sequence instruction x4 is ready for execution in the function unit 16F, the third y-sequence instruction y3 is ready for execution in the function unit 16B, and the second z-sequence instruction z2 is ready for execution in the function unit 16C. In addition, the first w-sequence instruction w1 is ready for execution in the function unit 16D. It will be noted that the w-sequence processing is interleaved with the x, y and z sequence processing; this is possible because the w-sequence instructions do not make use of function units used by those other sequences.

The v-sequence instructions v1, v2 and v3 are placed at appropriate points in the pipeline of the respective function units 16C, 16E and 16F. Unlike the w sequence instructions, however, the v-sequence instructions make use of at least one function unit used by another sequence (in this case sequences x, y and z, and function unit 16F).

Since the v-sequence is not part of the xyz super-sequence it is important that the sequences of instructions do not overlap with one another, since such overlapping may cause data conflict issues.

In order to detect and control the positioning of potentially overlapping sequence instructions, each pipeline column includes a position detector 48. The position detector 48 stores the latest position that contains an instruction in the column concerned, and broadcasts this position to the position detectors 48 of each of the other columns. When a new instruction is received, the position detector determines whether an instruction is already queued in the pipeline column to which it relates and, if so, indicates this to the timing controllers 46.

Each of the position detectors 48 in which an instruction is detected reports the position of that instruction, so that the highest occupied position in the pipeline can be determined. The new instruction sequence must then be placed at a position at least one higher than this highest position, so that it occupies a non-interfering position within the pipeline.

In this case, the last instruction from the x, y and z sequences to be processed is instruction z4, on function unit 16F. In other words, the fourth instruction z4 of the z-sequence is the highest placed in the pipeline. At the time of placement of the v-sequence instructions into the pipelines, instruction z4 is held at location 43F2, and so the first instruction v1 of the v-sequence must not be placed at a position less than or equal to that of z4. The first instruction v1 of the v-sequence is therefore placed at position 3 for its function unit; in this example, at location 43F3. This positioning ensures that the first instruction of the v-sequence does not overlap with any instructions in the xyz super-sequence, so that the data and results used by the v-sequence do not clash with those used by the xyz super-sequence.

The remaining v-sequence instructions v2 and v3 then have their initial positions determined relative to the first v-sequence instruction v1, so that the timing of the v-sequence is maintained and the v-sequence does not overlap with the existing independent sequence in the instruction pipeline. Thus the second v-sequence instruction v2 is placed at 43E4, and the third, v3, at 43C5. This placement ensures that the v-sequence instructions are executed in the appropriate order, without interference with the xyz super-sequence.
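The placement rule for an independent sequence (start one position above the highest reported position, then keep the sequence's relative timing) can be sketched as follows. This is an illustrative model only: `place_after_highest` is a hypothetical name, and the dictionary of reported positions stands in for the broadcasts of the position detectors 48. Only the highest entry matters for this rule, so the example supplies just z4 at 43F2.

```python
# Hypothetical sketch of the placement rule: a new, independent sequence starts
# one position above the highest occupied position reported by any detector.
Slot = tuple[str, int]


def place_after_highest(occupied: dict[str, list[int]], columns: list[str]) -> list[Slot]:
    """Return one (column, position) slot per instruction of the new sequence.

    occupied: column -> queued positions reported by that column's position detector.
    columns:  the function-unit column used by each instruction, in sequence order.
    Successive instructions are one position apart so the sequence keeps its timing.
    """
    highest = max((p for positions in occupied.values() for p in positions), default=0)
    base = highest + 1
    return [(column, base + offset) for offset, column in enumerate(columns)]


# z4 at 43F2 is the highest reported entry when the v-sequence arrives.
print(place_after_highest({"F": [2]}, ["F", "E", "C"]))  # [('F', 3), ('E', 4), ('C', 5)]
```

This reproduces the positions 43F3, 43E4 and 43C5 given above for v1, v2 and v3.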

The v-sequence shown in Figure 15 is a simplified example to show the principles of the present invention. In more complex examples, the v-sequence may overlap in time with the existing sequence or sequences in the instruction pipeline as a whole, but not in an individual pipeline for a specific function unit. The timing of the v-sequence is determined by the timing required to ensure that instructions in any one pipeline do not overlap or become interleaved in that pipeline.
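The per-pipeline form of the rule can be sketched by computing, for each column the new sequence uses, the lowest start position that keeps that instruction above everything already queued in the same column, and taking the largest of those. This is a hypothetical model (`place_per_pipeline` and the dictionary of queued positions are illustrative assumptions, not the circuit). In the example, the queued positions for the looped x-sequence are inferred from the text by moving z4 and the v-sequence placements (43F2, 43F3, 43E4, 43C5) one position closer to the function units.

```python
# Hypothetical sketch of the per-pipeline rule: the new sequence may overlap the
# queue as a whole, but each instruction must sit above every entry already
# queued in its own column.
Slot = tuple[str, int]


def place_per_pipeline(occupied: dict[str, list[int]], columns: list[str]) -> list[Slot]:
    """Find the lowest start position that avoids interleaving in any single column.

    Instruction k of the sequence goes to position base + k in columns[k], so the
    relative timing of the sequence is preserved.
    """
    base = 1  # position 1 is the slot nearest the function unit
    for offset, column in enumerate(columns):
        highest = max(occupied.get(column, []), default=0)
        base = max(base, highest + 1 - offset)
    return [(column, base + offset) for offset, column in enumerate(columns)]


# Inferred queue when the looped x-sequence arrives: z4 at F1, v1 at F2, v2 at E3, v3 at C4.
print(place_per_pipeline({"F": [1, 2], "E": [3], "C": [4]}, ["A", "C", "B", "F"]))
# [('A', 4), ('C', 5), ('B', 6), ('F', 7)]
```

Under these assumptions the sketch yields 43A4, 43C5, 43B6 and 43F7 for the looped x-sequence, whereas placing it above the highest entry anywhere in the queue (v3 at position 4) would start it one cycle later, at position 5. With empty columns D and E it also places w1 and w2 as close as possible to their function units, at positions 1 and 2.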

The looped x-sequence instructions are loaded into the instruction handlers 44 and the timing controllers 46.

As shown in Figure 16, after a further clock cycle the result Rx4 of the execution of the fourth x-sequence instruction x4 is available at the output of the function unit 16F. In addition, the result Ry3 of the execution of the third y-sequence instruction y3 is available at the output of function unit 16B, the result Rz2 of the execution of instruction z2 is available at the output of function unit 16C, and the result Rw1 of the execution of the first w-sequence instruction w1 is available at the output of the function unit 16D.

The fourth y-sequence instruction y4, the third z-sequence instruction z3 and the second w-sequence instruction w2 are placed at the appropriate function units for execution thereby. The remaining queued instructions are moved one place further towards the function units in the pipeline 42.

The looped x-sequence instructions are transferred to appropriate locations in the pipelines 42. Since the looped x-sequence does not form a super-sequence with the v-sequence, it is important that, for a given instruction pipeline, the looped x-sequence instructions do not overlap in time with any of the v-sequence instructions. As before, the position detectors 48 are used to determine the positions at which the x-sequence instructions are to be loaded in order to avoid interference with the v-sequence instructions. In the example shown, this results in the first looped instruction x1 being located at position 43A4, the second, x2, at position 43C5, the third, x3, at 43B6 and the fourth, x4, at 43F7.

Figure 17 shows the location of the instructions after a further clock cycle, and that the result Ry4 of the execution of the fourth y-sequence instruction y4 is available at the output of function unit 16F, the result Rz3 of the execution of the third z-sequence instruction is available at the output of the function unit 16B and the result Rw2 of the execution of the second w-sequence instruction is available at the output of the function unit 16E. The fourth z-sequence instruction z4 is available for execution by the function unit 16F, and the v-sequence instructions v1 and v2 are moved closer to the function units 16.

Figure 17 illustrates how the v sequence instructions do not overlap with processing of the x, y and z sequence instructions, since the first v-sequence instruction v1 is queued for execution directly after the fourth z-sequence instruction, which is the last of the instructions in the x, y and z sequences.

Also after this further clock cycle, the looped y-sequence instructions are loaded into the pipelines 42. Since the looped y-sequence forms a super-sequence with the looped x-sequence, the looped y-sequence instructions can be located adjacent to the corresponding looped x-sequence instructions. In this case, the first looped y-sequence instruction y1 is located at position 43A4, the second, y2, at position 43C5, the third, y3, at 43B6 and the fourth, y4, at 43F7. The looped z-sequence instructions are loaded into the instruction handlers 44 and the timing controllers 46.

Figure 18 illustrates the position after a further clock cycle, in which the result Rz4 of the execution of the fourth z-sequence instruction z4 is available at the output of the function unit 16F.

The first of the v-sequence instructions v1 is available for execution on the next clock cycle of the function unit 16F, with the second and third v-sequence instructions v2 and v3 being moved along the pipeline 42 towards their respective function units. Thus, the first of the instructions for the v-sequence is not executed until after the execution of the last of the instructions for the xyz super-sequence.

Also after this further clock cycle, the looped z-sequence instructions are loaded into the pipelines 42. Since the looped z-sequence forms a super-sequence with the looped x- and y-sequences, the looped z-sequence instructions can be located adjacent to the corresponding looped y-sequence instructions. In this case, the first looped z-sequence instruction z1 is located at position 43A4, the second, z2, at position 43C5, the third, z3, at 43B6 and the fourth, z4, at 43F7.

Figure 19 shows a situation one clock cycle later in which the result Rv1 of the execution of the first v-sequence instruction v1 is available at the output of the function unit 16F. Instruction v2 is placed in function unit 16E for execution, with instruction v3 being moved one location closer to the function unit 16C.

Figure 20 shows a situation one further clock cycle later in which the result Rv2 of execution of the second v-sequence instruction v2 is available at the output of the function unit 16E, with the third instruction of the v sequence v3 being available for processing by the function unit 16C for the next clock cycle. The looped x-, y-, and z-sequence instructions are moved along the pipelines 42, such that the first of the looped x-sequence instructions x1 is available for execution on the next clock cycle.

Figure 21 illustrates that the third v-sequence instruction v3 has been executed, with the result Rv3 available at the output of the function unit 16C. The first looped x-sequence instruction x1 has also been executed, and the result Rx1 is available at the output of function unit 16A. The second looped x-sequence instruction x2 is then available for execution by function unit 16C in the next clock cycle.

Although Figures 9 to 21 show the processing of five instruction sequences, it will be readily appreciated that the technique is applicable to any number of instruction sequences.

As mentioned above, the principles of the present invention have been described with reference to a single controller supplying instructions to a single processing element. As will be readily appreciated, the controller may supply the instructions to a plurality of processing elements. One possible configuration of such a technique is illustrated schematically in Figure 22, and provides a single instruction multiple data (SIMD) architecture. A single instruction controller 50, comprising an instruction register 52 and a plurality of instruction pipelines 54, communicates instructions through the pipelines in the manner described above. The pipelines 54 are connected to deliver instructions in parallel to respective function units 62 of a plurality of processing elements 60A to 60E via an instruction bus 56. It will be appreciated that the number of processing elements 60, function units 62 and pipelines 54 shown in Figure 22 is merely exemplary and the principles of the present invention are applicable to any number of those units.
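The SIMD arrangement can be illustrated with a short sketch in which each instruction is issued once and applied to the data held by every processing element. The opcodes `add` and `mul` and the function `run_simd` are hypothetical; the specification does not define an instruction set, and the sketch ignores pipelining so as to show only the one-instruction, many-data relationship.

```python
from typing import Callable

# Hypothetical opcodes for the function units 62; not defined by the specification.
OPS: dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def run_simd(program: list[tuple[str, int]], lanes: list[int]) -> list[int]:
    """Issue each instruction once and apply it to every processing element's data."""
    for opcode, operand in program:
        op = OPS[opcode]
        lanes = [op(value, operand) for value in lanes]
    return lanes


# Five processing elements (60A to 60E), each holding its own data value.
print(run_simd([("mul", 2), ("add", 1)], [0, 1, 2, 3, 4]))  # [1, 3, 5, 7, 9]
```

The instruction stream, and hence the controller logic, is shared; only the per-element data differs, which is what allows one instruction controller 50 to serve several processing elements.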

In this manner, an embodiment of the present invention is able to provide lower latency processing of multiple sequences of instructions. In addition, the use of a register having multiple entries in the processing element itself, used to store a plurality of instruction words, enables lower power consumption and lower latency, because looped instruction words no longer need to be fetched from a memory device.
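The benefit of holding looped instruction words in the register can be sketched with a small model in which an active-entry index advances on each issue and wraps around. This assumes, for illustration only, that the loop returns to the first entry once the last entry has been issued, which matches the x, y, z, w, v and then looped x order in the example above; the class name `InstructionRegister` is hypothetical.

```python
class InstructionRegister:
    """Hypothetical model of a register with several instruction-word entries.

    The active entry advances once per issue and wraps around, so a looped
    sequence is replayed from the register rather than re-fetched from memory.
    """

    def __init__(self, words: list[str]) -> None:
        self.entries = list(words)
        self.active = 0  # index of the active register entry

    def next_word(self) -> str:
        word = self.entries[self.active]
        self.active = (self.active + 1) % len(self.entries)  # wrap to loop
        return word


reg = InstructionRegister(["x-seq", "y-seq", "z-seq", "w-seq", "v-seq"])
print([reg.next_word() for _ in range(7)])
# ['x-seq', 'y-seq', 'z-seq', 'w-seq', 'v-seq', 'x-seq', 'y-seq']
```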

The techniques described above are particularly applicable to the processing of data packets in a wireless telecommunications network. In such a network, digital data packets are communicated from a sending device communicating with a source network node, over a radio frequency network to a destination network node for delivery to a receiving device. The actual processing of the data packets, including encoding, routing and decoding, will not be described here in detail for the sake of clarity.

An example network node 80 is illustrated in Figure 23, and comprises an antenna device 82, which includes radio frequency circuitry 82a for transmitting radio frequency (RF) signals from an antenna 82b. The RF circuitry 82a operates to generate RF modulated output signals for transmission from the antenna 82b. The antenna 82b is also able to receive RF signals, and to supply these to the RF circuitry 82a. A detailed description of the antenna device will not be included here, for the sake of clarity. The principles of radio frequency transmission and reception are well known and are applicable to the antenna device of Figure 23. In a preferred embodiment of the present invention the RF circuitry operates in the 60GHz frequency band.

The network node 80 also includes a PHY processor 83 which is operable to supply a modulated baseband signal including encoded data packets to the antenna device 82.

The PHY processor 83 uses appropriate modulation and coding schemes (MCSs) in dependence upon, for example, RF channel reliability and strength, required data transmission rates, and other factors. In an embodiment of the present invention, the PHY processor 83 includes at least one processing element and an instruction controller as described above, and operates as described above. In a preferred embodiment of the present invention, the PHY processor 83 includes a plurality of processing elements which receive instructions from a single common instruction processor.

Together, the antenna device 82 and PHY processor 83 provide so-called layer 1 functionality for the network node 80. Layer 1 functionality provides the interface between the network and the transmission medium.

The network node 80 further includes a MAC (media access control) device 84. The MAC device 84 is operable to control flow of data packets to the PHY processor 83, including determination of the frequency channel to be used for a particular data packet, timing and synchronisation of data packet flow, and error detection and correction. Once again, a detailed explanation of the role and functions of the MAC device 84 will not be included here for the sake of clarity, and since these are well known and understood by the skilled reader, particularly with reference to the relevant communications standards of the Institute of Electrical and Electronics Engineers (the IEEE). The MAC device 84 provides the network node with layer 2 functionality.
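As one illustration of the error-detection role, IEEE 802.11 MAC frames carry a 32-bit CRC frame check sequence. The sketch below uses Python's `zlib.crc32`, which computes the CRC-32 polynomial used by IEEE 802 frame check sequences, but the 4-byte little-endian trailer is a simplifying assumption rather than the standard's exact frame format, and `append_fcs`/`check_fcs` are hypothetical names. A CRC detects errors but does not correct them.

```python
import zlib


def append_fcs(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 check value (byte order is an assumption of this sketch)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def check_fcs(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the trailer."""
    if len(frame) < 4:
        return False  # too short to carry a check value
    payload, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "little") == fcs


frame = append_fcs(b"payload")
print(check_fcs(frame))                                  # True
print(check_fcs(bytes([frame[0] ^ 0x01]) + frame[1:]))   # False
```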

The network node 80 also comprises a network processor 85 which operates at the network layer level (layer 2/3), and is operable to determine routing information for data packets being transmitted across the network for delivery to the correct destination. The network processor 85 may include a wired connection 86 to enable communication over a wired (for example copper or optical) communications link. The network processor 85 receives data packets for transmission over the network, and determines routing information across the network for those data packets. Typically, the network processor 85 operates to determine a physical route across the network from a destination address (for example, an internet protocol (IP) address) of the destination device. In one example, the network processor 85 takes the destination address of the received data packet, and processes that address to determine the route from the current network node to the next network node in the network. In another example, the network processor 85 generates an entire route across the network for a data packet. There are many data packet routing techniques that may be employed, and the skilled reader will readily understand that any of these may be applicable in this case.
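Longest-prefix matching is one common way of turning a destination IP address into a next-hop decision, and is shown here only as an example of the "many data packet routing techniques" mentioned above; the specification does not state which technique the network processor 85 uses. The routing table, node names and the function `next_hop` are hypothetical. The sketch uses Python's standard `ipaddress` module.

```python
import ipaddress
from typing import Optional


def next_hop(table: dict[str, str], destination: str) -> Optional[str]:
    """Return the next-hop node for the most specific matching prefix, if any."""
    address = ipaddress.ip_address(destination)
    best_hop: Optional[str] = None
    best_len = -1
    for prefix, hop in table.items():
        network = ipaddress.ip_network(prefix)
        if address in network and network.prefixlen > best_len:
            best_hop, best_len = hop, network.prefixlen
    return best_hop


# Hypothetical routing table for one mesh node; node names are illustrative.
table = {
    "0.0.0.0/0": "node-A",
    "10.0.0.0/8": "node-B",
    "10.1.0.0/16": "node-C",
}
print(next_hop(table, "10.1.2.3"))     # node-C
print(next_hop(table, "10.9.9.9"))     # node-B
print(next_hop(table, "192.168.1.1"))  # node-A
```

This corresponds to the first example above (determining only the next network node); generating an entire route would instead require a path computation over the mesh topology.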

In one example network node, the antenna 82b of the antenna device 82 is provided by a beamforming antenna, such as is illustrated schematically in Figure 24. The antenna 89 preferably comprises a two-dimensional array of individual antenna elements. Such a beamforming antenna 89 is able to direct its effective transmission and reception beam pattern. For example, the antenna may have a central beam 90, and first and second beams 91 and 92 to respective sides of the central beam 90. The antenna 89 may have any number of beam shapes, and hence directions, thereby enabling the antenna to direct transmissions to a specific receiving network node, and to receive signals from a selected network node.

In a transmitting mode, the RF circuitry 82a generates respective drive signals for the antenna elements of the antenna 89. The drive signals are respective modified versions of the RF modulated output signal specific to each antenna element. The output signal may be modified in phase and/or amplitude in order to produce the desired beam pattern, and hence beam direction. The determination of the modifications required for each drive signal is made during processing of the incoming data packets by the PHY processor 83.

In a receiving mode, the reception characteristics of the antenna elements of the antenna 89 are modified according to weighting values determined by the PHY processor 83 and supplied to the RF circuitry 82a, such that the antenna 89 can be directed to receive RF signals from a specific direction (that is, from a specific transmitting network node).
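The per-element phase modification can be illustrated with the textbook steering weights for a uniform linear array. This is a simplified sketch under stated assumptions: a one-dimensional array (the antenna 89 is described as two-dimensional), unit amplitude for every element, element spacing given in wavelengths, and one of several sign conventions in common use; `steering_weights` and `drive_signals` are hypothetical names. On receive, weighting values of the same form would play the analogous role of directing the array towards a chosen transmitting node.

```python
import cmath
import math


def steering_weights(num_elements: int, spacing: float, angle_deg: float) -> list[complex]:
    """Per-element phase weights for a uniform linear array.

    spacing is the element pitch in wavelengths; angle_deg is measured from broadside.
    Sign conventions differ between texts; this one applies a progressive phase lag.
    """
    theta = math.radians(angle_deg)
    return [cmath.exp(-2j * math.pi * spacing * n * math.sin(theta)) for n in range(num_elements)]


def drive_signals(sample: complex, weights: list[complex]) -> list[complex]:
    """Form each element's drive signal as a phase-shifted copy of the modulated output."""
    return [w * sample for w in weights]


# Half-wavelength spacing, beam steered 30 degrees off broadside:
# successive elements lag by 90 degrees (weights approximately 1, -1j, -1, 1j).
weights = steering_weights(4, 0.5, 30.0)
print([round(math.degrees(cmath.phase(w))) for w in weights])
```

Amplitude tapering, which the text also permits, would scale each weight by a real factor in addition to the phase term.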

In accordance with the principles of the present invention, the PHY processor 83 makes use of the instruction delivery technique described above in order to process data packets at a high enough rate to maintain the desired high levels of data packet throughput.

One particular application for such a network node 80 including a beamforming antenna 89 is in a mesh network, such as that illustrated in Figure 25. A mesh network embodying an aspect of the present invention makes use of network nodes having a PHY processor as described with reference to Figures 23 and 24. The mesh network 100 of Figure 25 comprises a plurality of network nodes 101 that communicate with each other in a predetermined pattern using bidirectional radio frequency (RF) communications links 102. The network of Figure 25 is commonly referred to as a mesh network due to the multiple interconnections between network nodes. A typical application for a mesh network is to provide a wireless outdoor backbone network for mobile telephony and data services. Such a backbone network is concerned with the transmission of data between node stations of the mobile network for transmission to mobile devices, such as mobile telephones, smart phones and computing devices.

Although aspects of the invention have been described with reference to the embodiment shown in the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiment shown and that various changes and modifications may be effected without further inventive skill and effort.