GB2192295A - Computer system - Google Patents

Computer system

Info

Publication number
GB2192295A
GB2192295A
Authority
GB
United Kingdom
Prior art keywords
data
unit
memory
address
execution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB08718518A
Other versions
GB8718518D0 (en)
GB2192295B (en)
Inventor
Peter K Bailey
Peter J Brumfitt
Andrew C Sleigh
Neil F Trevett
Nicholas M Trier
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Logica Medizintechnik GmbH
UK Secretary of State for Defence
Original Assignee
Logica Medizintechnik GmbH
UK Secretary of State for Defence
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Logica Medizintechnik GmbH and UK Secretary of State for Defence
Priority to GB08718518A
Publication of GB8718518D0
Publication of GB2192295A
Application granted
Publication of GB2192295B
Legal status: Expired

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17337 Direct connection machines, e.g. completely connected computers, point to point communication networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/04 Generating or distributing clock signals or signals derived directly therefrom
    • G06F1/08 Clock generators with changeable or programmable clock frequency
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/0292 User address space allocation, e.g. contiguous or non contiguous base addressing using tables or multilevel address translation means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/10 Program control for peripheral devices
    • G06F13/12 Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor
    • G06F13/124 Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor where hardware is a sequential transfer control unit, e.g. microprocessor, peripheral processor or state-machine
    • G06F13/128 Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor where hardware is a sequential transfer control unit, e.g. microprocessor, peripheral processor or state-machine for dedicated transfers to a network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/36 Handling requests for interconnection or transfer for access to common bus or bus system
    • G06F13/368 Handling requests for interconnection or transfer for access to common bus or bus system with decentralised access control
    • G06F13/37 Handling requests for interconnection or transfer for access to common bus or bus system with decentralised access control using a physical-position-dependent priority, e.g. daisy chain, round robin or token passing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/42 Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4204 Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
    • G06F13/4208 Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being a system bus, e.g. VME bus, Futurebus, Multibus
    • G06F13/4217 Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being a system bus, e.g. VME bus, Futurebus, Multibus with synchronous protocol
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/22 Microcontrol or microprogram arrangements
    • G06F9/223 Execution means for microinstructions irrespective of the microinstruction function, e.g. decoding of microinstructions and nanoinstructions; timing of microinstructions; programmable logic arrays; delays and fan-out problems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/22 Microcontrol or microprogram arrangements
    • G06F9/226 Microinstruction function, e.g. input/output microinstruction; diagnostic microinstruction; microinstruction format

Abstract

The clock rate of a computer system is controlled by supplying a plurality of fields to an assembler for assembly into a microword, each of the fields containing information relating to the duration of that field; determining, in the assembler, the duration of the longest field and generating a clock field corresponding to that duration; and controlling the clock rate on the basis of the clock field. The image recognition system disclosed is identical to that in the parent application.

Description

SPECIFICATION

Computer System

The present invention relates to a method for controlling the clock rate of a computer system.
It is normal for the operations of a processor of a computer system to be controlled by a fixed frequency clock sending out a regular stream of pulses which cause the other components of the processor to operate in a regular way. The time taken for each operation of each component may depend on that operation, but with a fixed frequency clock the total processing time is determined entirely by the clock rate. It is also known to provide clocks of variable frequency, so that the length of each clock cycle can be set to the length of the longest operation to be performed in that cycle and time can be saved when all the operations carried out by the various components are short. However, such variable frequency clocks depend on a manual determination of the lengths of the various operations, and in a complex program this is virtually impossible. Therefore the present invention proposes a way of achieving variation in the clock cycle length automatically, in dependence upon the length of the operations that must be performed during that cycle.
This is achieved by using an assembler to calculate the longest operation in a microword. An assembler "assembles" a microword in response to an input command from a plurality of "fields", each of which may represent an instruction for a part of the processor. In known devices one of the fields is a "clock" field which controls the length of the clock cycle, and in the prior art this clock field must be precalculated for each microword. In the present invention there are no pre-programmed clock fields, but all the other fields each carry information relating to the duration of that field, and the assembler calculates a clock field from the other fields. The clock field is then added to the microword assembled by the assembler and controls the clock cycle in dependence on the longest operation of each microword. Thus the clock fields are calculated automatically, unlike in the prior art, and this permits automatic regulation of the clock rate.
Therefore, according to the present invention there is provided a method for controlling the clock rate of a computer system comprising: supplying a plurality of fields to an assembler for assembly into a microword, each of the fields containing information relating to the duration of that field; determining, in the assembler, the duration of the longest field and generating a clock field corresponding to that duration; and controlling the clock rate on the basis of the clock field so calculated.
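As a rough illustration of the assembly step just described, the sketch below derives the clock field as the duration of the slowest field in the microword. The field names, bit patterns and nanosecond durations are illustrative assumptions, not values from the patent:

```python
# Hypothetical sketch of the assembler step: each input field carries
# its own duration, and the assembler computes the clock field from
# the longest one, so no manual precalculation is needed.
def assemble_microword(fields):
    """fields: dict mapping field name -> (bit_pattern, duration_ns)."""
    # clock field = duration of the slowest operation in this microword
    clock_field = max(duration for _, duration in fields.values())
    microword = {name: bits for name, (bits, _) in fields.items()}
    microword["clock"] = clock_field
    return microword

word = assemble_microword({
    "alu":    (0b0101, 80),   # ALU operation takes 80 ns
    "memory": (0b0011, 120),  # memory access takes 120 ns
    "shift":  (0b0001, 40),   # shifter takes 40 ns
})
# The clock cycle for this microword is set by the slowest field.
assert word["clock"] == 120
```

Here the clock cycle would stretch to 120 ns for this microword, but shrink for a microword containing only fast operations.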
An embodiment of the present invention will now be described in detail, by way of example, with reference to the accompanying drawings, in which:

Figure 1 is a block diagram of a computer system in which the present invention may be used;
Figure 2 is a timing diagram of various signals on the network bus of Figure 1;
Figure 3 shows the structure of an execution unit of the system of Figure 1;
Figure 4 shows a block diagram of the address generator module of an execution processor of the execution unit of Figure 3;
Figure 5 shows a schematic drawing of a tree-structure program;
Figure 6 shows a block diagram of the preset unit of the execution processor;
Figure 7 shows a known system for distributing microcodes to a plurality of boards;
Figure 8 shows a block diagram of a system for distributing microcodes to various boards;
Figure 9 shows a block diagram of the execution processor of the execution unit of Figure 3; and
Figure 10 shows a block diagram of an image recognition device.
Bus Structure

Referring first to Figure 1, a computer system has a logical ring structure comprising a network bus 100 to which are connected a master unit 101, an input/output (I/O) unit 102 and a plurality of execution units 103. Three execution units are shown in Figure 1, but additional execution units may be provided up to a maximum of N-2, where N is the number of data lines in the network bus 100.
Normally the network bus will have 16 data lines, so that a maximum of 14 execution units 103 may be provided.
The master unit 101, which may be a standard minicomputer, e.g. a Plexus 35, is the unit which provides user input, such as the programs to be performed by the various execution units 103, to the computer system. It also provides a start signal for initiating the processing by the other units, and may provide a monitor to check that the execution units 103 are processing data in the correct sequence, but otherwise plays no part in the processing of data by the rest of the computer system. The input/output unit 102 is the point of entry of data to the system for processing by the execution units 103, and the point of exit of processed data.
The execution units 103 will be described in more detail later.
The network bus 100 comprises 30 lines, each connected in parallel to the master unit 101, the input/output unit 102, and the execution units 103.
Sixteen of the lines of the network bus 100 are data lines, which in a conventional system would be used to transmit data from one unit to another. In the system shown in Fig. 1, however, the data lines are used not only for transmitting data but also for indicating the status of each unit to which the network bus is connected. Each unit is assigned a corresponding data line which is to carry status signals from that unit. In the absence of other information on the data lines, each unit applies a signal to its assigned data line to indicate whether or not it is ready to receive data. Suppose that one unit is to transmit data to some of the other units. The transmitting unit checks the data lines of the other units to confirm that they are ready to receive data.
Since each unit has a corresponding data line, the transmitting unit can perform this check simultaneously (i.e. in parallel) for all the units to which it is to transmit. The unit transmitting the data then applies signals to the data lines of the units to receive the data, which signals enable the receiving units in parallel, and then the data is transmitted in parallel to all the enabled units. After the receiving units have received the data, each receiving unit signals on its assigned data line that it has received the data correctly, and this permits the transmitting unit to check that the data it has transmitted has been received by all the units to which that data was to be transmitted, this check again being in parallel for all the receiving units. Thus status checks, enabling signals, validation checks, and data transfer are all achieved in parallel, so that the checks take up as little time as possible, increasing the time available for data transfer, and hence increasing the efficiency of the bus.
This bus structure facilitates the use of the system as a token-passing ring. Token-passing rings are known and comprise a logical ring of interconnected processing units with a (notional) token which is passed between them. The processing unit with the token is enabled to transmit data to other processing units, and when it has finished transmitting, it passes the token to the next processing unit in the logical ring. That unit then transmits any data it has to transmit, and the token is again passed on. This continues until the token has been passed around the ring, completing a cycle for the system. Using the bus structure discussed above, the computer system of the present invention is particularly suitable for a token-passing system, because when the token is passed from one unit to another, the receiving unit can signal on its assigned data line that it has received the token, so that the control unit 101 can monitor that the token is being passed correctly between the units and that the token is not "dropped" (when a token is passed from one unit but is not received correctly by another unit). In this way the amount of time during which the signals on the bus represent token passing signals (during which time the bus is not available for data transmission) may be reduced relative to prior art arrangements.
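The token-passing discipline described above can be sketched as follows. The per-unit packet queues are an illustrative simplification; as in the description, a unit transmits one packet while it holds the token, or simply passes the token on if it has nothing to send:

```python
# Minimal sketch of one token cycle around the logical ring: the
# token visits each unit once; a unit holding data transmits one
# packet, otherwise it just passes the token to the next unit.
def token_cycle(outboxes):
    """outboxes: list of per-unit packet queues; returns packets sent
    as (unit_address, packet) pairs, in token order."""
    sent = []
    for address, queue in enumerate(outboxes):
        if queue:                         # unit with data transmits one packet
            sent.append((address, queue.pop(0)))
        # a unit with an empty queue simply passes the token on
    return sent

outboxes = [["a1", "a2"], [], ["c1"]]
# First cycle: units 0 and 2 each send one packet; unit 1 passes the token.
assert token_cycle(outboxes) == [(0, "a1"), (2, "c1")]
# Second cycle: only unit 0 still has data to send.
assert token_cycle(outboxes) == [(0, "a2")]
```

Repeated cycles drain every outbox, which mirrors the way a "frame" of output data is transmitted piecewise over several token cycles later in the description.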
In addition to the sixteen data lines, the network bus has 14 control lines, divided into four address lines, four mode lines, four clock lines and two parity lines. The signals on these lines co-operate with the signals on the data lines to control the various steps of data transmission and token passing. The functions of the control lines are as follows:

address lines - during data transfer, the transmitting unit applies a signal to the address lines corresponding to its address so that receiving units know which unit is transmitting; during token passing, the transmitting unit gives the address of the unit to receive the token;

mode lines - signals are applied to these lines by the transmitting unit only during data transfer, to give a signal indicating a characteristic of the data being transferred;

clock lines -
a) bus busy - indicates data being transferred;
b) TX/strobe - provides clock pulses for data transfer;
c) control/strobe - during data transfer, a signal on this line from the transmitting unit enables the receiving units; during token passing, a signal on this line indicates the token is to be passed;
d) ack/strobe - during data transfer, a signal on this line from the transmitting unit indicates that all the data has been transferred and that the receiving units are to acknowledge data receipt; during token passing, a signal is applied to this line by the master unit if token transfer has failed;

parity lines - during data transfer, provide a check from transmitting to receiving unit for assisting validation.
Referring now to Fig. 2, the pattern of signals on the network bus 100 can be seen. Assume that one execution unit has the token and all other units are signalling on their assigned data lines that they are ready to receive (checked by the unit with the token).
Assuming that it has data to transmit, the execution unit with the token applies a signal 201 to the bus busy (BB) line indicating that it is about to transmit data, applies a signal 202 to the data lines of the units to receive the data, which, together with an enabling signal 203 on the control/strobe (CTLSTB) line enables the receiving units, applies a signal 204 to the address (ADR) line corresponding to the address of the unit with the token so that the receiving units know which unit is transmitting, and applies a signal 205 to the mode lines indicating a characteristic of the data to be sent. If the unit has no data to transmit, it immediately starts the token passing sequence.
When data is to be transmitted, the data 206 is transmitted across the data lines accompanied by clock pulses 207 on the TX/strobe (TXSTB) line and parity pulses 208 on the parity lines. At the end of the data transfer the unit with the token applies a signal 209 to the ack/strobe (ACKSTB) line requesting acknowledgement of valid data receipt by the receiving units, which is achieved as described above by the receiving units each applying a signal 210 to their assigned data line.
This completes data transfer and the bus busy (BB), address (ADR) and mode lines are cleared.
Then the unit with the token passes that token to another by applying a signal to the address lines indicating the address of the unit which is to receive the token, and applying a signal 212 to the control/strobe (CTLSTB) line to clock the passage of the token. The unit receiving the token then applies a signal 213 to its assigned data line indicating that it has validly received the token, and token passing has been completed. If token passing is not performed correctly, and the token is dropped, the master unit 101 may apply a signal 214 to the ack/strobe (ACKSTB) line to reset the token passing system. Once the token has been transferred, the unit now with the token waits until all units have indicated by a signal 215 on their assigned data lines that they are ready to receive data, then data transfer from the unit now with the token may commence.
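Because each unit owns exactly one data line, the status and acknowledgement checks described above reduce to a single bitwise comparison. A minimal sketch, assuming a 16-line bus sampled as one 16-bit word; the particular line assignments are illustrative:

```python
# Sketch of the parallel status check: each unit asserts its assigned
# data line (bit position = unit number), so one bitwise test checks
# every intended receiver simultaneously.
def all_ready(data_lines, receivers):
    """data_lines: 16-bit value sampled from the bus;
    receivers: unit numbers whose readiness must be confirmed."""
    mask = 0
    for unit in receivers:
        mask |= 1 << unit          # bit position = assigned data line
    # a single comparison covers all receivers in parallel
    return (data_lines & mask) == mask

# Units 2, 5 and 9 assert their ready lines:
lines = (1 << 2) | (1 << 5) | (1 << 9)
assert all_ready(lines, [2, 5])        # both targets ready
assert not all_ready(lines, [2, 3])    # unit 3 is not ready
```

The same one-word comparison serves for the post-transfer acknowledgement check, with each receiving unit asserting its line after valid receipt.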
As described above a network bus 100 with 16 data lines permits 14 execution units 103 since each unit must have a corresponding data line. If more were needed it would be feasible to link network bus rings to create an extra level of processing within the system, i.e. one of the execution units would be replaced by an interface to another network ring, which itself could have up to 14 execution units. The system is very efficient because the data lines are used for several purposes, decreasing the number of lines which would otherwise be required in the network bus 100. Furthermore, during data transfer, the signals on the bus are controlled by the unit with the token, i.e. the unit with the token acts as a bus controller whilst it has the token, and control of the bus is passed to another unit when the token is passed.
Execution Unit Structure

The structure of an execution unit 103 will now be discussed in more detail with reference to Figure 3.
As can be seen from that figure, the execution unit 103 consists of two processors, a control processor 301 (which may be a standard Motorola 68000 microcomputer) and an execution processor 302.
The control processor 301 is connected by a transmission bus 303 to the network bus 100 discussed with reference to Fig. 1. The function of the control processor 301 is to control the transfer of data between the network bus 100 and the execution processor 302. It is the control processor 301 which: (i) signals to the appropriate data line of the network bus 100 that the execution unit 103 is ready to receive information; (ii) signals to other units that it is about to transmit and that they are to receive data; (iii) transmits the data; (iv) checks that data has been received correctly, and applies a suitable signal to the appropriate data line of the network bus 100; (v) receives and passes the token of the logical ring.
Thus it is the control processor 301 of each execution unit 103 which interacts with the network bus 100 and with the rest of the computer system to achieve the advantages of efficient data transmission and signalling discussed above in connection with Fig. 1.
The control processor 301 also controls the transmission of data to and from the execution processor 302.
The control processor 301 and the execution processor 302 each have their own "working" memories (304, 305 respectively), but in addition there is a "bank switch" memory 306 connected between them. The memory space of the bank switch memory 306 is divided into two areas 307, 308, each of which is (notionally) subdivided into two parts 307a, 307b, 308a, 308b during each processing operation. The use of a bank switch memory 306 with such subdivision permits simultaneous transmission of data from the control processor 301 to the execution processor 302 and vice versa. Of course, there need be no physical division of the bank switch memory 306, and the division may be a purely logical division of memory addresses in a single memory component.
Normally, the addresses of the two areas 307, 308 will not change, but the addresses of each part of the area may be changed by the appropriate processor unit 301,302 depending on the operations to be performed.
The program to be used on the computer system may be divided into a series of "frames", each corresponding to the processing of a batch of information by the execution processor 302 of each execution unit 103. Assume that data to be processed by the execution processor 302 is stored in the right half 308b of the memory area 308 and that data to be transmitted by the control processor 301 to other parts of the computer system is stored in the left half 307a of the memory area 307. The "frame" begins with the execution processor 302 commencing to process the data in the right half 308b of memory area 308, and this continues until the data is fully processed and can be stored in the left half 308a of the memory area 308.
Simultaneously with this processing by the execution processor 302, the control processor 301 transmits data from the left half 307a of memory area 307 to other units, and receives data from appropriate other units which is stored in the right half 307b of memory area 307. At the end of the processing by both processors 301,302, the memory areas 307, 308 are "switched" (again a logical operation rather than actual movement of data) so that the control processor 301 has access to the memory area 308 and the execution processor has access to the memory area 307.
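The double-buffered frame scheme above can be sketched as follows. This is purely illustrative: in particular, the real "switch" is a logical remapping of addresses, not a copy of data, and the in/out halves stand in for the a/b parts of areas 307 and 308:

```python
# Sketch of the bank-switch memory: two areas, each split into an
# "out" half (data to transmit / processed results) and an "in" half
# (received data / data to process). At the end of a frame the two
# areas are logically swapped between the processors.
class BankSwitchMemory:
    def __init__(self):
        # area 307 starts assigned to the control processor,
        # area 308 to the execution processor
        self.control_area = {"out": [], "in": []}    # 307a / 307b
        self.execution_area = {"out": [], "in": []}  # 308a / 308b

    def switch(self):
        # logical swap: the control processor now sees the freshly
        # processed results, the execution processor sees new input
        self.control_area, self.execution_area = (
            self.execution_area, self.control_area)

mem = BankSwitchMemory()
mem.control_area["in"].append("packet-1")   # received off the network bus
mem.switch()
# After the switch, the execution processor's input half holds the packet.
assert mem.execution_area["in"] == ["packet-1"]
```

Because each processor only ever touches its own area between switches, neither contends with the other for memory access, as the description notes below.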
As described above, the system may operate as a token-passing ring, and during a token cycle the token is passed once around the ring. Consider now the operations of one particular execution unit 103 during a token cycle. At the start of the token cycle the unit signals on its assigned data line that it is ready to receive data. When the token arrives at a unit (e.g. another execution unit 103) which is to transmit data to the execution unit 103 under consideration, that other unit signals on the appropriate data lines that it is about to transmit data, thereby enabling the execution unit under consideration. A data packet (which may be all or only a part of the data that unit has to transmit) is then transmitted via the network bus 100, is received by the execution unit 103 under consideration via the bus 303 and is stored in the right half 307b of the memory area 307. Storing of data packets in the right half 307b of the memory area 307 continues as the token is passed around the ring. Thus during a token cycle the right half 307b of the memory area 307 receives packets of data which are to be processed by the execution processor 302 during the next "frame".
At some time during the token cycle the execution unit 103 under consideration will receive the token.
It signals that it is to transmit data, thereby enabling the units to receive that data, and then transmits a data packet from the left half 307a of the memory area 307 onto the network bus 100 and hence to other appropriate units. At the end of the transmission of the data packet it checks that the data packet has been received correctly by monitoring the signals on the data lines assigned to the receiving units, and then it signals via the appropriate data line of network bus 100 that it has finished transmitting the data packet, and the token is then passed on. Thus during the token cycle the control unit transmits a packet of data processed by the execution processor 302 during the previous "frame". The token is passed round and round the ring, and each execution unit with data to transmit will transmit a packet of that data each time the unit has the token. If the execution unit has no data to transmit, it simply passes on the token. After a sufficient number of token cycles, the control processor of an execution unit will have passed on all the data processed by the execution processor during the previous "frame" of that execution unit, and data is switched between the control processor and execution processor as will now be described.
The control unit 301 has suitable means for recognising when it has received all the input data and successfully transmitted all its output data.
When the execution processor 302 finishes processing the data of that frame (from the right half 308b of memory area 308) it signals to the control unit 301 that it has finished, and requests more data.
However, the control processor 301 will only respond to this request when it has received all input data and transmitted all output data. When this happens the memory areas are switched so that the control processor 301 has access to the memory area 308 and the execution processor 302 has access to the memory area 307. Although this is, in fact, merely a change of addresses, it may be considered as a transfer of the data in the right half 307b of the memory area 307 to the right half 308b of the memory area 308 (to form the input data for the execution processor 302 for the next frame) and of the data in the left half 308a of the memory area 308 to the left half 307a of memory area 307 (to form the output data to be transmitted by control processor 301). The execution unit 103 has then completed one frame and the control unit can signal that it is ready to receive data (i.e. is ready for the next frame to begin). The frame then begins when all the execution units 103 are ready to receive data.
The division of the bank switch memory 306 into the memory areas 307, 308 means that neither processor 301,302 contends with the other for access to the memory 306.
If the execution processor 302 is to handle particularly complex processes, it may be necessary for there to be more interaction between the control processor 301 and the execution processor 302 than described above. For example, the signal from the execution processor 302 to the control processor 301 to switch access to the memory areas 307 and 308 need not be at the end of a processing cycle by the execution processor 302, but the execution processor 302 may continue processing data after the data access has been switched.
Execution Processor Components

The execution processor 302 consists of a plurality of components all connected in parallel to a plurality of data lines. Many of the components are conventional, but some relate to various aspects of the invention and will now be described in detail.
The architecture of the execution processor as a whole will be described later.
Address Generator Module

An address generator module converts a data signal appearing on a data line of the processor to an address. However, if the memory has more memory addresses than possible signals on the data line, it is necessary to have a conversion system which effectively multiplies the number of data signals, and the problem is to convert data signals, which are random sequences of n bits, to an ordered sequence of memory addresses of n+x bits (i.e. the memory is object addressable). So far as is known, there is no prior art system to do this. There are systems that can convert an address of n bits to an address of n+x bits, i.e. convert one ordered sequence to another, but none that can translate from the random sequence of data signals. One way of achieving this will now be described with reference to Fig. 4. Assume that a 16 bit address signal on the data lines 400 is to be converted to a 20 bit memory address which is transmitted from the address generator module 401 via an address bus 402 to the memory 305 (see Fig. 3). If this conversion were not done, the total address space of the memory 305 and the area of the bank switch memory 306 to which the execution processor has access would be limited to 64K, but by increasing the number of bits in the address signals, a memory address space of 2 megabytes can be achieved. The address generator module consists of two translation units 404, 405 connected in parallel, via an adder 406, to the address bus 402. Two translation units 404, 405 are required because there are two different types of addresses with which the address generator module 401 must deal. One type of address is the address of a static data object, i.e. data with a predetermined position in the main memory 305, the address of which is therefore known before a program is executed.
Since the addresses of these data objects are known it is relatively easy to generate a 20 bit address for each data object. One of the translation units 404 acts as a static translation unit, and consists of a random access memory (RAM) which stores the addresses of the static data objects as 20 bit addresses and acts as a "look-up" table to convert each 16 bit address to a corresponding 20 bit address. Suppose that there are a maximum of 4K static data objects. A 12 bit signal fed from the data bus 400 can then be used to generate a complete set of unique addresses for each of the static data objects, and the RAM of the static translation unit 404 converts the 12 bit signals into 20 bit addresses for transmission to the address bus 402.
The other type of data objects stored in the memory 305 and all the data objects stored in the bank switch memory are dynamic data objects, i.e. data objects which are not predetermined and which may change during the program. Practical programs require a large number of dynamic data objects, so that it is not practicable to use a RAM to store the addresses of all the dynamic data objects.
Therefore the dynamic memory addresses must be generated directly from the data on the data bus 400. This is achieved by feeding a number of bits of the data signal to a control unit 407 which inhibits the static translation unit 404 and enables the other translation unit (the dynamic translation unit) 405.
The dynamic translation unit 405 shifts the signal on the data line by up to 4 bits, to form a 20 bit address, with the bits of the address not corresponding to a bit of the data signal being set to zero. The shifted signal, now being a 20 bit signal, is fed to the address bus 402 via the adder 406.
There is a difficulty with this however. As the most significant bit (MSB) of the signal on the data line 400 is shifted towards the MSB of the memory address, the memory that can be addressed increases, but the memory has to be allocated in larger blocks. In order that the available dynamic address space is used efficiently, it is desirable that the program controlling the dynamic memory allocation knows by how many bits the dynamic addresses are being shifted.
It is convenient if the addresses of the static data objects correspond to the bottom 4K of the 16 bit address line 402. The top 60K can then be used for the addresses of dynamic data objects. Assume a signal appears on the data bus 400 which is to be converted to an address on the address bus 402. The top 4 bits of the 16 bit signal are fed to the control unit 407. If these top 4 bits are all zero, i.e. the signal is in the bottom 4K, the control unit 407 enables the static translation unit 404, which receives the bottom 12 bits of the signal from the data bus 400. The RAM of the static translation unit 404 converts this 12 bit signal to a 20 bit address which is fed, via the adder 406, to the address bus 402. If, on the other hand, any one of the top 4 bits of the signal on data bus 400 is non-zero, the control unit 407 inhibits the static translation unit 404 and enables the dynamic translation unit 405 to receive the 16 bit signal on the data line 400. The dynamic translation unit 405 then shifts the 16 bits upwards to create a 20 bit signal which is again fed via the adder 406 to the address bus 402.
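The two translation paths just described can be modelled in software. The following is an illustrative sketch only, not the patented circuit: the contents of the static look-up table, its base addresses, and the exact shift amount are assumptions chosen to match the 16-bit-to-20-bit example above.

```python
# Sketch of the address generator module 401: a 16-bit data signal becomes
# a 20-bit memory address via either a RAM look-up (static objects) or an
# upward shift (dynamic objects), with an optional offset summed by adder 406.

STATIC_LIMIT = 1 << 12          # bottom 4K of the 16-bit space is static

# Hypothetical look-up table standing in for the RAM of the static
# translation unit 404: maps a 12-bit object index to a 20-bit address.
static_ram = {i: (0x80000 + i * 4) & 0xFFFFF for i in range(STATIC_LIMIT)}

def translate(data16: int, shift: int = 4, offset: int = 0) -> int:
    """Return the 20-bit address for a 16-bit data signal."""
    assert 0 <= data16 < (1 << 16)
    if data16 < STATIC_LIMIT:                 # top 4 bits all zero
        base = static_ram[data16 & 0xFFF]     # static unit 404: RAM look-up
    else:
        base = (data16 << shift) & 0xFFFFF    # dynamic unit 405: shift up
    return (base + offset) & 0xFFFFF          # adder 406
```

A smaller `shift` trades address range for finer-grained allocation, which is why (as noted below) the allocating program should know the shift in use.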
As shown in Figure 4, the adder 406 may combine the signal from the static or dynamic translation units 404, 405 with a signal from an offset unit 408.
The purpose of the offset unit 408 is to permit the generation of the addresses of vectors, i.e.
quantities for which more than one parameter, and thus more than one address, is necessary to define the quantity. When the addresses of a vector are to be generated, first one address is generated as described above, then a signal is applied to a data bus 409 which is fed via line 410 to the offset unit 408. The offset unit 408 calculates the difference between the initial address and the subsequent address, and applies that difference to the adder 406, which sums it with the initial address, thereby deriving the subsequent address in a simple way.
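The offset mechanism amounts to repeated base-plus-difference addition. The sketch below is illustrative only; the element stride is an assumption, since the text states only that the offset unit 408 supplies the difference between the initial and subsequent addresses.

```python
# Model of vector addressing via the offset unit 408: the adder 406 sums
# the initial 20-bit address with successive offsets to derive each
# subsequent element address.

def vector_addresses(base20: int, n_elements: int, stride: int = 1):
    """Yield the addresses of a vector's elements: base, base+stride, ..."""
    for i in range(n_elements):
        yield (base20 + i * stride) & 0xFFFFF   # adder 406: base + offset
```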
Program Memory and Prefetch Unit One way of structuring a computer program is known as tree-structuring. Such a structure is shown in Figure 5 and consists of an entry point 501 which is connected to other points or nodes, which may themselves be branching points, or "secondaries" 502 with depending sub-programs or "subtrees", or may represent a single sub-routine of the program. Such a sub-routine is known as an executable primitive. When such a program is run, each node is scanned in turn and the subtrees (if it is a secondary) from that node investigated, again in turn, until all primitives have been extracted. For example, starting at the entry point 501, the secondary 502a would be scanned first, and the first subtree from that secondary leads to another secondary 502b. Investigating the branches from the node 502b, the first is primitive 503a, which would then be extracted for subsequent execution by the computer system. The next subtree goes to another secondary 502c, and the subtrees from that secondary would be investigated. This would lead to the extraction of primitives 503b and 503c. Since all the subtrees from secondary 502c would then have been investigated, processing returns to secondary 502b to extract the primitive 503d.
Processing then returns to secondary 502a for the extraction of primitive 503e, and then processing returns to the entry point 501 for processing of another branch from that entry point 501. The sequence of investigating the subtrees from each secondary is normally controlled by software known as an "inner interpreter". However, in many programs the ratio of secondaries 502 to primitives 503 is high, so that a considerable amount of processing time is spent simply in traversing the "tree" of secondaries looking for primitives 503 to extract.
Therefore it is desirable that there is a way of extracting the primitives more rapidly than could be done by programming alone. Normally, in tree-structured code, each word in the program memory may represent either an instruction (known as a parameter field) to be acted upon by other components of the processor, or a tag which is associated with one or more parameter fields and indicates the nature of the parameter field, e.g.
primitive or secondary. Since the parameter field and tag are separate program words, it is necessary to extract from the program memory first the tag and then the associated parameter field(s), so that at least two processing steps are needed.
Therefore use may be made of a program memory which has a word length longer than the word length of the parameter field. The extra bits of the memory word are then used to form the tag, so that the tag and parameter field are combined in a single memory word.
Referring now to Fig. 6, a memory 600 with a word length of e.g. 24 bits forms the heart of both an instruction format unit 601 and a prefetch unit 602.
The prefetch unit 602 acts as a hardware "inner interpreter" and will be described first. On a signal from a program counter 603, the program memory 600 outputs an instruction word, which may correspond either to a secondary of the program, in which case the parameter field is an instruction to obtain another instruction word, or to a primitive, in which case the parameter field is an instruction for other parts of the processor.
If the program were not branched, it would consist simply of a string of primitives ordered in the sequence in which they are to be performed. A program counter 603 would send out a sequence of signals instructing the program memory to output the primitives in the correct order. However, in branched code this cannot be done, because at a secondary the program must jump to an instruction in one subtree from that secondary, but be capable of returning to the secondary for executing the other subtrees from that secondary. Therefore when a secondary occurs, the program jumps to one subtree but remembers the address of the next node (secondary or primitive) to enable the program to return when the one subtree has been completed.
This is achieved by adding one to the address of a secondary in an adder 604 and storing the result in a stack 605. The node is detected by a control unit 606 which receives the tag of the instruction output from the program memory 600 and is capable of distinguishing tags representing primitives, tags representing secondaries, and special tags representing "return" primitives which are the end of a secondary subtree (i.e. are the rightmost instruction at any particular level in any secondary subtree of the tree of Fig. 5). The return primitives may be simply an instruction to return to the next level, or may be both a return instruction and an instruction to be transmitted to other parts of the processor. When a subtree has been executed, the end of the subtree is detected by the control unit 606, and the top address in the stack 605 is removed from the stack and output via a multiplexer 607 to the program counter 603, and becomes the next address fed to the program memory 600. It is important that each instruction word consists of both a tag and a parameter field, because then the tag and the parameter field are produced in a single output from the memory. If the tag and the parameter field were separate words, as in the prior art, it would not be possible to know which step of the program to return to without investigating several words, which would be slow and inefficient.
Consider the tree of Fig. 5, in which the letters A to Y represent the sequence of instruction words stored in the program memory. The first instruction word to be output from the program memory 600 is secondary A. The control unit detects that it is a secondary, causes the address plus one (i.e. B) to be stored in the stack 605, and the program counter 603 is caused, via information from the parameter field of the instruction word, to jump to instruction E. As this is also a secondary, its address plus one (F) is stored above address B in the stack 605 and programming jumps to address M. This is a primitive, the tag of which is detected by the control unit 606, and so the instruction word is fed to the instruction format unit 601. Processing then continues with the instruction word at address N, which is a secondary, so its address plus one (O) is stored at the top of the stack and the program jumps to address U, which is a primitive and so is output to the instruction format unit. Then the next instruction word at address V is output, and again this is a primitive and so is output to the instruction format unit. However, it is also the rightmost instruction at that level in that subtree, and therefore has a tag which instructs the control unit to extract the topmost address (i.e. O) from the stack 605, and this then forms the next address fed to the program memory. Again this is a primitive and the rightmost in that subtree at that level, so it is output to the instruction format unit 601 and the next address (F) is extracted from the stack 605. This continues until all the instruction words have been output from the memory.
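The stack-based walk just described can be modelled in software. The tags, the word layout, and the tiny example program below are hypothetical (the model simplifies by letting a "return" word both emit its primitive and pop the stack, one of the two behaviours the text allows); what it preserves is that each memory word packs a tag together with its parameter field, so a single fetch decides whether to descend, emit, or return.

```python
# Software model of the prefetch unit 602 acting as a hardware
# "inner interpreter" over a tree-structured program.

SECONDARY, PRIMITIVE, RETURN = "secondary", "primitive", "return"

def prefetch(memory, entry):
    """Yield primitives in execution order from a tree-structured program."""
    stack, pc = [], entry
    while True:
        tag, field = memory[pc]        # one fetch: tag + parameter field
        if tag == SECONDARY:
            stack.append(pc + 1)       # adder 604 -> stack 605
            pc = field                 # jump into the subtree
        else:
            yield field                # primitive -> instruction format unit
            if tag == RETURN:          # rightmost word at this level
                if not stack:
                    return
                pc = stack.pop()       # via multiplexer 607 back to counter
            else:
                pc += 1

# Hypothetical program loosely mirroring part of Fig. 5.
memory = {
    0: (SECONDARY, 2),    # A: jump to its subtree, remember address 1
    1: (RETURN, "end"),   # word executed after A's subtree completes
    2: (PRIMITIVE, "M"),
    3: (SECONDARY, 5),    # N: jump to U, remember address 4
    4: (RETURN, "O"),     # rightmost word at this level
    5: (PRIMITIVE, "U"),
    6: (RETURN, "V"),     # rightmost word of N's subtree: emit, then pop
}
```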
As described above, one (1) is added to the secondary address before it is stored in the stack, so that processing can return to the instruction word immediately after that secondary. It would alternatively be possible to store the address of the secondary itself on the stack, and add one when the address is output to the program memory.
The prefetch unit 602 thus steps through the program and extracts the primitives of the program and feeds them in the sequence in which they are to be performed to the instruction format unit. The prefetch unit operates asynchronously with the rest of the execution processor 302 so that the primitives may be "queued" for use at an appropriate time.
Consider now the instruction format unit 601. This receives the primitives from the program memory, and each primitive consists of a tag (of e.g. 8 bits) and a parameter field (of e.g. 16 bits). The 8 bit tag is sufficient to define 256 instructions which can use the parameter fields in any way required. However, 256 instructions are not sufficient for many programs, and therefore it is necessary to derive other instructions. This is achieved by feeding the 8 bit tag both to a look-up memory 608, via a latch 609, and also to the control unit 606, which is common to both the prefetch unit 602 and the instruction format unit 601. The look-up memory 608 acts in a similar way to the RAM of the static translation unit 404, so that all but one of the 256 tags are converted to a 12 bit address by the look-up memory 608 and fed to a multiplexer 610. At the same time, the tag fed to the control unit 606 causes the control unit 606 to enable the multiplexer 610 to pass the 12 bit address from the look-up memory 608 directly to an instruction buffer 611. However, the one other tag causes the control unit 606 to prevent any address being fed from the look-up memory 608; instead the 12 bit address is obtained from the 12 least significant bits of the 16 bit parameter field being fed on line 612 from the program memory 600 to the instruction buffer 611 via the latch 609. These 12 least significant bits then become the address signal fed to the buffer 611.
In this way a 24 bit instruction word in the program memory 600 may be used to specify a 12 bit address and a 16 bit value, and to permit maximum use to be made of the 12 bits of the address, so that 4096 addresses may be obtained.
Thus the output buffer 611 stores a 12 bit address and a 16 bit parameter field for each primitive extracted from the program, and the primitives are queued in the buffer 611 in the order in which they are to be performed.
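The splitting of a 24-bit instruction word into a 12-bit address and a 16-bit parameter field can be sketched as follows. This is an illustrative model only: the value of the one "escape" tag and the contents of the look-up table are assumptions, not specified in the text.

```python
# Sketch of the instruction format unit 601: a 24-bit word is split into
# an 8-bit tag and a 16-bit parameter field; the tag normally indexes a
# look-up memory, but one reserved tag takes the address from the
# parameter field's 12 least significant bits instead.

ESCAPE_TAG = 0xFF                                      # assumed escape value
lookup = {t: (t * 16) & 0xFFF for t in range(0x100)}   # hypothetical memory 608

def format_instruction(word24: int):
    """Return (12-bit address, 16-bit parameter field) for one primitive."""
    tag   = (word24 >> 16) & 0xFF
    param = word24 & 0xFFFF
    if tag == ESCAPE_TAG:
        address = param & 0xFFF        # 12 LSBs of the parameter field
    else:
        address = lookup[tag]          # look-up memory 608 via multiplexer 610
    return address, param
```

The escape tag is what lifts the instruction count from 256 to the full 4096 addresses the 12 bits allow.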
Hardware Structure Each of the various components of the execution processor 302 corresponds to a combination of hardware mounted on a series of circuit boards. It is convenient, for simplicity, to think of each component as being mounted on a separate board, but in practice this need not be the case, and it has been found that the various components can be fabricated on only three circuit boards.
However, for the sake of simplicity assume for the moment that each part of the execution processor is on a separate board. It is then necessary to distribute the control signals from the microprogram memory of the execution unit 302 to the various other boards. The prior art method of doing this would be to buffer the control signals, then transmit them via a bus which interconnects the various boards and is known as a "backplane".
Thus referring to Figure 7 a series of boards 701, 702, 703 contain circuit components, generally indicated at 704, 705, 706 respectively, connected to a data bus 707. One of the boards 701 contains the microprogram memory 708 in which the microprogram to be run on the boards is stored. A sequencer 709 controls the output of microprogram instructions from the microprogram memory 708.
Each instruction of the microprogram memory consists of a string of bits divided into a plurality of fields, with each field being used to control a component of the processor. Thus one field may be used as control signals for the circuitry 704 of the board 701 containing the microprogram memory 708, whilst the other fields are fed onto the backplane 710. One of the fields in the backplane is fed as control signals to the circuitry 705 of the board 702, a further field to the circuitry 706 of the board 703, leaving e.g. one more field for one further board. It can be seen immediately that this limits the number of boards that may be interconnected in this way, since the signals transmitted by the backplane 710 are limited by the bit length of the instructions in the microprogram memory 708. Therefore if the system is to permit an increase in the number of boards, "spare" capacity must be included in the word length (to add extra fields), and the size of the microprogram memory 708 must be sufficiently large to permit this. It is clearly undesirable to include "spare" capacity initially or to have the number of boards limited, and hence an aspect of the present invention seeks to overcome this problem.
Referring to Figure 8, three boards 801, 802, 803 each have circuitry 804, 805, 806 connected to a data bus 807. One of the boards 801 has a sequencer 809 which generates address signals which are to be fed to the microprogram memory. However, unlike the prior art system, each board 801, 802, 803 has its own microprogram memory 811, 812, 813, which may each contain the full microprogram required, or the microprogram required for that board only. The addresses from the sequencer 809 are fed to the microprogram memory 811 of that board 801 for controlling the circuitry 804, but also to a microprogram address bus 814. This microprogram address bus 814 is then connected to each microprogram memory 812, 813 of the boards 802, 803 in parallel. Thus, when an address is generated by the sequencer 809, it is fed in parallel to the microprogram memories 811, 812, 813 of each board 801, 802, 803, thus extracting the corresponding instructions from each microprogram memory 811, 812, 813 so that the instructions may then be acted upon by the circuitry 804, 805, 806 of one or more of the boards 801, 802, 803.
It can be seen immediately that it is simple to increase the number of boards, merely by connecting the microprogram memory of that board to the microprogram address bus 814. It is therefore unnecessary to include spare capacity in the existing memory to permit increase in the number of boards, and the size of the microprogram address bus does not increase with the number of boards, because it merely carries the address signals.
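The broadcast arrangement of Fig. 8 can be modelled briefly. The board names and instruction mnemonics below are invented for illustration; the point is only that one address reaches every per-board memory in parallel, and adding a board touches nothing else.

```python
# Model of the Fig. 8 scheme: the sequencer 809 broadcasts one address on
# the microprogram address bus 814, and each board's private microprogram
# memory emits its own control field for that cycle.

boards = {
    "board_801": ["alu_add", "alu_nop"],     # microprogram memory 811
    "board_802": ["mem_read", "mem_write"],  # microprogram memory 812
    "board_803": ["bus_idle", "bus_drive"],  # microprogram memory 813
}

def step(address: int) -> dict:
    """One sequencer cycle: the same address reaches every memory in parallel."""
    return {name: mem[address] for name, mem in boards.items()}

# Adding a board requires no change to existing memories or word widths:
boards["board_804"] = ["io_poll", "io_ack"]
```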
Variable Clock Each execution unit 103 has a clock which controls the parallel operation of the various components of the execution processor. It is clear that some of the operations the execution processor 302 will execute will be shorter than others. However, with a fixed frequency clock, the processing speed is entirely determined by the clock rate, and no time advantage can be gained from operations which are shorter than the clock period. It is known, in prior art processors, to employ a variable frequency clock, the cycle length of which can be changed to suit the operations being performed during any particular cycle. However, the variable frequency clocks which are known cannot react automatically to the periods of the various operations within the processor, and thus it is necessary for the programmer to calculate the duration of every operation manually. For processors of any complexity this is virtually impossible. Therefore the present invention seeks to provide automatic variation of the clock frequency in dependence upon the operations being performed.
To explain the way this is achieved, it is necessary to consider the operation of an assembler. An assembler converts an input code into a microprogram word, with the microprogram word being a string of fields, each field being a pre-programmed instruction for some part of the processor. In the prior art, the input of a code word generates a plurality of fields, one of which is a field representing the length of the longest operation within that microprogram word. Since all the fields are pre-programmed, this means that the length of the longest operation must be pre-calculated.
In the present invention, each field contains information relating to the length of operation of that field. When assembling a microprogram word, the assembler compares the information on the length of operation of each field, and from that information generates a "clock" field representing the maximum operation length; this forms part of the microprogram word. That clock field is then fed to the clock, which regulates the cycle time in dependence on the clock field set. Thus the system differs from the prior art in that no pre-calculated clock field representing the total time of the microword is stored: each field stored contains information relating to the length of the operation represented, and the assembler calculates the clock field from the information in the other fields. The calculation may simply involve finding the longest time if the operations are all in parallel, but some operations may be serial, or may themselves represent subprograms in which operations occur in parallel, in series, or both. In such circumstances it is desirable that the assembler is able to analyse information about the subprograms so that the duration of the microword may be calculated accurately.
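The assembler-side calculation can be sketched as follows. The durations and the "serial"/"parallel" grouping notation are invented for illustration; the text specifies only that each field carries its own duration and that the clock field is derived from them rather than pre-programmed.

```python
# Sketch of the assembler's timing calculation: each field of a microword
# carries a duration, possibly composed of sub-operations run in series
# or in parallel, and the clock field is the longest field's duration.

def duration(op):
    """Duration of an operation: a number, or a nested (mode, [ops]) group."""
    if isinstance(op, (int, float)):
        return op
    mode, parts = op
    times = [duration(p) for p in parts]
    return sum(times) if mode == "serial" else max(times)

def clock_field(fields):
    """The microword's clock field: the longest of its (parallel) fields."""
    return max(duration(f) for f in fields)
```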
In this way the maximum duration of the operations represented by the microword during any one cycle of the processing of the execution processor 302 can be determined, and the clock control set automatically.
Execution Processor Structure Fig. 9 shows the general structure of the execution processor 302. It consists of three data buses 901, 902, 903 connected in parallel to the various components of the processor. Some of the components have already been described, and it is assumed that each component is fabricated on a separate board, so that each component has a separate microprogram memory and the sequencer 809 is connected to the microprogram memory of each component via the microprogram address bus 814 (the two parts of the microprogram address bus 814 shown in Fig. 9 being interconnected). Thus each component corresponds to a board 801, 802, 803 of the hardware of Fig. 8. However, as discussed in connection with Fig. 8, it is usually possible to combine several components on one board, so that the structure of Fig. 9 may be achieved in three boards. The address generator module (AGM) 401 has already been described in connection with Fig.
4, with two of the buses 901, 902, 903 corresponding to the buses 400, 409 of that figure. Since the AGM 401 is connected to all three buses 901, 902, 903, the bus 400 may correspond to any one of them, and the corresponding bus may be changed during the operation of the processor.
The component marked PREFETCH has already been described because this corresponds to the instruction format unit 601 and the prefetch unit 602.
Thus the component marked PREFETCH contains the program memory, and the main data memory 305 is also shown. In addition, the data buses will be connected to the bank switch memory 306, but this is not shown.
The other components shown in Fig. 9 are more conventional. The constant source 904 provides constants for the rest of the processor, which constants may be derived directly from the microprogram or via the output of the prefetch unit.
The memory I/F (MEM I/F) 905 acts as a short-term memory and buffer to allow byte swapping and byte storage during transactions with the memory 305.
The cache 906 acts as a rapid access memory which can be addressed in a number of ways; the data stack (DS) 907 stores a stack of data values, the top two of which can be accessed simultaneously, or can be written into simultaneously using two of the data buses 901, 902, 903. Finally, the components 408 and 409 marked MAC and ALU are the multiplier/accumulator and the arithmetic and logic unit respectively. It is not necessary to discuss these components in detail, as their function will be understood by those skilled in the art.
Image Recognition An example of the application of a computer system as described above to an image recognition device will now be described with reference to Figure 10. The image recognition device comprises a series of stages, with data being passed in a pipeline operation from one stage to the next. It is assumed that the device has six execution units 103 connected to the network bus 100 of Figure 1.
A video input of a scene containing the image to be recognised is fed to a patch extractor 1001, forming the first stage of the image recognition device, which extracts a 64x64 pixel patch. The data in this patch is fed via the I/O unit 102 to three execution units 1002, 1003 and 1004, which each receive the data of the 64x64 patch and form the second stage. A first one of these execution units 1002 analyses the patch, looking for regions with a single axis of symmetry (e.g. edges and lines), whilst a second execution unit 1003 looks for regions with no symmetry or with complex symmetry. The program which detects these symmetry features is pre-programmed within the appropriate execution units. The third execution unit 1004 carries out texture extraction analysis on the patch, again the program for this being stored within the third execution unit 1004. The operations of edge/line extraction, no-symmetry extraction, and texture extraction are all known in image processing, but the structure of the execution units of the computer system of the present invention permits these operations to be performed more rapidly than by known processors.
Since the operations performed by the first and second execution units 1002, 1003 are simpler than the texture extraction operation of the third unit 1004, they are likely to finish first. This gives time for them to compare the results they have found with a previous analysis to look for movement, before all three execution units 1002, 1003 and 1004 output their processed data to the next stage of the image recognition device. In that next stage, the outputs of the first and second execution units 1002, 1003 are fed to a fourth execution unit 1005, which analyses edges, lines and other features of the patch and forms line segments, arcs, and nearness relations. The output of the third unit 1004 is fed to a fifth execution unit 1006, which carries out "region growing" using the texture information from the unit 1004.
Once the execution units 1005 and 1006 have completed their analysis, their outputs are fed to the fourth stage of the image recognition device, which comprises a sixth execution unit 1007. This compares the results obtained with various shape models, using the nearness information to dictate a problem solving strategy. This enables a description of the object to be built in absolute co-ordinates, so that the unit 1007 may then determine which is the next part of the image to be scanned. It then transmits a signal via line 1008 to the patch extractor 1001 to extract another patch for further processing.
At the same time it outputs via a line 1009 the results of its analysis, and the sequence of outputs from the execution unit 1007 on the line 1009 will build up an image of the object being viewed by the video system.
Thus the device has both parallel and pipeline features. The processing by the three execution units 1002, 1003 and 1004 is carried out in parallel, as is the processing by the execution units 1005 and 1006. However, the movement of information between each stage represents a pipeline operation, with the transfer of information between the various execution units being achieved in the way described in connection with Figure 1.
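The dataflow of Figure 10 can be modelled schematically. The stage functions below are placeholders supplied by the caller (no actual image processing is implied); only the wiring between stages follows the description: three parallel extractors, two grouping units, then a single matching unit.

```python
# Schematic model of the Fig. 10 pipeline using a thread pool to run the
# parallel stages concurrently.  Stage names are hypothetical labels for
# the execution units 1002-1007.
from concurrent.futures import ThreadPoolExecutor

def recognise(patch, stages):
    """Run one patch through the parallel/pipelined stages of Fig. 10."""
    with ThreadPoolExecutor() as pool:
        # Second stage: units 1002-1004 analyse the same patch in parallel.
        edges, nosym, texture = pool.map(lambda f: f(patch),
                                         [stages["edges"], stages["nosym"],
                                          stages["texture"]])
        # Third stage: unit 1005 groups features while unit 1006 grows regions.
        segments = pool.submit(stages["group"], edges, nosym)
        regions  = pool.submit(stages["grow"], texture)
        # Fourth stage: unit 1007 matches shape models against both results.
        return stages["match"](segments.result(), regions.result())
```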
This application has been divided out of UK Patent Application No.85.15318 (published under No.
2,160,406) and describes matter which is also described in that application. Attention is therefore drawn to application No.85.15318.

Claims (1)

1. A method for controlling the clock rate of a computer system comprising: supplying a plurality of fields to an assembler for assembly into a microword, each of the fields containing information relating to the duration of that field; determining, in the assembler, the deviation of the longest field and generating a clock field corresponding to that duration; and controlling the clock rate on the basis of the clock field so calculated.
2. A method for controlling the clock rate of a computer system, the method being substantially as herein described.
Amendments to the claims have been filed, and have the following effect.
Claim 1 above has been deleted or textually amended.
New or textually amended claims have been filed as follows:
1. A method for controlling the clock rate of a computer system comprising: supplying a plurality of fields to an assembler for assembly into a microword, each of the fields containing information relating to the duration of that field; determining, in the assembler, the duration of the longest field and generating a clock field corresponding to that duration; and controlling the clock rate on the basis of the clock field so calculated.