EP0722583A1 - Processor for variable-length character strings

Info

Publication number: EP0722583A1
Application number: EP94928335A
Authority: EP
Inventors: Wilhelm Ernst Haller; Klaus Jörg GETZLAFF; Herbert Chilinski; Ralph KÖSTER
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1993-10-08
Filing date: 1994-09-12
Publication date: 1996-07-24
Also published as: US5761521A; DE4334294C1; WO1995010803A1; JPH09503327A; JP3183669B2

Abstract

A processor for variable-length character strings A, B is used to determine matches, non-matches and larger/smaller differences rapidly. The character strings, the length of which is limited by character string end flags, are broken down into successive part-strings with a number of bytes corresponding to the data flow width and processed to establish a match, a non-match and a string end flag. Each part-string is fed in parallel via operand registers (16, 18) to an arithmetic unit (20), a logic unit (22) and a comparator unit (24) and processed simultaneously. In the arithmetic unit (20) one part-string is subtracted from the other, in the logic unit (22) the two part-strings are compared with each other, and in the comparator unit (24) the bytes of both part-strings are compared with the content of a flag register (26) previously set to the string end flag. These operations are conducted in one machine cycle. Output signals from the comparator unit are used as indicators of the match between both part-strings, output signals from the logic unit as indicators of the non-match between them, and a carry signal from the arithmetic unit as an indication of which of the two part-strings is larger or smaller.

Description

PROCESSOR FOR VARIABLE-LENGTH CHARACTER STRINGS

The invention relates to a processor for character strings of variable length, with a system of memory units for storing character strings which can be addressed in pairs by program instructions and from which partial strings corresponding to the data flow width are transferred to two operand registers, with an arithmetic/logic unit for executing processing operations, with a condition code circuit which stores signals derived from the processing results that are used to control program branches, and with a control unit which successively addresses the partial chains of the character string pairs in the storage unit and, in successive machine cycles, controls the operation of the units and the transfers between the units.

Computer applications for database queries, applications in the field of word processing and the support of higher programming languages require the processing of character strings to a great extent. Generally speaking, a character string is a data element consisting of a byte sequence with a variable length. The length of a character string can range from one byte to a number of bytes, which is only limited by the size of a storage unit. It can be determined by a length code or by a special character that is contained in the string and indicates the end of the chain. It is normal for string commands that the strings to be processed have different lengths. Common types of processing are the comparison of two character strings and the determination of the first matching or different byte pair, the determination of a chain end character in a character string A and in a character string B, the search for a partial chain within a character string or the shifting of a character string to another storage position.

Complex character string instructions have three operands: the address of a first character string A, the address of a second character string B, where A and B have variable lengths, and, as the third operand, the address of an end-of-chain byte which marks the end of the character strings A and B. Executing such a command requires a number of operations. The character string representing the first operand is compared byte by byte from left to right with the character string representing the second operand until a non-matching byte pair or an end-of-chain byte is found. Both strings are equal if the end-of-chain byte is found in the same byte position in both strings. If the end-of-chain byte is found in only one of the two strings, this string is the shorter one and is considered the smaller one. If, on the other hand, a mismatched byte pair is found instead of an end-of-chain byte, these two bytes must be compared with one another to determine which of the two operands is the smaller one. The command execution therefore comprises the following phases: search for the end-of-chain byte in character string A, search for the end-of-chain byte in character string B, comparison of both character strings for a mismatch, and subtraction of one character string from the other to determine which character string is the smaller. Executing these various operations requires considerable microprogram control and processing time.
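
The sequential procedure described above can be modelled in a few lines of Python. This is only an illustrative sketch: the function name `compare_strings` and the string return values are assumptions made for the example, and the condition code values (00 equal, 01 A smaller, 10 B smaller) are borrowed from the embodiment described further below.

```python
def compare_strings(str_a: bytes, str_b: bytes, end_byte: int) -> str:
    """Byte-by-byte reference model of the string compare command.

    Returns an assumed condition code: "00" equal, "01" A smaller, "10" B smaller.
    """
    for byte_a, byte_b in zip(str_a, str_b):
        a_ends = byte_a == end_byte
        b_ends = byte_b == end_byte
        if a_ends and b_ends:
            return "00"  # end-of-chain byte in the same position: strings equal
        if a_ends:
            return "01"  # A ends first: A is shorter and therefore smaller
        if b_ends:
            return "10"  # B ends first: B is shorter and therefore smaller
        if byte_a != byte_b:
            # first mismatched pair decides which operand is smaller
            return "01" if byte_a < byte_b else "10"
    raise ValueError("no end-of-chain byte or mismatch found")
```

Each loop iteration corresponds to several of the phases listed above, which is why a purely sequential execution is slow compared with the parallel arrangement proposed by the invention.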

It is known to accelerate the execution of string commands by means of circuits that allow a number of bytes to be compared in parallel (U.S. Patent No. 4,896,133). With such an arrangement, it is possible to find a specific control character in a character string by means of a comparison operation in which copies of this control character are stored in all byte positions of an operand register and are compared simultaneously with eight bytes of the chain. If a match is found, a microprogram-controlled branch to the next program instruction takes place. Otherwise, the comparison is repeated with another character string. The comparison operations can be performed by a bank of EXCLUSIVE-NOR circuits or by the arithmetic and logic unit of the processor in which this arrangement is used. However, the arrangement is only suitable for the parallel execution of part of the operating phases explained above.

It is also known in the art to provide, in a conventional data processing system, special string instructions that are executed by microprogram using the existing central processor facilities (US Pat. No. 4,556,951). With these commands, the length of the character strings to be processed is specified by a length code contained in the commands, which represents the number of bytes over which the chain extends. The character comparisons are performed by operations of the arithmetic and logic unit of the processor. The condition codes generated in such a system as part of the results are used to indicate the match or mismatch of character strings, partial strings or individual characters and to control the branches to subsequent program parts. This arrangement also requires a considerable amount of time for the numerous microprogram steps necessary to carry out the operating phases explained at the outset.

The invention is based on the object of specifying an improved processor for character strings which avoids these disadvantages and which makes greater use of the principle of parallel processing. The features of the invention for solving this problem are characterized in claim 1. Claims 2 to 7 indicate advantageous refinements and developments of the invention.

A preferred embodiment of the invention is described below with reference to the drawings, in which:

FIG. 1 is a block diagram of a string processor in accordance with the invention.

Figure 2 is a table of match and mismatch conditions in the processing of character strings, to explain the mode of operation of the arrangement of FIG. 1,

Figure 3 is a block diagram of the result evaluation logic for use in the arrangement of FIG. 1, and

Figure 4 is a flow diagram of a microprogram as used in the control unit of the processor of Figure 1.

The processor of Figure 1 contains a local memory 10, an arithmetic unit 20 and a control unit 40. These units are constructed in a conventional manner and are therefore not shown in detail here. The memory 10 is a fast memory of limited capacity which is arranged on the processor chip and which is connected via a multiplexer 13 to a memory unit 12 which, in a conventional manner, consists of separate semiconductor chips. The storage unit 12 contains a large number of character strings which, for example, together form a database. Each of these strings consists of a number of bytes, each of which represents a character. However, a different assignment between characters and bytes can also be selected, for example the representation of two characters by one byte. The number of characters belonging to a character string is variable and can be freely selected within wide limits. The only limitation is the capacity of the memory. The length of a character string is determined by an end-of-string character, which is represented by the last byte of the chain and indicates the end of the chain during processing. The processing is carried out by means of character string commands contained in the respective application program, which usually address two character strings of different lengths that are to be related to one another and control their processing. Typical forms of processing are the test for equality or inequality and the determination of which character string is the larger or the smaller, or which comes before the other in a given ordering scheme such as alphabetical order. The individual bytes of both strings must be checked in pairs to determine which byte position, in order of priority from left to right, is the first byte position with different bytes.
The string commands have three operands: the address of a first string A, the address of a second string B and, as the third operand, the address of an end-of-chain byte, chosen by the application programmer, which marks the end of the two strings, which are usually of different lengths. An application program containing the string commands is stored in the memory 12. The microprogram that executes the character string command is located in a control memory (not shown), which is part of the control unit 40. The local memory 10 is successively loaded with parts of the character strings from the memory 12.

The output of the memory 10 is connected via bus lines 14, 15 to operand registers 16, 18, each of which is designed to hold a partial chain of four bytes. The registers 16, 18 are loaded simultaneously in one machine cycle under the control of the control unit 40, starting in each case with the first partial chains of the two character strings A and B to be processed, which are addressed by the operand addresses in the respective character string instruction. In the following description, these partial chains are also called A and B, partial chain A being stored in register 16 and partial chain B in register 18. The arithmetic unit 20, a logic unit 22 and a comparison unit 24 are connected in parallel with one another to the outputs of these operand registers 16, 18 via bus lines 17 and 19. These units each receive the partial chains A and B stored in the registers simultaneously via the bus lines 17, 19.

The comparison unit 24 has a third input, which is connected to the output of a further register 26. In a preparatory operation, the end-of-chain identifier specified by the character string command as the third operand is loaded into this register from the memory 10 via the register 16 and the bus 17. This takes place before the operand partial chains are fed to the units 20, 22, 24. The comparison unit 24 carries out a parallel multiple comparison. It compares the identifier located in register 26 with all bytes of sub-chain A and with all bytes of sub-chain B. These comparison operations are carried out with the help of EXCLUSIVE-OR circuits, not shown. The comparison unit 24 has two outputs 28, 30, each with four lines. A signal EA(0), EA(1), EA(2) or EA(3) appears on the output lines 28 if the corresponding one of the four bytes of the partial chain A matches the end-of-chain identifier in the register 26. Each of these signals is assigned to a byte position in sub-chain A and indicates that the byte supplied to unit 24 in this position matches the end-of-chain identifier. Likewise, a signal EB(0), EB(1), EB(2) or EB(3) appears on the output lines 30 if the corresponding one of the four bytes of the sub-chain B matches the end-of-chain identifier in the register 26. Here too, each signal indicates the match for the assigned byte position in sub-chain B.
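
The multiple comparison performed by the comparison unit can be pictured with a short Python model. It is a behavioural sketch only, not a description of the circuit; the function name `end_of_chain_signals` is an assumption made for illustration, and the per-byte XOR merely mirrors the EXCLUSIVE-OR circuits mentioned in the text.

```python
def end_of_chain_signals(part: bytes, end_id: int) -> list:
    """Return one flag per byte position: 1 if that byte equals the identifier.

    A byte matches when its XOR with the identifier is zero in every bit.
    """
    return [int((byte ^ end_id) == 0) for byte in part]


# Example: four-byte partial chains A and B, identifier 0x00 (register 26)
EA = end_of_chain_signals(bytes([0x41, 0x42, 0x00, 0x43]), 0x00)  # lines 28
EB = end_of_chain_signals(bytes([0x41, 0x42, 0x43, 0x44]), 0x00)  # lines 30
```

In hardware all eight byte comparisons happen at once; the list comprehension only reproduces the resulting signal pattern.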

The logic unit 22 can optionally carry out different logical combinations of the operands, such as AND, OR and EXCLUSIVE-OR. Here only the comparison operation for determining a mismatch in the supplied operand bytes is of interest. The contents of the four byte positions of operand A are compared with the contents of the four corresponding byte positions of operand B. This comparison also takes place in parallel. Since a mismatch is to be determined, the EXCLUSIVE-OR operation is suitable for carrying out the comparison; it delivers an output signal for each pair of operand bits if the two bits are unequal. In relation to an operand byte, this means that the output signal of one bit position is sufficient to indicate a mismatch for the respective operand byte pair. The logic unit 22 supplies four signals MC(0), MC(1), MC(2) and MC(3) on an output 32, each of which is assigned to a byte position of the two operands A and B. If one or more of these signals occur, this indicates that the operand bytes of the assigned position are not the same.
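
A minimal Python sketch of this mismatch detection follows. The function name `mismatch_signals` is hypothetical; the only behaviour taken from the text is that a byte pair is flagged when the EXCLUSIVE-OR of the two bytes has at least one bit set.

```python
def mismatch_signals(part_a: bytes, part_b: bytes) -> list:
    """Return one flag per byte position: 1 if the two bytes differ.

    The XOR of two bytes is non-zero exactly when at least one bit differs.
    """
    return [int((byte_a ^ byte_b) != 0) for byte_a, byte_b in zip(part_a, part_b)]


# Example: only byte position 2 differs (0x06 versus 0x07)
MC = mismatch_signals(bytes([0x00, 0xAA, 0x06, 0x11]),
                      bytes([0x00, 0xAA, 0x07, 0x11]))
```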

The arithmetic unit 20 performs the subtraction B - A. This is done by adding the two's complement of the partial chain A to the partial chain B. For this purpose, the partial chain A is fed from the register 16 to the operand A input of the arithmetic unit 20 via a complementing circuit 36. A complementing circuit 38 connected to the operand B input remains inactive. The arithmetic unit 20 has a carry output line 34 in the highest byte position. This carry is the output signal of the unit 20 that is of interest in the present context. A carry signal "1" on line 34 indicates that operand A is smaller than operand B, and the absence of such a signal indicates that operand A is larger than operand B. For the operation of arithmetic unit 20, the sub-chains A and B are treated as arithmetic operands. The following example illustrates the operation of arithmetic unit 20 in generating these signals.

Byte position      0          1          2          3

Chain B 'hex'      00         AA         07         XX
Chain A 'hex'      00         AA         06         XX

Chain B 'binary'   0000 0000  1010 1010  0000 0111  xxxx xxxx
Chain A 'binary'   0000 0000  1010 1010  0000 0110  xxxx xxxx
                   Mismatch (MC) in byte 2

Chain B 'binary'   0000 0000  1010 1010  0000 0111  xxxx xxxx
Chain A 'binary'   1111 1111  0101 0101  1111 1001  xxxx xxxx
Carries            1 (carry (0), line 34)  1 (carry into byte 0)  1 (carry into byte 1)  1 (two's complement carry-in)

The top line of the example specifies the byte positions of the character strings (partial strings) A and B, which are shown in hexadecimal form in the next two lines and in binary form in the two lines below. The "x" in byte position 3 means that the bytes in this position have no influence on the result and can therefore contain any characters. It can be seen that there is a mismatch (MC) in byte position 2 of both chains and that the value of chain A in this position is smaller than the corresponding value in chain B. The binary representation of chain B is repeated in the third-to-last line, while the penultimate line represents chain A in the complemented form in which it is supplied to operand input A of the arithmetic unit to carry out an addition. The last line shows the carries that occur during the addition. First, a carry is fed into the lowest byte position, i.e. byte 3, as part of the two's complement formation. This is done in a known manner by a signal from the control unit 40 on line 48, which also controls the arithmetic unit 20 to carry out a subtraction. A carry is generated in byte position 2, which propagates into byte position 1 and from there into position 0, which in turn generates a carry on line 34. This carry serves as an indication that chain A is smaller than chain B. At the same time, the logic unit 22 has compared chains A and B, which have been supplied to it in true, i.e. non-complemented, binary form according to lines 4 and 5 of the example above. As a result of this comparison, the logic unit 22 provides on its output line 32 an MC(2) signal which indicates that the bytes in position 2 are not equal. This indication and the carry on line 34 are independent of the bytes in byte position 3. If this position does not generate a carry in the example above, the result is as shown. This does not change if it is assumed that a carry occurs in position 3.
In this case, the byte of chain B in position 2 is effectively increased by one. However, this does not change the fact that a carry is still generated in this position, which leads to a carry signal on line 34. Likewise, if the bytes in position 3 are unequal and cause the logic unit 22 to produce an output signal MC(3) in addition to the signal MC(2) already explained, this has no effect, since only the mismatch signal from the byte position closest to the beginning of the chain matters. It can therefore be seen that the bytes to the right of the mismatched position have no effect on the result of the operation of units 20 and 22. It must be added that the regular results that occur at the outputs of units 20 and 22, i.e. the algebraic difference B - A and the EXCLUSIVE-OR combination of chains A and B, are not important for the arrangement according to the invention. In the regular operations of the two units 20 and 22, these results are transmitted to a local memory via a multiplexer 39, a bus 42 and the multiplexer 13 and stored there.
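
The worked example can be checked numerically with the following Python sketch. It models only the carry output of the subtraction B - A as an addition of B, the bitwise complement of A and a carry-in of 1; the function name `carry_out` is an assumption made for illustration and the 32-bit operand width follows the four-byte registers of the embodiment.

```python
def carry_out(part_a: bytes, part_b: bytes) -> int:
    """Carry from the highest byte position when computing B + ~A + 1."""
    width = 8 * len(part_a)                  # 32 bits for four-byte partial chains
    mask = (1 << width) - 1
    a_val = int.from_bytes(part_a, "big")    # byte 0 is the most significant byte
    b_val = int.from_bytes(part_b, "big")
    total = b_val + (~a_val & mask) + 1      # complement of A plus carry-in of 1
    return total >> width                    # 1 if a carry leaves byte 0, else 0


# Chains from the example; byte 3 ("xx") is varied to show it has no influence
for x in (0x00, 0x5A, 0xFF):
    for y in (0x00, 0x5A, 0xFF):
        a = bytes([0x00, 0xAA, 0x06, x])
        b = bytes([0x00, 0xAA, 0x07, y])
        assert carry_out(a, b) == 1   # A smaller than B: carry present
        assert carry_out(b, a) == 0   # operands swapped: no carry
```

Because the first differing byte is also the most significant differing digit of the two operands, the carry reflects the ordering of the strings regardless of what follows to the right.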

The operations of the arrangement according to FIG. 1 are controlled by the control unit 40. This unit generates control signals on lines 46 to 51 which lead to the individual units. These signals are generated at predefined cycle times. The memory 10 is accessed via a bus 46 in order to load the registers 16 and 18 with four bytes each of the character strings A and B. A control signal on line 47 causes these bytes to be transmitted to units 20, 22 and 24 and activates the complementing circuit 36. At the same time, a subtraction control signal SUB B-A occurs on line 48, a control signal VGL (A, B) on line 49 activates the logic unit to carry out an EXCLUSIVE-OR operation, and a control signal VGL EZ (A, B) on line 50 transfers the end-of-chain identifier EZ from the register 26 to the comparison unit and activates the latter to carry out the multiple comparison explained above. The operations triggered by the control signals on lines 47 to 50 take up one machine cycle. At the end of this cycle, the result of the processing of the sub-chains A and B is available on lines 28, 30, 32 and 34 in the form of the indication signals EA(0..3), EB(0..3), MC(0..3) and the carry signal CARRY(0). These signals arrive at the result evaluation logic 60, which is explained with reference to FIGS. 2 and 3.

The possible combinations of the signals MC, EA and EB are shown schematically in the left part of the table in FIG. 2, and the selection of the effective signal combinations is shown in the right part. The result evaluation logic 60 (FIG. 3) has a circuit 62 for shortening the effective partial chains, which takes account of a missing or incorrect alignment of the partial chains, as can occur when a physical memory boundary is crossed when accessing the character strings in the memory 12. The logic 60 also has a circuit 64 for determining priority and a selection circuit 66 controlled by the carry signal on line 34.

The circuit 62 for shortening the effective partial chains consists of AND circuits 72, 73, 74, which are selectively conditioned via a bus 70. Each of the four lines of the buses 28, 30, 32 from the outputs of the units 24 and 22 is connected to one of the AND circuits 72, 73, 74, which pass signals corresponding to the signals on these lines to a bus 76 when a conditioning signal appears on the bus 70 for all four bytes processed in these units. If, on the other hand, a memory access has loaded only an incomplete partial chain A, B into the registers 16, 18 because the addressed memory area crosses a memory boundary, the control unit 40 suppresses the conditioning signal on those wires of the bus 70 that correspond to the byte positions in which no significant byte was processed. These can be, for example, bytes 2 and 3, to which the wires 71 are assigned, so that their AND circuits 72, 73, 74 do not emit a signal to the bus 76. On the bus 76, the connections on the input side are designated A0 to A3, B0 to B3 and M0 to M3, connections A0 to A3 being assigned to lines 28, connections B0 to B3 to lines 30 and connections M0 to M3 to lines 32. The priority logic 64 determines in which byte position an end-of-chain character in sub-chain A or B is indicated and in which byte position a mismatch between these sub-chains is indicated. This is done by AND circuits 82 to 85, upstream of which inverters 78 to 80 are connected. For clarity of illustration, some of the AND circuits and inverters have been omitted. The AND circuit 82 receives an input signal M0 from the bus 76 which indicates a mismatch in byte position 0. This signal is transmitted to a bus 88 only if there are no signals A0, B0, i.e. if no end-of-chain character is indicated in the same byte position. In this case, the AND circuit 82 is conditioned by output signals from the upstream inverters 78. Likewise, the AND circuit 83 transmits a mismatch signal M1 to the bus 88 if no end-of-chain character is indicated in either byte position 0 or byte position 1 and no mismatch is indicated in byte position 0. In the same way, the mismatch signals M2 and M3 are transmitted to the bus 88 through AND circuits, not shown. The signals A0 and B0 lead from the bus 76 directly to the bus 88. The AND circuit 84 transmits the signal A1 to the bus 88 if neither an end-of-chain character nor a mismatch is indicated for byte position 0. The signal B1 is transmitted correspondingly via the AND circuit 85.
Likewise, the remaining end-of-chain indication signals A2, A3 and B2, B3 are transmitted to bus 88 via AND circuits (not shown) if none of the lower byte positions indicates an end-of-chain character or a mismatch. Signals A0 to A3, B0 to B3 and M0 to M3 from the bus 88 are combined by OR circuits 89 to form signals EA, EB and MC, which are fed via a further bus 90 to the carry evaluation logic 66, which is also connected to the carry line 34 from the highest byte position of the arithmetic unit 20. The carry evaluation logic 66 has AND circuits 91, 93, 94 and an inverter 92 and indicates on lines 95 and 96 which of the two partial chains A, B is the smaller and the larger, respectively. To this end, the AND circuit 91 transmits the signal MC to line 95 when it has been conditioned by a carry signal on line 34. The output signal on line 95 indicates that the sub-chain A is smaller than the sub-chain B and is used to set the condition code CC = 01 in a latch circuit, not shown. Signal MC is further transmitted via AND circuit 93 to line 96 when no carry signal is present and the inverter 92 supplies a conditioning signal to the AND circuit 93. The output signal on line 96 indicates that the sub-chain B is smaller than the sub-chain A and is used to set the condition code CC = 10. The AND circuit 94 provides an output signal on line 97 if the signals EA and EB occur together on the bus 90, i.e. if an end-of-chain character has been found in both the sub-chain A and the sub-chain B. The signal on line 97 is used to set the condition code CC = 00 to indicate that both sub-chains are the same. The signals EA and EB are also fed from the bus 90 to the lines 98 and 99 for setting the condition codes CC = 01 and CC = 10. These signals indicate the chain end in the partial chain A or B, respectively.
It can be seen that these signals are used to set the same condition codes as the signals on lines 95 and 96, i.e. that the end of sub-chain A also indicates that it is smaller than sub-chain B and that the end of sub-chain B also indicates that it is smaller than sub-chain A. At the same time, the byte position for which a match was found during the last sub-chain processing is indicated on a bus 100 connected to the bus 88. This indication, which results directly from the input signals A0 to A3 and B0 to B3 of the bus 88, is temporarily stored in a register (not shown) for use in the execution of subsequent program instructions.
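
The combined effect of the shortening circuit 62, the priority logic 64 and the carry evaluation logic 66 can be summarised in a behavioural Python sketch. It is not a gate-level model. The function name `evaluate`, the `valid` mask standing in for the conditioning signals on bus 70, and the rule that CC = 00 takes precedence when EA and EB occur together are assumptions made for illustration.

```python
def evaluate(ea, eb, mc, carry, valid=(1, 1, 1, 1)):
    """Return (condition code, byte position) or (None, None) if nothing was found.

    ea, eb, mc: per-byte flags from units 24 and 22; carry: signal on line 34.
    The lowest valid byte position with any indication decides the result.
    """
    for pos in range(len(mc)):
        if not valid[pos]:
            continue                      # circuit 62: byte not significant
        if ea[pos] and eb[pos]:
            return "00", pos              # line 97: both chains end here, equal
        if ea[pos]:
            return "01", pos              # line 98: A ends first, A smaller
        if eb[pos]:
            return "10", pos              # line 99: B ends first, B smaller
        if mc[pos]:
            # lines 95/96: mismatch, carry decides which chain is smaller
            return ("01" if carry else "10"), pos
    return None, None
```

For instance, `evaluate([0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], 1)` yields `("01", 2)`: the mismatch in byte 3 is ignored because byte 2 has priority, and the carry marks chain A as the smaller one.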

FIG. 4 shows, in a simplified representation, the essential steps of the microprogram routine, which becomes active repeatedly in the control unit 40 when two strings A and B are to be processed. This microprogram routine is stored as part of the microprogram of the processor in a memory of the control unit, not shown. In step 101, starting with the start address specified by the string command to be executed, the first eight bytes of the string A are transferred from the storage unit 12 to the local memory 10. At the same time, the start address is increased by eight. In step 102, the same process takes place for the character string B. In step 103, the first four of the bytes of the character strings A and B located in the local memory 10 are transferred to the registers 16 and 18 respectively. This transfer takes place in one machine cycle. Then, in step 104, the partial chains of A and B in the registers 16, 18 are processed in parallel in the units 20, 22 and 24 in the manner described. This processing also takes place in just one machine cycle. Step 105 loads the second four bytes of character strings A and B from memory 10 as new partial chains of A and B into registers 16, 18. This is followed by another processing step 106, which corresponds to step 104. In a branching step 107, a check is made as to whether an output signal EA, EB or MC has been determined in steps 104 or 106.

This is done by scanning the signal state of lines 97, 98 and 99 (Fig. 3). If there is no output signal EA, EB or MC, the microprogram branches back to step 101, with which the next eight bytes in the memory unit 12 are accessed. If, on the other hand, such an output signal has been detected, this means that the processing of the character strings A and B has ended. Usually, this will not already be the case after the first run of the microprogram routine according to FIG. 4; instead, several such runs will be necessary in order to process longer character strings. Regardless of how many runs are necessary, a YES result in step 107 branches to step 108, which sets the condition codes CC in accordance with the signal state of lines 95 to 99. The following step 109 ends the microprogram and at the same time the execution of the string instruction. The next instruction of the respective application program can be a branch instruction which uses the previously set condition codes CC to branch to a program section in which the processing result of the executed string processing instruction, including the address of the byte position indicated on the output bus 100, is used further.
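
The overall flow of the microprogram can be sketched in Python as a loop over four-byte partial chains. This is a simplified model under stated assumptions: it checks for a result after every partial chain rather than after each pair of partial chains (steps 104 and 106), it pads short final slices with zero bytes instead of reading whatever follows in memory, and the names `process_part` and `compare` are hypothetical.

```python
WIDTH = 4  # bytes per partial chain, as in the embodiment


def process_part(part_a: bytes, part_b: bytes, end_id: int):
    """Model of one machine cycle: all three units see the same operands."""
    ea = [x == end_id for x in part_a]                   # comparison unit 24
    eb = [y == end_id for y in part_b]
    mc = [(x ^ y) != 0 for x, y in zip(part_a, part_b)]  # logic unit 22
    mask = (1 << (8 * WIDTH)) - 1
    total = (int.from_bytes(part_b, "big")
             + (~int.from_bytes(part_a, "big") & mask) + 1)
    carry = total >> (8 * WIDTH)                         # arithmetic unit 20
    for pos in range(WIDTH):                             # lowest position wins
        if ea[pos] and eb[pos]:
            return "00", pos
        if ea[pos]:
            return "01", pos
        if eb[pos]:
            return "10", pos
        if mc[pos]:
            return ("01" if carry else "10"), pos
    return None


def compare(str_a: bytes, str_b: bytes, end_id: int):
    """Return (condition code, byte offset of the deciding position)."""
    offset = 0
    while offset < max(len(str_a), len(str_b)):
        part_a = str_a[offset:offset + WIDTH].ljust(WIDTH, b"\x00")
        part_b = str_b[offset:offset + WIDTH].ljust(WIDTH, b"\x00")
        result = process_part(part_a, part_b, end_id)
        if result is not None:
            cc, pos = result
            return cc, offset + pos
        offset += WIDTH                                  # next partial chains
    raise ValueError("no end-of-chain byte or mismatch found")
```

With this model, `compare(b"HELLO\x00", b"HELP\x00", 0)` returns `("01", 3)`, i.e. string A is the smaller one and the decision falls in byte position 3 of the first partial chain.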

Claims

1. Processor for character strings of variable length with a system of storage units for storing character strings, which can be addressed in pairs by program instructions and from which partial strings corresponding to the data flow width are transferred to two operand registers, with an arithmetic/logic unit for performing processing operations, with a condition code circuit which stores signals derived from the processing results, which are used to control program branches, and with a control unit which addresses the partial chains of the character string pairs in the memory unit in succession and controls the operation of the units and transfers between the units in successive machine cycles, characterized in that the operand registers (16, 18) are connected in parallel to an arithmetic unit (20), a logic unit (22) and a comparison unit (24), that an identifier register (26) loadable with an end-of-chain identifier (E) is connected to a further input of the comparison unit (24), that the partial chains stored in the operand registers (16, 18) are simultaneously processed by the arithmetic unit (20) for subtracting one partial chain from the other partial chain, by the logic unit (22) for comparing the two partial chains and by the comparison unit (24) for comparing the characters of both partial chains with the content of the identifier register (26), and that output signals of these units are available within the same machine cycle, output signals of the comparison unit serving as an indication of the equality of both character strings or partial chains, output signals of the logic unit serving as an indication of the inequality of both character strings or partial chains, and a carry signal from the arithmetic unit serving as an indication of which of the two character strings or partial chains is the larger or the smaller.

2. Processor according to claim 1, wherein the characters are represented by bytes, characterized in that the comparison unit (24) is designed to carry out a multiple comparison of the contents of the identifier register (26) in parallel with all bytes of both partial chains and for each partial chain (A, B) has an output line (28, 30) on which a chain end signal (EA, EB) is generated if the content of the identifier register corresponds to a byte of the partial chains.

3. Processor according to claim 1 or 2, characterized in that the logic unit (22) is designed for a parallel comparison of the bytes of the same position in the two sub-chains (A, B) and has an output line (32) for each byte pair which generates a mismatch signal (MC) when the bytes differ.

4. Processor according to one of claims 1 to 3, characterized in that the arithmetic unit (20) is designed for a parallel addition of the bytes of the same position in the two partial chains (A, B), has a complementing circuit (16) at one of its inputs, and has in its highest byte position a carry output line (34) on which a carry signal appears when the partial chain (A) supplied via the activated complementing circuit (16) is larger in the binary values of its bytes than the other partial chain (B), while in the other case the absence of such a carry signal indicates that the other partial chain is larger.

5. Processor according to one of claims 1 to 4, characterized in that a result evaluation logic (60) has a circuit (64) for determining the priority among the output signals (EA, EB, MC) from the comparison unit (24) and the logic unit (22), which indicates the lowest byte position in which one or the other sub-chain (A or B), or both sub-chains, is found to match the end-of-chain character (E), and which indicates on an output bus (100) the lowest byte position in which a mismatch was found.

6. Processor according to one of claims 1 to 5, characterized in that a carry evaluation logic (66) is provided, which contains selection circuits (91, 92, 93), controlled by carry signals from the arithmetic unit, for generating signals A SMALLER B and B SMALLER A on output lines (95, 96).

7. Processor according to one of claims 1 to 6, characterized in that the output signals EA, EB, EAB, A SMALLER B and B SMALLER A on output lines (98, 99, 97, 95, 96) serve to set condition codes (CC), and that the byte position indicated on the output bus (100) is stored for further use by the program instructions following the executed string instruction.