US20040215924A1 - Analyzing stored data - Google Patents

Info

Publication number
US20040215924A1
Authority
US
United States
Prior art keywords
register
elements
value
extrema
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/426,052
Inventor
Jean-Francois Collard
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp
Priority to US10/426,052
Assigned to INTEL CORPORATION. Assignment of assignors interest (see document for details). Assignors: COLLARD, JEAN-FRANCOIS C.
Publication of US20040215924A1
Priority to US11/302,908 (US7206920B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30021 Compare instructions, e.g. Greater-Than, Equal-To, MINMAX
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036 Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/30105 Register structure
    • G06F9/30109 Register structure having multiple operands in a single register

Definitions

  • Each such program may be implemented in a high level procedural or object-oriented programming language to communicate with a computer system.
  • the programs can be implemented in assembly or machine language.
  • the language may be a compiled or an interpreted language.
  • Each computer program may be stored on a storage medium or device (e.g., CD-ROM, hard disk, or magnetic diskette) that is readable by a general or special purpose programmable computer for configuring and operating the computer when the storage medium or device is read by the computer to perform processes 10 , 60 and 80 .
  • Processes 10 , 60 and 80 may also be implemented as one or more machine-readable storage media, configured with a computer program(s), where upon execution, instructions in the computer program(s) cause a computer to operate in accordance with processes 10 , 60 and 80 .
  • Processes 10 , 60 and 80 are not limited to the specific embodiments described herein.
  • the elements are not limited to 8-bit or 16-bit, nor are the registers limited to 64 bits. Rather, the elements and registers can be any combination of sizes that are consistent with the processes described herein.
  • processes 60 and 80 are not limited to the actions described herein. For example, after determining that an extrema value is invalid by another value in the data storage, processes 60 and 80 can overwrite the elements of the first register with a new extrema value and continue processes 60 and 80 with the rest of the data storage elements.
  • overwriting the registers with the new values may reduce the number of registers used to execute processes 10 , 60 and 80 .
  • Processes 10 , 60 and 80 are not limited to the specific processing order of FIGS. 1, 3 and 5 . Rather, the blocks of FIGS. 1, 3 and 5 may be re-ordered, as necessary, to achieve the results set forth above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

A method of locating a target value includes loading the target value into elements of a first register. The first register includes N elements (N>0). The method also includes indicating in elements of a second register, which includes N elements corresponding to the first register, whether a corresponding element from data storage matches a corresponding element of the first register.

Description

    TECHNICAL FIELD
  • [0001] This disclosure relates to analyzing data in data storage.
    BACKGROUND
  • [0002] Software code may contain instructions to locate specific data in data storage (e.g., memory such as volatile memory, non-volatile memory, and the like). For example, software code may include instructions to search for a value in memory and to specify its location. Typically, this is accomplished by comparing each value in the data storage to the value to be searched until the location containing the value is determined. For example, typical instructions to locate a value, VALUE, in an array, x, having N elements are:
    pos = -1;                      /* -1 means VALUE has not been found */
    for (i = 0; i < N; i++) {
        if (x[i] == VALUE) {
            pos = i;               /* record the index of the first match */
            break;
        }
    }
  • [0003] Other software code may contain instructions to validate extrema values such as a maximum value or a minimum value in the data storage. For example, typical instructions to verify a maximum value, MAX, in an array, y, having N elements are:
    MAX = -1;                      /* starting value; assumes nonnegative elements */
    for (i = 0; i < N; i++) {
        if (y[i] > MAX) {
            MAX = y[i];            /* a larger element replaces the current maximum */
        }
    }
  • [0004] Each element in the array, y, is compared to the maximum value, MAX, one at a time.
    DESCRIPTION OF THE DRAWINGS
  • [0005] FIG. 1 is a flowchart of a process for locating a target value in data storage.
  • [0006] FIG. 2 is a diagram of registers used in locating the target value in the data storage.
  • [0007] FIG. 3 is a flowchart of a process for verifying an initial extrema value in the data storage.
  • [0008] FIG. 4 is a diagram of registers used in verifying the initial extrema value for nonnegative integer values in the data storage.
  • [0009] FIG. 5 is a flowchart of a process for verifying an initial maximum value for negative integer values in the data storage.
  • [0010] FIG. 6 is a diagram of registers used in verifying the initial maximum value for negative integer values in the data storage.
  • FIG. 7 is a block diagram of a computer system on which the processes of FIGS. 1 and 3 may be implemented.
    DESCRIPTION
  • [0011] Referring to FIGS. 1 and 2, a process 10 may be used to locate a target value in a data storage location (not shown). Instead of comparing the value in each element of the data storage location with the target value one at a time, process 10 searches for the target value N elements at a time (N>0), which, as described below, saves processing time. Each element may include, for example, 8 bits or 16 bits. The target value may be a value required and requested during the execution of a program (e.g., from a compiler), an arbitrary value, or a user-chosen value.
  • [0012] Process 10 may load (12) the target value into each element 30 of a first register 32 having N (N>0) elements. For example, each element may be 8 bits and a target value of 3 may be loaded into 8 elements of first register 32, a 64-bit register. In one embodiment, process 10 may load (12) the target value using a single computer instruction (e.g., in this embodiment, mux). Process 10 may load (14) the first N elements of the storage location into a second register 34. This can be done, for example, using one 8-byte load or eight 8-bit loads.
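The two loads in the paragraph above can be imitated in portable C, treating a 64-bit word as a register with eight 8-bit elements. This is only a sketch under that assumption: `broadcast_u8` and `load8_le` are hypothetical helper names, not part of the disclosure, and the multiplication trick merely stands in for the single broadcast instruction.

```c
#include <stdint.h>

/* Replicate one 8-bit value into all eight byte lanes of a 64-bit word
   (a software stand-in for the single broadcast instruction). */
static uint64_t broadcast_u8(uint8_t value) {
    return (uint64_t)value * 0x0101010101010101ULL;
}

/* Load eight consecutive bytes so that src[0] lands in the least
   significant lane, independent of the host byte order. */
static uint64_t load8_le(const uint8_t *src) {
    uint64_t word = 0;
    for (int i = 0; i < 8; i++)
        word |= (uint64_t)src[i] << (8 * i);
    return word;
}
```

With a target value of 3, `broadcast_u8(3)` yields a word whose eight lanes all hold 3, matching the first-register example in the text.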
  • [0013] Process 10 may compare (16) each element of first register 32 with its corresponding element in second register 34. Process 10 may indicate (18) which elements match the target value by placing a nonzero value into a corresponding element of a third register 36. Process 10 may place a zero value into the corresponding element of the third register if there is no match. With eight one-byte values, the corresponding elements of third register 36 may be set to hexadecimal value 0xff to indicate a match and 0x00 to indicate no match. In one embodiment, process 10 compares (16) and indicates (18) using a single computer instruction (e.g., in this embodiment, pcmp.eq).
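The element-wise comparison can be sketched in C as a loop over the eight byte lanes of two 64-bit words. The function name `bytes_equal_mask` is an assumption made for illustration; a processor with a packed-compare instruction would produce the same mask in one step.

```c
#include <stdint.h>

/* For each of the eight byte lanes, write 0xff where a and b hold the
   same value and 0x00 where they differ. */
static uint64_t bytes_equal_mask(uint64_t a, uint64_t b) {
    uint64_t mask = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint8_t x = (uint8_t)(a >> (8 * lane));
        uint8_t y = (uint8_t)(b >> (8 * lane));
        if (x == y)
            mask |= 0xffULL << (8 * lane);
    }
    return mask;
}
```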
  • [0014] Process 10 may obtain (20) the complement of third register 36 and place resulting corresponding values into a fourth register 38. In one embodiment, process 10 obtains (20) the complement using a single computer instruction (e.g., in this embodiment, negate). In other embodiments, (20) may be skipped.
  • [0015] Process 10 may load (22) a value into a position field 40 indicating if and where there is an element in fourth register 38 having a zero value. A value from “0” to “N−1” may be loaded into position field 40 to indicate a match and the position (described below) of the element having the matching value. A value of “N” may be loaded into position field 40 to indicate no match.
  • [0016] Each register (first register 32, second register 34, third register 36, and fourth register 38) stores values in a little-endian format, i.e., the right-most element is the least significant. Thus, in second register 34, the least significant element has a value of “1” and the most significant element has a value of “4.” The least significant element has a position value of “0” and the most significant element has a position value of “7.” In FIG. 2, the position value of the element of fourth register 38 containing a zero value is position value “5.” Thus, a value of “5” is placed in position field 40.
  • [0017] If more than one zero value is in fourth register 38, process 10 may load (22) into position field 40 the position value of the least significant element in fourth register 38 having a zero value.
  • [0018] In one embodiment, process 10 may load (22) a position field value using a single computer instruction (e.g., in this embodiment, czx1.r).
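The position-field rule described above (lowest zero element wins, N when there is none) can be written as a short C function for N = 8 one-byte elements. `first_zero_byte` is a hypothetical name, and the loop is a plain software substitute for the single instruction mentioned in the text.

```c
#include <stdint.h>

/* Return the position (0..7) of the least significant byte lane that is
   zero, or 8 when no lane is zero. */
static unsigned first_zero_byte(uint64_t word) {
    for (unsigned lane = 0; lane < 8; lane++)
        if (((word >> (8 * lane)) & 0xff) == 0)
            return lane;
    return 8;
}
```

For a complemented comparison result whose only zero lane is lane 5, the function returns 5, which agrees with the FIG. 2 example.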
  • [0019] Process 10 may determine (24) if there is a match by reading position field 40. If there are no matches (i.e., a value of “N” in field 40, e.g., a value of “8” when there are eight elements), process 10 may load (26) the next N elements (following the first N elements) of the data storage location into second register 34, and process 10 may compare (16) each element of the second register with first register 32, as above.
  • [0020] If there are matches (e.g., a value from “0” to “N−1” is placed in field 40), process 10 ends.
  • [0021] A representative example of program code (i.e., machine-executable instructions) for an INTEL® ITANIUM® processor to implement process 10 is as follows:
        mov rA = addr of the 1st element of x
        mov RPOS = 0
        mux1 RVAL = VAL, @bcst
    L:
        ld8 RCONT = [rA], 8 ;;            // Post-increment by 8 bytes
        pcmp1.eq RRES = RVAL, RCONT ;;
        negate NR = RRES ;;               // Using e.g. xor NR = 0xffffffffffffffff, RRES
        czx1.r RIND = NR ;;
        cmp.eq p2, p3 = RIND, 8 ;;
    (p3) br.cond out
    (p2) add RPOS = 8, RPOS               // Increment RPOS by 8
        br.cloop L ;;
    out:
        // If RIND is different from 8, the value was found
        // Then, its position pos in array x equals RPOS+RIND
  • [0022] In the above code, “RVAL” corresponds to first register 32, “RCONT” corresponds to second register 34, “RRES” corresponds to third register 36, “NR” corresponds to fourth register 38, and “RIND” corresponds to position field 40. Of course, other code (or even hardware) may be used to implement process 10.
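For readers without access to the target processor, the whole of process 10 can be approximated in portable C. The sketch below is an illustration under stated assumptions, not the disclosed implementation: `find_byte` is a hypothetical name, elements are 8 bits wide, the array length is assumed to be a multiple of 8, and an XOR replaces the compare-then-complement pair because it leaves a zero byte in exactly the lanes that match.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first element equal to target, or -1 if absent.
   Assumption for this sketch: n is a multiple of 8 (a trailing partial
   group would be ignored). */
static long find_byte(const uint8_t *x, size_t n, uint8_t target) {
    /* Broadcast the target into all eight lanes (role of RVAL). */
    const uint64_t rval = (uint64_t)target * 0x0101010101010101ULL;

    for (size_t rpos = 0; rpos + 8 <= n; rpos += 8) {
        /* Load the next eight elements, x[rpos] in the lowest lane (RCONT). */
        uint64_t rcont = 0;
        for (unsigned i = 0; i < 8; i++)
            rcont |= (uint64_t)x[rpos + i] << (8 * i);

        /* Matching lanes become 0x00, all other lanes nonzero (role of NR). */
        uint64_t diff = rcont ^ rval;

        /* Position of the lowest zero lane (role of RIND). */
        for (unsigned rind = 0; rind < 8; rind++)
            if (((diff >> (8 * rind)) & 0xff) == 0)
                return (long)(rpos + rind);
    }
    return -1;
}
```

As in the listing, the returned position is the running offset plus the lane index within the current group of eight.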
  • [0023] Referring now to FIGS. 3 and 4, another process is shown for validating extrema values. In more detail, a process 60 (FIG. 3) searches data storage and verifies that an initial extrema value, such as a maximum value or a minimum value, is valid. Process 60 may load (62) an initial extrema value into each element of a first register 82 having N (N>0) elements, e.g., eight elements (FIG. 4). In one embodiment, process 60 may load (62) an initial extrema value using a single computer instruction (e.g., in this embodiment, mux).
  • [0024] The initial extrema value is a guess at the actual extrema value for the data storage. Process 60 may be used to determine if that guess is correct. The initial extrema value can come from a user input, or it can be determined by a compiler via a compiler optimization setting. For example, a compiler, prior to executing process 60, may read the first 10% of the values in the data storage and may take the extrema from those values. The compiler may then process the remaining 90% of the data storage elements using process 60.
  • [0025] Process 60 may load (64) N (N>0) elements from the data storage into a second register 84. Process 60 may compare (66) each element's value in second register 84 to the initial extrema value loaded in first register 82. Process 60 may load (68) the extrema value between the first register and the second register into third register 86. For example, if the initial extrema value is a maximum, the larger of the first register element and the second register element is placed in a corresponding third register element. If the initial extrema value is a minimum, the smaller of the first register element and the second register element is placed in a corresponding third register element.
  • [0026] In one embodiment, process 60 compares (66) and loads (68) third register 86 using one computer instruction (e.g., in this embodiment, pmax) if the initial extrema value is a maximum, and another computer instruction (e.g., in this embodiment, pmin) if the initial extrema value is a minimum.
  • [0027] Process 60 may determine (70) if the initial extrema value is valid by comparing elements from third register 86 to the initial extrema value. If all values match the initial extrema value, then the initial extrema value is valid. If at least one value in the third register does not match the initial extrema value, the initial extrema value is invalid.
  • [0028] If the initial extrema value is valid, process 60 may load (72) the next N (N>0) elements into second register 84. If the initial extrema value is invalid, process 60 ends.
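The verification loop of process 60 can be sketched in portable C for the maximum case. The names `max_u8x8` and `max_guess_is_valid` are hypothetical, elements are assumed to be unsigned 8-bit values, and the array length is assumed to be a multiple of 8; the lane-wise maximum loop stands in for the single packed-maximum instruction.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Lane-wise unsigned maximum of two words holding eight 8-bit elements. */
static uint64_t max_u8x8(uint64_t a, uint64_t b) {
    uint64_t out = 0;
    for (unsigned lane = 0; lane < 8; lane++) {
        uint64_t x = (a >> (8 * lane)) & 0xff;
        uint64_t y = (b >> (8 * lane)) & 0xff;
        out |= (x > y ? x : y) << (8 * lane);
    }
    return out;
}

/* Return true if no element of y[0..n) exceeds the guessed maximum.
   Assumption for this sketch: n is a multiple of 8. */
static bool max_guess_is_valid(const uint8_t *y, size_t n, uint8_t guess) {
    /* First register: the guess replicated into every lane. */
    const uint64_t rmax = (uint64_t)guess * 0x0101010101010101ULL;

    for (size_t pos = 0; pos + 8 <= n; pos += 8) {
        /* Second register: the next eight data elements. */
        uint64_t rval = 0;
        for (unsigned i = 0; i < 8; i++)
            rval |= (uint64_t)y[pos + i] << (8 * i);

        /* Third register: if any lane differs from the guess, some
           element was larger, so the guess is invalid. */
        if (max_u8x8(rval, rmax) != rmax)
            return false;
    }
    return true;
}
```

When the function returns false, a caller would fall back to the one-at-a-time method, as the text describes for an invalid guess.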
  • [0029] A representative example of program code (i.e., machine-executable instructions) for an INTEL® ITANIUM® processor to implement process 60 is as follows:
    // Process first elements using method in prior art
    // At this point, MAX contains the local maximum
    // rA = addr of the 1st element of x on which this method is applied
        mux1 RMAX = MAX, @bcst
    L:
        ld8 RVAL = [rA], 8 ;;             // 8 values are loaded in one step
        pmax1.u RRES = RVAL, RMAX ;;
        cmp.eq p2, p3 = RRES, RMAX ;;     // Are all values in RVAL lower than or equal to MAX?
    (p3) br.cond method_of_prior_art      // No. Branch to recovery
    (p2) br.cond L ;;                     // Yes. Process 60 can proceed.
  • [0030] In the above code, “RMAX” corresponds to first register 82, “RVAL” corresponds to second register 84 and “RRES” corresponds to third register 86. The code can be pipelined with an initiation interval of one using the following br.ctop instruction:
    L:
    (p16) ld8 r32 = [rA], 8               // r32 serves as RVAL
    (p17) pmax1.u r34 = r33, RMAX         // r32 rotated into r33. r34 serves as RRES
    (p19) cmp.eq p2, p3 = r36, RMAX       // r34 rotated into r36
    (p3)  br.cond method_of_prior_art
    (p2)  br.ctop L ;;
  • [0031] Of course, other code (or even hardware) may be used to implement process 60.
  • [0032] Heretofore, comparing each element one at a time to validate an initial extrema value took N (N>0) cycles plus a fixed amount of time (e.g., time to load instructions), assuming the processing is pipelined with an initiation interval of one. For an array x having N elements, assume that the values are stored using 8 bits per element, that process 60 is applied to the last f*N (0<f<1) elements of the data storage, that the first (1−f)*N elements are analyzed one at a time (e.g., by a compiler) before executing process 60, and that the maximum value was in those first (1−f)*N elements. Then process 60 takes:
  • (1−f)*N + f*N/8 + a cycles,
  • [0033] where a is a constant. Assuming that N is sufficiently large, the constant a is negligible and process 60 takes approximately (1−7f/8)*N cycles, since (1−f)*N + f*N/8 = (1−7f/8)*N. That is, process 60 saves a fraction 7f/8 of the N cycles needed by the one-at-a-time approach.
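The cycle estimate can be checked numerically with a few lines of C. This is only an arithmetic illustration of the formula in the text; `estimated_cycles` and `fraction_saved` are hypothetical names, and the inputs in the usage note are made-up sample values rather than measurements.

```c
/* Cycle estimate from the text: (1-f)*N scalar cycles, plus f*N/8 cycles
   for the eight-at-a-time part, plus a constant overhead a. */
static double estimated_cycles(double n, double f, double a) {
    return (1.0 - f) * n + f * n / 8.0 + a;
}

/* Fraction of the N one-at-a-time cycles that is saved when the constant
   overhead is ignored; algebraically this equals 7f/8. */
static double fraction_saved(double n, double f) {
    return 1.0 - estimated_cycles(n, f, 0.0) / n;
}
```

For example, with N = 1024 and f = 0.5 the estimate is 512 + 64 = 576 cycles, a saving of 7 × 0.5 / 8 = 0.4375 of the original 1024.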
  • [0034] Referring to FIGS. 5 and 6, other embodiments process values in data storage that may be negative integers instead of nonnegative integers. Process 60 may be modified into a process 80 to account for negative integers.
  • [0035] Actions 62, 64 and 66 in process 80 (FIG. 5) are the same as actions 62, 64 and 66 of process 60 (FIG. 3).
  • [0036] For each element for which the initial extrema value does not hold, process 80 may load (88) a hexadecimal value of 0xff into the corresponding element of a third register 86. For each element for which the initial extrema value holds, process 80 may load (88) a hexadecimal value of 0x00 into the corresponding element of third register 86. If the initial extrema value is a maximum, process 80 may determine that the initial extrema value is valid if values in first register 82 are greater than or equal to values in second register 84. If the extrema value is a minimum, process 80 determines that the initial extrema value is valid if values in first register 82 are less than or equal to the values in second register 84. In one embodiment, process 80 may compare (88) the values using a single computer instruction (e.g., in this embodiment, pcmp1.gt).
  • [0037] Process 80 may load (90) into an invalid count field 94 a count of the elements in third register 86 where the initial extrema value is invalid (i.e., elements having a hexadecimal value of 0xff). In one embodiment, process 80 may load (90) invalid count field 94 by using a single computer instruction (e.g., in this embodiment, popcnt).
  • [0038] Process 80 may determine (92) if the initial extrema value is invalid by determining if there is a nonzero value in invalid count field 94.
  • [0039] If the initial extrema value is valid (i.e., a zero value in invalid count field 94), process 80 may load (72) the next N (N>0) elements into second register 84. If the initial extrema value is invalid (i.e., invalid count field 94 contains a nonzero value), process 80 ends.
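The signed comparison and the invalid count of process 80 can be sketched in portable C for the maximum case. The helper names are hypothetical and 8-bit two's-complement elements are assumed. Note that this sketch counts flagged elements, whereas a population-count instruction such as popcnt counts set bits; either result is nonzero exactly when at least one element is flagged.

```c
#include <stdint.h>

/* Interpret one byte lane as a signed 8-bit value in -128..127. */
static int lane_as_signed(uint64_t word, unsigned lane) {
    int v = (int)((word >> (8 * lane)) & 0xff);
    return v >= 128 ? v - 256 : v;
}

/* Write 0xff into each lane where the data value exceeds the guessed
   maximum (guess invalid for that element) and 0x00 elsewhere. */
static uint64_t greater_mask_i8(uint64_t rval, uint64_t rmax) {
    uint64_t mask = 0;
    for (unsigned lane = 0; lane < 8; lane++)
        if (lane_as_signed(rval, lane) > lane_as_signed(rmax, lane))
            mask |= 0xffULL << (8 * lane);
    return mask;
}

/* Count the lanes that were flagged (role of the invalid count field). */
static unsigned invalid_lane_count(uint64_t mask) {
    unsigned count = 0;
    for (unsigned lane = 0; lane < 8; lane++)
        if ((mask >> (8 * lane)) & 0xff)
            count++;
    return count;
}
```

With a guessed maximum of −2 in every lane and data lanes −5, −2, −1, −128, 0, −3, −2, −2 (lowest lane first), lanes 2 and 4 are flagged, so the count is 2 and the guess is rejected.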
  • A representative example of program code (i.e., machine-executable instructions) for an INTEL® ITANIUM® processor to implement [0040] process 80 is as follows:
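    // rA = address of the first element of x to which process 80 is applied
    mux1 RMAX = MAX, @bcst         // broadcast MAX into every element of RMAX
    L:
    ld8 RVAL = [rA], 8 ;;          // load the next eight elements and advance rA
    pcmp1.gt RRES = RVAL, RMAX ;;  // 0xff in each element where RVAL > RMAX
    popcnt RCNT = RRES ;;          // nonzero if any element invalidates MAX
    cmp.eq p2, p3 = RCNT, 0 ;;     // p2: MAX still valid, p3: MAX invalid
    (p3) br.cond method_of_prior_art
    (p2) br.cond L ;;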
  • [0041] In the above code, “RMAX” corresponds to first register 82, “RVAL” corresponds to second register 84, “RRES” corresponds to third register 86 and “RCNT” corresponds to invalid count field 94. The instructions above can be pipelined with an initiation interval of one, as follows:
    L:
    (p16) ld8 RVAL = [rA], 8 ;;
    (p17) pcmp1.gt RRES = RVAL, RMAX ;;
    (p19) popcnt RCNT = RRES ;;
    (p21) cmp.eq p2, p3 = RCNT, 0 ;;
    (p3) br.cond method_of_prior_art
    (p2) br.ctop L ;;
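To visualize the pipelined listing above, the following sketch tabulates which loop iteration each instruction works on in a given cycle. It assumes the usual rotating-predicate convention, in which an instruction guarded by stage predicate p(16+k) runs k kernel iterations after the first stage, and an initiation interval of one cycle. The table STAGES and the function schedule are hypothetical names used only for this illustration.

```python
# Stage predicate guarding each instruction in the pipelined listing.
STAGES = {"ld8": 16, "pcmp1.gt": 17, "popcnt": 19, "cmp.eq": 21}


def schedule(iterations):
    """Map each cycle to the (instruction, iteration) pairs active in it.

    With an initiation interval of one, iteration i starts at cycle i, and
    an instruction guarded by p(16+k) executes k cycles after that start.
    """
    table = {}
    for iteration in range(iterations):
        for name, predicate in STAGES.items():
            cycle = iteration + (predicate - 16)
            table.setdefault(cycle, []).append((name, iteration))
    return table
```

Under these assumptions, the compare for one group of elements overlaps with the load of the next group, and the validity test for a group completes five cycles after its load.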
  • [0042] Of course, other code (or even hardware) may be used to implement process 80.
  • [0043] FIG. 7 shows a computer 100 for using processes 10, 60 and 80. Computer 100 includes a processor 102, a memory 104, and a storage medium 106 (e.g., hard disk). Storage medium 106 stores operating system 110, data storage 112 and registers 116, and computer instructions 114 which are executed by processor 102 out of memory 104 to perform processes 10, 60 and 80.
  • [0044] Processes 10, 60 and 80 are not limited to use with the hardware and software of FIG. 7; they may find applicability in any computing or processing environment and with any type of machine that is capable of running a computer program. Processes 10, 60 and 80 may be implemented in hardware, software, or a combination of the two. For example, processes 10, 60 and 80 may be implemented in a circuit that includes one or a combination of a processor, a memory, programmable logic and logic gates. Processes 10, 60 and 80 may be implemented in computer programs executed on programmable computers/machines that each includes a processor, a storage medium or other article of manufacture that is readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and one or more output devices. Program code may be applied to data entered using an input device to perform processes 10, 60 and 80 and to generate output information.
  • [0045] Each such program may be implemented in a high level procedural or object-oriented programming language to communicate with a computer system. However, the programs can be implemented in assembly or machine language. The language may be a compiled or an interpreted language. Each computer program may be stored on a storage medium or device (e.g., CD-ROM, hard disk, or magnetic diskette) that is readable by a general or special purpose programmable computer for configuring and operating the computer when the storage medium or device is read by the computer to perform processes 10, 60 and 80. Processes 10, 60 and 80 may also be implemented as one or more machine-readable storage media, configured with a computer program(s), where upon execution, instructions in the computer program(s) cause a computer to operate in accordance with processes 10, 60 and 80.
  • [0046] Processes 10, 60 and 80 are not limited to the specific embodiments described herein. For example, the elements are not limited to 8-bit or 16-bit, nor are the registers limited to 64 bits. Rather, the elements and registers can be any combination of sizes that are consistent with the processes described herein.
  • [0047] In another example, processes 60 and 80 are not limited to the actions described herein. For example, after determining that an extrema value is invalidated by another value in the data storage, processes 60 and 80 can overwrite the elements of the first register with a new extrema value and continue processes 60 and 80 with the rest of the data storage elements.
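The variant described above, in which the candidate is replaced and checking continues, can be sketched as follows for a maximum. This is only an illustration under stated assumptions: Python lists stand in for the registers, and the function name running_maximum and its return convention are hypothetical.

```python
def running_maximum(data, initial_max, n=8):
    """Check a candidate maximum, replacing it whenever a group beats it.

    Returns (final_candidate, replaced), where replaced is True if the
    initial candidate was invalidated at least once. If replaced is False,
    final_candidate is the unchanged initial guess, which need not occur
    in data.
    """
    current = initial_max
    replaced = False
    for start in range(0, len(data), n):
        chunk = data[start:start + n]
        if any(value > current for value in chunk):
            # Overwrite the candidate (first register) with the new
            # extrema value found in this group, then keep going.
            current = max(chunk)
            replaced = True
    return current, replaced
```

Compared with ending the process on the first invalid group, this form always makes a single pass over the data and ends with a candidate that no element exceeds.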
  • [0048] In still another example, overwriting the registers with the new values may reduce the number of registers used to execute processes 10, 60 and 80.
  • [0049] Processes 10, 60 and 80 are not limited to the specific processing order of FIGS. 1, 3 and 5. Rather, the blocks of FIGS. 1, 3 and 5 may be re-ordered, as necessary, to achieve the results set forth above.
  • [0050] Other embodiments not described herein are also within the scope of the following claims.

Claims (34)

What is claimed is:
1. A method of locating a target value, comprising:
loading the target value into elements of a first register, the first register comprising N elements (N>0); and
indicating in elements of a second register, comprising N elements corresponding to the first register, whether a corresponding element from data storage matches a corresponding element of the first register.
2. The method of claim 1, further comprising:
indicating a position of at least one element in the data storage containing the target value based on the contents of the second register.
3. The method of claim 1, further comprising:
indicating a position of a least significant element containing the target value.
4. The method of claim 1, further comprising:
taking a complement of the second register; and
loading the complement of the second register into a third register comprising corresponding N elements.
5. The method of claim 1, further comprising:
overwriting the second register with its complement.
6. A method of verifying if an initial extrema value is valid, comprising:
loading the initial extrema value into elements of a first register, the first register comprising N elements (N>0); and
indicating in a second register, comprising N elements, the extrema values between corresponding elements in data storage and corresponding elements in the first register.
7. The method of claim 6, further comprising:
indicating when an initial extrema value is invalid.
8. The method of claim 7, wherein the extrema value comprises a maximum; and
wherein indicating comprises determining if an element in the second register is greater than the initial extrema value.
9. The method of claim 6, wherein the initial extrema value is determined by a user.
10. The method of claim 6, wherein the initial extrema value is determined by a compiler.
11. An apparatus comprising:
circuitry, for locating a value, to:
load the target value into elements of a first register, the first register comprising N elements (N>0); and
indicate in elements of a second register, comprising N elements corresponding to the first register, whether a corresponding element from data storage matches a corresponding element of the first register.
12. The apparatus of claim 11, further comprising circuitry to:
indicate a position of at least one element in the data storage containing the target value based on the contents of the second register.
13. The apparatus of claim 11, further comprising circuitry to:
indicate a position of a least significant element containing the target value.
14. The apparatus of claim 11, further comprising circuitry to:
take a complement of the second register; and
load the complement of the second register into a third register comprising corresponding N elements.
15. The apparatus of claim 11, further comprising circuitry to:
overwrite the second register with its complement.
16. An apparatus comprising:
circuitry, for locating a value, to:
load the initial extrema value into elements of a first register, the first register comprising N elements (N>0); and
indicate in a second register, comprising N elements, the extrema values between corresponding elements in data storage and corresponding elements in the first register.
17. The apparatus of claim 16, further comprising circuitry to:
indicate when an initial extrema value is invalid.
18. The apparatus of claim 17, wherein the extrema value comprises a maximum; and
wherein indicating comprises determining if an element in the second register is greater than the initial extrema value.
19. The apparatus of claim 16, wherein the initial extrema value is determined by a user.
20. The apparatus of claim 16, wherein the initial extrema value is determined by a compiler.
21. An article comprising a machine-readable medium that stores executable instructions for locating data, the instructions causing a machine to:
load the target value into elements of a first register, the first register comprising N elements (N>0); and
indicate in elements of a second register, comprising N elements corresponding to the first register, whether a corresponding element from data storage matches a corresponding element of the first register.
22. The article of claim 21, further comprising instructions causing a machine to:
indicate a position of at least one element in the data storage containing the target value based on the contents of the second register.
23. The article of claim 21, further comprising instructions causing a machine to:
indicate a position of a least significant element containing the target value.
24. The article of claim 21, further comprising instructions causing a machine to:
take a complement of the second register; and
load the complement of the second register into a third register comprising corresponding N elements.
25. The article of claim 21, further comprising instructions causing a machine to:
overwrite the second register with its complement.
26. An article comprising a machine-readable medium that stores executable instructions for locating data, the instructions causing a machine to:
load the initial extrema value into elements of a first register, the first register comprising N elements (N>0); and
indicate in a second register, comprising N elements, the extrema values between corresponding elements in data storage and corresponding elements in the first register.
27. The article of claim 26, further comprising instructions causing a machine to:
indicate when an initial extrema value is invalid.
28. The article of claim 27, wherein the extrema value comprises a maximum; and
wherein indicating comprises determining if an element in the second register is greater than the initial extrema value.
29. The article of claim 26, wherein the initial extrema value is determined by a user.
30. The article of claim 26, wherein the initial extrema value is determined by a compiler.
31. A system, comprising:
at least one processor;
memory; and
logic coupled to the processing device and the memory, usable by the at least one processor to:
load a target value into elements of a first register, the first register comprising N elements (N>0);
indicate in elements of a second register, comprising N elements corresponding to the first register, whether a corresponding element from data storage matches a corresponding element of the first register;
load the initial extrema value into elements of a third register, the third register comprising N elements (N>0); and
indicate in a fourth register, comprising N elements, the extrema values between corresponding elements in data storage and corresponding elements in the third register.
32. The system of claim 31 wherein the first register is the third register and the second register is the fourth register.
33. The system of claim 31, further comprising logic to:
indicate a position of at least one element in the data storage containing the target value based on the contents of the second register.
34. The system of claim 31, further comprising logic to:
indicate when an initial extrema value is invalid.

Priority Applications (2)

Application Number Priority Date Filing Date Title
US10/426,052 US20040215924A1 (en) 2003-04-28 2003-04-28 Analyzing stored data
US11/302,908 US7206920B2 (en) 2003-04-28 2005-12-13 Min/max value validation by repeated parallel comparison of the value with multiple elements of a set of data elements

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/426,052 US20040215924A1 (en) 2003-04-28 2003-04-28 Analyzing stored data

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US11/302,908 Division US7206920B2 (en) 2003-04-28 2005-12-13 Min/max value validation by repeated parallel comparison of the value with multiple elements of a set of data elements
US11/302,908 Continuation US7206920B2 (en) 2003-04-28 2005-12-13 Min/max value validation by repeated parallel comparison of the value with multiple elements of a set of data elements

Publications (1)

Publication Number Publication Date
US20040215924A1 true US20040215924A1 (en) 2004-10-28

Family

ID=33299533

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/426,052 Abandoned US20040215924A1 (en) 2003-04-28 2003-04-28 Analyzing stored data
US11/302,908 Expired - Fee Related US7206920B2 (en) 2003-04-28 2005-12-13 Min/max value validation by repeated parallel comparison of the value with multiple elements of a set of data elements

Family Applications After (1)

Application Number Title Priority Date Filing Date
US11/302,908 Expired - Fee Related US7206920B2 (en) 2003-04-28 2005-12-13 Min/max value validation by repeated parallel comparison of the value with multiple elements of a set of data elements

Country Status (1)

Country Link
US (2) US20040215924A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110928580B (en) * 2019-10-23 2022-06-24 北京达佳互联信息技术有限公司 Asynchronous flow control method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4896133A (en) * 1987-02-10 1990-01-23 Davin Computer Corporation Parallel string processor and method for a minicomputer
US6128614A (en) * 1995-12-20 2000-10-03 Intel Corporation Method of sorting numbers to obtain maxima/minima values with ordering
US6708168B2 (en) * 2000-12-29 2004-03-16 Nortel Networks Limited Method and apparatus for searching a data stream for character patterns

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5717947A (en) * 1993-03-31 1998-02-10 Motorola, Inc. Data processing system and method thereof
US5838984A (en) * 1996-08-19 1998-11-17 Samsung Electronics Co., Ltd. Single-instruction-multiple-data processing using multiple banks of vector registers
US6006315A (en) * 1996-10-18 1999-12-21 Samsung Electronics Co., Ltd. Computer methods for writing a scalar value to a vector
US6948056B1 (en) * 2000-09-28 2005-09-20 Intel Corporation Maintaining even and odd array pointers to extreme values by searching and comparing multiple elements concurrently where a pointer is adjusted after processing to account for a number of pipeline stages
US6697064B1 (en) * 2001-06-08 2004-02-24 Nvidia Corporation System, method and computer program product for matrix tracking during vertex processing in a graphics pipeline
US6792460B2 (en) * 2002-10-02 2004-09-14 Mercury Interactive Corporation System and methods for monitoring application server performance
US7454451B2 (en) * 2003-04-23 2008-11-18 Micron Technology, Inc. Method for finding local extrema of a set of values for a parallel processing element

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104011659A (en) * 2011-12-23 2014-08-27 英特尔公司 Apparatus and method for an instruction that determines whether a value is within a range
US9898284B2 (en) 2011-12-23 2018-02-20 Intel Corporation Apparatus and method for an instruction that determines whether a value is within a range
US20140032877A1 (en) * 2011-12-23 2014-01-30 Thomas R. Craver Apparatus and method for an instruction that determines whether a value is within a range
CN107168682A (en) * 2011-12-23 2017-09-15 英特尔公司 For determination value whether the apparatus and method of the instruction in the range of
US9411586B2 (en) * 2011-12-23 2016-08-09 Intel Corporation Apparatus and method for an instruction that determines whether a value is within a range
US9459864B2 (en) 2012-03-15 2016-10-04 International Business Machines Corporation Vector string range compare
US9459867B2 (en) 2012-03-15 2016-10-04 International Business Machines Corporation Instruction to load data up to a specified memory boundary indicated by the instruction
EP2758891A4 (en) * 2012-03-15 2014-10-08 Ibm Vector find element equal instruction
US9268566B2 (en) 2012-03-15 2016-02-23 International Business Machines Corporation Character data match determination by loading registers at most up to memory block boundary and comparing
US9280347B2 (en) 2012-03-15 2016-03-08 International Business Machines Corporation Transforming non-contiguous instruction specifiers to contiguous instruction specifiers
US9383996B2 (en) 2012-03-15 2016-07-05 International Business Machines Corporation Instruction to load data up to a specified memory boundary indicated by the instruction
EP2758891A1 (en) * 2012-03-15 2014-07-30 International Business Machines Corporation Vector find element equal instruction
US9442722B2 (en) 2012-03-15 2016-09-13 International Business Machines Corporation Vector string range compare
US9454374B2 (en) 2012-03-15 2016-09-27 International Business Machines Corporation Transforming non-contiguous instruction specifiers to contiguous instruction specifiers
US9454367B2 (en) 2012-03-15 2016-09-27 International Business Machines Corporation Finding the length of a set of character data having a termination character
US9454366B2 (en) 2012-03-15 2016-09-27 International Business Machines Corporation Copying character data having a termination character from one memory location to another
WO2013136232A1 (en) 2012-03-15 2013-09-19 International Business Machines Corporation Vector find element not equal instruction
US9459868B2 (en) 2012-03-15 2016-10-04 International Business Machines Corporation Instruction to load data up to a dynamically determined memory boundary
EP2756415A4 (en) * 2012-03-15 2014-09-03 Ibm Vector find element not equal instruction
US9471312B2 (en) 2012-03-15 2016-10-18 International Business Machines Corporation Instruction to load data up to a dynamically determined memory boundary
US9477468B2 (en) 2012-03-15 2016-10-25 International Business Machines Corporation Character data string match determination by loading registers at most up to memory block boundary and comparing to avoid unwarranted exception
US9588763B2 (en) 2012-03-15 2017-03-07 International Business Machines Corporation Vector find element not equal instruction
US9588762B2 (en) 2012-03-15 2017-03-07 International Business Machines Corporation Vector find element not equal instruction
US9710267B2 (en) 2012-03-15 2017-07-18 International Business Machines Corporation Instruction to compute the distance to a specified memory boundary
US9710266B2 (en) 2012-03-15 2017-07-18 International Business Machines Corporation Instruction to compute the distance to a specified memory boundary
US9715383B2 (en) 2012-03-15 2017-07-25 International Business Machines Corporation Vector find element equal instruction
EP2756415A1 (en) * 2012-03-15 2014-07-23 International Business Machines Corporation Vector find element not equal instruction
US9772843B2 (en) 2012-03-15 2017-09-26 International Business Machines Corporation Vector find element equal instruction
WO2013136233A1 (en) 2012-03-15 2013-09-19 International Business Machines Corporation Vector find element equal instruction
US9946542B2 (en) 2012-03-15 2018-04-17 International Business Machines Corporation Instruction to load data up to a specified memory boundary indicated by the instruction
US9952862B2 (en) 2012-03-15 2018-04-24 International Business Machines Corporation Instruction to load data up to a dynamically determined memory boundary
US9959117B2 (en) 2012-03-15 2018-05-01 International Business Machines Corporation Instruction to load data up to a specified memory boundary indicated by the instruction
US9959118B2 (en) 2012-03-15 2018-05-01 International Business Machines Corporation Instruction to load data up to a dynamically determined memory boundary

Also Published As

Publication number Publication date
US20060095425A1 (en) 2006-05-04
US7206920B2 (en) 2007-04-17

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTEL CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:COLLARD, JEAN-FRANCOIS C.;REEL/FRAME:014509/0628

Effective date: 20030918

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION