WO2023139344A1 - Data processing - Google Patents

Data processing

Info

Publication number
WO2023139344A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
array
instruction
access
register
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/GB2022/053215
Other languages
English (en)
French (fr)
Inventor
Eric Biscondi
Didier MARTINOT
Joe Savage
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARM Ltd
Original Assignee
ARM Ltd
Advanced Risc Machines Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ARM Ltd, Advanced Risc Machines Ltd
Priority to KR1020247027330A (KR20240132511A, ko)
Priority to EP22829827.9A (EP4466605B1, en)
Priority to JP2024541131A (JP2025502112A, ja)
Priority to CN202280088688.6A (CN118541670A, zh)
Priority to US18/729,148 (US20250181352A1, en)
Priority to IL313629A (IL313629A, en)
Publication of WO2023139344A1

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • G06F9/30038Instructions to perform operations on packed data, e.g. vector, tile or matrix operations using a mask
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30018Bit or string instructions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3004Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30043LOAD or STORE instructions; Clear instruction
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145Instruction analysis, e.g. decoding, instruction word fields
    • G06F9/3016Decoding the operand specifier, e.g. specifier format

Definitions

  • This disclosure relates to data processing.
  • Some data processing arrangements allow for vector processing operations, involving applying a single vector processing instruction to data items of a data vector having a plurality of data items at respective positions in the data vector.
  • scalar processing operates on, effectively, single data items rather than on data vectors.
  • data processing apparatus comprising: vector processing circuitry to access an array register having at least n x n storage locations, where n is an integer greater than one, the vector processing circuitry comprising: instruction decoder circuitry to decode program instructions; and instruction processing circuitry to execute instructions decoded by the instruction decoder circuitry; in which the instruction decoder circuitry is responsive to an array access instruction, to control the instruction processing circuitry to access, for a vector of n vector elements, a set of n storage locations each having a respective array location in the array register, the array location accessed for a given vector element of the vector being defined by one or more coordinates associated with the given vector element by one or more parameters of the array access instruction.
  • a data processing method comprising: accessing an array register having at least n x n storage locations, where n is an integer greater than one, by: decoding program instructions; and executing instructions decoded by the decoding step; in which the decoding step is responsive to an array access instruction, to control the executing step to access, for a vector of n vector elements, a set of n storage locations each having a respective array location in the array register, the array location accessed for a given vector element of the vector being defined by one or more coordinates associated with the given vector element by one or more parameters of the array access instruction.
  • a virtual machine comprising a data processor to execute a computer program comprising machine readable instructions, in which execution of the computer program causes the data processor to operate as a data processing apparatus comprising: vector processing circuitry to access an array register having at least n x n storage locations, where n is an integer greater than one, the vector processing circuitry comprising: instruction decoder circuitry to decode program instructions; and instruction processing circuitry to execute instructions decoded by the instruction decoder circuitry; in which the instruction decoder circuitry is responsive to an array access instruction, to control the instruction processing circuitry to access, for a vector of n vector elements, a set of n storage locations each having a respective array location in the array register, the array location accessed for a given vector element of the vector being defined by one or more coordinates associated with the given vector element by one or more parameters of the array access instruction.
  • Figure 1 schematically illustrates a data processing apparatus
  • Figure 2 schematically illustrates a storage array
  • Figure 3 schematically illustrates storage locations within the storage array of Figure 2;
  • Figures 4 and 5 schematically illustrate linear array accesses
  • Figure 6 schematically illustrates an addressing notation
  • Figures 7 to 10 schematically illustrate further respective array access examples
  • Figure 11 schematically illustrates a virtual machine
  • Figure 12 is a schematic flowchart representing a method.
  • Figure 1 schematically illustrates a data processing system 10 comprising a processor 20 coupled to a memory 30 storing data values 32 and program instructions 34.
  • the processor 20 includes an instruction fetch unit 40 for fetching program instructions 34 from the memory 30 and supplying the fetched program instructions to decoder circuitry 50.
  • the decoder circuitry 50 decodes the fetched program instructions and generates control signals to control processing circuitry 60 to perform processing operations upon registers stored within register circuitry 70 as specified by the decoded vector instructions.
  • the processor 20 can access a storage array 90 of at least n x n storage locations. This is drawn in broken line to illustrate that it may or may not be provided as part of the processor 20.
  • the storage array can be implemented as any one or more of the following: architecturally-addressable registers; non-architecturally-addressable registers; a scratchpad memory; and a cache.
  • the processing circuitry 60 may be, for example vector processing circuitry and/or scalar processing circuitry.
  • Vector processing involves applying a single vector processing instruction to data items of a data vector having a plurality of data items at respective positions in the data vector.
  • Scalar processing operates on, effectively, single data items rather than on data vectors.
  • Vector processing can be useful in instances where processing operations are carried out on many different instances of the data to be processed.
  • a single instruction can be applied to multiple data items (of a data vector) at the same time. This can improve the efficiency and throughput of data processing compared to scalar processing.
  • the present techniques relate to processing two dimensional arrays of data items, stored in for example the storage array 90.
  • the two-dimensional storage arrays may, in at least some examples, be accessed as vectors, for example of n elements.
  • the storage array 90 may store a square array portion of a larger or even higher-dimensioned array or matrix of data items in memory.
  • Multiple instances of the storage array 90 may be provided so as to store multiple respective arrays of data items.
  • Embodiments of the present disclosure include an apparatus, for example of the type shown in Figure 1 , operable or configured to decode and execute such program instructions.
  • Figure 1 therefore provides an example of processing circuitry to selectively apply a processing operation to carry out functionality to be discussed below.
  • the vector processing operations may be under the control of so-called predicates.
  • a respective predicate can control whether or not a particular vector function is applied in respect of one of the data item positions within the linear arrays (which could be treated as data vectors in this example arrangement).
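As a rough illustration of per-position predication, the sketch below applies an operation only at positions whose predicate is active and leaves the other positions unchanged. The function name `predicated_apply` and the use of Python lists to stand in for data vectors are assumptions made for illustration; they are not part of the disclosure.

```python
from typing import Callable, List, Sequence


def predicated_apply(op: Callable[[int], int],
                     vector: Sequence[int],
                     predicate: Sequence[bool]) -> List[int]:
    """Apply op only in lanes whose predicate is active (hypothetical helper)."""
    if len(vector) != len(predicate):
        raise ValueError("vector and predicate must have the same length")
    # Inactive lanes keep their previous value in this sketch.
    return [op(x) if active else x for x, active in zip(vector, predicate)]


doubled = predicated_apply(lambda x: 2 * x, [1, 2, 3, 4], [True, False, True, False])
print(doubled)  # [2, 2, 6, 4]
```

A scalar equivalent would loop over the items one instruction at a time; here a single call stands in for a single vector instruction acting on all active lanes.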
  • the processing circuitry 60 is arranged, under control of instructions decoded by decoder circuitry 50, to access the registers 70 and/or the storage array 90. Further details of this latter arrangement will now be described with reference to Figure 2.
  • the storage array 90 is arranged as an array 205 of at least n x n storage locations 200, where n is an integer greater than 1.
  • n is 16 which implies that the granularity of access to the storage locations 200 is 1/16th of the total storage in either horizontal or vertical array directions. This aspect will be discussed further below.
  • Example access as linear arrays: from the point of view of the processing circuitry, the array of n x n locations is accessible as n linear (one-dimensional) arrays in a first direction (for example, a horizontal direction as drawn) and n linear arrays in a second array direction (for example, a vertical direction as drawn).
  • Each linear array has n elements so that each of the storage arrays stores a linear array of n data items.
  • the n x n storage locations are arranged or at least accessible, from the point of view of the processing circuitry 60, as 2n linear arrays, each of n data items.
  • the array of n x n storage locations comprises an array of storage elements accessible by the instruction processing circuitry as 2n linear arrays, the 2n linear arrays comprising n linear arrays in the first array direction and n linear arrays in the second array direction, each linear array containing n data items (for example, though this is not a requirement, as a data vector register).
  • Example instructions discussed below may specify one or more of the 2n linear arrays.
  • access, for a vector of n vector elements is made to a set of n storage locations each having a respective array location in the array register, the array location accessed for a given vector element of the vector being defined by one or more coordinates associated with the given vector element by one or more parameters of the array access instruction.
  • a parameter may be, for example, a reference to an index vector or to a register storing such a vector of indices.
  • the array location accessed for the given vector element of the vector may be defined by at least a pair of coordinates associated with the given vector element of the vector by parameters of the array access instruction.
  • the pair of coordinates may define, for the given vector element of the vector, an array location in each of a first array direction and a second array direction different to the first array direction (for example, x and y directions as presented schematically below).
  • the array location accessed for the given vector element of the vector may be defined by a coordinate in a first array direction dependent upon a vector position of the given vector element, and a coordinate in a second array direction different to the first array direction defined by a parameter of the array access instruction.
  • This arrangement applies to the examples (such as that shown in Figure 7) in which one index vector is provided in a two-dimensional system; the other index is implied or provided by the vector element position.
  • the second array direction may be orthogonal to the first array direction. Examples of horizontal rows and vertical columns are discussed here.
  • the instruction decoder circuitry may be configured to select the first array direction and the second array direction from two candidate array directions in response to a parameter of the array access instruction.
  • For example, the “HV” parameter discussed below can be used as such a parameter of the array access instruction.
  • Predicated control of operation may optionally be used, in which the instruction processing circuitry is responsive to one or more sets of predicates associated with respective vector elements to control accessing of the array register in respect of the respective vector elements.
  • the array of storage locations 200 is accessible by access circuitry 210, 220, column selection circuitry 230 and row selection circuitry 240, under the control of control circuitry 250 in communication with at least the processing circuitry and optionally with the decoder circuitry 50.
  • the array access instruction may comprise an instruction selected from the list consisting of: a vector storage instruction to store data items to respective locations in the array register; and a vector retrieval instruction to retrieve data items from respective locations in the array register.
  • the vector storage instruction may comprise a first vector storage instruction to store vector elements of an input data vector to respective locations in the array register; or a second vector storage instruction to store data items of a set of memory locations of the main memory to respective locations in the array register.
  • the vector retrieval instruction may comprise an instruction selected from the list consisting of: a first vector retrieval instruction to retrieve vector elements of an output data vector from respective locations in the array register; and a second vector retrieval instruction to retrieve data items to a set of memory locations of the main memory from respective locations in the array register.
  • Each of these types of access may be selectable by the use of a separate respective instruction or op-code, and/or by the use of respective parameters of an instruction.
  • a single instruction may provide for any of these accesses, with the direction (write or read as set out above) being defined by an instruction parameter, and with the source/destination (vector register or set of memory locations) being defined and/or identified by another parameter.
  • the writing to or reading from the register 260 or the set 270 of locations can be performed serially, for example one data item or element at a time in a predefined order, in parallel (all at substantially the same time) or in groups of, for example, 4 or 8 data items or elements in parallel.
  • the routing of data items to or from vector elements or memory locations can be under the control of the access circuitry 210, 220 and/or the control circuitry 250.
  • the n linear arrays in the first direction (a horizontal or “H” direction as drawn), in the case of an example storage array 90 designated as “A1”, are each of 16 data items 0...F (in hexadecimal notation) and may be referenced in this example as A1H0...A1H15.
  • There could be more than one such storage array 90 implemented, for example A0, A1, A2 and so on.
  • the same underlying data, stored in the 256 entries (16 x 16 entries) of the storage array 90 A1 of Figure 3 may instead be referenced in the second direction (a vertical or “V” direction as drawn) as A1V0...A1V15.
  • a data item 260 is referenced as item F of A1H0 but item 0 of A1V15.
  • The use of H and V does not imply any spatial or physical layout requirement relating to the storage of the data elements making up the storage array 90, nor does it have any relevance to whether the storage arrays store row or column data in an example application involving matrix processing.
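The two ways of referencing the same storage can be sketched as follows. This is only an illustrative model: it assumes that AxHm corresponds to row m and AxVm to column m of a nested Python list, and the helper names `make_array` and `linear_array` are hypothetical.

```python
from typing import List

N = 16  # example array dimension used in the text


def make_array(n: int) -> List[List[str]]:
    """Fill each location with a 'row,column' tag (hypothetical contents)."""
    return [[f"{r},{c}" for c in range(n)] for r in range(n)]


def linear_array(array: List[List[str]], direction: str, m: int) -> List[str]:
    """Return linear array m in direction 'H' (row m) or 'V' (column m)."""
    if direction == "H":
        return list(array[m])
    if direction == "V":
        return [row[m] for row in array]
    raise ValueError("direction must be 'H' or 'V'")


A1 = make_array(N)
# Item F of A1H0 and item 0 of A1V15 name the same storage location.
print(linear_array(A1, "H", 0)[0xF] == linear_array(A1, "V", 15)[0])  # True
```

The nested list is purely a software convenience; as the text notes, the H and V labels imply nothing about the physical layout of the storage.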
  • second array direction (for example vertical or horizontal as drawn in Figure 2)
  • first array direction (for example horizontal or vertical respectively as drawn).
  • In order to access one of the linear arrays A1H0 ... A1H15 in the first direction, for example the horizontal direction as drawn, reference is made to Figure 4 in which an arbitrary linear array A1Hm 300 (where m is an arbitrary number between 0 and 15 in this example) is being accessed.
  • the row selection circuitry 240 is controlled by the control circuitry 250 to select the row of storage locations corresponding to the linear array 300, and the access circuitry 210 controls access (input or output) of individual data items 310 of the linear array 300 to be provided via an interface 320 to the processing circuitry.
  • the column selection circuitry 230 selects the column of storage elements corresponding to the linear array 400 and data is read (output) via the access circuitry 220 to be interfaced with the processing circuitry by an interface 410.
  • a linear array A1Hm represents 16 data items each of 32 bits. There are 16 such linear arrays, and each linear array A1Vm in the second array direction also has 16 entries of 32 bits. Instead, however, this storage could be arranged as (say) a vector of 64 data items of 8 bits in each direction. In other words, the granularity of access to the storage which provides the storage array 90 could be a granularity of 8 bits rather than a granularity of 32 bits. However, in the present examples, the granularity and the number of data items in each linear array in the first and second directions should be the same (16 in the first example, 64 in the second example).
  • an example architecture may support a scalable or processor-selectable vector length as discussed above, for example with the processor 20 maintaining a variable VL indicative of the vector length in use.
  • VL is established or selected using techniques defined by the SVE / SVE2 arrangements discussed above. So while in this particular example an example vector length of 512 bits is used, in general, A1Hm represents (VL/32) items each of 32 bits, or even more generally (VL/ELEM_SIZE()) items of ELEM_SIZE() bits.
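The relationship between vector length and the number of items per linear array is simple arithmetic, sketched below. The function name `items_per_linear_array` is a hypothetical helper; the values 512, 32 and 8 are the examples given in the text.

```python
def items_per_linear_array(vl_bits: int, elem_size_bits: int) -> int:
    """Number of data items in one linear array: VL / ELEM_SIZE."""
    if vl_bits <= 0 or elem_size_bits <= 0:
        raise ValueError("sizes must be positive")
    if vl_bits % elem_size_bits != 0:
        raise ValueError("vector length must be a multiple of the element size")
    return vl_bits // elem_size_bits


print(items_per_linear_array(512, 32))  # 16
print(items_per_linear_array(512, 8))   # 64
```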
  • the instruction processing circuitry 60 is configured to store an input vector or linear array to the array of storage locations as a group (A1Hm) of n storage locations arranged in the first array direction; and is responsive to a data retrieval instruction, to retrieve, as a linear array, a set of n storage locations arranged in an array direction (A1Hm or A1Vm for example) selected, under control of the data retrieval instruction, from the set of candidate array directions; and the first array direction is a predetermined array direction (for example, horizontal as drawn).
  • the examples discussed above relate to accessing multi-dimensional storage arrays such as two-dimensional storage arrays in either the horizontal or the vertical directions so as to store or retrieve linear arrays with respect to the two-dimensional storage arrays.
  • Examples of the present techniques provide further instructions or variants of instructions which, when decoded and executed, provide for the use of one or more vectors of indices to perform indexed accesses by row (horizontal) and/or column (vertical) in order to store or gather individual elements from the storage array into (or from) a destination/source vector in a register or in memory.
  • the apparatus of Figures 1 and 2 operating in accordance with the techniques described below, provides an example of data processing apparatus 10 comprising: vector processing circuitry 20 to access an array register having at least n x n storage locations, where n is an integer greater than one, the vector processing circuitry comprising: instruction decoder circuitry 50 to decode program instructions; and instruction processing circuitry 60 to execute instructions decoded by the instruction decoder circuitry; in which the instruction decoder circuitry 50 is responsive to an array access instruction, to control the instruction processing circuitry 60 to access, for a vector of n vector elements, a set of n storage locations each having a respective array location in the array register, the array location accessed for a given vector element of the vector being defined by one or more coordinates associated with the given vector element by one or more parameters of the array access instruction.
  • Figure 6 provides a representation of an example 4 x 4 storage array (where the array size of 4 x 4 is a simplification for the purposes of clarity of the diagrams, bearing in mind that an actual array size may be somewhat larger than 4 x 4 in a practical embodiment) in which the array is identified as ZA0; rows are identified by “H” plus an index n, m, p, q (for example 0-3); and columns are identified by “V” plus an index.
  • Figure 7 schematically illustrates a move (MOV) instruction which, as illustrated, moves data from the two-dimensional ZA0 array to a one-dimensional vector register Zd, according to a vector Zc of indices applicable to the vertical direction as drawn.
  • each element is derived from a respective column ZA0V of the array ZA0 in Figure 7.
  • the index vector Zc defines a respective vertical coordinate or row from which to extract the data item applicable to that column. So, for the column ZA0Vn, the respective index element is n so that the element a is extracted from the row ZA0Hn. For the next-adjacent column ZA0Vm, the index is q so that the element b is extracted from the row ZA0Hq, and so on.
  • Each of these operations is subject to predication so that a predicate vector defines whether, for each processing lane (corresponding to elements of Zd) the operation takes place as shown.
  • the predication arrangement can allow for that element of Zd to be zeroed or left unaltered.
  • the destination vector Zd is assembled as [a b c d] by gathering four respective data values from storage locations defined by a horizontal position applicable to the position within the destination vector of the respective destination vector element and a vertical position defined by the respective element of the index vector Zc.
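The gather just described can be modelled in a few lines. This is a sketch under stated assumptions, not the instruction's definition: the array is a nested list indexed as `za[row][column]`, the name `mova_gather` and its `zeroing` flag are hypothetical, and for the "H" case the roles of lane and index are assumed to be swapped relative to "V".

```python
from typing import List, Sequence


def mova_gather(za: Sequence[Sequence[int]],
                zc: Sequence[int],
                pg: Sequence[bool],
                zd_prev: Sequence[int],
                direction: str = "V",
                zeroing: bool = False) -> List[int]:
    """Gather one element per lane from a square array (hypothetical model).

    direction "V": lane position gives the column, zc[lane] gives the row.
    direction "H": lane position gives the row, zc[lane] gives the column
                   (assumed by symmetry with the vertical case).
    """
    if direction not in ("H", "V"):
        raise ValueError("direction must be 'H' or 'V'")
    result = []
    for lane, (index, active) in enumerate(zip(zc, pg)):
        if not active:
            # Inactive lane: zero it or keep the previous destination value.
            result.append(0 if zeroing else zd_prev[lane])
        elif direction == "V":
            result.append(za[index][lane])
        else:
            result.append(za[lane][index])
    return result


# Hypothetical 4 x 4 contents: value 10*row + column.
za0 = [[10 * r + c for c in range(4)] for r in range(4)]
zd = mova_gather(za0, zc=[0, 3, 1, 2], pg=[True] * 4, zd_prev=[0] * 4)
print(zd)  # [0, 31, 12, 23]
```

With the second lane made inactive, the same call would return either 0 or the previous destination value in that lane, depending on the `zeroing` flag.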
  • MOVA represents an array move instruction.
  • Zd is a destination vector register.
  • B indicates a byte format in this example.
  • Pg is a predicate register which controls operation for each vector lane.
  • M is a modifier relating to the predicate operation, for example defining whether an inactive predicate indicates that the element at that lane should be set to zero or maintained at its previous value.
  • ZAt defines the array in use (such as ZA0 as drawn).
  • H or V defines whether the indexed access is in a horizontal or vertical direction.
  • B indicates a byte format.
  • Zc is a vector of indices.
  • Figure 9 provides an example using two index vectors stored by respective vector registers Zb, Zc to access the array ZA0.
  • a destination register Zd is populated by [a b e f] using vertically indexed access to the array ZA0 according to the index vector Zb.
  • Using MOVA Ze.B, Pg/M, ZA0V.B[Zc], a destination register Ze is populated by [c d g h] using vertically indexed access to the array ZA0 according to the index vector Zc.
  • a previously proposed primary zip operation populates a vector register Zf
  • the ZIP1 instruction reads adjacent vector elements from the lower half of two source registers (in this case Zd and Ze) as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination register Zf.
  • the first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
  • a previously proposed secondary zip operation populates a vector register Zg.
  • the ZIP2 operation reads adjacent vector elements from the upper half of two source vector registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination vector register.
  • the first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
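The pairwise zip operations described above can be sketched as a single helper. This models only the behaviour stated in the text (pairs taken from the lower or upper half and interleaved, first source first); the function name `zip_pairs` is hypothetical, and the letters stand in for the data items a to h of the Figure 9 example.

```python
from typing import List, Sequence


def zip_pairs(first: Sequence[str], second: Sequence[str], upper: bool) -> List[str]:
    """Interleave pairs from the lower (upper=False) or upper half of two sources."""
    n = len(first)
    if n != len(second) or n % 4 != 0:
        raise ValueError("sources must have equal length, a multiple of 4")
    half = n // 2
    start = half if upper else 0
    out: List[str] = []
    for i in range(start, start + half, 2):
        out.extend(first[i:i + 2])   # pair from the first source register
        out.extend(second[i:i + 2])  # then the pair from the second source
    return out


zd = ["a", "b", "e", "f"]
ze = ["c", "d", "g", "h"]
zf = zip_pairs(zd, ze, upper=False)  # lower halves, as described for ZIP1
zg = zip_pairs(zd, ze, upper=True)   # upper halves, as described for ZIP2
print(zf, zg)  # ['a', 'b', 'c', 'd'] ['e', 'f', 'g', 'h']
```

Under this reading, the two gathers followed by the two zips reassemble the items in the order a to d and e to h.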
  • a potential use of such instructions could be the extraction of small patches from a storage array, for example to provide processing for a patch of pixels from an image, or to extract the rows of small matrices produced by predicated outer product instructions, for example those smaller than the streaming vector length divided by the element size.
  • a move operation could be provided from a source vector register into the storage array, with the storage array locations to which data is moved from respective vector elements of the source vector register being defined by a horizontally or vertically indexed access using these same techniques.
  • the syntax may be similar to that discussed above, but using a “store” command with the destination being defined by indexed access into an array (an example being <ZAt><HV>.B[<Zc>] as used above) and the source being defined by a vector register (an example being <Zd>.B as used above).
  • predication can be used in the same manner as described above.
  • the move instruction as discussed can operate in either direction between a vector register and a set of array locations, with the sense or direction of the operation (reading from the array locations or writing to the array locations) being defined by the ordering of the operands defining the origin and destination of the data (destination defined first, then origin, in the example syntax shown here).
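The opposite direction, moving from a vector register into indexed array locations, can be sketched in the same style. As before this is an illustrative model with assumed conventions: `za[row][column]` indexing, a hypothetical function name `mova_scatter`, and inactive lanes leaving the array untouched.

```python
from typing import List, Sequence


def mova_scatter(za: List[List[int]],
                 zd: Sequence[int],
                 zc: Sequence[int],
                 pg: Sequence[bool],
                 direction: str = "V") -> List[List[int]]:
    """Write one element per active lane into the array, in place (hypothetical model)."""
    if direction not in ("H", "V"):
        raise ValueError("direction must be 'H' or 'V'")
    for lane, (value, index, active) in enumerate(zip(zd, zc, pg)):
        if not active:
            continue  # predicated off: the array location is not written
        if direction == "V":
            za[index][lane] = value  # lane gives the column, index gives the row
        else:
            za[lane][index] = value  # lane gives the row, index gives the column
    return za


array = [[0] * 4 for _ in range(4)]
mova_scatter(array, zd=[1, 2, 3, 4], zc=[0, 3, 1, 2], pg=[True, True, False, True])
print(array)  # [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 4], [0, 2, 0, 0]]
```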
  • a store operation to be described below is always from a set of array locations to memory. Another variant is illustrated by the following example instruction:
  • ST1W is a word-based store instruction.
  • the suffix .S indicates a data element size in the vector, being selected from a list which may include (for example):
  • the data destination is defined in the same way as discussed above, in that <ZAt><HV>.S[<Zc>] defines access into the array ZAt, either horizontally (H) or vertically (V) indexed by an index vector stored in Zc.
  • Optional predication is provided by <Pg>.
  • [<Xn|SP>{, <Xm>, LSL #2}] provides a known definition (in the context of the SVE system) of a corresponding set of memory locations, by defining a base address (Xn or SP), then an offset (Xm) defined as a number of elements.
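The memory operand can be read as a base address plus an element offset scaled by the element size. The sketch below computes the byte addresses such a store would touch, assuming (this is an assumption, not stated in the text) that LSL #2 scales the offset by four bytes for 32-bit words and that the n elements occupy contiguous memory. The function name `store_addresses` is hypothetical.

```python
from typing import List


def store_addresses(base: int, offset_elems: int, n: int, elem_bytes: int = 4) -> List[int]:
    """Byte address of each of n contiguous elements (illustrative model only)."""
    if elem_bytes <= 0 or elem_bytes & (elem_bytes - 1):
        raise ValueError("element size must be a power of two")
    shift = elem_bytes.bit_length() - 1       # 4-byte words give a shift of 2
    start = base + (offset_elems << shift)    # offset counted in elements
    return [start + lane * elem_bytes for lane in range(n)]


print([hex(a) for a in store_addresses(0x1000, 3, 4)])
# ['0x100c', '0x1010', '0x1014', '0x1018']
```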
  • the instruction decoder circuitry is responsive to an array access instruction (such as the move or store instructions discussed above), to control the instruction processing circuitry to access (for example read from or write to), for a vector of n vector elements (whether embodied in a vector register or in memory), a set of n storage locations each having a respective array location in the array register, the array location accessed for a given vector element of the vector being defined by one or more coordinates associated with the given vector element by one or more parameters of the array access instruction.
  • parameters include <HV>.B[<Zc>], namely the index vector, for example in conjunction with the definition <HV>
  • any of the techniques discussed above may be used in the context of two-dimensional indexing.
  • An example is illustrated schematically in Figure 10, in which, for providing access to an example storage array ZA0, a pair of index vectors is provided by respective vector registers Zx, Zy.
  • the pair of index vectors is accessed at that lane or position to generate a pair of (X, Y) coordinates defining an array storage location to be accessed (read from or stored to) relating to that lane or position.
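Two-dimensional indexing can be sketched as a per-lane lookup of an (X, Y) pair. The model below assumes that X selects the column and Y selects the row of a nested list indexed as `za[row][column]`; the function name `gather_2d` and the predication handling are hypothetical.

```python
from typing import List, Sequence


def gather_2d(za: Sequence[Sequence[int]],
              zx: Sequence[int],
              zy: Sequence[int],
              pg: Sequence[bool],
              zd_prev: Sequence[int]) -> List[int]:
    """Read za at (X, Y) = (zx[lane], zy[lane]) for each active lane."""
    result = []
    for lane, (x, y, active) in enumerate(zip(zx, zy, pg)):
        # Inactive lanes keep the previous destination value in this sketch.
        result.append(za[y][x] if active else zd_prev[lane])
    return result


# Hypothetical 4 x 4 contents: value 10*row + column.
za0 = [[10 * r + c for c in range(4)] for r in range(4)]
out = gather_2d(za0, zx=[0, 1, 2, 3], zy=[3, 3, 0, 1], pg=[True] * 4, zd_prev=[0] * 4)
print(out)  # [30, 31, 2, 13]
```

Unlike the single-index case, no coordinate is implied by the lane position here: any lane may address any location of the array.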
  • the syntax may be similar to that discussed above, except that both the H and V parameters may be set (to indicate indexing in both directions) and the pair of index registers defined as the source of the indexing information:
  • the arrays can be implemented as any one or more of the following: architecturally-addressable registers; non-architecturally-addressable registers; a scratchpad memory; and a cache.
  • one or more coordinates of storage locations to be accessed in respect of a vector element position or lane can be defined by an entry in an index vector, zero or more coordinates can be implied by the vector element position or lane, and zero or more coordinates can be specified by one or more parameters of the access instruction.
  • FIG. 11 schematically illustrates a virtual machine by which some or all of the functionality discussed above may be provided.
  • the virtual machine comprises a central processing unit (CPU) as an example of data processing circuitry 1100, a non-volatile memory 1110, a control interface 1120 and an input/output (IO) interface 1130, all interconnected by a bus arrangement 1140.
  • a random access memory (RAM) 1150 stores program instructions providing software 1160 to control operations of the CPU 1100. Under the control of the software 1160, the CPU 1100 provides or emulates the functionality of one or more of the processing instructions discussed above.
  • the RAM 1150 also stores program instructions 1170 and data 1180, where the program instructions 1170 are instructions applicable to the processor 20 of Figure 1 and which are interpreted, emulated or otherwise executed by the CPU 1100 acting as a virtual machine.
  • the data 1180 is data corresponding to the data 32 of Figure 1 to be acted upon by (virtual) execution of the program instructions 1170.
  • the arrangement of Figure 11 therefore provides an example of a virtual machine comprising a data processor (such as the CPU 1100) to execute a computer program comprising machine readable instructions (for example the software 1160), in which execution of the computer program causes the data processor to operate as a data processing apparatus of the type described above.
  • Example embodiments are also represented by computer software which, when executed by a computer, causes the computer to carry out one or more of the techniques described here including the method of Figure 12 discussed below, and by a non-transitory machine readable storage medium which stores such computer software.
  • the techniques described above may be implemented by the processing circuitry (which may comprise or may control the control circuitry) causing the control circuitry to control the access and selection circuitry 210, 220, 230, 240 to access the appropriate elements in the storage array.
  • Figure 12 is a schematic flowchart illustrating a data processing method comprising: accessing (at a step 1200) an array register having at least n x n storage locations, where n is an integer greater than one, by: decoding (at a step 1210) program instructions; and executing (at a step 1220) instructions decoded by the decoding step; in which the decoding step 1210 is responsive to an array access instruction, to control the executing step 1220 to access, for a vector of n vector elements, a set of n storage locations each having a respective array location in the array register, the array location accessed for a given vector element of the vector being defined by one or more coordinates associated with the given vector element by one or more parameters of the array access instruction.
  • the words “configured to...” are used to mean that an element of an apparatus has a configuration able to carry out the defined operation.
  • a “configuration” means an arrangement or manner of interconnection of hardware or software.
  • the apparatus may have dedicated hardware which provides the defined operation, or a processor or other processing device may be programmed to perform the function. “Configured to” does not imply that the apparatus element needs to be changed in any way in order to provide the defined operation.
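
The array access behaviour described above (one coordinate per lane taken from an index vector, the other either implied by the lane or taken from a second index vector, with optional predication) can be sketched in software as follows. This is an illustrative Python model only, not the hardware or the claimed implementation. The function names, the (row, column) convention, the choice that H takes the row from the index vector and the column from the lane (with V the reverse), the reading of (X, Y) as (column, row), and the treatment of inactive lanes as zero on reads are all assumptions made for this sketch.

```python
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[int, int]  # (row, column) of one storage location in the array


def coords_1d(index_vec: Sequence[int], direction: str) -> List[Coord]:
    """One index vector: each lane's entry gives one coordinate and the lane
    number implies the other. The H/V-to-row/column mapping is an assumption."""
    coords = []
    for lane, idx in enumerate(index_vec):
        if direction == "H":
            coords.append((idx, lane))  # assumed: row from index, column from lane
        elif direction == "V":
            coords.append((lane, idx))  # assumed: row from lane, column from index
        else:
            raise ValueError("direction must be 'H' or 'V'")
    return coords


def coords_2d(zx: Sequence[int], zy: Sequence[int]) -> List[Coord]:
    """Pair of index vectors: lane i accesses (X, Y) = (zx[i], zy[i])."""
    if len(zx) != len(zy):
        raise ValueError("index vectors must have the same number of lanes")
    return [(y, x) for x, y in zip(zx, zy)]  # stored as (row=Y, column=X)


def _check_bounds(array: List[List[int]], coords: Sequence[Coord]) -> None:
    """Reject coordinates outside the array (Python would otherwise wrap negatives)."""
    for r, c in coords:
        if not (0 <= r < len(array) and 0 <= c < len(array[r])):
            raise IndexError(f"coordinate ({r}, {c}) is outside the array")


def read_array(array: List[List[int]], coords: Sequence[Coord],
               pred: Optional[Sequence[bool]] = None) -> List[int]:
    """Gather one element per lane; inactive lanes read as 0 (an assumption)."""
    _check_bounds(array, coords)
    return [array[r][c] if pred is None or pred[lane] else 0
            for lane, (r, c) in enumerate(coords)]


def write_array(array: List[List[int]], coords: Sequence[Coord],
                vector: Sequence[int],
                pred: Optional[Sequence[bool]] = None) -> None:
    """Scatter one element per lane; inactive lanes leave the array unchanged."""
    _check_bounds(array, coords)
    for lane, (r, c) in enumerate(coords):
        if pred is None or pred[lane]:
            array[r][c] = vector[lane]


if __name__ == "__main__":
    n = 4
    za = [[10 * r + c for c in range(n)] for r in range(n)]  # value encodes (row, column)
    zc = [3, 0, 2, 1]
    print(read_array(za, coords_1d(zc, "H")))                      # [30, 1, 22, 13]
    print(read_array(za, coords_1d(zc, "V")))                      # [3, 10, 22, 31]
    print(read_array(za, coords_2d([0, 1, 2, 3], [3, 3, 0, 0])))   # [30, 31, 2, 3]
```

In this model every lane resolves to exactly one (row, column) location, so a vector of n elements always touches a set of n storage locations, whichever of the three coordinate sources (index vector entry, lane position, or second index vector) supplies each coordinate.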

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Complex Calculations (AREA)
  • Executing Machine-Instructions (AREA)
PCT/GB2022/053215 2022-01-19 2022-12-14 Data processing Ceased WO2023139344A1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
KR1020247027330A KR20240132511A (ko) 2022-01-19 2022-12-14 Data processing
EP22829827.9A EP4466605B1 (en) 2022-01-19 2022-12-14 Data processing
JP2024541131A JP2025502112A (ja) 2022-01-19 2022-12-14 Data processing
CN202280088688.6A CN118541670A (zh) 2022-01-19 2022-12-14 Data processing
US18/729,148 US20250181352A1 (en) 2022-01-19 2022-12-14 Data processing
IL313629A IL313629A (en) 2022-01-19 2022-12-14 Data processing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB2200675.3A GB2614886B (en) 2022-01-19 2022-01-19 Data processing
GB2200675.3 2022-01-19

Publications (1)

Publication Number Publication Date
WO2023139344A1 true WO2023139344A1 (en) 2023-07-27

Family

ID=80448744

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2022/053215 Ceased WO2023139344A1 (en) 2022-01-19 2022-12-14 Data processing

Country Status (9)

Country Link
US (1) US20250181352A1
EP (1) EP4466605B1
JP (1) JP2025502112A
KR (1) KR20240132511A
CN (1) CN118541670A
GB (1) GB2614886B
IL (1) IL313629A
TW (1) TW202344983A
WO (1) WO2023139344A1

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190250915A1 (en) * 2016-04-26 2019-08-15 Onnivation, LLC Computing Machine Using a Matrix Space For Matrix and Array Processing
EP3629154A2 (en) * 2018-09-27 2020-04-01 INTEL Corporation Systems for performing instructions to quickly convert and use tiles as 1d vectors
US20210042261A1 (en) * 2019-08-05 2021-02-11 Arm Limited Data processing
GB2594971A (en) * 2020-05-13 2021-11-17 Advanced Risc Mach Ltd Variable position shift for matrix processing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170337156A1 (en) * 2016-04-26 2017-11-23 Onnivation Llc Computing machine architecture for matrix and array processing

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190250915A1 (en) * 2016-04-26 2019-08-15 Onnivation, LLC Computing Machine Using a Matrix Space For Matrix and Array Processing
EP3629154A2 (en) * 2018-09-27 2020-04-01 INTEL Corporation Systems for performing instructions to quickly convert and use tiles as 1d vectors
US20210042261A1 (en) * 2019-08-05 2021-02-11 Arm Limited Data processing
US11074214B2 (en) 2019-08-05 2021-07-27 Arm Limited Data processing
GB2594971A (en) * 2020-05-13 2021-11-17 Advanced Risc Mach Ltd Variable position shift for matrix processing

Also Published As

Publication number Publication date
TW202344983A (zh) 2023-11-16
IL313629A (en) 2024-08-01
EP4466605A1 (en) 2024-11-27
GB2614886B (en) 2025-03-26
KR20240132511A (ko) 2024-09-03
JP2025502112A (ja) 2025-01-24
GB2614886A (en) 2023-07-26
EP4466605B1 (en) 2026-03-04
US20250181352A1 (en) 2025-06-05
CN118541670A (zh) 2024-08-23

Similar Documents

Publication Publication Date Title
US11321092B1 (en) Tensor-based memory access
US12235773B2 (en) Two address translations from a single table look-aside buffer read
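CN115904501A (zh) Streaming engine with multi-dimensional circular addressing selectable at each dimension
JP2023525811A (ja) Variable position shift for matrix processing
TWI844714B (zh) Data processing
JP2023525812A (ja) Masking row or column positions for matrix processing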
EP4466605B1 (en) Data processing
US20250173148A1 (en) Technique for handling data elements stored in an array storage
US20250173146A1 (en) Technique for handling data elements stored in an array storage
US20260079714A1 (en) Data processing array
KR102673748B1 (ko) Multi-dimensional direct memory access controller and computer system including the same
KR20250089097A (ko) Memory device and operating method thereof
EP4537228A1 (en) Technique for performing outer product operations

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 22829827; Country of ref document: EP; Kind code of ref document: A1
WWE WIPO information: entry into national phase. Ref document number: 313629; Country of ref document: IL
WWE WIPO information: entry into national phase. Ref document number: 2024541131; Country of ref document: JP
WWE WIPO information: entry into national phase. Ref document number: 202280088688.6; Country of ref document: CN
WWE WIPO information: entry into national phase. Ref document number: 18729148; Country of ref document: US
WWE WIPO information: entry into national phase. Ref document number: 202417054347; Country of ref document: IN
ENP Entry into the national phase. Ref document number: 20247027330; Country of ref document: KR; Kind code of ref document: A
WWE WIPO information: entry into national phase. Ref document number: 2022829827; Country of ref document: EP
NENP Non-entry into the national phase. Ref country code: DE
ENP Entry into the national phase. Ref document number: 2022829827; Country of ref document: EP; Effective date: 20240819
WWP WIPO information: published in national office. Ref document number: 18729148; Country of ref document: US
WWG WIPO information: grant in national office. Ref document number: 2022829827; Country of ref document: EP