WO2023104143A1 - Vector shuffling method, processor, and electronic device - Google Patents

Vector shuffling method, processor, and electronic device

Info

Publication number
WO2023104143A1
WO2023104143A1 PCT/CN2022/137500 CN2022137500W WO2023104143A1 WO 2023104143 A1 WO2023104143 A1 WO 2023104143A1 CN 2022137500 W CN2022137500 W CN 2022137500W WO 2023104143 A1 WO2023104143 A1 WO 2023104143A1
Authority
WO
WIPO (PCT)
Prior art keywords
source
elements
register
index value
shuffling
Prior art date
Application number
PCT/CN2022/137500
Inventor
汪文祥
Original Assignee
龙芯中科技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 龙芯中科技术股份有限公司
Publication of WO2023104143A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 - Digital computers in general; Data processing equipment in general
    • G06F15/76 - Architectures of general purpose stored program computers
    • G06F15/80 - Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 - Arrangements for executing machine instructions, e.g. instruction decode

Definitions

  • the present application relates to the field of computer technology, in particular to a vector shuffling method, processor and electronic equipment.
  • SIMD: Single Instruction, Multiple Data
  • shuffling instructions are generally introduced in SIMD processors, and different shuffling instructions can meet different requirements.
  • the operation method is relatively complicated and reduces the execution efficiency of specific functions.
  • the present application provides a vector shuffling method, processor and electronic equipment to solve the problem in the prior art that multiple instructions are required to implement a series of operations, the operation mode is relatively complicated, and the execution efficiency of specific functions is reduced.
  • the present application discloses a vector shuffling method, the method comprising:
  • An instruction is received, the instruction including a register identifier and a shuffling parameter; wherein the register identifier includes a source register identifier and a destination register identifier; the source register identifier is used to represent a source register, and the source register is a register storing the source elements operated on when the vector shuffling operation is performed; the destination register identifier is used to represent the destination register, and the destination register is a register storing the target element obtained after the vector shuffling operation is performed; the shuffling parameter is used to indicate the parameters according to which the vector shuffling operation is performed on the source elements;
  • the destination element is written to the destination register.
  • a processor including:
  • the plurality of vector registers including source registers and target registers, the source registers are used to store data elements
  • the decoding unit is used to decode the vector shuffling instruction; wherein, the vector shuffling instruction includes: a register identifier and a shuffling parameter, and the register identifier includes a source register identifier and a destination register identifier;
  • the execution unit, in response to the vector shuffling instruction, performs a vector shuffling operation on the source elements obtained from the source register according to the shuffling parameter, obtains the target element after the vector shuffling operation, and writes the target element to the destination register.
  • an electronic device including a memory and one or more programs, wherein one or more programs are stored in the memory and configured to be executed by one or more processors.
  • The one or more programs include instructions for performing one or more of the described vector shuffling methods.
  • the present application includes the following advantages:
  • the vector shuffling method, processor, and electronic device provided in the embodiments of the present application can perform a vector shuffling operation on the elements obtained in the source register by adding register identifiers and shuffling parameters to the instruction, and combining the shuffling parameters.
  • a vector shuffling operation of a specific function can be realized by one instruction, and the specific function does not need to be realized by multiple instructions performing the shuffling operation, thereby improving the execution efficiency of the specific function.
  • FIG. 1 is a flow chart of the steps of a vector shuffling method provided in Embodiment 1 of the present application;
  • FIG. 2 is a flow chart of the steps of a vector shuffling method provided in Embodiment 2 of the present application;
  • FIG. 3 is a flow chart of the steps of a vector shuffling method provided in Embodiment 3 of the present application;
  • FIG. 4 is a flow chart of the steps of a vector shuffling method provided in Embodiment 4 of the present application.
  • FIG. 5 is a flow chart of the steps of a vector shuffling method provided in Embodiment 5 of the present application.
  • FIG. 6 is a flow chart of the steps of a vector shuffling method provided in Embodiment 6 of the present application.
  • FIG. 7 is a structural block diagram of a processor provided in an embodiment of the present application.
  • FIG. 8 is a structural block diagram of an electronic device provided by an embodiment of the present application.
  • Embodiments of the present application apply to any processor or machine that performs data manipulation.
  • the present application is not limited to processors or machines that perform operations on 256-bit, 128-bit, 64-bit, 32-bit, or 16-bit data, but applies to any processor and machine in which manipulation of packed data is required.
  • the methods of the present application are embodied as machine-executable instructions.
  • the instructions can be used to cause a general or special purpose processor programmed with these instructions to perform the steps of the application.
  • the steps of the present application may be performed by dedicated hardware components containing hard-wired logic for performing the steps, or by any combination of programmed computer components and custom hardware components.
  • the software may be stored in memory in the system.
  • FIG. 1 shows a flow chart of steps of a vector shuffling method provided by an embodiment of the present application.
  • the vector shuffling method provided by the embodiment of the present application can be executed by a CPU (central processing unit) and comprises the following steps:
  • Step 101 Receive an instruction, where the instruction includes: a register identifier and a shuffling parameter.
  • the instruction refers to an instruction for performing a vector shuffling operation, and the instruction is an instruction for the CPU to execute.
  • the CPU may receive an instruction for executing the vector shuffling operation, and the instruction includes register identifiers and shuffling parameters.
  • the register identification may include: a source register identification and a destination register identification, and the source register identification is used to characterize the source register, and the source register is a register storing the source element operated when performing the vector shuffling operation;
  • the source element to be operated can be all the data stored in the source register, or part of the data stored in the source register.
  • the destination register identifier is used to represent the destination register, and the destination register is a register that stores the target element obtained after performing the vector shuffling operation.
  • the number of source registers can be one or two, that is, source elements come from one or two registers.
  • the number of source registers can be determined according to business requirements. This is not limited.
  • the shuffling parameters can be used to indicate the parameters on which the vector shuffling operation is performed on the source elements.
  • the shuffling parameters can include parameters such as an index value and an opcode; optionally, the index value is passed as an immediate value; the opcode is a code represented in binary, or an identifier that can be converted into a binary code.
  • step 102 is executed.
  • Step 102 Execute the instruction to perform a vector shuffling operation on the source elements obtained from the source register according to the shuffling parameter, and obtain a target element after the vector shuffling operation.
  • the target element refers to an element obtained after performing a vector shuffling operation on elements in the source register.
  • the instruction may be executed by the CPU to perform the vector shuffling operation on the source elements obtained from the source register according to the shuffling parameters, and to obtain the target element after the vector shuffling operation.
  • Step 103 Write the target element into the destination register.
  • after the target element of the vector shuffling operation is obtained, the target element may be written into the destination register.
  • when the vector shuffling operation is performed on the source elements according to the shuffling parameter, the method for obtaining the target element includes: selecting source elements from the source register according to the position information, in the source register, of the source elements required by the vector shuffling operation and the number of source elements required by the vector shuffling operation, so that all selected source elements are used as target elements.
  • the above step 102 may include:
  • Sub-step A1 According to the shuffling parameter, determine the position information of the source elements in the source register and the number of source elements required for the vector shuffling operation; wherein the number of selected source elements is one or more.
  • the shuffling parameters include parameters that can be used to indicate the position information of the source element in the source register and the number of source elements.
  • After the CPU receives the instruction for performing the vector shuffling operation, it may parse the instruction to obtain the shuffling parameters included in the instruction.
  • the position information in the source register of the source elements required for the vector shuffling operation, and the number of source elements required for the vector shuffling operation, can be determined according to the shuffling parameters; the number of selected source elements may be one or more, and the following examples use multiple source elements for illustration.
  • sub-step A2 is executed.
  • Sub-step A2 Select source elements from the source register according to the determined position information and the number of source elements.
  • the source element is selected from the source register.
  • sub-step A3 is executed.
  • Sub-step A3 Determine all the selected source elements as target elements.
  • all the selected source elements can be used as target elements to be written into the destination register.
  • the shuffling parameter may include an index value and an operation code, and the source element is selected through the index value and the operation code. Specifically, it may be described in detail in conjunction with the following specific implementation manners.
  • the index value is used to indicate the position information of each source element required by the vector shuffling operation in the source register;
  • the operation code is used to represent the source register and the destination register.
  • Sub-step B1 Determine a selection rule for obtaining source elements according to the index value and the operation code.
  • the selection rule refers to a constraint condition for reading source elements from a source register.
  • the selection rule for obtaining the source element from the source register can be determined according to the index value and the operation code included in the shuffling parameter. Specifically, it can be divided into the following two situations:
  • the method of grouping source elements can be determined according to the number of index values, and the selection rule for source elements can be determined according to the grouping method and the operation code; that is, the source elements in the source register are first grouped, for example by grouping N adjacent source elements, and the source elements are then selected from each group according to the index values; usually N is four, although N can also be determined according to the specific application scenario, such as the number of bits of the source register, and details are not described here.
  • the selection rule for obtaining the source elements can be determined according to the operation code.
  • After the selection rule for obtaining the source elements is determined according to the index value and the operation code, sub-step B2 is performed.
  • Sub-step B2 Obtain the source element indicated by each index value from the source register according to the selection rule.
  • the source element indicated by each index value can be respectively obtained from the source register according to the selection rule.
  • the CPU can determine whether the number of index values is the same as the number of source elements through the operation code of the vector shuffling instruction, that is, the CPU can determine the grouping and selection rules of the source elements according to the operation code.
  • the target element is written into the position corresponding to the immediate value in the destination register; that is, the position corresponding to each index value is determined in the destination register, and the source elements are stored in the determined positions in sequence.
  • any source element is obtained through a determined index value, and that source element is written to the address in the destination register corresponding to that index value; for example, source element A is obtained through the index value ui8[1:0] (where ui8 represents an immediate value that encodes a set of index values, and ui8[1:0] represents the number formed by the lowest 2 bits of the immediate value), and the index value ui8[1:0] corresponds to the lowest address in a group of addresses in the destination register.
  • the acquired source element A is written to the lowest address in the destination register as a target element.
  • the immediate value ui8 is a set of data including 8 bits, and the immediate value ui8 is used to construct 4 index values, and the number formed by every 2 bits of the ui8 is used as an index value.
  • the positions or serial numbers of these index values in ui8 indicate the element positions of the destination register to which the source elements acquired by the index values should be moved.
  • for example, the index value ui8[7:6] is the 4th index value in ui8, so its corresponding source operand will be written into the 4th element position of the destination register.
  • in general, ui8[n:n-1] is the ((n+1)/2)-th index value in ui8, and its corresponding source operand will be written to the ((n+1)/2)-th element position of the destination register.
  • the immediate value may include other numbers of index values.
  • the source operand corresponding to the i-th index value in the immediate value will be written into the i-th element position of the destination register, where i is a positive integer.
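  • As an illustration only, the splitting of an 8-bit immediate value into four 2-bit index values described above can be sketched in Python. The function name decode_index_values and its parameters count and width are hypothetical names chosen for this sketch and do not appear in the application.

```python
from typing import List


def decode_index_values(ui8: int, count: int = 4, width: int = 2) -> List[int]:
    """Split an immediate into `count` index values of `width` bits each.

    The result is ordered from the lowest bit field upward, so result[0]
    is ui8[1:0] and result[3] is ui8[7:6] for the default 8-bit case.
    """
    if not 0 <= ui8 < (1 << (count * width)):
        raise ValueError("immediate value does not fit in count * width bits")
    mask = (1 << width) - 1
    return [(ui8 >> (k * width)) & mask for k in range(count)]


# ui8 = 0b00011011: the fields from high to low are 00, 01, 10, 11.
indices = decode_index_values(0b00_01_10_11)
# indices[k] is the index value whose source operand goes to
# element position k + 1 (counting from 1) of the destination register.
```

  • In this sketch the list position of each index value plays the role of the destination element position, which mirrors the rule that the i-th index value determines the i-th element of the destination register.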
  • in the prior art, a shuffling instruction such as the SHUF instruction can obtain shuffling effects with different functions depending on the shuffling mode that is set; the shuffling mode is determined by the application requirements, and is usually either produced by at least one other instruction and passed to the shuffling instruction, or placed in memory and obtained by accessing the memory while the shuffling instruction executes. It can be seen that in the prior art, multiple instructions or memory accesses are required to realize shuffling instructions with different shuffling modes.
  • the operation code can be realized by a code represented in binary, or by an identifier that can be converted into a binary code; therefore, building on the vector shuffling instruction containing the operation code and the index value in Embodiment 1, the specific processing methods of vector shuffling instructions with different operation codes are described in detail in Embodiments 2 to 6 below.
  • the operation code is the first operation code, and the number of index values is different from the number of source elements; as shown in FIG. 2, the processing method of the vector shuffling instruction can include:
  • Step 201 Receive an instruction, where the instruction includes: a register identifier and a shuffling parameter.
  • the number of source registers is one, that is, the source element comes from one register.
  • the shuffling parameters include an index value and an operation code; wherein, the index value is realized in the form of an immediate value; the operation code is realized in the form of an identifier that can be converted into a binary code, and the operation code is the first operation code.
  • the instruction format is "opcode destination register, source register, immediate value".
  • the instruction can be represented as "[X]VS.{B/H/W} vd, vj, ui8"; [X]VS is the instruction name in the first opcode, where [X] is optional and is used to distinguish registers of different bit widths; {B/H/W} is the data type in the first opcode, where B means the data type is byte, H means halfword, and W means word; [X]VS.{B/H/W} is the first opcode in identifier form; vd represents the destination register, vj represents the source register, and ui8 represents the immediate value.
  • VS.{B/H/W} is the first operation code in a form that can be converted into binary, for example converting [X]VS.B into the binary first operation code 01110011100100.
  • the immediate value can be a set of data; for example, the different bit fields ui8[1:0], ui8[3:2], ui8[5:4], and ui8[7:6] of the immediate value ui8 can express index values for different positions of the register.
  • Step 202 Execute the instruction, and form every N1 adjacent elements in the source register into an element group according to the operation code and the index value; wherein the data type of the elements is any one of byte, halfword, and word, and N1 is a positive integer.
  • every N1 adjacent elements in the source register can form an element group, and the data type of the adjacent elements can be any of byte, halfword, or word; for example, every four adjacent word elements in the source register can form an element group.
  • N1 is also the number of index values, so even if the number of index values is less than the number of source elements, the source elements are grouped N1 at a time, so that the number of index values equals the number of source elements in each group and each index value corresponds one-to-one with a source element within the group.
  • the adjacent elements are elements in the source register that are sequentially adjacent to each other; the element addresses in different adjacent element groups are partly the same or completely different, where the element address is the position information of the element in the register.
  • the maximum number of elements with the same position information between every two adjacent element groups is N1-1.
  • the adjacent elements are cross-adjacent elements in the source register.
  • for example, when the operation code is the first operation code and the data type is byte, halfword, or word, the N1 elements may be sequentially adjacent elements such as A2 to A5, or cross-adjacent elements such as A1, A3, A5, and A7.
  • every N1 adjacent elements in the source register form a set of element groups, including two cases:
  • in the first case, the elements A1-A4 form one element group and the elements A5-A8 form another, and no element has the same position information in both groups;
  • in the second case, the elements A1-A4 form one element group and the elements A2-A5 form another, and three elements (A2, A3, and A4) have the same position information in both groups.
  • elements A3-A6 or elements A4-A7 can also be selected as the other element group, as long as the number of elements with the same position information between every two adjacent element groups is at most N1-1; details are not repeated here.
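  • The two grouping cases above can be illustrated with a short Python sketch. The function make_groups and its stride parameter are hypothetical: stride is simply the distance between the starting positions of consecutive groups, an assumption introduced here to cover both the non-overlapping case (stride equal to N1) and the overlapping cases (stride less than N1).

```python
from typing import List, Sequence


def make_groups(elements: Sequence[str], n1: int, stride: int) -> List[List[str]]:
    """Form groups of n1 sequentially adjacent elements.

    Consecutive groups start `stride` positions apart, so two neighbouring
    groups share n1 - stride elements (at most n1 - 1 when stride is 1).
    """
    if not 1 <= stride <= n1:
        raise ValueError("stride must be between 1 and n1")
    last_start = len(elements) - n1
    return [list(elements[s:s + n1]) for s in range(0, last_start + 1, stride)]


source = ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"]

# First case: A1-A4 and A5-A8, with no shared elements.
disjoint = make_groups(source, n1=4, stride=4)

# Second case: A1-A4, A2-A5, ..., each sharing three elements with its neighbour.
overlapping = make_groups(source, n1=4, stride=1)
shared = set(overlapping[0]) & set(overlapping[1])
```

  • With stride set to 2 the same function yields A1-A4 followed by A3-A6, which shares two elements and therefore still satisfies the N1-1 limit.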
  • the data types of the elements contained in each element group are the same, and the data types of the elements contained in different element groups are the same.
  • for example, the divided element groups include element group 1, element group 2, and element group 3; the data types of the elements contained in element group 1, element group 2, and element group 3 are all bytes, or all halfwords, or all words.
  • different element groups use the same index values, or different element groups use index values that are not exactly the same; for example, when different element groups use the same index values, element group 1 to element group 4 all use the same index value ui8; when different element groups use different index values, element group 1 and element group 2 use ui8a to select the source elements, and element group 3 and element group 4 use ui8b, where ui8a and ui8b occupy different positions in ui8 and represent index values of different values.
  • the data types of the elements in the element groups are the same, but the elements in different element groups (such as elements in element group 1 and element group 2) have different data types.
  • the number of elements in each element group is the same or different. For example, in element group 1 with 4 elements and element group 2 with 2 elements, the same immediate ui8 provides 4 index values for element group 1 and 2 index values for element group 2.
  • step 203 is executed.
  • Step 203 Determine the elements in each element group as initial source elements.
  • the elements in each element group may be determined as initial source elements.
  • the initial source element refers to the initial element used to select the source element.
  • step 204 is performed.
  • Step 204 Obtain source elements indicated by each index value from the initial source elements; the number of source elements selected from each element group is n1.
  • the source element indicated by each immediate value may be obtained from the initial source element respectively, that is, the corresponding source element is selected from the element group according to the immediate value.
  • the number of source elements selected from each element group is n1, and n1 is a positive integer greater than 0.
  • the element position can be the element address, or the sequence position of the element in the element group, that is, the position number of the element within the group.
  • the source element indicated by each immediate value is respectively obtained from the initial source element, that is, the element at the element position corresponding to the immediate value is respectively obtained from each element group, and the obtained element is determined as source element.
  • the number of source elements selected in different element groups is the same.
  • for example, when the operation code is the first operation code, the data type is byte, halfword, or word, and each element group contains the same number of initial source elements, namely four, the source element corresponding to each index value of the immediate value is selected from each element group; n1 is 4, so N1 and n1 are both 4 in this example.
  • for example, if the immediate value represents element address 3, the element whose address is 3 is selected from each element group, and all selected elements are determined as source elements; or, if the immediate value represents a sequence position, the third element counting from the first element in each element group is selected, and all selected elements are determined as source elements.
  • the number of source elements selected from each element group is four, and the data type of the source elements is byte, halfword, or word; usually, the data types of each selected source element are the same.
  • Step 205 Determine the selected source element as the target element, and write the target element into the position corresponding to the index value in the destination register.
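  • Steps 202 to 205 can be sketched in Python as follows, assuming non-overlapping groups and the same index values for every group. The function shuffle_groups and the use of plain Python lists to stand in for registers are assumptions made for this sketch only.

```python
from typing import List, Sequence


def shuffle_groups(source: Sequence[str], index_values: Sequence[int]) -> List[str]:
    """Apply the same index values to every group of N1 adjacent elements.

    N1 is taken to be the number of index values. Within each group, the
    k-th index value selects the source element that becomes the k-th
    target element of that group.
    """
    n1 = len(index_values)
    if n1 == 0 or len(source) % n1 != 0:
        raise ValueError("source length must be a positive multiple of N1")
    if any(not 0 <= ix < n1 for ix in index_values):
        raise ValueError("each index value must address an element in the group")

    target: List[str] = []
    for start in range(0, len(source), n1):
        group = source[start:start + n1]              # step 203: initial source elements
        target.extend(group[ix] for ix in index_values)  # steps 204 and 205
    return target


# Eight source elements but only four index values: two groups reuse them.
result = shuffle_groups(list("ABCDEFGH"), [3, 2, 1, 0])
```

  • The example reverses each group of four, which shows how four index values can drive a shuffle over a larger number of source elements.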
  • a step of creating an intermediate vector may be added between step 201 and step 202; specifically, before source elements are selected from the source register according to the determined position information and the number of source elements, an intermediate vector is created; the intermediate vector includes at least one intermediate vector parameter, and the number of intermediate vector parameters is equal to the number of target elements.
  • after step 204, that is, after the source elements are selected from the source register, each selected source element is stored in the corresponding intermediate vector parameter of the intermediate vector; wherein there is a one-to-one correspondence between the intermediate vector parameters and the selected source elements;
  • step 205 then writes the content of each intermediate vector parameter to the corresponding position of the destination register according to the immediate value.
  • the intermediate vector can be created according to the source register; wherein, the intermediate vector can be created according to the type of the source register and the like.
  • the number of intermediate vector parameters in the intermediate vector is the same as the number of target elements, and according to the index value, there is a preset corresponding relationship between the position of each target element in the destination register and each intermediate vector parameter in the intermediate vector;
  • writing the content of each intermediate vector parameter to the corresponding position of the destination register proceeds as follows: set a parameter i, where i represents a constant whose value range is 0 to n-1, and n is determined by the number of register bits and the data type; according to N1 and i, determine the source elements in the source register indexed by each intermediate vector parameter of the intermediate vector; then, for i from 0 to n-1, write the source elements of the intermediate vector selected by the different index values to the corresponding target element positions in the destination register.
  • the intermediate vector can be written as "intermediate vector = {VR[source register].data type[N1·i+N1-1], ..., VR[source register].data type[N1·i]}", where i represents a constant whose value range is 0 to n-1, and n is determined by the number of register bits and the data type; for example, with N1 equal to 4, n is 4 when the register has 128 bits and the data type is byte, n is 2 when the register has 128 bits and the data type is halfword, and n is 1 when the register has 128 bits and the data type is word.
  • for example, when vj is the source register and the data type is byte, the intermediate vector vec0 = {VR[vj].B[4i+3], VR[vj].B[4i+2], VR[vj].B[4i+1], VR[vj].B[4i]} is created;
  • VR[vj].B[4i+3], VR[vj].B[4i+2], VR[vj].B[4i+1], and VR[vj].B[4i] are the intermediate vector parameters; i represents a constant, and [4i+0], [4i+1], [4i+2], and [4i+3] represent four consecutive positions in the register; the value range of i is 0 to 3.
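  • The dependence of n on the register width and the data type can be checked with a small Python calculation. The function groups_per_register and the ELEMENT_BITS table are hypothetical names for this sketch, and the element widths of 8, 16, and 32 bits for byte, halfword, and word are an assumption not stated explicitly in the text.

```python
# Assumed element widths in bits for the B, H, and W data types.
ELEMENT_BITS = {"B": 8, "H": 16, "W": 32}


def groups_per_register(register_bits: int, data_type: str, n1: int = 4) -> int:
    """Return n, the number of N1-element groups that fit in one register.

    The group index i then ranges over 0 .. n-1.
    """
    group_bits = ELEMENT_BITS[data_type] * n1
    if register_bits % group_bits != 0:
        raise ValueError("register width is not a whole number of groups")
    return register_bits // group_bits


n_byte = groups_per_register(128, "B")      # 16 bytes in groups of 4
n_halfword = groups_per_register(128, "H")  # 8 halfwords in groups of 4
n_word = groups_per_register(128, "W")      # 4 words in one group
```

  • Under these assumptions a 128-bit register with byte elements gives n equal to 4, which matches the value range of 0 to 3 given for i in the byte example.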
  • Writing the content in each of the intermediate vector parameters to the corresponding position of the destination register vd can be expressed as:
  • ui8[1:0], ui8[3:2], ui8[5:4], and ui8[7:6] are all fields of the immediate value and give the index values into the intermediate vector; specifically, the lowest two bits of the immediate value ui8 (ui8[1:0]) express the index of the first target element in the intermediate vector, the third and fourth bits (ui8[3:2]) express the index of the second target element, the fifth and sixth bits (ui8[5:4]) express the index of the third target element, and the seventh and eighth bits (ui8[7:6]) express the index of the fourth target element.
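  • Putting the intermediate vector and the index fields together, a minimal Python model of "VS.B vd, vj, ui8" on a 128-bit register might look like the following. This is a sketch under stated assumptions, not the application's definitive semantics: the register is modelled as a list of 16 byte values with position 0 as the lowest element, the function name vs_b is hypothetical, and the assignment vd.B[4i+k] = vec0[k-th index value] is inferred from the surrounding description.

```python
from typing import List


def vs_b(vj: List[int], ui8: int) -> List[int]:
    """Model of "VS.B vd, vj, ui8" for a 128-bit register of 16 bytes."""
    if len(vj) != 16:
        raise ValueError("a 128-bit register holds exactly 16 byte elements")
    if not 0 <= ui8 <= 0xFF:
        raise ValueError("ui8 must be an 8-bit immediate value")

    # Four 2-bit index values: ui8[1:0], ui8[3:2], ui8[5:4], ui8[7:6].
    index_values = [(ui8 >> (2 * k)) & 0b11 for k in range(4)]

    vd = [0] * 16
    for i in range(4):  # i = 0 .. n-1 with n = 4 groups of bytes
        # The text lists vec0 from high to low; here vec0[0] is VR[vj].B[4i].
        vec0 = [vj[4 * i + k] for k in range(4)]
        for k in range(4):
            vd[4 * i + k] = vec0[index_values[k]]
    return vd


vj = list(range(16))
reversed_groups = vs_b(vj, 0b00_01_10_11)  # index values 3, 2, 1, 0
identity = vs_b(vj, 0b11_10_01_00)         # index values 0, 1, 2, 3
```

  • The same immediate value is applied to all four groups, so a single instruction reorders all 16 bytes without a separate shuffle-mode register or memory access.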
  • the intermediate vector and indexing method are the same as in the above example; when the instruction name of the opcode is XVS.{B/H/W}, two intermediate vectors are needed to realize the vector shuffling operation.
  • the intermediate vectors are as follows:
  • vec0 = {VR[vj].B[4i+3], VR[vj].B[4i+2], VR[vj].B[4i+1], VR[vj].B[4i]}
  • vec1 = {VR[vj].B[4i+19], VR[vj].B[4i+18], VR[vj].B[4i+17], VR[vj].B[4i+16]}
  • the intermediate vectors are vec0 and vec1; VR[vj].B[4i+3], VR[vj].B[4i+2], VR[vj].B[4i+1], and VR[vj].B[4i] are the intermediate vector parameters of the intermediate vector vec0, and VR[vj].B[4i+19], VR[vj].B[4i+18], VR[vj].B[4i+17], and VR[vj].B[4i+16] are the intermediate vector parameters of the intermediate vector vec1;
  • B indicates that the data type is byte; i indicates the position of the element in the register; [4i+0], [4i+1], [4i+2], and [4i+3] represent elements at four consecutive positions in the register, and [4i+16], [4i+17], [4i+18], and [4i+19] likewise represent elements at four consecutive positions in the register.
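  • The two-intermediate-vector form can be sketched in Python in the same way. The sketch assumes a 256-bit register modelled as 32 byte values, that i ranges over 0 to 3 as in the 128-bit byte example, and that vec0 and vec1 are both indexed by the same four index values; the function name xvs_b is hypothetical.

```python
from typing import List


def xvs_b(vj: List[int], ui8: int) -> List[int]:
    """Model of "XVS.B vd, vj, ui8" for a 256-bit register of 32 bytes."""
    if len(vj) != 32:
        raise ValueError("a 256-bit register holds exactly 32 byte elements")
    if not 0 <= ui8 <= 0xFF:
        raise ValueError("ui8 must be an 8-bit immediate value")

    index_values = [(ui8 >> (2 * k)) & 0b11 for k in range(4)]

    vd = [0] * 32
    for i in range(4):
        # vec0 covers bytes 4i .. 4i+3; vec1 covers bytes 4i+16 .. 4i+19.
        vec0 = [vj[4 * i + k] for k in range(4)]
        vec1 = [vj[4 * i + 16 + k] for k in range(4)]
        for k in range(4):
            vd[4 * i + k] = vec0[index_values[k]]
            vd[4 * i + 16 + k] = vec1[index_values[k]]
    return vd


wide = list(range(32))
shuffled = xvs_b(wide, 0b00_01_10_11)  # index values 3, 2, 1, 0
```

  • Under these assumptions each loop iteration handles one group in the lower half and one group in the upper half of the register, so four iterations cover all 32 bytes.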
  • the vector shuffling instruction "XVS.B vd, vj, ui8" indicates that four adjacent byte elements are read from the vector register vj to form a group of elements to be shuffled, and the result is then written into the vector register vd;
  • when the first opcode is VS.H, the data type is halfword, and N1 is 4, the vector shuffling instruction "VS.H vd, vj, ui8" means to read four adjacent halfword elements from the vector register vj to form a group of elements for shuffling, and then write the obtained result into the vector register vd;
  • when the first opcode is VS.W and the data type is word, the vector shuffling instruction "VS.W vd, vj, ui8" means to read four adjacent word elements from the vector register vj to form a group of elements for shuffling, and then write the obtained result into the vector register vd.
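  • As a rough software model of the grouped shuffle described above, the sketch below treats a vector register as a Python list of elements and applies the same four index values to every group of four adjacent elements. The function name shuffle_groups_of_four is hypothetical, and writing the k-th selected element to position 4i+k of the destination is an assumption made for illustration; the element width (byte, halfword or word) does not affect the model.

```python
def shuffle_groups_of_four(src: list[int], ui8: int) -> list[int]:
    """Permute each group of four adjacent elements of src using ui8.

    Assumption: target element k of group i lands at position 4*i + k
    of the destination.
    """
    if len(src) % 4 != 0:
        raise ValueError("element count must be a multiple of four")
    if not 0 <= ui8 <= 0xFF:
        raise ValueError("ui8 must fit in 8 bits")
    idx = [(ui8 >> (2 * k)) & 0b11 for k in range(4)]
    dst = []
    for base in range(0, len(src), 4):
        group = src[base:base + 4]         # plays the role of the intermediate vector
        dst.extend(group[k] for k in idx)  # k-th target <- group[idx[k]]
    return dst


# Reverse the elements inside every group of four.
print(shuffle_groups_of_four(list(range(8)), 0b00011011))
# [3, 2, 1, 0, 7, 6, 5, 4]
```

  • With ui8 = 0xE4 (indices 0, 1, 2, 3) the model returns the input unchanged, which is a quick sanity check of the bit ordering.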
  • a shuffling parameter is added to the vector shuffling instruction, and the shuffling parameter includes an index value and an operation code, and the shuffling operation under the condition that the source operand and the number of index values are different is realized according to the index value and the operation code;
  • the operation code is the second operation code, and the number of the index values is the same as the number of the source elements; as shown in FIG. 3, the processing method of the vector shuffling instruction may include:
  • Step 301 Receive an instruction, where the instruction includes: a register identifier and a shuffling parameter.
  • the number of source registers is two, that is, the source elements come from two different registers; when there are multiple source registers, either every source register identifier is different from the destination register identifier, or one of the source register identifiers is the same as the destination register identifier.
  • the shuffling parameters include an index value and an operation code; wherein, the index value is realized in the form of an immediate value; the operation code is realized in the form of an identifier that can be converted into a binary code, and the operation code is a second operation code.
  • the source register includes the first source register and the second source register, and the destination register is the first source register or the second source register.
  • the instruction format is "opcode destination register, source register, immediate value".
  • the instruction can be expressed as “[X]VS.D vd,vj,ui8”; [X]VS is the instruction name in the second opcode, and D is the data type in the second opcode, indicating that the data type is double word; [X]VS.D is the second opcode in identifier form; vd indicates the destination register, vj and vd indicate the source registers, and ui8 indicates the immediate value.
  • VS.D may be converted into a second operation code in binary form, such as converting VS.D into a second operation code in binary form of 01110011100111.
  • the immediate value can be a set of data, for example, the index value can be expressed by different bits ui8[1:0], ui8[3:2], ui8[5:4] and ui8[7:6] of the immediate value ui8.
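  • The instruction format "opcode destination register, source register, immediate value" can be illustrated with a small text parser. This is only a sketch: the names ShuffleInstruction and parse_shuffle are hypothetical, and the textual syntax accepted here (register identifiers as plain words, immediates in decimal, 0x or 0b notation) is an assumption, not something the application specifies.

```python
import re
from typing import NamedTuple

# Data-type suffixes named in the text.
DATA_TYPES = {"B": "byte", "H": "halfword", "W": "word",
              "D": "double word", "Q": "quadword"}


class ShuffleInstruction(NamedTuple):
    name: str       # instruction name, e.g. "XVS" or "VP"
    data_type: str  # spelled-out data type
    dest: str       # destination register identifier
    source: str     # source register identifier
    ui8: int        # 8-bit immediate carrying the index values


_PATTERN = re.compile(
    r"\s*(X?V[SP])\.([BHWDQ])\s+(\w+)\s*,\s*(\w+)\s*,\s*(\w+)\s*")


def parse_shuffle(text: str) -> ShuffleInstruction:
    """Split 'NAME.T dest, source, imm' into its parts."""
    m = _PATTERN.fullmatch(text)
    if m is None:
        raise ValueError(f"not a shuffle instruction: {text!r}")
    name, suffix, dest, source, imm = m.groups()
    ui8 = int(imm, 0)  # base 0 accepts decimal, 0x.. and 0b.. literals
    if not 0 <= ui8 <= 0xFF:
        raise ValueError("immediate must fit in 8 bits")
    return ShuffleInstruction(name, DATA_TYPES[suffix], dest, source, ui8)


inst = parse_shuffle("XVP.D xd, xj, 0x1b")
print(inst.name, inst.data_type, inst.dest, inst.source, inst.ui8)
```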
  • Step 302 Execute the instruction, and according to the operation code and the index value, in the source register, obtain the source element indicated by each index value from M N2 elements of every N2 bits; wherein, The data type of the element is a double word; the number of source elements selected from the M N2 elements of each N2 bits is n2, and N2, M N2 and n2 are all positive integers greater than 0.
  • the number of index values is the same as the number of source elements, and the operation code is the second operation code.
  • conditions such as the following are determined as the selection rules: the source element indicated by each index value is obtained from the M N2 elements of every N2 bits, the data type of the elements is double word, the number of source elements selected from the M N2 elements of every N2 bits is n2, and N2, M N2 and n2 are all positive integers greater than 0.
  • the element position may be an element address.
  • the source element indicated by each index value is obtained from the M N2 elements of every N2 bits; that is, in the first source register, the first source element indicated by each index value is obtained from the M N2 elements of every N2 bits, and in the second source register, the second source element indicated by each index value is obtained from the M N2 elements of every N2 bits; the first source element and the second source element are determined as the finally selected source elements.
  • the Mn2 elements can be sequentially adjacent elements, or cross-adjacent elements; for example, when Mn2 is four, assume that the source register contains eight elements, namely element A1, element A2, element A3, element A4, element A5, element A6, element A7, and element A8; the Mn2 elements can then be, for example, elements A2 to A5, or the cross-adjacent elements A1, A3, A5, and A7.
  • N2 is 128, M N2 is four, and n2 is 2.
  • the number of source registers is two, that is, a first source register and a second source register.
  • obtaining the source element indicated by each index value from the M N2 elements of every N2 bits includes: in the first source register, obtaining the source element indicated by the first index value (such as ui8[1:0]) from the M N2' elements of every N2 bits; and in the second source register, obtaining the source element indicated by the second index value (such as ui8[3:2]) from the M N2' elements of every N2 bits; wherein M N2' is half of M N2; the number of source elements selected from the first source register is n2/2, and the number of source elements selected from the second source register is n2/2.
  • each source register performs vector shuffling through different bits of the immediate value, that is, the bits in the immediate value corresponding to different source registers are different; which bits of the immediate value are used for indexing depends on the specific situation, so no further details are given here.
  • Step 303 Determine the selected source element as the target element, and write the target element into the position corresponding to the index value in the destination register.
  • a step of creating an intermediate vector may be added between step 301 and step 302; specifically, before the source elements are selected from the source register according to the determined position information and the number of source elements, an intermediate vector is created; the intermediate vector contains at least one intermediate vector parameter; when there is an element group, the number of intermediate vector parameters is equal to the number of element groups; when there is no element group, the number of intermediate vector parameters is equal to the number of source elements.
  • each selected source element is stored in a corresponding intermediate vector parameter in the intermediate vector, wherein there is a one-to-one correspondence between the intermediate vector parameters and the selected source elements; step 303 then becomes: according to the immediate value, write the content of each intermediate vector parameter to the corresponding position of the destination register.
  • the method for creating the intermediate vector is the same as that in Embodiment 2, and will not be repeated here.
  • the content of each intermediate vector parameter is written to the corresponding position of the destination register; that is, the following operation is performed for each intermediate vector parameter: the content of the intermediate vector parameter is written to the position in the destination register indicated by the index value corresponding to that intermediate vector parameter.
  • ui8[1:0] and ui8[3:2] are bit fields of the immediate value and indicate the index values corresponding to the registers; specifically, the lowest two bits of the immediate value ui8 (ui8[1:0]) express the index of the first target element in the source register, and the third and fourth bits (ui8[3:2]) express the index of the second target element in the source register.
  • the intermediate vector is as follows:
  • the intermediate vectors are vec0 and vec1;
  • XR[xj][127:0], XR[xd][127:0] represent the intermediate vector parameters of vec0, XR[xj][255:128], XR[xd][ 255:128] indicates the intermediate vector parameter of vec1;
  • D indicates that the data type is double word, 64 bits wide.
  • the vector shuffling instruction "VS.D vd, vj, ui8" means that, according to the content of the immediate value, two double-word elements are selected from the four double-word elements in every 128 bits of the vector register vj and the vector register vd, and the obtained result is written into the corresponding 128 bits of the vector register vd;
  • the vector shuffling instruction "XVS.D vd, vj, ui8" indicates that, according to the content of the immediate value, two double-word elements are read from the four double-word elements in every 128 bits of the vector register xj and the vector register xd, and the read double-word elements are then written into the corresponding 128 bits of xd.
  • one of the two source registers is the same register as the destination register, that is, one register is both a source register and the destination register; with the above technical solution, each execution of the shuffling instruction can overwrite half of the elements in the destination register, which suits software scenarios that need such an operation.
  • a shuffling parameter is added to the vector shuffling instruction; the shuffling parameter includes an index value and an operation code, and according to the index value and the operation code, the shuffling operation is realized under the condition that the number of source operands and index values is the same, the data type is double word, and the register is 128 bits; it can be seen that, by adopting the technical solution of the present application, this shuffling operation is realized through a single vector shuffling instruction, without adding other instructions to pass the shuffling mode and without obtaining the shuffling mode through memory access, thus effectively reducing the system overhead and improving the execution efficiency of the vector shuffling operation.
  • the operation code is a third operation code; the index value includes a first index value, a second index value, a third index value, and a fourth index value; the first index value, the second index value, the third index value and the fourth index value respectively index the same or different positions; the processing method of the vector shuffling instruction may include:
  • Step 401 Receive an instruction, where the instruction includes: a register identifier and a shuffling parameter.
  • the number of source registers is two, that is, the source elements come from two different registers; when there are multiple source registers, either every source register identifier is different from the destination register identifier, or one of the source register identifiers is the same as the destination register identifier.
  • the shuffling parameters include an index value and an operation code; wherein, the index value is realized in the form of an immediate value; the operation code is realized in the form of an identifier that can be converted into a binary code, and the operation code is a third operation code.
  • the operation code is the third operation code, the source register includes the first source register and the second source register, and the destination register is the first source register or the second source register.
  • the instruction format is "opcode destination register, source register, immediate value".
  • the instruction can be expressed as “[X]VP.W vd/xd,vj/xj,ui8”; [X]VP is the instruction name in the third opcode, and W is the data type in the third opcode, indicating that the data type is word; [X]VP.W is the third opcode in identifier form; vd/xd indicates the destination register, vj and vd indicate the source registers (or xj and xd indicate the source registers), and ui8 indicates the immediate value.
  • VP.W may be converted into a third operation code in binary form, such as converting VP.W into a third operation code in binary form of 01110011111001.
  • the immediate value can be a set of data, for example, the index value can be expressed by different bits ui8[1:0], ui8[3:2], ui8[5:4] and ui8[7:6] of the immediate value ui8.
  • Step 402 Execute the instruction, according to the operation code and the index value, in the first source register, obtain the first index value and the second index value from M N3 elements of every N3 bits respectively Indicated source element; in the second source register, respectively acquire the source element indicated by the third index value and the fourth index value from M N3 elements of every N3 bits.
  • the data type of the element is a word; the number of source elements selected from the M N3 elements of each N3 bit is n3, and N3, M N3 and n3 are all positive integers greater than 0.
  • the index value includes four index values, namely: the first index value, the second index value, the third index value and the fourth index value, the first index value, the second index value, the third index value value and the fourth index value each index a different location.
  • each source register performs vector shuffling through different bits of the immediate value, that is, the bits in the immediate value corresponding to different source registers are different; which bits of the immediate value are used for indexing depends on the specific situation, so no further details are given here.
  • the third opcode is [X]VP.W, the first index value is ui8[1:0], the second index value is ui8[3:2], the third index value is ui8[5:4], and the fourth index value is ui8[7:6].
  • the number of index values is the same as the number of source elements; conditions such as the following are determined as the selection rules: the source element indicated by each index value is obtained from the M N3 elements of every N3 bits, the data type of the elements is word, the number of source elements selected from the M N3 elements of every N3 bits is n3, and N3, M N3 and n3 are all positive integers greater than 0.
  • the Mn3 elements can be sequentially adjacent elements, or cross-adjacent elements; for example, when Mn3 is four, assume that the source register contains eight elements, namely element A1, element A2, element A3, element A4, element A5, element A6, element A7, and element A8; the Mn3 elements can then be, for example, elements A2 to A5, or the cross-adjacent elements A1, A3, A5, and A7.
  • the element position may be an element address.
  • the source elements indicated by each index value are obtained from the M N3 elements of every N3 bits; that is, in the first source register, the source elements indicated by the first index value and the second index value are obtained from the M N3 elements of every N3 bits; and in the second source register, the source elements indicated by the third index value and the fourth index value are obtained from the M N3 elements of every N3 bits.
  • N3 is 128, M N3 is four, and n3 is 2.
  • Step 403 and Step 404 are executed after obtaining the source element indicated by each index value from the M N3 elements of every N3 bits in the source register.
  • Step 403 Determine the source element indicated by the first index value as the first target element, and determine the source element indicated by the second index value as the second target element.
  • the source element indicated by the first index value selected from the first source register is determined as the first target element, and the source element indicated by the second index value selected from the first source register is determined as the second target element.
  • Step 404 Determine the source element indicated by the third index value as the third target element, and determine the source element indicated by the fourth index value as the fourth target element.
  • the source element indicated by the third index value selected from the second source register is determined as the third target element, and the source element indicated by the fourth index value selected from the second source register is determined as the fourth target element.
  • step 403 and step 404 may be executed simultaneously, or may be executed sequentially, and the sequence is not restricted; after all steps 403 and 404 are executed, step 405 is executed.
  • Step 405 Write the first target element and the second target element into the first position in the destination register; and write the third target element and the fourth target element into the second position in the destination register.
  • when the operation code is the third operation code, after the first target element, the second target element, the third target element and the fourth target element are obtained, the first target element and the second target element can be written to a first location in the destination register, and the third target element and the fourth target element can be written to a second location in the destination register.
  • when the third opcode is VP.W/XVP.W (the two are abbreviated as [X]VP.W), the data type is word, N3 is 128, M N3 is four, and n3 is 2, the vector shuffling instruction "[X]VP.W vd,vj,ui8" means: using the values of ui8[1:0] and ui8[3:2] as index values, select two of the four word elements in every 128 bits of the vector register vj/xj and write them into the 0th and 1st word elements of the corresponding 128 bits of the vector register vd/xd; and using the values of ui8[5:4] and ui8[7:6] as index values, select two of the four word elements in every 128 bits of the vector register vd/xd and write them into the 2nd and 3rd word elements of the corresponding 128 bits of the vector register vd/xd.
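  • The [X]VP.W behaviour described above can be modelled in software as follows. This is a sketch under stated assumptions: registers are represented as Python lists of word elements (four per 128 bits), the function name vp_w is hypothetical, and the selections from vd/xd are assumed to read the old contents of the destination before anything is overwritten.

```python
def vp_w(vj: list[int], vd: list[int], ui8: int) -> list[int]:
    """Return the new contents of vd after a modelled [X]VP.W vd, vj, ui8.

    Per 128-bit lane (four words):
      new[0] = vj[ui8[1:0]]   new[1] = vj[ui8[3:2]]
      new[2] = vd[ui8[5:4]]   new[3] = vd[ui8[7:6]]
    """
    if len(vj) != len(vd) or len(vd) % 4 != 0:
        raise ValueError("both registers need the same multiple of four words")
    if not 0 <= ui8 <= 0xFF:
        raise ValueError("ui8 must fit in 8 bits")
    i0, i1, i2, i3 = ((ui8 >> (2 * k)) & 0b11 for k in range(4))
    out = []
    for base in range(0, len(vd), 4):
        lane_j = vj[base:base + 4]
        lane_d = vd[base:base + 4]  # old destination contents (assumption)
        out += [lane_j[i0], lane_j[i1], lane_d[i2], lane_d[i3]]
    return out


# ui8 = 0x4E gives indices 2, 3, 0, 1.
print(vp_w([10, 11, 12, 13], [20, 21, 22, 23], 0x4E))  # [12, 13, 20, 21]
```

  • For a 256-bit register the same function is called with eight words per list, and each 128-bit lane is processed with the same index values.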
  • a shuffling parameter is added to the vector shuffling instruction; the shuffling parameter includes an index value and an operation code, and according to the index value and the operation code, the shuffling operation is realized under the condition that the number of source operands and index values is the same and the data type is word; it can be seen that, by adopting the technical solution of the present application, this shuffling operation is realized through a single vector shuffling instruction, without adding other instructions to pass the shuffling mode and without obtaining the shuffling mode through memory access, which effectively reduces the system overhead and improves the execution efficiency of the vector shuffling operation.
  • the operation code is the fourth operation code, and the number of the index values is the same as the number of the source elements; as shown in FIG. 5, the processing method of the vector shuffling instruction may include:
  • Step 501 Receive an instruction, where the instruction includes: a register identifier and a shuffling parameter.
  • the number of source registers is one, that is, the source element comes from one register.
  • the shuffling parameter includes an index value and an operation code; wherein, the index value is realized in the form of an immediate value; the operation code is realized in the form of an identifier that can be converted into a binary code, and the operation code is the fourth operation code.
  • the instruction format is "opcode destination register, source register, immediate value".
  • the instruction can be expressed as XVP.D xd, xj, ui8; XVP is the instruction name in the fourth opcode, and D is the data type in the fourth opcode, indicating that the data type is double word; XVP.D is the fourth opcode in identifier form; xd indicates the destination register, xj indicates the source register, and ui8 indicates the immediate value.
  • XVP.D may be converted into a fourth operation code in binary form, such as converting XVP.D into a fourth operation code in binary form of 01110111111010.
  • the immediate value can be a set of data, for example, the index value can be expressed by different bits ui8[1:0], ui8[3:2], ui8[5:4] and ui8[7:6] of the immediate value ui8.
  • Step 502 Execute the instruction, and obtain the source element indicated by each index value from the Mn4 elements in the source register according to the operation code and the immediate value; wherein the data type of the elements is double word, the number of selected source elements is n4, and both Mn4 and n4 are positive integers greater than 0.
  • the operation code is the fourth operation code, and the fourth operation code may be used to instruct that elements of the double-word data type be obtained from the source register.
  • the number of index values is the same as the number of source elements; conditions such as the following are determined as the selection rules: the source element indicated by each index value is obtained from the Mn4 elements, the data type of the elements is double word, the number of selected source elements is n4, and both Mn4 and n4 are positive integers greater than 0.
  • the Mn4 elements can be sequentially adjacent elements, and can also be cross-adjacent elements; for example, when Mn4 is four, it is assumed that the source register contains eight elements, namely element A1, element A2, and element A3 , element A4, element A5, element A6, element A7, and element A8, the Mn4 elements can be, for example, elements A2 to A5, or elements A1, A3, A5, and A7 that cross adjacent elements.
  • the element position may be an element address.
  • the source element indicated by each index value can be obtained from the Mn4 elements in the source register; the data type of the obtained source elements is double word, the number of selected source elements is n4, and both Mn4 and n4 are positive integers greater than 0.
  • M n4 is four, and n4 is 4.
  • Step 503 Determine the selected source element as the target element, and write the target element into the position corresponding to the index value in the destination register.
  • the vector shuffling instruction "XVP.D xd, xj, ui8" means that, using the values of ui8[1:0], ui8[3:2], ui8[5:4] and ui8[7:6] as index values, the source element indicated by each index value is selected from the four double-word elements in the vector register xj, and the selected source elements are written in sequence into the four double-word elements of the vector register xd.
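  • A minimal software model of the XVP.D behaviour described above is given below. The 256-bit register is represented as a Python list of four double-word elements; the function name xvp_d is hypothetical.

```python
def xvp_d(xj: list[int], ui8: int) -> list[int]:
    """Return the new contents of xd after a modelled XVP.D xd, xj, ui8.

    Destination element k receives xj[ui8[2k+1:2k]] for k = 0..3.
    """
    if len(xj) != 4:
        raise ValueError("a 256-bit register holds four double words")
    if not 0 <= ui8 <= 0xFF:
        raise ValueError("ui8 must fit in 8 bits")
    return [xj[(ui8 >> (2 * k)) & 0b11] for k in range(4)]


print(xvp_d([100, 101, 102, 103], 0x1B))  # [103, 102, 101, 100]
print(xvp_d([100, 101, 102, 103], 0x00))  # [100, 100, 100, 100]
```

  • The second call shows that the same source element may be selected by more than one index value, which in this model broadcasts element 0 to every position.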
  • a shuffling parameter is added to the vector shuffling instruction; the shuffling parameter includes an index value and an operation code, and according to the index value and the operation code, the shuffling operation is realized under the condition that the number of source operands and index values is the same, the data type is double word, and the register is 256 bits; it can be seen that, by adopting the technical solution of the present application, this shuffling operation is realized through a single vector shuffling instruction, without adding other instructions to pass the shuffling mode and without obtaining the shuffling mode through memory access, thereby effectively reducing the system overhead and improving the execution efficiency of the vector shuffling operation.
  • the operation code is the fifth operation code; the index value includes a first index value and a third index value, and the first index value and the third index value index different positions; the source register comprises a first source register and a second source register; the processing method of the vector shuffling instruction may include:
  • Step 601 Receive an instruction, where the instruction includes: a register identifier and a shuffling parameter.
  • the number of source registers is two, that is, the source elements come from two different registers; when there are multiple source registers, either every source register identifier is different from the destination register identifier, or one of the source register identifiers is the same as the destination register identifier.
  • the shuffling parameters include an index value and an operation code; wherein, the index value is realized in the form of an immediate value; the operation code is realized in the form of an identifier that can be converted into a binary code, and the operation code is the fifth operation code.
  • the source register includes the first source register and the second source register, and the destination register is the first source register or the second source register.
  • the instruction format is "opcode destination register, source register, immediate value".
  • the instruction can be expressed as XVP.Q vd/xd,vj/xj,ui8; XVP is the instruction name in the fifth opcode, and Q is the data type in the fifth opcode, indicating that the data type is quadword; XVP.Q is the fifth opcode in identifier form; xd indicates the destination register, xj and xd indicate the source registers, and ui8 indicates the immediate value.
  • XVP.Q can be converted into the fifth operation code in binary form, such as converting XVP.Q into the fifth operation code in binary form 01110111111011.
  • the immediate value can be a set of data, for example, the index value can be expressed through different bits ui8[1:0], ui8[5:4] of the immediate value ui8.
  • Step 602 Execute the instruction, and obtain the first source element indicated by the first index value from the Mn5 elements in the first source register according to the operation code and the immediate value; and, in In the second source register, obtain the second source element indicated by the second index value from Mn5 elements; wherein, the data type of the element is a quadword; the number of selected source elements is n5, and n5 is a positive integer greater than 0.
  • the operation code may be a fifth operation code, and the fifth operation code may be used to instruct to obtain an element whose data type is a quadword from the source register.
  • the index value includes two index values, namely the first index value and the third index value, which index different positions; the first index value and the third index value respectively represent different bits of the same immediate value: the first index value represents the low bits of the immediate value ui8, and the third index value represents the high bits of the immediate value ui8.
  • the first index value can also represent the lowest two bits of the immediate value ui8, and the third index value can also represent the second-lowest two bits of the immediate value ui8.
  • the fifth opcode is XVP.Q, the first index value is ui8[1:0], and the third index value is ui8[5:4].
  • each source register performs vector shuffling through different bits of the immediate value, that is, the bits in the immediate value corresponding to different source registers are different; which bits of the immediate value are used for indexing depends on the specific situation, so no further details are given here.
  • the number of index values is the same as the number of source elements; conditions such as the following are determined as the selection rules: the first source element indicated by the first index value is obtained from the Mn5 elements in the first source register, the second source element indicated by the second index value is obtained from the Mn5 elements in the second source register, the data type of the elements is quadword, the number of selected source elements is n5, and n5 is a positive integer greater than 0.
  • the Mn5 elements can be sequentially adjacent elements, or can be cross-adjacent elements; for example, when Mn5 is four, it is assumed that the source register contains eight elements, namely element A1, element A2, and element A3 , element A4, element A5, element A6, element A7, and element A8, the Mn5 elements can be, for example, elements A2 to A5, or elements A1, A3, A5, and A7 that cross adjacent elements.
  • the element position may be an element address.
  • the first source element indicated by the first index value can be obtained from the Mn5 elements in the first source register, and the second source element indicated by the second index value can be obtained from the Mn5 elements in the second source register.
  • the number of source elements selected from the first source register is n5/2, and the number of source elements selected from the second source register is n5/2.
  • each source register performs vector shuffling through different bits of the immediate value, that is, the bits in the immediate value corresponding to different source registers are different; which bits of the immediate value are used for indexing depends on the specific situation, so no further details are given here.
  • step 603 is executed.
  • Step 603 Determine the first source element and the second source element as target elements respectively, and write them into corresponding positions of the target register.
  • when the opcode is the fifth opcode, after the first source element and the second source element are obtained, the first source element can be determined as a target element and written to the first position of the destination register, and the second source element can be determined as a target element and written to the second position of the destination register.
  • the first position and the second position are respectively determined by index values.
  • the vector shuffling instruction "XVP.Q xd, xj, ui8" indicates that, according to the values of ui8[1:0] and ui8[5:4], one source element is selected from the two quadword elements of the vector register xj and one source element is selected from the two quadword elements of the vector register xd, and the two selected source elements are written to the two quadword elements of the vector register xd according to the index values.
  • a shuffling parameter is added to the vector shuffling instruction; the shuffling parameter includes an index value and an operation code, and according to the index value and the operation code, the shuffling operation is realized under the condition that the number of source operands and index values is the same and the data type is quadword; it can be seen that, by adopting the technical solution of the present application, this shuffling operation is realized through a single vector shuffling instruction, without adding other instructions to pass the shuffling mode and without obtaining the shuffling mode through memory access, thereby effectively reducing the system overhead and improving the execution efficiency of the vector shuffling operation.
  • FIG. 7 shows a schematic structural diagram of a processor provided by an embodiment of the present application.
  • the processor can include:
  • a plurality of vector registers include a source register 72 and a target register 74, and the source register 71 is used to store data elements;
  • the decoding unit 71 is configured to decode a vector shuffling instruction; wherein, the vector shuffling instruction includes: a register identifier and a shuffling parameter, and the register identifier includes a source register identifier and a destination register identifier;
  • the execution unit 73, in response to the vector shuffling instruction, performs a vector shuffling operation on the source elements acquired from the source register 71 according to the shuffling parameter, acquires the target elements after the vector shuffling operation, and writes the target elements to the destination register 74.
  • instructions are stored in instruction memory 70 .
  • the execution unit 73 determines the position information of the source element in the source register 71 and the number of source elements; wherein, the number of the selected source elements is one or multiple; according to the determined position information and the number of source elements, select source elements from the source register; determine all the selected source elements as target elements.
  • the shuffling parameters include an index value and an operation code; the index value is used to indicate the position information of each source element required for the vector shuffling operation in the source register; the operation code is used for characterizing operations performed on the source register and the destination register;
  • the execution unit 73 determines a selection rule for obtaining source elements according to the index value and the operation code; and obtains the source element indicated by each index value from the source register 71 according to the selection rule.
  • the execution unit 73 when the number of the index values is different from the number of the source elements, determines the way to group the source elements according to the number of the index values, and according to the The method of grouping and the operation code determine the selection rule; when the number of the index values is the same as the number of the source elements, the selection rule is determined according to the operation code.
  • the execution unit 73 forms one element group from every N1 adjacent elements in the source register, wherein the data type of the elements is any one of byte, halfword, and word, and N1 is a positive integer; determines the elements in each element group as initial source elements; and obtains the source element indicated by each index value from the initial source elements; the number of source elements selected from each element group is n1.
  • the adjacent elements are sequentially adjacent elements in the source register, and the addresses of elements in multiple adjacent element groups are partially the same or completely different;
  • the data types of the elements contained in each element group are the same; the data types of the elements contained in different element groups are the same or different.
  • the operation code is a second operation code, and the number of the index values is the same as the number of the source elements;
  • the execution unit 73, in the source register, obtains the source element indicated by each index value from the M_N2 elements in every N2 bits; wherein the data type of the elements is doubleword; the number of source elements selected from the M_N2 elements in every N2 bits is n2, and N2, M_N2, and n2 are all positive integers.
  • the execution unit 73 creates an intermediate vector; the intermediate vector includes at least one intermediate vector parameter; when element groups exist, the number of intermediate vector parameters is equal to the number of element groups; when no element groups exist, the number of intermediate vector parameters is equal to the number of source elements; each selected source element is stored in the corresponding intermediate vector parameter of the intermediate vector, wherein the intermediate vector parameters correspond one-to-one to the selected source elements; according to the shuffling parameter, the content of each intermediate vector parameter is written into the corresponding position of the destination register.
  • the operation code is a third operation code;
  • the index value includes a first index value, a second index value, a third index value, and a fourth index value, which respectively index different positions;
  • the source register includes a first source register and a second source register;
  • the execution unit 73, in the first source register, obtains the source elements indicated by the first index value and the second index value from the M_N3 elements in every N3 bits; and, in the second source register, obtains the source elements indicated by the third index value and the fourth index value from the M_N3 elements in every N3 bits; wherein the data type of the elements is word; the number of source elements selected from the M_N3 elements in every N3 bits is n3, and N3, M_N3, and n3 are all positive integers; the source element indicated by the first index value is determined as the first target element, and the source element indicated by the second index value as the second target element; the source element indicated by the third index value is determined as the third target element, and the source element indicated by the fourth index value as the fourth target element; the first target element and the second target element are written to a first location in the destination register; and the third target element and the fourth target element are written to a second location in the destination register.
  • the operation code is a fourth operation code
  • the execution unit 73, in the source register, obtains the source element indicated by each index value from the M_n4 elements; wherein the data type of the elements is doubleword; the number of selected source elements is n4, and M_n4 and n4 are both positive integers.
  • the operation code is a fifth operation code;
  • the index value includes a first index value and a third index value, and the first index value and the third index value respectively index different positions;
  • the source register including a first source register and a second source register;
  • the execution unit 73, in the first source register, obtains the first source element indicated by the first index value from the M_n5 elements; and, in the second source register, obtains the second source element indicated by the third index value from the M_n5 elements; wherein the data type of the elements is quadword; the number of selected source elements is n5, and n5 is a positive integer; the first source element and the second source element are each determined as a target element and written into the corresponding position of the destination register.
  • the number of the source register is one or more, and the number of the destination register is one;
  • the source register identifier and the destination register identifier may be the same or different;
  • when there are multiple source registers, either each source register identifier is different from the destination register identifier, or one of the source register identifiers is identical to the destination register identifier.
  • the processor provided in the embodiment of the present application adds a register identifier and a shuffling parameter to the instruction and uses the shuffling parameter to perform a vector shuffling operation on the elements obtained from the source register. A vector shuffling operation for a specific function can therefore be realized through a single instruction, without multiple shuffling instructions, which improves the execution efficiency of the specific function.
  • the electronic device may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
  • the processing component 802 generally controls the overall operations of the electronic device, such as those associated with display, data communication, camera operations, and recording operations.
  • the processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the above method.
  • processing component 802 may include one or more modules that facilitate interaction between processing component 802 and other components.
  • processing component 802 may include a multimedia module to facilitate interaction between multimedia component 808 and processing component 802 .
  • the memory 804 is configured to store various types of data to support operations at the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, etc.
  • the memory 804 can be implemented by any type of volatile or non-volatile storage device or their combination, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable Programmable Read Only Memory (EPROM), Programmable Read Only Memory (PROM), Read Only Memory (ROM), Magnetic Memory, Flash Memory, Magnetic or Optical Disk.
  • the power supply component 806 provides power to various components of the electronic device.
  • the power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
  • Multimedia component 808 includes a screen providing an output interface between the electronic device and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense a boundary of a touch or swipe action, but also detect duration and pressure associated with the touch or swipe action.
  • the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capability.
  • the audio component 810 is configured to output and/or input audio signals.
  • the audio component 810 includes a microphone (MIC), which is configured to receive an external audio signal when the terminal is in an operation mode, such as a call mode, a recording mode and a voice recognition mode. Received audio signals may be further stored in memory 804 or sent via communication component 816 .
  • the audio component 810 also includes a speaker for outputting audio signals.
  • the I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module, which may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: a home button, volume buttons, start button, and lock button.
  • Sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of electronic device 800 .
  • the sensor component 814 can detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and the keypad of the electronic device; the sensor component 814 can also detect a change in position of the electronic device or one of its components, the presence or absence of user contact with the electronic device, the orientation or acceleration/deceleration of the electronic device, and a change in its temperature.
  • Sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
  • Sensor assembly 814 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
  • the communication component 816 is configured to facilitate wired or wireless communication between the electronic device and other devices. Electronic devices can access wireless networks based on communication standards, such as WiFi, 2G/3G/4G/5G, or a combination thereof.
  • the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 816 also includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • the electronic device may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the vector shuffling method described above.
  • a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 804 including instructions, which can be executed by the processor 820 of the electronic device to implement the above vector shuffling method.
  • the non-transitory computer readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.
  • the electronic device in the embodiment of the present application is used to implement the corresponding vector shuffling method in the foregoing method embodiments, and has the beneficial effect of implementing the corresponding method, which will not be repeated here.
  • each embodiment in this specification is described in a progressive manner, each embodiment focuses on the difference from other embodiments, and the same and similar parts of each embodiment can be referred to each other.
  • the description is relatively simple, and for related parts, please refer to the part of the description of the method embodiment.
  • modules in the device in the embodiment can be adaptively changed and arranged in one or more devices different from the embodiment.
  • Modules or units or components in the embodiments may be combined into one module or unit or component, and furthermore may be divided into a plurality of sub-modules or sub-units or sub-assemblies.
  • All features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination, except combinations in which at least some of such features and/or processes or units are mutually exclusive.
  • Each feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
  • the various component embodiments of the present application may be realized in hardware, or in software modules running on one or more processors, or in a combination thereof.
  • a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all functions of some or all components in the browser client device according to the embodiments of the present application.
  • the present application can also be implemented as an apparatus or apparatus program (eg, computer program and computer program product) for performing a part or all of the methods described herein.
  • Such a program implementing the present application may be stored on a computer-readable medium, or may be in the form of one or more signals.
  • Such a signal may be downloaded from an Internet site, or provided on a carrier signal, or provided in any other form.

Abstract

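The present application provides a vector shuffling method, a processor, and an electronic device. The method includes: receiving an instruction that includes a register identifier and a shuffling parameter, where the register identifier includes a source register identifier and a destination register identifier; the source register identifier denotes a source register, which stores the source elements operated on when the vector shuffling operation is performed; the destination register identifier denotes a destination register, which stores the target elements obtained after the vector shuffling operation; and the shuffling parameter indicates the parameters according to which the vector shuffling operation is performed on the source elements; executing the instruction, so as to perform the vector shuffling operation, according to the shuffling parameter, on the source elements obtained from the source register and obtain the target elements after the vector shuffling operation; and writing the target elements into the destination register. With a single instruction, the application implements a vector shuffling operation for a specific function, improving the execution efficiency of that function.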

Description

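Vector shuffling method, processor, and electronic device
This application claims priority to Chinese patent application No. 202111508098.8, entitled "Vector shuffling method, processor, and electronic device", filed with the China Patent Office on December 10, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a vector shuffling method, a processor, and an electronic device.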
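Background
With the development of multimedia applications, more and more of a processor's computing tasks come from digital image processing, and image-based applications have become a workload that cannot be ignored on servers, desktop computers, and personal mobile devices (that is, embedded devices). Updating the instruction set architecture to reflect how digital image processing software actually behaves, by adding instruction support for operations commonly used in applications, is a major direction of processor development and a simple, effective way for a processor to improve performance for specific applications. More and more processors therefore add a Single Instruction Multiple Data (SIMD) structure to support applying the same operation across regular data sets.
At present, shuffling instructions are widely used in SIMD processors, and different shuffling instructions meet different needs. In existing technical solutions, however, implementing a vector shuffling operation for a specific function requires multiple instructions that carry out a series of operations; this is relatively complex and reduces the execution efficiency of the specific function.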
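Summary
The present application provides a vector shuffling method, a processor, and an electronic device, to solve the prior-art problem that multiple instructions are needed to carry out a series of operations, which is relatively complex and reduces the execution efficiency of a specific function.
To solve the above problem, the present application discloses a vector shuffling method, the method including:
receiving an instruction, the instruction including a register identifier and a shuffling parameter; wherein the register identifier includes a source register identifier and a destination register identifier; the source register identifier denotes a source register, the source register being a register that stores the source elements operated on when a vector shuffling operation is performed; the destination register identifier denotes a destination register, the destination register being a register that stores the target elements obtained after the vector shuffling operation is performed; and the shuffling parameter indicates the parameters according to which the vector shuffling operation is performed on the source elements;
executing the instruction, so as to perform the vector shuffling operation, according to the shuffling parameter, on the source elements obtained from the source register, and obtain the target elements after the vector shuffling operation;
writing the target elements into the destination register.
To solve the above problem, the present application discloses a processor, including:
a plurality of vector registers, including a source register and a destination register, the source register being used to store data elements;
a decoding unit, configured to decode a vector shuffling instruction; wherein the vector shuffling instruction includes a register identifier and a shuffling parameter, and the register identifier includes a source register identifier and a destination register identifier;
an execution unit that, in response to the vector shuffling instruction, performs a vector shuffling operation, according to the shuffling parameter, on the source elements obtained from the source register, obtains the target elements after the vector shuffling operation, and writes the target elements into the destination register.
To solve the above problem, the present application discloses an electronic device including a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors to perform one or more of the vector shuffling methods described above.
Compared with the prior art, the present application has the following advantages:
In the vector shuffling method, processor, and electronic device provided by the embodiments of the present application, a register identifier and a shuffling parameter are added to the instruction, and the shuffling parameter is used to perform a vector shuffling operation on the elements obtained from the source register. A vector shuffling operation for a specific function can therefore be implemented with a single instruction, without multiple shuffling instructions, which improves the execution efficiency of the specific function.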
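Brief Description of the Drawings
FIG. 1 is a flowchart of the steps of a vector shuffling method provided by Embodiment 1 of the present application;
FIG. 2 is a flowchart of the steps of a vector shuffling method provided by Embodiment 2 of the present application;
FIG. 3 is a flowchart of the steps of a vector shuffling method provided by Embodiment 3 of the present application;
FIG. 4 is a flowchart of the steps of a vector shuffling method provided by Embodiment 4 of the present application;
FIG. 5 is a flowchart of the steps of a vector shuffling method provided by Embodiment 5 of the present application;
FIG. 6 is a flowchart of the steps of a vector shuffling method provided by Embodiment 6 of the present application;
FIG. 7 is a structural block diagram of a processor provided by an embodiment of the present application;
FIG. 8 is a structural block diagram of an electronic device provided by an embodiment of the present application.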
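Detailed Description
To make the above objects, features, and advantages of the present application clearer and easier to understand, the application is described in further detail below with reference to the drawings and specific embodiments.
The terms "first", "second", "third", and the like in the specification, the claims, and the above drawings are used to distinguish similar objects or entities and do not necessarily imply a specific order or sequence, unless otherwise indicated. It should be understood that terms used in this way are interchangeable where appropriate; for example, the embodiments can be implemented in an order other than the one illustrated or described.
Although the following embodiments are described with reference to a processor, other embodiments are applicable to other types of integrated circuits and logic devices. The techniques and teachings of the present application can readily be applied to other types of circuits or semiconductor devices, which would benefit from higher pipeline throughput and improved performance. The embodiments are applicable to any processor or machine that performs data manipulation. The application is not limited to processors or machines that perform 256-bit, 128-bit, 64-bit, 32-bit, or 16-bit data operations; it applies to any processor or machine in which packed data needs to be operated on.
In the following description, numerous specific details are given for the purpose of explanation, to provide a thorough understanding of the present application. Those skilled in the art will recognize, however, that these specific details are not required to practice the application. In other cases, some well-known electrical structures and circuits are not described in detail, to avoid unnecessarily obscuring the application. In addition, the following description provides several examples, and the drawings show various examples for illustration. These examples should not be construed as limiting: they are only some examples of the application, not an exhaustive list of all its possible implementations.
Although the following examples describe instruction processing and dispatch in the context of an execution unit, other embodiments of the present application may be implemented in software. In one embodiment, the method of the application is embodied as machine-executable instructions. The instructions may be used to cause a general-purpose or special-purpose processor programmed with them to perform the steps of the application. The steps may be performed by dedicated hardware components containing hard-wired logic for performing them, or by any combination of programmed computer components and custom hardware components. The software may be stored in a memory in the system.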
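Embodiment 1
Referring to FIG. 1, a flowchart of the steps of a vector shuffling method provided by an embodiment of the present application is shown.
The vector shuffling method provided by this embodiment may be executed by a CPU (Central Processing Unit) and includes the following steps:
Step 101: receive an instruction, the instruction including a register identifier and a shuffling parameter.
In this embodiment, the instruction is an instruction for performing a vector shuffling operation, and it is an instruction to be executed by the CPU.
When a vector shuffling operation is to be performed, the CPU receives the instruction for performing it; the instruction contains a register identifier and a shuffling parameter.
The register identifier may include a source register identifier and a destination register identifier. The source register identifier denotes a source register, which stores the source elements operated on when the vector shuffling operation is performed; the source elements operated on may be all of the data stored in the source register or only part of it. The destination register identifier denotes a destination register, which stores the target elements obtained after the vector shuffling operation is performed.
In this example, there may be one or two source registers; that is, the source elements come from one or two registers. The number of source registers depends on requirements and is not limited by this embodiment.
The shuffling parameter indicates the parameters according to which the vector shuffling operation is performed on the source elements. In this example, the shuffling parameter may include parameters such as an index value and an operation code. Optionally, the index value is presented as an immediate; the operation code is a code expressed in binary, or an identifier that can be converted into binary code.
After the instruction is received, step 102 is performed.
Step 102: execute the instruction, so as to perform the vector shuffling operation, according to the shuffling parameter, on the source elements obtained from the source register, and obtain the target elements after the vector shuffling operation.
A target element is an element obtained after the vector shuffling operation is performed on the elements in the source register.
In this embodiment, after receiving the instruction for performing the vector shuffling operation, the CPU executes it: it performs the vector shuffling operation on the source elements obtained from the source register according to the shuffling parameter and obtains the resulting target elements.
Step 103: write the target elements into the destination register.
In this embodiment, once the target elements have been obtained, they can be written into the destination register.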
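Optionally, obtaining source elements according to the shuffling parameter, so as to perform the vector shuffling operation and obtain the target elements, includes: selecting source elements from the source register according to the position information, within the source register, of the source elements required by the vector shuffling operation and the number of source elements required, and taking all of the selected source elements as target elements. This is described in detail in the following specific implementation.
In a specific implementation of the present application, step 102 may include:
Sub-step A1: determine, according to the shuffling parameter, the position information of the source elements in the source register and the number of source elements required by the vector shuffling operation; the number of selected source elements is one or more.
In this embodiment, the shuffling parameter contains parameters that indicate the position information of the source elements in the source register and the number of source elements.
After receiving the instruction for performing the vector shuffling operation, the CPU parses it to obtain the shuffling parameter it contains.
From the parsed shuffling parameter, the CPU determines the position information, in the source register, of the source elements required by the vector shuffling operation and the number of source elements required. One or more source elements may be selected; the examples that follow use multiple source elements.
After the position information and the number of source elements have been determined from the shuffling parameter, sub-step A2 is performed.
Sub-step A2: select source elements from the source register according to the determined position information and number of source elements.
Once the position information and the number of source elements have been determined from the shuffling parameter, the source elements are selected from the source register.
After the source elements have been selected from the source register, sub-step A3 is performed.
Sub-step A3: determine all of the selected source elements as target elements.
After the source elements have been selected from the source register according to the determined position information and number, all of the selected source elements are taken as target elements to be written into the destination register.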
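In this embodiment, the shuffling parameter may include an index value and an operation code, which together are used to select the source elements. This is described in detail in the following specific implementation.
Optionally, the index value indicates the position information, in the source register, of each source element required by the vector shuffling operation, and the operation code characterizes the operation performed on the source register and the destination register. Sub-step A2 may include:
Sub-step B1: determine, according to the index value and the operation code, a selection rule for obtaining source elements.
In this embodiment, the selection rule is the set of constraints used when reading source elements from the source register.
After the shuffling parameter is obtained, the selection rule for obtaining source elements from the source register is determined from the index value and the operation code it contains. There are two cases:
Case 1: when the number of index values differs from the number of source elements, the way the source elements are grouped is determined from the number of index values, and the selection rule is determined from that grouping and the operation code. That is, the source elements in the source register are first divided into groups, for example groups of N adjacent source elements, and the source elements are then obtained from each group according to the index values. N is usually four, but it can also be determined by the specific application scenario, such as the bit width of the source register; details are omitted here.
Case 2: when the number of index values is the same as the number of source elements, the selection rule is determined from the operation code.
After the selection rule has been determined from the index value and the operation code, sub-step B2 is performed.
Sub-step B2: obtain, from the source register and according to the selection rule, the source element indicated by each index value.
Once the selection rule has been determined, the source element indicated by each index value is obtained from the source register according to that rule.
In practice, the CPU can tell from the operation code of the vector shuffling instruction whether the number of index values equals the number of source elements; that is, the operation code alone determines how the source elements are grouped and which selection rule applies.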
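Optionally, there is a preset correspondence between index values and addresses in the destination register. Optionally, writing the target elements into the positions of the destination register that correspond to the immediate means determining, in the destination register, the position corresponding to each index value and storing the source elements at the determined positions in order. Specifically, each source element is obtained through a given index value and is written to the destination-register address that corresponds to that index value. For example, source element A is obtained through index value ui8[1:0] (ui8 denotes an immediate that encodes a set of index values, and ui8[1:0] denotes the number formed by its lowest 2 bits); index value ui8[1:0] corresponds to the lowest address in a group of addresses in the destination register, so the obtained source element A is written, as a target element, to the lowest address of the destination register. As an example, the immediate ui8 is a set of 8 bits from which 4 index values are constructed, each formed by 2 bits of ui8. The position, or ordinal, of an index value within ui8 indicates or implies the element position in the destination register to which the source element obtained through that index value is moved. For example, index value ui8[7:6] is the 4th index value in ui8, so its source operand is written to the 4th element position of the destination register. Similarly, when ui8[n:n-1] is the ((n+1)/2)-th index value in ui8, its source operand is written to the ((n+1)/2)-th element position of the destination register. It will be understood that the immediate may contain a different number of index values; the source operand corresponding to the i-th index value in the immediate is written to the i-th element position of the destination register, where i is a positive integer.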
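The field extraction described above can be sketched in Python. This is an illustrative model only, not the application's implementation: the function names `split_index_fields` and `place_by_field_order` are hypothetical, and the sketch assumes 2-bit fields packed from the least significant bits upward, as in the ui8 example.

```python
def split_index_fields(ui8, field_bits=2, count=4):
    """Split an immediate into `count` index values, lowest bit field first.

    Assumption: fields are contiguous and packed from bit 0 upward,
    e.g. ui8[1:0], ui8[3:2], ui8[5:4], ui8[7:6].
    """
    if not 0 <= ui8 < (1 << (field_bits * count)):
        raise ValueError("immediate out of range")
    mask = (1 << field_bits) - 1
    return [(ui8 >> (k * field_bits)) & mask for k in range(count)]


def place_by_field_order(source, ui8):
    """The k-th index value selects the element written to position k."""
    return [source[index] for index in split_index_fields(ui8)]


fields = split_index_fields(0b00011011)
shuffled = place_by_field_order(["A", "B", "C", "D"], 0b00011011)
```

With ui8 = 0b00011011 the four fields are 3, 2, 1, and 0 (lowest field first), so the four elements are written in reverse order.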
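In the prior art, a SHUF instruction (a kind of shuffling instruction) can produce shuffling results with different functions depending on the shuffling mode that is set. The shuffling mode is determined by the application's requirements. It is usually loaded by at least one other instruction and passed to the shuffling instruction, or it is placed in memory and fetched by a memory access while the shuffling instruction executes. The prior art therefore needs either multiple instructions or a memory access to implement shuffling instructions with different shuffling modes, and either approach greatly increases the overhead on the whole CPU system. To address this problem, the embodiments of the present application add a shuffling parameter (an operation code and index values) to the instruction. Different shuffling parameters implement shuffling instructions with different shuffling modes, so data shuffling needs neither multiple instructions nor a memory access to obtain the shuffling mode: a single shuffling instruction performs the data shuffling operation, which effectively reduces system overhead.
Because the index value can be implemented as an immediate, and the operation code can be implemented as a code expressed in binary or as an identifier that can be converted into binary code, Embodiments 2 to 6 below build on the method of Embodiment 1 and describe in detail how vector shuffling instructions with different operation codes are processed.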
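Embodiment 2
In a specific implementation of the present application, the operation code is a first operation code, and the number of index values differs from the number of source elements. As shown in FIG. 2, processing of the vector shuffling instruction may include:
Step 201: receive an instruction, the instruction including a register identifier and a shuffling parameter.
In this embodiment, the meaning of the instruction and the parameters it contains are as described in Embodiment 1 and are not repeated here.
Optionally, there is one source register; that is, the source elements come from a single register.
Optionally, the shuffling parameter includes an index value and an operation code. The index value is implemented as an immediate; the operation code is implemented as an identifier that can be converted into binary code, and it is the first operation code.
Optionally, the instruction format is "operation code destination register, source register, immediate". Following this format, in a specific implementation the instruction can be written as "[X]VS.{B/H/W} vd, vj, ui8". [X]VS is the instruction name in the first operation code; [X] is optional and distinguishes registers of different bit widths. {B/H/W} is the data type in the first operation code: B denotes byte, H denotes halfword, and W denotes word. [X]VS.{B/H/W} is the first operation code in identifier form; vd denotes the destination register, vj the source register, and ui8 the immediate. For example, VS.{B/H/W} is a first operation code that can be converted into binary form; [X]VS.B, for instance, is converted into the binary first operation code 01110011100100. The immediate may be a set of data: different bit fields of ui8, namely ui8[1:0], ui8[3:2], ui8[5:4], and ui8[7:6], can express index values for different positions in the register.
Step 202: execute the instruction and, according to the operation code and the index values, form one element group from every N1 adjacent elements in the source register; the data type of the elements is any one of byte, halfword, and word, and N1 is a positive integer.
In this embodiment, when the number of index values differs from the number of source elements, every N1 adjacent elements in the source register form one element group, and the data type of the adjacent elements may be byte, halfword, or word; for example, every four adjacent word elements in the source register may form one element group. The selection rule is made up of these conditions together: the number of index values differs from the number of source elements; every N1 adjacent elements in the source register form one element group; the data type is byte, halfword, or word; source elements are selected from within an element group; and N1 is a positive integer. For example, N1 may also be the number of index values: even though there are fewer index values than source elements, the source elements are grouped so that the number of index values equals the number of source elements in each group, and each index value corresponds one-to-one to a source element within a group.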
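Adjacent elements are elements whose positions in the source register are consecutive. The element addresses in neighboring element groups may partly coincide or be entirely different; an element address is the position information of the element in the register. When neighboring element groups share elements with the same position information, any two neighboring groups share at most N1-1 such elements. Further, adjacent elements may also be interleaved elements in the source register. For example, when the operation code is the first operation code and the data type is byte, halfword, or word, suppose the source register contains eight elements, A1 to A8, whose positions are consecutive in the order listed, and N1=4. The N1 elements may be consecutive elements, such as A2 to A5, or interleaved elements, such as A1, A3, A5, and A7.
Based on the above, forming one element group from every N1 adjacent elements in the source register covers two cases:
Case 1: elements A1 to A4 form one element group and elements A5 to A8 form another; the two groups share no elements with the same position information.
Case 2: elements A1 to A4 form one element group and elements A2 to A5 form another; the two groups share three elements with the same position information (A2, A3, and A4). Alternatively, elements A3 to A6 or elements A4 to A7 may be chosen as the other group, provided that any two neighboring element groups share at most N1-1 elements with the same position information; details are omitted here.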
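The two grouping cases can be modeled with a sliding window. The sketch below is an illustration under stated assumptions: the `stride` parameter and the function name `make_groups` are not part of the application. A stride of N1 gives the disjoint groups of case 1, and a stride of 1 gives neighboring groups that share N1-1 elements, as in case 2.

```python
def make_groups(elements, n1, stride):
    """Return windows of n1 consecutive elements, one starting every `stride` positions.

    `stride` is a hypothetical knob: stride == n1 gives disjoint groups,
    stride == 1 gives maximally overlapping neighbors.
    """
    if n1 <= 0 or not 1 <= stride <= n1:
        raise ValueError("require n1 > 0 and 1 <= stride <= n1")
    last_start = len(elements) - n1
    return [elements[s:s + n1] for s in range(0, last_start + 1, stride)]


elements = ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"]
disjoint = make_groups(elements, n1=4, stride=4)      # case 1
overlapping = make_groups(elements, n1=4, stride=1)   # case 2

# Elements shared by the first two overlapping groups (A2, A3, A4).
shared = [e for e in overlapping[0] if e in overlapping[1]]
```

The number of elements shared by two neighboring groups is n1 - stride, which reaches the stated maximum of N1-1 when the stride is 1.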
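Optionally, among the element groups, the elements within each group have the same data type, and the elements in different groups also have the same data type. For example, the element groups are element group 1, element group 2, and element group 3, and the elements in all three groups are bytes, or all halfwords, or all words.
Further, different element groups use the same index values, or different element groups use index values that are not all the same. For example, when different element groups use the same index values, element groups 1 to 4 all use the index values in ui8. When they use index values that are not all the same, element groups 1 and 2 select source elements using ui8a, and element groups 3 and 4 select source elements using ui8b, where ui8a and ui8b denote different positions in ui8 and represent index values with different values.
In yet another example, the elements within each element group have the same data type, but the elements in different element groups (for example, element group 1 and element group 2) have different data types. Further, the element groups contain the same or different numbers of elements. For example, element group 1 has 4 elements and element group 2 has 2 elements, and the same immediate ui8 provides 4 index values for element group 1 and 2 index values for element group 2.
It will be understood that the above examples are given only for a better understanding of the technical solutions of the embodiments and are not the only limitation on them.
After every N1 adjacent elements in the source register have been formed into element groups, step 203 is performed.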
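Step 203: determine the elements in each element group as initial source elements.
In this embodiment, after the element groups have been formed, the elements in each group are determined as initial source elements. Initial source elements are the elements from which source elements are selected.
After the elements of each element group have been determined as initial source elements, step 204 is performed.
Step 204: obtain, from the initial source elements, the source element indicated by each index value; n1 source elements are selected from each element group.
In this embodiment, after the initial source elements are determined, the source element indicated by each immediate is obtained from them; that is, the corresponding source elements are selected from each element group according to the immediate. n1 source elements are selected from each element group, where n1 is a positive integer.
Optionally, there is a preset correspondence between the immediate and the element positions in each element group. An element position may be an element address or the element's ordinal within the group, that is, its position number within the group.
Optionally, obtaining the source element indicated by each immediate from the initial source elements means obtaining, from every element group, the element at the position corresponding to the immediate and determining the obtained elements as source elements. The same number of source elements is selected from each element group.
In a specific implementation, when the operation code is the first operation code, N1=4, and the data type is byte, halfword, or word, every element group contains the same number of initial source elements, namely four, and the source elements corresponding to the immediate are selected from each group; n1 is 4, so N1=n1. For example, when the immediate denotes element address 3, the element at address 3 is selected from each element group, and all selected elements are determined as source elements. Likewise, when the immediate denotes ordinal 3, the third element counting from the first element is selected from each group, and all selected elements are determined as source elements.
Optionally, N1 may or may not equal n1. When N1=n1, step 204 may be skipped, and the elements of each element group from step 203 are used directly as the selected elements.
Further, four source elements are selected from each element group, and their data type is byte, halfword, or word; the selected source elements usually all have the same data type.
Step 205: determine the selected source elements as target elements, and write the target elements to the positions in the destination register that correspond to the index values.
In this embodiment, there is a preset correspondence between the immediate and addresses in the destination register. Optionally, writing the target elements to the positions in the destination register that correspond to the immediate means determining, in the destination register, the positions corresponding to the immediate and storing the source elements at those positions in order.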
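Further, in one feasible solution, a step of creating an intermediate vector may be added between step 201 and step 202. Specifically, before source elements are selected from the source register according to the determined position information and number of source elements, an intermediate vector is created; it contains at least one intermediate vector parameter, and the number of intermediate vector parameters equals the number of target elements. With the intermediate vector in place, after step 204, that is, after the source elements are selected from the source register, each selected source element is stored in the corresponding intermediate vector parameter of the intermediate vector; the intermediate vector parameters correspond one-to-one to the selected source elements. Step 205 then becomes: according to the immediate, write the content of each intermediate vector parameter to the corresponding position of the destination register.
Optionally, the intermediate vector may be created from the source register, for example according to the type of the source register.
Optionally, the number of intermediate vector parameters in the intermediate vector equals the number of target elements, and, according to the index values, there is a preset correspondence between the position of each target element in the destination register and each intermediate vector parameter. When the source elements are grouped, writing the content of each intermediate vector parameter to the corresponding position of the destination register according to the immediate works as follows: set a parameter i, a constant ranging from 0 to n-1, where n is determined by the register bit width and the data type; use N1 and i to determine the source element in the source register that each intermediate vector parameter refers to; and let i run from 0 to n-1 while writing the source element corresponding to each index value in the intermediate vector to the target element position in the destination register that corresponds to that index. Specifically, with [N1i], [N1i+1], [N1i+2], ..., [N1i+N1-1] denoting the different positions, the intermediate vector can be written as "intermediate vector = {VR[source register].data type[N1i+N1-1], ..., VR[source register].data type[N1i]}". For example, for a 128-bit register, n is 4 when the data type is byte, 2 when it is halfword, and 1 when it is word.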
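The values of n quoted above follow from dividing the register width by the width of one element group. A small sketch, assuming element widths of 8, 16, and 32 bits for byte, halfword, and word; the table `ELEMENT_BITS` and the helper name `group_count` are hypothetical and used only for illustration:

```python
# Assumed element widths in bits (byte, halfword, word).
ELEMENT_BITS = {"B": 8, "H": 16, "W": 32}


def group_count(register_bits, data_type, n1=4):
    """Number of element groups n, so that i runs over 0 .. n-1."""
    bits_per_group = ELEMENT_BITS[data_type] * n1
    if register_bits % bits_per_group != 0:
        raise ValueError("register width is not a whole number of groups")
    return register_bits // bits_per_group


counts = {t: group_count(128, t) for t in ("B", "H", "W")}
```

For a 128-bit register with N1 = 4 this gives 4 groups of bytes, 2 groups of halfwords, and 1 group of words, matching the example in the text.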
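Based on the intermediate-vector scheme above, for example, when the first operation code is VS.B and vj is the source register, the intermediate vector vec0={VR[vj].B[4i+3],VR[vj].B[4i+2],VR[vj].B[4i+1],VR[vj].B[4i]} is created, where VR[vj].B[4i+3], VR[vj].B[4i+2], VR[vj].B[4i+1], and VR[vj].B[4i] are the intermediate vector parameters; i is a constant, and [4i+0], [4i+1], [4i+2], and [4i+3] denote four consecutive positions in the register; i ranges from 0 to 3. Writing the content of each intermediate vector parameter to the corresponding position of the destination register vd can be expressed as: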
VR[vd].B[4i+0]=vec0.B[ui8[1:0]]
VR[vd].B[4i+1]=vec0.B[ui8[3:2]]
VR[vd].B[4i+2]=vec0.B[ui8[5:4]]
VR[vd].B[4i+3]=vec0.B[ui8[7:6]]
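Here ui8[1:0], ui8[3:2], ui8[5:4], and ui8[7:6] are all taken from the immediate and are the index values into the intermediate vector. Specifically, the lowest two bits of ui8 (ui8[1:0]) give the index of the first target element in the intermediate vector, the third and fourth bits (ui8[3:2]) give the index of the second target element, the fifth and sixth bits (ui8[5:4]) give the index of the third target element, and the seventh and eighth bits (ui8[7:6]) give the index of the fourth target element.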
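The four assignments above can be modeled directly in Python. This is a minimal sketch, not the hardware implementation: the function name `vs_shuffle` is hypothetical, registers are modeled as Python lists with element 0 first, and it assumes that the braces in the vec0 definition list elements from highest to lowest, so that vec0.B[0] is VR[vj].B[4i].

```python
def vs_shuffle(vj, ui8):
    """Model of the VS.B pseudocode: shuffle each group of four elements by ui8.

    vj is a list whose length is a multiple of 4; the same four 2-bit
    index values are applied to every group.
    """
    if len(vj) % 4 != 0:
        raise ValueError("element count must be a multiple of 4")
    index = [(ui8 >> (2 * k)) & 0b11 for k in range(4)]
    vd = [None] * len(vj)
    for i in range(len(vj) // 4):
        vec0 = vj[4 * i:4 * i + 4]          # vec0[0] models VR[vj].B[4i]
        for k in range(4):
            vd[4 * i + k] = vec0[index[k]]  # VR[vd].B[4i+k] = vec0.B[ui8[2k+1:2k]]
    return vd


source = list(range(16))                    # 16 bytes of a 128-bit register
reversed_groups = vs_shuffle(source, 0b00011011)
broadcast_first = vs_shuffle(source, 0b00000000)
```

With ui8 = 0b00011011 every group of four is reversed; with ui8 = 0 the lowest element of each group is copied to all four positions of that group.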
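Likewise, when the data type is halfword or word, the intermediate vector and the indexing are the same as in the example above. When the instruction name of the operation code is XVS.{B/H/W}, two intermediate vectors are needed to implement the vector shuffling operation. For example, when the first operation code is XVS.B, the intermediate vectors are as follows: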
vec0={VR[vj].B[4i+3],VR[vj].B[4i+2],VR[vj].B[4i+1],VR[vj].B[4i]}
vec1={VR[vj].B[4i+19],VR[vj].B[4i+18],VR[vj].B[4i+17],VR[vj].B[4i+16]}
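Here the intermediate vectors are vec0 and vec1. VR[vj].B[4i+3], VR[vj].B[4i+2], VR[vj].B[4i+1], and VR[vj].B[4i] are the intermediate vector parameters of vec0, and VR[vj].B[4i+19], VR[vj].B[4i+18], VR[vj].B[4i+17], and VR[vj].B[4i+16] are the intermediate vector parameters of vec1. B indicates that the data type is byte; i determines the position of the elements in the register: [4i+0], [4i+1], [4i+2], and [4i+3] denote elements at four consecutive positions in the register, as do [4i+16], [4i+17], [4i+18], and [4i+19].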
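The two-vector form can be sketched as follows. The text gives only the definitions of vec0 and vec1, so the write-back rule used here (byte 4i+k of the low half comes from vec0 and byte 4i+16+k of the high half comes from vec1, both indexed by the same fields of ui8) is an assumption made by analogy with the VS.B case. The function name `xvs_b` is hypothetical.

```python
def xvs_b(xj, ui8):
    """Model of XVS.B on a 256-bit register held as 32 byte-sized list entries.

    Assumed write-back: for each 128-bit half (base 0 and base 16),
    destination byte base+4i+k is taken from that half's intermediate vector.
    """
    if len(xj) != 32:
        raise ValueError("expected 32 byte elements")
    index = [(ui8 >> (2 * k)) & 0b11 for k in range(4)]
    xd = [None] * 32
    for base in (0, 16):                    # base 0 -> vec0, base 16 -> vec1
        for i in range(4):
            start = base + 4 * i
            vec = xj[start:start + 4]       # vec[0] models byte [4i + base]
            for k in range(4):
                xd[start + k] = vec[index[k]]
    return xd


result = xvs_b(list(range(32)), 0b00011011)
```

Under these assumptions the high half behaves exactly like the low half, and no element crosses the 128-bit boundary.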
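For example, when the first operation code is XVS.B, the data type is byte, and N1 is 4, the vector shuffling instruction "XVS.B vd, vj, ui8" reads groups of four adjacent byte elements from vector register vj, shuffles each group, and writes the result into vector register vd. When the first operation code is VS.H, the data type is halfword, and N1 is 4, "VS.H vd, vj, ui8" does the same with groups of four adjacent halfword elements. When the first operation code is VS.W, the data type is word, and N1 is 4, "VS.W vd, vj, ui8" does the same with groups of four adjacent word elements.
It will be understood that the above examples are given only for a better understanding of the technical solution of the present application and are not the only limitation on the embodiments.
In this embodiment, a shuffling parameter comprising index values and an operation code is added to the vector shuffling instruction, and the index values and the operation code together implement the shuffling operation for the case where the number of source operands differs from the number of index values. With the technical solution of the present application, a single vector shuffling instruction thus implements this shuffling operation without additional instructions to pass the shuffling mode and without a memory access to obtain it, which effectively reduces system overhead and improves the execution efficiency of the vector shuffling operation.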
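Embodiment 3
In a specific implementation of the present application, the operation code is a second operation code, and the number of index values is the same as the number of source elements. As shown in FIG. 3, processing of the vector shuffling instruction may include:
Step 301: receive an instruction, the instruction including a register identifier and a shuffling parameter.
In this embodiment, the meaning of the instruction and the parameters it contains are as described in Embodiments 1 and 2 and are not repeated here.
Optionally, there are two source registers; that is, the source elements come from two different registers. When there are multiple source registers, either every source register identifier differs from the destination register identifier, or one of the source register identifiers is the same as the destination register identifier.
Optionally, the shuffling parameter includes an index value and an operation code. The index value is implemented as an immediate; the operation code is implemented as an identifier that can be converted into binary code, and it is the second operation code. For example, when the operation code is the second operation code, the source registers include a first source register and a second source register, and the destination register is the first source register or the second source register.
Optionally, the instruction format is "operation code destination register, source register, immediate". Following this format, in a specific implementation the instruction can be written as "[X]VS.D vd, vj, ui8". [X]VS is the instruction name in the second operation code, and D is the data type in the second operation code, denoting doubleword; [X]VS.D is the second operation code in identifier form. vd denotes the destination register, vj and vd denote the source registers, and ui8 denotes the immediate. For example, VS.D can be converted into a second operation code in binary form, namely 01110011100111. The immediate may be a set of data: different bit fields of ui8, namely ui8[1:0], ui8[3:2], ui8[5:4], and ui8[7:6], can express index values.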
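Step 302: execute the instruction and, according to the operation code and the index values, obtain in the source registers the source element indicated by each index value from the M_N2 elements in every N2 bits; the data type of the elements is doubleword; n2 source elements are selected from the M_N2 elements in every N2 bits, and N2, M_N2, and n2 are all positive integers.
In this embodiment, the number of index values is the same as the number of source elements, and the operation code is the second operation code. The selection rule is made up of these conditions together: the source element indicated by each index value is obtained from the M_N2 elements in every N2 bits; the data type of the elements is doubleword; n2 source elements are selected from the M_N2 elements in every N2 bits; and N2, M_N2, and n2 are all positive integers.
Optionally, there is a preset correspondence between the index values and the element positions in each source register; an element position may be an element address. Obtaining, in the source registers, the source element indicated by each index value from the M_N2 elements in every N2 bits means obtaining, in the first source register, the first source element indicated by each index value from the M_N2 elements in every N2 bits, and obtaining, in the second source register, the second source element indicated by each index value from the M_N2 elements in every N2 bits; the first source elements and second source elements are the source elements finally selected. The M_N2 elements may be consecutive elements or interleaved elements. For example, when M_N2 is four and the source register contains eight elements, A1 to A8, the M_N2 elements may be A2 to A5, or the interleaved elements A1, A3, A5, and A7.
For example, when the second operation code is [X]VS.D, N2 is 128, M_N2 is four, and n2 is 2.
In a specific implementation, there are two source registers, a first source register and a second source register. Obtaining, in the source registers, the source element indicated by each index value from the M_N2 elements in every N2 bits includes: in the first source register, obtaining the source element indicated by the first index value (for example, ui8[1:0]) from the M_N2' elements in every N2 bits; and, in the second source register, obtaining the source element indicated by the second index value (for example, ui8[3:2]) from the M_N2' elements in every N2 bits; where M_N2' is half of M_N2, n2/2 source elements are selected from the first source register, and n2/2 source elements are selected from the second source register. When there are multiple source registers, each source register is shuffled using different bits of the immediate; that is, different source registers correspond to different bits of the immediate. Which bits of the immediate are used for indexing depends on the specific case and is not detailed here.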
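Step 303: determine the selected source elements as target elements, and write the target elements to the positions in the destination register that correspond to the index values.
In this embodiment, there is a preset correspondence between the immediate and addresses in the destination register. Optionally, writing the target elements to the positions in the destination register that correspond to the immediate means determining, in the destination register, the positions corresponding to the immediate and storing the source elements at those positions in order.
Further, in one feasible solution, a step of creating an intermediate vector may be added between step 301 and step 302. Specifically, before source elements are selected from the source register according to the determined position information and number of source elements, an intermediate vector is created; it contains at least one intermediate vector parameter. When element groups exist, the number of intermediate vector parameters equals the number of element groups; when no element groups exist, it equals the number of source elements. With the intermediate vector in place, after step 302, that is, after the source elements are selected from the source register, each selected source element is stored in the corresponding intermediate vector parameter of the intermediate vector; the intermediate vector parameters correspond one-to-one to the selected source elements. Step 303 then becomes: according to the immediate, write the content of each intermediate vector parameter to the corresponding position of the destination register. The intermediate vector is created in the same way as in Embodiment 2, which is not repeated here.
Optionally, writing the content of each intermediate vector parameter to the corresponding position of the destination register according to the immediate means performing the following for each intermediate vector parameter: write its content to the position in the destination register indicated by the index value corresponding to that intermediate vector parameter.
Based on the intermediate-vector scheme above, for example, when the second operation code is VS.D, the instruction format is "VS.D vd, vj, ui8", and vj and vd are the source registers, the intermediate vector vec0={VR[vj],VR[vd]} is created. Writing the content of each intermediate vector parameter to the corresponding position of the destination register vd can be expressed as: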
VR[vd].D[0]=vec0.D[ui8[1:0]]
VR[vd].D[1]=vec0.D[ui8[3:2]]
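Here ui8[1:0] and ui8[3:2] are both taken from the immediate and are the index values into the registers. Specifically, the lowest two bits of ui8 (ui8[1:0]) give the index of the first target element in the source registers, and the third and fourth bits (ui8[3:2]) give the index of the second target element in the source registers.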
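The two assignments above can be modeled as follows. This sketch assumes that the braces in vec0={VR[vj],VR[vd]} list the high part first, so that vec0.D[0] and vec0.D[1] are the doublewords of vd and vec0.D[2] and vec0.D[3] are those of vj; that ordering is an assumption, as is the function name `vs_d`. Because vd is both a source and the destination, the model builds vec0 before producing the result.

```python
def vs_d(vd, vj, ui8):
    """Model of VS.D: pick two doublewords out of the four held in {vj, vd}.

    vd and vj are 2-element lists (two 64-bit doublewords per 128-bit register).
    Assumed layout: vec0 = [vd.D[0], vd.D[1], vj.D[0], vj.D[1]].
    """
    if len(vd) != 2 or len(vj) != 2:
        raise ValueError("expected two doublewords per register")
    vec0 = list(vd) + list(vj)              # snapshot taken before vd is overwritten
    return [vec0[ui8 & 0b11],               # VR[vd].D[0] = vec0.D[ui8[1:0]]
            vec0[(ui8 >> 2) & 0b11]]        # VR[vd].D[1] = vec0.D[ui8[3:2]]


new_vd = vs_d(["d0", "d1"], ["j0", "j1"], 0b1001)
```

With ui8 = 0b1001 the low field is 1 and the next field is 2, so under the assumed layout the result is the upper doubleword of vd followed by the lower doubleword of vj.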
当第二操作码为XVS.D时,将需要两个中间向量实现向量混洗操作。示例性地,中间向量如下所示:
vec0={XR[xj][127:0],XR[xd][127:0]}
vec1={XR[xj][255:128],XR[xd][255:128]}
其中,中间向量为vec0和vec1;XR[xj][127:0],XR[xd][127:0]表示vec0的中间向量参数,XR[xj][255:128],XR[xd][255:128]表示vec1的中间向量参数;D表示数据类型为双字,64比特宽。
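The two-lane case can be sketched in the same way. This is a hedged illustration only: the name `xvs_d` is hypothetical, the element order inside each intermediate vector (xd doublewords first) is an assumption, and the sketch also assumes that both 128-bit lanes use the same index fields ui8[1:0] and ui8[3:2], which the source does not state explicitly.

```python
def xvs_d(xd, xj, ui8):
    """Model of "XVS.D xd, xj, ui8" on 256-bit registers.

    xd and xj are lists of four 64-bit doublewords, element 0 first.
    Returns the new contents of xd.
    """
    idx0 = ui8 & 0b11          # ui8[1:0]
    idx1 = (ui8 >> 2) & 0b11   # ui8[3:2]
    out = []
    for lane in (0, 2):        # lane 0 -> vec0 (bits 127:0), lane 2 -> vec1 (bits 255:128)
        # Assumption: within each intermediate vector the xd half comes first.
        vec = xd[lane:lane + 2] + xj[lane:lane + 2]
        out.extend([vec[idx0], vec[idx1]])
    return out


print(xvs_d([10, 11, 12, 13], [20, 21, 22, 23], 0b1001))  # [11, 20, 13, 22]
```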
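For example, when the second opcode is VS.D, the data type is doubleword, N2 is 128, M N2 is four and n2 is 2, the vector shuffle instruction "VS.D vd, vj, ui8" selects two doubleword elements, according to the immediate, from the four doubleword elements in each 128 bits of vector registers vj and vd, and writes the result to the corresponding 128 bits of vector register vd. When the second opcode is XVS.D, the data type is doubleword, N2 is 128, M N2 is four and n2 is 2, the vector shuffle instruction "XVS.D xd, xj, ui8" reads two doubleword elements, according to the immediate, from the four doubleword elements in each 128 bits of vector registers xj and xd, and then writes the elements read to the corresponding 128 bits of xd.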
本申请实施例中,两个源寄存器中存在一个源寄存器和目的寄存器相同,即存在一个寄存器既为源寄存器,又为目的寄存器;采用上述技术方案,每一次执行混洗指令,都可将目的寄存器中一半元素进行覆盖,可以应用在需要执行相应操作的软件应用场景中。
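In this embodiment, shuffle parameters comprising index values and an opcode are added to the vector shuffle instruction, and the index values and the opcode together implement the shuffle operation for the case where the number of source operands equals the number of index values, the data type is doubleword, and the register is 128 bits. It follows that, with the technical solution of this application, a single vector shuffle instruction implements the shuffle operation for equal numbers of source operands and index values with a doubleword data type, without adding further instructions to pass the shuffle pattern and without fetching the shuffle pattern from memory, which reduces system overhead and improves the execution efficiency of vector shuffle operations.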
实施例四
在本申请的一种具体实现方式中,所述操作码为第三操作码,所述索引值包括第一索引值、第二索引值、第三索引值和第四索引值,所述第一索引值、第二索引值、第三索引值和第四索引值分别索引相同或不同的位置;如图4所示,向量混洗指令的处理方式可以包括:
步骤401:接收指令,所述指令包括:寄存器标识和混洗参数。
本申请实施例中,指令的含义和指令包含的参数如实施例一、实施例二和实施例三所述,在此不再赘述。
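Optionally, there are two source registers, i.e. the source elements come from two different registers. When there are multiple source registers, either every source register identifier differs from the destination register identifier, or one of the source register identifiers is the same as the destination register identifier.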
可选地,混洗参数包括索引值和操作码;其中,索引值通过立即数的形式实现;操作码通过可以转化为二进制代码的标识符的形式实现,且操作码为第三操作码。示例性地,当操作码为第三操作码时,源寄存器包括第一源寄存器和第二源寄存器,且目的寄存器即为第一源寄存器或第二源寄存器。
可选地,指令格式为“操作码目的寄存器,源寄存器,立即数”。根据该指令格式,在具体实现中,指令可以表示为“[X]VP.W vd/xd,vj/xj,ui8”;[X]VP为第三操作码中的指令名称,W为第三操作码中的数据类型,W表示数据类型为字,[X]VP.W为标识符形式的第三操作码;vd/xd表示目的寄存器,vj和vd表示源寄存器(或者xj和xd表示源寄存器),ui8表示立即数。示例性地,VP.W为可以转化为二进制形式的第三操作码,如将VP.W转换为01110011111001二进制形式的第三操作码。此外,立即数可以为一组数据,如可以通过立即数ui8的不同位ui8[1:0]、ui8[3:2]、ui8[5:4]和ui8[7:6]表达索引值。
步骤402:执行所述指令,根据所述操作码和所述索引值,在所述第一源寄存器中,分别从每N3位的M N3个元素中获取第一索引值和第二索引值所指示的源元素;在所述第二源寄存器中,分别从每N3位的M N3个元素中获取第三索引值和第四索引值所指示的源元素。
其中,所述元素的数据类型为字;每N3位的M N3个元素中选取的源元素的数量为n3个,N3、M N3和n3均为大于0的正整数。
本申请实施例中,索引值包括四个索引值,分别为:第一索引值、第二索引值、第三索引值和第四索引值,第一索引值、第二索引值、第三索引值和第四索引值分别索引不同的位置。当源寄存器的数量为多个时,每个源寄存器通过立即数不同的位进行向量混洗,即不同源寄存器对应的立即数中的位不同;通过立即数的哪些位进行索引根据具体情况而定,在此不再赘述。示例性地,当第三操作码为[X]VP.W时,第一索引值为ui8[1:0],第二索引值为ui8[3:2],第三索引值为ui8[5:4],第四索引值为ui8[7:6]。
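In addition, the number of index values equals the number of source elements. The following conditions together form the selection rule: the source element indicated by each index value is obtained from the M N3 elements in every N3 bits; the data type of the elements is word; n3 source elements are selected from the M N3 elements in every N3 bits; and N3, M N3 and n3 are all positive integers greater than 0. The M N3 elements may be consecutive elements or interleaved elements. For example, when M N3 is four and a source register holds eight elements A1, A2, A3, A4, A5, A6, A7 and A8, the M N3 elements may be A2 to A5, or the interleaved elements A1, A3, A5 and A7.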
可选地,索引值分别和每个源寄存器中的元素位置之间存在预设对应关系;该元素位置可以为元素地址。在所述源寄存器中,分别从每N3位的M N3个元素中获取每个索引值所指示的源元素,即为在所述第一源寄存器中,分别从每N3位的M N3个元素中获取第一索引值和第二索引值所指示的源元素;以及在所述第二源寄存器中,分别从每N3位的M N3个元素中获取第三索引值和第四索引值所指示的源元素。
示例性地,当第三操作码为[X]VP.W时,N3为128,M N3为四,n3为2。
在源寄存器中分别从每N3位的M N3个元素中获取每个索引值所指示的源元素之后,执行步骤403和步骤404。
步骤403:将所述第一索引值指示的源元素确定为第一目标元素,并将第二索引值指示的源元素确定为第二目标元素。
本申请实施例中,将从第一源寄存器中选取的第一索引值指示的源元素确定为第一目标元素,并将从第一源寄存器中选取的第二索引值指示的源元素确定为第二目标元素。
步骤404:将所述第三索引值指示的源元素确定为第三目标元素,并将第四索引值指示的源元素确定为第四目标元素。
本申请实施例中,将从第二源寄存器中选取的第三索引值指示的源元素确定为第三目标元素,并将从第二源寄存器中选取的第四索引值指示的源元素确定为第四目标元素。
本申请实施例中,步骤403和步骤404可以为同时执行的步骤,也可以为先后执行的步骤,先后顺序不加约束;在全部执行完毕步骤403和步骤404后,执行步骤405。
步骤405:将所述第一目标元素和所述第二目标元素写入到所述目的寄存器中的第一位置;并将所述第三目标元素和所述第四目标元素写入到所述目的寄存器中的第二位置。
本申请实施例中,立即数和目的寄存器中地址之间存在预设对应关系。可选地,将目标元素写入所述目的寄存器中所述立即数对应的位置,即为从目的寄存器中确定立即数对应的位置,将源元素依次存储在确定的位置上。
本申请实施例中,当操作码为第三操作码时,在获取到第一目标元素、第二目标元素、第三目标元素和第四目标元素之后,可以将第一目标元素和第二目标元素写入到目的寄存器中的第一位置,并将第三目标元素和第四目标元素写入到目的寄存器的第二位置。
示例性地,当第三操作码为VP.W/XVP.W(两者简写为[X]VP.W),数据类型为字,N3为128,M N3为四,n3为2时,向量混洗指令“[X]VP.W vd,vj,ui8”表示将ui8[1:0]和ui8[3:2]值作为索引值,从向量寄存器vj/xj中每128位中的四个字元素中各选择出两个分别写入到向量寄存器vd/xd对应128位的第0个和第1个字元素中;将ui8[5:4]和ui8[7:6]值作为索引值,从向量寄存器vd/xd每128位中的四个字元素中各选择出两个分别写入到向量寄存器vd/xd对应128位的第2个和第3个字元素中。
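The behavior described for [X]VP.W can be sketched as follows. The function name `vp_w` and the list-based register model are illustrative and not taken from the source; the mapping of index fields to destination words follows the paragraph above.

```python
def vp_w(vd, vj, ui8):
    """Model of "[X]VP.W vd, vj, ui8".

    vd and vj are lists of 32-bit words (4 for a 128-bit register,
    8 for a 256-bit one), element 0 first. Returns the new vd.
    """
    if len(vd) != len(vj) or len(vd) % 4 != 0:
        raise ValueError("registers must hold a multiple of four words")
    i0 = ui8 & 0b11          # ui8[1:0]: word taken from vj
    i1 = (ui8 >> 2) & 0b11   # ui8[3:2]: word taken from vj
    i2 = (ui8 >> 4) & 0b11   # ui8[5:4]: word taken from vd
    i3 = (ui8 >> 6) & 0b11   # ui8[7:6]: word taken from vd
    out = []
    for base in range(0, len(vd), 4):   # one iteration per 128-bit group
        # Read only from the original vd and vj, never from the partial result.
        out.extend([vj[base + i0], vj[base + i1],
                    vd[base + i2], vd[base + i3]])
    return out


print(vp_w([0, 1, 2, 3], [10, 11, 12, 13], 0b00011011))  # [13, 12, 1, 0]
```

Building the result in a separate list mirrors the requirement that vd is read as a source before it is overwritten as the destination.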
本申请实施例中,在向量混洗指令中添加混洗参数,混洗参数包括索引值和操作码,根据索引值和操作码实现了源操作数和索引值数量相同、且数据类型为字情况下的混洗操作;由此可见,采用本申请技术方案,通过一条向量混洗指令,实现了源操作数和索引值数量相同、且数据类型为字情况下的混洗操作,无需增加其他指令传递混洗模式,也无需通过访存的方式获取混洗模式,从而有效降低了系统开销,提高了向量混洗操作的执行效率。
实施例五
在本申请的一种具体实现方式中,所述操作码为第四操作码,且所述索引值的数量与所述源元素的数量相同;如图5所示,向量混洗指令的处理方式可以包括:
步骤501:接收指令,所述指令包括:寄存器标识和混洗参数。
本申请实施例中,指令的含义和指令包含的参数如实施例一至实施例四所述,在此不再赘述。
可选地,源寄存器的数量为一个,即源元素来自一个寄存器。
可选地,混洗参数包括索引值和操作码;其中,索引值通过立即数的形式实现;操作码通过可以转化为二进制代码的标识符的形式实现,且操作码为第四操作码。
可选地,指令格式为“操作码目的寄存器,源寄存器,立即数”。根据该指令格式,在具体实现中,指令可以表示为XVP.D xd,xj,ui8;XVP为第四操作码中的指令名称,D为第四操作码中的数据类型,D表示数据类型为双字,XVP.D为标识符形式的第四操作码;xd表示目的寄存器,xj表示源寄存器,ui8表示立即数。示例性地,XVP.D可以转化为二进制形式的第四操作码,如将XVP.D转换为01110111111010二进制形式的第四操作码。此外,立即数可以为一组数据,如可以通过立即数ui8的不同位ui8[1:0]、ui8[3:2]、ui8[5:4]和ui8[7:6]表达索引值。
步骤502:执行所述指令,根据所述操作码和所述立即数,在所述源寄存器中,分别从M n4个元素中获取每个索引值所指示的源元素;其中,所述元素的数据类型为双字;选取的源元素的数量为n4个,所述M n4和n4均为大于0的正整数。
本申请实施例中,操作码为第四操作码,第四操作码可以用于指示从源寄存器内获取双字数据类型的元素。索引值的数量与源元素的数量相同;将从M n4个元素中获取每个索引值所指示的源元素,所述元素的数据类型为双字,选取的源元素的数量为n4个,所述M n4和n4均为大于0的正整数等多个条件确定为选取规则。其中,M n4个元素可以为依次相邻的元素,也可以为交叉相邻的元素;例如,当M n4为四,假设源寄存器中包含八个元素,分别为元素A1、元素A2、元素A3、元素A4、元素A5、元素A6、元素A7、元素A8时,M n4个元素可以为如元素A2~A5,也可以为元素A1、元素A3、元素A5、元素A7交叉相邻的元素。
可选地,索引值分别和每个源寄存器中的元素位置之间存在预设对应关系;该元素位置可以为元素地址。根据该第四操作码和索引值确定出选取规则之后,可以在源寄存器中,分别从M n4个元素中获取每个索引值所指示的源元素,获取的源元素的数据类型为双字,选取的源元素的数量为n4个,M n4和n4均为大于0的正整数。
示例性地,当第四操作码为XVP.D时,M n4为四,n4为4。
步骤503:将选取的源元素确定为目标元素,并将目标元素写入所述目的寄存器中所述索引值对应的位置。
本申请实施例中,立即数和目的寄存器中地址之间存在预设对应关系。可选地,将目标元素写入所述目的寄存器中所述立即数对应的位置,即为从目的寄存器中确定立即数对应的位置,将源元素依次存储在确定的位置上。
示例性地,当第四操作码为XVP.D,数据类型为双字,M n4为四,n4为4时,向量混洗指令“XVP.D xd,xj,ui8”表示将ui8[1:0]、ui8[3:2]、ui8[5:4]、ui8[7:6]值作为索引值,从向量寄存器xj中的四个双字元素中选择每一个索引值所指示的源元素, 并将源元素依次写入到向量寄存器xd的四个双字元素中。
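A minimal sketch of the XVP.D behavior described above, assuming that the element selected by ui8[1:0] goes to doubleword 0 of xd, the one selected by ui8[3:2] to doubleword 1, and so on ("in order"). The function name `xvp_d` is hypothetical.

```python
def xvp_d(xj, ui8):
    """Model of "XVP.D xd, xj, ui8" on a 256-bit register.

    xj is a list of four 64-bit doublewords, element 0 first.
    Returns the new contents of xd.
    """
    if len(xj) != 4:
        raise ValueError("xj must hold four doublewords")
    # Destination element i is chosen by the 2-bit field ui8[2i+1:2i].
    return [xj[(ui8 >> (2 * i)) & 0b11] for i in range(4)]


print(xvp_d([10, 11, 12, 13], 0b00011011))  # [13, 12, 11, 10]
```

With ui8 = 0b00011011 the four doublewords are reversed, and with ui8 = 0b11100100 the register is copied unchanged.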
本申请实施例中,在向量混洗指令中添加混洗参数,混洗参数包括索引值和操作码,根据索引值和操作码实现了源操作数和索引值数量相同、且数据类型为双字、寄存器为256位情况下的混洗操作;由此可见,采用本申请技术方案,通过一条向量混洗指令,实现了源操作数和索引值数量相同、且数据类型为双字、寄存器为256位情况下的混洗操作,无需增加其他指令传递混洗模式,也无需通过访存的方式获取混洗模式,从而有效降低了系统开销,提高了向量混洗操作的执行效率。
实施例六
在本申请的一种具体实现方式中,所述操作码为第五操作码,所述索引值包括第一索引值和第三索引值,所述第一索引值和第三索引值分别索引不同的位置;所述源寄存器包括第一源寄存器和第二源寄存器;如图6所示,向量混洗指令的处理方式可以包括:
步骤601:接收指令,所述指令包括:寄存器标识和混洗参数。
本申请实施例中,指令的含义和指令包含的参数如实施例一至实施例五所述,在此不再赘述。
可选地,源寄存器的数量为两个,即源元素来自两个不同的寄存器;当所述源寄存器数量为多个时,所有所述源寄存器中每一个源寄存器标识均与所述目的寄存器标识不同;或者,当所述源寄存器数量为多个时,所有所述源寄存器中存在一个源寄存器标识与所述目的寄存器标识相同。
可选地,混洗参数包括索引值和操作码;其中,索引值通过立即数的形式实现;操作码通过可以转化为二进制代码的标识符的形式实现,且操作码为第五操作码。示例性地,当操作码为第五操作码时,源寄存器包括第一源寄存器和第二源寄存器,且目的寄存器即为第一源寄存器或第二源寄存器。
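Optionally, the instruction format is "opcode destination register, source register, immediate". Under this format, in a specific implementation the instruction may be written as "XVP.Q xd, xj, ui8", where XVP is the instruction name in the fifth opcode, Q is the data type in the fifth opcode and denotes a quadword, and XVP.Q is the fifth opcode in identifier form; xd denotes the destination register, xj and xd denote the source registers, and ui8 denotes the immediate. For example, XVP.Q can be converted into the fifth opcode in binary form, such as 01110111111011. In addition, the immediate may encode a group of values: the index values can be expressed by the bit fields ui8[1:0] and ui8[5:4].
Step 602: execute the instruction and, according to the opcode and the immediate, obtain from the M n5 elements in the first source register the first source element indicated by the first index value, and obtain from the M n5 elements in the second source register the second source element indicated by the third index value; the data type of the elements is quadword, the number of selected source elements is n5, and n5 is a positive integer greater than 0.
In this embodiment, the opcode may be the fifth opcode, which indicates that elements of the quadword data type are obtained from the source registers. There are two index values, the first index value and the third index value, which index different positions. The two index values are different bits of the same immediate: for example, the first index value may be the low bits of ui8 and the third index value the high bits of ui8; alternatively, the first index value may be the lowest two bits of ui8 and the third index value the next-lowest two bits. For example, when the fifth opcode is XVP.Q, the first index value is ui8[1:0] and the third index value is ui8[5:4]. When there are multiple source registers, each source register is shuffled using different bits of the immediate, i.e. different source registers correspond to different bits of the immediate; which bits are used for indexing depends on the specific case and is not detailed here.
In addition, the number of index values equals the number of source elements. The following conditions together form the selection rule: the first source element indicated by the first index value is obtained from the M n5 elements in the first source register; the second source element indicated by the third index value is obtained from the M n5 elements in the second source register; the data type of the elements is quadword; the number of selected source elements is n5; and n5 is a positive integer greater than 0. The M n5 elements may be consecutive elements or interleaved elements. For example, when M n5 is four and a source register holds eight elements A1, A2, A3, A4, A5, A6, A7 and A8, the M n5 elements may be A2 to A5, or the interleaved elements A1, A3, A5 and A7.
Optionally, there is a preset correspondence between the index values and the element positions in each source register; an element position may be an element address. Once the selection rule has been determined from the fifth opcode and the index values, the first source element indicated by the first index value can be obtained from the M n5 elements in the first source register, and the second source element indicated by the third index value can be obtained from the M n5 elements in the second source register. n5/2 source elements are selected from the first source register and n5/2 from the second source register. When there are multiple source registers, each source register is shuffled using different bits of the immediate, i.e. different source registers correspond to different bits of the immediate; which bits are used for indexing depends on the specific case and is not detailed here.
For example, when the fifth opcode is XVP.Q, M n5 is two and n5 is 2.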
在获取第一源元素和第二源元素之后,执行步骤603。
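Step 603: determine the first source element and the second source element respectively as target elements, and write them to the corresponding positions of the destination register.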
本申请实施例中,立即数和目的寄存器中地址之间存在预设对应关系。可选地,将目标元素写入所述目的寄存器中所述立即数对应的位置,即为从目的寄存器中确定立即数对应的位置,将源元素依次存储在确定的位置上。
本申请实施例中,当操作码为第五操作码时,在获取第一源元素和第二源元素之后,可以将第一源元素确定为目标元素写入目的寄存器的第一位置,将第二源元素确定为目标元素以写入目的寄存器的第二位置。第一位置和第二位置分别由索引值确定。
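For example, when the fifth opcode is XVP.Q, the data type is quadword, M n5 is two and n5 is 2, the vector shuffle instruction "XVP.Q xd, xj, ui8" selects, according to the values of ui8[1:0] and ui8[5:4], one source element from the two quadword elements of vector register xj and one source element from the two quadword elements of vector register xd, and writes the two selected source elements, according to the index values, into the two quadword elements of vector register xd.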
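The XVP.Q example can be sketched under explicitly stated assumptions, since the source leaves two details open. First, each register holds only two quadwords while each index field is two bits wide, so the sketch uses only the low bit of each field. Second, the source says the write positions are determined by the index values without giving the mapping, so the sketch assumes the element chosen by ui8[1:0] goes to quadword 0 of xd and the element chosen by ui8[5:4] goes to quadword 1. The name `xvp_q` is hypothetical.

```python
def xvp_q(xd, xj, ui8):
    """Illustrative model of "XVP.Q xd, xj, ui8" on 256-bit registers.

    xd and xj are lists of two 128-bit quadwords, element 0 first.
    Returns the new contents of xd.
    """
    if len(xd) != 2 or len(xj) != 2:
        raise ValueError("each register must hold two quadwords")
    # Assumption: only the low bit of each 2-bit field is significant.
    first = xj[ui8 & 0b1]           # first index value, ui8[1:0], selects from xj
    second = xd[(ui8 >> 4) & 0b1]   # third index value, ui8[5:4], selects from xd
    # Assumption: the first selected element lands in quadword 0, the second in quadword 1.
    return [first, second]


print(xvp_q(["d0", "d1"], ["j0", "j1"], 0b00010001))  # ['j1', 'd1']
```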
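In this embodiment, shuffle parameters comprising index values and an opcode are added to the vector shuffle instruction, and the index values and the opcode together implement the shuffle operation for the case where the number of source operands equals the number of index values and the data type is quadword. It follows that, with the technical solution of this application, a single vector shuffle instruction implements this shuffle operation without adding further instructions to pass the shuffle pattern and without fetching the shuffle pattern from memory, which reduces system overhead and improves the execution efficiency of vector shuffle operations.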
实施例七
参照图7,示出了本申请实施例提供的一种处理器的结构示意图。
如图7所示,该处理器可以包括:
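a plurality of vector registers, including a source register 72 and a destination register 74, the source register 72 being used to store data elements;
a decoding unit 71, configured to decode a vector shuffle instruction, where the vector shuffle instruction includes register identifiers and shuffle parameters, and the register identifiers include a source register identifier and a destination register identifier;
an execution unit 73, configured to, in response to the vector shuffle instruction, perform a vector shuffle operation according to the shuffle parameters on the source elements obtained from the source register 72, obtain the target elements resulting from the vector shuffle operation, and write the target elements to the destination register 74.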
可选地,指令存储在指令存储器70中。
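Optionally, the execution unit 73 determines, according to the shuffle parameters, the position information of the source elements in the source register 72 and the number of source elements, where the number of selected source elements is one or more; selects source elements from the source register according to the determined position information and number of source elements; and determines all selected source elements as target elements.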
可选地,所述混洗参数包括索引值和操作码;所述索引值用于指示向量混洗操作所需要的每一个源元素在所述源寄存器中的位置信息;所述操作码用于表征对所述源寄存器和目的寄存器所进行的操作;
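The execution unit 73 determines, according to the index values and the opcode, a selection rule for obtaining source elements, and obtains from the source register 72, according to the selection rule, the source element indicated by each index value.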
可选地,所述执行单元73,当所述索引值的数量与所述源元素的数量不同时,根据所述索引值的数量,确定对所述源元素进行分组的方式,并根据所述分组的方式和所述操作码,确定所述选取规则;当所述索引值的数量与所述源元素的数量相同时,根据所述操作码,确定所述选取规则。
可选地,所述执行单元73,将所述源寄存器中每N1个相邻元素构成一组元素组;其中,所述元素的数据类型为字节、半字、字中的任一种;N1为大于0的正整数;将每个元素组中元素确定为初始源元素;从所述初始源元素中分别获取每个索引值所指示的源元素;从每个所述元素组中选取的源元素的数量为n1个。
可选地,所述相邻元素为所述源寄存器中位置依次相邻的元素,相邻多个元素组中的元素地址存在部分相同或者完全不同;
其中,每个元素组中包含的元素的数据类型相同;不同元素组中包含的元素的数据类型相同或不同。
可选地,所述操作码为第二操作码,且所述索引值的数量与所述源元素的数量相同;
所述执行单元73,在所述源寄存器中,分别从每N2位的M N2个元素中获取每个索引值所指示的源元素;其中,所述元素的数据类型为双字;每N2位的M N2个元素中选取的源元素的数量为n2个,N2、M N2和n2均为大于0的正整数。
可选地,所述执行单元73,创建中间向量;所述中间向量包含至少一个中间向量参数,当存在元素组时,所述中间向量参数数量与所述元素组的数量相等;当不存在元素组时,所述中间向量参数数量与所述源元素的数量相等;将所述选取的每一个源元素分别存储至所述中间向量中的相应中间向量参数中;其中,所述中间向量参数和选取的源元素存在一一对应关系;根据所述混洗参数,将每一个所述中间向量参数中的内容写入至所述目的寄存器的相应位置。
可选地,所述操作码为第三操作码;所述索引值包括第一索引值、第二索引值、第三索引值和第四索引值,所述第一索引值、第二索引值、第三索引值和第四索引值分别索引不同的位置;所述源寄存器包括第一源寄存器和第二源寄存器;
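The execution unit 73 obtains, in the first source register, the source elements indicated by the first index value and the second index value from the M N3 elements in every N3 bits; and obtains, in the second source register, the source elements indicated by the third index value and the fourth index value from the M N3 elements in every N3 bits; where the data type of the elements is word, n3 source elements are selected from the M N3 elements in every N3 bits, and N3, M N3 and n3 are all positive integers greater than 0. The execution unit determines the source element indicated by the first index value as the first target element and the source element indicated by the second index value as the second target element; determines the source element indicated by the third index value as the third target element and the source element indicated by the fourth index value as the fourth target element; writes the first target element and the second target element to a first position in the destination register; and writes the third target element and the fourth target element to a second position in the destination register.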
可选地,所述操作码为第四操作码;
所述执行单元73,在所述源寄存器中,分别从M n4个元素中获取每个索引值所指示的源元素;其中,所述元素的数据类型为双字;选取的源元素的数量为n4个,所述M n4和n4均为大于0的正整数。
可选地,所述操作码为第五操作码;所述索引值包括第一索引值和第三索引值,所述第一索引值和第三索引值分别索引不同的位置;所述源寄存器包括第一源寄存器和第二源寄存器;
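The execution unit 73 obtains, from the M n5 elements in the first source register, the first source element indicated by the first index value; and obtains, from the M n5 elements in the second source register, the second source element indicated by the third index value; where the data type of the elements is quadword, the number of selected source elements is n5, and n5 is a positive integer greater than 0. The execution unit determines the first source element and the second source element respectively as target elements and writes them to the corresponding positions of the destination register.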
可选地,所述源寄存器数量为一个或多个,所述目的寄存器数量为一个;
当所述源寄存器数量为一个时,所述源寄存器标识与所述目的寄存器标识可以相同或不同;
当所述源寄存器数量为多个时,所有所述源寄存器中每一个源寄存器标识均与所述目的寄存器标识不同;或者,当所述源寄存器数量为多个时,所有所述源寄存器中存在一个源寄存器标识与所述目的寄存器标识相同。
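In the processor provided by this embodiment, register identifiers and shuffle parameters are added to the instruction, and the shuffle parameters are used to perform a vector shuffle operation on the elements obtained from the source register. A vector shuffle operation with a specific function can therefore be carried out by a single instruction rather than by several shuffle instructions, which improves the execution efficiency of that function.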
实施例八
如图8所示,电子设备可以包括以下一个或多个组件:处理组件802,存储器804,电源组件806,多媒体组件808,音频组件810,输入/输出(I/O)的接口812,传感器组件814,以及通信组件816。
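The processing component 802 generally controls the overall operation of the electronic device, such as operations associated with display, data communication, camera operation and recording. The processing component 802 may include one or more processors 820 that execute instructions to complete all or some of the steps of the method described above. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.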
存储器804被配置为存储各种类型的数据以支持在电子设备的操作。这些数据的示例包括用于在电子设备上操作的任何应用程序或方法的指令,联系人数据,电话簿数据,消息,图片,视频等。存储器804可以由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(SRAM),电可擦除可编程只读存储器(EEPROM),可擦除可编程只读存储器(EPROM),可编程只读存储器(PROM),只读存储器(ROM),磁存储器,快闪存储器,磁盘或光盘。
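The power component 806 supplies power to the various components of the electronic device. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the electronic device 800.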
多媒体组件808包括在所述电子设备和用户之间的提供一个输出接口的屏幕。在一些实施例中,屏幕可以包括液晶显示器(LCD)和触摸面板(TP)。如果屏幕包括触摸面板,屏幕可以被实现为触摸屏,以接收来自用户的输入信号。触摸面板包括一个或多个触摸传感器以感测触摸、滑动和触摸面板上的手势。所述触摸传感器可以不仅感测触摸或滑动动作的边界,而且还检测与所述触摸或滑动操作相关的持续时间和压力。在一些实施例中,多媒体组件808包括一个前置摄像头和/或后置摄像头。当电子设备处于操作模式,如拍摄模式或视频模式时,前置摄像头和/或后置摄像头可以接收外部的多媒体数据。每个前置摄像头和后置摄像头可以是一个固定的光学透镜系统或具有焦距和光学变焦能力。
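The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) that is configured to receive external audio signals when the electronic device is in an operating mode such as call mode, recording mode or voice recognition mode. The received audio signals may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.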
I/O接口812为处理组件802和外围接口模块之间提供接口,上述外围接口模块可以是键盘,点击轮,按钮等。这些按钮可包括但不限于:主页按钮、音量按钮、启动按钮和锁定按钮。
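The sensor component 814 includes one or more sensors that provide status assessments of various aspects of the electronic device 800. For example, the sensor component 814 can detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device. It can also detect a change in position of the electronic device or of one of its components, the presence or absence of user contact with the electronic device, the orientation or acceleration/deceleration of the electronic device, and changes in its temperature. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device can access a wireless network based on a communication standard, such as WiFi, 2G/3G/4G/5G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.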
在示例性实施例中,电子设备可以被一个或多个应用专用集成电路(ASIC)、数字信号处理器(DSP)、数字信号处理设备(DSPD)、可编程逻辑器件(PLD)、现场可编程门阵列(FPGA)、控制器、微控制器、微处理器或其他电子元件实现,用于执行上述向量混洗方法。
在示例性实施例中,还提供了一种包括指令的非临时性计算机可读存储介质,例如包括指令的存储器804,上述指令可由电子设备的处理器820执行以完成上述向量混洗方法。例如,所述非临时性计算机可读存储介质可以是ROM、随机存取存储器(RAM)、CD-ROM、磁带、软盘和光数据存储设备等。
本申请实施例的电子设备用于实现前述多个方法实施例中相应的向量混洗方法,并且具有相应的方法实施的有益效果,在此不再赘述。
本说明书中的各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似的部分互相参见即可。对于装置实施例而言,由于其与方法实施例基本相似,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
以上对本申请所提供的一种向量混洗方法、处理器及电子设备进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的一般技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。
在此提供的算法和显示不与任何特定计算机、电子系统或者其它设备固有相关。各种通用系统也可以与基于在此的示教一起使用。根据上面的描述,构造这类系统所要求的结构是显而易见的。此外,本申请也不针对任何特定编程语言。应当明白,可以利用各种编程语言实现在此描述的本申请的内容,并且上面对特定语言所做的描述是为了披露本申请的最佳实施方式。
在此处所提供的说明书中,说明了大量具体细节。然而,能够理解,本申请的实施例可以在没有这些具体细节的情况下实践。在一些实例中,并未详细示出公知的方法、结构和技术,以便不模糊对本说明书的理解。
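Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, features of this application are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single previously disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of this application.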
本领域那些技术人员可以理解,可以对实施例中的设备中的模块进行自适应性地改变并且把它们设置在与该实施例不同的一个或多个设备中。可以把实施例中的模块或单元或组件组合成一个模块或单元或组件,以及此外可以把它们分成多个子模块或子单元或子组件。除了这样的特征和/或过程或者单元中的至少一些是相互排斥之外,可以采用任何组合对本说明书(包括伴随的权利要求、摘要和附图)中公开的所有特征以及如此公开的任何方法或者设备的所有过程或单元进行组合。除非另外明确陈述,本说明书(包括伴随的权利要求、摘要和附图)中公开的每个特征可以由提供相同、等同或相似目的的替代特征来代替。
此外,本领域的技术人员能够理解,尽管在此所述的一些实施例包括其它实施例中所包括的某些特征而不是其它特征,但是不同实施例的特征的组合意味着处于本申请的范围之内并且形成不同的实施例。例如,在下面的权利要求书中,所要求保护的实施例的任意之一都可以以任意的组合方式来使用。
本申请的各个部件实施例可以以硬件实现,或者以在一个或者多个处理器上运行的软件模块实现,或者以它们的组合实现。本领域的技术人员应当理解,可以在实践中使用微处理器或者数字信号处理器(DSP)来实现根据本申请实施例的浏览器客户端设备中的一些或者全部部件的一些或者全部功能。本申请还可以实现为用于执行这里所描述的方法的一部分或者全部的设备或者装置程序(例如,计算机程序和计算机程序产品)。这样的实现本申请的程序可以存储在计算机可读介质上,或者可以具有一个或者多个信号的形式。这样的信号可以从因特网网站上下载得到,或者在载体信号上提供,或者以任何其他形式提供。
应该注意的是上述实施例对本申请进行说明而不是对本申请进行限制,并且本领域技术人员在不脱离所附权利要求的范围的情况下可设计出替换实施例。在权利要求中,不应将位于括号之间的任何参考符号构造成对权利要求的限制。单词“包含”不排除存在未列在权利要求中的元件或步骤。位于元件之前的单词“一”或“一个”不排除存在多个这样的元件。本申请可以借助于包括有若干不同元件的硬件以及借助于适当编程的计算机来实现。在列举了若干装置的单元权利要求中,这些装置中的若干个可以是通过同一个硬件项来具体体现。单词第一、第二、以及第三等的使用不表示任何顺序。可将这些单词解释为名称。

Claims (25)

  1. 一种向量混洗方法,所述方法包括:
    接收指令,所述指令包括:寄存器标识和混洗参数;其中,所述寄存器标识包括源寄存器标识和目的寄存器标识;所述源寄存器标识用于表征源寄存器,所述源寄存器为存储执行向量混洗操作时被操作的源元素的寄存器;所述目的寄存器标识用于表征目的寄存器,所述目的寄存器为存储执行所述向量混洗操作后得到的目标元素的寄存器;所述混洗参数用于指示对所述源元素执行向量混洗操作时所依据的参数;
    执行所述指令,以根据所述混洗参数对从所述源寄存器获取的源元素执行向量混洗操作,并获取所述向量混洗操作后的目标元素;
    将所述目标元素写入所述目的寄存器。
  2. 根据权利要求1所述的方法,其特征在于,根据所述混洗参数对从所述源寄存器获取的源元素执行向量混洗操作,并获取所述向量混洗操作后的目标元素,包括:
    根据所述混洗参数,确定向量混洗操作所需要的所述源元素在所述源寄存器中的位置信息和源元素数量;其中,选取的所述源元素的数量为一个或多个;
    根据确定的所述位置信息和源元素数量,从所述源寄存器中选取源元素;
    将所有所述选取的源元素确定为目标元素。
  3. 根据权利要求2所述的方法,其特征在于,所述混洗参数包括索引值和操作码;所述索引值用于指示向量混洗操作所需要的每一个源元素在所述源寄存器中的位置信息;所述操作码用于表征对所述源寄存器和目的寄存器所进行的操作;
    所述根据确定的所述位置信息和源元素数量,从所述源寄存器中选取源元素,包括:
    根据所述索引值和所述操作码,确定获取源元素的选取规则;
    从源寄存器中,按照所述选取规则,分别获取每个索引值所指示的源元素。
  4. 根据权利要求3所述的方法,其特征在于,所述根据所述索引值和所述操作码,确定获取源元素的选取规则,包括:
    当所述索引值的数量与所述源元素的数量不同时,根据所述索引值的数量,确定对所述源元素进行分组的方式,并根据所述分组的方式和所述操作码,确定所述选取规则;
    当所述索引值的数量与所述源元素的数量相同时,根据所述操作码,确定所述选取规则。
  5. 根据权利要求4所述的方法,其特征在于,所述操作码为第一操作码,且所述索引值的数量与所述源元素的数量不同;
    所述从源寄存器中,按照所述选取规则,分别获取每个索引值所指示的源元素,包括:
    将所述源寄存器中每N1个相邻元素构成一组元素组;其中,所述元素的数据类型为字节、半字、字中的任一种;N1为大于0的正整数;
    将每个元素组中元素确定为初始源元素;
    从所述初始源元素中分别获取每个索引值所指示的源元素;从每个所述元素组中选取的源元素的数量为n1个。
  6. 根据权利要求5所述的方法,其特征在于,所述相邻元素为所述源寄存器中位置依次相邻的元素,相邻多个元素组中的元素地址存在部分相同或者完全不同;
    其中,每个元素组中包含的元素的数据类型相同;不同元素组中包含的元素的数据类型相同或不同。
  7. 根据权利要求4所述的方法,其特征在于,所述操作码为第二操作码,且所述索引值的数量与所述源元素的数量相同;
    所述从源寄存器中,按照所述选取规则,分别获取每个索引值所指示的源元素,包括:
    在所述源寄存器中,分别从每N2位的M N2个元素中获取每个索引值所指示的源元素;其中,所述元素的数据类型为双字;每N2位的M N2个元素中选取的源元素的数量为n2个,N2、M N2和n2均为大于0的正整数。
  8. 根据权利要求5-7任一项所述的方法,其特征在于,所述根据确定的所述位置信息和源元素数量,从所述源寄存器中选取源元素之前,还包括:
    创建中间向量;所述中间向量包含至少一个中间向量参数,当存在元素组时,所述中间向量参数数量与所述元素组的数量相等;当不存在元素组时,所述中间向量参数数量与所述源元素的数量相等;
    从所述源寄存器中选取源元素,包括:
    将所述选取的每一个源元素分别存储至所述中间向量中的相应中间向量参数中;其中,所述中间向量参数和选取的源元素存在一一对应关系;
    将所述目标元素写入所述目的寄存器,包括:
    根据所述混洗参数,将每一个所述中间向量参数中的内容写入至所述目的寄存器的相应位置。
  9. 根据权利要求4所述的方法,其特征在于,所述操作码为第三操作码;所述索引值包括第一索引值、第二索引值、第三索引值和第四索引值,所述第一索引值、第二索引值、第三索引值和第四索引值分别索引不同的位置;所述源寄存器包括第一源寄存器和第二源寄存器;
    所述从源寄存器中,按照所述选取规则,分别获取每个索引值所指示的源元素,包括:
    在所述第一源寄存器中,分别从每N3位的M N3个元素中获取第一索引值和第二索引值所指示的源元素;以及
    在所述第二源寄存器中,分别从每N3位的M N3个元素中获取第三索引值和第四索引值所指示的源元素;其中,所述元素的数据类型为字;每N3位的M N3个元素中选取的源元素的数量为n3个,N3、M N3和n3均为大于0的正整数;
    所述将所述目标元素写入所述目的寄存器,包括:
    将所述第一索引值指示的源元素确定为第一目标元素,并将第二索引值指示的源元素确定为第二目标元素;以及
    将所述第三索引值指示的源元素确定为第三目标元素,并将第四索引值指示的源元素确定为第四目标元素;
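    writing the first target element and the second target element to a first position in the destination register; and writing the third target element and the fourth target element to a second position in the destination register.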
  10. 根据权利要求4所述的方法,其特征在于,所述操作码为第四操作码;
    所述从源寄存器中,按照所述选取规则,分别获取每个索引值所指示的源元素,包括:
    在所述源寄存器中,分别从M n4个元素中获取每个索引值所指示的源元素;其中,所述元素的数据类型为双字;选取的源元素的数量为n4个,所述M n4和n4均为大于0的正整数。
  11. 根据权利要求4所述的方法,其特征在于,所述操作码为第五操作码;所述索引值包括第一索引值和第三索引值,所述第一索引值和第三索引值分别索引不同的位置;所述源寄存器包括第一源寄存器和第二源寄存器;
    所述从源寄存器中,按照所述选取规则,分别获取每个索引值所指示的源元素,包括:
    在所述第一源寄存器中,从M n5个元素中获取第一索引值所指示的第一源元素;以及,在所述第二源寄存器中,从M n5个元素中获取第三索引值所指示的第二源元素;其中,所述元素的数据类型为四字;选取的源元素的数量为n5个,n5为大于0的正整数;
    所述将所述目标元素写入所述目的寄存器,包括:
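    determining the first source element and the second source element respectively as target elements, and writing them to the corresponding positions of the destination register.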
  12. 根据权利要求1-11任一项所述的方法,其特征在于,所述源寄存器数量为一个或多个,所述目的寄存器数量为一个;
    当所述源寄存器数量为一个时,所述源寄存器标识与所述目的寄存器标识不同;
    当所述源寄存器数量为多个时,所有所述源寄存器中每一个源寄存器标识均与所述目的寄存器标识不同;或者,当所述源寄存器数量为多个时,所有所述源寄存器中存在一个源寄存器标识与所述目的寄存器标识相同。
  13. 一种处理器,包括:
    多个向量寄存器,所述多个向量寄存器包括源寄存器与目标寄存器,源寄存器用于存储数据元素;
    译码单元,用于译码向量混洗指令;其中,所述向量混洗指令包括:寄存器标识和混洗参数,所述寄存器标识包括源寄存器标识和目的寄存器标识;
    执行单元,响应于所述向量混洗指令,根据所述混洗参数对从所述源寄存器获取的源元素执行向量混洗操作,并获取所述向量混洗操作后的目标元素,并将所述目标元素写入所述目的寄存器。
  14. 根据权利要求13所述的处理器,其特征在于,
    所述执行单元,根据所述混洗参数,确定所述源元素在所述源寄存器中的位置信息和源元素数量;其中,选取的所述源元素的数量为一个或多个;根据确定的所述位置信息和源元素数量,从所述源寄存器中选取源元素;将所有所述选取的源元素确定为目标元素。
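  15. The processor according to claim 14, wherein the shuffle parameters include index values and an opcode; the index values indicate the position information, in the source register, of each source element required by the vector shuffle operation; and the opcode characterizes the operation performed on the source register and the destination register;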
    所述执行单元,根据所述索引值和所述操作码,确定获取源元素的选取规则;从源寄存器中,按照所述选取规则,分别获取每个索引值所指示的源元素。
  16. 根据权利要求15所述的处理器,其特征在于,
    所述执行单元,当所述索引值的数量与所述源元素的数量不同时,根据所述索引值的数量,确定对所述源元素进行分组的方式,并根据所述分组的方式和所述操作码,确定所述选取规则;当所述索引值的数量与所述源元素的数量相同时,根据所述操作码,确定所述选取规则。
  17. The processor according to claim 16, wherein the opcode is a first opcode, and the number of index values differs from the number of source elements;
    所述执行单元,将所述源寄存器中每N1个相邻元素构成一组元素组;其中,所述元素的数据类型为字节、半字、字中的任一种;N1为大于0的正整数;将每个元素组中元素确定为初始源元素;从所述初始源元素中分别获取每个索引值所指示的源元素;从每个所述元素组中选取的源元素的数量为n1个。
  18. 根据权利要求17所述的处理器,其特征在于,所述相邻元素为所述源寄存器中位置依次相邻的元素,相邻多个元素组中的元素地址存在部分相同或者完全不同;
    其中,每个元素组中包含的元素的数据类型相同;不同元素组中包含的元素的数据类型相同或不同。
  19. 根据权利要求16所述的处理器,其特征在于,所述操作码为第二操作码,且所述索引值的数量与所述源元素的数量相同;
    所述执行单元,在所述源寄存器中,分别从每N2位的M N2个元素中获取每个索引值所指示的源元素;其中,所述元素的数据类型为双字;每N2位的M N2个元素中选取的源元素的数量为n2个,N2、M N2和n2均为大于0的正整数。
  20. 根据权利要求17-19任一项所述的处理器,其特征在于,
    所述执行单元,创建中间向量;所述中间向量包含至少一个中间向量参数,当存在元素组时,所述中间向量参数数量与所述元素组的数量相等;当不存在元素组时,所述中间向量参数数量与所述源元素的数量相等;将所述选取的每一个源元素分别存储至所述中间向量中的相应中间向量参数中;其中,所述中间向量参数和选取的源元素存在一一对应关系;根据所述混洗参数,将每一个所述中间向量参数中的内容写入至所述目的寄存器的相应位置。
  21. 根据权利要求16所述的处理器,其特征在于,所述操作码为第三操作码;所述索引值包括第一索引值、第二索引值、第三索引值和第四索引值,所述第一索引值、第二索引值、第三索引值和第四索引值分别索引不同的位置;所述源寄存器包括第一源寄存器和第二源寄存器;
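    the execution unit obtains, in the first source register, the source elements indicated by the first index value and the second index value from the M N3 elements in every N3 bits; and obtains, in the second source register, the source elements indicated by the third index value and the fourth index value from the M N3 elements in every N3 bits; wherein the data type of the elements is word, n3 source elements are selected from the M N3 elements in every N3 bits, and N3, M N3 and n3 are all positive integers greater than 0; determines the source element indicated by the first index value as the first target element and the source element indicated by the second index value as the second target element; determines the source element indicated by the third index value as the third target element and the source element indicated by the fourth index value as the fourth target element; writes the first target element and the second target element to a first position in the destination register; and writes the third target element and the fourth target element to a second position in the destination register.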
  22. 根据权利要求16所述的处理器,其特征在于,所述操作码为第四操作码;
    所述执行单元,在所述源寄存器中,分别从M n4个元素中获取每个索引值所指示的源元素;其中,所述元素的数据类型为双字;选取的源元素的数量为n4个,所述M n4和n4均为大于0的正整数。
  23. 根据权利要求16所述的处理器,其特征在于,所述操作码为第五操作码;所述索引值包括第一索引值和第三索引值,所述第一索引值和第三索引值分别索引不同的位置;所述源寄存器包括第一源寄存器和第二源寄存器;
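    the execution unit obtains, from the M n5 elements in the first source register, the first source element indicated by the first index value; and obtains, from the M n5 elements in the second source register, the second source element indicated by the third index value; wherein the data type of the elements is quadword, the number of selected source elements is n5, and n5 is a positive integer greater than 0; and determines the first source element and the second source element respectively as target elements and writes them to the corresponding positions of the destination register.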
  24. 根据权利要求13-23任一项所述的处理器,其特征在于,所述源寄存器数量为一个或多个,所述目的寄存器数量为一个;
    当所述源寄存器数量为一个时,所述源寄存器标识与所述目的寄存器标识不同;
    当所述源寄存器数量为多个时,所有所述源寄存器中每一个源寄存器标识均与所述目的寄存器标识不同;或者,当所述源寄存器数量为多个时,所有所述源寄存器中存在一个源寄存器标识与所述目的寄存器标识相同。
  25. 一种电子设备,其特征在于,包括有存储器,以及一个或者一个以上的程序,其中一个或者一个以上程序存储于存储器中,且经配置以由一个或者一个以上处理器执行如权利要求1-12中一个或多个所述的向量混洗方法。
PCT/CN2022/137500 2021-12-10 2022-12-08 向量混洗方法、处理器及电子设备 WO2023104143A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111508098.8A CN114297138B (zh) 2021-12-10 2021-12-10 向量混洗方法、处理器及电子设备
CN202111508098.8 2021-12-10

Publications (1)

Publication Number Publication Date
WO2023104143A1 true WO2023104143A1 (zh) 2023-06-15

Family

ID=80968226

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/137500 WO2023104143A1 (zh) 2021-12-10 2022-12-08 向量混洗方法、处理器及电子设备

Country Status (2)

Country Link
CN (1) CN114297138B (zh)
WO (1) WO2023104143A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114297138B (zh) * 2021-12-10 2023-12-26 龙芯中科技术股份有限公司 向量混洗方法、处理器及电子设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104335166A (zh) * 2012-06-29 2015-02-04 英特尔公司 用于执行混洗和操作(混洗-操作)的系统、装置和方法
US20160188532A1 (en) * 2014-12-27 2016-06-30 Intel Corporation Method and apparatus for performing a vector bit shuffle
US20170235516A1 (en) * 2011-12-23 2017-08-17 Intel Corporation Apparatus and method for shuffling floating point or integer values
CN107741861A (zh) * 2011-12-23 2018-02-27 英特尔公司 用于混洗浮点或整数值的装置和方法
CN114297138A (zh) * 2021-12-10 2022-04-08 龙芯中科技术股份有限公司 向量混洗方法、处理器及电子设备

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040054877A1 (en) * 2001-10-29 2004-03-18 Macy William W. Method and apparatus for shuffling data
JP2008513903A (ja) * 2004-09-21 2008-05-01 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ シャッフル演算のためのマイクロプロセッサデバイス及び方法
WO2009144681A1 (en) * 2008-05-30 2009-12-03 Nxp B.V. Vector shuffle with write enable
EP2584460A1 (en) * 2011-10-20 2013-04-24 ST-Ericsson SA Vector processing system comprising a replicating subsystem and method

Also Published As

Publication number Publication date
CN114297138A (zh) 2022-04-08
CN114297138B (zh) 2023-12-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22903562

Country of ref document: EP

Kind code of ref document: A1