WO2021249054A1 - Data processing method and device, and storage medium - Google Patents

Data processing method and device, and storage medium

Info

Publication number
WO2021249054A1
Authority
WO
WIPO (PCT)
Prior art keywords
register
index
content
instruction code
vector register
Prior art date
Application number
PCT/CN2021/090352
Other languages
English (en)
French (fr)
Inventor
刘君
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Priority to EP21820959.1A (EP4152146A4)
Publication of WO2021249054A1
Priority to US17/989,141 (US20230084523A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/30105 Register structure
    • G06F9/30109 Register structure having multiple operands in a single register
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8053 Vector processors
    • G06F15/8076 Details on data register access
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036 Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30043 LOAD or STORE instructions; Clear instruction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file
    • G06F9/3013 Organisation of register space, e.g. banked or distributed register file according to data content, e.g. floating-point registers, address registers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file
    • G06F9/30138 Extension of register space, e.g. register cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of signal processing, and in particular to a data processing method, device, and storage medium.
  • VDSP Vector Digital Signal Processor
  • VDSPs provide a large vector register array; for example, a VDSP with a large vector register array can provide 512 vector registers of 1024 bits each.
  • The vector register is not only used to temporarily store the intermediate results of several instructions, but can also temporarily hold all the data of a relatively independent, complex signal processing flow, avoiding repeated load and store operations on each datum whenever a SIMD instruction is executed. This effectively reduces power consumption, and the idle load and store processing units and time slots can be used for other parallel operations in processors with a Very Long Instruction Word (VLIW) architecture, further improving the parallel processing capability of VLIW processors.
  • VLIW Very Long Instruction Word
  • the current vector signal processors of various SIMD architectures access the vector registers by coding the target vector register index into the instruction word.
  • When the vector register array is large, the register index values in the instruction word occupy too many bits, which increases the amount of code.
  • Because the register index is fixed in the instruction word, it cannot be changed at runtime.
  • As a result, multiple instruction codes are required to perform different access operations, which leads to a large instruction memory.
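
To make the encoding cost concrete, the following minimal sketch compares the bits needed to name registers directly with the bits needed to name a small index register instead. The 512-register figure comes from the example VDSP above; the 32 index registers match the 5-bit index fields used later in the description; the three-operand instruction shape and all names are assumptions for illustration only.

```python
def index_bits(register_count: int) -> int:
    """Bits needed to encode one register number in the instruction word."""
    return (register_count - 1).bit_length()

VECTOR_REGISTERS = 512  # example VDSP from the text
INDEX_REGISTERS = 32    # assumed size of the index register group
OPERANDS = 3            # assumed: two source operands and one target operand

# Direct addressing: every operand field must be wide enough for a vector register number.
direct_bits = OPERANDS * index_bits(VECTOR_REGISTERS)
# Indirect addressing: every operand field only names an index register.
indirect_bits = OPERANDS * index_bits(INDEX_REGISTERS)

print(direct_bits, indirect_bits)
```

Under these assumptions the operand fields shrink from 27 bits to 15 bits per instruction word, independent of how large the vector register array grows.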
  • the embodiments of the present application provide a data processing method, device, and storage medium, which can reduce the amount of code and instruction memory.
  • the embodiment of the present application provides a data processing method, which is applied to a data processing device, the processor of the data processing device includes an index register set, and the method includes:
  • the instruction code is generated by a compiler, and the index register is a register in the index register group;
  • An embodiment of the present application provides a data processing device, the processor of the data processing device includes an index register set, and the data processing device includes:
  • the decoding part is configured to obtain a first index value of an index register according to an instruction code, the instruction code is generated by a compiler, and the index register is a register in the index register group;
  • a determining part configured to determine an index register according to the first index value, and determine a first vector register according to the first content; to execute the instruction code by accessing the first vector register;
  • the obtaining part is configured to obtain the first content stored in the index register.
  • An embodiment of the application provides a data processing device, the data processing device includes: a processor, a memory, and a communication bus; the processor includes an index register set; when the processor executes the operating program stored in the memory, the data processing method described above is implemented.
  • the embodiment of the present application provides a storage medium on which a computer program is stored, and when the computer program is executed by a processor, the above-mentioned data processing method is implemented.
  • the embodiments of the present application provide a data processing method and device, and a storage medium.
  • the processor of the data processing device includes an index register group.
  • The method includes: obtaining a first index value of an index register according to an instruction code, and determining the index register according to the first index value, where the instruction code is generated by a compiler and the index register is a register in the index register group; obtaining the first content stored in the index register, and determining a first vector register according to the first content; and executing the instruction code by accessing the first vector register.
  • the processor includes an index register group
  • the instruction code is configured with the index value of the index register
  • The data processing device determines the index register through the instruction code, and accesses and updates the first content in the index register, so that different vector registers are accessed during different rounds of instruction code execution, which can reduce the amount of code and instruction memory.
  • Fig. 1 is a frame diagram of cyclic execution of instructions when data is stored in the memory
  • FIG. 2 is a first flowchart of a data processing method provided by an embodiment of this application.
  • FIG. 3 is a schematic diagram of an exemplary instruction field of an instruction code provided by an embodiment of the application.
  • FIG. 4 is a schematic structural diagram of an exemplary addressing mapping of a vector register file within a processor provided by an embodiment of the application;
  • FIG. 5 is a framework diagram of an exemplary instruction loop instruction provided by an embodiment of the application.
  • FIG. 6 is a first structural diagram of a data processing device provided by an embodiment of this application.
  • FIG. 7 is a second structural diagram of a data processing device provided by an embodiment of this application.
  • vinc is a vector add-one instruction
  • the left operand is the target vector register
  • the right operand is the source operand stored in the source vector register; 1 is added to the source operand and the result is stored to the target vector register;
  • the data can only be stored in the memory, and all the data can be processed by loading and storing each data and the address auto-increment operation.
  • the input data D1-D4 and the output data d1-d4 are stored in the memory, and the loop kernel needs to perform the operation of processing vinc, loading the next data, and storing the previous data.
  • the overall processing power consumption is very large.
  • When the VDSP provides a large vector register array, the data is stored in the registers, and the corresponding execution code is:
  • the vinc instruction in the loop can only access the fixed register after the completion of the compilation, so 100 data cannot be accessed through the loop instruction, and the instruction can only be expanded 100 times, and different registers are used in each expansion. This method will significantly increase the amount of code, increase instruction memory, and increase the power consumption of the processor's instruction fetching.
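
The code growth caused by unrolling can be illustrated with a short sketch that generates the 100 fixed-register instructions. The assembly syntax (`vinc <target>, <source>`), the `VRn` register names, and the choice of base registers 0 and 100 are assumptions made for this illustration, not the listing from the application.

```python
def unrolled_vinc(count: int, src_base: int = 0, dst_base: int = 100) -> list[str]:
    """Emit one vinc per datum, each with its own hard-coded register numbers."""
    return [f"vinc VR{dst_base + i}, VR{src_base + i}" for i in range(count)]

listing = unrolled_vinc(100)

print(listing[0])         # first unrolled instruction
print(listing[-1])        # last unrolled instruction
print(len(set(listing)))  # number of distinct instruction words that must be stored
```

Every line of the listing differs only in its register numbers, which is exactly the information that cannot vary at runtime when the index is fixed in the instruction word.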
  • the embodiment of the present application provides a data processing method, which is applied to a data processing device.
  • the processor of the data processing device includes an index register set. As shown in FIG. 2, the method may include:
  • the data processing method provided by the embodiment of the present application is applicable to a scenario where a vector register array is used to process multiple data in an instruction.
  • the data processing device generates instruction codes through a compiler, where the processor of the data processing device is a vector signal processor (Vector Digital Signal Processor, VDSP).
  • VDSP Vector Digital Signal Processor
  • the processor includes an index register group, and an index register field is configured for the index register in the index register group in the instruction code.
  • The processor determines the address of the corresponding index register through the index register field, and the value of the index register field is the first index value of the index register.
  • the instruction code can be executed multiple times in a loop, and different vector registers are accessed when the instruction code is executed in each round.
  • When the instruction code is decoded and the index register field is parsed, the value corresponding to the index register field is obtained and determined as the first index value of the index register.
  • the vector register types for indirect addressing by the index register include a source vector register and a target vector register.
  • A first index register can be configured for the source vector register, and a second index register can be configured for the target vector register; or the source vector register and the target vector register can use the same index register. The specific choice depends on actual conditions, and the embodiments of this application do not impose specific restrictions.
  • The index value of the first index register and the index value of the second index register are extracted from the instruction code.
  • The index value of the first index register and the index value of the second index register together constitute the first index value.
  • The index register used for indirect addressing is determined according to the first index value. Specifically, the correspondence between index values and index register addresses can be set in advance; the index register address corresponding to the first index value is then looked up in this correspondence, so as to locate the index register used for indirect addressing in the current round of instruction execution.
  • the fields irs1 and irs2 indicate the source index register serial numbers, and the field ird indicates the target index register serial number.
  • The fields irs1, irs2, and ird occupy 5 bits each; that is, each of them can encode 32 index registers, which meets the requirements of different platforms.
  • When the VDSP is running, it first fetches the instruction code from the program code space and then decodes the instruction.
  • the field irs1, field irs2 and field ird are used to indirectly address the vector register.
  • When the value of the field irs1 is 2, the third index register is selected, because index register numbering starts from 0.
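
The field extraction described above can be sketched as follows. The text fixes only the 5-bit width of irs1, irs2, and ird; the bit positions chosen here are hypothetical, and `decode_index_fields` is an illustrative name rather than part of the application.

```python
# Assumed bit positions (hypothetical; only the 5-bit widths come from the text).
IRS1_SHIFT, IRS2_SHIFT, IRD_SHIFT = 0, 5, 10
FIELD_MASK = 0b11111  # 5 bits, so each field can name one of 32 index registers

def decode_index_fields(word: int) -> dict[str, int]:
    """Extract the three index register fields from an instruction word."""
    return {
        "irs1": (word >> IRS1_SHIFT) & FIELD_MASK,
        "irs2": (word >> IRS2_SHIFT) & FIELD_MASK,
        "ird": (word >> IRD_SHIFT) & FIELD_MASK,
    }

# Build a word whose irs1 field is 2, i.e. the third index register (numbering from 0).
word = (7 << IRD_SHIFT) | (3 << IRS2_SHIFT) | (2 << IRS1_SHIFT)
fields = decode_index_fields(word)
print(fields)
```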
  • After the data processing device determines the index register according to the first index value, it obtains the first content stored in the index register and determines the first vector register according to the first content.
  • The number of index registers is one, and the source vector register and the target vector register share this index register; the second index value of the initial source vector register and the third index value of the initial target vector register are stored in a dedicated register or hard-coded in the instruction code.
  • The process by which the data processing device determines the first vector register according to the first content stored in the index register is specifically: obtaining the second index value corresponding to the initial source vector register and the third index value corresponding to the initial target vector register; determining the source vector register according to the second index value and the first content; determining the target vector register according to the third index value and the first content; and determining the source vector register and the target vector register as the first vector register.
  • The first content stored in the index register represents the index register offset.
  • The second index value of the initial source vector register is added to the index register offset to obtain the source vector register.
  • The third index value of the initial target vector register is added to the index register offset to obtain the target vector register.
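
The shared-index-register variant amounts to two additions. The sketch below is a minimal illustration under stated assumptions: the function name, the range check, and the 512-register bound (taken from the example VDSP in the text) are not prescribed by the application.

```python
NUM_VECTOR_REGISTERS = 512  # example size from the text; used only for the range check

def resolve_shared(base_src: int, base_dst: int, offset: int) -> tuple[int, int]:
    """Add the shared index register offset to both initial register indices."""
    src = base_src + offset  # second index value + first content
    dst = base_dst + offset  # third index value + first content
    for reg in (src, dst):
        if not 0 <= reg < NUM_VECTOR_REGISTERS:
            raise IndexError(f"vector register {reg} out of range")
    return src, dst

# With initial registers 0 and 100 and an offset of 5, the round uses registers 5 and 105.
print(resolve_shared(0, 100, 5))
```

One index register is enough here because the source and target advance in lockstep; independent strides would need the two-register variant.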
  • The index register includes a first index register and a second index register; the first index register is used to indirectly address the source vector register, and the second index register is used to indirectly address the target vector register. The first address offset stored in the first index register and the second address offset stored in the second index register are obtained respectively.
  • The first address offset and the second address offset together constitute the above-mentioned first content.
  • The process by which the data processing device determines the first vector register according to the first content specifically includes: decoding the first address offset to obtain the first address; determining the source vector register corresponding to the first address; decoding the second address offset to obtain the second address; determining the target vector register corresponding to the second address; and determining the source vector register and the target vector register as the first vector register.
  • the address of the source vector register corresponding to the third index register is 221, and the sequence numbers of the vector registers start from 0.
  • the corresponding indirect addressing of the source vector register using the index register can be represented by VR[IR[irs1]].
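
The notation VR[IR[irs1]] for the two-index-register variant can be read as two table lookups. In the sketch below the address decoder is modeled as an identity mapping with a range check, which is an assumption; the register contents 220 and 300 and all function names are illustrative.

```python
NUM_VR = 512  # vector register count from the example VDSP in the text

def decode_address(offset: int) -> int:
    """Stand-in for the address decoder: assumed identity mapping with a range check."""
    if not 0 <= offset < NUM_VR:
        raise IndexError(f"vector register {offset} out of range")
    return offset

def resolve_two_index(ir: list[int], irs1: int, ird: int) -> tuple[int, int]:
    """Return (source, target) vector register numbers, i.e. IR[irs1] and IR[ird]."""
    return decode_address(ir[irs1]), decode_address(ir[ird])

ir = [0] * 32  # index register group
ir[2] = 220    # first index register: addresses the source vector register
ir[4] = 300    # second index register: addresses the target vector register
src, dst = resolve_two_index(ir, irs1=2, ird=4)
print(src, dst)
```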
  • multiple index registers can be used to nest indirect access to vector registers. That is, get the address of index register 1 among multiple index registers from the instruction code; then get the address of the next index register from index register 1, until index register 2 that stores the address of the vector register is obtained, use index register 2 Indirect access to vector registers to complete the process of indirect access to vector registers by nesting multiple index registers.
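
Nested indirect access can be sketched as following a chain of index registers. The text does not say how the hardware knows where the chain ends, so this sketch assumes a fixed nesting depth supplied by the caller; `resolve_nested` and the register contents are hypothetical.

```python
def resolve_nested(index_regs: list[int], first_index: int, depth: int) -> int:
    """Follow `depth` index registers; the last one holds the vector register address."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    idx = first_index
    for _ in range(depth - 1):
        idx = index_regs[idx]  # content names the next index register in the chain
    return index_regs[idx]     # content of the final index register

ir = [0] * 32
ir[2] = 5    # index register 2 points at index register 5
ir[5] = 220  # index register 5 stores the vector register address

print(resolve_nested(ir, first_index=2, depth=2))  # two-level chain
print(resolve_nested(ir, first_index=5, depth=1))  # plain VR[IR[i]] addressing
```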
  • an operation field is configured in the instruction code, and the value of the operation field corresponds to the instruction operation code.
  • The instruction operation codes include load, store, arithmetic, logic, and shift operations; the specific instruction operation code can be selected according to the actual situation, and the embodiments of this application do not impose specific limitations.
  • the arithmetic logic operation is obtained from the instruction code, and then the source data is obtained from the source vector register, and the arithmetic logic operation is performed on the source data to obtain the target data, and the target data is stored in the target vector register, At this point, the execution process of the instruction code of this round is completed.
  • arithmetic operations may include addition operations, subtraction operations, multiplication operations, division operations, remainder operations, exponentiation operations, etc.
  • logical operations may include logical AND, logical NOT, logical OR, and logical exclusive OR operations, etc.; the specific operation can be selected according to actual conditions, and the embodiments of this application do not impose specific limitations.
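
As a rough illustration of how an operation selected by the instruction code is applied to the source data, the sketch below dispatches on a mnemonic and applies the operation lane by lane. The mnemonics, the restriction to two-operand operations, and the use of Python lists as vector registers are assumptions for illustration.

```python
import operator

# Hypothetical mnemonics mapped to two-operand arithmetic and logical operations.
OPERATIONS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
}

def execute(op: str, src_a: list[int], src_b: list[int]) -> list[int]:
    """Apply the selected operation element-wise to two source vectors."""
    fn = OPERATIONS[op]
    return [fn(a, b) for a, b in zip(src_a, src_b)]

print(execute("add", [1, 2, 3], [10, 20, 30]))
print(execute("xor", [0b1100], [0b1010]))
```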
  • the data processing device includes an arithmetic logic unit (Arithmetic and Logic Unit, ALU).
  • ALU arithmetic logic unit
  • the arithmetic logic operation in the ALU is specified by the instruction code.
  • The ALU takes the source data out of the source vector register, performs the arithmetic logic operation on the source data, and then stores the target data obtained by the operation in the target vector register.
  • the field opcode indicates an instruction operation code
  • the field opcode occupies 7 bits
  • the field opcode can carry operation codes such as load, store, arithmetic, logic, and shift.
  • the field vm indicates whether the instruction code performs a mask operation, which occupies 1 bit;
  • the field funct3 is a user-defined field, which occupies 6 bits. Since the data is read from and stored to the vector registers directly in this application, load and store opcodes are not needed, so the instruction degenerates into a one-operand format, which can be distinguished by opcode or funct3.
  • The data processing device executes the instruction code by accessing the first vector register. While executing the instruction code and/or after executing it, the data processing device updates the first content in the index register according to the instruction code; when the instruction code is executed in the next round, the data processing device accesses a second vector register based on the updated first content and uses the second vector register to perform the next round of cyclic execution of the instruction code.
  • The data processing device may update the first content in the index register according to the instruction code after executing the instruction code, or during execution of the instruction code.
  • The specific timing at which the data processing device updates the first content is selected according to the actual situation, and the embodiments of this application do not impose specific limitations.
  • the instruction code is also configured with an offset field corresponding to the index register.
  • the value of the offset field represents the offset value and the offset type.
  • When the offset field is parsed, the offset value and offset type are obtained; the first content is adjusted according to the offset value and offset type to obtain the updated first content; the updated first content is then written into the index register to replace the first content.
  • the offset type includes: increase, decrease, and other offset operation types, which are specifically selected according to actual conditions, and are not specifically limited in the embodiment of the present application.
  • the first content is offset according to the offset type, and the step length of the offset is the offset value to obtain the updated first content, and then replace the first content in the index register with the updated first content .
  • The fields ai1, ai2, and ai3 respectively indicate whether the fields irs1, irs2, and ird perform an auto-increment operation, where 0 means no auto-increment and 1 means auto-increment; that is, a value of 1 corresponds to an offset value of 1.
  • the offset type is increase.
  • When the field ai1 is 1, the value of the third index register corresponding to the irs1 field is obtained, that is, 220.
  • After the auto-increment, the value of the register is 221, and 221 is written into the third index register.
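
The update step can be sketched as a read followed by a write-back. The sketch assumes the update happens after the current content has been read for addressing (the text allows the update during or after execution), and the function names and string labels for the offset type are hypothetical.

```python
def apply_offset(content: int, offset_type: str, offset_value: int) -> int:
    """Adjust the index register content by the given offset type and step."""
    if offset_type == "increase":
        return content + offset_value
    if offset_type == "decrease":
        return content - offset_value
    raise ValueError(f"unsupported offset type: {offset_type}")

def read_then_update(index_regs: list[int], reg: int, auto_inc: int) -> int:
    """Return the content used in this round, then write back the updated content."""
    used = index_regs[reg]
    if auto_inc == 1:  # ai flag set: offset type is increase, offset value is 1
        index_regs[reg] = apply_offset(used, "increase", 1)
    return used

ir = [0] * 32
ir[2] = 220  # third index register, selected by irs1 = 2
used = read_then_update(ir, reg=2, auto_inc=1)
print(used, ir[2])
```

This round addresses vector register 220, and the next round of the same instruction code will address 221.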
  • When the instruction code is executed in the next round, the data processing device decodes the instruction code to obtain the first index value of the index register; it then obtains the updated first content stored in the index register and determines the second vector register according to the updated first content; the data processing device executes the next round of the instruction code by accessing the second vector register, and continues to update the first content in the index register according to the instruction code, so that subsequent executions of the instruction code access different vector registers based on the updated first content.
  • The VDSP may include an index register file (Index Register File, IRF) 50, an instruction memory 51, a main memory 52, a vector register 53, a scalar register 54, an ALU 55, an address manager 56, and an address decoder 57.
  • The address index of the vector register 53 can be configured through the IRF 50 to achieve the purpose of accessing any vector register.
  • The working principle of the IRF 50 is as follows: in each instruction cycle, the instruction code is fetched from the instruction memory 51 and decoded to obtain the IR index value; the corresponding index register in the IRF 50 is determined according to the IR index value, and the address offset of the vector register 53 to be accessed is obtained from it; the address offset is passed to the address decoder 57, which decodes the actual address of the vector register 53 for the ALU 55 to fetch operands from.
  • After completing the corresponding operation, the ALU 55 stores the intermediate result in the vector register 53.
  • The 100 input data D0 to D99 are placed directly in the registers VR0 to VR99; the ALU adds 1 to each of them and puts the results into VR100 to VR199, and the processed data are d0 to d99.
  • the initial value of the index ir0 of the input register is set to 0, which means that the access starts from VR0; the initial value of the index ir1 of the output register is set to 100, which means that the output starts from VR100.
  • ir0 and ir1 are incremented by one each time, so the next input and output registers are accessed in the next cycle. It can be seen from the above that only one instruction is needed to implement 100 rounds of instruction execution, thereby reducing the amount of code, the instruction memory, and the power consumption of the processor's instruction fetching.
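
The 100-round example can be simulated in a few lines. This is a minimal software model, not the hardware described in the application: vector registers are Python lists with an assumed width of 4 lanes, the input values are arbitrary, and `vinc_indirect` is a hypothetical name for the single looped instruction with auto-increment enabled on both index registers.

```python
LANES = 4  # assumed tiny vector width, chosen only to keep the model readable

# Vector register file: inputs D0..D99 live in VR0..VR99 (values are arbitrary).
vr = [[0] * LANES for _ in range(512)]
for i in range(100):
    vr[i] = [i * 10 + lane for lane in range(LANES)]

# Index register group: ir0 addresses the input register, ir1 the output register.
ir = [0] * 32
ir[0], ir[1] = 0, 100

def vinc_indirect(vr, ir, ird, irs, auto_inc=True):
    """One round of the looped instruction: VR[IR[ird]] = VR[IR[irs]] + 1."""
    source = vr[ir[irs]]
    vr[ir[ird]] = [x + 1 for x in source]
    if auto_inc:  # both index registers advance to the next vector register
        ir[irs] += 1
        ir[ird] += 1

# The same instruction word is executed 100 times; only the index registers change.
for _ in range(100):
    vinc_indirect(vr, ir, ird=1, irs=0)

print(vr[100], vr[199], ir[0], ir[1])
```

The loop body never changes between rounds, which is the property that lets a single stored instruction replace the 100 unrolled ones.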
  • the processor includes an index register group
  • the instruction code is configured with the index value of the index register
  • The data processing device determines the index register through the instruction code, and accesses and updates the first content in the index register, so that different vector registers are accessed during different rounds of instruction code execution, which can reduce the amount of code and instruction memory.
  • the embodiment of the application provides a data processing device.
  • the processor of the data processing device includes an index register group, and the data processing device 1 includes:
  • the decoding part 10 is configured to decode the instruction code in the loop code to obtain the first index value of the index register, the instruction code is generated by a compiler, and the index register is a register in the index register group;
  • the determining part 11 is configured to determine an index register according to the first index value, and determine a first vector register according to the first content; to execute the instruction code by accessing the first vector register;
  • the obtaining part 12 is configured to obtain the first content stored in the index register.
  • the device further includes: an instruction execution part and an update part;
  • the instruction execution part is configured to execute the instruction code by accessing the first vector register
  • the update part is configured to update the first content in the index register according to the instruction code, so as to access the second vector register based on the updated first content when the instruction code is executed in the next round.
  • an index register field is configured in the instruction code
  • the decoding part 10 is further configured to decode the instruction code
  • the obtaining part 12 is further configured to obtain the value corresponding to the index register field
  • the determining part 11 is further configured to determine the value corresponding to the index register field as the first index value of the index register.
  • the update part is further configured to update the first content in the index register according to the instruction code during execution of the instruction code and/or after execution of the instruction code.
  • the data processing device further includes: an adjustment part and a writing part;
  • the obtaining part 12 is further configured to obtain the offset value and the offset type from the instruction code
  • the adjustment part is configured to adjust the first content according to the offset value and the offset type to obtain updated first content
  • the writing part is configured to write the updated first content into the index register to replace the first content with the updated first content.
  • the index register is a single index register
  • the acquiring part 12 is further configured to acquire a second index value corresponding to the initial source vector register and a third index value corresponding to the initial target vector register;
  • the determining part 11 is further configured to determine a source vector register according to the second index value and the first content; determine a target vector register according to the third index value and the first content; The source vector register and the target vector register are determined to be the first vector register.
  • the index register includes a first index register and a second index register
  • the obtaining part 12 is further configured to obtain the first address offset stored in the first index register; obtain the second address offset stored in the second index register;
  • the determining part 11 is further configured to determine the first address offset and the second address offset as the first content.
  • the decoding part 10 is further configured to decode the first address offset to obtain a first address; decode the second address offset to obtain a second address;
  • the determining part 11 is further configured to determine the source vector register corresponding to the first address; determine the target vector register corresponding to the second address; determine the source vector register and the target vector register as the first vector register.
  • the data processing device further includes: a storage part;
  • the obtaining part 12 is also configured to obtain arithmetic logic operations from the instruction code; obtain source data from the source vector register;
  • the instruction execution part is further configured to execute the arithmetic logic operation on the source data to obtain target data;
  • the storage part is configured to store the target data in the target vector register.
  • An embodiment of the application provides a data processing device.
  • the processor of the data processing device includes an index register group.
  • According to the instruction code, the first index value of the index register is obtained, and the index register is determined according to the first index value, where the instruction code is generated by a compiler and the index register is a register in the index register group; the first content stored in the index register is obtained, and the first vector register is determined according to the first content, so that the instruction code is executed by accessing the first vector register.
  • In the data processing device proposed in this embodiment, the processor includes an index register group and the index value of the index register is configured in the instruction code; the data processing device determines the index register through the instruction code and, by accessing and updating the first content in the index register, accesses different vector registers in different rounds of instruction code execution, thereby reducing code size and instruction memory.
  • FIG. 7 is a second schematic diagram of the composition structure of a data processing device 1 provided by an embodiment of the application.
  • As shown in FIG. 7, the data processing device 1 of this embodiment includes a processor 15, a memory 16 and a communication bus 17; the processor 15 includes an index register group 150.
  • the decoding part 10, the determining part 11, the acquiring part 12, the instruction executing part, the updating part, the adjusting part and the writing part can be realized by the processor 15 located on the data processing device 1.
  • the storage part can be realized by the memory 16 located on the data processing device 1.
  • the processor 15 may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a CPU, a controller, a microcontroller and a microprocessor. It can be understood that, for different devices, other electronic components may be used to implement the above processor functions, which is not specifically limited in this embodiment.
  • the above-mentioned communication bus 17 is used to realize the connection and communication between the processor 15 and the memory 16; when the above-mentioned processor 15 executes the operating program stored in the memory 16, the following data processing method is implemented:
  • obtain the first index value of the index register according to the instruction code, and determine the index register according to the first index value, where the instruction code is generated by a compiler and the index register is a register in the index register group 150; obtain the first content stored in the index register, and determine the first vector register according to the first content, so as to execute the instruction code by accessing the first vector register.
  • the instruction code is executed by accessing the first vector register; according to the instruction code, the first content in the index register is updated, so that the second vector register is accessed based on the updated first content when the instruction code is executed in the next round.
  • an index register field is configured in the instruction code
  • the processor 15 is further configured to decode the instruction code and, when the index register field is parsed, obtain the value corresponding to the index register field; the value corresponding to the index register field is determined as the first index value of the index register.
  • the above-mentioned processor 15 is further configured to update the first content in the index register according to the instruction code when execution of the instruction code is completed, and/or while the instruction code is being executed.
  • the above-mentioned processor 15 is further configured to obtain an offset value and an offset type from the instruction code; adjust the first content according to the offset value and the offset type to obtain updated first content; and write the updated first content into the index register to replace the first content with the updated first content.
  • the index register is a single index register
  • the above-mentioned processor 15 is further configured to obtain the second index value corresponding to the initial source vector register and the third index value corresponding to the initial target vector register; determine the source vector register according to the second index value and the first content; determine the target vector register according to the third index value and the first content; and determine the source vector register and the target vector register to be the first vector register.
  • the index register includes a first index register and a second index register.
  • the above-mentioned processor 15 is further configured to obtain the first address offset stored in the first index register; obtain the second address offset stored in the second index register; and determine the first address offset and the second address offset as the first content.
  • the above-mentioned processor 15 is further configured to decode the first address offset to obtain the first address, and determine the source vector register corresponding to the first address; decode the second address offset to obtain the second address, and determine the target vector register corresponding to the second address; and determine the source vector register and the target vector register as the first vector register.
  • the above-mentioned processor 15 is further configured to obtain an arithmetic logic operation from the instruction code; obtain source data from the source vector register, and perform the arithmetic logic operation on the source data to obtain target data; and store the target data in the target vector register.
  • the embodiment of the present application provides a storage medium on which a computer program is stored; the computer-readable storage medium stores one or more programs, which can be executed by one or more processors and are applied in a data processing device, and the computer program implements the data processing method as described.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Complex Calculations (AREA)
  • Advance Control (AREA)

Abstract

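A data processing method and device, and a storage medium. A processor of the data processing device includes an index register group. The method includes: obtaining a first index value of an index register according to an instruction code, and determining the index register according to the first index value, where the instruction code is generated by a compiler and the index register is a register in the index register group (S101); and acquiring first content stored in the index register, and determining a first vector register according to the first content, so as to execute the instruction code by accessing the first vector register (S102).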

Description

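Data processing method and device, and storage medium
Cross-reference to related applications
This application is based on, and claims priority to, Chinese patent application No. 202010520154.9, filed on June 9, 2020, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of signal processing, and in particular to a data processing method and device, and a storage medium.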
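Background
In high-speed signal processing, some signal processing modules use a vector signal processor to process signals. To speed up processing, instructions with a Single Instruction Multiple Data (SIMD) structure are typically used, so that one instruction processes multiple data items at once. A Vector Digital Signal Processor (VDSP) with a SIMD instruction set usually operates on data as follows: it first loads data amounting to several vector registers from memory outside the VDSP into internal vector registers as the input of a SIMD instruction and, after the SIMD instruction completes, stores the data in the vector register holding the result back to the memory outside the VDSP. The register file usually contains no more than 32 registers, which is enough to buffer the data of a few SIMD instructions during execution.
To reduce the processing power consumption of a VDSP, some VDSPs provide a very large vector register array; for example, such a VDSP may provide 512 vector registers of 1024 bits each. The vector registers can then hold not only the intermediate results of a few instructions but all the data of a relatively independent, complex signal processing flow, which avoids repeatedly loading and storing each data item for every SIMD instruction. This effectively reduces power consumption, and in a processor with a Very Long Instruction Word (VLIW) architecture the idle load and store units and slots can be used for other parallel operations, further improving the parallel processing capability of the VLIW processor.
In current vector signal processors of the various SIMD architectures, vector registers are accessed by encoding the index of the target vector register into the instruction word. When the number of vector registers is large, the register index values in the instruction word occupy too many bits, which leads to a large code size. Moreover, if the register index is hard-coded in the instruction word, it cannot be changed at run time; when the same instruction in a loop needs to access data in different vector registers, multiple instructions are needed to perform the different accesses, which leads to a large instruction memory footprint.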
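The bit-count concern can be made concrete with a little arithmetic. The sketch below is only an illustration: the 512-register file comes from the example above and the 32-entry index register group from the 5-bit fields of FIG. 3, while the choice of three register fields per instruction and the helper names `index_bits` and `operand_bits` are assumptions made here.

```python
def index_bits(num_registers: int) -> int:
    """Bits needed for a field that can name any one of `num_registers` registers."""
    return max(1, (num_registers - 1).bit_length())


def operand_bits(num_registers: int, operands: int) -> int:
    """Total bits spent on register fields in one instruction word."""
    return operands * index_bits(num_registers)


# Assumption: three register fields per instruction (two sources, one target).
direct = operand_bits(512, 3)    # fields name one of 512 vector registers directly
indirect = operand_bits(32, 3)   # fields name one of 32 index registers instead
print(direct, indirect)          # 27 15
```

Under these assumptions, direct encoding would spend 27 of the 32 instruction bits on register numbers, whereas naming index registers spends 15.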
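Summary
The embodiments of this application provide a data processing method and device, and a storage medium, which can reduce code size and instruction memory.
The technical solution of this application is implemented as follows:
An embodiment of this application provides a data processing method applied to a data processing device, where a processor of the data processing device includes an index register group, and the method includes:
obtaining a first index value of an index register according to an instruction code, and determining the index register according to the first index value, where the instruction code is generated by a compiler and the index register is a register in the index register group; and
acquiring first content stored in the index register, and determining a first vector register according to the first content, so as to execute the instruction code by accessing the first vector register.
An embodiment of this application provides a data processing device, where a processor of the data processing device includes an index register group, and the data processing device includes:
a decoding part, configured to obtain a first index value of an index register according to an instruction code, where the instruction code is generated by a compiler and the index register is a register in the index register group;
a determining part, configured to determine the index register according to the first index value, and determine a first vector register according to first content, so as to execute the instruction code by accessing the first vector register; and
an acquiring part, configured to acquire the first content stored in the index register.
An embodiment of this application provides a data processing device including a processor, a memory and a communication bus; the processor includes an index register group; and the processor implements the data processing method described above when executing an operating program stored in the memory.
An embodiment of this application provides a storage medium storing a computer program which, when executed by a processor, implements the data processing method described above.
The embodiments of this application provide a data processing method and device, and a storage medium. A processor of the data processing device includes an index register group, and the method includes: obtaining a first index value of an index register according to an instruction code, and determining the index register according to the first index value, where the instruction code is generated by a compiler and the index register is a register in the index register group; and acquiring first content stored in the index register, and determining a first vector register according to the first content, so as to execute the instruction code by accessing the first vector register. With this implementation, the processor includes an index register group and the index value of the index register is configured in the instruction code; the data processing device determines the index register through the instruction code and, by accessing and updating the first content in the index register, accesses different vector registers in different rounds of instruction code execution, thereby reducing code size and instruction memory.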
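Brief description of the drawings
FIG. 1 is a framework diagram of looped instruction execution when data is stored in memory;
FIG. 2 is a first flowchart of a data processing method provided by an embodiment of this application;
FIG. 3 is a schematic diagram of the instruction fields of an exemplary instruction code provided by an embodiment of this application;
FIG. 4 is a schematic structural diagram of the addressing mapping of an exemplary vector register file inside a processor provided by an embodiment of this application;
FIG. 5 is a framework diagram of exemplary looped instruction execution provided by an embodiment of this application;
FIG. 6 is a first schematic structural diagram of a data processing device provided by an embodiment of this application;
FIG. 7 is a second schematic structural diagram of a data processing device provided by an embodiment of this application.
Detailed description
It should be understood that the specific embodiments described here are only intended to explain this application and are not intended to limit it.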
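Suppose a piece of code needs to add one to each of 100 numbers, where vinc is a vector increment instruction whose left operand is the target vector register and whose right operand is the source operand stored in the source vector register; the source operand is incremented by 1 and the result is stored in the target vector register.
When the VDSP has only a small vector register array, the data has to be kept in memory, and the corresponding code is:
set r0,#input_addr
set r1,#output_addr
ldi vr1,r0
vinc vr2,vr1|ldi vr1,r0
label:loop 98
vsti r1,vr2|vinc vr2,vr1|ldi vr1,r0
jump label
vsti r1,vr2|vinc vr2,vr1
vsti r1,vr2
Because registers cannot be accessed indirectly, data that is accessed through a loop instruction can only be kept in memory, and all the data is processed by loading and storing each data item and auto-incrementing the address. As shown in FIG. 1, the input data D1-D4 and the output data d1-d4 are all stored in memory, and the loop kernel has to perform the vinc operation, load the next data item and store the previous data item. The overall processing power consumption is very high.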
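When the VDSP provides a very large vector register array, the data is kept in registers, and the corresponding code is:
vinc vr100,vr0
vinc vr101,vr1
...
vinc vr199,vr99
In the above instruction execution, once a vinc instruction inside a loop has been compiled it can only access fixed registers, so the 100 data items cannot be accessed through a loop instruction; the instruction has to be unrolled 100 times, with different registers used in each unrolled copy. This approach significantly increases code size, instruction memory and the power the processor spends on instruction fetch.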
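To make the code-size cost of unrolling tangible, the following sketch generates the unrolled listing described above. The generator `unrolled_vinc` is a hypothetical helper written for this illustration and is not part of any toolchain named in the application.

```python
def unrolled_vinc(count: int, dst_base: int) -> list[str]:
    """One vinc line per data item, each with hard-coded register numbers."""
    return [f"vinc vr{dst_base + i},vr{i}" for i in range(count)]


listing = unrolled_vinc(100, 100)
print(len(listing))   # 100 instruction words for 100 data items
print(listing[0])     # vinc vr100,vr0
print(listing[-1])    # vinc vr199,vr99
```

The instruction count grows linearly with the amount of data, which is the growth in code size and instruction memory that the embodiments below aim to avoid.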
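To solve the above problems, a data processing method and device, and a storage medium are proposed, which are described in detail through the following embodiments.
Embodiment 1
An embodiment of this application provides a data processing method applied to a data processing device, where a processor of the data processing device includes an index register group. As shown in FIG. 2, the method may include:
S101: Obtain a first index value of an index register according to an instruction code, and determine the index register according to the first index value, where the instruction code is generated by a compiler and the index register is a register in the index register group.
The data processing method provided by this embodiment is applicable to scenarios in which a vector register array is used to process multiple data items in one instruction.
In this embodiment, the data processing device generates the instruction code through a compiler, and the processor of the data processing device is a Vector Digital Signal Processor (VDSP).
It can be understood that the instruction code generated by the compiler is consistent with the code actually run, and the behavior of the compiled program is fixed and predictable, which supports subsequent single-step debugging of the instruction code.
In this embodiment, the processor includes an index register group, and an index register field is configured in the instruction code for an index register in the index register group. The processor determines the address of the corresponding index register through the index register field, and the value of the index register field is the first index value of the index register.
In this embodiment, the instruction code can be executed repeatedly in a loop, accessing different vector registers in each round. When the current round of the instruction code is executed, the instruction code is first decoded and its fields are parsed in turn; when the index register field is parsed, the value corresponding to the index register field is obtained and determined as the first index value of the index register.
In this embodiment, the types of vector register that are indirectly addressed through index registers include source vector registers and target vector registers.
Optionally, a first index register may be configured for the source vector register and a second index register for the target vector register; or the source vector register and the target vector register may share one index register. The choice depends on the actual situation and is not specifically limited in this embodiment.
In an optional embodiment, when a first index register is configured for the source vector register and a second index register for the target vector register, the index value of the first index register and the index value of the second index register are extracted separately from the instruction code; in this case, these two index values together constitute the first index value.
In this embodiment, after the first index value of the index register is obtained, the index register used for indirect addressing is determined according to the first index value. Specifically, a correspondence between index values and index register addresses may be preset; the index register address corresponding to the first index value is then looked up in this correspondence, so as to locate the index register used for indirect addressing in the current round of instruction execution.
Exemplarily, as shown in FIG. 3, taking an instruction code in a 32-bit encoding format as an example, the fields irs1 and irs2 indicate source index register numbers and the field ird indicates the target index register number. As FIG. 3 shows, irs1, irs2 and ird each occupy 5 bits, so each of them can encode 32 index registers, which meets the needs of different platforms. When the VDSP runs, it first fetches the instruction code from the program code space and then decodes it; when the fields irs1, irs2 and ird are parsed, they are used to indirectly address vector registers. When the value of irs1 is 2, the third index register is selected, since index register numbering starts from 0.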
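S102: Acquire first content stored in the index register, and determine a first vector register according to the first content, so as to execute the instruction code by accessing the first vector register.
After the data processing device determines the index register according to the first index value, it acquires the first content stored in the index register and determines the first vector register according to the first content.
In an optional embodiment, there is a single index register, which is shared by the source vector register and the target vector register; the second index value of the initial source vector register and the third index value of the initial target vector register are stored in dedicated registers or hard-coded in the instruction code. In this case, the data processing device determines the first vector register according to the first content stored in the index register as follows: obtain the second index value corresponding to the initial source vector register and the third index value corresponding to the initial target vector register; determine the source vector register according to the second index value and the first content; determine the target vector register according to the third index value and the first content; and determine the source vector register and the target vector register to be the first vector register.
It should be noted that the first content stored in the single index register represents an index register offset. The second index value of the initial source vector register is added to the index register offset to obtain the source vector register, and the third index value of the initial target vector register is added to the index register offset to obtain the target vector register.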
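The shared-index-register variant amounts to a base-plus-offset calculation, which can be sketched as follows. The function name `resolve_shared`, the bounds check and the 512-entry register file size (borrowed from the background example) are assumptions made for illustration only.

```python
NUM_VECTOR_REGISTERS = 512  # assumed size, taken from the background example


def resolve_shared(src_base: int, dst_base: int, ir_offset: int) -> tuple[int, int]:
    """Add the shared index register offset to both initial register numbers."""
    src = src_base + ir_offset   # second index value + first content
    dst = dst_base + ir_offset   # third index value + first content
    for number in (src, dst):
        if not 0 <= number < NUM_VECTOR_REGISTERS:
            raise IndexError(f"vector register {number} out of range")
    return src, dst


print(resolve_shared(0, 100, 0))   # (0, 100): first round
print(resolve_shared(0, 100, 5))   # (5, 105): after the offset has advanced by 5
```

A single index register is enough here because source and target always move in lockstep; independent strides would need the two-register variant.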
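In another optional embodiment, the index register includes a first index register and a second index register, where the first index register is used to indirectly address the source vector register and the second index register is used to indirectly address the target vector register. The first address offset stored in the first index register and the second address offset stored in the second index register are acquired separately; in this case, the first address offset and the second address offset constitute the first content described above. Correspondingly, the data processing device determines the first vector register according to the first content as follows: decode the first address offset to obtain a first address; determine the source vector register corresponding to the first address; decode the second address offset to obtain a second address; determine the target vector register corresponding to the second address; and determine the source vector register and the target vector register as the first vector register.
Exemplarily, referring to FIG. 3, when the value of the third index register is 220, the source vector register corresponding to the third index register is the 221st vector register, since vector register numbering starts from 0. Correspondingly, indirectly addressing the source vector register through the index register can be written as VR[IR[irs1]].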
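The expression VR[IR[irs1]] can be modelled directly with two arrays. This is a minimal sketch: the array sizes, the identity mapping used for address decoding, and the helper names `decode_address`, `read_source` and `write_target` are assumptions, not details given in the application.

```python
IR = [0] * 32    # index register group (32 entries, matching 5-bit fields)
VR = [0] * 512   # vector register file; each entry stands in for a whole vector


def decode_address(offset: int) -> int:
    """Assumed address decoding: the offset is used directly as the register number."""
    if not 0 <= offset < len(VR):
        raise IndexError(f"vector register offset {offset} out of range")
    return offset


def read_source(irs1: int) -> int:
    return VR[decode_address(IR[irs1])]    # VR[IR[irs1]]


def write_target(ird: int, value: int) -> None:
    VR[decode_address(IR[ird])] = value    # VR[IR[ird]] = value


IR[2] = 220      # the third index register holds 220
IR[5] = 300      # hypothetical target index register content
VR[220] = 7
write_target(5, read_source(2) + 1)
print(VR[300])   # 8
```

The instruction word only ever contains the numbers 2 and 5; which vector registers are touched depends entirely on the run-time contents of IR.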
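In this embodiment, multiple index registers can be nested to access a vector register indirectly. That is, the address of index register 1 among the multiple index registers is obtained from the instruction code; the address of the next index register is then obtained from index register 1, and so on, until index register 2, which stores the address of the vector register, is reached. Index register 2 is then used to access the vector register indirectly, which completes the nested indirect access through multiple index registers.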
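Nested indirection can be sketched as a chain walk over the index register group. The application does not say how the end of the chain is recognised, so the explicit `depth` parameter below is an assumption, as is the function name `resolve_nested`.

```python
def resolve_nested(ir: list[int], first: int, depth: int) -> int:
    """Follow `depth` index registers, starting at `first`, and return the
    vector register number stored in the last one."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    number = first
    for _ in range(depth - 1):
        number = ir[number]   # content names the next index register
    return ir[number]         # final content names the vector register


ir = [0] * 32
ir[4] = 7      # index register 4 points at index register 7
ir[7] = 220    # index register 7 holds the vector register number
print(resolve_nested(ir, 4, 2))   # 220
print(resolve_nested(ir, 7, 1))   # 220
```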
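In this embodiment, an operation field is configured in the instruction code, and the value of the operation field indicates the instruction opcode. Optionally, instruction opcodes include load, store, arithmetic, logic and shift operations; the specific opcodes can be chosen according to the actual situation and are not specifically limited in this embodiment.
In this embodiment, the arithmetic logic operation is obtained from the instruction code; source data is then obtained from the source vector register, the arithmetic logic operation is performed on the source data to obtain target data, and the target data is stored in the target vector register. This completes the current round of instruction code execution.
Optionally, arithmetic operations may include addition, subtraction, multiplication, division, remainder and exponentiation, and logic operations may include logical AND, logical NOT, logical OR and logical XOR; the specific operations can be chosen according to the actual situation and are not specifically limited in this embodiment.
In this embodiment, the data processing device includes an Arithmetic and Logic Unit (ALU), and the arithmetic logic operation performed by the ALU is specified by the instruction code. After the source vector register and the target vector register have been determined, the ALU fetches the source data from the source vector register, performs the arithmetic logic operation on the source data, and then stores the resulting target data in the target vector register.
Exemplarily, as shown in FIG. 3, the field opcode indicates the instruction opcode and occupies 7 bits; it can carry opcodes such as load, store, arithmetic, logic and shift. Further, the field vm indicates whether the instruction code performs a mask operation and occupies 1 bit; the field funct3 is a user-defined field occupying 6 bits. Because this application reads and stores data directly in vector registers, no load and store opcodes are needed, so the instruction degenerates into a one-operand format, which can be distinguished through opcode or funct3.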
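The field widths quoted from FIG. 3 (opcode 7 bits, vm 1 bit, funct3 6 bits, three 5-bit index register fields) leave three bits in a 32-bit word, which matches the three auto-increment flags ai1, ai2 and ai3 that FIG. 3 also carries if each is one bit wide. The sketch below packs and unpacks such a word. The bit positions are NOT given in the text, so the ordering in `FIELDS` and the one-bit flag width are assumptions chosen only to make the example concrete.

```python
# (name, width) pairs from least significant bit upward. The ORDER is assumed.
FIELDS = [("opcode", 7), ("ird", 5), ("irs1", 5), ("irs2", 5),
          ("ai1", 1), ("ai2", 1), ("ai3", 1), ("vm", 1), ("funct3", 6)]


def encode(**values: int) -> int:
    """Pack field values into one instruction word; unspecified fields are 0."""
    word, shift = 0, 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def decode(word: int) -> dict[str, int]:
    """Unpack an instruction word into its named fields."""
    fields, shift = {}, 0
    for name, width in FIELDS:
        fields[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return fields


word = encode(opcode=0x2B, irs1=2, ird=4, ai1=1, ai3=1)
print(decode(word)["irs1"])   # 2, i.e. the third index register
```

Driving both directions from one table keeps the encoder and decoder consistent if the assumed layout is replaced by the real one.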
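It can be understood that a VDSP usually has multiple instruction slots, with load, store and ALU operations belonging to different slots. Because this application reads and stores data directly in vector registers, no load and store opcodes are needed, so only the ALU instruction slot is busy and the load and store instruction slots are idle; other instructions can therefore be placed in the load and store slots to increase the parallel processing capability of the VDSP.
Further, after the data processing device determines the first vector register according to the first content, it executes the instruction code by accessing the first vector register. When the data processing device has finished executing the instruction code, and/or while it is executing the instruction code, it updates the first content in the index register according to the instruction code. When the instruction code is executed in the next round, the data processing device accesses a second vector register based on the updated first content and uses the second vector register for the next round of looped execution of the instruction code.
In this embodiment, the data processing device may update the first content in the index register according to the instruction code after the instruction code has been executed, or while the instruction code is being executed. The specific time at which the data processing device updates the first content is chosen according to the actual situation and is not specifically limited in this embodiment.
In this embodiment, an offset field corresponding to the index register is also configured in the instruction code, and the value of the offset field represents an offset value and an offset type. While the instruction code is being decoded, the offset value and the offset type are obtained when the offset field is parsed; the first content is adjusted according to the offset value and the offset type to obtain updated first content; and the updated first content is then written into the index register to replace the first content.
Optionally, offset types include offset operation types such as increase and decrease; the choice depends on the actual situation and is not specifically limited in this embodiment.
In this embodiment, the first content is offset according to the offset type, with the offset value as the step size, to obtain the updated first content, which then replaces the first content in the index register.
Exemplarily, based on FIG. 3, the fields ai1, ai2 and ai3 indicate whether the fields irs1, irs2 and ird, respectively, are auto-incremented, where 0 means no auto-increment and 1 means auto-increment; that is, a value of 1 corresponds to an offset value of 1 and an offset type of increase. When the field ai1 is 1, the third index register corresponding to the irs1 field and its value, 220, are obtained; the value of the third index register is then incremented by one to give the updated value 221, and 221 is written into the third index register.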
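The update step can be sketched as a small read-modify-write on the index register group. The string constants used for the offset type and the function name `update_index` are assumptions for illustration; the text only says that the type may be increase or decrease and that the offset value is the step size.

```python
def update_index(ir: list[int], number: int, offset_value: int, offset_type: str) -> int:
    """Adjust IR[number] by the offset and write the result back."""
    if offset_type == "increase":
        updated = ir[number] + offset_value
    elif offset_type == "decrease":
        updated = ir[number] - offset_value
    else:
        raise ValueError(f"unknown offset type: {offset_type}")
    ir[number] = updated          # replaces the first content
    return updated


ir = [0] * 32
ir[2] = 220                                  # third index register
print(update_index(ir, 2, 1, "increase"))    # 221, the ai1 = 1 case from the example
```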
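In this embodiment, when the instruction code is executed in the next round, the data processing device decodes the instruction code to obtain the first index value of the index register; it then acquires the updated first content stored in the index register and determines a second vector register according to the updated first content. The data processing device executes the next round of the instruction code by accessing the second vector register, and continues to update the updated first content in the index register according to the instruction code, so that different vector registers are accessed based on the updated first content each time the instruction code is subsequently executed.
In the actual addressing mapping of the vector register file inside the processor, as shown in FIG. 4, the VDSP may include an index register file (Index Register File, IRF) 50, an instruction memory 51, a main memory 52, vector registers 53, scalar registers 54, an ALU 55, an address manager 56 and an address decoder 57. The address index of a vector register 53 can be configured through the IRF 50, so that any vector register can be accessed. The IRF 50 works as follows: in each instruction cycle, the instruction code is looked up in the instruction memory 51 and decoded to obtain the IR index value; the corresponding entry of the IRF 50 is determined according to the IR index value, and the address offset of the vector register 53 to be accessed is obtained from it; the address offset is passed to the address decoder 57, which decodes it into the actual address of the vector register 53 from which the ALU 55 fetches data. After completing the corresponding operation, the ALU 55 stores the intermediate result in a vector register 53. The auto-increment offset value is then determined from the corresponding field configured in the instruction code; this offset value and the address offset of the vector register 53 are passed to the adder 560 in the address manager 56, which produces the updated address index, and the updated address index is written back to the IRF entry used by the instruction code as the address offset for the next access. It should be noted that, in FIG. 4, data amounting to several vector registers is loaded from the main memory 52 into the internal vector registers 53; after instruction execution completes, the data in the vector register 53 holding the result is stored back to the main memory 52; and the scalar registers 54 are used to keep track of main-memory addresses.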
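One instruction cycle of the FIG. 4 datapath can be sketched in software as below. This is a behavioural model under several assumptions: each vector register is reduced to a single integer, address decoding is an identity mapping with a bounds check, and the class and field names (`Instruction`, `Vdsp`, `alu_op`, and so on) are invented for this illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Instruction:
    alu_op: Callable[[int], int]  # operation selected by the opcode field
    irs1: int                     # number of the source index register
    ird: int                      # number of the target index register
    ai1: int = 0                  # 1 = auto-increment IR[irs1] after use
    ai3: int = 0                  # 1 = auto-increment IR[ird] after use


@dataclass
class Vdsp:
    irf: List[int] = field(default_factory=lambda: [0] * 32)    # IRF 50
    vrf: List[int] = field(default_factory=lambda: [0] * 512)   # vector registers 53

    def address_decode(self, offset: int) -> int:
        """Stand-in for address decoder 57 (assumed identity mapping)."""
        if not 0 <= offset < len(self.vrf):
            raise IndexError(f"vector register offset {offset} out of range")
        return offset

    def step(self, instr: Instruction) -> None:
        src = self.address_decode(self.irf[instr.irs1])
        dst = self.address_decode(self.irf[instr.ird])
        self.vrf[dst] = instr.alu_op(self.vrf[src])   # ALU 55 computes and stores
        self.irf[instr.irs1] += instr.ai1             # adder 560 writes back to the IRF
        self.irf[instr.ird] += instr.ai3


cpu = Vdsp()
cpu.irf[0], cpu.irf[1] = 0, 100
cpu.vrf[0] = 41
cpu.step(Instruction(alu_op=lambda x: x + 1, irs1=0, ird=1, ai1=1, ai3=1))
print(cpu.vrf[100], cpu.irf[0], cpu.irf[1])   # 42 1 101
```

Issuing the same `Instruction` object again would read VR1 and write VR101, which is the round-to-round behaviour the method relies on.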
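Exemplarily, suppose a piece of code needs to add one to each of 100 numbers, where vinc is a vector increment instruction whose left operand is the target vector register and whose right operand is the source operand stored in the source vector register; the source operand is incremented by 1 and the result is stored in the target vector register. On a processor that supports indirect register access, the code is:
set ir0,0
set ir1,100
label:loop 100
vinc vr[ir1++],vr[ir0++]
jump label
As FIG. 5 shows, the 100 input data items D0 to D99 are placed directly in registers VR0 to VR99; the ALU adds 1 to each of them and places the results in VR100 to VR199, giving the processed data d0 to d99. The code shows that the input register index ir0 is initialized to 0, meaning that access starts from VR0, and the output register index ir1 is initialized to 100, meaning that output starts from VR100. In each of the 100 loop iterations, ir0 and ir1 are incremented by 1, so the next iteration accesses the next input and output registers. It follows that a single instruction is enough to perform the 100 looped executions, which reduces code size, instruction memory and the power the processor spends on instruction fetch.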
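The looped listing above can be checked with a short simulation. As in the other sketches, each vector register is modelled as a single integer and the function name `run_loop` is hypothetical; the register layout (inputs in VR0 to VR99, outputs in VR100 to VR199) follows the description of FIG. 5.

```python
def run_loop(inputs: list[int]) -> tuple[list[int], list[int]]:
    """Simulate `vinc vr[ir1++],vr[ir0++]` executed once per input."""
    n = len(inputs)
    vr = list(inputs) + [0] * n   # VR0..VR(n-1) hold inputs, VRn..VR(2n-1) outputs
    ir = [0, n]                   # set ir0,0 / set ir1,n
    for _ in range(n):            # loop n
        vr[ir[1]] = vr[ir[0]] + 1 # vinc: target <- source + 1
        ir[0] += 1                # ir0++
        ir[1] += 1                # ir1++
    return vr[n:], ir


outputs, ir = run_loop(list(range(100)))
print(outputs[0], outputs[-1], ir)   # 1 100 [100, 200]
```

The loop body is a single instruction regardless of how many data items are processed, in contrast to the 100-line unrolled listing.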
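It can be understood that the processor includes an index register group and the index value of an index register is configured in the instruction code; the data processing device determines the index register through the instruction code and, by accessing and updating the first content in the index register, accesses different vector registers in different rounds of instruction code execution, thereby reducing code size and instruction memory.
Embodiment 2
An embodiment of this application provides a data processing device. As shown in FIG. 6, a processor of the data processing device includes an index register group, and the data processing device 1 includes:
a decoding part 10, configured to decode an instruction code in loop code to obtain a first index value of an index register, where the instruction code is generated by a compiler and the index register is a register in the index register group;
a determining part 11, configured to determine the index register according to the first index value, and determine a first vector register according to first content, so as to execute the instruction code by accessing the first vector register; and
an acquiring part 12, configured to acquire the first content stored in the index register.
Optionally, the device further includes an instruction execution part and an updating part;
the instruction execution part is configured to execute the instruction code by accessing the first vector register;
the updating part is configured to update the first content in the index register according to the instruction code, so that a second vector register is accessed based on the updated first content when the instruction code is executed in the next round.
Optionally, an index register field is configured in the instruction code;
the decoding part 10 is further configured to decode the instruction code;
the acquiring part 12 is further configured to obtain the value corresponding to the index register field;
the determining part 11 is further configured to determine the value corresponding to the index register field as the first index value of the index register.
Optionally, the updating part is further configured to update the first content in the index register according to the instruction code when execution of the instruction code is completed, and/or while the instruction code is being executed.
Optionally, the data processing device further includes an adjusting part and a writing part;
the acquiring part 12 is further configured to obtain an offset value and an offset type from the instruction code;
the adjusting part is configured to adjust the first content according to the offset value and the offset type to obtain updated first content;
the writing part is configured to write the updated first content into the index register to replace the first content with the updated first content.
Optionally, the index register is a single index register,
the acquiring part 12 is further configured to obtain a second index value corresponding to an initial source vector register and a third index value corresponding to an initial target vector register;
the determining part 11 is further configured to determine a source vector register according to the second index value and the first content; determine a target vector register according to the third index value and the first content; and determine the source vector register and the target vector register to be the first vector register.
Optionally, the index register includes a first index register and a second index register,
the acquiring part 12 is further configured to obtain a first address offset stored in the first index register, and obtain a second address offset stored in the second index register;
the determining part 11 is further configured to determine the first address offset and the second address offset as the first content.
Optionally, the decoding part 10 is further configured to decode the first address offset to obtain a first address, and decode the second address offset to obtain a second address;
the determining part 11 is further configured to determine the source vector register corresponding to the first address; determine the target vector register corresponding to the second address; and determine the source vector register and the target vector register as the first vector register.
Optionally, the data processing device further includes a storage part;
the acquiring part 12 is further configured to obtain an arithmetic logic operation from the instruction code, and obtain source data from the source vector register;
the instruction execution part is further configured to perform the arithmetic logic operation on the source data to obtain target data;
the storage part is configured to store the target data in the target vector register.
In the data processing device provided by this embodiment, the processor of the data processing device includes an index register group. A first index value of an index register is obtained according to an instruction code, and the index register is determined according to the first index value, where the instruction code is generated by a compiler and the index register is a register in the index register group; first content stored in the index register is acquired, and a first vector register is determined according to the first content, so that the instruction code is executed by accessing the first vector register. It can be seen that, in the data processing device proposed in this embodiment, the processor includes an index register group and the index value of the index register is configured in the instruction code; the data processing device determines the index register through the instruction code and, by accessing and updating the first content in the index register, accesses different vector registers in different rounds of instruction code execution, thereby reducing code size and instruction memory.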
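FIG. 7 is a second schematic diagram of the composition structure of a data processing device 1 provided by an embodiment of this application. In practical applications, based on the same disclosed concept as the above embodiments, as shown in FIG. 7, the data processing device 1 of this embodiment includes a processor 15, a memory 16 and a communication bus 17; the processor 15 includes an index register group 150.
In a specific embodiment, the decoding part 10, the determining part 11, the acquiring part 12, the instruction execution part, the updating part, the adjusting part and the writing part can be implemented by the processor 15 located on the data processing device 1, and the storage part can be implemented by the memory 16 located on the data processing device 1. The processor 15 may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a CPU, a controller, a microcontroller and a microprocessor. It can be understood that, for different devices, other electronic components may be used to implement the above processor functions, which is not specifically limited in this embodiment.
In this embodiment, the communication bus 17 is used to implement connection and communication between the processor 15 and the memory 16; when the processor 15 executes the operating program stored in the memory 16, the following data processing method is implemented:
obtaining a first index value of an index register according to an instruction code, and determining the index register according to the first index value, where the instruction code is generated by a compiler and the index register is a register in the index register group 150; and acquiring first content stored in the index register, and determining a first vector register according to the first content, so as to execute the instruction code by accessing the first vector register.
In this embodiment, the instruction code is executed by accessing the first vector register; and the first content in the index register is updated according to the instruction code, so that a second vector register is accessed based on the updated first content when the instruction code is executed in the next round.
In this embodiment, an index register field is configured in the instruction code, and the processor 15 is further configured to decode the instruction code and, when the index register field is parsed, obtain the value corresponding to the index register field; and determine the value corresponding to the index register field as the first index value of the index register.
In this embodiment, the processor 15 is further configured to update the first content in the index register according to the instruction code when execution of the instruction code is completed, and/or while the instruction code is being executed.
In this embodiment, the processor 15 is further configured to obtain an offset value and an offset type from the instruction code; adjust the first content according to the offset value and the offset type to obtain updated first content; and write the updated first content into the index register to replace the first content with the updated first content.
In this embodiment, the index register is a single index register, and the processor 15 is further configured to obtain a second index value corresponding to an initial source vector register and a third index value corresponding to an initial target vector register; determine a source vector register according to the second index value and the first content; determine a target vector register according to the third index value and the first content; and determine the source vector register and the target vector register to be the first vector register.
In this embodiment, the index register includes a first index register and a second index register, and the processor 15 is further configured to obtain a first address offset stored in the first index register; obtain a second address offset stored in the second index register; and determine the first address offset and the second address offset as the first content.
In this embodiment, the processor 15 is further configured to decode the first address offset to obtain a first address, and determine the source vector register corresponding to the first address; decode the second address offset to obtain a second address, and determine the target vector register corresponding to the second address; and determine the source vector register and the target vector register as the first vector register.
In this embodiment, the processor 15 is further configured to obtain an arithmetic logic operation from the instruction code; obtain source data from the source vector register, and perform the arithmetic logic operation on the source data to obtain target data; and store the target data in the target vector register.
An embodiment of this application provides a storage medium on which a computer program is stored. The computer-readable storage medium stores one or more programs, which can be executed by one or more processors and are applied in a data processing device, and the computer program implements the data processing method as described.
It should be noted that, in this document, the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present disclosure, in essence or in the part that contributes to the related art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions that cause an image display device (which may be a mobile phone, a computer, a server, an air conditioner, a network device or the like) to execute the methods described in the embodiments of the present disclosure.
The above are only preferred embodiments of this application and are not intended to limit its scope of protection.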

Claims (20)

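  1. A data processing method, applied to a data processing device, wherein a processor of the data processing device comprises an index register group, and the method comprises:
    obtaining a first index value of an index register according to an instruction code, and determining the index register according to the first index value, wherein the instruction code is generated by a compiler and the index register is a register in the index register group; and
    acquiring first content stored in the index register, and determining a first vector register according to the first content, so as to execute the instruction code by accessing the first vector register.
  2. The method according to claim 1, wherein after the determining a first vector register according to the first content, the method further comprises:
    executing the instruction code by accessing the first vector register; and
    updating the first content in the index register according to the instruction code, so as to access a second vector register based on the updated first content when the instruction code is executed in a next round.
  3. The method according to claim 1, wherein an index register field is configured in the instruction code, and the obtaining a first index value of an index register according to an instruction code comprises:
    decoding the instruction code to obtain a value corresponding to the index register field; and
    determining the value corresponding to the index register field as the first index value of the index register.
  4. The method according to claim 2, wherein the updating the first content in the index register according to the instruction code comprises:
    updating the first content in the index register according to the instruction code when execution of the instruction code is completed, and/or while the instruction code is being executed.
  5. The method according to claim 2 or 4, wherein the updating the first content in the index register according to the instruction code comprises:
    obtaining an offset value and an offset type from the instruction code;
    adjusting the first content according to the offset value and the offset type to obtain updated first content; and
    writing the updated first content into the index register to replace the first content with the updated first content.
  6. The method according to claim 1, wherein the index register is a single index register, and the determining a first vector register according to the first content comprises:
    obtaining a second index value corresponding to an initial source vector register and a third index value corresponding to an initial target vector register;
    determining a source vector register according to the second index value and the first content;
    determining a target vector register according to the third index value and the first content; and
    determining the source vector register and the target vector register to be the first vector register.
  7. The method according to claim 1, wherein the index register comprises a first index register and a second index register, and the acquiring first content stored in the index register comprises:
    obtaining a first address offset stored in the first index register;
    obtaining a second address offset stored in the second index register; and
    determining the first address offset and the second address offset as the first content.
  8. The method according to claim 7, wherein the determining a first vector register according to the first content comprises:
    decoding the first address offset to obtain a first address, and determining a source vector register corresponding to the first address;
    decoding the second address offset to obtain a second address, and determining a target vector register corresponding to the second address; and
    determining the source vector register and the target vector register as the first vector register.
  9. The method according to claim 6 or 8, wherein the executing the instruction code by accessing the first vector register comprises:
    obtaining an arithmetic logic operation from the instruction code;
    obtaining source data from the source vector register, and performing the arithmetic logic operation on the source data to obtain target data; and
    storing the target data in the target vector register.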
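  10. A data processing device, wherein a processor of the data processing device comprises an index register group, and the data processing device comprises:
    a decoding part, configured to obtain a first index value of an index register according to an instruction code, wherein the instruction code is generated by a compiler and the index register is a register in the index register group;
    a determining part, configured to determine the index register according to the first index value, and determine a first vector register according to first content, so as to execute the instruction code by accessing the first vector register; and
    an acquiring part, configured to acquire the first content stored in the index register.
  11. The device according to claim 10, wherein the device further comprises an instruction execution part and an updating part;
    the instruction execution part is configured to execute the instruction code by accessing the first vector register;
    the updating part is configured to update the first content in the index register according to the instruction code, so as to access a second vector register based on the updated first content when the instruction code is executed in a next round.
  12. The device according to claim 10, wherein an index register field is configured in the instruction code;
    the decoding part is further configured to decode the instruction code;
    the acquiring part is further configured to obtain a value corresponding to the index register field;
    the determining part is further configured to determine the value corresponding to the index register field as the first index value of the index register.
  13. The device according to claim 11, wherein
    the updating part is further configured to update the first content in the index register according to the instruction code when execution of the instruction code is completed, and/or while the instruction code is being executed.
  14. The device according to claim 11 or 13, wherein the device further comprises an adjusting part and a writing part;
    the acquiring part is further configured to obtain an offset value and an offset type from the instruction code;
    the adjusting part is configured to adjust the first content according to the offset value and the offset type to obtain updated first content;
    the writing part is configured to write the updated first content into the index register to replace the first content with the updated first content.
  15. The device according to claim 10, wherein the index register is a single index register,
    the acquiring part is further configured to obtain a second index value corresponding to an initial source vector register and a third index value corresponding to an initial target vector register;
    the determining part is further configured to determine a source vector register according to the second index value and the first content; determine a target vector register according to the third index value and the first content; and determine the source vector register and the target vector register to be the first vector register.
  16. The device according to claim 10, wherein the index register comprises a first index register and a second index register,
    the acquiring part is further configured to obtain a first address offset stored in the first index register, and obtain a second address offset stored in the second index register;
    the determining part is further configured to determine the first address offset and the second address offset as the first content.
  17. The device according to claim 16, wherein
    the decoding part is further configured to decode the first address offset to obtain a first address, and decode the second address offset to obtain a second address;
    the determining part is further configured to determine a source vector register corresponding to the first address; determine a target vector register corresponding to the second address; and determine the source vector register and the target vector register as the first vector register.
  18. The device according to claim 15 or 17, wherein the device further comprises a storage part;
    the acquiring part is further configured to obtain an arithmetic logic operation from the instruction code, and obtain source data from the source vector register;
    the instruction execution part is further configured to perform the arithmetic logic operation on the source data to obtain target data;
    the storage part is configured to store the target data in the target vector register.
  19. A data processing device, comprising a processor, a memory and a communication bus, wherein the processor comprises an index register group, and the processor implements the method according to any one of claims 1 to 9 when executing an operating program stored in the memory.
  20. A storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 9.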
PCT/CN2021/090352 2020-06-09 2021-04-27 Data processing method and device, and storage medium WO2021249054A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21820959.1A EP4152146A4 (en) 2020-06-09 2021-04-27 DATA PROCESSING METHOD AND DEVICE AND STORAGE MEDIUM
US17/989,141 US20230084523A1 (en) 2020-06-09 2022-11-17 Data Processing Method and Device, and Storage Medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010520154.9A CN111782270B (zh) 2020-06-09 2020-06-09 Data processing method and device, and storage medium
CN202010520154.9 2020-06-09

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/989,141 Continuation US20230084523A1 (en) 2020-06-09 2022-11-17 Data Processing Method and Device, and Storage Medium

Publications (1)

Publication Number Publication Date
WO2021249054A1 true WO2021249054A1 (zh) 2021-12-16

Family

ID=72755787

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090352 WO2021249054A1 (zh) 2020-06-09 2021-04-27 Data processing method and device, and storage medium

Country Status (4)

Country Link
US (1) US20230084523A1 (zh)
EP (1) EP4152146A4 (zh)
CN (1) CN111782270B (zh)
WO (1) WO2021249054A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
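CN111782270B (zh) * 2020-06-09 2023-12-19 Oppo广东移动通信有限公司 Data processing method and device, and storage medium
CN112307431B (zh) * 2020-11-09 2023-10-27 哲库科技(上海)有限公司 VDSP, data processing method and communication device
CN114461274A (zh) * 2022-01-30 2022-05-10 上海阵量智能科技有限公司 Instruction processing apparatus and method, chip, computer device and storage medium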

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040250044A1 (en) * 2003-03-28 2004-12-09 Seiko Epson Corporation Method for referring to address of vector data and vector processor
CN104040489A (zh) * 2011-12-23 2014-09-10 英特尔公司 Multi-register gather instruction
CN107003846A (zh) * 2014-12-23 2017-08-01 英特尔公司 Method and apparatus for vector index load and store
CN108292227A (zh) * 2015-12-30 2018-07-17 英特尔公司 Systems, devices and methods for strided load
CN111782270A (zh) * 2020-06-09 2020-10-16 Oppo广东移动通信有限公司 Data processing method and device, and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8452945B2 (en) * 2002-09-17 2013-05-28 Hewlett-Packard Development Company, L.P. Indirect indexing instructions
JP2006004226A (ja) * 2004-06-18 2006-01-05 Asahi Kasei Corp データ演算装置
EP2584460A1 (en) * 2011-10-20 2013-04-24 ST-Ericsson SA Vector processing system comprising a replicating subsystem and method
GB2543303B (en) * 2015-10-14 2017-12-27 Advanced Risc Mach Ltd Vector data transfer instruction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4152146A4 *

Also Published As

Publication number Publication date
EP4152146A1 (en) 2023-03-22
CN111782270B (zh) 2023-12-19
US20230084523A1 (en) 2023-03-16
CN111782270A (zh) 2020-10-16
EP4152146A4 (en) 2023-11-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21820959

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021820959

Country of ref document: EP

Effective date: 20221215

NENP Non-entry into the national phase

Ref country code: DE