CN114461277A

CN114461277A - Design and application method of DSP instruction set special for electric power

Info

Publication number: CN114461277A
Application number: CN202111634287.XA
Authority: CN
Inventors: 周柯; 习伟; 金庆忍; 姚浩; 王晓明; 莫枝阅; 李肖博; 蔡田田; 吴丽芳; 于杨; 王泽宇
Original assignee: Electric Power Research Institute of Guangxi Power Grid Co Ltd; Southern Power Grid Digital Grid Research Institute Co Ltd
Current assignee: Electric Power Research Institute of Guangxi Power Grid Co Ltd; Southern Power Grid Digital Grid Research Institute Co Ltd
Priority date: 2021-12-29
Filing date: 2021-12-29
Publication date: 2022-05-10

Abstract

The invention belongs to the technical field of DSP design, and particularly relates to a design and application method of a DSP instruction set special for electric power. Using Very Long Instruction Word (VLIW) technology, the execution order of the instructions is determined by the compiler, which avoids a complex hardware instruction scheduler. Two instruction formats are supported: a four-slot VLIW instruction and a two-slot immediate instruction. Supporting multiple instruction formats allows more compact instruction encoding and avoids wasting encoding space when there are too few operations to fill every slot. Because each instruction format restricts which operation types each slot supports, rather than allowing every operation in every slot, design complexity is reduced while registers and arithmetic resources can still be fully utilized.

Description

Design and application method of DSP instruction set special for electric power

Technical Field

The invention belongs to the technical field of DSP design, and particularly relates to a design and application method of a DSP instruction set special for electric power.

Background

At present, DSPs are widely used in wireless communication, speech recognition, multimedia, the internet and defense because of their powerful real-time data processing capability. General-purpose DSP executables are not specialized for power applications, and the traditional CPU uses a non-pipelined scalar architecture without vector operations; executing one instruction at a time in sequence results in low resource utilization and poor performance.

Disclosure of Invention

To solve the above problems, the present invention provides a design and application method for a power-specific DSP instruction set. The specific technical solutions are as follows:

A design method for a power-specific DSP instruction set comprises the following steps:

s1: designing the general registers of the DSP; the general registers comprise sixteen 32-bit data registers, eight address registers, eight index registers, one PC register, one SP register, one LR register, eight 1-bit condition registers and two 256-bit vector registers;

s2: designing the scalar instructions of the DSP, the scalar instructions being designed as VLIW instructions using very long instruction word technology; two instruction formats are supported: a four-slot VLIW instruction and a two-slot immediate instruction;

s3: designing the vector instructions, the vector instructions being designed using single instruction multiple data (SIMD) technology.

Preferably, each 256-bit vector register is organized as 8 × 32-bit lanes.

Preferably, the eight 1-bit condition registers and the two 256-bit vector registers are dedicated to SIMD operations.

Preferably, the scalar instructions include control instructions, scalar arithmetic instructions, scalar read-write instructions and scalar logic instructions.

Preferably, the method further comprises designing a vector-mode load operation that loads eight 32-bit data words into a vector register at once, and a vector-mode multiply-add operation.

Preferably, the multiply-add operation multiplies two 8 × 32-bit vectors to obtain 8 × 64-bit products, and then outputs the sum of these 8 × 64-bit products and the 8 × 64-bit input data.

Preferably, the vector instructions mainly comprise vector arithmetic instructions and vector read-write instructions.

An application method for a power-specific DSP instruction set comprises the following steps:

s1: the main CPU prepares the calculation parameters and coefficient tables, configures the DSP mode, and initializes the instruction memory and data memory of the DSP;

s2: according to the requirements of the executable code segment, the main CPU selects and calls, in sequence, the vector arithmetic instructions, vector read-write instructions, scalar read-write instructions and scalar logic instructions of the above design method;

s3: the selected or called vector arithmetic, vector read-write, scalar read-write and scalar logic instructions are written into the instruction memory, and the variable data segment and the user-defined data segment are written into the data memory;

s4: the main CPU configures the DSP registers to start the DSP and waits for the interrupt signal indicating that the calculation has finished; a Ping-Pong operation mode is supported, i.e. the data for the next calculation are written while the current calculation is in progress;

s5: after receiving the interrupt signal indicating that the DSP has finished executing, the main CPU reads out the calculation result and, depending on whether the task is complete, decides whether to reconfigure and restart the DSP.
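As an illustration only, the host-side sequence s1 to s5 can be modeled in C with the DSP replaced by a software stand-in. Everything named here (`dsp_model`, `load_frame`, `dsp_start`, `dsp_wait`, `host_process`, `FRAME_LEN` and the sum-of-squares kernel) is hypothetical; a real system would write memory-mapped DSP registers and block on a hardware interrupt. The sketch focuses on the Ping-Pong mode of s4: the next frame is written into the idle buffer before the host waits for the current calculation to finish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_LEN 8  /* hypothetical frame size */

/* Hypothetical software stand-in for the DSP and its data memory. */
typedef struct {
    int32_t in[2][FRAME_LEN]; /* Ping-Pong input buffers */
    int64_t result[2];        /* one result slot per buffer */
    int     active;           /* buffer the DSP is working on */
    bool    done_irq;         /* "calculation finished" interrupt flag */
} dsp_model;

/* s1/s3: copy one frame of data to be calculated into a buffer. */
static void load_frame(dsp_model *d, int buf, const int32_t *frame) {
    for (size_t i = 0; i < FRAME_LEN; ++i)
        d->in[buf][i] = frame[i];
}

/* s4: configure the DSP to use `buf` and start it. */
static void dsp_start(dsp_model *d, int buf) {
    d->active = buf;
    d->done_irq = false;
}

/* Stand-in for the DSP program (sum of squares) followed by the
 * completion interrupt; a real host would block on the IRQ here. */
static void dsp_wait(dsp_model *d) {
    int64_t acc = 0;
    for (size_t i = 0; i < FRAME_LEN; ++i)
        acc += (int64_t)d->in[d->active][i] * d->in[d->active][i];
    d->result[d->active] = acc;
    d->done_irq = true;
}

/* Host flow over n_frames frames with Ping-Pong buffering. */
static void host_process(dsp_model *d, const int32_t frames[][FRAME_LEN],
                         size_t n_frames, int64_t *out) {
    if (n_frames == 0) return;
    load_frame(d, 0, frames[0]);                    /* s1: first frame */
    for (size_t f = 0; f < n_frames; ++f) {
        int cur = (int)(f % 2);
        dsp_start(d, cur);                          /* s4: start DSP */
        if (f + 1 < n_frames)
            load_frame(d, 1 - cur, frames[f + 1]);  /* fill idle buffer */
        dsp_wait(d);
        assert(d->done_irq);                        /* completion IRQ seen */
        out[f] = d->result[cur];                    /* s5: read result */
    }
}
```

Writing frame f + 1 only into the buffer the DSP is not using is what lets data transfer overlap with computation without corrupting the frame in progress.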

The beneficial effects of the invention are as follows. The invention provides a design and application method for a power-specific DSP instruction set. The design method comprises designing the general registers, designing the scalar instructions as VLIW instructions using very long instruction word technology, and designing the vector instructions using single instruction multiple data (SIMD) technology. With VLIW technology, the execution order of the instructions is determined by the compiler, which avoids a complex hardware instruction scheduler. Two instruction formats are supported: a four-slot VLIW instruction and a two-slot immediate instruction. Supporting multiple instruction formats allows more compact instruction encoding and avoids wasting encoding space when there are too few operations to fill every slot. Because each instruction format restricts which operation types each slot supports, rather than allowing every operation in every slot, design complexity is reduced while registers and arithmetic resources can still be fully utilized.

The application method customizes the vector operation instructions to the computational requirements of power equipment, which speeds up the processing of these specific computations and thereby significantly improves the responsiveness of the equipment.

Drawings

In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.

FIG. 1 is a flow chart of the instruction set design of the present invention;

FIG. 2 is a schematic diagram of vector instruction design;

FIG. 3 is a flow chart of an application of the instruction set of the present invention;

FIG. 4 is a diagram of a designed power-specific custom DSP core architecture;

FIG. 5 is a diagram of the software and hardware environment for the design and application of a DSP dedicated to power.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

As shown in fig. 1, the embodiment of the present invention provides a method for designing an instruction set of a power-specific DSP, comprising the following steps:

S1: designing the general registers of the DSP; the general registers comprise sixteen 32-bit data registers, eight address registers, eight index registers, one PC register, one SP register, one LR register, eight 1-bit condition registers and two 256-bit vector registers. Each 256-bit vector register is organized as 8 × 32-bit lanes, and the eight 1-bit condition registers and the two 256-bit vector registers are dedicated to single instruction multiple data (SIMD) operations.
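For readers building a simulator or test bench, the register set of the first design step can be mirrored in a C structure. This is a sketch under stated assumptions: the document does not give the widths of the address, index, PC, SP and LR registers, so 32 bits is assumed, each 1-bit condition register is modeled as a `bool`, and the names `dsp_regfile`, `vreg256`, `vlane` and `vector_bits` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_DATA_REGS  16
#define NUM_ADDR_REGS   8
#define NUM_INDEX_REGS  8
#define NUM_COND_REGS   8
#define NUM_VEC_REGS    2
#define VEC_LANES       8   /* 256 bits = 8 x 32-bit lanes */

/* One 256-bit vector register viewed as 8 x 32-bit lanes. */
typedef struct { int32_t lane[VEC_LANES]; } vreg256;

_Static_assert(sizeof(vreg256) == 32, "a vector register holds 256 bits");

typedef struct {
    int32_t  d[NUM_DATA_REGS];   /* 32-bit data registers */
    uint32_t a[NUM_ADDR_REGS];   /* address registers (width assumed) */
    uint32_t x[NUM_INDEX_REGS];  /* index registers (width assumed) */
    uint32_t pc, sp, lr;         /* program counter, stack pointer, link register */
    bool     c[NUM_COND_REGS];   /* 1-bit condition registers, SIMD use */
    vreg256  v[NUM_VEC_REGS];    /* 256-bit vector registers, SIMD use */
} dsp_regfile;

/* Bounds-checked access to lane `i` of vector register `v`. */
static int32_t *vlane(dsp_regfile *r, int v, int i) {
    assert(v >= 0 && v < NUM_VEC_REGS);
    assert(i >= 0 && i < VEC_LANES);
    return &r->v[v].lane[i];
}

/* Total architectural vector storage in bits. */
static unsigned vector_bits(void) {
    return NUM_VEC_REGS * VEC_LANES * 32u;
}
```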

S2: designing the scalar instructions of the DSP, the scalar instructions being designed as VLIW instructions using very long instruction word technology. The scalar instructions comprise control instructions, scalar arithmetic instructions, scalar read-write instructions and scalar logic instructions; instruction examples and operands are shown in Table 2 below. The power-specific DSP adopts Very Long Instruction Word (VLIW) technology and integrates a 4-way VLIW decoder, with 2 ways used for data reads and writes and 2 ways used for logic operations, improving processor performance by exploiting instruction-level parallelism. The VLIW instructions mainly include arithmetic instructions, comparison instructions, pool instructions, three kinds of data move instructions, zero-overhead loop instructions, control instructions, long instructions carrying a 32-bit immediate, and the like.

The compiler determines the execution order of the instructions, which avoids designing a complex hardware instruction scheduler; the power-specific DSP therefore only needs to execute operations in parallel in the order produced by the compiler. VLIW is a Multiple Instruction Multiple Data (MIMD) structure: one VLIW instruction can encode multiple operations, and each operation occupies at least one arithmetic unit. The designed power-specific DSP mainly supports two instruction formats, four-slot VLIW instructions and two-slot immediate instructions, as shown in Table 1. The four-slot VLIW instruction has the highest parallelism and can carry 4 operations simultaneously, of which the two read-write operations M0/M1 can each be a memory read, a memory write, or an inter-register move.

To improve processor performance, pipelining divides each instruction into multiple steps so that several sequential instructions execute overlapped in parallel. A VLIW processor can provide strong computing capability while keeping hardware complexity low: its structure is simplified and many complex control circuits are removed from the processor.

Supporting multiple instruction formats enables more compact instruction encoding and avoids wasting encoding space when there are not enough operations to fill every slot. In addition, each instruction format restricts the operation types supported by each slot instead of allowing every operation; for example, in the four-slot VLIW instruction the first two slots support only read-write operations. This reduces design complexity while still allowing registers and arithmetic resources to be fully utilized.

Table 1 Instruction formats of the designed power-specific DSP
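The compiler-side static scheduling described for the four-slot format can be illustrated with a deliberately simplified packer that groups an in-order operation list into bundles. All of the following are assumptions for the sketch: slots 0 and 1 (M0/M1) accept only read-write operations and slots 2 and 3 accept only arithmetic/logic operations, following the 2 + 2 decoder split; operations are never reordered; and an operation starts a new bundle if it reads or overwrites a register written earlier in the same bundle. The names `op`, `op_class` and `pack_bundles` are hypothetical, and a production compiler would also reorder operations and use the two-slot immediate format.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OP_MEM, OP_ALU } op_class;  /* MEM: read/write/move; ALU: arithmetic/logic */

typedef struct {
    op_class cls;
    int dst;     /* destination register, -1 if none (e.g. a store) */
    int src[2];  /* source registers, -1 if unused */
} op;

#define MEM_SLOTS 2  /* slots M0/M1 */
#define ALU_SLOTS 2  /* slots 2 and 3 */

/* Greedy in-order packing into four-slot bundles. Writes the bundle index
 * of each op to bundle_of[] and returns the number of bundles. */
static size_t pack_bundles(const op *ops, size_t n, size_t *bundle_of) {
    size_t bundle = 0, start = 0;  /* start: first op of current bundle */
    int mem_used = 0, alu_used = 0;
    for (size_t i = 0; i < n; ++i) {
        int conflict = 0;
        for (size_t j = start; j < i; ++j) {
            int w = ops[j].dst;
            if (w < 0) continue;
            /* RAW or WAW hazard with an op already in this bundle */
            if (ops[i].src[0] == w || ops[i].src[1] == w || ops[i].dst == w)
                conflict = 1;
        }
        int full = (ops[i].cls == OP_MEM) ? mem_used == MEM_SLOTS
                                          : alu_used == ALU_SLOTS;
        if (i > start && (conflict || full)) {  /* close current bundle */
            ++bundle;
            start = i;
            mem_used = alu_used = 0;
        }
        if (ops[i].cls == OP_MEM) ++mem_used; else ++alu_used;
        assert(mem_used <= MEM_SLOTS && alu_used <= ALU_SLOTS);
        bundle_of[i] = bundle;
    }
    return n ? bundle + 1 : 0;
}
```

With loads of r1 and r2 followed by r3 = r1 + r2, the add cannot share a bundle with the loads that produce its operands, so it opens the second bundle.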

S3: designing the vector instructions using Single Instruction Multiple Data (SIMD) technology. Multiple processing elements (PEs) perform the same operation on multiple groups of data simultaneously, which fully exploits data-level parallelism while only one instruction is executed at any time, as shown in FIG. 2.

According to the vector operation characteristics of power parameter algorithms, a vector-mode load operation and a vector-mode multiply-add operation are designed; the load operation loads eight 32-bit data words into a vector register at once. The multiply-add operation multiplies two 8 × 32-bit vectors to obtain 8 × 64-bit products and then outputs the sum of these products and the 8 × 64-bit input data. This optimizes the vector operations that occur frequently in power application algorithms, such as the fast Fourier transform, mean square error and the full-cycle integral algorithm. The vector instructions mainly comprise vector arithmetic instructions and vector read-write instructions. Examples of instructions and operands are shown in Table 2 below.

Table 2 Instruction set of the designed power-specific DSP
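A scalar C reference model of the vector load and multiply-add operations can be useful for checking compiler output or simulator results. The names `v8x32`, `v8x64`, `vload8`, `vmac8` and `dot` are hypothetical and are not the DSP's actual mnemonics. Each lane widens its product to 64 bits before adding it to the 8 × 64-bit input, as the multiply-add description specifies; `dot` shows, as an assumed usage pattern, how repeated load and multiply-add steps form the multiply-accumulate loops that appear in computations such as mean square error and full-cycle integration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8

typedef struct { int32_t l[LANES]; } v8x32;  /* one 256-bit vector register */
typedef struct { int64_t l[LANES]; } v8x64;  /* 8 x 64-bit accumulator */

/* Vector load: 8 consecutive 32-bit words in one operation. */
static v8x32 vload8(const int32_t *p) {
    v8x32 v;
    for (int i = 0; i < LANES; ++i) v.l[i] = p[i];
    return v;
}

/* Vector multiply-add: acc[i] + a[i] * b[i] with a full 64-bit product. */
static v8x64 vmac8(v8x32 a, v8x32 b, v8x64 acc) {
    for (int i = 0; i < LANES; ++i)
        acc.l[i] += (int64_t)a.l[i] * b.l[i];
    return acc;
}

/* Dot product of two length-n arrays, n a multiple of 8. */
static int64_t dot(const int32_t *x, const int32_t *y, size_t n) {
    assert(n % LANES == 0);
    v8x64 acc = {{0}};
    for (size_t k = 0; k < n; k += LANES)
        acc = vmac8(vload8(x + k), vload8(y + k), acc);
    int64_t s = 0;
    for (int i = 0; i < LANES; ++i) s += acc.l[i];  /* horizontal sum */
    return s;
}
```

Widening before accumulation matters for power-signal samples: a product such as 100000 × 100000 = 10^10 would overflow a 32-bit lane but fits in the 64-bit result.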

Given the C/C++ code of a power application algorithm, a binary file that can run on the power-specific DSP is generated by the matching compiler, assembler and linker. The program space of the power-specific DSP is divided into an executable code segment, a variable data segment and a custom data segment: the executable code segment is written into the instruction memory, the variable data segment and the custom data segment are written into the data memory, and the custom data segment contains configuration parameters, coefficient tables (for example trigonometric functions and nonlinear operation data) and data to be calculated. An embodiment of the present invention further provides an application method for the power-specific DSP instruction set, as shown in FIG. 3, comprising the following steps:

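s1: the main CPU prepares the calculation parameters and coefficient tables, configures the DSP mode, and initializes the instruction memory and data memory of the DSP;

s2: according to the requirements of the executable code segment, the main CPU selects and calls, in sequence, the vector arithmetic instructions, vector read-write instructions, scalar read-write instructions and scalar logic instructions of the design method;

s3: the selected or called instructions are written into the instruction memory, and the variable data segment and the user-defined data segment are written into the data memory;

s4: the main CPU configures the DSP registers to start the DSP and waits for the interrupt signal indicating that the calculation has finished; a Ping-Pong operation mode is supported, i.e. the data for the next calculation are written while the current calculation is in progress;

s5: after receiving the interrupt signal indicating that the DSP has finished executing, the main CPU reads out the calculation result and, depending on whether the task is complete, decides whether to reconfigure and restart the DSP.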

FIG. 4 is a block diagram of the power-specific custom DSP core. The core has three bus interfaces, used respectively to access the instruction memory, the data memory and the control registers.

The power-specific DSP core uses a 3-stage pipeline: Instruction Fetch (IF), Instruction Decode (ID) and Execute (EX). In the fetch stage, a new instruction is read from the instruction memory. In the decode stage, the previously fetched instruction is decoded and the address registers used by load and store operations are updated; the instruction decoder sends control signals to the data read-write module and the operation module. The data read-write module aligns and packs data and transfers it to the register file of the operation module; the operation module contains scalar and vector operation units, each consisting of a register file and processing elements. In the execute stage, all data computations and write-back operations are performed.
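The overlap provided by the three-stage pipeline can be seen with a small occupancy model. Assuming no stalls, branches or hazards (a simplification), instruction k is in IF during cycle k, in ID during cycle k + 1 and in EX during cycle k + 2, so n instructions complete in n + 2 cycles instead of the 3n cycles needed if each instruction had to pass through all three stages before the next one started. The function names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

enum { S_IF, S_ID, S_EX, STAGES };  /* fetch, decode, execute */

/* Index of the instruction in `stage` during `cycle`, or -1 if empty. */
static int in_stage(int cycle, int stage, int n_instr) {
    int k = cycle - stage;
    return (k >= 0 && k < n_instr) ? k : -1;
}

/* Cycles to drain n instructions through an ideal 3-stage pipeline. */
static int total_cycles(int n_instr) {
    assert(n_instr >= 0);
    return n_instr == 0 ? 0 : n_instr + STAGES - 1;
}

/* Print which instruction occupies each stage, cycle by cycle. */
static void print_timeline(int n_instr) {
    static const char *name[STAGES] = {"IF", "ID", "EX"};
    for (int c = 0; c < total_cycles(n_instr); ++c) {
        printf("cycle %d:", c);
        for (int s = 0; s < STAGES; ++s) {
            int k = in_stage(c, s, n_instr);
            if (k >= 0) printf(" %s=i%d", name[s], k);
        }
        printf("\n");
    }
}
```

For two instructions, `print_timeline(2)` prints `cycle 0: IF=i0`, `cycle 1: IF=i1 ID=i0`, `cycle 2: ID=i1 EX=i0` and `cycle 3: EX=i1`, one per line.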

The power-specific DSP supports C-language programming and uses a custom SIMD instruction set to accelerate power-specific algorithms. Note that although the DSP supports C programming, the required tools such as the compiler are custom tools: a binary produced by a general-purpose compiler, which does not use the custom instruction set, cannot run on the DSP. A common executable program divides program data into executable code segments, initialized data segments, uninitialized data segments and custom data segments. The program space of the power-specific DSP differs from this common layout and is divided as follows:

(i) an executable code segment, written into the instruction memory;

(ii) a variable data segment, written into the data memory;

(iii) a user-defined data segment, written into the data memory and storing configuration parameters, coefficient tables and data to be calculated. The coefficient area stores commonly used coefficient tables, such as trigonometric functions and nonlinear operation data. The data area comprises a variable area, a configuration data area and a calculation data area, which are generally used to store initialization data, data to be calculated and calculation results.
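The split between the variable data segment and the user-defined data segment can be made concrete with a hypothetical host-side layout. The structure below groups a calculation data area, a coefficient area holding Q15 sine and cosine tables, and a configuration area; `place_segments` then puts the variable data segment at the start of data memory and the user-defined segment after it. The 4096-byte data-memory size, the Q15 format, the field order and the 32-byte segment alignment (chosen so that input samples placed first start on a 256-bit boundary for vector loads) are all assumptions, not values from the document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_SAMPLES  64     /* hypothetical frame length */
#define DMEM_BYTES 4096u  /* assumed data-memory size */
#define VEC_ALIGN  32u    /* one 256-bit vector = 32 bytes */

/* Hypothetical layout of the user-defined data segment. Input samples come
 * first so they start on the aligned segment base. */
typedef struct {
    int32_t  samples[N_SAMPLES];     /* calculation data area: input */
    int64_t  result;                 /* calculation data area: output */
    int16_t  cos_q15[N_SAMPLES];     /* coefficient area: trig tables */
    int16_t  sin_q15[N_SAMPLES];
    uint32_t n_samples, mode;        /* configuration data area */
} custom_data_segment;

typedef struct { size_t var_off, custom_off, end; } dmem_map;

/* Round x up to a multiple of a (a > 0). */
static size_t align_up(size_t x, size_t a) {
    return (x + a - 1) / a * a;
}

/* Variable data segment at offset 0, user-defined segment after it,
 * aligned for vector loads; both must fit in data memory. */
static dmem_map place_segments(size_t var_bytes, size_t custom_bytes) {
    dmem_map m;
    m.var_off = 0;
    m.custom_off = align_up(var_bytes, VEC_ALIGN);
    m.end = m.custom_off + custom_bytes;
    assert(m.end <= DMEM_BYTES && "segments must fit in data memory");
    return m;
}
```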

FIG. 5 shows the software and hardware environment for the design and application of the power-specific DSP CPU core. The environment can be divided into a user layer, a software layer and a hardware layer. The user layer supplies the C/C++ code of the target power application algorithm. In the software layer, the input algorithm code is turned into ELF-format binary machine code by the C/C++ optimizing compiler, assembler and linker; a debugger connects to the JTAG debug interface on the chip or FPGA and loads the ELF binary into it, so that the power-specific DSP CPU core can be debugged and developed. The hardware layer mainly comprises the power-specific DSP chip designed by the invention, which exposes a JTAG hardware debug interface.

The design of the power-specific DSP CPU core comprises two steps: architecture design and chip design. Architecture design uses a 'Compiler-In-The-Loop' method for algorithm-driven architecture exploration; the input files are the C/C++ code of the target power application algorithm and the initial architecture description file of the power-specific DSP CPU core.

The initial architecture description file contains information such as the instruction set designed by the above design method, the instruction pipeline, the very long instruction word processor, the vector processor, the programmable data path, the microprocessor and the I/O interfaces. As in the application scenario of the DSP CPU core, the input files are compiled into ELF-format binary machine code by the C/C++ optimizing compiler, assembler and linker; the ELF file is then imported through the debugger into an instruction set simulator for simulation, and the bottleneck operations of the target power algorithm under the current architecture are identified by performance analysis with a profiler.

Based on the profiler's analysis report, the designer optimizes the design for the bottleneck operations, for example by adding instructions or arithmetic units, to obtain an adjusted power-specific DSP CPU core architecture. This process is repeated to iteratively search for the most suitable architecture parameters under the constraints of the target power algorithm, completing the design of a power-specific DSP CPU core architecture oriented to power application algorithms.
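The profiling step that drives each iteration can be sketched as aggregating an instruction-set-simulator trace by opcode class and reporting the class that consumes the most cycles. The opcode classes, the trace format (one opcode and one cycle count per executed instruction) and the cycles-based definition of a bottleneck are hypothetical simplifications; the document's profiler also records memory accesses, functional-unit usage and pipeline hazards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode classes recorded in a simulator trace. */
typedef enum { OPC_LOAD, OPC_STORE, OPC_ALU, OPC_VMAC, OPC_BRANCH, OPC_COUNT } opcode;

typedef struct {
    unsigned long count[OPC_COUNT];   /* executions per class */
    unsigned long cycles[OPC_COUNT];  /* cycles spent per class */
} profile;

/* Accumulate n trace records of (opcode, cycles spent). */
static void profile_trace(profile *p, const opcode *ops,
                          const unsigned *cyc, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        assert(ops[i] < OPC_COUNT);
        p->count[ops[i]] += 1;
        p->cycles[ops[i]] += cyc[i];
    }
}

/* Bottleneck taken here as the opcode class with the most cycles. */
static opcode hottest(const profile *p) {
    opcode best = OPC_LOAD;
    for (int k = 1; k < OPC_COUNT; ++k)
        if (p->cycles[k] > p->cycles[best]) best = (opcode)k;
    return best;
}
```

A designer reading such a report might, for instance, add a second multiply-add unit if the vector multiply-add class dominates, then rerun the simulation on the adjusted architecture.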

The instruction set simulator uses just-in-time compilation to achieve fast, cycle-accurate simulation of the architecture, supports source-level debugging, provides a test generator, and uses the profiler to record and analyze key information such as instructions, memory accesses, functional units and pipeline hazards. This forms a closed loop of architecture version iterations, enabling rapid exploration of the DSP CPU core architecture and completing the design of the optimal DSP CPU core architecture for power applications. Once the optimal power-specific DSP CPU core architecture in the search space is obtained, the corresponding synthesizable RTL code is generated by the RTL generator, and the chip design stage begins:

first, the synthesizable RTL of the power-specific DSP CPU core is integrated into the SoC of the power-specific chip;

second, front-end simulation and functional verification are performed with the VCS tool, where part of the test stimuli for the power-specific DSP CPU core can be provided by the test generator;

third, an FPGA prototype platform is built, and FPGA testing is performed following the application flow of the power-specific DSP CPU core described above;

finally, DC synthesis is performed on the SoC RTL of the power-specific chip with clock constraints set according to the chip's target parameters, the final chip layout design is completed and handed over to a third party for tape-out, and chip-level testing of the power-specific DSP CPU core is performed once the engineering samples are available.

Those of ordinary skill in the art will appreciate that the elements of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the components of the examples have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided in the present application, it should be understood that the division of the unit is only one division of logical functions, and other division manners may be used in actual implementation, for example, multiple units may be combined into one unit, one unit may be split into multiple units, or some features may be omitted.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims

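1. A design method of a DSP instruction set special for electric power, characterized by comprising the following steps:

s1: designing the general registers of the DSP; the general registers comprise sixteen 32-bit data registers, eight address registers, eight index registers, one PC register, one SP register, one LR register, eight 1-bit condition registers and two 256-bit vector registers;

s2: designing the scalar instructions of the DSP, the scalar instructions being designed as VLIW instructions using very long instruction word technology; two instruction formats are supported: a four-slot VLIW instruction and a two-slot immediate instruction;

s3: designing the vector instructions, the vector instructions being designed using single instruction multiple data (SIMD) technology.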

2. The method according to claim 1, wherein the 256-bit vector register is an 8 × 32-bit vector register.

3. The method according to claim 2, wherein the eight 1-bit condition registers and the two 256-bit vector registers are dedicated to single instruction multiple data (SIMD) operations.

4. The method according to claim 1, wherein the scalar instructions include control instructions, scalar arithmetic instructions, scalar read-write instructions and scalar logic instructions.

5. The method according to claim 3, further comprising a vector-mode load operation that loads eight 32-bit data words into the vector register at once, and a vector-mode multiply-add operation.

6. The method according to claim 5, wherein the multiply-add operation multiplies two 8 × 32-bit vectors to obtain 8 × 64-bit products and then outputs the sum of the 8 × 64-bit products and the input data.

7. The method according to claim 1, wherein the vector instructions mainly comprise vector arithmetic instructions and vector read-write instructions.

8. An application method of a DSP instruction set special for electric power, characterized by comprising the following steps: