CN105975048A - DSP chip and construction method thereof - Google Patents

DSP chip and construction method thereof

Info

Publication number
CN105975048A
CN105975048A CN201610290943.1A
Authority
CN
China
Prior art keywords
task
data
channel
dsp chip
address bus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610290943.1A
Other languages
Chinese (zh)
Inventor
高靳旭
谷晟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201610290943.1A priority Critical patent/CN105975048A/en
Publication of CN105975048A publication Critical patent/CN105975048A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234Power saving characterised by the action undertaken

Abstract

The invention discloses a DSP chip and a construction method thereof. The DSP chip comprises a plurality of task channels for completing algorithm tasks assigned by a host CPU. Each task channel includes a DMA controller, an arithmetic unit, and a memory, allowing it to complete its algorithm task independently; each task channel is connected by a data bus to a plurality of interface modules corresponding to the algorithm task; and each task channel is connected to a preset memory management unit, which in turn is connected to a data memory by the data bus. By realizing a parallel multi-task system in hardware, the DSP chip of the invention eliminates task switching, runs at a lower clock frequency, and reduces energy consumption.

Description

A DSP chip and a construction method thereof
Technical field
The invention belongs to the field of chips, and more specifically relates to a DSP chip and a construction method thereof.
Background art
To perform digital signal processing operations quickly, high-performance digital signal processing (DSP) chips generally adopt specialized hardware and software structures. Taking the classic TMS320 as an example, the basic structure of current mainstream DSP chips is introduced below:
1, Harvard structure
The Harvard architecture is a parallel architecture that differs from the traditional von Neumann architecture. Its main characteristic is that programs and data are stored in separate memory spaces: the program memory and the data memory are two independent memories, each addressed and accessed independently.
2, instruction execution pipeline
Related to the Harvard architecture, DSP chips make wide use of pipelining to reduce the time per instruction and thereby increase processor throughput. The pipeline depth of TMS320-series processors ranges from 2 to 6 stages; that is, the processor can process 2 to 6 instructions in parallel, each instruction occupying a different pipeline stage. In a three-stage pipeline, the fetch, decode, and execute operations are handled independently, so instruction execution can overlap completely. Within each instruction cycle, three different instructions are active, each at a different stage. For example, while the N-th instruction is being fetched, the (N-1)-th instruction is being decoded and the (N-2)-th instruction is being executed.
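The speed-up from overlapping fetch, decode, and execute can be sketched with a small cycle-count model. This is an illustration only, not part of the patent; the cycle costs are idealized (no stalls or hazards):

```python
# Illustrative sketch: cycle counts for a 3-stage fetch/decode/execute
# pipeline vs. purely sequential execution.

def sequential_cycles(n_instructions, stages=3):
    """Each instruction runs all stages before the next one starts."""
    return n_instructions * stages

def pipelined_cycles(n_instructions, stages=3):
    """Stages overlap: after the pipeline fills in (stages - 1) cycles,
    one instruction completes per cycle."""
    return n_instructions + (stages - 1)

if __name__ == "__main__":
    n = 100
    print(sequential_cycles(n))  # 300
    print(pipelined_cycles(n))   # 102
```

For long instruction streams the pipelined count approaches one instruction per cycle, which is the effect the text describes.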
3, specialized hardware
In a typical FIR filter, multiplication is the dominant DSP operation: for each filter tap, one multiplication and one addition must be performed. The faster the multiplication, the higher the performance of the DSP processor. In a general-purpose microprocessor, a multiply instruction is realized as a series of additions and therefore takes many instruction cycles to complete. By contrast, a distinguishing feature of DSP chips is a dedicated hardware multiplier. In the TMS320 series, thanks to the dedicated hardware multiplier, a multiplication completes within a single instruction cycle.
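The per-tap multiply-accumulate that the hardware multiplier accelerates looks, written out in software form, like this (an illustrative sketch; the function name `fir` is chosen here, not taken from the patent):

```python
# Illustrative sketch: direct-form FIR filter, y[n] = sum_k h[k] * x[n-k].
# Each tap costs exactly one multiply and one add -- the operation pair
# a DSP's dedicated multiplier performs in a single cycle.

def fir(x, h):
    """Filter input samples x with tap coefficients h."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]  # one multiply-accumulate (MAC)
        y.append(acc)
    return y

# Feeding an impulse recovers the tap coefficients:
print(fir([1, 0, 0, 0], [3, 2, 1]))  # [3, 2, 1, 0]
```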
4, special instruction
Another feature of DSP chips is the use of special instructions. For example, DMOV is a dedicated DSP instruction that performs a data-move (shift) operation. Delay operations are extremely important in digital signal processing, and this delay is implemented by DMOV. Another special instruction in the TMS32010 is LTD, which completes the LT, DMOV, and APAC instructions within a single instruction cycle. Together, the LTD and MPY instructions reduce an FIR filter tap computation from 4 instructions to 2. Second-generation processors such as the TMS320C25 add 2 further specialized instructions, RPT and MACD; using these, the number of instructions per tap can be reduced further from 2 to 1.
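The delay operation that DMOV implements is a one-position shift of the filter's delay line; a minimal sketch, with `dmov_shift` as a hypothetical name for illustration:

```python
# Illustrative sketch: the delay-line shift performed by the DMOV
# instruction. After each sample, every stored value moves one slot
# deeper, realizing x[n-k] -> x[n-(k+1)] for the next output.

def dmov_shift(delay_line, new_sample):
    """Shift the delay line by one position and insert the new sample."""
    return [new_sample] + delay_line[:-1]

line = [0, 0, 0]          # 3-tap delay line, initially empty
line = dmov_shift(line, 5)
line = dmov_shift(line, 7)
print(line)  # [7, 5, 0]
```

On a general-purpose CPU this shift costs a load and a store per tap; fusing it into the MAC instruction (LTD, MACD) is what drives the 4 → 2 → 1 instruction-count reduction described above.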
5, the quick instruction cycle
Combining the Harvard architecture, pipelined operation, dedicated hardware multipliers, special DSP instructions, and optimized integrated-circuit design greatly shortens the instruction cycle of a DSP chip. With advances in integrated-circuit technology, the instruction cycle of a typical DSP processor has dropped to the nanosecond level.
Together, the features above make a DSP chip's processing capability for DSP-class algorithms far exceed that of a general-purpose processor, enabling many real-time embedded applications.
For the sake of versatility and ease of software development, today's DSPs still follow the basic structure of a general-purpose processor, with a CPU core at the center of the hardware system. This structure maximizes the DSP's flexibility across different applications and keeps it easy for programmers to use.
DSP processor progress can be summarized along two directions:
Maximizing the number of operations completed by a single instruction. The Harvard architecture, special instructions, and dedicated hardware all belong to this direction.
Minimizing the time taken by a single instruction. Instruction pipelining and process-technology improvements belong to this direction.
Chip-design professionals can see that efforts in both directions make computation faster, but they do little to improve energy efficiency. That is, the number of flip-flop toggles required to complete the same amount of computation is not greatly reduced. For applications with strict power requirements, a general-purpose DSP cannot be optimized much further from the standpoint of saving power. The underlying causes are as follows:
A CPU-centered architecture makes it difficult for DSP algorithms to reach peak efficiency
Although many optimizations have been made for the particularities of DSP applications, computation is still fundamentally driven by central-processor instructions, executed just as on a general-purpose CPU. During execution the CPU therefore cycles without pause through fetch, decode, and execute; the execute phase can itself be divided into reading from data memory, computing in the ALU or dedicated hardware, and writing results back to memory.
Yet most DSP algorithms share a common pattern: the operation type is largely fixed, the amount of data to be processed is large, and the workload amounts to executing the same instruction over a long contiguous block of stored data. For the CPU, every operation thus repeats the useless work of fetching and decoding the same instruction, which inevitably wastes power.
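The waste described above can be made concrete with a toy cost model (all unit costs below are assumptions for illustration, not figures from the patent):

```python
# Illustrative cost model: relative "work units" for processing N samples
# on a CPU that re-fetches and re-decodes the same instruction each
# iteration, vs. a datapath configured once and then fed data directly.

FETCH, DECODE, EXECUTE, SETUP = 1, 1, 1, 10  # assumed unit costs

def cpu_loop_cost(n):
    """Every sample pays fetch + decode + execute."""
    return n * (FETCH + DECODE + EXECUTE)

def configured_datapath_cost(n):
    """One-time setup, then only the useful execute work per sample."""
    return SETUP + n * EXECUTE

n = 10_000
print(cpu_loop_cost(n))            # 30000
print(configured_datapath_cost(n)) # 10010
```

The fetch/decode share grows linearly with the data volume, which is why the patent targets exactly this overhead.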
Relying on raising the clock frequency causes extra power consumption in hardware:
Chips raise their operating frequency in step with advances in integrated-circuit technology, and the power consumed by auxiliary circuits unrelated to the actual computation grows noticeably as a result.
The clock tree is the most typical example. Digital chips mostly use synchronous design, and clock-tree techniques effectively overcome clock skew, but at a large power cost: sometimes 20% to 30% of the whole chip's power. As clock frequencies rise and chip scale increases, this share tends to grow further.
" multitask " is originally a Concept of Software in operating system, refers to that computer or CPU perform simultaneously The ability of multiple tasks.
The typical method of general-purpose processor system be by different task between frequent switching, make single cpu " seem " to perform multiple task at the same time. actually this is the software magic that operating system is played.Only The frequency wanting task to switch is sufficiently high (general each second more than 100 times), it is possible to the sensation of the people that out-tricks. By the way, but one of the three of multiple task management operating system big basic functions.
Powerful CPU coordinates produced multitask effect with being stationed silent the cutting of operating system software therein Really, in the data flow processing system of extremely low power dissipation, it is difficult to the work of smoothness.One of trouble goes out in office In business switching, task switches the on-the-spot preservation of inevitable requirement task and recovery frequently, causes the most managerial Operation overhead.In the system processing continuous data stream, this problem is seriously weakened.The two of trouble go out In system work dominant frequency, the multitask system of single CPU, certainly will require that the work dominant frequency of system is greatly improved, Thus on SOC design, must be introduced into multi-level buffer, data pipeline, synchronised clock tree and signal Powerful driving etc., these all will cause the rising of individual part energy consumption.
Therefore, in the prior art, DSP chips suffer from excessive energy consumption because they are built around a general-purpose processor.
Summary of the invention
The present invention discloses a DSP chip and a construction method thereof, for solving the prior-art problem that DSP chips consume too much energy because of their general-purpose processor.
To achieve the above object, according to one aspect of the present invention, a DSP chip is provided, using the following technical scheme:
A DSP chip comprises: a plurality of task channels for completing algorithm tasks assigned by a host CPU. Each task channel includes a DMA controller, an arithmetic unit, and a memory, so that it can complete its algorithm task independently; each task channel is connected by a data bus to a plurality of interface modules corresponding to the algorithm task; and each task channel is connected to a preset memory management unit, which is connected to a data memory by the data bus.
Further, the plurality of task channels includes: a first channel, connected by a data bus to the host CPU, a floating-point unit (FPU), a PWM interface, a USB interface, and a GPIO management module; a second channel, connected by a data bus to a first DMA controller, a first sequence-statistics module, a first arithmetic logic unit (ALU), and a lookup-table transform module; a third channel, connected by a data bus to a second DMA controller, a second sequence-statistics module, and a second ALU; a fourth channel, connected by a data bus to a third DMA controller, a third sequence-statistics module, and a multiply-accumulate array unit; and a fifth channel, connected by a data bus to a fourth DMA controller, an ADC module, a DAC module, and a serial-stream I/O interface.
Further, there may be one or more third channels.
Further, the plurality of task channels also includes one or more sixth channels, each comprising a fifth DMA controller and a port for an external SDRAM interface.
Further, the PWM interface is an 8-channel PWM interface.
Further, the multiply-accumulate array unit is an 8*8 multiply-accumulate array unit.
Further, the ADC module and the DAC module are 16-bit.
Further, each data bus is a local bus.
According to another aspect of the present invention, a construction method of a DSP chip is provided, using the following technical scheme:
The construction method of a DSP chip comprises: constructing a plurality of task channels for completing algorithm tasks assigned by a host CPU; connecting each task channel to a DMA controller, an arithmetic unit, and a memory, so that each channel can complete its algorithm task independently; connecting each task channel by a data bus to a plurality of interface modules corresponding to the algorithm task; and connecting each task channel to a preset memory management unit, which is connected to a data memory by the data bus.
In the technical scheme of the invention, each independent algorithm task is handled by its own group consisting of a channel, a DMA controller, an arithmetic unit, and a memory. Such a structure has the following advantages for a low-power implementation:
Task channels are opened according to the amount of work; channels with no task are fully suspended, so hardware power consumption is strictly positively correlated with the amount of computation.
The data bus of each hardware channel is a local bus, so parasitic capacitance and the required average drive current are greatly reduced, and the extra power cost of the hardware implementation is small.
The CPU acts merely as a commander and manager of peripherals; during large-volume computations the CPU can be fully suspended, with no useless fetching or decoding.
Multitasking is realized by multiple simple controllers, so compared with a general-purpose DSP the chip's clock frequency can be greatly reduced, and the resulting power reduction is considerable.
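The first advantage, power tracking the number of active channels, can be sketched as follows (the per-channel figures are purely illustrative assumptions, not values from the patent):

```python
# Illustrative power model: with per-channel suspension, idle channels
# draw essentially no dynamic power, so total power scales with the
# active workload rather than with chip size.

CHANNEL_ACTIVE_MW = 4.0  # assumed active power per channel, mW
CHANNEL_IDLE_MW = 0.0    # a suspended channel draws ~no dynamic power

def chip_power(active_channels, total_channels=6):
    """Total dynamic power for a chip with the given channel activity."""
    idle = total_channels - active_channels
    return active_channels * CHANNEL_ACTIVE_MW + idle * CHANNEL_IDLE_MW

print(chip_power(1))  # 4.0  -> one light task
print(chip_power(6))  # 24.0 -> full load
```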
Brief description of the drawings
The accompanying drawings provide a further understanding of the invention and form part of this application. The schematic embodiments of the invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 shows a structural diagram of a DSP chip according to an embodiment of the invention;
Fig. 2 shows a flow diagram of the construction method of the DSP chip according to an embodiment of the invention.
Detailed description of the invention
Embodiments of the invention are described in detail below with reference to the drawings, but the invention can be implemented in the many different ways defined and covered by the claims.
Fig. 1 shows a structural diagram of a DSP chip according to an embodiment of the invention.
As shown in Fig. 1, a DSP chip comprises a plurality of task channels, shown in Fig. 1 as a first channel 10, a second channel 20, a third channel 30, a fourth channel 40, a fifth channel 50, and a sixth channel 60, for completing algorithm tasks assigned by a host CPU 11. Each task channel includes a DMA controller, an arithmetic unit, and a memory, so that it can complete its algorithm task independently; each task channel is connected by a data bus to a plurality of interface modules corresponding to the algorithm task; and each task channel is connected to a preset memory management unit 1, which is connected to a data memory 2 by the data bus.
This embodiment starts from the requirements of ultra-low-power design and proposes a new DSP chip hardware structure. It eliminates the invalid operations the CPU performs during computation, and its hardware-level multitasking needs no large number of context switches, greatly improving the efficiency with which DSP algorithms execute. The clock frequency can therefore be greatly reduced, which further reduces the extra power consumed by synchronization techniques. In application environments with strict low-power requirements, it is a hardware structure that meets power requirements well.
Preferably, the plurality of task channels includes: a first channel 10, connected by a data bus to the host CPU 11, a floating-point unit 12, a PWM interface 13, a USB interface 14, and a GPIO management module 15; a second channel 20, connected by a data bus to a first DMA controller 21, a first sequence-statistics module 22, a first ALU 23, and a lookup-table transform module 24; a third channel 30, connected by a data bus to a second DMA controller 31, a second sequence-statistics module 32, and a second ALU 33; a fourth channel 40, connected by a data bus to a third DMA controller 41, a third sequence-statistics module 42, and a multiply-accumulate array unit 43; and a fifth channel 50, connected by a data bus to a fourth DMA controller 51, an ADC module 52, a DAC module 54, and a serial-stream I/O interface 53.
This embodiment provides a typical logic diagram of a hardware-level parallel multitask system, explained as follows:
The system can support six independent tasks simultaneously. The controller of the first channel 10 is the host CPU 11, which is mainly responsible for management and configuration but can also run tasks using the host CPU 11 as the arithmetic unit. The second channel 20 is mainly for data transfer. The fourth channel 40 is dedicated to the 8x8 multiplier array. The fifth channel 50 is dedicated to fixed-period data streams and is typically assigned to the ADC module 52 or the DAC module 54. The sixth channel 60 is the dedicated channel through which the external SDRAM interface 62 exchanges data with the on-chip data RAM. Other arithmetic units use the third channel 30, of which there can be one or more.
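The six-channel layout of Fig. 1 can be written out as plain data for reference (the `TaskChannel` type and module labels below are an illustration built from the embodiment's description, not the patent's implementation):

```python
# Illustrative sketch: the six task channels of Fig. 1 as plain data.
from dataclasses import dataclass, field

@dataclass
class TaskChannel:
    number: int
    controller: str
    modules: list = field(default_factory=list)

channels = [
    TaskChannel(1, "host CPU",  ["FPU", "PWM", "USB", "GPIO"]),
    TaskChannel(2, "DMA-1",     ["seq-stats-1", "ALU-1", "LUT transform"]),
    TaskChannel(3, "DMA-2",     ["seq-stats-2", "ALU-2"]),
    TaskChannel(4, "DMA-3",     ["seq-stats-3", "8x8 MAC array"]),
    TaskChannel(5, "DMA-4",     ["ADC", "DAC", "serial-stream I/O"]),
    TaskChannel(6, "DMA-5",     ["external SDRAM port"]),
]

# Six independent tasks can run at once, one per channel.
print(len(channels))  # 6
```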
Preferably, the PWM interface 13 is an 8-channel PWM interface.
Preferably, the multiply-accumulate array unit 43 is an 8*8 multiply-accumulate array unit.
Preferably, the ADC module 52 and the DAC module 54 are 16-bit.
Preferably, each data bus is a local bus.
Fig. 2 shows a flow diagram of the construction method of the DSP chip according to an embodiment of the invention.
As shown in Fig. 2, the construction method of a DSP chip comprises:
S101: constructing a plurality of task channels for completing algorithm tasks assigned by a host CPU;
S103: connecting each task channel to a DMA controller, an arithmetic unit, and a memory, so that each channel can complete its algorithm task independently;
S105: connecting each task channel by a data bus to a plurality of interface modules corresponding to the algorithm task;
S107: connecting each task channel to a preset memory management unit, which is connected to a data memory by a data bus.
In the technical scheme of this embodiment, step S101 constructs a plurality of hardware-level task channels equipped with multiple data buses, so that every task channel can independently complete an algorithm task assigned by the host CPU. To this end, in step S103 each task channel is connected to a DMA controller, an arithmetic unit, and a memory; this requires the memory to provide multiple ports to serve the multiple task channels, and the dedicated arithmetic units are not limited to multipliers but are hardware operation units customized to the algorithm tasks. In steps S105 to S107, each task channel is connected by a data bus to the hardware units corresponding to its algorithm task. In addition, hardware-level multitasking requires the programmer to allocate memory use and arrange the hardware task channels sensibly.
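Steps S101 to S107 can be sketched as a small builder (all names below are illustrative assumptions, not the patent's implementation):

```python
# Illustrative sketch of steps S101-S107 as a builder function.

def build_dsp_chip(n_channels):
    mmu = {"connected_channels": [], "data_memory": "data RAM"}  # S107 target
    chip = {"mmu": mmu, "channels": []}
    for i in range(n_channels):                  # S101: construct channels
        channel = {
            "dma": f"DMA-{i}",                   # S103: DMA controller
            "arithmetic_unit": f"ALU-{i}",       # S103: arithmetic unit
            "memory": f"RAM-{i}",                # S103: local memory
            "interface_modules": [],             # S105: filled per task
        }
        mmu["connected_channels"].append(i)      # S107: attach to MMU
        chip["channels"].append(channel)
    return chip

chip = build_dsp_chip(6)
print(len(chip["channels"]))              # 6
print(chip["mmu"]["connected_channels"])  # [0, 1, 2, 3, 4, 5]
```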
The method proposed by the invention uses a hardware-level parallel multitask system. Put simply, each task occupies its own hardware, so no task switching is needed at all. Multiple groups of hardware work concurrently, and the operating frequency stays low. Practice shows that reducing the operating frequency by tens of times yields power savings of a similar order; low-power MCUs such as ARM's Cortex-M0 run at roughly such frequencies. For digital signal processing algorithms, the basic operators are all fairly simple and need no CPU-level control process at all; complex, ill-suited control mechanisms only increase power consumption.
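The frequency argument follows from first-order CMOS dynamic power, P = C·V²·f: lowering f scales power down directly, and often permits a lower supply voltage V for a quadratic extra gain. A sketch with illustrative numbers (not figures from the patent):

```python
# Illustrative sketch: first-order CMOS dynamic power, P = C * V^2 * f.

def dynamic_power(c_farads, v_volts, f_hz):
    """Dynamic switching power in watts."""
    return c_farads * v_volts ** 2 * f_hz

C = 1e-9  # assumed switched capacitance, 1 nF

high = dynamic_power(C, 1.2, 200e6)  # 200 MHz at 1.2 V
low = dynamic_power(C, 0.9, 10e6)    # 10 MHz at 0.9 V

print(round(high / low, 1))  # 35.6 -> ~35x less power
```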
In the technical scheme of the invention, each independent algorithm task is handled by its own group consisting of a channel, a DMA controller, an arithmetic unit, and a memory. Such a structure has the following advantages for a low-power implementation:
Task channels are opened according to the amount of work; channels with no task are fully suspended, so hardware power consumption is strictly positively correlated with the amount of computation.
The data bus of each hardware channel is a local bus, so parasitic capacitance and the required average drive current are greatly reduced, and the extra power cost of the hardware implementation is small.
The CPU acts merely as a commander and manager of peripherals; during large-volume computations the CPU can be fully suspended, with no useless fetching or decoding.
Multitasking is realized by multiple simple controllers, so compared with a general-purpose DSP the chip's clock frequency can be greatly reduced, and the resulting power reduction is considerable.
The above are preferred embodiments of the present invention. It should be noted that those skilled in the art can make improvements and modifications without departing from the principles of the invention, and such improvements and modifications should also be regarded as falling within the scope of protection of the invention.

Claims (9)

1. A DSP chip, characterized by comprising:
a plurality of task channels for completing algorithm tasks assigned by a host CPU;
wherein each task channel includes a DMA controller, an arithmetic unit, and a memory for independently completing said algorithm task;
each task channel is connected by a data bus to a plurality of interface modules corresponding to said algorithm task; and
each task channel is connected to a preset memory management unit, said memory management unit being connected to a data memory by a data bus.
2. The DSP chip of claim 1, characterized in that the plurality of task channels comprises:
a first channel connected by a data bus to the host CPU, a floating-point unit, a PWM interface, a USB interface, and a GPIO management module;
a second channel connected by a data bus to a first DMA controller, a first sequence-statistics module, a first arithmetic logic unit, and a lookup-table transform module;
a third channel connected by a data bus to a second DMA controller, a second sequence-statistics module, and a second arithmetic logic unit;
a fourth channel connected by a data bus to a third DMA controller, a third sequence-statistics module, and a multiply-accumulate array unit; and
a fifth channel connected by a data bus to a fourth DMA controller, an ADC module, a DAC module, and a serial-stream I/O interface.
3. The DSP chip of claim 2, characterized in that there are one or more third channels.
4. The DSP chip of claim 2, characterized in that the plurality of task channels further comprises one or more sixth channels, each comprising a fifth DMA controller and a port for an external SDRAM interface.
5. The DSP chip of claim 2, characterized in that the PWM interface is an 8-channel PWM interface.
6. The DSP chip of claim 2, characterized in that the multiply-accumulate array unit is an 8*8 multiply-accumulate array unit.
7. The DSP chip of claim 2, characterized in that the ADC module and the DAC module are 16-bit.
8. The DSP chip of any one of claims 1-7, characterized in that each data bus is a local bus.
9. A construction method of a DSP chip, characterized by comprising:
constructing a plurality of task channels for completing algorithm tasks assigned by a host CPU;
connecting each task channel to a DMA controller, an arithmetic unit, and a memory for independently completing said algorithm task;
connecting each task channel by a data bus to a plurality of interface modules corresponding to said algorithm task; and
connecting each task channel to a preset memory management unit, said memory management unit being connected to a data memory by a data bus.
CN201610290943.1A 2016-05-05 2016-05-05 DSP chip and construction method thereof Pending CN105975048A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610290943.1A CN105975048A (en) 2016-05-05 2016-05-05 DSP chip and construction method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610290943.1A CN105975048A (en) 2016-05-05 2016-05-05 DSP chip and construction method thereof

Publications (1)

Publication Number Publication Date
CN105975048A true CN105975048A (en) 2016-09-28

Family

ID=56994657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610290943.1A Pending CN105975048A (en) 2016-05-05 2016-05-05 DSP chip and construction method thereof

Country Status (1)

Country Link
CN (1) CN105975048A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1126340A (en) * 1994-02-17 1996-07-10 皮尔金顿德国第二有限公司 Re-configurable application specific device
CN1556956A (en) * 2001-09-21 2004-12-22 ض� Multi-channel interface for communications between devices
CN101042684A (en) * 2006-03-21 2007-09-26 国际商业机器公司 System and method for improving system DMA mapping while substantially reducing memory fragmentation
CN102508643A (en) * 2011-11-16 2012-06-20 刘大可 Multicore-parallel digital signal processor and method for operating parallel instruction sets
CN104637483A (en) * 2015-02-03 2015-05-20 中国电子科技集团公司第五十八研究所 Multichannel-based low-speed voice coding/decoding system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108363615A (en) * 2017-09-18 2018-08-03 清华大学无锡应用技术研究院 Method for allocating tasks and system for reconfigurable processing system
CN108363615B (en) * 2017-09-18 2019-05-14 清华大学 Method for allocating tasks and system for reconfigurable processing system
US10705878B2 (en) 2017-09-18 2020-07-07 Wuxi Research Institute Of Applied Technologies Tsinghua University Task allocating method and system capable of improving computational efficiency of a reconfigurable processing system

Similar Documents

Publication Publication Date Title
CN108805266B (en) Reconfigurable CNN high-concurrency convolution accelerator
CN104317768B (en) Matrix multiplication accelerating method for CPU+DSP (Central Processing Unit + Digital Signal Processor) heterogeneous system
CN102508643A (en) Multicore-parallel digital signal processor and method for operating parallel instruction sets
CN1142484C (en) Vector processing method of microprocessor
CN102306139A (en) Heterogeneous multi-core digital signal processor for orthogonal frequency division multiplexing (OFDM) wireless communication system
CN103970720A (en) Embedded reconfigurable system based on large-scale coarse granularity and processing method of system
MY122682A (en) System and method for performing context switching and rescheduling of a processor
CN101387952A (en) Single-chip multi-processor task scheduling and managing method
CN102402415B (en) Device and method for buffering data in dynamic reconfigurable array
CN103984677A (en) Embedded reconfigurable system based on large-scale coarseness and processing method thereof
CN101504599A (en) Special instruction set micro-processing system suitable for digital signal processing application
CN102306141B (en) Method for describing configuration information of dynamic reconfigurable array
Zhong et al. An optimized mapping algorithm based on simulated annealing for regular NoC architecture
Metzlaff et al. A real-time capable many-core model
CN101789044A (en) Method of implementing cooperative work of software and hardware of genetic algorithm
CN105975048A (en) DSP chip and construction method thereof
CN102023846B (en) Shared front-end assembly line structure based on monolithic multiprocessor system
Tan et al. A pipelining loop optimization method for dataflow architecture
Abdelhamid et al. Condensing an overload of parallel computing ingredients into a single architecture recipe
CN108228242B (en) Configurable and flexible instruction scheduler
CN111008042A (en) Efficient general processor execution method and system based on heterogeneous pipeline
CN108196849A (en) A kind of low latency instruction scheduler
Awatramani et al. Perf-Sat: Runtime detection of performance saturation for GPGPU applications
CN104699520B (en) A kind of power-economizing method based on virtual machine (vm) migration scheduling
CN202281998U (en) Scalar floating-point operation accelerator

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160928

RJ01 Rejection of invention patent application after publication