CN101365138B

CN101365138B - JPEG2000 image compression processing system

Info

Publication number: CN101365138B
Application number: CN 200810223915
Authority: CN
Inventors: 谭贤红; 于巍巍; 王菊花; 程亚娟; 张锐菊
Original assignee: No504 Institute Of China Space Technology Group No5 Academy
Current assignee: No504 Institute Of China Space Technology Group No5 Academy
Priority date: 2008-10-10
Filing date: 2008-10-10
Publication date: 2010-12-08
Anticipated expiration: 2028-10-10
Also published as: CN101365138A

Abstract

The invention relates to a JPEG2000 image compression processing system. An image wavelet transform unit decomposes the original image into wavelet coefficient blocks according to the JPEG2000 algorithm and sends them to a memory. A wavelet coefficient read control unit reads the coefficient blocks from the memory in a preset scanning order and sends them to a wavelet coefficient buffer unit. A wavelet coefficient coding unit made up of DSPs encodes the coefficient blocks. Each DSP sends a data exchange request to a priority control unit, which ranks the requests by priority, and each DSP then reads its coefficient blocks from the wavelet coefficient buffer unit in that priority order. The wavelet coefficient coding unit sends the encoded coefficient blocks to a coded coefficient buffer unit. A code stream optimization and truncation unit reads all coefficient blocks, together with their truncation points and distortion values, from the coded coefficient buffer unit; it uses the Lagrange algorithm to find the set of truncation points with the smallest distortion at the specified compression bit rate and truncates part of the code stream of each coefficient block. The result is RS encoded, spliced with auxiliary information, and output externally.

Description

A JPEG2000 image compression processing system

Technical field

The present invention relates to an image compression processing system, in particular one based on the JPEG2000 standard, implemented with multiple DSPs and multiple FPGAs, and intended for high-fidelity, highly reliable, real-time processing of remote sensing images.

Background technology

JPEG2000 is the still-image coding standard formulated and released in 2000 by the Joint Photographic Experts Group, the still-image coding group of the International Organization for Standardization. It departs from the traditional JPEG standard, whose core is the DCT, and instead adopts the wavelet transform, which has better energy compaction. The core technologies of JPEG2000 are the discrete wavelet transform and EBCOT (Embedded Block Coding with Optimized Truncation). The basic structure of the JPEG2000 algorithm is shown in Fig. 1. Before the discrete wavelet transform, the image undergoes some preprocessing, mainly: dividing a large image into tiles, each coded independently, which reduces system complexity and facilitates parallel processing; DC level shifting to prevent data overflow; and component transformation (reversible or irreversible) of color or multi-component images to aid compression. The EBCOT algorithm is divided into two coders, T1 and T2. T1 consists of embedded bit-plane coding and the MQ arithmetic coder, while T2 performs rate control and code-stream organization. In EBCOT, each wavelet subband is divided into smaller code blocks (for example 64×64), and T1 coding is applied independently to each code block (Code-Block). Different code blocks produce bitstreams of different lengths and contribute differently to the quality of the reconstructed image. T2 therefore applies rate-distortion optimization to the bitstreams of all code blocks as post-compression processing (PCRD: Post-Compression Rate Distortion): the code stream of each code block is layered according to its contribution to reconstructed image quality, completing rate control and code-stream organization.
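The preprocessing steps above (tiling and DC level shifting) and the partition of a subband into code blocks can be sketched as follows. This is a minimal Python illustration, assuming images are 2-D lists of unsigned samples; the function names are hypothetical and do not come from the source.

```python
def split_into_tiles(image, tile_h, tile_w):
    """Split a 2-D list into non-overlapping tiles in raster order (edge tiles may be smaller)."""
    tiles = []
    for r in range(0, len(image), tile_h):
        for c in range(0, len(image[0]), tile_w):
            tiles.append([row[c:c + tile_w] for row in image[r:r + tile_h]])
    return tiles


def dc_level_shift(tile, bit_depth):
    """Subtract 2**(B-1) so unsigned B-bit samples become signed values centred on zero."""
    offset = 1 << (bit_depth - 1)
    return [[v - offset for v in row] for row in tile]


def code_blocks_per_subband(height, width, cb=64):
    """Number of cb x cb code blocks covering a subband (ceiling division on each axis)."""
    return (-(-height // cb)) * (-(-width // cb))
```

Each tile can then be transformed and coded independently, which is what makes tiling useful for parallel processing.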

The most representative image compression processing product based on the JPEG2000 standard is the ADV202, a dedicated video and still-image compression chip based on the JPEG2000 algorithm, released by Analog Devices. The chip integrates a 32-bit RISC processor as the system controller; its wavelet transform section implements 6-level 9/7 or 5/3 wavelet filters; and three entropy coding modules perform quantization, rate-distortion optimization and context-based coding, and arrange the data into packets and layers to form the final JPEG2000 encoded data stream. However, the chip is only industrial grade and cannot meet the high-reliability, long-life requirements of applications such as space remote sensing imaging.
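For reference, the reversible 5/3 filter mentioned above is defined in JPEG2000 Part 1 as two integer lifting steps (predict, then update) with symmetric boundary extension. The following is a minimal one-level, one-dimensional Python sketch, assuming an even-length input of at least two samples; it illustrates the standard transform, not the ADV202's internal implementation, and the function names are hypothetical.

```python
def fwd_53(x):
    """One level of the reversible 5/3 lifting transform. Returns (low, high)."""
    n = len(x)
    assert n % 2 == 0 and n >= 2
    high = []  # predict step: high-pass values at odd positions
    for i in range(1, n, 2):
        right = x[i + 1] if i + 1 < n else x[i - 1]        # mirror: X(N) = X(N-2)
        high.append(x[i] - ((x[i - 1] + right) >> 1))       # >> is floor division by 2
    low = []   # update step: low-pass values at even positions
    for k, i in enumerate(range(0, n, 2)):
        left = high[k - 1] if k > 0 else high[0]            # mirror: Y(-1) = Y(1)
        low.append(x[i] + ((left + high[k] + 2) >> 2))
    return low, high


def inv_53(low, high):
    """Exact integer inverse of fwd_53: undo the update, then undo the predict."""
    n = 2 * len(low)
    x = [0] * n
    for k in range(len(low)):
        left = high[k - 1] if k > 0 else high[0]
        x[2 * k] = low[k] - ((left + high[k] + 2) >> 2)
    for k in range(len(high)):
        i = 2 * k + 1
        right = x[i + 1] if i + 1 < n else x[i - 1]
        x[i] = high[k] + ((x[i - 1] + right) >> 1)
    return x
```

Because every step is an integer operation that the inverse subtracts back out exactly, the transform is lossless, which is why 5/3 is the reversible option and 9/7 the irreversible one in Part 1.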

At present, few hardware systems implement JPEG2000 image compression without a dedicated chip. The paper "Development of Image Processing System Based on DSP and FPGA" (Duan Jinghong, 2-4244-1135-1/07/2007 IEEE) discloses an image compression processing system implemented with a single DSP and a single FPGA; its speed is low, only 262.144 K pixels/s. Patent CN1216485C, titled "High-speed EBCOT encoder suitable for JPEG2000", discloses an image processing system encoder in which the embedded platform is a hardware system suited to network transmission; it emphasizes data acquisition, data formatting and compliance with transmission protocols, and describes neither image quality nor processing speed. Patent CN1313976C, titled "JPEG2000 image coding and transmission method and system based on an embedded platform", discloses an image processing system whose high-speed EBCOT encoder improves EBCOT, the core algorithm of JPEG2000; its VLSI architecture reaches 100 Mbit/s in simulation, which for 8-bit pixels corresponds to 12.5 MSample/s. However, that patent provides neither an actual hardware system nor the quality of the reconstructed image after the algorithm improvement. The article "Remote sensing image compression system based on multiple DSPs" by Ge Baoshan, published in Computer Engineering and Design in December 2007, also presents a remote sensing image compression processing system. It uses four DSPs, and although its reconstructed image quality is better than JPEG2000, its processing speed is only 50 Mbit/s, i.e. only 6.25 MSample/s for 8-bit pixels. It therefore cannot meet the requirements of space remote sensing image compression for high speed (30 MSample/s) and high fidelity (comparable to KDU, the JPEG2000 standard software).

Summary of the invention

The technical problem solved by the present invention is to overcome the deficiencies of the prior art by providing a high-speed, high-fidelity JPEG2000 image compression processing system suitable for remote sensing image compression.

The technical solution of the present invention is a JPEG2000 image compression processing system comprising an image wavelet transform unit, a wavelet coefficient read control unit, a wavelet coefficient buffer unit, a priority control unit, a wavelet coefficient coding unit, a coded coefficient buffer unit, and a code stream optimization and truncation unit. The image wavelet transform unit decomposes the raw image data into wavelet coefficient blocks according to the JPEG2000 algorithm and sends them to a memory. The wavelet coefficient read control unit reads the wavelet coefficient blocks from the memory in the set scanning order and sends them to the wavelet coefficient buffer unit. The wavelet coefficient coding unit consists of one or more DSPs that encode the wavelet coefficient blocks; each DSP sends data exchange requests to the priority control unit. The priority control unit ranks the data exchange requests from the DSPs in the wavelet coefficient coding unit and controls each DSP to read its wavelet coefficient blocks from the wavelet coefficient buffer unit in priority order. The wavelet coefficient coding unit sends the encoded wavelet coefficient blocks to the coded coefficient buffer unit. The code stream optimization and truncation unit reads all coefficient blocks, together with the truncation points and distortion values of each block, from the coded coefficient buffer unit; it uses the Lagrange algorithm to find the set of truncation points with the smallest distortion at the specified compression bit rate and, at those truncation points, truncates part of the code stream of each coefficient block. After RS encoding, this stream is spliced with auxiliary information input from outside the system and finally output from the system.

When reading wavelet coefficient blocks from the memory, the wavelet coefficient read control unit uses the following scanning order. The 3n+1 subbands produced by an n-level wavelet transform of the image stored in memory are scanned in the zigzag order LLn, HLn, LHn, HHn, HLn-1, LHn-1, HHn-1, ..., HL2, LH2, HH2, HL1, LH1, HH1. Within each subband, all wavelet coefficient blocks are divided into four symmetric groups scanned in zigzag order; each group is again divided into four symmetric parts, also scanned in zigzag order, and so on until the resulting data block equals the predetermined size. Each data block of the predetermined size is scanned in stripe order: the block is divided into stripes of 4 rows, the stripes are scanned from top to bottom, the columns within a stripe from left to right, and each column from top to bottom.
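The subband read-out order generalizes to any number of levels n. The Python sketch below (hypothetical function name) lists the 3n+1 subbands in the coarse-to-fine order given above and, optionally, in the reverse fine-to-coarse order (HL1, LH1, HH1, ..., LLn) that the embodiment also mentions.

```python
def subband_order(levels, coarse_first=True):
    """Names of the 3n+1 subbands of an n-level transform, in read-out order."""
    # One group of detail subbands per level, from the coarsest (n) down to 1.
    groups = [[f"HL{lvl}", f"LH{lvl}", f"HH{lvl}"] for lvl in range(levels, 0, -1)]
    if coarse_first:
        return [f"LL{levels}"] + [name for g in groups for name in g]
    # Fine-to-coarse: level 1 details first, LLn last.
    return [name for g in reversed(groups) for name in g] + [f"LL{levels}"]
```

For a four-level transform, `subband_order(4)` yields the 13 names LL4, HL4, LH4, HH4, ..., HL1, LH1, HH1.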

The priority control unit ranks the DSPs' data exchange requests using a two-stage FIFO. The request states issued by the DSPs are first encoded and buffered in FIFOA. The empty flag of FIFOA indicates whether a request is stored; if so, the encoded request is read out, sorted according to the preset priority, and written in order into FIFOB. According to the empty flag of FIFOB, the encoded information stored in FIFOB is read out and decoded, and the priority control unit generates the response signal.
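The two-stage FIFO flow can be modeled loosely in software. This is a batch (not cycle-accurate) Python sketch; it assumes the fixed priority order DSP4 > DSP3 > DSP2 > DSP1 used by the embodiment's state machine, and the names `encode`, `arbitrate` and `PRIORITY` are hypothetical.

```python
from collections import deque

# Assumed fixed priority, highest first (as in the embodiment's state machine).
PRIORITY = (4, 3, 2, 1)


def encode(requests):
    """Encode a set of requesting DSP numbers as the word S1S2S3S4 (S1 leftmost)."""
    return "".join("1" if dsp in requests else "0" for dsp in (1, 2, 3, 4))


def arbitrate(request_events):
    """Each element of request_events is the set of DSPs whose requests arrived
    together. Returns DSP numbers in the order they are answered."""
    fifo_a, fifo_b = deque(), deque()
    for req in request_events:       # stage 1: buffer encoded request words
        fifo_a.append(encode(req))
    while fifo_a:                     # FIFOA not empty: read one word and rank it
        word = fifo_a.popleft()
        for dsp in PRIORITY:
            if word[dsp - 1] == "1":
                fifo_b.append(dsp)    # stage 2: queue DSP numbers in priority order
    responses = []
    while fifo_b:                     # FIFOB not empty: decode and respond
        responses.append(fifo_b.popleft())
    return responses
```

Arrival order is preserved between words (first in, first out), while priority only breaks ties among requests that arrived in the same word.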

The wavelet coefficient buffer unit or the coded coefficient buffer unit is implemented with dual-port BLOCKRAM inside an FPGA.

Each DSP in the wavelet coefficient coding unit exchanges data with the wavelet coefficient buffer unit or the coded coefficient buffer unit by enhanced direct memory access (EDMA) through its external memory interface.

Compared with the prior art, the present invention has the following advantages:

(1) In the system of the present invention, the wavelet coefficient read control unit, the priority control unit and the wavelet coefficient coding unit cooperate to read and encode the image wavelet coefficient blocks. The read control unit reads the blocks from memory in the set scanning order, which improves the reading rate compared with sequential reading. The coding unit consists of one or more DSPs that encode the blocks, making full use of the parallel processing capability of multiple DSPs and raising the processing speed of the whole system. The priority control unit ranks the DSPs' data exchange requests before answering each of them, which further increases system speed. When the read control unit and the priority control unit are implemented in an FPGA and combined with multiple DSPs processing in parallel, the system can reach 30 MSample/s. For complex remote sensing images at 4:1 compression, the peak signal-to-noise ratio differs from the result of the JPEG2000 standard software by only 0.5 dB, so fidelity is good;

(2) The wavelet coefficient read control unit generates wavelet coefficient read addresses with zigzag scanning between data blocks and stripe scanning within a block. This greatly reduces the hardware resources needed for the function and thus increases wavelet coefficient addressing speed. In addition, as required, the subband scanning order of the wavelet coefficient blocks can be changed to run from low-level to high-level wavelet coefficients or from high-level to low-level, to better suit the probability updates during parallel DSP encoding and improve coding efficiency;

(3) The priority control unit handles the DSPs' data exchange requests with a two-stage FIFO. Compared with a fixed-order scheme, this priority ranking minimizes DSP waiting time and keeps the whole system running at high speed;

(4) Implementing the wavelet coefficient buffer unit or the coded coefficient buffer unit with dual-port BLOCKRAM inside the FPGA allows wavelet coefficients to be exchanged between FPGA and DSP through read/write control of the dual-port RAM. Compared with designs that use memory outside the FPGA as the buffer, this uses fewer hardware resources and has shorter path delays, and BLOCKRAM is fast to read and write and easy to control;

(5) Each DSP in the wavelet coefficient coding unit exchanges data with the wavelet coefficient buffer unit or coded coefficient buffer unit by EDMA through its external memory interface. Compared with conventional DMA, this effectively improves the efficiency and speed of data exchange between DSP and FPGA and lets data exchange run in parallel with encoding inside the DSP, ensuring the high-speed processing capability of the whole system.

Description of drawings

Fig. 1 is a block diagram of the basic structure of the JPEG2000 algorithm;

Fig. 2 is a block diagram of the hardware implementation of the image compression processing system of the present invention;

Fig. 3 shows the correspondence among block count, block sequence number and block address of the wavelet coefficient blocks in the system of the present invention;

Fig. 4 shows the stripe scanning order within a wavelet coefficient block in the system of the present invention;

Fig. 5 is a schematic diagram of wavelet coefficient read address generation in the system of the present invention;

Fig. 6 is a block diagram of data exchange between a single DSP and the two FPGAs in the system of the present invention;

Fig. 7 is a block diagram of the priority queuing of DSP requests for data exchange with the FPGA in the system of the present invention;

Fig. 8 illustrates how each DSP request is detected and generated in the system of the present invention;

Fig. 9 is the state transition diagram of priority queuing in the priority control unit of the system of the present invention;

Fig. 10 is a block diagram of the read/write control of the dual-port RAM used by the data buffer units in the system of the present invention;

Fig. 11 is a flowchart of wavelet coefficient processing by the DSPs of the wavelet coefficient coding unit in the system of the present invention;

Fig. 12 is a block diagram of the mechanism by which EDMA reads/writes run in parallel with DSP processing in the code stream optimization and truncation unit of the system of the present invention.

Embodiment

The working process of the image compression processing system of the present invention, shown in Fig. 2, is further described below, taking a design with four DSPs and two FPGAs as an example.

1. High-speed implementation scheme co-designed with four DSPs and two FPGAs

As shown in Fig. 2, in the system of the present invention the JPEG2000 algorithm is carried out jointly by four DSPs and two FPGAs. The wavelet transform FPGA and its peripheral SRAM implement image data buffering and the wavelet transform in hardware; the four DSPs perform embedded bit-plane coding of the wavelet coefficient blocks; and the other FPGA performs optimized truncation of the coded data, RS coding, and buffering of auxiliary data. The wavelet transform FPGA allocates four internal BLOCKRAMs, each serving as part of the external memory space of one of the four DSPs, to store wavelet coefficient blocks. The optimization and truncation FPGA allocates another four internal BLOCKRAMs as another part of the four DSPs' external memory space, to store the encoded wavelet data.

Image data buffering uses two SRAMs to implement ping-pong buffering and the division of the image into tiles. In this embodiment the input image (3072×1024 is used as the example) has 3072 pixels per line; the SRAM buffers 1024 lines, which are divided into three 1024×1024 tiles. The wavelet transform FPGA processes one tile at a time, reading the three tiles in turn and transforming each, and performs a 9/7 integer wavelet transform on the image data of a tile. The integer wavelet transform uses the corresponding coefficients from the JPEG2000 standard. The hardware implementation requires external SRAM to buffer intermediate results. To save memory, a line-based wavelet transform can be used, so that row and column transforms proceed simultaneously; when the transform finishes, the buffered wavelet coefficients of all levels are written to external SRAM. Two external SRAMs allow ping-pong buffering of the wavelet coefficients.
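A minimal Python sketch of the tile split and the two-SRAM ping-pong schedule described above, assuming the 3072×1024 example strip. The function names and the (strip, write buffer, read buffer) schedule format are hypothetical; the source does not specify the controller logic, so this only illustrates the alternation.

```python
LINE_WIDTH, TILE = 3072, 1024   # example frame: 3072-pixel lines, 1024-line strips


def tiles_of_strip(strip_index):
    """(strip, first column, end column) of each 1024x1024 tile in one buffered strip."""
    return [(strip_index, c, c + TILE) for c in range(0, LINE_WIDTH, TILE)]


def ping_pong(num_strips):
    """While strip k is written into SRAM k % 2, strip k-1 is read for the wavelet
    transform from the other SRAM. Returns (strip, write_buf, read_buf) per step."""
    schedule = []
    for k in range(num_strips):
        write_buf = k % 2
        read_buf = (k - 1) % 2 if k > 0 else None   # nothing to transform yet at k = 0
        schedule.append((k, write_buf, read_buf))
    return schedule
```

Alternating the two buffers lets input capture and the wavelet transform proceed concurrently instead of in sequence.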

For high-speed exchange of wavelet coefficients between the wavelet transform FPGA and the DSPs, the wavelet coefficient blocks are read out in the predefined order and stored in the FPGA's internal memory (dual-port BLOCKRAM). This memory serves as the DSP's external extended memory space, from which the DSP reads the data.

The four DSPs each perform embedded bit-plane coding on wavelet coefficient blocks. The coded coefficients of each wavelet coefficient block are stored in the internal memory (dual-port BLOCKRAM) of the optimization and truncation FPGA. The truncation program first reads all coefficient blocks into external SRAM, then reads the truncation points and corresponding distortion values of each coefficient block and buffers them in SRAM allocated inside the FPGA. To sustain the processing speed, three groups of regions are allocated both inside and outside the FPGA for ping-pong buffering of the data. The program uses the Lagrange algorithm to find the set of truncation points with the smallest distortion at the specified compression ratio (for example 4:1), truncates the partial code stream of each coefficient block at those points, combines the code streams and applies RS coding, and then outputs the result after splicing it with the buffered auxiliary data.
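The Lagrangian truncation-point search is commonly implemented by bisection on the multiplier λ: for a given λ each block independently picks the truncation point minimizing D + λR, and λ is adjusted until the total rate fits the budget. The source does not say how its program searches for λ, so this Python sketch is an assumption. It takes, for each block, a list of (rate, distortion) pairs ordered by increasing rate and starting with (0, D0); the function names are hypothetical.

```python
def choose_points(blocks, lam):
    """For each block, the index of the truncation point minimising D + lam * R."""
    return [min(range(len(pts)), key=lambda i: pts[i][1] + lam * pts[i][0])
            for pts in blocks]


def total_rate(blocks, choice):
    return sum(pts[i][0] for pts, i in zip(blocks, choice))


def pcrd_truncate(blocks, budget, iters=50):
    """Bisect lambda for the lowest-distortion choice whose total rate fits the budget."""
    lo, hi = 0.0, 1.0
    # Grow hi until the rate fits; terminates because each block has a (0, D0) point.
    while total_rate(blocks, choose_points(blocks, hi)) > budget:
        hi *= 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if total_rate(blocks, choose_points(blocks, mid)) > budget:
            lo = mid          # too many bits: penalise rate more
        else:
            hi = mid          # fits: try a smaller penalty for lower distortion
    return choose_points(blocks, hi)
```

For a 4:1 target on a 1024×1024 tile of 8-bit samples (the bit depth is an assumption here), the budget would be 1024 × 1024 × 8 / 4 = 2,097,152 bits.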

This scheme achieves a high-speed implementation of the JPEG2000 algorithm mainly for the following reasons: 1) the FPGA's parallel processing provides fast hardware implementation of the wavelet transform and of optimized truncation; 2) encoding the wavelet coefficient blocks on DSPs exploits the fact that T1 in the EBCOT algorithm codes each wavelet coefficient block independently, and makes use of the DSPs' powerful data processing capability; 3) high-speed exchange of wavelet coefficients between the wavelet transform FPGA and the multiple DSPs is the key to letting FPGA and DSP each play to their strengths and to ensuring the overall processing speed. Tests show that at a DSP clock frequency of 850 MHz, the data exchange rate between a single DSP and the FPGA is 70×16 Mbps, and that between the four DSPs and the FPGA reaches 4.48 Gbps.
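The aggregate figure follows from the per-DSP rate. The second line compares it with the 30 MSample/s requirement, assuming one 16-bit word per sample; that assumption is ours, not a figure from the source.

```latex
\begin{aligned}
4 \times (70 \times 16\ \text{Mbit/s}) &= 4 \times 1120\ \text{Mbit/s} = 4480\ \text{Mbit/s} = 4.48\ \text{Gbit/s},\\
30\ \text{MSample/s} \times 16\ \text{bit/sample} &= 480\ \text{Mbit/s}.
\end{aligned}
```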

2. Efficient addressing of wavelet coefficient blocks

Efficient addressing of wavelet coefficient blocks must read out the blocks, and the coefficients within them, in a fixed order as the JPEG2000 algorithm requires; that is, it must generate the read addresses of the wavelet coefficients according to a given scanning order. The read-out order is zigzag (Z-shaped) between blocks and stripe order within a block. Because context probabilities are updated from the coded coefficients of several code blocks during coding, grouping code blocks with consistent statistics for probability updating makes the probability distribution more concentrated and the coding more efficient. The code blocks are therefore scanned in Z order. In zigzag scanning, the wavelet coefficients are divided into blocks of equal size, which are numbered in Z order. As shown in Fig. 3, a 1024×1024 image yields 13 subbands (the regions bounded by thick solid lines) after a four-level wavelet transform. With 64×64 coefficient blocks, it is divided into 256 coefficient blocks (the regions bounded by dashed lines). The Z scan appears first in the subband order: for a four-level wavelet transform, the subbands are scanned in the order LL4, HL4, LH4, HH4, HL3, LH3, HH3, HL2, LH2, HH2, HL1, LH1, HH1. Within each subband, all blocks are then divided into four symmetric groups, and the groups are also scanned in Z order. Each group is divided further until only one data block remains; the size of this data block can be set as needed, for example 32×32 or 64×64. Each coefficient block in Fig. 3 carries two numbers. The upper number numbers the blocks in the subband order LL4, HL4, LH4, HH4, HL3, LH3, HH3, HL2, LH2, HH2, HL1, LH1, HH1 and corresponds to the block count in Fig. 5. The lower number numbers them in the subband order HL1, LH1, HH1, HL2, LH2, HH2, HL3, LH3, HH3, HL4, LH4, HH4, LL4 and corresponds to the block sequence number in Fig. 5. This order is used because it gives the DSPs optimal coding efficiency at the fixed 4:1 compression ratio.
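Recursively splitting a square subband into four quadrants visited top-left, top-right, bottom-left, bottom-right is equivalent to Morton (Z-order) indexing, in which the row and column bits of each block's coordinates are interleaved. This Python sketch assumes a square subband whose side, counted in blocks, is a power of two (as in the 1024×1024, four-level, 64×64-block example); the names are hypothetical.

```python
def morton_index(row, col, bits):
    """Interleave row/col bits (row bit above col bit): quadrant order TL, TR, BL, BR."""
    idx = 0
    for b in range(bits - 1, -1, -1):
        idx = (idx << 2) | (((row >> b) & 1) << 1) | ((col >> b) & 1)
    return idx


def z_scan(side):
    """Block coordinates (row, col) of a side x side subband, in Z order."""
    bits = side.bit_length() - 1          # side is assumed to be a power of two
    cells = [(r, c) for r in range(side) for c in range(side)]
    return sorted(cells, key=lambda rc: morton_index(rc[0], rc[1], bits))
```

In the Fig. 3 example the subband grids are 1×1 (LL4 and level 4), 2×2 (level 3), 4×4 (level 2) and 8×8 (level 1), giving 1 + 3 × (1 + 4 + 16 + 64) = 256 blocks.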

In the stripe scanning order shown in Fig. 4, each coefficient block is divided into stripes of four rows, and the stripes are scanned from top to bottom; a 64×64 data block therefore has 16 stripes. Within a stripe, the columns are scanned from left to right and each column from top to bottom. Because the wavelet coefficients in one column of a stripe are often identical or similar, this order helps exploit the correlation among wavelet coefficients.
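The stripe order can be generated directly. A Python sketch for a 64×64 block with 4-row stripes (the function name is hypothetical):

```python
def stripe_scan(size=64, stripe_h=4):
    """(row, col) visiting order inside one block: stripes top to bottom,
    columns left to right within a stripe, rows top to bottom within a column."""
    order = []
    for top in range(0, size, stripe_h):
        for col in range(size):
            for row in range(top, min(top + stripe_h, size)):
                order.append((row, col))
    return order
```

Each stripe contributes 64 × 4 = 256 positions, so position k inside a stripe is column k // 4, row k % 4 of that stripe.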

Read addresses that follow this scanning order can be generated as shown in Fig. 5. The SRAM read address in Fig. 5 is a 20-bit address covering the 1024×1024 array: the upper 8 bits are the block address and the lower 12 bits are the stripe address within the block. The block address is generated from the block count (an input of Fig. 5) by the block sequence number generation module and the block address generation module; the in-block stripe address is generated from the in-block count (another input of Fig. 5) by the stripe address generation module. The upper 4 bits of the in-block stripe address give the stripe number (0 to 15), and the lower 8 bits give the index of the coefficient within the 64×4 stripe. The block count is produced by an 8-bit counter and mapped to a block sequence number by the block sequence number generation module; the block address generation module then maps the block sequence number to a block address. Block addresses increase in block-by-block raster order, from left to right and from top to bottom. The in-block count is produced by a 12-bit counter and represents a raster scan of the coefficient block, each row from left to right and the rows from top to bottom; the stripe address generation module converts it into the scanning order shown in Fig. 4. Consider, for example, the wavelet coefficient in row 3, column 3 of the 12th wavelet block (decimal numbering) in Fig. 3. Its block address is produced by first mapping block count 12 to block sequence number 76 and then mapping that to block address 42. Its stripe address, 130, is derived from the in-block count by the stripe address generation module and consists of stripe number 0 (the upper 4 bits) and intra-stripe scan index 130 (the lower 8 bits). The final address is 42×4096 + 0×256 + 130 = 172162. Implementing the Z scan between wavelet coefficient blocks and the stripe scan within each block as in Fig. 5 greatly reduces the hardware resources needed for this function and thus increases the wavelet coefficient addressing speed.
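Because the in-block raster count and the stripe address differ only in bit order, the stripe address generation module can be pure wiring, which is consistent with the hardware saving claimed above. The Python sketch below shows the address composition; the Fig. 3 lookup from block count to sequence number to block address is not reproduced, so the block address is an input. It assumes the intra-stripe index is column × 4 + row-in-stripe (per Fig. 4). Under that assumption, index 130 corresponds to 0-based row 2, column 32, i.e. in-block count 160; this position is our inference from the value 130, since the coordinates given in the text do not match it. Function names are hypothetical.

```python
def stripe_address(in_block_count):
    """Map the 12-bit raster count (row * 64 + col) to the 12-bit stripe address:
    high 4 bits = stripe number, low 8 bits = index inside the 64x4 stripe."""
    row, col = in_block_count >> 6, in_block_count & 0x3F
    return ((row >> 2) << 8) | (col << 2) | (row & 0x3)


def sram_address(block_address, in_block_count):
    """20-bit SRAM read address: 8-bit block address above the 12-bit stripe address."""
    assert 0 <= block_address < 256 and 0 <= in_block_count < 4096
    return (block_address << 12) | stripe_address(in_block_count)
```

With block address 42 and in-block count 160, `sram_address` reproduces the text's 42 × 4096 + 0 × 256 + 130 = 172162.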

3. Priority control of DSP access to the wavelet coefficient dual-port RAM

As shown in Fig. 6, in this embodiment each DSP accesses its external memory space through the EMIF interface using EDMA. This memory space is a dual-port RAM implemented with BLOCKRAM inside the FPGA, and the FPGA also performs memory operations on it. Each port of the dual-port RAM has its own data lines, address lines, and read/write enable control lines. In Fig. 6, WV BLOCKRAM1 is the dual-port RAM that buffers the wavelet transform coefficients. Its port A data lines, address lines and read/write enables are WV_DATA_A[0..15], WV_ADDR_A[0..15], WV_EN_A and WV_WE_A; its port B data lines, address lines and read/write enables are DATA_B[0..15], ADDR_B[0..14], EN_B and WE_B. OT BLOCKRAM2 is the dual-port RAM that buffers the coded coefficients. Its port A data lines, address lines and read/write enables are WV_DATA_A[0..15], WV_ADDR_A[0..15], WV_EN_A and WV_WE_A; its port B data lines, address lines and read/write enables are shared with port B of WV BLOCKRAM1. This embodiment uses 16 data lines and 15 address lines; the highest port B address line (ADDR_B[14]) distinguishes the two memory spaces belonging to one DSP. ADDR_B[14] = '0' selects the wavelet transform buffer RAM, and ADDR_B[14] = '1' selects the coded coefficient buffer RAM. The data space inside each BLOCKRAM is divided into a flag word and a valid data area. Address zero holds the read/write status flag, which indicates the operating state of the dual-port RAM. When "AAAA" is written to address zero of WV_BLOCKRAM1, a wavelet data block has been written and is waiting to be read by the DSP. After reading the data, the DSP writes "BBBB" to address zero to indicate that the wavelet data has been read and the next wavelet data block can be written; in other words, it sends the FPGA a request to write data.
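The flag handshake at address zero and the ADDR_B[14] selection can be modeled as below. This is a toy Python model. It assumes "AAAA" and "BBBB" are the 16-bit hex words 0xAAAA and 0xBBBB and that the RAM resets to the "BBBB" (free) state; neither assumption is stated in the source, and the class, method and function names are hypothetical.

```python
WRITTEN, CONSUMED = 0xAAAA, 0xBBBB   # assumed flag words stored at address 0


class DualPortMailbox:
    """Toy model of one BLOCKRAM: word 0 is the status flag, words 1.. hold data."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.mem[0] = CONSUMED        # assumed reset state: buffer free for the FPGA

    def fpga_write_block(self, block):
        """Port A (FPGA): write a block only after the DSP has consumed the last one."""
        assert len(block) < len(self.mem)
        if self.mem[0] != CONSUMED:
            return False
        self.mem[1:1 + len(block)] = block
        self.mem[0] = WRITTEN         # "AAAA": block ready, waiting for the DSP
        return True

    def dsp_read_block(self, n):
        """Port B (DSP): read n words if a block is ready, then release the buffer."""
        if self.mem[0] != WRITTEN:
            return None
        data = self.mem[1:1 + n]
        self.mem[0] = CONSUMED        # "BBBB": the FPGA may write the next block
        return data


def select_ram(addr_b):
    """ADDR_B[14] = 0 selects the wavelet-coefficient RAM, 1 the coded-coefficient RAM."""
    return "coded" if (addr_b >> 14) & 1 else "wavelet"
```

The flag word turns the shared RAM into a one-slot mailbox, so neither side needs a separate interrupt line to know when the other has finished.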

Fig. 7 is the block diagram of priority queuing of DSP requests for data exchange with the FPGA in the system of the present invention. The FPGA continuously polls the data exchange request flags of the DSPs in the BLOCKRAMs (the external memory spaces of the DSPs), stores the requests, ranks them by priority, stores the sorted sequence numbers, and finally answers the requests one by one. A two-stage FIFO combined with priority control and sequencing performs request detection, storage, queuing and response, ensuring that the DSPs read data as efficiently as possible.

The request detection and generation module continuously polls address zero of the four BLOCKRAMs; when the data at address zero of a BLOCKRAM is "AAAA", the corresponding signal b* goes high. Because the four BLOCKRAMs may raise requests at the same time, whenever any request arises the four request states are encoded and buffered, i.e. the FIFO write data and write enable are generated. Fig. 8 shows request detection and generation: at the moment b1, b2, b3 or b4 goes from low to high, a one-clock-wide pulse S1, S2, S3 or S4 is generated to represent the corresponding request state, and this state is latched and stored in the FIFO. For example, in Fig. 8, when b3 goes high first, the request state is encoded as the combination of S1, S2, S3 and S4, i.e. "0010"; at the same time the write signal We_fifoa goes high and "0010" is written into FIFOA. Following the first-in, first-out principle, earlier requests are written first and answered first. If two or more requests arise simultaneously, for example b1 and b2 go high together, the request state is "1100", and the prioritization module decides whether to answer b1 or b2 first according to the agreed priority order. A FIFO buffer is used so that priority queuing does not become congested when two or more requests arrive in quick succession.
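Request detection in Fig. 8 amounts to rising-edge detection on b1 to b4 followed by packing the one-clock pulses into a 4-bit word, S1 leftmost. A clock-by-clock Python sketch, assuming each element of `trace` is the sampled (b1, b2, b3, b4) for one clock cycle; the names are hypothetical.

```python
def request_pulses(b_prev, b_now):
    """One-clock pulses S1..S4: 1 where b_i went from low to high this cycle."""
    return [int(p == 0 and n == 1) for p, n in zip(b_prev, b_now)]


def detect(trace):
    """Return the status words written to FIFOA for a sequence of (b1..b4) samples."""
    fifo_a, prev = [], (0, 0, 0, 0)
    for now in trace:
        s = request_pulses(prev, now)
        if any(s):                               # We_fifoa asserted only on a new request
            fifo_a.append("".join(str(bit) for bit in s))
        prev = now
    return fifo_a
```

A request that stays high produces only one word, so the FIFO records each request once rather than on every clock.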

While idle, the request priority sorting module uses the empty flag of FIFOA to determine whether any request has been stored. As soon as a request is present, it generates the read signal (Rd_a in Fig. 7), reads the request code (Dout_a in Fig. 7), sorts it according to the preset priority, and writes the sorted request identifiers (Din_b in Fig. 7) into FIFOB in order. The priority sorting module is written as a state machine that enumerates the 16 input states and their state transitions; for every non-zero state it generates the FIFOB write strobe and the corresponding write data. The priority queuing state transition diagram is shown in Fig. 9, where the 16 circles represent the 16 states: the 15 possible non-zero outputs of FIFOA plus the reset state "0000". The label "Dout = *" on each state gives the value that the state outputs to FIFOB, i.e. the output of the priority queuing module. For each input state, the transitions and outputs are as follows. When the reset signal arrives, the machine enters state "0000": there is no output, the FIFOB write signal is inactive and the FIFOA read is active (Wr=0, Rd=1). Outside reset, if FIFOA outputs "0001", the machine enters state "0001" and outputs Dout="4", meaning that the request issued by DSP4 enters the FIFOB buffer; at the same time the FIFOB write is active and the FIFOA read is inactive (Wr=1, Rd=0), and the machine then returns to "0000" to wait for the next non-zero FIFOA output. When FIFOA outputs "0010", the machine enters state "0010" and outputs Dout="3", meaning that the request issued by DSP3 enters the FIFOB buffer. If the input contains two or more "1" bits, for example "0011", the machine first outputs Dout="4" (DSP4 is answered first), then jumps to state "0010" and outputs Dout="3" (DSP3 is answered next), and finally returns to "0000". On input "0100", the machine outputs Dout="2", meaning that the request issued by DSP2 enters the FIFOB buffer. An input of "1111" means that all four DSPs raised requests at the same time; following the order DSP4, DSP3, DSP2, DSP1, the state moves through "1110", "1100", "1000" and "0000", producing four consecutive FIFOB write strobes with write data Dout="4", Dout="3", Dout="2" and Dout="1". By the same rule, the output and state transition for each of the 15 non-zero FIFOA outputs can be found in Fig. 9. Data read from FIFOA is valid only when FIFOA is non-empty. The FIFOB read enable is generated by checking whether the request response module is idle: if FIFOB is non-empty and is not being written, the FIFOB read signal is generated.
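The fixed-priority expansion performed by the Fig. 9 state machine can be sketched in Python as follows. This is a behavioural model under stated assumptions, not the FPGA code: codes are strings in S1..S4 order (S4 rightmost), priority is DSP4 > DSP3 > DSP2 > DSP1 as in the "1111" example, and `prioritise` and `drain_fifo_a` are hypothetical names.

```python
from collections import deque

def prioritise(code):
    """Expand one FIFOA code (S1..S4, S4 rightmost) into Dout values.

    Requests are served in the fixed order DSP4, DSP3, DSP2, DSP1; each
    served request clears its bit, mirroring the state transitions of Fig. 9
    (e.g. '1111' -> '1110' -> '1100' -> '1000' -> '0000').
    """
    state = list(code)
    dout = []
    for dsp in (4, 3, 2, 1):
        if state[dsp - 1] == "1":
            dout.append(dsp)       # FIFOB write strobe with data Dout = dsp
            state[dsp - 1] = "0"   # next state has this request bit cleared
    return dout

def drain_fifo_a(fifo_a):
    """Read FIFOA while it is non-empty and write the sorted Dout values to FIFOB."""
    fifo_b = deque()
    while fifo_a:
        fifo_b.extend(prioritise(fifo_a.popleft()))
    return fifo_b

print(list(drain_fifo_a(deque(["0010", "1100", "1111"]))))  # [3, 2, 1, 4, 3, 2, 1]
```

Clearing one bit per served request is what lets a single 16-state machine handle any combination of simultaneous requests without extra bookkeeping.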

When it is idle and the FIFOB write strobe in Fig. 7 is inactive, the request response module uses the empty flag of FIFOB to read the identifier stored in FIFOB and decodes it, generating the response signal d* and at the same time pulling the corresponding request signal b* low so that a new request can be detected. For example, if "0001" is read from FIFOB, d4 becomes '1' and b4 becomes '0'.
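A minimal sketch of the decode step, assuming (as in the example above) that a FIFOB entry is a one-hot string in S1..S4 order so that "0001" denotes DSP4; the signal lists and the function name `respond` are illustrative only.

```python
def respond(code, b, d):
    """Decode one FIFOB entry and update request/response signals in place.

    code: 4-character one-hot string in S1..S4 order ('0001' -> DSP4).
    b, d: lists of four 0/1 values for b1..b4 and d1..d4.
    Returns the number of the DSP being answered.
    """
    k = code.index("1") + 1
    d[k - 1] = 1   # response signal to DSP k
    b[k - 1] = 0   # drop the request so a new one from DSP k can be detected
    return k

b = [0, 0, 0, 1]
d = [0, 0, 0, 0]
print(respond("0001", b, d), b, d)   # 4 [0, 0, 0, 0] [0, 0, 0, 1]
```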

4. Read/write control of the wavelet coefficient dual-port RAM

In this embodiment of the invention, both the wavelet coefficient buffer unit and the coded coefficient buffer unit are implemented with dual-port RAM. The read/write control module of the wavelet coefficient dual-port RAM is partitioned as shown in Fig. 10. The read/write control of the dual-port RAM performs two functions. On one side, the A-port signal generation module writes the wavelet coefficient blocks read from the external SRAM into port A (address, data, enable and write enable, shown in Fig. 10 as addr_a, data_a, en_a and we_a), following the dual-port RAM write timing and driven by the strobe d and the address addr; it also reads the read/write status flag (data_a_out in Fig. 10) from port A and sends it to the priority control unit for request judgment and priority control. On the other side, the DSP reads the flag from port B (data_b_out in Fig. 10) through the B-port signal generation module and checks it: a value of "AAAA" means that the wavelet data block has been written and is waiting to be read. Once the DSP asserts read enable, the B-port signal generation module drives enable, address and clock (en, addr, clk) to read the wavelet coefficient block (data_dsp) from port B of the RAM, and then writes the "BBBB" flag (via data_dsp) to indicate that the data has been read and the next wavelet coefficient block may be written. The A-port and B-port signal generation modules must ensure that port A does not read while port B is writing and port B does not read while port A is writing, so that the data read is always correct. In addition, the B-port signal generation module uses the state of a high address line to determine whether a DSP read or write targets the wavelet transform FPGA or the optimization and truncation FPGA: Addr(14)='0' selects the wavelet transform FPGA, and Addr(14)='1' selects the optimization and truncation FPGA.
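The "AAAA"/"BBBB" handshake and the Addr(14) decode can be modelled in a few lines of Python. This toy model is an assumption-laden sketch: it places the flag word at address 0 (inferred from the zero-address polling described earlier), ignores clocking and port arbitration, and all class and function names are hypothetical.

```python
FLAG_FULL = 0xAAAA   # block written by the FPGA side (port A), waiting for the DSP
FLAG_READ = 0xBBBB   # block consumed by the DSP (port B), may be overwritten

class DualPortBuffer:
    """Toy model of one coefficient BLOCKRAM with its flag word at address 0."""

    def __init__(self, size):
        self.mem = [0] * size
        self.mem[0] = FLAG_READ          # the DSP writes "BBBB" at start-up

    def a_write_block(self, data):
        """Port A: store a block only if the DSP has released the previous one."""
        if self.mem[0] != FLAG_READ or len(data) > len(self.mem) - 1:
            return False
        self.mem[1:1 + len(data)] = data
        self.mem[0] = FLAG_FULL          # flag written last: block is complete
        return True

    def b_read_block(self, n):
        """Port B: read n words only once the "AAAA" flag is present."""
        if self.mem[0] != FLAG_FULL:
            return None
        data = self.mem[1:1 + n]
        self.mem[0] = FLAG_READ          # release the buffer for the next block
        return data

def target_fpga(addr):
    """Addr(14) = 0 selects the wavelet transform FPGA, 1 the truncation FPGA."""
    return "truncation" if (addr >> 14) & 1 else "wavelet"
```

Writing the flag only after the data, and clearing it only after the read, is what keeps each side from seeing a half-written block in this model.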

5. High-speed DSP EMIF interface control

To achieve high-speed data exchange between the DSP and the FPGA, the choice of control mode for the DSP's high-speed external memory interface and the design of a parallel processing mechanism for the DSP data stream are key. A DSP must use its external memory interface (EMIF) to access external memory. In this embodiment the DSP is a TMS320C6416, whose CPU clock can reach 1 GHz; it contains two external memory interfaces, EMIFA and EMIFB, and an EDMA (enhanced direct memory access) controller. In this embodiment the EMIFB interface is used to access external memory, with a 16-bit data bus in synchronous interface mode. The EDMA controller handles data transfers between the DSP on-chip memory L2 and external memory. In actual tests, the synchronous clock reached at most 140 MHz and the data transfer efficiency was close to 100%. With the DSP running at 850 MHz, the measured data exchange rate between one DSP and the FPGA is 70 × 16 Mbit/s (1120 Mbit/s), so the four DSPs together exchange data with the FPGA at up to 4.48 Gbit/s, and the equipment can sustain an image input rate of 30 M samples per second.
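The aggregate figure follows directly from the measured per-DSP rate. The short Python check below only reproduces that arithmetic; the function name is illustrative, and the inputs are the figures quoted above.

```python
def aggregate_rate_gbps(mtransfers_per_s, bus_bits, num_dsps):
    """Aggregate DSP<->FPGA rate in Gbit/s from a per-DSP transfer rate."""
    per_dsp_mbps = mtransfers_per_s * bus_bits   # 70 * 16 = 1120 Mbit/s per DSP
    return per_dsp_mbps * num_dsps / 1000.0

print(aggregate_rate_gbps(70, 16, 4))   # 4.48
```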

The flow by which a DSP processes wavelet coefficients is shown in Fig. 11. During DSP program initialization, the DSP first writes the "BBBB" flag to indicate that the FPGA may write wavelet coefficients, then waits for the "AAAA" flag and, once it appears, starts an EDMA read. During the EDMA read the program checks whether this is the first write; if so, it sets the "DDDD" flag directly (indicating that the BLOCKRAM inside the optimization and truncation FPGA has been read empty). After the EDMA read completes, the DSP waits for the "DDDD" flag and then starts the EDMA write. While a block of coded coefficients is being written, the DSP performs EBC (embedded bit-plane coding) on the wavelet coefficients of the current code block. When this coding ends, the DSP waits for both the end of the EDMA write and the "AAAA" flag (indicating that the BLOCKRAM inside the wavelet transform FPGA has been filled); once both conditions hold, it starts the EDMA read. While the next block of wavelet coefficients is being read, the DSP performs MQ entropy coding on the coded data of the current block, and when that coding ends it returns to waiting for the EDMA read to complete.
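One possible reading of the Fig. 11 flow is sketched below as an ordered list of steps for a given number of code blocks. It is an interpretation, not the DSP program: it assumes the coded data written back during EBC of block k belongs to block k-1, and that the first "wait DDDD" is satisfied by the flag set directly on the first pass. The function name `dsp_flow` is hypothetical.

```python
def dsp_flow(num_blocks):
    """Ordered steps one DSP performs for num_blocks code blocks (one reading of Fig. 11)."""
    steps = ["write BBBB", "wait AAAA", "EDMA read block 0",
             "set DDDD (first pass: nothing to write back yet)"]
    for k in range(num_blocks):
        steps.append(f"wait EDMA read done (block {k})")
        steps.append("wait DDDD")
        if k > 0:
            steps.append(f"EDMA write coded block {k - 1}")   # runs in background
        steps.append(f"EBC block {k}")                         # overlaps that write
        steps.append("wait EDMA write done")
        if k + 1 < num_blocks:
            steps.append("wait AAAA")
            steps.append(f"EDMA read block {k + 1}")           # runs in background
        steps.append(f"MQ block {k}")                          # overlaps that read
    steps.append("wait DDDD")
    steps.append(f"EDMA write coded block {num_blocks - 1}")
    return steps

for step in dsp_flow(2):
    print(step)
```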

Because EDMA moves data without occupying the CPU, EBC can run while an EDMA write is in progress and MQ coding can run while an EDMA read is in progress. The parallel mechanism between EDMA reads/writes and DSP processing is shown in Fig. 12. One complete processing cycle runs from the first vertical line to the third. At the first vertical line, once the EDMA read is found to be complete and the "DDDD" flag has been read (indicating that the BLOCKRAM inside the optimization and truncation FPGA has been read empty), the EDMA write is started and EBC (embedded bit-plane coding) begins immediately. If the EDMA is fast enough, the EDMA write has already finished when EBC ends at the second vertical line; if the "AAAA" flag has also been read by then (indicating that the BLOCKRAM inside the wavelet transform FPGA is full), the EDMA read and MQ entropy coding can start at once. When EDMA reads and writes are fast enough, the time the DSP spends coding wavelet data is unaffected by the transfer of wavelet data and coded streams, and depends only on the speed of the EBC and MQ coding.
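A simple timing model makes the last point concrete. Assuming each overlapped pair (EDMA write with EBC, EDMA read with MQ) takes as long as its slower member, the per-block cycle is the sum of two maxima; the durations below are made-up units chosen only to illustrate the formula.

```python
def cycle_time(t_ebc, t_mq, t_edma_write, t_edma_read):
    """Per-block DSP time when the EDMA write overlaps EBC and the EDMA read overlaps MQ."""
    return max(t_ebc, t_edma_write) + max(t_mq, t_edma_read)

def serial_time(t_ebc, t_mq, t_edma_write, t_edma_read):
    """Per-block time if transfers and coding were done one after another."""
    return t_ebc + t_mq + t_edma_write + t_edma_read

print(cycle_time(10, 8, 3, 4), serial_time(10, 8, 3, 4))   # 18 25
print(cycle_time(10, 8, 12, 4))                            # 20: EDMA write too slow
```

When both transfers are shorter than the coding steps, the cycle reduces to EBC time plus MQ time, matching the statement above.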

Embedded bit-plane coding (EBC) codes each bit-plane in three successive passes (significance propagation, magnitude refinement and cleanup), and the end of each pass is a truncation point. For a target bit rate R, the rate-distortion optimized truncation algorithm seeks a truncation point for each code block such that the total length of the code-block streams does not exceed R and the sum of the code-block distortions is minimized. By the Lagrange method, this constrained extremum problem reduces to independent minimizations over single code blocks: for a given rate-distortion threshold, each code block takes the largest truncation point whose distortion-rate slope (the reduction in distortion per unit of added rate) is still greater than the threshold. During coding, the distortion and rate at each truncation point of each code block are computed, giving a set of rate-distortion curves. After all code blocks have been coded, the truncation point of each code block and the corresponding total rate can be found for any common rate-distortion threshold. The optimization and truncation FPGA adjusts the threshold by bisection until the total rate at the chosen truncation points of all code blocks approaches the target bit rate R.
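The Lagrangian selection with bisection on the threshold can be sketched in Python as follows. This uses the textbook PCRD-opt formulation (per block, pick the point minimizing D + λR) in place of the FPGA's implementation, which the text does not detail. It assumes each code block supplies cumulative (rate, distortion) pairs per truncation point with point 0 meaning "nothing kept" at rate 0; the example data are invented.

```python
def choose_points(blocks, lam):
    """For a fixed multiplier lam, pick per block the point minimising D + lam * R."""
    return [min(range(len(b)), key=lambda n: b[n][1] + lam * b[n][0]) for b in blocks]

def total_rate(blocks, points):
    return sum(b[n][0] for b, n in zip(blocks, points))

def pcrd_bisect(blocks, budget, iters=60):
    """Bisect on lam until the total rate just fits within budget."""
    # Above the steepest slope (D0 - Dn) / Rn every block keeps nothing (rate 0).
    hi = 1.0 + max((b[0][1] - d) / r for b in blocks for r, d in b[1:] if r > 0)
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_rate(blocks, choose_points(blocks, mid)) <= budget:
            hi = mid      # feasible: a smaller lam spends more bits
        else:
            lo = mid
    return choose_points(blocks, hi)

blocks = [
    [(0, 100), (10, 40), (20, 20), (30, 15)],
    [(0, 80), (10, 30), (25, 10)],
]
points = pcrd_bisect(blocks, budget=30)
print(points, total_rate(blocks, points))   # [2, 1] 30
```

Because `hi` is only ever moved to a feasible multiplier, the returned truncation points always respect the rate budget.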

Details not described in this specification are known to those skilled in the art.

Claims

1. A JPEG2000 image compression processing system, characterized by comprising: an image wavelet transform unit, a wavelet coefficient read control unit, a wavelet coefficient buffer unit, a priority control unit, a wavelet coefficient coding unit, a coded coefficient buffer unit and a code stream optimization and truncation unit; the image wavelet transform unit decomposes the raw image data into wavelet coefficient blocks according to the JPEG2000 algorithm and sends them to a memory; the wavelet coefficient read control unit reads the wavelet coefficient blocks from the memory in a set scanning order and sends them to the wavelet coefficient buffer unit; the wavelet coefficient coding unit consists of one or more DSPs that encode the wavelet coefficient blocks, and each DSP sends data exchange requests to the priority control unit; the priority control unit sorts by priority the data exchange requests sent by the DSPs of the wavelet coefficient coding unit and controls each DSP to read the corresponding wavelet coefficient blocks from the wavelet coefficient buffer unit in priority order; the wavelet coefficient coding unit sends the coded wavelet coefficient blocks to the coded coefficient buffer unit; the code stream optimization and truncation unit reads from the coded coefficient buffer unit all coefficient blocks together with the truncation points and distortion values corresponding to each coefficient block, finds according to the Lagrange algorithm the set of truncation points that minimizes distortion at the specified compression bit rate, truncates part of the code stream of each coefficient block at those truncation points, RS-encodes it, splices it with the auxiliary information input from outside the system, and finally outputs the result to the outside of the system; the scanning order in which the wavelet coefficient read control unit reads the wavelet coefficient blocks from the memory is as follows: the 3n+1 subbands produced by the n-level wavelet transform of the image stored in the memory are scanned in the zigzag order LLn, HLn, LHn, HHn, HLn-1, LHn-1, HHn-1, ..., HL2, LH2, HH2, HL1, LH1, HH1; within each subband, all wavelet coefficient blocks are divided into four symmetric groups scanned in zigzag order, and each group is in turn divided into four symmetric parts, also scanned in zigzag order, until the resulting data blocks equal a predetermined size; each data block of the predetermined size is scanned in stripe order: the data block is divided into stripes of 4 rows, the stripes are scanned from top to bottom, the columns within a stripe are scanned from left to right, and each column is scanned from top to bottom.
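The three levels of the claimed scanning order can be sketched as Python generators. The sketch assumes that "zigzag" between quadrants means the Z order top-left, top-right, bottom-left, bottom-right, that the block grid inside a subband is a power of two on each side, and that the unit of the recursion is one predetermined-size data block; all names are hypothetical.

```python
def subband_order(n):
    """Subband scan order for an n-level transform: LLn, HLn, LHn, HHn, ..., HH1."""
    order = [f"LL{n}"]
    for level in range(n, 0, -1):
        order += [f"HL{level}", f"LH{level}", f"HH{level}"]
    return order

def quad_zigzag(row, col, size):
    """Yield block coordinates of a size x size region, split recursively into
    four quadrants visited top-left, top-right, bottom-left, bottom-right.
    Assumes size is a power of two and the unit block is the preset size."""
    if size == 1:
        yield (row, col)
        return
    h = size // 2
    for dr, dc in ((0, 0), (0, h), (h, 0), (h, h)):
        yield from quad_zigzag(row + dr, col + dc, h)

def stripe_scan(height, width, stripe=4):
    """Yield (row, col) sample positions in stripe order inside one data block."""
    for top in range(0, height, stripe):
        for col in range(width):
            for row in range(top, min(top + stripe, height)):
                yield (row, col)

print(subband_order(2))               # ['LL2', 'HL2', 'LH2', 'HH2', 'HL1', 'LH1', 'HH1']
print(list(quad_zigzag(0, 0, 4))[:6])  # [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (0, 3)]
```

For n levels, `subband_order` returns the 3n+1 subbands named in the claim.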

2. The JPEG2000 image compression processing system according to claim 1, characterized in that the priority control unit sorts the data exchange requests sent by the DSPs as follows: using a two-level FIFO, the request states sent by the DSPs are first encoded and buffered in FIFOA; the empty flag of FIFOA is then used to determine whether a request is stored, and if so, the encoded request information is read, sorted according to the preset priority and written in order into FIFOB; the encoded information stored in FIFOB is read according to the empty flag of FIFOB and decoded, and the priority control unit generates the response signals.

3. The JPEG2000 image compression processing system according to claim 1, characterized in that the wavelet coefficient buffer unit or the coded coefficient buffer unit is implemented with dual-port BLOCKRAM inside an FPGA.

4. The JPEG2000 image compression processing system according to claim 1, characterized in that each DSP of the wavelet coefficient coding unit exchanges data with the wavelet coefficient buffer unit or the coded coefficient buffer unit by enhanced direct memory access (EDMA) through the external memory interface.