CN1691782A

CN1691782A - High-performance code compression system of dynamic image information

Info

Publication number: CN1691782A
Application number: CNA2005100770229A
Authority: CN
Inventors: 国枝博昭; 一色刚; 李冬菊; 伊藤和人; 大塚友彦; 崔欧·阿迪恩; 查瓦雷特·宏沙卫克
Original assignee: Individual
Current assignee: Individual
Priority date: 2000-11-28
Filing date: 2001-11-26
Publication date: 2005-11-02
Also published as: JP3486871B2; JP2002165222A

Abstract

The present invention provides a high-performance code compression system that compresses and reduces the image information of a dynamic image for an image transmission line with a limited transmission capacity, shortens the transmission delay, and reduces the hardware constituting the system. A part of the image of high importance, e.g. a face including the lip movement that accompanies speech, is distinguished from a part of low importance, e.g. the background other than the person's face, and the information processing is weighted accordingly to raise the transmission efficiency. When the difference information of a B frame is forcibly set to zero, the image at the transmission source can be restored at the transmission destination from a minimum of information. A discrete cosine transform part and a quantization part can be omitted from the encoder, and compatibility of the high-performance code compression system with conventional systems is guaranteed.

Description

High-performance code compression system for dynamic image data

This application is a divisional application of application No. 01140078.1, filed on November 26, 2001, entitled "High-performance code compression system for dynamic image data".

[technical field]

The present invention relates to a code compression system for dynamic image data that lowers the bit rate, as required by video telephones and similar equipment. Image information contains important parts, such as the face of a person speaking on a video telephone, including the lip movement that accompanies speech, and less important parts, such as the background other than the face. The invention distinguishes between them and concentrates information processing on the important parts, thereby improving information transmission efficiency. At the same time, because existing telephone lines limit the available transmission capacity, the amount of image information must be compressed to the minimum so that it fits those lines while still conveying expressions close to natural conversation in step with the sound. In particular, the movement of the lips must be transmitted in synchrony with the voice, i.e. lip synchronization. The present invention relates to a high-performance code compression system for dynamic image data of this kind.

[background technology]

In conventional video telephones, the image information has to be cut down so that it fits within the amount of information the telephone line can carry. As a result the picture quality is poor and, compared with the moving picture of television, what is transmitted resembles a succession of nearly unchanging still pictures. Subjectively, it is as if a photograph of the caller's face were sent by facsimile (FAX) and arrived slightly later than the voice on the telephone. In such systems the image quality of each frame takes priority over the moving-picture function. To preserve still-picture quality, the number of frames transmitted per second has to be reduced well below the 25 frames per second of television in Europe and Asia (PAL, SECAM) or the 30 frames per second of television in Japan and other countries (NTSC); that is, the moving-picture capability that television has is largely sacrificed. Even with most frames dropped, the amount of information is still large compared with sound, and the time needed to process and transmit it delays reception, so the lip movement does not match the speaker's voice. Conversely, if the sound is forcibly synchronized with the delayed image, replies are also delayed and the conversation becomes as awkward as one relayed by communications satellite.

[summary of the invention]

Certainly, the images used by video telephones and the like need not pursue the image quality of a high-grade film projected in a theater. The original purpose of the video telephone is met if the viewer can see lip movement that matches what the speaker says and if, along with the lips, the parts other than the face are reproduced as a moving image with a fidelity on the order of 24 frames per second (film standard) or 25 or 30 frames per second (television standard). To that end, the signal processing of each part of the picture is carried out selectively, according to whether a specific region, such as the outline of the subject's face, is important. One object of the present invention is thus to compress the amount of information to the necessary minimum without spoiling the atmosphere of a video telephone conversation.

In keeping with the original purpose of the video telephone, the parts of the picture in which the recipient has little interest are distinguished from the eye-catching face, which readily attracts interest and is treated as the specific region. Another object of the present invention is to reproduce real motion at about 24 frames per second (film standard) or 25 or 30 frames per second (television standard) while keeping lip movement and sound consistent, that is, to achieve lip-synchronized transmission.

A further object of the present invention is to weight the processing of the video telephone picture according to the viewer's interest, thereby compressing the amount of dynamic image data that occupies a limited and expensive transmission line and raising the efficiency of information transmission.

To achieve the above objects, the technical problems to be solved by the present invention are as follows:

1. First, under the limit on the amount of information that the transmission line can handle, the transmitted information must be weighted and a limited amount of it selected, which requires a selection criterion for the weighting. If a window containing the important region of the person on the call is set as that criterion, the problem becomes how to make this window follow the movement of the person being shot.

Accordingly, the object is to define the window clearly as the selection criterion for weighting the dynamic image data of the image portion it contains, and to have the window follow the motion of the subject, so that the weighting is actually applied to the real dynamic image data.

2. In video telephone use, the main subject is not only the person's face. The dynamic image of the speaker's posture, gestures and other visible actions should also receive emphasized processing, i.e. weighting, so that it is real-time and clear. Conventional approaches, however, set no window that can follow the motion of parts such as the hands, so the actual dynamic image data of this part cannot be weighted.

Accordingly, the dynamic image of related actions such as the speaker's posture and gestures is also adopted as a selection criterion for weighting, in the form of a peripheral window. This peripheral window follows the gestures and other actions of the person being shot. The object is to weight the dynamic image data centered on the person's face together with the related posture and gestures.

3. A portable video telephone installed in an automobile captures a continuously moving background. Because the background moves fast, the amount of dynamic image information rises sharply, and how to cut it substantially is a problem to be studied. Neither a criterion for when to cut the information caused by violent background motion nor a suitable processing method has yet been established. The object is therefore to determine when this information should be cut substantially and to process it suitably, that is, to establish a means of selective reduction to the extent that the dynamic image of the unimportant background does not become visually objectionable.

4. In the coding scheme concerned, motion information is extracted from the dynamic image, a predicted image is synthesized by motion compensation from a decoded reference image, the difference image between the predicted image and the current image is obtained, and this difference image is compressed and transmitted. The problem is to establish a means of improving the quality of the dynamic image restored on the decoder side. In this connection, raising the compression ratio also increases the noise introduced when the difference image is compressed, which lowers the image quality on the decoder side. Furthermore, a predicted image synthesized from a decoded image of poor quality has lower prediction accuracy, so the amount of information in the difference image grows and the compression ratio must be raised further, which creates a vicious circle.

Accordingly, attention is paid to the B frame, whose predicted image is synthesized at decoding time from two frames, one before and one after it in time. A B frame is not used in decoding any other frame and therefore does not affect the image quality of other frames. Using this property, the difference image information of the B frame is forcibly set to 0 so that the information spent on B frames is kept to a minimum, and the bits saved are allotted to the I frames and P frames, which directly affect image quality. This is the object.

5. Image information is encoded and transmitted, and the decoded, restored image contains coding noise. As to reducing this noise, it is known that, given the visual characteristics of the human eye, noise reduction applied to the color difference signals is more effective than noise reduction applied to the luminance (gray-scale) signal. The aim is therefore to establish a means of effective noise reduction on the color difference signals.

The H.263+ standard does establish a "noise reduction method for color difference signals as a formal option", but that option does not provide for the encoded image information transmission system considered here. An object of the present invention is therefore to provide a "noise reduction means for color difference signals" with a simpler structure and a different method.

6. This concerns the mechanism that controls the bit rate of information compression in the encoder. It is needed for two reasons: first, to respect the limit on the amount of information the transmission line can carry; second, because the decoder restores the dynamic image at a constant rate, the bit length of each frame should be as uniform as possible.

For conventional bit-rate control, the dynamic image compression software based on the H.263+ standard offers several usable methods, described in Video Codec Test Model, Near-Term, Version 11 (hereinafter abbreviated TMN-11), issued by the International Telecommunication Union (ITU).

With conventional bit-rate control, however, a time delay and a loss of frames occur between the input of an image from the camera and the output of the decoded image via the encoder, the transmission line and the decoder. Conventional bit-rate control lacks the strict control needed to keep this delay and the number of lost frames to a minimum. The delay therefore becomes a problem: compared with the undelayed sound, the dynamic image showing mouth movement lags behind, so mouth movement and sound cannot be synchronized. Moreover, equalizing the bit length of each frame accurately requires very complicated calculation, and the computation itself inevitably adds to the delay.

Accordingly, the complicated calculation hitherto indispensable for accurately equalizing the bit length of each frame is replaced by simple calculation, which reduces the delay required for computation. The object is to provide a system that can achieve lip synchronization.

7. When a system is built by closely coupling many hardware modules to one another, their interdependence means that it hardly ever happens that all modules meet the design at once, so shortening the time needed to complete the design of the whole system is difficult. In addition, a design change in one place affects the whole design, and the design constraints are numerous.

Accordingly, the many hardware modules are not coupled to one another. There are no horizontal connections for data exchange between them; each is connected only vertically to a control center, which performs the control. Specifically, the input and output data of these hardware modules are all stored temporarily in memory, and the operating sequence of the modules and the shared input/output data are controlled by the control center. With this structure the hardware modules operate independently rather than depending on one another. Each module can therefore be designed independently and the design constraints are markedly reduced. More designers can share the design tasks, which achieves the object of shortening the time required to design the whole system.

8. This concerns the window memory-sharing processor array, i.e. the window MSPA (Memory Sharing Processor Array). "Window parallel processing" requires a means of inputting the "search data" and the reference data serially from external memory without lowering the "parallel efficiency", and no such method has existed so far.

Accordingly, the object is to establish a means of performing "window parallel processing" in which the "search data" and the reference data are input serially from the external memory without lowering the "parallel efficiency".

9. In encoders and decoders for dynamic image data, the chain from the two-dimensional discrete cosine transformer, the quantizer and the inverse quantizer to the inverse two-dimensional discrete cosine transformer is realized by suitably combining devices that implement two kinds of processing, a discrete cosine transform and quantization. Research in this area is under way. The structures for realizing them, however, are in theory still at an exploratory stage. In practice, what is needed is a new structure, not available before, that is matched to the method of transferring data to and from external memory, so as to find an approach that is efficient as a whole.

Here, data are read from the external memory, subjected to the two-dimensional discrete cosine transform and quantization, and then stored in the external memory. Similarly, data are read from the external memory, subjected to inverse quantization and the inverse two-dimensional discrete cosine transform, and stored in the external memory. The object is to establish an efficient means of doing so without lowering the data transfer rate.

To solve the above problems, the technical solutions of the present invention are as follows:

One. A window (21) is set for the specific region that moves arbitrarily within the picture of the dynamic image being processed and that is to receive priority in information processing. The whole image containing this window is divided into small rectangular blocks, the blocks are processed in turn, and the motion vectors of the dynamic image that accompany block motion are used to infer the position of the face window in the next frame, so that the window (21) can follow the motion of the subject.

In this way, the window (21) is clearly defined as the criterion for weighting the dynamic image data, and because the window (21) follows the motion of the subject, the dynamic image data can be weighted reliably.
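The window-following step described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of the patent: it assumes 16 x 16 blocks, assumes each block's motion vector is given as a displacement (dx, dy) from the previous frame to the current one, and assumes the window is simply shifted by the mean vector of the blocks that lie inside it. The function name predict_window and its data layout are hypothetical.

```python
def predict_window(window, motion_vectors, block_size=16):
    """Estimate the window position in the next frame.

    window         -- (x, y, width, height) in pixels
    motion_vectors -- dict mapping block index (bx, by) to displacement (dx, dy)
    Assumption: the window moves by the mean motion of the blocks whose
    top-left corner lies inside it; its size is left unchanged.
    """
    x, y, w, h = window
    inside = [
        (dx, dy)
        for (bx, by), (dx, dy) in motion_vectors.items()
        if x <= bx * block_size < x + w and y <= by * block_size < y + h
    ]
    if not inside:
        return window  # no motion information: keep the window where it is
    mean_dx = round(sum(dx for dx, _ in inside) / len(inside))
    mean_dy = round(sum(dy for _, dy in inside) / len(inside))
    return (x + mean_dx, y + mean_dy, w, h)
```

Blocks outside the window are ignored, so fast background motion does not drag the window away from the face.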

Two. Regions containing any change in an object part of somewhat lower importance than the main subject are identified on the condition that the difference between the previous frame and the current frame exceeds a prescribed threshold, and are set as a peripheral motion window (51), a specific region that also receives priority in information processing. The whole image is divided into small rectangular blocks, which are processed in turn, and the motion vectors of the dynamic image that accompany block motion are used to infer the position and extent of the peripheral motion window in the next frame, so that the peripheral motion window (51) can follow the changing region.

For example, the main subject is not only the person's face. The dynamic image data corresponding to object parts involved in actions such as posture and gestures are also weighted, and the peripheral motion window is clearly defined as the selection criterion for this weighting. Because the peripheral motion window follows the motion of the person's hands and so on, the dynamic image data can be weighted reliably, centered on the person's face and on the object parts involved in posture and gestures.
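A minimal sketch of the threshold test that seeds the peripheral motion window is given below. It assumes that the "difference between the previous frame and the current frame" is measured per block as a sum of absolute pixel differences; the text does not fix the measure, so this choice, the function name changed_blocks and the frame layout (lists of rows of gray values) are assumptions made for illustration.

```python
def changed_blocks(prev, curr, block_size, threshold):
    """Return the set of block indices (bx, by) whose frame difference
    exceeds the threshold.

    prev, curr -- frames as lists of rows of integer pixel values
    Assumption: the difference measure is the sum of absolute differences
    over the block.
    """
    height, width = len(curr), len(curr[0])
    changed = set()
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            diff = sum(
                abs(curr[y][x] - prev[y][x])
                for y in range(top, min(top + block_size, height))
                for x in range(left, min(left + block_size, width))
            )
            if diff > threshold:
                changed.add((left // block_size, top // block_size))
    return changed
```

The blocks returned here would form the candidate area of the peripheral window, which can then be moved from frame to frame with the block motion vectors.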

Three. Using the extent of the window (21) and the peripheral window (51), the subject is distinguished from the background. When the background moves so violently that the amount of dynamic image data increases, the motion of the background is reduced by a calculation that lowers the background image quality: the "macro block" image of the current frame is mixed, by addition in a certain ratio, with the data of the previous frame at the same position as that "macro block". This filter in the time direction reduces the amount of dynamic image data of the background.

In this way it is determined when the information caused by violent background motion should be cut substantially, and suitable processing is applied at the same time, so that the dynamic image of the unimportant background is selectively reduced to the extent that it does not become visually objectionable.
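The time-direction filter can be illustrated with the short sketch below. The mixing ratio is not specified in the text, so the parameter alpha (the share taken from the current frame) is an assumption, as are the function name temporal_blend and the representation of a macroblock as a list of rows.

```python
def temporal_blend(curr_mb, prev_mb, alpha):
    """Mix a background macroblock with the co-located macroblock of the
    previous frame.

    alpha -- assumed weight of the current frame, 0.0 <= alpha <= 1.0;
             a small alpha suppresses background motion more strongly.
    """
    return [
        [round(alpha * c + (1.0 - alpha) * p) for c, p in zip(curr_row, prev_row)]
        for curr_row, prev_row in zip(curr_mb, prev_mb)
    ]
```

Because the filtered background changes less from frame to frame, the difference image that has to be coded for it becomes smaller.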

Four. The encoder is made up of the following modules: a motion prediction module, which receives the image of the current frame, the image of the preceding reference frame and the image of the following reference frame, performs motion prediction and motion compensation, and determines the prediction mode; an all-pixel zeroing module, which receives the difference signal between the predicted image output by the motion prediction module and the image of the current frame, and forcibly sets every pixel value of this difference signal to zero; and a coding synthesis module, which receives the zeroed pixel information output by the all-pixel zeroing module and encodes it using the prediction mode that the motion prediction module determined for predicting the next motion of the dynamic image.

The decoder is made up of the following modules: a decoding module, which receives via the transmission line the dynamic image data that the encoder has compression-coded and sent, and decodes them; an inverse quantization module, which inverse-quantizes the signal output by the decoding module; an inverse discrete cosine transform module, which applies the inverse discrete cosine transform to the signal output by the inverse quantization module and thereby restores the difference image; and an adder module, which adds the restored difference image to the predicted image obtained in the above prediction mode and outputs the restored image. Equipped with such an encoder and decoder, the system carries out the processing of B frames.

Thus, in the coding of B frames, in which the information of the current frame is compressed as a difference signal obtained from the preceding and following reference frames, the difference signal itself is not transmitted as in the conventional method. Instead, only the type of difference calculation is transmitted, i.e. whether forward prediction, backward prediction or bidirectional prediction is used, and this establishes the method of compressing B-frame information. When the motion of the dynamic image is violent, the amount of information in the difference image becomes very large. To minimize it, all pixels are forcibly set to zero, so that the image is transmitted with minimum information and can still be restored at the receiver.
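The following sketch illustrates the idea of sending only the prediction mode for a B-frame block. It is a simplified model under several assumptions: blocks are flat lists of pixel values, the two reference blocks are taken to be already motion-compensated, the bidirectional prediction is a rounded average, and the mode is chosen by the smallest sum of absolute differences. None of these details, nor the function names, are taken from the patent text.

```python
MODES = ("forward", "backward", "bidirectional")


def predict(fwd, bwd, mode):
    """Build the predicted block from two motion-compensated reference blocks."""
    if mode == "forward":
        return list(fwd)
    if mode == "backward":
        return list(bwd)
    if mode == "bidirectional":
        return [(f + b + 1) // 2 for f, b in zip(fwd, bwd)]  # rounded average
    raise ValueError(f"unknown prediction mode: {mode}")


def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))


def encode_b_block(cur, fwd, bwd):
    """Return only the prediction mode; the residual is forced to zero and
    is therefore never transformed, quantized or transmitted."""
    return min(MODES, key=lambda m: sad(cur, predict(fwd, bwd, m)))


def decode_b_block(mode, fwd, bwd):
    """With a zero residual, the decoded block is the predicted block itself."""
    return predict(fwd, bwd, mode)
```

Since the residual is always zero, the encoder side of this path needs no discrete cosine transform or quantizer, while a standard decoder still reconstructs the block by adding a zero difference image to the prediction.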

Five. When the predicted image is not used and only the current macroblock is coded directly (an "intra macroblock"), both the luminance signal and the chrominance signals are quantized with rounding to the nearest level. For macroblocks other than intra macroblocks, the luminance signal is quantized with truncation, while the chrominance signals are still quantized with rounding. The same quantization level is used for the luminance signal and the color difference signals, and in this way the noise of the color difference signals is reduced.

In this way, for an encoder that converts one luminance signal and two color difference signals by quantization and a decoder that decodes them with the same quantization level, a quantization method is established that lessens color difference noise and improves the visual quality of the decoded image. With a structure simpler than that of the previous method, it effectively reduces the noise of the color difference signals, so that an image of high quality in terms of the visual characteristics of the human eye is obtained.
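The rounding rule above can be sketched as below. The quantizer step of 2 x QP is an assumption loosely modeled on H.263-style uniform quantization; the text only states which signals are rounded and which are truncated, so the step size and the function names are illustrative.

```python
def quantize(coeff, qp, rounding):
    """Uniform quantization of one coefficient with an assumed step of 2 * qp.

    rounding=True  -- round the magnitude to the nearest level
    rounding=False -- truncate the magnitude toward zero
    """
    step = 2 * qp
    magnitude = abs(coeff)
    level = (magnitude + step // 2) // step if rounding else magnitude // step
    return level if coeff >= 0 else -level


def quantize_coeff(coeff, qp, is_intra, is_chroma):
    """Apply the rule from the text: truncate only inter-coded luminance;
    round intra luminance and all chrominance coefficients."""
    return quantize(coeff, qp, rounding=is_intra or is_chroma)
```

With the same QP, a small chrominance coefficient in an inter macroblock survives quantization where a luminance coefficient of the same size is truncated to zero, which is how the chrominance signals keep more precision without a separate quantization level.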

Six. A comparison means compares the bit amount of the encoded dynamic image data with the remaining bit amount of the communication buffer. On the basis of the comparison result, a control means controls the target bit amount of the frame so that the remaining bit amount is not exhausted. Using the result of this control, the delay that arises for each frame between the input of the image from the camera and the output of the decoded image via the encoder, the transmission line and the decoder, and the number of frames dropped, are kept to a minimum. This is the method of calculating the target bit amount of each frame in frame-level bit-rate control.

Seven. Two calculation means are provided. The first-step calculation means computes the quantization level used for the first "macro block" of a frame from the weighted average of the quantization levels of the "macro blocks" of the previous frame. The second-step calculation means computes the adjustment of the quantization level applied to the subsequent "macro blocks" from the target bit amount, the actual amount of code produced up to the current "macro block", and the quantization level used for the first "macro block".

Thus, in items Six and Seven above, the delay that arises in encoding the image, transmitting the information and decoding must be kept to a minimum, and so must the loss of frames. This requires finding the optimum target number of coded bits for each frame (hereinafter, frame-level bit-rate control) and then adjusting the quantization level of each subsequent "macro block" to the most suitable value (hereinafter, "macro block"-level bit-rate control). The method needs little computation yet provides good control. The very complicated calculation formerly indispensable for accurately equalizing the bit length of each frame can thus be replaced by simple calculation, which reduces the delay caused by computation and achieves lip synchronization.
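A compact sketch of the two control levels is given below. The formulas are assumptions chosen to be simple, since the text gives no equations: the frame target is the nominal bits per frame capped by the free space of the communication buffer, the first quantization level is a (by default equally) weighted average of the previous frame's levels, and later levels are corrected in proportion to the relative overshoot of the bits spent so far. The range 1 to 31 is the quantizer range of H.263; all function and parameter names are hypothetical.

```python
QP_MIN, QP_MAX = 1, 31  # quantizer range of H.263


def clamp_qp(qp):
    return max(QP_MIN, min(QP_MAX, qp))


def frame_target_bits(bit_rate, frame_rate, buffer_bits, buffer_size):
    """Frame level: nominal bits per frame, capped so the buffer cannot overflow."""
    nominal = bit_rate // frame_rate
    free = buffer_size - buffer_bits
    return max(0, min(nominal, free))


def first_qp(prev_qps, weights=None):
    """Macroblock level, step 1: weighted average of the previous frame's levels."""
    if weights is None:
        weights = [1] * len(prev_qps)
    total = sum(q * w for q, w in zip(prev_qps, weights))
    return clamp_qp(round(total / sum(weights)))


def next_qp(qp_first, target_bits, bits_used, mbs_done, mbs_total, max_step=2):
    """Macroblock level, step 2: adjust the level by the relative overshoot."""
    expected = target_bits * mbs_done / mbs_total
    if expected <= 0:
        return clamp_qp(qp_first + max_step)  # no budget left: quantize coarsely
    error = (bits_used - expected) / expected
    delta = max(-max_step, min(max_step, round(error * max_step)))
    return clamp_qp(qp_first + delta)
```

Each function uses only a few additions, multiplications and one division per frame or macroblock, which is the kind of low-cost calculation the text contrasts with the more elaborate rate control of the test model.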

Eight. The encoder comprises a memory that stores image frame information and hardware modules that operate independently of one another and are connected by a data bus. A central control module, the so-called address generation unit (AGU), controls the data flow between the memory and the hardware modules and their operation timing. The AGU controls and links each hardware module through a control bus, and this forms the system structure.

In this way, hardware modules realized as small-scale integrated circuits that are flexible, fast and power-saving are joined by buses, and the AGU controls their operation timing and the data flow to and from memory, which establishes a system structure well suited to dynamic image compression.

Nine. A central control module is provided as follows. The image is divided into many blocks, and after the coordinate information of these blocks has been processed, the data are stored in an external memory whose address structure suits this processing. A read-only memory (ROM) in the central control module holds the instruction programs that control the execution of each hardware module, and these instructions can generate memory addresses in units of "macro blocks", in units of blocks, and in units of pixels. This constitutes the system structure.

Thus the processor acting as central controller handles all operations, such as starting and ending the operation of each hardware module and controlling data input to and output from memory, and can generate memory addresses in units of "macro blocks", blocks and pixels. The hardware modules therefore operate independently rather than depending on one another, and the design constraints are markedly reduced. More designers can share the design tasks, and the time needed to design the whole system is shortened.
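The address generation in macroblock, block and pixel units can be illustrated as follows. The memory layout is an assumption: a raster-scanned luminance frame, 16 x 16 macroblocks, and four 8 x 8 blocks per macroblock numbered 0 to 3 in raster order. The actual address structure of the external memory is not given in this section, and the function name pixel_address is hypothetical.

```python
MB_SIZE = 16   # assumed macroblock size in pixels
BLK_SIZE = 8   # assumed block size in pixels


def pixel_address(mb_x, mb_y, blk, px, py, frame_width, base=0):
    """Linear address of one luminance pixel in a raster-scanned frame.

    mb_x, mb_y -- macroblock coordinates
    blk        -- block number 0..3 inside the macroblock (raster order)
    px, py     -- pixel coordinates 0..7 inside the block
    """
    x = mb_x * MB_SIZE + (blk % 2) * BLK_SIZE + px
    y = mb_y * MB_SIZE + (blk // 2) * BLK_SIZE + py
    return base + y * frame_width + x
```

With a central unit computing addresses this way, a hardware module only needs to be told which macroblock or block to process; it never has to know where another module placed its data.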

Ten. The following modules and means are provided: the external memory; a buffer that receives the "macro block" data in the memory serially, macroblock by macroblock, and converts the data format; a 32-bit parallel array of interconnected processor units, to which the data output from this buffer are supplied through three terminals and which is called the window memory sharing processor array architecture; computing means that perform very high-speed operations on the data in these processor units; and a motion vector search circuit, which searches for the motion vector indicating from where in the previous frame the "macro block" of the current frame has moved.

Among the various hardware modules, the motion vector search circuit accounts for more than 80% of the total processing, and a structure has therefore been established in which a number of processors operate efficiently. With this structure the window can be processed in parallel, with the search data and the reference data input serially from the external memory, without lowering the efficiency of high-speed parallel operation.
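The computation that the motion vector search circuit performs can be written down serially as a full-search block matching, as in the sketch below. This is a software illustration only: the processor array described in the text evaluates many candidate positions in parallel, whereas this code visits them one by one. The sum of absolute differences as matching cost, the search range parameter and the function names are assumptions.

```python
def sad_block(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between the n x n block of the current
    frame at (cx, cy) and the n x n block of the reference frame at (rx, ry)."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, cx, cy, n, search_range):
    """Return ((dx, dy), cost) of the best match inside the search window.

    The vector points from the block in the current frame to the position
    in the reference (previous) frame that it most likely came from.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate block would leave the frame
            cost = sad_block(cur, ref, cx, cy, rx, ry, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

The two nested loops over candidate positions, each with an n x n inner sum, show why this stage dominates the processing load and why it is the natural target for a parallel array.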

Eleven. The following modules and means are provided: the external memory; two sets of data-format conversion means, which receive in turn from the external memory the data of a "macro block" made up of 8 horizontal × 8 vertical = 64 pixels and convert the serial data into parallel data without lowering the data transfer rate; a processor array that applies the two-dimensional discrete cosine transform to the parallel data supplied by the data-format conversion means; and a quantization module, which receives the output of the processor array after the two-dimensional discrete cosine transform, quantizes it, and outputs the data for storage in the external memory.

Thus, next to the motion vector prediction, there are further modules that require a large amount of computation, such as the discrete cosine transform and quantization circuits, and the inverse quantization circuit and inverse discrete cosine transform module that perform the inverse operations. In these modules, the data held in the external memory must be read serially, processed in parallel and stored without interfering with the high-speed internal operation. A means has thus been established for inputting pixel data serially from the external memory and then processing them in parallel without lowering the high data transfer rate or the efficiency of parallel processing.
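For reference, the arithmetic that the transform and quantization modules carry out on one 8 x 8 block can be sketched in software as follows. The sketch uses the orthonormal DCT-II computed separably (rows, then columns) and a uniform quantizer with an assumed step of 2 x QP; it says nothing about the serial-to-parallel data path of the hardware, and the function names are hypothetical.

```python
import math

N = 8  # block size


def dct_1d(v):
    """Orthonormal one-dimensional DCT-II of a sequence of N values."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        acc = sum(v[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N))
        out.append(scale * acc)
    return out


def dct_2d(block):
    """Two-dimensional DCT as two passes of the one-dimensional transform."""
    rows = [dct_1d(row) for row in block]                              # horizontal pass
    cols = [dct_1d([rows[y][u] for y in range(N)]) for u in range(N)]  # vertical pass
    return [[cols[u][v] for u in range(N)] for v in range(N)]          # coef[v][u]


def quantize_block(coef, qp):
    """Uniform quantization with an assumed step of 2 * qp."""
    return [[round(c / (2 * qp)) for c in row] for row in coef]
```

The separable form is what makes it possible to build the two-dimensional transform from one-dimensional units: each pass handles eight independent rows or columns, which can be distributed over parallel processors.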

Compared with the prior art, the present invention has the following advantageous effects:

1. According to item One of the above summary of the invention (the words "of the above summary of the invention" are omitted below), the window 21 is defined as the criterion needed for weighting the dynamic image data, and because the window follows the motion of the person being shot, the dynamic image data can be weighted reliably.

2, According to the second item, a peripheral motion window is defined as the criterion for weighting the dynamic image data not only of the person's face, the main subject, but also of moving parts such as posture and gestures. Because the peripheral motion window follows the motion of the hands and the like, the dynamic image data can be weighted reliably for the face and, at the same time, for the parts that move with posture and gestures.

3, According to the third item, when the background moves violently its amount of information is cut substantially, and with suitable processing the dynamic image data of the unimportant background can be reduced to a degree that is not visually objectionable.

4, According to the fourth item, when the motion of the dynamic image is very violent, the amount of information in the difference image increases greatly. For the B frame, the amount of difference-image information should be kept as small as possible; by forcing all pixel values of the difference to zero, the image of the transmission source can be restored at the receiver with a minimum of information.

5, According to the fifth item, a structure simpler than the conventional one markedly reduces the noise of the color difference signals, so that, given the characteristics of human vision, an image of visually high quality is obtained.

6, According to the sixth and seventh items, the very complicated calculation formerly indispensable for equalizing the bit length of each picture frame with high precision is replaced by a simple calculation. This reduces the delay caused by that calculation and makes "lip synchronization" possible.

7, According to the eighth and ninth items, the design of each module is independent and design constraints are markedly reduced; more designers can share the different design tasks, which shortens the time needed to design the whole system.

8, According to the tenth item, window-parallel processing can be carried out by serially inputting the search data and the reference data from the external memory, without reducing the efficiency of high-speed parallel operation.

9, According to the eleventh item, the two-dimensional discrete cosine transform and quantization, and the inverse quantization and two-dimensional inverse discrete cosine transform, can be carried out on data input serially from the external memory without reducing the data transfer rate.

Constituted as described above, the high-performance code compression system for dynamic image data of the present invention makes the most effective use of the existing telephone lines that predate optical fiber communication. That is, the system is built on the premise of a line of limited, fixed bandwidth designed to carry only voice efficiently. Even within the information transmission capacity prescribed for the existing telephone network (including wireless), practical transmission of video images is achieved, with sound delivered in lip synchronization. Because electromagnetic waves travel at the speed of light, the propagation delay of the image transmission is negligible, so unless the distance is astronomical, a near-natural video conversation can be held.

The present invention is effective not only for the existing telephone network but also for future optical networks. That is, on high-speed, large-capacity transmission lines, using videophones based on the present invention can increase the number of simultaneous calls, bringing convenience to a great many households.

As another example, if the present invention is applied to a video recording and playback apparatus built around a 128 Mbit DRAM, a dynamic image including sound, compressed to a data rate of 34 kbit per second, can be recorded and played back for one hour. The tape drive of a conventional video tape recorder is then unnecessary, and such a recording and playback device can be manufactured very cheaply. For the same reason, ROM-based image playback devices can easily become widespread, with applications including children's toys, audio-visual teaching materials, daily necessities, and public facilities; the range of applications and the convenience are very wide.
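The one-hour figure can be checked with simple arithmetic. The sketch below assumes that "128M bit" means 128 × 2^20 bits and that "34K bit" means 34,000 bits; both readings, and the function names, are assumptions made for this example.

```python
DRAM_BITS = 128 * 2**20     # assumed meaning of "128M bit"
BIT_RATE = 34_000           # assumed meaning of "34K bit" per second
ONE_HOUR = 3600             # seconds


def bits_needed(bit_rate, seconds):
    """Storage needed to hold a stream of the given rate and duration."""
    return bit_rate * seconds


def recording_seconds(capacity_bits, bit_rate):
    """Whole seconds of stream that fit in the given capacity."""
    return capacity_bits // bit_rate


print(bits_needed(BIT_RATE, ONE_HOUR))          # 122400000
print(recording_seconds(DRAM_BITS, BIT_RATE))   # 3947
```

Under these assumptions one hour of compressed video and sound needs 122.4 million bits, which fits in the DRAM with a margin of a few minutes.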

[description of drawings]

Fig. 1 is an explanatory diagram of the face window.

Fig. 2 is an explanatory diagram of the face window and the peripheral motion window.

Fig. 3 is an explanatory diagram of the method of reducing the amount of background information.

Fig. 4 is an explanatory diagram of the method of reducing the amount of information by B-frame processing.

Fig. 5 is a table of the conditions and variables relating to frame-level rate control.

Fig. 6 is a flow chart of frame-level rate control.

Fig. 7 is a table of the variables and constants relating to "macro block"-level rate control.

Fig. 8 is a flow chart of "macro block"-level rate control.

Fig. 9 is an explanatory diagram of the definition of the function CQ(x) in "macro block"-level rate control.

Figure 10 is a flow chart of updating the quantization level q in "macro block" (i).

Figure 11 is a functional block diagram of the encoder.

Figure 12 is a functional block diagram of the decoder.

Figure 13 is a functional block diagram of the combined encoder/decoder.

Figure 14 is a functional block diagram of the central control device (AGU).

Figure 15 is a diagram of the memory area structure of the external memory.

Figure 16 is a circuit diagram of the motion vector search.

Figure 17 is a functional block diagram of the discrete cosine transformer and quantizer.

Figure 18 is a functional block diagram of the data format converter.

Figure 19 is a functional block diagram of the discrete cosine transform/inverse converter.

Figure 20 is a functional block diagram of the input processing section in the discrete cosine transform/inverse converter.

Figure 21 is a functional block diagram of the output processing section in the discrete cosine transform/inverse converter.

Explanation of the reference numerals in the drawings:

1. memory

2. data/address bus

3. AGU

4. control bus

8, 9. ROM

10. motion vector search part

21. window

22. background image

39, 57. buffers

41. motion prediction function part

42. all-pixel zeroing function part

45. code generation part

46. decoding part

47. inverse quantization part

48. inverse discrete cosine transform part

44, 49. addition part

51. peripheral motion window

55. data format converter

56. discrete cosine transform/inverse converter

101, 102, ..., 132. processor units

[detailed description of the embodiments]

Embodiments of the present invention are described below with reference to Fig. 1 to Figure 21.

Fig. 1 is an explanatory diagram of the window. The face window 21 is distinguished from the background image 22 and the information is weighted accordingly: the face window 21 is given priority, while the image quality of the background image 22 is deliberately lowered to reduce its amount of information. A mechanism that makes the face window 21 follow the motion of the moving face keeps the window continuously updated to the latest position.

The face window (21) is the specific region, moving arbitrarily within the picture of the dynamic image, that is identified and given priority in information processing. All the pixels that constitute it are divided into small rectangular blocks, and the motion vectors that accompany the motion of these blocks are used to estimate the position of the face window in the next frame, so that the face window (21) follows the motion of the subject.

When communication starts, a face window 21 of fixed shape is set at the center of the picture. Among the motion vectors of the "macro blocks" inside window 21, the mean of the non-zero vectors is calculated, and this mean is taken as the displacement of window 21, by which its position is updated.
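The update rule described here can be sketched in Python as follows. The data layout (a window as `(left, top, width, height)` in pixels, motion vectors keyed by the top-left pixel of each 16 × 16 "macro block"), the rule that a "macro block" counts as inside only when it lies wholly within the window, and the rounding of the mean to whole pixels are all assumptions made for this example.

```python
MB = 16  # macro block size in pixels


def update_window(window, motion_vectors):
    """Shift the window by the mean of the non-zero motion vectors inside it.

    window         -- (left, top, width, height) in pixels (hypothetical layout)
    motion_vectors -- {(x, y): (dx, dy)} keyed by macro block top-left pixel
    """
    left, top, width, height = window
    inside = [
        (dx, dy)
        for (x, y), (dx, dy) in motion_vectors.items()
        if left <= x and x + MB <= left + width
        and top <= y and y + MB <= top + height
        and (dx, dy) != (0, 0)          # zero vectors are excluded from the mean
    ]
    if not inside:
        return window                   # nothing moved: keep the position
    mean_dx = sum(dx for dx, _ in inside) / len(inside)
    mean_dy = sum(dy for _, dy in inside) / len(inside)
    return (left + round(mean_dx), top + round(mean_dy), width, height)


vectors = {(48, 32): (4, 2), (64, 32): (2, 0), (80, 48): (0, 0), (0, 0): (10, 10)}
print(update_window((48, 32, 64, 64), vectors))  # (51, 33, 64, 64)
```

In the usage example only the two non-zero vectors inside the window contribute, so the window moves by their mean, (3, 1).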

If the person's face is not initially at the center of the picture, so that window 21 and the face do not coincide, then as soon as part of the face carrying motion vectors enters window 21, the motion vectors act on window 21 and draw it gradually toward the face until the two finally coincide.

In this way window 21 is defined as the selection criterion for weighting the dynamic image data. Because window 21 can follow the motion of the face, the dynamic image data can be weighted effectively.

The following are definitions, prescribed by the ITU standards, relating to block-based processing. Pixel counts below are given as vertical × horizontal. A frame of a dynamic image is made up of minimum units called blocks, each comprising 8 × 8 pixels. The region formed by a 16 × 16 pixel grey scale signal together with the two 8 × 8 pixel color difference signals Cr and Cb is called a "macro block". A "macro block" thus consists of six blocks in total: four adjacent grey scale (Y) blocks and one block each for the color difference signals Cr and Cb.

The standard frame size adopted in the above ITU standards is the Quarter Common Intermediate Format (QCIF), consisting of a grey scale signal of 144 × 176 pixels and two color difference signals Cr and Cb of 72 × 88 pixels each. There are many other formats as well, such as CIF (Y: 288 × 352 pixels, Cr/Cb: 144 × 176 pixels) and 4CIF (Y: 576 × 704 pixels, Cr/Cb: 288 × 352 pixels). The present invention covers every frame size permitted by the above ITU standards, and operates within the limits that the standards prescribe for signal processing in units of the above macro blocks and blocks. Improving image quality within these constraints is the guiding principle of the present invention.
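To make the block arithmetic concrete, the sketch below counts the "macro blocks" and 8 × 8 blocks in a frame of each format. Frame sizes are the grey scale (Y) dimensions as rows × columns, with 4CIF taken as 576 × 704, its standard size; the dictionary and function names are illustrative only.

```python
MB_SIZE = 16        # a macro block covers 16 x 16 grey scale pixels
BLOCKS_PER_MB = 6   # 4 Y blocks + 1 Cr block + 1 Cb block

# Grey scale (Y) frame sizes as (rows, columns).
FORMATS = {"QCIF": (144, 176), "CIF": (288, 352), "4CIF": (576, 704)}


def macroblock_grid(rows, cols):
    """Number of macro block rows and columns in a frame."""
    return rows // MB_SIZE, cols // MB_SIZE


def macroblock_count(rows, cols):
    mb_rows, mb_cols = macroblock_grid(rows, cols)
    return mb_rows * mb_cols


for name, (rows, cols) in FORMATS.items():
    count = macroblock_count(rows, cols)
    print(name, macroblock_grid(rows, cols), count, count * BLOCKS_PER_MB)
```

A QCIF frame, for example, is a 9 × 11 grid of 99 "macro blocks", or 594 blocks of 8 × 8 pixels.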

Fig. 2 is an explanatory diagram of the face window and the peripheral motion window. When the face moves relatively little, the viewer's attention shifts to the surroundings of the face, for example to the hands, and the peripheral motion window addresses this case. Concretely, the difference between the image of the previous frame and the image of the current frame is obtained, and where the difference exceeds a prescribed threshold, the objects next in importance after the face, such as the arms, are enclosed. That is, an arbitrarily changing region made up of large pixel blocks of 16 × 16 pixels (hereinafter "macro blocks") or pixel blocks of 8 × 8 pixels (hereinafter "blocks") is defined as window 51 and given priority in information processing.
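A minimal Python sketch of this selection step follows. It assumes that the "difference value" is the mean absolute pixel difference over each 16 × 16 "macro block" and that frames are lists of rows of grey scale values; the text does not fix either detail, and the function name is hypothetical.

```python
def changed_macroblocks(prev, curr, threshold, mb=16):
    """Return the (row, col) indices of macro blocks whose mean absolute
    frame difference exceeds the threshold."""
    rows, cols = len(curr), len(curr[0])
    selected = set()
    for top in range(0, rows - mb + 1, mb):
        for left in range(0, cols - mb + 1, mb):
            diff = sum(
                abs(curr[r][c] - prev[r][c])
                for r in range(top, top + mb)
                for c in range(left, left + mb)
            )
            if diff / (mb * mb) > threshold:
                selected.add((top // mb, left // mb))
    return selected


# Two 32 x 32 frames; only the lower-left macro block changes.
previous = [[0] * 32 for _ in range(32)]
current = [[0] * 32 for _ in range(32)]
for r in range(16, 32):
    for c in range(16):
        current[r][c] = 50
print(changed_macroblocks(previous, current, threshold=10))  # {(1, 0)}
```

The set of selected "macro blocks" can take any shape, which matches the description of window 51 as an arbitrarily changing region.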

The whole image inside peripheral motion window 51 is likewise divided into small rectangular blocks, and the motion vectors that accompany the blocks of the dynamic image are used to estimate the position and extent of the peripheral motion window in the next frame, so that peripheral motion window 51 follows the arbitrarily changing region. Peripheral motion window 51 differs from face window 21 in that it is, in principle, computed from the difference between the previous frame image and the current frame image, as the area in which that difference exceeds the corresponding threshold; it can also take any suitable shape.

Peripheral motion window 51 lies near face window 21. Its threshold can be changed as appropriate, both when the face is almost motionless and when the face moves violently, so that the window covers the area around the face. In particular, when the hands move, the peripheral motion window covers their motion. In this way, peripheral motion window 51 is defined as the criterion for weighting the dynamic image data not only of the face, the main subject, but also of moving parts such as posture and gestures; and because peripheral motion window 51 follows the motion of the hands and the like, the dynamic image data can be weighted effectively for the face and, at the same time, for the parts that move with posture and gestures.

Fig. 3 is an explanatory diagram of the method of reducing the amount of background information. As explained with Fig. 2, the face window 21 and the peripheral motion window 51 are both distinguished from the background. When the background moves violently and therefore generates a large amount of dynamic image data, this algorithm deliberately lowers the image quality of the background by reducing the amount of background motion. That is, each "macro block" of the current frame is mixed, in a fixed proportion, with the "macro block" at the same position in the previous frame; this temporal filter (not illustrated) reduces the dynamic image data of the background.

Face window 21 and peripheral motion window 51 are treated together as the regions of emphasis, while the background image 22 outside them is filtered in the time direction to deliberately lower its image quality and reduce its amount of information. Here the "macro block" of the current frame is not processed as it is; a derived "macro block" is processed in its place. The derived "macro block" is obtained by weighting and averaging the "macro block" of the current frame and the "macro block" at the same position in the preceding picture frame (hereinafter "previous frame" or "previous reference frame"). The weighted average can be thought of as an average of the pixel values of the co-located "macro block" of the previous frame and that of the current frame. In the extreme case the "macro block" of the previous frame alone replaces the current one, and the averaged image becomes a still image. By suppressing noise and motion in the background image in this way, the information spent on the background is held to a very small share, so that a larger share can be devoted to the face and the peripheral motion region.
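The temporal filter can be sketched as a per-pixel weighted average. In the Python sketch below, `alpha` is a hypothetical name for the weight given to the previous frame; the text does not name the weight or state its value, and the rounding to whole pixel values is likewise an assumption.

```python
def temporal_filter(prev_mb, curr_mb, alpha):
    """Blend a background macro block with the co-located macro block of the
    previous frame: out = alpha * previous + (1 - alpha) * current."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return [
        [round(alpha * p + (1 - alpha) * c) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_mb, curr_mb)
    ]


previous = [[100, 100], [100, 100]]
current = [[50, 150], [100, 0]]
print(temporal_filter(previous, current, 0.5))  # [[75, 125], [100, 50]]
print(temporal_filter(previous, current, 1.0))  # previous block: a still image
```

With `alpha = 1.0` the output is the previous block unchanged, which is the extreme case in which the background becomes a still image; with `alpha = 0.0` the filter has no effect.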

Dynamic image data with violent motion, which would otherwise consume a large share of the transmission capacity, can thus be greatly reduced by the temporal filter at the cost of a slight loss of image quality. When an image whose information has been cut in this way is later restored by the decoder, only fast-changing scenes are slightly affected. For example, when this videophone is used in a moving car, the scenery passing behind the speaker, seen through the car window, looks slightly blurred. Thus, when the background moves violently, substantially cutting its amount of information, together with suitable processing, reduces the dynamic image data of the unimportant background to a degree that is not visually objectionable.

Here the relation among I frames, P frames and B frames is explained in advance.

An I frame is encoded using only the image information of the current frame, so its image quality does not depend on that of previously decoded pictures.

A P frame is encoded from the image of the previous reference frame (the already encoded I frame or P frame immediately preceding the current frame). Using the motion vector of each "macro block" obtained by the motion predictor, a motion-compensated predicted picture is generated; the difference image between this predicted picture and the current input frame is then compressed (discrete cosine transform, quantization, variable length coding) and sent to the decoder.

For a P frame, the decoder decodes the compressed difference image information (variable length decoding, inverse quantization, inverse discrete cosine transform). Unlike the I frame, the P frame is then completed by adding the decoded difference image to a predicted picture obtained by motion compensation with the motion vectors sent from the encoder (the same picture as the one generated on the encoder side): decoded P frame = predicted picture + decoded difference image. The decoded difference image, however, contains the noise introduced by compression coding. The image quality of the decoded P frame depends heavily on the prediction accuracy (the similarity between the predicted picture and the current input frame), and is therefore directly affected by the image quality of the previous reference frame used to generate the predicted picture.

In other words, when the image quality of the previous reference frame deteriorates, the image quality of the decoded current P frame deteriorates too, and the deterioration propagates to the P frames that follow. Conversely, when the image quality of a P frame improves, the image quality of the frames that reference it improves as well, and the improvement propagates to subsequent frames.

A B frame is encoded from the images of two reference frames, one before and one after it in time (the decoded I or P frames immediately preceding and immediately following the current frame). A predicted picture is generated in the same way as above, and the difference image between it and the current input frame is compression-coded and sent to the decoder.

In the decoder, the B frame is decoded by adding the decoded difference information to the predicted picture (the same as the one generated in the encoder), which is generated from the motion vectors and the prediction direction information (forward prediction, backward prediction, or bidirectional prediction) sent from the encoder.

The B frame differs from the P frame in two respects. First, because its predicted picture is generated from two reference frames, its prediction accuracy is higher than that of the P frame and the compressed difference image is smaller. Second, because the B frame is never itself used as a reference, even if its image quality deteriorates, the deterioration does not spread to other frames (those after the B frame).

The image quality of the B frame itself, however, like that of the P frame, is strongly affected by the image quality of the frames it references.

Fig. 4 is an explanatory diagram of the method of reducing the amount of information by B-frame processing. The signals and processes at each stage are labelled. Not every functional block in the signal path is shown, of course; reference numerals are assigned only to the functional blocks needed for the explanation.

First, in the encoder shown in Fig. 4(a), the current frame image, the previous reference frame image and the following reference frame image are input to motion prediction function block 41. This block performs motion prediction and motion compensation and decides the prediction mode. The difference image information between the predicted picture output from motion prediction function block 41 and the current frame image is then input to the all-pixel zeroing function block 42, where every pixel of the difference information is forced to zero.

The all-zero pixel information output from the all-pixel zeroing function block 42 passes through discrete cosine transform part 43 and quantization part 44 and is input to code generation part 45. Code generation part 45 encodes the zeroed pixel information in accordance with the prediction mode that motion prediction part 41 determined when predicting the motion of the dynamic image. The complete structure of the encoder is described later with the block diagram of Figure 11, the decoder with Figure 12, and the combined encoder/decoder with Figure 13; here only the signal flow in B-frame processing is shown, and its effect is explained.

Fig. 4(b) shows the decoder. The dynamic image signal compression-coded by the encoder is sent over transmission line 100, and is received and decoded by decoding part 46. The decoded signal is input from decoding part 46 to inverse quantization part 47 and inverse-quantized. The inverse-quantized signal output from inverse quantization part 47 is input to inverse discrete cosine transform part 48, where the inverse discrete cosine transform restores the difference signal. The restored difference image is added to the predicted picture obtained according to the above prediction mode, and the decoded image is output.

In general, dynamic image compression schemes use the following three kinds of frame: the I frame, in which the image information of the current input frame is encoded directly; the P frame, in which the current frame is predicted from the image information of the previous reference frame and the difference image between the current frame and this predicted picture is encoded; and the B frame, in which a predicted picture is synthesized from the image information of two reference frames, one before and one after in time, and the difference signal between the current frame and this predicted picture is encoded.

Here "reference frame" means a decoded I frame or P frame. "Predicted picture" means the image obtained by predicting, for each "macro block", the motion that occurs between the reference frame and the current input frame, and compensating the reference frame for that motion.

More recently, the H.263 and H.263+ standards have introduced a coding mode called the PB frame, in which a B frame and the P frame that follows it are encoded together in units of "macro blocks".

The following three prediction methods exist for the B frame, both in the PB-frame coding mode and in conventional B-frame coding:

(1) forward prediction, which predicts from the previous reference frame;

(2) backward prediction, which predicts from the following reference frame;

(3) bidirectional prediction, which predicts from both the previous and the following reference frames.

Conventionally, the difference image was encoded and sent to the decoder in order to correct the deviation between the current input frame image and the predicted picture. Here, as shown in Fig. 4(a), the difference signal is forced entirely to zero, so that in effect only the choice among the above three B-frame prediction methods is sent as information, which greatly reduces the amount of B-frame information.
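The effect of forcing the difference to zero can be sketched as follows: the encoder sends only a prediction mode, and the decoder output is simply the predicted picture. This Python sketch makes several simplifying assumptions that are not in the text: motion compensation is omitted (zero motion), bidirectional prediction is a rounded average of the two references, and the encoder picks the mode with the smallest sum of absolute differences. All function names are hypothetical.

```python
MODES = ("forward", "backward", "bidirectional")


def predict(prev_ref, next_ref, mode):
    """Predicted block for the given mode (no motion compensation here)."""
    if mode == "forward":
        return [row[:] for row in prev_ref]
    if mode == "backward":
        return [row[:] for row in next_ref]
    if mode == "bidirectional":
        return [[(p + n + 1) // 2 for p, n in zip(pr, nr)]
                for pr, nr in zip(prev_ref, next_ref)]
    raise ValueError(mode)


def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def encode_b_block(current, prev_ref, next_ref):
    """Return only the prediction mode; the residual is never transmitted."""
    return min(MODES, key=lambda m: sad(current, predict(prev_ref, next_ref, m)))


def decode_b_block(mode, prev_ref, next_ref):
    """Standard decoding path: prediction plus an all-zero residual."""
    prediction = predict(prev_ref, next_ref, mode)
    residual = [[0] * len(row) for row in prediction]
    return [[p + r for p, r in zip(pr, rr)] for pr, rr in zip(prediction, residual)]


mode = encode_b_block([[21, 19]], [[10, 10]], [[30, 30]])
print(mode, decode_b_block(mode, [[10, 10]], [[30, 30]]))
```

Because the decoder still adds a residual to the prediction in the ordinary way, and that residual merely happens to be zero, the decoding path itself is unchanged.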

As explained above, the B frame is not used as a reference frame for encoding other frames, so even if its image quality deteriorates somewhat, the encoding of subsequent frames is not affected. Greatly reducing the amount of B-frame information increases the share of information available to the P frames; as a result, the image quality of the P frames improves and, as described earlier, the overall image quality of the dynamic image improves with it.

In processing that forces the whole difference image of the B frame to zero, the discrete cosine transform part 43 and the quantization part enclosed by the dotted line in Fig. 4(a) are unnecessary, so the hardware can be reduced accordingly.

Introducing this new B-frame coding method requires no change to the structure on the decoder side, so full compatibility with the existing standards is maintained. Thus, when the motion of the dynamic image is very violent and the amount of difference-image information increases greatly, forcing all pixel values to zero keeps that amount to a minimum, and the original image can be restored from a minimum of information.

Next, a quantization method is described that reduces the noise of the color difference signals and thereby improves the visual quality of the decoded image. The quality of the image restored by the decoder generally depends on the amount of noise introduced into the grey scale signal and the two color difference signals during compression coding. Because human vision is particularly sensitive to the color difference signals, reducing the noise of these two signals improves the visual quality of the image.

The conventional quantization method quantizes the frequency-domain data obtained by the discrete cosine transform, at a specified quantization level, using the following formula.

|L| = [(|C| - Q·s·(p - f)) / (Q·s)]

Here |L| is the absolute value of the quantized data, |C| is the absolute value of the original data, Q is the quantization level, s is the correction value of the quantization level, p is the inverse quantization correction value, f is the rounding correction parameter, and [ ] denotes conversion of a real number to an integer by truncation.
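The role of each symbol can be seen in a short Python sketch. It assumes that the offset term enters as (p − f), which is the reading under which f = 0.5 rounds to nearest and f = 0 truncates when p = 0; it also assumes that [ ] is a floor, that negative results are clamped to zero, and that the sign of the original coefficient is restored afterwards. The function name and default values are illustrative.

```python
import math


def quantize(c, q, s=2, p=0.5, f=0.25):
    """Quantize one frequency coefficient.

    c -- original coefficient (signed)
    q -- quantization level Q
    s -- quantization level correction value
    p -- inverse quantization correction value
    f -- rounding correction parameter (0 = truncate, 0.5 = round to nearest)
    """
    # Equivalent to floor((|C| - Q*s*(p - f)) / (Q*s)).
    level = math.floor(abs(c) / (q * s) - p + f)
    level = max(level, 0)               # assumed: small values fall to zero
    return level if c >= 0 else -level


print(quantize(30, 5, p=0.5, f=0.0))    # 2  (truncation)
print(quantize(30, 5, p=0.5, f=0.5))    # 3  (round to nearest)
print(quantize(4, 5, p=0.5, f=0.0))     # 0  (falls in the zero zone)
```

Lowering f widens the range of coefficients that quantize to zero, which is the effect the later paragraphs rely on for the grey scale signal.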

The inverse quantization correction value p and the quantization level correction value s are determined by the moving picture coding standard in use. In the H.263 and H.263+ standards, p depends on the coding mode of the "macro block" and the kind of frequency data: it is 0 for the DC component of an "intra macro block" and 0.5 otherwise; the quantization level correction value s is fixed at 2. An "intra macro block" here means a "macro block" in which the current "macro block" image is encoded directly, without using a predicted picture.

The rounding correction parameter f, on the other hand, is not prescribed by the standards and can be set freely. f = 0.5 means quantization by rounding to nearest; f = 0 means quantization by truncation (rounding down).

In the conventional quantization method adopted in TMT-11, the rounding correction parameter f is set according to the coding mode of the "macro block": 0.5 for an "intra macro block" and 0.25 otherwise. The same f value is used for the grey scale signal and the two color difference signals.

In the present invention, different f values are set for the grey scale signal and the color difference signals, which differ in nature, while the same quantization level is used for both; this succeeds in markedly reducing the noise of the color difference signals. Specifically, the f value is determined as follows.

(1) For an "intra macro block", f = 0.5 is set as before (quantization by rounding to nearest).

(2) Otherwise, f = 0 (truncation) is used for the grey scale signal and f = 0.5 (rounding to nearest) for the color difference signals.
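The two rules, together with the conventional setting they replace, can be written as a small selection function. The function name, the `scheme` argument and its string values are hypothetical; only the numeric f values come from the text.

```python
def rounding_parameter(is_intra, is_chroma, scheme="proposed"):
    """Return the rounding correction parameter f for one block.

    is_intra  -- True for a block of an "intra macro block"
    is_chroma -- True for a color difference (Cr/Cb) block
    scheme    -- "conventional" or "proposed" (labels chosen for this sketch)
    """
    if scheme == "conventional":
        # Same f for grey scale and color difference signals.
        return 0.5 if is_intra else 0.25
    if scheme == "proposed":
        if is_intra:
            return 0.5                      # rule (1): round to nearest
        return 0.5 if is_chroma else 0.0    # rule (2): chroma rounds, luma truncates
    raise ValueError(scheme)


for intra in (True, False):
    for chroma in (False, True):
        print(intra, chroma, rounding_parameter(intra, chroma))
```

Only the non-intra case differs between the two schemes: the grey scale signal drops from 0.25 to 0 and the color difference signals rise from 0.25 to 0.5.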

Using truncation when quantizing the grey scale signal inevitably increases the quantization noise, so this approach was hardly ever considered in the past. However, truncating the grey scale signal sharply increases the number of frequency components quantized to 0, which raises the compression ratio considerably and in turn allows the quantization level to be lowered. The lower quantization level offsets the tendency of the quantization noise to increase, so that in the end the quantization noise increases by only about 0.0 dB to 0.3 dB, essentially unchanged from before. Conversely, because rounding to nearest is used in quantizing the color difference signals, the color difference noise improves by about 1.2 dB to 1.3 dB. Overall, the signal-to-noise ratio improves by about 0.3 dB. In particular, the marked reduction in color difference noise ensures faithful color reproduction and noticeably improves the perceived picture quality. Thus a structure simpler than the conventional one markedly reduces the noise of the color difference signals and, through the improved signal-to-noise ratio and given the characteristics of human vision, yields a real improvement in visual image quality.

Next is the quantization of the DC coefficient (the value in the upper left corner of the 8 × 8 block) of an "intra macro block". The DC coefficient is divided by 8 and the quotient is rounded to nearest. On the decoder side, the DC coefficient is recovered by multiplying the received quantized value by 8, that is, [inverse-quantized value] = 8 × [quantized value]. When the current "macro block" (16 × 16 or 8 × 8) image is encoded directly without a predicted picture, that is, for an "intra macro block", both the grey scale signal and the color difference signals are quantized by rounding to nearest. In all other cases, the grey scale signal is quantized by truncation and the color difference signals by rounding to nearest.
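The DC rule is simple enough to state in a few lines of Python. The sketch assumes a non-negative DC coefficient and uses integer arithmetic for the rounding; the function names are illustrative.

```python
def quantize_intra_dc(dc):
    """Divide the DC coefficient by 8 and round to nearest (dc >= 0 assumed)."""
    return (dc + 4) // 8


def dequantize_intra_dc(level):
    """Decoder side: inverse-quantized value = 8 x quantized value."""
    return 8 * level


for dc in (3, 4, 100, 1020):
    level = quantize_intra_dc(dc)
    print(dc, level, dequantize_intra_dc(level))
```

The reconstructed value is always within 4 of the original, since the step size is 8 and rounding is to the nearest step.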

In this way, the noise of the color difference signals can be reduced while the same quantization level is used for the grey scale signal and the color difference signals. Moreover, for the same measured amount of noise reduction, the visual effect is greater for the color difference signals than for the grey scale signal. Thus a structure simpler than the conventional one markedly reduces color difference noise and, through the improved signal-to-noise ratio and given the characteristics of human vision, yields a real improvement in visual image quality.

Fig. 5 is a table of the conditions and variables relating to frame-level rate control, and Fig. 6 is a flow chart of frame-level rate control. Fig. 5(a) shows the conditions that must be considered when calculating the target bit rate in frame-level rate control. The number of input frames per second, G, is determined by the dynamic image input source (video camera, video tape recorder) on the encoder side and is generally 25 to 30. The number of output frames per second, F, is no greater than G, and the ratio of G to F is an integer.

The number of frames C corresponding to one output picture frame is C = 2 when the next output frame uses PB-frame coding, and C = 1 when it uses I-frame or P-frame coding. The maximum number of frames per second that can be restored in the encoder, H, takes a value of F or more. The maximum guaranteed frame delay D, that is, the delay from image input on the encoder side to image restoration on the decoder side, is expressed in units of the input picture period (1/G second). This delay depends not only on the time needed to transmit the coded information of each picture but also on the processing delay of the encoder; here the processing delay is ignored. The condition that D must satisfy is described later.

Fig. 5 (b) shows three variables determined from the conditions of Fig. 5 (a). E is the minimum bit quantity needed to use 100% of the channel bandwidth. When the actual bit quantity is less than E, the communication buffer that stores the output information becomes empty and underflow occurs; the effective communication speed drops, and image quality deteriorates as a result.

Here, L is the maximum bit quantity for which the frame delay D can be guaranteed; when the actual coded bit quantity exceeds L, the delay of the restored image at the decoder exceeds D. Therefore, to reduce underflow and carry out stable moving-image communication while still guaranteeing the frame delay D, L must be no less than E. The condition that D must satisfy for this is given in Fig. 5 (a): D ≥ (2C-1)G/F-1. K is the bit quantity allocated to one frame and is set to a value no less than E and no greater than L. The s in the figure is a constant that determines the efficiency value and takes a value from 0 to 1.

Fig. 5 (c) shows the variables that depend on the coded bit quantity of each picture frame. The variables R, G, F, C, H, D, E, L, K, W, B, U and T that appear in Fig. 5 (a), (b) and (c) are all approximated as integers. The frame-level control processing described below can therefore be carried out entirely in integer arithmetic, which makes hardware implementation easier.

Fig. 6 is the processing flow chart of frame-level rate control. In Fig. 6 (S61), the residual bit quantity W of the communication buffer is set to 0, and the first input picture is coded as frame 1. In Fig. 6 (S62), it is judged whether the bit quantity B of the current frame is positive. If B is positive, the processing time of the current frame is C/F second, and the bit quantity U sent during this time is set to RC/F. If B is not positive, the current frame is judged to have been skipped; the coding time then equals the input picture period, 1/G second, and the bit quantity U sent during this time is set to R/G. In Fig. 6 (S63), the residual bit quantity W of the communication buffer is added to the bit quantity B of the current frame, the sum is compared with the bit quantity U sent from the communication buffer during the processing time of the current frame, and the value of W is updated accordingly.
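The buffer bookkeeping of steps S62 and S63 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, integer division stands in for the integer approximation mentioned above, and flooring W at zero is an assumption about how the comparison with U is resolved.

```python
def sent_bits(b, r, c, f, g):
    """Bits U drained from the buffer while the current frame is handled (S62)."""
    if b > 0:
        return (r * c) // f   # coded frame: processing time is C/F second
    return r // g             # skipped frame: one input period, 1/G second


def update_buffer(w, b, u):
    """Residual buffer bits W after the current frame (S63).

    Assumption: the buffer cannot hold a negative amount, so W is floored at 0.
    """
    return max(w + b - u, 0)


# Illustrative numbers only: R = 64000 bit/s, G = 30, F = 10, C = 1.
u = sent_bits(b=8000, r=64000, c=1, f=10, g=30)
w = update_buffer(w=0, b=8000, u=u)
```

With these illustrative numbers the coded frame leaves W above zero, while a skipped frame (B = 0) drains the buffer toward empty.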

In Fig. 6 (S64), the target bit quantity T of the next frame is calculated, and the magnitude relations between W and L-E and between F and H are checked. Normally W＜L-E; in that case T is set to the bit quantity K allocated to one frame minus the residual bit quantity W of the communication buffer. When W＞L-E, the current frame cannot be restored at the decoder within the guaranteed frame delay, so a delay exceeding the guaranteed delay occurs temporarily (hereinafter this is called the "excess delay state"). To eliminate this temporary excess delay state on the encoder side, K is set; if H, the maximum number of frames per second that the decoder can restore, is greater than F, the number of frames per second of the output image, the excess delay state can also be eliminated on the decoder side.

However, when H and F are equal, for example, the excess delay state is eliminated on the encoder side but may not be eliminated on the decoder side, and the excess delay state is then likely to continue. The temporary excess delay state cannot be monitored, and while it continues an excessive delay often arises. Therefore T=0 is set when W＞=L-E and F=H. In Fig. 6 (S65), when T is positive, as normal processing the next (D/F-1) frames are skipped and the following C input pictures (C=1 when the next frame is an I or P frame; C=2 for a PB frame) are coded. When T is 0 or negative, the next input picture is skipped without being coded. The bit quantity of a frame skipped in this way is B=0.
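The target selection of S64 and the coding decision of S65 can be sketched as below. The function names are hypothetical. One point is an assumption: for the excess delay state with F less than H the text only says that K is set, so the sketch simply keeps T = K - W in that case.

```python
def target_bits(w, k, l, e, f, h):
    """Target bit quantity T for the next frame (S64)."""
    if w >= l - e and f == h:
        return 0          # excess delay that the decoder cannot absorb
    # Normal case (W < L - E); also used, as an assumption, when F < H.
    return k - w


def next_action(t, c):
    """Decision for the next input pictures (S65)."""
    if t > 0:
        return ("encode", c)   # code the next C input pictures
    return ("skip", 1)         # skip one input picture; its B is 0
```

For example, `target_bits(1000, 6400, 9000, 5000, 10, 15)` yields `5400`, while the same call with W = 4500 and F = H = 10 yields `0` and leads to a skipped picture.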

In this way, the delay from the input picture on the encoder side to the restored image on the decoder side can be strictly controlled. When frame loss in the output pictures does occur, the present invention limits the loss to one picture, compared with the CG/F frames lost previously; lost pictures are therefore generally few, and stable moving-image communication can be achieved.

Fig. 6 (S62) and (S63) represent the means for comparing the bit quantity B of the current frame of the coded moving-image data with the residual bit quantity W of the communication buffer. Fig. 6 (S64) and (S65) represent the control means that uses this comparison result to control the target bit quantity T of the next frame so that the residual bit quantity W is not exhausted. Using the result of this control, the delay and frame loss that occur between the camera input and the output of the decoded image, through the encoder, transmission line 100 and the decoder, are limited to a minimum. This establishes the means for calculating the target bit quantity of the next frame under frame-level rate control. Thus the underflow of the communication buffer, which formerly reduced the actual communication speed, and the delay of the restored image are both controlled well by very simple calculation, so that real-time "lip-sound synchronization" can be realized.

With reference to Fig. 7, Fig. 8, Fig. 9 and Figure 10, the "macro block"-level rate control method, which uses the coded bit quantity as its index, is described next.

Fig. 7 is a general chart of the variables and constants related to "macro block"-level control. Note in particular that Qa, the mean quantization level of the coded "macro blocks" of the previous frame, is the value obtained by averaging over the number of "macro blocks" that were actually coded. (When "intra macro blocks" are not used, a "macro block" whose quantized frequency components and motion vector components are all 0 is not coded.)

As shown in Fig. 8, the initial value Q of the quantization level applied to the first (i=1) "macro block" of the current frame is calculated from Qa, the mean of the quantization levels of the "macro blocks" of the previous frame; this constitutes the first calculation means (S1). Then, in Fig. 8 (S3), the adjustment of the quantization level applied to the second and later "macro blocks" (i=2, 3, up to N) is calculated from the target bit quantity, the actual coded bit quantity up to the current "macro block", and the quantization level applied to the first (i=1) "macro block". This constitutes the second calculation means; the related structure is shown in Fig. 9 and Figure 10.

Fig. 8 is the processing flow chart of "macro block"-level control. When the first frame of the input pictures is coded, however, this "macro block"-level rate control is not used, and the quantization level is fixed at a specific value. Fig. 8 (S1) checks the state of the previous frame. When the previous frame was the initial I frame or was skipped (B'=0), the initial value Q' of the quantization level of the previous frame is used unchanged as the initial value of the quantization level of the current frame.

Otherwise, the mean value Qa of the quantization levels of the coded "macro blocks" of the previous frame is multiplied by the ratio B'/T' of the actual coded bit quantity B' of the previous frame to its target bit quantity T'. This product is the argument of the function CQ, and the output of CQ is the initial value Q of the quantization level of the current frame.

Fig. 9 illustrates the definition of the function CQ(x) used in "macro block"-level control. As shown in Fig. 9, when the argument x exceeds the permitted maximum Qmax of the quantization level, the output is Qmax; when x is less than the permitted minimum Qmin, the output is Qmin. If x is within the permitted range, the function outputs the value unchanged. CQ is thus a "clipping" function.

When the actual coded bit quantity of the previous frame was greater than the target bit quantity, the initial value of the quantization level is set higher than Qa; conversely, when the actual bit quantity was less than the target bit quantity, the initial value of the quantization level is set lower than Qa. Suitably adjusting the "control rate" of this "macro block"-level control improves its control capability.
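The clipping function CQ and the choice of the initial quantization level can be sketched as follows. The function and parameter names are hypothetical; the limits 1 and 31 are the values of Qmin and Qmax given later in this description; the use of integer division for Qa·B'/T' is an assumption made to stay within integer arithmetic.

```python
Q_MIN, Q_MAX = 1, 31   # permitted range of the quantization level


def cq(x, q_min=Q_MIN, q_max=Q_MAX):
    """Clip x into the permitted quantization range."""
    return max(q_min, min(q_max, x))


def initial_q(qa, b_prev, t_prev, q_prev, prev_first_i_or_skipped):
    """Initial quantization level Q for the first macro block of a frame."""
    if prev_first_i_or_skipped or t_prev <= 0:
        return q_prev                      # reuse Q' unchanged
    return cq((qa * b_prev) // t_prev)     # scale Qa by B'/T', then clip
```

If the previous frame overshot its target by 20% (B'/T' = 1.2) with Qa = 10, the sketch starts the current frame at level 12; an undershoot of 20% starts it at level 8.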

In addition, J, the predicted value of the average bit quantity per non-zero quantized frequency component, which is needed in Fig. 8 (S4), is calculated. The value J' is obtained by dividing the total bit quantity used for frequency components in the previous frame by the number of non-zero quantized frequency components; J and J' are then added in the ratio t : (1-t), and the result is the new value of J. Here t is a constant, a real number greater than 0 and less than 1.
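The update of J is a weighted blend of the running estimate and the previous frame's measurement. A minimal sketch follows; the function name, the default t = 0.5, and the handling of a frame with no non-zero components are assumptions for illustration.

```python
def update_j(j, freq_bits_prev, nonzero_prev, t=0.5):
    """Blend the running estimate J with the previous frame's measurement J'."""
    if nonzero_prev == 0:
        return j                      # assumption: nothing measured, keep J
    j_prime = freq_bits_prev / nonzero_prev
    return t * j + (1 - t) * j_prime  # J and J' combined in ratio t : (1 - t)
```

A larger t makes J respond more slowly to the most recent frame.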

In Fig. 8 (S2), if the target bit quantity T of the current frame is positive, the actual coding of the current frame is carried out; otherwise, processing goes directly to Fig. 8 (S6).

Fig. 8 (S3) is the first half of the compression coding of each "macro block" (i) (i=1, 2, up to N). The discrete cosine transform is applied to the above-mentioned difference image, and its frequency components are quantized according to the quantization level q.

Fig. 8 (S4) updates the quantization level q to a suitable value. The details of this processing are shown in Figure 10 and are described later.

Fig. 8 (S5) is the second half of the compression coding of each "macro block" (i): variable-length coding is carried out, and the value of B is updated to include the coded bits of "macro block" (i). Inverse quantization and the inverse discrete cosine transform are then carried out to generate the restored image.

In Fig. 8 (S6), the processing of all "macro blocks" is finished. After the coding of the current frame is complete, the values of Q', B' and T' are replaced in preparation for the processing of the next frame.

All of the above processing is realized with integer arithmetic.

Figure 10 is the flow chart of the update calculation of the quantization level q for each "macro block" (i). It details the processing of Fig. 8 (S4). In Figure 10 (S7), it is first judged whether "macro block" (i) has been left uncoded. A "macro block" is not coded when it is not an "intra macro block" and its quantized frequency components and motion vector components are all zero. When the "macro block" is not coded, the quantization level q is not updated either.

In Figure 10 (S8), when the "macro block" is coded, four variables d, h, a and e are first calculated as indices for the update calculation of the quantization level q. d is the predicted value of the bit quantity of the current "macro block" (i). It is obtained by multiplying z, the number of non-zero quantized frequency components of the current "macro block" (i), by J, the average bit quantity per non-zero quantized frequency component calculated in the above-mentioned (S61), and adding V, the expected bit quantity for everything other than the frequency components. V is a value obtained by experiment.

h is the predicted value of the bit quantity that the remaining (unprocessed) "macro blocks" will consume; it is called the residual bit predicted value.

a is the remaining bit quantity that can be consumed from "macro block" (i) onward; it is called the residual bit permissible value.

e is the expected value of the bit quantity consumed by the remaining (unprocessed) "macro blocks" under the assumption that each "macro block" (i) generates the same coded bit quantity; it is called the residual bit expected value.
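Of the four indices, d has an explicit formula in the text: d = z·J + V. A minimal sketch follows, assuming the quantized blocks of the macro block are available as lists of coefficients; the function names and sample values are hypothetical, and h, a and e are not computed because their formulas appear only in the figures.

```python
def count_nonzero(quantized_blocks):
    """z: number of non-zero quantized frequency components in a macro block."""
    return sum(1 for block in quantized_blocks for coeff in block if coeff != 0)


def predict_mb_bits(quantized_blocks, j, v):
    """d = z * J + V, the predicted bit quantity of the current macro block."""
    z = count_nonzero(quantized_blocks)
    return z * j + v


# Two toy "blocks" with three non-zero coefficients in total.
d = predict_mb_bits([[3, 0, -1, 0], [0, 0, 2, 0]], j=7, v=20)
```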

When the residual bit predicted value h is much larger than the residual bit permissible value a, the quantization level q is raised so that less information is generated; conversely, when the residual bit expected value e is much smaller than the residual bit permissible value a, the quantization level q is lowered so that more information is generated.

In Figure 10 (S9), the parameters b1 and b2 are obtained.

b1 is a pull-back offset: when the current quantization level q is larger than the initial value Q of the quantization level used in the first "macro block", it restrains q from growing still further above Q. Conversely, b2 is a pull-back offset that, when q is smaller than Q, restrains q from falling still further below Q.

In Figure 10 (S10), the parameters c1 and c2 are obtained.

The constant f used in this calculation is a parameter that adjusts the sensitivity of bit rate control; a value of 1.0 or more is generally used.

The constant g is a parameter that adjusts the strength of the pull-back offsets b1 and b2; a value of 0.0 or more is generally used.

In Figure 10 (S11), the actual update of the quantization level q is carried out.

Here, the following conditions 1 to 4 are considered.

Condition 1: q＜Q and a＜h. When true, q' takes the value q+q1; when false, condition 2 is evaluated.

Condition 2: e＞ac1. When true, q' takes the value q+q1; when false, condition 3 is evaluated.

Condition 3: ec2＞a. When true, condition 4 is evaluated; when false, q' takes the value q.

Condition 4: W+B＜U. When true, q' takes the value q+q2; when false, q remains unchanged and q' takes the value q.

Here, 0＜q1≤qmax and qmin≤q2＜0 must be satisfied. W is the residual bit quantity of the communication buffer shown in Fig. 5 (c), and U is the bit quantity sent during the coding time of the current frame, also shown in Fig. 5 (c).

Finally, the value of q' determined by the above four conditions is "clipped" by the function CQ, and the result is used as the updated value of q.

The maximum permitted value Qmax of the quantization level is set to 31, as specified in H.263 and H.263+, and the minimum permitted value Qmin is set to 1. The permitted maximum qmax of the change in quantization level between two consecutive "macro blocks" is set to +2, and the permitted minimum qmin is set to -2.
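The four-condition update of q can be sketched as a cascade followed by clipping. This is a literal transcription of conditions 1 to 4 as they are printed above, not a verified reading of Figure 10; the function name and the default step sizes q1 = 1 and q2 = -1 are assumptions, and c1 and c2 are passed in because their formulas are given only in the figure.

```python
Q_MIN, Q_MAX = 1, 31        # permitted quantization range
STEP_MAX, STEP_MIN = 2, -2  # qmax and qmin: permitted change per macro block


def cq(x):
    """Clip x into the permitted quantization range."""
    return max(Q_MIN, min(Q_MAX, x))


def update_q(q, q_init, a, h, e, c1, c2, w, b, u, q1=1, q2=-1):
    """Updated quantization level after one coded macro block (S11)."""
    if not (0 < q1 <= STEP_MAX and STEP_MIN <= q2 < 0):
        raise ValueError("q1 or q2 outside the permitted step range")
    if q < q_init and a < h:          # condition 1
        q_new = q + q1
    elif e > a * c1:                  # condition 2
        q_new = q + q1
    elif e * c2 > a and w + b < u:    # conditions 3 and 4
        q_new = q + q2
    else:
        q_new = q
    return cq(q_new)
```

Because q1 and q2 are bounded by qmax and qmin, the level changes by at most 2 between consecutive macro blocks, and the final clip keeps it within 1 to 31.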

All of the above processing is realized with integer or fixed-point arithmetic.

In this way, the very complicated calculation that was previously indispensable for equalizing the bit length of each picture frame with high precision is replaced by simple calculation, which reduces the delay due to computation and makes "lip-sound synchronization" possible.

Next, as one embodiment, the System Memory Sharing Processor Array that constitutes the high-performance code compression system for moving-image data, that is, a systematized memory-sharing processor array (hereinafter also called the system MSPA), the memory on which this structure is based, and its use as a system for low-bit-rate video coding are described with reference to the following figures.

Figure 11 is the coder structure block diagram, and Figure 12 is the decoder architecture block diagram, and Figure 13 is the general one-piece type structured flowchart of encoder/decoder.

In Figure 13, 5 is a video camera, which inputs a moving-image signal, mainly of a person's face, to data bus 2 through camera interface 6. Data bus 2 is connected to host computer 8 through host interface 7. Data can also be exchanged between memory 1 and the address generation unit (Address Generation Unit), which also serves as the central control device (hereinafter called AGU 3).

Because AGU 3 and the memory can be combined for functional verification, functional verification at the design and development stage is very easy.

Besides memory 1, AGU 3 is connected to ROM 9, which holds the basic operation program, and it sends control signals over control bus 4 to each of the hardware modules (hereinafter also called "modules" or "units"), thus forming the high-performance code compression system for moving images.

In practice, DRAM (dynamic memory) was used as memory 1 with satisfactory results. As the figures indicate, the essential point in the present invention is that the coded moving-image data are stored appropriately and can be read out freely; the memory means is not limited to the dynamic memory mentioned above, and other means may also be used. Any such practical form should of course be regarded as included within the gist of the present invention.

The units share the moving-image data through data bus 2, and data are transferred between the units only via memory 1, which is connected outside the circuit. Data transfer between each unit and memory 1 is controlled by AGU 3 and ROM 9.

This is the most important feature of the system MSPA: all data transfer between the units goes through the memory, so the processing dependencies between the units are determined solely by the time-division schedule with which AGU 3 exchanges data with memory 1.

The units are shown in Figure 11 through Figure 13. They can be designed independently, without regard to dependencies between them, and are connected in parallel by the data bus and the control bus to form the whole. Different designers can therefore work on the units at the same time, the program structure of the whole system does not need to be large, and the design time is shortened. If the design of a unit needs to be changed, the change can be accommodated easily by modifying only the program of AGU 3. The system is therefore highly adaptable.

The motion-vector search part 10 includes a prediction decision part (not shown) that calculates the average amount of movement of the image within the face window 21. The face window 21 is made to follow the face according to this average amount of movement.

In this way, in step with the swaying of the subject's face, the average amount of movement is used to predict the coming motion, and the face window 21 is sent to the decoder ahead of time, in order to obtain high image quality without delay.
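The window-following step can be illustrated with a short sketch. It assumes that the motion vectors of the macro blocks inside the face window are available as (dx, dy) displacements from the previous frame to the current frame, and that the window is a rectangle (x, y, width, height); both representations and the function names are assumptions, since the prediction decision part is not shown in the figures.

```python
def average_motion(vectors):
    """Average (dx, dy) of the motion vectors inside the face window."""
    n = len(vectors)
    if n == 0:
        return (0, 0)
    sum_x = sum(v[0] for v in vectors)
    sum_y = sum(v[1] for v in vectors)
    return (round(sum_x / n), round(sum_y / n))


def follow_window(window, vectors):
    """Shift the face window by the average motion, keeping its size."""
    x, y, width, height = window
    dx, dy = average_motion(vectors)
    return (x + dx, y + dy, width, height)


moved = follow_window((48, 32, 64, 64), [(2, 0), (4, 2), (3, 1)])
```

A practical version would also clamp the window to the frame boundaries, which this sketch omits.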

The system MSPA is mainly described below.

Memory 1, which holds the image frame information, is connected by the data bus to the various hardware modules, each of which executes its commands independently. AGU 3 controls the flow of data between memory 1 and the hardware modules and executes the program. AGU 3 is connected to each hardware module by control bus 4 to form the system, and this system configuration forms the above-mentioned encoder and decoder.

A practical video telephone should be a combined encoder/decoder like that shown in the block diagram of Figure 13, that is, a device for both sending and receiving combined into one system. Here, the encoder of Figure 11 and the decoder of Figure 12 are shown separately as independent structures, and the explanation of the parts they share is not repeated.

In the encoder system shown in Figure 11, an external memory 1 of 4K bit capacity is used. Its address is 16 bits, its data width is 16 bits, and its access time is 40 nanoseconds; it can store 4 frames of data in QCIF (176 × 144 pixel) format.

In Figure 11, the data input from video camera 5 are stored in memory 1 and are also compressed successively, one "macro block" of 16 × 16 pixels at a time.

First, the motion-vector search part 10 searches for the position in the previous frame from which the "macro block" being processed has moved, and outputs the result as a motion vector. For a "macro block" that belongs to neither the face window 21 nor the peripheral motion window 51, the image information is degraded by a temporal filter (not shown).

When image information of a still picture is input, the temporal filter outputs the original image information unchanged; conversely, when image information with vigorous motion is input, it processes the information so as to soften the motion.

Thus, for vigorously moving images, which would otherwise consume a large transmission capacity, the temporal filter greatly reduces the amount of information at the cost of a slight loss of image quality.

When an image whose amount of information has been greatly reduced in this way is restored by the decoder, only the image quality of fast-changing scenes is slightly affected. For example, when this video telephone is used in a moving automobile, the moving landscape seen through the car window behind the speaker is slightly blurred.

The motion compensation part 11 uses the motion vector obtained to generate difference data between the "macro block" of the current frame being processed and the position in the previous frame from which that "macro block" is considered to have moved, and writes the difference data to memory 1.

The discrete cosine (inverse) transform part / (inverse) quantization part 12 reads the difference data from memory 1 and applies the discrete cosine transform to each block of 8 × 8 pixels, that is, one quarter of a "macro block", obtaining 8 × 8 frequency components. It then performs high-speed "variable bit-length" quantization on each "macro block" according to the quantization step and outputs the result to memory 1.

The variable-length coder 13 assigns suitable codes to the quantized, bit-reduced frequency components of the difference data read from memory 1 and stores them in an internal buffer (not shown). This internal buffer outputs the coded data to the outside at a fixed transfer rate.

In the discrete cosine (inverse) transform part / (inverse) quantization part 12, a picture of, for example, 352 × 288 pixels is divided into blocks of 8 × 8 pixels, which are then decomposed into frequency components (an orthogonal transform) by the DCT (Discrete Cosine Transform; hereinafter also called the DCT transform). High-frequency components are reduced to compress the information: each coefficient after the DCT transform is divided by a certain divisor, and the remainder is rounded off.
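The transform-and-divide step can be illustrated with a direct, unoptimized 8 × 8 DCT followed by division and rounding. This is a textbook orthonormal DCT-II in plain Python, not the hardware algorithm of part 12; the function names are hypothetical, and the use of Python's built-in rounding is an assumption because the exact rounding rule is not stated.

```python
import math

N = 8  # block size in pixels


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct O(N^4) form)."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def quantize(coeffs, divisor):
    """Divide every coefficient by the divisor and round to an integer."""
    return [[int(round(c / divisor)) for c in row] for row in coeffs]


flat = [[16] * N for _ in range(N)]       # a uniform block
levels = quantize(dct_2d(flat), divisor=16)
```

For a uniform block all the energy lands in the single DC coefficient, so every other quantized level is zero; this is the property that makes the subsequent variable-length coding efficient.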

These functions operate in the forward direction during encoding and in the reverse direction during decoding. The decoder of Figure 12 has no forward transform and is therefore equipped with the inverse discrete cosine transform part / inverse quantization part 12b.

The discrete cosine (inverse) transform part / (inverse) quantization part 12, the P "macro block" reconstruction part 14, the B "macro block" prediction part 15a and the block-distortion removal filter 16 together reconstruct the current "macro block" from the quantized frequency components, for use in the compression of the next frame.

The inverse discrete cosine transform part / inverse quantization part 12 performs the inverse of the discrete cosine transform and quantization. That is, the quantized data of the frequency components of the difference data are read from the memory and restored to their original bit length by inverse quantization; the difference data are then restored by the inverse discrete cosine transform, and the result is saved to memory 1.

The P "macro block" reconstruction part 14 is common to Figure 11, Figure 12 and Figure 13. It reads from memory 1 the difference data and the data of the previous frame located by the motion vector, adds them to restore the "macro block" of the current frame, and writes the result to memory 1.
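The reconstruction performed by part 14 amounts to adding the decoded difference to the motion-compensated prediction. A minimal sketch follows; the function name, the frame representation as a list of pixel rows, the motion vector convention (offset from the macro block position to its source in the previous frame) and the clamp to 8-bit pixel values are all assumptions for illustration.

```python
def reconstruct_macroblock(prev_frame, diff, mb_x, mb_y, mv, size=16):
    """Restore one macro block: prediction from the previous frame plus difference."""
    dx, dy = mv
    x0 = mb_x * size + dx          # left edge of the reference area
    y0 = mb_y * size + dy          # top edge of the reference area
    restored = []
    for r in range(size):
        row = []
        for c in range(size):
            value = prev_frame[y0 + r][x0 + c] + diff[r][c]
            row.append(max(0, min(255, value)))   # keep within 8-bit range
        restored.append(row)
    return restored


# Tiny 4 x 4 frame and 2 x 2 "macro block" to keep the example readable.
prev = [[10, 20, 30, 40],
        [50, 60, 70, 80],
        [90, 100, 110, 120],
        [130, 140, 150, 160]]
block = reconstruct_macroblock(prev, [[1, -1], [200, -200]], 0, 0, (1, 1), size=2)
```

The sketch assumes the motion vector keeps the reference area inside the frame; a real decoder must handle references at the picture edge.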

Figure 13 includes a B "macro block" prediction/reconstruction part 15, which is used for the processing of P frames and B frames. The encoder shown in Figure 11 is equipped with the B "macro block" prediction part 15a; the decoder shown in Figure 12 is equipped with the B "macro block" reconstruction part 15b.

In the B "macro block" prediction part 15a of the encoder, the reconstructed P "macro block" and the "macro block" of the previous frame from which it moved are read from memory 1. B "macro block" data obtained by forward prediction from the previous frame, together with data obtained by mixing the two data above, are used to construct predicted B "macro blocks", which are compared for similarity with the actual B "macro block" input from video camera 5. For each "macro block" it is judged which of these methods is best, and this information is sent by the variable-length coder 13. In the decoder, the B "macro block" reconstruction part 15b actually reproduces the data of the B frame.

The block-distortion removal filter 16 is common to Figure 11, Figure 12 and Figure 13. It reads the restored P "macro block" from memory 1, removes the noise resembling the grid of a Go board that appears where "macro blocks" join, so that it is not readily noticed by the human eye, and writes the result to memory 1.

AGU 3 controls the execution of each module and the exchange of data between each module and memory 1.

AGU 3 operates according to the program stored in ROM 9; the program commands cover address generation for memory 1, access control of memory 1, and so on.

Host interface 7 allows the external host computer 8 to input commands to the system in place of ROM 9 of AGU 3. It controls the execution of each module and the data transfer between memory 1 and host computer 8.

Thus the system MSPA as a whole is formed of various high-speed hardware modules (units) connected to memory 1 by data bus 2.

That is, each unit reads data from the external memory 1, processes them, and outputs the result back to the external memory 1.

There is therefore no direct exchange of data between the units; all exchange must go through memory 1. AGU 3 uses the commands in ROM 9 to control the data transfer between each unit and memory 1.

Although two memories 1 are used, one each for the encoder and the decoder, a single memory may be used instead if it has the response performance of two memories.

The most important feature of the system MSPA is that, because all data transfer between the units goes through the external memory, the processing dependencies between the units are determined solely by the time allocation with which AGU 3 accesses memory 1. Because the processing dependencies of the units are determined by AGU 3, the units are designed independently and are connected in parallel by the data bus and the control bus to form one whole.

Each unit can thus be designed independently, and the design constraints are clearly reduced. More designers can therefore share the design work and design at the same time, the whole system does not need to be large, and the time needed for design is shortened.

Moreover, because the processing dependencies of the units are determined by the program of AGU 3, flexible design is possible. In the actual design, the processing of each "macro block" of the P frame and the B frame must finish within 15,625 clock cycles.

Motion-vector search, however, needs a long processing time even though its access time to memory 1 for data input and output is very short: the whole motion-vector search needs 13,000 clock cycles.

The other processing is done by the other hardware modules, and all of it can be completed within 15,625 clock cycles. Motion-vector search and the other processing are executed in parallel as a pipeline over the "macro blocks".

That is, the 100 "macro blocks" contained in one frame are compressed one after another. When the motion-vector search for one "macro block" has finished within its 15,625 clock cycles, the motion search for the next 15,625 clock cycles begins. This is the key point.

Thus, for most of the time, motion search and the other processing run simultaneously. This pipeline processing is carried out by the program of AGU 3, which is another way in which the structure of the present invention makes flexible design possible.
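The benefit of this two-stage pipeline can be estimated with simple arithmetic. The sketch below models motion search as stage 1 and all other processing as stage 2, each fitting in one 15,625-cycle slot. The sequential figure assumes, as a worst case, that the other processing uses its full slot; that assumption and the function names are not from the text.

```python
SLOT_CYCLES = 15_625        # time slot per macro block
MV_SEARCH_CYCLES = 13_000   # motion-vector search per macro block
MACRO_BLOCKS = 100          # macro blocks per frame, as stated in the text


def pipelined_cycles(n, slot=SLOT_CYCLES):
    """Two-stage pipeline: n slots plus one extra slot to drain stage 2."""
    return (n + 1) * slot


def sequential_cycles(n, search=MV_SEARCH_CYCLES, other=SLOT_CYCLES):
    """No overlap: every macro block pays for search and other processing."""
    return n * (search + other)


per_frame_pipelined = pipelined_cycles(MACRO_BLOCKS)
per_frame_sequential = sequential_cycles(MACRO_BLOCKS)
```

Under these assumptions the pipeline needs 1,578,125 cycles per frame against 2,862,500 without overlap, a saving of roughly 45%.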

Moreover, because all input and output data of each functional block are kept in memory 1, testing from outside is very simple. In practice, host computer 8 issues, through host interface 7, the commands that AGU 3 would issue, and sets the data in memory 1; after the functional blocks have executed, the data in memory 1 can be read back into host computer 8 and checked.

In particular, the internal data of each functional block can also be written to the host computer and checked; providing such a program makes testing easy.

Since the functional blocks execute without depending on one another, the clock needs to be supplied only to the functional blocks that must operate and not to the others, which reduces the overall power consumption of the system.

Next, the decoder is described separately with reference to Figure 12.

The frequency-domain difference signal decoded by the variable-length decoder 17 is restored to the actual B-frame data through the B "macro block" reconstruction part 15b and the block-distortion removal filter 16 of the decoder. As in the encoder, each functional block executes independently, reads its data from memory 1 and writes the processed data back to memory 1; this is controlled by AGU 3. The finally restored image is displayed, through the LCD interface, on an externally connected LCD.

Figure 14 is a block diagram of the central control device (AGU 3). The execute/test mode switch 30 selects the operation mode or the test mode according to a command sent from host computer 8 through host interface 7 (the PC command in the figure). The command decoding and execution control part 31 executes the contents of the command program in ROM 9. The control part 32 for repeated memory write/read commands and the address register 33, which holds the start address of the repeated command, are used to exchange data with memory 1 under address control.

In this way AGU3 generates the addresses and access control signals for memory 1 and controls when each functional block starts and finishes its computation. The loop start address register 33 and the register file 34 consist mainly of general-purpose registers: 2 groups of linked 84 bit registers, 1 group of linked 82 bit registers, and 81 group of linked 1 bit registers (not shown in detail).

Figure 15 shows the memory map of external memory 1. It corresponds to the commands in ROM 9, which holds the program that controls execution of the hardware modules. Because the image is divided into many blocks, the address structure is matched to block-mode processing, in which information is handled in coordinate units.

As shown in Figure 15, a memory address consists of a row address and a column address, and is controlled by the memory read/write loop command control section 31 of AGU3.

External memory 1 is organized into the memory regions described above. The commands that generate addresses for memory access in macroblock coordinate units, block units and pixel units are stored in ROM 9. As shown in Figures 11, 12 and 13, ROM 9 may also be placed outside AGU3.

Note that the actual hardware layout often differs somewhat from the arrangement in these block diagrams.

The address space of memory 1, as shown in Figure 15, uses 18-bit addresses. The row address has 9 bits in total: the upper 1 bit of the frame, and 4 bits each for the X and Y coordinates of the macroblock position. The column address also has 9 bits in total: the lower 1 bit of the frame, 2 bits for the X coordinate and 1 bit for the Y coordinate of the block position, and 2 bits for the X coordinate and 3 bits for the Y coordinate of the pixel position. Besides the four block positions (0,0), (0,1), (1,0) and (1,1) that hold the grey-scale (luminance) signal Y, the block positions (0,2) and (1,2) are allocated to the colour-difference signals Cr and Cb.

Because a data word is 16 bits, two pixels adjacent in the X direction (8 bits each) share one address. So, although a block is 8 pixels wide, the X coordinate of the pixel position needs only 2 bits.
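
The address layout above can be sketched as a small packing function. This is only an illustration: the text gives the field widths but not the order of the fields inside the row and column addresses, so the bit positions chosen here, and the name `pack_address`, are assumptions.

```python
# Hypothetical field order inside the row and column addresses; the text
# gives only the field widths, so the packing order here is an assumption.

def pack_address(frame, mb_x, mb_y, blk_x, blk_y, pix_x, pix_y):
    """Return the 18-bit address of the 16-bit word holding pixel (pix_x, pix_y)."""
    assert 0 <= frame < 4                      # 1 frame bit in the row, 1 in the column
    assert 0 <= mb_x < 16 and 0 <= mb_y < 16   # macroblock position, 4 bits each
    assert 0 <= blk_x < 4 and 0 <= blk_y < 2   # block position, 2 bits and 1 bit
    assert 0 <= pix_x < 8 and 0 <= pix_y < 8   # pixel inside an 8 x 8 block

    word_x = pix_x >> 1   # two 8-bit pixels share one 16-bit word, so 2 bits suffice
    row = ((frame >> 1) << 8) | (mb_y << 4) | mb_x
    col = ((frame & 1) << 8) | (blk_y << 7) | (blk_x << 5) | (pix_y << 2) | word_x
    return (row << 9) | col
```

With this packing, two horizontally adjacent pixels such as `pix_x = 0` and `pix_x = 1` map to the same address, which is the reason given above for the 2-bit pixel X field.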

Accordingly, the command length is fixed at 27 bits. The following commands are provided.

(1) Memory access start address command

(2) Memory read loop command

(3) Memory write loop command

(4) AGU register control commands

(5) Subroutine commands and conditional branch commands

(6) Special commands sent by the master computer

The memory access start address command (1) is issued before a memory read or write loop command executes, in order to set the start address for that loop.

It can specify an absolute address, a position relative to the macroblock currently being processed, or an offset from the current macroblock that corresponds to a motion vector.

The memory read/write loop commands (2) and (3) loop over the macroblocks of a rectangular region, over the blocks of a rectangular region, or over the pixels of a rectangular region. Loop control that would normally require a complicated program loop is thus achieved simply with the memory read/write loop commands (2) and (3).

For example, reading the data of a whole frame, the data of one macroblock, or the pixels of some rectangular region can each be done with a single memory read/write loop command.

The AGU register commands (4) set and clear the data of the internal registers that AGU3 uses to control the execution order of the modules (functional blocks).

The subroutine commands and conditional branch commands (5) control the flow of the AGU3 program.

The special commands (6) sent by the master computer are not available in run mode, in which AGU3 operates from the program in ROM 9; they take effect only in test mode, in which AGU3 takes its commands from the master computer. They include a step command, which runs only the specified number of steps. In test mode, all of commands (1) to (3) can also be sent from the master computer.

Thus, in test mode, image data can be written to memory 1 from master computer 8; the system is then switched to run mode so that a particular hardware module operates; and after returning to test mode, the results written to the memory can be read into master computer 8.

In this way, functional verification can be performed at the level of each functional block.

The special commands sent by master computer 8 also include commands that read the internal state of each functional block; in the same way, functional verification of each functional block can also be performed at the circuit level.

The motion vector search module (functional block) is described next.

For each macroblock of the frame currently being processed, motion vector search looks in the previous frame, near the position of that macroblock, for the 16 × 16 pixel region that is most similar to it.

In the actual search, the candidate region is shifted up, down, left and right from the current position by at most 16 pixels, that is, within a 48 × 48 pixel range; by interpolation, the search is carried out at half-pixel resolution.

For each 16 × 16 pixel region in this range, the sum of absolute differences (SAD) between its pixels and the corresponding pixels of the current-frame macroblock is calculated. This is done for every candidate region in the search range, and the region with the smallest SAD is taken as the position of the current macroblock in the previous frame.

The motion vector is then obtained from the position of the macroblock being processed and its position in the previous frame. The operation sets a 16 × 16 window within the 48 × 48 search region and evaluates the SAD of the window over the whole search region. It can be regarded as one form of the window processing that is characteristic of image processing.
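
The search just described can be sketched in software as an exhaustive SAD search. This is a minimal illustration, not the hardware method: it covers integer displacements only (the half-pixel interpolation is omitted), and the function names and the tie-breaking rule (first minimum in raster order) are assumptions.

```python
def sad(cur, ref, ox, oy, n=16):
    """Sum of absolute differences between cur and the n x n window of ref at (ox, oy)."""
    return sum(abs(cur[y][x] - ref[oy + y][ox + x])
               for y in range(n) for x in range(n))


def full_search(cur, ref, n=16, r=16):
    """Exhaustive integer-pixel search.

    cur is the n x n current macroblock; ref is the (n + 2r) x (n + 2r)
    search area of the previous frame, centred on the macroblock position.
    Returns (best_sad, (dx, dy)) over all displacements in [-r, r].
    """
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            s = sad(cur, ref, r + dx, r + dy, n)
            if best is None or s < best[0]:   # assumed tie-break: keep the first minimum
                best = (s, (dx, dy))
    return best
```

With the default `n=16` and `r=16`, `ref` is the 48 × 48 search region mentioned above.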

A hardware structure suited to this kind of window processing, the window MSPA structure, already existed. The present invention is characterized by using the window MSPA: the search data and reference data in the memory are input serially and processed in window-parallel fashion without lowering the efficiency of high-speed parallel processing.

This solves the problem of processing speed in conventional hardware for dynamic image compression.

Next, Figure 16 shows the motion vector search circuit. It comprises the external memory 1 that stores the image data, and a buffer 39 that converts the data format of the macroblock data read in from the memory one macroblock at a time. They are connected to data bus 2 and exchange data over it.

The data output from the three ports of buffer 39 are supplied over internal bus 40 to the window MSPA, which consists of 32 processor units 101, 102, up to 132, connected as a parallel array.

Processor units 101 to 132, together with adder 44, process the data in parallel at very high speed. Together they constitute the motion vector search circuit, which finds the motion vector indicating from where in the previous frame a macroblock of the current frame has moved.

Specifically, each of processor units 101 to 132 subtracts the two input pixel values (search data and reference data), takes the absolute value, and accumulates it into the SAD for one search position. The window MSPA formed by these 32 processor units carries out the 1089 window operations in parallel.

One window operation requires 16 × 16 = 256 absolute-difference operations, so the whole search requires 1089 × 256 = 278,784 of them.

The 32 processor units (also called the processor array) each complete one such operation per clock cycle, so at best 278,784 / 32 = 8712 clock cycles are needed.

However, because the 32 processors cannot all start working at the same time, the present system completes a motion vector search in 13,000 clock cycles.
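
The operation counts quoted above can be reproduced with a few lines of arithmetic. Two points are assumptions rather than statements of the text: that the 1089 window positions correspond to the 33 × 33 integer displacements from -16 to +16, and that comparing the ideal cycle count with the quoted 13,000 cycles is a fair measure of how well the array is used.

```python
WINDOW = 16            # macroblock side in pixels
RANGE = 16             # maximum displacement in each direction
PROCESSORS = 32        # processor units 101 to 132
QUOTED_CYCLES = 13000  # cycle count stated in the text

positions = (2 * RANGE + 1) ** 2           # assumed: 33 x 33 integer displacements
ops_per_window = WINDOW * WINDOW           # absolute differences per window
total_ops = positions * ops_per_window
ideal_cycles = total_ops // PROCESSORS     # one operation per unit per clock
utilisation = ideal_cycles / QUOTED_CYCLES # assumed measure, not given in the text

print(positions, total_ops, ideal_cycles, round(utilisation, 2))
```

Under these assumptions the array would be busy for roughly two thirds of the 13,000 cycles; the remainder is consistent with the start-up skew mentioned above, although the text does not break the figure down.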

In this way, a means of window-parallel processing is established in which the search data and reference data are input serially from memory 1 without lowering the efficiency of high-speed parallel processing.

Window-parallel processing is widely used in image processing. The basic principle is that a relatively small window 21 is moved up, down, left and right without gaps over a relatively large image region in order to recognise and process the image. Consider one particular pixel: it is covered by window 21 at many window positions, so if the specific computation is performed on it afresh at each position, the computation is re-executed every time the window moves, and much of this repetition is wasted.

This repetition must be eliminated. Instead of repeating the calculation each time window 21 moves, many processors work in parallel. An array processing method using the window MSPA has thus been established that avoids repeated calculation and raises parallel efficiency.

Parallel efficiency here means how closely the system approaches the ideal (100%) case, in which n processors working in parallel reduce the processing time of a single processor to 1/n. The array processing method that raises parallel efficiency without repeated calculation has already been published, so its description is omitted here.

Figure 16 thus shows the motion vector search circuit, in which the existing window MSPA for window-parallel processing is applied to motion vector search; this application is new and without precedent.

Motion vector search works in units of 16 × 16 pixel macroblocks and processes only the grey-scale signal Y; it finds from where in the previous frame a given macroblock has moved.

Next, Figure 17 is a functional block diagram of the discrete cosine transformer and the quantizer; their forward and inverse operation is described.

The block data of 8 × 8 = 64 pixels held in memory 1 are input in sequence through data bus 2 to inverse quantizer 53 and to discrete cosine transformer/inverse transformer 56.

Inverse quantizer 53 and quantizer 58 are pipelined modules that process one data item per clock cycle; this design ensures high-speed operation.

Data format converter 55 converts the parallel data format to the serial data format for discrete cosine transformer/inverse transformer 56. The discrete cosine transform, or its inverse, is performed by decomposing the usual two-dimensional process into two one-dimensional passes; transposition buffer 57 performs the transposition of the 8 × 8 data that this requires.
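
The decomposition of the two-dimensional transform into two one-dimensional passes with a transposition in between can be illustrated as follows. This is a floating-point sketch using the orthonormal DCT-II; the scaling, the function names and the direct cosine summation are assumptions made for clarity and do not represent the ROM-based circuit of the figures.

```python
import math

N = 8  # block side


def dct_1d(v):
    """Orthonormal DCT-II of a length-N sequence (direct summation)."""
    out = []
    for k in range(N):
        c = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(c * sum(v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                           for n in range(N)))
    return out


def transpose(block):
    """Swap rows and columns; plays the role of the transposition buffer."""
    return [list(col) for col in zip(*block)]


def dct_2d(block):
    """First pass over the rows, transpose, second pass, transpose back."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(r) for r in transpose(rows)]
    return transpose(cols)
```

Because both passes use the same one-dimensional routine, only one 1-D transform unit is needed; the transposition step is what lets it serve for both directions.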

Data format converter 55 is shown in detail in Figure 18, and its operation is described below.

First, bit-parallel data of 16 bits each are input from input line 61, one item per clock, and stored in register 62 and register 63, 16 registers in total.

Once 8 data items have been stored, they are output bit-serially from register 62 or register 63 over connection lines 64 and 65, and the input changeover switch selects whether the output data are taken from register 62 or from register 63.

The two register groups 62 and 63 of the data format converter are now described in more detail.

With the two register groups 62 and 63, the data format converter receives input data while, by shift operations, it moves the stored data from the upper positions to the lower positions; these two processes run side by side at the same time, which speeds up data processing.

Referring to Figure 18, the operation is described in time slots of 8 clocks each.

1) In the first 8 clocks, the 16-bit data items, data 0 to data 7, are input in sequence to register 62.

2) In the next 8 clocks, no more data are input to register 62. The lowest-order data of the 8 items held in the register are output from the register, while every item in the register is shifted towards the low end. The parallel input data are thus output in groups of 8, in order from the low end to the high end.

During this time, storing data into register 62 is inhibited, so in these 8 clock cycles the input data are stored in register 63 instead.

Thus data format converter 55 uses the two register groups 62 and 63 alternately by means of input changeover switch 66, and so receives the input data from memory 1 without interruption. This pipelined, parallel data handling is fast; that is, the system is built on a conversion method that turns sequentially input data into two groups of parallel data without lowering the data transfer rate.
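
The alternating use of the two register groups can be sketched at the level of 8-clock time slots. This illustration does not model the bit-level timing of Figure 18; bank 0 stands for register group 62 and bank 1 for register group 63, and the function names and the LSB-first bit slicing are assumptions.

```python
SLOT = 8  # clocks per time slot, as in the description of Figure 18


def ping_pong(words):
    """Yield (slot, bank_filled, bank_drained, drained_words) for each time slot.

    While one bank is being filled with up to 8 new words, the other bank,
    filled in the previous slot, is drained, so input never has to pause.
    """
    banks = [[], []]
    n_slots = (len(words) + SLOT - 1) // SLOT
    for slot in range(n_slots + 1):        # one extra slot drains the last bank
        fill, drain = slot % 2, (slot + 1) % 2
        drained = banks[drain]
        banks[drain] = []
        banks[fill] = list(words[slot * SLOT:(slot + 1) * SLOT])
        yield slot, fill, drain, drained


def bit_slices(bank, width=16):
    """Bit-serial view of one bank: slice i holds bit i of every word, LSB first."""
    return [[(w >> i) & 1 for w in bank] for i in range(width)]
```

For 16 input words, the first slot only fills bank 0; from the second slot onwards one bank is drained while the other is filled, which is how the input rate is maintained.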

The data transfer rate here means the number of bits that can be transferred per second. Serial data can be carried only 1 bit at a time on one line; for example, when a data transfer rate of 1,000,000 bps is needed, 10 lines of 100,000 bps each can be used in parallel.

As shown in the functional block diagram of the discrete cosine transformer/inverse transformer in Figure 19, the 8-bit serial data are input to the input processing section, pass through eight 16-bit accumulators, and are finally input to output processing section 73.

Figure 20 is a detailed functional block diagram of the input processing section of the discrete cosine transformer/inverse transformer. The 8-bit serial input data are fed to bit-serial adder 81 and bit-serial subtracter 82, whose outputs go to ROM 83, an 8-bit ROM for distributed arithmetic; the 8 input data are also fed directly to another 8-bit distributed-arithmetic ROM 84.

The outputs of these two ROMs are selected by switch 86, which is controlled by switching signal 85. Switching signal 85 is set to "0" for the discrete cosine transform and to "1" for the inverse discrete cosine transform.

Figure 21 is a detailed functional block diagram of the output processing section of the discrete cosine transformer/inverse transformer. The 8 bit-parallel input data are fed to bit-parallel adder 91 and bit-parallel subtracter 92.

The two sets of output data are selected by switch 94, which is controlled by switching signal 93. Switching signal 93 is set to "0" for the discrete cosine transform and to "1" for the inverse discrete cosine transform.

The output data of changeover switch 94 are input to the 8 registers 95 and output serially from the registers over output line 96.

A video camera on the video telephone captures the caller's face, and the video signal, with the face as its main subject, is transmitted to the receiver over the telephone line. The series of processing steps performed before transmission is described here.

Because communication is bidirectional, transmission and reception take place simultaneously on the same line; the explanation of bidirectional communication is omitted here.

The general approach is a signal processing circuit for low-bit-rate dynamic image transmission based on the existing ITU international standard H.263.

The following functions are added on this basis: the face part of the video signal is extracted to form the face window 21, and window 21 is moved so as to follow the motion of the face; a temporal filter suppresses motion in the background outside the face; and a new transfer rate control structure keeps the transmission delay very small. With these structures and functions, the dynamic image data are compressed appropriately.

As the first added function, the video signal input from a video camera or the like is recognised, the face window 21 is formed as a specific region that can move freely within the picture, and the mean of the motion vectors produced as the pixels in window 21 follow the motion of the face is calculated; the added functional block includes a program for this calculation.

When the receiver has the sender within the naked-eye field of view, the receiver's line of sight follows the motion of the sender's face, so the sender's face is usually at the centre of the receiver's field of view. This is because the human eye has its best image recognition ability near the centre of the visual field, and an object of interest is placed at the centre of the field in order to see it clearly. Gathering as much valuable information as possible, without omission, is an instinctive, unconscious and necessary perceptual behaviour.

Accordingly, the sender's face is placed at the centre of face window 21 (not the centre of the picture), where the receiver's eyes resolve it best. This is the first kind of intelligence. To apply it in an electromechanical device, a calculation program computes the mean of the motion vectors of the face window 21 that contains the person's face, and a window position control program moves face window 21 according to that mean.
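
The window position control described above can be sketched as follows. The text states only that the window is moved according to the mean motion vector; the sign convention (vectors taken as the displacement of the face from the previous frame to the current one), the rounding, the clamping to the picture and the function names are all assumptions.

```python
def mean_motion_vector(vectors):
    """Mean (dx, dy) of the motion vectors found inside the face window."""
    if not vectors:
        return (0.0, 0.0)
    n = len(vectors)
    return (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)


def move_window(window, vectors, frame_w, frame_h):
    """Shift window (x, y, w, h) by the rounded mean vector, clamped to the picture."""
    x, y, w, h = window
    mx, my = mean_motion_vector(vectors)
    nx = min(max(x + round(mx), 0), frame_w - w)   # assumed: keep window inside picture
    ny = min(max(y + round(my), 0), frame_h - h)
    return (nx, ny, w, h)
```

Calling `move_window` once per frame keeps the window centred on the face as long as the mean vector tracks the face motion.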

The second kind of intelligence applies when the background image 22 outside face window 21 moves violently and therefore produces a large amount of dynamic image data: the amount of motion in background image 22 is suitably reduced by an encoding algorithm consisting of a calculation program that lowers image quality. The algorithm adds the values of the pixels at the same position in two consecutive frames, divides the sum by 2, and replaces the image information of the later frame with the result. This suppresses the amount of information in the encoded dynamic image.
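
The averaging step can be sketched for a single-channel frame held as a list of rows. The restriction to pixels outside the face window follows the text; the integer (floor) division, the window representation (x, y, w, h) and the function name are assumptions.

```python
def temporal_filter(prev, cur, window):
    """Replace each background pixel of cur with the mean of prev and cur.

    window is (x, y, w, h); pixels inside it are copied from cur unchanged.
    """
    wx, wy, ww, wh = window
    out = []
    for y, (prow, crow) in enumerate(zip(prev, cur)):
        row = []
        for x, (p, c) in enumerate(zip(prow, crow)):
            inside = wx <= x < wx + ww and wy <= y < wy + wh
            row.append(c if inside else (p + c) // 2)   # assumed: floor division
        out.append(row)
    return out
```

Averaging two frames halves the frame-to-frame difference in the background, which is what reduces the amount of difference information to be coded.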

To suppress the amount of information, a face-emphasising H.263+ encoding algorithm is used. It has an output transfer rate control mechanism that minimises the transmission delay of images on a line of limited capacity, and a program that decides the operating sequence and changes the timing of data exchange so that the motion appears natural to the eye. As a result, the quality of background image 22 is not noticeably degraded; it is merely slightly blurred.

The outline of the sender's face is placed roughly at the centre of face window 21; the information in background image 22 outside face window 21 is coarsened, while the information inside face window 21 is kept fine. The total amount of image information in the video signal thus becomes very small, which brings it within the constraints of the telephone line.

The face moves somewhat during speech. To follow these movements, the face window position control program controls the movement of face window 21 and keeps updating it to the latest position. As a result, even on an ordinary household black-and-white television set, an image is obtained that is very close to the real face and moves naturally, while the useless information relating to the background is compressed to the limit, so that very high transmission efficiency is maintained.

Because the amount of information a video telephone can transmit is fixed, dynamic image data that do not affect the viewer must be compressed thoroughly. Even information that is locally not negligible must be eliminated if its influence on the whole is very small. This leaves room for more of the other, important information, and overall performance improves as a result.

Setting all pixel values of the difference information in B frames to zero, and quantizing by truncation the grey-scale signal data of macroblocks other than intra macroblocks, also reduce the amount of dynamic image information considerably.
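
The difference between the two quantization rules can be illustrated with a uniform quantizer. This sketch follows the rule set out in claim 2 (intra macroblocks rounded; for other macroblocks the grey-scale signal truncated and the colour-difference signals rounded). Reading the text as truncation toward zero, the uniform step and the function names are assumptions; this is not the H.263 quantizer itself.

```python
def quantize(coef, step, mode):
    """Quantize one integer coefficient; mode is 'round' or 'truncate' (toward zero)."""
    sign = -1 if coef < 0 else 1
    mag = abs(coef)
    if mode == "round":
        level = (mag + step // 2) // step   # round half up on the magnitude
    elif mode == "truncate":
        level = mag // step
    else:
        raise ValueError(mode)
    return sign * level


def quantize_macroblock(y_coefs, c_coefs, step, intra):
    """Apply the assumed rule: Y is truncated unless the macroblock is intra."""
    y_mode = "round" if intra else "truncate"
    return ([quantize(c, step, y_mode) for c in y_coefs],
            [quantize(c, step, "round") for c in c_coefs])
```

Truncation sends more small coefficients to zero than rounding does, which is where the saving in coded information comes from.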

Although these operations in themselves degrade the image, the amount of information that can be added in exchange far exceeds the amount removed, so the image quality improves as a result.

Next, the method of making the motion of the lips correspond to the sound, that is, lip synchronisation, is described. In the newly developed rate control scheme, the rate control mechanism takes into account the communication delay that can be predicted from two quantities: the reference coding amount for one frame of the moving picture (25 or 30 frames per second in television), for example a fixed 112 bytes, and the amount of information that can be transmitted per unit time, that is, the transmission bit rate, for example 27 kbps. If, for example, a communication delay of 10 frames can normally be predicted, the information is processed with that delay taken into account when the future is predicted from the changes from the past up to the present.
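
The relation between the two example figures can be checked with simple arithmetic. This only relates the quoted frame size to the quoted bit rate; it is not the rate control mechanism itself, and taking 1 kbps as 1000 bits per second, as well as the function names, are assumptions.

```python
def frame_transmit_time(frame_bytes, bit_rate_bps):
    """Seconds needed to send one coded frame at the given bit rate."""
    return frame_bytes * 8 / bit_rate_bps


def frames_per_second(frame_bytes, bit_rate_bps):
    """Number of coded frames of this size that the line can carry per second."""
    return bit_rate_bps / (frame_bytes * 8)


t = frame_transmit_time(112, 27_000)   # example figures from the text
fps = frames_per_second(112, 27_000)
```

With these figures one frame occupies the line for about 33 ms, so a budget of 112 bytes per frame is consistent with roughly 30 frames per second at 27 kbps.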

Some unpredicted events will occur, but because the prediction of the future stays within a range that causes no practical problem, in one embodiment of the invention the image delay was successfully kept within 3 frames relative to the sound, which is normally received without delay.

Even if a video telephone incorporates these kinds of intelligence, when the hardware that implements them cannot operate at high speed, some of the frames in each second are lost. A picture with dropped frames destroys the natural atmosphere of a conversation.

The newly developed window MSPA and the devices that form the elements of the hardware system, including the AGU central control device, the motion vector search circuit, the discrete cosine transform and quantization circuits, and the inverse quantization and inverse discrete cosine transform circuits, achieve the intended hardware with small-scale, low-cost circuitry. The circuits are as described above.

In short, to make the video telephone picture clear and natural:

(a) Only the part the viewer wants to see, such as the face, is emphasised; importance is distinguished by target and weighted accordingly.

(b) To make the whole picture clear, useless information is cut.

(c) To keep picture and sound synchronised and create a natural atmosphere, the delay caused by image processing is minimised.

(d) To produce pictures that move at a natural speed, hardware is used that operates at high speed yet at low cost.

The image information transmission system is constituted in this way.

The video telephone described above is only one embodiment of the invention; the range of application of this efficient code compression system for dynamic image data is not limited to it.

Image quality is lowered in the parts that do not interest the user; conversely, the original quality is kept in the part on which interest is concentrated; and unneeded parts that flow past continuously are compressed to the minimum amount of information needed for image quality. Serving the original purpose of image transmission, preserving the atmosphere of the image, and reproducing natural motion can be regarded as the essential conditions embodied in the invention.

Claims

1. A high-performance code compression system for dynamic image data, characterized by its processing of B frames and by comprising an encoder and a decoder, wherein the encoder comprises: a motion prediction function part that receives the current frame image, a preceding reference frame image and a following reference frame image, performs motion prediction and motion compensation, and determines the prediction mode; an all-pixel zeroing function part that receives the difference information between the predicted image output from the motion prediction function part and the current frame image, and forcibly sets all pixel values of the difference information to zero; and a code generation part that receives the zeroed pixels output from the all-pixel zeroing function part and, according to the prediction mode determined by the motion prediction function part, predicts the next motion of the dynamic image while encoding all of the zeroed image information; the encoder compressing and encoding the dynamic image data and sending them out; and wherein the decoder comprises: a decoding part that receives, via a transmission line, the signal sent from the encoder and decodes it; an inverse quantization part that receives the decoded signal output from the decoding part and inverse-quantizes it; an inverse discrete cosine transform part that receives the inverse-quantized signal output from the inverse quantization part, performs the inverse discrete cosine transform, and restores the difference image; and an adder that combines the restored difference image with the predicted image obtained according to the prediction mode and outputs the restored image.

2. The high-performance code compression system for dynamic image data according to claim 1, characterized in that: for an intra macroblock, that is, a current macroblock whose image is coded directly without using the predicted image, the grey-scale signal and the colour-difference signals are quantized by rounding; for macroblocks other than intra macroblocks, the grey-scale signal is quantized by truncation while the colour-difference signals are still quantized by rounding; and the same quantization level is used for the grey-scale signal and the colour-difference signals, thereby reducing noise in the colour-difference signals.

3. The high-performance code compression system for dynamic image data according to claim 1, characterized by comprising: comparison means that compares the bit quantity of the encoded dynamic image data with the remaining bit quantity of the communication buffer; control means that controls the target bit quantity of a frame according to the result of the comparison so that the remaining bit quantity is not exhausted; and calculation means that, in frame-level transfer rate control, calculates the target bit quantity of each frame using the result obtained by the control means, so as to minimise the delay time and frame loss that occur between input of an image from the video camera and output of the decoded image after it has passed through the encoder, the transmission line and the decoder.

4. The high-performance code compression system for dynamic image data according to claim 3, characterized by: a first-step calculation method that calculates the quantization level applied to the first macroblock of a frame using the weighted mean of the quantization levels of the macroblocks of the previous frame; and a second-step calculation method that calculates the fine adjustment of the quantization level applied to the second and subsequent macroblocks using the target bit quantity, the actual coding amount up to the current macroblock, and the quantization level used in the first macroblock.

5. The high-performance code compression system for dynamic image data according to claim 4, characterized in that the encoder has a system configuration in which: the memory that holds image frame information is connected, via a data bus, to hardware modules that operate independently of one another; and a central control device, which controls the data flow between the memory and the hardware modules and their operation timing, is connected to each hardware module by a control bus.

6. The high-performance code compression system for dynamic image data according to claim 4 or 5, characterized in that the central control device comprises: an external memory whose address space is formed by dividing the image into many blocks and using an address structure suited to block-mode processing that can handle information in block coordinate units; and a read-only memory (ROM) storing a command program comprising address generation commands, which can generate addresses for memory access in macroblock coordinate units, block units and pixel units, and commands that control execution of the hardware modules.

7, according to the high-performance code compression system of each described dynamic image data among the claim 1-5, it is characterized in that: adopt the said external memory; And, be " macro block " data the buffer that unit converts the data mode conversion of serial input to by each macro block from above-mentioned memory; And, be provided for the processor unit that connects with 32 parallel array shapes by the data of 3 ends of above-mentioned buffer output, constitute window treatments Memory Sharing processor array structure thus; And, the operation method of the data of processor unit being carried out the ultrahigh speed computing; And search is used for representing that current " macro block " is the motion vectors search circuit of coming that where moves from former frame.

8, according to the high-performance code compression system of each described dynamic image data in the claim 6, it is characterized in that: adopt the said external memory; And, be " macro block " data the buffer that unit converts the data mode conversion of serial input to by each macro block from above-mentioned memory; And, be provided for the processor unit that connects with 32 parallel array shapes by the data of 3 ends of above-mentioned buffer output, constitute window treatments Memory Sharing processor array structure thus; And, the operation method of the data of processor unit being carried out the ultrahigh speed computing; And search is used for representing that current " macro block " is the motion vectors search circuit of coming that where moves from former frame.

9, The high-performance code compression system for dynamic image data according to any one of claims 1 to 5, characterized in that it comprises: the above memory; a two-group data format conversion method that receives "macro block" data, each macro block consisting of 8 horizontal by 8 vertical pixels (64 pixels in total), serially from the memory one macro block at a time and converts the serial data into parallel data without reducing the data transfer rate; a processor array that performs a two-dimensional discrete cosine transform on the parallel data produced by this data conversion method; and a quantization module that receives the two-dimensional discrete cosine transform data output from the processor array, quantizes it, and then outputs the data in sequence and stores it in the external memory.

10, The high-performance code compression system for dynamic image data according to claim 6, characterized in that it comprises: the above memory; a two-group data format conversion method that receives "macro block" data, each macro block consisting of 8 horizontal by 8 vertical pixels (64 pixels in total), serially from the memory one macro block at a time and converts the serial data into parallel data without reducing the data transfer rate; a processor array that performs a two-dimensional discrete cosine transform on the parallel data produced by this data conversion method; and a quantization module that receives the two-dimensional discrete cosine transform data output from the processor array, quantizes it, and then outputs the data in sequence and stores it in the external memory.

11, The high-performance code compression system for dynamic image data according to claim 7, characterized in that it comprises: the above memory; a two-group data format conversion method that receives "macro block" data, each macro block consisting of 8 horizontal by 8 vertical pixels (64 pixels in total), serially from the memory one macro block at a time and converts the serial data into parallel data without reducing the data transfer rate; a processor array that performs a two-dimensional discrete cosine transform on the parallel data produced by this data conversion method; and a quantization module that receives the two-dimensional discrete cosine transform data output from the processor array, quantizes it, and then outputs the data in sequence and stores it in the external memory.

12, The high-performance code compression system for dynamic image data according to claim 8, characterized in that it comprises: the above memory; a two-group data format conversion method that receives "macro block" data, each macro block consisting of 8 horizontal by 8 vertical pixels (64 pixels in total), serially from the memory one macro block at a time and converts the serial data into parallel data without reducing the data transfer rate; a processor array that performs a two-dimensional discrete cosine transform on the parallel data produced by this data conversion method; and a quantization module that receives the two-dimensional discrete cosine transform data output from the processor array, quantizes it, and then outputs the data in sequence and stores it in the external memory.
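
As an illustrative note on the transform and quantization stage recited in claims 9 to 12, the following is a minimal software sketch of a two-dimensional discrete cosine transform over one 8 by 8 block followed by quantization. It is not the claimed processor array or quantization module. It assumes an orthonormal DCT-II computed as a row pass followed by a column pass, and a single uniform quantizer step; the names `dct_1d`, `dct_2d`, `quantize` and the step value 16 are hypothetical and do not appear in the claims.

```python
import math

N = 8  # one block is 8 x 8 = 64 pixels, as recited in the claims


def dct_1d(values):
    """Orthonormal 1-D DCT-II of a length-N sequence (assumed variant)."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        total = sum(
            values[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for n in range(N)
        )
        out.append(scale * total)
    return out


def dct_2d(block):
    """2-D DCT as a row pass followed by a column pass."""
    rows = [dct_1d(row) for row in block]
    # Transform each column of the row-transformed data.
    cols = [dct_1d([rows[y][x] for y in range(N)]) for x in range(N)]
    # cols is indexed [x][v]; transpose back to [v][u] order.
    return [[cols[x][v] for x in range(N)] for v in range(N)]


def quantize(coeffs, step=16):
    """Uniform quantization with a hypothetical step size."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


if __name__ == "__main__":
    flat = [[128] * N for _ in range(N)]  # uniform grey block
    for row in quantize(dct_2d(flat)):
        print(row)
```

For a uniform block of value 128, only the DC coefficient survives quantization (1024 / 16 = 64) and every other entry is zero, which shows why blocks with little detail cost few bits after this stage.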