WO2011111533A1 - Transmission error concealment processing device, transmission error concealment processing method, and program thereof - Google Patents

Transmission error concealment processing device, transmission error concealment processing method, and program thereof

Info

Publication number
WO2011111533A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
transmission error
decoded
error concealment
concealment processing
Prior art date
Application number
PCT/JP2011/054064
Other languages
French (fr)
Japanese (ja)
Inventor
和也 早瀬
藤井 寛
誠之 高村
裕尚 如澤
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社
Publication of WO2011111533A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/89: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving methods or arrangements for detection of transmission errors at the decoder
    • H04N19/895: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving methods or arrangements for detection of transmission errors at the decoder in combination with error concealment

Definitions

  • The present invention relates to a transmission error concealment technique for a system that receives encoded data hierarchized into two or more layers, decodes the received encoded data, and reproduces the video, so that the image quality of the reproduced video degrades as little as possible even when a transmission error occurs.
  • Transmission error recovery technology is a technology that provides redundant data to the original video data in advance and recovers the information of packets lost due to transmission errors using the redundant data.
  • a typical example is forward error correction (FEC) technology.
  • Transmission error concealment technology constructs playback video of the highest possible quality using only information that has already been received, for example when packets lost due to transmission errors cannot be recovered using FEC, or when the video is not complete at its playback time because of transmission delay. For example, in Patent Document 1, when an error occurs in decoded image data, the transmission error is concealed by repeatedly displaying the currently displayed image.
  • a scalable video-coded video stream is composed of a basic layer that holds information of low video quality and an extended layer that holds information of high video quality.
  • the extended layer data is difference information from the basic layer data necessary for reproducing high-quality video. Due to its hierarchical nature, the usage hierarchy can be flexibly switched in response to transmission errors, and it has very high compatibility with error concealment technology.
  • The present invention has been made in view of such circumstances. Its purpose is to establish a design method for a transmission error concealer that, when a video stream having a hierarchical structure is input and the decoded signal of a desired layer has not been decoded by the time at which it is to be reproduced, inputs the already received and decoded signals stored in the frame buffer of the terminal, together with interpolation signals interpolated by a predetermined method, into a predetermined mixing function, mixes them to generate a pseudo signal of the desired layer at that time, and outputs that signal as the final video signal for reproduction.
  • A first aspect of the present invention is a transmission error concealment processing apparatus in a system that receives encoded data hierarchized into two or more layers and decodes and reproduces the received encoded data. The apparatus includes: a decoded signal storage unit that stores, for each layer, a decoded signal obtained by decoding the encoded data; an interpolation signal generation unit that, when the decoded signal of a desired layer cannot be obtained at a time at which reproduction is required because of a transmission error, reads one or more decoded signals stored in the decoded signal storage unit, inputs the read decoded signals to a mixing function, mixes them at a set mixing ratio to generate a mixed signal, and treats the generated mixed signal as an interpolation signal of the desired layer created in a pseudo manner for that time; and a reproduction image output unit that outputs the interpolation signal as the signal for reproduction at that time.
  • The transmission error concealment processing apparatus may further include an interpolation signal storage unit that stores the interpolation signal. In that case, the interpolation signal generation unit reads one or more interpolation signals stored in the interpolation signal storage unit and inputs them to the mixing function together with the decoded signals read from the decoded signal storage unit, so that the one or more decoded signals and the one or more interpolation signals are mixed to generate the mixed signal that serves as the interpolation signal of the desired layer.
  • The mixing ratio may be set higher for a signal input to the mixing function whose time is closer to the time at which reproduction is required, or whose layer is closer to the desired layer, or it may be set to a value corresponding to the temporal pixel value change of the signal.
  • A mixing ratio set according to the temporal pixel value change of the signal may be a value set according to the result of determining, by motion amount estimation for each region obtained by dividing the screen, whether each region is a still region or a moving region.
  • A second aspect of the present invention is a transmission error concealment processing method in a system that receives encoded data hierarchized into two or more layers, decodes the received encoded data, and reproduces it. In the method, a mixed signal is generated by inputting the read decoded signals into a mixing function and mixing them at a set mixing ratio, and the generated mixed signal is treated as the interpolation signal of the desired layer created in a pseudo manner for the time at which reproduction is required.
  • The transmission error concealment processing method may include a step of storing the interpolation signal. In the interpolation signal generation step, one or more stored interpolation signals are read and input to the mixing function together with the read decoded signals, so that the one or more decoded signals and the one or more interpolation signals are mixed to generate a mixed signal that is the interpolation signal of the desired layer.
  • In the method as well, the mixing ratio may be set higher for a signal input to the mixing function whose time is closer to the time at which reproduction is necessary, or whose layer is closer to the desired layer, or it may be set to a value corresponding to the temporal pixel value change of the signal.
  • A mixing ratio set according to the temporal change in the pixel value of the signal may be a value set according to the result of determining, by motion estimation for each divided region of the screen, whether each region is a still region or a moving region.
  • a third aspect of the present invention is a transmission error concealment processing program for causing a computer to execute the transmission error concealment processing method.
  • the image quality of the video finally reproduced can be improved as compared with the conventional technique.
  • When the decoded signal of the desired layer is not obtained by the time at which it is to be reproduced, because of packet loss, transmission delay, decoding processing delay, or the like, an interpolation signal of the desired layer for which the decoded signal is missing at that time is generated in a pseudo manner, using the decoded signals received so far and the interpolation signals generated so far.
  • the interpolated signal refers to a signal obtained by performing a high image quality process such as restoration of a high frequency component on the decoded signal.
  • This interpolated signal is finally reproduced via the video renderer as the signal for reproduction at the corresponding time.
  • the transmission error includes not only errors such as packet loss in the network but also cases where a transmission delay occurs or decoding is not in time.
  • the following procedure is taken to create an interpolation signal at the time.
  • the received decoded signal and interpolation signal are stored in a memory area such as a frame buffer.
  • the number, type, and time range of the decoded signals and interpolation signals to be stored are set in advance from the outside.
  • the decoded signal, the interpolation signal, etc. stored in the memory area are mixed using a predetermined mixing function, and the obtained mixed signal is regarded as an interpolation signal of a desired hierarchy created in a pseudo manner at the time.
  • The functional form of the mixing function and the coefficients it uses internally are set in advance from the outside.
  • Surrounding decoded signals are mixed adaptively with reference to information on motion and image quality. For example, in a still region, the past signal of the same layer as the desired layer is considered to have a signal value closer to the missing signal. In a moving region, on the other hand, the lower layer signal at the time in question is considered to have a signal value closer to the missing signal. Therefore, when the surrounding decoded signals are mixed, the image quality of the video can be improved by increasing the mixing ratio of signals that are estimated, with reference to motion information and the like, to have signal values close to the missing signal.
  • the encoded data to be input is data hierarchized into two or more hierarchies by scalable video coding.
  • Examples of scalable video coding include SVC (Scalable Video Coding), which is the Annex G extension of the H.264/AVC standard.
  • the processing of the present embodiment is started upon receipt of a desired layer missing instruction flag indicating that a signal of the desired layer is not obtained.
  • As the desired layer, any layer above the lowest layer is set. It is assumed that predetermined received decoded signals and predetermined interpolation signals generated up to the time in question are stored in the memory area of the terminal that performs decoding. The number, type, and time range of the decoded signals and interpolation signals to be stored are given in advance from the outside.
  • each pixel of the interpolation signal ipl (T) at the time T is generated as follows.
  • f (s) is a mixing function for generating a mixed signal by inputting the signal group s.
  • each pixel of the interpolation signal ipl (T) is generated using the following mixing function.
  • $\mathrm{ipl}(T) = f(\mathrm{rec}(T-a), \mathrm{rec}(T-a+1), \ldots, \mathrm{rec}(T-1), \mathrm{rec}(T))$ (1)
  • Here, it is assumed that the decoded signals and interpolation signals from time T-a up to time T are stored.
  • When H.264/AVC B pictures or the like are used, a future decoded signal may be received before the time T, so the signals to be mixed are not necessarily limited to temporally past signals.
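The stored window of signals that formula (1) draws on can be sketched as a small bounded buffer. This is only an illustration: the names `StoredSignal` and `SignalStore`, the `max_signals` parameter, and the `"rec"`/`"ipl"` tags are hypothetical and do not come from the patent text, which states only that the number, type, and time range of stored signals are set from the outside.

```python
from collections import deque
from typing import Any, List, NamedTuple


class StoredSignal(NamedTuple):
    time: int    # frame time (e.g. T-2, T-1, T)
    layer: int   # layer index the signal was decoded or interpolated at
    kind: str    # "rec" for a decoded signal, "ipl" for an interpolation signal
    frame: Any   # pixel data, representation left open in this sketch


class SignalStore:
    """Bounded frame buffer: the oldest signals are dropped automatically."""

    def __init__(self, max_signals: int) -> None:
        self._signals: deque = deque(maxlen=max_signals)

    def push(self, signal: StoredSignal) -> None:
        self._signals.append(signal)

    def window(self, t_from: int, t_to: int) -> List[StoredSignal]:
        """Return the stored signals whose time lies in [t_from, t_to]."""
        return [s for s in self._signals if t_from <= s.time <= t_to]
```

Calling `window(T - a, T)` on such a store would return the candidates passed to the mixing function f.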
  • In the following example, the time in question is T, the desired layer to be reproduced is L, and the interpolation signal generated in this embodiment is ipl(T).
  • The example assumes spatially scalable encoded data.
  • data has been received up to layer L at time T-2, layer L-2 at time T-1, and layer L-1 at time T without loss.
  • the decoded signal rec (T-2) is reproduced as it is at time T-2, and the interpolation signal ipl (T-1) is reproduced at time T-1.
  • the present embodiment may be applied to the method for generating the interpolation signal ipl (T-1), or other methods may be used. As an example of the other method, the method described in Patent Document 1 can be cited.
  • As the mixing function f(s), a function that linearly weights the input signals and averages them is used.
  • When the resolution of the decoded signal rec(T-1) or the decoded signal rec(T) is smaller than the resolution of the desired layer, the signal is enlarged to the resolution of the desired layer.
  • Examples of the enlargement method include enlargement using a linear filter such as a 4-tap or 6-tap filter, and super-resolution processing that reconstructs high-frequency components in a pseudo manner.
  • the pixel values at the same spatial position are respectively weighted and mixed.
  • The mixing ratio w is supplied from the outside, either through a setting file at the receiving terminal or through a derivation module provided inside the application. A single mixing ratio may be given to the whole frame, or a separate mixing ratio may be given to each pixel or each image region of arbitrary shape.
  • The mixing ratio may be set by any of the following four methods or by a combination of them.
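As a minimal sketch of the linear weighting and averaging described above, the following function mixes co-located pixels of several frames that already share the resolution of the desired layer. The function name `mix_frames`, the list-of-rows frame representation, and the normalization of the weights are assumptions made for illustration, and a single weight per frame is used rather than per-region weights.

```python
from typing import List, Sequence

Frame = List[List[float]]  # rows of pixel values (hypothetical representation)


def mix_frames(frames: Sequence[Frame], weights: Sequence[float]) -> Frame:
    """Weight co-located pixels of each frame and sum them (weights normalized)."""
    if not frames or len(frames) != len(weights):
        raise ValueError("need at least one frame and one weight per frame")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    height, width = len(frames[0]), len(frames[0][0])
    out = [[0.0] * width for _ in range(height)]
    for frame, w in zip(frames, weights):
        for y in range(height):
            for x in range(width):
                out[y][x] += (w / total) * frame[y][x]
    return out
```

For example, `mix_frames([past, current], [1, 3])` would give the current frame three times the influence of the past frame at every pixel.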
  • the mixing rate setting method 1 is a setting method according to the difference between the time and the time of the decoded signal or interpolation signal to be mixed. Since the signal closer to the time has a picture structure closer to that of the signal of the desired hierarchy at the time, it is desirable to set the mixing ratio of the signal close to the time as high as possible.
  • the mixing rate setting method 2 is a setting method corresponding to the difference between the desired layer and the decoded signal or interpolated signal layer to be mixed. Since the signal in the layer closer to the desired layer stores a larger number of higher frequency components, it is desirable to set the mixing ratio of the signal in the layer close to the desired layer as high as possible.
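Setting methods 1 and 2 say only that signals nearer in time, or nearer in layer, to the missing signal should receive higher mixing ratios. One possible weighting that satisfies both is sketched below; the inverse-distance formula and the name `distance_weights` are purely illustrative assumptions, not something the patent prescribes.

```python
from typing import List, Sequence, Tuple


def distance_weights(
    signals: Sequence[Tuple[int, int]], target_time: int, target_layer: int
) -> List[float]:
    """Return normalized weights for (time, layer) pairs.

    Assumed rule: weight shrinks with both the time distance and the layer
    distance to the missing signal (inverse-distance, illustrative only).
    """
    raw = [
        1.0 / ((1 + abs(target_time - time)) * (1 + abs(target_layer - layer)))
        for time, layer in signals
    ]
    total = sum(raw)
    return [r / total for r in raw]
```

With signals at (T-1, L) and (T-2, L), the first receives the larger weight; with signals at (T, L-1) and (T, L-2), the first again receives the larger weight.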
  • <Mixing ratio setting method 3: setting according to the estimated value of video quality>
  • a method for setting the mixing ratio according to the estimated value of the video quality is conceivable. If the video quality can be estimated, it is desirable to set the mixing ratio of the highest quality signal as high as possible.
  • a method for estimating the video quality from the quantized value and the picture type can be considered.
  • <Mixing ratio setting method 4: setting according to the temporal pixel value change of the signal> A method for setting the mixing ratio according to temporal pixel value changes is conceivable. Because pixel values at the same spatial position are mixed, when the pixel values differ over time because of object movement or the like, it is desirable to reduce the mixing ratio of pixel values from other times and to increase the mixing ratio of the decoded signal at the time in question. Conversely, when the pixel value does not change over time, it is desirable to increase the mixing ratio of the decoded signal or interpolation signal of the desired layer, which contains many high-frequency components, and to set a smaller mixing ratio the further a signal's layer is from the desired layer.
  • a setting method based on the above requirements can be considered.
  • For the pixel position x, it is estimated whether the pixel value change is large from time T-2 to the time T, and whether it is large from time T-1 to the time T.
  • For example, the motion amount from time T-2 to the time T may be estimated, with pixel values in an area determined to be a moving area regarded as having a large pixel value change and pixel values in an area determined to be a still area regarded as having a small pixel value change.
  • a different mixing ratio setting method may be used for each interpolation signal.
  • the motion amount estimation method 1 is a method for estimating the motion amount according to the difference value between the reduced signal of the decoded signal at time T-2 and the decoded signal at time T.
  • Whether the pixel belongs to the still region or the moving region is determined as follows.
  • The decoded signal rec(T-2) at time T-2 is reduced to the resolution of the decoded signal at time T, and a pixel value difference is obtained between pixels at the same spatial position.
  • The reduced signal of the decoded signal rec(T-2) is expressed as dws(T-2).
  • E1 is a threshold of the difference signal value that separates the still region from the moving region, and is given by an external function. For example, if the resolution of rec(T-2) is 1920 × 1080 and the resolution of rec(T) is 960 × 540, the above determination is performed for each of the 960 × 540 pixels, and each determination result is regarded as the result for the four pixels at the same spatial position in the 1920 × 1080 frame.
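A minimal sketch of motion amount estimation method 1 for the 2:1 resolution case is given below. The determination formula itself is not reproduced in this text, so the comparison used here (absolute difference greater than E1 means moving) is an assumption based on the later flow description; the 2 × 2 averaging used for reduction and the function names are also assumptions.

```python
from typing import List

Frame = List[List[float]]


def downscale_2x(frame: Frame) -> Frame:
    """Reduce a frame by 2 in each dimension by averaging 2x2 blocks (assumed filter)."""
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [
        [
            (frame[2 * y][2 * x] + frame[2 * y][2 * x + 1]
             + frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w)
        ]
        for y in range(h)
    ]


def classify_pixels(rec_t2: Frame, rec_t: Frame, e1: float) -> List[List[bool]]:
    """Return a full-resolution map where True marks a moving pixel.

    Each low-resolution decision is copied to the four co-located
    high-resolution pixels, as in the 1920x1080 / 960x540 example.
    """
    dws = downscale_2x(rec_t2)
    h, w = len(rec_t), len(rec_t[0])
    moving_map = [[False] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            moving = abs(dws[y][x] - rec_t[y][x]) > e1  # assumed direction
            for dy in (0, 1):
                for dx in (0, 1):
                    moving_map[2 * y + dy][2 * x + dx] = moving
    return moving_map
```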
  • the motion amount estimation method 2 is a method for estimating the motion amount according to the difference value between the decoded signal at time T-2 and the enlarged signal at time T.
  • Whether the pixel belongs to the still region or the moving region is determined as follows. A pixel value difference is obtained between pixels at the same spatial position. The enlarged signal at the time T is expressed as ups(T).
  • E2 is a threshold of the difference signal value that separates the still region from the moving region, and is given by an external function.
  • the motion amount estimation method 3 is a method for estimating the motion amount according to the norm of the motion vector used for generating the decoded signal rec (T) at the time T.
  • The norm of the motion vector of a certain macroblock (16 × 16 pixel region) used for generating the decoded signal rec(T) at the time T is denoted n.
  • This motion vector is assumed to be a motion vector from time T-2 to the time T. Whether the macroblock to which the pixel belongs is in a still area or a moving area is then determined as follows.
  • N is a threshold value of a motion vector norm that separates a stationary region and a moving region, and is given by an external function.
  • An example of a norm is Euclidean distance.
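Motion amount estimation method 3 can be sketched with the Euclidean norm mentioned above. The decision rule is not reproduced in this text, so treating a norm greater than N as "moving" is an assumption, as is the function name.

```python
import math
from typing import Tuple


def is_moving_macroblock(motion_vector: Tuple[float, float], n_threshold: float) -> bool:
    """Classify a macroblock from the Euclidean norm of its motion vector.

    Assumed rule: norm > N means moving area, otherwise still area.
    """
    norm = math.hypot(motion_vector[0], motion_vector[1])
    return norm > n_threshold
```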
  • the motion amount estimation method 4 is a method for estimating a motion amount according to the type of prediction mode used to generate the decoded signal rec (T) at the time T.
  • The prediction mode of a macroblock used for generating the decoded signal rec(T) at the time T is denoted m. It is assumed that the encoded data conforms to SVC. Whether the macroblock to which the pixel belongs is in a still area or a moving area is then determined as follows.
  • “skip” indicates the skip mode in SVC.
  • the motion amount estimation method 5 is a method for estimating the motion amount according to the magnitude of the prediction residual signal used for generating the decoded signal rec (T) at the time T.
  • the signal value of the prediction residual signal used for generating the decoded signal rec (T) at the time T is set as r. At this time, it is determined as follows whether the pixel belongs to a still region or a moving region.
  • R is a threshold value of a prediction residual signal that separates a stationary region and a moving region, and is given by an external function.
  • The signal value of the prediction residual signal at the same spatial location as the pixel may be used as it is, or a statistic such as the variance, average, maximum, or median within the image region (for example, the macroblock) to which the pixel belongs may be used.
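The residual-based determination of method 5 might look like the sketch below when a region statistic is used. The use of absolute residual magnitudes, the set of statistics offered, the comparison direction (measure greater than R means moving), and all names are assumptions for illustration.

```python
from statistics import mean, median
from typing import Sequence

# Statistics offered by this sketch; the text also mentions variance.
_STATS = {"mean": mean, "max": max, "median": median}


def residual_measure(residuals: Sequence[float], stat: str = "max") -> float:
    """Summarize the residual magnitudes of one image region."""
    magnitudes = [abs(r) for r in residuals]
    return _STATS[stat](magnitudes)


def is_moving_by_residual(
    residuals: Sequence[float], r_threshold: float, stat: str = "max"
) -> bool:
    """Assumed rule: a region whose measure exceeds R is a moving region."""
    return residual_measure(residuals, stat) > r_threshold
```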
  • CBP: Coded Block Pattern.
  • Some of these five motion amount estimation methods may be connected in multiple stages. For example, motion amount estimation according to the motion vector norm may be performed first to divide the frame into regions with large motion and regions with small motion, and motion amount estimation according to the signal difference value may then be performed on the regions with small motion to subdivide them into still areas and moving areas.
  • The determination result derived by these determinations may be corrected. For example, if a certain pixel or image area is determined to be a moving area but all surrounding pixels or image areas are determined to be still areas, the determination result for the target pixel or image area is likely to be erroneous. An erroneous determination makes the pixel or image area appear as an isolated point and degrades image quality. In this case, therefore, the determination result for the target pixel or image area is regarded as erroneous and is corrected to a still area. That is, when the determination results around the target pixel or image area differ greatly from its own determination result, correcting the result can improve the estimation accuracy.
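The two-stage example above can be sketched as follows, assuming the motion vector norm and the signal difference value for the region have already been computed. The function name, the string labels, and the comparison directions are illustrative assumptions.

```python
def classify_region(
    mv_norm: float, difference: float, n_threshold: float, e_threshold: float
) -> str:
    """Two-stage classification: motion vector norm first, signal difference second."""
    if mv_norm > n_threshold:
        # Stage 1: large motion, classified as moving without further checks.
        return "moving"
    # Stage 2: small motion, refined with the signal difference value.
    return "moving" if difference > e_threshold else "still"
```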
  • In the embodiment described below, the mixing ratio is assumed to be set by "mixing rate setting method 4: setting according to the temporal pixel value change of the signal", and the temporal pixel value change is assumed to be estimated by "motion amount estimation method 1: motion amount estimation according to the difference value between the reduced signal of the decoded signal at time T-2 and the decoded signal at time T". The motion amount is estimated by dividing a frame into image regions of one pixel or more. As will be apparent from the following description, the present embodiment can be implemented in the same manner when other mixing ratio setting methods or other motion amount estimation methods are used.
  • FIG. 2 shows a configuration example of a transmission error concealment processing apparatus according to an embodiment of the present invention.
  • In FIG. 2, 10 is a transmission error concealment processing device, 20 is a receiving device that receives packets of an encoded stream, and 30 is a decoding device that decodes the encoded stream and outputs a reproduced video signal (also simply referred to as a reproduced signal).
  • the receiving device 20 is a device that receives an encoded stream encoded by scalable video encoding.
  • The reception of the encoded stream by the receiving device 20 may be the same as in a conventional receiving device. However, if a transmission error causes hierarchically encoded data of the processing target frame to be missing, a missing instruction signal indicating this is sent to the transmission error concealment processing device 10.
  • the decoding device 30 may generate a missing instruction signal.
  • The decoding device 30 is the same as a conventional device that performs scalable video decoding, except in two respects: the received encoded information decoded by the variable length decoding unit 31 is output not only to the scalable decoding unit 32 but also to the transmission error concealment processing device 10, and the decoded signal of each layer is output to the transmission error concealment processing device 10 in addition to the reproduced video signal of the decoding result.
  • the transmission error concealment processing device 10 and the decoding device 30 are shown as separate devices for easy understanding of the description. However, the transmission error concealment processing device 10 may be incorporated in the decoding device 30 as a part of the decoding device 30.
  • The storage device 15 is provided with, as frame buffers, a decoded signal storage unit 151 that stores decoded signals already decoded by the decoding device 30 and an interpolation signal storage unit 152 that stores previously generated mixed signals.
  • information such as a motion vector, a prediction mode, a prediction residual signal, and CBP is input from the decoding device 30 and stored as received encoded information.
  • a motion amount estimation threshold and a frame mixture ratio are set in advance from the external setting unit 40 and stored.
  • When the transmission error concealment processing device 10 receives the missing instruction signal for the processing target frame from the receiving device 20, the still region / moving region determination unit 11 performs motion amount estimation for the pixel (or pixel group) to be reproduced and determines whether it belongs to a still area or a moving area. That is, the still region / moving region determination unit 11 reads the decoded signal, the enlarged signal, and the interpolation signal in the storage device 15, together with received encoded information such as the motion vector and prediction mode, from the memory area, and makes the determination. The determination result is stored in the storage device 15.
  • The mixing rate setting unit 12 reads the still region / moving region determination result and the mixing rate value for each determination outcome (that is, for the still region and for the moving region) from the storage device 15, and sets them as the mixing rates of each decoded signal, enlarged signal, and interpolation signal for the pixel. When the setting is completed, the process proceeds to the interpolation signal generation unit 13.
  • The interpolation signal generation unit 13 reads the mixing rates of the decoded signal, the enlarged signal, and the interpolation signal set by the mixing rate setting unit 12, and reads from the storage device 15 the values of each decoded signal, enlarged signal, and interpolation signal at the same spatial position as the pixel. It then mixes the values of the decoded signal, the enlarged signal, and the interpolation signal according to the read mixing rates to generate the interpolation signal at the time in question.
  • the interpolation signal generation unit 13 outputs the interpolation signal generated by the mixing to the interpolation signal storage unit 152 of the storage device 15. When the output is completed, the process proceeds to the process of the reproduction video output unit 14.
  • the playback video output unit 14 reads the interpolation signal at the time from the interpolation signal storage unit 152 of the storage device 15 and outputs it as a playback video signal to a video renderer (not shown) at the playback timing.
  • FIG. 3 shows the overall processing flow, and FIG. 4 shows the specific processing flow of the interpolation signal generation processing S12 in FIG. 3.
  • The flow uses the decoded signal rec(T) of the highest layer among the layers decoded at the desired time T (hereinafter referred to as the highest layer) and the decoded signal rec(T-2) of the frame at the most recent time among the decoded signals decoded up to the desired layer.
  • The mixing ratio is set by "mixing rate setting method 4: setting according to the temporal pixel value change of the signal", and the temporal pixel value change is estimated by "motion amount estimation method 1", which uses the reduced signal of the decoded signal at time T-2.
  • The motion amount is estimated by dividing a frame into image regions of one pixel or more. The following description is the flow for generating the interpolation signal of the frame at the desired time T.
  • Step S10 Image Region Division Processing
  • an interpolation signal output frame of a desired hierarchy is input, and the frame is divided into a plurality of predetermined image areas of one pixel or more.
  • The predetermined image regions of one or more pixels may be, for example, macroblocks (16 × 16 pixels), but are not limited to these.
  • the output by this processing is a divided frame and division information.
  • Steps S11-S13 Interpolation Signal Generation Processing Loop in Each Image Region
  • Step S12 Interpolation Signal Generation Processing
  • The inputs to the interpolation signal generation processing are the decoded signal rec(T) of the highest layer decoded at the desired time T, the decoded signal rec(T-2) of the frame at the most recent time among the decoded signals decoded up to the desired layer, and data stored in the storage device 15 such as the still region / moving region determination threshold, the mixing ratios for the still region and the moving region, the signal mixing formula, and the index of the image region.
  • the output of the interpolation signal generation process is an interpolation signal ipl (T) for the image area.
  • For the image region to be processed, the still region / moving region determination unit 11 uses the decoded signal rec(T) of the highest layer decoded at the desired time T and the decoded signal rec(T-2) of the frame at the most recent time among the decoded signals decoded up to the desired layer to determine whether the region is a still area or a moving area.
  • the mixing rate setting unit 12 sets the mixing rate.
  • The interpolation signal generation unit 13 mixes the signal values of the decoded signal rec(T) and the decoded signal rec(T-2) according to the mixing ratio, and outputs the mixed signal to the interpolation signal storage unit 152 as the interpolation signal ipl(T) for the image region.
  • Step S20 Reduced Signal Generation Processing
  • The inputs to the reduced signal generation processing, which the still region / moving region determination unit 11 performs first, are the decoded signal rec(T-2) of the frame at the most recent time among the decoded signals decoded up to the desired layer, and the resolution information of each layer.
  • The still region / moving region determination unit 11 reduces the decoded signal rec(T-2) at time T-2 to the resolution of the decoded signal rec(T) of the highest layer that could be decoded at the desired time T.
  • The reduced image area is an 8 × 8 pixel area at the same spatial position.
  • the output in this process is a reduced signal dws (T-2) of the decoded signal rec (T-2).
  • Step S21 Difference Absolute Value Calculation Processing
  • The inputs to the difference absolute value calculation processing, which the still region / moving region determination unit 11 performs next, are the reduced signal dws(T-2) of the decoded signal rec(T-2) and the decoded signal rec(T).
  • The still region / moving region determination unit 11 calculates the pixel value difference between pixels at the same spatial position in the reduced signal dws(T-2) and the decoded signal rec(T).
  • The still region / moving region determination unit 11 then calculates the sum E of the absolute values of these differences within the reduced image region of 8 × 8 pixels.
  • the output of the difference absolute value calculation process is the sum E of the difference absolute values in the reduced image area.
  • Step S22 Still Area / Moving Area Determination Process
  • The sum E of absolute differences in the reduced image region and a threshold E1 for determining the still region / moving region are input.
  • This threshold E1 is a threshold on the difference signal value that separates still regions from moving regions. It is given in advance by the setting unit 40, such as an external function, and is stored in the storage device 15. Based on these inputs, the still region / moving region determination unit 11 determines whether the sum E of absolute differences is larger than the threshold E1.
  • If the sum E is not larger than the threshold E1, the image area corresponding spatially to the reduced image area is regarded as a still area. If the sum E is larger than the threshold E1, the image area is regarded as a moving area. In other words, the judgment is: still area if E ≤ E1, moving area if E > E1.
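The reduction, absolute difference, and threshold steps (S20 to S22) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names, the 2 × 2 averaging filter used for reduction, and the factor-of-two resolution ratio are all assumptions, since the text does not specify them.

```python
def downsample_2x(frame):
    """Halve the resolution by averaging each 2x2 block (assumed reduction filter)."""
    h, w = len(frame), len(frame[0])
    return [
        [
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) / 4.0
            for x in range(0, w - 1, 2)
        ]
        for y in range(0, h - 1, 2)
    ]


def block_sad(a, b, top, left, size=8):
    """Sum E of absolute differences over one size x size block."""
    return sum(
        abs(a[y][x] - b[y][x])
        for y in range(top, top + size)
        for x in range(left, left + size)
    )


def classify_blocks(dws_prev, rec_cur, threshold_e1, size=8):
    """Return a 2-D list of booleans per block: True = moving (E > E1), False = still."""
    h, w = len(rec_cur), len(rec_cur[0])
    return [
        [
            block_sad(dws_prev, rec_cur, top, left, size) > threshold_e1
            for left in range(0, w - size + 1, size)
        ]
        for top in range(0, h - size + 1, size)
    ]
```

Here `dws_prev` plays the role of dws (T-2), `rec_cur` the role of rec (T), and `threshold_e1` the role of E1; frames are plain lists of rows of pixel values.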
  • Steps S23 to S25 Correction processing of determination result
  • Based on the determination result, the still region / moving region determination unit 11 further performs the following processing. If the image region was determined to be a still region, the unit determines whether many of the surrounding image regions are moving regions. If so, the process proceeds to step S27; otherwise, it proceeds to step S26. If the image region was determined not to be a still region, the unit determines whether many of the surrounding image regions are still regions. If so, the process proceeds to step S26; otherwise, it proceeds to step S27. As a result, when the determination result for many of the surrounding image regions differs from that of the image region, the determination result for the image region is corrected to match that of the surrounding regions.
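The correction in steps S23 to S25 amounts to a neighbourhood vote. The sketch below is one possible reading: the text only says "many" surrounding regions, so the use of the 8 adjacent blocks and the vote threshold `majority` are assumptions, and `correct_labels` is a hypothetical name.

```python
def correct_labels(moving, majority=5):
    """Flip a block's label when at least `majority` of its neighbours disagree.

    `moving` is a 2-D list of booleans (True = moving region). The 8-neighbour
    window and the `majority` count are assumed parameters, not source values.
    """
    rows, cols = len(moving), len(moving[0])
    corrected = [row[:] for row in moving]
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                moving[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            disagree = sum(1 for n in neighbours if n != moving[r][c])
            if disagree >= majority:
                corrected[r][c] = not moving[r][c]
    return corrected
```

The votes are counted on the uncorrected labels, so the result does not depend on the order in which blocks are visited.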
  • Step S26 Setting Mixing Ratio for Still Area
  • the mixing rate setting unit 12 reads the mixing rate for the still area from the storage device 15 and sets the value in a register (not shown). Thereafter, the process proceeds to step S28.
  • Step S27 Setting Mixing Ratio for Moving Area
  • the mixing rate setting unit 12 reads the mixing rate for the moving area from the storage device 15 and sets the value in the register. Thereafter, the process proceeds to step S28.
  • The interpolation signal generation unit 13 receives as input the decoded signal rec (T) of the highest layer decoded at the desired time T, the decoded signal rec (T-2) of the frame at the latest time among the decoded signals decoded up to the desired layer, the mixing ratio for the still region or the moving region, and the signal mixing formula, and generates the interpolation signal in the image region from these as follows. First, the interpolation signal generation unit 13 enlarges the decoded signal rec (T) to the resolution of the decoded signal rec (T-2).
  • the interpolation signal generation unit 13 mixes the decoded signal rec (T-2) and the enlarged signal p (rec (T)) of the decoded signal rec (T) at the mixing ratio set in the register.
  • the signal mixing formula is a linear weighted sum, and the mixing rate is the linear weighting coefficient.
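The enlargement and linear mixing of one image region can be sketched as follows. Pixel repetition stands in for the enlargement function p, and the example ratios are placeholders; the text gives neither the filter nor concrete mixing ratios, so all names and values here are assumptions.

```python
# Hypothetical mixing ratios for rec(T-2): high for still regions, low for moving ones.
STILL_W_PREV = 0.75
MOVING_W_PREV = 0.25


def upsample_nearest_2x(frame):
    """Stand-in for p(): double the resolution by pixel repetition (assumed filter)."""
    out = []
    for row in frame:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def blend_block(rec_prev, ups_cur, w_prev, top, left, size):
    """Linear weighted sum over one block: w_prev * rec(T-2) + (1 - w_prev) * p(rec(T))."""
    return [
        [
            w_prev * rec_prev[y][x] + (1.0 - w_prev) * ups_cur[y][x]
            for x in range(left, left + size)
        ]
        for y in range(top, top + size)
    ]
```

A still region would be blended with `STILL_W_PREV`, favouring the past full-resolution frame, and a moving region with `MOVING_W_PREV`, favouring the enlarged current frame.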
  • The means of generating the interpolation signal may be changed from one time to another.
  • For example, a method of setting the mixing rate according to the estimated video quality value may be applied to ipl (T-1), and a method of setting the mixing rate according to temporal pixel value change may be applied to ipl (T).
  • A conventional technique such as super-resolution and this embodiment may also be switched from one time to another.
  • the form of the mixing function may be an arbitrary non-linear function form instead of the linear weighted average as described above.
  • a function format that outputs an intermediate value of a plurality of signal values or an intermediate value of a weighted signal may be used.
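As an illustration of a non-linear mixing function, the sketch below takes the per-pixel median of several equally sized frames. It reads "intermediate value" as the median, which is an interpretation rather than something the text states, and `median_mix` is a hypothetical name.

```python
from statistics import median


def median_mix(frames):
    """Per-pixel median across a list of equally sized frames (lists of rows)."""
    return [
        [median(pixels) for pixels in zip(*rows)]
        for rows in zip(*frames)
    ]
```

Unlike a weighted average, a median ignores a single outlying input at each pixel, which may help when one of the input signals is badly mismatched.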
  • interpolation processing is performed after determining a stationary region and a moving region for each image region as needed.
  • the interpolation process may be performed on each pixel after the determination process of the still area and the moving area is performed on all the pixels of one frame.
  • the above transmission error concealment processing can also be realized by a computer and a software program.
  • the program can be recorded on a computer-readable recording medium or provided through a network.
  • FIG. 5 shows a hardware configuration example when the transmission error concealment processing device is realized by using a software program.
  • In this system, a CPU (Central Processing Unit) 50 that executes programs, a memory 51 such as a RAM (Random Access Memory) that stores the programs and data accessed by the CPU 50, an encoded stream receiving unit 52 that receives an encoded stream via a network, a program storage device 53 that stores the programs to be executed by the CPU 50, and a video reproducing unit 54 that outputs the reproduced video signal are connected by a bus.
  • The program storage device 53 stores a decoding processing program 531 for decoding the encoded stream of hierarchically encoded data received by the encoded stream receiving unit 52, and a transmission error concealment processing program 532 for causing the CPU 50 to execute the transmission error concealment processing described above when there is a transmission error in the frame being processed.
  • the CPU 50 loads the decoding processing program 531 and the transmission error concealment processing program 532 into the memory 51 and executes them. As a result, even when the desired layer of the desired frame is not decoded at the timing to be reproduced due to packet loss or transmission delay, it is possible to make the deterioration of the image quality of the finally reproduced video inconspicuous.
  • the present invention is used, for example, in a system that receives two or more hierarchized encoded data, decodes the received encoded data, and reproduces a video. According to the present invention, even when a desired frame in a desired layer is not decoded at a timing to be reproduced due to packet loss or transmission delay, the image quality of the video to be finally reproduced can be improved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

According to the present disclosure, when hierarchically coded data is decoded and replayed, degradation in the quality of the replayed video is made inconspicuous even if the signal of the desired hierarchical level at the time point to be replayed cannot be decoded because of a transmission error. In a system that receives coded data having at least two hierarchical levels and decodes and replays the received coded data, the decoded signal obtained by decoding the coded data at each hierarchical level is recorded. When the decoded signal of the desired hierarchical level at a time point for which replay is necessary cannot be obtained because of a transmission error, at least one recorded decoded signal is read, and the read decoded signal is input into a mixing function and mixed at a set mixing rate to generate a mixed signal. The generated mixed signal is used as an artificially produced interpolated signal for the desired hierarchical level at the time point for which replay is necessary, and the interpolated signal is output as the signal for replay at that time point.

Description

Transmission error concealment processing apparatus, transmission error concealment processing method, and program thereof
The present invention relates to a transmission error concealment technique for a system that receives two or more layers of hierarchically encoded data, decodes the received encoded data, and reproduces video. The technique keeps the image quality of the reproduced video from degrading as far as possible even when a transmission error occurs.
This application claims priority to Japanese Patent Application No. 2010-054136, filed in Japan on March 11, 2010, the contents of which are incorporated herein by reference.
When video data is streamed over a network line subject to transmission errors such as packet delay and packet loss, video quality degrades: block noise appears, playback is delayed, and frames are skipped. To avoid this degradation, transmission error recovery techniques and transmission error concealment techniques are commonly used.
A transmission error recovery technique adds redundant data to the original video data in advance and uses that redundant data to recover the information of packets lost to transmission errors. A typical example is forward error correction (FEC).
A transmission error concealment technique constructs reproduced video of the highest possible quality using only information that has already been received, for example when packets are lost to a transmission error and cannot be recovered even with FEC, or when the video is not ready at the time it should be reproduced because of transmission delay. For example, in Patent Document 1, when an error occurs in decoded image data, the transmission error is concealed by repeating the display of the currently displayed image.
Patent Document 1: Japanese Unexamined Patent Application Publication No. 2001-119693
With repeat display as in Patent Document 1, the motion of the subject during the error interval cannot be adequately reproduced when an error occurs.
Meanwhile, scalable video coding is attracting attention as a video coding technique with high resilience to transmission errors. A scalable-coded video stream consists of a base layer, which carries low-quality video information, and enhancement layers, which carry high-quality video information. The enhancement layer data is the difference information, relative to the base layer data, needed to reproduce high-quality video. Because of this layered structure, the layer in use can be switched flexibly in response to transmission errors, which makes scalable coding a very good fit for error concealment techniques.
The present invention has been made in view of these circumstances. Its object is to establish a design method for a transmission error concealer for the case where a video stream having a hierarchical structure is input and the decoded signal of the desired layer has not been decoded by the time it is to be reproduced. The concealer inputs decoded signals that have already been received and decoded and are held in the frame buffer of the receiving terminal, together with interpolation signals interpolated by a predetermined method, into a predetermined mixing function and mixes them, thereby generating a pseudo signal of the desired layer at that time, and outputs that signal as the final video signal for reproduction.
To solve the above problem, a first aspect of the present invention is a transmission error concealment processing device in a system that receives two or more layers of hierarchically encoded data and decodes and reproduces the received encoded data, the device comprising: a decoded signal storage unit that stores decoded signals obtained by decoding the encoded data layer by layer; an interpolation signal generation unit that, when the decoded signal of the desired layer at the time at which reproduction is required cannot be obtained because of a transmission error, reads one or more decoded signals stored in the decoded signal storage unit, generates a mixed signal by inputting the read decoded signals into a mixing function and mixing them at a set mixing ratio, and uses the generated mixed signal as a pseudo-generated interpolation signal of the desired layer at the time at which reproduction is required; and a reproduced video output unit that outputs the interpolation signal as the signal for reproduction at the time at which reproduction is required.
The transmission error concealment processing device may further comprise an interpolation signal storage unit that stores the interpolation signal. In that case, the interpolation signal generation unit may read one or more interpolation signals stored in the interpolation signal storage unit and input them into the mixing function together with the decoded signals read from the decoded signal storage unit, thereby mixing the one or more decoded signals with the one or more interpolation signals to generate the mixed signal used as the interpolation signal of the desired layer.
In the transmission error concealment processing device, the mixing ratio may be set so that it is higher for a signal input to the mixing function that is closer to the time at which reproduction is required, higher for a signal that belongs to a layer closer to the desired layer, higher for a signal with a higher estimated video quality, or a value that depends on the temporal change in the pixel values of the signal.
In the transmission error concealment processing device, the mixing ratio set according to the temporal change in the pixel values of the signal may be set as follows: for each region obtained by dividing the screen, motion estimation is used to determine whether the region is a still region or a moving region, and the value is set according to the result of that determination.
A second aspect of the present invention is a transmission error concealment processing method in a system that receives two or more layers of hierarchically encoded data and decodes and reproduces the received encoded data, the method comprising: a step of storing decoded signals obtained by decoding the encoded data layer by layer; an interpolation signal generation step of, when the decoded signal of the desired layer at the time at which reproduction is required cannot be obtained because of a transmission error, reading one or more stored decoded signals, generating a mixed signal by inputting the read decoded signals into a mixing function and mixing them at a set mixing ratio, and using the generated mixed signal as a pseudo-generated interpolation signal of the desired layer at the time at which reproduction is required; and a reproduced video output step of outputting the interpolation signal as the signal for reproduction at the time at which reproduction is required.
The transmission error concealment processing method may further comprise a step of storing the interpolation signal. In that case, the interpolation signal generation step may read one or more stored interpolation signals and input them into the mixing function together with the read decoded signals, thereby mixing the one or more decoded signals with the one or more interpolation signals to generate the mixed signal used as the interpolation signal of the desired layer.
In the transmission error concealment processing method, the mixing ratio may be set so that it is higher for a signal input to the mixing function that is closer to the time at which reproduction is required, higher for a signal that belongs to a layer closer to the desired layer, higher for a signal with a higher estimated video quality, or a value that depends on the temporal change in the pixel values of the signal.
In the transmission error concealment processing method, the mixing ratio set according to the temporal change in the pixel values of the signal may be set as follows: for each region obtained by dividing the screen, motion estimation is used to determine whether the region is a still region or a moving region, and the value is set according to the result of that determination.
A third aspect of the present invention is a transmission error concealment processing program for causing a computer to execute the above transmission error concealment processing method.
According to the present invention, when the desired frame of the desired layer has not been decoded by the time it is to be reproduced because of packet loss or transmission delay, the image quality of the finally reproduced video can be improved compared with conventional techniques.
FIG. 1 is a diagram showing an example of a frame structure for explaining an embodiment of the present invention.
FIG. 2 is a diagram showing a configuration example of a transmission error concealment processing device according to an embodiment of the present invention.
FIG. 3 is a diagram showing the flow of transmission error concealment processing according to an embodiment of the present invention.
FIG. 4 is a diagram showing the flow of interpolation signal generation processing in the transmission error concealment processing.
FIG. 5 is a diagram showing a hardware configuration example for the case where the transmission error concealment processing device is implemented by a software program.
An embodiment of the present invention will now be described with reference to the drawings. First, an outline of the embodiment is given.
In this embodiment, when the decoded signal of the desired layer is not available at the time it is to be reproduced because of packet loss, transmission delay, decoding delay, or the like, an interpolation signal of the desired layer, whose decoded signal was lost at that time, is generated in a pseudo manner from the decoded signals received so far and the interpolation signals generated so far. Here, an interpolation signal is a signal obtained by applying image quality enhancement, such as restoration of high-frequency components, to a decoded signal. The interpolation signal is finally passed to the video renderer and reproduced as the playback signal for that time. In this embodiment, a transmission error covers not only errors such as packet loss in the network but also cases where a transmission delay occurs or decoding does not finish in time.
In this embodiment, the interpolation signal for the time in question is created by the following procedure. First, received decoded signals and interpolation signals are saved in a memory area such as a frame buffer. The number, type, and time range of the decoded signals and interpolation signals to be saved are set externally in advance. The decoded signals, interpolation signals, and so on saved in the memory area are then mixed using a predetermined mixing function, and the resulting mixed signal is regarded as the pseudo-generated interpolation signal of the desired layer at that time. The functional form of the mixing function and the coefficients used inside it are set externally in advance.
In this embodiment, to create in a pseudo manner the interpolation signal of the desired layer whose decoded signal was lost at the time in question, the surrounding decoded signals are mixed adaptively with reference to motion and image quality information. For example, in a still region, a past signal of the same layer as the desired layer is considered to have signal values closer to those of the missing signal. In a moving region, on the other hand, a lower-layer signal at the time in question is considered to have signal values closer to those of the missing signal. Therefore, when the surrounding decoded signals are mixed, the image quality of the video can be improved by using motion information and the like to raise the mixing ratio of the signals estimated to have values close to those of the missing signal.
It is also possible to generate the interpolation signal of the desired layer by inputting only the decoded signals saved for each layer into the mixing function, without using the interpolation signals generated so far. This case also gives better results than the conventional technique.
Next, this embodiment is described in more detail. The preconditions of the embodiment are as follows. The input encoded data is data layered into two or more layers by scalable video coding. An example of scalable coding is SVC (Scalable Video Coding), the Annex G extension of the H.264/AVC standard. It is further assumed that the signal of the desired layer to be reproduced could not be obtained at the time in question.
Accordingly, the processing of this embodiment starts on receipt of a desired-layer missing flag indicating that the signal of the desired layer has not been obtained. The desired layer is assumed to be one of the upper layers, not the lowest layer. The memory area of the decoding terminal is assumed to hold predetermined received decoded signals and predetermined interpolation signals generated up to the time in question. The number, type, and time range of the decoded signals and interpolation signals to be saved are given externally in advance.
For example, let T be the time in question, and suppose that the decoded signals rec from time T-a to time T and the interpolation signals ipl from time T-a to time T-1 have been saved in the frame buffer. Each pixel of the interpolation signal ipl(T) at time T is generated as follows.
$$\mathrm{ipl}(T) = f\bigl(\mathrm{rec}(T-a),\ \mathrm{rec}(T-a+1),\ \ldots,\ \mathrm{rec}(T-1),\ \mathrm{rec}(T),\ \mathrm{ipl}(T-a),\ \mathrm{ipl}(T-a+1),\ \ldots,\ \mathrm{ipl}(T-1)\bigr) \qquad \text{(1)}$$
Here, f(s) is a mixing function that takes the signal group s as input and generates a mixed signal.
Note that the description here mainly covers the case where already generated interpolation signals are used as inputs to the mixing function. When already generated interpolation signals are not used, each pixel of the interpolation signal ipl(T) is generated using the following mixing function.
$$\mathrm{ipl}(T) = f\bigl(\mathrm{rec}(T-a),\ \mathrm{rec}(T-a+1),\ \ldots,\ \mathrm{rec}(T-1),\ \mathrm{rec}(T)\bigr) \qquad \text{(1')}$$
In the above example, the decoded signals and interpolation signals saved are those from time T-a up to time T, that is, at or before time T. However, when H.264/AVC B pictures or the like are used, decoded signals later than time T are received first, so the signals used are not necessarily limited to temporally past ones.
A specific example of the interpolation signal generation process is described with reference to FIG. 1. Let T be the time in question and L the desired layer to be reproduced. The interpolation signal generated in this embodiment is ipl(T). Suppose that spatially scalable encoded data is received and that data has been received without loss up to layer L at time T-2, up to layer L-2 at time T-1, and up to layer L-1 at time T. Suppose also that the decoded signal rec(T-2) was reproduced as is at time T-2 and that the interpolation signal ipl(T-1) was reproduced at time T-1. The interpolation signal ipl(T-1) may be generated by applying this embodiment or by another method; one example of another method is the method described in Patent Document 1.
When SVC encoded data is processed, reusing the upsampling filter used in texture prediction (IntraBL mode) keeps the implementation cost low. The frame buffer is assumed to hold the decoded signals and interpolation signals of the past two frames; that is, rec(T-2), rec(T-1), rec(T), and ipl(T-1) are stored.
As the mixing function f(s), a function that applies linear weights to the input signals and averages them is used. Because the resolutions of the decoded signals rec(T-1) and rec(T) are lower than that of the desired layer, they are enlarged to the resolution of the desired layer before weighting. Examples of enlargement methods include enlargement using a linear filter with 4 taps, 6 taps, or the like, and super-resolution processing that reconstructs high-frequency components in a pseudo manner. Denoting the enlarged signal by ups(t), the enlargement process is expressed in functional form as
$$\mathrm{ups}(t) = p(\mathrm{rec}(t))$$
Each pixel of the interpolation signal ipl(T) at time T is then generated as
$$\mathrm{ipl}(T) = w_a \cdot \mathrm{rec}(T-2) + w_b \cdot p(\mathrm{rec}(T-1)) + w_c \cdot p(\mathrm{rec}(T)) + w_d \cdot \mathrm{ipl}(T-1) \qquad \text{(2)}$$
Here, w denotes the mixing ratio of each signal, with
$$w_a + w_b + w_c + w_d = 1$$
Pixel values at the same spatial position are weighted and mixed.
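The four-signal linear mix above can be sketched in Python as follows. The function name `mix_frames` and the tolerance on the weight sum are assumptions for illustration; the inputs are assumed to be already enlarged to the resolution of the desired layer.

```python
def mix_frames(frames, weights, tol=1e-9):
    """Per-pixel linear weighted sum of equally sized frames.

    `frames` would hold rec(T-2), p(rec(T-1)), p(rec(T)) and ipl(T-1);
    `weights` holds the matching mixing ratios, which must sum to 1.
    """
    if len(frames) != len(weights):
        raise ValueError("one weight is required per frame")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("mixing ratios must sum to 1")
    return [
        [sum(w * px for w, px in zip(weights, pixels)) for pixels in zip(*rows)]
        for rows in zip(*frames)
    ]
```

Checking that the weights sum to 1 keeps the mixed pixel values within the range of the inputs when all weights are non-negative.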
The mixing ratio w is provided at the receiving terminal either externally, through a configuration file, or by a derivation module inside the application. A single mixing ratio may be given per frame, or separate mixing ratios may be given for each arbitrarily shaped image region or for each pixel.
Four examples of how to set the mixing ratio are given below. The mixing ratio may also be set by combining these four methods.
<Mixing ratio setting method 1: setting according to the time difference between signals>
Mixing ratio setting method 1 sets the ratio according to the difference between the time in question and the time of the decoded signal or interpolation signal being mixed. The closer a signal is to the time in question, the closer its picture structure is to that of the desired-layer signal at that time, so it is desirable to give signals closer to the time in question a higher mixing ratio.
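One way to realise setting method 1 is to make each weight inversely related to the time distance and then normalise. The inverse-distance form below is an assumption chosen for illustration; the text only requires that signals closer in time receive a higher ratio.

```python
def time_weights(signal_times, target_time):
    """Mixing ratios proportional to 1 / (1 + |target - t|), normalised to sum to 1."""
    raw = [1.0 / (1.0 + abs(target_time - t)) for t in signal_times]
    total = sum(raw)
    return [r / total for r in raw]
```

For signals at times T-2, T-1, and T, this gives strictly increasing ratios, with the signal at time T weighted most heavily.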
<Mixing ratio setting method 2: setting according to the layer difference between signals>
Mixing ratio setting method 2 sets the ratio according to the difference between the desired layer and the layer of the decoded signal or interpolation signal being mixed. A signal in a layer closer to the desired layer retains more high-frequency components, so it is desirable to give signals in layers closer to the desired layer a higher mixing ratio.
<Mixing ratio setting method 3: setting according to the estimated video quality>
The mixing ratio can also be set according to an estimate of video quality. If the video quality can be estimated, it is desirable to give higher-quality signals a higher mixing ratio. The video quality can be estimated from, for example, the quantization value or the picture type.
<Mixing ratio setting method 4: setting according to temporal pixel value changes of signals>
The mixing ratio can also be set according to temporal pixel value changes. Because pixel values at the same spatial position are mixed, when a pixel value changes over time, for example because an object moves, it is desirable to lower the mixing ratio of pixel values from other times and to raise the mixing ratio of the decoded signal at the current time. Conversely, when a pixel value does not change over time, it is desirable to raise the mixing ratio of the desired-layer decoded signal or interpolation signal, which contains many high-frequency components, and to lower the mixing ratio the farther a layer is from the desired layer. A setting method that satisfies these requirements is conceivable. At pixel position x, the method estimates whether the pixel value change from time T-2 to the current time T is large, and whether the pixel value change from time T-1 to the current time T is large.
An example of how to assign mixing ratios under mixing ratio setting method 4 is as follows. For a pixel where the pixel value changes from time T-2 to the current time T and from time T-1 to the current time T are both regarded as large, the mixing ratios may be set to
w_a = w_b = w_d = 0, w_c = 1
and the interpolation signal generated as
ipl(T) = p(rec(T))  ... Equation (3)
For a pixel where the pixel value changes from time T-2 to the current time T and from time T-1 to the current time T are both regarded as small, the mixing ratios may be set to
w_a = w_b = w_c = w_d = 1/4
and the interpolation signal generated as
ipl(T) = (1/4) × {rec(T-2) + p(rec(T-1)) + p(rec(T)) + ipl(T-1)}  ... Equation (4)
When the pixel value change from time T-2 to the current time T is large but the pixel value change from time T-1 to the current time T is small, the mixing ratios may be set to
w_a = 0, w_b = w_c = w_d = 1/3
and the interpolation signal generated as
ipl(T) = (1/3) × {p(rec(T-1)) + p(rec(T)) + ipl(T-1)}  ... Equation (5)
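The three cases of Equations (3) to (5) can be sketched per pixel as follows. This is only an illustrative sketch: the function names are hypothetical, the assignment of w_b to p(rec(T-1)) and w_d to ipl(T-1) is inferred from the order of terms in Equation (4), and the fourth combination (small change from T-2, large change from T-1) is not covered by the text, so the sketch raises an error for it rather than guessing.

```python
from typing import Dict


def mixing_weights(change_t2_large: bool, change_t1_large: bool) -> Dict[str, float]:
    """Weights for rec(T-2) [a], p(rec(T-1)) [b], p(rec(T)) [c], ipl(T-1) [d]."""
    if change_t2_large and change_t1_large:
        # Equation (3): use only the enlarged current-time signal.
        return {"a": 0.0, "b": 0.0, "c": 1.0, "d": 0.0}
    if not change_t2_large and not change_t1_large:
        # Equation (4): equal average of all four signals.
        return {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
    if change_t2_large and not change_t1_large:
        # Equation (5): drop rec(T-2), average the remaining three.
        return {"a": 0.0, "b": 1 / 3, "c": 1 / 3, "d": 1 / 3}
    # Small change from T-2 but large change from T-1: not specified in the text.
    raise ValueError("combination not covered by Equations (3) to (5)")


def blend_pixel(w: Dict[str, float], rec_t2: float, p_rec_t1: float,
                p_rec_t: float, ipl_t1: float) -> float:
    """Linear weighted sum of the four co-located pixel values."""
    return (w["a"] * rec_t2 + w["b"] * p_rec_t1
            + w["c"] * p_rec_t + w["d"] * ipl_t1)
```

For example, `blend_pixel(mixing_weights(False, False), 10, 20, 30, 40)` averages the four inputs equally, as in Equation (4).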
One way to estimate whether a pixel value change is large or small is, for example, to estimate the amount of motion from time T-2 to the current time T, and to regard pixels in regions judged to be moving regions as having a large pixel value change and pixels in regions judged to be still regions as having a small pixel value change. A different mixing ratio setting method may be used for each interpolation signal.
Five examples of methods for estimating the amount of motion from time T-2 to the current time T are described below. Each method determines whether the pixel belongs to a still region or a moving region: it reads the decoded signals, enlarged signals, and interpolation signals in the memory area, together with received encoding information such as motion vectors and prediction modes, performs the still/moving determination, and outputs the result. These examples divide pixels into two categories, still region and moving region, but three or more categories may be used. A different motion amount estimation method may also be used for each interpolation signal.
<Motion amount estimation method 1>
Motion amount estimation method 1 estimates the amount of motion from the difference between the reduced version of the decoded signal at time T-2 and the decoded signal at the current time T.
Whether the pixel belongs to a still region or a moving region is determined as follows. The decoded signal rec(T-2) at time T-2 is reduced to the resolution of the decoded signal at the current time T, and the pixel value difference is taken between pixels at the same spatial position. The reduced version of the decoded signal rec(T-2) is written dws(T-2).
Still region: |rec(T) - dws(T-2)| ≤ E_1  ... Equation (6)
Moving region: E_1 < |rec(T) - dws(T-2)|  ... Equation (7)
Here, E_1 is the threshold on the difference signal value that separates still regions from moving regions, and is given by an external function. For example, if the resolution of rec(T-2) is 1920×1080 and the resolution of rec(T) is 960×540, the determination is made for each of the 960×540 pixels, and each result is treated as the result for the four pixels at the same spatial position in the 1920×1080 frame.
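A minimal sketch of this per-pixel test, for the 2:1 resolution example above, might look as follows. The text does not specify the reduction filter, so averaging each 2×2 block is an assumption made here for illustration, and the function names are hypothetical.

```python
from typing import List

Frame = List[List[float]]


def downscale_2x(frame: Frame) -> Frame:
    """Halve the resolution by averaging each 2x2 block (assumed filter)."""
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [[(frame[2 * y][2 * x] + frame[2 * y][2 * x + 1]
              + frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def classify_method1(rec_t: Frame, rec_t2: Frame, e1: float) -> List[List[bool]]:
    """Return a map at the resolution of rec(T-2): True = moving, False = still."""
    dws = downscale_2x(rec_t2)  # dws(T-2)
    h, w = len(rec_t), len(rec_t[0])
    moving = [[False] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            is_moving = abs(rec_t[y][x] - dws[y][x]) > e1  # Equation (7)
            # One low-resolution result covers the four co-located pixels.
            for dy in (0, 1):
                for dx in (0, 1):
                    moving[2 * y + dy][2 * x + dx] = is_moving
    return moving
```

The returned map has the resolution of rec(T-2), so it can be used directly to select a mixing ratio per desired-layer pixel.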
<Motion amount estimation method 2>
Motion amount estimation method 2 estimates the amount of motion from the difference between the decoded signal at time T-2 and the enlarged signal at the current time T.
Whether the pixel belongs to a still region or a moving region is determined as follows. The pixel value difference is taken between pixels at the same spatial position. The enlarged signal at the current time T is written ups(T).
Still region: |ups(T) - rec(T-2)| ≤ E_2  ... Equation (8)
Moving region: E_2 < |ups(T) - rec(T-2)|  ... Equation (9)
Here, E_2 is the threshold on the difference signal value that separates still regions from moving regions, and is given by an external function.
<Motion amount estimation method 3>
Motion amount estimation method 3 estimates the amount of motion from the norm of the motion vector used to generate the decoded signal rec(T) at the current time T.
Let n be the norm of the motion vector of a macroblock (16×16 pixel region) used to generate the decoded signal rec(T) at the current time T. This motion vector is assumed to be a motion vector from time T-2 to the current time T. Whether the macroblock to which the pixel belongs is in a still region or a moving region is then determined as follows.
Still region: n ≤ N  ... Equation (10)
Moving region: N < n  ... Equation (11)
Here, N is the threshold on the motion vector norm that separates still regions from moving regions, and is given by an external function. One example of a norm is the Euclidean distance.
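Equations (10) and (11) with the Euclidean norm can be sketched as below. The function name and the string labels are hypothetical, chosen only for illustration.

```python
import math


def classify_by_motion_vector(mv_x: float, mv_y: float, n_threshold: float) -> str:
    """Classify a macroblock from the Euclidean norm of its motion vector."""
    n = math.hypot(mv_x, mv_y)  # Euclidean norm of (mv_x, mv_y)
    # Equation (10): n <= N is still; Equation (11): N < n is moving.
    return "still" if n <= n_threshold else "moving"
```

Every pixel of the macroblock inherits the macroblock's label, since the motion vector is defined per macroblock.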
<Motion amount estimation method 4>
Motion amount estimation method 4 estimates the amount of motion from the type of prediction mode used to generate the decoded signal rec(T) at the current time T.
Let m be the prediction mode of a macroblock used to generate the decoded signal rec(T) at the current time T. The encoded data is assumed to conform to SVC. Whether the macroblock to which the pixel belongs is in a still region or a moving region is then determined as follows.
Still region: m == "skip"  ... Equation (12)
Moving region: m !== "skip"  ... Equation (13)
"skip" refers to the skip mode in SVC. The mode used for the determination may be a mode other than the skip mode, and is given by an external function. Here "==" means that both sides are equal, and "!==" means that both sides are not equal.
<Motion amount estimation method 5>
Motion amount estimation method 5 estimates the amount of motion from the magnitude of the prediction residual signal used to generate the decoded signal rec(T) at the current time T.
Let r be the signal value of the prediction residual signal used to generate the decoded signal rec(T) at the current time T. Whether the pixel belongs to a still region or a moving region is then determined as follows.
Still region: |r| ≤ R  ... Equation (14)
Moving region: R < |r|  ... Equation (15)
Here, R is the threshold on the prediction residual signal that separates still regions from moving regions, and is given by an external function. For r, the signal value of the prediction residual signal at the same spatial position as the pixel may be used as is, or a statistic over the image region to which the pixel belongs (for example, a macroblock) may be used, such as the variance, mean, maximum, or median.
SVC encoded data includes a flag called CBP (Coded Block Pattern) that indicates whether or not the quantized coefficients are all zero. Instead of the signal value of the prediction residual signal, the motion amount class may be decided according to whether this flag is 0 or 1.
Several of these five motion amount estimation methods may be chained in multiple stages. For example, motion amount estimation based on the motion vector norm may first divide the frame into large-motion and small-motion regions, and motion amount estimation based on the signal difference value may then be applied to the small-motion regions to subdivide them into still regions and moving regions.
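The two-stage example above might be sketched as follows. The text does not say how the large-motion region from the first stage is labelled afterwards; treating it as a moving region is an assumption made here, and the function and parameter names are hypothetical.

```python
import math
from typing import Tuple


def cascade_classify(mv: Tuple[float, float], abs_diff: float,
                     n_threshold: float, e_threshold: float) -> str:
    """Stage 1: motion vector norm. Stage 2: signal difference value."""
    if math.hypot(mv[0], mv[1]) > n_threshold:
        # Large-motion region (assumed here to count as a moving region).
        return "moving"
    # Small-motion region: subdivide using the signal difference value.
    return "still" if abs_diff <= e_threshold else "moving"
```

The second stage only runs for regions that pass the first, so a region with a small motion vector but a large pixel difference is still caught as moving.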
The determination results obtained in this way may also be corrected. For example, if a pixel, or an image region of one or more pixels, is judged to be a moving region while all of the surrounding pixels or image regions are judged to be still regions, the result for that pixel or image region is likely to be a misjudgment. A misjudged pixel or image region appears as an isolated point and degrades image quality. In this case, therefore, the result for the target pixel or image region is treated as a misjudgment and corrected to still region. In other words, when the results around the target pixel or image region differ greatly from its own result, correcting that result improves the estimation accuracy.
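A minimal sketch of this isolated-point correction on a 2D map of determination results is shown below. The use of the 8-neighbourhood as "surrounding" regions is an assumption, and the function name is hypothetical.

```python
from typing import List


def fix_isolated_moving(moving: List[List[bool]]) -> List[List[bool]]:
    """Relabel as still any moving cell whose neighbours are all still."""
    h, w = len(moving), len(moving[0])
    out = [row[:] for row in moving]  # decisions are based on the original map
    for y in range(h):
        for x in range(w):
            if not moving[y][x]:
                continue
            neighbours = [moving[ny][nx]
                          for ny in range(max(0, y - 1), min(h, y + 2))
                          for nx in range(max(0, x - 1), min(w, x + 2))
                          if (ny, nx) != (y, x)]
            if neighbours and not any(neighbours):
                out[y][x] = False  # isolated moving point: treat as still
    return out
```

Reading from the original map while writing to a copy keeps the result independent of scan order.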
[Transmission error concealment processing device]
The embodiment described below applies "mixing ratio setting method 4: setting according to temporal pixel value changes of signals" as the mixing ratio setting method, and estimates temporal pixel value changes using "motion amount estimation method 1: motion amount estimation according to the difference between the reduced version of the decoded signal at time T-2 and the decoded signal at the current time T". The motion amount is estimated after dividing the frame into image regions of one or more pixels. As will be clear from the following description, this embodiment can be implemented in the same way when another mixing ratio setting method or another motion amount estimation method is used.
FIG. 2 shows a configuration example of a transmission error concealment processing device according to an embodiment of the present invention. In FIG. 2, 10 denotes the transmission error concealment processing device, 20 denotes a receiving device that receives the packets of an encoded stream, and 30 denotes a decoding device that decodes the encoded stream and outputs a reproduced video signal (also simply called a reproduced signal).
The receiving device 20 receives an encoded stream encoded by scalable video coding. It may receive the encoded stream in the same way as a conventional receiving device. However, if a transmission error occurs and some of the layered encoded data of the frame being processed is not received, a loss indication signal indicating the loss is sent to the transmission error concealment processing device 10. The decoding device 30 may generate the loss indication signal instead.
The decoding device 30 is likewise the same as a conventional device that performs scalable video decoding, with two differences. In addition to outputting the reproduced video signal resulting from decoding, the decoding device 30 outputs the received encoding information decoded by the variable length decoding unit 31 not only to the scalable decoding unit 32 but also to the transmission error concealment processing device 10, and it outputs the decoded signal of each layer to the transmission error concealment processing device 10.
In FIG. 2, the transmission error concealment processing device 10 and the decoding device 30 are shown as separate devices to make the description easier to follow. However, the transmission error concealment processing device 10 may be built into the decoding device 30 as a part of it.
The storage device 15 provides, as frame buffers, a decoded signal storage unit 151 that stores decoded signals already decoded by the decoding device 30, and an interpolation signal storage unit 152 that stores previously generated mixed signals. The memory area of the storage device 15 also stores received encoding information, such as motion vectors, prediction modes, prediction residual signals, and CBP, input from the decoding device 30. In addition, the motion amount estimation thresholds and the frame mixing ratios are set in advance by the external setting unit 40 and stored in the memory area of the storage device 15.
When the transmission error concealment processing device 10 receives a loss indication signal for the frame being processed from the receiving device 20, the still region / moving region determination unit 11 estimates the amount of motion for the pixel or pixels to be reproduced and determines whether they belong to a still region or a moving region. That is, the still region / moving region determination unit 11 reads from the memory area the decoded signals, enlarged signals, and interpolation signals in the storage device 15, together with received encoding information such as motion vectors and prediction modes, performs the still/moving determination, and stores the result in the storage device 15.
The mixing ratio setting unit 12 reads from the storage device 15 the still/moving determination result and the mixing ratio values for each outcome (that is, for still regions and for moving regions), and sets them as the mixing ratios of the decoded signals, enlarged signals, and interpolation signals at the pixel. When the setting is complete, processing moves to the interpolation signal generation unit 13.
The interpolation signal generation unit 13 reads the mixing ratios of the decoded signals, enlarged signals, and interpolation signals set by the mixing ratio setting unit 12, reads from the storage device 15 the values of those signals at the same spatial position as the pixel, mixes the values according to the mixing ratios, and generates the interpolation signal for the current time. The interpolation signal generation unit 13 outputs the interpolation signal generated by mixing to the interpolation signal storage unit 152 of the storage device 15. When the output is complete, processing moves to the reproduced video output unit 14.
The reproduced video output unit 14 reads the interpolation signal for the current time from the interpolation signal storage unit 152 of the storage device 15 and, at the playback timing, outputs it to a video renderer (not shown) as the reproduced video signal.
[Process flow]
The flow of processing executed by the transmission error concealment processing device 10 is described in detail with reference to FIG. 3 and FIG. 4. FIG. 3 shows the overall processing flow, and FIG. 4 shows the detailed flow of the interpolation signal generation processing S12 in FIG. 3.
An encoded stream with spatial scalability conforming to SVC is received. Here, enhancement-layer data may be lost due to a transmission error, whereas base-layer data is assumed not to be lost. As a concrete example of this embodiment, the following describes the processing flow for generating the desired interpolation signal ipl(T) from the decoded signal rec(T) of the highest layer that could be decoded at the desired time T (hereinafter, the highest layer) and the decoded signal rec(T-2) of the most recent frame that could be decoded up to the desired layer.
The description covers the case where "mixing ratio setting method 4: setting according to temporal pixel value changes of signals" is applied as the mixing ratio setting method, and temporal pixel value changes are estimated by "motion amount estimation method 1: motion amount estimation according to the difference between the reduced version of the decoded signal at time T-2 and the decoded signal at the current time T". The motion amount is estimated after dividing the frame into image regions of one or more pixels. The following is the flow for generating the interpolation signal of the frame at the desired time T.
[Step S10: Image region division processing]
The image region division processing takes as input the frame used to output the interpolation signal of the desired layer and divides it into a plurality of predetermined image regions of one or more pixels. Such a region may be, for example, a macroblock (16×16 pixels), but is not limited to this. The outputs of this processing are the divided frame and the division information.
[Steps S11-S13: Interpolation signal generation loop over image regions]
The interpolation signal generation processing of step S12 is performed for each image region of the desired-layer frame. This is repeated until the interpolation signal generation processing has been completed for all image regions.
[Step S12: Interpolation signal generation processing]
The inputs to the interpolation signal generation processing are data stored in the storage device 15: the decoded signal rec(T) of the highest layer that could be decoded at the desired time T, the decoded signal rec(T-2) of the most recent frame that could be decoded up to the desired layer, the still/moving determination threshold, the mixing ratios for still and moving regions, the signal mixing formula, the index of the image region, and so on. The output of the interpolation signal generation processing is the interpolation signal ipl(T) for the image region.
Here, the still region / moving region determination unit 11 determines whether the image region being processed is a still region or a moving region, using the decoded signal rec(T) of the highest layer that could be decoded at the desired time T and the decoded signal rec(T-2) of the most recent frame that could be decoded up to the desired layer. Based on the result, the mixing ratio setting unit 12 sets the mixing ratios. The interpolation signal generation unit 13 then mixes the signal values of rec(T) and rec(T-2) according to the mixing ratios and outputs the mixed signal to the interpolation signal storage unit 152 as the interpolation signal ipl(T) for the image region.
The details of the interpolation signal generation processing of step S12 shown in FIG. 3 are described below with reference to FIG. 4.
[Step S20: Reduced signal generation processing]
The inputs to the reduced signal generation processing, which the still region / moving region determination unit 11 performs first, are the decoded signal rec(T-2) of the most recent frame that could be decoded up to the desired layer and the resolution information of each layer. The still region / moving region determination unit 11 reduces the decoded signal rec(T-2) at time T-2 to the resolution of the decoded signal rec(T) of the highest layer that could be decoded at the desired time T. For example, with (1920×1080)/(960×540) spatial scalability, if the image region is a 16×16 pixel block in the enhancement layer, the reduced image region is the 8×8 pixel block at the same spatial position. The output of this processing is the reduced signal dws(T-2) of the decoded signal rec(T-2).
[Step S21: Absolute difference calculation processing]
The inputs to the absolute difference calculation processing, which the still region / moving region determination unit 11 performs next, are the reduced signal dws(T-2) of the decoded signal rec(T-2) and the decoded signal rec(T). The still region / moving region determination unit 11 takes the pixel value difference between dws(T-2) and rec(T) for pixels at the same spatial position, and then computes the sum of the absolute values of these differences over the reduced image region. In the example above, it computes the sum E of the absolute differences within the 8×8 pixel reduced image region. The output of the absolute difference calculation processing is this sum E for the reduced image region.
[Step S22: Still region / moving region determination processing]
The still region / moving region determination processing takes as input the sum E of the absolute differences in the reduced image region and the threshold E_1 for the still/moving determination. The threshold E_1 is the threshold on the difference signal value that separates still regions from moving regions; it is given in advance by the setting unit 40, such as an external function, and stored in the storage device 15. From these inputs, the still region / moving region determination unit 11 determines whether the sum E is greater than the threshold E_1. If E is less than or equal to the threshold, the image region that spatially corresponds to the reduced image region is regarded as a still region. If E is greater than the threshold, the image region is regarded as a moving region. That is, the following determination is made.
Still region: E ≤ E_1  ... Equation (16)
Moving region: E_1 < E  ... Equation (17)
In the example above, the result for the 8×8 pixel reduced image region is taken as the result for the 16×16 pixel image region. The output of this processing is the still/moving determination result for the image region.
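Steps S21 and S22 can be sketched for one reduced image region as follows. The function names and the `top`/`left`/`size` parameters are hypothetical; the 8×8 default follows the example in the text.

```python
from typing import List

Frame = List[List[float]]


def region_sad(rec_t: Frame, dws_t2: Frame, top: int, left: int, size: int = 8) -> float:
    """Step S21: sum E of absolute differences over one reduced image region."""
    return sum(abs(rec_t[y][x] - dws_t2[y][x])
               for y in range(top, top + size)
               for x in range(left, left + size))


def is_still_region(e: float, e1: float) -> bool:
    """Step S22: Equation (16) gives still when E <= E_1, otherwise moving."""
    return e <= e1
```

Because E is a sum over the region rather than a per-pixel value, the threshold E_1 used here has to be scaled to the region size.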
[Steps S23-S25: Correction of the determination result]
Based on the determination result, the still region / moving region determination unit 11 then proceeds according to whether the image region is a still region. If the image region is a still region, the unit checks whether many of the surrounding image regions are moving regions. If so, processing proceeds to step S27; otherwise, it proceeds to step S26. If the image region is not a still region, the unit checks whether many of the surrounding image regions are still regions. If so, processing proceeds to step S26; otherwise, it proceeds to step S27. In this way, when the results for many of the surrounding image regions differ from the result for the image region, the result for the image region is corrected to match the surrounding image regions.
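The branching of steps S23 to S25 can be sketched as a function that returns the next step. The text does not define how many neighbours count as "many"; a strict majority is assumed here purely for illustration, and the function name is hypothetical.

```python
from typing import List


def select_mixing_step(is_still: bool, neighbours_still: List[bool]) -> str:
    """Return "S26" (still-region ratios) or "S27" (moving-region ratios)."""
    n = len(neighbours_still)
    still_count = sum(1 for s in neighbours_still if s)
    moving_count = n - still_count
    if is_still:
        # Still region, but most neighbours are moving: correct to moving.
        return "S27" if moving_count * 2 > n else "S26"
    # Moving region, but most neighbours are still: correct to still.
    return "S26" if still_count * 2 > n else "S27"
```

With no neighbours available, the region keeps its own result, since neither majority condition can hold.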
[Step S26: Setting the mixing rate for a still region]
The mixing rate setting unit 12 reads the mixing rate for still regions from the storage device 15 and sets the value in a register (not shown). The process then proceeds to step S28.
[Step S27: Setting the mixing rate for a moving region]
The mixing rate setting unit 12 reads the mixing rate for moving regions from the storage device 15 and sets the value in the register. The process then proceeds to step S28.
[Step S28: Interpolation signal generation]
The interpolation signal generation unit 13 takes as input the decoded signal rec(T) of the highest layer that could be decoded at the desired time T, the decoded signal rec(T-2) of the most recent frame among those decoded up to the desired layer, the mixing rates for still and moving regions, and the signal mixing formula. From these it generates the interpolation signal for the image region as follows. First, the unit enlarges the decoded signal rec(T) to the resolution of the decoded signal rec(T-2). Next, it mixes rec(T-2) with the enlarged signal p(rec(T)) at the mixing rate set in the register. Here the mixing formula is a linear weighted sum, and the mixing rates are its weighting coefficients. With mixing rates wa and wc, the interpolation signal is generated from rec(T-2) and p(rec(T)) according to
ipl(T) = wa · rec(T-2) + wc · p(rec(T)) … Equation (18)
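Steps S26 to S28 can be sketched together: a mixing rate pair is looked up by region class, the lower-resolution signal is enlarged, and the two signals are blended as in Equation (18). The enlargement filter is not specified in this passage, so pixel replication is assumed here. The `MIXING_RATES` table and its values are purely illustrative, since the stored rates are not given in the text.

```python
def upscale_2x(image):
    """Enlarge by pixel replication; stands in for the enlargement p(.)."""
    out = []
    for row in image:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def blend(rec_prev, rec_low, w_a, w_c):
    """Equation (18): ipl(T) = w_a * rec(T-2) + w_c * p(rec(T))."""
    enlarged = upscale_2x(rec_low)
    return [
        [w_a * a + w_c * c for a, c in zip(row_a, row_c)]
        for row_a, row_c in zip(rec_prev, enlarged)
    ]


# Hypothetical mixing rates per region class (values are illustrative only).
MIXING_RATES = {"still": (0.75, 0.25), "moving": (0.25, 0.75)}

rec_t_minus_2 = [[100, 100], [100, 100]]  # desired-layer frame at T-2
rec_t = [[60]]                            # lower-resolution frame at T
w_a, w_c = MIXING_RATES["still"]
ipl_t = blend(rec_t_minus_2, rec_t, w_a, w_c)
```

With the assumed still-region rates every output pixel is 0.75 · 100 + 0.25 · 60 = 90; switching to the assumed moving-region rates gives 70.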
In the present embodiment, the means of generating the interpolation signal may be changed from one time to another. For example, the method of setting the mixing rate according to the estimated video quality may be applied to ipl(T-1), and the method of setting the mixing rate according to the temporal change in pixel values may be applied to ipl(T). A conventional technique such as super-resolution and the present embodiment may also be switched from one time to another.
In the present embodiment, the mixing function need not be the linear weighted average described above and may take any non-linear form. For example, it may be a function that outputs an intermediate value of a plurality of signal values, or an intermediate value of weighted signals.
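As one possible non-linear mixing function, the sketch below takes the per-pixel median of several signals. Reading "intermediate value" as the median is an assumption made for illustration, and `median_mix` is a hypothetical name; the text does not fix a particular function.

```python
from statistics import median


def median_mix(signals):
    """Per-pixel median across a list of equally sized images."""
    rows, cols = len(signals[0]), len(signals[0][0])
    return [
        [median(sig[r][c] for sig in signals) for c in range(cols)]
        for r in range(rows)
    ]


# Three hypothetical 1x2 signals; the value 200 is an outlier at one pixel.
frames = [
    [[10, 200]],
    [[12, 90]],
    [[11, 95]],
]
mixed = median_mix(frames)
```

Unlike a weighted sum, the median discards the outlying value 200 entirely instead of averaging it into the result.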
In this processing example, the still region / moving region determination is performed for each image region in turn, and interpolation follows immediately for that region. Alternatively, the determination may first be performed for all pixels of a frame, with interpolation then performed at each pixel.
The transmission error concealment processing described above can also be realized by a computer and a software program. The program can be recorded on a computer-readable recording medium or provided over a network.
FIG. 5 shows an example of a hardware configuration in which the transmission error concealment processing device is realized by a software program.
In this system, a CPU (Central Processing Unit) 50 that executes programs, a memory 51 such as a RAM (Random Access Memory) that stores the programs and data accessed by the CPU 50, an encoded stream receiving unit 52 that receives an encoded stream via a network, a program storage device 53 that stores the programs to be executed by the CPU 50, and a video reproduction unit 54 that outputs a reproduced video signal are connected by a bus.
The program storage device 53 stores a decoding processing program 531, which decodes the encoded stream of layered encoded data received by the encoded stream receiving unit 52, and a transmission error concealment processing program 532, which causes the CPU 50 to execute the transmission error concealment processing described above when the frame being processed has a transmission error.
The CPU 50 loads the decoding processing program 531 and the transmission error concealment processing program 532 into the memory 51 and executes them. As a result, even when the desired layer of the desired frame has not been decoded by the time it is to be reproduced because of packet loss or transmission delay, degradation in the image quality of the finally reproduced video is made inconspicuous.
Embodiments of the present invention have been described in detail above with reference to the drawings. However, the specific configuration is not limited to these embodiments and also covers designs and the like (additions, omissions, substitutions, and other changes to the configuration) that do not depart from the gist of the present invention. The present invention is not limited by the foregoing description and is limited only by the appended claims.
The present invention is used, for example, in a system that receives two or more layers of encoded data, decodes the received encoded data, and reproduces video. According to the present invention, even when the desired frame of the desired layer has not been decoded by the time it is to be reproduced because of packet loss or transmission delay, the image quality of the finally reproduced video can be improved.
Description of Reference Signs
10 Transmission error concealment processing device
11 Still region / moving region determination unit
12 Mixing rate setting unit
13 Interpolation signal generation unit
14 Reproduced video output unit
15 Storage device
151 Decoded signal storage unit
152 Interpolation signal storage unit
20 Receiving device
30 Decoding device
31 Variable-length decoding unit
32 Scalable decoding unit
40 Setting unit

Claims (9)

  1. A transmission error concealment processing device in a system that receives two or more layers of encoded data and decodes and reproduces the received encoded data, the device comprising:
    a decoded signal storage unit that stores decoded signals obtained by decoding the encoded data layer by layer;
    an interpolation signal generation unit that, when the decoded signal of a desired layer at a time at which reproduction is required cannot be obtained due to a transmission error, reads one or more decoded signals stored in the decoded signal storage unit, generates a mixed signal by inputting the read decoded signals to a mixing function and mixing them at a set mixing rate, and uses the generated mixed signal as an interpolation signal of the desired layer artificially created for the time at which reproduction is required; and
    a reproduced video output unit that outputs the interpolation signal as the signal for reproduction at the time at which reproduction is required.
  2. The transmission error concealment processing device according to claim 1, further comprising:
    an interpolation signal storage unit that stores the interpolation signal,
    wherein the interpolation signal generation unit reads one or more interpolation signals stored in the interpolation signal storage unit and inputs the read interpolation signals to the mixing function together with the decoded signals read from the decoded signal storage unit, thereby mixing the one or more decoded signals with the one or more interpolation signals to generate the mixed signal used as the interpolation signal of the desired layer.
  3. The transmission error concealment processing device according to claim 1 or claim 2,
    wherein the mixing rate is set to a higher value for a signal input to the mixing function that is closer to the time at which reproduction is required, or to a higher value for a signal input to the mixing function that belongs to a layer closer to the desired layer, or to a higher value for a signal input to the mixing function that has a higher estimated video quality, or to a value according to the temporal change in pixel values of the signal input to the mixing function.
  4. The transmission error concealment processing device according to claim 3,
    wherein the mixing rate set to a value according to the temporal change in pixel values of the signal is a value that is set, for each region into which the picture is divided, by determining through motion amount estimation whether the region is a still region or a moving region, and according to the determination result.
  5. A transmission error concealment processing method in a system that receives two or more layers of encoded data and decodes and reproduces the received encoded data, the method comprising:
    a step of storing decoded signals obtained by decoding the encoded data layer by layer;
    an interpolation signal generation step of, when the decoded signal of a desired layer at a time at which reproduction is required cannot be obtained due to a transmission error, reading one or more of the stored decoded signals, generating a mixed signal by inputting the read decoded signals to a mixing function and mixing them at a set mixing rate, and using the generated mixed signal as an interpolation signal of the desired layer artificially created for the time at which reproduction is required; and
    a reproduced video output step of outputting the interpolation signal as the signal for reproduction at the time at which reproduction is required.
  6. The transmission error concealment processing method according to claim 5, further comprising:
    a step of storing the interpolation signal,
    wherein, in the interpolation signal generation step, one or more of the stored interpolation signals are read and input to the mixing function together with the read decoded signals, thereby mixing the one or more decoded signals with the one or more interpolation signals to generate the mixed signal used as the interpolation signal of the desired layer.
  7. The transmission error concealment processing method according to claim 5 or claim 6,
    wherein the mixing rate is set to a higher value for a signal input to the mixing function that is closer to the time at which reproduction is required, or to a higher value for a signal input to the mixing function that belongs to a layer closer to the desired layer, or to a higher value for a signal input to the mixing function that has a higher estimated video quality, or to a value according to the temporal change in pixel values of the signal input to the mixing function.
  8. The transmission error concealment processing method according to claim 7,
    wherein the mixing rate set to a value according to the temporal change in pixel values of the signal is a value that is set, for each region into which the picture is divided, by determining through motion amount estimation whether the region is a still region or a moving region, and according to the determination result.
  9. A transmission error concealment processing program for causing a computer to execute the transmission error concealment processing method according to any one of claims 5 to 8.
PCT/JP2011/054064 2010-03-11 2011-02-24 Transmission error concealment processing device, transmission error concealment processing method, and program thereof WO2011111533A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2010054136A JP2013102256A (en) 2010-03-11 2010-03-11 Transmission error concealment processing apparatus, method, and program of the same
JP2010-054136 2010-03-11

Publications (1)

Publication Number Publication Date
WO2011111533A1 true WO2011111533A1 (en) 2011-09-15

Family

ID=44563341

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2011/054064 WO2011111533A1 (en) 2010-03-11 2011-02-24 Transmission error concealment processing device, transmission error concealment processing method, and program thereof

Country Status (3)

Country Link
JP (1) JP2013102256A (en)
TW (1) TW201203167A (en)
WO (1) WO2011111533A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000027129A1 (en) * 1998-11-02 2000-05-11 Nokia Mobile Phones Limited Error concealment in a video signal
JP2001148859A (en) * 1999-11-19 2001-05-29 Matsushita Electric Ind Co Ltd Error concealment system, error concealment method and program recording medium
JP2010041728A (en) * 2008-08-06 2010-02-18 Thomson Licensing Method for predicting lost or damaged block of enhanced spatial layer frame and svc-decoder

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHEN YING ET AL.: "Frame Loss Error Concealment for SVC, Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG(ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6)", JVT-Q046, 17TH MEETING: NICE, FR, October 2005 (2005-10-01), pages 1 - 14 *
LI-NA GHANG ET AL.: "A Novel SVC VoD System with Rate Adaptation and Error Concealment over GPRS/EDGE Network", CONGRESS ON IMAGE AND SIGNAL PROCESSING (CISP 2008), May 2008 (2008-05-01), pages 349 - 354, XP031286576 *
YI GUO ET AL.: "Error Resilient Coding and Error Concealment in Scalable Video Coding", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, vol. 19, no. 6, June 2009 (2009-06-01), pages 781 - 795, XP011253524 *

Also Published As

Publication number Publication date
JP2013102256A (en) 2013-05-23
TW201203167A (en) 2012-01-16

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11753193

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11753193

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP