WO2013075515A1 - Encoding and decoding methods, apparatus and codec for multi-view video - Google Patents
Encoding and decoding methods, apparatus and codec for multi-view video
- Publication number
- WO2013075515A1 (application PCT/CN2012/079503)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- view image
- view
- current
- image
- deformed
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/128—Adjusting depth or disparity
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/161—Encoding, multiplexing or demultiplexing different image signal components
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/189—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
- H04N19/192—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding the adaptation method, adaptation tool or adaptation type being iterative or recursive
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N2013/0074—Stereoscopic image analysis
- H04N2013/0081—Depth or disparity estimation from stereoscopic image signals
Definitions
- the present invention relates to the field of video codec technology, and in particular, to a multi-view video coding method, a decoding method, an apparatus, an encoder, and a decoder.
- BACKGROUND OF THE INVENTION Multi-view Video Coding (MVC) is a technique introduced in H.264/AVC (Advanced Video Coding), mainly used for encoding stereoscopic video and multi-angle three-dimensional video content.
- the disparity vector in MVC represents redundant information for the same scene recorded between different viewpoints.
- These disparity vectors are vectors in block units that are computed by the encoder using a non-normative motion estimation algorithm.
- the displaced block region of the previously encoded viewpoint may be used as the prediction signal of the current coding region of the other viewpoints according to the disparity vector.
- These block-based disparity vectors are incapable of accurately describing disparity information between multi-viewpoints, making it difficult for the MVC encoding method to produce high-quality predictive signals for different view compression.
- the prior art provides a method of 3D Video Coding (3DV), in which depth information is added.
- the depth information is used to describe the relationship of each pixel position between different viewpoints, so that the 3DV coding technique can produce a relatively high quality prediction signal for different viewpoint compression.
- at the encoding end, the flow of the 3DV encoding method is briefly as follows: the base view is encoded using a conventional coding standard. After the base view is encoded, when the next view is encoded, the disparity information between the already encoded front view image and the view currently to be encoded is calculated using the camera parameter information and the depth map information of the front view image; the front view image is warped using the disparity information to obtain a deformed view image, and the deformed view image is used as a reference picture for the coding units of the view currently to be encoded, which is then encoded.
- the disparity calculation may introduce pixel-shift noise on objects or parts of objects, so that the warped pixel positions do not exactly coincide with the positions of the objects in the other views. This can be explained as a rounding effect, which is more pronounced for objects close to the camera than for distant ones. The rounding effect reduces the accuracy of the deformed view image obtained from the disparity information and thus degrades the compression efficiency of 3DV coding.
- Embodiments of the present invention provide a method for encoding a multi-view video, which can improve the accuracy of a deformed view.
- An embodiment of the present invention provides a method for encoding multi-view video, the method comprising: minimizing an error between a current coded view image and a deformed view image of a front view image to obtain an optimal deformation offset; obtaining disparity information between the front view image and the current coded view image according to the optimal deformation offset, the camera parameters of the views, and the depth map information of the front view image; determining the deformed view image of the front view image according to the disparity information and the front view image, and predictively encoding the current coded view image using the deformed view image as a prediction signal.
- the embodiment of the invention further provides a method for decoding multi-view video, the method comprising: parsing a code stream of a current view image to obtain an optimal deformation offset; determining disparity information between a front view image and the current decoded view image according to the optimal deformation offset, the camera parameters of the views, and the depth map information of the front view image; calculating a deformed view image of the front view image relative to the current decoded view image according to the disparity information;
- the deformed view map is used as a reference map for reconstructing the current decoded view image.
- An embodiment of the present invention further provides an encoding apparatus for multi-view video, where the apparatus includes: an optimal deformation offset acquiring unit, configured to minimize an error between a current coded view image and a deformed view image of a front view image to obtain an optimal deformation offset;
- a first disparity information calculating unit configured to acquire disparity information between the front view image and the current coded view image according to the optimal deformation offset, the camera parameter of the view, and the depth map information of the front view image;
- a first anamorphic view map calculation unit configured to determine a deformed view point view of the front view image according to the disparity information and the front view image
- a first coding prediction unit configured to predictively encode the current coded view image using the deformed view image as a prediction signal.
- the embodiment of the present invention further provides a decoding apparatus for multi-view video, the apparatus comprising: a code stream parsing unit, configured to parse a code stream of a current view image to obtain an optimal deformation offset; a second disparity information calculating unit, configured to determine disparity information between a front view image and the current decoded view image according to the optimal deformation offset, the camera parameters of the views, and the depth map information of the front view image; and a second deformed view image calculating unit, configured to calculate a deformed view image of the front view image according to the disparity information and the front view image;
- a first decoding prediction unit configured to use the transformed view image as a reference picture for reconstructing the current decoded view image.
- An embodiment of the present invention further provides an encoder comprising the above encoding apparatus for multi-view video, and a decoder comprising the above decoding apparatus for multi-view video.
- In the embodiments of the present invention, the optimal deformation offset is obtained by minimizing the error between the current coded view image and the deformed view image of the front view image, and the disparity information is corrected using the optimal deformation offset, thereby improving the accuracy of the deformed view image.
- FIG. 1 is a flowchart of an implementation of a method for encoding a multi-view video according to an embodiment of the present invention
- FIG. 2 is a flowchart of an implementation, according to an embodiment of the present invention, of minimizing the error between the current coded view image and the deformed view image of the front view image to obtain the optimal deformation offset
- FIG. 3 is a flowchart of an implementation of a method for encoding a multi-view video according to another embodiment of the present invention
- FIG. 4 is a flowchart of an implementation of a method for encoding a multi-view video according to another embodiment of the present invention
- FIG. 5 is a flowchart of an implementation of a method for decoding a multi-view video according to an embodiment of the present invention
- FIG. 6 is a flowchart of an implementation of a method for decoding a multi-view video according to another embodiment of the present invention
- FIG. 7 is a flowchart of an implementation of a method for decoding a multi-view video according to another embodiment of the present invention
- FIG. 8 is a structural block diagram of a multi-view video encoding apparatus according to an embodiment of the present invention
- FIG. 9 is a structural block diagram of a multi-view video decoding apparatus according to an embodiment of the present invention.
- In the embodiments of the present invention, the optimal deformation offset is obtained by minimizing the error between the current coded view image and the deformed view image of the front view image, and the disparity information is corrected using the optimal deformation offset, thereby improving the accuracy of the deformed view image and, in turn, the quality of the multi-view video images.
- Embodiment 1
- FIG. 1 is a flowchart showing an implementation process of a multi-view video coding method according to an embodiment of the present invention, which is described in detail as follows:
- step S101, the error between the current coded view image and the deformed view image of the front view image is minimized to obtain an optimal deformation offset.
- the current coded view image refers to the view image currently being encoded.
- the front view image may be a previously encoded view image in the spatial domain of the current coded view image.
- the deformed viewpoint image of the front view image is obtained by performing a deformation operation on the front view image using the viewpoint deformation algorithm.
- the viewpoint image includes a video map and a depth map.
- the deformed viewpoint map includes a deformed video map and a deformed depth map.
- the deformed video image and the deformed depth map of the front view image may adopt the same optimal deformation offset, or may adopt respective optimal deformation offsets.
- when the deformed video image and the deformed depth map adopt respective optimal deformation offsets, the error between the video image of the current coded view image and the deformed video image of the front view image is minimized to obtain the optimal deformation offset of the deformed video image, and the error between the depth map of the current coded view image and the deformed depth map of the front view image is minimized to obtain the optimal deformation offset of the deformed depth map.
- when there are multiple optimal deformation offsets, the code rate required for encoding may also be reduced by predictive compression among the multiple optimal deformation offsets.
- An example is as follows: if there is a correlation between multiple optimal deformation offsets, one of the optimal deformation offsets may be transmitted, and the other optimal deformation offsets may be transmitted only as differences from that offset; in this way, the code rate required for encoding can be reduced.
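- For illustration only, the following minimal sketch (not the patent's normative syntax) shows this differential signalling with two assumed per-picture offsets, for example one for the deformed video image and one for the deformed depth map:
```python
# Minimal sketch of differential coding of correlated deformation offsets.
# The number of offsets and the signalling order are illustrative assumptions,
# not the patent's normative bitstream syntax.

def encode_offsets(offsets):
    """Return the values to transmit: the first offset, then successive differences."""
    transmitted = [offsets[0]]
    transmitted += [curr - prev for prev, curr in zip(offsets, offsets[1:])]
    return transmitted

def decode_offsets(transmitted):
    """Undo the differential coding to recover the original offsets."""
    offsets = [transmitted[0]]
    for diff in transmitted[1:]:
        offsets.append(offsets[-1] + diff)
    return offsets

# Example with two assumed offsets (deformed video image, deformed depth map):
# only 1.5 and -0.5 need to be signalled instead of 1.5 and 1.0.
assert decode_offsets(encode_offsets([1.5, 1.0])) == [1.5, 1.0]
```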
- the optimal deformation offset can be used to correct the entire picture, or to correct a part of the picture, such as a slice.
- the errors between the current coded view image and the deformed view image of the front view image include, but are not limited to, the Sum of Absolute Differences (SAD), the Mean Square Error (MSE), and the Sum of Squared Errors (SSE), etc.
- there are many ways to minimize the error between the current coded view image and the deformed view image of the front view image. For example: an offset preset range is set; each value within the offset preset range is used in turn as the deformation offset to correct the disparity information between the front view image and the current coded view image; the deformed view image of the front view image is obtained using the disparity information; the error between the current coded view image and the deformed view image is calculated; and the value within the offset preset range that minimizes this error is selected and determined as the optimal deformation offset.
- All values within the offset preset range include integer values or fractional values, where a fractional value keeps one digit after the decimal point.
- Other methods can also be used to minimize the error between the current coded view image and the deformed view image of the front view image, which are not enumerated here.
- step S102 disparity information between the front view image and the current coded view image is obtained based on the optimal deformation offset, the camera parameter of the viewpoint, and the depth map information of the front view image.
- the camera parameters of the views include the camera focal length, the camera spatial position, depth-of-field information, and so on.
- the parallax information is in units of pixels.
- the step of obtaining the disparity information between the front view image and the current coded view image according to the optimal deformation offset, the camera parameters of the views, and the depth map information of the front view image specifically includes: calculating an initial disparity signal according to the camera parameters of the views and the depth map information of the front view image; obtaining an offset correction signal according to the optimal deformation offset and the depth map information of the front view image; and obtaining the disparity information between the front view image and the current coded view image from the initial disparity signal and the offset correction signal.
- the disparity information of each pixel position between the front view image and the current coded view image may be calculated using the following formula: disp(p) = (f / Z) * (X_add - X_base) + offset_opt * (d / d_max) * α
- where f is the camera focal length of the current coded view, Z is the distance of the object from the viewpoint, X_add is the position of the current coded view on the viewpoint line in 3D space, X_base is the position of the front view on the viewpoint line in 3D space, offset_opt is the optimal deformation offset obtained in step S101, d is the depth value of the current pixel, d_max is the maximum depth value of the current view depth, and α is the half-pixel precision.
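- As an illustrative, non-normative sketch of this formula, the following function computes the corrected disparity for one pixel; the parameter names mirror the symbols above, and treating α simply as the value 0.5 ("half-pixel precision") is an assumption:
```python
def disparity(f, z, x_add, x_base, d, d_max, offset_opt, alpha=0.5):
    """disp(p) = f/Z * (X_add - X_base) + offset_opt * d/d_max * alpha.

    f: camera focal length; z: distance of the object from the viewpoint;
    x_add / x_base: positions of the current and front views on the viewpoint
    line in 3D space; d / d_max: depth value of the pixel and maximum depth
    value; offset_opt: optimal deformation offset; alpha: half-pixel precision.
    """
    geometric_term = (f / z) * (x_add - x_base)          # disparity from geometry alone
    correction_term = offset_opt * (d / d_max) * alpha   # depth-weighted offset correction
    return geometric_term + correction_term

# Example call; the correction term scales with d/d_max, as in the formula above.
print(disparity(f=1000.0, z=2.0, x_add=0.1, x_base=0.0,
                d=200.0, d_max=255.0, offset_opt=1.0))
```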
- step S103, a deformed view image of the front view image is determined based on the disparity information and the front view image, and the deformed view image is used as a prediction signal to predictively encode the current coded view image.
- In the embodiment of the present invention, the optimal deformation offset is obtained by minimizing the error between the current coded view image and the deformed view image of the front view image, and the disparity information is corrected using the optimal deformation offset, thereby improving the accuracy of the deformed view image of the front view image and, in turn, the quality of the multi-view video images.
- the method further comprises the step of: writing the optimal deformation offset into the code stream.
- since the optimal deformation offset is encoded into the code stream, the decoding end can obtain the disparity information from the optimal deformation offset in the code stream, obtain the deformed view image of the front view image from the disparity information, and use that deformed view image as a reference picture for reconstructing the current decoded view image, so that the decoding end can decode and reconstruct each view.
- the method of an embodiment of the present invention may be performed by a processor (such as a central processing unit CPU) or an application specific integrated circuit (ASIC) or the like.
- Embodiment 2:
- FIG. 2 shows the implementation flow, provided by an embodiment of the present invention, of minimizing the error between the current coded view image and the deformed view image of the front view image to obtain the optimal deformation offset, detailed as follows:
- step S201, a deformation offset value offset_i is set.
- the deformation offset value offset_i can be set arbitrarily.
- step S202, the disparity information between the front view image and the current coded view image is determined according to the deformation offset value offset_i, the camera parameters of the views, and the depth map information of the front view image.
- the parallax information may be in units of pixels.
- the disparity information of each pixel position between the front view image and the current coded view image may be calculated using the following formula: disp(p) = (f / Z) * (X_add - X_base) + offset_i * (d / d_max) * α
- where f is the camera focal length of the current view, Z is the distance of the object from the viewpoint, X_add is the position of the current view on the viewpoint line in 3D space, X_base is the position of the front view on the viewpoint line in 3D space, d is the depth value of the current pixel, d_max is the maximum depth value of the current view depth, and α is the half-pixel precision.
- step S203 a distorted view of the front view is determined based on the disparity information and the front view. Since the specific process belongs to the prior art, it will not be described here.
- step S204 an error of the deformed viewpoint map and the original map of the current encoded viewpoint map is calculated.
- the deformed view image includes a deformed video image and a deformed depth map, and the original image of the current coded view image includes an original video image and an original depth map.
- calculating the mean square error between the deformed view image and the original image of the current coded view image is taken as an example for description. The specific calculation is as follows:
- MSE(offset_i) = (1/n) * Σ_{k=1..n} (X_k - Y_k)^2
- where MSE(offset_i) is the mean square error between the deformed view image and the current coded view image, Y_k is a pixel of the current coded view image, and X_k is a pixel of the deformed view image.
- the smaller the mean square error between the deformed view image and the current coded view image, the closer the deformed view image obtained with this deformation offset is to the current coded view image.
- step S205 it is judged whether the error of the deformed viewpoint map and the current encoded viewpoint map is smaller than the current minimum error value, and if so, step S206 is performed, otherwise step S207 is performed.
- the minimum error value is initially set to a very large value.
- taking the judgment of whether the mean square error between the deformed view image and the original image of the current coded view image is smaller than the preset maximum as an example, the specific judgment is: determine whether MSE(offset_i) < MSE(offset_opt) holds; if it holds, step S206 is performed, otherwise step S207 is performed.
- step S206, set offset_opt = offset_i and MSE(offset_opt) = MSE(offset_i).
- step S207, the deformation offset value offset_i is changed within the preset offset range, and the process returns to step S202.
- the offset preset range refers to the variation range of the deformation offset value offset_i. For example, the offset preset range may be set to [-2, 2], but the offset preset range is not limited to this example.
- changing the deformation offset value within the preset offset range means modifying the deformation offset value offset_i to any value within the offset preset range that has not been used yet.
- by traversing all values within the offset preset range in this way, a value that minimizes MSE(offset_i) can be found, that is, the deformation offset value for which the deformed view image is closest to the original image of the current coded view image is found within the offset preset range;
- by determining this value as the optimal deformation offset, the accuracy of the deformed view image can be greatly improved.
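- A compact, non-normative sketch of the search in steps S201-S207, assuming the warping of the front view image for a given offset is available as a callable and that the candidate offsets are the one-decimal values of an assumed range such as [-2, 2]:
```python
import numpy as np

def best_offset(current_view, warp_with_offset, lo=-2.0, hi=2.0, step=0.1):
    """Brute-force search of steps S201-S207 over an assumed offset range.

    warp_with_offset(offset) is assumed to return the deformed view image of
    the front view image produced with that deformation offset (S202/S203).
    """
    n_steps = int(round((hi - lo) / step))
    candidates = [round(lo + k * step, 1) for k in range(n_steps + 1)]
    offset_opt, mse_opt = None, float("inf")             # S205: start from a huge error
    for offset_i in candidates:                          # S201/S207: try every candidate
        warped = warp_with_offset(offset_i)              # S202/S203: warp the front view
        diff = warped.astype(np.float64) - current_view.astype(np.float64)
        mse_i = float(np.mean(diff ** 2))                # S204: MSE against the original
        if mse_i < mse_opt:                              # S205/S206: keep the best so far
            offset_opt, mse_opt = offset_i, mse_i
    return offset_opt
```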
- Embodiment 3:
- FIG. 3 is a flowchart of an implementation of a method for encoding a multi-view video according to another embodiment of the present invention. Steps S301 and S302 are the same as steps S101 and S102 in FIG. 1; the difference is that the method further includes the following steps, detailed as follows:
- step S303, a deformed view image of the front view image is calculated based on the disparity information and the front view image, and the deformed view image is used as a prediction signal to predictively encode the current coded view image, obtaining a first prediction result.
- step S304 the current coded view image is predictively encoded by using other predictive coding modes of the current coded view image to obtain a second prediction result.
- step S305, according to the first prediction result and the second prediction result, an optimal decision is used to select the best mode from the prediction mode of the deformed view image and the other prediction modes of the current view image to predictively encode the current coding unit, and a mode indicator is written into the code stream.
- the optimal decision includes but is not limited to rate distortion decision.
- the mode indicator is used to identify the best mode selected from the prediction mode of the warped view map and the other prediction modes of the current view map using the optimal decision.
- the prediction mode of the warped view image refers to a mode in which the warped view image is predictively coded as a prediction signal for the current coded view image.
- for example, when the best mode selected by the optimal decision from the prediction mode of the deformed view image and the other prediction modes of the current view image is the prediction mode of the deformed view image, the mode indicator is set to WarpSkip-Mode = 1; when the best mode selected is one of the other prediction modes of the current view image, the mode indicator is set to WarpSkip-Mode = 0. The mode indicator is written into the code stream so that, after receiving the code stream, the decoder can decode correctly according to it.
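- The sketch below illustrates one way step S305 could weigh the two candidates with a rate-distortion cost J = D + λ*R and record WarpSkip-Mode; the cost model, the λ value and the flag container are assumptions, not the patent's syntax:
```python
def select_mode(warp_distortion, warp_rate, other_distortion, other_rate,
                flags, lam=10.0):
    """Pick the mode with the smaller rate-distortion cost and record the flag."""
    j_warp = warp_distortion + lam * warp_rate
    j_other = other_distortion + lam * other_rate
    warp_skip_mode = 1 if j_warp <= j_other else 0   # 1: deformed-view prediction wins
    flags.append(warp_skip_mode)                     # written into the code stream
    return warp_skip_mode

flags = []
select_mode(warp_distortion=120.0, warp_rate=2,
            other_distortion=90.0, other_rate=40, flags=flags)
print(flags)   # [1]: cost 120 + 10*2 = 140 beats 90 + 10*40 = 490 in this example
```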
- Embodiment 4:
- FIG. 4 is a flowchart of an implementation of a method for encoding a multi-view video according to another embodiment of the present invention. Steps S401 and S402 are the same as steps S101 and S102 in FIG. 1 , except that the method further includes the following steps:
- step S403 the distorted view map of the front view image is calculated using the disparity information and the front view image.
- step S404 an occupation mask map of the current coded view image is obtained, and the occupancy mask map is used to describe whether the pixels of the front view image can be transformed into the current coded view image.
- the specific steps for obtaining the occupancy mask map belong to the prior art and are briefly described as follows:
- when the deformed view image is calculated, an occupancy mask map is generated.
- the occupancy mask map describes whether the pixels of the front view image can be warped into the current view image. It takes the form of a Boolean occupancy mask map of the same size as the video.
- when a pixel is successfully warped from another view, it is marked as occupied (i.e., true) in the occupancy mask map; otherwise it is marked as vacant (i.e., false). The occupancy mask map can be obtained in this way.
- step S405 the mask identifier of each coding unit of the current coded view image is obtained according to the occupation mask map.
- step S406, the occupancy rate P(CU) of the current coding unit is calculated according to the mask identifiers of the coding unit. The calculation formula may be: P(CU) = (Σ_{i∈CU} m_i) / size(CU)
- where m_i is the mask identifier of each pixel of the current coding unit, whose value is 0 or 1, and size(CU) is the number of all pixels in the coding unit.
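- A minimal sketch of this occupancy-rate computation; representing the mask as a Boolean array and addressing a square coding unit by its top-left corner and size are assumptions made for illustration:
```python
import numpy as np

def occupancy_rate(occupancy_mask, top, left, size):
    """P(CU) = (sum of the per-pixel mask identifiers in the CU) / size(CU)."""
    cu_mask = occupancy_mask[top:top + size, left:left + size]
    return float(cu_mask.sum()) / cu_mask.size

# Example: a 4x4 coding unit in which 12 pixels are occupied has P(CU) = 0.75.
mask = np.zeros((8, 8), dtype=bool)
mask[:3, :4] = True
print(occupancy_rate(mask, 0, 0, 4))   # 0.75
```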
- step S407 it is determined whether the occupancy rate P( cu ) of the current coding unit is greater than a preset threshold. If yes, step S408 is performed, otherwise step S409 is performed.
- step S408, the deformed view image is used as the prediction signal of the current coding unit to predictively encode the current coded view image.
- step S409, an optimal decision is used to select the best mode from the prediction mode of the deformed view image and the other prediction modes of the current view image to predictively encode the current coding unit.
- the prediction mode of the deformed view image refers to a mode in which the transformed view image is used as a prediction signal of the current coding unit to predictively encode the current coded view image.
- the optimal decision includes but is not limited to rate distortion decision.
- when a rate-distortion decision is used to select the best mode from the prediction mode of the deformed view image and the other prediction modes of the current view image for predictively encoding the current coding unit, the mode with the smallest rate-distortion cost is selected from the prediction mode of the deformed view image and the other prediction modes of the current view image, and the current coding unit is predictively encoded with that mode.
- according to the occupancy rate P(CU) of the current coding unit, the deformed view image is either used directly as the prediction signal of the current coding unit to predictively encode the current coded view image, or an optimal decision is used to select the best mode from the prediction mode of the deformed view image and the other prediction modes of the current view image to predictively encode the current coding unit, thereby improving the coding prediction accuracy and obtaining better rate-distortion performance.
- the method when the occupancy rate of the current coding unit is less than or equal to a preset threshold, the method further includes:
- a mode indicator is used to identify the best mode selected, using the optimal decision, from the prediction mode of the deformed view image and the other prediction modes of the current view image, and the mode indicator is written into the code stream.
- the decoding end can parse out which mode is used to predict and decode the current coding unit according to the mode indication symbol.
- the mode indicator can be represented by WarpSkip-Mode.
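- Putting steps S407-S409 together, the sketch below shows the per-coding-unit decision; the threshold value, the rd_select_mode() helper and the flag list are illustrative assumptions:
```python
THRESHOLD = 0.9   # assumed value for the preset occupancy threshold

def encode_cu(p_cu, rd_select_mode, flags):
    """p_cu is the occupancy rate P(CU); rd_select_mode() returns 'warp' or 'other'."""
    if p_cu > THRESHOLD:
        # S408: occupancy is high enough, so the deformed view image is used
        # directly as the prediction signal; no WarpSkip-Mode flag is written,
        # since the decoder can repeat the same threshold test (see Embodiment 7).
        return "warp"
    # S409: fall back to the optimal (e.g. rate-distortion) decision and
    # signal the chosen mode with WarpSkip-Mode.
    mode = rd_select_mode()
    flags.append(1 if mode == "warp" else 0)
    return mode

flags = []
print(encode_cu(0.95, lambda: "other", flags), flags)   # warp []  (no flag written)
print(encode_cu(0.40, lambda: "other", flags), flags)   # other [0]
```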
- FIG. 5 is a flowchart showing an implementation process of a method for decoding a multi-view video according to an embodiment of the present invention, which is as follows:
- step S501 the code stream of the current view image is parsed to obtain an optimal deformation offset.
- when the multi-view video is encoded at the encoding end, the optimal deformation offset of each view has been encoded into the code stream; therefore, by parsing the code stream of the current view image, the optimal deformation offset of each view can be obtained.
- step S502 disparity information between the front view image and the current decoded view image is calculated based on the optimal deformation offset, the camera parameter of the viewpoint, and the depth map information of the front view image.
- the front view image may be a previously decoded view image in the spatial domain of the current decoded view image.
- the disparity information between the front view image and the current decoded view image may be calculated using the same formula as at the encoding end: disp(p) = (f / Z) * (X_add - X_base) + offset_opt * (d / d_max) * α
- step S503, a deformed view image of the front view image relative to the current decoded view image is calculated according to the disparity information and the front view image (the deformed view image of the front view image relative to the current decoded view image is also simply referred to as the deformed view image of the front view image).
- the specific steps are as follows:
- the pixels of the front view image are shifted according to the disparity information to obtain a deformed view of the front view image with respect to the current decoded view.
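- A minimal sketch of this pixel shifting for a single row, assuming purely horizontal, integer-rounded disparities and ignoring hole filling; it also records the occupancy mask described in the encoding embodiments:
```python
import numpy as np

def warp_row(front_row, disparity_row):
    """Shift each pixel of the front-view row by its disparity (horizontal only).

    Also returns the Boolean occupancy mask for the row: True where a pixel of
    the front view image landed, False where the position stays vacant.
    """
    width = front_row.shape[0]
    warped = np.zeros_like(front_row)
    occupied = np.zeros(width, dtype=bool)
    for x in range(width):
        target = x + int(round(disparity_row[x]))
        if 0 <= target < width:
            warped[target] = front_row[x]
            occupied[target] = True
    return warped, occupied

row = np.array([10, 20, 30, 40, 50])
warped, mask = warp_row(row, np.array([1, 1, 0, 0, -1]))
print(warped, mask)   # pixels moved by their disparity; unfilled positions stay 0/False
```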
- step S504 the calculated transformed viewpoint map is taken as a reference map for reconstructing the current decoded viewpoint map.
- a reconstructed picture is built from the reference picture together with the other information obtained by decoding, thereby realizing decoding and reconstruction of the current decoded view.
- In the embodiment of the present invention, by decoding the optimal deformation offset from the code stream, the disparity information is obtained according to the optimal deformation offset, the deformed view image of the front view image is obtained according to the disparity information, the deformed view image is used as a reference picture for reconstructing the current decoded view image, and the reconstructed picture is built from the reference picture and the other decoded information, so that decoding and reconstruction of the current decoded view can be realized.
- the optimal deformation offset may be a common optimal deformation offset shared by the deformed video image and the deformed depth map of the front view image, or it may include respective optimal deformation offsets for the deformed video image and the deformed depth map of the front view image.
- when the optimal deformation offset includes respective optimal deformation offsets for the deformed video image and the deformed depth map of the front view image, the decoding end can obtain, when decoding, the respective optimal deformation offsets of the deformed video image and the deformed depth map of the front view image.
- FIG. 6 is a flowchart showing an implementation process of a method for decoding a multi-view video according to another embodiment of the present invention, which is described in detail as follows:
- step S601 the code stream of the current decoding unit is decoded to obtain a mode indication symbol, such as WarpSkip-Mode.
- when the multi-view video is encoded at the encoding end, the mode indicator of each coding unit has been encoded into the code stream, so by decoding the code stream of the current view image, the mode indicator of each coding unit can be obtained.
- step S602, it is determined according to the mode indicator whether the deformed view image is used as the prediction signal of the current view image to predictively decode the current decoding unit, or another prediction signal is used to predictively decode the decoding unit of the current view image.
- An example is as follows:
- if the mode indicator WarpSkip-Mode = 1,
- the deformed view image is used as the prediction signal of the decoding unit of the current view image to predictively decode the current decoded view image.
- in this case the decoding end no longer needs other information such as the decoding residual of the decoding unit.
- if the mode indicator WarpSkip-Mode = 0, the current decoding unit is predictively decoded using other predictive decoding modes of the current decoded view image.
- FIG. 7 is a flowchart of an implementation of a method for decoding a multi-view video according to another embodiment of the present invention. Steps S701 and S702 are the same as steps S501 and S502 in FIG. 5, except that the method further includes the following steps:
- step S703 the distorted view map of the front view image is calculated using the parallax information and the front view image.
- an occupation mask map of the current decoded view image is obtained, and the occupation mask map is used to describe whether the pixels of the front view image can be transformed into the current decoded view image.
- step S705 the mask identifier of each decoding unit of the current decoded view image is obtained according to the occupation mask map.
- step S706 the occupancy ratio P(CU) of the current decoding unit is calculated based on the mask identifier of each decoding unit of the current decoded view image. Its calculation formula can be as follows:
- P(CU) = (Σ_{i∈CU} m_i) / size(CU), where m_i is the mask identifier of each pixel of the current decoding unit, whose value is 0 or 1, and size(CU) is the number of all pixels in the decoding unit.
- step S707 it is determined whether the occupancy rate P( cu ) of the current decoding unit is greater than a preset threshold. If yes, step S708 is performed, otherwise step S709 is performed.
- step S708, the deformed view image is used as the prediction signal of the current decoding unit to predictively decode the current decoded view image.
- step S709, the mode indicator of the current decoding unit is decoded, and it is determined according to the mode indicator whether the prediction mode of the deformed view image or one of the other prediction modes of the current decoded view image is used to predictively decode the current decoding unit.
- An example is as follows: if the mode indicator is 1, the prediction mode of the deformed view image is used to predictively decode the current decoding unit; if the mode indicator is 0, another predictive decoding mode is used to decode the current decoding unit.
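- Mirroring the encoder side, the sketch below combines steps S707-S709 for one decoding unit; the threshold value, the flag reader and the two prediction helpers are illustrative assumptions:
```python
THRESHOLD = 0.9   # assumed value for the preset occupancy threshold

def decode_cu(p_cu, read_warp_skip_flag, predict_from_warp, predict_other):
    """p_cu is the occupancy rate P(CU) of the current decoding unit."""
    if p_cu > THRESHOLD:
        # S708: use the deformed view image directly as the prediction signal;
        # no mode indicator is parsed for this decoding unit.
        return predict_from_warp()
    # S709: parse WarpSkip-Mode and branch on its value.
    if read_warp_skip_flag() == 1:
        return predict_from_warp()
    return predict_other()
```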
- the occupied mask map during the decoding process may be calculated as a whole, or may be an occupied mask map calculated when decoding each decoding unit.
- the deformed viewpoint image may be calculated as a whole during the decoding process, or may be a deformed viewpoint map calculated when decoding each decoding unit.
- FIG. 8 shows an apparatus for encoding a multi-view video according to an embodiment of the present invention.
- the multi-view video encoding apparatus can be used in an encoder; it can be a software unit, a hardware unit, or a combined software and hardware unit running in the encoder, or it can be integrated into the encoder, or into an application system running on the encoder, as a stand-alone component, wherein:
- the optimal deformation offset acquiring unit 81 minimizes the error between the current coded view image and the deformed view image of the front view image to obtain the optimal deformation offset.
- the viewpoint image includes a video map and a depth map.
- the deformed viewpoint map includes a deformed video map and a deformed depth map.
- the errors between the current coded view image and the deformed view image of the front view image include, but are not limited to, the Sum of Absolute Differences (SAD), the Mean Square Error (MSE), and the Sum of Squared Errors (SSE), etc.
- there are many ways for the optimal deformation offset acquiring unit 81 to minimize the error between the current coded view image and the deformed view image of the front view image.
- For example: an offset preset range is set; each value within the offset preset range is used in turn as the deformation offset to correct the disparity information between the front view image and the current coded view image; the deformed view image of the front view image is obtained using the disparity information; and the error between the current coded view image and the deformed view image is calculated;
- the value within the offset preset range that minimizes this error is then selected and determined as the optimal deformation offset.
- All values within the offset preset range include integer values or fractional values, where a fractional value keeps one digit after the decimal point.
- the first disparity information calculation unit 82 acquires disparity information between the front view image and the current coded view image based on the optimal deformation offset, the camera parameter of the view, and the depth map information of the forward view.
- the first deformed viewpoint image calculation unit 83 calculates a deformed viewpoint map of the front viewpoint map based on the parallax information and the front viewpoint map.
- the first code prediction unit 84 predicts the current coded view image using the transformed view image calculated by the first transformed view image calculation unit 83 as a prediction signal.
- the optimal deformation offset acquiring unit 81 includes: an initial setting module 811, which sets a deformation offset value offset_i.
- the disparity calculating module 812 calculates the disparity information between the front view image and the current coded view image according to the deformation offset value offset_i, the camera parameters of the views, and the depth map information of the front view image.
- the deformed view image calculating module 813 calculates a deformed view image of the front view image according to the disparity information and the front view image.
- the error calculating module 814 calculates the error between the deformed view image and the original image of the current coded view image.
- the error judging module 815 judges whether the error between the deformed view image and the original image of the current coded view image is smaller than the current minimum error value.
- the minimum error value is initially set to a very large value.
- the offset value changing module 816, when the error judging module 815 determines that the error is smaller than the current minimum error value, sets offset_opt = offset_i and MSE(offset_opt) = MSE(offset_i) and then changes the deformation offset value within the preset offset range; otherwise, it directly changes the deformation offset value within the preset offset range. The offset value changing module 816 then triggers the disparity calculating module 812 to recalculate the disparity information between the front view image and the current coded view image using the changed deformation offset value, the camera parameters of the views, and the depth map information of the front view image.
- the above modules interact cyclically until all values within the offset preset range are traversed.
- the multi-view video encoding apparatus further includes a first mode prediction unit 85, a second mode prediction unit 86, and a mode selection unit 87, wherein:
- the first mode prediction unit 85 calculates a deformed viewpoint map of the front view image based on the disparity information and the front view image, and predictively encodes the current coded view image as the prediction signal to obtain a first prediction result.
- the second mode prediction unit 86 predictively encodes the current coded view image by using other predictive coding modes of the current coded view image to obtain a second prediction result.
- the mode selection unit 87, according to the first prediction result and the second prediction result, uses an optimal decision to select the best mode from the prediction mode of the deformed view image and the other prediction modes of the current view image to predictively encode the current coding unit, and writes the mode indicator into the code stream.
- the optimal decision includes but is not limited to rate distortion decision.
- the mode indicator is used to identify the best mode selected from the prediction mode of the warped view map and the other prediction modes of the current view map using the optimal decision.
- the prediction mode of the warped view image refers to a mode in which the warped view image is predictively coded as a prediction signal for the current coded view image.
- the apparatus further includes a first occupancy mask map acquiring unit 88, a first mask identification acquiring unit 89, a first occupancy rate calculating unit 901, and a second coding prediction unit 902, wherein:
- the first occupation mask map acquisition unit 88 acquires an occupation mask map of the current coded view image, which is used to describe whether the pixels of the front view image can be transformed into the current coded view image.
- the first mask identification obtaining unit 89 obtains the mask identifier of each coding unit of the current coded view image according to the occupation mask map.
- the first occupancy rate calculating unit 901 calculates the occupancy rate P(CU) of the current coding unit based on the mask identifiers of each coding unit of the current coded view image. The calculation formula may be: P(CU) = (Σ_{i∈CU} m_i) / size(CU).
- the second coding prediction unit 902 determines whether the occupancy rate of the current coding unit is greater than a preset threshold; if so, it uses the deformed view image as the prediction signal of the current coding unit to predictively encode the current view image; if not, it uses an optimal decision to select the best mode from the prediction mode of the deformed view image and the conventional coding prediction modes of the current view image to predictively encode the current coding unit.
- FIG. 9 shows a decoding apparatus for multi-view video according to an embodiment of the present invention. For convenience of description, only parts related to the embodiment of the present invention are shown.
- the multi-view video decoding apparatus can be used in a decoder; it can be a software unit, a hardware unit, or a combined software and hardware unit running in the decoder, or it can be integrated into the decoder, or into an application system running on the decoder, as a stand-alone component, wherein:
- the code stream parsing unit 91 parses the code stream of the current view image to obtain an optimum distortion offset.
- since the encoding end encodes the optimal deformation offset offset_opt into the code stream during encoding, the optimal deformation offset can be decoded at the same time as the code stream of the current view image is decoded.
- the second parallax information calculation unit 92 calculates disparity information between the front view image and the current decoded view image based on the optimal deformation offset, the camera parameter of the viewpoint, and the depth map information of the front view image.
- the second deformed view image calculation unit 93 calculates a deformed viewpoint view of the front view image based on the obtained parallax information and the front view image.
- the specific process is as follows: The pixels of the front view image are shifted according to the disparity information to obtain a deformed view of the front view.
- the first decoding prediction unit 94 uses the calculated transformed viewpoint map as a reference map for reconstructing the current decoded viewpoint map.
- the decoding end constructs a reconstruction map by decoding other information obtained by decoding and the reference picture, thereby realizing decoding and reconstruction of the current decoding viewpoint.
- the multi-view video decoding apparatus further includes a mode indication symbol decoding unit 95 and a second decoding prediction unit 96.
- the mode indicator decoding unit 95 obtains the mode indicator by decoding when decoding the current decoding unit.
- the second decoding prediction unit 96 determines whether to decode the current decoding unit using the warped view image as the prediction signal of the current view image, or predictively decode the decoding unit of the current view image by using other prediction signals.
- the decoding apparatus for multi-view video further includes a second occupancy mask map acquiring unit 97, a second mask identification acquiring unit 98, a second occupancy rate calculating unit 99, and a third decoding prediction unit 100, wherein:
- the second occupied mask map obtaining unit 97 acquires an occupation mask map of the current decoded view image, wherein the occupancy mask map is used to describe whether the pixels of the front view image can be transformed into the current decoded view image.
- the second mask identification obtaining unit 98 obtains the mask identifier of each decoding unit of the current decoded view image according to the occupation mask map.
- the second occupancy calculation unit 99 calculates the occupancy rate of the current decoding unit according to the mask identification of each decoding unit of the current decoded view image
- the third decoding prediction unit 100 determines whether the occupancy rate of the current decoding unit is greater than a preset threshold; if so, the deformed view image is used as the prediction signal of the current decoding unit to predictively decode the current view image; otherwise, the mode indicator of the current decoding unit is decoded, and it is determined according to the mode indicator whether the prediction mode of the deformed view image or one of the other prediction modes of the current decoded view image is used to predictively decode the current decoding unit.
- the figure in the entire file of the present invention represents a decoding unit, which may be a frame or other unit such as a slice.
- during decoding, the deformed view image may be computed for the whole picture, or the part of the deformed view image referenced by the current decoding unit may be computed when decoding each decoding unit.
- it should be noted that the units included in the above apparatuses for multi-view video are divided only according to functional logic, but are not limited to the above division, as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only for convenience of distinguishing them from each other and are not intended to limit the protection scope of the present invention.
- the video decoding end of the embodiment of the present invention may be a processor (such as a central processing unit CPU) or an application specific integrated circuit (ASIC) or the like.
- the video decoding end of the embodiment of the present invention may specifically be a computer, a mobile phone, a set top box, a television set, and other various electronic devices.
- In the embodiments of the present invention, the optimal deformation offset is obtained by minimizing the error between the current coded view image and the deformed view image of the front view image, and the disparity information is corrected using the optimal deformation offset, thereby improving the accuracy of the deformed view image, which in turn reduces the code rate for coding the residuals of other views and improves the coding quality and compression performance of the 3DV coding method.
- by encoding the optimal deformation offset into the code stream, the decoding end can calculate the disparity information according to the optimal deformation offset in the code stream and obtain the deformed view image of the front view image according to the disparity information; this deformed view image can be used as a reference picture for reconstructing the current view image, so that the decoding end can decode and reconstruct each view.
- according to the occupancy rate P(CU) of the current coding unit, the current view image is predicted either by directly using the deformed view image as the prediction signal of the current coding unit, or by using the deformed view image together with the conventional reference pictures of the current view image as prediction signals, so that the coding prediction accuracy can be improved and better rate-distortion performance can be obtained.
- at the decoding end, the optimal deformation offset is decoded from the code stream, the disparity information is obtained according to the optimal deformation offset, the deformed view image of the front view image is obtained according to the disparity information, and the deformed view image is used as a reference picture for reconstructing the current decoded view image, thereby enabling decoding and reconstruction of the current decoded view.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Embodiments of the present invention provide an encoding method and a decoding method for multi-view video, as well as corresponding apparatuses, an encoder and a decoder. The encoding method for multi-view video includes: minimizing an error between a current coded view image and a deformed view image of a front view image to obtain an optimal deformation offset; calculating disparity information between the front view image and the current coded view image using the optimal deformation offset, the camera parameters of the views and the depth map information of the front view image; calculating the deformed view image of the front view image using the disparity information and the front view image, and predicting the current view image using the deformed view image as a prediction signal. In the embodiments of the present invention, the optimal deformation offset is obtained by minimizing the error between the current coded view image and the deformed view image of the front view image, and the disparity information is corrected using the optimal deformation offset, thereby improving the accuracy of the deformed view image and, in turn, the video image quality of the multiple views.
Description
多视点视频的编码、 解码方法、 装置和编解码器 技术领域 本发明属于视频编解码技术领域, 尤其涉及多视点视频的编码方法、 解 码方法、 装置和编码器、 解码器。 背景技术 多视点视频编码 ( Multi-view Video Coding , MVC ) 是 H.264/AVC ( Advanced Video Coding, 高级视频编码)提出的一个技术, 主要用于编码 立体视频以及多角度三维视频内容。
MVC 中的视差矢量代表不同视点间记录的同一场景的冗余信息。 这些 视差矢量是以块为单位的矢量, 它们是编码端通过非标准(non-normative ) 运动估计算法计算得到的。 对于其他的视点, 可以根据视差矢量用之前已编 码的视点的可置换(displaced )块区域作为其他视点当前编码区域的预测信 号。 这些以块为单位的视差矢量是不能精确的描述多视点之间的视差信息的, 从而使得 MVC编码方法难以产生高质量的用于不同视点压缩的预测信号。
现有技术提供了一种三维视频编码(3D Video Coding, 3DV )的方法, 该 3DV编码方法中增加了深度信息。 该深度信息用于描述不同视点间的每一 个像素位置的关系, 从而 3DV编码技术可以产生质量相对高一些的用于不同 视点压缩的预测信号。
在编码端, 3DV编码方法的流程筒述如下: 其中基本视点使用传统的编 码标准进行编码。 在编码完基本视点后, 对下一个视点进行编码时, 利用摄 像机参数信息和前视点图的深度图信息计算已编码的前视点图和当前要编码
视点之间的视差信息, 利用视差信息对前视点图进行变形操作, 得到变形视 点图, 并将该变形视点图作为当前要编码的视点的编码单元的参考图, 对当 前要编码的视点进行编码处理。
视差的计算在物体或者物体的局部可能引入像素移位噪声, 导致变形后 的像素位置和物体在其他视点的位置不能完全一致。 这种现象可以解释为 rounding效应, 离摄像机近的物体要比离摄像机远的物体效应明显。 rounding 效应的出现使得依据视差信息得到的变形视点图的精度降低了, 进而影响 3DV编码的压缩效率。 发明内容 本发明实施例提供一种多视点视频的编码方法, 可以提高变形视点图的 精度。
本发明实施例提供一种多视点视频的编码方法, 所述方法包括: 最小化当前编码视点图和前视点图的变形视点图之间的误差, 获得最优 变形偏移;
根据所述最优变形偏移、 视点的摄像机参数以及所述前视点图的深度图 信息, 获取所述前视点图和所述当前编码视点图之间的视差信息;
根据所述视差信息和所述前视点图确定所述前视点图的变形视点图, 并 将所述变形视点图作为预测信号对所述当前编码视点图进行预测编码。
本发明实施例还提供一种多视点视频的解码方法, 所述方法包括: 解析当前视点图的码流, 得到最优变形偏移;
根据所述最优变形偏移、 视点的摄像机参数以及前视点图的深度图信息 确定前视点图和当前解码视点图之间的视差信息;
根据所述视差信息计算所述前视点图相对于当前解码视点图的变形视点 图;
将所述变形视点图作为重构所述当前解码视点图的参考图。
本发明实施例还提供一种多视点视频编码的编码装置, 所述装置包括: 最优变形偏移获取单元, 用于最小化当前编码视点图和前视点图的变形 视点图的误差, 获得最优变形偏移;
第一视差信息计算单元, 用于根据所述最优变形偏移、 视点的摄像机参 数以及前视点图的深度图信息获取所述前视点图和当前编码视点图之间的视 差信息;
第一变形视点图计算单元, 用于根据所述视差信息和前视点图确定所述 前视点图的变形视点图;
第一编码预测单元, 用于将所述变形视点图作为预测信号对所述当前编 码视点图进行预测编码。
本发明实施例的还提供一种多视点视频的解码装置, 所述装置包括: 码流解析单元, 用于解析当前视点图的码流, 得到最优变形偏移; 第二视差信息计算单元, 用于根据所述最优变形偏移、 视点的摄像机参 数以及前视点图的深度图信息确定前视点图和当前解码视点图之间的视差信 第二变形视点图计算单元, 用于根据所述视差信息和前视点图计算所述 前视点图的变形视点图;
第一解码预测单元, 用于将所述变形视点图作为重构所述当前解码视点 图的参考图。
本发明实施例的还提供一种包括所述多视点视频的编码装置的编码器和
包括所述多视点视频的解码装置的解码器。
在本发明实施例中, 通过最小化当前编码视点图和前视点图的变形视点 图的误差得到最优变形偏移, 采用该最优变形偏移修正视差信息, 从而提高 变形视点图的精度, 改善多视点的视频图像质量。 附图说明 为了更清楚地说明本发明实施例中的技术方案, 下面将对实施例描述中 所需要使用的附图作一筒单地介绍, 显而易见地, 下面描述中的附图是本发 明的一些实施例, 对于本领域普通技术人员来讲, 在不付出创造性劳动的前 提下, 还可以根据这些附图获得其他的附图。
图 1是本发明实施例提供的多视点视频的编码方法的实现流程图; 图 2是本发明实施例提供的最小化当前编码视点图和前视点图的变形视 点图的误差, 获得最优变形偏移的实现流程图;
图 3是本发明另一实施例提供的多视点视频的编码方法的实现流程图; 图 4是本发明另一实施例提供的多视点视频的编码方法的实现流程图; 图 5是本发明实施例提供的多视点视频的解码方法的实现流程图; 图 6是本发明另一实施例提供的多视点视频的解码方法的实现流程图; 图 7是本发明另一实施例提供的多视点视频的解码方法的实现流程图; 图 8是本发明实施例提供的多视点视频的编码装置的结构框图; 图 9是本发明实施例提供的多视点视频的解码装置的结构框图。 具体实施方式 为使本发明实施例的目的、 技术方案和优点更加清楚, 下面将结合本发
明实施例中的附图, 对本发明实施例中的技术方案进行清楚、 完整地描述, 显然, 所描述的实施例是本发明一部分实施例, 而不是全部的实施例。 基于 本发明中的实施例, 本领域普通技术人员在没有作出创造性劳动前提下所获 得的所有其他实施例, 都属于本发明保护的范围。
在本发明实施例中, 通过最小化当前编码视点图和前视点图的变形视点 图的误差得到最优变形偏移, 采用该最优变形偏移修正视差信息, 从而提高 变形视点图的精度, 进而改善多视点的视频图像质量。
为了说明本发明所述的技术方案, 下面通过具体实施例来进行说明。 实施例一:
图 1示出了本发明实施例提供的一种多视点视频的编码方法的实现流程, 详述如下:
在步骤 S1 01 中,最小化当前编码视点图和前视点图的变形视点图之间的 误差, 获得最优变形偏移。
其中当前编码视点图是指当前正在进行编码的视点图。 前视点图可以为 当前编码视点图空间域上的前编码视点图。 前视点图的变形视点图是通过对 前视点图采用视点变形算法进行变形运算得到的。 其中视点图包括视频图和 深度图。 变形视点图包括变形视频图和变形深度图。
在本发明实施例中, 前视点图的变形视频图和变形深度图可以采用相同 的最优变形偏移, 也可以分别采用各自的最优变形偏移。
当前视点图的变形视频图和变形深度图分别采用各自的最优变形偏移 时, 则最小化当前编码视点图的编码视频图和前视点图的编码视频图的误差, 获得前视点图的变形视频图的最优变形偏移, 最小化当前编码视点图的深度 图和前视点图的变形深度图的误差, 获得前视点图的变形深度图的最优变形
偏移。
在本发明实施例中, 当存在多个最优变形偏移时, 该多个最优变形偏移 之间也可以通过预测压缩来减少编码所需码率。 举例说明如下: 如果多个最 优变形偏移之间存在相关性时, 则可以传递其中一个最优变形偏移, 其它最 优变形偏移可以仅传递与该最优变形偏移的差值, 通过这种方式可以减少编 码所需码率。
在本发明实施例中, 最优变形偏移可以用于^ ί'爹正整图, 也可以用于对一 部分图比如条带进行修正。
其中当前编码视点图和前视点图的变形视点图的误差包括但不限于绝对 差和 ( Sum of Absolute Difference, SAD )、 均方差 ( Mean Square Error, MSE )和误差项平方和 ( Sum of Squares for Error, SSE )等。
其中最小化当前编码视点图和前视点图的变形视点图的误差的方式有多 种。 如可以通过以下方式: 设置一偏移预设范围, 根据该偏移预设范围内的 所有值分别作为变形偏移来修正前视点图和当前编码视点图之间的视差信 息, 利用该视差信息得到前视点图的变形视点图, 计算当前编码视点图和前 视点图的变形视点图的误差, 从该偏移预设范围中选择一个使当前编码视点 图和前视点图的变形视点图的误差最小的值, 将该值确定为最优变形偏移。 其中偏移预设范围内的所有值包括整数值或者小数值, 其中小数值为保留小 数点后一位的小数值。 当然还可以采用其他方式最小化当前编码视点图和前 视点图的变形视点图的误差, 在此不再——举例说明。
在步骤 S102中,根据最优变形偏移、视点的摄像机参数以及前视点图的 深度图信息, 获取前视点图和当前编码视点图之间的视差信息。
其中视点的摄像机参数包括摄像机焦距以及摄像机空间位置和景深信息
等。 其中视差信息是以像素点为单位的。
其中根据最优变形偏移、 视点的摄像机参数以及前视点图的深度图信息, 获取前视点图和所述当前编码视点图之间的视差信息的步骤具体包括:
根据视点的摄像机参数以及前视点图的深度图信息计算得到初始视差信 根据最优变形偏移以及前视点图的深度图信息得到偏移修正信号; 根据初始视差信号和偏移修正信号得到前视点图和所述当前编码视点图 之间的视差信息。
在本发明实施例中, 可以采用如下公式计算前视点图和当前编码视点图 之间的每个像素位置的视差信息:
disp( p) =— * ( Xadd - Xbase ) + offset^ * -^— * a
z dmax 其中 ^为当前编码视点的摄像机焦距。 Z为物体距离视点的距离。 dd 为 3D空间内当前编码视点在视点直线上的位置。 为 3D空间内前视点图 在视点直线上的位置。 为步骤 S101 中得到的最优变形偏移。 d为当前像 素的深度值, d 为当前视点图深度的最大深度值。 "为半像素精度。
在步骤 S103中, 根据视差信息和前视点图确定前视点图的变形视点图, 并将该变形视点图作为预测信号对当前编码视点图进行预测编码。
在本发明实施例中, 通过最小化当前编码视点图和前视点图的变形视点 图的误差, 得到最优变形偏移, 采用该最优变形偏移修正视差信息, 从而可 以提高前视点图的变形视点图的精度, 进而提高多视点的视频图像质量。
在本发明优选实施例中, 在步骤 S101之后, 该方法还包括下述步骤: 将最优变形偏移写进码流中。
在本实施例中, 由于将最优变形偏移编码进码流中了, 从而解码端可以
根据码流中的最优变形偏移计算得到视差信息, 依据该视差信息得到前视点 图的变形视点图, 该前视点图的变形视点图可以作为重构当前解码视点图的 参考图, 使得解码端可以解码并重建各视点。 本发明实施例的方法可以由处 理器(比如中央处理器 CPU )或专用集成电路(ASIC )等执行。
实施例二:
图 2示出了本发明实施例提供的最小化当前编码视点图和前视点图的变 形视点图的误差, 获得最优变形偏移的实现流程, 详述如下:
在步骤 S201 中, 设定变形偏移值。 ffset'。
在本发明实施例中 , 该变形偏移值。 ¾^可以任意设置。
在步骤 S202中, 根据变形偏移值。 ffset'、 视点的摄像机参数以及前视点 图的深度图信息确定前视点图和当前编码视点图之间的视差信息。 其中视差 信息可以是以像素点为单位的。
在本发明实施例中, 可以采用如下公式计算前视点图和当前编码视点图 之间的每个像素位置的视差信息:
disp( p) =— * ( Xadd - Xbase ) + offset^ * -^— * a
z dmax 其中'为当前视点图的摄像机焦距。 Z 为物体距离视点的距离。 dd 为 3D空间内当前视点图在视点直线上的位置。 为 3D空间内前视点图在 视点直线上的位置。 d为当前像素的深度值, d-x为当前视点图深度的最大深 度值。 "为半像素精度。
在步骤 S203中, 根据视差信息和前视点图确定前视点图的变形视点图。 由于其具体过程属于现有技术, 在此不再赘述。
在步骤 S204中,计算变形视点图和当前编码视点图的原始图的误差。其 中变形视点图包括变形视频图和变形深度图, 当前编码视点图的原始图包括
原始视频图和原始深度图。 以计算变形视点图和当前编码视点图的原始图的 均方差为例进行说明。
其具体计算过程如下: n
MSE (offset ) = -V (X. -Y)
η i=i
其中 MSE^ffs^ )是指变形视点图和当前编码视点图的均方差。 为当前编 码视点图的像素。 X为变形视点图的像素。
在本发明实施例中, 当变形视点图和当前编码视点图的均方差越小时, 表示依据该变形偏移得到的变形视点图与当前编码视点图越接近。
在步骤 S205,判断变形视点图和当前编码视点图的误差是否小于目前最 小的误差值, 如果是, 执行步骤 S206, 否则执行步骤 S207。 其中最小误差 值在开始 P介段设置为极大值。
以判断变形视点图和当前编码视点的原始图的均方差是否小于预设的极 大值为例, 其中具体的判断步骤为:
判断 MSE(0ffSeti ) < MSE(offSet。pt)是否成立, 如果成立, 执行步骤 S206, 否 则执行步骤 S207。
在步 S206中, 设置 offset。pt =。ffseti , MSE(offset。Pt ) = MSE(offseti )。 在步骤 S207中, 在预设的偏移预设范围内更改变形偏移值°¾^ , 并返 回步骤 S202。 其中偏移预设范围是指变形偏移值°¾^的变化范围。 如可以将该偏移预 设范围设置为 [-2, 2], 但偏移预设范围不以该举例说明为限。
其中在预设的偏移预设范围内更改变形偏移值 是指将变形偏移值 01¾^修改为该偏移预设范围内的未使用过的任意一个值。 通过上述步骤, 遍历偏移预设范围内的所有值, 即当偏移预设范围内的
所有值均被设置为变形偏移值°^^以后, 即可从该偏移预设范围内找出一个 使 MSE^ffsetJ最小的值, 将该值确定为最优变形偏移。起镇南关偏移预设范围 内的所有值包括整数值或者小数值, 其中小数值为保留小数点后一位的小数 值。
在本发明实施例中,可以在偏移预设范围内找出一个使 MSE(。ffset 最小的 值, 即从偏移预设范围内找出一个使变形视点图和当前编码视点图的原始图 最接近的变形偏移值, 通过将该值确定为最优变形偏移, 从而可以极大的提 高变形视点图的精度。 实施例三:
图 3示出了本发明另一实施例提供的多视点视频的编码方法的实现流程, 其中步骤 S301、 S302与图 1 中的步骤 S101、 S102相同, 不同之处在于还 包括以下步骤, 详述如下:
在步骤 S303中, 根据视差信息和前视点图计算前视点图的变形视点图, 并将该变形视点图作为预测信号对当前编码视点图进行预测编码, 得到第一 预测结果。
在步骤 S304中,采用当前编码视点图的其他预测编码模式对当前编码视 点图进行预测编码, 得到第二预测结果。
在步骤 S305中,根据第一预测结果和第二预测结果采用最优决策从变形 视点图的预测模式和当前视点图的其他预测模式中选择一种最好的模式对当 前编码单元进行预测编码, 并将模式指示符号写进码流中。 其中最优决策包 括但不限于率失真决策。 模式指示符号用于标识采用最优决策从变形视点图 的预测模式和当前视点图的其他预测模式中选择出的最好的模式。 变形视点 图的预测模式是指将该变形视点图作为预测信号对当前编码视点图进行预测 编码的模式。
如当采用最优决策从变形视点图的预测模式和当前视点图的其他预测模 式中选择出的最好的模式为变形视点图的预测模式时, 设置模式标识符号
WarpSkip-Mode=1; 当采用最优决策从变形视点图的预测模式和当前视点图 的其他预测模式中选择出的最好的模式为当前视点图的其他预测模式时, 设 置模式标识符号 WarpSkip-Mode=0; 将该模式标识符号写进码流中, 以便解 码器在接收到码流后, 可以依据该模式标识符号进行正确的解码。
实施例四:
图 4示出了本发明另一实施例提供的多视点视频的编码方法的实现流程, 其中步骤 S401、 S402与图 1 中的步骤 S101、 S102相同, 不同之处在于, 其还包括以下步骤:
在步骤 S403中, 利用视差信息和前视点图计算前视点图的变形视点图。 在步骤 S404中,获取当前编码视点图的占用掩码图,该占用掩码图用于 描述前视点图的像素是否可以被变形到当前编码视点图中。
其中获取占用掩码图的具体步骤属于现有技术, 在此筒单说明如下: 在 计算变形视点图时, 产生一个占用掩码图 (occupancy mask )。 该占用掩码 图描述了前视点图的像素是否可以被变形到当前视点图中。 它的表现形式为 一个和视频相同尺寸的布尔占用掩码图。 当一个像素被成功的从其他视点变 形得到, 则该像素在占用掩码图中被表示为占用 (即真), 反之则被表示为空 缺(即假 ), 通过上述方式即可得到占用掩码图。
在步骤 S405中,根据占用掩码图得到当前编码视点图的每一个编码单元 的掩码标识。
在步骤 S406中,根据当前编码视点图的每一个编码单元的掩码标识计算 当前编码单元的占有率 P(cu)。 其计算公式可以如下:
p(CU) = ∑cu ?i
size(CU)
其中 为当前编码单元每一个像素的掩码标识,其值为 0或者 1。 size(CU) 是该编码单元内的所有像素的个数。
在步骤 S407中,判断当前编码单元的占有率 P(cu)是否大于预设的阈值, 如果是, 执行步骤 S408, 否则执行步骤 S409。
在步骤 S408中,将变形视点图作为当前编码单元的预测信号对当前编码 视点图进行预测编码。
在步骤 S409中,采用最优决策从变形视点图的预测模式和当前视点图的 其他预测模式中选择一种最好的模式对当前编码单元进行预测编码。 其中变 形视点图的预测模式是指将变形视点图作为当前编码单元的预测信号对当前 编码视点图进行预测编码的模式。 其中最优决策包括但不限于率失真决策。
在本发明实施例中, 在采用率失真决策从变形视点图的预测模式和当前 视点图的其他预测模式中选择一种最好的模式对当前编码单元进行预测编码 时, 采用率失真决策从变形视点图的预测模式和当前视点图的其他预测模式 中选择代价最小的模式对当前编码单元进行预测编码。
在本发明实施例中, 通过根据当前编码单元的占有率 P(cu) , 直接采用变 形视点图作为当前编码单元的预测信号对当前编码视点图进行预测编码, 或 者采用最优决策从变形视点图的预测模式和当前视点图的其他预测模式中选 择一种最好的模式对当前编码单元进行预测编码, 从而可以提高编码预测精 度, 得到更好的率失真性能。
在本发明的优选实施例中, 当当前编码单元的占有率小于或者等于预设 的阈值时, 所述方法还包括:
采用模式指示符号来标识采用最优决策从变形视点图的预测模式和当前
视点图的其他预测模式中选择出的最好的模式, 并将所述模式指示符号写进 码流中。 这样解码端即可根据该模式指示符号解析出采用何种模式对对当前 编码单元进行预测解码。 如模式指示符号可以用 WarpSkip-Mode代表。
实施例五
图 5示出了本发明实施例提供的多视点视频的解码方法的实现流程, 详 述如下:
在步骤 S501 中, 解析当前视点图的码流, 得到最优变形偏移。
在本发明实施例中, 在编码端对多视点视频进行编码时, 已经将各视点 的最优变形偏移编码进码流了, 因此通过解析当前视点图的码流, 即可得到 各视点的最优变形偏移。
在步骤 S502中,根据最优变形偏移、视点的摄像机参数以及前视点图的 深度图信息计算前视点图和当前解码视点图之间的视差信息。 其中前视点图 可以是当前解码视点图空间域上的前解码视点图。 其具体过程如下:
根据所述视点的摄像机参数以及前视点图的深度图信息计算得到初始视 差信号;
根据所述最优变形偏移以及所述前视点图的深度图信息得到偏移修正信 根据所述初始视差信号和所述偏移修正信号得到所述前视点图和所述当 前编码视点图之间的视差信息。
在本发明实施例中, 可以采用如下公式计算前视点图和当前编码视点图 之间的视差信息:
disp( p) =— * ( Xadd - Xbase ) + offset^ * - -^— * a
在步骤 S503中,根据视差信息和前视点图计算前视点图相对于当前解码 视点图的变形视点图 (其中前视点图相对于当前解码视点图的变形视点图也 称为前视点图的变形视点图)。 其具体步骤如下:
根据视差信息对前视点图的像素进行移位得到前视点图相对于当前解码 视点图的变形视点图。
在步骤 S504中,将计算得到的变形视点图作为重构当前解码视点图的参 考图。 通过将解码得到的其它信息和参考图共同构建重建图, 从而实现对当 前解码视点的解码重构。
在本发明实施例中, 通过解码码流中的最优变形偏移, 依据该最优变形 偏移得到视差信息, 依据视差信息得到前视点图的变形视点图, 将该变形视 点图作为重构当前解码视点图的参考图, 并用解码得到的其它信息和参考图 共同构建重建图, 从而可以实现对当前解码视点的解码重构。
在本发明实施例中, 该最优变形偏移可以为前视点图的变形视频图和变 形深度图共同最优变形偏移, 也可以包括前视点图的变形视频图和变形深度 图各自的最优变形偏移。 当最优变形偏移包括前视点图的变形视频图和变形 深度图各自的最优变形偏移时, 则在解码端进行解码时, 可以得到前视点图 的变形视频图和变形深度图各自的最优变形偏移。
实施例六
图 6示出了本发明另一实施例提供的多视点视频的解码方法的实现流程, 详述如下:
在步骤 S601 中, 解码当前解码单元的码流, 得到模式指示符号, 例如 WarpSkip-Mode。
在本发明实施例中, 在编码端对多视点视频进行编码时, 已经将各个编
码单元的模式指示符号编码进码流了, 因此通过解码当前视点图的码流, 即 可解码得到各个编码单元的此模式指示符号编码。
在步骤 S602中,根据模式指示符号判断是采用变形视点图作为当前视点 图的预测信号对当前解码单元进行预测解码, 还是采用其他预测信号对当前 视点图的解码单元进行预测解码。 举例说明如下:
如果该模式指示符号 WarpSkip-Mode=1的话,将变形视点图作为当前视 点图的解码单元的预测信号对当前解码视点图进行预测解码。 解码端不再需 要该解码单元的解码残差等其他信息。 如果该模式指示符号
WarpSkip-Mode=0的话, 当前解码单元采用其他的预测解码模式对当前解码 视点图的解码单元进行预测解码。
实施例七:
图 7示出了本发明另一实施例提供的多视点视频的解码方法的实现流程, 其中步骤 S701、 S702与图 5中的步骤 S501、 S502相同, 不同之处在于, 其还包括以下步骤:
在步骤 S703中, 利用视差信息和前视点图计算前视点图的变形视点图。 在步骤 S704中,获取当前解码视点图的占用掩码图,该占用掩码图用于 描述前视点图的像素是否可以被变形到当前解码视点图中。
在步骤 S705中,根据占用掩码图得到当前解码视点图的每一个解码单元 的掩码标识。
在步骤 S706中,根据当前解码视点图的每一个解码单元的掩码标识计算 当前解码单元的占有率 P(CU)。 其计算公式可以如下:
∑
size(CU)
其中 为当前编码单元每一个像素的掩码标识,其值为 0或者 1。 size(CU) 是该编码单元内的所有像素的个数。
在步骤 S707中,判断当前解码单元的占有率 P(cu)是否大于预设的阈值, 如果是, 执行步骤 S708, 否则执行步骤 S709。
在步骤 S708中,将变形视点图作为当前解码单元的预测信号对当前解码 视点图进行预测解码。
在步骤 S709中,解码当前解码单元的模式指示符号,根据模式指示符号 判断是采用变形视点图的预测模式和当前解码视点图的其他预测模式中选择 一种最好的模式对当前编码单元进行预测编码。
举例说明如下: 如果模式指示符号为 1 ,采用变形视点图的预测模式对当 前解码单元进行预测解码; 如果模式指示符号为 0, 采用其他预测解码模式对 当前解码单元进行解码。
在解码过程中占用掩码图可以是整体计算得到, 也可以是解码每一个解 码单元时计算得到占用掩码图。
在解码过程中变形视点图可以是整体计算得到, 也可以是解码每一个解 码单元时计算得到变形视点图。
实施例八
图 8示出了本发明实施例提供的多视点视频的编码装置, 为了便于说明, 仅示出了与本发明实施例相关的部分。 该多视点视频的编码装置可以用于编 码器, 可以是运行于编码器内的软件单元、 硬件单元或者软硬件相结合的单 元, 也可以作为独立的挂件集成到编码器中或者运行于编码器的应用系统中, 其中:
最优变形偏移获取单元 81最小化当前编码视点图和前视点图的变形视点
图的误差, 获得最优变形偏移。
其中视点图包括视频图和深度图。 变形视点图包括变形视频图和变形深 度图。 其中当前编码视点图和前视点图的变形视点图的误差包括但不限于绝 对差和 ( Sum of Absolute Difference, SAD )、 均方差 ( Mean Square Error, MSE )和误差项平方和 ( Sum of Squares for Error, SSE )等。。
其中最优变形偏移获取单元 81最小化当前编码视点图和前视点图的变形 视点图的误差的方式有多种。 如可以通过以下方式: 设置一偏移预设范围, 根据该偏移预设范围内的所有值分别作为变形偏移来修正前视点图和当前编 码视点图之间的视差信息, 利用该视差信息得到前视点图的变形视点图, 计 算当前编码视点图和前视点图的变形视点图的误差, 从该偏移预设范围中选 择一个使当前编码视点图和前视点图的变形视点图的误差最小的值, 将该值 确定为最优变形偏移。 其中偏移预设范围内的所有值包括整数值或者小数值, 其中小数值为保留小数点后一位的小数值。
第一视差信息计算单元 82根据最优变形偏移、视点的摄像机参数以及前 视点图的深度图信息获取前视点图和当前编码视点图之间的视差信息。
在本发明实施例中, 可以采用如下公式计算前视点图和当前编码视点图 之间的每个像素位置的视差信息: disp ( P) =— * ( Xadd - Xbase ) + offset, *-—* a
Z dmax 第一变形视点图计算单元 83根据视差信息和前视点图计算前视点图的变 形视点图。
第一编码预测单元 84将第一变形视点图计算单元 83计算得到的变形视 点图作为预测信号对当前编码视点图进行预测。
在本发明优选实施例中, 该最优变形偏移获取单元 81 包括:
初始设置模块 811 设定变形偏移值 offset_i。
视差计算模块 812 根据变形偏移值 offset_i、 视点的摄像机参数以及前视点图的深度图信息计算前视点图和当前编码视点图之间的视差信息。
变形视点图计算模块 813 根据视差信息和前视点图计算前视点图的变形视点图。
误差计算模块 814 计算变形视点图和当前编码视点图的原始图的误差。 误差判断模块 815 判断变形视点图和当前编码视点图的原始图的误差是否小于目前最小的误差值。 其中最小误差值在开始阶段设置为极大值。
偏移值更改模块 816 在误差判断模块 815 判定变形视点图和当前编码视点图的原始图的误差小于目前最小的误差值时, 设置 offset_opt = offset_i、 MSE(offset_opt) = MSE(offset_i), 并在预设的偏移预设范围内更改变形偏移值 offset_i, 否则直接在预设的偏移预设范围内更改变形偏移值 offset_i。 偏移值更改模块 816 再触发视差计算模块 812 利用更改后的变形偏移值 offset_i、 视点的摄像机参数以及前视点图的深度图信息重新计算前视点图和当前编码视点图之间的视差信息。 上述模块之间循环交互, 直到偏移预设范围内的所有值均遍历完毕。
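上述各模块的循环交互, 整体上相当于在偏移预设范围内做一次穷举搜索, 可用如下 Python 草图示意(复用前文给出的 compute_disparity、warp_view 与 mse 示例函数, 摄像机参数的传递方式为本文假设):

```python
def search_optimal_offset(offset_range, prev_view, prev_depth, cur_view,
                          f, Z, x_add, x_base, d_max, a=1.0):
    """在偏移预设范围内穷举搜索最优变形偏移 offset_opt 的示意实现。"""
    offset_opt = None
    min_err = float("inf")                     # 最小误差值在开始阶段设置为极大值
    for offset_i in offset_range:              # 遍历偏移预设范围内的所有候选值
        disp = compute_disparity(prev_depth, f, Z, x_add, x_base, offset_i, d_max, a)
        warped, _ = warp_view(prev_view, disp)
        err = mse(warped, cur_view)            # 变形视点图与当前编码视点图原始图的误差
        if err < min_err:                      # 误差更小: 更新最优变形偏移
            min_err = err
            offset_opt = offset_i
    return offset_opt, min_err

# 候选偏移既可以是整数值, 也可以是保留一位小数的小数值, 例如:
# offset_range = [round(v * 0.1, 1) for v in range(-20, 21)]
```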
在本发明优选实施例中, 该多视点视频的编码装置还包括第一模式预测 单元 85、 第二模式预测单元 86、 模式选择单元 87。 其中:
第一模式预测单元 85根据视差信息和前视点图计算前视点图的变形视点 图, 并将该变形视点图作为预测信号对当前编码视点图进行预测编码, 得到 第一预测结果。
第二模式预测单元 86采用当前编码视点图的其他预测编码模式对当前编 码视点图进行预测编码, 得到第二预测结果。
模式选择单元 87根据第一预测结果和第二预测结果采用最优决策从变形 视点图的预测模式和当前视点图的其他预测模式中选择一种最好的模式对当 前编码单元进行预测编码, 并将模式指示符号写进码流中。 其中最优决策包 括但不限于率失真决策。 模式指示符号用于标识采用最优决策从变形视点图 的预测模式和当前视点图的其他预测模式中选择出的最好的模式。 变形视点 图的预测模式是指将该变形视点图作为预测信号对当前编码视点图进行预测 编码的模式。
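模式选择单元 87 的率失真决策可以用常见的拉格朗日代价 J = D + λR 来示意, 以下 Python 草图中候选模式的表示方式为本文假设, 仅说明"在变形视点图预测模式与其他预测模式之间选代价最小者"这一思路:

```python
def select_mode(candidates, lam):
    """在变形视点图预测模式与其它预测模式之间做率失真决策的示意。

    candidates : 列表, 每项为 (mode_flag, distortion, rate),
                 mode_flag 即写入码流的模式指示符号
    lam        : 拉格朗日乘子 λ
    """
    best_flag, best_cost = None, float("inf")
    for mode_flag, distortion, rate in candidates:
        cost = distortion + lam * rate         # J = D + λR
        if cost < best_cost:
            best_cost, best_flag = cost, mode_flag
    return best_flag, best_cost
```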
在本发明优选实施例中, 该装置还包括第一占用掩码图获取单元 88、 第 一掩码标识获取单元 89、第一占有率计算单元 901和第二编码预测单元 902。 其中:
第一占用掩码图获取单元 88获取当前编码视点图的占用掩码图,该占用 掩码图用于描述前视点图的像素是否可以被变形到当前编码视点图中。
第一掩码标识获取单元 89根据占用掩码图得到当前编码视点图的每一个 编码单元的掩码标识。
第一占有率计算单元 901根据当前编码视点图的每一个编码单元的掩码 标识计算当前编码单元的占有率 P(CU)。 其计算公式可以如下:
P(CU) = ( Σ_{p∈CU} mask(p) ) / size(CU), 其中求和遍历当前编码单元内的所有像素 p, mask(p) 为像素的掩码标识, size(CU) 是该编码单元内的所有像素的个数。
第二编码预测单元 902 判断当前编码单元的占有率是否大于预设的阈 值, 如果是, 则将变形视点图作为当前编码单元的预测信号对当前视点图进 行预测编码, 如果否, 则采用最优决策从变形视点图的预测模式和当前视点 图的传统编码预测模式中选择一种最好的模式对当前编码单元进行预测编 码。
实施例九
图 9示出了本发明实施例提供的多视点视频的解码装置, 为了便于说明, 仅示出了与本发明实施例相关的部分。
该多视点视频编码的解码装置可以用于解码器, 可以是运行于解码器内 的软件单元、 硬件单元或者软硬件相结合的单元, 也可以作为独立的挂件集 成到解码器中或者运行于解码器的应用系统中, 其中:
码流解析单元 91解析当前视点图的码流, 得到最优变形偏移。
在本发明实施例中, 由于编码端在编码时, 将最优变形偏移值 offset_opt 编码进了码流中, 因此, 在解码当前视点图的码流时, 可以同时解码出该最优变形偏移。
第二视差信息计算单元 92 根据最优变形偏移、 视点的摄像机参数以及前视点图的深度图信息计算前视点图和当前解码视点图之间的视差信息。 其中前视点图可以指当前解码视点图空间域上的前解码视点图。 其具体过程如下:
disp(p) = (f / Z) * (X_add - X_base) + offset_i * (d / d_max) * a
第二变形视点图计算单元 93根据得到的视差信息和前视点图计算前视点 图的变形视点图。 其具体过程如下: 根据视差信息对前视点图的像素进行移 位得到前视点图的变形视点图。
第一解码预测单元 94将计算得到的变形视点图作为重构当前解码视点图 的参考图。 解码端通过解码得到的其它信息和参考图共同构建重建图, 从而 实现对当前解码视点的解码重构。
在本发明优选实施例中, 该多视点视频的解码装置还包括模式指示符号 解码单元 95和第二解码预测单元 96。
该模式指示符号解码单元 95 在解码当前解码单元时, 解码得到模式指示符号。
第二解码预测单元 96根据模式指示符号判断是采用变形视点图作为当前 视点图的预测信号对当前解码单元进行预测解码, 还是采用其他预测信号对 当前视点图的解码单元进行预测解码。
在本发明的优选实施例中, 该多视点视频的解码装置还包括第二占用掩码图获取单元 97、 第二掩码标识获取单元 98、 第二占有率计算单元 99 和第三解码预测单元 100。 其中:
第二占用掩码图获取单元 97获取当前解码视点图的占用掩码图,其中占 用掩码图用于描述前视点图的像素是否可以被变形到当前解码视点图中。
第二掩码标识获取单元 98根据占用掩码图得到当前解码视点图的每一个 解码单元的掩码标识。
第二占有率计算单元 99根据当前解码视点图的每一个解码单元的掩码标 识计算当前解码单元的占有率;
第三解码预测单元 100 判断当前解码单元的占有率是否大于预设的阈 值, 如果是, 将变形视点图作为当前解码单元的预测信号对当前视点图进行 预测解码, 否则解码当前解码单元的模式指示符号, 根据所述模式指示符号 来判断是采用变形视点图的预测模式对当前解码单元进行解码还是采用当前 解码视点图的其他预测模式对当前解码单元进行预测解码。
在本发明的整个文件中, 图代表一种解码单位, 可以是一帧 (frame), 也可以是条带 (slice) 等其它单位。 在解码过程中, 变形视点图可以是整体计算得到, 也可以是在解码每一个解码单元时计算得到变形视点图中对应当前解码单元的参考。
值得注意的是, 上述多视点视频的编码装置和多视点视频的解码装置所包括的各个单元只是按照功能逻辑进行划分的, 但并不局限于上述的划分, 只要能够实现相应的功能即可; 另外, 各功能单元的具体名称也只是为了便于相互区分, 并不用于限制本发明的保护范围。
本发明实施例的视频解码端可以是处理器(比如中央处理器 CPU )或专 用集成电路(ASIC )等。 本发明实施例的视频解码端具体可以是计算机、 手 机、 机顶盒、 电视机以及其他各种电子设备等。
本领域普通技术人员可以理解, 实现上述实施例方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成, 所述的程序可以存储于一计算机可读取存储介质中, 所述的存储介质, 如 ROM/RAM、 磁盘、 光盘等。
在本发明实施例中, 通过最小化当前编码视点图和前视点图的变形视点 图的误差, 得到最优变形偏移, 采用该最优变形偏移修正视差信息, 从而可 以提高变形视点图的精度, 进而减少编码其他视点残差的码率, 改善 3DV编 码方法的编码质量和编码压缩性能。 通过将最优变形偏移编码进码流中, 从 而解码端可以根据码流中的最优变形偏移计算得到视差信息, 依据该视差信 息得到前视点图的变形视点图, 该前视点图的变形视点图可以作为重构当前 视点图的参考图, 使得解码端可以解码并重建各视点。 通过根据当前编码单 元的占有率 P(cu) ,直接采用变形视点图作为当前编码单元的预测信号对当前 视点图进行预测, 或者将变形视点图和当前视点图的传统参考图共同作为预 测信号对当前视点图进行预测, 从而可以提高编码预测精度, 得到更好的率 失真性能。 在解码端, 通过解码码流中的最优变形偏移, 依据该最优变形偏 移得到视差信息, 依据视差信息得到前视点图的变形视点图, 将该变形视点 图作为重构当前解码视点图的参考图, 从而可以实现对当前解码视点的解码
重构。
以上所述仅为本发明的较佳实施例而已, 并不用以限制本发明, 凡在本 发明的精神和原则之内所作的任何修改、 等同替换和改进等, 均应包含在本 发明的保护范围之内。
Claims
1、 一种多视点视频的编码方法, 其特征在于, 所述方法包括:
最小化当前编码视点图和前视点图的变形视点图之间的误差, 获得最优变 形偏移;
根据所述最优变形偏移、 视点的摄像机参数以及所述前视点图的深度图信 息, 获取所述前视点图和所述当前编码视点图之间的视差信息;
根据所述视差信息和所述前视点图确定所述前视点图的变形视点图, 并将 所述变形视点图作为预测信号对所述当前编码视点图进行预测编码。
2、 如权利要求 1所述的多视点视频的编码方法, 其特征在于, 所述当前编 码视点图和前视点图的变形视点图的误差包括:
绝对差和 SAD、 或均方差 MSE、 或误差项平方和 SSE。
3、 如权利要求 1所述的多视点视频的编码方法, 其特征在于, 所述最优变 形偏移为所述前视点图的变形视频图和变形深度图共同的最优变形偏移, 或者 所述最优变形偏移包括前视点图的变形视频图和变形深度图各自的最优变形偏 移。
4、 如权利要求 1所述的多视点视频的编码方法, 其特征在于, 所述最小化 当前编码视点图和前视点图的变形视点图的误差, 获得最优变形偏移包括: 设置一偏移预设范围, 根据所述偏移预设范围内的所有值分别作为变形偏 移来修正前视点图和当前编码视点图之间的视差信息, 利用所述视差信息得到 所述前视点图的变形视点图, 计算当前编码视点图和前视点图的变形视点图的 误差, 从所述偏移预设范围中选择一个使当前编码视点图和前视点图的变形视 点图的误差最小的值, 将该值确定为最优变形偏移, 所述偏移预设范围内的所 有值包括整数值或者小数值, 所述小数值为保留小数点后一位的小数值。
5、 如权利要求 1所述的多视点视频的编码方法, 其特征在于, 所述最小化当前编码视点图和前视点图的变形视点图的误差, 获得最优变形偏移包括:
a、 设定变形偏移值 offset_i;
b、 根据变形偏移值 offset_i、 视点的摄像机参数以及前视点图的深度图信息确定前视点图和当前编码视点图之间的视差信息;
c、 根据视差信息和前视点图确定前视点图的变形视点图;
d、 计算变形视点图和当前编码视点图的原始图的误差;
e、判断变形视点图和当前编码视点图的原始图的误差是否小于目前最小的 误差值, 如果是, 执行步骤 f, 否则执行步骤 g;
f、 设置 offset_opt = offset_i, MSE(offset_opt) = MSE(offset_i);
g、 在预设的偏移预设范围内更改变形偏移值 offset_i, 并返回所述步骤 b, 直到所述偏移预设范围内的所有值均遍历完毕, 循环结束。
6、 如权利要求 1所述的方法, 其特征在于, 所述根据所述最优变形偏移、 视点的摄像机参数以及所述前视点图的深度图信息, 获取所述前视点图和所述 当前编码视点图之间的视差信息包括:
根据所述视点的摄像机参数以及前视点图的深度图信息计算得到初始视差 信号;
根据所述最优变形偏移以及所述前视点图的深度图信息得到偏移修正信号;
根据所述初始视差信号和所述偏移修正信号得到所述前视点图和所述当前 编码视点图之间的视差信息。
7、 如权利要求 1或 6所述的方法, 其特征在于, 采用下述公式获取前视点 图和当前编码视点图之间的视差信息:
disp(p) = (f / Z) * (X_add - X_base) + offset_i * (d / d_max) * a
其中 f 为当前视点图的摄像机焦距, Z 为物体距离视点的距离, X_add 为 3D 空间内当前视点图在视点直线上的位置, X_base 为 3D 空间内前视点图在视点直线上的位置, d 为当前像素的深度值, d_max 为当前视点图深度的最大深度值, a 为像素精度。
8、 如权利要求 1所述的多视点视频的编码方法, 其特征在于, 在所述最小 化当前编码视点图和前视点图的变形视点图的误差, 获得最优变形偏移之后, 所述方法还包括:
将最优变形偏移写进码流中。
9、 如权利要求 1所述的多视点视频的编码方法, 其特征在于, 在根据所述 最优变形偏移、 视点的摄像机参数以及所述前视点图的深度图信息, 获取所述 前视点图和所述当前编码视点图之间的视差信息之后, 还包括:
根据视差信息和前视点图计算前视点图的变形视点图, 并将该变形视点图 作为预测信号对当前编码视点图进行预测编码, 得到第一预测结果;
采用当前编码视点图的其他预测编码模式对当前编码视点图进行预测编 码, 得到第二预测结果;
根据第一预测结果和第二预测结果采用最优决策从变形视点图的预测模式 和当前视点图的其他预测模式中选择一种最好的模式对当前编码单元进行预测 编码, 并将模式指示符号写进码流中, 所述模式指示符号用于标识采用最优决 策从变形视点图的预测模式和当前视点图的其他预测模式中选择出的最好的模 式。
10、 如权利要求 1 所述的多视点视频的编码方法, 其特征在于, 所述利用视差信息和前视点图计算前视点图的变形视点图, 并将所述变形视点图作为预测信号对当前编码视点图进行预测包括:
利用视差信息和前视点图计算前视点图的变形视点图;
获取当前编码视点图的占用掩码图, 所述占用掩码图用于描述所述前视点 图的像素是否可以被变形到当前编码视点图中;
根据所述占用掩码图得到所述当前编码视点图的每一个编码单元的掩码标 识;
根据所述当前编码视点图的每一个编码单元的掩码标识计算当前编码单元 的占有率;
判断当前编码单元的占有率是否大于预设的阈值, 如果是, 则将所述变形 视点图作为所述当前编码单元的预测信号对所述当前编码视点图进行预测编 码, 如果否, 则采用最优决策从变形视点图的预测模式和当前视点图的其他预 测模式中选择一种最好的模式对当前编码单元进行预测编码。
11、 如权利要求 10所述的多视点视频的编码方法, 其特征在于, 当当前编 码单元的占有率小于或者等于预设的阈值时, 所述方法还包括:
采用模式指示符号来标识采用最优决策从变形视点图的预测模式和当前视 点图的其他预测模式中选择出的最好的模式, 并将所述模式指示符号写进码流 中。
12、如权利要求 9至 11任一项所述的多视点视频的编码方法,其特征在于, 所述最优决策为率失真决策。
13、 如权利要求 10所述的多视点视频的编码方法, 其特征在于, 采用下述公式计算当前编码单元的占有率 P(CU):
P(CU) = ( Σ_{p∈CU} mask(p) ) / size(CU)
其中 mask(p) 为当前编码单元中每一个像素的掩码标识, 其值为 0 或者 1, size(CU) 是该编码单元内的所有像素的个数。
14、 一种多视点视频的解码方法, 其特征在于, 所述方法包括:
解析当前视点图的码流, 得到最优变形偏移;
根据所述最优变形偏移、 视点的摄像机参数以及前视点图的深度图信息确 定前视点图和当前解码视点图之间的视差信息;
根据所述视差信息和前视点图计算所述前视点图的变形视点图;
将所述变形视点图作为重构所述当前解码视点图的参考图。
15、 如权利要求 14所述的多视点视频的解码方法, 其特征在于, 所述最优 变形偏移为前视点图的变形视频图和变形深度图共同的最优变形偏移, 或者所 述最优变形偏移包括前视点图的变形视频图和变形深度图各自的最优变形偏 移。
16、 如权利要求 14所述的多视点视频的解码方法, 其特征在于, 所述根据所述最优变形偏移、 视点的摄像机参数以及所述前视点图的深度图信息, 获取所述前视点图和所述当前解码视点图之间的视差信息包括:
根据所述视点的摄像机参数以及前视点图的深度图信息计算得到初始视差 信号;
根据所述最优变形偏移以及所述前视点图的深度图信息得到偏移修正信号;
根据所述初始视差信号和所述偏移修正信号得到所述前视点图和所述当前解码视点图之间的视差信息。
17、 如权利要求 14所述的多视点视频的解码方法, 其特征在于, 在所述根据最优变形偏移、 视点的摄像机参数以及前视点图的深度图信息确定前视点图和当前解码视点图之间的视差信息时, 采用下述公式计算前视点图和当前解码视点图之间的视差信息:
disp(p) = (f / Z) * (X_add - X_base) + offset_i * (d / d_max) * a
其中 f 为当前视点图的摄像机焦距, Z 为物体距离视点的距离, X_add 为 3D 空间内当前视点图在视点直线上的位置, X_base 为 3D 空间内前视点图在视点直线上的位置, d 为当前像素的深度值, d_max 为当前视点图深度的最大深度值, a 为像素精度。
18、 如权利要求 14所述的多视点视频的解码方法, 其特征在于, 所述利用 视差信息和前视点图计算前视点图的变形视点图, 并将所述变形视点图作为预 测信号对当前解码视点图进行预测包括:
在解码当前解码单元的码流时, 解码得到模式指示符号, 根据所述模式指 示符号判断是采用变形视点图作为当前视点图的预测信号对当前解码单元进行 预测解码, 还是采用其他预测信号对当前视点图的解码单元进行预测解码。
19、 如权利要求 14所述的多视点视频的解码方法, 其特征在于, 所述利 用视差信息和前视点图计算前视点图的变形视点图, 并将所述变形视点图作为 预测信号对当前解码视点图进行预测包括:
利用视差信息和前视点图计算前视点图的变形视点图;
获取当前解码视点图的占用掩码图, 所述占用掩码图用于描述前视点图的 像素是否可以被变形到当前解码视点图中;
根据所述占用掩码图得到所述当前解码视点图的每一个解码单元的掩码标 识;
根据所述当前解码视点图的每一个解码单元的掩码标识计算当前解码单元的占有率;
判断当前解码单元的占有率是否大于预设的阈值, 如果是, 将变形视点图作为当前解码单元的预测信号对当前解码视点图进行预测解码, 否则解码当前解码单元的模式指示符号, 根据模式指示符号判断是采用变形视点图的预测模式对当前解码单元进行预测解码, 还是采用当前解码视点图的其他预测模式对当前解码单元进行预测解码。
20、 一种多视点视频的编码装置, 其特征在于, 所述装置包括:
最优变形偏移获取单元, 用于最小化当前编码视点图和前视点图的变形视 点图的误差, 获得最优变形偏移;
第一视差信息计算单元, 用于根据所述最优变形偏移、 视点的摄像机参数以及前视点图的深度图信息获取所述前视点图和当前编码视点图之间的视差信息;
第一变形视点图计算单元, 用于根据所述视差信息和前视点图确定所述前视点图的变形视点图;
第一编码预测单元, 用于将所述变形视点图作为预测信号对所述当前编码 视点图进行预测编码。
21、 如权利要求 20所述的多视点视频的编码装置, 其特征在于, 所述最优 变形偏移获取单元包括:
初始设置模块, 用于设定变形偏移值 offset_i;
视差计算模块, 用于根据所述变形偏移值 offset_i、 视点的摄像机参数以及前视点图的深度图信息确定前视点图和当前编码视点图之间的视差信息;
变形视点图计算模块, 用于根据所述视差信息和所述前视点图确定所述前 视点图的变形视点图;
误差计算模块, 用于计算变形视点图和当前编码视点图的原始图的误差;
误差判断模块, 用于判断变形视点图和当前编码视点图的原始图的误差是否小于目前最小的误差值;
偏移值更改模块, 用于在变形视点图和当前编码视点图的原始图的误差小于目前最小的误差值时, 设置 offset_opt = offset_i、 MSE(offset_opt) = MSE(offset_i), 并在预设的偏移预设范围内更改变形偏移值 offset_i, 否则直接在预设的偏移预设范围内更改变形偏移值 offset_i, 并触发所述视差计算模块利用更改后的变形偏移值重新确定视差信息, 直到所述偏移预设范围内的所有值均遍历完毕。
22、 如权利要求 20所述的多视点视频的编码装置, 其特征在于, 所述装置 还包括:
第一模式预测单元, 用于根据视差信息和前视点图计算前视点图的变形视 点图, 并将该变形视点图作为预测信号对当前编码视点图进行预测编码, 得到 第一预测结果;
第二模式预测单元, 用于采用当前编码视点图的其他预测编码模式对当前 编码视点图进行预测编码, 得到第二预测结果;
模式选择单元, 用于根据第一预测结果和第二预测结果采用最优决策从变 形视点图的预测模式和当前视点图的其他预测模式中选择一种最好的模式对当 前编码单元进行预测编码, 并将模式指示符号写进码流中, 所述模式指示符号 用于标识采用最优决策从变形视点图的预测模式和当前视点图的其他预测模式 中选择出的最好的模式。
23、 如权利要求 20所述的多视点视频的编码装置, 其特征在于, 所述装置 还包括:
第一占用掩码图获取单元, 用于获取当前编码视点图的占用掩码图, 所述 占用掩码图用于描述前视点图的像素是否可以被变形到当前编码视点图中; 第一掩码标识获取单元, 用于根据所述占用掩码图得到所述当前编码视点 图的每一个编码单元的掩码标识;
第一占有率计算单元, 用于根据当前编码视点图的每一个编码单元的掩码 标识计算当前编码单元的占有率;
第二编码预测单元, 用于判断当前编码单元的占有率是否大于预设的阈值, 如果是, 则将所述变形视点图作为所述当前编码单元的预测信号对所述当前视 点图进行预测编码, 如果否, 则采用最优决策从变形视点图的预测模式和当前 视点图的其他编码预测模式中选择一种最好的模式对当前编码单元进行预测编 码。
24、 一种多视点视频的解码装置, 其特征在于, 包括:
码流解析单元, 用于解析当前视点图的码流, 得到最优变形偏移; 第二视差信息计算单元, 用于根据所述最优变形偏移、 视点的摄像机参数 以及前视点图的深度图信息确定前视点图和当前解码视点图之间的视差信息; 第二变形视点图计算单元, 用于根据所述视差信息和前视点图计算所述前 视点图的变形视点图;
第一解码预测单元, 用于将所述变形视点图作为重构所述当前解码视点图 的参考图。
25、 如权利要求 24所述的多视点视频的解码装置, 其特征在于, 还包括:
模式指示符号解码单元, 用于在解码当前解码单元时, 解码得到模式指示符号;
第二解码预测单元, 用于根据所述模式指示符号判断是采用变形视点图作 为当前视点图的预测信号对当前解码单元进行预测解码, 还是采用其他预测信 号对当前视点图的解码单元进行预测解码。
26、 如权利要求 24所述的多视点视频的解码装置, 其特征在于, 还包括:
第二占用掩码图获取单元, 用于获取当前解码视点图的占用掩码图, 所述占用掩码图用于描述前视点图的像素是否可以被变形到当前解码视点图中;
第二掩码标识获取单元, 用于根据所述占用掩码图得到所述当前解码视点 图的每一个解码单元的掩码标识;
第二占有率计算单元, 用于根据所述当前解码视点图的每一个解码单元的 掩码标识计算当前解码单元的占有率;
第三解码预测单元, 用于判断当前解码单元的占有率是否大于预设的阈值, 如果是, 将变形视点图作为当前解码单元的预测信号对当前视点图进行预测解 码, 否则解码当前解码单元的模式指示符号, 根据所述模式指示符号来判断是 采用变形视点图的预测模式对当前解码单元进行解码还是采用当前解码视点图 的其他预测模式对当前解码单元进行预测解码。
27、 一种编码器, 其特征在于, 所述编码器包括权利要求 20至 23任一项 所述的多视点视频的编码装置。
28、 一种解码器, 其特征在于, 所述解码器包括权利要求 24至 26任一项 所述的多视点视频的解码装置。