WO2011102131A1 - Image encoding device, image encoding method, program and integrated circuit - Google Patents

Image encoding device, image encoding method, program and integrated circuit

Info

Publication number
WO2011102131A1
Authority
WO
WIPO (PCT)
Prior art keywords
motion vector
image
unit
correction
encoding
Prior art date
Application number
PCT/JP2011/000875
Other languages
French (fr)
Japanese (ja)
Inventor
耕治 有村
重里 達郎
津田 賢治郎
一仁 木村
Original Assignee
Panasonic Corporation (パナソニック株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Panasonic Corporation
Publication of WO2011102131A1

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 - Motion estimation or motion compensation
    • H04N19/513 - Processing of motion vectors
    • H04N19/517 - Processing of motion vectors by encoding
    • H04N19/52 - Processing of motion vectors by encoding by predictive encoding
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 - Motion estimation or motion compensation
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding

Definitions

  • The present invention relates to a high-efficiency image encoding device, and more particularly to a method for high-efficiency encoding of stereoscopic video data captured from a plurality of viewpoints using motion-compensated prediction.
  • Stereoscopic video display devices have been developed that produce stereoscopic viewing using the parallax between images observed with both eyes.
  • As a method for encoding such stereoscopic video, it is known to exploit the high correlation between the left-eye video and the right-eye video. Specifically, when one of the two images is encoded, a motion vector is obtained using the other image as a reference image, and motion compensation is performed. Encoding methods that realize highly efficient compression in this way have been proposed.
  • Similarly, H.264 MVC (Multiview Video Coding) has been standardized as an image encoding scheme that realizes multi-view encoding using motion compensation between viewpoints.
  • FIG. 6 shows a reference relationship between frames (pictures) in MVC.
  • When ordinary single-viewpoint video is encoded, a motion vector is detected by reference in the time-axis direction, that is, using another frame with a different imaging time as the reference image, and motion-compensated prediction is performed.
  • In MVC, in addition to temporal reference, reference between the viewpoints (V0 to V4) is possible: a motion vector is detected using a frame of a different viewpoint captured at the same time as the reference image, and motion-compensated prediction is performed.
  • As a conventional solution, Patent Document 1 discloses a technique that improves coding efficiency by detecting an accurate motion vector for use in encoding, using the motion vectors of the blocks around the current block and the motion vector at the same position in a past frame.
  • FIG. 7 shows a block diagram of the conventional apparatus of Patent Document 1. The conventional image coding apparatus 10 mainly includes a block matching unit 1, a parallax compensation vector detection unit 2, memories 3 and 6, a correction vector detection unit 4, and a variable delay unit 5.
  • Of two images captured by a pair of synchronized cameras (that is, images captured from different viewpoints), one is input to the image encoding apparatus 10 as the encoding target image and the other as the reference image.
  • The block matching unit 1 performs block matching between the reference image and each block (encoding target block) constituting the encoding target image.
  • The block matching result output from the block matching unit 1 is input to the parallax compensation vector detection unit 2.
  • The parallax compensation vector detection unit 2 detects the motion vector of the encoding target block based on the block matching result. The motion vector detected in this way is stored in the memory 3.
  • The correction vector detection unit 4 reads from the memory 3 the motion vectors of the blocks surrounding the encoding target block, and from the memory 6 the motion vectors of the co-located block in a past frame and its surrounding blocks. It then, for example, averages these motion vectors to detect an accurate motion vector for the encoding target block, as sketched after FIG. 8 below.
  • FIG. 8 is a schematic diagram showing a motion vector for each block constituting the encoding target image.
  • In FIG. 8, the motion vectors of the encoding target image, detected using the image of the other viewpoint as the reference image, are shown for each block.
  • To correct the motion vector of a given block, the correction vector detection unit 4 uses the motion vectors of the eight surrounding blocks. Coding efficiency is improved by performing motion-compensated predictive encoding using the motion vector corrected in this way.
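  • This neighbor-averaging correction can be illustrated by the following minimal Python sketch. It is an illustration only, assuming the per-block vectors are held in a 2-D grid of (x, y) tuples; the function and variable names are hypothetical, not taken from Patent Document 1.

      # Minimal sketch of the neighbor-averaging correction (hypothetical names).
      # mv_field is a 2-D grid of per-block motion vectors as (x, y) tuples.
      def corrected_vector(mv_field, row, col):
          """Average the motion vectors of the blocks around (row, col)."""
          acc_x = acc_y = n = 0
          for dr in (-1, 0, 1):
              for dc in (-1, 0, 1):
                  if dr == 0 and dc == 0:
                      continue  # skip the block being corrected
                  r, c = row + dr, col + dc
                  if 0 <= r < len(mv_field) and 0 <= c < len(mv_field[0]):
                      acc_x += mv_field[r][c][0]
                      acc_y += mv_field[r][c][1]
                      n += 1
          return (acc_x / n, acc_y / n) if n else mv_field[row][col]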
  • The videos of the viewpoints constituting a stereoscopic video are often captured by combining several standalone cameras, or by fixing multiple cameras into an integrated unit. For this reason, when one camera is taken as the reference, the other camera often exhibits a tilt (rotation) distinct from parallax, a vertical or horizontal shift, or a different size of the imaging target (imaging magnification).
  • In such cases, the motion-compensation residual signal (prediction error) becomes large, and high-efficiency encoding cannot be achieved.
  • FIGS. 9A to 9C show an example in which a first viewpoint image is used as a reference image, a second viewpoint image is used as an encoding target image, and a block including a part of a subject (star) is encoded.
  • a block in the first viewpoint image is a reference block obtained by motion vector detection, and a block in the second viewpoint image is an encoding target block.
  • FIG. 9A shows the case where there is no shift other than parallax between the images of the two viewpoints; the residual signal (prediction error) between the encoding target block and the reference block is small, so encoding can be performed with high efficiency.
  • However, if there is a tilt shift between the two viewpoint images as shown in FIG. 9B, or a size difference between them as shown in FIG. 9C, the residual signal (prediction error) becomes large, and high-efficiency encoding is not possible.
  • Similarly, FIG. 10A shows the case where there is no shift other than parallax between the images of the two viewpoints; the residual signal (prediction error) between the encoding target block and the reference block is small, so encoding can be performed with high efficiency.
  • As shown in FIG. 10B, however, when there is a vertical shift between the images of the two viewpoints, the reference block can fall at a position extending outside the image. As a result, the residual signal (prediction error) becomes large and high-efficiency encoding is not possible.
  • The present invention has been made in view of the above problems, and has as its object to provide an image encoding apparatus that easily and appropriately corrects shifts between two images caused by factors other than parallax.
  • The image encoding device according to one aspect of the present invention encodes stereoscopic video composed of at least two viewpoint videos. Specifically, it comprises: an acquisition unit that acquires the stereoscopic video; a correction unit that executes a correction process for correcting a shift in the size or position of the subject shown in the stereoscopic video acquired by the acquisition unit; a motion vector detection unit that detects a motion vector between the two viewpoint videos constituting the stereoscopic video corrected by the correction unit; and an encoding unit that compression-encodes the stereoscopic video corrected by the correction unit based on the motion vector detected by the motion vector detection unit. The correction unit executes the current correction process based on a motion vector detected by the motion vector detection unit before that correction process.
  • The correction unit may correct, based on the motion vector, at least one of a shift due to rotation of the displayed subject between the two viewpoints, a shift due to enlargement, and a shift due to parallel movement.
  • Further, the correction unit may detect, based on the direction of the motion vector, at least one of a shift due to rotation, a shift due to enlargement, and a shift due to parallel movement, and correct the shift indicated by the detection result.
  • Further, the correction unit may detect a shift due to parallel movement based on the vertical component of the motion vector.
  • Note that a shift due to parallel movement need not be detected from a plurality of per-block motion vectors; it can be detected from a single motion vector.
  • Further, the motion vector detection unit may detect a motion vector for each region smaller than the entire region of the stereoscopic video corrected by the correction unit.
  • In this case, the correction unit may detect the shift based on the tendency exhibited by the plurality of motion vectors detected for the respective regions.
  • For example, when the plurality of motion vectors tend to converge toward a predetermined position in the stereoscopic video, or tend to diverge from that position, the correction unit may detect a shift due to enlargement.
  • When the plurality of motion vectors tend to trace a circle in the stereoscopic video, the correction unit may detect a shift due to rotation.
  • Further, the encoding unit may start outputting the compression-encoded stereoscopic video when a predetermined period has elapsed after the start of encoding is instructed. Since images containing a shift are then not output, stereoscopic viewing becomes easier.
  • Further, the motion vector detection unit may start detecting motion vectors between the viewpoint videos of the stereoscopic video acquired by the acquisition unit even before the start of encoding is instructed. The correction unit may then execute the correction process on the first stereoscopic image acquired by the acquisition unit immediately after the start of encoding is instructed, using the latest motion vector detected by the motion vector detection unit.
  • In this way, the correction process can be executed even on the first image to be encoded, so that only images with substantially no shift are encoded.
  • For example, the acquisition unit may include a first imaging unit that images the subject from a first viewpoint and a second imaging unit that images the subject from a second viewpoint.
  • The motion vector detection unit may detect a motion vector for each block of the encoding target image, where one of the images captured at a first time at the first and second viewpoints is the encoding target image and the other is the reference image.
  • Based on the tendency of the plurality of motion vectors corresponding to the blocks of the encoding target image, the correction unit may then correct the images captured at the first and second viewpoints at a second time later than the first time.
  • The image encoding method according to one aspect of the present invention encodes stereoscopic video composed of at least two viewpoint videos. Specifically, it comprises: an acquisition step of acquiring the stereoscopic video; a correction step of executing a correction process for correcting a shift in the size or position of the subject shown in the stereoscopic video acquired in the acquisition step; a motion vector detection step of detecting a motion vector between the two viewpoint videos constituting the stereoscopic video corrected in the correction step; and an encoding step of compression-encoding the stereoscopic video corrected in the correction step based on the motion vector detected in the motion vector detection step. In the correction step, the correction process is executed based on a motion vector detected in the motion vector detection step before the current correction process.
  • The program according to one aspect of the present invention causes a computer to encode stereoscopic video composed of at least two viewpoint videos. Specifically, it causes the computer to execute: an acquisition step of acquiring the stereoscopic video; a correction step of executing a correction process for correcting a shift in the size or position of the subject shown in the stereoscopic video acquired in the acquisition step; a motion vector detection step of detecting a motion vector between the two viewpoint videos constituting the stereoscopic video corrected in the correction step; and an encoding step of compression-encoding the stereoscopic video corrected in the correction step based on the motion vector detected in the motion vector detection step. In the correction step, the correction process is executed based on a motion vector detected in the motion vector detection step before the current correction process.
  • The integrated circuit according to one aspect of the present invention encodes stereoscopic video composed of at least two viewpoint videos. Specifically, it comprises: an acquisition unit that acquires the stereoscopic video; a correction unit that executes a correction process for correcting a shift in the size or position of the subject shown in the acquired stereoscopic video; a motion vector detection unit that detects a motion vector between the two viewpoint videos constituting the stereoscopic video corrected by the correction unit; and an encoding unit that compression-encodes the stereoscopic video corrected by the correction unit based on the motion vector detected by the motion vector detection unit. The correction unit executes the correction process based on a motion vector detected by the motion vector detection unit before the current correction process.
  • According to the present invention, a shift caused by factors other than parallax can be corrected from the result of motion vector detection by inter-view reference.
  • As a result, highly efficient image coding by inter-view reference becomes possible.
  • Moreover, the corrected video is easy to view stereoscopically and is unlikely to cause eye fatigue.
  • Furthermore, since no new component such as a dedicated image-shift detection unit needs to be provided, an increase in circuit scale can be suppressed and power consumption reduced.
  • FIG. 1 is a block diagram of an image coding apparatus according to Embodiment 1 of the present invention.
  • FIG. 2A is a diagram illustrating a state in which images captured at the first and second viewpoints are arranged in the order of imaging.
  • FIG. 2B is a diagram illustrating a state in which images captured at the first and second viewpoints are arranged in the encoding order.
  • FIG. 3A is a flowchart showing main processing of the image coding apparatus according to Embodiment 1.
  • FIG. 3B is a flowchart illustrating an encoding process of the image encoding device according to Embodiment 1.
  • FIG. 3C is a flowchart showing a correction process of the image coding apparatus according to Embodiment 1.
  • FIG. 4A is a diagram illustrating an example of images at the first and second viewpoints when there is no imaging deviation.
  • FIG. 4B is a diagram illustrating an example of the first and second viewpoint images when the first viewpoint image is rotated with respect to the second viewpoint image.
  • FIG. 4C is a diagram illustrating an example of first and second viewpoint images in a case where the first viewpoint image is reduced with respect to the second viewpoint image.
  • FIG. 4D is a diagram illustrating an example of the first and second viewpoint images when the first viewpoint image is translated with respect to the second viewpoint image.
  • FIG. 5A is a block diagram of an image coding apparatus according to Embodiment 2 of the present invention.
  • FIG. 5B is a flowchart showing preprocessing of the image coding apparatus according to Embodiment 2 of the present invention.
  • FIG. 6 is a diagram illustrating the reference relationships of H.264 MVC (Multiview Video Coding).
  • FIG. 7 is a block diagram of a conventional image encoding device.
  • FIG. 8 is a diagram showing the motion vectors of the blocks constituting the encoding target image.
  • FIG. 9A is a diagram illustrating encoding efficiency when there is no deviation between the encoding target image and the reference image.
  • FIG. 9B is a diagram illustrating encoding efficiency when the encoding target image is rotated with respect to the reference image.
  • FIG. 9C is a diagram illustrating encoding efficiency when the encoding target image is enlarged with respect to the reference image.
  • FIG. 10A is a diagram illustrating encoding efficiency when there is no deviation between the encoding target image and the reference image.
  • FIG. 10B is a diagram illustrating encoding efficiency when the encoding target image is translated with respect to the reference image.
  • FIG. 1 is a block diagram of an image coding apparatus 100 according to Embodiment 1 of the present invention.
  • The image encoding apparatus 100 is an encoder compliant with the H.264 standard.
  • The correction unit 102 includes a correction value calculation unit 111 and an image correction unit 112.
  • The encoding unit 106 includes an intra-frame encoding unit 114 and an inter-frame encoding unit 115.
  • The left-eye imaging unit 101a outputs an image (video or still image) obtained by imaging the subject from a first viewpoint to the correction unit 102.
  • The right-eye imaging unit 101b outputs an image obtained by imaging the subject from a second viewpoint different from the first viewpoint to the correction unit 102.
  • The image output from the left-eye imaging unit 101a and the image output from the right-eye imaging unit 101b therefore have parallax.
  • Together, the images captured by the left-eye imaging unit 101a and the right-eye imaging unit 101b constitute a stereoscopic video composed of two viewpoint videos.
  • FIG. 2A is a schematic diagram illustrating an image (video) captured by the left-eye imaging unit 101a and the right-eye imaging unit 101b.
  • The left-eye imaging unit 101a and the right-eye imaging unit 101b operate in synchronization and, as shown in FIG. 2A, each outputs one frame (picture) at each common time (t0, t1, ..., t6).
  • In other words, the left-eye imaging unit 101a and the right-eye imaging unit 101b constitute an acquisition unit that acquires images.
  • Note that the left-eye imaging unit 101a and the right-eye imaging unit 101b are not essential components and may be omitted; that is, an acquisition unit may acquire and process images captured by an external imaging device.
  • For example, the acquisition unit (not shown) may acquire the stereoscopic video from a broadcast wave.
  • The format of the stereoscopic video acquired from the broadcast wave is not particularly limited.
  • For example, a side-by-side format, in which the left half of one picture is the first-viewpoint image and the right half is the second-viewpoint image, or a top-and-bottom format, in which the upper half is the first-viewpoint image and the lower half is the second-viewpoint image, may be used. With these formats, transmission and reception can be performed in the same manner as conventional planar (2D) video.
  • Alternatively, the first-viewpoint image and the second-viewpoint image may be transmitted and received alternately, picture by picture. In this case, high-definition stereoscopic video can be transmitted and received, although the frame rate is twice the conventional rate. A sketch of unpacking these formats follows.
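  • As a concrete illustration of these transport formats, the following sketch splits a packed frame into its two viewpoint images. It is a simplification (it ignores the half-resolution upscaling a real decoder would perform), and the function names are assumptions of this illustration.

      # Unpack frame-packed stereoscopic formats (illustrative only).
      def split_side_by_side(frame):
          """frame: 2-D list of pixels; left half = view 1, right half = view 2."""
          w = len(frame[0]) // 2
          return [row[:w] for row in frame], [row[w:] for row in frame]

      def split_top_and_bottom(frame):
          """Upper half = view 1, lower half = view 2."""
          h = len(frame) // 2
          return frame[:h], frame[h:]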
  • The correction unit 102 performs a correction process for correcting an imaging shift on at least one of the images input from the left-eye imaging unit 101a and the right-eye imaging unit 101b. More specifically, the correction unit 102 corrects at least one of a shift due to rotation of the displayed subject between the two viewpoints, a shift due to enlargement, and a shift due to parallel movement, based on the magnitude and/or direction of the motion vectors. The correction unit 102 then outputs the corrected image to the multiplexing unit 103.
  • Here, the imaging shift can be defined, for example, as a shift in the size or position at which the subject is displayed. Specifically, cases are considered in which the first-viewpoint image is enlarged or reduced (a size shift), or rotated or translated (a position shift), with respect to the second-viewpoint image captured at the same time.
  • Alternatively, the imaging shift can be defined as a shift (vertical shift, size shift, tilt shift, etc.) arising from a cause other than parallax.
  • A "shift arising from a cause other than parallax" refers, for example, to a shift caused by an installation error of the left-eye imaging unit 101a and the right-eye imaging unit 101b, a mismatch in imaging magnification, or the like.
  • The correction value calculation unit 111 determines the type of imaging shift based on the tendency of the directions of the plurality of motion vectors, and calculates the magnitude of the imaging shift based on the tendency of their magnitudes.
  • For example, the correction value calculation unit 111 detects a shift due to parallel movement based on the vertical components of the motion vectors. Specifically, if the motion vectors of the blocks point in substantially the same direction (for example, upward or downward) and have substantially the same magnitude, it can be determined that the shift is due to parallel movement.
  • The correction value calculation unit 111 may also detect a shift due to rotation or a shift due to enlargement based on the tendency of the directions of the plurality of motion vectors. Specifically, when the motion vectors tend to converge toward a predetermined position in the stereoscopic video, or tend to diverge from that position, it can be determined that the shift is due to enlargement; when the motion vectors tend to trace a circle, it can be determined that the shift is due to rotation. These decision rules are sketched in code below.
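  • The following Python sketch illustrates these decision rules. The thresholds and the center-based dot/cross-product test are assumptions of this illustration, not values from the specification.

      # Classify the dominant imaging shift from a per-block motion vector field.
      def classify_shift(mv_field, eps=0.5, agree=0.8):
          rows, cols = len(mv_field), len(mv_field[0])
          cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
          flat = [v for row in mv_field for v in row]
          n = len(flat)
          mean_x = sum(v[0] for v in flat) / n
          mean_y = sum(v[1] for v in flat) / n
          # Parallel movement: vectors share direction and magnitude (FIG. 4D).
          same = sum(1 for v in flat
                     if abs(v[0] - mean_x) <= eps and abs(v[1] - mean_y) <= eps)
          if same >= agree * n:
              if abs(mean_x) <= eps and abs(mean_y) <= eps:
                  return None          # vectors ~ (0, 0): no imaging shift (FIG. 4A)
              return "translation"
          radial = tangential = 0.0
          for r in range(rows):
              for c in range(cols):
                  px, py = c - cx, r - cy          # block position w.r.t. centre
                  vx, vy = mv_field[r][c]
                  radial += px * vx + py * vy      # dot product: converge/diverge
                  tangential += px * vy - py * vx  # cross product: circular motion
          if abs(radial) > abs(tangential):
              return "enlargement/reduction"       # vectors converge or diverge
          return "rotation"                        # vectors trace a circle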
  • The image correction unit 112 executes the correction process on at least one of the images captured at the same time from the first and second viewpoints, according to the type and magnitude of the imaging shift calculated by the correction value calculation unit 111. Details of the operation of the correction unit 102 are described later.
  • The multiplexing unit 103 reorders the images acquired from the correction unit 102 into encoding order and outputs them to the switching unit 104.
  • FIG. 2B is a diagram illustrating the order of images (coding order) after the images in FIG. 2A are input to the multiplexing unit 103 and multiplexed.
  • “I”, “P”, and “B” represent the encoding type of each frame.
  • I is an intra-frame prediction frame (I picture)
  • P is a unidirectional inter-frame prediction frame (P picture)
  • B is a bi-directional inter-frame prediction frame (B picture).
  • An “arrow” indicates a reference destination when performing inter-viewpoint reference.
  • Each block constituting a first-viewpoint image is encoded using only first-viewpoint (same viewpoint) images captured at different times as reference images.
  • For example, each block of the frame F2 is encoded using only the frame F0 as a reference image.
  • Each block of the frame F4 is encoded using the frame F0 or the frame F2 as a reference image.
  • Each block of the frame F6 is encoded using the frame F2 or the frame F4 as a reference image.
  • Each block constituting a second-viewpoint image is encoded using as a reference image either a first-viewpoint (other viewpoint) image captured at the same time or a second-viewpoint (same viewpoint) image captured at a different time.
  • For example, each block of the frame F1 is encoded using only the frame F0 as a reference image.
  • Each block of the frame F3 is encoded using the frame F1 or the frame F2 as a reference image.
  • Each block of the frame F5 is encoded using the frame F1, the frame F3, or the frame F4 as a reference image.
  • Each block of the frame F7 is encoded using the frame F3, the frame F5, or the frame F6 as a reference image.
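  • Written out as a lookup table, the reference structure of FIG. 2B described above is:

      # Allowed reference frames per encoding target frame, from FIG. 2B.
      REFS = {
          "F2": ["F0"], "F4": ["F0", "F2"], "F6": ["F2", "F4"],   # 1st viewpoint
          "F1": ["F0"],                                           # 2nd viewpoint
          "F3": ["F1", "F2"],
          "F5": ["F1", "F3", "F4"],
          "F7": ["F3", "F5", "F6"],
      }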
  • FIG. 3A is a flowchart showing a processing procedure of the main processing. With reference to FIG. 3A, the flow of the operation of the image coding apparatus 100 will be briefly described.
  • First, the left-eye imaging unit 101a acquires the first-viewpoint image and the right-eye imaging unit 101b acquires the second-viewpoint image (S201). Note that an image encoding device that does not include the left-eye imaging unit 101a and the right-eye imaging unit 101b may acquire the images from an external device.
  • Next, the correction unit 102 performs correction processing on the images acquired from the left-eye imaging unit 101a and the right-eye imaging unit 101b. A specific processing procedure of the correction process is described later with reference to FIG. 3C.
  • Next, the switching units 104 and 108, the motion vector detection unit 105, the encoding unit 106, the reference image memory 107, the variable-length encoding unit 109, and the encoding mode control unit 110 encode the image that has been corrected by the correction unit 102 and multiplexed by the multiplexing unit 103 (S203). A specific processing procedure of the encoding process is described later with reference to FIG. 3B.
  • FIG. 3B is a flowchart showing the procedure of the encoding process. With reference to FIG. 3B, the operation of the components after the switching unit 104 will be described in detail.
  • First, the switching unit 104 obtains from the encoding mode control unit 110 the encoding type of the encoding target image acquired from the multiplexing unit 103.
  • For an intra-frame prediction frame, the switching unit 104 outputs the encoding target image to the intra-frame encoding unit 114 of the encoding unit 106.
  • For an inter-frame prediction frame, the switching unit 104 outputs the encoding target image to the motion vector detection unit 105 at the same time as to the intra-frame encoding unit 114.
  • That is, the encoding target image is always intra-frame encoded by the intra-frame encoding unit 114 (S301). Furthermore, when the encoding mode control unit 110 determines that the frame is an inter-frame prediction frame (Yes in S302), the motion vector detection unit 105 additionally detects a motion vector (S303).
  • The intra-frame encoding unit 114 performs intra-frame encoding on the input encoding target image (S301). Specifically, the intra-frame encoding unit 114 performs intra-frame prediction for each block (encoding target block) constituting the encoding target image to generate a prediction block. Next, it calculates a prediction error (residual signal) by subtracting the prediction block from the encoding target block, and then calculates quantized coefficients by orthogonally transforming and quantizing the prediction error. The quantized coefficients and encoding information are output to the switching unit 108. Further, the intra-frame encoding unit 114 inversely quantizes and inversely orthogonally transforms the quantized coefficients and adds the prediction block to create a locally decoded image, which is stored in the reference image memory 107 as a reference image for subsequent inter-frame prediction frames.
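  • The per-block intra path described above can be sketched as follows. This is a deliberately simplified illustration: it uses a flat DC prediction and a uniform quantizer and omits the orthogonal transform, so it is not the actual H.264 toolchain.

      def intra_encode_block(block, qstep=8):
          """block: 2-D list of pixels. Returns (quantized residual, local decode)."""
          h, w = len(block), len(block[0])
          dc = sum(sum(row) for row in block) / (h * w)
          pred = [[dc] * w for _ in range(h)]                     # prediction block
          resid = [[block[r][c] - pred[r][c] for c in range(w)] for r in range(h)]
          q = [[round(x / qstep) for x in row] for row in resid]  # quantized coeffs
          # Local decode: dequantize and add the prediction back; this is what is
          # stored in the reference image memory for later frames.
          recon = [[pred[r][c] + q[r][c] * qstep for c in range(w)] for r in range(h)]
          return q, recon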
  • The motion vector detection unit 105 detects a motion vector between the two viewpoint videos constituting the stereoscopic video corrected by the correction unit 102 by performing block matching between the encoding target block and the reference image. Specifically, the motion vector detection unit 105 detects a motion vector for each region (block) smaller than the entire region of the stereoscopic video corrected by the correction unit 102.
  • More specifically, the motion vector detection unit 105 acquires from the reference image memory 107 the locally decoded image designated by the encoding mode control unit 110, performs block matching of the encoding target block using the acquired locally decoded image as the reference image, and detects a motion vector for each block (S303). Note that the encoding mode control unit 110 may designate one or more reference images.
  • The motion vector detection unit 105 outputs the detected motion vector to the inter-frame encoding unit 115 of the encoding unit 106. Furthermore, when the reference image and the encoding target image are images of different viewpoints (Yes in S304), the motion vector detection unit 105 also outputs the obtained motion vector to the correction value calculation unit 111 of the correction unit 102 (S305).
  • In the example of FIG. 2B, the motion vector detection unit 105 outputs to the correction value calculation unit 111, for example, the motion vectors of the frame F1 detected using the frame F0 as the reference image, the motion vectors of the frame F5 detected using the frame F4 as the reference image, and the motion vectors of the frame F7 detected using the frame F6 as the reference image.
  • The inter-frame encoding unit 115 inter-frame encodes the input encoding target image. Specifically, for each block (encoding target block) constituting the encoding target image, the inter-frame encoding unit 115 performs motion compensation using the motion vector acquired from the motion vector detection unit 105 to generate a prediction block. Next, it calculates a prediction error (residual signal) by subtracting the prediction block from the encoding target block, and then calculates quantized coefficients by orthogonally transforming and quantizing the prediction error.
  • The inter-frame encoding unit 115 outputs the quantized coefficients and encoding information to the switching unit 108. Further, it inversely quantizes and inversely orthogonally transforms the quantized coefficients and adds the prediction block to create a locally decoded image, which is stored in the reference image memory 107 as a reference image for subsequent inter-frame prediction frames.
  • Based on the quantized coefficients and other encoding information output from the encoding unit 106 (the intra-frame encoding unit 114 and the inter-frame encoding unit 115), the encoding mode control unit 110 determines, for each encoding target block, whether to encode it by intra-frame predictive encoding or inter-frame predictive encoding using a known evaluation formula, and controls the switching unit 108 accordingly.
  • Under this control, the switching unit 108 outputs one of the sets of quantized coefficients obtained from the intra-frame encoding unit 114 and the inter-frame encoding unit 115 to the variable-length encoding unit 109 (S306).
  • The variable-length encoding unit 109 performs variable-length encoding on the quantized coefficients and encoding information acquired from the switching unit 108, and outputs the result as encoded data (S307). The image coding apparatus 100 executes the above processing (S301 to S307) for all the blocks constituting the encoding target image (S308).
  • FIG. 3C is a flowchart illustrating a processing procedure of the correction unit 102.
  • FIGS. 4A to 4D are diagrams showing the relationship between the imaging shift between the first- and second-viewpoint images and the tendency of the motion vectors.
  • First, the correction value calculation unit 111 aggregates the motion vectors, detects the type and magnitude of the imaging shift between the first- and second-viewpoint images, and calculates a correction value (S311).
  • Note that the correction value calculation unit 111 excludes the influence of parallax from the motion vectors detected by the motion vector detection unit 105. Since parallax is a shift in the horizontal component, the vertical component can be attributed to the imaging shift. Furthermore, an object at the convergence point, where the lens optical axes of the left- and right-eye imaging units intersect as set at the time of shooting, has a parallax of almost zero; a motion vector detected for such an object is therefore essentially the imaging shift itself. Accordingly, if an object at the convergence point is photographed from the two different viewpoints, the motion vector detected between the two captured images can be regarded as the imaging shift.
  • Also, objects at the same distance from the imaging units have parallax of the same magnitude in the same direction. Therefore, for example, a motion vector (direction and magnitude) corresponding to the parallax may be set in the correction value calculation unit 111 in advance and subtracted, as sketched below.
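  • A minimal sketch of this parallax exclusion, under the assumptions just described (parallax is horizontal; a preset parallax vector may be configured), might look like this; the names are illustrative.

      def exclude_parallax(mv, preset_parallax=None):
          """mv: (x, y) motion vector between same-time viewpoint images."""
          if preset_parallax is not None:
              # Subtract a preconfigured parallax vector (direction and magnitude).
              return (mv[0] - preset_parallax[0], mv[1] - preset_parallax[1])
          # Parallax is a horizontal shift, so keep only the vertical component;
          # at the convergence point the whole vector is already imaging shift.
          return (0.0, mv[1])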
  • FIGS. 4A to 4D plot, for each block, the motion vectors detected by the motion vector detection unit 105 with the second-viewpoint image as the encoding target image and the first-viewpoint image captured at the same time as the reference image, after the influence of parallax has been excluded.
  • FIG. 4A shows the tendency of the motion vector when there is no deviation other than the parallax between the encoding target image and the reference image. In this case, since the encoding target image matches the reference image, the motion vector of each block tends to be (0, 0).
  • FIG. 4B shows the tendency of the motion vector when the reference image is rotated with respect to the encoding target image.
  • In this case, the motion vectors of the blocks tend to be arranged so as to trace a circle as a whole. Such a situation occurs, for example, when the left-eye imaging unit 101a is installed at a tilt.
  • In the example of FIG. 4B, the motion vectors of the blocks are arranged so as to trace a counterclockwise circle; that is, it can be determined that the reference image is rotated counterclockwise with respect to the encoding target image. The center of rotation can be estimated as the position where the magnitude of the motion vector is smallest (in this example, the image center). Furthermore, the degree of rotation (rotation angle) can be estimated from the magnitudes of the motion vectors and their distances from the center of rotation.
  • Next, a method for determining that the type of imaging shift is rotation, and for calculating the direction of rotation and the correction value, is described. The method shown below is one example; other methods can also be used.
  • First, preprocessing is executed prior to calculating the rotation direction and the correction value. This preprocessing is also executed in common when the type of imaging shift is enlargement (reduction) or parallel movement.
  • In the preprocessing, the frame-average motion vector MVave, the average motion vector MVaveH[j] of each horizontal macroblock line, and the average motion vector MVaveV[i] of each vertical macroblock line are calculated.
  • Here, the number of horizontal macroblock lines is denoted mby, and the number of vertical macroblock lines mbx.
  • For motion vectors in image coding, the horizontal component (x component) is positive toward the right of the image and negative toward the left, while the vertical component (y component) is positive toward the bottom of the image and negative toward the top.
  • The frame-average motion vector MVave can be calculated using Equation 1. In the examples of FIGS. 4B to 4D, it is the average of 12 motion vectors.
  • The average motion vector MVaveH[j] of each horizontal macroblock line can be calculated using Equation 2. In the examples of FIGS. 4B to 4D, the average motion vector of each of the three horizontal macroblock lines (rows) is calculated.
  • The average motion vector MVaveV[i] of each vertical macroblock line can be calculated using Equation 3. In the examples of FIGS. 4B to 4D, the average motion vector of each of the four vertical macroblock lines (columns) is calculated.
  • Note that here the average motion vectors are calculated in units of one macroblock line in both the horizontal and vertical directions, but the present invention is not limited to this; the average motion vectors may also be calculated in units of several macroblock lines.
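  • The published Equations 1 to 3 are not reproduced in this text, so the following reconstruction is an assumption consistent with the description above: a frame average, a per-row average, and a per-column average of the block motion vectors.

      def mv_averages(mv_field):
          """mv_field[j][i]: (x, y) vector of the macroblock in row j, column i."""
          mby, mbx = len(mv_field), len(mv_field[0])  # horizontal/vertical MB lines
          def avg(vectors):
              n = len(vectors)
              return (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)
          mvave = avg([v for row in mv_field for v in row])       # Eq. 1 (frame)
          mvave_h = [avg(mv_field[j]) for j in range(mby)]        # Eq. 2 (rows)
          mvave_v = [avg([mv_field[j][i] for j in range(mby)])    # Eq. 3 (columns)
                     for i in range(mbx)]
          return mvave, mvave_h, mvave_v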
  • Vflag in Equation 4 is true when the vertical component y of MVaveV monotonically decreases and the horizontal component x of MVaveV is at most the threshold VTh. Since the threshold VTh is set to a value close to 0, the latter half of Equation 4 can be read as the horizontal component x of MVaveV being approximately 0.
  • Hflag in Equation 5 is true when the horizontal component x of MVaveH monotonically increases and the vertical component y of MVaveH is at most the threshold HTh. Since the threshold HTh is set to a value close to 0, the latter half of Equation 5 can be read as the vertical component y of MVaveH being approximately 0.
  • Equation 6 is true when the average motion vector MVave of the frame is at most the threshold FTh. Since the threshold FTh is set to a value close to 0, Equation 6 can be read as MVave being approximately 0. When Equations 4 to 6 are all true, the imaging shift can be judged to be a counterclockwise rotation.
  • Conversely, Vflag in Equation 7 is true when the vertical component y of MVaveV monotonically increases and the horizontal component x of MVaveV is at most the threshold VTh; since VTh is close to 0, the latter half of Equation 7 can be read as the horizontal component x of MVaveV being approximately 0.
  • Hflag in Equation 8 is true when the horizontal component x of MVaveH monotonically decreases and the vertical component y of MVaveH is at most the threshold HTh; since HTh is close to 0, the latter half of Equation 8 can be read as the vertical component y of MVaveH being approximately 0. When these conditions and Equation 6 hold, the rotation is judged to be clockwise.
  • Note that when the number of horizontal macroblock lines mby is odd, the motion vector average of the central horizontal macroblock line is excluded from the calculation and the calculation is performed with mby decremented by 1. Likewise, when the number of vertical macroblock lines mbx is odd, the motion vector average of the central vertical macroblock line is excluded and the calculation is performed with mbx decremented by 1.
  • The rotation angle X of Equation 9 is obtained as a positive value for counterclockwise rotation, and the rotation angle Y of Equation 10 as a positive value for clockwise rotation.
  • The correction value (rotation angle) for correcting the frame for which the motion vectors were calculated is the rotation angle of the entire frame; conversely, the correction value (rotation angle) for correcting the reference frame is the rotation in the opposite direction.
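  • The rotation test can be sketched as follows. The flag conditions mirror Equations 4 to 8 as described above; Equations 9 and 10 themselves are not reproduced in the text, so the angle estimate at the end is only an illustrative stand-in.

      import math

      def _mono(seq, decreasing):
          pairs = list(zip(seq, seq[1:]))
          return all(a > b for a, b in pairs) if decreasing else all(a < b for a, b in pairs)

      def detect_rotation(mvave, mvave_h, mvave_v, vth=0.5, hth=0.5, fth=0.5, mb=16):
          v_y = [v[1] for v in mvave_v]                  # y components per column
          h_x = [h[0] for h in mvave_h]                  # x components per row
          v_ok = all(abs(v[0]) <= vth for v in mvave_v)  # x of MVaveV ~ 0
          h_ok = all(abs(h[1]) <= hth for h in mvave_h)  # y of MVaveH ~ 0
          aflag = abs(mvave[0]) <= fth and abs(mvave[1]) <= fth          # Eq. 6
          ccw = _mono(v_y, True) and v_ok and _mono(h_x, False) and h_ok and aflag
          cw = _mono(v_y, False) and v_ok and _mono(h_x, True) and h_ok and aflag
          if not (ccw or cw):
              return None
          # Illustrative angle from the spread of the column averages over the
          # frame width (mb-pixel macroblocks); not the patented Equations 9/10.
          span = v_y[0] - v_y[-1] if ccw else v_y[-1] - v_y[0]
          angle = math.degrees(math.atan2(span, mb * (len(mvave_v) - 1)))
          return ("counterclockwise" if ccw else "clockwise"), angle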
  • FIG. 4C shows the tendency of the motion vectors when the sizes of the reference image and the encoding target image do not match.
  • In this case, the motion vectors of the blocks tend to be arranged radially as a whole. Such a situation occurs, for example, when the imaging magnifications of the left-eye imaging unit 101a and the right-eye imaging unit 101b differ.
  • In the example of FIG. 4C, the motion vector of each block points toward the image center of the encoding target image; that is, it can be determined that the reference image is reduced with respect to the encoding target image.
  • The reduction ratio can be estimated, for example, from the average magnitude of the motion vectors.
  • Next, a method for determining the direction of the size shift (whether it is enlargement or reduction) and for calculating the correction value is described. The method shown below is one example; other methods can also be used.
  • Vflag in Equation 11 is true when the horizontal component x of MVaveV monotonically decreases and the vertical component y of MVaveV is at most the threshold VTh. Since the threshold VTh is set to a value close to 0, the latter half of Equation 11 can be read as the vertical component y of MVaveV being approximately 0.
  • Hflag in Equation 12 is true when the vertical component y of MVaveH monotonically decreases and the horizontal component x of MVaveH is at most the threshold HTh. Since the threshold HTh is set to a value close to 0, the latter half of Equation 12 can be read as the horizontal component x of MVaveH being approximately 0.
  • Aflag in Equation 13 is true when the average motion vector MVave of the frame is at most the threshold FTh. Since the threshold FTh is set to a value close to 0, Equation 13 can be read as MVave being approximately 0. When Vflag, Hflag, and Aflag are all true, the motion vectors converge toward the image center, and it can be judged that the reference image is reduced with respect to the encoding target image.
  • Conversely, Vflag in Equation 14 is true when the horizontal component x of MVaveV monotonically increases and the vertical component y of MVaveV is at most the threshold VTh; since VTh is close to 0, the latter half of Equation 14 can be read as the vertical component y of MVaveV being approximately 0.
  • Hflag in Equation 15 is true when the vertical component y of MVaveH monotonically increases and the horizontal component x of MVaveH is at most the threshold HTh; since HTh is close to 0, the latter half of Equation 15 can be read as the horizontal component x of MVaveH being approximately 0. When these conditions and Equation 13 hold, the motion vectors diverge, and the reference image can be judged to be enlarged.
  • As before, when the number of horizontal macroblock lines mby is odd, the motion vector average of the central horizontal macroblock line is excluded from the calculation and the calculation is performed with mby decremented by 1; likewise, when the number of vertical macroblock lines mbx is odd, the motion vector average of the central vertical macroblock line is excluded and the calculation is performed with mbx decremented by 1.
  • The reduction ratio of the entire frame is obtained as (XR + YR) / 2, the average of the vertical reduction ratio XR and the horizontal reduction ratio YR.
  • The correction value for correcting the frame for which the motion vectors were calculated is this enlargement (reduction) ratio; conversely, the correction value for correcting the reference frame is its reciprocal.
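  • By analogy with the rotation case, a sketch of the enlargement/reduction test follows. The flag conditions mirror Equations 11 to 15 as described; the per-axis ratios XR and YR are illustrative stand-ins (assuming at least a 2 x 2 macroblock grid), because the published formulas are not reproduced in the text.

      def detect_scaling(mvave, mvave_h, mvave_v, vth=0.5, hth=0.5, fth=0.5, mb=16):
          v_x = [v[0] for v in mvave_v]                  # x components per column
          h_y = [h[1] for h in mvave_h]                  # y components per row
          v_ok = all(abs(v[1]) <= vth for v in mvave_v)  # y of MVaveV ~ 0
          h_ok = all(abs(h[0]) <= hth for h in mvave_h)  # x of MVaveH ~ 0
          aflag = abs(mvave[0]) <= fth and abs(mvave[1]) <= fth          # Eq. 13
          dec = lambda s: all(a > b for a, b in zip(s, s[1:]))
          inc = lambda s: all(a < b for a, b in zip(s, s[1:]))
          converge = dec(v_x) and v_ok and dec(h_y) and h_ok and aflag  # Eqs. 11-12
          diverge = inc(v_x) and v_ok and inc(h_y) and h_ok and aflag   # Eqs. 14-15
          if not (converge or diverge):
              return None
          width, height = mb * (len(mvave_v) - 1), mb * (len(mvave_h) - 1)
          yr = 1 - (v_x[0] - v_x[-1]) / width   # horizontal ratio YR (stand-in)
          xr = 1 - (h_y[0] - h_y[-1]) / height  # vertical ratio XR (stand-in)
          ratio = (xr + yr) / 2                 # overall ratio, as in the text
          return ("reduced" if converge else "enlarged"), ratio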
  • FIG. 4D shows the tendency of the motion vectors when the reference image is displaced in one direction (translated) with respect to the encoding target image.
  • In this case, the motion vectors of the blocks tend to point in the same direction as a whole. Such a situation occurs, for example, when the left-eye imaging unit 101a does not face the subject accurately.
  • In the example of FIG. 4D, the motion vector of each block points upward; that is, it can be determined that the reference image is shifted upward with respect to the encoding target image. The magnitude of the shift can be estimated from the magnitudes of the motion vectors.
  • Specifically, the average motion vector of the frame and the motion vector of each block are compared component by component, and the number of blocks cnt whose difference from the average motion vector is within the threshold mvTh is counted; when cnt is sufficiently large, the shift can be judged to be a parallel translation.
  • The shift amount of the entire frame is given by the frame-average motion vector MVave of Equation 1.
  • The correction value for correcting the frame for which the motion vectors were calculated is the frame motion vector MVave; the correction value for correcting the reference frame is MVave multiplied by -1.
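  • A sketch of the translation test and its correction values, following the description above (the names cnt and mvTh are from the text; the acceptance fraction is an assumption of this illustration):

      def detect_translation(mv_field, mvave, mv_th=1.0, min_fraction=0.8):
          flat = [v for row in mv_field for v in row]
          # cnt: blocks whose vector differs from the frame average by <= mvTh.
          cnt = sum(1 for v in flat
                    if abs(v[0] - mvave[0]) <= mv_th and abs(v[1] - mvave[1]) <= mv_th)
          if cnt < min_fraction * len(flat):
              return None
          # Correct the analysed frame by MVave, or the reference frame by -MVave.
          return {"current": mvave, "reference": (-mvave[0], -mvave[1])}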
  • In this way, the correction value calculation unit 111 calculates a correction value for correcting the detected imaging shift and outputs it to the image correction unit 112 (S311).
  • Note that the types of imaging shift shown in FIGS. 4A to 4D are examples, and the correction value calculation unit 111 can also detect other types of imaging shift. The imaging shifts described with reference to FIGS. 4A to 4D may also occur in combination.
  • Next, the image correction unit 112 corrects at least one of the first- and second-viewpoint images captured by the left-eye imaging unit 101a and the right-eye imaging unit 101b, based on the input correction value.
  • Note that the images corrected here are those captured at a time (a second time) after the time (a first time) at which the encoding target image and the reference image shown in FIGS. 4A to 4D were captured.
  • the image correction unit 112 performs correction by extracting a common part from each image captured at the same time from each of the first and second viewpoints when the imaging shift between the images is vertical movement, horizontal translation, or inclination. May be. In addition, when the displacement between the images does not match, correction may be performed by enlarging or reducing one of the images captured at the same time from the first and second viewpoints.
  • the imaging deviation may be corrected.
  • Note that when the switching unit 108 acquires the quantized coefficients and encoding information from the encoding unit 106, it need not output them to the variable-length encoding unit 109 immediately; it may discard them instead.
  • For example, the switching unit 108 may discard the encoded information acquired from the start of processing (that is, from when the first encoded information is acquired) until a predetermined period elapses, and output to the variable-length encoding unit 109 only the encoded information obtained after that period.
  • The predetermined period may be, for example, the period until N (N is an integer of 1 or more) frames' worth of encoded information has been acquired.
  • Alternatively, the correction unit 102 may notify the encoding control unit 116 of whether an imaging shift remains between the first- and second-viewpoint images (that is, whether the correction value is 0).
  • When the encoding control unit 116 determines that there is no longer an imaging shift between the images (that the correction value has become 0), it inputs an encoding start signal to the switching unit 108. After receiving the encoding start signal from the encoding control unit 116, the switching unit 108 starts outputting the quantized coefficients and encoding information input from the encoding unit 106 to the variable-length encoding unit 109. A sketch of this gating behaviour follows.
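  • Both variants of this output gating can be sketched together as follows; the class and method names are assumptions for illustration.

      class OutputGate:
          """Discards encoded data until N frames have passed or the
          correction value has reached zero (encoding start signal)."""
          def __init__(self, warmup_frames):
              self.warmup_frames = warmup_frames  # the predetermined period N
              self.frames_seen = 0
              self.started = False                # set by the start signal

          def on_correction_value(self, value):
              if value == 0:                      # no remaining imaging shift
                  self.started = True             # encoding start signal

          def forward(self, frame_data):
              """Return data for the variable-length coder, or None to discard."""
              self.frames_seen += 1
              if self.started or self.frames_seen > self.warmup_frames:
                  return frame_data
              return None                         # still within the settling period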
  • FIG. 5A is a block diagram of an image coding apparatus 200 according to Embodiment 2 of the present invention.
  • FIG. 5B is a flowchart illustrating a processing procedure of preprocessing of the image encoding device 200.
  • The image encoding device 200 according to Embodiment 2 has a configuration in which an encoding control unit 116 is added to the configuration of the image encoding device 100 according to Embodiment 1.
  • The operations of the parts other than the correction unit 102 and the switching unit 108 are the same as in Embodiment 1; the description below focuses on the differences.
  • First, the left-eye imaging unit 101a and the right-eye imaging unit 101b of the image encoding device 200 start imaging in response to the power being turned on (Yes in S401).
  • Note that the timing of starting imaging is not limited to this example; it suffices that imaging is started before the start of the encoding process is instructed.
  • Next, the motion vector detection unit 105 acquires the images captured by the left-eye imaging unit 101a and the right-eye imaging unit 101b via the correction unit 102, the multiplexing unit 103, and the switching unit 104 (S402). It then detects the motion vector of each acquired image (S403) and stores only the latest motion vector.
  • This processing (S402, S403) is executed repeatedly until the start of the encoding process is instructed. During this period, the correction unit 102 does not execute the correction process.
  • Note that the latest motion vector may be stored in the correction unit 102 instead of the motion vector detection unit 105.
  • When the start of the encoding process is instructed, the image encoding apparatus 200 executes the main processing shown in FIG. 3A (S405).
  • However, this embodiment differs from Embodiment 1 in that, for the image acquired immediately after the start of the encoding process is instructed (the first image), the correction process in FIG. 3A is executed using the latest motion vector already detected by the motion vector detection unit 105.
  • As a result, the correction process can be applied even to the image immediately after the start of encoding, so that only images from which the imaging shift has been removed are encoded.
  • Embodiment 3: The configuration of the third embodiment of the present invention is the same as that of the second embodiment, but the operations of the left-eye imaging unit 101a and the right-eye imaging unit 101b differ. Specifically, the left-eye imaging unit 101a and the right-eye imaging unit 101b according to Embodiment 3 continuously capture still images and sequentially output the captured images (still images) to the correction unit 102, which is the point of difference from Embodiment 2.
  • The correction unit 102 does not perform the correction process on the images acquired from the left-eye imaging unit 101a and the right-eye imaging unit 101b until the start of the encoding process is instructed, and outputs them to the multiplexing unit 103 as they are. The motion vector detection unit 105 acquires the images captured by the left-eye imaging unit 101a and the right-eye imaging unit 101b via the correction unit 102, the multiplexing unit 103, and the switching unit 104, detects the motion vector of each acquired image, and stores only the latest motion vector. The correction unit 102 then performs the correction process on the image acquired immediately after the start of the correction process is instructed, using the latest motion vector detected by the motion vector detection unit 105 immediately beforehand.
  • The above configuration enables high-efficiency encoding of still stereoscopic images. It also makes it possible to encode only still stereoscopic image data with no imaging shift between the images, yielding encoded still-image data that is easy to view stereoscopically and unlikely to cause eye fatigue.
  • Note that the left-eye imaging unit 101a and the right-eye imaging unit 101b may each have an internal image memory, capture a single still image, store the captured image in that memory, and continuously output the same image (still image) to the correction unit 102.
  • Each of the above devices is, specifically, a computer system including a microprocessor, a ROM, a RAM, a hard disk unit, a display unit, a keyboard, a mouse, and the like.
  • A computer program is stored in the RAM or the hard disk unit.
  • Each device achieves its functions by the microprocessor operating according to the computer program.
  • Here, the computer program is configured by combining a plurality of instruction codes indicating instructions to the computer in order to achieve a predetermined function.
  • Some or all of the constituent elements of each of the above devices may be realized as a single system LSI. The system LSI is an ultra-multifunctional LSI manufactured by integrating a plurality of components on a single chip; specifically, it is a computer system including a microprocessor, a ROM, a RAM, and the like. A computer program is stored in the RAM, and the system LSI achieves its functions by the microprocessor operating according to the computer program.
  • Some or all of the constituent elements of each of the above devices may be constituted by an IC card or a single module that can be attached to and detached from the device.
  • The IC card or module is a computer system including a microprocessor, a ROM, a RAM, and the like.
  • The IC card or module may include the super-multifunctional LSI described above.
  • The IC card or module achieves its functions by the microprocessor operating according to the computer program. The IC card or module may be tamper resistant.
  • The present invention may also be the methods described above, a computer program that realizes these methods on a computer, or a digital signal composed of such a computer program.
  • The present invention may also be a computer-readable recording medium on which the computer program or the digital signal is recorded, such as a flexible disk, a hard disk, a CD-ROM, an MO, a DVD, a DVD-ROM, a DVD-RAM, a BD (Blu-ray Disc), or a semiconductor memory. It may also be the digital signal recorded on such a recording medium.
  • The present invention may also transmit the computer program or the digital signal via an electric communication line, a wireless or wired communication line, a network represented by the Internet, data broadcasting, or the like.
  • The present invention may also be a computer system including a microprocessor and a memory, where the memory stores the computer program and the microprocessor operates according to the computer program.
  • The program or the digital signal may be recorded on a recording medium and transferred, or transferred via a network or the like, and executed by another independent computer system.
  • The stereoscopic video encoding apparatus and method of the present invention are applicable to, for example, digital cameras, digital video cameras, camera-equipped mobile phones, DVD/BD recorders, televisions that record programs, web cameras, and program distribution servers, and are useful for encoding, compressing, recording, storing, and transferring stereoscopic image data.

Abstract

An image encoding device (100) comprises: capture units (101a, 101b) which capture three-dimensional footage formed of images taken from at least two viewpoints; a correction unit (102) which corrects any discrepancy in the size or position of the subject of the three-dimensional images captured by the capture units (101a, 101b) when said images are displayed; a motion vector detection unit (105) which detects vectors of motion occurring between the images from the two viewpoints which comprise the three-dimensional images corrected by the correction unit (102); and an encoding unit (106) which compresses and encodes the images corrected by the correction unit (102) based on the motion vectors detected by the motion vector detection unit (105). The correction unit (102) bases its correction processing on the vectors previously detected by the motion vector detection unit (105).

Description

画像符号化装置、画像符号化方法、プログラム、及び集積回路Image coding apparatus, image coding method, program, and integrated circuit
 本発明は、高効率な画像符号化装置に関し、特に、複数視点で撮像された立体映像データを、動き補償予測を用いて高効率符号化する方式に関するものである。 The present invention relates to a high-efficiency image encoding device, and more particularly to a method for high-efficiency encoding of stereoscopic video data captured from a plurality of viewpoints using motion compensation prediction.
 左右両眼で観測する画像の視差を利用して立体視させる立体映像表示装置が開発されている。このような立体映像を符号化する方式として、左眼用の映像と右眼用の映像との相関が高いことを利用することが知られている。具体的には、2つの画像のうちの一方を符号化する場合に、他方の画像を参照画像として動きベクトルを求め、動き補償を行う。これにより、高効率の圧縮を実現する符号化方式が提案されている。 A stereoscopic video display device has been developed for stereoscopic viewing using parallax of images observed with both eyes. As a method for encoding such a stereoscopic video, it is known to use the fact that the correlation between the left-eye video and the right-eye video is high. Specifically, when one of the two images is encoded, a motion vector is obtained using the other image as a reference image, and motion compensation is performed. Thus, an encoding method that realizes highly efficient compression has been proposed.
As an image encoding method that likewise realizes multi-view image encoding using motion compensation between viewpoints, H.264 MVC (Multiview Video Coding) has been standardized.
FIG. 6 shows the reference relationships between frames (pictures) in MVC. When encoding ordinary single-viewpoint video, motion vectors are detected by referring in the time axis direction, that is, by using other frames captured at different times as reference images, and motion compensation prediction is performed. In MVC, on the other hand, multi-view video can be encoded using, in addition to temporal references, references between the viewpoints (V0 to V4); that is, motion vectors can be detected using frames of other viewpoints captured at the same time as reference images, and motion compensation prediction can be performed.
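The two kinds of reference candidates can be pictured with a small sketch. The following Python fragment is illustrative only (the function name and frame bookkeeping are assumptions, not part of the MVC specification): it enumerates temporal references within a view and inter-view references at the same capture time.

```python
def candidate_references(view, time, decoded_frames):
    """decoded_frames maps (view, time) -> decoded frame.

    Returns the reference candidates for the frame at (view, time):
    temporal references within the same view, plus inter-view
    references captured at the same time in other views.
    """
    temporal = [f for (v, t), f in decoded_frames.items() if v == view and t < time]
    inter_view = [f for (v, t), f in decoded_frames.items() if v != view and t == time]
    return temporal + inter_view
```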
In order to perform highly efficient encoding using motion compensation prediction, it is necessary to obtain an accurate motion vector between the image to be encoded and the reference image. However, the multi-view video that makes up stereoscopic video is often produced by combining several individual cameras, or by fixing the cameras into an integrated unit. In this case, since the individual cameras differ in noise and in luminance and chrominance characteristics, conventional motion vector detection using block matching for each coding block cannot accurately detect motion vectors between viewpoints.
As a conventional approach to this problem, Patent Literature 1 discloses detecting an accurate motion vector for use in encoding from the motion vectors of the blocks surrounding the block in question and the motion vector at the same position in a past frame, thereby improving coding efficiency.
FIG. 7 shows a block diagram of the conventional device of Patent Literature 1. The conventional image encoding device 10 mainly includes a block matching unit 1, a parallax compensation vector detection unit 2, memories 3 and 6, a correction vector detection unit 4, and a variable delay unit 5.
Of two images captured by a pair of synchronized cameras (that is, images captured from mutually different viewpoints), one image is input to the image encoding device 10 as the image to be encoded and the other as the reference image.
The block matching unit 1 performs block matching against the reference image for each block making up the image to be encoded (the target block). The block matching result output from the block matching unit 1 is input to the parallax compensation vector detection unit 2, which detects the motion vector of the target block based on that result. The motion vector detected in this way is stored in the memory 3.
The correction vector detection unit 4 obtains from the memory 3 the motion vectors of the target block and of the blocks surrounding it, and from the memory 6 the motion vectors of the block at the same position in a past frame and of its surrounding blocks, and detects an accurate motion vector for the target block by, for example, averaging these motion vectors.
FIG. 8 is a schematic diagram showing the motion vector of each block making up the image to be encoded. As shown in FIG. 8, the motion vectors of the target image detected using the image of the other viewpoint as the reference image are the labeled vectors ア through タ. To obtain a corrected vector for vector カ, the correction vector detection unit 4 uses the surrounding vectors ア, イ, ウ, オ, キ, ケ, コ, and サ. Performing motion compensation predictive encoding with the corrected vector obtained in this way improves the coding efficiency. (In the figure, each label carries a vector arrow written above the character.)
JP-A-6-113335 (Patent Literature 1)
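As a rough sketch of the prior-art correction just described (the array layout and the NumPy implementation are assumptions for illustration, not the circuitry of Patent Literature 1), the corrected vector of a block can be computed as the average of the vectors of its eight surrounding blocks, optionally together with the co-located vector from a past frame:

```python
import numpy as np

def corrected_vector(mv_field, i, j, past_field=None):
    """mv_field: (rows, cols, 2) array of per-block motion vectors (x, y)."""
    rows, cols, _ = mv_field.shape
    samples = []
    for dj in (-1, 0, 1):
        for di in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue  # average the surrounding blocks, not the block itself
            y, x = j + dj, i + di
            if 0 <= y < rows and 0 <= x < cols:
                samples.append(mv_field[y, x])
    if past_field is not None:
        samples.append(past_field[j, i])  # co-located vector in the past frame
    return np.mean(samples, axis=0)
```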
However, the video of each viewpoint making up the stereoscopic video is captured by combining several individual cameras, or by fixing several cameras into an integrated unit. Consequently, taking one camera as the reference, the other camera often has a tilt (rotation) unrelated to parallax, a vertical (or horizontal) offset, or a different subject size (imaging magnification). When stereoscopic video with image misalignment other than parallax is encoded using motion compensation prediction with inter-view references, even if an accurate motion vector can be detected, the motion compensation residual signal (prediction error) is large and the coding efficiency does not improve. This is the first problem.
The first problem in encoding images captured from two viewpoints will be explained concretely with reference to FIGS. 9A to 9C, 10A, and 10B. FIGS. 9A to 9C show an example in which the first-viewpoint image is the reference image, the second-viewpoint image is the image to be encoded, and a block containing part of the subject (a star) is encoded. The block in the first-viewpoint image is the reference block obtained by motion vector detection, and the block in the second-viewpoint image is the target block.
FIG. 9A shows the case where the two viewpoint images have no misalignment other than parallax; the residual signal (prediction error) between the target block and the reference block is small, so encoding is highly efficient. However, when the two viewpoint images are misaligned in tilt as shown in FIG. 9B, or in size as shown in FIG. 9C, the residual signal (prediction error) becomes large and efficient encoding is not possible.
Similarly, FIG. 10A shows the case with no misalignment other than parallax, where the residual signal (prediction error) between the target block and the reference block is small and encoding is highly efficient. However, when the two viewpoint images are vertically offset as shown in FIG. 10B, the reference block lies at a position extending outside the image. As a result, the residual signal (prediction error) becomes large and efficient encoding is not possible.
In addition, when the images of the two viewpoints are misaligned in tilt, size, or vertical position beyond parallax, not only does stereoscopic viewing become difficult when the encoded stereoscopic image data is played back and viewed, but eye fatigue also occurs easily. This is the second problem.
Furthermore, correcting image misalignment other than parallax between the viewpoints in advance requires newly providing an image misalignment detection means, which increases the power consumption and circuit scale of the image encoding device 10. This is the third problem.
The present invention has been made in view of the above first to third problems, and aims to provide an image encoding device that simply and appropriately corrects misalignment caused by factors other than the parallax between two images.
An image encoding device according to one aspect of the present invention encodes stereoscopic video composed of video from at least two viewpoints. Specifically, it includes: an acquisition unit that acquires the stereoscopic video; a correction unit that executes correction processing for correcting misalignment in the displayed size or position of the subject shown in the stereoscopic video acquired by the acquisition unit; a motion vector detection unit that detects motion vectors between the two viewpoint videos making up the stereoscopic video corrected by the correction unit; and an encoding unit that compresses and encodes, based on the motion vectors detected by the motion vector detection unit, the stereoscopic video corrected by the correction unit. The correction unit executes the correction processing based on motion vectors detected by the motion vector detection unit before the current correction processing.
This makes highly efficient image encoding with inter-view references possible. Moreover, when the encoded images are played back and viewed as stereoscopic images, stereoscopic viewing is easy and eye fatigue is unlikely to occur. Furthermore, since there is no need to provide new components such as an image misalignment detection means, the increase in circuit scale can be suppressed and power consumption can be reduced.
The correction unit may correct, based on the motion vectors, at least one of misalignment due to rotation between the two viewpoints of the subject shown in the stereoscopic video, misalignment due to enlargement, and misalignment due to translation.
Further, the correction unit may detect at least one of the misalignment due to rotation, the misalignment due to enlargement, and the misalignment due to translation based on the directions of the motion vectors, and correct the misalignment indicated by the detection result.
As one example, the correction unit may detect the misalignment due to translation based on the vertical components of the motion vectors. Note that translation does not necessarily have to be detected from multiple motion vectors detected per block; it can be detected even from a single motion vector.
For example, the motion vector detection unit may detect a motion vector for each region smaller than the entire area of the stereoscopic video corrected by the correction unit. The correction unit may then detect the misalignment due to rotation or the misalignment due to enlargement based on the tendency shown by the directions of the multiple motion vectors detected by the motion vector detection unit.
As one example, the correction unit may detect the misalignment due to enlargement when the multiple motion vectors tend to converge toward a predetermined position in the stereoscopic video, or tend to diverge from that position.
As another example, the correction unit may detect the misalignment due to rotation when the multiple motion vectors tend to describe a circle within the stereoscopic video.
The encoding unit may also start outputting the compression-encoded stereoscopic video once a predetermined period has elapsed after the start of encoding is instructed. Since images containing misalignment are then not output, stereoscopic viewing becomes easier.
The motion vector detection unit may further start detecting motion vectors between the stereoscopic videos acquired by the acquisition unit before the start of encoding is instructed. The correction unit may then execute correction processing on the first stereoscopic image acquired by the acquisition unit immediately after the start of encoding is instructed, using the latest motion vectors detected by the motion vector detection unit.
With the above configuration, correction processing can be executed even on the first image to be encoded, so only images substantially free of misalignment are encoded.
As one example, the acquisition unit may include a first imaging unit that images the subject from a first viewpoint and a second imaging unit that images the subject from a second viewpoint.
The motion vector detection unit may detect a motion vector for each block of the image to be encoded, taking one of the images captured at a first time from the first and second viewpoints as the image to be encoded and the other as the reference image. Based on the tendency of the multiple motion vectors corresponding to the blocks of the image to be encoded, the correction unit may then correct the misalignment in the displayed size or position of the subject for at least one of the images captured from the first and second viewpoints at a second time later than the first time.
An image encoding method according to one aspect of the present invention is a method for encoding stereoscopic video composed of video from at least two viewpoints. Specifically, it includes: an acquisition step of acquiring the stereoscopic video; a correction step of executing correction processing for correcting misalignment in the displayed size or position of the subject shown in the stereoscopic video acquired in the acquisition step; a motion vector detection step of detecting motion vectors between the two viewpoint videos making up the stereoscopic video corrected in the correction step; and an encoding step of compressing and encoding, based on the motion vectors detected in the motion vector detection step, the stereoscopic video corrected in the correction step. In the correction step, the correction processing is executed based on motion vectors detected in the motion vector detection step before the current correction processing.
A program according to one aspect of the present invention causes a computer to encode stereoscopic video composed of video from at least two viewpoints. Specifically, it causes the computer to execute: an acquisition step of acquiring the stereoscopic video; a correction step of executing correction processing for correcting misalignment in the displayed size or position of the subject shown in the stereoscopic video acquired in the acquisition step; a motion vector detection step of detecting motion vectors between the two viewpoint videos making up the stereoscopic video corrected in the correction step; and an encoding step of compressing and encoding, based on the motion vectors detected in the motion vector detection step, the stereoscopic video corrected in the correction step. In the correction step, the correction processing is executed based on motion vectors detected in the motion vector detection step before the current correction processing.
An integrated circuit according to one aspect of the present invention encodes stereoscopic video composed of video from at least two viewpoints. Specifically, it includes: an acquisition unit that acquires the stereoscopic video; a correction unit that executes correction processing for correcting misalignment in the displayed size or position of the subject shown in the acquired stereoscopic video; a motion vector detection unit that detects motion vectors between the two viewpoint videos making up the stereoscopic video corrected by the correction unit; and an encoding unit that compresses and encodes, based on the motion vectors detected by the motion vector detection unit, the stereoscopic video corrected by the correction unit. The correction unit executes the correction processing based on motion vectors detected by the motion vector detection unit before the current correction processing.
According to the present invention, misalignment caused by factors other than parallax can be corrected from the results of motion vector detection with inter-view references. As a result, highly efficient image encoding with inter-view references becomes possible. Moreover, when the encoded images are played back and viewed as stereoscopic images, stereoscopic viewing is easy and eye fatigue is unlikely to occur. Furthermore, since there is no need to provide new components such as an image misalignment detection means, the increase in circuit scale can be suppressed and power consumption can be reduced.
FIG. 1 is a block diagram of an image encoding device according to Embodiment 1 of the present invention.
FIG. 2A is a diagram showing images captured from the first and second viewpoints arranged in imaging order.
FIG. 2B is a diagram showing images captured from the first and second viewpoints arranged in encoding order.
FIG. 3A is a flowchart showing the main processing of the image encoding device according to Embodiment 1.
FIG. 3B is a flowchart showing the encoding processing of the image encoding device according to Embodiment 1.
FIG. 3C is a flowchart showing the correction processing of the image encoding device according to Embodiment 1.
FIG. 4A is a diagram showing an example of the first- and second-viewpoint images when there is no imaging misalignment.
FIG. 4B is a diagram showing an example of the first- and second-viewpoint images when the first-viewpoint image is rotated with respect to the second-viewpoint image.
FIG. 4C is a diagram showing an example of the first- and second-viewpoint images when the first-viewpoint image is reduced with respect to the second-viewpoint image.
FIG. 4D is a diagram showing an example of the first- and second-viewpoint images when the first-viewpoint image is translated with respect to the second-viewpoint image.
FIG. 5A is a block diagram of an image encoding device according to Embodiment 2 of the present invention.
FIG. 5B is a flowchart showing the preprocessing of the image encoding device according to Embodiment 2 of the present invention.
FIG. 6 is a diagram explaining the reference relationships of H.264 MVC (Multiview Video Coding).
FIG. 7 is a block diagram of a conventional image encoding device.
FIG. 8 is a diagram showing the motion vector of each block making up the image to be encoded.
FIG. 9A is a diagram showing the coding efficiency when there is no misalignment between the image to be encoded and the reference image.
FIG. 9B is a diagram showing the coding efficiency when the image to be encoded is rotated with respect to the reference image.
FIG. 9C is a diagram showing the coding efficiency when the image to be encoded is enlarged with respect to the reference image.
FIG. 10A is a diagram showing the coding efficiency when there is no misalignment between the image to be encoded and the reference image.
FIG. 10B is a diagram showing the coding efficiency when the image to be encoded is translated with respect to the reference image.
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
(Embodiment 1)
FIG. 1 is a block diagram of an image encoding device 100 according to Embodiment 1 of the present invention. The image encoding device 100 conforms to H.264 MVC and, as shown in FIG. 1, includes a left-eye imaging unit (first imaging unit) 101a, a right-eye imaging unit (second imaging unit) 101b, a correction unit 102, a multiplexing unit 103, switching units 104 and 108, a motion vector detection unit 105, an encoding unit 106, a reference image memory 107, a variable-length encoding unit 109, and an encoding mode control unit 110. The correction unit 102 further includes a correction value calculation unit 111 and an image correction unit 112. The encoding unit 106 further includes an intra-frame encoding unit 114 and an inter-frame encoding unit 115.
The left-eye imaging unit 101a outputs images (video or still images) obtained by imaging the subject from a first viewpoint to the correction unit 102. The right-eye imaging unit 101b outputs images obtained by imaging the subject from a second viewpoint different from the first viewpoint to the correction unit 102.
That is, the images output from the left-eye imaging unit 101a and the images output from the right-eye imaging unit 101b have parallax relative to each other. In other words, the images captured by the left-eye imaging unit 101a and the right-eye imaging unit 101b constitute stereoscopic video composed of video from two viewpoints.
FIG. 2A is a schematic diagram showing the images (video) captured by the left-eye imaging unit 101a and the right-eye imaging unit 101b. The left-eye imaging unit 101a and the right-eye imaging unit 101b operate in synchronization with each other and, as shown in FIG. 2A, each outputs one frame (picture) at each of the same times (t0, t1, ..., t6).
In Embodiment 1, the left-eye imaging unit 101a and the right-eye imaging unit 101b constitute an acquisition unit that acquires images. However, in the present invention, the left-eye imaging unit 101a and the right-eye imaging unit 101b are not essential components and may be omitted; images captured by an external imaging device may be acquired by the acquisition unit and processed. Specifically, the acquisition unit (not shown) may acquire the stereoscopic video from a broadcast wave, and the format of the stereoscopic video that can be acquired from the broadcast wave is not particularly limited.
For example, a side-by-side format may be used, in which the left half of one image is the first-viewpoint image and the right half is the second-viewpoint image, or the upper half of one image is the first-viewpoint image and the lower half is the second-viewpoint image. With this format, transmission and reception can be performed in the same way as with conventional two-dimensional video.
Alternatively, a format may be used in which first-viewpoint images and second-viewpoint images are transmitted and received alternately in units of pictures. With this format, although the frame rate is double the conventional rate, high-definition stereoscopic video can be transmitted and received.
The correction unit 102 executes correction processing for correcting imaging misalignment on at least one of the images input from the left-eye imaging unit 101a and the right-eye imaging unit 101b. More specifically, based on the magnitude and/or direction of the motion vectors, the correction unit 102 corrects at least one of misalignment due to rotation between the two viewpoints of the subject shown in the stereoscopic video, misalignment due to enlargement, and misalignment due to translation. The correction unit 102 then outputs the corrected images to the multiplexing unit 103.
Imaging misalignment can be defined, for example, as misalignment in the displayed size or position of the subject. Concretely, the first-viewpoint image may be enlarged or reduced (size misalignment), or rotated or translated (position misalignment), relative to the second-viewpoint image captured at the same time.
Imaging misalignment can also be defined as misalignment caused by factors other than parallax (vertical offset, size mismatch, tilt, and the like). "Misalignment caused by factors other than parallax" refers, for example, to misalignment arising from installation errors of the left-eye imaging unit 101a and the right-eye imaging unit 101b, mismatched imaging magnifications, and so on.
The correction value calculation unit 111 determines the type of imaging misalignment based on the tendency of the directions of the multiple motion vectors, and calculates the magnitude of the imaging misalignment based on the tendency of their magnitudes.
For example, the correction value calculation unit 111 detects misalignment due to translation based on the vertical components of the motion vectors. Specifically, when the motion vectors of the blocks point in substantially the same direction (upward or downward) and have substantially the same magnitude, the misalignment can be judged to be due to translation.
The correction value calculation unit 111 may also detect misalignment due to rotation or enlargement based on the tendency shown by the directions of the multiple motion vectors. Specifically, when the multiple motion vectors tend to converge toward a predetermined position in the stereoscopic video, or tend to diverge from that position, the misalignment can be judged to be due to enlargement. When the multiple motion vectors tend to describe a circle within the stereoscopic video, the misalignment can be judged to be due to rotation.
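A minimal sketch of this classification follows, assuming the aggregated vectors are held in a (rows, cols, 2) NumPy array; the function and its tolerance value are illustrative assumptions, not taken from the embodiment. Uniform vectors suggest translation, radially aligned vectors suggest enlargement or reduction, and tangentially aligned vectors suggest rotation.

```python
import numpy as np

def classify_shift(mv_field, tol=0.5):
    rows, cols, _ = mv_field.shape
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    vecs = mv_field.reshape(-1, 2)
    mean = vecs.mean(axis=0)
    spread = np.linalg.norm(vecs - mean, axis=1).mean()
    if spread < tol:
        # All blocks move together: either no misalignment or a translation.
        return "none" if np.linalg.norm(mean) < tol else "translation"
    radial = tangential = 0.0
    for j in range(rows):
        for i in range(cols):
            r = np.array([i - cx, j - cy], dtype=float)
            n = np.linalg.norm(r)
            if n < 1e-9:
                continue  # the center block has no radial direction
            r /= n
            v = mv_field[j, i]
            radial += abs(float(v @ r))                           # toward/away from center
            tangential += abs(float(v[0] * r[1] - v[1] * r[0]))   # around the center
    return "enlargement/reduction" if radial > tangential else "rotation"
```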
According to the type and magnitude of the imaging misalignment calculated by the correction value calculation unit 111, the image correction unit 112 executes correction processing on at least one of the images captured at the same time from the first and second viewpoints. The operation of the correction unit 102 is described in detail later.
The multiplexing unit 103 rearranges the images acquired from the correction unit 102 into encoding order and outputs them to the switching unit 104.
FIG. 2B is a diagram showing the order of the images (encoding order) after the images of FIG. 2A have been input to and multiplexed by the multiplexing unit 103. In FIGS. 2A and 2B, "I", "P", and "B" denote the encoding type of each frame: "I" denotes an intra-frame prediction frame (I picture), "P" a unidirectional inter-frame prediction frame (P picture), and "B" a bidirectional inter-frame prediction frame (B picture). The arrows indicate the reference destinations when inter-view references are used.
In the example shown in FIGS. 2A and 2B, each block making up a first-viewpoint image is encoded using only first-viewpoint (same-viewpoint) images captured at different times as reference images. For example, each block of frame F2 is encoded using only frame F0 as the reference image, each block of frame F4 using frame F0 or frame F2, and each block of frame F6 using frame F2 or frame F4.
On the other hand, each block making up a second-viewpoint image is encoded using either the first-viewpoint (other-viewpoint) image captured at the same time or a second-viewpoint (same-viewpoint) image captured at a different time as the reference image. For example, each block of frame F1 is encoded using only frame F0 as the reference image, each block of frame F3 using frame F1 or frame F2, each block of frame F5 using frame F1, F3, or F4, and each block of frame F7 using frame F3, F5, or F6.
FIG. 3A is a flowchart showing the procedure of the main processing. The flow of operation of the image encoding device 100 will be briefly described with reference to FIG. 3A.
First, the left-eye imaging unit 101a acquires a first-viewpoint image and the right-eye imaging unit 101b acquires a second-viewpoint image (S201). In an image encoding device without the left-eye imaging unit 101a and the right-eye imaging unit 101b, the images may be acquired from an external device.
Next, the correction unit 102 executes correction processing on the images acquired from the left-eye imaging unit 101a and the right-eye imaging unit 101b. The specific procedure of the correction processing is described later with reference to FIG. 3C.
Next, the switching units 104 and 108, the motion vector detection unit 105, the encoding unit 106, the reference image memory 107, the variable-length encoding unit 109, and the encoding mode control unit 110 encode the images corrected by the correction unit 102 and multiplexed by the multiplexing unit 103 (S203). The specific procedure of the encoding processing is described later with reference to FIG. 3B.
FIG. 3B is a flowchart showing the procedure of the encoding processing. The operation of the components from the switching unit 104 onward will be described in detail with reference to FIG. 3B.
The switching unit 104 obtains from the encoding mode control unit 110 the encoding type of the image to be encoded that it has acquired from the multiplexing unit 103. When the encoding type is an intra-frame prediction frame (I picture), the switching unit 104 outputs the image to the intra-frame encoding unit 114 of the encoding unit 106. When the encoding type is an inter-frame prediction frame (P picture or B picture), the switching unit 104 outputs the image to the motion vector detection unit 105 at the same time as to the intra-frame encoding unit 114.
That is, the image to be encoded is always intra-frame encoded by the intra-frame encoding unit 114 (S301). In addition, when the encoding mode control unit 110 judges the frame to be an inter-frame prediction frame (Yes in S302), motion vectors are detected by the motion vector detection unit 105 in addition to the intra-frame encoding (S303).
The intra-frame encoding unit 114 intra-frame encodes the input image (S301). Specifically, the intra-frame encoding unit 114 performs intra-frame prediction for each block making up the image (the target block) to generate a prediction block, subtracts the prediction block from the target block to calculate the prediction error (residual signal), and orthogonally transforms and quantizes the prediction error to calculate quantized coefficients. The quantized coefficients and encoding information obtained are output to the switching unit 108. Furthermore, the intra-frame encoding unit 114 inverse-quantizes and inverse-orthogonally transforms the quantized coefficients and adds the prediction block to create a locally decoded image. This locally decoded image is stored in the reference image memory 107 as a reference image for subsequent inter-frame prediction frames.
The motion vector detection unit 105 detects motion vectors between the two viewpoint videos making up the stereoscopic video corrected by the correction unit 102 by block matching the target block against the reference image. Specifically, the motion vector detection unit 105 detects a motion vector for each region (block) smaller than the entire area of the stereoscopic video corrected by the correction unit 102.
More specifically, the motion vector detection unit 105 obtains the locally decoded image designated by the encoding mode control unit 110 from the reference image memory 107, performs block matching of the target block using the obtained locally decoded image as the reference image, and detects a motion vector for each block (S303). There may be one reference image designated by the encoding mode control unit 110, or several.
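A minimal full-search block matching sketch follows; the 16x16 block size and the +-16 search range are illustrative assumptions, and the sum of absolute differences (SAD) stands in for whatever matching cost the device actually uses.

```python
import numpy as np

def find_motion_vector(target, reference, bx, by, bs=16, sr=16):
    """Returns (dx, dy) minimizing SAD for the block at (bx, by) in 'target'."""
    h, w = reference.shape
    block = target[by:by + bs, bx:bx + bs].astype(np.int32)
    best, best_mv = None, (0, 0)
    for dy in range(-sr, sr + 1):
        for dx in range(-sr, sr + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + bs > w or y + bs > h:
                continue  # candidate block would fall outside the reference image
            cand = reference[y:y + bs, x:x + bs].astype(np.int32)
            sad = np.abs(block - cand).sum()
            if best is None or sad < best:
                best, best_mv = sad, (dx, dy)
    return best_mv
```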
The motion vector detection unit 105 then outputs the detected motion vectors to the inter-frame encoding unit 115 of the encoding unit 106. Furthermore, when the reference image and the image being encoded are images of different viewpoints (Yes in S304), the motion vector detection unit 105 outputs the obtained motion vectors to the correction value calculation unit 111 of the correction unit 102 (S305).
Using FIG. 2A as an example, the motion vector detection unit 105 outputs to the correction value calculation unit 111 the motion vectors of frame F1 detected using frame F0 as the reference image, the motion vectors of frame F5 detected using frame F4 as the reference image, the motion vectors of frame F7 detected using frame F6 as the reference image, and so on.
The inter-frame encoding unit 115 inter-frame encodes the input image. Specifically, for each block making up the image (the target block), the inter-frame encoding unit 115 performs motion compensation using the motion vector obtained from the motion vector detection unit 105 to generate a prediction block, subtracts the prediction block from the target block to calculate the prediction error (residual signal), and orthogonally transforms and quantizes the prediction error to calculate quantized coefficients.
The inter-frame encoding unit 115 then outputs the quantized coefficients and encoding information to the switching unit 108. Furthermore, it inverse-quantizes and inverse-orthogonally transforms the quantized coefficients and adds the prediction block to create a locally decoded image, which is stored in the reference image memory 107 as a reference image for subsequent inter-frame prediction frames.
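The prediction/residual round trip shared by the intra-frame and inter-frame encoding units can be sketched as below. This is a simplification under stated assumptions: a plain uniform quantizer on the residual stands in for the orthogonal transform plus quantization, and the local decode mirrors it so that later frames predict from the same pixels a decoder would reconstruct.

```python
import numpy as np

def encode_block(target_block, prediction_block, qstep=8):
    residual = target_block.astype(np.int32) - prediction_block.astype(np.int32)
    quantized = np.round(residual / qstep).astype(np.int32)  # quantized coefficients
    # Local decode: reconstruct the block exactly as a decoder would,
    # so that subsequent frames predict from identical reference pixels.
    local_decode = prediction_block.astype(np.int32) + quantized * qstep
    return quantized, np.clip(local_decode, 0, 255).astype(np.uint8)
```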
When the image being encoded is an inter-frame prediction frame (Yes in S302), the encoding mode control unit 110 judges, for each target block and using a known evaluation formula, whether to encode it with intra-frame prediction encoding or inter-frame prediction encoding, based on the quantized coefficients and other encoding information output from the encoding unit 106 (the intra-frame encoding unit 114 and the inter-frame encoding unit 115), and controls the switching unit 108 accordingly.
Following the control (judgment result) of the encoding mode control unit 110, the switching unit 108 outputs to the variable-length encoding unit 109 one of the two sets of quantized coefficients obtained from the intra-frame encoding unit 114 and the inter-frame encoding unit 115 (S306).
The variable-length encoding unit 109 variable-length encodes the quantized coefficients and encoding information obtained from the switching unit 108 and outputs the result as encoded data (S307). The image encoding device 100 executes the above processing (S301 to S307) for all blocks making up the image to be encoded (S308).
Next, the processing of the correction unit 102 will be described in detail with reference to FIG. 3C and FIGS. 4A to 4D. FIG. 3C is a flowchart showing the processing procedure of the correction unit 102. FIGS. 4A to 4D are diagrams showing the relationship between the imaging misalignment between the first- and second-viewpoint images and the tendency of the motion vectors.
First, when motion vectors have been detected using inter-view references (S310), the correction value calculation unit 111 aggregates those motion vectors, detects the type and magnitude of the imaging misalignment between the first- and second-viewpoint images, and calculates the corresponding correction value (S311).
Specifically, the correction value calculation unit 111 first removes the influence of parallax from the motion vectors detected by the motion vector detection unit 105. Since parallax is a horizontal displacement, any vertical component is imaging misalignment. In addition, an object at the convergence point, the point where the lens optical axes of the left- and right-eye imaging units converge as set at the time of shooting, has parallax approximately equal to zero, so a motion vector detected for an object at the convergence point represents imaging misalignment. Accordingly, an object at the convergence point can be photographed from the two different viewpoints, and the motion vector detected between the two captured images can be regarded as the imaging misalignment.
Also, objects at the same distance from the imaging units have parallax of the same direction and the same magnitude. Therefore, for example, a motion vector (direction and magnitude) corresponding to the parallax may be set in the correction value calculation unit 111 in advance.
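A minimal sketch of this parallax removal, assuming the per-block vectors sit in a (rows, cols, 2) array and that a single preset horizontal disparity value is available (both are assumptions for illustration):

```python
import numpy as np

def remove_parallax(mv_field, preset_disparity_x=0.0):
    """Subtract the preset horizontal parallax; keep vertical components as-is."""
    shift = np.asarray(mv_field, dtype=np.float64).copy()
    shift[..., 0] -= preset_disparity_x  # parallax is purely horizontal
    # Whatever remains, including every vertical component, is treated as
    # imaging misalignment rather than parallax.
    return shift
```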
FIGS. 4A to 4D illustrate the motion vectors detected by the motion vector detection unit 105, with the second-viewpoint image as the image to be encoded and the first-viewpoint image captured at the same time as the reference image, after the influence of parallax has been removed.
FIG. 4A shows the tendency of the motion vectors when there is no misalignment other than parallax between the image to be encoded and the reference image. In this case the two images coincide, so the motion vector of each block tends to be (0, 0).
Next, FIG. 4B shows the tendency of the motion vectors when the reference image is rotated with respect to the image to be encoded. In this case, the motion vectors of the blocks tend to be arranged so as to describe a circle as a whole. Such a situation arises, for example, when the left-eye imaging unit 101a is installed at a tilt.
Observing FIG. 4B more closely, the motion vectors of the blocks are arranged so as to describe a counterclockwise circle; that is, it can be judged that the reference image is rotated counterclockwise with respect to the image to be encoded. The center of rotation can be estimated as the position where the magnitude of the motion vectors is smallest (in this example, the image center). Furthermore, the degree of rotation (rotation angle) can be estimated from the magnitudes of the motion vectors and their distances from the center of rotation.
Here, the method of calculating the direction of rotation and the correction value when the type of imaging misalignment is rotation will be described. The method shown below is one example, and other calculation methods may also be used.
First, preprocessing is executed prior to calculating the direction of rotation and the correction value. This preprocessing is also executed in common when the type of imaging misalignment is enlargement (reduction) or translation.
Specifically, using the motion vector MV(i, j) of each macroblock, the average motion vector MVave within the frame, the average motion vector MVaveH[j] of each horizontal macroblock line, and the average motion vector MVaveV[i] of each vertical macroblock line are calculated. Let mby be the number of horizontal macroblock lines, mbx the number of vertical macroblock lines, and MB the number of macroblocks in one frame. In the examples of FIGS. 4B to 4D, mbx = 4, mby = 3, and MB = 12.
Normally, for motion vectors in image coding, the horizontal component (x component) is positive toward the right of the image and negative toward the left, while the vertical component (y component) is positive toward the bottom of the image and negative toward the top.
The average motion vector MVave within the frame can be calculated using Equation 1. In the examples of FIGS. 4B to 4D, it is the average of the 12 motion vectors.
$$\mathrm{MVave} = \frac{1}{MB}\sum_{j=0}^{mby-1}\sum_{i=0}^{mbx-1} MV(i,j) \qquad \text{(Equation 1)}$$
The average motion vector MVaveH[j] of each horizontal macroblock line can be calculated using Equation 2. In the examples of FIGS. 4B to 4D, the average motion vector of each of the three horizontal macroblock lines (rows) is calculated.
$$\mathrm{MVaveH}[j] = \frac{1}{mbx}\sum_{i=0}^{mbx-1} MV(i,j) \qquad \text{(Equation 2)}$$
The average motion vector MVaveV[i] of each vertical macroblock line can be calculated using Equation 3. In the examples of FIGS. 4B to 4D, the average motion vector of each of the four vertical macroblock lines (columns) is calculated.
$$\mathrm{MVaveV}[i] = \frac{1}{mby}\sum_{j=0}^{mby-1} MV(i,j) \qquad \text{(Equation 3)}$$
In the above example, the average motion vectors are calculated in units of one macroblock line in both the horizontal and vertical directions; however, the invention is not limited to this, and the average motion vectors may be calculated in units of several macroblock lines.
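A minimal sketch of this preprocessing (Equations 1 to 3), assuming the per-macroblock vectors are held in an array of shape (mby, mbx, 2):

```python
import numpy as np

def line_averages(mv):
    mby, mbx, _ = mv.shape
    mv_ave = mv.reshape(mbx * mby, 2).mean(axis=0)  # Equation 1: frame average
    mv_ave_h = mv.mean(axis=1)                      # Equation 2: one average per row j
    mv_ave_v = mv.mean(axis=0)                      # Equation 3: one average per column i
    return mv_ave, mv_ave_h, mv_ave_v
```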
Next, the method of judging the direction of rotation using the above average motion vectors and Equations 4 to 8 below will be described. First, when Vflag, Hflag, and Aflag in Equations 4 to 6 below are all true, the rotation direction of the reference image with respect to the image to be encoded is judged to be counterclockwise.
$$\mathrm{Vflag}:\quad \mathrm{MVaveV}[0]_y > \mathrm{MVaveV}[1]_y > \cdots > \mathrm{MVaveV}[mbx-1]_y \;\text{ and }\; \left|\mathrm{MVaveV}[i]_x\right| \le VTh \text{ for all } i \qquad \text{(Equation 4)}$$
Vflag in Equation 4 is true when the vertical component y of MVaveV decreases monotonically and the horizontal component x of MVaveV is at or below the threshold VTh. Since a value close to 0 is set as the threshold VTh, the latter condition of Equation 4 can also be read as the horizontal component x of MVaveV being approximately 0.
$$\mathrm{Hflag}:\quad \mathrm{MVaveH}[0]_x < \mathrm{MVaveH}[1]_x < \cdots < \mathrm{MVaveH}[mby-1]_x \;\text{ and }\; \left|\mathrm{MVaveH}[j]_y\right| \le HTh \text{ for all } j \qquad \text{(Equation 5)}$$
Hflag in Equation 5 is true when the horizontal component x of MVaveH increases monotonically and the vertical component y of MVaveH is at or below the threshold HTh. Since a value close to 0 is set as the threshold HTh, the latter condition of Equation 5 can also be read as the vertical component y of MVaveH being approximately 0.
$$\mathrm{Aflag}:\quad \left|\mathrm{MVave}\right| \le FTh \qquad \text{(Equation 6)}$$
Aflag in Equation 6 is true when the average motion vector MVave of the frame is at or below the threshold FTh. Since a value close to 0 is set as the threshold FTh, Equation 6 can also be read as MVave being approximately 0.
On the other hand, when Aflag in Equation 6 above and Vflag and Hflag in Equations 7 and 8 below are all true, the rotation direction of the reference image with respect to the image to be encoded is judged to be clockwise.
$$\mathrm{Vflag}:\quad \mathrm{MVaveV}[0]_y < \mathrm{MVaveV}[1]_y < \cdots < \mathrm{MVaveV}[mbx-1]_y \;\text{ and }\; \left|\mathrm{MVaveV}[i]_x\right| \le VTh \text{ for all } i \qquad \text{(Equation 7)}$$
Vflag in Equation 7 is true when the vertical component y of MVaveV increases monotonically and the horizontal component x of MVaveV is at or below the threshold VTh; as above, the latter condition can be read as x being approximately 0.
$$\mathrm{Hflag}:\quad \mathrm{MVaveH}[0]_x > \mathrm{MVaveH}[1]_x > \cdots > \mathrm{MVaveH}[mby-1]_x \;\text{ and }\; \left|\mathrm{MVaveH}[j]_y\right| \le HTh \text{ for all } j \qquad \text{(Equation 8)}$$
Hflag in Equation 8 is true when the horizontal component x of MVaveH decreases monotonically and the vertical component y of MVaveH is at or below the threshold HTh; as above, the latter condition can be read as y being approximately 0.
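A minimal sketch of the direction decision of Equations 4 to 8 follows; the threshold values and the monotonicity helper are illustrative assumptions, and the inputs are the averages computed in the preprocessing sketch above.

```python
import numpy as np

def monotonic(a, increasing):
    d = np.diff(a)
    return bool(np.all(d > 0)) if increasing else bool(np.all(d < 0))

def rotation_direction(mv_ave, mv_ave_h, mv_ave_v, VTh=1.0, HTh=1.0, FTh=1.0):
    a_flag = np.linalg.norm(mv_ave) <= FTh                   # Equation 6
    v_x_small = np.all(np.abs(mv_ave_v[:, 0]) <= VTh)
    h_y_small = np.all(np.abs(mv_ave_h[:, 1]) <= HTh)
    # Equations 4 and 5: counterclockwise pattern.
    if a_flag and monotonic(mv_ave_v[:, 1], False) and v_x_small \
            and monotonic(mv_ave_h[:, 0], True) and h_y_small:
        return "counterclockwise"
    # Equations 7 and 8: clockwise pattern.
    if a_flag and monotonic(mv_ave_v[:, 1], True) and v_x_small \
            and monotonic(mv_ave_h[:, 0], False) and h_y_small:
        return "clockwise"
    return None
```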
 Next, a method of calculating the correction value (rotation angle) using the average motion vectors above and Equations 9 and 10 below will be described. Specifically, the rotation angle X is obtained from the motion vector averages of the horizontal macroblock lines using Equation 9, and the rotation angle Y is obtained from the motion vector averages of the vertical macroblock lines using Equation 10. The number of horizontal and vertical pixels in each macroblock is px = 16.
[Equation 9]
 However, when the number of horizontal macroblock lines mby is odd, the motion vector average of the central horizontal macroblock line is excluded from the calculation, and mby is reduced by 1 before the calculation is performed.
[Equation 10]
 However, when the number of vertical macroblock lines mbx is odd, the motion vector average of the central vertical macroblock line is excluded from the calculation, and mbx is reduced by 1 before the calculation is performed.
 The rotation angle X of Equation 9 is positive for counterclockwise rotation, while the rotation angle Y of Equation 10 is positive for clockwise rotation.
 Using Equations 9 and 10, the rotation angle (counterclockwise) of the entire frame is obtained as the average of rotation angles X and Y, namely (X - Y) / 2.
 When the frame for which the motion vectors were calculated is to be corrected, the correction value (rotation angle) is this whole-frame rotation angle. Conversely, when the reference frame is to be corrected, the correction value (rotation angle) has the opposite direction.
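 Equations 9 and 10 themselves are reproduced only as images, so their exact form cannot be recovered from this text. The surrounding description (per-line averages, px = 16 pixels per macroblock, exclusion of the centre line when the line count is odd, opposite sign conventions for X and Y, and the frame angle (X - Y) / 2) is, however, consistent with a small-angle model in which a line at distance d pixels from the frame centre is displaced by roughly d times the rotation angle. The sketch below rests on that assumption; it is an illustrative reconstruction, not the patent's literal formula.

    def rotation_angles(mv_ave_h, mv_ave_v, px=16):
        # Assumed small-angle reconstruction of Equations 9 and 10: each
        # line's average displacement divided by its pixel distance from the
        # frame centre approximates the rotation angle in radians.
        def line_angle(avgs, component):
            n = len(avgs)
            offsets = [(i - (n - 1) / 2.0) * px for i in range(n)]
            # Odd line count: the centre line (offset 0) is excluded and the
            # line count is effectively reduced by one, as the text requires.
            pairs = [(o, v) for o, v in zip(offsets, avgs) if o != 0.0]
            return sum(v[component] / o for o, v in pairs) / len(pairs)

        x = line_angle(mv_ave_h, 0)   # Eq. 9: rows -> angle X, CCW-positive
        y = line_angle(mv_ave_v, 1)   # Eq. 10: columns -> angle Y, CW-positive
        return x, y, (x - y) / 2.0    # whole-frame rotation (CCW), per the text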
 Next, FIG. 4C shows the tendency of the motion vectors when the sizes of the reference image and the encoding target image do not match. In this case, the motion vectors of the blocks tend to be arranged radially as a whole. Such a situation occurs, for example, when the imaging magnifications of the left-eye imaging unit 101a and the right-eye imaging unit 101b differ.
 Observing FIG. 4C more closely, the motion vector of each block points toward the image center of the encoding target image. It can therefore be determined that the reference image is reduced relative to the encoding target image. The reduction ratio can be estimated from, for example, the average magnitude of the motion vectors.
 Here, for the case where the type of imaging deviation is enlargement or reduction, the method of determining its direction (enlargement or reduction) and calculating the correction value will be described. The method shown below is only an example, and other calculation methods may also be used.
 First, a method of determining whether the deviation is an enlargement or a reduction using the average motion vectors above and Equations 11 to 15 below will be described. When Vflag, Hflag, and Aflag in Equations 11 to 13 below are all true, the encoding target image is determined to be larger than the reference image (enlargement).
[Equation 11]
 Vflag in Equation 11 is true when the horizontal component x of MVaveV decreases monotonically and the vertical component y of MVaveV is at most the threshold VTh. Since VTh is close to 0, the second condition can be read as the vertical component y of MVaveV being approximately 0.
[Equation 12]
 Hflag in Equation 12 is true when the vertical component y of MVaveH decreases monotonically and the horizontal component x of MVaveH is at most the threshold HTh. Since HTh is close to 0, the second condition can be read as x ≈ 0.
[Equation 13]
 Aflag in Equation 13 is true when the average motion vector MVave of the frame is at most the threshold FTh. Since FTh is close to 0, Equation 13 can also be read as MVave ≈ 0.
 Conversely, when Vflag, Hflag, and Aflag in Equation 13 above and Equations 14 and 15 below are all true, the encoding target image is determined to be smaller than the reference image (reduction).
[Equation 14]
 Vflag in Equation 14 is true when the horizontal component x of MVaveV increases monotonically and the vertical component y of MVaveV is at most the threshold VTh; as above, the second condition can be read as y ≈ 0.
[Equation 15]
 Hflag in Equation 15 is true when the vertical component y of MVaveH increases monotonically and the horizontal component x of MVaveH is at most the threshold HTh; as above, the second condition can be read as x ≈ 0.
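 Equations 11 to 15 mirror the rotation tests above with the roles of the components swapped: the per-column horizontal motion and the per-row vertical motion must both be monotonic while the orthogonal components and the frame average stay near zero. A sketch reusing monotonic, the import of math, and the per-line averages from the earlier blocks; threshold defaults remain placeholders.

    def scale_direction(mv_ave, mv_ave_h, mv_ave_v, v_th=0.5, h_th=0.5, f_th=0.5):
        # Returns "enlarged" (target larger than reference), "reduced", or None.
        a_flag = math.hypot(mv_ave[0], mv_ave[1]) <= f_th      # Eq. 13: MVave ~ 0
        v_y_small = all(abs(v[1]) <= v_th for v in mv_ave_v)   # MVaveV: y ~ 0
        h_x_small = all(abs(v[0]) <= h_th for v in mv_ave_h)   # MVaveH: x ~ 0
        if a_flag and v_y_small and h_x_small:
            xs = [v[0] for v in mv_ave_v]   # per-column horizontal motion
            ys = [v[1] for v in mv_ave_h]   # per-row vertical motion
            if monotonic(xs, decreasing=True) and monotonic(ys, decreasing=True):
                return "enlarged"           # Equations 11 to 13
            if monotonic(xs, decreasing=False) and monotonic(ys, decreasing=False):
                return "reduced"            # Equations 13 to 15
        return None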
 Next, a method of calculating the correction value (reduction or enlargement ratio) using the average motion vectors above and Equations 16 and 17 below will be described. First, the vertical reduction ratio XR is obtained from the motion vector averages of the horizontal macroblock lines using Equation 16, and the horizontal reduction ratio YR is obtained from the motion vector averages of the vertical macroblock lines using Equation 17. The number of pixels in each block is px = 16.
[Equation 16]
 However, when the number of horizontal macroblock lines mby is odd, the motion vector average of the central horizontal macroblock line is excluded from the calculation, and mby is reduced by 1 before the calculation is performed.
[Equation 17]
 However, when the number of vertical macroblock lines mbx is odd, the motion vector average of the central vertical macroblock line is excluded from the calculation, and mbx is reduced by 1 before the calculation is performed.
 Using Equations 16 and 17, the reduction ratio of the entire frame is obtained as the average of the vertical reduction ratio XR and the horizontal reduction ratio YR, namely (XR + YR) / 2.
 When the frame for which the motion vectors were calculated is to be corrected, the correction value is the enlargement ratio. Conversely, when the reference frame is to be corrected, the correction value is its reciprocal.
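 As with Equations 9 and 10, Equations 16 and 17 appear only as images. Under the assumption that a line at offset d from the frame centre whose average displacement is v implies a local scale factor of (d + v) / d, the ratios can be sketched as follows; this too is an illustrative reconstruction rather than the patent's literal formula.

    def scale_ratios(mv_ave_h, mv_ave_v, px=16):
        # Assumed reconstruction of Equations 16 and 17.
        def line_ratio(avgs, component):
            n = len(avgs)
            offsets = [(i - (n - 1) / 2.0) * px for i in range(n)]
            pairs = [(o, v) for o, v in zip(offsets, avgs) if o != 0.0]  # drop centre line if n is odd
            return sum((o + v[component]) / o for o, v in pairs) / len(pairs)

        xr = line_ratio(mv_ave_h, 1)    # Eq. 16: vertical ratio XR from rows
        yr = line_ratio(mv_ave_v, 0)    # Eq. 17: horizontal ratio YR from columns
        frame_ratio = (xr + yr) / 2.0   # whole-frame ratio, per the text
        # Correcting the measured frame uses frame_ratio; correcting the
        # reference frame uses its reciprocal, 1.0 / frame_ratio.
        return xr, yr, frame_ratio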
 Next, FIG. 4D shows the tendency of the motion vectors when the reference image is shifted in one direction (translated) relative to the encoding target image. In this case, the motion vectors of the blocks tend to point in the same direction as a whole. Such a situation occurs, for example, when the left-eye imaging unit 101a does not face precisely toward the subject.
 Observing FIG. 4D more closely, the motion vector of each block points upward. It can therefore be determined that the reference image is shifted upward relative to the encoding target image. The magnitude of the shift can be estimated from, for example, the average magnitude of the motion vectors.
 Here, the method of calculating the correction value for the case where the type of imaging deviation is translation will be described. The method shown below is only an example, and other calculation methods may also be used.
 First, as shown in Equation 18, the average motion vector of the frame is compared with the motion vector of each block, component by component, and the number of blocks cnt whose difference from the average motion vector is within the threshold mvTh is counted.
[Equation 18]
 If cnt is then larger than the threshold MBTh, the type of imaging deviation is determined to be translation. The shift amount of the entire frame is given by the in-frame average motion vector MVave of Equation 1. When the frame for which the motion vectors were calculated is to be corrected, the correction value is the frame motion vector MVave. Conversely, when the reference frame is to be corrected, the correction value is MVave multiplied by -1.
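 Apart from the numeric thresholds, which the text leaves open, this test is fully specified. A minimal sketch with placeholder defaults for mvTh and MBTh:

    def translation_correction(mv, mv_ave, mv_th=1.0, mb_th=100):
        # Equation 18: count blocks whose motion vector is within mv_th of
        # the frame average, component by component.
        cnt = sum(1 for row in mv for v in row
                  if abs(v[0] - mv_ave[0]) <= mv_th and abs(v[1] - mv_ave[1]) <= mv_th)
        if cnt > mb_th:                 # deviation judged to be a translation
            return {"current_frame": mv_ave,
                    "reference_frame": (-mv_ave[0], -mv_ave[1])}  # MVave times -1
        return None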
 As described above, by aggregating how the motion vector magnitude varies with the horizontal or vertical position of each block in the frame, the type and magnitude of the imaging deviation can be determined. The correction value calculation unit 111 calculates a correction value for correcting the detected imaging deviation and outputs it to the image correction unit 112 (S311).
 The types of imaging deviation shown in FIGS. 4A to 4D are examples, and the correction value calculation unit 111 can also detect other types of imaging deviation. The imaging deviations described with reference to FIGS. 4A to 4D may also occur in combination.
 Based on the input correction value, the image correction unit 112 corrects at least one of the first and second viewpoint images captured by the left-eye imaging unit 101a and the right-eye imaging unit 101b so as to eliminate the detected imaging deviation (S312). The images corrected here are images captured at a time (a second time) later than the time (a first time) at which the encoding target image and the reference image shown in FIGS. 4A to 4D were captured.
 When the imaging deviation between the images is a vertical or horizontal translation or a tilt, the image correction unit 112 may perform the correction by extracting the common portion from the images captured at the same time from the first and second viewpoints. When the deviation between the images is a size mismatch, the correction may be performed by enlarging or reducing one of the images captured at the same time from the first and second viewpoints.
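 As one concrete reading of this paragraph, a translation could be corrected by cropping both images to their common region, and a size mismatch by resampling one of them. The patent leaves the exact procedure open, so the NumPy sketch below, including the sign convention for the shift (dx, dy) in pixels, is illustrative only.

    import numpy as np

    def crop_common(img_a, img_b, dx, dy):
        # Keep only the region present in both same-sized images, assuming
        # the content of img_b appears shifted by (dx, dy) relative to img_a.
        h, w = img_a.shape[:2]
        x0, y0 = max(dx, 0), max(dy, 0)
        x1, y1 = w + min(dx, 0), h + min(dy, 0)
        return img_a[y0:y1, x0:x1], img_b[y0 - dy:y1 - dy, x0 - dx:x1 - dx]

    def rescale(img, ratio):
        # Size-mismatch correction by nearest-neighbour resampling.
        h, w = img.shape[:2]
        rows = (np.arange(int(h * ratio)) / ratio).astype(int)
        cols = (np.arange(int(w * ratio)) / ratio).astype(int)
        return img[rows][:, cols]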
 Although an example in which the imaging deviation between the images is corrected by image processing is described here, the deviation may instead be corrected by changing the imaging settings of the left-eye imaging unit 101a and the right-eye imaging unit 101b, such as their imaging positions and zoom.
 With the configuration described above, correcting the imaging deviation between the images based on the tendency of the detected motion vectors makes it possible to improve the encoding efficiency of the stereoscopic video data. Moreover, since the imaging deviation between the viewpoint images is eliminated, stereoscopic image data that is easy to view stereoscopically and does not cause eye fatigue can be obtained when the encoded stereoscopic image data is reproduced and viewed. Furthermore, no additional image deviation detection means is needed to detect deviations other than the parallax between the viewpoint images, so the image encoding device can achieve low power consumption while avoiding an increase in circuit scale.
 (Modification of Embodiment 1)
 In the processing above, the images within a predetermined period after the start of processing have not yet undergone sufficient correction. As a result, images with residual imaging deviation would be encoded and output as the output stream.
 Therefore, even when the switching unit 108 acquires quantization coefficients and encoding information from the encoding unit 106, it may discard them rather than immediately outputting them to the variable length encoding unit 109.
 Specifically, the switching unit 108 may discard the encoding information acquired between the start of processing (that is, the point at which the first encoding information is acquired) and the elapse of a predetermined period, and output only the encoding information acquired after the predetermined period to the variable length encoding unit 109. The predetermined period may last, for example, until N frames of encoding information have been acquired (N being an integer of 1 or more).
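 A minimal sketch of this discard behaviour; the class layout and the callable standing in for the variable length encoding unit 109 are illustrative choices, not structures defined by the patent.

    class SwitchingUnit:
        # Encoded data for the first n_skip frames is discarded; forwarding
        # to the variable length encoder starts only afterwards.
        def __init__(self, vlc_output, n_skip):
            self.vlc_output = vlc_output   # callable forwarding to unit 109
            self.remaining = n_skip

        def on_frame_encoded(self, coeffs, info):
            if self.remaining > 0:
                self.remaining -= 1        # still settling: discard the frame
                return
            self.vlc_output(coeffs, info)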
 As another example, the correction unit 102 notifies the encoding control unit 116 of whether an imaging deviation exists between the first and second viewpoint images (that is, whether the correction value is 0). When the encoding control unit 116 determines that the imaging deviation between the images has disappeared (the correction value has become 0), it inputs an encoding start signal to the switching unit 108. The switching unit 108 may then begin outputting the quantization coefficients and encoding information received from the encoding unit 106 to the variable length encoding unit 109 only after receiving the encoding start signal from the encoding control unit 116.
 With the above configuration, only stereoscopic image data free of imaging deviation is encoded, making it possible to obtain encoded image data that is easy to view stereoscopically and does not cause eye fatigue.
 (Embodiment 2)
 Next, an image encoding device according to Embodiment 2 of the present invention will be described with reference to FIGS. 5A and 5B. FIG. 5A is a block diagram of an image encoding device 200 according to Embodiment 2 of the present invention. FIG. 5B is a flowchart showing the preprocessing procedure of the image encoding device 200.
 The image encoding device 200 according to Embodiment 2 has the configuration of the image encoding device 100 according to Embodiment 1 with an encoding control unit 116 added. The operation of all parts other than the correction unit 102 and the switching unit 108 is the same as in Embodiment 1, so the following description focuses on the parts that operate differently.
 First, the left-eye imaging unit 101a and the right-eye imaging unit 101b of the image encoding device 200 start imaging in response to the power being turned on (Yes in S401). The timing at which imaging starts is not limited to this example, but imaging is assumed to start before the start of the encoding process is instructed.
 Next, the motion vector detection unit 105 acquires the images captured by the left-eye imaging unit 101a and the right-eye imaging unit 101b via the correction unit 102, the multiplexing unit 103, and the switching unit 104 (S402), detects the motion vectors of each acquired image (S403), and stores only the latest motion vectors.
 This processing (S402, S403) is repeated until the start of the encoding process is instructed. At this stage, however, the correction unit 102 does not execute the correction process. The latest motion vectors may also be stored in the correction unit 102 instead of the motion vector detection unit 105.
 Then, in response to an instruction to start the encoding process (Yes in S404), the image encoding device 200 executes the main processing shown in FIG. 3A (S405). Embodiment 2 differs from Embodiment 1, however, in that the correction process is executed on the first image acquired immediately after the start of the encoding process is instructed, using the latest motion vectors detected by the motion vector detection unit 105 (S202 in FIG. 3A).
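 The S401 to S405 flow can be summarized in a short sketch; every callable here is an illustrative stand-in for the corresponding unit in FIG. 5A rather than an interface defined by the patent.

    def preprocess_then_encode(capture_pair, start_requested, detect_mv, correct, encode):
        # Track motion vectors from power-on, keep only the latest set, and
        # correct the first frame after the start instruction with that set.
        latest_mv = None
        while not start_requested():             # S402/S403 loop, no correction yet
            left, right = capture_pair()
            latest_mv = detect_mv(left, right)   # overwrite: keep newest only
        left, right = capture_pair()             # first frame after the instruction
        left, right = correct(left, right, latest_mv)   # S202 in FIG. 3A
        return encode(left, right)               # hand off to the main processing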
 With this configuration, the correction process can be executed even on the images immediately after the start of encoding, so only images from which the imaging deviation has been removed are encoded. As a result, encoded image data that is easy to view stereoscopically and does not cause eye fatigue can be obtained.
 (Embodiment 3)
 Embodiment 3 of the present invention has the same configuration as Embodiment 2, but the operation of the left-eye imaging unit 101a and the right-eye imaging unit 101b differs. Specifically, the left-eye imaging unit 101a and the right-eye imaging unit 101b according to Embodiment 3 each capture still images continuously and output the captured images (still images) in order to the correction unit 102, which differs from Embodiment 2.
 Until the start of the encoding process is instructed, the correction unit 102 outputs the images acquired from the left-eye imaging unit 101a and the right-eye imaging unit 101b to the multiplexing unit 103 without executing the correction process. The motion vector detection unit 105 acquires the images captured by the left-eye imaging unit 101a and the right-eye imaging unit 101b via the correction unit 102, the multiplexing unit 103, and the switching unit 104, detects the motion vectors of each acquired image, and stores only the latest motion vectors. The correction unit 102 then executes the correction process on the image acquired immediately after the start of the correction process is instructed, using the latest motion vectors detected by the motion vector detection unit 105 immediately beforehand.
 This configuration enables highly efficient encoding of still stereoscopic images. It also makes it possible to encode only still stereoscopic image data free of imaging deviation between the images, so encoded still image data that is easy to view stereoscopically and unlikely to cause eye fatigue can be obtained.
 The left-eye imaging unit 101a and the right-eye imaging unit 101b may each have an internal image memory, capture one still image each, store it in that image memory, and repeatedly output the same image (still image) to the correction unit 102.
 (Other Modifications)
 Although the present invention has been described based on the above embodiments, it is of course not limited to those embodiments. The following cases are also included in the present invention.
 Each of the above devices is, specifically, a computer system including a microprocessor, a ROM, a RAM, a hard disk unit, a display unit, a keyboard, a mouse, and the like. A computer program is stored in the RAM or the hard disk unit. Each device achieves its functions by the microprocessor operating in accordance with the computer program. Here, the computer program is composed of a combination of multiple instruction codes indicating instructions to the computer in order to achieve a predetermined function.
 Some or all of the components constituting each of the above devices may be configured as a single system LSI (Large Scale Integration). A system LSI is an ultra-multifunctional LSI manufactured by integrating multiple components on a single chip, and is specifically a computer system including a microprocessor, a ROM, a RAM, and the like. A computer program is stored in the RAM. The system LSI achieves its functions by the microprocessor operating in accordance with the computer program.
 Some or all of the components constituting each of the above devices may be configured as an IC card or a single module attachable to and detachable from each device. The IC card or module is a computer system including a microprocessor, a ROM, a RAM, and the like, and may include the ultra-multifunctional LSI described above. The IC card or module achieves its functions by the microprocessor operating in accordance with the computer program. The IC card or module may be tamper resistant.
 The present invention may be the methods described above. It may also be a computer program that realizes these methods on a computer, or a digital signal composed of such a computer program.
 The present invention may also be the computer program or the digital signal recorded on a computer-readable recording medium, such as a flexible disk, a hard disk, a CD-ROM, an MO, a DVD, a DVD-ROM, a DVD-RAM, a BD (Blu-ray Disc), or a semiconductor memory. It may also be the digital signal recorded on such a recording medium.
 The present invention may also transmit the computer program or the digital signal via an electric telecommunication line, a wireless or wired communication line, a network typified by the Internet, data broadcasting, or the like.
 The present invention may also be a computer system including a microprocessor and a memory, in which the memory stores the computer program and the microprocessor operates in accordance with the computer program.
 The invention may also be implemented by another independent computer system by recording the program or the digital signal on a recording medium and transporting it, or by transferring the program or the digital signal via a network or the like.
 The above embodiments and the above modifications may be combined with one another.
 Although embodiments of the present invention have been described above with reference to the drawings, the present invention is not limited to the illustrated embodiments. Various modifications and variations can be made to the illustrated embodiments within a scope identical or equivalent to that of the present invention.
 The stereoscopic video encoding device and method of the present invention are useful for encoding, compressing, recording, storing, and transferring stereoscopic image data in digital cameras supporting stereoscopic video or stereoscopic still images, digital video cameras, camera-equipped mobile phones, DVD/BD recorders, televisions with program recording, web cameras, program distribution servers, and the like.
 1 Block matching unit
 2 Parallax compensation vector detection unit
 3, 6 Memory
 4 Correction vector detection unit
 5 Variable delay unit
 10, 100, 200 Image encoding device
 101a Left-eye imaging unit
 101b Right-eye imaging unit
 102 Correction unit
 103 Multiplexing unit
 104, 108 Switching unit
 105 Motion vector detection unit
 106 Encoding unit
 107 Reference image memory
 109 Variable length encoding unit
 110 Encoding mode control unit
 111 Correction value calculation unit
 112 Image correction unit
 114 Intra-screen encoding unit
 115 Inter-screen encoding unit
 116 Encoding control unit

Claims (12)

  1.  An image encoding device that encodes stereoscopic video composed of video from at least two viewpoints, the device comprising:
     an acquisition unit that acquires the stereoscopic video;
     a correction unit that executes a correction process for correcting a deviation in the size or position, at the time of display, of a subject shown in the stereoscopic video acquired by the acquisition unit;
     a motion vector detection unit that detects motion vectors between the videos of the two viewpoints constituting the stereoscopic video corrected by the correction unit; and
     an encoding unit that compresses and encodes the stereoscopic video corrected by the correction unit, based on the motion vectors detected by the motion vector detection unit,
     wherein the correction unit executes the current correction process based on motion vectors detected by the motion vector detection unit before that correction process.
  2.  The image encoding device according to claim 1, wherein the correction unit corrects, based on the motion vectors, at least one of a deviation due to rotation between the two viewpoints of the subject shown in the stereoscopic video, a deviation due to enlargement, and a deviation due to translation.
  3.  The image encoding device according to claim 2, wherein the correction unit detects, based on the directions of the motion vectors, at least one of the deviation due to rotation, the deviation due to enlargement, and the deviation due to translation, and corrects the deviation indicated by the detection result.
  4.  The image encoding device according to claim 3, wherein the correction unit detects the deviation due to translation based on the vertical components of the motion vectors.
  5.  The image encoding device according to any one of claims 2 to 4, wherein the motion vector detection unit detects the motion vectors for each region smaller than the entire region of the stereoscopic video corrected by the correction unit, and the correction unit detects the deviation due to rotation or the deviation due to enlargement based on the tendency indicated by the directions of the plurality of motion vectors detected by the motion vector detection unit.
  6.  The image encoding device according to claim 5, wherein the correction unit detects the deviation due to enlargement when the plurality of motion vectors show a tendency to converge toward a predetermined position in the stereoscopic video or to spread out from that position.
  7.  The image encoding device according to claim 5, wherein the correction unit detects the deviation due to rotation when the plurality of motion vectors show a tendency to describe a circle in the stereoscopic video.
  8.  The image encoding device according to any one of claims 1 to 7, wherein the encoding unit starts outputting the compression-encoded stereoscopic video in response to the elapse of a predetermined period after the start of encoding is instructed.
  9.  The image encoding device according to any one of claims 1 to 7, wherein the motion vector detection unit further starts detecting motion vectors between the stereoscopic videos acquired by the acquisition unit before the start of encoding is instructed, and the correction unit executes the correction process on the first stereoscopic image acquired by the acquisition unit immediately after the start of the encoding process is instructed, using the latest motion vectors detected by the motion vector detection unit.
  10.  The image encoding device according to any one of claims 1 to 9, wherein the acquisition unit includes:
     a first imaging unit that images a subject from a first viewpoint; and
     a second imaging unit that images the subject from a second viewpoint.
  11.  The image encoding device according to claim 10, wherein the motion vector detection unit detects a motion vector for each block of an encoding target image, taking one of the images captured at a first time from the first and second viewpoints as the encoding target image and the other as a reference image, and the correction unit corrects, based on the tendency of the plurality of motion vectors corresponding to the blocks of the encoding target image, a deviation in the size or position of the subject at the time of display for at least one of the images captured at a second time, later than the first time, from the first and second viewpoints.
  12.  An image encoding method for encoding stereoscopic video composed of video from at least two viewpoints, the method comprising:
     an acquisition step of acquiring the stereoscopic video;
     a correction step of executing a correction process for correcting a deviation in the size or position, at the time of display, of a subject shown in the stereoscopic video acquired in the acquisition step;
     a motion vector detection step of detecting motion vectors between the videos of the two viewpoints constituting the stereoscopic video corrected in the correction step; and
     an encoding step of compressing and encoding the stereoscopic video corrected in the correction step, based on the motion vectors detected in the motion vector detection step,
     wherein in the correction step, the current correction process is executed based on motion vectors detected in the motion vector detection step before that correction process.
PCT/JP2011/000875 2010-02-18 2011-02-17 Image encoding device, image encoding method, program and integrated circuit WO2011102131A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2010033952 2010-02-18
JP2010-033952 2010-02-18

Publications (1)

Publication Number Publication Date
WO2011102131A1 true WO2011102131A1 (en) 2011-08-25

Family

ID=44482736

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2011/000875 WO2011102131A1 (en) 2010-02-18 2011-02-17 Image encoding device, image encoding method, program and integrated circuit

Country Status (1)

Country Link
WO (1) WO2011102131A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06269025A (en) * 1993-03-16 1994-09-22 Fujitsu Ltd Coding system for multi-eye stereoscopic video image
JPH10233958A (en) * 1997-02-20 1998-09-02 Nippon Telegr & Teleph Corp <Ntt> Method for estimating camera parameter



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 11744422; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 11744422; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: JP)