JP2014116922A - Video playback device and video distribution device

Info

Publication number: JP2014116922A
Application number: JP2013108404A
Authority: JP
Inventors: Akio Kameda; 明男亀田; Katsuhiko Fukazawa; 勝彦深澤; Hideaki Kimata; 英明木全; Akira Kojima; 明小島; Yoshie Yamaguchi; 好江山口; Daisuke Ochi; 大介越智; Yasuaki Tanaka; 康暁田中; Hajime Noto; 肇能登
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2012-11-19
Filing date: 2013-05-22
Publication date: 2014-06-26
Anticipated expiration: 2033-05-22
Also published as: JP6006680B2

Abstract

PROBLEM TO BE SOLVED: To enable video viewing in which no part of the video to be played back goes missing, even if the screen position and/or screen size displayed during viewing changes. SOLUTION: A video playback device receives, from a video distribution device, coded video data of a video playback region, which is a designated partial region of the overall video data, and plays it back. The video playback device includes: distribution request means for requesting the video distribution device to distribute first video data of the video playback region, having the resolution necessary for displaying video on a designated screen, and second video data of a video playback region that includes the overall video data at the minimum resolution used for display on the video playback device; reception means for receiving the coded video data obtained by coding the first video data and the second video data and distributed from the video distribution device; and display means for decoding the received coded video data and superposing the first video data on the second video data for display on the designated screen.

Description

The present invention relates to a video playback device and a video distribution device that control video quality when distributing and playing back video, in order to deliver high-definition video to low-spec playback terminals.

In recent years, research has been conducted on technologies that let viewers watch high-definition video whose resolution greatly exceeds HD quality, such as 4K video, 8K video, and huge panoramic video, while freely changing the viewing position and size (see, for example, Non-Patent Document 1). Non-Patent Document 1 targets huge panoramic video with a resolution exceeding HD quality. It first divides the panoramic video into a plurality of tiles and encodes the video tile by tile. FIG. 10 is an explanatory diagram showing an example in which a huge panoramic video is divided into a plurality of tiles. The encoded video data of the tiles is then combined into a single panoramic video stream in accordance with the international standard H.264/MVC format.

When a viewer watches such a panoramic video while freely changing the position and size as described above, only the tiles that contain the area being viewed (the display area) are read from this single panoramic video stream and decoded, and the display area is clipped from the decoded image and shown on the display device (see FIG. 11). FIG. 11 is an explanatory diagram showing an example in which 3 tiles vertically by 4 tiles horizontally are read and decoded, and the display image portion is clipped and displayed.
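The tile selection described above can be sketched as follows. This is a minimal illustration, not the method of Non-Patent Document 1: the function name `tiles_covering`, the zero-based (column, row) indexing, and the pixel-coordinate convention are all assumptions made for the example.

```python
def tiles_covering(x, y, w, h, tile_w, tile_h, cols, rows):
    """Return (col, row) indices of every tile that intersects the display area.

    (x, y) is the top-left corner of the display area in panorama pixels and
    (w, h) its size. Tiles are assumed to be a regular, non-overlapping grid
    of cols x rows tiles, each tile_w x tile_h pixels.
    """
    if w <= 0 or h <= 0:
        return []
    first_col = max(0, x // tile_w)
    first_row = max(0, y // tile_h)
    # The last pixel of the display area is at x + w - 1 (and y + h - 1).
    last_col = min(cols - 1, (x + w - 1) // tile_w)
    last_row = min(rows - 1, (y + h - 1) // tile_h)
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


# A 300 x 200 display area placed off the tile grid touches 4 x 3 = 12 tiles,
# the same count as the FIG. 11 example.
needed = tiles_covering(150, 120, 300, 200, tile_w=100, tile_h=100, cols=8, rows=6)
print(len(needed))
```

Every tile in the returned list has to be decoded before clipping, which is why the decoder load grows with the size of the display area.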

When the position or size of the display image changes through the viewer's operation, the system follows the change by shifting which tiles are read, or by increasing or decreasing their number according to the size, and decodes them, so that display can continue without interruption.

Hideaki Kimata, Shinya Shimizu, Yutaka Kunita, Megumi Isogai, and Yoshimitsu Ohtani: "Panorama video coding for user-driven interactive video application", IEEE International Symposium on Consumer Electronics (ISCE) 2009, 2009

Incidentally, the system of Non-Patent Document 1 can be implemented in software on a general-purpose computer, and has the following characteristics (a) and (b).
(a) Obtaining a display image normally requires decoding a plurality of tiles.
(b) When the position or size of the display image changes, the number of tiles to be decoded normally increases or decreases.

However, (a) requires a high-spec playback device capable of decoding a plurality of tiles in real time. A low-spec playback device lacks the decoding capacity, so some of the tiles needed for the display image cannot be decoded, and as a result part of the video is missing.

If dedicated hardware decoders are introduced to solve this problem, characteristic (b) means that, in the extreme case where the entire high-definition video is the display image, as many dedicated decoders as there are tiles are required; a low-spec playback device, however, generally has only about one. To obtain the display image shown in FIG. 11, for example, only one of the 12 tiles (3 vertical × 4 horizontal) could be decoded, and again part of the video would be missing.

The present invention has been made in view of these circumstances, and its object is to provide a video playback device and a video distribution device that allow video to be viewed without any part of the video to be played back going missing, even if the screen position or screen size displayed during viewing changes.

The present invention is a video playback device that receives, from a video distribution device, and plays back encoded video data of a video playback region that is a designated partial region of overall video data, the video playback device comprising: distribution request means for requesting the video distribution device to distribute first video data of the video playback region at the resolution necessary for displaying video on a designated screen, and second video data of the video playback region that includes the overall video data at the lowest resolution used for display on the video playback device; receiving means for receiving the encoded video data that the video distribution device produces by encoding the first video data and the second video data respectively and then distributes; and display means for decoding the received encoded video data and displaying the first video data and the second video data superimposed on the designated screen.

In the present invention, when the spatial position of the screen on the overall video data or the size of the screen is changed, the encoded video data to be displayed on the new screen is received from the video distribution device, decoded, and displayed on the changed screen.

The present invention is also a video distribution device that distributes, to a video playback device, encoded video data of a video playback region that is a designated partial region of overall video data, the video distribution device comprising: distribution data generating means for generating, from the overall video data, first video data of the video playback region at the resolution necessary for displaying video on a screen designated by the video playback device, and second video data of the video playback region that includes the overall video data at the lowest resolution used for display on the video playback device; encoding means for encoding the first video data and the second video data respectively to generate the encoded video data; and video distribution means for distributing the encoded video data to the video playback device.

The present invention further comprises video data storage means for storing a plurality of pieces of the encoded video data obtained by encoding in advance the second video data and video data that satisfies a predetermined condition and is a candidate for the first video data, wherein the video distribution means reads the encoded video data from the video data storage means and distributes it to the video playback device.

In the present invention, the video data storage means stores only the encoded video data whose data amount is the minimum.

In the present invention, the distribution data generating means generates a plurality of pieces of the first video data from the overall video data based on the number of required resolutions and the position shift amount of the video playback region, and the encoding means encodes only the first video data and the second video data designated by the video playback device.

According to the present invention, video data at a resolution matched to the viewer's preferred screen position and size is displayed superimposed on video data at the lowest resolution that contains the entire video, so that no part of the picture is lost even when the displayed position or size changes during viewing. The video can therefore be viewed without any part of the video to be played back going missing.

FIG. 1 is a block diagram showing the configuration of the first embodiment of the present invention.
FIG. 2 is an explanatory diagram showing an example of the view configuration of the present invention.
FIG. 3 is a diagram showing the operation of the video distribution device 1 and the video playback device 2 shown in FIG. 1.
FIG. 4 is a diagram showing the definition of each parameter.
FIG. 5 is a diagram showing the problem of video at the optimal resolution being missing.
FIG. 6 is a flowchart showing the processing that extracts candidates for the number of resolutions n in order to limit the number of views to be encoded.
FIG. 7 is a block diagram showing the configuration of the second embodiment of the present invention.
FIG. 8 is a diagram showing the operation of the video distribution device 1 and the video playback device 2 shown in FIG. 7.
FIG. 9 is a flowchart showing the processing that determines the view configuration with the smallest digital zoom magnification, so as to maximize video quality within the maximum number of views.
FIG. 10 is an explanatory diagram showing an example in which a huge panoramic video is divided into a plurality of tiles.
FIG. 11 is an explanatory diagram showing an example in which 3 tiles vertically by 4 tiles horizontally are read and decoded, and the display image portion is clipped and displayed.

<First Embodiment>
A video distribution device and a video playback device according to a first embodiment of the present invention will now be described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the first embodiment. In the figure, reference numeral 1 denotes a video distribution device configured as a server computer that distributes video. Reference numeral 2 denotes a video playback device, configured as a computer terminal, that plays back the video distributed from the video distribution device 1.

Reference numeral 10 denotes an information storage unit that stores the video/audio information to be distributed. Reference numeral 11 denotes a video/audio information output unit that reads from the information storage unit 10 the video/audio information to be sent to the video playback device 2, converts the video information into a plurality of resolutions, and outputs it. The video/audio information may instead be input directly to the video/audio information output unit 11 in real time from a camera or the like.

Reference numeral 12 denotes a video/audio encoding unit that encodes and outputs the video/audio information output from the video/audio information output unit 11. The video/audio encoding unit 12 divides the video information into a plurality of videos (for example, as shown in FIG. 10) and encodes them. At this time it also outputs multiple-video position information indicating the positions of the videos obtained by the division. For a division like that of FIG. 10, for example, the multiple-video position information consists of the video size (when there are several resolutions, the video size at each resolution) and the tile size; combined with a convention agreed in advance, such as numbering the tiles in raster-scan order from the upper left, this information determines the position of each video.
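As a rough illustration of how the video size, the tile size, and a raster-scan numbering convention together fix each tile's position, consider the sketch below. The function name `tile_rect`, the zero-based numbering, and the handling of partial tiles at the right and bottom edges are assumptions for the example; the original text only says that such a convention is agreed in advance.

```python
import math


def tile_rect(index, video_w, video_h, tile_w, tile_h):
    """Return (x, y, w, h) of the tile with the given raster-scan number.

    Tiles are numbered from 0, left to right and then top to bottom,
    starting at the upper-left corner of the video.
    """
    cols = math.ceil(video_w / tile_w)
    rows = math.ceil(video_h / tile_h)
    if not 0 <= index < cols * rows:
        raise IndexError("tile number out of range")
    row, col = divmod(index, cols)
    x, y = col * tile_w, row * tile_h
    # Tiles on the right or bottom edge may be smaller than the nominal size.
    return x, y, min(tile_w, video_w - x), min(tile_h, video_h - y)


print(tile_rect(5, 400, 300, 100, 100))
```

With this convention the receiver needs only four numbers per resolution (video width and height, tile width and height) to locate any tile, rather than a per-tile coordinate table.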

Reference numeral 13 denotes a multiple-video position information transmitting unit that receives the multiple-video position information output from the video/audio encoding unit 12 and transmits it to the video playback device 2. Reference numeral 14 denotes a video/audio information storage unit that stores the encoded video/audio information output from the video/audio encoding unit 12. Reference numeral 15 denotes a transmission unit that has a transmission buffer and transmits the encoded video/audio information stored in the video/audio information storage unit 14 to the video playback device 2. Reference numeral 16 denotes a gaze area control unit that controls the gaze area. Based on the information indicating the gaze area output by the gaze area control unit 16, the video/audio information storage unit 14 reads the corresponding video/audio information from what it has stored and outputs it to the transmission unit 15.

Reference numeral 21 denotes a multiple-video position information receiving unit that receives the multiple-video position information transmitted from the multiple-video position information transmitting unit 13. Reference numeral 22 denotes a screen operation control unit that, based on the multiple-video position information received by the multiple-video position information receiving unit 21 and the result of operations on the screen being viewed, determines the tile group of appropriate resolution that will be needed next (= the gaze area). Reference numeral 23 denotes a gaze area request unit that requests from the video distribution device 1, as the video/audio information to be distributed, the video/audio information of the gaze area corresponding to the screen operation. The gaze area control unit 16 instructs the video/audio information storage unit 14 to read the video/audio information of the gaze area requested by the gaze area request unit 23.

Reference numeral 24 denotes a reception unit that has a reception buffer and receives the encoded video/audio information transmitted from the transmission unit 15. Reference numeral 25 denotes a video/audio decoding unit that takes the encoded video/audio information received by the reception unit 24, decodes it, and outputs it. Reference numeral 26 denotes a video/audio playback unit, consisting of a display device and a speaker, that plays back the video/audio information decoded by the video/audio decoding unit 25.

In the video distribution device 1 and the video playback device 2 shown in FIG. 1, the multiple-video position information receiving unit 21 on the playback side passes the multiple-video position information to the screen operation control unit 22. When the gaze area the viewer is watching changes, the screen operation control unit 22 derives the changed gaze area from the multiple-video position information and passes it to the gaze area request unit 23, which sends it to the gaze area control unit 16 on the distribution side. The video distribution device 1 then passes the videos matching the gaze area of the video playback device 2 to the transmission unit 15, so that encoded video information suited to the viewer's gaze area is transmitted and the changed gaze area can be viewed.

Next, the operation of the video distribution device 1 and the video playback device 2 shown in FIG. 1 will be described. FIG. 2 is a diagram showing the view configuration for a high-definition video (highest resolution). Unlike the tile configuration of FIG. 10, in which adjacent tiles do not overlap, the views form a redundant configuration in which neighbors overlap; a view is one fixed-size piece of video information within this configuration. The term "view" is used here to distinguish it from the tiles intended for high-spec video playback devices. First, the video/audio information output unit 11 generates and outputs video at a plurality of resolutions (high, medium, and low) from the highest-resolution high-definition video stored in the information storage unit 10. There may be more than one medium resolution in FIG. 2. The low resolution is the size at which the entire high-definition video fits within a single view (the low resolution shown in FIG. 2).

Next, for the video at a given resolution, the video/audio encoding unit 12 forms the views using a view size fixed in advance, shifting the view position little by little rightward and downward starting from the left edge. In the example of FIG. 2, views (1), (2), and (3) of the high-resolution video are formed by shifting the position slightly to the right each time, starting from the left edge. The multiple-video position information for views consists of the video size (when there are several resolutions, the video size at each resolution), the view size, and the shift amount between a view and its right and lower neighbors; combined with a convention agreed in advance, such as numbering the views in raster-scan order from the upper left, this information determines the position of each view.
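The overlapping view layout can be sketched as a list of view origins in raster-scan order. This is an illustration under assumptions: the function name `view_origins` is hypothetical, and the choice to add one final view flush with the right or bottom edge when the shift does not divide the remaining span evenly is a guess at edge handling that the original text does not specify.

```python
def view_origins(video_w, video_h, view_w, view_h, shift_x, shift_y):
    """Return the top-left (x, y) of every view, in raster-scan order.

    Views all have the fixed size view_w x view_h and are stepped by
    shift_x horizontally and shift_y vertically, so neighbours overlap
    whenever the shift is smaller than the view size.
    """
    if shift_x <= 0 or shift_y <= 0:
        raise ValueError("shift amounts must be positive")

    def axis_positions(total, size, shift):
        if total <= size:
            return [0]                      # a single view spans the whole axis
        last = total - size                 # origin of the view touching the far edge
        positions = list(range(0, last, shift))
        positions.append(last)              # assumed: final view is flush with the edge
        return positions

    xs = axis_positions(video_w, view_w, shift_x)
    ys = axis_positions(video_h, view_h, shift_y)
    return [(x, y) for y in ys for x in xs]


# A 1000-pixel-wide video with 400-pixel views shifted by 200 pixels.
print(view_origins(1000, 500, 400, 500, 200, 200))
```

The position in the returned list plays the role of the raster-scan view number, so the playback side can recover any view's position from the video size, view size, and shift amounts alone.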

The video/audio encoding unit 12 then encodes the video view by view, combines the encoded video data of the views into a single high-definition video stream in accordance with the international standard H.264/MVC format, and stores it in the video/audio information storage unit 14. With the stream organized in this way, the processing shown in FIG. 3 allows the viewer to watch the high-definition video at any desired position and size on the low-spec video playback device 2. FIG. 3 is a diagram showing the operation of the video distribution device 1 and the video playback device 2 shown in FIG. 1.

First, the low-spec video playback device 2 requests distribution of an appropriate high-resolution view that contains the video display area (view A in FIG. 3, corresponding to the gaze area) and the lowest-resolution view (view B in FIG. 3) (FIG. 3(i)).

In response, the video distribution device 1 distributes the requested views A and B (FIG. 3(ii)). The video playback device 2 decodes the appropriate high-resolution view A and the lowest-resolution view B simultaneously (two views are decoded) and superimposes them to obtain a single image (FIG. 3(iii)).

Next, the video playback device 2 clips the region at the position and size requested by the viewer from the resulting image, and the video/audio playback unit 26 plays back and displays it (FIG. 3(iv)). When the position or size requested by the viewer (the video display area) changes, the appropriate high-resolution view is changed and the processing described above is repeated.
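The superimpose-and-clip step of FIG. 3(iii) and (iv) can be illustrated with plain nested lists standing in for decoded frames. This is a sketch only: the function name, the integer `scale` between the two resolutions, the nearest-neighbour enlargement of view B, and the coordinate convention are assumptions, not details given in the original text.

```python
def superimpose_and_clip(view_b, scale, view_a, a_origin, clip):
    """Overlay high-resolution view A on enlarged view B, then clip.

    view_b   : 2-D list of pixels covering the whole video at the lowest resolution
    scale    : integer factor from view B's resolution to view A's resolution
    view_a   : 2-D list of pixels at the target resolution
    a_origin : (x, y) of view A's top-left corner in target coordinates
    clip     : (x, y, w, h) of the display area in target coordinates
    """
    ax, ay = a_origin
    a_h, a_w = len(view_a), len(view_a[0])
    cx, cy, cw, ch = clip
    out = []
    for y in range(cy, cy + ch):
        row = []
        for x in range(cx, cx + cw):
            if ax <= x < ax + a_w and ay <= y < ay + a_h:
                row.append(view_a[y - ay][x - ax])          # sharp pixel from view A
            else:
                row.append(view_b[y // scale][x // scale])  # fallback from view B
        out.append(row)
    return out


view_b = [["b00", "b01"], ["b10", "b11"]]   # whole video, 2 x 2
view_a = [["A", "A"], ["A", "A"]]           # 2 x 2 patch at 2x resolution
for row in superimpose_and_clip(view_b, 2, view_a, (1, 1), (0, 1, 4, 2)):
    print(row)
```

Pixels outside view A are filled from the enlarged low-resolution view, so the display area is never blank even when it briefly extends beyond the high-resolution view; it is only less sharp there.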

Encoding and preparing in advance many candidate "appropriate high-resolution views" (view A in FIG. 3) yields high video quality, but the data volume grows dramatically as the number of views becomes large. Conversely, reducing the number of candidate views raises the digital zoom magnification and lowers the video quality (the sharpness of the video).

To resolve this trade-off, the following processing determines a view configuration for the high-definition video stream that limits the number of views to be encoded while maintaining video quality.

First, the parameters are defined (see FIG. 4). FIG. 4 is a diagram showing the definition of each parameter.
Vx: horizontal size of a view (pixels)
Vy: vertical size of a view (pixels)
α: coefficient representing the proportion by which adjacent views are shifted (view shift coefficient, 0 < α < 1)
β: ratio of video size between adjacent resolutions (resolution conversion rate, β > 1)
Dx: horizontal display resolution of the low-spec video playback device 2 (pixels)
Dy: vertical display resolution of the low-spec video playback device 2 (pixels)

Because the premise here is that only one optimal high-resolution view is decoded, the following constraint applies.
Constraint: when the display area becomes larger than (Vx·(1−α), Vy·(1−α)) (= the size of the overlap with the adjacent views to the right and below), switch to a view at the next lower resolution.
If the display area grew beyond this limit, there would be cases at that resolution in which no single view covers the display area, and video at the optimal resolution would be missing; this is why the constraint is needed (see FIG. 5). FIG. 5 is a diagram showing the problem of video at the optimal resolution being missing.
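The constraint can be checked with a short calculation. The sketch below assumes that adjacent views at one resolution start every αVx pixels horizontally (the shift implied by the stated overlap size Vx·(1−α)) and ignores the edges of the video; x, w, and k are illustrative symbols for the left edge and width of the display area and the view index, and do not appear in the original text.

```latex
\begin{align*}
&\text{View } k \text{ covers } [\,k\alpha V_x,\; k\alpha V_x + V_x\,), \qquad k = 0, 1, 2, \dots \\
&\text{For a display area } [\,x,\; x + w\,) \text{ choose } k = \left\lfloor \frac{x}{\alpha V_x} \right\rfloor
  \;\Rightarrow\; 0 \le x - k\alpha V_x < \alpha V_x . \\
&\text{Then } (x + w) - k\alpha V_x \;<\; \alpha V_x + w \;\le\; V_x
  \quad\text{whenever}\quad w \le V_x\,(1 - \alpha).
\end{align*}
```

So a display area no wider than Vx·(1−α) fits inside a single view wherever it is placed. If w exceeds that value, a display area that starts just to the left of a view boundary extends past the right edge of every view containing its left edge, which is the situation the constraint rules out. The vertical case is identical with Vy in place of Vx.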

Under the above constraint, the processing that limits the number of views to be encoded while keeping the digital zoom magnification (γ, γ > 1), which affects the sharpness of the video, at or below a fixed level is described with reference to FIG. 6. FIG. 6 is a flowchart showing the processing by which the video/audio encoding unit 12 extracts candidates for the number of resolutions n in order to limit the number of views to be encoded.

First, the video/audio encoding unit 12 determines the initial value of the number of resolutions n (step S1).
It determines the smallest n for which the magnification γw is at or below the threshold γ′ when the view shift amount is 0. The original content size and the lowest-resolution size are first defined as follows.
The original content size is Zx (pixels) horizontally and Zy (pixels) vertically. The lowest-resolution size is Vx (the horizontal view size) horizontally and Vy (the vertical view size) vertically.

The resolution conversion rate β for n resolutions is then given by equation (1). Here it is assumed that the original content and the view have the same aspect ratio, or that the original content has a wider aspect ratio than the view (hereinafter, content condition 1).

Alternatively, when the original content has a taller aspect ratio than the view, β is given by equation (2) (hereinafter, content condition 2).
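Equations (1) and (2) are not reproduced in this text. One reading consistent with the definitions above, offered as an assumption rather than as the patent's own formula, is that n resolution levels separated by a constant ratio β span the range from the lowest-resolution size to the original content size along the limiting axis:

```latex
\begin{align*}
Z_x = V_x\,\beta^{\,n-1} \;&\Rightarrow\; \beta = \left(\frac{Z_x}{V_x}\right)^{\frac{1}{n-1}}
  && \text{(content condition 1, assumed form)} \\
Z_y = V_y\,\beta^{\,n-1} \;&\Rightarrow\; \beta = \left(\frac{Z_y}{V_y}\right)^{\frac{1}{n-1}}
  && \text{(content condition 2, assumed form)}
\end{align*}
```

Under this reading, Zx = 7680, Vx = 1920, and n = 3 would give β = 2, that is, levels 1920, 3840, and 7680 pixels wide.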

Next, the digital zoom magnification γ is obtained from equation (3) under content condition 1 and from equation (4) under content condition 2. Because of the constraint above (when the display area becomes larger than (Vx·(1−α), Vy·(1−α)), the size of the overlap with the adjacent views to the right and below, switch to a view at the next lower resolution), the magnification γ is largest (the video is least sharp) immediately after the resolution has switched to the lower-resolution view. This worst-case digital zoom magnification is denoted γw.

The above equations give the worst-case digital zoom magnification, which affects the sharpness of the video.

The threshold γ′ for the digital zoom enlargement ratio (γ, with γ > 1) is specified by the content designer.

From these, the maximum digital zoom enlargement ratio is obtained by hypothetically setting α = 0. Starting from the initial value n = 2, γw is computed and n is incremented by 1 until γw ≤ γ′ is satisfied; the first n that satisfies the condition is taken as the initial value of the number of resolutions.

Next, the video/audio encoding unit 12 determines the view shift amount coefficient α (step S2). That is, it finds the largest α that satisfies γw ≤ γ′ for the given number of resolutions n. Because the resolution conversion rate β depends on n, β is recalculated from the formula above each time n changes. Under content condition 1, α is obtained from equation (5); under content condition 2, from equation (6).

Next, the video/audio encoding unit 12 checks the constraint condition (step S3). Once the number of resolutions n and the view shift amount coefficient α are determined, the total number of views required can be calculated. For example, when H.264 MVC, which can handle multiple views as a single video stream, is used, the standard limits the total number of views to 1024, and this cannot be exceeded. Hereinafter, the total number of views that serves as the constraint is denoted Hres.

To check this constraint, the total number of views is calculated. First, the view shift amounts Ex and Ey are calculated from the view shift amount coefficient α using equations (7) and (8).
Horizontal: Ex = Even(Vx × α) ... (7)
Vertical: Ey = Even(Vy × α) ... (8)
The function Even rounds its argument to an even number; another integer-rounding function may be used instead as long as it causes no problem when the views are encoded.
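Equations (7) and (8) can be sketched in Python as follows. The text does not say whether Even rounds up, down, or to the nearest even number, so rounding down is an assumption here, and the names `even` and `view_shift` are illustrative.

```python
import math
from typing import Tuple


def even(value: float) -> int:
    """Round down to an even integer (assumed reading of the Even function)."""
    return 2 * math.floor(value / 2)


def view_shift(vx: int, vy: int, alpha: float) -> Tuple[int, int]:
    """Equations (7) and (8): pixel shift between adjacent views."""
    return even(vx * alpha), even(vy * alpha)
```

For a 1280 × 720 view and α = 0.25 this gives a shift of 320 pixels horizontally and 180 pixels vertically.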

Next, the content size at a given resolution na (na = 0, 1, 2, 3, ..., n−1) is defined as Fnax horizontally and Fnay vertically. It can be derived from the original content size Zx, Zy and the resolution conversion rate β using equations (9) and (10). Resolution 0 is the original content size, and the index increases to 1, 2, 3, 4, ..., n−1 as the resolution decreases.
Fnax = Even(Zx / β^na) ... (9)
Fnay = Even(Zy / β^na) ... (10)
The function Even in equations (9) and (10) rounds to an even number; another integer-rounding function may be used instead as long as it causes no problem when the views are encoded.
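A minimal Python sketch of equations (9) and (10) follows. As above, Even is assumed to round down to an even integer, and the function names are illustrative only.

```python
import math
from typing import Tuple


def even(value: float) -> int:
    """Round down to an even integer (assumed reading of the Even function)."""
    return 2 * math.floor(value / 2)


def content_size(zx: int, zy: int, beta: float, na: int) -> Tuple[int, int]:
    """Equations (9) and (10): content size at resolution index na,
    where na = 0 is the original content."""
    scale = beta ** na
    return even(zx / scale), even(zy / scale)
```

With β = 2, 7680 × 4320 content is 3840 × 2160 at na = 1 and 1920 × 1080 at na = 2.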

In addition, the image size actually encoded at a given resolution na, which is derived from the content size, is defined and calculated as follows. At a given resolution na (na = 0, 1, 2, 3, ...), the image size actually encoded is:
Horizontal: when Fnax − Vx ≤ 0
Gnax = Vx ... (11)
when Fnax − Vx > 0
Gnax = Roundup((Fnax − Vx) / Ex, 0) × Ex + Vx ... (12)
Vertical: when Fnay − Vy ≤ 0
Gnay = Vy ... (13)
when Fnay − Vy > 0
Gnay = Roundup((Fnay − Vy) / Ey, 0) × Ey + Vy ... (14)
Here, Roundup(a, b) means rounding a up so that it is expressed to b decimal places.
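Equations (11) to (14) pad the content so that it is covered by a whole number of view shifts. The Python sketch below treats one axis at a time; it reads Roundup(a, 0) as the ceiling of a, which holds for the non-negative arguments used here, and assumes a positive shift amount. The name `encoded_size` is illustrative.

```python
import math


def encoded_size(f: int, v: int, e: int) -> int:
    """Equations (11)-(14) for one axis.

    f: content size Fna, v: view size V, e: view shift amount E (> 0).
    """
    if f - v <= 0:
        # The content fits in a single view, so one view is encoded.
        return v
    # Roundup(a, 0) is taken as ceil(a) for the non-negative a used here.
    return math.ceil((f - v) / e) * e + v
```

For example, content 3000 pixels wide with V = 1280 and E = 320 is encoded at 3200 pixels, the next size reachable in whole shifts.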

Furthermore, the number of views required for encoding at a given resolution na (na = 0, 1, 2, 3, ...) is calculated from Ex and Ey using equations (15) and (16).
Hnax = Roundup((Gnax − Vx) / Ex, 0) + 1 ... (15)
Hnay = Roundup((Gnay − Vy) / Ey, 0) + 1 ... (16)
The total number of views required (Hsum) is then calculated from Hnax and Hnay using equation (17).
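The view count can be sketched end to end in Python. Equation (17) is not reproduced in this text; the sketch assumes Hsum is the sum over all resolutions of Hnax × Hnay. It also assumes Even rounds down to an even integer and that Ex and Ey are positive. All function names are illustrative.

```python
import math


def even(value: float) -> int:
    """Round down to an even integer (assumed reading of the Even function)."""
    return 2 * math.floor(value / 2)


def encoded_size(f: int, v: int, e: int) -> int:
    """Equations (11)-(14) for one axis."""
    return v if f - v <= 0 else math.ceil((f - v) / e) * e + v


def views_along(g: int, v: int, e: int) -> int:
    """Equations (15)/(16) for one axis."""
    return math.ceil((g - v) / e) + 1


def total_views(zx, zy, vx, vy, alpha, beta, n) -> int:
    """Assumed form of equation (17): sum of Hnax * Hnay over na = 0..n-1."""
    ex, ey = even(vx * alpha), even(vy * alpha)   # equations (7), (8)
    total = 0
    for na in range(n):
        fx = even(zx / beta ** na)                # equation (9)
        fy = even(zy / beta ** na)                # equation (10)
        gx = encoded_size(fx, vx, ex)
        gy = encoded_size(fy, vy, ey)
        total += views_along(gx, vx, ex) * views_along(gy, vy, ey)
    return total
```

With 7680 × 4320 content, 1920 × 1080 views, α = 0.5, β = 2, and n = 3, the three resolutions need 7 × 7, 3 × 3, and 1 × 1 views, for 59 in total.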

It is then determined whether Hsum ≤ Hres holds. If not (NG), n is incremented (n ← n + 1) and the process returns to step S2.

Next, the video/audio encoding unit 12 calculates the data amount (step S4). The data amount Hsum(n) for the number of resolutions n is set to Hsum(n) = Hsum.

Next, the video/audio encoding unit 12 checks the end condition (step S5). It determines whether n ≤ Hres holds for the number of resolutions n; if so (OK), n is incremented (n ← n + 1) and the process returns to step S2, and if not (NG), the process ends.

Finally, the video/audio encoding unit 12 finds the n (= nmin) that gives min(Hsum(n)), encodes the content with this number of resolutions nmin and the corresponding view shift amount coefficient α, and creates the high-definition video stream. Here, min(a(n)) is a function that returns the smallest a(n) over the possible values of n.
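The final selection step can be sketched in Python as picking the entry with the smallest stored Hsum(n). The dictionary layout, the name `pick_configuration`, and the numbers in the usage note are hypothetical and serve only to illustrate the selection.

```python
from typing import Dict, Tuple


def pick_configuration(candidates: Dict[int, Tuple[int, float]]) -> Tuple[int, float]:
    """Return (nmin, alpha) for the n with the smallest Hsum(n).

    candidates maps each number of resolutions n that passed step S3
    to the pair (Hsum(n), alpha) recorded for it in step S4.
    """
    if not candidates:
        raise ValueError("no configuration satisfied the constraints")
    n_min = min(candidates, key=lambda n: candidates[n][0])
    return n_min, candidates[n_min][1]
```

For instance, with made-up values {2: (400, 0.2), 3: (250, 0.3), 4: (310, 0.35)}, the function selects n = 3 with α = 0.3.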

In this way, the number of tiles (views) to be decoded is kept to the minimum so that even a low-spec video playback device can decode them. Specifically, two views in total are decoded: a view of appropriate resolution matched to the viewer's preferred position and size (the first view) and, at the same time, the lowest-resolution view containing the entire high-definition video (the second view), so that no part of the video is missing even if the displayed position or size changes during viewing. Furthermore, the number of views to be decoded is fixed at two. As a result, any device with the minimum capability of decoding two views, even a low-spec video playback device, can play high-definition video matched to the viewer's preferred position and size while avoiding the problem of missing video.

In addition, the views for the high-definition video are configured so that the total number of views (and hence the amount of data) is minimized while the digital zoom enlargement ratio applied when video is cut out of a view and displayed on a low-spec device is kept within a fixed bound. This prevents the problem in which, as the number of views becomes enormous, the amount of data (the size of the files placed on the distribution server after encoding) grows dramatically (for example, up to 1024 times that of ordinary video under the H.264/MVC standard).

As described above, the concept of a "view" is introduced into the high-definition video stream and, so that only two views need to be distributed and played back, the relationship among the number of resolutions n and the view shift amount coefficient α needed to create the stream and the digital zoom enlargement ratio γ during playback is optimized. In addition, based on the relationship among n, α, and γ, a high-definition video stream is created whose data amount is minimal while the digital zoom enlargement ratio stays at or below the worst-case value γw.

As a result, high-definition video whose resolution far exceeds high-definition (HD) quality, such as 4K, 8K, and very large panoramic video, can be viewed at the viewer's preferred position and size even on a low-spec video playback device. When the viewer changes the position or size, the change can be made without any part of the video being lost. Furthermore, when the high-definition video stream is encoded, the amount of data can be minimized while the digital zoom enlargement ratio stays at or below the worst-case value.

Second Embodiment
Next, a video distribution device and a video playback device according to a second embodiment of the present invention are described. To support live video distribution, the devices according to the second embodiment configure the views of the high-definition video so that their number stays within the total number of views allowed by the standard (for example, at most 1024 views for the international standard H.264/MVC) or within the performance limit of the system that cuts out the views. All views are first cut out from the high-definition video, and then only the minimum two views needed for decoding, rather than all of the cut-out views, are encoded and distributed. If the number of views making up the high-definition video becomes enormous and all views are to be encoded for live distribution, the load on the encoding devices grows with the dramatically increased number of views, and the number of devices making up the system becomes enormous accordingly. In addition, it becomes more likely that the performance limits of components such as the switches in the cluster configuration of the video/audio encoding unit will be exceeded. The video distribution device and the video playback device according to the second embodiment can solve these problems.

FIG. 7 is a block diagram showing the configuration of the second embodiment. In this figure, parts identical to those of the device according to the first embodiment shown in FIG. 1 are given the same reference numerals, and their description is omitted. The device shown in this figure differs from the device shown in FIG. 1 in that a video/audio information acquisition unit 17 and a video/audio encoding unit 18 are provided in place of the information storage unit 10, the video/audio information output unit 11, the video/audio encoding unit 12, and the video/audio information storage unit 14.

The video/audio information acquisition unit 17 comprises, for example, a camera that captures a live performance, and outputs the captured video/audio information to the video/audio encoding unit 18. The video/audio encoding unit 18 encodes the video/audio information output from the video/audio information acquisition unit 17 and outputs it. Of the video/audio information, the video/audio encoding unit 18 divides the video information into a plurality of videos (for example, as shown in FIG. 10) and encodes them. At this time, the video/audio encoding unit 12 sends multiple-video position information, which indicates the positions of the videos obtained by the division, to the multiple-video position information transmission unit 13. For a division such as that in FIG. 10, for example, the multiple-video position information consists of the video size (when there are multiple resolutions, the video size for each resolution) and the tile size; together with a convention agreed in advance, such as numbering the tiles in raster-scan order from the upper left, this information determines each video position. Based on the information indicating the gaze area output by the gaze area control unit 16, the video/audio encoding unit 18 reads out the corresponding video/audio information and outputs it to the transmission unit 15.
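The raster-scan convention can be sketched in Python: given only the video width and the tile size, a tile number determines the tile's position. The sketch assumes non-overlapping tiles numbered from 0, starting at the upper left; the function name `tile_origin` and the zero-based numbering are assumptions.

```python
import math
from typing import Tuple


def tile_origin(index: int, video_w: int, tile_w: int, tile_h: int) -> Tuple[int, int]:
    """Map a raster-scan tile number to the tile's upper-left pixel (x, y).

    Tiles are assumed to be numbered from 0, left to right and then
    top to bottom, with no overlap between tiles.
    """
    cols = math.ceil(video_w / tile_w)   # tiles per row
    row, col = divmod(index, cols)
    return col * tile_w, row * tile_h
```

For a 3840-pixel-wide video with 960 × 540 tiles, tile 5 lies in the second row and second column, at (960, 540).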

Next, the view configuration for the high-definition video (highest resolution) in the second embodiment is described. The views are equivalent to those in the first embodiment described above. First, videos at multiple resolutions (high, medium, and low; see FIG. 2) are generated from the highest-resolution high-definition video and output. There may be more than one medium resolution. The low resolution is the size at which the entire high-definition video fits within one view (corresponding to the low resolution shown in FIG. 2).

Next, for the video at a given resolution, the video/audio encoding unit 18 forms the views using a view size fixed in advance, shifting the view position little by little to the right and downward from the left edge. In the example shown in FIG. 2, the views in the high-resolution video are formed by shifting the position slightly to the right each time, as in (1), (2), and (3) from the left edge.

After that, only the views requested by the video playback device 2 are encoded, and the encoded video data of the requested views are combined into one high-definition video stream in accordance with the international standard H.264/MVC format. With the high-definition video stream configured in this way, the processing operation shown in FIG. 8 allows the viewer to watch high-definition video at any preferred position and size on the low-spec video playback device 2. FIG. 8 is a diagram showing the operation of the video distribution device 1 and the video playback device 2 shown in FIG. 7.

First, the low-spec video playback device 2 requests distribution of an appropriate high-resolution view that contains the video display area (view A in FIG. 8) and the lowest-resolution view (view B in FIG. 8) (FIG. 8 (i)). In response, the gaze area control unit 16 requests the video/audio encoding unit 18 to encode the appropriate high-resolution view and the lowest-resolution view (FIG. 8 (ii)). The video/audio encoding unit 18 cuts out the video of all views formed from the high-definition video and then encodes only two of them, the appropriate high-resolution view and the lowest-resolution view (FIG. 8 (iii)). The video/audio encoding unit 18 then transmits these two views as one stream (FIG. 8 (iv)).

Next, the transmission unit 15 distributes the two views, the appropriate high-resolution view and the lowest-resolution view (views A and B) (FIG. 8 (v)). In response, the video/audio decoding unit 25 decodes view A and view B simultaneously (decoding two views) and obtains a superimposed video (FIG. 8 (vi)). The video/audio playback unit 26 then clips the video at the position and size requested by the viewer from the obtained video and plays it back, thereby displaying the video (FIG. 8 (vii)). When the position or size requested by the viewer (the video display area) changes, the appropriate high-resolution view is changed and the processing operation described above is repeated.

Through this processing, distributing view B at the same time ensures that no part of the video is missing, and configuring the views for the high-definition video and then encoding only the views needed for decoding reduces the encoding load.

Next, the processing operation for configuring the views is described in detail. The views for the high-definition video must be configured so that their number stays within the total number of views allowed by the standard (for example, at most 1024 for H.264/MVC) or, if the performance limit of the system that cuts out views from the high-definition video is lower than the number allowed by the standard, within that limit. Hereinafter, the smaller of the number of views allowed by the standard and the number of views at the system's performance limit is called the "maximum number of views". The view configuration for the high-definition video is therefore determined so as to maximize video quality within the maximum number of views.

First, the parameters are defined. They are the same as in the first embodiment (see FIG. 4).
Vx: horizontal size of a view (pixels)
Vy: vertical size of a view (pixels)
α: coefficient giving the shift between adjacent views as a fraction of the view size (view shift amount coefficient, 0 < α < 1)
β: ratio of the video sizes of adjacent resolutions (resolution conversion rate, β > 1)
Dx: horizontal display resolution of the low-spec video playback device 2 (pixels)
Dy: vertical display resolution of the low-spec video playback device 2 (pixels)

Because it is assumed here that only one optimal high-resolution view is decoded, the following constraint applies.
Constraint: when the display area becomes larger than (Vx·(1−α), Vy·(1−α)) (the size of the overlap with the adjacent views to the right and below), playback moves to the view of the next lower resolution.
This constraint is needed because, if the display area grows beyond it, cases arise in which a single view at that resolution cannot cover the display area, and part of the video at the optimal resolution would be missing (see FIG. 5).
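The reason for the constraint can be checked with a short Python sketch for one axis. If the display span is no longer than the overlap V − E between adjacent views (which equals V·(1−α) when E = V·α), then the view whose origin lies at or just before the start of the span always contains the whole span. The function name `covering_view` and the zero-based view index are assumptions made for illustration.

```python
def covering_view(start: int, length: int, v: int, e: int, count: int) -> int:
    """Index of a single view that covers [start, start + length) on one axis.

    v: view size, e: shift between adjacent views, count: number of views
    on this axis. View k is taken to span [k * e, k * e + v).
    """
    if length > v - e:
        # Larger than the overlap: one view may not cover the display area,
        # so playback should move to the next lower resolution.
        raise ValueError("display area exceeds the overlap of adjacent views")
    # The last view is used when the span starts beyond its origin.
    return min(start // e, count - 1)
```

With V = 1920, E = 960, and 7 views, a 900-pixel span starting at pixel 1000 is covered by view 1, which spans pixels 960 to 2880.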

Under the above constraint, the processing operation for determining the view configuration (the view shift amount α and the number of resolutions n) that minimizes the digital zoom enlargement ratio (γ, with γ > 1), which affects image sharpness as a measure of video quality, so that video quality is maximized within the maximum number of views, is described with reference to FIG. 9. FIG. 9 is a flowchart showing this processing operation as performed by the video/audio encoding unit 18.

First, the video/audio encoding unit 18 sets the initial value (= 2) of the number of resolutions n (step S11). Next, the video/audio encoding unit 18 calculates the view shift amount αn for the number of resolutions n (steps S12 to S15). To calculate αn, the original content size and the minimum-resolution size are first defined as follows. The original content size is Zx (pixels) horizontally and Zy (pixels) vertically. The minimum-resolution size is Vx (the horizontal size of a view) horizontally and Vy (the vertical size of a view) vertically.

In this case, the resolution conversion rate β for n resolutions is given by equation (1), as in the first embodiment. Here it is assumed that the original content and the view have the same aspect ratio, or that the original content has a horizontally longer (wider) aspect ratio than the view (hereinafter, content condition 1).

Alternatively, when the original content has a vertically longer aspect ratio than the view, β is given by equation (2), as in the first embodiment (hereinafter, content condition 2).

Next, the video/audio encoding unit 18 sets α to Δα (a very small value close to 0) as the initial value of the view shift amount (step S12). It then calculates the total number of views from the number of resolutions n and the view shift amount coefficient α (step S13). First, the view shift amounts Ex and Ey are calculated from α using equations (7) and (8), as in the first embodiment. The function Even in equations (7) and (8) rounds to an even number; another integer-rounding function may be used instead as long as it causes no problem when the views are encoded.

Next, the content size at a given resolution na (na = 0, 1, 2, 3, ..., n−1) is defined as Fnax horizontally and Fnay vertically. It can be derived from the original content size Zx, Zy and the resolution conversion rate β using equations (9) and (10), as in the first embodiment. Resolution 0 is the original content size, and the index increases to 1, 2, 3, 4, ..., n−1 as the resolution decreases. The function Even in equations (9) and (10) rounds to an even number; another integer-rounding function may be used instead as long as it causes no problem when the views are encoded.

In addition, the image size actually encoded at a given resolution na, which is derived from the content size, is defined and calculated as follows. At a given resolution na (na = 0, 1, 2, 3, ...), the image size actually encoded can be calculated using equations (11) to (14), as in the first embodiment.

Furthermore, the number of views required for encoding at a given resolution na (na = 0, 1, 2, 3, ...) is calculated from Ex and Ey using equations (15) and (16), as in the first embodiment. The total number of views required (Hsum) is then calculated from the resulting Hnax and Hnay using equation (17), as in the first embodiment.

The video/audio encoding unit 18 then checks the end condition (step S14). When H.264 MVC, which can handle multiple views as a single video stream, is used, the standard limits the total number of views to 1024, and this cannot be exceeded. If the number of views at the performance limit of the system that cuts out views from the high-definition video is lower than the number allowed by the standard, the maximum number of views is correspondingly lower. Accordingly, the total number of views that serves as the constraint (= the maximum number of views) is denoted Hres. The video/audio encoding unit 18 determines whether Hsum ≤ Hres is satisfied; if not (NG), it adds Δα to α (α ← α + Δα) and calculates the total number of views again. If so (OK), the current α is stored as αn = α, where αn denotes the view shift amount coefficient for the number of resolutions n (step S15).
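Steps S12 to S15 amount to a linear search over α. The Python sketch below is one possible reading: it assumes Even rounds down to an even integer, takes equation (17) to be the sum of Hnax × Hnay over all resolutions (the equation is not reproduced in this text), treats a shift of 0 pixels as infeasible, and gives up when α reaches 1. The function names and the default step Δα = 0.01 are illustrative.

```python
import math
from typing import Optional


def even(value: float) -> int:
    """Round down to an even integer (assumed reading of the Even function)."""
    return 2 * math.floor(value / 2)


def axis_views(z: int, v: int, e: int, beta: float, na: int) -> int:
    """Views along one axis at resolution na, from equations (9)-(16).

    Because G - V is a whole multiple of E, (G - V) / E equals
    ceil((F - V) / E), so G does not need to be formed explicitly.
    """
    f = even(z / beta ** na)
    return 1 if f - v <= 0 else math.ceil((f - v) / e) + 1


def total_views(zx, zy, vx, vy, alpha, beta, n) -> float:
    """Assumed equation (17): sum of Hnax * Hnay over na = 0..n-1."""
    ex, ey = even(vx * alpha), even(vy * alpha)
    if ex <= 0 or ey <= 0:
        return math.inf   # a zero-pixel shift cannot tile the content
    return sum(axis_views(zx, vx, ex, beta, na) * axis_views(zy, vy, ey, beta, na)
               for na in range(n))


def find_alpha(zx, zy, vx, vy, beta, n, h_res, d_alpha=0.01) -> Optional[float]:
    """Smallest multiple of d_alpha (below 1) with Hsum <= Hres, else None."""
    step = 1
    while step * d_alpha < 1.0:
        alpha = step * d_alpha   # avoids accumulating rounding error
        if total_views(zx, zy, vx, vy, alpha, beta, n) <= h_res:
            return alpha
        step += 1
    return None
```

For 7680 × 4320 content, 1920 × 1080 views, β = 2, n = 3, Hres = 59, and Δα = 0.125, the search stops at α = 0.5, where Hsum is exactly 59.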

Next, the video/audio encoding unit 18 calculates the worst-case digital zoom enlargement ratio (step S16). The digital zoom enlargement ratio γ is obtained from equation (3) under content condition 1 and from equation (4) under content condition 2, as in the first embodiment. Under the constraint described above (when the display area becomes larger than (Vx·(1−α), Vy·(1−α)), that is, the size of the overlap with the adjacent views to the right and below, playback moves to the view of the next lower resolution), the enlargement ratio γ is largest, and the image sharpness worst, immediately after the switch to the lower-resolution view. This value is taken as the worst-case digital zoom enlargement ratio γw. The above equations thus give the worst-case digital zoom enlargement ratio, which affects image sharpness as a measure of video quality. This value is stored as r(n) = γw, where r(n) denotes the worst-case digital zoom enlargement ratio for the number of resolutions n.

Next, the video/audio encoding unit 18 checks the end condition (step S17). It determines whether n ≤ Hres holds for the number of resolutions n; if so (OK), n is incremented (n ← n + 1) and the process returns to step S12, and if not (NG), the process ends.

Finally, the video/audio encoding unit 18 finds the number of resolutions n (= nmin) that gives min(r(n)) and, using this nmin and the corresponding view shift amount coefficient αn, cuts out the video of all views from the high-definition video and then encodes only the views for which distribution was requested. Here, min(r(n)) is a function that returns the smallest r(n) over the possible values of n.
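The outer loop of FIG. 9 (steps S11 to S17 and the final selection) can be sketched in Python with the individual calculations passed in as functions. Equations (1) to (6) are not reproduced in this text, so `beta_for`, `alpha_for`, and `worst_zoom` are hypothetical callables supplied by the caller, and `n_max` is an assumed explicit upper bound standing in for the end condition.

```python
from typing import Callable, Optional, Tuple


def choose_view_layout(
    n_max: int,
    beta_for: Callable[[int], float],
    alpha_for: Callable[[int, float], Optional[float]],
    worst_zoom: Callable[[float, float], float],
) -> Optional[Tuple[int, float, float]]:
    """Return (nmin, alpha_n, r(nmin)) minimizing the worst-case zoom r(n).

    beta_for(n): resolution conversion rate for n resolutions.
    alpha_for(n, beta): view shift coefficient that keeps Hsum <= Hres,
        or None when no such coefficient exists.
    worst_zoom(alpha, beta): worst-case digital zoom enlargement ratio.
    """
    best = None
    for n in range(2, n_max + 1):        # step S11 starts at n = 2
        beta = beta_for(n)
        alpha = alpha_for(n, beta)       # steps S12 to S15
        if alpha is None:
            continue
        r_n = worst_zoom(alpha, beta)    # step S16: r(n) = gamma_w
        if best is None or r_n < best[2]:
            best = (n, alpha, r_n)
    return best
```

Passing the calculations in as arguments keeps the loop independent of the exact form of the omitted equations; any concrete `worst_zoom` used with it is the caller's own assumption.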

As described above, the concept of a "view" is introduced into the encoding of live video, and only two views are distributed and played back. In addition, when the views of live video are encoded, only the views needed by the video playback device are encoded, which reduces the scale of the encoding cluster. Furthermore, when the number of resolutions n and the view shift amount coefficient α are determined as the view configuration for encoding the live video, the worst-case digital zoom during playback is minimized within the maximum number of views, so that the video is played back with high quality.

As a result, high-definition live video whose resolution greatly exceeds high-definition (HD) quality, such as 4K, 8K, or a huge panoramic video, can be viewed on a low-spec video playback device at whatever position and size the viewer prefers. The viewer can also change the position and size without any part of the video going missing. Moreover, the live video can be encoded with a view configuration that minimizes the worst-case digital zoom magnification within the maximum number of views. Consequently, high-quality, high-definition live video can be viewed on a low-spec video playback device in a style in which the viewer chooses the position and size.

The video distribution device and the video playback device of the embodiments described above may be implemented by a computer. In that case, a program for realizing these functions may be recorded on a computer-readable recording medium, and the program recorded on the medium may be read into a computer system and executed. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or to a storage device such as a hard disk built into a computer system. The "computer-readable recording medium" may further include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and a medium that holds the program for a certain period, such as volatile memory inside a computer system serving as the server or client in that case. The program may realize only part of the functions described above, may realize those functions in combination with a program already recorded in the computer system, or may be realized using hardware such as a PLD (Programmable Logic Device) or an FPGA (Field Programmable Gate Array).

Embodiments of the present invention have been described above with reference to the drawings, but these embodiments are merely illustrative, and it is clear that the present invention is not limited to them. Accordingly, components may be added, omitted, replaced, or otherwise changed without departing from the technical idea and scope of the present invention.

The present invention is applicable to uses in which it is essential to control video quality when distributing and playing back video in order to deliver high-definition video to low-spec playback terminals.

Description of symbols: 1 ... video distribution device, 10 ... information storage unit, 11 ... video/audio information output unit, 12 ... video/audio encoding unit, 13 ... multiple video position information transmission unit, 14 ... video/audio information storage unit, 15 ... transmission unit, 16 ... gaze area control unit, 17 ... video/audio information acquisition unit, 18 ... video/audio encoding unit, 2 ... video playback device, 21 ... multiple video position information receiving unit, 22 ... screen operation control unit, 23 ... gaze area request unit, 24 ... receiving unit, 25 ... video/audio decoding unit, 26 ... video/audio playback unit

Claims

1. A video playback device that receives, from a video distribution device, encoded video data of a video playback area that is a designated partial area of entire video data, and plays back the encoded video data, the video playback device comprising:
distribution request means for requesting the video distribution device to distribute first video data of the video playback area, having a resolution necessary for displaying video on a designated screen, and second video data of a video playback area that includes the entire video data and has a minimum resolution for display on the video playback device;
receiving means for receiving the encoded video data distributed from the video distribution device, the encoded video data being obtained by encoding the first video data and the second video data respectively; and
display means for decoding the received encoded video data and displaying the first video data and the second video data superimposed on the designated screen.

2. The video playback device according to claim 1, wherein, when the spatial position of the screen on the entire video data or the size of the screen is changed, the encoded video data to be displayed on the new screen is received from the video distribution device, and the received encoded video data is decoded and displayed on the changed screen.

3. A video distribution device that distributes, to a video playback device, encoded video data of a video playback area that is a designated partial area of entire video data, the video distribution device comprising:
distribution data generating means for generating, from the entire video data, first video data of the video playback area having a resolution necessary for displaying video on a screen designated by the video playback device, and second video data of a video playback area that includes the entire video data and has a minimum resolution for display on the video playback device;
encoding means for encoding the first video data and the second video data respectively to generate the encoded video data; and
video distribution means for distributing the encoded video data to the video playback device.

4. The video distribution device according to claim 3, further comprising video data storage means for storing a plurality of pieces of encoded video data obtained by encoding in advance video data that satisfies a predetermined condition as candidates for the first video data and the second video data,
wherein the video distribution means reads the encoded video data from the video data storage means and distributes it to the video playback device.

5. The video distribution device according to claim 4, wherein the video data storage means stores only the encoded video data having the minimum data amount.

6. The video distribution device according to claim 3, wherein the distribution data generating means generates a plurality of pieces of the first video data from the entire video data based on the required number of resolutions and the position shift amount of the video playback area, and
the encoding means encodes only the first video data and the second video data designated by the video playback device.