JP2008054159A

JP2008054159A - Video-audio multiplexing apparatus

Info

Publication number: JP2008054159A
Application number: JP2006230099A
Authority: JP
Inventors: Tetsushi Nishioka; 哲志西岡; Akifumi Yamana; 章文山名; Katsumi Hoashi; 克己帆足
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2006-08-28
Filing date: 2006-08-28
Publication date: 2008-03-06

Abstract

PROBLEM TO BE SOLVED: To generate a multiplexed stream in which video and audio do not go out of sync during reproduction even when the input timings of the video data and the audio data do not coincide. SOLUTION: A video-audio multiplexing apparatus includes a video encoder 103 that encodes video data SIG101 in units of video frames to generate encoded video data, and an audio encoder 104 that encodes audio data SIG102 in units of audio frames to generate encoded audio data SIG104. An operation controller 105 controls the video encoder 103 to start encoding at the timing of a video frame boundary of the video data SIG101, and controls the audio encoder 104 to start encoding at a timing shifted from the encoding start timing of the video encoder 103 by the time indicated by input time difference information SIG111. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a video/audio multiplexing apparatus that encodes video data and audio data and multiplexes the encoded video data with the encoded audio data.

In recent years the digitization of information has advanced, and signals such as video and audio are also digitized before use. Because digital data such as video and audio are related to each other, they are generally multiplexed into a single data stream.

For applications such as storing and transmitting digital video and audio data, multiplexing formats are defined by international standards. Examples include the Program Stream (hereinafter abbreviated as PS) and the Transport Stream (hereinafter abbreviated as TS), both defined by the international standard known as MPEG (Moving Picture Experts Group).

When the video data and audio data to be multiplexed are temporally correlated, PS and TS provide a mechanism for restoring that correlation after the multiplexed data has been demultiplexed: reproduction time information called a PTS is added to the video data and the audio data.

Digitized video data and audio data also carry a very large amount of information and are not suited to storage or transmission as they are. To reduce the amount of information before multiplexing, the video data and audio data are encoded in accordance with an encoding standard such as MPEG-2. Video data is generally a set of still images, one per sample period (video frames), and is encoded in units of video frames. Audio data, on the other hand, is generally a set of sound intensity values, one per sample period. Because the sample period is very short, audio data is handled for encoding as sets of data covering a fixed period longer than the sample period (audio frames), and is encoded in units of audio frames.

For example, some video/audio multiplexing apparatuses that output an MPEG-2 TS sequentially encode the input video data and audio data and transmit the generated TS at a predetermined output rate and output timing (see, for example, Patent Document 1).

Some video/audio multiplexing apparatuses can pause encoding. One known example, on receiving a pause instruction, pauses acquisition of the video signal and the audio signal simultaneously, in synchronization with the timing at which the video signal is acquired, and thereafter pauses encoding, multiplexing, and recording at their respective timings (see, for example, Patent Document 2).
JP 2001-78195 A
JP 2001-160968 A

As the use of digitized video and audio data expands, demand for higher picture quality and higher sound quality follows naturally. Accordingly, picture-quality enhancement or sound-quality enhancement is sometimes applied to the video data and audio data as a stage preceding encoding.

In most cases the processing time of picture-quality enhancement for video data differs from that of sound-quality enhancement for audio data, and the processing time also varies with the processing content. Such enhancement may therefore introduce a time difference between the timings at which the video data and the audio data are input to the video/audio multiplexing apparatus.

However, a conventional video/audio multiplexing apparatus performs encoding and other processing on the assumption that the video data and the audio data are input simultaneously. If there is a time difference between their input timings, correct reproduction time information cannot be added, the video and audio drift apart during reproduction, and the viewer may find the result unnatural.

The present invention addresses this problem. Its object is to provide a video/audio multiplexing apparatus that can generate a multiplexed stream in which video and audio do not drift apart during reproduction even when the input timings of the video data and the audio data are offset from each other.

To solve the above problem, the present invention adjusts the encoding start timing of the video data and the encoding start timing of the audio data according to the amount of offset between the input timings of the video data and the audio data.

One aspect of the present invention is
a video/audio multiplexing apparatus that receives video data and audio data having a temporal correlation, generates encoded video data by encoding the video data and encoded audio data by encoding the audio data, and multiplexes the encoded video data and the encoded audio data to generate one multiplexed stream, the apparatus comprising:
a video encoding unit that encodes the video data in units of video frames to generate the encoded video data;
an audio encoding unit that encodes the audio data in units of audio frames to generate the encoded audio data; and
an operation control unit that receives input time difference information indicating the offset between the input timings of the video data and the audio data, controls the video encoding unit to start encoding at the timing at which a video frame boundary of the video data is detected, and controls the audio encoding unit to start encoding at a timing shifted from the encoding start timing of the video encoding unit by the time indicated by the input time difference information.

Another aspect of the present invention is
a video/audio multiplexing apparatus that receives video data and audio data having a temporal correlation, generates encoded video data by encoding the video data and encoded audio data by encoding the audio data, and multiplexes the encoded video data and the encoded audio data to generate one multiplexed stream, the apparatus comprising:
a video encoding unit that encodes the video data in units of video frames to generate the encoded video data;
an audio encoding unit that encodes the audio data in units of audio frames to generate the encoded audio data;
an operation control unit that receives input time difference information indicating the offset between the input timings of the video data and the audio data, controls the video encoding unit to start encoding at the timing at which a video frame boundary of the video data is detected, controls the audio encoding unit to start encoding at a timing shifted from the encoding start timing of the video encoding unit by a time that is a multiple of the video frame period, and outputs time difference information indicating the difference between the input time difference information and the offset between the encoding start timings of the video encoding unit and the audio encoding unit; and
a multiplexing unit that multiplexes the encoded video data and the encoded audio data, adding reproduction time information that reflects the time difference indicated by the time difference information.

According to the present invention, the encoding start timing of the video data and the encoding start timing of the audio data are adjusted according to the amount of offset between the input timings of the video data and the audio data. A multiplexed stream in which video and audio do not drift apart during reproduction can therefore be generated even when the input timings are offset.

Embodiments of the present invention are described below with reference to the drawings. In the description of each embodiment, components that have the same function as a component already described are given the same reference numerals, and their description is omitted.

Embodiment 1 of the Invention
FIG. 1 is a block diagram showing the configuration of a video/audio multiplexing apparatus 100 according to Embodiment 1 of the present invention. The video/audio multiplexing apparatus 100 encodes the input video data (video data SIG101) and audio data (audio data SIG102) separately, and multiplexes the encoded video data (encoded video data SIG105) and the encoded audio data (encoded audio data SIG106) into one stream (multiplexed stream SIG107).

[Configuration of Video/Audio Multiplexing Apparatus 100]
As shown in FIG. 1, the video/audio multiplexing apparatus 100 includes a detection unit 101, a time measurement unit 102, a video encoding unit 103, an audio encoding unit 104, an operation control unit 105, a video data buffer 106, an audio data buffer 107, and a multiplexing unit 108.

(Detection unit 101)
The detection unit 101 detects video frame boundaries in the video data SIG101 and notifies the operation control unit 105 of each detection by means of video frame boundary information SIG112.

(Time measurement unit 102)
The time measurement unit 102 notifies the operation control unit 105 of time information (time information SIG116).

(Video encoding unit 103)
The video encoding unit 103 receives video encoding control information SIG113, which controls the start and pause of encoding. In accordance with SIG113, it encodes the video data SIG101 to generate encoded video data SIG103 and outputs it to the video data buffer 106. Each time the encoding of one video frame is completed, the video encoding unit 103 notifies the multiplexing unit 108 of video frame information (video frame information SIG108) for the encoded video data SIG103. The video frame information SIG108 includes the size of the encoded video data SIG103 and, for example when the encoded video data SIG103 is MPEG-format data, ordering information used during multiplexing.

(Audio encoding unit 104)
The audio encoding unit 104 receives audio encoding control information SIG114, which controls the start and pause of encoding. In accordance with SIG114, it encodes the audio data SIG102 to generate encoded audio data SIG104 and outputs it to the audio data buffer 107. Each time the encoding of one audio frame is completed, the audio encoding unit 104 notifies the multiplexing unit 108 of audio frame information (audio frame information SIG109) for the encoded audio data SIG104. The audio frame information SIG109 includes the size of the encoded audio data SIG104.

(Operation control unit 105)
The operation control unit 105 receives control information SIG110, which includes information for controlling the start and pause of encoding, and controls the operation of the video encoding unit 103, the audio encoding unit 104, and the multiplexing unit 108 in accordance with SIG110. This control differs depending on which of the audio data SIG102 and the video data SIG101 arrives later.

When the audio data SIG102 is input later than the video data SIG101, the operation control unit 105 controls the start and pause of encoding and multiplexing as follows.

When control information SIG110 instructing the start of encoding is input, the operation control unit 105 outputs multiplexing control information SIG115, which instructs the start of multiplexing, to the multiplexing unit 108 so that the multiplexing unit 108 starts operating.

Then, at the timing at which the video frame boundary information SIG112 for the first video frame to be encoded is input (the video data encoding start time), the operation control unit 105 outputs video encoding control information SIG113 instructing the start of encoding, so that the video encoding unit 103 starts encoding.

The operation control unit 105 also receives input time difference information SIG111, which indicates the input time difference between the video data SIG101 and the audio data SIG102. Using the time information SIG116, it detects the timing that is later than the video data encoding start time by the input time difference indicated by SIG111, and at that timing outputs the audio encoding control information SIG114 so that the audio encoding unit 104 starts encoding.

When control information SIG110 instructing a pause of encoding is input, the operation control unit 105 outputs video encoding control information SIG113 instructing the pause at the input timing of the first video frame boundary information SIG112 received after SIG110 (the video data encoding pause time), so that the video encoding unit 103 pauses encoding. Using the time information SIG116, the operation control unit 105 further detects the timing that is later than the video data encoding pause time by the time difference indicated by the input time difference information SIG111, and at that timing outputs audio encoding control information SIG114 instructing the pause, so that the audio encoding unit 104 pauses encoding.
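
The start and pause rules for the case where the audio data lags can be sketched as follows. This is only an illustration under stated assumptions: times are integers in an arbitrary common unit, the frame boundary times are known in advance as a sorted list, and the function name `audio_lag_schedule` is hypothetical and does not appear in the patent.

```python
from bisect import bisect_right

def audio_lag_schedule(command_time, frame_boundaries, delta_t):
    """Return (video_time, audio_time) at which a start or pause takes effect.

    command_time     -- time at which control information (SIG110) arrives
    frame_boundaries -- sorted video frame boundary times (assumed known here)
    delta_t          -- input time difference (SIG111), >= 0 when audio lags
    """
    if delta_t < 0:
        raise ValueError("this sketch covers only the audio-lags case")
    # First video frame boundary detected strictly after the command.
    idx = bisect_right(frame_boundaries, command_time)
    if idx == len(frame_boundaries):
        raise ValueError("no video frame boundary after the command")
    video_time = frame_boundaries[idx]
    # The audio encoder follows the video encoder by the input time difference.
    audio_time = video_time + delta_t
    return video_time, audio_time

# Hypothetical numbers: 40-unit frame period, command at 50, audio lags by 15.
print(audio_lag_schedule(50, [0, 40, 80, 120], 15))  # (80, 95)
```

The same function serves for both the start and the pause instruction, because in both cases the video encoder acts at the next frame boundary and the audio encoder acts the input time difference later.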

When, on the other hand, the video data SIG101 is input later than the audio data SIG102, the operation control unit 105 controls the start and pause of encoding and multiplexing as follows.

When control information SIG110 instructing the start of encoding is input, the operation control unit 105 outputs the audio encoding control information SIG114 at a timing that precedes the video frame boundary of the first video frame to be encoded by the time difference Δt indicated by the input time difference information SIG111 (Δt is negative when the video data SIG101 lags).

Specifically, the operation control unit 105 calculates the timing at which to output the audio encoding control information SIG114 (the audio data encoding start time) as t + (video frame period) − (|Δt| % (video frame period)). Here t is the detection time of the first video frame boundary detected after the control information SIG110 is input, and |Δt| % (video frame period) is the remainder of dividing |Δt| by the video frame period.

Further, at the time given by (audio data encoding start time) + |Δt|, the operation control unit 105 sends the video encoding unit 103 video encoding control information SIG113 instructing the start of encoding.

When control information SIG110 instructing a pause of encoding is input, the operation control unit 105 sends the audio encoding unit 104 audio encoding control information SIG114 instructing the pause at the time given by t + (video frame period) − (|Δt| % (video frame period)) (the audio data encoding pause time). Here t is the detection time of the first video frame boundary detected after the control information SIG110 is input. Further, at the time given by (audio data encoding pause time) + |Δt|, the operation control unit 105 sends the video encoding unit 103 video encoding control information SIG113 instructing the pause.
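
The timing formulas for the case where the video data lags can be checked with a short sketch. It assumes integer times in a common unit and uses a hypothetical function name, `video_lag_schedule`. Note that the resulting video time always falls on a video frame boundary, because t + T − (|Δt| mod T) + |Δt| equals t plus a whole number of frame periods T.

```python
def video_lag_schedule(t, frame_period, delta_t):
    """Return (audio_time, video_time) when the video data lags the audio data.

    t            -- detection time of the first video frame boundary after
                    the control information (SIG110) is input
    frame_period -- video frame period, in the same integer time unit as t
    delta_t      -- input time difference (SIG111), negative when video lags
    """
    lag = abs(delta_t)
    # Audio acts first: t + (frame period) - (|dt| % (frame period)).
    audio_time = t + frame_period - (lag % frame_period)
    # Video acts |dt| later, which lands on a video frame boundary.
    video_time = audio_time + lag
    return audio_time, video_time

# Hypothetical numbers: boundary at t = 80, 40-unit frames, video lags by 55.
audio_time, video_time = video_lag_schedule(80, 40, -55)
print(audio_time, video_time)  # 105 160
assert (video_time - 80) % 40 == 0  # the video time is on a frame boundary
```

The same calculation applies to the pause instruction, with t taken as the first boundary detected after the pause request.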

(Video data buffer 106 and audio data buffer 107)
The video data buffer 106 holds the encoded video data SIG103 output by the video encoding unit 103.

The video data buffer 106 needs the capacity required when there is no offset between the input timings of the video data SIG101 and the audio data SIG102 (that is, the amount of encoded video data SIG105 that must be buffered in order to multiplex the encoded video data SIG105 with the encoded audio data SIG106), plus enough capacity to hold the data that the video encoding unit 103 encodes during the interval between the encoding start times of the video encoding unit 103 and the audio encoding unit 104.

In this embodiment, when the audio data lags the video data, the encoded video data SIG103 is multiplexed later than it would be without the lag, by the time difference (Δt) indicated by the input time difference information SIG111. In other words, the encoded video data SIG103 is held in the video data buffer 106 for a period longer by Δt. The video data buffer 106 therefore needs, in addition to the capacity required when the input timings of the video data SIG101 and the audio data SIG102 coincide, a capacity of ([Δt / (video frame period)] + 1) × (maximum size of one video frame of encoded video data). The video data buffer 106 may reserve the necessary capacity in advance, or may reserve it on receiving a start instruction from outside.

The audio data buffer 107 holds the encoded audio data SIG104 output by the audio encoding unit 104.

The audio data buffer 107 needs the capacity required when there is no offset between the input timings of the video data SIG101 and the audio data SIG102 (that is, the amount of encoded audio data SIG106 that must be buffered in order to multiplex the encoded video data SIG105 with the encoded audio data SIG106), plus enough capacity to hold the data that the audio encoding unit 104 encodes during the interval between the encoding start times of the video encoding unit 103 and the audio encoding unit 104.

In this embodiment, when the video data lags the audio data, the audio data SIG102 is multiplexed later than it would be without the lag, by the time difference indicated by the input time difference information SIG111 (|Δt|, since Δt is negative in this case). In other words, the encoded audio data SIG104 is held in the audio data buffer 107 for a period longer by |Δt|. The audio data buffer 107 therefore needs, in addition to the capacity required when the input timings of the video data SIG101 and the audio data SIG102 coincide, a capacity of ([|Δt| / (audio frame period)] + 1) × (maximum size of one audio frame of encoded audio data). The audio data buffer 107 may reserve the necessary capacity in advance, or may reserve it on receiving a start instruction from outside.
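
The additional capacity for either buffer can be computed as in the sketch below. It assumes that the square brackets in the formulas denote rounding down to an integer (the text does not say so explicitly), that times are integers in a common unit, and that sizes are in bytes. The function name and all numeric values are hypothetical.

```python
def extra_buffer_bytes(delta_t, frame_period, max_frame_bytes):
    """Extra capacity needed on top of the no-offset buffer size.

    Implements ([|dt| / frame_period] + 1) * max_frame_bytes, reading the
    square brackets as floor (an assumption).
    """
    frames_held = abs(delta_t) // frame_period + 1
    return frames_held * max_frame_bytes

# Audio lags by 55 ms: video buffer, 40 ms frames, at most 150,000 bytes each.
print(extra_buffer_bytes(55, 40, 150_000))  # 300000

# Video lags by 55 ms: audio buffer, 24 ms frames, at most 768 bytes each.
print(extra_buffer_bytes(-55, 24, 768))  # 2304
```

The "+ 1" covers the frame that is only partly inside the offset interval, so the result is an upper bound rather than an exact requirement.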

(Multiplexing unit 108)
The multiplexing unit 108 starts operating when the multiplexing control information SIG115 is input, and outputs a stream (multiplexed stream SIG107) in which the data stored in the video data buffer 106 and the audio data buffer 107 (encoded video data SIG105 and encoded audio data SIG106) are multiplexed. More specifically, the multiplexing unit 108 calculates the reproduction times (reproduction time information) of the encoded video data SIG105 and the encoded audio data SIG106 from the video frame information SIG108 and the audio frame information SIG109, and determines the multiplexing order from the calculated reproduction time information. Following that order, it multiplexes the encoded video data SIG105 obtained from the video data buffer 106 and the encoded audio data SIG106 obtained from the audio data buffer 107, adding the reproduction time information.

[Operation of Video/Audio Multiplexing Apparatus 100]
(Operation when audio data is input later than video data)
The operations for starting, pausing, and resuming encoding when the audio data is input later than the video data are described with reference to FIG. 2. FIG. 2 is a diagram showing the control timings of the operation control unit 105, the video encoding unit 103, and the audio encoding unit 104.

To make the video/audio multiplexing apparatus 100 perform multiplexing, the control information SIG110 and the input time difference information SIG111 are input to the operation control unit 105, for example from outside. Here the operation control unit 105 is assumed to receive the control information SIG110 and the input time difference information SIG111 at the start instruction time t200 shown in FIG. 2. The time difference indicated by the input time difference information SIG111 is Δt2 (a positive value when the audio data lags).

When the control information SIG110 and the input time difference information SIG111 are input, the operation control unit 105 outputs the video encoding control information SIG113 to the video encoding unit 103 at time t201, when the first video frame boundary after the start instruction time t200 is detected. In response, the video encoding unit 103 encodes the video data SIG101 and outputs the resulting encoded video data SIG103 to the video data buffer 106.

Using the time information SIG116, the operation control unit 105 further detects time t202, which is Δt2 later than time t201 (the video data encoding start time), and at that timing outputs the audio encoding control information SIG114 to the audio encoding unit 104. In response, the audio encoding unit 104 encodes the audio data SIG102 and outputs the resulting encoded audio data SIG104 to the audio data buffer 107.

When the control information SIG110 is input, the operation control unit 105 also outputs the multiplexing control information SIG115 to the multiplexing unit 108, and the multiplexing unit 108 starts multiplexing. Specifically, the multiplexing unit 108 calculates the reproduction time information of the encoded video data SIG105 and the encoded audio data SIG106 from the video frame information SIG108 and the audio frame information SIG109 and determines the multiplexing order. Following that order, it multiplexes the encoded video data SIG105 obtained from the video data buffer 106 and the encoded audio data SIG106 obtained from the audio data buffer 107, adding the reproduction time information.

FIG. 3 shows the playback time information added to each video frame and audio frame when the audio data is delayed. Because the multiplexing unit 108 calculates the playback time information from the video frame information SIG108 and the audio frame information SIG109 as described above, the playback time information of video frame v201 (the first video frame) and that of audio frame a201 (the first audio frame) indicate the same playback time (playback time information t300), as shown in FIG. 3.

(Pausing and resuming encoding)
For example, when the operation control unit 105 receives a pause instruction at the pause instruction time t203 shown in FIG. 2, it sends the video encoding unit 103 the video encoding control information SIG113 instructing a pause at time t204, when the first video frame boundary after t203 is detected. The operation control unit 105 then uses the time information SIG116 to detect time t205, which is Δt2 later than time t204, and at that time outputs the audio encoding control information SIG114 instructing a pause to the audio encoding unit 104.

Likewise, when the operation control unit 105 receives an instruction to start (resume) encoding at the resume instruction time t206, it sends the video encoding unit 103 the video encoding control information SIG113 instructing the start of encoding at time t207, when the first video frame boundary after t206 is detected. The operation control unit 105 then uses the time information SIG116 to detect time t208, which is Δt2 later than time t207, and at that time outputs the audio encoding control information SIG114 instructing the start (resumption) of encoding to the audio encoding unit 104. As a result, video frame v204 and video frame v205, which precede and follow the pause and resume, are seamlessly connected at playback time information t301, as shown in FIG. 3. Audio frame a207, which spans the pause and resume timings, is likewise seamlessly connected at playback time information t301, as shown in FIG. 3.
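The start, pause, and resume handling described above follows one rule: the video-side action waits for the next video frame boundary, and the audio-side action follows Δt2 later. The Python sketch below illustrates that rule only. The names `next_boundary` and `schedule`, the integer tick units, and the modelling of frame boundaries as a fixed grid are assumptions made for illustration and do not come from the specification.

```python
def next_boundary(t, first_boundary, frame_period):
    """First video frame boundary strictly after time t (assumed fixed grid)."""
    elapsed = t - first_boundary
    return first_boundary + (elapsed // frame_period + 1) * frame_period


def schedule(command_time, first_boundary, frame_period, delta_t):
    """Return (video_time, audio_time) at which a command takes effect.

    delta_t > 0 models audio arriving later than video, as in FIG. 2.
    """
    video_time = next_boundary(command_time, first_boundary, frame_period)
    audio_time = video_time + delta_t
    return video_time, audio_time


# Hypothetical numbers: 100-tick frame period, audio lags video by 40 ticks.
start = schedule(30, 0, 100, 40)    # start command at t = 30
pause = schedule(420, 0, 100, 40)   # pause command at t = 420
video_run = pause[0] - start[0]     # how long the video encoder ran
audio_run = pause[1] - start[1]     # how long the audio encoder ran
```

Because every command shifts the audio encoder by the same Δt2, both encoders run for the same length of time between start and pause in this model.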

(Operation when video data is input later than audio data)
Next, the encoding start, pause, and resume operations when the video data is input later than the audio data are described with reference to FIG. 4. FIG. 4 shows the control timings of the operation control unit 105, the video encoding unit 103, and the audio encoding unit 104.

To have the video/audio multiplexing apparatus 100 perform multiplexing, the control information SIG110 and the input time difference information SIG111 are first input to the operation control unit 105, for example from outside. Here, the operation control unit 105 is assumed to receive the control information SIG110 and the input time difference information SIG111 at the start instruction time t400 shown in FIG. 4. The time difference indicated by the input time difference information SIG111 is assumed to be Δt4 (a negative value when the video data is delayed), and the first video frame boundary after the start instruction time t400 is assumed to be detected at time t401.

When the control information SIG110 is input, the operation control unit 105 obtains time t402 (the audio data encoding start time), given by t401 + (video frame period) - (|Δt4| % (video frame period)), and at time t402 sends the audio encoding unit 104 the audio encoding control information SIG114 instructing the start of encoding.

The operation control unit 105 also sends the video encoding unit 103 the video encoding control information SIG113 instructing the start of encoding at time t403, given by t402 + |Δt4|.
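The two expressions for t402 and t403 can be checked numerically. The sketch below reads % as the remainder operator and uses integer ticks; the function name `start_times_video_delayed` and the sample values are hypothetical. It shows that the video start time t403 computed this way always falls on a video frame boundary.

```python
def start_times_video_delayed(t_boundary, frame_period, delta_t):
    """Return (audio_start, video_start) when video lags audio.

    t_boundary   : first video frame boundary after the command (t401)
    frame_period : video frame period, in the same integer ticks
    delta_t      : input time difference (negative when video is delayed)
    """
    lag = abs(delta_t)
    # t402 = t401 + period - (|dt| % period)
    audio_start = t_boundary + frame_period - (lag % frame_period)
    # t403 = t402 + |dt|
    video_start = audio_start + lag
    return audio_start, video_start


# Hypothetical numbers: boundary at 1000, period 100, video lags by 250.
t402, t403 = start_times_video_delayed(1000, 100, -250)
on_boundary = (t403 - 1000) % 100 == 0
```

Starting the audio encoder part-way through a video frame is what lets the video encoder, |Δt4| later, start exactly on a frame boundary.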

When the control information SIG110 is input, the operation control unit 105 also outputs the multiplexing control information SIG115 to the multiplexing unit 108 to start its multiplexing operation.

As a result, the video data SIG101 and the audio data SIG102 are encoded, then multiplexed and output.

FIG. 5 shows the playback time information added to each video frame and audio frame when the video data is delayed. Because the multiplexing unit 108 calculates the playback time information from the video frame information SIG108 and the audio frame information SIG109, the playback time information of video frame v401 (the first video frame) and that of audio frame a401 (the first audio frame) indicate the same playback time (playback time information t500), as shown in FIG. 5.

(Pausing and resuming encoding)
For example, when the operation control unit 105 receives a pause instruction at the pause instruction time t404 shown in FIG. 4, and the first video frame boundary after t404 is at time t405, the operation control unit 105 sends the audio encoding unit 104 the audio encoding control information SIG114 instructing a pause of encoding at time t406, given by t405 + (video frame period) - (|Δt4| % (video frame period)).

Then, at time t407, given by t406 + |Δt4|, it sends the video encoding unit 103 the video encoding control information SIG113 instructing a pause of encoding.

Likewise, when the operation control unit 105 receives an instruction to start (resume) encoding at the resume instruction time t408, and the first video frame boundary after t408 is at time t409, it sends the audio encoding unit 104 the audio encoding control information SIG114 instructing the start of encoding at time t410, given by t409 + (video frame period) - (|Δt4| % (video frame period)). Then, at time t411, given by t410 + |Δt4|, it sends the video encoding unit 103 the video encoding control information SIG113 instructing the start (resumption) of encoding.

As a result, video frame v405 and video frame v406, which precede and follow the pause and resume, are seamlessly connected at playback time information t501, as shown in FIG. 5. Audio frame a408, which spans the pause and resume timings, is likewise seamlessly connected at playback time information t501, as shown in FIG. 5.

As described above, according to this embodiment, the encoding start timing of the video encoding unit and that of the audio encoding unit are adjusted according to the offset between the input timings of the video data and the audio data. Therefore, even if those input timings differ, the video and audio can be kept in sync when the multiplexed stream is played back.

Moreover, because the input timing offset is absorbed by buffering the video data and the audio data after they have been encoded, the capacities of the video data buffer and the audio data buffer can be kept within a practical range even when, for example, the video data has higher picture quality or the audio data has higher sound quality.

<< Embodiment 2 of the Invention >>
FIG. 6 is a block diagram showing the configuration of a video/audio multiplexing apparatus 200 according to Embodiment 2 of the present invention. As shown in the figure, the video/audio multiplexing apparatus 200 includes a detection unit 101, a video encoding unit 103, an audio encoding unit 104, a video data buffer 106, an audio data buffer 107, an operation control unit 201, and a multiplexing unit 202.

(Operation control unit 201)
The operation control unit 201 receives, from outside the video/audio multiplexing apparatus 200, the control information SIG110, which includes information for controlling the start and pause of encoding. According to the control information SIG110, it controls the encoding operations of the video encoding unit 103 and the audio encoding unit 104 and the multiplexing operation of the multiplexing unit 202. This control differs depending on which of the audio data SIG102 and the video data SIG101 is input later.

When the control information SIG110 instructing the start of encoding is input, the operation control unit 201 outputs the video encoding control information SIG113 instructing the start of encoding at the timing when the video frame boundary information SIG112 of the first video frame to be encoded is input (the video data encoding start time), causing the video encoding unit 103 to start encoding.

The operation control unit 201 also receives the input time difference information SIG111. Letting Δt be the time difference it indicates (a positive value when the audio data is delayed), the operation control unit 201 sends the audio encoding unit 104 the audio encoding control information SIG114 instructing the start of encoding at the detection time of the earliest video frame boundary later than (video data encoding start time) + Δt. This time is detected from the video frame boundary information SIG112.

At the same time, the operation control unit 201 notifies the multiplexing unit 202 of the value (video frame period) - (Δt % (video frame period)) as the time difference information SIG201. It also sends the multiplexing unit 202 the multiplexing control information SIG115 instructing the start of multiplexing.
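The relationship between the audio start time and the time difference information SIG201 in the audio-delayed case can be sketched as follows. The code reads % as the remainder operator, takes "later than" as strictly later, and models frame boundaries as a fixed grid in integer ticks; the function name and sample values are hypothetical.

```python
def audio_start_on_boundary(video_start, frame_period, delta_t):
    """Embodiment 2, audio delayed (delta_t > 0).

    Returns (audio_start, time_diff) where audio_start is the earliest
    video frame boundary strictly later than video_start + delta_t and
    time_diff is the value sent as SIG201.
    """
    # Number of whole frame periods needed to pass video_start + delta_t.
    frames_to_wait = delta_t // frame_period + 1
    audio_start = video_start + frames_to_wait * frame_period
    time_diff = frame_period - (delta_t % frame_period)
    return audio_start, time_diff


# Hypothetical numbers: video starts at 1000, period 100, audio lags by 250.
audio_start, time_diff = audio_start_on_boundary(1000, 100, 250)
# The wait beyond the nominal lag equals the value reported as SIG201.
extra_wait = audio_start - (1000 + 250)
```

In this model the audio encoder starts later than the ideal instant by exactly the amount reported in SIG201, which is why the multiplexer can compensate for it in the audio playback times.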

When the control information SIG110 instructing a pause of encoding is input, the operation control unit 201 outputs the video encoding control information SIG113 instructing a pause at the input timing of the first video frame boundary information SIG112 received after the control information SIG110 (the video data encoding pause time), pausing the encoding operation of the video encoding unit 103. The operation control unit 201 then sends the audio encoding unit 104 the audio encoding control information SIG114 instructing a pause at the detection time of the earliest video frame boundary later than (video data encoding pause time) + Δt. This time is detected from the video frame boundary information SIG112.

On the other hand, when the video data SIG101 is input later than the audio data SIG102, the operation control unit 201 controls the start and pause of encoding and multiplexing as follows.

The operation control unit 201 sends the audio encoding unit 104 the audio encoding control information SIG114 instructing the start of encoding at the time when the first video frame boundary is detected after the control information SIG110 is input (the audio data encoding start time). Letting Δt be the time difference indicated by the input time difference information SIG111 (a negative value when the video data is delayed), the operation control unit 201 sends the video encoding unit 103 the video encoding control information SIG113 instructing the start of encoding at the detection time of the latest video frame boundary earlier than (audio data encoding start time) + |Δt|. At the same time, the operation control unit 201 notifies the multiplexing unit 202 of the value |Δt| % (video frame period) as the time difference information SIG201. It also sends the multiplexing unit 202 the multiplexing control information SIG115 instructing the start of multiplexing.
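For the video-delayed case, the following sketch relates the video start time to the time difference information SIG201. It reads % as the remainder operator and uses a fixed boundary grid in integer ticks. One point is an assumption made here so that the result agrees with the |Δt| % (video frame period) value: when |Δt| is an exact multiple of the frame period, the boundary at (audio start) + |Δt| itself is used. The function name and sample values are hypothetical.

```python
def video_start_on_boundary(audio_start, frame_period, delta_t):
    """Embodiment 2, video delayed (delta_t < 0).

    Returns (video_start, time_diff) where video_start is the latest
    video frame boundary not later than audio_start + |delta_t|
    (assumption: an exact multiple lands on the boundary itself) and
    time_diff is the value sent as SIG201.
    """
    lag = abs(delta_t)
    video_start = audio_start + (lag // frame_period) * frame_period
    time_diff = lag % frame_period
    return video_start, time_diff


# Hypothetical numbers: audio starts at 1000, period 100, video lags by 250.
video_start, time_diff = video_start_on_boundary(1000, 100, -250)
# The video encoder starts earlier than the ideal instant by time_diff.
early_by = (1000 + 250) - video_start
```

Here the video encoder starts slightly before the ideal instant, and the shortfall is again the amount reported in SIG201.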

When the control information SIG110 instructing a pause of encoding is input, the operation control unit 201 sends the audio encoding unit 104 the audio encoding control information SIG114 instructing a pause at the time when the first video frame boundary is detected after the control information SIG110 is received (the audio data encoding pause time). The operation control unit 201 then sends the video encoding unit 103 the video encoding control information SIG113 instructing a pause at the detection time of the latest video frame boundary earlier than (audio data encoding pause time) + |Δt|.

(Multiplexing unit 202)
The multiplexing unit 202 starts operating when the multiplexing control information SIG115 is input, and calculates the playback time information of the encoded video data SIG105 and the encoded audio data SIG106 from the video frame information SIG108, the audio frame information SIG109, and the time difference information SIG201. Specifically, as the playback time information of a video frame, the multiplexing unit 202 uses the value calculated from the time interval of one video frame and the number of times the video frame information SIG108 has been notified. As the playback time information of an audio frame, it uses the value calculated from the time interval of one audio frame and the number of times the audio frame information SIG109 has been notified, plus the value indicated by the time difference information SIG201.

Then, based on the calculated playback time information, the multiplexing unit 202 determines the multiplexing order and, following that order, multiplexes the encoded video data SIG105 acquired from the video data buffer 106 and the encoded audio data SIG106 acquired from the audio data buffer 107, adding the playback time information to each.
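The playback time calculation and ordering described for the multiplexing unit 202 can be illustrated with a short sketch. It assumes the "value calculated from the frame interval and the notification count" is their product with the first frame counted as 0, that frames are ordered by playback time alone, and that video goes first on a tie. All function names and numbers are hypothetical, and a real MPEG multiplexer would also have to respect decoder buffer constraints, which are ignored here.

```python
import heapq


def video_pts(n, video_period):
    """Playback time of the n-th video frame (n counted from 0, assumed)."""
    return n * video_period


def audio_pts(m, audio_period, time_diff):
    """Playback time of the m-th audio frame, shifted by the SIG201 value."""
    return m * audio_period + time_diff


def mux_order(num_video, num_audio, video_period, audio_period, time_diff):
    """Merge both frame sequences by playback time; video wins ties."""
    video = ((video_pts(n, video_period), 0, "V", n) for n in range(num_video))
    audio = ((audio_pts(m, audio_period, time_diff), 1, "A", m)
             for m in range(num_audio))
    return [(kind, index, pts)
            for pts, _rank, kind, index in heapq.merge(video, audio)]


# Hypothetical numbers: 100-tick video frames, 60-tick audio frames,
# and a time difference of 50 ticks.
order = mux_order(2, 3, 100, 60, 50)
```

With these numbers the first audio frame is stamped 50 ticks after the first video frame, matching the offset that the operation control unit reported.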

(Operation of the video/audio multiplexing apparatus 200)
(Operation when audio data is input later than video data)
The encoding start, pause, and resume operations when the audio data is input later than the video data are described with reference to FIG. 7. FIG. 7 shows the control timings of the operation control unit 201, the video encoding unit 103, and the audio encoding unit 104.

To have the video/audio multiplexing apparatus 200 perform multiplexing, the control information SIG110 and the input time difference information SIG111 are first input to the operation control unit 201, for example from outside. Here, the operation control unit 201 is assumed to receive the control information SIG110 and the input time difference information SIG111 at the start instruction time t700 shown in FIG. 7. The time difference indicated by the input time difference information SIG111 is Δt7 (a positive value when the audio data is delayed).

When the control information SIG110 and the input time difference information SIG111 are input, the operation control unit 201 outputs the video encoding control information SIG113 to the video encoding unit 103 at time t701 (the video data encoding start time), when the first video frame boundary after the start instruction time t700 is detected. In response, the video encoding unit 103 outputs the encoded video data SIG103 to the video data buffer 106.

Furthermore, the operation control unit 201 sends the audio encoding unit 104 the audio encoding control information SIG114 instructing the start of encoding at the detection time (time t702) of the earliest video frame boundary later than time t701 + Δt7. In response, the audio encoding unit 104 outputs the encoded audio data SIG104 to the audio data buffer 107.

The operation control unit 201 also outputs the multiplexing control information SIG115 and the time difference information SIG201 to the multiplexing unit 202. The value of the time difference information SIG201 is (video frame period) - (Δt7 % (video frame period)). This value equals the time offset between the video data and the audio data captured when the video encoding unit 103 and the audio encoding unit 104 start encoding.

The multiplexing unit 202 then calculates the playback time information of the encoded video data SIG105 from the video frame information SIG108, and calculates the playback time information of the encoded audio data SIG106 from the audio frame information SIG109 and the value of the time difference information SIG201. The multiplexing unit 202 determines the multiplexing order of the encoded video data SIG105 and the encoded audio data SIG106 and, following that order, multiplexes the encoded video data SIG105 acquired from the video data buffer 106 and the encoded audio data SIG106 acquired from the audio data buffer 107, adding the playback time information to each.

FIG. 8 shows the playback time information added to each video frame and audio frame when the audio data is delayed. As shown in FIG. 8, if the playback time information of video frame v701 (the first video frame) is t800, the playback time information t801 of audio frame a701 (the first audio frame) is t800 + (video frame period) - (Δt7 % (video frame period)). In other words, the offset between the input timings of video frame v701 and audio frame a701 equals the offset between the playback time information added to them. During playback, the interval t801 - t800 is silent, but because it is no longer than one video frame period, it does not affect viewing.

When the audio data is input later than the video data, the encoded video data is held in the video data buffer 106 longer by (time t702 - time t701), as described above. Therefore, in this embodiment, the video data buffer 106 needs, in addition to the capacity required when there is no offset between the input timings of the video data SIG101 and the audio data SIG102, a capacity of ([(time t702 - time t701) / (video frame period)] + 1) × (maximum size of one video frame of encoded video data).
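The additional buffer capacity can be worked through with a small sketch. It assumes that the square brackets in the expression denote rounding down to an integer, which the text here does not state explicitly, and the function name and byte figures are hypothetical.

```python
def extra_buffer_bytes(hold_time, unit_period, max_frame_bytes):
    """Additional buffer capacity for the stream that has to wait.

    hold_time       : extra holding time, e.g. t702 - t701, in ticks
    unit_period     : frame period of the buffered stream, in ticks
    max_frame_bytes : maximum encoded size of one frame
    The brackets in the source expression are taken as floor (assumption).
    """
    frames_held = hold_time // unit_period
    return (frames_held + 1) * max_frame_bytes


# Hypothetical numbers: video waits 300 ticks with a 100-tick frame
# period, and one encoded frame is at most 150,000 bytes.
video_extra = extra_buffer_bytes(300, 100, 150_000)
```

The extra "+ 1" frame gives headroom for a frame that is still being written while the oldest one is being read out.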

(Pausing and resuming encoding)
For example, when the operation control unit 201 receives a pause instruction at the pause instruction time t703 shown in FIG. 7, it sends the video encoding unit 103 the video encoding control information SIG113 instructing a pause at time t704, when the first video frame boundary after t703 is detected. The operation control unit 201 then outputs the audio encoding control information SIG114 instructing a pause to the audio encoding unit 104 at the earliest video frame boundary later than time t704 + Δt7 (time t705).

Likewise, when the operation control unit 201 receives an instruction to start (resume) encoding at the resume instruction time t706, it sends the video encoding unit 103 the video encoding control information SIG113 instructing the start of encoding at time t707, when the first video frame boundary after t706 is detected. It then outputs the audio encoding control information SIG114 instructing the start (resumption) of encoding to the audio encoding unit 104 at the earliest video frame boundary later than time t707 + Δt7 (time t708).

Therefore, as shown in FIG. 8, video frame v704 and video frame v705, which precede and follow the pause and resume, are seamlessly connected at playback time information t802. Audio frame a708, which spans the pause and resume timings, is seamlessly connected at playback time information t803, which equals t802 + (video frame period) - (Δt7 % (video frame period)). In this case too, the difference between playback time information t802 and t803 is no more than one video frame period and does not affect viewing.

(Operation when video data is input later than audio data)
Next, the encoding start, pause, and resume operations when the video data is input later than the audio data are described with reference to FIG. 9. FIG. 9 shows the control timings of the operation control unit 201, the video encoding unit 103, and the audio encoding unit 104.

Here, the operation control unit 201 is assumed to receive the control information SIG110 and the input time difference information SIG111 at the start instruction time t900 shown in FIG. 9. The time difference indicated by the input time difference information SIG111 is assumed to be Δt9 (a negative value when the video data is delayed), and the first video frame boundary after the start instruction time t900 is assumed to be detected at time t901.

When the control information SIG110 is input, the operation control unit 201 sends the audio encoding unit 104 the audio encoding control information SIG114 instructing the start of encoding at time t901.

Furthermore, the operation control unit 201 sends the video encoding unit 103 the video encoding control information SIG113 instructing the start of encoding at the detection time (time t902) of the latest video frame boundary earlier than time t901 + |Δt9|.

The operation control unit 201 also notifies the multiplexing unit 202 of |Δt9| % (video frame period) as the time difference information SIG201, together with the multiplexing control information SIG115. This value equals the time offset between the video data and the audio data captured when the video encoding unit 103 and the audio encoding unit 104 start encoding.

FIG. 10 shows the playback time information added to each video frame and audio frame when the video data is delayed. As shown in FIG. 10, if the playback time information of video frame v901 (the first video frame) is t1000, the playback time information t1001 of audio frame a901 (the first audio frame) is t1000 + (|Δt9| % (video frame period)). In other words, the offset between the input timings of video frame v901 and audio frame a901 equals the offset between the playback time information added to them. During playback, the interval t1001 - t1000 is silent, but because it is no longer than one video frame period, it does not affect viewing.

When the video data is input later than the audio data, the encoded audio data is held in the audio data buffer 107 longer by (time t902 - time t901), as described above. Therefore, in this embodiment, the audio data buffer 107 needs, in addition to the capacity required when there is no offset between the input timings of the video data SIG101 and the audio data SIG102, a capacity of ([|time t902 - time t901| / (audio frame period)] + 1) × (maximum size of one audio frame of encoded audio data).

(Pausing and resuming encoding)
For example, when the operation control unit 201 receives a pause instruction at the pause instruction time t904 shown in FIG. 9, it sends the audio encoding unit 104 the audio encoding control information SIG114 instructing a pause at time t905, when the first video frame boundary after t904 is detected. It then sends the video encoding unit 103 the video encoding control information SIG113 instructing a pause at time t905, the latest video frame boundary earlier than time t905 + |Δt9|.

Also, for example, when the operation control unit 201 receives an instruction to start (resume) encoding at resume instruction time t908, it sends audio encoding control information SIG114 instructing the start (resumption) of encoding to the audio encoding unit 104 at time t909, when the first video frame boundary after resume instruction time t908 is detected. In addition, at time t910, the latest video frame boundary before time t909 + |Δt9|, it sends video encoding control information SIG113 instructing the start (resumption) of encoding to the video encoding unit 103.
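The selection of the two notification times can be sketched as follows for the case described here, in which the video data is delayed. This is only an illustration under stated assumptions: the function name, the list of boundary times, and the numeric values are hypothetical, and treating a zero time difference as "notify both encoding units at the same boundary" is an assumption not stated in the embodiment.

```python
import bisect
from typing import Sequence, Tuple


def control_times(
    instruction_time: int,
    boundaries: Sequence[int],
    delta_t: int,
) -> Tuple[int, int]:
    """Return (audio_time, video_time) for a pause or resume instruction.

    audio_time: first video frame boundary after the instruction time.
    video_time: latest video frame boundary before audio_time + |delta_t|.
    'boundaries' must be sorted in ascending order (hypothetical input).
    """
    i = bisect.bisect_right(boundaries, instruction_time)
    if i == len(boundaries):
        raise ValueError("no video frame boundary after the instruction time")
    audio_time = boundaries[i]

    limit = audio_time + abs(delta_t)
    # Index of the last boundary strictly before 'limit'.
    j = bisect.bisect_left(boundaries, limit) - 1
    # Assumption: never notify the video side earlier than the audio side.
    video_time = boundaries[max(i, j)]
    return audio_time, video_time


# Hypothetical timeline: a video frame boundary every 100 time units.
frame_boundaries = list(range(0, 1000, 100))
result = control_times(250, frame_boundaries, 150)
```

In this example the audio encoding unit is notified at 300 and the video encoding unit at 400, so the shift between the two is a whole number of video frame periods.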

Therefore, as shown in FIG. 10, video frames v905 and v906, which precede and follow the pause and resumption, are seamlessly connected at reproduction time information t1002. Audio frame a907, which spans the pause and resumption timings, is seamlessly connected at reproduction time information t1003, which equals t1002 + |Δt9| % (video frame period). In this case too, the difference between reproduction time information t1002 and t1003 is no more than one video frame period and does not affect viewing.

As described above, according to this embodiment, the encoding start timing of the video encoding unit and the encoding start timing of the audio encoding unit are adjusted in units of the video frame period according to the amount of deviation between the input timings of the video data and the audio data. Any deviation that cannot be absorbed in units of the video frame period is adjusted by changing the reproduction time information added to the multiplexed stream. Therefore, even if the input timings of the video data and the audio data deviate from each other, the video and the audio can be kept in sync when the multiplexed stream is reproduced.

Moreover, unlike the video/audio multiplexing apparatus 100 of Embodiment 1, this embodiment does not require the time measuring unit 102, so the apparatus can be configured on a smaller scale.

Each component of Embodiment 1 and Embodiment 2 may be realized by hardware, or may be realized by software running on a central processing unit (CPU).

In the video/audio multiplexing apparatus according to the present invention, the encoding start timing of the video data and the encoding start timing of the audio data are adjusted according to the amount of deviation between the input timings of the video data and the audio data. The apparatus therefore has the effect of generating a multiplexed stream in which the video and the audio do not go out of sync during reproduction even if their input timings deviate, and it is useful as, for example, a video/audio multiplexing apparatus that encodes video data and audio data and multiplexes the encoded video data and the encoded audio data.

FIG. 1 is a block diagram showing the configuration of the video/audio multiplexing apparatus 100 according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing the control timings of the operation control unit 105, the video encoding unit 103, and the audio encoding unit 104 in Embodiment 1 when the audio data is delayed.
FIG. 3 is a diagram showing the reproduction time information added to each video frame and audio frame in Embodiment 1 when the audio data is delayed.
FIG. 4 is a diagram showing the control timings of the operation control unit 105, the video encoding unit 103, and the audio encoding unit 104 in Embodiment 1 when the video data is delayed.
FIG. 5 is a diagram showing the reproduction time information added to each video frame and audio frame in Embodiment 1 when the video data is delayed.
FIG. 6 is a block diagram showing the configuration of the video/audio multiplexing apparatus 200 according to Embodiment 2 of the present invention.
FIG. 7 is a diagram showing the control timings of the operation control unit 201, the video encoding unit 103, and the audio encoding unit 104 in Embodiment 2 when the audio data is delayed.
FIG. 8 is a diagram showing the reproduction time information added to each video frame and audio frame in Embodiment 2 when the audio data is delayed.
FIG. 9 is a diagram showing the control timings of the operation control unit 201, the video encoding unit 103, and the audio encoding unit 104 in Embodiment 2 when the video data is delayed.
FIG. 10 is a diagram showing the reproduction time information added to each video frame and audio frame in Embodiment 2 when the video data is delayed.

Explanation of symbols

100 Video/audio multiplexing apparatus
101 Detection unit
102 Time measuring unit
103 Video encoding unit
104 Audio encoding unit
105 Operation control unit
106 Video data buffer
107 Audio data buffer
108 Multiplexing unit
200 Video/audio multiplexing apparatus
201 Operation control unit
202 Multiplexing unit

Claims

A video/audio multiplexing apparatus that receives video data and audio data having temporal correlation, generates encoded video data by encoding the video data and encoded audio data by encoding the audio data, and multiplexes the encoded video data and the encoded audio data to generate one multiplexed stream, the apparatus comprising:
a video encoding unit that encodes the video data in units of video frames and generates the encoded video data;
an audio encoding unit that encodes the audio data in units of audio frames and generates the encoded audio data; and
an operation control unit that receives input time difference information indicating a time difference between the input timings of the video data and the audio data, controls the video encoding unit to start encoding at a detection timing of a video frame boundary of the video data, and controls the audio encoding unit to start encoding at a timing shifted, with reference to the encoding start timing of the video encoding unit, by the time indicated by the input time difference information.

A video/audio multiplexing apparatus that receives video data and audio data having temporal correlation, generates encoded video data by encoding the video data and encoded audio data by encoding the audio data, and multiplexes the encoded video data and the encoded audio data to generate one multiplexed stream, the apparatus comprising:
a video encoding unit that encodes the video data in units of video frames and generates the encoded video data;
an audio encoding unit that encodes the audio data in units of audio frames and generates the encoded audio data;
an operation control unit that receives input time difference information indicating a time difference between the input timings of the video data and the audio data, controls the video encoding unit to start encoding at a detection timing of a video frame boundary of the video data, controls the audio encoding unit to start encoding at a timing shifted, with reference to the encoding start timing of the video encoding unit, by a time in units of the video frame period, and further outputs time difference information indicating the difference between the input time difference information and the shift between the encoding start timings of the video encoding unit and the audio encoding unit; and
a multiplexing unit that adds reproduction time information corresponding to the time difference indicated by the time difference information and multiplexes the encoded video data and the encoded audio data.

The video/audio multiplexing apparatus according to claim 1 or 2, wherein,
when temporarily stopping the encoding of the video encoding unit and the audio encoding unit, the operation control unit temporarily stops the encoding of the video encoding unit and the encoding of the audio encoding unit at timings shifted from each other by the same amount as the time difference between the encoding start timings of the video encoding unit and the audio encoding unit.

The video/audio multiplexing apparatus according to claim 1 or 2, further comprising:
a video data buffer that holds the output of the video encoding unit; and
an audio data buffer that holds the output of the audio encoding unit, wherein
the video data buffer has, in addition to the capacity for the encoded video data that needs to be buffered for the multiplexing, a capacity capable of holding the encoded video data generated by the video encoding unit during the time difference between the encoding start timings of the video encoding unit and the audio encoding unit, and
the audio data buffer has, in addition to the capacity for the encoded audio data that needs to be buffered for the multiplexing, a capacity capable of holding the encoded audio data generated by the audio encoding unit during the time difference between the encoding start timings of the video encoding unit and the audio encoding unit.

The video/audio multiplexing apparatus according to claim 1, wherein,
when the input timing of the audio data is later than the input timing of the video data, the operation control unit controls the audio encoding unit to start encoding at a timing delayed, with reference to the encoding start timing of the video encoding unit, by the time indicated by the input time difference information.

The video/audio multiplexing apparatus according to claim 1, wherein,
when the input timing of the video data is later than the input timing of the audio data, the operation control unit controls the audio encoding unit to start encoding at a timing earlier, with reference to the encoding start timing of the video encoding unit, by the time indicated by the input time difference information.

The video/audio multiplexing apparatus according to claim 2, wherein,
when the input timing of the audio data is later than the input timing of the video data, the operation control unit controls the audio encoding unit to start encoding at a timing delayed, with reference to the encoding start timing of the video encoding unit, by a time in units of the video frame period.

The video/audio multiplexing apparatus according to claim 2, wherein,
when the input timing of the video data is later than the input timing of the audio data, the operation control unit controls the audio encoding unit to start encoding at a timing earlier, with reference to the encoding start timing of the video encoding unit, by a time in units of the video frame period.