JP5115120B2

JP5115120B2 - Video display device and audio output device

Info

Publication number: JP5115120B2
Application number: JP2007256418A
Authority: JP
Inventors: 政弘高山
Original assignee: Fujitsu Mobile Communications Ltd
Current assignee: Fujitsu Mobile Communications Ltd
Priority date: 2007-09-28
Filing date: 2007-09-28
Publication date: 2013-01-09
Anticipated expiration: 2027-09-28
Also published as: JP2009089056A

Description

The present invention relates to a video display device and an audio output device, and more particularly to synchronizing video display and audio output when the video and the audio are output from different devices.

Video display devices, for example small portable devices such as mobile communication terminal devices, display the video of content obtained by receiving digital television broadcasts. Such content consists of, for example, video, audio, and text, and the device not only displays the video but also outputs the audio and displays the text.

The content to be displayed and output is not limited to content obtained by receiving digital television broadcasts. For example, it may be content received over a videophone call, content transmitted for streaming playback, or content stored on a storage medium, whether removable or not.

When content is displayed and output, it is sometimes desirable that the audio not be heard by people near the device, for example when the audio would disturb them or when the user of the device does not want them to hear confidential information contained in the audio.

For this reason, the audio is output from an audio output device such as a headphone device, and the video display device and the audio output device are connected by short-range wireless communication, for example Bluetooth (registered trademark) communication.

When video and the like are output from the video display device and audio is output from an audio output device wirelessly connected to it in this way, the audio is output with a delay. It is known that the video display device can perform processing to cancel this delay time and thereby synchronize the output of the video and the like with the output of the audio (see, for example, Patent Document 1).
Patent Document 1: Japanese Patent Laying-Open No. 2005-79614 (pages 2-3 and 11-13, FIG. 6)

However, in the method disclosed in Patent Document 1, estimating the delay time requires the audio output device to have synchronization functions, such as referring to time stamps to determine the time at which audio reproduction matching its reproduction capability becomes possible. The audio output device therefore inevitably becomes more complex and, in turn, larger and more power-hungry. The increase in power consumption is especially serious when the audio output device is portable and is driven by power stored in a battery.

The present invention has been made to solve the above problem. Its object is to provide an audio output device that does not itself determine the audio output timing, and a video display device that uses that audio output device to estimate the delay time and synchronize the output of video and the like with the audio output.

To achieve the above object, a video display device according to the present invention displays the video data of content consisting of video data, the time at which that video data is to be displayed, audio data, and the time at which that audio data is to be output, and causes an audio output device to output the audio data of the content. The video display device comprises: display means for displaying video data; short-range wireless processing means for communicating with the audio output device over a short-range wireless link; audio delay time estimation means for estimating an audio delay time as the difference between the time at which the short-range wireless processing means started the process of transmitting audio data to the audio output device and the time at which the short-range wireless processing means received, from the audio output device, a notification that the audio data had been output; and video/audio synchronization control means for causing the display means to display video data at the time obtained by adding a video delay time to the time at which that video data is to be displayed, and for causing the short-range wireless processing means to start the process of transmitting audio data to the audio output device at the time obtained by subtracting the audio delay time from, and adding the video delay time to, the time at which that audio data is to be output.
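The two computations in this summary can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names and the use of plain integer clock ticks are assumptions made for the example.

```python
def estimate_audio_delay(send_start_time: int, notice_received_time: int) -> int:
    """Audio delay = time the output notification arrived - time transmission started."""
    return notice_received_time - send_start_time


def schedule(video_pts: int, audio_pts: int, audio_delay: int, video_delay: int):
    """Return (video display time, audio transmission start time)."""
    # Video is shown at its presentation time plus the video delay.
    display_at = video_pts + video_delay
    # Audio transmission starts at its presentation time, minus the
    # estimated audio delay, plus the same video delay.
    send_at = audio_pts - audio_delay + video_delay
    return display_at, send_at


# Hypothetical tick values, chosen only for illustration.
audio_delay = estimate_audio_delay(send_start_time=1000, notice_received_time=1180)
print(audio_delay)                              # 180
print(schedule(5000, 5000, audio_delay, 100))   # (5100, 4920)
```

With these numbers the audio leaves the terminal 180 ticks before the frame is displayed, so that, if the estimate holds, it reaches the listener at the moment the matching frame appears.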

An audio output device according to the present invention comprises: short-range wireless processing means for communicating with a video display device over a short-range wireless link; audio output means for outputting, from a speaker, audio data received by the short-range wireless processing means; and video/audio synchronization control means for controlling the short-range wireless processing means, when the audio output means has output audio data to which predetermined identification information was added, so as to notify the video display device that the audio data has been output.

According to the present invention, the delay time can be estimated using an audio output device that does not itself determine the audio output timing, and the output of video and the like can be synchronized with the audio output.

Embodiments of a video display device and an audio output device according to the present invention will be described below with reference to the drawings.

(First embodiment)
FIG. 1 is a block diagram showing a configuration in which a mobile communication terminal device, to which the video display device according to the first embodiment of the present invention is applied, is connected to a headphone device, to which the audio output device according to the first embodiment is applied. The mobile communication terminal device MS and the headphone device HP are connected by a short-range wireless communication line BT. Here, the short-range wireless communication line BT is a Bluetooth (R) communication line.

FIG. 2 is a block diagram showing the configuration of the mobile communication terminal device MS. The mobile communication terminal device MS includes a control unit 11 that controls the entire device, an antenna 12a that transmits and receives radio waves to and from a base station (not shown), a communication unit 12b, a transmission/reception unit 13, a speaker 14a for receiving calls, a microphone 14b for transmitting speech, a call unit 14c, a display unit 15, and an input device 16.

The mobile communication terminal device MS further includes a television processing unit 21, a video buffer 22, an STC (System Time Clock) unit 23 that indicates the time, an audio buffer 24, a video reproduction unit 31, an audio data switching unit 41, an audio reproduction unit 42, a speaker 42a for audio reproduction, and a short-range wireless processing unit 43 that communicates over the short-range wireless communication line BT. The video buffer 22 stores video data decoded by the television processing unit 21, and the audio buffer 24 stores audio data decoded by the television processing unit 21. In FIG. 2, the flow of video data and audio data is indicated by thick solid arrows.

FIG. 3 is a block diagram showing the detailed configuration of the television processing unit 21. The television processing unit 21 includes an antenna 21a that receives television broadcast radio waves, a tuner unit 21b, a DEMUX unit 21c, a video decoding unit 21d connected to the video buffer 22, an STC calibration unit 21e connected to the STC unit 23, and an audio decoding unit 21f connected to the audio buffer 24.

FIG. 4 is a block diagram showing the detailed configuration of the short-range wireless processing unit 43. The short-range wireless processing unit 43 includes an audio encoding unit 43a, which is connected to the STC unit 23 and, via the audio data switching unit 41, to the audio buffer 24; an encoded audio buffer 43b; a short-range wireless transmission/reception unit 43c; a short-range wireless communication unit 43d; and an antenna 43e that transmits and receives radio waves of the short-range wireless communication line BT. The encoded audio buffer 43b stores audio data encoded by the audio encoding unit 43a.

FIG. 5 is a block diagram showing the configuration of the headphone device HP. The headphone device HP includes a control unit 51 that controls the entire device, a short-range wireless communication unit 52 that communicates over the short-range wireless communication line BT, an antenna 52a that transmits and receives radio waves of the short-range wireless communication line BT, a short-range wireless transmission/reception unit 53, an encoded audio buffer 54, an audio decoding unit 55, an audio reproduction unit 56, a speaker 56a for audio reproduction, a display unit 57, and an input device 58. The encoded audio buffer 54 stores the encoded audio data received by the short-range wireless transmission/reception unit 53.

The mobile communication terminal device MS and the headphone device HP described above may each be implemented as a computer and a program used by the computer. In particular, as described later, the operations of the control unit 11 and the control unit 51 (especially those of the control unit 11) do not necessarily follow a fixed routine, so they are preferably implemented as a computer and a program used by the computer.

The operation of each part of the devices according to the first embodiment of the present invention, configured as described above, will be described with reference to FIGS. 2 to 5. First, the operation of each part of the mobile communication terminal device MS will be described with reference to FIG. 2. The communication unit 12b outputs the high-frequency signal received by the antenna 12a to the transmission/reception unit 13, and transmits the high-frequency signal output from the transmission/reception unit 13 from the antenna 12a.

The transmission/reception unit 13 amplifies, frequency-converts, and demodulates the high-frequency signal from the communication unit 12b to obtain a digital signal. It sends the resulting call audio signal to the call unit 14c and the resulting control signal to the control unit 11.

The transmission/reception unit 13 also modulates, frequency-converts, and amplifies digital signals, namely the call audio signal output from the call unit 14c and the control signal output from the control unit 11, to obtain a high-frequency signal, and sends it to the communication unit 12b for transmission.

The call unit 14c converts the digital audio signal output from the transmission/reception unit 13 into an analog audio signal, amplifies it, and sends it to the speaker 14a. It also amplifies the analog audio signal output from the microphone 14b, converts it into a digital audio signal, and sends it to the transmission/reception unit 13.

The display unit 15 is, for example, an LCD. Under the control of the control unit 11 it displays characters, numerals, and video data, and the displayed data is switched on instruction from the control unit 11 in response to an input operation on the input device 16 or to an incoming call signal.

The input device 16 consists of keys, including numeric keys for specifying, for example, the telephone number of a communication partner, and a plurality of function keys. When a key of the input device 16 is operated, the identifier of that key is reported to the control unit 11, which either displays it as a character on the display unit 15 or performs the corresponding control.

Next, the operation of each part of the television processing unit 21 will be described with reference to FIG. 3. The television processing unit 21 starts operating on instruction from the control unit 11. The tuner unit 21b then selects, from the high-frequency signals received by the antenna 21a, the high-frequency signal of the channel specified by a predetermined key operation on the input device 16.

The tuner unit 21b converts the selected high-frequency signal into an intermediate-frequency signal and demodulates the converted signal to obtain encoded television broadcast content. Here, the television broadcast content is a signal encoded by the MPEG system, but it is not limited to this.

The DEMUX unit 21c separates the broadcast content obtained by the tuner unit 21b into an encoded video signal, an encoded audio signal, and a PCR (Program Clock Reference). It sends the encoded video signal to the video decoding unit 21d, the encoded audio signal to the audio decoding unit 21f, and the PCR to the STC calibration unit 21e. Here, the audio signal is a signal encoded by the AAC system, but it is not limited to this.

The video decoding unit 21d decodes the encoded video signal separated by the DEMUX unit 21c one video frame at a time. It attaches to each decoded video frame signal the time at which that frame signal is to be displayed, the PTS (Presentation Time Stamp), and stores the result in the video buffer 22. If a PTS was attached to the encoded video frame signal, that PTS is used.

If no PTS was attached, the video decoding unit 21d counts, in encoded video frame signals, the interval between this encoded video frame signal and an earlier encoded video frame signal that was received before it, is displayed before it, and had a PTS attached. It then obtains the PTS by adding to that attached PTS the product of this count and the time interval at which video frame signals are generated, and attaches the result.
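The derivation of a missing PTS described above amounts to one multiplication and one addition. The sketch below assumes MPEG's 90 kHz PTS clock and a hypothetical frame rate of 30 frames per second; the function name and the sample values are illustrative only.

```python
def interpolate_pts(last_attached_pts: int, frames_since: int, frame_interval: int) -> int:
    """PTS for a frame that arrived without one.

    last_attached_pts: PTS of the most recent earlier frame that carried a PTS.
    frames_since:      number of frames between that frame and this one.
    frame_interval:    time between consecutive frames, in PTS ticks.
    """
    return last_attached_pts + frames_since * frame_interval


# Assumption for the example: 90 kHz ticks, 30 frames per second.
FRAME_INTERVAL = 90_000 // 30  # 3000 ticks per frame

# The second frame after a frame stamped 90000 gets 90000 + 2 * 3000.
print(interpolate_pts(90_000, 2, FRAME_INTERVAL))  # 96000
```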

FIG. 6 shows an example of the format of the decoded video signal stored in the video buffer 22. The decoded video signal 22a is a series of entries, each associating a PTS 22b with the video frame signal 22c to be displayed at the time indicated by that PTS 22b, and the entries are arranged in order of their PTS 22b.

The audio decoding unit 21f decodes the encoded audio signal separated by the DEMUX unit 21c one audio frame at a time. It attaches to each decoded audio frame signal the time at which that frame signal is to be output, the PTS, and stores the result in the audio buffer 24. If a PTS was attached to the encoded audio frame signal, that PTS is used.

If no PTS was attached, the audio decoding unit 21f counts, in encoded audio frame signals, the interval between this encoded audio frame signal and an earlier encoded audio frame signal that was received before it, is output before it, and had a PTS attached. It then obtains the PTS by adding to that attached PTS the product of this count and the time interval at which audio frame signals are generated, and attaches the result.

FIG. 7 shows an example of the format of the decoded audio signal stored in the audio buffer 24. The decoded audio signal 24a is a series of entries, each associating a PTS 24b with the audio frame signal 24c to be output at the time indicated by that PTS 24b, and the entries are arranged in order of their PTS 24b.
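The buffer format of FIGS. 6 and 7, a sequence of (PTS, frame) entries kept in PTS order, might be modeled as below. The class and method names are hypothetical, and the byte strings merely stand in for decoded frame data.

```python
import bisect
from dataclasses import dataclass, field


@dataclass(order=True)
class BufferEntry:
    pts: int                            # presentation time stamp (sort key)
    frame: bytes = field(compare=False) # decoded frame, ignored when ordering


class PtsOrderedBuffer:
    """Holds entries sorted by PTS, as in the video buffer 22 or audio buffer 24."""

    def __init__(self):
        self._entries = []

    def store(self, pts: int, frame: bytes) -> None:
        # insort keeps the list ordered by PTS even if frames arrive out of order.
        bisect.insort(self._entries, BufferEntry(pts, frame))

    def pts_sequence(self):
        return [entry.pts for entry in self._entries]


buf = PtsOrderedBuffer()
buf.store(6000, b"frame-c")
buf.store(0, b"frame-a")
buf.store(3000, b"frame-b")
print(buf.pts_sequence())  # [0, 3000, 6000]
```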

The STC calibration unit 21e receives the PCR separated by the DEMUX unit 21c and calibrates the STC unit 23 so that the STC unit 23 indicates the time given by that PCR.
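One simple way to picture this calibration is a clock that stores the offset between a free-running local counter and the most recent PCR. This is only a sketch under that assumption; the text does not say how the STC unit 23 is actually adjusted, and the class and parameter names are invented for the example.

```python
class SystemTimeClock:
    """Toy STC: a local tick counter plus an offset set from the PCR."""

    def __init__(self, local_ticks):
        self._local_ticks = local_ticks  # callable returning the local tick count
        self._offset = 0

    def calibrate(self, pcr: int) -> None:
        # After this call, now() returns the PCR value at the current instant.
        self._offset = pcr - self._local_ticks()

    def now(self) -> int:
        return self._local_ticks() + self._offset


# A list cell stands in for a hardware counter so the example is deterministic.
ticks = [100]
stc = SystemTimeClock(lambda: ticks[0])
stc.calibrate(pcr=5000)
print(stc.now())   # 5000
ticks[0] = 160     # 60 local ticks later
print(stc.now())   # 5060
```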

The description now returns to the operation of each part of the mobile communication terminal device MS, with reference to FIG. 2. The video reproduction unit 31 starts operating on instruction from the control unit 11. It obtains from the video buffer 22 the decoded video signal 22a whose PTS 22b equals the time indicated by the STC unit 23, and displays the video frame signal 22c of that decoded video signal 22a on the display unit 15. The video reproduction unit 31 repeats this operation of obtaining and displaying a decoded video signal 22a.

Note that the video frame signal 22c must be on the display unit 15 at the moment the time indicated by the STC unit 23 equals the PTS 22b stored in the video buffer 22. The video reproduction unit 31 therefore obtains the decoded video signal 22a a predetermined time before the time indicated by the STC unit 23 becomes equal to the PTS 22b. This predetermined time is the time the video reproduction unit 31 needs for display processing. For simplicity, the description above assumed this time to be zero, and the description below does the same.
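The fetch rule above can be sketched as a check that the reproduction unit runs repeatedly. The code is an illustration with assumed names; it uses a "less than or equal" comparison rather than strict equality, which is a choice made here so that a frame is not missed if the check runs slightly late.

```python
def take_due_frame(buffer, stc_now: int, lead_time: int = 0):
    """Pop and return the first (pts, frame) entry that is due, else None.

    buffer:    list of (pts, frame) tuples sorted by PTS.
    stc_now:   current time indicated by the system time clock.
    lead_time: processing time needed before output (zero in the text's simplification).
    """
    if buffer and buffer[0][0] - lead_time <= stc_now:
        return buffer.pop(0)
    return None


frames = [(3000, "frame-1"), (6000, "frame-2")]
print(take_due_frame(frames, stc_now=2999))                 # None
print(take_due_frame(frames, stc_now=3000))                 # (3000, 'frame-1')
print(take_due_frame(frames, stc_now=5900, lead_time=100))  # (6000, 'frame-2')
```

The last call shows the effect of a non-zero lead time: the frame stamped 6000 is fetched 100 ticks early so that processing can finish by its presentation time.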

The audio data switching unit 41, on instruction from the control unit 11, causes either the audio reproduction unit 42 or the short-range wireless processing unit 43 to read the decoded audio signal 24a stored in the audio buffer 24.

The audio reproduction unit 42 starts operating on instruction from the control unit 11. It obtains from the audio buffer 24 the decoded audio signal 24a whose PTS 24b equals the time indicated by the STC unit 23, converts the audio frame signal 24c of that decoded audio signal 24a into an analog signal, and outputs it from the speaker 42a. The audio reproduction unit 42 repeats this operation of obtaining and outputting a decoded audio signal 24a.

Note that the audio of the audio frame signal 24c must be output from the speaker 42a at the moment the time indicated by the STC unit 23 equals the PTS 24b stored in the audio buffer 24. The audio reproduction unit 42 therefore obtains the decoded audio signal 24a a predetermined time before the time indicated by the STC unit 23 becomes equal to the PTS 24b.

This predetermined time is the time the audio reproduction unit 42 needs for audio output processing. For simplicity, the description above assumed this time to be zero, and the description below does the same, as was done for the video reproduction unit 31.

Next, the operation of each part of the short-range wireless processing unit 43 will be described with reference to FIG. 4. The short-range wireless processing unit 43 starts operating on instruction from the control unit 11. The audio encoding unit 43a obtains from the audio buffer 24 the decoded audio signal 24a whose PTS 24b equals the time indicated by the STC unit 23.

The audio encoding unit 43a encodes the audio frame signal 24c of the obtained decoded audio signal 24a and stores the encoded audio signal in the encoded audio buffer 43b. The encoding uses, for example, the SBC (Sub Band Codec) system and is performed in units of a predetermined frame (hereinafter called an SBC frame), but it is not limited to this. The audio encoding unit 43a repeats this operation of obtaining, encoding, and storing a decoded audio signal 24a.

The short-range wireless transmission/reception unit 43c reads the encoded audio signals stored in the encoded audio buffer 43b one SBC frame at a time, in the order in which they were stored. It modulates, frequency-converts, and amplifies the encoded audio signal it has read, together with the control signal output from the control unit 11, to obtain a high-frequency signal, and sends that signal to the short-range wireless communication unit 43d for transmission. The short-range wireless transmission/reception unit 43c also sends control signals received by the short-range wireless communication unit 43d to the control unit 11.

The short-range wireless communication unit 43d transmits the high-frequency signal sent from the short-range wireless transmission/reception unit 43c from the antenna 43e. It also sends the high-frequency signal received by the antenna 43e to the short-range wireless transmission/reception unit 43c.

Next, the operation of each part of the headphone device HP will be described with reference to FIG. 5. The short-range wireless communication unit 52 sends the high-frequency signal received by the antenna 52a to the short-range wireless transmission/reception unit 53. It also transmits the high-frequency signal output from the short-range wireless transmission/reception unit 53 from the antenna 52a.

The short-range wireless transmission/reception unit 53 amplifies, frequency-converts, and demodulates the high-frequency signal from the short-range wireless communication unit 52 to obtain a digital signal. It stores the resulting audio signal, in units of SBC frames, in the encoded audio buffer 54 and sends the resulting control signal to the control unit 51.

The short-range wireless transmission/reception unit 53 also modulates, frequency-converts, and amplifies a digital signal, namely the control signal output from the control unit 51, to obtain a high-frequency signal, and sends it to the short-range wireless communication unit 52 for transmission.

The audio decoding unit 55 reads the audio signals stored in the encoded audio buffer 54 one SBC frame at a time, in the order in which they were stored, decodes each one, and sends the decoded audio signal to the audio reproduction unit 56. The audio reproduction unit 56 converts the audio signal decoded by the audio decoding unit 55 into an analog signal and outputs it from the speaker 56a.

The display unit 57 is, for example, an LCD. Under the control of the control unit 51 it displays characters, numerals, and video data, and the displayed data is switched on instruction from the control unit 51 in response to an input operation on the input device 58. The display unit 57 may instead be a lamp such as an LED.

The input device 58 consists of keys, including a plurality of function keys. When a key of the input device 58 is operated, the identifier of that key is reported to the control unit 51, which either shows it on the display unit 57 or performs the corresponding control.

The following describes the process of synchronizing the video output of the mobile communication terminal device MS with the audio output of the headphone device HP according to the first embodiment of the present invention.

First, the reason why the audio output of the headphone device HP is delayed will be explained. With the operation described above, when broadcast content consisting of video and audio received by the television processing unit 21 is reproduced and the audio is output from the speaker 42a of the mobile communication terminal device MS, the audio output is not delayed.

That is, when the time indicated by the STC unit 23 equals the PTS 22b stored in the video buffer 22, the video frame signal 22c is displayed on the display unit 15. When the audio is output from the speaker 56a of the headphone device HP, however, it is not output at the time indicated by the PTS 24b but is delayed, for the following reasons.

第１に、音声信号が移動通信端末装置ＭＳにある際に発生する遅延であり、遅延時間は、音声符号化部４３ａによる符号化に要する時間、及び、その符号化された音声信号が符号化音声バッファ４３ｂに記憶されてから読み出されるまでの時間の和である。 The first is the delay that occurs while the audio signal is in the mobile communication terminal device MS. This delay time is the sum of the time required for encoding by the audio encoding unit 43a and the time from when the encoded audio signal is stored in the encoded audio buffer 43b until it is read out.

第２に、音声信号が近距離無線通信回線ＢＴを介して伝送されるための遅延である。第３に、音声信号がヘッドフォン装置ＨＰにある際に発生する遅延であり、遅延時間は、符号化された音声信号が符号化音声バッファ５４に記憶されてから読み出されるまでの時間、及び、音声復号化部５５による復号に要する時間の和である。 The second is the delay incurred while the audio signal is transmitted over the short-range wireless communication line BT. The third is the delay that occurs while the audio signal is in the headphone device HP. This delay time is the sum of the time from when the encoded audio signal is stored in the encoded audio buffer 54 until it is read out and the time required for decoding by the audio decoding unit 55.

なお、これらの遅延時間の中で、最も大きい時間は、音声信号が移動通信端末装置ＭＳにあり、符号化された音声信号が符号化音声バッファ４３ｂに記憶されてから読み出されるまでの時間であることが多い。そして、次に大きい時間は、音声信号がヘッドフォン装置ＨＰにあり、符号化された音声信号が符号化音声バッファ５４に記憶されてから読み出されるまでの時間であることが多い。 Of these delay times, the largest is often the time during which the audio signal is in the mobile communication terminal device MS, from when the encoded audio signal is stored in the encoded audio buffer 43b until it is read out. The next largest is often the time during which the audio signal is in the headphone device HP, from when the encoded audio signal is stored in the encoded audio buffer 54 until it is read out.

そこで、制御部１１は、以下に説明するように、上記第１〜第３の遅延によって発生する遅延時間を推定する。そして、映像再生部３１による映像フレーム信号２２ｃの表示を第１の所定時間に渡って遅延させる、及び／または、近距離無線処理部４３による音声フレーム信号２４ｃの符号化ないし送信を第２の所定時間に渡って先行して開始させる。ここで、第１の所定時間と、第２の所定時間との和を音声がスピーカ５６ａから出力させる際の遅延時間に等しくなるように制御することによって、その遅延を打ち消す。 Therefore, as described below, the control unit 11 estimates the delay time caused by the first to third delays. It then delays the display of the video frame signal 22c by the video playback unit 31 by a first predetermined time, and/or starts the encoding or transmission of the audio frame signal 24c by the short-range wireless processing unit 43 earlier by a second predetermined time. The delay is canceled by controlling the sum of the first and second predetermined times so that it equals the delay time incurred when the audio is output from the speaker 56a.

映像再生部３１による映像フレーム信号２２ｃの表示を第１の所定時間に渡って遅延させるには、制御部１１は、映像再生部３１に指示して、ＳＴＣ部２３が示す時刻に第１の所定時間を加算した和の時刻に等しいＰＴＳ２２ｂと関連付けられた映像フレーム信号２２ｃを得て、表示させる。 To delay the display of the video frame signal 22c by the video playback unit 31 by the first predetermined time, the control unit 11 instructs the video playback unit 31 to obtain and display the video frame signal 22c associated with the PTS 22b that is equal to the time obtained by adding the first predetermined time to the time indicated by the STC unit 23.

近距離無線処理部４３による音声フレーム信号２４ｃの符号化ないし送信を第２の所定時間に渡って先行して開始させるには、制御部１１は、近距離無線処理部４３の音声符号化部４３ａに指示して、ＳＴＣ部２３が示す時刻から第２の所定時間を減算した差の時刻に等しいＰＴＳ２４ｂと関連付けられた音声フレーム信号２４ｃを得て、符号化させる。 To start the encoding or transmission of the audio frame signal 24c by the short-range wireless processing unit 43 earlier by the second predetermined time, the control unit 11 instructs the audio encoding unit 43a of the short-range wireless processing unit 43 to obtain and encode the audio frame signal 24c associated with the PTS 24b that is equal to the time obtained by subtracting the second predetermined time from the time indicated by the STC unit 23.

なお、音声フレーム信号２４ｃの符号化を先行させる処理によれば、音声を移動通信端末装置ＭＳのスピーカ４２ａから出力させる場合と、音声をヘッドフォン装置ＨＰのスピーカ５６ａから出力させる場合とで、映像が表示部１５に表示される時刻の差が少ない。 With the process of advancing the encoding of the audio frame signal 24c, the time at which the video is displayed on the display unit 15 differs little between the case where the audio is output from the speaker 42a of the mobile communication terminal device MS and the case where it is output from the speaker 56a of the headphone device HP.

そこで、音声を出力するスピーカを切り替えた際、表示部１５を視認している装置の使用者へ与える違和感が少ない。ただし、この先行させる処理のためには、第２の所定時間に渡って出力されるデータ量の音声フレーム信号２４ｃが音声バッファ２４に記憶されている必要がある。 Consequently, switching the speaker that outputs the audio causes little sense of incongruity for a user watching the display unit 15. However, this advancing process requires that the audio buffer 24 hold an amount of the audio frame signal 24c corresponding to the second predetermined time of output.

一方、映像フレーム信号２２ｃの表示を遅延させる処理は、映像バッファ２２に記憶されている映像フレーム信号２２ｃの量、及び、音声バッファ２４に記憶されている音声フレーム信号２４ｃの量に無関係に行うことができ、これらの量を参照する必要はない。 On the other hand, the process of delaying the display of the video frame signal 22c is performed regardless of the amount of the video frame signal 22c stored in the video buffer 22 and the amount of the audio frame signal 24c stored in the audio buffer 24. There is no need to refer to these quantities.
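
The trade-off between the two compensation methods can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in this document: the function name `split_compensation`, its parameters, and the policy of advancing the audio as far as the buffered audio allows before delaying the video are all hypothetical. The only property taken from the text is that the two predetermined times sum to the estimated delay time and that the audio lead is limited by the audio held in the audio buffer 24.

```python
def split_compensation(total_delay_ms, buffered_audio_ms):
    """Split an estimated delay into (video_delay_ms, audio_lead_ms).

    Assumed policy (not from the source): advance the audio as far as
    the buffered audio allows, and cover the remainder by delaying the
    video. The two parts always sum to total_delay_ms.
    """
    if total_delay_ms < 0 or buffered_audio_ms < 0:
        raise ValueError("times must be non-negative")
    # The audio lead (second predetermined time) cannot exceed the
    # amount of audio already held in the audio buffer.
    audio_lead_ms = min(total_delay_ms, buffered_audio_ms)
    # The video delay (first predetermined time) covers what is left.
    video_delay_ms = total_delay_ms - audio_lead_ms
    return video_delay_ms, audio_lead_ms
```

Under this assumed policy, a 200 ms delay with only 50 ms of buffered audio would be covered by a 50 ms audio lead and a 150 ms video delay.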

次に、制御部１１が、音声をヘッドフォン装置ＨＰのスピーカ５６ａから出力させる場合に発生する遅延時間を推定する処理を説明する。図８は、制御部１１が遅延時間を推定する動作のフローチャートを示す。 Next, the process by which the control unit 11 estimates the delay time that occurs when audio is output from the speaker 56a of the headphone device HP will be described. FIG. 8 shows a flowchart of the operation by which the control unit 11 estimates the delay time.

制御部１１は、所定の時間間隔で遅延時間を推定する動作を開始する（ステップＳ１１ａ）。ここで、遅延時間は、必ずしも一定とは限らないので、所定の時間間隔で推定することが望ましい。なお、所定の時間間隔は、推定された、または、予想される遅延時間と同程度以下であるのは妥当でない。なぜなら、後述するように、遅延時間の推定は、ヘッドフォン装置ＨＰへＳＢＣフレームを送信し、その送信されたＳＢＣフレームに関する回答をヘッドフォン装置ＨＰから受信することによって行う。 The control unit 11 starts the delay time estimation operation at predetermined time intervals (step S11a). Because the delay time is not necessarily constant, it is desirable to estimate it at predetermined intervals. The interval should not, however, be comparable to or shorter than the estimated or expected delay time. The reason is that, as described later, the delay time is estimated by transmitting an SBC frame to the headphone device HP and receiving a reply concerning that SBC frame from the headphone device HP.

そこで、遅延時間と同程度以下の時間間隔で行うと、制御部１１は、受信された回答がいずれのＳＢＣフレームに関する回答かの判断に混乱をきたす可能性があるからである。また、この遅延時間の推定処理の負荷が過大になる可能性があるからである。一方、遅延時間は、必ずしも一定ではないので、遅延時間の推定は、所定の間隔で繰り返すことが妥当である。この間隔は、遅延時間の推定を繰り返し行い、推定された遅延時間の分散が小さい場合、より大きくし、分散が大きい場合、より小さくしても良い。 If estimation were performed at an interval comparable to or shorter than the delay time, the control unit 11 could be confused as to which SBC frame a received reply refers to, and the processing load of delay time estimation could become excessive. On the other hand, because the delay time is not necessarily constant, it is appropriate to repeat the estimation at a predetermined interval. When the estimation is repeated, this interval may be made longer if the variance of the estimated delay times is small and shorter if the variance is large.
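
The interval adjustment described above might look like the following Python sketch. The function name `next_interval`, the variance thresholds, the scaling factor, and the choice of twice the expected delay as a lower bound are all illustrative assumptions; the text only states that the interval should stay above the delay time and may grow when the variance is small and shrink when it is large.

```python
from statistics import pvariance


def next_interval(current_interval, recent_delays, expected_delay,
                  low_var=25.0, high_var=400.0, factor=2.0):
    """Return the next estimation interval, in the same unit as the delays.

    low_var, high_var, and factor are hypothetical tuning values.
    """
    # Assumed lower bound: keep the interval clearly longer than the
    # delay itself, so replies cannot be attributed to the wrong frame.
    min_interval = 2.0 * expected_delay
    if len(recent_delays) >= 2:
        variance = pvariance(recent_delays)
        if variance < low_var:
            current_interval *= factor   # stable delay: estimate less often
        elif variance > high_var:
            current_interval /= factor   # unstable delay: estimate more often
    return max(current_interval, min_interval)
```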

次に、制御部１１は、近距離無線送受信部４３ｃの音声符号化部４３ａに指示して、ＳＢＣフレームに、再生応答要求を付加させると共に符号化音声バッファ４３ｂに記憶させ、そのフレームに係わる音声フレーム信号２４ｃを、音声データ切替部４１を介して音声バッファ２４から読み出した時刻を報告させる（ステップＳ１１ｂ）。 Next, the control unit 11 instructs the audio encoding unit 43a of the short-range wireless transmission/reception unit 43c to add a playback response request to an SBC frame and store the frame in the encoded audio buffer 43b, and to report the time at which the audio frame signal 24c related to that frame was read from the audio buffer 24 via the audio data switching unit 41 (step S11b).

この再生応答要求は、例えば、そのフレーム内のヘッダの１ビットを変化させることによって付加される。そして、制御部１１は、音声符号化部４３ａによって報告された時刻をＴ１１とする。ここで、音声符号化部４３ａは、時刻を、制御部１１が備えるクロック（図示せず）が示す時刻によって得てもよく、ＳＴＣ部２３が示す時刻によって得ても良い。その結果、以後、制御部１１が遅延時間を推定する動作中で用いられる時刻は、これらの２つの時刻の中のいずれか一方である。 This reproduction response request is added, for example, by changing one bit of the header in the frame. Then, the control unit 11 sets the time reported by the speech encoding unit 43a as T11. Here, the speech encoding unit 43a may obtain the time based on the time indicated by a clock (not shown) included in the control unit 11 or may be obtained based on the time indicated by the STC unit 23. As a result, the time used in the operation in which the control unit 11 estimates the delay time thereafter is one of these two times.

なお、ＳＴＣ部２３が示す時刻は、放送されたコンテンツの番組の変化等に伴い、実時刻の変化とは異なる大きな変化をすることがある。そこで、遅延時間を推定する動作中で、ＳＴＣ部２３が示す時刻を用いた場合、上記の動作によって、予想される範囲外の遅延時間、一例として、負の遅延時間が推定されることがある。制御部１１は、それらの予想される範囲外の遅延時間を破棄する。 Note that the time indicated by the STC unit 23 may undergo a large change, unlike the progression of real time, for example when the broadcast program changes. Therefore, if the time indicated by the STC unit 23 is used during the delay time estimation operation, the above operation may yield a delay time outside the expected range, for example a negative delay time. The control unit 11 discards such out-of-range delay times.

以上の説明では、ＳＢＣフレームが遅延時間を推定するために用いられる再生応答要求が付加されたフレームであるか否かは、そのフレーム内のヘッダの１ビットである識別情報によって示されるとしたが、再生応答要求が付加されたフレームであるか否かは、ヘッダの１ビットである識別情報によると限るものではない。 In the above description, whether an SBC frame is a frame to which a playback response request used for estimating the delay time is added is indicated by identification information consisting of one bit of the header in that frame. However, the indication is not limited to such one-bit header identification information.

例えば、ＳＢＣフレームには整数であるフレーム番号が付され、そのフレーム番号がある整数で割り切れる場合、そのＳＢＣフレームは、遅延時間を推定するために用いられると識別されるとしても良い。または、所定のフレーム番号が付されたＳＢＣフレームは、遅延時間を推定するために用いられると識別されるとしても良い。 For example, when an SBC frame is assigned a frame number that is an integer and the frame number is divisible by an integer, the SBC frame may be identified as being used for estimating the delay time. Alternatively, the SBC frame with a predetermined frame number may be identified as being used for estimating the delay time.

これらによれば、ある整数で割り切れるフレーム番号、または、所定のフレーム番号が、そのＳＢＣフレームが遅延時間を推定するために用いられることを識別するための識別情報である。 According to these, a frame number divisible by a certain integer or a predetermined frame number is identification information for identifying that the SBC frame is used for estimating the delay time.

これらの処理によれば、再生応答要求であるか否かを示すビットの伝送が不要であり、ヘッダのビット数の減少が得られる。なお、フレーム番号を除する整数、または、所定のフレーム番号は、予め定められているとしても良く、制御部１１の指示によって定められるとしても良い。また、制御部１１の指示によって、変更が可能としても良い。 According to these processes, it is not necessary to transmit a bit indicating whether or not it is a reproduction response request, and a reduction in the number of bits of the header can be obtained. The integer that divides the frame number or the predetermined frame number may be determined in advance or may be determined by an instruction from the control unit 11. Further, the change may be possible by an instruction from the control unit 11.
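
The frame-number-based identification can be sketched as below. The function name `is_probe_frame` and its parameters are hypothetical; the sketch only illustrates the two rules described above (a frame number divisible by a given integer, or a frame number equal to a predetermined one), either of which removes the need for a dedicated header bit.

```python
def is_probe_frame(frame_number, divisor=None, fixed_numbers=()):
    """Return True if the SBC frame is to be used for delay estimation.

    divisor: the integer that divides probe frame numbers, or None if
             the divisibility rule is not used.
    fixed_numbers: predetermined frame numbers that mark probe frames.
    Both values could be preset or supplied by the control unit.
    """
    if divisor is not None and divisor > 0 and frame_number % divisor == 0:
        return True
    return frame_number in fixed_numbers
```

Because both the sender and the receiver can evaluate the same rule on the frame number, no additional bit has to be transmitted.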

制御部１１は、近距離無線送受信部４３ｃに指示して、上記再生応答要求が付加されたＳＢＣフレームが近距離無線通信部４３ｄによって近距離無線通信回線ＢＴに送信された時刻を報告させる（ステップＳ１１ｃ）。報告された時刻をＴ１２とする。続いて、近距離無線送受信部４３ｃに指示して、上記再生応答要求が付加されたＳＢＣフレームがヘッドフォン装置ＨＰによって受信されたとの回答を得た時刻を報告させる（ステップＳ１１ｄ）。報告された時刻をＴ１３とする。 The control unit 11 instructs the short-range wireless transmission/reception unit 43c to report the time at which the SBC frame to which the playback response request is added was transmitted to the short-range wireless communication line BT by the short-range wireless communication unit 43d (step S11c). Let the reported time be T12. Next, it instructs the short-range wireless transmission/reception unit 43c to report the time at which a reply was obtained indicating that the headphone device HP received that SBC frame (step S11d). Let the reported time be T13.

制御部１１は、更に、近距離無線送受信部４３ｃに指示して、上記再生応答要求が付加されたＳＢＣフレームがヘッドフォン装置ＨＰによって出力されたとの回答を得た時刻を報告させる（ステップＳ１１ｅ）。報告された時刻をＴ１４とする。 Further, the control unit 11 instructs the short-range wireless transmission / reception unit 43c to report the time when the response indicating that the SBC frame to which the reproduction response request is added is output by the headphone device HP is obtained (step S11e). Let the reported time be T14.

報告された時刻Ｔ１１〜Ｔ１４によって、制御部１１は、遅延時間を推定して（ステップＳ１１ｆ）、遅延時間の推定動作を終了する（ステップＳ１１ｇ）。ここで、遅延時間は、（Ｔ１４−Ｔ１１）−（Ｔ１３−Ｔ１２）／２ と算出して推定する。第１項の（Ｔ１４−Ｔ１１）は、ＳＢＣフレームに係わる音声フレーム信号２４ｃが音声バッファ２４から読み出されてから、そのＳＢＣフレームに含まれる音声が出力されるまでの時間を示す。 Based on the reported times T11 to T14, the control unit 11 estimates the delay time (step S11f) and ends the delay time estimation operation (step S11g). The delay time is estimated as (T14 - T11) - (T13 - T12) / 2. The first term, (T14 - T11), is the time from when the audio frame signal 24c related to the SBC frame is read from the audio buffer 24 until the audio contained in that SBC frame is output.

ただし、この時間には、ＳＢＣフレームに係わる音声がヘッドフォン装置ＨＰによって出力されたとの回答が近距離無線通信回線ＢＴを介して伝送されるための遅延が加わっている。そこで、第２項の（Ｔ１３−Ｔ１２）／２は、その加わった遅延を差し引いて補正するための項である。 However, at this time, there is a delay for transmitting a response that the sound related to the SBC frame is output by the headphone device HP via the short-range wireless communication line BT. Therefore, the second term (T13-T12) / 2 is a term for subtracting and correcting the added delay.

即ち、（Ｔ１３−Ｔ１２）は、ＳＢＣフレームが近距離無線送受信部４３ｃからヘッドフォン装置ＨＰへ送信される際の近距離無線通信回線ＢＴを介して伝送されるための遅延と、ヘッドフォン装置ＨＰがそのフレームを受信したとの回答を送信する際の近距離無線通信回線ＢＴを介して伝送されるための遅延との２つの遅延による遅延時間の和である。そこで、第２項では、（Ｔ１３−Ｔ１２）を２で除している。 That is, (T13-T12) is a delay for transmitting the SBC frame via the short-range wireless communication line BT when the SBC frame is transmitted from the short-range wireless transmission / reception unit 43c to the headphone device HP, and the headphone device HP This is a sum of delay times due to two delays, a delay for transmission via the short-range wireless communication line BT when transmitting a reply that a frame has been received. Therefore, in the second term, (T13-T12) is divided by 2.
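
The calculation from the four reported times can be written as a short Python sketch. The function name `estimate_delay` and the variable names are illustrative assumptions; the arithmetic follows the expression (T14 - T11) - (T13 - T12) / 2 given in the text, with all times in the same unit and taken from the same clock.

```python
def estimate_delay(t11, t12, t13, t14):
    """Estimate the audio output delay from the four reported times.

    t11: audio frame signal read from the audio buffer
    t12: probe SBC frame sent onto the wireless link
    t13: reply received that the headphone device received the frame
    t14: reply received that the headphone device output the audio
    """
    # Time from buffer read-out until the playback reply arrives.
    # It still contains the link delay of the reply itself.
    read_to_playback_reply = t14 - t11
    # (t13 - t12) covers the link twice (frame out, reply back),
    # so half of it approximates the one-way link delay.
    one_way_link_delay = (t13 - t12) / 2
    return read_to_playback_reply - one_way_link_delay
```

For example, with times 0, 40, 60, and 250 (in milliseconds), the round trip over the link is 20 ms, so 10 ms is subtracted from 250 ms to give 240 ms.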

なお、上記のように推定された遅延時間が、予想される範囲外の値である場合、制御部１１は、その値を一時的な変動とみなして破棄しても良い。また、制御部１１は、直近の過去に推定された所定の個数の遅延時間の平均値を算出することによって遅延時間としても良い。 When the delay time estimated as described above is a value outside the expected range, the control unit 11 may regard the value as a temporary change and discard it. Further, the control unit 11 may obtain the delay time by calculating an average value of a predetermined number of delay times estimated in the latest past.
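
Discarding out-of-range estimates and averaging the most recent ones might be combined as in the sketch below. The class name `DelaySmoother`, the window size, and the range limits are hypothetical values chosen only for illustration.

```python
from collections import deque


class DelaySmoother:
    """Keep the last few plausible delay estimates and average them."""

    def __init__(self, window=5, min_delay=0.0, max_delay=1000.0):
        # window, min_delay, and max_delay are assumed example values.
        self.samples = deque(maxlen=window)
        self.min_delay = min_delay
        self.max_delay = max_delay

    def update(self, estimate):
        """Add one estimate; return the current average, or None if empty."""
        # Estimates outside the expected range are treated as
        # temporary fluctuations and are not stored.
        if self.min_delay <= estimate <= self.max_delay:
            self.samples.append(estimate)
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)
```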

更に、推定された遅延時間が、時刻に対して単調増加、または、単調減少しているとみなされる場合、制御部１１は、遅延時間を時刻に対して１次関数であると仮定しても良い。その関数のパラメータは、例えば、最小二乗法によって求めることができる。 Furthermore, when the estimated delay time is considered to be monotonically increasing or decreasing monotonously with respect to time, the control unit 11 assumes that the delay time is a linear function with respect to time. good. The parameter of the function can be obtained by, for example, the least square method.
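
A least-squares fit of the delay as a linear function of time can be sketched as follows. The function name `fit_linear` is an assumption; the closed-form formulas are the standard ones for fitting delay = a * time + b to a set of (time, delay) pairs.

```python
def fit_linear(times, delays):
    """Least-squares fit of delay = a * time + b; returns (a, b)."""
    n = len(times)
    if n < 2 or n != len(delays):
        raise ValueError("need at least two (time, delay) pairs")
    mean_t = sum(times) / n
    mean_d = sum(delays) / n
    # Sum of squared deviations of the time values.
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("times must not all be equal")
    # Sum of products of time and delay deviations.
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, delays))
    a = sxy / sxx
    b = mean_d - a * mean_t
    return a, b
```

The fitted parameters could then be used to predict the delay at a later time as `a * t + b`, instead of waiting for the next estimate.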

また、推定された遅延時間が、時刻に対して、増加及び減少を繰り返す場合、制御部１１は、遅延時間を一定数と、正弦関数との和の関数であると仮定しても良い。その関数のパラメータ、即ち、一定数と、正弦関数の振幅、周波数及び初期位相は、例えば、最小二乗法によって求めることができる。 When the estimated delay time repeatedly increases and decreases with respect to time, the control unit 11 may assume that the delay time is a function of the sum of a fixed number and a sine function. The parameters of the function, that is, the constant number and the amplitude, frequency, and initial phase of the sine function can be obtained by, for example, the least square method.

次に、移動通信端末装置ＭＳにおける映像の出力と、ヘッドフォン装置ＨＰにおける音声の出力との同期を取る処理であって、ヘッドフォン装置ＨＰの各部の処理を説明する。ヘッドフォン装置ＨＰの近距離無線送受信部５３は、上記要求が付加されたＳＢＣフレームを受信すると、直ちにそのフレームが受信された旨を移動通信端末装置ＭＳに送信する。そして、その要求が付加されたまま、ＳＢＣフレーム単位の音声信号を符号化音声バッファ５４に記憶させる。 Next, the processing performed by each unit of the headphone device HP in synchronizing the video output of the mobile communication terminal device MS with the audio output of the headphone device HP will be described. When the short-range wireless transmission/reception unit 53 of the headphone device HP receives an SBC frame to which the above request is added, it immediately notifies the mobile communication terminal device MS that the frame has been received. It then stores the audio signal in the encoded audio buffer 54 in units of SBC frames, with the request still attached.

そして、音声復号化部５５は、その要求が付加されたＳＢＣフレームに係わる音声が音声再生部５６からスピーカ５６ａに出力された際、そのＳＢＣフレームに含まれる音声を再生した旨の回答を制御部５１、近距離無線送受信部５３を介して移動通信端末装置ＭＳに送信させる。 Then, when the audio related to the SBC frame to which the request is added is output from the audio playback unit 56 to the speaker 56a, the audio decoding unit 55 causes a reply indicating that the audio contained in that SBC frame has been played to be transmitted to the mobile communication terminal device MS via the control unit 51 and the short-range wireless transmission/reception unit 53.

なお、ＳＢＣフレームがヘッドフォン装置ＨＰによって出力されたとの回答が近距離無線通信回線ＢＴを介して伝送されるための遅延時間の推定は、上記要求が付加されたＳＢＣフレームの送受信に併せて行われると限るものではない。 Note that the estimation of the delay time for transmitting the response that the SBC frame is output by the headphone device HP via the short-range wireless communication line BT is performed together with the transmission / reception of the SBC frame to which the request is added. It is not limited.

任意のデータが近距離無線送受信部４３ｃからヘッドフォン装置ＨＰに送信された後、ヘッドフォン装置ＨＰの近距離無線送受信部５３によってそのデータが受信された際に、近距離無線送受信部５３が直ちに受信された旨を移動通信端末装置ＭＳに送信することによって行われるとしても良い。 After arbitrary data is transmitted from the short-range wireless transmission / reception unit 43c to the headphone device HP, when the data is received by the short-range wireless transmission / reception unit 53 of the headphone device HP, the short-range wireless transmission / reception unit 53 is immediately received. It may be performed by transmitting a message to the mobile communication terminal device MS.

また、ヘッドフォン装置ＨＰの近距離無線送受信部５３が受信された旨を移動通信端末装置ＭＳに送信することなく、近距離無線送受信部４３ｃが近距離無線通信回線ＢＴの通信で用いられる所定のプロトコルのステップを行った時刻からの算出によって推定されるとしても良い。また、その遅延時間は、予め近距離無線通信部４３ｄの仕様に従って定められるとしても良い。 Alternatively, the delay may be estimated without the short-range wireless transmission/reception unit 53 of the headphone device HP notifying the mobile communication terminal device MS of reception, by calculation from the times at which the short-range wireless transmission/reception unit 43c performed steps of a predetermined protocol used for communication over the short-range wireless communication line BT. The delay time may also be determined in advance according to the specifications of the short-range wireless communication unit 43d.

（第２の実施形態） (Second Embodiment)
第２の実施形態が第１の実施形態と異なる点は、移動通信端末装置ＭＳにある。そこで、第２の実施形態に係わる移動通信端末装置ＭＳの構成及び動作を説明する。なお、第１の実施形態に係わる移動通信端末装置ＭＳと同じ部分については、同じ符号を付して説明を省略する。なお、制御部１１には同じ符号を付しているが、遅延時間の推定動作に相違があるので、その動作を説明する。 The second embodiment differs from the first embodiment in the mobile communication terminal device MS. Therefore, the configuration and operation of the mobile communication terminal device MS according to the second embodiment will be described. Parts that are the same as in the mobile communication terminal device MS according to the first embodiment are given the same reference numerals, and their description is omitted. The control unit 11 is given the same reference numeral, but its delay time estimation operation differs, so that operation will be described.

図９は、第２の実施形態に係わる移動通信端末装置ＭＳの構成を示すブロック図である。この移動通信端末装置ＭＳは、図２に構成を示す第１の実施形態に係わる移動通信端末装置ＭＳと比較して、近距離無線処理部４３に代えて近距離無線処理部４３−２を備え、また、遅延時間推定用のマイクロフォン４４を備えている。 FIG. 9 is a block diagram showing a configuration of the mobile communication terminal apparatus MS according to the second embodiment. This mobile communication terminal device MS includes a short-range wireless processing unit 43-2 instead of the short-range wireless processing unit 43, as compared with the mobile communication terminal device MS according to the first embodiment whose configuration is shown in FIG. In addition, a microphone 44 for delay time estimation is provided.

図１０は、近距離無線処理部４３−２の詳細な構成を示すブロック図である。近距離無線処理部４３−２は、第１の実施形態に係わる近距離無線処理部４３と比較して、音声符号化部４３ａに代えて音声符号化部４３ａ２を備えている。 FIG. 10 is a block diagram illustrating a detailed configuration of the short-range wireless processing unit 43-2. The short-range wireless processing unit 43-2 includes a speech encoding unit 43a2 instead of the speech encoding unit 43a, as compared with the short-range wireless processing unit 43 according to the first embodiment.

音声符号化部４３ａ２の動作を説明する。音声符号化部４３ａ２は、第１の実施形態に係わる音声符号化部４３ａの復号された音声信号２４ａを得て、得られた音声フレーム信号２４ｃを符号化し、符号化された音声信号を符号化音声バッファ４３ｂに記憶させる動作に加えて、以下の動作を行う。 The operation of the audio encoding unit 43a2 will be described. In addition to the operations of the audio encoding unit 43a according to the first embodiment, namely obtaining the decoded audio signal 24a, encoding the obtained audio frame signal 24c, and storing the encoded audio signal in the encoded audio buffer 43b, the audio encoding unit 43a2 performs the following operation.

音声符号化部４３ａ２は、制御部１１の指示に基づいて、所定の遅延時間推定用音声信号を符号化し、符号化された遅延時間推定用音声信号を符号化音声バッファ４３ｂに記憶させる。ここで、所定の遅延時間推定用音声信号は、音声バッファ２４に記憶される音声フレーム信号２４ｃには含まれない人工的な音声信号であって、１つまたは複数の所定の周波数の音声信号がそれぞれ所定の音量で加算され、使用者の聴覚器官に悪影響を及ぼさない音声信号である。 Based on an instruction from the control unit 11, the audio encoding unit 43a2 encodes a predetermined delay time estimation audio signal and stores the encoded signal in the encoded audio buffer 43b. The predetermined delay time estimation audio signal is an artificial audio signal that is not contained in the audio frame signal 24c stored in the audio buffer 24. It is formed by adding audio signals of one or more predetermined frequencies, each at a predetermined volume, and does not adversely affect the user's hearing.

この所定の遅延時間推定用音声信号は、音声符号化部４３ａ２によって符号化可能であり、ヘッドフォン装置ＨＰによって出力可能であり、かつ、ヘッドフォン装置ＨＰによって出力された音声をマイクロフォン４４によって入力可能なものである。そして、ヘッドフォン装置ＨＰによって出力された際、装置の使用者には聴取不可能、または聴取が困難であることが望ましい。即ち、人間の聴力によっては聴取が不可能、または困難な周波数からなる音声信号であることが望ましい。 The predetermined delay time estimation audio signal can be encoded by the audio encoding unit 43a2, can be output by the headphone device HP, and the audio output by the headphone device HP can be input by the microphone 44. It is. And when output by the headphone device HP, it is desirable that the user of the device is incapable of listening or difficult to hear. That is, it is desirable that the audio signal has a frequency that is impossible or difficult to hear depending on human hearing.
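
A delay time estimation audio signal of this kind, a sum of sinusoids at predetermined frequencies and volumes, might be generated as in the following sketch. The function name `make_probe_signal`, the sample rate, the duration, and the example frequencies near the upper limit of human hearing are all assumptions. Whether a particular codec, headphone, and microphone actually pass such frequencies is not guaranteed and would have to be verified.

```python
import math


def make_probe_signal(freqs_hz, amplitudes, sample_rate=48000,
                      duration_s=0.05):
    """Return a list of samples in [-1, 1]: a sum of sine tones.

    freqs_hz and amplitudes are the assumed 'predetermined' frequencies
    and volumes; one amplitude is given per frequency.
    """
    if len(freqs_hz) != len(amplitudes) or not freqs_hz:
        raise ValueError("need one amplitude per frequency")
    if max(freqs_hz) >= sample_rate / 2:
        raise ValueError("frequencies must be below the Nyquist limit")
    if sum(amplitudes) > 1.0:
        raise ValueError("summed amplitudes would clip")
    n_samples = round(sample_rate * duration_s)
    return [
        sum(a * math.sin(2.0 * math.pi * f * i / sample_rate)
            for f, a in zip(freqs_hz, amplitudes))
        for i in range(n_samples)
    ]
```

A call such as `make_probe_signal([19000, 20000], [0.1, 0.1])` would produce a quiet 50 ms two-tone burst whose peak amplitude cannot exceed 0.2.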

次に、第２の実施形態に係わる制御部１１が遅延時間を推定する動作を説明する。図１１は、制御部１１が遅延時間を推定する動作のフローチャートを示す。制御部１１は、遅延時間を推定する動作を開始し（ステップＳ１１ｉ）、音声符号化部４３ａ２に指示して、所定の遅延時間推定用音声信号を符号化させ、符号化音声バッファ４３ｂに記憶させる。そして、その符号化を開始した時刻を報告させる（ステップＳ１１ｊ）。報告された時刻をＴ２１とする。 Next, the operation by which the control unit 11 according to the second embodiment estimates the delay time will be described. FIG. 11 shows a flowchart of this operation. The control unit 11 starts the delay time estimation operation (step S11i) and instructs the audio encoding unit 43a2 to encode the predetermined delay time estimation audio signal and store it in the encoded audio buffer 43b. It also has the audio encoding unit 43a2 report the time at which the encoding started (step S11j). Let the reported time be T21.

次に、制御部１１は、所定の時間の待ち時間を取る（ステップＳ１１ｋ）。この時間は、予想される遅延時間より短い時間であり、所定の遅延時間推定用音声信号が出力された音声以外の音声であって、その遅延時間推定用音声信号が出力されたものと同じ音声がマイクロフォン４４によって入力されたことによる遅延時間の誤った推定を避けるためである。 Next, the control unit 11 waits for a predetermined time (step S11k). This time is shorter than the expected delay time. The wait prevents an erroneous delay estimate that would result if the microphone 44 picked up a sound that is not the output of the predetermined delay time estimation audio signal but is identical to that output.

続いて、制御部１１は、マイクロフォン４４によって入力された音声が所定の遅延時間推定用音声信号が出力されたものと一致するか否かを判断し（ステップＳ１１ｍ）、一致した場合、その音声がマイクロフォン４４によって入力された時刻を得る（ステップＳ１１ｎ）。得られた時刻をＴ２２とする。 Subsequently, the control unit 11 determines whether or not the voice input by the microphone 44 matches the output of the predetermined delay time estimation voice signal (step S11m). The time input by the microphone 44 is obtained (step S11n). Let the obtained time be T22.

そして、制御部１１は、遅延時間を推定して（ステップＳ１１ｏ）、遅延時間の推定動作を終了する（ステップＳ１１ｐ）。ここで、遅延時間は、Ｔ２２−Ｔ２１ と算出して推定する。 The control unit 11 then estimates the delay time (step S11o) and ends the delay time estimation operation (step S11p). The delay time is estimated as T22 - T21.

ステップＳ１１ｍで、一致しない場合、制御部１１は、ステップＳ１１ｍの、一致するか否かを判断する動作を繰り返す。また、長時間に渡って一致しない場合、制御部１１は、遅延時間の推定を行わないまま、その推定動作を終了する（ステップＳ１１ｐ）。ここで、長時間とは、予想される遅延時間の最大値を超える時間である。 If they do not match in step S11m, the control unit 11 repeats the operation of step S11m to determine whether or not they match. On the other hand, if they do not match for a long time, the control unit 11 ends the estimation operation without estimating the delay time (step S11p). Here, the long time is a time exceeding the maximum expected delay time.
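
The flow of steps S11i to S11p can be sketched as below. This is an illustration under stated assumptions: the function name `estimate_delay_by_microphone` is hypothetical, and the microphone input is modeled as a time-ordered sequence of `(time, matches_probe)` pairs, where `matches_probe` stands in for whatever matching the control unit performs on the input audio.

```python
def estimate_delay_by_microphone(t21, mic_events, guard_time, max_delay):
    """Return T22 - T21, or None if no match is found in time.

    t21: time at which encoding of the probe signal started
    mic_events: iterable of (time, matches_probe) pairs in time order
    guard_time: initial wait, shorter than the expected delay (S11k)
    max_delay: maximum expected delay; later matches are not accepted
    """
    for t, matches_probe in mic_events:
        elapsed = t - t21
        if elapsed < guard_time:
            # Still inside the wait: a matching sound this early cannot
            # be the probe, so it is ignored.
            continue
        if elapsed > max_delay:
            # No match for too long: give up without an estimate.
            break
        if matches_probe:
            # t plays the role of T22 (S11n); the delay is T22 - T21.
            return t - t21
    return None
```

In this sketch a match at 50 ms is ignored when the guard time is 100 ms, while a later match at 230 ms yields an estimate of 230 ms.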

長時間に渡って一致しない場合、ヘッドフォン装置ＨＰから出力された音声がマイクロフォン４４によって入力不可能である、即ち、ヘッドフォン装置ＨＰが動作していないことに限らず、例えば、ヘッドフォン装置ＨＰから出力された音声の音量が小さい、または、ヘッドフォン装置ＨＰとマイクロフォン４４との間が長距離である、などの理由が考えられ、制御部１１は、遅延時間の推定が不可能と判断するためである。 If they do not match for a long time, the sound output from the headphone device HP cannot be input by the microphone 44. That is, the sound is not limited to the headphone device HP not operating, for example, output from the headphone device HP. This is because the control unit 11 determines that the delay time cannot be estimated, for example, because the volume of the sound is low or the distance between the headphone device HP and the microphone 44 is long.

所定の遅延時間推定用音声信号が装置の使用者には聴取不可能、または聴取が困難であり、かつ、ヘッドフォン装置ＨＰから出力された音声をマイクロフォン４４によって入力することが常に不可能と限らない場合、制御部１１は、遅延時間を推定する動作を所定の時間間隔で行う。ここで、所定の時間間隔については、第１の実施形態のおける遅延時間を推定する動作説明の際に述べた通りである。 The audio signal for estimating the delay time cannot be heard by the user of the device or is difficult to hear, and it is not always impossible to input the sound output from the headphone device HP by the microphone 44. In this case, the control unit 11 performs an operation for estimating the delay time at a predetermined time interval. Here, the predetermined time interval is as described in the description of the operation for estimating the delay time in the first embodiment.

通常、移動通信端末装置ＭＳと、ヘッドフォン装置ＨＰとは、数十センチメートルから１メートル程度の距離をおいて使われる。また、ヘッドフォン装置ＨＰのスピーカ５６ａは、ヘッドフォン装置ＨＰの使用者の耳の方向に音声を出力する。しかし、ヘッドフォン装置ＨＰのスピーカ５６ａから出力された音声は、ヘッドフォン装置ＨＰの周囲に漏れるように設計されることがある。 Normally, the mobile communication terminal device MS and the headphone device HP are used at a distance of about several tens of centimeters to 1 meter. The speaker 56a of the headphone device HP outputs sound in the direction of the ear of the user of the headphone device HP. However, the sound output from the speaker 56a of the headphone device HP may be designed to leak around the headphone device HP.

また、マイクロフォン４４は、所定の遅延時間推定用音声信号の受信のためにあり、周波数特性や、入力された音声の増幅率は、その信号の受信専用に設計される。そこで、ヘッドフォン装置ＨＰから出力された音声をマイクロフォン４４によって入力することが常に不可能であるとは限らない。そこで、制御部１１は、遅延時間を推定する動作を所定の時間間隔で行うことが、有効である。ここで、推定が常に可能ではなくとも良い。 The microphone 44 is for receiving a predetermined delay time estimation audio signal, and the frequency characteristics and the amplification factor of the input audio are designed exclusively for reception of the signal. Therefore, it is not always impossible to input the sound output from the headphone device HP through the microphone 44. Therefore, it is effective for the control unit 11 to perform an operation for estimating the delay time at predetermined time intervals. Here, estimation may not always be possible.

また、音声符号化部４３ａ２は、所定の遅延時間推定用音声信号を符号化する際、音声バッファ２４に記憶された符号化すべき音声フレーム信号２４ｃが無音である時間帯を選択することが適切である。所定の遅延時間推定用音声信号を符号化によって、音声フレーム信号２４ｃの符号化へ影響を与えることを避けるためである。 In addition, when the speech encoding unit 43a2 encodes a predetermined delay time estimation speech signal, it is appropriate to select a time zone in which the speech frame signal 24c to be encoded stored in the speech buffer 24 is silent. is there. This is to avoid affecting the encoding of the audio frame signal 24c by encoding the predetermined delay time estimation audio signal.

また、遅延時間推定用音声信号は、人間の聴力によっては聴取が不可能、または困難な周波数からなる音声信号であると限るものではない。人間の聴力によって聴取が可能な音声であっても、マスキング効果によって装置の使用者には聴取不可能、または聴取が困難である音声信号でも良い。 The delay time estimation audio signal is not limited to an audio signal having a frequency that cannot be heard or is difficult to hear depending on human hearing. Even a voice that can be heard by human hearing may be a voice signal that cannot be heard by a user of the apparatus due to a masking effect or that is difficult to hear.

このマスキング効果を用いる場合、音声符号化部４３ａ２は、遅延時間推定用音声信号の周波数に近い周波数で、かつ、大きな音量の音声信号が発生される時刻の前後にのみ、遅延時間推定用音声信号を音声フレーム信号２４ｃに加えた上で符号化する。 When this masking effect is used, the audio encoding unit 43a2 adds the delay time estimation audio signal to the audio frame signal 24c and encodes the result only around times at which an audio signal with a frequency close to that of the delay time estimation audio signal and with a large volume occurs.

そのため、遅延時間推定用音声信号が常に同じ周波数ではなく、適宜複数の周波数の中の１つを用いるとしても、所定の時間間隔で遅延時間推定用音声信号を符号化することはできない。しかし、所定の時間間隔ではないにせよ、繰り返して遅延時間を推定することによる効果がある。 For this reason, even if the delay time estimation speech signal is not always the same frequency and one of a plurality of frequencies is appropriately used, the delay time estimation speech signal cannot be encoded at a predetermined time interval. However, even if it is not a predetermined time interval, there is an effect by repeatedly estimating the delay time.

一方、所定の遅延時間推定用音声信号が装置の使用者に常に聴取可能、かつ、ヘッドフォン装置ＨＰから出力された音声をマイクロフォン４４によって入力することが、使用者がヘッドフォン装置ＨＰを使用している際に常に不可能とは限らない場合、音声符号化部４３ａ２は、音声バッファ２４に記憶される音声フレーム信号２４ｃには含まれる音声信号であって、特徴のある音声信号を遅延時間推定用音声信号とする。 On the other hand, when the predetermined delay time estimation audio signal is always audible to the user of the device, and it is not always impossible for the microphone 44 to pick up the audio output from the headphone device HP while the user is using the headphone device HP, the audio encoding unit 43a2 uses, as the delay time estimation audio signal, a distinctive audio signal that is contained in the audio frame signal 24c stored in the audio buffer 24.

このように用いられる遅延時間推定用音声信号は、例えば、特徴のある周波数分布の音声信号、即ち、所定の楽器の音であり、また、所定の無音の後の大きな音量の音声信号である。この遅延時間推定用音声信号を用いる場合も、所定の時間間隔ではないにせよ、繰り返して遅延時間を推定することができる効果がある。 The delay time estimation audio signal used in this way is, for example, an audio signal having a characteristic frequency distribution, that is, a sound of a predetermined instrument, and an audio signal having a large volume after a predetermined silence. Even when this delay time estimation audio signal is used, there is an effect that the delay time can be repeatedly estimated even if it is not a predetermined time interval.

また、ヘッドフォン装置ＨＰから出力された音声をマイクロフォン４４によって入力することが、使用者がヘッドフォン装置ＨＰを使用している際には常に不可能である場合、制御部１１は、近距離無線処理部４３−２に音声信号をヘッドフォン装置ＨＰに送らせる制御をし、上記遅延時間の推定を行った後、音声データ切替部４１を制御して、近距離無線処理部４３−２に音声バッファ２４に記憶された音声フレーム信号２４ｃを読み出させても良い。 In addition, when it is impossible for the user to always input the sound output from the headphone device HP by the microphone 44 when the user is using the headphone device HP, the control unit 11 displays the short-range wireless processing unit. 43-2 is controlled to send the audio signal to the headphone device HP, and after estimating the delay time, the audio data switching unit 41 is controlled, and the short-range wireless processing unit 43-2 is connected to the audio buffer 24. The stored audio frame signal 24c may be read out.

また、上記遅延時間の推定に先んじて、制御部１１は、使用者に対して、ヘッドフォン装置ＨＰのスピーカ５６ａをマイクロフォン４４に近づけるように促す報知を行うことが好ましい。この報知は、表示部１５への表示、スピーカ４２ａからの音声出力、スピーカ５６ａからの音声出力などによる。 Prior to the estimation of the delay time, it is preferable that the control unit 11 notifies the user to bring the speaker 56a of the headphone device HP closer to the microphone 44. This notification is based on display on the display unit 15, sound output from the speaker 42a, sound output from the speaker 56a, and the like.

なお、第２の実施形態に係わる移動通信端末装置ＭＳ及びヘッドフォン装置ＨＰの各部は、第１の実施形態に係わる再生応答要求が付加されたＳＢＣフレームを作成する機能、及び、そのフレームの作成及び送受信がされた時刻を報告する機能を要しない。 Note that the units of the mobile communication terminal device MS and the headphone device HP according to the second embodiment do not require the functions of the first embodiment for creating an SBC frame to which a playback response request is added and for reporting the times at which that frame was created, transmitted, and received.

In the above description, the mobile communication terminal device MS according to the second embodiment includes the microphone 44, which is separate from the microphone 14b, but this is not a limitation. The microphone 44 may be omitted, and the microphone 14b used for transmitting speech may be used for delay time estimation instead.

The headphone device HP in the second embodiment requires no operation other than the essential function of a headphone device, namely outputting audio received via the short-range wireless communication line BT from the speaker 56a. The second embodiment can therefore be applied to any headphone device HP.

(Third embodiment)
The third embodiment differs from the first embodiment in the headphone device HP. The configuration and operation of the headphone device HP according to the third embodiment are therefore described below. Parts identical to those of the headphone device HP according to the first embodiment are given the same reference numerals, and their description is omitted. The control unit 11 of the mobile communication terminal device MS keeps the same reference numeral, but its delay time estimation operation differs, so that operation is also described.

FIG. 12 is a block diagram showing the configuration of the headphone device HP according to the third embodiment. Compared with the headphone device HP according to the first embodiment, whose configuration is shown in FIG. 5, this headphone device HP includes a control unit 51-3 in place of the control unit 51, a short-range wireless transmission/reception unit 53-3 in place of the short-range wireless transmission/reception unit 53, and an audio decoding unit 55-3 in place of the audio decoding unit 55.

The operation of the short-range wireless transmission/reception unit 53-3 differs from that of the short-range wireless transmission/reception unit 53 according to the first embodiment as follows. On receiving an SBC frame to which predetermined identification information is added, the short-range wireless transmission/reception unit 53-3 immediately notifies the control unit 51-3 of the reception. By contrast, the short-range wireless transmission/reception unit 53 according to the first embodiment transmits a notice of the reception to the mobile communication terminal device MS.

The operation of the audio decoding unit 55-3 differs from that of the audio decoding unit 55 according to the first embodiment as follows. When the audio signal of an SBC frame to which predetermined identification information is added is output from the audio reproduction unit 56 to the speaker 56a, the audio decoding unit 55-3 immediately notifies the control unit 51-3 of the output. By contrast, the audio decoding unit 55 according to the first embodiment causes a notice of the output to be transmitted to the mobile communication terminal device MS.

In addition to the operations of the control unit 51 according to the first embodiment, the control unit 51-3 measures the difference between the time at which it received the notification from the short-range wireless transmission/reception unit 53-3 and the time at which it received the notification from the audio decoding unit 55-3. It then causes this time (T31) to be transmitted to the mobile communication terminal device MS via the short-range wireless transmission/reception unit 53-3. T31 is the delay time incurred while the audio signal is in the headphone device HP.

In the above description, the processing for measuring the delay time incurred while the audio signal is in the headphone device HP is triggered by receiving, from the mobile communication terminal device MS, an SBC frame to which predetermined identification information is added, but this is not a limitation. The measurement may instead be performed on predetermined SBC frames chosen within the headphone device HP. Likewise, the measured time (T31) is described as being transmitted each time it is measured, but this is not a limitation. Each time a request is received from the mobile communication terminal device MS, the headphone device HP may transmit either the most recently measured time or the average of recently measured times as T31.
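The "latest value or average of recent measurements" option can be sketched as follows. The class name, the window size, and the millisecond unit are assumptions for illustration only; the description does not state how many past measurements are averaged.

```python
from collections import deque


class RecentDelayAverager:
    """Holds the most recent T31 measurements on the headphone side."""

    def __init__(self, window: int = 4) -> None:
        # Older measurements fall out automatically once the window is full.
        self._recent = deque(maxlen=window)

    def add(self, t31_ms: float) -> None:
        self._recent.append(t31_ms)

    def latest(self) -> float:
        """Most recently measured T31."""
        return self._recent[-1]

    def average(self) -> float:
        """Average of the measurements still inside the window."""
        return sum(self._recent) / len(self._recent)
```

On a request from the terminal, the headphone side would reply with either `latest()` or `average()`.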

As described above, the control unit 51-3 only needs to obtain the difference between the times at which the two notifications were received; it does not need the times themselves. In other words, it does not need to be equipped with a clock, so a complicated and expensive configuration is unnecessary.
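One way to obtain an interval without knowing the time of day is a free-running tick counter, sketched below. The 16-bit counter width and the tick frequency are assumptions for illustration; the description does not say how the control unit 51-3 measures the interval.

```python
COUNTER_BITS = 16  # assumed width of a free-running hardware counter


def elapsed_ticks(start_tick: int, end_tick: int, bits: int = COUNTER_BITS) -> int:
    """Ticks between two counter readings, correct across one wrap-around."""
    return (end_tick - start_tick) % (1 << bits)


def ticks_to_ms(ticks: int, tick_hz: int) -> float:
    """Convert a tick count to milliseconds for a counter running at tick_hz."""
    return ticks * 1000.0 / tick_hz
```

Reading the counter at each of the two notifications and passing the readings to `elapsed_ticks` yields T31 in ticks, as long as the interval is shorter than one counter period.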

Next, the operation by which the control unit 11 according to the third embodiment estimates the delay time is described. FIG. 13 is a flowchart of this operation. Operation steps that also appear in the delay time estimation operation of the control unit 11 according to the first embodiment are given the same reference symbols, and their description is omitted.

The control unit 11 first starts the delay time estimation operation in steps S11a to S11c. It obtains from the audio encoding unit 43a the time T11 at which the audio frame signal 24c for the SBC frame to which predetermined identification information is added was read from the audio buffer 24, and it obtains from the short-range wireless transmission/reception unit 43c the time T12 at which that SBC frame was transmitted to the short-range wireless communication line BT.

Next, the control unit 11 receives, via the short-range wireless transmission/reception unit 43c, the delay time T31 transmitted from the headphone device HP, that is, the delay incurred while the audio signal is in the headphone device HP (step S11r).

The control unit 11 then estimates the delay time required for the audio signal to be transmitted via the short-range wireless communication line BT (step S11s). This transmission delay is estimated by one or more of the following methods, as described in the explanation of the operation of the control unit 11 according to the first embodiment.

First, it may be estimated by having the short-range wireless transmission/reception unit 53-3 of the headphone device HP report reception to the mobile communication terminal device MS immediately after arbitrary data is transmitted from the short-range wireless transmission/reception unit 43c to the headphone device HP. Second, it may be estimated from the times at which the short-range wireless transmission/reception unit 43c performed the steps of the predetermined protocol used for communication over the short-range wireless communication line BT. Third, it may be estimated according to the predetermined specifications of the short-range wireless communication unit 43d.
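The first and third methods can be sketched together. Halving the round-trip time assumes a symmetric link and an immediate reply, which the description does not state; the function name and the fallback order are likewise illustrative assumptions.

```python
from typing import Optional


def estimate_link_delay(
    send_time: Optional[float],
    ack_time: Optional[float],
    spec_delay: Optional[float] = None,
) -> float:
    """One-way transmission delay over the short-range wireless line.

    Uses half the measured round trip when both times are known (assumed
    symmetric link), otherwise falls back to a value from the specification.
    """
    if send_time is not None and ack_time is not None:
        round_trip = ack_time - send_time
        if round_trip < 0:
            raise ValueError("acknowledgement precedes transmission")
        return round_trip / 2
    if spec_delay is not None:
        return spec_delay
    raise ValueError("no measurement and no specified delay available")
```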

The control unit 11 then estimates the delay time (step S11t) and ends the delay time estimation operation (step S11u). Here, the delay time is estimated as the sum of three estimates: the delay incurred while the audio signal is in the mobile communication terminal device MS, the delay incurred while the audio signal is in the headphone device HP, and the delay required for the audio signal to be transmitted via the short-range wireless communication line BT.

The delay time incurred while the audio signal is in the mobile communication terminal device MS is estimated, using the times obtained in steps S11b and S11c, as
T12 - T11
The delay time incurred while the audio signal is in the headphone device HP is estimated as the value received in step S11r,
T31
The delay time required for the audio signal to be transmitted via the short-range wireless communication line BT is the value estimated in step S11s.

Note that the three operations, namely steps S11b and S11c, step S11r, and step S11s, need not be performed in the order shown in the flowchart of FIG. 13. They may be performed in a different order. The time intervals at which the three operations are performed may also be set independently of one another. In that case, the delay time estimation of step S11t is performed each time any one of the operations is performed.
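The sum of the three components, recomputed whenever any one of them is refreshed, can be sketched as below. The class and method names are hypothetical, and all values are assumed to share one time unit.

```python
from typing import Optional


class AudioDelayEstimator:
    """Keeps the three delay components and re-sums them on every update."""

    def __init__(self) -> None:
        self.terminal_delay: Optional[float] = None   # T12 - T11
        self.headphone_delay: Optional[float] = None  # T31
        self.link_delay: Optional[float] = None       # value from step S11s
        self.total: Optional[float] = None

    def _recompute(self) -> None:
        parts = (self.terminal_delay, self.headphone_delay, self.link_delay)
        # A total exists only once every component has been observed.
        if all(p is not None for p in parts):
            self.total = sum(parts)

    def update_terminal(self, t11: float, t12: float) -> None:
        self.terminal_delay = t12 - t11
        self._recompute()

    def update_headphone(self, t31: float) -> None:
        self.headphone_delay = t31
        self._recompute()

    def update_link(self, link_delay: float) -> None:
        self.link_delay = link_delay
        self._recompute()
```

Because each update method recomputes the total on its own, the three measurements can run at independent intervals and in any order.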

The above description assumes that the delay time is estimated using an SBC frame to which predetermined identification information is added. As already stated in the description of the first embodiment, such a frame may instead be, for example, an SBC frame whose frame number is divisible by a certain integer, or an SBC frame whose frame number equals a certain integer.
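The two frame-number rules can be written as a single predicate. The function and parameter names are illustrative assumptions.

```python
from typing import Optional


def is_measurement_frame(
    frame_number: int,
    divisor: Optional[int] = None,
    exact: Optional[int] = None,
) -> bool:
    """True if the frame is selected either because its number is divisible
    by `divisor` or because it equals `exact`."""
    if divisor is not None and frame_number % divisor == 0:
        return True
    return exact is not None and frame_number == exact
```

With a rule of this kind, both devices can agree on which frames to time without any extra identification field in the frame.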

As mentioned above, the largest delay is usually the one incurred while the audio signal is in the mobile communication terminal device MS, and the next largest is usually the one incurred while the audio signal is in the headphone device HP. The control unit 11 should therefore execute steps S11b and S11c more frequently so as to estimate the delay in the mobile communication terminal device MS accurately. These operations do not affect the operation of the headphone device HP at all and generate no communication over the short-range wireless communication line BT, so even frequent execution is very unlikely to affect the audio output of the headphone device HP.

On the other hand, estimating the delay incurred while the audio signal is in the headphone device HP requires the headphone device HP to operate and generates communication over the short-range wireless communication line BT. There is therefore little need to make the error in the estimate of the delay incurred in the headphone device HP smaller than the error in the estimate of the delay incurred in the mobile communication terminal device MS.

That is, the control unit 11 may execute step S11r only rarely. Because estimating the delay incurred while the audio signal is in the headphone device HP involves the operation of the headphone device HP, executing it rarely is desirable in order to avoid affecting the audio output of the headphone device HP.

As one example of rare execution, the operation may be performed once, immediately before audio output by the headphone device HP starts. If the delay incurred while the audio signal is in the headphone device HP is already known or can be predicted, that known or predicted value may be used. Alternatively, the control unit 11 may leave the measurement interval uncontrolled and simply query the headphone device HP for that delay at predetermined time intervals.

As another example of rare execution, the control unit 11 may estimate the delay time when the channel being received by the television processing unit 21 is changed. A channel change is made by a predetermined key operation on the input device 16, so the control unit 11 can detect it. The delay time may also be estimated when the broadcast program being received by the television processing unit 21 changes. A program change is detected when the PCR separated by the DEMUX unit 21c changes discontinuously, that is, differently from the passage of real time.
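A PCR discontinuity check of this kind can be sketched by comparing the PCR advance with the elapsed real time between two arrivals. The tolerance value and the assumption that PCR values have already been converted to seconds are illustrative; the description gives no threshold.

```python
def is_pcr_discontinuous(
    prev_pcr_s: float,
    pcr_s: float,
    prev_arrival_s: float,
    arrival_s: float,
    tolerance_s: float = 0.5,
) -> bool:
    """True if the PCR advanced by an amount that differs from the elapsed
    real time by more than the tolerance (all values in seconds)."""
    pcr_step = pcr_s - prev_pcr_s
    real_step = arrival_s - prev_arrival_s
    return abs(pcr_step - real_step) > tolerance_s
```

A True result would be the trigger for re-running the delay time estimation under these assumptions.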

(Other embodiments)
The first to third embodiments described above are not necessarily mutually exclusive. They may be combined as appropriate.

The above description takes video data and audio data from a television broadcast as an example, but this is not a limitation. When video data and audio data are received according to, for example, the RTP protocol, the time stamp indicating when the video data is to be displayed and the time stamp indicating when the audio data is to be output are based on times indicated by different media clocks. However, the process of using RTCP packets to map the times of these media clocks onto the time of a common reference clock is well known. The present invention can therefore be applied by treating the common reference clock as the STC unit 23.
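The well-known mapping uses the pair of timestamps carried in an RTCP sender report: a reference-clock time and the RTP timestamp of the same instant. The sketch below assumes the reference time has already been converted to seconds; the function name and the clock rates in the usage note are illustrative.

```python
RTP_TS_MODULUS = 1 << 32  # RTP timestamps are 32-bit and wrap around


def rtp_to_reference_time(
    rtp_ts: int,
    sr_rtp_ts: int,
    sr_reference_s: float,
    clock_rate: int,
) -> float:
    """Map an RTP timestamp onto the common reference clock, given the
    (RTP timestamp, reference time) pair from the latest sender report."""
    diff = (rtp_ts - sr_rtp_ts) % RTP_TS_MODULUS
    # Interpret the difference as signed so packets slightly older than
    # the sender report map to earlier reference times.
    if diff >= RTP_TS_MODULUS // 2:
        diff -= RTP_TS_MODULUS
    return sr_reference_s + diff / clock_rate
```

Applying the function separately to the video stream (for example with a 90000 Hz clock) and the audio stream (for example 48000 Hz) puts both time stamps on one timeline, which can then play the role of the STC.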

The above description assumes that the broadcast received by the television processing unit 21 consists of video and audio, but this is not a limitation. For example, it may also include text to be displayed on the display unit 15. The display of this text can be delayed by the same processing as that by which the video reproduction unit 31 delays reproduction of the video frame signal 22c, described above, so its description is omitted.

The above description assumes that the audio data switching unit 41 of the mobile communication terminal device MS causes either the audio reproduction unit 42 or the short-range wireless processing unit 43 to read data from the audio buffer 24, but this is not a limitation. It may cause both of them to read the data.

To have both units read the data, the control unit 11 controls the audio reproduction unit 42 so that output of the audio frame signal 24c from the speaker 42a is delayed. Here, the delay time is the same as the time by which the video reproduction unit 31 is instructed to delay reproduction of the video frame signal 22c. The processing by which the audio reproduction unit 42 delays output of the audio frame signal 24c is the same as the processing by which the video reproduction unit 31 delays reproduction of the video frame signal 22c, so its description is omitted.

With this processing, the present invention is effective in synchronizing the display on the display unit 15 of the mobile communication terminal device MS, the sound produced by the speaker 42a of the mobile communication terminal device MS, and the sound produced by the speaker 56a of the headphone device HP. Furthermore, the present invention is effective in synchronizing the sounds produced by the two speakers regardless of whether anything is displayed on the display unit 15 of the mobile communication terminal device MS.
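The three start times implied by this scheme can be computed as in the sketch below, which follows the rule stated in the claims: display at the presentation time plus the video delay, and start transmission at the audio presentation time minus the audio delay plus the video delay. The names are hypothetical and all values are assumed to share one time unit.

```python
from dataclasses import dataclass


@dataclass
class OutputSchedule:
    video_display: float         # when the frame appears on the display unit
    local_speaker: float         # when the terminal's own speaker plays
    headphone_send_start: float  # when transmission processing must start


def schedule_outputs(
    video_pts: float,
    audio_pts: float,
    video_delay: float,
    audio_delay: float,
) -> OutputSchedule:
    """Start times that make display, local speaker and headphone coincide."""
    return OutputSchedule(
        video_display=video_pts + video_delay,
        local_speaker=audio_pts + video_delay,
        headphone_send_start=audio_pts - audio_delay + video_delay,
    )
```

Adding the audio delay back to `headphone_send_start` gives the moment the headphone speaker actually sounds, which equals `local_speaker` by construction.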

The above description takes the case of a single headphone device HP as an example, but the present invention can naturally be applied to a system having a plurality of headphone devices HP. In that case, the control unit 11 of the mobile communication terminal device MS estimates the delay time for each headphone device HP.

The control unit 11 cancels out the differences in delay time between the headphone devices HP by encoding and transmitting the audio signal ahead of time for each headphone device HP, with a lead time that differs for each headphone device HP. Alternatively, when the audio signal stored in the encoded audio buffer 43b is transmitted to each headphone device HP, it cancels out the differences by having the short-range wireless transmission/reception unit 43c transmit at a time that differs for each headphone device HP.
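Per-device lead times can be sketched as a simple mapping from each device's estimated delay to its transmission start time. The device identifiers and the function name are hypothetical.

```python
from typing import Dict


def send_times(output_time: float, delays: Dict[str, float]) -> Dict[str, float]:
    """Transmission start time for each headphone so that all of them
    produce the sound at `output_time` despite different delays."""
    return {device: output_time - delay for device, delay in delays.items()}
```

A device with a longer estimated delay simply receives its audio earlier.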

The above description uses an example in which the present invention is applied to the mobile communication terminal device MS and the headphone device HP. However, application of the present invention is not limited to these devices. For example, it may be applied to a fixed or portable television receiver in place of the mobile communication terminal device MS. The television receiver can then output no sound, or only quiet sound, while a headphone device HP placed near the television viewer outputs the sound at a suitable volume. As a result, people other than the television viewer are not disturbed by the broadcast sound.

The present invention can also naturally be applied, in place of the headphone device HP, to an audio output device that is driven by a commercial power supply and outputs sound at a high volume.

The above description assumes that the audio signal is a monaural signal, but this is not a limitation. A stereo signal can be processed in exactly the same way. One channel of the stereo signal may be output from the speaker 42a of the mobile communication terminal device MS and the other channel from the speaker 56a of the headphone device HP. Furthermore, the two channels of the stereo signal may be output from different headphone devices HP. The present invention is not limited to the configurations described above, and various modifications are possible.

FIG. 1 is a block diagram showing a configuration in which a mobile communication terminal device according to the first embodiment of the present invention and a headphone device are connected.
FIG. 2 is a block diagram showing the configuration of the mobile communication terminal device according to the first embodiment of the present invention.
FIG. 3 is a diagram showing the detailed configuration of the television processing unit according to the first embodiment of the present invention.
FIG. 4 is a diagram showing the detailed configuration of the short-range wireless processing unit according to the first embodiment of the present invention.
FIG. 5 is a block diagram showing the configuration of the headphone device according to the first embodiment of the present invention.
FIG. 6 is a diagram showing an example of the format of the decoded video signal stored in the video buffer according to the first embodiment of the present invention.
FIG. 7 is a diagram showing an example of the format of the decoded audio signal stored in the audio buffer according to the first embodiment of the present invention.
FIG. 8 is a flowchart of the operation by which the control unit according to the first embodiment of the present invention estimates the delay time.
FIG. 9 is a block diagram showing the configuration of the mobile communication terminal device according to the second embodiment of the present invention.
FIG. 10 is a diagram showing the detailed configuration of the short-range wireless processing unit according to the second embodiment of the present invention.
FIG. 11 is a flowchart of the operation by which the control unit according to the second embodiment of the present invention estimates the delay time.
FIG. 12 is a block diagram showing the configuration of the headphone device according to the third embodiment of the present invention.
FIG. 13 is a flowchart of the operation by which the control unit according to the third embodiment of the present invention estimates the delay time.

Explanation of symbols

MS: Mobile communication terminal device
HP: Headphone device
BT: Short-range wireless communication line
11, 51, 51-3: Control unit
15: Display unit
21: Television processing unit
22: Video buffer
22a: Decoded video signal
22b, 24b: PTS
22c: Video frame signal
23: STC unit
24: Audio buffer
24a: Decoded audio signal
24c: Audio frame signal
31: Video reproduction unit
41: Audio data switching unit
42, 56: Audio reproduction unit
42a, 56a: Speaker
43, 43-2: Short-range wireless processing unit
43a, 43a2: Audio encoding unit
43b, 54: Encoded audio buffer
43c, 53, 53-3: Short-range wireless transmission/reception unit
43d, 52: Short-range wireless communication unit
44: Microphone
55, 55-3: Audio decoding unit

Claims

A video display device that displays video data of content consisting of video data, a time at which the video data is to be displayed, audio data, and a time at which the audio data is to be output, and that outputs the audio data of the content to an audio output device, the video display device comprising:
display means for displaying the video data;
short-range wireless processing means for communicating with the audio output device via a short-range wireless line;
audio delay time estimation means for estimating an audio delay time based on the difference between a time at which the short-range wireless processing means started processing to transmit audio data to the audio output device and a time at which the short-range wireless processing means received a notification that the audio data was output from the audio output device; and
video/audio synchronization control means for causing the display means to display the video data at a time obtained by adding a video delay time to the time at which the video data is to be displayed, and for causing the short-range wireless processing means to start processing to transmit the audio data to the audio output device at a time obtained by subtracting the audio delay time from the time at which the audio data is to be output and adding the video delay time.

The video display device according to claim 1, wherein the audio delay time estimation means corrects the audio delay time by subtracting the time taken for the short-range wireless processing means to receive the notification from the audio output device via the short-range wireless line.

An audio output device comprising:
short-range wireless processing means for communicating with a video display device via a short-range wireless line;
audio output means for outputting, from a speaker, audio data received by the short-range wireless processing means; and
video/audio synchronization control means for controlling the short-range wireless processing means, when the audio output means outputs audio data to which predetermined identification information is added, to notify the video display device that the audio data has been output.

A video display device that displays video data of content consisting of video data, a time at which the video data is to be displayed, audio data, and a time at which the audio data is to be output, and that outputs the audio data of the content to an audio output device, the video display device comprising:
display means for displaying the video data;
short-range wireless processing means for communicating with the audio output device via a short-range wireless line;
a microphone;
audio delay time estimation means for estimating an audio delay time based on the difference between a time at which the short-range wireless processing means started processing to transmit audio data to the audio output device and a time at which the microphone picked up the sound produced when that audio data was output; and
video/audio synchronization control means for causing the display means to display the video data at a time obtained by adding a video delay time to the time at which the video data is to be displayed, and for causing the short-range wireless processing means to start processing to transmit the audio data to the audio output device at a time obtained by subtracting the audio delay time from the time at which the audio data is to be output and adding the video delay time.

A video display device that displays video data of content consisting of video data, a time at which the video data is to be displayed, audio data, and a time at which the audio data is to be output, and that outputs the audio data of the content to an audio output device, the video display device comprising:
display means for displaying the video data;
short-range wireless processing means for communicating with the audio output device via a short-range wireless line;
audio delay time estimation means for estimating an audio delay time as the sum of the time from when the short-range wireless processing means starts processing to transmit the audio data to the audio output device until that processing ends, and a time, received by the short-range wireless processing means from the audio output device, that the audio output device measured from its reception of the audio data until its output of the audio data; and
video/audio synchronization control means for causing the display means to display the video data at a time obtained by adding a video delay time to the time at which the video data is to be displayed, and for causing the short-range wireless processing means to start processing to transmit the audio data to the audio output device at a time obtained by subtracting the audio delay time from the time at which the audio data is to be output and adding the video delay time.

The video display device according to claim 5, wherein the audio delay time estimation means corrects the audio delay time by adding the time taken for the short-range wireless processing means to transmit the audio data to the audio output device via the short-range wireless line.

An audio output device comprising:
short-range wireless processing means for communicating with a video display device via a short-range wireless line;
audio output means for outputting, from a speaker, audio data received by the short-range wireless processing means; and
audio/video synchronization control means for measuring the time from when the short-range wireless processing means receives audio data until the audio output means outputs that audio data, and for controlling the short-range wireless processing means to notify the video display device of the measured time.