JP2007300323A

JP2007300323A - Subtitle display control system

Info

Publication number: JP2007300323A
Application number: JP2006125654A
Authority: JP
Inventors: Jiro Okada; 治郎岡田
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2006-04-28
Filing date: 2006-04-28
Publication date: 2007-11-15

Abstract

PROBLEM TO BE SOLVED: To increase the sense of presence when a viewer views subtitles by automatically extracting the features of speech from a received audio signal and changing the display method and display form of the subtitles based on that speech feature information.

SOLUTION: A subtitle display control system equipped with a display device, such as a television or a PC monitor, that displays pictures includes: a signal separation unit 2 which separates a subtitle signal, an audio signal, and a video signal from a received multiplexed signal; a speech feature extracting unit 3 which extracts features such as the volume and pitch of speech from the separated audio signal; a subtitle display control unit 4 which controls the display method for subtitles by using the speech feature information obtained by the speech feature extracting unit 3 and the subtitle signal from the signal separation unit 2; and a subtitle superposing unit 6 which superposes the subtitles on the video based on subtitle display information determined by the subtitle display control unit 4.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a subtitle display control system that presents character information, such as subtitles and closed captions, on a display device such as a television, a computer, or a device that projects an image onto a screen.

Several subtitle display systems for televisions and the like have been proposed (see, for example, Patent Document 1). FIG. 5 is an explanatory diagram showing the flow of the subtitle display system disclosed in Patent Document 1. In Patent Document 1, on the signal transmission side, the audio signal, the video signal, and the subtitle signal are first multiplexed by the signal multiplexing unit 29. In the receiver, the signal separation unit 30 first separates the received signal into an audio signal, a video signal, and a subtitle signal. For the subtitle signal, the superimposition position control unit 31 calculates the position at which it is to be superimposed on the video signal, and the subtitle superimposing unit 32 superimposes it on the video signal. The display unit 33 then displays the video with the subtitles superimposed, and the audio reproduction unit 34 reproduces the audio.

Meanwhile, in fields such as speech recognition, many techniques have been proposed for removing non-speech components from audio on which noise or background sound is superimposed and extracting the speech component (see, for example, Patent Document 2). In the field of sound source separation, many techniques have been proposed for estimating the direction in which a sound source exists from a plurality of input audio signals (see, for example, Patent Document 3).
Patent Document 1: JP-A-2-19088
Patent Document 2: JP 2000-321080 A
Patent Document 3: JP 2003-99093 A

In the prior-art subtitle display method shown in Patent Document 1, the signal reception side looks only at the subtitle information when calculating the size, color, position, and so on of the characters. As a result, it is difficult to associate the audio with the subtitles, and viewers feel little sense of presence when reading them. In particular, it is difficult to convey a sense of presence to hearing-impaired viewers.

There is also the problem that it takes effort for the broadcaster to specify the display position and display method of the subtitles. In particular, when subtitles are broadcast in real time during a live broadcast, there are time constraints as well.

The main object of the present invention is to provide a subtitle display control system that enhances the sense of presence when a viewer views subtitles, by automatically extracting audio features from a received audio signal and changing the display method and display form of the subtitles based on that audio feature information.

In order to solve the above problems, the present invention mainly adopts the following configuration.
A subtitle display control system including a display device, such as a television or a PC monitor, that displays a screen comprises: a signal separation unit that separates a subtitle signal, an audio signal, and a video signal from a multiplexed received signal; an audio feature extraction unit that extracts audio features, such as volume and pitch, from the separated audio signal; a subtitle display control unit that controls the subtitle display method using the audio feature information obtained from the audio feature extraction unit and the subtitle signal from the signal separation unit; and a subtitle superimposing unit that superimposes subtitles on the video based on the subtitle display information determined by the subtitle display control unit.

The subtitle display control system may further include an audio component extraction unit that removes non-speech components, such as background sound and noise, from the audio signal separated by the signal separation unit and extracts an audio component signal containing only the speech. In this configuration, the audio feature extraction unit takes as input the audio component signal extracted by the audio component extraction unit.

The subtitle display control system may further include an audio direction detection unit that detects the radiation direction of the audio from the audio component signal extracted by the audio component extraction unit. In this configuration, the subtitle display control unit controls the subtitle display method using the audio direction information obtained from the audio direction detection unit, the audio feature information obtained from the audio feature extraction unit, and the subtitle signal from the signal separation unit.

According to the present invention, features are automatically extracted from the received audio and the subtitle display method is changed dynamically based on that information, so the sense of presence when a viewer views the subtitles increases. In particular, the sense of presence can be conveyed accurately even when the viewer is hearing-impaired.

In addition, because components other than speech, such as noise and background sound, are removed before the audio features are extracted, the system can cope with cases where, for example, the cheering of spectators is superimposed on the speech in a sports program.

In addition, because the system extracts the radiation direction of the audio and controls the display position of the subtitles accordingly, subtitles can be displayed toward the side of the screen where the speaker in a conversation is located.

Furthermore, because the display position and display method of the subtitles are calculated automatically, the broadcaster is spared the effort of specifying them. Because the calculation is automatic, there are no time constraints even when subtitles are broadcast in real time during a live broadcast.

A subtitle display control system according to an embodiment of the present invention is described in detail below with reference to FIGS. 1 to 4. FIG. 1 is a block diagram showing the basic configuration for controlling subtitle display using audio information in the subtitle display control system according to the embodiment. FIG. 2 is a block diagram showing a configuration example in which noise, background sound, and the like are removed and only the speech component is extracted before features are extracted from the audio. FIG. 3 is a block diagram showing a configuration example in which the radiation direction of the audio component is detected and the subtitle display position is controlled accordingly. FIG. 4 shows an example of how the radiation direction of the audio is expressed as an angle when the display device in the configuration of FIG. 3 is viewed from directly above.

In FIGS. 1 to 4, 1, 9, and 18 denote signal multiplexing units; 2, 10, and 19 denote signal separation units; 3, 12, and 21 denote audio feature extraction units; 4, 13, and 23 denote subtitle display control units; 5, 14, and 24 denote storage devices; 6, 15, and 25 denote subtitle superimposing units; 7, 16, and 26 denote display units; 8, 17, and 27 denote audio reproduction units; 11 and 20 denote audio component extraction units; 22 denotes an audio direction detection unit; and 28 denotes a display device.

First, the subtitle display control system according to the present embodiment is described with reference to FIG. 1. In this system, on the signal transmission side, the subtitle signal, the audio signal, and the video signal are multiplexed by the signal multiplexing unit 1 and transmitted. On the reception side, the signal separation unit 2 separates the signal received from the transmission side into the original subtitle signal, audio signal, and video signal.

Next, the audio feature extraction unit 3 extracts acoustic features, such as volume and pitch, from the separated audio. Based on the input audio feature information, the subtitle display control unit 4 refers to Table 1, Table 2, and so on, which are held in advance in the storage device 5. For example, when the extracted information indicates that the volume of the voice is "large" and the pitch of the voice is "low", the unit outputs a character size of "large" and a character color of "blue". This character display information is added to the subtitle signal separated by the signal separation unit 2 and passed to the subtitle superimposing unit 6 at the next stage as subtitle display information.
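The table lookup performed by the subtitle display control unit 4 can be sketched as follows. This is only an illustration: the contents of Tables 1 and 2 are not reproduced in this text, so the dictionaries below are assumptions, as is the split of Table 1 into volume-to-size and Table 2 into pitch-to-color. Only the "large" volume to "large" size and "low" pitch to "blue" color pairs come from the example in the description. The function name `caption_display_info` is hypothetical.

```python
# Hypothetical contents for Table 1 (volume -> character size) and
# Table 2 (pitch -> character color). Only the "large" -> "large" and
# "low" -> "blue" entries are taken from the description; the other
# entries are assumed purely for illustration.
TABLE_1_SIZE = {"small": "small", "medium": "medium", "large": "large"}
TABLE_2_COLOR = {"low": "blue", "medium": "white", "high": "red"}


def caption_display_info(caption_text, volume_level, pitch_level):
    """Attach character size and color to a caption via table lookup."""
    return {
        "text": caption_text,
        "size": TABLE_1_SIZE[volume_level],
        "color": TABLE_2_COLOR[pitch_level],
    }


# The example from the description: loud, low-pitched voice.
info = caption_display_info("Hello", "large", "low")
print(info)
```

Keeping the mapping in plain tables mirrors the description, in which the tables live in a storage device and can be replaced without changing the control logic.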

The subtitle superimposing unit 6 superimposes subtitles on the video signal separated by the signal separation unit 2, using the subtitle display information output from the subtitle display control unit 4, and generates a video signal with the subtitles superimposed (a subtitle-superimposed video signal). Finally, the subtitle-superimposed video signal is displayed on the display unit 7, and the audio is output from the audio reproduction unit 8.

FIG. 2 shows a modification of the system of FIG. 1 in which everything other than speech, such as background sound and noise, is removed and only the speech component is extracted before features are extracted from the audio. On the signal transmission side, the subtitle signal, the audio signal, and the video signal are multiplexed by the signal multiplexing unit 9 and transmitted. On the reception side, the signal separation unit 10 separates the signal received from the transmission side into the original subtitle signal, audio signal, and video signal.

Next, the audio component extraction unit 11 removes unnecessary components other than speech, such as background sound and noise, using a known method such as that shown in Patent Document 2, and extracts only the audio component signal. The audio feature extraction unit 12 then extracts acoustic features (such as volume and pitch) from the extracted audio component signal. Based on the input audio feature information, the subtitle display control unit 13 refers to Table 1, Table 2, and so on, which are held in advance in the storage device 14, and obtains display information such as the size and color of the characters. This character display information is added to the subtitle signal separated by the signal separation unit 10 and passed to the subtitle superimposing unit 15 at the next stage as subtitle display information.
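The description does not specify how volume and pitch are computed. As one possible sketch, the code below uses two generic textbook techniques, root-mean-square amplitude for volume and an autocorrelation peak for pitch. These are assumptions chosen for illustration, not the method of this patent or of Patent Document 2, and the function names and the 60 to 500 Hz search range are hypothetical.

```python
import math


def rms_volume(samples):
    """Root-mean-square amplitude of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch_hz(samples, sample_rate, fmin=60.0, fmax=500.0):
    """Estimate pitch from the autocorrelation peak in [fmin, fmax]."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # None means no positive correlation peak was found (e.g. silence).
    return sample_rate / best_lag if best_lag else None


# Synthetic stand-in for an extracted speech component: a 200 Hz sine.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(800)]
volume = rms_volume(tone)
pitch = estimate_pitch_hz(tone, SAMPLE_RATE)
print(volume, pitch)
```

The numeric results would still need to be quantized into levels such as "large" or "low" before they could be used as keys into Tables 1 and 2; the thresholds for that step are not given in the description.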

The subtitle superimposing unit 15 superimposes subtitles on the video signal separated by the signal separation unit 10, using the subtitle display information output from the subtitle display control unit 13, and generates a subtitle-superimposed video signal. Finally, the subtitle-superimposed video signal is displayed on the display unit 16, and the audio is output from the audio reproduction unit 17.

FIG. 3 shows a modification of FIG. 2 that adds a part for detecting the radiation direction of the input audio. On the signal transmission side, the subtitle signal, the audio signal, and the video signal are multiplexed by the signal multiplexing unit 18 and transmitted. On the reception side, the signal separation unit 19 separates the signal received from the transmission side into the original subtitle signal, audio signal, and video signal.

Next, the audio component extraction unit 20 removes unnecessary components other than speech, such as background sound and noise, using a known method such as that shown in Patent Document 2, and extracts only the audio component signal. The audio feature extraction unit 21 then extracts acoustic features from the extracted audio component signal.

Meanwhile, the audio component signal output from the audio component extraction unit 20 is also input to the audio direction detection unit 22. FIG. 4 is a view of the display device 28 from directly above. Using a conventional sound source separation method such as that of Patent Document 3, the audio direction detection unit 22 detects and outputs the direction of the audio, taking the front of the display device 28 as 0°, for example as shown in FIG. 4. Based on the input audio feature information, the subtitle display control unit 23 refers to Table 1, Table 2, and so on, which are held in advance in the storage device 24, and obtains display information such as the size and color of the characters.

In addition, based on the audio direction information obtained from the audio direction detection unit 22, the subtitle display control unit 23 refers to Table 3, which is held in advance in the storage device 24, and outputs the position at which the subtitles are to be superimposed. For example, when the audio direction detection unit 22 detects that the audio is radiating in the 50° direction in FIG. 4, the subtitle display control unit 23 refers to Table 3 in the storage device 24 and obtains the display position "screen right". The character display information and the display position information are then added to the subtitle signal separated by the signal separation unit 19 and passed to the subtitle superimposing unit 25 at the next stage as subtitle display information.
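The Table 3 lookup can be sketched in the same way. Table 3 itself is not reproduced in this text, so the angle thresholds below (±20°), the "screen left" and "screen center" labels, and the convention that positive angles point toward the right of the screen are all assumptions. Only the 50° to "screen right" pair comes from the description. The function name `caption_position` is hypothetical.

```python
# Hypothetical Table 3: each row is (exclusive upper bound in degrees,
# display position), sorted by upper bound. 0 degrees is the front of
# the display; positive angles are assumed to point to the right.
TABLE_3_POSITION = [
    (-20.0, "screen left"),
    (20.0, "screen center"),
    (float("inf"), "screen right"),
]


def caption_position(direction_deg):
    """Return the display position for a detected audio direction."""
    for upper_bound, position in TABLE_3_POSITION:
        if direction_deg < upper_bound:
            return position
    return TABLE_3_POSITION[-1][1]  # unreachable with an inf bound; kept as a guard


# The example from the description: audio radiating in the 50 degree direction.
print(caption_position(50.0))
```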

The subtitle superimposing unit 25 superimposes subtitles on the video signal separated by the signal separation unit 19, using the subtitle display information output from the subtitle display control unit 23, and generates a subtitle-superimposed video signal. Finally, the subtitle-superimposed video signal is displayed on the display unit 26, and the audio is output from the audio reproduction unit 27.

The output values in Tables 1 to 3, which are stored in the storage devices 5, 14, and 24 and referred to by the subtitle display control units 4, 13, and 23 in FIGS. 1 to 3, may be specified in advance by the broadcaster, for example using data broadcasting, so that the broadcaster sets the colors, character sizes, and so on. Alternatively, the receiving device may allow the user to specify them according to preference. In the configuration examples above, the character size, color, display position, and so on are obtained by referring to Tables 1 to 3, but they may instead be calculated using a function.
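The description leaves the function-based alternative open. As one possible illustration, the sketch below maps a normalized volume linearly to a character size. The linear form, the normalization of volume to the range 0 to 1, the pixel limits, and the name `font_size_from_volume` are all assumptions and are not taken from the description.

```python
def font_size_from_volume(volume, min_size=24, max_size=72):
    """Linearly map a normalized volume in [0, 1] to a character size in pixels."""
    clamped = max(0.0, min(1.0, volume))  # out-of-range input is clamped
    return round(min_size + clamped * (max_size - min_size))


for level in (0.0, 0.5, 1.0):
    print(level, font_size_from_volume(level))
```

Compared with a table, a function of this kind gives a continuous range of sizes, at the cost of being harder for a broadcaster or user to override entry by entry.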

FIG. 1 is a block diagram showing the basic configuration for controlling subtitle display using audio information in the subtitle display control system according to the embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration example for controlling subtitle display using audio information in the subtitle display control system according to the embodiment of the present invention, in which noise, background sound, and the like are removed and only the speech component is extracted before features are extracted from the audio.
FIG. 3 is a block diagram showing a configuration example for controlling subtitle display using audio information in the subtitle display control system according to the present embodiment, in which the sound source position of the audio component is detected and the subtitle display position is controlled.
FIG. 4 shows an example of how the radiation direction of the audio is expressed as an angle when the display device in the configuration of FIG. 3 is viewed from directly above.
FIG. 5 is a diagram explaining the relationship between subtitle display and audio information in a prior-art subtitle display control system.

Explanation of symbols

1, 9, 18, 29: Signal multiplexing unit
2, 10, 19, 30: Signal separation unit
3, 12, 21: Audio feature extraction unit
4, 13, 23: Subtitle display control unit
5, 14, 24: Storage device
6, 15, 25, 32: Subtitle superimposing unit
7, 16, 26, 33: Display unit
8, 17, 27, 34: Audio reproduction unit
11, 20: Audio component extraction unit
22: Audio direction detection unit
28: Display device
31: Superimposition position control unit

Claims

A subtitle display control system including a display device, such as a television or a PC monitor, that displays a screen, the system comprising:
a signal separation unit that separates a subtitle signal, an audio signal, and a video signal from a multiplexed received signal;
an audio feature extraction unit that extracts audio features, such as volume and pitch, from the separated audio signal;
a subtitle display control unit that controls the subtitle display method using the audio feature information obtained from the audio feature extraction unit and the subtitle signal from the signal separation unit; and
a subtitle superimposing unit that superimposes subtitles on the video based on the subtitle display information determined by the subtitle display control unit.

The subtitle display control system according to claim 1, further comprising
an audio component extraction unit that removes non-speech components, such as background sound and noise, from the audio signal separated by the signal separation unit and extracts an audio component signal,
wherein the audio feature extraction unit takes as input the audio component signal extracted by the audio component extraction unit.

The subtitle display control system according to claim 2, further comprising
an audio direction detection unit that detects the radiation direction of the audio from the audio component signal extracted by the audio component extraction unit,
wherein the subtitle display control unit controls the subtitle display method using the audio direction information obtained from the audio direction detection unit, the audio feature information obtained from the audio feature extraction unit, and the subtitle signal from the signal separation unit.

The subtitle display control system according to claim 1 or 2,
wherein the subtitle display information obtained from the subtitle display control unit is determined based on the audio feature information input to the subtitle display control unit, by referring to a preset correspondence table between audio feature information and subtitle display information, and
wherein the correspondence table sets subtitle display information, such as the character size or character color of the subtitles, corresponding to the audio feature information.

The subtitle display control system according to claim 3,
wherein, as the subtitle display information obtained from the subtitle display control unit, the screen display position is determined based on the audio direction information input to the subtitle display control unit, by referring to a preset correspondence table between audio direction information and screen display position information.