JP2016091057A

JP2016091057A - Electronic device

Info

Publication number: JP2016091057A
Application number: JP2014220685A
Authority: JP
Inventors: 朋樹岩泉; Tomoki Iwaizumi; 誠治山田; Seiji Yamada
Original assignee: Kyocera Corp
Current assignee: Kyocera Corp
Priority date: 2014-10-29
Filing date: 2014-10-29
Publication date: 2016-05-23

Abstract

PROBLEM TO BE SOLVED: To provide an electronic device that enables a user to understand the details of content. SOLUTION: A subtitle setting unit 10 sets the language of the subtitles to be displayed. When the language of the subtitles included in the content differs from the set language, a subtitle generation unit 7 translates the subtitles included in the content into subtitles in the set language. A reproduction processing unit 4 displays the translated subtitles while reproducing the video included in the content. SELECTED DRAWING: Figure 1

Description

The present invention relates to an electronic device.

Conventionally, techniques for displaying subtitles together with video have been known. For example, Patent Document 1 (Japanese Patent Laid-Open No. 2009-302702) discloses a content playback apparatus that plays back, from a recording medium, content including video data and subtitle data related to the video data.

Patent Document 1: JP 2009-302702 A

However, a user who cannot understand the language of the subtitles provided with the video cannot understand the content even when viewing the subtitles. Likewise, a user who cannot understand the language of the audio provided with the video cannot understand the content even when listening to the audio.

In some cases, subtitles are not provided with the video. In such a case, a hearing-impaired person cannot understand the content even if the audio reproduced together with the video is in his or her native language.

In other cases, audio is not provided with the video. In such a case, a visually impaired person cannot understand the content even if the subtitles displayed together with the video are in his or her native language.

Therefore, an object of the present invention is to provide an electronic device that allows a user to understand the content.

An electronic device according to one aspect of the present invention includes: a setting unit that sets the language of the subtitles to be displayed; a subtitle generation unit that, when the language of the subtitles included in the content differs from the set language, translates the subtitles included in the content into subtitles in the set language; and a reproduction processing unit that reproduces the video included in the content and displays the translated subtitles.

According to one aspect of the present invention, the user can understand the content.

FIG. 1 is a diagram showing the configuration of the smartphone of the first embodiment.
FIGS. 2(a) to 2(d) are diagrams showing examples of the header included in the multimedia content data of the first embodiment.
FIG. 3 is a diagram showing an example of the timing information included in the multimedia content data of the first embodiment.
FIG. 4 is a flowchart showing the subtitle setting procedure in the first embodiment.
FIG. 5 is a flowchart showing the subtitle display procedure in the first embodiment.
FIG. 6 is a diagram showing a subtitle display example of the first embodiment.
FIG. 7 is a diagram showing a subtitle display example of the first embodiment.
FIG. 8 is a flowchart showing the subtitle setting procedure in the second embodiment.
FIG. 9 is a flowchart showing the subtitle display procedure in the second embodiment.
FIG. 10 is an example of the text constituting the subtitles of the second embodiment.
FIG. 11 is a diagram showing a subtitle display example of the second embodiment.
FIG. 12 is a diagram showing the configuration of the smartphone of the third embodiment.
FIGS. 13(a) to 13(d) are diagrams showing examples of the header of the third embodiment.
FIG. 14 is a diagram showing an example of the timing information included in the multimedia content data of the third embodiment.
FIG. 15 is a flowchart showing the audio setting procedure in the third embodiment.
FIG. 16 is a flowchart showing the subtitle display procedure in the third embodiment.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

[First Embodiment]
Conventionally, there is a problem in that a user who cannot understand the language of the subtitles provided together with video cannot understand the content even when viewing the subtitles. Therefore, a first object of the present embodiment is to provide an electronic device that allows a user who cannot understand the language of the subtitles included in acquired content to understand the content.

In addition, conventionally, when subtitles are not provided together with video, a person with a hearing impairment cannot understand the content even if the audio reproduced along with the video is in his or her native language. Therefore, a second object of the present embodiment is to provide an electronic device that allows a user with a hearing impairment to understand acquired content even when that content does not include subtitles.

The first embodiment relates to a technique for achieving the above objects during time-shift playback. Here, time-shift playback means that multimedia content data is received and temporarily stored, and then played back after reception is complete.

FIG. 1 is a diagram showing the configuration of the smartphone according to the first embodiment.
Referring to FIG. 1, the smartphone 1 includes a wireless communication unit 2, a data storage unit 3, a reproduction processing unit 4, a display unit 5, a user input unit 6, and a subtitle generation unit 7.

The wireless communication unit 2 receives multimedia content data transmitted through a wireless base station. The multimedia content data includes video, audio, and a header. The multimedia content data may also include subtitles and timing information that defines the display timing of the subtitles.

The data storage unit 3 stores the multimedia content data received by the wireless communication unit 2.

The reproduction processing unit 4 reproduces the video included in the multimedia content data stored in the data storage unit 3. The reproduction processing unit 4 superimposes an image representing the subtitles, either those included in the multimedia content data or those generated by the subtitle generation unit 7, on the reproduced video and outputs the result to the display unit 5.

The display unit 5 displays the video and subtitles sent from the reproduction processing unit 4.
The user input unit 6 receives input from the user.

When the language of the subtitles included in the multimedia content data differs from the language (A) of the subtitles to be displayed, the subtitle generation unit 7 translates the subtitles included in the multimedia content into subtitles in language (A). When the multimedia content data does not include subtitles, the subtitle generation unit 7 generates subtitles in language (A) from the audio included in the multimedia content.

The subtitle generation unit 7 includes a speech recognition unit 8, a translation unit 9, a subtitle setting unit 10, and a subtitle timing setting unit 11.

The speech recognition unit 8 generates subtitles by performing speech recognition on the audio included in the multimedia content data.

The translation unit 9 translates the subtitles included in the multimedia content data, as well as the subtitles obtained by speech recognition, into subtitles in language (A).

The subtitle setting unit 10 refers to the header included in the multimedia content data and sets the language of the subtitles to be displayed and the method of generating them.

When subtitles are generated from the audio included in the multimedia content data, the subtitle timing setting unit 11 sets the subtitle display timing based on timing information that defines the audio playback timing.

FIGS. 2(a) to 2(d) are diagrams showing examples of the header included in the multimedia content data of the first embodiment.

The header of FIG. 2(a) includes information indicating that the audio language is Japanese, that the multimedia content data includes subtitles, and that the subtitle language is Japanese.

The header of FIG. 2(b) includes information indicating that the audio language is Japanese, that the multimedia content data includes subtitles, and that the subtitle language is English.

The header of FIG. 2(c) includes information indicating that the audio language is Japanese and that the multimedia content data does not include subtitles.

The header of FIG. 2(d) includes information indicating that the audio language is English and that the multimedia content data does not include subtitles.
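The text does not specify how the header is encoded. As a minimal sketch, assuming a simple record with three fields (the class name `ContentHeader`, its field names, and the ISO 639-1 codes "ja" and "en" are illustrative assumptions, not part of the source), the four headers of FIGS. 2(a) to 2(d) could be modeled as follows:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContentHeader:
    """Hypothetical model of the header shown in FIGS. 2(a) to 2(d)."""
    audio_language: str                      # language of the audio, e.g. "ja"
    has_subtitles: bool                      # whether the content includes subtitles
    subtitle_language: Optional[str] = None  # only meaningful when has_subtitles is True

    def __post_init__(self) -> None:
        # A header that declares subtitles must also name their language.
        if self.has_subtitles and self.subtitle_language is None:
            raise ValueError("subtitle_language is required when has_subtitles is True")


# The four example headers, using ISO 639-1 codes as an assumed encoding.
FIG_2A = ContentHeader("ja", True, "ja")
FIG_2B = ContentHeader("ja", True, "en")
FIG_2C = ContentHeader("ja", False)
FIG_2D = ContentHeader("en", False)
```

The validation in `__post_init__` reflects the fact that the subtitle language is only stated in the headers that declare subtitles.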

FIG. 3 is a diagram showing an example of the timing information included in the multimedia content data of the first embodiment. As shown in FIG. 3, the timing information defines the relationship between each subtitle and the frame numbers of the video during which that subtitle is displayed.

In the timing information of FIG. 3, subtitle #1, subtitle #2, subtitle #3, ... are defined to be displayed while the video frames numbered 1 to 56, 57 to 86, and 87 to 94, respectively, are being displayed.
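A minimal sketch of how the subtitle for a given video frame might be looked up, assuming the timing information is a list of inclusive frame ranges; the tuple layout and the function name `subtitle_for_frame` are hypothetical:

```python
from typing import List, Optional, Tuple

# (subtitle id, first frame, last frame); frame ranges are inclusive.
TimingEntry = Tuple[str, int, int]

FIG_3_TIMING: List[TimingEntry] = [
    ("subtitle #1", 1, 56),
    ("subtitle #2", 57, 86),
    ("subtitle #3", 87, 94),
]


def subtitle_for_frame(timing: List[TimingEntry], frame: int) -> Optional[str]:
    """Return the subtitle to superimpose on `frame`, or None if no entry covers it."""
    for subtitle_id, first, last in timing:
        if first <= frame <= last:
            return subtitle_id
    return None
```

For example, frame 57 falls in the second range, so `subtitle_for_frame(FIG_3_TIMING, 57)` selects subtitle #2.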

FIG. 4 is a flowchart showing the subtitle setting procedure in the first embodiment.
In step S101, the subtitle setting unit 10 sets the language (α) of the subtitles to be displayed by referring to the language setting of the smartphone 1.

In step S102, the subtitle setting unit 10 acquires the header of the multimedia content data stored in the data storage unit 3.

In step S103, if the acquired header indicates that the multimedia content data includes subtitles (FIG. 2(a) or FIG. 2(b)), the process proceeds to step S104. If the acquired header indicates that the multimedia content data does not include subtitles (FIG. 2(c) or FIG. 2(d)), the process proceeds to step S110.

In step S104, the subtitle setting unit 10 refers to the acquired header and identifies the language (β) of the subtitles included in the multimedia content data.

In step S105, if the language (α) of the subtitles to be displayed is the same as the language (β) of the subtitles included in the multimedia content data, the process proceeds to step S109; if they differ, the process proceeds to step S106.

In step S106, the subtitle setting unit 10 outputs to the display unit 5 a screen asking whether the user consents to displaying subtitles in language (β). If the user consents, the process proceeds to step S108; if the user does not consent, the process proceeds to step S107.

In step S107, the subtitle setting unit 10 sets the subtitle setting to “translate the subtitles in language (β) included in the multimedia content data into subtitles in language (α) and display them (setting B)”.

In step S108, the subtitle setting unit 10 changes the language of the subtitles to be displayed from α to β.

In step S109, the subtitle setting unit 10 sets the subtitle setting to “display the subtitles in language (β) included in the multimedia content data as they are (setting A)”.

In step S110, the subtitle setting unit 10 refers to the acquired header and identifies the language (γ) of the audio included in the multimedia content data.

In step S111, if the language (α) of the subtitles to be displayed is the same as the language (γ) of the audio included in the multimedia content data, the process proceeds to step S115; if they differ, the process proceeds to step S112.

In step S112, the subtitle setting unit 10 outputs to the display unit 5 a screen asking whether the user consents to displaying subtitles in language (γ). If the user consents, the process proceeds to step S114; if the user does not consent, the process proceeds to step S113.

In step S113, the subtitle setting unit 10 sets the subtitle setting to “generate subtitles in language (γ) by performing speech recognition on the audio in language (γ) included in the multimedia content data, then translate them into subtitles in language (α) and display them (setting C)”.

In step S114, the subtitle setting unit 10 changes the language of the subtitles to be displayed from α to γ.

In step S115, the subtitle setting unit 10 sets the subtitle setting to “generate and display subtitles by performing speech recognition on the audio in language (γ) included in the multimedia content data (setting D)”.
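The branching of FIG. 4 can be sketched as a single function. This is an illustration under stated assumptions, not the patented implementation: languages are plain string codes, the consent screen of steps S106 and S112 is a hypothetical callback `accepts_original`, and, matching the subtitle examples given for FIGS. 6 and 7, consenting keeps the original language while declining leads to translation.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class Setting(Enum):
    A = "display the included subtitles as they are"
    B = "translate the included subtitles into the display language"
    C = "recognize the audio, then translate the recognized text"
    D = "recognize the audio and display the recognized text"


def choose_setting(
    device_language: str,                     # language (alpha) from the phone settings
    subtitle_language: Optional[str],         # language (beta); None if no subtitles
    audio_language: str,                      # language (gamma) of the audio
    accepts_original: Callable[[str], bool],  # hypothetical consent screen
) -> Tuple[Setting, str]:
    """Return (subtitle setting, language actually displayed), following FIG. 4."""
    alpha = device_language                       # S101
    if subtitle_language is not None:             # S103: header says subtitles exist
        beta = subtitle_language                  # S104
        if alpha == beta:                         # S105
            return Setting.A, alpha               # S109
        if accepts_original(beta):                # S106: user accepts beta
            return Setting.A, beta                # S108 (alpha -> beta), then S109
        return Setting.B, alpha                   # S107
    gamma = audio_language                        # S110
    if alpha == gamma:                            # S111
        return Setting.D, alpha                   # S115
    if accepts_original(gamma):                   # S112: user accepts gamma
        return Setting.D, gamma                   # S114 (alpha -> gamma), then S115
    return Setting.C, alpha                       # S113
```

For example, a phone set to Japanese that receives content with English audio and no subtitles yields setting C when the user declines English, and setting D with English subtitles when the user accepts.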

FIG. 5 is a flowchart showing the subtitle display procedure in the first embodiment.
Referring to FIG. 5, in step S201, if the subtitle setting is setting B (“translate the subtitles in language (β) included in the multimedia content data into subtitles in language (α) and display them”), the process proceeds to step S202.

In step S203, if the subtitle setting is setting D (“generate and display subtitles by performing speech recognition on the audio in language (γ) included in the multimedia content data”), the process proceeds to step S204.

In step S206, if the subtitle setting is setting C (“generate subtitles in language (γ) by performing speech recognition on the audio in language (γ) included in the multimedia content data, then translate them into subtitles in language (α) and display them”), the process proceeds to step S207.

In step S202, the translation unit 9 translates each subtitle in language (β) included in the multimedia content data into a subtitle in language (α).

In step S204, the speech recognition unit 8 performs speech recognition on each utterance in language (γ) included in the multimedia content data and generates subtitles in language (γ) from the recognition results. Specifically, the speech recognition unit 8 identifies the characters corresponding to each utterance from the output waveform obtained when the reproduction processing unit 4 plays back that utterance, and generates the subtitles.

In step S205, the subtitle timing setting unit 11 sets the display start time and display end time of each subtitle to the start time and end time, respectively, of the audio from which that subtitle was generated.

In step S207, the speech recognition unit 8 performs speech recognition on each utterance in language (γ) included in the multimedia content data and generates subtitles in language (γ) from the recognition results. Specifically, the speech recognition unit 8 identifies the characters corresponding to each utterance from the output waveform obtained when the reproduction processing unit 4 plays back that utterance, and generates the subtitles.

In step S208, the translation unit 9 translates each subtitle in language (γ) into a subtitle in language (α).

In step S209, the subtitle timing setting unit 11 sets the display start time and display end time of each subtitle to the start time and end time, respectively, of the audio from which that subtitle was generated.

In this way, all the subtitles to be displayed together with the video included in the multimedia content data are prepared.

Next, in step S210, the reproduction processing unit 4 reproduces the video included in the multimedia content data. Based on the timing information included in the multimedia content data (for setting A or setting B) or the subtitle timing set in step S205 or S209 (for setting D or setting C), the reproduction processing unit 4 superimposes an image representing the subtitles on the reproduced video and outputs the result to the display unit 5.
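Steps S201 to S209 amount to building a list of timed subtitle cues for the chosen setting before playback. The sketch below is hedged: `recognize` and `translate` are hypothetical callables standing in for the speech recognition unit 8 and the translation unit 9, the audio is assumed to be already split into utterances, and times are in seconds for simplicity (FIG. 3 expresses timing as frame numbers instead).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Cue:
    text: str
    start: float  # display start time, seconds
    end: float    # display end time, seconds


@dataclass
class AudioSegment:
    samples: bytes  # placeholder for the decoded audio of one utterance
    start: float
    end: float


def prepare_cues(
    setting: str,                       # "A", "B", "C" or "D"
    subtitle_cues: List[Cue],           # included subtitles with their timing (A/B)
    audio: List[AudioSegment],          # utterances of the audio track (C/D)
    recognize: Callable[[bytes], str],  # hypothetical speech-to-text engine
    translate: Callable[[str], str],    # hypothetical translator into language alpha
) -> List[Cue]:
    if setting == "A":
        # Display the included subtitles unchanged, with their own timing.
        return list(subtitle_cues)
    if setting == "B":
        # S202: translate each included subtitle and keep its timing.
        return [Cue(translate(c.text), c.start, c.end) for c in subtitle_cues]
    if setting in ("C", "D"):
        cues = []
        for seg in audio:
            text = recognize(seg.samples)  # S204 / S207
            if setting == "C":
                text = translate(text)     # S208
            # S205 / S209: the cue spans exactly its source utterance.
            cues.append(Cue(text, seg.start, seg.end))
        return cues
    raise ValueError(f"unknown setting: {setting!r}")
```

Keeping recognition and translation as injected callables leaves the choice of engine open, which the text does not specify.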

(Subtitle examples)
When the language setting (α) of the smartphone is Japanese and the language (β) of the subtitles included in the multimedia content data is Japanese (setting A), Japanese subtitles 61 are displayed as shown in FIG. 6.

When the language setting (α) of the smartphone is Japanese and the language (β) of the subtitles included in the multimedia content data is English (setting B), Japanese subtitles 61 are displayed as shown in FIG. 6.

When the language setting (α) of the smartphone is Japanese and the language (β) of the subtitles included in the multimedia content data is English, but the user consents to displaying them in English (setting A), English subtitles 62 are displayed as shown in FIG. 7.

When the language setting (α) of the smartphone is Japanese, the multimedia content data includes no subtitles, and the language (γ) of the audio included in the multimedia content data is Japanese (setting D), Japanese subtitles 61 are displayed as shown in FIG. 6.

When the language setting (α) of the smartphone is Japanese, the multimedia content data includes no subtitles, and the language (γ) of the audio included in the multimedia content data is English (setting C), Japanese subtitles 61 are displayed as shown in FIG. 6.

When the language setting (α) of the smartphone is Japanese, the multimedia content data includes no subtitles, and the language (γ) of the audio included in the multimedia content data is English, but the user consents to displaying subtitles in English (setting D), English subtitles 62 are displayed as shown in FIG. 7.

As described above, according to the present embodiment, when the multimedia content data does not include subtitles, subtitles are generated from the audio and displayed, which is particularly convenient for hearing-impaired users. Also, according to the present embodiment, when the multimedia content data includes only English subtitles or only English audio, Japanese subtitles are generated from the English subtitles or English audio and displayed, which is particularly convenient for users whose native language is Japanese.

[Second Embodiment]
The second embodiment relates to a technique for achieving the same objects as the first embodiment during streaming playback. Here, streaming playback means receiving multimedia content data and playing it back in real time.

In the present embodiment, the data storage unit 3 functions as a reception buffer for the received multimedia content data. That is, the data storage unit 3 stores the most recent multimedia content data covering a predetermined time. When a new predetermined time's worth of multimedia content data is received, the data storage unit 3 stores this latest data in place of the older predetermined time's worth of data it had been storing.
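One possible reading of this reception buffer is a sliding window that discards data older than the predetermined time, measured from the newest received chunk. The sketch below assumes that reading; the class name `ReceiveBuffer`, the chunk layout, and the assumption that chunks arrive in timestamp order are illustrative only.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Chunk:
    timestamp: float  # presentation time of the chunk, seconds
    payload: bytes    # video/audio/subtitle data (opaque here)


class ReceiveBuffer:
    """Keeps only the most recent `window` seconds of received chunks."""

    def __init__(self, window: float) -> None:
        self.window = window
        self._chunks: Deque[Chunk] = deque()

    def push(self, chunk: Chunk) -> None:
        # Assumes chunks arrive in increasing timestamp order.
        self._chunks.append(chunk)
        newest = chunk.timestamp
        # Evict chunks that have fallen out of the window behind the newest one.
        while self._chunks and self._chunks[0].timestamp <= newest - self.window:
            self._chunks.popleft()

    def latest(self) -> List[Chunk]:
        """Snapshot of the buffered chunks, oldest first."""
        return list(self._chunks)
```

With a 2-second window, pushing chunks stamped 0, 1, 2 and 3 seconds leaves only the chunks at 2 and 3 seconds buffered.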

The reproduction processing unit 4 extracts and reproduces the video from the latest predetermined time's worth of multimedia content data stored in the data storage unit 3. The subtitle generation unit 7 generates subtitles from the latest predetermined time's worth of multimedia content data stored in the data storage unit 3.

FIG. 8 is a flowchart showing the subtitle setting procedure in the second embodiment.
In step S301, the subtitle setting unit 10 sets the language (α) of the subtitles to be displayed by referring to the language setting of the smartphone 1.

In step S302, when the header of the multimedia content data is output from the wireless communication unit 2 to the data storage unit 3, the process proceeds to step S303.

In step S303, if the header indicates that the multimedia content data includes subtitles (FIG. 2(a) or FIG. 2(b)), the process proceeds to step S304. If the header indicates that the multimedia content data does not include subtitles (FIG. 2(c) or FIG. 2(d)), the process proceeds to step S310.

In step S304, the subtitle setting unit 10 refers to the header and identifies the language (β) of the subtitles included in the multimedia content data.

In step S305, if the language (α) of the subtitles to be displayed is the same as the language (β) of the subtitles included in the multimedia content data, the process proceeds to step S309; if they differ, the process proceeds to step S306.

In step S306, the subtitle setting unit 10 outputs to the display unit 5 a screen asking whether the user consents to displaying subtitles in language (β). If the user consents, the process proceeds to step S308; if the user does not consent, the process proceeds to step S307.

In step S307, the subtitle setting unit 10 sets the subtitle setting to “translate the subtitles in language (β) included in the multimedia content data into subtitles in language (α) and display them (setting B)”.

In step S308, the subtitle setting unit 10 changes the language of the subtitles to be displayed from α to β.

In step S309, the subtitle setting unit 10 sets the subtitle setting to “display the subtitles in language (β) included in the multimedia content data as they are (setting A)”.

In step S310, the subtitle setting unit 10 refers to the header and identifies the language (γ) of the audio included in the multimedia content data.

In step S311, if the language (α) of the subtitles to be displayed is the same as the language (γ) of the audio included in the multimedia content data, the process proceeds to step S315; if they differ, the process proceeds to step S312.

In step S312, the subtitle setting unit 10 outputs to the display unit 5 a screen asking whether the user consents to displaying subtitles in language (γ). If the user consents, the process proceeds to step S314; if the user does not consent, the process proceeds to step S313.

In step S313, the subtitle setting unit 10 sets the subtitle setting to “generate subtitles by performing speech recognition on the audio in language (γ) included in the multimedia content data, then translate them into subtitles in language (α) and display them (setting C)”.

In step S314, the subtitle setting unit 10 changes the language of the subtitles to be displayed from α to γ.

In step S315, the subtitle setting unit 10 sets the subtitle setting to “generate and display subtitles by performing speech recognition on the audio in language (γ) included in the multimedia content data (setting D)”.

FIG. 9 is a flowchart showing the subtitle display procedure in the second embodiment.
Referring to FIG. 9, in step S401, if the subtitle setting is setting B (“translate the subtitles in language (β) included in the multimedia content data into subtitles in language (α) and display them”), the process proceeds to step S402.

In step S403, if the subtitle setting is setting D (“generate and display subtitles by performing speech recognition on the audio in language (γ) included in the multimedia content data”), the process proceeds to step S404.

In step S406, if the subtitle setting is setting C (“generate subtitles by performing speech recognition on the audio in language (γ) included in the multimedia content data, then translate them into subtitles in language (α) and display them”), the process proceeds to step S407.

In step S402, the translation unit 9 translates, within a fixed time, as many as possible of the subtitles in language (β) included in the predetermined time's worth of multimedia content data stored in the data storage unit 3 into subtitles in language (α).
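Translating “as many as possible within a fixed time” suggests a time-budgeted loop. The sketch below is one way to do this, not the patented method: `translate` is a hypothetical callable, the clock is injectable for testing, and what happens to untranslated subtitles is left to the caller because the text does not say. A translation already in progress may finish slightly after the deadline, since the budget is only checked before each new subtitle.

```python
import time
from typing import Callable, List, Tuple


def translate_within_budget(
    subtitles: List[str],
    translate: Callable[[str], str],  # hypothetical translator into language alpha
    budget_s: float,                  # the fixed time allowed for this buffer
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], List[str]]:
    """Translate subtitles in order until the time budget runs out.

    Returns (translated, remaining), where `remaining` holds the subtitles
    that were not started before the deadline.
    """
    deadline = clock() + budget_s
    translated: List[str] = []
    for i, text in enumerate(subtitles):
        if clock() >= deadline:
            return translated, subtitles[i:]
        translated.append(translate(text))
    return translated, []
```

The caller can, for example, display the remaining subtitles untranslated or drop them; either policy is an assumption beyond the text.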

In step S404, the speech recognition unit 8 performs speech recognition on each utterance in language (γ) included in the predetermined time's worth of multimedia content data stored in the data storage unit 3 and generates subtitles in language (γ) from the recognition results. Specifically, the speech recognition unit 8 identifies the characters corresponding to each utterance from the output waveform obtained when the reproduction processing unit 4 plays back that utterance, and generates the subtitles.

In step S405, the subtitle timing setting unit 11 sets the display start time and display end time of each subtitle to the start time and end time of the speech from which that subtitle was generated.

In step S407, the speech recognition unit 8 recognizes each language (γ) speech segment included in the predetermined duration of multimedia content data stored in the data storage unit 3 and generates language (γ) subtitles from the recognition results. Specifically, the speech recognition unit 8 identifies the characters corresponding to each speech segment from the output waveform obtained when the playback processing unit 4 reproduces that segment, and generates the subtitles.

In step S408, the translation unit 9 translates each language (γ) subtitle into a language (α) subtitle, translating as much as possible within a fixed time limit.

In step S409, the subtitle timing setting unit 11 sets the display start time and display end time of each subtitle to the start time and end time of the speech from which that subtitle was generated.
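The branching of steps S401 to S409 can be sketched in Python as below. This is a minimal, non-authoritative sketch: the `Cue` and `AudioClip` types, the `build_cues` function, and the `recognize` and `translate` callables are hypothetical stand-ins for the speech recognition unit 8 and the translation unit 9, and times are assumed to be in seconds. It shows only where each subtitle's text and timing come from under each setting.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Cue:
    """A subtitle with display start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


@dataclass
class AudioClip:
    """A speech segment with reproduction start/end times in seconds (hypothetical type)."""
    data: bytes
    start: float
    end: float


def build_cues(setting: str,
               subtitles: List[Cue],
               clips: List[AudioClip],
               recognize: Callable[[AudioClip], str],
               translate: Callable[[str], str]) -> List[Cue]:
    """Return the subtitles to display for one chunk of content."""
    if setting == "A":
        # Display the content's own subtitles with the content's timing information.
        return list(subtitles)
    if setting == "B":
        # S402: translate each subtitle; timing still comes from the content.
        return [Cue(translate(c.text), c.start, c.end) for c in subtitles]
    if setting not in ("C", "D"):
        raise ValueError(f"unknown subtitle setting: {setting!r}")
    cues = []
    for clip in clips:
        text = recognize(clip)        # S404 / S407: speech recognition
        if setting == "C":
            text = translate(text)    # S408: translate the recognized text
        # S405 / S409: each subtitle copies the timing of its source speech.
        cues.append(Cue(text, clip.start, clip.end))
    return cues
```

Because settings C and D take their timing from the recognized speech, the content's own timing information is needed only for settings A and B, which matches the split described for step S411.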

In step S410, when reception of the multimedia content data has finished (that is, when no unprocessed multimedia content data is stored in the data storage unit 3), the processing ends; when further multimedia content data has been received (that is, when unprocessed multimedia content data is stored in the data storage unit 3), the processing returns to step S401.

In step S411, the playback processing unit 4 reproduces the video included in the predetermined duration of multimedia content data stored in the data storage unit 3. Based on the timing information included in the multimedia content data (for setting A or setting B) or on the subtitle timing set in step S405 or S409 (for setting C or setting D), the playback processing unit 4 superimposes images representing the subtitles on the reproduced video and outputs the result to the display unit 5.

In step S412, when the translation in step S402 or step S408 is incomplete, the process proceeds to step S413.

In step S413, the playback processing unit 4 outputs to the display unit 5 a screen indicating that the translation is incomplete.

In step S414, when reception of the multimedia content data has finished, the processing ends; when further multimedia content data has been received, the processing returns to step S411.
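Steps S402/S408 and S412/S413 describe translating as much as possible within a certain time and flagging the result when it is incomplete. One way to sketch that, assuming a wall-clock budget per chunk of content, is shown below; `translate_within_budget`, its `budget_s` parameter, and the injectable `clock` are hypothetical and not taken from the source.

```python
import time
from typing import Callable, List, Optional, Sequence, Tuple


def translate_within_budget(
        texts: Sequence[str],
        translate: Callable[[str], str],
        budget_s: float,
        clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[Optional[str]], bool]:
    """Translate as many texts as the time budget allows.

    Returns (results, incomplete): results[i] is None for a text that was not
    translated before the deadline, and incomplete is True if any such text
    remains (the condition checked in step S412).
    """
    deadline = clock() + budget_s
    results: List[Optional[str]] = []
    for text in texts:
        # The deadline is checked before each call, so a single slow
        # translation can still overrun the budget slightly.
        if clock() >= deadline:
            results.append(None)
        else:
            results.append(translate(text))
    return results, any(r is None for r in results)
```

A caller would display the entries that were translated and, when `incomplete` is true, output the notice of step S413 (for example, the icon 53 of FIG. 11). Injecting `clock` keeps the function testable without real delays.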

(Subtitle example)
When the language (α) set on the smartphone is Japanese and the language (β) of the subtitles included in the multimedia content data is English (setting B), the English subtitles shown in FIG. 10 are translated into Japanese. FIG. 11 is a diagram illustrating an example in which the translation of the English subtitles shown in FIG. 10 into Japanese is incomplete. In FIG. 11, of the subtitle sentences shown in FIG. 10, only “The 27th edition of the international film festival, which runs until October 30, focuses heavily on Japanese animated movies” has been translated into Japanese and displayed. In addition, an icon 53 indicating that the translation is incomplete is displayed.

As described above, according to the present embodiment, in streaming playback, in which content data is reproduced while it is being received, subtitles are generated from the audio and displayed when the multimedia content data does not include subtitles, just as in the shift-time playback of the first embodiment, in which subtitles are generated after all of the content data has been received. This is particularly convenient for people with hearing impairments. Further, as in the first embodiment, when the multimedia content data includes only English subtitles or only English audio, Japanese subtitles are generated from the English subtitles or English audio and displayed, which is particularly convenient for users whose native language is Japanese.

[Third Embodiment]
Conventionally, a user who cannot understand the language of the audio provided with the video cannot understand what the audio says. Therefore, a first object of the present embodiment is to provide an electronic device that allows a user who cannot understand the language of the audio included in acquired content to understand the content.

Conventionally, when audio is not provided with the video, a visually impaired person, that is, a person who has lost visual function or a person who can see the video but has difficulty reading small subtitle characters, cannot understand the content, even if the subtitles displayed with the video are in his or her native language. Therefore, a second object of the present embodiment is to provide an electronic device that allows a visually impaired user to understand acquired content even when the content does not include audio.

The third embodiment relates to a technique for achieving the above objects in shift-time playback.

FIG. 12 is a diagram illustrating the configuration of the smartphone according to the third embodiment.
Referring to FIG. 12, this smartphone 51 includes an audio generation unit 57 instead of the subtitle generation unit 7.

When the language of the audio included in the multimedia content data differs from the language (A) of the audio to be reproduced, the audio generation unit 57 translates the audio included in the multimedia content into language (A) audio. When the multimedia content data does not include audio, the audio generation unit 57 generates language (A) audio from the subtitles included in the multimedia content.

The audio generation unit 57 includes a speech recognition unit 53, a speech synthesis unit 58, a translation unit 59, an audio setting unit 55, and an audio timing setting unit 56.

The speech recognition unit 53 generates text by recognizing the speech included in the multimedia content data.

The translation unit 59 translates the subtitles included in the multimedia content data, and the text obtained by speech recognition, into language (A) subtitles.

The speech synthesis unit 58 synthesizes speech from subtitles or text.
The audio setting unit 55 refers to the header included in the multimedia content data and sets the language of the audio to be reproduced and the method of generating that audio.

When audio is generated from the subtitles included in the multimedia content data, the audio timing setting unit 56 sets the timing at which the audio is reproduced, based on the timing information that defines the display timing of the subtitles.

FIGS. 13(a) to 13(d) are diagrams illustrating examples of headers according to the third embodiment.
The header of FIG. 13(a) includes information indicating that the multimedia content data includes audio, that the audio language is Japanese, and that the subtitle language is Japanese.

The header of FIG. 13(b) includes information indicating that the multimedia content data includes audio, that the audio language is English, and that the subtitle language is Japanese.

The header of FIG. 13(c) includes information indicating that the multimedia content data does not include audio and that the subtitle language is Japanese.

The header of FIG. 13(d) includes information indicating that the multimedia content data does not include audio and that the subtitle language is English.

FIG. 14 is a diagram illustrating an example of the timing information included in the multimedia content data according to the third embodiment. As shown in FIG. 14, the timing information defines the relationship between each audio segment and the frame numbers of the video during which that audio is reproduced.

In the timing information of FIG. 14, audio #1, audio #2, audio #3, ... are specified to be reproduced while the video of frame numbers 1 to 56, frame numbers 57 to 86, and frame numbers 87 to 94, respectively, is displayed.
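The FIG. 14 timing information ties each audio segment to a range of video frame numbers. Converting such a range into reproduction times requires a frame rate, which the source does not give, so the sketch below takes `fps` as a parameter (30 fps is assumed only in the usage example) and uses exact fractions so that consecutive segments meet without rounding gaps. The names `FIG14_TIMING`, `frames_to_seconds`, and `schedule` are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# Frame ranges from the FIG. 14 example: (audio label, first frame, last frame).
FIG14_TIMING: List[Tuple[str, int, int]] = [
    ("audio#1", 1, 56),
    ("audio#2", 57, 86),
    ("audio#3", 87, 94),
]


def frames_to_seconds(first: int, last: int, fps: int) -> Tuple[Fraction, Fraction]:
    """Map an inclusive, 1-based frame range to a [start, end) interval in seconds."""
    if first < 1 or last < first or fps <= 0:
        raise ValueError("invalid frame range or frame rate")
    # Frame n is on screen from (n - 1) / fps until n / fps.
    return Fraction(first - 1, fps), Fraction(last, fps)


def schedule(timing: List[Tuple[str, int, int]],
             fps: int) -> Dict[str, Tuple[Fraction, Fraction]]:
    """Return the reproduction interval of every audio segment."""
    return {name: frames_to_seconds(a, b, fps) for name, a, b in timing}
```

For example, `schedule(FIG14_TIMING, 30)` places audio #2 between 56/30 s and 86/30 s, starting exactly when audio #1 ends.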

FIG. 15 is a flowchart showing an audio setting procedure according to the third embodiment.
In step S501, the audio setting unit 55 sets the language (α) of the audio to be reproduced by referring to the language setting of the smartphone 51.

In step S502, the audio setting unit 55 acquires the header of the multimedia content data stored in the data storage unit 3.

In step S503, when the acquired header specifies that the multimedia content data includes audio (FIG. 13(a) or FIG. 13(b)), the process proceeds to step S504. When the acquired header specifies that the multimedia content data does not include audio (FIG. 13(c) or FIG. 13(d)), the process proceeds to step S510.

In step S504, the audio setting unit 55 refers to the acquired header and identifies the language (β) of the audio included in the multimedia content data.

In step S505, if the language (α) of the audio to be reproduced is the same as the language (β) of the audio included in the multimedia content data, the process proceeds to step S509; if they differ, the process proceeds to step S506.

In step S506, the audio setting unit 55 outputs to the display unit 5 a screen asking whether the user consents to reproduction of the language (β) audio. If the user consents, the process proceeds to step S507; if not, the process proceeds to step S508.

In step S507, the audio setting unit 55 sets the audio setting to “translate the language (β) audio included in the multimedia content data into language (α) audio and reproduce it (setting B)”.

In step S508, the subtitle setting unit 10 changes the language of the subtitles to be displayed from α to β.

In step S509, the audio setting unit 55 sets the audio setting to “reproduce the language (β) audio included in the multimedia content data as it is (setting A)”.

In step S510, the audio setting unit 55 refers to the acquired header and identifies the language (γ) of the subtitles included in the multimedia content data.

In step S511, if the language (α) of the audio to be reproduced is the same as the language (γ) of the subtitles included in the multimedia content data, the process proceeds to step S515; if they differ, the process proceeds to step S512.

In step S512, the audio setting unit 55 outputs to the display unit 5 a screen asking whether the user consents to reproduction of language (γ) audio. If the user consents, the process proceeds to step S514; if not, the process proceeds to step S513.

In step S513, the audio setting unit 55 sets the audio setting to “translate the language (γ) subtitles included in the multimedia content data into language (α) subtitles, then generate and reproduce language (α) audio by speech synthesis (setting C)”.

In step S514, the subtitle setting unit 10 changes the language of the subtitles to be displayed from α to γ.

In step S515, the audio setting unit 55 sets the audio setting to “generate and reproduce language (γ) audio by speech synthesis from the language (γ) subtitles included in the multimedia content data (setting D)”.
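The decision flow of FIG. 15 (steps S501 to S515) reduces to a small function of the device language, the header, and the user's answer to the consent screen. The sketch below is a hedged reading of the text, not the patented implementation: the `Header` fields are an assumed parsed form of the FIG. 13 headers, `user_accepts` stands in for the screens of steps S506 and S512, and the transitions from S508 to S509 and from S514 to S515 are assumptions, because the text does not state where those steps lead. The consent branches follow the directions exactly as written in steps S506 and S512.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Header:
    """Hypothetical parsed form of the FIG. 13 headers."""
    has_audio: bool
    audio_lang: Optional[str]
    subtitle_lang: Optional[str]


def choose_audio_setting(device_lang: str,
                         header: Header,
                         user_accepts: Callable[[str], bool]
                         ) -> Tuple[str, Optional[str]]:
    """Return (audio setting, new subtitle language or None) following FIG. 15."""
    alpha = device_lang                      # S501
    if header.has_audio:                     # S503
        beta = header.audio_lang             # S504
        if beta == alpha:                    # S505
            return "A", None                 # S509
        if user_accepts(beta):               # S506, branch as stated in the text
            return "B", None                 # S507
        return "A", beta                     # S508, then assumed S509
    gamma = header.subtitle_lang             # S510
    if gamma == alpha:                       # S511
        return "D", None                     # S515
    if user_accepts(gamma):                  # S512
        return "D", gamma                    # S514, then assumed S515
    return "C", None                         # S513
```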

FIG. 16 is a flowchart showing a subtitle display procedure according to the third embodiment.
Referring to FIG. 16, in step S601, when the audio setting is setting B (“translate the language (β) audio included in the multimedia content data into language (α) audio and reproduce it”), the process proceeds to step S602.

In step S603, when the audio setting is setting D (“generate and reproduce language (γ) audio by speech synthesis from the language (γ) subtitles included in the multimedia content data”), the process proceeds to step S604.

In step S606, when the audio setting is setting C (“translate the language (γ) subtitles included in the multimedia content data into language (α) subtitles, then generate and reproduce language (α) audio by speech synthesis”), the process proceeds to step S607.

In step S602, the speech recognition unit 53, the translation unit 59, and the speech synthesis unit 58 generate language (α) audio from each language (β) audio segment included in the multimedia content data.

Specifically, the speech recognition unit 53 identifies the characters corresponding to each language (β) audio segment from the output waveform obtained when the playback processing unit 4 reproduces that segment, and generates language (β) text. The translation unit 59 translates the language (β) text into language (α) text. The speech synthesis unit 58 generates language (α) audio by speech synthesis from the language (α) text.

In step S604, the speech synthesis unit 58 generates language (γ) audio by speech synthesis from each language (γ) subtitle included in the multimedia content data.

In step S605, the audio timing setting unit 56 sets the reproduction start time and reproduction end time of each audio segment to the display start time and display end time of the subtitle from which that audio was generated.

In step S607, the translation unit 59 translates each language (γ) subtitle included in the multimedia content data into a language (α) subtitle.

In step S608, the speech synthesis unit 58 generates language (α) audio by speech synthesis from each language (α) subtitle.

In step S609, the audio timing setting unit 56 sets the reproduction start time and reproduction end time of each audio segment to the display start time and display end time of the subtitle from which that audio was generated.
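Steps S601 to S609 can likewise be sketched as one function that returns the audio segments to reproduce together with their timing. In this minimal sketch, `generate_audio`, `Cue`, and `Clip` are hypothetical, the `recognize`, `translate`, and `synthesize` callables stand in for the speech recognition unit 53, the translation unit 59, and the speech synthesis unit 58, and audio payloads are left opaque.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Cue:
    """A subtitle with display start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


@dataclass
class Clip:
    """An audio segment; payload is whatever the audio pipeline uses."""
    payload: Any
    start: float
    end: float


def generate_audio(setting: str,
                   clips: List[Clip],
                   cues: List[Cue],
                   recognize: Callable[[Clip], str],
                   translate: Callable[[str], str],
                   synthesize: Callable[[str], Any]) -> List[Clip]:
    """Return the audio to reproduce under the given audio setting."""
    if setting == "A":
        return list(clips)  # reproduce the content's audio as it is
    if setting == "B":
        # S602: recognize, translate, then synthesize; content timing is kept.
        return [Clip(synthesize(translate(recognize(c))), c.start, c.end)
                for c in clips]
    if setting not in ("C", "D"):
        raise ValueError(f"unknown audio setting: {setting!r}")
    out = []
    for cue in cues:
        # S607: only setting C translates the subtitle text first.
        text = translate(cue.text) if setting == "C" else cue.text
        # S604 / S608: synthesize; S605 / S609: reuse the subtitle's interval.
        out.append(Clip(synthesize(text), cue.start, cue.end))
    return out
```

Under settings C and D each synthesized segment inherits the display interval of its source subtitle; under settings A and B the content's own timing information is kept, as in step S610.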

In this way, all of the audio to be reproduced is generated.
Next, in step S610, the playback processing unit 4 reproduces the video included in the multimedia content data and outputs it to the display unit 5. Based on the timing information included in the multimedia content data (for setting A or setting B) or on the audio timing set in step S605 or S609 (for setting C or setting D), the playback processing unit 4 reproduces the generated audio and outputs it to the speaker 71.

As described above, according to the present embodiment, when the multimedia content data does not include audio, audio is generated from the subtitles and reproduced, which is particularly convenient for people with visual impairments. Further, according to the present embodiment, when the multimedia content data includes only English audio or only English subtitles, Japanese audio is generated from the English audio or English subtitles and reproduced, which is particularly convenient for users whose native language is Japanese.

(Modifications)
The present invention is not limited to the above embodiments and includes, for example, the following modifications.

(1) Configuration for outputting both audio and subtitles
In the first and second embodiments, the video and subtitles are displayed and the audio is not reproduced; however, the audio may also be reproduced. In the third embodiment, the video and audio are reproduced and subtitles are not displayed; however, subtitles may also be displayed.

(2) Language of the subtitles to be displayed and language of the audio to be reproduced
In the first to third embodiments, the language of the subtitles to be displayed and the language of the audio to be reproduced are set based on the language configured in the smartphone's language setting.

Here, the smartphone's language setting may be made by recognizing the user's voice input to the microphone.

That is, the subtitle setting unit or the audio setting unit may set the smartphone's language by identifying the language spoken by the user, either during a voice call or at other times. For example, when the user says “This is a pen”, the subtitle setting unit or the audio setting unit may set the smartphone's language to English.

Alternatively, after the user selects the smartphone's language setting menu, the user may set the smartphone's language directly by voice. For example, when the user says “English” in English or in another language, the subtitle setting unit or the audio setting unit may set the smartphone's language to English.
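The two voice-based ways of setting the language described above can be sketched as follows. Both functions and the `SPOKEN_LANGUAGE_NAMES` table are hypothetical; `identify` stands in for an unspecified spoken-language identification engine that returns a language code and a confidence, and the 0.8 confidence threshold is an arbitrary illustrative value that the source does not specify.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical table of language names as they might be spoken, in several languages.
SPOKEN_LANGUAGE_NAMES: Dict[str, str] = {
    "english": "en",
    "英語": "en",
    "japanese": "ja",
    "日本語": "ja",
}


def language_from_menu_word(transcript: str) -> Optional[str]:
    """Menu mode: map a recognized word such as "English" to a language code."""
    return SPOKEN_LANGUAGE_NAMES.get(transcript.strip().lower())


def language_from_free_speech(audio: Any,
                              identify: Callable[[Any], Tuple[str, float]],
                              current: str,
                              min_confidence: float = 0.8) -> str:
    """Free-speech mode: adopt the identified language only if confident enough."""
    code, confidence = identify(audio)
    return code if confidence >= min_confidence else current
```

Keeping the current language when the identifier is unsure avoids switching the device language on a short or ambiguous utterance.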

(3) Subtitles for sound effects
In the first and second embodiments, when subtitles are generated from audio that includes sound effects, subtitles containing onomatopoeia representing the sound effects may be displayed.

(4) Subtitles for music
In the first and second embodiments, when subtitles are generated from audio that includes music, subtitles containing the title of the music, a description of the music, or its lyrics may be displayed.

(5) Changing settings when translation is incomplete
In the second embodiment, when the translation is incomplete, an icon indicating that the translation is incomplete is displayed (steps S411 and S412 in FIG. 9); thereafter, the user may be allowed to change the language of the displayed subtitles.

Specifically, an icon for stopping subtitle translation may be displayed, for example at the upper edge of the display unit 5, and when the user selects the icon, the subtitles included in the multimedia content data, or the subtitles obtained by speech recognition, may be displayed without translation.

(6) Streaming
In the first embodiment, all of the subtitles are generated after all of the multimedia content data has been received, and the video is then reproduced and the subtitles displayed; however, the present invention is not limited to this.

Subtitle generation may be started before reception of all of the multimedia content data has finished. When subtitles must be translated during generation, the translation is completed without leaving any unfinished portions. Video playback may be started only after enough subtitles have been generated to ensure that subtitle generation does not fall behind the display.
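How long to wait before starting playback can be estimated under a simplifying assumption: subtitle generation proceeds at a constant rate r (seconds of content processed per second of wall time) and the data it needs has already been received. If generation over content of length T ends at T/r and playback started after a delay d ends at d + T, then with r < 1 the generated lead shrinks steadily, so d = max(0, T/r − T) is the smallest delay that keeps generation ahead. The sketch below computes this and checks it by brute force; both function names are hypothetical and the constant rate is an assumption, not part of the source.

```python
def min_startup_delay(duration_s: float, gen_rate: float) -> float:
    """Smallest playback delay so constant-rate generation never falls behind."""
    if gen_rate <= 0:
        raise ValueError("gen_rate must be positive")
    # Generation ends at duration_s / gen_rate; playback ends at delay + duration_s.
    # With a constant rate below 1 the lead is smallest at the end of playback.
    return max(0.0, duration_s / gen_rate - duration_s)


def never_falls_behind(duration_s: float, gen_rate: float,
                       delay_s: float, step_s: float = 0.5) -> bool:
    """Brute-force check of the same condition on a grid of wall-clock times."""
    t = 0.0
    while t <= delay_s + duration_s:
        generated = min(duration_s, gen_rate * t)          # content with subtitles
        played = min(duration_s, max(0.0, t - delay_s))    # playhead position
        if generated + 1e-9 < played:
            return False
        t += step_s
    return True
```

For 600 s of content and r = 0.8 this gives a 150 s delay. Real generation rates vary, so a practical implementation would more likely monitor the lead of generated subtitles over the playhead instead of relying on a fixed delay.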

(7) Content including subtitles in multiple languages
In the first and second embodiments, the subtitles included in the multimedia content data are in only one language; however, the present invention is also applicable when subtitles in multiple languages are included.

For example, when none of the multi-language subtitles included in the multimedia content data is in the same language as the subtitles to be displayed, one of them may be selected and translated into the display language. In this case, which subtitles to select may be based on a preset priority. For example, translation between languages with similar grammatical structures may be given a high priority.

Alternatively, when none of the multi-language subtitles included in the multimedia content data is in the same language as the subtitles to be displayed, the subtitles in a standard language (for example, English) among those languages may be displayed.
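Modification (7) describes two fallbacks when no subtitle track matches the display language: pick a track by a preset priority and translate it, or display a standard language such as English as it is. The sketch below combines both behind a `mode` switch. `choose_track`, the `PRIORITY` table, and the final fallback to the first available track are illustrative assumptions; Korean is listed first for Japanese only because both languages use subject-object-verb word order.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical priority table: source languages to try, in order, per display language.
PRIORITY: Dict[str, List[str]] = {"ja": ["ko", "en"]}


def choose_track(available: Sequence[str], target: str,
                 priority: Dict[str, List[str]], mode: str = "translate",
                 standard: str = "en") -> Tuple[str, bool]:
    """Return (subtitle language to use, whether it must be translated)."""
    if not available:
        raise ValueError("no subtitle tracks")
    if target in available:
        return target, False
    if mode == "standard" and standard in available:
        # Alternative: display the standard-language subtitles untranslated.
        return standard, False
    for lang in priority.get(target, []):
        if lang in available:
            return lang, True
    # Assumed fallback when no prioritized language is present.
    return available[0], True
```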

The embodiments disclosed herein should be considered illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

1, 51 smartphone; 2 wireless communication unit; 3 data storage unit; 4 playback processing unit; 5 display unit; 6 user input unit; 7 subtitle generation unit; 8, 53 speech recognition unit; 9, 59 translation unit; 10 subtitle setting unit; 11 subtitle timing setting unit; 55 audio setting unit; 56 audio timing setting unit; 57 audio generation unit; 58 speech synthesis unit; 71 speaker.

Claims

Electronic equipment,
A setting unit for setting the language set for the electronic device as a language for subtitles to be displayed;
When the language of the subtitles included in the content is different from the set language, a subtitle generation unit that translates the subtitles included in the content into subtitles of the set language;
An electronic device comprising: a reproduction processing unit that reproduces video included in the content and displays the translated subtitles.

Electronic equipment,
A setting unit for setting the language set for the electronic device as a language for subtitles to be displayed;
A subtitle generating unit that generates subtitles in the set language from audio included in the content when no subtitles are included in the content;
An electronic apparatus comprising: a reproduction processing unit that reproduces the video included in the content and displays the generated subtitles.

The subtitle generation unit generates subtitles of the set language by a recognition process of audio included in the content when the language of the audio included in the content is the same as the set language. The electronic device described.

The electronic device according to claim 2, wherein, when the language of the audio included in the content is different from the set language, the subtitle generation unit generates subtitles by a recognition process of the audio included in the content and further translates the generated subtitles into subtitles of the set language.

5. The electronic device according to claim 3, wherein the subtitle generation unit sets the display start time and display end time of the subtitle to be the same as the reproduction start time and reproduction end time of the audio of the caption generation source. 6.

The caption generation unit performs the translation as much as possible within a certain time,
5. The electronic device according to claim 1, wherein when there is an incomplete translation, the reproduction processing unit notifies that the caption is displayed with the translation incomplete.

The electronic device according to claim 1, wherein the caption generation unit executes the translation only when a user desires the translation.

The electronic device according to claim 1, wherein the setting unit sets a language of the electronic device based on a sound produced by a user.

A setting section for setting the language of the audio to be played;
If the language of the audio included in the content is different from the set language, an audio generation unit that generates the audio of the set language from the audio included in the content;
An electronic apparatus comprising: a reproduction processing unit that reproduces the video included in the content and reproduces the generated audio.

A setting section for setting the language of the audio to be played;
An audio generation unit that generates audio in the set language from subtitles included in the content when audio is not included in the content;
An electronic apparatus comprising: a reproduction processing unit that reproduces the video included in the content and reproduces the generated audio.