JP2017212667A

JP2017212667A - Language information providing device

Info

Publication number: JP2017212667A
Application number: JP2016105957A
Authority: JP
Inventors: 俊輔土尻; Shunsuke Dojiri; 択磨松村; Takuma Matsumura; 康博小桐; Yasuhiro Ogiri
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2016-05-27
Filing date: 2016-05-27
Publication date: 2017-11-30

Abstract

PROBLEM TO BE SOLVED: To provide a language information providing device capable of providing language information whose content is easy to grasp. SOLUTION: A device 10 comprises: a character number calculation part 13 which calculates the upper limit of the number of characters of language information to be output, on the basis of the output period of the language information; a language information generation part 15 which generates the language information to be output; and an original text generation part 14 which, in accordance with the upper limit of the number of characters calculated by the character number calculation part 13, restricts the number of characters of the language information generated by the language information generation part 15. SELECTED DRAWING: Figure 1

Description

The present invention relates to a language information providing apparatus.

For example, Patent Document 1 discloses a technique for playing back, together with a video, subtitles that translate the audio or other content of that video.

Patent Document 1: JP 2009-16910 A

Non-Patent Document 1: Taku Kudo and Yuji Matsumoto, "Japanese Dependency Analysis by Chunking Stage Application," Transactions of Information Processing Society of Japan, Vol. 43, No. 6, June 2002, pp. 1834-1842

It is known that the maximum number of subtitle characters that can be displayed per unit time, given the reading ability of a video viewer (user), is more or less fixed (hereinafter, "the number of displayable characters"). For a movie, for example, it is about 4 characters per second. The same applies not only to subtitles but also to scenes in which audio such as dubbing is combined with video. In that case, the maximum number of characters of dubbed audio that can be output per unit time may correspond to the number of displayable characters described above.

When subtitles are displayed on a video such as a movie, the subtitle for each scene must finish being displayed within the period of that scene. The number of subtitle characters displayed per unit time within that period is preferably no more than the number of displayable characters. However, when translated subtitles (hereinafter sometimes simply "translated sentences") are created and displayed in real time as the video plays, the number of characters in a translated sentence can exceed the number of displayable characters. The same can happen when dubbed audio is created and output. If a translated sentence exceeding the number of displayable characters is displayed, or dubbed audio exceeding it is output, the content of the language information represented by the subtitles or dubbed audio becomes difficult to grasp. The same is true of language information such as text created from audio by transcription (a transcript sentence).

The present invention has been made in view of the above problem, and its object is to provide a language information providing apparatus capable of providing language information whose content is easy to grasp.

A language information providing apparatus according to one aspect of the present invention comprises: calculation means for calculating an upper limit on the number of characters of language information to be output, based on the output time of the language information; creation means for creating the language information to be output; and limiting means for limiting the number of characters of the language information created by the creation means, in accordance with the upper limit calculated by the calculation means.

In this language information providing apparatus, an upper limit on the number of characters of the language information is calculated, and the number of characters of the created language information is limited according to that upper limit. Limiting the number of characters in this way makes the content of the output language information easier to grasp.

The calculation means may use the display time of a subtitle in a video as the output time of the language information and calculate the upper limit number of characters of the displayed subtitle as the upper limit for the language information; the creation means may create, as the language information, a translated sentence used as the subtitle; and the limiting means may limit the number of characters of the translated sentence created by the creation means. In this case, the upper limit number of characters of the displayed subtitle is calculated, and the number of characters of the created translated sentence is limited according to it. Limiting the number of characters of the translated sentence in this way makes it easier to read.

The limiting means may relax the limit when a predetermined phrase or word is included in the language information. For example, phrases or words that the user can grasp quickly, because the user is used to reading or hearing them, can be set as the predetermined phrases or words. Then, even if such a phrase or word is included, the limit is relaxed, and the number of characters increases, the language information can be kept from becoming harder to grasp.

When no predetermined phrase or word is included in the language information, the limiting means may limit the number of characters by keeping it from exceeding the upper limit; when a predetermined phrase or word is included, it may relax the limit by allowing the number of characters to exceed the upper limit. This is one way in which the number of characters can be limited and the limit relaxed.

The limiting means may create an original text that serves as the basis of the language information, the creation means may create the language information to be output from the original text created by the limiting means, and the limiting means may perform the above limiting by shortening the original text. Shortening the original text in this way is one way to limit the number of characters of the language information.

According to the present invention, language information whose content is easy to grasp can be provided.

A diagram showing the schematic configuration of the apparatus according to the embodiment. A diagram showing an example of the hardware configuration of the apparatus. A flowchart showing an example of processing executed in the apparatus. A flowchart showing an example of processing executed in the apparatus. A diagram showing an example of a translated sentence with a limited number of characters, displayed as a subtitle. A diagram showing the schematic configuration of an apparatus according to a modification.

Embodiments of the present invention are described below with reference to the drawings. In the description of the drawings, identical elements are given identical reference numerals, and redundant description is omitted.

FIG. 1 is a diagram showing the schematic configuration of a language information providing apparatus according to the embodiment (hereinafter sometimes simply "the apparatus"). The apparatus 10 shown in FIG. 1 may be, for example, an apparatus (translation apparatus) that takes a video containing moving images and audio and outputs, as language information and together with the video, subtitles that translate the audio. In that case, the apparatus 10 creates the translated sentence while the video is played back and outputs it together with the video. Instead of subtitles, audio corresponding to the translated sentence (dubbed audio) may be created as the language information and output with the video. Alternatively, without any translation, a transcript sentence transcribed for hearing-impaired people at a lecture or similar event may be created and output as the language information. For transcript sentences, the video need not be output. The apparatus 10 can be implemented in various electronic devices such as televisions, electronic bulletin boards, digital signage, PCs, smartphones, and robots capable of communicating with users.

As shown in FIG. 1, the apparatus 10 includes an input unit 11, a speech recognition processing unit 12, a character count calculation unit 13, an original text creation unit 14, a language information creation unit 15, an automatic summarization unit 16, a storage unit 17, and an output unit 18.

The input unit 11 inputs audio and video. For example, audio data and video data are acquired by a camera, a microphone, or the like that may constitute the input unit 11. The input unit 11 may also acquire the audio data and video data from a content server that distributes video, via a communication network such as the Internet. In addition, the input unit 11 accepts a user operation specifying which of subtitles, dubbed audio, and transcript sentences should be output as the language information. For example, an instruction to output subtitles or dubbed audio is input through the input unit 11. Alternatively, an instruction to execute the processing for outputting a transcript sentence (transcription processing) is input through the input unit 11.

The speech recognition processing unit 12 recognizes the audio input by the input unit 11. Recognizing audio here means identifying the meaning and content of the input audio as a whole, or of each word it contains. Any of various known speech recognition engines capable of speech recognition processing may be used. In that case, engines with different types of acoustic models and language models may be used depending on the attributes of the video.

The character count calculation unit 13 (calculation means) calculates the upper limit number of characters of the language information to be output, such as the subtitles, dubbed audio, and transcript sentences described above. For example, the character count calculation unit 13 calculates, as the upper limit, the maximum allowable number of characters of the subtitles displayed or the dubbed audio output in the video. The maximum allowable number of characters for subtitles or dubbed audio is the upper limit on the number of characters of the translated sentence to be displayed in one scene of the video, taking into account the reading ability of the viewer (in this example, the user of the apparatus 10). One scene of a video is assumed to be, for example, a scene in which a person in the video speaks one line. The character count calculation unit 13 likewise calculates, as the upper limit, the maximum allowable number of characters of a transcript sentence displayed at a lecture or similar event. The maximum allowable number of characters for a transcript sentence is the upper limit on the number of characters of the transcript sentence to be displayed in one scene of the lecture, taking into account the reading ability of its viewers. One scene of a lecture is assumed to be, for example, a scene in which the person giving the speech delivers one line.

An example of calculating the maximum allowable number of characters follows. When the maximum allowable number of characters is the upper limit for subtitles or dubbed audio, the character count calculation unit 13 calculates it, for example, by multiplying the subtitle display time or dubbed audio output time in one scene of the video by the number of displayable characters described above. To do so, the character count calculation unit 13 obtains the subtitle display time or dubbed audio output time in the video input by the input unit 11. The time may be obtained, for example, from data specifying it that is prepared in advance and input through the input unit 11 together with the audio data and video data. When no such data exists, the character count calculation unit 13 may measure the time by analyzing the video data. For example, it measures how long a line is spoken aloud in one scene of the video and uses the measured time as the subtitle display time or dubbed audio output time.
When the maximum allowable number of characters is the upper limit for a transcript sentence, the character count calculation unit 13 calculates it, for example, by multiplying the display time of the transcript sentence in one scene of a lecture or similar event by the number of displayable characters described above. The display time of the transcript sentence may be obtained by the character count calculation unit 13 in the same way as the subtitle display time.
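
The multiplication described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `max_allowed_chars`, the constant name, and the choice to round down to a whole number of characters are assumptions that do not appear in the source; the rate of 4 characters per second is the movie example given earlier in the text.

```python
import math

# Example rate from the text ("about 4 characters per second" for movies).
DISPLAYABLE_CHARS_PER_SECOND = 4.0


def max_allowed_chars(output_seconds: float,
                      chars_per_second: float = DISPLAYABLE_CHARS_PER_SECOND) -> int:
    """Return the maximum allowable number of characters for one scene.

    output_seconds is the subtitle display time, dubbed audio output time,
    or transcript display time of the scene. Rounding down is an assumption;
    the source only says the two values are multiplied.
    """
    if output_seconds < 0 or chars_per_second < 0:
        raise ValueError("time and rate must be non-negative")
    return math.floor(output_seconds * chars_per_second)


if __name__ == "__main__":
    # A line spoken for 2.5 seconds allows 2.5 * 4 = 10 characters.
    print(max_allowed_chars(2.5))
```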

The original text creation unit 14 creates the original text on which the language information created by the language information creation unit 15 (described later) is based. The original text it creates first is the text corresponding to the audio input by the input unit 11. For example, when the audio input together with the video by the input unit 11 is supplied as character data (text data), the original text is created by using that character data as is. When no such character data exists, the original text creation unit 14 may instead create, as the original text, character data corresponding to the audio input by the input unit 11 and recognized by the speech recognition processing unit 12.

The language information creation unit 15 (creation means) creates the language information to be output from the original text created by the original text creation unit 14. For example, when the language information is the subtitles or dubbed audio described above, the language information creation unit 15 creates a translation of the original text. Any of various known machine translation techniques may be used. The translated sentence created by the language information creation unit 15 can be used as a subtitle displayed on the video. The language information creation unit 15 also creates audio corresponding to the translated sentence (synthesized audio output together with the video). The synthesized audio may be created automatically from the translated sentence, or by having a person (reader) read the translated sentence aloud. When it is created automatically, any of various known methods for converting character data into audio data may be used. When a reader creates it, the translated sentence may be displayed, for example, by the output unit 18 described later, so that the reader can read it aloud.
The audio corresponding to the translated sentence can be used as dubbed audio output together with the video. When the language information is the transcript sentence described above, the language information creation unit 15 creates the transcript sentence by using the original text created by the original text creation unit 14 as is.

The automatic summarization unit 16 creates a summary of the original text created by the original text creation unit 14. The summary can be created by shortening the original text step by step. The summarization method is not particularly limited; as one example, the method described in Non-Patent Document 1 can be used.

The original text creation unit 14, the language information creation unit 15, and the automatic summarization unit 16 described above also serve as the part (limiting means) that limits the number of characters of the language information according to the maximum allowable number of characters calculated by the character count calculation unit 13. An example of the limiting method follows. First, the language information creation unit 15 translates the original text created by the original text creation unit 14. The original text creation unit 14 counts the characters of the language information created by the language information creation unit 15 and determines whether the count exceeds the maximum allowable number of characters calculated by the character count calculation unit 13. If the translated sentence exceeds the maximum allowable number of characters, the original text creation unit 14 instructs the automatic summarization unit 16 to summarize the original text, and adopts the resulting summary as a new original text.
Because the new original text is a summary of the first original text, it is shorter than the first one. The language information creation unit 15 creates language information from the new original text, and the result has fewer characters than the language information created first. Reducing the number of characters in this way limits the number of characters of the language information. Since the automatic summarization unit 16 can summarize step by step, as noted above, repeating this series of processes shortens the language information step by step as well. The series of processes is repeated until the number of characters of the language information is no more than the maximum allowable number of characters, which limits the language information to that number.
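
The repeated create, count, and summarize cycle can be sketched as a small loop. This is a minimal sketch under stated assumptions, not the patent's implementation: `limit_language_info`, `drop_last_word`, the `max_rounds` cap, and the early exit when the summarizer can no longer shorten the text are all hypothetical additions, and `str.upper` merely stands in for a real machine translation step.

```python
from typing import Callable


def limit_language_info(original: str,
                        max_chars: int,
                        create: Callable[[str], str],
                        summarize: Callable[[str], str],
                        max_rounds: int = 10) -> str:
    """Shorten the original text step by step until the created
    language information fits within max_chars.

    create    : stand-in for the language information creation step
                (for example, translation).
    summarize : stand-in for one step of automatic summarization.
    max_rounds and the "no further shortening" exit are safety
    assumptions that the source does not describe.
    """
    text = create(original)
    for _ in range(max_rounds):
        if len(text) <= max_chars:
            break
        shorter = summarize(original)
        if len(shorter) >= len(original):
            break  # the summarizer cannot shorten the text any further
        original = shorter          # the summary becomes the new original text
        text = create(original)     # create language information again
    return text


def drop_last_word(text: str) -> str:
    """Toy summarizer: removes the final word (illustration only)."""
    return " ".join(text.split()[:-1])


if __name__ == "__main__":
    # str.upper is a placeholder for translation.
    print(limit_language_info("the quick brown fox jumps", 15,
                              str.upper, drop_last_word))
```

Passing the creation and summarization steps in as functions keeps the loop independent of any particular translation or summarization engine.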

Furthermore, in this embodiment, when a predetermined phrase or word is included in the language information created by the language information creation unit 15, the limit on the number of characters described above is relaxed. The predetermined phrases and words are ones whose content the user can grasp in a shorter time than other phrases and words, because the user is used to reading or hearing them. An example of such a phrase is an everyday set expression such as "good morning". An example of such a word is a commonly used proper noun such as "New Zealand".

The original text creation unit 14 determines whether a predetermined phrase or word is included in the language information. If none is included, the original text creation unit 14 works with the language information creation unit 15 and the automatic summarization unit 16, as described above, to limit the number of characters of the language information so that it does not exceed the maximum allowable number of characters. If a predetermined phrase or word is included, the original text creation unit 14 relaxes the limit by allowing the number of characters of the translated sentence to exceed the maximum allowable number of characters. For example, the original text creation unit 14 subtracts the total number of characters of the predetermined phrases or words contained in the language information from the number of characters of the language information, and treats the result as the new character count of the language information.
It then compares this new character count with the maximum allowable number of characters and, working with the language information creation unit 15 and the automatic summarization unit 16 as described above, limits the language information so that the new character count does not exceed the upper limit.
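
The relaxed character count can be illustrated as follows. This is a sketch only: the function names `effective_length` and `exceeds_limit` are hypothetical, and the handling of overlapping entries (matching longer phrases first and masking each match with a NUL placeholder, which assumes the text itself contains no NUL character) is an assumption the source does not address.

```python
from typing import Iterable


def effective_length(text: str, known_phrases: Iterable[str]) -> int:
    """Character count of text after subtracting every occurrence of the
    predetermined phrases or words.

    Longer phrases are matched first and each match is masked so that a
    character is never subtracted twice (an assumption, see lead-in).
    """
    total = len(text)
    remaining = text
    for phrase in sorted(known_phrases, key=len, reverse=True):
        if not phrase:
            continue
        occurrences = remaining.count(phrase)
        total -= occurrences * len(phrase)
        remaining = remaining.replace(phrase, "\0")  # mask counted matches
    return total


def exceeds_limit(text: str, known_phrases: Iterable[str], max_chars: int) -> bool:
    """True if the relaxed character count is still above the upper limit."""
    return effective_length(text, known_phrases) > max_chars


if __name__ == "__main__":
    phrases = ["Good morning", "New Zealand"]
    sentence = "Good morning, New Zealand!"
    # 26 characters in total, 23 of which belong to familiar phrases.
    print(len(sentence), effective_length(sentence, phrases))
```

With an upper limit of 10 characters, the sentence above would be shortened under the strict count but accepted as is under the relaxed count.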

The storage unit 17 stores the information required for the various processes executed in the apparatus 10. In particular, it stores the number of displayable characters and the predetermined phrases and words described above.

The output unit 18 outputs the language information created by the language information creation unit 15. When the language information is a translation of the original text, the output unit 18 outputs the translation as a subtitle together with the video input by the input unit 11. When the language information is dubbed audio, the output unit 18 outputs the dubbed audio together with the video. When the language information is a transcript sentence, the output unit 18 displays the original text.

When the language information is a transcript sentence, it suffices to output at least the transcript sentence; outputting the video is not essential. An example of the subtitles and video output by the output unit 18 is described later with reference to FIG. 4.

FIG. 2 is a diagram showing an example of the hardware configuration of the apparatus. As shown in FIG. 2, the apparatus 10 can be physically configured as a computer that includes one or more CPUs (Central Processing Units) 101, a RAM (Random Access Memory) 102 and a ROM (Read Only Memory) 103 serving as main storage devices, a communication module 104 serving as a data transmission and reception device, an auxiliary storage device 105 such as a semiconductor memory, an input device 106 that accepts user operations, and an output device 107 such as a display. Each function of the apparatus 10 described above with reference to FIG. 1 can be realized, for example, by loading one or more predetermined computer programs onto hardware such as the CPU 101 and the RAM 102, thereby operating the communication module 104, the input device 106, the output device 107, and so on under the control of the CPU 101, and by reading and writing data in the RAM 102 and the auxiliary storage device 105.

FIG. 3 is a flowchart showing an example of the processing executed in the apparatus. The processing of this flowchart is executed, for example, for one scene of a video being played back or one scene of a lecture. When the processing is executed for a scene of a video, processing that involves translation is executed (steps S4 to S9, described later). When the processing is executed for a scene of a lecture or the like, transcription processing without translation is executed (steps S10 to S13, described later). As explained above, which branch is taken depends on the instruction given by the user operation received by the input unit 11. The input unit 11 is assumed to have received the user operation before the flowchart of FIG. 3 starts.

In step S1, the apparatus 10 acquires the output time of the language information. Specifically, as described above, the character count calculation unit 13 acquires, based on the audio or other data input via the input unit 11, the display time of the subtitle, the output time of the dubbed audio, or the display time of the transcript text as the output time of the language information.

In step S2, the apparatus 10 calculates the maximum allowable number of characters from the output time. Specifically, the character count calculation unit 13 multiplies the subtitle display time (seconds), the dubbed audio output time (seconds), or the transcript text output time (seconds) acquired in step S1 by the number of displayable characters stored in the storage unit 17 (for example, 4 characters per second). For example, an output time of 5 seconds at 4 characters per second gives a maximum allowable number of 20 characters.
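The step S2 calculation can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the default rate argument, and the choice to round down to a whole character are assumptions, since the description does not say how fractional results are handled.

```python
import math


def max_allowed_chars(output_seconds: float, chars_per_second: float = 4.0) -> int:
    """Return the upper limit on characters for one piece of language information.

    output_seconds: display or playback time obtained in step S1.
    chars_per_second: displayable characters per unit time (4 is the
    example rate given in the description).
    """
    if output_seconds < 0 or chars_per_second < 0:
        raise ValueError("time and rate must be non-negative")
    # Assumption: fractions of a character are rounded down.
    return math.floor(output_seconds * chars_per_second)


limit = max_allowed_chars(5.0)  # 5 seconds at 4 characters per second
```

Rounding down keeps the limit conservative; an implementation could equally round to the nearest character.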

In step S3, the apparatus 10 determines whether to perform translation or transcription. For example, when an instruction to output subtitles or dubbed audio has been input via the input unit 11 by a user operation on the apparatus 10, translation processing is required, so the apparatus determines that translation should be performed. On the other hand, when an instruction to execute transcription processing has been input via the input unit 11 by a user operation, the apparatus determines that transcription should be performed. If it determines that translation should be performed, the apparatus 10 proceeds to steps S4 to S9. If it determines that transcription should be performed, the apparatus 10 proceeds to steps S10 to S13 (FIG. 4, described later).

In step S4, the apparatus 10 executes machine translation. Specifically, the language information creation unit 15 creates a translation of the original text created by the original text creation unit 14, as described above. Step S4 is executed repeatedly whenever step S8 (described later) has been executed. In the first execution of step S4, the original text on which the translation is based is the original text first created by the original text creation unit 14, as described above. In the second and subsequent executions of step S4, the original text shortened in step S8 is used as the new original text.

In step S5, the apparatus 10 determines whether the translation contains an exception phrase or word. Specifically, the original text creation unit 14 compares the translation created in step S4 with the phrases or words stored in the storage unit 17 and determines whether any stored phrase or word appears in the translation. If the translation contains an exception phrase or word (step S5: YES), the apparatus 10 proceeds to step S6. Otherwise (step S5: NO), the apparatus 10 skips step S6 and proceeds to step S7.

In step S6, the apparatus 10 subtracts the exception phrase or word from the character count of the translation. Specifically, the original text creation unit 14 subtracts the number of characters of the phrase or word found in the translation in step S5 from the number of characters of the translation created in step S4, and uses the result as the new character count of the translation. With the character count recalculated in this way, step S7 (described later) can return YES even if the count before recalculation exceeded the maximum allowable number of characters. As a result, in step S9 (described later), a translation longer than the maximum allowable number of characters can be output together with the video as a subtitle, or audio corresponding to that translation can be output as dubbed audio.

In step S7, the apparatus 10 determines whether the character count of the translation is less than or equal to the maximum allowable number of characters. Specifically, the original text creation unit 14 determines whether the character count of the translation created in step S4, or the new character count calculated in step S6, is less than or equal to the maximum allowable number of characters calculated in step S2. If it is (step S7: YES), the apparatus 10 proceeds to step S9. Otherwise (step S7: NO), the apparatus 10 proceeds to step S8.
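Steps S5 to S7 can be illustrated with the sketch below. The function names are hypothetical, and two details are assumptions the description leaves open: every non-overlapping occurrence of a stored phrase is subtracted, and the adjusted count is never allowed to go below zero.

```python
from typing import Iterable


def counted_length(text: str, exception_phrases: Iterable[str]) -> int:
    """Character count of text after subtracting exception phrases (steps S5, S6)."""
    discount = 0
    for phrase in exception_phrases:
        if phrase:  # ignore empty entries
            # str.count returns the number of non-overlapping occurrences.
            discount += text.count(phrase) * len(phrase)
    return max(len(text) - discount, 0)


def within_limit(text: str, limit: int, exception_phrases: Iterable[str]) -> bool:
    """Step S7: compare the (possibly adjusted) count with the limit."""
    return counted_length(text, exception_phrases) <= limit


# A 19-character sentence passes a 12-character limit only when the
# familiar phrase "thank you" (9 characters) is registered as an exception.
ok_with_exception = within_limit("thank you very much", 12, ["thank you"])
ok_without = within_limit("thank you very much", 12, [])
```

The text itself is unchanged by this check; only the count used for the comparison is reduced, which is why a string longer than the limit can still be output.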

In step S8, the apparatus 10 shortens the sentence in stages. Specifically, the automatic summarization unit 16 summarizes the original text that was translated in the preceding step S4. The original text creation unit 14 adopts the summary produced by the automatic summarization unit 16 as the new original text. After step S8 is completed, the apparatus 10 returns to step S4, where, as described above, a translation of the original text is created. As shown in FIG. 3, steps S4 to S8 can be executed repeatedly. Through this repetition, the translation is shortened in stages together with the original text.
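The loop formed by steps S4 to S8 can be sketched as follows. Here `translate` and `summarize` are hypothetical stand-ins for the machine translation and automatic summarization components, passed in as plain callables. The `max_rounds` cap and the early exit when the summarizer stops shortening the text are assumptions added so that the sketch always terminates; the description does not specify a termination rule.

```python
from typing import Callable, Iterable


def make_subtitle(
    source: str,
    limit: int,
    translate: Callable[[str], str],
    summarize: Callable[[str], str],
    exception_phrases: Iterable[str] = (),
    max_rounds: int = 10,
) -> str:
    """Translate source, shortening it in stages until the translation fits."""
    phrases = [p for p in exception_phrases if p]
    translated = translate(source)  # step S4 (first pass)
    for _ in range(max_rounds):
        # Steps S5, S6: subtract exception phrases from the count.
        discount = sum(translated.count(p) * len(p) for p in phrases)
        if len(translated) - discount <= limit:  # step S7
            break
        shorter = summarize(source)  # step S8
        if len(shorter) >= len(source):
            break  # assumption: give up if no further shortening is possible
        source = shorter
        translated = translate(source)  # step S4 again, on the shorter text
    return translated  # handed to the output stage (step S9)


# Toy stand-ins: "translation" is upper-casing, "summary" drops the last word.
toy_translate = str.upper
toy_summarize = lambda s: " ".join(s.split()[:-1])
result = make_subtitle(
    "we had a really great time last weekend", 20, toy_translate, toy_summarize
)
```

Note that the source text, not the translation, is what gets shortened, matching the description that the translation becomes shorter together with the original text.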

In step S9, the apparatus 10 performs subtitle display, speech synthesis, or transcript display. Specifically, the output unit 18 either displays the translation created in step S4 together with the video as a subtitle, or outputs audio corresponding to the translation together with the video as dubbed audio. Alternatively, the output unit 18 outputs, as transcript text, the original text created by the original text creation unit 14 or the original text newly created in step S13 (described later). After step S9 is completed, the apparatus 10 ends the processing of the flowchart.

Next, the processing executed when step S3 determines that transcription should be performed, and the flow proceeds to step S10 shown in FIG. 4, is described.

In step S10, the apparatus 10 determines whether the sentence contains an exception phrase or word. Specifically, the original text creation unit 14 compares the original text it created as described above, which may become the transcript text, with the phrases or words stored in the storage unit 17, and determines whether any stored phrase or word appears in the original text. If the original text contains an exception phrase or word (step S10: YES), the apparatus 10 proceeds to step S11. Otherwise (step S10: NO), the apparatus 10 skips step S11 and proceeds to step S12.

In step S11, the apparatus 10 subtracts the exception phrase or word from the character count. Specifically, the original text creation unit 14 subtracts the number of characters of the phrase or word found in the original text in step S10 from the number of characters of the original text, and uses the result as the new character count of the original text. With the character count recalculated in this way, step S12 (described later) can return YES even if the count before recalculation exceeded the maximum allowable number of characters, and in the subsequent step S9 an original text longer than the maximum allowable number of characters can be output as transcript text.

In step S12, the apparatus 10 determines whether the character count is less than or equal to the maximum allowable number of characters. Specifically, the original text creation unit 14 determines whether the character count of the original text, or the new character count calculated in step S11, is less than or equal to the maximum allowable number of characters calculated in step S2. If it is (step S12: YES), the apparatus 10 proceeds to step S9 (FIG. 3) described above. Otherwise (step S12: NO), the apparatus 10 proceeds to step S13.

In step S13, the apparatus 10 shortens the sentence in stages. This processing is the same as that of step S8, so its description is omitted here. After step S13 is completed, the apparatus 10 returns to step S10.

According to the apparatus 10 described above, the upper limit on the number of characters of the language information is calculated (step S2), and the number of characters of the language information to be created is limited according to the calculated upper limit (steps S4 to S8 and S10 to S13). Limiting the number of characters in this way makes the content of the output language information (step S9) easier to grasp.

For example, the upper limit on the number of characters of the subtitle displayed on the video is calculated (step S2), and the number of characters of the translation to be created is limited according to the calculated upper limit (steps S4 to S8). Because the character count is limited in this way, the translation is easier to read when it is displayed together with the video as a subtitle (step S9).

In addition, when a predetermined phrase or word is included in the language information (step S5: YES, step S10: YES), the above limit is relaxed (steps S6 and S11). Here, the predetermined phrases or words are ones that the user can grasp in a short time because the user is accustomed to reading or hearing them. Therefore, even if such a phrase or word is included in the language information, the limit is relaxed, and the language information is output with more characters (step S9), the language information is kept from becoming hard to grasp.

When no predetermined phrase or word is included in the language information (step S5: NO, step S10: NO), the number of characters of the language information is limited by keeping it from exceeding the upper limit (step S7: NO, step S8, step S12: NO, step S13). On the other hand, when a predetermined phrase or word is included in the language information (step S5: YES, step S10: YES), the limit is relaxed by allowing the number of characters to exceed the upper limit (step S6, step S7: YES, step S11, step S12: YES). In this way, the number of characters of the language information can be both limited and, where appropriate, allowed to exceed the limit.

The limit is applied by shortening the original text on which the language information is based (steps S8 and S13). Shortening the original text in this way limits the number of characters of the language information.

FIG. 5 is a diagram showing an example of a translation with a limited number of characters displayed as a subtitle. The translation shown as a subtitle on the screen D1 on the left side of FIG. 5 is a comparative example, and the translation shown on the screen D2 on the right side of FIG. 5 is a subtitle whose number of characters has been limited by the apparatus 10. The screens D1 and D2 are screens that can be output by the output unit 18; specifically, they are displayed on the output device 107, such as the display described above with reference to FIG. 2.

First, the screen D1, the comparative example, is described. The screen D1 displays a translation whose number of characters has not been limited. In this example, in a scene where a person in the video speaks the original line 「先週末はお越しいただいて本当に楽しかったです。また近いうちにお尋ねください。」 (roughly, "It was truly a pleasure to have you over last weekend. Please visit again soon.") in the language of the original text, its translation is displayed together with the video. In contrast, the screen D2 displays a translation whose number of characters has been limited. Compared with the screen D1, the screen D2 displays a translation of the original text summarized by deleting the wording 「お越しいただいて本当に」 ("truly ... to have you over") and 「また」 ("again") from the person's line. As a result, the number of characters of the translation displayed on the screen D2 is limited, which makes it easier for the user to read. Because of the limit, the translation that spanned two lines on the screen D1 fits on one line on the screen D2, so the user can read the subtitle comfortably.

An embodiment of the present invention has been described above, but the present invention is not limited to that embodiment.

For example, part of the processing executed in the apparatus may be executed on an external server. FIG. 6 is a diagram showing the schematic configuration of an apparatus 10A according to such a modification. The apparatus 10A differs from the apparatus 10 (FIG. 1) in that it does not include the speech recognition processing unit 12 or the language information creation unit 15 but does include a communication unit 19. The communication unit 19 is the part that communicates with a server 20. The server 20 includes a speech recognition processing unit 21, a language information creation unit 22, and a communication unit 23. The functions of the speech recognition processing unit 21 and the language information creation unit 22 are the same as those of the speech recognition processing unit 12 and the language information creation unit 15 described above, so their description is omitted here. The communication unit 23 is the part that communicates with the communication unit 19 of the apparatus 10A. By communicating with the server 20 through the communication unit 19, the apparatus 10A shown in FIG. 6 can use the functions of the speech recognition processing unit 21 and the language information creation unit 22 of the server 20. With this configuration as well, the apparatus 10A can execute the same processing as the apparatus 10 described above. In this case, the processing load on the apparatus 10A is reduced by the amount of speech recognition and language information creation processing that is executed on the server 20.

The above embodiment used 4 characters per second as an example of the number of displayable characters. However, the number of displayable characters is not limited to this example. For example, a different number of displayable characters may be adopted depending on the type of video. Various types of video are conceivable, such as movie video, lecture video, and conference video. For movie video, for example, 3 to 4 characters per second may be adopted as the number of displayable characters. When the number of displayable characters is varied by video type, the storage unit 17 may store in advance a data table that associates each video type with a number of displayable characters. The character count calculation unit 13 may then, for example, obtain the number of displayable characters for the video type from the storage unit 17 and use it to calculate the maximum allowable number of characters. When the translation is created and displayed in a language other than Japanese, a different number of displayable characters may be adopted depending on the language. In this case as well, the character count calculation unit 13 and the storage unit 17 may be customized in the same way as when the number of displayable characters is varied by video type. The number of displayable characters may also be varied for each user of the apparatus 10 or for each piece of content.
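A lookup of the number of displayable characters by video type might look like the sketch below. The table contents are placeholders: only the movie rate (3 to 4 characters per second) comes from the description, the lecture value is purely illustrative, and the fallback to a default rate for unknown types is an assumption. The names `limit_for` and `example_table` are hypothetical.

```python
from typing import Mapping

# Rate used when the video type has no entry (assumption: fall back to
# the 4 characters per second used as the example in the embodiment).
DEFAULT_CHARS_PER_SECOND = 4.0


def limit_for(
    video_type: str, output_seconds: float, rate_table: Mapping[str, float]
) -> int:
    """Maximum allowable characters for a scene of the given video type."""
    rate = rate_table.get(video_type, DEFAULT_CHARS_PER_SECOND)
    return int(output_seconds * rate)  # truncates toward zero


# Placeholder table standing in for the data held by the storage unit.
example_table = {
    "movie": 4.0,    # within the 3 to 4 range given for movies
    "lecture": 3.0,  # illustrative value only, not from the description
}

movie_limit = limit_for("movie", 5, example_table)
lecture_limit = limit_for("lecture", 5, example_table)
```

The same pattern extends to per-language, per-user, or per-content rates by changing the lookup key.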

The above embodiment described an example in which, when the language information contains a predetermined phrase or word, the number of characters of that phrase or word is subtracted from the number of characters of the language information. In this case, instead of subtracting the raw character count of the phrase or word, the count may first be corrected according to the type of phrase or word (for example, by multiplying it by a correction coefficient), and the corrected count may then be subtracted from the number of characters of the language information. The more accustomed the user is to reading or hearing the phrase or word, the larger the corrected count should be (for example, by using a larger correction coefficient). When the amount subtracted is varied by the type of phrase or word, the storage unit 17 may store in advance a data table that associates each type of phrase or word with, for example, a correction coefficient. The character count calculation unit 13 may then obtain the correction coefficient for the type of phrase or word from the storage unit 17, multiply the character count of the phrase or word by that coefficient, and subtract the result from the number of characters of the language information.
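The coefficient-weighted subtraction can be sketched as follows. For simplicity this sketch keys the table by the phrase itself rather than by a phrase type, and the coefficient values in the usage lines are placeholders; both are assumptions, as is clamping the result at zero.

```python
from typing import Mapping


def weighted_length(text: str, coefficients: Mapping[str, float]) -> float:
    """Character count of text after a coefficient-weighted subtraction.

    coefficients maps each exception phrase to its correction coefficient;
    a larger coefficient means a more familiar phrase and a larger discount.
    """
    discount = 0.0
    for phrase, coefficient in coefficients.items():
        if phrase:  # ignore empty keys
            discount += text.count(phrase) * len(phrase) * coefficient
    return max(len(text) - discount, 0.0)


# "thank you" is 9 characters; with coefficient 0.5 only 4.5 are discounted.
half = weighted_length("thank you very much", {"thank you": 0.5})
full = weighted_length("thank you very much", {"thank you": 1.0})
```

A coefficient of 1.0 reproduces the plain subtraction of the embodiment, and 0.0 disables the relaxation for that phrase.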

The above embodiment described an example in which, as needed, the character count calculation unit 13 measures the time during which a line is spoken as audio in one scene of the video and uses the measured time as the display time of the subtitle in the video. However, the display time of the subtitle may be set so that it is longer than the time during which the corresponding line is spoken. Conversely, the display time of the subtitle may be set so that it is shorter than the time during which the corresponding line is spoken.

The above embodiment described an example in which the original text creation unit 14 makes determinations such as whether the number of characters of the language information exceeds the maximum allowable number of characters and whether the language information contains a predetermined phrase or word. However, these determinations may be made by elements other than the original text creation unit 14, such as the character count calculation unit 13, the language information creation unit 15, or the automatic summarization unit 16.

DESCRIPTION OF SYMBOLS: 10, 10A: apparatus; 11: input unit; 12, 21: speech recognition processing unit; 13: character count calculation unit; 14: original text creation unit; 15, 22: language information creation unit; 16: automatic summarization unit; 17: storage unit; 18: output unit; 19, 23: communication unit; 20: server.

Claims

A language information providing apparatus comprising:
calculation means for calculating an upper limit on the number of characters of language information to be output, based on the output time of the language information;
creating means for creating the language information to be output; and
limiting means for limiting the number of characters of the language information created by the creating means according to the upper limit calculated by the calculation means.

The language information providing apparatus according to claim 1, wherein
the calculation means calculates the display time of a subtitle in a video as the output time of the language information, and calculates the upper limit on the number of characters of the displayed subtitle as the upper limit on the number of characters of the language information,
the creating means creates, as the language information, a translation to be used as the subtitle, and
the limiting means limits the number of characters of the translation created by the creating means.

The language information providing apparatus according to claim 1, wherein the limiting means relaxes the limit when a predetermined phrase or word is included in the language information.

The language information providing apparatus according to claim 3, wherein the limiting means
limits the number of characters of the language information, when the predetermined phrase or word is not included in the language information, by preventing the number of characters of the language information from exceeding the upper limit, and
relaxes the limit, when the predetermined phrase or word is included in the language information, by allowing the number of characters of the language information to exceed the upper limit.

The language information providing apparatus according to any one of claims 1 to 4, wherein
the limiting means creates an original text on which the language information is based,
the creating means creates the language information to be output from the original text created by the limiting means, and
the limiting means performs the limiting by shortening the original text.