JP2016173413A - Information provision system

Info

Publication number: JP2016173413A
Application number: JP2015052461A
Authority: JP
Inventors: 貴裕岩田; Takahiro Iwata; 真史権瓶; Masashi Gompei; 優樹瀬戸; Yuki Seto
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2015-03-16
Filing date: 2015-03-16
Publication date: 2016-09-29
Anticipated expiration: 2035-03-16
Also published as: JP6569252B2

Abstract

PROBLEM TO BE SOLVED: To provide a user with content related to a guidance voice in conjunction with the emission of the guidance voice. SOLUTION: An information providing system 100 comprises: an acquisition unit 110 that acquires content Q related to a target sound collected by a sound collection unit 22; a sound emitting unit 28 that, before the acquisition unit 110 acquires the content Q, emits sound including an acoustic component of the target sound and an acoustic component of advance notice information D for notifying a terminal device 30 that the content Q will be provided; and a distribution unit 120 that distributes the content Q acquired by the acquisition unit 110 to the terminal device 30, which collects the sound emitted by the sound emitting unit 28 and notifies a user of the provision of the content Q in accordance with the advance notice information D extracted from that sound. SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for providing information to a user of a terminal device.

Various techniques for playing back content such as images and sounds on a mobile terminal have been proposed. For example, Patent Document 1 discloses a technique for distributing, to a mobile terminal registered in advance as a distribution target, content that depends on the position of that mobile terminal.

JP 2002-351905 A

For example, in transportation systems such as trains and buses, guidance voices that inform users about boarding, alighting, transfers, and the like are played as needed. If content such as a character string or a translation of the spoken guidance could be provided to the user's mobile terminal each time a guidance voice is emitted, then users such as hearing-impaired people who have difficulty hearing the guidance voice, or foreigners who have difficulty understanding its language, could also grasp what the guidance voice says, which would be convenient. In view of these circumstances, an object of the present invention is to provide a user with content related to a guidance voice in conjunction with the emission of that guidance voice.

To solve the above problem, an information providing system according to the present invention includes: an acquisition unit that acquires content related to a target sound collected by a sound collection unit; a sound emitting unit that, before the acquisition unit acquires the content, emits sound including an acoustic component of the target sound and an acoustic component of advance notice information for notifying a terminal device that the content will be provided; and a distribution unit that distributes the content acquired by the acquisition unit to the terminal device, which collects the sound emitted by the sound emitting unit and notifies the user of the provision of the content according to the advance notice information extracted from that sound. In this configuration, the target sound collected by the sound collection unit is emitted from the sound emitting unit and heard by the user of the terminal device, while the content related to the target sound is acquired by the acquisition unit and distributed from the distribution unit to the terminal device. That is, content related to the target sound can be provided to the user in conjunction with the emission of the target sound.
Note that when acquiring the content related to the target sound takes a certain amount of time, the content is provided to the terminal device at a point delayed from the emission of the target sound by the sound emitting unit, so the user of the terminal device may find it difficult to grasp the correspondence between the target sound and the content. In the present invention, the acoustic component of the advance notice information, which notifies the terminal device that content related to the target sound will be provided, is emitted from the sound emitting unit together with the acoustic component of the target sound before generation of the content is complete. This has the further advantage that the user of the terminal device can grasp the correspondence between the target sound and the content (namely, that the content will be distributed after the target sound is emitted).

In a preferred aspect of the present invention, the advance notice information includes identification information of the content, and the distribution unit, upon receiving from the terminal device a distribution request that specifies the identification information extracted from the sound emitted by the sound emitting unit, distributes the content corresponding to that identification information to the terminal device. In this configuration, the content corresponding to the identification information is distributed in response to a distribution request from the terminal device, so there is the advantage that terminal devices to be served need not be registered with the distribution unit in advance. In addition, because the identification information included in the advance notice information is specified in the distribution request, the content corresponding to the target sound can be identified easily even when, for example, a plurality of pieces of content exist as candidates for distribution to the terminal device.

In a preferred aspect of the present invention, the distribution unit sequentially receives distribution requests transmitted multiple times from the terminal device. If acquisition of the content by the acquisition unit is complete when a distribution request is received, the distribution unit distributes the content to the terminal device; if acquisition is not yet complete, it does not distribute the content. Compared with, for example, a configuration in which the terminal device transmits a distribution request only after a predetermined waiting time has elapsed since the advance notice information was received, this configuration has the advantage of shortening the delay between completion of content acquisition by the acquisition unit and actual distribution of the content to the terminal device.
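By way of non-limiting illustration, the repeated distribution requests described above can be sketched in Python as follows. The function name, the callback signature, and the retry limit are hypothetical and do not appear in the specification; the transport used to reach the distribution unit is abstracted behind a callable that returns None while the content is not yet available.

```python
from typing import Callable, Optional


def poll_for_content(send_request: Callable[[str], Optional[str]],
                     identification: str,
                     max_attempts: int = 10,
                     wait: Callable[[], None] = lambda: None) -> Optional[str]:
    """Send distribution requests repeatedly until content is returned.

    send_request: hypothetical callable that sends one distribution request
        specifying `identification` and returns the content, or None when
        acquisition of the content has not yet completed.
    wait: hypothetical callable invoked between attempts (for example, a
        short sleep); it does nothing by default.
    """
    for _ in range(max_attempts):
        content = send_request(identification)
        if content is not None:
            # Acquisition was complete when this request was received.
            return content
        wait()
    # The content never became available within the allowed attempts.
    return None
```

Because the terminal device keeps asking rather than waiting for a fixed period, the content is obtained on the first request that arrives after acquisition completes.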

In a preferred aspect of the present invention, the acquisition unit includes a recognition processing unit that identifies a character string of the spoken content by speech recognition of the target sound, and a selection processing unit that selects, from a table in which each of a plurality of first character strings is associated with a second character string that is a translation of that first character string into another language, the second character string corresponding to the first character string that is similar to the character string identified by the recognition processing unit. The acquisition unit generates content representing the second character string selected by the selection processing unit. This has the advantage that content representing an appropriate second character string, free of misrecognition by speech recognition and of mistranslation by machine translation, can be provided to the user.
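The table-based selection described above can be sketched as follows. This is only an illustration: the specification does not say how similarity between character strings is measured, so the sketch assumes the similarity ratio provided by Python's standard difflib module, and the table entries, function name, and cutoff value are hypothetical.

```python
import difflib
from typing import Dict, Optional

# Hypothetical table: first character string -> prepared translation.
GUIDANCE_TABLE: Dict[str, str] = {
    "Boarding will begin shortly.": "El embarque comenzará en breve.",
    "The flight has been delayed.": "El vuelo se ha retrasado.",
}


def select_second_string(recognized: str,
                         table: Dict[str, str],
                         cutoff: float = 0.6) -> Optional[str]:
    """Return the translation paired with the first string closest to `recognized`.

    Returns None when no first string reaches the similarity cutoff.
    """
    matches = difflib.get_close_matches(recognized, list(table), n=1, cutoff=cutoff)
    if not matches:
        return None
    return table[matches[0]]
```

With this approach a recognition result containing a small error, such as "The flight has bean delayed", still maps to the prepared translation of the intended sentence.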

In a preferred aspect of the present invention, the acquisition unit includes a recognition processing unit that identifies a character string of the spoken content by speech recognition of the target sound, and a translation processing unit that translates the character string identified by the recognition processing unit into another language. The acquisition unit generates content representing the result of processing by the translation processing unit. In this configuration, content representing the result of translating the character string of the spoken content of the target sound into another language is distributed to the terminal device. Therefore, even when the user of the terminal device cannot understand the language of the target sound, the user can understand what was spoken from the content distributed to the terminal device. Examples of content representing the result of processing by the translation processing unit include content representing the character string after translation by the translation processing unit and content representing synthesized speech generated by speech synthesis applied to the translated character string.

In a preferred aspect of the present invention, the system includes an editing processing unit that edits the character string identified by the recognition processing unit in accordance with instructions from an operator, and the translation processing unit translates the character string edited by the editing processing unit into another language. In this configuration, the character string obtained by editing the recognition result in accordance with the operator's instructions is translated into another language. Therefore, even when, for example, the recognition result of the recognition processing unit contains misrecognition, correcting it by editing allows the translation processing unit to translate accurately, so the spoken content of the target sound can be conveyed to the user accurately.

The information providing system according to each of the above aspects is realized by dedicated electronic circuitry, or alternatively by cooperation between a program and a general-purpose processing device such as a CPU (Central Processing Unit). The program of the present invention may be provided in a form stored on a computer-readable recording medium and installed on a computer. The recording medium is, for example, a non-transitory recording medium; an optical recording medium (optical disc) such as a CD-ROM is a good example, but any known type of recording medium, such as a semiconductor recording medium or a magnetic recording medium, may be included. The program of the present invention may also, for example, be provided by distribution over a communication network and installed on a computer. The present invention is also specified as a method of operating the information providing system according to each of the above aspects (an information providing method).

A configuration diagram of the voice guidance system 1 according to the first embodiment of the present invention.
A configuration diagram of the sound emission system 20.
A configuration diagram of the signal processing unit 280.
A configuration diagram of the information management system 10.
A configuration diagram of the terminal device 30.
A configuration diagram of the terminal device 30.
An explanatory diagram of the operation of the voice guidance system 1.
A diagram showing an example of playback of the content Q.
A configuration diagram of the information management system 10 of the second embodiment.
A schematic diagram of the guidance table TB1 of the second embodiment.
A configuration diagram of the voice guidance system 1 according to the third embodiment of the present invention.
A configuration diagram of the information providing system 100 according to a modification of the present invention.

<First Embodiment>
An outline of the voice guidance system 1 of the first embodiment is given first. The following description illustrates a configuration in which the voice guidance system 1 of the first embodiment is used for voice guidance in public transportation.

FIG. 1 is a configuration diagram of the voice guidance system 1 according to the first embodiment of the present invention. As illustrated in FIG. 1, the voice guidance system 1 of the first embodiment includes an information providing system 100 and a terminal device 30. The information providing system 100 is a computer system that provides various kinds of information to the terminal device 30. The terminal device 30 is a portable information processing device such as a mobile phone or a smartphone. In the following description, it is assumed that the user of the terminal device 30 is located in an airport facility C and that content Q related to a voice that guides the user (hereinafter "guidance voice") is provided to the terminal device 30. Although FIG. 1 shows only one terminal device 30 for convenience, in practice the information providing system 100 can provide information to each of a plurality of terminal devices 30.

As illustrated in FIG. 1, the information providing system 100 of the first embodiment includes an information management system 10 and a sound emission system 20. The sound emission system 20 is installed in the airport facility C and is used for voice guidance within the airport facility C. Specifically, the sound emission system 20 of the first embodiment emits a guidance voice (target sound) in a specific language (hereinafter "first language"). The guidance voice is, for example, sound representing guidance about flights operated by airlines that serve the airport facility C (for example, boarding guidance, connection guidance, flight status information, and delay information). In addition to emitting the guidance voice, the sound emission system 20 notifies the terminal device 30 of advance notice information, which informs the terminal device 30 in advance that the content Q related to the guidance voice will be provided. The advance notice information is conveyed to the terminal device 30 by wireless information communication. The first embodiment illustrates a case in which, in parallel with the emission of the guidance voice, the advance notice information is conveyed from the sound emission system 20 to the terminal device 30 by acoustic communication, which uses sound (sound waves) as air vibration as the transmission medium. That is, the advance notice information is emitted from the sound emission system 20 as sound together with the guidance voice. In the first embodiment, identification information D of the content Q related to the guidance voice is generated as the advance notice information.

The information management system 10, on the other hand, is a computer system that manages the information provided to the terminal device 30. The terminal device 30 can communicate with the information management system 10 via a communication network 200 that includes a mobile communication network, the Internet, and the like. When emission of a guidance voice starts, the information management system 10 acquires the content Q related to that guidance voice. The terminal device 30 transmits to the information management system 10 a distribution request R for the content Q that includes the identification information D notified in advance by the sound emission system 20. The information management system 10 transmits the content Q corresponding to the identification information D specified in the distribution request R received via the communication network 200 to the requesting terminal device 30. The content Q is information related to the guidance voice. In the first embodiment, content Q representing a translation of the first-language guidance spoken in the guidance voice into another language (hereinafter "second language") is provided to the terminal device 30. Accordingly, a user who understands the first language grasps the guidance by listening to the guidance voice, and a user who understands the second language grasps the guidance by referring to the content Q. The specific configuration and functions of each element of the information providing system 100 outlined above are described in detail below.

<Sound emission system 20>
FIG. 2 is a configuration diagram of the sound emission system 20. As illustrated in FIG. 2, the sound emission system 20 of the first embodiment includes a sound collection unit 22, a storage unit 21, a communication unit 24, a setting unit 26, and a sound emitting unit 28. The sound collection unit 22 is an acoustic device (microphone) that collects ambient sound; it collects the guidance voice spoken by a guide who is in charge of guidance at the airport facility C and generates an acoustic signal SG representing the time waveform of that guidance voice. An A/D converter that converts the acoustic signal SG generated by the sound collection unit 22 from analog to digital is omitted from the drawing for convenience. The storage unit 21 stores programs and the like executed by a CPU (not shown) that controls each element of the sound emission system 20. Any known recording medium, such as a semiconductor recording medium or a magnetic recording medium, or a combination of several types of recording media may be used as the storage unit 21. The functions of the sound emission system 20 (the setting unit 26 and the sound emitting unit 28) are realized by the CPU executing the programs stored in the storage unit 21.

The setting unit 26 generates, as the advance notice information, identification information D for identifying the content Q. The identification information D is a code that uniquely identifies each of a plurality of pieces of content Q. Each time a guidance voice occurs, the setting unit 26 generates the identification information D of the content Q for that guidance voice. The communication unit 24 is a communication device that transmits the identification information D generated by the setting unit 26 and the acoustic signal SG generated by the sound collection unit 22 to the information management system 10 via the communication network 200.

The sound emitting unit 28 is a means for emitting sound that includes the acoustic component of the guidance voice (acoustic signal SG) and the acoustic component of the identification information D and, as illustrated in FIG. 2, includes a signal processing unit 280 and a speaker 286. The signal processing unit 280 generates an acoustic signal S1 by combining the identification information D set by the setting unit 26 with the acoustic signal SG. Any known method may be used to combine the identification information D with the acoustic signal SG (audio watermarking); for example, the method disclosed in International Publication No. 2010/016589 is suitable. Specifically, as illustrated in FIG. 3, the signal processing unit 280 includes a modulation processing unit 282 and a mixing processing unit 284.
The modulation processing unit 282 sequentially performs spread modulation of the identification information D using a spreading code and frequency conversion using a carrier wave of a predetermined frequency, thereby generating an acoustic signal SD (hereinafter "modulated signal") that contains the identification information D as an acoustic component in a predetermined frequency band. The frequency band of the modulated signal SD is a band in which the sound emission system 20 can emit sound and the terminal device 30 can collect sound, and it lies within a range (for example, 18 kHz to 20 kHz) above the frequency band of sounds such as voices (for example, the guidance voice) and musical tones that users hear in an ordinary environment (for example, about 16 kHz or below within the audible range). The mixing processing unit 284 of FIG. 3 generates the acoustic signal S1 by superimposing (typically adding) the acoustic signal SG generated by the sound collection unit 22 and the modulated signal SD generated by the modulation processing unit 282. As understood from the above, the acoustic signal S1 contains the acoustic component of the guidance voice (acoustic signal SG) and an acoustic component (modulated signal SD) that includes the identification information D of the content Q for that guidance voice. The speaker 286 is an acoustic device that emits sound corresponding to the acoustic signal S1 supplied from the signal processing unit 280. A D/A converter that converts the acoustic signal S1 from digital to analog is omitted from the drawing for convenience. In the first embodiment, sound corresponding to the acoustic signal S1 is emitted from the sound emitting unit 28 (speaker 286) in real time, in parallel with the guide speaking the guidance voice.
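The processing of the modulation processing unit 282 and the mixing processing unit 284 can be sketched as follows, as an illustration only. The sketch assumes direct-sequence spreading with a bipolar code followed by multiplication with a cosine carrier; the actual method of International Publication No. 2010/016589 may differ. The sample rate, the 19 kHz carrier, the spreading code, the chip length, and the amplitude are all hypothetical values chosen so that the modulated signal falls in the 18 kHz to 20 kHz range mentioned above.

```python
import math
from typing import List, Sequence

SAMPLE_RATE = 48_000          # assumed sampling frequency in Hz
CARRIER_HZ = 19_000           # assumed carrier inside the 18-20 kHz band
SPREADING_CODE = [1, -1, 1, 1, -1, 1, -1, -1]  # hypothetical bipolar code
SAMPLES_PER_CHIP = 48         # assumed chip duration (1 ms at 48 kHz)


def spread(bits: Sequence[int], code: Sequence[int]) -> List[int]:
    """Spread modulation: each bit becomes the code (bit 1) or its inverse (bit 0)."""
    chips: List[int] = []
    for bit in bits:
        sign = 1 if bit else -1
        chips.extend(sign * c for c in code)
    return chips


def modulate(bits: Sequence[int], amplitude: float = 0.1) -> List[float]:
    """Generate the modulated signal SD: spread chips shifted up by the carrier."""
    chips = spread(bits, SPREADING_CODE)
    signal = []
    for n in range(len(chips) * SAMPLES_PER_CHIP):
        chip = chips[n // SAMPLES_PER_CHIP]
        carrier = math.cos(2 * math.pi * CARRIER_HZ * n / SAMPLE_RATE)
        signal.append(amplitude * chip * carrier)
    return signal


def mix(guidance: Sequence[float], modulated: Sequence[float]) -> List[float]:
    """Generate S1 by adding SG and SD sample by sample (shorter input zero-padded)."""
    length = max(len(guidance), len(modulated))
    padded_guidance = list(guidance) + [0.0] * (length - len(guidance))
    padded_modulated = list(modulated) + [0.0] * (length - len(modulated))
    return [g + m for g, m in zip(padded_guidance, padded_modulated)]
```

Keeping the amplitude of the modulated signal small relative to the guidance voice, and placing it above the band of ordinary speech, is what lets the identification information ride along without disturbing what the listener hears.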

<Information management system 10>
FIG. 4 is a configuration diagram of the information management system 10. As illustrated in FIG. 4, the information management system 10 of the first embodiment includes an acquisition unit 110, a distribution unit 120, and a storage unit 130. The storage unit 130 stores programs and the like executed by a CPU that controls each element of the information management system 10. Any known recording medium, such as a semiconductor recording medium or a magnetic recording medium, or a combination of several types of recording media may be used as the storage unit 130. The functions of the information management system 10 (the acquisition unit 110 and the distribution unit 120) are realized by the CPU executing the programs stored in the storage unit 130. The acquisition unit 110 is a means for acquiring the content Q related to the guidance voice using the acoustic signal SG of the guidance voice supplied from the communication unit 24 and, as illustrated in FIG. 4, includes a recognition processing unit 112 and a translation processing unit 114. The acquisition unit 110 of the first embodiment acquires the content Q related to the guidance voice by generating the content Q from the acoustic signal SG of the guidance voice.

The recognition processing unit 112 identifies a character string L expressing the spoken content of the guidance voice by speech recognition of the acoustic signal SG of the guidance voice supplied from the communication unit 24 of the sound emission system 20. Any known technique may be used for speech recognition of the acoustic signal SG, for example a recognition technique that uses an acoustic model such as an HMM together with a language model expressing linguistic constraints. The translation processing unit 114 generates, as the content Q, a character string obtained by converting the character string L identified by the recognition processing unit 112 into another language by machine translation. Specifically, the translation processing unit 114 generates, as the content Q, a character string obtained by converting the character string L, which expresses the spoken content of the guidance voice spoken in the first language, into the second language. Any known technique may be used for the machine translation by the translation processing unit 114.
For example, the character string L can be translated by rule-based machine translation, which converts word order and words by referring to the result of parsing the character string L and to linguistic rules, or by statistical machine translation, which translates the character string L into the second language using statistical models (a translation model and a language model) that express statistical tendencies of the languages. As understood from the above, the acquisition unit 110 of the first embodiment (the recognition processing unit 112 and the translation processing unit 114) acquires, as the content Q, a character string obtained by converting into the second language the character string expressing the spoken content of the guidance voice spoken in the first language. The acquisition unit 110 stores the generated content Q in the storage unit 130 in association with the identification information D received from the sound emission system 20.

The distribution unit 120 distributes the content Q acquired by the acquisition unit 110 to the terminal device 30. Specifically, when the distribution unit 120 receives from the terminal device 30 a distribution request R for the content Q that includes the identification information D, it distributes the content Q corresponding to that identification information D to the requesting terminal device 30.
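The relationship between the acquisition unit 110, the storage unit 130, and the distribution unit 120 can be sketched as follows, purely as an illustration. The class and method names are hypothetical, the speech recognizer and machine translator are injected as placeholder callables rather than real engines, and an in-memory dictionary stands in for the storage unit 130.

```python
from typing import Callable, Dict, Optional, Sequence


class InformationManagementSketch:
    """Illustrative stand-in for the information management system 10."""

    def __init__(self,
                 recognize: Callable[[Sequence[float]], str],
                 translate: Callable[[str], str]) -> None:
        # Placeholders for the recognition and translation processing units.
        self._recognize = recognize
        self._translate = translate
        # Stand-in for the storage unit: identification D -> content Q.
        self._store: Dict[str, str] = {}

    def acquire(self, identification: str, acoustic_signal: Sequence[float]) -> str:
        """Generate content Q from signal SG and store it under identification D."""
        character_string = self._recognize(acoustic_signal)
        content = self._translate(character_string)
        self._store[identification] = content
        return content

    def handle_distribution_request(self, identification: str) -> Optional[str]:
        """Return content Q for the requested D, or None if not yet acquired."""
        return self._store.get(identification)
```

Keying the stored content by the identification information is what allows the distribution unit to serve any terminal device that presents that identification, without prior registration of the device.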

As understood from the above, the utterance of a guidance voice into the sound collection unit 22 triggers the emission of the guidance voice and the acoustic component of the identification information D from the sound emitting unit 28, while the acoustic signal SG of the guidance voice is transmitted to the information management system 10 and the content Q is generated. That is, the content Q related to the guidance voice can be distributed to the user of the terminal device 30 in conjunction with the emission of the guidance voice. However, whereas the guidance voice is mixed with the acoustic component of the identification information D and emitted immediately after being collected by the sound collection unit 22, the content Q becomes ready for transmission to the terminal device 30 only after the acoustic signal SG has been exchanged between the sound emission system 20 and the information management system 10 via the communication network 200 and processed by the acquisition unit 110 (speech recognition and machine translation). Therefore, at the point when emission of the guidance voice starts, generation of the content Q for that guidance voice is not yet complete, and the content Q cannot be transmitted to the terminal device 30. That is, in the first embodiment, the sound emitting unit 28 emits sound including the acoustic component of the guidance voice and the acoustic component of the identification information D before generation of the content Q is complete.

＜端末装置３０＞ <Terminal device 30>
図５は、端末装置３０の構成図である。図５に例示される通り、端末装置３０は、収音部３１０と情報抽出部３２０と送信部３３０と受信部３４０と表示処理部３５０と出力部３６０と記憶部３７０とを含んで構成される。記憶部３７０は、端末装置３０の各要素を制御するＣＰＵが実行するプログラム等を記憶する。半導体記録媒体や磁気記録媒体等の公知の記録媒体または複数種の記録媒体の組合せが記憶部３７０として任意に採用される。記憶部３７０に記憶されたプログラムをＣＰＵが実行することで情報抽出部３２０および表示処理部３５０が実現される。収音部３１０は、周囲の音響を収音する音響機器（マイクロホン）であり、放音システム２０のスピーカー２８６から放音される音響を収音して音響信号Ｓ2を生成する。音響信号Ｓ2は、識別情報Ｄの音響成分を含有する。なお、収音部３１０が生成した音響信号Ｓ2をアナログからデジタルに変換するＡ/Ｄ変換器の図示は便宜的に省略されている。 FIG. 5 is a configuration diagram of the terminal device 30. As illustrated in FIG. 5, the terminal device 30 includes a sound collection unit 310, an information extraction unit 320, a transmission unit 330, a reception unit 340, a display processing unit 350, an output unit 360, and a storage unit 370. The storage unit 370 stores, among other things, the program executed by a CPU that controls each element of the terminal device 30. A known recording medium such as a semiconductor recording medium or a magnetic recording medium, or a combination of several types of recording media, may be freely adopted as the storage unit 370. The information extraction unit 320 and the display processing unit 350 are realized by the CPU executing the program stored in the storage unit 370. The sound collection unit 310 is an acoustic device (microphone) that collects ambient sound; it collects the sound emitted from the speaker 286 of the sound emission system 20 and generates an acoustic signal S2. The acoustic signal S2 contains the acoustic component of the identification information D. Note that the A/D converter that converts the acoustic signal S2 generated by the sound collection unit 310 from analog to digital is omitted from the drawing for convenience.

情報抽出部３２０は、収音部３１０が生成した音響信号Ｓ2の復調で識別情報Ｄを抽出する。具体的には、情報抽出部３２０は、音響信号Ｓ2のうち識別情報Ｄを含む周波数帯域の帯域成分を例えば帯域通過フィルタで強調し、識別情報Ｄの拡散変調に利用された拡散符号を係数とする整合フィルタを通過させることで識別情報Ｄを抽出する。送信部３３０および受信部３４０は、通信網２００を介して情報管理システム１０と通信する通信機器である。送信部３３０は、情報抽出部３２０が抽出した識別情報Ｄを含むコンテンツＱの配信要求Ｒを情報管理システム１０に送信する。受信部３４０は、配信要求Ｒに応じて情報管理システム１０（配信部１２０）から配信されたコンテンツＱを受信する。 The information extraction unit 320 extracts the identification information D by demodulating the acoustic signal S2 generated by the sound collection unit 310. Specifically, the information extraction unit 320 emphasizes the component of the acoustic signal S2 in the frequency band that contains the identification information D, for example with a band-pass filter, and then extracts the identification information D by passing the result through a matched filter whose coefficients are the spreading code used for the spread modulation of the identification information D. The transmission unit 330 and the reception unit 340 are communication devices that communicate with the information management system 10 via the communication network 200. The transmission unit 330 transmits to the information management system 10 a distribution request R for the content Q that includes the identification information D extracted by the information extraction unit 320. The reception unit 340 receives the content Q distributed from the information management system 10 (distribution unit 120) in response to the distribution request R.
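The matched-filter extraction described above can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the specification: the 15-chip code `SPREAD_CODE`, the 8-bit width of the identifier, the ±1 chip representation, the Gaussian noise standing in for the guidance voice, and all function names. The band-pass stage and the audio carrier are omitted, and the receiver is assumed to be already aligned to chip boundaries, so correlating each frame with the code equals the matched-filter output sampled at frame boundaries.

```python
import random

# Hypothetical 15-chip spreading code (+1/-1); the specification does not give one.
SPREAD_CODE = [1, 1, 1, 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, -1]


def id_to_bits(value, width=8):
    """Most-significant-bit-first binary representation of an identifier."""
    return [(value >> i) & 1 for i in reversed(range(width))]


def bits_to_id(bits):
    """Inverse of id_to_bits."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def spread(bits, code=SPREAD_CODE):
    """Spread each bit over len(code) chips: bit 1 -> +code, bit 0 -> -code."""
    chips = []
    for bit in bits:
        sign = 1 if bit else -1
        chips.extend(sign * c for c in code)
    return chips


def matched_filter_extract(signal, code=SPREAD_CODE):
    """Correlate each chip-aligned frame with the code and decide by the sign."""
    n = len(code)
    bits = []
    for start in range(0, len(signal) - n + 1, n):
        frame = signal[start:start + n]
        correlation = sum(s * c for s, c in zip(frame, code))
        bits.append(1 if correlation > 0 else 0)
    return bits


def demo(identifier=101, noise=0.5, seed=0):
    """Embed an identifier, add noise as a stand-in for other sound, recover it."""
    rng = random.Random(seed)
    transmitted = spread(id_to_bits(identifier))
    received = [x + rng.gauss(0.0, noise) for x in transmitted]
    return bits_to_id(matched_filter_extract(received))
```

Because each bit is decided from the sum over all chips of a frame, noise that is uncorrelated with the code largely averages out, which is the property that lets the identifier ride along with the guidance voice.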

出力部３６０は、各種の情報を出力する。第１実施形態の出力部３６０は、画像を表示する表示装置（例えば液晶表示パネル等）である。表示処理部３５０は、出力部３６０に画像を表示させる。例えば、表示処理部３５０は、受信部３４０が情報管理システム１０から受信したコンテンツＱを出力部３６０に表示させる。すなわち、案内音声を他言語に翻訳した文字列がコンテンツＱとして表示される。また、第１実施形態の表示処理部３５０は、案内音声に対応するコンテンツＱの配信を、当該コンテンツＱの実際の配信に先行して端末装置３０の利用者に事前に報知（すなわち予告）する。具体的には、表示処理部３５０は、情報抽出部３２０による識別情報Ｄの抽出を契機として、案内音声に関連するコンテンツＱが直後に配信される旨のメッセージを出力部３６０に表示させる。出力部３６０は、例えば、図６に例示される通り、コンテンツＱと同様の第２言語で生成されたメッセージ”A message of voice guidance will soon be provided.”を識別情報Ｄ[#101]とともに出力することで、利用者が聴取した案内音声に関連するコンテンツＱが配信されることを利用者に事前に報知する。すなわち、端末装置３０に対するコンテンツＱの配信に先立ち、利用者は、当該案内音声に関連するコンテンツＱの配信を、案内音声の聴取とともに事前に把握することが可能である。 The output unit 360 outputs various types of information. The output unit 360 of the first embodiment is a display device (for example, a liquid crystal display panel) that displays images. The display processing unit 350 causes the output unit 360 to display images. For example, the display processing unit 350 causes the output unit 360 to display the content Q that the reception unit 340 received from the information management system 10. That is, a character string obtained by translating the guidance voice into another language is displayed as the content Q. In addition, the display processing unit 350 of the first embodiment notifies the user of the terminal device 30 in advance (that is, gives advance notice) of the distribution of the content Q corresponding to the guidance voice, ahead of the actual distribution of that content Q. Specifically, triggered by the extraction of the identification information D by the information extraction unit 320, the display processing unit 350 causes the output unit 360 to display a message stating that the content Q related to the guidance voice will be distributed shortly. For example, as illustrated in FIG. 6, the output unit 360 outputs the message "A message of voice guidance will soon be provided.", written in the same second language as the content Q, together with the identification information D [#101], thereby notifying the user in advance that the content Q related to the guidance voice the user has heard will be distributed. That is, prior to the distribution of the content Q to the terminal device 30, the user can learn, while listening to the guidance voice, that the content Q related to that guidance voice will be distributed.

図７は、音声案内システム１の全体的な動作の説明図である。放音システム２０の収音部２２は、案内者が第１言語で発音した案内音声を収音して音響信号ＳGを生成する（ＳA1）。通信部２４は、収音部２２が生成した音響信号ＳGと識別情報Ｄとを情報管理システム１０に送信する（ＳA2）。情報管理システム１０の取得部１１０は、音響信号ＳGと識別情報Ｄとの受信を契機として、案内音声に関連するコンテンツＱの生成を開始する。図７に斜線で便宜的に図示される通り、案内音声のコンテンツＱの生成には相応の時間が必要である。 FIG. 7 is an explanatory diagram of the overall operation of the voice guidance system 1. The sound collection unit 22 of the sound emission system 20 collects the guidance voice that the guide pronounced in the first language to generate the acoustic signal SG (SA1). The communication unit 24 transmits the acoustic signal SG and the identification information D generated by the sound collection unit 22 to the information management system 10 (SA2). The acquisition unit 110 of the information management system 10 starts generating the content Q related to the guidance voice, triggered by reception of the acoustic signal SG and the identification information D. As shown for convenience in FIG. 7 by hatching, it takes time to generate the content Q of the guidance voice.

他方、放音部２８の信号処理部２８０は、収音部２２が生成した音響信号ＳGに識別情報Ｄを合成することで音響信号Ｓ1を生成し（ＳA3）、スピーカー２８６は音響信号Ｓ1に応じた音響を放音する（ＳA4）。端末装置３０の収音部３１０は、スピーカー２８６が放音した案内音声を収音して音響信号Ｓ2を生成する。情報抽出部３２０は、収音部３１０が生成した音響信号Ｓ2の復調で案内音声の識別情報Ｄを抽出する（ＳA5）。図７に例示される通り、情報抽出部３２０による識別情報Ｄの抽出の時点ではコンテンツＱの生成は完了していない可能性がある。表示処理部３５０は、情報抽出部３２０による識別情報Ｄの抽出を契機として（すなわちコンテンツＱの生成の完了／未完に関わらず）、例えば、図６で例示したように、案内音声に関連するコンテンツＱが配信されることを利用者に報知するメッセージを出力部３６０に表示させる（ＳA6）。これにより、端末装置３０に対するコンテンツＱの実際の配信に先立ち、利用者は、当該案内音声に関連するコンテンツＱが近く配信されることを、案内音声の聴取とともに事前に把握することが可能である。 On the other hand, the signal processing unit 280 of the sound emitting unit 28 generates the acoustic signal S1 by combining the identification information D with the acoustic signal SG generated by the sound collection unit 22 (SA3), and the speaker 286 emits sound corresponding to the acoustic signal S1 (SA4). The sound collection unit 310 of the terminal device 30 collects the guidance voice emitted by the speaker 286 and generates an acoustic signal S2. The information extraction unit 320 extracts the identification information D of the guidance voice by demodulating the acoustic signal S2 generated by the sound collection unit 310 (SA5). As illustrated in FIG. 7, generation of the content Q may not be complete at the time the information extraction unit 320 extracts the identification information D. Triggered by the extraction of the identification information D by the information extraction unit 320 (that is, regardless of whether generation of the content Q is complete), the display processing unit 350 causes the output unit 360 to display a message, such as the one illustrated in FIG. 6, notifying the user that the content Q related to the guidance voice will be distributed (SA6). Thus, prior to the actual distribution of the content Q to the terminal device 30, the user can learn, while listening to the guidance voice, that the content Q related to that guidance voice will be distributed shortly.

情報抽出部３２０による識別情報Ｄの抽出と表示処理部３５０による報知とが実行されると、送信部３３０は、情報抽出部３２０が抽出した識別情報Ｄを含むコンテンツＱの配信要求Ｒを、当該コンテンツＱが実際に端末装置３０に配信されるまで複数回にわたり情報管理システム１０に送信する（ＳA7，ＳA9）。 When the extraction of the identification information D by the information extraction unit 320 and the notification by the display processing unit 350 have been performed, the transmission unit 330 transmits the distribution request R for the content Q, which includes the identification information D extracted by the information extraction unit 320, to the information management system 10 repeatedly until the content Q is actually distributed to the terminal device 30 (SA7, SA9).

配信部１２０は、端末装置３０から複数回にわたって送信される配信要求Ｒを順次に受信し、配信要求Ｒで指定される識別情報ＤのコンテンツＱの生成が完了しているか否かを配信要求Ｒの受信毎に判定する。図７のステップＳA7の配信要求Ｒの受信時点ではコンテンツＱの生成が未だ完了していないから、配信部１２０は、配信不可の応答を端末装置３０に送信する（ＳA8）。すなわち、コンテンツＱは配信されない。他方、図７のステップＳA9の配信要求ＲはコンテンツＱの生成の完了の直後に配信部１２０により受信される。配信要求Ｒの受信時点でコンテンツＱの生成が完了している場合、配信部１２０は、取得部１１０が生成したコンテンツＱを要求元の端末装置３０に配信する（ＳA10）。端末装置３０の受信部３４０は、情報管理システム１０から配信されたコンテンツＱを受信し（ＳA11）、表示処理部３５０は、受信部３４０が受信したコンテンツＱを出力部３６０に表示させる（ＳA12）。 The distribution unit 120 sequentially receives the distribution requests R transmitted repeatedly from the terminal device 30 and, each time a distribution request R is received, determines whether generation of the content Q for the identification information D specified in that request is complete. At the time the distribution request R of step SA7 in FIG. 7 is received, generation of the content Q is not yet complete, so the distribution unit 120 sends the terminal device 30 a response indicating that distribution is not possible (SA8). That is, the content Q is not distributed. On the other hand, the distribution request R of step SA9 in FIG. 7 is received by the distribution unit 120 immediately after generation of the content Q is completed. When generation of the content Q is complete at the time a distribution request R is received, the distribution unit 120 distributes the content Q generated by the acquisition unit 110 to the requesting terminal device 30 (SA10). The reception unit 340 of the terminal device 30 receives the content Q distributed from the information management system 10 (SA11), and the display processing unit 350 causes the output unit 360 to display the content Q received by the reception unit 340 (SA12).
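The request and response exchange of steps SA7 to SA10 amounts to polling by identifier, which might be sketched as follows. The class and function names, the use of `None` as the "distribution not possible" response, and the `on_attempt` hook used to simulate the moment generation finishes are all assumptions for illustration. The specification does not state an interval between requests; a real terminal would presumably wait between attempts, which this sketch leaves out.

```python
class InformationManagementSystemSketch:
    """Toy stand-in for the information management system 10 (hypothetical API)."""

    def __init__(self):
        # Maps identification information D to finished content Q.
        self._contents = {}

    def finish_generation(self, identifier, content):
        """Called when the acquisition side has finished generating content Q."""
        self._contents[identifier] = content

    def handle_request(self, identifier):
        """Distribution side: return content Q, or None if it is not ready yet."""
        return self._contents.get(identifier)


def poll_for_content(system, identifier, max_attempts, on_attempt=None):
    """Send distribution requests until content arrives or attempts run out.

    Returns (content, attempts_used); content is None if nothing was delivered.
    """
    for attempt in range(1, max_attempts + 1):
        if on_attempt is not None:
            on_attempt(attempt)  # hook used here only to simulate elapsed time
        content = system.handle_request(identifier)
        if content is not None:
            return content, attempt
    return None, max_attempts
```

For example, if generation finishes just before the third request, the first two requests receive `None` and the third returns the content, mirroring SA8 and SA10.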

図８は、コンテンツＱの再生の一例である。図８では、航空機の搭乗開始を案内する第１言語（日本語）の案内音声が放音された場合に生成されるコンテンツＱが例示されている。図８に例示される通り、「ABC航空パリ行き78便のお客様はただ今から81番ゲートよりご搭乗頂きます」という第１言語の案内音声が放音システム２０から放音された場合、当該案内音声を第２言語（英語）に翻訳した「Passengers on ABC Airlines flight 78 to Paris are now on boarding at gate 81」という文字列がコンテンツＱとして出力部３６０に表示される。 FIG. 8 shows an example of reproduction of the content Q. FIG. 8 illustrates the content Q generated when a guidance voice in the first language (Japanese) announcing the start of boarding for an aircraft is emitted. As illustrated in FIG. 8, when a first-language guidance voice announcing that passengers on ABC Airlines flight 78 to Paris may now board from gate 81 is emitted from the sound emission system 20, the character string "Passengers on ABC Airlines flight 78 to Paris are now on boarding at gate 81", obtained by translating the guidance voice into the second language (English), is displayed on the output unit 360 as the content Q.

以上に説明した通り、第１実施形態では、収音部２２が収音した案内音声が放音部２８から放音されて端末装置３０の利用者に聴取される一方、案内音声に関連するコンテンツＱが生成されて配信部１２０から端末装置３０に配信される。したがって、案内音声に関連するコンテンツＱを端末装置３０の利用に提供することが可能である。第１実施形態では、第１言語で発音された案内音声を第２言語に変換したコンテンツＱが生成されるから、第１言語を理解可能な利用者は案内音声の聴取により空港施設Ｃの案内を把握し、第２言語を理解可能な利用者はコンテンツＱの参照で当該案内を把握することが可能である。 As described above, in the first embodiment, the guidance voice collected by the sound collection unit 22 is emitted from the sound emitting unit 28 and heard by the user of the terminal device 30, while the content Q related to the guidance voice is generated and distributed from the distribution unit 120 to the terminal device 30. Therefore, the content Q related to the guidance voice can be provided to the user of the terminal device 30. In the first embodiment, the content Q is generated by converting the guidance voice uttered in the first language into the second language, so a user who understands the first language can follow the guidance for the airport facility C by listening to the guidance voice, and a user who understands the second language can follow it by referring to the content Q.

ところで、案内音声に関連するコンテンツＱの生成に相応の時間が必要である場合には、収音部２２による案内音声の放音から大きく遅延した時点で当該案内音声のコンテンツＱが端末装置３０に配信および出力される。したがって、案内音声のコンテンツＱの配信が利用者に事前に報知されない構成（前述の表示処理部３５０を省略した構成である。以下「対比例」という）では、放音部２８から放音される案内音声と放音後に端末装置３０の出力部３６０から出力されるコンテンツＱとの対応を把握し難いという問題が発生し得る。第１実施形態では、案内音声に関連するコンテンツＱの配信を端末装置３０に通知する予告情報（識別情報Ｄ）の音響成分が当該案内音声の音響成分とともに放音部２８から放音されて直後のコンテンツＱの配信が利用者に事前に報知されるから、案内音声とコンテンツＱとの対応（案内音声の放音後にコンテンツＱが配信されること）を端末装置３０の利用者が把握できるという利点もある。 Incidentally, when generating the content Q related to the guidance voice takes an appreciable amount of time, the content Q for that guidance voice is distributed to and output by the terminal device 30 considerably later than the point at which the guidance voice, collected by the sound collection unit 22, is emitted. Consequently, in a configuration in which the user is not notified in advance of the distribution of the content Q for the guidance voice (a configuration that omits the display processing unit 350 described above; hereinafter the "comparative example"), the user may find it hard to grasp the correspondence between the guidance voice emitted from the sound emitting unit 28 and the content Q output from the output unit 360 of the terminal device 30 after the emission. In the first embodiment, the acoustic component of the advance notice information (identification information D), which notifies the terminal device 30 of the distribution of the content Q related to the guidance voice, is emitted from the sound emitting unit 28 together with the acoustic component of the guidance voice, and the user is notified in advance of the imminent distribution of the content Q. This has the further advantage that the user of the terminal device 30 can grasp the correspondence between the guidance voice and the content Q (namely, that the content Q is distributed after the guidance voice is emitted).

第１実施形態では、放音部２８が放音した音響から抽出される識別情報Ｄを指定した配信要求Ｒを配信部１２０が端末装置３０から受信した場合に、識別情報Ｄに対応するコンテンツＱが配信部１２０から端末装置３０に配信される。すなわち、端末装置３０からの配信要求Ｒに対してコンテンツＱが配信されるから、コンテンツＱの配信対象として端末装置３０を事前に登録する必要がない。また、識別情報Ｄが配信要求Ｒで指定されるから、例えば複数のコンテンツＱが端末装置３０に対する配信候補として記憶部１３０に格納された場合でも、利用者が聴取した案内音声に対応するコンテンツＱを容易に特定できるという利点がある。 In the first embodiment, when the distribution unit 120 receives from the terminal device 30 a distribution request R specifying the identification information D extracted from the sound emitted by the sound emitting unit 28, the content Q corresponding to that identification information D is distributed from the distribution unit 120 to the terminal device 30. That is, because the content Q is distributed in response to the distribution request R from the terminal device 30, there is no need to register the terminal device 30 in advance as a distribution target of the content Q. Furthermore, because the identification information D is specified in the distribution request R, there is the advantage that the content Q corresponding to the guidance voice heard by the user can be identified easily even when, for example, a plurality of contents Q are stored in the storage unit 130 as candidates for distribution to the terminal device 30.

第１実施形態では、配信部１２０は、端末装置３０から複数回にわたり送信される配信要求Ｒを順次に受信し、配信要求Ｒの受信時に取得部１１０によるコンテンツＱの生成が完了している場合には当該コンテンツＱを端末装置３０に配信する一方、コンテンツＱの生成が完了していなければコンテンツＱの配信を実行しない。したがって、例えば識別情報Ｄの受信から所定の時間（例えばコンテンツＱの生成に想定される所要時間の最大値）にわたる待機時間の経過後に端末装置３０から配信要求Ｒを送信する構成と比較して、取得部１１０によるコンテンツＱの生成が完了してから実際に当該コンテンツＱが端末装置３０に配信されるまでの遅延が短縮されるという利点がある。 In the first embodiment, the distribution unit 120 sequentially receives the distribution requests R transmitted repeatedly from the terminal device 30; if generation of the content Q by the acquisition unit 110 is complete when a distribution request R is received, it distributes the content Q to the terminal device 30, and otherwise it does not distribute the content Q. Therefore, compared with a configuration in which the terminal device 30 transmits the distribution request R only after a waiting period of predetermined length (for example, the maximum time expected to be needed to generate the content Q) has elapsed since reception of the identification information D, there is the advantage that the delay between completion of generation of the content Q by the acquisition unit 110 and actual distribution of the content Q to the terminal device 30 is shortened.
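The delay comparison above can be made concrete with a small worked example. The model is a simplification introduced here, not part of the specification: network latency is ignored, requests are assumed to be sent at fixed multiples of a polling interval measured from reception of the identification information D, and the numeric values are purely illustrative.

```python
import math


def delay_with_polling(generation_time, poll_interval):
    """Delay between completion of generation and the first successful request.

    Requests are assumed to go out at poll_interval, 2 * poll_interval, ...
    """
    first_success = math.ceil(generation_time / poll_interval) * poll_interval
    return first_success - generation_time


def delay_with_fixed_wait(generation_time, max_generation_time):
    """Delay when a single request is sent after the assumed worst-case wait."""
    return max_generation_time - generation_time
```

Under these assumptions, content that takes 3.5 seconds to generate is delivered 0.5 seconds after completion with a 1-second polling interval, but 6.5 seconds after completion if the terminal always waits for a 10-second worst case. With polling, the delay never reaches one full polling interval.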

第１実施形態では、案内音声に対する音声認識で発音内容の文字列Ｌを特定し、文字列Ｌを他言語に翻訳した結果をコンテンツＱとして生成する。すなわち、第１言語の案内音声の発音内容の文字列を第２言語に翻訳した結果を表すコンテンツＱが端末装置３０に配信される。したがって、端末装置３０の利用者が案内音声の言語を理解できない場合でも、端末装置３０に配信されたコンテンツＱを確認することで案内音声の発音内容を理解できるという利点がある。 In the first embodiment, the character string L of the pronunciation content is identified by voice recognition for the guidance voice, and the result of translating the character string L into another language is generated as the content Q. That is, the content Q representing the result of translating the character string of the pronunciation content of the first language guidance voice into the second language is delivered to the terminal device 30. Therefore, even when the user of the terminal device 30 cannot understand the language of the guidance voice, there is an advantage that the pronunciation content of the guidance voice can be understood by checking the content Q delivered to the terminal device 30.

＜第２実施形態＞ <Second Embodiment>
本発明の第２実施形態を説明する。以下に例示する各態様において作用や機能が第１実施形態と同様である要素については、第１実施形態の説明で使用した符号を流用して各々の詳細な説明を適宜に省略する。放音部２８による案内音声の放音に連動して予告情報（識別情報Ｄ）を端末装置３０に通知する動作、および、情報抽出部３２０による識別情報Ｄの抽出を契機として表示処理部３５０がコンテンツＱの配信を利用者に予告する動作は、第１実施形態と同様である。 A second embodiment of the present invention will be described. For elements in the aspects exemplified below whose operations and functions are the same as in the first embodiment, the reference numerals used in the description of the first embodiment are reused and detailed descriptions are omitted as appropriate. The operation of notifying the terminal device 30 of the advance notice information (identification information D) in conjunction with the emission of the guidance voice by the sound emitting unit 28, and the operation in which the display processing unit 350 gives the user advance notice of the distribution of the content Q when triggered by the extraction of the identification information D by the information extraction unit 320, are the same as in the first embodiment.

図９は、第２実施形態の情報管理システム１０の構成図である。図９に例示される通り、第２実施形態の情報管理システム１０は、第１実施形態の翻訳処理部１１４を選択処理部１１８に置換した構成である。また、第２実施形態の記憶部１３０は、第１実施形態と同様の情報に加えて案内テーブルＴB1を記憶する。認識処理部１１２および配信部１２０の機能および動作は、第１実施形態と同様であるので、詳細な説明を省略する。 FIG. 9 is a configuration diagram of the information management system 10 of the second embodiment. As illustrated in FIG. 9, the information management system 10 of the second embodiment has a configuration in which the translation processing unit 114 of the first embodiment is replaced with a selection processing unit 118. Further, the storage unit 130 of the second embodiment stores a guide table TB1 in addition to the same information as that of the first embodiment. Since the functions and operations of the recognition processing unit 112 and the distribution unit 120 are the same as those in the first embodiment, detailed description thereof is omitted.

図１０は、案内テーブルＴB1の模式図である。図１０に例示される通り、案内テーブルＴB1には、空港施設Ｃの案内者による発音が予定される案内音声の発音内容を表現する第１言語の複数の文字列（第１文字列）Ｘ（Ｘ1，Ｘ2，Ｘ3，…）と、当該文字列Ｘを第２言語に翻訳した文字列（第２文字列）Ｙ（Ｙ1，Ｙ2，Ｙ3，…）とが相互に対応付けられる。案内テーブルＴB1の各文字列Ｘは、例えば、案内者が案内音声の発音時に参照するアナウンスブックに収録された文章である。図１０では、各種の挨拶文の文字列Ｘとその翻訳文の文字列Ｙとが例示されている。 FIG. 10 is a schematic diagram of the guidance table TB1. As illustrated in FIG. 10, the guidance table TB1 associates a plurality of first-language character strings (first character strings) X (X1, X2, X3, ...), each expressing the content of a guidance voice that a guide at the airport facility C is expected to utter, with character strings (second character strings) Y (Y1, Y2, Y3, ...) obtained by translating the character strings X into the second language. Each character string X in the guidance table TB1 is, for example, a sentence from an announcement book that the guide consults when uttering guidance voices. FIG. 10 illustrates character strings X of various greetings and the character strings Y of their translations.

選択処理部１１８は、認識処理部１１２が案内音声の音響信号ＳGから特定した文字列Ｌに対応する文字列Ｙを案内テーブルＴB1から選択する。具体的には、選択処理部１１８は、案内テーブルＴB1の複数の文字列Ｘ（Ｘ1，Ｘ2，Ｘ3，…）のうち、認識処理部１１２が特定した文字列Ｌに最も類似する１個の文字列Ｘを特定し、当該文字列Ｘに対応付けられた文字列Ｙを選択する。文字列Ｌと文字列Ｘとの類否の判定には、編集距離（レーベンシュタイン距離）等の公知の指標が任意に採用され得る。選択処理部１１８が選択した文字列ＹはコンテンツＱとして識別情報Ｄとともに記憶部１３０に記憶される。以降の処理は第１実施形態と同様である。 The selection processing unit 118 selects from the guidance table TB1 the character string Y corresponding to the character string L that the recognition processing unit 112 identified from the acoustic signal SG of the guidance voice. Specifically, the selection processing unit 118 identifies, among the plurality of character strings X (X1, X2, X3, ...) in the guidance table TB1, the one character string X most similar to the character string L identified by the recognition processing unit 112, and selects the character string Y associated with that character string X. Any known measure, such as the edit distance (Levenshtein distance), may be adopted to judge the similarity between the character string L and a character string X. The character string Y selected by the selection processing unit 118 is stored in the storage unit 130 as the content Q together with the identification information D. The subsequent processing is the same as in the first embodiment.

図１０の例で、文字列Ｌ「おはようございます。」が特定された場合、選択処理部１１８は、案内テーブルＴB1の複数の文字列Ｘのうち、文字列Ｌと編集距離が近似する文字列Ｘ1「おはようございます。」を選択し、文字列Ｘ1に対応付けられる文字列Ｙ1“Good morning”をコンテンツＱとして識別情報Ｄとともに記憶部１３０に格納する。 In the example of FIG. 10, when the character string L “Good morning” is specified, the selection processing unit 118 selects the character string whose edit distance approximates the character string L among the plurality of character strings X in the guidance table TB1. X1 “Good morning” is selected, and the character string Y1 “Good morning” associated with the character string X1 is stored as content Q in the storage unit 130 together with the identification information D.
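The table lookup by edit distance can be sketched as follows. Only the pair "おはようございます。" / "Good morning" comes from the example above; the other two table rows, the function names, and the tie-breaking rule (the first closest row wins) are assumptions added for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


# Guidance table TB1 as (first character string X, second character string Y).
GUIDANCE_TABLE = [
    ("おはようございます。", "Good morning"),
    ("こんにちは。", "Good afternoon"),  # hypothetical row
    ("こんばんは。", "Good evening"),    # hypothetical row
]


def select_translation(recognized, table=GUIDANCE_TABLE):
    """Return the string Y whose paired string X is closest to the recognized string L."""
    _, translated = min(table, key=lambda pair: edit_distance(recognized, pair[0]))
    return translated
```

With this table, a recognition result that drops one character, such as "おはよございます。", is at distance 1 from X1 and still maps to "Good morning", which is the robustness to misrecognition that the second embodiment relies on.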

第２実施形態によっても、第１実施形態と同様の効果を奏することが可能である。また、第２実施形態では、第１言語の文字列Ｘと第２言語の文字列Ｙとが複数組にわたり事前に用意され、音声認識で特定される文字列Ｌに類似する文字列Ｘに対応付けられた文字列ＹがコンテンツＱとして生成される。すなわち、利用者に提供されるコンテンツＱは、事前に用意された文字列に限定される。したがって、認識処理部１１２に誤認識が発生した場合でも適正な文字列Ｙを利用者に提供することが可能である。また、音声認識で特定した文字列Ｌを機械翻訳する第１実施形態の構成と比較して誤訳の可能性を低減できるという利点もある。すなわち、第２実施形態によれば、確実に利用者が理解できる適正な文字列ＹのコンテンツＱを利用者に提供することが可能である。また、以上の構成によれば、文字列Ｌを機械翻訳する第１実施形態と比較してコンテンツＱを容易に生成できるという利点がある。また、以上の構成によれば、情報管理システム１０に翻訳処理部１１４を搭載する必要がないので、情報管理システム１０の構成や処理を簡略化することが可能である。 The second embodiment can also achieve the same effects as the first embodiment. In the second embodiment, a plurality of pairs of a first-language character string X and a second-language character string Y are prepared in advance, and the character string Y associated with the character string X similar to the character string L identified by speech recognition is generated as the content Q. That is, the content Q provided to the user is limited to character strings prepared in advance. Therefore, an appropriate character string Y can be provided to the user even when misrecognition occurs in the recognition processing unit 112. There is also the advantage that the possibility of mistranslation can be reduced compared with the configuration of the first embodiment, in which the character string L identified by speech recognition is machine-translated. That is, according to the second embodiment, the content Q consisting of an appropriate character string Y that the user can reliably understand can be provided to the user. In addition, the above configuration has the advantage that the content Q can be generated more easily than in the first embodiment, which machine-translates the character string L. Moreover, because the above configuration does not require the information management system 10 to include the translation processing unit 114, the configuration and processing of the information management system 10 can be simplified.

＜第３実施形態＞ <Third Embodiment>
認識処理部１１２による認識精度には現実的には限界があり、誤認識が発生する可能性もある。案内音声が誤認識された場合、実際の案内音声の内容を正確に反映したコンテンツＱを端末装置３０の利用者に提供できない問題が生じ得る。そこで、第３実施形態では、案内者が発音した案内音声に対する音声認識で特定された文字列Ｌを案内者が必要に応じて編集することで誤認識を是正する。 In practice, the recognition accuracy of the recognition processing unit 112 is limited, and misrecognition may occur. If the guidance voice is misrecognized, the content Q that accurately reflects what was actually said may not be available to the user of the terminal device 30. Therefore, in the third embodiment, misrecognition is corrected by having the guide edit, as necessary, the character string L identified by speech recognition of the guidance voice that the guide uttered.

図１１は、第３実施形態の音声案内システム１の構成図である。第３実施形態の放音システム２０では、第１実施形態の放音システム２０の構成に対して、表示部２３と操作部２５と編集処理部２７と制御部２９とが付加されている。収音部２２および設定部２６の機能は第１実施形態と同様である。第３実施形態では、記憶部２１に記憶されたプログラムをＣＰＵが実行することで、放音システム２０の各機能（編集処理部２７、制御部２９）を実現する。 FIG. 11 is a configuration diagram of the voice guidance system 1 of the third embodiment. In the sound emission system 20 of the third embodiment, a display unit 23, an operation unit 25, an edit processing unit 27, and a control unit 29 are added to the configuration of the sound emission system 20 of the first embodiment. The functions of the sound collection unit 22 and the setting unit 26 are the same as in the first embodiment. In the third embodiment, the functions of the sound emission system 20 (the edit processing unit 27 and the control unit 29) are realized by the CPU executing a program stored in the storage unit 21.

表示部２３は、各種の情報を表示する表示装置（例えば液晶表示パネル等）である。操作部２５は、放音システム２０に対する指示のために、案内音声を発音する案内者が操作する入力機器である。第３実施形態の操作部２５は、案内者から音響信号ＳGを再生する指示等を受付ける。制御部２９は、案内者からの指示に応じて、記憶部２１に対する音響信号ＳGの書込および読出を制御する。第３実施形態の制御部２９は、案内者が案内音声を発音する毎に収音部２２が生成した音響信号ＳGを記憶部２１に格納する一方、案内者による操作に応じて、音響信号ＳGを記憶部２１から読み出して通信部２４および放音部２８に供給する。通信部２４は、制御部２９から供給された音響信号ＳGを、通信網２００を介して情報管理システム１０に送信する。 The display unit 23 is a display device (for example, a liquid crystal display panel) that displays various types of information. The operation unit 25 is an input device operated by the guide who utters the guidance voice in order to give instructions to the sound emission system 20. The operation unit 25 of the third embodiment accepts, among other things, an instruction from the guide to reproduce the acoustic signal SG. The control unit 29 controls writing of the acoustic signal SG to, and reading of it from, the storage unit 21 in accordance with instructions from the guide. The control unit 29 of the third embodiment stores in the storage unit 21 the acoustic signal SG generated by the sound collection unit 22 each time the guide utters a guidance voice and, in response to an operation by the guide, reads the acoustic signal SG from the storage unit 21 and supplies it to the communication unit 24 and the sound emitting unit 28. The communication unit 24 transmits the acoustic signal SG supplied from the control unit 29 to the information management system 10 via the communication network 200.

第３実施形態の情報管理システム１０は、第１実施形態と同様に、取得部１１０と配信部１２０と記憶部１３０とを具備する。配信部１２０の機能および動作は第１実施形態と同様である。取得部１１０のうち認識処理部１１２は、放音システム２０の通信部２４から受信した音響信号ＳGに対する音声認識で案内音声の文字列Ｌを特定するとともに、当該文字列Ｌを通信網２００を介して放音システム２０に送信する。 As in the first embodiment, the information management system 10 of the third embodiment includes an acquisition unit 110, a distribution unit 120, and a storage unit 130. The function and operation of the distribution unit 120 are the same as in the first embodiment. Within the acquisition unit 110, the recognition processing unit 112 identifies the character string L of the guidance voice by speech recognition of the acoustic signal SG received from the communication unit 24 of the sound emission system 20, and transmits the character string L to the sound emission system 20 via the communication network 200.

放音システム２０の編集処理部２７は、認識処理部１１２が特定した文字列Ｌを操作部２５に対する案内者（指示者）からの指示に応じて編集する。第１実施形態の編集処理部２７は、文字列Ｌを表示部２３に表示させる。案内者は、表示部２３に表示された文字列Ｌを確認しながら操作部２５を適宜に操作することで文字列Ｌの変更を指示することが可能である。具体的には、案内者は、表示部２３に表示された文字列Ｌと自身が直前に発音した案内音声の発音内容との間に齟齬があれば、自身が直前に発音した案内音声の発音内容に一致するように文字列Ｌの変更を指示する。編集処理部２７は、操作部２５に対する利用者からの指示に応じて文字列Ｌを編集することで文字列Ｗを生成する。文字列Ｌと発音内容とに齟齬がない場合には文字列Ｌが編集後の文字列Ｗとして確定する。 The edit processing unit 27 of the sound emission system 20 edits the character string L specified by the recognition processing unit 112 in accordance with an instruction from the guide (instructor) to the operation unit 25. The edit processing unit 27 of the first embodiment displays the character string L on the display unit 23. The guide can instruct to change the character string L by appropriately operating the operation unit 25 while confirming the character string L displayed on the display unit 23. Specifically, if there is a discrepancy between the character string L displayed on the display unit 23 and the pronunciation content of the guidance voice that was pronounced immediately before, the guide will pronounce the guidance voice that was pronounced immediately before. The change of the character string L is instructed so as to match the contents. The edit processing unit 27 generates the character string W by editing the character string L in accordance with an instruction from the user to the operation unit 25. If there is no discrepancy between the character string L and the pronunciation content, the character string L is determined as the edited character string W.

編集処理部２７による処理が終了すると、案内者は、編集の完了を操作部２５に対する操作で指示（以下「編集完了指示」という）する。編集完了指示を契機として、通信部２４は、編集処理部２７による編集後の文字列Ｗを、当該案内音声のコンテンツＱの識別情報Ｄとともに情報管理システム１０に送信する。他方、制御部２９は、編集完了指示を契機として、記憶部２１に記憶された音響信号ＳGを読み出して放音部２８に供給する。すなわち、編集完了指示は実質的には音響信号ＳGの再生指示と表現され得る。 When the processing by the editing processing unit 27 ends, the guide instructs the completion of editing by operating the operating unit 25 (hereinafter referred to as “editing completion instruction”). In response to the editing completion instruction, the communication unit 24 transmits the character string W edited by the editing processing unit 27 to the information management system 10 together with the identification information D of the content Q of the guidance voice. On the other hand, the control unit 29 reads out the acoustic signal SG stored in the storage unit 21 and supplies it to the sound emitting unit 28 in response to the editing completion instruction. That is, the editing completion instruction can be substantially expressed as an instruction to reproduce the acoustic signal SG.
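The way a single editing completion instruction triggers both the transmission of the edited character string W and the playback of the stored acoustic signal SG might be modeled as below. The class, its method names, and the two callbacks standing in for the communication unit 24 and the sound emitting unit 28 are hypothetical; the sketch only illustrates the ordering of events described above, not an actual implementation.

```python
class SoundEmissionSystemSketch:
    """Hypothetical model of the third-embodiment flow in the sound emission system 20."""

    def __init__(self, send_text, emit_sound):
        self._send_text = send_text    # stands in for the communication unit 24
        self._emit_sound = emit_sound  # stands in for the sound emitting unit 28
        self._stored_signal = None     # acoustic signal SG kept in the storage unit 21
        self._text = None              # character string L, which becomes W after editing

    def record(self, signal_sg):
        """Store the guide's utterance instead of emitting it immediately."""
        self._stored_signal = signal_sg

    def show_recognition_result(self, text_l):
        """Hold the recognized character string L for review by the guide."""
        self._text = text_l

    def edit(self, corrected_text):
        """Apply the guide's correction; if never called, L is used unchanged as W."""
        self._text = corrected_text

    def complete_editing(self, identifier_d):
        """Editing completion instruction: send W and start playback together."""
        self._send_text(identifier_d, self._text)
        self._emit_sound(self._stored_signal, identifier_d)
```

Nothing is sent or played until `complete_editing` is called, so translation of W starts at roughly the same moment as playback rather than after a recognition and review step.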

放音部２８の信号処理部２８０は、第１実施形態と同様の手法により、制御部２９から供給された音響信号ＳGに識別情報Ｄを合成して音響信号Ｓ1を生成する。放音部２８のスピーカー２８６は、第１実施形態と同様に、信号処理部２８０から供給される音響信号Ｓ1に応じた音響を放音する。放音部２８から放音された音響から情報抽出部３２０が識別情報Ｄを抽出する動作や識別情報Ｄの抽出を契機として表示処理部３５０がコンテンツＱの配信を利用者に予告する動作は第１実施形態と同様である。 The signal processing unit 280 of the sound emitting unit 28 generates the acoustic signal S1 by combining the identification information D with the acoustic signal SG supplied from the control unit 29, using the same method as in the first embodiment. As in the first embodiment, the speaker 286 of the sound emitting unit 28 emits sound corresponding to the acoustic signal S1 supplied from the signal processing unit 280. The operation in which the information extraction unit 320 extracts the identification information D from the sound emitted by the sound emitting unit 28, and the operation in which the display processing unit 350 gives the user advance notice of the distribution of the content Q when triggered by that extraction, are the same as in the first embodiment.

情報管理システム１０の翻訳処理部１１４は、編集処理部２７による編集後の文字列Ｗを受信し、文字列Ｗを他言語に翻訳することでコンテンツＱを生成する。翻訳処理部１１４が生成したコンテンツＱは、放音システム２０から送信された識別情報Ｄとともに記憶部１３０に格納される。端末装置３０からの配信要求Ｒに応じて配信部１２０がコンテンツＱを配信する動作は第１実施形態と同様である。 The translation processing unit 114 of the information management system 10 receives the character string W edited by the editing processing unit 27, and generates the content Q by translating the character string W into another language. The content Q generated by the translation processing unit 114 is stored in the storage unit 130 together with the identification information D transmitted from the sound emission system 20. The operation in which the distribution unit 120 distributes the content Q in response to the distribution request R from the terminal device 30 is the same as that in the first embodiment.

以上の説明から理解される通り、第３実施形態では、編集完了指示を契機として、収録済の案内音声（音響信号Ｓ1）の放音と、当該案内音声に関連するコンテンツＱの生成とが実行される。すなわち、文字列Ｌの編集が完了してから案内音声が放音されるから、案内音声の放音からコンテンツＱの配信までの遅延を低減することが可能である。 As understood from the above description, in the third embodiment, the editing completion instruction triggers both the emission of the recorded guidance voice (acoustic signal S1) and the generation of the content Q related to that guidance voice. That is, since the guidance voice is emitted only after the editing of the character string L is completed, the delay from the emission of the guidance voice to the distribution of the content Q can be reduced.

以上に説明したように、第３実施形態では、認識処理部１１２が特定した文字列Ｌを、案内者の指示に応じて編集し、編集後の文字列Ｗを他言語に翻訳することでコンテンツＱを生成する。したがって、認識処理部１１２による認識結果（文字列Ｌ）が誤認識を含む場合でも、編集後の文字列Ｗに対応するコンテンツＱを配信することにより、案内音声の発音内容を正確に利用者に通知できるという効果が実現される。 As described above, in the third embodiment, the character string L specified by the recognition processing unit 112 is edited in accordance with an instruction from the guide, and the content Q is generated by translating the edited character string W into another language. Therefore, even when the recognition result (character string L) from the recognition processing unit 112 includes misrecognition, distributing the content Q corresponding to the edited character string W makes it possible to notify the user accurately of the pronunciation content of the guidance voice.
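
The trigger relationship of the third embodiment can be sketched roughly as follows. This is an illustrative sketch only and not the implementation of the specification: the class and method names (`InformationManagementSystem`, `SoundEmissionSystem`, `on_edit_complete`) are hypothetical, the translation step is a caller-supplied stand-in function, and sound emission is modelled simply as appending to a list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class InformationManagementSystem:
    """Hypothetical stand-in for the information management system 10."""
    translate: Callable[[str], str]  # stand-in for the translation processing unit 114
    storage: Dict[str, str] = field(default_factory=dict)  # stand-in for the storage unit 130

    def receive(self, identification_d: str, edited_w: str) -> None:
        # Translate the edited string W and store the result as content Q under D.
        self.storage[identification_d] = self.translate(edited_w)


@dataclass
class SoundEmissionSystem:
    """Hypothetical stand-in for the sound emission system 20."""
    manager: InformationManagementSystem
    recorded_sg: bytes  # recorded acoustic signal SG held in the storage unit 21
    emitted: List[Tuple[str, bytes]] = field(default_factory=list)

    def on_edit_complete(self, identification_d: str, edited_w: str) -> None:
        # The editing completion instruction triggers two actions:
        # 1) send the edited string W together with D to the management system,
        self.manager.receive(identification_d, edited_w)
        # 2) emit the recorded signal SG with D attached (modelled as a log entry).
        self.emitted.append((identification_d, self.recorded_sg))
```

Because both actions hang off the same trigger, content generation starts at the moment playback starts rather than after it, which is the source of the reduced delay described above.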

＜変形例＞ <Modification>
以上に例示した各態様は多様に変形され得る。具体的な変形の態様を以下に例示する。以下の例示から任意に選択された２個以上の態様は、相互に矛盾しない範囲で適宜に併合され得る。 Each aspect illustrated above can be variously modified. Specific modifications are exemplified below. Two or more modes arbitrarily selected from the following examples can be appropriately combined within a range that does not contradict each other.

（１）前述の各形態では、音声案内システム１が利用される場面として、航空会社が運行する航空便に関する音声案内を例示したが、音声案内システム１が利用される場面は以上の例示に限定されない。例えば、電車やバス等の交通機関の音声案内や、美術館や博物館等の展示施設、宿泊施設、商業施設等の各種の施設の音声案内に、前述の各形態と同様の音声案内システム１が利用される。また、各種の施設にて火災や地震等の災害が発生した場合の情報提供（例えば避難の案内や状況の通知）に音声案内システム１を利用することも可能である。また、収音部２２による収音および放音部２８による放音の対象となる音声は案内音声に限定されない。例えば、音楽等の各種の音響を収音部２２により収音して放音部２８から放音する場合にも前述の各形態は採用され得る。以上の説明から理解される通り、前述の各形態の案内音声は、再生対象となる音響（対象音）の一例である。 (1) In each of the above-described embodiments, voice guidance for flights operated by an airline was given as an example of a scene in which the voice guidance system 1 is used, but the scenes in which the voice guidance system 1 is used are not limited to this example. For example, a voice guidance system 1 similar to those of the above-described embodiments may be used for voice guidance in transportation such as trains and buses, and for voice guidance in various facilities such as exhibition facilities (for example, art galleries and museums), accommodation facilities, and commercial facilities. It is also possible to use the voice guidance system 1 for providing information (for example, evacuation guidance and status notification) when a disaster such as a fire or an earthquake occurs in such facilities. Further, the sound that is collected by the sound collection unit 22 and emitted by the sound emission unit 28 is not limited to guidance voice. For example, the above-described embodiments can also be adopted when various sounds such as music are collected by the sound collection unit 22 and emitted from the sound emission unit 28. As understood from the above description, the guidance voice in each of the above-described embodiments is an example of sound to be reproduced (target sound).

（２）前述の各形態では、取得部１１０が認識処理部１１２と翻訳処理部１１４とを含む構成を例示したが、認識処理部１１２や翻訳処理部１１４が設置される位置（音声認識や機械翻訳が実行される段階）は以上の例示に限定されない。例えば、認識処理部１１２と翻訳処理部１１４とを端末装置３０に設置し、放音システム２０から送信された音響信号ＳGをコンテンツＱとして情報管理システム１０から端末装置３０に配信することも可能である。以上の構成では、取得部１１０は、放音システム２０の通信部２４から送信される音響信号ＳGをコンテンツＱとして取得して、当該コンテンツＱと識別情報Ｄとを対応付け、配信部１２０は、識別情報Ｄを含むコンテンツＱの配信要求Ｒを端末装置３０から受信した場合に、識別情報Ｄに対応するコンテンツＱ（音響信号ＳG）を配信する。端末装置３０の認識処理部１１２は、受信したコンテンツＱに対して音声認識を実行することで案内音声の文字列Ｌを特定し、端末装置３０の翻訳処理部１１４は、認識処理部１１２が特定した文字列Ｌを第２言語に翻訳してコンテンツＱを再生する。 (2) In the above-described embodiments, the configuration in which the acquisition unit 110 includes the recognition processing unit 112 and the translation processing unit 114 has been exemplified. However, the positions where the recognition processing unit 112 and the translation processing unit 114 are installed (the stage at which voice recognition and machine translation are executed) are not limited to the above examples. For example, it is also possible to install the recognition processing unit 112 and the translation processing unit 114 in the terminal device 30 and to distribute the acoustic signal SG transmitted from the sound emitting system 20 from the information management system 10 to the terminal device 30 as the content Q. In this configuration, the acquisition unit 110 acquires the acoustic signal SG transmitted from the communication unit 24 of the sound emission system 20 as the content Q and associates the content Q with the identification information D, and the distribution unit 120, upon receiving from the terminal device 30 a distribution request R for the content Q that includes the identification information D, distributes the content Q (acoustic signal SG) corresponding to the identification information D. The recognition processing unit 112 of the terminal device 30 specifies the character string L of the guidance voice by executing voice recognition on the received content Q, and the translation processing unit 114 of the terminal device 30 translates the character string L specified by the recognition processing unit 112 into the second language and reproduces the content Q.

また、認識処理部１１２と翻訳処理部１１４とを放音システム２０に設置することも可能である。放音システム２０の通信部２４は、認識処理部１１２による音響信号ＳGの認識と翻訳処理部１１４による機械翻訳とで生成された文字列Ｌを情報管理システム１０に送信する。情報管理システム１０の取得部１１０は、放音システム２０で生成された文字列ＬをコンテンツＱとして取得する。認識処理部１１２を放音システム２０に設置して翻訳処理部１１４を情報管理システム１０に設置することも可能である。なお、放音システム２０の認識処理部１１２と翻訳処理部１１４とが生成した文字列ＬをコンテンツＱとして、放音システム２０から直接的に（すなわち情報管理システム１０を介在することなく）端末装置３０に送信することも可能である。放音システム２０から端末装置３０に対するコンテンツＱの送信には、前述の各形態で例示した音響通信のほか、電波や赤外線を利用した近距離無線通信（アドホック通信）が好適に利用される。以上の構成によれば、通信網２００を利用した通信を端末装置３０が実行する必要がないから、例えば通信網２００を利用した通信サービスに非加入の利用者（例えば、外国人旅行者）でもコンテンツＱを利用できるという利点がある。 In addition, the recognition processing unit 112 and the translation processing unit 114 can be installed in the sound emitting system 20. The communication unit 24 of the sound emission system 20 transmits to the information management system 10 the character string L generated by the recognition of the acoustic signal SG by the recognition processing unit 112 and the machine translation by the translation processing unit 114. The acquisition unit 110 of the information management system 10 acquires the character string L generated by the sound emission system 20 as the content Q. It is also possible to install the recognition processing unit 112 in the sound emitting system 20 and the translation processing unit 114 in the information management system 10. Note that the character string L generated by the recognition processing unit 112 and the translation processing unit 114 of the sound emission system 20 can also be transmitted as the content Q directly from the sound emission system 20 to the terminal device 30 (that is, without going through the information management system 10). For the transmission of the content Q from the sound emitting system 20 to the terminal device 30, short-range wireless communication (ad hoc communication) using radio waves or infrared rays is suitably used in addition to the acoustic communication exemplified in the above-described embodiments. With this configuration, the terminal device 30 does not need to communicate over the communication network 200, so there is an advantage that even a user who does not subscribe to a communication service using the communication network 200 (for example, a foreign traveler) can use the content Q.

以上の説明から理解される通り、前述の各形態における取得部１１０は、対象音に関連するコンテンツＱを取得する要素として包括的に表現され、それ自身の動作（例えば認識処理部１１２による音声認識や翻訳処理部１１４による機械翻訳）によりコンテンツＱを生成する要素のほか、放音システム２０等の外部装置で生成されたコンテンツＱ（例えば音響信号ＳGや翻訳後の文字列）を取得する要素も包含する。 As understood from the above description, the acquisition unit 110 in each of the above-described embodiments is comprehensively expressed as an element that acquires the content Q related to the target sound. It encompasses not only an element that generates the content Q by its own operation (for example, voice recognition by the recognition processing unit 112 or machine translation by the translation processing unit 114) but also an element that acquires content Q generated by an external device such as the sound emission system 20 (for example, the acoustic signal SG or a translated character string).

（３）コンテンツＱの内容は前述の各形態での例示に限定されない。例えば、前述の各形態では、案内音声の発音内容を翻訳した文字列をコンテンツＱとして生成したが、例えば、翻訳後の文字列を適用した音声合成で生成された合成音声を表すコンテンツＱを生成してもよい。コンテンツＱが音響を表す場合、当該音響を放音するスピーカーやイヤホン等の放音装置が出力部３６０として利用される。翻訳処理部１１４による翻訳後の文字列を表すコンテンツＱと、翻訳後の文字列を発音した合成音声を表すコンテンツＱとは、翻訳処理部１１４による処理結果（翻訳結果）を表すコンテンツとして包括される。なお、以上の説明では翻訳処理部１１４を含む構成（例えば第１実施形態）を想定したが、第２実施形態においても同様に、選択処理部１１８が選択した文字列Ｙを発音した合成音声を表すコンテンツＱを生成することが可能である。選択処理部１１８が選択した文字列Ｙを表すコンテンツＱと、文字列Ｙの合成音声を表すコンテンツＱとは、選択処理部１１８が選択した文字列Ｙを表すコンテンツＱとして包括的に表現される。 (3) The content Q is not limited to the examples in the above-described embodiments. For example, in each of the above-described embodiments, a character string obtained by translating the pronunciation content of the guidance voice is generated as the content Q, but content Q representing a synthesized voice generated by voice synthesis applied to the translated character string may be generated instead. When the content Q represents sound, a sound emitting device such as a speaker or an earphone that emits the sound is used as the output unit 360. The content Q representing the character string translated by the translation processing unit 114 and the content Q representing the synthesized voice that pronounces the translated character string are both encompassed as content representing the processing result (translation result) of the translation processing unit 114. The above description assumes a configuration including the translation processing unit 114 (for example, the first embodiment), but in the second embodiment it is likewise possible to generate content Q representing a synthesized voice that pronounces the character string Y selected by the selection processing unit 118. The content Q representing the character string Y selected by the selection processing unit 118 and the content Q representing the synthesized voice of the character string Y are comprehensively expressed as content Q representing the character string Y selected by the selection processing unit 118.

また、案内音声の音響信号ＳGに対する音声認識で特定した文字列Ｌ自体をコンテンツＱとして生成してもよい。また、例えば、案内音声の発音内容に対する補足事項や関連情報（例えば案内音声による案内対象となる施設や場所等の事象に関連する情報のように意味や内容自体は案内音声と必ずしも一致しない情報）を表すコンテンツＱを配信する構成や、案内音声に関連する情報（例えば前述の各形態で例示したコンテンツＱ）の所在を示すリンク情報（例えばＵＲＬ）をコンテンツＱとして情報提供システム１００から端末装置３０に配信する構成も採用され得る。以上に例示した種々のコンテンツＱは、対象音に関連する情報として包括的に表現される。案内音声の発音内容やその翻訳文の文字列または音声等を表すコンテンツＱのように案内音声と意味または内容が相関するという関係は、「対象音に関連する」関係の典型例であるが、対象音に関連する情報のリンク情報を表すコンテンツＱのように当該情報の所在を表すという関係も「対象音に関連する」関係には包含され得る。 Further, the character string L itself, specified by voice recognition on the acoustic signal SG of the guidance voice, may be generated as the content Q. It is also possible to adopt, for example, a configuration that distributes content Q representing supplementary items or related information for the pronunciation content of the guidance voice (for example, information whose meaning or content does not necessarily match the guidance voice, such as information about the facility, place, or other matter that the guidance voice concerns), or a configuration in which link information (for example, a URL) indicating the location of information related to the guidance voice (for example, the content Q exemplified in each of the above-described embodiments) is distributed as the content Q from the information providing system 100 to the terminal device 30. The various kinds of content Q exemplified above are comprehensively expressed as information related to the target sound. A relationship in which the meaning or content correlates with the guidance voice, as with content Q representing the pronunciation content of the guidance voice or the character string or voice of its translation, is a typical example of a relationship of being "related to the target sound"; however, a relationship of indicating the location of information, as with content Q representing link information for information related to the target sound, can also be included in that relationship.

（４）前述の各形態では、識別情報Ｄを指定した配信要求Ｒを送信した端末装置３０にコンテンツＱを配信（すなわちプル型配信）する構成を例示したが、端末装置３０による配信要求Ｒの送信は必須ではない。例えば、配信対象として情報管理システム１０に事前に登録された端末装置３０に対し、コンテンツＱの生成を契機として当該コンテンツＱを配信（すなわちプッシュ型配信）することも可能である。配信対象となる端末装置３０の登録方法は任意であるが、例えば、空港施設Ｃ内に位置する端末装置３０を登録する構成が好適である。具体的には、空港施設Ｃに設置されたＱＲコード（登録商標）の読取や空港施設Ｃ内の近距離無線機からの無線信号の受信を契機として端末装置３０が登録要求を送信し、登録要求の送信元の端末装置３０を情報管理システム１０が配信対象として登録すれば、コンテンツＱの配信対象を空港施設Ｃ内の端末装置３０に制限することが可能である。 (4) In each of the above-described embodiments, the configuration in which the content Q is distributed (that is, pull-type distribution) to the terminal device 30 that has transmitted the distribution request R specifying the identification information D is illustrated. Transmission is not mandatory. For example, the content Q can be distributed (ie, push-type distribution) to the terminal device 30 registered in advance in the information management system 10 as a distribution target, triggered by the generation of the content Q. Although the registration method of the terminal device 30 to be distributed is arbitrary, for example, a configuration in which the terminal device 30 located in the airport facility C is registered is preferable. Specifically, the terminal device 30 transmits a registration request triggered by reading a QR code (registered trademark) installed in the airport facility C or receiving a radio signal from a short-range radio in the airport facility C. If the information management system 10 registers the request transmission source terminal device 30 as a distribution target, the distribution target of the content Q can be limited to the terminal device 30 in the airport facility C.
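
The push-type distribution described in (4) can be sketched as below. This is a minimal sketch under stated assumptions, not the specification's implementation: `PushDistributor`, `register`, and `on_content_generated` are hypothetical names, terminals are represented by plain string identifiers, and actual network delivery is replaced by appending to an in-memory list.

```python
from typing import List, Set, Tuple


class PushDistributor:
    """Hypothetical model of push-type distribution by the information management system 10."""

    def __init__(self) -> None:
        self.registered: Set[str] = set()            # terminals registered as distribution targets
        self.sent: List[Tuple[str, str, str]] = []   # (terminal, identification D, content Q)

    def register(self, terminal_id: str) -> None:
        # Called when a terminal sends a registration request, for example after
        # reading a code or receiving a short-range radio signal inside the facility.
        self.registered.add(terminal_id)

    def on_content_generated(self, identification_d: str, content_q: str) -> int:
        # Generation of content Q is the trigger: no distribution request R is needed.
        for terminal_id in sorted(self.registered):
            self.sent.append((terminal_id, identification_d, content_q))
        return len(self.registered)
```

In this sketch a terminal that never registered receives nothing, which mirrors how registration limits the distribution targets to terminals inside the facility.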

（５）前述の各形態では、コンテンツＱの提供を端末装置３０に通知されるための予告情報として、案内音声のコンテンツＱの識別情報Ｄを例示したが、予告情報は以上の例示に限定されない。例えば、案内音声を放音するスピーカー２８６を識別するための識別情報Ｄを予告情報としてもよい。スピーカー２８６の識別情報Ｄを予告情報とした構成では、スピーカー２８６が放音した案内音声を収音した端末装置３０が当該識別情報Ｄを含むコンテンツＱの配信要求Ｒを送信した場合に、当該スピーカー２８６で放音した最新の案内音声を表すコンテンツＱを配信してもよい。また、予告情報としては、識別情報Ｄ以外でもよい。例えば、図６で例示したように、コンテンツＱの配信を利用者に通知するメッセージを予告情報としてもよい。予告情報としては、端末装置３０がコンテンツＱの配信を利用者に報知する動作の契機として当該端末装置３０に認識され得る情報であれば足りる。すなわち、予告情報は、案内音声に関連するコンテンツＱの提供を端末装置３０に通知するための情報として包括的に表現される。 (5) In the above-described embodiments, the identification information D of the content Q of the guidance voice is exemplified as the advance notice information for notifying the terminal device 30 of the provision of the content Q. However, the advance notice information is not limited to this example. For example, identification information D for identifying the speaker 286 that emits the guidance voice may be used as the advance notice information. In a configuration in which the identification information D of the speaker 286 is used as the advance notice information, when the terminal device 30 that has collected the guidance voice emitted by the speaker 286 transmits a distribution request R for the content Q that includes the identification information D, the content Q representing the latest guidance voice emitted by that speaker 286 may be distributed. Further, the advance notice information may be something other than the identification information D. For example, as illustrated in FIG. 6, a message notifying the user of the delivery of the content Q may be used as the advance notice information. It suffices for the advance notice information to be information that the terminal device 30 can recognize as a trigger for notifying the user of the distribution of the content Q. That is, the advance notice information is comprehensively expressed as information for notifying the terminal device 30 of the provision of the content Q related to the guidance voice.
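
The speaker-keyed variant in (5) can be illustrated with the following sketch. It is illustrative only: `LatestContentBySpeaker` and its method names are hypothetical, and the specification does not state how the "latest" content is tracked; overwriting a per-speaker entry is one assumed way to do it.

```python
from typing import Dict, Optional


class LatestContentBySpeaker:
    """Hypothetical store that keeps only the newest content Q per speaker identifier."""

    def __init__(self) -> None:
        self._latest: Dict[str, str] = {}

    def on_guidance_emitted(self, speaker_id: str, content_q: str) -> None:
        # Each new announcement from a speaker replaces that speaker's previous entry.
        self._latest[speaker_id] = content_q

    def handle_request(self, speaker_id: str) -> Optional[str]:
        # A distribution request R carries the speaker's identification information D.
        # None means no content has been acquired for that speaker yet.
        return self._latest.get(speaker_id)
```

With this keying, the identification information embedded in the emitted sound can stay fixed per speaker instead of changing with every announcement.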

（６）第３実施形態では、収音部２２が収音した音響信号ＳGを記憶部２１に記憶し、案内者からの指示に応じた文字列Ｌの編集後に音響信号ＳGを記憶部２１から放音部２８に供給したが、文字列Ｌの編集を待たず、第１実施形態と同様に案内者による発音に並行して実時間的に案内音声を放音することも可能である。 (6) In the third embodiment, the sound signal SG picked up by the sound pickup unit 22 is stored in the storage unit 21, and the sound signal SG is stored from the storage unit 21 after editing the character string L according to the instruction from the guide. Although supplied to the sound emitting unit 28, it is also possible to emit the guidance voice in real time in parallel with the pronunciation by the guide as in the first embodiment without waiting for the editing of the character string L.

（７）複数のコンテンツＱを各々の識別情報Ｄに対応させて端末装置３０の記憶部３７０に事前に格納した構成も採用され得る。各コンテンツＱは、案内者による発音が予定される案内音声の文字列の翻訳文（第２実施形態の文字列Ｙ）を表す。端末装置３０の情報抽出部３２０は、放音システム２０の放音部２８が放音した音響（案内音声と識別情報Ｄの音響成分との混合音）を収音した音響信号Ｓ2から識別情報Ｄを抽出し、表示処理部３５０は、記憶部３７０に記憶された複数のコンテンツＱのうち、情報抽出部３２０が抽出した識別情報Ｄに対応するコンテンツＱを出力部３６０に再生させる。以上の構成によれば、通信網２００を利用した通信を端末装置３０が実行する必要がないから、例えば通信網２００を利用した通信サービスに非加入の利用者（例えば外国人旅行者）でもコンテンツＱを利用できるという利点がある。 (7) A configuration in which a plurality of contents Q are stored in advance in the storage unit 370 of the terminal device 30, each in association with its identification information D, may also be employed. Each content Q represents a translation (the character string Y of the second embodiment) of the character string of a guidance voice that the guide is expected to pronounce. The information extraction unit 320 of the terminal device 30 extracts the identification information D from the acoustic signal S2 obtained by collecting the sound emitted by the sound emission unit 28 of the sound emission system 20 (a mixture of the guidance voice and the acoustic component of the identification information D), and the display processing unit 350 causes the output unit 360 to reproduce, from among the plurality of contents Q stored in the storage unit 370, the content Q corresponding to the identification information D extracted by the information extraction unit 320. With this configuration, the terminal device 30 does not need to communicate over the communication network 200, so there is an advantage that even a user who does not subscribe to a communication service using the communication network 200 (for example, a foreign traveler) can use the content Q.
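
The terminal-side lookup in (7) amounts to a local table keyed by the identification information D. The sketch below assumes hypothetical identifiers and sample strings (`D-001`, `D-002` and their texts are invented for illustration and do not appear in the specification), and it takes the already-extracted identification information as input rather than modelling the acoustic extraction itself.

```python
from typing import Dict, Optional

# Hypothetical pre-stored contents Q in the storage unit 370, keyed by identification D.
PRESTORED_CONTENT: Dict[str, str] = {
    "D-001": "Boarding has started.",
    "D-002": "The departure gate has changed.",
}


def content_for(identification_d: str,
                table: Dict[str, str] = PRESTORED_CONTENT) -> Optional[str]:
    """Return the pre-stored content Q for an extracted identification D, if any.

    No network access is involved; an unknown identifier yields None.
    """
    return table.get(identification_d)
```

Since the lookup is purely local, the trade-off is that only announcements anticipated in advance can be covered.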

（８）前述の各形態における情報管理システム１０は、単体の装置として実現されるほか、相互に別体で構成された複数の装置（サーバ）としても実現され得る。例えば、前述の各形態の情報管理システム１０を、認識処理部１１２を含む第１サーバと、翻訳処理部１１４を含む第２サーバと、配信部１２０を含む第３サーバとに分散し、第１サーバと第２サーバと第３サーバとが例えば通信網２００を介して相互に通信する構成も採用され得る。 (8) In addition to being realized as a single device, the information management system 10 in each of the above embodiments may be realized as a plurality of devices (servers) configured separately from each other. For example, a configuration may be employed in which the information management system 10 of each of the above embodiments is distributed across a first server including the recognition processing unit 112, a second server including the translation processing unit 114, and a third server including the distribution unit 120, and in which the first, second, and third servers communicate with one another via, for example, the communication network 200.

（９）前述の各形態では、端末装置３０に対する識別情報Ｄの通知に音響通信を利用したが、識別情報Ｄを端末装置３０に通知する通信の方式は以上の例示に限定されない。例えば、赤外線や電磁波を利用した無線通信（例えば近距離無線通信）で端末装置３０に識別情報Ｄを通知することも可能である。 (9) In each of the above-described embodiments, acoustic communication is used for notification of the identification information D to the terminal device 30, but the communication method for notifying the terminal device 30 of the identification information D is not limited to the above examples. For example, it is also possible to notify the identification information D to the terminal device 30 by wireless communication using infrared rays or electromagnetic waves (for example, short-range wireless communication).

（１０）翻訳処理部１１４による翻訳後の文字列Ｌ（第２言語）をコンテンツＱとして端末装置３０に配信する構成に加えて、当該文字列Ｌを発音した音声（すなわち第２言語の案内音声）を放音システム２０の放音部２８から放音することも可能である。例えば、図１２に例示される通り、情報管理システム１０に音声合成部１４０が設置される。音声合成部１４０は、翻訳処理部１１４による翻訳後の文字列を適用した音声合成により、当該文字列Ｌを発音した合成音声の音響信号ＳLを生成する。すなわち、認識処理部１１２および翻訳処理部１１４は、コンテンツＱの生成と音響信号ＳLの生成とに流用される。なお、以上の説明では、翻訳処理部１１４を具備する構成（例えば第１実施形態）を例示したが、第２実施形態の選択処理部１１８が案内テーブルＴB1から選択した文字列Ｙを発音した合成音声の音響信号ＳLを音声合成部１４０が生成することも可能である。 (10) In addition to the configuration in which the character string L translated by the translation processing unit 114 (second language) is distributed to the terminal device 30 as the content Q, it is also possible to emit a voice that pronounces the character string L (that is, guidance voice in the second language) from the sound emission unit 28 of the sound emission system 20. For example, as illustrated in FIG. 12, a speech synthesis unit 140 is installed in the information management system 10. The speech synthesis unit 140 generates an acoustic signal SL of a synthesized voice that pronounces the character string L, by voice synthesis applied to the character string translated by the translation processing unit 114. That is, the recognition processing unit 112 and the translation processing unit 114 are shared between the generation of the content Q and the generation of the acoustic signal SL. The above description exemplifies a configuration including the translation processing unit 114 (for example, the first embodiment), but the speech synthesis unit 140 can also generate an acoustic signal SL of a synthesized voice that pronounces the character string Y selected from the guide table TB1 by the selection processing unit 118 of the second embodiment.

音声合成部１４０が生成した音響信号ＳLは、放音システム２０に送信される。放音システム２０の放音部２８は、音響信号ＳGが示す第１言語の案内音声の放音後に、通信部２４が情報管理システム１０から受信した音響信号ＳLが示す第２言語の案内音声を放音する。以上の構成では、第１言語を理解可能な利用者は第１言語の案内音声の聴取により案内を把握し、第２言語を理解可能な利用者は第２言語の案内音声の聴取により案内を把握することが可能である。端末装置３０の利用者は、端末装置３０に配信されるコンテンツＱで案内を確認するとともに、放音システム２０から放音される第２言語の案内音声の聴取によっても案内を把握することが可能である。 The acoustic signal SL generated by the speech synthesis unit 140 is transmitted to the sound emission system 20. After emitting the first-language guidance voice indicated by the acoustic signal SG, the sound emitting unit 28 of the sound emitting system 20 emits the second-language guidance voice indicated by the acoustic signal SL that the communication unit 24 received from the information management system 10. In this configuration, a user who understands the first language can grasp the guidance by listening to the first-language guidance voice, and a user who understands the second language can grasp it by listening to the second-language guidance voice. The user of the terminal device 30 can confirm the guidance through the content Q distributed to the terminal device 30 and can also grasp it by listening to the second-language guidance voice emitted from the sound emitting system 20.

情報管理システム１０に音声合成部１４０を設置した構成では、認識処理部１１２が特定した文字列Ｌを翻訳処理部１１４が複数の言語（例えば、第２言語に加えて第３言語や第４言語等）に翻訳し、音声合成部１４０が複数の言語の各々の案内音声を表す音響信号ＳLを生成して放音システム２０に送信してもよい。放音システム２０の放音部２８は、音響信号ＳGが示す第１言語の案内音声の放音後に、通信部２４が情報管理システム１０から受信した複数の音響信号ＳLが示す相異なる言語の案内音声を順次に放音する。なお、翻訳処理部１１４が翻訳する言語の種類数や、各言語の案内音声（例えば第２言語〜第４言語）を放音する順序は任意である。 In the configuration in which the speech synthesis unit 140 is installed in the information management system 10, the translation processing unit 114 converts the character string L specified by the recognition processing unit 112 into a plurality of languages (for example, a third language or a fourth language in addition to the second language). Etc.), and the speech synthesizer 140 may generate an acoustic signal SL representing each guidance voice in a plurality of languages and transmit it to the sound emission system 20. The sound emitting unit 28 of the sound emitting system 20 provides guidance in different languages indicated by the plurality of acoustic signals SL received from the information management system 10 by the communication unit 24 after the guidance voice in the first language indicated by the acoustic signal SG is emitted. Sound is emitted sequentially. The number of types of languages translated by the translation processing unit 114 and the order in which the guidance voices (for example, the second language to the fourth language) of each language are emitted are arbitrary.

（１１）予告情報（識別情報Ｄ）を利用してコンテンツＱの配信を利用者に予告する構成は省略され得る。例えば、案内音声の音響成分と予告情報（識別情報Ｄ）の音響成分との混合音の放音や、案内音声の音響信号ＳGに識別情報Ｄの変調信号ＳDを混合する信号処理部２８０を省略し、収音部２２が収音した案内音声をそのままスピーカー２８６から放音することも可能である。 (11) The configuration for notifying the user of the delivery of the content Q using the notice information (identification information D) can be omitted. For example, the output of the mixed sound of the acoustic component of the guidance voice and the acoustic component of the notice information (identification information D), or the signal processing unit 280 that mixes the modulation signal SD of the identification information D with the acoustic signal SG of the guidance voice is omitted. In addition, the guidance voice collected by the sound collection unit 22 can be emitted from the speaker 286 as it is.

１……音声案内システム、１００……情報提供システム、１０……情報管理システム、１１０……取得部、１１２……認識処理部、１１４……翻訳処理部、１１８……選択処理部、１２０……配信部、１３０……記憶部、１４０……音声合成部、２０……放音システム、２１……記憶部、２２……収音部、２３……表示部、２４……通信部、２５……操作部、２６……設定部、２７……編集処理部、２８……放音部、２９……制御部、２８０……信号処理部、２８２……変調処理部、２８４……混合処理部、２８６……スピーカー、３０……端末装置、３１０……収音部、３２０……情報抽出部、３３０……送信部、３４０……受信部、３５０……表示処理部、３６０……出力部、３７０……記憶部。
DESCRIPTION OF SYMBOLS 1 ... Voice guidance system, 100 ... Information provision system, 10 ... Information management system, 110 ... Acquisition unit, 112 ... Recognition processing unit, 114 ... Translation processing unit, 118 ... Selection processing unit, 120 ... Distribution unit, 130 ... Storage unit, 140 ... Speech synthesis unit, 20 ... Sound emission system, 21 ... Storage unit, 22 ... Sound collection unit, 23 ... Display unit, 24 ... Communication unit, 25 ... Operation unit, 26 ... Setting unit, 27 ... Editing processing unit, 28 ... Sound emitting unit, 29 ... Control unit, 280 ... Signal processing unit, 282 ... Modulation processing unit, 284 ... Mixing processing unit, 286 ... Speaker, 30 ... Terminal device, 310 ... Sound collection unit, 320 ... Information extraction unit, 330 ... Transmission unit, 340 ... Reception unit, 350 ... Display processing unit, 360 ... Output unit, 370 ... Storage unit.

Claims

An information providing system comprising:
an acquisition unit that acquires content related to a target sound collected by a sound collection unit;
a sound emission unit that, before the acquisition unit acquires the content, emits sound including an acoustic component of the target sound and an acoustic component of advance notice information for notifying a terminal device of the provision of the content; and
a distribution unit that distributes the content acquired by the acquisition unit to the terminal device, the terminal device collecting the sound emitted by the sound emission unit and notifying a user of the provision of the content in accordance with the advance notice information extracted from the sound.

The information providing system according to claim 1, wherein the advance notice information includes identification information of the content, and
the distribution unit, upon receiving from the terminal device a distribution request designating the identification information extracted from the sound emitted by the sound emission unit, distributes the content corresponding to the identification information to the terminal device.

The information providing system according to claim 2, wherein the distribution unit sequentially receives the distribution request transmitted a plurality of times from the terminal device, distributes the content to the terminal device when the acquisition unit has completed acquisition of the content at the time a distribution request is received, and does not distribute the content when acquisition of the content has not been completed.

The information providing system according to any one of claims 1 to 3, wherein the acquisition unit includes:
a recognition processing unit that identifies a character string of pronunciation content by voice recognition on the target sound; and
a selection processing unit that selects, from a table in which each of a plurality of first character strings is associated with a second character string obtained by translating that first character string into another language, the second character string corresponding to a first character string similar to the character string identified by the recognition processing unit,
and generates content representing the second character string selected by the selection processing unit.

The information providing system according to any one of claims 1 to 3, wherein the acquisition unit includes:
a recognition processing unit that identifies a character string of pronunciation content by voice recognition on the target sound; and
a translation processing unit that translates the character string identified by the recognition processing unit into another language,
and generates content representing the processing result of the translation processing unit.

The information providing system according to claim 5, further comprising an editing processing unit that edits the character string identified by the recognition processing unit in accordance with an instruction from a guide,
wherein the translation processing unit translates the character string edited by the editing processing unit into another language.