JP6238181B1

JP6238181B1 - Loudspeaker and control method thereof

Info

Publication number: JP6238181B1
Application number: JP2016196992A
Authority: JP
Inventors: 良彦竹井; 好男一柳; 明夫上杉; 功二森; 和之田中
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2016-10-05
Filing date: 2016-10-05
Publication date: 2017-11-29
Anticipated expiration: 2036-10-05
Also published as: JP2018060043A

Abstract

PROBLEM: To enable a user to switch, as appropriate, between outputting the original voice recorded from the user's utterance and a source-language synthesized voice generated from that original voice. SOLUTION: The device includes a microphone 3 that picks up the voice spoken by the user; a recording unit 31 that records the original voice picked up by the microphone; a synthesized voice acquisition unit 26 that acquires a source-language synthesized voice and an other-language synthesized voice corresponding to the original voice; a voice output unit 27 that outputs the original voice, the source-language synthesized voice, and the other-language synthesized voice from a speaker; and an output control unit 36 that, based on user setting information, performs control so that either the original voice or the source-language synthesized voice is output as the source-language voice. [Selected drawing] Figure 3

Description

The present invention relates to a loudspeaker that outputs a voice spoken by a user, and to a control method thereof.

Loudspeakers that amplify and output a voice spoken by a user are used to convey messages to many people at once, for example to guide evacuation during a disaster, to give directions and guidance for security purposes, or to issue work instructions.

Meanwhile, many foreign travelers stay at airports, stations, hotels, and tourist sites, and many foreign workers work at factories, warehouses, and construction sites. In places where many foreigners are present, it is desirable to announce necessary messages in a foreign language that they can understand.

As a technique for announcing such messages in a foreign language, a known approach registers a plurality of candidate messages in advance and lets the user select an appropriate one, whereupon the foreign-language voice corresponding to the selected message is output (see Patent Document 1).

Patent Document 1: JP 2014-044349 A

However, because the conventional technique selects and outputs a message from among those registered in advance, it is of no use when the situation on site is unexpected and none of the registered messages fits it. A configuration that can output a message suited to the situation on site in a foreign language is therefore desired.

It is also useful to record the voice spoken by the user and to output it repeatedly together with the foreign-language voices. Depending on the situation on site, it may be preferable to output the recorded original voice as is, or to output a synthesized voice generated from the text obtained by speech recognition of the original voice. For example, when evacuation guidance in an emergency needs to convey a sense of urgency, it is preferable to output the original voice as is; for guidance in normal times, it is preferable to use synthesized voices for every language so that the announcement does not sound inconsistent.

Accordingly, a main object of the present invention is to provide a loudspeaker, and a control method thereof, with which the user can switch as appropriate, depending on the situation on site, between outputting the original voice recorded from the user's utterance and a source-language synthesized voice generated from that original voice.

The loudspeaker of the present invention is a loudspeaker that outputs a voice spoken by a user, and comprises: a microphone that picks up the voice spoken by the user; a recording unit that records the original voice picked up by the microphone; a synthesized voice acquisition unit that acquires a source-language synthesized voice and an other-language synthesized voice corresponding to the original voice; a voice output unit that outputs the original voice, the source-language synthesized voice, and the other-language synthesized voice from a speaker; and an output control unit that, based on user setting information, performs control so that either the original voice or the source-language synthesized voice is output as the source-language voice.

The control method of the present invention is a method of controlling a loudspeaker that outputs a voice spoken by a user, in which: the voice spoken by the user is picked up by a microphone; the original voice picked up by the microphone is recorded; when the original voice is selected as the source-language voice, the original voice is output from a speaker, and then an other-language synthesized voice corresponding to the original voice is acquired and output from the speaker; and when a synthesized voice is selected as the source-language voice, a source-language synthesized voice corresponding to the original voice is acquired and output from the speaker, and then an other-language synthesized voice corresponding to the original voice is acquired and output from the speaker.

According to the present invention, the user can switch as appropriate, depending on the situation on site, between outputting the original voice recorded from the user's utterance and the source-language synthesized voice generated from that original voice as the source-language voice.

Side view of the loudspeaker 1 according to the present embodiment
Block diagram showing a schematic configuration of the loudspeaker 1
Explanatory diagram showing an outline of the processing performed by the control unit 6
Explanatory diagram showing the fixed phrase display screen displayed on the display input panel 9
Explanatory diagram showing the output voice setting screen displayed on the display input panel 9
Explanatory diagram showing the gap setting screen displayed on the display input panel 9
Explanatory diagram showing the voice output status during playback
Flow chart showing the operation procedure of the loudspeaker 1
Flow chart showing the operation procedure in voice output (ST114)
Flow chart showing the operation procedure in voice output (ST114)
Flow chart showing the operation procedure in voice output (ST114)

A first invention made to solve the above problem is a loudspeaker that outputs a voice spoken by a user, comprising: a microphone that picks up the voice spoken by the user; a recording unit that records the original voice picked up by the microphone; a synthesized voice acquisition unit that acquires a source-language synthesized voice and an other-language synthesized voice corresponding to the original voice; a voice output unit that outputs the original voice, the source-language synthesized voice, and the other-language synthesized voice from a speaker; and an output control unit that, based on user setting information, performs control so that either the original voice or the source-language synthesized voice is output as the source-language voice.

With this configuration, the user can switch as appropriate, depending on the situation on site, between outputting the original voice recorded from the user's utterance and the source-language synthesized voice generated from that original voice as the source-language voice.

In a second invention, the output control unit, when outputting the synthesized voice, performs control based on user setting information so that either a female synthesized voice or a male synthesized voice is output.

With this configuration, the user can switch the gender of the synthesized voice as appropriate for the situation on site, which improves convenience for the user.

In a third invention, the output control unit performs control so that, following the source-language voice, synthesized voices in a plurality of other languages are output in an order specified by the user.

With this configuration, the user can specify the order in which the other-language synthesized voices are output, for example according to the proportions of foreigners present on site, which improves convenience for the user.

In a fourth invention, the voice output unit, when outputting the source-language voice and the other-language voices, inserts a silent period of a length specified by the user between the voices of the respective languages.

With this configuration, the voice in each language becomes easier to hear.

A fifth invention further comprises a voice analysis unit that acquires feature information of the original voice, and a voice adjustment unit that, when the original voice and the other-language synthesized voice are output, performs processing to match the voice characteristics of the original voice and the other-language synthesized voice.

With this configuration, the voices output in succession from the loudspeaker (the original voice and the synthesized voices in multiple languages) share common characteristics, which reduces the sense of incongruity felt by listeners.

In a sixth invention, the voice adjustment unit matches at least one of the gender, tempo, volume, and pitch of the voice between the original voice and the other-language synthesized voice.

A seventh invention is a method of controlling a loudspeaker that outputs a voice spoken by a user, in which: the voice spoken by the user is picked up by a microphone; the original voice picked up by the microphone is recorded; when the original voice is selected as the source-language voice, the original voice is output from a speaker, and then an other-language synthesized voice corresponding to the original voice is acquired and output from the speaker; and when a synthesized voice is selected as the source-language voice, a source-language synthesized voice corresponding to the original voice is acquired and output from the speaker, and then an other-language synthesized voice corresponding to the original voice is acquired and output from the speaker.

With this configuration, as with the first invention, the user can switch as appropriate between outputting the original voice recorded from the user's utterance and the source-language synthesized voice generated from that original voice.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a side view of the loudspeaker 1 according to the present embodiment.

In the loudspeaker 1, a housing 2 accommodates a microphone 3, a voice switching unit 5, a control unit 6, an amplifier 7, a speaker 8, and a display input panel 9. A grip 10 to be held by the user is attached to the lower part of the housing 2. A conical horn 11 is provided on the front side of the speaker 8 in the housing 2. A battery (not shown) is also accommodated in the housing 2.

The microphone 3 picks up the voice spoken by the user.

The control unit 6 performs processes such as recording the original voice picked up by the microphone 3, converting the recorded original voice into text (speech recognition), finding a fixed phrase similar to the text of the original voice, that is, the original text (search), generating a synthesized voice from the text of the fixed phrase (speech synthesis), and playing back the synthesized voice and the original voice.

The voice switching unit 5 switches between the voice path (the route taken by the audio) for the megaphone mode (first operating mode) and the voice path for the translation mode (second operating mode). The amplifier 7 amplifies the voice output from the voice switching unit 5. The speaker 8 outputs the voice amplified by the amplifier 7.

In the megaphone mode, the original voice picked up by the microphone 3 is amplified as is by the amplifier 7 and output from the speaker 8. In the translation mode, the original voice picked up by the microphone 3 is input to the control unit 6, and the synthesized voice generated by the control unit 6, or the original voice, is amplified by the amplifier 7 and output from the speaker 8.

A mode switch 12 is provided on a side surface of the housing 2. The mode switch 12 switches between the megaphone mode and the translation mode; in response to operation of the mode switch 12, the voice switching unit 5 switches between the voice path for the megaphone mode and the voice path for the translation mode.

The display input panel 9 (display unit, input unit) is a so-called touch panel display that combines a touch panel with a liquid crystal display. It is housed, with its screen facing upward, in a raised portion 18 formed on the upper part of the housing 2.

A power switch 13 is provided on a side surface of the raised portion 18.

A recording switch 14 and a volume adjustment switch 15 are provided on the near side of the grip 10. An output switch 16 and a lock switch 17 are provided on the opposite side of the grip 10. While holding the grip 10 in one hand, the user can operate the recording switch 14 with the thumb of that hand and the output switch 16 with the index finger.

The recording switch 14 is used to instruct recording of the voice picked up by the microphone 3. The volume adjustment switch 15 is used to adjust the volume of the voice output from the speaker 8.

In the megaphone mode, the output switch 16 is used to activate the amplifier 7: when the output switch 16 is pressed, the original voice picked up by the microphone 3 is amplified by the amplifier 7 and output. In the translation mode, the output switch 16 activates the amplifier 7 and also instructs the control unit 6 to play back the voice: when the output switch 16 is pressed, the control unit 6 plays back the voice, which is amplified by the amplifier 7 and output. While the output switch 16 is held down, the voice is played back repeatedly.

The lock switch 17 holds the output switch 16 in the pressed state. This allows voice output to continue even if the user does not keep pressing the output switch 16.

The mode switch 12 is a rocker switch, the recording switch 14 and the output switch 16 are push-button switches, and the volume adjustment switch 15 is a rotary switch.

Next, the schematic configuration of the loudspeaker 1 is described. FIG. 2 is a block diagram showing a schematic configuration of the loudspeaker 1. FIG. 3 is an explanatory diagram showing an outline of the processing performed by the control unit 6.

The signal from the mode switch 12 is input to the voice switching unit 5. The voice switching unit 5 switches between the voice path for the megaphone mode and the voice path for the translation mode in response to operation of the mode switch 12, and includes an input switching unit 21 and an output switching unit 22. The input switching unit 21 outputs the voice from the microphone 3 to either the output switching unit 22 or the control unit 6. The output switching unit 22 outputs the voice input from either the input switching unit 21 or the control unit 6 to the amplifier 7.
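The routing performed by the input switching unit 21 and the output switching unit 22 can be sketched as two small selection functions. This is only an illustration: the `Mode` enum, the function names, and the string labels are hypothetical and do not appear in the patent, which describes hardware signal paths rather than software.

```python
from enum import Enum


class Mode(Enum):
    MEGAPHONE = 1    # first operating mode
    TRANSLATION = 2  # second operating mode


def route_input(mode: Mode) -> str:
    """Destination of the microphone signal (role of input switching unit 21)."""
    # Megaphone mode bypasses the control unit; translation mode feeds it.
    return "output_switching_unit_22" if mode is Mode.MEGAPHONE else "control_unit_6"


def route_output(mode: Mode) -> str:
    """Source passed on to the amplifier (role of output switching unit 22)."""
    return "input_switching_unit_21" if mode is Mode.MEGAPHONE else "control_unit_6"
```

In this sketch both functions depend only on the mode, mirroring the description that a single mode switch 12 selects both paths together.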

A level adjustment unit 23 is provided between the voice switching unit 5 and the control unit 6. The level adjustment unit 23 adjusts the level of the voice output from the input switching unit 21 of the voice switching unit 5.

The signal from the volume adjustment switch 15 is input to a volume adjustment unit 24, which is provided between the voice switching unit 5 and the amplifier 7. In response to operation of the volume adjustment switch 15, the volume adjustment unit 24 adjusts the volume of the voice output from the output switching unit 22 of the voice switching unit 5.

The signal from the output switch 16 is input to the amplifier 7. The output switch 16 functions as a switch that turns the power supply to the amplifier 7 on and off. When the output switch 16 is pressed in the megaphone mode, the amplifier 7 is energized and enters the voice output state, and the original voice input from the microphone 3 is amplified by the amplifier 7 and output. When the output switch 16 is pressed in the translation mode, the amplifier 7 is energized and enters the voice output state, the playback unit 35 plays back the voice, and the voice output from the playback unit 35 is amplified by the amplifier 7 and output.

The storage unit 25 temporarily stores, for the control unit 6, the original voice recorded from the user's utterance, the original text obtained by speech recognition of the original voice, and the synthesized voices converted from the text of fixed phrases. The storage unit 25 also stores a fixed phrase database, in which a large number of fixed phrases are registered.

The storage unit 25 also stores, as user setting information, the information that the user enters on the setting screens displayed on the display input panel 9. In the present embodiment, the user setting information includes information on whether the original voice is to be output as the source-language voice, information on the gender (female or male) of the synthesized voice, and information on the order in which the voices in the plurality of other languages (English, Chinese, and so on) are output.

The control unit 6 includes a recording unit 31, a speech recognition unit 32, a search unit 33, a speech synthesis unit 34, a playback unit 35, an output control unit 36, a voice analysis unit 37, and a voice adjustment unit 38. The control unit 6 is implemented by a processor, and each of its units is realized by executing a program stored in the storage unit 25.

The speech recognition unit 32, the search unit 33, and the speech synthesis unit 34 together constitute the synthesized voice acquisition unit 26. The playback unit 35, the amplifier 7, and the speaker 8 together constitute the voice output unit 27 (see FIG. 3).

The signals from the mode switch 12, the recording switch 14, and the output switch 16 are input to the control unit 6.

The recording unit 31 records the original voice output from the level adjustment unit 23. In this recording process, the audio signal (an analog signal) is A/D-converted into audio data and stored in the storage unit 25. The recording unit 31 starts the recording process when the recording switch 14 is pressed and ends it when the recording switch 14 is released.

The speech recognition unit 32 performs speech recognition, which converts the original voice recorded by the recording unit 31 into text, and obtains the original text (the text of the original voice) as the recognition result. The original text is temporarily stored in the storage unit 25.

The search unit 33 searches the source-language (for example, Japanese) fixed phrases registered in the fixed phrase database for the fixed phrase with the highest similarity to the original text (see FIG. 3).
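The search for the most similar fixed phrase can be sketched as follows. The patent does not state how similarity is computed, so the use of `difflib.SequenceMatcher` from the Python standard library is an assumption made purely for illustration, as are the function name and the English example phrases.

```python
from difflib import SequenceMatcher
from typing import List


def find_closest_phrase(original_text: str, fixed_phrases: List[str]) -> str:
    """Return the registered phrase with the highest similarity to the text.

    Similarity here is SequenceMatcher.ratio(), an assumed stand-in for
    whatever measure the search unit actually uses. The list must not be
    empty, otherwise max() raises ValueError.
    """
    return max(
        fixed_phrases,
        key=lambda phrase: SequenceMatcher(None, original_text, phrase).ratio(),
    )


phrases = [
    "please evacuate now",
    "we are distributing supplies here",
    "the exit is on the left",
]
best = find_closest_phrase("we hand out supplies here", phrases)
```

Because the result is always one of the registered phrases, the wording that is later synthesized can differ from what the user actually said.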

The speech synthesis unit 34 acquires the source-language fixed phrase from the fixed phrase database and generates a source-language synthesized voice by speech synthesis from the text of that fixed phrase (for example, in Japanese). It likewise acquires the other-language fixed phrases from the fixed phrase database and generates other-language synthesized voices by speech synthesis from the text of those fixed phrases (for example, in English or Chinese). The synthesized voices generated by the speech synthesis unit 34 are temporarily stored in the storage unit 25.

The output control unit 36 performs control so that voice output starts when the output switch 16 is pressed and stops when the output switch 16 is released. At this time, the output control unit 36 instructs the speech synthesis unit 34 to synthesize the source-language fixed phrase found by the search unit 33 and the corresponding other-language fixed phrases, and then instructs the playback unit 35 to play back the synthesized voices generated by the speech synthesis unit 34 (see FIG. 3). Based on the user setting information, the output control unit 36 also instructs the speech synthesis unit 34 and the playback unit 35 so that, following the source-language voice, the synthesized voices in the plurality of other languages are output in the order specified by the user.

Based on the user setting information, the output control unit 36 also performs control so that either the original voice or the source-language (for example, Japanese) synthesized voice is output as the source-language voice. When the original voice is to be output, the output control unit 36 acquires the original voice from the storage unit 25 and instructs the playback unit 35 to play it back (see FIG. 3). When the source-language synthesized voice is to be output, the output control unit 36 acquires the source-language fixed phrase from the storage unit 25, instructs the speech synthesis unit 34 to synthesize it, and then instructs the playback unit 35 to play back the source-language synthesized voice generated by the speech synthesis unit 34 (see FIG. 3).

When the original voice is output, the user's utterance picked up by the microphone 3 (for example, "We hand out supplies here.") is output exactly as spoken. When the source-language synthesized voice is output, the fixed phrase with the highest similarity to the user's utterance (for example, "Supplies are being distributed here.") is retrieved from the source-language fixed phrases stored in the storage unit 25, and the synthesized voice of that fixed phrase is output. The output may therefore differ slightly from what the user actually said.

When outputting a synthesized voice, the output control unit 36 also performs control based on the user setting information so that either a female synthesized voice or a male synthesized voice is output. To output a female synthesized voice, it instructs the speech synthesis unit 34 to generate a female synthesized voice; to output a male synthesized voice, it instructs the speech synthesis unit 34 to generate a male synthesized voice.
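The decisions that the output control unit 36 makes from the user setting information can be sketched as building an ordered playback list. The `UserSettings` class, its field names and defaults, the language codes, and the tuple layout are all hypothetical; the patent only says that the setting information records the source-voice choice, the synthesized-voice gender, and the order of the other languages.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class UserSettings:
    use_original_voice: bool = True   # original recording vs. source-language synthesis
    synth_gender: str = "female"      # "female" or "male"
    other_languages: List[str] = field(default_factory=lambda: ["en", "zh"])  # user-specified order


def build_sequence(settings: UserSettings, source_lang: str = "ja") -> List[Tuple[str, str, str]]:
    """Return (kind, language, gender) items in the order they are played."""
    if settings.use_original_voice:
        # The recorded voice is played as is, so no synthesis gender applies.
        first = ("original", source_lang, "")
    else:
        first = ("synthesized", source_lang, settings.synth_gender)
    rest = [("synthesized", lang, settings.synth_gender) for lang in settings.other_languages]
    return [first] + rest


emergency = build_sequence(UserSettings(use_original_voice=True))
routine = build_sequence(
    UserSettings(use_original_voice=False, synth_gender="male", other_languages=["zh", "en"])
)
```

The first item is the only one that changes with the source-voice setting; the other-language items are always synthesized, in the order the user specified.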

The playback unit 35 plays back the original voice recorded by the recording unit 31 and the synthesized voices generated by the speech synthesis unit 34. In this playback process, the data of the original voice and the synthesized voices is D/A-converted into an audio signal (an analog signal). The original voice and the synthesized voices are temporarily stored in the storage unit 25, and while the output switch 16 is held down, the source-language voice (the original voice or the synthesized voice) and the other-language synthesized voices are played back continuously and repeatedly in the predetermined order.
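The repeated playback while the output switch 16 is held can be sketched as a generator that cycles through the stored clips. The function name and the `switch_pressed` callback are hypothetical, and the sketch assumes the switch is polled only between clips, whereas the device presumably stops output as soon as the switch is released.

```python
from typing import Callable, Iterator, List


def playback_loop(sequence: List[str], switch_pressed: Callable[[], bool]) -> Iterator[str]:
    """Yield clips in order, restarting from the first, while the switch is held."""
    while sequence and switch_pressed():
        for clip in sequence:
            # Assumed behavior: check the switch before starting each clip.
            if not switch_pressed():
                return
            yield clip


# Simulated switch: reads as pressed for the first five polls, then released.
states = iter([True, True, True, True, True, False])
played = list(playback_loop(["source", "english"], lambda: next(states, False)))
```

With the lock switch 17 engaged, the callback in this sketch would simply keep returning `True` until the lock is released.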

When outputting the source-language voice (original voice or synthesized speech) and the other-language speech, the playback unit 35 also inserts a gap (silent period) of a user-specified length between the voices of the respective languages. This gap length is preferably also stored as user setting information based on the user's specification and notified from the output control unit 36 to the playback unit 35.
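The gap insertion described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `interleave_with_gaps` and the `("play", ...)` / `("silence", ...)` step format are assumptions, and the sketch places a gap only between consecutive clips within one pass.

```python
from typing import List, Tuple

# Each step is ("play", clip_name) or ("silence", seconds). Both labels are
# illustrative; the patent does not define a data format for playback.
Step = Tuple[str, object]


def interleave_with_gaps(clips: List[str], gap_seconds: float) -> List[Step]:
    """Return a playback plan with one silent gap between consecutive clips."""
    if gap_seconds < 0:
        raise ValueError("gap_seconds must be non-negative")
    plan: List[Step] = []
    for index, clip in enumerate(clips):
        if index > 0:
            # The gap goes between languages, so none precedes the first clip.
            plan.append(("silence", gap_seconds))
        plan.append(("play", clip))
    return plan
```

For three clips, the plan contains five steps: three playback steps separated by two silent periods of the chosen length.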

The voice analysis unit 37 acquires the original voice from the storage unit 25 and acquires feature information of the original voice. In this embodiment, information on gender (male or female voice), tempo (speed), volume, and pitch (tone) is acquired as the feature information of the original voice.

When the original voice and other-language synthesized speech are output, the voice adjustment unit 38 performs processing to match voice features between the original voice and the synthesized speech. In this embodiment, at least one of gender (male or female voice), tempo (speed), volume, and pitch (tone) is adjusted as a voice feature.

In this embodiment, there are two modes, one that matches the synthesized speech to the original voice and one that matches the original voice to the synthesized speech, and the user can select either. In the mode that matches the synthesized speech to the original voice, the speech synthesis unit 34 generates the synthesized speech based on the feature information of the original voice so that its gender, tempo, volume, and pitch match those of the original voice. In the mode that matches the original voice to the synthesized speech, the original voice is converted so that it matches the standard tempo, volume, and pitch used in the default settings for synthesized speech.

Matching the voice features between the original voice and the synthesized speech in this way makes the features of the voices output successively from the loudspeaker 1 (original voice and synthesized speech) uniform, which reduces the sense of incongruity felt by listeners.

Next, the fixed phrase display screen shown on the display input panel 9 will be described. FIG. 4 is an explanatory diagram showing the fixed phrase display screen.

This fixed phrase display screen has a fixed phrase display section 41. In this embodiment, the search unit 33 searches for the fixed phrase with the highest similarity to the source text generated by speech recognition of the original voice uttered by the user, and the source-language (Japanese) fixed phrase found by this search is displayed in the fixed phrase display section 41.

The fixed phrase display screen also has a playback order display section 42, which shows the playback order of the source language (Japanese) and the other languages (English, Chinese, Korean, and so on) based on the user setting information. The playback order display section 42 includes national flag icons 43; operating a national flag icon 43 displays the text of the corresponding other language in the fixed phrase display section 41. The user sets the playback order on a setting screen (not shown).

When the output switch 16 is pressed while this fixed phrase display screen is displayed, the source-language voice and the other-language voices for the displayed fixed phrase are output in the order shown in the playback order display section 42.

Next, the output voice setting screen shown on the display input panel 9 will be described. FIG. 5 is an explanatory diagram showing the output voice setting screen.

This output voice setting screen has an output voice selection section 51. The output voice selection section 51 has two radio buttons 52, with which the user can select either a mode that outputs female synthesized speech or a mode that outputs male synthesized speech. It also has a check box 53, with which the user can select whether the original voice is output preferentially as the source-language (Japanese) voice.

The output voice setting screen also has a voice adjustment selection section 54. The voice adjustment selection section 54 has a check box 55, with which the user can select whether voice adjustment is performed. It also has two radio buttons 56, with which the user can select whether, during voice adjustment, the synthesized speech is matched to the original voice or the original voice is matched to the synthesized speech.

The output voice setting screen also has a cancel button 57 and an OK button 58. Operating the cancel button 57 discards the selections the user made in the output voice selection section 51 and the voice adjustment selection section 54 and returns to the setting menu screen (not shown). Operating the OK button 58 updates the user setting information in the storage unit 25 with the selections the user made in the output voice selection section 51 and the voice adjustment selection section 54 and returns to the setting menu screen (not shown).

As described above, in this embodiment the user can select, as the source-language voice, either the original voice recorded from the user's utterance or synthesized speech generated from the text of the fixed phrase corresponding to the original voice. For example, when a sense of urgency is needed in emergency evacuation guidance, the device may be set to output the original voice. For routine guidance, synthesized speech may be selected as the source-language voice so that the voices of all languages are uniformly synthesized, which avoids a sense of incongruity.

In this embodiment, the user can also select the gender (male or female voice) of the synthesized speech, so synthesized speech of a gender suited to the usage situation can be output. For example, a male voice may be selected when a sense of urgency is needed in emergency evacuation guidance, and a female voice may be selected for routine guidance.

In this embodiment, the user can also select whether voice adjustment is performed and, further, whether the synthesized speech is matched to the original voice or the original voice is matched to the synthesized speech. A voice suited to the usage situation can therefore be output. For example, when the original voice is spoken quickly, its tempo can be slowed to make it easier to understand. Conversely, in an urgent situation, the synthesized speech can be matched to the original voice by speeding up its tempo so that the sense of urgency is not lost.

Next, the gap setting screen shown on the display input panel 9 will be described. FIG. 6 is an explanatory diagram showing the gap setting screen.

This gap setting screen has a plurality of radio buttons 61, with which the user can select the length (gap time) of the gap (silent period) inserted between the voices of the respective languages when they are output. In the example shown in FIG. 6, four radio buttons 61 are provided, and 0.5 seconds, 1.0 second, 2.0 seconds, or 3.0 seconds can be selected.

The gap setting screen also has a cancel button 62 and an OK button 63. Operating the cancel button 62 discards the user's selection and returns to the setting menu screen (not shown). Operating the OK button 63 updates the user setting information in the storage unit 25 with the user's selection and returns to the setting menu screen (not shown).
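The OK and cancel behavior of the gap setting screen can be sketched as a pending selection that is either committed or discarded. This is only an illustration under stated assumptions: the class name `GapSettingScreen`, the `gap_seconds` key, and the use of a plain dictionary to stand in for the user setting information in the storage unit 25 are all hypothetical. The four choices are those of the FIG. 6 example.

```python
from dataclasses import dataclass, field

# Gap choices from the FIG. 6 example, in seconds.
GAP_CHOICES = (0.5, 1.0, 2.0, 3.0)


@dataclass
class GapSettingScreen:
    # Stands in for the user setting information held in the storage unit
    # (a plain dict here; the real storage format is not specified).
    stored: dict
    selected: float = field(init=False)

    def __post_init__(self) -> None:
        # The screen opens showing the currently stored value.
        self.selected = self.stored["gap_seconds"]

    def select(self, seconds: float) -> None:
        """Radio button operation: change the pending selection only."""
        if seconds not in GAP_CHOICES:
            raise ValueError(f"gap must be one of {GAP_CHOICES}")
        self.selected = seconds

    def ok(self) -> None:
        """OK button: commit the pending selection to the stored settings."""
        self.stored["gap_seconds"] = self.selected

    def cancel(self) -> None:
        """Cancel button: discard the pending selection."""
        self.selected = self.stored["gap_seconds"]
```

Keeping the pending selection separate from the stored value means that a cancel leaves the stored settings untouched, which matches the behavior described for the cancel button 62.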

Next, how voices are output during playback will be described. FIG. 7 is an explanatory diagram showing how voices are output during playback.

In this embodiment, the user can select whether the original voice is output preferentially and whether female or male synthesized speech is output. These selections are stored in the storage unit 25 as user setting information. When voices are output, the necessary speech synthesis is performed based on the user setting information, and the voices of the respective languages are output in order. FIG. 7 shows an example in which Japanese is selected as the source language and English and Chinese are selected as the other languages.

When the settings give priority to the original voice and specify female synthesized speech, the original voice (Japanese), English female synthesized speech, and Chinese female synthesized speech are output in that order, as shown in FIG. 7(A). When the settings give priority to the original voice and specify male synthesized speech, the original voice (Japanese), English male synthesized speech, and Chinese male synthesized speech are output in that order, as shown in FIG. 7(B).

When the settings do not give priority to the original voice and specify female synthesized speech, Japanese female synthesized speech, English female synthesized speech, and Chinese female synthesized speech are output in that order, as shown in FIG. 7(C). When the settings do not give priority to the original voice and specify male synthesized speech, Japanese male synthesized speech, English male synthesized speech, and Chinese male synthesized speech are output in that order, as shown in FIG. 7(D).
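The four cases of FIG. 7 reduce to two settings: whether the original voice has priority, and which gender the synthesized speech uses. A minimal sketch of that selection logic follows; the function name `build_sequence`, the language codes, and the `(language, voice)` tuple format are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def build_sequence(
    prefer_original: bool,
    synth_gender: str,
    source: str = "ja",
    others: Sequence[str] = ("en", "zh"),
) -> List[Tuple[str, str]]:
    """Return the (language, voice) pairs for one playback pass.

    The voice is "original" for the recorded utterance, or "female" / "male"
    for synthesized speech of that gender.
    """
    if synth_gender not in ("female", "male"):
        raise ValueError("synth_gender must be 'female' or 'male'")
    # Only the source-language slot depends on the original-voice priority.
    source_voice = "original" if prefer_original else synth_gender
    return [(source, source_voice)] + [(lang, synth_gender) for lang in others]
```

With `prefer_original=True` and `synth_gender="female"`, the result corresponds to FIG. 7(A): the original voice in Japanese, followed by female synthesized speech in English and Chinese.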

When the voices of the respective languages are output, a gap (silent period) of a user-specified length is inserted between them. This makes the voice of each language easier to hear.

Next, the operation of the loudspeaker 1 will be described. FIG. 8 is a flowchart showing the operation procedure of the loudspeaker 1.

In the loudspeaker 1, if the mode changeover switch 12 is not set to the translation mode, that is, if it is set to the megaphone mode (No in ST101), the voice switching unit 5 enters a state in which the original voice picked up by the microphone 3 is output as is. When the output switch 16 is then pressed (Yes in ST102), the amplifier 7 enters the audio output state and output of the original voice starts (ST103). At this time, the original voice uttered by the user is amplified as is by the amplifier 7 and output from the speaker 8. When the output switch 16 is released (Yes in ST104), output of the original voice stops (ST105). If the output switch 16 is not pressed (No in ST102), no particular operation is performed.

If, on the other hand, the mode changeover switch 12 is set to the translation mode (Yes in ST101), the control unit 6 next determines whether the recording switch 14 is pressed (ST106). If the recording switch 14 is pressed (Yes in ST106), the recording unit 31 starts recording the original voice picked up by the microphone 3 (ST107). At this time, the user may be notified by vibration or a notification sound that recording has started. When the recording switch 14 is released (Yes in ST108), recording stops (ST109).

Next, the speech recognition unit 32 performs speech recognition to convert the recorded original voice into text (ST110). The search unit 33 then searches for the fixed phrase most similar to the source text (the text of the original voice) (ST111). A fixed phrase display screen (see FIG. 4) showing the fixed phrase found by the search unit 33 is then displayed on the display input panel 9 (ST112).
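The search in ST111 can be illustrated with a minimal sketch. The patent does not state how similarity is computed, so the character-level ratio from Python's standard `difflib.SequenceMatcher` is a stand-in chosen for illustration. The `FIXED_PHRASES` table, its entries, and the function names are likewise hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List

# Hypothetical fixed-phrase table: each record pairs a source-language
# (Japanese) phrase with an other-language text. Contents are sample data only.
FIXED_PHRASES: List[Dict[str, str]] = [
    {"ja": "こちらで物資を配布しております。", "en": "Supplies are being distributed here."},
    {"ja": "落ち着いて避難してください。", "en": "Please evacuate calmly."},
    {"ja": "この先は通行できません。", "en": "The road ahead is closed."},
]


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (an assumed measure, not the patent's)."""
    return SequenceMatcher(None, a, b).ratio()


def find_closest_phrase(recognized: str, phrases=FIXED_PHRASES) -> Dict[str, str]:
    """Return the record whose source-language phrase is most similar to the input."""
    return max(phrases, key=lambda record: similarity(recognized, record["ja"]))
```

Looking up the recognized text "こちらで物資を配ります。" selects the first record, so the displayed fixed phrase and the other-language text both come from the same table entry.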

Next, it is determined whether the output switch 16 is pressed (ST113). If the output switch 16 is pressed (Yes in ST113), the speech synthesis unit 34 generates synthesized speech from the fixed phrase, the playback unit 35 starts playing back the synthesized speech, and the synthesized speech is output from the speaker 8 (ST114). At this time, the playback unit 35 plays back the voices of the respective languages repeatedly in order. When the output switch 16 is released (Yes in ST115), voice output stops (ST116).
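The repeated playback in ST113 to ST116 can be sketched as a loop that cycles through the clips while the output switch is held. This is an illustration only: `play_while_pressed`, `is_pressed`, and `play` are hypothetical names, and the sketch checks the switch only between clips, whereas an actual device would presumably also stop output partway through a clip when the switch is released.

```python
from itertools import cycle
from typing import Callable, Sequence


def play_while_pressed(
    clips: Sequence[str],
    is_pressed: Callable[[], bool],
    play: Callable[[str], None],
) -> None:
    """Play the clips in order, repeating from the start, until the switch is released."""
    for clip in cycle(clips):
        if not is_pressed():
            # Corresponds to the switch being released: stop voice output.
            break
        play(clip)
```

Passing the switch state and the playback action as callables keeps the loop independent of any particular hardware interface.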

Next, the operation procedure for voice output (ST114) will be described. FIGS. 9, 10, and 11 are flowcharts showing the operation procedure for voice output (ST114). The example here uses Japanese as the source language and English and Chinese as the other languages.

In voice output (ST114), as shown in FIG. 9, the control unit 6 first determines, based on the user setting information, whether voice adjustment is enabled (ST201).

If voice adjustment is not enabled (No in ST201), it is next determined whether the original voice is set to be output preferentially (ST202). If the original voice is set to be output preferentially (Yes in ST202), it is next determined whether female synthesized speech is set to be output (ST203).

If female synthesized speech is set to be output (Yes in ST203), the original voice is first acquired from the storage unit 25 and output by the voice output unit 27 (ST204). Next, the English text of the fixed phrase that the user designated for output is acquired from the fixed phrase database, the speech synthesis unit 34 generates female synthesized speech from the English text, and the voice output unit 27 outputs it (ST205). Next, the Chinese text is acquired from the fixed phrase database, and female synthesized speech is generated from the Chinese text and output (ST206).

If, on the other hand, female synthesized speech is not set to be output, that is, male synthesized speech is set to be output (No in ST203), the original voice is first acquired from the storage unit 25 and output (ST207). Next, the English text is acquired from the fixed phrase database, and male synthesized speech is generated from the English text and output (ST208). Next, the Chinese text is acquired from the fixed phrase database, and male synthesized speech is generated from the Chinese text and output (ST209).

If the original voice is not set to be output preferentially (No in ST202), it is next determined, as shown in FIG. 10, whether female synthesized speech is set to be output (ST210).

If female synthesized speech is set to be output (Yes in ST210), the Japanese text of the fixed phrase that the user designated for output is first acquired from the fixed phrase database, and female synthesized speech is generated from the Japanese text and output (ST211). Next, the English text is acquired from the fixed phrase database, and female synthesized speech is generated from the English text and output (ST212). Next, the Chinese text is acquired from the fixed phrase database, and female synthesized speech is generated from the Chinese text and output (ST213).

If, on the other hand, female synthesized speech is not set to be output, that is, male synthesized speech is set to be output (No in ST210), the Japanese text of the fixed phrase that the user designated for output is first acquired from the fixed phrase database, and male synthesized speech is generated from the Japanese text and output (ST214). Next, the English text is acquired from the fixed phrase database, and male synthesized speech is generated from the English text and output (ST215). Next, the Chinese text is acquired from the fixed phrase database, and male synthesized speech is generated from the Chinese text and output (ST216).

If voice adjustment is enabled (Yes in ST201 in FIG. 9), the voice analysis unit 37 next detects the features of the original voice (gender, tempo, volume, and pitch), as shown in FIG. 11 (ST217).

Next, the voice adjustment unit 38 determines, based on the user setting information, whether the setting is to match the synthesized speech to the original voice (ST218).

If the setting is to match the synthesized speech to the original voice (Yes in ST218), the original voice is first acquired from the storage unit 25 and output by the voice output unit 27 (ST219). Next, the English text is acquired from the fixed phrase database, the speech synthesis unit 34 generates synthesized speech from the English text so that it matches the gender, tempo, volume, and pitch of the original voice, and the voice output unit 27 outputs the synthesized speech (ST220). Next, the Chinese text is acquired from the fixed phrase database, synthesized speech is generated from the Chinese text so that it matches the gender, tempo, volume, and pitch of the original voice, and that synthesized speech is output (ST221).

If, on the other hand, the setting is not to match the synthesized speech to the original voice, that is, the setting is to match the original voice to the synthesized speech (No in ST218), the original voice information is acquired from the storage unit 25, the original voice is converted so that it has the standard tempo, volume, and pitch used in the default settings for synthesized speech, and the converted original voice is output (ST222). Next, the English text is acquired from the fixed phrase database, synthesized speech is generated from the English text in the gender of the original voice, and that synthesized speech is output (ST223). Next, the Chinese text is acquired from the fixed phrase database, synthesized speech is generated from the Chinese text in the gender of the original voice, and that synthesized speech is output (ST224).
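The branch at ST218 decides which side supplies the target voice features. The sketch below computes those targets for both modes. It is an illustration under stated assumptions: the `VoiceFeatures` type, the numeric scales (1.0 taken as the synthesizer's standard value), the `SYNTH_DEFAULTS` values, and the function name `plan_adjustment` are all hypothetical, and the actual signal conversion and synthesis are not shown.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class VoiceFeatures:
    gender: str    # "male" or "female"
    tempo: float   # relative speaking rate (assumed scale, 1.0 = standard)
    volume: float  # relative loudness (assumed scale)
    pitch: float   # relative pitch (assumed scale)


# Hypothetical standard values used by default for synthesized speech.
SYNTH_DEFAULTS = VoiceFeatures(gender="female", tempo=1.0, volume=1.0, pitch=1.0)


def plan_adjustment(
    original: VoiceFeatures,
    match_synth_to_original: bool,
    defaults: VoiceFeatures = SYNTH_DEFAULTS,
) -> Tuple[VoiceFeatures, VoiceFeatures]:
    """Return (target for the original voice, target for the synthesized speech)."""
    if match_synth_to_original:
        # Original voice is output unchanged; synthesis copies all its features.
        return original, original
    # Original voice is converted to the standard tempo, volume, and pitch,
    # while the synthesized speech is generated in the original voice's gender.
    target = replace(defaults, gender=original.gender)
    return target, target
```

In both modes the two targets come out identical, which reflects the stated goal of making the successively output voices share the same features.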

In this embodiment, when the setting is to match the original voice to the synthesized speech, the original voice is adjusted so that its features other than gender (tempo, volume, and pitch) meet the standard speech generation conditions used in the default settings for synthesized speech, while the synthesized speech is generated in the gender of the original voice. Alternatively, the original voice may be converted into a voice of the default gender (for example, female) or of a gender specified by the user. In that case, the other-language (English, Chinese) synthesized speech is also generated in the default gender or the user-specified gender.

To match the voice features between the original voice and the synthesized speech, the embodiment matches the synthesized speech to the original voice or the original voice to the synthesized speech. Alternatively, both the original voice and the synthesized speech may be matched to a voice with predetermined features.

The embodiment has been described above as an example of the technology disclosed in this application. The technology of the present disclosure is not limited to it, however, and is also applicable to embodiments with modifications, replacements, additions, omissions, and the like. The components described in the above embodiment may also be combined to form a new embodiment.

For example, in the above embodiment, a source-language (Japanese) fixed phrase highly similar to the source text obtained by speech recognition of the user's utterance is searched for, the other-language fixed phrase corresponding to that source-language fixed phrase is acquired, and other-language synthesized speech is generated from the other-language fixed phrase. Alternatively, the source text may be translated with a translation engine to obtain an other-language sentence, and other-language synthesized speech may be generated from that sentence.

In the above embodiment, the loudspeaker 1 performs the processes of synthesized speech acquisition (speech recognition, search, and speech synthesis), voice analysis, voice adjustment, and so on. Alternatively, the loudspeaker 1 may transmit the necessary information (for example, the original voice) to a server device, and the server device may perform all or some of these processes. The server device may also perform the text translation using the translation engine mentioned above.

The loudspeaker and control method according to the present invention allow the user to switch, as appropriate for the situation on site, between outputting the original voice recorded from the user's utterance and outputting source-language synthesized speech generated from that original voice. They are useful as a loudspeaker that outputs a voice spoken by a user and as a control method for such a loudspeaker.

DESCRIPTION OF SYMBOLS
1 Loudspeaker
3 Microphone
6 Control unit
7 Amplifier
8 Speaker
14 Recording switch
16 Output switch
25 Storage unit
26 Synthesized speech acquisition unit
27 Voice output unit
31 Recording unit
32 Speech recognition unit
33 Search unit
34 Speech synthesis unit
35 Playback unit
36 Output control unit
37 Voice analysis unit
38 Voice adjustment unit

Claims

1. A loudspeaker that outputs a voice spoken by a user, comprising:
a microphone that picks up the voice spoken by the user;
a recording unit that records the original voice picked up by the microphone;
a synthesized speech acquisition unit that acquires source-language synthesized speech and other-language synthesized speech corresponding to the original voice;
a voice output unit that outputs the original voice, the source-language synthesized speech, and the other-language synthesized speech from a speaker; and
an output control unit that performs control, based on user setting information, so that either the original voice or the source-language synthesized speech is output as the source-language voice.

2. The loudspeaker according to claim 1, wherein, when synthesized speech is output, the output control unit performs control, based on the user setting information, so that either female synthesized speech or male synthesized speech is output.

3. The loudspeaker according to claim 1, wherein the output control unit performs control so that, following the source-language voice, a plurality of other-language synthesized speech items are output in an order designated by the user.

4. The loudspeaker according to any one of claims 1 to 3, wherein, when outputting the source-language voice and the other-language speech, the voice output unit inserts a silent period of a user-specified length between the voices of the respective languages.

5. The loudspeaker according to any one of claims 1 to 4, further comprising:
a voice analysis unit that acquires feature information of the original voice; and
a voice adjustment unit that, when the original voice and the other-language synthesized speech are output, performs processing to match voice features between the original voice and the other-language synthesized speech.

6. The loudspeaker according to claim 5, wherein the voice adjustment unit matches at least one of gender, tempo, volume, and pitch of the voice between the original voice and the other-language synthesized speech.

7. A method for controlling a loudspeaker that outputs a voice spoken by a user, the method comprising:
picking up the voice spoken by the user with a microphone;
recording the original voice picked up by the microphone;
when the original voice is selected as the source-language voice, outputting the original voice from a speaker, then acquiring other-language synthesized speech corresponding to the original voice and outputting the other-language synthesized speech from the speaker; and
when synthesized speech is selected as the source-language voice, acquiring source-language synthesized speech corresponding to the original voice and outputting the source-language synthesized speech from the speaker, then acquiring other-language synthesized speech corresponding to the original voice and outputting the other-language synthesized speech from the speaker.