JP2014170135A

JP2014170135A - Outdoor environmental sound transmitting device, and outdoor environmental sound transmitting system

Info

Publication number: JP2014170135A
Application number: JP2013042329A
Authority: JP
Inventors: Yoichi Suzuki; 陽一鈴木; Shuichi Sakamoto; 修一坂本; Masayuki Morimoto; 政之森本; Zhenglie Cui; 正烈崔; Hayato Sato; 逸人佐藤
Original assignee: Tohoku University NUC
Current assignee: Tohoku University NUC
Priority date: 2013-03-04
Filing date: 2013-03-04
Publication date: 2014-09-18

Abstract

PROBLEM TO BE SOLVED: To provide an outdoor environmental sound transmitting device for transmitting a sound message that is easy to hear for local residents outdoors in an echo environment. SOLUTION: An outdoor environmental sound transmitting device 100 for transmitting a sound message for disaster prevention to one or more branch stations laid in an area comprises: message division means 33 for dividing the sound message into meaningful symbol strings or obtaining the sound message divided into the meaningful symbol strings; speech period calculating means 34 for calculating a speech period for each of the symbol strings; section position determining means for determining the end of one or more of the symbol strings, in which the total of the speech periods is a delay period or less, as a section position of the sound message transmitted to each of the branch stations; soundless character inserting means 35, 36 for inserting soundless characters for the delay period or more at the section position of the sound message; and message transmission means 21 for transmitting the sound message, into which the soundless characters have been inserted, to the branch stations.

Description

The present invention relates to an outdoor environment voice transmission device that transmits a voice message for disaster prevention to one or more slave stations installed in an area.

Disasters arise from both natural phenomena and human causes. Governments and local organizations monitor for disasters that may endanger human life and, when one occurs, guide local residents to evacuate appropriately. Notices of a disaster and requests to evacuate are conveyed to local residents by television broadcasting, radio broadcasting, mobile communication, outdoor loudspeakers, and the like (see, for example, Patent Document 1).

The transmitted information may be accompanied by a voice message that conveys the nature of the disaster, the evacuation destination, and so on. For example, an Earthquake Early Warning carries the voice message "This is an Earthquake Early Warning from the Japan Meteorological Agency. An earthquake has occurred. Please ensure your safety." A tsunami warning carries the message "Damage from a tsunami will occur. People in coastal areas or along rivers should evacuate immediately to a safe place such as high ground or an evacuation building." A tsunami advisory carries the message "It is dangerous in the sea and near the coast. People in the sea should get out of the water immediately and move away from the coast." By understanding the content of the voice message, local residents can promptly take appropriate action.

However, a voice message delivered by an outdoor loudspeaker, one of these transmission methods, may not be audible to local residents depending on their surroundings. For example, when a resident is indoors and not watching television, the voice message is blocked and the resident cannot grasp its content. Techniques that supplement outdoor-loudspeaker transmission for residents in this situation have been devised (see, for example, Patent Document 2). Patent Document 2 discloses a disaster prevention alarm linkage system in which a residential alarm device installed in a house communicates with a master unit to notify residents in the house of an Earthquake Early Warning or the like.

Patent Document 1: JP 2001-273455 A
Patent Document 2: JP 2013-029944 A

However, even when local residents are outdoors, they may be unable to make out the voice message. The amplified sound from an outdoor loudspeaker is loud enough to carry a long distance, but depending on the local terrain it produces echoes (reflections off mountains and similar features), which make the message hard for residents to understand.

In view of the above problem, an object of the present invention is to provide an outdoor environment voice transmission device that, in an echo environment, transmits voice messages that residents outdoors can hear easily.

The present invention is an outdoor environment voice transmission device that transmits a voice message for disaster prevention to one or more slave stations installed in an area, the device comprising: voice message acquisition means for acquiring the voice message; message dividing means for dividing the voice message acquired by the voice message acquisition means into meaningful symbol strings, or for requesting an external entity to divide the voice message into the symbol strings; utterance time calculating means for calculating the utterance time of each symbol string; a delay time database in which the delay time of the echo produced when the voice message is amplified from a slave station is registered for each slave station; delimiter position determining means for performing, on the voice message to be transmitted to a slave station registered in the delay time database and based on that slave station's delay time, a process of determining, for each group of one or more symbol strings whose total utterance time is equal to or less than the delay time, the end of that group as a delimiter position of the voice message; unvoiced character inserting means for inserting, at each delimiter position of the voice message, unvoiced characters whose utterance time is equal to or greater than the delay time; and message transmitting means for transmitting the voice message with the unvoiced characters inserted to the slave station.
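As a rough illustration of the unvoiced character inserting means, the following Python sketch pads each delimiter position with enough unvoiced characters to cover a slave station's registered delay time. The station identifiers, the padding character "_", and the 100 ms utterance time per unvoiced character are hypothetical values chosen for illustration; none of them appears in the specification.

```python
import math

# Hypothetical delay time database: slave station ID -> echo delay in ms.
DELAY_DB_MS = {"station-01": 800.0, "station-02": 250.0}


def pad_message(phrases, station_id, delay_db_ms=DELAY_DB_MS,
                silent_char="_", silent_char_ms=100.0):
    """Join phrases with unvoiced characters lasting at least the station's delay.

    `phrases` is the message already split at the delimiter positions chosen
    for this station. `silent_char` and `silent_char_ms` are assumptions.
    """
    delay_ms = delay_db_ms[station_id]
    # Smallest number of unvoiced characters whose total duration >= delay.
    count = math.ceil(delay_ms / silent_char_ms)
    return (silent_char * count).join(phrases)
```

Because the padding is derived from the per-station entry, the same phrases yield a longer silent run for a station with a longer registered delay.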

It is possible to provide an outdoor environment voice transmission device that, in an echo environment, transmits voice messages that residents outdoors can hear easily.

An example of a diagram explaining the echo of a voice message.
An example of a diagram outlining the processing of the disaster prevention radio system.
A diagram showing a configuration example of the overall disaster prevention radio scheme.
An example of a block diagram of the disaster prevention radio system.
A diagram showing the results of a speech intelligibility test under echo.
An example of a hardware configuration diagram of the pause insertion device.
An example of a functional block diagram of the pause insertion device.
A diagram showing an example of the utterance time database and the delay time database.
An example of a diagram explaining how delimiter positions are determined.
An example of a diagram explaining the pause time.
Another example of a functional block diagram of the pause insertion device.
An example of a flowchart showing the operating procedure of the pause insertion device.
An example of a functional block diagram of the pause insertion device (Example 2).
A diagram showing an example of the delay times calculated by the long-path echo calculation unit and of the echo/intelligibility correspondence database.
An example of a flowchart showing the operating procedure of the pause insertion device.

Embodiments of the present invention are described below with reference to the drawings. The technical scope of the present invention is not, however, limited to these embodiments.

First, the echo of a voice message is described with reference to FIG. 1. The slave station in the figure has a loudspeaker, from which a voice message conveying the nature of the disaster, the evacuation destination, and so on is broadcast. When the loudspeaker is aimed at, for example, residents living in a village, the voice message reaches them directly from the loudspeaker (hereinafter, the direct sound). If there is a reflecting object such as a mountain, the same voice message is also reflected by that object and reaches the residents after a delay (hereinafter, the reflected sound).

Because the direct sound and the reflected sound travel along different paths, the same voice message reaches the same residents again after a delay time Δt. Let L_0 be the straight-line distance from the slave station to the residents and L_1 + L_2 the length of the path from the slave station to the residents via the reflecting object; the delay time Δt is then (L_1 + L_2 − L_0) divided by the speed of sound. In the example in the figure, the direct sound "please" and the reflected sound "evacuate to high ground" are heard on top of each other, which makes the message hard to understand (transmission of a voice message in this way is hereinafter called delayed overlap transmission).
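The relation between the path lengths and the delay time can be checked with a short Python sketch. The function name and the default speed of sound of 340 m/s are assumptions made for illustration; the specification only states that the path difference is divided by the speed of sound.

```python
def echo_delay_ms(l0_m, l1_m, l2_m, speed_of_sound_mps=340.0):
    """Delay of the reflected sound relative to the direct sound, in ms.

    l0_m: straight-line distance from slave station to listener (L_0).
    l1_m, l2_m: station-to-reflector and reflector-to-listener legs (L_1, L_2).
    speed_of_sound_mps: assumed value; it varies with air temperature.
    """
    extra_path_m = (l1_m + l2_m) - l0_m
    if extra_path_m < 0:
        raise ValueError("the reflected path cannot be shorter than the direct path")
    return 1000.0 * extra_path_m / speed_of_sound_mps


# Example: listener 100 m from the station, reflector path of 300 m + 310 m.
delay = echo_delay_ms(100.0, 300.0, 310.0)  # extra path of 510 m
```

With these example distances the extra path is 510 m, which at 340 m/s corresponds to a delay of 1.5 s, far shorter than a typical warning sentence.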

The disaster prevention radio system of this embodiment therefore suppresses delayed overlap transmission by inserting pauses into the voice message.

FIG. 2 is an example of a diagram outlining the processing of the disaster prevention radio system of this embodiment.
The upper part of FIG. 2(a) shows a case in which the voice message is reflected as an echo but delayed overlap transmission does not occur. Let Ts [ms] be the length of the sentence "Please evacuate to high ground". When the delay time Δt is equal to or greater than the sentence length, the reflected sound arrives only after the direct sound has finished, so delayed overlap transmission does not occur.

In practice, however, the delay time Δt is rarely as long as the sentence, so delayed overlap transmission occurs as shown in the lower part of FIG. 2(a). The disaster prevention radio system of this embodiment suppresses delayed overlap transmission as follows.
(i) Divide the voice message into phrases, each no longer than the delay time Δt.
(ii) Insert a pause of Δt or longer between phrases.
FIG. 2(b) is an example of a diagram explaining the division of the voice message. The voice message is divided into "to high ground" (高台に), "evacuate" (避難して), and "please" (下さい) so that each phrase is no longer than the delay time Δt. Strictly speaking the phrases differ in length, but here they are given a common phrase length Ps [ms]. Hence Ps ≤ Δt.
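Step (i) can be sketched in Python as a greedy grouping of consecutive symbol strings whose total utterance time stays within Δt. This is only one possible strategy, shown under the assumption that utterance times are already known in milliseconds; the function name and the example durations are hypothetical.

```python
def group_into_phrases(durations_ms, delay_ms):
    """Group consecutive symbol strings so each group lasts at most delay_ms.

    Returns a list of groups, each a list of indices into durations_ms.
    The end of every group is a candidate delimiter (pause) position.
    A single symbol string longer than delay_ms is left in a group of its
    own; how to handle that case is not covered by this sketch.
    """
    groups = []
    current = []
    total = 0.0
    for index, duration in enumerate(durations_ms):
        # Close the current group if adding this string would exceed the delay.
        if current and total + duration > delay_ms:
            groups.append(current)
            current = []
            total = 0.0
        current.append(index)
        total += duration
    if current:
        groups.append(current)
    return groups


# Five symbol strings and a delay of 800 ms (made-up numbers).
phrases = group_into_phrases([400, 300, 500, 200, 600], 800)
```

For the example durations, the first two strings (700 ms in total) form one phrase, the next two (700 ms) another, and the last string stands alone.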

FIG. 2(c) is an example of a diagram schematically showing the insertion of pauses between phrases. A pause of Δt is inserted after "to high ground". Because Ps ≤ Δt, the next direct sound, "evacuate", does not reach the residents until the reflected sound of "to high ground" has finished. The residents therefore hear the direct sound "to high ground" followed by its reflected sound, and the direct and reflected sounds do not overlap.

When the first pause (silent period) has elapsed, the direct sound "evacuate" is broadcast, followed by a pause of Δt or longer. The reflected sound of "evacuate" arrives during this pause, so again the direct and reflected sounds do not overlap. When the second pause has elapsed, the direct sound "please" is broadcast, and its reflected sound follows.
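The timing argument of FIG. 2(c) can be checked numerically. The Python sketch below lays out the direct and reflected intervals for a sequence of phrases and sums any time during which a direct sound and a reflected sound are heard together. It assumes a single reflection with a fixed delay; the phrase lengths are made-up values.

```python
def schedule(phrase_ms, delay_ms, pause_ms):
    """Return (direct, reflected) lists of (start, end) times in ms."""
    direct, reflected = [], []
    start = 0.0
    for length in phrase_ms:
        direct.append((start, start + length))
        # The reflected copy arrives delay_ms after the direct sound.
        reflected.append((start + delay_ms, start + delay_ms + length))
        start += length + pause_ms
    return direct, reflected


def overlap_ms(direct, reflected):
    """Total time during which a direct and a reflected sound coincide."""
    total = 0.0
    for d_start, d_end in direct:
        for r_start, r_end in reflected:
            total += max(0.0, min(d_end, r_end) - max(d_start, r_start))
    return total


phrases_ms = [600.0, 700.0, 500.0]   # each no longer than the 800 ms delay
without_pause = overlap_ms(*schedule(phrases_ms, 800.0, 0.0))
with_pause = overlap_ms(*schedule(phrases_ms, 800.0, 800.0))
```

With no pause the example phrases overlap with their echoes for a total of one second; with a pause equal to Δt the overlap falls to zero, as the text describes.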

Thus, by inserting pauses of suitable length between phrases, the situation in the upper part of FIG. 2(a) can be created at will, and delayed overlap transmission can be suppressed for any voice message and any delay time Δt. As described later, a slight overlap between the direct and reflected sounds has little effect on the intelligibility of the voice message, so the end of the direct sound and the beginning of the reflected sound may overlap by one to a few characters. In other words, the overlap need not be eliminated completely.

FIG. 3 shows a configuration example of the overall disaster prevention radio scheme. Such a scheme is called an outdoor environment voice transmission system or a disaster prevention administrative radio system. The national government has built a system that relays information obtained by the Japan Meteorological Agency, the Fire and Disaster Management Agency, and others about situations requiring urgent action to local governments such as municipalities. The national and local governments can communicate via the Internet and satellite links. The information detected and relayed by the Japan Meteorological Agency and others includes Earthquake Early Warnings, tsunami warnings, tsunami advisories, eruption warnings, eruption forecasts, Tokai earthquake prediction information, seismic intensity bulletins, weather information, landslide alert information, tornado advisory information, record short-time heavy rain information, designated river flood forecasts, and civil protection information concerning terrorism, missiles, and the like. These are hereinafter referred to collectively as "emergency information".

The local government has a receiving antenna 102 that receives radio waves from an artificial satellite 101 and a communication device for connecting to the Internet (a network 24 described later), so it can receive emergency information at all times without human intervention.

The local government receives a voice message sent by the Japan Meteorological Agency or the like, or selects a voice message suited to the situation, and transmits it to the broadcast-type slave stations 200 (the same applies to transmission to mobile-type slave stations). The slave stations 200 are installed across the area according to a plan so that every resident can hear the voice message.

The local government also has telemeters 300, such as river water level gauges and seismometers. It can evaluate the data sent from a telemeter 300 against its own criteria and broadcast a voice message from the slave stations 200.

FIG. 4 shows an example of a block diagram of the disaster prevention radio system installed at a local government. The disaster prevention radio system 100 is an example of the outdoor environment voice transmission device recited in the claims. The console 20 accepts operations by a user (a disaster prevention official of the local government) for sending a voice message, and controls the voice message sent via the master station communication device 21. The console 20 also receives emergency information or voice messages sent by the Japan Meteorological Agency or the like via the network 24, and sends a voice message via the master station communication device 21 without any user operation.

The console 20 has a microphone 11, a display unit 12, an operation unit 13, a data storage unit 14, a control unit 15, and an input/output I/F 16. The control unit 15 is connected to the master station communication device 21.

The control unit 15 is an information processing device such as a microcomputer equipped with a CPU, RAM, ROM, I/O, and the like. The control unit 15 controls the whole console 20 through the OS (Operating System) and programs executed by the CPU.

The microphone 11 picks up the user's voice and converts it into an electrical signal. The control unit 15 encodes the electrical signal by a predetermined method and converts it into a voice signal. No particular coding method is prescribed; standard coding rates include 26.6 kbps and 6.4 kbps. The voice signal is stored in the data storage unit 14.

The display unit 12 is a flat panel display such as a liquid crystal display, preferably with an integrated touch panel. In addition to operation menus, the display unit 12 shows maps imported from outside, video captured by cameras, and the like.

The operation unit 13 comprises hard keys such as a keyboard, soft keys displayed on the touch panel, a pointing device, and the like. The operation unit 13 accepts, for example, the selection of the voice message to be broadcast from the slave stations 200.

The data storage unit 14 is an HDD (Hard Disk Drive) or an SSD (Solid State Drive) and stores voice signals, text data of voice messages, programs, and the like.

The input/output I/F 16 is an interface for communicating with peripheral devices, for example an Ethernet (registered trademark) card or a USB host. A FAX device 22, a speech synthesizer 23, a pause insertion device 30, and the network 24 are connected to the input/output I/F 16.

The FAX device 22 receives image data from the telephone network or the network 24 and prints it, and sends text that the console 20 asks it to send to a specified destination. The speech synthesizer 23 converts text data into speech. By converting the voice message of the emergency information requested by the console 20 into speech, it allows a voice signal to be sent to the slave stations 200 without the user reading the message aloud. The speech produced by the speech synthesizer 23 is converted into a voice signal by the speech synthesizer 23 or the control unit 15 and recorded in the data storage unit 14.

The pause insertion device 30 is an information processing device such as a PC (Personal Computer) or a microcomputer. Its functions are described later; they may alternatively be built into the control unit 15.

The network 24 is mainly a LAN, which is connected to an IP network such as the Internet. This allows information from outside, such as J-ALERT (the nationwide instantaneous warning system), to be received.

The control unit 15 transmits the voice signal stored in the data storage unit 14 by radio from the master station communication device 21. A radio communication standard is defined for the disaster prevention radio system 100 (for example, ARIB STD-T86, the standard for municipal digital broadcast communication systems). Specifically, the transmission unit 17 modulates the voice signal by π/4-shift QPSK and, using TDMA-TDD (Time Division Multiple Access - Time Division Duplex), divides one frame into six time slots. Excluding the control slot, voice signals can be carried in five slots. The modulated voice signal is transmitted from the antenna 25 on, for example, a 60 MHz carrier.

Under this standard, the transmission unit 17 of the master station communication device 21 issues a broadcast start instruction to the slave stations 200 over the paging channel. Frames on the paging channel contain a prefecture code and a user code (which designates a slave station). By setting an arbitrary user code, the master station communication device 21 can call a slave station 200 (or a group of slave stations) individually.

This communication standard is only one example of a communication method adopted by local governments and the like; the master station communication device 21 and the slave stations 200 may instead communicate over a mobile phone network, a PHS network, a WiMAX network, a wireless LAN, or the like. Alternatively, a network in which slave stations communicate with one another, as in ZigBee (registered trademark), may be built and used for communication between the slave stations 200 and the master station communication device 21. Wired communication may also be used instead of radio in some or all areas.

The slave station 200 receives a voice message from the master station communication device 21 and broadcasts it to local residents through its loudspeaker. The slave station 200 has a receiving unit 51, a control unit 52, a microphone 54, a speaker 53, and the like. It may also have an operation unit, a display unit, and so on (not shown). The slave station 200 can also make local broadcasts on its own, such as a simulated siren or a chime.

The receiving unit 51 of the slave station 200 demodulates the carrier received by the antenna 55, extracts the voice signal, and outputs it to the control unit 52. The control unit 52 converts the voice signal into a voltage, amplifies it, and outputs it from the speaker 53. The microphone 54 picks up sound around the slave station 200; this signal can also be sent from the slave station 200 to the master station communication device 21.

[Ease of recognizing words in an echo environment]
The applicant conducted an experiment on the effectiveness of language control in an echo environment.

1. Experiment
The stimuli were utterances by speaker mya from the FW07 database [Kondo et al., IEICE Technical Report, SP2007-157, 2008]. Of the four word-familiarity ranks, the top two were used (high familiarity: 7.0-5.5; low familiarity: 5.5-4.0). The experiment simulated environments in which one or two subsequent sounds are present.

FIG. 5(a) shows the time patterns of the subsequent sounds produced by the echo. As shown, each stimulus was a sequence of four words to which one or two subsequent sounds were added at specified delay times. The symbols ○, □, △, and × in the figure represent four different words. Each word has four morae and lasts about 750 ms, so a time offset of half a word length is 375 ms. The subsequent sounds used the same speech as the original sound. Because of the nature of the FW07 database, each listener had to hear all 400 words of one familiarity condition, so five speech sets were made and 80 words (20 four-word sequences) were randomly assigned to each set. There were seven time-offset conditions for the subsequent sounds (conditions A to G in FIG. 5(a)), and at least seven listeners were needed so that no listener heard the same four-word sequence again under a different condition. Five different listeners were therefore assigned to each condition. The seven listeners took part in both the high- and low-familiarity experiments. Each listener sat on a chair in a soundproof room and, after each four-word sequence was presented, was instructed to write the four words on an answer sheet exactly as heard within 15 seconds. The speech was presented to both ears through headphones (Sennheiser HDA-200).
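The way a stimulus with subsequent sounds is assembled can be sketched in Python as follows. The sketch adds delayed copies of the original sample sequence to itself. The 16 kHz sampling rate and the equal level of the copies are assumptions made for illustration, and the actual offsets of conditions A to G are not reproduced here.

```python
def ms_to_samples(duration_ms, sample_rate_hz):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(duration_ms * sample_rate_hz / 1000)


def add_subsequent_sounds(original, delays_in_samples):
    """Mix the original signal with one delayed copy per entry in delays.

    `original` is a list of float samples. Each copy has the same level as
    the original, which is an assumption of this sketch.
    """
    longest = max(delays_in_samples, default=0)
    mixed = [0.0] * (len(original) + longest)
    for offset in [0, *delays_in_samples]:
        for i, value in enumerate(original):
            mixed[offset + i] += value
    return mixed


# Half a word length (375 ms) at an assumed 16 kHz sampling rate.
half_word = ms_to_samples(375, 16000)
# Tiny toy signal: two samples plus one copy delayed by one sample.
toy = add_subsequent_sounds([1.0, 2.0], [1])
```

Passing two delays instead of one yields the two-subsequent-sound case described in the text.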

2. Results and discussion
FIG. 5(b) shows the mean correct-answer rate under each experimental condition for high and low familiarity. Black bars indicate the high-familiarity condition and white bars the low-familiarity condition. A two-factor analysis of variance was performed with the seven time-offset conditions and the two familiarity conditions as within-subject factors. The analysis showed significant effects for all of the following: the main effect of the time-offset condition (F(6, 594) = 141.63, p < .001), the main effect of the familiarity condition (F(1, 99) = 85.98, p < .001), and the interaction between time offset and familiarity (F(6, 594) = 2.15, p < .05). Except in the original-sound-only condition, the correct-answer rate was significantly higher when familiarity was high. This suggests that familiarity is an effective index for controlling vocabulary difficulty even in outdoor environments with long-path echoes. When the data were analyzed separately by familiarity, in the high-familiarity condition no significant difference was found in the mean correct-answer rate among the conditions with two subsequent sounds (E to G), but with one subsequent sound a significant difference was found between the conditions in which the delay time of the subsequent sound was one word length (B and D). With two subsequent sounds the correct-answer rate was low overall, which is probably why no effect of the delay time was observed. The low-familiarity condition showed a similar tendency, except that a significant difference between the conditions in which the delay time was one word length was also found with two subsequent sounds (B and D, E and G). The reason will be examined in detail in future work by calculating word intelligibility for each word position.

5. Summary
The effect of word familiarity on word intelligibility in the presence of long-path echoes was investigated. The experiment showed that word intelligibility is strongly affected by echoes.

[Pause insertion device]
FIG. 6 shows an example of the hardware configuration of the pause insertion device 30. The pause insertion device 30 includes a CPU 301, a ROM 302, a RAM 303, an HDD 304, a graphic board 305 to which a display 320 is connected, a keyboard/mouse 306, a media drive 307, and a network communication unit 308, all connected to a bus. The CPU 301 loads the program 310 stored in the HDD 304 into the RAM 303 and executes it, controlling each component to perform input/output and to process data. The ROM 302 stores the BIOS and a start program that reads the bootstrap loader from the HDD 304 into the RAM 303. The bootstrap loader reads the OS from the HDD 304 into the RAM 303.

The HDD 304 need only be a nonvolatile memory and may be an SSD (Solid State Drive) or the like. The HDD 304 stores the OS, device drivers, and the program 310 that provides the functions described later. Examples of the OS include the Windows (registered trademark) family, LINUX (registered trademark), and UNIX (registered trademark). The display 320 shows a GUI screen that the graphic board 305 generates as instructed by the program.

The keyboard/mouse 306 is an input device that accepts user operations. The keyboard/mouse 306 and the display may be omitted when the user does not operate the pause insertion device 30 directly.

The media drive 307 reads and writes data on optical media such as compact discs, DVDs, and Blu-ray discs. It may also read and write data on memory cards such as flash memory.

The network communication unit 308 is, for example, an Ethernet (registered trademark) card for connecting to a LAN. TCP/IP (UDP/IP) and application-layer protocol processing is performed by the OS and the program 310. Among the various application-layer protocols, common ones are supported (for example, HTTP, FTP, and SNMP (Simple Network Management Protocol)).

The program 310 is distributed as a file in an installable or executable format recorded on a computer-readable recording medium. The program 310 may also be distributed from a server (not shown) as a file in an installable or executable format.

FIG. 7 shows an example of a functional block diagram of the pause insertion device 30. The voice signal stored in the data storage unit 14 of the console 20 is acquired by the voice signal acquisition unit 31 via the input/output I/F 16. In this example the voice signal acquisition unit 31 has acquired the voice signal of the utterance 「高台に避難してください」 ("Please evacuate to high ground"), which the speech recognition unit 32 converts into text data. Publicly known software may be used for the speech recognition unit 32; in general, recognition is performed using an acoustic model and a language model, and the result is output as text data. Alternatively, the voice signal may be sent via the network 24 and recognized by an external server (for example, a cloud service). The speech recognition unit 32 sends the text data to the phrase division unit 33.

The phrase division unit 33 divides the text data into phrases. A phrase is a symbol string of one or more characters that is meaningful to a human. For example, a phrase can be understood as a word. Grammatically, however, particles such as 「に」 (ni) and 「て」 (te) are also words (dependent words), and a string such as 「高台に」 ("to high ground") or 「避難して」 ("evacuate") is an "independent word + dependent word". On the other hand, a word with no dependent word attached may form a phrase by itself. A phrase in this embodiment is therefore not limited to a word; any meaningful symbol string qualifies. Naturally, it may contain digits and alphabetic characters as well as other characters.

The text is divided into phrases in order to determine the break positions at which pauses are inserted. Because the utterance time Ps of a phrase is preferably equal to or less than the delay time Δt, the text is divided into phrases that are the shortest meaningful symbol strings (for example, strings that contain an independent word).

In a language written with spaces between words, such as English, the text is already divided into words once it has been converted into text data, so it can simply be split word by word. However, just as with Japanese dependent words, local residents would find it hard to grasp the meaning if an indefinite or definite article such as "a" or "the" were isolated, so an article is combined with the word that follows it to form a phrase.

In a language that is not written with spaces between words, such as Japanese, it is preferable to divide the text into words first and then form phrases. As a simple way of dividing text into words, kana-kanji conversion is effective for languages in which conversion to kanji is possible, such as Japanese (and Chinese, Korean, Vietnamese, and the like). Kana-kanji conversion can be performed with a general IME (Input Method Editor), and an external server can also be used.

In the kana-kanji-converted text data, the span from a kanji (or a run of consecutive kanji) up to the next kanji that follows kana is estimated to be one word. For example, if kana-kanji conversion yields the text data 「高台に避難して下さい」 ("Please evacuate to high ground"), the words are 「高台に」, 「避難して」, and 「下さい」.
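The kanji-boundary heuristic described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used in the patent: the function names are hypothetical, and "kanji" is assumed to mean a character in the CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
def is_kanji(ch: str) -> bool:
    """Assumption: kanji = CJK Unified Ideographs block only
    (rarer extension blocks are ignored in this sketch)."""
    return "\u4e00" <= ch <= "\u9fff"


def split_at_kanji_runs(text: str) -> list:
    """Start a new word whenever a kanji follows a non-kanji character."""
    words = []
    current = ""
    for ch in text:
        if current and is_kanji(ch) and not is_kanji(current[-1]):
            words.append(current)
            current = ""
        current += ch
    if current:
        words.append(current)
    return words


# The example sentence from the text.
words = split_at_kanji_runs("高台に避難して下さい")
```

For the example sentence, the sketch reproduces the three words given in the text. Words written entirely in kana would not be separated by this heuristic.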

Morphological analysis is also an effective way of dividing Japanese into words. Morphological analysis is a process that divides a sentence written in a natural language into the smallest units that carry grammatical meaning; for Japanese, it can divide text down to the part-of-speech level. From the text data 「高台に避難してください」, the morphemes 「高台」, 「に」, 「避難し」, 「て」, 「下さ」, and 「い」 are extracted. The phrase division unit 33 treats a particle together with the immediately preceding noun, verb, adjective, adjectival noun, or the like as one phrase. An external server can also be used for phrase division.
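The grouping of morphemes into phrases can be sketched as below. This is an illustrative assumption-laden example: the part-of-speech labels are hand-assigned rather than produced by a real analyzer, the function name is hypothetical, and attaching the trailing ending 「い」 (labeled "auxiliary" here) to the preceding stem goes slightly beyond the particle rule stated in the text.

```python
# Part-of-speech labels that attach to the preceding morpheme.
# "particle" follows the text; "auxiliary" is an added assumption.
DEPENDENT_POS = {"particle", "auxiliary"}


def group_into_phrases(morphemes):
    """Merge each dependent morpheme into the phrase that precedes it."""
    phrases = []
    for surface, pos in morphemes:
        if phrases and pos in DEPENDENT_POS:
            phrases[-1] += surface
        else:
            phrases.append(surface)
    return phrases


# Hand-labeled morphemes for the example sentence (labels are hypothetical).
morphemes = [
    ("高台", "noun"),
    ("に", "particle"),
    ("避難し", "verb"),
    ("て", "particle"),
    ("下さ", "verb"),
    ("い", "auxiliary"),
]
phrases = group_into_phrases(morphemes)
```

With these labels the six morphemes collapse into the three phrases used in the rest of the description.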

The text data divided into phrases in this way is sent to the utterance time length calculation unit 34. The utterance time length calculation unit 34 refers to the utterance time database 37 and calculates the utterance time of each phrase.

FIG. 8(a) shows an example of the utterance time database 37. The utterance time database 37 registers the number of morae of each phrase. A mora is a unit of sound in Japanese, and once the number of morae is known, the utterance time of the whole phrase can be determined. Suppose, for example, that the standard utterance time is 750 [ms] for 4 morae. Since 「高台に」 has 5 morae, its utterance time is 187.5 (utterance time of 1 mora) × 5 = 937.5 [ms].

The utterance time may be registered instead of the number of morae. The number of morae may also be registered per word rather than per phrase. In that case, the mora counts of the words contained in a phrase are summed.

Returning to FIG. 7, the utterance time length calculation unit 34 attaches the utterance time to each phrase and sends the result to the break position / pause time determination unit 35. For example, for 「高台に」, 「避難して」, and 「下さい」, the utterance time is attached to each phrase as 「高台に：937.5」, 「避難して：937.5」, and 「下さい：750」.
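The mora-based calculation above amounts to a table lookup and a multiplication. The sketch below assumes a dictionary standing in for the utterance time database 37; the names are hypothetical, and the mora counts for 「避難して」 (5) and 「下さい」 (4) are inferred from the utterance times quoted in the text rather than stated there.

```python
# Standard speaking rate from the text: 4 morae take 750 ms.
MS_PER_MORA = 750.0 / 4  # 187.5 ms per mora

# Stand-in for the utterance time database 37 (phrase -> mora count).
MORA_COUNTS = {"高台に": 5, "避難して": 5, "下さい": 4}


def utterance_time_ms(phrase, mora_counts=MORA_COUNTS, ms_per_mora=MS_PER_MORA):
    """Utterance time of a phrase = number of morae x time per mora."""
    return mora_counts[phrase] * ms_per_mora


# Attach the utterance time to each phrase, as unit 34 does.
timed_phrases = [(p, utterance_time_ms(p)) for p in ["高台に", "避難して", "下さい"]]
```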

The break position / pause time determination unit 35 refers to the delay time database 38 and determines the break positions at which pauses are inserted and the pause time.

FIG. 8(b) shows an example of the delay time database 38. The delay time database 38 registers a delay time Δt in association with the slave station ID that identifies each slave station. For slave stations with the same delay time, the break positions and pause time need to be determined only once. For example, the delay time of the slave station with slave station ID = 1 is 1000 [ms]. As described with reference to FIG. 2(a), if the sentence length Ts of the entire voice message is equal to or less than the delay time, no delayed overlapping transmission occurs, so no pause needs to be inserted. Overlap is also acceptable if it lasts only for a time that does not affect speech intelligibility. This time can be obtained experimentally as the overlap for which the drop in speech intelligibility stays at or below a threshold.
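The two observations above (stations sharing a delay time need only one computation, and a message no longer than the delay time needs no pause) can be sketched as follows. The dictionary is a stand-in for the delay time database 38; only the entry for slave station ID 1 comes from the text, and the other entries and all function names are hypothetical.

```python
from collections import defaultdict

# Stand-in for the delay time database 38 (station ID -> delay in ms).
# Station 1 is from the text; stations 2 and 3 are made-up examples.
DELAY_MS_BY_STATION = {1: 1000.0, 2: 1500.0, 3: 1000.0}


def stations_by_delay(delay_by_station):
    """Group station IDs by delay so each distinct delay is processed once."""
    groups = defaultdict(list)
    for station_id, delay_ms in sorted(delay_by_station.items()):
        groups[delay_ms].append(station_id)
    return dict(groups)


def needs_pause(sentence_length_ms, delay_ms):
    """No pause is needed when the whole message fits within the delay time."""
    return sentence_length_ms > delay_ms


groups = stations_by_delay(DELAY_MS_BY_STATION)
```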

When the sentence length Ts of the entire voice message is not equal to or less than the delay time, the break position / pause time determination unit 35 determines the break positions and the pause time by the following procedure.

FIG. 9 is an example of a diagram explaining how the break positions are determined. As shown in FIG. 9(a), there are four phrases whose utterance times are Ps1, Ps2, Ps3, and Ps4.

The break position / pause time determination unit 35 checks, in order from the first phrase, whether the accumulated utterance time is equal to or less than the delay time. FIG. 9(b) shows how the first break position is determined.
(i) It is determined whether Ps1 is equal to or less than Δt. If the determination is N, the break position is after 「○○○」. If the utterance time of a single phrase does not fit within the delay time, the phrase may be divided into character units shorter than the phrase.

Because speech intelligibility does not drop much when the overlap between the direct sound and the reflected sound is short, delayed overlapping transmission that lasts only for a time that does not affect intelligibility can be tolerated. In that case, letting α be the time that does not affect intelligibility, Δt is replaced with "Δt − α" in the determinations below.
(ii) If the determination in (i) is Y, it is determined whether Ps1 + Ps2 is equal to or less than Δt. If this determination is N, the break position is after 「○○○」.
(iii) If the determination in (ii) is Y, it is determined whether Ps1 + Ps2 + Ps3 is equal to or less than Δt. If this determination is N, the break position is after 「△△△△」.
(iv) If the determination in (iii) is Y, it is determined whether Ps1 + Ps2 + Ps3 + Ps4 is equal to or less than Δt. If this determination is N, the break position is after 「□□」.
If Ps1 + Ps2 + Ps3 + Ps4 is equal to or less than Δt, it is determined that no break is needed.

FIG. 9(c) shows how the second break position is determined when the first break position is after 「○○○」.
(i) It is determined whether Ps2 is equal to or less than Δt. If the determination is N, the break position is after 「△△△△」.
(ii) If the determination in (i) is Y, it is determined whether Ps2 + Ps3 is equal to or less than Δt. If this determination is N, the break position is after 「△△△△」.
(iii) If the determination in (ii) is Y, it is determined whether Ps2 + Ps3 + Ps4 is equal to or less than Δt. If this determination is N, the break position is after 「□□」.
If Ps2 + Ps3 + Ps4 is equal to or less than Δt, it is determined that a second break is not needed.

FIG. 9(d) shows how the second or third break position is determined when the preceding break position is after 「△△△△」.
(i) It is determined whether Ps3 is equal to or less than Δt. If the determination is N, the break position is after 「□□」.
(ii) If the determination in (i) is Y, it is determined whether Ps3 + Ps4 is equal to or less than Δt. If this determination is N, the break position is after 「□□」.
If Ps3 + Ps4 is equal to or less than Δt, it is determined that a second or third break is not needed.

In this way, a break position is set after each group containing the largest number of phrases whose total utterance time does not exceed the delay time; at the finest, a break position is set after every phrase.

For example, the sentence length Ts of 「高台に避難して下さい」 is 937.5 + 937.5 + 750 = 2625 [ms]. The utterance time of the first phrase (「高台に」) is 937.5 [ms] and the delay time of the slave station with slave station ID = 1 is 1000 [ms], so the first phrase fits and the utterance time of the next phrase (「避難して」) is added to it.

937.5 + 937.5 = 1875 [ms] is greater than the delay time, so the break position is after the first phrase (「高台に」).

Next, the utterance time of the second phrase (「避難して」) is 937.5 [ms], so the utterance time of the third phrase (「下さい」) is added to it.

937.5 + 750 = 1687.5 [ms] is greater than the delay time, so the break position is after the second phrase (「避難して」). In this voice message, therefore, a pause is inserted between every pair of adjacent phrases.
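The step-by-step determination of FIG. 9 is a greedy grouping: phrases are accumulated while their total utterance time stays at or below Δt, and a break is placed before the phrase that would exceed it. A minimal sketch under stated assumptions follows; the function and parameter names are hypothetical, `tolerance_ms` plays the role of α (the text replaces Δt with Δt − α), and a single phrase longer than the limit is simply kept as its own segment rather than split into shorter character units.

```python
def split_into_segments(durations_ms, delay_ms, tolerance_ms=0.0):
    """Greedily group phrase indices so each group's total time <= limit.

    A break position lies after the last phrase of every group but the final one.
    """
    limit = delay_ms - tolerance_ms  # the text substitutes (delta_t - alpha)
    segments = []
    current = []
    total = 0.0
    for index, duration in enumerate(durations_ms):
        # Close the current group if adding this phrase would exceed the limit.
        if current and total + duration > limit:
            segments.append(current)
            current = []
            total = 0.0
        current.append(index)
        total += duration
    if current:
        segments.append(current)
    return segments


# Worked example from the text: three phrases, slave station ID = 1 (1000 ms).
example = split_into_segments([937.5, 937.5, 750.0], delay_ms=1000.0)
```

For the worked example each phrase ends up in its own segment, matching the conclusion that a pause is placed between every pair of adjacent phrases.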

The break position / pause time determination unit 35 also determines the pause time.
FIG. 10 is an example of a diagram explaining the pause time. Because the utterance time of one segment of phrases must be accommodated within the pause, the pause time is at least the utterance time of the immediately preceding phrase. In addition, measured from the start of that phrase, the next phrase must not begin before the sum of the delay time and the phrase time (that is, before the reflected sound has ended); otherwise delayed overlapping transmission with the next phrase occurs. As illustrated, the pause time is therefore the difference between the delay time Δt and the phrase utterance time Ps, plus the phrase utterance time Ps. This value equals the delay time Δt.

When an overlap of duration α that does not affect speech intelligibility is tolerated, the pause time is set to Δt − α. Shortening the pause time reduces the time needed to finish transmitting the entire voice message.

A pause time longer than the delay time can be used, but an unnecessarily long pause time lengthens the time needed to finish transmitting the voice message, which is undesirable for a disaster prevention radio system. When urgency is low, however, a pause time longer than the delay time may be used.

A slight delayed overlap (for example, about 1 mora) does not reduce intelligibility much, so a pause time shorter than the delay time is not excluded.

The break position / pause time determination unit 35 sends the break positions and the pause time to the pause insertion unit 36. Alternatively, the break position / pause time determination unit 35 may itself reconstruct the voice message by inserting pauses of the determined pause time at the break positions. In this embodiment, the pause insertion unit 36 inserts the pauses into the text data divided by the phrase division unit 33.

The pause insertion unit 36 inserts pauses of about the delay time Δt into the original voice message. Specifically, it inserts blank characters (spaces). Because the utterance time of one blank character is fixed, the number of blank characters corresponding to the pause time is inserted. This yields a voice message with pauses inserted, such as 「高台に」, "pause", 「避難して」, "pause", 「下さい」.
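Inserting a pause as a run of blank characters can be sketched as below. The text states only that one blank character has a fixed utterance time; the value of 250 ms used here, the rounding up to a whole number of blanks, and the function name are all assumptions made for illustration.

```python
import math


def insert_pauses(segments, pause_ms, blank_ms):
    """Join text segments with enough blanks to cover the pause time.

    Rounding up (an assumption) keeps the pause from being shorter
    than requested.
    """
    blanks_per_pause = math.ceil(pause_ms / blank_ms)
    return (" " * blanks_per_pause).join(segments)


# Pause time = delay time of slave station ID 1; blank duration is hypothetical.
message = insert_pauses(["高台に", "避難して", "下さい"], pause_ms=1000.0, blank_ms=250.0)
```

To tolerate an overlap α, the same function could be called with `pause_ms` set to Δt − α.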

The pause insertion device 30 sends the voice message with the pauses inserted to the console 20. The console 20 converts the voice message with the pauses inserted back into a voice signal. For example, the message may be shown on the display unit or printed so that the user speaks it again with the pauses, and the microphone 11 picks up the sound. Alternatively, the message may be sent to the speech synthesizer 23 and a voice signal obtained from the speech synthesizer 23.

Because the transmission unit 17 transmits a voice message into which pauses of about the delay time have been inserted, delayed overlapping transmission of the voice that the slave station 200 broadcasts from the speaker 53 can be suppressed.

[Modification]
In FIG. 7, pauses were inserted into a voice message that had been converted into a voice signal, but the pauses may be inserted at whatever point is appropriate in practice.

FIG. 11 shows another example of a functional block diagram of the pause insertion device 30. FIG. 11(a) is a functional block diagram for the case in which the voice message is given as text data (electronic data or an electronic file). In FIG. 11(a), text data is given instead of a voice signal uttered by the user, so a voice message acquisition unit 41 takes the place of the voice signal acquisition unit 31. The voice message acquisition unit 41 acquires a voice message suited to the emergency information from the console 20 or the network 24. Text data acquired from the console 20 may be, for example, the text data of a voice message selected by the user, text data selected automatically in response to a notification from the telemeter 300, or text data designated by J-Alert.

If the voice message is text data, speech recognition is unnecessary, so the speech recognition unit 32 can be omitted. That is, the text data of the voice message acquired by the voice message acquisition unit 41 is sent directly to the phrase division unit 33. The subsequent procedure is the same as in FIG. 7.

FIG. 11(b) is a functional block diagram for the case in which the voice message is given printed on paper. The content of a voice message may be sent by fax or handed to the user as a printout. In this case the user may speak the message to convert it into a voice signal as in FIG. 7, but that requires speech recognition and recognition errors may occur. For this reason, when the voice message is given printed on paper, the following processing may be effective.

In FIG. 11(b), the voice message is given as a document, so a scanner device 42 that reads the document optically is used. The scanner device 42 may be connected to the pause insertion device 30 or to the console 20. In either case the pause insertion device 30 can acquire the image data of the document. The scanner device 42 generally has a line sensor in which imaging elements are arranged in one dimension, but the image data may also be created by photographing the document with a camera (a two-dimensional imaging element).

The pause insertion device 30 also has an OCR (Optical Character Reader) unit 43, which converts the voice message contained in the image data into text data. Text data is thus obtained in the same way as in FIG. 11(a). The subsequent processing is the same as in FIG. 7.

In addition, the voice message (voice signal, text data, or document) may be translated into a foreign language before the same processing is performed. If many foreign nationals live in the area where the slave station 200 is installed, the voice message can be broadcast in their native language.

As described above, there is no restriction on how the voice message is input to the pause insertion device 30; once the input voice message has been converted into text data, the processing of this embodiment can be performed in the same way.

[Operation procedure]
FIG. 12(a) is an example of a flowchart showing the operation procedure of the pause insertion device 30. First, the pause insertion device 30 acquires, in the form of text data, a voice message conveying the content of the emergency information (S10). That is, the text data is obtained in one of the following ways: the speech recognition unit 32 recognizes the voice signal acquired by the voice signal acquisition unit 31, the voice message acquisition unit 41 receives the text data itself, or the scanner scans a document and the OCR unit 43 converts it into text data.

Next, the phrase division unit 33 divides the text data into phrases (S20).

The utterance time length calculation unit 34 calculates the utterance time of each phrase (S30).

The break position / pause time determination unit 35 determines the break positions in the text data (S40). As described above, it groups one or more phrases whose total utterance time Ps is equal to or less than the delay time Δt and sets a break position after each group.

The break position / pause time determination unit 35 determines the pause time (S50). The pause time is about the same as the delay time Δt.

The pause insertion unit 36 inserts silent characters (spaces) amounting to about the delay time at the break positions determined by the break position / pause time determination unit 35 (S60).

<Modification of operation>
Instead of inserting pauses (silent characters or silent time) into the voice message as described so far, the break position / pause time determination unit may only specify the pause positions and pause times, and the control unit 15 may insert the pauses when it transmits the voice message to the slave station 200.

FIG. 12(b) is an example of a flowchart showing the operation procedure of the pause insertion device 30 in this case. The processing up to step S50 is the same, but in S55 the break position / pause time determination unit attaches the break positions and pause times to the converted message. For example, each break position is specified as a time t along the sentence length of the converted message, measured from the start of the utterance, and a pause time is specified for each break position. With two pause positions, for example, the attached data is "(t1, Δt), (t2, Δt)".

The console receives the voice message together with the break positions and pause times from the pause insertion device (S110).

The control unit of the console starts transmitting the voice message (S120).

While transmitting the voice message, the control unit determines whether the time elapsed since the start of the utterance has reached the time of a break position (S130). Until that time is reached (No in S130), the control unit determines whether transmission of the voice message has finished (S170) and, if not, continues transmitting.

When the time of a break position is reached (Yes in S130), the control unit stops transmitting the voice message (S140). It then repeatedly checks whether the pause time has elapsed (S150).

When the pause time has elapsed (Yes in S150), the control unit resumes transmitting the voice message from the point at which it was interrupted (S160). That point is a break position between phrases. The control unit is not aware of phrases as units, but the result is that pauses are inserted between phrases. With this transmission method there is no need to insert pauses (silent characters or silent time) into the voice message itself; the pauses are inserted during transmission.
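The effect of the loop S120 to S170 on the transmission timeline can be worked out offline, without real-time waiting. The sketch below is an illustration only: the function name is hypothetical, and it assumes each break position t is given in milliseconds from the start of the pause-free message, as in the "(t1, Δt), (t2, Δt)" format described above.

```python
def transmission_schedule(message_ms, pauses):
    """Return (start, end) wall-clock intervals during which audio is sent.

    pauses: list of (t_break_ms, pause_ms), with t_break_ms measured from
    the start of the utterance in the message without pauses.
    """
    schedule = []
    clock = 0.0  # wall-clock time since transmission started
    sent = 0.0   # amount of the message already transmitted
    for t_break, pause in sorted(pauses):
        chunk = t_break - sent
        schedule.append((clock, clock + chunk))  # transmit until the break (S130)
        clock += chunk + pause                   # stop and wait (S140, S150)
        sent = t_break                           # resume from here (S160)
    if sent < message_ms:
        schedule.append((clock, clock + (message_ms - sent)))
    return schedule


# Example message: breaks after 937.5 ms and 1875 ms, each pause 1000 ms.
timeline = transmission_schedule(2625.0, [(937.5, 1000.0), (1875.0, 1000.0)])
```

In this example the 2625 ms message finishes at 4625 ms, which is the original length plus two pauses of Δt each.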

As described above, the disaster prevention radio system 100 of this embodiment divides a voice message and inserts pauses, so it can broadcast voice messages that local residents can hear easily in an echo environment.

実施例１では、子局毎に予め求められている遅延時間に応じて区切り位置やポーズ時間を決定した。しかし、上記の実験結果に示したように正解率は遅延時間が長いほど低下するとは限らない。また、実験結果にはないが、出願人の研究によれば、正解率は、遅延時間が同じでも文章長によって異なることが明らかになってきた。また、子局から地域に伝達される反射音の伝達経路は１つとは限らないので、遅延時間が一意に特定できない場合がある。 In the first embodiment, the delimiter position and pause time are determined according to the delay time obtained in advance for each slave station. However, as shown in the above experimental results, the accuracy rate does not always decrease as the delay time increases. Although not in the experimental results, according to the applicant's research, it has become clear that the correct answer rate differs depending on the sentence length even if the delay time is the same. Moreover, since the transmission path of the reflected sound transmitted from the slave station to the area is not necessarily one, the delay time may not be specified uniquely.

Therefore, this embodiment describes a disaster prevention radio system 100 that determines, from the relationship between the measured delay times and the intelligibility reduction rate, the delay time that most affects the intelligibility reduction rate, and then determines the delimiter positions and pause time according to that delay time. The intelligibility reduction rate is the rate (%) by which the correct answer rate falls with the delay time, taking as 100 the rate at which subjects answer a phrase correctly when the delay time is zero.
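One way to write this definition as a formula is shown below. The notation is introduced here and does not appear in the specification: P(Δt) stands for the correct answer rate at delay time Δt, and D(Δt) for the intelligibility reduction rate.

```latex
D(\Delta t) = 100 \times \frac{P(0) - P(\Delta t)}{P(0)} \quad [\%]
```

Under this reading, D(0) = 0, and a larger D(Δt) means the echo at that delay degrades comprehension more.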

FIG. 13 shows an example of a functional block diagram of the pause insertion device 30. In the pause insertion device of FIG. 13, the delay time database 38 of FIGS. 7 and 10 is replaced by a long-path echo calculation unit 61, a delay time determination unit 62, and an echo/intelligibility correspondence database 63. The delay time determination unit 62 uses information from the long-path echo calculation unit 61 and the echo/intelligibility correspondence database 63 to determine, for each slave station, the delay time that most affects the intelligibility reduction rate.

FIG. 14(a) shows the state of the sound emitted simultaneously from three slave stations, as observed at a certain point. The horizontal axis is time and the vertical axis is relative amplitude. Taking as a reference the time at which the slave station 200 broadcast the sound (time t0), the control unit 52 of the slave station 200 measures the delay time Δt until the microphone 54 detects, at time t1, sound emitted from another slave station (an example of the echo intensity in the claims). Time t2 is the time at which sound from another slave station was measured. Although Δt is determined in this figure from the sound of other slave stations, in an environment that is unaffected by other slave stations but has obstacles such as mountains or large buildings nearby, Δt may instead be determined from the sound reflected by each obstacle. The slave station transmits either the signal waveform of FIG. 14(a) itself or the delay times Δt1 and Δt2 to the disaster prevention radio system 100.

When multiple delay times are measured, one option is to adopt the delay time with the largest relative amplitude. However, inserting pauses based on that delay time does not necessarily do much to improve intelligibility, because intelligibility also depends on human perceptual characteristics, sentence length, and other factors.

FIG. 14(b) schematically illustrates an example of the echo/intelligibility correspondence database 63. For each sentence length, delay times are associated with intelligibility reduction rates. The figure shows the intelligibility reduction rate at delay times spaced 300 [ms] apart, but the relationship is actually registered at intervals short enough to allow interpolation. The delay time determination unit 62 can therefore determine the intelligibility reduction rate for an arbitrary delay time.
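A lookup of this kind could be implemented with linear interpolation between registered points, as in the sketch below. The specification does not state which interpolation method is used, so linear interpolation and clamping at the ends of the table are assumptions, as is the function name.

```python
from bisect import bisect_left
from typing import Sequence


def reduction_rate(
    delays_ms: Sequence[float],
    rates_pct: Sequence[float],
    delay_ms: float,
) -> float:
    """Linearly interpolate the reduction rate at delay_ms.

    delays_ms must be sorted ascending and pair up with rates_pct.
    Values outside the table are clamped to the nearest end (assumption).
    """
    if delay_ms <= delays_ms[0]:
        return rates_pct[0]
    if delay_ms >= delays_ms[-1]:
        return rates_pct[-1]
    hi = bisect_left(delays_ms, delay_ms)  # first index with delay >= delay_ms
    lo = hi - 1
    frac = (delay_ms - delays_ms[lo]) / (delays_ms[hi] - delays_ms[lo])
    return rates_pct[lo] + frac * (rates_pct[hi] - rates_pct[lo])
```

A caller would keep one pair of `delays_ms` and `rates_pct` sequences per sentence length, mirroring the rows of FIG. 14(b).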

Looking at a single sentence length, the intelligibility reduction rate rises with the delay time up to a certain range, but may fall again as the delay time grows further. For example, at a sentence length of 2000 [ms], the intelligibility reduction rate rises up to a delay time of 900 [ms] but is lower at 1200 [ms]. Thus, the intelligibility reduction rate does not vary monotonically with the delay time Δt.

Looking instead at a single delay time, the intelligibility reduction rate differs with sentence length. For example, at a delay time of 600 [ms], the intelligibility reduction rate is 10 [%] for a sentence length of 1000 [ms], 13 [%] for a sentence length of 2000 [ms], and 16 [%] for a sentence length of 3000 [ms].

Therefore, when the long-path echo calculation unit 61 calculates multiple delay times, the delay time used to determine the delimiter positions at which pauses are inserted is preferably chosen based on the sentence length and the intelligibility reduction rate. For example, if delay times of 300 [ms] and 600 [ms] are measured for a sentence length of 1000 [ms], the delay time calculation unit 42 sets the delay time Δt to 600 [ms], which has the largest intelligibility reduction rate. If delay times of 300 [ms], 600 [ms], and 900 [ms] are measured for a sentence length of 2000 [ms], the delay time calculation unit 42 sets the delay time Δt to 900 [ms], which has the largest intelligibility reduction rate. In this way, the measured delay time that most affects the intelligibility reduction rate can be determined.
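The selection rule in these examples amounts to taking the measured delay time with the largest intelligibility reduction rate. A minimal sketch follows; the function name and the table are hypothetical, and only the 13 [%] entry at 600 [ms] comes from the text. The other table values are placeholders chosen to match the described shape (rising up to 900 [ms], lower at 1200 [ms]).

```python
from typing import Callable, Iterable


def select_delay(
    measured_ms: Iterable[float],
    rate_of: Callable[[float], float],
) -> float:
    """Return the measured delay whose reduction rate is largest.

    rate_of maps a delay time to its intelligibility reduction rate
    for the sentence length in question. Raises ValueError when no
    delay times were measured.
    """
    return max(measured_ms, key=rate_of)


# Illustrative row for a sentence length of 2000 ms. Only the value at
# 600 ms (13 %) is stated in the text; the rest are placeholders.
table_2000 = {300: 6.0, 600: 13.0, 900: 18.0, 1200: 15.0}
chosen = select_delay([300, 600, 900], table_2000.__getitem__)
```

With these placeholder values, `chosen` is 900, which agrees with the second example in the paragraph above.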

FIG. 15 is an example of a flowchart showing the operating procedure of the pause insertion device 30. In FIG. 15, the delay time is calculated in step S35. The processing other than this step is the same as in FIG. 12.

As described above, the disaster prevention radio system of this embodiment can insert pauses so that, in an area reached by the voice message over multiple transmission paths, the message does not overlap itself at the delay time with the largest intelligibility reduction rate.

20 console
21 master station communication device
24 network
30 voice conversion device
33 phrase division unit
34 utterance time length calculation unit
35 delimiter position / pause time determination unit
36 pause insertion unit
53 speaker
100 disaster prevention radio system
200 slave station

Claims

An outdoor environment voice transmission device that transmits a voice message for disaster prevention to one or more slave stations installed in an area, the device comprising:
voice message acquisition means for acquiring the voice message;
message dividing means for dividing the voice message acquired by the voice message acquisition means into meaningful symbol strings, or for requesting an external entity to divide the voice message into the symbol strings;
utterance time calculating means for calculating the utterance time of each symbol string;
a delay time database in which the delay time of the echo generated when the voice message is broadcast from a slave station is registered for each slave station;
delimiter position determining means for performing, for each slave station registered in the delay time database and based on the delay time of that slave station, a process of determining the end of one or more of the symbol strings whose total utterance time is equal to or less than the delay time as a delimiter position of the voice message to be transmitted to that slave station;
unvoiced character insertion means for inserting, at the delimiter position of the voice message, unvoiced characters whose utterance time is at least as long as the delay time; and
message transmission means for transmitting the voice message with the unvoiced characters inserted to the slave station.

An outdoor environment voice transmission device that transmits a voice message for disaster prevention to one or more slave stations installed in an area, the device comprising:
voice message acquisition means for acquiring the voice message;
message dividing means for dividing the voice message acquired by the voice message acquisition means into meaningful symbol strings, or for requesting an external entity to divide the voice message into the symbol strings;
utterance time calculating means for calculating the utterance time of each symbol string;
a delay time database in which the delay time of the echo generated when the voice message is broadcast from a slave station is registered for each slave station;
delimiter position determining means for determining, at least for each slave station registered in the delay time database, the end of one or more of the symbol strings whose total utterance time is equal to or less than the delay time as a delimiter position of the voice message; and
message transmission means for transmitting the voice message, stopping the transmission when a delimiter position of the voice message is detected, and resuming the transmission when the delay time has elapsed after the stop.

The outdoor environment voice transmission device according to claim 1 or 2, further comprising:
an intelligibility database in which intelligibility information, indicating how correctly the meaning of the symbol strings is understood, is registered for different delay times; and
delay time determining means for determining, as the delay time of a slave station, the delay time at which the intelligibility information in the intelligibility database falls the most, from among one or more pieces of delay time information transmitted from that slave station,
wherein the delimiter position determining means determines the delimiter position of the voice message to be transmitted to each slave station based on the delay time determined by the delay time determining means.

The outdoor environment voice transmission device according to claim 3,
wherein the intelligibility database registers, for each utterance time of the entire voice message, intelligibility information indicating how correctly the meaning of the symbol strings is understood for different delay times, and
the delay time determining means determines the delay time of a slave station based on the delay times and the intelligibility information associated with the utterance time of the entire voice message to be transmitted.

The outdoor environment voice transmission device according to claim 1,
wherein the unvoiced character insertion means makes the utterance time of the unvoiced characters inserted at the delimiter position shorter than the delay time by an amount that does not affect the intelligibility of the symbol strings by more than a predetermined degree.

An outdoor environment voice transmission system comprising an outdoor environment voice transmission device that transmits a voice message for disaster prevention to one or more slave stations installed in an area, and the one or more slave stations,
wherein the outdoor environment voice transmission device comprises:
voice message acquisition means for acquiring the voice message;
message dividing means for dividing the voice message acquired by the voice message acquisition means into meaningful symbol strings, or for requesting an external entity to divide the voice message into the symbol strings;
utterance time calculating means for calculating the utterance time of each symbol string;
a delay time database in which the delay time of the echo generated when the voice message is broadcast from a slave station is registered for each slave station;
delimiter position determining means for performing, for each slave station registered in the delay time database and based on the delay time of that slave station, a process of determining the end of one or more of the symbol strings whose total utterance time is equal to or less than the delay time as a delimiter position of the voice message to be transmitted to that slave station;
unvoiced character insertion means for inserting, at the delimiter position of the voice message, unvoiced characters whose utterance time is at least as long as the delay time; and
message transmission means for transmitting the voice message with the unvoiced characters inserted to the slave station,
and wherein each slave station comprises:
message receiving means for receiving the voice message with the unvoiced characters inserted; and
a loudspeaker for broadcasting the voice message with the unvoiced characters inserted.