JP3917101B2 - Mobile phone terminal and voice level control program

Info

Publication number: JP3917101B2
Application number: JP2003107699A
Authority: JP
Inventors: 真之高橋
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-04-11
Filing date: 2003-04-11
Publication date: 2007-05-23
Anticipated expiration: 2023-04-11
Also published as: JP2004320122A

Description

【０００１】
【発明の属する技術分野】
本発明は、利用者による携帯電話からの音声入力を用いた情報サービス提供システムにおいて、音声認識機能を携帯電話内で行わず、ネットワークを介してセンタ側で行うシステムにおける入力音声レベルの制御方法に関する。
【０００２】
【従来の技術】
一般に、音声認識システムにはその認識性能が最大となる適性な入力音声レベルの範囲が存在する。レベルが過大であれば音声に歪を生じ、逆に過小であれば雑音と音声の分離が困難となり、いずれの場合も音声の特徴量を正確に抽出できなくなり、音声認識率が悪化する。そのため、音声認識装置の入力段で発話のピークレベルを検出し、音声認識装置が扱える最大入力レベルを超えないよう動的に増幅利得を調整している装置が知られている（特許文献１）。また、音声認識装置の前段に音声を符合化して一時的に取り込む機能を設け、単位発話中は音声認識部に対して最適な一定の利得に調整している装置も知られている（特許文献２、３）。
【０００３】
【特許文献１】
特開平０５−２６５４８４号公報
【特許文献２】
特開２００１−１１７５８５号公報
【特許文献３】
特開２００２−０９１４８７号公報
【０００４】
【発明が解決しようとする課題】
しかしながら、特許文献１に示す音声ピークレベル及びノイズレベル検出による動的な利得調整を行う装置にあっては、発話が行われている間もリアルタイムで利得調整を行うため、レベル自体はほぼ理想的な範囲に保たれる一方、信号振幅の直線性が保証されなくなり、必ずしも音声認識率の向上につながらないという問題がある。また、特許文献２、３に示す音声一時取り込みによる単位発話毎の利得調整を行う装置にあっては、発話単位で利得調整を行うため、単位発話内では音声振幅の直線性が確保されるが、音声信号を一時バッファリングすることによる音声伝達の遅れが生じ、音声認識システムを用いて自然な対話を行う際に重要であるレスポンスの悪化を避けられないという問題がある。さらに、これらの装置に共通する点として、いずれも音声認識処理の直前にレベル調整することを想定しており、レベル調整と音声認識が携帯電話ネットワークを通して接続されている場合を想定していない。携帯電話のコーデックでは、雑音がある一定レベル以下の場合はそれを雑音として扱い、ＳＮ比を向上させる仕組みがある。一方、雑音がそのレベルを超えた場合はそれを音声として扱うため、大きくても音声のＳＮ比が非常に劣化した状態で伝送され、音声認識率が悪化するという問題もある。
【０００５】
本発明は、このような事情に鑑みてなされたもので、携帯電話端末を使用して、音声認識処理を行う場合に適正な音声レベル調整を行うこと可能な携帯電話端末及び音声レベル制御プログラムを提供することを目的とする。
【０００６】
【課題を解決するための手段】
本発明は、マイクロホンからの音声信号を符号化して移動電話網へ送信するエンコード手段を備えた携帯電話端末であって、前記マイクロホンから出力される音声信号の所定時間内における最低レベル値を検出し、前記所定時間間隔毎に該最低レベル値を出力するレベル監視手段と、前記レベル監視手段が出力する最低レベル値に基づき、所定の計算式により適正利得を求め、この適正利得を用いて前記音声信号の利得を調整し、前記エンコード手段へ出力する利得調整手段とを備えたことを特徴とする。
【０００７】
本発明は、前記利得調整手段は、想定される雑音レベルの変化量に応じたマージンを付加して前記適正利得を求めることを特徴とする。
【０００８】
本発明は、マイクロホンからの音声信号を符号化して移動電話網へ送信するエンコード手段を備えた携帯電話端末において動作する音声レベル制御プログラムであって、前記マイクロホンから出力される音声信号の所定時間内における最低レベル値を検出し、前記所定時間間隔毎に該最低レベル値を出力するレベル監視処理と、前記レベル監視処理が出力する最低レベル値に基づき、所定の計算式により適正利得を求め、該適正利得を用いて前記音声信号の利得を調整し、前記エンコード手段へ出力する利得調整処理とをコンピュータに行わせることを特徴とする。
【０００９】
本発明は、前記利得調整処理は、想定される雑音レベルの変化量に応じたマージンを付加して前記適正利得を求めることを特徴とする。
【００１０】
【発明の実施の形態】
以下、本発明の一実施形態による携帯電話端末を図面を参照して説明する。図１は同実施形態の構成を示すブロック図である。この図において、符号１は、携帯電話端末である。符号２は、音声を集音して音声信号に変換するマイクロホンであり、通話を行うために携帯電話端末に備えられているマイクロホンである。符号３は、マイクロホン２から出力される音声信号のレベルを監視するレベル監視部である。符号４は、音声信号の利得を調整する利得調整部である。符号５は、音声信号を符号化して出力するエンコード部である。符号６は、移動電話のネットワークである。符号７は、ネットーワーク６内に設けられ、エンコード部５によって符号化された信号を復号化するデコード部である。符号８は、サービスを提供を提供するセンタシステムである。センタシステム８が提供するサービスは、例えば、携帯電話端末１を使用して、交通機関のチケット予約をするサービス等である。符号９は、センタシステム８をネットワーク６と接続するＣＴＩ（Computer Telephony Integration）部である。符号１０は、マイクロホン２で集音した音声の認識を行う音声認識部である。符号１１は、音声認識部１０の出力に基づいて、チケット予約等の処理を行う処理部である。
【００１１】
ここで、図３を参照して、図１に示す携帯電話端末１の動作の概要を説明する。まずレベル監視部３は、常に利得調整部４の入力レベルを監視する。そして、所定時間（例えば３〜５秒）毎にその時間区間での最低入力レベルを検出し、そのレベル情報を利得調整部４へ通知する。利得調整部４は、通知されたレベルに一定のマージンを付加したレベルをその時間区間での雑音レベルとする。マージンを付加する理由は、一般に周辺雑音レベルはある程度の幅で変動しており、雑音の最小値を抽出するだけでは雑音レベルの最大値を小さく見積り過ぎる恐れがあるからである。続いて、利得調整部５はこの雑音レベルの信号が入力された場合において、利得調整部４からエンコード部５へ出力される信号が、エンコード部５によって音声と判断される基準レベル以下になるよう調整する。そして、レベル監視部３は、レベル利得調整部４に対してレベル情報を送ると同時に、最低入力レベルの値をリセットする。このように所定時間（例えば３〜５秒）毎にレベル監視部３は雑音レベルを検出し、その都度利得調整部４が利得を調整する動作を繰り返す。
【００１２】
次に、図３を参照して、レベル監視部３、利得調整部４の動作の詳細を説明する。まず、レベル監視部３は、所定時間内（例えば３〜５秒）における音声信号レベルの最低レベルを検出する（ステップＳ１、Ｓ２）。そして、レベル監視部３は、所定時間内の最低レベル値を利得調整部４へ通知する（ステップＳ３）。このレベル監視部３における動作（ステップＳ１〜Ｓ３）は繰り返し実行する。これにより、所定時間（例えば３〜５秒）毎にレベル監視部３から利得調整部４に対して、所定時間区間の最低レベル値が通知されることとなる。
【００１３】
一方、利得調整部４は、レベル監視部３より最低レベル値の通知を受けて、適正利得を計算する（ステップＳ４）。適正利得は、（３）式によって計算する。ここで、適正利得の計算方法について説明する。
所定時間内の最低入力レベルをＬn［ｄＢ］、マージンをＭ［ｄＢ］、初期利得をＧ0［ｄＢ］、適正最大雑音入力レベル（利得Ｇ0においてＳＮ比が急に悪化する直前のレベル監視部３入力レベル）をＬin［ｄＢ］、適正最大雑音出力レベル（利得Ｇ0においてＳＮ比が急に悪化する直前の利得調整部４出力レベル）をＬout［ｄＢ］、適正利得をＧan［ｄＢ］とすると、初期状態では、（１）式が成り立つ。
Ｌin＋Ｇ0＝Ｌout・・・（１）
【００１４】
所定時間区間毎に、その区間での最低入力レベル時において、出力がＬoutを超えないようにする必要がある。直前の区間の最低入力レベルと現区間の最小入力レベルは多少変化するため、所定のマージンＭを設定する。マージンＭがあまり大きくない場合、出力レベルが大きすぎる（Ｌoutを超える）よりは小さい方がＳ／Ｎの劣化が少ないため、マージンＭは正の値に設定する。このマージンＭは、想定される雑音レベルの変化量に応じて予め決定しておけばよい。以上のことから各区間では、（２）式が成り立つ。
Ｌｎ＋Ｇan＋Ｍ＝Ｌout・・・（２）
従って、適正利得Ｇanは、（３）式によって計算する。
Ｇan＝Ｌout−Ｌn−Ｍ＝Ｇ0−（Ｌn−Lin）−Ｍ・・・（３）
【００１５】
次に、利得調整部４は、利得を計算によって求めた適正利得Ｇanに変更する（ステップＳ５）。これにより、マイクロホン２から出力される音声信号の利得が適正利得Ｇanとなるように調整されてエンコード部５に対して出力される。
【００１６】
このように、エンコード部５に入力される音声信号レベルは、発話がない状態において常に基準レベル以下とすることができる。従って、急激な周辺雑音レベルの変化がない限り、発話中も雑音レベルはエンコード部５の基準レベルよりも常に小さく保たれることが期待でき、ネットワークを通して音声認識処理に入力される音声にＳＮ比の低下が起こることがなく、常に音声認識処理の性能が最大限に発揮することができる。また、レベル監視部３の最低レベル検出間隔を音声対話サービスにおける１発話の長さと同等以上に設定することにより、１発話中に頻繁に増幅率が変化することがなく、エンコード部５への出力の直線性が確保され、音声認識処理の性能を最大に発揮させることができる。また、マイクロホン２からの音声信号をバッファリングすることなくレベル調整が行えるため、信号に遅延が生じず、レスポンスの良い音声対話が実現できる。また、レベル監視部３は最小値を更新する処理のみを行えばよく、利得調整は概ね数秒（発話語彙の長さに応じて変わる）に１度レベル監視部３から利得を取り出して利得調整を行う処理のみを行えば良い。従って処理能力の高い部品を用いる必要がない。
【００１７】
なお、図１における携帯電話端末１の機能を実現するためのプログラムをコンピュータ読み取り可能な記録媒体に記録して、この記録媒体に記録されたプログラムをコンピュータシステムに読み込ませ、実行することにより利得調整処理を行ってもよい。なお、ここでいう「コンピュータシステム」とは、ＯＳや周辺機器等のハードウェアを含むものとする。また、「コンピュータ読み取り可能な記録媒体」とは、フレキシブルディスク、光磁気ディスク、ＲＯＭ、ＣＤ−ＲＯＭ等の可搬媒体、コンピュータシステムに内蔵されるハードディスク等の記憶装置のことをいう。さらに「コンピュータ読み取り可能な記録媒体」とは、インターネット等のネットワークや電話回線等の通信回線を介してプログラムが送信された場合のサーバやクライアントとなるコンピュータシステム内部の揮発性メモリ（ＲＡＭ）のように、一定時間プログラムを保持しているものも含むものとする。
【００１８】
また、上記プログラムは、このプログラムを記憶装置等に格納したコンピュータシステムから、伝送媒体を介して、あるいは、伝送媒体中の伝送波により他のコンピュータシステムに伝送されてもよい。ここで、プログラムを伝送する「伝送媒体」は、インターネット等のネットワーク（通信網）や電話回線等の通信回線（通信線）のように情報を伝送する機能を有する媒体のことをいう。また、上記プログラムは、前述した機能の一部を実現するためのものであっても良い。さらに、前述した機能をコンピュータシステムにすでに記録されているプログラムとの組み合わせで実現できるもの、いわゆる差分ファイル（差分プログラム）であっても良い。
【００１９】
【発明の効果】
以上説明したように、この発明によれば、エンコード部に入力される音声信号レベルは、発話がない状態において常に基準レベル以下とすることができるという効果が得られる。従って、急激な周辺雑音レベルの変化がない限り、発話中も雑音レベルはエンコード部の基準レベルよりも常に小さく保たれることが期待でき、ネットワークを通して音声認識処理に入力される音声にＳＮ比の低下が起こることがなく、常に音声認識処理の性能が最大限に発揮することができる。
【図面の簡単な説明】
【図１】本発明の一実施形態の構成を示すブロック図である。
【図２】レベル調整動作の概要を示す説明図である。
【図３】図１に示す携帯電話端末１の動作を示すフローチャートである。
【符号の説明】
１・・・携帯電話端末
２・・・マイクロホン
３・・・レベル監視部
４・・・利得調整部
５・・・エンコード部
６・・・ネットワーク
７・・・デコード部
８・・・センタシステム
９・・・ＣＴＩ部
１０・・・音声認識部
１１・・・処理部
[0001]
[Technical field to which the invention belongs]
The present invention relates to a method for controlling the input voice level in an information service providing system that uses voice input from a user's mobile phone, where voice recognition is performed not in the mobile phone but on the center side via a network.
[0002]
[Prior art]
In general, a speech recognition system has a range of appropriate input voice levels in which its recognition performance is highest. If the level is too high, the voice is distorted; if it is too low, noise and voice become hard to separate. In either case the voice features can no longer be extracted accurately, and the speech recognition rate deteriorates. For this reason, a device is known that detects the peak level of an utterance at the input stage of the speech recognition device and dynamically adjusts the amplification gain so that the maximum input level the speech recognition device can handle is not exceeded (Patent Document 1). Devices are also known that provide, in the stage preceding the speech recognition device, a function for encoding and temporarily capturing the voice, and that hold the gain at a constant value optimal for the speech recognition unit during a unit utterance (Patent Documents 2 and 3).
[0003]
[Patent Document 1]
JP 05-265484 A
[Patent Document 2]
JP 2001-117585 A
[Patent Document 3]
JP 2002-091487 A
[0004]
[Problems to be solved by the invention]
However, the device of Patent Document 1, which adjusts the gain dynamically by detecting the voice peak level and the noise level, adjusts the gain in real time even while an utterance is in progress. The level itself is therefore kept within a nearly ideal range, but the linearity of the signal amplitude is no longer guaranteed, so the speech recognition rate does not necessarily improve. The devices of Patent Documents 2 and 3, which capture the voice temporarily and adjust the gain for each unit utterance, preserve the linearity of the voice amplitude within a unit utterance. However, temporarily buffering the voice signal delays voice transmission, and the resulting loss of responsiveness, which is important for natural dialogue with a speech recognition system, cannot be avoided. Furthermore, all of these devices assume that the level is adjusted immediately before the speech recognition process; none of them assumes that level adjustment and speech recognition are connected through a mobile phone network. A mobile phone codec has a mechanism that improves the SN ratio by treating noise as noise when it is at or below a certain level. When the noise exceeds that level, however, the codec handles it as speech, so the speech is transmitted with a severely degraded SN ratio even if it is loud, and the speech recognition rate deteriorates.
[0005]
The present invention has been made in view of these circumstances, and its object is to provide a mobile phone terminal and a voice level control program capable of appropriate voice level adjustment when voice recognition processing is performed using a mobile phone terminal.
[0006]
[Means for Solving the Problems]
The present invention is a mobile phone terminal including encoding means that encodes an audio signal from a microphone and transmits it to a mobile telephone network, the terminal comprising: level monitoring means that detects the minimum level value, within a predetermined time, of the audio signal output from the microphone and outputs the minimum level value at each predetermined time interval; and gain adjusting means that obtains an appropriate gain by a predetermined calculation formula based on the minimum level value output by the level monitoring means, adjusts the gain of the audio signal using the appropriate gain, and outputs the signal to the encoding means.
[0007]
The present invention is characterized in that the gain adjusting means obtains the appropriate gain by adding a margin according to an assumed amount of change in noise level.
[0008]
The present invention is a voice level control program that runs on a mobile phone terminal including encoding means that encodes an audio signal from a microphone and transmits it to a mobile telephone network, the program causing a computer to perform: level monitoring processing that detects the minimum level value, within a predetermined time, of the audio signal output from the microphone and outputs the minimum level value at each predetermined time interval; and gain adjustment processing that obtains an appropriate gain by a predetermined calculation formula based on the minimum level value output by the level monitoring processing, adjusts the gain of the audio signal using the appropriate gain, and outputs the signal to the encoding means.
[0009]
The present invention is characterized in that the gain adjustment processing obtains the appropriate gain by adding a margin according to an assumed amount of change in noise level.
[0010]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, a mobile phone terminal according to an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the embodiment. In this figure, reference numeral 1 denotes a mobile phone terminal. Reference numeral 2 denotes a microphone that collects sound and converts it into an audio signal, and is a microphone that is provided in a mobile phone terminal for making a call. Reference numeral 3 denotes a level monitoring unit that monitors the level of an audio signal output from the microphone 2. Reference numeral 4 denotes a gain adjusting unit that adjusts the gain of the audio signal. Reference numeral 5 denotes an encoding unit that encodes and outputs an audio signal. Reference numeral 6 denotes a mobile telephone network. Reference numeral 7 denotes a decoding unit that is provided in the network 6 and decodes the signal encoded by the encoding unit 5. Reference numeral 8 denotes a center system that provides a service. The service provided by the center system 8 is, for example, a service for making a ticket reservation for transportation using the mobile phone terminal 1. Reference numeral 9 denotes a CTI (Computer Telephony Integration) unit that connects the center system 8 to the network 6. Reference numeral 10 denotes a speech recognition unit that recognizes speech collected by the microphone 2. Reference numeral 11 denotes a processing unit that performs processing such as ticket reservation based on the output of the speech recognition unit 10.
[0011]
Here, an outline of the operation of the mobile phone terminal 1 shown in FIG. 1 will be described with reference to FIG. 3. First, the level monitoring unit 3 constantly monitors the input level of the gain adjusting unit 4. At every predetermined time (for example, 3 to 5 seconds), it detects the minimum input level in that time interval and notifies the gain adjusting unit 4 of the level information. The gain adjusting unit 4 takes the notified level plus a certain margin as the noise level for that time interval. The margin is added because the ambient noise level generally fluctuates within some range, so extracting only the minimum value of the noise risks underestimating the maximum value of the noise level. Next, the gain adjusting unit 4 adjusts its gain so that, when a signal at this noise level is input, the signal output from the gain adjusting unit 4 to the encoding unit 5 is at or below the reference level above which the encoding unit 5 judges a signal to be speech. The level monitoring unit 3 resets the value of the minimum input level at the same time as it sends the level information to the gain adjusting unit 4. In this way, the level monitoring unit 3 detects the noise level every predetermined time (for example, 3 to 5 seconds), and the gain adjusting unit 4 adjusts the gain each time; this operation is repeated.
[0012]
Next, details of the operations of the level monitoring unit 3 and the gain adjusting unit 4 will be described with reference to FIG. First, the level monitoring unit 3 detects the lowest level of the audio signal level within a predetermined time (for example, 3 to 5 seconds) (steps S1 and S2). Then, the level monitoring unit 3 notifies the gain adjusting unit 4 of the lowest level value within a predetermined time (step S3). The operation (steps S1 to S3) in the level monitoring unit 3 is repeatedly executed. As a result, the level monitoring unit 3 notifies the gain adjustment unit 4 of the minimum level value in the predetermined time interval every predetermined time (for example, 3 to 5 seconds).
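The level monitoring loop described above (steps S1 to S3) can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the class name `LevelMonitor`, the use of a frame count in place of a wall-clock interval, and the sample levels are all assumptions introduced here.

```python
import math
from typing import Optional


class LevelMonitor:
    """Hypothetical sketch: tracks the minimum input level (dB) per interval."""

    def __init__(self, frames_per_interval: int) -> None:
        # Assumption: the "predetermined time" is expressed as a frame count.
        self.frames_per_interval = frames_per_interval
        self._min_db = math.inf  # reset value; any real level is lower
        self._count = 0

    def push(self, level_db: float) -> Optional[float]:
        """Feed one frame level; return the interval minimum when the interval ends."""
        self._min_db = min(self._min_db, level_db)  # update the running minimum
        self._count += 1
        if self._count < self.frames_per_interval:
            return None
        result = self._min_db    # value notified to the gain adjusting unit
        self._min_db = math.inf  # reset for the next interval
        self._count = 0
        return result


monitor = LevelMonitor(frames_per_interval=4)
levels = [-52.0, -30.0, -55.0, -41.0, -48.0, -20.0, -47.0, -50.0]
notified = [m for m in map(monitor.push, levels) if m is not None]
print(notified)  # [-55.0, -50.0]
```

Only a running minimum and a counter are kept, which matches the description's point that the monitoring side needs very little processing.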
[0013]
On the other hand, the gain adjusting unit 4 receives the notification of the lowest level value from the level monitoring unit 3 and calculates an appropriate gain (step S4). The appropriate gain is calculated by equation (3). Here, a method for calculating the appropriate gain will be described.
Let the minimum input level within the predetermined time be $L_n$ [dB], the margin $M$ [dB], the initial gain $G_0$ [dB], the appropriate maximum noise input level (the input level of the level monitoring unit 3 immediately before the SN ratio deteriorates sharply at gain $G_0$) $L_{in}$ [dB], the appropriate maximum noise output level (the output level of the gain adjusting unit 4 immediately before the SN ratio deteriorates sharply at gain $G_0$) $L_{out}$ [dB], and the appropriate gain $G_{an}$ [dB]. In the initial state, equation (1) holds.
$$L_{in} + G_0 = L_{out} \qquad (1)$$
[0014]
In every predetermined time interval, the output at the minimum input level of that interval must not exceed $L_{out}$. Because the minimum input level of the immediately preceding interval and that of the current interval differ somewhat, a predetermined margin $M$ is set. As long as the margin $M$ is not too large, an output level that is too small degrades the S/N less than one that is too large (exceeding $L_{out}$), so the margin $M$ is set to a positive value. The margin $M$ may be determined in advance according to the assumed amount of change in the noise level. From the above, equation (2) holds in each interval.
$$L_n + G_{an} + M = L_{out} \qquad (2)$$
Therefore, substituting equation (1) for $L_{out}$, the appropriate gain $G_{an}$ is calculated by equation (3).
$$G_{an} = L_{out} - L_n - M = G_0 - (L_n - L_{in}) - M \qquad (3)$$
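As a worked instance of equation (3), the following sketch computes the appropriate gain and checks it against equations (1) and (2). The function name `appropriate_gain_db` and all numeric values (initial gain 20 dB, appropriate maximum noise input level -60 dB, margin 3 dB, measured minimum -50 dB) are hypothetical and chosen only for illustration.

```python
def appropriate_gain_db(l_n: float, g0: float, l_in: float, margin: float) -> float:
    """Equation (3): Gan = G0 - (Ln - Lin) - M, with every quantity in dB."""
    if margin <= 0:
        # The description sets the margin M to a positive value.
        raise ValueError("margin must be positive")
    return g0 - (l_n - l_in) - margin


# Hypothetical values, for illustration only.
G0, L_IN, M = 20.0, -60.0, 3.0
L_OUT = L_IN + G0                 # equation (1)
L_N = -50.0                       # minimum level reported for this interval

g_an = appropriate_gain_db(L_N, G0, L_IN, M)
print(g_an)                       # 7.0
print(L_N + g_an + M == L_OUT)    # equation (2) holds: True
```

With these numbers the measured noise floor is 10 dB above the level assumed for the initial gain, so the gain drops by 10 dB plus the 3 dB margin.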
[0015]
Next, the gain adjusting unit 4 changes its gain to the calculated appropriate gain $G_{an}$ (step S5). As a result, the audio signal output from the microphone 2 is adjusted with the appropriate gain $G_{an}$ and output to the encoding unit 5.
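Applying the calculated gain (step S5) to the samples might look like the sketch below. The text does not say how a gain in dB maps to a sample multiplier; this sketch assumes the amplitude convention (a factor of 10^(G/20)) and normalized floating-point samples, and the helper names `db_to_linear` and `apply_gain` are hypothetical.

```python
from typing import List


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to an amplitude factor (assumed 20*log10 convention)."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples: List[float], gain_db: float) -> List[float]:
    """Scale every sample by the same factor, without buffering or look-ahead."""
    factor = db_to_linear(gain_db)
    return [s * factor for s in samples]


# Hypothetical normalized samples; the gain stays fixed until the next update.
scaled = apply_gain([0.01, -0.02, 0.005], gain_db=20.0)
print(scaled)
```

Because one constant factor is used until the next gain update, the amplitude relationship between samples within an interval is preserved.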
[0016]
As described above, the audio signal level input to the encoding unit 5 can always be kept at or below the reference level when there is no utterance. Therefore, as long as the ambient noise level does not change abruptly, the noise level can be expected to stay below the reference level of the encoding unit 5 during utterances as well, so the SN ratio of the voice input to the voice recognition process through the network does not degrade, and the voice recognition process can always perform at its best. Further, by setting the minimum level detection interval of the level monitoring unit 3 to be equal to or longer than the length of one utterance in the voice dialogue service, the amplification factor does not change frequently within one utterance, the linearity of the output to the encoding unit 5 is ensured, and the performance of the voice recognition process can be maximized. In addition, since the level can be adjusted without buffering the audio signal from the microphone 2, no delay arises in the signal, and a responsive voice dialogue can be realized. Further, the level monitoring unit 3 only needs to update the minimum value, and the gain adjustment only needs to retrieve the value from the level monitoring unit 3 and adjust the gain roughly once every few seconds (the interval varies with the length of the spoken vocabulary). Therefore, parts with high processing capability are not required.
[0017]
Note that a program for realizing the functions of the mobile phone terminal 1 in FIG. 1 may be recorded on a computer-readable recording medium, and the gain adjustment processing may be performed by loading the program recorded on this recording medium into a computer system and executing it. Here, the "computer system" includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. Further, the "computer-readable recording medium" also includes media that hold a program for a certain period of time, such as the volatile memory (RAM) inside a computer system that serves as a server or a client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.
[0018]
The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium, or by a transmission wave in the transmission medium. Here, the "transmission medium" that transmits the program is a medium having a function of transmitting information, such as a network (communication network) like the Internet or a communication line such as a telephone line. The program may realize only part of the functions described above. Furthermore, it may be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.
[0019]
[Effect of the invention]
As described above, according to the present invention, it is possible to obtain an effect that the audio signal level input to the encoding unit can always be equal to or lower than the reference level when there is no utterance. Therefore, as long as there is no sudden change in the ambient noise level, it can be expected that the noise level is always kept lower than the reference level of the encoding unit even during speech, and the S / N ratio of the voice input to the voice recognition process through the network is No degradation occurs, and the performance of the speech recognition process can always be maximized.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing an outline of a level adjustment operation.
FIG. 3 is a flowchart showing the operation of the mobile phone terminal 1 shown in FIG. 1.
[Explanation of symbols]
1 ... Mobile phone terminal
2 ... Microphone
3 ... Level monitoring unit
4 ... Gain adjusting unit
5 ... Encoding unit
6 ... Network
7 ... Decoding unit
8 ... Center system
9 ... CTI unit
10 ... Voice recognition unit
11 ... Processing unit

Claims

1. A mobile phone terminal that transmits an audio signal from a microphone to a mobile telephone network via encoding means for encoding the signal, and causes a voice recognition device connected to the mobile telephone network to recognize the voice collected by the microphone, the terminal comprising:
level monitoring means for detecting a minimum level value $L_n$, within a predetermined time, of the audio signal output from the microphone, and outputting the minimum level value $L_n$ at each predetermined time interval; and
gain adjusting means for obtaining an appropriate gain $G_{an}$, based on the minimum level value $L_n$ output from the level monitoring means, by the calculation formula
$$G_{an} = G_0 - (L_n - L_{in})$$
(where $G_0$ is a predetermined initial gain and $L_{in}$ is a predetermined appropriate maximum noise level),
adjusting the gain of the audio signal at each predetermined time interval using the appropriate gain $G_{an}$, and outputting the signal to the encoding means.

2. The mobile phone terminal according to claim 1, wherein the gain adjusting means obtains the appropriate gain $G_{an}$ by further subtracting, from the appropriate gain $G_{an}$ given by the calculation formula, a margin $M$ that is a positive value corresponding to an assumed amount of change in noise level.

3. A voice level control program that runs on a mobile phone terminal that transmits an audio signal from a microphone to a mobile telephone network via encoding means for encoding the signal, and causes a voice recognition device connected to the mobile telephone network to recognize the voice collected by the microphone, the program causing a computer to perform:
level monitoring processing for detecting a minimum level value $L_n$, within a predetermined time, of the audio signal output from the microphone, and outputting the minimum level value $L_n$ at each predetermined time interval; and
gain adjustment processing for obtaining an appropriate gain $G_{an}$, based on the minimum level value $L_n$ output by the level monitoring processing, by the calculation formula
$$G_{an} = G_0 - (L_n - L_{in})$$
(where $G_0$ is a predetermined initial gain and $L_{in}$ is a predetermined appropriate maximum noise level),
adjusting the gain of the audio signal at each predetermined time interval using the appropriate gain $G_{an}$, and outputting the signal to the encoding means.

4. The voice level control program according to claim 3, wherein the gain adjustment processing obtains the appropriate gain $G_{an}$ by further subtracting, from the appropriate gain $G_{an}$ given by the calculation formula, a margin $M$ that is a positive value corresponding to an assumed amount of change in noise level.