JP2004007277A

JP2004007277A - Communication terminal equipment, sound recognition system and information access system

Info

Publication number: JP2004007277A
Application number: JP2002160589A
Authority: JP
Inventors: Atsushi Yamane; 山根　淳
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2002-05-31
Filing date: 2002-05-31
Publication date: 2004-01-08

Abstract

PROBLEM TO BE SOLVED: To allow speech to be used as Internet input without increasing terminal cost. Coded speech is transmitted in the data transmission mode, and a speech recognition server recognizes the speech from the transmitted speech parameters and uses the result as Internet input.
SOLUTION: The equipment comprises:
- a sound input means that receives sound and converts it into a sound signal;
- a sound parameter extraction means that extracts prescribed sound parameters from the sound input to the sound input means;
- a data input means for inputting data;
- a data transmission means that transmits data using the transmission/reception procedure for communication over a data communication network;
- a sound transmission means that transmits sound using the transmission/reception procedure for communication over a voice communication network; and
- a selection means that selects either the data transmission means or the sound transmission means when the sound parameters are transmitted.
COPYRIGHT: (C)2004,JPO

Description

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a communication terminal device that converts voice into character string data, a voice recognition system, and an information access system.
[0002]
[Prior art]
In recent years, Internet access using a mobile phone as the terminal has come into wide use. Using the Internet on a mobile phone requires entering keywords by laboriously operating the numeric keypad, which makes it inherently awkward to use.
[0003]
To solve this problem, one could recognize speech on the terminal and use it as input for Internet use, but this is impractical for the following reasons. First, current speech recognition technology cannot yet perform speaker-independent free-word recognition with sufficient quality. Second, mounting such recognition on a mobile phone terminal would greatly increase the terminal cost.
[0004]
On the other hand, several systems that use speech as an input means for the Internet have been proposed. A typical example is IBM's WebSphere Voice Server.

In this system, speech is input through a telephone, converted into a digital signal by A/D conversion, and transferred to a voice server over an IP (Internet Protocol) network using VoIP (Voice over IP) technology. The voice server performs speech recognition to convert the speech into text information, which serves as input data to a Web server.

In the other direction, text information output from the Web server is converted into speech data on the voice server by speech synthesis. It is then converted back into a voice signal by VoIP technology via the IP network and output as speech through the telephone.

The phrases to be recognized are not free words but a limited set prepared in advance for Web input, so speaker-independent recognition is possible.
[0005]
[Problems to be solved by the invention]
However, although the WebSphere Voice Server system uses speech as input for Internet use, it is difficult to use a mobile phone as its terminal, for the following reasons.
[0006]
First, there is the problem of speech quality. Ordinary dial-up telephone calls and PHS (Personal Handy-phone System) digitally encode speech with relatively high-bit-rate schemes: 64 kbit/s PCM (Pulse Code Modulation) and 32 kbit/s ADPCM (Adaptive Differential PCM), respectively. As a result, the speech suffers little degradation and high-accuracy speech recognition is possible.

However, the most widely used mobile phones today, of the PDC (Personal Digital Cellular) or CDMA systems, encode speech with CELP (Code Excited Linear Prediction) family schemes such as 5.6 kbit/s VSELP (Vector Sum Excited Linear Prediction). The resulting speech degradation is much more severe than with PCM or ADPCM, which makes speech recognition of sufficient quality difficult.
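The short Python sketch below puts the bit rates in [0006] side by side. It converts each rate into the number of coded bits that describe one frame of speech. The 20 ms frame length is an assumption chosen for illustration (a common speech-coder frame size), not a figure from this publication.

```python
# Bits carried per speech frame at each coding rate quoted in [0006].
# FRAME_SECONDS is an illustrative assumption, not a value from the source.
FRAME_SECONDS = 0.020

RATES_BPS = {
    "PCM (fixed-line)": 64_000,
    "ADPCM (PHS)": 32_000,
    "VSELP (PDC mobile)": 5_600,
}


def bits_per_frame(rate_bps: int, frame_seconds: float = FRAME_SECONDS) -> int:
    """Return how many coded bits describe one frame of speech."""
    return round(rate_bps * frame_seconds)


def compression_vs_pcm(rate_bps: int) -> float:
    """How many times fewer bits than 64 kbit/s PCM a scheme uses."""
    return RATES_BPS["PCM (fixed-line)"] / rate_bps


if __name__ == "__main__":
    for name, rate in RATES_BPS.items():
        print(f"{name:20s} {bits_per_frame(rate):5d} bits/frame "
              f"{compression_vs_pcm(rate):5.1f}x fewer bits than PCM")
```

Under this assumption, a 5.6 kbit/s frame carries roughly one eleventh of the bits of a PCM frame. This is one way to see why CELP-coded speech is harder to recognize.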
[0007]
In addition, a mobile phone uses two networks:
- For voice calls, it communicates over a voice communication network. The transmission/reception procedure for communication over this network is hereinafter called the “voice communication mode”.
- For Internet connections, it communicates over a packet network for data communication. The transmission/reception procedure for communication over this network is hereinafter called the “data communication mode”.

Because errors are not tolerated in data communication, procedures for error correction and for handling packet loss are clearly specified. Voice communication, by contrast, gives priority to real-time performance. Speech is merely transmitted with error-correction bits appended, and countermeasures against transmission-path losses and the like are weaker than in data communication. This also makes speech recognition of mobile-phone speech difficult.
[0008]
However, Internet input, unlike ordinary voice communication, does not demand much real-time performance. Moreover, the CELP-family speech coding schemes used in typical mobile phones (other than PHS) extract parameters that can be used for speech recognition, such as linear prediction coefficients and pitch periods. Encoding speech with such a scheme and sending it as data uses only functions already built into the mobile terminal, so it adds almost no cost.
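The linear prediction coefficients mentioned in [0008] are conventionally obtained by autocorrelation analysis followed by the Levinson-Durbin recursion. The following is a minimal Python sketch of that standard technique, not the analysis stage of any specific mobile-phone codec. It makes these assumptions:
- a rectangular analysis window;
- the predictor convention x[n] ≈ Σ a_k x[n-k].

Pitch-period estimation is not shown.

```python
import math


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame (rectangular window assumed)."""
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for x[n] ~ sum_{k=1..order} a_k * x[n-k].

    Returns (list [a_1, ..., a_order], final prediction-error energy).
    """
    if r[0] <= 0:
        raise ValueError("frame has zero energy")
    a = [0.0] * (order + 1)  # a[0] unused; a[j] holds a_j
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection (PARCOR) coefficient for stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # prediction error shrinks at each stage
    return a[1:], err


def lpc_analysis(frame, order=10):
    """LPC coefficients a_1..a_order of one speech frame."""
    coeffs, _ = levinson_durbin(autocorrelation(frame, order), order)
    return coeffs


if __name__ == "__main__":
    w = 0.5  # a pure tone obeys x[n] = 2cos(w) x[n-1] - x[n-2]
    tone = [math.cos(w * n) for n in range(2000)]
    print(lpc_analysis(tone, order=2))  # close to [2*cos(0.5), -1]
```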
[0009]
In view of the above circumstances, an object of the present invention is to enable speech to be used as Internet input with almost no increase in terminal cost. To this end, coded speech is transmitted in the data transmission mode, and a speech recognition server performs recognition on the transmitted speech parameters and uses the result as input for Internet use.
[0010]
[Means for Solving the Problems]
In order to achieve the above object, the invention according to claim 1 comprises:
- voice input means for inputting voice and converting the input voice into a voice signal;
- voice parameter extraction means for extracting predetermined voice parameters from the voice input to the voice input means;
- data input means for inputting data;
- data transmission means for transmitting data using a transmission/reception procedure for communication over a data communication network; and
- voice transmission means for transmitting voice using a transmission/reception procedure for communication over a voice communication network.

It is characterized by further comprising selection means for selecting either the data transmission means or the voice transmission means when the voice parameters are transmitted.
[0011]
The invention according to claim 2 is characterized by comprising receiving means for receiving voice parameters, voice recognition means for performing voice recognition processing using the received parameters and outputting character string data, and character string data output means for transmitting the character string data.
[0012]
The invention according to claim 3 is characterized by comprising means for converting input voice into character string data using the inventions according to claims 1 and 2.
[0013]
The invention according to claim 4 is characterized in that character string data for input is obtained from voice input using the voice recognition system according to claim 3.
[0014]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
[0015]
FIG. 1 is a diagram showing a basic configuration of the information access system of the present invention. This configuration includes a first terminal device 100, a second terminal device 200, a Web server 300, a voice transmission network 301, a data transfer network 302, and the Internet 303.
[0016]
The first terminal device 100 includes a voice input unit 110, a voice parameter extraction unit 120, a data transmission unit 130, a voice transmission unit 140, a selection unit 150, and a data input unit 160.
[0017]
Further, the second terminal device 200 includes a receiving unit 210, a voice recognition unit 220, and a character string data output unit 230. In the following, the first terminal device is described using a mobile communication terminal, in particular a mobile phone, as an example, but it is not limited thereto.
[0018]
In the second terminal device 200, the voice recognition means 220 uses a speech recognition algorithm based on formant parameters such as the LPC (Linear Predictive Coding) cepstrum, which is well known to those skilled in the art.
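The LPC cepstrum named in [0018] can be computed directly from LPC coefficients with a well-known recursion. This Python sketch assumes the all-pole model 1/(1 - Σ a_k z^-k) and omits the gain term c_0. The Euclidean cepstral distance is included only as one common way a recognizer might compare frames against reference patterns. It is not taken from this publication.

```python
import math


def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps of the all-pole model 1 / (1 - sum_k a_k z^-k).

    `a` is [a_1, ..., a_p]; the gain term c_0 is omitted.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] left unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0  # direct term exists only for n <= p
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]  # (k/n) * c_k * a_{n-k}
        c[n] = acc
    return c[1:]


def cepstral_distance(c1, c2):
    """Euclidean distance between two cepstral vectors (one common frame-matching measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(c1, c2)))


if __name__ == "__main__":
    # For a single-pole model, the cepstrum is c_n = a^n / n.
    print(lpc_to_cepstrum([0.5], 4))
```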
[0019]
FIG. 2 is a diagram showing the basic operation of the first terminal device 100, which is the communication terminal device of the present invention. For an ordinary voice call using the first terminal 100, the voice input unit 110, the voice parameter extraction unit 120, and the voice transmission unit 140 are used. The selection means 150 is set to use the voice transmission means 140.
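The role of the selection means 150 can be illustrated with a small Python model. The mapping itself follows [0019] and [0025]: a voice call uses the voice communication mode, and Web voice input uses the data communication mode. Everything else is hypothetical and illustrative: the enum names, the `FirstTerminal` class, and the use of lists in place of real transmitters.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Mode(Enum):
    VOICE = auto()  # voice communication mode: network 301, real-time, FEC only
    DATA = auto()   # data communication mode: networks 302/303, error-controlled


class Purpose(Enum):
    VOICE_CALL = auto()       # ordinary call to another party (FIG. 2)
    WEB_VOICE_INPUT = auto()  # speech used as Web input (FIG. 4)


def select_transmitter(purpose: Purpose) -> Mode:
    """Selection means 150 (sketch): choose the path for encoded voice parameters."""
    if purpose is Purpose.VOICE_CALL:
        return Mode.VOICE
    if purpose is Purpose.WEB_VOICE_INPUT:
        return Mode.DATA
    raise ValueError(f"unknown purpose: {purpose!r}")


@dataclass
class FirstTerminal:
    """Hypothetical model of terminal 100; the lists stand in for means 140 and 130."""
    voice_tx: list = field(default_factory=list)  # voice transmission means 140
    data_tx: list = field(default_factory=list)   # data transmission means 130

    def send_voice_parameters(self, frames, purpose: Purpose) -> Mode:
        """Route the same codec frames to whichever transmitter the selector picks."""
        mode = select_transmitter(purpose)
        (self.voice_tx if mode is Mode.VOICE else self.data_tx).extend(frames)
        return mode
```

The codec output is identical in both cases. Only the transmission path changes, which is why the publication expects almost no added terminal cost.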
[0020]
First, voice is input to the voice input unit 110 and converted into a voice signal. The voice input unit 110 may be, for example, a microphone device or a microphone device with a low-pass filter. If the voice input unit 110 has an A/D conversion function, the voice signal is a digital signal. If the voice input unit 110 has no A/D conversion function and the voice parameter extraction unit 120 has one instead, the voice signal is an analog signal. Either configuration may be used.
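Whichever unit performs it, A/D conversion amounts to sampling and quantizing the microphone signal. The Python sketch below assumes telephone-band sampling at 8 kHz and 16-bit linear quantization; both figures are assumptions, not values from this publication. The low-pass filter mentioned above serves as the anti-aliasing filter ahead of this step, so the sketch takes an already band-limited input.

```python
import math

SAMPLE_RATE_HZ = 8000  # assumed telephone-band sampling rate
FULL_SCALE = 32767     # assumed 16-bit signed linear PCM output


def ad_convert(analog, duration_s, sample_rate=SAMPLE_RATE_HZ):
    """Sample analog(t) (amplitude nominally in [-1, 1]) and quantize to 16-bit integers."""
    n_samples = int(round(duration_s * sample_rate))
    samples = []
    for i in range(n_samples):
        v = analog(i / sample_rate)
        v = max(-1.0, min(1.0, v))  # clip at full scale
        samples.append(int(round(v * FULL_SCALE)))
    return samples


if __name__ == "__main__":
    tone = lambda t: 0.5 * math.sin(2 * math.pi * 440 * t)  # 440 Hz test tone
    pcm = ad_convert(tone, 0.020)  # one 20 ms frame
    print(len(pcm), max(pcm), min(pcm))
```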
[0021]
The voice signal is input to the voice parameter extraction unit 120, which extracts voice parameters. The voice parameters are the coding parameters used by that mobile phone's speech codec. They may also include bits added for error correction.
[0022]
The voice parameters are transmitted onto the communication path by the voice transmission unit 140. Here, the communication procedure for voice transmission (voice transmission mode) is used, and the parameters are sent to the other party 304 via the voice transmission network 301.
[0023]
FIG. 3 is a diagram showing the operation when the first terminal 100 is used for Internet access. In this case, the data input unit 160 and the data transmission unit 130 are used. First, the user enters data using the data input unit 160. For a mobile phone terminal, the data input means is typically the numeric keypad used for ordinary dialing.
[0024]
Further, the input data is transmitted to the communication path by the data transmitting means 130. Here, a data transmission communication procedure (data transmission mode) is used, and the data is transmitted to a communication partner via the data transmission network 302 and the Internet 303.
[0025]
FIG. 4 is a diagram showing a configuration for Web access on the Internet by voice input. First, the operation of the first terminal device 100 is described. In this case, the voice input unit 110, the voice parameter extraction unit 120, and the data transmission unit 130 are used. As in a voice call, voice is input to the voice input means 110 and converted into a voice signal. The voice signal is then input to the voice parameter extraction means 120, which outputs voice parameters.

A coding scheme other than the one used for voice calls may be used here. Because that would greatly increase the terminal cost, however, it is preferable to use the same coding scheme as for voice calls. Speech recognition uses formant parameters such as LPC (Linear Predictive Coding) cepstrum coefficients, so a coding scheme that encodes formant parameters such as LPC coefficients may be used, for example a CELP-family scheme. Here, a CELP-family coding scheme is assumed.
[0026]
The extracted voice parameters are transmitted to the communication path by the data transmission means 130. Here, a data transmission communication procedure (data transmission mode) is used, and the data is transmitted to a communication destination via a data transmission network. For this reason, error-free transmission becomes possible.
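The difference between the two modes described in [0007] and [0026] can be sketched as a simulation. In data mode, a corrupted frame is detected and resent. In voice mode, it is delivered as-is. The Python below makes these assumptions:
- a stop-and-wait ARQ scheme with a CRC-32 check;
- a single-bit-flip error model;
- an error-free acknowledgement path.

Real packet networks use their own, more elaborate procedures. The 14-byte (112-bit) frame size matches 5.6 kbit/s over an assumed 20 ms frame.

```python
import random
import zlib


def make_packet(seq: int, payload: bytes) -> bytes:
    """Prefix a 4-byte sequence number and append a CRC-32 over header + payload."""
    body = seq.to_bytes(4, "big") + payload
    return body + zlib.crc32(body).to_bytes(4, "big")


def check_packet(packet: bytes):
    """Return (seq, payload) if the CRC matches, else None."""
    body, crc = packet[:-4], packet[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != crc:
        return None
    return int.from_bytes(body[:4], "big"), body[4:]


def noisy_channel(data: bytes, rng: random.Random, error_rate: float) -> bytes:
    """With probability error_rate, flip one randomly chosen bit."""
    if rng.random() < error_rate:
        buf = bytearray(data)
        buf[rng.randrange(len(buf))] ^= 1 << rng.randrange(8)
        return bytes(buf)
    return data


def send_data_mode(frames, rng, error_rate=0.3, max_tries=50):
    """Data communication mode (sketch): resend each frame until it arrives intact."""
    received = []
    for seq, frame in enumerate(frames):
        for _ in range(max_tries):
            result = check_packet(noisy_channel(make_packet(seq, frame), rng, error_rate))
            if result is not None and result[0] == seq:
                received.append(result[1])
                break
        else:
            raise RuntimeError(f"frame {seq} not delivered after {max_tries} tries")
    return received


def send_voice_mode(frames, rng, error_rate=0.3):
    """Voice communication mode (sketch): no retransmission, errors pass through."""
    return [noisy_channel(f, rng, error_rate) for f in frames]
```

CRC-32 detects every single-bit error, so under this error model the data-mode output always equals the input.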
[0027]
Next, the operation of the second terminal device 200 is described. First, the voice parameters sent via the communication path are received by the receiving unit 210. The received voice parameters are input to the voice recognition means 220 and converted into character string data. Because the voice parameters were extracted with a CELP-family coding scheme, the voice recognition means extracts from them the portion in which formant parameters such as LPC coefficients are encoded, and uses that portion for speech recognition.
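Extracting the part of each received frame in which formant parameters such as LPC coefficients are coded amounts to slicing fixed bit fields. The layout below is entirely hypothetical: the field names and widths were chosen only to total 112 bits. It is not the bit allocation of VSELP or any other standard codec. A real implementation would follow the codec specification and then dequantize the indices into LPC coefficients.

```python
# Hypothetical frame layout: (field name, width in bits). NOT a real codec's allocation.
FRAME_LAYOUT = [
    ("lpc_indices", 38),       # spectral-envelope (LPC) quantizer indices
    ("pitch_lags", 28),        # adaptive-codebook (pitch) information
    ("codebook_indices", 28),  # fixed-codebook excitation
    ("gains", 18),             # excitation gains
]
FRAME_BITS = sum(width for _, width in FRAME_LAYOUT)  # 112 bits = 14 bytes


def pack_frame(fields: dict) -> bytes:
    """Pack field values MSB-first into one frame (used here only to build test data)."""
    value = 0
    for name, width in FRAME_LAYOUT:
        v = fields[name]
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        value = (value << width) | v
    return value.to_bytes(FRAME_BITS // 8, "big")


def unpack_frame(frame: bytes) -> dict:
    """Split one received frame back into its fields."""
    if len(frame) * 8 != FRAME_BITS:
        raise ValueError(f"expected {FRAME_BITS // 8} bytes, got {len(frame)}")
    value = int.from_bytes(frame, "big")
    fields, shift = {}, FRAME_BITS
    for name, width in FRAME_LAYOUT:
        shift -= width
        fields[name] = (value >> shift) & ((1 << width) - 1)
    return fields


def spectral_part(frames):
    """Keep only the LPC-related field of each frame, as input to recognition."""
    return [unpack_frame(f)["lpc_indices"] for f in frames]
```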
[0028]
The character string data is transmitted to the Web server 300 as input for Web access using a predetermined protocol, and the Web server processes it according to its server settings.
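The publication leaves the predetermined protocol unspecified. Assume, for illustration only, that it is an HTTP GET request carrying the recognized text as a query parameter. The hand-off to the Web server 300 could then look like the Python sketch below; the URL and the field name `q` are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Hypothetical endpoint; the source only says "a predetermined protocol".
SEARCH_URL = "http://example.com/search"


def build_web_request(recognized_text: str, base_url: str = SEARCH_URL,
                      field: str = "q") -> Request:
    """Wrap recognizer output as an HTTP GET query (UTF-8, percent-encoded)."""
    query = urlencode({field: recognized_text})
    return Request(f"{base_url}?{query}", method="GET")


if __name__ == "__main__":
    req = build_web_request("東京 天気")
    print(req.get_method(), req.full_url)
    # Sending it would be urllib.request.urlopen(req); omitted in this sketch.
```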
[0029]
[Effect of the Invention]
As is apparent from the above description, the present invention provides:
- input means for inputting voice and converting it into a voice signal;
- voice parameter extraction means for extracting predetermined voice parameters from the voice input to the input means;
- data transmission means for transmitting in the data transmission mode;
- voice transmission means for transmitting in the voice communication mode; and
- selection means for selecting either the data transmission means or the voice transmission means when the voice parameters are transmitted.

Therefore, by transmitting the voice parameters with the data transmission means, error-free voice parameters can be transmitted and used.
[0030]
Further, the present invention provides receiving means for receiving the voice parameters, voice recognition means for performing voice recognition processing using the received parameters and outputting character string data, and character string data output means for transmitting the character string data. Therefore, by performing speech recognition on the error-free voice parameters, a high-quality speech recognition result can be obtained from voice input at a communication terminal.
[0031]
Further, according to the present invention, input speech is converted into character string data. By performing speech recognition on the error-free voice parameters, a high-quality speech recognition result can be obtained from voice input at a communication terminal.
[0032]
Further, according to the present invention, character string data for input is obtained from voice input, so information access using high-quality speech recognition becomes possible.
[Brief description of the drawings]
FIG. 1 is a diagram showing a basic configuration of an information access system of the present invention.
FIG. 2 is a diagram showing the basic operation of the first terminal device 100, which is the communication terminal device of the present invention.
FIG. 3 is a diagram showing the operation when Internet access is performed using the first terminal 100.
FIG. 4 is a diagram showing a configuration for Web access on the Internet by voice input.
[Explanation of symbols]
100 first communication terminal device
110 voice input means
120 voice parameter extraction means
130 data transmission means
140 voice transmission means
150 selection means
160 data input means
200 second communication terminal device
210 reception means
220 voice recognition means
230 character string data output means
300 Web server

Claims

1. A communication terminal device comprising:
voice input means for inputting voice and converting the input voice into a voice signal;
voice parameter extraction means for extracting predetermined voice parameters from the voice input to the voice input means;
data input means for inputting data;
data transmission means for transmitting data using a transmission/reception procedure for communication over a data communication network;
voice transmission means for transmitting voice using a transmission/reception procedure for communication over a voice communication network; and
selection means for selecting either the data transmission means or the voice transmission means when the voice parameters are transmitted.

2. A communication terminal device comprising:
receiving means for receiving voice parameters;
voice recognition means for performing voice recognition processing using the received parameters and outputting character string data; and
character string data output means for transmitting the character string data.

3. A speech recognition system comprising means for converting input speech into character string data using the communication terminal device according to claim 1 and the communication terminal device according to claim 2.

4. An information access system, wherein input character string data is obtained from voice input using the voice recognition system according to claim 3.