JP2009145435A

JP2009145435A - System and method for providing unspecified speaker speech recognition engine used in a plurality of apparatuses to individual user via the internet

Info

Publication number: JP2009145435A
Application number: JP2007320333A
Authority: JP
Inventors: Zuisho O; 瑞璋王
Original assignee: CHUHEI O; O CHUHEI; ZUISHO O
Current assignee: CHUHEI O; O CHUHEI; ZUISHO O
Priority date: 2007-12-12
Filing date: 2007-12-12
Publication date: 2009-07-02

Abstract

PROBLEM TO BE SOLVED: To provide a system and method for providing an unspecified speaker speech recognition engine used in a plurality of devices to an individual user via the Internet.
SOLUTION: The system comprises a user login unit 10, a storage unit 20, a speech recognition engine preparation unit 30 and an engine download unit 40. The method comprises a step (a) and a step (b). In step (a), the individual user inputs speech with different devices, and the input speech is transmitted via the Internet to the storage unit 20 of the system provided on the Internet and stored there. In step (b), a speech recognition engine suited for use in each electronic apparatus is prepared on the basis of the speech input by the user and the characteristics of that apparatus, using the speech recognition engine preparation unit 30 provided on the Internet.
COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a speech recognition engine system and method, and more particularly to a system and method for providing an unspecified speaker speech recognition engine used for a plurality of devices to individual users via the Internet.

Speech recognition technology has made electronic devices such as desktop computers, notebook computers, mobile phones, and PDAs more convenient to use. Simply by dictating into a microphone or similar device, a user can have spoken words converted into text or interpreted as commands, which makes text input and device operation easy. For example, a user can compose text by dictation or dial a mobile phone by voice. Beyond its convenience for general users, speech recognition is especially valuable for people with physical disabilities and patients with muscular atrophy.

In practical use, speech recognition engines can be divided into two models: the specific speaker speech recognition engine and the unspecified speaker speech recognition engine.

Because a large amount of user voice data is registered in the specific speaker speech recognition engine in advance, it can be used directly, with no step of teaching and training the engine. However, although the training step can be omitted, pronunciation differs from user to user. Because the engine judges against voice data from people other than the user, it has the disadvantage that its accuracy is significantly inferior to that of the unspecified speaker speech recognition engine.

The unspecified speaker speech recognition engine must be trained and adjusted by the user in advance. In short, the engine becomes usable only after the user's own voice data has been input. Taking the voice dialing function of a mobile phone as an example, the user cannot use it without first recording each recipient's name in his or her own voice. Recognition accuracy is relatively high, but the engine is very inconvenient to use. Even if the user takes the trouble to train the engine of one electronic device, the engine of a new device must be trained all over again when the device is replaced. With a mobile phone, for example, after switching to a new phone the user cannot use the function until the voice data has been re-entered and the engine trained again.

With the spread of electronic devices, users have come to own several of them. As described above, a user of unspecified speaker speech recognition engines must train each type of electronic device separately, which not only wastes time but can even discourage the user from using speech recognition at all. If these drawbacks could be resolved effectively, or if more accurate unspecified speaker speech recognition engines became more widespread, the development of the speech recognition technology industry could be promoted.
The present invention builds on current Internet technology. While maintaining the high accuracy of the unspecified speaker speech recognition engine, it uses the voice data a user has input over a long period to train the unspecified speaker speech recognition engines of different electronic devices, thereby eliminating the lengthy training process otherwise required before such an engine can be used.
JP 2003-295893 A

A first object of the present invention is to provide a system and method for providing an unspecified speaker speech recognition engine used for a plurality of devices to individual users via the Internet, in which, when a user uses a different electronic device, an unspecified speaker speech recognition engine suited to that device is created automatically from previously stored voice data and the characteristics of the device, so that no retraining is needed and the user can conveniently use unspecified speaker speech recognition engines on different electronic devices.
A second object of the present invention is to provide such a system and method in which the same user's voice data from different electronic devices is stored over the long term via the Internet and used to create unspecified speaker speech recognition engines that can be used on different electronic devices without prior learning or training, so that the engines the user runs on different devices are continually refined and brought further in line with the user's needs.

To achieve the above objects, the present invention provides a system for providing an unspecified speaker speech recognition engine used for a plurality of devices to individual users via the Internet. The system includes a storage unit and a speech recognition engine creation unit. The storage unit stores voice input by the user via an electronic device. The speech recognition engine creation unit creates, based on the voice input by the user and the characteristics of the electronic device used, an unspecified speaker speech recognition engine suited to each electronic device.

The method of the present invention for providing an unspecified speaker speech recognition engine used for a plurality of devices to individual users via the Internet includes the following steps a and b. In step a, voice input by the user via an electronic device is transmitted via the Internet to a storage unit provided in a platform on the Internet and stored there. In step b, a speech recognition engine creation unit provided on the Internet creates, based on the voice input by the user and the characteristics of the electronic device used, the unspecified speaker speech recognition engine suited to that electronic device.

As a result, when a user uses a different electronic device, the unspecified speaker speech recognition engine created from the stored voice data and the characteristics of that device can be downloaded directly from the Internet to the device and used without retraining.

With the system and method of the present invention, voice data can be stored on the Internet, and when a different electronic device is used, an unspecified speaker speech recognition engine suited to that device is created from the stored voice and the characteristics of the device. Consequently, no retraining is needed when a new electronic device is used.

With the system and method of the present invention, the user's voice data can be stored over a long period, which greatly improves the accuracy of the unspecified speaker speech recognition engine and brings it further in line with the user's needs.

When the speech recognition system of the present invention is provided within a large portal site on the Internet, voice data can be collected and used over the long term, yielding an even more accurate unspecified speaker speech recognition engine, while the portal site gains a stable, long-term customer base. Both sides therefore benefit.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.
FIG. 1 is a diagram showing a system according to a first embodiment of the present invention for providing an unspecified speaker speech recognition engine used for a plurality of devices to individual users via the Internet. As shown in FIG. 1, the system includes a storage unit 20 and a speech recognition engine creation unit 30 in a platform 1 on the Internet. The storage unit 20 stores voice input by the user via a mobile phone 2. The speech recognition engine creation unit 30 creates a speech recognition engine based on the voice input via the mobile phone 2 and the characteristics of the mobile phone 2, and provides the engine to the mobile phone 2.

The speech recognition engine creation unit 30 creates a speech recognition engine from the user's voice by a model training technique or a model adjustment technique. The created engine includes a component that extracts feature parameters from speech, trained comparison data, and a search-and-comparison component for model recognition. In addition, the hardware and software environment of the target electronic device must be taken into account so that the created engine is applicable to that device.
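The three parts of the created engine named above (feature extraction, trained comparison data, and search and comparison) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the patent does not specify the features, the matching rule, or how device characteristics are expressed. `DeviceProfile`, `extract_features`, `RecognitionEngine`, and `create_engine` are hypothetical names, the "feature" is a toy per-segment mean amplitude, and matching is plain nearest-template search rather than any particular model training or model adjustment technique.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class DeviceProfile:
    """Hypothetical stand-in for 'characteristics of the electronic device'."""
    name: str
    feature_dim: int  # how many feature values per utterance the device can afford


def extract_features(samples: Sequence[float], dim: int) -> List[float]:
    """Toy feature extractor: mean absolute amplitude over `dim` equal segments."""
    if dim <= 0 or len(samples) < dim:
        raise ValueError("need dim > 0 and at least `dim` samples")
    seg = len(samples) // dim  # any trailing samples beyond dim * seg are ignored
    return [
        sum(abs(x) for x in samples[i * seg:(i + 1) * seg]) / seg
        for i in range(dim)
    ]


@dataclass
class RecognitionEngine:
    device: DeviceProfile
    templates: Dict[str, List[float]]  # trained comparison data: label -> features

    def recognize(self, samples: Sequence[float]) -> str:
        """Search-and-comparison step: return the label of the nearest template."""
        feats = extract_features(samples, self.device.feature_dim)

        def distance(template: List[float]) -> float:
            return sqrt(sum((a - b) ** 2 for a, b in zip(feats, template)))

        return min(self.templates, key=lambda label: distance(self.templates[label]))


def create_engine(recordings: Dict[str, Sequence[float]],
                  device: DeviceProfile) -> RecognitionEngine:
    """Build templates from stored user voice, sized to the target device."""
    templates = {
        label: extract_features(samples, device.feature_dim)
        for label, samples in recordings.items()
    }
    return RecognitionEngine(device, templates)
```

In this sketch the same stored recordings could be passed to `create_engine` with a different `DeviceProfile`, which is the only sense in which the engine is "suited" to a device here; a real system would also have to account for the device's hardware and software environment.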

FIG. 2 is a diagram showing a system according to a second embodiment of the present invention for providing an unspecified speaker speech recognition engine used for a plurality of devices to individual users via the Internet. As shown in FIG. 2, the system includes a user login unit 10, a storage unit 20, a speech recognition engine creation unit 30, and an engine download unit 40 in a platform 1 on the Internet.

In the second embodiment, the user login unit 10 allows different users to log in to the system from the Internet using different electronic devices. The storage unit 20 stores the voice that a user inputs on different electronic devices, classifying it by device; the input voice is transmitted to the storage unit 20 via the Internet and stored there. The speech recognition engine creation unit 30 creates a speech recognition engine suited to each electronic device based on the voice input by the user and the characteristics of that device. The engine download unit 40 allows the user to download, on each electronic device, the speech recognition engine suited to that device.

As shown in FIG. 2, when a mobile phone 2 having a speech recognition function is used, the user logs in to the speech recognition engine system of the present invention through the user login unit 10 via the Internet and inputs voice through the sound receiving device of the mobile phone 2. The voice is transmitted via the Internet to the storage unit 20 and stored there. The speech recognition engine creation unit 30 then creates a speech recognition engine based on the voice input by the user and the characteristics of the electronic device, and the engine download unit 40 downloads the engine to the user's mobile phone 2 via the Internet.

FIG. 3 is another diagram showing the system according to the second embodiment of the present invention for providing an unspecified speaker speech recognition engine used for a plurality of devices to individual users via the Internet. As shown in FIG. 3, suppose that voice has already been stored in the storage unit 20 of the speech recognition system from the mobile phone 2 that the user used previously. When the user starts using another mobile phone 2', the user first inputs the data of the mobile phone 2' into the speech recognition engine system of the present invention through the user login unit 10 via the Internet. The speech recognition engine creation unit 30 then creates a speech recognition engine suited to the mobile phone 2' based on the voice input by the user and the characteristics of the mobile phone 2'. Finally, the engine download unit 40 downloads the engine to the user's mobile phone 2' via the Internet. The user can thus use the speech recognition function of the new mobile phone 2' immediately, without lengthy prior training. In addition, voice input on the new mobile phone 2' is likewise stored in the storage unit 20 via the Internet, so the user's stored voice data accumulates. As a result, the speech recognition engines created by the speech recognition engine creation unit 30 are continually refined: the recognition accuracy of the new mobile phone 2' improves greatly, and the engines used on other electronic devices are refined at the same time.
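The device-switch scenario can be sketched as follows, as an illustration only. `StorageUnit`, `EngineBuild`, and `build_engine` are hypothetical names, not part of the patent; the "engine" is reduced to a record of which device it targets and how many stored utterances it was built from, so that the pooling and accumulation of voice data is visible without committing to any training algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, List


class StorageUnit:
    """Hypothetical stand-in for storage unit 20: per-user voice, classified by device."""

    def __init__(self) -> None:
        self._voice: DefaultDict[str, Dict[str, List[bytes]]] = defaultdict(dict)

    def store(self, user: str, device: str, utterance: bytes) -> None:
        self._voice[user].setdefault(device, []).append(utterance)

    def all_utterances(self, user: str) -> List[bytes]:
        # Voice from every device the user has used is pooled together.
        return [u for clips in self._voice.get(user, {}).values() for u in clips]


@dataclass(frozen=True)
class EngineBuild:
    """Placeholder for a created engine: records its target and training volume."""
    target_device: str
    utterances_used: int


def build_engine(storage: StorageUnit, user: str, target_device: str) -> EngineBuild:
    """Create an engine for `target_device` from all voice the user has stored so far."""
    pooled = storage.all_utterances(user)
    if not pooled:
        raise LookupError(f"no stored voice for user {user!r}")
    return EngineBuild(target_device, len(pooled))


if __name__ == "__main__":
    storage = StorageUnit()
    for clip in (b"clip-1", b"clip-2", b"clip-3"):
        storage.store("alice", "phone-2", clip)

    # A new phone gets an engine from voice recorded on the old phone.
    print(build_engine(storage, "alice", "phone-2'"))

    # Voice entered on the new phone accumulates and benefits every device.
    storage.store("alice", "phone-2'", b"clip-4")
    print(build_engine(storage, "alice", "phone-2"))
```

The design point the sketch tries to capture is that training data is keyed by user rather than by device, so a rebuild for any device draws on the whole accumulated pool.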

FIG. 4 is still another diagram showing the system according to the second embodiment of the present invention for providing an unspecified speaker speech recognition engine used for a plurality of devices to individual users via the Internet. As shown in FIG. 4, the electronic device the user used earlier and the electronic device used later may be of different types. Suppose the user has input voice data via the Internet from the mobile phone 2 and the mobile phone 2', and the data has been stored in the storage unit 20. When the user then wants to use the speech recognition function of a notebook personal computer 3, the user inputs the data of the notebook personal computer 3 through the user login unit 10 via the Internet. The speech recognition engine creation unit 30 creates a speech recognition engine suited to the notebook personal computer 3 based on the voice the user input on the mobile phone 2 and the mobile phone 2' and the characteristics of the notebook personal computer 3. Finally, the engine download unit 40 downloads the engine to the user's notebook personal computer 3 via the Internet. The user can thus use the speech recognition function of the notebook personal computer 3 immediately, without lengthy prior training. In addition, voice input on the notebook personal computer 3 is likewise stored in the storage unit 20 via the Internet, so the user's stored voice data accumulates. As a result, the speech recognition engines created by the speech recognition engine creation unit 30 are continually refined: the recognition accuracy of the notebook personal computer 3 improves greatly, and the engines used on other electronic devices are refined at the same time.

As described above, the system of the present invention for providing an unspecified speaker speech recognition engine used for a plurality of devices to individual users via the Internet is provided in the platform 1 on the Internet. The platform 1 may be a large portal site such as Google, Yahoo, Apple, or MSN. Users can then conveniently use the Internet storage space these portal sites provide to keep voice data over the long term and continually improve the capability of the unspecified speaker speech recognition engine, aiming for the best possible usage conditions. At the same time, the portal site gains a stable, long-term customer base, so both sides benefit.

FIG. 5 is a flowchart showing a method according to an embodiment of the present invention for providing an unspecified speaker speech recognition engine used for a plurality of devices to individual users via the Internet. As shown in FIG. 5, the method executes step a1, step a, step b, and step c in sequence. In step a1, the user logs in, from one of various electronic devices, to the system provided on a platform on the Internet through a user login unit provided on the Internet. In step a, the user inputs voice on the electronic device, and the input voice is transmitted via the Internet to the storage unit of the system and stored there. In step b, a speech recognition engine creation unit provided on the Internet creates a speech recognition engine suited to each electronic device based on the voice input by the user and the characteristics of that device. Finally, in step c, the user downloads the created unspecified speaker speech recognition engine via the Internet using an engine download unit provided on the Internet, and uses it.
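The step sequence a1, a, b, c can be sketched as a small in-memory service, purely as an illustration. The class `Platform`, its method names, the session-token scheme, and the dictionary used as an "engine" are all assumptions introduced here; the patent describes the steps but not an interface or data format.

```python
import secrets
from typing import Dict, List, Tuple


class Platform:
    """Hypothetical in-memory model of platform 1 and its four units."""

    def __init__(self) -> None:
        self._sessions: Dict[str, str] = {}                # token -> user
        self._voice: Dict[str, List[bytes]] = {}           # user -> stored utterances
        self._engines: Dict[Tuple[str, str], dict] = {}    # (user, device) -> engine

    def login(self, user: str) -> str:
        """Step a1: log in through the user login unit; returns a session token."""
        token = secrets.token_hex(16)
        self._sessions[token] = user
        return token

    def _user(self, token: str) -> str:
        try:
            return self._sessions[token]
        except KeyError:
            raise PermissionError("log in first (step a1)") from None

    def upload_voice(self, token: str, audio: bytes) -> None:
        """Step a: transmit voice to the storage unit and store it."""
        self._voice.setdefault(self._user(token), []).append(audio)

    def create_engine(self, token: str, device: str) -> None:
        """Step b: create an engine for `device` from the user's stored voice."""
        user = self._user(token)
        clips = self._voice.get(user, [])
        if not clips:
            raise ValueError("no stored voice; run step a first")
        # Placeholder engine: a real one would hold a model fitted to the device.
        self._engines[(user, device)] = {"device": device, "utterances": len(clips)}

    def download_engine(self, token: str, device: str) -> dict:
        """Step c: download the engine created for `device`."""
        return self._engines[(self._user(token), device)]


if __name__ == "__main__":
    platform = Platform()
    token = platform.login("alice")                # step a1
    platform.upload_voice(token, b"hello")         # step a
    platform.create_engine(token, "phone-2")       # step b
    print(platform.download_engine(token, "phone-2"))  # step c
```

The sketch enforces the ordering only loosely: every call after login checks the session token, and engine creation fails if no voice has been stored.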

The storage unit 20, to which the above-described voice is transmitted via the Internet and in which it is stored, and the unspecified speaker speech recognition engine 30 created based on the characteristics of the electronic device used operate in the same way whether they are on the same electronic device or on different electronic devices.

In the system and method of the present invention, the electronic device used by the user may be a mobile phone, a desktop personal computer, a notebook personal computer, a PDA, or a similar electronic device, all of which fall within the scope of application of the present invention. The Internet connection used may be the computer Internet, a mobile phone line, a fixed telephone line, or the like.

Although preferred embodiments of the present invention have been disclosed above, they are in no way intended to limit the present invention, and anyone skilled in the art can make various changes and modifications without departing from the spirit and scope of the present invention. Accordingly, the scope of protection of the present invention is defined by the claims.

FIG. 1 is a diagram showing a system according to a first embodiment of the present invention for providing an unspecified speaker speech recognition engine used for a plurality of devices to individual users via the Internet.
FIG. 2 is a diagram showing a system according to a second embodiment of the present invention for providing an unspecified speaker speech recognition engine used for a plurality of devices to individual users via the Internet.
FIG. 3 is another diagram showing the system according to the second embodiment of the present invention.
FIG. 4 is still another diagram showing the system according to the second embodiment of the present invention.
FIG. 5 is a flowchart showing a method according to an embodiment of the present invention for providing an unspecified speaker speech recognition engine used for a plurality of devices to individual users via the Internet.

Explanation of symbols

1 Platform
2, 2' Mobile phone
3 Notebook personal computer
10 User login unit
20 Storage unit
30 Speech recognition engine creation unit
40 Engine download unit

Claims

A system for providing an unspecified speaker speech recognition engine used for a plurality of devices to individual users via the Internet, the system comprising a storage unit and a speech recognition engine creation unit, wherein
the storage unit stores voice input by a user via an electronic device, and
the speech recognition engine creation unit creates, based on the voice input by the user and the characteristics of the electronic device used, an unspecified speaker speech recognition engine suitable for use in each electronic device.

The system according to claim 1, further comprising a user login unit through which different users log in to the system from the Internet using an electronic device having a speech recognition function.

The system according to claim 1 or 2, further comprising an engine download unit through which each unspecified speaker speech recognition engine is downloaded to the corresponding electronic device so that the user can use the unspecified speaker speech recognition function.

The system according to claim 1, wherein the electronic device used is a mobile phone, a desktop personal computer, a notebook personal computer, or a PDA.

The system according to claim 1, wherein the Internet used is a computer Internet, a mobile phone line, or a fixed telephone line.

The system according to claim 1, wherein the speech recognition engine creation unit creates the unspecified speaker speech recognition engine suitable for use in the electronic device by a model training technique or a model adjustment technique, based on the user's voice and the characteristics of the electronic device used.

A method for providing an unspecified speaker speech recognition engine used for a plurality of devices to individual users via the Internet, the method comprising a step a and a step b, wherein
in step a, voice input by a user via an electronic device is transmitted via the Internet to a storage unit provided in a platform on the Internet and stored there, and
in step b, a speech recognition engine creation unit provided on the Internet is used to create, based on the voice input by the user and the characteristics of the electronic device used, the unspecified speaker speech recognition engine suitable for use in the electronic device.

The method according to claim 7, further comprising, before step a, a step a1 of logging in to the system from the electronic device through a user login unit provided on the Internet.

The method according to claim 7 or 8, further comprising, after step b, a step c in which the user downloads the created unspecified speaker speech recognition engine via the Internet using an engine download unit provided on the Internet, and uses it.

The method according to claim 7, wherein the electronic device used is a mobile phone, a desktop personal computer, a notebook personal computer, or a PDA.

The method according to claim 7, wherein the Internet used is a computer Internet, a mobile phone line, or a fixed telephone line.

The method according to claim 7, wherein the speech recognition engine is created by a model training technique or a model adjustment technique, based on the user's voice and the characteristics of the electronic device used.