JP2002268684A

JP2002268684A - Sound model distributing method for voice recognition

Info

Publication number: JP2002268684A
Application number: JP2001072521A
Authority: JP
Inventors: Junichi Takami; 淳一鷹見
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2001-03-14
Filing date: 2001-03-14
Publication date: 2002-09-20

Abstract

PROBLEM TO BE SOLVED: To provide a sound model distributing method for voice recognition which can increase variations of sound models by limiting the double registration of a sound model by the same provider. SOLUTION: This method for gathering a sound model from a provider A and registering it together with identification information on the provider A, and then distributing the sound model at a request made by a purchaser B has a step for inputting parameter information on the sound model to a server 8 together with the identification information on the provider A and a step for limiting the double registration of the sound model by checking the identification information on the provider A when the server 8 registers the sound model.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an acoustic model distribution method for speech recognition.

[0002]

[Prior Art] In recent years, information input by speech recognition on devices such as mobile phones and personal computers has begun to come into full-scale practical use. In their initial state, these devices normally have a so-called "speaker-independent acoustic model."

[0003] However, even a speaker-independent acoustic model does not truly achieve a high recognition rate for everyone's speech: some speakers are easy to recognize and others are difficult to recognize. Moreover, even for a speaker who is easy to recognize, the speaker-independent acoustic model usually performs worse than an acoustic model that the speaker has further tuned to his or her own voice. For this reason, many devices provide a function (speaker adaptation function) that tunes the acoustic model, starting from the speaker-independent acoustic model, by having the user utter training speech of several tens to several hundreds of words.

[0004] In general, the more training utterances there are, the higher the recognition performance that can be expected, but actually uttering several hundred words or sentences imposes considerable effort on the user. To spare the user the effort that speaker adaptation requires in advance, one conceivable approach is an acoustic model distribution method in which the user uses an acoustic model provided by another user whose voice quality and speaking style are similar to his or her own.

[0005]

[Problem to Be Solved by the Invention] For this method to work well, however, a large number of acoustic models with a wide range of variation must be prepared. Even if the same provider registers many acoustic models trained on his or her own voice, the variation among the acoustic models does not increase, and the diverse purchasers of acoustic models cannot be served.

[0006] The present invention has been made to solve the above problems of the prior art. Its object is to provide an acoustic model distribution method for speech recognition that can increase the variation of acoustic models by restricting duplicate registration of acoustic models from the same provider.

[0007]

[Means for Solving the Problem] The invention according to claim 1 is a method of collecting an acoustic model from a provider, registering it together with identification information of the provider, and distributing the acoustic model in response to a request from a purchaser. The method comprises a step of inputting parameter information of the acoustic model to a server together with the identification information of the provider, and a step in which the server restricts duplicate registration of the acoustic model by checking the identification information of the provider when registering the acoustic model.

[0008] The invention according to claim 2 is characterized in that, in the invention according to claim 1, information from the time of training is also input to the server together with the parameter information of the acoustic model.

[0009] The invention according to claim 3 is characterized in that, in the invention according to claim 1 or 2, duplicate registration of acoustic models is restricted by comparing the training scores of the acoustic models.

[0010] The invention according to claim 4 is characterized in that, in the invention according to claim 1 or 2, duplicate registration of acoustic models is restricted by a selection made by the provider of the acoustic model.

[0011] The invention according to claim 5 is characterized in that, in the invention according to claim 4, an index is presented to the provider when the provider selects an acoustic model.

[0012]

[Embodiments of the Invention] An embodiment of the acoustic model distribution method for speech recognition according to the present invention is described below with reference to the drawings. FIG. 1 is a system configuration diagram showing an embodiment of the acoustic model distribution method for speech recognition according to the present invention. Reference numeral 1 denotes the terminal of an acoustic model provider (hereinafter "terminal 1"), 2 denotes the terminal of an acoustic model purchaser (hereinafter "terminal 2"), 7 denotes a communication network, and 8 denotes a host computer for acoustic model distribution (hereinafter "server 8").

[0013] Terminal 1 is an information processing device that can communicate with server 8 via communication network 7 and has a speech recognition function. Provider A of an acoustic model uses terminal 1 to input the contents of a prescribed utterance list for speaker adaptation (about several tens to several hundreds of utterances). Speaker adaptation is thereby performed, an acoustic model for the provider is built, and the model is input to server 8. Not only the acoustic model but also part of the speech uttered for speaker adaptation (for example, the first word; in this example, "Hachinohe") is input to server 8.

[0014] Terminal 2 is an information processing device that can communicate with server 8 via communication network 7 and has a speech recognition function. Purchaser B of an acoustic model uses terminal 2 to input to server 8 the contents of a prescribed utterance list for model search (about one to several utterances). Speech for preliminary selection of acoustic models (for example, "Hachinohe") is also input to server 8.

[0015] Examples of communication network 7 include computer networks typified by the Internet and LANs. Terminals 1 and 2 and server 8 are each connected to communication network 7 via a communication line such as the public switched telephone network (PSTN), a wireless telephone network, a CATV network, or a satellite communication network.

[0016] Server 8 is an information processing device managed and operated by a business that distributes the acoustic models collected from provider A to purchaser B, that is, an intermediary that mediates between provider A and purchaser B. Server 8 internally comprises a duplicate registration checking and handling unit 3, a purchaser processing unit 4, an acoustic model database 5, and an utterance list 6 for acoustic model search.

[0017] The duplicate registration checking and handling unit 3 searches acoustic model database 5 to determine whether an acoustic model from the same provider has already been registered and, if one has, performs processing to eliminate the duplication.

[0018] The purchaser processing unit 4 provides four functions: a "preliminary acoustic model selection function," an "optimal model search function," a "recognition accuracy calculation function," and a "billing function." Each function is described below.

[0019] The preliminary acoustic model selection function narrows the enormous number of registered acoustic models down to candidates considered suitable for purchaser B. It does so by the simple process of comparing information such as purchaser B's gender and age, and the preliminary-selection speech contained in the model-search speech samples uploaded (submitted) by purchaser B, with the information already uploaded by provider A.
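The preliminary selection described above can be sketched as a plain attribute filter. This is only an illustration under stated assumptions: the patent does not define the comparison rule, so the `ProviderProfile` fields, the exact-match rule on gender, and the `max_age_gap` tolerance are all hypothetical, and the comparison of preliminary-selection speech is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProviderProfile:
    """Attributes stored with a registered model (hypothetical layout)."""
    provider_id: str
    gender: str
    age: int


def preselect(profiles: List[ProviderProfile], gender: str, age: int,
              max_age_gap: int = 10) -> List[str]:
    """Return IDs of providers whose gender matches and whose age is close.

    The matching rule and the default tolerance are assumptions made for
    this sketch; the source only says the comparison is "simple".
    """
    return [
        p.provider_id
        for p in profiles
        if p.gender == gender and abs(p.age - age) <= max_age_gap
    ]
```

A filter of this kind is cheap enough to run over the whole database, which is the point of doing it before the more expensive score-based search.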

[0020] The optimal model search function applies the small number of acoustic models narrowed down by preliminary selection to purchaser B's speech, calculates a score (such as a likelihood) for each model, and sorts the models in descending order of score, thereby obtaining the models best suited to purchaser B's speech down to the n-th candidate.
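The score-and-sort step can be sketched as follows. The function name `n_best` and the pluggable `score_fn` are assumptions for illustration; a real system would compute an acoustic likelihood, which the patent does not specify, so the scoring function is passed in rather than implemented here.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

M = TypeVar("M")  # model representation (unspecified in the source)
U = TypeVar("U")  # utterance representation (unspecified in the source)


def n_best(models: Dict[str, M], utterances: List[U],
           score_fn: Callable[[M, U], float], n: int) -> List[Tuple[str, float]]:
    """Score every candidate model on the purchaser's utterances and
    return the top n (model_id, total_score) pairs, highest score first."""
    scored = [
        (model_id, sum(score_fn(model, u) for u in utterances))
        for model_id, model in models.items()
    ]
    # Descending order of score, as in the description.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]
```

Summing per-utterance scores is one reasonable choice when the scores are log-likelihoods; other aggregation rules would fit the description equally well.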

[0021] The recognition accuracy calculation function performs an online speech recognition experiment using speech samples for evaluation testing uploaded by purchaser B, in order to make a final check of the performance of the automatically retrieved acoustic model. A grammar is prepared in advance, but if purchaser B has uploaded grammar information, that grammar can be used instead.

[0022] When purchaser B finally decides to purchase a particular acoustic model and that model has been delivered to purchaser B's terminal 2, the billing function stores, as a pair, the information on purchaser B and the information on provider A of the acoustic model supplied to purchaser B. At a later date, it charges purchaser B or pays a reward to provider A.
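As a rough illustration of the pairing described above, the sketch below records one (purchaser, provider) pair per delivered model and later counts deliveries per provider. The names `SaleRecord`, `record_sale`, and `sales_per_provider` are hypothetical, and amounts, dates, and payment handling are deliberately left out because the source does not describe them.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SaleRecord:
    """One delivered model: who bought it and whose model it was."""
    purchaser_id: str
    provider_id: str


def record_sale(ledger: List[SaleRecord], purchaser_id: str,
                provider_id: str) -> None:
    """Store the purchaser/provider pair at the time of delivery."""
    ledger.append(SaleRecord(purchaser_id, provider_id))


def sales_per_provider(ledger: List[SaleRecord]) -> Dict[str, int]:
    """Count deliveries per provider, e.g. as input to a later reward run."""
    return dict(Counter(sale.provider_id for sale in ledger))
```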

[0023] Acoustic model database 5 stores the acoustic models provided by provider A together with their additional information.

[0024] The utterance list 6 for acoustic model search is information shared by provider A, purchaser B, and server 8. This makes possible the automatic search (including preliminary selection) for an acoustic model suited to purchaser B.

[0025] A method of restricting duplicate registration of acoustic models in server 8 is described below. It is assumed here that provider A, using terminal 1, has actually uttered the words or sentences specified for speaker adaptation in his or her own voice and has already built an acoustic model tuned to that voice.

[0026] Provider A connects to server 8 via communication network 7 using terminal 1 and uploads the parameter information of the acoustic model. In addition to the parameter information, provider A uploads at the same time the identification information of the terminal or information for identifying the provider's gender and age, together with the acoustic model training information stored in the terminal (the number of training words and the training score).
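The items uploaded in this step can be pictured as a single record. The sketch below is only one possible layout: the class and field names are hypothetical, and the source does not specify data types or a wire format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainingInfo:
    """Training information kept in the terminal."""
    num_training_words: int
    training_score: float


@dataclass
class ModelUpload:
    """Everything provider A sends to the server in one upload."""
    model_parameters: bytes           # serialized acoustic model parameters
    training_info: TrainingInfo
    terminal_id: Optional[str] = None  # terminal identification, if sent
    gender: Optional[str] = None       # provider attributes, if sent
    age: Optional[int] = None
```

The identification fields are optional here because the description lists the terminal identification and the provider's attributes as alternatives.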

[0027] Server 8 assigns unique provider identification information (a provider ID) on the basis of provider A's personal information received from terminal 1, and centrally manages this provider ID, the parameter information of the uploaded acoustic model, and the training information in acoustic model database 5.
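The source does not say how the provider ID is derived from the personal information. One possible approach, shown purely as an assumption, is to hash a normalized form of the personal details so that the same person always maps to the same ID. The function name, the choice of name and birth date as inputs, and the 16-character truncation are all hypothetical.

```python
import hashlib


def make_provider_id(name: str, birth_date: str) -> str:
    """Derive a stable provider ID from personal information.

    Inputs are normalized (trimmed, lowercased) so that trivial spelling
    differences do not yield a second ID for the same provider.
    """
    key = "|".join(part.strip().lower() for part in (name, birth_date))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
```

Any scheme would serve as long as it is deterministic per provider, since the duplicate check relies on two uploads from the same person resolving to the same ID.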

[0028] Each time server 8 receives an acoustic model, it checks the provider ID and, if an acoustic model uploaded by the same provider has already been registered, restricts duplicate registration by one of the following two methods.

[0029] One method is to automatically select and formally adopt one of the two models: the already registered acoustic model or the newly uploaded one. The training score of the already registered acoustic model is compared with that of the newly uploaded acoustic model, and the model with the larger value is formally adopted. That is, if the already registered acoustic model is adopted, the current upload itself is invalidated; if the newly uploaded acoustic model is adopted, the parameter information and training information of the already registered model are replaced with the newly uploaded ones.
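The automatic method can be sketched as a keyed store that holds at most one model per provider ID. The names `RegisteredModel` and `register_auto` are hypothetical, and keeping the existing model when the two scores are equal is an assumption, since the source does not cover ties.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RegisteredModel:
    provider_id: str
    parameters: bytes
    training_score: float


def register_auto(db: Dict[str, RegisteredModel], new: RegisteredModel) -> bool:
    """Register `new` unless the same provider already has a better model.

    Returns True if `new` was stored (first registration or replacement)
    and False if the upload was invalidated.
    """
    existing = db.get(new.provider_id)
    # Assumption: on a tie the already registered model is kept.
    if existing is not None and existing.training_score >= new.training_score:
        return False
    # Replaces parameters and training information in one step.
    db[new.provider_id] = new
    return True
```

Keying the store by provider ID makes the one-model-per-provider rule structural rather than something each caller has to enforce.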

[0030] The other method is to have provider A select which of the two models, the already registered acoustic model or the newly uploaded one, is to be formally adopted. From terminal 1, provider A specifies which of the two he or she wishes to formally register, so that only one acoustic model ultimately remains.

[0031] In that case, server 8 may send to terminal 1 the date on which the upload of the already registered acoustic model was accepted, its training information, and its past track record (for example, how often it matched a purchaser's speech or how often it remained among the N-best candidates), thereby presenting provider A with indices that encourage a more accurate decision.
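The provider-selection method, together with the indices sent to the terminal, can be sketched as below. All names (`ModelRecord`, `indices_for`, `resolve_by_provider_choice`) and the field layout are hypothetical; the sketch also assumes it is called only when a model from the same provider is already registered.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ModelRecord:
    provider_id: str
    parameters: bytes
    upload_date: str            # date the upload was accepted
    num_training_words: int
    training_score: float
    best_match_count: int = 0   # times the model matched a purchaser's speech
    nbest_count: int = 0        # times it remained among the N-best candidates


def indices_for(record: ModelRecord) -> Dict[str, object]:
    """Collect the indices shown to the provider before choosing."""
    return {
        "upload_date": record.upload_date,
        "num_training_words": record.num_training_words,
        "training_score": record.training_score,
        "best_match_count": record.best_match_count,
        "nbest_count": record.nbest_count,
    }


def resolve_by_provider_choice(db: Dict[str, ModelRecord], new: ModelRecord,
                               keep_new: bool) -> ModelRecord:
    """Apply the provider's choice so that exactly one model remains."""
    if keep_new:
        db[new.provider_id] = new
    return db[new.provider_id]
```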

[0032] The embodiment described above is a method of collecting an acoustic model from provider A, registering it together with identification information of provider A, and distributing the acoustic model in response to a request from purchaser B. It comprises a step of inputting parameter information of the acoustic model to server 8 together with the identification information of provider A, and a step in which server 8 restricts duplicate registration of the acoustic model by checking the identification information of provider A when registering the acoustic model. Because this prevents the collected acoustic models from being registered in duplicate, the variation of acoustic models can be increased. As a result, the diverse purchasers of acoustic models can be served.

[0033] In addition, because an index is presented to provider A when provider A selects an acoustic model, provider A can make a more accurate decision, and the quality of the registered acoustic models can therefore be improved.

[0034]

[Effects of the Invention] According to the invention of claim 1, the collected acoustic models can be prevented from being registered in duplicate, so the variation of acoustic models can be increased and the diverse purchasers of acoustic models can be served.

[0035] According to the invention of claim 5, when duplicate registration is restricted by having the provider select an acoustic model, the provider can make a more accurate decision, and the quality of the registered acoustic models can therefore be improved.

[Brief description of the drawings]

FIG. 1 is a system configuration diagram showing an embodiment of the speech recognition acoustic model according to the present invention.

[Explanation of symbols]

1: Acoustic model provider's terminal (terminal 1)
2: Acoustic model purchaser's terminal (terminal 2)
7: Communication network
8: Host computer for acoustic model distribution (server)

Claims

[Claims]

1. An acoustic model distribution method for speech recognition, in which an acoustic model is collected from a provider, registered together with identification information of the provider, and distributed in response to a request from a purchaser, the method comprising: a step of inputting parameter information of the acoustic model to a server together with the identification information of the provider; and a step in which the server restricts duplicate registration of the acoustic model by checking the identification information of the provider when registering the acoustic model.

2. The acoustic model distribution method for speech recognition according to claim 1, wherein information from the time of training is also input to the server together with the parameter information of the acoustic model.

3. The acoustic model distribution method for speech recognition according to claim 1 or 2, wherein duplicate registration of acoustic models is restricted by comparing the training scores of the acoustic models.

4. The acoustic model distribution method for speech recognition according to claim 1 or 2, wherein duplicate registration of acoustic models is restricted by a selection made by the provider of the acoustic model.

5. The acoustic model distribution method for speech recognition according to claim 4, wherein when the provider selects an acoustic model, an index is presented to the provider.