JP2002162989A

JP2002162989A - System and method for sound model distribution

Info

Publication number: JP2002162989A
Application number: JP2000360530A
Authority: JP
Inventors: Junichi Takami; 淳一鷹見
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2000-11-28
Filing date: 2000-11-28
Publication date: 2002-06-07

Abstract

PROBLEM TO BE SOLVED: To allow an acquirer to obtain, at the acquirer's terminal, a sound model suited to his or her voice from a host computer without expending effort on speaker adaptation. SOLUTION: The host computer 1 has a receiving means 11 which receives a sound model provided by a provider-side terminal 2, a storage means 12 which stores the sound model received by the receiving means 11, a retrieving means 13 which, when the acquirer's voice is input from the acquirer-side terminal 3, retrieves a sound model suited to the acquirer's voice from the sound models stored in the storage means 12 (more specifically, it recognizes the acquirer's voice using the sound models stored in the storage means 12 and retrieves a sound model that yields a good recognition result (recognition score)), and a sending-out means 14 which sends the sound model retrieved by the retrieving means 13 to the acquirer-side terminal 3.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an acoustic model distribution system and an acoustic model distribution method.

[0002]

[Description of the Related Art] In recent years, information input using speech recognition has begun to come into full practical use on mobile phones, personal computers, and similar devices. Such devices usually ship with a so-called speaker-independent acoustic model in their initial state. A speaker-independent acoustic model, however, does not achieve a high recognition rate for everyone's voice: some speakers are recognized easily and others poorly. Even for a speaker who is recognized easily, a speaker-independent acoustic model usually performs worse than an acoustic model tuned specifically for that speaker.

[0003] Therefore, many devices provide a function (speaker adaptation function) that tunes the acoustic model, starting from the speaker-independent acoustic model, by having the user utter training speech of roughly several tens to several hundreds of words. In general, the more adaptation utterances there are, the higher the recognition performance that can be expected, but actually uttering hundreds of words or sentences demands considerable effort.

[0004]

[Problem to Be Solved by the Invention] An object of the present invention is to provide an acoustic model distribution system and an acoustic model distribution method that allow an acquirer-side terminal to obtain, from a host computer, an acoustic model suited to the acquirer's own voice without spending effort on speaker adaptation.

[0005]

[Means for Solving the Problem] To achieve the above object, the invention according to claim 1 comprises a host computer, a provider-side terminal that provides an acoustic model, and an acquirer-side terminal that obtains an acoustic model, wherein the host computer has: receiving means for receiving the acoustic model provided from the provider-side terminal; storage means for storing the acoustic model received by the receiving means; search means for searching, when the acquirer's voice is input from the acquirer-side terminal, the acoustic models stored in the storage means for an acoustic model suited to the acquirer's voice; and sending means for sending the acoustic model found by the search means to the acquirer-side terminal.

[0006] The invention according to claim 2 is the acoustic model distribution system according to claim 1, wherein the provider-side terminal has a speaker adaptation function that generates an acoustic model by adapting to the speaker from the provider's voice input.

[0007] The invention according to claim 3 is the acoustic model distribution system according to claim 1, wherein the provider-side terminal further has a function of providing provider information to the host computer, and the host computer, upon receiving the provider information from the provider-side terminal, stores the provider information in the storage means.

[0008] The invention according to claim 4 is the acoustic model distribution system according to claim 1, wherein the provider-side terminal further has a function of providing the provider's attribute information, and/or a training score obtained during speaker adaptation, and/or a voice sample used during speaker adaptation, to the host computer as the provider's ID information, and the host computer, upon receiving the provider's ID information from the provider-side terminal, stores it in the storage means and uses it as additional information for searching for the acoustic model best suited to the acquirer.

[0009] The invention according to claim 5 is the acoustic model distribution system according to claim 1, wherein, when the provider's attribute information is centrally managed by a predetermined management system, the host computer receives the provider's attribute information from that management system, stores it in the storage means, and uses it as additional information for searching for the acoustic model best suited to the acquirer.

[0010] The invention according to claim 6 is the acoustic model distribution system according to claim 4 or claim 5, wherein the acquirer-side terminal has a function of giving the host computer, together with the voice for acoustic model search, the acquirer's attribute information and/or a voice sample as the acquirer's ID information; the host computer further has preliminary search means that, when the acquirer's ID information is given from the acquirer-side terminal, preselects reasonably plausible acoustic model candidates from the storage means by matching the acquirer's ID information against the providers' ID information; and the search means of the host computer searches the candidates preselected by the preliminary search means for an acoustic model suited to the acquirer's voice.

[0011] The invention according to claim 7 is the acoustic model distribution system according to any one of claims 1 to 6, wherein the acquirer-side terminal gives the host computer the acquirer's voice for acoustic model search, uttered for vocabulary information predetermined in the host computer, together with its correct-answer information, and the search means of the host computer uses the predetermined vocabulary information to automatically search the storage means for an acoustic model that achieves a certain recognition rate on the acquirer's voice.

[0012] The invention according to claim 8 is the acoustic model distribution system according to any one of claims 1 to 6, wherein the acquirer-side terminal gives vocabulary information to the host computer and also gives the host computer the acquirer's voice for acoustic model search, uttered for that vocabulary information, together with its correct-answer information, and the search means of the host computer uses the vocabulary information given from the acquirer-side terminal to automatically search the storage means for an acoustic model that achieves a certain recognition rate on the acquirer's voice.

[0013] The invention according to claim 9 is the acoustic model distribution system according to claim 8, wherein, when the acquirer-side terminal is a mobile phone and a telephone number list is used as the vocabulary information, the acquirer-side terminal is given a filter function that passes only information such as the names and nicknames in the telephone number list to the host computer as vocabulary information and does not pass the telephone numbers themselves to the host computer.
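The filter function of claim 9 can be pictured with a short Python sketch. The patent does not specify any data layout, so the `PhonebookEntry` type, the `vocabulary_for_host` function, and the placeholder numbers below are all hypothetical; the only point illustrated is that readings leave the terminal and telephone numbers do not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhonebookEntry:
    """Hypothetical phone-book record held on the acquirer-side terminal."""
    reading: str       # name or nickname as it is spoken
    phone_number: str  # private; should never leave the terminal


def vocabulary_for_host(entries):
    """Return only the readings; the telephone numbers are dropped on the terminal."""
    return [entry.reading for entry in entries]


# Made-up entries for illustration only.
phonebook = [
    PhonebookEntry("すずきさん", "000-0000-0001"),
    PhonebookEntry("さとうさん", "000-0000-0002"),
]
vocabulary = vocabulary_for_host(phonebook)
```

Only `vocabulary` would be transmitted; the `phonebook` list itself stays on the terminal.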

[0014] The invention according to claim 10 is the acoustic model distribution system according to any one of claims 1 to 9, wherein the host computer further has confirmation means for confirming the performance of an acoustic model found by the search means before that acoustic model is given to the acquirer-side terminal, and the confirmation means actually performs speech recognition on the voice sent from the acquirer-side terminal using the found acoustic model and gives the resulting score to the acquirer-side terminal as the confirmation result.

[0015] The invention according to claim 11 is the acoustic model distribution system according to claim 10, wherein the search means has a function of retrieving not only the best acoustic model but also acoustic model candidates down to a predetermined rank, and the confirmation means presents another acoustic model candidate when the acquirer is not satisfied with one acoustic model candidate found by the search means.

[0016] The invention according to claim 12 is the acoustic model distribution system according to any one of claims 1 to 11, wherein the host computer has charging means for charging the acquirer when the acquirer-side terminal purchases an acoustic model from the host computer.

[0017] The invention according to claim 13 is the acoustic model distribution system according to any one of claims 1 to 12, wherein the host computer has payment means for paying a fee to the provider when an acoustic model is provided from the provider-side terminal.

[0018] The invention according to claim 14 is an acoustic model distribution method involving a host computer, a provider-side terminal that provides an acoustic model, and an acquirer-side terminal that obtains an acoustic model, wherein the host computer, upon receiving an acoustic model provided from the provider-side terminal, stores the received acoustic model, and, when the acquirer's voice is input from the acquirer-side terminal, searches the stored acoustic models for an acoustic model suited to the acquirer's voice and sends the found acoustic model to the acquirer-side terminal.

[0019] The invention according to claim 15 is the acoustic model distribution method according to claim 14, wherein the host computer charges the acquirer when the acquirer-side terminal purchases an acoustic model from the host computer.

[0020] The invention according to claim 16 is the acoustic model distribution method according to claim 14 or claim 15, wherein the host computer pays a fee to the provider when an acoustic model is provided from the provider-side terminal.

[0021]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. The present invention is intended to let one user (the acquirer of an acoustic model) obtain and use an acoustic model that has been trained (speaker-adapted) by another user (the provider of the acoustic model) whose voice quality and speaking style resemble the acquirer's own. That is, using devices (terminals) with a speech recognition function, such as mobile phones (for example, i-mode (registered trademark) phones) and personal computers, the acoustic model that the user of the provider-side terminal (the provider) has trained (speaker-adapted) with his or her own voice is to be provided to another user's device (the acquirer-side terminal).

[0022] FIG. 1 is a diagram showing a configuration example of an acoustic model distribution system according to the present invention. Referring to FIG. 1, the acoustic model distribution system has a host computer 1, a provider-side terminal 2 that provides an acoustic model, and an acquirer-side terminal 3 that obtains an acoustic model.

[0023] Here, the host computer 1 is owned and managed by, for example, an acoustic model broker, and has: receiving means 11 that receives the acoustic model provided from the provider-side terminal 2; storage means 12 that stores the acoustic model received by the receiving means 11; search means 13 that, when the acquirer's voice is input from the acquirer-side terminal 3, searches the acoustic models stored in the storage means 12 for an acoustic model suited to the acquirer's voice (more specifically, it performs speech recognition on the acquirer's voice using the acoustic models stored in the storage means 12 and finds an acoustic model that yields a good recognition result (recognition score)); and sending means 14 that sends the acoustic model found by the search means 13 to the acquirer-side terminal 3.
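The four means of the host computer 1 can be sketched as one small Python class. This is an illustrative outline only, not the patent's implementation: the class and method names are hypothetical, and the injected `score` callable stands in for a real speech recognizer, which the patent does not detail. The toy scorer compares a single made-up "pitch" value purely so the example runs.

```python
class HostComputer:
    """Illustrative stand-in for host computer 1 (all names are hypothetical)."""

    def __init__(self, score):
        self._models = {}    # plays the role of storage means 12
        self._score = score  # assumed recognizer: (model, utterance) -> score

    def receive(self, model_id, model):
        """Receiving means 11: accept a provider's model and store it."""
        self._models[model_id] = model

    def search(self, utterance):
        """Search means 13: return the id of the best-scoring stored model."""
        if not self._models:
            return None
        return max(self._models,
                   key=lambda mid: self._score(self._models[mid], utterance))

    def send(self, model_id):
        """Sending means 14: hand the chosen model to the acquirer-side terminal."""
        return self._models[model_id]


def toy_score(model, utterance):
    # Toy assumption: a closer "pitch" value means a better recognition score.
    return -abs(model["pitch"] - utterance["pitch"])


host = HostComputer(toy_score)
host.receive("provider_a", {"pitch": 120})
host.receive("provider_b", {"pitch": 210})
best_id = host.search({"pitch": 200})
```

Passing the scorer in as a parameter keeps the sketch independent of any particular recognition engine.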

[0024] In the above acoustic model distribution system, the provider-side terminal 2 has a speaker adaptation function that generates an acoustic model by training (speaker adaptation) on the provider's voice input. Specifically, at the provider-side terminal 2, the provider speaks the contents of a prescribed utterance list for speaker adaptation (roughly several tens to several hundreds of utterances); speaker adaptation is performed on this input, an acoustic model is built, and this acoustic model can be sent to the host computer 1.

[0025] In the above acoustic model distribution system, the provider-side terminal 2 may further have a function of providing provider information (such as the provider's name, address, and telephone number) to the host computer 1. In this case, the host computer 1, upon receiving the provider information from the provider-side terminal 2, stores it in the storage means 12, for example in association with the acoustic model provided by that provider. As described later, this provider information is used, for example, for the host computer 1 side to pay the provision fee to the provider of the acoustic model.

[0026] In the above acoustic model distribution system, the provider-side terminal 2 may also have a function of providing the host computer 1 with ID information such as the provider's attributes stored in the provider-side terminal 2 (for example, the provider's sex and age), and/or the training score obtained during speaker adaptation, and/or a voice sample used during speaker adaptation (part of the speech uttered for speaker adaptation, for example a voice sample of the first word, or speech feature values). In this case, the host computer 1, upon receiving the ID information from the provider-side terminal 2, stores it in the storage means 12, for example in association with the acoustic model provided by that provider. As described later, this ID information can be used as additional information for searching for the acoustic model best suited to the acquirer.

[0027] In the above example, the host computer 1 receives attribute information such as the provider's sex and age from the provider-side terminal 2. However, when such attribute information is centrally managed by a predetermined management system, as with i-mode, the host computer 1 can, as shown in FIG. 2, receive the provider's attribute information from the predetermined management system 20 rather than from the provider-side terminal 2, store it in the storage means 12, and use it as additional information for searching for the acoustic model best suited to the acquirer.

[0028] The acquirer-side terminal 3 gives the host computer 1 the voice for acoustic model search (for example, the contents of a prescribed utterance list for acoustic model search, roughly one to several utterances). Together with the voice for acoustic model search, the acquirer-side terminal 3 may also have a function of giving the host computer 1 the acquirer's attribute information and/or a voice sample for preliminary selection as the acquirer's ID information. In this case, as shown in FIG. 3, the host computer 1 may further be provided with preliminary search means 15 that preselects reasonably plausible acoustic model candidates by matching the acquirer's ID information given from the acquirer-side terminal 3 against the providers' ID information (that is, by matching the acquirer's attribute information against the providers' attribute information, or by matching the voice sample for preliminary selection given from the acquirer-side terminal 3 against the voice samples from the providers). The search means 13 of the host computer 1 then searches the acoustic model candidates preselected by the preliminary search means 15 for an acoustic model suited to the acquirer's voice.

[0029] That is, in the acoustic model distribution system of FIG. 3, the number of acoustic models stored in the storage means 12 of the host computer 1 may grow to the point where searching all of them for the acoustic model best suited to the acquirer's voice becomes computationally difficult. In such a case, the preliminary selection means 15 preselects reasonably plausible acoustic model candidates in advance, without the costly processing of applying each acoustic model and computing a recognition score. It does so either by comparing the acquirer's attribute information, such as age and sex, submitted by the acquirer together with the voice for acoustic model search, against the providers' attribute information, or by directly matching the voice sample submitted by the acquirer against the voice samples submitted by the providers. The candidates are, for example, the acoustic model provided by a provider whose attribute information is close to the acquirer's, or the acoustic model provided by a provider whose voice sample is close to the acquirer's. This allows the search means 13 to find an acoustic model quickly.

[0030] Specifically, the preliminary selection means 15 narrows down the candidates for acoustic models likely to suit the acquirer, from among the very large number of acoustic models registered in the storage means 12, by the simple processing of comparing information such as the acquirer's sex and age and the voice sample for preliminary selection submitted by the acquirer against the same kinds of information submitted by the providers.
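The attribute comparison described above might be sketched as follows. The patent says only that the acquirer's attributes are compared with the providers' and that close matches become candidates, so the matching rule used here (same sex, age within `max_age_gap` years) and every name in the code are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderRecord:
    """Hypothetical ID information stored alongside one acoustic model."""
    model_id: str
    sex: str
    age: int


def preselect(records, acquirer_sex, acquirer_age, max_age_gap=10):
    """Keep models whose provider has the same sex and a similar age.

    The 10-year default is an arbitrary illustrative threshold.
    """
    return [
        r.model_id
        for r in records
        if r.sex == acquirer_sex and abs(r.age - acquirer_age) <= max_age_gap
    ]


records = [
    ProviderRecord("m1", "female", 28),
    ProviderRecord("m2", "male", 31),
    ProviderRecord("m3", "female", 62),
    ProviderRecord("m4", "female", 35),
]
candidates = preselect(records, "female", 30)
```

Because this step uses only cheap comparisons, it can run over the whole store before any recognition-based scoring is attempted on the surviving candidates.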

[0031] Then, as described later, the search means 13 of the host computer 1, for example, applies the small number of acoustic models narrowed down by preliminary selection to the input voice from the acquirer, performs speech recognition, computes a score (such as a likelihood) for each acoustic model, and sorts the models in descending order of score, thereby obtaining the models best suited to the acquirer's voice down to the N-th candidate.
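The score-and-sort step can be illustrated with a few lines of Python. This is a sketch under stated assumptions: `rank_models` and `toy_score` are hypothetical names, and the toy scorer (negative distance between two numbers) merely stands in for the likelihood a real recognizer would return.

```python
def rank_models(candidates, utterance, score, n):
    """Score every candidate model on the utterance and return the top n.

    `candidates` maps model id -> model; `score` is an assumed callable
    (model, utterance) -> float where a larger value means a better match.
    """
    scored = [(model_id, score(model, utterance))
              for model_id, model in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # descending by score
    return scored[:n]


def toy_score(model, utterance):
    # Toy stand-in for a recognition likelihood.
    return -abs(model - utterance)


candidates = {"m1": 110.0, "m4": 190.0, "m7": 205.0}
top2 = rank_models(candidates, 200.0, toy_score, n=2)
```

Returning the scores along with the ids keeps the lower-ranked candidates available if the first one is rejected.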

[0032] In the acoustic model distribution system of FIG. 1, FIG. 2, or FIG. 3, vocabulary information predetermined in the host computer 1 exists. To obtain the best acoustic model, the acquirer therefore needs to give the host computer 1, from the acquirer-side terminal 3, the voice for acoustic model search uttered for that predetermined vocabulary information together with its correct-answer information. The search means 13 of the host computer 1 can then perform recognition processing on the acquirer's voice using the vocabulary information that matches the correct-answer information, and automatically search the storage means 12 for an acoustic model that achieves a certain recognition rate on the acquirer's voice. When no reading information is set in the host computer 1 for the predetermined vocabulary, the acquirer-side terminal 3 can give the reading information itself to the host computer 1 as the correct-answer information. When reading information is set in the host computer 1 for the predetermined vocabulary, the acquirer-side terminal 3 can give the host computer 1, as the correct-answer information, a number designating which reading information in the list is meant.
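The two forms of correct-answer information, and the recognition rate computed from them, can be sketched as below. The function names are hypothetical, the 1-based numbering is inferred from the FIG. 4 example in which the number "2" designates the second reading, and the patent does not say how the recognition rate is calculated, so the simple hit ratio here is an assumption.

```python
def resolve_reading(correct_answer, host_readings):
    """Turn correct-answer information into a reading string.

    An int is treated as a 1-based position in the host's reading list;
    a str is treated as the reading itself.
    """
    if isinstance(correct_answer, int):
        if not 1 <= correct_answer <= len(host_readings):
            raise ValueError("reading number out of range")
        return host_readings[correct_answer - 1]
    return correct_answer


def recognition_rate(recognized, expected):
    """Fraction of utterances whose recognized reading equals the expected one."""
    if not expected:
        return 0.0
    hits = sum(1 for r, e in zip(recognized, expected) if r == e)
    return hits / len(expected)


# Readings registered in the host, following the FIG. 4 example.
host_readings = ["とうきょう", "おおさか"]
expected_reading = resolve_reading(2, host_readings)
```

A model would then qualify if `recognition_rate` over the acquirer's search utterances reaches whatever threshold the host applies.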

[0033] FIG. 4 is a diagram for explaining the above processing in detail. Referring to FIG. 4, the host computer 1 holds information 101 on the predetermined vocabulary information. In the example of FIG. 4, reading information for the predetermined vocabulary is set in the information 101. Specifically, "東京" (Tokyo), "大阪" (Osaka), ... are registered in advance in the host computer 1 as the predetermined vocabulary information, and "とうきょう", "おおさか", ... are registered in advance as their reading information. In this case, the acquirer utters the acoustic model search voice "おおさか" for a piece of the vocabulary information predetermined in the host computer 1 (for example, "大阪"), inputs it to the acquirer-side terminal 3, and also inputs to the acquirer-side terminal 3, as the correct-answer information, the number "2", which designates the second reading information. The host computer 1 can thereby determine that the acoustic model search voice from the acquirer-side terminal 3 is for the reading "おおさか", and the search means 13 can use speech recognition technology to find, among the acoustic models for the phonemes of "おおさか", the acoustic model best suited to the acoustic model search voice "おおさか".

[0034] In the above example, the host computer 1 searches for the best acoustic model for a vocabulary predetermined for acoustic model search. Alternatively, so that the host computer 1 selects the best acoustic model for the vocabulary the acquirer actually wants to use, rather than for the predetermined vocabulary, the acquirer-side terminal 3 can give the host computer 1 the grammar information (recognition target vocabulary information) of the speech recognition system it uses, and the search means 13 of the host computer 1 can be configured to automatically find the acoustic model that shows the highest recognition rate when that grammar information is used.

【００３５】図５はこのような構成の音響モデル配信シ
ステムの処理を説明するための図である。図５を参照す
ると、入手者側の端末３は、語彙情報をホストコンピュ
ータ１に与えた後、入手者側の端末３からの語彙情報に
対する入手者の音響モデル検索用音声（音声サンプル）
とその正解情報をホストコンピュータ１に与え、ホスト
コンピュータ１の検索手段１３は、入手者側の端末３か
ら与えられた語彙情報を使用して入手者の音声に対し一
定の認識率を示す音響モデルを記憶手段１２から自動検
索するようになっている。すなわち、ホストコンピュー
タ１の検索手段１３は、入手者側の端末３から与えられ
た語彙情報（文法）による制約の下で、入手者の音声
（音声サンプル）を正しく認識することのできる音響モ
デルを検索するようになっている。FIG. 5 is a diagram for explaining the processing of the acoustic model distribution system having such a configuration. Referring to FIG. 5, after giving the vocabulary information to the host computer 1, the terminal 3 on the acquirer side gives the host computer 1 the acquirer's acoustic model search voice (voice sample) for that vocabulary information together with its correct answer information, and the search means 13 of the host computer 1 uses the vocabulary information given from the terminal 3 to automatically retrieve from the storage means 12 an acoustic model that shows a certain recognition rate for the acquirer's voice. That is, the search means 13 of the host computer 1 searches for an acoustic model that can correctly recognize the acquirer's voice (voice sample) under the constraint of the vocabulary information (grammar) given from the terminal 3 on the acquirer side.
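
The vocabulary-constrained search described for FIG. 5 can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the implementation disclosed in this publication: the names `score`, `recognize`, `recognition_rate`, and `search_best_model` are hypothetical, each stored acoustic model is stood in for by a toy table of per-word feature templates rather than real phoneme models, and the correct answer information is a 0-based index into the vocabulary list.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in: a "model" maps each vocabulary word to a template
# feature vector. A real system would hold phoneme-level acoustic models.
Model = Dict[str, Sequence[float]]
# A voice sample plus its correct answer information (0-based word index).
Sample = Tuple[Sequence[float], int]


def score(model: Model, features: Sequence[float], word: str) -> float:
    """Toy recognition score: negative squared distance to the word template."""
    return -sum((f - t) ** 2 for f, t in zip(features, model[word]))


def recognize(model: Model, features: Sequence[float], vocabulary: List[str]) -> int:
    """Return the index of the best-scoring word within the given vocabulary."""
    return max(range(len(vocabulary)), key=lambda i: score(model, features, vocabulary[i]))


def recognition_rate(model: Model, samples: List[Sample], vocabulary: List[str]) -> float:
    """Fraction of samples recognized correctly (samples must be non-empty)."""
    hits = sum(1 for features, answer in samples if recognize(model, features, vocabulary) == answer)
    return hits / len(samples)


def search_best_model(models: Dict[str, Model], samples: List[Sample], vocabulary: List[str]) -> str:
    """Pick the stored model with the highest recognition rate for the acquirer."""
    return max(models, key=lambda name: recognition_rate(models[name], samples, vocabulary))
```

Because recognition is restricted to the vocabulary supplied by the terminal, a model is ranked by how well it separates the words the acquirer actually intends to use, which is the point of supplying the grammar information.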

【００３６】なお、図５の例では、入手者側の端末３が
携帯電話であり、入手者側の端末３は、語彙情報とし
て、電話番号リストを与える場合が示されている。すな
わち、図５の例では、入手者側の端末３は、語彙情報と
して、名前の読み情報「すずきさん」，「さとうさ
ん」，…と、それに対応した電話番号とをホストコンピ
ュータ１に与えるようになっている。そして、この場
合、入手者側の端末３は、入手者側の端末３からホスト
コンピュータ１に与えられる語彙情報に存在する語，例
えば「すずきさん」，「さとうさん」などを入手者に発
声させて、その音声データおよびその正解情報（例え
ば、その音声データが語彙情報の何番目に対応するかを
指示する番号）をホストコンピュータ１に与えることが
できる。そして、ホストコンピュータ１の検索情報１３
では、事前に定められている語彙について最適な音響モ
デルを検索するのではなく、入手者側の端末３から与え
られた語彙情報を用いて、入手者の音声に対し一定の認
識率を示す音響モデルを記憶手段１２から自動検索し、
入手者側の端末３に提供することができる。The example of FIG. 5 shows the case where the terminal 3 on the acquirer side is a mobile phone and gives a telephone number list as the vocabulary information. That is, in the example of FIG. 5, the terminal 3 on the acquirer side gives the host computer 1, as vocabulary information, the name reading information "Suzuki-san", "Sato-san", ... and the corresponding telephone numbers. In this case, the terminal 3 on the acquirer side can have the acquirer utter words present in the vocabulary information given from the terminal 3 to the host computer 1, for example "Suzuki-san" or "Sato-san", and give the host computer 1 the resulting voice data and its correct answer information (for example, a number indicating which entry of the vocabulary information the voice data corresponds to). The search means 13 of the host computer 1 then, instead of searching for the optimal acoustic model for a predetermined vocabulary, uses the vocabulary information given from the terminal 3 on the acquirer side to automatically retrieve from the storage means 12 an acoustic model that shows a certain recognition rate for the acquirer's voice, and can provide it to the terminal 3 on the acquirer side.

【００３７】このように、入手者側の端末３から語彙情
報をホストコンピュータ１に与え、また、入手者側の端
末３からの語彙情報に対する入手者の音響モデル検索用
の音声とその正解情報をホストコンピュータ１に与える
構成では、入手者の使用目的に則した、よりふさわしい
音響モデルを検索，提供することができる。As described above, in the configuration in which the terminal 3 on the acquirer side gives the host computer 1 the vocabulary information, and also gives the host computer 1 the acquirer's voice for acoustic model search for that vocabulary information together with its correct answer information, a more suitable acoustic model that matches the acquirer's purpose of use can be retrieved and provided.

【００３８】なお、上述の例では、語彙情報として、電
話番号リストをホストコンピュータ１に与えているが、
電話番号リスト中の名前やニックネームなどの直接の認
識対象となる情報のみを語彙情報としてホストコンピュ
ータ１に与え、個々の名前に付与されている電話番号そ
のものはホストコンピュータ１に与えないようなフィル
タ機能を入手者側の端末３に持たせることもでき、この
場合には、プライバシーの保護を実現することができ
る。In the above example, a telephone number list is given to the host computer 1 as the vocabulary information. However, the terminal 3 on the acquirer side can also be provided with a filter function that gives the host computer 1, as vocabulary information, only the information that is directly subject to recognition, such as the names and nicknames in the telephone number list, and does not give the host computer 1 the telephone numbers assigned to the individual names. In this case, privacy can be protected.
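
A minimal Python sketch of such a filter function follows. It is only an illustration: the `PhoneBookEntry` type, the `vocabulary_for_upload` function, and the placeholder numbers in the usage note are assumptions made for this example, not details given in this publication.

```python
from typing import List, NamedTuple


class PhoneBookEntry(NamedTuple):
    reading: str       # name or nickname that is subject to recognition
    phone_number: str  # private data that should stay on the terminal


def vocabulary_for_upload(entries: List[PhoneBookEntry]) -> List[str]:
    """Return only the readings; phone numbers are never included."""
    return [entry.reading for entry in entries]
```

For a list holding `PhoneBookEntry("Suzuki-san", "000-0000-0001")` and `PhoneBookEntry("Sato-san", "000-0000-0002")` (made-up numbers), the function returns only the two readings, so the order of the entries is preserved and the correct answer number still refers to the same position in the list.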

【００３９】また、図１，図２または図３の音響モデル
配信システムにおいて、図６に示すように（なお、図６
の例では、図３を基にしている）、ホストコンピュータ
１は、検索手段１３で音響モデルが検索されたとき、検
索された音響モデルを入手者側の端末３に与えるに先立
って、該音響モデルの性能を確認するための確認手段１
６をさらに有していても良く、この場合、該確認手段１
６は、入手者側の端末３から送られた音声を、検索され
た音響モデルを用いて実際に音声認識し、そのスコアを
確認結果として入手者側の端末３に与えることができ
る。In the acoustic model distribution system of FIG. 1, FIG. 2, or FIG. 3, as shown in FIG. 6 (the example of FIG. 6 is based on FIG. 3), the host computer 1 may further have confirmation means 16 for confirming the performance of an acoustic model retrieved by the search means 13 before that acoustic model is given to the terminal 3 on the acquirer side. In this case, the confirmation means 16 can actually perform speech recognition on the voice sent from the terminal 3 on the acquirer side using the retrieved acoustic model, and give the resulting score to the terminal 3 on the acquirer side as the confirmation result.

【００４０】すなわち、図６の音響モデル配信システム
では、ホストコンピュータ１において自動的に検索（選
択）された音響モデルを入手者側の端末３に実際にダウ
ンロードする前に、入手者側の端末３は、ホストコンピ
ュータ１において自動的に検索（選択）された音響モデ
ルの性能をホストコンピュータ１の確認手段１６によっ
てオンラインでチェックすることができる。具体的に、
ホストコンピュータ１の確認手段１６では、入手者側の
端末３から送られた音声（入手者から提出された評価試
験用の音声サンプル）を、選択された音響モデルを用い
て実際に音声認識し、そのスコア（確認結果）を入手者
側の端末３に伝達する。これにより、入手者は、ホスト
コンピュータ１において自動的に検索（選択）された音
響モデルを実際に入手しても良いものか否かを事前に判
断できる。That is, in the acoustic model distribution system of FIG. 6, before the acoustic model automatically retrieved (selected) by the host computer 1 is actually downloaded to the terminal 3 on the acquirer side, the terminal 3 can check the performance of that acoustic model online through the confirmation means 16 of the host computer 1. Specifically, the confirmation means 16 of the host computer 1 actually performs speech recognition on the voice sent from the terminal 3 on the acquirer side (a voice sample for the evaluation test submitted by the acquirer) using the selected acoustic model, and transmits the score (confirmation result) to the terminal 3 on the acquirer side. This allows the acquirer to judge in advance whether to actually obtain the acoustic model automatically retrieved (selected) by the host computer 1.

【００４１】また、図６の音響モデル配信システムにお
いて、ホストコンピュータ１の検索手段１３は、最適な
音響モデルだけでなく、所定順位までの音響モデルの候
補を検索する機能を有していても良く、この場合、前記
確認手段１６は、検索手段１３により検索された１つの
音響モデルの候補を入手者が気に入らない場合には、他
の音響モデルの候補を提示することができる。すなわ
ち、最適な音響モデルだけでなく、第２位の候補，第３
位の候補を検索する機能をホストコンピュータ１の検索
手段１３にもたせ、自動的に選択された音響モデルの性
能を入手者が気に入らない場合には、第２位の候補，第
３位の候補と、ある一定の数に達するまで次々に入手者
に提示することもできる。In the acoustic model distribution system of FIG. 6, the search means 13 of the host computer 1 may have a function of retrieving not only the optimal acoustic model but also acoustic model candidates down to a predetermined rank. In this case, when the acquirer does not like one acoustic model candidate retrieved by the search means 13, the confirmation means 16 can present another acoustic model candidate. That is, the search means 13 of the host computer 1 can be given a function of retrieving not only the optimal acoustic model but also the second-ranked and third-ranked candidates, and when the acquirer does not like the performance of the automatically selected acoustic model, the second-ranked candidate, the third-ranked candidate, and so on can be presented to the acquirer one after another until a certain number is reached.

【００４２】また、図１，図２，図３または図６の音響
モデル配信システムにおいて、図７に示すように（な
お、図７の例では、図６を基にしている）、入手者側の
端末３がホストコンピュータ１から音響モデルを購入し
たとき、入手者への課金を行なう課金手段１７をホスト
コンピュータ１にもたせることもできる。In the acoustic model distribution system of FIG. 1, FIG. 2, FIG. 3, or FIG. 6, as shown in FIG. 7 (the example of FIG. 7 is based on FIG. 6), the host computer 1 may also be provided with charging means 17 that charges the acquirer when the terminal 3 on the acquirer side purchases an acoustic model from the host computer 1.

【００４３】また、図１，図２，図３，図６または図７
の音響モデル配信システムにおいて、図８に示すように
（なお、図８の例では、図７を基にしている）、提供者
側の端末２から音響モデルがホストコンピュータ１に提
供されたとき、提供者への料金の支払いを行なう支払手
段１８をホストコンピュータ１にもたせることもでき
る。In the acoustic model distribution system of FIG. 1, FIG. 2, FIG. 3, FIG. 6, or FIG. 7, as shown in FIG. 8 (the example of FIG. 8 is based on FIG. 7), the host computer 1 may also be provided with payment means 18 that pays a fee to the provider when an acoustic model is provided from the terminal 2 on the provider side to the host computer 1.

【００４４】次に、上述した本発明の音響モデル配信シ
ステム（例えば、図８の音響モデル配信システム）にお
ける処理動作の具体例を説明する。いま、音声認識機能
を有する端末を利用しているあるユーザー（提供者側の
端末２のユーザー）が、自分の音声で話者適応用の指定
された単語あるいは文などを実際に発話し、自分の音声
にチューニングした音響モデルを構築したとする。Next, a specific example of the processing operation in the acoustic model distribution system of the present invention described above (for example, the acoustic model distribution system of FIG. 8) will be described. Suppose that a user of a terminal having a speech recognition function (the user of the terminal 2 on the provider side) has actually uttered, in his or her own voice, the words or sentences designated for speaker adaptation, and has built an acoustic model tuned to his or her own voice.

【００４５】その後、このユーザー（提供者）は、提供
者側の端末２を音響モデル配信用のホストコンピュータ
１に電話回線（例えば、公衆電話回線）あるいはインタ
ーネット等のコンピュータネットワークを介して接続
し、音響モデルを提供するためのキー操作などを行な
う。これにより、提供者側の端末２からは、音響モデル
のパラメータ情報がホストコンピュータ１に送信され
る。提供者側の端末２からは、更に、音響モデルのパラ
メータ以外に、提供者情報、あるいは、提供者の性別や
年齢を特定するための情報や、端末２内に保存しておい
た話者適応用に発話した音声サンプルの一部（数単語程
度）などの提供者のＩＤ情報を、付加情報として、同時
にホストコンピュータ１に送信することができる。This user (the provider) then connects the terminal 2 on the provider side to the host computer 1 for acoustic model distribution via a telephone line (for example, a public telephone line) or a computer network such as the Internet, and performs a key operation or the like for providing the acoustic model. The parameter information of the acoustic model is thereby transmitted from the terminal 2 on the provider side to the host computer 1. In addition to the acoustic model parameters, the terminal 2 on the provider side can simultaneously transmit to the host computer 1, as additional information, provider information, or provider ID information such as information specifying the provider's sex and age and a part (about several words) of the voice samples that were uttered for speaker adaptation and stored in the terminal 2.

【００４６】一方、自分の音声に適した音響モデルを入
手したいユーザー（入手者側の端末３のユーザー）は、
入手者側の端末３を音響モデル配信用のホストコンピュ
ータ１に電話回線（例えば、公衆電話回線）あるいはイ
ンターネット等のコンピュータネットワークを介して接
続した後、音声（数単語程度の音声）およびその正解情
報、さらには、入手者情報、あるいは、入手者の性別や
年齢などを特定するための入手者のＩＤ情報をホストコ
ンピュータ１に送信する。Meanwhile, a user who wants to obtain an acoustic model suited to his or her own voice (the user of the terminal 3 on the acquirer side) connects the terminal 3 on the acquirer side to the host computer 1 for acoustic model distribution via a telephone line (for example, a public telephone line) or a computer network such as the Internet, and then transmits to the host computer 1 a voice (about several words) and its correct answer information, as well as acquirer information or acquirer ID information for specifying the acquirer's sex, age, and the like.

【００４７】音響モデル配信用のホストコンピュータ１
上では、入手者の性別や年齢に近い提供者から提供され
た音響モデルを予備的にいくつか選択し、その中から、
入手者により提出された音声サンプルとその正解情報を
用いて、入手者の音声に対して最も高い認識率（あるい
は平均認識スコア）を示すものの上位Ｎ個を検索する。On the host computer 1 for acoustic model distribution, several acoustic models provided by providers whose sex and age are close to the acquirer's are preliminarily selected, and from among them, the top N models showing the highest recognition rate (or average recognition score) for the acquirer's voice are retrieved using the voice sample submitted by the acquirer and its correct answer information.
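
The two-stage selection described above (preliminary selection by provider attributes, then ranking by recognition rate) can be sketched in Python as follows. This is a hedged illustration only: `StoredModel`, `preselect`, `top_n`, the `max_age_gap` threshold of 10 years, and the `rate_of` callable (which is assumed to return the recognition rate of a model on the acquirer's samples) are all hypothetical; the text says only that the provider's sex and age should be close to the acquirer's.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StoredModel:
    model_id: str
    provider_sex: str
    provider_age: int


def preselect(models: List[StoredModel], sex: str, age: int,
              max_age_gap: int = 10) -> List[StoredModel]:
    """Keep models whose provider matches the acquirer's sex and is close in age.

    The 10-year default is an arbitrary assumption for this sketch.
    """
    return [m for m in models
            if m.provider_sex == sex and abs(m.provider_age - age) <= max_age_gap]


def top_n(candidates: List[StoredModel],
          rate_of: Callable[[StoredModel], float], n: int) -> List[StoredModel]:
    """Rank candidates by recognition rate (highest first) and keep the top n."""
    return sorted(candidates, key=rate_of, reverse=True)[:n]
```

Preselecting on cheap attribute data keeps the number of full recognition runs small, since only the surviving candidates need to be scored against the acquirer's voice samples.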

【００４８】このようにして選び出された音響モデルに
対して、入手者は、指定された確認用の単語や文を音響
モデル配信用のホストコンピュータ１に提出することが
できる。これが提出された場合、ホストコンピュータ１
は、指定された確認用の単語や文を含むある程度の規模
の文法を利用して、確認用の音声に対する認識処理を行
ない、認識結果あるいは認識率の情報を入手者側の端末
３に伝える。For an acoustic model selected in this way, the acquirer can submit utterances of the designated confirmation words or sentences to the host computer 1 for acoustic model distribution. When these are submitted, the host computer 1 performs recognition processing on the confirmation voice using a grammar of a certain size that includes the designated confirmation words or sentences, and conveys the recognition result or recognition rate information to the terminal 3 on the acquirer side.

【００４９】入手者がその音響モデルを気に入れば、入
手者側の端末３において入手決定のためのキー操作など
を行なうことによって、その音響モデルが入手者側の端
末３に自動的にダウンロードされる。If the acquirer likes the acoustic model, performing a key operation or the like on the terminal 3 on the acquirer side to confirm the acquisition causes the acoustic model to be automatically downloaded to the terminal 3 on the acquirer side.

【００５０】また、入手者がその音響モデルを気に入ら
ない場合には、入手者側の端末３においてその旨を伝え
るキー操作などを行なうことによって、音響モデル配信
用のホストコンピュータ１上では、第Ｎ位（Ｎの値は繰
返し数に応じて２、３、...とある程度の数に至るまで
１ずつ増やす）の音響モデルに対して同様の処理を繰り
返す。If the acquirer does not like the acoustic model, a key operation or the like to that effect is performed on the terminal 3 on the acquirer side, and the host computer 1 for acoustic model distribution repeats the same processing for the N-th ranked acoustic model (where N is incremented by one with each repetition, 2, 3, ..., up to a certain number).
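
The repeated presentation of lower-ranked candidates can be sketched as a simple loop. In this illustrative Python fragment, `offer_candidates`, `accepts`, and `max_offers` are hypothetical names: `accepts` stands for the acquirer's key operation after seeing the confirmation result, and `max_offers` for the "certain number" at which the repetition stops.

```python
from typing import Callable, List, Optional


def offer_candidates(ranked_model_ids: List[str],
                     accepts: Callable[[str], bool],
                     max_offers: int) -> Optional[str]:
    """Offer models in rank order until one is accepted or the limit is hit.

    Returns the accepted model id, or None if every offer was declined.
    """
    for model_id in ranked_model_ids[:max_offers]:
        if accepts(model_id):
            return model_id
    return None
```

Returning `None` when the limit is reached leaves it to the caller to decide what happens next, which the text does not specify.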

【００５１】最終的に入手者が音響モデルの入手（購
入）を決定した時点で、ホストコンピュータ１は、入手
者に対して課金を行ない、その一部（または全部）をホ
ストコンピュータ１（仲介業者）が受け取り、残りを提
供者へ送金、あるいは通話料の割引などによって支払
う。When the acquirer finally decides to obtain (purchase) an acoustic model, the host computer 1 charges the acquirer. The host computer 1 (the intermediary) keeps part (or all) of the charge and pays the remainder to the provider by remittance, a discount on call charges, or the like.

【００５２】より具体的には、入手者が最終的に特定の
音響モデルの購入を決定し、その音響モデルが入手者側
の端末３に向けて配信された時点で、ホストコンピュー
タ１は、入手者の情報、および入手者に提供された音響
モデルの提供者の情報を対にして記憶し、後日、入手者
への課金、あるいは提供者への報酬支払いを行う。More specifically, when the acquirer finally decides to purchase a specific acoustic model and that acoustic model has been delivered to the terminal 3 on the acquirer side, the host computer 1 stores, as a pair, the acquirer's information and the information of the provider of the acoustic model supplied to the acquirer, and at a later date charges the acquirer or pays the reward to the provider.
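
The pairing of acquirer and provider information for later settlement can be sketched as follows. This Python fragment is an assumption-laden illustration: `SaleRecord`, `settle`, the integer `price` field, and the `broker_share_percent` parameter are hypothetical, and the publication does not state any price or split ratio.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SaleRecord:
    acquirer_id: str   # who obtained the model
    provider_id: str   # who supplied the model
    price: int         # amount in the smallest currency unit (assumed)


def settle(records: List[SaleRecord],
           broker_share_percent: int) -> Tuple[Dict[str, int], Dict[str, int], int]:
    """Compute charges per acquirer, payouts per provider, and the broker's share."""
    charges: Dict[str, int] = {}
    payouts: Dict[str, int] = {}
    broker_total = 0
    for record in records:
        charges[record.acquirer_id] = charges.get(record.acquirer_id, 0) + record.price
        fee = record.price * broker_share_percent // 100
        broker_total += fee
        payouts[record.provider_id] = payouts.get(record.provider_id, 0) + record.price - fee
    return charges, payouts, broker_total
```

Setting `broker_share_percent` to 100 corresponds to the case where the intermediary keeps the whole charge and nothing is paid out to the provider.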

【００５３】[0053]

【発明の効果】以上に説明したように、請求項１乃至請
求項１６記載の発明によれば、ホストコンピュータと、
音響モデルを提供する提供者側の端末と、音響モデルを
入手する入手者側の端末とを有し、前記ホストコンピュ
ータは、提供者側の端末から提供された音響モデルを受
信すると、受信した音響モデルを蓄積し、入手者側の端
末から入手者の音声が入力されるとき、入手者の音声に
適した音響モデルを、蓄積されている音響モデルの中か
ら検索し、検索した音響モデルを入手者側の端末に送出
するので、入手者側の端末においては、話者適応のため
の労力を費やすことなしに、自分の音声に適した音響モ
デルをホストコンピュータから入手することが可能にな
る。さらに、入手者側の端末が話者適応機能を有してい
ないものであっても、入手者側の端末では、入手者の音
声に適した音響モデルを獲得できるという効果である。
すなわち、端末上で話者適応処理を実現するためには、
ある程度処理能力の高いＣＰＵを搭載する必要がある
が、本発明によれば、このようなＣＰＵを搭載していな
い安価な端末（入手者側の端末）のユーザーでも自分の
声に適した音響モデルが獲得でき、高い認識率が実現さ
れるため、結果的に端末（入手者側の端末）のコストダ
ウンが可能になる。As described above, according to the inventions of claims 1 to 16, the system has a host computer, a terminal on the provider side that provides an acoustic model, and a terminal on the acquirer side that acquires an acoustic model. The host computer, on receiving an acoustic model provided from the terminal on the provider side, stores the received acoustic model, and, when the acquirer's voice is input from the terminal on the acquirer side, retrieves an acoustic model suited to the acquirer's voice from among the stored acoustic models and sends the retrieved acoustic model to the terminal on the acquirer side. The terminal on the acquirer side can therefore obtain an acoustic model suited to the acquirer's own voice from the host computer without expending effort on speaker adaptation. A further effect is that the terminal on the acquirer side can acquire an acoustic model suited to the acquirer's voice even if it has no speaker adaptation function. That is, realizing speaker adaptation processing on a terminal requires a CPU with reasonably high processing power, but according to the present invention, even the user of an inexpensive terminal (a terminal on the acquirer side) without such a CPU can acquire an acoustic model suited to his or her own voice and achieve a high recognition rate, which in turn makes it possible to reduce the cost of the terminal (the terminal on the acquirer side).

【００５４】特に、請求項４記載の発明によれば、請求
項１記載の音響モデル配信システムにおいて、提供者側
の端末は、さらに、提供者の属性情報、および／また
は、話者適応時の学習スコア、および／または、話者適
応時に使用した音声サンプルを、提供者のＩＤ情報とし
てホストコンピュータに提供する機能を有し、前記ホス
トコンピュータは、提供者側の端末から提供者のＩＤ情
報を受信するとき、該提供者のＩＤ情報を記憶手段に蓄
積し、入手者に最適な音響モデルを検索するための付加
情報として利用するようになっているので、ホストコン
ピュータでは、入手者に最適な音響モデルを適確に検索
することが可能になる。In particular, according to the invention of claim 4, in the acoustic model distribution system of claim 1, the terminal on the provider side further has a function of providing the host computer with attribute information of the provider, and/or a learning score obtained at the time of speaker adaptation, and/or a voice sample used at the time of speaker adaptation, as provider ID information. On receiving the provider ID information from the terminal on the provider side, the host computer stores it in the storage means and uses it as additional information for retrieving the acoustic model best suited to the acquirer, so the host computer can accurately retrieve the acoustic model best suited to the acquirer.

【００５５】また、請求項５記載の発明によれば、請求
項１記載の音響モデル配信システムにおいて、提供者の
属性情報が所定の管理システムで一括管理されている場
合には、ホストコンピュータは、所定の管理システムか
ら提供者の属性情報を受信して記憶手段に蓄積し、入手
者に最適な音響モデルを検索するための付加情報として
利用するようになっているので、入手者に最適な音響モ
デルを適確に検索することが可能になる。According to the invention of claim 5, in the acoustic model distribution system of claim 1, when the attribute information of the provider is collectively managed by a predetermined management system, the host computer receives the attribute information of the provider from the predetermined management system, stores it in the storage means, and uses it as additional information for retrieving the acoustic model best suited to the acquirer, so the acoustic model best suited to the acquirer can be retrieved accurately.

【００５６】また、請求項６記載の発明によれば、請求
項４または請求項５記載の音響モデル配信システムにお
いて、入手者側の端末は、音響モデル検索用の音声とと
もに、入手者の属性情報、および／または、音声サンプ
ルを、入手者のＩＤ情報としてホストコンピュータに与
える機能を有し、また、ホストコンピュータは、入手者
側の端末から入手者のＩＤ情報が与えられたときに、入
手者側の端末から与えられた入手者のＩＤ情報を提供者
のＩＤ情報と照合することによって、ある程度尤もらし
い音響モデルの候補を記憶手段から予備選択する予備検
索手段をさらに有し、ホストコンピュータの検索手段
は、予備検索手段によって予備選択された音響モデルの
候補の中から入手者の音声に適した音響モデルを検索す
るようになっているので、入手者に最適な音響モデルを
より一層適確に検索することが可能になる。According to the invention of claim 6, in the acoustic model distribution system of claim 4 or claim 5, the terminal on the acquirer side has a function of giving the host computer, together with the voice for acoustic model search, attribute information of the acquirer and/or a voice sample as acquirer ID information. The host computer further has preliminary search means that, when the acquirer ID information is given from the terminal on the acquirer side, preliminarily selects reasonably plausible acoustic model candidates from the storage means by collating the acquirer ID information with the provider ID information. The search means of the host computer retrieves an acoustic model suited to the acquirer's voice from among the candidates preliminarily selected by the preliminary search means, so the acoustic model best suited to the acquirer can be retrieved even more accurately.

【００５７】また、請求項８記載の発明によれば、請求
項１乃至請求項６のいずれか一項に記載の音響モデル配
信システムにおいて、入手者側の端末は、語彙情報をホ
ストコンピュータに与え、また、入手者側の端末からの
語彙情報に対する入手者の音響モデル検索用の音声とそ
の正解情報をホストコンピュータに与え、ホストコンピ
ュータの検索手段は、入手者側の端末から与えられた語
彙情報を使用して入手者の音声に対し一定の認識率を示
す音響モデルを前記記憶手段から自動検索するようにな
っているので、入手者の使用目的に則した、よりふさわ
しい音響モデルを検索，提供することができる。According to the invention of claim 8, in the acoustic model distribution system of any one of claims 1 to 6, the terminal on the acquirer side gives the host computer the vocabulary information, and also gives the host computer the acquirer's voice for acoustic model search for that vocabulary information together with its correct answer information. The search means of the host computer uses the vocabulary information given from the terminal on the acquirer side to automatically retrieve from the storage means an acoustic model that shows a certain recognition rate for the acquirer's voice, so a more suitable acoustic model that matches the acquirer's purpose of use can be retrieved and provided.

【００５８】また、請求項９記載の発明によれば、請求
項８記載の音響モデル配信システムにおいて、入手者側
の端末が携帯電話であり、語彙情報として電話番号リス
トを使用するときには、電話番号リスト中の名前やニッ
クネームなどの情報のみを語彙情報としてホストコンピ
ュータに与え、電話番号そのものはホストコンピュータ
に与えないようなフィルタ機能を入手者側の端末にもた
せるので、プライバシーの保護を実現することができ
る。According to the invention of claim 9, in the acoustic model distribution system of claim 8, when the terminal on the acquirer side is a mobile phone and a telephone number list is used as the vocabulary information, the terminal on the acquirer side is provided with a filter function that gives the host computer only information such as the names and nicknames in the telephone number list as vocabulary information and does not give the telephone numbers themselves to the host computer, so privacy can be protected.

【００５９】また、請求項１０，請求項１１記載の発明
によれば、請求項１乃至請求項９のいずれか一項に記載
の音響モデル配信システムにおいて、ホストコンピュー
タは、さらに、検索手段で音響モデルが検索されたと
き、検索された音響モデルを入手者側の端末に与えるに
先立って、該音響モデルの性能を確認するための確認手
段を有し、該確認手段は、入手者側の端末から送られた
音声を、検索された音響モデルを用いて実際に音声認識
し、そのスコアを確認結果として入手者側の端末に与え
るので、入手者は、ホストコンピュータ１において自動
的に検索（選択）された音響モデルを実際に入手しても
良いものか否かを事前に判断できる。According to the inventions of claims 10 and 11, in the acoustic model distribution system of any one of claims 1 to 9, the host computer further has confirmation means for confirming the performance of an acoustic model retrieved by the search means before the retrieved acoustic model is given to the terminal on the acquirer side. The confirmation means actually performs speech recognition on the voice sent from the terminal on the acquirer side using the retrieved acoustic model and gives the resulting score to the terminal on the acquirer side as the confirmation result, so the acquirer can judge in advance whether to actually obtain the acoustic model automatically retrieved (selected) by the host computer 1.

[Brief description of the drawings]

【図１】本発明に係る音響モデル配信システムの構成例
を示す図である。FIG. 1 is a diagram showing a configuration example of an acoustic model distribution system according to the present invention.

【図２】本発明に係る音響モデル配信システムの他の構
成例を示す図である。FIG. 2 is a diagram showing another configuration example of the acoustic model distribution system according to the present invention.

【図３】本発明に係る音響モデル配信システムの他の構
成例を示す図である。FIG. 3 is a diagram showing another configuration example of the acoustic model distribution system according to the present invention.

【図４】ホストコンピュータ内に事前に定められている
語彙に対する読み情報が設定されているときの、入手者
側の端末とホストコンピュータとの間での処理を説明す
るための図である。FIG. 4 is a diagram for explaining processing between the terminal on the side of the acquirer and the host computer when reading information for a predetermined vocabulary is set in the host computer.

【図５】入手者側の端末から、利用している音声認識シ
ステム用の文法情報（認識対象語彙情報）をホストコン
ピュータに与えるときの、入手者側の端末とホストコン
ピュータとの間での処理を説明するための図である。FIG. 5 is a diagram for explaining processing between the terminal on the acquirer side and the host computer when the terminal on the acquirer side gives the host computer the grammar information (recognition-target vocabulary information) for the speech recognition system it is using.

【図６】本発明に係る音響モデル配信システムの他の構
成例を示す図である。FIG. 6 is a diagram showing another configuration example of the acoustic model distribution system according to the present invention.

【図７】本発明に係る音響モデル配信システムの他の構
成例を示す図である。FIG. 7 is a diagram showing another configuration example of the acoustic model distribution system according to the present invention.

【図８】本発明に係る音響モデル配信システムの他の構
成例を示す図である。FIG. 8 is a diagram showing another configuration example of the acoustic model distribution system according to the present invention.

[Explanation of symbols]

１ホストコンピュータ２提供者側の端末３入手者側の端末１１受信手段１２記憶手段１３検索手段１４送出手段１５予備検索手段１６確認手段１７課金手段１８支払手段２０所定の管理システム 1: host computer; 2: terminal on the provider side; 3: terminal on the acquirer side; 11: receiving means; 12: storage means; 13: search means; 14: sending means; 15: preliminary search means; 16: confirmation means; 17: charging means; 18: payment means; 20: predetermined management system

Continued from the front page: (51) Int.Cl.⁷ / Identification symbol / FI / Theme code (reference): G06F 17/60 302 G06F 17/60 506 506 G10L 3/00 521S G10L 15/00 551A

Claims

[Claims]

1. An acoustic model distribution system comprising a host computer, a terminal on a provider side that provides an acoustic model, and a terminal on an acquirer side that acquires an acoustic model, wherein the host computer has: receiving means for receiving the acoustic model provided from the terminal on the provider side; storage means for storing the acoustic model received by the receiving means; search means for retrieving, when the acquirer's voice is input from the terminal on the acquirer side, an acoustic model suited to the acquirer's voice from among the acoustic models stored in the storage means; and sending means for sending the acoustic model retrieved by the search means to the terminal on the acquirer side.

2. The acoustic model distribution system according to claim 1, wherein the terminal on the provider side has a speaker adaptation function of generating an acoustic model by performing speaker adaptation on input of the provider's voice.

3. The acoustic model distribution system according to claim 1, wherein the terminal on the provider side further has a function of providing provider information to the host computer, and the host computer stores the provider information in the storage means when it receives the provider information from the terminal on the provider side.

4. The acoustic model distribution system according to claim 1, wherein the terminal on the provider side further has a function of providing the host computer with attribute information of the provider, and/or a learning score obtained at the time of speaker adaptation, and/or a voice sample used at the time of speaker adaptation, as provider ID information, and the host computer, on receiving the provider ID information from the terminal on the provider side, stores the provider ID information in the storage means and uses it as additional information for retrieving the acoustic model best suited to the acquirer.

5. The acoustic model distribution system according to claim 1, wherein, when the attribute information of the provider is collectively managed by a predetermined management system, the host computer receives the attribute information of the provider from the predetermined management system, stores it in the storage means, and uses it as additional information for retrieving the acoustic model best suited to the acquirer.

6. The acoustic model distribution system according to claim 4 or 5, wherein the terminal on the acquirer side has a function of giving the host computer, together with the voice for acoustic model search, attribute information of the acquirer and/or a voice sample as acquirer ID information; the host computer further has preliminary search means that, when the acquirer ID information is given from the terminal on the acquirer side, preliminarily selects reasonably plausible acoustic model candidates from the storage means by collating the acquirer ID information given from the terminal on the acquirer side with the provider ID information; and the search means of the host computer retrieves an acoustic model suited to the acquirer's voice from among the acoustic model candidates preliminarily selected by the preliminary search means.

7. The acoustic model distribution system according to claim 1, wherein the terminal on the acquirer side gives the host computer the acquirer's voice for acoustic model search, uttered for vocabulary information predetermined in the host computer, together with its correct answer information, and the search means of the host computer uses the predetermined vocabulary information to automatically retrieve from the storage means an acoustic model that shows a certain recognition rate for the acquirer's voice.

8. The acoustic model distribution system according to any one of claims 1 to 6, wherein the terminal on the acquirer side gives the host computer vocabulary information, and also gives the host computer the acquirer's voice for acoustic model search for that vocabulary information together with its correct answer information, and the search means of the host computer uses the vocabulary information given from the terminal on the acquirer side to automatically retrieve from the storage means an acoustic model that shows a certain recognition rate for the acquirer's voice.

9. The acoustic model distribution system according to claim 8, wherein, when the terminal on the acquirer side is a mobile phone and a telephone number list is used as the vocabulary information, the terminal on the acquirer side is provided with a filter function that gives the host computer only information such as names and nicknames in the telephone number list as vocabulary information and does not give the telephone numbers themselves to the host computer.

10. The acoustic model distribution system according to any one of claims 1 to 9, wherein the host computer further has confirmation means for confirming the performance of an acoustic model retrieved by the search means before the retrieved acoustic model is given to the terminal on the acquirer side, and the confirmation means actually performs speech recognition on the voice sent from the terminal on the acquirer side using the retrieved acoustic model and gives the resulting score to the terminal on the acquirer side as a confirmation result.

11. The acoustic model distribution system according to claim 10, wherein the search means has a function of retrieving not only the optimal acoustic model but also acoustic model candidates down to a predetermined rank, and the confirmation means presents another acoustic model candidate when the acquirer does not like one acoustic model candidate retrieved by the search means.

12. The acoustic model distribution system according to any one of claims 1 to 11, wherein the host computer has charging means for charging the acquirer when the terminal on the acquirer side purchases an acoustic model from the host computer.

13. The acoustic model distribution system according to claim 1, wherein the host computer has payment means for paying a fee to the provider when an acoustic model is provided from the terminal on the provider side.

14. An acoustic model distribution method using a host computer, a terminal on a provider side that provides an acoustic model, and a terminal on an acquirer side that acquires an acoustic model, wherein the host computer, on receiving an acoustic model provided from the terminal on the provider side, stores the received acoustic model, and, when the acquirer's voice is input from the terminal on the acquirer side, retrieves an acoustic model suited to the acquirer's voice from among the stored acoustic models and sends the retrieved acoustic model to the terminal on the acquirer side.

15. The acoustic model distribution method according to claim 14, wherein the host computer charges the acquirer when the terminal on the acquirer side purchases an acoustic model from the host computer.

16. The acoustic model distribution method according to claim 14 or 15, wherein the host computer pays a fee to the provider when an acoustic model is provided from the terminal on the provider side.