JP2003241789A

JP2003241789A - Device and method for speech recognition dictionary creation

Info

Publication number: JP2003241789A
Application number: JP2002044743A
Authority: JP
Inventors: Noriaki Otani; 教明大谷
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 2002-02-21
Filing date: 2002-02-21
Publication date: 2003-08-29

Abstract

PROBLEM TO BE SOLVED: To prevent a system from misrecognizing a track number that cannot exist when speech recognition is performed on CD track numbers.

SOLUTION: The device includes an attribute information input part 15, which acquires attribute information (e.g. the number of tracks) on the recorded contents of a CD from an audio unit 20 controlled through speech recognition, and a dictionary creation part 16, which creates a speech recognition dictionary 12 within a range of the number of commands corresponding to the acquired attribute information. A speech recognition dictionary 12 having as many commands as there are pieces of music actually recorded on the CD is thus created dynamically. Track numbers that cannot exist on the inserted CD, including those liable to be confused with track numbers that are actually present, are excluded in advance, which eliminates the problem of recognizing by mistake a track number that is not present on the CD.

COPYRIGHT: (C)2003,JPO

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to an apparatus and method for creating a voice recognition dictionary, and is particularly suitable for an apparatus and method for creating a voice recognition dictionary used when electronic equipment is controlled by voice recognition.

[0002]

[Description of the Related Art] Most recent vehicles are equipped with various electronic devices. These include, for example, an audio device that plays music recorded on a CD (compact disc), MD (MiniDisc), cassette tape or the like and plays received radio broadcasts; an air conditioner; and a navigation device that provides route guidance so that the driver can easily reach a desired destination.

[0003] There are also integrated systems in which electronic devices such as the audio device, air conditioner, and navigation device are connected by a bus. In such an integrated system, the devices exchange information over the bus so that a single resource can be shared among them. For example, one display device is shared by a plurality of electronic devices and switches as needed between an audio screen, an air conditioner operation screen, a navigation screen, and so on.

[0004] Recently, systems that allow electronic devices to be controlled by voice recognition have also been provided, in order to avoid one-handed driving and the like while operating the devices. With this voice recognition technology, the driver can control the various electronic devices without taking a hand off the steering wheel (that is, without manually operating a remote controller, operation panel, or other operation unit). Controlling an electronic device by voice recognition requires a voice recognition dictionary (a database of the recognition target vocabulary) against which the voice input from a microphone or the like is compared in order to recognize a specific control command.

[0005] For example, a voice recognition dictionary for recognizing a spoken CD track number stores, as shown in FIG. 4, the associations between spoken audio commands of the form "CD <track number>" (registered by their pronunciation) and the command ID corresponding to each audio command. When the user inputs voice through a microphone or the like, the voice recognition dictionary of FIG. 4 is consulted, the audio command closest to the input voice is recognized, and the corresponding command ID is output. This command ID is supplied to the audio device, which then plays the song with the designated track number or performs a similar operation.

[0006]

[Problems to Be Solved by the Invention] However, when voice recognition is performed on CD track numbers as described above, some track numbers sound similar to one another, and the system has sometimes misrecognized them. For example, when track numbers are spoken in English, "fifteen" and "fifty" sound very much alike and are easily confused, so the likelihood of misrecognition is high.

[0007] CD track numbers are defined by the standard as 1 to 99. Accordingly, a voice recognition dictionary for CDs such as the one shown in FIG. 4 provides commands corresponding to track numbers 1 to 99. In the case of a music CD, however, the number of tracks actually recorded is at most about 20. Consequently, when voice recognition is performed with a dictionary whose track numbers run up to 99, a track number that does not actually exist may be recognized by mistake (for example, the user says "fifteen" but the system recognizes "fifty", which does not exist on the disc). When a nonexistent track number is recognized by mistake, the audio device cannot accept the command, and usability suffers.

[0008] There are also voice recognition dictionaries that are narrowed in advance to an assumed maximum number of tracks on a CD (for example, 30 tracks). With such a dictionary, however, a CD that has more tracks than the assumed maximum leaves the user with no way at all to designate a track number above that maximum.

[0009] The present invention has been made to solve these problems, and its object is to prevent the system from mistakenly recognizing a track number that should not exist when voice recognition is performed on CD track numbers. A further object of the present invention, not limited to CD track numbers, is to prevent the system from mistakenly recognizing information that should not exist on the medium targeted for voice recognition.

[0010]

[Means for Solving the Problems] To solve the above problems, the present invention acquires, from an electronic device configured to be controlled by voice recognition, attribute information on the recorded information of a medium used in that electronic device (for example, the number of pieces of recorded information, the title of each piece, or the file name of each piece), and creates a voice recognition dictionary within a range of the number of commands corresponding to the acquired attribute information. With this configuration, a voice recognition dictionary having exactly as many commands as there are pieces of information actually recorded on the target medium is created dynamically, and voice recognition is performed using the dictionary thus created. Information liable to be misrecognized can therefore be excluded from the dictionary in advance, eliminating the problem of mistakenly recognizing information that should not exist. In addition, the information needed to designate every piece of recorded information on the medium can be reliably included in the dictionary, eliminating the problem of necessary information becoming unrecognizable.

[0011]

[Embodiments of the Invention] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the voice recognition dictionary creating apparatus according to the present embodiment. In the embodiment shown in FIG. 1, the voice recognition dictionary creating apparatus of the present invention is incorporated as part of a voice recognition unit.

[0012] In FIG. 1, reference numeral 10 denotes a voice recognition unit and 20 denotes an audio unit. They are connected via a bus 30 and exchange data with each other. The audio unit 20 plays music recorded on a CD, MD, cassette tape or the like, and plays received radio broadcasts. The voice recognition unit 10 controls the audio unit 20 by voice recognition (for example, an operation designating the music to be played) and generates a control command corresponding to the input voice.

[0013] Although only the audio unit 20 is shown here as the electronic device to be controlled by voice recognition, an air conditioner, a navigation device, or other electronic devices may also be used. In that case, those devices are also connected to the bus 30 and exchange data with the voice recognition unit 10.

[0014] Next, the internal configuration of the voice recognition unit 10 is described. Reference numeral 11 denotes a voice input unit, which receives through a microphone the words the user utters to express the desired control of the audio unit 20. Reference numeral 12 denotes a voice recognition dictionary, a database of the recognition target vocabulary built by collecting recognition data for each word to be recognized.

[0015] The present embodiment is described using an example in which a CD track number is recognized by voice to select the desired music. In this case, the voice recognition dictionary 12 stores the associations between spoken audio commands of the form "CD <track number>" (registered by their pronunciation) and the command ID corresponding to each audio command. Unlike the conventional dictionary, however, the stored contents are not fixed in advance; as described later, they are created dynamically according to the recorded contents of each CD.

[0016] Reference numeral 13 denotes a matching unit, which compares the user's voice input through the voice input unit 11 with the recognition data stored in the voice recognition dictionary 12 to recognize which word was uttered. It outputs the recognition result to a command output unit 14 in the form of the unique command ID assigned to each word. The command output unit 14 converts the command ID supplied from the matching unit 13 into the corresponding control command and sends it via the bus 30 to the audio unit 20 to be controlled. The control the user ordered by voice is thereby carried out.
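As an illustrative sketch only, and not part of the disclosed embodiment, the flow from the matching unit 13 to the command output unit 14 can be modeled in Python as follows. Text similarity from the standard `difflib` module stands in for acoustic matching, and the function names and the `PLAY_TRACK` control-command format are assumptions introduced for this example.

```python
import difflib
from typing import Dict, Optional


def match(utterance: str, dictionary: Dict[str, int]) -> Optional[int]:
    """Stand-in for the matching unit: return the command ID of the closest entry."""
    # cutoff=0.0 means the best-scoring entry is always returned if any entry exists.
    candidates = difflib.get_close_matches(utterance, list(dictionary), n=1, cutoff=0.0)
    return dictionary[candidates[0]] if candidates else None


def to_control_command(command_id: int) -> str:
    """Stand-in for the command output unit: convert a command ID to a control command."""
    return f"PLAY_TRACK {command_id}"  # hypothetical control-command format


dictionary = {"cd one": 1, "cd two": 2, "cd three": 3}
command_id = match("cd two", dictionary)
if command_id is not None:
    print(to_control_command(command_id))  # prints "PLAY_TRACK 2"
```

Because the closest entry is always returned, the result can only ever be a command that is present in the dictionary, which is the property the invention relies on.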

[0017] Reference numeral 15 denotes an attribute information input unit, which acquires attribute information on the recorded contents of the CD from the audio unit 20 via the bus 30. The attribute information acquired is, for example, the number of tracks, which represents the number of songs recorded on the CD. Specifically, when a CD is inserted into the audio unit 20, the attribute information input unit 15 acquires the number of tracks on that CD. The number of tracks is specific to each CD and in most cases is about 20 at most.

[0018] Reference numeral 16 denotes a dictionary creating unit, which creates the voice recognition dictionary 12 within a range of the number of commands corresponding to the attribute information (the number of tracks on the CD) acquired by the attribute information input unit 15. FIG. 2 shows an example of how the voice recognition dictionary 12 is created. As shown in FIG. 2(a), in the initial state the voice recognition dictionary 12 contains no track numbers and no corresponding command IDs. Suppose a CD containing 15 songs is inserted into the audio unit 20 in this state. The attribute information input unit 15 then acquires the track count "15" from the audio unit 20. In response, the dictionary creating unit 16 creates a voice recognition dictionary 12 consisting only of track numbers 1 to 15 and their corresponding command IDs, as shown in FIG. 2(b).
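The dictionary creation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the command text `"CD <n>"`, the choice of a command ID equal to the track number, and the function name `create_dictionary` are all hypothetical.

```python
from typing import Dict

CD_MAX_TRACKS = 99  # upper limit of CD track numbers given in the description


def create_dictionary(track_count: int) -> Dict[str, int]:
    """Return a command-to-command-ID mapping covering tracks 1..track_count only."""
    if not 0 <= track_count <= CD_MAX_TRACKS:
        raise ValueError(f"track count out of range: {track_count}")
    # Assumed convention: command text "CD <n>", command ID equal to n.
    return {f"CD {n}": n for n in range(1, track_count + 1)}


empty = create_dictionary(0)     # initial state, corresponding to FIG. 2(a)
fifteen = create_dictionary(15)  # after a 15-track CD is inserted, FIG. 2(b)
print(len(empty), len(fifteen))  # prints "0 15"
print("CD 50" in fifteen)        # prints "False"
```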

[0019] FIG. 3 is a flowchart showing the operation of the voice recognition dictionary creating apparatus according to the present embodiment configured as described above. In FIG. 3, when the power is turned on, the dictionary creating unit 16 of the voice recognition unit 10 initializes the voice recognition dictionary 12 to the state of FIG. 2(a) (step S1). The audio unit 20 then determines whether a medium such as a CD has been inserted (step S2). If a CD has been inserted, the attribute information of the inserted CD is transmitted from the audio unit 20 to the voice recognition unit 10 (step S3).

[0020] The attribute information transmitted from the audio unit 20 is acquired by the voice recognition unit 10 through the attribute information input unit 15. Based on the attribute information acquired by the attribute information input unit 15, the dictionary creating unit 16 creates a voice recognition dictionary 12 such as the one shown in FIG. 2(b), within a range of the number of commands corresponding to that attribute information (step S4). The voice recognition dictionary 12 created in this way is used for the voice recognition processing by the matching unit 13.

[0021] After the voice recognition dictionary 12 has been created in step S4, or if insertion of a CD into the audio unit 20 was not detected in step S2, the audio unit 20 determines whether the CD has been removed (step S5). If the CD has been removed, the process returns to step S1 and the voice recognition dictionary 12 is initialized to the state of FIG. 2(a). If the CD has not been removed, the process returns to step S2 and the loop of steps S2 to S5 continues.
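The flow of steps S1 to S5 can be sketched as a small event-driven simulation. This is an illustration only: the event tuples (`"insert"`, `"eject"`, `"poll"`) and the function name `simulate` are assumptions made for the example, and the dictionary uses the same hypothetical `"CD <n>"` command text with a command ID equal to the track number.

```python
from typing import Dict, List, Tuple


def simulate(events: List[Tuple]) -> List[int]:
    """Replay media events and return the dictionary size after each event."""
    dictionary: Dict[str, int] = {}           # S1: initialize to the empty state
    sizes: List[int] = []
    for event in events:
        if event[0] == "insert":              # S2: has a medium been inserted?
            track_count = event[1]            # S3: attribute information is sent
            dictionary = {f"CD {n}": n        # S4: create the dictionary
                          for n in range(1, track_count + 1)}
        elif event[0] == "eject":             # S5: has the medium been removed?
            dictionary = {}                   # back to S1: re-initialize
        # Any other event (e.g. "poll") leaves the dictionary unchanged.
        sizes.append(len(dictionary))
    return sizes


print(simulate([("insert", 15), ("poll",), ("eject",), ("insert", 20)]))
# prints [15, 15, 0, 20]
```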

[0022] As described in detail above, according to the present embodiment, when a music CD is inserted into the audio unit 20, the number of tracks on the CD is acquired from the audio unit 20 and the voice recognition dictionary 12 is created dynamically within the range of track numbers that the CD actually has. Track numbers that do not exist on the inserted CD, including those liable to be confused with track numbers that do exist, can thus be excluded from the voice recognition dictionary 12 in advance, eliminating the problem of mistakenly recognizing a track number that should not exist on the CD.

[0023] For example, suppose that a desired song is to be selected by voice recognition from a CD with track numbers 1 to 15, and that the user says "fifteen" to designate a track number. In this case, the track number "fifty", which sounds similar to "fifteen", is not in the voice recognition dictionary 12, so the matching unit 13 no longer misrecognizes a track number that does not actually exist. Troublesome steps such as repeating the voice input become unnecessary, and voice recognition becomes much easier to use.
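The "fifteen" versus "fifty" case can be illustrated with a toy recognizer. This is a sketch under loud assumptions: character-level similarity from the standard `difflib` module is only a crude stand-in for acoustic similarity, the input string `"fifty"` models what a confused front end might hear when the user says "fifteen", and the helper names `number_word`, `vocabulary`, and `recognize` are hypothetical.

```python
import difflib
from typing import Dict

ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_word(n: int) -> str:
    """English word for a track number between 1 and 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def vocabulary(track_count: int) -> Dict[str, int]:
    """Recognition vocabulary limited to track numbers 1..track_count."""
    return {number_word(n): n for n in range(1, track_count + 1)}


def recognize(heard: str, vocab: Dict[str, int]) -> int:
    """Return the track number whose word is most similar to the heard text."""
    best = difflib.get_close_matches(heard, list(vocab), n=1, cutoff=0.0)[0]
    return vocab[best]


heard = "fifty"  # assumed front-end output when the user actually said "fifteen"
print(recognize(heard, vocabulary(99)))  # prints 50, a track a 15-track CD lacks
print(recognize(heard, vocabulary(15)))  # prints 15, the only similar candidate
```

With the vocabulary limited to the 15 tracks on the disc, the confusable word is simply not a candidate, so the closest remaining entry is the intended one.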

[0024] Furthermore, according to the above embodiment, the commands needed to designate the songs recorded on the CD inserted into the audio unit 20 can be reliably included in the voice recognition dictionary 12. For example, when a CD with 15 tracks is inserted into the audio unit 20, a voice recognition dictionary 12 containing track numbers 1 to 15 is created dynamically, and when a CD with 20 tracks is inserted, a voice recognition dictionary 12 containing track numbers 1 to 20 is created dynamically. Therefore, unlike the conventional approach of preparing a fixed voice recognition dictionary with a reduced number of commands, the problem of being unable to recognize the track number of a song that exists on the inserted CD is also eliminated.

[0025] Although the above embodiment describes the case where a CD is inserted into the audio unit 20, the same applies when another medium such as an MD, cassette tape, or DVD is inserted. That is, the voice recognition unit 10 acquires the number of tracks on whichever medium is inserted into the audio unit 20 and dynamically creates the voice recognition dictionary 12 within the range of track numbers that the medium has.

[0026] Although the above embodiment describes an example in which the attribute information that the voice recognition unit 10 acquires from the audio unit 20 is the number of tracks on the medium, the attribute information is not limited to this. For example, it may be the title of each song recorded on the medium. In this case, the dictionary creating unit 16 creates the voice recognition dictionary 12 within a range of the number of commands corresponding to the number of titles acquired.

[0027] If the audio unit 20 can play music compressed with, for example, MP3 (MPEG Audio Layer 3), AAC (Advanced Audio Coding), AC-3 (Dolby Digital), ATRAC3 (Adaptive Transform Acoustic Coding), TwinVQ, or WMA (Windows Media Audio; Windows is a registered trademark), the file names of the music data may be used as the attribute information.

[0028] Although the above embodiment describes an example in which a track number of the medium is recognized by voice to select the desired music, the recognition target is not limited to track numbers. For example, a song may be selected by recognizing its title by voice. In this case, the attribute information input unit 15 acquires the title of each song recorded on the medium from the audio unit 20, and the dictionary creating unit 16 creates a voice recognition dictionary 12 that associates each acquired title with its corresponding command ID.
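A title-based dictionary of this kind might be built as in the following sketch. It is illustrative only: the normalization (trimming and lower-casing), the use of the 1-based track position as the command ID, and the handling of duplicate titles are assumptions not stated in the description, and the titles are made up.

```python
from typing import Dict, List


def create_title_dictionary(titles: List[str]) -> Dict[str, int]:
    """Map each normalized title to a command ID equal to its 1-based position."""
    dictionary: Dict[str, int] = {}
    for command_id, title in enumerate(titles, start=1):
        # setdefault keeps the first track when two titles normalize identically.
        dictionary.setdefault(title.strip().lower(), command_id)
    return dictionary


titles = ["Morning Drive", "Night Highway", "Coastline"]  # made-up titles
dictionary = create_title_dictionary(titles)
print(len(dictionary))              # prints 3
print(dictionary["night highway"])  # prints 2
```

The dictionary holds exactly as many commands as there are titles on the inserted medium, which matches the sizing rule given for the track-number case.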

[0029] Song titles normally differ from one medium to another. An attempt to create a voice recognition dictionary that recognizes songs by title across a large number of media would therefore require an enormous number of commands (song titles) in the dictionary database. Moreover, new media are released continually, so the voice recognition dictionary would have to be updated again and again to support the latest ones. For these reasons, voice recognition based on song titles has not been practical.

[0030] According to the present embodiment, by contrast, a voice recognition dictionary 12 whose commands are the titles of the songs recorded on the medium is created dynamically for whichever medium is inserted into the audio unit 20. The number of commands needed in the voice recognition dictionary 12 is therefore only the number of songs recorded on the inserted medium, and no enormous database has to be prepared. There is also no need to manage database updates, and voice recognition based on song titles is possible for any medium.

[0031] In the above embodiment, the voice recognition dictionary 12 is initially empty of track numbers and corresponding command IDs, as in FIG. 2(a), and only the track numbers needed are added when a medium is inserted. The invention is not limited to this, however. For example, the voice recognition dictionary 12 may initially hold track numbers 1 to 99 and their corresponding command IDs, as in FIG. 4, and the unnecessary track numbers may be masked when a medium is inserted (for example, numbers 16 to 99 when a CD with 15 tracks is inserted). Even with masking, however, the matching unit 13 still has to compare against the entire search space of 1 to 99. Adding only the necessary commands afterwards, as in the above embodiment, therefore gives a smaller search space and a faster recognition speed.
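The difference in search space between the masking variant and the build-on-insert approach can be illustrated by counting how many entries a matcher has to examine. This is a toy model, not the matching unit of the embodiment: exact string comparison stands in for acoustic scoring, the matcher is assumed to visit every stored entry, and the `Entry` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    command: str
    command_id: int
    masked: bool = False


def masked_dictionary(track_count: int, max_tracks: int = 99) -> List[Entry]:
    """Keep all 1..max_tracks entries and mask those above track_count."""
    return [Entry(f"CD {n}", n, masked=n > track_count)
            for n in range(1, max_tracks + 1)]


def built_dictionary(track_count: int) -> List[Entry]:
    """Store only the entries for tracks 1..track_count."""
    return [Entry(f"CD {n}", n) for n in range(1, track_count + 1)]


def search(utterance: str, entries: List[Entry]) -> Tuple[Optional[int], int]:
    """Return (matched command ID or None, number of entries examined)."""
    found, examined = None, 0
    for entry in entries:  # every stored entry is visited, masked or not
        examined += 1
        if not entry.masked and entry.command == utterance:
            found = entry.command_id
    return found, examined


print(search("CD 7", masked_dictionary(15)))  # prints (7, 99)
print(search("CD 7", built_dictionary(15)))   # prints (7, 15)
```

Both variants reject a masked or absent track such as `"CD 50"`, but only the second one shrinks the number of entries visited.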

[0032] Although the above embodiment describes an example in which the voice recognition dictionary creating apparatus of the present invention is applied to an integrated system in which the voice recognition unit 10 and the audio unit 20 (and other electronic devices, not shown, such as an air conditioner and a navigation device) are connected by the bus 30, the invention is not limited to this. For example, the voice recognition dictionary creating apparatus of the present invention may be applied to a standalone audio device having a voice recognition function. Nor is the system to which it is applied limited to in-vehicle use.

[0033] In addition, the above embodiment is merely one concrete example of carrying out the present invention, and the technical scope of the present invention must not be interpreted restrictively because of it. That is, the present invention can be carried out in various forms without departing from its spirit or its main features.

[0034]

[Effects of the Invention] As described above, according to the present invention, attribute information on the recorded information of a medium is acquired from an electronic device configured to be controlled by voice recognition, and a voice recognition dictionary is created within a range of the number of commands corresponding to the acquired attribute information. Information that should not exist is therefore no longer misrecognized, information that should exist no longer goes unrecognized, and the usability of voice recognition is improved.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of the voice recognition dictionary creating apparatus according to the present embodiment.

FIG. 2 is a diagram showing an example of creating the voice recognition dictionary.

FIG. 3 is a flowchart showing the operation of the voice recognition dictionary creating apparatus according to the present embodiment.

FIG. 4 is a diagram showing an example of a conventional voice recognition dictionary.

[Explanation of symbols]

10 Voice recognition unit
11 Voice input unit
12 Voice recognition dictionary
13 Matching unit
14 Command output unit
15 Attribute information input unit
16 Dictionary creating unit
20 Audio unit
30 Bus

Claims

[Claims]

1. A voice recognition dictionary creating apparatus for creating a voice recognition dictionary used when designating, by voice recognition, any one piece of recorded information on a medium used in an electronic device, the apparatus comprising: attribute information acquisition means for acquiring, from the electronic device, attribute information relating to the recorded information on the medium; and dictionary creation means for creating a voice recognition dictionary within a range of the number of commands corresponding to the attribute information, based on the attribute information acquired by the attribute information acquisition means.

2. The voice recognition dictionary creating apparatus according to claim 1, wherein the attribute information is the number of pieces of record information recorded in the medium.

3. The voice recognition dictionary creating apparatus according to claim 1, wherein the attribute information is a title of each piece of recorded information recorded on the medium.

4. The voice recognition dictionary creating apparatus according to claim 1, wherein the attribute information is a file name of each of the record information recorded on the medium.

5. The described voice recognition dictionary creating apparatus, wherein the dictionary creation means creates a voice recognition dictionary using each title as a command, based on the title of each piece of recorded information acquired by the attribute information acquisition means.

6. The voice recognition dictionary creating apparatus according to claim 4, wherein the dictionary creation means creates a voice recognition dictionary using each file name as a command, based on the file name of each piece of recorded information acquired by the attribute information acquisition means.

7. A voice recognition dictionary creating method for creating a voice recognition dictionary used when designating, by voice recognition, any one piece of recorded information on a medium used in an electronic device, the method comprising: acquiring attribute information; and creating a voice recognition dictionary within a range of the number of commands corresponding to the acquired attribute information.