JP2009204872A

JP2009204872A - Creation system of dictionary for speech recognition

Info

Publication number: JP2009204872A
Application number: JP2008046963A
Authority: JP
Inventors: Noriaki Otani; 教明大谷
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 2008-02-28
Filing date: 2008-02-28
Publication date: 2009-09-10

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition dictionary creation system capable of creating a correct speech recognition dictionary in a short time, so that speech recognition can be performed accurately and quickly using that dictionary.

SOLUTION: A speech recognition device operation dictionary, used to operate a device by recognizing a user's uttered speech, is created by attaching readings to original data for speech recognition dictionary creation collected by a speech recognition object data collection section. The collection section collects the original data separately for each data classification, and readings are attached to it using a separately created basic dictionary for speech recognition dictionary creation. The basic dictionary is created by collecting, in advance, original data for reading conversion that is expected to be used in the speech recognition device and attaching readings to it, separated into the same classifications as the original data.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a speech recognition dictionary generation system that, when creating a dictionary used for speech recognition, allows speech in a language that carries no reading kana information, such as English, to be recognized as accurately and as quickly as possible.

Conventionally, in vehicle navigation devices for example, speech recognition technology that recognizes speech uttered by the driver or another occupant and obtains input data from it has been developed and is widely used, so that a destination can be entered even while driving, or entered easily without manual operation during ordinary destination setting. Beyond navigation devices, it has also been proposed that audio devices, air conditioners, and other vehicle-mounted equipment recognize the user's speech and perform various operations, so that the driver can operate them safely.

In vehicle audio devices, audio equipment that stores a large amount of audio data on a hard disk or similar medium has come into use in recent years. To listen to a desired artist, album, or song from that data, the user performs a search and plays back according to the playlist obtained as the search result. The use of a speech recognition device has also been proposed here, so that even the driver can operate the device easily by voice.

In recent years in particular, portable audio players incorporate a large-capacity memory chip, small hard disk, or the like as a data recording medium, record on it a large amount of audio data compressed with MP3 or a similar format, and can be carried freely so that users can listen to their favorite songs anytime and anywhere. Such players are often brought into vehicles, where users want to play the data back through the high-performance audio device mounted in the vehicle. Vehicle audio devices have therefore come to include means for connecting a portable audio player, reading out its stored audio data, and reproducing and outputting it from the vehicle audio device.

When a portable audio player is connected to the vehicle audio device in this way and the audio data on the player's built-in data recording medium is input, reproduced, and output, the vehicle audio device searches the audio data recorded on that medium, and a playlist is created and played by selecting an arbitrary artist, album, or song. Here too, as above, it is desirable that the selection be made by voice so that the driver can select songs easily.

Because the various devices used in a vehicle are often operated by the driver, they should be operable with as little interference as possible with the driver's checking of safety ahead, and voice operation is desirable for that reason. In voice operation, to recognize what the user said, the speech data uttered by the user is compared with speech data registered in advance as a recognition dictionary, and the word of the best-matching speech data is output as the operation signal the user instructed by voice. One way to create the recognition dictionary is to have the user utter in advance the words to be used for speech recognition and to store that speech data as the dictionary.

This method is usable when the number of entries is relatively small, as with a personal address book or telephone directory. However, registering in advance the artist names, album names, or song titles of, for example, 10,000 songs recorded on a hard disk, so that they can be searched by speech recognition and played back, takes far too much effort and is close to impossible. In particular, when audio data from different audio recording media is used from time to time, as when a portable audio player is connected to a vehicle audio device, the recorded songs differ from medium to medium, so the method is practically unusable for searching those songs by speech recognition.

Another way to create a speech recognition dictionary is to generate reading data in advance from character strings using text-to-speech (TTS) synthesis technology, and to perform recognition by comparing it with the speech uttered by the user. Navigation devices, for example, give guidance such as left and right turns by voice, and use speech synthesis technology that converts text data into speech to do so. Using this technology, text data consisting of character strings such as the artist name, album name, or song title recorded with each song could be taken as reading data, converted into speech data, and registered to create a speech recognition dictionary.

Compared with the previous method, this one has the advantage of saving the user effort. However, when the data has no reading kana attached, the assignment of readings is left entirely to the TTS, and readings the user does not intend may be assigned. For example, when “110” is properly read “one ten” but has no reading kana attached, the TTS may assign “one hundred ten”, so the proper reading is not assigned. This matters especially for music-related names, which are often given unusual readings to increase their sales appeal, while users often remember them only by pronunciation. As a result, the appropriate song for what the user says frequently fails to be found.

A technique for generating a speech recognition dictionary that stores pronunciation data matched to the user's utterances, so that an appropriate dictionary can be created from data containing unread symbols such as “-”, “?”, “+”, and “/”, is disclosed in JP 2004-53978 A. A technique that detects the occurrence of paraphrased vocabulary and registers the uttered paraphrases so that they can be used is disclosed in JP 2007-213005 A.
JP 2004-53978 A
JP 2007-213005 A

As described above, it is desirable, particularly for vehicle equipment, to recognize the user's speech and perform various operations, and the same is desired of audio devices. However, if a speech recognition dictionary is created by TTS from song information such as artist names, album names, or song titles recorded together with the audio data on an audio recording medium such as a hard disk, the names are not read as they should be and a dictionary with different readings is created. In many cases, therefore, what the user says about a song cannot be recognized correctly.

In some cases the user of the audio equipment has separately entered reading kana for the song data in advance, and that reading data can then be used. In most cases, however, no such reading data has been entered, and the problem described above arises.

Furthermore, when a huge amount of data is handled, as when audio data is recorded on a large-capacity data storage medium such as a hard disk, creating a speech recognition dictionary by TTS takes a long time. For example, suppose a portable audio player holding a large amount of audio data is connected to a vehicle audio device so that an arbitrary song can be selected by speech recognition, and the speech recognition dictionary is created automatically as soon as the player is connected. That processing takes a long time, and voice operation of the device is unavailable until the dictionary is complete, which makes the device inconvenient and may leave the user uncomfortable or distrustful.

This is not limited to vehicle-mounted audio devices. The same applies, for example, when a navigation device updates its map data by downloading new map data or map difference data that contains new place names, and those place names are to be searched by speech recognition. If the speech recognition dictionary is created simply from the text data of the place names, the proper readings cannot be assigned, the dictionary cannot match the particular proper readings that users utter, and appropriate recognition cannot be performed.

In recent years it has also become possible, when a mobile phone is brought into a vehicle, to connect it to the navigation device, connect to the Internet through the phone, and fetch, display, and use various kinds of information. Operating the mobile phone through a speech recognition function has also been proposed, including a function that, when a call is made using the phone's telephone directory, searches the registered personal names, company names, and so on by voice, outputs the telephone number, and places the call. In that case too, if the speech recognition dictionary is created on the assumption that the directory will be searched by voice as soon as the phone is connected, the text data in the directory does not reveal the particular readings, so the resulting dictionary is not necessarily appropriate, and the user may be unable to search by speech recognition and place the call.

These problems are not limited to vehicle equipment. They arise whenever a speech recognition dictionary is created from data stored in the device being used that has no reading kana attached, and speech recognition is performed with that dictionary.

Accordingly, the main object of the present invention is to provide a speech recognition dictionary generation system that, when a speech recognition dictionary is created from data stored in the device being used that has no reading kana attached and various device operations are performed by speech recognition using that dictionary, creates a correct speech recognition dictionary in a short time and thereby enables accurate, high-speed speech recognition.

To solve the above problems, the speech recognition dictionary generation system according to the present invention generates a speech recognition device operation dictionary, which is used to operate a device by recognizing a user's uttered speech, by assigning reading kana to original data for speech recognition dictionary generation collected by a speech recognition target data collection unit. The system is characterized in that: the speech recognition target data collection unit collects the original data separately for each type of data; the device operation dictionary is generated by assigning reading kana to the original data using a separately created basic dictionary for speech recognition dictionary generation; the basic dictionary is created by collecting in advance the reading conversion source data expected to be used in the speech recognition device, assigning reading kana to it, and dividing it into the same types as those into which the original data is divided; and, when the device operation dictionary is generated, reading kana are assigned using the data of the type within the basic dictionary that corresponds to the type of the speech recognition target data.
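
The type-matched lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the type names (`song_title`, `artist_name`), the nested-dictionary layout, and the function name `build_operation_dictionary` are all assumptions introduced here.

```python
from typing import Dict, List, Tuple

# Assumed layout: data type -> written string -> list of readings.
BasicDictionary = Dict[str, Dict[str, List[str]]]


def build_operation_dictionary(
    source_by_type: Dict[str, List[str]],
    basic_dictionary: BasicDictionary,
) -> Tuple[Dict[Tuple[str, str], List[str]], List[Tuple[str, str]]]:
    """Assign readings using only the basic-dictionary section of the same type."""
    operation_dictionary: Dict[Tuple[str, str], List[str]] = {}
    unresolved: List[Tuple[str, str]] = []
    for data_type, strings in source_by_type.items():
        # Only the section matching this data type is consulted.
        section = basic_dictionary.get(data_type, {})
        for text in strings:
            readings = section.get(text)
            if readings:
                operation_dictionary[(data_type, text)] = list(readings)
            else:
                unresolved.append((data_type, text))
    return operation_dictionary, unresolved


# Sample entries taken from the strings quoted in the description.
basic = {
    "song_title": {"DANCE 2": ["ダンスダンス"]},
    "artist_name": {"Mr. Children": ["ミスターチルドレン", "ミスチル"]},
}
collected = {
    "song_title": ["DANCE 2", "Unknown Song"],  # "Unknown Song" is a made-up miss
    "artist_name": ["Mr. Children"],
}
resolved, missing = build_operation_dictionary(collected, basic)
```

Restricting each lookup to one section keeps the search space small, which is one plausible reading of why the description ties type separation to fast dictionary creation.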

In another speech recognition dictionary generation system according to the present invention, the device operated by speech recognition in the above system is an audio device; the speech recognition target data collection unit collects the data necessary for playback operation of the audio device for each type of data; and the basic dictionary for speech recognition dictionary generation is created by predicting the data that the collection unit will collect, collecting basic reading data for each type of data, and assigning readings to it.

In another speech recognition dictionary generation system according to the present invention, the speech recognition target data collection unit in the above system collects data, when another audio player is connected to the audio device, by taking in the song information of the songs recorded on the data recording medium of that audio player.

In another speech recognition dictionary generation system according to the present invention, for words in the original data for speech recognition dictionary generation to which the basic dictionary cannot assign reading kana, the above system assigns reading kana by speech synthesis means and generates the speech recognition device operation dictionary.
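
A hedged sketch of this fallback: the basic dictionary is tried first, and only on a miss is a synthesizer asked for a reading. The function names and the `placeholder_synthesizer` are hypothetical; a real system would call its own speech synthesis engine, which the source does not specify.

```python
from typing import Callable, Dict, List, Tuple


def assign_reading(
    text: str,
    section: Dict[str, List[str]],
    synthesize_reading: Callable[[str], str],
) -> Tuple[List[str], str]:
    """Return (readings, origin), preferring the basic dictionary over synthesis."""
    readings = section.get(text)
    if readings:
        return list(readings), "basic_dictionary"
    # Miss: fall back to the speech synthesis means.
    return [synthesize_reading(text)], "speech_synthesis"


def placeholder_synthesizer(text: str) -> str:
    # Stand-in only: lowercasing is NOT a real reading rule.
    return text.lower()


song_titles = {"DANCE 2": ["ダンスダンス"]}
hit = assign_reading("DANCE 2", song_titles, placeholder_synthesizer)
miss = assign_reading("Crispy!", song_titles, placeholder_synthesizer)
```

Recording the origin alongside the reading is a design choice of this sketch; it makes it easy to see which entries still rely on synthesized readings.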

In another speech recognition dictionary generation system according to the present invention, the separately created basic dictionary for speech recognition dictionary generation in the above system is subjected to binary processing.

In another speech recognition dictionary generation system according to the present invention, when processing that removes unread symbols is applied to the data collected by the speech recognition target data collection unit, the basic dictionary in the above system is created by applying the same processing when readings are assigned to the reading conversion source data.
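
The point of applying the same symbol removal on both sides is that the dictionary keys and the collected strings then compare equal. A minimal sketch, assuming a particular symbol set and replacing each symbol with a space (both are assumptions of this example, not details given in the source):

```python
from typing import Dict, List, Optional

# Assumed set of symbols that are not read aloud.
UNREAD_SYMBOLS = "-?+/!"


def strip_unread_symbols(text: str) -> str:
    """Replace unread symbols with spaces, then collapse repeated whitespace."""
    cleaned = "".join(" " if ch in UNREAD_SYMBOLS else ch for ch in text)
    return " ".join(cleaned.split())


def normalize_section(section: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Apply the same normalization to the basic dictionary's keys."""
    return {strip_unread_symbols(text): readings for text, readings in section.items()}


def lookup(text: str, normalized: Dict[str, List[str]]) -> Optional[List[str]]:
    """Normalize the collected string the same way before looking it up."""
    return normalized.get(strip_unread_symbols(text))


other = normalize_section({"AC/DC": ["エーシーディーシー"], "Crispy!": ["クリスピー"]})
found = lookup("AC-DC", other)  # a differently punctuated variant still matches
```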

Because the present invention is configured as described above, when a speech recognition device operation dictionary is created from data stored in the device being used that has no reading kana attached, and various device operations are performed by speech recognition using it, a correct speech recognition dictionary can easily be created in a short time, and accurate, high-speed speech recognition can be performed using it.

The present invention achieves the object of creating a correct speech recognition dictionary in a short time, and of enabling accurate, high-speed speech recognition with it, when a speech recognition dictionary is created from data stored in the device being used that has no reading kana attached and various operations are performed by speech recognition. It does so in a speech recognition dictionary generation system that generates a speech recognition device operation dictionary, used to operate a device by recognizing a user's uttered speech, by assigning reading kana to original data for speech recognition dictionary generation collected by a speech recognition target data collection unit. The collection unit collects the original data separately for each type of data, and the device operation dictionary is generated by assigning reading kana to the original data using a separately created basic dictionary for speech recognition dictionary generation. The basic dictionary is created by collecting in advance the reading conversion source data expected to be used in the speech recognition device, assigning reading kana to it, and dividing it into the same types as those into which the original data is divided. When the device operation dictionary is generated, reading kana are assigned using the data of the type within the basic dictionary that corresponds to the type of the speech recognition target data.

An embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a functional block diagram of an embodiment in which the present invention is applied to an audio device. As illustrated, the invention is broadly divided into two processes: first, a process, performed on a PC by the audio device manufacturer or the like, of creating the basic dictionary 6 for speech recognition dictionary generation; and second, a process in which the audio device uses this basic dictionary 6 to generate a speech recognition device operation dictionary for a portable audio player.

In the first process, creating the basic dictionary for speech recognition dictionary generation on a PC, the illustrated example uses a device operation database (DB) 1 and a basic song information database (DB) 2, and the basic reading data collection unit 3 collects the reading conversion source data. In principle, this work is performed by the audio device manufacturer or the like as a service that adds value to the audio device. The data is collected for each type of data. To select and play songs on the audio device, as illustrated, the following are collected by type: device operation basic data for operating the device, song title data, artist name data, album name data, and other data added as needed, such as genre name data.

When the basic reading data collection unit 3 collects the device operation basic data, the device operation database 1 can be used if the words needed to operate the device have been stored in it in advance. If no such data exists, the words presumed necessary for device operation are entered using a personal computer.

The remaining song title data, artist name data, and album name data are taken from data stored in advance in the basic song information database 2. Various databases can serve as the basic song information database 2; for example, CDDB (CD database), which collects and publishes CD song information, can be used. Because this data includes the CD's TOC data, data such as the song titles, artist names, album names, genre names, and release dates recorded on CDs can be collected easily.

The reading conversion source data in the basic reading data collection unit 3 of FIG. 1 is, for example, data such as that shown in FIG. 2. The collection example in FIG. 2 shows “Play”, “Vol”, “Vol.”, “Artist”, “Song”, “By”, and so on extracted from the device operation database as device operation basic data. Here, the words that will be used are surveyed in advance, further estimated, and then collected, so that the speech a user utters to the audio device when operating it by speech recognition can be recognized, and so that the program described later can create the speech recognition dictionary.

As song title reading data collected from the basic song information database, FIG. 2 shows an example in which “Anything for you”, “Black & Blue”, “Crazy 4 U”, “DANCE 2”, and so on are extracted. Likewise, “Bonnie Pink”, “Cocco”, “hide”, “Mr. Children”, and so on are extracted as artist name reading data; “Best-first-things [disc1]”, “Best-first-things [disc2]”, “Crispy!”, “ULTRA BULE”, and so on as album name reading data; and “AC/DC” and so on as other data.
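
The per-type collection step might look like the sketch below. The record fields (`title`, `artist`, `album`), the type names, and the function `collect_by_type` are assumptions of this example; the source does not describe the database schema.

```python
from typing import Dict, Iterable, List


def collect_by_type(
    operation_words: Iterable[str],
    song_records: Iterable[Dict[str, str]],
) -> Dict[str, List[str]]:
    """Group collected strings by data type, dropping duplicates and blanks."""
    collected: Dict[str, List[str]] = {
        "device_operation": [],
        "song_title": [],
        "artist_name": [],
        "album_name": [],
    }

    def add(data_type: str, text: str) -> None:
        if text and text not in collected[data_type]:
            collected[data_type].append(text)

    for word in operation_words:
        add("device_operation", word)
    for record in song_records:
        add("song_title", record.get("title", ""))
        add("artist_name", record.get("artist", ""))
        add("album_name", record.get("album", ""))
    return collected


# The title/artist/album pairings are arbitrary, chosen only to exercise the
# code; the source lists these strings separately and does not pair them.
records = [
    {"title": "Anything for you", "artist": "Bonnie Pink", "album": "Crispy!"},
    {"title": "Black & Blue", "artist": "Bonnie Pink", "album": "Crispy!"},
]
source_data = collect_by_type(["Play", "Vol", "Vol."], records)
```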

The reading conversion processing unit 4 in FIG. 1 assigns reading kana to the reading conversion source data collected by the basic reading data collection unit 3, performing reading conversion to obtain reading-converted source data. This data is also created for each type of data. When reading kana information is attached in CDDB, it can be used; when it is not, a person in the department that creates the basic dictionary for speech recognition dictionary generation enters appropriate data. Readings need not be entered for every song currently available: the invention can be carried out even if readings are assigned only to the song titles, artist names, album names, and so on of well-known artists' songs that have unusual readings.

This processing produces reading-converted source data such as that shown in FIG. 3. In the example of FIG. 3, in the device operation basic data the written string “Play” is read “プレイ” (play); similarly, “Vol” and “Vol.” are both read “ボリューム” (volume), “Artist” is read “アーティスト” (artist), “Song” is read “ソング” (song), and “By” is read “バイ” (by). In the song title reading data, “Anything for you” is read “エニシングフォーユー” (anything for you), “Black & Blue” is read “ブラックアンドブルー” (black and blue), “Crazy 4 U” is read “クレイジーフォーユー” (crazy for you), and “DANCE 2” is read “ダンスダンス” (dance dance).

Thus, although “DANCE 2” would usually be read “ダンスツー” (dance two), entering in advance that it is officially read “ダンスダンス” (dance dance) allows correct reading kana to be assigned even to English-character strings, which seldom carry reading kana and which conventional audio devices handled poorly. When the user then utters “dance dance” during speech recognition, the device correctly recognizes that the song is “DANCE 2” and can play it immediately.

As artist name reading data, “Bonnie Pink” is read “ボニーピンク” (Bonnie Pink), “Cocco” is read “コッコ” (Cocco), “hide” is read “ヒデ” (hide), and “Mr. Children” is read “ミスターチルドレン” (Mister Children). Because “Mr. Children” is often called by the abbreviation or nickname “ミスチル” (Misuchiru), that reading is also entered separately. Entering such abbreviations and nicknames lets the system respond flexibly when users refer to the same artist in different ways during speech recognition.

In the album name reading data, "Best-first-things [disc1]" is read ベストファーストシングスディスクワン ("best first things disc one"), "Best-first-things [disc2]" is read ベストファーストシングスディスクツー ("best first things disc two"), "Crispy!" is read クリスピー ("crispy"), and "ULTRA BLUE" is read ウルトラブルー ("ultra blue"). As an example of other data, "AC/DC" is given the reading エーシーディーシー ("AC DC").

The binarization processing unit 5 converts the reading-converted original data created by the reading conversion processing unit 4 into binary data for each of the types described above. The data is binarized so that it can be used directly. As a result, when the audio device, or a navigation device connected to it, uses the basic dictionary for speech recognition dictionary generation created here during its various other processes, no general compilation is needed, and the audio device or connected navigation device has to perform nothing beyond reading the data. More precisely, as is well known, the binarization itself is carried out by a compilation process.
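The specification does not disclose the binary format. The following Python sketch only illustrates the idea that the dictionary is serialized once on the PC so that the device merely reads it back. The length-prefixed record layout and the names `pack_entries` and `unpack_entries` are assumptions made for this example.

```python
import struct


def pack_entries(entries):
    """Serialize (written form, reading) pairs as length-prefixed UTF-8 records.

    Assumed layout: a 4-byte little-endian record count, then for each string
    a 2-byte little-endian length followed by its UTF-8 bytes.
    """
    out = bytearray(struct.pack("<I", len(entries)))
    for written, reading in entries:
        for text in (written, reading):
            data = text.encode("utf-8")
            out += struct.pack("<H", len(data)) + data
    return bytes(out)


def unpack_entries(blob):
    """Read back the records written by pack_entries (device-side read only)."""
    (count,) = struct.unpack_from("<I", blob, 0)
    offset = 4
    entries = []
    for _ in range(count):
        pair = []
        for _ in range(2):
            (length,) = struct.unpack_from("<H", blob, offset)
            offset += 2
            pair.append(blob[offset:offset + length].decode("utf-8"))
            offset += length
        entries.append((pair[0], pair[1]))
    return entries
```

One such blob per data type (device operation, song title, artist name, album name, other) would keep the type separation that the rest of the description relies on.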

The basic dictionary 6 for speech recognition dictionary generation, binarized in this way for each type, is used by the speech recognition processing unit 21 of the vehicle audio device shown as audio device 11, or of a navigation device connected to it (hereinafter collectively referred to as the audio device). The dictionary can be delivered by loading it into a memory provided in the audio device, by downloading it to a data recording medium such as an HDD, by physically transferring a memory in which the data is recorded, or by supplying it as a circuit chip.

FIG. 1 shows an example in which the audio device creates the portable audio player speech recognition device operation dictionary 29. A portable audio player 12 is connected to the audio device 11, such as a vehicle audio device, by wire or wirelessly through the external device connection unit 13 of the portable audio player and the external device connection unit 14 of the audio device 11. Signals from the external device operation signal output unit 16 of the audio device 11 can therefore operate the portable audio player 12, for example to start playback, and, in accordance with those operation instructions, arbitrary data on the portable audio player can be taken in through the data capture unit 15 of the audio device 11.

The speech recognition processing unit 21 of the audio device 11 shown in FIG. 1 includes a speech recognition target data collection unit 23. When a portable audio player is connected to the audio device 11 as described above and communication between the two becomes possible, the speech recognition target data collection unit 23 automatically takes in song information for the audio data recorded on the data recording medium built into the portable audio player 12, such as a memory chip or hard disk. The audio data of the songs themselves does not need to be taken in as part of this song information. When the audio data is recorded on the data recording medium of the portable audio player in MP3 format, the song information can be extracted and collected from the tag portion of the data in which it is recorded.
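The specification says only that song information is taken from the tag portion of MP3 data and does not name a tag format. As one possible illustration, the Python sketch below reads the fixed 128-byte ID3v1 tag at the end of a file. The choice of ID3v1, the Latin-1 decoding, and the function names are assumptions for this example.

```python
import os


def parse_id3v1(tag):
    """Return song information from a 128-byte ID3v1 tag, or None if absent.

    ID3v1 layout: "TAG" (3 bytes), title (30), artist (30), album (30),
    year (4), comment (30), genre index (1).
    """
    if len(tag) != 128 or tag[:3] != b"TAG":
        return None

    def text(start, length):
        # Fields are padded with NUL bytes or spaces; keep the part before them.
        raw = tag[start:start + length].split(b"\x00", 1)[0]
        return raw.decode("latin-1").strip()

    return {"song": text(3, 30), "artist": text(33, 30), "album": text(63, 30)}


def read_song_info(path):
    """Read only the trailing tag of a file; the audio data itself is skipped."""
    if os.path.getsize(path) < 128:
        return None
    with open(path, "rb") as f:
        f.seek(-128, os.SEEK_END)
        return parse_id3v1(f.read(128))
```

Only the last 128 bytes are read, which mirrors the point above that the audio data of the song itself need not be taken in.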

The song information collected here is taken in separately by type, according to what the user specifies when selecting a song: for example, song title, artist name, album name, and genre name. All of these types can be taken from the song information recorded with the MP3 or similar data on the data recording medium. For items with no such data, the audio device 11 can search its own CDDB data if it holds any, and, if it has a communication function such as Internet access, it can connect directly to a CDDB data provider site and obtain the data there. The types of song information collected when the basic dictionary 6 for speech recognition dictionary generation is created on the PC, as described earlier, are made the same as the types by which the speech recognition target data collection unit 23 collects the original data for speech recognition dictionary generation, so that the two never mismatch.

The speech recognition processing unit 21 can use the data of the basic dictionary 6 for speech recognition dictionary generation created on the PC once that data has been downloaded to a data recording medium accessed by the speech recognition processing unit of the audio device 11, installed in advance in the form of a chip or the like, or inserted as a memory chip. Once the basic dictionary 6 is present in the speech recognition processing unit 21, it is preferable that the data can be updated afterwards, with the manufacturer of the audio device 11 or another party providing update data to keep pace with the many songs released each year.

The reading data generation processing unit 24 of the speech recognition processing unit 21 creates reading data for the song information collected by the speech recognition target data collection unit 23, using the basic dictionary 6 for speech recognition dictionary generation. When the collected song information is of a given type, for example artist name data, only the portion of the basic dictionary 6 of that same type, the artist name data portion, is searched to generate the reading data. Because each search covers only a small amount of data, accurate reading data can be assigned at high speed.
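The type-restricted search can be sketched as follows in Python. The nested-dictionary layout, the type keys, and the function name `lookup_reading` are assumptions for illustration, while the entries are the examples given for FIG. 3.

```python
# Hypothetical in-memory form of the basic dictionary, one section per type.
BASIC_DICTIONARY = {
    "device_operation": {
        "Play": "プレイ", "Vol": "ボリューム", "Vol.": "ボリューム",
        "Artist": "アーティスト", "Song": "ソング", "By": "バイ",
    },
    "song": {
        "Anything for you": "エニシングフォーユー",
        "Black & Blue": "ブラックアンドブルー",
        "Crazy 4 U": "クレイジーフォーユー",
        "DANCE 2": "ダンスダンス",
    },
    "artist": {
        "Bonnie Pink": "ボニーピンク", "Cocco": "コッコ",
        "hide": "ヒデ", "Mr. Children": "ミスターチルドレン",
    },
}


def lookup_reading(data_type, written, dictionary=BASIC_DICTIONARY):
    """Search only the section whose type matches the collected data.

    Returns the reading, or None when that section has no entry.
    """
    return dictionary.get(data_type, {}).get(written)
```

Because an artist name is never looked up among song titles, each search touches only one section, and a string that happens to exist under another type cannot produce a wrong reading.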

When the reading data generation processing unit 24 finds the reading data in the basic dictionary 6 for speech recognition dictionary generation, the item is passed on as "reading data present" 25 and recorded in the portable audio player speech recognition device operation dictionary 29 as song information data with reading data. When the reading data is not in the basic dictionary 6, the item is output as "no reading data" 26 to the reading data generation post-processing unit 27, which performs the follow-up processing. The reading data generation post-processing unit 27 uses the speech synthesis dictionary 28 and the speech synthesis processing of text-to-speech (TTS) technology, which is widely used for voice guidance in navigation devices and the like. If speech synthesis data corresponding to, for example, the artist name exists, that data is used directly; if not, reading data is generated for the reading presumed to be the usual one. The result generated by the reading data generation post-processing unit 27 is likewise recorded in the portable audio player speech recognition device operation dictionary 29 as song information data with reading data. In the example of FIG. 1, the dictionary is labeled "portable audio player speech recognition device operation dictionary 29" because the figure shows the speech recognition processing unit 21 of the audio device 11 operating the portable audio player 12 by speech recognition; when other kinds of devices are operated, it can simply be called the "speech recognition device operation dictionary 29".

The portable audio player speech recognition device operation dictionary 29 thus consists of the song information data with reading data generated by the reading data generation processing unit 24 using the basic dictionary 6 for speech recognition dictionary generation, together with the song information data with reading data generated by the reading data generation post-processing unit 27. For the songs stored on the portable audio player 12, a dictionary for speech recognition can therefore be generated as soon as the portable audio player is connected to the audio device 11.

Thereafter, when the user speaks into the microphone 17 in a predetermined order to play a song, uttering プレイ ("play"), アーティスト ("artist"), and ボニーピンク ("Bonnie Pink"), the speech recognition processing unit 30 searches the portable audio player speech recognition device operation dictionary 29 for each utterance in turn and recognizes it, yielding the speech recognition result 31: play songs by "Bonnie Pink". Based on the speech recognition result 31, the external device operation signal output unit 16 instructs the portable audio player 12, through the external device connection units of the two devices, to select and output the songs by "Bonnie Pink", and the audio device 11 takes in the output audio data and plays it.

The speech recognition dictionary generation system of the present invention, composed of the functional blocks described above, can be implemented by operating in sequence according to, for example, the operation flows shown in FIGS. 4 to 6. FIG. 4 shows the operation flow for creating the basic dictionary for speech recognition dictionary generation; this is carried out in the part of FIG. 1 that creates the basic dictionary 6 on the PC. First, words for creating the basic dictionary are collected (step S1). Next, original data for reading conversion is created by classifying the collected words by type: A. device operation basic data, B. song title data, C. artist name data, D. album name data, E. other (step S2). These operations are performed by the basic reading data collection unit 3 in FIG. 1, which collects device operation words from the device operation database 1 and the various song information from the basic song information database 2.

The readings of the words in the original data are then entered (step S3). This is done in the reading conversion processing unit 4 of FIG. 1 by the method described earlier. Next, the original data for reading conversion and the reading data assigned to it are binarized (step S4), and the basic dictionary for speech recognition dictionary generation, divided by word type, is created (step S5).

The speech recognition device operation dictionary can be generated from the basic dictionary obtained in this way by following the operation flow shown in FIG. 5. In this process, the portable audio player is first connected to the audio device (step S11). The audio device then acquires the song information of the portable audio player for each type of data (step S12). This is performed by the speech recognition target data collection unit 23 of the speech recognition processing unit 21 in the audio device 11 of FIG. 1.

This data collection produces the original data for speech recognition dictionary generation (step S13). The recognition words in the original data are then selected and output one at a time (step S14), and the type of each item is determined (step S15). The section of the basic dictionary corresponding to that type is selected (step S16), and it is determined whether the selected section contains reading data for the word (step S17). If it does not, reading data is generated using the speech synthesis (TTS) dictionary and its processing technology (step S18). In FIG. 1, these steps correspond to the reading data generation processing unit 24 searching the portion of the basic dictionary 6 with the same data type and, when no reading data is found there, the reading data generation post-processing unit 27 obtaining reading data using the speech synthesis (TTS) dictionary 28 and its processing technology.

If it is determined in step S17 that the selected section of the basic dictionary contains reading data, the reading data is generated from the basic dictionary (step S19). In either case, whether the reading data was generated in step S18 or in step S19, it is then determined whether reading data has been generated for all of the original data for speech recognition (step S20). If some original data still has no reading data, the process returns to step S14, selects and outputs the next recognition word, and repeats the same operations. When it is finally determined in step S20 that reading data has been generated for all of the original data, the speech recognition device operation dictionary, that is, the portable audio player speech recognition device operation dictionary 29 of FIG. 1, is complete (step S21).
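Steps S14 to S21 can be sketched as a single loop in Python. This is a minimal illustration under stated assumptions: the basic dictionary is a nested mapping keyed by type, and the TTS-based estimate is passed in as a caller-supplied function `tts_reading`, a hypothetical stand-in for the speech synthesis dictionary 28 and its processing.

```python
def generate_operation_dictionary(original_data, basic_dictionary, tts_reading):
    """Assign a reading to every (type, written form) item.

    Each result records where the reading came from: "basic" for the basic
    dictionary (step S19) or "tts" for the fallback estimate (step S18).
    """
    operation_dictionary = []
    for data_type, written in original_data:            # S14: next word; S15: its type
        section = basic_dictionary.get(data_type, {})   # S16: select matching section
        reading = section.get(written)                  # S17: is a reading present?
        if reading is None:
            reading = tts_reading(written)              # S18: estimate with TTS
            source = "tts"
        else:
            source = "basic"                            # S19: use the stored reading
        operation_dictionary.append(
            {"type": data_type, "written": written,
             "reading": reading, "source": source}
        )
    return operation_dictionary                         # S20/S21: all items processed
```

Keeping the source of each reading is not required by the specification; it is included here only to make the two branches visible in the result.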

Playback of the audio device by speech recognition, using the portable audio player speech recognition device operation dictionary obtained as in FIG. 5, can be performed according to the operation flow shown in FIG. 6. In this process, the user first utters a song playback command (step S31). The speech recognition device operation dictionary generated earlier is then searched for the uttered speech (step S32), and speech recognition is performed through this search (step S33).

Because the speech recognition device operation dictionary for operating the portable audio player by speech recognition has been generated in advance as described above, all of the song information stored on the portable audio player connected to the audio device is present in that dictionary when the search and recognition take place. Among the user's utterances concerning song playback, songs that exist on the portable audio player can therefore be recognized almost without fail.

It is then determined whether the utterance of the song playback command has ended (step S34), for example by detecting whether speech has been absent for 3 seconds. If the utterance has not ended, that is, if further speech follows, the process returns to step S32, searches the speech recognition device operation dictionary for the new utterance, and repeats the same operations. If no further utterance is made within the predetermined time in step S34, or if the speech is determined not to be a playback command for the portable audio player, the utterance of the song playback command is judged to have ended; the device is operated according to the recognized words and the specified song is played (step S35).
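The end-of-utterance check in step S34 can be sketched in Python as a pure function over time-stamped recognition results. The event format (time in seconds, recognized word) and the name `collect_command` are assumptions; the 3-second limit is the example value given in the text. The second end condition, speech that is not a playback command, is omitted from this sketch.

```python
def collect_command(events, silence_limit=3.0):
    """Collect recognized words until a pause of at least silence_limit seconds.

    events: iterable of (time_in_seconds, recognized_word), in time order.
    Returns the words that belong to the first command.
    """
    words = []
    last_time = None
    for time_sec, word in events:
        if last_time is not None and time_sec - last_time >= silence_limit:
            break  # S34: speech stopped long enough, so the command has ended
        words.append(word)  # S32/S33: word searched for and recognized
        last_time = time_sec
    return words
```

In a real device the timing would come from the audio front end rather than from a prepared list, so this shows only the decision logic.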

In FIG. 1, these steps are carried out as follows. The speech recognition processing unit 30 receives the user's speech from the microphone 17 and performs speech recognition by searching the portable audio player speech recognition device operation dictionary 29. The resulting speech recognition result 31 is output from the external device operation signal output unit 16 to the portable audio player 12, which searches for the specified song and outputs its data. The audio device 11 takes this data in through the data capture unit 15 and plays it.

Suppose the original data for the recognition dictionary is program data such as that shown in FIG. 7(a), and reading-converted data such as that shown in FIG. 3 has been obtained. When that data is binarized into the basic dictionary for speech recognition dictionary generation and used by the speech recognition processing unit 21 of the audio device 11 in FIG. 1, the dictionary can assign readings to the data shown in bold in FIG. 7(b).

In the example of FIG. 7(a), song titles are taken in one after another under the device operation instruction "Play by Song", that is, an operation triggered by entering a song title. The device operation basic data "Play", "by", and "Song" can be given reading kana from the reading data in section A of FIG. 3, and the song title data "Anything for you", "Black & Blue", "Crazy 4 U", and "DANCE 2" from the reading data in section B of FIG. 3. The result can then be used as the speech recognition device operation dictionary.

As a result of this processing, as shown in FIG. 7(b), reading data is obtained for the entire device operation basic data portion, shown in bold, and for the four song titles shown in bold among the eight song titles in the song title data portion. Only the remaining four song titles, for which no reading data was obtained, need to be processed with the speech synthesis (TTS) dictionary and the TTS processing method. For example, in a case where assigning reading data to 10,000 pieces of song data took 5 minutes when the dictionary was generated entirely by speech synthesis (TTS), the method of the present invention completed the reading assignment in ten-odd seconds when all of the reading data came from the basic dictionary for speech recognition dictionary generation (hit rate 100%), and reduced the time to a few minutes even at a hit rate of 50%. This confirmed that the speech recognition dictionary generation system according to the present invention is highly effective.
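The relation between hit rate and generation time can be approximated with a simple model in Python. The model is an assumption, not something stated in the specification: it treats total time as a linear blend of the all-dictionary time and the all-TTS time. The all-TTS figure of 300 seconds (5 minutes for 10,000 items) comes from the text; the all-dictionary figure of 15 seconds is an assumed value standing in for "ten-odd seconds".

```python
def estimated_seconds(hit_rate, dict_total=15.0, tts_total=300.0):
    """Estimate generation time for a given basic-dictionary hit rate.

    dict_total: assumed time when every reading comes from the basic dictionary.
    tts_total:  time when every reading is generated by TTS (from the text).
    Linear interpolation between the two is an assumption of this sketch.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * dict_total + (1.0 - hit_rate) * tts_total
```

Under these assumptions a 50% hit rate gives 157.5 seconds, or a little over two and a half minutes, which is in line with the "few minutes" reported above.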

FIG. 1 is a functional block diagram of an embodiment of the present invention.
FIG. 2 is a diagram showing an example of collected original data for reading conversion in the embodiment.
FIG. 3 is a diagram showing an example of reading-converted original data obtained by applying reading conversion to the collected original data for reading conversion.
FIG. 4 is an operation flow diagram of the process of creating the basic dictionary for speech recognition dictionary generation in the embodiment.
FIG. 5 is an operation flow diagram of the process of generating the speech recognition device operation dictionary in the embodiment.
FIG. 6 is an operation flow diagram of the audio device playback operation by speech recognition in the embodiment.
FIG. 7 is a diagram showing an example of original data given readings by the basic dictionary for speech recognition dictionary generation in the embodiment.

Explanation of symbols

1 Device operation database
2 Basic song information database
3 Basic reading data collection unit
4 Reading conversion processing unit
5 Binarization processing unit
6 Basic dictionary for speech recognition dictionary generation
11 Audio device
12 Portable audio player
13 External device connection unit
14 External device connection unit
15 Data capture unit
16 External device operation signal output unit
17 Microphone
21 Speech recognition processing unit
23 Speech recognition target data collection unit
24 Reading data generation processing unit
25 Reading data present
26 No reading data
27 Reading data generation post-processing unit
28 Speech synthesis (TTS) dictionary
29 Portable audio player speech recognition device operation dictionary
30 Speech recognition processing unit
31 Speech recognition result

Claims

A speech recognition dictionary generation system that generates a speech recognition device operation dictionary, used to operate a device by recognizing a user's uttered speech, by assigning reading kana to original data for speech recognition dictionary generation collected by a speech recognition target data collection unit, wherein:
the speech recognition target data collection unit collects the original data for speech recognition dictionary generation separately for each type of data;
the speech recognition device operation dictionary is generated by assigning reading kana to the original data for speech recognition dictionary generation using a separately created basic dictionary for speech recognition dictionary generation;
the basic dictionary for speech recognition dictionary generation is created by collecting in advance original data for reading conversion that is expected to be used in the speech recognition device and assigning reading kana to it, and is divided into the same types as those into which the original data for speech recognition dictionary generation is divided; and
when the speech recognition device operation dictionary is generated, reading kana are assigned using the data in the basic dictionary for speech recognition dictionary generation of the type corresponding to the type of the speech recognition target data.

The speech recognition dictionary generation system according to claim 1, wherein:
the device operated by the speech recognition is an audio device;
the speech recognition target data collection unit collects the data necessary for the playback operation of the audio device for each type of data; and
the basic dictionary for speech recognition dictionary generation is created by predicting the data that the speech recognition target data collection unit will collect, collecting basic reading data for each type of data, and assigning readings to it.

The speech recognition dictionary generation system according to claim 2, wherein, when another audio player is connected to the audio device, the speech recognition target data collection unit collects the song information of the songs recorded on the data recording medium of that audio player.

The speech recognition dictionary generation system according to claim 1, wherein, for a word in the original data for speech recognition dictionary generation to which the basic dictionary for speech recognition dictionary generation cannot assign reading kana, reading kana are assigned by a speech synthesizer to generate the speech recognition device operation dictionary.

The speech recognition dictionary generation system according to claim 1, wherein the separately created basic dictionary for speech recognition dictionary generation is subjected to binary processing.

The speech recognition dictionary generation system according to claim 1, wherein the basic dictionary for speech recognition dictionary generation is created by applying, when readings are assigned to the original data for reading conversion, the same processing as that applied to the data collected by the speech recognition target data collection unit, such as removing symbols that are not read aloud.