JP2010072081A - Voice recognition dictionary creating device

Info

Publication number: JP2010072081A
Application number: JP2008236681A
Authority: JP
Inventors: Toshihiro Ito; 敏博伊藤; Akihiro Oya; 章博大矢
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2008-09-16
Filing date: 2008-09-16
Publication date: 2010-04-02

Abstract

PROBLEM TO BE SOLVED: To provide a voice recognition dictionary creating device shortening a period from the start of creation of a tree-structure voice recognition dictionary to the start of voice recognition.

SOLUTION: When an instruction shows initial registration (initial registration in S110), input of a music data group is received (S120), and data are sorted according to sort priority and divided into a plurality of sets (S130), and from the set including music data of the higher sorting rank, voice recognition dictionaries are created successively (S140). When it is determined that the instruction shows addition of music data (addition in S110), input of music data is received (S145), and a dictionary only for additional data is created (S150). When it is determined that the instruction shows deletion of music data (deletion in S110), input of a music name is received (S155), and a non-recognition-target flag is raised in correspondence with the music name of a deletion target (S160). When it is determined that the instruction shows change of music data (change in S110), processing of a combination of addition and deletion is carried out (S165-S180).

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to an apparatus for creating a dictionary for speech recognition.

A technique for rebuilding a speech recognition dictionary as the number of words to be recognized increases is already known (Patent Document 1).
This type of speech recognition dictionary has a data structure in which text data corresponding to the words (phrases) to be recognized is associated with speech data representing the features of the speech corresponding to that text data. The speech data serves as an interface between the text data and the speech signal that represents the user's utterance input through a microphone.

A speech recognition dictionary is generally organized as a tree structure. The reason is to make speech recognition more efficient by sharing the overlapping portions of the speech data, in the same way that a tree diagram is drawn.
Patent Document 1: JP 2001-249686 A

The problem with the technique described above is that a long time elapses from the start of creating a tree-structured speech recognition dictionary until speech recognition can begin. A tree-structured dictionary cannot be used for recognition until it is complete, yet a tree structure takes time to build. As a result, speech recognition can be unavailable for a long time, which causes stress to the user.

In view of the problem described above, an object of the present invention is to provide a speech recognition dictionary creation device that can shorten the time from the start of creating a tree-structured speech recognition dictionary until speech recognition can begin.

The speech recognition dictionary creation device according to claim 1, invented to solve the problem described above, includes input means, dividing means, and creating means. The input means receives input of a phrase group. The dividing means divides the phrase group into a plurality of sets. The creating means creates one tree-structured speech recognition dictionary for each set.

According to the device of claim 1, the time from the start of newly creating a speech recognition dictionary until speech recognition can begin is shortened. This is because, without waiting for a dictionary that covers every input phrase, each dictionary completed separately for one part of the divided phrase group can be used for speech recognition as soon as it is finished.

The device according to claim 2 includes updating means in addition to the configuration of claim 1. The updating means creates a single speech recognition dictionary by integrating the plurality of speech recognition dictionaries created by the creating means through the construction of a new tree structure.

According to the device of claim 2, combining the divided dictionaries into one further enhances the efficiency gain that the tree structure brings to speech recognition. This is because a single combined tree structure can merge more overlapping portions.
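The claim that one combined tree merges more overlapping portions can be checked on a toy example. The sketch below is only an illustration: it uses the letters of romanized titles as stand-ins for units of speech data, and the sample titles and the helper names `build_trie` and `count_nodes` are assumptions, not part of the patent.

```python
def build_trie(words):
    """Build a nested-dict trie; each key is one letter (a stand-in for one speech unit)."""
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node["$"] = word  # end-of-phrase marker
    return root


def count_nodes(trie):
    """Count letter nodes, ignoring the end-of-phrase markers."""
    return sum(1 + count_nodes(child) for key, child in trie.items() if key != "$")


set_a = ["sakura", "sakana"]
set_b = ["sakamichi", "sora"]

# Two separate dictionaries cannot share prefixes across sets.
separate = count_nodes(build_trie(set_a)) + count_nodes(build_trie(set_b))
# One integrated dictionary shares "s", "sak" and "saka" across all four titles.
merged = count_nodes(build_trie(set_a + set_b))

print(separate, merged)  # prints "21 17"
```

Fewer nodes means fewer units to compare during recognition, which is the efficiency argument made for integration.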

The speech recognition dictionary creation device according to claim 3 is also an invention for solving the problem described above, and includes input means and creating means. The input means receives input of a phrase. The creating means creates a speech recognition dictionary for that phrase separately from an existing tree-structured speech recognition dictionary.

According to the device of claim 3, the time from the start of adding a phrase that is not recorded in the existing speech recognition dictionary until speech recognition can begin is shortened. Because the existing dictionary is not modified, speech recognition using it remains possible at all times. Moreover, recognition of the input phrase can begin as soon as a speech recognition dictionary covering only that phrase has been built.

The device according to claim 4 includes updating means in addition to the configuration of claim 3. The updating means creates a single speech recognition dictionary by integrating the dictionary created by the creating means and the existing speech recognition dictionary through the construction of a new tree structure.

According to the device of claim 4, combining the divided dictionaries into one further enhances the efficiency gain that the tree structure brings to speech recognition. This is because a single combined tree structure can merge more overlapping portions.

The speech recognition dictionary creation device according to claim 5 is also an invention for solving the problem described above, and includes input means and mark adding means. The input means receives input of information indicating which of the phrases recorded in an existing tree-structured speech recognition dictionary is to be excluded from speech recognition. The mark adding means adds a mark in correspondence with the phrase indicated by that information.

According to the device of claim 5, the time from the start of deleting a phrase from the existing speech recognition dictionary until speech recognition can begin is shortened. This is because the tree structure of the existing dictionary is left unmodified and a mark is merely added so that the phrase is not output as a recognition result. A phrase marked in this way is never output as a speech recognition result, so from the user's point of view it is as good as deleted.

The device according to claim 6 includes updating means in addition to the configuration of claim 5. The updating means creates, through the construction of a new tree structure, a speech recognition dictionary from which the marked phrases of the existing dictionary have been removed.

According to the device of claim 6, actually erasing the marked phrases further enhances the efficiency gain that the tree structure brings to speech recognition. This is because, as long as the dictionary contains phrases that are never output as recognition results, the recognition process needs extra processing to exclude them.

The speech recognition dictionary creation device according to claim 7 is also an invention for solving the problem described above, and includes input means, mark adding means, and creating means. The input means receives input of information indicating that a given phrase recorded in an existing tree-structured speech recognition dictionary is to be changed to another phrase. The mark adding means adds a mark in correspondence with the given phrase. The creating means creates a speech recognition dictionary for the other phrase separately from the existing speech recognition dictionary.

According to the device of claim 7, the time from the start of changing a phrase in the existing speech recognition dictionary until speech recognition can begin is shortened. This is because a change is treated as a deletion plus an addition, so the device combines the configurations of claims 3 and 5. For the existing dictionary, the tree structure is left unmodified and a mark is merely added to the phrase before the change so that it is not output as a recognition result. For the phrase after the change, recognition can begin as soon as a speech recognition dictionary covering only that phrase has been built.

The device according to claim 8 includes updating means in addition to the configuration of claim 7. The updating means creates a single speech recognition dictionary by integrating, through the construction of a new tree structure, the dictionary created by the creating means and the existing speech recognition dictionary with its marked phrases removed.

According to the device of claim 8, erasing the marked phrases and combining the divided dictionaries into one further enhances the efficiency gain that the tree structure brings to speech recognition. This is because a single combined tree structure can merge more overlapping portions and because, as long as the dictionary contains phrases that are never output as recognition results, the recognition process needs extra processing to exclude them.

An embodiment is described below with reference to the drawings. FIG. 1 is a schematic configuration diagram of a navigation device 10 to which the present invention is applied and a portable digital music player 100 (hereinafter "portable player 100") serving as an external device. As shown in FIG. 1, the navigation device 10 includes a positioning device 11, an operation switch group 12, a voice input unit 13, a display unit 14, an audio output unit 15, an HDD (hard disk drive) 17, an external connection interface 18, and a control unit 20.

The positioning device 11 includes a GPS receiver 11a that receives radio waves from GPS satellites via a GPS antenna and acquires the satellites' orbit information and the current date and time, a gyroscope 11b that measures the magnitude of rotational motion applied to the vehicle, and a distance sensor 11c that measures the distance traveled by the vehicle.

The operation switch group 12 consists of a touch panel integrated with the display unit 14, mechanical key switches provided around the display unit 14, and the like. The voice input unit 13 is a microphone that picks up the user's speech.

The HDD 17 stores map data, music data, and the like, and supplies them to the control unit 20. The external connection interface 18 is used for data communication with external devices. For example, it is used to connect to the portable player 100 or a similar device that can store and play music data, for the purpose of exchanging music data. Music data received through the external connection interface 18 is stored in the HDD 17 via the control unit 20.

The display unit 14 is a color display device such as a liquid crystal monitor. Based on control signals from the control unit 20, it displays a map of the area around the vehicle's current position, the route to a destination specified by the user, and so on. The audio output unit 15 is a speaker or the like and outputs route guidance audio based on control signals from the control unit 20.

The control unit 20 includes a CPU 20a, a RAM 20b, a ROM 20c, an NVRAM 20d, and the like. The CPU 20a executes the various programs stored in the ROM 20c, by which the control unit 20 implements functions such as route guidance, speech recognition, and music playback.

As its speech recognition and music playback functions, the navigation device 10 executes the following process: when the title of a song stored in the HDD 17 is spoken into the voice input unit 13, the device recognizes the title using the speech recognition dictionary and plays the music data corresponding to the recognized title.

Before that process is described, FIG. 2 is used to explain how song titles and music data are associated with each other. FIG. 2 shows the song data group stored in the portable player 100. This song data group is input from the portable player 100 to the navigation device 10 in the dictionary construction process described later.

A song data group is a collection of song data. Song data is the data for a single song and stores the following items: the written form and the reading of the song title; the album name, hit ranking, and play count as song information; and the music data for playback (not shown). This song data structure is what associates a song title with its music data.

The song data group also stores sort priority information. The sort priority is used in S130, described later, when the song data group is divided into a plurality of sets. Specifically, the sort priority specifies the order in which the items take precedence when the data is sorted, and it is set by user input.

In the example of FIG. 2, the play count has sort priority 1, so songs with a higher play count rank higher in the sort result. Songs with the same play count are ordered by the item with sort priority 2, the hit ranking (a ranking based on, for example, the number of copies sold), from the highest rank down. The same applies to sort priority 3 and below.
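The two-level ordering described for FIG. 2 can be expressed as a sort with a composite key. This is a minimal sketch under stated assumptions: the `Song` class, its field names, and the sample values are hypothetical, and a hit ranking of 1 is assumed to be the best rank.

```python
from dataclasses import dataclass


@dataclass
class Song:
    title: str
    play_count: int   # higher sorts first (sort priority 1)
    hit_ranking: int  # assumed: 1 is the best rank (sort priority 2)


def sort_key(song):
    # Negate the play count so that larger counts come first;
    # ties fall through to the hit ranking, best rank first.
    return (-song.play_count, song.hit_ranking)


songs = [Song("A", 10, 5), Song("B", 30, 9), Song("C", 10, 2)]
ranked = sorted(songs, key=sort_key)
titles = [song.title for song in ranked]
print(titles)  # prints "['B', 'C', 'A']"
```

Further priorities (sort priority 3 and below) would simply extend the key tuple.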

Next, the dictionary construction process is described with reference to FIG. 3. The dictionary construction process shown in the flowchart of FIG. 3 is executed mainly by the control unit 20 to build a speech recognition dictionary. The dictionary built by this process is used to identify the song title that the user specifies by voice. As noted above, the title is identified so that the music data corresponding to it can be played.

This process starts when initial registration, addition, deletion, or change of song data is instructed through the voice input unit 13 (microphone). The speech recognition dictionary used to recognize the words "initial registration", "addition", "deletion", and "change", as well as the phrases needed to execute the step described later (S230), is stored separately from the dictionaries built by the dictionary construction process.

When the dictionary construction process starts, the device determines which user instruction triggered it (S110). If the instruction is initial registration ("initial registration" in S110), the device accepts input of the song data group described with FIG. 2 from the portable player 100 via the external connection interface 18 (S120).

The user specifies on the portable player 100 which song data is included in the song data group registered here. The song data group may contain any number of songs, even just one. Both points also apply to the addition described later.

Next, the song data is sorted according to the sort priority as described above, and the sort result is used to divide the data into sets of a predetermined number of songs (S130) (see FIG. 4). For example, if data for 950 songs is input and the upper limit per set is 100 songs, the data is divided into 10 sets by sort rank: 1 to 100, 101 to 200, ..., 901 to 950. The remainder is treated as a set of its own.
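The division in S130 amounts to slicing the sorted list into fixed-size chunks. The sketch below reproduces the 950-song example; the function name `split_into_sets` is hypothetical, and plain rank numbers stand in for the sorted song data.

```python
def split_into_sets(sorted_items, max_size=100):
    """Divide an already sorted list into consecutive sets of at most max_size items."""
    return [sorted_items[i:i + max_size] for i in range(0, len(sorted_items), max_size)]


ranks = list(range(1, 951))          # sort ranks 1..950
sets = split_into_sets(ranks, 100)

print(len(sets))                     # prints "10"
print(sets[0][0], sets[0][-1])       # prints "1 100"
print(sets[-1][0], sets[-1][-1])     # prints "901 950"
```

The last set holds only the 50 leftover songs, matching the statement that the remainder is treated as a set of its own.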

Next, starting from the set that contains the highest-ranked song data (the smallest rank numbers) and proceeding one set at a time, a tree-structured speech recognition dictionary is created and registered as a provisional dictionary (S140), and the process ends. That is, the reading of each song title (text data) is replaced with speech data so that it can be used for speech recognition, and the tree structure is built from that speech data (see FIG. 5).

The speech data represents, as digital values, the features of the speech waveform corresponding to the text data. The dictionary is called "provisional" because it is assumed that it will be replaced by the update process described later.

A tree structure, as shown in FIG. 5, is a hierarchical structure in which one element holds branch information leading to multiple elements. In other words, the overlapping portions of the speech data are shared in the same way that a tree diagram is drawn. The example here builds the tree in Japanese syllabary order, but other orderings may of course be used. As FIG. 5 also shows, a non-recognition-target flag can be set in the provisional dictionary. Briefly, this flag prohibits the corresponding speech data from being output as a speech recognition result (details are given with S160 and FIG. 6).
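A tree of this kind, with shared leading portions and a per-title flag, resembles a trie. The following is a minimal sketch rather than the patent's implementation: the class and method names are hypothetical, and the letters of romanized readings stand in for the speech data units that the real dictionary would hold.

```python
class Node:
    """One element of the tree; `children` holds its branch information."""

    def __init__(self):
        self.children = {}     # next unit -> child Node
        self.title = None      # set when a registered reading ends at this node
        self.excluded = False  # non-recognition-target flag


class TreeDictionary:
    def __init__(self, readings=()):
        self.root = Node()
        for reading in readings:
            self.add(reading)

    def add(self, reading):
        node = self.root
        for unit in reading:
            # Reuse an existing branch when the leading portion is shared.
            node = node.children.setdefault(unit, Node())
        node.title = reading

    def find(self, reading):
        node = self.root
        for unit in reading:
            node = node.children.get(unit)
            if node is None:
                return None
        return node if node.title is not None else None

    def set_excluded(self, reading):
        """Raise the flag for a registered reading; the tree itself is unchanged."""
        node = self.find(reading)
        if node is not None:
            node.excluded = True
        return node is not None

    def recognizable(self):
        """Return every registered reading whose flag is not set."""
        found, stack = [], [self.root]
        while stack:
            node = stack.pop()
            if node.title is not None and not node.excluded:
                found.append(node.title)
            stack.extend(node.children.values())
        return sorted(found)


dictionary = TreeDictionary(["sakura", "sakana", "sora"])
dictionary.set_excluded("sakana")
print(dictionary.recognizable())  # prints "['sakura', 'sora']"
```

Setting the flag touches a single node, which is why marking a title is far cheaper than rebuilding the tree without it.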

If, on the other hand, the instruction from the user is addition of song data ("addition" in S110), the device accepts input of a song data group from the portable player 100 via the external connection interface 18 (S145).

A provisional dictionary covering only the titles of the added song data is then created and registered (S150), and the process ends. As with initial registration, the added data may or may not be divided into sets of 100 songs based on the sort result.

In other words, "addition" here is an instruction to add new song titles that are not contained in the existing speech recognition dictionaries, on the assumption that other dictionaries, whether official or provisional, are already registered. As noted above, this process does not rebuild the existing speech recognition dictionaries. Instead, it creates and registers a tree-structured provisional dictionary separately from them.
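The key property of addition is that the existing dictionaries are left alone. The sketch below illustrates only that property and makes simplifying assumptions: a plain mapping from title to non-recognition-target flag stands in for each tree-structured dictionary, and all function names are hypothetical.

```python
def build_dictionary(titles):
    """Stand-in for tree construction: maps each title to its non-recognition-target flag."""
    return {title: False for title in titles}


def add_songs(registered, new_titles):
    """S145 to S150: register a provisional dictionary next to the existing ones."""
    return registered + [build_dictionary(new_titles)]


def vocabulary(registered):
    """Titles that can be recognized across all registered dictionaries."""
    return {title for d in registered for title, excluded in d.items() if not excluded}


existing = [build_dictionary(["sakura", "sora"])]
after = add_songs(existing, ["hotaru"])

print(after[0] is existing[0])    # prints "True": the existing dictionary is the same object
print(sorted(vocabulary(after)))  # prints "['hotaru', 'sakura', 'sora']"
```

Because the existing dictionary is never rebuilt, recognition with it stays available for the whole time the new provisional dictionary is being built.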

If the instruction from the user is deletion of song data stored in the HDD 17 ("deletion" in S110), the device accepts input of the title of the song to be deleted through the voice input unit 13 and recognizes that title (S155). Next, without actually deleting the song data corresponding to the title, the device sets the non-recognition-target flag for that title in the speech recognition dictionary (S160), and the process ends.

As shown in FIG. 5, every speech recognition dictionary, whether official or provisional, is structured so that a non-recognition-target flag can be set for the speech data of each song. Speech data of a song whose flag is set is excluded from speech recognition. The non-recognition-target flag is described with reference to FIG. 6.

FIG. 6 is a flowchart of the music playback process. This process is executed mainly by the control unit 20 and starts when a song title is input through the voice input unit 13.

First, the titles of all songs recorded in every registered speech recognition dictionary, official and provisional alike, are ranked by their degree of match with the input title (S210). Then, excluding titles that have already been read out and titles whose non-recognition-target flag is set, the highest-ranked title is read out through the audio output unit 15 (S220). "Already read out" titles arise because S220 is executed again each time the determination in S230 is No.

Speech recognition is then used to determine whether the user has indicated, through the voice input unit 13, that the title read out matches the one spoken (S230). If the user has indicated that the title read out does not match (No in S230), the process returns to S220.

If, on the other hand, the user has indicated that the title read out matches the one spoken (Yes in S230), the music data contained in the song data identified by that title is played through the audio output unit 15 (S240), and the process ends.
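The loop S210 to S230 can be sketched as "rank once, then propose best-first while skipping flagged titles". This is an illustrative sketch only. It assumes each dictionary is reduced to a mapping from title to non-recognition-target flag, it substitutes `difflib.SequenceMatcher` string similarity for the acoustic degree of match, and it returns `None` when the candidates run out, a case the text does not cover. The function names and the `confirm` callback are hypothetical.

```python
from difflib import SequenceMatcher


def rank_candidates(heard, dictionaries):
    """S210: rank every title in all registered dictionaries by similarity to `heard`."""
    entries = [(title, excluded) for d in dictionaries for title, excluded in d.items()]
    entries.sort(key=lambda e: SequenceMatcher(None, heard, e[0]).ratio(), reverse=True)
    return entries


def choose_title(heard, dictionaries, confirm):
    """S220 to S230: propose titles best-first until the user confirms one."""
    for title, excluded in rank_candidates(heard, dictionaries):
        if excluded:
            continue          # flagged titles are never read out
        if confirm(title):    # S230: Yes ends the loop, No moves to the next title
            return title
    return None               # assumption: no candidate left


official = {"sakura": False, "sakana": True}   # "sakana" was deleted by flag
provisional = {"sakamichi": False}
result = choose_title("sakana", [official, provisional], lambda title: True)
print(result != "sakana")  # prints "True"
```

Walking the ranked list in order also covers the "already read out" condition, since a rejected title is never visited twice. The returned title would then select the music data to play in S240.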

In this way, a song title whose non-recognition-target flag is set is never output as a speech recognition result. For the user, this is the same as if it had been deleted.
The description now returns to FIG. 3. If the instruction from the user is a change of the title corresponding to song data stored in the HDD 17 ("change" in S110), the device accepts input of the titles before and after the change through the voice input unit 13 and recognizes those titles (S165).

A provisional dictionary covering only the song data after the change is then created and registered (S170). Next, the non-recognition-target flag is set for the title before the change in the speech recognition dictionary (S180), and the process ends. In other words, a change is carried out as a separate addition and deletion.
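Treating a change as an addition plus a deletion can be sketched in a few lines. As a simplifying assumption, each dictionary is again modelled as a mapping from title to non-recognition-target flag instead of a tree, and the names `build_dictionary` and `change_title` are hypothetical.

```python
def build_dictionary(titles):
    """Stand-in for tree construction: maps each title to its non-recognition-target flag."""
    return {title: False for title in titles}


def change_title(registered, old_title, new_title):
    """Carry out a title change as an addition (S170) plus a flag-based deletion (S180)."""
    provisional = build_dictionary([new_title])   # S170: dictionary for the new title only
    for dictionary in registered:                 # S180: flag the old title where it is stored
        if old_title in dictionary:
            dictionary[old_title] = True
    registered.append(provisional)
    return registered


registered = [build_dictionary(["sakura", "sora"])]
change_title(registered, "sora", "aozora")
print(registered)
# prints "[{'sakura': False, 'sora': True}, {'aozora': False}]"
```

Neither step rebuilds an existing tree, so the change takes effect as soon as the one-title provisional dictionary is ready.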

Next, the update process is described with reference to FIG. 7. The update process is executed mainly by the control unit 20. Its purpose is to organize and integrate the dictionaries that the dictionary construction process has left divided into several parts or marked with non-recognition-target flags, producing a dictionary in a state that is desirable for speech recognition.

To elaborate on the "desirable state": when there are many dictionaries, the tree structure's benefit of making speech recognition efficient is diminished. In addition, when a title whose non-recognition-target flag is set ranks high among the recognition results, extra processing of the kind "ranked high, but not output as a recognition result" becomes necessary, so it is preferable that no flags be set. The update process therefore produces a single official dictionary in which no non-recognition-target flag is set.

The update process applies only to the dictionaries built by the dictionary construction process (that is, the dictionaries for recognizing song titles). Dictionaries such as the one needed to determine the user's instruction in the dictionary construction process are not subject to it.

The process is triggered when the processing load of the CPU 20a in the control unit 20 falls to or below a threshold. In other words, it runs when the control unit 20 has spare processing capacity and will not interfere with other processing. However, it is not executed when there is only one dictionary and no non-recognition-target flag is set.
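The trigger condition can be written as a small predicate. The following Python sketch is illustrative only: the function name, the parameter names, and the representation of the load as a single number are assumptions, and how the load of the CPU 20a is actually measured is not specified here.

```python
def should_run_update(cpu_load, threshold, dictionary_count, flagged_count):
    """Decide whether the update process should start.

    cpu_load and threshold use the same (assumed) unit, e.g. a utilization ratio.
    dictionary_count: number of registered formal and provisional dictionaries.
    flagged_count: number of titles with the non-recognition-target flag set.
    """
    # Nothing to consolidate: a single dictionary with no flags is already ideal.
    if dictionary_count == 1 and flagged_count == 0:
        return False
    # Run only when the processing load is at or below the threshold.
    return cpu_load <= threshold
```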

First, in all dictionaries, both formal and provisional, the music data corresponding to music titles for which the non-recognition-target flag is set is deleted (S310). Next, a new dictionary is created by building a tree structure over all the music titles contained in the music data that remains (S320). The new dictionary is then registered as the formal dictionary in place of the currently registered formal dictionary (S330). Finally, the provisional dictionaries are deleted (S340), and the process ends.
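Steps S310 to S340 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each dictionary is represented as a plain `dict` with the hypothetical keys `names`, `excluded`, and `formal`, and the tree structure is modeled as a character trie built by a hypothetical `build_tree` helper. The actual tree structure of the embodiment may differ.

```python
def build_tree(names):
    """Build a character trie; the key None marks the end of a title (assumed form)."""
    root = {}
    for name in names:
        node = root
        for ch in name:
            node = node.setdefault(ch, {})
        node[None] = name
    return root


def update(dictionaries):
    """Consolidate all dictionaries into a single formal dictionary."""
    # S310: drop every title for which the non-recognition-target flag is set.
    remaining = [
        name
        for d in dictionaries
        for name in d["names"]
        if name not in d["excluded"]
    ]
    # S320: build one new tree over all remaining titles.
    tree = build_tree(remaining)
    # S330 and S340: the new dictionary replaces the old formal dictionary,
    # and the provisional dictionaries are discarded.
    return [{"names": remaining, "excluded": set(), "formal": True, "tree": tree}]
```

Returning a fresh list rather than modifying the old dictionaries in place reflects the fact that the existing dictionaries stay usable until the new one is registered.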

The effects are described below. In the initial registration of a music data group, speech recognition becomes available within a short time by means of provisional dictionaries, without first creating a formal dictionary, which tends to involve an enormous amount of data. For example, even before dictionaries for all 1000 songs have been created, speech recognition can be performed for the first 100 songs as soon as a dictionary covering those 100 songs has been registered.

Moreover, these 100 songs are the ones ranked highest by the sort. If the sort priority is chosen so that the songs the user is most likely to want to play right away rank highest, there is a high probability that the voice command for playback will name one of these 100 songs.
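The incremental behavior described above can be illustrated with a Python generator. This is only a sketch: the `plays` field used as the sort priority is a hypothetical example, and each yielded set stands in for one completed dictionary rather than an actual tree.

```python
def build_in_sets(songs, sort_key, set_size):
    """Yield sets of songs in sort order; each set stands for one finished dictionary."""
    ordered = sorted(songs, key=sort_key)          # sort by the chosen priority
    for start in range(0, len(ordered), set_size):
        # As soon as a set is yielded, its songs can be recognized,
        # even though later sets have not been processed yet.
        yield ordered[start:start + set_size]


# Usage: 1000 songs, most-played first (hypothetical priority), 100 songs per set.
songs = [{"name": f"song{i}", "plays": i} for i in range(1000)]
sets = build_in_sets(songs, sort_key=lambda s: -s["plays"], set_size=100)
first_set = next(sets)  # available before the other 900 songs are handled
```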

As described in the background art, a dictionary that is still being created cannot be used for speech recognition. As noted earlier, speech recognition ranks the degree of match between the input speech and the voice data corresponding to each piece of music data. A dictionary therefore cannot be used for speech recognition while it is being created, that is, while its tree structure is still under construction. This characteristic gives rise to the problem in the prior art; the present embodiment solves it by dividing the data for which dictionaries are created.

The same applies to addition: the tree structure of the formal dictionary does not have to be rebuilt immediately just to add a small number of pieces of music data. Speech recognition is therefore available at any time for the formal dictionary, and for the provisional dictionary as soon as its creation is complete. Likewise, for deletion, speech recognition can resume after nothing more than setting a flag. A change is a combination of addition and deletion, so the same effect is obtained.

As described above, by making use of provisional dictionaries and non-recognition-target flags, speech recognition can be resumed within a short time, which shortens the time the user is kept waiting.
In addition, the update process consolidates the dictionaries, at an appropriate time, into a single formal dictionary in which no non-recognition-target flag is set. The unified tree structure then allows efficient speech recognition, and the extra processing caused by non-recognition-target flags is no longer needed.

Embodiments of the present invention are not limited to the example described above. For example, the invention may be applied to a mobile phone with a music playback function or to the portable player 100 instead of the navigation device 10. When applied to a mobile phone, one conceivable use is to recognize registered names such as personal names with the speech recognition dictionary and to identify a telephone number from the recognized registered name.

The music data group may be divided in any manner. For example, the number of pieces of music data in each set after division may be any natural number, whether one or many. Alternatively, the number of sets produced by the division may be specified. The music data may also be sorted in any manner when the music data group is divided; for example, it may be in input order or in random order.
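The two ways of dividing mentioned above, by set size and by number of sets, might look as follows in Python. Both function names are hypothetical, and spreading the items as evenly as possible in `split_by_count` is a design choice made for this sketch, not something the embodiment requires.

```python
def split_by_size(items, size):
    """Divide items into consecutive sets of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def split_by_count(items, count):
    """Divide items into `count` consecutive sets of nearly equal size."""
    base, extra = divmod(len(items), count)
    sets, start = [], 0
    for i in range(count):
        # The first `extra` sets receive one additional item each.
        end = start + base + (1 if i < extra else 0)
        sets.append(items[start:end])
        start = end
    # Drop empty sets, which arise when count exceeds the number of items.
    return [s for s in sets if s]
```

Either function can be applied after any ordering of the data, such as input order or a random shuffle.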

Finally, the correspondence between the claims and the embodiment is described. The input means is realized by S120, S145, S155, and S165; the dividing means by S130; the creation means by S140, S150, and S170; the mark adding means by S160 and S180; and the update means by the update process.

At the time of filing, there are a plurality of independent claims. In the embodiment, the inventions of the independent claims are all realized in a single device. Accordingly, an invention corresponding to any combination of the independent claims is within the scope of the description of the present application, and adding it by amendment does not constitute the addition of new matter.

A diagram showing the schematic configuration of the navigation device.
A table showing the data structure of the music data group.
A flowchart showing the dictionary construction process.
A table showing the structure of the divided music data.
A diagram showing the tree structure.
A flowchart showing the music playback process.
A flowchart showing the update process.

Explanation of symbols

10 ... navigation device, 11 ... positioning device, 11a ... GPS receiver, 11b ... gyroscope, 11c ... distance sensor, 12 ... operation switch group, 13 ... voice input unit, 14 ... display unit, 15 ... voice output unit, 17 ... HDD, 18 ... external connection interface, 20 ... control unit, 20a ... CPU, 20b ... RAM, 20c ... ROM, 20d ... NVRAM, 100 ... portable player

Claims

A speech recognition dictionary creation device comprising:
input means for receiving input of a group of phrases;
dividing means for dividing the phrase group received by the input means into a plurality of sets; and
creation means for creating a speech recognition dictionary having a tree structure for each set produced by the dividing means.

The speech recognition dictionary creation device according to claim 1, further comprising update means for creating a single speech recognition dictionary by integrating the plurality of speech recognition dictionaries created by the creation means through construction of a new tree structure.

A speech recognition dictionary creation device comprising:
input means for receiving input of a phrase; and
creation means for creating a speech recognition dictionary for the phrase received by the input means, separately from an existing speech recognition dictionary having a tree structure.

The speech recognition dictionary creation device according to claim 3, further comprising update means for creating a single speech recognition dictionary by integrating the speech recognition dictionary created by the creation means and the existing speech recognition dictionary through construction of a new tree structure.

A speech recognition dictionary creation device comprising:
input means for receiving input of information indicating a phrase to be excluded from speech recognition among the phrases recorded in an existing speech recognition dictionary having a tree structure; and
mark adding means for adding a mark in association with the phrase indicated by the information received by the input means.

The speech recognition dictionary creation device according to claim 5, further comprising update means for creating a speech recognition dictionary by constructing a new tree structure from the existing speech recognition dictionary with the phrases to which the mark has been added removed.

A speech recognition dictionary creation device comprising:
input means for receiving input of information indicating that a predetermined phrase recorded in an existing speech recognition dictionary having a tree structure is to be changed to another phrase;
mark adding means for adding a mark in association with the predetermined phrase indicated by the information received by the input means; and
creation means for creating a speech recognition dictionary for the other phrase received by the input means, separately from the existing speech recognition dictionary.

The speech recognition dictionary creation device according to claim 7, further comprising update means for creating a single speech recognition dictionary by integrating, through construction of a new tree structure, the speech recognition dictionary created by the creation means and the existing speech recognition dictionary with the phrases to which the mark has been added removed.