JP4302559B2

JP4302559B2 - Content calling device and content calling method

Info

Publication number: JP4302559B2
Application number: JP2004091492A
Authority: JP
Inventors: Hideaki Takeda (竹田 秀明)
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 2004-03-26
Filing date: 2004-03-26
Publication date: 2009-07-29
Anticipated expiration: 2024-03-26
Also published as: JP2005276049A

Description

The present invention relates to a content calling device and a content calling method for calling up, by voice input, content data described in a predetermined markup language from a server.

In general, electronic devices such as mobile phones, portable information terminals, and in-vehicle navigation devices connect to a server via a network to acquire desired content data. Such content has conventionally been accessed through a browser with a GUI (Graphical User Interface) shown on a display provided in the electronic device. Desired content is also often called up by following a hierarchical menu.

For example, in an in-vehicle navigation device, a user who wants to call up desired content (for example, general news) with the GUI while the map screen shown in FIG. 9(a) is displayed first displays the top menu screen shown in FIG. 9(b). The user then selects, from the items in the top menu, the item corresponding to the desired content (for example, "News"). An item menu screen showing the contents of the selected item is then displayed (FIG. 9(c)), and from it the user selects the item corresponding to the desired content (for example, "General"). When that item is selected, a screen showing its contents is displayed (for example, the general news article list shown in FIG. 9(d)). When an article (Article 1, Article 2, Article 3, etc.) is selected in the article list of FIG. 9(d), the details of that article are displayed.

However, to reach a screen showing the desired content from the screen normally displayed on the electronic device, the user must pass through the top menu and an item menu, which makes GUI operation cumbersome.

A technique is also known in which moving to the next content or entering data into a form is performed by voice input (see, for example, Patent Document 1). In this technique, the user's utterance is recognized, and the recognition result is used to move to the next content or to enter data into a form.

Patent Document 1: JP 2001-306601 A

However, even with the technique of Patent Document 1, reaching a screen showing the desired content still requires passing through the top menu and an item menu, so the user must speak more often and operation remains cumbersome.

The present invention was made to solve these problems. Its object is to eliminate the cumbersome operation of traversing a hierarchical menu step by step, so that desired content can be called up easily.

To solve the above problems, the present invention stores recognition target phrases that identify content in association with location information for that content. Speech input by the user is recognized, and the recognized phrase is compared with the recognition target phrases. When the two match, the site indicated by the location information associated with that recognition target phrase is accessed and the desired content is called up.

With the invention configured as above, the user can call up desired content directly with a single voice input, without passing through intermediate menu screens, so cumbersome GUI operations are unnecessary. Nor does the user need to speak repeatedly to reach the content. The user can therefore call up desired content easily.

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing an example configuration of a content transmission/reception system that includes the content calling device of this embodiment. As shown in FIG. 1, the content calling device 1 of this embodiment comprises a voice input I/F 2, a voice recognition unit 3, a display controller 4, a data storage unit 5, a communication unit 6, and a control unit 7.

The content transmission/reception system including the content calling device 1 of this embodiment consists of the content calling device 1; a microphone 20, connected to the voice input I/F 2 of the content calling device 1, for inputting voice; a display 40, connected to the display controller 4 of the content calling device 1, for displaying images; a network such as the Internet 9; and a server 10. The server 10 stores content data in a content data storage unit 11 in association with location information, for example a URL (Uniform Resource Locator).

The voice input I/F 2 passes the voice captured by the microphone 20 into the content calling device 1. The voice recognition unit 3 recognizes the voice input through the voice input I/F 2. The voice recognition unit 3 has a dictionary DB that stores each recognition target phrase in association with a recognition ID, and a phrase recognized by the voice recognition unit 3 is identified by its recognition ID.
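As a minimal sketch of the dictionary DB described above, the Python fragment below maps recognition target phrases to recognition IDs. Acoustic recognition itself is out of scope: the recognized phrase is assumed to arrive as romanized text, and the names DICTIONARY_DB and identify_recognition_id are hypothetical. The IDs follow the FIG. 2 example.

```python
from typing import Optional

# Hypothetical dictionary DB of the voice recognition unit 3:
# recognition target phrase (romanized, lower case) -> recognition ID (FIG. 2).
DICTIONARY_DB: dict[str, int] = {
    "zenpan": 1,
    "saishin no nyusu": 2,
    "saishin nyusu": 3,
    "saishin no": 4,
}


def identify_recognition_id(recognized_phrase: str) -> Optional[int]:
    """Return the recognition ID of a recognized phrase, or None if it is not registered.

    Assumes the recognizer has already turned audio into text; only the
    dictionary lookup is modeled here.
    """
    return DICTIONARY_DB.get(recognized_phrase.strip().lower())


print(identify_recognition_id("Zenpan"))  # 1
```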

The display controller 4 displays menu screens and screens based on called-up content on the display 40. When the content calling device 1 of this embodiment is installed in an in-vehicle navigation device, the display controller 4 also displays screens such as maps on the display 40.

The data storage unit 5 stores a recognition phrase table such as the one shown in FIG. 2. This table associates each recognition target phrase (a word or phrase to be recognized by the voice recognition unit 3) with a recognition ID and with the URL of the content identified by that phrase. The table is generated by the control unit 7 from the description in the top menu's page information when the communication unit 6 acquires that page information, and is stored in the data storage unit 5. The top menu's page information may be acquired periodically in the background, or when the top menu is displayed through GUI menu operation.

The communication unit 6 has a communication function; it connects to a network such as the Internet 9, accesses a specified URL, and sends and receives data.

When the communication unit 6 acquires the top menu's page information, the control unit 7 generates the recognition phrase table from the description in that page information and stores it in the data storage unit 5. The control unit 7 also compares the recognition ID of the phrase recognized by the voice recognition unit 3 with the recognition IDs in the recognition phrase table stored in the data storage unit 5. If a matching recognition ID exists, the control unit 7 reads the corresponding URL from the data storage unit 5 and outputs it to the communication unit 6. Based on this URL, the communication unit 6 accesses the server 10 via the Internet 9, acquires the corresponding content data from the content data storage unit 11 of the server 10, and outputs it to the control unit 7. The control unit 7 passes the content data to the display controller 4, and a screen based on that content is displayed on the display 40.

In this embodiment, screens such as those shown in FIG. 3 are displayed on the display 40. When the content calling device 1 is installed in an in-vehicle navigation device, the map image output from the navigation device is initially displayed on the display 40, as shown in FIG. 3(a). In this state, if the user says "Zenpan" ("general") into the microphone 20 to display the general news article list, the screen of that content is immediately displayed on the display 40, as shown in FIG. 3(b). The operation in this case is as follows.

First, the voice recognition unit 3 receives the utterance "Zenpan" from the microphone 20 via the voice input I/F 2, recognizes it, and identifies the recognition ID of the recognized phrase "Zenpan" (here, the recognition ID is "1"). The control unit 7 compares the recognition ID of the recognized phrase with the recognition IDs in the recognition phrase table stored in the data storage unit 5. When they match, the control unit 7 reads the URL corresponding to the matching recognition ID from the data storage unit 5 and outputs it to the communication unit 6.

As shown in FIG. 2, looking up the recognized phrase "Zenpan" in the recognition phrase table stored in the data storage unit 5 shows that the recognition target phrase "Zenpan" (ゼンパン) has recognition ID "1" and that the URL of the corresponding content is http://…/all-list.xml. Other recognition target phrases are associated with the same URL: "Saishin no Nyusu" (サイシンノニュース, "the latest news") with recognition ID "2", "Saishin Nyusu" (サイシンニュース) with recognition ID "3", and "Saishin no" (サイシンノ) with recognition ID "4". Uttering any of these phrases also identifies the same URL.
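The FIG. 2 table and the ID comparison performed by the control unit 7 can be sketched as follows. This is an illustration only: the row type and function names are hypothetical, and the URL keeps its host elided exactly as in the source.

```python
from dataclasses import dataclass
from typing import Optional

# URL as given in the source, with the host part elided.
NEWS_URL = "http://…/all-list.xml"


@dataclass(frozen=True)
class PhraseTableRow:
    recognition_id: int
    phrase: str  # recognition target phrase
    url: str     # content location information


# Rows modeled on FIG. 2: four phrases all point to the same content.
RECOGNITION_PHRASE_TABLE = [
    PhraseTableRow(1, "Zenpan", NEWS_URL),
    PhraseTableRow(2, "Saishin no Nyusu", NEWS_URL),
    PhraseTableRow(3, "Saishin Nyusu", NEWS_URL),
    PhraseTableRow(4, "Saishin no", NEWS_URL),
]


def url_for_recognition_id(recognition_id: int) -> Optional[str]:
    """Compare the recognized ID with each row; return that row's URL on a match."""
    for row in RECOGNITION_PHRASE_TABLE:
        if row.recognition_id == recognition_id:
            return row.url
    return None  # no match: the caller reports an error (step S3)
```

Because several rows may share one URL, synonyms such as "Saishin no" and "Zenpan" reach the same content without any special handling.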

The communication unit 6 receives from the control unit 7 the URL (http://…/all-list.xml) corresponding to the recognition target phrase "Zenpan" and accesses the server 10 based on this URL. The server 10 reads the content data corresponding to the URL (the general news article list) from the content data storage unit 11. The communication unit 6 acquires the content data from the server 10 and outputs it to the control unit 7. The control unit 7 outputs the acquired content data to the display controller 4, and the content screen shown in FIG. 3(b) is displayed on the display 40.

Next, the operation of the content calling device and the content calling method of this embodiment are described. FIG. 4 is a flowchart showing them. First, the control unit 7 checks whether voice has been input from the voice input I/F 2 (step S1). If no voice has been input (NO in step S1), step S1 is repeated.

If voice has been input (YES in step S1), the voice recognition unit 3 recognizes the voice input from the voice input I/F 2, and it is checked whether recognition succeeded (step S2). If recognition failed (NO in step S2), an error is shown on the display 40 or announced by a voice output unit (not shown) (step S3). It is then checked whether voice input is to be performed again (step S4).

If not (NO in step S4), the process ends; if so (YES in step S4), the process returns to step S1. Whether to perform voice input again is decided by whether the user, in response to the error output, gives an instruction to do so.

If recognition succeeded (YES in step S2), the control unit 7 compares the recognition ID of the phrase recognized by the voice recognition unit 3 with the recognition IDs in the recognition phrase table stored in the data storage unit 5, and checks whether there is a URL corresponding to a recognition target phrase that matches the recognized phrase (step S5).

If there is no such URL (NO in step S5), in other words, if the recognition ID of the recognized phrase matches no recognition ID in the recognition phrase table, an error is shown on the display 40 or announced by the voice output unit (not shown) (step S3). It is then checked whether voice input is to be performed again (step S4).

If not (NO in step S4), the process ends; if so (YES in step S4), the process returns to step S1.

If there is such a URL (YES in step S5), in other words, if the recognition ID of the recognized phrase matches a recognition ID in the recognition phrase table, the control unit 7 starts a browser for displaying a screen based on the desired content (step S6). Once the browser has started, the control unit 7 reads the URL corresponding to the matching recognition ID from the data storage unit 5 and outputs it to the communication unit 6. The communication unit 6 accesses the server 10 based on this URL (step S7).

The server 10 reads the content data corresponding to this URL from the content data storage unit 11. The communication unit 6 acquires the content data from the server 10 and outputs it to the control unit 7. The control unit 7 then outputs the content data to the display controller 4, so that a screen based on the desired content is displayed on the display 40 (step S8).
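The FIG. 4 flow (steps S1 to S8) can be sketched as a single loop. Each unit is modeled as an injected callable so the control flow can be exercised without hardware; these parameter names are assumptions, and the browser start of step S6 is treated as part of the fetch and display callables rather than modeled separately.

```python
from typing import Callable, Optional


def call_content_by_voice(
    get_voice: Callable[[], Optional[bytes]],       # voice input I/F 2
    recognize: Callable[[bytes], Optional[int]],    # voice recognition unit 3 -> recognition ID
    lookup_url: Callable[[int], Optional[str]],     # recognition phrase table in data storage unit 5
    fetch: Callable[[str], str],                    # communication unit 6 accessing server 10
    display: Callable[[str], None],                 # display controller 4 / display 40
    report_error: Callable[[str], None],            # display 40 or voice output unit
    ask_retry: Callable[[], bool],                  # user instruction to speak again
) -> bool:
    """Sketch of FIG. 4; returns True if content was displayed, False if the user gave up."""
    while True:
        audio = get_voice()                          # S1: has voice been input?
        if audio is None:
            continue                                 # NO in S1: keep waiting
        rec_id = recognize(audio)                    # S2: did recognition succeed?
        url = lookup_url(rec_id) if rec_id is not None else None  # S5: matching URL?
        if url is None:
            report_error("recognition failed" if rec_id is None else "no matching URL")  # S3
            if ask_retry():                          # S4: YES -> back to S1
                continue
            return False                             # S4: NO -> end
        content = fetch(url)                         # S6-S7: access the server at the URL
        display(content)                             # S8: show the content screen
        return True
```

Injecting the units keeps the sketch independent of any particular recognizer or browser; a real device would block on audio input instead of polling.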

Next, the operation of registering recognition target phrases in the content calling device of this embodiment is described. FIG. 5 is a flowchart of this operation. First, the control unit 7 checks whether the communication unit 6 has acquired the top menu's page information (step S11). If not (NO in step S11), step S11 is repeated.

When the top menu's page information has been acquired (YES in step S11), the control unit 7 checks whether the description of the top menu contains a content URL and corresponding recognition target phrases (step S12). For example, as shown in FIG. 6, the top menu is written in XML (Extensible Markup Language): "all-list.xml", given in the <link filetype="XML_SUBJECT"> tag, is the URL of the content for the general news article list, and "Zenpan", "Saishin no Nyusu", "Saishin Nyusu", and "Saishin no", given in <voicekey> tags, are the recognition target phrases.
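The check in step S12 could be implemented with Python's standard xml.etree.ElementTree module, as sketched below. Only the <link filetype="XML_SUBJECT"> and <voicekey> tags come from the description of FIG. 6; the <menu> and <item> wrapper elements, the variable TOP_MENU_XML, and the function name are assumptions, since the full FIG. 6 markup is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical top-menu document. The <menu>/<item> wrapper is assumed;
# the <link filetype="XML_SUBJECT"> and <voicekey> tags follow FIG. 6.
TOP_MENU_XML = """<menu>
  <item>
    <link filetype="XML_SUBJECT">all-list.xml</link>
    <voicekey>ゼンパン</voicekey>
    <voicekey>サイシンノニュース</voicekey>
    <voicekey>サイシンニュース</voicekey>
    <voicekey>サイシンノ</voicekey>
  </item>
</menu>"""


def extract_voice_links(xml_text: str) -> list[tuple[str, list[str]]]:
    """Step S12: collect (content URL, recognition target phrases) pairs.

    An empty result corresponds to NO in step S12 (nothing to register).
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for item in root.iter("item"):
        link = item.find("link[@filetype='XML_SUBJECT']")
        phrases = [vk.text.strip() for vk in item.findall("voicekey") if vk.text]
        if link is not None and link.text and phrases:
            pairs.append((link.text.strip(), phrases))
    return pairs
```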

If the acquired top menu page information contains no content URL with corresponding recognition target phrases (NO in step S12), the process ends.

If it does (YES in step S12), the control unit 7 generates a recognition phrase table from the URL and the recognition target phrases and stores it in the data storage unit 5 (step S13). The control unit 7 stores the recognition ID for each recognition target phrase in the recognition phrase table, and also stores each recognition target phrase together with its recognition ID in the dictionary DB of the voice recognition unit 3. The user can then call up desired content by speaking, through the voice input I/F 2, a recognition target phrase associated with that content's URL.
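Step S13 can be sketched as follows: each extracted phrase receives a recognition ID, the ID-to-URL mapping plays the role of the recognition phrase table in the data storage unit 5, and the phrase-to-ID mapping plays the role of the dictionary DB of the voice recognition unit 3. Sequential numbering from 1 and skipping phrases that are already registered are assumptions, not details given in the source; the function name is hypothetical.

```python
def build_recognition_tables(
    pairs: list[tuple[str, list[str]]],
) -> tuple[dict[int, str], dict[str, int]]:
    """Step S13: assign recognition IDs and build both lookup structures.

    Returns (recognition phrase table: ID -> URL,
             dictionary DB: phrase -> ID).
    """
    phrase_table: dict[int, str] = {}
    dictionary_db: dict[str, int] = {}
    next_id = 1  # assumption: IDs are numbered sequentially from 1
    for url, phrases in pairs:
        for phrase in phrases:
            if phrase in dictionary_db:
                continue  # assumption: a phrase already registered keeps its first URL
            dictionary_db[phrase] = next_id
            phrase_table[next_id] = url
            next_id += 1
    return phrase_table, dictionary_db
```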

As described in detail above, in this embodiment the content URL and the recognition target phrases are stored in association with each other. When the user calls up desired content, the input voice is recognized, and the content at the URL corresponding to the recognition target phrase that matches the recognized phrase is called up. The desired content can therefore be called up directly with a single voice input, without passing through intermediate menu screens, and the user can call it up easily without cumbersome GUI operations or repeated voice input.

Also, in this embodiment, acquiring page information yields both content URLs and the corresponding recognition target phrases, so the URL of desired content can be updated easily and the contents of the data storage unit 5 can be kept up to date. Because the top menu is accessed frequently, taking the page information from the top menu increases the update frequency. Furthermore, because the top menu is written in XML, the description is flexible and recognition target phrases and the like are easy to define.

In this embodiment, when the top menu is acquired in order to call up desired content, a recognition phrase table associating the URL of the desired content with recognition target phrases is generated from the XML describing the top menu and stored in the data storage unit 5, but the invention is not limited to this. For example, languages other than XML, such as HTML (HyperText Markup Language), may be used. The recognition phrase table may also be stored in the data storage unit 5 of the content calling device 1 in advance, or downloaded by the communication unit 6 at a predetermined timing and stored in the data storage unit 5.

In this embodiment, the recognition ID of the phrase recognized by the voice recognition unit 3 is compared with the recognition IDs in the recognition phrase table stored in the data storage unit 5, but the invention is not limited to this. For example, the phrase recognized by the voice recognition unit 3 may be compared directly with the recognition target phrases in the recognition phrase table stored in the data storage unit 5.
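A minimal sketch of this direct comparison is shown below, using a phrase-to-URL mapping in place of the ID-based table. Applying Unicode NFKC normalization to both sides, so that half-width katakana such as ｾﾞﾝﾊﾟﾝ matches ゼンパン, is an added assumption and not part of the described embodiment; the function names are hypothetical.

```python
import unicodedata
from typing import Optional


def normalize_phrase(phrase: str) -> str:
    # Assumption (not in the source): NFKC folds half-width katakana into
    # full-width forms, e.g. "ｾﾞﾝﾊﾟﾝ" -> "ゼンパン", before comparison.
    return unicodedata.normalize("NFKC", phrase).strip()


def url_for_phrase(recognized_phrase: str, phrase_to_url: dict[str, str]) -> Optional[str]:
    """Compare the recognized phrase directly with each recognition target phrase."""
    key = normalize_phrase(recognized_phrase)
    for target_phrase, url in phrase_to_url.items():
        if normalize_phrase(target_phrase) == key:
            return url
    return None  # no match: treated like NO in step S5
```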

In this embodiment, the server 10 is accessed using the communication unit 6 built into the content calling device 1, but the invention is not limited to this. For example, as shown in FIG. 7, a communication device 60 capable of accessing the server 10, such as a mobile phone, may be connected to a communication I/F 12 provided in the content calling device 1.

In this embodiment, the user is not told which recognition target phrase corresponds to the content screen being displayed. However, when the user has reached the desired content manually, for example via the GUI, and a screen based on that content is shown on the display 40, the recognition target phrase corresponding to that screen may also be shown on the display 40. For example, as shown in FIG. 8, when the general news article list is displayed, the message "The recognition target phrase for this screen is 'Zenpan'" is shown on the display 40. This lets the user learn which recognition target phrase corresponds to the desired content, making voice input easier.
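The FIG. 8 message can be produced by a reverse lookup from the displayed content's URL to its recognition target phrases. The sketch below assumes a simple phrase-to-URL mapping and shows the first registered phrase in the message; the function names and that choice are hypothetical.

```python
from typing import Optional


def phrases_for_url(displayed_url: str, phrase_to_url: dict[str, str]) -> list[str]:
    """Reverse lookup: every recognition target phrase registered for the displayed content."""
    return [phrase for phrase, url in phrase_to_url.items() if url == displayed_url]


def screen_hint(displayed_url: str, phrase_to_url: dict[str, str]) -> Optional[str]:
    """Build a FIG. 8 style message, or None if no phrase is registered for the screen."""
    phrases = phrases_for_url(displayed_url, phrase_to_url)
    if not phrases:
        return None
    # Assumption: show only the first registered phrase.
    return f'The recognition target phrase for this screen is "{phrases[0]}".'
```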

Alternatively, when the user presses a help key (not shown) or requests a help screen via the GUI or voice input, a list may be displayed of character strings describing each content item (for example, the content name or URL) together with the recognized phrases corresponding to each URL. This likewise lets the user learn which phrase corresponds to the desired content, making voice input easier.
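The help list can be sketched by grouping phrases by URL and labeling each group with a content name when one is known, falling back to the URL otherwise. The titles mapping and the function name are hypothetical.

```python
from collections import defaultdict


def help_listing(phrase_to_url: dict[str, str], titles: dict[str, str]) -> list[str]:
    """One help line per content: its name (or URL) followed by its phrases."""
    grouped: defaultdict[str, list[str]] = defaultdict(list)
    for phrase, url in phrase_to_url.items():
        grouped[url].append(phrase)
    lines = []
    for url, phrases in grouped.items():
        label = titles.get(url, url)  # fall back to the URL when no name is known
        lines.append(f"{label}: {', '.join(phrases)}")
    return lines
```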

The embodiment above is merely one concrete example of carrying out the present invention and must not be used to interpret the technical scope of the invention restrictively. The present invention can be carried out in various forms without departing from its spirit or main features.

The present invention is useful for a content calling device that calls up, by voice input, content data described in a predetermined markup language.

FIG. 1 is a block diagram showing an example configuration of a content transmission/reception system including the content calling device of this embodiment.
FIG. 2 is a diagram showing an example of the recognition phrase table of the content calling device of this embodiment.
FIG. 3 is a diagram showing an example of screens displayed by the content calling device of this embodiment.
FIG. 4 is a flowchart showing the operation of the content calling device and the content calling method of this embodiment.
FIG. 5 is a flowchart showing the recognition target phrase registration operation in the content calling device of this embodiment.
FIG. 6 is a diagram showing an example of a menu description in XML for the content calling device of this embodiment.
FIG. 7 is a block diagram showing a modification of the content transmission/reception system including the content calling device of this embodiment.
FIG. 8 is a diagram showing a modification of the screen displayed by the calling device of this embodiment.
FIG. 9 is a diagram showing an example of screens displayed by a conventional content calling device.

Explanation of symbols

1 Content calling device
2 Voice input I/F
3 Voice recognition unit
4 Display controller
5 Data storage unit
6 Communication unit
7 Control unit
20 Microphone
40 Display

Claims

A content calling device comprising:
a voice input unit for inputting voice;
a voice recognition unit that recognizes the voice input to the voice input unit;
a data storage unit that stores content location information in association with recognition target phrases;
a communication unit that accesses a site indicated by the location information stored in the data storage unit;
a control unit that compares a phrase recognized by the voice recognition unit with the recognition target phrases stored in the data storage unit and, when the recognized phrase matches a recognition target phrase, uses the location information corresponding to that recognition target phrase to access the site through the communication unit and call up the content; and
a display control unit that displays, on a display, an image based on the content called up by the control unit;
wherein the control unit further checks whether content location information and a corresponding recognition target phrase are described in the descriptive text of the top page of the content and, when it determines that they are, stores the location information and the recognition target phrase in the data storage unit in association with each other; and
the display control unit further, when an image based on content is displayed on the display, displays on the display, from among the recognition target phrases stored in the data storage unit, the recognition target phrase corresponding to the content being displayed, together with information identifying the content corresponding to that recognition target phrase.