JP2003302995A

JP2003302995A - Updating method of speech recognition grammar, information processor, and computer program

Info

Publication number: JP2003302995A
Application number: JP2002111206A
Authority: JP
Inventors: Masaaki Yamada; 雅章山田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2002-04-12
Filing date: 2002-04-12
Publication date: 2003-10-24

Abstract

PROBLEM TO BE SOLVED: To efficiently and easily update the speech recognition grammar stored in a user terminal (a client-side information processor) that performs information retrieval using speech recognition to the speech recognition grammar stored on the server side.

SOLUTION: The information search system consists of a user terminal UT and a server SV. The user's input speech is recognized based on a first speech recognition grammar stored in the terminal UT, and any unknown words contained in the speech are detected. When an unknown word is detected, the server SV generates difference information between the two speech recognition grammars. It does so based on the version information of the first speech recognition grammar and that of a second speech recognition grammar stored in the server. The terminal UT then updates the first speech recognition grammar to the second speech recognition grammar using the difference information generated by the server SV.

COPYRIGHT: (C)2004,JPO

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to the field of information retrieval processing for retrieving desired information by voice.

[0002]

2. Description of the Related Art

Services of the following kind have become widespread. A user terminal (client) is connected via a communication line such as the Internet to a server in which various kinds of information are stored in advance, and desired information is retrieved from the user terminal using the connected server. In recent years, there has been a demand to perform such information retrieval using speech input at the user terminal.

[0003] When information retrieval is performed using a server in response to speech input at a user terminal, a conventional system generally follows the procedure described below.

[0004] FIG. 16 is a flowchart showing the information retrieval process of a user terminal in an information retrieval system that uses speech input.

[0005] In the figure, first, in step S1001, the version information of the speech recognition grammar held in the user terminal is acquired. Next, in step S1002, the version information acquired in step S1001 is transmitted to the server via a communication line such as the Internet.

[0006] Next, in step S1003, the terminal receives difference information. This is the difference between the speech recognition grammar required to search the information on the server properly and the speech recognition grammar held in the terminal itself. Then, in step S1004, the speech recognition grammar in the terminal is reconstructed. The reconstruction uses the grammar held in the terminal and the difference information received from the server in step S1003.

[0007] Next, in step S1005, speech uttered by the user is input. In step S1006, the input speech is recognized using the speech recognition grammar reconstructed in step S1004.

[0008] Then, in step S1007, the recognition result of step S1006 is transmitted to the server as a search keyword.

[0009] In step S1008, information corresponding to the keyword transmitted in step S1007 is received from the server. In step S1009, the received information is output at the terminal, and the process returns to step S1001.

[0010] Next, the server-side processing corresponding to the above operation of the user terminal will be described with reference to FIG. 4.

[0011] FIG. 4 is a flowchart showing the control processing of the server that provides the information.

[0012] In the figure, first, in step S801, the server waits until an event such as communication or user input occurs. When an event is detected, the process moves to step S802.

[0013] In step S802, it is determined whether the event detected in step S801 indicates reception of version information of the speech recognition grammar. If grammar version information has been received, the process proceeds to step S803; otherwise, it proceeds to step S806.

[0014] In step S803, the version of the recognition grammar is received from the user terminal. In step S804, difference information of the recognition grammar is created using three inputs: the received recognition grammar version, the recognition grammar created in step S807 (described below), and the recognition grammar difference creation information created in step S808.

[0015] Next, in step S805, the difference information created in step S804 is transmitted to the user terminal, and the process returns to step S801.

[0016] In step S806, it is determined whether the event detected in step S801 is a database update, such as the addition or deletion of information to be searched. If it is a database update, the process proceeds to step S807; otherwise, it proceeds to step S809.

[0017] In step S807, the speech recognition grammar needed to search the information to be searched is created. Next, in step S808, the information needed to create recognition grammar differences is created. This information includes the version information of the recognition grammar and a backup of the recognition grammar corresponding to each version.

[0018] In step S809, it is determined whether the event detected in step S801 is the reception of a keyword. If it is, the process proceeds to step S810; otherwise, it returns to step S801.

[0019] In step S810, a keyword is received from the user terminal. Next, in step S811, the database is searched for data corresponding to the keyword received in step S810. Then, in step S812, the data retrieved in step S811 is transmitted to the user terminal, and the process returns to step S801.

[0020]

Problems to Be Solved by the Invention

However, the above conventional example has the following problems.

[0021] To perform speech recognition properly on the user terminal, the speech recognition grammar held in the user terminal must be of an appropriate version. However, it is difficult for the user terminal itself to determine when to update its speech recognition grammar.

[0022] Ideally, the speech recognition grammar on the user terminal would be updated whenever the recognition grammar is created in step S807. However, the server and the user terminal are not always connected by a communication line. It is therefore difficult to synchronize the update with the timing of grammar creation on the server.

[0023] Moreover, the communication line connecting the server and the user terminal often has a small capacity. A system that sends a difference to the user terminal every time the speech recognition grammar is updated on the server is therefore unreasonable. In particular, when the changed data is rarely searched, there is little need to update the speech recognition grammar of the user terminal.

[0024] Accordingly, an object of the present invention is to update, efficiently and easily, the speech recognition grammar stored in a user terminal (a client-side information processing apparatus) that performs information retrieval using speech recognition. The grammar is to be updated to the speech recognition grammar stored on the server side.

[0025]

Means for Solving the Problems

To achieve the above object, a method for updating a speech recognition grammar according to the present invention is characterized by the following configuration.

[0026] That is, the method updates a speech recognition grammar in an information retrieval system composed of a user terminal and a server, and comprises two steps:

(a) a speech recognition step of recognizing the user's input speech based on a first speech recognition grammar stored in the user terminal, and detecting any unknown word contained in the speech; and

(b) an update step of updating, at the user terminal, the first speech recognition grammar to a second speech recognition grammar stored in the server. This step is performed when an unknown word is detected in the speech recognition step and the version information of the first speech recognition grammar differs from that of the second speech recognition grammar.

[0027] Alternatively, the method updates a speech recognition grammar in an information retrieval system composed of a user terminal and a server, and comprises three steps:

(a) a speech recognition step of recognizing the user's input speech based on a first speech recognition grammar stored in the user terminal, and detecting any unknown word contained in the speech;

(b) a difference generation step, performed at the server when an unknown word is detected in the speech recognition step, of generating difference information between the first speech recognition grammar and a second speech recognition grammar stored in the server. The difference is generated according to the version information of the first and second speech recognition grammars; and

(c) an update step of updating, at the user terminal, the first speech recognition grammar to the second speech recognition grammar using the difference information generated in the difference generation step.

[0028] In a preferred embodiment, the difference generation step preferably generates the difference information between the n-th version and the (n−k)-th version of the speech recognition grammar. It does so by creating difference information between the n-th version and the (n−1)-th version of the speech recognition grammar for every k (< n).

[0029] For example, when the unknown word is detected, the speech recognition step may re-recognize the speech based on the second speech recognition grammar acquired in the update step.

[0030] Alternatively, to achieve the same object, the method updates a speech recognition grammar in a user terminal connectable to a server for information retrieval, and comprises two steps:

(a) a speech recognition step of recognizing the user's input speech based on a first speech recognition grammar stored in the user terminal, and detecting any unknown word contained in the speech; and

(b) an update step performed when an unknown word is detected in the speech recognition step. The terminal transmits the version information of the first speech recognition grammar to the server. It then updates the first speech recognition grammar to a second speech recognition grammar stored in the server, using difference information acquired from the server in response to that transmission.

[0031] The same object is also achieved by program code that causes a computer having a communication function to carry out the speech recognition grammar updating method in any of the above configurations. It is likewise achieved by a computer-readable storage medium storing that program code.

[0032]

BEST MODE FOR CARRYING OUT THE INVENTION

An embodiment of the information retrieval system according to the present invention will be described in detail below with reference to the drawings.

[0033] [First Embodiment] FIG. 1 is a system configuration diagram of the information retrieval system according to the first embodiment.

[0034] In the figure, the user terminal UT and the server SV are connected via a communication line such as a telephone line or the Internet, using a general procedure. Together they form a so-called server/client environment that allows bidirectional communication. In the present embodiment, the user inputs speech through the speech input device H305 connected to the user terminal UT. The user can thereby search the information stored in the external storage device H12 connected to the server SV. The information retrieved by the server SV is presented to the user at the user terminal UT by voice and/or display.

[0035] FIG. 2 is a block diagram showing the hardware configuration of the user terminal and the server that make up the information retrieval system in the first embodiment.

[0036] In the user terminal UT shown in the figure, H1 is a central processing unit (CPU) that performs processing such as numerical computation and control. It performs computations according to the procedure of the information retrieval process described later. H2 is an output device, such as a speaker or display, that presents information to the user by voice and/or display. This includes speech recognition results and database search results.

[0037] H3 is an input device through which the user gives operation instructions or inputs information to the user terminal UT. The input device H3 is a general-purpose input device such as a touch panel or keyboard. It includes a recognition button H301, a selection button H302, direction buttons H303, a dial H304, and a speech input device H305, such as a microphone, for inputting speech.

[0038] H4 is a storage device such as a disk drive or non-volatile memory. It stores the software program for the information retrieval process described later, a program for data communication with the server SV, and the speech recognition grammar and its version information. H5 is a read-only storage device (ROM) that stores fixed data such as a boot program. H6 is a storage device, such as RAM, that holds temporary information such as temporary data and various flags.

[0039] H7 is a data communication device, such as a modem, LAN card or PHS card, used for communication with the server SV. The components of the user terminal UT described above are connected by an internal bus.

[0040] As an example, the user terminal UT shown in FIG. 1 has the form of a so-called PDA (personal digital assistant). However, a personal computer with a wired and/or wireless communication function may also serve as the user terminal UT. This applies to the information retrieval system described in this embodiment and in the second embodiment described later.

[0041] Next, in the server SV shown in FIG. 2, H8 is a central processing unit (CPU) that performs processing such as numerical computation and control. It performs computations according to the control processing procedure described later. H9 is an output device, such as a display, that presents information to the user.

[0042] H10 is an input device through which the user gives operation instructions or inputs information to the apparatus. H11 is a data communication device, such as a modem, LAN card or PHS card, used for communication with the client (user terminal UT).

[0043] H12 is a storage device such as a disk drive or non-volatile memory. It holds the database, the speech recognition grammar, the recognition grammar difference creation information and similar data. H13 is a read-only storage device (ROM) that stores a boot program, fixed data for terminal identification and the like.

[0044] H14 is a storage device, such as RAM, that holds temporary information such as temporary data and various flags. The components of the server SV described above are connected by an internal bus.

[0045] Next, the processing executed by the user terminal UT and the server SV in the above information retrieval system will be described.

[0046] FIG. 3 is a flowchart showing the information retrieval process performed by the user terminal UT in the first embodiment. It shows the procedure executed by the central processing unit (CPU) H1.

[0047] In the figure, first, in step S701, speech uttered by the user is input through the speech input device H305. In step S702 (the speech recognition step), the input speech is recognized using the speech recognition grammar held in the external storage device H4.

[0048] In step S703, unknown words in the speech input in step S701 are detected. These are words not included in the speech recognition grammar used for recognition in step S702.

[0049] Next, in step S704, it is determined whether an unknown word was detected in step S703. If one was detected, the process proceeds to step S708; otherwise, it proceeds to step S705.

[0050] In step S705, the recognition result of step S702 is transmitted as a keyword to the server SV using the communication device H7. In step S706, information corresponding to the keyword is received from the server SV in response to that transmission. In step S707, the received information is output to the output device H2, after which the process returns to step S701.

[0051] If, on the other hand, an unknown word was detected in step S704, step S708 determines whether the speech recognition grammar currently stored in the external storage device H4 (the first speech recognition grammar) should be updated. If the current grammar should be updated, the process proceeds to step S709; otherwise, it proceeds to step S705.

[0052] Whether the speech recognition grammar should be updated depends on two conditions: whether the grammar has already been updated, and whether the user has prohibited grammar updates. The grammar already stored in the external storage device H4 is not updated in either of the following cases:

- the same unknown word is detected again even though the speech recognition grammar has already been updated; or
- the user has previously set an instruction prohibiting updates of the speech recognition grammar.

[0053] In step S709, the version information of the speech recognition grammar currently held in the external storage device H4 (the first speech recognition grammar) is read. In step S710, that version information is transmitted to the server SV using the communication device H7.

[0054] In step S711, in response to the version information transmitted in step S710, difference information relative to the speech recognition grammar currently held in the external storage device H4 is received from the server SV. Next, in step S712, the speech recognition grammar is reconstructed. The reconstruction combines the grammar currently held in the external storage device H4 with the difference information received from the server SV. The result is the same speech recognition grammar as the second speech recognition grammar held by the server SV. The process then returns to step S702. Any procedure in general use today can be adopted for reconstructing the speech recognition grammar, so its description is omitted in this embodiment.

[0055] The server SV's processing for the above operation of the user terminal UT can use the control processing shown in FIG. 4, described above, so its description is omitted.

[0056] With the information retrieval system of this embodiment, the timing for updating the speech recognition grammar on the user terminal UT is determined automatically. The grammar is updated only when the user utters a word that is not in the speech recognition grammar currently stored in the user terminal UT. This reduces the number of communications between the user terminal UT and the server SV, compared with a system in which the client executes the information retrieval process described with reference to FIG. 16. The speech recognition grammar can therefore be updated efficiently even when the communication line has a relatively small capacity.

[0057] <Modification of the First Embodiment> In the first embodiment described above, the recognition grammar difference is transmitted and received in steps S711 and S805. In some cases, the entire recognition grammar may be transmitted and received instead. For example, if the recognition grammar version received by the server SV in step S803 is very old, the entire recognition grammar is transmitted. In that case, information indicating that the content is the entire recognition grammar is sent with it. This allows the user terminal UT to rewrite the recognition grammar completely in step S712. This configuration is preferable because it reduces the amount of information, such as recognition grammar backups, created in step S808.

[0058] [Second Embodiment] Next, a second embodiment based on the information retrieval system of the first embodiment will be described. The system configuration (FIGS. 1 and 2) can be the same as in the first embodiment, so it is not described again. Only the processing executed by the user terminal UT and the server SV in that configuration is described.

[0059] FIGS. 5 to 9 are flowcharts showing the information retrieval process performed by the user terminal UT in the second embodiment. They show the procedure executed by the central processing unit (CPU) H1. In this embodiment, the software program for the information retrieval process is written as an event-driven program. Interrupts, such as the user's input operations on the input device H3, and synchronization information from subroutines running in parallel are issued as events.

[0060] In the figure, first, in step S1, a new event is acquired. In step S2, it is determined whether the event acquired in step S1 was caused by the user pressing the recognition button H301. If so, the process proceeds to step S3; otherwise, it proceeds to step S4.

[0061] In step S3, recognition of the user's speech input through the speech input device H305 is started, and the process then returns to step S1. In this embodiment, speech recognition runs asynchronously with the processes shown in the flowcharts of FIGS. 5 to 9. Ordinary speech recognition is performed as speech is input from the speech input device H305. When the result of the recognition started in this step is finalized, an event indicating the end of speech recognition is issued.

[0062] In step S4, it is determined whether the event acquired in step S1 was caused by the user releasing the recognition button H301. If so, the process proceeds to step S5; otherwise, it proceeds to step S6.

[0063] In step S5, the speech recognition started in step S3 is stopped, and the process returns to step S1.

[0064] In step S6, it is determined whether the event acquired in step S1 indicates the end of speech recognition. If so, the process proceeds to step S101; otherwise, it proceeds to step S7.

[0065] In step S7, it is determined whether the event acquired in step S1 was caused by the user operating the dial H304. If so, the process proceeds to step S201; otherwise, it proceeds to step S8.

[0066] In step S8, it is determined whether the event acquired in step S1 is the user's press of the selection button H302. If it is, the process proceeds to step S301; otherwise, it returns to step S1.
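The branch checks in steps S4 to S8 amount to a dispatch table from event kind to the step that handles it. The following is a minimal Python sketch under assumed names. The `Event` members and the `next_step` function are hypothetical, and steps are represented as strings purely for illustration.

```python
from enum import Enum, auto


class Event(Enum):
    """Hypothetical event kinds delivered to the terminal's event loop (step S1)."""
    RECOGNITION_BUTTON_RELEASED = auto()
    RECOGNITION_FINISHED = auto()
    DIAL_OPERATED = auto()
    SELECT_BUTTON_PRESSED = auto()
    OTHER = auto()


# Checks S4, S6, S7 and S8, each mapped to the step it branches to.
_NEXT_STEP = {
    Event.RECOGNITION_BUTTON_RELEASED: "S5",  # S4: stop the running recognition
    Event.RECOGNITION_FINISHED: "S101",       # S6: fetch the recognition result
    Event.DIAL_OPERATED: "S201",              # S7: move the selection
    Event.SELECT_BUTTON_PRESSED: "S301",      # S8: request image data
}


def next_step(event: Event) -> str:
    """Return the step that handles `event`; unmatched events go back to S1."""
    return _NEXT_STEP.get(event, "S1")


if __name__ == "__main__":
    print(next_step(Event.DIAL_OPERATED))  # S201
    print(next_step(Event.OTHER))          # S1
```

A table keeps the order-independent mapping explicit. The flowchart's sequential checks give the same result because each event matches at most one branch.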

[0067] In step S101, the speech recognition result is acquired, and in step S102, the input speech corresponding to that result is acquired.

[0068] Next, in step S103, it is determined whether the recognition result obtained in step S101 contains an unknown word. If it does, the process proceeds to step S104; otherwise, it proceeds to step S112.

[0069] In step S104, it is determined whether to update the speech recognition grammar currently stored in the external storage device H4 (the first speech recognition grammar). If an update is needed, the process proceeds to step S105; otherwise, it proceeds to step S112. For example, if the user has instructed the terminal not to update the speech recognition grammar, it is determined that no update is needed.

[0070] In step S105, data meaning "request for recognition grammar" is sent to the server SV, and in step S106, the version information of the grammar the terminal is currently using for speech recognition is acquired.

[0071] Next, in step S107, the version information obtained in step S106 is sent to the server SV. In step S108, in response to that transmission, the terminal receives from the server SV the difference information between its current grammar and the speech recognition grammar stored on the server (the second speech recognition grammar).

[0072] Next, in step S109, a new speech recognition grammar identical to the second speech recognition grammar is reconstructed from the difference information obtained in step S108 and the speech recognition grammar the terminal is currently using.
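Assume the grammar is modeled as a plain set of words. This is an assumption: the text only says that words are added or deleted when the database changes. Under it, the reconstruction in step S109 reduces to set arithmetic. `GrammarDiff` and `apply_diff` are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class GrammarDiff:
    """Hypothetical difference format: words to add and words to remove."""
    added: FrozenSet[str] = frozenset()
    removed: FrozenSet[str] = frozenset()


def apply_diff(current: FrozenSet[str], diff: GrammarDiff) -> FrozenSet[str]:
    """Rebuild the server's grammar from the terminal's current one (step S109)."""
    return (current - diff.removed) | diff.added


if __name__ == "__main__":
    first = frozenset({"Kinkakuji", "Ginkakuji"})
    diff = GrammarDiff(added=frozenset({"Kiyomizudera"}),
                       removed=frozenset({"Ginkakuji"}))
    print(sorted(apply_diff(first, diff)))  # ['Kinkakuji', 'Kiyomizudera']
```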

[0073] Next, in step S110, the version information of the new grammar is received from the server SV.

[0074] Next, in step S111, speech recognition is performed again on the speech acquired in step S102, this time using the grammar newly constructed in step S109, and the process proceeds to step S112.
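Steps S103 to S111 can be sketched as a terminal object that re-recognizes the stored speech once its grammar has been brought up to date. This minimal Python sketch rests on several assumptions that are not in the source:

- the recognizer marks out-of-grammar speech with an `<unk>` token;
- the grammar is a set of words;
- `recognize_fn` and `request_update_fn` are hypothetical callables. They stand in for the recognizer and for the exchange with the server SV (steps S105 to S108 and S110, merged here into one call).

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

UNKNOWN = "<unk>"  # assumed token a recognizer emits for out-of-grammar speech

# (audio, grammar words) -> recognized tokens
RecognizeFn = Callable[[object, FrozenSet[str]], List[str]]
# current version -> (added words, removed words, new version)
UpdateFn = Callable[[int], Tuple[FrozenSet[str], FrozenSet[str], int]]


class Terminal:
    def __init__(self, words: Iterable[str], version: int,
                 recognize_fn: RecognizeFn, request_update_fn: UpdateFn):
        self.words = frozenset(words)   # first speech recognition grammar
        self.version = version
        self._recognize = recognize_fn
        self._request_update = request_update_fn

    def recognize(self, audio, allow_update: bool = True) -> List[str]:
        """Recognize `audio`; on an unknown word, update the grammar and retry once.

        `allow_update=False` models the user declining an update in step S104.
        """
        result = self._recognize(audio, self.words)        # S101; audio kept (S102)
        if UNKNOWN not in result or not allow_update:      # S103, S104
            return result                                  # output in S112
        added, removed, new_version = self._request_update(self.version)
        self.words = (self.words - removed) | added        # S109: rebuild grammar
        self.version = new_version                         # S110
        return self._recognize(audio, self.words)          # S111: re-recognize


if __name__ == "__main__":
    def fake_recognize(audio, words):
        # Toy recognizer: "audio" is already a list of words.
        return [w if w in words else UNKNOWN for w in audio]

    def fake_server(version):
        # Toy server: going from version 1 to 2 adds one word.
        return frozenset({"Kiyomizudera"}), frozenset(), 2

    terminal = Terminal({"Kinkakuji"}, 1, fake_recognize, fake_server)
    print(terminal.recognize(["Kiyomizudera"]))  # ['Kiyomizudera']
    print(terminal.version)                      # 2
```

The server is contacted only when the first pass yields an unknown word, which is the source of the reduced traffic described in paragraph [0116].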

[0075] In step S112, the speech recognition result obtained in step S101 or step S111 is output to the output device H2.

[0076] Next, in step S113, the top-ranked recognition result in the list displayed (output) on the output device H2 in step S112 is selected (for example, "Kinkakuji", shown highlighted in the display example of FIG. 12), and the process returns to step S1.

[0077] In step S201, the direction of the user's operation of the dial H304 is determined. If the direction is upward, the process proceeds to step S202; otherwise, it proceeds to step S203.

[0078] In step S202, the user's selection is moved up by one recognition result: if the nth recognition result was selected, the (n-1)th recognition result is selected. The process then returns to step S1.

[0079] In step S203, the user's selection is moved down by one recognition result: if the nth recognition result was selected, the (n+1)th recognition result is selected. The process then returns to step S1.
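Steps S202 and S203 move the selection one entry up or down the result list. A minimal sketch follows. `move_selection` is a hypothetical helper and indices are 0-based. Clamping at either end of the list is an assumption, since the source does not say what happens when the first or last result is already selected.

```python
def move_selection(index: int, direction: str, count: int) -> int:
    """Return the selected 0-based index after one dial step.

    "up" selects the (n-1)th result (S202); any other direction selects the
    (n+1)th result (S203). Clamping at the list ends is an assumption.
    """
    if count <= 0:
        raise ValueError("there are no recognition results to select from")
    step = -1 if direction == "up" else 1
    return min(max(index + step, 0), count - 1)


if __name__ == "__main__":
    print(move_selection(2, "up", 5))    # 1
    print(move_selection(4, "down", 5))  # 4 (already the last result)
```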

[0080] In step S301, the selected recognition result is acquired, and in step S302, data meaning "request for image data" is sent to the server SV. Then, in step S303, the maximum number of images the terminal will receive is sent to the server SV.

[0081] Next, in step S304, the selection acquired as the recognition result in step S301 is sent to the server SV.

[0082] Next, in step S305, the number of images corresponding to the recognition result sent in step S304 is received from the server SV. In step S306, it is determined whether the number of images received in step S305 is zero. If it is zero, the process proceeds to step S307; otherwise, it proceeds to step S308.

[0083] In step S307, the output device H2 warns the user that no image data exists, and the process returns to step S1. FIG. 15 shows an example of this warning.

[0084] In step S308, it is determined whether the number of images received in step S305 is larger than the maximum number of received images sent to the server SV in step S303. If it is larger, the process proceeds to step S309; otherwise, it proceeds to step S313.
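Steps S306 and S308 together choose one of three branches from the image count reported by the server. The sketch below uses the hypothetical name `branch_for_image_count`. It treats a count equal to the maximum as acceptable, since step S308 tests only whether the count is larger.

```python
def branch_for_image_count(count: int, max_images: int) -> str:
    """Return the next step for `count` images when at most `max_images` are accepted."""
    if count == 0:
        return "S307"   # S306: warn that no image data exists (FIG. 15)
    if count > max_images:
        return "S309"   # S308: warn that there are too many images (FIG. 14)
    return "S313"       # receive the first image


if __name__ == "__main__":
    for n in (0, 3, 10, 11):
        print(n, branch_for_image_count(n, 10))
```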

[0085] In step S309, the user is warned that the number of images obtainable from the server SV for the selection acquired as the recognition result in step S301 exceeds the maximum number of received images. FIG. 14 shows an example of this warning.

[0086] Next, in step S310, a list of unique keywords that identify the images corresponding to the selection acquired in step S301 is received from the server SV. In step S311, the unique keyword list received in step S310 is output to the output device H2. Then, in step S312, the first of the displayed unique keywords is selected, and the process returns to step S1.

[0087] In step S313, the first of the images corresponding to the selected recognition result is received from the server SV. In step S401, it is determined whether the number of images received in step S305 is greater than 1. If it is, the process proceeds to step S402; otherwise, it proceeds to step S403.

[0088] In step S402, the second and subsequent image data are received from the server SV asynchronously with the processing of the flowchart shown in FIG. 9. After the asynchronous reception has started, the process proceeds to step S403.

[0089] In step S403, the screen of the output device H2 is switched to the image display screen, and in step S404, the image data received in step S313 is displayed, for example as illustrated in FIG. 13.

[0090] Next, in step S405, an event is acquired, and in step S406, it is determined whether that event was caused by the user operating the dial H304. If it was, the process proceeds to step S411; otherwise, it proceeds to step S407.

[0091] In step S407, it is determined whether the event acquired in step S405 is the user's press of the recognition button H301. If it is, the process proceeds to step S408; otherwise, it returns to step S405.

[0092] In step S408, the asynchronous image data reception started in step S402 is stopped, and in step S409, the screen of the output device H2 is switched back to the original screen, for example as illustrated in FIG. 12.

[0093] Next, in step S410, speech recognition is started as in step S3, and the process returns to step S1 to acquire an event.

[0094] In step S411, the target image is changed according to the user's operation of the dial H304. If the nth image is the current target image, the (n-1)th image becomes the target when the dial H304 is turned upward, and the (n+1)th image becomes the target when it is turned downward.

[0095] Next, in step S412, it is determined whether the target image selected in step S411 has already been received from the server SV and is present in the terminal. If it has, the process proceeds to step S414; otherwise, it proceeds to step S413.

[0096] In step S413, the terminal waits until the target image selected in step S411 has been received through the asynchronous image data reception started in step S402, and then proceeds to step S414. In step S414, the target image selected in step S411 is displayed, and the process returns to step S405.
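Three steps map naturally onto a background thread guarded by a condition variable:

- the asynchronous reception started in step S402;
- the wait in step S413;
- the cancellation in step S408.

The following Python sketch assumes a hypothetical `fetch(index)` callable that downloads one image from the server SV. `ImageReceiver` and its methods are illustrative names. Indices are 0-based, so index 0 is the image already received in step S313.

```python
import threading
import time
from typing import Callable, Dict, Optional


class ImageReceiver:
    """Receives images 1..count-1 in the background while image 0 is shown."""

    def __init__(self, count: int, fetch: Callable[[int], bytes], first: bytes):
        self._images: Dict[int, bytes] = {0: first}  # image 0 arrived in S313
        self._cond = threading.Condition()
        self._cancelled = False
        self._fetch = fetch
        self._thread = threading.Thread(target=self._run, args=(count,), daemon=True)
        self._thread.start()                         # S402: start async reception

    def _run(self, count: int) -> None:
        for index in range(1, count):
            with self._cond:
                if self._cancelled:
                    return
            data = self._fetch(index)                # download outside the lock
            with self._cond:
                self._images[index] = data
                self._cond.notify_all()              # wake any waiter in get()

    def get(self, index: int, timeout: Optional[float] = None) -> bytes:
        """Return image `index`, waiting until it has been received (S412, S413)."""
        with self._cond:
            self._cond.wait_for(
                lambda: index in self._images or self._cancelled, timeout)
            if index in self._images:
                return self._images[index]
            if self._cancelled:
                raise RuntimeError("image reception was cancelled")
            raise TimeoutError(f"image {index} was not received in time")

    def cancel(self) -> None:
        """Stop the background reception (S408)."""
        with self._cond:
            self._cancelled = True
            self._cond.notify_all()


if __name__ == "__main__":
    def slow_fetch(index: int) -> bytes:
        time.sleep(0.05)                             # simulate network latency
        return f"image-{index}".encode()

    receiver = ImageReceiver(3, slow_fetch, b"image-0")
    print(receiver.get(2, timeout=5))                # returns once image 2 arrives
```

The download itself runs outside the lock, so a viewer already waiting in `get` is never blocked by a slow transfer of a different image.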

[0097] Next, the operation of the server SV in this embodiment will be described.

[0098] FIGS. 10 and 11 are flowcharts showing the control processing performed by the server SV in the second embodiment, that is, the procedure executed by the central processing unit (CPU) H8. In this embodiment, the software program for this control processing is written as an event-driven program. Interrupts such as input from the input device H10, and data reception from a client (user terminal UT), are issued as events.

[0099] Referring to these figures, first, in step S501, a new event is acquired, and in step S502, it is determined whether that event is data reception from a user terminal UT. If it is, the process proceeds to step S506; otherwise, it proceeds to step S503.

[0100] In step S503, it is determined whether the event acquired in step S501 is a notification that the database in the external storage device H12 has been updated, for example by the addition or deletion of image data. If it is, the process proceeds to step S504; otherwise, it returns to step S501.

[0101] In step S504, a speech recognition grammar matching the updated database (the second speech recognition grammar) is created. For example, when data has been added, the corresponding word is added, and when data has been deleted, the corresponding word is deleted.

[0102] Next, in step S505, the information needed to create the difference between the grammar created in step S504 and past grammars is created. Specifically, for every natural number n, it suffices to create the difference between the grammar created the nth time (the nth version of the speech recognition grammar) and the grammar created the (n-1)th time (the (n-1)th version). From these, the difference between the nth grammar and the (n-k)th grammar can be created for every k (< n). After this difference creation information has been created, the process returns to step S501.
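Step S505 stores only the difference between consecutive versions, and the text notes that the difference between any two versions can be derived from those. One way to do that is sketched below in Python, under the same word-set assumption as before; `step_diff` and `compose` are hypothetical names. The stored single-step differences are folded in order. A word that is added and later removed cancels out, as does one that is removed and later re-added. In step S509 the server could then call `compose(steps, client_version, current_version)`.

```python
from typing import FrozenSet, List, Tuple

Diff = Tuple[FrozenSet[str], FrozenSet[str]]  # (added words, removed words)


def step_diff(old: FrozenSet[str], new: FrozenSet[str]) -> Diff:
    """Difference from one grammar version to the next (stored in step S505)."""
    return new - old, old - new


def compose(steps: List[Diff], old_version: int, new_version: int) -> Diff:
    """Combine stored step diffs into the diff from old_version to new_version.

    steps[i] is the diff from version i to version i + 1.
    """
    added, removed = set(), set()
    for step_added, step_removed in steps[old_version:new_version]:
        for word in step_removed:
            if word in added:
                added.discard(word)     # added earlier in the chain: net no change
            else:
                removed.add(word)
        for word in step_added:
            if word in removed:
                removed.discard(word)   # removed earlier, now restored: net no change
            else:
                added.add(word)
    return frozenset(added), frozenset(removed)


if __name__ == "__main__":
    v0 = frozenset({"Kinkakuji", "Ginkakuji"})
    v1 = frozenset({"Kinkakuji", "Kiyomizudera"})
    v2 = frozenset({"Kinkakuji", "Ginkakuji", "Ryoanji"})
    steps = [step_diff(v0, v1), step_diff(v1, v2)]
    print(compose(steps, 0, 2))  # (frozenset({'Ryoanji'}), frozenset())
```

Storing only consecutive differences keeps the server's storage linear in the number of versions, at the cost of folding up to n steps per request.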

[0103] In step S506, it is determined whether the data received from the user terminal UT is the data meaning "request for image data" sent in step S302 (FIG. 8) described above. If it is, the process proceeds to step S601; otherwise, it proceeds to step S507.

[0104] In step S507, it is determined whether the received data is the data meaning "request for recognition grammar" sent in step S105 (FIG. 6) described above. If it is, the process proceeds to step S508; otherwise, it returns to step S501.

[0105] In step S508, the server SV receives from the user terminal UT the version information of the recognition grammar held in that terminal.

[0106] Next, in step S509, difference information for the recognition grammar is created from two inputs: the version information of the speech recognition grammar (the first speech recognition grammar) received in step S508, and the difference creation information created in step S505. Then, in step S510, the difference information created in step S509 is sent to the user terminal UT. Further, in step S511, the version information of the current speech recognition grammar (the second speech recognition grammar) is sent to the user terminal UT, and the process returns to step S501.

[0107] In step S601, the upper limit on the number of image data items to be sent to the user terminal UT is received, and in step S602, the keyword for searching the database is received from the client (user terminal UT).

[0108] Next, in step S603, the database in the external storage device H12 is searched for data corresponding to the received keyword, and in step S604, the number of data items found is sent to the user terminal UT.

[0109] Next, in step S605, it is determined whether the number of data items found exceeds the upper limit, received in step S601, on the number of image data items to be sent to the user terminal UT. If it does, the process proceeds to step S607; otherwise, it proceeds to step S606.

[0110] In step S606, the data found by the search in step S603 is sent to the user terminal UT, and the process returns to step S501.

[0111] In step S607, a list of keywords for identifying the data found by the search in step S603 is created. Examples of such unique keywords are keywords assigned manually with uniqueness in mind, or the concatenation of all keywords assigned to a data item. The keywords need not identify the data uniquely; any keyword that narrows the results down to no more than the maximum number of transmitted images is sufficient.

[0112] Then, in step S608, the unique keyword list created in step S607 is sent to the user terminal UT, and the process returns to step S501.
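Steps S603 to S608 on the server can be sketched as one request handler. In this Python sketch, the following are all assumptions:

- the database layout (image id mapped to its keyword list);
- the handler name `handle_image_request`;
- image ids standing in for the image data;
- the space-joined keyword string.

The joined string follows the source's example of concatenating all keywords assigned to a data item. As the text allows, it is not guaranteed to be unique.

```python
from typing import Dict, List, Tuple


def handle_image_request(database: Dict[str, List[str]], keyword: str,
                         limit: int) -> Tuple[int, str, List[str]]:
    """Return (hit count, payload kind, payload) for one image request."""
    hits = [image_id for image_id, keywords in database.items()
            if keyword in keywords]                       # S603: search
    count = len(hits)                                     # S604: count sent first
    if count <= limit:                                    # S605
        return count, "images", hits                      # S606: ids stand in for data
    # S607: one narrowing keyword per hit, built by joining all its keywords
    keyword_list = sorted({" ".join(database[image_id]) for image_id in hits})
    return count, "keywords", keyword_list                # S608


if __name__ == "__main__":
    db = {
        "img1": ["Kinkakuji", "autumn"],
        "img2": ["Kinkakuji", "snow"],
        "img3": ["Ginkakuji"],
    }
    print(handle_image_request(db, "Kinkakuji", 5))  # (2, 'images', ['img1', 'img2'])
    print(handle_image_request(db, "Kinkakuji", 1))
    # (2, 'keywords', ['Kinkakuji autumn', 'Kinkakuji snow'])
```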

[0113] <Modification of the Second Embodiment> In the second embodiment described above, the user terminal UT displays the unique keyword list on the output device H2 in step S311. After that, the speech recognition grammar may be replaced with a grammar consisting only of the unique keywords. In this case, after step S409 (which displays the initial screen) is executed, the terminal switches back to a grammar that recognizes all keywords.

[0114] In the second embodiment described above, the search target was a plurality of image data items stored in the database in the external storage device H12. However, the search target of the information search system is not limited to images, and the system may be applied to audio data or any other search target.

[0115] Also, in the second embodiment, the user terminal UT can be operated with the direction button H303. This requires treating an upward press of the direction button H303 as an upward operation of the dial H304, and a downward press as a downward operation of the dial H304.

[0116] With the information search system according to this embodiment and its modifications, the timing for updating the speech recognition grammar on the user terminal UT is determined automatically. The grammar is updated only when the user utters a word not covered by the speech recognition grammar currently stored on the user terminal UT. Compare this with the system configuration described with reference to FIG. 16, in which the information search processing is executed on the client side. This approach reduces the number of communications between the user terminal UT and the server SV, so the speech recognition grammar can be updated efficiently even when the capacity of the communication line is relatively small.

【０１１７】[0117]

[Other Embodiments] The object of the present invention can also be achieved as follows. A storage medium (or recording medium) on which software program code implementing the functions of the above embodiments is recorded is supplied to two kinds of computer: an information processing apparatus with a communication function, such as a personal computer, operating as the user terminal UT described above, and a server computer operating as the server SV. The computer (or CPU or MPU) of each system or apparatus then reads and executes the program code stored in the storage medium. In this case, the program code read from the storage medium itself implements the functions of the above embodiments. Both the storage medium storing the program code and the program code obtained as a computer program product over a telecommunication line or the like constitute the present invention.

[0118] This covers not only the case where the functions of the above embodiments are implemented by the computer executing the program code it has read. It also covers the case where an operating system (OS) or the like running on the computer performs part or all of the actual processing according to the instructions of the program code, and that processing implements the functions of the above embodiments.

[0119] It further covers the following case. The program code read from the storage medium is written into memory on a function expansion card inserted into the computer, or in a function expansion unit connected to the computer. A CPU or the like on that card or unit then performs part or all of the actual processing according to the instructions of the program code, and that processing implements the functions of the above embodiments.

【０１２０】[0120]

[Effects of the Invention] According to the present invention described above, the speech recognition grammar stored in a user terminal (a client-side information processing apparatus) that performs information retrieval using speech recognition can be updated efficiently and easily to the speech recognition grammar stored on the server side.

[Brief description of drawings]

FIG. 1 is a system configuration diagram of the information search system according to the first embodiment.

FIG. 2 is a block diagram showing the hardware configuration of the user terminal and the server that make up the information search system in the first embodiment.

FIG. 3 is a flowchart showing the information search processing performed by the user terminal UT in the first embodiment.

FIG. 4 is a flowchart showing the control processing of the server that provides information.

FIG. 5 is a flowchart showing the information search processing performed by the user terminal UT in the second embodiment.

FIG. 6 is a flowchart showing the information search processing performed by the user terminal UT in the second embodiment.

FIG. 7 is a flowchart showing the information search processing performed by the user terminal UT in the second embodiment.

FIG. 8 is a flowchart showing the information search processing performed by the user terminal UT in the second embodiment.

FIG. 9 is a flowchart showing the information search processing performed by the user terminal UT in the second embodiment.

FIG. 10 is a flowchart showing the control processing performed by the server SV in the second embodiment.

FIG. 11 is a flowchart showing the control processing performed by the server SV in the second embodiment.

FIG. 12 is a diagram illustrating a display screen in which a plurality of recognition results are listed on the output device H2 of the user terminal UT.

FIG. 13 is a diagram illustrating a display screen showing an image corresponding to a recognition result on the output device H2 of the user terminal UT.

FIG. 14 is a diagram illustrating a guidance screen with which the user terminal UT warns the user when the number of image data items on the server SV is larger than the maximum number of received images of the user terminal UT.

FIG. 15 is a diagram illustrating a guidance screen with which the user terminal UT warns the user that no corresponding image data exists on the server SV.

FIG. 16 is a flowchart showing the information search processing of a user terminal in an information search system using voice input.

Continuation of front page
(51) Int.Cl.⁷: G10L 15/00, G10L 15/18, G10L 15/20, G10L 15/28
FI: G10L 3/00 537J, G10L 3/00 551A, G10L 3/00 571K, G10L 3/00 531P

Claims

1. A method for updating a speech recognition grammar in an information retrieval system comprising a user terminal and a server, the method comprising: a speech recognition step of recognizing an input user's speech based on a first speech recognition grammar stored in the user terminal and detecting an unknown word contained in the speech; and an updating step of updating, at the user terminal, the first speech recognition grammar to a second speech recognition grammar stored in the server. The updating step is performed when an unknown word is detected in the speech recognition step and the version information of the first speech recognition grammar differs from the version information of the second speech recognition grammar.

2. A method for updating a speech recognition grammar in an information retrieval system comprising a user terminal and a server, the method comprising:
a speech recognition step of recognizing an input user's speech based on a first speech recognition grammar stored in the user terminal and detecting an unknown word contained in the speech;
a difference generation step of, when an unknown word is detected in the speech recognition step, generating in the server difference information between the first speech recognition grammar and a second speech recognition grammar stored in the server, according to the version information of the first speech recognition grammar and the version information of the second speech recognition grammar; and
an updating step of updating, in the user terminal, the first speech recognition grammar to the second speech recognition grammar using the generated difference information.

3. The method for updating a speech recognition grammar according to claim 2, wherein, in the difference generation step, difference information between the nth version and the (n-1)th version of the speech recognition grammar is created for every n, and, for every k (< n), difference information between the nth version and the (n-k)th version of the speech recognition grammar is generated.

4. The method for updating a speech recognition grammar according to claim 2 or 3, wherein, in the speech recognition step, when the unknown word is detected, the speech is recognized again based on the second speech recognition grammar acquired in the updating step.

5. A method for updating a speech recognition grammar in a user terminal connectable to an information retrieval server, the method comprising:
a speech recognition step of recognizing an input user's speech based on a first speech recognition grammar stored in the user terminal and detecting an unknown word contained in the speech; and
an updating step of, when an unknown word is detected in the speech recognition step, transmitting the version information of the first speech recognition grammar to the server, and updating the first speech recognition grammar to a second speech recognition grammar stored in the server using difference information acquired from the server in response to the transmission of the version information.

6. An information processing apparatus connectable to an information retrieval server, comprising:
speech recognition means for recognizing an input user's speech based on a first speech recognition grammar stored in the information processing apparatus and detecting an unknown word contained in the speech; and
updating means for, when the speech recognition means detects an unknown word, transmitting the version information of the first speech recognition grammar to the server, and updating the first speech recognition grammar to a second speech recognition grammar stored in the server using difference information acquired from the server in response to the transmission of the version information.

7. A computer program product giving operation instructions that enable a computer having a communication function to carry out the method for updating a speech recognition grammar according to claim 5.

8. A computer program giving instructions that cause a computer having a communication function to operate as the information processing apparatus according to claim 6.